Contact Us

Kockpit is here to help you

Business Form

Mastering Data Profiling and Cleaning with Qlik Sense

The goal of every Analyst is to derive analysis from the organization's data to discover trends, pitfalls, hidden insights, etc. Visualizing these discoveries helps organizations make data-driven decisions. 

However, firstly, it is essential to know whether the data in hand is suitable for the analysis. Once that is known, the data needs to be scrutinized for flaws such as inconsistencies, completeness, etc. This essential scrutiny phase is known as Data Profiling. 

The flaws uncovered through the Profiling phase need rectification, which is quite tedious. This tedious rectification phase is known as Data Cleaning. 

In this article, we will guide you through Data Profiling and Cleaning using Qlik Sense in a simple yet comprehensive manner. 


Phases in Data Profiling

There are three sequential phases involved in Data Profiling.


Content Discovery in Data Profiling

Content Discovery is scrutinizing each column to find out Nulls, unformatted data, etc. This process helps gain insights into the data quality. 

For example, if the customer’s address is not standardized, he might not receive his order due to misplacement. Another example is that most of the time the date column is unformatted, which can lead to invalid calculations. 


In the above table, the lack of clear structure makes it difficult to determine the intended format for street addresses, city names, and ZIP codes. 

Structure Discovery in Data Profiling 

Structure Discovery is finding inconsistencies in the dataset through statistical analysis. These statistical analyses include calculating the mean, median, min, max, and distribution of rows in a column. 

These statistics allow the Analyst to find inconsistencies like outliers, data variance, etc. 

Relationship Discovery in Data Profiling

Relationships Discovery goes by its name, i.e., to discover relations/ associations between datasets/ tables. It is achieved by identifying the Primary and Foreign Keys in various tables. 


Image showing the mapping of keys from all tables

Following these phases is essential since it helps you to assess the data quality, which has a major impact on the analysis. Poor data quality results in incorrect insights, which could harm the organization's growth.

Data Profiling in Qlik Sense

Qlik Sense streamlines the process of Data Profiling by automating all the phases mentioned above, however the only thing that isn't automated is the data load phase.😉

Data Profiling can be done in the Data Manager of Qlik Sense. Once you load the dataset(s) in the Data Manager, the tables from the dataset appear as bubbles. 

Let's go through the aforementioned phases, but using Qlik Sense.


Relationship Discovery in Qlik Sense

Relationship Discovery in Qlik Sense is simple since it automatically discovers the associations between all the tables. 


It also shows the strength of the association through colors (green, amber, red). Check out the image below!


It allows the user to create custom associations too. This can be done by dragging a bubble field onto another one. Doing so will display the association's strength and subsequently upon releasing the mouse, a pop-up will appear. You can now add the Primary and Foreign keys to create the association.


Check this GIF to learn how to create custom associations. 



Structure and Content Discovery in Qlik Sense

The Qlik Sense Data Manager has a feature called Summary Cards that displays the stats of individual columns. 



The Summary Card displays information relevant to both the Structure and Content Discovery. 


Below is the data presented in a Summary Card: 


  • Field Name

  • Field Type

  • Distinct Values

  • Total Values



  • Pie Chart showing Nulls if present. 


  • A bar chart showing the distinct value distribution of a column 


  • Value Range: Show the Min, Max, Mean, and Median.


  • Temporal Field (Date Fields) in a Summary Card shows the Min and Max date. Take a look! 


Qlik Sense is a powerful tool for analysis since it provides automation for almost every feature it provides, thus easing the job of the Analyst and giving him time to focus on the data. 


Data Cleaning in Qlik Sense

Once you’ve profiled your data and discovered the flaws, it's time to rectify them. Qlik Sense provides a way to cleanse your entire dataset by allowing you to: 

  • Replace certain values in the column (Value Replacement Card),

  • Setting nulls in place of empty spaces or another criterion (Setting NULL Card),

  • Custom sort the data (Sorting Card),

  • Splitting the values in a column based on delimiters (Value Split Card).  


Let's delve into each functionality sequentially! 


Value Replacement in Qlik Sense 

Value Replacement is an important data-cleaning concept where a value is replaced with another value from the same column.  Usually, in the data from the client, some of the entries are incorrect (for example a misspelled word) or don’t match the general format of the data within that column (for example, a country column contains full names of countries, but some rows may contain abbreviations, say India is stored as IN)

 


This causes a problem since Qlik will make a distinction between the full country name India and the abbreviation IN. In other words, even though India and IN mean the same, Qlik Sense will treat both as separate values. 


Important Rule: Replacement can be done if there are at most 5000 distinct values. 


Steps to Perform Value Replacement:

Visualize how Value Replacement is done below. 



NOTE: Qlik Sense allows you to choose and replace multiple values with a common value.



This approach of replacing values that Qlik Sense provides is such a load reducer, along with its easy-to-use UI.


Setting Nulls in Qlik Sense 

It is normal for rows to be null due to a business use case. However, the way a client represents these null rows differs. Some prefer representing Nulls as empty spaces, as X’s, etc. 

Setting Nulls is yet another important data-cleansing concept since Qlik Sense doesn’t recognize empty spaces and the like as Nulls, they must explicitly be converted to NULL().


While in the previous functionality, you can replace the dataset’s Null-like values with something else, replacing with Nulls isn’t possible. However, the Setting Null card allows you to set rows to Null. 

Setting Null Values: Visualize how Null Values are set. 



Qlik Sense provides an easy-to-use interface for setting Null values which otherwise is quite a laborious task!


Data Sorting in Qlik Sense 

Data Sorting is an important concept when it comes to sales or the like since we are often required to create charts with values in descending/ ascending order. This can be achieved using the Sorting Card.


IMPORTANT: It’s preferred to sort data in the backend of an application since it boosts the performance of the front-end visualizations. 


Qlik Sense provides a special functionality, i.e. a custom sort. Sometimes we must sort data according to the zones (south, north…) or a hierarchy.

In this section, we will see how to sort data according to different zones (One of our client's requirements)

For performing a Custom Sort, first visualize how to Custom Sort.



Once you click on the Order Tab, you see a list of distinct values in the left table → Besides each value, there are three stacked lines. 

Click and hold and drag that row to the position you desire! 


Splitting Data in Qlik Sense

This is a common concept when it comes to data cleaning, i.e. Splitting Data in rows. 

Usually, clients store the date and time in one column, the same is true for the entire address, etc. During meetings, they usually say that they want the date and time separated, the same for the postal code, street name in the address field, and so on. 

In this case, Splitting the values in the rows is essential, which can be easily done with the help of Qlik Sense. 


NOTE: Fields containing date and time information are automatically split into date and time fields in the Data Manager!



For performing Data Splitting in Qlik Sense, visualize this…



NOTE: For Split to work correctly, the data should be consistent across all rows. 


Conclusion

In wrapping up, this guide has walked you through the crucial phases of Data Profiling and Cleaning using the powerful tools within Qlik Sense. Qlik Sense streamlines the often intricate tasks of discovering data insights and rectifying flaws by automating processes and providing intuitive interfaces. From uncovering content and structure issues to effortlessly handling data cleaning tasks, Qlik Sense empowers analysts to focus on deriving meaningful insights, making it an invaluable asset in the realm of data analytics.