The goal of every Analyst is to derive analysis from the organization's data to discover trends, pitfalls, hidden insights, etc. Visualizing these discoveries helps organizations make data-driven decisions.
However, firstly, it is essential to know whether the data in hand is suitable for the analysis. Once that is known, the data needs to be scrutinized for flaws such as inconsistencies, completeness, etc. This essential scrutiny phase is known as Data Profiling.
The flaws uncovered through the Profiling phase need rectification, which is quite tedious. This tedious rectification phase is known as Data Cleaning.
In this article, we will guide you through Data Profiling and Cleaning using Qlik Sense in a simple yet comprehensive manner.
There are three sequential phases involved in Data Profiling.
Content Discovery is scrutinizing each column to find out Nulls, unformatted data, etc. This process helps gain insights into the data quality.
For example, if the customer’s address is not standardized, he might not receive his order due to misplacement. Another example is that most of the time the date column is unformatted, which can lead to invalid calculations.
In the above table, the lack of clear structure makes it difficult to determine the intended format for street addresses, city names, and ZIP codes.
Structure Discovery is finding inconsistencies in the dataset through statistical analysis. These statistical analyses include calculating the mean, median, min, max, and distribution of rows in a column.
These statistics allow the Analyst to find inconsistencies like outliers, data variance, etc.
Relationships Discovery goes by its name, i.e., to discover relations/ associations between datasets/ tables. It is achieved by identifying the Primary and Foreign Keys in various tables.
Following these phases is essential since it helps you to assess the data quality, which has a major impact on the analysis. Poor data quality results in incorrect insights, which could harm the organization's growth.
Qlik Sense streamlines the process of Data Profiling by automating all the phases mentioned above, however the only thing that isn't automated is the data load phase.😉
Data Profiling can be done in the Data Manager of Qlik Sense. Once you load the dataset(s) in the Data Manager, the tables from the dataset appear as bubbles.
Let's go through the aforementioned phases, but using Qlik Sense.
Relationship Discovery in Qlik Sense is simple since it automatically discovers the associations between all the tables.
It also shows the strength of the association through colors (green, amber, red). Check out the image below!
It allows the user to create custom associations too. This can be done by dragging a bubble field onto another one. Doing so will display the association's strength and subsequently upon releasing the mouse, a pop-up will appear. You can now add the Primary and Foreign keys to create the association.
Check this GIF to learn how to create custom associations.
The Qlik Sense Data Manager has a feature called Summary Cards that displays the stats of individual columns.
The Summary Card displays information relevant to both the Structure and Content Discovery.
Below is the data presented in a Summary Card:
Field Name
Field Type
Distinct Values
Total Values
Pie Chart showing Nulls if present.
A bar chart showing the distinct value distribution of a column
Value Range: Show the Min, Max, Mean, and Median.
Temporal Field (Date Fields) in a Summary Card shows the Min and Max date. Take a look!
Qlik Sense is a powerful tool for analysis since it provides automation for almost every feature it provides, thus easing the job of the Analyst and giving him time to focus on the data.
Once you’ve profiled your data and discovered the flaws, it's time to rectify them. Qlik Sense provides a way to cleanse your entire dataset by allowing you to:
Replace certain values in the column (Value Replacement Card),
Setting nulls in place of empty spaces or another criterion (Setting NULL Card),
Custom sort the data (Sorting Card),
Splitting the values in a column based on delimiters (Value Split Card).
Let's delve into each functionality sequentially!
Value Replacement is an important data-cleaning concept where a value is replaced with another value from the same column. Usually, in the data from the client, some of the entries are incorrect (for example a misspelled word) or don’t match the general format of the data within that column (for example, a country column contains full names of countries, but some rows may contain abbreviations, say India is stored as IN)
This causes a problem since Qlik will make a distinction between the full country name India and the abbreviation IN. In other words, even though India and IN mean the same, Qlik Sense will treat both as separate values.
Steps to Perform Value Replacement:
Visualize how Value Replacement is done below.
This approach of replacing values that Qlik Sense provides is such a load reducer, along with its easy-to-use UI.
It is normal for rows to be null due to a business use case. However, the way a client represents these null rows differs. Some prefer representing Nulls as empty spaces, as X’s, etc.
Setting Nulls is yet another important data-cleansing concept since Qlik Sense doesn’t recognize empty spaces and the like as Nulls, they must explicitly be converted to NULL().
While in the previous functionality, you can replace the dataset’s Null-like values with something else, replacing with Nulls isn’t possible. However, the Setting Null card allows you to set rows to Null.
Setting Null Values: Visualize how Null Values are set.
Qlik Sense provides an easy-to-use interface for setting Null values which otherwise is quite a laborious task!
Data Sorting is an important concept when it comes to sales or the like since we are often required to create charts with values in descending/ ascending order. This can be achieved using the Sorting Card.
Qlik Sense provides a special functionality, i.e. a custom sort. Sometimes we must sort data according to the zones (south, north…) or a hierarchy.
In this section, we will see how to sort data according to different zones (One of our client's requirements)
For performing a Custom Sort, first visualize how to Custom Sort.
Once you click on the Order Tab, you see a list of distinct values in the left table → Besides each value, there are three stacked lines.
Click and hold and drag that row to the position you desire!
This is a common concept when it comes to data cleaning, i.e. Splitting Data in rows.
Usually, clients store the date and time in one column, the same is true for the entire address, etc. During meetings, they usually say that they want the date and time separated, the same for the postal code, street name in the address field, and so on.
In this case, Splitting the values in the rows is essential, which can be easily done with the help of Qlik Sense.
For performing Data Splitting in Qlik Sense, visualize this…
In wrapping up, this guide has walked you through the crucial phases of Data Profiling and Cleaning using the powerful tools within Qlik Sense. Qlik Sense streamlines the often intricate tasks of discovering data insights and rectifying flaws by automating processes and providing intuitive interfaces. From uncovering content and structure issues to effortlessly handling data cleaning tasks, Qlik Sense empowers analysts to focus on deriving meaningful insights, making it an invaluable asset in the realm of data analytics.