Contact Us

Kockpit is here to help you

Business Form

Exploring Window Functions, CTEs, and Pivot Tables in SQL

In the world of databases, it's essential to learn advanced SQL techniques to make data work better for you. This article is your guide into the advanced side of SQL, where we'll introduce you to some powerful tools like Windows Functions, Common Table Expressions (CTEs), and Pivot/Unpivot operations. 


Once you understand and use these tools, your SQL skills will level up, enabling you to work with data in more sophisticated ways. This means you can uncover valuable insights from complex data more easily. Join us on this journey as we uncover the hidden potential of SQL for smarter and more efficient data handling.

Exploring Windows Functions 


Many people are familiar with the GROUP BY function in SQL that does calculations on a table when you group the data. But only some know about window functions, which are special tools that calculate things on a bunch of rows and give you one result for each row. In this article, we'll dive into window functions in SQL Server. 


Windows Functions 


Window functions calculate values for groups of rows, not the whole table. They perform calculations similar to aggregate functions but return results for each row within a group, instead of combining them into one result. This means each row keeps its unique value. 



Types of Windows Functions: 


1. Aggregate Window Functions: These functions work with many rows, like SUM(), MAX(), MIN(), AVG(), COUNT(), etc.


2. Ranking Window Functions: These functions assign ranks to rows in a table partition, such as RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE(), etc. 


3. Value Window Functions: These functions use a series of values nearby, like LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), etc. 


Syntax: 


SELECT coulmn_name, 

 window_function(cloumn_name)

 OVER([PARTITION BY column_name] [ORDER BY column_name]) AS new_column

FROM table_name; 


The OVER clause is used with window functions to define how the data should be organized. It has two main functions: 

  • It groups rows into different sets based on a specific column or condition (done using the PARTITION BY clause). 

  • It sorts the rows within each group in a particular order (achieved using the ORDER BY clause). 


Table - student_score

Student Table.png


  1. Aggregate Window Functions 

SUM ()

Example -

In this example, we find the department-wise total sum of scores and show it in front of each student.


SELECT *,

ROUND (SUM(score) OVER(PARTITION BY dep_name), 2) AS dep_Total_score 

FROM student_score;




Output: 

SUM.png


AVG()

Example -

Find the average score of students for each department.

SELECT *, 

ROUND (AVG(score) OVER(PARTITION BY dep_name), 2) AS dep_average_score 

FROM student_score;


Output: 

AVG.png


MAX() 

Example -

Find the maximum score of students for each department.


SELECT *,

MAX(score) OVER(PARTITION BY dep_name) AS dep_maximum_score 

FROM student_score;




Output: 

Max.png


MIN()

Example -

Find the minimum score of students for each department.


SELECT *,

MIN(score) OVER(PARTITION BY dep_name) AS dep_minimum_score FROM student_score;




Output: 

min.png

COUNT()

Example -

Find a department-wise count of students and show it in front of each student.


SELECT *,

COUNT(student_name) OVER(PARTITION BY dep_name) AS Count_of_Student 

FROM student_score;




Output: 

Count.png


  1. Ranking Windows Function

RANK()

In SQL, the RANK window function is used to assign a rank to each row within a result set. The rank is determined based on the values in one or more columns specified in the ORDER BY clause. Rows with the same values receive the same rank, and the next rank is skipped. 

For example, if two rows have the same values and both receive a rank of 2, the next row will receive a rank of 4, not 3.

Example -

Assign a rank to students for each department and order them by score in descending order.


SELECT *,

RANK() OVER(PARTITION BY dep_name ORDER BY score DESC) AS Rank_By_Department

 FROM student_score;




Output: 

Rank.png


DENSE_RANK()

The DENSE_RANK() function gives each row in a group a rank. If multiple rows have the same value, they will have the same rank. But the ranks of the next rows won't be skipped.

Example -

Assign a rank to students for each department and order them by score in descending order


SELECT*,

DENSE_RANK() OVER(PARTITION BY dep_name ORDER BY score DESC) AS Dense_Rank 

FROM student_score;




Output: 

Dens rank.png


ROW_NUMBER()

In SQL, the ROW_NUMBER() window function is used to assign a unique number to each row within a result set. It assigns a different number to each row although multiple rows have the same value.

Example -

Assign a unique value to each student in the alphabetically ascending order of their name.


SELECT*,

ROW_NUMBER() OVER(ORDER BY student_name) AS name_serial_number 

FROM student_score;



Output: 

Row_No.png


NTILE()

In SQL Server, the NTILE() function is like a tool that helps organize your data into groups. These groups are made so that each group has about the same amount of data. It makes it simpler to study or sort your information into different parts.

Syntax:


NTILE(number_expression) OVER (

   [PARTITION BY column name ]

   ORDER BY column name [ASC | DESC]

)




Number Expression - The number of groups we want to form.

Example -

Here we use the NTILE() function to divide rows into three groups.


SELECT student_id, student_name,

NTILE (3) OVER ( ORDER BY student_id ) AS Group_number 

FROM student_score;




Output: 

Ntile.png


  1. Value Window Functions

LAG()

We use the LAG() function to access previous row data with a defined offset value. We can use this window function to make a comparison with the previous row.

Syntax:

LAG (scalar_expression [, offset] [, default])  

OVER ( [  PARTITION BY column name  ] ORDER BY column name )


  • Scalar Expression — Column Name

  • Offset — This is an integer used to forgo behind the number of rows

  • Default —  If there is no previous row, then it shows NULL default. Otherwise, we can show the value we want.

Example -

Write a SQL query that retrieves the sales and previous sales details for each employee, splitting the result set by year and then sorting each partition based on the 'country' column.


SELECT Year, Prod name, Country, Sales, 

LAG(Sales, 1) OVER (PARTITION BY Year ORDER BY Country) AS Previous_Sale

FROM Product Sales;




Output: 

LAG.png


LEAD()

LEAD() Function is used when we compare individual row data with subsequent row data.

LEAD(scalar_expression, offset [, default])

OVER (

      [PARTITION BY Column Name,]

      ORDER BY Column Name [ASC | DESC])


  • Scalar Expression - Column Name

  • Offset - This is an integer used to forgo a subsequent number of rows

  • Default - If there is no subsequent row then it shows NULL default. Otherwise, we can show the value we want.


Example -

Write an SQL query that splits the result set by year and sorts each partition by the country column to retrieve the sales and next sales details for each employee.


SELECT Prod name, Year, Country, Sales, 

LEAD (Sales,1) OVER (PARTITION BY Year ORDER BY Country) AS Next_Sale 

FROM Product Sales;




Output: 

Lead.png


FIRST_VALUE()

FIRST_VALUE() window function returns the first value from the table.

Syntax:

SELECT *,

FROM table name;

FIRST_VALUE ( scalar_value )  

OVER (  

  [PARTITION BY column name]

  ORDER BY column name [ASC | DESC]

  ) AS column name ;


Scalar Value - Column Name

Example -

Here in this example, we find the first sale of each country.


SELECT Year, Prod name, Country, Sales, 

FIRST_VALUE(Sales) OVER(PARTITION BY Country ORDER BY Country) first_sale 

FROM Product Sales;




Output: 

First value.png



LAST_VALUE()

LAST_VALUE() window function returns the last value form the table.


SELECT *,

FROM table name

LAST_VALUE ( scalar_value )  

OVER (  

  [PARTITION BY column name ]

  ORDER BY column name [ASC | DESC]

AS column name ;


Scalar Value - Column Name

Example - 

Here in this example, we find the last sale of each country.


SELECT Prod name, Year, Country, Sales, 

LAST_VALUE(Sales) OVER(PARTITION BY Country ORDER BY Country) last_sale 

FROM Product Sales;




Output: 

Last Value.png


Exploring CTEs in SQL

We can use SQL Server's Common Table Expressions (CTEs) to simplify complex joins and subqueries. CTEs are like temporary tables that are created when we run a query and used for that query's calculations. However, they are not stored anywhere, and once the query is done, the CTE is deleted, so the data is not saved. This helps us work with data in a single query, like SELECT, INSERT, UPDATE, DELETE, CREATE VIEW, or MERGE.


Syntax:

WITH cte_name (column_names)   

AS (query)     

SELECT * FROM cte_name; 


Steps to Write a CTE:

  • Start CTE by writing the WITH clause.

  • Specify CTE name.

  • After that specify a comma-separate list of columns that we are going to use.

  • Give Alias to the CTE after AS.

  • Define a SELECT statement whose result set populates the CTE.


It should be kept in mind while writing the CTE query definition; we cannot use the following clauses:

  • ORDER BY unless you also use as TOP clause

  • INTO

  • OPTION clause with query hints

  • FOR BROWSE

Table - student_score

Student Table.png


Example -

Write a query to find students whose score is greater than the average score.


WITH average_score (avg_score) AS (SELECT AVG(Score) 

            FROM student score)


SELECT *

FROM student score S, average_score AV 

WHERE S.score > AV.avg_score;




Output:

cte.png

Multiple CTEs

Sometimes, we have to make several CTE queries and combine them to view the outcomes. We can apply the idea of multiple CTEs in this situation. To make multiple CTE queries into one statement, we need to use the comma (",") operator. Remember, the CTE name comes before the comma operator to tell them apart when we have multiple CTEs.


Syntax:

WITH   

   cte_name1 (column_names) AS (query),  

   cte_name2 (column_names) AS (query)  

SELECT * FROM cte_name  

UNION ALL  

SELECT * FROM cte_name; 


Table - Product_Sales 

Table.png

Examples -

Write a SQL query for a Country whose Total sale is greater than the average sales of all countries.


With Total_Sales (Country, Total_Sale_Per_Country) as 

                               (SELECT Country, SUM(Sales) as Total_Sale_Per_Country    

                                 FROM Product Sales GROUP BY Country), 

           Avg_Sales(avg_sales_for_all_Country) as

                               (SELECT AVG(Total_Sale_Per_Country) as avg_sales_for_all_Country

                                FROM Total_Sales)


SELECT *

FROM Total_Sales T, Avg_Sales Av 

WHERE T.Total_Sale_Per_Country > Av.avg_sales_for_all_Country;




Output:

Multtiple cte.png


Benefits Of Using CTEs

  • Improved readability: CTEs make your query more readable, especially for larger queries and those with complex logic.

  • Reuse of subqueries: If you have multiple parts of a query that use the same data or subquery, you can move it to a CTE and refer to it in multiple places.

  • Multiple CTEs: You can use multiple CTEs within a query by separating them with a comma.

  • Recursive queries: CTEs can be used to write recursive queries or hierarchical queries, which refer to previous rows in the same query.


Exploring PIVOTE and UNPIVOTE Functions in SQL:

Pivoting in SQL means changing the way your data is displayed. Instead of having rows of information, it arranges it into columns, which makes it neater and more organized. You can do this to make it simpler to work with or to show it differently. SQL has tools like `PIVOT` and `UNPIVOT` to help with this, and you can also use functions that summarize or rearrange your data for better analysis or presentation. 


Table  - Pivote_Demo

Pivote Data.png


PIVOT

In SQL, the “PIVOT” function is used to transform rows of data into columns. The PIVOT function is commonly used with aggregate functions to summarize and analyze data in a more structured format, making it easier to work with and understand. It is especially useful when you want to turn data from a long format (rows) into a wide format (columns) for reporting or analysis purposes.


Syntax:

SELECT (ColumnNames) 

FROM (TableName) 

PIVOT

 ( 

   AggregateFunction(ColumnToBeAggregated)

   FOR PivotColumn IN (PivotColumnValues)

 ) AS (Alias) //Alias is a temporary name for a table


Example -

Create  North and South as columns and show the sales from Pivot Demo Data.


SELECT Year, North, South 

FROM pivot_demo

PIVOT

(

SUM(Sales) FOR Region IN(North, South)

 ) AS Pivote_Table



Output: 

Pivote.png

UNPIVOT

In SQL, the UNPIVOT function is used to reverse the process of pivoting data. Essentially, it converts columns into rows, which can be particularly useful when you want to analyze or work with the data in a more detailed and structured way.


Syntax:

SELECT (ColumnNames) 

FROM (TableName) 

UNPIVOT

 ( 

   AggregateFunction(ColumnToBeAggregated)

   FOR PivotColumn IN (PivotColumnValues)

 ) AS (Alias)


Example -

Here we unpivote the pivoted data in the previous example.


SELECT Region, Year, Sales 

FROM (SELECT Year, North, South 

              FROM pivot_demo

PIVOT

( SUM(Sales) FOR Region IN(North, South) 

) AS Pivote_Table) as New_Table

UNPIVOT

(

Sales FOR Region IN(North, South) 

) AS Unpivote_Table




Output:

Unpivote.png

Conclusion 

SQL is a versatile language beyond basic SELECT, INSERT, UPDATE, and DELETE operations. Windows Functions, CTEs, and Pivot/Unpivot operations are valuable tools that enable you to solve complex data analysis challenges more efficiently and with greater precision. By understanding and utilizing these techniques, you can streamline your SQL queries, extract meaningful insights from your data, and ultimately make better-informed decisions in your data analysis journey.