Contact Us

Kockpit is here to help you

Business Form

All You Need to Know About SQL Windows Function

The most basic premise of the Window Function is to perform comparative analysis, which gives rise to its alternate name, i.e., Analytic Functions! 


The functionality of Window Functions and GROUP BY Clause is quite similar. Both functions aggregate data, however, the former presents the aggregation against each row, while the latter collapses and summarizes it. 


For the time being, just visualize this result obtained through windowing. I'm sure it will give you a rough idea.



Notice how the salary for both departments has been aggregated (averaged) and displayed in the dept_avg_salary column, while the individual salaries remain intact in the salary column!


This is what I meant when I said “The former presents the aggregation against each row.


Difference Between GroupBy and Window Functions 

Sometime in the future, when you learn more about SQL Window Functions, you may ask… 


“Wait, wait, wait, what's the difference between GroupBy and Windowing.” 


Let's make a distinction between GROUP BY and Windows Function before that day comes. Things are going to get a bit technical! 


For each user-specified group, the GROUP BY function applies user-defined aggregations, which collapses multiple rows into a single one. This helps in summarizing the data, specifically! 



Referencing the first example, the rows were grouped, however, the rows that lead to obtaining the dept_avg_salary weren't removed. In other words, the rows weren’t collapsed. 


Window Functions uses pseudo-GroupBY (I’ll explain this) and aggregation to summarize data, however, it leaves the table as it is, i.e., it does not collapse the data into a single row per group. Instead, the aggregation output is added next to each row/ data point. 


SQL Window Function Syntax Breakdown 

Before delving any further, I would like you to get to know the syntax used for Windowing. It’s pretty straightforward, making it easy to grasp! 


Syntax: 


SELECT 

  col_name1,

  ...,

  ...,

WINDOW_FUN(col_name) OVER (PARTITION BY col_name    ORDER BY col_name) AS alias


FROM table_name


Syntax Breakdown:





Kindly hold your horses, this was just the introduction.  Soon you’ll learn the intricacies that the above clauses hold! 


Types of SQL Window Functions 

All the windowing functions are condensed into 3 types, each one having its own set of unique functions! 


You need to know the difference between the above three functions, their capabilities, and usage, which I shall cover all! 

This article will contain the most prominent Window Functions, while the rarely to never-used functions will be excluded! 


Aggregate Window Functions

These functions simply aggregate data within the window and display the result across individual rows. The aggregate functions include SUM, AVG, COUNT, MIN, and MAX. 


SUM()

The SUM() function adds the values of all rows of a column within a user-defined window. Let’s learn by example.


Question

Suppose you have a table called sales that contains information about sales transactions, including the order_id, product_id, sales_date, and sales_amount. You want to calculate the total sales amount for each product, along with the running total sales amount for each product, sorted by sales date. Using SQL, write a query to achieve this. 


Solution

Let's first understand what the running total is.  


Suppose you have a list of daily sales for a product: 

- Day 1: $100

- Day 2: $150

- Day 3: $50


The running total for this data would be calculated as follows: 

- Day 1: $100 

- Day 2: (including Day 1): $100 + $150 = $250 

- Day 3: (including Day 1 and Day 2): $100 + $150 + $50 = $400 


Solution Code


SELECT 

  order_id,

  product_id,

  sales_date,

  sales_amount,

  SUM(sales_amount) OVER (PARTITION BY product_id ORDER BY sales_date) AS running_total_sales,

  SUM(sales_amount) OVER (PARTITION BY product_id) AS total_sales


FROM sales


Code Breakdown 




NOTE: How does a SQL Windows function work? 





Result



The running total can only be calculated using the ORDER BY Clause, while total_sales cannot be calculated using it. 


AVG()

The AVG() function averages the values of all rows of a column within a user-defined window. Let’s learn by example.


Question

Write an SQL query to calculate the 30-day rolling average closing price for each stock symbol in the daily_stock_prices table. The result should include the stock symbol, price date, closing price, and the rolling average closing price. 


Solution Code


SELECT

    stock_symbol,

    price_date,

    closing_price,

    AVG(closing_price) OVER (

        PARTITION BY stock_symbol

        ORDER BY price_date

    ) AS rolling_average


FROM daily_stock_prices;


Code Breakdown




Result 



One of the salient features of the rolling average is that it helps identify the outliers in the data. Suppose the closing_price on 06-01-2023 explodes, then the rolling_average on that day will also explode! 


Ranking Window Functions

These Window functions are specially designed to rank the rows of a dataset. Any query that requires ranking based on sales, salary, purchases, etc. will utilize these functions. 


You will grasp this topic once you go through a couple of examples. 


ROW_NUMBER() 

This function assigns a unique number to each row according to how the data is sorted! In other words, it assigns a row number to each row.


Question 

Suppose you have a table named Employees with the following columns: EmployeeID, EmployeeName, and Salary. You want to retrieve a list of employees with their names, salaries, and a unique row number assigned to each employee based on their salary, in descending order of salary.


Solution Code 


SELECT

    EmployeeName,

    Salary,

    ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNumber

FROM

    Employees;


Solution Breakdown



Results


This is the simplest way to rank your dataset based on the column you pass in the ORDER BY clause. However, further down you’ll learn that there is one functionality that it lacks! 

Can you guess it? 


RANK() 

The RANK() function ranks all rows within a window. It assigns the same rank to rows with the same score/ value, i.e. rows that have tied. Furthermore, in case of ties, subsequent ranks are skipped (There is an example ahead clearly explaining this point). 


Question

Consider a table named orders with the following columns: 



Write an SQL query to find the top 3 customers who have placed the most orders. If there are ties, assign the same rank to tied customers. Your query should return the customer_id, the count of orders they've placed, and their rank.


Solution Code


WITH CustomerOrderCounts AS (


   SELECT

     customer_id,

     COUNT(*) AS order_count, MAX(order_date) AS latest_order_date


   FROM Orders

   GROUP BY customer_id


), RankedCustomers AS (


   SELECT

     customer_id, 

     order_count,

     latest_order_date,

     RANK () OVER (ORDER BY order_count DESC, latest_order_date DESC) AS rank


   FROM CustomerOrderCounts

)


SELECT

  customer_id,

  order_count, 

  latest_order_date,

  rank 


FROM RankedCustomers

WHERE rank <= 3



Solution Breakdown



Result



NOTE: An important feature of RANK() is that in cases of ties, the following rank(s) is/ are skipped. Suppose 3 rows share the 3rd place, the following row will start from the 6th rank, meaning 2 Ranks are skipped.


Visualize:  


NOTE:

If, N → No. of rows sharing the same place

Then, No. of ranks to skip → (N - 1) 


Ex: 4 rows are tied on 3rd place. 

No. of ranks to skip → 4-1 → 3 

∴ The next rank → 7th place


NOTE: The ROW_NUMBER() function can also be used for ranking as we saw above, however, it lacks the functionality of handling ties! 


DENSE_RANK() 

This function is completely similar to the RANK() function explained above. The only difference lies in the ranking procedure. In cases of ties, DENSE_RANK() doesn’t skip the following rank. 


YES, That’s it!! 


Referencing the above example, the 2nd rank will be assigned after the 1st. Thus, no ranks are skipped. 


Here's a task…. Solve the question in RANK() using DENSE_RANK()… 


HINT: just interchange RANK()



 Result:   



You may be thinking… “Sure sir, I’ve grasped the difference between the two rankings due to my competency… however, when is it to be used??”


In my opinion, it’s a good practice to use DENSE_RANK().


Value Window Functions

These functions return the values of the next/ preceding row while the current row is being processed. This may be the most important function, since generally while performing analysis we are required to calculate differences between the current row and the next/ preceding row. 


Let's learn by example! 


LEAD() 


This function returns the value of the next row for the current row being processed! 


Syntax


SELECT 

  LEAD(col_name, offset, default) OVER (user_defined_Window)


FROM table_name





NOTE: If the offset is set to 1, the next row will be chosen. If it's set to 2 the second row will be chosen and so on. 



Question

A table named Sales exists with the following columns: OrderDate, ProductID, and Revenue.


Create a query that returns the ProductID, OrderDate, and the revenue difference (RevenueDiff) between the current order and the next order for each product. If there is no next order for a product, treat the RevenueDiff as 0. 


Solution


WITH RankedSales AS (

    SELECT

        ProductID,

        OrderDate,

        Revenue,

        LEAD(OrderDate) OVER (PARTITION BY ProductID ORDER BY OrderDate) AS NextOrderDate,

        LEAD(Revenue) OVER (PARTITION BY ProductID ORDER BY OrderDate) AS NextRevenue


    FROM Sales

)


SELECT

    a.ProductID,

    a.OrderDate,

    CASE

        WHEN b.ProductID IS NOT NULL THEN b.NextRevenue - a.Revenue

        ELSE 0

    END AS RevenueDiff


FROM

    RankedSales a

LEFT JOIN

    RankedSales b ON a.ProductID = b.ProductID AND 

    a.OrderDate = b.NextOrderDate


ORDER BY

    a.ProductID, a.OrderDate;


Solution Breakdown





Result

The reason why the last row of every product_id group is 0 is that if it takes the value of the next row (i.e. from the next group), the Window function gets violated (PARTITION BY product_id) 


LAG()

This function returns the value of the previous row for the current row being processed! 


Syntax

SELECT 

  LAG(col_name, offset, default) OVER (user_defined_window)


FROM table_name




A task again…. Solve the question in LEAD() using LAG()… 


HINT: just interchange LAG()    


In this example too, the 1st row is 0 since it’s a violation to take a value from the previous group! 


Conclusion 

By now, I hope you have gained valuable insights into SQL Window Functions. Also, I hope you comprehended the fundamentals of these functions, including Aggregate, Ranking, and Value Window Functions. Besides, keep in mind the key distinctions between Window Functions and GroupBy, the syntax for their implementation, and their practical use cases. 


All in all, Windowing is a powerful tool for performing comparative analysis in SQL!! 

HAPPY CODING!!