Abdullah G. R Mastan

Jan 25

SQL Indexing and Performance Optimization

What if I gave you the 7 volumes of Harry Potter without an index and asked you to find the chapter ‘The Missing Mirror’?

You might get a concussion from this lunatic request! Why? Because you would have to scan through almost three and a half thousand pages!

Indexes in SQL are the digital twin of indexes in books! An index speeds up searches by letting the database consult a lookup structure (the index table) that records where each value lives. This makes fetching data much faster.

Without an SQL Index, the database engine has to read every row in the table (known as a Table Scan), and this gets slower as the table grows.

NOTE: Indexing comes with trade-offs, such as extra storage and slower writes, so it isn’t a universal solution for performance optimization.
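You can watch a table scan turn into an index lookup by asking a database for its query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, because SQL Server syntax and tooling differ. The products table, its contents and the index name idx_products_country are hypothetical.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO products (name, country) VALUES (?, ?)",
    [(f"item{i}", country) for i, country in enumerate(["USA", "India", "Japan"] * 1000)],
)

sql = "SELECT * FROM products WHERE country = ?"
before = query_plan(conn, sql, ("USA",))  # no index yet: every row must be read

conn.execute("CREATE INDEX idx_products_country ON products (country)")
after = query_plan(conn, sql, ("USA",))   # the planner can now look rows up via the index

print(before)
print(after)
```

On recent SQLite versions, the first plan reports a SCAN of products and the second a SEARCH using idx_products_country. The exact wording varies between versions. In SQL Server's execution plans, the same contrast appears as a Table Scan versus an Index Seek.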

SQL Index Architecture

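Indexes in SQL (for example, SQL Server's rowstore indexes) are stored in a B-tree, a balanced multi-level tree containing:

The Root Node (Top Most)
The Intermediate Levels
The Leaf Nodes (Data Pages)

Unlike a binary tree, where each node has at most two children, each node in a B-tree can point to many nodes at the next level. This keeps the tree shallow even for very large tables.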

The Root Node holds key ranges, and each range points to a node at the first Intermediate Level. Each Intermediate Level node splits its range further among nodes at the next level down. The nodes at the bottom level are the Leaf Nodes, which hold (or point to) the Data Pages.
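The root-to-leaf descent can be sketched in Python. This is a simplified, static model that rests on a few assumptions. The page capacity FANOUT = 4 is hypothetical, since a real 8 KB page holds hundreds of entries. Keys are unique Employee_ID-style integers, and row addresses are left out. Each parent stores the smallest key of each child, and search follows the one child whose range can contain the key.

```python
from bisect import bisect_right

FANOUT = 4  # hypothetical page capacity; real pages hold far more entries

def build_tree(sorted_keys):
    """Group sorted keys into leaf pages, then stack parent levels until one root remains."""
    level = [{"keys": sorted_keys[i:i + FANOUT]} for i in range(0, len(sorted_keys), FANOUT)]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), FANOUT):
            children = level[i:i + FANOUT]
            # A parent keeps the smallest key of each child as a separator.
            parents.append({"keys": [c["keys"][0] for c in children], "children": children})
        level = parents
    return level[0]

def search(root, key):
    """Descend from the root to one leaf; return (found, pages_visited)."""
    node, visited = root, 1
    while "children" in node:
        # Follow the last child whose smallest key is <= the search key.
        idx = max(bisect_right(node["keys"], key) - 1, 0)
        node = node["children"][idx]
        visited += 1
    return key in node["keys"], visited

root = build_tree(list(range(1, 101)))  # hypothetical Employee_IDs 1..100
print(search(root, 73))   # (True, 4)
print(search(root, 500))  # (False, 4)
```

With 100 keys and four entries per page, the tree has 25 leaf pages and four levels in total. Every lookup, hit or miss, therefore touches just 4 pages instead of reading all 25 leaves.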

So… what are data pages?

Data Pages in SQL Index

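In SQL Server, data is stored in pages: 8 KB blocks inside the database files. Each page of an index stores index rows, and each index row holds a value from the indexed column (copied from the actual table) and the corresponding row address. These rows are kept sorted, which means alphabetically for a text column.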

Let's take a look at an example where an Index has been created on the column Country. The rows in the index have been sorted alphabetically:

NOTE: The number of rows each page can store depends on the size of each row.
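To make the NOTE concrete, here is a rough estimate of rows per page. This minimal sketch assumes SQL Server's page layout. An 8,192-byte page has a 96-byte header, which leaves 8,096 bytes for rows, and each row also needs a 2-byte entry in the page's row-offset array. The 20-byte and 100-byte row sizes are hypothetical, and real rows carry extra per-row overhead that this ignores.

```python
PAGE_BYTES_FOR_ROWS = 8096  # 8,192-byte page minus the 96-byte page header
SLOT_ENTRY_BYTES = 2        # each row also needs a 2-byte row-offset entry

def rows_per_page(row_size_bytes):
    """Approximate how many rows of a given size fit on one page."""
    return PAGE_BYTES_FOR_ROWS // (row_size_bytes + SLOT_ENTRY_BYTES)

def pages_needed(total_rows, row_size_bytes):
    """Ceiling division: a partly filled last page still counts as a page."""
    per_page = rows_per_page(row_size_bytes)
    return -(-total_rows // per_page)

print(rows_per_page(20))               # 368
print(rows_per_page(100))              # 79
print(pages_needed(1_000_000, 20))     # 2718
```

Smaller index rows mean more rows per page, and therefore fewer pages to read for the same number of rows.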

Suppose we want to fetch all rows whose Country is USA. The following query would be used:

SELECT *
FROM products
WHERE Country = 'USA';

How the SQL Index Works:

Suppose the index holds 1M rows spread across many data pages, and the USA entries begin at around the half-million mark. Then:

SQL navigates the index (root, then intermediate levels, then leaf) to the first index row whose Country is USA.
It reads that row's address.
It jumps directly to the corresponding row in the main table using the row address.
It moves to the next index row and repeats until a country other than USA appears. Because the index is sorted, this means all rows with the country ‘USA’ have been fetched, so the index scan comes to a halt.
Lastly, it produces the output.

In an indexless table, the entire table would have been scanned, since the engine is unaware of the position of the rows having Country USA.
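The steps above can be sketched in Python. This is a simplified model, not how any real engine is written. A sorted list of (country, row_address) pairs stands in for the index's leaf pages, a binary search (bisect_left) stands in for the B-tree descent, and the table contents are hypothetical.

```python
from bisect import bisect_left

# Hypothetical base table: row address -> (product, country), in no particular order.
table = {
    0: ("Laptop", "USA"), 1: ("Tea", "India"), 2: ("Camera", "Japan"),
    3: ("Phone", "USA"), 4: ("Rice", "India"), 5: ("Jeans", "USA"),
}

# The index: (country, row_address) pairs kept sorted by country.
index = sorted((country, addr) for addr, (_, country) in table.items())

def index_lookup(country):
    """Seek to the first matching index row, then walk forward until the key changes."""
    results, fetched = [], 0
    pos = bisect_left(index, (country,))  # binary search stands in for the B-tree descent
    while pos < len(index) and index[pos][0] == country:
        addr = index[pos][1]
        results.append(table[addr])       # jump straight to the row via its address
        fetched += 1
        pos += 1
    return results, fetched

def table_scan(country):
    """Without an index, every row must be read and tested."""
    return [row for row in table.values() if row[1] == country], len(table)

rows, fetched = index_lookup("USA")
print(rows)                           # [('Laptop', 'USA'), ('Phone', 'USA'), ('Jeans', 'USA')]
print(fetched, table_scan("USA")[1])  # 3 6
```

Both functions return the same rows. The index path fetches only the 3 matching rows, while the scan reads all 6, and that gap widens as the table grows.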

Types of Indexes in SQL

There is a range of SQL indexing techniques, each suited to specific use cases.

While each technique has its pros and cons, they all share a common goal: boosting performance. Below are the types of Index in SQL.

Let's dig deeper into their use cases, pros and cons.

Understanding these subtleties is key to making informed decisions on which type of index to use in a given scenario, as it can make or mar the performance of your query!

Clustered Index in SQL

If you’ve been using SQL for a while and have recently decided to delve into Indexes, let me tell you that you’ve most likely been using a Clustered Index all along!

Yes, in SQL Server a Clustered Index is implicitly created when a Primary Key constraint is added to a table’s schema (unless the table already has one).

As an analogy… the index section of a book is separate from its contents, right? That is, a few pages are dedicated exclusively to it.

However, a Dictionary has no pages dedicated to an index. The words themselves are ordered alphabetically, and the first letter printed at the top of each page plays the role of the index! To find a word, you jump to the right letter and start searching.

Similarly, with a Clustered Index, the table’s rows are themselves stored in Primary Key order, so the Primary Key functions as the index! In the example below, the SQL engine can locate an Employee by their Employee_ID.

This works because each row has a Primary Key value that identifies the entire row. A Primary Key must also be unique, so no two employees can have the same ID.

NOTE: In SQL Server, a Clustered Index is automatically created when the Primary Key constraint is added to the schema, provided the table does not already have one.

Clustered Index Syntax:

CREATE CLUSTERED INDEX name_your_index
ON table_name (column_name [ASC | DESC]);

IMPORTANT: Only one Clustered Index is allowed per table, because the table’s rows can be physically ordered in only one way. However, a ‘Composite Clustered Index’ can be formed using multiple columns:

CREATE CLUSTERED INDEX name_your_index
ON table_name (column_name1 [ASC | DESC], column_name2 [ASC | DESC], ...);
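Here is a minimal Python sketch of the ordering a composite clustered index imposes. It assumes a hypothetical Employees table clustered on (Department ASC, Employee_ID DESC). Rows are stored sorted by the first column and then by the second, so all rows for one department sit next to each other. A lookup can therefore jump to the start of that range and read it in one contiguous pass.

```python
from bisect import bisect_left, bisect_right

# Hypothetical Employees rows: (Department, Employee_ID, Name), in arrival order.
employees = [
    ("Sales", 7, "Asha"),
    ("HR", 3, "Ravi"),
    ("Sales", 2, "Meera"),
    ("HR", 9, "John"),
]

def clustered_key(row):
    """Sort key for a composite index on (Department ASC, Employee_ID DESC)."""
    department, employee_id, _ = row
    return (department, -employee_id)  # negating the ID gives descending order

# A clustered index stores the table's rows themselves in key order.
clustered_rows = sorted(employees, key=clustered_key)
keys = [clustered_key(row) for row in clustered_rows]

def rows_for_department(department):
    """All rows of one department are contiguous, so one range read finds them."""
    start = bisect_left(keys, (department, float("-inf")))
    end = bisect_right(keys, (department, float("inf")))
    return clustered_rows[start:end]

print(clustered_rows)
print(rows_for_department("Sales"))  # [('Sales', 7, 'Asha'), ('Sales', 2, 'Meera')]
```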

Example:

Suppose your web application allows users to fetch all of their personal information.

Here's a SQL query that achieves it:

CODE:

SELECT *
FROM web_database
WHERE user_id = 22;

NOTE: The presence of the WHERE clause here is crucial. Without it (or a join or ORDER BY that can make use of the index), the engine has no value to look up, so it simply reads every row. If you opened a book with no intention of finding anything, you wouldn’t use the index… right? The ‘WHERE Clause’ is the ‘something’ you want to find.
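To see the WHERE clause make the difference, here is a sketch using Python's built-in sqlite3 module. SQLite has no CLUSTERED keyword. The sketch therefore assumes that a WITHOUT ROWID table is a fair stand-in for SQL Server's clustered index, since SQLite stores such a table's rows in a B-tree ordered by the primary key. The name and email columns and the generated rows are hypothetical.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN step."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: the rows themselves live in a B-tree keyed on the primary key,
# the closest SQLite analogue of a clustered index.
conn.execute(
    """
    CREATE TABLE web_database (
        user_id INTEGER PRIMARY KEY,
        name    TEXT,
        email   TEXT
    ) WITHOUT ROWID
    """
)
conn.executemany(
    "INSERT INTO web_database VALUES (?, ?, ?)",
    [(i, f"user{i}", f"user{i}@example.com") for i in range(1, 101)],
)

with_where = query_plan(conn, "SELECT * FROM web_database WHERE user_id = 22")
without_where = query_plan(conn, "SELECT * FROM web_database")

print(with_where)     # a SEARCH step that uses the PRIMARY KEY
print(without_where)  # a SCAN step: every row is read
print(conn.execute("SELECT * FROM web_database WHERE user_id = 22").fetchone())
```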

Clustered Indexes are a great way to fetch individual records by their primary key. They also consume relatively little extra space compared with their counterparts, because the table’s rows themselves form the index’s leaf level.

Non-Clustered Index in SQL

A Non-Clustered Index simply means that the actual data and the index are not clustered together, i.e., they are kept separate. A clustered index is different: there, the table’s rows themselves are stored in index order.

Similar to any book, right?

The first few pages are dedicated to the index, while the content resides on subsequent pages.

Let's first look at the syntax and then dig into the intricacies.

Non-Clustered Index Syntax:

CREATE NONCLUSTERED INDEX name_your_index
ON table_name (column_name [ASC | DESC]);

Once the index has been created, the values of the column(s) chosen for the index are sorted in the specified order (ascending by default) and stored in a structure separate from the table.

NOTE: Because the index is stored separately, it requires additional storage, and every INSERT, UPDATE or DELETE must also keep it up to date. Thus, one should refrain from creating excessive indexes. A Clustered Index, by contrast, needs little extra storage, since its leaf level is the table data itself.

Each row in the index contains a reference (the row address) to the corresponding row in the actual table. Consider the table below, where the attendance of each employee is recorded.

Below is the index table created when an index is applied to the column Employee_ID.

NOTE: The data in the actual table is unsorted, while the data stored in the index table is sorted numerically.
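The arrangement above can be modelled in a few lines of Python. This is a hypothetical sketch, not a database API. HeapWithIndex keeps the table rows in arrival order and maintains a separate sorted list of (Employee_ID, row_position) pairs. The attendance columns (Day, Status) are made up, because the original table is not reproduced here. Note how every insert has to update the index too.

```python
from bisect import bisect_left, insort

class HeapWithIndex:
    """Hypothetical model: an unsorted table plus a separate, sorted non-clustered index."""

    def __init__(self, key_column):
        self.key_column = key_column
        self.rows = []   # the actual table, in arrival order (unsorted)
        self.index = []  # the index: sorted (key, row_position) pairs

    def insert(self, row):
        position = len(self.rows)
        self.rows.append(row)
        # Every write must also update the separate index: the write cost of indexing.
        insort(self.index, (row[self.key_column], position))

    def find(self, key):
        """Binary-search the index, then follow each row position into the table."""
        i = bisect_left(self.index, (key,))
        matches = []
        while i < len(self.index) and self.index[i][0] == key:
            matches.append(self.rows[self.index[i][1]])
            i += 1
        return matches

attendance = HeapWithIndex("Employee_ID")
for row in [
    {"Employee_ID": 103, "Day": "Mon", "Status": "Present"},
    {"Employee_ID": 101, "Day": "Mon", "Status": "Absent"},
    {"Employee_ID": 102, "Day": "Mon", "Status": "Present"},
    {"Employee_ID": 101, "Day": "Tue", "Status": "Present"},
]:
    attendance.insert(row)

print([row["Employee_ID"] for row in attendance.rows])  # [103, 101, 102, 101]
print([key for key, _ in attendance.index])             # [101, 101, 102, 103]
print([row["Day"] for row in attendance.find(101)])     # ['Mon', 'Tue']
```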

Example:

Suppose you have a database with a table named “products” that contains information about various products available for sale. You want to retrieve a list of products that are currently in stock (quantity available is greater than 0) and priced at Rs. 50 or less.

CODE:

SELECT *
FROM products
WHERE in_stock > 0
  AND cost_per_unit <= 50;

IMPORTANT: Consider indexing frequently queried columns, especially those used in WHERE, JOIN and GROUP BY clauses, while keeping the storage and write costs in mind.

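Execution plan of the above code (assuming a non-clustered index on in_stock):

The row addresses of all products with in_stock greater than 0 are fetched from the in_stock index.

Each of those row addresses is then used to look up the full row in the actual table.

Finally, the remaining condition (cost_per_unit <= 50) is checked on those rows, and the rows that pass are returned. (If most products are in stock, the optimizer may decide that scanning the whole table is cheaper and skip the index.)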
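The three steps above can be mimicked in Python. This is a toy model with hypothetical product rows. A sorted list of (in_stock, row_address) pairs plays the non-clustered index, and bisect_right skips every entry with in_stock of 0 or less. The price check is applied only after each row has been fetched.

```python
from bisect import bisect_right

# Hypothetical products table: row address -> row, stored in no particular order.
products = {
    0: {"name": "Pen",      "in_stock": 120, "cost_per_unit": 10},
    1: {"name": "Notebook", "in_stock": 0,   "cost_per_unit": 45},
    2: {"name": "Stapler",  "in_stock": 15,  "cost_per_unit": 180},
    3: {"name": "Eraser",   "in_stock": 60,  "cost_per_unit": 5},
    4: {"name": "Marker",   "in_stock": 0,   "cost_per_unit": 30},
    5: {"name": "Ruler",    "in_stock": 8,   "cost_per_unit": 50},
}

# Non-clustered index on in_stock: sorted (in_stock, row_address) pairs.
in_stock_index = sorted((row["in_stock"], addr) for addr, row in products.items())

def in_stock_and_cheap(max_cost=50):
    # Step 1: skip every index entry with in_stock <= 0; the rest satisfy in_stock > 0.
    start = bisect_right(in_stock_index, (0, float("inf")))
    result = []
    for _, addr in in_stock_index[start:]:
        row = products[addr]                  # Step 2: follow the row address into the table
        if row["cost_per_unit"] <= max_cost:  # Step 3: check the remaining condition
            result.append(row["name"])
    return result

print(in_stock_and_cheap())  # ['Ruler', 'Eraser', 'Pen']
```

The results come back in index (in_stock) order rather than table order. That is why SQL makes no ordering promise without an ORDER BY.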

To conclude, think of a Non-Clustered Index in a database like the index of a book – it keeps things organized. When you're looking for specific info, the index guides you to the right pages. Similarly, in a database, this index sorts data for quick retrieval, making searches faster and more efficient, just like finding a topic in a well-organized book.

Columnstore Index in SQL

Columnstore indexes are an excellent way to boost your Analytical operations, which heavily depend on efficient read performance.

This should mean that the Columnstore Index boosts the reading capabilities of a database, right?

Exactly!

When a columnstore index is applied to a column, the entire column’s values are stored together, separately from the rest of each row. However, Non-Clustered Indexes also store the indexed column separately!

Wait… then how is it different from a Non-Clustered Index?

It differs in how data is stored and how the query is executed!

Visualize the difference in how data is stored in both.

(a) Non-Clustered Index (b) ColumnStore Index

In the above example, the Columnstore Index simply stores the entire column, without sorting it.
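To see why the storage layout matters for analytics, here is a toy Python model. It uses hypothetical shipment data, and real columnstores also compress and batch their column segments. The model compares how many values a rowstore and a columnstore must read to total a single column.

```python
# Hypothetical shipments data, stored row by row (a rowstore).
rows = [
    (1, "Mumbai", 12.5),
    (2, "Delhi", 3.0),
    (3, "Mumbai", 7.25),
    (4, "Pune", 4.0),
]
column_names = ("shipment_id", "city", "weight_kg")

# Columnstore layout: the same data regrouped into one list per column.
columns = {name: [row[i] for row in rows] for i, name in enumerate(column_names)}

def total_weight_rowstore(rows):
    """Every full row is read, even though only weight_kg is needed."""
    values_read, total = 0, 0.0
    for row in rows:
        values_read += len(row)
        total += row[2]
    return total, values_read

def total_weight_columnstore(columns):
    """Only the weight_kg column is read."""
    weights = columns["weight_kg"]
    return sum(weights), len(weights)

print(total_weight_rowstore(rows))        # (26.75, 12)
print(total_weight_columnstore(columns))  # (26.75, 4)
```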

One of the most important advantages of this index is that it improves not only read efficiency but also storage efficiency! Let's see how…

Suppose you have a database that stores information about shipments for a logistics company and you want to retrieve a list of all shipments that are destined for a specific city.

Naturally, the city column will contain many duplicates.

When a Columnstore Index is created on the city column, the entire column is copied into column-oriented storage. Each unique city can then be assigned a small integer key, and that key replaces every duplicate occurrence of the city name!
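The replacement described above is commonly called dictionary encoding. The sketch below shows only that idea, not SQL Server's actual storage format. It runs on a hypothetical column of destination cities, and the size comparison assumes 1-byte keys, which are enough for up to 256 distinct cities.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer key (dictionary encoding)."""
    dictionary = {}  # value -> integer key, assigned in order of first appearance
    encoded = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        encoded.append(dictionary[value])
    return dictionary, encoded

def dictionary_decode(dictionary, encoded):
    """Rebuild the original column from the dictionary and the key list."""
    reverse = {key: value for value, key in dictionary.items()}
    return [reverse[key] for key in encoded]

# Hypothetical destination-city column from the shipments table.
cities = ["Mumbai", "Delhi", "Mumbai", "Pune", "Delhi", "Mumbai"]
dictionary, encoded = dictionary_encode(cities)

# Rough size comparison, assuming 1 byte per key.
raw_bytes = sum(len(city.encode("utf-8")) for city in cities)
encoded_bytes = sum(len(city.encode("utf-8")) for city in dictionary) + len(encoded)

print(dictionary)                # {'Mumbai': 0, 'Delhi': 1, 'Pune': 2}
print(encoded)                   # [0, 1, 0, 2, 1, 0]
print(raw_bytes, encoded_bytes)  # 32 21
```

Each city name is stored once, and every repeat costs a single byte. The savings therefore grow with the number of rows, which is why columns with many duplicates compress so well.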

(a) Uncompressed Columnstore (b) Compressed Columnstore

This increases the scope for data compression! Suppose there are 10 million records that occupy a certain number of data pages. After compression (i.e. replacing each city name with a less space-consuming data type, such as a small integer key), the same data might fit into a third of those pages!

All stars and rainbows… right?

Not so fast! This isn’t an absolute solution that can be applied in every scenario!

The Columnstore index should be avoided on highly transactional data (meaning tables whose rows are frequently updated or deleted), since it heavily impacts write performance.

Check this question asked on Stack Overflow, where the user reports that a write operation on a table with a Columnstore Index takes a staggering 2 hours, compared with only 40 minutes on a normal table.

However, the same user notes that without the index the data consumes 4-8 TB of disk space, whereas with the index it shrinks to a mere 500 GB!

Filtered Index in SQL

Filtered Indexes are a way to store only the most relevant rows of a column. Suppose a frequently run query reads data from a column named ProductName, but it only needs the rows whose StockQty is greater than 50.

In this case, we can create an index that stores only the products with StockQty greater than 50.

Filtered Index Syntax:

CREATE NONCLUSTERED INDEX idx_products
ON Products (ProductName, StockQty)
WHERE StockQty > 50;

In short, it is a non-clustered index created using a WHERE clause.

The above statement creates the following index:

NOTE: A Filtered Index is worth using when a frequently run query always applies the same comparison filter, as shown above.
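A filtered index contains entries only for rows that satisfy its WHERE condition. It can only answer queries whose filter guarantees that every needed row is among those entries. The Python class below is a hypothetical model of that behaviour, built over made-up product rows. FilteredIndex, covers and names_with_stock_above are illustrative names, not a database API.

```python
class FilteredIndex:
    """Hypothetical model of an index on (ProductName, StockQty) WHERE StockQty > min_stock."""

    def __init__(self, table, min_stock):
        self.min_stock = min_stock
        # Only rows that pass the filter get an entry, sorted by ProductName then StockQty.
        self.entries = sorted(
            (row["ProductName"], row["StockQty"])
            for row in table
            if row["StockQty"] > min_stock
        )

    def covers(self, query_min_stock):
        """'StockQty > q' can be answered from the index only if q >= the index's threshold."""
        return query_min_stock >= self.min_stock

    def names_with_stock_above(self, query_min_stock):
        if not self.covers(query_min_stock):
            # Some qualifying rows were never indexed, so a real engine would read the table.
            raise ValueError("filtered index does not cover this query")
        return [name for name, qty in self.entries if qty > query_min_stock]

products = [
    {"ProductName": "Pen", "StockQty": 500},
    {"ProductName": "Stapler", "StockQty": 12},
    {"ProductName": "Eraser", "StockQty": 75},
    {"ProductName": "Ruler", "StockQty": 50},
    {"ProductName": "Marker", "StockQty": 230},
]

idx_products = FilteredIndex(products, min_stock=50)
print(len(idx_products.entries), "of", len(products), "rows indexed")  # 3 of 5 rows indexed
print(idx_products.names_with_stock_above(50))  # ['Eraser', 'Marker', 'Pen']
print(idx_products.covers(10))                  # False
```

The index is smaller than a full index on ProductName, and a query for StockQty > 100 can still use it. A query for StockQty > 10 cannot, because rows such as Stapler were never indexed.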

In summary, Filtered Indexes in databases selectively store relevant data based on specified conditions, optimizing performance for frequently used queries with specific filtering requirements.

Conclusion

SQL indexing is the key to unlocking the full potential of your database. Whether you're a seasoned SQL user or just getting started, understanding the various index types and their implications is crucial for optimizing performance. The clustered index, non-clustered index, columnstore index, and filtered index each have their unique strengths and applications, making them valuable tools in your database optimization arsenal.

However, it's important to remember that there are trade-offs to consider when implementing indexing strategies. While they can dramatically improve query performance, they can also impact write operations. Therefore, it's essential to choose the right indexing technique that aligns with your database's specific requirements.

Happy Indexing!!
