How Does Database Indexing Work

Database indexing works by creating additional data structures called indexes that index specific fields within database tables. Indexes allow for faster searching of data by supporting operations like binary search instead of linear search. Indexes store only the field being indexed along with a pointer to the full database record. This reduces the size of the data that needs to be searched, improving performance of queries that filter on the indexed field. The most common type of index is the B-tree index, which supports fast lookups, inserts and deletes through logarithmic time complexity.

Uploaded by

Santosh Kumar

How does database indexing work?

Why is it needed?

When data is stored on disk-based storage devices, it is stored as blocks of data. These blocks are accessed in
their entirety, making them the atomic disk access operation. Disk blocks are structured in much the same way
as linked lists: both contain a section for data and a pointer to the location of the next node (or block), and
neither needs to be stored contiguously.

Because a set of records can be physically sorted on only one field, searching on a field that isn’t sorted
requires a linear search, which takes N/2 block accesses on average, where N is the number of blocks that the
table spans. If that field is a non-key field (i.e. it doesn’t contain unique entries), then the entire table space
must be searched, at N block accesses.

With a sorted field, by contrast, a binary search may be used, which takes about log2 N block accesses. And
because the data is sorted, a search on a non-key field can stop as soon as a higher value is found, so the rest
of the table doesn’t need to be searched for duplicate values. The performance increase is therefore substantial.
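
The block-access counts above can be sketched in a few lines of Python. This is a simplified cost model rather than a measurement: the function names are hypothetical, and every block read is assumed to cost one access.

```python
import math


def linear_search_accesses(n_blocks: int, key_field: bool) -> float:
    """Expected block reads for a linear search over n_blocks blocks.

    On a key (unique) field the search stops at the match, N/2 on average.
    On a non-key field every block must be read to catch duplicates.
    """
    return n_blocks / 2 if key_field else float(n_blocks)


def binary_search_accesses(n_blocks: int) -> int:
    """Approximate block reads for a binary search: log2(N), rounded up."""
    return math.ceil(math.log2(n_blocks))


n = 1024  # an illustrative table spanning 1,024 blocks
print(linear_search_accesses(n, key_field=True))
print(linear_search_accesses(n, key_field=False))
print(binary_search_accesses(n))
```

Doubling the table size doubles the linear-search cost but adds only one more access to the binary search.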

What is indexing?

Indexing is a way of sorting a number of records on multiple fields. Creating an index on a field in a table
creates another data structure which holds the field value and a pointer to the record it relates to. This index
structure is then sorted, allowing binary searches to be performed on it.
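
As a minimal sketch of that idea, the Python below builds an index as a sorted list of (field value, record pointer) pairs and binary-searches it. The sample rows, the use of list positions as stand-ins for disk addresses, and the names `first_name_index` and `lookup` are all assumptions made for illustration.

```python
import bisect

# A tiny "table": the list position stands in for the record's disk address.
table = [
    (1, "Dana", "Lee"),
    (2, "Alex", "Kim"),
    (3, "Casey", "Roy"),
    (4, "Alex", "Moss"),
]

# The index holds only (field value, record pointer) pairs, kept sorted.
first_name_index = sorted((row[1], pos) for pos, row in enumerate(table))


def lookup(index, value):
    """Binary-search the index and return every record pointer for value."""
    pointers = []
    # (value,) sorts before any (value, pointer) pair, so this finds the
    # first entry for value if one exists.
    i = bisect.bisect_left(index, (value,))
    while i < len(index) and index[i][0] == value:
        pointers.append(index[i][1])
        i += 1
    return pointers


matches = [table[pos] for pos in lookup(first_name_index, "Alex")]
print(matches)
```

The table itself stays in its original order; only the much smaller index is sorted.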

The downside to indexing is that indexes require additional space on disk. With the MyISAM engine, a table’s
indexes are stored together in one file, so this file can quickly reach the size limits of the underlying file
system if many fields within the same table are indexed.

How does it work?

Firstly, let’s outline a sample database table schema:

Field name          Data type       Size on disk

id (Primary key)    Unsigned INT    4 bytes
firstName           Char(50)        50 bytes
lastName            Char(50)        50 bytes
emailAddress        Char(100)       100 bytes

Note: char was used in place of varchar to allow for an accurate size on disk value. This sample database
contains five million rows, and is unindexed. The performance of two queries will now be analyzed: one
using the id (a sorted key field) and one using the firstName (a non-key unsorted field).

Example 1 - sorted vs unsorted fields

Our sample database holds r = 5,000,000 records of a fixed size, giving a record length of R = 204 bytes.
They are stored in a table using the MyISAM engine, which uses the default block size B = 1,024 bytes.
The blocking factor of the table is bfr = (B/R) = 1024/204 = 5 records per disk block (rounded down). The
total number of blocks required to hold the table is N = (r/bfr) = 5000000/5 = 1,000,000 blocks.

A linear search on the id field would require an average of N/2 = 500,000 block accesses to find a value, given
that the id field is a key field. But since the id field is also sorted, a binary search can be conducted, requiring
an average of log2 1000000 = 19.93, rounded up to 20 block accesses. Instantly we can see this is a drastic
improvement.

Now the firstName field is neither sorted nor a key field, so a binary search is impossible, and because its
values are not unique, the table must be searched to the end, for exactly N = 1,000,000 block accesses. It is
this situation that indexing aims to correct.
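
The arithmetic of Example 1 can be reproduced with the short Python sketch below. The variable names follow the symbols in the text (r, R, B, bfr, N); the rounding choices (whole records per block, logarithm rounded up) are assumptions that match the figures given.

```python
import math

r = 5_000_000          # records in the table
R = 4 + 50 + 50 + 100  # record length in bytes: id, firstName, lastName, emailAddress
B = 1024               # block size in bytes

bfr = B // R            # blocking factor: whole records per block
N = math.ceil(r / bfr)  # blocks needed to hold the table

linear_key = N // 2                      # average linear search on a key field
binary_sorted = math.ceil(math.log2(N))  # binary search on the sorted id field
linear_non_key = N                       # unsorted, non-key firstName: read every block

print(R, bfr, N)
print(linear_key, binary_sorted, linear_non_key)
```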

Given that an index record contains only the indexed field and a pointer to the original record, it stands to
reason that it will be smaller than the multi-field record that it points to. So the index itself requires fewer disk
blocks than the original table, which therefore requires fewer block accesses to iterate through. The schema for
an index on the firstName field is outlined below:

Field name          Data type       Size on disk

firstName           Char(50)        50 bytes
(record pointer)    Special         4 bytes

Note: Pointers in MySQL are 2, 3, 4 or 5 bytes in length depending on the size of the table.

Example 2 - indexing

Our sample database holds r = 5,000,000 records, with an index record length of R = 54 bytes and the
default block size B = 1,024 bytes. The blocking factor of the index is bfr = (B/R) = 1024/54 = 18 records
per disk block (rounded down). The total number of blocks required to hold the index is N = (r/bfr) =
5000000/18 = 277,778 blocks (rounded up).

Now a search using the firstName field can utilise the index to increase performance. This allows for a binary
search of the index with an average of log2 277778 = 18.08, rounded up to 19 block accesses, to find the
address of the actual record. Reading that record requires a further block access, bringing the total to
19 + 1 = 20 block accesses, a far cry from the 1,000,000 block accesses required to find a firstName match in
the non-indexed table.
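
The same calculation for the index in Example 2, again as a hedged Python sketch. The names `R_index`, `N_table` and `overhead` are introduced here for illustration; `N_table` is the 1,000,000-block table size from Example 1.

```python
import math

r = 5_000_000     # index entries, one per record
R_index = 50 + 4  # firstName (50 bytes) plus record pointer (4 bytes)
B = 1024          # block size in bytes
N_table = 1_000_000  # table size in blocks, taken from Example 1

bfr_index = B // R_index            # index entries per block
N_index = math.ceil(r / bfr_index)  # blocks needed to hold the index

index_search = math.ceil(math.log2(N_index))  # binary search of the index
total_accesses = index_search + 1             # plus one read for the record itself

overhead = N_index / N_table  # extra disk space relative to the table

print(bfr_index, N_index, index_search, total_accesses)
print(f"{overhead:.0%}")
```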

When should it be used?

Given that creating an index requires additional disk space (277,778 blocks extra from the above example, a
~28% increase), and that too many indexes can cause issues arising from the file system’s size limits, careful
thought must be given to selecting the correct fields to index.

Since indexes are only used to speed up the searching for a matching field within the records, it stands to reason
that indexing fields used only for output would be simply a waste of disk space and of processing time when
doing an insert or delete operation, and thus should be avoided. Also, given the nature of a binary search, the
cardinality or uniqueness of the data is important. In the five-million-row sample table, indexing on a field
with a cardinality of 2 would split the data in half, whereas a cardinality of 1,000 would return approximately
5,000 records per value. With a very low cardinality the effectiveness is reduced to that of a linear search, and
the query optimizer is likely to avoid using the index when a lookup would match a large fraction of the rows
(a threshold of about 30% is commonly cited), effectively making the index a waste of space.

Now, let’s say that we want to run a query to find all the details of any employees who are named ‘Abc’:

SELECT * FROM Employee WHERE Employee_Name = 'Abc'

What would happen without an index?

The database software would literally have to look at every single row in the Employee table to see if the
Employee_Name for that row is ‘Abc’. And because we want every row with the name ‘Abc’ in it, we cannot
stop looking once we find one such row, because there could be other rows with the same name. So every row
up to the last must be searched, which means thousands of rows in this scenario will have to be examined by
the database to find the rows with the name ‘Abc’. This is what is called a full table scan.

How a database index can help performance

The whole point of having an index is to speed up search queries by essentially cutting down the number of
records/rows in a table that need to be examined. An index is a data structure (most commonly a B-tree) that
stores the values for a specific column in a table.

How does a B-tree index work?

B-trees are the most popular data structure for indexes because they are time efficient: look-ups, deletions,
and insertions can all be done in logarithmic time. Another major reason B-trees are commonly used is that
the data stored inside a B-tree is kept sorted. The RDBMS typically determines which data structure is
actually used for an index, but with certain RDBMSs you can specify which data structure you want the
database to use when you create the index.

How does a hash table index work?

Hash indexes are used because hash tables are extremely efficient when it comes to just looking up values.
So queries that compare a column for equality with a string can retrieve values very fast if they use a hash
index.

For instance, the query we discussed earlier could benefit from a hash index created on the Employee_Name
column. The way a hash index would work is that the column value is the key into the hash table, and the
value mapped to that key is just a pointer to the row data in the table. Since a hash table is basically an
associative array, a typical entry would look something like “Abc => 0x28939”, where 0x28939 is a reference
to the table row where Abc is stored in memory. Looking up a value like “Abc” in a hash table index and
getting back a reference to the row in memory is obviously a lot faster than scanning the table to find all the
rows with a value of “Abc” in the Employee_Name column.
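
A hash index of this kind can be sketched with a Python dictionary. The sample rows and ages are made up for illustration, list positions stand in for row pointers such as 0x28939, and `name_hash_index` and `find_by_name` are hypothetical names.

```python
from collections import defaultdict

# Illustrative rows; the list position plays the role of the row pointer.
employees = [
    {"Employee_Name": "Abc", "Employee_Age": 34},
    {"Employee_Name": "Xyz", "Employee_Age": 51},
    {"Employee_Name": "Abc", "Employee_Age": 27},
]

# Key: the column value. Value: pointers to every row holding that value.
name_hash_index = defaultdict(list)
for position, row in enumerate(employees):
    name_hash_index[row["Employee_Name"]].append(position)


def find_by_name(name):
    """Equality lookup: one hash probe, then follow the row pointers."""
    return [employees[p] for p in name_hash_index.get(name, [])]


print(find_by_name("Abc"))
```

Because the column is not unique, each key maps to a list of pointers rather than a single one.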

The disadvantages of a hash index

Hash tables are not sorted data structures, and there are many types of queries that hash indexes cannot
help with. For instance, suppose you want to find all of the employees who are less than 40 years old. How
could you do that with a hash table index? It’s not possible, because a hash table is only good for looking
up key-value pairs, which means queries that check for equality.
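
The contrast can be illustrated in Python, under the assumption that a sorted list stands in for the ordered keys of a B-tree. The ages are invented sample data and both function names are hypothetical.

```python
import bisect

ages = [34, 51, 27, 45, 39, 62]  # illustrative Employee_Age values, by row position

# Sorted index of (age, row position) pairs, ordered the way B-tree keys are.
age_index = sorted((age, pos) for pos, age in enumerate(ages))


def rows_younger_than(limit):
    """Range query on the sorted index: one binary search, then a slice."""
    end = bisect.bisect_left(age_index, (limit,))
    return [pos for _, pos in age_index[:end]]


# A hash index has no ordering, so the same query has to test every key.
age_hash = {}
for pos, age in enumerate(ages):
    age_hash.setdefault(age, []).append(pos)


def rows_younger_than_hash(limit):
    """Same result, but only by visiting every key in the hash table."""
    return sorted(p for age, ps in age_hash.items() if age < limit for p in ps)


print(sorted(rows_younger_than(40)))
print(rows_younger_than_hash(40))
```

Both functions return the same rows; the difference is that the hash version degenerates into a scan of all keys.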

What exactly is inside a database index?

So, now you know that a database index is created on a column in a table, and that the index stores the
values in that specific column. But it is important to understand that a database index does not store the
values in the other columns of the same table. For example, if we create an index on the Employee_Name
column, the Employee_Age and Employee_Address column values are not also stored in the index. If we did
store all the other columns in the index, it would be just like creating another copy of the entire table, which
would take up far too much space and would be very inefficient.

How does a database know when to use an index?

When a query like “SELECT * FROM Employee WHERE Employee_Name = 'Abc'” is run, the database will
check to see if there is an index on the column(s) being queried. Assuming the Employee_Name column does
have an index created on it, the database will have to decide whether it actually makes sense to use the index
to find the values being searched, because there are some scenarios where it is actually less efficient to use
the database index, and more efficient just to scan the entire table.

What is the cost of having a database index?

It takes up space, and the larger your table, the larger your index. Another performance hit with indexes is
that whenever you add, delete, or update rows in the corresponding table, the same operations have to be
done to your index. Remember that an index needs to contain the same up-to-the-minute data as whatever is
in the table column(s) that the index covers.
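
The write-side cost can be sketched as follows: every change to the table is paired with a matching change to the index. This is a toy model with hypothetical names (`insert_row`, `delete_row`), a sorted list standing in for the index, and `None` marking a deleted row; a real engine does considerably more work.

```python
import bisect

table = []       # rows, addressed by list position
name_index = []  # sorted (Employee_Name, row position) pairs


def insert_row(name, age):
    """Append the row, then make the matching insertion into the index."""
    table.append((name, age))
    bisect.insort(name_index, (name, len(table) - 1))


def delete_row(position):
    """Remove the row's index entry, then mark the row itself as deleted."""
    if table[position] is None:
        return  # already deleted
    name, _ = table[position]
    i = bisect.bisect_left(name_index, (name, position))
    if i < len(name_index) and name_index[i] == (name, position):
        name_index.pop(i)
    table[position] = None


insert_row("Xyz", 51)
insert_row("Abc", 34)
insert_row("Abc", 27)
delete_row(1)
print(name_index)
```

Each additional index on the table would add another such update to every insert and delete.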

As a general rule, an index should only be created on a table if the data in the indexed column will be queried
frequently.

An index is simply a data structure that stores the values for a specific column in a table. An index is
created on a column of a table.

For example, suppose we have a database table called User with three columns: Name, Age, and Address.
Assume that the User table has thousands of rows.

Now, let’s say that we want to run a query to find all the details of any users who are named ‘John’. We
would run the following query:

SELECT * FROM User WHERE Name = 'John'

The database software would literally have to look at every single row in the User table to see if the Name for
that row is ‘John’. This will take a long time.

This is where an index helps us: an index is used to speed up search queries by essentially cutting down the
number of records/rows in a table that need to be examined.

How to create an index

CREATE INDEX name_index ON User (Name)

An index consists of column values (e.g. John) from one table, stored in a data structure. The database will
now use the index to find users named John, because the index will presumably be sorted alphabetically by
the user’s name. And because it is sorted, searching for a name is a lot faster, since all names starting with
a “J” will be right next to each other in the index.
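
One way to see the effect is to ask a database for its query plan before and after creating the index. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption (the text does not say which database is in use), and the inserted rows are invented. The exact wording of the plan varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (Name TEXT, Age INTEGER, Address TEXT)")

# Invented sample data: a thousand filler rows plus one row named John.
sample_rows = [(f"user{i}", 20 + i % 50, f"{i} Main St") for i in range(1000)]
sample_rows.append(("John", 30, "1 Elm St"))
conn.executemany("INSERT INTO User VALUES (?, ?, ?)", sample_rows)

query = "SELECT * FROM User WHERE Name = 'John'"


def plan(sql):
    """Return SQLite's description of how it would execute sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # the fourth column is the detail text


plan_before = plan(query)  # no index yet: expect a scan of the whole table
conn.execute("CREATE INDEX name_index ON User (Name)")
plan_after = plan(query)   # expect a search that names name_index

print(plan_before)
print(plan_after)
print(conn.execute(query).fetchall())
```

The query returns the same rows either way; only the access path changes.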
