SQL Server Query Optimization Techniques PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 9

SQL Server Query Optimization Techniques - Tips for Writing Efficient and Faster Queries

Complete SELECT Query Execution Order


SELECT TOP DISTINCT COLUMN, AGG_FUNC(column_or_expression), … (8) SELECT  (9) DISTINCT  (11) TOP

FROM mytable (1) FROM

JOIN another_table ON mytable.COLUMN = another_table.COLUMN (2) ON  (3) JOIN

WHERE constraint_expression (4) WHERE

GROUP BY COLUMN / ROLLUP (column) / CUBE (column) (5) GROUP BY  (6) CUBE | ROLLUP

HAVING constraint_expression (7) HAVING

ORDER BY COLUMN ASC / DESC (10) ORDER BY

 QUERY PERFORMANCE OVERVIEW USING STATISTICS IO


Two common methods to determine the best way to write queries:
 Looking at the number of logical reads produced by the query: The number of pages read from the data cache
 As fewer logical reads typically lead to faster execution times
 Looking at graphical execution plans
 Arrow Thickness
 Percentage value in each operator
 Estimated vs. Actual rows
 Missing Index
 Warning Symbol

For determining the number of logical reads, you can turn the STATISTICS IO option ON.
SET STATISTICS IO ON

SELECT *

FROM tablename;

Table 'tablename'. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical

reads 0, lob read-ahead reads 0.

 GENERAL TIPS FOR QUERY OPTIMIZATION


 Specific Column Names instead of * in SELECT Query
Avoid Using Select * Clauses and use the actual columns names in SELECT statement to select needed fields:
 SQL Server will return only particular columns to the client, not all columns of a table. This will help reduce the
network traffic.
 The SQL parser handles all the field references by obtaining the names of valid columns from the data dictionary
and substitutes them on the command line, which is time consuming.
Example: Write the query as
SELECT col_1, col_2, col_3, col_4, subject

FROM table_name;
Instead of:
SELECT *

FROM table_name;

 Alternatives of COUNT (*) for returning total tables row count


Use alternative ways instead of the SELECT COUNT (*) statement, as SELECT COUNT (*) statement makes a full
table scan to return the table's row count, it can take much time for the large tables.
Example: Write the query as
DBCC UPDATEUSAGE ('db_name', 'table_name')

WITH COUNT_ROWS

-- SQL Server 2000

SELECT rows

FROM sys.sysindexes

WHERE id = OBJECT_ID('table_name')

AND indid < 2;

-- SQL Server 2005+

SELECT rows

FROM sys.partitions

WHERE object_id = OBJECT_ID('table_name')

AND index_id < 2;

EXEC sp_spaceused 'SYS_CODE'

Instead of:
SELECT COUNT(*)

FROM table_name;

 Try to avoid HAVING Clause in Select statements


HAVING clause is used like a filter to filter the rows after all the rows are selected. Try not to use HAVING clause for
any other purposes.
Example: Write the query as
SELECT Col_1, count(Col_1)

FROM table_name

WHERE col_1 != 'testvalue1'

AND col_1 != 'testvalue2'

GROUP BY col_1;

Instead of:
SELECT Col_1, count(Col_1)

FROM table_name

GROUP BY Col_1
HAVING Col_1 != 'testvalue1'

AND Col_1 != 'testvalue2';

 Try to minimize number of sub query blocks within a query


Example: Write the query as
-- For Oracle

SELECT col_1

FROM table_name1

WHERE (col_2, col_3) = (SELECT MAX(col_2), MAX(col_3)

FROM table_name2)

AND col_4 = 'testvalue1';

Instead of:
-- For SQL Server

SELECT col_1

FROM table_name1

WHERE col_2 = (SELECT MAX(col_2)

FROM table_name2)

AND col_3 = (SELECT MAX(col_3)

FROM table_name2)

AND col_4 = 'testvalue1';

 Try to use operators like EXISTS, IN and JOINS appropriately in your query
 Exists vs. In
The EXISTS function searches for the presence of a single row that meets the stated criteria, as opposed to the IN
statement that looks for all occurrences.
(a) Usually IN has the slowest performance.
(b) IN is efficient only when most of the filter criteria for selection are placed in the subquery of a SQL statement.
(c) EXISTS is efficient when most of the filter criteria for selection is in the main query of a SQL statement.
Example: Write the query as
SELECT *

FROM table1 t1

WHERE EXISTS (SELECT '1'

FROM table2 t2

WHERE t2.col_id = t1.col_id)

Instead of:
SELECT *

FROM table1 t1

WHERE t1.col_id IN (SELECT t2.col_id

FROM table2 t2)

 Not Exists vs. Not In


Example: Write the query as
SELECT *

FROM table1 t1

WHERE t1.col_id NOT IN (SELECT t2.col_id

FROM table2 t2)

Instead of:
SELECT *

FROM table1 t1

WHERE NOT EXISTS (SELECT '1'

FROM table2 t2

WHERE t2.col_id = t1.col_id)

 In with Minus vs. Not In for Non-Indexed Columns


The NOT IN clause causes an internal sort/merge
Example: Write the query as
SELECT *

FROM system_user

WHERE su_user_id IN (SELECT su_user_id

FROM system_user

MINUS

SELECT ac_user

FROM account)

Instead of:
SELECT *

FROM system_user

WHERE su_user_id NOT IN (SELECT ac_user

FROM account)

 Use EXISTS instead of DISTINCT when using table joins that involves tables having one-to-many relationships
Example: Write the query as
SELECT d.col_id, d.col2

FROM table1 d

WHERE EXISTS (SELECT '1'

FROM table2 e

WHERE e.col2 = d.col2);

Instead of:
SELECT DISTINCT d.col_id, d.col2

FROM table1 d, table2 e

WHERE e.col2 = e.col2;

 Try to use UNION ALL instead of UNION, whenever possible


The SORT operation is very expensive in terms of CPU consumption. The UNION operation sorts the result set to
eliminate any rows that are within the subqueries. UNION ALL includes duplicate rows and does not require a sort.
Example: Write the query as
SELECT id, col1

FROM table1

UNION ALL

SELECT id, col1

FROM table2;

Instead of:
SELECT id, col1, col2

FROM table1

UNION

SELECT id, col1

FROM table2;

 Use conditions in WHERE clause carefully


Limit Size of Your Working Data Set.
Example 1: Write the query as
SELECT id, col1, col2

FROM TABLE

WHERE col2 > 10;

Instead of:
SELECT id, col1, col2

FROM TABLE

WHERE col2 != 10;

Example 2: Write the query as


SELECT id, col1, col2

FROM TABLE

WHERE col1 LIKE 'Nav%';

Instead of:
SELECT id, col1, col2

FROM TABLE

WHERE SUBSTR(col1, 1, 3) = 'Nav';

Example 3: Write the query as


SELECT Col1, Col2

FROM TABLE

WHERE Col3 BETWEEN MAX(Col3) AND MIN(Col3)

Instead of:
SELECT Col1, Col2

FROM TABLE
WHERE Col3 >= MAX(Col3)

AND Col3 <= MIN(Col3)

 Use NON-Column expression on one side of the SQL query as it will be processed before any other clause
Example: Write the query as
SELECT id, Col1, Col2

FROM TABLE

WHERE Col2 < 25000;

Instead of:
SELECT id, Col1, Col2

FROM TABLE

WHERE Col2 + 10000 < 35000;

 Use join instead of subqueries


Example: Write the query as
SELECT a.id,

MAX(p.created) AS latest_post

FROM authors a

INNER JOIN posts p ON (a.id = p.author_id)

GROUP BY a.id

Instead of:
SELECT a.id,

(SELECT MAX(created)

FROM posts

WHERE author_id = a.id) AS latest_post

FROM authors a

 Use UNION instead of OR


Using OR on an indexed column causes the optimizer to perform a full-table scan rather than an indexed retrieval.
Example: Write the query as
SELECT *

FROM a, b

WHERE a.p = b.q

UNION

SELECT *

FROM a, b

WHERE a.x = b.y

Instead of:
SELECT *

FROM a, b

WHERE a.p = b.q OR a.x = b.y;


 Be careful when using wildcards
Example: Examine the following queries
-- Full wildcard

SELECT *

FROM TABLE

WHERE COLUMN LIKE '%hello%';

-- Postfix wildcard

SELECT *

FROM TABLE

WHERE COLUMN LIKE 'hello%';

-- Prefix wildcard

SELECT *

FROM TABLE

WHERE COLUMN LIKE '%hello';

 Alias Usage
A short alias is parsed more quickly than a long table name or alias.
Example: Write the query as
SELECT e.first_name, e.last_name, c.country

FROM employee e, countries c

WHERE e.country_id = c.id

AND e.last_name = 'HALL';

Instead of:
SELECT first_name, last_name, country

FROM employee, countries

WHERE country_id = id

AND last_name = 'HALL';

 Correlated Subqueries vs. Inline Views


The Correlated Subquery is extremely inefficient and is very CPU- intensive. Inline Views (subqueries in the FROM
clause of select statements) perform orders of magnitude faster and are much more scalable than Correlated Subqueries.
Example: Write the query as
SELECT e1.*

FROM e1, (SELECT e2.dept_id dept_id, avg(e2.salary) avg_sal

FROM emp e2

GROUP BY dept_id) dept_avg_sal

WHERE e1.dept_id = dept_avg_sal.dept_id

AND e1.salary > dept_avg_sal.avg_sal;

Instead of:
SELECT OUTER.*

FROM emp OUTER

WHERE OUTER.salary > (SELECT avg(salary)

FROM emp INNER

WHERE INNER.dept_id = OUTER.dept_id);

 Use driving tables


Always choose a table with less number of records as the driving table and place the driving table at the end
of the FROM clause. Select the intersection table that is the table that has many tables dependent on it as
the driving table. Subsequent driven tables should be placed in order so that those retrieving the most rows are
nearer to the start of the FROM clause. However, the WHERE clause should be written in the opposite order,
with the driving tables conditions first and the final driven table last

 SOME MORE TIPS FOR OPTIMIZATION OF QUERIES/ TABLES/ STORED PROCEDURES IN SQL
SERVER
a. Table should have minimum of one clustered index and appropriate number of non-clustered indexes, which
should be created on columns of table based on query which is running following the priority order as WHERE
clause, then JOIN clause, then ORDER BY clause and finally the SELECT clause.
b. Avoid using Triggers if possible; incorporate the logic of trigger in a stored procedure.
c. Table should have a primary key.
d. Try to use constraints for selection instead of using triggers. Constraints are efficient than triggers enhance
performance. So, you should use constraints instead of triggers, whenever possible.
e. Try to use table variables instead of temporary tables as table variables require less locking resources as well as
less logging resources than the temporary tables, so table variables should be used whenever possible.
f. Avoid the use of views or replace views with original tables.
g. Try to avoid the use of DISTINCT clause, wherever possible. As the DISTINCT clause will result in performance
degradation, we should use this clause only when it is necessary or unavoidable.
h. Try to add SET NOCOUNT ON statement into your stored procedures as it stops the message indicating the
number of rows affected by a SQL statement. It also reduces network traffic, because our client will not receive
any message indicating the number of rows affected by a SQL statement.
i. Try to use TOP keyword or the SET ROWCOUNT statement in the select statements, if we need to return only the
first n rows. This can improve performance of our queries, as the smaller result set will be returned. It can also
reduce the traffic between the server and the clients.
j. Try to use user-defined functions to keep the encapsulated code for reuse in future. The user-defined functions
(UDFs) contain one or more SQL statements that can be used to encapsulate code for reuse. Using UDFs can also
reduce the network traffic.
k. If possible move the logic of UDF to SP as well.
l. If you need to delete all rows of a table, try to use TRUNCATE TABLE command instead of DELETE command.
Using the TRUNCATE TABLE is a faster way to delete all rows of a tables, because it removes all rows from a
table without logging each row delete.
m. Remove any unnecessary joins from tables.
n. If there is cursor used in query, see if there is any other way to avoid the usage of this (either by SELECT … INTO
or INSERT … INTO, etc.). Try to avoid using cursors whenever possible. As SQL Server cursors can result in
some performance degradation as compared to select statements. Try to use correlated subqueries or derived tables
for row-by-row operations on tables.
o. When writing a subquery (a SELECT statement within the WHERE or HAVING clause of another SQL
statement):
1. Try to use a correlated (refers to at least one value from the outer query) subquery when the return is
relatively small and/or other criteria are efficient i.e. if the tables within the subquery have efficient indexes.
2. Try to use a non-correlated (does not refer to the outer query) subquery when dealing with large tables from
which you expect a large return (many rows) and/or if the tables within the subquery do not have efficient
indexes.
3. Ensure that multiple subqueries are in the most efficient order.
4. Remember that rewriting a subquery as a join can sometimes increase efficiency.
p. Use char/varchar columns data type, instead of nchar/nvarchar data type if we do not need to store Unicode data.
The char/varchar data value uses only one byte to store one character; whereas the nchar/nvarchar value uses two
bytes to store one character, so the char/varchar columns use two times less space to store data as compared to
nchar/nvarchar data columns.
q. Try to use stored procedures instead of heavy queries as they can reduce network traffic, because our client will
send to server only stored procedure name (along with some parameters) instead of heavy and lengthy queries text.
Stored procedures can be used to enhance security. For example, we can give different users, different set of
permissions to execute the stored procedure to work with the restricted set of the columns and data.
r. We should try to return an integer value from a RETURN statement instead of returning an integer value as a part
of a record set. As the RETURN statement exits from a stored procedure unconditionally, so the statements
following the RETURN statement are not executed. The RETURN statement is generally used for error checking,
but we can also use this statement to return an integer value for any other reason. Using RETURN statement can
improve performance as SQL Server will not create a record set.
s. Try to drop indexes that are not being used. Because each index takes up disk space and slow the DML operations,
we should drop indexes that are not used. We can use Index Wizard to identify indexes that are not being used in
our SQL queries.
t. We should try to create indexes on columns that have integer values rather than character values. Because the
integer values have less size than the size of characters values, so we can reduce the number of index pages which
are being used to store the index keys. This finally reduces the number of reads required to read the index and
enhances the overall index performance.
u. If we need to join several tables very frequently, then we should consider creating index on the joined columns
which can significantly improve performance of the queries against the joined tables.
v. Try to avoid any operations on the fields, where ever possible. Some operations will prevent the use of index on a
field even if it exists—for example, ltrim(rtrim(FieldColumnName)) as such operations will degrade the
performance. For example, instead of using the condition cast(DateColumn as varchar(20)) = @dateStr, we should
try to convert @dateStr to an expression of datetime type and then compare it to DateColumn value.

You might also like