10985C
Introduction to SQL Databases
Information in this document, including URL and other Internet Web site references, is subject to change
without notice. Unless otherwise noted, the example companies, organizations, products, domain names,
e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with
any real company, organization, product, domain name, e-mail address, logo, person, place or event is
intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the
user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in
or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.
The names of manufacturers, products, or URLs are provided for informational purposes only and Microsoft makes no representations or warranties, either expressed, implied, or statutory, regarding these manufacturers or the use of the products with any Microsoft technologies. The inclusion of a manufacturer or product does not imply endorsement by Microsoft of the manufacturer or product. Links may be provided to third-party sites. Such sites are not under the control of Microsoft and Microsoft is not responsible for the contents of any linked site or any link contained in a linked site, or any changes or updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission received from any linked site. Microsoft is providing these links to you only as a convenience, and the inclusion of any link does not imply endorsement by Microsoft of the site or the products contained therein.
© 2017 Microsoft Corporation. All rights reserved.
Released: 11/2017
Acknowledgements
Microsoft Learning would like to acknowledge and thank the following for their contribution towards
developing this title. Their effort at various stages in the development has ensured that you have a good
classroom experience.
Contents
Module 1: Introduction to Databases
Module Overview
Lesson 1: Introduction to Relational Databases
Module 2: Data Modeling
Module Overview
Module 3: Normalization
Module Overview
Module 4: Relationships
Module Overview
Module 5: Performance
Module Overview
Course Description
This three-day instructor-led course is aimed at people looking to move into a database professional role
or whose job role is expanding to encompass database elements. The course describes fundamental
database concepts including database types, database languages, and database designs.
Audience
The primary audience for this course is people who are moving into a database role, or whose role has
expanded to include database technologies.
Student Prerequisites
This is a foundation level course and therefore only requires general computer literacy.
Course Objectives
After completing this course, students will be able to:
Course Outline
The course outline is as follows:
Module 1: ‘Introduction to Databases’ introduces key database concepts in the context of SQL Server 2016.
Module 2: ‘Data Modeling’ introduces data modeling techniques that you can employ when designing databases.
Module 3: ‘Normalization’ introduces the concept of normalization and describes normalization and denormalization techniques.
Module 4: ‘Relationships’ describes the different types of relationship and their effects on database design.
Course Materials
The following materials are included with your kit:
Course Handbook: a succinct classroom learning guide that provides the critical technical
information in a crisp, tightly-focused format, which is essential for an effective in-class learning
experience.
o Lessons: guide you through the learning objectives and provide the key points that are critical to
the success of the in-class learning experience.
o Labs: provide a real-world, hands-on platform for you to apply the knowledge and skills learned
in the module.
o Module Reviews and Takeaways: provide on-the-job reference material to boost knowledge
and skills retention.
Modules: include companion content, such as questions and answers, detailed demo steps and
additional reading links, for each lesson. Additionally, they include Lab Review questions and answers
and Module Reviews and Takeaways sections, which contain the review questions and answers, best
practices, common issues and troubleshooting tips with answers, and real-world issues and scenarios
with answers.
Resources: include well-categorized additional resources that give you immediate access to the most
current premium content on TechNet, MSDN®, or Microsoft® Press®.
Course evaluation: at the end of the course, you will have the opportunity to complete an online
evaluation to provide feedback on the course, training facility, and instructor.
Note: At the end of each lab, you must revert the virtual machines to a snapshot. You can find the instructions for this procedure at the end of each lab.
The following table shows the role of each virtual machine that is used in this course:
Software Configuration
The following software is installed on the virtual machines:
SQL Server 2017
SharePoint 2017
Course Files
The files associated with the labs in this course are located in the D:\Labfiles folder on the 10985C-MIA-SQL virtual machine.
Classroom Setup
Each classroom computer will have the same virtual machine configured in the same way. Each computer also requires the following hardware:
DVD drive
Network adapter
Additionally, the instructor’s computer must be connected to a projection display device that supports
SVGA 1024×768 pixels, 16-bit colors.
Module 1
Introduction to Databases
Contents:
Module Overview
Lesson 1: Introduction to Relational Databases
Module Overview
There are many different types of database, and many languages that you can use to create, manage, and
query them. This module describes some of the most common types of database, outlines some key
concepts for working with relational databases, and introduces the languages that you can use when
working with databases in Microsoft® SQL Server®. The module also explains how you can use Transact-SQL to access and manipulate data.
Objectives
After completing this module, you will be able to:
Describe the characteristics of relational databases.
Lesson 1
Introduction to Relational Databases
The term database covers a range of different types of storage structures, each of which has its own
distinct set of characteristics. This lesson describes what a database is and introduces the relational
database, which is the most widely implemented type of database.
Lesson Objectives
After completing this lesson, you will be able to:
What Is a Database?
At the simplest level, a database is just an organized store
of data. Organizing data in a logical and consistent way
makes it faster and easier to access the data when you
need it. Just as it is easier to locate a specific book on a set
of shelves when the books are organized alphabetically, or
by genre, or by a combination of both, the organizational
principles that databases use make data retrieval much
more efficient. However, there is no single “correct” way to
organize data in a database—and there are several
different types of database in use today.
For most people, probably the most familiar type of database is the relational database. A relational
database consists of tables, each of which contains rows of data organized by columns, in an arrangement
similar to a spreadsheet. Each table is related to one or more of the other tables in the database—for
example, a table that contains customer information might relate to a table that contains information
about orders. This relationship is based on the real-world fact that customers place orders. You will learn
more about relational databases in subsequent topics. Other common types of database include hierarchical databases, such as the Windows® registry database, object-oriented databases, and column-oriented databases.
Redundancy
In addition to making it faster and easier to retrieve data, databases also make the storage of data more
efficient by reducing data redundancy. The term data redundancy refers to the storage of the same piece
of data multiple times. For example, imagine a small company that uses a spreadsheet to keep track of its
orders, including details about the customers who placed the orders.
The spreadsheet contains one row for each order, and includes columns called Customer Name, Customer
Email, and Customer Address. It is likely that some of the customers will have placed multiple orders, so
the spreadsheet might contain the same customer information several times. Redundant data like this can
significantly increase the size of a database, making storage more expensive, and the database potentially
more difficult to manage.
Additionally, storing data repeatedly can lead to data inconsistency because a single data value might be
stored differently in two or more locations. This might be because of spelling mistakes or because of the
use of different terms to describe the same thing. For example, in the spreadsheet described above, a
customer’s address might include the state as “WA” in one row and as “Washington” in another.
It is important that data is consistent because this will affect the results of queries that you issue against
the database. Inaccurate query results can have serious consequences because businesses rely on data to
guide decision-making processes. Databases can reduce redundancy by storing different types of data in
different tables, so that a single piece of information, such as a customer’s address, can be stored just
once instead of many times.
Tables store data in horizontal rows, and each row represents a specific instance of the entity that the
table represents. For example, in the Orders table, each row would represent a single order, and in the
Products table, each row would represent a single product. Tables are divided vertically into columns, and
each column contains a specific type of information. For example, in the Customers table, there might be
columns called First Name, Last Name, Email Address, and Phone Number; each row would contain a
single value for each of these columns.
In some books and other sources about relational databases, you might come across the words “relation”,
“tuple”, and “attribute”. These terms date back to the time when Codd first published his ideas about
relational data models. They relate more to the conceptual description of a relational model rather than
to an actual implementation of it.
You will learn more about this in Module 2 of this course. Most modern DBMSs implement relations as
tables, tuples as rows, and attributes as columns. However, it is worth remembering these other terms
because you are likely to come across them occasionally. For example, the data values in a column are
sometimes referred to as attribute values. You may also come across the term “record”, which usually
means the same as “row”, and “field”, which usually means the same as “column”.
The numbers in the Customer ID column do not have any meaning in the real world; they simply serve as
identifiers within the database. It is important to ensure that there is no duplication of the values in this
column. For example, if the Products table contained two rows with the same product number, you would
not be able to tell the two rows apart, and this could quickly cause problems with data consistency.
Imagine that a data analyst creates a report that sums the sales revenue for each product, based on the
product number; the report would be inaccurate because the values for two distinct products would be
summed together. To avoid this kind of problem, you add a primary key constraint to the column that
contains the unique identifiers.
A constraint restricts the actions that you can perform when adding or modifying data in a table—there
are several types of constraint that you can create to help to prevent data inconsistency. When you define
a primary key constraint on a column, it prevents the addition of duplicate values to that column, thus ensuring that every value is unique.
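For example, the following Transact-SQL statement adds a primary key constraint to the Product ID column of a hypothetical Products table; the table, column, and constraint names are illustrative:

Primary Key Example
-- The column must be defined as NOT NULL before it can be used as a primary key.
ALTER TABLE Products
ADD CONSTRAINT PK_Products PRIMARY KEY (ProductID);
GO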
Additionally, if the company uses third-party shippers to deliver its orders to customers, there might be a
Shippers table that contains information about the shippers. The Orders table would not contain extensive
information about each customer, because it is the Customers table that stores this information. Instead,
the Orders table might include a column called Customer ID that contains values that identify the
customer who placed each order. For each row, this value must match one of the values in the unique
Customer ID column in the Customers table. The following example tables illustrate this:
Customers Table

Orders Table

Order ID   Customer ID   Order Date   Shipper ID
503        1             03/22/2016   4
504        2             03/22/2016   2
505        2             03/23/2016   3
This arrangement helps to minimize data redundancy because, instead of storing a customer’s name,
address, and other details in the Orders table every time they place an order, the database stores a
customer’s details only once, as a row in the Customers table. When a customer places an order, this adds
a new row to the Orders table, including a Customer ID that already exists in the Customers table.
Foreign keys
An order in the Orders table that did not have a corresponding Customer ID in the Customers table would
be impossible to fulfill because there would be no delivery address shown, and no email address or phone
number to contact the customer. To ensure that it is not possible to add Customer ID values to the Orders
table that do not exist in the Customers table, you can use another type of constraint called a foreign key
constraint.
When you add a foreign key constraint to a column, the constraint checks any new values that you add to
the column to make sure that the values already exist in a second column—which is called the referenced
column. In this example, you would create a foreign key on the Customer ID column in the Orders table
that references the Customer ID column in the Customers table. The constraint would prevent the
addition of a row to the Orders table if it contained a Customer ID value that was not present in the
Customers table.
Unlike a primary key constraint, a foreign key constraint does not enforce uniqueness. It would make little
sense to restrict a customer’s Customer ID to only one appearance in the Orders table, because this would
mean that they could only place a single order! The relationship between a primary key column and a
foreign key column is called a one-to-many relationship. This means that the values in a foreign key
column can each appear multiple times, but the column that the foreign key references, usually a primary
key column, must contain unique values.
To complete the example used in this topic, you would also create a primary key constraint on the Order
ID column of the Orders table, and on the unique identifier column of the other tables. You would then
add any foreign key constraints to the tables as required.
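A minimal Transact-SQL sketch of this design might look like the following; the data types are illustrative, and the Customers table is shortened to its key column:

Primary Key and Foreign Key Example
CREATE TABLE Customers (
    CustomerID int NOT NULL,
    CONSTRAINT PK_Customers PRIMARY KEY (CustomerID)
);

CREATE TABLE Orders (
    OrderID int NOT NULL,
    CustomerID int NOT NULL,
    OrderDate date NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (OrderID),
    -- Rejects any Customer ID value that is not present in the Customers table.
    CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerID)
        REFERENCES Customers (CustomerID)
);
GO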
In this demonstration, you will see how to use a database diagram to view tables, primary key constraints, and foreign key constraints.
Demonstration Steps
1. Start the 10985C-MIA-DC and 10985C-MIA-SQL virtual machines, and then log on to 10985C-MIA-
SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
2. In the D:\Demofiles\Mod01 folder, right-click Setup.cmd, and then click Run as administrator.
3. In the User Account Control dialog box, click Yes, and wait for setup to complete.
4. Open Microsoft SQL Server Management Studio, and then connect to the MIA-SQL instance of the
database engine by using Windows Authentication.
6. If the Microsoft SQL Server Management Studio dialog box appears asking if you wish to create
support objects for database diagramming, click Yes.
7. In the Add Table dialog box, press and hold down the CTRL key, click Customer (Sales), click
SalesOrderDetail (Sales), click SalesOrderHeader (Sales), click ShipMethod (Purchasing), click
Add, and then click Close.
a. The SalesOrderHeader (Sales) table contains the SalesOrderID, which is the primary key
column.
9. In the SalesOrderDetail (Sales) table, right-click the SalesOrderID column, and then click
Properties.
10. In the Properties window, click the Description field, and then click the ellipsis button (…).
11. In the Description Property dialog box, note that the column is a primary key column, and that there is a foreign key that references the SalesOrderID column in the SalesOrderHeader table, and then click Cancel.
12. Click the line between the Customer (Sales) table and the SalesOrderHeader (Sales) table. This line
represents a foreign key relationship.
13. In the Properties window, click Description, and then click the ellipsis button (…).
14. In the Description Property dialog box, note that the foreign key references the CustomerID
column in the Customer (Sales) table, and then click Cancel.
15. Close the database diagram window, and do not save any changes. Close SQL Server Management
Studio.
Introduction to Normalization
The simple example order processing database
described in the previous topics is an example of
an operational or online transactional processing
(OLTP) database. In an OLTP database like this,
you expect there to be frequent inserts of new
rows into the various tables, as customers place
orders, and updates to rows as, for example,
customers update their personal details.
First normal form deals with separating related data into tables, and ensuring that each table has a
unique identifier, as previously described. It additionally involves eliminating columns that contain
repeating data. For example, a Customers table that contained columns called Telephone Number 1
and Telephone Number 2 would not be in first normal form because the two columns contain the
same type of data.
Second normal form involves complying with first normal form, in addition to separating data that
might repeat across rows into different tables. You saw an example of this in the previous topic, which
described storing customers’ details in a separate table to avoid repeating them in the Orders table.
Third normal form involves complying with second normal form, while also eliminating columns that contain values that do not fully depend on the primary key. For example, consider a column in an Order Details table called Total that contains the total order value, calculated by multiplying the unit price of items by the quantity ordered. Because the column contains values that are derived from the values in other columns, it is not fully dependent on the primary key, and so the database is not in third normal form.
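Rather than storing such a column, you can derive the value at query time. The following sketch assumes a hypothetical OrderDetails table with UnitPrice and Quantity columns:

Derived Value Example
SELECT OrderID, UnitPrice * Quantity AS Total
FROM OrderDetails;
GO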
There are additional levels of normal form. You will learn more about these, and see more examples of
normalization in Module 3 of this course.
Lesson 2
Other Types of Databases and Storage
Although the OLTP relational database is a very common type of database, it is one of many that you
might encounter. It is useful to be familiar with these types of database, even if you do not work with
them on a day-to-day basis. This lesson describes several different types of database and outlines their
key features.
Lesson Objectives
After completing this lesson, you will be able to:
Data Warehouses
The previous topics described relational databases in an
OLTP context, but it is also possible to use relational
databases to create data warehouses. A data warehouse is
a very large database that typically contains historical data
instead of current transactional data. For example, an
organization might use an OLTP database to process and
store transactions for the current month, and a separate
data warehouse to store transactions from previous months
and years.
Organizations can use this historical data to help them understand how the company is performing, or to
plan future strategies. For example, you can use a data warehouse to analyze sales performance over
previous periods, to assess the success of marketing campaigns, or to predict sales trends.
In a data warehouse, everyday transactional activity is usually rare to non-existent. Instead of inserting
rows one at a time, as is the case in an OLTP database, you typically bulk-load data into a data warehouse
on a scheduled basis—for example, once a week or once a month. In between these data loads, the
primary activity is read-based, when data analysts query the data warehouse. Consequently, you create
tables and relationships in a data warehouse with the aim of optimizing read performance instead of for
transactional performance.
There are various recognized methods for designing data warehouses. Amongst the most commonly used
are Ralph Kimball’s dimensional model and Bill Inmon’s normalized model. In the dimensional model, you
design the data warehouse so that it will contain more redundant data than an OLTP database. The
process of designing a database to include the storage of redundant data in this way is called
denormalization.
Star schemas
In a relational data warehouse built around the dimensional model, there is usually a large table, called a
fact table. Fact tables store the important numerical values, such as sales data, costs, and so on. These are
the data values, or measures, that organizations can analyze to gain business insights. Typically, business
analysts want to examine this data in aggregated form. For example, they might look at total revenue
aggregated by date, geographical region, or by product type.
The other tables—often called dimension tables—in the data warehouse, contain the information to be
able to do this. For example, there might be a Product table that contains product information, tables that
contain the required geographical data, and a table that includes the dates that are relevant to the orders
or other data that the data warehouse contains. Conceptually, you can imagine the fact table as a large
central table, surrounded by the dimension tables, with foreign key relationships between each dimension
table and the fact table. The design of a data warehouse is sometimes called a star schema because it
loosely resembles a star.
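As a hedged sketch, the following Transact-SQL outlines one dimension table and one fact table from such a star schema; the table and column names are illustrative, and a real design would include several more dimension tables:

Star Schema Example
-- Dimension table: one row per product.
CREATE TABLE DimProduct (
    ProductKey int NOT NULL PRIMARY KEY,
    ProductName nvarchar(50) NOT NULL,
    Category nvarchar(50) NOT NULL
);

-- Fact table: one row per sale, with foreign keys to the dimension tables.
CREATE TABLE FactSales (
    OrderDateKey int NOT NULL,
    ProductKey int NOT NULL REFERENCES DimProduct (ProductKey),
    SalesAmount money NOT NULL
);
GO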
Multidimensional databases
The most widely used OLAP systems are based on the
multidimensional model.
Each cell is a location in the cube that represents the intersection of the dimensions of the cube for a
given set of dimension values. For example, you might select to analyze the sales data for the product
category “Clothing” in the “South” region in the year 2016. The cell in the cube that represents the
intersection of the dimensions for the values “Clothing”, “South”, and “2016” is easily located and it
contains the required value, already aggregated.
Data is pre-aggregated in a cube, which means that values do not have to be calculated when a user
issues a query against the cube. This makes it much less resource intensive which, in turn, makes it much
faster at handling queries.
To query multidimensional cubes, you use the Multidimensional Expressions (MDX) language, though many analysts use tools, such as Microsoft Excel®, that issue MDX queries on their behalf. For example, when you connect to a multidimensional cube and add data to a pivot table, Excel issues the required MDX queries to the OLAP server in the background.
Note: Cubes in multidimensional databases nearly always have many more dimensions
than the three dimensions of a real, physical cube. Therefore, using the concept of a three-
dimensional cube to understand multidimensional cubes is not always intuitive.
Relational OLAP (ROLAP). ROLAP storage uses the underlying relational database (often a data
warehouse) to store the measures and aggregate values instead of processing them and storing them
in the cube. Users still query the cube to access the data by using the MDX language, but the OLAP
server must retrieve the necessary data from the relational database, which is less efficient and takes
longer than querying the equivalent MOLAP database would. ROLAP has two key benefits:
o It potentially provides users with access to data that is more up to date. This assumes that the
ROLAP database is more up to date than an equivalent MOLAP database would be, which might
not always be the case. If the ROLAP database is a data warehouse, the state of the data will
depend on the frequency of data loads into the data warehouse.
o The required storage is less than for the equivalent MOLAP database because the data is stored
in only one location (the source relational database) rather than two (the source relational
database and MOLAP storage).
Hybrid OLAP (HOLAP). As the name suggests, HOLAP uses a combination of ROLAP and MOLAP
storage to achieve a trade-off between the two storage methods. With HOLAP, you can store some of
the data in MOLAP storage and some in ROLAP. A common approach is to store aggregate data,
which is usually the most commonly accessed data, in MOLAP storage, and detail-level data in
ROLAP. This arrangement gives the performance advantage of MOLAP for queries for aggregate data,
and the storage advantages of ROLAP for non-aggregate data.
Hierarchical Databases
Although relational databases are the most common type
of database, there are several other types that you might
encounter, including hierarchical databases and NoSQL
databases.
Some data is naturally hierarchical and therefore suitable for storage in a hierarchical system. For example,
the arrangement of folders, subfolders, and files in a file directory is a natural hierarchy. Each file (child)
can only exist in a single folder (parent), but folders can contain many files. Each folder might itself be the
child of another folder, and so on. The Windows file directory is an example of this kind of hierarchical
database.
Note: The hierarchical data model dates back to the 1960s, when it was developed by IBM.
It predates the relational data model that you learnt about previously in this module. One of the
key drivers for the development of the relational data model was to overcome the limitations
imposed by the hierarchical model.
Some vendors include features with which you can store and manipulate hierarchical data in a relational
database. For example, SQL Server includes the “hierarchyid” data type for this purpose.
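For example, the following sketch stores a simple folder hierarchy by using the hierarchyid data type; the table name and the folder values are illustrative:

hierarchyid Example
CREATE TABLE Folders (
    Node hierarchyid NOT NULL PRIMARY KEY,
    FolderName nvarchar(100) NOT NULL
);
GO

-- '/1/' is the first child of the root node; '/1/1/' is that child's first child.
INSERT INTO Folders (Node, FolderName)
VALUES (hierarchyid::GetRoot(), N'C:'),
       (hierarchyid::Parse('/1/'), N'Users'),
       (hierarchyid::Parse('/1/1/'), N'Student');
GO

-- Return each folder with its depth; GetLevel() returns 0 for the root.
SELECT FolderName, Node.GetLevel() AS Depth
FROM Folders
WHERE Node.IsDescendantOf(hierarchyid::GetRoot()) = 1;
GO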
Column-Oriented Databases
The term column-oriented refers to a method of
storing data that can optimize query response
times for some types of workloads. Instead of
storing data in rows, as a standard relational
database does, a column-oriented DBMS
rearranges the data so that all of the data in a
column is stored as a single record. For example,
the following illustration shows a relational table
that contains customer data:
[Illustration: a relational table of customer data with Customer ID, First Name, Last Name, and Country columns; for example, customer 4 is Julie Goel in the UK. A second illustration shows the same data rearranged in column-oriented form, with one row per column of the original table.]
As you can see, each First Name value in the top row is stored along with the corresponding Customer ID
value. In the second row, the Last Name value “Martin” appears with two Customer ID values because it
pertains to both of these IDs. Similarly, in the third row, the Country value “USA” is associated with three
values in the Customer ID column.
This way of arranging data makes it faster and more efficient for some types of queries because it can
reduce the number of times the DBMS needs to access the storage disks to obtain the data. Disk access,
which is also called disk input and output (IO), is one of the major bottlenecks in a database system, so
reducing it can improve performance markedly. However, the query workloads for which column-oriented
storage results in better performance are quite limited and specific. For example, queries that aggregate
data from a relatively small subset of the total number of columns in a large relational table might
perform better if column-oriented storage is used. However, there is no guarantee of this, and it requires
careful planning and design.
SQL Server, although a relational DBMS, provides support for column-oriented storage through the
Columnstore index feature, and in-memory technology that also uses compression to optimize
performance for large database tables.
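As a hedged illustration, the following statement converts a hypothetical dbo.FactSales table to column-oriented storage; the table and index names are illustrative:

Columnstore Example
-- The table must not already have a clustered index (such as a clustered primary key).
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;
GO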
NoSQL Databases
NoSQL is used to refer to a set of different types of
storage solutions that have one thing in common:
they are not based on the relational data model.
NoSQL databases have become increasingly
prevalent in recent years in response to a number
of factors, including:
More powerful commodity hardware, virtualization technologies, and cloud services, which
organizations can pay for as and when they use the resources, have made it easier and less expensive
to create “scale-out” solutions. A scale-out solution involves distributing workloads across multiple
nodes (for example, virtual or physical servers) to add computing power. This contrasts with the
“scale-up” approach, which involves increasing computing power by adding components such as
memory and processors to an individual server. Because of their almost limitless potential for adding
processing nodes as required, scale-out solutions can deliver computing power on a massive scale.
In response to these factors, several technologies have emerged that can utilize the power of scale-out
systems to store and manipulate semi-structured and unstructured data. Because these technologies do
not use relational storage, they are often collectively referred to as NoSQL databases. Not all NoSQL
concepts and technologies are new, however—some have existed since before the term NoSQL was
coined.
Atomicity means that a transaction is treated as a single unit of work: either all of its operations complete, or none of them do.
Consistency means that a transaction will not leave the database in an inconsistent state after completion—for example, by the addition of invalid data.
Isolation means that transactions occur without interference from other transactions, which ensures that
the data that users see is accurate and meaningful.
Durability means that, after completion, the changes that a transaction makes to a database are
permanent.
Not all NoSQL systems support ACID properties to the same degree as relational databases. Furthermore,
relational databases use SQL, a fully featured query language, to provide data access and manipulation.
Although some NoSQL databases do include a query language that offers SQL-like functionality, these
languages are not as sophisticated as SQL.
Note: Because some NoSQL databases include support for query languages similar to SQL,
they are sometimes called “Not only SQL” databases as an alternative to the term NoSQL
database.
Hadoop
Like the NoSQL databases described in the previous topic,
Hadoop is a scale-out platform for storing and working with
semi-structured and unstructured data. Hadoop is actually a
group of related technologies, including:
Hadoop MapReduce is a programming framework that you can use to coordinate the processing of data in parallel across multiple nodes. Parallel processing delivers much faster performance.
Although Hadoop and NoSQL databases both handle big data, they work best with different types of
workloads. Hadoop is designed for massive-scale processing, and performs especially well with extremely
large, read-only workloads. By contrast, NoSQL databases generally perform best with read/write
workloads, such as those generated by web applications.
With PolyBase, a feature of SQL Server, you can:
Access data stored in Hadoop and Azure Blob Storage by using Transact-SQL. You can create queries that return only data from Hadoop, or queries that return results that combine data from Hadoop and relational data stored in SQL Server.
Export data from SQL Server to Hadoop and Azure Blob Storage.
Use the suite of Microsoft business intelligence (BI) tools to analyze and report on Hadoop data.
PolyBase Guide
http://aka.ms/Aqamxl
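As a hedged sketch, querying Hadoop data through PolyBase involves defining an external data source, an external file format, and an external table; the names, the cluster address, and the file layout below are hypothetical:

PolyBase Example
-- Location of the hypothetical Hadoop cluster.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://10.10.10.10:8020');
GO

-- Describes the layout of the delimited text files.
CREATE EXTERNAL FILE FORMAT CommaDelimited
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));
GO

-- The external table can then be queried with ordinary Transact-SQL.
CREATE EXTERNAL TABLE dbo.WebClicks (
    ClickDate date,
    Url nvarchar(400)
)
WITH (LOCATION = '/data/clicks/',
      DATA_SOURCE = HadoopCluster,
      FILE_FORMAT = CommaDelimited);
GO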
SQL Graph
SQL Graph is new in SQL Server 2017. A graph
database allows complex relationships to be
modelled, and simplifies the analysis of many-to-
many relationships.
SQL Graph enables graph nodes and edges to be
stored and queried in SQL Server. A SQL graph
database doesn’t provide any additional
functionality compared to a relational structure,
but it can simplify queries and make
interconnected relationships easier to model.
For example, a graph model suits scenarios in which you hold hierarchical or highly interconnected data, and entities can have one or more parents, as the sketch below shows.
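The following minimal sketch, with illustrative table and column names, creates a node table and an edge table, and then queries them by using the MATCH function:

SQL Graph Example
-- Node table: people in the graph.
CREATE TABLE Person (
    PersonID int NOT NULL PRIMARY KEY,
    PersonName nvarchar(50) NOT NULL
) AS NODE;

-- Edge table: "reports to" relationships between people. Edge rows are
-- inserted with the $from_id and $to_id values of the related nodes.
CREATE TABLE ReportsTo AS EDGE;
GO

-- Find everyone who reports to Dana.
SELECT Emp.PersonName
FROM Person AS Emp, ReportsTo, Person AS Mgr
WHERE MATCH(Emp-(ReportsTo)->Mgr)
AND Mgr.PersonName = N'Dana';
GO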
Categorize Activity
Match the types of database to the correct description. Indicate your answer by writing the category
number to the right of each item.
Items
Lesson 3
Data Analysis
The data that a database stores can have many uses, but without context, the data is meaningless. This
lesson explains the difference between data and information, and how organizations can use data analysis
to assist with their decision-making processes.
Lesson Objectives
After completing this lesson, you will be able to:
Processing data
In the previous lesson, you learned that databases can store data in many different ways. Each type of
database stores data along with metadata, which is additional data that provides context for the data. For
example, in a relational database, you take raw data and process it by storing it in tables and columns. The
values in a column are the data and the column name is metadata. Processing data in this way gives
meaning to the data. Other types of metadata can provide additional context.
Were there any other significant circumstances, such as unusual weather conditions?
Data analysts can also explore data by using sophisticated algorithms and statistical analyses to reveal
patterns and trends that might not be immediately obvious, to reveal new lines of enquiry. In addition to
finding answers to questions, analysts can use the information they have to predict future trends, and
plan, for example, the best time to launch a promotion.
SQL Server Analysis Services enables you to create cubes and tabular data models that can be used with
business intelligence tools, including Excel and PowerPivot, to perform data analysis. Analysis Services also
includes data mining functionality.
Many RDBMSs include a tool for ensuring data quality, and there are also third-party tools. With
SQL Server you can use Data Quality Services to profile data and correct data inconsistencies so that the
data analyses you undertake are as accurate as possible. You would typically use Data Quality Services to
ensure that the data you load into your data warehouse is valid.
Presenting data
It is important to communicate the insights that data analysis provides, and in a format that can be easily
understood. Poorly presented intelligence is frustrating for those who have to use it, and can lead to
misunderstandings about the findings. This might result in decision makers using incorrect interpretations
to guide the decision-making processes. For the same reason, it is also important to ensure that analysis
reports are up-to-date.
Reporting Services in SQL Server is a comprehensive tool for creating and managing reports—both
reporting specialists and non-specialists can use it to generate reports quickly and easily. Tools such as
Excel, PowerPivot, and Power View are also very popular ways of representing and communicating
intelligence.
Verify the correctness of the statement by placing a mark in the column to the right.
Statement Answer
Lesson 4
Database Languages in SQL Server
SQL Server includes support for several languages, so you can work with data in a wide variety of contexts
to achieve a range of different aims. SQL is the principal language that most SQL Server database
professionals use, but other languages exist that support other key functions, including data analysis and
data mining.
Lesson Objectives
After completing this lesson, you will be able to:
Explain the purpose of the other languages that SQL Server supports.
SQL includes three subsets, called Data Definition Language (DDL), Data Control Language (DCL), and
Data Manipulation Language (DML).
DDL
DDL includes components with which you can define database objects, such as tables and indexes. An
index is a data structure that improves performance by making it faster to find data, in a similar way to an
index in a book. DDL uses CREATE, ALTER, and DROP statements for creating, modifying, and deleting
objects.
The following code example is a Transact-SQL statement that creates a table called Sales.Customer, with
the columns CustomerID, StoreID, TerritoryID, AccountNumber, and ModifiedDate:
DDL Example
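-- Representative statement; the exact data types (for example, for AccountNumber) are illustrative.
CREATE TABLE Sales.Customer (
    CustomerID int NOT NULL,
    StoreID int NULL,
    TerritoryID int NULL,
    AccountNumber varchar(10) NOT NULL,
    ModifiedDate datetime NOT NULL
);
GO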
Each of the columns in the code example has a data type. The int data type defines that a column’s
contents must be integer values, and the datetime data type defines that a column must contain date and
time data.
The definition for each column also states either NULL or NOT NULL, which defines whether the column is permitted to contain missing values. NULL means the column can contain missing values; NOT NULL means it cannot.
Note: You should use two-part names to refer to tables in SQL Server databases, such as
Sales.Customer. The first part of the name refers to the schema, which you can think of as a
namespace or logical container for the table.
You typically group tables into a schema based on a logical relationship between the tables, such
as the fact that a group of tables all store data that is relevant to the sales process. The second
part of the name refers to the table within the schema.
DML
DML includes components with which you can view and manipulate data in a database. You can write
DML statements by using the SELECT, INSERT, UPDATE, and DELETE keywords.
You use SELECT statements to retrieve data from a database—for example, to view it in an application.
With INSERT, UPDATE, and DELETE, you can add rows to tables, change values in rows in tables, and
delete rows from tables. For example, when a customer signs up to a shopping website for the first time, a
new row representing that customer will be added to the database.
When that customer changes their phone number, the same row is updated. The following code example
is a Transact-SQL statement that retrieves the CustomerID and AccountNumber values for all rows in the
Sales.Customer table:
DML Example
SELECT CustomerID, AccountNumber
FROM Sales.Customer;
GO
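As a hedged illustration of the other DML keywords, the following statements add a row to the Sales.Customer table and then change one of its values; the values themselves are illustrative:

INSERT and UPDATE Example
INSERT INTO Sales.Customer (CustomerID, StoreID, TerritoryID, AccountNumber, ModifiedDate)
VALUES (30119, 602, 1, 'AW00030119', GETDATE());

UPDATE Sales.Customer
SET TerritoryID = 2, ModifiedDate = GETDATE()
WHERE CustomerID = 30119;
GO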
You will learn more about SELECT statements in the next topic.
DCL
DCL uses GRANT, DENY, and REVOKE statements to control access to data by defining permissions. You can assign permissions to individual users and to groups of users. GRANT statements permit users to perform specific actions, such as reading or updating data. If you do not grant permissions to users, by default they will not be given any kind of access to the data. This is called implicitly denying permission. DENY statements override GRANT statements and explicitly prevent users from performing specific actions. For example, a user might have access to data because permission has been granted to a group that they are a member of. You can deny this user access, without denying the rest of the group access, by using a DENY statement to override the GRANT permission for that user. REVOKE statements remove the permissions defined by both GRANT and DENY statements.
The following Transact-SQL code example grants SELECT permission on the Sales.Customer table to a user called Lin:
DCL Example
GRANT SELECT ON OBJECT::Sales.Customer
TO Lin;
GO
Transact-SQL Queries
The SELECT statement
You can query SQL databases by using SELECT statements,
as you saw in the previous topic. A SELECT statement
begins with the SELECT keyword, followed by a list of the
columns, separated by commas, which contain the data
that you want to view. You use the FROM keyword to
specify the name of the table that contains the data.
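For example, the following statement returns three columns from the Sales.SalesOrderHeader table, and uses a WHERE clause so that only the rows for the salesperson with ID 282 are returned:

SELECT Example
SELECT SalesOrderID, OrderDate, SalesPersonID
FROM Sales.SalesOrderHeader
WHERE SalesPersonID = 282;
GO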
Operators
In the SELECT statement above, “SalespersonID = 282” is a predicate. The = (equals) symbol is an example
of an operator. Operators define how predicates are evaluated. Other operators include:
> (greater than)
You can also use operators to evaluate multiple conditions. For example, the AND operator requires multiple conditions in the WHERE clause to be true, and the OR operator requires only one or the other of two conditions to be true. Other operators include BETWEEN, which tests whether a value falls within a range; LIKE, with which you can compare values by using wildcard characters; and IS NULL, which you can use to identify NULL values.
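For example, the following statement combines the AND and BETWEEN operators to apply two filters at once; the filter values are illustrative:

Operators Example
SELECT SalesOrderID, OrderDate, SalesPersonID
FROM Sales.SalesOrderHeader
WHERE SalesPersonID = 282
AND SalesOrderID BETWEEN 57000 AND 58000;
GO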
In this demonstration, you will see how to:
Use a SELECT statement to return all rows and columns from a table.
Use a SELECT statement to return all rows and specific columns from a table.
Use a WHERE clause to filter the rows that a SELECT statement returns.
Demonstration Steps
1. Open Microsoft SQL Server Management Studio, and then connect to the MIA-SQL instance of the
database engine by using Windows Authentication.
2. On the File menu, point to Open, click File, browse to D:\Demofiles\Mod01, click
TransactSQLQueries.sql, and then click Open.
3. Select the code under the comment Return all rows from all columns in the
Sales.SalesOrderHeader table, and then click Execute.
4. Review the result set, noting the columns that the query returns and, in the bottom right-hand corner
of the results set, the number of rows the query returned.
5. Select the code under the comment Return all rows from the SalesOrderID and OrderDate
columns from the Sales.SalesOrderHeader table, and then click Execute.
6. Review the result set, noting the columns and the number of rows the query returned.
7. Select the code under the comment Return only rows from the SalesOrderID, OrderDate, and
SalesPersonID columns for which the SalespersonID = 282, and then click Execute.
8. Review the result set, noting the columns and the number of rows the query returned. Note that the
number of rows returned is much lower than for the previous two queries.
9. Select the code under the comment Return only rows from the SalesOrderID, OrderDate, and
SalesPersonID columns for which the SalespersonID > 282, and then click Execute.
10. Review the result set, noting the columns and the number of rows the query returned. Note that the
number of rows returned has increased again.
11. Select the code under the comment Return only rows from the SalesOrderID and OrderDate
columns for which the SalespersonID = 282 and the orderdate is from the year 2013, and then
click Execute.
12. Review the result set, noting the columns and the number of rows the query returned. This query
returned the lowest number of rows because it contained the most restrictive filters.
13. Close the TransactSQLQueries.sql query window and do not save any changes. Close SQL Server
Management Studio.
XQuery
XML is a markup language that looks very similar to Hypertext Markup Language (HTML) because, like HTML, XML uses tags. In XML, these tags are called elements, and they are surrounded by angle bracket symbols (< and >). In XML, the tags surround items of data, giving it semantic meaning.
In this example, the XML includes data about a person’s appearance, and the elements tell you what the
data means:
XML Example
<Appearance>
<EyeColor>Green</EyeColor>
<HairColor>Brown</HairColor>
<HeightCentimeters>175</HeightCentimeters>
</Appearance>
XML documents usually include an XML schema, which defines the elements that the document can
contain, the data type of those elements, and other things such as any child elements that are permitted
or any default values that apply. With SQL Server, you can store XML data in columns in tables by using
the xml data type, and you can query this data by using XQuery. For example, using the simple XML
example above, you could use XQuery to return all people with green eyes, or blond hair.
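For example, the following hedged sketch stores the XML above in an xml variable and uses the exist() method with a short XQuery path to test for green eyes; in a real database, the XML would typically be stored in an xml column in a table:

XQuery Example
DECLARE @person xml = N'<Appearance>
  <EyeColor>Green</EyeColor>
  <HairColor>Brown</HairColor>
  <HeightCentimeters>175</HeightCentimeters>
</Appearance>';

-- Returns 1 if the person has green eyes, otherwise 0.
SELECT @person.exist('/Appearance[EyeColor = "Green"]') AS HasGreenEyes;
GO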
MDX
MDX is a query language that is similar in some ways to SQL. Like SQL, MDX uses SELECT statements with
FROM and WHERE clauses, but the syntax within these clauses is different to the syntax in the equivalent
SQL clauses.
MDX is designed for querying multidimensional cubes, and the result set of an MDX query is also
multidimensional; it is also a cube. In an MDX SELECT statement, you have to specify the values that you
want to place on the axes in the result set. For example, you might choose to add measure values such as
Sales Revenue to the row axis, and dimension values such as the years 2014 and 2015, from the Date
dimension to the column axis. Because the cube is a multidimensional structure, you can add more than
two axes if required. You specify the name of the cube that contains the data in the FROM clause, and in
the WHERE clause, you specify the dimension that you want to use to filter the data. For example, you
might specify the Product dimension to limit the query results to a specific product.
Many of the client applications that data professionals use, including Microsoft Excel, issue MDX queries
to retrieve data in response to user actions. For example, a user might use Excel to connect to a cube and
then add measures and dimensions from that cube to a pivot table or chart in a worksheet to create a
visual representation of the data. The person using Excel does not have to write the MDX queries to
retrieve the data themselves; Excel does this automatically in the background.
DAX
DAX is a formula-based language that you can use to implement business logic when working with
tabular data models. DAX will be familiar to anyone who has worked with Excel formulas because it uses
similar functions and syntax. You can use DAX functions to create formulas—for example to add measures
to tabular data models—or to define data relationships, such as many-to-many relationships. You can also
use DAX to write queries to return data.
R
R is a statistical programming language for performing advanced statistical analyses on complex data sets.
The statistical analysis functionality that R offers goes beyond the analytical capabilities of the other
languages that SQL Server supports. With R, you can explore many different kinds of complex data, and
use it to create powerful predictive data models.
R also includes graphics libraries that can present visualizations of data to users, making them easier to
understand. Because R is built in to SQL Server, you can perform statistical analysis directly, without having
to extract the data before analyzing it.
Which one of the following correctly represents the structure of a SQL Server
Transact-SQL SELECT statement?
SELECT <table>
FROM <column list>
WHERE <filter that limits the rows returned>
SELECT <rows>
FROM <table>
WHERE <filter that limits the rows returned>
Objectives
After completing this lab, you will be able to:
o Address (Person)
o Customer (Sales)
o SalesOrderDetail (Sales)
o SalesOrderHeader (Sales)
o SalesPerson (Sales)
o SalesTerritory (Sales)
o ShipMethod (Purchasing)
3. Examine the primary key columns and the foreign key relationships between the tables in the
diagram, and note which columns are involved in the relationships.
4. Save the database diagram as Adventure Works Diagram, and then close the diagram window.
o DimCustomer
o DimDate
o DimProduct
o DimProductCategory
o DimProductSubcategory
o FactInternetSales
2. Examine the primary key columns and the foreign key relationships between the tables in the
diagram, and note which columns are involved in the relationships.
3. Save the database diagram as Adventure Works Data Warehouse Diagram, and then close the
diagram window.
Results: After completing this exercise, you will have created a database diagram in the
AdventureWorks2016 database and a database diagram in the AdventureWorksDW2016 database.
2. Write and execute a Transact-SQL query that returns all columns and all rows from the
Sales.SalesOrderHeader table.
3. In the same query window, under the existing query, write and execute a Transact-SQL statement that
returns the SalesOrderID, OrderDate, SalesPersonID columns and all rows from the
Sales.SalesOrderHeader table.
2. Review the results and note the number of rows that the query returned.
3. Under the existing query, write and execute a Transact-SQL statement that returns the SalesOrderID,
OrderDate, SalesPersonID columns from the Sales.SalesOrderHeader table, and use a WHERE
clause to return only the rows with a SalesPersonID value of 279 or 282.
4. Review the results and note the number of rows that the query returned.
5. Under the existing query, write and execute a Transact-SQL statement that returns the SalesOrderID,
OrderDate columns from the Sales.SalesOrderHeader table, and use a WHERE clause to return only
the rows with a SalesOrderID value that is between 57000 and 58000.
6. Review the results and note the number of rows that the query returned.
7. Under the existing query, write and execute a Transact-SQL statement that returns the SalesOrderID,
OrderDate columns from the Sales.SalesOrderHeader table, and use a WHERE clause to return only
the rows with a SalesPersonID value of 279 and an OrderDate value that includes the year 2014.
8. Review the results and note the number of rows that the query returned.
Results: After completing this exercise, you will have:
Written and executed SELECT statements to retrieve all columns and to retrieve specific columns from a
table in the Adventure Works OLTP database.
Written and executed SELECT statements that include a WHERE clause to filter the rows that are returned
from a table in the Adventure Works OLTP database.
Question: Why did the number of rows returned by the queries that you wrote in the lab
vary?
Review Question(s)
Question: What types of databases and RDBMSs are in use in your workplace? Do you have
any experience of working directly with them?
Module 2
Data Modeling
Contents:
Module Overview
Lesson 1: Data Modeling
Module Overview
This module provides an introduction to data modeling techniques that you can employ when designing
databases. The techniques can be applied to any relational database system; however, in this module
(where appropriate) there will be specific examples using Microsoft® SQL Server®.
Objectives
After completing this module, you will be able to:
Describe data modeling techniques.
Lesson 1
Data Modeling
This lesson provides an overview of common data modeling techniques, so that you can understand them
and how they are used.
Lesson Objectives
After completing this lesson, you will be able to:
For example, a database for a training company might need to store information about:
Students
Tutors
Courses
Venues
Course presentations
These items would be modeled as entities, with each entity containing the relevant details, or attributes,
for that item.
Entities and relationships are discussed in detail later in this module, but for now, it is sufficient that you
understand the basic purpose of entity/relationship modeling. That is, to show what information is
required to be stored and the relationships that need to exist between the information types.
Entities and relationships are usually shown in an entity-relationship diagram (ERD). In this diagram,
entities are shown as blocks, with the lines between them representing the relationships. A filled circle
joining a relationship to an entity signifies that the entity is required; an open circle signifies that the
entity is optional. The crow’s foot shows that more than one of that entity can be related to one instance
of the related entity.
Normalization
Normalization is a method of sequentially
applying rules to a database design to remove
data redundancies, improve stability, eliminate
update anomalies, and improve data integrity.
These rules are divided into a series of normal
forms as introduced in Module 1 of this course.
The normal forms are applied to a table
sequentially; that is, a table must be in first normal
form before you can apply second normal form,
and so on.
Validate the process design. Is the process sufficiently understood and mapped correctly?
It may appear as though the processes have been defined to a sufficient level of detail. However, by
considering how the processes need to access the data stored in the database, TPA can help you
ensure that simple data access paths exist, and that you fully understand the process logic.
Form the basis of security/privacy design. Which processes need to do what to which data?
TPA is not only concerned with entity types and relationships. For each process, you should determine
how well the data model supports it. For this, you need to define which attributes should be selected
and how they are to be used. This can help you define the basis of the security and privacy
framework.
Form the basis of physical database design. With TPA, you can identify potential navigation problems
early, helping you to:
Identify where keys should be added to entities to avoid having to access them via a relationship
from another entity.
Recognize when derivable data should be held in an entity (rather than derived on the fly).
Whether the data model will support the required business processing.
Verify the correctness of the statement by placing a mark in the column to the right.
Statement Answer
True or false? You can use entity-relationship modeling to show the way
the elements in the database are interconnected.
Lesson 2
ANSI-SPARC Database Model
The ANSI-SPARC database model is an abstract model for a DBMS. It stands for American National
Standards Institute, Standards Planning and Requirements Committee—but is not a formal standard. It
consists of three levels, each of which is associated with a database schema.
This lesson describes the three levels of the ANSI-SPARC database model and their usage. It also
introduces database schemas.
Lesson Objectives
After completing this lesson, you will be able to:
External Level
The external level of the ANSI-SPARC model
defines how the data in a database will be viewed
by different users. Each user might need to see
different parts of the data, or to see the data
arranged in different ways. These requirements are
handled by the use of views, each extracting and
organizing the information in the required ways.
You can also use the external view to restrict access to data to authorized users.
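For example, in SQL Server you can implement an external-level view of the data by using views; the following sketch reuses the Sales.Customer table and the user Lin from Module 1, and the view name is illustrative:

External View Example
CREATE VIEW Sales.CustomerSummary
AS
SELECT CustomerID, AccountNumber
FROM Sales.Customer;
GO

-- Lin can query the view, but has no access to the columns that it excludes.
GRANT SELECT ON OBJECT::Sales.CustomerSummary TO Lin;
GO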
Conceptual Level
The conceptual level of an ANSI-SPARC database
model represents the data requirements for an
organization. It is important to recognize that:
o Entities
o Attributes
o Relationships
It usually comprises an ERD, entity headings, and any constraints and/or assumptions.
Constraints
The following constraints exist:
Assumptions
There are no assumptions made in this example.
Internal Level
The internal level of an ANSI-SPARC database
model is used to show how the database is
physically represented in the computer system. It
describes how the data is actually stored in the
database and on the computer hardware.
The internal view defines the actual record layouts
of each table in a database. Information from the
conceptual view is used to create the physical files,
using SQL commands. These physical files will
store the data described by the conceptual views.
Database Schemas
A database schema can graphically show the way
a particular database is organized. There are three
different types of schema corresponding to the
three levels in the ANSI-SPARC architecture
previously described:
The internal schema, the lowest level, contains definitions of the stored records, the methods of
representation, the data fields, and indexes. There is only one internal schema per database.
External level
Conceptual level
Internal level
Lesson 3
Entity Relationship Modeling
This lesson expands on the technique of Entity Relationship Modeling introduced in a previous lesson.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the types of attribute that an entity can have and their use as keys.
Identify the different types of relationships and how they are implemented.
Read and understand an ERD and appreciate the different notations used.
Entities
An entity is a data modeling concept that
represents a real-world object about which
information is required to be stored. An entity can
represent a physical object, such as a person,
place, or a thing; or it could represent a virtual
object, such as an event.
Entities can have the following associated with
them:
Name: the name of the entity, used to
uniquely identify the entity.
Growth: whether the number of instances of a particular entity is fixed, or can increase; also specifies
the rate at which an entity can grow.
In a database, an entity is represented by a table. Within each table, each row represents one
instance of that entity. These instances are also referred to as “tuples”. Each tuple is, by definition, distinct
from all other tuples within the same table. This is because a primary key always exists.
An informal definition of a primary key is an attribute, or combination of attributes, where no two rows
(tuples) in a table (entity) can have the same value(s) for the attribute(s) comprising the primary key.
Attributes
A Person entity, for example, might have attributes such as:
First name
Last name
Date of birth
Gender
… and so on …
An attribute is a noun or noun phrase, specified in
the singular. Attributes representing the same type
of thing (such as names, dates, and so on) are declared on domains. Domains are the means by which the
values of attributes can be formalized. This means that, if we have attributes such as StartDate and
EndDate, these are common attributes whose values are comparable. They can therefore be declared on
the same domain.
domains
PersonNames = string
Years = 2000 to 2016
TitlesOfCourses = string
An example of a full domain declaration is shown in the model answers to the labs at the end of this
module.
Purpose of Attributes
Attributes, or combination of attributes, make each row (tuple) within a table (entity) unique. In the
Person example above, there may be several instances with the same First Name. We therefore need to
include the Last Name attribute in combination, to try to achieve this uniqueness. It may be necessary to
include other attributes to further ensure uniqueness.
Candidate Keys
By definition, tuples are distinct. This implies that, for some attribute, or combination of attributes, within
an entity, no two tuples will have the same value for that attribute, or combination of attributes. We call
such an attribute (or combination of attributes), K, a candidate key if, and only if, it has the properties of
uniqueness and minimality.
By uniqueness, we mean that, at any given time, no two tuples of the entity have the same value for the
attribute K. By minimality, we mean that, if K is a combination of attributes, no attribute can be discarded
from the combination without destroying its uniqueness.
Every entity has at least one candidate key, by virtue of the distinctness of tuples. Note that a candidate
key is a constraint that determines what is and what is not allowed in a particular row.
Primary Key
The primary key of an entity is an attribute (or combination of attributes) that has the properties of
uniqueness and minimality.
Every primary key is a candidate key. An entity may have more than one candidate key; in this case, just
one of these candidate keys is designated as the primary key. The remaining candidate keys are termed
alternate keys. When there is only one candidate key, that is the primary key and there are no alternate
keys. Where there is a choice between alternate keys for the primary key, the choice is (to a large extent)
arbitrary.
Primary and alternate keys are identified in a database schema as part of an entity definition. It is
important to include this in a schema, because an alternate key (like a primary key) is a constraint.
Foreign Key
The foreign key is an attribute (or combination of attributes) in one entity whose values are the same as
values of the primary key in another.
The major purpose of a foreign key is to simplify the way in which the schema represents relationships
between entities. This is done by taking the primary key of one entity and including it as a foreign key in another. Note
that the foreign key has no uniqueness property for the entity where it has been placed.
The use of foreign keys is the mechanism for representing a one-to-many (1:n) relationship. The entity on
the many (n) side includes, as a foreign key, the primary key of the entity on the one (1) side of the relationship.
Relationships
In a conceptual model, a relationship can exist
between two entities. It is a named, meaningful
link between entities. It represents how the
attributes in one entity are related to those in the
other and can be thought of as a verb phrase that
links the two entities to form a sentence, as in the examples given below.
We will discuss the types of relationships that can exist (and what they represent) in the next topic. For
now, it is sufficient to appreciate that, between two entities, a relationship can define that: for one
instance in one entity, there may be none, one, or many related instances in the other. Relationships can
be read in both directions. Consider the following examples:
STUDENT attends COURSE PRESENTATION.
A STUDENT attends one COURSE PRESENTATION, but there may be several STUDENTS on a particular
COURSE PRESENTATION.
A COMPANY may sponsor many STUDENTs, but each STUDENT will be sponsored by one COMPANY.
A COURSE PRESENTATION will be a presentation of one COURSE TYPE on a specific date and at a specific
place; COURSE TYPEs may be presented on many different dates and/or at many different places.
When modeling relationships, it is important to not only document the fact that a relationship exists, but
also what it means. This removes ambiguity and the potential for misunderstanding, as different people
may interpret the relationship differently.
Types of Relationships
Relationships can be of several different types, described by their cardinality. As
explained previously, they are implemented by
using the primary key from one entity as a foreign
key in the other.
One-to-One
In a one-to-one relationship, an instance in one entity is related to one (and only one) instance in the
other. For example, consider two entities, one containing Customer Names, and the other holding
Customer Details (address, telephone number, and so on). In this scenario, each Customer will have only
one set of details.
One-to-Many
In a one-to-many relationship, an instance in one entity can be related to many instances in the other. For
example, consider two entities, one containing information about Customers, and the other holding
details of Orders. It is likely that each Customer might have placed one or more orders (many), but each
order has only been placed by one Customer.
Many-to-Many
In a many-to-many relationship, many instances in one entity are related to many instances in the other.
For example, consider two entities, one containing information about Students (STUDENT), and the other
containing information about Courses (COURSE). It is likely that Students will be enrolled on many
courses and each Course will have many Students enrolled on it.
Many-to-many relationships cannot be directly implemented in a relational database because they would
violate table rules—in that it would require multiple values in a particular column for a row. By the rules,
each row in a table must be atomic; that is to say, for each row, the value within a column is always one
value and never a group of values. In a many-to-many relationship, it would be required that a set of
values for the primary key in one entity is included as a single value of a foreign key in the other. Such
sets of values are not permitted by the logic in the relational model.
Many-to-many relationships are modeled using an intersection entity. In the above example, we could use
an entity named COURSE ENROLMENT. There would be a one-to-many relationship between the entities
STUDENT and COURSE ENROLMENT, indicating that one Student may be enrolled on many Courses; and
a one-to-many relationship between the entities COURSE and COURSE ENROLMENT, indicating that one
Course may have several Students enrolled on it.
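A sketch of how this design might be implemented as tables (the names and data types are illustrative; the physical implementation of relationships is covered later in the course):

CREATE TABLE dbo.Student
(
    StudentID int NOT NULL PRIMARY KEY
    -- other student attributes would go here
);

CREATE TABLE dbo.Course
(
    CourseID int NOT NULL PRIMARY KEY
    -- other course attributes would go here
);

-- The intersection entity: each row pairs one student with one course.
CREATE TABLE dbo.CourseEnrolment
(
    StudentID int NOT NULL REFERENCES dbo.Student (StudentID),
    CourseID  int NOT NULL REFERENCES dbo.Course (CourseID),
    PRIMARY KEY (StudentID, CourseID) -- each student can enroll on each course once
);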
Notations
There are several different methods for identifying
and describing relationships between entities. So
far in this module, we have used the symbols
shown in the slide. They have the following
meanings:
When crow’s feet appear at both ends of the relationship line, this identifies a many-to-many
relationship. Recall that this type of relationship cannot be implemented, and so an intersection entity
is inserted to make two one-to-many relationships.
There are other methods for illustrating relationships. For example, the Oracle Case Method uses many of
the symbols previously described. The following different symbols are used:
A straight line indicates a mandatory relationship, while a dashed line indicates an optional
relationship.
A small straight vertical bar appearing perpendicular to the horizontal relationship line at either end
indicates that the entity contains a unique identifier as an attribute.
Non-transferable relationships (small diamond shape): this symbol is used to define a relationship in
which, once an instance has been associated with an instance of the other entity, the association cannot be transferred to a different instance.
For entities:
Mandatory attribute (asterisk symbol *): this symbol identifies attributes that must always contain a
value and cannot be NULL.
Key attributes (the hash symbol #): this symbol is used to identify an attribute of an entity that is a
unique identifier.
Optional attributes (small circle symbol or the letter “o”): this symbol identifies attributes that do not
require a value; the “o” is often omitted.
Verify the correctness of the statement by placing a mark in the column to the right.
Statement: True or false? You can remove one or more attributes from a multi-attribute candidate key without detriment.
Answer:
The company needs to keep details of staff and students, the courses that are offered, and the
performance of students on these courses. Students are registered with the company before
commencing their studies, and are issued with a student identification number. Students are
not required to enroll on any course when they register. Students’ full names and the date of
their registration are recorded. Staff are also issued with a staff number, and their full names
are recorded. Each staff member may tutor one or more students on courses. Each student
has a tutor for each course on which they are enrolled. Students are allocated a tutor for the
course on which they are enrolled at any time after enrollment.
Each course that is available for study is given a course code, a title, and a value for credit
(either 0.5 or 1.0). Students are not permitted to enroll on more than three credits worth of
courses in any one year. There is a need to record current enrollments only.
Courses may have a quota—the maximum number of students that can be enrolled on the
course in any one year. A course may not (yet) have any students enrolled on it. Each course
may have up to five assignments that the students are required to complete. These are
graded by the tutor assigned to the individual student. The grade for each of these
assignments must be recorded as a mark out of 100.
As a Business Analyst, your task is to model these data requirements and develop a conceptual model
comprising:
An ERD, showing entities and relationships.
Objectives
After completing this lab, you will be able to:
4. Add these attributes to the various entities in the list you developed in the previous task.
5. From these attributes, identify the candidate keys and primary keys.
Results: After completing this exercise, you will have an initial list of entities and attributes that model the
data requirements for the brief provided. The entity definitions will include appropriate domains.
6. What problems do you see for modeling relationships with the model in its current form?
Results: After completing this exercise, you will have an initial list of relationships between the entities
that model the data requirements for the brief provided. You will have an initial ERD and have resolved
any relationships that cannot be modeled (many-to-many relationships).
Results: After completing this exercise, you will have a final data model meeting the original specification.
Changes to requirements.
Review Question(s)
Question: How would you approach data modeling?
Tools
There are several ER modeling tools that can be used for the type of modeling discussed in this module.
However, for simple models, Microsoft Visio® is sufficiently powerful.
Module 3
Normalization
Contents:
Module Overview 3-1
Lesson 1: Fundamentals of Normalization 3-2
Module Overview
Normalization is a commonly implemented, powerful technique for creating efficient databases that are
best suited to the tasks they will perform. The concepts used in normalization come from mathematical
relational theory, and sometimes definitions of normal form use specialist terminology. This module
describes normalization and normal form by using examples to clarify the points being made.
Objectives
After completing this module, you will be able to:
Outline the benefits of normalization and some of the concepts that underpin it.
Explain the benefits of denormalization and outline some techniques for denormalization.
Lesson 1
Fundamentals of Normalization
It is essential to understand the various benefits of normalization so that you know what it is that you are
trying to achieve when applying the principles that this module explains. Additionally, to better
understand the process of normalization, it is important to ensure that you are familiar with the relevant
terminology. This lesson explains the benefits that normalizing a database can bring, and introduces some
key terms and concepts.
Lesson Objectives
After completing this lesson, you will be able to:
Explain the terms primary key, candidate key, and surrogate key.
Benefits of Normalization
Broadly speaking, the process of normalization
helps to deliver a database schema that is
standardized, and which ensures that data is
consistent, with minimal redundancy. When you
apply normalization guidelines to a database
design, you can achieve many benefits, including:
A database that is more consistent. Actions
such as modifying or deleting data can result
in data anomalies. For example, if you store
the contact details for a customer in a
Customers table and also store the same
contact details in every order for that
customer in an Orders table, when you update the address, you must update it in all of these
locations, to avoid anomalies. In a normalized design, you would store the contact details only once,
so that when you update the contact details, you do not create anomalies.
A database that you can extend without having to redesign it. In a normalized database design,
each table represents one entity, and the columns in a table are properties that describe that entity.
This makes for a very simple design that should not need to be changed much (if at all) when you
make subsequent changes to the database schema, such as adding new entities.
An intuitive, easy-to-use database. A normalized database typically stores data in a way that
mirrors real-world processes, which makes it easy for users. For example, a database that models the
process of customers placing orders might have a Customers table that contains customer
information, a Products table that contains product information, and an Orders table that contains
order information by reference to the Customers and Products tables.
More efficient use of storage media. Including less redundant data in a database results in a
smaller database that requires less storage space.
Keys
One of the most important principles in relational
database design is that every row in a table must
be unique so that it can be unambiguously
identified. Database designers use keys to achieve
this.
Primary keys
A primary key is an attribute value, or a
combination of attribute values, that is unique for
every row in a table. To ensure that the values in a
primary key column are never duplicated, you
create a primary key constraint on the primary key
column. A primary key constraint automatically
prevents the addition of any data value to the column if that value already exists in the column. Primary
keys that include more than one column are sometimes referred to as compound keys.
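For example, the following Transact-SQL sketch (the table and column names are hypothetical) adds a compound primary key constraint to an existing table:

ALTER TABLE dbo.OrderLines
ADD CONSTRAINT PK_OrderLines PRIMARY KEY (OrderID, LineNumber);
-- Both columns must already be defined as NOT NULL for the constraint to be created.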
Candidate keys
To select the primary key for a table, a database designer first identifies every column that contains
unique data values. Each of these columns is a candidate key for the table, and any candidate key can
potentially serve as the table’s primary key. Candidate keys can also be combinations of columns in which
the combined values are always unique, even if the individual values in each column are not. When
identifying candidate keys, you should avoid columns that are configured to allow NULL values, because
NULLs are not valid in a primary key. This includes columns that might not currently contain NULLs but
whose definition allows them, because this means that the column might contain NULLs at some future
point.
Surrogate keys
If it is not possible to identify a suitable candidate key, you can use a surrogate key. A surrogate key is a
value that you create to serve as a candidate key, and which has no meaning outside the context of the
database. For example, imagine a Products table that includes a Product Number column that you
identify as a candidate key. Product number values are used in other applications to identify products, so
they have meaning beyond the database context.
On investigation, you discover that product numbers are occasionally reused to represent different
products, meaning that the Product Number column is not a good candidate key because values are not
guaranteed to be unique. Instead, you could create a column called Product ID in the Products table to
act as a surrogate key. Product ID would contain a unique value for each row, helping you to differentiate
between all products, even those that have the same product number. Surrogate key columns often
simply contain integer values that the database generates automatically.
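In SQL Server, a surrogate key of this kind is typically implemented by using the IDENTITY property, which generates the integer values automatically. A minimal sketch of the Products table described above (the data types are assumptions):

CREATE TABLE dbo.Products
(
    ProductID     int           NOT NULL IDENTITY(1,1) PRIMARY KEY, -- surrogate key
    ProductNumber nvarchar(25)  NOT NULL,  -- natural key; values might be reused over time
    ProductName   nvarchar(100) NOT NULL
);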
Note: In contrast to a surrogate key, a candidate key that represents a value that exists in
the real world, such as the product number in the earlier example, is sometimes called a natural
key. When identifying candidate keys, you should be cautious about assuming that natural key
values are unique. For example, you might reasonably assume that values such as social security
numbers and passport IDs are unique within a country—but they may not be unique in a
worldwide context.
Demonstration Steps
1. Start the 10985C-MIA-DC and 10985C-MIA-SQL virtual machines, and then log on to 10985C-
MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
2. In the D:\Demofiles\Mod03 folder, right-click Setup.cmd, and then click Run as administrator.
3. In the User Account Control dialog box, click Yes, and wait for setup to complete.
4. Open Microsoft SQL Server Management Studio, and then connect to the MIA-SQL instance of
the database engine by using Windows Authentication.
5. On the File menu, point to Open, and then click File.
6. In the Open File dialog box, navigate to the D:\Demofiles\Mod03 folder, click Candidate Keys.sql,
and then click Open.
7. Under the comment View the columns in the Production.Product table and identify the total
number of rows, select the Transact-SQL statement, and then click Execute.
8. Review the results, and in the results pane, in the bottom right corner, note the number of rows that
the query returned.
9. Under the comment Assess ProductID as a candidate key, select the Transact-SQL statement, and
then click Execute.
10. Review the results, and in the results pane, in the bottom right corner, note that the number of rows
that the query returned is the same as for the query in step 7. This means that ProductID is a
candidate key because it contains a unique value for each of the rows in the table.
11. Under the comment Assess Name as a candidate key, select the Transact-SQL statement, and then
click Execute.
12. Review the results, and in the results pane, in the bottom right corner, note that the number of rows
that the query returned is the same as for the query in step 7. This means that Name is a candidate
key because it contains a unique value for each of the rows in the table.
13. Under the comment Assess ProductNumber as a candidate key, select the Transact-SQL statement,
and then click Execute.
14. Review the results, and in the results pane, in the bottom right corner, note that the number of rows
that the query returned is the same as for the query in step 7. This means that ProductNumber is a
candidate key because it contains a unique value for each of the rows in the table.
15. Under the comment Assess Color as a candidate key, select the Transact-SQL statement, and then
click Execute.
16. Review the results, and in the results pane, in the bottom right corner, note that the number of rows
that the query returned is less than the number of rows that the query in step 7 returned. This means
that Color is not a candidate key because it does not contain a unique value for each of the rows in
the table.
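The statements in Candidate Keys.sql are not reproduced here, but an assessment of this kind can be written by comparing the number of distinct values in a column with the total number of rows in the table. For example (a sketch against the Production.Product table):

-- Total number of rows in the table:
SELECT COUNT(*) AS TotalRows
FROM Production.Product;

-- If DistinctColors is lower than TotalRows, Color cannot be a candidate key.
-- Note that COUNT(DISTINCT ...) ignores NULLs, which also disqualify a column
-- from being a candidate key:
SELECT COUNT(DISTINCT Color) AS DistinctColors
FROM Production.Product;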
Dependencies
In addition to ensuring uniqueness, another goal
of normalization is to ensure that non-key values
in tables have a dependency on the key. Non-key
values are the values that are present in a table
but which do not form part of the primary key.
Dependency means that the values of non-key
attributes are determined by the key itself.
Identifying dependencies is part of the process of
normalization, and helps designers to decide
which columns to include in a table and which
ones to move to other tables.
Functional dependency
An attribute is said to have a functional dependency on the key when it is possible to determine the
attribute’s value solely from the key. For example, imagine an Employees table that has a primary key
column called SSN, which contains an individual employee’s social security number. Social security
numbers are government issued values that are intended to uniquely identify an individual for social
security and tax purposes—so they are good candidate keys.
Other columns in the Employees table include Employee Last Name, Date of Birth, Job Title, Phone
Number, and Department. The SSN value in a row determines the values in the other columns; every time
you look up a particular SSN, it will always return the same values for these columns. This relationship is
often expressed by using the notation A → C, which means that the value A determines the value C. In the
Employee table example, the following dependencies exist:
SSN → Employee Last Name
SSN → Date of Birth
SSN → Job Title
SSN → Phone Number
SSN → Department
Note that functional dependencies do not work in reverse. For example, Job Title → SSN does not work
because the value in the Job Title column could be the same in many rows, so looking up a particular job
title value is not guaranteed to return the same, single SSN value every time. Similarly, Employee Last
Name → Department is not a valid dependency. The company might employ several people with the
same last name, who work in different departments, so looking up a name would not reliably return the
same department value.
Determining functional dependencies in tables requires knowledge of the real world. For example, if you
did not know the real-world fact that an SSN is uniquely associated with an individual, you would not be
able to determine the functional relationships in the earlier example. If SSNs were shared in the real world,
then returning any given SSN would not be guaranteed to return the same attribute values every time—
so there would not be a functional dependency.
Transitive dependency
Transitive dependencies are indirect dependencies that can be summarized as A → B → C. This means
that the value of C is not directly dependent on the key value A, but only indirectly dependent on it. For
example, in the Employees table previously shown, imagine that there is also a column called Salary, and
that the salary that an employee earns depends on their job title, not on the individual. The dependency
SSN → Salary is not valid in this case, and you would correctly express the relationship as SSN → Job Title
→ Salary.
The problem with transitive dependencies is that, if you remove or change the intermediate value, the
relationship between the other values is no longer valid. In the Employees table example, changing the
Job Title value for a row would break the link between the SSN and Salary values, resulting in inconsistent
data. Furthermore, it would be possible to have employee rows with the same Job Title value, but with
different Salary values—and that would not be correct.
To correctly identify dependencies, it is essential to understand the data. To identify the transitive
dependency in the earlier example required knowledge about how salaries were linked to job titles.
Without this knowledge, we might have assumed that salaries were linked to the individual, and
consequently that the dependency SSN → Salary was valid.
Multivalued dependency
Multivalued dependency refers to the situation where the value in a key column can have multiple
possible corresponding values in attributes that depend upon that column. This contrasts with functional
dependencies, which always return the same value for any given key value. For example, imagine a table
with three columns: Training Courses, which stores training course codes; Trainers, which stores the names
of the trainers who deliver the courses; and Course Books, which stores the titles of the course books. Any
training course can be delivered by multiple different trainers, and each trainer can deliver multiple
courses; so each time a training course appears in a table, it might be associated with a different trainer
value. In other words, for any given value of Training Course, there are multiple possible values for
Trainers.
The relationship between Training Courses and Trainers is an example of a many-to-many relationship. A
second many-to-many relationship exists between Training Courses and Course Books because a course
can use more than one book, and any book might be used in multiple courses. To represent the various
possible combinations of courses, trainers, and books, the same data values will appear in the table
multiple times. For example, a course “Course1” that can be delivered by “Trainer1” and “Trainer2”, and
that uses two course books, would require a row for every combination of trainer and book.
The multivalued dependencies are Training Courses →→ Trainers and Training Courses →→ Course Books. For
example, for the Training Course value “Course1”, values of the dependent attribute Trainers include
Trainer1 and Trainer2. This situation leads to redundancy because the same values are stored multiple
times.
What type of key can you use when you cannot identify an appropriate primary key
from the existing columns for a table?
Surrogate key
Candidate key
Lesson 2
Normal Form
You can normalize a database by implementing a set of guidelines that are collectively referred to as
“normal form”. The majority of database designs only take account of the first three levels of normal
form—called first normal form, second normal form, and third normal form—because they are more
widely applicable than the others. When a database complies with, for example, first normal form, the
database is said to be “in first normal form”. The levels of normal form are cumulative—this means, for
example, that for a database to be in second normal form, it must also be in first normal form. This lesson
explains the different levels of normal form and how to implement them.
Lesson Objectives
After completing this lesson, you will be able to:
Explain the different levels of normal form and how to implement them.
First Normal Form
When assessing whether a table complies with first normal form, you can use the following checks:
Are there any columns that contain data that includes separator values, such as commas?
Are there any columns with similar names, perhaps differentiated by numbers at the end, such as
Address1, Address2, and so on?
Note: Breaking data out from a single table into multiple tables is called “lossless
decomposition”, and is a large part of the process of normalization. When you decompose a
table into multiple tables, the process is lossless because you never remove any data from the
database; you just relocate it to different tables.
The new PhoneNumbers table would have two columns, a unique primary key identifier called Phone ID
and a column called Phone Number. Neither of these tables contain repeating data groups, but both have
unique identifiers and a fixed number of columns—so both are in first normal form. However, this still
leaves you with a problem: how do you associate the phone numbers with the employees? You cannot
simply add columns to the Employees table again to contain the primary key values for the associated
phone numbers because this defeats the purpose of removing the phone numbers in the first place. To
solve this problem, you would create a third table called EmployeePhone with two columns, called
Employee ID and Phone ID. Each Employee ID could appear in the table multiple times, each time with a
different Phone ID. This also allows a given Phone ID to appear with multiple Employee ID values, which
reflects the real-world situation that some phone numbers might be shared by employees. In the
EmployeePhone table, the combination of the Employee ID and Phone ID columns would be unique for
each row, so it is a candidate key. You could also add a dedicated surrogate key column to the table if
required. To ensure that the values in the EmployeePhone table are valid, you would use foreign key
constraints.
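A sketch of the three tables that this design describes, including the foreign key constraints (the data types are assumptions):

CREATE TABLE dbo.Employees
(
    EmployeeID int NOT NULL PRIMARY KEY
    -- other employee columns would go here
);

CREATE TABLE dbo.PhoneNumbers
(
    PhoneID     int          NOT NULL PRIMARY KEY,
    PhoneNumber nvarchar(25) NOT NULL
);

CREATE TABLE dbo.EmployeePhone
(
    EmployeeID int NOT NULL REFERENCES dbo.Employees (EmployeeID),
    PhoneID    int NOT NULL REFERENCES dbo.PhoneNumbers (PhoneID),
    PRIMARY KEY (EmployeeID, PhoneID) -- the candidate key described above
);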
For example, imagine a variation on the Employee table, in which there is a composite primary key that
consists of the SSN column and another column called Start Date. The Start Date column records the date
that an employee started working at the organization. Employees frequently leave and then return to the
company—this means that the organization can differentiate between the different periods of
employment. The other columns in the Employees table, Employee Last Name, Date of Birth, and Address,
all have a dependency on the SSN, but they do not depend on the Start Date. The problem with this
situation is that it leads to redundancy. Every time you add a new row for an employee who has worked
for the organization before, you would need to repeat all of the Employee Last Name, Date of Birth, and
Address data that was in the previous row for that employee.
To eliminate this problem, you could decompose the table into two tables: Employees, which would
contain the SSN, Employee Last Name, and Date of Birth columns, and EmployeeDate, which would
contain the SSN and Start Date columns. The combination of SSN and Start Date forms the primary key in
the EmployeeDate table, and SSN is the primary key in Employees. The Employee Last Name and Date of
Birth columns now depend entirely on the key value. A foreign key constraint ensures that the SSN values
in EmployeeDate are consistent with the SSN values in Employees.
To solve this problem, you would decompose the table into two tables:
An Employees table with only the columns SSN, which is the primary key, Employee Last Name, Date
of Birth, and Job Title.
A TitleSalary table with the columns Job Title, which is the primary key, and Salary.
Now the salary for any given job title is stored only once, in the TitleSalary table, and referenced from the
Employees table. Consequently, there is now no possibility that different salaries could be added for the
same job title.
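A sketch of the decomposed design (the data types are assumptions, and this sketch is independent of the earlier examples; SSN is shown as a character column):

CREATE TABLE dbo.TitleSalary
(
    JobTitle nvarchar(50) NOT NULL PRIMARY KEY,
    Salary   money        NOT NULL
);

CREATE TABLE dbo.Employees
(
    SSN              char(11)     NOT NULL PRIMARY KEY,
    EmployeeLastName nvarchar(50) NOT NULL,
    DateOfBirth      date         NOT NULL,
    JobTitle         nvarchar(50) NOT NULL REFERENCES dbo.TitleSalary (JobTitle)
);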
Demonstration Steps
1. In the D:\Demofiles\Mod03 folder, double-click the Normalize Book Data.xlsx file.
2. On the Raw Data worksheet, review the data and note the following points:
3. On the 1NF worksheet, note the changes that have been made to bring the table into first normal
form. These changes include:
a. Adding a composite primary key consisting of the Book Title, Copy Number, and Pressing
columns.
4. Note that the columns Year Published, Author, Author Year of Birth, and Author Age all contain
repeating data values. This is because these columns are not functionally dependent on the whole of
the primary key.
5. On the 2NF worksheet, note the changes that have been made to conform to second normal form.
The table has been decomposed into four tables: Book, Author, Copy, and Pressing. The columns in
each table depend on the whole of the primary key in their respective tables.
6. On the 3NF worksheet, note the change that has been made to ensure that the Author table
conforms to third normal form. The Author Age column has been removed because it was
transitively dependent on the primary key, through the Author Date of Birth non-key column. If
required, the age of each author can be calculated by using the date of birth value, so it is not
necessary to store it.
There are two seating areas, one of which has a better view than the other, so it has a higher rate.
However, the cost of booking a ticket also varies with the time of the performance; the later performance
time being the most expensive. There are four candidate keys in the table, one of which is the
combination of Seating Area and Start Time.
The candidate keys have attributes in common—for example, End Time is part of two candidate keys.
Every attribute is part of a candidate key, so there are no non-key attributes present in the table.
Consequently, because second normal form and third normal form deal with dependencies of non-key
attributes on candidate keys, the table is already in third normal form. However, there is another
dependency in the table that is problematic. Any given Rate value is always associated with the same
Seating Area value, so the dependency Rate → Seating Area must hold for every row. With the limited
data set in the previous table, the dependency does appear to hold. However, nothing in the table design
enforces it: it would be possible to enter a booking for seating area 1 with a rate of Premium, which
should not logically be possible, and which would make Rate → Seating Area invalid.
To prevent this, you would decompose the table into the following two tables:
Rates
Rate      Seating Area
Basic     1
Standard  1
Higher    2
Premium   2
Bookings
Now it is not possible to enter a booking for a rate that is incompatible with the seating area, and the
tables are in BCNF. In limited situations, you might also normalize to fourth normal form and fifth normal
form.
The table redundantly stores multiple Skill values for each language spoken (or conversely, multiple
Language values for each skill). To eliminate this redundancy, you can decompose the table to create two
new tables:
EmployeeSkill, which contains the Employee column and the Skill column.
EmployeeLanguage, which contains the Employee column and the Language column.
Now each skill and each language is stored only once per employee.
An important point to note is that, once again, knowledge of the real-world data is essential when
deciding whether to decompose a table. In the previous example Employee table, there is an assumption
that each employee can perform each skill in every language. As a result, it is necessary to add a row for
every combination of skill and language for each employee, which leads to redundancy. However,
imagine if some skills could only be performed in one language; for example, employee 1 can do first line
support and technical documentation in English.
However, in German, they can only do first line support because their written skills are not good enough
for technical documentation. The row that combines technical documentation and German is therefore
not required, and so data redundancy is reduced. If this were a very common scenario with other
employees’ skills and languages, you might consider leaving the table in third normal form or BCNF, and
not decomposing it to comply with fourth normal form.
The following Reseller table stores data about the resellers that a company uses to sell its products. The
table contains data about two resellers, two regions in which the resellers operate, and three products.
Resellers can operate in either region, or in both, and they can sell all of the three products, or only some
of them.
Reseller  Region  Product
1         North   Product A
1         South   Product A
1         South   Product B
2         North   Product A
2         North   Product C
Reseller 1 operates in both the North and South regions, and sells only products A and B.
Reseller 2 operates only in the North region, and sells only products A and C.
The table does not store other possible combinations, such as reseller 1, North, and Product C
because, in the real world, this combination is meaningless—reseller 1 does not sell Product C.
In this scenario, it is appropriate to leave the table as it is because only limited subsets of the possible
combinations of attribute values are stored. This leads to some repetition of data, but this is not
redundant data—it is required to maintain the information about which resellers sell which products, in
which regions.
However, if the real-world situation was that all resellers operate in all regions and sell all products, then
the table would need to store all possible combinations of attribute values to accurately represent this
fact. This would lead to the storage of redundant data because, if all combinations are valid, this can
simply be assumed; there is no need to store all of the combinations. To eliminate the redundant data,
you could decompose the table into three tables, each of which has two columns, as follows:
ResellerRegion, which contains the Reseller and Region columns to represent all possible
combinations of these columns.
RegionProduct, which contains the Region and Product columns to represent all possible
combinations of these columns.
ResellerProduct, which contains the Reseller and Product columns to represent all possible
combinations of these columns.
In practice, normalizing to fourth normal form and fifth normal form is relatively uncommon; most
transactional relational databases are normalized to third normal form or BCNF at most.
Categorize Activity
Match each description with the correct level of normal form. Indicate your answer by writing the
category number to the right of each item.
Items
1 Provide a primary key as a unique identifier for rows, and eliminate data groups that
repeat across columns.
2 Ensure that all non-key columns are functionally dependent on the whole of the
primary key.
3 Eliminate attributes that are only transitively dependent on the primary key.
Lesson 3
Denormalization
When a database will be used primarily for create, read, update, and delete (CRUD) operations, using the
normalization guidelines described in the preceding lesson will generally result in better performance.
However, it is not always the case that normalization will produce a better performing database, so you
should regard normalization and normal forms as guidelines rather than rules when designing a database
schema.
Lesson Objectives
After completing this lesson, you will be able to:
Introduction to Denormalization
Decomposing tables to implement the various
levels of normal form is standard practice for OLTP
databases, but it is important to be aware of
situations in which normalization is not optimal,
and can actually be counter-productive. Two types of schema illustrate the extremes:
Highly normalized OLTP schemas, which support transactional workloads with frequent create,
update, and delete operations, and which benefit from minimal redundancy.
Star and snowflake schemas, which are used in data warehouses. Data warehouses support
analytical operations, and experience almost exclusively read-based workloads, with periodic data
loads to bring them up to date. These databases are generally not highly normalized because storing
redundant data can improve read performance.
It is probably best to regard these two types of schema as two extreme endpoints on a spectrum, with
most databases sitting somewhere in between; albeit that many databases will be closer to one end or the
other. You should not simply assume that, because you are designing an OLTP database, you should
always normalize it as much as possible.
One common denormalization technique is storing derived data that you can calculate from existing data; this technique is described later in this lesson.
The optimal design for any given database is highly specific, and depends on the way that it will be used.
For example, if you are designing a schema for an OLTP database, but you know that certain attributes
will also be used frequently for queries, you should normalize the database for OLTP performance, and
then consider whether to selectively denormalize for those attributes. Denormalization usually reduces the
number of joins needed to process queries. Reintroducing redundancy in this way is likely to have an
adverse effect on CRUD operations, and a good design will find the right balance between read
performance and CRUD performance.
Benefits of Denormalization
Normalizing a database through lossless
decomposition reduces data redundancy; this then
reduces the amount of data in the database.
However, by definition, decomposing tables
increases the number of tables in the database. In
other words, a highly normalized database will be
smaller in terms of data volume, but it will have a
more complex schema than a version of the same
database that has not been normalized. This
complexity can lead to various problems that you
can address through denormalization.
Improved concurrency
When users attempt to access data concurrently, an RDBMS must ensure that inconsistencies don’t occur.
For example, if one user reads a data value from a table, and a second user simultaneously updates the
same data value, the first user’s data is now out of date, and any subsequent decisions that they make
using this data will be based on inaccurate information. RDBMSs use mechanisms such as locking to
handle these kinds of situations. Locks prevent applications from making changes to data if that data is
being used by other applications. When many users need access to the same tables at the same time,
locking can impede performance because applications have to wait until resources are released before
making changes. By introducing redundant data, you can potentially reduce the impact of locks and
improve performance.
Ease of use
Users who query a database need to understand the database schema so that they can locate the
required attributes and understand how those attributes relate to attributes in other tables. A
denormalized database can reduce the complexity of the schema for users, making it easier to work with.
An alternative to denormalization that addresses this specific problem is to use views. A view is a database
object that provides a view of a part of a database that is easier to understand. When using a view, a user
might see only one logical table when, in reality, this logical table is composed of several database tables.
However, views do not address performance or concurrency issues that arise from normalization.
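For example, the following sketch (the table and column names are hypothetical) defines a view that presents several normalized tables as one logical table:

CREATE VIEW dbo.OrderSummary
AS
SELECT o.OrderID, o.OrderDate, c.CustomerName, p.ProductName
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
JOIN dbo.OrderLines AS ol ON ol.OrderID = o.OrderID
JOIN dbo.Products AS p ON p.ProductID = ol.ProductID;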
An important point to note with denormalization approaches that duplicate data across tables is that future insert, update, and delete
operations must be performed in both tables to maintain the integrity of the data.
Report tables
You create a report table specifically to handle queries for a specific report or set of reports. Reports can
generate queries that require a great deal of processing power and take a long time to run, because they
often contain multiple joins and complex calculations. A report table, also sometimes called a pre-joined
table, is a table that contains all of the attributes and calculations required to service reporting queries, so
that the joins and calculations do not have to be performed when the report is generated. Because a report is
typically a snapshot in time, it is not usually necessary to keep report tables synchronized with other
tables—you would typically repopulate the report table periodically.
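A report table of this kind might be repopulated periodically by using a query such as the following sketch (the names are hypothetical; SELECT … INTO creates the target table, so the table is dropped and rebuilt on each refresh):

DROP TABLE IF EXISTS dbo.MonthlySalesReport; -- DROP TABLE IF EXISTS requires SQL Server 2016 or later

SELECT c.CustomerName,
       p.ProductName,
       SUM(ol.Quantity * ol.UnitPrice) AS TotalValue
INTO dbo.MonthlySalesReport
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
JOIN dbo.OrderLines AS ol ON ol.OrderID = o.OrderID
JOIN dbo.Products AS p ON p.ProductID = ol.ProductID
GROUP BY c.CustomerName, p.ProductName;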
Duplicate tables
A duplicate table, also called a mirror table, is an exact copy of an existing table. You would usually create
a duplicate table to address issues with contention. For example, imagine a table that is heavily used by
two applications, one that inserts new data and updates existing data, and one that reads data. In this
scenario, there will be frequent occasions when one application must wait for the other to complete its
actions before it can perform its own actions. By creating an exact copy of a table and dedicating each
copy to one of the applications, you can reduce contention, and speed up response times.
Split tables
A split table is a single table that is divided in two, based on the needs of the applications that use the
table. You can create split tables when two or more applications exclusively use different parts of the
table. Contention exists in this situation because applications can cause the entire table to be locked.
When you split a table, you place all of the columns used by one application in the first table, and the
columns used by the other application in the second table. However, both tables need to store the same
primary key value, so that you can logically recreate the original table by using a query.
The scenario just described is called a vertical split because it involves splitting the table by columns. An
alternative is to split horizontally, by row. This might be useful if applications access different rows in the
same table—for example, one application might deal with current data rows, and a second application
might deal with historical data rows. When you split a table, you can either remove the original table, or
retain it in addition to the new tables—for example, so that a third application can use it. Splitting tables
is also sometimes referred to as over-normalizing because it involves the lossless decomposition of a
table.
For example, you can use the FirstName and LastName columns in an Employee table and concatenate
them in a query to create a FullName column. Adding a calculated column or derived column that stores
this information in the table can speed up response times, because it is not necessary to concatenate data
or perform calculations at query time. When you add a column whose values derive from other non-key
columns in the table, you increase redundancy; furthermore, because this process involves adding a
transitive dependency, the table will no longer meet the requirements of third normal form.
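In SQL Server, this can be implemented as a computed column. A minimal sketch against an assumed Employee table:

ALTER TABLE dbo.Employee
ADD FullName AS (FirstName + N' ' + LastName) PERSISTED;
-- PERSISTED stores the concatenated value physically, so it does not have to be
-- recalculated at query time; SQL Server updates it automatically whenever the
-- FirstName or LastName values change.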
All of the techniques that this topic describes are advanced, and using them requires a solid
understanding of the concepts involved, in addition to the knowledge of the data and the types of
queries associated with a specific database. Additionally, any schema designs should be thoroughly tested
before implementation to ensure that they meet the particular requirements for the database.
Updates. Updates to denormalized data can be problematic because you need to update each piece
of data consistently—this can be difficult because the data item might be stored multiple times.
Remember that eliminating this kind of redundancy is one of the key reasons why you normalize
databases in the first place. For example, with duplicate tables, you have two copies of the same data.
You need to either ensure that the application code performs inserts, updates, and deletes
on both tables at the same time; or, if some latency is acceptable, that the tables are synchronized on
a periodic schedule.
Data anomalies. When a database stores redundant data, the likelihood of data anomalies is much
greater because of the difficulty of updating data consistently.
Hardware. The more powerful the hardware, the smaller the effect that denormalization will have on
performance. For example, servers that have large amounts of memory to cache query results will be
able to deliver fast query response times for similar queries, even for highly normalized databases.
In-memory technologies. In-memory technologies, such as In-Memory OLTP and Columnstore
indexes in SQL Server, make it easier to maintain a normalized database without incurring the query
response time cost—this minimizes the need to denormalize. However, these technologies are not
suitable for every database.
Testing. Before introducing denormalized tables into a production environment, it is essential that
you test query response times to ensure that the cost of denormalization is offset by the benefits.
Objectives
After completing this lab, you will have:
2. Review the data on the Raw Data tab in D:\Labfiles\Lab03\Starter\Normalize to 1NF.xlsx and
note any aspects that cause it to violate first normal form.
Results: After completing this exercise, you will have normalized a data set to first normal form.
Results: After completing this exercise, you will have normalized the data set to second normal form.
Results: After completing this exercise, you will have normalized the data set to third normal form.
Results: After completing this exercise, you will have denormalized a table.
Question: In the suggested solution, the Orders table has a surrogate key as its primary key.
Why do you think this key was chosen instead of a composite candidate key, such as the
OrderDate and Customer columns?
Question: In the suggested solution, in the Customers table, are there any other columns
that you could break down even further? If so, why might you do this?
Outline the benefits of normalization and some of the concepts that underpin it.
Review Question(s)
Question: Can you think of a data set that you are familiar with, from your place of work or
elsewhere, that you could normalize? What steps would you take to normalize to third
normal form?
Module 4
Relationships
Contents:
Module Overview 4-1
Lesson 1: Introduction to Relationships 4-2
Module Overview
In Module 2 of this course, you learned about entities and the relationships that can exist between them.
You also saw how it is possible to model these relationships logically by using entity-relationship diagrams
(ERDs). In Module 3, you learned about normalizing a design to obtain the optimal level of data
redundancy. In this module, you will learn how to implement table relationships in a SQL Server®
database.
Objectives
After completing this module, you will be able to:
Describe the considerations for planning referential integrity in a SQL Server database.
Lesson 1
Introduction to Relationships
An ERD describes the entities in a database and the relationships between them. Schema mapping is the
process of converting those logical designs into a physical implementation by creating tables, constraints,
and other objects in a database. This lesson describes the various types of relationships that you can
implement in a SQL Server database, and explains how to create them.
Lesson Objectives
After completing this lesson, you will be able to:
Explain the considerations for NULL values in optional and required relationships.
Referential Integrity
In an ERD, you specify the relationships that exist between
the attributes in entities. In a relational database, you
implement entities as tables and attributes as columns, and
you implement relationships by enforcing referential integrity.
Referential integrity
Enforcing referential integrity between attributes involves
ensuring that the values in one attribute have
corresponding values in a second attribute. This ensures that the data remains meaningful. For example,
in an Orders table in an OLTP database, there might be a column called CustomerID that stores a value
that uniquely identifies the customer who placed any given order. However, in a normalized database, the
Orders table would not contain detailed customer data; this data would be in a dedicated Customers
table. The values in the CustomerID column in the Orders table must have corresponding values in the
Customers table to ensure that the data makes sense. Any row in the Orders table with a CustomerID
value that does not exist in the Customers table would represent an order without a matching customer.
By maintaining referential integrity, you can prevent this kind of scenario from occurring.
One-to-Many Relationships
Usually, the most common type of relationship in a relational
database is the one-to-many relationship. In this kind of
relationship, a value can appear only once in a column in the
first table, but the same value can appear many times in a
column in the second table. Typically, the two columns are in
different tables, but this does not have to be the case.
The Customers and Orders tables in the previous topic are an example of a one-to-many relationship. Each
customer appears once in the Customers table but can appear multiple times in the Orders table. This
reflects the business fact that one customer can place many orders, but each order can only be placed by
one customer. Other examples of one-to-many relationships include:
The relationship between states and countries. A country can contain many states, but a state can
only be in one country.
The relationship between equipment and workers in an equipment inventory. One worker can
borrow multiple pieces of equipment, but each piece of equipment can only be loaned to one
worker.
Note that sometimes the relationship is defined by real-world facts—as is the case with states and
countries—and sometimes by the logic that the database embodies, as is the case with the equipment
inventory example. In the latter, it is assumed that workers do not share equipment, so the database
models this fact as a one-to-many relationship. If workers did share equipment, then this would not be a
one-to-many relationship. Understanding the business processes that a database will model is an essential
part of the design process.
The following code example creates two tables, called Customers and Orders. The table definition for
Customers includes a primary key on the CustomerID column. The table definition for Orders includes a
primary key on the OrderID column and a foreign key on the CustomerID column, which references the
CustomerID column in Customers.
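The code example itself is not reproduced in this extract; a minimal sketch that matches the description (the data types are assumptions) might look like this:

CREATE TABLE dbo.Customers
(
    CustomerID   int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);

CREATE TABLE dbo.Orders
(
    OrderID    int  NOT NULL PRIMARY KEY,
    OrderDate  date NOT NULL,
    CustomerID int  NOT NULL
        CONSTRAINT FK_Orders_Customers
        FOREIGN KEY REFERENCES dbo.Customers (CustomerID)
);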
You can create a foreign key that references a different type of constraint called a unique constraint
instead of a primary key. A table can only have one primary key, so if you need to impose uniqueness
on an additional column, you can use a unique constraint to do this. However, the majority of foreign
keys reference primary keys.
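The Employees table example that the next paragraph refers to is also not reproduced in this extract; a sketch consistent with the description (the data types are assumptions) might be:

CREATE TABLE dbo.Departments
(
    DepartmentID   int          NOT NULL PRIMARY KEY,
    DepartmentName nvarchar(50) NOT NULL
);

CREATE TABLE dbo.Employees
(
    EmployeeID   int NOT NULL PRIMARY KEY,
    DepartmentID int NOT NULL -- NOT NULL makes the relationship mandatory
        REFERENCES dbo.Departments (DepartmentID)
);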
You cannot create a primary key on a column that allows NULLs, but a foreign key column does not have
this restriction. Whether or not you allow NULLs in a foreign key column depends on the business logic
that defines the relationship. For example, in the Employees table in the previous code example, the
foreign key column DepartmentID is designated NOT NULL, meaning that the column cannot contain
NULLs. The logic that determines this is that every employee must be associated with a department within
a company—so every employee must have an associated, valid DepartmentID value. This is an example of
a mandatory relationship, in which allowing NULLs would break the business rule.
However, the business rule could state that employees did not have to be associated with a specific
department, perhaps because employees are associated with multiple departments, change departments
frequently, or because the company does not have a clearly defined departmental structure. In this
scenario, you could allow NULLs in the foreign key DepartmentID column. This would mean that,
wherever possible, employees could be associated with a department, but this would also allow for
employees whose departmental status was uncertain or unpredictable. This is an example of an optional
relationship.
One-to-One Relationships
A one-to-one relationship is similar to a one-to-
many relationship, the difference being that, in a
one-to-one relationship, every value in the
primary key column has a single matching value in
the foreign key column. Generally, one-to-one
relationships are not as common as one-to-many
relationships. For example, imagine that a gym
database stores information about its customers in
a Customers table—this could include their height,
age, weight, and other relevant physical
characteristics. If the majority of queries against
the Customers table return only the Name and
Address columns, and not the Age, Weight, Height, and other columns, it might be more efficient to
decompose the Customers table and store these columns in a separate CustomerDetails table.
Decomposing the table in this way might improve response times because it would be possible to read
the smaller Customers table from disk more quickly. In this example, both tables would probably use
CustomerID as the primary key.
To maintain referential integrity, you would create a foreign key relationship between the CustomerID
column in CustomerDetails and the CustomerID column in Customers. If the relationship is mandatory, the
“direction” of the foreign key (from CustomerDetails to Customers or from Customers to CustomerDetails)
does not really matter. You could create it in either table, referencing the other table in the foreign key
definition. However, if the relationship is optional, and NULLs will be allowed on one side of the
relationship, you should create the foreign key on the side of the relationship that allows NULLs.
Note: A foreign key does not guarantee uniqueness in the way that a primary key does, so in a one-to-one relationship, the foreign key column could contain duplicate values. To ensure
uniqueness in the foreign key column, you can use a primary key or a unique constraint.
Many-to-Many Relationships
A many-to-many relationship exists when the data
values in both sides of the relationship can be
represented multiple times. This differs from a
one-to-many relationship, in which the values on
one side of the relationship must be unique. For
example, imagine an Employees table that has
been normalized so that the data about each
employee’s phone number and department are
stored in separate tables, called PhoneNumbers
and Departments. Employees includes the column
EmployeeID, PhoneNumbers includes the column
PhoneNumberID, and Departments includes the
column DepartmentID. Each table contains additional columns that depend on these primary key
columns. The business logic states that each employee can have multiple phone numbers, and that any
given phone number might be used by multiple employees—so this is a many-to-many relationship.
Similarly, business logic states that an employee can belong to more than one department, and of course,
each department includes multiple employees.
To implement the many-to-many relationship between employees and phone numbers, you create an intersection table (also called a junction table):
Create a table called EmployeePhoneNumber with the columns EmployeeID and PhoneNumberID.
These two columns serve as a composite primary key for the table.
Create a foreign key relationship on the EmployeeID column in EmployeePhoneNumber that references
EmployeeID in Employees.
Create a foreign key relationship on the PhoneNumberID column in EmployeePhoneNumber that references
PhoneNumberID in PhoneNumbers.
The intersection table EmployeePhoneNumber can contain multiple values for the EmployeeID and
PhoneNumberID columns, but the combination of these two values is unique for each row. You could
create an intersection table called EmployeeDepartment in the same way to enable the many-to-many
relationship between Employees and Departments.
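A minimal sketch of the intersection table, assuming that the Employees and PhoneNumbers tables already exist with the primary keys EmployeeID and PhoneNumberID:

CREATE TABLE EmployeePhoneNumber
(
    EmployeeID    int NOT NULL,
    PhoneNumberID int NOT NULL,
    -- The composite primary key makes each combination of values unique.
    CONSTRAINT PK_EmployeePhoneNumber PRIMARY KEY (EmployeeID, PhoneNumberID),
    CONSTRAINT FK_EmployeePhoneNumber_Employees FOREIGN KEY (EmployeeID)
        REFERENCES Employees (EmployeeID),
    CONSTRAINT FK_EmployeePhoneNumber_PhoneNumbers FOREIGN KEY (PhoneNumberID)
        REFERENCES PhoneNumbers (PhoneNumberID)
);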
Generalization. Generalization involves storing all of the attributes of the sub-types in a single table.
For example, you could create a single Employees table that includes all of the attributes required to
describe the sub-types SalesPerson and Technician. In this scenario, there would not be a dedicated
table for each of the sub-types. Generalized tables can often contain a large number of NULL values
because many attributes will not be relevant to all of the sub-types. For example, the attribute
AccountID, which identifies the account that a salesperson manages, is not relevant to technicians.
Similarly, the SkillID attribute, which identifies the technical skill of a technician, is not relevant to
salespeople. Consequently, you would expect to see a lot of NULLs in these two columns. A table that
has a large number of NULL values is sometimes called a sparse table.
Specialization. Specialization involves creating a table to store the common attributes, and
additionally a separate table for each sub-type to store the attributes that are uniquely relevant to
that sub-type. For example, you could create a parent Employees table that contains the shared
attributes, and a Technicians table and a SalesPerson table to store the attributes that are specific to
those sub-types. All three tables could use EmployeeID as a primary key, and the sub-type tables
would have foreign keys that reference the EmployeeID column in Employees in a one-to-one
relationship.
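The following sketch shows one way to implement specialization; the non-key columns are assumptions used only for illustration:

CREATE TABLE Employees
(
    EmployeeID   int         NOT NULL,
    EmployeeName varchar(50) NOT NULL, -- shared attribute
    CONSTRAINT PK_Employees PRIMARY KEY (EmployeeID)
);

CREATE TABLE SalesPerson
(
    EmployeeID int NOT NULL,
    AccountID  int NOT NULL, -- attribute relevant only to salespeople
    CONSTRAINT PK_SalesPerson PRIMARY KEY (EmployeeID),
    CONSTRAINT FK_SalesPerson_Employees FOREIGN KEY (EmployeeID)
        REFERENCES Employees (EmployeeID) -- one-to-one with Employees
);

CREATE TABLE Technicians
(
    EmployeeID int NOT NULL,
    SkillID    int NOT NULL, -- attribute relevant only to technicians
    CONSTRAINT PK_Technicians PRIMARY KEY (EmployeeID),
    CONSTRAINT FK_Technicians_Employees FOREIGN KEY (EmployeeID)
        REFERENCES Employees (EmployeeID)
);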
Self-Referencing Relationships
A self-referencing relationship occurs when a one-
to-many or one-to-one relationship exists within a
single table. For example, imagine an Employees
table that includes the columns EmployeeID and
ManagerID. The ManagerID column indicates the
employee ID of the manager for any given
employee. The values in the ManagerID column
are a subset of the values in the EmployeeID
column. To maintain referential integrity between
the columns, you must create a foreign key on the
ManagerID column that references the
EmployeeID column.
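A minimal sketch of such a table (non-key columns omitted); ManagerID is shown as nullable here, and the next paragraph discusses when NULLs should be allowed:

CREATE TABLE Employees
(
    EmployeeID int NOT NULL,
    ManagerID  int NULL,
    CONSTRAINT PK_Employees PRIMARY KEY (EmployeeID),
    CONSTRAINT FK_Employees_Manager FOREIGN KEY (ManagerID)
        REFERENCES Employees (EmployeeID) -- self-referencing foreign key
);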
A self-referencing relationship might be optional or it might be required. For example, in the Employees
table, a business rule might be that every employee should have a manager (including the managers
themselves); this would be a required relationship because no NULLs would be allowed in the ManagerID
column. However, if the business rule states that some managers do not need to have managers of their
own (for example, the CEO), then it becomes an optional relationship because the ManagerID column
must allow NULLs.
You are planning a database that tracks the courses that students attend at a
college. Each course can include up to 25 students, and students can enroll in
multiple courses at the same time. You have created an ERD that includes the
entities Students and Courses. What type of relationship exists between these
entities?
A one-to-many relationship.
A one-to-one relationship.
A many-to-many relationship.
A super-type/sub-type relationship.
Lesson 2
Planning Referential Integrity
Foreign key constraints limit the values that you can add to a column, and they also prevent the update or
deletion of values in the referenced column, when those values also exist in the foreign key column. While
this is the default behavior for foreign key constraints, there are some scenarios in which you might want
to alter this behavior—for example, by propagating updates or deletes from the referenced column to the
foreign key column. This lesson describes the options for implementing referential integrity, and explains
the considerations for implementing cascading referential integrity.
Lesson Objectives
After completing this lesson, you will be able to:
AFTER triggers. Typically, you use AFTER triggers to maintain referential integrity, but only when a
foreign key constraint is not suitable. For example, you might use an AFTER trigger if it is necessary to
maintain integrity across multiple databases because foreign key constraints can only enforce
integrity within a database. As the name suggests, an AFTER trigger occurs after an operation to
insert, update, or delete data completes. If you used an AFTER trigger to duplicate the behavior of a
foreign key constraint and only allow valid data values in a particular column, the trigger would
achieve this by first allowing the change to occur—and only then checking the data values and
reversing or rolling back the data change if required. This approach is less efficient than just
preventing invalid changes in the first place, as a foreign key constraint does.
INSTEAD OF triggers. INSTEAD OF triggers operate by executing the Transact-SQL statement stored
in the trigger instead of performing the triggering action, such as an update. For example, if you want
to prevent a column from being updated at all, you could use an INSTEAD OF trigger to return a
message explaining that updates are not allowed. The trigger would occur in response to an update,
the update would not occur, and the message would be returned instead. INSTEAD OF triggers can
be useful for enforcing data rules that constraints cannot enforce. For example, you could use a
CHECK constraint to enforce a limit of the quantity of items allowed in any one order. The constraint
would check the number of items, and allow the transaction to go ahead if the number of items is
below the threshold, and prevent it if not. However, if this kind of checking operation requires
referencing data in a different table, a CHECK constraint is not suitable, so you could use a trigger
instead. You can use the Transact-SQL CREATE TRIGGER statement to create triggers.
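For example, the following hypothetical INSTEAD OF trigger rejects all updates to an assumed table named Sales.OrderArchive; the table name and message are illustrative:

CREATE TRIGGER TR_OrderArchive_BlockUpdates
ON Sales.OrderArchive
INSTEAD OF UPDATE
AS
BEGIN
    -- The triggering UPDATE is never performed; only this message is returned.
    PRINT 'Updates to Sales.OrderArchive are not allowed.';
END;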
DML Triggers
http://aka.ms/jv2m72
When you create a foreign key constraint, you can define how to handle cascading referential integrity.
You can define a separate action for updates and for deletes. The options include:
NO ACTION. If you configure NO ACTION, then the change is not allowed, and the user receives an
error message. This is the default setting, which applies if you do not specify any other action.
CASCADE. The CASCADE option allows the change to occur in the primary key column and also
causes the change to propagate to the column in the foreign key table. For example, if you update a
CustomerID value in Customers, all corresponding instances of the value in CustomerID in Orders will
also be updated to the new value. Whilst this might initially seem like the ideal solution, you should
be very cautious about implementing this option because it can have serious consequences, such as:
o If you configure the foreign key to cascade deletes from the primary key table, the deletion of a
row in the primary key table will cause the deletion of all corresponding rows in the foreign key
table. For example, deleting a customer from the Customers table would also delete all of that
customer’s orders.
o If you cascade updates, the tables in the database will maintain referential integrity. However, it is
often important to maintain integrity across multiple databases; for example, you might
periodically load order data that is older than one month into a data warehouse. The data
warehouse might contain orders for customer A. If you update customer A’s CustomerID record
in Customers in the OLTP database to customer Z, the older records in the data warehouse will
not be updated. When you next load orders into the data warehouse, the records for Customer A
and Customer Z will not be recognized as the same customer. You can use triggers to maintain
integrity between different databases.
o If cascading is configured across multiple tables, it is possible for a single delete action to have
far-reaching consequences. It is important, therefore, to plan carefully to decide whether
cascading is appropriate behavior in the specific context of any given database.
SET NULL. The SET NULL option sets the value in the referencing foreign key column to NULL if the
primary key value is deleted or updated. This can lead to rows in the foreign key table that are sometimes called orphans, because they have no corresponding value in the parent primary key table.
SET DEFAULT. The SET DEFAULT option works in a similar way to the SET NULL option, except that
instead of setting values to NULL, it sets them to a default value, such as “unknown” or “parent
deleted”.
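For example, the following illustrative constraint, using the Customers and Orders tables from earlier, cascades updates but blocks deletes:

ALTER TABLE Orders
ADD CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerID)
    REFERENCES Customers (CustomerID)
    ON UPDATE CASCADE    -- changes to Customers.CustomerID propagate to Orders
    ON DELETE NO ACTION; -- deleting a referenced customer raises an error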
Note: The ability to cascade changes to data across related tables and columns is extremely
powerful, but you should always plan very carefully before implementing it. Without proper
planning and implementation, cascading changes can have serious consequences and can
potentially result in the unintended deletion of data.
Demonstration Steps
1. Start the 10985C-MIA-DC and 10985C-MIA-SQL virtual machines, and then log on to 10985C-MIA-
SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
5. On the File menu, point to Open, click File, browse to D:\Demofiles\Mod04, click Referential
Integrity.sql, and then click Open.
6. In the query window, under the comment Create a database and change database context, select
the Transact-SQL statement and then click Execute.
7. In the query window, under the comment Create Customers table with a primary key constraint
and four rows, review the Transact-SQL statement, select the Transact-SQL statement, and then click
Execute.
8. In the query window, under the comment Create a table with a foreign key constraint, review the
Transact-SQL statement, select the Transact-SQL statement, and then click Execute.
9. In the query window, under the comment Test foreign key constraint by adding an order with a
valid CustomerID value, review the Transact-SQL statement, select the Transact-SQL statement, and
then click Execute.
12. In the Results pane, review the message that states that the insert statement conflicted with the
foreign key constraint.
13. In the query window, under the comment Update a CustomerID value in Customers that has no
matching value in Orders, review the Transact-SQL statement, select the Transact-SQL statement,
and then click Execute.
15. In the query window, under the comment Update a CustomerID value in Customers that has a
matching value in Orders, review the Transact-SQL statement, select the Transact-SQL statement,
and then click Execute.
16. In the Results pane, review the message that states that the update statement conflicted with the
constraint.
17. In the query window, under the comment Drop foreign key and add new foreign key that
specifies cascading referential integrity, review the Transact-SQL statement, select the Transact-
SQL statement, and then click Execute.
18. In the query window, under the comment Test cascading referential integrity, review the Transact-
SQL statement, select the Transact-SQL statement, and then click Execute.
19. In the Results pane, review the results, noting that the row in the Orders table was successfully
updated.
A junior administrator reported that, when she deleted a row in a table in your
organization’s OLTP database, the action unexpectedly caused the deletion of more
than 100 rows. You successfully verify that the necessary foreign key constraints
exist. What was the most likely cause of the unexpected deletions?
A foreign key constraint was configured with the ON DELETE SET NULL option.
A foreign key constraint was configured with the ON DELETE SET DEFAULT
option.
A foreign key constraint was configured with the ON DELETE CASCADE option.
Objectives
After completing this lab, you will have:
Ensure that the 10985C-MIA-DC and 10985C-MIA-SQL virtual machines are both running, and then log
on to 10985C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
Every line item must be associated with a product and a valid order.
In this exercise, you will use a database diagram to help you to plan which foreign key constraints you will
need to enforce these rules.
The main tasks for this exercise are as follows:
2. Identify Relationships
3. Plan Foreign Keys
2. In the CustomerOrders database, create a database diagram, and then add the following tables to it:
o CustomerDetails
o Customers
o Orders
o Products
o LineItems
3. Review the database diagram and determine the relationships between the tables and what types of
relationships they are (one-to-one, one-to-many, or many-to-many).
2. After planning the foreign keys, save the database diagram as Keys.
Results: After completing this exercise, you will have identified the keys required to enforce referential
integrity rules.
To implement the required referential integrity, you need to create the following foreign key constraints:
2. To test the foreign key, write a Transact-SQL INSERT statement to add a row to the Orders table using the following values:
o OrderID = 105
o CustomerID = 2
o OrderDate = GETDATE()
3. This INSERT statement should succeed.
4. To test the foreign key again, write a Transact-SQL INSERT statement to add a row to the
Orders table using the following values:
o OrderID = 106
o CustomerID = 5
o OrderDate = GETDATE()
5. This INSERT statement should be prevented by the foreign key constraint because the CustomerID
value 5 does not exist in the Customers table.
2. To test the foreign key, write a Transact-SQL INSERT statement to add a row to the
CustomerDetails table using the following values:
o CustomerID = 5
o Address = '9832 Mt. Dias Blv.'
o City = 'Chicago'
o Postal Code = '97321'
o DateOfBirth = '08/09/1970'
3. This INSERT statement should be prevented by the foreign key constraint because the CustomerID
value 5 does not exist in the Customers table.
2. To test the foreign key, write a Transact-SQL INSERT statement to add a row to the
LineItems table using the following values:
o OrderID = 101
o ProductID = 33
o UnitPrice = 30.00
o Quantity = 1
3. This INSERT statement should succeed.
4. To test the foreign key again, write a Transact-SQL INSERT statement to add a row to the
LineItems table using the following values:
o OrderID = 106
o ProductID = 44
o UnitPrice = 30.00
o Quantity = 1
5. This INSERT statement should be prevented by the foreign key constraint because the OrderID value
106 does not exist in the Orders table.
6. In the same query window in SQL Server Management Studio, write a Transact-SQL ALTER TABLE
statement to create a foreign key constraint called FK_LineItems_Products on the ProductID
column in the LineItems table that references the ProductID column in the Products table. Do not
include an ON UPDATE or ON DELETE clause.
7. To test the foreign key, write a Transact-SQL INSERT statement to add a row to the
LineItems table using the following values:
o OrderID = 102
o ProductID = 22
o UnitPrice = 15.00
o Quantity = 1
8. This INSERT statement should succeed.
9. To test the foreign key again, write a Transact-SQL INSERT statement to add a row to the
LineItems table using the following values:
o OrderID = 104
o ProductID = 66
o UnitPrice = 30.00
o Quantity = 1
10. This INSERT statement should be prevented by the foreign key constraint because the ProductID
value 66 does not exist in the Products table.
11. Close the query window and save the file as CreateForeignKeys.sql in the
D:\Labfiles\Lab04\Starter folder.
12. Open the Keys database diagram and review it, noting that it includes the foreign key relationships
that you created in this exercise.
Results: After completing this exercise, you will have implemented referential integrity in the
OrdersDatabase database using constraints.
2. Type and execute a Transact-SQL statement that adds a row to the Customers table using the
following values:
o CustomerID = 0
o FirstName = 'Not Applicable'
o LastName = 'Not Applicable'
3. Type and execute an ALTER TABLE Transact-SQL statement to drop the FK_Orders_Customers
constraint in the Orders table.
4. Type and execute an ALTER TABLE Transact-SQL statement to create a new foreign key constraint
called FK_Orders_Customers on the Orders table. Include an ON DELETE SET DEFAULT clause so that delete actions set the foreign key value to its default.
5. Attempt to delete the row from the Customers table that has the CustomerID value 2 again. The
delete should succeed.
Results: After completing this exercise, you will have implemented cascading referential integrity.
Question: Do you think that it was a good idea to implement the ON DELETE CASCADE and
the ON DELETE SET DEFAULT options in the final exercise in the lab? What problems might
this potentially cause? What might you have done instead to prevent these problems?
Describe the considerations for planning referential integrity in a SQL Server database.
Review Question(s)
Question: Think about the kinds of processes that exist in your organization. What entities
can you identify and what types of relationships exist between them? Think of three or four
specific entities—what constraints would you implement to enforce referential integrity, and
would you use options such as CASCADE or SET NULL?
Module 5
Performance
Contents:
Module Overview 5-1
Lesson 1: Indexing 5-2
Module Overview
This module describes the effects of database design on performance.
Objectives
At the end of this module, you will have gained an appreciation of the performance effects of database
design and be able to discuss the implications of:
Indexing
Query Performance
Concurrency
Logical design
Physical design
Index design
Query design
Execution
Some of these items, such as Logical and Physical design, are beyond the scope of this lesson. You will be
concentrating on the last three items.
Lesson 1
Indexing
Discuss the performance effects of indexing.
For report and query writers, having a basic understanding of indexes is useful in identifying poorly
performing queries. Being able to identify problems with indexes, such as spotting a table scan when you
expected an index to be used, can be helpful in tuning an application. Similarly, SQL Server® might opt to
scan an entire table when you expect it to use a nonclustered index. Understanding that nonclustered indexes involve an additional lookup operation, versus scanning the table in a single pass, might help you understand the choices that the optimizer makes.
For more information, see How Indexes are Used by the Query Optimizer in the topic Clustered and
Nonclustered Indexes Described, in Microsoft Docs:
Clustered Index
In SQL Server, a table may not have any indexes,
or it may have one or more. Indexes are not
required for data access, although there are some features, such as PRIMARY KEY and UNIQUE constraints, that create indexes to support them; these indexes cannot be removed without dropping the constraint.
A database developer or administrator may choose to add a clustered index to a table. A clustered index
causes the rows in the table to be logically stored in order of the column(s) specified in the index, called
the index key. The columns used as the index key in a clustered index are called the clustering key. Tables
with clustered indexes are maintained in index order, and rows are inserted into the correct logical
location determined by their index key value. Rows are stored with their index key value. Clustered
indexes are not stored as separate structures in the database.
Because the clustered index determines the order of the rows in the table, only one clustered index is
permitted per table.
When SQL Server searches for and locates an index key value in the clustered index, it also locates data for
that row. No additional navigation is required, except in the case of special data types. Conceptually, a
table with a clustered index is like a dictionary, whose terms are the index key. The terms appear in the
dictionary in alphabetical order. When you search the dictionary for a term and locate it, you are also at
the definition for the term.
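A brief illustrative example—the table and column names are assumptions, and the table is assumed to have no existing clustered index:

-- Create the clustered index that logically orders the table by OrderID.
CREATE CLUSTERED INDEX CIX_Orders_OrderID
ON Orders (OrderID);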
Because there is only one clustered index permitted per table, care should be taken when choosing the
column(s) used as the clustering key. For more information, see Create Clustered Indexes in Microsoft
Docs:
Nonclustered Index
In addition to clustered indexes, SQL Server
supports another common type, called a
nonclustered index. Nonclustered indexes are
separate structures that contain one or more
columns as a key, in addition to a pointer back to
the source row in the table. Nonclustered indexes
may be created on tables that are heaps, or on
tables organized by clustered indexes. The pointer
information stored in the nonclustered index
depends on whether the underlying table is
organized as a heap or by a clustered index, but is
otherwise transparent to a query.
Because the nonclustered index is a separate structure and stores only key data and a pointer, queries that
use a nonclustered index may require an additional lookup operation before accessing the underlying
data. Conceptually, a nonclustered index is like a subject index printed at the back of a book. A sorted list
of items appears, along with a page number that points to the location of the subject in the main section
of the book. When a key value is located in a nonclustered index, SQL Server may use the pointer to
locate the data in the table itself.
Nonclustered indexes are often added to tables to improve specific query performance and avoid
resource-intensive table scans. However, nonclustered indexes require additional disk space for storage,
and are updated in real time as the underlying data changes, adding to the duration of transactions. For
these reasons, database administrators may decide not to add nonclustered indexes on every column
referenced by a query.
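For example, queries that frequently filter orders by customer might benefit from an index like this illustrative one:

-- Nonclustered index to support queries that filter on CustomerID.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON Orders (CustomerID);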
Distribution Statistics
SQL Server creates and maintains statistics on the
distribution of values in table columns. Distribution
statistics are used by the query optimizer to estimate the
number of rows involved in a query (selectivity). The
optimizer uses this information to help determine the
optimal access method for the query. For example,
statistics about the number of rows in a table might cause
SQL Server to seek through an index rather than scan the
entire table.
By default, distribution statistics are automatically created and maintained by SQL Server for:
The key columns of indexes, when the indexes are created.
Single columns that are used in query predicates, when the AUTO_CREATE_STATISTICS database option is enabled (it is enabled by default).
Administrators and database developers may also manually create statistics and update them manually, or
by way of scheduled jobs.
Note: Manually creating, updating, and reviewing statistics is beyond the scope of this
course. For more information, see Microsoft course 20762: Developing SQL Databases.
Demonstration Steps
1. In the virtual machine, on the taskbar, click Microsoft SQL Server Management Studio.
2. In the Connect to Server dialog box, in the Server name box, type MIA-SQL. In the Authentication
box, ensure Windows Authentication is selected, and then click Connect.
3. On the File menu, point to Open, click File, navigate to the D:\Demofiles\Mod05\Demo1.sql script
file, and then click Open.
4. To turn statistics on, highlight the code under Step 1 and click Execute.
5. To run a query with no indexes, highlight the code under Step 2 and click Execute.
6. To create indexes, highlight the code under Step 3 and click Execute.
7. To run a query with indexes, highlight the code under Step 4 and click Execute.
8. To drop the indexes, highlight the code under Step 5 and click Execute.
Lesson 2
Query Performance
Describe performance effects of join and search types.
Lesson Objectives
At the end of this lesson, you will have gained an appreciation of the performance effects of the following
join and search types:
Joins
Hash
Merge
Loop
Searches
Search Arguments
Non-Search Arguments
Joins
Here is a quick overview of JOINs. There are five main types of JOINs: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, and CROSS. Note that the OUTER keyword is optional—LEFT OUTER JOIN and RIGHT OUTER JOIN are normally shortened to LEFT JOIN and RIGHT JOIN—and a JOIN with no qualifier defaults to INNER JOIN. The purpose of a JOIN is to gather information from one or more tables and return it from a single query.
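For example, the following query joins the illustrative Customers and Orders tables used earlier in this course; the column names are assumptions:

SELECT C.CustomerName, O.OrderID, O.OrderDate
FROM Customers AS C
INNER JOIN Orders AS O
    ON C.CustomerID = O.CustomerID; -- match rows on the shared key column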
There are JOIN techniques that can be used in an attempt to improve performance.
Hash Joins
The first join technique is to make use of hash joins.
Hash joins are built using hash tables, which are assembled in memory by SQL Server. Using a hash table consists of two phases.
Build phase
In this phase, the smaller of the two tables is read, and the key values on which the JOIN is based (the equi-join predicate, for example, ... ON a.id = b.id) are hashed using a proprietary hash function and then put into a hash table.
Probe phase
In this phase, the rows of the second table are read and their keys hashed; each computed hash is compared against the hashes in the hash table, with the matching results set aside until the second table has been read in full. Finally, the output results are retrieved and presented as the query results.
For very large tables, and for tables with large row sizes, the hash tables may have to be flushed to disk—this can incur I/O penalties, particularly when using non-SSD disk drives. This constraint also naturally limits the number of hash joins that can be held in memory and executed at any one time. Consequently, forcing a hash join in a frequently-used stored procedure in a typical OLTP database can hinder, rather than help, query performance.
Loop Joins
A loop join operates differently from a hash join, in that it is best suited to joining a small table to a larger one. The join is constructed in a nested fashion—the smaller table typically serves as the outer table, and each row from the outer table is compared against the rows in the larger, inner table until all matches have been found. Consequently, these types of join are very efficient when the outer table is small; however, they can suffer in proportion to the size of the inner table if it lacks a suitable index. Nested loop joins also come in a variety of types—a naive nested loop join, which is a comparison of every row in A to every row in B; a temporary index nested loop join, where a b-tree index is created during joining and discarded after use; and an index nested loop join, where a permanent index is used to optimize the loop join. Consequently, good indexes on the tables can influence the decision of the query optimizer to select the loop join, if it will complete more efficiently than a hash join.
Merge Joins
A merge join is ideally suited to presorted tables
of similar size. The query optimizer will use a
merging algorithm to compare the rows in a
similar way to the “merge sort” computer science
principle. Each sorted row in A is compared to the
equivalent sorted row in B. If duplicate rows exist
on the left or right, the set (Cartesian product) of
rows is compared before moving on; however, in
most cases, the merge operation is a
straightforward comparison rather than a many-
to-many operation. This is an efficient way of
joining with large data sets, especially those that
are already presorted (such as clustered indexes) using the join predicate.
The scans are performed in parallel. When a match is found, the next search continues where the last one
left off.
In the slide, Employee E2 in row 20 of the outer table has a match in row 60 of the sorted inner table. The
next comparison will be between row 21 of the outer table and row 61 of the inner table. A match will be
found in row 62 of the inner table. The next scan will be between row 22 of the outer table and row 63 of
the inner table, and so on, until the scan between row 100 of the outer table and row 100,000 of the inner
table.
Although this join is fast, the rows in the tables must be sorted first if no suitable index exists, which incurs a performance penalty.
Imagine that the outer table contains 100 rows and the inner table contains 100,000 rows. This results in
only 100,000 comparisons (the number of rows in the inner table).
Summary
Here is a summary of when to use the various types of join:
HASH JOIN
Tables are either fairly evenly-sized or large.
Indexes are practically irrelevant unless they are filtering on additional WHERE clauses; good for heaps.
LOOP JOIN
The query has a small table on the left side of the join.
One or both tables are indexed on the JOIN predicate.
MERGE JOIN
Tables are fairly even in size.
Unlike hash join, there is no memory reallocation; good for parallel execution.
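Although the optimizer normally chooses the join algorithm itself, you can override its choice with a query hint for testing purposes; the tables in this sketch are illustrative:

SELECT C.CustomerID, O.OrderID
FROM Customers AS C
INNER JOIN Orders AS O
    ON C.CustomerID = O.CustomerID
OPTION (HASH JOIN); -- force a hash join; LOOP JOIN and MERGE JOIN hints also exist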
Searches
Without any data structures, such as indexes, to
aid searching, the best option is to start at the
beginning of the data and stop when you find
what you are looking for. To give you some idea
about the issues this causes, consider the search
algorithm necessary to find a telephone number in
a directory, where the entries are in a random
order.
When an RDBMS accesses a table that has no index associated with it, the RDBMS must resort to a sequential search (or relational scan) of each record until the desired record is found. This poses no great problem if
the table is small (10 to 200 rows); the optimizer usually loads the entire table into memory and performs
the scan there. With this scenario, performance is generally not a problem.
If the table is large (about 100,000 to 1,000,000 rows), sequential access will definitely cause performance to degrade. In this case, you would want to use a suitable index, so as to avoid the performance penalty.
Search Arguments
A typical SQL query can be divided into six distinct components or clauses: SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. The predicates in the WHERE clause fall into two categories:
Sargable
Nonsargable
A sargable predicate is one that can use created indexes for faster searches and faster execution of a query. For example (the WHERE predicate shown here is an assumed, illustrative employee ID range):

SELECT EMP_ID
FROM EMPLOYEE
WHERE EMP_ID >= 30000 -- sargable: the untransformed column can be compared directly against an index

This tells the database to start looking just under a third of the way down the column.
Non-Search Arguments
Non-search arguments do not help limit a search.
These predicates include:
NOT IN
NOT EQUAL
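The following contrasting queries on an assumed Employees table illustrate the difference:

-- Sargable: the untransformed column can be matched against an index.
SELECT EmployeeID FROM Employees WHERE LastName LIKE 'Smi%';

-- Nonsargable: applying a function to the column prevents an index seek.
SELECT EmployeeID FROM Employees WHERE UPPER(LastName) = 'SMITH';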
Demonstration Steps
1. On the taskbar, click Microsoft SQL Server Management Studio.
2. In the Connect to Server dialog box, in the Server name box, type MIA-SQL. Ensure Windows
Authentication is selected in the Authentication box, and then click Connect.
3. On the File menu, point to Open, click File, navigate to the D:\Demofiles\Mod05\Demo2.sql script
file, and then click Open.
4. To turn statistics on, highlight the code under Step 1 and click Execute.
5. To run a query with no indexes, highlight the code under Step 2 and click Execute.
6. To create indexes, highlight the code under Step 3 and click Execute.
7. To run a query with indexes, highlight the code under Step 4 and click Execute.
8. To drop the indexes, highlight the code under Step 5 and click Execute.
Lesson 3
Concurrency
This lesson describes the performance effects of concurrency. When more than one user accesses a
resource at the same time, they are said to be accessing that resource concurrently. This requires certain
mechanisms to be in place to prevent adverse effects when a user tries to modify resources that other
users are actively using. This is called concurrency control. If a data storage system has no concurrency
control, users could see the following side effects:
Lost updates.
Uncommitted dependency (dirty reads).
Nonrepeatable reads.
Phantom reads.
Lesson Objectives
On completion of this lesson, you will have gained an understanding of the performance effects of the
following concurrency issues:
Transactions
Isolation levels
Locking
Transactions
As mentioned in the lesson overview, lack of
concurrency control can lead to the following data
errors:
Lost updates.
Uncommitted dependency (dirty read).
Nonrepeatable reads.
Phantom reads.
Lost updates
Lost updates occur when two or more transactions select the same row, and then update the row based
on the value originally selected. Each transaction is unaware of the other transactions. The last update
overwrites updates made by the other transactions, which results in lost data.
For example, two users have downloaded the same file and are updating it independently of each other.
When the first user has finished, they upload the changed file, overwriting the original. The second user
then completes their updates and uploads their changes to the file. This overwrites the first user’s changes
which are then lost.
This problem might be avoided if one user could not access the file until the other had finished and
committed the transaction. This is further described in the Locking topic later in this module.
Uncommitted dependency (dirty read)
Uncommitted dependency occurs when a transaction reads data that another transaction has modified but not yet committed.
For example, a user might make changes to a file that another user accesses before the first user has
completed their updates. This second user then distributes the file. Subsequently, the first user decides
that the changes made are wrong and should not have been made. This means that the distributed file
contains changes that no longer exist and should be treated as if they had never existed.
This problem could be avoided if no one could read the changed data until the first transaction has been
completed and committed.
Nonrepeatable reads
Nonrepeatable reads occur when a transaction reads the same row more than once and, between reads, another transaction modifies that row.
For example, a user reads the same data twice, but between each reading, another user has modified the
data. When the first user reads the data for the second time, it has changed. The original read was not
repeatable.
This problem might be avoided if the data could not be changed until the first transaction had finished
reading it for the last time.
Phantom reads
Phantom reads occur when an insert or delete action is performed against a row that belongs to a range
of rows being read by a transaction. The transaction's first read of the range of rows shows a row that no
longer exists in the second or succeeding read, as a result of a deletion by a different transaction.
Similarly, the transaction's second or succeeding read shows a row that did not exist in the original read,
as the result of an insertion by a different transaction.
For example, a user reads data from a range of rows in a table. Later, the same user performs the same
read operations. In the meantime, another user has deleted or inserted rows in the range being read. This
results in data being in the first read but not the second (as a result of a delete operation); or, conversely,
data being in the second read but not the first (as a result of an insert operation).
Similar to the nonrepeatable read situation, this problem could be avoided if no one could read data until
the transactions that are adding or deleting data have completed and been committed.
Missing one or more rows that were not the target of the update
When you are using READ UNCOMMITTED, if your query reads rows using an allocation order scan (using
IAM pages), you might miss rows if another transaction is causing a page split. This cannot occur when
you are using read committed because a table lock is held during a page split. This will not happen if the
table does not have a clustered index, because updates do not cause page splits.
Additional Reading: For more information, see the Isolation Level and Locking topics later
in this module.
Isolation Level
You can specify an isolation level for transactions. This
defines the degree to which one transaction must be
isolated from data modifications made by other
transactions. Isolation levels are described in terms of
which concurrency side effects, such as dirty reads or
phantom reads, are allowed.
Transaction isolation levels control:
Whether locks are taken when data is read, and what type of locks are requested.
How long the read locks are held.
Whether a read operation that references rows being modified by another transaction:
o Blocks until the exclusive lock on the row is freed. This is the highest isolation level and does not allow any of the concurrency side effects.
o Retrieves the last-committed version of the row. This will be the one that existed at the time
the read operation started. This isolation level will potentially result in data updates being lost,
but will at least ensure that the data being read is stable (though not necessarily the latest
version).
o Reads the uncommitted data modification. This is the lowest isolation level and allows all the
concurrency side effects.
Choosing a transaction isolation level does not affect the locks necessary to protect data modifications.
When modifying data, a transaction always gets an exclusive lock on that data, holding the lock until it
has completed the modification, regardless of the isolation level set for that transaction. For read
operations, transaction isolation levels primarily define the level of protection from the effects of
modifications made by other transactions.
With a lower isolation level, many users can access data at the same time, but this increases the number of
concurrency effects (such as dirty reads or lost updates) that users might encounter. On the other hand, a
higher isolation level reduces the types of concurrency effects that users may encounter, but requires
more system resources and increases the chances that one transaction will block another. Choosing the
appropriate isolation level is a balancing act between the data integrity requirements of the application
and the overhead intrinsic in each isolation level. The highest isolation level, serializable, guarantees that a
transaction will retrieve exactly the same data every time it repeats a read operation—it does this by
performing a level of locking that is likely to impact other users in multiuser systems. The lowest isolation
level, read uncommitted, may retrieve data that has been modified but not committed by other
transactions. All of the concurrency side effects can happen in read uncommitted, but there is no read
locking or versioning, so overhead is minimized.
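You set the isolation level for a session with a Transact-SQL statement such as the following; the Orders table is illustrative:

-- Request the highest isolation level for subsequent transactions in this session.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
    SELECT COUNT(*) FROM Orders; -- repeated reads within this transaction return the same data
COMMIT TRANSACTION;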
Locking
Data can be locked to prevent other users from
accessing it when it is being modified by another
user or transaction. These locks are part of
concurrency control.
Microsoft SQL Server supports a range of concurrency control options. Users specify the type of concurrency
control by selecting transaction isolation levels for connections or concurrency options on cursors. These
attributes can be defined using Transact-SQL statements, or through the properties and attributes of
database application programming interfaces (APIs), such as ADO, ADO.NET, OLE DB, and ODBC.
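As a sketch of locking behavior (using an assumed Orders table), an uncommitted modification in one session blocks readers in another:

-- Session 1: modify a row but do not commit; an exclusive lock is held on the row.
BEGIN TRANSACTION;
UPDATE Orders SET OrderDate = GETDATE() WHERE OrderID = 1;

-- Session 2: under the default isolation level, this read blocks until session 1 ends.
SELECT * FROM Orders WHERE OrderID = 1;

-- Session 1: commit to release the lock; the blocked query in session 2 then completes.
COMMIT TRANSACTION;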
Demonstration Steps
1. On the taskbar, click Microsoft SQL Server Management Studio.
2. In the Connect to Server dialog box, in the Server name box, type MIA-SQL. Ensure Windows
Authentication is selected in the Authentication box, and then click Connect.
3. On the File menu, point to Open, click File, navigate to the D:\Demofiles\Mod05\Demo3A.sql
script file, and then click Open.
4. On the File menu, point to Open, click File, navigate to the D:\Demofiles\Mod05\Demo3B.sql
script file, and then click Open.
5. To start the first query, switch to the Demo3A tab, highlight the code under Step 1 and click
Execute.
6. Immediately switch to the Demo3B tab, highlight the code under Step 2 and click Execute.
7. Note that the query does not complete because it is being blocked.
8. Switch to the Demo3A tab and, on the toolbar, click Cancel Executing Query.
9. Wait until the cancellation has completed and switch to the Demo3B tab.
10. Notice that the query has now completed because the locks have been released.
11. Close SQL Server Management Studio without saving any changes.
Verify the correctness of the statement by placing a mark in the column to the right.
Statement Answer
True or false? Isolation levels affect the locks imposed when a transaction
is modifying data.
Initially, you will plan indexing for data that is static and has no inserts, updates, or deletes; you will then also consider the effects of large numbers of transactions.
Objectives
After completing this lab, you will be able to:
Lab Setup
Estimated Time: 30 minutes
Password: Pa55w.rd
This is a paper-based lab and the virtual machines are only required to access supporting material.
3. Plan which clustered and nonclustered indexes you would add to this design, based on the design of
the tables and the common queries.
3. Note the changes that you might make to the index strategy to support transactions.
Results: After completing this exercise, you will have created an index plan.
Module 6
Database Objects
Contents:
Module Overview 6-1
Lesson 1: Tables 6-2
Module Overview
SQL Server® databases can contain many different types of objects, including tables, stored procedures,
and views. Most implementations of SQL Server databases include these types of objects, so it is important
to understand how to use them and the benefits that they provide.
Objectives
After completing this module, you will be able to:
Create tables and use data types and constraints to ensure integrity.
Lesson 1
Tables
Entities are implemented in databases as tables, and attributes are implemented as columns in tables.
Each row in a table represents a discrete instance of the entity; for example, a row in a table called
Customers represents an individual customer. You can use primary key constraints and foreign key
constraints to ensure uniqueness and to implement referential integrity, which helps to ensure that the
data is meaningful. However, there are other configuration options that you can use to help maintain data
integrity, including data types, UNIQUE constraints, CHECK constraints, DEFAULT constraints, NULL, and
NOT NULL.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the use of data types when creating SQL Server tables.
Describe the numeric data types in SQL Server.
The following code example uses the int, money and smallint data types in a CREATE TABLE Transact-SQL
statement:
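-- Illustrative table; the table and column names are assumptions for this sketch.
CREATE TABLE LineItems
(
    OrderID   int      NOT NULL,
    Quantity  smallint NOT NULL, -- two bytes; suitable for small quantities
    UnitPrice money    NOT NULL  -- eight bytes; monetary values to four decimal places
);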
The integer and monetary data types include:
int, which uses four bytes to store integer data values between -2,147,483,648 and 2,147,483,647.
smallint, which uses two bytes to store integer data values between -32,768 and 32,767.
bigint, which uses eight bytes to store integer data values between -9,223,372,036,854,775,808 and
9,223,372,036,854,775,807.
tinyint, which uses one byte to store integer data values between 0 and 255.
money, which uses eight bytes to store monetary values to four decimal places.
smallmoney, which uses four bytes to store monetary values to four decimal places.
decimal, which stores exact decimal numbers. The number of bytes required varies, depending on
the precision that you specify. The data type numeric duplicates the functionality of decimal.
bit, which uses a single bit to store the values 1 or 0. The bit data type is useful for columns that can
only contain two different values; for example, a MaritalStatus column in a Customers table might use
the value 1 to indicate married and 0 to indicate unmarried.
It is important to select a data type that is appropriate for the data that a column will store, because this
can have an impact on the size of a database and on performance. For example, in a LineItems table, in an
order-processing OLTP database, there might be a column called Quantity that specifies the number of
items of a particular product that were included in an order. You could use the int data type for this
column, which would require storing four bytes of data for every row.
However, if you know that the number of product items in a given order will always be fewer than 500, for
example, you could use smallint instead, because this would require only two bytes to store the same
data. This would reduce storage requirements by two bytes for each row, which could result in a
significant reduction in size for tables with a large number of rows. Additionally, smaller rows can result in
better performance. SQL Server performs best when it can service transactions by using data cached in
memory, instead of retrieving the data from disk. Ensuring that rows are as small as possible means that
more rows can fit into memory, and consequently that SQL Server does not need to access data on disk as
frequently.
The character string data types include:
char (n), which stores character data as fixed-length strings. With the char data type, the value (n) specifies the number of characters stored for every row, regardless of the length of the actual data value.
varchar (n), which stores character data as variable length strings. With the varchar data type, the value (n) represents a maximum number of characters. Unlike char, varchar stores only the required number of characters for each row. For example, if you specify varchar (20) for a column, but specify a value that is six characters in length, then SQL Server will only use six bytes to store the characters instead of 20, as it would if you used char (20). This results in smaller rows, which can be beneficial for
the reasons described earlier. The varchar data type is normally used when string lengths are likely to
be inconsistent, and char is often used when character string lengths are predictable.
varchar (max), which stores character strings that are larger than the 8,000-byte maximum of varchar (n).
If a database includes data in multiple alphabets, it can be better to use Unicode storage for character
strings because this helps you to store a wider range of characters. Unicode uses two bytes to store each
character instead of one; so for each character, there are 65,536 possibilities instead of just the 256
possibilities that one byte provides. You can specify Unicode storage by using data types, including nchar
(n), nvarchar (n), and nvarchar (max). These data types work in the same way as their non-Unicode
equivalents, except that they use twice as much storage space per character.
Note that the following character string data types are deprecated:
text
ntext
datetime, which uses eight bytes to store dates between January 1, 1753 and December 31, 9999,
and times between 00:00:00 and 23:59:59.997.
datetime2, which uses between six and eight bytes (depending on precision) to store dates between
0001-01-01 and 9999-12-31, and times between 00:00:00 and 23:59:59.9999999. The datetime2 data
type has a wider date range than datetime and supports user-defined precision.
date, which uses three bytes to store dates between 0001-01-01 and 9999-12-31.
time, which uses five bytes to store times between 00:00:00.0000000 and 23:59:59.9999999.
smalldatetime, which uses four bytes to store dates between 1900-01-01 and 2079-06-06, and times
between 00:00:00 and 23:59:59.
binary (n), which stores fixed-length binary data. The number of bytes used depends on the user-
defined value (n).
varbinary (n), which stores variable-length binary data. As with the varchar data type, the maximum
number of bytes used depends on the user-defined value (n), and the actual number of bytes used in
a given row depends on the length of the actual data value for that row.
varbinary (max), which stores larger binary values. You should use varbinary (max) instead of the
image data type, which is deprecated.
For more information about spatial data types, see Spatial Data (SQL Server) in Microsoft Docs:
For more information about the xml data type, see XML Data Type and Columns (SQL Server) in
Microsoft Docs:
hierarchyID, which you can use to store data that is hierarchically related to data in other columns.
The hierarchyID data type stores a path that represents a given value’s place in a hierarchy.
For more information about the hierarchyID data type, see hierarchyID (Transact-SQL) in Microsoft
Docs:
hierarchyID (Transact-SQL)
http://aka.ms/mdu4wx
The following Transact-SQL statement creates a user-defined data type called SSN that is based on the char data type and that does not allow NULLs:
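-- Illustrative definition; char(11) is an assumed length.
CREATE TYPE SSN
FROM char(11) NOT NULL;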
UNIQUE constraints
A UNIQUE constraint checks the values in a column to ensure that they are unique, in much the same way
as a PRIMARY KEY constraint does. UNIQUE constraints are useful when it is necessary to apply
uniqueness to columns that do not form part of a table’s primary key. There are several important
differences between the two types of constraint:
It is possible to create multiple UNIQUE constraints on a table, but a table can have only one
PRIMARY KEY constraint.
A column with a UNIQUE constraint can include a single NULL value. A column with a PRIMARY KEY
constraint cannot include any NULLs.
By default, creating a PRIMARY KEY constraint also creates a clustered index that sorts the table by
the primary key column or columns. Creating a UNIQUE constraint on a column creates a
nonclustered index on that column by default.
The following Transact-SQL statement adds a UNIQUE constraint to the SSN column:
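-- Assumes an Employees table with an SSN column.
ALTER TABLE Employees
ADD CONSTRAINT UQ_Employees_SSN UNIQUE (SSN);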
DEFAULT constraints
A DEFAULT constraint includes a value that is inserted into a column if the INSERT statement does not
include a specific value for that column. This can be useful in helping to minimize or eliminate NULLs in
tables. NULLs can be problematic in tables because they can give rise to misleading or inaccurate query
results if not handled correctly. For example, imagine a CustomerDemographics table with a column
called NumberOfChildren, which allows NULLs to account for situations where a customer does not
disclose this information. A person reading this data could potentially interpret the NULLs to mean ‘zero
children’, when in fact it means that the number of children is unknown. To remove this ambiguity, you
could create the column by using the NOT NULL option, and additionally using a DEFAULT constraint that
enters the value ‘Unknown’ when no value is supplied for the NumberOfChildren column.
The following Transact-SQL statement adds a DEFAULT constraint that supplies the value ‘Unknown’ for
the NumberOfChildren column:
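-- Assumes a CustomerDemographics table with a character-based NumberOfChildren column.
ALTER TABLE CustomerDemographics
ADD CONSTRAINT DF_CustomerDemographics_NumberOfChildren
    DEFAULT 'Unknown' FOR NumberOfChildren;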
CHECK constraints
A CHECK constraint checks values that are entered into a column against the defined criteria. This helps to
ensure that the values entered are valid. For example, it might be necessary to ensure that the dates that
an application enters into the OrderDate column in the Orders table are for the current date only. A
CHECK constraint could check the entered order dates and prevent the entry of orders that have a past or
future order date. Alternatively, you could allow dates that are up to two days before or after the current
date to allow users to enter a wider range of order dates if required.
The following Transact-SQL statement adds a CHECK constraint that checks the values in the OrderDate
column to ensure that they are for the current date. The WITH NOCHECK option prevents the application
of the constraint to existing values in the OrderDate column:
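-- Assumes an Orders table with an OrderDate column of the date data type.
ALTER TABLE Orders WITH NOCHECK
ADD CONSTRAINT CK_Orders_OrderDate
    CHECK (OrderDate = CAST(GETDATE() AS date));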
Note that the IDENTITY property does not ensure uniqueness. With the IDENTITY property configured, a
user can still add a duplicate value to the IDENTITY column. To ensure uniqueness, you need to use a
PRIMARY KEY or a UNIQUE constraint. It is very common to configure the IDENTITY property to supply
values for surrogate PRIMARY KEY columns, in which the value only needs to be unique, and does not
represent anything that exists outside the scope of the database. You can designate only one IDENTITY
column for each table.
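For example, this illustrative table uses the IDENTITY property to generate surrogate key values, with a PRIMARY KEY constraint to guarantee uniqueness:

CREATE TABLE Products
(
    ProductID   int IDENTITY(1,1) NOT NULL, -- values start at 1 and increment by 1
    ProductName varchar(50)       NOT NULL,
    CONSTRAINT PK_Products PRIMARY KEY (ProductID)
);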
Schemas
When you create a table in a SQL Server database, you can
specify a schema for the table as part of the CREATE TABLE
statement. A schema is a namespace that represents a
logical container for tables. For example, you could create a
schema called Product, and then create all product-related
tables in that schema.
The following Transact-SQL code example creates a table called ProductCategory in the Product schema:
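-- Create the schema, and then create the table within it; the column definitions are illustrative.
CREATE SCHEMA Product;
GO

CREATE TABLE Product.ProductCategory
(
    ProductCategoryID int         NOT NULL PRIMARY KEY,
    CategoryName      varchar(50) NOT NULL
);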
Create tables.
Create schemas.
Demonstration Steps
1. Start the 10985C-MIA-DC and 10985C-MIA-SQL virtual machines, and then log on to 10985C-MIA-
SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
2. Open Microsoft SQL Server Management Studio, and then connect to the MIA-SQL instance of
the database engine by using Windows Authentication.
3. On the File menu, point to Open, click File, browse to D:\Demofiles\Mod06, click Create
Tables.sql, and then click Open.
4. In the query window, under the comment Create a database and change database context, review
the Transact-SQL statement, select it, and then click Execute.
5. In the query window, under the comment Create schemas, review the Transact-SQL statement, select
it, and then click Execute.
6. In the query window, under the comment Create tables, review the Transact-SQL statement, select it,
and then click Execute.
7. In the query window, under the comment Add CHECK constraint to Sales.Order, review the
Transact-SQL statement, select it, and then click Execute.
8. In the query window, under the comment Test CHECK constraint, review the Transact-SQL
statement, select it, and then click Execute.
9. Review the results, and note that the insert fails because the order date is yesterday's date.
10. In the query window, under the comment Add DEFAULT constraint to
Person.CustomerDemographics, review the Transact-SQL statement, select it, and then click
Execute.
11. In the query window, under the comment Test DEFAULT constraint, review the Transact-SQL
statement, select it, and then click Execute.
12. Review the results, and note that the insert succeeds and the default value ‘Unknown’ is entered.
13. In the query window, under the comment Add UNIQUE constraint to Person.Employee, review the
Transact-SQL statement, select it, and then click Execute.
14. In the query window, under the comment Test UNIQUE constraint, review the Transact-SQL
statement, select it, and then click Execute.
15. Review the results, and note that the first insert succeeds, but the second fails because it contains the
duplicate SSN value ‘123’.
Categorize Activity
Place each item into the appropriate category. Indicate your answer by writing the category number to
the right of each item.
Items
1 Checks the values that are added to a column and prevents the
insertion of duplicate values.
Lesson 2
Views
Databases can sometimes be complex, and may include hundreds of tables, or even more. For end users
who access the data in a database, this complexity can be confusing. In SQL Server, you can create views
to help you to present the data to end users in a more user-friendly way.
Lesson Objectives
After completing this lesson, you will be able to:
What Is a View?
The complexity of a database can be confusing for end
users. A highly normalized database will contain many
tables, and the data that a user might require is often
spread across multiple columns in multiple tables. Writing a
Transact-SQL query to return data may involve using more
advanced queries that include JOINs, for example. A JOIN
is an operation that matches rows in one table against rows
in another table, based on the values in a column that
exists in both tables. Usually, this is a foreign key column in
one table and a primary key column in the other. Views can be useful in this kind of scenario because they
mask this underlying complexity, and help users to write much simpler queries.
A view is a stored Transact-SQL SELECT statement. To create a view, you write a CREATE VIEW Transact-
SQL statement, and include the SELECT statement that you want the view to return.
The following CREATE VIEW statement creates a view called VW_CustomerOrders that includes a SELECT
statement that combines columns from two tables, called Sales.Orders and Person.Customer. The SELECT
statement allocates each table an alias (C for Person.Customer and O for Sales.Orders), which is used in
the JOIN clause:
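A minimal sketch (the column names in the SELECT list are illustrative):

-- Column names are illustrative.
CREATE VIEW VW_CustomerOrders
AS
SELECT C.CustomerID, C.FirstName, C.LastName, O.OrderID, O.OrderDate
FROM Person.Customer AS C
JOIN Sales.Orders AS O
ON C.CustomerID = O.CustomerID;
GO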
To use a view, you write a SELECT statement that references the view as if it were a table. A SELECT
statement against a view returns a data set in the same way as a SELECT statement against a table does.
The following Transact-SQL SELECT statement selects all rows from the view called VW_CustomerOrders:
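SELECT *
FROM VW_CustomerOrders;
GO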
You can also specify the columns that you want to return from a view by listing them in the SELECT clause,
in the same way as you would if querying against a table. In addition to selecting data by using a view,
you can also update, insert, and delete data in the underlying tables, providing the changes only reference
columns from a single base table.
SQL Server includes an extensive set of system views, which you can use to access metadata about SQL
Server for maintenance and troubleshooting.
For more information about system views in SQL Server, see System Views (Transact-SQL) in Microsoft
Docs:
Views can also help you to improve security. Users can treat a view in the same way as they would
treat a table (for the most part); users might not even be aware that they are using a view to access
data, and they only see the data that the view presents to them. This reduces the risk of users trying
to explore data that they should not be able to access, which is likely to happen if you simply give
users permissions directly on the underlying tables. Granting permissions on a view also gives
database administrators a fast and easy way to configure security; it is less complex and less prone to
error than configuring permissions on multiple tables.
For more information about creating indexed views, see Create Indexed Views in Microsoft Docs:
In this demonstration, you will see how to:
Create a view.
Query a view.
Demonstration Steps
1. In the D:\Demofiles\Mod06 folder, right-click Setup.cmd, and then click Run as administrator.
2. In the User Account Control dialog box, click Yes, and wait for setup to complete.
3. Open Microsoft SQL Server Management Studio, and then connect to the MIA-SQL instance of
the database engine by using Windows Authentication.
4. On the File menu, point to Open, click File, browse to D:\Demofiles\Mod06, click Create Views.sql,
and then click Open.
5. Under the comment Create a view, review the Transact-SQL statements, select them, and then click
Execute.
6. Under the comment Query the view, review the Transact-SQL statements, select them, and then click
Execute.
7. Under the comment Query the view with a WHERE clause, review the Transact-SQL statements,
select them, and then click Execute.
8. Under the comment Create an index on the view, review the Transact-SQL statements, select them,
and then click Execute.
9. In Object Explorer, expand Databases, expand Demo, expand Views, and note the view that you
created in step 5.
10. Expand dbo.VW_CustomerOrders, expand Indexes, and note the index that you created in step 8.
Verify the correctness of the statement by placing a mark in the column to the right.
Statement Answer
True or false? To create an indexed view, you must include the WITH
SCHEMABINDING option in the view definition.
Lesson 3
Stored Procedures, Triggers, and Functions
In addition to the tables and views that this module has described, SQL Server databases can contain
other types of objects, including stored procedures, triggers, and functions.
Lesson Objectives
After completing this lesson, you will be able to:
Stored Procedures
A stored procedure is a SQL Server database
object that encapsulates a Transact-SQL
statement. Unlike views, which only encapsulate
SELECT statements, stored procedure definitions
can include a wider range of Transact-SQL
statements, such as UPDATE, INSERT, and DELETE
statements, as well as SELECT statements.
Furthermore, you can include procedural
logic in the Transact-SQL statements in
stored procedures, which gives you the
flexibility to implement complex business rules.
The following code example creates a stored procedure called USP_Orders, which includes a SELECT
statement. The code example also includes an EXEC statement. You use the EXEC keyword to run stored
procedures in SQL Server.
-- The procedure body shown here is an assumed illustration;
-- the column and table names are not from the original listing.
CREATE PROCEDURE USP_Orders
AS
SELECT OrderID, CustomerID, OrderDate
FROM Sales.Orders;
GO

EXEC USP_Orders;
GO
The majority of the system stored procedures in SQL Server have the prefix sp_ in their name. To make it
easy to differentiate between user-created stored procedures and system stored procedures, you should
avoid creating procedures that use the sp_ prefix.
Modularization
Database users interact with a database by using an application of some kind, such as a webpage. It is
usually more efficient for an application to call a stored procedure than to embed the equivalent
Transact-SQL statements directly in its own code. For example, if an application needs to include an
INSERT statement to add a row to a database, you could potentially include all of the code to do this
directly in the application. However, by encapsulating the code in a stored procedure and including only
the EXEC statement in the application, you gain significant benefits, including:
Any changes to the Transact-SQL statement can be made in the stored procedure in the database,
and there is usually no need to modify the application code.
The application code is more streamlined and easier to understand.
If communication between the application and the database server is done over a network, sending a
small EXEC statement is more efficient than sending many more lines of Transact-SQL code.
Performance
In addition to reducing the amount of data sent over networks, as described above, stored procedures can
also improve performance because they are precompiled. When you execute a Transact-SQL statement,
the SQL Server database engine creates an execution plan—by referencing the available indexes and
index statistics. The execution plan represents the most efficient way of running the statement at that
particular moment, according to the database engine. This process, which is called compilation, can add
significantly to the processing time for executing Transact-SQL statements, particularly when there is a
large number of queries running simultaneously on a server. The first time a stored procedure executes,
the database engine compiles the statement and creates a plan in the same way, but it then stores and
reuses this plan for subsequent executions. Because subsequent executions use this precompiled
execution plan instead of generating a new one, the time required to process the statement in the
procedure is reduced.
Security
When you enable users and applications to interact with data in a database by using a stored procedure,
you only need to configure permissions for the users or applications on the stored procedure itself—not
on the database objects that the stored procedure references. This simplifies the security model, and
reduces the risk of insecure permissions configuration.
Input Parameters
When creating a stored procedure, you can
include input parameters. An input parameter is a
value that the user can supply when they execute
the stored procedure. For example, in a stored
procedure that inserts a row into a table, you
would include a parameter for each value to be
inserted into the columns of the table; for a
stored procedure that selects data, you could
include a parameter that is
used in the WHERE clause to filter the rows that the
query returns. To include input parameters in a
stored procedure, you declare them when you
create the procedure, and assign a data type to
each parameter. The data type usually matches the data type of the column that the parameter maps to.
Parameter names must begin with the @ symbol.
The following code example creates a stored procedure that inserts a row into a table called
Person.Customer, which contains the columns CustomerID, Name, DateOfBirth, and Address. The stored
procedure includes one parameter for each of these columns.
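A sketch of such a procedure (the parameter data types and lengths are assumed):

-- Parameter data types and lengths are illustrative.
CREATE PROCEDURE USP_InsertCustomer
@CustomerID int,
@Name varchar(50),
@DateOfBirth datetime,
@Address varchar(100)
AS
INSERT INTO Person.Customer (CustomerID, Name, DateOfBirth, Address)
VALUES (@CustomerID, @Name, @DateOfBirth, @Address);
GO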
When a user executes a stored procedure that includes parameters, a value of the correct data type for
each of the parameters must be specified. Note that values for char, varchar, and datetime data types
must be enclosed in single quotes, but values for int data types should not be.
The following code example uses the EXEC keyword to execute the USP_InsertCustomer stored procedure
with parameter values:
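The parameter values shown here are illustrative; note that the character and datetime values are enclosed in single quotes, but the int value is not:

-- Parameter values are illustrative.
EXEC USP_InsertCustomer
@CustomerID = 10,
@Name = 'A. Datum',
@DateOfBirth = '1985-06-15',
@Address = 'One Main Street';
GO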
Note: When creating stored procedures, you should not include parameters that map to
columns that have the IDENTITY property defined, because IDENTITY automatically provides
system generated values.
Demonstration Steps
1. In the D:\Demofiles\Mod06 folder, right-click Setup.cmd, and then click Run as administrator.
2. In the User Account Control dialog box, click Yes, and then wait for the script to complete.
3. Open Microsoft SQL Server Management Studio, and then connect to the MIA-SQL instance of
the database engine by using Windows Authentication.
4. On the File menu, point to Open, click File, browse to D:\Demofiles\Mod06, click Create Stored
Procedure.sql, and then click Open.
5. Under the comment Create the stored procedure, review the Transact-SQL statements, select them,
and then click Execute.
6. Under the comment Execute the stored procedure, review the Transact-SQL statement, select it,
and then click Execute.
7. Under the comment View the results, review the Transact-SQL statement, select it, and then click
Execute.
8. Review the results, noting that a row has been added to the Person.Customer table with the values
specified in the EXEC statement.
Triggers
SQL Server includes database objects called
triggers. A trigger is an encapsulated Transact-SQL
statement that executes in response to an action
performed in a database. You cannot execute a
trigger by calling it in a Transact-SQL statement as
you can with a stored procedure; only the
triggering action causes a trigger to run. There are
three types of trigger that SQL Server supports:
Data Manipulation Language (DML) triggers, Data
Definition Language (DDL) triggers, and logon
triggers.
DML triggers
DML is the subset of Transact-SQL that includes INSERT, UPDATE, and DELETE statements. A DML trigger
runs in response to INSERT, UPDATE, or DELETE actions on a specific table. When you create a DML
trigger, you include the name of the table that the trigger is associated with, and specify whether the
trigger will run after the specified DML action or actions complete, or instead of them.
AFTER triggers allow the triggering action to complete before running. For example, you might create
a trigger that updates a table in an inventory database in response to the orders that customers place
in an order processing database. When a customer orders a particular product, the trigger
automatically updates the stock level for that product in the inventory database. However, if the
initial order entry fails to complete for any reason, the trigger would not run and the inventory
database would not be updated.
INSTEAD OF triggers run instead of the triggering statement. With an INSTEAD OF trigger, the
triggering Transact-SQL statement never runs; the code in the trigger runs in its place. A key
advantage of INSTEAD OF triggers is that they provide a way to run updates against views that
reference more than one table. When you update a view that references only one table, there is no
ambiguity about which underlying base table needs to be updated. However, when a view references
multiple tables, any attempted updates against the view fail because the view represents a logical
table, whereas the updates must be performed against the base tables. An INSTEAD OF trigger
created on a view can include code to redirect the updates to the correct base tables.
DDL triggers
DDL is the subset of Transact-SQL that includes CREATE, ALTER, and DROP statements, as well as the
GRANT, DENY, and REVOKE statements used to configure permissions. DDL triggers run in response to
DDL statements that execute in a database. For example, you might create a DDL trigger to prevent the
use of ALTER statements to change a table, or alternatively, to record the changes that are made to
database objects. DDL triggers run after the triggering DDL event; there is no option to create INSTEAD
OF DDL triggers.
Logon triggers
Logon triggers run in response to the initiation of logon sessions at the level of the SQL Server instance.
These types of triggers are usually used to audit logon activity, or to manage logon activity; for example,
by restricting the number of simultaneous sessions a given account is allowed to establish. Like DDL
triggers, logon triggers run after the triggering event; in this case, the successful completion of
authentication. There is no option to create INSTEAD OF logon triggers.
Creating triggers
To create triggers, you use a CREATE TRIGGER Transact-SQL statement. When creating a DML trigger, you
must specify the name of the table that the trigger affects. When creating a DDL trigger that affects a single
database, you must include the ON DATABASE clause in the CREATE TRIGGER statement. To create a DDL
trigger that affects all databases on a server, you use the ON ALL SERVER clause. To create a logon trigger,
you should include the FOR LOGON clause.
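For example, the following sketch creates a DML trigger (the table name and the action that the trigger performs are assumed for illustration):

-- Table name and trigger action are illustrative.
CREATE TRIGGER TR_Orders_AfterInsert
ON Sales.Orders
AFTER INSERT
AS
PRINT 'One or more rows were inserted into Sales.Orders.';
GO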
For more information about creating triggers, see CREATE TRIGGER (Transact-SQL) in Microsoft Docs:
Functions
A function is a database object that performs a
specific task, such as summing values in a column
or extracting part of a character string. Functions
are reusable, which helps to ensure that business
logic is implemented consistently; they also hide
complexity from users, making Transact-SQL
statements easier to write. A function accepts
input values and returns a result, either as a scalar
value, which is a single, discrete value, or as a
table. SQL Server includes a set of built-in
functions, and you can also create user-defined
functions.
Built-in functions
SQL Server built-in functions include:
Scalar functions. Scalar functions operate on a single value and return a single scalar result. For
example, the UPPER function accepts a character string input and returns the same string in upper
case. UPPER is an example of a string function. There are many other types of scalar functions,
including:
o Date and time functions such as GETDATE, which returns the current date and time.
o Mathematical functions such as SQUARE, which returns the squared value of the supplied value.
o Metadata functions, which return information about database objects. For example, the DB_ID
function returns the internal identification number for a given database.
Aggregate functions. Aggregate functions operate on a range of values and return a single result.
For example, the SUM function adds a range of data values to produce a single summed total. Other
aggregate functions include MAX, which returns the maximum value in a set of values, and AVG, which
returns the mean value.
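For example, the following statement combines several of the built-in scalar functions described above (the input values are illustrative):

-- Input values and column aliases are illustrative.
SELECT UPPER('adventure works') AS UpperCaseName
,GETDATE() AS CurrentDateAndTime
,SQUARE(4) AS SquaredValue
,DB_ID('master') AS DatabaseID;
GO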
The following code example uses the SUM aggregate function to return the total of the values in the
UnitPrice column in the Sales.LineItems table:
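-- The column alias is illustrative.
SELECT SUM(UnitPrice) AS TotalUnitPrice
FROM Sales.LineItems;
GO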
User-defined functions
With user-defined functions, you can encapsulate and reuse code in a consistent way. You can create
scalar user-defined functions, such as a function that concatenates separate FirstName and LastName
columns to create a single result that represents the full name. You can also create table-valued functions
that return a tabular data set as a result instead of a single value. Types of table-valued user-defined
functions include:
Inline table-valued functions. This type of function returns a table that is populated from a single
SELECT statement included in the function definition. There is no need to declare the structure of the
table in the function definition because the function simply uses the same structure as the base table
that it selects from. An inline table-valued function is similar in functionality to a view but, unlike a
view, it can accept parameters. Users can pass parameter values to the function when they use it, to
filter the results that it returns.
Multistatement table-valued functions. This type of function returns a table that is populated by
the results of multiple statements in the function definition. You must explicitly declare the structure
of the table that the function returns, including the columns and data types it contains. Because they
can include multiple statements, multistatement table-valued functions are much more flexible in the
data that they can return than views, which can only include a single SELECT statement.
You can use table-valued functions by referencing them in the FROM clause of a SELECT statement, in the
same way as you reference views or tables. Although stored procedures can also return data sets, you
cannot reference a stored procedure in the FROM clause of a SELECT statement as you can a table-valued
function.
The following code example uses the UDF_Orders function in a SELECT statement:
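For illustration, assume that UDF_Orders is an inline table-valued function that accepts a customer ID and returns that customer's orders (this definition is a sketch; the column names are assumed):

-- The function definition and its columns are illustrative.
CREATE FUNCTION dbo.UDF_Orders (@CustomerID int)
RETURNS TABLE
AS
RETURN
(SELECT OrderID, OrderDate
FROM Sales.Orders
WHERE CustomerID = @CustomerID);
GO

SELECT *
FROM dbo.UDF_Orders(1);
GO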
Create and test a stored procedure to insert new orders into the Sales.Orders table.
Objectives
After completing this lab, you will have:
Created a view.
Note: You can find the completed Transact-SQL statements for this exercise in the Create
Line Items Table.sql file in the D:\Labfiles\Lab06\Solution folder.
1. Open SQL Server Management Studio and connect to the MIA-SQL Database Engine instance by
using Windows Authentication.
2. In a new query window, type and execute a Transact-SQL statement to create a table called
LineItems in the Sales schema in the OrdersDatabase database. Include the following columns and
constraints. None of the columns should allow NULLs:
o OrderID (data type int).
o ProductID (data type int).
o UnitPrice (data type money).
o Quantity (data type smallint).
o A primary key constraint called PK_LineItems on the OrderID and ProductID columns.
o A foreign key constraint called FK_LineItems_Orders between the OrderID column in the
Sales.LineItems table and the OrderID column in the Sales.Orders table.
2. Test the DEFAULT constraint by executing an INSERT statement to add a row to the Sales.Orders
table using the following values:
o OrderID =110
o CustomerID = 3
o OrderDate = DEFAULT
3. Check that the INSERT statement worked by executing a SELECT statement against the Sales.Orders
table to return the row with the OrderID value of 110. Note that the date was added by the DEFAULT
constraint.
4. Close the query window, but leave SQL Server Management Studio open for the next exercise.
Results: After completing this exercise, you will have a new table named Sales.LineItems and a new
DEFAULT constraint on the Sales.Orders table.
Note: The file Create View.sql in the D:\Labfiles\Lab06\Solution folder contains the
Transact-SQL statements used in this exercise.
1. In SQL Server Management Studio, in a new query window, in the OrdersDatabase database, write a
Transact-SQL query that returns the following columns. Include the appropriate JOIN statements in
the FROM clause, and use the alias O for the Sales.Orders table, the alias C for the Person.Customers
table, and the alias L for the Sales.LineItems table:
o FirstName and LastName (from Person.Customers).
o OrderID and OrderDate (from Sales.Orders).
o UnitPrice and Quantity (from Sales.LineItems).
2. Execute the query to ensure that it returns the required columns and rows.
3. In the SELECT list, remove the UnitPrice and Quantity columns, and replace them with the following
Transact-SQL code:
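SUM(UnitPrice*Quantity) AS [Order Total]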
4. In the SELECT list, remove the FirstName and LastName columns, and replace them with the
following Transact-SQL code:
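(FirstName + ' ' + LastName) AS [Customer Name]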
5. After the FROM clause and the JOIN statements, type the following Transact-SQL code:
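GROUP BY O.OrderID, FirstName, LastName, OrderDate;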
6. Execute the query, which should return four rows, including a column that shows the total cost for
each order.
2. Test the view by writing a Transact-SQL SELECT statement that selects all columns from the view.
3. Close the query window, but leave SQL Server Management Studio open for the next exercise.
Results: After completing this exercise, you will have a new view in the database.
Note: You can find the completed Transact-SQL statements for this exercise in the Create
Stored Procedure.sql file in the D:\Labfiles\Lab06\Solution folder.
1. In SQL Server Management Studio, in a new query window, ensure that the database context is the
OrdersDatabase database. Write a Transact-SQL INSERT statement that inserts a row into the
Sales.Orders table. In the INSERT clause, in the column list, specify the following columns:
o OrderID
o CustomerID
Note: You do not need to specify the OrderDate column because this column has a
DEFAULT constraint that supplies its value.
2. In the VALUES clause, specify the following values:
o 120
o 2
3. Use the INSERT statement that you just typed to write a CREATE PROCEDURE statement. Use
the following specifications:
o Name: USP_InsertOrders.
o Input parameters:
@OrderID (data type int).
@CustomerID (data type int).
o Replace the literal values (120 and 2) in the VALUES list with the input parameter values you
added above.
o Execute the statement to create the procedure.
1. Write and execute an EXEC statement that runs the USP_InsertOrders stored procedure with the
following parameter values:
o @OrderID = 150
o @CustomerID = 1
2. Write a Transact-SQL SELECT statement to view the newly inserted rows in the Sales.Orders table.
Results: After completing this exercise, you will have a new stored procedure in the database.
Question: The PRIMARY KEY column in the Sales.Orders table does not have the IDENTITY
property set. If the IDENTITY property were set for this column, how would you have to
modify the USP_InsertOrders stored procedure?
Question: You created a stored procedure to insert new rows into the Sales.Orders table.
What else would you need to do to ensure that new orders are recorded in full, including
details of the products in each order, prices, and so on?
In this module, you have learned how to:
Create tables and use data types and constraints to ensure integrity.
Review Question(s)
Question: Think of a table in a database that you are familiar with, perhaps a database in
your place of work. Imagine that you need to create a stored procedure to input new rows
into the table. Which input parameters would you need to include, and what data types
would you assign to these parameters?
Course Evaluation
Your evaluation of this course will help Microsoft understand the quality of your learning experience.
Please work with your training provider to access the course evaluation form.
Microsoft will keep your answers to this survey private and confidential and will use your responses to
improve your future learning experience. Your open and honest feedback is valuable and appreciated.
3. In the User Account Control dialog box, click Yes, and then wait for setup to complete.
2. In the Connect to Server dialog box, in the Server type box, click Database Engine.
4. In the Authentication box, click Windows Authentication, and then click Connect.
9. In the SalesOrderDetail (Sales) table, right-click the SalesOrderID column, and then click
Properties.
10. In the Properties window, click the Description field, and then click the ellipsis button (…).
11. In the Description Property dialog box, note that the column is a primary key column, and that
there is a foreign key that references the SalesOrderID column in the SalesOrderHeader table.
Click Cancel.
12. Click the line between the Customer (Sales) table and the SalesOrderHeader (Sales) table. This line
represents a foreign key relationship.
13. In the Properties window, click Description, and then click the ellipsis button (…).
14. In the Description Property dialog box, note that the foreign key references the CustomerID
column in the Customer (Sales) table. Click Cancel.
15. On the File menu, click Save Diagram_0, in the Choose Name dialog box, type Adventure Works
Diagram, and then click OK.
16. Close the database diagram window, leaving SQL Server Management Studio open for the next task.
2. If a dialog box appears asking if you wish to create support objects for database diagramming, click
Yes.
3. In the Add Table dialog box, press and hold down the CTRL key, click DimCustomer, DimDate,
DimProduct, DimProductCategory, DimProductSubcategory, and FactInternetSales, click Add, and
then click Close.
4. Note that the FactInternetSales table has foreign key relationships with the DimCustomer,
DimProduct, and DimDate tables. Examine these relationships in the same way as you did in the previous
task, noting the columns involved in the relationships.
5. On the File menu, click Save Diagram_0, in the Choose Name dialog box, type Adventure Works
Data Warehouse Diagram, and then click OK.
6. Close the database diagram window, leaving SQL Server Management Studio open for the next
exercise.
Results: After completing this exercise, you will have created a database diagram in the
AdventureWorks2016 database and a database diagram in the AdventureWorksDW2016 database.
2. In the query window, type the following Transact-SQL statement, and then click Execute:
SELECT *
FROM Sales.SalesOrderHeader;
GO
3. Review the results in the results pane, and in the lower right corner, note the number of rows that the
query returned.
4. In the query window, under the existing Transact-SQL statement, type the following Transact-SQL
statement:
5. Select the statement that you just typed, and then click Execute.
6. Review the results in the results pane, and in the lower right corner, note the number of rows that the
query returned.
2. Select the statement that you just typed, and then click Execute.
3. Review the results in the results pane, and in the lower right corner, note the number of rows that the
query returned.
4. In the query window, under the existing Transact-SQL statements, type the following Transact-SQL
statement:
5. Select the statement that you just typed, and then click Execute.
6. Review the results in the results pane, and in the lower right corner, note the number of rows that the
query returned.
7. In the query window, under the existing Transact-SQL statements, type the following Transact-SQL
statement:
8. Select the statement that you just typed, and then click Execute.
9. Review the results in the results pane, and in the lower right corner, note the number of rows that the
query returned.
10. In the query window, under the existing Transact-SQL statements, type the following Transact-SQL
statement:
11. Select the statement that you just typed, and then click Execute.
12. Review the results in the results pane, and in the lower right corner, note the number of rows that the
query returned.
Results:
Written and executed SELECT statements to retrieve all columns and to retrieve specific columns from a
table in the Adventure Works OLTP database.
Written and executed SELECT statements that include a WHERE clause to filter the rows that are returned
from a table in the Adventure Works OLTP database.
2. On examination, extract the real nouns (those that refer to actors or actions in the application).
4. Add these attributes to the entities in the list you developed in the previous task.
5. From the list of attributes, identify the candidate keys and primary keys. These will be those attributes
that (either individually or in combination) uniquely identify instances of that entity.
Results: After completing this exercise, you will have an initial list of entities and attributes that model the
data requirements for the brief provided. The entity definitions will include appropriate domains.
4. Model these relationships by sharing keys. (Hint: use Primary Keys in one as Foreign Keys in another.)
6. What problems do you see for modeling relationships with the model in its current form? (Hint: think
about the degree of some of the relationships.)
2. Resolve these relationships by creating an intersection entity, containing the Primary Keys of each
entity involved in the original m:n relationship.
Results: After completing this exercise, you will have an initial list of relationships between the entities
that model the data requirements for the brief provided. You will have an initial ERD and have resolved
any relationships that cannot be modeled (many-to-many relationships).
Results: After completing this exercise, you will have a final data model meeting the original specification.
Module 3: Normalization
Lab: Normalizing Data
Exercise 1: Normalizing to First Normal Form
Task 1: Review the Data Set
1. Ensure that the 10985C-MIA-CLI virtual machine is running, and then log on as Student with the
password Pa55w.rd.
3. Click the Raw Data tab, and then review the data set, noting any aspects of the data that cause it to
violate first normal form.
Results: After completing this exercise, you will have normalized a data set to first normal form.
2. Click the Normalize to 2NF tab, and then review the table to identify any features of the data that
might cause the table to violate second normal form.
2. Click the 2NF Solution tab, and then review the suggested solution, comparing it to your own.
Results: After completing this exercise, you will have normalized the data set to second normal form.
2. Click the Normalize to 3NF tab, and then review the table to identify any features of the data that
might cause the table to violate third normal form.
2. Click the 3NF Solution tab, and then review the suggested solution, comparing it to your own.
Results: After completing this exercise, you will have normalized the data set to third normal form.
2. Click the Denormalize tab, and note that there is one row in the Orders table for each order, and
one or more rows for each order in the OrderDetails table.
2. Click the Denormalized Solution tab, and then review the suggested solution, comparing it to your
own.
Results: After completing this exercise, you will have denormalized a table.
Module 4: Relationships
Lab: Planning and Implementing Referential
Integrity
Exercise 1: Planning Referential Integrity
Task 1: Prepare the Lab Environment
1. Ensure that the 10985C-MIA-DC and 10985C-MIA-SQL virtual machines are both running, and then
log on to 10985C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
4. In the Add Table dialog box, press and hold down the SHIFT key, click Products, click Add, and then
click Close.
5. Rearrange the tables so that you can see them all at the same time. You can do this by clicking the
table header and dragging the table to the required place.
6. Review the database diagram and determine the relationships between the tables and what types of
relationships they are (one-to-one, one-to-many, or many-to-many).
4. In the Choose Name dialog box, type Keys, and then click OK.
Results: After completing this exercise, you will have identified the keys required to enforce referential
integrity rules.
Note: You can find the completed Transact-SQL statements for this exercise in the Create
ForeignKeys.sql file in the D:\Labfiles\Lab04\Solution folder.
2. In the query window, type the following Transact-SQL statement, and then click Execute:
USE OrdersDatabase;
GO
ALTER TABLE Orders
ADD CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerID)
REFERENCES Customers (CustomerID);
GO
3. To test the foreign key, in the query window, under the existing Transact-SQL statements, type the
following Transact-SQL statement, select it, and then click Execute:
6. Note that the INSERT is prevented by the foreign key constraint because the CustomerID value 5
does not exist in the Customers table.
2. To test the foreign key, in the query window, under the existing Transact-SQL statements, type the
following Transact-SQL statement, select it, and then click Execute:
3. Note that the INSERT is prevented by the foreign key constraint because the CustomerID value 5
does not exist in the Customers table.
2. To test the foreign key, in the query window, under the existing Transact-SQL statements, type the
following Transact-SQL statement, select it, and then click Execute:
5. Note that the INSERT is prevented because the OrderID value 106 does not exist in the Orders table.
6. In the query window under the existing Transact-SQL statements, type the following Transact-SQL
statement, select it, and then click Execute:
7. To test the foreign key, in the query window, under the existing Transact-SQL statements, type the
following Transact-SQL statement, select it, and then click Execute:
9. To test the foreign key again, in the query window under the existing Transact-SQL statements, type
the following Transact-SQL statement, select it, and then click Execute:
10. Note that the INSERT is prevented because the ProductID value 66 does not exist in the Products
table.
12. In the Save File As dialog box, browse to D:\Labfiles\Lab04\Starter, in the File name box, type
CreateForeignKeys, and then click Save.
13. In Object Explorer, under OrdersDatabase, expand Database Diagrams, and then double-click
dbo.Keys. Review the database diagram and note that it includes the foreign key relationships that
you created in this exercise.
Results: After completing this exercise, you will have implemented referential integrity in the
OrdersDatabase database using constraints.
Note: You can find the completed Transact-SQL statements for this exercise in the
ImplementCascadingIntegrity.sql file in the D:\Labfiles\Lab04\Solution folder.
1. In SQL Server Management Studio, click New Query, in the query window, type the following
Transact-SQL statement, and then click Execute:
USE OrdersDatabase;
GO
DELETE Customers
WHERE CustomerID = 2;
GO
2. Review the results, and note that the DELETE failed because of a foreign key constraint.
3. In the query window, under the existing Transact-SQL statements, type the following Transact-SQL
statement, select it, and then click Execute:
4. Select the Transact-SQL statement you typed in step 1, and then click Execute.
5. Review the results, and note that the DELETE failed because of the foreign key constraint on the
Orders table.
2. Under the existing Transact-SQL statements, type the following Transact-SQL statement, which adds a
row to the Customers table with a CustomerID value of 0. Select the statement, and then click
Execute:
3. Under the existing Transact-SQL statements, type the following Transact-SQL statement, select it, and
then click Execute:
4. Select the Transact-SQL statement you typed in step 1 of the previous task, and then click Execute.
6. Under the existing Transact-SQL statements, type the following Transact-SQL statement, select it, and
then click Execute:
7. Review the results, and note that OrderID 101 has a CustomerID value of 0.
8. Under the existing Transact-SQL statements, type the following Transact-SQL statement, select it, and
then click Execute:
9. Review the results, and note that CustomerID 2 has been removed.
Results: After completing this exercise, you will have implemented cascading referential integrity.
Module 5: Performance
Lab: Performance Issues
Exercise 1: Designing for Efficient Queries
Task 1: Plan Indexes for Querying
1. Ensure that the 10985C-MIA-DC and 10985C-MIA-SQL virtual machines are both running, and then
log on to 10985C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
3. Plan which clustered and nonclustered indexes you would add to this design, based on the design of
the tables and the common queries.
5. Review and discuss any differences between your strategy and the proposed index strategy in the
document.
Results: After completing this exercise, you will have created an index plan.
3. In the User Account Control dialog box, click Yes, and then wait for setup to complete.
Note: You can find the completed Transact-SQL statements for this exercise in the Create
Line Items Table.sql file in the D:\Labfiles\Lab06\Solution folder.
1. On the taskbar, click Microsoft SQL Server Management Studio, in the Connect to Server dialog
box, in the Server type field, select Database Engine, in the Server name field, type MIA-SQL, in
the Authentication field, select Windows Authentication, and then click Connect.
2. Click New Query, and then in the new query window, type the following Transact-SQL statement,
and then click Execute:
USE OrdersDatabase;
GO
CREATE TABLE Sales.LineItems
(OrderID INT NOT NULL
,ProductID INT NOT NULL
,UnitPrice MONEY NOT NULL
,Quantity SMALLINT NOT NULL
,CONSTRAINT PK_LineItems PRIMARY KEY (OrderID,ProductID)
,CONSTRAINT FK_LineItems_Orders FOREIGN KEY (OrderID)
REFERENCES Sales.Orders (OrderID));
GO
3. In Object Explorer, expand Databases, expand OrdersDatabase, expand Tables, and verify that the
Sales.LineItems table appears in the list of tables.
4. On the taskbar, click File Explorer, browse to the D:\Labfiles\Lab06\Starter folder, right-click
Populate LineItems.cmd, and then click Run as administrator.
5. In the User Account Control dialog box, click Yes, and then wait for setup to complete. This script
adds rows to the newly created Sales.LineItems table.
2. Under the existing Transact-SQL statement, type the following Transact-SQL statement, select it, and
then click Execute:
3. Review the results, and note that the value in the OrderDate column was added by the DEFAULT
constraint.
5. Leave SQL Server Management Studio open for the next exercise.
Results: After completing this exercise, you will have a new table named Sales.LineItems and a new
DEFAULT constraint on the Sales.Orders table.
Note: The file Create View.sql in the D:\Labfiles\Lab06\Solution folder contains the
Transact-SQL statements used in this exercise.
1. In SQL Server Management Studio, click New Query. In the query window, type the following
Transact-SQL statement, and then click Execute:
USE OrdersDatabase;
GO
SELECT (FirstName + ' ' + LastName) AS [Customer Name]
,O.OrderID, OrderDate
,SUM(UnitPrice*Quantity) AS [Order Total]
FROM Person.Customers as C
JOIN Sales.Orders as O
ON C.CustomerID = O.CustomerID
JOIN Sales.LineItems as L
ON O.OrderID = L.OrderID
GROUP BY O.OrderID, FirstName, LastName, OrderDate;
GO
2. Review the results, noting that the query returns one row for each of the orders in the Sales.Orders
table, along with a single total cost for each order, which is calculated in the query by multiplying the
UnitPrice column by the Quantity column in the Sales.LineItems table. Note also that the
Customer Name column is created by concatenating the FirstName and LastName columns in the
Person.Customers table.
4. Review the results, noting that they are the same as the results that the SELECT query returned in the
previous exercise.
6. Leave SQL Server Management Studio open for the next exercise.
Results: After completing this exercise, you will have a new view in the database.
Note: You can find the completed Transact-SQL statements for this exercise in the Create
Stored Procedure.sql file in the D:\Labfiles\Lab06\Solution folder.
1. In SQL Server Management Studio, click New Query, type the following Transact-SQL statement, and
then click Execute:
USE OrdersDatabase;
GO
CREATE PROCEDURE USP_InsertOrders
@OrderID int, @CustomerID int
AS
INSERT INTO Sales.Orders (OrderID, CustomerID)
VALUES
(@OrderID, @CustomerID);
GO
2. Review the results set, noting that it includes the row that the stored procedure added, and that the
OrderDate value was added by the DEFAULT constraint.
Results: After completing this exercise, you will have a new stored procedure in the database.