Microsoft Virtual Labs (1) .SQL Server 2005 Data Mining
Microsoft Virtual Labs (1) .SQL Server 2005 Data Mining
Table of Contents
SQL Server™ 2005: Data Mining ............................................................................................... 1
Exercise 1 Lab Setup .....................................................................................................................................................2
Exercise 2 Creating Decision Tree and Naïve Bayes Data Mining Models ..................................................................4
Exercise 3 Viewing Mining Accuracy Charts ............................................................................................................. 16
Exercise 4 Creating a Prediction Query ....................................................................................................................... 21
SQL Server™ 2005: Data Mining
Page 1 of 25
SQL Server™ 2005: Data Mining
Exercise 1
Lab Setup
Scenario
In this part of the lab you will set up the views you will work with in the rest of the lab.
Page 2 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 3 of 25
SQL Server™ 2005: Data Mining
Exercise 2
Creating Decision Tree and Naïve Bayes Data Mining
Models
Scenario
The management at Adventure Works wants to analyze purchasing decisions based on customer demographics.
Analysis Services has improved data mining functionality, providing the following data mining techniques:
• Microsoft Association Rules
• Microsoft Clustering
• Microsoft Decision Trees
• Microsoft Naïve Bayes
• Microsoft Neural Network
• Microsoft Sequence Clustering
• Microsoft Time Series
In this exercise, you will develop an Analysis Services solution using the Microsoft Business Intelligence
Development Studio environment. The Business Intelligence Development Studio is an environment based on the
Microsoft Visual Studio 2005 environment.
Business Intelligence Development Studio provides you with an integrated development environment for designing,
testing, editing, and deploying projects to the Analysis Server. You will create and view a data mining structure with
Decision Trees and Naïve Bayes data mining models using AdventureWorksDW customer data.
To create and view data mining models, you will:
• Create an Analysis Services project in the Business Intelligence Development Studio environment.
• Create a data source and data source view.
• Create a data mining structure and decision trees data mining model using the Mining Model Wizard.
• Create a related mining model (Naïve Bayes) in the Mining Models view.
• Deploy the Analysis Services solution.
• Explore the data mining models using the Mining Model Viewer.
Page 4 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 5 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
f. Under Log on to the server, click Use Windows Authentication.
g. In the Select or enter a database name drop-down list, click
AdventureWorksDW.
h. Click Test Connection.
i. Click OK to dismiss the message box
j. In the Connection Manager dialog box, click OK.
k. In the Data Source Wizard dialog box, on the Select how to define the
connection page, verify that localhost.AdventureWorksDW is selected, and click
Next.
l. In the Impersonation Information page, check the Default checkbox and click
Next.
m. On the Completing the Data Source Wizard page, leave the default Data source
name Adventure Works DW unchanged, and then click Finish.
Note: You have now set up the information how to connect to the database you are
working with. It is now time to define the schema information you want to use in the
solution. You do this through the Data Source View.
4. Create a Data Source a. In the Solution Explorer pane, under the DM Exercise 1 project, right-click the
View Data Source Views folder, and then select New Data Source View from the
context menu.
b. In the Data Source View Wizard dialog box, on the Welcome to the Data
Source View Wizard page, click Next.
c. On the Select Data Source page, in the Relational data sources pane, verify that
Adventure Works DW is selected, and then click Next.
Note: At this point, Analysis Services may take a few moments to read the database
schema.
d. In this project, your Data Source View is not going to be based on a table; instead,
it will be based on a view. On the Select Tables and Views page, double-click
vDMLabCustomerTrain to add this table to the Included objects list.
Note: You may need to expand the Name column, and/or the entire dialog box, in
order to be able to select vDMLabCustomerTrain.
e. Click Next.
f. On the Completing the Wizard page, in the Name text box, type Customers and
then click Finish. The Data Source View Designer will open. The Data Source
View Designer is a graphical representation of the data schema you have defined.
g. Right-click the vDMLabCustomerTrain table and then click Explore Data, as in
Figure 2.
Page 6 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 7 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
vDMLabCustomerTrain row, verify that the Case check box is selected, and
then click Next.
g. On the Specify the Training Data page, in the Mining model structure pane,
select or deselect each cell by clicking on the check box as shown in Figure 4.
Page 8 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 9 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 10 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
d. When the alert appears confirming that you want to use the Microsoft Naive Bayes
algorithm and that some columns will be ignored, click Yes to approve and dismiss
the dialog box.
Note: The Naïve Bayes algorithm does not support continuous columns. Therefore, the
Age column will be ignored in this mining model. Instead, you will use the Age
Discretized column.
e. Click in the Age Discretized cell in the Customers NB column (the content is
currently Ignore) in the cell drop-down list, select Input as in Figure 8.
Page 11 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 12 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Note: In the above procedures, various wizards and editors have been creating XML
code based on your input. Deployment sends the XML code to the Analysis Server and
then processes the Analysis Services database.
10. View the Customers a. On the tabs above the designer window, click the Mining Model Viewer tab.
DT Mining Model Note: If an alert appears indicating that changes have been made, click No.
Decision Tree
b. In the Mining Model drop-down list, select Customers DT.
c. Select View | Full Screen (or press Shift+Left-Alt+Enter) to view the designer
window full screen. Repeat the process to return to normal view.
Note: If accidentally closed, the Mining Model Viewer of the Mining Model Designer
can be re-opened. Select the View | Solution Explorer menu item. In the Solution
Explorer window, under the Mining Models folder, right-click Customers.dmm and
select Browse from the context menu.
d. In the Tree drop-down list, make sure Bike Buyer is selected; Figure 12 shows the
result.
Page 13 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 14 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 15 of 25
SQL Server™ 2005: Data Mining
Exercise 3
Viewing Mining Accuracy Charts
Scenario
The management team at Adventure Works wants to determine the accuracy of their data mining models. Using a
validation data set that was held out of the training set, they create Mining Accuracy Charts to visually identify
which model is performing most accurately.
In this exercise, you will validate the mining models created in Exercise 1 by using the Mining Accuracy Chart view
of the Mining Structure Designer.
To view the Mining Accuracy Chart, you will:
• Create a prediction query by selecting an input table and mapping the columns of the data mining model to
the columns in the validation data set.
• View and interpret a Lift Chart
Page 16 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 17 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Query Customers.dmm.
Note: If the Solution Explorer window is not visible, select the View | Solution
Explorer menu item.
b. From the list of tabs above the designer window, select the Mining Accuracy
Chart icon. The Mining Accuracy Chart view will open, displaying the Column
Mapping page, as shown in Figure 3.
Page 18 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
g. In the Predict Value column, in the drop-down list, click Yes for both mining
models, as shown in Figure 4.
Page 19 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 20 of 25
SQL Server™ 2005: Data Mining
Exercise 4
Creating a Prediction Query
Scenario
The sales and marketing department at Adventure Works received a potential customer list containing demographic
data. The department will use the Customers DT data mining model to predict the likelihood that an individual in the
list is a bike buyer.
In this exercise, you will make predictions by using the Mining Model Prediction view of the Mining Structure
Designer.
To make predictions, you will:
• Select an input table and map the columns of the data mining model to the columns in the prediction data
set.
• Create a prediction query in the Prediction Query Builder Design pane.
• View the prediction query results.
Page 21 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Page 22 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Query using the Customers.dmm.
Decision Tree Note: If the Solution Explorer pane is not visible, select View | Solution Explorer.
Mining Model
b. From the list of tabs above the designer window, select the Mining Model
Prediction tab.
c. In the Mining Model window, click the Select Model… button.
d. In the Select Mining Model dialog box, expand Customers, click Customers DT,
as shown in Figure 3, and then click OK.
Page 23 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
Column Value
Source Customers DT mining model
Field Bike Buyer
Show (Checked)
Note: The value in the Source column will change from Customers DT mining model
to Customers DT.
c. Enter values into the third row of the table as shown below:
Column Value
Source Prediction Function
Field PredictProbability
Alias Confidence
Show (Checked)
Criteria/Argument [Customers DT].[Bike Buyer]
Note: Figure 4 shows what you should see after having entered the values as
described in the previous tables.
Page 24 of 25
SQL Server™ 2005: Data Mining
Tasks Detailed Steps
The CustomerKey column identifies each record from the input table.
The Bike Buyer column contains the mining model's prediction of the customer's bike
buying behavior.
Larger values in the Confidence column mean the mining model has more confidence
in the prediction contained in the Bike Buyer column.
By contacting potential customers predicted to be bike buyers with a high Confidence
value, Adventure Works can use the results of the prediction query to promote their
bikes to those individuals most likely to be bike buyers. Adventure Works marketing
expenses will be reduced because potential customers who are not likely to be bike
buyers will not be contacted.
6. Close the Analysis a. Select File | Close Project. If prompted to save changes, select Yes.
Services Project b. Select File | Exit to close Business Intelligence Studio.
Page 25 of 25