Skip to content

Commit dbec7e2

Browse files
authored
Merge pull request #76274 from garyericson/05-08-git-tutorials
ASQLML: Porting SQL ski rental tutorial to Azure SQL
2 parents 4df1a52 + 90fd622 commit dbec7e2

File tree

5 files changed

+528
-0
lines changed

5 files changed

+528
-0
lines changed
Lines changed: 130 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,130 @@
1+
---
2+
title: "Tutorial: Train and compare predictive models in R"
3+
titleSuffix: Azure SQL Database Machine Learning Services (preview)
4+
description: In part two of this three-part tutorial series, you'll create two predictive models in R with Azure SQL Database Machine Learning Services (preview), and then select the most accurate model.
5+
services: sql-database
6+
ms.service: sql-database
7+
ms.subservice: machine-learning
8+
ms.custom:
9+
ms.devlang: r
10+
ms.topic: tutorial
11+
author: garyericson
12+
ms.author: garye
13+
ms.reviewer: davidph
14+
manager: cgronlun
15+
ms.date: 05/02/2019
16+
---
17+
18+
# Tutorial: Create a predictive model in R with Azure SQL Database Machine Learning Services (preview)
19+
20+
In part two of this three-part tutorial series, you'll create two predictive models in R with Azure SQL Database Machine Learning Services (preview), and then select the most accurate model.
21+
22+
This tutorial is **part two of a three-part tutorial series**.
23+
24+
In part two, you'll learn how to:
25+
26+
> [!div class="checklist"]
27+
> * Train two machine learning models
28+
> * Make predictions from both models
29+
> * Compare the results to choose the most accurate model
30+
31+
In [part one](sql-database-tutorial-predictive-model-prepare-data.md), you learned how to import a sample database into an Azure SQL database, and then prepare the data to be used for training a predictive model in R.
32+
33+
In [part three](sql-database-tutorial-predictive-model-deploy.md), you'll learn how to store the model in a database, and then create a stored procedure that can make predictions based on new data.
34+
35+
[!INCLUDE[ml-preview-note](../../includes/sql-database-ml-preview-note.md)]
36+
37+
## Prerequisites
38+
39+
* Part two of this tutorial assumes you have completed [**part one**](sql-database-tutorial-predictive-model-prepare-data.md) and its prerequisites.
40+
41+
## Train two models
42+
43+
To find the best model for the ski rental data, create two different models (linear regression and decision tree) and see which one is predicting more accurately. You'll use the data frame `rentaldata` that you created in part one of this series.
44+
45+
```r
46+
#First, split the dataset into two different sets:
47+
# one for training the model and the other for validating it
48+
train_data = rentaldata[rentaldata$Year < 2015,];
49+
test_data = rentaldata[rentaldata$Year == 2015,];
50+
51+
#Use the RentalCount column to check the quality of the prediction against actual values
52+
actual_counts <- test_data$RentalCount;
53+
54+
#Model 1: Use rxLinMod to create a linear regression model, trained with the training data set
55+
model_linmod <- rxLinMod(RentalCount ~ Month + Day + WeekDay + Snow + Holiday, data = train_data);
56+
57+
#Model 2: Use rxDTree to create a decision tree model, trained with the training data set
58+
model_dtree <- rxDTree(RentalCount ~ Month + Day + WeekDay + Snow + Holiday, data = train_data);
59+
```
60+
61+
## Make predictions from both models
62+
63+
Use a predict function to predict the rental counts using each trained model.
64+
65+
```r
66+
#Use both models to make predictions using the test data set.
67+
predict_linmod <- rxPredict(model_linmod, test_data, writeModelVars = TRUE, extraVarsToWrite = c("Year"));
68+
69+
predict_dtree <- rxPredict(model_dtree, test_data, writeModelVars = TRUE, extraVarsToWrite = c("Year"));
70+
71+
#To verify it worked, look at the top rows of the two prediction data sets.
72+
head(predict_linmod);
73+
head(predict_dtree);
74+
```
75+
76+
```results
77+
RentalCount_Pred RentalCount Month Day WeekDay Snow Holiday
78+
1 27.45858 42 2 11 4 0 0
79+
2 387.29344 360 3 29 1 0 0
80+
3 16.37349 20 4 22 4 0 0
81+
4 31.07058 42 3 6 6 0 0
82+
5 463.97263 405 2 28 7 1 0
83+
6 102.21695 38 1 12 2 1 0
84+
RentalCount_Pred RentalCount Month Day WeekDay Snow Holiday
85+
1 40.0000 42 2 11 4 0 0
86+
2 332.5714 360 3 29 1 0 0
87+
3 27.7500 20 4 22 4 0 0
88+
4 34.2500 42 3 6 6 0 0
89+
5 645.7059 405 2 28 7 1 0
90+
6 40.0000 38 1 12 2 1 0
91+
```
92+
93+
## Compare the results
94+
95+
Now you want to see which of the models gives the best predictions. A quick and easy way to do this is to use a basic plotting function to view the difference between the actual values in your training data and the predicted values.
96+
97+
```r
98+
#Use the plotting functionality in R to visualize the results from the predictions
99+
par(mfrow = c(2, 1));
100+
plot(predict_linmod$RentalCount_Pred - predict_linmod$RentalCount, main = "Difference between actual and predicted. rxLinmod");
101+
plot(predict_dtree$RentalCount_Pred - predict_dtree$RentalCount, main = "Difference between actual and predicted. rxDTree");
102+
```
103+
104+
![Comparing the two models](./media/sql-database-tutorial-predictive-model-build-compare/compare-models.png)
105+
106+
It looks like the decision tree model is the more accurate of the two models.
107+
108+
## Clean up resources
109+
110+
If you're not going to continue with this tutorial, delete the TutorialDB database from your Azure SQL Database server.
111+
112+
From the Azure portal, follow these steps:
113+
114+
1. From the left-hand menu in the Azure portal, select **All resources** or **SQL databases**.
115+
1. In the **Filter by name...** field, enter **TutorialDB**, and select your subscription.
116+
1. Select your TutorialDB database.
117+
1. On the **Overview** page, select **Delete**.
118+
119+
## Next Steps
120+
121+
In part two of this tutorial series, you completed these steps:
122+
123+
* Train two machine learning models
124+
* Make predictions from both models
125+
* Compare the results to choose the most accurate model
126+
127+
To deploy the machine learning model you've created, follow part three of this tutorial series:
128+
129+
> [!div class="nextstepaction"]
130+
> [Tutorial: Deploy a predictive model in R with Azure SQL Database Machine Learning Services (preview)](sql-database-tutorial-predictive-model-deploy.md)
Lines changed: 223 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,223 @@
1+
---
2+
title: "Tutorial: Deploy a predictive model in R"
3+
titleSuffix: Azure SQL Database Machine Learning Services (preview)
4+
description: In part three of this three-part tutorial, you'll deploy a predictive model in R with Azure SQL Database Machine Learning Services (preview).
5+
services: sql-database
6+
ms.service: sql-database
7+
ms.subservice: machine-learning
8+
ms.custom:
9+
ms.devlang: r
10+
ms.topic: tutorial
11+
author: garyericson
12+
ms.author: garye
13+
ms.reviewer: davidph
14+
manager: cgronlun
15+
ms.date: 05/02/2019
16+
---
17+
18+
# Tutorial: Deploy a predictive model in R with Azure SQL Database Machine Learning Services (preview)
19+
20+
In part three of this three-part tutorial, you'll deploy a predictive model in R with Azure SQL Database Machine Learning Services (preview).
21+
22+
You'll create a stored procedure with an embedded R script that makes predictions using the model. Because your model executes in the Azure SQL database, it can easily be trained against data stored in the database.
23+
24+
This tutorial is **part three of a three-part tutorial series**.
25+
26+
In part three, you'll learn how to:
27+
28+
> [!div class="checklist"]
29+
> * Store the predictive model in a database table
30+
> * Create a stored procedure that generates the model
31+
> * Create a stored procedure that makes predictions using the model
32+
> * Execute the model with new data
33+
34+
In [part one](sql-database-tutorial-predictive-model-prepare-data.md), you learned how how to import a sample database into an Azure SQL database, and then prepare the data to be used for training a predictive model in R.
35+
36+
In [part two](sql-database-tutorial-predictive-model-build-compare.md), you learned how to create and train multiple models, and then choose the most accurate one.
37+
38+
[!INCLUDE[ml-preview-note](../../includes/sql-database-ml-preview-note.md)]
39+
40+
## Prerequisites
41+
42+
* Part three of this tutorial series assumes you have completed [**part one**](sql-database-tutorial-predictive-model-prepare-data.md) and [**part two**](sql-database-tutorial-predictive-model-build-compare.md).
43+
44+
## Create a stored procedure that generates the model
45+
46+
In part two of this tutorial series, you decided that a decision tree (dtree) model was the most accurate. Now create a stored procedure (`generate_rental_rx_model`) that trains and generates the dtree model using rxDTree from the RevoScaleR package.
47+
48+
Run the following commands in Azure Data Studio or SSMS.
49+
50+
```sql
51+
-- Stored procedure that trains and generates an R model using the rental_data and a decision tree algorithm
52+
DROP PROCEDURE IF EXISTS generate_rental_rx_model;
53+
GO
54+
CREATE PROCEDURE generate_rental_rx_model (@trained_model VARBINARY(max) OUTPUT)
55+
AS
56+
BEGIN
57+
EXECUTE sp_execute_external_script @language = N'R'
58+
, @script = N'
59+
require("RevoScaleR");
60+
61+
rental_train_data$Holiday <- factor(rental_train_data$Holiday);
62+
rental_train_data$Snow <- factor(rental_train_data$Snow);
63+
rental_train_data$WeekDay <- factor(rental_train_data$WeekDay);
64+
65+
#Create a dtree model and train it using the training data set
66+
model_dtree <- rxDTree(RentalCount ~ Month + Day + WeekDay + Snow + Holiday, data = rental_train_data);
67+
#Serialize the model before saving it to the database table
68+
trained_model <- as.raw(serialize(model_dtree, connection=NULL));
69+
'
70+
, @input_data_1 = N'
71+
SELECT RentalCount
72+
, Year
73+
, Month
74+
, Day
75+
, WeekDay
76+
, Snow
77+
, Holiday
78+
FROM dbo.rental_data
79+
WHERE Year < 2015
80+
'
81+
, @input_data_1_name = N'rental_train_data'
82+
, @params = N'@trained_model varbinary(max) OUTPUT'
83+
, @trained_model = @trained_model OUTPUT;
84+
END;
85+
GO
86+
```
87+
88+
## Store the model in a database table
89+
90+
Create a table in the TutorialDB database and then save the model to the table.
91+
92+
1. Create a table (`rental_rx_models`) for storing the model.
93+
94+
```sql
95+
USE TutorialDB;
96+
DROP TABLE IF EXISTS rental_rx_models;
97+
GO
98+
CREATE TABLE rental_rx_models (
99+
model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY
100+
, model VARBINARY(MAX) NOT NULL
101+
);
102+
GO
103+
```
104+
105+
1. Save the model to the table as a binary object, with the model name "rxDTree".
106+
107+
```sql
108+
-- Save model to table
109+
TRUNCATE TABLE rental_rx_models;
110+
111+
DECLARE @model VARBINARY(MAX);
112+
113+
EXECUTE generate_rental_rx_model @model OUTPUT;
114+
115+
INSERT INTO rental_rx_models (
116+
model_name
117+
, model
118+
)
119+
VALUES (
120+
'rxDTree'
121+
, @model
122+
);
123+
124+
SELECT *
125+
FROM rental_rx_models;
126+
```
127+
128+
## Create a stored procedure that makes predictions
129+
130+
Create a stored procedure (`predict_rentalcount_new`) that makes predictions using the trained model and a set of new data.
131+
132+
```sql
133+
-- Stored procedure that takes model name and new data as input parameters and predicts the rental count for the new data
134+
DROP PROCEDURE IF EXISTS predict_rentalcount_new;
135+
GO
136+
CREATE PROCEDURE predict_rentalcount_new (
137+
@model_name VARCHAR(100)
138+
, @input_query NVARCHAR(MAX)
139+
)
140+
AS
141+
BEGIN
142+
DECLARE @rx_model VARBINARY(MAX) = (
143+
SELECT model
144+
FROM rental_rx_models
145+
WHERE model_name = @model_name
146+
);
147+
148+
EXECUTE sp_execute_external_script @language = N'R'
149+
, @script = N'
150+
require("RevoScaleR");
151+
152+
#Convert types to factors
153+
rentals$Holiday <- factor(rentals$Holiday);
154+
rentals$Snow <- factor(rentals$Snow);
155+
rentals$WeekDay <- factor(rentals$WeekDay);
156+
157+
#Before using the model to predict, we need to unserialize it
158+
rental_model <- unserialize(rx_model);
159+
160+
#Call prediction function
161+
rental_predictions <- rxPredict(rental_model, rentals);
162+
'
163+
, @input_data_1 = @input_query
164+
, @input_data_1_name = N'rentals'
165+
, @output_data_1_name = N'rental_predictions'
166+
, @params = N'@rx_model varbinary(max)'
167+
, @rx_model = @rx_model
168+
WITH RESULT SETS(("RentalCount_Predicted" FLOAT));
169+
END;
170+
GO
171+
```
172+
173+
## Execute the model with new data
174+
175+
Now you can use the stored procedure `predict_rentalcount_new` to predict the rental count from new data.
176+
177+
```sql
178+
-- Use the predict_rentalcount_new stored procedure with the model name and a set of features to predict the rental count
179+
EXECUTE dbo.predict_rentalcount_new @model_name = 'rxDTree'
180+
, @input_query = '
181+
SELECT CONVERT(INT, 3) AS Month
182+
, CONVERT(INT, 24) AS Day
183+
, CONVERT(INT, 4) AS WeekDay
184+
, CONVERT(INT, 1) AS Snow
185+
, CONVERT(INT, 1) AS Holiday
186+
';
187+
GO
188+
```
189+
190+
You should see a result similar to the following.
191+
192+
```results
193+
RentalCount_Predicted
194+
332.571428571429
195+
```
196+
197+
You have successfully created, trained, and deployed a model in an Azure SQL database. You then used that model in a stored procedure to predict values based on new data.
198+
199+
## Clean up resources
200+
201+
When you've finished using the TutorialDB database, delete it from your Azure SQL Database server.
202+
203+
From the Azure portal, follow these steps:
204+
205+
1. From the left-hand menu in the Azure portal, select **All resources** or **SQL databases**.
206+
1. In the **Filter by name...** field, enter **TutorialDB**, and select your subscription.
207+
1. Select your TutorialDB database.
208+
1. On the **Overview** page, select **Delete**.
209+
210+
## Next Steps
211+
212+
In part three of this tutorial series, you completed these steps:
213+
214+
* Store the predictive model in a database table
215+
* Create a stored procedure that generates the model
216+
* Create a stored procedure that makes predictions using the model
217+
* Execute the model with new data
218+
219+
To learn more about using R in Azure SQL Database Machine Learning Services (preview), see:
220+
221+
* [Write advanced R functions in Azure SQL Database using Machine Learning Services (preview)](sql-database-machine-learning-services-functions.md)
222+
* [Work with R and SQL data in Azure SQL Database Machine Learning Services (preview)](sql-database-machine-learning-services-data-issues.md)
223+
* [Add an R package to Azure SQL Database Machine Learning Services (preview)](sql-database-machine-learning-services-add-r-packages.md)

0 commit comments

Comments
 (0)