Mslearn dp100 02
Before you start
You’ll need an Azure subscription in which you have administrative-level access.
Create the workspace
To create the Azure Machine Learning workspace and a compute instance, you’ll use the Azure CLI. All necessary
commands are grouped in a shell script for you to execute.
1. In a browser, open the Azure portal at portal.azure.com, signing in with your Microsoft account.
2. Select the [>_] (Cloud Shell) button at the top of the page to the right of the search box. This opens a Cloud
Shell pane at the bottom of the portal.
3. The first time you open Cloud Shell, you will be asked to choose the type of shell you want to use (Bash
or PowerShell). Select Bash.
4. If you are asked to create storage for Cloud Shell, check that the correct subscription is specified and
select Create storage. Wait for the storage to be created.
6. After the repo has been cloned, enter the following commands to change to the folder for this lab and run
the setup.sh script it contains:
cd mslearn-dp100
./setup.sh
7. Wait for the script to complete - this typically takes around 5-10 minutes.
1. Sign into Azure Machine Learning studio with the Microsoft credentials associated with your Azure
subscription, and select your Azure Machine Learning workspace.
2. In Azure Machine Learning studio, view the Compute page. On the Compute instances tab, start your
compute instance if it is not already running. You will use this compute instance to test your trained model.
3. While the compute instance is starting, switch to the Compute clusters tab, and add a new compute
cluster with the following settings. You’ll run the automated machine learning experiment on this cluster to
take advantage of the ability to distribute the training runs across multiple compute nodes:
Create a dataset
Now that you have some compute resources that you can use to process data, you’ll need a way to store and
ingest the data to be processed.
1. View the comma-separated data at https://aka.ms/diabetes-data in your web browser. Then save this as a
local file named diabetes.csv (it doesn’t matter where you save it).
2. In Azure Machine Learning studio, view the Data page. Datasets represent specific data files or tables that
you plan to work with in Azure ML.
3. Create a new dataset from local files, using the following settings:
◦ Basic Info:
4. After the dataset has been created, open it and view the Explore page to see a sample of the data. This
data represents details from patients who have been tested for diabetes, and you will use it to train a
model that predicts the likelihood of a patient testing positive for diabetes based on clinical
measurements.
Note: You can optionally generate a profile of the dataset to see more statistical details.
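To give a rough idea of what a dataset profile contains, here is a minimal sketch that computes a few summary statistics per numeric column using only the Python standard library. It is not how Azure Machine Learning generates its profile. The sample rows and the column names (Age, Glucose, Diabetic) are invented for illustration and are not taken from diabetes.csv.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for diabetes.csv. The column names and
# values are invented for illustration; they are not from the real file.
SAMPLE_CSV = """Age,Glucose,Diabetic
21,90,0
35,140,1
50,160,1
28,100,0
"""


def profile_numeric_columns(csv_text):
    """Return {column: {"count", "min", "max", "mean"}} for numeric columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    profile = {}
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except ValueError:
            continue  # skip columns that are not purely numeric
        profile[column] = {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
        }
    return profile


if __name__ == "__main__":
    for name, stats in profile_numeric_columns(SAMPLE_CSV).items():
        print(name, stats)
```

To try this on the file you saved earlier, you could read diabetes.csv into a string and pass it to profile_numeric_columns in place of SAMPLE_CSV.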
Run an automated machine learning experiment
1. In Azure Machine Learning studio, view the Automated ML page (under Author).
2. Create a new Automated ML run with the following settings:
◦ Select dataset:
▪ Primary metric: Select AUC Weighted (more about this metric later!)
▪ Explain best model: Selected - this option causes automated machine learning to
calculate feature importance for the best model, making it possible to determine the
influence of each feature on the predicted label.
▪ Use all supported models: Unselected - we’ll restrict the experiment to try a few specific
algorithms.
▪ Allowed models: Select only LogisticRegression and RandomForest. These will be the
only algorithms tried in the experiment.
▪ Exit criterion:
▪ Training job time (hours): 0.5 - this causes the experiment to end after a maximum
of 30 minutes.
▪ Metric score threshold: 0.90 - this causes the experiment to end if a model achieves
a weighted AUC metric of 90% or higher.
◦ Select View featurization settings to open Featurization:
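The two exit criteria in the settings above (a 0.5-hour time limit and a 0.90 metric score threshold) can be read as an "either one ends the experiment" rule. The following is a toy model of that rule, not the service's actual scheduling logic, which the lab does not describe. The function names and the simulated (start time, score) pairs are hypothetical.

```python
def should_stop(elapsed_hours, best_score, max_hours=0.5, score_threshold=0.90):
    """Return True when either exit criterion is met.

    max_hours and score_threshold default to the values used in this lab.
    """
    timed_out = elapsed_hours >= max_hours
    good_enough = best_score is not None and best_score >= score_threshold
    return timed_out or good_enough


def run_until_exit(child_runs, max_hours=0.5, score_threshold=0.90):
    """Walk simulated child runs and stop at the first met criterion.

    child_runs is a list of (start_time_in_hours, score) pairs.
    Returns (best_score, number_of_runs_completed).
    """
    best = None
    completed = 0
    for start_time, score in child_runs:
        # Check the criteria before starting each new child run.
        if should_stop(start_time, best, max_hours, score_threshold):
            break
        best = score if best is None else max(best, score)
        completed += 1
    return best, completed


if __name__ == "__main__":
    # Ends early because the second run reaches the score threshold.
    print(run_until_exit([(0.0, 0.82), (0.1, 0.91), (0.2, 0.95)]))
    # Ends because the third run would start after the time limit.
    print(run_until_exit([(0.0, 0.70), (0.3, 0.80), (0.6, 0.85)]))
```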
1. On the Overview tab of the automated machine learning run, note the best model summary.
2. Select the Algorithm name for the best model to view the child-run that produced it.
The best model is identified based on the evaluation metric you specified (AUC_Weighted). To calculate this
metric, the training process used some of the data to train the model and applied a technique called
cross-validation to iteratively test the trained model with data it wasn’t trained on, comparing each predicted
value with the actual known value. From these comparisons, a confusion matrix of true positives, false
positives, true negatives, and false negatives is tabulated, and additional classification metrics are calculated,
including a Receiver Operating Characteristic (ROC) chart that plots the true positive rate against the false
positive rate. The area under this curve (AUC) is a common metric used to evaluate classification performance.
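As a rough illustration of how these quantities relate, the sketch below tabulates a confusion matrix at one threshold, builds ROC points across all thresholds, and integrates the area under them with the trapezoidal rule. It computes a plain binary AUC on a tiny made-up sample and does not attempt to reproduce the cross-validation or the weighting that automated machine learning applies. The labels and scores are invented, and the code assumes both classes are present.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Return (tp, fp, tn, fn) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_points(y_true, y_score):
    """Return (false positive rate, true positive rate) points.

    One point per distinct score used as a threshold, starting at (0, 0).
    Assumes y_true contains at least one positive and one negative label.
    """
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_score), reverse=True):
        tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


def auc(points):
    """Trapezoidal area under ROC points ordered by false positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Invented labels (1 = positive) and predicted probabilities.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(confusion_counts(labels, scores))
    print(auc(roc_points(labels, scores)))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives, which is why the 0.90 threshold set earlier represents a fairly strong model.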
3. Next to the AUC_Weighted value, select View all other metrics to see values of other possible evaluation
metrics for a classification model.
4. Select the Metrics tab and review the performance metrics you can view for the model. These include a
confusion_matrix visualization showing the confusion matrix for the validated model, and an
accuracy_table visualization that includes the ROC chart.
5. Select the Explanations tab, select an Explanation ID, and then view the Aggregate Importance page.
This shows the extent to which each feature in the dataset influences the label prediction.
Note: In Azure Machine Learning, you can deploy a service to Azure Container Instances (ACI) or to an Azure Kubernetes
Service (AKS) cluster. For production scenarios, an AKS deployment is recommended, for which you must create an inference
cluster compute target. In this exercise, you’ll use an ACI service, which is a suitable deployment target for testing and does
not require you to create an inference cluster.
1. Select the Overview tab for the run that produced the best model.
2. From the Deploy option, use the Deploy to web service button to deploy the model with the following
settings:
◦ Name: auto-predict-diabetes
◦ Description: Predict diabetes
◦ Compute type: Azure Container Instance
◦ Enable authentication: Selected
◦ Use custom deployment assets: Unselected
3. Wait for the deployment to start - this may take a few seconds. Then, on the Model tab, in the Model
summary section, observe the Deploy status for the auto-predict-diabetes service, which should be
Running. Wait for this status to change to Successful. You may need to select ↻ Refresh periodically.
NOTE This can take a while - be patient!
4. In Azure Machine Learning studio, view the Endpoints page and select the auto-predict-diabetes real-
time endpoint. Then select the Consume tab and note the following information there. You need this
information to connect to your deployed service from a client application.
Test the deployed service
1. With the Consume page for the auto-predict-diabetes service open in your browser, open a new
browser tab and open a second instance of Azure Machine Learning studio. Then in the new tab, view the
Notebooks page.
2. In the Notebooks page, under My files, browse to the /users/your-user-name/mslearn-dp100 folder
where you cloned the notebook repository, and open the Get AutoML Prediction notebook.
3. When the notebook has opened, ensure that the compute instance you created previously is selected in
the Compute box, and that it has a status of Running.
4. In the notebook, replace the ENDPOINT and PRIMARY_KEY placeholders with the values for your service,
which you can copy from the Consume tab on the page for your endpoint.
5. Run the code cell and view the output returned by your web service.
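For orientation, here is a minimal sketch of what a client call to an authenticated scoring endpoint can look like, using only the Python standard library. It is not the contents of the Get AutoML Prediction notebook. The {"data": [...]} payload shape, the Bearer authorization header, and the helper names are assumptions; the sample code shown on the Consume tab is the authoritative reference for your service.

```python
import json
import urllib.request

# Placeholders: replace with the values shown on the Consume tab.
ENDPOINT = "YOUR_ENDPOINT"
PRIMARY_KEY = "YOUR_PRIMARY_KEY"


def build_scoring_request(endpoint, key, rows):
    """Build (but do not send) a POST request for the scoring endpoint.

    rows is a list of dicts, one per patient, keyed by feature name.
    The payload shape and header format here are assumptions.
    """
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + key,
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def score(endpoint, key, rows):
    """Send the request and return the decoded JSON response."""
    request = build_scoring_request(endpoint, key, rows)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires real values for ENDPOINT and PRIMARY_KEY, and
# feature names that match the columns your model was trained on):
# print(score(ENDPOINT, PRIMARY_KEY, [{"feature_name": 0}]))
```

Separating request construction from sending makes it possible to inspect the headers and body before any network call is made.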
Delete Azure resources
1. Close the Azure Machine Learning studio tab and return to the Azure portal.
2. In the Azure portal, on the Home page, select Resource groups.
3. Select the rg-dp100-labs resource group.
4. At the top of the Overview page for your resource group, select Delete resource group.
5. Enter the resource group name to confirm you want to delete it, and select Delete.