# Deploy a deep learning model for inference with GPU
This article teaches you how to use the Azure Machine Learning service to deploy a GPU-enabled model as a web service. The information in this article is based on deploying a model on Azure Kubernetes Service (AKS). The AKS cluster provides a GPU resource that is used by the model for inference.

Inference, or model scoring, is the phase where the deployed model is used to make predictions. Using GPUs instead of CPUs offers performance advantages on highly parallelizable computation.
> [!TIP]
> Although the code snippets in this article use a TensorFlow model, you can apply the information to any machine learning framework that supports GPUs.

## Prerequisites
* An Azure Machine Learning service workspace. For more information, see [Create an Azure Machine Learning service workspace](setup-create-workspace.md).

* A Python development environment with the Azure Machine Learning SDK installed. For more information, see the [Python SDK](setup-create-workspace.md#sdk) section of the Create a workspace article.

* A registered model that uses a GPU.

  * To learn how to register models, see [Deploy Models](../service/how-to-deploy-and-where.md#registermodel).

  * To create and register the TensorFlow model used in this article, see [How to Train a TensorFlow Model](how-to-train-tensorflow.md).
* A general understanding of [How and where to deploy models](how-to-deploy-and-where.md).

## Connect to your workspace

To connect to an existing workspace, use the following code:
> [!IMPORTANT]
> This code snippet expects the workspace configuration to be saved in the current directory or its parent. For more information on creating a workspace and saving the configuration to file, see [Create an Azure Machine Learning service workspace](setup-create-workspace.md).

```python
from azureml.core import Workspace

# Connect to the workspace
ws = Workspace.from_config()
```

## Create a Kubernetes cluster with GPUs

Azure Kubernetes Service provides many different GPU options. You can use any of them for model inference. See [the list of N-series VMs](https://azure.microsoft.com/pricing/details/virtual-machines/linux/#n-series) for a full breakdown of capabilities and costs.

The following code demonstrates how to create a new AKS cluster for your workspace:
```python
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException

# Choose a name for your cluster
aks_name = "aks-gpu"

# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    # Provision the AKS cluster with a GPU-enabled VM size. Standard_NC6
    # is one N-series option; choose the size that fits your workload.
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6")

    # Create the cluster
    aks_target = ComputeTarget.create(workspace=ws,
                                      name=aks_name,
                                      provisioning_configuration=prov_config)

    aks_target.wait_for_completion(show_output=True)
```

> [!IMPORTANT]
> Azure will bill you as long as the AKS cluster exists. Make sure to delete your AKS cluster when you're done with it.
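
When you no longer need the cluster, you can delete it from the SDK. The following is a minimal sketch, assuming the `aks_target` object from the previous snippet:

```python
# Remove the AKS cluster from the workspace and delete the
# underlying Azure resource when you're done with it
aks_target.delete()
```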
For more information on using Azure Kubernetes Service with Azure Machine Learning service, see [How to deploy and where](how-to-deploy-and-where.md#deploy-aks).

## Write the entry script

The entry script receives data submitted to the web service, passes it to the model, and returns the scoring results. The following script loads the TensorFlow model on startup, and then uses the model to score data.

> [!TIP]
> The entry script is specific to your model. For example, the script must know the framework used by your model, the format of the data it expects, and so on.
```python
import json
import numpy as np
import os
import tensorflow as tf

def init():
    global X, output, sess
    tf.reset_default_graph()
    # AZUREML_MODEL_DIR is the path to the registered model files.
    # The file and tensor names below are assumptions that match the
    # model from the training article; adjust them for your own model.
    model_root = os.getenv('AZUREML_MODEL_DIR')
    saver = tf.train.import_meta_graph(
        os.path.join(model_root, 'mnist-tf.model.meta'))
    X = tf.get_default_graph().get_tensor_by_name("network/X:0")
    output = tf.get_default_graph().get_tensor_by_name("network/output/MatMul:0")
    sess = tf.Session()
    saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))

def run(raw_data):
    # Convert the JSON request body to a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Score the data using the TensorFlow session created in init()
    out = output.eval(session=sess, feed_dict={X: data})
    y_hat = np.argmax(out, axis=1)
    return y_hat.tolist()
```

This file is named `score.py`. For more information on entry scripts, see [How and where to deploy](how-to-deploy-and-where.md).

## Define the conda environment

The conda environment file specifies the dependencies for the service. It includes dependencies required by both the model and the entry script. The following YAML defines the environment for a TensorFlow model. It specifies `tensorflow-gpu`, which takes advantage of the GPU used in this deployment:
```yaml
name: project_environment
dependencies:
  # The Python and package versions here are illustrative assumptions;
  # match them to the versions used to train your model.
  - python=3.6.2
  - pip:
    - azureml-defaults
    - tensorflow-gpu
channels:
- conda-forge
```

For this example, the file is saved as `myenv.yml`.

## Define the deployment configuration

The deployment configuration defines the Azure Kubernetes Service environment used to run the web service. The following configuration is one example; adjust the replica count and resources for your workload:
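
```python
from azureml.core.webservice import AksWebservice

# Example values: a fixed number of replicas, with CPU and memory
# reserved for each replica. Tune these for your own service.
gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=2,
                                                    memory_gb=4)
```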
For more information, see the reference documentation for [AksWebservice.deploy_configuration](https://docs.microsoft.com/python/api/azureml-core/azureml.core.webservice.aks.akswebservice?view=azure-ml-py#deploy-configuration-autoscale-enabled-none--autoscale-min-replicas-none--autoscale-max-replicas-none--autoscale-refresh-seconds-none--autoscale-target-utilization-none--collect-model-data-none--auth-enabled-none--cpu-cores-none--memory-gb-none--enable-app-insights-none--scoring-timeout-ms-none--replica-max-concurrent-requests-none--max-request-wait-time-none--num-replicas-none--primary-key-none--secondary-key-none--tags-none--properties-none--description-none--gpu-cores-none--period-seconds-none--initial-delay-seconds-none--timeout-seconds-none--success-threshold-none--failure-threshold-none--namespace-none-).

## Define the inference configuration

The inference configuration points to the entry script and the conda environment file. It also enables GPU support, which installs CUDA in the Docker image created for the web service. The following snippet shows one way to define this configuration:
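
```python
from azureml.core.model import InferenceConfig

# enable_gpu=True installs CUDA in the image. The file names assume the
# score.py and myenv.yml files created earlier in this article.
inference_config = InferenceConfig(runtime="python",
                                   entry_script="score.py",
                                   conda_file="myenv.yml",
                                   enable_gpu=True)
```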
For more information, see the reference documentation for [InferenceConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py).

## Deploy the model

Deploy the model to your AKS cluster and wait for it to create your service.
```python
from azureml.core.model import Model

# Name of the web service that is deployed
aks_service_name = 'aks-dnn-mnist'
# Get the registered model
model = Model(ws, "tf-dnn-mnist")
# Deploy the model
aks_service = Model.deploy(ws,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=aks_target,
                           name=aks_service_name)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)
```
> [!NOTE]
> If the `InferenceConfig` object has `enable_gpu=True`, then the `deployment_target` parameter must reference a cluster that provides a GPU. Otherwise, the deployment will fail.

For more information, see the reference documentation for [Model](https://docs.microsoft.com/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py).
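
If the deployment fails, you can retrieve the service logs to diagnose the problem. The following is a minimal sketch, assuming the `aks_service` object from the deployment step:

```python
# Print the logs from the deployed service to diagnose failures
print(aks_service.get_logs())
```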

## Issue a sample query to your service

Send a test query to the deployed model. The following code sample downloads MNIST test data, selects a random test image, and sends it to the service to be scored.
```python
# Used to test your webservice
import os
import urllib.request
import gzip
import numpy as np
import struct
import requests

# load compressed MNIST gz files and return numpy arrays
def load_data(filename, label=False):
    with gzip.open(filename) as gz:
        struct.unpack('I', gz.read(4))
        n_items = struct.unpack('>I', gz.read(4))
        if not label:
            n_rows = struct.unpack('>I', gz.read(4))[0]
            n_cols = struct.unpack('>I', gz.read(4))[0]
            res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
            res = res.reshape(n_items[0], n_rows * n_cols)
        else:
            res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
            res = res.reshape(n_items[0], 1)
    return res

# Download the MNIST test data (the canonical MNIST URLs are assumed
# here; swap in your own copy of the data if needed)
os.makedirs('./data/mnist', exist_ok=True)
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
                           filename='./data/mnist/test-images.gz')
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz',
                           filename='./data/mnist/test-labels.gz')

X_test = load_data('./data/mnist/test-images.gz', False) / 255.0
y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)

# Select a random image from the test set and build the request payload
random_index = np.random.randint(0, len(X_test) - 1)
input_data = "{\"data\": [" + str(X_test[random_index].tolist()) + "]}"

# Send the request, authenticating with a service key
api_key = aks_service.get_keys()[0]
headers = {'Content-Type': 'application/json',
           'Authorization': ('Bearer ' + api_key)}
resp = requests.post(aks_service.scoring_uri, input_data, headers=headers)

print("label:", y_test[random_index])
print("prediction:", resp.text)
```

> [!TIP]
> To minimize latency and optimize throughput, make sure your client is in the same Azure region as the endpoint. In this example, the service is created in the East US Azure region.

For more information on creating a client application, see [Create client to consume deployed web service](how-to-consume-web-service.md).