# Deploy a deep learning model for inference with GPU
This article teaches you how to use the Azure Machine Learning service to deploy a GPU-enabled model as a web service. The information in this article is based on deploying a model on Azure Kubernetes Service (AKS). The AKS cluster provides a GPU resource that is used by the model for inference.

Inference, or model scoring, is the phase where the deployed model is used to make predictions. Using GPUs instead of CPUs offers performance advantages on highly parallelizable computation.
> [!TIP]
> Although the code snippets in this article use a TensorFlow model, you can apply the information to any machine learning framework that supports GPUs.

## Prerequisites
* An Azure Machine Learning service workspace. For more information, see [Create an Azure Machine Learning service workspace](setup-create-workspace.md).

* A Python development environment with the Azure Machine Learning SDK installed. For more information, see the [Python SDK](setup-create-workspace.md#sdk) section of the Create a workspace article.

* A registered model that uses a GPU.

  * To learn how to register models, see [Deploy Models](../service/how-to-deploy-and-where.md#registermodel).

  * To create and register the TensorFlow model used in this article, see [How to Train a TensorFlow Model](how-to-train-tensorflow.md).
* A general understanding of [How and where to deploy models](how-to-deploy-and-where.md).

## Connect to your workspace

To connect to an existing workspace, use the following code:
> [!IMPORTANT]
> This code snippet expects the workspace configuration to be saved in the current directory or its parent. For more information on creating a workspace and saving the configuration to file, see [Create an Azure Machine Learning service workspace](setup-create-workspace.md).

```python
from azureml.core import Workspace

# Connect to the workspace
ws = Workspace.from_config()
```

## Create a Kubernetes cluster with GPUs

Azure Kubernetes Service provides many different GPU options. You can use any of them for model inference. See [the list of N-series VMs](https://azure.microsoft.com/pricing/details/virtual-machines/linux/#n-series) for a full breakdown of capabilities and costs.

The following code demonstrates how to create a new AKS cluster for your workspace:
```python
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException

# Choose a name for your cluster
aks_name = "aks-gpu"

# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    # Provision the AKS cluster with a GPU-enabled VM size. Standard_NC6
    # is one N-series option; choose the size that fits your workload.
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6")

    # Create the cluster
    aks_target = ComputeTarget.create(workspace=ws,
                                      name=aks_name,
                                      provisioning_configuration=prov_config)

    aks_target.wait_for_completion(show_output=True)
```

> [!IMPORTANT]
> Azure will bill you as long as the AKS cluster exists. Make sure to delete your AKS cluster when you're done with it.
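
When you no longer need the cluster, you can delete it from the SDK. The following is a minimal sketch, assuming the `aks_target` object from the previous snippet:

```python
# Remove the AKS cluster from the workspace and delete the
# underlying Azure resource when you're done with it
aks_target.delete()
```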
For more information on using Azure Kubernetes Service with Azure Machine Learning service, see [How to deploy and where](how-to-deploy-and-where.md#deploy-aks).

## Write the entry script

The entry script receives data submitted to the web service, passes it to the model, and returns the scoring results. The following script loads the TensorFlow model on startup, and then uses the model to score data.

> [!TIP]
> The entry script is specific to your model. For example, the script must know the framework used by your model, the format of the data it expects, and so on.
```python
import json
import numpy as np
import os
import tensorflow as tf

def init():
    global X, output, sess
    tf.reset_default_graph()
    # AZUREML_MODEL_DIR is the path to the registered model files.
    # The file and tensor names below are assumptions that match the
    # model from the training article; adjust them for your own model.
    model_root = os.getenv('AZUREML_MODEL_DIR')
    saver = tf.train.import_meta_graph(
        os.path.join(model_root, 'mnist-tf.model.meta'))
    X = tf.get_default_graph().get_tensor_by_name("network/X:0")
    output = tf.get_default_graph().get_tensor_by_name("network/output/MatMul:0")
    sess = tf.Session()
    saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))

def run(raw_data):
    # Convert the JSON request body to a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Score the data using the TensorFlow session created in init()
    out = output.eval(session=sess, feed_dict={X: data})
    y_hat = np.argmax(out, axis=1)
    return y_hat.tolist()
```

This file is named `score.py`. For more information on entry scripts, see [How and where to deploy](how-to-deploy-and-where.md).

## Define the conda environment

The conda environment file specifies the dependencies for the service. It includes dependencies required by both the model and the entry script. The following YAML defines the environment for a TensorFlow model. It specifies `tensorflow-gpu`, which takes advantage of the GPU used in this deployment:
```yaml
name: project_environment
dependencies:
  # The Python and package versions here are illustrative assumptions;
  # match them to the versions used to train your model.
  - python=3.6.2
  - pip:
    - azureml-defaults
    - tensorflow-gpu
channels:
- conda-forge
```

For this example, the file is saved as `myenv.yml`.

## Define the deployment configuration

The deployment configuration defines the Azure Kubernetes Service environment used to run the web service. The following configuration is one example; adjust the replica count and resources for your workload:
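
```python
from azureml.core.webservice import AksWebservice

# Example values: a fixed number of replicas, with CPU and memory
# reserved for each replica. Tune these for your own service.
gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=2,
                                                    memory_gb=4)
```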
For more information, see the reference documentation for [AksWebservice.deploy_configuration](https://docs.microsoft.com/python/api/azureml-core/azureml.core.webservice.aks.akswebservice?view=azure-ml-py#deploy-configuration-autoscale-enabled-none--autoscale-min-replicas-none--autoscale-max-replicas-none--autoscale-refresh-seconds-none--autoscale-target-utilization-none--collect-model-data-none--auth-enabled-none--cpu-cores-none--memory-gb-none--enable-app-insights-none--scoring-timeout-ms-none--replica-max-concurrent-requests-none--max-request-wait-time-none--num-replicas-none--primary-key-none--secondary-key-none--tags-none--properties-none--description-none--gpu-cores-none--period-seconds-none--initial-delay-seconds-none--timeout-seconds-none--success-threshold-none--failure-threshold-none--namespace-none-).

## Define the inference configuration

The inference configuration points to the entry script and the conda environment file. It also enables GPU support, which installs CUDA in the Docker image created for the web service. The following snippet shows one way to define this configuration:
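
```python
from azureml.core.model import InferenceConfig

# enable_gpu=True installs CUDA in the image. The file names assume the
# score.py and myenv.yml files created earlier in this article.
inference_config = InferenceConfig(runtime="python",
                                   entry_script="score.py",
                                   conda_file="myenv.yml",
                                   enable_gpu=True)
```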
For more information, see the reference documentation for [InferenceConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py).

## Deploy the model

Deploy the model to your AKS cluster and wait for it to create your service.
```python
from azureml.core.model import Model

# Name of the web service that is deployed
aks_service_name = 'aks-dnn-mnist'
# Get the registered model
model = Model(ws, "tf-dnn-mnist")
# Deploy the model
aks_service = Model.deploy(ws,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=aks_target,
                           name=aks_service_name)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)
```
> [!NOTE]
> If the `InferenceConfig` object has `enable_gpu=True`, then the `deployment_target` parameter must reference a cluster that provides a GPU. Otherwise, the deployment will fail.

For more information, see the reference documentation for [Model](https://docs.microsoft.com/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py).
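
If the deployment fails, you can retrieve the service logs to diagnose the problem. The following is a minimal sketch, assuming the `aks_service` object from the deployment step:

```python
# Print the logs from the deployed service to diagnose failures
print(aks_service.get_logs())
```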

## Issue a sample query to your service

Send a test query to the deployed model. The following code sample downloads MNIST test data, selects a random test image, and sends it to the service to be scored.
```python
# Used to test your webservice
import os
import urllib.request
import gzip
import numpy as np
import struct
import requests

# load compressed MNIST gz files and return numpy arrays
def load_data(filename, label=False):
    with gzip.open(filename) as gz:
        struct.unpack('I', gz.read(4))
        n_items = struct.unpack('>I', gz.read(4))
        if not label:
            n_rows = struct.unpack('>I', gz.read(4))[0]
            n_cols = struct.unpack('>I', gz.read(4))[0]
            res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
            res = res.reshape(n_items[0], n_rows * n_cols)
        else:
            res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
            res = res.reshape(n_items[0], 1)
    return res

# Download the MNIST test data (the canonical MNIST URLs are assumed
# here; swap in your own copy of the data if needed)
os.makedirs('./data/mnist', exist_ok=True)
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
                           filename='./data/mnist/test-images.gz')
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz',
                           filename='./data/mnist/test-labels.gz')

X_test = load_data('./data/mnist/test-images.gz', False) / 255.0
y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)

# Select a random image from the test set and build the request payload
random_index = np.random.randint(0, len(X_test) - 1)
input_data = "{\"data\": [" + str(X_test[random_index].tolist()) + "]}"

# Send the request, authenticating with a service key
api_key = aks_service.get_keys()[0]
headers = {'Content-Type': 'application/json',
           'Authorization': ('Bearer ' + api_key)}
resp = requests.post(aks_service.scoring_uri, input_data, headers=headers)

print("label:", y_test[random_index])
print("prediction:", resp.text)
```

> [!TIP]
> To minimize latency and optimize throughput, make sure your client is in the same Azure region as the endpoint. In this example, the service is created in the East US Azure region.

For more information on creating a client application, see [Create client to consume deployed web service](how-to-consume-web-service.md).