Commit ef644d0

Merge pull request MicrosoftDocs#85724 from RSavage2/patch-1
Update to how-to-auto-train-remote guide
2 parents a23eaf6 + e62e2ca commit ef644d0

1 file changed: +14 -19 lines changed

articles/machine-learning/service/how-to-auto-train-remote.md

Lines changed: 14 additions & 19 deletions
@@ -47,7 +47,6 @@ provisioning_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2
                                                             # for GPU, use "STANDARD_NC6"
                                                             # vm_priority = 'lowpriority', # optional
                                                             max_nodes=6)
-
 compute_target = ComputeTarget.create(
     ws, amlcompute_cluster_name, provisioning_config)

@@ -64,35 +63,30 @@ Cluster name restrictions include:
 + Cannot include any of the following characters:
 `\` ~ ! @ # $ % ^ & * ( ) = + _ [ ] { } \\\\ | ; : \' \\" , < > / ?.`

-## Access data using get_data() function
-
-Provide the remote resource access to your training data. For automated machine learning experiments running on remote compute, the data needs to be fetched using a `get_data()` function.
+## Access data using TabularDataset function

-To provide access, you must:
-+ Create a get_data.py file containing a `get_data()` function
-+ Place that file in a directory accessible as an absolute path
+Defined X and y as `TabularDataset`s, which are passed to Automated ML in the AutoMLConfig. `from_delimited_files` by default sets the `infer_column_types` to true, which will infer the columns type automatically.

-You can encapsulate code to read data from a blob storage or local disk in the get_data.py file. In the following code sample, the data comes from the sklearn package.
+If you do wish to manually set the column types, you can set the `set_column_types` argument to manually set the type of each columns. In the following code sample, the data comes from the sklearn package.

 ```python
 # Create a project_folder if it doesn't exist
 if not os.path.exists(project_folder):
     os.makedirs(project_folder)

-#Write the get_data file.
-%%writefile $project_folder/get_data.py
-
 from sklearn import datasets
 from scipy import sparse
 import numpy as np
+import pandas as pd

-def get_data():
+data_train = datasets.load_digits()

-    digits = datasets.load_digits()
-    X_digits = digits.data[10:,:]
-    y_digits = digits.target[10:]
+pd.DataFrame(data_train.data[100:,:]).to_csv(\'data/X_train.csv\', index=False)
+pd.DataFrame(data_train.target[100:]).to_csv(\'data/y_train.csv\', index=False)
+
+X = Dataset.Tabular.from_delimited_files(path=ds.path('digitsdata/X_train.csv'))
+y = Dataset.Tabular.from_delimited_files(path=ds.path('digitsdata/y_train.csv'))

-    return { "X" : X_digits, "y" : y_digits }
 ```

 ## Create run configuration
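
Review note: the added lines in this hunk write the CSV files under a local `data/` folder but read them back from `digitsdata/` on a datastore object `ds` that the hunk never defines, and the `\'` quoting shown is not valid Python. The following is a minimal sketch of one way the pieces could fit together. It assumes the azureml-core (v1) SDK, an existing `Workspace` object `ws`, and an explicit upload through the workspace's default datastore. The helper names `training_csv_paths` and `stage_digits` and the upload step are illustrative assumptions, not part of the commit.

```python
import os


def training_csv_paths(local_dir="data", target_path="digitsdata"):
    """Return (local, datastore-relative) CSV paths for X and y.

    The folder names mirror the ones in the diff; the helper is hypothetical.
    """
    names = ("X_train.csv", "y_train.csv")
    local = tuple(os.path.join(local_dir, name) for name in names)
    remote = tuple(f"{target_path}/{name}" for name in names)
    return local, remote


def stage_digits(ws, local_dir="data", target_path="digitsdata"):
    """Write the digits data to CSV, upload it, and return (X, y) TabularDatasets.

    Assumes azureml-core (v1), pandas and scikit-learn are installed and that
    `ws` is an azureml.core.Workspace. Imports are local so the path helper
    above stays usable without those packages.
    """
    import pandas as pd
    from sklearn import datasets
    from azureml.core import Dataset

    (x_local, y_local), (x_remote, y_remote) = training_csv_paths(local_dir, target_path)

    # to_csv does not create missing directories, so make sure the folder exists.
    os.makedirs(local_dir, exist_ok=True)

    data_train = datasets.load_digits()
    pd.DataFrame(data_train.data[100:, :]).to_csv(x_local, index=False)
    pd.DataFrame(data_train.target[100:]).to_csv(y_local, index=False)

    # Assumption: the files reach the datastore through an explicit upload.
    ds = ws.get_default_datastore()
    ds.upload(src_dir=local_dir, target_path=target_path, overwrite=True)

    X = Dataset.Tabular.from_delimited_files(path=ds.path(x_remote))
    y = Dataset.Tabular.from_delimited_files(path=ds.path(y_remote))
    return X, y
```

Calling `stage_digits(ws)` would return the `X` and `y` objects that the later hunks pass to `AutoMLConfig`. To override type inference, `set_column_types` would be supplied on the two `from_delimited_files` calls.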
@@ -116,7 +110,6 @@ run_config.environment.python.conda_dependencies = dependencies
 See this [sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb) for an additional example of this design pattern.

 ## Configure experiment
-
 Specify the settings for `AutoMLConfig`. (See a [full list of parameters](how-to-configure-auto-train.md#configure-experiment) and their possible values.)

 ```python
@@ -140,7 +133,8 @@ automl_config = AutoMLConfig(task='classification',
                              path=project_folder,
                              compute_target=compute_target,
                              run_configuration=run_config,
-                             data_script=project_folder + "/get_data.py",
+                             X = X,
+                             y = y,
                              **automl_settings,
                              )
 ```
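
Review note: because `X` and `y` are now passed explicitly next to `**automl_settings`, a settings dictionary that also carries one of those keys makes the call fail before the SDK is reached. That is plain Python keyword-argument behavior. The sketch below shows it with a stand-in function: `fake_automl_config` is hypothetical and is not the SDK's `AutoMLConfig`, and the setting values are placeholders, since the diff does not show what `automl_settings` contains.

```python
def fake_automl_config(task, X=None, y=None, **settings):
    """Stand-in that only echoes its arguments; not the real AutoMLConfig."""
    return {"task": task, "X": X, "y": y, **settings}


# Hypothetical settings; the diff does not show the real dictionary.
automl_settings = {"iterations": 5, "primary_metric": "AUC_weighted"}

# Explicit X and y combined with unrelated settings works as expected.
config = fake_automl_config(task="classification", X="X-data", y="y-data",
                            **automl_settings)

# A stale key that duplicates an explicit keyword raises TypeError at call time.
stale_settings = dict(automl_settings, X="old-X")
try:
    fake_automl_config(task="classification", X="X-data", y="y-data",
                       **stale_settings)
    collision_error = None
except TypeError as exc:
    collision_error = str(exc)
```

When migrating from `data_script`, it may be worth checking that `automl_settings` does not already define `X` or `y`.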
@@ -155,7 +149,8 @@ automl_config = AutoMLConfig(task='classification',
                              path=project_folder,
                              compute_target=compute_target,
                              run_configuration=run_config,
-                             data_script=project_folder + "/get_data.py",
+                             X = X,
+                             y = y,
                              **automl_settings,
                              model_explainability=True,
                              X_valid=X_test
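
Review note: the totals reported for this file (14 additions, 19 deletions) can be cross-checked against the hunk headers. Each header `@@ -a,b +c,d @@` gives the old and new hunk lengths `b` and `d`, and the sum of `b - d` over all hunks must equal deletions minus additions. A small sketch using the five headers from this diff:

```python
import re

# Hunk headers copied from this diff (trailing context text omitted).
HUNK_HEADERS = [
    "@@ -47,7 +47,6 @@",
    "@@ -64,35 +63,30 @@",
    "@@ -116,7 +110,6 @@",
    "@@ -140,7 +133,8 @@",
    "@@ -155,7 +149,8 @@",
]

HEADER_RE = re.compile(r"@@ -(\d+),(\d+) \+(\d+),(\d+) @@")


def net_removed(headers):
    """Sum of (old hunk length - new hunk length) across all hunks."""
    total = 0
    for header in headers:
        _, old_len, _, new_len = map(int, HEADER_RE.match(header).groups())
        total += old_len - new_len
    return total


net = net_removed(HUNK_HEADERS)
```

Here `net` comes out to 5, which agrees with 19 deletions minus 14 additions.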
