
Commit d048310

Transform data: minor url update; write data: tables cut
1 parent 09c6d64 commit d048310

2 files changed: +29 -43 lines


articles/machine-learning/service/how-to-transform-data.md

Lines changed: 2 additions & 2 deletions
@@ -285,7 +285,7 @@ dataflow.head(5)
 
 ### Filtering columns
 
-To filter columns, use `Dataflow.drop_columns()`. This method takes a list of columns to drop or a more complex argument called [`ColumnSelector`](https://docs.microsoft.com/en-us/python/api/azureml-dataprep/azureml.dataprep.columnselector?view=azure-dataprep-py).
+To filter columns, use `Dataflow.drop_columns()`. This method takes a list of columns to drop or a more complex argument called [`ColumnSelector`](https://docs.microsoft.com/python/api/azureml-dataprep/azureml.dataprep.columnselector?view=azure-dataprep-py).
 
 #### Filtering columns with list of strings
 
@@ -490,7 +490,7 @@ df.head(2)
 |0|ALABAMA|Jefferson County|Jefferson County, Alabama|1.019200e+10|1.0|
 |1|ALABAMA|Jefferson County|Jefferson County, Alabama|1.019200e+10|0.0|
 
-## Next Steps
+## Next steps
 
 * See the SDK [overview](https://aka.ms/data-prep-sdk) for design patterns and usage examples
 * See the Azure Machine Learning Data Prep SDK [tutorial](tutorial-data-prep.md) for an example of solving a specific scenario
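The `ColumnSelector` reference whose URL was fixed above is the more flexible of the two arguments `drop_columns()` accepts. A minimal sketch of both call styles follows, reusing the sample file from the write-data examples; the `ColumnSelector` keyword arguments (`term`, `use_regex`) are assumptions read off the linked reference page, not confirmed by this diff:

```python
import azureml.dataprep as dprep

dataflow = dprep.auto_read_file('./data/fixed_width_file.txt')

# Simple form: drop an explicit list of column names.
dataflow = dataflow.drop_columns(['Column8', 'Column9'])

# Flexible form: drop every column whose name matches a pattern
# via a ColumnSelector (argument names are assumptions).
dataflow = dataflow.drop_columns(
    dprep.ColumnSelector(term='Column[1-3]', use_regex=True))
```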

articles/machine-learning/service/how-to-write-data.md

Lines changed: 27 additions & 41 deletions
@@ -15,7 +15,7 @@ ms.custom: seodec18
 ---
 # Write data using the Azure Machine Learning Data Prep SDK
 
-In this article, you learn different methods to write data using the Azure Machine Learning Data Prep SDK. Output data can be written at any point in a dataflow, and writes are added as steps to the resulting data flow and are run every time the data flow is. Data is written to multiple partition files to allow parallel writes.
+In this article, you learn different methods to write data using the [Azure Machine Learning Data Prep SDK](https://aka.ms/data-prep-sdk). Output data can be written at any point in a dataflow. Writes are added as steps to the resulting data flow and are run every time the data flow is run. Data is written to multiple partition files to allow parallel writes.
 
 Since there is no limit on the number of write steps in a pipeline, you can easily add write steps to capture intermediate results for troubleshooting or for other pipelines.
 
@@ -27,7 +27,7 @@ The following file formats are supported
 - Delimited files (CSV, TSV, etc.)
 - Parquet files
 
-Using the [Azure Machine Learning Data Prep python SDK](https://aka.ms/data-prep-sdk), you can write data to:
+Using the Azure Machine Learning Data Prep Python SDK, you can write data to:
 + a local file system
 + Azure Blob Storage
 + Azure Data Lake Storage
@@ -48,22 +48,17 @@ For this example, start by loading data into a data flow. You reuse this data wi
 import azureml.dataprep as dprep
 t = dprep.auto_read_file('./data/fixed_width_file.txt')
 t = t.to_number('Column3')
-t.head(10)
+t.head(5)
 ```
 
 Example output:
-| | Column1 | Column2 | Column3 | Column4 |Column5 | Column6 | Column7 | Column8 | Column9 |
-| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
-| 0 | 10000.0 | 99999.0 | None| NO| NO | ENRS |NaN | NaN | NaN|
-| 1| 10003.0 | 99999.0 | None| NO| NO | ENSO| NaN| NaN |NaN|
-| 2| 10010.0| 99999.0| None| NO| JN| ENJA| 70933.0| -8667.0 |90.0|
-|3| 10013.0| 99999.0| None| NO| NO| | NaN| NaN| NaN|
-|4| 10014.0| 99999.0| None| NO| NO| ENSO| 59783.0| 5350.0| 500.0|
-|5| 10015.0| 99999.0| None| NO| NO| ENBL| 61383.0| 5867.0| 3270.0|
-|6| 10016.0 |99999.0| None| NO| NO| |64850.0| 11233.0| 140.0|
-|7| 10017.0| 99999.0| None| NO| NO| ENFR| 59933.0| 2417.0| 480.0|
-|8| 10020.0| 99999.0| None| NO| SV| |80050.0| 16250.0| 80.0|
-|9| 10030.0| 99999.0| None| NO| SV| |77000.0| 15500.0| 120.0|
+| | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 |
+| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
+|0| 10000.0 | 99999.0 | None | NO | NO | ENRS | NaN | NaN | NaN |
+|1| 10003.0 | 99999.0 | None | NO | NO | ENSO | NaN | NaN | NaN |
+|2| 10010.0 | 99999.0 | None | NO | JN | ENJA | 70933.0 | -8667.0 | 90.0 |
+|3| 10013.0 | 99999.0 | None | NO | NO | | NaN | NaN | NaN |
+|4| 10014.0 | 99999.0 | None | NO | NO | ENSO | 59783.0 | 5350.0 | 500.0|
 
 ### Delimited file example
 
@@ -77,22 +72,18 @@ write_t = t.write_to_csv(directory_path=dprep.LocalFileOutput('./test_out/'))
 write_t.run_local()
 
 written_files = dprep.read_csv('./test_out/part-*')
-written_files.head(10)
+written_files.head(5)
 ```
 
 Example output:
-| | Column1 | Column2 | Column3 | Column4 |Column5 | Column6 | Column7 | Column8 | Column9 |
-| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
-| 0 | 10000.0 | 99999.0 | ERROR | NO| NO | ENRS |ERROR | ERROR | ERROR|
-| 1| 10003.0 | 99999.0 | ERROR | NO| NO | ENSO| ERROR| ERROR |ERROR|
-| 2| 10010.0| 99999.0| ERROR | NO| JN| ENJA| 70933.0| -8667.0 |90.0|
-|3| 10013.0| 99999.0| ERROR | NO| NO| | ERROR| ERROR| ERROR|
-|4| 10014.0| 99999.0| ERROR | NO| NO| ENSO| 59783.0| 5350.0| 500.0|
-|5| 10015.0| 99999.0| ERROR | NO| NO| ENBL| 61383.0| 5867.0| 3270.0|
-|6| 10016.0 |99999.0| ERROR | NO| NO| |64850.0| 11233.0| 140.0|
-|7| 10017.0| 99999.0| ERROR | NO| NO| ENFR| 59933.0| 2417.0| 480.0|
-|8| 10020.0| 99999.0| ERROR | NO| SV| |80050.0| 16250.0| 80.0|
-|9| 10030.0| 99999.0| ERROR | NO| SV| |77000.0| 15500.0| 120.0|
+| | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 |
+| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
+|0| 10000.0 | 99999.0 | ERROR | NO | NO | ENRS | NaN | NaN | NaN |
+|1| 10003.0 | 99999.0 | ERROR | NO | NO | ENSO | NaN | NaN | NaN |
+|2| 10010.0 | 99999.0 | ERROR | NO | JN | ENJA | 70933.0 | -8667.0 | 90.0 |
+|3| 10013.0 | 99999.0 | ERROR | NO | NO | | NaN | NaN | NaN |
+|4| 10014.0 | 99999.0 | ERROR | NO | NO | ENSO | 59783.0 | 5350.0 | 500.0|
 
 In the preceding output, several errors appear in the numeric columns because of numbers that were not parsed correctly. When written to CSV, these error values are replaced with the string "ERROR" by default.
 
@@ -104,22 +95,17 @@ write_t = t.write_to_csv(directory_path=dprep.LocalFileOutput('./test_out/'),
 na='NA')
 write_t.run_local()
 written_files = dprep.read_csv('./test_out/part-*')
-written_files.head(10)
+written_files.head(5)
 ```
 
 The preceding code produces this output:
-| | Column1 | Column2 | Column3 | Column4 |Column5 | Column6 | Column7 | Column8 | Column9 |
-| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
-| 0 | 10000.0 | 99999.0 | BadData | NO| NO | ENRS |BadData | BadData | BadData|
-| 1| 10003.0 | 99999.0 | BadData | NO| NO | ENSO| BadData| BadData |BadData|
-| 2| 10010.0| 99999.0| BadData | NO| JN| ENJA| 70933.0| -8667.0 |90.0|
-|3| 10013.0| 99999.0| BadData | NO| NO| | BadData| BadData| BadData|
-|4| 10014.0| 99999.0| BadData | NO| NO| ENSO| 59783.0| 5350.0| 500.0|
-|5| 10015.0| 99999.0| BadData | NO| NO| ENBL| 61383.0| 5867.0| 3270.0|
-|6| 10016.0 |99999.0| BadData | NO| NO| |64850.0| 11233.0| 140.0|
-|7| 10017.0| 99999.0| BadData | NO| NO| ENFR| 59933.0| 2417.0| 480.0|
-|8| 10020.0| 99999.0| BadData | NO| SV| |80050.0| 16250.0| 80.0|
-|9| 10030.0| 99999.0| BadData | NO| SV| |77000.0| 15500.0| 120.0|
+| | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 |
+| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
+|0| 10000.0 | 99999.0 | BadData | NO | NO | ENRS | NaN | NaN | NaN |
+|1| 10003.0 | 99999.0 | BadData | NO | NO | ENSO | NaN | NaN | NaN |
+|2| 10010.0 | 99999.0 | BadData | NO | JN | ENJA | 70933.0 | -8667.0 | 90.0 |
+|3| 10013.0 | 99999.0 | BadData | NO | NO | | NaN | NaN | NaN |
+|4| 10014.0 | 99999.0 | BadData | NO | NO | ENSO | 59783.0 | 5350.0 | 500.0|
 
 ### Parquet file example
 
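Pulling the surviving pieces of the write-data examples together, here is a minimal end-to-end sketch. The `error='BadData'` argument is an assumption inferred from the third output table (that line of the original file falls outside the hunks shown); everything else mirrors code visible in the diff:

```python
import azureml.dataprep as dprep

# Load and transform: Column3 values that fail to parse become errors.
t = dprep.auto_read_file('./data/fixed_width_file.txt')
t = t.to_number('Column3')

# Writing is itself a dataflow step: nothing is written until the
# dataflow runs, and output is split across partition files.
write_t = t.write_to_csv(directory_path=dprep.LocalFileOutput('./test_out/'),
                         error='BadData',  # assumed: replacement string for error values
                         na='NA')          # replacement string for null values
write_t.run_local()

# Read the partition files back to verify what was written.
written_files = dprep.read_csv('./test_out/part-*')
written_files.head(5)
```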