Commit d7d1e05

Not using sharding
1 parent 234d91e commit d7d1e05

2 files changed: +7 -29 lines changed

examples/TPU/README.md

Lines changed: 1 addition & 29 deletions
@@ -9,35 +9,7 @@ displayed in this README.
 
 ### Using TPUStrategy with a custom loop
 
-This refers to the script `run_tpu_glue.py`. It currently crashes with the following error:
-
-```
-Running on TPU ['192.168.31.2:8470']
-Number of accelerators: 8
-TPUStrategy obtained.
-2019-09-29 15:54:22.498885: E tensorflow/core/framework/dataset.cc:76] The Encode() method is not implemented for DatasetVariantWrapper objects.
-Traceback (most recent call last):
-  File "/home/lysandre/transformers/examples/TPU/run_tpu_glue.py", line 32, in <module>
-    train_distributed_dataset = strategy.experimental_distribute_dataset(train_dataset)
-  File "/home/lysandre/transformers/venv/lib/python3.5/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 674, in experimental_distribute_dataset
-    return self._extended._experimental_distribute_dataset(dataset) # pylint: disable=protected-access
-  File "/home/lysandre/transformers/venv/lib/python3.5/site-packages/tensorflow_core/python/distribute/tpu_strategy.py", line 256, in _experimental_distribute_dataset
-    split_batch_by=self._num_replicas_in_sync)
-  File "/home/lysandre/transformers/venv/lib/python3.5/site-packages/tensorflow_core/python/distribute/input_lib.py", line 81, in get_distributed_dataset
-    input_context=input_context)
-  File "/home/lysandre/transformers/venv/lib/python3.5/site-packages/tensorflow_core/python/distribute/input_lib.py", line 558, in __init__
-    input_context=input_context)
-  File "/home/lysandre/transformers/venv/lib/python3.5/site-packages/tensorflow_core/python/distribute/input_lib.py", line 484, in __init__
-    dataset = distribute._RebatchDataset(dataset, split_batch_by) # pylint: disable=protected-access
-  File "/home/lysandre/transformers/venv/lib/python3.5/site-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 112, in __init__
-    **self._flat_structure)
-  File "/home/lysandre/transformers/venv/lib/python3.5/site-packages/tensorflow_core/python/ops/gen_experimental_dataset_ops.py", line 6468, in rebatch_dataset
-    _six.raise_from(_core._status_to_exception(e.code, message), None)
-  File "<string>", line 3, in raise_from
-tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:worker/replica:0/task:0/device:CPU:0 in order to run RebatchDataset: Unable to parse tensor proto
-Additional GRPC error information:
-{"created":"@1569772462.499287349","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Unable to parse tensor proto","grpc_status":3} [Op:RebatchDataset]
-```
+This refers to the script `run_tpu_glue.py`. It doesn't crash anymore. Advancing on the script.
 
 ### Using TPUStrategy with keras' fit method
 
examples/TPU/run_tpu_glue.py

Lines changed: 6 additions & 0 deletions
@@ -31,6 +31,12 @@
 strategy, _ = get_tpu()
 print('TPUStrategy obtained.')
 
+dataset_options = tf.data.Options()
+dataset_options.experimental_distribute.auto_shard = False
+
+train_dataset = train_dataset.with_options(dataset_options)
+test_dataset = test_dataset.with_options(dataset_options)
+
 with strategy.scope():
     # Crashes here
     train_distributed_dataset = strategy.experimental_distribute_dataset(train_dataset)
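
For readers unfamiliar with the term in the commit message, the following is a minimal plain-Python sketch of what sharding an input pipeline means. It does not use TensorFlow and is not the code path touched by this commit. The helper `shard` and the toy `examples` list are hypothetical names introduced only for illustration. The slicing rule mirrors the documented behaviour of `tf.data.Dataset.shard(num_shards, index)`, which keeps every `num_shards`-th element starting at `index`. With sharding turned off, each worker is assumed to iterate over the whole dataset.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard(elements: Sequence[T], num_shards: int, index: int) -> List[T]:
    """Keep every `num_shards`-th element, starting at position `index`."""
    if not 0 <= index < num_shards:
        raise ValueError("index must satisfy 0 <= index < num_shards")
    return list(elements[index::num_shards])


# Hypothetical stand-in for a dataset of 8 examples.
examples = list(range(8))

# Sharded input: each of 2 hypothetical workers reads a disjoint slice.
sharded = [shard(examples, num_shards=2, index=i) for i in range(2)]

# Unsharded input: every worker iterates over the full dataset.
unsharded = [list(examples) for _ in range(2)]

print(sharded)
print(unsharded)
```

The sketch only illustrates the difference in which elements each worker sees. It makes no claim about why disabling `auto_shard` avoided the `RebatchDataset` error shown in the removed README text.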
