
Commit cb38ffc

[PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer (huggingface#10324)
* push to show
* small improvement
* small improvement
* Update src/transformers/feature_extraction_utils.py
* Update src/transformers/feature_extraction_utils.py
* implement base
* add common tests
* make all tests pass for wav2vec2
* make padding work & add more tests
* finalize feature extractor utils
* add call method to feature extraction
* finalize feature processor
* finish tokenizer
* finish general processor design
* finish tests
* typo
* remove bogus file
* finish docstring
* add docs
* finish docs
* small fix
* correct docs
* save intermediate
* load changes
* apply changes
* apply changes to doc
* change tests
* apply surajs recommend
* final changes
* Apply suggestions from code review
* fix typo
* fix import
* correct docstring
1 parent 9dc7825 commit cb38ffc

33 files changed: +2252, -176 lines changed

docs/source/index.rst

Lines changed: 2 additions & 0 deletions

@@ -375,6 +375,7 @@ TensorFlow and/or Flax.
     main_classes/processors
     main_classes/tokenizer
     main_classes/trainer
+    main_classes/feature_extractor

 .. toctree::
     :maxdepth: 2
@@ -441,3 +442,4 @@ TensorFlow and/or Flax.
     internal/tokenization_utils
     internal/trainer_utils
     internal/generation_utils
+    internal/file_utils

docs/source/internal/file_utils.rst

Lines changed: 54 additions & 0 deletions

@@ -0,0 +1,54 @@
+..
+    Copyright 2021 The HuggingFace Team. All rights reserved.
+
+    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+    the License. You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations under the License.
+
+General Utilities
+-----------------------------------------------------------------------------------------------------------------------
+
+This page lists all of Transformers general utility functions that are found in the file ``file_utils.py``.
+
+Most of those are only useful if you are studying the general code in the library.
+
+
+Enums and namedtuples
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.file_utils.ExplicitEnum
+
+.. autoclass:: transformers.file_utils.PaddingStrategy
+
+.. autoclass:: transformers.file_utils.TensorType
+
+
+Special Decorators
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autofunction:: transformers.file_utils.add_start_docstrings
+
+.. autofunction:: transformers.file_utils.add_start_docstrings_to_model_forward
+
+.. autofunction:: transformers.file_utils.add_end_docstrings
+
+.. autofunction:: transformers.file_utils.add_code_sample_docstrings
+
+.. autofunction:: transformers.file_utils.replace_return_docstrings
+
+
+Special Properties
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.file_utils.cached_property
+
+
+Other Utilities
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.file_utils._BaseLazyModule
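The ``ExplicitEnum`` documented on this new page is the base class that ``PaddingStrategy`` and ``TensorType`` are built on. As a rough illustration of the pattern, here is a minimal sketch: a string-valued enum whose failed lookups report the valid choices. This is an approximation for illustration, not the library's exact implementation.

```python
from enum import Enum


class ExplicitEnum(Enum):
    """Enum whose failed value lookup lists the valid values (a sketch of
    the pattern behind transformers.file_utils.ExplicitEnum)."""

    @classmethod
    def _missing_(cls, value):
        # Called by Enum machinery when value-based lookup fails.
        raise ValueError(
            f"{value!r} is not a valid {cls.__name__}, "
            f"please select one of {[m.value for m in cls]}"
        )


class PaddingStrategy(ExplicitEnum):
    # String values so callers can pass plain strings like padding="longest".
    LONGEST = "longest"
    MAX_LENGTH = "max_length"
    DO_NOT_PAD = "do_not_pad"


# Plain strings resolve through the value-based Enum lookup:
strategy = PaddingStrategy("longest")
```

The benefit over a bare ``Enum`` is purely ergonomic: a typo in a string argument produces an error naming the accepted values instead of an opaque lookup failure.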

docs/source/internal/tokenization_utils.rst

Lines changed: 0 additions & 6 deletions

@@ -38,12 +38,6 @@ SpecialTokensMixin
 Enums and namedtuples
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. autoclass:: transformers.tokenization_utils_base.ExplicitEnum
-
-.. autoclass:: transformers.tokenization_utils_base.PaddingStrategy
-
-.. autoclass:: transformers.tokenization_utils_base.TensorType
-
 .. autoclass:: transformers.tokenization_utils_base.TruncationStrategy

 .. autoclass:: transformers.tokenization_utils_base.CharSpan
docs/source/main_classes/feature_extractor.rst

Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@
+..
+    Copyright 2021 The HuggingFace Team. All rights reserved.
+
+    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+    the License. You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations under the License.
+
+
+Feature Extractor
+-----------------------------------------------------------------------------------------------------------------------
+
+A feature extractor is in charge of preparing read-in audio files for a speech model. This includes feature extraction,
+such as processing audio files to, *e.g.*, Log-Mel Spectrogram features, but also padding, normalization, and
+conversion to Numpy, PyTorch, and TensorFlow tensors.
+
+
+PreTrainedFeatureExtractor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.PreTrainedFeatureExtractor
+    :members: from_pretrained, save_pretrained, pad
+
+
+BatchFeature
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.BatchFeature
+    :members:
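The padding and normalization the new page describes can be sketched in plain Python. This is a toy illustration of what a feature extractor's preprocessing does conceptually; the function names below are invented for the example, and none of it is the actual ``PreTrainedFeatureExtractor`` code.

```python
# Two of the preprocessing steps the docs describe: zero-mean/unit-variance
# normalization of each raw waveform, then zero-padding the batch to a common
# length with an attention mask marking the real (non-padded) samples.


def normalize(waveform):
    """Scale a raw waveform (list of floats) to zero mean and unit variance."""
    mean = sum(waveform) / len(waveform)
    var = sum((x - mean) ** 2 for x in waveform) / len(waveform)
    std = var ** 0.5 or 1.0  # guard against a constant (zero-variance) signal
    return [(x - mean) / std for x in waveform]


def pad_batch(waveforms, padding_value=0.0):
    """Pad variable-length waveforms to the longest one in the batch.

    Returns the padded batch and an attention mask (1 = real sample, 0 = pad).
    """
    max_len = max(len(w) for w in waveforms)
    batch = [w + [padding_value] * (max_len - len(w)) for w in waveforms]
    mask = [[1] * len(w) + [0] * (max_len - len(w)) for w in waveforms]
    return batch, mask


batch, mask = pad_batch([normalize([0.1, 0.2, 0.3]), normalize([0.5, 0.7])])
```

The attention mask is what lets the model ignore the zero-padded tail of the shorter inputs; conversion to NumPy/PyTorch/TensorFlow tensors would happen after this step.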

docs/source/model_doc/wav2vec2.rst

Lines changed: 17 additions & 3 deletions

@@ -34,7 +34,7 @@ Tips:

 - Wav2Vec2 is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
 - Wav2Vec2 model was trained using connectionist temporal classification (CTC) so the model output has to be decoded
-  using :class:`~transformers.Wav2Vec2Tokenizer`.
+  using :class:`~transformers.Wav2Vec2CTCTokenizer`.


 Wav2Vec2Config
@@ -44,13 +44,27 @@ Wav2Vec2Config
     :members:


-Wav2Vec2Tokenizer
+Wav2Vec2CTCTokenizer
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. autoclass:: transformers.Wav2Vec2Tokenizer
+.. autoclass:: transformers.Wav2Vec2CTCTokenizer
     :members: __call__, save_vocabulary


+Wav2Vec2FeatureExtractor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.Wav2Vec2FeatureExtractor
+    :members: __call__
+
+
+Wav2Vec2Processor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.Wav2Vec2Processor
+    :members: __call__, from_pretrained, save_pretrained, batch_decode, decode, as_target_processor
+
+
 Wav2Vec2Model
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
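The updated tip says Wav2Vec2's CTC output has to be decoded with ``Wav2Vec2CTCTokenizer``. The core of that decoding, the greedy CTC collapse rule, can be sketched as follows. The toy token sequence and function name are invented for illustration, though wav2vec 2.0 does use its pad token as the CTC blank and ``|`` as the word delimiter.

```python
# Greedy CTC decoding as applied conceptually after taking the argmax token
# per audio frame: merge consecutive repeated tokens, then drop the blank.

BLANK = "<pad>"  # wav2vec 2.0 uses the pad token as the CTC blank


def ctc_greedy_decode(frame_tokens, blank=BLANK):
    """Collapse a per-frame token sequence into a transcription string."""
    out = []
    prev = None
    for tok in frame_tokens:
        # Keep a token only when it differs from its predecessor (merging
        # repeats) and is not the blank (removing blanks). A blank between
        # two identical tokens correctly preserves a genuine double letter.
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    return "".join(out).replace("|", " ")  # "|" is the word delimiter


# Ten frames collapse to "HELLO":
text = ctc_greedy_decode(["H", "H", BLANK, "E", BLANK, "L", "L", BLANK, "L", "O"])
```

This is why CTC models need a dedicated tokenizer for decoding: the frame-level argmax is much longer than the transcription, and the blank token exists only to separate repeats.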

examples/multiple-choice/run_swag.py

Lines changed: 3 additions & 2 deletions

@@ -39,7 +39,8 @@
     default_data_collator,
     set_seed,
 )
-from transformers.tokenization_utils_base import PaddingStrategy, PreTrainedTokenizerBase
+from transformers.file_utils import PaddingStrategy
+from transformers.tokenization_utils_base import PreTrainedTokenizerBase
 from transformers.trainer_utils import get_last_checkpoint, is_main_process


@@ -133,7 +134,7 @@ class DataCollatorForMultipleChoice:
     Args:
         tokenizer (:class:`~transformers.PreTrainedTokenizer` or :class:`~transformers.PreTrainedTokenizerFast`):
             The tokenizer used for encoding the data.
-        padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
+        padding (:obj:`bool`, :obj:`str` or :class:`~transformers.file_utils.PaddingStrategy`, `optional`, defaults to :obj:`True`):
             Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
             among:

src/transformers/__init__.py

Lines changed: 22 additions & 4 deletions

@@ -88,6 +88,7 @@
         "TF_WEIGHTS_NAME",
         "TRANSFORMERS_CACHE",
         "WEIGHTS_NAME",
+        "TensorType",
         "add_end_docstrings",
         "add_start_docstrings",
         "cached_path",
@@ -125,7 +126,14 @@
     ],
     "models": [],
     # Models
-    "models.wav2vec2": ["WAV_2_VEC_2_PRETRAINED_CONFIG_ARCHIVE_MAP", "Wav2Vec2Config", "Wav2Vec2Tokenizer"],
+    "models.wav2vec2": [
+        "WAV_2_VEC_2_PRETRAINED_CONFIG_ARCHIVE_MAP",
+        "Wav2Vec2Config",
+        "Wav2Vec2CTCTokenizer",
+        "Wav2Vec2Tokenizer",
+        "Wav2Vec2FeatureExtractor",
+        "Wav2Vec2Processor",
+    ],
     "models.convbert": ["CONVBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "ConvBertConfig", "ConvBertTokenizer"],
     "models.albert": ["ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "AlbertConfig"],
     "models.auto": [
@@ -234,9 +242,9 @@
         "CharSpan",
         "PreTrainedTokenizerBase",
         "SpecialTokensMixin",
-        "TensorType",
         "TokenSpan",
     ],
+    "feature_extraction_utils": ["PreTrainedFeatureExtractor", "BatchFeature"],
     "trainer_callback": [
         "DefaultFlowCallback",
         "EarlyStoppingCallback",
@@ -1217,6 +1225,9 @@
         xnli_tasks_num_labels,
     )

+    # Feature Extractor
+    from .feature_extraction_utils import BatchFeature, PreTrainedFeatureExtractor
+
     # Files and general utilities
     from .file_utils import (
         CONFIG_NAME,
@@ -1228,6 +1239,7 @@
         TF_WEIGHTS_NAME,
         TRANSFORMERS_CACHE,
         WEIGHTS_NAME,
+        TensorType,
         add_end_docstrings,
         add_start_docstrings,
         cached_path,
@@ -1343,7 +1355,14 @@
         TransfoXLCorpus,
         TransfoXLTokenizer,
     )
-    from .models.wav2vec2 import WAV_2_VEC_2_PRETRAINED_CONFIG_ARCHIVE_MAP, Wav2Vec2Config, Wav2Vec2Tokenizer
+    from .models.wav2vec2 import (
+        WAV_2_VEC_2_PRETRAINED_CONFIG_ARCHIVE_MAP,
+        Wav2Vec2Config,
+        Wav2Vec2CTCTokenizer,
+        Wav2Vec2FeatureExtractor,
+        Wav2Vec2Processor,
+        Wav2Vec2Tokenizer,
+    )
     from .models.xlm import XLM_PRETRAINED_CONFIG_ARCHIVE_MAP, XLMConfig, XLMTokenizer
     from .models.xlm_prophetnet import XLM_PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP, XLMProphetNetConfig
     from .models.xlm_roberta import XLM_ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, XLMRobertaConfig
@@ -1381,7 +1400,6 @@
         CharSpan,
         PreTrainedTokenizerBase,
         SpecialTokensMixin,
-        TensorType,
         TokenSpan,
     )

src/transformers/data/data_collator.py

Lines changed: 5 additions & 4 deletions

@@ -20,8 +20,9 @@
 import torch
 from torch.nn.utils.rnn import pad_sequence

+from ..file_utils import PaddingStrategy
 from ..modeling_utils import PreTrainedModel
-from ..tokenization_utils_base import BatchEncoding, PaddingStrategy, PreTrainedTokenizerBase
+from ..tokenization_utils_base import BatchEncoding, PreTrainedTokenizerBase


 InputDataClass = NewType("InputDataClass", Any)
@@ -89,7 +90,7 @@ class DataCollatorWithPadding:
     Args:
         tokenizer (:class:`~transformers.PreTrainedTokenizer` or :class:`~transformers.PreTrainedTokenizerFast`):
             The tokenizer used for encoding the data.
-        padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
+        padding (:obj:`bool`, :obj:`str` or :class:`~transformers.file_utils.PaddingStrategy`, `optional`, defaults to :obj:`True`):
             Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
             among:
@@ -138,7 +139,7 @@ class DataCollatorForTokenClassification:
     Args:
         tokenizer (:class:`~transformers.PreTrainedTokenizer` or :class:`~transformers.PreTrainedTokenizerFast`):
             The tokenizer used for encoding the data.
-        padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
+        padding (:obj:`bool`, :obj:`str` or :class:`~transformers.file_utils.PaddingStrategy`, `optional`, defaults to :obj:`True`):
             Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
             among:
@@ -238,7 +239,7 @@ class DataCollatorForSeq2Seq:
             prepare the `decoder_input_ids`

             This is useful when using `label_smoothing` to avoid calculating loss twice.
-        padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
+        padding (:obj:`bool`, :obj:`str` or :class:`~transformers.file_utils.PaddingStrategy`, `optional`, defaults to :obj:`True`):
             Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
             among:
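The collator docstrings touched above all accept a padding strategy. A hypothetical, list-based sketch of what the three options mean (the strategy names mirror the ``PaddingStrategy`` values; the real padding is done by the tokenizer's ``pad`` method, not this function):

```python
# Illustration of the three padding options the docstrings describe, applied
# to plain lists of token ids.


def pad_features(sequences, strategy="longest", max_length=None, pad_id=0):
    """Pad a batch of token-id lists according to the chosen strategy."""
    if strategy == "do_not_pad":
        return sequences  # return sequences unchanged, possibly ragged
    if strategy == "longest":
        target = max(len(s) for s in sequences)  # pad to longest in batch
    elif strategy == "max_length":
        if max_length is None:
            raise ValueError("the max_length strategy requires max_length")
        target = max_length  # pad every sequence to a fixed length
    else:
        raise ValueError(f"unknown padding strategy {strategy!r}")
    return [s + [pad_id] * (target - len(s)) for s in sequences]


batch = pad_features([[5, 6, 7], [8]], strategy="longest")
```

"longest" (the default behind ``padding=True``) wastes the least compute per batch, while "max_length" produces fixed shapes, which matters for static-shape backends.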
