
Commit 99a2771

novice03 authored, with co-authors NielsRogge, sgugger, and patrickvonplaten
* Add cookiecutter files
* Add cuda kernels and cpp files
* Update modeling_yoso.py
* Add .h files
* Update configuration_yoso.py
* Updates
* Remove tokenizer
* Code quality
* Update modeling_yoso.py
* Update modeling_yoso.py
* Fix failing test
* Update modeling_yoso.py
* Fix code quality
* Apply suggestions from code review
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review and fix integration tests
* Update src/transformers/models/yoso/modeling_yoso.py
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Apply suggestions from code review
* Fix copied from statement
* Fix docstring
* Fix code quality
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions and fix mask
* Apply suggestions from code review
* Fix code quality
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix docstrings
* Fix code quality
* Remove trailing whitespace
* Update yoso.mdx
* Move kernel loading to YosoEncoder
* make style
* Apply suggestions from code review
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/yoso/modeling_yoso.py
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add short summary to docs
* Update docs/source/model_doc/yoso.mdx
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update yoso.mdx
* Update docs/source/model_doc/yoso.mdx
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Remove CausalLM model and add copied from
* Remove autoregressive code
* Remove unused imports
* add copied from for embeddings
* Fix code quality
* Update docs/source/model_doc/yoso.mdx
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Apply suggestion from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
1 parent 6292532 commit 99a2771

25 files changed: +4103 additions, -0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -325,6 +325,7 @@ AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Ch
 1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [​XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
 1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
 1. **[XLS-R](https://huggingface.co/docs/master/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
+1. **[YOSO](https://huggingface.co/docs/transformers/master/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
 1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.

 To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).

README_ko.md

Lines changed: 1 addition & 0 deletions
@@ -303,6 +303,7 @@ the Flax, PyTorch, TensorFlow installation pages for installing these with conda
 1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [​XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
 1. **[XLS-R](https://huggingface.co/docs/master/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
 1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
+1. **[YOSO](https://huggingface.co/docs/transformers/master/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
 1. Want to contribute a new model? We will help you add one with our **detailed guide and templates**. You can find them in the [`templates`](./templates) folder of this repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md), and contact the maintainers or open an issue to get feedback before submitting your PR.

 To check whether each model has an implementation in Flax, PyTorch, or TensorFlow, or uses a tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).

README_zh-hans.md

Lines changed: 1 addition & 0 deletions
@@ -327,6 +327,7 @@ conda install -c huggingface transformers
 1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
 1. **[XLS-R](https://huggingface.co/docs/master/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
 1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
+1. **[YOSO](https://huggingface.co/docs/transformers/master/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
 1. Want to contribute a new model? We have a **detailed guide and templates** to walk you through adding one; you can find them in the [`templates`](./templates) directory. Be sure to read the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue for feedback before starting your PR.

 To check whether a model already has a Flax, PyTorch, or TensorFlow implementation, or has a corresponding tokenizer in the 🤗 Tokenizers library, see [this table](https://huggingface.co/docs/transformers/index#supported-frameworks)

README_zh-hant.md

Lines changed: 1 addition & 0 deletions
@@ -339,6 +339,7 @@ conda install -c huggingface transformers
 1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [​XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
 1. **[XLS-R](https://huggingface.co/docs/master/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
 1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
+1. **[YOSO](https://huggingface.co/docs/transformers/master/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
 1. Want to contribute a new model? We have a **detailed guide and templates** to walk you through adding one; you can find them in the [`templates`](./templates) directory. Be sure to read the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue for feedback before starting your PR.

 To check whether a model already has a Flax, PyTorch, or TensorFlow implementation, or has a corresponding tokenizer in the 🤗 Tokenizers library, see [this table](https://huggingface.co/docs/transformers/index#supported-frameworks)

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -316,6 +316,8 @@
     title: XLSR-Wav2Vec2
   - local: model_doc/xls_r
     title: XLS-R
+  - local: model_doc/yoso
+    title: YOSO
   title: Models
 - sections:
   - local: internal/modeling_utils

docs/source/index.mdx

Lines changed: 2 additions & 0 deletions
@@ -184,6 +184,7 @@ conversion utilities for the following models.
 1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [​XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
 1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
 1. **[XLS-R](https://huggingface.co/docs/master/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
+1. **[YOSO](model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.


 ### Supported frameworks
@@ -281,5 +282,6 @@ Flax), PyTorch, and/or TensorFlow.
 | XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
 | XLMProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
 | XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
+| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |

 <!-- End table-->

docs/source/model_doc/yoso.mdx

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# YOSO

## Overview

The YOSO model was proposed in [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714)
by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh. YOSO approximates standard softmax self-attention
via a Bernoulli sampling scheme based on Locality Sensitive Hashing (LSH). In principle, all the Bernoulli random variables can be sampled with
a single hash.

The abstract from the paper is the following:

*Transformer-based models are widely used in natural language processing (NLP). Central to the transformer model is
the self-attention mechanism, which captures the interactions of token pairs in the input sequences and depends quadratically
on the sequence length. Training such models on longer sequences is expensive. In this paper, we show that a Bernoulli sampling
attention mechanism based on Locality Sensitive Hashing (LSH), decreases the quadratic complexity of such models to linear.
We bypass the quadratic cost by considering self-attention as a sum of individual tokens associated with Bernoulli random
variables that can, in principle, be sampled at once by a single hash (although in practice, this number may be a small constant).
This leads to an efficient sampling scheme to estimate self-attention which relies on specific modifications of
LSH (to enable deployment on GPU architectures). We evaluate our algorithm on the GLUE benchmark with standard 512 sequence
length where we see favorable performance relative to a standard pretrained Transformer. On the Long Range Arena (LRA) benchmark,
for evaluating performance on long sequences, our method achieves results consistent with softmax self-attention but with sizable
speed-ups and memory savings and often outperforms other efficient self-attention methods. Our code is available at this https URL*
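To make the sampling idea concrete, here is a toy NumPy sketch of the underlying principle (an illustration only, not the library's CUDA implementation): under sign-random-projection LSH, two unit vectors at angle θ share a full τ-bit hash code with probability (1 − θ/π)^τ, so counting hash collisions yields Bernoulli samples of a softmax-like similarity and an unnormalized estimate of the attention output.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim, tau, n_hashes = 8, 16, 4, 256  # tau hyperplanes per hash code

def unit(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

Q = unit(rng.normal(size=(seq_len, dim)))  # unit-normalized queries
K = unit(rng.normal(size=(seq_len, dim)))  # unit-normalized keys
V = rng.normal(size=(seq_len, dim))        # values

# Count full hash-code collisions over independent hashing rounds.
collisions = np.zeros((seq_len, seq_len))
for _ in range(n_hashes):
    planes = rng.normal(size=(dim, tau))   # tau random hyperplanes
    hq = Q @ planes > 0                    # sign hash codes for queries
    hk = K @ planes > 0                    # sign hash codes for keys
    collisions += (hq[:, None, :] == hk[None, :, :]).all(axis=-1)

# E[collision rate] = (1 - theta/pi) ** tau, a monotone proxy for q . k, so
# the averaged collision counts give an unnormalized attention estimate.
weights = collisions / n_hashes
output = weights @ V  # shape (seq_len, dim)
```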
Tips:

- The YOSO attention algorithm is implemented through custom CUDA kernels, functions written in CUDA C++ that can be executed multiple times
in parallel on a GPU.
- The kernels provide a `fast_hash` function, which approximates the random projections of the queries and keys using the Fast Hadamard Transform. Using these
hash codes, the `lsh_cumulation` function approximates self-attention via LSH-based Bernoulli sampling.
- To use the custom kernels, the user should set `config.use_expectation = False`. To ensure that the kernels are compiled successfully,
the user must install the correct version of PyTorch and cudatoolkit. By default, `config.use_expectation = True`, which uses YOSO-E and
does not require compiling CUDA kernels. A configuration sketch follows below.
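As a minimal sketch of the toggle described in the last tip, assuming the publicly released `uw-madison/yoso-4096` checkpoint:

```python
from transformers import YosoConfig, YosoModel

# Default: use_expectation=True selects YOSO-E and needs no kernel compilation.
config = YosoConfig.from_pretrained("uw-madison/yoso-4096")

# Opt in to the LSH-sampling attention backed by the custom CUDA kernels
# (requires a CUDA-capable machine with matching PyTorch/cudatoolkit versions).
config.use_expectation = False

model = YosoModel.from_pretrained("uw-madison/yoso-4096", config=config)
```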
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/yoso_architecture.jpg"
alt="drawing" width="600"/>

<small> YOSO Attention Algorithm. Taken from the <a href="https://arxiv.org/abs/2111.09714">original paper</a>.</small>

This model was contributed by [novice03](https://huggingface.co/novice03). The original code can be found [here](https://github.com/mlpen/YOSO).

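A quick forward-pass sketch, assuming the same `uw-madison/yoso-4096` checkpoint and that it ships a compatible tokenizer:

```python
import torch
from transformers import AutoTokenizer, YosoModel

tokenizer = AutoTokenizer.from_pretrained("uw-madison/yoso-4096")
model = YosoModel.from_pretrained("uw-madison/yoso-4096")

inputs = tokenizer("You only sample (almost) once.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, seq_len, hidden_size)
```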
## YosoConfig

[[autodoc]] YosoConfig


## YosoModel

[[autodoc]] YosoModel
    - forward


## YosoForMaskedLM

[[autodoc]] YosoForMaskedLM
    - forward


## YosoForSequenceClassification

[[autodoc]] YosoForSequenceClassification
    - forward

## YosoForMultipleChoice

[[autodoc]] YosoForMultipleChoice
    - forward


## YosoForTokenClassification

[[autodoc]] YosoForTokenClassification
    - forward


## YosoForQuestionAnswering

[[autodoc]] YosoForQuestionAnswering
    - forward

src/transformers/__init__.py

Lines changed: 26 additions & 0 deletions
@@ -333,6 +333,7 @@
     "models.xlm_prophetnet": ["XLM_PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP", "XLMProphetNetConfig"],
     "models.xlm_roberta": ["XLM_ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP", "XLMRobertaConfig"],
     "models.xlnet": ["XLNET_PRETRAINED_CONFIG_ARCHIVE_MAP", "XLNetConfig"],
+    "models.yoso": ["YOSO_PRETRAINED_CONFIG_ARCHIVE_MAP", "YosoConfig"],
     "onnx": [],
     "pipelines": [
         "AudioClassificationPipeline",
@@ -1510,6 +1511,19 @@
         "load_tf_weights_in_xlnet",
     ]
 )
+_import_structure["models.yoso"].extend(
+    [
+        "YOSO_PRETRAINED_MODEL_ARCHIVE_LIST",
+        "YosoForMaskedLM",
+        "YosoForMultipleChoice",
+        "YosoForQuestionAnswering",
+        "YosoForSequenceClassification",
+        "YosoForTokenClassification",
+        "YosoLayer",
+        "YosoModel",
+        "YosoPreTrainedModel",
+    ]
+)
 _import_structure["optimization"] = [
     "Adafactor",
     "AdamW",
@@ -2454,6 +2468,7 @@
     from .models.xlm_prophetnet import XLM_PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP, XLMProphetNetConfig
     from .models.xlm_roberta import XLM_ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, XLMRobertaConfig
     from .models.xlnet import XLNET_PRETRAINED_CONFIG_ARCHIVE_MAP, XLNetConfig
+    from .models.yoso import YOSO_PRETRAINED_CONFIG_ARCHIVE_MAP, YosoConfig

     # Pipelines
     from .pipelines import (
@@ -3431,6 +3446,17 @@
         XLNetPreTrainedModel,
         load_tf_weights_in_xlnet,
     )
+    from .models.yoso import (
+        YOSO_PRETRAINED_MODEL_ARCHIVE_LIST,
+        YosoForMaskedLM,
+        YosoForMultipleChoice,
+        YosoForQuestionAnswering,
+        YosoForSequenceClassification,
+        YosoForTokenClassification,
+        YosoLayer,
+        YosoModel,
+        YosoPreTrainedModel,
+    )

     # Optimization
     from .optimization import (
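For context on what the `_import_structure` entries above do: `transformers/__init__.py` maps each submodule to the names it exports and defers the real import until a symbol is first accessed, so registering `models.yoso` here is what makes `from transformers import YosoModel` work without importing every model eagerly. A simplified sketch of the pattern (not the library's exact `_LazyModule` code):

```python
import importlib
from types import ModuleType

class LazyModule(ModuleType):
    """Defers submodule imports until an exported symbol is first accessed."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Invert {submodule: [symbols]} into {symbol: submodule}.
        self._symbol_to_module = {
            symbol: submodule
            for submodule, symbols in import_structure.items()
            for symbol in symbols
        }

    def __getattr__(self, name):
        if name in self._symbol_to_module:
            # Import the owning submodule only now, on first access.
            submodule = importlib.import_module(
                "." + self._symbol_to_module[name], self.__name__
            )
            return getattr(submodule, name)
        raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
```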

src/transformers/models/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -119,4 +119,5 @@
     xlm_prophetnet,
     xlm_roberta,
     xlnet,
+    yoso,
 )

src/transformers/models/auto/configuration_auto.py

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,7 @@
 CONFIG_MAPPING_NAMES = OrderedDict(
     [
         # Add configs here
+        ("yoso", "YosoConfig"),
         ("swin", "SwinConfig"),
         ("vilt", "ViltConfig"),
         ("vit_mae", "ViTMAEConfig"),
@@ -121,6 +122,7 @@
 CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
     [
         # Add archive maps here
+        ("yoso", "YOSO_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("swin", "SWIN_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("vilt", "VILT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("vit_mae", "VIT_MAE_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -200,6 +202,7 @@
 MODEL_NAMES_MAPPING = OrderedDict(
     [
         # Add full (and cased) model names here
+        ("yoso", "YOSO"),
         ("swin", "Swin"),
         ("vilt", "ViLT"),
         ("vit_mae", "ViTMAE"),
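Once `("yoso", "YosoConfig")` is registered in `CONFIG_MAPPING_NAMES`, the Auto API can resolve the model type by name; for example:

```python
from transformers import AutoConfig

# Build a default configuration from the registered "yoso" model type.
config = AutoConfig.for_model("yoso")
print(type(config).__name__)  # YosoConfig
```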

src/transformers/models/auto/modeling_auto.py

Lines changed: 7 additions & 0 deletions
@@ -28,6 +28,7 @@
 MODEL_MAPPING_NAMES = OrderedDict(
     [
         # Base model mapping
+        ("yoso", "YosoModel"),
         ("swin", "SwinModel"),
         ("vilt", "ViltModel"),
         ("vit_mae", "ViTMAEModel"),
@@ -155,6 +156,7 @@
 MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict(
     [
         # Model with LM heads mapping
+        ("yoso", "YosoForMaskedLM"),
         ("nystromformer", "NystromformerForMaskedLM"),
         ("qdqbert", "QDQBertForMaskedLM"),
         ("fnet", "FNetForMaskedLM"),
@@ -284,6 +286,7 @@
 MODEL_FOR_MASKED_LM_MAPPING_NAMES = OrderedDict(
     [
         # Model for Masked LM mapping
+        ("yoso", "YosoForMaskedLM"),
         ("nystromformer", "NystromformerForMaskedLM"),
         ("perceiver", "PerceiverForMaskedLM"),
         ("qdqbert", "QDQBertForMaskedLM"),
@@ -357,6 +360,7 @@
 MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
     [
         # Model for Sequence Classification mapping
+        ("yoso", "YosoForSequenceClassification"),
         ("nystromformer", "NystromformerForSequenceClassification"),
         ("perceiver", "PerceiverForSequenceClassification"),
         ("qdqbert", "QDQBertForSequenceClassification"),
@@ -405,6 +409,7 @@
 MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
     [
         # Model for Question Answering mapping
+        ("yoso", "YosoForQuestionAnswering"),
         ("nystromformer", "NystromformerForQuestionAnswering"),
         ("qdqbert", "QDQBertForQuestionAnswering"),
         ("fnet", "FNetForQuestionAnswering"),
@@ -454,6 +459,7 @@
 MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
     [
         # Model for Token Classification mapping
+        ("yoso", "YosoForTokenClassification"),
         ("nystromformer", "NystromformerForTokenClassification"),
         ("qdqbert", "QDQBertForTokenClassification"),
         ("fnet", "FNetForTokenClassification"),
@@ -490,6 +496,7 @@
 MODEL_FOR_MULTIPLE_CHOICE_MAPPING_NAMES = OrderedDict(
     [
         # Model for Multiple Choice mapping
+        ("yoso", "YosoForMultipleChoice"),
         ("nystromformer", "NystromformerForMultipleChoice"),
         ("qdqbert", "QDQBertForMultipleChoice"),
         ("fnet", "FNetForMultipleChoice"),
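With these mappings in place, each task-specific Auto class can dispatch to the matching YOSO head. A sketch, again assuming the `uw-madison/yoso-4096` checkpoint:

```python
from transformers import AutoModelForMaskedLM

# AutoModelForMaskedLM consults MODEL_FOR_MASKED_LM_MAPPING_NAMES and, on
# seeing a YosoConfig, instantiates YosoForMaskedLM.
model = AutoModelForMaskedLM.from_pretrained("uw-madison/yoso-4096")
print(type(model).__name__)  # YosoForMaskedLM
```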
