@@ -117,95 +117,98 @@ and conversion utilities for the following models:
12. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft Research) released with the paper `DeBERTa: Decoding-enhanced
BERT with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
Weizhu Chen.
- 13. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
+ 13. :doc:`DeBERTa-v2 <model_doc/deberta_v2>` (from Microsoft Research) released with the paper `DeBERTa:
+ Decoding-enhanced BERT with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong
+ Liu, Jianfeng Gao, Weizhu Chen.
+ 14. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe
Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
- 14. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
+ 15. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
`DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
version of DistilBERT.
- 15. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
+ 16. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
- 16. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
+ 17. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
- 17. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
+ 18. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
- 18. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
+ 19. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
Filtering out Sequential Redundancy for Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__ by
Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
- 19. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
+ 20. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans
and Ilya Sutskever.
- 20. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
+ 21. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
Luan, Dario Amodei** and Ilya Sutskever**.
- 21. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
+ 22. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
- 22. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
+ 23. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
<https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
- 23. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
+ 24. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
- 24. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
+ 25. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
by Hao Tan and Mohit Bansal.
- 25. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
+ 26. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
Translator Team.
- 26. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
+ 27. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
- 27. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
+ 28. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
Multilingual Pretraining and Finetuning <https://arxiv.org/abs/2008.00401>`__ by Yuqing Tang, Chau Tran, Xian Li,
Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
- 28. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
+ 29. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin,
Jianfeng Lu, Tie-Yan Liu.
- 29. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
+ 30. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
- 30. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
+ 31. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__ by Jingqing Zhang, Yao Zhao,
Mohammad Saleh and Peter J. Liu.
- 31. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
+ 32. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
- 32. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
+ 33. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
- 33. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT
+ 34. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT
Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar
Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
- 34. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
+ 35. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
Krishna, and Kurt W. Keutzer.
- 35. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
+ 36. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
- 36. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
+ 37. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
Francesco Piccinno and Julian Martin Eisenschlos.
- 37. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
+ 38. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
- 38. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
+ 39. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
Zhou, Abdelrahman Mohamed, Michael Auli.
- 39. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
+ 40. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
- 40. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
+ 41. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
- 41. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
+ 42. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
Zettlemoyer and Veselin Stoyanov.
- 42. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
+ 43. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
@@ -246,6 +249,8 @@ TensorFlow and/or Flax.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DeBERTa | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
+ | DeBERTa-v2 | ✅ | ❌ | ✅ | ❌ | ❌ |
+ +-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ❌ |
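
The new table row says that DeBERTa-v2, like the original DeBERTa, ships with a slow tokenizer and a PyTorch implementation but no fast tokenizer, TensorFlow, or Flax version yet. A minimal sketch of loading it through the Auto classes might look like the following; the ``microsoft/deberta-v2-xlarge`` checkpoint name is an illustrative assumption, not something taken from this diff:

.. code-block:: python

    # Minimal sketch: load a DeBERTa-v2 checkpoint via the Auto classes (PyTorch only).
    # The checkpoint name below is assumed for illustration; any DeBERTa-v2 checkpoint
    # on the model hub should work the same way.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
    model = AutoModel.from_pretrained("microsoft/deberta-v2-xlarge")

    inputs = tokenizer("DeBERTa-v2 uses disentangled attention.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Last hidden states: (batch_size, sequence_length, hidden_size)
    print(outputs.last_hidden_state.shape)
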
@@ -389,6 +394,7 @@ TensorFlow and/or Flax.
model_doc/convbert
model_doc/ctrl
model_doc/deberta
+ model_doc/deberta_v2
model_doc/dialogpt
model_doc/distilbert
model_doc/dpr