
Commit 63645b3

Authored by kssteven418, LysandreJik, sgugger and patrickvonplaten
I-BERT model support (huggingface#10153)
* IBertConfig, IBertTokentizer added
* IBert Model names moified
* tokenizer bugfix
* embedding -> QuantEmbedding
* quant utils added
* quant_mode added to configuration
* QuantAct added, Embedding layer + QuantAct addition
* QuantAct added
* unused path removed, QKV quantized
* self attention layer all quantized, except softmax
* temporarl commit
* all liner layers quantized
* quant_utils bugfix
* bugfix: requantization missing
* IntGELU added
* IntSoftmax added
* LayerNorm implemented
* LayerNorm implemented all
* names changed: roberta->ibert
* config not inherit from ROberta
* No support for CausalLM
* static quantization added, quantize_model.py removed
* import modules uncommented
* copyrights fixed
* minor bugfix
* quant_modules, quant_utils merged as one file
* import * fixed
* unused runfile removed
* make style run
* configutration.py docstring fixed
* refactoring: comments removed, function name fixed
* unused dependency removed
* typo fixed
* comments(Copied from), assertion string added
* refactoring: super(..) -> super(), etc.
* refactoring
* refarctoring
* make style
* refactoring
* cuda -> to(x.device)
* weight initialization removed
* QuantLinear set_param removed
* QuantEmbedding set_param removed
* IntLayerNorm set_param removed
* assert string added
* assertion error message fixed
* is_decoder removed
* enc-dec arguments/functions removed
* Converter removed
* quant_modules docstring fixed
* conver_slow_tokenizer rolled back
* quant_utils docstring fixed
* unused aruments e.g. use_cache removed from config
* weight initialization condition fixed
* x_min, x_max initialized with small values to avoid div-zero exceptions
* testing code for ibert
* test emb, linear, gelu, softmax added
* test ln and act added
* style reformatted
* force_dequant added
* error tests overrided
* make style
* Style + Docs
* force dequant tests added
* Fix fast tokenizer in init
* Fix doc
* Remove space
* docstring, IBertConfig, chunk_size
* test_modeling_ibert refactoring
* quant_modules.py refactoring
* e2e integration test added
* tokenizers removed
* IBertConfig added to tokenizer_auto.py
* bugfix
* fix docs & test
* fix style num 2
* final fixes

Co-authored-by: Sehoon Kim <sehoonkim@berkeley.edu>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
1 parent cb38ffc commit 63645b3

File tree: 12 files changed, +3279 −0 lines

docs/source/index.rst
Lines changed: 3 additions & 0 deletions

@@ -263,6 +263,8 @@ TensorFlow and/or Flax.
 +-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
 | Funnel Transformer ||||||
 +-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
+| I-BERT ||||||
++-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
 | LED ||||||
 +-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
 | LXMERT ||||||

@@ -405,6 +407,7 @@ TensorFlow and/or Flax.
     model_doc/fsmt
     model_doc/funnel
     model_doc/herbert
+    model_doc/ibert
     model_doc/layoutlm
     model_doc/led
     model_doc/longformer

docs/source/model_doc/ibert.rst (new file)
Lines changed: 88 additions & 0 deletions

..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

I-BERT
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The I-BERT model was proposed in `I-BERT: Integer-only BERT Quantization <https://arxiv.org/abs/2101.01321>`__ by
Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer. It is a quantized version of RoBERTa that
runs inference up to four times faster.

The abstract from the paper is the following:

*Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language
Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for
efficient inference at the edge, and even at the data center. While quantization can be a viable solution for this,
previous work on quantizing Transformer based models use floating-point arithmetic during inference, which cannot
efficiently utilize integer-only logical units such as the recent Turing Tensor Cores, or traditional integer-only ARM
processors. In this work, we propose I-BERT, a novel quantization scheme for Transformer based models that quantizes
the entire inference with integer-only arithmetic. Based on lightweight integer-only approximation methods for
nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, I-BERT performs an end-to-end integer-only BERT
inference without any floating point calculation. We evaluate our approach on GLUE downstream tasks using
RoBERTa-Base/Large. We show that for both cases, I-BERT achieves similar (and slightly higher) accuracy as compared to
the full-precision baseline. Furthermore, our preliminary implementation of I-BERT shows a speedup of 2.4 - 4.0x for
INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
been open-sourced.*

The original code can be found `here <https://github.com/kssteven418/I-BERT>`__.

IBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertConfig
    :members:


IBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertModel
    :members: forward


IBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForMaskedLM
    :members: forward


IBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForSequenceClassification
    :members: forward


IBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForMultipleChoice
    :members: forward


IBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForTokenClassification
    :members: forward


IBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForQuestionAnswering
    :members: forward
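A minimal usage sketch of the classes documented above (not part of the diff). It assumes the pretrained checkpoint name "kssteven/ibert-roberta-base" is published on the Hub; everything else uses only the classes added by this commit plus the RoBERTa tokenizer that I-BERT reuses.

import torch
from transformers import AutoTokenizer, IBertModel

# Checkpoint name below is an assumption; substitute whichever I-BERT
# checkpoint is actually available.
tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = IBertModel.from_pretrained("kssteven/ibert-roberta-base")

inputs = tokenizer("I-BERT runs integer-only inference.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)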

src/transformers/__init__.py
Lines changed: 25 additions & 0 deletions

@@ -182,6 +182,7 @@
     "models.funnel": ["FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP", "FunnelConfig", "FunnelTokenizer"],
     "models.gpt2": ["GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP", "GPT2Config", "GPT2Tokenizer"],
     "models.herbert": ["HerbertTokenizer"],
+    "models.ibert": ["IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "IBertConfig"],
     "models.layoutlm": ["LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP", "LayoutLMConfig", "LayoutLMTokenizer"],
     "models.led": ["LED_PRETRAINED_CONFIG_ARCHIVE_MAP", "LEDConfig", "LEDTokenizer"],
     "models.longformer": ["LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", "LongformerConfig", "LongformerTokenizer"],

@@ -613,6 +614,20 @@
             "load_tf_weights_in_gpt2",
         ]
     )
+    _import_structure["models.ibert"].extend(
+        [
+            "IBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
+            "IBertForMaskedLM",
+            "IBertForMultipleChoice",
+            "IBertForQuestionAnswering",
+            "IBertForSequenceClassification",
+            "IBertForTokenClassification",
+            "IBertLayer",
+            "IBertModel",
+            "IBertPreTrainedModel",
+            "load_tf_weights_in_ibert",
+        ]
+    )
     _import_structure["models.layoutlm"].extend(
         [
             "LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST",

@@ -1328,6 +1343,7 @@
     from .models.funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig, FunnelTokenizer
     from .models.gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config, GPT2Tokenizer
     from .models.herbert import HerbertTokenizer
+    from .models.ibert import IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, IBertConfig
     from .models.layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig, LayoutLMTokenizer
     from .models.led import LED_PRETRAINED_CONFIG_ARCHIVE_MAP, LEDConfig, LEDTokenizer
     from .models.longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig, LongformerTokenizer

@@ -1710,6 +1726,15 @@
         GPT2PreTrainedModel,
         load_tf_weights_in_gpt2,
     )
+    from .models.ibert import (
+        IBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
+        IBertForMaskedLM,
+        IBertForMultipleChoice,
+        IBertForQuestionAnswering,
+        IBertForSequenceClassification,
+        IBertForTokenClassification,
+        IBertModel,
+    )
     from .models.layoutlm import (
         LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST,
         LayoutLMForMaskedLM,
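As a quick illustration of the import surface these entries create, the sketch below builds an I-BERT model from a fresh config at the top level of the library. IBertConfig is always importable, while the model classes in the extend() block require PyTorch. The quant_mode and force_dequant arguments are taken from the commit message ("quant_mode added to configuration", "force_dequant added"); the values passed here are illustrative, not documented defaults.

from transformers import IBertConfig, IBertForMaskedLM  # model class needs torch installed

# quant_mode toggles integer-only execution; force_dequant can keep selected
# nonlinear ops in floating point. Both names come from the commit message.
config = IBertConfig(quant_mode=False, force_dequant="none")
model = IBertForMaskedLM(config)  # randomly initialized, just to show the wiring

print(model.config.quant_mode)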

src/transformers/models/auto/configuration_auto.py
Lines changed: 4 additions & 0 deletions

@@ -40,6 +40,7 @@
 from ..fsmt.configuration_fsmt import FSMT_PRETRAINED_CONFIG_ARCHIVE_MAP, FSMTConfig
 from ..funnel.configuration_funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig
 from ..gpt2.configuration_gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config
+from ..ibert.configuration_ibert import IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, IBertConfig
 from ..layoutlm.configuration_layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig
 from ..led.configuration_led import LED_PRETRAINED_CONFIG_ARCHIVE_MAP, LEDConfig
 from ..longformer.configuration_longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig

@@ -110,6 +111,7 @@
         PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
         MPNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
         TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP,
+        IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
     ]
     for key, value, in pretrained_map.items()
 )

@@ -123,6 +125,7 @@
         ("led", LEDConfig),
         ("blenderbot-small", BlenderbotSmallConfig),
         ("retribert", RetriBertConfig),
+        ("ibert", IBertConfig),
         ("mt5", MT5Config),
         ("t5", T5Config),
         ("mobilebert", MobileBertConfig),

@@ -173,6 +176,7 @@
         ("led", "LED"),
         ("blenderbot-small", "BlenderbotSmall"),
         ("retribert", "RetriBERT"),
+        ("ibert", "I-BERT"),
         ("t5", "T5"),
         ("mobilebert", "MobileBERT"),
         ("distilbert", "DistilBERT"),

src/transformers/models/auto/modeling_auto.py
Lines changed: 17 additions & 0 deletions

@@ -129,6 +129,14 @@
     FunnelModel,
 )
 from ..gpt2.modeling_gpt2 import GPT2ForSequenceClassification, GPT2LMHeadModel, GPT2Model
+from ..ibert.modeling_ibert import (
+    IBertForMaskedLM,
+    IBertForMultipleChoice,
+    IBertForQuestionAnswering,
+    IBertForSequenceClassification,
+    IBertForTokenClassification,
+    IBertModel,
+)
 from ..layoutlm.modeling_layoutlm import (
     LayoutLMForMaskedLM,
     LayoutLMForSequenceClassification,

@@ -270,6 +278,7 @@
     FSMTConfig,
     FunnelConfig,
     GPT2Config,
+    IBertConfig,
     LayoutLMConfig,
     LEDConfig,
     LongformerConfig,

@@ -347,6 +356,7 @@
         (MPNetConfig, MPNetModel),
         (TapasConfig, TapasModel),
         (MarianConfig, MarianModel),
+        (IBertConfig, IBertModel),
     ]
 )

@@ -379,6 +389,7 @@
         (FunnelConfig, FunnelForPreTraining),
         (MPNetConfig, MPNetForMaskedLM),
         (TapasConfig, TapasForMaskedLM),
+        (IBertConfig, IBertForMaskedLM),
     ]
 )

@@ -418,6 +429,7 @@
         (TapasConfig, TapasForMaskedLM),
         (DebertaConfig, DebertaForMaskedLM),
         (DebertaV2Config, DebertaV2ForMaskedLM),
+        (IBertConfig, IBertForMaskedLM),
     ]
 )

@@ -476,6 +488,7 @@
         (TapasConfig, TapasForMaskedLM),
         (DebertaConfig, DebertaForMaskedLM),
         (DebertaV2Config, DebertaV2ForMaskedLM),
+        (IBertConfig, IBertForMaskedLM),
     ]
 )

@@ -529,6 +542,7 @@
         (TransfoXLConfig, TransfoXLForSequenceClassification),
         (MPNetConfig, MPNetForSequenceClassification),
         (TapasConfig, TapasForSequenceClassification),
+        (IBertConfig, IBertForSequenceClassification),
     ]
 )

@@ -558,6 +572,7 @@
         (MPNetConfig, MPNetForQuestionAnswering),
         (DebertaConfig, DebertaForQuestionAnswering),
         (DebertaV2Config, DebertaV2ForQuestionAnswering),
+        (IBertConfig, IBertForQuestionAnswering),
     ]
 )

@@ -591,6 +606,7 @@
         (MPNetConfig, MPNetForTokenClassification),
         (DebertaConfig, DebertaForTokenClassification),
         (DebertaV2Config, DebertaV2ForTokenClassification),
+        (IBertConfig, IBertForTokenClassification),
     ]
 )

@@ -613,6 +629,7 @@
         (FlaubertConfig, FlaubertForMultipleChoice),
         (FunnelConfig, FunnelForMultipleChoice),
         (MPNetConfig, MPNetForMultipleChoice),
+        (IBertConfig, IBertForMultipleChoice),
     ]
 )
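The mappings above let the task-specific auto classes dispatch on IBertConfig. A brief sketch of that dispatch using from_config, so no checkpoint download is needed; the weights are randomly initialized.

from transformers import AutoModel, AutoModelForSequenceClassification, IBertConfig

config = IBertConfig()

model = AutoModel.from_config(config)  # resolves to IBertModel via MODEL_MAPPING
classifier = AutoModelForSequenceClassification.from_config(config)  # IBertForSequenceClassification

print(type(model).__name__, type(classifier).__name__)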

src/transformers/models/auto/tokenization_auto.py
Lines changed: 2 additions & 0 deletions

@@ -75,6 +75,7 @@
     FSMTConfig,
     FunnelConfig,
     GPT2Config,
+    IBertConfig,
     LayoutLMConfig,
     LEDConfig,
     LongformerConfig,

@@ -244,6 +245,7 @@
         (TapasConfig, (TapasTokenizer, None)),
         (LEDConfig, (LEDTokenizer, LEDTokenizerFast)),
         (ConvBertConfig, (ConvBertTokenizer, ConvBertTokenizerFast)),
+        (IBertConfig, (RobertaTokenizer, RobertaTokenizerFast)),
         (Wav2Vec2Config, (Wav2Vec2CTCTokenizer, None)),
     ]
 )
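Because I-BERT introduces no tokenizer of its own, the entry above points IBertConfig at the existing RoBERTa tokenizers. A sketch of the effect; the checkpoint name is an assumption.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
# Per the mapping above this is a RobertaTokenizerFast, or RobertaTokenizer
# when the fast-tokenizers backend is unavailable.
print(type(tokenizer).__name__)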
src/transformers/models/ibert/__init__.py (new file)
Lines changed: 69 additions & 0 deletions

# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2020 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from ...file_utils import _BaseLazyModule, is_tokenizers_available, is_torch_available


_import_structure = {
    "configuration_ibert": ["IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "IBertConfig"],
}

if is_torch_available():
    _import_structure["modeling_ibert"] = [
        "IBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
        "IBertForMaskedLM",
        "IBertForMultipleChoice",
        "IBertForQuestionAnswering",
        "IBertForSequenceClassification",
        "IBertForTokenClassification",
        "IBertModel",
    ]

if TYPE_CHECKING:
    from .configuration_ibert import IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, IBertConfig

    if is_torch_available():
        from .modeling_ibert import (
            IBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
            IBertForMaskedLM,
            IBertForMultipleChoice,
            IBertForQuestionAnswering,
            IBertForSequenceClassification,
            IBertForTokenClassification,
            IBertModel,
        )

else:
    import importlib
    import os
    import sys

    class _LazyModule(_BaseLazyModule):
        """
        Module class that surfaces all objects but only performs associated imports when the objects are requested.
        """

        __file__ = globals()["__file__"]
        __path__ = [os.path.dirname(__file__)]

        def _get_module(self, module_name: str):
            return importlib.import_module("." + module_name, self.__name__)

    sys.modules[__name__] = _LazyModule(__name__, _import_structure)
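The _LazyModule at the bottom keeps importing transformers.models.ibert cheap: submodules are only imported when one of their names is first accessed, and the torch-dependent modeling module is only registered when is_torch_available() is true. A sketch of that behavior, assuming the standard attribute lookup that _BaseLazyModule provides in file_utils.

import importlib

ibert = importlib.import_module("transformers.models.ibert")

# At runtime (outside TYPE_CHECKING) the package object is the _LazyModule defined above.
config_cls = ibert.IBertConfig   # triggers the import of configuration_ibert only
model_cls = ibert.IBertModel     # imports modeling_ibert on first access (requires torch)

print(config_cls.__name__, model_cls.__name__)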
