Insights: huggingface/transformers
Overview
1 Release published by 1 person
-
v4.55.0: New openai GPT OSS model!
published
Aug 5, 2025
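The headline of this release is the new GPT OSS model from OpenAI. For orientation, here is a minimal usage sketch, assuming the checkpoints follow the standard `AutoModelForCausalLM` / chat-template workflow and that `openai/gpt-oss-20b` is the published repo id (both are assumptions, not taken from this page):

```python
# Minimal sketch for trying the new GPT OSS model (transformers >= 4.55.0).
# Assumption: the Hub repo id "openai/gpt-oss-20b" and the standard chat-template
# workflow apply; adjust the id to the checkpoint you actually want to load.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # place weights on available devices
)

# Build a chat prompt and generate a short completion.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize what GPT OSS is in one sentence."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```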
82 Pull requests merged by 43 people
-
unpin `torchcodec==0.5.0` and use torch 2.8 on daily CI
#40072 merged
Aug 10, 2025 -
Update HuBERT model card according to template
#39742 merged
Aug 10, 2025 -
Revert "fix
notification_service.py
abouttime_spent
"#40044 merged
Aug 8, 2025 -
GLM-4.5V Model Support
#39805 merged
Aug 8, 2025 -
fix `notification_service.py` about `time_spent`
#40037 merged
Aug 8, 2025 -
Bnb failing tests
#40026 merged
Aug 8, 2025 -
Tie weights recursively on all submodels
#39996 merged
Aug 8, 2025 -
[core] Refactor the Cache logic to make it simpler and more general
#39797 merged
Aug 8, 2025 -
Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991)
#40024 merged
Aug 8, 2025 -
Harmonize `past_key_value` to `past_key_valueS` everywhere
#39956 merged
Aug 8, 2025 -
Fix an annoying flaky test
#40000 merged
Aug 8, 2025 -
Higgs modules_to_not_convert standardization
#39989 merged
Aug 8, 2025 -
Fix broken image inference for Fuyu model
#39915 merged
Aug 8, 2025 -
pin torchcodec==0.5.0 for now with torch 2.7.1 on daily CI
#40013 merged
Aug 7, 2025 -
Update expected output values after #39885 (part 2)
#40015 merged
Aug 7, 2025 -
Raising error when quantizing a quantized model
#39998 merged
Aug 7, 2025 -
docs: fix duplication in 'en/optimizers.md'
#40014 merged
Aug 7, 2025 -
unpin torch<2.8 on circleci
#40012 merged
Aug 7, 2025 -
FA2 can continue generation from cache
#39843 merged
Aug 7, 2025 -
Fix default values of getenv
#39867 merged
Aug 7, 2025 -
Fix HGNetV2 Model Card and Image Classification Pipeline Usage Tips
#39965 merged
Aug 7, 2025 -
fix: remove CHAT_TEMPLATE import in tests for deepseek-vl
#40003 merged
Aug 7, 2025 -
Fix missing video inputs for PerceptionLM.
#39971 merged
Aug 7, 2025 -
Fix int4 quantized model cannot work with cpu
#39724 merged
Aug 7, 2025 -
Update expected output values after #39885 (part 1)
#39990 merged
Aug 7, 2025 -
Fix consistency
#39995 merged
Aug 7, 2025 -
Fix return typehint for decoder and annotate inv_freq
#39610 merged
Aug 7, 2025 -
Bump transformers from 4.48.0 to 4.53.0 in /examples/tensorflow/language-modeling-tpu
#39967 merged
Aug 7, 2025 -
Fix gemma3n feature extractor's incorrect squeeze
#39919 merged
Aug 7, 2025 -
[Idefics] fix device mismatch
#39981 merged
Aug 7, 2025 -
Various test fixes for AMD
#39978 merged
Aug 7, 2025 -
Support input_embeds in torch exportable decoders
#39836 merged
Aug 7, 2025 -
[superglue] Fixed the way batch mask was applied to the scores before match assignment computation
#39968 merged
Aug 7, 2025 -
Gemma3 fixes
#39960 merged
Aug 7, 2025 -
Modular fix: remove the model name in `find_file_type`
#39897 merged
Aug 6, 2025 -
chore: update Deformable_Detr model card
#39902 merged
Aug 6, 2025 -
[bugfix] fix flash_attention_2 unavailable error on Ascend NPU
#39844 merged
Aug 6, 2025 -
Fix `fix_and_overwrite` mode of `utils/check_docstring.py`
#39369 merged
Aug 6, 2025 -
remove `triton_kernels` dep with `kernels` instead
#39926 merged
Aug 6, 2025 -
fix glm4v image process
#39964 merged
Aug 6, 2025 -
fix typo
#39936 merged
Aug 6, 2025 -
Fix grammatical error in MoE variable name: expert_hitted → expert_hit, hitted_experts → hit_experts
#39959 merged
Aug 6, 2025 -
docs: fix typo in 'quantization-aware training'
#39904 merged
Aug 6, 2025 -
Enable gpt-oss mxfp4 on older hardware (sm75+)
#39940 merged
Aug 6, 2025 -
Fix MXFP4 quantizer validation to allow CPU inference with dequantize option
#39953 merged
Aug 6, 2025 -
[docs] ko toc fix
#39927 merged
Aug 6, 2025 -
circleci: pin torch 2.7.1 until `torchcodec` is updated
#39951 merged
Aug 6, 2025 -
Fix CI: Tests failing on CPU due to `torch.device('cpu').index` being None
#39933 merged
Aug 6, 2025 -
Avoid `utils/check_bad_commit.py` failing due to rate limit (requesting `api.github.com`)
#39918 merged
Aug 5, 2025 -
[CI] post-`GptOss` fixes for green CI
#39929 merged
Aug 5, 2025 -
gpt_oss last chat template changes
#39925 merged
Aug 5, 2025 -
Add GPT OSS model from OpenAI
#39923 merged
Aug 5, 2025 -
🌐 [i18n-KO] Translated `cache_explanation.md` to Korean
#39535 merged
Aug 5, 2025 -
Export SmolvLM
#39614 merged
Aug 5, 2025 -
Update object_detection.md
#39909 merged
Aug 5, 2025 -
run model debugging with forward arg
#39905 merged
Aug 5, 2025 -
Revert "remove dtensors, not explicit (#39840)"
#39912 merged
Aug 5, 2025 -
Fix aria tests
#39879 merged
Aug 5, 2025 -
Fix eval thread fork bomb
#39717 merged
Aug 5, 2025 -
Replace video_fps with fps in tests
#39898 merged
Aug 5, 2025 -
Fix misleading WandB error when WANDB_DISABLED is set
#39891 merged
Aug 5, 2025 -
Avoid aliasing in cond's branches for torch 2.8
#39488 merged
Aug 5, 2025 -
Remove unnecessary CUDA sync in qwen2_5_vl
#39870 merged
Aug 5, 2025 -
fix test_working_of_tp failure of accelerate ut
#39828 merged
Aug 5, 2025 -
[`Exaone4`] Fixes the attn implementation!
#39906 merged
Aug 5, 2025 -
Reorder serving docs
#39634 merged
Aug 5, 2025 -
chore: update DETR model card
#39822 merged
Aug 4, 2025 -
Add support for `ModernBertForMultipleChoice`
#39232 merged
Aug 4, 2025 -
send some feedback when manually building doc via comment
#39889 merged
Aug 4, 2025 -
Update cohere2 vision test
#39888 merged
Aug 4, 2025 -
[DOCS] : Improved mimi model card
#39824 merged
Aug 4, 2025 -
Fix link to models in README
#39880 merged
Aug 4, 2025 -
Better return type hint for `AutoModelForCausalLM` and `AutoModelForImageTextToText`
#39881 merged
Aug 4, 2025 -
Set `torch.backends.cudnn.allow_tf32 = False` for CI
#39885 merged
Aug 4, 2025 -
Replace `Tokenizer` with `PreTrainedTokenizerFast` in `ContinuousBatchProcessor`
#39858 merged
Aug 4, 2025 -
Rework add-new-model-like with modular and make test filenames coherent
#39612 merged
Aug 4, 2025 -
Fix quant docker for fp-quant
#39641 merged
Aug 4, 2025 -
Fix attn_implementation setter for models with `backbone_config`
#39855 merged
Aug 4, 2025 -
Add support for including in-memory videos (not just files/urls) in apply_chat_template
#39494 merged
Aug 4, 2025 -
Use comment to build doc on PRs
#39846 merged
Aug 4, 2025 -
Refactor label name handling for PEFT models in Trainer class
#39265 merged
Aug 4, 2025 -
Improve `is_wandb_available` function to verify WandB installation
#39875 merged
Aug 4, 2025
74 Pull requests opened by 56 people
-
Remove deprecated max_size parameter from ConditionalDetrImageProcessor
#39883 opened
Aug 4, 2025 -
added Textnet fast image processor
#39884 opened
Aug 4, 2025 -
🌐 [i18n-KO] Translated `perf_train_gaudi.md` to Korean
#39886 opened
Aug 4, 2025 -
Move old generation modes to the Hub 🧹🧹🧹🧽🧽
#39887 opened
Aug 4, 2025 -
🌐 [i18n-KO] Translated `jamba.md` to Korean
#39890 opened
Aug 4, 2025 -
[docs] Add reference to HF-maintained `custom_generate` collections
#39894 opened
Aug 4, 2025 -
Add Videoprism
#39895 opened
Aug 4, 2025 -
[model] Support MiniCPM-V 4.0
#39899 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `fp_quant` to Korean
#39901 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated clipseg.md to Korean
#39903 opened
Aug 5, 2025 -
Update dynamic attnt setter for multimodals
#39908 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `tiny_agents.md` to Korean
#39913 opened
Aug 5, 2025 -
🌐 [i18n-KO] Updated ko/perf_train_cpu.md
#39917 opened
Aug 5, 2025 -
🌐 [i18n-KO] Updated ko/perf_train_special.md
#39920 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `attention_interface.md` to Korean
#39922 opened
Aug 5, 2025 -
Add chat template tests
#39924 opened
Aug 5, 2025 -
Fix hidden torchvision>=0.15 dependency issue
#39928 opened
Aug 5, 2025 -
Add missing special token properties to MistralCommonTokenizer
#39930 opened
Aug 5, 2025 -
Registers StaticCache serialization functions for torch.export.export
#39931 opened
Aug 5, 2025 -
Fix whisper `return_language` with `return_timestamp=word`
#39938 opened
Aug 5, 2025 -
fixing image_utils.py todo
#39941 opened
Aug 6, 2025 -
fix llama issue
#39942 opened
Aug 6, 2025 -
Add back `_tp_plan` attribute
#39944 opened
Aug 6, 2025 -
Add pytest marker: `torch_compile_test` and `torch_export_test`
#39950 opened
Aug 6, 2025 -
Use torch._check instead of a test to make the model Gemma3 exportable
#39962 opened
Aug 6, 2025 -
Add Keypoint Matcher pipeline
#39970 opened
Aug 6, 2025 -
Causal loss for `ForConditionalGeneration`
#39973 opened
Aug 7, 2025 -
[bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM
#39975 opened
Aug 7, 2025 -
Fix Qwen3 MoE GGUF architecture mismatch
#39976 opened
Aug 7, 2025 -
Fix cross-attention masking before residual connection
#39979 opened
Aug 7, 2025 -
Fix setting attention for multimodal models
#39984 opened
Aug 7, 2025 -
fix: resolve triton version check compatibility on windows
#39986 opened
Aug 7, 2025 -
Add vggt-hf copied from vit
#39987 opened
Aug 7, 2025 -
Update Glm4V processor and add tests
#39988 opened
Aug 7, 2025 -
Default to dequantize if cpu in device_map for mxfp4
#39993 opened
Aug 7, 2025 -
chore: Add type hints to import_utils.py module
#39994 opened
Aug 7, 2025 -
make sure position_ids are passed in for causal mask creation for gpt-oss
#39997 opened
Aug 7, 2025 -
allow TP to work in ND-parallel with fsdp cpu ram efficient loading
#39999 opened
Aug 7, 2025 -
[`Flash Attention`] Fix flash attention integration
#40002 opened
Aug 7, 2025 -
Fix PerceptionLM image preprocessing for non-tiled image input.
#40006 opened
Aug 7, 2025 -
🚨 Use lru_cache for sine pos embeddings MaskFormer
#40007 opened
Aug 7, 2025 -
Fixes for EncoderDecoderCache
#40008 opened
Aug 7, 2025 -
feat: extract rev in attn_implementation kernels via @
#40009 opened
Aug 7, 2025 -
🌐 [i18n-KO] Translated `optimizers.md` to Korean
#40011 opened
Aug 7, 2025 -
Feat/add gpt oss sequence classification
#40019 opened
Aug 8, 2025 -
[fix] batch inference for llava_onevision
#40021 opened
Aug 8, 2025 -
fix: resolve dropout type error in DogeDecoder
#40022 opened
Aug 8, 2025 -
Add support for SDPA for OWLViT and OWLv2
#40023 opened
Aug 8, 2025 -
[GLM4V] fix vision placeholder mask
#40025 opened
Aug 8, 2025 -
Add amd runners to run-slow command
#40027 opened
Aug 8, 2025 -
Revert FA2 kwargs construction
#40029 opened
Aug 8, 2025 -
Update boxes expectations for OWLViT test
#40030 opened
Aug 8, 2025 -
Add model card for MobileViT
#40033 opened
Aug 8, 2025 -
Remove deprecated cache-related objects
#40035 opened
Aug 8, 2025 -
Fix error on importing unavailable torch.distributed
#40038 opened
Aug 8, 2025 -
New DynamicSlidingWindow layer & caches
#40039 opened
Aug 8, 2025 -
[`GPT Big Code`] Fix attention scaling
#40041 opened
Aug 8, 2025 -
Add GptOssForSequenceClassification for GPT-OSS models
#40043 opened
Aug 8, 2025 -
(small) fix conditional for input_ids and input_embeds in marian
#40045 opened
Aug 8, 2025 -
Update wavlm.md to match new model card template
#40047 opened
Aug 8, 2025 -
Standardize BARTpho model card: badges, new examples, fixed broken im…
#40051 opened
Aug 9, 2025 -
Auto-log parallelism info to wandb.config using HF Accelerate
#40055 opened
Aug 9, 2025 -
updated visualBERT modelcard
#40057 opened
Aug 9, 2025 -
GGUF Qwen2VL
#40058 opened
Aug 9, 2025 -
Fix Inefficient GELU implementation in GPT2
#40059 opened
Aug 9, 2025 -
Avoid CUDA stream sync
#40060 opened
Aug 10, 2025 -
🌐 [i18n-KO] Translated `vitdet.md` to Korean
#40061 opened
Aug 10, 2025 -
fix: move super().__init__ after vision_config init in Mistral3Config
#40063 opened
Aug 10, 2025 -
🌐 [i18n-KO] Translated `videomae.md` to Korean
#40064 opened
Aug 10, 2025 -
Delay float32 upcast in ForCausalLMLoss after filtering ignore_index
#40065 opened
Aug 10, 2025 -
Change Qwen2RMSNorm to RMSNorm from PyTorch
#40066 opened
Aug 10, 2025 -
Add missing arguments
#40068 opened
Aug 10, 2025 -
Remove _prepare_flash_attention_from_position_ids
#40069 opened
Aug 10, 2025
43 Issues closed by 22 people
-
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 closed
Aug 10, 2025 -
Gemma2 falls back to CPU execution when attn_implementation='flash_attention_2'
#39188 closed
Aug 10, 2025 -
Previous PRs introduced a bug on Accumulated Gradients Losses
#40052 closed
Aug 9, 2025 -
Incorrect word timestamps and word repetitions with Whisper-Large-v3-turbo model
#37248 closed
Aug 9, 2025 -
Pretrainedtokenizerfast Segmentation fault
#39099 closed
Aug 9, 2025 -
New release 4.53.0 breaks HF trainer/model
#39111 closed
Aug 9, 2025 -
Gradient accumulation steps for Vision Language model
#39123 closed
Aug 9, 2025 -
Not capable of exporting Mistral to ONNX format with the use of caching
#39162 closed
Aug 9, 2025 -
Error when loading gguf file
#40040 closed
Aug 9, 2025 -
Weights not tied when loading `from_pretrained` with a wrapped model
#39900 closed
Aug 8, 2025 -
`TypeError: 'builtins.safe_open' object is not iterable` in `load_pytorch_state_dict_in_tf2_model `
#40028 closed
Aug 8, 2025 -
Major issues with transformers version causing rubbish generations with Gemma3 family using vllm
#40017 closed
Aug 8, 2025 -
Gemma3n get_placeholder_mask issue
#39991 closed
Aug 8, 2025 -
flash-attn cannot perform deterministic computation
#39982 closed
Aug 8, 2025 -
RoBERTa is not well implemented for tokenizers with pad_token_id != 1
#34528 closed
Aug 8, 2025 -
[DeepSeek-V3] Different rotary embedding implementation between DeepSeek-AI and Transformers
#39687 closed
Aug 8, 2025 -
ModernBertUnpaddedRotaryEmbedding __init__ error
#39934 closed
Aug 7, 2025 -
video_inputs are not passed to perception_lm
#40004 closed
Aug 7, 2025 -
Flash Attention fails with non aligned position_ids
#39814 closed
Aug 7, 2025 -
`convert_deepseek_vl_weights_to_hf.py` not included in v4.55.0 release.
#39966 closed
Aug 7, 2025 -
[Gemma3N] Audio processing issue
#39911 closed
Aug 7, 2025 -
v4.55.0 Idefics3 RuntimeError Tensors on different devices
#39947 closed
Aug 7, 2025 -
[gpt‑oss] eager_attention_forward not using sliding-window attention for GPT‑OSS models
#39954 closed
Aug 7, 2025 -
Finetune `gpt-oss-20b` with `mxfp4` quantization
#39969 closed
Aug 6, 2025 -
Fix grammatically incorrect variable name "expert_hitted" → "expert_hit" in MoE implementation
#39955 closed
Aug 6, 2025 -
transformers serve doesn't handle OPTIONS http method
#39932 closed
Aug 6, 2025 -
454545
#39864 closed
Aug 6, 2025 -
ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#38442 closed
Aug 6, 2025 -
Streaming mode support on HF vs kyutai-labs for the mimi model
#38535 closed
Aug 6, 2025 -
enable GraniteMoeHybridIntegrationTest in UT
#38542 closed
Aug 6, 2025 -
Llama4 inference encounter unsupported op in dynamo ?
#38118 closed
Aug 6, 2025 -
Misleading WandB error when WANDB_DISABLED=True and report_to="wandb" are both set
#39878 closed
Aug 5, 2025 -
Inefficient memory resharding in attention layer
#39072 closed
Aug 5, 2025 -
Inefficient default GELU implementation in GPT2
#39073 closed
Aug 5, 2025 -
AttributeError: 'HfTrainerDeepSpeedConfig' object has no attribute 'is_zero3'
#39081 closed
Aug 5, 2025 -
Why `lm-head` weight still exists with `"tie_word_embeddings": true`
#39812 closed
Aug 4, 2025 -
Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows
#39704 closed
Aug 4, 2025 -
ValueError: Max cache length is not consistent across layers
#39877 closed
Aug 4, 2025 -
Allow video objects (np array etc.) in apply_chat_template (not just paths or urls)
#36560 closed
Aug 4, 2025 -
Exception while inference Qwen2VL and Qwen2VL, assert module.weight.shape[1] == 1
#38665 closed
Aug 4, 2025
38 Issues opened by 38 people
-
Issue running model from ImageSegmentationPipeline
#40071 opened
Aug 10, 2025 -
Transformer GGUF support philosophy / naive question
#40070 opened
Aug 10, 2025 -
[BUG] No umt5 config for GGUF. This is not supported configuration.
#40067 opened
Aug 10, 2025 -
[Mistral3] attn_implementation not applied to vision_tower.config in Mistral3Config due to init order
#40062 opened
Aug 10, 2025 -
Question: How to write a custom tokenizer from scratch
#40056 opened
Aug 9, 2025 -
Whisper transcription accuracy improves when last 1600 samples of input audio are muted
#40054 opened
Aug 9, 2025 -
Support text classification with GPT-OSS models
#40050 opened
Aug 9, 2025 -
Please support loading Qwen 2.5 VL from GGUF
#40049 opened
Aug 9, 2025 -
Recent releases break backwards-compatibility with key_cache
#40046 opened
Aug 8, 2025 -
Support loading glm4moe GGUF
#40042 opened
Aug 8, 2025 -
`plamo-2-1b` broken on latest main
#40034 opened
Aug 8, 2025 -
Add Padding Strategy to DataCollatorForLanguageModeling
#40032 opened
Aug 8, 2025 -
[gpt-oss] MoE routing bug in the mxfp4 implementation (in distributed setting)
#40031 opened
Aug 8, 2025 -
accelerate==1.10.0 and safetensors==0.6.1 are incompatible with transformers==4.53.1
#40020 opened
Aug 8, 2025 -
need GptOssForSequenceClassification
#40018 opened
Aug 8, 2025 -
Customizable Logit Warping Strategies for Generation
#40010 opened
Aug 7, 2025 -
Possible wrong init call
#40001 opened
Aug 7, 2025 -
[gpt-oss] Transform checkpoint from safetensors to state dict
#39992 opened
Aug 7, 2025 -
Triton version check compatibility on windows
#39985 opened
Aug 7, 2025 -
CVE fix for v4.37.2 and v4.38.0
#39983 opened
Aug 7, 2025 -
FSDP2 not compatible with transformers >= 4.54.0 GenericForTokenClassification
#39977 opened
Aug 7, 2025 -
bug in new transformers: 'Florence2ForConditionalGeneration' object has no attribute '_supports_sdpa'
#39974 opened
Aug 7, 2025 -
Gemma3 with fp16 in inference (I don't know if this change is working in fine-tune) #BUG FIX
#39972 opened
Aug 6, 2025 -
change `dataloader_persistent_workers` default value to `True`
#39963 opened
Aug 6, 2025 -
Retaining computational graph after using AutoImageProcessor
#39946 opened
Aug 6, 2025 -
GPT-OSS mxfp4 with triton_kernel: make_default_matmul_mxfp4_w_layout not found
#39945 opened
Aug 6, 2025 -
Breaking change in unset `_tp_plan` attribute
#39943 opened
Aug 6, 2025 -
Still getting "fp16 mixed precision requires a GPU (not 'mps')." error
#39935 opened
Aug 5, 2025 -
[Gemma3N] Not able to add new special tokens to model/tokenizer due to projection error
#39921 opened
Aug 5, 2025 -
When using batch_eval_metrics, inputs are not gathered from different device, which is wrong behavior
#39916 opened
Aug 5, 2025 -
Question: Llama4 weight reshaping
#39910 opened
Aug 5, 2025 -
Hidden torchvision>=0.19.0 dependency results in quiet import failures of e.g. PreTrainedModel
#39907 opened
Aug 5, 2025 -
Add VideoPrism
#39893 opened
Aug 4, 2025 -
[Feature Request] Automatically log parallelism configuration from Accelerate to W&B
#39882 opened
Aug 4, 2025
121 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Segment Anything 2 (SAM2)
#32317 commented on
Aug 8, 2025 • 43 new comments -
Add support for Florence-2
#38188 commented on
Aug 7, 2025 • 27 new comments -
[Feat] Adding Intern-S1
#39722 commented on
Aug 7, 2025 • 22 new comments -
blt wip
#38579 commented on
Aug 8, 2025 • 21 new comments -
[video processors] decode only sampled videos -> less RAM and faster processing
#39600 commented on
Aug 8, 2025 • 16 new comments -
HunYuan opensource
#39606 commented on
Aug 8, 2025 • 13 new comments -
Update model card for gpt neox japanese
#39862 commented on
Aug 5, 2025 • 12 new comments -
[WIP] Computer vision util: vision visualizer
#36892 commented on
Aug 10, 2025 • 9 new comments -
Mistral: Add support for interleaved attention
#39799 commented on
Aug 8, 2025 • 9 new comments -
Fix missing initializations for models created in 2022
#39772 commented on
Aug 8, 2025 • 6 new comments -
support MiniCPM-o2.6
#37917 commented on
Aug 7, 2025 • 6 new comments -
Improve Gemma3n model and tests
#39764 commented on
Aug 6, 2025 • 5 new comments -
[WIP] RoPE refactor
#39847 commented on
Aug 8, 2025 • 4 new comments -
Refactor vit-like models
#39816 commented on
Aug 5, 2025 • 3 new comments -
README: Update Bert Japanese model card
#39466 commented on
Aug 6, 2025 • 3 new comments -
🌐 [i18n-KO] Translated grounding-dino.md to Korean
#39861 commented on
Aug 9, 2025 • 3 new comments -
Fix issue #39191 respect accelerate config to disable torch.dynamo compilation
#39683 commented on
Aug 6, 2025 • 2 new comments -
Fix rope_deltas corruption in Qwen2.5VL during CFG generation
#39756 commented on
Aug 4, 2025 • 2 new comments -
🌐 [i18n-KO] Translated `main_classes/processors.md` to Korean
#39519 commented on
Aug 10, 2025 • 2 new comments -
Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling
#39485 commented on
Aug 8, 2025 • 2 new comments -
🌐 [i18n-KO] Translated `main_classes/backbones.md` to Korean
#39714 commented on
Aug 9, 2025 • 1 new comment -
[Tests] [Bugfix] Make weights tied for `dynamic_tied_weights` test
#39740 commented on
Aug 5, 2025 • 1 new comment -
🌐 [i18n-KO] Translated `pipelines.md` to Korean
#39577 commented on
Aug 5, 2025 • 1 new comment -
Feat: add Kwai-Keye transformers
#39292 commented on
Aug 10, 2025 • 1 new comment -
fix bug when using DP in trl, the batch size of input and output dism…
#38938 commented on
Aug 8, 2025 • 1 new comment -
Enable SIM rules
#39806 commented on
Aug 6, 2025 • 1 new comment -
🌐 [i18n-KO] Translated `gemma3.md` to Korean
#39865 commented on
Aug 6, 2025 • 1 new comment -
[Draft] Add Llasa TTS family of models
#39760 commented on
Aug 4, 2025 • 1 new comment -
🌐 [i18n-KO] Translated `chat_extras.md` to Korean
#39863 commented on
Aug 10, 2025 • 1 new comment -
handle multimodal models with tp_plan on the text_config
#39735 commented on
Aug 4, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `models.md` to Korean
#39518 commented on
Aug 4, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `compressed_tensor.md` to Korean
#39517 commented on
Aug 5, 2025 • 0 new comments -
Add model arcinstitute state
#39480 commented on
Aug 7, 2025 • 0 new comments -
refactor(modeling_llama): make RotaryEmbedding default path explicit
#39831 commented on
Aug 4, 2025 • 0 new comments -
Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs
#39435 commented on
Aug 7, 2025 • 0 new comments -
Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs
#39364 commented on
Aug 7, 2025 • 0 new comments -
Fix DeepSpeed mixed precision precedence over Accelerate defaults
#39856 commented on
Aug 10, 2025 • 0 new comments -
WIP: Initial support for bnb 4bit on any nn.Parameter
#39859 commented on
Aug 7, 2025 • 0 new comments -
Fix the issue that csm model cannot work with pipeline mode.
#39349 commented on
Aug 8, 2025 • 0 new comments -
feat(trainer): emergency checkpointing on crashes & SIGTERM/SIGINT
#39140 commented on
Aug 5, 2025 • 0 new comments -
Update Dockerfiles to install packages inside a virtual environment
#39098 commented on
Aug 4, 2025 • 0 new comments -
Update README.md
#39869 commented on
Aug 6, 2025 • 0 new comments -
Allow compression on meta device
#39039 commented on
Aug 8, 2025 • 0 new comments -
fix: Catch correct ConnectionError for additional_chat_templates
#39874 commented on
Aug 9, 2025 • 0 new comments -
FP-Quant NVFP4 and Python 3.9 support
#39876 commented on
Aug 9, 2025 • 0 new comments -
[qwen-vl] fix beam search with videos
#39726 commented on
Aug 4, 2025 • 0 new comments -
Fix HfArgumentParser to filter out dict types from Union
#39741 commented on
Aug 5, 2025 • 0 new comments -
Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock
#39743 commented on
Aug 5, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `main_classes/optimizer_schedules.md` to Korean
#39713 commented on
Aug 10, 2025 • 0 new comments -
use untyped storage for dtensors due to deprecation
#39697 commented on
Aug 4, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `deepseek_v3.md` to Korean
#39649 commented on
Aug 10, 2025 • 0 new comments -
fix tensor device when loading state dict
#39623 commented on
Aug 8, 2025 • 0 new comments -
Stop using `from_legacy_cache` as Cache initialization
#39765 commented on
Aug 6, 2025 • 0 new comments -
feat: add `is_fast` to ImageProcessor
#39603 commented on
Aug 6, 2025 • 0 new comments -
[gemma3] update conversion key mapping
#39778 commented on
Aug 4, 2025 • 0 new comments -
Add Fast Image Processor for ImageGPT
#39592 commented on
Aug 7, 2025 • 0 new comments -
Use `dtype` instead of `torch_dtype` everywhere!
#39782 commented on
Aug 7, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `auto_docstring.md` to Korean
#39571 commented on
Aug 9, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `vision-encoder-decoder.md` to Korean
#39563 commented on
Aug 8, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `main_classes/deepspeed.md` to Korean
#39559 commented on
Aug 10, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `imageprocessor.md` to Korean
#39557 commented on
Aug 10, 2025 • 0 new comments -
Add Muon optimizer implementation and integration
#39541 commented on
Aug 8, 2025 • 0 new comments -
Served models handle with nested content
#39792 commented on
Aug 5, 2025 • 0 new comments -
hangs during training using deepspeed
#39275 commented on
Aug 8, 2025 • 0 new comments -
Support for context-free-grammars (CFG) to constrain model output
#25778 commented on
Aug 7, 2025 • 0 new comments -
[Contributions Welcome] Add Fast Image Processors
#36978 commented on
Aug 7, 2025 • 0 new comments -
[RFC] Updating pipeline models
#26690 commented on
Aug 7, 2025 • 0 new comments -
Loading audio in video from video URLs fail with chat template
#39076 commented on
Aug 7, 2025 • 0 new comments -
ImportError: cannot import name 'pipeline' from 'transformers'
#39137 commented on
Aug 6, 2025 • 0 new comments -
../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [267,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
#33985 commented on
Aug 6, 2025 • 0 new comments -
transformers env fails with: ModuleNotFoundError: No module named 'PIL'
#39779 commented on
Aug 5, 2025 • 0 new comments -
We now require users to upgrade torch to at least v2.6 in order to use the function.
#38464 commented on
Aug 5, 2025 • 0 new comments -
MistralCommonTokenizer does not match PreTrainedTokenizer
#39841 commented on
Aug 5, 2025 • 0 new comments -
Support topNSigma sampling in `generate`
#39850 commented on
Aug 5, 2025 • 0 new comments -
ValueError: Number of image placeholders in the prompt does not match the number of images. internVL3
#39703 commented on
Aug 5, 2025 • 0 new comments -
InternVL, PerceptionLM inference freeze in 4.54.1
#39872 commented on
Aug 5, 2025 • 0 new comments -
[FEAT] [non-CUDA]: Support alternative implementation for `constraints.positive_definite.check`
#36660 commented on
Aug 5, 2025 • 0 new comments -
Torch patches tracker for HPU/Gaudi
#39175 commented on
Aug 5, 2025 • 0 new comments -
Missing einops dependency causing ModuleNotFoundError
#39811 commented on
Aug 5, 2025 • 0 new comments -
Regression - High memory usage when using transformers model with FSDP + LoRA
#39795 commented on
Aug 5, 2025 • 0 new comments -
Accelerate beam search decoding via tree attention
#39682 commented on
Aug 5, 2025 • 0 new comments -
Failed to export PyTorch traced graph of Mixtral-8x7B-Instruct-v0.1 due to the PR #32429
#38518 commented on
Aug 4, 2025 • 0 new comments -
Support for per-token latency tracking in `generate()` (suggested options: using callback, profiler class, or using a config flag)
#39437 commented on
Aug 4, 2025 • 0 new comments -
T5Gemma failing on provided example
#39522 commented on
Aug 4, 2025 • 0 new comments -
Tensor parallelism for GLM-4.5
#39868 commented on
Aug 4, 2025 • 0 new comments -
Florence2ForConditionalGeneration does not support Flash Attention 2.0 yet ?...
#39860 commented on
Aug 4, 2025 • 0 new comments -
Handling of full_text_row_masked_out_mask in mllama is incorrect.
#39379 commented on
Aug 4, 2025 • 0 new comments -
transformers: FlaubertTokenizer: do_lowercase_and_remove_accent: make the logger warning actionable (don't only tell what's wrong, rather suggest what could be done about that)
#39224 commented on
Aug 4, 2025 • 0 new comments -
Exporting Llava decoder into ONNX format
#38924 commented on
Aug 4, 2025 • 0 new comments -
torch fake_tensor load hf model failed
#39217 commented on
Aug 4, 2025 • 0 new comments -
v4.53.0 - Qwen 2.5 VL Flash Attention error - object has no attribute is_causal
#39231 commented on
Aug 4, 2025 • 0 new comments -
Memory leak occurred during training qwen-2.5-vl
#39803 commented on
Aug 4, 2025 • 0 new comments -
Adds Universal Intelligence to awesome transformers documentation
#38641 commented on
Aug 7, 2025 • 0 new comments -
Add Bagel
#38569 commented on
Aug 10, 2025 • 0 new comments -
[omni modality] support composite processor config
#38142 commented on
Aug 5, 2025 • 0 new comments -
fix: qwen2.5 omni apply_chat_template system content check
#37511 commented on
Aug 8, 2025 • 0 new comments -
Add Plain-DETR
#37096 commented on
Aug 10, 2025 • 0 new comments -
Add Ovis2 model and processor implementation
#37088 commented on
Aug 8, 2025 • 0 new comments -
[Community contributions] Model cards
#36979 commented on
Aug 10, 2025 • 0 new comments -
Please support GGUF format for UMT5EncoderModel
#36774 commented on
Aug 10, 2025 • 0 new comments -
"pipeline" is not exported from module "transformers"
#37646 commented on
Aug 10, 2025 • 0 new comments -
YaRN: factor is not effective with original_max_position_embeddings
#38224 commented on
Aug 10, 2025 • 0 new comments -
Attention refactor in #35235 adds a `__getitem__` into the forward pass, which causes errors with torch dynamo.
#38271 commented on
Aug 10, 2025 • 0 new comments -
Potential Memory Leak or Caching in Fast Image Processor
#38656 commented on
Aug 10, 2025 • 0 new comments -
CPMANT Model Fails to Run Following Official Tutorial
#39026 commented on
Aug 10, 2025 • 0 new comments -
AutoModelForCausalLM.from_pretrained(..., device_map=...) ignore `Tensor.retain_grad()` in Multi-GPUs setting
#39036 commented on
Aug 10, 2025 • 0 new comments -
QWEN2VLProcessor missing video_token_id in mm_token_type_ids
#39112 commented on
Aug 10, 2025 • 0 new comments -
bf16_full_eval=True moves model to device before FSDP application and causes cuda OOM
#39136 commented on
Aug 10, 2025 • 0 new comments -
apply_rotary_pos_emb_flashatt failed during triton jit compilation 'constexpr' object has no attribute 'bit_length'
#39167 commented on
Aug 10, 2025 • 0 new comments -
[Trainer] Eval loss depends on batch size (with solution)
#39241 commented on
Aug 10, 2025 • 0 new comments -
TypeError: GenerationMixin._extract_past_from_model_output() got an unexpected keyword argument 'standardize_cache_format'
#39336 commented on
Aug 10, 2025 • 0 new comments -
`MoshiIntegrationTests` started to fail after #34464
#38725 commented on
Aug 9, 2025 • 0 new comments -
CUDA OOM when running meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
#37532 commented on
Aug 9, 2025 • 0 new comments -
Improve CI/CD by completing migration from setup.py to pyproject.toml
#38928 commented on
Aug 9, 2025 • 0 new comments -
Qwen3 MOE models w/non-empty `mlp_only_layers` fail when `output_router_logits=True`
#39203 commented on
Aug 9, 2025 • 0 new comments -
Inference with model.generate( ) using a quantized model leads to assertion error
#39311 commented on
Aug 9, 2025 • 0 new comments -
Whisper demo code for model + processor API is broken
#39318 commented on
Aug 9, 2025 • 0 new comments -
Off-by-one error when using flash_attention with a sliding window
#39408 commented on
Aug 9, 2025 • 0 new comments -
Support `StaticCache` in assisted generation
#32946 commented on
Aug 8, 2025 • 0 new comments -
ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time
#39542 commented on
Aug 8, 2025 • 0 new comments -
Please help i am trying to run model but issue
#39260 commented on
Aug 8, 2025 • 0 new comments