Insights: huggingface/transformers
Overview
1 Release published by 1 person
-
v4.55.0: New openai GPT OSS model!
published
Aug 5, 2025
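The headline of this release is the new GPT OSS model from OpenAI. For orientation, here is a minimal usage sketch, assuming the checkpoints follow the standard `AutoModelForCausalLM` / chat-template workflow and that `openai/gpt-oss-20b` is the published repo id (both are assumptions, not taken from this page):

```python
# Minimal sketch for trying the new GPT OSS model (transformers >= 4.55.0).
# Assumption: the Hub repo id "openai/gpt-oss-20b" and the standard chat-template
# workflow apply; adjust the id to the checkpoint you actually want to load.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # place weights on available devices
)

# Build a chat prompt and generate a short completion.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize what GPT OSS is in one sentence."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```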
82 Pull requests merged by 43 people
-
unpin `torchcodec==0.5.0` and use torch 2.8 on daily CI
#40072 merged
Aug 10, 2025 -
Update HuBERT model card according to template
#39742 merged
Aug 10, 2025 -
Revert "fix
notification_service.py
abouttime_spent
"#40044 merged
Aug 8, 2025 -
GLM-4.5V Model Support
#39805 merged
Aug 8, 2025 -
fix `notification_service.py` about `time_spent`
#40037 merged
Aug 8, 2025 -
Bnb failing tests
#40026 merged
Aug 8, 2025 -
Tie weights recursively on all submodels
#39996 merged
Aug 8, 2025 -
[core] Refactor the Cache logic to make it simpler and more general
#39797 merged
Aug 8, 2025 -
Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991)
#40024 merged
Aug 8, 2025 -
Harmonize `past_key_value` to `past_key_valueS` everywhere
#39956 merged
Aug 8, 2025 -
Fix an annoying flaky test
#40000 merged
Aug 8, 2025 -
Higgs modules_to_not_convert standardization
#39989 merged
Aug 8, 2025 -
Fix broken image inference for Fuyu model
#39915 merged
Aug 8, 2025 -
pin torchcodec==0.5.0 for now with torch 2.7.1 on daily CI
#40013 merged
Aug 7, 2025 -
Update expected output values after #39885 (part 2)
#40015 merged
Aug 7, 2025 -
Raising error when quantizing a quantized model
#39998 merged
Aug 7, 2025 -
docs: fix duplication in 'en/optimizers.md'
#40014 merged
Aug 7, 2025 -
unpin torch<2.8 on circleci
#40012 merged
Aug 7, 2025 -
FA2 can continue generation from cache
#39843 merged
Aug 7, 2025 -
Fix default values of getenv
#39867 merged
Aug 7, 2025 -
Fix HGNetV2 Model Card and Image Classification Pipeline Usage Tips
#39965 merged
Aug 7, 2025 -
fix: remove CHAT_TEMPLATE import in tests for deepseek-vl
#40003 merged
Aug 7, 2025 -
Fix missing video inputs for PerceptionLM.
#39971 merged
Aug 7, 2025 -
Fix int4 quantized model cannot work with cpu
#39724 merged
Aug 7, 2025 -
Update expected output values after #39885 (part 1)
#39990 merged
Aug 7, 2025 -
Fix consistency
#39995 merged
Aug 7, 2025 -
Fix return typehint for decoder and annotate inv_freq
#39610 merged
Aug 7, 2025 -
Bump transformers from 4.48.0 to 4.53.0 in /examples/tensorflow/language-modeling-tpu
#39967 merged
Aug 7, 2025 -
Fix gemma3n feature extractor's incorrect squeeze
#39919 merged
Aug 7, 2025 -
[Idefics] fix device mismatch
#39981 merged
Aug 7, 2025 -
Various test fixes for AMD
#39978 merged
Aug 7, 2025 -
Support input_embeds in torch exportable decoders
#39836 merged
Aug 7, 2025 -
[superglue] Fixed the way batch mask was applied to the scores before match assignment computation
#39968 merged
Aug 7, 2025 -
Gemma3 fixes
#39960 merged
Aug 7, 2025 -
Modular fix: remove the model name in `find_file_type`
#39897 merged
Aug 6, 2025 -
chore: update Deformable_Detr model card
#39902 merged
Aug 6, 2025 -
[bugfix] fix flash_attention_2 unavailable error on Ascend NPU
#39844 merged
Aug 6, 2025 -
Fix `fix_and_overwrite` mode of `utils/check_docstring.py`
#39369 merged
Aug 6, 2025 -
remove `triton_kernels` dep with `kernels` instead
#39926 merged
Aug 6, 2025 -
fix glm4v image process
#39964 merged
Aug 6, 2025 -
fix typo
#39936 merged
Aug 6, 2025 -
Fix grammatical error in MoE variable name: expert_hitted → expert_hit, hitted_experts → hit_experts
#39959 merged
Aug 6, 2025 -
docs: fix typo in 'quantization-aware training'
#39904 merged
Aug 6, 2025 -
Enable gpt-oss mxfp4 on older hardware (sm75+)
#39940 merged
Aug 6, 2025 -
Fix MXFP4 quantizer validation to allow CPU inference with dequantize option
#39953 merged
Aug 6, 2025 -
[docs] ko toc fix
#39927 merged
Aug 6, 2025 -
circleci: pin torch 2.7.1 until `torchcodec` is updated
#39951 merged
Aug 6, 2025 -
Fix CI: Tests failing on CPU due to `torch.device('cpu').index` being None
#39933 merged
Aug 6, 2025 -
Avoid `utils/check_bad_commit.py` failing due to rate limit (requesting `api.github.com`)
#39918 merged
Aug 5, 2025 -
[CI] post-`GptOss` fixes for green CI
#39929 merged
Aug 5, 2025 -
gpt_oss last chat template changes
#39925 merged
Aug 5, 2025 -
Add GPT OSS model from OpenAI
#39923 merged
Aug 5, 2025 -
🌐 [i18n-KO] Translated `cache_explanation.md` to Korean
#39535 merged
Aug 5, 2025 -
Export SmolvLM
#39614 merged
Aug 5, 2025 -
Update object_detection.md
#39909 merged
Aug 5, 2025 -
run model debugging with forward arg
#39905 merged
Aug 5, 2025 -
Revert "remove dtensors, not explicit (#39840)"
#39912 merged
Aug 5, 2025 -
Fix aria tests
#39879 merged
Aug 5, 2025 -
Fix eval thread fork bomb
#39717 merged
Aug 5, 2025 -
Replace video_fps with fps in tests
#39898 merged
Aug 5, 2025 -
Fix misleading WandB error when WANDB_DISABLED is set
#39891 merged
Aug 5, 2025 -
Avoid aliasing in cond's branches for torch 2.8
#39488 merged
Aug 5, 2025 -
Remove unnecessary CUDA sync in qwen2_5_vl
#39870 merged
Aug 5, 2025 -
fix test_working_of_tp failure of accelerate ut
#39828 merged
Aug 5, 2025 -
[`Exaone4`] Fixes the attn implementation!
#39906 merged
Aug 5, 2025 -
Reorder serving docs
#39634 merged
Aug 5, 2025 -
chore: update DETR model card
#39822 merged
Aug 4, 2025 -
Add support for `ModernBertForMultipleChoice`
#39232 merged
Aug 4, 2025 -
send some feedback when manually building doc via comment
#39889 merged
Aug 4, 2025 -
Update cohere2 vision test
#39888 merged
Aug 4, 2025 -
[DOCS] : Improved mimi model card
#39824 merged
Aug 4, 2025 -
Fix link to models in README
#39880 merged
Aug 4, 2025 -
Better return type hint for `AutoModelForCausalLM` and `AutoModelForImageTextToText`
#39881 merged
Aug 4, 2025 -
Set `torch.backends.cudnn.allow_tf32 = False` for CI
#39885 merged
Aug 4, 2025 -
Replace `Tokenizer` with `PreTrainedTokenizerFast` in `ContinuousBatchProcessor`
#39858 merged
Aug 4, 2025 -
Rework add-new-model-like with modular and make test filenames coherent
#39612 merged
Aug 4, 2025 -
Fix quant docker for fp-quant
#39641 merged
Aug 4, 2025 -
Fix attn_implementation setter for models with `backbone_config`
#39855 merged
Aug 4, 2025 -
Add support for including in-memory videos (not just files/urls) in apply_chat_template
#39494 merged
Aug 4, 2025 -
Use comment to build doc on PRs
#39846 merged
Aug 4, 2025 -
Refactor label name handling for PEFT models in Trainer class
#39265 merged
Aug 4, 2025 -
Improve `is_wandb_available` function to verify WandB installation
#39875 merged
Aug 4, 2025
74 Pull requests opened by 56 people
-
Remove deprecated max_size parameter from ConditionalDetrImageProcessor
#39883 opened
Aug 4, 2025 -
added Textnet fast image processor
#39884 opened
Aug 4, 2025 -
🌐 [i18n-KO] Translated `perf_train_gaudi.md` to Korean
#39886 opened
Aug 4, 2025 -
Move old generation modes to the Hub 🧹🧹🧹🧽🧽
#39887 opened
Aug 4, 2025 -
🌐 [i18n-KO] Translated `jamba.md` to Korean
#39890 opened
Aug 4, 2025 -
[docs] Add reference to HF-maintained `custom_generate` collections
#39894 opened
Aug 4, 2025 -
Add Videoprism
#39895 opened
Aug 4, 2025 -
[model] Support MiniCPM-V 4.0
#39899 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `fp_quant` to Korean
#39901 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated clipseg.md to Korean
#39903 opened
Aug 5, 2025 -
Update dynamic attnt setter for multimodals
#39908 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `tiny_agents.md` to Korean
#39913 opened
Aug 5, 2025 -
🌐 [i18n-KO] Updated ko/perf_train_cpu.md
#39917 opened
Aug 5, 2025 -
🌐 [i18n-KO] Updated ko/perf_train_special.md
#39920 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `attention_interface.md` to Korean
#39922 opened
Aug 5, 2025 -
Add chat template tests
#39924 opened
Aug 5, 2025 -
Fix hidden torchvision>=0.15 dependency issue
#39928 opened
Aug 5, 2025 -
Add missing special token properties to MistralCommonTokenizer
#39930 opened
Aug 5, 2025 -
Registers StaticCache serialization functions for torch.export.export
#39931 opened
Aug 5, 2025 -
Fix whisper `return_language` with `return_timestamp=word`
#39938 opened
Aug 5, 2025 -
fixing image_utils.py todo
#39941 opened
Aug 6, 2025 -
fix llama issue
#39942 opened
Aug 6, 2025 -
Add back `_tp_plan` attribute
#39944 opened
Aug 6, 2025 -
Add pytest marker: `torch_compile_test` and `torch_export_test`
#39950 opened
Aug 6, 2025 -
Use torch._check instead of a test to make the model Gemma3 exportable
#39962 opened
Aug 6, 2025 -
Add Keypoint Matcher pipeline
#39970 opened
Aug 6, 2025 -
Causal loss for `ForConditionalGeneration`
#39973 opened
Aug 7, 2025 -
[bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM
#39975 opened
Aug 7, 2025 -
Fix Qwen3 MoE GGUF architecture mismatch
#39976 opened
Aug 7, 2025 -
Fix cross-attention masking before residual connection
#39979 opened
Aug 7, 2025 -
Fix setting attention for multimodal models
#39984 opened
Aug 7, 2025 -
fix: resolve triton version check compatibility on windows
#39986 opened
Aug 7, 2025 -
Add vggt-hf copied from vit
#39987 opened
Aug 7, 2025 -
Update Glm4V processor and add tests
#39988 opened
Aug 7, 2025 -
Default to dequantize if cpu in device_map for mxfp4
#39993 opened
Aug 7, 2025 -
chore: Add type hints to import_utils.py module
#39994 opened
Aug 7, 2025 -
make sure position_ids are passed in for causal mask creation for gpt-oss
#39997 opened
Aug 7, 2025 -
allow TP to work in ND-parallel with fsdp cpu ram efficient loading
#39999 opened
Aug 7, 2025 -
[`Flash Attention`] Fix flash attention integration
#40002 opened
Aug 7, 2025 -
Fix PerceptionLM image preprocessing for non-tiled image input.
#40006 opened
Aug 7, 2025 -
🚨 Use lru_cache for sine pos embeddings MaskFormer
#40007 opened
Aug 7, 2025 -
Fixes for EncoderDecoderCache
#40008 opened
Aug 7, 2025 -
feat: extract rev in attn_implementation kernels via @
#40009 opened
Aug 7, 2025 -
🌐 [i18n-KO] Translated `optimizers.md` to Korean
#40011 opened
Aug 7, 2025 -
Feat/add gpt oss sequence classification
#40019 opened
Aug 8, 2025 -
[fix] batch inference for llava_onevision
#40021 opened
Aug 8, 2025 -
fix: resolve dropout type error in DogeDecoder
#40022 opened
Aug 8, 2025 -
Add support for SDPA for OWLViT and OWLv2
#40023 opened
Aug 8, 2025 -
[GLM4V] fix vision placeholder mask
#40025 opened
Aug 8, 2025 -
Add amd runners to run-slow command
#40027 opened
Aug 8, 2025 -
Revert FA2 kwargs construction
#40029 opened
Aug 8, 2025 -
Update boxes expectations for OWLViT test
#40030 opened
Aug 8, 2025 -
Add model card for MobileViT
#40033 opened
Aug 8, 2025 -
Remove deprecated cache-related objects
#40035 opened
Aug 8, 2025 -
Fix error on importing unavailable torch.distributed
#40038 opened
Aug 8, 2025 -
New DynamicSlidingWindow layer & caches
#40039 opened
Aug 8, 2025 -
[`GPT Big Code`] Fix attention scaling
#40041 opened
Aug 8, 2025 -
Add GptOssForSequenceClassification for GPT-OSS models
#40043 opened
Aug 8, 2025 -
(small) fix conditional for input_ids and input_embeds in marian
#40045 opened
Aug 8, 2025 -
Update wavlm.md to match new model card template
#40047 opened
Aug 8, 2025 -
Standardize BARTpho model card: badges, new examples, fixed broken im…
#40051 opened
Aug 9, 2025 -
Auto-log parallelism info to wandb.config using HF Accelerate
#40055 opened
Aug 9, 2025 -
updated visualBERT modelcard
#40057 opened
Aug 9, 2025 -
GGUF Qwen2VL
#40058 opened
Aug 9, 2025 -
Fix Inefficient GELU implementation in GPT2
#40059 opened
Aug 9, 2025 -
Avoid CUDA stream sync
#40060 opened
Aug 10, 2025 -
🌐 [i18n-KO] Translated `vitdet.md` to Korean
#40061 opened
Aug 10, 2025 -
fix: move super().__init__ after vision_config init in Mistral3Config
#40063 opened
Aug 10, 2025 -
🌐 [i18n-KO] Translated `videomae.md` to Korean
#40064 opened
Aug 10, 2025 -
Delay float32 upcast in ForCausalLMLoss after filtering ignore_index
#40065 opened
Aug 10, 2025 -
Change Qwen2RMSNorm to RMSNorm from PyTorch
#40066 opened
Aug 10, 2025 -
Add missing arguments
#40068 opened
Aug 10, 2025 -
Remove _prepare_flash_attention_from_position_ids
#40069 opened
Aug 10, 2025
43 Issues closed by 22 people
-
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 closed
Aug 10, 2025 -
Gemma2 falls back to CPU execution when attn_implementation='flash_attention_2'
#39188 closed
Aug 10, 2025 -
Previous PRs introduced a bug on Accumulated Gradients Losses
#40052 closed
Aug 9, 2025 -
Incorrect word timestamps and word repetitions with Whisper-Large-v3-turbo model
#37248 closed
Aug 9, 2025 -
Pretrainedtokenizerfast Segmentation fault
#39099 closed
Aug 9, 2025 -
New release 4.53.0 breaks HF trainer/model
#39111 closed
Aug 9, 2025 -
Gradient accumulation steps for Vision Language model
#39123 closed
Aug 9, 2025 -
Not capable of exporting Mistral to ONNX format with the use of caching
#39162 closed
Aug 9, 2025 -
Error when loading gguf file
#40040 closed
Aug 9, 2025 -
Weights not tied when loading `from_pretrained` with a wrapped model
#39900 closed
Aug 8, 2025 -
`TypeError: 'builtins.safe_open' object is not iterable` in `load_pytorch_state_dict_in_tf2_model `
#40028 closed
Aug 8, 2025 -
Major issues with transformers version causing rubbish generations with Gemma3 family using vllm
#40017 closed
Aug 8, 2025 -
Gemma3n get_placeholder_mask issue
#39991 closed
Aug 8, 2025 -
flash-attn cannot perform deterministic computation
#39982 closed
Aug 8, 2025 -
RoBERTa is not well implemented for tokenizers with pad_token_id != 1
#34528 closed
Aug 8, 2025 -
[DeepSeek-V3] Different rotary embedding implementation between DeepSeek-AI and Transformers
#39687 closed
Aug 8, 2025 -
ModernBertUnpaddedRotaryEmbedding __init__ error
#39934 closed
Aug 7, 2025 -
video_inputs are not passed to perception_lm
#40004 closed
Aug 7, 2025 -
Flash Attention fails with non aligned position_ids
#39814 closed
Aug 7, 2025 -
`convert_deepseek_vl_weights_to_hf.py` not included in v4.55.0 release.
#39966 closed
Aug 7, 2025 -
[Gemma3N] Audio processing issue
#39911 closed
Aug 7, 2025 -
v4.55.0 Idefics3 RuntimeError Tensors on different devices
#39947 closed
Aug 7, 2025 -
[gpt‑oss] eager_attention_forward not using sliding-window attention for GPT‑OSS models
#39954 closed
Aug 7, 2025 -
Finetune `gpt-oss-20b` with `mxfp4` quantization
#39969 closed
Aug 6, 2025 -
Fix grammatically incorrect variable name "expert_hitted" → "expert_hit" in MoE implementation
#39955 closed
Aug 6, 2025 -
transformers serve doesn't handle OPTIONS http method
#39932 closed
Aug 6, 2025 -
454545
#39864 closed
Aug 6, 2025 -
ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#38442 closed
Aug 6, 2025 -
Streaming mode support on HF vs kyutai-labs for the mimi model
#38535 closed
Aug 6, 2025 -
enable GraniteMoeHybridIntegrationTest in UT
#38542 closed
Aug 6, 2025 -
Llama4 inference encounter unsupported op in dynamo ?
#38118 closed
Aug 6, 2025 -
Misleading WandB error when WANDB_DISABLED=True and report_to="wandb" are both set
#39878 closed
Aug 5, 2025 -
Inefficient memory resharding in attention layer
#39072 closed
Aug 5, 2025 -
Inefficient default GELU implementation in GPT2
#39073 closed
Aug 5, 2025 -
AttributeError: 'HfTrainerDeepSpeedConfig' object has no attribute 'is_zero3'
#39081 closed
Aug 5, 2025 -
Why `lm-head` weight still exists with `"tie_word_embeddings": true`
#39812 closed
Aug 4, 2025 -
Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows
#39704 closed
Aug 4, 2025 -
ValueError: Max cache length is not consistent across layers
#39877 closed
Aug 4, 2025 -
Allow video objects (np array etc.) in apply_chat_template (not just paths or urls)
#36560 closed
Aug 4, 2025 -
Exception while inference Qwen2VL and Qwen2VL, assert module.weight.shape[1] == 1
#38665 closed
Aug 4, 2025
38 Issues opened by 38 people
-
Issue running model from ImageSegmentationPipeline
#40071 opened
Aug 10, 2025 -
Transformer GGUF support philosophy / naive question
#40070 opened
Aug 10, 2025 -
[BUG] No umt5 config for GGUF. This is not supported configuration.
#40067 opened
Aug 10, 2025 -
[Mistral3] attn_implementation not applied to vision_tower.config in Mistral3Config due to init order
#40062 opened
Aug 10, 2025 -
Question: How to write a custom tokenizer from scratch
#40056 opened
Aug 9, 2025 -
Whisper transcription accuracy improves when last 1600 samples of input audio are muted
#40054 opened
Aug 9, 2025 -
Support text classification with GPT-OSS models
#40050 opened
Aug 9, 2025 -
Please support loading Qwen 2.5 VL from GGUF
#40049 opened
Aug 9, 2025 -
Recent releases break backwards-compatibility with key_cache
#40046 opened
Aug 8, 2025 -
Support loading glm4moe GGUF
#40042 opened
Aug 8, 2025 -
`plamo-2-1b` broken on latest main
#40034 opened
Aug 8, 2025 -
Add Padding Strategy to DataCollatorForLanguageModeling
#40032 opened
Aug 8, 2025 -
[gpt-oss] MoE routing bug in the mxfp4 implementation (in distributed setting)
#40031 opened
Aug 8, 2025 -
accelerate==1.10.0 and safetensors==0.6.1 are incompatible with transformers==4.53.1
#40020 opened
Aug 8, 2025 -
need GptOssForSequenceClassification
#40018 opened
Aug 8, 2025 -
Customizable Logit Warping Strategies for Generation
#40010 opened
Aug 7, 2025 -
Possible wrong init call
#40001 opened
Aug 7, 2025 -
[gpt-oss] Transform checkpoint from safetensors to state dict
#39992 opened
Aug 7, 2025 -
Triton version check compatibility on windows
#39985 opened
Aug 7, 2025 -
CVE fix for v4.37.2 and v4.38.0
#39983 opened
Aug 7, 2025 -
FSDP2 not compatible with transformers >= 4.54.0 GenericForTokenClassification
#39977 opened
Aug 7, 2025 -
bug in new transformers: 'Florence2ForConditionalGeneration' object has no attribute '_supports_sdpa'
#39974 opened
Aug 7, 2025 -
Gemma3 with fp16 in inference (I don't know if this change is working in fine-tune) #BUG FIX
#39972 opened
Aug 6, 2025 -
change `dataloader_persistent_workers` default value to `True`
#39963 opened
Aug 6, 2025 -
Retaining computational graph after using AutoImageProcessor
#39946 opened
Aug 6, 2025 -
GPT-OSS mxfp4 with triton_kernel: make_default_matmul_mxfp4_w_layout not found
#39945 opened
Aug 6, 2025 -
Breaking change in unset `_tp_plan` attribute
#39943 opened
Aug 6, 2025 -
Still getting "fp16 mixed precision requires a GPU (not 'mps')." error
#39935 opened
Aug 5, 2025 -
[Gemma3N] Not able to add new special tokens to model/tokenizer due to projection error
#39921 opened
Aug 5, 2025 -
When using batch_eval_metrics, inputs are not gathered from different device, which is wrong behavior
#39916 opened
Aug 5, 2025 -
Question: Llama4 weight reshaping
#39910 opened
Aug 5, 2025 -
Hidden torchvision>=0.19.0 dependency results in quiet import failures of e.g. PreTrainedModel
#39907 opened
Aug 5, 2025 -
Add VideoPrism
#39893 opened
Aug 4, 2025 -
[Feature Request] Automatically log parallelism configuration from Accelerate to W&B
#39882 opened
Aug 4, 2025
121 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Segment Anything 2 (SAM2)
#32317 commented on
Aug 8, 2025 • 43 new comments -
Add support for Florence-2
#38188 commented on
Aug 7, 2025 • 27 new comments -
[Feat] Adding Intern-S1
#39722 commented on
Aug 7, 2025 • 22 new comments -
blt wip
#38579 commented on
Aug 8, 2025 • 21 new comments -
[video processors] decode only sampled videos -> less RAM and faster processing
#39600 commented on
Aug 8, 2025 • 16 new comments -
HunYuan opensource
#39606 commented on
Aug 8, 2025 • 13 new comments -
Update model card for gpt neox japanese
#39862 commented on
Aug 5, 2025 • 12 new comments -
[WIP] Computer vision util: vision visualizer
#36892 commented on
Aug 10, 2025 • 9 new comments -
Mistral: Add support for interleaved attention
#39799 commented on
Aug 8, 2025 • 9 new comments -
Fix missing initializations for models created in 2022
#39772 commented on
Aug 8, 2025 • 6 new comments -
support MiniCPM-o2.6
#37917 commented on
Aug 7, 2025 • 6 new comments -
Improve Gemma3n model and tests
#39764 commented on
Aug 6, 2025 • 5 new comments -
[WIP] RoPE refactor
#39847 commented on
Aug 8, 2025 • 4 new comments -
Refactor vit-like models
#39816 commented on
Aug 5, 2025 • 3 new comments -
README: Update Bert Japanese model card
#39466 commented on
Aug 6, 2025 • 3 new comments -
🌐 [i18n-KO] Translated grounding-dino.md to Korean
#39861 commented on
Aug 9, 2025 • 3 new comments -
Fix issue #39191 respect accelerate config to disable torch.dynamo compilation
#39683 commented on
Aug 6, 2025 • 2 new comments -
Fix rope_deltas corruption in Qwen2.5VL during CFG generation
#39756 commented on
Aug 4, 2025 • 2 new comments -
🌐 [i18n-KO] Translated `main_classes/processors.md` to Korean
#39519 commented on
Aug 10, 2025 • 2 new comments -
Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling
#39485 commented on
Aug 8, 2025 • 2 new comments -
🌐 [i18n-KO] Translated `main_classes/backbones.md` to Korean
#39714 commented on
Aug 9, 2025 • 1 new comment -
[Tests] [Bugfix] Make weights tied for `dynamic_tied_weights` test
#39740 commented on
Aug 5, 2025 • 1 new comment -
🌐 [i18n-KO] Translated `pipelines.md` to Korean
#39577 commented on
Aug 5, 2025 • 1 new comment -
Feat: add Kwai-Keye transformers
#39292 commented on
Aug 10, 2025 • 1 new comment -
fix bug when using DP in trl, the batch size of input and output dism…
#38938 commented on
Aug 8, 2025 • 1 new comment -
Enable SIM rules
#39806 commented on
Aug 6, 2025 • 1 new comment -
🌐 [i18n-KO] Translated `gemma3.md` to Korean
#39865 commented on
Aug 6, 2025 • 1 new comment -
[Draft] Add Llasa TTS family of models
#39760 commented on
Aug 4, 2025 • 1 new comment -
🌐 [i18n-KO] Translated `chat_extras.md` to Korean
#39863 commented on
Aug 10, 2025 • 1 new comment -
handle multimodal models with tp_plan on the text_config
#39735 commented on
Aug 4, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `models.md` to Korean
#39518 commented on
Aug 4, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `compressed_tensor.md` to Korean
#39517 commented on
Aug 5, 2025 • 0 new comments -
Add model arcinstitute state
#39480 commented on
Aug 7, 2025 • 0 new comments -
refactor(modeling_llama): make RotaryEmbedding default path explicit
#39831 commented on
Aug 4, 2025 • 0 new comments -
Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs
#39435 commented on
Aug 7, 2025 • 0 new comments -
Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs
#39364 commented on
Aug 7, 2025 • 0 new comments -
Fix DeepSpeed mixed precision precedence over Accelerate defaults
#39856 commented on
Aug 10, 2025 • 0 new comments -
WIP: Initial support for bnb 4bit on any nn.Parameter
#39859 commented on
Aug 7, 2025 • 0 new comments -
Fix the issue that csm model cannot work with pipeline mode.
#39349 commented on
Aug 8, 2025 • 0 new comments -
feat(trainer): emergency checkpointing on crashes & SIGTERM/SIGINT
#39140 commented on
Aug 5, 2025 • 0 new comments -
Update Dockerfiles to install packages inside a virtual environment
#39098 commented on
Aug 4, 2025 • 0 new comments -
Update README.md
#39869 commented on
Aug 6, 2025 • 0 new comments -
Allow compression on meta device
#39039 commented on
Aug 8, 2025 • 0 new comments -
fix: Catch correct ConnectionError for additional_chat_templates
#39874 commented on
Aug 9, 2025 • 0 new comments -
FP-Quant NVFP4 and Python 3.9 support
#39876 commented on
Aug 9, 2025 • 0 new comments -
[qwen-vl] fix beam search with videos
#39726 commented on
Aug 4, 2025 • 0 new comments -
Fix HfArgumentParser to filter out dict types from Union
#39741 commented on
Aug 5, 2025 • 0 new comments -
Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock
#39743 commented on
Aug 5, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `main_classes/optimizer_schedules.md` to Korean
#39713 commented on
Aug 10, 2025 • 0 new comments -
use untyped storage for dtensors due to deprecation
#39697 commented on
Aug 4, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `deepseek_v3.md` to Korean
#39649 commented on
Aug 10, 2025 • 0 new comments -
fix tensor device when loading state dict
#39623 commented on
Aug 8, 2025 • 0 new comments -
Stop using `from_legacy_cache` as Cache initialization
#39765 commented on
Aug 6, 2025 • 0 new comments -
feat: add `is_fast` to ImageProcessor
#39603 commented on
Aug 6, 2025 • 0 new comments -
[gemma3] update conversion key mapping
#39778 commented on
Aug 4, 2025 • 0 new comments -
Add Fast Image Processor for ImageGPT
#39592 commented on
Aug 7, 2025 • 0 new comments -
Use `dtype` instead of `torch_dtype` everywhere!
#39782 commented on
Aug 7, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `auto_docstring.md` to Korean
#39571 commented on
Aug 9, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `vision-encoder-decoder.md` to Korean
#39563 commented on
Aug 8, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `main_classes/deepspeed.md` to Korean
#39559 commented on
Aug 10, 2025 • 0 new comments -
🌐 [i18n-KO] Translated `imageprocessor.md` to Korean
#39557 commented on
Aug 10, 2025 • 0 new comments -
Add Muon optimizer implementation and integration
#39541 commented on
Aug 8, 2025 • 0 new comments -
Served models handle with nested content
#39792 commented on
Aug 5, 2025 • 0 new comments -
hangs during training using deepspeed
#39275 commented on
Aug 8, 2025 • 0 new comments -
Support for context-free-grammars (CFG) to constrain model output
#25778 commented on
Aug 7, 2025 • 0 new comments -
[Contributions Welcome] Add Fast Image Processors
#36978 commented on
Aug 7, 2025 • 0 new comments -
[RFC] Updating pipeline models
#26690 commented on
Aug 7, 2025 • 0 new comments -
Loading audio in video from video URLs fail with chat template
#39076 commented on
Aug 7, 2025 • 0 new comments -
ImportError: cannot import name 'pipeline' from 'transformers'
#39137 commented on
Aug 6, 2025 • 0 new comments -
../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [267,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
#33985 commented on
Aug 6, 2025 • 0 new comments -
transformers env fails with: ModuleNotFoundError: No module named 'PIL'
#39779 commented on
Aug 5, 2025 • 0 new comments -
We now require users to upgrade torch to at least v2.6 in order to use the function.
#38464 commented on
Aug 5, 2025 • 0 new comments -
MistralCommonTokenizer does not match PreTrainedTokenizer
#39841 commented on
Aug 5, 2025 • 0 new comments -
Support topNSigma sampling in `generate`
#39850 commented on
Aug 5, 2025 • 0 new comments -
ValueError: Number of image placeholders in the prompt does not match the number of images. internVL3
#39703 commented on
Aug 5, 2025 • 0 new comments -
InternVL, PerceptionLM inference freeze in 4.54.1
#39872 commented on
Aug 5, 2025 • 0 new comments -
[FEAT] [non-CUDA]: Support alternative implementation for `constraints.positive_definite.check`
#36660 commented on
Aug 5, 2025 • 0 new comments -
Torch patches tracker for HPU/Gaudi
#39175 commented on
Aug 5, 2025 • 0 new comments -
Missing einops dependency causing ModuleNotFoundError
#39811 commented on
Aug 5, 2025 • 0 new comments -
Regression - High memory usage when using transformers model with FSDP + LoRA
#39795 commented on
Aug 5, 2025 • 0 new comments -
Accelerate beam search decoding via tree attention
#39682 commented on
Aug 5, 2025 • 0 new comments -
Failed to export PyTorch traced graph of Mixtral-8x7B-Instruct-v0.1 due to the PR #32429
#38518 commented on
Aug 4, 2025 • 0 new comments -
Support for per-token latency tracking in `generate()` (suggested options: using callback, profiler class, or using a config flag)
#39437 commented on
Aug 4, 2025 • 0 new comments -
T5Gemma failing on provided example
#39522 commented on
Aug 4, 2025 • 0 new comments -
Tensor parallelism for GLM-4.5
#39868 commented on
Aug 4, 2025 • 0 new comments -
Florence2ForConditionalGeneration does not support Flash Attention 2.0 yet ?...
#39860 commented on
Aug 4, 2025 • 0 new comments -
Handling of full_text_row_masked_out_mask in mllama is incorrect.
#39379 commented on
Aug 4, 2025 • 0 new comments -
transformers: FlaubertTokenizer: do_lowercase_and_remove_accent: make the logger warning actionable (don't only tell what's wrong, rather suggest what could be done about that)
#39224 commented on
Aug 4, 2025 • 0 new comments -
Exporting Llava decoder into ONNX format
#38924 commented on
Aug 4, 2025 • 0 new comments -
torch fake_tensor load hf model failed
#39217 commented on
Aug 4, 2025 • 0 new comments -
v4.53.0 - Qwen 2.5 VL Flash Attention error - object has no attribute is_causal
#39231 commented on
Aug 4, 2025 • 0 new comments -
Memory leak occurred during training qwen-2.5-vl
#39803 commented on
Aug 4, 2025 • 0 new comments -
Adds Universal Intelligence to awesome transformers documentation
#38641 commented on
Aug 7, 2025 • 0 new comments -
Add Bagel
#38569 commented on
Aug 10, 2025 • 0 new comments -
[omni modality] support composite processor config
#38142 commented on
Aug 5, 2025 • 0 new comments -
fix: qwen2.5 omni apply_chat_template system content check
#37511 commented on
Aug 8, 2025 • 0 new comments -
Add Plain-DETR
#37096 commented on
Aug 10, 2025 • 0 new comments -
Add Ovis2 model and processor implementation
#37088 commented on
Aug 8, 2025 • 0 new comments -
[Community contributions] Model cards
#36979 commented on
Aug 10, 2025 • 0 new comments -
Please support GGUF format for UMT5EncoderModel
#36774 commented on
Aug 10, 2025 • 0 new comments -
"pipeline" is not exported from module "transformers"
#37646 commented on
Aug 10, 2025 • 0 new comments -
YaRN: factor is not effective with original_max_position_embeddings
#38224 commented on
Aug 10, 2025 • 0 new comments -
Attention refactor in #35235 adds a `__getitem__` into the forward pass, which causes errors with torch dynamo.
#38271 commented on
Aug 10, 2025 • 0 new comments -
Potential Memory Leak or Caching in Fast Image Processor
#38656 commented on
Aug 10, 2025 • 0 new comments -
CPMANT Model Fails to Run Following Official Tutorial
#39026 commented on
Aug 10, 2025 • 0 new comments -
AutoModelForCausalLM.from_pretrained(..., device_map=...) ignore `Tensor.retain_grad()` in Multi-GPUs setting
#39036 commented on
Aug 10, 2025 • 0 new comments -
QWEN2VLProcessor missing video_token_id in mm_token_type_ids
#39112 commented on
Aug 10, 2025 • 0 new comments -
bf16_full_eval=True moves model to device before FSDP application and causes cuda OOM
#39136 commented on
Aug 10, 2025 • 0 new comments -
apply_rotary_pos_emb_flashatt failed during triton jit compilation 'constexpr' object has no attribute 'bit_length'
#39167 commented on
Aug 10, 2025 • 0 new comments -
[Trainer] Eval loss depends on batch size (with solution)
#39241 commented on
Aug 10, 2025 • 0 new comments -
TypeError: GenerationMixin._extract_past_from_model_output() got an unexpected keyword argument 'standardize_cache_format'
#39336 commented on
Aug 10, 2025 • 0 new comments -
`MoshiIntegrationTests` started to fail after #34464
#38725 commented on
Aug 9, 2025 • 0 new comments -
CUDA OOM when running meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
#37532 commented on
Aug 9, 2025 • 0 new comments -
Improve CI/CD by completing migration from setup.py to pyproject.toml
#38928 commented on
Aug 9, 2025 • 0 new comments -
Qwen3 MOE models w/non-empty `mlp_only_layers` fail when `output_router_logits=True`
#39203 commented on
Aug 9, 2025 • 0 new comments -
Inference with model.generate( ) using a quantized model leads to assertion error
#39311 commented on
Aug 9, 2025 • 0 new comments -
Whisper demo code for model + processor API is broken
#39318 commented on
Aug 9, 2025 • 0 new comments -
Off-by-one error when using flash_attention with a sliding window
#39408 commented on
Aug 9, 2025 • 0 new comments -
Support `StaticCache` in assisted generation
#32946 commented on
Aug 8, 2025 • 0 new comments -
ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time
#39542 commented on
Aug 8, 2025 • 0 new comments -
Please help i am trying to run model but issue
#39260 commented on
Aug 8, 2025 • 0 new comments