New DynamicSlidingWindowLayer & associated Cache #40039

Cyrilvallez · 2025-08-08T15:54:39Z

What does this PR do?

As per the title. The goal is to avoid wasting memory for models that use sliding window attention. I don't want to reintroduce static hybrid caches by default, because of all the pitfalls of automatic compilation, but I also don't want to waste that memory, so a dynamic sliding window layer is definitely the way to go.

The only change needed is to pass the config to DynamicCache, so that it can parse sliding_window/layer_types. If the config is not passed, the behavior is exactly the same as before.
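To make the idea concrete, here is a minimal pure-Python sketch of a per-layer cache that is either unbounded or capped at a window. It is not the code from this PR: the names ToyFullLayer, ToySlidingWindowLayer and build_toy_cache are hypothetical, plain lists stand in for key/value tensors, the exact number of entries the real layer retains is not modeled, and the "sliding_attention" string is assumed to be the layer_types value that marks a sliding layer.

```python
class ToyFullLayer:
    """Toy stand-in for a regular dynamic layer: it keeps every entry."""

    def __init__(self):
        self.keys, self.values = [], []

    def update(self, new_keys, new_values):
        self.keys.extend(new_keys)
        self.values.extend(new_values)
        return self.keys, self.values


class ToySlidingWindowLayer:
    """Toy stand-in for a sliding layer: it keeps at most `window` entries."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be a positive integer")
        self.window = window
        self.keys, self.values = [], []

    def update(self, new_keys, new_values):
        # Append the new entries, then drop everything older than the window.
        self.keys = (self.keys + list(new_keys))[-self.window:]
        self.values = (self.values + list(new_values))[-self.window:]
        return self.keys, self.values


def build_toy_cache(num_layers, layer_types=None, window=None):
    """Pick one layer object per model layer (hypothetical helper).

    Without layer_types/window (i.e. no config information), every layer
    grows without bound, which mirrors the "same as before" behavior.
    """
    if layer_types is None or window is None:
        return [ToyFullLayer() for _ in range(num_layers)]
    return [
        ToySlidingWindowLayer(window) if kind == "sliding_attention" else ToyFullLayer()
        for kind in layer_types[:num_layers]
    ]
```

Feeding one token at a time into a cache built with a window of 3, a sliding layer stops at 3 stored entries while a full layer keeps growing with the sequence.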

See the following figures for an illustration:

top: Mistral 7B, all layers are sliding, so the cache stops growing once it reaches the window size of 4096
bottom: Gemma 2 9B, every other layer is sliding, so the cache keeps growing after reaching the window size of 4096, but more slowly ("sublinearly"), since only the non-sliding layers still grow
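The two regimes described for the figures can be checked with a little arithmetic. The sketch below counts cached token positions summed over layers; the helper name cached_positions is hypothetical, the four-layer models are an arbitrary toy size rather than the real layer counts, and the "sliding_attention"/"full_attention" strings are assumed layer_types values.

```python
def cached_positions(seq_len, layer_types, window):
    """Total cached token positions, summed over all layers (toy estimate)."""
    total = 0
    for kind in layer_types:
        if kind == "sliding_attention":
            # A sliding layer never stores more than `window` positions.
            total += min(seq_len, window)
        else:
            # A full-attention layer stores the whole sequence.
            total += seq_len
    return total


WINDOW = 4096
all_sliding = ["sliding_attention"] * 4                    # Mistral-like pattern
alternating = ["sliding_attention", "full_attention"] * 2  # Gemma 2-like pattern

for seq_len in (2048, 4096, 8192, 12288):
    print(
        seq_len,
        cached_positions(seq_len, all_sliding, WINDOW),
        cached_positions(seq_len, alternating, WINDOW),
    )
```

Under these assumptions the all-sliding total is flat once seq_len passes 4096, while the alternating total keeps increasing at half the slope it had before the window was reached.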

HuggingFaceDocBuilderDev · 2025-08-08T16:09:56Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Cyrilvallez · 2025-08-11T22:58:44Z

All good now: slow tests on mistral, gemma2 and t5gemma all give results similar to main. The only difference is a slight fa2 issue that surfaced on a slow test for mistral, but it's unrelated and solved by #40002.

github-actions · 2025-08-11T23:01:33Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: arcee, aria, bitnet, cohere, cohere2, csm, deepseek_v2, deepseek_v3, diffllama, doge, dots1, emu3, ernie4_5, exaone4, fsmt, gemma2

Cyrilvallez added 3 commits August 11, 2025 10:31

start adding the layer

2156e24

style

a1cdb3b

improve

41d55aa

Cyrilvallez force-pushed the dynamic-sliding-hybrid branch from 534a6a4 to 41d55aa August 11, 2025 08:31

Cyrilvallez added 18 commits August 11, 2025 10:34

modular

e35347e

fix

1b16485

fix

d39ada3

improve

f2d3309

generate integration

4418ecb

comment

cfbee04

remove old one

59c4e7a

remove

0a75ff4

fix

328a8d1

fix

dcc0ba2

fix

f39b8f7

fix all recompiles

93cabcb

fix

452b7d8

doc

0cdbb2c

fix

c7e715b

add text config check

b8bac36

fix encoderdecoder cache

1c04c62

add it for all models with sliding/hybrid support

5c6b07d

Cyrilvallez changed the title ~~New DynamicSlidingWindow layer & caches~~ New DynamicSlidingWindow layer & cache Aug 11, 2025

Cyrilvallez changed the title ~~New DynamicSlidingWindow layer & cache~~ New DynamicSlidingWindowLayer & associated Cache Aug 11, 2025

Cyrilvallez and others added 5 commits August 11, 2025 20:03

revert

862cc71

start fixing

63e3926

prophetnet

a72b4b4

fsmt

eeab5fd

Merge branch 'main' into dynamic-sliding-hybrid

abb45a7

fix ddp_data

284a7dd

add test for mistral

758ea66
