Intelligent ML-Driven Cache System (Offline + Online)

A hybrid caching system that combines traditional replacement policies (LRU, LFU, MFU) with both offline-trained and online-learning machine learning models to approximate Belady's optimal replacement policy.

Overview

This project explores intelligent caching by:

Generating realistic Zipfian (skewed) access patterns
Simulating Belady’s optimal replacement using full and partial lookahead
Training ML models (Logistic Regression, XGBoost, CatBoost, LightGBM) to predict cache-worthiness
Implementing a real-time Online ML Cache using the river library
Benchmarking all strategies against LRU, LFU, MFU
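
A Zipfian workload gives item k (by popularity rank) a probability proportional to 1 / k^s, so a few items receive most requests. The README does not say how data_generator.py samples its stream. The sketch below is one standard way to do it, assuming a plain Zipf law with exponent s. The function name zipf_requests and the parameter s are hypothetical; num_requests and unique_items mirror the CLI flags used in step 2.

```python
import random
from collections import Counter


def zipf_requests(num_requests, unique_items, s=1.0, seed=None):
    """Sample a request stream where item k (0-based, rank k + 1) has probability ∝ 1 / (k + 1) ** s.

    Hypothetical helper: assumes a plain Zipf law, not necessarily what data_generator.py uses.
    """
    rng = random.Random(seed)
    items = list(range(unique_items))
    # Unnormalized Zipf weights; random.choices normalizes them internally.
    weights = [1.0 / (rank ** s) for rank in range(1, unique_items + 1)]
    return rng.choices(items, weights=weights, k=num_requests)


if __name__ == "__main__":
    stream = zipf_requests(100_000, 1_000, s=1.0, seed=42)
    # The lowest-numbered items should dominate the stream.
    print(Counter(stream).most_common(5))
```

Larger values of s make the distribution more skewed, which generally favors frequency-based policies such as LFU.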

Benchmark Results (Offline ML Cache)

Cache Type	Hit Rate
LRU	0.3402
LFU	0.4776
MFU	0.0960
ML (Best Offline Model)	0.4805

Online Learning Results

Metric	Value
Streaming Hit Rate	0.6611
Prediction Accuracy	0.9509

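The online ML cache updates its model as requests arrive. In this run it reached a streaming hit rate of 0.6611, higher than any policy in the offline benchmark above (the best of which was the offline ML cache at 0.4805).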

Features

Custom implementation of LRU, LFU, and MFU
Training and selection of the best model among Logistic Regression, XGBoost, CatBoost, and LightGBM
Belady’s optimal replacement simulation (full and partial lookahead)
Real-time OnlineMLCache (built with river) that trains on the fly
Hit rate + accuracy tracking and full benchmarking
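
The three baseline policies differ only in which resident key they evict on a miss: LRU drops the least recently used key, LFU the least frequently used, and MFU the most frequently used. The following is a minimal sketch of these policies and of the hit-rate metric, not the repository's code. The class names, the access() interface, and the hit_rate helper are hypothetical. It also assumes that LFU/MFU counts are kept only while a key is resident and that ties are broken by insertion order.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Evicts the least recently used key (capacity assumed >= 1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, insert key (evicting if full) and return False."""
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # least recently used entry sits at the front
        self.store[key] = None
        return False


class FrequencyCache:
    """Shared logic for LFU/MFU: per-key access counts for resident keys only."""

    evict_most_frequent = False

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()

    def access(self, key):
        if key in self.counts:
            self.counts[key] += 1
            return True
        if len(self.counts) >= self.capacity:
            pick = max if self.evict_most_frequent else min
            # Ties go to the earliest-inserted key because dicts preserve insertion order.
            victim = pick(self.counts, key=self.counts.__getitem__)
            del self.counts[victim]
        self.counts[key] = 1
        return False


class LFUCache(FrequencyCache):
    evict_most_frequent = False


class MFUCache(FrequencyCache):
    evict_most_frequent = True


def hit_rate(cache, requests):
    """Fraction of requests served from the cache."""
    hits = total = 0
    for key in requests:
        hits += cache.access(key)
        total += 1
    return hits / total if total else 0.0
```

On a skewed workload, MFU evicts exactly the popular items that are most likely to be requested again, which is consistent with its low hit rate in the benchmark above.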

Setup & Usage

1. Install all dependencies

pip install -r requirements.txt

2. Generate labeled data (offline Belady)

python data_generator.py --num_requests 100000 --unique_items 1000 --capacity 100 --output data/labeled_requests.csv
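
Belady's policy evicts the resident item whose next request lies farthest in the future. A partial-lookahead variant sees only the next W requests and treats anything beyond that window as "never reused". The README does not define the label that data_generator.py writes. This sketch assumes that a request is labeled cache-worthy (1) when the next request for the same item is a hit under the simulated policy. It is a demand-fetch variant, so every missed item is inserted. The function names next_use_indices, belady_labels, and _horizon, and the lookahead parameter, are hypothetical.

```python
import math


def next_use_indices(requests):
    """For each position i, the index of the next request for the same key (math.inf if none)."""
    nxt = [math.inf] * len(requests)
    last_seen = {}
    for i in range(len(requests) - 1, -1, -1):
        nxt[i] = last_seen.get(requests[i], math.inf)
        last_seen[requests[i]] = i
    return nxt


def _horizon(next_idx, t, lookahead):
    """Next-use index as seen at time t; reuses beyond the lookahead window count as 'never'."""
    if lookahead is not None and next_idx - t > lookahead:
        return math.inf
    return next_idx


def belady_labels(requests, capacity, lookahead=None):
    """Simulate demand-fetch Belady (capacity >= 1) and label every request.

    labels[i] = 1 if the next request for requests[i] is a hit, i.e. keeping the
    item after position i paid off; otherwise 0. lookahead=None means full future
    knowledge; lookahead=W lets the policy see only the next W requests.
    Returns (labels, hits).
    """
    nxt = next_use_indices(requests)
    cache = {}      # resident key -> index of its next request
    prev_pos = {}   # key -> position of its most recent request
    labels = [0] * len(requests)
    hits = 0
    for t, key in enumerate(requests):
        if key in cache:
            hits += 1
            labels[prev_pos[key]] = 1  # keeping key since its last request paid off
        elif len(cache) >= capacity:
            # Evict the resident key whose (visible) next use is farthest away.
            victim = max(cache, key=lambda k: _horizon(cache[k], t, lookahead))
            del cache[victim]
        cache[key] = nxt[t]
        prev_pos[key] = t
    return labels, hits
```

Because demand-fetch Belady is optimal among demand-paging policies, a partial-lookahead run can never score more hits than the full-lookahead run on the same stream.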

3. Train all ML models

python train_models.py
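
Each model needs per-request features that are computable at request time, without peeking at the future. The README does not list the features the repository uses. The sketch below computes three plausible, hypothetical ones: how often the item has been requested so far, how many requests ago it was last seen, and the gap between its two most recent requests. The names extract_features, frequency, recency, and prev_gap are illustrative only. Rows like these, paired with the labels from step 2, could be fed to any of the four classifiers.

```python
def extract_features(requests):
    """Build one feature dict per request using only information available at that moment.

    Hypothetical feature set; -1 marks 'not seen before' / 'no previous gap'.
    """
    last_seen = {}  # key -> index of its previous request
    prev_gap = {}   # key -> gap between its two most recent requests
    freq = {}       # key -> number of earlier requests for it
    rows = []
    for t, key in enumerate(requests):
        seen_before = key in last_seen
        rows.append({
            "frequency": freq.get(key, 0),
            "recency": t - last_seen[key] if seen_before else -1,
            "prev_gap": prev_gap.get(key, -1),
        })
        # Update history only after the row is emitted, so no feature leaks the current request.
        if seen_before:
            prev_gap[key] = t - last_seen[key]
        last_seen[key] = t
        freq[key] = freq.get(key, 0) + 1
    return rows
```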

4. Run offline ML cache

python test_cache.py
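
An ML-driven cache replaces the fixed eviction rule with a learned score. This sketch assumes, without confirmation from the README, that the trained model scores each resident key on a miss and that the key with the lowest predicted cache-worthiness is evicted. The class MLCache and its score callable are hypothetical. score stands in for something like a classifier's predicted probability for that key's current features.

```python
from typing import Callable, Dict, Hashable


class MLCache:
    """Cache that evicts the resident key with the lowest predicted cache-worthiness."""

    def __init__(self, capacity: int, score: Callable[[Hashable], float]):
        self.capacity = capacity  # assumed >= 1
        self.score = score        # hypothetical stand-in for a trained model's probability
        # A dict (not a set) keeps insertion order, so ties are broken deterministically.
        self.resident: Dict[Hashable, None] = {}

    def access(self, key: Hashable) -> bool:
        """Return True on a hit; on a miss, evict the lowest-scoring key if full, then insert."""
        if key in self.resident:
            return True
        if len(self.resident) >= self.capacity:
            victim = min(self.resident, key=self.score)
            del self.resident[victim]
        self.resident[key] = None
        return False


if __name__ == "__main__":
    popularity = {"a": 0.9, "b": 0.1, "c": 0.5}  # toy scores, not real model output
    cache = MLCache(2, score=lambda k: popularity[k])
    print([cache.access(k) for k in ["a", "b", "c", "a", "b"]])
    # prints [False, False, False, True, False]
```

Scoring at eviction time, rather than at insertion, lets the prediction reflect each key's most recent history.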

5. Benchmark all strategies (LRU, LFU, MFU, ML)

python compare_cache.py

6. Run online ML cache (real-time learning)

python test_online_cache.py
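
An online learner updates on one labeled example at a time instead of training on a fixed dataset. A common way to report a streaming accuracy like the one in the sample output is prequential (test-then-train) evaluation: predict each example first, then learn from it. We assume this is roughly what test_online_cache.py does but have not confirmed it. river's classifiers expose a similar learn_one / predict_proba_one interface. To avoid depending on any particular river version, this sketch swaps in a hand-rolled SGD logistic regression. The names OnlineLogisticRegression and prequential_accuracy are hypothetical.

```python
import math


class OnlineLogisticRegression:
    """Tiny SGD logistic regression over dict features, updated one example at a time."""

    def __init__(self, lr=0.05):
        self.lr = lr
        self.weights = {}
        self.bias = 0.0

    def predict_proba(self, x):
        """Probability that the example is cache-worthy (label 1)."""
        z = self.bias + sum(self.weights.get(name, 0.0) * value for name, value in x.items())
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def learn_one(self, x, y):
        """One gradient step on log loss for a single (features, label) pair."""
        error = self.predict_proba(x) - y  # derivative of log loss w.r.t. z
        self.bias -= self.lr * error
        for name, value in x.items():
            self.weights[name] = self.weights.get(name, 0.0) - self.lr * error * value


def prequential_accuracy(model, stream):
    """Test-then-train: score each (x, y) before the model learns from it."""
    correct = total = 0
    for x, y in stream:
        correct += int((model.predict_proba(x) >= 0.5) == bool(y))
        model.learn_one(x, y)
        total += 1
    return correct / total if total else 0.0
```

Raw features such as recency can be large, so in practice they would need scaling before a plain SGD update like this one. river pipelines often place a StandardScaler in front of the model for this reason.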

Sample Output

=== Offline Cache Hit Rates ===
LRU : 0.3402
LFU : 0.4776
MFU : 0.0960
ML  : 0.4805

--- Streaming Cache ---
Streaming ML-Cache Hit Rate: 0.6611
Prediction Accuracy:       0.9509

License

MIT License.

👤 Author

Tanishq Parihar. Built as a systems + ML hybrid project to explore intelligent infrastructure and predictive caching.
