This is an example of converting CoreNet's CLIP model implementation to MLX's CLIP example, with some customized modifications. MLX is a machine learning framework that provides native support for Apple Silicon hardware.
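As a quick illustration (a minimal sketch, not part of the conversion steps below), MLX arrays live in unified memory and computations are lazy until explicitly evaluated:

```python
import mlx.core as mx

a = mx.array([1.0, 2.0, 3.0])
b = mx.exp(a)  # builds a lazy computation graph; nothing runs yet
mx.eval(b)     # evaluates on the default device (the GPU on Apple Silicon)
print(b)
```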
To convert a CoreNet CLIP model to the MLX CLIP example using the files in this directory:
```bash
cd mlx_examples/clip/

# Install required dependencies.
# We assume that the main requirements.txt is already installed.
pip install -r requirements.txt

# Convert the model.
python main_clip_to_mlx.py \
    --common.config-file "../../projects/range_augment/clip/clip_vit_base.yaml" \
    --model.multi-modal-image-text.pretrained https://docs-assets.developer.apple.com/ml-research/models/cvnets-v2/examples/range_augment/clip/clip_vit_base_16.pt \
    --common.results-loc results/mlx_model/

# Try example inference.
python clip.py
```
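For reference, the core computation in CLIP inference is scoring each image against each text prompt. The sketch below shows that pattern in MLX with random embeddings standing in for the encoder outputs of the converted model; `similarity_scores` is a hypothetical helper for illustration, and `clip.py` is the real entry point.

```python
# A sketch of CLIP-style image/text matching in MLX. The embeddings here
# are random stand-ins for the image/text encoder outputs produced by the
# converted model; see clip.py for the real entry point.
import mlx.core as mx

def similarity_scores(image_emb: mx.array, text_emb: mx.array) -> mx.array:
    """L2-normalize both embeddings and take pairwise dot products."""
    image_emb = image_emb / mx.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / mx.linalg.norm(text_emb, axis=-1, keepdims=True)
    return image_emb @ text_emb.T

# Toy stand-ins: 2 images x 2 prompts, 512-dim embeddings.
image_emb = mx.random.normal((2, 512))
text_emb = mx.random.normal((2, 512))

probs = mx.softmax(similarity_scores(image_emb, text_emb), axis=-1)
mx.eval(probs)  # MLX is lazy; this forces the computation
print(probs)    # per-image probabilities over the two prompts
```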
Compared to PyTorch, given the prompts `["a photo of cat", "a photo of dog"]` and the `assets/{cat,dog}.jpeg` images, the results on an M2 Ultra are as follows:
| Model | PyTorch time, 100 iters (s) | MLX time, 100 iters (s) | Speedup (%) |
| --- | --- | --- | --- |
| FP16 Base variant | 2.7322 | 1.0743 | 60.68% |
| FP16 Huge variant | 4.9098 | 4.3189 | 12.04% |
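Timings like these come from a simple wall-clock loop over 100 forward passes. The sketch below shows the general pattern; `run_inference` is a hypothetical stand-in for the model's forward call, and `mx.eval` is required because MLX evaluates lazily.

```python
# A sketch of the 100-iteration timing loop behind tables like the one
# above. `run_inference` is a hypothetical placeholder workload; substitute
# the real CLIP forward pass from clip.py.
import time

import mlx.core as mx

def run_inference() -> mx.array:
    # Placeholder workload; replace with the CLIP forward pass.
    x = mx.random.normal((1, 3, 224, 224))
    return x.sum()

def benchmark(num_iters: int = 100) -> float:
    # Warm up once so one-time allocation/compilation is not timed.
    mx.eval(run_inference())
    start = time.perf_counter()
    for _ in range(num_iters):
        out = run_inference()
        mx.eval(out)  # force the lazy computation to actually run
    return time.perf_counter() - start

print(f"MLX time, 100 iters: {benchmark():.4f} s")
```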