
Add model card for MobileViT #40033


Open: Shivamjan wants to merge 10 commits into main

Conversation

Shivamjan

What does this PR do?

This PR adds a detailed, beginner-friendly model card for MobileViT to the Hugging Face Transformers documentation. The previous model card was minimal and lacked a clear explanation of the model architecture. The new card retains several elements from the earlier version, as they remain applicable and useful for users.

The new version includes:

  • A clear explanation of the MobileViT architecture.
  • Notes on preprocessing and the expected image format.
  • Guidance on using the model for classification and segmentation (a short segmentation sketch follows this list).
  • A note on TensorFlow Lite compatibility for mobile use.
  • References to the original paper and related resources.
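
For context, here is a minimal sketch of the kind of segmentation usage the card describes, assuming the apple/deeplabv3-mobilevit-small checkpoint (the exact snippet in the card may differ):

```python
# Illustrative sketch only; assumes the apple/deeplabv3-mobilevit-small checkpoint.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, MobileViTForSemanticSegmentation

image_processor = AutoImageProcessor.from_pretrained("apple/deeplabv3-mobilevit-small")
model = MobileViTForSemanticSegmentation.from_pretrained("apple/deeplabv3-mobilevit-small")

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch, num_labels, height, width)

# Per-pixel class predictions at the logits resolution
segmentation_map = logits.argmax(dim=1)[0]
print(segmentation_map.shape)
```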

Fixes # (issue)

Before submitting

  • [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Shivamjan (Author) commented Aug 8, 2025

@stevhliu Please take a look at your convenience and let me know if any further changes are required.

@stevhliu (Member) left a comment

Good start! Please check the model card format again, as it's missing the Pipeline and AutoModel examples!

Shivamjan and others added 8 commits August 9, 2025 09:19
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@stevhliu (Member) left a comment

Thanks, just a few more comments!


This model was contributed by [matthijs](https://huggingface.co/Matthijs). The TensorFlow version of the model was contributed by [sayakpaul](https://huggingface.co/sayakpaul). The original code and weights can be found [here](https://github.com/apple/ml-cvnets).
from transformers import pipeline

You can just run it on a single image rather than a dataset.

import torch
from transformers import pipeline

pipeline = pipeline(
    task="image-classification",
    model="apple/mobilevit-small",
    torch_dtype=torch.float16,
    device=0
)
pipeline("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")



import torch
import requests
from PIL import Image
from transformers import AutoModelForImageClassification, AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained(
    "apple/mobilevit-small",
    use_fast=True,
)
# Move the model to the same device as the inputs to avoid a device mismatch
model = AutoModelForImageClassification.from_pretrained(
    "apple/mobilevit-small",
).to("cuda")

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = image_processor(image, return_tensors="pt").to("cuda")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax(dim=-1).item()

class_labels = model.config.id2label
predicted_class_label = class_labels[predicted_class_id]
print(f"The predicted class label is: {predicted_class_label}")

<PipelineTag pipeline="image-classification"/>
- Does **not** operate on sequential data; it is designed purely for image tasks.
- Feature maps are used directly instead of token embeddings.
- Use [`MobileViTImageProcessor`](https://huggingface.co/docs/transformers/main/en/model_doc/mobilevit#transformers.MobileViTImageProcessor) to preprocess images.

Suggested change
- Use [`MobileViTImageProcessor`](https://huggingface.co/docs/transformers/main/en/model_doc/mobilevit#transformers.MobileViTImageProcessor) to preprocess images.
- Use [`MobileViTImageProcessor`] to preprocess images.
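
For context, a minimal sketch of that preprocessing step, assuming the apple/mobilevit-small checkpoint (not part of this diff):

```python
# Illustrative sketch only; assumes the apple/mobilevit-small checkpoint.
import requests
from PIL import Image
from transformers import MobileViTImageProcessor

image_processor = MobileViTImageProcessor.from_pretrained("apple/mobilevit-small")

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor handles resizing, center cropping, rescaling, and the
# channel-order handling MobileViT checkpoints expect.
inputs = image_processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # e.g. [1, 3, 256, 256], depending on the checkpoint's crop size
```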

Comment on lines +114 to +116
- The **classification models** are pretrained on [**ImageNet-1k**](https://huggingface.co/datasets/imagenet-1k) (ILSVRC 2012).
- The **segmentation models** use a [**DeepLabV3**](https://huggingface.co/papers/1706.05587) head and are pretrained on [**PASCAL VOC**](http://host.robots.ox.ac.uk/pascal/VOC/).
- TensorFlow versions are compatible with **TensorFlow Lite**, making them ideal for edge/mobile deployment.

Suggested change
- The **classification models** are pretrained on [**ImageNet-1k**](https://huggingface.co/datasets/imagenet-1k) (ILSVRC 2012).
- The **segmentation models** use a [**DeepLabV3**](https://huggingface.co/papers/1706.05587) head and are pretrained on [**PASCAL VOC**](http://host.robots.ox.ac.uk/pascal/VOC/).
- TensorFlow versions are compatible with **TensorFlow Lite**, making them ideal for edge/mobile deployment.
- The classification models are pretrained on [ImageNet-1k](https://huggingface.co/datasets/imagenet-1k).
- The segmentation models use a [DeepLabV3](https://huggingface.co/papers/1706.05587) head and are pretrained on [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/).
- TensorFlow versions are compatible with TensorFlow Lite, making them ideal for edge/mobile deployment.
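
For context, a minimal sketch of the TensorFlow Lite export mentioned in that note, assuming the apple/mobilevit-xx-small checkpoint and default post-training optimization settings:

```python
# Illustrative sketch only; assumes the apple/mobilevit-xx-small checkpoint.
import tensorflow as tf
from transformers import TFMobileViTForImageClassification

model = TFMobileViTForImageClassification.from_pretrained("apple/mobilevit-xx-small")

# Convert the Keras model to TensorFlow Lite with default optimizations.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()

with open("mobilevit-xx-small.tflite", "wb") as f:
    f.write(tflite_model)
```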
