Add model card for MobileViT #40033
@stevhliu Please take a look at your convenience and let me know if any further changes are required.
Good start! Please check the model card format again, as it's missing the Pipeline and AutoModel examples!
Most vision models that use Transformers first divide an image into several patches, which are then flattened and converted into vectors. This loses the structural properties of the image, which is not the case for CNNs. As a result, Transformer models have to become bigger and deeper to learn visual representations.
MobileViT instead combines convolutions and transformers so that the resulting block has convolution-like properties while still allowing global interactions. This makes it possible to design shallower and narrower models that are lightweight.
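The patch-flattening step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not MobileViT or Transformers library code: `patchify` is a hypothetical helper, and the 4x4 single-channel "image" is made up for the example.

```python
def patchify(image, patch):
    """Split an H x W image (a list of rows) into flattened patch vectors.

    Hypothetical helper for illustration only; real ViT-style models do this
    on tensors and then apply a learned linear projection to each vector.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")

    vectors = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Read the patch row by row and lay its pixels out as one flat vector.
            vectors.append([
                image[row][col]
                for row in range(top, top + patch)
                for col in range(left, left + patch)
            ])
    return vectors


# A toy 4x4 image whose pixel values are simply 0..15 in row-major order.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = patchify(image, 2)

for vector in patches:
    print(vector)
```

In this toy case, pixels 1 and 2 sit next to each other in the image but end up in different vectors. This is the kind of spatial structure that plain flattening discards and that a convolution would still see.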
Can you upload the image here and then ping me to merge please :)
The image should be formatted like:
<div class="flex justify-center">
<img>
</div>
Hey @stevhliu, I have uploaded the image and pinged you there. Can you please check and merge it? I have also made the changes you suggested in the model card, so let me know if everything looks good. I will add the image to the model card once you merge it. Thanks!
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
What does this PR do?
This PR adds a detailed, beginner-friendly model card for MobileViT to the Hugging Face Transformers documentation. The previous model card was minimal and lacked a clear explanation of the model architecture. The new model card retains several elements from the earlier version, as they remain applicable and useful.
The new version includes:
Fixes # (issue)
Before submitting
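Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.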
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.