Vector Quantization Example increases rather than decreases memory use #23896
Comments
Indeed, this tutorial should be revised. Regarding the point raised by @BlaneG, I think that we can illustrate the compression by counting the number of unique values. We should also comment on the data type when it comes to the in-memory compression. Regarding the code, we should replace … @BlaneG, do you wish to do a PR to improve this example?
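A minimal sketch of the "count the unique values" illustration suggested above. The array names and the use of equal-width bins as a stand-in for k-means labels are assumptions for illustration, not the tutorial's actual code:

```python
import numpy as np

# Illustrative stand-in for the tutorial's grey-scale face image
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)

n_clusters = 5
# Crude 1-D quantization into equal-width bins, standing in for k-means labels
bin_edges = np.linspace(0, 256, n_clusters + 1)[1:-1]
labels = np.digitize(image, bin_edges).astype(np.uint8)

# The compression shows up as a drop in the number of unique values
print(np.unique(image).size)   # many distinct grey levels
print(np.unique(labels).size)  # at most n_clusters distinct values
```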
Hello! If no one is already working on this, can I take this up?
You can go ahead.
@MocktaiLEngineer Do let me know if you are working on this, otherwise I would like to pick this up.
@MrinalTyagi Please feel free to take over.
If this issue is still free, I would like to try it. @MrinalTyagi
@glemaitre Is this issue still open? If yes, then I would like to contribute!
@ryuusama09 I think that @ShisuiUzumaki is working on it.
@ryuusama09 feel free to take this issue since I am a bit busy. |
Alright, although can you update me on what you've done till now?
Regarding these changes, in what file are they to be made? Actually, I am a beginner and don't have a good idea of what to do, so could you guide me a bit?
@ryuusama09 Please note that I have submitted a pull request for this issue; it is waiting to be reviewed.
Describe the issue linked to the documentation
The Vector Quantization Example doesn't seem to demonstrate Vector Quantization.
As written, the k-means clustering approach used in the example converts a grey-scale `uint8` face to an `int32` representation (`labels`). This increases the image memory use by 4x.

Expected output
Vector quantization output demonstrates a decrease in memory use.
Additional details
From Wikipedia: "Vector quantization, also called "block quantization" or "pattern matching quantization" is often used in lossy data compression. It works by encoding values from a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension. A lower-space vector requires less storage space, so the data is compressed."
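The definition above can be sketched in NumPy. The codebook values, signal, and nearest-neighbour encoding below are illustrative assumptions, not the scikit-learn example:

```python
import numpy as np

# Hypothetical codebook: the finite "discrete subspace" of representative values
codebook = np.array([16, 64, 128, 192, 240], dtype=np.uint8)

rng = np.random.default_rng(0)
signal = rng.integers(0, 256, size=1000).astype(np.uint8)

# Encode: index of the nearest codebook entry for each sample
distances = np.abs(signal[:, None].astype(int) - codebook[None, :].astype(int))
indices = distances.argmin(axis=1).astype(np.uint8)

# Decode: a lossy reconstruction drawn only from codebook values
reconstructed = codebook[indices]
print(np.unique(reconstructed).size)  # at most len(codebook) distinct values
```

The compression comes from storing the small `indices` array (plus the tiny codebook) rather than the original values.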
I'm guessing kmeans outputs an `int32` by default. The cluster `labels` are in the range 0, 1, 2, 3, 4. While this could be compressed to a 4-bit integer, `uint8` is as small as we can go with numpy, so the example does not effectively illustrate the data compression.

Perhaps the tutorial assumption is that the values contained in `labels` could be compressed through some other algorithm (e.g. outside of numpy). However, for someone unfamiliar with Vector Quantization, it may seem odd why someone would quantize a vector in a way that both loses information and increases memory use.

Suggest a potential alternative/fix
Cast `labels` to `uint8` to demonstrate the compression. Converting three 8-bit channels to one 8-bit channel would reduce `nbytes` by 67%.
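The 4x increase and the suggested fix can be checked directly with array byte counts. The shapes, names, and the assumption that k-means labels default to `int32` are illustrative stand-ins for the tutorial's arrays:

```python
import numpy as np

# Stand-ins for the tutorial's arrays (shapes and names are assumptions)
image = np.zeros((768, 1024), dtype=np.uint8)    # original grey-scale face
labels = np.zeros(image.shape, dtype=np.int32)   # k-means labels, int dtype

# int32 labels take 4 bytes per pixel vs 1 byte for the uint8 original
print(labels.nbytes / image.nbytes)  # 4.0

# Casting the labels to uint8 brings storage back to 1 byte per pixel
compressed = labels.astype(np.uint8)
print(compressed.nbytes / image.nbytes)  # 1.0
```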