
Vector Quantization Example increases rather than decreases memory use #23896


Closed
BlaneG opened this issue Jul 14, 2022 · 12 comments · Fixed by #24374
Labels: Documentation, good first issue (Easy with clear instructions to resolve)

Comments


BlaneG commented Jul 14, 2022

Describe the issue linked to the documentation

The Vector Quantization Example doesn't seem to demonstrate vector quantization.

As written, the k-means clustering approach used in the example converts a grayscale uint8 face to an int32 representation (labels). This increases the image's memory use by 4x.

```python
print(f'face dtype: {face.dtype}')
print(f'face bytes: {face.nbytes}')
print(f'labels dtype: {labels.dtype}')
print(f'labels bytes: {labels.nbytes}')
```

```
face dtype: uint8
face bytes: 786432
labels dtype: int32
labels bytes: 3145728
```

Expected output
The vector quantization output should demonstrate a decrease in memory use.

Additional details
From Wikipedia: "Vector quantization, also called "block quantization" or "pattern matching quantization" is often used in lossy data compression. It works by encoding values from a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension. A lower-space vector requires less storage space, so the data is compressed."

I'm guessing KMeans outputs int32 labels by default. The cluster labels are in the range 0-4. While this could be compressed to a 4-bit integer, uint8 is as small as we can go with NumPy, so the example does not effectively illustrate the data compression.
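As a quick check (a minimal sketch; a random array stands in for the example's raccoon face, but the shape and byte counts match the output above):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the example's grayscale face: same shape and dtype
# as the raccoon face, random pixel values.
rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(768, 1024), dtype=np.uint8)
X = face.reshape(-1, 1)

labels = KMeans(n_clusters=5, n_init=1, random_state=0).fit_predict(X)
print(labels.dtype)                    # int32 on this setup, as reported above
print(labels.nbytes)                   # 3145728, 4x the uint8 input
print(labels.astype(np.uint8).nbytes)  # 786432, back to the original size
```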

Perhaps the tutorial assumption is that the values contained in labels could be compressed through some other algorithm (e.g. outside of NumPy). However, for someone unfamiliar with vector quantization, it may seem odd that someone would quantize a vector in a way that both loses information and increases memory use.

Suggest a potential alternative/fix

  1. Add a comment to clarify that the quantized representation of the original face could be further compressed by another algorithm.
  2. Replace the gray image with a color image and cast the KMeans output to uint8 to demonstrate the compression. Converting three 8-bit channels to one 8-bit channel would reduce nbytes by 67% (see the sketch below).
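
A hypothetical sketch of option 2, with a random RGB array standing in for a real color image:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
X = rgb.reshape(-1, 3)  # one row per pixel, three 8-bit channels

kmeans = KMeans(n_clusters=16, n_init=1, random_state=0).fit(X)
codes = kmeans.labels_.astype(np.uint8)  # one 8-bit code per pixel

print(rgb.nbytes)    # 49152 bytes (three uint8 channels)
print(codes.nbytes)  # 16384 bytes, a 67% reduction
# Note: the codebook (kmeans.cluster_centers_) must also be stored
# to reconstruct an approximate image.
```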
@BlaneG added the Documentation and Needs Triage labels on Jul 14, 2022
glemaitre (Member) commented

Indeed, this tutorial should be revised.

Regarding the point raised by @BlaneG, I think that we can illustrate the compression by counting the number of unique values. We should also comment on the data type when it comes to in-memory compression.

Regarding the code, we should replace KMeans with KBinsDiscretizer, which allows switching from k-means to uniform quantization and only requires calling transform. This will make the code easier to understand, and discretization is actually the job of KBinsDiscretizer :)
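
A minimal sketch of that idea (my own illustration, not necessarily what the eventual PR does):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# `face` as in the example: a 2-D uint8 grayscale image; random stand-in here.
rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(768, 1024), dtype=np.uint8)

# strategy can be "uniform" or "kmeans", so both quantizations share one API.
encoder = KBinsDiscretizer(n_bins=8, encode="ordinal", strategy="uniform")
compressed = encoder.fit_transform(face.reshape(-1, 1))
compressed = compressed.reshape(face.shape).astype(np.uint8)

print(np.unique(compressed).size)  # at most 8 distinct values
print(compressed.nbytes)           # same 786432 bytes as the uint8 input
```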

@BlaneG do you wish to do a PR to improve this example?

@glemaitre added the good first issue and help wanted labels and removed the Needs Triage label on Jul 28, 2022
MocktaiLEngineer commented

Hello! If no one is already working on this, can I take this up?

glemaitre (Member) commented Jul 29, 2022 via email

MrinalTyagi (Contributor) commented

@MocktaiLEngineer Do let me know if you are working on this otherwise I would like to pick this up.

MocktaiLEngineer commented

@MrinalTyagi Please feel free to take over.

ShisuiUzumaki (Contributor) commented Sep 2, 2022

If this issue is still free, I would like to try it @MrinalTyagi

ryuusama09 commented

@glemaitre Is this issue still open? If yes, then I would like to contribute!

glemaitre (Member) commented

@ryuusama09 I think that @ShisuiUzumaki is working on it.
Please sync together.

ShisuiUzumaki (Contributor) commented

@ryuusama09 feel free to take this issue since I am a bit busy.

ryuusama09 commented

> The Vector Quantization Example doesn't seem to demonstrate vector quantization.
>
> As written, the k-means clustering approach used in the example converts a grayscale uint8 face to an int32 representation (labels). This increases the image's memory use by 4x.

Alright, although can you update me on what you've done so far?

ryuusama09 commented

> You can go ahead.

Regarding these changes, in what file are they to be made? Actually, I am a beginner and have no good idea what to do, so could you guide me a bit?

x110 (Contributor) commented Sep 6, 2022

@ryuusama09 Please note that I have submitted a pull request for this issue, waiting for it to get reviewed.
