
DOC use KBinsDiscretizer in lieu of KMeans in vector quantization example #24374


Merged (17 commits) on Oct 17, 2022

Conversation

@x110 (Contributor) commented Sep 6, 2022

Fixes #23896

@ogrisel (Member) left a comment

Could you please add titles to each of the imshow plots?

# create an array from labels and values
face_compressed = np.choose(labels, values)
est = preprocessing.KBinsDiscretizer(
    n_bins=n_bins, strategy="uniform", encode="ordinal", random_state=0
)

I would rather keep on using the k-means strategy here.
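For context, a minimal sketch of quantizing an image with `KBinsDiscretizer` using the k-means strategy suggested in this review. The synthetic image and the parameter values here are assumptions for illustration, not the example's actual code:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic 8-bit grayscale "image" standing in for the raccoon face.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)).astype(np.float64)

n_bins = 8
# strategy="kmeans" places bin edges via a 1D k-means clustering of the
# pixel values; encode="ordinal" yields one integer code per pixel.
est = KBinsDiscretizer(n_bins=n_bins, strategy="kmeans", encode="ordinal")
labels = est.fit_transform(image.reshape(-1, 1)).reshape(image.shape)

# The quantized image uses at most n_bins distinct values.
print(np.unique(labels).size)
```

The fitted `est.bin_edges_` then gives the k-means bin boundaries, from which bin centers can be derived for display.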

@glemaitre glemaitre changed the title Vector quantization example DOC use KBinsDiscretizer in lieu of KMeans in vector quantization example Sep 9, 2022
@glemaitre (Member):

This PR only partially fixes the original issue. The initial issue reported was about the image size (in kB): the quantized image was larger than the original due to a data type issue.

So I think that the narration should be improved by:

  • looking at the number of unique values in the original vs. the quantized image to show the compression
  • explaining the expansion in memory and linking it to the data type

The best would be to change the example into a notebook-style example where each of these points could be discussed in a separate section/cell.
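The two points above can be sketched in plain NumPy; the array shape here is a hypothetical stand-in for the actual raccoon-face image:

```python
import numpy as np

# Hypothetical 8-bit grayscale image standing in for the raccoon face.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Quantize to 8 levels (3 bits) by uniform binning.
n_bins = 8
bin_edges = np.linspace(0, 256, n_bins + 1)
labels = np.digitize(image, bin_edges[1:-1]).astype(np.uint8)

# Point 1: the quantized image has far fewer unique values.
print(np.unique(image).size)   # up to 256 distinct values
print(np.unique(labels).size)  # at most 8 distinct values

# Point 2: memory depends on the dtype, not on the number of distinct
# values; casting the codes to float64 would inflate the array 8x.
print(labels.nbytes, labels.astype(np.float64).nbytes)
```

This is why the quantized image in the original example could end up larger on disk than the source: the codes were stored in a wider dtype than the 8-bit input.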

@glemaitre (Member):

I solved the conflicts with main. I thought I would give changing the example to the notebook style a try, so I added some additional narrative but kept the original idea of this PR.

I will check the rendering and will probably merge if it looks fine.

@ArturoAmorQ (Member) left a comment

Thanks for the PR @x110, this is indeed an improvement over the current version of the example. Here is a first batch of suggestions.

x110 and others added 3 commits October 17, 2022 07:25
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
@glemaitre glemaitre self-requested a review October 17, 2022 09:31
glemaitre and others added 2 commits October 17, 2022 11:34
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
@glemaitre (Member):

I accepted the other suggestions from @ArturoAmorQ, which are a net improvement over my first draft. I will wait for the example generation to check that the rendering is fine and, if it is, I will merge.

@ArturoAmorQ (Member) left a comment

Just a small typo, otherwise LGTM. Thanks @x110!

Comment on lines +110 to +115
_, ax = plt.subplots()
ax.hist(raccoon_face.ravel(), bins=256)
color = "tab:orange"
for center in bin_center:
    ax.axvline(center, color=color)
    ax.text(center - 10, ax.get_ybound()[1] + 100, f"{center:.1f}", color=color)
@ArturoAmorQ (Member) commented Oct 17, 2022

For info, what I meant in this comment was the following:

Suggested change

Before:

    _, ax = plt.subplots()
    ax.hist(raccoon_face.ravel(), bins=256)
    color = "tab:orange"
    for center in bin_center:
        ax.axvline(center, color=color)
        ax.text(center - 10, ax.get_ybound()[1] + 100, f"{center:.1f}", color=color)

After:

    _, ax = plt.subplots()
    ax.hist(raccoon_face.ravel(), bins=256)
    ax.set(xlabel="Original pixel values", ylabel="Count of pixels")
    ax1 = ax.twiny()
    ax1.set_xlim(ax.get_xlim())
    ax1.set(
        xticks=bin_center,
        xticklabels=list(range(8)),
        xlabel="Sub-sampled pixel values",
    )
    for center in bin_center:
        ax.axvline(center, color="tab:orange", ls="--")
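For reference, the suggested version runs stand-alone with a non-interactive backend; `raccoon_face` and `bin_center` are hypothetical stand-ins here for the example's actual image and discretizer bin centers:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripting
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins: an 8-bit image and 8 bin centers from the discretizer.
rng = np.random.default_rng(0)
raccoon_face = rng.integers(0, 256, size=(64, 64))
bin_center = np.linspace(16, 240, 8)

_, ax = plt.subplots()
ax.hist(raccoon_face.ravel(), bins=256)
ax.set(xlabel="Original pixel values", ylabel="Count of pixels")

# Secondary x-axis mapping each bin center to its ordinal code 0..7.
ax1 = ax.twiny()
ax1.set_xlim(ax.get_xlim())
ax1.set(
    xticks=bin_center,
    xticklabels=list(range(8)),
    xlabel="Sub-sampled pixel values",
)
for center in bin_center:
    ax.axvline(center, color="tab:orange", ls="--")
```

The twin axis shares the x-limits of the histogram, so each dashed line sits under the code it represents.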

Something similar should then be done for the plot using k-means.


Yep, that is what I did at first, but I find it more complex to understand than just adding the annotation within the for loop.

@glemaitre glemaitre merged commit 5af75b9 into scikit-learn:main Oct 17, 2022
@glemaitre (Member):

The rendering is fine. Thanks @x110 @ArturoAmorQ. Merging.

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Oct 31, 2022
…mple (scikit-learn#24374)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>

Successfully merging this pull request may close these issues.

Vector Quantization Example increases rather than decreases memory use
4 participants