
DOC Release Highlights for version 1.6 #30392


Merged
17 commits merged on Dec 6, 2024


jeremiedbb
Member

Candidates for the highlights

  • FrozenEstimator

  • Transform metadata in Pipeline

  • Missing value support in ExtraTreesClassifier/Regressor

  • fetch_file
    @ogrisel do we want to showcase an example for that? If so, what file should we download?

  • News on array api support

  • News on metadata routing support

  • Free threading support
    ping @lesteve, would you mind writing this item? You'll be more accurate than me :)

  • Developer API

ping @adrinjalali, who kindly offered to write something about the frozen estimator, metadata in Pipeline and the developer API.

cc/ @scikit-learn/communication-team We plan to release 1.6.0 final this week.
cc/ @scikit-learn/core-devs Feel free to correct any inaccuracies I may have introduced or add items I have missed.


github-actions bot commented Dec 2, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e2208cd. Link to the linter CI: here

@lesteve
Member

lesteve commented Dec 3, 2024

ping @lesteve, would you mind writing this item? You'll be more accurate than me :)

Sure. I guess #30360 should be merged first, and then the highlights could have an even shorter description with a link to the changelog entry?

@adrinjalali
Member

I'm not sure the pipeline's transform input is something we should write about in this release, since there's no real example out there where it is useful yet. WDYT?

@jeremiedbb
Member Author

I'm not sure the pipeline's transform input is something we should write about in this release, since there's no real example out there where it is useful yet. WDYT?

Since it's one of only two "major features" of this release, it would be a shame not to showcase it in the highlights. Can we make a toy example, even if scikit-learn itself can't benefit from it directly but third-party libraries might?

@adrinjalali
Member

Ok, let me know what you think about it then. Added a non-executable piece of code.

@lesteve
Member

lesteve commented Dec 4, 2024

I have added a free-threading highlight, mostly copied from the changelog entry.

I chose to do this rather than having a shorter description with a link to the changelog entry to save one click. Let me know if you prefer the latter option!

@jeremiedbb
Member Author

Ok, let me know what you think about it then. Added a non-executable piece of code.

I think a non-executable snippet is fine, thanks!

@jeremiedbb
Member Author

I chose to do this rather than having a shorter description with a link to the changelog entry to save one click

Yeah, it's better: the highlights are already linked from the changelog, so there's no need to loop around once more.

adrinjalali and others added 2 commits December 4, 2024 15:50
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
@glemaitre glemaitre self-requested a review December 5, 2024 10:31
@lorentzenchr
Member

Could we add the newton-cholesky solver? The one PR in this release is not that big, but it completes a longer journey and we have not advertised it much.


threshold_classifier = FixedThresholdClassifier(
    estimator=FrozenEstimator(classifier), threshold=0.9
)
Member


Do we want to call fit? Maybe one way to show that this is a no-op is to show some timing:

import time

from sklearn.datasets import make_classification
from sklearn.frozen import FrozenEstimator
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import FixedThresholdClassifier

X, y = make_classification(n_samples=1_000, random_state=0)

# Fit the underlying classifier once.
start = time.perf_counter()
classifier = SGDClassifier().fit(X, y)
print(f"Fitting the classifier took {(time.perf_counter() - start) * 1_000:.2f} milliseconds")

# FrozenEstimator makes fit a no-op for the wrapped classifier,
# so fitting the meta-estimator should be much cheaper.
start = time.perf_counter()
threshold_classifier = FixedThresholdClassifier(
    estimator=FrozenEstimator(classifier), threshold=0.9
).fit(X, y)
print(
    f"Fitting the threshold classifier took {(time.perf_counter() - start) * 1_000:.2f} milliseconds"
)

Output (timings depend on the machine):

Fitting the classifier took 2.53 milliseconds
Fitting the threshold classifier took 0.61 milliseconds

and add an extra conclusion line.

Member

@glemaitre glemaitre left a comment


Only some comments regarding grammar, plus two nitpicks.

@jeremiedbb
Member Author

Could we add the newton-cholesky solver? The one PR in this release is not that big, but it completes a longer journey and we have not advertised it much.

I agree that it could have been highlighted back then, but I'm not very comfortable putting it in the 1.6 highlights when it was released in 1.2.

The highlights should be about new things that were not there before. I don't think they are the best place to communicate more about it. Maybe @koaning would be interested in making a video about that?

Member

@ogrisel ogrisel left a comment


First pass of feedback:

@ogrisel
Member

ogrisel commented Dec 5, 2024

I would be +1 for advertising the newton-cholesky solver, even if this release only adds support for the multinomial/multiclass case in LogisticRegression. This is a non-trivial PR with a dramatic performance improvement on real-world datasets processed with common feature engineering. Maybe we could link the benchmark results from the PR:

#28840 (comment)

That should not prevent anybody else from advertising it even more in blog/social media posts or videos.

@koaning

koaning commented Dec 5, 2024

@jeremiedbb I can for sure make another video for the scikit-learn YouTube channel, but I usually prefer to start work on that once the actual release is live and tested.

@jeremiedbb
Member Author

jeremiedbb commented Dec 5, 2024

I would be +1 for advertising the newton-cholesky solver, even if this release only adds support for the multinomial/multiclass case in LogisticRegression. This is a non-trivial PR with a dramatic performance improvement on real-world datasets processed with common feature engineering. Maybe we could link the benchmark results from the PR

Alright, would you @ogrisel or @lorentzenchr mind writing this section? I haven't followed it in detail, so you'll be a lot more precise and accurate than me :)

@ogrisel
Member

ogrisel commented Dec 5, 2024

Let me give it a shot.

@ogrisel
Member

ogrisel commented Dec 5, 2024

I pushed f0669ce to highlight the work on the new solver. I experimented a bit with generating synthetic multiclass data where it would make a difference in terms of convergence to a better model, but it's not easy to find the regime where it really shines. In the end, I just added a paragraph with a link to the benchmark results from the PR.

I checked that I can still reproduce them from the current main.

@jeremiedbb
Member Author

So I can't approve my own PR, but since I didn't write most of it I give my +1 anyway 😄

Is it good for you as well? If so, please approve so that we can merge it and continue the release process :)

Member

@ogrisel ogrisel left a comment


LGTM as well.

@ogrisel ogrisel merged commit a23aef1 into scikit-learn:main Dec 6, 2024
30 checks passed
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Dec 6, 2024
Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
jeremiedbb added a commit that referenced this pull request Dec 6, 2024
Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
virchan pushed a commit to virchan/scikit-learn that referenced this pull request Dec 9, 2024
Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

7 participants