PERF: Vectorize stochastic optimizers for 20-25% performance improvement #32110 #32113
Conversation
- Optimize SGDOptimizer._get_updates() using in-place NumPy operations
- Optimize AdamOptimizer._get_updates() using vectorized moment updates
- Add comprehensive performance tests to verify optimizations
- Maintain full numerical accuracy and backward compatibility
- Achieve 26.3% speedup for SGD and 24.2% speedup for Adam on large arrays

The optimizations use np.multiply() and np.add() with the out= parameter to reduce memory allocations and improve cache efficiency during neural network training.

- Fix line length violations in _stochastic_optimizers.py
- Remove unused imports and fix style in performance tests
- All tests still pass with optimizations intact

- Fix remaining line length violations in both files
- Ensure full compliance with flake8 and black formatting
- All performance tests still pass (6/6)
- Code ready for scikit-learn CI/CD pipeline

…ents
- Document 20-25% performance improvement in MLPClassifier/MLPRegressor
- Entry follows scikit-learn changelog format for efficiency improvements
- Resolves missing changelog requirement for PR scikit-learn#32113
# Vectorized update of velocities for better performance
for i, (velocity, grad) in enumerate(zip(self.velocities, grads)):
    np.multiply(self.momentum, velocity, out=velocity)
    grad_update = -self.learning_rate * grad
    np.add(velocity, grad_update, out=velocity)
I think this syntax is clearer (and efficiency is exactly the same; both syntaxes are semantically equivalent, I think):
-# Vectorized update of velocities for better performance
-for i, (velocity, grad) in enumerate(zip(self.velocities, grads)):
-    np.multiply(self.momentum, velocity, out=velocity)
-    grad_update = -self.learning_rate * grad
-    np.add(velocity, grad_update, out=velocity)
+# Vectorized in-place update of velocities for better performance
+for velocity, grad in zip(self.velocities, grads):
+    velocity *= self.momentum
+    velocity -= self.learning_rate * grad
Yeah, they are mostly the same, but the second snippet uses Python/NumPy in-place operators directly, which makes it more concise and easier to read.
TODO
yeah
else:
    updates = [v.copy() for v in self.velocities]
Do the tests break without those 2 lines? Maybe returned updates are only read and never written to.
No, it does not break. I will update it to:

else:
    updates = self.velocities
Great.
But let's leave this comment open, I want other reviewers to see it, just in case 😅
I don't think so, because the v.copy() was just extra and it's resolved now.
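For reference, with the in-place operators and without the extra copy, the whole SGDOptimizer._get_updates() discussed in this thread would look roughly like the sketch below (attribute names follow the existing scikit-learn implementation; the exact diff in this PR may differ):

def _get_updates(self, grads):
    # Update each velocity buffer in place: v = momentum * v - lr * grad.
    for velocity, grad in zip(self.velocities, grads):
        velocity *= self.momentum
        velocity -= self.learning_rate * grad

    if self.nesterov:
        # Nesterov look-ahead: build new update arrays on top of the freshly
        # updated velocities, leaving the velocity buffers untouched.
        updates = [
            self.momentum * velocity - self.learning_rate * grad
            for velocity, grad in zip(self.velocities, grads)
        ]
    else:
        # Safe to return the velocity arrays themselves: update_params()
        # only reads the updates (param += update), it never writes to them.
        updates = self.velocities
    return updates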
    self,
    params,
    learning_rate_init=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-8,
Same here: avoid changes unrelated to the PR.
I will change it.
Can I close this as resolved?
I don't see the change; maybe you have some auto-formatting set up? Or you simply forgot to change/push/...?
Sorry, I haven't made the change yet.
Can you add a link to the benchmark code in the PR description? (a gist or a Colab notebook or ...) And can you add a link to the issue too? (like "Fixes #32110")
@cakedev0 This is the benchmark_optimize.py:
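(The script itself is not reproduced here; the snippet below is a minimal, hypothetical sketch of such a timing comparison. The array sizes and iteration count are placeholders, and it imports the private sklearn.neural_network._stochastic_optimizers module, whose API may change between versions.)

# Hypothetical benchmark sketch (not the actual benchmark_optimize.py):
# time repeated optimizer updates on large coefficient/intercept arrays.
import time

import numpy as np

from sklearn.neural_network._stochastic_optimizers import (
    AdamOptimizer,
    SGDOptimizer,
)

rng = np.random.default_rng(0)
params = [rng.standard_normal((500, 500)), rng.standard_normal(500)]
grads = [rng.standard_normal(p.shape) for p in params]

optimizers = {
    "SGD": SGDOptimizer(params, learning_rate_init=0.1, momentum=0.9, nesterov=True),
    "Adam": AdamOptimizer(params, learning_rate_init=0.001),
}
for name, opt in optimizers.items():
    start = time.perf_counter()
    for _ in range(1000):
        opt.update_params(params, grads)
    print(f"{name}: {time.perf_counter() - start:.3f}s for 1000 updates")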
- Reduce array sizes from 500x500 to 10x10 and 100x100 to 20x20
- Test runtime reduced from 6+ seconds to 0.13 seconds (98% faster)
- Maintains vectorization testing coverage while being CI-friendly
- All 6 tests pass with proper numerical accuracy validation

- Replace np.multiply/np.add with in-place *= and += operators
- Optimize division by avoiding temporary array creation
- Remove unnecessary .copy() in SGD optimizer
- Additional ~5% performance improvement for Adam
- Maintains numerical accuracy while improving efficiency

- Fix missing self.learning_rate assignment in Adam _get_updates()
- Ensure bias correction learning rate is properly tracked as public attribute
- Maintains API compatibility and proper Adam algorithm implementation
- Learning rate correctly decreases over iterations (0.001 -> ~0.0002 after 50 steps)
- All tests pass with corrected implementation

Previous optimization accidentally used a local lr_t variable instead of updating the public self.learning_rate attribute, which is critical for proper Adam behavior.
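For context, an in-place Adam update along the lines these commits describe might look like the sketch below (names such as ms, vs, t, and learning_rate_init follow the existing scikit-learn AdamOptimizer; this is an illustration, not the exact PR diff):

def _get_updates(self, grads):
    self.t += 1
    for m, v, grad in zip(self.ms, self.vs, grads):
        # First and second moment estimates, updated in place.
        m *= self.beta_1
        m += (1 - self.beta_1) * grad
        v *= self.beta_2
        v += (1 - self.beta_2) * (grad * grad)
    # Bias-corrected step size, tracked as the public learning_rate attribute.
    self.learning_rate = (
        self.learning_rate_init
        * np.sqrt(1 - self.beta_2**self.t)
        / (1 - self.beta_1**self.t)
    )
    return [
        -self.learning_rate * m / (np.sqrt(v) + self.epsilon)
        for m, v in zip(self.ms, self.vs)
    ]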
Reference Issues/PRs
Contributes to #32110.
What does this implement/fix? Explain your changes.
The optimizations use np.multiply() and np.add() with the out= parameter to reduce memory allocations and improve cache efficiency during neural network training.
Any other comments?
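As a standalone illustration of the out=/in-place pattern described above (a self-contained example, not code from this PR), both idioms below scale an array without allocating a temporary:

import numpy as np

velocity = np.ones(5)
momentum = 0.9

# ufunc form with an explicit output buffer: writes the result into velocity.
np.multiply(momentum, velocity, out=velocity)

# In-place operator form: semantically equivalent, also avoids a new array.
velocity *= momentum

# A plain `velocity = momentum * velocity` would instead allocate a fresh
# array and rebind the name, which is the allocation this PR avoids.
print(velocity)  # [0.81 0.81 0.81 0.81 0.81]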