[MRG] RidgeCV minor refactor to improve readability #13832
Conversation
Would be great to have @jeromedockes give it a review.
what does
It stands for
maybe the reference later uses S^2 to denote the eigenvalues of X^TX (ie the nonzero part of Lambda in the low-dimensional case), and U to denote the corresponding part of Q (ie the left singular vectors of X associated with non-zero diagonal entries of Lambda). maybe we can be even more explicit and use
The reference uses lower case
I am fine with both options.
        return self._solve_eigen_covariance_no_intercept(
-            alpha, y, sqrt_sw, X_mean, s, V, X)
+            alpha, y, sqrt_sw, X_mean, eigvals, Q, X)

    def _svd_decompose_design_matrix(self, X, y, sqrt_sw):
should the notations used in this function (especially v = s**2) be updated as well?
Using the covariance matrix is left as a side note, in section 5.3. For a future reader, seeing S^2 and looking at the reference would make it seem like we did SVD on X, when in fact we did an eigen decomposition of X^TX.
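As a quick sanity check of that point, here is a minimal numpy sketch (generic names, not the actual ridge.py helpers) showing that the eigenvalues of X^TX are exactly the squared singular values of X:

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 5)

# eigenvalues of the "covariance" matrix X^T.X (returned in ascending order)
eigvals = np.linalg.eigvalsh(X.T @ X)
# singular values of X itself (returned in descending order)
singvals = np.linalg.svd(X, compute_uv=False)

assert np.allclose(np.sort(eigvals), np.sort(singvals ** 2))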
in this function we are really computing an SVD of X. maybe v should be called eigvals since it is the name we are using elsewhere
My understanding is, in _eigen_decompose_covariance we are doing an eigenvalue decomposition of X^TX, and _solve_eigen_covariance uses this to get
X(X^TX + alpha*I)^(-1)X^T
Letting X^TX = QLQ^T, we write this as
XQ^T(L + alpha*I)^(-1)QX^T
I may be missing something. Where are we doing SVD directly in this procedure?
you are right. In what you write above I would only change one detail: in the reference Q and L are the eigenvectors and eigenvalues of XX^T, whereas V and S^2 are the eigenvectors and eigenvalues of X^TX. (also Q is transposed in the second equation). So to avoid confusion I would replace
Letting X^TX = QLQ^T, we write this as
XQ^T(L + alpha*I)^(-1)QX^T
with
Letting X^TX = VS^2V^T, we write this as
XV(S^2 + alpha*I)^(-1)V^TX^T
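A minimal numpy check of that corrected identity (generic names, assuming a dense X with more samples than features; an illustration, not the actual ridge.py code):

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
alpha = 0.5

# hat matrix computed directly
hat_direct = X @ np.linalg.inv(X.T @ X + alpha * np.eye(X.shape[1])) @ X.T

# hat matrix via the eigen decomposition X^T.X = V diag(s2) V^T
s2, V = np.linalg.eigh(X.T @ X)
hat_eigen = X @ V @ np.diag(1.0 / (s2 + alpha)) @ V.T @ X.T

assert np.allclose(hat_direct, hat_eigen)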
Where are we doing SVD directly in this procedure?
We are not doing SVD in _eigen_decompose_covariance, but in _svd_decompose_design_matrix (when X is dense we compute U in the decomposition to avoid having to recompute the product XV many times).
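The shortcut being referred to, sketched with generic numpy names (an illustration under the same assumptions, not the ridge.py implementation): if X = USV^T, then XV = US, so keeping U around gives every product XV essentially for free.

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(30, 4)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
# X V equals U scaled column-wise by the singular values
assert np.allclose(X @ Vt.T, U * s)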
(also Q is transposed in the second equation).
Oops.
Letting X^TX = VS^2V^T, we write this as XV(S^2 + alpha*I)^(-1)V^TX^T
I think adding any reference to SVD would be confusing for a future reviewer. We are following section 5.3, starting from
I - X(X^TX + alpha*I)^(-1)X^T
and not following the reference using SVD in section 5.2.
In the docstring for
sklearn/linear_model/ridge.py
    def _solve_eigen_covariance_no_intercept(
-            self, alpha, y, sqrt_sw, X_mean, s, V, X):
+            self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
        """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
"""Compute dual coefficients and diagonal of (Identity - Hat_matrix) | |
"""Compute dual coefficients and diagonal of G^-1 |
sklearn/linear_model/ridge.py
-            self, alpha, y, sqrt_sw, X_mean, s, V, X):
-        """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
+            self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
+        """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
"""Compute dual coefficients and diagonal of (Identity - Hat_matrix), | |
"""Compute dual coefficients and diagonal of G^-1 |
sklearn/linear_model/ridge.py
-        """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
+            self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
+        """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
+        where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T
where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T |
sklearn/linear_model/ridge.py
-            self, alpha, y, sqrt_sw, X_mean, s, V, X):
-        """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
+            self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
+        """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
"""Compute dual coefficients and diagonal of (Identity - Hat_matrix), | |
"""Compute dual coefficients and diagonal of G^-1 |
sklearn/linear_model/ridge.py
-        """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
+            self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
+        """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
+        where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T
where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T |
sklearn/linear_model/ridge.py
@@ -1324,17 +1326,21 @@ def _solve_eigen_covariance_intercept(
        return (1 - hat_diag) / alpha, (y - y_hat) / alpha

    def _solve_eigen_covariance(
-            self, alpha, y, sqrt_sw, X_mean, s, V, X):
-        """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
+            self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
here maybe keep the notation V instead of Q? since they are the right singular vectors of X
sklearn/linear_model/ridge.py
        (n_samples > n_features and X is sparse).

        Letting X^T.X = QVQ^T, the Hat_matrix can be written as:
        QV(V + alpha*I)^(-1)VQ
maybe X = USV^T, and the influence or hat matrix is
XV (1 / (S^2 + alpha I)) V^TX^T
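A small numpy sketch of that SVD form (generic names, an illustration rather than the ridge.py code): since XV = US, the expression XV(1/(S^2 + alpha*I))V^TX^T collapses to U diag(s^2/(s^2 + alpha)) U^T, which matches the directly computed hat matrix.

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
alpha = 0.5

U, s, Vt = np.linalg.svd(X, full_matrices=False)
# hat matrix written with the left singular vectors
hat_svd = (U * (s ** 2 / (s ** 2 + alpha))) @ U.T

# hat matrix computed directly
hat_direct = X @ np.linalg.inv(X.T @ X + alpha * np.eye(X.shape[1])) @ X.T

assert np.allclose(hat_svd, hat_direct)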
(Copy from another comment)
My understanding is, in _eigen_decompose_covariance we are doing an eigenvalue decomposition of X^TX, and _solve_eigen_covariance uses this to get
X(X^TX + alpha*I)^(-1)X^T
Letting X^TX = QLQ^T, we write this as
XQ^T(L + alpha*I)^(-1)QX^T
I may be missing something. Where are we doing SVD directly in this procedure?
I can see this can be put in the context of using the right singular matrix, but we only actually call svd in _svd_decompose_design_matrix (not in _eigen_decompose_covariance).
still _eigen_decompose_covariance uses V, the eigenvectors of X^TX, rather than Q, which are the eigenvectors of XX^T. Either way, in the docstring this line:
QV(V + alpha*I)^(-1)VQ
needs to be corrected, replaced by
XQ(L + alpha*I)^(-1)Q^TX^T
if we keep the notation you suggest, or
XV(S**2 + alpha*I)^(-1)V^TX^T
to avoid using Q for both the left and right singular vectors of X.
Okay, I see what you mean with the Q. Using V looks good to me. Using S**2 is still a little suspect, as mentioned in the other comment.
Okay, I see what you mean with the Q. Using V looks good to me. Using S**2 is still a little suspect, as mentioned in the other comment.
cool! sorry if I was unclear. If you dislike S**2 I think any other letter is fine, outside of Q, V, U, X, S and L.
sklearn/linear_model/ridge.py
-            self, alpha, y, sqrt_sw, X_mean, s, V, X):
-        """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
+            self, alpha, y, sqrt_sw, X_mean, eigvals, V, X):
+        """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
"""Compute dual coefficients and diagonal of (Identity - Hat_matrix), | |
"""Compute dual coefficients and diagonal of G^-1, |
sklearn/linear_model/ridge.py
-        """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
+            self, alpha, y, sqrt_sw, X_mean, eigvals, V, X):
+        """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
+        where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T
where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T |
sklearn/linear_model/ridge.py
@@ -1352,10 +1359,10 @@ def _svd_decompose_design_matrix(self, X, y, sqrt_sw):

    def _solve_svd_design_matrix(
            self, alpha, y, sqrt_sw, X_mean, v, U, UT_y):
here and in _svd_decompose_design_matrix maybe v should also be called eigvals?
I called it eigenvals_sq just to denote that it is S**2.
they are the eigenvalues of X^TX, ie the squared singular values of X, so I guess it should be eigenvals or singvals_sq?
When performing SVD, we do not explicitly calculate X^TX. When one reads the code, it explicitly shows that s is the eigenvalues of X and eigenvals_sq is the square of it.
If we use eigenvals = s**2, a future reviewer would wonder "why are we calling the eigvals the square of the actual eigenvalues?". (It is unclear from the code/comments/reference that we actually want the eigvals of X^TX.) If we follow the references directly, the idea of covariance is not stated anywhere while reading the SVD section (5.2). It is briefly mentioned in section 5.3.
then how about singvals_sq or squared_singvals or squared_singular_values? because they are the singular values of X
It is briefly mentioned in section 5.3.
section 5.3 is about the case where we don't want the LOOE and use a separate validation set; although it also computes an eigendecomposition of X^TX, it suggests something quite different and I don't think we should mention it at all.
I think singvals_sq works.
sklearn/linear_model/ridge.py
        where (V + alpha*Id) is diagonal.
        It is thus inexpensive to inverse for many alphas.

        Let loov be the vector of prediction values for each example
        when the model was fitted with all examples but this example.

-        loov = (KGY - diag(KG)Y) / diag(I-KG)
+        loov = (KGY - diag(KG^-1)Y) / diag(I-KG^-1)
-        loov = (KGY - diag(KG^-1)Y) / diag(I-KG^-1)
+        loov = (KG^-1Y - diag(KG^-1)Y) / diag(I-KG^-1)
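A brute-force check of the corrected formula, as a plain numpy sketch (no intercept and no sample weights assumed; generic names, not the ridge.py code): with K = XX^T and G = K + alpha*I, the leave-one-out predictions (KG^-1Y - diag(KG^-1)Y) / diag(I - KG^-1) match refitting the ridge problem with each sample left out.

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(25, 3)
y = rng.randn(25)
alpha = 1.0
n_samples, n_features = X.shape

K = X @ X.T
G_inv = np.linalg.inv(K + alpha * np.eye(n_samples))
KG_inv = K @ G_inv

# leave-one-out predictions from the closed-form identity
loov = (KG_inv @ y - np.diag(KG_inv) * y) / (1 - np.diag(KG_inv))

# brute force: refit ridge (no intercept) with each sample left out
loov_brute = np.empty(n_samples)
for i in range(n_samples):
    mask = np.arange(n_samples) != i
    coef = np.linalg.solve(
        X[mask].T @ X[mask] + alpha * np.eye(n_features),
        X[mask].T @ y[mask])
    loov_brute[i] = X[i] @ coef

assert np.allclose(loov, loov_brute)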
@jeromedockes Thanks for bearing with me. Naming things can be tough.
it is! but you are right that this will make the code easier to understand for future readers
I think all the comments made by @jeromedockes were addressed.
yes! thanks a lot for considering my suggestions. re-reading the code I also realized that the docstrings for regarding the changes of variable names made in this PR I don't have any more
thanks! no more comments on my side :)
Ooops, I put the thank you message in the merge commit message field, instead of the comment field...
Hahaha
Thank you very much for this work @thomasjpfan and @jeromedockes.
Reference Issues/PRs
Minor refactors to #13350
What does this implement/fix? Explain your changes.
- lam and Q are used when dealing with eigen decomposition, and U and v are used when using SVD.
- n_samples is mentioned first when comparing to n_features in comments.

Any other comments?
The goal of this PR is to make this code easier to read/maintain in the future.
CC @ogrisel