[MRG] RidgeCV minor refactor to improve readability #13832

Merged

merged 12 commits into scikit-learn:master on May 16, 2019

Conversation

thomasjpfan
Member

Reference Issues/PRs

Minor refactors to #13350

What does this implement/fix? Explain your changes.

  1. Renames variables to match the notation used in the reference.
  2. Uses lam and Q when dealing with the eigendecomposition, and U and v when using the SVD (see the sketch below).
  3. Always puts n_samples first when comparing to n_features in comments.
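
A minimal NumPy sketch of this naming convention (array shapes and data are arbitrary, for illustration only):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(5, 3)

    # Eigendecomposition path: eigenvalues `lam` (for Lambda) and
    # eigenvectors `Q` of the Gram matrix X X^T, as in the reference.
    lam, Q = np.linalg.eigh(X @ X.T)

    # SVD path: left singular vectors `U` and singular values `s` of X;
    # `v = s ** 2` holds the squared singular values.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    v = s ** 2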

Any other comments?

The goal of this PR is to make this code easier to read/maintain in the future.

CC @ogrisel

@thomasjpfan thomasjpfan changed the title [MRG] RidgeCV Sparse Minor Refactor [MRG] RidgeCV minor refactor to improve readability May 9, 2019
Member

@ogrisel ogrisel left a comment

Would be great to have @jeromedockes give it a review.

@jeromedockes
Contributor

Uses lam and Q when dealing with the eigendecomposition, and U and v when using the SVD.

what does lam stand for?

@thomasjpfan
Member Author

thomasjpfan commented May 9, 2019

what does lam stand for?

It stands for lambda, corresponding to the uppercase Lambda in the reference. I would like to use v instead of lam, but it may be more confusing for maintainers looking at the reference later.

@jeromedockes
Contributor

what does lam stand for?

It stands for lambda, corresponding to the uppercase Lambda in the reference. I would like to use v instead of lam, but it may be more confusing for maintainers looking at the reference later.

maybe lambda_ then? And is there a risk of confusion with lowercase lambda (used to denote the regularization parameter in the reference, alpha in the code)?

the reference later uses S^2 to denote the eigenvalues of X^TX (i.e. the nonzero part of Lambda in the low-dimensional case), and U to denote the corresponding part of Q (i.e. the left singular vectors of X associated with non-zero diagonal entries of Lambda).

maybe we can be even more explicit and use covariance_eigvals and gram_eigvals?
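
For reference, a small NumPy check of the relationship discussed here (random data, purely illustrative): the eigenvalues of X^TX are the squared singular values of X.

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(6, 4)  # n_samples > n_features

    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # singular values, descending
    eigvals, V = np.linalg.eigh(X.T @ X)              # eigenvalues, ascending

    # S^2 in the reference: eigenvalues of X^TX == squared singular values of X.
    assert np.allclose(np.sort(s ** 2), eigvals)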

@thomasjpfan
Member Author

(used to denote the regularization parameter in the reference, alpha in the code)?

The reference uses lowercase lambda while we use alpha. We usually reserve the trailing underscore for "fitted attributes", so I would avoid using it here.

maybe we can be even more explicit and use covariance_eigvals and gram_eigvals?

eigvals instead of lam looks good to me. The names of the decomposition functions already denote what is being decomposed.

@ogrisel
Member

ogrisel commented May 9, 2019

maybe we can be even more explicit and use covariance_eigvals and gram_eigvals?

eigvals instead of lam looks good to me. The names of the decomposition functions already denote what is being decomposed.

I am fine with both options.

    return self._solve_eigen_covariance_no_intercept(
    -    alpha, y, sqrt_sw, X_mean, s, V, X)
    +    alpha, y, sqrt_sw, X_mean, eigvals, Q, X)

    def _svd_decompose_design_matrix(self, X, y, sqrt_sw):
Contributor

should the notations used in this function (especially v = s ** 2) be updated as well?

Member Author

@thomasjpfan thomasjpfan May 9, 2019

Using the covariance matrix is left as a side note in section 5.3. For a future reader, seeing S^2 and looking at the reference would make it seem like we did SVD on X, when in fact we did an eigendecomposition of X^TX.

Contributor

in this function we are really computing an SVD of X. maybe v should be called eigvals since it is the name we are using elsewhere

Member Author

My understanding is, in _eigen_decompose_covariance we are doing an eigenvalue decomposition of X^TX and _solve_eigen_covariance uses this to get

X(X^TX + alpha*I)^(-1)X^T

Letting X^TX=QLQ^T, we write this as

XQ^T(L + alpha*I)^(-1)QX^T

I may be missing something. Where are we doing SVD directly in this procedure?

Contributor

you are right. In what you write above I would only change one detail: in the reference, Q and L are the eigenvectors and eigenvalues of XX^T, whereas V and S^2 are the eigenvectors and eigenvalues of X^TX (also, Q is transposed in the second equation). So to avoid confusion I would replace

Letting X^TX=QLQ^T, we write this as

XQ^T(L + alpha*I)^(-1)QX^T

with
Letting X^TX=VS^2V^T, we write this as

XV(S^2 + alpha*I)^(-1)V^TX^T

Where are we doing SVD directly in this procedure?

We are not doing SVD in _eigen_decompose_covariance, but in _svd_decompose_design_matrix (when X is dense we compute U in the decomposition to avoid having to recompute the product XV many times)
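
To make the identity above concrete, a minimal NumPy check (random data and an arbitrary alpha, illustration only) that XV(S^2 + alpha*I)^(-1)V^TX^T equals X(X^TX + alpha*I)^(-1)X^T:

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(6, 4)
    alpha = 0.5

    # Eigendecomposition of the covariance matrix: X^TX = V S^2 V^T.
    eigvals, V = np.linalg.eigh(X.T @ X)

    # Hat matrix through the eigendecomposition ...
    hat_eig = X @ V @ np.diag(1.0 / (eigvals + alpha)) @ V.T @ X.T

    # ... and by inverting X^TX + alpha*I directly.
    hat_direct = X @ np.linalg.inv(X.T @ X + alpha * np.eye(4)) @ X.T

    assert np.allclose(hat_eig, hat_direct)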

Member Author

(also Q is transposed in the second equation).

Oops.

Letting X^TX=VS^2V^T, we write this as

I think adding any reference to SVD would be confusing for a future reviewer. We are following section 5.3 starting from

1-X(X^TX+alpha*I)^-1X^T

and not following the reference using SVD in section 5.2.

@jeromedockes
Contributor

In the docstring for _RidgeGCV, G needs to be replaced by G^-1 in this
part as well:

    Compute eigendecomposition K = Q V Q^T.
    Then G = Q (V + alpha*Id)^-1 Q^T,
    where (V + alpha*Id) is diagonal.
    It is thus inexpensive to inverse for many alphas.

    Let loov be the vector of prediction values for each example
    when the model was fitted with all examples but this example.

    loov = (KGY - diag(KG)Y) / diag(I-KG)

    Let looe be the vector of prediction errors for each example
    when the model was fitted with all examples but this example.

    looe = y - loov = c / diag(G)
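
As a sanity check on the looe identity quoted above, here is a self-contained sketch (hypothetical names; assumes ridge without intercept and unit sample weights) comparing c / diag(G^-1), with G = K + alpha*I and c = G^-1 y, against brute-force leave-one-out refits:

    import numpy as np

    rng = np.random.RandomState(0)
    n, p = 8, 3
    X, y = rng.randn(n, p), rng.randn(n)
    alpha = 1.0

    # Shortcut from the docstring: c = G^-1 y, looe_i = c_i / G^-1_ii.
    G_inv = np.linalg.inv(X @ X.T + alpha * np.eye(n))
    c = G_inv @ y
    looe = c / np.diag(G_inv)

    # Brute force: refit ridge without sample i, then predict sample i.
    for i in range(n):
        mask = np.arange(n) != i
        w = np.linalg.solve(X[mask].T @ X[mask] + alpha * np.eye(p),
                            X[mask].T @ y[mask])
        assert np.isclose(looe[i], y[i] - X[i] @ w)

The eigendecomposition in the docstring simply replaces the explicit inverse above with Q (V + alpha*Id)^-1 Q^T, which is cheap to recompute for each alpha.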


    def _solve_eigen_covariance_no_intercept(
    -    self, alpha, y, sqrt_sw, X_mean, s, V, X):
    +    self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
Contributor

Suggested change

    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
    +    """Compute dual coefficients and diagonal of G^-1

    -    self, alpha, y, sqrt_sw, X_mean, s, V, X):
    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
    +    self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
    +    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
Contributor

Suggested change

    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
    +    """Compute dual coefficients and diagonal of G^-1

"""Compute dual coefficients and diagonal of (Identity - Hat_matrix)
self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
"""Compute dual coefficients and diagonal of (Identity - Hat_matrix),
where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T
Contributor

Suggested change

    -    where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T

    -    self, alpha, y, sqrt_sw, X_mean, s, V, X):
    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
    +    self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
    +    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
Contributor

Suggested change

    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
    +    """Compute dual coefficients and diagonal of G^-1

"""Compute dual coefficients and diagonal of (Identity - Hat_matrix)
self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
"""Compute dual coefficients and diagonal of (Identity - Hat_matrix),
where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T
Contributor

Suggested change

    -    where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T

    @@ -1324,17 +1326,21 @@ def _solve_eigen_covariance_intercept(
        return (1 - hat_diag) / alpha, (y - y_hat) / alpha

    def _solve_eigen_covariance(
    -    self, alpha, y, sqrt_sw, X_mean, s, V, X):
    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
    +    self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
Contributor

here maybe keep the notation V instead of Q? since they are the right singular vectors of X

    (n_samples > n_features and X is sparse).

    Letting X^T.X = QVQ^T, the Hat_matrix can be written as:
    QV(V + alpha*I)^(-1)VQ
Contributor

maybe X = USV^T, and the influence or hat matrix is
XV (1 / (S^2 + alpha I)) V^TX^T

Member Author

(Copy from another comment)

My understanding is, in _eigen_decompose_covariance we are doing an eigenvalue decomposition of X^TX and _solve_eigen_covariance uses this to get

X(X^TX + alpha*I)^(-1)X^T

Letting X^TX=QLQ^T, we write this as

XQ^T(L + alpha*I)^(-1)QX^T

I may be missing something. Where are we doing SVD directly in this procedure?

Member Author

I can see this can be put in the context of using the right singular matrix, but we only actually call svd in _svd_decompose_design_matrix (not in _eigen_decompose_covariance).

Contributor

still _eigen_decompose_covariance uses V, the eigenvectors of X^TX, rather than Q, which are the eigenvectors of XX^T. Either way in the docstring this line:

         QV(V + alpha*I)^(-1)VQ 

needs to be corrected, replaced by

XQ(L + alpha*I)^(-1)Q^TX^T

if we keep the notation you suggest, or

XV(S**2 + alpha*I)^(-1)V^TX^T

to avoid using Q for both the left and right singular vectors of X

Member Author

Okay, I see what you mean with the Q. Using V looks good to me. Using S**2 is still a little suspect, as mentioned in the other comment.

Contributor

Okay, I see what you mean with the Q. Using V looks good to me. Using S**2 is still a little suspect, as mentioned in the other comment.

cool! sorry if I was unclear. If you dislike S**2 I think any other letter is fine, outside of Q, V, U, X, S and L

    -    self, alpha, y, sqrt_sw, X_mean, s, V, X):
    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
    +    self, alpha, y, sqrt_sw, X_mean, eigvals, V, X):
    +    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
Contributor

Suggested change

    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
    +    """Compute dual coefficients and diagonal of G^-1,

"""Compute dual coefficients and diagonal of (Identity - Hat_matrix)
self, alpha, y, sqrt_sw, X_mean, eigvals, V, X):
"""Compute dual coefficients and diagonal of (Identity - Hat_matrix),
where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T
Contributor

Suggested change

    -    where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T

    @@ -1352,10 +1359,10 @@ def _svd_decompose_design_matrix(self, X, y, sqrt_sw):

    def _solve_svd_design_matrix(
            self, alpha, y, sqrt_sw, X_mean, v, U, UT_y):
Contributor

here and in _svd_decompose_design_matrix maybe v should also be called eigvals?

Member Author

I called it eigenvals_sq just to denote that it is S**2.

Contributor

they are the eigenvalues of X^TX, i.e. the squared singular values of X, so I guess it should be eigenvals or singvals_sq?

Member Author

@thomasjpfan thomasjpfan May 10, 2019

When performing SVD, we do not explicitly calculate X^TX. When one reads the code, it explicitly shows that s is the singular values of X and eigenvals_sq is their square.

If we use eigenvals = s**2, a future reviewer would wonder "why are we calling the eigvals the square of the actual eigenvalues?". (It is unclear from the code/comments/reference that we actually want the eigenvals of X^TX.) If we follow the reference directly, the idea of covariance is not stated anywhere while reading the SVD section (5.2). It is briefly mentioned in section 5.3.

Contributor

then how about singvals_sq or squared_singvals or squared_singular_values? because they are the singular values of X

Contributor

It is briefly mentioned in section 5.3.

section 5.3 is about the case where we don't want the LOOE and use a separate validation set; although it also computes an eigendecomposition of X^TX it suggests something quite different and I don't think we should mention it at all

Member Author

I think singvals_sq works.
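
With that naming settled, a short sketch of how the SVD path can reuse U, singvals_sq, and U.T @ y across alphas (hypothetical names echoing this discussion, not the actual scikit-learn code; assumes no intercept and unit sample weights):

    import numpy as np

    rng = np.random.RandomState(0)
    X, y = rng.randn(8, 3), rng.randn(8)  # n_samples > n_features, dense

    # Computed once, independently of alpha.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    singvals_sq = s ** 2
    UT_y = U.T @ y

    for alpha in (0.1, 1.0, 10.0):
        # G^-1 = (XX^T + alpha*I)^-1 via the SVD: U spans range(X),
        # and its orthogonal complement is simply scaled by 1/alpha.
        w = 1.0 / (singvals_sq + alpha) - 1.0 / alpha
        c = U @ (w * UT_y) + y / alpha           # dual coefficients G^-1 y
        G_inv_diag = (U ** 2) @ w + 1.0 / alpha  # diag(G^-1)
        looe = c / G_inv_diag                    # leave-one-out errors

Each alpha then costs only matrix-vector products instead of a fresh decomposition or inversion.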

    where (V + alpha*Id) is diagonal.
    It is thus inexpensive to inverse for many alphas.

    Let loov be the vector of prediction values for each example
    when the model was fitted with all examples but this example.

    -    loov = (KGY - diag(KG)Y) / diag(I-KG)
    +    loov = (KGY - diag(KG^-1)Y) / diag(I-KG^-1)
Contributor

Suggested change

    -    loov = (KGY - diag(KG^-1)Y) / diag(I-KG^-1)
    +    loov = (KG^-1Y - diag(KG^-1)Y) / diag(I-KG^-1)

@thomasjpfan
Member Author

@jeromedockes Thanks for bearing with me. Naming things can be tough.

@jeromedockes
Contributor

@jeromedockes Thanks for bearing with me. Naming things can be tough.

it is! but you are right that this will make the code easier to understand for future readers

@thomasjpfan
Member Author

I think all the comments made by @jeromedockes were addressed.

@jeromedockes
Contributor

I think all the comments made by @jeromedockes were addressed.

yes! thanks a lot for considering my suggestions.

re-reading the code I also realized that the docstrings for _compute_gram and
_compute_covariance are out of date: the center parameter has been removed.
sorry for not modifying them in #13350. I opened a small PR on your branch to
fix this, or you can make the change however you think is better.

regarding the changes of variable names made in this PR I don't have any more
comments.

@jeromedockes
Contributor

thanks! no more comments on my side :)

@ogrisel ogrisel merged commit 4e65827 into scikit-learn:master May 16, 2019
@ogrisel
Member

ogrisel commented May 16, 2019

Oops, I put the thank-you message in the merge commit message field instead of the comment field...

@jnothman
Member

jnothman commented May 16, 2019 via email

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019