[MRG] RidgeCV minor refactor to improve readability #13832

Merged

merged 12 commits into scikit-learn:master on May 16, 2019

Conversation

thomasjpfan
Member

Reference Issues/PRs

Minor refactors to #13350

What does this implement/fix? Explain your changes.

  1. Renames variables to match the notation used in the reference.
  2. Uses lam and Q when dealing with the eigendecomposition, and U and v when using the SVD (see the sketch below).
  3. Always puts n_samples first when comparing to n_features in comments.
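
A minimal NumPy sketch of this naming convention (array shapes and data are arbitrary, for illustration only):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(5, 3)

    # Eigendecomposition path: eigenvalues `lam` (for Lambda) and
    # eigenvectors `Q` of the Gram matrix X X^T, as in the reference.
    lam, Q = np.linalg.eigh(X @ X.T)

    # SVD path: left singular vectors `U` and singular values `s` of X;
    # `v = s ** 2` holds the squared singular values.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    v = s ** 2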

Any other comments?

The goal of this PR is to make this code easier to read/maintain in the future.

CC @ogrisel

@thomasjpfan thomasjpfan changed the title [MRG] RidgeCV Sparse Minor Refactor [MRG] RidgeCV minor refactor to improve readability May 9, 2019
Member

@ogrisel ogrisel left a comment

Would be great to have @jeromedockes give it a review.

@jeromedockes
Contributor

Uses lam and Q when dealing with the eigendecomposition, and U and v when using the SVD.

what does lam stand for?

@thomasjpfan
Member Author

thomasjpfan commented May 9, 2019

what does lam stand for?

It stands for lambda, corresponding to the uppercase Lambda in the reference. I would like to use v instead of lam, but it may be more confusing for maintainers looking at the reference later.

@jeromedockes
Contributor

what does lam stand for?

It stands for lambda, corresponding to the uppercase Lambda in the reference. I would like to use v instead of lam, but it may be more confusing for maintainers looking at the reference later.

maybe lambda_ then? And is there a risk of confusion with lowercase lambda (used to denote the regularization parameter in the reference, alpha in the code)?

the reference later uses S^2 to denote the eigenvalues of X^TX (i.e. the nonzero part of Lambda in the low-dimensional case), and U to denote the corresponding part of Q (i.e. the left singular vectors of X associated with non-zero diagonal entries of Lambda).

maybe we can be even more explicit and use covariance_eigvals and gram_eigvals?
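
For reference, a small NumPy check of the relationship discussed here (random data, purely illustrative): the eigenvalues of X^TX are the squared singular values of X.

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(6, 4)  # n_samples > n_features

    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # singular values, descending
    eigvals, V = np.linalg.eigh(X.T @ X)              # eigenvalues, ascending

    # S^2 in the reference: eigenvalues of X^TX == squared singular values of X.
    assert np.allclose(np.sort(s ** 2), eigvals)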

@thomasjpfan
Member Author

(used to denote the regularization parameter in the reference, alpha in the code)?

The reference uses lowercase lambda while we use alpha. We usually reserve the trailing underscore for "fitted attributes", so I would avoid using it here.

maybe we can be even more explicit and use covariance_eigvals and gram_eigvals?

eigvals instead of lam looks good to me. The names of the decomposition functions already denote what is being decomposed.

@ogrisel
Member

ogrisel commented May 9, 2019

maybe we can be even more explicit and use covariance_eigvals and gram_eigvals?

eigvals instead of lam looks good to me. The names of the decomposition functions already denote what is being decomposed.

I am fine with both options.

    return self._solve_eigen_covariance_no_intercept(
    -    alpha, y, sqrt_sw, X_mean, s, V, X)
    +    alpha, y, sqrt_sw, X_mean, eigvals, Q, X)

    def _svd_decompose_design_matrix(self, X, y, sqrt_sw):
Contributor

should the notations used in this function (especially v = s ** 2) be updated as well?

Member Author

@thomasjpfan thomasjpfan May 9, 2019

Using the covariance matrix is left as a side note in section 5.3. For a future reader, seeing S^2 and looking at the reference would make it seem like we did SVD on X, when in fact we did an eigendecomposition of X^TX.

Contributor

in this function we are really computing an SVD of X. maybe v should be called eigvals since it is the name we are using elsewhere

Member Author

My understanding is, in _eigen_decompose_covariance we are doing an eigenvalue decomposition of X^TX and _solve_eigen_covariance uses this to get

X(X^TX + alpha*I)^(-1)X^T

Letting X^TX=QLQ^T, we write this as

XQ^T(L + alpha*I)^(-1)QX^T

I may be missing something. Where are we doing SVD directly in this procedure?

Contributor

you are right. In what you write above I would only change one detail: in the reference, Q and L are the eigenvectors and eigenvalues of XX^T, whereas V and S^2 are the eigenvectors and eigenvalues of X^TX (also, Q is transposed in the second equation). So to avoid confusion I would replace

Letting X^TX=QLQ^T, we write this as

XQ^T(L + alpha*I)^(-1)QX^T

with
Letting X^TX=VS^2V^T, we write this as

XV(S^2 + alpha*I)^(-1)V^TX^T

Where are we doing SVD directly in this procedure?

We are not doing SVD in _eigen_decompose_covariance, but in _svd_decompose_design_matrix (when X is dense we compute U in the decomposition to avoid having to recompute the product XV many times)
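
To make the identity above concrete, a minimal NumPy check (random data and an arbitrary alpha, illustration only) that XV(S^2 + alpha*I)^(-1)V^TX^T equals X(X^TX + alpha*I)^(-1)X^T:

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(6, 4)
    alpha = 0.5

    # Eigendecomposition of the covariance matrix: X^TX = V S^2 V^T.
    eigvals, V = np.linalg.eigh(X.T @ X)

    # Hat matrix through the eigendecomposition ...
    hat_eig = X @ V @ np.diag(1.0 / (eigvals + alpha)) @ V.T @ X.T

    # ... and by inverting X^TX + alpha*I directly.
    hat_direct = X @ np.linalg.inv(X.T @ X + alpha * np.eye(4)) @ X.T

    assert np.allclose(hat_eig, hat_direct)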

Member Author

(also Q is transposed in the second equation).

Oops.

Letting X^TX=VS^2V^T, we write this as

I think adding any reference to SVD would be confusing for a future reviewer. We are following section 5.3 starting from

1-X(X^TX+alpha*I)^-1X^T

and not following the reference using SVD in section 5.2.

@jeromedockes
Contributor

In the docstring for _RidgeGCV, G needs to be replaced by G^-1 in this
part as well:

    Compute eigendecomposition K = Q V Q^T.
    Then G = Q (V + alpha*Id)^-1 Q^T,
    where (V + alpha*Id) is diagonal.
    It is thus inexpensive to inverse for many alphas.

    Let loov be the vector of prediction values for each example
    when the model was fitted with all examples but this example.

    loov = (KGY - diag(KG)Y) / diag(I-KG)

    Let looe be the vector of prediction errors for each example
    when the model was fitted with all examples but this example.

    looe = y - loov = c / diag(G)
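
As a sanity check on the looe identity quoted above, here is a self-contained sketch (hypothetical names; assumes ridge without intercept and unit sample weights) comparing c / diag(G^-1), with G = K + alpha*I and c = G^-1 y, against brute-force leave-one-out refits:

    import numpy as np

    rng = np.random.RandomState(0)
    n, p = 8, 3
    X, y = rng.randn(n, p), rng.randn(n)
    alpha = 1.0

    # Shortcut from the docstring: c = G^-1 y, looe_i = c_i / G^-1_ii.
    G_inv = np.linalg.inv(X @ X.T + alpha * np.eye(n))
    c = G_inv @ y
    looe = c / np.diag(G_inv)

    # Brute force: refit ridge without sample i, then predict sample i.
    for i in range(n):
        mask = np.arange(n) != i
        w = np.linalg.solve(X[mask].T @ X[mask] + alpha * np.eye(p),
                            X[mask].T @ y[mask])
        assert np.isclose(looe[i], y[i] - X[i] @ w)

The eigendecomposition in the docstring simply replaces the explicit inverse above with Q (V + alpha*Id)^-1 Q^T, which is cheap to recompute for each alpha.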


    def _solve_eigen_covariance_no_intercept(
    -    self, alpha, y, sqrt_sw, X_mean, s, V, X):
    +    self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
Contributor

Suggested change

    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
    +    """Compute dual coefficients and diagonal of G^-1

    -    self, alpha, y, sqrt_sw, X_mean, s, V, X):
    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
    +    self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
    +    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
Contributor

Suggested change

    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
    +    """Compute dual coefficients and diagonal of G^-1

"""Compute dual coefficients and diagonal of (Identity - Hat_matrix)
self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
"""Compute dual coefficients and diagonal of (Identity - Hat_matrix),
where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T
Contributor

Suggested change

    -    where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T

    -    self, alpha, y, sqrt_sw, X_mean, s, V, X):
    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
    +    self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
    +    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
Contributor

Suggested change

    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
    +    """Compute dual coefficients and diagonal of G^-1

"""Compute dual coefficients and diagonal of (Identity - Hat_matrix)
self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
"""Compute dual coefficients and diagonal of (Identity - Hat_matrix),
where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T
Contributor

Suggested change

    -    where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T

    @@ -1324,17 +1326,21 @@ def _solve_eigen_covariance_intercept(
        return (1 - hat_diag) / alpha, (y - y_hat) / alpha

    def _solve_eigen_covariance(
    -    self, alpha, y, sqrt_sw, X_mean, s, V, X):
    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
    +    self, alpha, y, sqrt_sw, X_mean, eigvals, Q, X):
Contributor

here maybe keep the notation V instead of Q? since they are the right singular vectors of X

    (n_samples > n_features and X is sparse).

    Letting X^T.X = QVQ^T, the Hat_matrix can be written as:
    QV(V + alpha*I)^(-1)VQ
Contributor

maybe X = USV^T, and the influence or hat matrix is
XV (1 / (S^2 + alpha I)) V^TX^T

Member Author

(Copy from another comment)

My understanding is, in _eigen_decompose_covariance we are doing an eigenvalue decomposition of X^TX and _solve_eigen_covariance uses this to get

X(X^TX + alpha*I)^(-1)X^T

Letting X^TX=QLQ^T, we write this as

XQ^T(L + alpha*I)^(-1)QX^T

I may be missing something. Where are we doing SVD directly in this procedure?

Member Author

I can see this can be put in the context of using the right singular matrix, but we only actually call svd in _svd_decompose_design_matrix (not in _eigen_decompose_covariance).

Contributor

still _eigen_decompose_covariance uses V, the eigenvectors of X^TX, rather than Q, which are the eigenvectors of XX^T. Either way in the docstring this line:

         QV(V + alpha*I)^(-1)VQ 

needs to be corrected, replaced by

XQ(L + alpha*I)^(-1)Q^TX^T

if we keep the notation you suggest, or

XV(S**2 + alpha*I)^(-1)V^TX^T

to avoid using Q for both the left and right singular vectors of X

Member Author

Okay, I see what you mean with the Q. Using V looks good to me. Using S**2 is still a little suspect, as mentioned in the other comment.

Contributor

Okay, I see what you mean with the Q. Using V looks good to me. Using S**2 is still a little suspect, as mentioned in the other comment.

cool! sorry if I was unclear. If you dislike S**2 I think any other letter is fine, outside of Q, V, U, X, S and L

    -    self, alpha, y, sqrt_sw, X_mean, s, V, X):
    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix)
    +    self, alpha, y, sqrt_sw, X_mean, eigvals, V, X):
    +    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
Contributor

Suggested change

    -    """Compute dual coefficients and diagonal of (Identity - Hat_matrix),
    +    """Compute dual coefficients and diagonal of G^-1,

"""Compute dual coefficients and diagonal of (Identity - Hat_matrix)
self, alpha, y, sqrt_sw, X_mean, eigvals, V, X):
"""Compute dual coefficients and diagonal of (Identity - Hat_matrix),
where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T
Contributor

Suggested change

    -    where Hat_matrix = X(X^TX + alpha*I)^(-1)X^T

    @@ -1352,10 +1359,10 @@ def _svd_decompose_design_matrix(self, X, y, sqrt_sw):

    def _solve_svd_design_matrix(
            self, alpha, y, sqrt_sw, X_mean, v, U, UT_y):
Contributor

here and in _svd_decompose_design_matrix maybe v should also be called eigvals?

Member Author

I called it eigenvals_sq just to denote that it is S**2.

Contributor

they are the eigenvalues of X^TX, i.e. the squared singular values of X, so I guess it should be eigenvals or singvals_sq?

Member Author

@thomasjpfan thomasjpfan May 10, 2019

When performing SVD, we do not explicitly calculate X^TX. When one reads the code, it explicitly shows that s is the singular values of X and eigenvals_sq is their square.

If we use eigenvals = s**2, a future reviewer would wonder "why are we calling the eigvals the square of the actual eigenvalues?". (It is unclear from the code/comments/reference that we actually want the eigenvals of X^TX.) If we follow the reference directly, the idea of covariance is not stated anywhere while reading the SVD section (5.2). It is briefly mentioned in section 5.3.

Contributor

then how about singvals_sq or squared_singvals or squared_singular_values? because they are the singular values of X

Contributor

It is briefly mentioned in section 5.3.

section 5.3 is about the case where we don't want the LOOE and use a separate validation set; although it also computes an eigendecomposition of X^TX it suggests something quite different and I don't think we should mention it at all

Member Author

I think singvals_sq works.
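
With that naming settled, a short sketch of how the SVD path can reuse U, singvals_sq, and U.T @ y across alphas (hypothetical names echoing this discussion, not the actual scikit-learn code; assumes no intercept and unit sample weights):

    import numpy as np

    rng = np.random.RandomState(0)
    X, y = rng.randn(8, 3), rng.randn(8)  # n_samples > n_features, dense

    # Computed once, independently of alpha.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    singvals_sq = s ** 2
    UT_y = U.T @ y

    for alpha in (0.1, 1.0, 10.0):
        # G^-1 = (XX^T + alpha*I)^-1 via the SVD: U spans range(X),
        # and its orthogonal complement is simply scaled by 1/alpha.
        w = 1.0 / (singvals_sq + alpha) - 1.0 / alpha
        c = U @ (w * UT_y) + y / alpha           # dual coefficients G^-1 y
        G_inv_diag = (U ** 2) @ w + 1.0 / alpha  # diag(G^-1)
        looe = c / G_inv_diag                    # leave-one-out errors

Each alpha then costs only matrix-vector products instead of a fresh decomposition or inversion.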

    where (V + alpha*Id) is diagonal.
    It is thus inexpensive to inverse for many alphas.

    Let loov be the vector of prediction values for each example
    when the model was fitted with all examples but this example.

    -    loov = (KGY - diag(KG)Y) / diag(I-KG)
    +    loov = (KGY - diag(KG^-1)Y) / diag(I-KG^-1)
Contributor

Suggested change

    -    loov = (KGY - diag(KG^-1)Y) / diag(I-KG^-1)
    +    loov = (KG^-1Y - diag(KG^-1)Y) / diag(I-KG^-1)

@thomasjpfan
Member Author

@jeromedockes Thanks for bearing with me. Naming things can be tough.

@jeromedockes
Contributor

@jeromedockes Thanks for bearing with me. Naming things can be tough.

it is! but you are right that this will make the code easier to understand for future readers

@thomasjpfan
Member Author

I think all the comments made by @jeromedockes were addressed.

@jeromedockes
Contributor

I think all the comments made by @jeromedockes were addressed.

yes! thanks a lot for considering my suggestions.

re-reading the code I also realized that the docstrings for _compute_gram and
_compute_covariance are out of date: the center parameter has been removed.
sorry for not modifying them in #13350. I opened a small PR on your branch to
fix this, or you can make the change however you think is better.

regarding the changes of variable names made in this PR I don't have any more
comments.

@jeromedockes
Contributor

thanks! no more comments on my side :)

@ogrisel ogrisel merged commit 4e65827 into scikit-learn:master May 16, 2019
@ogrisel
Member

ogrisel commented May 16, 2019

Oops, I put the thank-you message in the merge commit message field instead of the comment field...

@jnothman
Member

jnothman commented May 16, 2019 via email

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019