Skip to content

Conversation

twolodzko
Copy link
Contributor

@twolodzko twolodzko commented Oct 15, 2019

This PR adds functions for calculating bivariate covariance and Pearson's correlation.

https://bugs.python.org/issue38490

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@twolodzko

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@corona10
Copy link
Member

I'd like to recommend you to open the PR after the discussion on https://bugs.python.org/issue38490 is finalized. :)

@twolodzko twolodzko changed the title bpo-38490: Add covariance and Pearson's correlation bpo-38490: Add covariance, Pearson's correlation, and simple linear regression Oct 22, 2019
@twolodzko twolodzko changed the title bpo-38490: Add covariance, Pearson's correlation, and simple linear regression bpo-38490: statistics: Add covariance, Pearson's correlation, and simple linear regression Oct 22, 2019
@@ -910,10 +910,13 @@ def correlation(x, y, /):
raise StatisticsError('at least one of the inputs is constant')


def linear_regression(regressor, dependent_variable):
LinearRegression = namedtuple('LinearRegression', ['intercept', 'slope'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter order seems backwards to me, here and through out the PR. Traditionally, we say an equation for a line in is slope-intercept form as y = mx + b with the m preceding the b.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on your background, in statistics the model is often written as y = β_0 + β_1 * x + ε , so the intercept is written before. Also, if now it is a named tuple, the order is of less importance. Of course, if you insist, I may change it.

@stevendaprano
Copy link
Member

stevendaprano commented Oct 9, 2020 via email

@stevendaprano
Copy link
Member

stevendaprano commented Oct 9, 2020 via email

@twolodzko
Copy link
Contributor Author

@stevendaprano @rhettinger should I apply any more changes, or is it ok as-is?

@taleinat
Copy link
Contributor

The only current outstanding issue is the order of returned values from linear_regression().

ISTM given the short discussion here that the order of (intercept, gradient) could go either way, and that in the context of statistics the current order seems fine. And since this will be a named tuple, the order is not as important as it would be otherwise. As long as the documentation and doc-string are consistent and clear on this, which they are, IMO this is fine.

I'd like to get @rhettinger's approval, or otherwise, for this before moving forward.

@stevendaprano
Copy link
Member

stevendaprano commented Oct 19, 2020 via email

@twolodzko
Copy link
Contributor Author

@stevendaprano I'd be against having predict_x as this is incorrect if this is a statistical model y = a + bx + noise, because it ignores the noise.

@twolodzko
Copy link
Contributor Author

It's been two weeks and the discussion seems to got stuck. Is there anything more I should change about the PR?

@ZackerySpytz
Copy link
Contributor

Please add an entry to Doc/whatsnew/3.10.rst and add versionadded directives to the documentation for the new functions.

@taleinat
Copy link
Contributor

taleinat commented Nov 2, 2020

@stevendaprano I'd be against having predict_x as this is incorrect if this is a statistical model y = a + bx + noise, because it ignores the noise.

I'm also -1 on adding .predict_{x,y} methods. I feel that they would be beyond the scope of what this module sets out to achieve, and they'd also be surprising for many to find on a named tuple.

@taleinat
Copy link
Contributor

taleinat commented Nov 2, 2020

Apart from my minor suggestion for the What's New entry, this looks good to me.

@rhettinger, would you like to take another look? Any other comments?

Co-authored-by: Tal Einat <taleinat+github@gmail.com>
Copy link
Member

@stevendaprano stevendaprano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@twolodzko I am very impressed in this, thank you, and I look forward to using it. I just commented on what looks like a broken piece of ReST and a slightly unusual choice of wording, but apart from this I am very happy with your patch and I think it should be merged.

@twolodzko
Copy link
Contributor Author

If I'm not missing something, all issues seem to be resolved right now.

@taleinat
Copy link
Contributor

taleinat commented Nov 3, 2020

If I'm not missing something, all issues seem to be resolved right now.

Indeed.

@ZackerySpytz
Copy link
Contributor

Unfortunately, it seems like there are some minor grammatical errors in the new docstrings and documentation.

@taleinat taleinat merged commit 09aa6f9 into python:master Apr 25, 2021
@bedevere-bot
Copy link

@taleinat: Please replace # with GH- in the commit message next time. Thanks!

@taleinat
Copy link
Contributor

Merged!

This has indeed reached a very good state a while ago and was reviewed favorably by several core devs. Definitely good to have this for 3.10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants