
DOC improve diabetes description #16534


Merged
merged 5 commits into from
Mar 4, 2020

Conversation

maikia
Contributor

@maikia maikia commented Feb 24, 2020

Reference Issues/PRs

What does this implement/fix? Explain your changes.

It was suggested in #16155 that the features of the diabetes dataset were not well described. Not only did the names differ slightly between the explanation and the dataset, but they were also uninformative, e.g. 's1'.

This updates the description to give more meaning to the features.
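
For context, the feature names and the description text under discussion can be inspected side by side. This is a minimal sketch, assuming scikit-learn is installed; it only uses `load_diabetes` and its `feature_names` and `DESCR` attributes, and the exact wording of `DESCR` depends on the installed version.

```python
from sklearn.datasets import load_diabetes

# Load the bundled diabetes dataset (no network access needed).
diabetes = load_diabetes()

# Short column names as stored in the dataset, including terse ones like 's1'.
print(diabetes.feature_names)

# Free-text description that this PR edits; its wording varies by version.
print(diabetes.DESCR)
```

Comparing the two outputs makes it easy to spot names that appear in the dataset but are not explained in the description.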

Any other comments?

@maikia maikia requested a review from ogrisel February 24, 2020 15:41
Member

@ogrisel ogrisel left a comment


Thank you very much, this is very useful.

Where did you find the expanded names of each variable, e.g. that "tc" stands for "a number of White Blood Cells"?

The website of the data does not give those details in the summary page:

https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

Did you find a better reference? If so, please mention the extra source at the end of the section.

@maikia
Contributor Author

maikia commented Feb 26, 2020

> Thank you very much, this is very useful.
>
> Where did you find the expanded names of each variable, e.g. that "tc" stands for "a number of White Blood Cells"?
>
> The website of the data does not give those details in the summary page:
>
> https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
>
> Did you find a better reference? If so, please mention the extra source at the end of the section.

No, I just searched for descriptions of those abbreviations as used in a standard blood test (so no specific reference). If you would prefer, I could make it more detailed, e.g.:
tc: T cell count in the blood. A T cell is a type of white blood cell (WBC). WBCs are also called lymphocytes.

@adrinjalali
Member

I'd probably replace "white blood cells" with "T-Cells (a type of white blood cells)".

@NicolasHug NicolasHug merged commit dbd7937 into scikit-learn:master Mar 4, 2020
@NicolasHug
Member

Thanks @maikia!

ashutosh1919 pushed a commit to ashutosh1919/scikit-learn that referenced this pull request Mar 13, 2020
@maikia maikia deleted the diabetes_description branch April 24, 2020 11:10
gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020
4 participants