ENH allows to overwrite read_csv parameter in fetch_openml #25488

glemaitre · 2023-01-26T15:39:43Z

Allow overriding the parameters passed to read_csv when reading a dataframe.
It is not intended to be used widely, but it could be useful when things go sideways.
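
For readers skimming the thread, here is a minimal sketch of the override idea, not the code from this PR. The helper name `merge_read_csv_kwargs` is hypothetical, and only the two defaults visible in the diff excerpts of this conversation (`header` and `skipinitialspace`) are listed; the real set of defaults may well be larger.

```python
def merge_read_csv_kwargs(defaults, overrides=None):
    """Return a new dict in which user-supplied overrides win over defaults.

    Hypothetical helper for illustration; neither input dict is mutated.
    """
    return {**defaults, **(overrides or {})}


# Assumption: only the defaults that appear in this PR's diff excerpts.
defaults = {"header": None, "skipinitialspace": True}

# A user flips one default and adds one extra read_csv option.
merged = merge_read_csv_kwargs(
    defaults, {"skipinitialspace": False, "quotechar": "'"}
)
```

The merged dict would then be forwarded as `pd.read_csv(gzip_file, **merged)`, matching the call shape shown in the `_arff_parser.py` diff.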

thomasjpfan

Thank you for the PR! I am okay with adding this option.

thomasjpfan · 2023-02-07T23:09:48Z

sklearn/datasets/_openml.py

+        the default options. Internally, we used the default parameters of
+        :func:`pandas.read_csv` except for the following parameters:
+
+        - `header`: set to `None`


Should this be in fetch_openml as part of the public API?

thomasjpfan · 2023-02-07T23:15:38Z

sklearn/datasets/_arff_parser.py

-        dtype=dtypes,
-        skipinitialspace=True,  # skip spaces after delimiter to follow ARFF specs
-    )
+    frame = pd.read_csv(gzip_file, **read_csv_kwargs)


Currently, if there is an exception while reading the data, one would need to enter a debugger to find out where the file is and what the read_csv_kwargs are. I think it would be helpful to reraise an exception that includes the read_csv_kwargs and gzip_file in its message to help with debugging the issue.
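
One possible shape for that re-raise, as a rough sketch only. The function name `read_with_context` and the choice of `ValueError` are assumptions for illustration, and the reader is passed in as an argument so the snippet does not depend on pandas; in the parser it would be `pd.read_csv`.

```python
def read_with_context(reader, source, **read_csv_kwargs):
    """Call reader(source, **read_csv_kwargs), adding context on failure.

    Hypothetical helper: any exception is re-raised as a ValueError whose
    message names the source and the keyword arguments that were used.
    """
    try:
        return reader(source, **read_csv_kwargs)
    except Exception as exc:
        raise ValueError(
            f"Failed to read {source!r} with "
            f"read_csv_kwargs={read_csv_kwargs!r}"
        ) from exc
```

Using `raise ... from exc` keeps the original traceback chained, so the underlying parsing error stays visible next to the added context.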

glemaitre · 2023-03-23T14:36:41Z

I will close this one. Let's keep in mind that it exists in case we need more flexibility to tweak the parameters in the future.

glemaitre · 2023-05-25T10:13:06Z

OK, so reopening this one. It seems that we will need it if we want to handle a breaking change in read_csv between pandas 1.X and 2.X ourselves: #25878 (comment)

ENH allows to overwrite read_csv parameter in fetch_openml

6232fb4

glemaitre marked this pull request as draft January 26, 2023 15:39

github-actions bot added the module:datasets label Jan 26, 2023

glemaitre added 2 commits January 27, 2023 13:55

TST add test to check that we can set parameters

4097107

DOC chnage pr number

cea7457

glemaitre marked this pull request as ready for review January 27, 2023 13:01

thomasjpfan reviewed Feb 7, 2023


glemaitre closed this Mar 23, 2023

glemaitre mentioned this pull request May 25, 2023

DOC Rework outlier detection estimators example #25878

Merged

glemaitre reopened this May 25, 2023

glemaitre added this to the 1.3 milestone May 25, 2023

glemaitre closed this May 25, 2023

glemaitre mentioned this pull request May 25, 2023

ENH allows to overwrite read_csv parameter in fetch_openml #26433

Merged
