Add documentation section on Scaling #28315
From the user survey, the most critical feature request was "Improve scaling to larger datasets". While we continue to do work within pandas to improve scaling (fewer copies, native string dtype, etc.), we can document a few strategies that are available that may help with scaling.

- Use the columns= keyword in the read functions (csv or parquet) to select only the columns you need. Probably some other examples.
- Read in chunks with pd.read_csv(..., chunksize=) (a minimal sketch follows this list).
- Point to outside projects that mirror the pandas API, to avoid having to rewrite (see the Dask sketch below).
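A rough sketch of the first two strategies (illustrative only; the file names, column names, and chunk size here are made up; note the actual keywords are usecols= for read_csv and columns= for read_parquet):

```python
import pandas as pd

# Select only the columns you need at read time.
# read_csv takes usecols=, read_parquet takes columns=.
df = pd.read_csv("timeseries.csv", usecols=["id", "x"])
df = pd.read_parquet("timeseries.parquet", columns=["id", "x"])

# Process a file too large for memory in chunks;
# chunksize= turns read_csv into an iterator of DataFrames.
totals = pd.Series(dtype="float64")
for chunk in pd.read_csv("timeseries.csv", chunksize=100_000):
    totals = totals.add(chunk.groupby("id")["x"].sum(), fill_value=0)
```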
Do people have thoughts on this? Any objections to highlighting outside projects like Dask? Are there other strategies we should mention?
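For the outside-project route, a minimal sketch assuming Dask (again with made-up file and column names); Dask mirrors much of the pandas API and defers work until .compute():

```python
import dask.dataframe as dd

# Code often ports from pandas with few changes;
# computation is lazy until .compute() is called.
ddf = dd.read_csv("timeseries-*.csv")  # many files as one logical frame
result = ddf.groupby("id")["x"].mean().compute()  # returns a pandas Series
```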
I'm on board with this.

This could fit in a general "performance" section of the user guide, together with the https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html page we already have?
TomAugspurger added a commit that referenced this issue on Oct 1, 2019:

* DOC: Add scaling to large datasets section (Closes #28315)