
Added 'prescan' option to loadtxt for array allocation prior to reading #144


Closed
wants to merge 1 commit into from


dhomeier
Contributor

ENH: implement 2-pass reading in loadtxt to avoid excessive memory
usage on large data files. The extra parsing pass typically adds
about 10% to the total read-in time, or 30-50% for compressed
files.

Setting 'prescan=True' will parse the valid data lines of the input
file in a first pass, then allocate an array and read the data directly
into it (row by row), bypassing the creation of an intermediate input
list and its associated high memory usage.
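A minimal sketch of the two-pass idea described above, assuming whitespace- or delimiter-separated numeric columns and a seekable text file object. The function name `loadtxt_prescan` and its parameters are hypothetical; this is not the code from this PR and not part of NumPy's API. Note that it calls `fh.seek(0)`, so it cannot work on a non-seekable stream.

```python
import io
import numpy as np

def loadtxt_prescan(fh, delimiter=None, comments="#"):
    """Hypothetical two-pass reader: count rows, preallocate, then fill."""
    def data_lines(f):
        # Yield only non-empty lines, with comments stripped.
        for line in f:
            line = line.split(comments, 1)[0].strip()
            if line:
                yield line

    # Pass 1: count valid data lines; take the column count from the first one.
    nrows, ncols = 0, None
    for line in data_lines(fh):
        if ncols is None:
            ncols = len(line.split(delimiter))
        nrows += 1
    if ncols is None:
        return np.empty((0, 0))

    # Allocate the full array once, then rewind (requires a seekable file).
    out = np.empty((nrows, ncols))
    fh.seek(0)

    # Pass 2: parse each line straight into its row, no intermediate list of rows.
    for i, line in enumerate(data_lines(fh)):
        out[i] = [float(v) for v in line.split(delimiter)]
    return out
```

The peak memory is the final array plus one parsed line, at the cost of tokenizing the input twice.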

pv
Member

pv commented Sep 2, 2011

I don't think this is the correct approach (it will not work with streams, etc.). A cleaner one would be:

  • Read the first N lines of the file to determine the number of columns
  • After that, resize the array dynamically using .resize() while loading
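A minimal sketch of the suggested single-pass approach, assuming numeric columns. The function name `loadtxt_resize` and the `ninit` parameter are hypothetical, not NumPy API. It only iterates over the input once, so it also works on non-seekable streams, and it grows the buffer with `ndarray.resize` (doubling the row capacity, then trimming at the end).

```python
import io
import itertools
import numpy as np

def loadtxt_resize(fh, delimiter=None, comments="#", ninit=16):
    """Hypothetical single-pass reader that grows its buffer with ndarray.resize."""
    ninit = max(ninit, 1)
    # Strip comments and skip blank lines lazily; fh may be any iterable of lines.
    stripped = (ln.split(comments, 1)[0].strip() for ln in fh)
    lines = (ln for ln in stripped if ln)

    # Read the first N data lines to determine the number of columns.
    head = list(itertools.islice(lines, ninit))
    if not head:
        return np.empty((0, 0))
    ncols = len(head[0].split(delimiter))

    out = np.empty((ninit, ncols))
    nrows = 0
    for line in itertools.chain(head, lines):
        if nrows == out.shape[0]:
            # Grow geometrically so the number of reallocations stays logarithmic.
            # Existing rows are preserved because the array is C-contiguous.
            out.resize((2 * out.shape[0], ncols), refcheck=False)
        out[nrows] = [float(v) for v in line.split(delimiter)]
        nrows += 1

    # Trim the unused tail of the buffer.
    out.resize((nrows, ncols), refcheck=False)
    return out
```

`refcheck=False` skips NumPy's reference-count check, which is safe here only because `out` is never exposed to other references before loading finishes.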

charris
Member

charris commented May 3, 2013

Looks like it is time to close this.

charris closed this May 3, 2013
luyahan pushed a commit to plctlab/numpy that referenced this pull request Apr 25, 2024

3 participants