numpy.loadtxt with skiprows much slower than custom generator #7480

Closed
@jauerb

Description

I noticed, when loading somewhat large text files (~12 MB), that using the skiprows option of numpy.loadtxt is much slower than implementing the same thing with a custom generator.

For example, this takes ~1.2 s per invocation on average on my laptop:

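import time
import numpy as np

# Time 30 loads that skip the first 1001 lines via loadtxt's own skiprows option
times = []
for _ in range(30):
    s = time.time()
    tmp = np.loadtxt("test.txt", skiprows=1001)
    times.append(time.time() - s)

print(np.mean(times))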

whereas this takes ~0.4 s per invocation:

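import time
import numpy as np

def my_generator(fname, skiprows=0):
    # Yield every line after the first `skiprows` lines; the with-block closes the file
    with open(fname, 'r') as f:
        for i, line in enumerate(f):
            if i >= skiprows:
                yield line

# Time 30 loads that skip the first 1001 lines in the generator instead
times = []
for _ in range(30):
    s = time.time()
    tmp = np.loadtxt(my_generator("test.txt", skiprows=1001))
    times.append(time.time() - s)

print(np.mean(times))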
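
For anyone who wants to reproduce this without the original test.txt, here is a minimal self-contained sketch of the same comparison. The helper names `skip_lines` and `make_test_file`, the file dimensions (2000 rows by 5 columns), and the repeat count are assumptions made for illustration and are not from the report above; the synthetic file is much smaller than ~12 MB, and timings will differ by machine and NumPy version. The check at the end only confirms that both approaches load identical data.

```python
import itertools
import os
import tempfile
import timeit

import numpy as np


def skip_lines(fname, skiprows=0):
    """Yield the lines of fname after the first `skiprows` lines (hypothetical helper)."""
    with open(fname, "r") as f:
        # islice drops the leading lines without an explicit counter
        for line in itertools.islice(f, skiprows, None):
            yield line


def make_test_file(path, nrows=2000, ncols=5, seed=0):
    """Write a synthetic whitespace-delimited numeric file (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    np.savetxt(path, rng.random((nrows, ncols)))


skip = 1001
repeats = 5

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.txt")
    make_test_file(path)

    # Load once with each approach so the results can be compared
    builtin = np.loadtxt(path, skiprows=skip)
    custom = np.loadtxt(skip_lines(path, skiprows=skip))

    # Average wall-clock time per call for each approach
    t_builtin = timeit.timeit(
        lambda: np.loadtxt(path, skiprows=skip), number=repeats) / repeats
    t_custom = timeit.timeit(
        lambda: np.loadtxt(skip_lines(path, skiprows=skip)), number=repeats) / repeats

print("loadtxt skiprows: %.4f s, generator: %.4f s" % (t_builtin, t_custom))
print("same data:", np.array_equal(builtin, custom))
```

Using `timeit` rather than manual `time.time()` differences keeps the measurement loop short; whether the gap reported above shows up will depend on the NumPy version in use.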
