Description
I noticed when trying to load somewhat large text files (~12 MB) that using the skiprows option of numpy.loadtxt is much slower than implementing the same thing with a custom generator.
For example, this takes on average ~1.2 s per invocation on my laptop:
    import time
    import numpy as np

    times = []
    for _ in range(30):
        s = time.time()
        tmp = np.loadtxt("test.txt", skiprows=1001)
        times.append(time.time() - s)
    print(np.mean(times))
whereas this takes ~0.4 s per invocation:
    import time
    import numpy as np

    def my_generator(fname, skiprows=0):
        with open(fname, 'r') as f:
            for i, line in enumerate(f):
                if i >= skiprows:
                    yield line

    times = []
    for _ in range(30):
        s = time.time()
        tmp = np.loadtxt(my_generator("test.txt", skiprows=1001))
        times.append(time.time() - s)
    print(np.mean(times))
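For what it's worth, the same line-skipping can also be expressed with itertools.islice, which advances the file iterator once instead of testing an index on every line; a minimal sketch (the helper name skip_rows is mine, not part of NumPy):

```python
import itertools

def skip_rows(fname, skiprows=0):
    # Hypothetical helper: yield every line of `fname` after the first
    # `skiprows` lines. islice consumes the skipped lines up front, so
    # the loop body runs only for the lines we actually want.
    with open(fname, 'r') as f:
        for line in itertools.islice(f, skiprows, None):
            yield line
```

This can be passed to np.loadtxt the same way as my_generator above, e.g. np.loadtxt(skip_rows("test.txt", skiprows=1001)).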