Description
Feature
I have recently posted a series of PRs (#19599, #19601, #19608, #19598, #19610, #19620, #19609, #19618, #19687) which together speed up loadtxt() by up to 4x when parsing simple but common (specifically, entirely numeric) text files (see the benchmarks in #19687, for example), solely by removing a lot of Python overhead. I have yet another patch in the works that should yield a further sizeable improvement.
With that last patch included, when parsing e.g. 2 columns of floats, I find (if my profiling is correct...) that nearly 30% of the runtime is spent in PyOS_string_to_double (https://docs.python.org/3/c-api/conversion.html#c.PyOS_string_to_double), which numpy ultimately uses to parse floats. Therefore, it may make sense to consider adopting a faster floating point parser: e.g., if https://lemire.me/blog/2020/03/10/fast-float-parsing-in-practice/ really provides a 10x speedup, then float parsing would shrink from ~30% to ~3% of the current runtime, i.e. we would cut total runtime by roughly another 27%. Of course, we'd need to fall back to PyOS_string_to_double whenever the fast parser fails, in order to parse "weird" formats supported by Python (e.g., underscores in literals), but those should be relatively uncommon.
Also, if one day loadtxt() does get fully rewritten in C (getting rid of the Python overhead that makes up a decent part of the remaining 70%), the speedup on floating point parsing would be even more relevant, so adopting the fast parser would not be wasted work. In any case, I am posting this here because I don't intend to do that work right now :-), but it seems to be a relatively self-contained project, so others may be interested...
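
To make the fast-path-plus-fallback idea concrete, here is a rough Python sketch of the control flow only. The names `fast_parse` and `parse_float` are hypothetical, and `fast_parse` is merely a stand-in: it accepts plain decimal literals and rejects everything else, whereas a real implementation would live in C and do the conversion itself. Python's `float()` plays the role of the slow, fully general parser here.

```python
import re

# Assumption for illustration: the fast path only understands plain
# decimal literals such as "12", "-3.5" or "1.25e-3".
_SIMPLE_DECIMAL = re.compile(r"[+-]?[0-9]+(?:\.[0-9]*)?(?:[eE][+-]?[0-9]+)?")


def fast_parse(token: str) -> float:
    """Hypothetical stand-in for a fast float parser.

    Raises ValueError for anything that is not a plain decimal literal.
    """
    if _SIMPLE_DECIMAL.fullmatch(token) is None:
        raise ValueError(f"not a simple decimal literal: {token!r}")
    # Placeholder conversion: a real fast parser would not call float().
    return float(token)


def parse_float(token: str) -> float:
    """Try the fast path first, then fall back to the general parser."""
    try:
        return fast_parse(token)
    except ValueError:
        # Slow path: handles underscores, "inf"/"nan", surrounding
        # whitespace, etc., and raises ValueError for real garbage.
        return float(token)
```

With this structure, the common all-numeric case never touches the slow path, while inputs such as `"1_000.5"` or `"inf"` still parse exactly as they do today.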
Alternatively, one could lobby the CPython core devs to directly base PyOS_string_to_double on the Lemire parser, but see https://bugs.python.org/issue41310. Also of note are the pandas float parsers (https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/parser/tokenizer.c), although I would (personally) absolutely not want to use an approximate float parser in loadtxt().
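
To illustrate what "approximate" means in this context, here is a small sketch that compares a deliberately naive parser against Python's `float()`. The function `naive_parse` is made up for this example and is not the pandas implementation; it only handles unsigned decimal literals and rounds at every arithmetic step. The comparison assumes that `float()` is correctly rounded, which is the case for CPython on mainstream platforms.

```python
from fractions import Fraction


def naive_parse(text: str) -> float:
    """Illustrative approximate parser for unsigned decimals (hypothetical).

    Digits are accumulated in floating point and the result is scaled by
    repeated multiplication or division by 10.0, so rounding errors can
    pile up along the way.
    """
    mantissa, _, exponent = text.lower().partition("e")
    int_part, _, frac_part = mantissa.partition(".")
    value = 0.0
    for ch in int_part + frac_part:
        value = value * 10.0 + (ord(ch) - ord("0"))
    exp10 = (int(exponent) if exponent else 0) - len(frac_part)
    for _ in range(abs(exp10)):
        value = value * 10.0 if exp10 > 0 else value / 10.0
    return value


def rounding_error(text: str, parsed: float) -> Fraction:
    """Exact distance between a parsed double and the decimal in `text`."""
    return abs(Fraction(parsed) - Fraction(text))


def compare(text: str) -> tuple[Fraction, Fraction]:
    """Return (error of float(), error of naive_parse()) for `text`."""
    return rounding_error(text, float(text)), rounding_error(text, naive_parse(text))
```

Calling `compare()` on a handful of literals shows whether the naive result is further from the true decimal value than the correctly rounded one. It can never be closer, which is the reason for preferring an exact parser in loadtxt().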