Bug report
There is a private function, _splitlines_no_ff, which is only ever called in ast.get_source_segment. This function splits the entire source given to it, but ast.get_source_segment only needs at most node.end_lineno lines to work.
The two functions are at lines 308 to 330 and lines 344 to 378 of Lib/ast.py in 1acdfec.
If, for example, you want to extract an import line from a very long file, this can seriously degrade performance.
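A rough way to observe the cost, as a sketch only: the snippet below builds a synthetic module with an import on the first line followed by many filler lines, then times ast.get_source_segment on the import node. The filler size and the time_lookup helper are assumptions made for illustration and are not part of the report. No timing figures are quoted because they depend on the machine and the Python version.

```python
import ast
import time

# Synthetic source: one import followed by many filler lines.
# The line count is an arbitrary choice for illustration.
N_FILLER = 20_000
source = "import os\n" + "x = 1\n" * N_FILLER

tree = ast.parse(source)
import_node = tree.body[0]  # the `import os` statement on line 1


def time_lookup(src, node, repeat=3):
    """Return (segment, best wall-clock seconds) for ast.get_source_segment."""
    best = float("inf")
    segment = None
    for _ in range(repeat):
        start = time.perf_counter()
        segment = ast.get_source_segment(src, node)
        best = min(best, time.perf_counter() - start)
    return segment, best


segment, seconds = time_lookup(source, import_node)
print(f"{segment!r} extracted in {seconds:.4f}s from {N_FILLER + 1} lines")
```

Only the first line is needed to answer this query, so any time spent beyond splitting that line is work the proposed change would avoid.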
Introducing a max_lines kwarg in _splitlines_no_ff, which functions like maxsplit in str.split, would minimize unneeded work. An implementation of the proposed fix is below (it makes my use case twice as fast):
--- a/Lib/ast.py
+++ b/Lib/ast.py
@@ -305,11 +305,16 @@ def get_docstring(node, clean=True):
     return text


-def _splitlines_no_ff(source):
+def _splitlines_no_ff(source, max_lines=-1):
     """Split a string into lines ignoring form feed and other chars.

     This mimics how the Python parser splits source code.
+
+    If max_lines is given, at most max_lines will be returned. If max_lines is not
+    specified or negative, then there is no limit on the number of lines returned.
     """
+    if not max_lines:
+        return []
     idx = 0
     lines = []
     next_line = ''
@@ -323,6 +328,8 @@ def _splitlines_no_ff(source):
             idx += 1
         if c in '\r\n':
             lines.append(next_line)
+            if max_lines == len(lines):
+                return lines
             next_line = ''

     if next_line:
@@ -360,7 +367,7 @@ def get_source_segment(source, node, *, padded=False):
     except AttributeError:
         return None

-    lines = _splitlines_no_ff(source)
+    lines = _splitlines_no_ff(source, max_lines=end_lineno + 1)
     if end_lineno == lineno:
         return lines[lineno].encode()[col_offset:end_col_offset].decode()
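To try the idea outside of CPython, here is a self-contained sketch of the limited splitter. It combines the changes from the diff with the unchanged loop body as I understand it from Lib/ast.py, under the hypothetical name splitlines_no_ff_limited, so treat it as an approximation rather than the exact patch.

```python
def splitlines_no_ff_limited(source, max_lines=-1):
    """Split source into lines, treating only CR, LF and CRLF as line breaks.

    Line endings are kept. Stops after max_lines lines; a negative
    max_lines means no limit. (Hypothetical standalone name.)
    """
    if not max_lines:
        return []
    idx = 0
    lines = []
    next_line = ''
    while idx < len(source):
        c = source[idx]
        next_line += c
        idx += 1
        # Keep \r\n together
        if c == '\r' and idx < len(source) and source[idx] == '\n':
            next_line += '\n'
            idx += 1
        if c in '\r\n':
            lines.append(next_line)
            # Early exit: the caller has all the lines it asked for.
            if max_lines == len(lines):
                return lines
            next_line = ''
    if next_line:
        lines.append(next_line)
    return lines


# The form feed (\x0c) must not split a line, unlike str.splitlines().
sample = "import os\r\nx = 1\x0cy = 2\nz = 3"
full = splitlines_no_ff_limited(sample)
first_two = splitlines_no_ff_limited(sample, max_lines=2)
print(full)
print(first_two)
```

With max_lines=2 the loop returns right after the second line break, so the rest of the string is never scanned, and the result matches the first two entries of the unlimited split.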
Your environment
- CPython versions tested on: 3.11