io module gets initialized multiple times when opening files #557

Closed
silmeth opened this issue Feb 26, 2019 · 4 comments

@silmeth
Contributor

silmeth commented Feb 26, 2019

While experimenting with file IO, I found that opening a file always initializes the io module from scratch. Importing the module then initializes it once more, but only once, no matter how many times it has already been initialized by opening files. This makes it impossible to check whether objects are instances of classes from that module, because those classes are represented by different objects in memory, e.g.:

>>>>> file = open('/tmp/tst')
>>>>> file.buffer
<BufferedReader object at 0x55658e5c30e0>
>>>>> type(file.buffer)
<class 'BufferedReader'>
>>>>> import io
>>>>> io.BufferedReader
<class 'BufferedReader'>
>>>>> type(file.buffer) is io.BufferedReader
False
>>>>> type(file.buffer) == io.BufferedReader
False
>>>>> from io import BufferedReader  # subsequent imports from the module do not cause that
>>>>> BufferedReader is io.BufferedReader
True
>>>>> id(type(file.buffer))
93894668449408
>>>>> id(BufferedReader)
93894668474000

This seems to be the only module affected by this behaviour: as far as I know, all the other mk_module(&ctx) functions are used only in the module_inits hash map, which is used for import initialization.
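To illustrate why the identity checks above fail, here is a minimal Python sketch of the effect, not the actual RustPython code. `make_fake_io` is a hypothetical stand-in for a module initializer such as mk_module(&ctx); calling it twice builds two separate class objects that share a name.

```python
import types


def make_fake_io():
    """Hypothetical module initializer: builds a fresh module on every call."""
    module = types.ModuleType("fake_io")

    class BufferedReader:
        """Placeholder class; a new class object is created per call."""

    module.BufferedReader = BufferedReader
    return module


# Assumption: the first copy plays the role of the module built when opening
# a file, the second the role of the module built by `import io`.
first = make_fake_io()
second = make_fake_io()

reader = first.BufferedReader()

same_name = first.BufferedReader.__name__ == second.BufferedReader.__name__
same_class = first.BufferedReader is second.BufferedReader
is_instance = isinstance(reader, second.BufferedReader)

print(same_name, same_class, is_instance)
```

The two classes print identically, as in the session above, yet they are distinct objects, so `is`, `==`, and `isinstance` all report a mismatch.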

@coolreader18
Member

coolreader18 commented Feb 26, 2019

You're on the right track with your fix, and that should fix the issue short-term, but I think the better solution would be to keep a HashMap of stdlib modules that have already been initialized in the VM, next to the stdlib_inits HashMap.

@silmeth
Contributor Author

silmeth commented Feb 26, 2019

It already kind of is, in the sys.modules attribute. Look at the import::import_module() function: it first checks sys.modules for an already initialized module. Only if that lookup fails does it call import::import_uncached_module(), which initializes the module and returns it; import_module() then stores it in sys.modules.

The problem, as far as I can tell, is that a module should only ever be initialized by the import machinery, never manually inside a stdlib function.

EDIT: in other words, when modules are imported using the proper import_module() function, they are initialized only once, and are initialized lazily.
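The lookup order described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the RustPython source: the function names mirror import_module() and import_uncached_module() from the comment, while the `modules` dict (standing in for sys.modules), the `init_io` initializer, and the `init_calls` log are hypothetical.

```python
modules = {}      # hypothetical stand-in for sys.modules
init_calls = []   # records every initializer run, for demonstration only


def init_io():
    """Hypothetical initializer for a built-in module."""
    init_calls.append("io")
    return object()  # placeholder for the module object


# Hypothetical counterpart of the module_inits hash map.
module_inits = {"io": init_io}


def import_uncached_module(name):
    """Initialize the module from scratch and return it."""
    return module_inits[name]()


def import_module(name):
    """Return the cached module, initializing it lazily on first use."""
    if name in modules:
        return modules[name]
    module = import_uncached_module(name)
    modules[name] = module
    return module


a = import_module("io")
b = import_module("io")
print(a is b, init_calls)
```

Any code path that calls the initializer directly bypasses the cache, which is the situation this issue describes for opening files.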

@coolreader18
Member

Or better yet, held in the same HashMap, with a (StdlibInitFunc, Option<PyObjectRef>) as the value.
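As a rough Python analogue of that idea (a sketch only; the proposal itself concerns a Rust HashMap), each entry could pair the initializer with an optional cached module. The `StdlibRegistry` class and its method names are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


class StdlibRegistry:
    """Hypothetical registry: name -> (init function, cached module or None)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Callable[[], object], Optional[object]]] = {}

    def register(self, name: str, init: Callable[[], object]) -> None:
        # Nothing is initialized at registration time; the cache slot is empty.
        self._entries[name] = (init, None)

    def get(self, name: str) -> object:
        init, cached = self._entries[name]
        if cached is None:
            # First access: run the initializer and remember the result.
            cached = init()
            self._entries[name] = (init, cached)
        return cached


calls = []


def init_io() -> object:
    calls.append("io")
    return object()  # placeholder for the module object


registry = StdlibRegistry()
registry.register("io", init_io)
first = registry.get("io")
second = registry.get("io")
print(first is second, len(calls))
```

Keeping the initializer and the cached module in one entry means a single lookup answers both "how is this module built" and "has it been built yet".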

@coolreader18
Member

coolreader18 commented Feb 26, 2019

Alright, that's valid. I thought it was done that way to prevent someone from replacing a built-in module through sys.modules, but I just tried it in CPython and it's possible to do so. So, yes, put a loaded built-in module into sys.modules.
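For reference, the CPython behaviour mentioned here can be reproduced with a short script. The choice of `math` as the module to replace and the helper name `import_after_replacing` are assumptions made for the example; the original entry is restored afterwards.

```python
import sys
import types


def import_after_replacing(name):
    """Put a fake module into sys.modules, import it, then restore the original."""
    fake = types.ModuleType(name)
    original = sys.modules.get(name)
    sys.modules[name] = fake
    try:
        # The import system consults sys.modules first, so this returns the fake.
        imported = __import__(name)
        return imported is fake
    finally:
        if original is not None:
            sys.modules[name] = original
        else:
            del sys.modules[name]


replaced = import_after_replacing("math")
print(replaced)
```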
