py/map: Make map rehashing an atomic operation. #11604

dpgeorge · 2023-05-24T00:53:17Z

This simple change to map rehashing makes it so the old map is kept intact during the rehash, and only after the rehash is complete is the old map updated to the new one. And this update is a very quick memory copy, which is done inside an atomic section.

This means that hard IRQs handlers can access dict members (eg globals) even during the update (add/del) of that dict.

See related #10620.

github-actions · 2023-05-24T00:58:50Z

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:   +72 +0.039% 
   unix x64:   +96 +0.012% standard
      stm32:    +8 +0.002% PYBV10
     mimxrt:   +16 +0.004% TEENSY40
        rp2:   +16 +0.005% RPI_PICO
       samd:    +8 +0.003% ADAFRUIT_ITSYBITSY_M4_EXPRESS

codecov-commenter · 2023-05-24T01:30:52Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2962e24) 98.36% compared to head (d5a023b) 98.36%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #11604   +/-   ##
=======================================
  Coverage   98.36%   98.36%           
=======================================
  Files         161      161           
  Lines       21089    21089           
=======================================
  Hits        20744    20744           
  Misses        345      345

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jimmo · 2023-05-24T01:35:42Z

Great idea!

Worth noting that this isn't just important for hard IRQs but also GIL-free multi-core (on e.g. rp2) where the problem is much more likely to occur.

jimmo · 2023-05-26T06:41:12Z

Unfortunately from further discussion with @dpgeorge about this, I'm not sure this solves the problem. Here are two scenarios:

The main thread triggers map rehash, during which a hard IRQ fires which sets the value of an existing global. This is the scenario in Docs: Detail issue with hard ISRs and globals. #10620 (comment) . With this PR now that won't cause corruption, but the write will go to the old map, and be lost when the rehash finishes and switches over to the new map. I think some sort of per-map flag to detect a write during a re-hash, and then re-run the re-hash might be able to fix this? (This only works because the IRQ could not have triggered the re-hash)
Two non-GIL threads (e.g. rp2040), one triggers a re-hash while the other reads from the same map. The second thread will see the old table, but ff the first thread causes an allocation before the second thread has time to access the table, then the table will be overwritten by the new allocation. This is due to the m_del at the end of rehash. This is an optimisation to early reclaim memory, we could disable this when non-GIL threading is enabled.

Also worth pointing out -- if two threads (or thread + hard IRQ) do simultaneous modifications to a map, it is fine only if they are both updating existing keys. In other words, any operation that mutates the set of keys (insert, remove) is not concurrent safe. For the insert at least, you probably shouldn't be allowed to do this in a hard IRQ because there's a chance it might trigger a re-hash anyway (which would fail to allocate). So for the hard IRQ case at least, we could consider preventing any mutation that isn't an update to an existing key.

For the non-GIL threading case, it seems reasonable that two threads shouldn't be allowed to concurrently mutate a dictionary's keys. Enforcing that without a mutex seems hard though. I feel that conceptually the globals dict and, say, the locals dict of a class or an actual dict object in users code should have different rules. Maybe in non-GIL threading mode, we need to do special protection of just globals dicts?

CC @peterhinch

dpgeorge · 2024-02-20T23:36:38Z

The test tests/extmod/select_poll_eintr.py is failing on the unix port due to the bug that's being addressed in this PR. The spawned thread tries to read the global variable lock but that sometimes happens while the main thread is rehashing the globals dict because it's adding the global variable s or poller.

I was able to fix that unix case by adding a mutex that is obtained during all map operations. That's obviously not efficient but at least shows the problem and how it could be fixed.

Maybe in non-GIL threading mode, we need to do special protection of just globals dicts?

Yes, I think that would be a reasonable way to solve this problem, to somehow make globals dicts (the dict of the top level of a module) special and support concurrent access. Maybe the map can internally have both a read-only and write-only table and then the rehash scheme is:

read-only and write-only starts by pointing to the same table
rehash begins and creates a new table and populates the keys only (not their values)
rehash then points the write-only table to this new table with only keys (so now anyone writing to this map will update the new table, but reads get existing values)
rehash then copies across values from the read-only table but only if the value in the write-only table is empty
rehash then points the read-only pointer to the new table
rehash frees the old table

dpgeorge · 2024-02-20T23:57:20Z

I added a test which does concurrent globals access and fails easily on the unix port with non-GIL threading.

Signed-off-by: Damien George <damien@micropython.org>

dpgeorge · 2024-07-23T01:23:07Z

Alternative idea for a fix:

Define a new mutex which is used to protect the set of functions that can access the globals dict: mp_load_global(), mp_load_name(), mp_store_name() etc. That won't cover all cases of globals access (eg globals()['abc'] = 123) but should be enough to fix normal use of globals.
Would need to think how to release the mutex in case the lookup raises an exception while the mutex is held.
Could optimise the mutex obtain/release (and also the GC and qstr mutex's) by having them dynamic function pointers that are the identity when no other thread is running, and proper mutex functions when other threads are running.

dpgeorge added the py-core Relates to py/ directory in source label May 24, 2023

dpgeorge mentioned this pull request May 24, 2023

Docs: Detail issue with hard ISRs and globals. #10620

Merged

dpgeorge force-pushed the py-map-atomic-rehash branch from e25fe94 to 3102154 Compare February 20, 2024 23:56

dpgeorge added 2 commits February 21, 2024 10:58

py/map: Make map rehashing an atomic operation.

e77e99b

Signed-off-by: Damien George <damien@micropython.org>

tests/thread: Add a test to stress concurrent globals access.

d5a023b

Signed-off-by: Damien George <damien@micropython.org>

dpgeorge force-pushed the py-map-atomic-rehash branch from 3102154 to d5a023b Compare February 20, 2024 23:59

robert-hh mentioned this pull request Feb 21, 2024

No support for UART RX IRQ (interrupt) #13605

Closed

dpgeorge mentioned this pull request Jun 21, 2024

tests/thread: Restore gc before test case ends. #15265

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

py/map: Make map rehashing an atomic operation. #11604

py/map: Make map rehashing an atomic operation. #11604

dpgeorge commented May 24, 2023

github-actions bot commented May 24, 2023 •

edited

Loading

codecov-commenter commented May 24, 2023 •

edited by codecov bot

Loading

jimmo commented May 24, 2023

jimmo commented May 26, 2023

dpgeorge commented Feb 20, 2024

dpgeorge commented Feb 20, 2024

dpgeorge commented Jul 23, 2024

py/map: Make map rehashing an atomic operation. #11604

Are you sure you want to change the base?

py/map: Make map rehashing an atomic operation. #11604

Conversation

dpgeorge commented May 24, 2023

github-actions bot commented May 24, 2023 • edited Loading

codecov-commenter commented May 24, 2023 • edited by codecov bot Loading

Codecov Report

jimmo commented May 24, 2023

jimmo commented May 26, 2023

dpgeorge commented Feb 20, 2024

dpgeorge commented Feb 20, 2024

dpgeorge commented Jul 23, 2024

github-actions bot commented May 24, 2023 •

edited

Loading

codecov-commenter commented May 24, 2023 •

edited by codecov bot

Loading