Merge 1.18 #6038

Merged
merged 355 commits into from
Feb 19, 2022

@jepler jepler commented Feb 15, 2022

Highlights:

  • This release of MicroPython sees a boost to the overall performance of the VM and runtime. However, these performance options are not enabled in any of our builds at this time.
  • Internally, we have transitioned from using FROZEN_MPY_DIR to using FROZEN_MANIFEST, when we include 'frozen modules' in a build.
  • A bug in bitwise operations on multiple-precision integers involving -0 was fixed in commit 2c139bb.

Code size may also have shrunk slightly, thanks to general size reductions in the core.

For all MicroPython 1.18 release notes, many of which do not apply to CircuitPython, see https://github.com/micropython/micropython/releases/tag/v1.18

Status / checklist:

  • Initial merge completed
  • Unix make VARIANT=coverage test passes
  • Any board firmware builds & loads (feather rp2040)
  • Other test-related stuff passes
  • boards without frozen modules work
  • CI is all green
  • boards with frozen modules work
  • mpy files from bundle work

Boards tested:

  • rotary trinkey (samd21, frozen modules)
  • feather rp2040

robert-hh and others added 30 commits October 25, 2021 23:54
By moving code to ITCM, like vm, gc, parse, runtime.  The change affects
mostly the execution speed of MicroPython code.  The speed is increased by
up to a factor of 6, especially for MCU with small cache.
There is no release of IDF v4.4 yet but master is now on v5.0-dev so a
specific commit must be chosen to stick to v4.4.

Signed-off-by: Damien George <damien@micropython.org>
This forwards through directly to the NimBLE and BTStack connect functions.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
We will use this fork for adding further features and patches to support
MicroPython.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
We're using the MicroPython fork of NimBLE, which on the
`micropython_1_4_0` branch re-adds support for 64-bit targets and fixes
initialisation of g_msys_pool_list.

Also updates modbluetooth_nimble.c to suit v1.4.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This was fixed in NimBLE 1.4.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This will be used by https://micropython.org/download/ to generate the
full listing of boards and firmware files.

Optionally supports a board.md for additional customisation of the
download page, as well as deploy.md for flashing instructions.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
In particular the UM S2 boards (and update the features list).

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Following on from ba94025, the change here
makes output about 15 times faster (now up to about 550 kbytes/sec).

tinyusb_cdcacm_write_queue will return the number of bytes written, so
there's no need to use tud_cdc_n_write_available.

Signed-off-by: Damien George <damien@micropython.org>
Prior to this commit IRQs on STM32F4 could be lost because SR is cleared by
reading SR then reading DR.  For example, if both RXNE and IDLE IRQs were
active upon entry to the IRQ handler, then IDLE is lost because the code
that handles RXNE comes first and accidentally clears SR (by reading SR
then DR to get the incoming character).

This commit fixes this problem by making the IRQ handler more atomic in the
following operations:
- get current IRQ status flags
- deal with RX character
- clear remaining status flags
- call user handler

On the STM32F4 it's very hard to get this right because the only way to
clear IRQ status flags is to read SR then DR, but the read of DR may read
some data which should remain in the register until the user wants to read
it.  And it won't work to cache the read because RTS/CTS flow control will
then not work.  So instead the new code disables interrupts if the DR is
full and waits for the user to read it before reenabling the interrupts.

Fixes issue mentioned in adafruit#4599 and adafruit#6082.

Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
To simplify the config.  This commit does not change the build.

Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Some of these will later be moved to CORE or BASIC, but EXTRA is a good
starting point based on what stm32 uses.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This commit is a no-op change.  Future improvements can come from making
individual boards use CORE or BASIC.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This commit is a no-op change to simplify existing config.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This is an stm32-specific feature that's accessed via the pyb module, so
not something that will be widely enabled.

Signed-off-by: Damien George <damien@micropython.org>
Computed goto costs 1800 bytes for 5-10% performance.

Map caching and attr fast path costs 130 bytes for up to 30%.

Net effect of those three optimisations:
bm_chaos.py         +16.059% (+/-0.09%)
bm_fannkuch.py      +11.145% (+/-0.01%)
bm_fft.py           +14.604% (+/-0.01%)
bm_float.py         +26.849% (+/-0.08%)
bm_hexiom.py        +34.039% (+/-0.03%)
bm_nqueens.py       +18.333% (+/-0.06%)
bm_pidigits.py       +4.472% (+/-0.03%)
misc_aes.py         +28.765% (+/-0.09%)
misc_mandel.py      +27.116% (+/-0.05%)
misc_pystone.py     +40.299% (+/-0.20%)
misc_raytrace.py    +22.812% (+/-0.07%)

Also enable other EXTRA-level optimisations (module const, return_if_expr,
triple_tuple_assign, factorial, mpz bitwise).

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
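
As a rough way to read the benchmark table in the commit message above, the following sketch computes the arithmetic mean of the listed gains and picks out the smallest and largest. The figures are copied from the table; the helper name `summarize` is made up for illustration, and a plain mean of percentages is only a coarse summary, not a rigorous aggregate.

```python
# Per-benchmark gains, in percent, copied from the commit message above.
GAINS = {
    "bm_chaos.py": 16.059,
    "bm_fannkuch.py": 11.145,
    "bm_fft.py": 14.604,
    "bm_float.py": 26.849,
    "bm_hexiom.py": 34.039,
    "bm_nqueens.py": 18.333,
    "bm_pidigits.py": 4.472,
    "misc_aes.py": 28.765,
    "misc_mandel.py": 27.116,
    "misc_pystone.py": 40.299,
    "misc_raytrace.py": 22.812,
}


def summarize(gains):
    """Return (mean gain, name of smallest gain, name of largest gain)."""
    mean = sum(gains.values()) / len(gains)
    smallest = min(gains, key=gains.get)
    largest = max(gains, key=gains.get)
    return mean, smallest, largest


mean, smallest, largest = summarize(GAINS)
print(f"mean gain: {mean:.1f}%")
print(f"smallest:  {smallest} (+{GAINS[smallest]}%)")
print(f"largest:   {largest} (+{GAINS[largest]}%)")
```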
This makes it possible for cooperative multitasking systems to keep running
event loops during garbage collector operations.

For example, this can be used to ensure that a motor control loop runs
approximately every 5 ms.  Without this hook, the loop time can jump to
about 15 ms.

Addresses adafruit#3475.

Signed-off-by: Laurens Valk <laurens@pybricks.com>
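
The idea in the commit message above (a long collector pass that periodically yields to a hook) can be modelled in plain Python. This is only an illustrative sketch: `sweep`, `ControlLoop`, and the `hook_every` interval are hypothetical names and values, and the real hook lives in the C collector, not in Python code.

```python
import time


def sweep(blocks, hook=None, hook_every=64):
    """Pretend to sweep `blocks` heap blocks, calling `hook` periodically.

    Hypothetical stand-in for a long collector pass; no memory is managed.
    """
    freed = 0
    for i in range(blocks):
        if hook is not None and i % hook_every == 0:
            hook()  # give the event loop a chance to run
        freed += i & 1  # placeholder for per-block work
    return freed


class ControlLoop:
    """Hypothetical periodic task that wants to run about every period_ms."""

    def __init__(self, period_ms=5):
        self.period_s = period_ms / 1000
        self.last = time.monotonic()
        self.polls = 0  # how often the hook was called
        self.runs = 0   # how often the period had actually elapsed

    def poll(self):
        self.polls += 1
        now = time.monotonic()
        if now - self.last >= self.period_s:
            self.last = now
            self.runs += 1


loop = ControlLoop()
freed = sweep(10_000, hook=loop.poll)
print(f"hook polled {loop.polls} times during the sweep")
```

Without the `hook` argument the sweep runs to completion before anything else gets a turn, which is the situation the commit message describes as pushing the loop time to about 15 ms.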
Word-size specific configuration is now done automatically, so it no longer
requires this to match the ARM configuration.

Also it's less common to have 32-bit compilation support installed, so this
will make it work "out of the box" for more people.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
@jepler jepler marked this pull request as ready for review February 17, 2022 02:04
@jepler jepler requested a review from tannewt February 17, 2022 02:05
Collaborator

@dhalbert dhalbert left a comment

Thank you for doing this! I looked this over and I don't see anything. I want to check for asyncio.py whether we need both __await__() and __iter__(), but that should not affect this merge.

Do we need to change our frozen module specifications to use manifests? I don't see any changes to any mpconfigboard.mk files.

I would rather wait and merge this after 7.2.0, because of the possibility of unforeseen problems.

Member

@gamblor21 gamblor21 left a comment

For what it's worth I looked through and didn't notice anything that I think would cause any issues. If I get a moment later I'll try to build it and run it on a couple boards (I'll see if I have one that uses frozen modules to test that).

@gamblor21
Member

Tested on both a Feather nRF52840 and an UnexpectedMaker FeatherS2, each with the base build and with a custom build in which I included a frozen module to ensure it was included and loaded.

Both worked.

@tannewt
Member

tannewt commented Feb 17, 2022

Did you try enabling the performance changes? What was the outcome?

@gamblor21
Member

Does the code below have to change? It references CIRCUITPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, which was removed in 1.18. Maybe it has to change to MICROPY_OPT_MAP_LOOKUP_CACHE (or a CIRCUITPY equivalent). I ran into it when I was trying to enable the latter option. I found a reference to the new MICROPY_OPT_LOAD_ATTR_FAST_PATH only.

CIRCUITPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE ?= 0

@jepler
Author

jepler commented Feb 17, 2022

Did you try enabling the performance changes? What was the outcome?

No, I didn't want to complicate things or make this PR harder to merge.

@tannewt
Member

tannewt commented Feb 17, 2022

I think it may be worth spending a little time on the performance stuff to see if it's difficult. I think it's the headline change of 1.18 so MP knowledgeable folks may expect it when we promote that we've merged in 1.18.

@jepler
Author

jepler commented Feb 18, 2022

I'm now a bit confused. There are signs of LOAD_ATTR_FAST_PATH in main, which is enabled by default, but no supporting code for it. So this has become enabled during this merge of v1.18 -- see 53c5bde. I can go ahead and enable the cache on FULL_BUILD targets; it costs another ~264 bytes of code and 128 bytes of RAM.

.. and remove a stanza for the "cache map lookup in bytecode" option,
which has been removed by upstream in 1.18; it's superseded by these
other improvements.
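
For readers unfamiliar with lookup caches, here is a minimal Python sketch of the general technique: remember where a key was last found so that a repeated lookup can skip the search. It is not upstream's implementation. `CachedTable` and its fields are hypothetical, and `CACHE_SIZE = 128` is an assumption chosen only to echo the 128 bytes of RAM quoted above; the thread does not describe the real cache layout.

```python
CACHE_SIZE = 128  # assumption: echoes the RAM figure quoted in the thread


class CachedTable:
    """Association list with a small position cache (illustrative only)."""

    def __init__(self, items):
        self.keys = [k for k, _ in items]
        self.values = [v for _, v in items]
        self.cache = [0] * CACHE_SIZE  # last known position per hash slot
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        slot = hash(key) % CACHE_SIZE
        pos = self.cache[slot]
        # Fast path: the cached position still holds this key.
        if pos < len(self.keys) and self.keys[pos] == key:
            self.hits += 1
            return self.values[pos]
        # Slow path: linear search, then remember where the key was found.
        self.misses += 1
        for pos, candidate in enumerate(self.keys):
            if candidate == key:
                self.cache[slot] = pos
                return self.values[pos]
        raise KeyError(key)


table = CachedTable([("a", 1), ("b", 2), ("v", 7)])
first = table.lookup("v")   # searched, then cached
second = table.lookup("v")  # served from the cached position
print(first, second, table.hits, table.misses)
```

A stale or colliding cache entry only costs a fallback to the slow path, so correctness never depends on the cache contents.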
@tannewt
Member

tannewt commented Feb 18, 2022

I can go ahead and enable the cache on FULL_BUILD targets; it costs another ~264 bytes of code and 128 bytes of RAM.

Sounds perfect! Thank you for looking into this.

@jepler
Author

jepler commented Feb 18, 2022

With both performance knobs turned to "go fast", a little loop on rp2040 became nearly 20% faster:

import adafruit_ticks

class K:
    def __init__(self, v):
        self.v = v
        
def main():
    k = K(7)
 
    t0 = adafruit_ticks.ticks_ms()
    for i in range(1000*100):
        k.v
    t1 = adafruit_ticks.ticks_ms()
 
    dt = adafruit_ticks.ticks_diff(t1, t0)
    us_per_loop = dt / 100
    print(f"{dt} ms for 100,000 loop = {us_per_loop} us/loop")
    
main()

Before: 6.58µs/loop
After: 5.28µs/loop
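
The two timings above can be read in two ways, and a short calculation shows both. The variable names are illustrative; the inputs are the before and after figures from this comment.

```python
before_us = 6.58  # microseconds per loop before enabling the options
after_us = 5.28   # microseconds per loop after enabling them

# Fraction of per-loop time removed.
time_saved = (before_us - after_us) / before_us
# Extra loops completed per unit of time.
throughput_gain = before_us / after_us - 1

print(f"time per loop reduced by {time_saved:.1%}")
print(f"loops per second increased by {throughput_gain:.1%}")
```

The "nearly 20% faster" figure corresponds to the reduction in time per loop; measured as loops per second, the gain is closer to 25%.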

Member

@tannewt tannewt left a comment

Awesome! 🚀

@tannewt tannewt merged commit 918145f into adafruit:main Feb 19, 2022
@dhalbert dhalbert mentioned this pull request Feb 28, 2022