diff --git a/pep-0550-hamt_vs_dict-v2.png b/pep-0550-hamt_vs_dict-v2.png new file mode 100644 index 00000000000..7518e597135 Binary files /dev/null and b/pep-0550-hamt_vs_dict-v2.png differ diff --git a/pep-0550.rst b/pep-0550.rst index eaab823f052..397ad803886 100644 --- a/pep-0550.rst +++ b/pep-0550.rst @@ -2,883 +2,984 @@ PEP: 550 Title: Execution Context Version: $Revision$ Last-Modified: $Date$ -Author: Yury Selivanov +Author: Yury Selivanov , + Elvis Pranskevichus Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 11-Aug-2017 Python-Version: 3.7 -Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017 +Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017 Abstract ======== -This PEP proposes a new mechanism to manage execution state--the -logical environment in which a function, a thread, a generator, -or a coroutine executes in. +This PEP adds a new generic mechanism of ensuring consistent access +to non-local state in the context of out-of-order execution, such +as in Python generators and coroutines. -A few examples of where having a reliable state storage is required: +Thread-local storage, such as ``threading.local()``, is inadequate for +programs that execute concurrently in the same OS thread. This PEP +proposes a solution to this problem. -* Context managers like decimal contexts, ``numpy.errstate``, - and ``warnings.catch_warnings``; -* Storing request-related data such as security tokens and request - data in web applications, implementing i18n; +Rationale +========= -* Profiling, tracing, and logging in complex and large code bases. +Prior to the advent of asynchronous programming in Python, programs +used OS threads to achieve concurrency. The need for thread-specific +state was solved by ``threading.local()`` and its C-API equivalent, +``PyThreadState_GetDict()``. -The usual solution for storing state is to use a Thread-local Storage -(TLS), implemented in the standard library as ``threading.local()``. -Unfortunately, TLS does not work for the purpose of state isolation -for generators or asynchronous code, because such code executes -concurrently in a single thread. +A few examples of where Thread-local storage (TLS) is commonly +relied upon: +* Context managers like decimal contexts, ``numpy.errstate``, + and ``warnings.catch_warnings``. -Rationale -========= +* Request-related data, such as security tokens and request + data in web applications, language context for ``gettext`` etc. + +* Profiling, tracing, and logging in large code bases. -Traditionally, a Thread-local Storage (TLS) is used for storing the -state. However, the major flaw of using the TLS is that it works only -for multi-threaded code. It is not possible to reliably contain the -state within a generator or a coroutine. For example, consider -the following generator:: +Unfortunately, TLS does not work well for programs which execute +concurrently in a single thread. A Python generator is the simplest +example of a concurrent program. Consider the following:: - def calculate(precision, ...): + def fractions(precision, x, y): with decimal.localcontext() as ctx: - # Set the precision for decimal calculations - # inside this block ctx.prec = precision + yield Decimal(x) / Decimal(y) + yield Decimal(x) / Decimal(y**2) - yield calculate_something() - yield calculate_something_else() - -Decimal context is using a TLS to store the state, and because TLS is -not aware of generators, the state can leak. 
If a user iterates over -the ``calculate()`` generator with different precisions one by one -using a ``zip()`` built-in, the above code will not work correctly. -For example:: - - g1 = calculate(precision=100) - g2 = calculate(precision=50) + g1 = fractions(precision=2, x=1, y=3) + g2 = fractions(precision=6, x=2, y=3) items = list(zip(g1, g2)) - # items[0] will be a tuple of: - # first value from g1 calculated with 100 precision, - # first value from g2 calculated with 50 precision. - # - # items[1] will be a tuple of: - # second value from g1 calculated with 50 precision (!!!), - # second value from g2 calculated with 50 precision. - -An even scarier example would be using decimals to represent money -in an async/await application: decimal calculations can suddenly -lose precision in the middle of processing a request. Currently, -bugs like this are extremely hard to find and fix. +The expected value of ``items`` is:: -Another common need for web applications is to have access to the -current request object, or security context, or, simply, the request -URL for logging or submitting performance tracing data:: + [(Decimal('0.33'), Decimal('0.666667')), + (Decimal('0.11'), Decimal('0.222222'))] - async def handle_http_request(request): - context.current_http_request = request +Rather surprisingly, the actual result is:: - await ... - # Invoke your framework code, render templates, - # make DB queries, etc, and use the global - # 'current_http_request' in that code. + [(Decimal('0.33'), Decimal('0.666667')), + (Decimal('0.111111'), Decimal('0.222222'))] - # This isn't currently possible to do reliably - # in asyncio out of the box. +This is because Decimal context is stored as a thread-local, so +concurrent iteration of the ``fractions()`` generator would corrupt +the state. A similar problem exists with coroutines. -These examples are just a few out of many, where a reliable way to -store context data is absolutely needed. +Applications also often need to associate certain data with a given +thread of execution. For example, a web application server commonly +needs access to the current HTTP request object. -The inability to use TLS for asynchronous code has lead to +The inadequacy of TLS in asynchronous code has lead to the proliferation of ad-hoc solutions, which are limited in scope and do not support all required use cases. -Current status quo is that any library, including the standard -library, that uses a TLS, will likely not work as expected in +The current status quo is that any library (including the standard +library), which relies on TLS, is likely to be broken when used in asynchronous code or with generators (see [3]_ as an example issue.) -Some languages that have coroutines or generators recommend to -manually pass a ``context`` object to every function, see [1]_ -describing the pattern for Go. This approach, however, has limited -use for Python, where we have a huge ecosystem that was built to work -with a TLS-like context. Moreover, passing the context explicitly -does not work at all for libraries like ``decimal`` or ``numpy``, -which use operator overloading. +Some languages, that support coroutines or generators, recommend +passing the context manually as an argument to every function, see [1]_ +for an example. This approach, however, has limited use for Python, +where there is a large ecosystem that was built to work with a TLS-like +context. Furthermore, libraries like ``decimal`` or ``numpy`` rely +on context implicitly in overloaded operator implementations. 
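+The operator overloading point is easy to illustrate with the current
+TLS-based ``decimal`` module: there is no argument slot in the ``/``
+operator through which a context could be passed explicitly, so the
+precision has to come from ambient state::
+
+    from decimal import Decimal, getcontext
+
+    def divide(x, y):
+        # Decimal.__truediv__ cannot receive a context argument;
+        # it reads the precision from the ambient (thread-local)
+        # decimal context.
+        return Decimal(x) / Decimal(y)
+
+    getcontext().prec = 4
+    print(divide(1, 3))    # Decimal('0.3333')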
-.NET runtime, which has support for async/await, has a generic -solution of this problem, called ``ExecutionContext`` (see [2]_). -On the surface, working with it is very similar to working with a TLS, -but the former explicitly supports asynchronous code. +The .NET runtime, which has support for async/await, has a generic +solution for this problem, called ``ExecutionContext`` (see [2]_). Goals ===== -The goal of this PEP is to provide a more reliable alternative to -``threading.local()``. It should be explicitly designed to work with -Python execution model, equally supporting threads, generators, and -coroutines. +The goal of this PEP is to provide a more reliable +``threading.local()`` alternative, which: -An acceptable solution for Python should meet the following -requirements: +* provides the mechanism and the API to fix non-local state issues + with coroutines and generators; -* Transparent support for code executing in threads, coroutines, - and generators with an easy to use API. +* has no or negligible performance impact on the existing code or + the code that will be using the new mechanism, including + libraries like ``decimal`` and ``numpy``. -* Negligible impact on the performance of the existing code or the - code that will be using the new mechanism. -* Fast C API for packages like ``decimal`` and ``numpy``. +High-Level Specification +======================== -Explicit is still better than implicit, hence the new APIs should only -be used when there is no acceptable way of passing the state -explicitly. +The full specification of this PEP is broken down into three parts: +* High-Level Specification (this section): the description of the + overall solution. We show how it applies to generators and + coroutines in user code, without delving into implementation details. -Specification -============= +* Detailed Specification: the complete description of new concepts, + APIs, and related changes to the standard library. -Execution Context is a mechanism of storing and accessing data specific -to a logical thread of execution. We consider OS threads, -generators, and chains of coroutines (such as ``asyncio.Task``) -to be variants of a logical thread. +* Implementation Details: the description and analysis of data + structures and algorithms used to implement this PEP, as well as the + necessary changes to CPython. -In this specification, we will use the following terminology: +For the purpose of this section, we define *execution context* as an +opaque container of non-local state that allows consistent access to +its contents in the concurrent execution environment. -* **Logical Context**, or LC, is a key/value mapping that stores the - context of a logical thread. +A *context variable* is an object representing a value in the +execution context. A new context variable is created by calling +the ``new_context_var()`` function. A context variable object has +two methods: -* **Execution Context**, or EC, is an OS-thread-specific dynamic - stack of Logical Contexts. +* ``lookup()``: returns the value of the variable in the current + execution context; -* **Context Key**, or CK, is an object used to set and get values - from the Execution Context. +* ``set()``: sets the value of the variable in the current + execution context. -Please note that throughout the specification we use simple -pseudo-code to illustrate how the EC machinery works. The actual -algorithms and data structures that we will use to implement the PEP -are discussed in the `Implementation Strategy`_ section. 
- - -Context Key Object ------------------- -The ``sys.new_context_key(name)`` function creates a new ``ContextKey`` -object. The ``name`` parameter is a ``str`` needed to render a -representation of ``ContextKey`` object for introspection and -debugging purposes. +Regular Single-threaded Code +---------------------------- -``ContextKey`` objects have the following methods and attributes: +In regular, single-threaded code that doesn't involve generators or +coroutines, context variables behave like globals:: -* ``.name``: read-only name; + var = new_context_var() -* ``.set(o)`` method: set the value to ``o`` for the context key - in the execution context. + def sub(): + assert var.lookup() == 'main' + var.set('sub') -* ``.get()`` method: return the current EC value for the context key. - Context keys return ``None`` when the key is missing, so the method - never fails. + def main(): + var.set('main') + sub() + assert var.lookup() == 'sub' -The below is an example of how context keys can be used:: - - my_context = sys.new_context_key('my_context') - my_context.set('spam') - - # Later, to access the value of my_context: - print(my_context.get()) +Multithreaded Code +------------------ -Thread State and Multi-threaded code ------------------------------------- +In multithreaded code, context variables behave like thread locals:: -Execution Context is implemented on top of Thread-local Storage. -For every thread there is a separate stack of Logical Contexts -- -mappings of ``ContextKey`` objects to their values in the LC. -New threads always start with an empty EC. + var = new_context_var() -For CPython:: + def sub(): + assert var.lookup() is None # The execution context is empty + # for each new thread. + var.set('sub') - PyThreadState: - execution_context: ExecutionContext([ - LogicalContext({ci1: val1, ci2: val2, ...}), - ... - ]) + def main(): + var.set('main') -The ``ContextKey.get()`` and ``.set()`` methods are defined as -follows (in pseudo-code):: + thread = threading.Thread(target=sub) + thread.start() + thread.join() - class ContextKey: - - def get(self): - tstate = PyThreadState_Get() + assert var.lookup() == 'main' - for logical_context in reversed(tstate.execution_context): - if self in logical_context: - return logical_context[self] - return None - - def set(self, value): - tstate = PyThreadState_Get() +Generators +---------- - if not tstate.execution_context: - tstate.execution_context = [LogicalContext()] +In generators, changes to context variables are local and are not +visible to the caller, but are visible to the code called by the +generator. 
Once set in the generator, the context variable is +guaranteed not to change between iterations:: - tstate.execution_context[-1][self] = value + var = new_context_var() -With the semantics defined so far, the Execution Context can already -be used as an alternative to ``threading.local()``:: + def gen(): + var.set('gen') + assert var.lookup() == 'gen' + yield 1 - def print_foo(): - print(ci.get() or 'nothing') + assert var.lookup() == 'gen' + yield 2 - ci = sys.new_context_key('ci') - ci.set('foo') + def main(): + var.set('main') - # Will print "foo": - print_foo() + g = gen() + next(g) + assert var.lookup() == 'main' - # Will print "nothing": - threading.Thread(target=print_foo).start() + var.set('main modified') + next(g) + assert var.lookup() == 'main modified' +Changes to caller's context variables are visible to the generator +(unless they were also modified inside the generator):: -Manual Context Management -------------------------- + var = new_context_var() -Execution Context is generally managed by the Python interpreter, -but sometimes it is desirable for the user to take the control -over it. A few examples when this is needed: + def gen(): + assert var.lookup() == 'var' + yield 1 -* running a computation in ``concurrent.futures.ThreadPoolExecutor`` - with the current EC; + assert var.lookup() == 'var modified' + yield 2 -* reimplementing generators with iterators (more on that later); + def main(): + g = gen() -* managing contexts in asynchronous frameworks (implement proper - EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.) + var.set('var') + next(g) -For these purposes we add a set of new APIs (they will be used in -later sections of this specification): + var.set('var modified') + next(g) -* ``sys.new_logical_context()``: create an empty ``LogicalContext`` - object. +Now, let's revisit the decimal precision example from the `Rationale`_ +section, and see how the execution context can improve the situation:: -* ``sys.new_execution_context()``: create an empty - ``ExecutionContext`` object. + import decimal -* Both ``LogicalContext`` and ``ExecutionContext`` objects are opaque - to Python code, and there are no APIs to modify them. + decimal_prec = new_context_var() # create a new context variable -* ``sys.get_execution_context()`` function. The function returns a - copy of the current EC: an ``ExecutionContext`` instance. + # Pre-PEP 550 Decimal relies on TLS for its context. + # This subclass switches the decimal context storage + # to the execution context for illustration purposes. + # + class MyDecimal(decimal.Decimal): + def __init__(self, value="0"): + prec = decimal_prec.lookup() + if prec is None: + raise ValueError('could not find decimal precision') + context = decimal.Context(prec=prec) + super().__init__(value, context=context) - The runtime complexity of the actual implementation of this function - can be O(1), but for the purposes of this section it is equivalent - to:: + def fractions(precision, x, y): + # Normally, this would be set by a context manager, + # but for simplicity we do this directly. 
+ decimal_prec.set(precision) - def get_execution_context(): - tstate = PyThreadState_Get() - return copy(tstate.execution_context) + yield MyDecimal(x) / MyDecimal(y) + yield MyDecimal(x) / MyDecimal(y**2) -* ``sys.run_with_execution_context(ec: ExecutionContext, func, *args, - **kwargs)`` runs ``func(*args, **kwargs)`` in the provided execution - context:: + g1 = fractions(precision=2, x=1, y=3) + g2 = fractions(precision=6, x=2, y=3) - def run_with_execution_context(ec, func, *args, **kwargs): - tstate = PyThreadState_Get() + items = list(zip(g1, g2)) - old_ec = tstate.execution_context +The value of ``items`` is:: - tstate.execution_context = ExecutionContext( - ec.logical_contexts + [LogicalContext()] - ) + [(Decimal('0.33'), Decimal('0.666667')), + (Decimal('0.11'), Decimal('0.222222'))] - try: - return func(*args, **kwargs) - finally: - tstate.execution_context = old_ec +which matches the expected result. - Any changes to Logical Context by ``func`` will be ignored. - This allows to reuse one ``ExecutionContext`` object for multiple - invocations of different functions, without them being able to - affect each other's environment:: - ci = sys.new_context_key('ci') - ci.set('spam') +Coroutines and Asynchronous Tasks +--------------------------------- - def func(): - print(ci.get()) - ci.set('ham') +In coroutines, like in generators, context variable changes are local +and are not visible to the caller:: - ec = sys.get_execution_context() + import asyncio - sys.run_with_execution_context(ec, func) - sys.run_with_execution_context(ec, func) + var = new_context_var() - # Will print: - # spam - # spam + async def sub(): + assert var.lookup() == 'main' + var.set('sub') + assert var.lookup() == 'sub' -* ``sys.run_with_logical_context(lc: LogicalContext, func, *args, - **kwargs)`` runs ``func(*args, **kwargs)`` in the current execution - context using the specified logical context. + async def main(): + var.set('main') + await sub() + assert var.lookup() == 'main' - Any changes that ``func`` does to the logical context will be - persisted in ``lc``. This behaviour is different from the - ``run_with_execution_context()`` function, which always creates - a new throw-away logical context. + loop = asyncio.get_event_loop() + loop.run_until_complete(main()) - In pseudo-code:: +To establish the full semantics of execution context in couroutines, +we must also consider *tasks*. A task is the abstraction used by +*asyncio*, and other similar libraries, to manage the concurrent +execution of coroutines. In the example above, a task is created +implicitly by the ``run_until_complete()`` function. +``asyncio.wait_for()`` is another example of implicit task creation:: - def run_with_logical_context(lc, func, *args, **kwargs): - tstate = PyThreadState_Get() + async def sub(): + await asyncio.sleep(1) + assert var.lookup() == 'main' - old_ec = tstate.execution_context + async def main(): + var.set('main') - tstate.execution_context = ExecutionContext( - old_ec.logical_contexts + [lc] - ) + # waiting for sub() directly + await sub() - try: - return func(*args, **kwargs) - finally: - tstate.execution_context = old_ec + # waiting for sub() with a timeout + await asyncio.wait_for(sub(), timeout=2) - Using the previous example:: + var.set('main changed') - ci = sys.new_context_key('ci') - ci.set('spam') +Intuitively, we expect the assertion in ``sub()`` to hold true in both +invocations, even though the ``wait_for()`` implementation actually +spawns a task, which runs ``sub()`` concurrently with ``main()``. 
- def func(): - print(ci.get()) - ci.set('ham') +Thus, tasks **must** capture a snapshot of the current execution +context at the moment of their creation and use it to execute the +wrapped coroutine whenever that happens. If this is not done, then +innocuous looking changes like wrapping a coroutine in a ``wait_for()`` +call would cause surprising breakage. This leads to the following:: - ec = sys.get_execution_context() - lc = sys.new_logical_context() + import asyncio - sys.run_with_logical_context(lc, func) - sys.run_with_logical_context(lc, func) + var = new_context_var() - # Will print: - # spam - # ham + async def sub(): + # Sleeping will make sub() run after + # `var` is modified in main(). + await asyncio.sleep(1) -As an example, let's make a subclass of -``concurrent.futures.ThreadPoolExecutor`` that preserves the execution -context for scheduled functions:: + assert var.lookup() == 'main' - class Executor(concurrent.futures.ThreadPoolExecutor): + async def main(): + var.set('main') + loop.create_task(sub()) # schedules asynchronous execution + # of sub(). + assert var.lookup() == 'main' + var.set('main changed') - def submit(self, fn, *args, **kwargs): - context = sys.get_execution_context() + loop = asyncio.get_event_loop() + loop.run_until_complete(main()) - fn = functools.partial( - sys.run_with_execution_context, context, - fn, *args, **kwargs) +In the above code we show how ``sub()``, running in a separate task, +sees the value of ``var`` as it was when ``loop.create_task(sub())`` +was called. - return super().submit(fn) +Like tasks, the intuitive behaviour of callbacks scheduled with either +``Loop.call_soon()``, ``Loop.call_later()``, or +``Future.add_done_callback()`` is to also capture a snapshot of the +current execution context at the point of scheduling, and use it to +run the callback:: + current_request = new_context_var() -Generators ----------- + def log_error(e): + logging.error('error when handling request %r', + current_request.lookup()) -Generators in Python are producers of data, and ``yield`` expressions -are used to suspend/resume their execution. When generators suspend -execution, their local state will "leak" to the outside code if they -store it in a TLS or in a global variable:: + async def render_response(): + ... - local = threading.local() + async def handle_get_request(request): + current_request.set(request) - def gen(): - old_x = local.x - local.x = 'spam' try: - yield - ... - yield - finally: - local.x = old_x + return await render_response() + except Exception as e: + get_event_loop().call_soon(log_error, e) + return '500 - Internal Server Error' -The above code will not work as many Python users expect it to work. -A simple ``next(gen())`` will set ``local.x`` to "spam" and it will -never be reset back to its original value. -One of the goals of this proposal is to provide a mechanism to isolate -local state in generators. +Detailed Specification +====================== +Conceptually, an *execution context* (EC) is a stack of logical +contexts. There is one EC per Python thread. -Generator Object Modifications -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +A *logical context* (LC) is a mapping of context variables to their +values in that particular LC. -To achieve this, we make a small set of modifications to the -generator object: +A *context variable* is an object representing a value in the +execution context. A new context variable object is created by calling +the ``sys.new_context_var(name: str)`` function. 
The value of the +``name`` argument is not used by the EC machinery, but may be used for +debugging and introspection. -* New ``__logical_context__`` attribute. This attribute is readable - and writable for Python code. +The context variable object has the following methods and attributes: -* When a generator object is instantiated its ``__logical_context__`` - is initialized with an empty ``LogicalContext``. +* ``name``: the value passed to ``new_context_var()``. -* Generator's ``.send()`` and ``.throw()`` methods are modified as - follows (in pseudo-C):: - - if gen.__logical_context__ is not NULL: - tstate = PyThreadState_Get() - - tstate.execution_context.push(gen.__logical_context__) - - try: - # Perform the actual `Generator.send()` or - # `Generator.throw()` call. - return gen.send(...) - finally: - gen.__logical_context__ = tstate.execution_context.pop() - else: - # Perform the actual `Generator.send()` or - # `Generator.throw()` call. - return gen.send(...) +* ``lookup()``: traverses the execution context top-to-bottom, + until the variable value is found. Returns ``None``, if the variable + is not present in the execution context; - If a generator has a non-NULL ``__logical_context__``, it will - be pushed to the EC and, therefore, generators will use it - to accumulate their local state. +* ``set()``: sets the value of the variable in the topmost logical + context. - If a generator has no ``__logical_context__``, generators will - will use whatever LC they are being run in. +Generators +---------- -EC Semantics for Generators -^^^^^^^^^^^^^^^^^^^^^^^^^^^ +When created, each generator object has an empty logical context object +stored in its ``__logical_context__`` attribute. This logical context +is pushed onto the execution context at the beginning of each generator +iteration and popped at the end:: -Every generator object has its own Logical Context that stores -only its own local modifications of the context. When a generator -is being iterated, its logical context will be put in the EC stack -of the current thread. 
This means that the generator will be able -to access keys from the surrounding context:: + var1 = sys.new_context_var('var1') + var2 = sys.new_context_var('var2') - local = sys.new_context_key("local") - global = sys.new_context_key("global") + def gen(): + var1.set('var1-gen') + var2.set('var2-gen') + + # EC = [ + # outer_LC(), + # gen_LC({var1: 'var1-gen', var2: 'var2-gen'}) + # ] + n = nested_gen() # nested_gen_LC is created + next(n) + # EC = [ + # outer_LC(), + # gen_LC({var1: 'var1-gen', var2: 'var2-gen'}) + # ] + + var1.set('var1-gen-mod') + var2.set('var2-gen-mod') + # EC = [ + # outer_LC(), + # gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}) + # ] + next(n) + + def nested_gen(): + # EC = [ + # outer_LC(), + # gen_LC({var1: 'var1-gen', var2: 'var2-gen'}), + # nested_gen_LC() + # ] + assert var1.lookup() == 'var1-gen' + assert var2.lookup() == 'var2-gen' + + var1.set('var1-nested-gen') + # EC = [ + # outer_LC(), + # gen_LC({var1: 'var1-gen', var2: 'var2-gen'}), + # nested_gen_LC({var1: 'var1-nested-gen'}) + # ] + yield - def generator(): - local.set('inside gen:') - while True: - print(local.get(), global.get()) - yield + # EC = [ + # outer_LC(), + # gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}), + # nested_gen_LC({var1: 'var1-nested-gen'}) + # ] + assert var1.lookup() == 'var1-nested-gen' + assert var2.lookup() == 'var2-gen-mod' - g = gen() + yield - local.set('hello') - global.set('spam') - next(g) + # EC = [outer_LC()] - local.set('world') - global.set('ham') - next(g) + g = gen() # gen_LC is created for the generator object `g` + list(g) - # Will print: - # inside gen: spam - # inside gen: ham + # EC = [outer_LC()] -Any changes to the EC in nested generators are invisible to the outer -generator:: +The snippet above shows the state of the execution context stack +throughout the generator lifespan. - local = sys.new_context_key("local") - def inner_gen(): - local.set('spam') - yield +contextlib.contextmanager +------------------------- - def outer_gen(): - local.set('ham') - yield from gen() - print(local.get()) +Earlier, we've used the following example:: - list(outer_gen()) + import decimal - # Will print: - # ham + # create a new context variable + decimal_prec = sys.new_context_var('decimal_prec') + # ... -Running generators without LC -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + def fractions(precision, x, y): + decimal_prec.set(precision) -If ``__logical_context__`` is set to ``None`` for a generator, -it will simply use the outer Logical Context. 
+
+    yield MyDecimal(x) / MyDecimal(y)
+    yield MyDecimal(x) / MyDecimal(y**2)

-The ``@contextlib.contextmanager`` decorator uses this mechanism to
-allow its generator to affect the EC::
+Let's extend it by adding a context manager::

-    item = sys.new_context_key('item')
+    @contextlib.contextmanager
+    def precision_context(prec):
+        old_prec = decimal_prec.lookup()

-    @contextmanager
-    def context(x):
-        old = item.get()
-        item.set('x')
         try:
+            decimal_prec.set(prec)
             yield
         finally:
-            item.set(old)
-
-    with context('spam'):
-
-        with context('ham'):
-            print(1, item.get())
-
-        print(2, item.get())
-
-    # Will print:
-    # 1 ham
-    # 2 spam
-
-
-Implementing Generators with Iterators
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The Execution Context API allows to fully replicate EC behaviour
-imposed on generators with a regular Python iterator class::
-
-    class Gen:
-
-        def __init__(self):
-            self.logical_context = sys.new_logical_context()
-
-        def __iter__(self):
-            return self
-
-        def __next__(self):
-            return sys.run_with_logical_context(
-                self.logical_context, self._next_impl)
-
-        def _next_impl(self):
-            # Actual __next__ implementation.
-            ...
+            decimal_prec.set(old_prec)

+Unfortunately, this would not work straight away, as the modification
+to the ``decimal_prec`` variable is confined to the
+``precision_context()`` generator, and therefore will not be visible
+inside the ``with`` block::

-yield from in generator-based coroutines
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    def fractions(precision, x, y):
+        # EC = [{}, {}]

-Prior to :pep:`492`, ``yield from`` was used as one of the mechanisms
-to implement coroutines in Python. :pep:`492` is built on top
-of ``yield from`` machinery, and it is even possible to make a
-generator compatible with async/await code by decorating it with
-``@types.coroutine`` (or ``@asyncio.coroutine``).
+        with precision_context(precision):
+            # EC becomes [{}, {}, {decimal_prec: precision}] in the
+            # *precision_context()* generator,
+            # but here the EC is still [{}, {}]

-Generators decorated with these decorators follow the Execution
-Context semantics described below in the
-`EC Semantics for Coroutines`_ section below.
+            # raises ValueError('could not find decimal precision')!
+            yield MyDecimal(x) / MyDecimal(y)
+            yield MyDecimal(x) / MyDecimal(y**2)

+The way to fix this is to set the generator's ``__logical_context__``
+attribute to ``None``. This will cause the generator to avoid
+modifying the execution context stack.

-yield from in generators
-^^^^^^^^^^^^^^^^^^^^^^^^
+We modify the ``contextlib.contextmanager()`` decorator to
+set ``genobj.__logical_context__`` to ``None`` to produce
+well-behaved context managers::

-Another ``yield from`` use is to compose generators. Essentially,
-``yield from gen()`` is a better version of
-``for v in gen(): yield v`` (read more about many subtle details
-in :pep:`380`.)
+    def fractions(precision, x, y):
+        # EC = [{}, {}]

-A crucial difference between ``await coro`` and ``yield value`` is
-that the former expression guarantees that the ``coro`` will be
-executed fully, while the latter is producing ``value`` and
-suspending the generator until it gets iterated again.
+ with precision_context(precision): + # EC = [{}, {decimal_prec: precision}] -Therefore, this proposal does not special case ``yield from`` -expression for regular generators:: - - item = sys.new_context_key('item') - - def nested(): - assert item.get() == 'outer' - item.set('inner') - yield + yield MyDecimal(x) / MyDecimal(y) + yield MyDecimal(x) / MyDecimal(y**2) - def outer(): - item.set('outer') - yield from nested() - assert item.get() == 'outer' + # EC becomes [{}, {decimal_prec: None}] -EC Semantics for Coroutines ---------------------------- +asyncio +------- -Python :pep:`492` coroutines are used to implement cooperative -multitasking. For a Python end-user they are similar to threads, -especially when it comes to sharing resources or modifying -the global state. +``asyncio`` uses ``Loop.call_soon``, ``Loop.call_later``, +and ``Loop.call_at`` to schedule the asynchronous execution of a +function. ``asyncio.Task`` uses ``call_soon()`` to further the +execution of the wrapped coroutine. -An event loop is needed to schedule coroutines. Coroutines that -are explicitly scheduled by the user are usually called Tasks. -When a coroutine is scheduled, it can schedule other coroutines using -an ``await`` expression. In async/await world, awaiting a coroutine -is equivalent to a regular function call in synchronous code. Thus, -Tasks are similar to threads. +We modify ``Loop.call_{at,later,soon}`` to accept the new +optional *execution_context* keyword argument, which defaults to +the copy of the current execution context:: -By drawing a parallel between regular multithreaded code and -async/await, it becomes apparent that any modification of the -execution context within one Task should be visible to all coroutines -scheduled within it. Any execution context modifications, however, -must not be visible to other Tasks executing within the same OS -thread. + def call_soon(self, callback, *args, execution_context=None): + if execution_context is None: + execution_context = sys.get_execution_context() -Similar to generators, coroutines have the new ``__logical_context__`` -attribute and same implementations of ``.send()`` and ``.throw()`` -methods. The key difference is that coroutines start with -``__logical_context__`` set to ``NULL`` (generators start with -an empty ``LogicalContext``.) + # ... some time later -This means that it is expected that the asynchronous library and -its Task abstraction will control how exactly coroutines interact -with Execution Context. + sys.run_with_execution_context( + execution_context, callback, args) +The ``sys.get_execution_context()`` function returns a shallow copy +of the current execution context. By shallow copy here we mean such +a new execution context that: -Tasks -^^^^^ +* lookups in the copy provide the same results as in the original + execution context, and +* any changes in the original execution context do not affect the + copy, and +* any changes to the copy do not affect the original execution + context. -In asynchronous frameworks like asyncio, coroutines are run by -an event loop, and need to be explicitly scheduled (in asyncio -coroutines are run by ``asyncio.Task``.) +Either of the following satisfy the copy requirements: -To enable correct Execution Context propagation into Tasks, the -asynchronous framework needs to assist the interpreter: +* a new stack with shallow copies of logical contexts; +* a new stack with one squashed logical context. 
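+For illustration, here is what these copy semantics look like from
+user code (a sketch in terms of the APIs proposed in this PEP; the
+comments show the intended output)::
+
+    var = sys.new_context_var('var')
+
+    def report():
+        print(var.lookup())
+
+    var.set('before')
+
+    # Capture a copy of the current execution context.
+    ec = sys.get_execution_context()
+
+    # Changing the variable afterwards does not affect the copy.
+    var.set('after')
+
+    sys.run_with_execution_context(ec, report)    # prints: before
+    report()                                      # prints: after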
-* When ``create_task`` is called, it should capture the current - execution context with ``sys.get_execution_context()`` and save it - on the Task object. +The ``sys.run_with_execution_context(ec, func, *args, **kwargs)`` +function runs ``func(*args, **kwargs)`` with *ec* as the execution +context. The function performs the following steps: -* The ``__logical_context__`` of the wrapped coroutine should be - initialized to a new empty logical context. +1. Set *ec* as the current execution context stack in the current + thread. +2. Push an empty logical context onto the stack. +3. Run ``func(*args, **kwargs)``. +4. Pop the logical context from the stack. +5. Restore the original execution context stack. +6. Return or raise the ``func()`` result. -* When the Task object runs its coroutine object, it should execute - ``.send()`` and ``.throw()`` methods within the captured - execution context, using the ``sys.run_with_execution_context()`` - function. +These steps ensure that *ec* cannot be modified by *func*, +which makes ``run_with_execution_context()`` idempotent. -For ``asyncio.Task``:: +``asyncio.Task`` is modified as follows:: class Task: def __init__(self, coro): ... - self.exec_context = sys.get_execution_context() - coro.__logical_context__ = sys.new_logical_context() + # Get the current execution context snapshot. + self._exec_context = sys.get_execution_context() + + self._loop.call_soon( + self._step, + execution_context=self._exec_context) - def _step(self, val): + def _step(self, exc=None): ... - sys.run_with_execution_context( - self.exec_context, - self.coro.send, val) + self._loop.call_soon( + self._step, + execution_context=self._exec_context) ... -This makes any changes to execution context made by nested coroutine -calls within a Task to be visible throughout the Task:: - ci = sys.new_context_key('ci') +Generators Transformed into Iterators +------------------------------------- - async def nested(): - ci.set('nested') +Any Python generator can be represented as an equivalent iterator. +Compilers like Cython rely on this axiom. With respect to the +execution context, such iterator should behave the same way as the +generator it represents. - async def main(): - ci.set('main') - print('before:', ci.get()) - await nested() - print('after:', ci.get()) +This means that there needs to be a Python API to create new logical +contexts and run code with a given logical context. - asyncio.get_event_loop().run_until_complete(main()) +The ``sys.new_logical_context()`` function creates a new empty +logical context. - # Will print: - # before: main - # after: nested +The ``sys.run_with_logical_context(lc, func, *args, **kwargs)`` +function can be used to run functions in the specified logical context. +The *lc* can be modified as a result of the call. -New Tasks, started within another Task, will run in the correct -execution context too:: +The ``sys.run_with_logical_context()`` function performs the following +steps: - current_request = sys.new_context_key('current_request') +1. Push *lc* onto the current execution context stack. +2. Run ``func(*args, **kwargs)``. +3. Pop *lc* from the execution context stack. +4. Return or raise the ``func()`` result. 
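+Because *lc* retains any changes made by *func*, the same logical
+context object can be reused across calls (a sketch using the proposed
+APIs; the comments show the intended output)::
+
+    var = sys.new_context_var('var')
+
+    def func():
+        print(var.lookup())
+        var.set('ham')
+
+    var.set('spam')
+    lc = sys.new_logical_context()
+
+    sys.run_with_logical_context(lc, func)    # prints: spam
+    sys.run_with_logical_context(lc, func)    # prints: ham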
- async def child(): - print('current request:', repr(current_request.get())) +By using ``new_logical_context()`` and ``run_with_logical_context()``, +we can replicate the generator behaviour like this:: - async def handle_request(request): - current_request.set(request) - event_loop.create_task(child) + class Generator: - run(top_coro()) - - # Will print: - # current_request: None - -The above snippet will run correctly, and the ``child()`` -coroutine will be able to access the current request object -through the ``current_request`` Context Key. + def __init__(self): + self.logical_context = sys.new_logical_context() -Any of the above examples would work if one the coroutines -was a generator decorated with ``@asyncio.coroutine``. + def __iter__(self): + return self + def __next__(self): + return sys.run_with_logical_context( + self.logical_context, self._next_impl) -Event Loop Callbacks -^^^^^^^^^^^^^^^^^^^^ + def _next_impl(self): + # Actual __next__ implementation. + ... -Similarly to Tasks, functions like asyncio's ``loop.call_soon()`` -should capture the current execution context with -``sys.get_execution_context()`` and execute callbacks -within it with ``sys.run_with_execution_context()``. +Let's see how this pattern can be applied to a real generator:: -This way the following code will work:: + # create a new context variable + decimal_prec = sys.new_context_var('decimal_precision') - current_request = sys.new_context_key('current_request') + def gen_series(n, precision): + decimal_prec.set(precision) - def log(): - request = current_request.get() - print(request) + for i in range(1, n): + yield MyDecimal(i) / MyDecimal(3) - async def request_handler(request): - current_request.set(request) - get_event_loop.call_soon(log) + # gen_series is equivalent to the following iterator: + class Series: -Asynchronous Generators ------------------------ + def __init__(self, n, precision): + # Create a new empty logical context on creation, + # like the generators do. + self.logical_context = sys.new_logical_context() -Asynchronous Generators (AG) interact with the Execution Context -similarly to regular generators. + # run_with_logical_context() will pushes + # self.logical_context onto the execution context stack, + # runs self._next_impl, and pops self.logical_context + # from the stack. + return sys.run_with_logical_context( + self.logical_context, self._init, n, precision) -They have an ``__logical_context__`` attribute, which, similarly to -regular generators, can be set to ``None`` to make them use the outer -Logical Context. This is used by the new -``contextlib.asynccontextmanager`` decorator. + def _init(self, n, precision): + self.i = 1 + self.n = n + decimal_prec.set(precision) + def __iter__(self): + return self -Greenlets ---------- + def __next__(self): + return sys.run_with_logical_context( + self.logical_context, self._next_impl) -Greenlet is an alternative implementation of cooperative -scheduling for Python. Although greenlet package is not part of -CPython, popular frameworks like gevent rely on it, and it is -important that greenlet can be modified to support execution -contexts. + def _next_impl(self): + decimal_prec.set(self.precision) + result = MyDecimal(self.i) / MyDecimal(3) + self.i += 1 + return result -In a nutshell, greenlet design is very similar to design of -generators. The main difference is that for generators, the stack -is managed by the Python interpreter. 
Greenlet works outside of the -Python interpreter, and manually saves some ``PyThreadState`` -fields and pushes/pops the C-stack. Thus the ``greenlet`` package -can be easily updated to use the new low-level `C API`_ to enable -full support of EC. +For regular iterators such approach to logical context management is +normally not necessary, and it is recommended to set and restore +context variables directly in ``__next__``:: + class Series: -New APIs -======== + def __next__(self): + old_prec = decimal_prec.lookup() -Python ------- + try: + decimal_prec.set(self.precision) + ... + finally: + decimal_prec.set(old_prec) -Python APIs were designed to completely hide the internal -implementation details, but at the same time provide enough control -over EC and LC to re-implement all of Python built-in objects -in pure Python. -1. ``sys.new_context_key(name: str='...')``: create a - ``ContextKey`` object used to access/set values in EC. +Asynchronous Generators +----------------------- -2. ``ContextKey``: +The execution context semantics in asynchronous generators does not +differ from that of regular generators and coroutines. + + +Implementation +============== + +Execution context is implemented as an immutable linked list of +logical contexts, where each logical context is an immutable weak key +mapping. A pointer to the currently active execution context is stored +in the OS thread state:: + + +-----------------+ + | | ec + | PyThreadState +-------------+ + | | | + +-----------------+ | + | + ec_node ec_node ec_node v + +------+------+ +------+------+ +------+------+ + | NULL | lc |<----| prev | lc |<----| prev | lc | + +------+--+---+ +------+--+---+ +------+--+---+ + | | | + LC v LC v LC v + +-------------+ +-------------+ +-------------+ + | var1: obj1 | | EMPTY | | var1: obj4 | + | var2: obj2 | +-------------+ +-------------+ + | var3: obj3 | + +-------------+ + +The choice of the immutable list of immutable mappings as a fundamental +data structure is motivated by the need to efficiently implement +``sys.get_execution_context()``, which is to be frequently used by +asynchronous tasks and callbacks. When the EC is immutable, +``get_execution_context()`` can simply copy the current execution +context *by reference*:: + + def get_execution_context(self): + return PyThreadState_Get().ec + +Let's review all possible context modification scenarios: + +* The ``ContextVariable.set()`` method is called:: + + def ContextVar_set(self, val): + # See a more complete set() definition + # in the `Context Variables` section. - * ``.name``: read-only attribute. - * ``.get()``: return the current value for the key. - * ``.set(o)``: set the current value in the EC for the key. + tstate = PyThreadState_Get() + top_ec_node = tstate.ec + top_lc = top_ec_node.lc + new_top_lc = top_lc.set(self, val) + tstate.ec = ec_node( + prev=top_ec_node.prev, + lc=new_top_lc) -3. ``sys.get_execution_context()``: return the current - ``ExecutionContext``. +* The ``sys.run_with_logical_context()`` is called, in which case + the passed logical context object is appended to the + execution context:: -4. ``sys.new_execution_context()``: create a new empty - ``ExecutionContext``. + def run_with_logical_context(lc, func, *args, **kwargs): + tstate = PyThreadState_Get() -5. ``sys.new_logical_context()``: create a new empty - ``LogicalContext``. + old_top_ec_node = tstate.ec + new_top_ec_node = ec_node(prev=old_top_ec_node, lc=lc) -6. ``sys.run_with_execution_context(ec: ExecutionContext, - func, *args, **kwargs)``. 
+ try: + tstate.ec = new_top_ec_node + return func(*args, **kwargs) + finally: + tstate.ec = old_top_ec_node -7. ``sys.run_with_logical_context(lc:LogicalContext, - func, *args, **kwargs)``. +* The ``sys.run_with_execution_context()`` is called, in which case + the current execution context is set to the passed execution context + with a new empty logical context appended to it:: + def run_with_execution_context(ec, func, *args, **kwargs): + tstate = PyThreadState_Get() -C API ------ + old_top_ec_node = tstate.ec + new_lc = sys.new_logical_context() + new_top_ec_node = ec_node(prev=ec, lc=new_lc) -1. ``PyContextKey * PyContext_NewKey(char *desc)``: create a - ``PyContextKey`` object. + try: + tstate.ec = new_top_ec_node + return func(*args, **kwargs) + finally: + tstate.ec = old_top_ec_node -2. ``PyObject * PyContext_GetKey(PyContextKey *)``: get the - current value for the context key. +* Either ``genobj.send()``, ``genobj.throw()``, ``genobj.close()`` + are called on a ``genobj`` generator, in which case the logical + context recorded in ``genobj`` is pushed onto the stack:: -3. ``int PyContext_SetKey(PyContextKey *, PyObject *)``: set - the current value for the context key. + PyGen_New(PyGenObject *gen): + gen.__logical_context__ = sys.new_logical_context() -4. ``PyLogicalContext * PyLogicalContext_New()``: create a new empty - ``PyLogicalContext``. + gen_send(PyGenObject *gen, ...): + tstate = PyThreadState_Get() -5. ``PyLogicalContext * PyExecutionContext_New()``: create a new empty - ``PyExecutionContext``. + if gen.__logical_context__ is not None: + old_top_ec_node = tstate.ec + new_top_ec_node = ec_node( + prev=old_top_ec_node, + lc=gen.__logical_context__) + + try: + tstate.ec = new_top_ec_node + return _gen_send_impl(gen, ...) + finally: + gen.__logical_context__ = tstate.ec.lc + tstate.ec = old_top_ec_node + else: + return _gen_send_impl(gen, ...) + +* Coroutines and asynchronous generators share the implementation + with generators, and the above changes apply to them as well. + +In certain scenarios the EC may need to be squashed to limit the +size of the chain. For example, consider the following corner case:: + + async def repeat(coro, delay): + await coro() + await asyncio.sleep(delay) + loop.create_task(repeat(coro, delay)) + + async def ping(): + print('ping') + + loop = asyncio.get_event_loop() + loop.create_task(repeat(ping, 1)) + loop.run_forever() + +In the above code, the EC chain will grow as long as ``repeat()`` is +called. Each new task will call ``sys.run_in_execution_context()``, +which will append a new logical context to the chain. To prevent +unbounded growth, ``sys.get_execution_context()`` checks if the chain +is longer than a predetermined maximum, and if it is, squashes the +chain into a single LC:: -6. ``PyExecutionContext * PyExecutionContext_Get()``: get the - EC for the active thread state. + def get_execution_context(): + tstate = PyThreadState_Get() -7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the - passed EC object as the current for the active thread state. + if tstate.ec_len > EC_LEN_MAX: + squashed_lc = sys.new_logical_context() -8. ``int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *, - PyLogicalContext *)``: allows to implement - ``sys.run_with_logical_context`` Python API. + ec_node = tstate.ec + while ec_node: + # The LC.merge() method does not replace existing keys. 
+                squashed_lc = squashed_lc.merge(ec_node.lc)
+                ec_node = ec_node.prev
+            return ec_node(prev=NULL, lc=squashed_lc)
+        else:
+            return tstate.ec

-Implementation Strategy
-=======================

-LogicalContext is a Weak Key Mapping
-------------------------------------
+Logical Context
+---------------

-Using a weak key mapping for ``LogicalContext`` implementation
-enables the following properties with regards to garbage
-collection:
+Logical context is an immutable weak key mapping which has the
+following properties with respect to garbage collection:

-* ``ContextKey`` objects are strongly-referenced only from the
-  application code, not from any of the Execution Context
-  machinery or values they point to. This means that there
-  are no reference cycles that could extend their lifespan
-  longer than necessary, or prevent their garbage collection.
+* ``ContextVar`` objects are strongly-referenced only from the
+  application code, not from any of the Execution Context machinery
+  or values they point to. This means that there are no reference
+  cycles that could extend their lifespan longer than necessary, or
+  prevent their collection by the GC.

 * Values put in the Execution Context are guaranteed to be kept
-  alive while there is a ``ContextKey`` key referencing them in
+  alive while there is a ``ContextVar`` key referencing them in
   the thread.

-* If a ``ContextKey`` is garbage collected, all of its values will
+* If a ``ContextVar`` is garbage collected, all of its values will
   be removed from all contexts, allowing them to be GCed if needed.

 * If a thread has ended its execution, its thread state will be
   cleaned up along with its ``ExecutionContext``, cleaning
-  up all values bound to all Context Keys in the thread.
+  up all values bound to all context variables in the thread.
+
+As discussed earlier, we need ``sys.get_execution_context()`` to be
+consistently fast regardless of the size of the execution context, so
+logical context is necessarily an immutable mapping.
+Choosing ``dict`` for the underlying implementation is suboptimal,
+because ``LC.set()`` will cause ``dict.copy()``, which is an O(N)
+operation, where *N* is the number of items in the LC.

-ContextKey.get() Cache
-----------------------

-We can add three new fields to ``PyThreadState`` and
-``PyInterpreterState`` structs:
+``get_execution_context()``, when squashing the EC, is an O(M)
+operation, where *M* is the total number of context variable values
+in the EC.

-* ``uint64_t PyThreadState->unique_id``: a globally unique
-  thread state identifier (we can add a counter to
-  ``PyInterpreterState`` and increment it when a new thread state is
-  created.)
+So, instead of ``dict``, we choose Hash Array Mapped Trie (HAMT)
+as the underlying implementation of logical contexts. (Scala and
+Clojure use HAMT to implement high performance immutable collections
+[5]_, [6]_.)

-* ``uint64_t ContextKey->version``: every time the key is updated
-  in any logical context or thread, this key will be incremented.
+With HAMT ``.set()`` becomes an O(log N) operation, and
+``get_execution_context()`` squashing is more efficient on average due
+to structural sharing in HAMT.

-The above two fields allow implementing a fast cache path in
-``ContextKey.get()``, in pseudo-code::
+See `Appendix: HAMT Performance Analysis`_ for a more elaborate
+analysis of HAMT performance compared to ``dict``.
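+To make the required mapping semantics concrete, below is a minimal
+pure-Python stand-in for a logical context mapping (the class and
+method names are illustrative only). It copies a ``dict`` on every
+``set()``, which is exactly the O(N) behaviour that HAMT avoids, and
+unlike the real logical context it is not a weak key mapping::
+
+    class ImmutableLC:
+        """Immutable mapping: set() returns a new mapping."""
+
+        def __init__(self, items=None):
+            self._items = dict(items or {})
+
+        def set(self, var, value):
+            new_items = self._items.copy()   # O(N) copy; HAMT is O(log N)
+            new_items[var] = value
+            return ImmutableLC(new_items)    # the original is unchanged
+
+        def get(self, var, default=None):
+            return self._items.get(var, default)
+
+        def merge(self, other):
+            # Keys already present in this LC are not replaced, as
+            # required by the EC squashing step shown above.
+            merged = dict(other._items)
+            merged.update(self._items)
+            return ImmutableLC(merged)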
-The above two fields allow implementing a fast cache path in -``ContextKey.get()``, in pseudo-code:: - class ContextKey: +Context Variables +----------------- + +The ``ContextVar.lookup()`` and ``ContextVar.set()`` methods are +implemented as follows (in pseudo-code):: + + class ContextVar: + + def get(self): + tstate = PyThreadState_Get() + + ec_node = tstate.ec + while ec_node: + if self in ec_node.lc: + return ec_node.lc[self] + ec_node = ec_node.prev + + return None + + def set(self, value): + tstate = PyThreadState_Get() + top_ec_node = tstate.ec + + if top_ec_node is not None: + top_lc = top_ec_node.lc + new_top_lc = top_lc.set(self, value) + tstate.ec = ec_node( + prev=top_ec_node.prev, + lc=new_top_lc) + else: + top_lc = sys.new_logical_context() + new_top_lc = top_lc.set(self, value) + tstate.ec = ec_node( + prev=NULL, + lc=new_top_lc) + +For efficient access in performance-sensitive code paths, such as in +``numpy`` and ``decimal``, we add a cache to ``ContextVar.get()``, +making it an O(1) operation when the cache is hit. The cache key is +composed from the following: + +* The new ``uint64_t PyThreadState->unique_id``, which is a globally + unique thread state identifier. It is computed from the new + ``uint64_t PyInterpreterState->ts_counter``, which is incremented + whenever a new thread state is created. + +* The ``uint64_t ContextVar->version`` counter, which is incremented + whenever the context variable value is changed in any logical context + in any thread. + +The cache is then implemented as follows:: + + class ContextVar: def set(self, value): ... # implementation @@ -892,11 +993,7 @@ The above two fields allow implementing a fast cache path in self.last_version == self.version): return self.last_value - value = None - for mapping in reversed(tstate.execution_context): - if self in mapping: - value = mapping[self] - break + value = self._get_uncached() self.last_value = value # borrowed ref self.last_tstate_id = tstate.unique_id @@ -905,158 +1002,130 @@ The above two fields allow implementing a fast cache path in return value Note that ``last_value`` is a borrowed reference. The assumption -is that if current thread and key version tests are OK, the object -will be alive. This allows the CK values to be properly GCed. - -This is similar to the trick that decimal C implementation uses -for caching the current decimal context, and will have the same -performance characteristics, but available to all -Execution Context users. - - -Approach #1: Use a dict for LogicalContext ------------------------------------------- +is that if the version checks are fine, the object will be alive. +This allows the values of context variables to be properly garbage +collected. -The straightforward way of implementing the proposed EC -mechanisms is to create a ``WeakKeyDict`` on top of Python -``dict`` type. +This generic caching approach is similar to what the current C +implementation of ``decimal`` does to cache the the current decimal +context, and has similar performance characteristics. -To implement the ``ExecutionContext`` type we can use Python -``list`` (or a custom stack implementation with some -pre-allocation optimizations). -This approach will have the following runtime complexity: - -* O(M) for ``ContextKey.get()``, where ``M`` is the number of - Logical Contexts in the stack. - - It is important to note that ``ContextKey.get()`` will implement - a cache making the operation O(1) for packages like ``decimal`` - and ``numpy``. 
+Performance Considerations +========================== -* O(1) for ``ContextKey.set()``. +Tests of the reference implementation based on the prior +revisions of this PEP have shown 1-2% slowdown on generator +microbenchmarks and no noticeable difference in macrobenchmarks. -* O(N) for ``sys.get_execution_context()``, where ``N`` is the - total number of keys/values in the current **execution** context. +The performance of non-generator and non-async code is not +affected by this PEP. -Approach #2: Use HAMT for LogicalContext ----------------------------------------- +Summary of the New APIs +======================= -Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT) -to implement high performance immutable collections [5]_, [6]_. +Python +------ -Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N) -performance for both ``set()``, ``get()``, and ``merge()`` operations, -which is essentially O(1) for relatively small mappings -(read about HAMT performance in CPython in the -`Appendix: HAMT Performance`_ section.) +The following new Python APIs are introduced by this PEP: -In this approach we use the same design of the ``ExecutionContext`` -as in Approach #1, but we will use HAMT backed weak key Logical Context -implementation. With that we will have the following runtime -complexity: +1. The ``sys.new_context_var(name: str='...')`` function to create + ``ContextVar`` objects. -* O(M * log\ :sub:`32`\ N) for ``ContextKey.get()``, - where ``M`` is the number of Logical Contexts in the stack, - and ``N`` is the number of keys/values in the EC. The operation - will essentially be O(M), because execution contexts are normally - not expected to have more than a few dozen of keys/values. +2. The ``ContextVar`` object, which has: - (``ContextKey.get()`` will have the same caching mechanism as in - Approach #1.) + * the read-only ``.name`` attribute, + * the ``.lookup()`` method which returns the value of the variable + in the current execution context; + * the ``.set()`` method which sets the value of the variable in + the current execution context. -* O(log\ :sub:`32`\ N) for ``ContextKey.set()`` where ``N`` is the - number of keys/values in the current **logical** context. This will - essentially be an O(1) operation most of the time. +3. The ``sys.get_execution_context()`` function, which returns a + copy of the current execution context. -* O(log\ :sub:`32`\ N) for ``sys.get_execution_context()``, where - ``N`` is the total number of keys/values in the current - **execution** context. +4. The ``sys.new_execution_context()`` function, which returns a new + empty execution context. -Essentially, using HAMT for Logical Contexts instead of Python dicts, -allows to bring down the complexity of ``sys.get_execution_context()`` -from O(N) to O(log\ :sub:`32`\ N) because of the more efficient -merge algorithm. +5. The ``sys.new_logical_context()`` function, which returns a new + empty logical context. +6. The ``sys.run_with_execution_context(ec: ExecutionContext, + func, *args, **kwargs)`` function, which runs *func* with the + provided execution context. -Approach #3: Use HAMT and Immutable Linked List ------------------------------------------------ +7. The ``sys.run_with_logical_context(lc:LogicalContext, + func, *args, **kwargs)`` function, which runs *func* with the + provided logical context on top of the current execution context. -We can make an alternative ``ExecutionContext`` design by using -a linked list. 
-object will be wrapped in a linked-list node.
-``LogicalContext`` objects will use an HAMT backed weak key
-implementation described in the Approach #2.
+C API
+-----

-Every modification to the current ``LogicalContext`` will produce a
-new version of it, which will be wrapped in a **new linked list
-node**. Essentially this means, that ``ExecutionContext`` is an
-immutable forest of ``LogicalContext`` objects, and can be safely
-copied by reference in ``sys.get_execution_context()`` (eliminating
-the expensive "merge" operation.)
+1. ``PyContextVar * PyContext_NewVar(char *desc)``: create a
+   ``PyContextVar`` object.

-With this approach, ``sys.get_execution_context()`` will be a
-constant time **O(1) operation**.
+2. ``PyObject * PyContext_LookupVar(PyContextVar *)``: return
+   the value of the variable in the current execution context.

-In case we decide to apply additional optimizations such as
-flattening ECs with too many Logical Contexts, HAMT-backed
-immutable mapping will have a O(log\ :sub:`32`\ N) merge
-complexity.
+3. ``int PyContext_SetVar(PyContextVar *, PyObject *)``: set
+   the value of the variable in the current execution context.
+
+4. ``PyLogicalContext * PyLogicalContext_New()``: create a new empty
+   ``PyLogicalContext``.

-Summary
--------
+5. ``PyExecutionContext * PyExecutionContext_New()``: create a new empty
+   ``PyExecutionContext``.

-We believe that approach #3 enables an efficient and complete
-Execution Context implementation, with excellent runtime performance.
+6. ``PyExecutionContext * PyExecutionContext_Get()``: return the
+   current execution context.

-`ContextKey.get() Cache`_ enables fast retrieval of context keys
-for performance critical libraries like decimal and numpy.
+7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
+   passed EC object as the current execution context for the active
+   thread state.

-Fast ``sys.get_execution_context()`` enables efficient management
-of execution contexts in asynchronous libraries like asyncio.
+8. ``int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *,
+   PyLogicalContext *)``: allows implementing the
+   ``sys.run_with_logical_context`` Python API.

 Design Considerations
 =====================

-Can we fix ``PyThreadState_GetDict()``?
----------------------------------------
+Should ``PyThreadState_GetDict()`` use the execution context?
+-------------------------------------------------------------

-``PyThreadState_GetDict`` is a TLS, and some of its existing users
-might depend on it being just a TLS. Changing its behaviour to follow
-the Execution Context semantics would break backwards compatibility.
+No. ``PyThreadState_GetDict`` is based on TLS, and changing its
+semantics will break backwards compatibility.

 PEP 521
 -------

-:pep:`521` proposes an alternative solution to the problem:
-enhance Context Manager Protocol with two new methods: ``__suspend__``
-and ``__resume__``. To make it compatible with async/await,
-the Asynchronous Context Manager Protocol will also need to be
-extended with ``__asuspend__`` and ``__aresume__``.
+:pep:`521` proposes an alternative solution to the problem, which
+extends the context manager protocol with two new methods:
+``__suspend__()`` and ``__resume__()``. Similarly, the asynchronous
+context manager protocol is also extended with ``__asuspend__()`` and
+``__aresume__()``.

-This allows to implement context managers like decimal context and
-``numpy.errstate`` for generators and coroutines.
+This allows implementing context managers that manage non-local state,
+which behave correctly in generators and coroutines.

-The following code::
+For example, consider the following context manager, which uses
+execution state::

     class Context:

         def __init__(self):
-            self.key = new_context_key('key')
+            self.var = new_context_var('var')

         def __enter__(self):
-            self.old_x = self.key.get()
-            self.key.set('something')
+            self.old_x = self.var.lookup()
+            self.var.set('something')

         def __exit__(self, *err):
-            self.key.set(self.old_x)
+            self.var.set(self.old_x)

-would become this::
+An equivalent implementation with PEP 521::

     local = threading.local()

@@ -1075,26 +1144,21 @@ would become this::
         def __exit__(self, *err):
             local.x = self.old_x

-Besides complicating the protocol, the implementation will likely
-negatively impact performance of coroutines, generators, and any code
-that uses context managers, and will notably complicate the
-interpreter implementation.
+The downside of this approach is the addition of significant new
+complexity to the context manager protocol and the interpreter
+implementation. This approach is also likely to negatively impact
+the performance of generators and coroutines.

-:pep:`521` also does not provide any mechanism to propagate state
-in a logical context, like storing a request object in an HTTP request
-handler to have better logging. Nor does it solve the leaking state
-problem for greenlet/gevent.
+Additionally, the solution in :pep:`521` is limited to context managers,
+and does not provide any mechanism to propagate state in asynchronous
+tasks and callbacks.

 Can Execution Context be implemented outside of CPython?
 --------------------------------------------------------

-Because async/await code needs an event loop to run it, an EC-like
-solution can be implemented in a limited way for coroutines.
-
-Generators, on the other hand, do not have an event loop or
-trampoline, making it impossible to intercept their ``yield`` points
-outside of the Python interpreter.
+No. Proper generator behaviour with respect to the execution context
+requires changes to the interpreter.


 Should we update sys.displayhook and other APIs to use EC?
@@ -1111,44 +1175,41 @@ That said we think it is possible to design new APIs that will be
 context aware, but that is outside of the scope of this PEP.


+Greenlets
+---------
+
+Greenlet is an alternative implementation of cooperative
+scheduling for Python. Although the greenlet package is not part of
+CPython, popular frameworks like gevent rely on it, and it is
+important that greenlet can be modified to support execution
+contexts.
+
+Conceptually, the behaviour of greenlets is very similar to that of
+generators, which means that similar changes around greenlet entry
+and exit can be made to add support for the execution context.
+
+
 Backwards Compatibility
 =======================

 This proposal preserves 100% backwards compatibility.


-Appendix: HAMT Performance
-==========================
-
-While investigating possibilities of how to implement an immutable
-mapping in CPython, we were able to improve the efficiency
-of ``dict.copy()`` up to 5 times: [4]_. One caveat is that the
-improved ``dict.copy()`` does not resize the dict, which is a
-necessary thing to do when items get deleted from the dict.
-Which means that we can make ``dict.copy()`` faster for only dicts
-that don't need to be resized, and the ones that do, will use
-a slower version.
-
-To assess if HAMT can be used for Execution Context, we implemented
-it in CPython [7]_.
+Appendix: HAMT Performance Analysis
+===================================

-.. figure:: pep-0550-hamt_vs_dict.png
+.. figure:: pep-0550-hamt_vs_dict-v2.png
    :align: center
    :width: 100%

    Figure 1. Benchmark code can be found here: [9]_.

-The chart illustrates the following:
+The above chart demonstrates that:

 * HAMT displays near O(1) performance for all benchmarked
   dictionary sizes.

-* If we can use the optimized ``dict.copy()`` implementation ([4]_),
-  the performance of immutable mapping implemented with Python
-  ``dict`` is good up until 100 items.
-
-* A dict with an unoptimized ``dict.copy()`` becomes very slow
-  around 100 items.
+* ``dict.copy()`` becomes very slow around 100 items.

 .. figure:: pep-0550-lookup_hamt.png
    :align: center
@@ -1156,30 +1217,25 @@ The chart illustrates the following:

    Figure 2. Benchmark code can be found here: [10]_.

-Figure 2 shows comparison of lookup costs between Python dict
-and an HAMT immutable mapping. HAMT lookup time is 30-40% worse
-than Python dict lookups on average, which is a very good result,
-considering how well Python dicts are optimized.
-
-Note, that according to [8]_, HAMT design can be further improved.
+Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based
+immutable mapping. HAMT lookup time is 30-40% slower than Python dict
+lookups on average, which is a very good result, considering that the
+latter is very well optimized.

-The bottom line is that it is possible to imagine a scenario when
-an application has more than 100 items in the Execution Context, in
-which case the dict-backed implementation of an immutable mapping
-becomes a subpar choice.
+There is research [8]_ showing that there are further possible
+improvements to the performance of HAMT.

-HAMT on the other hand guarantees that its ``set()``, ``get()``,
-and ``merge()`` operations will execute in O(log\ :sub:`32`\ ) time,
-which means it is a more future proof solution.
+The reference implementation of HAMT for CPython can be found here:
+[7]_.

 Acknowledgments
 ===============

-I thank Elvis Pranskevichus and Victor Petrovykh for countless
-discussions around the topic and PEP proof reading and edits.
+Thanks to Victor Petrovykh for countless discussions around the topic
+and PEP proofreading and edits.

-Thanks to Nathaniel Smith for proposing the ``ContextKey`` design
+Thanks to Nathaniel Smith for proposing the ``ContextVar`` design
 [17]_ [18]_, for pushing the PEP towards a more complete design, and
 coming up with the idea of having a stack of contexts in the thread
 state.

@@ -1192,9 +1248,9 @@ rewrite of the initial PEP version [19]_.
 Version History
 ===============

-1. Posted on 11-Aug-2017, view it here: [20]_.
+1. Initial revision, posted on 11-Aug-2017 [20]_.

-2. Posted on 15-Aug-2017, view it here: [21]_.
+2. V2 posted on 15-Aug-2017 [21]_.

    The fundamental limitation that caused a complete redesign of the
    first version was that it was not possible to implement an iterator
@@ -1204,7 +1260,7 @@ Version History
    Version 2 was a complete rewrite, introducing new terminology
    (Local Context, Execution Context, Context Item) and new APIs.

-3. Posted on 18-Aug-2017: the current version.
+3. V3 posted on 18-Aug-2017 [22]_.

    Updates:

@@ -1212,18 +1268,23 @@ Version History
      was ambiguous and conflicted with local name scopes.

    * Context Item was renamed to Context Key, see the thread with Nick
-     Coghlan, Stefan Krah, and Yury Selivanov [22]_ for details.
+     Coghlan, Stefan Krah, and Yury Selivanov [23]_ for details.
* Context Item get cache design was adjusted, per Nathaniel Smith's - idea in [24]_. + idea in [25]_. * Coroutines are created without a Logical Context; ceval loop no longer needs to special case the ``await`` expression - (proposed by Nick Coghlan in [23]_.) + (proposed by Nick Coghlan in [24]_.) + +4. V4 posted on 25-Aug-2017: the current version. - * `Appendix: HAMT Performance`_ section was updated with more - details about the proposed ``dict.copy()`` optimization and - its limitations. + * The specification section has been completely rewritten. + + * Context Key renamed to Context Var. + + * Removed the distinction between generators and coroutines with + respect to logical context isolation. References @@ -1271,11 +1332,13 @@ References .. [21] https://github.com/python/peps/blob/e3aa3b2b4e4e9967d28a10827eed1e9e5960c175/pep-0550.rst -.. [22] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html +.. [22] https://github.com/python/peps/blob/287ed87bb475a7da657f950b353c71c1248f67e7/pep-0550.rst + +.. [23] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html -.. [23] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html +.. [24] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html -.. [24] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html +.. [25] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html Copyright