diff --git a/doc/neps/deferred-execution.md b/doc/neps/deferred-execution.md new file mode 100644 index 000000000000..fa6bdf5009b6 --- /dev/null +++ b/doc/neps/deferred-execution.md @@ -0,0 +1,94 @@ +# Deferred Execution +- Author: Matt Harrigan +- Created: 03-April-2017 + +## Abstract +This NEP describes a proposal to add deferred execution of operations on arrays. This will allow various performance +optimizations, most notably reducing memory bandwidth by processing cache sized blocks through multiple operations in a +single pass. This NEP is largely a variation on this [NEP](https://docs.scipy.org/doc/numpy-dev/neps/deferred-ufunc-evaluation.html) +The key difference is leveraging the `__array_ufunc__` method for a cleaner API. + +## Proposed API +The functionality would be defined in a new class `DeferredExecution`. Two key methods contain the core of the proposed +functionality: +- execute(): returns the result of all operations as an ndarray, or a namedtuple of arrays for the result and each output call. +- output(name=None): can be called after any operation so the subsequent execute call additionally returns the result at that stage + of the operator graph. Name is an optional argument which defines the variable name in the output namedtuple, default is + 'output_{}''.format(index) + +Other required methods to complete the basic functionality would be: +- the constructor would take an ndarray and wrap it +- methods for debug printing and introspection +- many of the special methods for basic arithmetic, logical operations, assignment, and indexing + +## Example Usage +``` +def foo(arr): + # function to convert from degree to radians and find the sum of squares + # very little modification to typical numpy code is required for deferred execution + + # create deferred execution array + deferred = DeferredExecutionArray(arr) + + # created the first operation + # no calculations are performed + # the returned object has references to the inputs and operations, the beginning of the operation graph + scaled = deferred * np.pi / 180 + + # add the next operations to the deferred array + # regular ufuncs can be used + sum_square = np.add.reduce(np.square(scaled) + + # execute, returning the result as an ndarray + result = sum_square.execute() + + return result + + +def bar(arr): + plus1 = DeferredExecutionArray(arr) + 1 + plus1.output() + plus2 = plus1 + 1 + + # returns a namedtuple of DeferredExecutionResult(output_0=arr+1, result=arr+2) + # arguments are returned in order of persist calls, last being the final result + result = plus2.execute() +``` + +## Rationale +### output +A user would often want to keep intermediate results in a long chain of operations. Therefore output() was added. It +allows for those arrays to be returned, but also allows for optimization over the largest possible number of operations, +since execute would typically only be called once. The optimization algorithm should know what needs to be returned, +since if an array is only a temporary intermediate, more optimizations may be possible such as operation reordering and +further reductions in memory bandwidth. It also would allow for reduction in total memory required. Therefore the +proposed API requires the user to explicitly indicate any intermediate results that must be returned. + +### execute +The prior NEP had implicit execution. For example, a print statement may trigger an execution. Many other possible +functions could trigger an execute. That adds quite a bit of complexity. It also limits the optimization opportunities. +Therefore in the proposed API all executions are explicitly triggered by the user invoking the execute method. Calling +print on an instance of DeferredExecutionArray would print (hopefully useful) debug information about the inputs and +operations it refers to, but would not trigger and execution. + +## Implementation +The key to allowing for this functionality is `__array_ufunc__`. It allows for arbitrary objects to be returned when +normal ufuncs are used. In this case the returned object is a proxy which keeps track of inputs and operations as a +graph. They are executed efficiently after performing any number of optimizations. + +DeferredExecution is not a subclass of ndarray. The main reason is that it would violate the substitution principle. +DeferredExecution objects are fundamentally quite different in many respects. + +If any input or output argument if of type DeferredExecution, the result is a DeferredExecution object. This applys +to ufuncs, assignment, and selection. Passing a regular ndarry as out when other arguments are DeferredExecution objects +is a TypeError. + +## Error Handling +Error checking that does not require the operation to be performed should be done when that operation is defined. A +common example would be checking for compatible shapes. Errors during execution such as divide by zero are propagated +up when execute is called. + +## Roadmap +Initially a subset of numpy functionality could be supported. Supporting regular ufuncs (no core dimensions) might +be a good initial start. Subsequently other generalized ufuncs or non-ufunc functions could be added. Also the +number and complexity of optimizations could increase over time.