
Feature Request: Enhanced Run Lifecycle Management - Interrupt and Update Active Runs #798

Open
chiehmin-wei opened this issue Jun 2, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@chiehmin-wei

Is your feature request related to a problem? Please describe.
The current Runner.run and Runner.run_streamed methods in src/agents/run.py execute an agent workflow to completion and return a final RunResult or RunResultStreaming. This design doesn't allow for real-time interruption of an ongoing run or the injection of new information that could alter the agent's subsequent actions. This is a limitation for dynamic use cases like on-call incident response, where an incident might be resolved mid-investigation or new critical findings might emerge.

Describe the solution you'd like
We propose enhancements to the Runner and its associated components to support a more dynamic run lifecycle:

  1. Cancellable/Interruptible Runs:

    • Need: The Runner.run and Runner.run_streamed methods should return an object (e.g., an ActiveRun or Task-like handle) that exposes a method to signal cancellation (e.g., active_run.cancel()).
    • How it could work:
      • The while True loop within Runner.run (and the equivalent in _run_streamed_impl) would need to check an internal flag or an asyncio.Event set by the cancel() method on the returned handle.
      • Upon detecting a cancellation request, the loop would terminate gracefully, potentially allowing for a final cleanup span and then raising a specific RunCancelledException or returning a RunResult indicating cancellation.
      • For run_streamed, the _run_impl_task (an asyncio.Task) created in Runner.run_streamed is a natural candidate for cancellation. The RunResultStreaming object could expose a method that cancels this underlying task. (A combined sketch of this handle and the injection mechanism from item 2 follows this list.)
  2. Dynamically Add Information to an Active Run:

    • Need: A mechanism to inject new information (e.g., new user messages or tool-like observations) into an active run, so the agent can consider it in subsequent turns.
    • How it could work:
      • The object returned by run/run_streamed could also expose a method like active_run.add_information(item: TResponseInputItem) or active_run.update_history(items: list[TResponseInputItem]).
      • Internally, this method would need to append to a queue or a shared list that _run_single_turn (or its streaming equivalent) checks before preparing the input for model.get_response or model.stream_response.
      • Specifically, the input variable within _run_single_turn or _run_single_turn_streamed (which is currently built from original_input and generated_items or streamed_result.input and streamed_result.new_items) would incorporate this externally added information before the next model invocation.
      • A new type of RunItem, or a way to flag these externally added items in traces, might be beneficial for debugging; see the sketch after this list.
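
A minimal sketch of what such a handle could look like, covering both the cancellation flag and the injection queue described above. ActiveRun, RunCancelledException, add_information, and drain_pending are proposed or invented names for illustration, not existing SDK API; TResponseInputItem is approximated here by a plain dict.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any

# Stand-in for the SDK's TResponseInputItem type.
TResponseInputItem = dict[str, Any]


class RunCancelledException(Exception):
    """Raised (or surfaced in the result) when an active run is cancelled."""


@dataclass
class ActiveRun:
    """Hypothetical handle that Runner.run / Runner.run_streamed could return."""
    _cancel_event: asyncio.Event = field(default_factory=asyncio.Event)
    _pending_items: list[TResponseInputItem] = field(default_factory=list)
    _lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    def cancel(self) -> None:
        # Signal the run loop; it checks this event at the top of each turn.
        self._cancel_event.set()

    def cancelled(self) -> bool:
        return self._cancel_event.is_set()

    async def add_information(self, item: TResponseInputItem) -> None:
        # Queue an externally supplied item for inclusion in the next turn's input.
        async with self._lock:
            self._pending_items.append(item)

    async def drain_pending(self) -> list[TResponseInputItem]:
        # Called by the run loop before building the next model request.
        async with self._lock:
            items, self._pending_items = self._pending_items, []
            return items


async def run_loop(active_run: ActiveRun, turn_input: list[TResponseInputItem]) -> None:
    """Simplified stand-in for the `while True` loop in Runner.run."""
    while True:
        if active_run.cancelled():
            # Terminate gracefully; a real implementation might close spans first.
            raise RunCancelledException("run cancelled by caller")

        # Merge externally injected items before the next model invocation.
        turn_input.extend(await active_run.drain_pending())

        # ... call the model, run tools, append generated items, and break once
        # a final output is produced ...
        await asyncio.sleep(0)  # placeholder for the actual turn
        break
```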

Why these features are important:
These features are critical for building agents that can adapt to rapidly changing external environments.

  • Interrupt Use Case (On-Call Triage):

    • A PagerDuty alert triggers an investigation run via Runner.run.
    • If an engineer resolves the incident manually, the external system should call active_run.cancel(). The agent run stops, preventing wasted resources and irrelevant notifications.
  • Add Information Use Case (On-Call Triage):

    • While an agent is investigating an incident, an engineer posts a crucial finding (e.g., "Load balancer X is confirmed to be the culprit") in a Slack thread.
    • The external system monitoring Slack calls active_run.add_information({"role": "user", "content": "New finding from Slack: Load balancer X is confirmed culprit."}).
    • In the agent's next turn, this new message is included in its input history, allowing it to adjust its investigation strategy (a usage fragment is sketched below).
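
Continuing the on-call example, a hypothetical external event handler could drive both mechanisms through the ActiveRun sketch above; all names here are illustrative, and in the proposal the handle would be returned by Runner.run / Runner.run_streamed instead of constructed directly.

```python
async def on_external_events(active_run: ActiveRun) -> None:
    # Slack monitor sees an engineer's finding -> inject it into the live run.
    await active_run.add_information(
        {"role": "user",
         "content": "New finding from Slack: Load balancer X is confirmed culprit."}
    )
    # PagerDuty webhook reports manual resolution -> stop the run.
    active_run.cancel()
```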

Current Limitations in src/agents/run.py:

  • Runner.run is an async def that awaits the full completion of the internal loop.
  • Runner.run_streamed creates an asyncio task (via asyncio.create_task) for _run_streamed_impl, which runs in the background. While asyncio tasks can be cancelled, there is no clean, exposed way to do this through the RunResultStreaming object (the underlying standard-library pattern is sketched below).
  • The input for each turn is constructed based on the initial input and items generated internally by the agent's previous turns. There's no current mechanism to inject external data into this sequence mid-run.
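
For context, cancelling an asyncio.Task is already supported by the standard library; the gap is purely that the SDK does not expose it. The following is a generic illustration of the pattern a wrapped cancel() would presumably build on, with no SDK code involved:

```python
import asyncio


async def investigation() -> str:
    # Stand-in for a long-running background coroutine like _run_streamed_impl.
    await asyncio.sleep(3600)
    return "report"


async def main() -> None:
    task = asyncio.create_task(investigation())
    await asyncio.sleep(0)  # let the task start
    task.cancel()           # what an exposed cancel() would ultimately call
    try:
        await task
    except asyncio.CancelledError:
        # The coroutine is unwound here; cleanup/tracing hooks could run first.
        print("run cancelled")


asyncio.run(main())
```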

These enhancements would significantly improve the SDK's applicability to real-time, interactive agent workflows.

@chiehmin-wei added the enhancement (New feature or request) label on Jun 2, 2025
@chiehmin-wei
Author

I can work on a PR for this if this proposal is approved.

@rm-openai
Collaborator

Would love to discuss this. My first instinct is that we should explore this architecture:

  1. We store the state of the run in RunState
  2. We expose run_single_step, which takes the previous step, if any, and runs the next step. A step would be similar to what happens now (call the model, run tools, etc.). It returns a new state that also indicates what it would want to do next; a rough sketch of what this could look like follows.
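
To make this concrete, here is a rough, non-authoritative sketch of what a RunState plus run_single_step API might look like. Everything beyond those two names (the NextAction enum, the field names, the driver loop) is an assumption for illustration only:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Callable


class NextAction(Enum):
    # Hypothetical: what the completed step wants to happen next.
    CALL_MODEL = auto()
    RUN_TOOLS = auto()
    FINISHED = auto()


@dataclass
class RunState:
    # Hypothetical container for everything needed to resume a run.
    agent: Any
    input_items: list[dict[str, Any]]
    generated_items: list[dict[str, Any]] = field(default_factory=list)
    next_action: NextAction = NextAction.CALL_MODEL
    final_output: Any = None


async def run_single_step(state: RunState) -> RunState:
    # Placeholder: a real implementation would call the model or run tools,
    # append the resulting items, and set next_action / final_output.
    state.next_action = NextAction.FINISHED
    state.final_output = "done"
    return state


async def drive(state: RunState, should_stop: Callable[[], bool]) -> RunState:
    # Caller-controlled loop: cancellation is just "stop calling run_single_step",
    # and new information can be appended to state.input_items between steps.
    while state.next_action is not NextAction.FINISHED and not should_stop():
        state = await run_single_step(state)
    return state
```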

Thoughts?
