Support For CodeAct In The Future? #383


Open
DrChrisLevy opened this issue Mar 28, 2025 · 4 comments
Labels
enhancement New feature or request

Comments

@DrChrisLevy

I'm curious to know if there are any future plans on the roadmap to also support the CodeAct approach as an alternative to the JSON tool-calling approach?

Like discussed here:

https://x.com/_philschmid/status/1905536547627110727

Looks like LangGraph is giving it a try: https://github.com/langchain-ai/langgraph-codeact

@DrChrisLevy DrChrisLevy added the enhancement New feature or request label Mar 28, 2025
@rm-openai
Collaborator

That's pretty cool. No immediate plans, but if you (or anyone in the community) built a good implementation, we'd be down to link to it from our docs. It should be possible without modifying the Agents SDK. I'll also try to build something if I can find some time!

@DrChrisLevy
Author

I was thinking about this more as I glanced through some of the code (with the help of AI). Here is what "we" concluded so far. It would seem tricky to integrate directly into the SDK. But thanks for entertaining the idea. I'm going to play with it some more...


That's an interesting question about integrating a CodeAct-style approach (like in smolagents or langgraph-codeact) into the OpenAI Agents SDK.

Based on our exploration of this SDK and the general understanding of these approaches:

  1. OpenAI Agents SDK Core Mechanism: As we've seen, this SDK is built heavily around the concept of Tools and Handoffs. These are presented to the underlying language model, which then generates structured output (effectively JSON) to indicate which predefined tool or handoff to call and what arguments to pass to it (Functions example, Integrating Tools). The Runner then parses this structured output and executes the corresponding Python function (FunctionTool, Handoff, etc.) with the provided arguments.

  2. CodeACT Paradigm: The CodeACT approach, as you described, typically involves the LLM generating executable code (often Python) directly as its action or plan step, rather than selecting from a predefined set of tools via JSON. The system then executes this generated code.

  3. The Challenge: Integrating CodeACT directly into this SDK appears quite challenging because the fundamental action mechanism is different.

    • LLM Output: The OpenAI Agents SDK relies on the model producing specific structured tool_calls or handoff requests in its response. The CodeACT approach requires the model to generate raw code within its text response.
    • Response Processing: The SDK's core loop (_run_impl.py) is designed to parse the structured tool_calls attribute of the LLM response. It would need significant modification to instead parse and identify executable code blocks from the main message content.
    • Action Execution: The SDK executes predefined Python functions (FunctionTool) or specific ComputerTool actions based on the parsed arguments. A CodeACT integration would require a new mechanism to safely execute arbitrary code generated by the LLM. This involves significant security considerations (sandboxing, restricting available modules/functions) that are not currently part of the SDK's design.
    • State Management: Integrating the results of arbitrary code execution back into the SDK's state (RunItem list, context, etc.) would also require careful design.
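
The contrast in the first two bullets can be made concrete with a toy sketch. The message shapes below are simplified and hypothetical (the `<execute>` tag is just one delimiting convention CodeAct-style systems use), not the SDK's actual types:

```python
import json
import re

# Style 1: JSON tool calling. The model names a predefined tool; the runner
# looks it up and dispatches to a registered Python function.
TOOLS = {"add": lambda a, b: a + b}
tool_call = {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})}
result = TOOLS[tool_call["name"]](**json.loads(tool_call["arguments"]))
print(result)  # 5

# Style 2: CodeAct. The model emits code inside its free-text response; the
# runner must find it, extract it, and execute it.
message = "Adding the numbers.\n<execute>\nresult = 2 + 3\n</execute>"
code = re.search(r"<execute>(.*?)</execute>", message, re.DOTALL).group(1)
env: dict = {}
exec(code, env)  # no sandboxing here; see the security point above
print(env["result"])  # 5
```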

Conclusion:

While not strictly impossible, extending the current OpenAI Agents SDK to support a CodeACT paradigm wouldn't be a simple plug-in. It would likely involve:

  • Changing the prompting strategy to encourage code generation instead of tool use.
  • Fundamentally altering the response parsing logic in the execution loop (_run_impl.py).
  • Adding a secure execution environment for the LLM-generated code.
  • Adapting the state and result handling mechanisms.

It seems less like an extension and more like building a parallel execution model or significantly refactoring the core components, given the SDK's current reliance on the JSON-based tool-calling capabilities of the underlying OpenAI APIs.
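
To see why the "secure execution environment" point is nontrivial, here is about the simplest restriction one can apply: allow-listing builtins. This is a toy illustration only; restricted builtins are not a real security boundary, and a serious CodeAct runtime would need process- or container-level isolation.

```python
# Only these names are visible to the generated code.
SAFE_BUILTINS = {"sum": sum, "len": len, "range": range, "print": print}

def run_restricted(code: str) -> dict:
    """Execute `code` with an allow-listed set of builtins and return its namespace."""
    env = {"__builtins__": SAFE_BUILTINS}
    exec(code, env)
    env.pop("__builtins__")
    return env

print(run_restricted("n = sum(range(4))"))  # {'n': 6}

try:
    run_restricted("open('/etc/passwd')")  # open() is not in the allow-list
except NameError as exc:
    print("blocked:", exc)
```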

@aybidi

aybidi commented Apr 6, 2025

I briefly read the description, but I wonder if, in the future, we foresee a different Agent class for CodeAct, similar to how smolagents implements ToolCallingAgent (for the JSON tool-calling approach) and CodeAgent (for the CodeAct-style approach), both inheriting from a single MultiStepAgent class and overriding the appropriate methods for input, output, handoffs, etc.
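
That layout can be caricatured in a few lines. The class names follow the smolagents ones mentioned above, but the bodies are purely illustrative, not the smolagents implementation:

```python
import json

class MultiStepAgent:
    """Shared base: subclasses decide how to interpret an LLM 'action'."""
    def act(self, llm_output: str) -> str:
        raise NotImplementedError

class ToolCallingAgent(MultiStepAgent):
    """Treats the LLM output as a JSON tool call against predefined tools."""
    def __init__(self, tools: dict):
        self.tools = tools

    def act(self, llm_output: str) -> str:
        call = json.loads(llm_output)
        return str(self.tools[call["name"]](**call["arguments"]))

class CodeAgent(MultiStepAgent):
    """Treats the LLM output as Python code to execute."""
    def act(self, llm_output: str) -> str:
        env: dict = {}
        exec(llm_output, env)
        return str(env["result"])  # assumes the generated code assigns `result`

tools = {"add": lambda a, b: a + b}
print(ToolCallingAgent(tools).act('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # 5
print(CodeAgent().act("result = 2 + 3"))  # 5
```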

@yihuang

yihuang commented Apr 29, 2025

Isn't CodeAct just using a code executor as one of the tools?
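
Reading it that way, no SDK changes are needed: expose an interpreter as an ordinary function tool, and the model still emits a JSON tool call whose single argument is the code to run. A minimal sketch of what such a tool's body might look like (plain Python here; in the Agents SDK it could presumably be registered with the `function_tool` decorator), with all the sandboxing caveats from earlier still applying:

```python
import contextlib
import io

def run_python(code: str) -> str:
    """Execute a snippet and return its captured stdout as the tool result."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # unsandboxed; a real tool would isolate this
    return buffer.getvalue()

print(run_python("print(2 + 2)"))  # "4\n" is what the model sees as the tool output
```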
