RFC: add Tool.outputSchema and CallToolResult.structuredContent #371

Merged
merged 4 commits into from
May 8, 2025

bhosmer-ant
Contributor

@bhosmer-ant bhosmer-ant commented Apr 19, 2025

[update: relaxed the modality of content and structuredContent and the coupling of structuredContent and outputSchema validation in #559]

Adds support for strict validation of structured tool results.

  • A Tool can now optionally provide an outputSchema property, containing a JSON schema that defines the structure of its output.
  • CallToolResult adds a new structuredContent property, mutually exclusive with the CallToolResult.content property:
    • for Tools that do not declare an outputSchema, result.structuredContent will be absent, and result.content will be returned as before.
    • for Tools that declare an outputSchema, result.structuredContent will contain an object whose contents must validate against the tool's outputSchema.
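
A minimal TypeScript sketch of the two additions described above. The interfaces are simplified assumptions for illustration rather than the actual schema.ts definitions, and `expectedResultField`, `weatherTool`, `echoTool`, and `sampleResult` are hypothetical names.

```typescript
// Simplified, assumed shapes for illustration; not the actual schema.ts definitions.
interface JsonObjectSchema {
  type: "object";
  properties?: { [key: string]: unknown };
  required?: string[];
}

interface Tool {
  name: string;
  inputSchema: JsonObjectSchema;
  outputSchema?: JsonObjectSchema; // new and optional
}

interface CallToolResult {
  content?: { type: "text"; text: string }[];
  structuredContent?: { [key: string]: unknown }; // new
  isError?: boolean;
}

// Hypothetical helper: which result field a caller should expect for a tool.
function expectedResultField(tool: Tool): "structuredContent" | "content" {
  return tool.outputSchema !== undefined ? "structuredContent" : "content";
}

// Hypothetical example tools: one declares an outputSchema, one does not.
const weatherTool: Tool = {
  name: "get_weather",
  inputSchema: { type: "object", properties: { city: { type: "string" } }, required: ["city"] },
  outputSchema: { type: "object", properties: { temperature: { type: "number" } }, required: ["temperature"] },
};
const echoTool: Tool = { name: "echo", inputSchema: { type: "object" } };

// A result the weather tool might return under this proposal.
const sampleResult: CallToolResult = { structuredContent: { temperature: 22.5 } };

console.log(expectedResultField(weatherTool), expectedResultField(echoTool), sampleResult);
```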

Prototype for typescript SDK support in modelcontextprotocol/typescript-sdk#454.

Design notes

This PR aims to provide simple, lightweight support for strict validation of tool result data whose structure can be entirely described by a single JSON schema. The approach here pairs a new Tool.outputSchema property with a new CallToolResult.structuredContent property, avoiding use of the CallToolResult.content array.

This approach leaves the path open for adding schematic validation support to the much richer and more complex space of tools that make use of the full expressiveness of the CallToolResult.content array, via an additional Tool property. Support for these use cases has been proposed in #356, and is under active discussion there.

(After exploring possible ways of providing integrated support for both kinds of use cases with one set of protocol additions, it's clear that both will be better served by a disjoint approach: strict validation of statically typed data results can be accomplished with the simple additions provided here, and the subtleties arising from supporting the full space of CallToolResult.content shapes - see e.g. #415, in addition to #356 - can be addressed more naturally absent the need to support the use cases addressed here.)

Motivation and Context

For tools that return structured output, having a description of that structure available is useful for various tasks, including:

  • Validating the structure of tool results (and performing a more informed examination of the values they contain, post-validation). Especially useful when interacting with untrusted servers.
  • Considering outputSchemas (or their absence) when making decisions about which tools to expose to the model.
  • Transforming tool results before forwarding content to the model (e.g. formatting, projecting).
  • Making tool results available as structured data in coding environments.
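
As a rough illustration of the first use case above, a client could check a structured result against a declared schema before using it. The sketch below hand-rolls a check for a tiny subset of JSON Schema (flat object, primitive property types, `required`); it is not a full JSON Schema validator, and `validateStructured` and `weatherSchema` are hypothetical names. A real client would more likely use an established validator library.

```typescript
type PrimitiveTypeName = "string" | "number" | "boolean";

// Assumed, deliberately tiny schema subset: a flat object with primitive properties.
interface FlatObjectSchema {
  type: "object";
  properties: { [key: string]: { type: PrimitiveTypeName } };
  required?: string[];
}

// Returns a list of human-readable problems; an empty list means the value passed.
function validateStructured(schema: FlatObjectSchema, value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["value is not an object"];
  }
  const obj = value as { [key: string]: unknown };
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in obj)) errors.push(`missing required property "${key}"`);
  }
  for (const [key, prop] of Object.entries(schema.properties)) {
    if (key in obj && typeof obj[key] !== prop.type) {
      errors.push(`property "${key}" should be of type ${prop.type}`);
    }
  }
  return errors;
}

// Hypothetical schema a weather tool might declare as its outputSchema.
const weatherSchema: FlatObjectSchema = {
  type: "object",
  properties: { temperature: { type: "number" }, conditions: { type: "string" } },
  required: ["temperature", "conditions"],
};

console.log(validateStructured(weatherSchema, { temperature: 22.5, conditions: "cloudy" }));
console.log(validateStructured(weatherSchema, { temperature: "hot" }));
```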

How Has This Been Tested?

No tests yet.

Breaking Changes

Optional new property that introduces a new behavior, not a breaking change.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

@bhosmer-ant bhosmer-ant changed the title Add Tool.outputSchema RFC: add Tool.outputSchema Apr 19, 2025
@bhosmer-ant bhosmer-ant requested a review from ihrpr April 19, 2025 05:18
@evalstate
Member

evalstate commented Apr 19, 2025

A couple of comments on this as I prepare to undraft #223 for RFC:

  • I've updated RFC: Client / Server Content capabilities #223 to indicate that Servers that support structured output should advertise generates: ["application/json"]
  • Is there any consideration for the Server returning a TextResourceContents with a mimeType of application/json? I think this would be a more deliberate action by the Server in this scenario.

[update]
The specific proposal would be to return a CallToolResult as follows:

{
  "jsonrpc": "2.0",
  "id": "abc123",
  "result": {
    "content": [
      {
        "type": "resource",
        "resource": {
          "uri": "file:///example/data.json",
          "mimeType": "application/json",
          "text": "{\"name\":\"John Doe\",..... and so on }}"
        }
      }
    ],
    "isError": false
  }
}

With guidance that Servers returning a Structured Response MUST return a CallToolResult containing one EmbeddedResource of type application/json.
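
For comparison, here is a sketch of how a client might consume a result under this alternative proposal, where the structured data arrives as a single embedded resource with mimeType application/json. The types are simplified assumptions, and `parseJsonResource` and `exampleResult` are hypothetical names.

```typescript
// Simplified, assumed shapes for the alternative proposal in this comment.
interface EmbeddedJsonResource {
  type: "resource";
  resource: { uri: string; mimeType?: string; text?: string };
}

interface ResourceToolResult {
  content: EmbeddedJsonResource[];
  isError?: boolean;
}

// Hypothetical helper: find the single application/json resource and parse its text.
function parseJsonResource(result: ResourceToolResult): unknown {
  if (result.isError) {
    throw new Error("tool call reported an error");
  }
  const jsonEntries = result.content.filter(
    (entry) => entry.type === "resource" && entry.resource.mimeType === "application/json",
  );
  const [entry] = jsonEntries;
  if (jsonEntries.length !== 1 || entry === undefined) {
    throw new Error(`expected exactly one application/json resource, found ${jsonEntries.length}`);
  }
  if (entry.resource.text === undefined) {
    throw new Error("application/json resource has no text");
  }
  return JSON.parse(entry.resource.text);
}

// Hypothetical result in the shape shown above, with a complete JSON payload.
const exampleResult: ResourceToolResult = {
  content: [
    {
      type: "resource",
      resource: {
        uri: "file:///example/data.json",
        mimeType: "application/json",
        text: "{\"name\":\"John Doe\"}",
      },
    },
  ],
  isError: false,
};

console.log(parseJsonResource(exampleResult));
```

Note that the payload is still JSON serialized inside a JSON string, so the client has to parse it a second time.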

@lukaswelinder

Having outputSchema restrict you to returning a single text content entry whose text validates against the schema feels oddly restrictive. That approach makes annotations largely pointless, and I can think of plenty of cases where one would want to have multiple content entries that would be possible with #356:

  • Document processing:
    • Have an outputSchema that is treated as definitions for multiple document types
    • Return content entries for:
      • ImageContent - Generated thumbnail for the document
      • TextContent - Plain text contents of the document
      • DataContent - Structured data, with schema referencing one of the definitions in outputSchema
  • Multi-entity type search
    • Have an outputSchema that is treated as definitions for multiple entity types
    • Return multiple content entries with schema refs and annotations for relevance/importance

Another drawback I see is the lack of ability to dynamically define the structure/schema for response content. There are certainly cases where the output schema may not be known ahead of time, but would still be useful for the client or LLM consuming the content; it also enriches the capability of sampling and prompt messages.

Lastly, going with this approach, extending its functionality in the future would likely represent a breaking change since it largely goes against the implied design pattern of CallToolResult having an arbitrary number of content entries.

@bhosmer-ant bhosmer-ant marked this pull request as draft April 20, 2025 12:51
@bhosmer-ant
Contributor Author

@lukaswelinder first of all, my apologies for putting up this PR without first participating in the discussion on #356 - I only saw it as I was writing the PR comments for this, but hadn't had a chance to look properly yet. Definitely didn't mean to step on your ongoing work.

I'll respond to your comment a bit later (not at keyboard right now) and also make comments on #356. Meanwhile I'll move this to draft, pending further discussion.

@bhosmer-ant
Contributor Author

@evalstate thanks for the heads up - will come back to this after we see what comes out of the discussion on #356, per previous comment

@lukaswelinder

@bhosmer-ant No offense taken, just glad to see there is motivation here. Input and feedback on #356 would be great.

@bhosmer-ant bhosmer-ant force-pushed the basil/output_schema branch from 30d1a5e to 9876ab9 Compare May 6, 2025 02:40
@bhosmer-ant bhosmer-ant changed the title RFC: add Tool.outputSchema RFC: add Tool.outputSchema and CallToolResult.structuredContent May 6, 2025
@bhosmer-ant bhosmer-ant marked this pull request as ready for review May 6, 2025 04:09
@sambhav sambhav left a comment

Largely LGTM, thank you so much for moving this forward 🙏. Minor nit: given it's json native, I would love if we could avoid json-in-json and just have structuredOutput as an object.

"result": {
"structuredContent": {
"type": "text",
"text": "[{\"id\":\"doc-1\",\"title\":\"Introduction to MCP\",\"url\":\"https://example.com/docs/1\"},{\"id\":\"doc-2\",\"title\":\"Tool Usage Guide\",\"url\":\"https://example.com/docs/2\"}]"

Now that we have a separate key for structuredContent, would it be preferable for its value to directly conform to the outputSchema rather than having a type and a serialized json? At the very least should we avoid serialization?

*
* If the Tool defines an outputSchema, this field MUST be present in the result, and contain a serialized JSON object that matches the schema.
*/
structuredContent?: string;

Suggested change
structuredContent?: string;
structuredContent?: object;

Contributor

should this be aligned more with tool input?

Suggested change
structuredContent?: string;
structuredContent?: { [key: string]: unknown };


Based on my comment above, maybe

Suggested change
structuredContent?: string;
structuredContent?: any;


1. Clients **MUST** validate that the tool result contains a `structuredContent` field whose contents validate against the declared `outputSchema`.

2. Servers **MUST** provide tool results that conform to their declared `outputSchema`s.
Contributor

It would be helpful to provide some guidelines on how the server should respond when validation of the output against the schema fails.
For example InvalidParams is used in https://github.com/modelcontextprotocol/typescript-sdk/pull/454/files
or recommend server authors to pick a custom error code.

Contributor Author

I guess I'm not sure we'd need to make an error path explicit for this situation - this was more just a restatement of the contract, since (in the absence of upstream errors returned in the usual way) the server should be able to guarantee that results conform to the declared schema. But definitely LMK if you have a specific situation in mind that I'm not thinking of!
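
To make the question concrete, one way a server-side wrapper could surface a mismatch is sketched below. This assumes the InvalidParams choice mentioned in the question; -32602 is the JSON-RPC 2.0 "Invalid params" code, and whether it is the right code for this situation is exactly what is being asked. `ToolOutputError` and `checkedResult` are hypothetical names, and the check covers only required keys rather than a full schema.

```typescript
// JSON-RPC 2.0 "Invalid params" code; its use here is an assumption taken from this thread.
const INVALID_PARAMS = -32602;

// Hypothetical error type carrying a JSON-RPC style error code.
class ToolOutputError extends Error {
  code: number;
  constructor(code: number, message: string) {
    super(message);
    this.name = "ToolOutputError";
    this.code = code;
  }
}

// Hypothetical guard: refuse to return output that lacks keys the schema requires.
function checkedResult(
  requiredKeys: string[],
  output: { [key: string]: unknown },
): { structuredContent: { [key: string]: unknown } } {
  const missing = requiredKeys.filter((key) => !(key in output));
  if (missing.length > 0) {
    throw new ToolOutputError(
      INVALID_PARAMS,
      `tool output is missing required properties: ${missing.join(", ")}`,
    );
  }
  return { structuredContent: output };
}

console.log(checkedResult(["temperature"], { temperature: 22.5 }));
```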

@@ -130,11 +131,12 @@
"isError": {
"description": "Whether the tool call ended in an error.\n\nIf not set, this is assumed to be false (the call was successful).",
"type": "boolean"
},
"structuredContent": {

Thinking about this, given that outputSchema is an arbitrary JSON schema (which makes sense), should this be of type any rather than an object, i.e. should we omit the type=object? It would make sense if, e.g., the tool just returns an int or even an array. As is, this will prevent an array of objects from being returned, even though you could describe it in JSON schema.

Contributor Author

yeah see above - on balance I think sticking with the top-level-object restriction in the rest of the protocol (and standard practice more generally) is worth the extra trouble of wrapping top-level primitives/arrays
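
The wrapping mentioned here could look roughly like the following sketch. `wrapTopLevel` and the default wrapper key `"result"` are hypothetical; the matching outputSchema would then describe an object with that one property.

```typescript
type JsonObject = { [key: string]: unknown };

// Hypothetical helper: plain objects pass through, anything else is wrapped under one key.
function wrapTopLevel(value: unknown, wrapperKey: string = "result"): JsonObject {
  const isPlainObject = typeof value === "object" && value !== null && !Array.isArray(value);
  return isPlainObject ? (value as JsonObject) : { [wrapperKey]: value };
}

// An array of documents becomes { documents: [...] }, which satisfies the top-level-object rule.
const wrappedDocs = wrapTopLevel([{ id: "doc-1" }, { id: "doc-2" }], "documents");
// A bare number is wrapped under the default key.
const wrappedCount = wrapTopLevel(42);

console.log(wrappedDocs, wrappedCount);
```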

- `structuredOutput` typed as object rather than string
- tighten CallToolResult to codify at-least-one constraint and explicitly allow structured results to contain `content` (but not the reverse)
- update docs
@bhosmer-ant
Contributor Author

@ihrpr fyi new rev makes structuredContent an object, and updates the docs w/compatibility language (and a better example). (TS SDK example updated too)

@evalstate
Member

evalstate commented May 7, 2025

OK - I'll just note my outstanding concerns on this one - not expecting a response - just adding my perspective as a Host application developer.

  • MCP Server SDK. Introduction of return type polymorphism based on the presence of the Tool outputSchema will make the developer experience around tool definition and implementation more complex than necessary.
  • MCP Client SDK. Return type polymorphism needs to be handled by the SDK along with additional validation, meaning changes will be needed for implementation, and requiring the Host integrator to special-case the new "structured" return type.
  • Compatibility. Forward/Backward compatibility is managed by the MCP Server itself rather than handled by a stated convention within the SDK. This gives a large number of possibilities to integrate and test for - and opens challenges (potentially including security) when there is a content mismatch, as well as potentially doubling the length of returned content.
  • Consistency. Currently Tools, Prompts and Resources have a logical consistency between their types. This adds a unique Tool-only condition that can't otherwise be represented within the MCP protocol (conceptual fragmentation).
  • JSON Specific. The use of mime types for the schema and payload would allow the use of non-JSON structures if desired.
  • Host Application Development. For someone building a generic Host application it's still not immediately obvious what the benefit is in receiving the data in structured form. Since both schema and content are supplied by the Server, the "interacting with untrusted servers" motivation isn't obviously improved here. Without an identifying uri or prior knowledge of the server/schema this is still "just JSON tokens". On this basis, this change brings extra effort to me as an integrator, with no clear benefit.

}
```

Example valid response for this tool:

Can we also add an example of a valid response that is backward compatible?

2. Servers **MUST** provide structured results in `structuredContent` that conform to the declared `outputSchema` of the tool.

<Info>
For backwards compatibility, a tool that declares an `outputSchema` may also return unstructured results in the `content` field.

@sambhav sambhav May 7, 2025

Should this be based on the MCP client version string during version negotiation rather than doing this unconditionally?

Contributor Author

Seems useful to avoid mandating complex version-dependent logic here. (Paraphrasing offline discussion with @ihrpr:) In practice the SDKs will be handling construction of actual results, so it's useful to leave some freedom, and client/server devs will be spared the implementation details in any case. But even if both formats are sent unconditionally, the perf impact isn't huge, definitely not worth baking complexity into the spec to avoid.
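
A sketch of what sending both formats unconditionally might look like when an SDK builds the result. `toDualResult` and `DualResult` are hypothetical names, and mirroring the structured data as serialized JSON text is only one possible choice for the unstructured `content`.

```typescript
// Assumed shape of a result that carries both the structured and unstructured forms.
interface DualResult {
  content: { type: "text"; text: string }[];
  structuredContent: { [key: string]: unknown };
}

// Hypothetical helper: mirror the structured data as serialized JSON text for older clients.
function toDualResult(data: { [key: string]: unknown }): DualResult {
  return {
    structuredContent: data,
    content: [{ type: "text", text: JSON.stringify(data) }],
  };
}

const dual = toDualResult({ temperature: 22.5, conditions: "Partly cloudy" });
console.log(dual);
```

The cost is that the payload is carried twice, which is the performance tradeoff discussed in this comment.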

Why does it say "for backwards compatibility"? Does that mean structured output is always preferable over unstructured output? Why not specifically encourage tools to return both an unstructured and a structured response, so that the MCP client can pick the desired one based on the use case? For example, an unstructured response can be optimized for an LLM to read (e.g. when chatting with Claude), and a structured result can be used when further transformations are required.

Furthermore, you might also want to allow structured results to contain extra data like IDs or timestamps that are not very meaningful for an LLM, but can be useful when doing data transformations. (This also contradicts the bullet points below.)

Contributor

@ihrpr ihrpr left a comment

LGTM 🚀

@bhosmer-ant bhosmer-ant merged commit 03dc24b into main May 8, 2025
5 checks passed
@bhosmer-ant bhosmer-ant deleted the basil/output_schema branch May 8, 2025 15:25
@ihrpr ihrpr added this to the DRAFT 2025-06-XX milestone Jun 9, 2025
@ihrpr ihrpr moved this from Draft to Approved in Standards Track Jun 9, 2025
@jonathanhefner jonathanhefner linked an issue Jun 27, 2025 that may be closed by this pull request
Projects
Status: Approved
Development

Successfully merging this pull request may close these issues.

Bring back the concept of "toolResult" (non-chat result)
8 participants