-
Notifications
You must be signed in to change notification settings - Fork 566
Add Resource.uriScope
to enable direct reads
#607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add Resource.uriScope
to enable direct reads
#607
Conversation
I'm not a huge fan of this proposal since an MCP server does not always know where the client environment is and its network conditions. Consider...
base64 decoding does increase the resource size, but if MCP is served over a compressed protocol--which will generally be almost every non-local case--then the size increase is just a couple percent, fairly negligible. The main concern just becomes memory usage, which I think could be better solved with streamed reads. |
Perhaps the alternative approach with
In this case, I think Alternatively, the
The specifics of this scenario are unclear to me, but perhaps some combination of the above solutions could apply? |
How would the MCP server in this case know how / where to create that?
But the MCP server is just some random 3rd party binary, it does not know that web URIs it can access are not accessible to the client nor how to use the internal proxy server. |
It's up to the SDK to upgrade to this if it makes sense for a particular request, so this wouldn't necessarily cause problems. I put a sequence diagram in #527, but copying it here: sequenceDiagram
participant Application as Host Application
participant Client as MCP SDK Client
participant Server as MCP Server
participant CDN
Server->>Client: <resource reference from a tool call or something>
Client->>Application: <ref>
Application->>Client: readResource(ref)
alt resource.supportsDirectRead is supported and true
Client->>CDN: fetch
CDN->>Client: <resource content>
else resource.supportsDirectRead is not supported or false, or the resource URI is misunderstood, etc.
Client->>Server: resources/read
Server->>Client: <resource content>
end
Client->>Application: <resource object>
We could extend this to handle the error case where the resource URI was declared as As it's entirely up to the SDK to leverage, we could even (hypothetically) implement SDKs such that if |
The SDK on the server would not know this information. The SDK on the client... might? In a generic sense for an editor like VS Code where the client runs anywhere and connect to MCP's via arguments or URI configurations, we know a little bit about our own setup but don't know topology beyond that. To give another example, the folks over at Docker support a cloud running scenario and to do that (at least as of the demo I saw last week) they have a Docker MCP server running over stdio that proxies into their service, and as a client I am totally unaware this is happening, it looks like a normal local server to me. Going back to what I queried before: we can probably assume that traffic sent over the network is going to get compressed where base64 overhead is not that big. The pro's I see for this are:
The biggest benefit is no. 2, which works the best when you have things like S3 permalinks and so on. Is that big enough to take this instead of streaming, or could we have streaming in addition to this? |
The client SDK is the only entity that knows if the upgrade makes sense or not, because it would have to be explicitly written to handle particular resource schemes etc. On top of that, it's able to fall back to a standard resource read if there are connectivity mismatches like what you're describing, so that should just be an edge case where we'd fail the direct read and then succeed on the Regarding compression — one of the concerns driving this was memory usage by the application server. HTTP response compression also applies after the raw encoding to b64, so having compression doesn't avoid that memory increase at all. You would still have at least a few representations of the same file you'd be shuffling in memory on the server, which is the problem we're trying to sidestep:
(also I recognize you don't need all of those buffers in flight at the same time, I'm just listing them to illustrate the problem — only your largest buffer matters) I think this proposal is entirely independent of partial/chunked result streaming, and is not a replacement for it. This is solving a problem specific to large resources, and protocol-level streaming will still be the ideal solution for most use cases (and especially for tool calls, which aren't covered by this at all). |
Actually, on the topic of SDKs doing this transparently, should we also have an optional size parameter in the resource object, representing the byte size of the underlying resource? Given that this is basically an escape hatch, I think we should give SDKs the option to only upgrade to this above some arbitrary resource size. Something along the lines of being able to compare the cost of opening a new connection to speculatively request a file directly versus reusing the active connection for relatively small files. Naturally, that introduces some ambiguity around how to handle mismatches between the stated and real resource sizes (do we fail out? should the client attempt to handle it by allocating more buffers?), but it might be worth addressing that complexity if it means being able to make more intelligent decisions about opting into this. That would also allow us to nudge applications towards protocol-level streaming whenever that gets finalized, by tuning that breakpoint in the SDK according to both the total size of the file and whether or not it supports chunked delivery at a protocol level. IMO it should generally be made clear that we want to err towards what we can do within the protocol, but also want to enable efficient large file downloads with this where it makes sense. |
modelcontextprotocol/schema/2024-11-05/schema.ts Lines 447 to 452 in fb34d1d
Resource does have an optional |
f0d4b41
to
c76c416
Compare
Those are good points. I added a caveat to the documentation indicating that if the server is remote, then the client should verify it can access the URI before reading. If the URI is not accessible, the client should fall back to I think the main concern would be false positives / negatives. Since this is an optimization, clients can behave conservatively when unsure, but are there cases where a client could mistake a server-local URI for a generally accessible URI?
I had added documentation to |
c76c416
to
8792cfa
Compare
New alternative: instead of
That avoids having to do any heuristics on the resource URI. |
When reading a binary resource via `resources/read`, the resource content must be encoded into Base64 and wrapped in JSON on the server, and then must be decoded on the client. Base64 increases payload size by 33%, so this process can add significant overhead for large resources. This commit adds an optional `uriScope` property to `Resource`, which indicates when the resource content may be read directly via its URI, thereby avoiding the overhead. If the value of `uriScope` is `"external"`, clients may attempt to read via the URI. If the value of `uriScope` is `"internal"`, clients must verify that the server is local before attempting to read via the URI. If these conditions are not met or if the read fails, clients should fall back to `resources/read`. It's also worth mentioning that if a client SDK's `resources/read` API method accepts a resource object (as returned by `resources/list`), then the API method can transparently take advantage of this optimization.
8792cfa
to
8d556f0
Compare
Resource.supportsDirectRead
Resource.uriScope
to enable direct reads
I think the hint is moving in the right direction. As a client implementor I would want a little clarification on "client must confirm that the server is local before reading the URI". E.g.:
is that enough? |
(UPDATE: The original proposal was to add
Resource.supportsDirectRead
. The current proposal is to addResource.uriScope
.)Motivation and Context
When reading a binary resource via
resources/read
, the resource content must be encoded into Base64 and wrapped in JSON on the server, and then must be decoded on the client. Base64 increases payload size by 33%, so this process can add significant overhead for large resources.This commit adds an optional
uriScope
property toResource
, which indicates when the resource content may be read directly via its URI, thereby avoiding the overhead. If the value ofuriScope
is"external"
, clients may attempt to read via the URI. If the value ofuriScope
is"internal"
, clients must verify that the server is local before attempting to read via the URI. If these conditions are not met or if the read fails, clients should fall back toresources/read
.It's also worth mentioning that if a client SDK's
resources/read
API method accepts a resource object (as returned byresources/list
), then the API method can transparently take advantage of this optimization.How Has This Been Tested?
Not tested.
Breaking Changes
None.
Types of changes
Checklist
Additional context
Closes #527.