Skip to content

Conversation

nirinchev
Copy link
Collaborator

@nirinchev nirinchev commented Sep 5, 2025

Proposed changes

While looking at some customer logs, I noticed we're likely being super inefficient by waiting for telemetry events to be sent before we can continue with normal server tasks.

Additionally, added a log for after tool execution.

Finally, it fixes the race condition reported in MCP-179.

@Copilot Copilot AI review requested due to automatic review settings September 5, 2025 00:23
@nirinchev nirinchev requested a review from a team as a code owner September 5, 2025 00:23
@nirinchev nirinchev added the no-title-validation Add this label to disable the title check for this PR. label Sep 5, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes telemetry performance by making telemetry event emission non-blocking and asynchronous. The changes prevent the server from waiting for telemetry events to be sent before continuing with normal operations.

  • Make telemetry event emission non-blocking by removing await keywords
  • Add event emitters to track telemetry completion states
  • Introduce logging for tool execution completion

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/telemetry.test.ts Adds helper function to properly test async telemetry events
src/transports/streamableHttp.ts Improves logger usage and error message clarity
src/tools/tool.ts Makes tool telemetry non-blocking and adds execution logging
src/telemetry/telemetry.ts Converts telemetry emission to non-blocking with event emitters
src/server.ts Removes blocking telemetry call
src/common/connectionManager.ts Adds explicit type annotation for events property

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@@ -71,34 +90,22 @@ export class Telemetry {

public async close(): Promise<void> {
this.isBufferingEvents = false;
await this.emitEvents(this.eventCache.getEvents());
await timeout(this.emit([]), 5_000);
Copy link
Preview

Copilot AI Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout function is called with an empty array [] instead of the cached events. This should be await timeout(this.emit(this.eventCache.getEvents()), 5_000); to ensure cached events are sent during cleanup.

Suggested change
await timeout(this.emit([]), 5_000);
await timeout(this.emit(this.eventCache.getEvents()), 5_000);

Copilot uses AI. Check for mistakes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. We're already emitting the cached events in emit, so passing this.eventCache.getEvents() would have made it so they're emitted twice.

Comment on lines 17 to 18
async function timeout(promise: Promise<unknown>, ms: number): Promise<void> {
await Promise.race([new Promise((resolve) => setTimeout(resolve, ms)), promise]);
Copy link
Preview

Copilot AI Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout function doesn't actually enforce a timeout - it just waits for whichever promise resolves first. Consider using Promise.race with a rejection timeout to properly handle cases where the promise takes longer than the specified milliseconds.

Suggested change
async function timeout(promise: Promise<unknown>, ms: number): Promise<void> {
await Promise.race([new Promise((resolve) => setTimeout(resolve, ms)), promise]);
await Promise.race([
promise,
new Promise((_, reject) => setTimeout(() => reject(new Error(`Operation timed out after ${ms} ms`)), ms))
]);

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggestion is actually good.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like it. The reason why I chose to resolve the promise is because I don't think close() should throw if we fail to emit the events on time. I'll add a comment explaining that and remove the function as it's just a 1-liner anyway.

Copy link
Collaborator

@kmruiz kmruiz Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this approach is that the timeout is only relevant for waiting, but the actual operation runs in the background (I don't see any usage of an AbortController here). At least the error would show in our logs even if we don't cancel the long-running op.

If you want to keep the resolution as is, I would at least print a log, so timeouts are not invisible. One of the reasons we are implementing this is because we think that telemetry is taking too much time at the end.

Copy link
Collaborator

@gagik gagik Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check that this timeout isn't going to lead to the process being held up?
adding .unref() https://nodejs.org/api/timers.html#timeoutunref just to be safe would be good to.

@coveralls
Copy link
Collaborator

coveralls commented Sep 5, 2025

Pull Request Test Coverage Report for Build 17497369294

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 72 of 91 (79.12%) changed or added relevant lines in 6 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.07%) to 81.286%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/tools/tool.ts 10 11 90.91%
src/telemetry/eventCache.ts 7 9 77.78%
src/transports/streamableHttp.ts 0 3 0.0%
src/telemetry/telemetry.ts 53 66 80.3%
Files with Coverage Reduction New Missed Lines %
src/telemetry/telemetry.ts 1 85.71%
Totals Coverage Status
Change from base Build 17487617251: -0.07%
Covered Lines: 4769
Relevant Lines: 5772

💛 - Coveralls

@nirinchev nirinchev requested a review from blva September 5, 2025 08:43
@nirinchev nirinchev changed the title fix: don't wait for telemetry events fix: don't wait for telemetry events MCP-179 Sep 5, 2025
Copy link
Collaborator

@gagik gagik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure when this regressed but I remember this exact setup with the track functions not being async being the case in the repo before.

Definitely how we should approach it when doing this on a more generalized level.

@blva blva requested review from gagik and kmruiz September 5, 2025 13:11
@blva blva merged commit da2cac8 into main Sep 5, 2025
18 checks passed
@blva blva deleted the ni/sync-telemetry branch September 5, 2025 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-title-validation Add this label to disable the title check for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants