
Add automatic compaction of historical messages for agents #339


Merged: 9 commits, Mar 21, 2025
1 change: 1 addition & 0 deletions README.md
@@ -12,6 +12,7 @@ Command-line interface for AI-powered coding tasks. Full details available on th
- 👤 **Human Compatible**: Uses README.md, project files and shell commands to build its own context
- 🌐 **GitHub Integration**: GitHub mode for working with issues and PRs as part of workflow
- 📄 **Model Context Protocol**: Support for MCP to access external context sources
- 🧠 **Message Compaction**: Automatic management of context window for long-running agents

Please join the MyCoder.ai discord for support: https://discord.gg/5K6TYrHGHt

105 changes: 105 additions & 0 deletions docs/features/message-compaction.md
@@ -0,0 +1,105 @@
# Message Compaction

When agents run for extended periods, they accumulate a large history of messages that eventually fills up the LLM's context window, causing errors when the token limit is exceeded. The message compaction feature helps prevent this by providing agents with awareness of their token usage and tools to manage their context window.

## Features

### 1. Token Usage Tracking

The LLM abstraction now tracks and returns:
- Total tokens used in the current completion request
- Maximum allowed tokens for the model/provider

This information is used to monitor context window usage and trigger appropriate actions.
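
A minimal sketch of how a caller might turn these two values into a utilization figure, assuming the optional `totalTokens`/`maxTokens` fields this PR adds to provider responses (the standalone interface and helper names below are illustrative, not from the codebase):

```typescript
// Minimal sketch: deriving context-window utilization from the optional
// totalTokens/maxTokens fields added to provider responses in this PR.
interface TokenInfo {
  totalTokens?: number; // tokens consumed by the current completion request
  maxTokens?: number; // context window size for the model/provider
}

function contextUsagePercent(info: TokenInfo): number | undefined {
  if (info.totalTokens === undefined || info.maxTokens === undefined) {
    return undefined; // provider did not report token limits
  }
  return Math.round((info.totalTokens / info.maxTokens) * 100);
}

// contextUsagePercent({ totalTokens: 45_235, maxTokens: 100_000 }) === 45
```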

### 2. Status Updates

Agents receive status updates with information about:
- Current token usage and percentage of the maximum
- Cost so far
- Active sub-agents and their status
- Active shell processes and their status
- Active browser sessions and their status

Status updates are sent:
1. Every 5 agent interactions (periodic updates)
2. Whenever token usage exceeds 50% of the maximum (threshold-based updates)

Example status update:
```
--- STATUS UPDATE ---
Token Usage: 45,235/100,000 (45%)
Cost So Far: $0.23

Active Sub-Agents: 2
- sa_12345: Analyzing project structure and dependencies
- sa_67890: Implementing unit tests for compactHistory tool

Active Shell Processes: 3
- sh_abcde: npm test
- sh_fghij: npm run watch
- sh_klmno: git status

Active Browser Sessions: 1
- bs_12345: https://www.typescriptlang.org/docs/handbook/utility-types.html

If token usage is high (>70%), consider using the 'compactHistory' tool to reduce context size.
--- END STATUS ---
```
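
The two triggers above reduce to a simple check. The sketch below is illustrative only; the function and parameter names are not taken from the codebase:

```typescript
// Illustrative check for when a status update is emitted: every 5 agent
// interactions, or whenever usage crosses 50% of the context window.
function shouldSendStatusUpdate(
  interactionCount: number,
  totalTokens: number,
  maxTokens: number,
): boolean {
  const periodic = interactionCount > 0 && interactionCount % 5 === 0;
  const overThreshold = totalTokens / maxTokens > 0.5;
  return periodic || overThreshold;
}
```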

### 3. Message Compaction Tool

The `compactHistory` tool allows agents to compact their message history by summarizing older messages while preserving recent context. This tool:

1. Takes a parameter for how many recent messages to preserve unchanged
2. Summarizes all older messages into a single, concise summary
3. Replaces the original messages with the summary and preserved messages
4. Reports on the reduction in context size
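
A hedged sketch of that flow (the message shape and summarizer callback are placeholders, not the tool's actual signature):

```typescript
// Sketch of the compaction flow described above: keep the most recent N
// messages verbatim and collapse everything older into a single summary.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

async function compactMessages(
  messages: ChatMessage[],
  preserveRecentMessages: number,
  summarize: (older: ChatMessage[]) => Promise<string>,
): Promise<ChatMessage[]> {
  if (messages.length <= preserveRecentMessages) {
    return messages; // nothing old enough to compact
  }
  const cutoff = messages.length - preserveRecentMessages;
  const older = messages.slice(0, cutoff);
  const recent = messages.slice(cutoff);

  const summary = await summarize(older);
  return [
    { role: 'system', content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```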

## Usage

Agents are instructed to monitor their token usage through status updates and use the `compactHistory` tool when token usage approaches 50% of the maximum:

```javascript
// Example of agent using the compactHistory tool
{
name: "compactHistory",
preserveRecentMessages: 10,
customPrompt: "Focus on summarizing our key decisions and current tasks."
}
```

## Configuration

The message compaction feature is enabled by default with sensible settings (restated in code below):
- Status updates every 5 agent interactions
- Recommendation to compact at 70% token usage
- Default preservation of 10 recent messages when compacting
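
The same defaults as constants; the names are hypothetical:

```typescript
// The defaults above as constants. Names are illustrative only and are not
// taken from the configuration code in this PR.
const STATUS_UPDATE_INTERVAL = 5; // status update every 5 interactions
const COMPACTION_RECOMMENDATION_RATIO = 0.7; // recommend compaction at 70% usage
const DEFAULT_PRESERVE_RECENT_MESSAGES = 10; // recent messages kept verbatim
```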

## Model Token Limits

The system includes context-window token limits for various models; unrecognized models fall back to a conservative default (see the lookup sketch after the lists):

### Anthropic Models
- claude-3-opus-20240229: 200,000 tokens
- claude-3-sonnet-20240229: 200,000 tokens
- claude-3-haiku-20240307: 200,000 tokens
- claude-2.1: 100,000 tokens

### OpenAI Models
- gpt-4o: 128,000 tokens
- gpt-4-turbo: 128,000 tokens
- gpt-3.5-turbo: 16,385 tokens

### Ollama Models
- llama2: 4,096 tokens
- mistral: 8,192 tokens
- mixtral: 32,768 tokens
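
Limits are looked up per provider with a fallback for unrecognized models. A sketch of the Ollama-style lookup, using a subset of the table above and the pattern from this PR's providers:

```typescript
// Look up the context window: exact model name first, then the base name
// before any ':tag' suffix, then a conservative default.
const OLLAMA_MODEL_LIMITS: Record<string, number> = {
  llama2: 4096,
  mistral: 8192,
  mixtral: 32768,
};

function ollamaContextWindow(model: string): number {
  const baseModelName = model.split(':')[0];
  return (
    OLLAMA_MODEL_LIMITS[model] ??
    (baseModelName ? OLLAMA_MODEL_LIMITS[baseModelName] : undefined) ??
    4096 // default fallback
  );
}
```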

## Benefits

- Prevents context window overflow errors
- Maintains important context for agent operation
- Enables longer-running agent sessions
- Makes the system more robust for complex tasks
- Gives agents self-awareness of resource usage
50 changes: 50 additions & 0 deletions example-status-update.md
@@ -0,0 +1,50 @@
# Example Status Update

This is an example of the status update the agent receives:

```
--- STATUS UPDATE ---
Token Usage: 45,235/100,000 (45%)
Cost So Far: $0.23

Active Sub-Agents: 2
- sa_12345: Analyzing project structure and dependencies
- sa_67890: Implementing unit tests for compactHistory tool

Active Shell Processes: 3
- sh_abcde: npm test -- --watch packages/agent/src/tools/utility
- sh_fghij: npm run watch
- sh_klmno: git status

Active Browser Sessions: 1
- bs_12345: https://www.typescriptlang.org/docs/handbook/utility-types.html

Your token usage is high (45%). It is recommended to use the 'compactHistory' tool now to reduce context size.
--- END STATUS ---
```

## About Status Updates

Status updates are sent to the agent (every 5 interactions and whenever token usage exceeds 50%) to provide awareness of:

1. **Token Usage**: Current usage and percentage of maximum context window
2. **Cost**: Estimated cost of the session so far
3. **Active Sub-Agents**: Running background agents and their tasks
4. **Active Shell Processes**: Running shell commands
5. **Active Browser Sessions**: Open browser sessions and their URLs

When token usage gets high (>70%), the agent is reminded to use the `compactHistory` tool to reduce context size by summarizing older messages.

## Using the compactHistory Tool

The agent can use the `compactHistory` tool like this:

```javascript
{
name: "compactHistory",
preserveRecentMessages: 10,
customPrompt: "Optional custom summarization prompt"
}
```

This will summarize all but the 10 most recent messages into a single summary message, significantly reducing token usage while preserving important context.
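
As a purely illustrative example of the resulting shape (contents invented), a run with `preserveRecentMessages: 10` leaves one summary message followed by the ten untouched recent messages:

```typescript
// Illustrative result shape after compaction; contents are invented.
const compacted = [
  { role: 'system', content: 'Summary of earlier conversation: ...' },
  // ...followed by the 10 most recent messages, unchanged
];
```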
7 changes: 7 additions & 0 deletions packages/agent/CHANGELOG.md
@@ -1,3 +1,10 @@
# [mycoder-agent-v1.6.0](https://github.com/drivecore/mycoder/compare/mycoder-agent-v1.5.0...mycoder-agent-v1.6.0) (2025-03-21)


### Features

* **browser:** add system browser detection for Playwright ([00bd879](https://github.com/drivecore/mycoder/commit/00bd879443c9de51c6ee5e227d4838905506382a)), closes [#333](https://github.com/drivecore/mycoder/issues/333)

# [mycoder-agent-v1.5.0](https://github.com/drivecore/mycoder/compare/mycoder-agent-v1.4.2...mycoder-agent-v1.5.0) (2025-03-20)

### Bug Fixes
2 changes: 1 addition & 1 deletion packages/agent/package.json
@@ -1,6 +1,6 @@
{
"name": "mycoder-agent",
"version": "1.5.0",
"version": "1.6.0",
"description": "Agent module for mycoder - an AI-powered software development assistant",
"type": "module",
"main": "dist/index.js",
30 changes: 27 additions & 3 deletions packages/agent/src/core/llm/providers/anthropic.ts
@@ -81,13 +81,33 @@ function addCacheControlToMessages(
});
}

function tokenUsageFromMessage(message: Anthropic.Message) {
// Define model context window sizes for Anthropic models
const ANTHROPIC_MODEL_LIMITS: Record<string, number> = {
'claude-3-opus-20240229': 200000,
'claude-3-sonnet-20240229': 200000,
'claude-3-haiku-20240307': 200000,
'claude-3-7-sonnet-20250219': 200000,
'claude-2.1': 100000,
'claude-2.0': 100000,
'claude-instant-1.2': 100000,
// Add other models as needed
};

function tokenUsageFromMessage(message: Anthropic.Message, model: string) {
const usage = new TokenUsage();
usage.input = message.usage.input_tokens;
usage.cacheWrites = message.usage.cache_creation_input_tokens ?? 0;
usage.cacheReads = message.usage.cache_read_input_tokens ?? 0;
usage.output = message.usage.output_tokens;
return usage;

const totalTokens = usage.input + usage.output;
const maxTokens = ANTHROPIC_MODEL_LIMITS[model] || 100000; // Default fallback

return {
usage,
totalTokens,
maxTokens,
};
}

/**
@@ -175,10 +195,14 @@ export class AnthropicProvider implements LLMProvider {
};
});

const tokenInfo = tokenUsageFromMessage(response, this.model);

return {
text: content,
toolCalls: toolCalls,
tokenUsage: tokenUsageFromMessage(response),
tokenUsage: tokenInfo.usage,
totalTokens: tokenInfo.totalTokens,
maxTokens: tokenInfo.maxTokens,
};
} catch (error) {
throw new Error(
34 changes: 31 additions & 3 deletions packages/agent/src/core/llm/providers/ollama.ts
@@ -13,6 +13,22 @@ import {

import { TokenUsage } from '../../tokens.js';
import { ToolCall } from '../../types.js';
// Define model context window sizes for Ollama models
// These are approximate and may vary based on specific model configurations
const OLLAMA_MODEL_LIMITS: Record<string, number> = {
'llama2': 4096,
'llama2-uncensored': 4096,
'llama2:13b': 4096,
'llama2:70b': 4096,
'mistral': 8192,
'mistral:7b': 8192,
'mixtral': 32768,
'codellama': 16384,
'phi': 2048,
'phi2': 2048,
'openchat': 8192,
// Add other models as needed
};
import { LLMProvider } from '../provider.js';
import {
GenerateOptions,
@@ -56,7 +72,7 @@ export class OllamaProvider implements LLMProvider {
messages,
functions,
temperature = 0.7,
maxTokens,
maxTokens: requestMaxTokens,
topP,
frequencyPenalty,
presencePenalty,
@@ -86,10 +102,10 @@
};

// Add max_tokens if provided
if (maxTokens !== undefined) {
if (requestMaxTokens !== undefined) {
requestOptions.options = {
...requestOptions.options,
num_predict: maxTokens,
num_predict: requestMaxTokens,
};
}

@@ -114,11 +130,23 @@
const tokenUsage = new TokenUsage();
tokenUsage.output = response.eval_count || 0;
tokenUsage.input = response.prompt_eval_count || 0;

// Calculate total tokens and get max tokens for the model
const totalTokens = tokenUsage.input + tokenUsage.output;

// Extract the base model name without specific parameters
const baseModelName = this.model.split(':')[0];
// Check if model exists in limits, otherwise use base model or default
const modelMaxTokens = OLLAMA_MODEL_LIMITS[this.model] ||
(baseModelName ? OLLAMA_MODEL_LIMITS[baseModelName] : undefined) ||
4096; // Default fallback

return {
text: content,
toolCalls: toolCalls,
tokenUsage: tokenUsage,
totalTokens,
maxTokens: modelMaxTokens,
};
}

27 changes: 23 additions & 4 deletions packages/agent/src/core/llm/providers/openai.ts
@@ -4,7 +4,7 @@
import OpenAI from 'openai';

import { TokenUsage } from '../../tokens.js';
import { ToolCall } from '../../types';
import { ToolCall } from '../../types.js';
import { LLMProvider } from '../provider.js';
import {
GenerateOptions,
@@ -19,6 +19,19 @@ import type {
ChatCompletionTool,
} from 'openai/resources/chat';

// Define model context window sizes for OpenAI models
const OPENAI_MODEL_LIMITS: Record<string, number> = {
'gpt-4o': 128000,
'gpt-4-turbo': 128000,
'gpt-4-0125-preview': 128000,
'gpt-4-1106-preview': 128000,
'gpt-4': 8192,
'gpt-4-32k': 32768,
'gpt-3.5-turbo': 16385,
'gpt-3.5-turbo-16k': 16385,
// Add other models as needed
};

/**
* OpenAI-specific options
*/
@@ -60,7 +73,7 @@ export class OpenAIProvider implements LLMProvider {
messages,
functions,
temperature = 0.7,
maxTokens,
maxTokens: requestMaxTokens,
stopSequences,
topP,
presencePenalty,
@@ -79,7 +92,7 @@
model: this.model,
messages: formattedMessages,
temperature,
max_tokens: maxTokens,
max_tokens: requestMaxTokens,
stop: stopSequences,
top_p: topP,
presence_penalty: presencePenalty,
@@ -116,11 +129,17 @@
const tokenUsage = new TokenUsage();
tokenUsage.input = response.usage?.prompt_tokens || 0;
tokenUsage.output = response.usage?.completion_tokens || 0;

// Calculate total tokens and get max tokens for the model
const totalTokens = tokenUsage.input + tokenUsage.output;
const modelMaxTokens = OPENAI_MODEL_LIMITS[this.model] || 8192; // Default fallback

return {
text: content,
toolCalls,
tokenUsage,
totalTokens,
maxTokens: modelMaxTokens,
};
} catch (error) {
throw new Error(`Error calling OpenAI API: ${(error as Error).message}`);
@@ -198,4 +217,4 @@ export class OpenAIProvider implements LLMProvider {
},
}));
}
}
}
3 changes: 3 additions & 0 deletions packages/agent/src/core/llm/types.ts
@@ -80,6 +80,9 @@
text: string;
toolCalls: ToolCall[];
tokenUsage: TokenUsage;
// Add new fields for context window tracking
totalTokens?: number; // Total tokens used in this request
maxTokens?: number; // Maximum allowed tokens for this model
}

/**
@@ -104,5 +107,5 @@
apiKey?: string;
baseUrl?: string;
organization?: string;
[key: string]: any; // Allow for provider-specific options

CI warning (GitHub Actions / ci) on packages/agent/src/core/llm/types.ts line 110: Unexpected any. Specify a different type
}