Important: This project is a vibe-coding experiment.
A lightweight proxy for LLM interactions with basic guardrails, logging, and metrics.
- Single LLM provider support (OpenAI)
- Guardrails system:
  - Banned words filtering
  - Regex pattern matching
  - Content length limits
  - Custom filtering rules (word count, pattern detection, etc.)
- Logging of requests and responses
- Prometheus metrics
- Config-driven setup
- Docker deployment
- Monitoring with Prometheus and Grafana
The proxy can be configured using a YAML file or environment variables:
server:
  port: 8080
llm:
  url: "https://api.openai.com/v1/chat/completions"
  api_key: "YOUR_OPENAI_API_KEY"
guardrails:
  # Basic banned words filtering
  banned_words:
    - "bomb"
    - "attack"
    - "weapon"
    - "terrorist"
  # Regex pattern filtering for sensitive information
  regex_patterns:
    - "\\b\\d{3}-\\d{2}-\\d{4}\\b"  # US Social Security Numbers
    - "\\b\\d{16}\\b"               # Credit card numbers (simplified)
  # Length limitations
  max_content_length: 10000
  max_prompt_length: 4000
  # Custom rules with configuration
  custom_rules:
    - name: "Max Words"
      type: "word_count"
      parameters:
        max_words: 1000
    - name: "PII Detection"
      type: "contains_pattern"
      parameters:
        pattern: "(passport|ssn|social security|credit card)"
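For reference, the YAML above maps naturally onto Go structs. The following is a minimal sketch assuming gopkg.in/yaml.v3; the package, type, and field names are illustrative and not necessarily the project's actual code:

package config

import (
    "os"

    "gopkg.in/yaml.v3"
)

// Config mirrors the YAML layout shown above (field names are illustrative).
type Config struct {
    Server struct {
        Port int `yaml:"port"`
    } `yaml:"server"`
    LLM struct {
        URL    string `yaml:"url"`
        APIKey string `yaml:"api_key"`
    } `yaml:"llm"`
    Guardrails struct {
        BannedWords      []string     `yaml:"banned_words"`
        RegexPatterns    []string     `yaml:"regex_patterns"`
        MaxContentLength int          `yaml:"max_content_length"`
        MaxPromptLength  int          `yaml:"max_prompt_length"`
        CustomRules      []CustomRule `yaml:"custom_rules"`
    } `yaml:"guardrails"`
}

// CustomRule carries a rule name, its type, and free-form parameters.
type CustomRule struct {
    Name       string         `yaml:"name"`
    Type       string         `yaml:"type"`
    Parameters map[string]any `yaml:"parameters"`
}

// Load reads and parses a YAML config file.
func Load(path string) (*Config, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    var cfg Config
    if err := yaml.Unmarshal(data, &cfg); err != nil {
        return nil, err
    }
    return &cfg, nil
}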
Configuration via environment variables is supported for basic settings:
- SERVER_PORT: Server port (default: 8080)
- LLM_URL: LLM API endpoint URL
- LLM_API_KEY: LLM API key
- BANNED_WORDS: Comma-separated list of banned words
For advanced guardrail configuration, use the config file: complex settings such as regex patterns and custom rules are better expressed in YAML than in environment variables.
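As an illustration, applying these environment overrides on top of a file-based config might look like the sketch below. It continues the illustrative Config struct from the previous sketch; the helper name is hypothetical:

package config

import (
    "os"
    "strconv"
    "strings"
)

// applyEnvOverrides layers the basic environment variables over a loaded config.
// Only the settings listed above are handled; everything else stays file-driven.
func applyEnvOverrides(cfg *Config) {
    if v := os.Getenv("SERVER_PORT"); v != "" {
        if port, err := strconv.Atoi(v); err == nil {
            cfg.Server.Port = port
        }
    }
    if v := os.Getenv("LLM_URL"); v != "" {
        cfg.LLM.URL = v
    }
    if v := os.Getenv("LLM_API_KEY"); v != "" {
        cfg.LLM.APIKey = v
    }
    if v := os.Getenv("BANNED_WORDS"); v != "" {
        cfg.Guardrails.BannedWords = strings.Split(v, ",")
    }
}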
POST /v1/query
Request:
{
  "prompt": "Your prompt to the LLM",
  "model_params": {
    "model": "gpt-3.5-turbo",
    "temperature": 0.7,
    "max_tokens": 256
  }
}
Response:
{
  "completion": "LLM response text..."
}
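For illustration, a minimal Go client for this endpoint could look like the following sketch. It assumes the proxy is running locally on port 8080, as in the examples further down; the struct names are illustrative:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

// queryRequest and queryResponse mirror the request and response bodies above.
type queryRequest struct {
    Prompt      string         `json:"prompt"`
    ModelParams map[string]any `json:"model_params,omitempty"`
}

type queryResponse struct {
    Completion string `json:"completion"`
}

func main() {
    payload, err := json.Marshal(queryRequest{
        Prompt:      "Tell me a joke",
        ModelParams: map[string]any{"model": "gpt-3.5-turbo", "temperature": 0.7},
    })
    if err != nil {
        log.Fatal(err)
    }

    resp, err := http.Post("http://localhost:8080/v1/query", "application/json", bytes.NewReader(payload))
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    var out queryResponse
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        log.Fatal(err)
    }
    fmt.Println(out.Completion)
}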
GET /metrics
Returns Prometheus-formatted metrics including:
- llm_requests_total: Total number of LLM requests processed
- llm_errors_total: Total number of errors from LLM calls
- llm_tokens_total: Total number of tokens used in LLM calls
- guardrail_blocks_total: Total number of requests blocked by guardrails
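As an illustration of how counters with these names are typically defined with the Prometheus Go client (github.com/prometheus/client_golang), here is a minimal sketch; it is not necessarily the project's exact wiring:

package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// Counters matching the metric names exposed at /metrics.
var (
    RequestsTotal = promauto.NewCounter(prometheus.CounterOpts{
        Name: "llm_requests_total",
        Help: "Total number of LLM requests processed.",
    })
    ErrorsTotal = promauto.NewCounter(prometheus.CounterOpts{
        Name: "llm_errors_total",
        Help: "Total number of errors from LLM calls.",
    })
    TokensTotal = promauto.NewCounter(prometheus.CounterOpts{
        Name: "llm_tokens_total",
        Help: "Total number of tokens used in LLM calls.",
    })
    GuardrailBlocksTotal = promauto.NewCounter(prometheus.CounterOpts{
        Name: "guardrail_blocks_total",
        Help: "Total number of requests blocked by guardrails.",
    })
)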
GET /health
Returns a basic health status for the service.
# Build and run
go build -o ai-proxy ./cmd/server
./ai-proxy --config config/config.yaml
# Build Docker image
docker build -t ai-proxy:0.1 .
# Run with configuration in environment variables
docker run -p 8080:8080 \
-e LLM_API_KEY=your_openai_api_key \
ai-proxy:0.1
# Or mount a custom config file
docker run -p 8080:8080 \
-v $(pwd)/config/config.yaml:/app/config/config.yaml \
ai-proxy:0.1
The project includes a complete monitoring stack with Prometheus and Grafana.
# Start the entire stack (AI Proxy, Prometheus, and Grafana)
docker-compose up -d
# Access the services:
# - AI Proxy: http://localhost:8080
# - Prometheus: http://localhost:9090
# - Grafana: http://localhost:3000 (login with admin/admin)
# Stop the services
docker-compose down
# Send a query
curl -X POST http://localhost:8080/v1/query \
-H "Content-Type: application/json" \
-d '{"prompt": "Tell me a joke", "model_params": {"temperature": 0.7}}'
# Check metrics
curl http://localhost:8080/metrics
The AI Proxy supports multiple types of guardrails that can be combined:
Banned words filtering blocks requests that contain any of the specified banned words.
banned_words:
  - "dangerous_word1"
  - "dangerous_word2"
Regex pattern matching uses regular expressions to block content matching the specified patterns. This is useful for detecting structured sensitive information such as SSNs or credit card numbers.
regex_patterns:
  - "\\b\\d{3}-\\d{2}-\\d{4}\\b"  # SSN pattern
Content length limits cap the maximum length of prompts and completions.
max_content_length: 10000
max_prompt_length: 4000
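Taken together, these first three guardrails amount to a simple prompt filter. The sketch below shows one way such a check could look in Go; the function and parameter names are illustrative, not the project's actual API:

package guardrails

import (
    "fmt"
    "regexp"
    "strings"
)

// CheckPrompt applies banned-word, regex, and length guardrails to a prompt.
// It returns a non-nil error describing the first check that blocks the prompt.
func CheckPrompt(prompt string, banned []string, patterns []*regexp.Regexp, maxPromptLen int) error {
    if len(prompt) > maxPromptLen {
        return fmt.Errorf("prompt exceeds maximum length of %d characters", maxPromptLen)
    }
    lower := strings.ToLower(prompt)
    for _, word := range banned {
        if strings.Contains(lower, strings.ToLower(word)) {
            return fmt.Errorf("prompt contains banned word %q", word)
        }
    }
    for _, re := range patterns {
        if re.MatchString(prompt) {
            return fmt.Errorf("prompt matches blocked pattern %q", re.String())
        }
    }
    return nil
}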
Custom filtering rules allow configurable checks using predefined rule types; each rule type takes specific, type-validated parameters. The word_count rule limits the number of words in a prompt:
custom_rules:
  - name: "Max Words"
    type: "word_count"
    parameters:
      max_words: 1000
The contains_pattern rule blocks prompts that match a regular expression:
custom_rules:
  - name: "PII Detection"
    type: "contains_pattern"
    parameters:
      pattern: "(passport|ssn|social security|credit card)"  # Regular expression pattern
The pattern parameter must be a valid regular expression. Invalid regex patterns will be logged as warnings and skipped.
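To make the two rule types concrete, here is a simplified sketch of how they could be evaluated, including the skip-and-warn behaviour for invalid patterns described above; the names and signatures are illustrative, not the project's actual code:

package guardrails

import (
    "fmt"
    "log"
    "regexp"
    "strings"
)

// EvaluateCustomRule applies one configured rule to a prompt.
// Unknown rule types and invalid parameters are logged and skipped.
func EvaluateCustomRule(name, ruleType string, params map[string]any, prompt string) error {
    switch ruleType {
    case "word_count":
        maxWords, ok := params["max_words"].(int)
        if !ok {
            log.Printf("warning: rule %q: max_words is missing or not an integer, skipping", name)
            return nil
        }
        if len(strings.Fields(prompt)) > maxWords {
            return fmt.Errorf("rule %q: prompt exceeds %d words", name, maxWords)
        }
    case "contains_pattern":
        pattern, _ := params["pattern"].(string)
        re, err := regexp.Compile(pattern)
        if err != nil {
            // Invalid regex patterns are logged as warnings and skipped.
            log.Printf("warning: rule %q: invalid pattern %q, skipping", name, pattern)
            return nil
        }
        if re.MatchString(prompt) {
            return fmt.Errorf("rule %q: prompt matches blocked pattern", name)
        }
    default:
        log.Printf("warning: unknown rule type %q, skipping", ruleType)
    }
    return nil
}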
Possible future enhancements include:
- Additional LLM Support
  - Integration with other LLM providers (Anthropic Claude, Google Gemini, etc.)
- Authentication & Security
  - API key authentication
  - Basic rate limiting
  - Request validation
- Performance Improvements
  - Optional caching for common queries
  - Optimizations for high-traffic scenarios
  - Load balancing across multiple LLM providers