VT.ai Documentation

Welcome to the VT.ai documentation. VT.ai is a minimal multimodal AI chat application with dynamic routing capabilities.

About VT.ai

VT.ai is a multimodal AI chat application that integrates multiple AI providers (including OpenAI, Anthropic, and Google) and uses semantic routing to direct each query to the most appropriate handler. It accepts text, image, and audio inputs, provides vision analysis for images, and integrates DALL-E 3 for image generation.

Key Features

Multi-Provider AI Integration: Supports OpenAI (o1, o3, GPT-4o), Anthropic (Claude), Google (Gemini), DeepSeek, Meta (Llama), Cohere, local models via Ollama, and more.
Semantic-Based Routing: A routing system that uses vector-based classification to direct each query to a specialized handler automatically.
Multimodal Capabilities: Support for text, image, and audio inputs with vision analysis for images and URLs.
Voice Interaction: Speech-to-text and real-time conversation features with multiple voice models.
Thinking Mode: Exposes the models' step-by-step reasoning so the thinking process is visible.
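
To make the idea of vector-based routing concrete, here is a minimal sketch. It is not VT.ai's actual implementation: the route names, the example utterances, the `embed` and `route` functions, and the similarity threshold are all hypothetical, and a toy bag-of-words vector stands in for a real embedding model. Each query is compared to example utterances for every route, and the route with the highest cosine similarity wins.

```python
import math
from collections import Counter

# Hypothetical routes and example utterances (illustrative only,
# not taken from VT.ai's configuration).
ROUTES = {
    "image_generation": ["draw a picture of a cat", "generate an image of a sunset"],
    "vision": ["what is in this image", "describe this photo"],
    "chat": ["tell me a joke", "explain how recursion works"],
}


def embed(text):
    """Toy bag-of-words vector; a real router would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query, threshold=0.2, default="chat"):
    """Return the best-matching route name, or `default` if nothing is similar enough."""
    query_vec = embed(query)
    best_name, best_score = default, 0.0
    for name, utterances in ROUTES.items():
        # Score a route by its single closest example utterance.
        score = max(cosine(query_vec, embed(u)) for u in utterances)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else default


if __name__ == "__main__":
    print(route("generate an image of a mountain"))  # image_generation
    print(route("describe this photo for me"))       # vision
    print(route("zzz qqq"))                          # chat (fallback)
```

The fallback to a default route matters in practice: a query that resembles none of the examples should go to general chat rather than to whichever specialized handler happens to score slightly above zero.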

Documentation Structure

This documentation is organized into three sections:

User Guide: Information for end users of VT.ai, including setup and usage instructions.
Developer Guide: Information for developers who want to extend or modify VT.ai.
API Reference: Detailed API documentation for VT.ai's components.

Getting Started

To get started with VT.ai, see the Getting Started guide.

Contributing

Contributions to VT.ai and its documentation are welcome. See the GitHub repository for more information.