Overview
RecordEngine supports five AI providers, all configurable from Settings → AI Backend with no code changes or server restarts.

| Provider | Type | Best For |
|---|---|---|
| Local (Ollama) | On-premise | Maximum privacy, air-gapped deployments |
| OpenAI | Cloud BYOK | High accuracy, proven reliability |
| Claude (Anthropic) | Cloud BYOK | Multilingual, nuanced reasoning |
| Qwen API | Cloud BYOK | Chinese documents |
| Gemini (Google) | Cloud BYOK | Best cost/performance ratio |
Switching Providers
- Go to Settings → AI Backend
- Select a provider from the dropdown
- Enter your API key (cloud providers only)
- Click 💾 Save
- Click 🧪 Test Connection to verify
Provider Details
Local (Ollama)
Fully offline inference on your GPU. No API key required; no data leaves your network. Default models:

| Task | Model |
|---|---|
| Extraction / structured output | qwen3.5:9b |
| Vision / OCR | qwen3.5:9b |
| Chat | qwen3.5:9b |
Two modes are available:
- Precise — uses qwen3.5:27b for text tasks (higher accuracy, slower)
- Fast — uses qwen3.5:9b for all tasks (default, recommended)
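The mode-to-model mapping above can be sketched as a small helper. This is an illustrative sketch, not RecordEngine's actual code: the function name and the assumption that Precise mode upgrades only text tasks (extraction and chat) while vision/OCR stays on the 9b model are inferred from the tables above.

```python
# Hypothetical sketch: pick an Ollama model tag for a task under a mode.
# Model tags come from the defaults table above.

PRECISE_TEXT_MODEL = "qwen3.5:27b"  # higher accuracy, slower
FAST_MODEL = "qwen3.5:9b"           # default, recommended

def pick_ollama_model(task: str, mode: str = "fast") -> str:
    """Return the Ollama model tag for a task under the given mode."""
    # Assumption: Precise mode upgrades text tasks only; vision/OCR
    # remains on the 9b model in both modes.
    if mode == "precise" and task in ("extraction", "chat"):
        return PRECISE_TEXT_MODEL
    return FAST_MODEL
```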
OpenAI
Default model: `gpt-4o`
API key: Get yours at platform.openai.com
Pricing: ~$10.00 / 1M output tokens

GPT-4o handles vision and structured extraction with high accuracy. This is the recommended provider for GPU-mode pilots.
Claude (Anthropic)
Default model: `claude-sonnet-4-5`
API key: Get yours at console.anthropic.com
Pricing: ~$15.00 / 1M output tokens

Strong at multilingual documents, including Chinese text, and at nuanced reasoning for AI chat.
Qwen API
Default models: `qwen-max` (text), `qwen-vl-max` (vision)
API key: Get yours at dashscope.aliyun.com
Pricing: ~$0.50 / 1M input tokens

Best performance on Chinese-language documents. Uses Alibaba's DashScope infrastructure; flag this for security-conscious prospects, even though data stays within Alibaba Cloud.
Gemini (Google)
Default model: `gemini-2.5-flash-preview-04-17`
API key: Get yours at aistudio.google.com
Pricing: ~$2.50 / 1M output tokens

Recommended for API-mode (BYOK) deployments where cost efficiency matters. Gemini delivers extraction quality comparable to GPT-4o at approximately 5x lower cost, with excellent OCR performance on handwritten and complex document layouts. To use Gemini 3 Flash, enter `gemini-3-flash-preview` in the model override field.
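The per-million-token prices quoted above make back-of-the-envelope comparisons easy. A small sketch, using only the output-token figures (Qwen's quoted price is for input tokens, so it is omitted here; all prices are approximate and change frequently):

```python
# Approximate output-token prices (USD per 1M tokens) from the sections above.
OUTPUT_PRICE_PER_M = {
    "openai": 10.00,  # gpt-4o
    "claude": 15.00,  # claude-sonnet-4-5
    "gemini": 2.50,   # gemini-2.5-flash
}

def output_cost_usd(provider: str, output_tokens: int) -> float:
    """Estimated cost in USD for the given number of output tokens."""
    return OUTPUT_PRICE_PER_M[provider] * output_tokens / 1_000_000

# e.g. 200k output tokens: gemini ≈ $0.50 vs gpt-4o ≈ $2.00 — a 4x gap on
# output tokens alone, part of the cost-efficiency case made above.
```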
Model Override
To use a specific model instead of the provider default:
- Go to Settings → AI Backend
- Enter the model name in the Model override field
- Save

Examples:
- OpenAI: `gpt-4o-mini` (cheaper, lower accuracy)
- Claude: `claude-haiku-4-5` (faster, cheaper)
- Gemini: `gemini-3-flash-preview` (latest generation)
Fallback to Local
When a cloud provider is configured, you can enable automatic fallback to your local GPU if the cloud API is unavailable:
- Enable Fallback to Local GPU if cloud is unavailable in Settings → AI Backend
- Save
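The fallback behavior can be sketched as a try/except around the cloud call. The callables below stand in for real provider clients; the function name is an assumption, not RecordEngine's actual API:

```python
def call_with_fallback(cloud_call, local_call, *, fallback_enabled: bool):
    """Try the cloud provider first; on failure, optionally fall back to local GPU."""
    try:
        return cloud_call()
    except Exception:
        # Cloud API unreachable or errored: retry locally only if the
        # fallback setting is enabled, otherwise surface the error.
        if fallback_enabled:
            return local_call()
        raise
```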
Architecture
All AI calls are routed through `app/ai_router.py`, which exposes three public functions.
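A central router like this typically dispatches on the configured provider. A hedged sketch of the pattern (the function and handler names below are illustrative and are not the actual `app/ai_router.py` API):

```python
def route(provider: str, prompt: str, handlers: dict) -> str:
    """Dispatch a prompt to the handler registered for the active provider."""
    try:
        handler = handlers[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}")
    return handler(prompt)
```

Centralizing dispatch this way is what lets the Settings page switch providers without code changes: only the routing table's active entry changes.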
Per-Provider API Key Storage
Each provider’s key is stored independently:

| Setting key | Provider |
|---|---|
| ai_api_key_openai | OpenAI |
| ai_api_key_claude | Claude |
| ai_api_key_qwen_api | Qwen API |
| ai_api_key_gemini | Gemini |