# Changing the AI Model

## Overview

The system uses Ollama for AI-powered features (summarization, clustering, neutral summaries). You can change the model by updating the `.env` file.
## Current Configuration

Default model: `phi3:latest`

The model is configured in `backend/.env`:

```bash
OLLAMA_MODEL=phi3:latest
```
## ✅ How to Change the Model

### Step 1: Update the .env File

Edit `backend/.env` and change the `OLLAMA_MODEL` value:

```bash
# Example: change to a different model
OLLAMA_MODEL=llama3:latest

# Or use a specific version
OLLAMA_MODEL=mistral:7b

# Or use a custom model
OLLAMA_MODEL=your-custom-model:latest
```
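For reference, the backend reads these values at startup. The actual `Config` class is not reproduced in this document; the snippet below is only a minimal sketch of how it might expose the settings, with attribute names matching the test snippet further down and assumed defaults.

```python
# Minimal sketch of how backend/config.py might expose the Ollama settings.
# The real Config class in this project may differ; defaults below are
# assumptions, and the attribute names match the test snippet in this doc.
import os


class Config:
    # Base URL of the Ollama service (assumed Docker-internal default)
    OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://ollama:11434")
    # Model name read from backend/.env
    OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "phi3:latest")
    # Request timeout in seconds (see Troubleshooting below)
    OLLAMA_TIMEOUT = int(os.environ.get("OLLAMA_TIMEOUT", "120"))
```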
### Step 2: Restart the Services

The new model is downloaded automatically on startup:

```bash
# Stop services
docker-compose down

# Start services (the model will be pulled automatically)
docker-compose up -d

# Watch the download progress
docker-compose logs -f ollama-setup
```

Note: The first startup with a new model takes 2-10 minutes, depending on model size.
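Once the services are up, you can optionally confirm the new model was pulled by querying Ollama's `/api/tags` endpoint. The sketch below assumes the Ollama port (11434) is exposed on the host; adjust the URL to your setup.

```python
# Sketch: confirm the new model is installed by listing models via Ollama's
# /api/tags endpoint. Assumes port 11434 is exposed on the host; inside the
# containers the base URL would be Config.OLLAMA_BASE_URL instead.
import requests

OLLAMA_URL = "http://localhost:11434"  # adjust to your setup

resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
resp.raise_for_status()
installed = [m["name"] for m in resp.json().get("models", [])]

print("Installed models:", installed)
print("llama3:latest ready:", "llama3:latest" in installed)
```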
## Supported Models

### Recommended Models

| Model | Size | Speed | Quality | Best For |
|---|---|---|---|---|
| `phi3:latest` | 2.3GB | ⚡⚡⚡ | ⭐⭐⭐ | Default - fast, good quality |
| `llama3:8b` | 4.7GB | ⚡⚡ | ⭐⭐⭐⭐ | Better quality, slower |
| `mistral:7b` | 4.1GB | ⚡⚡ | ⭐⭐⭐⭐ | Balanced performance |
| `gemma:7b` | 5.0GB | ⚡⚡ | ⭐⭐⭐⭐ | Google's model |
### Lightweight Models (Faster)

| Model | Size | Speed | Quality |
|---|---|---|---|
| `phi3:mini` | 2.3GB | ⚡⚡⚡ | ⭐⭐⭐ |
| `tinyllama:latest` | 637MB | ⚡⚡⚡⚡ | ⭐⭐ |
| `qwen:0.5b` | 397MB | ⚡⚡⚡⚡ | ⭐⭐ |
### High-Quality Models (Slower)

| Model | Size | Speed | Quality |
|---|---|---|---|
| `llama3:70b` | 40GB | ⚡ | ⭐⭐⭐⭐⭐ |
| `mixtral:8x7b` | 26GB | ⚡ | ⭐⭐⭐⭐⭐ |

Full list: https://ollama.ai/library
## Manual Model Management

### Pull a Model Manually

```bash
# Pull a specific model
docker-compose exec ollama ollama pull llama3:latest

# Pull multiple models
docker-compose exec ollama ollama pull mistral:7b
docker-compose exec ollama ollama pull phi3:latest
```

### List Available Models

```bash
docker-compose exec ollama ollama list
```
### Remove Unused Models

```bash
# Remove a specific model to free disk space
docker-compose exec ollama ollama rm phi3:latest
```
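If you prefer scripting over the CLI, models can also be pulled through Ollama's HTTP API (`POST /api/pull`). A minimal sketch, assuming Ollama is reachable on `localhost:11434`:

```python
# Sketch: pull a model programmatically via Ollama's /api/pull endpoint.
# The endpoint streams newline-delimited JSON status updates; the URL is
# an assumption about your setup.
import json
import requests

OLLAMA_URL = "http://localhost:11434"  # adjust to your setup

with requests.post(
    f"{OLLAMA_URL}/api/pull",
    json={"name": "mistral:7b"},
    stream=True,
    timeout=600,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status", ""))
```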
## Testing the New Model

### Test via API

```bash
curl http://localhost:5001/api/ollama/test
```

### Test Summarization

```bash
docker-compose exec crawler python << 'EOF'
from ollama_client import OllamaClient
from config import Config

client = OllamaClient(
    base_url=Config.OLLAMA_BASE_URL,
    model=Config.OLLAMA_MODEL,
    enabled=True
)

result = client.summarize_article(
    "This is a test article about Munich news. The city council made important decisions today.",
    max_words=50
)

print(f"Model: {Config.OLLAMA_MODEL}")
print(f"Success: {result['success']}")
print(f"Summary: {result['summary']}")
print(f"Duration: {result['duration']:.2f}s")
EOF
```
### Test Clustering

```bash
docker-compose exec crawler python tests/crawler/test_clustering_real.py
```
## Performance Comparison

### Summarization Speed (per article)

| Model | CPU | GPU (NVIDIA) |
|---|---|---|
| `phi3:latest` | ~15s | ~3s |
| `llama3:8b` | ~25s | ~5s |
| `mistral:7b` | ~20s | ~4s |
| `llama3:70b` | ~120s | ~15s |
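These numbers are rough guidelines; actual speed depends on your hardware. To measure your own environment, a small benchmark along these lines can be run inside the crawler container. It reuses the `OllamaClient` interface from the test snippet above; the sample text and run count are arbitrary choices.

```python
# Rough benchmark sketch: time summarization with the currently configured
# model. Reuses the OllamaClient interface shown in the test snippet above.
import time

from config import Config
from ollama_client import OllamaClient

client = OllamaClient(
    base_url=Config.OLLAMA_BASE_URL,
    model=Config.OLLAMA_MODEL,
    enabled=True,
)

SAMPLE = "The Munich city council approved a new cycling infrastructure plan today."
RUNS = 3

durations = []
for _ in range(RUNS):
    start = time.time()
    client.summarize_article(SAMPLE, max_words=50)
    durations.append(time.time() - start)

print(f"{Config.OLLAMA_MODEL}: avg {sum(durations) / RUNS:.1f}s over {RUNS} runs")
```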
### Memory Requirements

| Model | RAM (CPU) | VRAM (GPU) |
|---|---|---|
| `phi3:latest` | 4GB | 2GB |
| `llama3:8b` | 8GB | 4GB |
| `mistral:7b` | 8GB | 4GB |
| `llama3:70b` | 48GB | 40GB |
## Troubleshooting

### Model Not Found

```bash
# Check if the model exists
docker-compose exec ollama ollama list

# Pull the model manually
docker-compose exec ollama ollama pull your-model:latest
```

### Out of Memory

If you get OOM errors:

- Use a smaller model (e.g., `phi3:mini`)
- Enable GPU acceleration (see GPU_SETUP.md)
- Increase the Docker memory limit
### Slow Performance

- Use GPU acceleration - 5-10x faster
- Use a smaller model - `phi3:latest` is the fastest
- Increase the timeout in `.env`: `OLLAMA_TIMEOUT=300`
### Model Download Fails

```bash
# Check Ollama logs
docker-compose logs ollama

# Restart Ollama
docker-compose restart ollama

# Try a manual pull
docker-compose exec ollama ollama pull phi3:latest
```
## Custom Models

### Using Your Own Model

1. Create or fine-tune your model using Ollama
2. Import it: `docker-compose exec ollama ollama create my-model -f Modelfile`
3. Update `.env`: `OLLAMA_MODEL=my-model:latest`
4. Restart the services
### Model Requirements

Your custom model should support:

- Text generation
- Prompt-based instructions
- Reasonable response times (<60s per request)
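A quick way to sanity-check a custom model against these requirements is to time a single prompt through Ollama's `/api/generate` endpoint. The URL and model name below are placeholders for your setup.

```python
# Sketch: sanity-check a custom model by sending one prompt through Ollama's
# /api/generate endpoint and timing the response. URL and model name are
# placeholders, not project defaults.
import time
import requests

OLLAMA_URL = "http://localhost:11434"  # adjust to your setup
MODEL = "my-model:latest"

start = time.time()
resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": MODEL,
        "prompt": "Summarize in one sentence: The city council met today.",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
elapsed = time.time() - start

print(f"Response: {resp.json()['response'][:200]}")
print(f"Took {elapsed:.1f}s (requirement: < 60s)")
```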
## Best Practices

### For Production

- Test thoroughly before switching models
- Monitor performance after switching
- Keep the old model pulled until the new one proves stable
- Document the model choice in your deployment notes

### For Development

- Use `phi3:latest` for fast iteration
- Test with `llama3:8b` for quality checks
- Profile performance with different models
- Compare results between models
## FAQ

**Q: Can I use multiple models?**
A: Yes. Pull multiple models and switch between them by updating `.env` and restarting.

**Q: Do I need to re-crawl articles?**
A: No. Existing summaries remain; new articles are processed with the new model.

**Q: Can I use OpenAI/Anthropic models?**
A: Not directly. Ollama only runs local models. For cloud APIs, you would need to modify the `OllamaClient` class.
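If you do want to experiment with a cloud API, a very rough sketch of what such a replacement could look like is shown below. It is not part of this project: the class name, prompt, and return format are assumptions chosen to mimic the `summarize_article` interface used elsewhere in this document; only the OpenAI REST endpoint and payload shape are standard.

```python
# Very rough sketch of a cloud-backed client mimicking the summarize_article
# interface used in this document. NOT part of the project: class name,
# prompt, and return dict are assumptions.
import os
import time
import requests


class CloudSummaryClient:
    def __init__(self, api_key: str, model: str = "gpt-4o-mini"):
        self.api_key = api_key
        self.model = model

    def summarize_article(self, text: str, max_words: int = 50) -> dict:
        start = time.time()
        resp = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={
                "model": self.model,
                "messages": [
                    {"role": "user",
                     "content": f"Summarize in at most {max_words} words:\n\n{text}"},
                ],
            },
            timeout=60,
        )
        resp.raise_for_status()
        summary = resp.json()["choices"][0]["message"]["content"]
        return {"success": True, "summary": summary, "duration": time.time() - start}


# Usage: client = CloudSummaryClient(api_key=os.environ["OPENAI_API_KEY"])
```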
**Q: Which model is best?**
A: For most users: `phi3:latest` (fast, good quality). For better quality: `llama3:8b`. For production with a GPU: `mistral:7b`.

**Q: How much disk space do I need?**
A: 5-10GB for small models, 50GB+ for large models. Plan accordingly.
## Related Documentation

- OLLAMA_SETUP.md - Ollama installation & configuration
- GPU_SETUP.md - GPU acceleration setup
- AI_NEWS_AGGREGATION.md - AI features overview