# Changing the AI Model

## Overview

The system uses Ollama for AI-powered features (summarization, clustering, neutral summaries). You can change the model by updating the `.env` file and restarting the services.

## Current Configuration

**Default Model:** `phi3:latest`

The model is configured in `backend/.env`:

```env
OLLAMA_MODEL=phi3:latest
```

## ✅ How to Change the Model

### Step 1: Update .env File

Edit `backend/.env` and change the `OLLAMA_MODEL` value:

```env
# Example: Change to a different model
OLLAMA_MODEL=llama3:latest

# Or use a specific version
OLLAMA_MODEL=mistral:7b

# Or use a custom model
OLLAMA_MODEL=your-custom-model:latest
```

### Step 2: Restart Services

The model is downloaded automatically on startup:

```bash
# Stop services
docker-compose down

# Start services (model will be pulled automatically)
docker-compose up -d

# Watch the download progress
docker-compose logs -f ollama-setup
```

**Note:** The first startup with a new model takes 2-10 minutes, depending on model size.
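Once the services are back up, it is worth confirming that the new model is actually in place before relying on it. A minimal check, assuming the `ollama` and `crawler` service names used throughout this guide and the `Config` class shown in the test snippet further down:

```bash
# Confirm the model has been pulled into the Ollama container
docker-compose exec ollama ollama list

# Confirm the crawler picked up the new value from .env
docker-compose exec crawler python -c "from config import Config; print(Config.OLLAMA_MODEL)"
```

If `ollama list` does not show the model yet, the pull is still in progress; keep watching the `ollama-setup` logs.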
## Supported Models

### Recommended Models

| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `phi3:latest` | 2.3GB | ⚡⚡⚡ | ⭐⭐⭐ | **Default** - fast, good quality |
| `llama3:8b` | 4.7GB | ⚡⚡ | ⭐⭐⭐⭐ | Better quality, slower |
| `mistral:7b` | 4.1GB | ⚡⚡ | ⭐⭐⭐⭐ | Balanced performance |
| `gemma:7b` | 5.0GB | ⚡⚡ | ⭐⭐⭐⭐ | Google's model |

### Lightweight Models (Faster)

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| `phi3:mini` | 2.3GB | ⚡⚡⚡ | ⭐⭐⭐ |
| `tinyllama:latest` | 637MB | ⚡⚡⚡⚡ | ⭐⭐ |
| `qwen:0.5b` | 397MB | ⚡⚡⚡⚡ | ⭐⭐ |

### High-Quality Models (Slower)

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| `llama3:70b` | 40GB | ⚡ | ⭐⭐⭐⭐⭐ |
| `mixtral:8x7b` | 26GB | ⚡ | ⭐⭐⭐⭐⭐ |

**Full list:** https://ollama.ai/library

## Manual Model Management

### Pull Model Manually

```bash
# Pull a specific model
docker-compose exec ollama ollama pull llama3:latest

# Pull multiple models
docker-compose exec ollama ollama pull mistral:7b
docker-compose exec ollama ollama pull phi3:latest
```

### List Available Models

```bash
docker-compose exec ollama ollama list
```

### Remove Unused Models

```bash
# Remove a specific model to free up disk space
docker-compose exec ollama ollama rm phi3:latest
```

## Testing the New Model

### Test via API

```bash
curl http://localhost:5001/api/ollama/test
```

### Test Summarization

```bash
docker-compose exec crawler python << 'EOF'
from ollama_client import OllamaClient
from config import Config

client = OllamaClient(
    base_url=Config.OLLAMA_BASE_URL,
    model=Config.OLLAMA_MODEL,
    enabled=True
)

result = client.summarize_article(
    "This is a test article about Munich news. The city council made important decisions today.",
    max_words=50
)

print(f"Model: {Config.OLLAMA_MODEL}")
print(f"Success: {result['success']}")
print(f"Summary: {result['summary']}")
print(f"Duration: {result['duration']:.2f}s")
EOF
```

### Test Clustering

```bash
docker-compose exec crawler python tests/crawler/test_clustering_real.py
```

## Performance Comparison

### Summarization Speed (per article)

| Model | CPU | GPU (NVIDIA) |
|-------|-----|--------------|
| phi3:latest | ~15s | ~3s |
| llama3:8b | ~25s | ~5s |
| mistral:7b | ~20s | ~4s |
| llama3:70b | ~120s | ~15s |

### Memory Requirements

| Model | RAM | VRAM (GPU) |
|-------|-----|------------|
| phi3:latest | 4GB | 2GB |
| llama3:8b | 8GB | 4GB |
| mistral:7b | 8GB | 4GB |
| llama3:70b | 48GB | 40GB |

## Troubleshooting

### Model Not Found

```bash
# Check if the model exists
docker-compose exec ollama ollama list

# Pull the model manually
docker-compose exec ollama ollama pull your-model:latest
```

### Out of Memory

If you get OOM errors:

1. Use a smaller model (e.g., `phi3:mini`)
2. Enable GPU acceleration (see [GPU_SETUP.md](GPU_SETUP.md))
3. Increase the Docker memory limit

### Slow Performance

1. **Use GPU acceleration** - 5-10x faster
2. **Use a smaller model** - `phi3:latest` is the fastest of the recommended models
3. **Increase the timeout** in `.env`:
   ```env
   OLLAMA_TIMEOUT=300
   ```

### Model Download Fails

```bash
# Check Ollama logs
docker-compose logs ollama

# Restart Ollama
docker-compose restart ollama

# Try a manual pull
docker-compose exec ollama ollama pull phi3:latest
```

## Custom Models

### Using Your Own Model

1. **Create/fine-tune your model** using Ollama
2. **Import it:**
   ```bash
   docker-compose exec ollama ollama create my-model -f Modelfile
   ```
3. **Update .env:**
   ```env
   OLLAMA_MODEL=my-model:latest
   ```
4. **Restart services**

### Model Requirements

Your custom model should support:

- Text generation
- Prompt-based instructions
- Reasonable response times (<60s per request)

## Best Practices

### For Production

1. **Test thoroughly** before switching models
2. **Monitor performance** after switching
3. **Keep the old model pulled** until the new one has proven stable
4. **Document** the model choice in your deployment notes

### For Development

1. **Use phi3:latest** for fast iteration
2. **Test with llama3:8b** for quality checks
3. **Profile performance** with different models (see the appendix at the end of this page)
4. **Compare results** between models

## FAQ

**Q: Can I use multiple models?**
A: Yes! Pull multiple models and switch between them by updating `.env` and restarting.

**Q: Do I need to re-crawl articles?**
A: No. Existing summaries remain. New articles use the new model.

**Q: Can I use OpenAI/Anthropic models?**
A: Not directly. Ollama only runs local models. For cloud APIs, you would need to modify the `OllamaClient` class.

**Q: Which model is best?**
A: For most users: `phi3:latest` (fast, good quality). For better quality: `llama3:8b`. For production with GPU: `mistral:7b`.

**Q: How much disk space do I need?**
A: 5-10GB for small models, 50GB+ for large models. Plan accordingly.

## Related Documentation

- [OLLAMA_SETUP.md](OLLAMA_SETUP.md) - Ollama installation & configuration
- [GPU_SETUP.md](GPU_SETUP.md) - GPU acceleration setup
- [AI_NEWS_AGGREGATION.md](AI_NEWS_AGGREGATION.md) - AI features overview
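## Appendix: Comparing Models Side by Side

As a companion to the development best practices above ("Profile performance" / "Compare results"), you can run the same prompt against several candidate models directly inside the Ollama container and eyeball both latency and output quality. A minimal sketch, assuming the `ollama` service name used throughout this guide and that the listed models have already been pulled; the prompt text is just an example:

```bash
#!/usr/bin/env bash
# Rough side-by-side comparison: same prompt, several models.
# Adjust the model list to whatever you have pulled (see "Pull Model Manually").
PROMPT="Summarize in one sentence: The Munich city council met today to discuss new cycling infrastructure."

for MODEL in phi3:latest llama3:8b mistral:7b; do
  echo "=== $MODEL ==="
  # -T skips TTY allocation so the output can be captured or redirected
  time docker-compose exec -T ollama ollama run "$MODEL" "$PROMPT"
done
```

The timings you see this way include model load time on the first request, so run the loop twice if you want numbers comparable to the Performance Comparison tables above.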