aimodel
@@ -50,15 +50,17 @@ services:
         condition: service_healthy
     networks:
       - munich-news-network
+    env_file:
+      - backend/.env
     entrypoint: /bin/sh
     command: >
       -c "
        echo 'Waiting for Ollama service to be ready...' &&
        sleep 5 &&
-       echo 'Pulling phi3:latest model via API...' &&
-       curl -X POST http://ollama:11434/api/pull -d '{\"name\":\"phi3:latest\"}' &&
+       echo 'Pulling model: ${OLLAMA_MODEL:-phi3:latest}' &&
+       curl -X POST http://ollama:11434/api/pull -d '{\"name\":\"${OLLAMA_MODEL:-phi3:latest}\"}' &&
        echo '' &&
-       echo 'Model phi3:latest pull initiated!'
+       echo 'Model ${OLLAMA_MODEL:-phi3:latest} pull initiated!'
       "
     restart: "no"
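The `${OLLAMA_MODEL:-phi3:latest}` syntax used above is ordinary shell/Compose parameter expansion with a default: when `OLLAMA_MODEL` is unset or empty, the fallback after `:-` is used, so the setup container still pulls `phi3:latest` if no model is configured. A minimal shell sketch of the behaviour (the variable name comes from `backend/.env`; the alternative model is just an example):

```bash
# ':-' supplies a fallback when the variable is unset or empty
unset OLLAMA_MODEL
echo "Pulling model: ${OLLAMA_MODEL:-phi3:latest}"   # -> phi3:latest

export OLLAMA_MODEL=mistral:7b
echo "Pulling model: ${OLLAMA_MODEL:-phi3:latest}"   # -> mistral:7b
```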
docs/CHANGING_AI_MODEL.md (new file, 266 lines)
@@ -0,0 +1,266 @@
# Changing the AI Model

## Overview

The system uses Ollama for AI-powered features (summarization, clustering, neutral summaries). You can easily change the model by updating the `.env` file.

## Current Configuration

**Default Model:** `phi3:latest`

The model is configured in `backend/.env`:

```env
OLLAMA_MODEL=phi3:latest
```

## ✅ How to Change the Model

### Step 1: Update .env File

Edit `backend/.env` and change the `OLLAMA_MODEL` value:

```env
# Example: Change to a different model
OLLAMA_MODEL=llama3:latest

# Or use a specific version
OLLAMA_MODEL=mistral:7b

# Or use a custom model
OLLAMA_MODEL=your-custom-model:latest
```
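If you prefer to script this edit (for example in a deployment pipeline), a one-liner works as well. This is only a sketch: it uses GNU `sed` syntax, and the target model is an example:

```bash
# Rewrite the OLLAMA_MODEL line in backend/.env (GNU sed; on macOS use `sed -i ''`)
sed -i 's/^OLLAMA_MODEL=.*/OLLAMA_MODEL=llama3:8b/' backend/.env

# Confirm the change
grep '^OLLAMA_MODEL=' backend/.env
```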
### Step 2: Restart Services

The model will be automatically downloaded on startup:

```bash
# Stop services
docker-compose down

# Start services (model will be pulled automatically)
docker-compose up -d

# Watch the download progress
docker-compose logs -f ollama-setup
```

**Note:** The first startup with a new model takes 2-10 minutes, depending on model size.
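Once the `ollama-setup` container has finished, you can confirm the download completed before relying on the new model (the same `ollama list` command appears again under Manual Model Management below):

```bash
# The newly configured model should appear in Ollama's local model list
docker-compose exec ollama ollama list
```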
## Supported Models

### Recommended Models

| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `phi3:latest` | 2.3GB | ⚡⚡⚡ | ⭐⭐⭐ | **Default** - Fast, good quality |
| `llama3:8b` | 4.7GB | ⚡⚡ | ⭐⭐⭐⭐ | Better quality, slower |
| `mistral:7b` | 4.1GB | ⚡⚡ | ⭐⭐⭐⭐ | Balanced performance |
| `gemma:7b` | 5.0GB | ⚡⚡ | ⭐⭐⭐⭐ | Google's model |

### Lightweight Models (Faster)

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| `phi3:mini` | 2.3GB | ⚡⚡⚡ | ⭐⭐⭐ |
| `tinyllama:latest` | 637MB | ⚡⚡⚡⚡ | ⭐⭐ |
| `qwen:0.5b` | 397MB | ⚡⚡⚡⚡ | ⭐⭐ |

### High-Quality Models (Slower)

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| `llama3:70b` | 40GB | ⚡ | ⭐⭐⭐⭐⭐ |
| `mixtral:8x7b` | 26GB | ⚡ | ⭐⭐⭐⭐⭐ |

**Full list:** https://ollama.ai/library

## Manual Model Management

### Pull Model Manually

```bash
# Pull a specific model
docker-compose exec ollama ollama pull llama3:latest

# Pull multiple models
docker-compose exec ollama ollama pull mistral:7b
docker-compose exec ollama ollama pull phi3:latest
```

### List Available Models

```bash
docker-compose exec ollama ollama list
```

### Remove Unused Models

```bash
# Remove a specific model to free up disk space
docker-compose exec ollama ollama rm phi3:latest
```

## Testing the New Model

### Test via API

```bash
curl http://localhost:5001/api/ollama/test
```
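To complement the API test, you can also run a one-off prompt directly against the Ollama container, bypassing the backend. The model tag below is the default; use whatever you set in `backend/.env`:

```bash
# One-shot prompt straight at Ollama (prints the model's response and exits)
docker-compose exec ollama ollama run phi3:latest "Summarize in one sentence: the Munich city council met today."
```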
### Test Summarization

```bash
docker-compose exec crawler python << 'EOF'
from ollama_client import OllamaClient
from config import Config

client = OllamaClient(
    base_url=Config.OLLAMA_BASE_URL,
    model=Config.OLLAMA_MODEL,
    enabled=True
)

result = client.summarize_article(
    "This is a test article about Munich news. The city council made important decisions today.",
    max_words=50
)

print(f"Model: {Config.OLLAMA_MODEL}")
print(f"Success: {result['success']}")
print(f"Summary: {result['summary']}")
print(f"Duration: {result['duration']:.2f}s")
EOF
```

### Test Clustering

```bash
docker-compose exec crawler python tests/crawler/test_clustering_real.py
```

## Performance Comparison

### Summarization Speed (per article)

| Model | CPU | GPU (NVIDIA) |
|-------|-----|--------------|
| phi3:latest | ~15s | ~3s |
| llama3:8b | ~25s | ~5s |
| mistral:7b | ~20s | ~4s |
| llama3:70b | ~120s | ~15s |

### Memory Requirements

| Model | RAM | VRAM (GPU) |
|-------|-----|------------|
| phi3:latest | 4GB | 2GB |
| llama3:8b | 8GB | 4GB |
| mistral:7b | 8GB | 4GB |
| llama3:70b | 48GB | 40GB |

## Troubleshooting

### Model Not Found

```bash
# Check if the model exists
docker-compose exec ollama ollama list

# Pull the model manually
docker-compose exec ollama ollama pull your-model:latest
```

### Out of Memory

If you get OOM errors:

1. Use a smaller model (e.g., `phi3:mini`)
2. Enable GPU acceleration (see [GPU_SETUP.md](GPU_SETUP.md))
3. Increase the Docker memory limit

### Slow Performance

1. **Use GPU acceleration** - 5-10x faster
2. **Use a smaller model** - `phi3:latest` is the fastest
3. **Increase the timeout** in `.env`:

   ```env
   OLLAMA_TIMEOUT=300
   ```

### Model Download Fails

```bash
# Check Ollama logs
docker-compose logs ollama

# Restart Ollama
docker-compose restart ollama

# Try a manual pull
docker-compose exec ollama ollama pull phi3:latest
```

## Custom Models

### Using Your Own Model

1. **Create/fine-tune your model** using Ollama (a minimal Modelfile sketch follows this list)
2. **Import it:**

   ```bash
   docker-compose exec ollama ollama create my-model -f Modelfile
   ```

3. **Update .env:**

   ```env
   OLLAMA_MODEL=my-model:latest
   ```

4. **Restart services**
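As referenced in step 1, here is a minimal sketch of what such a Modelfile could look like. The base model, parameter value, and system prompt are all illustrative, and the file has to be reachable from inside the `ollama` container (for example via a bind-mounted directory) for the `ollama create` command above to find it:

```bash
# Illustrative Modelfile only; adjust the base model, parameters, and prompt.
# Note: `ollama create` runs inside the container, so the Modelfile must be
# visible there (e.g. placed in a bind-mounted directory).
cat > Modelfile << 'EOF'
FROM llama3
PARAMETER temperature 0.3
SYSTEM You are a neutral summarizer for Munich local news articles.
EOF
```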
### Model Requirements

Your custom model should support:

- Text generation
- Prompt-based instructions
- Reasonable response times (<60s per request)

## Best Practices

### For Production

1. **Test thoroughly** before switching models
2. **Monitor performance** after switching
3. **Keep a backup** of the old model until the new one is stable
4. **Document** the model choice in your deployment notes

### For Development

1. **Use phi3:latest** for fast iteration
2. **Test with llama3:8b** for quality checks
3. **Profile performance** with different models
4. **Compare results** between models

## FAQ

**Q: Can I use multiple models?**
A: Yes! Pull multiple models and switch by updating `.env` and restarting.

**Q: Do I need to re-crawl articles?**
A: No. Existing summaries remain. New articles use the new model.

**Q: Can I use OpenAI/Anthropic models?**
A: Not directly. Ollama only runs local models. For cloud APIs, you'd need to modify the `OllamaClient` class (see the sketch below).
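For illustration only, a cloud-backed replacement could mirror the `summarize_article()` interface used in the test above. This is a hedged sketch against the current OpenAI Python SDK, not code from this repository; the class name, model, and prompt are made up, and the project's real `OllamaClient` may expect additional fields:

```python
import time

from openai import OpenAI  # pip install openai


class OpenAISummarizer:
    """Illustrative stand-in that mimics OllamaClient.summarize_article()."""

    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model = model

    def summarize_article(self, text: str, max_words: int = 50) -> dict:
        start = time.time()
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{
                "role": "user",
                "content": f"Summarize the article in at most {max_words} words:\n{text}",
            }],
        )
        return {
            "success": True,
            "summary": response.choices[0].message.content,
            "duration": time.time() - start,
        }
```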
**Q: Which model is best?**
A: For most users: `phi3:latest` (fast, good quality). For better quality: `llama3:8b`. For production with GPU: `mistral:7b`.

**Q: How much disk space do I need?**
A: 5-10GB for small models, 50GB+ for large models. Plan accordingly.

## Related Documentation

- [OLLAMA_SETUP.md](OLLAMA_SETUP.md) - Ollama installation & configuration
- [GPU_SETUP.md](GPU_SETUP.md) - GPU acceleration setup
- [AI_NEWS_AGGREGATION.md](AI_NEWS_AGGREGATION.md) - AI features overview