
Changing the AI Model

Overview

The system uses Ollama for AI-powered features (summarization, clustering, neutral summaries). You can easily change the model by updating the .env file.

Current Configuration

Default Model: phi3:latest

The model is configured in backend/.env:

OLLAMA_MODEL=phi3:latest
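
The other Ollama settings referenced in this guide live in the same file. The values below are illustrative; confirm them against your own backend/.env (the base URL in particular assumes the docker-compose service name ollama and Ollama's default port):

# Illustrative values - check your actual backend/.env
OLLAMA_MODEL=phi3:latest
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_TIMEOUT=300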

How to Change the Model

Important Note

The model IS automatically checked and downloaded on startup

The ollama-setup service runs on every docker-compose up and:

  • Checks if the model specified in .env exists
  • Downloads it if missing
  • Skips the download if the model is already present (sketched below)
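
Conceptually, the setup service runs a check-then-pull step like the one below. This is a simplified illustration of the behavior described above, not the repo's actual script; it assumes the model name is available inside the container as $OLLAMA_MODEL:

# Simplified sketch of the ollama-setup logic (the real script may differ)
if ollama list | grep -q "$OLLAMA_MODEL"; then
    echo "Model $OLLAMA_MODEL already present - skipping download"
else
    echo "Pulling $OLLAMA_MODEL ..."
    ollama pull "$OLLAMA_MODEL"
fi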

This means you can simply:

  1. Change OLLAMA_MODEL in .env
  2. Run docker-compose up -d
  3. Wait for download (if needed)
  4. Done! (Verify with the config check below.)
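
To confirm the backend picked up the new model, query the config endpoint (also listed under Common Commands at the end of this guide):

# Check which model the backend is currently configured to use
curl http://localhost:5001/api/ollama/config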

Step 1: Update .env File

Edit backend/.env and change the OLLAMA_MODEL value:

# Example: Change to a different model
OLLAMA_MODEL=llama3:latest

# Or use a specific version
OLLAMA_MODEL=mistral:7b

# Or use a custom model
OLLAMA_MODEL=your-custom-model:latest

Step 2: Restart Services (Model Auto-Downloads)

Option A: Simple restart (Recommended)

# Restart all services
docker-compose up -d

# Watch the model check/download
docker-compose logs -f ollama-setup

The ollama-setup service will:

  • Check if the new model exists
  • Download it if missing (2-10 minutes)
  • Skip download if already present

Option B: Manual pull (if you want control)

# Pull the model manually first
./pull-ollama-model.sh

# Then restart
docker-compose restart crawler backend

Option C: Full restart

docker-compose down
docker-compose up -d

Note: Model download takes 2-10 minutes depending on model size.

Supported Models

Model         Size    Notes
phi3:latest   2.3GB   Default - fast, good quality
llama3:8b     4.7GB   Better quality, slower
mistral:7b    4.1GB   Balanced performance
gemma:7b      5.0GB   Google's model

Lightweight Models (Faster)

Model              Size
phi3:mini          2.3GB
tinyllama:latest   637MB
qwen:0.5b          397MB

High-Quality Models (Slower)

Model          Size
llama3:70b     40GB
mixtral:8x7b   26GB

Full list: https://ollama.ai/library

Manual Model Management

Pull Model Manually

# Pull a specific model
docker-compose exec ollama ollama pull llama3:latest

# Pull multiple models
docker-compose exec ollama ollama pull mistral:7b
docker-compose exec ollama ollama pull phi3:latest

List Available Models

docker-compose exec ollama ollama list

Remove Unused Models

# Remove a specific model to free up disk space
docker-compose exec ollama ollama rm phi3:latest

Note: Ollama has no prune command; free space by removing models you no longer use with ollama rm.
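
To see how much disk space the downloaded models occupy, check the model store inside the container. The path below is the default in the official Ollama image; adjust it if your setup overrides OLLAMA_MODELS:

# Disk usage of the Ollama model store (default path in the official image)
docker-compose exec ollama du -sh /root/.ollama/models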

Testing the New Model

Test via API

curl http://localhost:5001/api/ollama/test

Test Summarization

# -T disables TTY allocation so the heredoc can be piped into the container
docker-compose exec -T crawler python << 'EOF'
from ollama_client import OllamaClient
from config import Config

client = OllamaClient(
    base_url=Config.OLLAMA_BASE_URL,
    model=Config.OLLAMA_MODEL,
    enabled=True
)

result = client.summarize_article(
    "This is a test article about Munich news. The city council made important decisions today.",
    max_words=50
)

print(f"Model: {Config.OLLAMA_MODEL}")
print(f"Success: {result['success']}")
print(f"Summary: {result['summary']}")
print(f"Duration: {result['duration']:.2f}s")
EOF

Test Clustering

docker-compose exec crawler python tests/crawler/test_clustering_real.py

Performance Comparison

Summarization Speed (per article)

Model         CPU     GPU (NVIDIA)
phi3:latest   ~15s    ~3s
llama3:8b     ~25s    ~5s
mistral:7b    ~20s    ~4s
llama3:70b    ~120s   ~15s

Memory Requirements

Model         RAM    VRAM (GPU)
phi3:latest   4GB    2GB
llama3:8b     8GB    4GB
mistral:7b    8GB    4GB
llama3:70b    48GB   40GB
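
Before switching to a larger model, check how much memory the Docker daemon actually has available (the label may vary slightly between Docker versions):

# Memory available to the Docker daemon
docker info | grep -i "total memory"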

Troubleshooting

Model Not Found

# Check if model exists
docker-compose exec ollama ollama list

# Pull the model manually
docker-compose exec ollama ollama pull your-model:latest

Out of Memory

If you get OOM errors:

  1. Use a smaller model (e.g., phi3:mini)
  2. Enable GPU acceleration (see GPU_SETUP.md)
  3. Increase Docker memory limit (see the compose sketch below)
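
For step 3, one way to raise the limit is directly in docker-compose.yml. The snippet below is a sketch assuming a Compose version that honors mem_limit and a service named ollama; merge it into your file's actual layout:

# docker-compose.yml (sketch - merge into your existing ollama service)
services:
  ollama:
    mem_limit: 12g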

Slow Performance

  1. Use GPU acceleration - 5-10x faster
  2. Use smaller model - phi3:latest is fastest
  3. Increase timeout in .env:
    OLLAMA_TIMEOUT=300
    

Model Download Fails

# Check Ollama logs
docker-compose logs ollama

# Restart Ollama
docker-compose restart ollama

# Try manual pull
docker-compose exec ollama ollama pull phi3:latest

Custom Models

Using Your Own Model

  1. Create/fine-tune your model using Ollama
  2. Import it (a minimal example Modelfile follows this list):
    docker-compose exec ollama ollama create my-model -f Modelfile
    
  3. Update .env:
    OLLAMA_MODEL=my-model:latest
    
  4. Restart services: docker-compose restart crawler backend
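
For step 2, a Modelfile can be as small as a base model plus a system prompt. Everything in this sketch is a placeholder, not a value from this repo:

# Modelfile (minimal illustrative example)
FROM phi3:latest
PARAMETER temperature 0.3
SYSTEM "You write short, neutral summaries of local news articles."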

Model Requirements

Your custom model should support:

  • Text generation
  • Prompt-based instructions
  • Reasonable response times (<60s per request) - see the timing check below
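
A quick way to sanity-check the last point is to time a one-off prompt against your model (replace my-model:latest with your model's actual name):

# Rough response-time check for a single prompt
time docker-compose exec ollama ollama run my-model:latest "Summarize: the city council met today."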

Best Practices

For Production

  1. Test thoroughly before switching models
  2. Monitor performance after switching
  3. Keep backup of old model until stable
  4. Document model choice in your deployment notes

For Development

  1. Use phi3:latest for fast iteration
  2. Test with llama3:8b for quality checks
  3. Profile performance with different models
  4. Compare results between models

FAQ

Q: Can I use multiple models? A: Yes! Pull multiple models and switch by updating .env and restarting.

Q: Do I need to re-crawl articles? A: No. Existing summaries remain. New articles use the new model.

Q: Can I use OpenAI/Anthropic models? A: Not directly. Ollama only runs local models. For cloud APIs, you'd need to modify the OllamaClient class (a rough sketch of such a call follows).
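
For orientation only: the sketch below shows what a call to OpenAI's Chat Completions endpoint looks like, which is roughly what a modified client would need to target. The model name is a placeholder and nothing here is part of this project:

# Hypothetical cloud API call (not part of this project)
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Summarize in 50 words: ..."}]}'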

Q: Which model is best? A: For most users: phi3:latest (fast, good quality). For better quality: llama3:8b. For production with GPU: mistral:7b.

Q: How much disk space do I need? A: 5-10GB for small models, 50GB+ for large models. Plan accordingly.

Complete Example: Changing from phi3 to llama3

# 1. Check current model
curl -s http://localhost:5001/api/ollama/models | python3 -m json.tool
# Shows: "current_model": "phi3:latest"

# 2. Update .env file
# Edit backend/.env and change:
# OLLAMA_MODEL=llama3:8b

# 3. Pull the new model
./pull-ollama-model.sh
# Or manually: docker-compose exec ollama ollama pull llama3:8b

# 4. Restart services
docker-compose restart crawler backend

# 5. Verify the change
curl -s http://localhost:5001/api/ollama/models | python3 -m json.tool
# Shows: "current_model": "llama3:8b"

# 6. Test performance
curl -s http://localhost:5001/api/ollama/test | python3 -m json.tool
# Should show improved quality with llama3

Quick Reference

Change Model Workflow

# 1. Edit .env
vim backend/.env  # Change OLLAMA_MODEL

# 2. Pull model
./pull-ollama-model.sh

# 3. Restart
docker-compose restart crawler backend

# 4. Verify
curl http://localhost:5001/api/ollama/test

Common Commands

# List downloaded models
docker-compose exec ollama ollama list

# Pull a specific model
docker-compose exec ollama ollama pull mistral:7b

# Remove a model
docker-compose exec ollama ollama rm phi3:latest

# Check current config
curl http://localhost:5001/api/ollama/config

# Test performance
curl http://localhost:5001/api/ollama/test