# Changing the AI Model
## Overview

The system uses Ollama for AI-powered features (summarization, clustering, neutral summaries). You can easily change the model by updating the `.env` file.

## Current Configuration

**Default Model:** `phi3:latest`

The model is configured in `backend/.env`:

```env
OLLAMA_MODEL=phi3:latest
```

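To confirm which model is currently configured, you can read the value straight from the env file or ask the running backend (the same `/api/ollama/config` endpoint is listed under Quick Reference below):

```bash
# Show the configured model from the env file
grep OLLAMA_MODEL backend/.env

# Or ask the running backend for its current Ollama configuration
curl http://localhost:5001/api/ollama/config
```
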
## ✅ How to Change the Model

### Important Note

✅ **The model IS automatically checked and downloaded on startup**

The `ollama-setup` service runs on every `docker-compose up` and:

- Checks if the model specified in `.env` exists
- Downloads it if missing
- Skips download if already present

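Conceptually, the setup step boils down to a check-then-pull against the Ollama API. The sketch below shows that logic only; the repo's actual `ollama-setup` script may differ, and the `http://ollama:11434` address and use of `curl` here are assumptions:

```bash
# Hedged sketch of the check-then-pull logic, NOT the repo's actual script.
# Assumes the Ollama API is reachable at http://ollama:11434 inside the
# compose network and that OLLAMA_MODEL is set from backend/.env.
MODEL="${OLLAMA_MODEL:-phi3:latest}"

if curl -s http://ollama:11434/api/tags | grep -q "\"${MODEL}\""; then
    echo "Model ${MODEL} already present - skipping download"
else
    echo "Pulling ${MODEL} (this can take several minutes)..."
    curl -s http://ollama:11434/api/pull -d "{\"model\": \"${MODEL}\"}"
fi
```
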
This means you can simply:

1. Change `OLLAMA_MODEL` in `.env`
2. Run `docker-compose up -d`
3. Wait for download (if needed)
4. Done!

### Step 1: Update .env File

Edit `backend/.env` and change the `OLLAMA_MODEL` value:

```env
# Example: Change to a different model
OLLAMA_MODEL=llama3:latest

# Or use a specific version
OLLAMA_MODEL=mistral:7b

# Or use a custom model
OLLAMA_MODEL=your-custom-model:latest
```

### Step 2: Restart Services (Model Auto-Downloads)

**Option A: Simple restart (Recommended)**

```bash
# Restart all services
docker-compose up -d

# Watch the model check/download
docker-compose logs -f ollama-setup
```

The `ollama-setup` service will:

- Check if the new model exists
- Download it if missing (2-10 minutes)
- Skip the download if it is already present

**Option B: Manual pull (if you want more control)**

```bash
# Pull the model manually first
./pull-ollama-model.sh

# Then restart
docker-compose restart crawler backend
```

**Option C: Full restart**

```bash
docker-compose down
docker-compose up -d
```

**Note:** Model download takes 2-10 minutes depending on model size.

## Supported Models

### Recommended Models

| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `phi3:latest` | 2.3GB | ⚡⚡⚡ | ⭐⭐⭐ | **Default** - Fast, good quality |
| `llama3:8b` | 4.7GB | ⚡⚡ | ⭐⭐⭐⭐ | Better quality, slower |
| `mistral:7b` | 4.1GB | ⚡⚡ | ⭐⭐⭐⭐ | Balanced performance |
| `gemma:7b` | 5.0GB | ⚡⚡ | ⭐⭐⭐⭐ | Google's model |

### Lightweight Models (Faster)

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| `phi3:mini` | 2.3GB | ⚡⚡⚡ | ⭐⭐⭐ |
| `tinyllama:latest` | 637MB | ⚡⚡⚡⚡ | ⭐⭐ |
| `qwen:0.5b` | 397MB | ⚡⚡⚡⚡ | ⭐⭐ |

### High-Quality Models (Slower)

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| `llama3:70b` | 40GB | ⚡ | ⭐⭐⭐⭐⭐ |
| `mixtral:8x7b` | 26GB | ⚡ | ⭐⭐⭐⭐⭐ |

**Full list:** https://ollama.ai/library

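If you plan to compare several of these, you can pre-pull the candidates once so later switches only need an `.env` change and a restart, for example:

```bash
# Pre-pull a few candidate models (names taken from the tables above)
# so switching between them later does not trigger a fresh download
for model in phi3:latest llama3:8b mistral:7b; do
    docker-compose exec -T ollama ollama pull "$model"
done
```
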
## Manual Model Management

### Pull Model Manually

```bash
# Pull a specific model
docker-compose exec ollama ollama pull llama3:latest

# Pull multiple models
docker-compose exec ollama ollama pull mistral:7b
docker-compose exec ollama ollama pull phi3:latest
```

### List Available Models

```bash
docker-compose exec ollama ollama list
```

### Remove Unused Models

```bash
# Remove a specific model
docker-compose exec ollama ollama rm phi3:latest

# Free up space by removing any other models you no longer use
docker-compose exec ollama ollama rm tinyllama:latest
```

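To see how much disk space the downloaded models occupy, you can check the model store inside the container. The `/root/.ollama/models` path below assumes the official Ollama image's default model directory:

```bash
# Show total size of the Ollama model store inside the container
# (assumes the default model directory of the official ollama image)
docker-compose exec ollama du -sh /root/.ollama/models
```
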
## Testing the New Model

### Test via API

```bash
curl http://localhost:5001/api/ollama/test
```

### Test Summarization

```bash
docker-compose exec -T crawler python << 'EOF'
from ollama_client import OllamaClient
from config import Config

client = OllamaClient(
    base_url=Config.OLLAMA_BASE_URL,
    model=Config.OLLAMA_MODEL,
    enabled=True
)

result = client.summarize_article(
    "This is a test article about Munich news. The city council made important decisions today.",
    max_words=50
)

print(f"Model: {Config.OLLAMA_MODEL}")
print(f"Success: {result['success']}")
print(f"Summary: {result['summary']}")
print(f"Duration: {result['duration']:.2f}s")
EOF
```

### Test Clustering

```bash
docker-compose exec crawler python tests/crawler/test_clustering_real.py
```

## Performance Comparison

### Summarization Speed (per article)

| Model | CPU | GPU (NVIDIA) |
|-------|-----|--------------|
| phi3:latest | ~15s | ~3s |
| llama3:8b | ~25s | ~5s |
| mistral:7b | ~20s | ~4s |
| llama3:70b | ~120s | ~15s |

### Memory Requirements

| Model | RAM | VRAM (GPU) |
|-------|-----|------------|
| phi3:latest | 4GB | 2GB |
| llama3:8b | 8GB | 4GB |
| mistral:7b | 8GB | 4GB |
| llama3:70b | 48GB | 40GB |

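These figures are rough guides; actual speed depends on your hardware. A quick way to measure on your own machine is to time the test endpoint:

```bash
# Rough end-to-end timing of one test request against the current model
time curl -s http://localhost:5001/api/ollama/test > /dev/null
```
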
## Troubleshooting

### Model Not Found

```bash
# Check if model exists
docker-compose exec ollama ollama list

# Pull the model manually
docker-compose exec ollama ollama pull your-model:latest
```

### Out of Memory

If you get OOM errors:

1. Use a smaller model (e.g., `phi3:mini`)
2. Enable GPU acceleration (see [GPU_SETUP.md](GPU_SETUP.md))
3. Increase the Docker memory limit

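Before changing anything, it helps to check how much memory the containers are actually using:

```bash
# One-shot snapshot of per-container memory and CPU usage
docker stats --no-stream
```
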
### Slow Performance

1. **Use GPU acceleration** - 5-10x faster
2. **Use a smaller model** - `phi3:latest` is the fastest of the recommended models
3. **Increase the timeout** in `.env`:

```env
OLLAMA_TIMEOUT=300
```

### Model Download Fails

```bash
# Check Ollama logs
docker-compose logs ollama

# Restart Ollama
docker-compose restart ollama

# Try manual pull
docker-compose exec ollama ollama pull phi3:latest
```

## Custom Models

### Using Your Own Model

1. **Create/fine-tune your model** using Ollama (a hypothetical Modelfile is sketched after this list)
2. **Import it:**

```bash
docker-compose exec ollama ollama create my-model -f Modelfile
```

3. **Update .env:**

```env
OLLAMA_MODEL=my-model:latest
```

4. **Restart services**

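For step 1, a minimal Modelfile might look like the sketch below. The base model, parameter value, and system prompt are illustrative examples, not settings from this repo. Note that the Modelfile must be readable inside the `ollama` container before `ollama create` can use it, hence the `docker cp`:

```bash
# Hypothetical Modelfile: base model, sampling parameter, and system prompt
# are illustrative examples, not settings taken from this repo
cat > Modelfile << 'EOF'
FROM phi3:latest
PARAMETER temperature 0.3
SYSTEM """You summarize local news articles in neutral, factual language."""
EOF

# Copy the Modelfile into the ollama container, then build the custom model
docker cp Modelfile "$(docker-compose ps -q ollama):/tmp/Modelfile"
docker-compose exec ollama ollama create my-model -f /tmp/Modelfile
```
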
### Model Requirements

Your custom model should support:

- Text generation
- Prompt-based instructions
- Reasonable response times (<60s per request)

## Best Practices

### For Production

1. **Test thoroughly** before switching models
2. **Monitor performance** after switching
3. **Keep a backup** of the old model until the new one is stable
4. **Document** the model choice in your deployment notes

### For Development

1. **Use phi3:latest** for fast iteration
2. **Test with llama3:8b** for quality checks
3. **Profile performance** with different models
4. **Compare results** between models

## FAQ

**Q: Can I use multiple models?**

A: Yes! Pull multiple models and switch by updating `.env` and restarting.

**Q: Do I need to re-crawl articles?**

A: No. Existing summaries remain. New articles use the new model.

**Q: Can I use OpenAI/Anthropic models?**

A: Not directly. Ollama only supports local models. For cloud APIs, you'd need to modify the `OllamaClient` class.

**Q: Which model is best?**

A: For most users: `phi3:latest` (fast, good quality). For better quality: `llama3:8b`. For production with GPU: `mistral:7b`.

**Q: How much disk space do I need?**

A: 5-10GB for small models, 50GB+ for large models. Plan accordingly.

## Related Documentation

- [OLLAMA_SETUP.md](OLLAMA_SETUP.md) - Ollama installation & configuration
- [GPU_SETUP.md](GPU_SETUP.md) - GPU acceleration setup
- [AI_NEWS_AGGREGATION.md](AI_NEWS_AGGREGATION.md) - AI features overview

## Complete Example: Changing from phi3 to llama3

```bash
# 1. Check current model
curl -s http://localhost:5001/api/ollama/models | python3 -m json.tool
# Shows: "current_model": "phi3:latest"

# 2. Update .env file
# Edit backend/.env and change:
# OLLAMA_MODEL=llama3:8b

# 3. Pull the new model
./pull-ollama-model.sh
# Or manually: docker-compose exec ollama ollama pull llama3:8b

# 4. Restart services
docker-compose restart crawler backend

# 5. Verify the change
curl -s http://localhost:5001/api/ollama/models | python3 -m json.tool
# Shows: "current_model": "llama3:8b"

# 6. Test performance
curl -s http://localhost:5001/api/ollama/test | python3 -m json.tool
# Should show improved quality with llama3
```

## Quick Reference

### Change Model Workflow

```bash
# 1. Edit .env
vim backend/.env  # Change OLLAMA_MODEL

# 2. Pull model
./pull-ollama-model.sh

# 3. Restart
docker-compose restart crawler backend

# 4. Verify
curl http://localhost:5001/api/ollama/test
```

### Common Commands

```bash
# List downloaded models
docker-compose exec ollama ollama list

# Pull a specific model
docker-compose exec ollama ollama pull mistral:7b

# Remove a model
docker-compose exec ollama ollama rm phi3:latest

# Check current config
curl http://localhost:5001/api/ollama/config

# Test performance
curl http://localhost:5001/api/ollama/test
```