# Changing the AI Model
## Overview

The system uses Ollama for AI-powered features (summarization, clustering, neutral summaries). You can easily change the model by updating the `.env` file.

## Current Configuration

**Default Model:** `phi3:latest`

The model is configured in `backend/.env`:

```env
OLLAMA_MODEL=phi3:latest
```

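To confirm which model is currently configured, you can read the value straight from the env file or ask the running backend (the same `/api/ollama/config` endpoint is listed under Quick Reference below):

```bash
# Show the configured model from the env file
grep OLLAMA_MODEL backend/.env

# Or ask the running backend for its current Ollama configuration
curl http://localhost:5001/api/ollama/config
```
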
## ✅ How to Change the Model

### Important Note

✅ **The model IS automatically checked and downloaded on startup**

The `ollama-setup` service runs on every `docker-compose up` and:

- Checks if the model specified in `.env` exists
- Downloads it if missing
- Skips download if already present

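Conceptually, the setup step boils down to a check-then-pull against the Ollama API. The sketch below shows that logic only; the repo's actual `ollama-setup` script may differ, and the `http://ollama:11434` address and use of `curl` here are assumptions:

```bash
# Hedged sketch of the check-then-pull logic, NOT the repo's actual script.
# Assumes the Ollama API is reachable at http://ollama:11434 inside the
# compose network and that OLLAMA_MODEL is set from backend/.env.
MODEL="${OLLAMA_MODEL:-phi3:latest}"

if curl -s http://ollama:11434/api/tags | grep -q "\"${MODEL}\""; then
    echo "Model ${MODEL} already present - skipping download"
else
    echo "Pulling ${MODEL} (this can take several minutes)..."
    curl -s http://ollama:11434/api/pull -d "{\"model\": \"${MODEL}\"}"
fi
```
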
This means you can simply:

1. Change `OLLAMA_MODEL` in `.env`
2. Run `docker-compose up -d`
3. Wait for download (if needed)
4. Done!

### Step 1: Update .env File

Edit `backend/.env` and change the `OLLAMA_MODEL` value:

```env
# Example: Change to a different model
OLLAMA_MODEL=llama3:latest

# Or use a specific version
OLLAMA_MODEL=mistral:7b

# Or use a custom model
OLLAMA_MODEL=your-custom-model:latest
```

### Step 2: Restart Services (Model Auto-Downloads)

**Option A: Simple restart (Recommended)**

```bash
# Restart all services
docker-compose up -d

# Watch the model check/download
docker-compose logs -f ollama-setup
```

The `ollama-setup` service will:

- Check if the new model exists
- Download it if missing (2-10 minutes)
- Skip the download if it is already present

**Option B: Manual pull (if you want more control)**

```bash
# Pull the model manually first
./pull-ollama-model.sh

# Then restart
docker-compose restart crawler backend
```

**Option C: Full restart**

```bash
docker-compose down
docker-compose up -d
```

**Note:** Model download takes 2-10 minutes depending on model size.

## Supported Models

### Recommended Models

| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `phi3:latest` | 2.3GB | ⚡⚡⚡ | ⭐⭐⭐ | **Default** - Fast, good quality |
| `llama3:8b` | 4.7GB | ⚡⚡ | ⭐⭐⭐⭐ | Better quality, slower |
| `mistral:7b` | 4.1GB | ⚡⚡ | ⭐⭐⭐⭐ | Balanced performance |
| `gemma:7b` | 5.0GB | ⚡⚡ | ⭐⭐⭐⭐ | Google's model |

### Lightweight Models (Faster)

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| `phi3:mini` | 2.3GB | ⚡⚡⚡ | ⭐⭐⭐ |
| `tinyllama:latest` | 637MB | ⚡⚡⚡⚡ | ⭐⭐ |
| `qwen:0.5b` | 397MB | ⚡⚡⚡⚡ | ⭐⭐ |

### High-Quality Models (Slower)

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| `llama3:70b` | 40GB | ⚡ | ⭐⭐⭐⭐⭐ |
| `mixtral:8x7b` | 26GB | ⚡ | ⭐⭐⭐⭐⭐ |

**Full list:** https://ollama.ai/library

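If you plan to compare several of these, you can pre-pull the candidates once so later switches only need an `.env` change and a restart, for example:

```bash
# Pre-pull a few candidate models (names taken from the tables above)
# so switching between them later does not trigger a fresh download
for model in phi3:latest llama3:8b mistral:7b; do
    docker-compose exec -T ollama ollama pull "$model"
done
```
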
## Manual Model Management

### Pull Model Manually

```bash
# Pull a specific model
docker-compose exec ollama ollama pull llama3:latest

# Pull multiple models
docker-compose exec ollama ollama pull mistral:7b
docker-compose exec ollama ollama pull phi3:latest
```

### List Available Models

```bash
docker-compose exec ollama ollama list
```

### Remove Unused Models

```bash
# Remove a specific model
docker-compose exec ollama ollama rm phi3:latest

# Free up space by removing any other models you no longer use
docker-compose exec ollama ollama rm tinyllama:latest
```

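To see how much disk space the downloaded models occupy, you can check the model store inside the container. The `/root/.ollama/models` path below assumes the official Ollama image's default model directory:

```bash
# Show total size of the Ollama model store inside the container
# (assumes the default model directory of the official ollama image)
docker-compose exec ollama du -sh /root/.ollama/models
```
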
## Testing the New Model

### Test via API

```bash
curl http://localhost:5001/api/ollama/test
```

### Test Summarization

```bash
docker-compose exec -T crawler python << 'EOF'
from ollama_client import OllamaClient
from config import Config

client = OllamaClient(
    base_url=Config.OLLAMA_BASE_URL,
    model=Config.OLLAMA_MODEL,
    enabled=True
)

result = client.summarize_article(
    "This is a test article about Munich news. The city council made important decisions today.",
    max_words=50
)

print(f"Model: {Config.OLLAMA_MODEL}")
print(f"Success: {result['success']}")
print(f"Summary: {result['summary']}")
print(f"Duration: {result['duration']:.2f}s")
EOF
```

### Test Clustering

```bash
docker-compose exec crawler python tests/crawler/test_clustering_real.py
```

## Performance Comparison

### Summarization Speed (per article)

| Model | CPU | GPU (NVIDIA) |
|-------|-----|--------------|
| phi3:latest | ~15s | ~3s |
| llama3:8b | ~25s | ~5s |
| mistral:7b | ~20s | ~4s |
| llama3:70b | ~120s | ~15s |

### Memory Requirements

| Model | RAM | VRAM (GPU) |
|-------|-----|------------|
| phi3:latest | 4GB | 2GB |
| llama3:8b | 8GB | 4GB |
| mistral:7b | 8GB | 4GB |
| llama3:70b | 48GB | 40GB |

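These figures are rough guides; actual speed depends on your hardware. A quick way to measure on your own machine is to time the test endpoint:

```bash
# Rough end-to-end timing of one test request against the current model
time curl -s http://localhost:5001/api/ollama/test > /dev/null
```
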
## Troubleshooting

### Model Not Found

```bash
# Check if model exists
docker-compose exec ollama ollama list

# Pull the model manually
docker-compose exec ollama ollama pull your-model:latest
```

### Out of Memory

If you get OOM errors:

1. Use a smaller model (e.g., `phi3:mini`)
2. Enable GPU acceleration (see [GPU_SETUP.md](GPU_SETUP.md))
3. Increase the Docker memory limit

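Before changing anything, it helps to check how much memory the containers are actually using:

```bash
# One-shot snapshot of per-container memory and CPU usage
docker stats --no-stream
```
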
### Slow Performance

1. **Use GPU acceleration** - 5-10x faster
2. **Use a smaller model** - `phi3:latest` is the fastest of the recommended models
3. **Increase the timeout** in `.env`:

```env
OLLAMA_TIMEOUT=300
```

### Model Download Fails

```bash
# Check Ollama logs
docker-compose logs ollama

# Restart Ollama
docker-compose restart ollama

# Try manual pull
docker-compose exec ollama ollama pull phi3:latest
```

## Custom Models

### Using Your Own Model

1. **Create/fine-tune your model** using Ollama (a hypothetical Modelfile is sketched after this list)
2. **Import it:**

```bash
docker-compose exec ollama ollama create my-model -f Modelfile
```

3. **Update .env:**

```env
OLLAMA_MODEL=my-model:latest
```

4. **Restart services**

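For step 1, a minimal Modelfile might look like the sketch below. The base model, parameter value, and system prompt are illustrative examples, not settings from this repo. Note that the Modelfile must be readable inside the `ollama` container before `ollama create` can use it, hence the `docker cp`:

```bash
# Hypothetical Modelfile: base model, sampling parameter, and system prompt
# are illustrative examples, not settings taken from this repo
cat > Modelfile << 'EOF'
FROM phi3:latest
PARAMETER temperature 0.3
SYSTEM """You summarize local news articles in neutral, factual language."""
EOF

# Copy the Modelfile into the ollama container, then build the custom model
docker cp Modelfile "$(docker-compose ps -q ollama):/tmp/Modelfile"
docker-compose exec ollama ollama create my-model -f /tmp/Modelfile
```
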
### Model Requirements

Your custom model should support:

- Text generation
- Prompt-based instructions
- Reasonable response times (<60s per request)

## Best Practices

### For Production

1. **Test thoroughly** before switching models
2. **Monitor performance** after switching
3. **Keep a backup** of the old model until the new one is stable
4. **Document** the model choice in your deployment notes

### For Development

1. **Use phi3:latest** for fast iteration
2. **Test with llama3:8b** for quality checks
3. **Profile performance** with different models
4. **Compare results** between models

## FAQ

**Q: Can I use multiple models?**

A: Yes! Pull multiple models and switch by updating `.env` and restarting.

**Q: Do I need to re-crawl articles?**

A: No. Existing summaries remain. New articles use the new model.

**Q: Can I use OpenAI/Anthropic models?**

A: Not directly. Ollama only supports local models. For cloud APIs, you'd need to modify the `OllamaClient` class.

**Q: Which model is best?**

A: For most users: `phi3:latest` (fast, good quality). For better quality: `llama3:8b`. For production with GPU: `mistral:7b`.

**Q: How much disk space do I need?**

A: 5-10GB for small models, 50GB+ for large models. Plan accordingly.

## Related Documentation

- [OLLAMA_SETUP.md](OLLAMA_SETUP.md) - Ollama installation & configuration
- [GPU_SETUP.md](GPU_SETUP.md) - GPU acceleration setup
- [AI_NEWS_AGGREGATION.md](AI_NEWS_AGGREGATION.md) - AI features overview

## Complete Example: Changing from phi3 to llama3

```bash
# 1. Check current model
curl -s http://localhost:5001/api/ollama/models | python3 -m json.tool
# Shows: "current_model": "phi3:latest"

# 2. Update .env file
# Edit backend/.env and change:
# OLLAMA_MODEL=llama3:8b

# 3. Pull the new model
./pull-ollama-model.sh
# Or manually: docker-compose exec ollama ollama pull llama3:8b

# 4. Restart services
docker-compose restart crawler backend

# 5. Verify the change
curl -s http://localhost:5001/api/ollama/models | python3 -m json.tool
# Shows: "current_model": "llama3:8b"

# 6. Test performance
curl -s http://localhost:5001/api/ollama/test | python3 -m json.tool
# Should show improved quality with llama3
```

## Quick Reference

### Change Model Workflow

```bash
# 1. Edit .env
vim backend/.env  # Change OLLAMA_MODEL

# 2. Pull model
./pull-ollama-model.sh

# 3. Restart
docker-compose restart crawler backend

# 4. Verify
curl http://localhost:5001/api/ollama/test
```

### Common Commands

```bash
# List downloaded models
docker-compose exec ollama ollama list

# Pull a specific model
docker-compose exec ollama ollama pull mistral:7b

# Remove a model
docker-compose exec ollama ollama rm phi3:latest

# Check current config
curl http://localhost:5001/api/ollama/config

# Test performance
curl http://localhost:5001/api/ollama/test
```