# Quick Start: Ollama with GPU
## 30-Second Setup
```bash
# 1. Check GPU
./check-gpu.sh
# 2. Start services
./start-with-gpu.sh
# 3. Test
docker-compose exec crawler python crawler_service.py 2
```
## Commands Cheat Sheet
### Setup
```bash
# Check GPU availability
./check-gpu.sh
# Configure Ollama
./configure-ollama.sh
# Start with GPU auto-detection
./start-with-gpu.sh
# Start with GPU (manual)
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
# Start without GPU
docker-compose up -d
```
### Monitoring
```bash
# Check GPU usage
docker exec munich-news-ollama nvidia-smi
# Monitor GPU in real-time
watch -n 1 'docker exec munich-news-ollama nvidia-smi'
# Check Ollama logs
docker-compose logs -f ollama
# Check crawler logs
docker-compose logs -f crawler
```
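Recent Ollama versions also report where a loaded model is resident; if your version supports `ollama ps`, the PROCESSOR column shows the GPU/CPU split:
```bash
# List loaded models; the PROCESSOR column shows GPU vs CPU placement
docker-compose exec ollama ollama ps
```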
### Testing
```bash
# Test translation (2 articles)
docker-compose exec crawler python crawler_service.py 2
# Check translation timing
docker-compose logs crawler | grep "Title translated"
# Test Ollama API (internal network only)
docker-compose exec crawler curl -s http://ollama:11434/api/generate -d '{
  "model": "phi3:latest",
  "prompt": "Translate to English: Guten Morgen",
  "stream": false
}'
```
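To pull just the translated text out of the reply, you can call the same endpoint from inside the crawler container with an inline Python snippet (a sketch using only the standard library; it assumes `python3` is on the container's PATH, which it should be since the crawler itself runs Python):
```bash
# Call /api/generate and print only the "response" field of the JSON reply
docker-compose exec crawler python3 -c "
import json, urllib.request
req = urllib.request.Request(
    'http://ollama:11434/api/generate',
    data=json.dumps({'model': 'phi3:latest',
                     'prompt': 'Translate to English: Guten Morgen',
                     'stream': False}).encode())
print(json.load(urllib.request.urlopen(req))['response'])"
```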
### Troubleshooting
```bash
# Restart Ollama
docker-compose restart ollama
# Rebuild and restart
docker-compose up -d --build ollama
# Check GPU in container
docker exec munich-news-ollama nvidia-smi
# Pull model manually
docker-compose exec ollama ollama pull phi3:latest
# List available models
docker-compose exec ollama ollama list
```
## Performance Expectations
| Operation | CPU | GPU | Speedup |
|-----------|-----|-----|---------|
| Translation | 1.5s | 0.3s | 5x |
| Summary | 8s | 2s | 4x |
| 10 Articles | 115s | 31s | 3.7x |
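To compare against these numbers on your own hardware, time a 10-article run (absolute times vary by GPU, model, and article length):
```bash
# Time a 10-article run to compare against the table above
time docker-compose exec crawler python crawler_service.py 10
```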
## Common Issues
### GPU Not Detected
```bash
# Install the NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
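Then verify that Docker itself can see the GPU before debugging the Compose setup (the CUDA image tag below is just an example; any CUDA base image works):
```bash
# Run nvidia-smi in a throwaway CUDA container to confirm GPU passthrough
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```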
### Out of Memory
```env
# Use a smaller model (edit backend/.env)
OLLAMA_MODEL=gemma2:2b
```
### Slow Performance
```bash
# Verify GPU is being used
docker exec munich-news-ollama nvidia-smi
# Should show GPU memory usage during inference
```
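You can also check Ollama's startup logs for GPU detection; the exact wording varies by version, but a case-insensitive search usually surfaces the relevant lines:
```bash
# Look for GPU/CUDA detection messages in Ollama's logs
docker-compose logs ollama | grep -iE 'gpu|cuda'
```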
## Configuration Files
**backend/.env** - Main configuration
```env
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_TIMEOUT=120
```
**docker-compose.yml** - Main services
**docker-compose.gpu.yml** - GPU override
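For reference, a GPU override like docker-compose.gpu.yml usually just adds an NVIDIA device reservation to the ollama service (a sketch using the standard Compose syntax; the file in this repo may differ):
```yaml
# Sketch of a GPU override: reserve all NVIDIA GPUs for the ollama service
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```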
## Model Options
- `gemma2:2b` - Fastest, 1.5GB VRAM
- `phi3:latest` - Default, 3-4GB VRAM ⭐
- `llama3.2:3b` - Best quality, 5-6GB VRAM
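To switch models, set `OLLAMA_MODEL` in backend/.env, pull the model so the first request doesn't block on a download, and restart so the new setting is picked up:
```bash
# Pre-pull the alternative model, then restart to apply the .env change
docker-compose exec ollama ollama pull gemma2:2b
docker-compose restart
```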
## Full Documentation
- [OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md) - Complete setup guide
- [GPU_SETUP.md](docs/GPU_SETUP.md) - GPU-specific guide
- [PERFORMANCE_COMPARISON.md](docs/PERFORMANCE_COMPARISON.md) - Benchmarks
## Need Help?
1. Run `./check-gpu.sh`
2. Check `docker-compose logs ollama`
3. See troubleshooting in [GPU_SETUP.md](docs/GPU_SETUP.md)