# Quick Start: Ollama with GPU
## 30-Second Setup
```bash
# 1. Check GPU
./check-gpu.sh
# 2. Start services
./start-with-gpu.sh
# 3. Test
docker-compose exec crawler python crawler_service.py 2
```
## Commands Cheat Sheet
### Setup
```bash
# Check GPU availability
./check-gpu.sh
# Configure Ollama
./configure-ollama.sh
# Start with GPU auto-detection
./start-with-gpu.sh
# Start with GPU (manual)
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
# Start without GPU
docker-compose up -d
```
### Monitoring
```bash
# Check GPU usage
docker exec munich-news-ollama nvidia-smi
# Monitor GPU in real-time
watch -n 1 'docker exec munich-news-ollama nvidia-smi'
# Check Ollama logs
docker-compose logs -f ollama
# Check crawler logs
docker-compose logs -f crawler
```
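Recent Ollama versions also report where a loaded model is resident; if your version supports `ollama ps`, the PROCESSOR column shows the GPU/CPU split:
```bash
# List loaded models; the PROCESSOR column shows GPU vs CPU placement
docker-compose exec ollama ollama ps
```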
### Testing
```bash
# Test translation (2 articles)
docker-compose exec crawler python crawler_service.py 2
# Check translation timing
docker-compose logs crawler | grep "Title translated"
# Test Ollama API (internal network only)
docker-compose exec crawler curl -s http://ollama:11434/api/generate -d '{
  "model": "phi3:latest",
  "prompt": "Translate to English: Guten Morgen",
  "stream": false
}'
```
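To pull just the translated text out of the reply, you can call the same endpoint from inside the crawler container with an inline Python snippet (a sketch using only the standard library; it assumes `python3` is on the container's PATH, which it should be since the crawler itself runs Python):
```bash
# Call /api/generate and print only the "response" field of the JSON reply
docker-compose exec crawler python3 -c "
import json, urllib.request
req = urllib.request.Request(
    'http://ollama:11434/api/generate',
    data=json.dumps({'model': 'phi3:latest',
                     'prompt': 'Translate to English: Guten Morgen',
                     'stream': False}).encode())
print(json.load(urllib.request.urlopen(req))['response'])"
```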
### Troubleshooting
```bash
# Restart Ollama
docker-compose restart ollama
# Rebuild and restart
docker-compose up -d --build ollama
# Check GPU in container
docker exec munich-news-ollama nvidia-smi
# Pull model manually
docker-compose exec ollama ollama pull phi3:latest
# List available models
docker-compose exec ollama ollama list
```
## Performance Expectations
| Operation | CPU | GPU | Speedup |
|-----------|-----|-----|---------|
| Translation | 1.5s | 0.3s | 5x |
| Summary | 8s | 2s | 4x |
| 10 Articles | 115s | 31s | 3.7x |
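To compare against these numbers on your own hardware, time a 10-article run (absolute times vary by GPU, model, and article length):
```bash
# Time a 10-article run to compare against the table above
time docker-compose exec crawler python crawler_service.py 10
```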
## Common Issues
### GPU Not Detected
```bash
# Install the NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
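Then verify that Docker itself can see the GPU before debugging the Compose setup (the CUDA image tag below is just an example; any CUDA base image works):
```bash
# Run nvidia-smi in a throwaway CUDA container to confirm GPU passthrough
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```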
### Out of Memory
```env
# Use a smaller model (edit backend/.env)
OLLAMA_MODEL=gemma2:2b
```
### Slow Performance
```bash
# Verify GPU is being used
docker exec munich-news-ollama nvidia-smi
# Should show GPU memory usage during inference
```
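You can also check Ollama's startup logs for GPU detection; the exact wording varies by version, but a case-insensitive search usually surfaces the relevant lines:
```bash
# Look for GPU/CUDA detection messages in Ollama's logs
docker-compose logs ollama | grep -iE 'gpu|cuda'
```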
## Configuration Files
**backend/.env** - Main configuration
```env
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_TIMEOUT=120
```
**docker-compose.yml** - Main services
**docker-compose.gpu.yml** - GPU override
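For reference, a GPU override like docker-compose.gpu.yml usually just adds an NVIDIA device reservation to the ollama service (a sketch using the standard Compose syntax; the file in this repo may differ):
```yaml
# Sketch of a GPU override: reserve all NVIDIA GPUs for the ollama service
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```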
## Model Options
- `gemma2:2b` - Fastest, 1.5GB VRAM
- `phi3:latest` - Default, 3-4GB VRAM ⭐
- `llama3.2:3b` - Best quality, 5-6GB VRAM
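To switch models, set `OLLAMA_MODEL` in backend/.env, pull the model so the first request doesn't block on a download, and restart so the new setting is picked up:
```bash
# Pre-pull the alternative model, then restart to apply the .env change
docker-compose exec ollama ollama pull gemma2:2b
docker-compose restart
```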
## Full Documentation
- [OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md) - Complete setup guide
- [GPU_SETUP.md](docs/GPU_SETUP.md) - GPU-specific guide
- [PERFORMANCE_COMPARISON.md](docs/PERFORMANCE_COMPARISON.md) - Benchmarks
## Need Help?
1. Run `./check-gpu.sh`
2. Check `docker-compose logs ollama`
3. See troubleshooting in [GPU_SETUP.md](docs/GPU_SETUP.md)