update

2025-11-11 17:20:56 +01:00
parent 324751eb5d
commit 901e8166cd
14 changed files with 1762 additions and 4 deletions
--- a/docs/GPU_SETUP.md
+++ b/docs/GPU_SETUP.md
@@ -0,0 +1,310 @@
+# GPU Setup Guide for Ollama
+
+This guide explains how to enable GPU acceleration for Ollama. In the benchmarks below, GPU inference is roughly 2.5-5x faster than CPU, and large batches can reach 5-10x.
+
+## Quick Start
+
+```bash
+# 1. Check if you have a compatible GPU
+./check-gpu.sh
+
+# 2. If GPU is available, start with GPU support
+./start-with-gpu.sh
+
+# 3. Verify GPU is being used
+docker exec munich-news-ollama nvidia-smi
+```
+
+## Benefits of GPU Acceleration
+
+| Operation | CPU (4 cores) | GPU (RTX 3060) | Speedup |
+|-----------|---------------|----------------|---------|
+| Model Load | 20s | 8s | 2.5x |
+| Translation | 1.5s | 0.3s | 5x |
+| Summarization | 8s | 2s | 4x |
+| 10 Articles | 90s | 25s | 3.6x |
+
+**Bottom line:** Processing 10 articles takes ~90 seconds on CPU vs ~25 seconds on GPU.
+
+## Requirements
+
+### Hardware
+- NVIDIA GPU with CUDA support (GTX 1060 or newer recommended)
+- Minimum 4GB VRAM for phi3:latest
+- 8GB+ VRAM recommended for larger models (mistral:latest, etc.)
+
+### Software
+- NVIDIA drivers (version 525.60.13 or newer)
+- Docker 20.10+
+- Docker Compose v2.3+
+- NVIDIA Container Toolkit
+
+## Installation
+
+### Step 1: Install NVIDIA Drivers
+
+**Ubuntu/Debian:**
+```bash
+# Check current driver
+nvidia-smi
+
+# If not installed, install the recommended driver
+# (ubuntu-drivers is Ubuntu-only; on Debian, install the nvidia-driver package instead)
+sudo ubuntu-drivers autoinstall
+sudo reboot
+```
+
+**Other Linux:**
+Visit: https://www.nvidia.com/Download/index.aspx
+
+### Step 2: Install NVIDIA Container Toolkit
+
+**Ubuntu/Debian:**
+```bash
+# Add the repository (the generic "stable/deb" list works across Debian-based releases)
+curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
+    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
+    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+
+# Install
+sudo apt-get update
+sudo apt-get install -y nvidia-container-toolkit
+
+# Configure Docker
+sudo nvidia-ctk runtime configure --runtime=docker
+sudo systemctl restart docker
+```
+
+**RHEL/CentOS:**
+```bash
+# Add the repository (the generic "stable/rpm" repo works across RPM-based releases)
+curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
+    sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
+
+sudo yum install -y nvidia-container-toolkit
+sudo nvidia-ctk runtime configure --runtime=docker
+sudo systemctl restart docker
+```
+
+### Step 3: Verify Installation
+
+```bash
+# Test GPU access from Docker
+docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
+
+# You should see your GPU information
+```
+
+## Usage
+
+### Starting Services with GPU
+
+**Option 1: Automatic (Recommended)**
+```bash
+./start-with-gpu.sh
+```
+This script automatically detects GPU availability and starts services accordingly.
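+
+The detection step can be sketched roughly as follows. This is a minimal illustration, not the contents of `start-with-gpu.sh`: the function names `has_gpu` and `compose_args` are hypothetical, and the check only confirms that the host driver can list a GPU, not that Docker can reach it. The sketch prints the command instead of running it.
+
+```shell
+#!/usr/bin/env bash
+# Hypothetical sketch; the real start-with-gpu.sh may work differently.
+set -eu
+
+# Succeeds only if nvidia-smi exists and lists at least one GPU.
+has_gpu() {
+    command -v nvidia-smi >/dev/null 2>&1 \
+        && nvidia-smi -L 2>/dev/null | grep -q '^GPU '
+}
+
+# Print the compose file arguments that match the detected hardware.
+compose_args() {
+    if has_gpu; then
+        echo "-f docker-compose.yml -f docker-compose.gpu.yml"
+    else
+        echo "-f docker-compose.yml"
+    fi
+}
+
+# Dry run: show the command that would be executed.
+echo "docker-compose $(compose_args) up -d"
+```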
+
+**Option 2: Manual**
+```bash
+# With GPU
+docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
+
+# Without GPU (CPU only)
+docker-compose up -d
+```
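+
+A GPU override file for Compose usually adds a device reservation to the Ollama service. The sketch below writes such an override to a scratch file so it can be inspected. Its content is an assumption based on Compose's `deploy.resources.reservations.devices` syntax, not a copy of this project's `docker-compose.gpu.yml`, and the service name `ollama` is inferred from the `docker-compose logs ollama` command used in this guide.
+
+```shell
+#!/usr/bin/env bash
+set -eu
+
+# Write an example override to a scratch file (assumed content, not the repo's file).
+example=$(mktemp)
+cat > "$example" <<'EOF'
+services:
+  ollama:
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+EOF
+
+# Show the result for review.
+cat "$example"
+```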
+
+### Verifying GPU Usage
+
+```bash
+# Check if GPU is detected in container
+docker exec munich-news-ollama nvidia-smi
+
+# Monitor GPU usage in real-time
+watch -n 1 'docker exec munich-news-ollama nvidia-smi'
+
+# Run a test and watch GPU usage
+# Terminal 1:
+watch -n 1 'docker exec munich-news-ollama nvidia-smi'
+
+# Terminal 2:
+docker-compose exec crawler python crawler_service.py 2
+```
+
+You should see:
+- GPU memory usage increase during inference
+- GPU utilization spike to 80-100%
+- Faster processing times in logs
+
+## Troubleshooting
+
+### GPU Not Detected
+
+**Check NVIDIA drivers:**
+```bash
+nvidia-smi
+# Should show GPU information
+```
+
+**Check Docker GPU access:**
+```bash
+docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
+# Should show GPU information from inside container
+```
+
+**Check Ollama container:**
+```bash
+docker exec munich-news-ollama nvidia-smi
+# Should show GPU information
+```
+
+### Out of Memory Errors
+
+**Symptoms:**
+- "CUDA out of memory" errors
+- Container crashes during inference
+
+**Solutions:**
+1. Use a smaller model:
+   ```bash
+   # Edit backend/.env
+   OLLAMA_MODEL=gemma2:2b  # Requires ~1.5GB VRAM
+   ```
+
+2. Close other GPU applications:
+   ```bash
+   # Check what's using GPU
+   nvidia-smi
+   ```
+
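+3. Increase the memory available to containers (if using Docker Desktop):
+   - Docker Desktop → Settings → Resources → Advanced
+   - Increase memory allocation
+   - Note: this setting controls system RAM, not VRAM. It helps only when part of the model is offloaded to the CPU.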
+
+### Slow Performance Despite GPU
+
+**Check GPU utilization:**
+```bash
+watch -n 1 'docker exec munich-news-ollama nvidia-smi'
+```
+
+If GPU utilization is low (<50%):
+1. Ensure you're using the GPU compose file
+2. Check Ollama logs for errors: `docker-compose logs ollama`
+3. Try a model that fits entirely in VRAM; a model that spills into system RAM runs partly on the CPU
+4. Update NVIDIA drivers
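+
+The 50% threshold above can be checked without reading the full `nvidia-smi` table. The sketch below classifies a single CSV sample; the function name `classify_sample` is hypothetical, and the canned samples stand in for live output.
+
+```shell
+#!/usr/bin/env bash
+set -eu
+
+# Classify one "utilization, memory_used" CSV sample against the 50% threshold.
+classify_sample() {
+    local util
+    util=$(printf '%s\n' "$1" | cut -d, -f1 | tr -d ' ')
+    if [ "$util" -lt 50 ]; then
+        echo "low"
+    else
+        echo "ok"
+    fi
+}
+
+# Canned samples in the format produced by:
+#   nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits
+classify_sample "12, 3021"
+classify_sample "93, 3544"
+```
+
+For a live reading, pass the first line of `docker exec munich-news-ollama nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits` to `classify_sample` while the crawler is running.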
+
+### Docker Compose GPU Not Working
+
+**Error:** `could not select device driver "" with capabilities: [[gpu]]`
+
+**Solution:**
+```bash
+# Reconfigure Docker runtime
+sudo nvidia-ctk runtime configure --runtime=docker
+sudo systemctl restart docker
+
+# Verify configuration
+cat /etc/docker/daemon.json
+# Should contain nvidia runtime configuration
+```
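+
+After `nvidia-ctk runtime configure`, `daemon.json` is expected to contain a `runtimes` entry named `nvidia`. The sketch below checks for that entry; the function name `has_nvidia_runtime` is hypothetical, and the plain text match is a simplification rather than a real JSON parse.
+
+```shell
+#!/usr/bin/env bash
+set -eu
+
+# Succeeds if the given daemon.json mentions a runtime named "nvidia".
+# Defaults to the standard path when no argument is given.
+has_nvidia_runtime() {
+    grep -q '"nvidia"' "${1:-/etc/docker/daemon.json}" 2>/dev/null
+}
+
+if has_nvidia_runtime; then
+    echo "nvidia runtime configured"
+else
+    echo "nvidia runtime missing: rerun nvidia-ctk runtime configure"
+fi
+```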
+
+## Performance Tuning
+
+### Model Selection
+
+Different models have different GPU requirements and performance:
+
+| Model | VRAM | Speed | Quality | Best For |
+|-------|------|-------|---------|----------|
+| gemma2:2b | 1.5GB | Fastest | Good | High volume, speed critical |
+| phi3:latest | 2-4GB | Fast | Very Good | Balanced (default) |
+| llama3.2:3b | 4-6GB | Medium | Excellent | Quality critical |
+| mistral:latest | 6-8GB | Medium | Excellent | Long-form content |
+
+### Batch Processing
+
+GPU acceleration is most effective when processing multiple articles:
+- 1 article: ~2x speedup
+- 10 articles: ~4x speedup
+- 50+ articles: ~5-10x speedup
+
+This is because the model stays loaded in GPU memory between requests, so the one-time load cost is spread over more articles.
+
+### Concurrent Requests
+
+Ollama can handle multiple concurrent requests on GPU:
+```bash
+# Edit backend/.env to enable concurrent processing
+OLLAMA_CONCURRENT_REQUESTS=3
+```
+
+Note: Each concurrent request uses additional VRAM.
+
+## Monitoring
+
+### Real-time GPU Monitoring
+
+```bash
+# Basic monitoring
+watch -n 1 'docker exec munich-news-ollama nvidia-smi'
+
+# Detailed monitoring
+watch -n 1 'docker exec munich-news-ollama nvidia-smi --query-gpu=timestamp,name,temperature.gpu,utilization.gpu,utilization.memory,memory.used,memory.total --format=csv'
+```
+
+### Performance Logging
+
+Check crawler logs for timing information:
+```bash
+docker-compose logs crawler | grep "Title translated"
+# GPU: ✓ Title translated (0.3s)
+# CPU: ✓ Title translated (1.5s)
+```
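+
+To compare runs, the timings can be averaged rather than read line by line. The sketch below assumes the log format shown above, with the duration in parentheses followed by `s`; the function name `avg_translation_time` is hypothetical.
+
+```shell
+#!/usr/bin/env bash
+set -eu
+
+# Read log lines on stdin and print the mean "Title translated" time in seconds.
+avg_translation_time() {
+    grep 'Title translated' \
+        | sed -n 's/.*(\([0-9.]*\)s).*/\1/p' \
+        | LC_ALL=C awk '{ sum += $1; n++ } END { if (n) printf "%.2f\n", sum / n }'
+}
+
+# Example with two canned log lines.
+printf '%s\n' '✓ Title translated (0.3s)' '✓ Title translated (0.5s)' | avg_translation_time
+```
+
+Against live logs, pipe `docker-compose logs crawler` into `avg_translation_time`.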
+
+## Cost-Benefit Analysis
+
+### When to Use GPU
+
+**Use GPU if:**
+- Processing 10+ articles daily
+- Need faster newsletter generation
+- Have available GPU hardware
+- Running multiple AI operations
+
+**Use CPU if:**
+- Processing <5 articles daily
+- No GPU available
+- GPU needed for other tasks
+- Cost-sensitive deployment
+
+### Cloud Deployment
+
+GPU instances cost more per hour but process more articles per hour. The prices below are approximate on-demand rates and vary by region:
+
+| Provider | Instance | GPU | Cost/hour | Articles/hour |
+|----------|----------|-----|-----------|---------------|
+| AWS | g4dn.xlarge | T4 | $0.526 | ~1000 |
+| GCP | n1-standard-4 + T4 | T4 | $0.35 | ~1000 |
+| Azure | NC6 | K80 | $0.90 | ~500 |
+
+For comparison, CPU instances process ~100-200 articles/hour at $0.05-0.10/hour.
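+
+Hourly price alone hides the difference in throughput, so it helps to convert each option to cost per 1,000 articles. The sketch below does that arithmetic with the figures from the table above; the function name `cost_per_1000` is hypothetical.
+
+```shell
+#!/usr/bin/env bash
+set -eu
+
+# cost_per_1000 <cost_per_hour_usd> <articles_per_hour>
+# Prints the cost in USD of processing 1,000 articles.
+cost_per_1000() {
+    LC_ALL=C awk -v cost="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", cost / rate * 1000 }'
+}
+
+echo "AWS g4dn.xlarge:  \$$(cost_per_1000 0.526 1000)"
+echo "GCP T4:           \$$(cost_per_1000 0.35 1000)"
+echo "Azure NC6:        \$$(cost_per_1000 0.90 500)"
+echo "CPU (best case):  \$$(cost_per_1000 0.05 200)"
+echo "CPU (worst case): \$$(cost_per_1000 0.10 100)"
+```
+
+With these figures, the AWS and GCP options fall within the CPU range of roughly $0.25 to $1.00 per 1,000 articles, while the Azure option ($1.80) sits above it. The choice between CPU and a T4 instance therefore depends mostly on how quickly the articles need to be processed.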
+
+## Additional Resources
+
+- [NVIDIA Container Toolkit Documentation](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
+- [Ollama GPU Support](https://github.com/ollama/ollama/blob/main/docs/gpu.md)
+- [Docker GPU Support](https://docs.docker.com/config/containers/resource_constraints/#gpu)
+- [CUDA Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/)
+
+## Support
+
+If you encounter issues:
+1. Run `./check-gpu.sh` to diagnose
+2. Check logs: `docker-compose logs ollama`
+3. See [OLLAMA_SETUP.md](OLLAMA_SETUP.md) for general Ollama troubleshooting
+4. Open an issue with:
+   - Output of `nvidia-smi`
+   - Output of `docker info | grep -i runtime`
+   - Relevant logs