# Quick Start: Ollama with GPU

## 30-Second Setup

```bash
# 1. Check GPU
./check-gpu.sh

# 2. Start services
./start-with-gpu.sh

# 3. Test
docker-compose exec crawler python crawler_service.py 2
```

## Commands Cheat Sheet

### Setup

```bash
# Check GPU availability
./check-gpu.sh

# Configure Ollama
./configure-ollama.sh

# Start with GPU auto-detection
./start-with-gpu.sh

# Start with GPU (manual)
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d

# Start without GPU
docker-compose up -d
```

### Monitoring

```bash
# Check GPU usage
docker exec munich-news-ollama nvidia-smi

# Monitor GPU in real time
watch -n 1 'docker exec munich-news-ollama nvidia-smi'

# Check Ollama logs
docker-compose logs -f ollama

# Check crawler logs
docker-compose logs -f crawler
```

### Testing

```bash
# Test translation (2 articles)
docker-compose exec crawler python crawler_service.py 2

# Check translation timing
docker-compose logs crawler | grep "Title translated"

# Test Ollama API directly
curl http://localhost:11434/api/generate -d '{
  "model": "phi3:latest",
  "prompt": "Translate to English: Guten Morgen",
  "stream": false
}'
```

### Troubleshooting

```bash
# Restart Ollama
docker-compose restart ollama

# Rebuild and restart
docker-compose up -d --build ollama

# Check GPU in container
docker exec munich-news-ollama nvidia-smi

# Pull model manually
docker-compose exec ollama ollama pull phi3:latest

# List available models
docker-compose exec ollama ollama list
```

## Performance Expectations

| Operation   | CPU  | GPU  | Speedup |
|-------------|------|------|---------|
| Translation | 1.5s | 0.3s | 5x      |
| Summary     | 8s   | 2s   | 4x      |
| 10 Articles | 115s | 31s  | 3.7x    |

## Common Issues

### GPU Not Detected

```bash
# Install NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

### Out of Memory

```bash
# Use a smaller model (edit backend/.env)
OLLAMA_MODEL=gemma2:2b
```

### Slow Performance

```bash
# Verify GPU is being used
docker exec munich-news-ollama nvidia-smi
# Should show GPU memory usage during inference
```

## Configuration Files

**backend/.env** - Main configuration

```env
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_TIMEOUT=120
```

**docker-compose.yml** - Main services

**docker-compose.gpu.yml** - GPU override (example sketch at the end of this page)

## Model Options

- `gemma2:2b` - Fastest, 1.5GB VRAM
- `phi3:latest` - Default, 3-4GB VRAM ⭐
- `llama3.2:3b` - Best quality, 5-6GB VRAM

## Full Documentation

- [OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md) - Complete setup guide
- [GPU_SETUP.md](docs/GPU_SETUP.md) - GPU-specific guide
- [PERFORMANCE_COMPARISON.md](docs/PERFORMANCE_COMPARISON.md) - Benchmarks

## Need Help?

1. Run `./check-gpu.sh`
2. Check `docker-compose logs ollama`
3. See troubleshooting in [GPU_SETUP.md](docs/GPU_SETUP.md)
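## Example: GPU Override File

The `docker-compose.gpu.yml` referenced in Configuration Files is not reproduced in this guide. A typical NVIDIA GPU override looks like the sketch below; the `ollama` service name matches `docker-compose.yml`, but the repo's actual file may differ:

```yaml
# Hedged sketch of docker-compose.gpu.yml (the repo's actual file may differ).
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all          # reserve every GPU; use e.g. 1 to limit
              capabilities: [gpu]
```

Passing both files with `-f docker-compose.yml -f docker-compose.gpu.yml` (as in the Setup commands) layers the GPU reservation onto the base `ollama` service without duplicating the rest of its definition.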
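## Example: Minimal GPU Check

`./check-gpu.sh` ships with this repo and its exact contents aren't shown here. The sketch below covers the two checks such a script typically performs: is the NVIDIA driver present on the host, and can Docker containers reach the GPU through the NVIDIA Container Toolkit?

```bash
#!/usr/bin/env bash
# Hedged sketch of a GPU availability check; the repo's check-gpu.sh may differ.
set -euo pipefail

# 1. Is the NVIDIA driver installed on the host?
if ! command -v nvidia-smi >/dev/null 2>&1; then
  echo "nvidia-smi not found - install the NVIDIA driver first" >&2
  exit 1
fi
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader

# 2. Can Docker containers see the GPU? (requires nvidia-container-toolkit)
docker run --rm --gpus all ubuntu:22.04 nvidia-smi >/dev/null \
  || { echo "Docker cannot access the GPU - see 'GPU Not Detected' above" >&2; exit 1; }
echo "GPU is available to Docker"
```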
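## Example: Timing a Request

To sanity-check the Performance Expectations table on your own hardware, time a single translation-sized request against the Ollama API. This assumes the stack is up and `phi3:latest` is pulled; the first call also pays model-load time, so run it twice and keep the second number:

```bash
# Measure end-to-end latency of one generate call via curl's built-in timer
curl -s -o /dev/null -w 'total: %{time_total}s\n' \
  http://localhost:11434/api/generate -d '{
    "model": "phi3:latest",
    "prompt": "Translate to English: Guten Morgen",
    "stream": false
  }'

# List locally available models without exec-ing into the container
curl -s http://localhost:11434/api/tags
```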
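## Example: Switching to a Smaller Model

Fixing the Out of Memory issue above involves three steps: edit `backend/.env`, pull the new model, and restart the service that reads the setting. A sketch, assuming the crawler reads `backend/.env` at startup (consistent with the Configuration Files section) and that restarting it reloads the value:

```bash
# Point the stack at gemma2:2b (smallest listed option, ~1.5GB VRAM)
sed -i 's/^OLLAMA_MODEL=.*/OLLAMA_MODEL=gemma2:2b/' backend/.env

# Pull the model inside the Ollama container
docker-compose exec ollama ollama pull gemma2:2b

# Restart so the crawler picks up the new value (assumption: env is read at startup)
docker-compose restart crawler
```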