# Ollama Setup Guide

This project includes an integrated Ollama service for AI-powered summarization and translation.

**🚀 Want 5-10x faster performance?** See [GPU_SETUP.md](GPU_SETUP.md) for GPU acceleration setup.

## Docker Compose Setup (Recommended)

The `docker-compose.yml` file includes an Ollama service that automatically does the following (a quick verification is shown after the list):
- Runs the Ollama server on port 11434
- Pulls the phi3:latest model on first startup
- Persists model data in a Docker volume
- Supports GPU acceleration (NVIDIA GPUs)
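
To confirm the service and its model volume are wired up in your checkout, you can run a couple of quick checks (the volume name depends on what `docker-compose.yml` defines, so the `grep` below is only a heuristic):

```bash
# List the services defined in the compose file (expect to see "ollama" and "ollama-setup")
docker-compose config --services

# After the first start, the model data volume should exist
docker volume ls | grep -i ollama
```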

### GPU Support

Ollama can use NVIDIA GPUs for significantly faster inference (5-10x speedup).

**Prerequisites** (quick checks follow the list):
- NVIDIA GPU with CUDA support
- NVIDIA drivers installed
- NVIDIA Container Toolkit installed
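
You can verify each prerequisite from the host before touching Docker Compose:

```bash
# Driver check: should print your GPU and driver version
nvidia-smi

# Container Toolkit check: the nvidia-ctk CLI ships with the toolkit
nvidia-ctk --version

# Docker check: the "nvidia" runtime should be listed
docker info | grep -i runtime
```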

**Installation (Ubuntu/Debian):**
```bash
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

**Start with GPU support:**
```bash
# Automatic detection and startup
./start-with-gpu.sh

# Or manually specify GPU support
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```

**Verify GPU is being used:**
```bash
# Check if GPU is detected
docker exec munich-news-ollama nvidia-smi

# Monitor GPU usage during inference
watch -n 1 'docker exec munich-news-ollama nvidia-smi'
```

### Configuration

Update your `backend/.env` file with one of these configurations:

**For Docker Compose (services communicate via the internal network):**
```env
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_TIMEOUT=120
```

**For an external Ollama server (running on the host machine):**
```env
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_TIMEOUT=120
```
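
With the Docker Compose configuration, you can confirm that another container in the stack can reach Ollama over the internal network (the `backend` service name here is an assumption; substitute whichever service talks to Ollama, and note that `curl` must be available in that image):

```bash
# From inside another compose service, the Ollama API should answer on the service hostname
# ("backend" is a placeholder for whichever service uses Ollama)
docker-compose exec backend curl -s http://ollama:11434/api/tags
```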

### Starting the Services

```bash
# Option 1: Auto-detect GPU and start (recommended)
./start-with-gpu.sh

# Option 2: Start with GPU support (if you have an NVIDIA GPU)
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d

# Option 3: Start without GPU (CPU only)
docker-compose up -d

# Check Ollama logs
docker-compose logs -f ollama

# Check model setup logs
docker-compose logs ollama-setup

# Verify Ollama is running
curl http://localhost:11434/api/tags
```

### First Time Setup

On first startup, the `ollama-setup` service will automatically pull the phi3:latest model. This may take several minutes depending on your internet connection (the model is ~2.3GB).

You can monitor the progress:
```bash
docker-compose logs -f ollama-setup
```
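
Once the setup container has finished, you can confirm the model is actually present:

```bash
# The pulled model should show up in the list
docker-compose exec ollama ollama list
```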

### Available Models

The default model is `phi3:latest` (2.3GB), which provides a good balance of speed and quality.

To use a different model:
1. Update `OLLAMA_MODEL` in your `.env` file
2. Pull the model manually:
```bash
docker-compose exec ollama ollama pull <model-name>
```

Popular alternatives (a complete switch example follows the list):
- `llama3.2:latest` - Larger, more capable model
- `mistral:latest` - Fast and efficient
- `gemma2:2b` - Smallest, fastest option
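
For example, switching to the smaller `gemma2:2b` might look like this (restart whichever services consume the Ollama API so they pick up the new value; `docker-compose restart` simply restarts everything):

```bash
# Pull the new model into the Ollama container
docker-compose exec ollama ollama pull gemma2:2b

# Then set OLLAMA_MODEL=gemma2:2b in backend/.env and restart the stack
docker-compose restart
```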

### Troubleshooting

**Ollama service not starting:**
```bash
# Check if port 11434 is already in use
lsof -i :11434

# Restart the service
docker-compose restart ollama

# Check logs
docker-compose logs ollama
```

**Model not downloading:**
```bash
# Manually pull the model
docker-compose exec ollama ollama pull phi3:latest

# Check available models
docker-compose exec ollama ollama list
```

**GPU not being detected:**
```bash
# Check if NVIDIA drivers are installed
nvidia-smi

# Check if Docker can access GPU
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# Verify GPU is available in Ollama container
docker exec munich-news-ollama nvidia-smi

# Check Ollama logs for GPU initialization
docker-compose logs ollama | grep -i gpu
```

**GPU out of memory:**
- Phi3 requires ~2-4GB VRAM
- Close other GPU applications
- Use a smaller model: `gemma2:2b` (requires ~1.5GB VRAM)
- Or fall back to CPU mode
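
To see how much VRAM is actually free before choosing a model, you can query the GPU inside the container directly:

```bash
# Show used vs. total GPU memory
docker exec munich-news-ollama nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```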

**CPU out of memory errors:**
- Phi3 requires ~4GB RAM
- Consider using a smaller model like `gemma2:2b`
- Or increase Docker's memory limit in Docker Desktop settings
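
A one-shot snapshot of how much memory the Ollama container is currently using can help decide which of these applies:

```bash
# Show current CPU and memory usage for the Ollama container
docker stats --no-stream munich-news-ollama
```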

**Slow performance even with GPU:**
- Ensure GPU drivers are up to date
- Check GPU utilization: `watch -n 1 'docker exec munich-news-ollama nvidia-smi'`
- Verify you're using the GPU compose file: `docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d`
- Some models may not fully utilize the GPU; try a different model

## Local Ollama Installation

If you prefer to run Ollama directly on your host machine (the steps are condensed into commands below):

1. Install Ollama: https://ollama.ai/download
2. Pull the model: `ollama pull phi3:latest`
3. Start Ollama: `ollama serve`
4. Update `.env` to use `http://host.docker.internal:11434`
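
On the command line, steps 2-4 boil down to the following (the installer itself depends on your OS; see the download page above):

```bash
# After installing Ollama from https://ollama.ai/download
ollama pull phi3:latest
ollama serve

# In backend/.env, point the app at the host-based server:
# OLLAMA_BASE_URL=http://host.docker.internal:11434
```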

## Testing the Setup

### Basic API Test
```bash
# Test Ollama API directly
curl http://localhost:11434/api/generate -d '{
  "model": "phi3:latest",
  "prompt": "Translate to English: Guten Morgen",
  "stream": false
}'
```
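
With `stream` set to `false`, Ollama returns a single JSON object whose `response` field holds the generated text; if `jq` is installed you can pull out just that part:

```bash
# Print only the generated text from the JSON reply (requires jq)
curl -s http://localhost:11434/api/generate -d '{
  "model": "phi3:latest",
  "prompt": "Translate to English: Guten Morgen",
  "stream": false
}' | jq -r '.response'
```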

### GPU Verification
```bash
# Check if GPU is detected
docker exec munich-news-ollama nvidia-smi

# Monitor GPU usage during a test
# Terminal 1: Monitor GPU
watch -n 1 'docker exec munich-news-ollama nvidia-smi'

# Terminal 2: Run test crawl
docker-compose exec crawler python crawler_service.py 1

# You should see GPU memory usage increase during inference
```

### Full Integration Test
```bash
# Run a test crawl to verify translation works
docker-compose exec crawler python crawler_service.py 1

# Check the logs for translation timing
# GPU: ~0.3-0.5s per translation
# CPU: ~1-2s per translation
docker-compose logs crawler | grep "Title translated"
```

## Performance Notes

### CPU Performance
- First request may be slow as the model loads into memory (~10-30 seconds)
- Subsequent requests are faster (cached in memory)
- Translation: 0.5-2 seconds per title
- Summarization: 5-10 seconds per article
- Recommended: 4+ CPU cores, 8GB+ RAM

### GPU Performance (NVIDIA)
- Model loads faster (~5-10 seconds)
- Translation: 0.1-0.5 seconds per title (5-10x faster)
- Summarization: 1-3 seconds per article (3-5x faster)
- Recommended: 4GB+ VRAM for phi3:latest
- Larger models (llama3.2) require 8GB+ VRAM

### Performance Comparison

| Operation | CPU (4 cores) | GPU (RTX 3060) | Speedup |
|-----------|---------------|----------------|---------|
| Model Load | 20s | 8s | 2.5x |
| Translation | 1.5s | 0.3s | 5x |
| Summarization | 8s | 2s | 4x |
| 10 Articles | 90s | 25s | 3.6x |

**Tip:** GPU acceleration is most beneficial when processing many articles in batch.
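
Your exact numbers will depend on hardware and model; an easy way to measure a single request on your own machine is to time one non-streaming generation:

```bash
# Time one generation request end to end
time curl -s http://localhost:11434/api/generate -d '{
  "model": "phi3:latest",
  "prompt": "Translate to English: Guten Morgen",
  "stream": false
}' > /dev/null
```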