Ollama Setup Guide
This project includes an integrated Ollama service for AI-powered summarization and translation.
🚀 Want 5-10x faster performance? See GPU_SETUP.md for GPU acceleration setup.
Docker Compose Setup (Recommended)
The docker-compose.yml includes an Ollama service that automatically:
- Runs Ollama server (internal only, not exposed to host)
- Pulls the phi3:latest model on first startup
- Persists model data in a Docker volume
- Supports GPU acceleration (NVIDIA GPUs)
- Only accessible by other Docker Compose services for security
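For reference, a minimal sketch of what this service definition might look like. Names and details here are illustrative (the container name is assumed from the commands later in this guide); the actual docker-compose.yml in the repo is authoritative:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: munich-news-ollama    # assumed name, matches commands below
    volumes:
      - ollama_data:/root/.ollama         # persists downloaded models across restarts
    restart: unless-stopped
    # note: no "ports:" mapping - the API stays on the internal Compose network

  ollama-setup:
    image: ollama/ollama:latest
    depends_on:
      - ollama
    environment:
      - OLLAMA_HOST=http://ollama:11434   # point the CLI at the ollama service
    entrypoint: ["ollama", "pull", "phi3:latest"]   # one-shot model download
    restart: "no"

volumes:
  ollama_data: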
GPU Support
Ollama can use NVIDIA GPUs for significantly faster inference (5-10x speedup).
Prerequisites:
- NVIDIA GPU with CUDA support
- NVIDIA drivers installed
- NVIDIA Container Toolkit installed
Installation (Ubuntu/Debian):
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
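On newer toolkit versions you may also need to register the NVIDIA runtime with Docker; if GPU containers still fail to start after the restart, try:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker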
Start with GPU support:
# Automatic detection and startup
./start-with-gpu.sh
# Or manually specify GPU support
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
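The docker-compose.gpu.yml override typically just adds a GPU reservation to the Ollama service. A minimal sketch, assuming the standard Compose device-reservation syntax (check the actual file in the repo):
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or a specific number of GPUs
              capabilities: [gpu]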
Verify GPU is being used:
# Check if GPU is detected
docker exec munich-news-ollama nvidia-smi
# Monitor GPU usage during inference
watch -n 1 'docker exec munich-news-ollama nvidia-smi'
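You can also ask Ollama itself where a loaded model is running. After it has handled at least one request, ollama ps reports whether the model sits in GPU or CPU memory:
docker exec munich-news-ollama ollama ps
# The PROCESSOR column should read something like "100% GPU" when acceleration is active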
Configuration
Update your backend/.env file with one of these configurations:
For Docker Compose (services communicate via internal network):
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_TIMEOUT=120
For external Ollama server (running on host machine):
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_TIMEOUT=120
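Note that host.docker.internal resolves automatically on Docker Desktop (macOS/Windows) but not on native Linux. There you typically need to map it to the host gateway in the backend/crawler service definition, for example:
extra_hosts:
  - "host.docker.internal:host-gateway"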
Starting the Services
# Option 1: Auto-detect GPU and start (recommended)
./start-with-gpu.sh
# Option 2: Start with GPU support (if you have NVIDIA GPU)
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
# Option 3: Start without GPU (CPU only)
docker-compose up -d
# Check Ollama logs
docker-compose logs -f ollama
# Check model setup logs
docker-compose logs ollama-setup
# Verify Ollama is running (from inside a container)
docker-compose exec crawler curl http://ollama:11434/api/tags
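A healthy response is a JSON document listing the installed models, roughly of this form (values abridged):
{"models": [{"name": "phi3:latest", "size": ..., "modified_at": ...}]}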
First Time Setup
On first startup, the ollama-setup service will automatically pull the phi3:latest model. This may take several minutes depending on your internet connection (model is ~2.3GB).
You can monitor the progress:
docker-compose logs -f ollama-setup
Available Models
The default model is phi3:latest (2.3GB), which provides a good balance of speed and quality.
To use a different model:
- Update OLLAMA_MODEL in your .env file
- Pull the model manually:
docker-compose exec ollama ollama pull <model-name>
Popular alternatives:
- llama3.2:latest - Larger, more capable model
- mistral:latest - Fast and efficient
- gemma2:2b - Smallest, fastest option
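For example, switching to the smallest option might look like this (adjust the service name to whichever backend service reads the .env file):
# Pull the new model into the shared Ollama volume
docker-compose exec ollama ollama pull gemma2:2b
# Set OLLAMA_MODEL=gemma2:2b in backend/.env, then restart the consuming service
docker-compose restart crawler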
Troubleshooting
Ollama service not starting:
# Check if port 11434 is already in use
lsof -i :11434
# Restart the service
docker-compose restart ollama
# Check logs
docker-compose logs ollama
Model not downloading:
# Manually pull the model
docker-compose exec ollama ollama pull phi3:latest
# Check available models
docker-compose exec ollama ollama list
GPU not being detected:
# Check if NVIDIA drivers are installed
nvidia-smi
# Check if Docker can access GPU
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# Verify GPU is available in Ollama container
docker exec munich-news-ollama nvidia-smi
# Check Ollama logs for GPU initialization
docker-compose logs ollama | grep -i gpu
GPU out of memory:
- Phi3 requires ~2-4GB VRAM
- Close other GPU applications
- Use a smaller model: gemma2:2b (requires ~1.5GB VRAM)
- Or fall back to CPU mode
CPU out of memory errors:
- Phi3 requires ~4GB RAM
- Consider using a smaller model like gemma2:2b
- Or increase Docker's memory limit in Docker Desktop settings
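To see how much memory the Ollama container actually uses while a model is loaded, check live container stats (container name taken from the examples above):
docker stats munich-news-ollama --no-stream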
Slow performance even with GPU:
- Ensure GPU drivers are up to date
- Check GPU utilization: watch -n 1 'docker exec munich-news-ollama nvidia-smi'
- Verify you're using the GPU compose file: docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
- Some models may not fully utilize the GPU - try a different model
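It is also worth confirming that model layers were actually offloaded to the GPU. Ollama's logs usually mention offloading when a model loads (exact wording varies by version):
docker-compose logs ollama | grep -i offload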
Local Ollama Installation
If you prefer to run Ollama directly on your host machine:
- Install Ollama: https://ollama.ai/download
- Pull the model: ollama pull phi3:latest
- Start Ollama: ollama serve
- Update .env to use http://host.docker.internal:11434
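Before pointing the containers at the host installation, you can confirm it is reachable from the host:
curl http://localhost:11434/api/tags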
Testing the Setup
Basic API Test
# Test Ollama API from inside a container
docker-compose exec crawler curl -s http://ollama:11434/api/generate -d '{
"model": "phi3:latest",
"prompt": "Translate to English: Guten Morgen",
"stream": false
}'
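With "stream": false the API returns a single JSON object; the generated text appears in the response field, roughly (other fields abridged):
{"model": "phi3:latest", "response": "Good morning", "done": true, ...}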
GPU Verification
# Check if GPU is detected
docker exec munich-news-ollama nvidia-smi
# Monitor GPU usage during a test
# Terminal 1: Monitor GPU
watch -n 1 'docker exec munich-news-ollama nvidia-smi'
# Terminal 2: Run test crawl
docker-compose exec crawler python crawler_service.py 1
# You should see GPU memory usage increase during inference
Full Integration Test
# Run a test crawl to verify translation works
docker-compose exec crawler python crawler_service.py 1
# Check the logs for translation timing
# GPU: ~0.3-0.5s per translation
# CPU: ~1-2s per translation
docker-compose logs crawler | grep "Title translated"
Performance Notes
CPU Performance
- First request may be slow as the model loads into memory (~10-30 seconds)
- Subsequent requests are faster (cached in memory)
- Translation: 0.5-2 seconds per title
- Summarization: 5-10 seconds per article
- Recommended: 4+ CPU cores, 8GB+ RAM
GPU Performance (NVIDIA)
- Model loads faster (~5-10 seconds)
- Translation: 0.1-0.5 seconds per title (5-10x faster)
- Summarization: 1-3 seconds per article (3-5x faster)
- Recommended: 4GB+ VRAM for phi3:latest
- Larger models (llama3.2) require 8GB+ VRAM
Performance Comparison
| Operation | CPU (4 cores) | GPU (RTX 3060) | Speedup |
|---|---|---|---|
| Model Load | 20s | 8s | 2.5x |
| Translation | 1.5s | 0.3s | 5x |
| Summarization | 8s | 2s | 4x |
| 10 Articles | 90s | 25s | 3.6x |
Tip: GPU acceleration is most beneficial when processing many articles in batch.