# How to Check GPU Status via API
## Quick Check
### 1. GPU Status
```bash
curl http://localhost:5001/api/ollama/gpu-status | python3 -m json.tool
```
**Response:**
```json
{
  "status": "success",
  "ollama_running": true,
  "gpu_available": true,
  "gpu_in_use": true,
  "gpu_details": {
    "model": "phi3:latest",
    "gpu_layers": 32,
    "size": 2300000000
  },
  "recommendation": "✓ GPU acceleration is active!"
}
```
### 2. Performance Test
```bash
curl http://localhost:5001/api/ollama/test | python3 -m json.tool
```
**Response:**
```json
{
  "status": "success",
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)",
  "model": "phi3:latest",
  "recommendation": "Performance is good"
}
```
### 3. List Models
```bash
curl http://localhost:5001/api/ollama/models | python3 -m json.tool
```
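**Response** (illustrative values; the fields are described under **GET /api/ollama/models** below):
```json
{
  "status": "success",
  "models": ["phi3:latest", "llama3:8b"],
  "current_model": "phi3:latest"
}
```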
## Using the Check Script
We've created a convenient script:
```bash
./check-gpu-api.sh
```
**Output:**
```
==========================================
Ollama GPU Status Check
==========================================
1. GPU Status:
---
{
  "status": "success",
  "gpu_in_use": true,
  ...
}

2. Performance Test:
---
{
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)"
}

3. Available Models:
---
{
  "models": ["phi3:latest", "llama3:8b"]
}

==========================================
Quick Summary:
==========================================
GPU Status: GPU Active
Performance: 3.2s - Excellent (GPU likely active)
```
## API Endpoints
### GET /api/ollama/gpu-status
Check whether Ollama is currently using the GPU.
**Response Fields:**
- `gpu_available` - GPU hardware detected
- `gpu_in_use` - Ollama actively using GPU
- `gpu_details` - GPU configuration details
- `recommendation` - Setup suggestions
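For scripting, individual fields can be pulled out with the same `curl` + `python3` pattern used throughout this guide, for example:
```bash
# Print just the gpu_in_use flag (True/False)
curl -s http://localhost:5001/api/ollama/gpu-status | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('gpu_in_use'))"
```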
### GET /api/ollama/test
Test Ollama performance with a sample prompt.
**Response Fields:**
- `duration_seconds` - Time taken for the test, in seconds
- `performance` - Performance rating (see the table under Interpreting Results)
- `recommendation` - Performance suggestions
### GET /api/ollama/models
List all available models.
**Response Fields:**
- `models` - Array of model names
- `current_model` - Active model from .env
### GET /api/ollama/ping
Test basic Ollama connectivity.
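Same `curl` pattern as the other endpoints:
```bash
curl http://localhost:5001/api/ollama/ping | python3 -m json.tool
```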
### GET /api/ollama/config
View current Ollama configuration.
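For example:
```bash
curl http://localhost:5001/api/ollama/config | python3 -m json.tool
```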
## Interpreting Results
### GPU Status
**✅ GPU Active:**
```json
{
  "gpu_in_use": true,
  "gpu_available": true
}
```
- GPU acceleration is working
- Expect 5-10x faster processing
**❌ CPU Mode:**
```json
{
  "gpu_in_use": false,
  "gpu_available": false
}
```
- Running on CPU only
- Slower processing (15-30s per article)
### Performance Ratings
| Duration | Rating | Mode |
|----------|--------|------|
| < 5s | Excellent | GPU likely active |
| 5-15s | Good | GPU may be active |
| 15-30s | Fair | CPU mode |
| > 30s | Slow | CPU mode, GPU recommended |
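To see where a run lands in this table, pull the duration and rating out of the test response (same pattern as the health-check script below):
```bash
curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; d=json.load(sys.stdin); print(f\"{d.get('duration_seconds')}s - {d.get('performance')}\")"
```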
## Troubleshooting
### GPU Not Detected
1. **Check whether the stack was started with the GPU compose file:**
```bash
docker-compose ps
# Lists running services; GPU support requires starting the stack with
# docker-compose.gpu.yml (see "Slow Performance" below)
```
2. **Verify NVIDIA runtime:**
```bash
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```
3. **Check Ollama logs:**
```bash
docker-compose logs ollama | grep -i gpu
```
### Slow Performance
If the performance test takes longer than 15s:
1. **Enable GPU acceleration:**
```bash
docker-compose down
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```
2. **Verify GPU is available:**
```bash
nvidia-smi
```
3. **Check model size:**
- Larger models take longer per request
- Try `phi3:latest` for the fastest performance (pull command below)
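If `phi3:latest` isn't available yet, it can be pulled inside the Ollama container; this assumes the service is named `ollama`, as in the other `docker-compose` commands in this guide:
```bash
docker-compose exec ollama ollama pull phi3:latest
```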
### Connection Errors
If the API returns connection errors:
1. **Check backend is running:**
```bash
docker-compose ps backend
```
2. **Check Ollama is running:**
```bash
docker-compose ps ollama
```
3. **Restart services:**
```bash
docker-compose restart backend ollama
```
## Monitoring in Production
### Automated Checks
Add to your monitoring:
```bash
# Crontab entry: check GPU status every 5 minutes.
# Note: a crontab entry must be a single line (no backslash continuations).
*/5 * * * * curl -s http://localhost:5001/api/ollama/gpu-status | python3 -c "import json,sys; sys.exit(0 if json.load(sys.stdin).get('gpu_in_use') else 1)"
```
### Performance Alerts
Alert if performance degrades:
```bash
# Alert if response time > 20s
DURATION=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")
if (( $(echo "$DURATION > 20" | bc -l) )); then
  echo "ALERT: Ollama performance degraded: ${DURATION}s"
fi
```
## Example: Full Health Check
```bash
#!/bin/bash
# health-check.sh
echo "Checking Ollama Health..."

# 1. GPU Status
GPU=$(curl -s http://localhost:5001/api/ollama/gpu-status | \
  python3 -c "import json,sys; print('GPU' if json.load(sys.stdin).get('gpu_in_use') else 'CPU')")

# 2. Performance (numeric seconds, so bc can compare it below)
PERF=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

# 3. Models
MODELS=$(curl -s http://localhost:5001/api/ollama/models | \
  python3 -c "import json,sys; print(len(json.load(sys.stdin).get('models', [])))")

echo "Mode: $GPU"
echo "Performance: ${PERF}s"
echo "Models: $MODELS"

# Exit with error if running on CPU and slow
if [ "$GPU" = "CPU" ] && (( $(echo "$PERF > 20" | bc -l) )); then
  echo "WARNING: Running in CPU mode with slow performance"
  exit 1
fi
echo "✓ Health check passed"
```
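Make the script executable and run it:
```bash
chmod +x health-check.sh
./health-check.sh
```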
## Related Documentation
- [GPU_SETUP.md](GPU_SETUP.md) - GPU setup guide
- [OLLAMA_SETUP.md](OLLAMA_SETUP.md) - Ollama configuration
- [CHANGING_AI_MODEL.md](CHANGING_AI_MODEL.md) - Model switching guide