
How to Check GPU Status via API

Quick Check

1. GPU Status

curl http://localhost:5001/api/ollama/gpu-status | python3 -m json.tool

Response:

{
  "status": "success",
  "ollama_running": true,
  "gpu_available": true,
  "gpu_in_use": true,
  "gpu_details": {
    "model": "phi3:latest",
    "gpu_layers": 32,
    "size": 2300000000
  },
  "recommendation": "✓ GPU acceleration is active!"
}
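In a script, the useful field here is gpu_in_use. A minimal sketch of turning it into an exit status (using a trimmed sample of the payload above — in practice, pipe in the output of curl -s http://localhost:5001/api/ollama/gpu-status):

```shell
# Sample payload from above; in practice pipe in:
#   curl -s http://localhost:5001/api/ollama/gpu-status
RESPONSE='{"status":"success","gpu_available":true,"gpu_in_use":true}'

# Exit status 0 when the GPU is in use, 1 otherwise
if echo "$RESPONSE" | python3 -c "import json,sys; sys.exit(0 if json.load(sys.stdin).get('gpu_in_use') else 1)"; then
  MODE="GPU active"
else
  MODE="CPU mode"
fi
echo "$MODE"
```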

2. Performance Test

curl http://localhost:5001/api/ollama/test | python3 -m json.tool

Response:

{
  "status": "success",
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)",
  "model": "phi3:latest",
  "recommendation": "Performance is good"
}
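The duration_seconds field is the one worth extracting for automation. A sketch (again using a sample of the payload above rather than a live call):

```shell
# Sample payload from above; in practice pipe in the output of
#   curl -s http://localhost:5001/api/ollama/test
RESPONSE='{"status":"success","duration_seconds":3.2,"performance":"Excellent (GPU likely active)"}'

DURATION=$(echo "$RESPONSE" | python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")
echo "Test took ${DURATION}s"
```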

3. List Models

curl http://localhost:5001/api/ollama/models | python3 -m json.tool

Using the Check Script

We've created a convenient script:

./check-gpu-api.sh

Output:

==========================================
Ollama GPU Status Check
==========================================

1. GPU Status:
---
{
  "status": "success",
  "gpu_in_use": true,
  ...
}

2. Performance Test:
---
{
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)"
}

3. Available Models:
---
{
  "models": ["phi3:latest", "llama3:8b"]
}

==========================================
Quick Summary:
==========================================
GPU Status: GPU Active
Performance: 3.2s - Excellent (GPU likely active)

API Endpoints

GET /api/ollama/gpu-status

Check if GPU is being used by Ollama.

Response Fields:

  • gpu_available - GPU hardware detected
  • gpu_in_use - Ollama actively using GPU
  • gpu_details - GPU configuration details
  • recommendation - Setup suggestions

GET /api/ollama/test

Test Ollama performance with a sample prompt.

Response Fields:

  • duration_seconds - Time taken for test
  • performance - Performance rating
  • recommendation - Performance suggestions

GET /api/ollama/models

List all available models.

Response Fields:

  • models - Array of model names
  • current_model - Active model from .env
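A sketch of reading both fields from the response (sample payload assumed; pipe in the real curl output instead):

```shell
# Sample payload; in practice pipe in the output of
#   curl -s http://localhost:5001/api/ollama/models
RESPONSE='{"models": ["phi3:latest", "llama3:8b"], "current_model": "phi3:latest"}'

echo "$RESPONSE" | python3 -c "
import json, sys
data = json.load(sys.stdin)
print('current model:', data.get('current_model'))
print('models available:', len(data.get('models', [])))
"
```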

GET /api/ollama/ping

Test basic Ollama connectivity.

GET /api/ollama/config

View current Ollama configuration.
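Both endpoints can be exercised the same way as the others. A sketch that checks each in turn and degrades gracefully when the backend is down (host and port as used throughout this document):

```shell
# Quick connectivity check for both endpoints
for ep in ping config; do
  echo "--- /api/ollama/$ep ---"
  curl -sf --max-time 5 "http://localhost:5001/api/ollama/$ep" \
    | python3 -m json.tool 2>/dev/null \
    || echo "(endpoint unreachable)"
done
```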

Interpreting Results

GPU Status

GPU Active:

{
  "gpu_in_use": true,
  "gpu_available": true
}
  • GPU acceleration is working
  • Expect 5-10x faster processing

CPU Mode:

{
  "gpu_in_use": false,
  "gpu_available": false
}
  • Running on CPU only
  • Slower processing (15-30s per article)

Performance Ratings

Duration    Rating      Mode
< 5s        Excellent   GPU likely active
5-15s       Good        GPU may be active
15-30s      Fair        CPU mode
> 30s       Slow        CPU mode, GPU recommended
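The thresholds above are easy to apply in a script. A sketch of a small helper that maps a duration to its rating (thresholds taken from the table; the function name is illustrative):

```shell
# Classify a test duration per the table above
rate_duration() {
  python3 - "$1" <<'EOF'
import sys

d = float(sys.argv[1])
if d < 5:
    print("Excellent (GPU likely active)")
elif d <= 15:
    print("Good (GPU may be active)")
elif d <= 30:
    print("Fair (CPU mode)")
else:
    print("Slow (CPU mode, GPU recommended)")
EOF
}

rate_duration 3.2   # Excellent (GPU likely active)
```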

Troubleshooting

GPU Not Detected

  1. Check that the services were started with the GPU compose override:

    docker-compose ps
    # Confirm the ollama service is up; it must have been started
    # with docker-compose.gpu.yml for GPU access
    
  2. Verify NVIDIA runtime:

    docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
    
  3. Check Ollama logs:

    docker-compose logs ollama | grep -i gpu
    

Slow Performance

If performance test shows > 15s:

  1. Enable GPU acceleration:

    docker-compose down
    docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
    
  2. Verify GPU is available:

    nvidia-smi
    
  3. Check model size:

    • Larger models = slower
    • Try phi3:latest for fastest performance

Connection Errors

If API returns connection errors:

  1. Check backend is running:

    docker-compose ps backend
    
  2. Check Ollama is running:

    docker-compose ps ollama
    
  3. Restart services:

    docker-compose restart backend ollama
    

Monitoring in Production

Automated Checks

Add to your monitoring:

# Check GPU status every 5 minutes (a crontab entry must fit on a single line)
*/5 * * * * curl -s http://localhost:5001/api/ollama/gpu-status | python3 -c "import json,sys; sys.exit(0 if json.load(sys.stdin).get('gpu_in_use') else 1)"

Performance Alerts

Alert if performance degrades:

# Alert if response time > 20s
DURATION=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

if (( $(echo "$DURATION > 20" | bc -l) )); then
  echo "ALERT: Ollama performance degraded: ${DURATION}s"
fi

Example: Full Health Check

#!/bin/bash
# health-check.sh

echo "Checking Ollama Health..."

# 1. GPU Status
GPU=$(curl -s http://localhost:5001/api/ollama/gpu-status | \
  python3 -c "import json,sys; print('GPU' if json.load(sys.stdin).get('gpu_in_use') else 'CPU')")

# 2. Performance (keep the value numeric so it can be compared below)
PERF=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

# 3. Models
MODELS=$(curl -s http://localhost:5001/api/ollama/models | \
  python3 -c "import json,sys; print(len(json.load(sys.stdin).get('models', [])))")

echo "Mode: $GPU"
echo "Performance: ${PERF}s"
echo "Models: $MODELS"

# Exit with error if CPU mode and slow
if [ "$GPU" = "CPU" ] && (( $(echo "$PERF > 20" | bc -l) )); then
  echo "WARNING: Running in CPU mode with slow performance"
  exit 1
fi

echo "✓ Health check passed"