
How to Check GPU Status via API

Quick Check

1. GPU Status

curl http://localhost:5001/api/ollama/gpu-status | python3 -m json.tool

Response:

{
  "status": "success",
  "ollama_running": true,
  "gpu_available": true,
  "gpu_in_use": true,
  "gpu_details": {
    "model": "phi3:latest",
    "gpu_layers": 32,
    "size": 2300000000
  },
  "recommendation": "✓ GPU acceleration is active!"
}
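In a script, the useful field here is gpu_in_use. A minimal sketch of turning it into an exit status (using a trimmed sample of the payload above — in practice, pipe in the output of curl -s http://localhost:5001/api/ollama/gpu-status):

```shell
# Sample payload from above; in practice pipe in:
#   curl -s http://localhost:5001/api/ollama/gpu-status
RESPONSE='{"status":"success","gpu_available":true,"gpu_in_use":true}'

# Exit status 0 when the GPU is in use, 1 otherwise
if echo "$RESPONSE" | python3 -c "import json,sys; sys.exit(0 if json.load(sys.stdin).get('gpu_in_use') else 1)"; then
  MODE="GPU active"
else
  MODE="CPU mode"
fi
echo "$MODE"
```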

2. Performance Test

curl http://localhost:5001/api/ollama/test | python3 -m json.tool

Response:

{
  "status": "success",
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)",
  "model": "phi3:latest",
  "recommendation": "Performance is good"
}
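The duration_seconds field is the one worth extracting for automation. A sketch (again using a sample of the payload above rather than a live call):

```shell
# Sample payload from above; in practice pipe in the output of
#   curl -s http://localhost:5001/api/ollama/test
RESPONSE='{"status":"success","duration_seconds":3.2,"performance":"Excellent (GPU likely active)"}'

DURATION=$(echo "$RESPONSE" | python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")
echo "Test took ${DURATION}s"
```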

3. List Models

curl http://localhost:5001/api/ollama/models | python3 -m json.tool

Using the Check Script

We've created a convenient script:

./check-gpu-api.sh

Output:

==========================================
Ollama GPU Status Check
==========================================

1. GPU Status:
---
{
  "status": "success",
  "gpu_in_use": true,
  ...
}

2. Performance Test:
---
{
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)"
}

3. Available Models:
---
{
  "models": ["phi3:latest", "llama3:8b"]
}

==========================================
Quick Summary:
==========================================
GPU Status: GPU Active
Performance: 3.2s - Excellent (GPU likely active)

API Endpoints

GET /api/ollama/gpu-status

Check if GPU is being used by Ollama.

Response Fields:

  • gpu_available - GPU hardware detected
  • gpu_in_use - Ollama actively using GPU
  • gpu_details - GPU configuration details
  • recommendation - Setup suggestions

GET /api/ollama/test

Test Ollama performance with a sample prompt.

Response Fields:

  • duration_seconds - Time taken for test
  • performance - Performance rating
  • recommendation - Performance suggestions

GET /api/ollama/models

List all available models.

Response Fields:

  • models - Array of model names
  • current_model - Active model from .env
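A sketch of reading both fields from the response (sample payload assumed; pipe in the real curl output instead):

```shell
# Sample payload; in practice pipe in the output of
#   curl -s http://localhost:5001/api/ollama/models
RESPONSE='{"models": ["phi3:latest", "llama3:8b"], "current_model": "phi3:latest"}'

echo "$RESPONSE" | python3 -c "
import json, sys
data = json.load(sys.stdin)
print('current model:', data.get('current_model'))
print('models available:', len(data.get('models', [])))
"
```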

GET /api/ollama/ping

Test basic Ollama connectivity.

GET /api/ollama/config

View current Ollama configuration.
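Both endpoints can be exercised the same way as the others. A sketch that checks each in turn and degrades gracefully when the backend is down (host and port as used throughout this document):

```shell
# Quick connectivity check for both endpoints
for ep in ping config; do
  echo "--- /api/ollama/$ep ---"
  curl -sf --max-time 5 "http://localhost:5001/api/ollama/$ep" \
    | python3 -m json.tool 2>/dev/null \
    || echo "(endpoint unreachable)"
done
```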

Interpreting Results

GPU Status

GPU Active:

{
  "gpu_in_use": true,
  "gpu_available": true
}
  • GPU acceleration is working
  • Expect 5-10x faster processing

CPU Mode:

{
  "gpu_in_use": false,
  "gpu_available": false
}
  • Running on CPU only
  • Slower processing (15-30s per article)

Performance Ratings

Duration    Rating      Mode
< 5s        Excellent   GPU likely active
5-15s       Good        GPU may be active
15-30s      Fair        CPU mode
> 30s       Slow        CPU mode, GPU recommended
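The thresholds above are easy to apply in a script. A sketch of a small helper that maps a duration to its rating (thresholds taken from the table; the function name is illustrative):

```shell
# Classify a test duration per the table above
rate_duration() {
  python3 - "$1" <<'EOF'
import sys

d = float(sys.argv[1])
if d < 5:
    print("Excellent (GPU likely active)")
elif d <= 15:
    print("Good (GPU may be active)")
elif d <= 30:
    print("Fair (CPU mode)")
else:
    print("Slow (CPU mode, GPU recommended)")
EOF
}

rate_duration 3.2   # Excellent (GPU likely active)
```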

Troubleshooting

GPU Not Detected

  1. Check that the services were started with the GPU compose override:

    docker-compose ps
    # Confirm the ollama service is up; it must have been started
    # with docker-compose.gpu.yml for GPU access
    
  2. Verify NVIDIA runtime:

    docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
    
  3. Check Ollama logs:

    docker-compose logs ollama | grep -i gpu
    

Slow Performance

If performance test shows > 15s:

  1. Enable GPU acceleration:

    docker-compose down
    docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
    
  2. Verify GPU is available:

    nvidia-smi
    
  3. Check model size:

    • Larger models = slower
    • Try phi3:latest for fastest performance

Connection Errors

If API returns connection errors:

  1. Check backend is running:

    docker-compose ps backend
    
  2. Check Ollama is running:

    docker-compose ps ollama
    
  3. Restart services:

    docker-compose restart backend ollama
    

Monitoring in Production

Automated Checks

Add to your monitoring:

# Check GPU status every 5 minutes (a crontab entry must fit on a single line)
*/5 * * * * curl -s http://localhost:5001/api/ollama/gpu-status | python3 -c "import json,sys; sys.exit(0 if json.load(sys.stdin).get('gpu_in_use') else 1)"

Performance Alerts

Alert if performance degrades:

# Alert if response time > 20s
DURATION=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

if (( $(echo "$DURATION > 20" | bc -l) )); then
  echo "ALERT: Ollama performance degraded: ${DURATION}s"
fi

Example: Full Health Check

#!/bin/bash
# health-check.sh

echo "Checking Ollama Health..."

# 1. GPU Status
GPU=$(curl -s http://localhost:5001/api/ollama/gpu-status | \
  python3 -c "import json,sys; print('GPU' if json.load(sys.stdin).get('gpu_in_use') else 'CPU')")

# 2. Performance (keep the value numeric so it can be compared below)
PERF=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

# 3. Models
MODELS=$(curl -s http://localhost:5001/api/ollama/models | \
  python3 -c "import json,sys; print(len(json.load(sys.stdin).get('models', [])))")

echo "Mode: $GPU"
echo "Performance: ${PERF}s"
echo "Models: $MODELS"

# Exit with error if CPU mode and slow
if [ "$GPU" = "CPU" ] && (( $(echo "$PERF > 20" | bc -l) )); then
  echo "WARNING: Running in CPU mode with slow performance"
  exit 1
fi

echo "✓ Health check passed"