# How to Check GPU Status via API
## Quick Check

### 1. GPU Status

```bash
curl http://localhost:5001/api/ollama/gpu-status | python3 -m json.tool
```

Response:

```json
{
    "status": "success",
    "ollama_running": true,
    "gpu_available": true,
    "gpu_in_use": true,
    "gpu_details": {
        "model": "phi3:latest",
        "gpu_layers": 32,
        "size": 2300000000
    },
    "recommendation": "✓ GPU acceleration is active!"
}
```
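If you only need a single field from the response, `jq` (assuming it is installed) is a handy alternative to `python3 -m json.tool`:

```bash
# Print just the gpu_in_use flag (requires jq)
curl -s http://localhost:5001/api/ollama/gpu-status | jq '.gpu_in_use'
```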
### 2. Performance Test

```bash
curl http://localhost:5001/api/ollama/test | python3 -m json.tool
```

Response:

```json
{
    "status": "success",
    "duration_seconds": 3.2,
    "performance": "Excellent (GPU likely active)",
    "model": "phi3:latest",
    "recommendation": "Performance is good"
}
```
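A single run can fluctuate with load, so it can help to average a few runs. A minimal sketch, using the same endpoint and tools as above:

```bash
# Run the performance test three times and average the durations
total=0
for i in 1 2 3; do
  d=$(curl -s http://localhost:5001/api/ollama/test | \
    python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")
  echo "run $i: ${d}s"
  total=$(echo "$total + $d" | bc -l)
done
echo "average: $(echo "scale=1; $total / 3" | bc -l)s"
```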
### 3. List Models

```bash
curl http://localhost:5001/api/ollama/models | python3 -m json.tool
```
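The response also carries `current_model` (the active model from `.env`; see the endpoint reference below), so a quick sanity check that the configured model is actually installed might look like:

```bash
# Warn if the model configured in .env is not among the available models
curl -s http://localhost:5001/api/ollama/models | python3 -c "
import json, sys
data = json.load(sys.stdin)
current = data.get('current_model')
models = data.get('models', [])
print('OK' if current in models else f'WARNING: {current} not in {models}')
"
```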
## Using the Check Script

We've created a convenient script:

```bash
./check-gpu-api.sh
```
Output:

```text
==========================================
Ollama GPU Status Check
==========================================
1. GPU Status:
---
{
  "status": "success",
  "gpu_in_use": true,
  ...
}

2. Performance Test:
---
{
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)"
}

3. Available Models:
---
{
  "models": ["phi3:latest", "llama3:8b"]
}

==========================================
Quick Summary:
==========================================
GPU Status: GPU Active
Performance: 3.2s - Excellent (GPU likely active)
```
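If you need to recreate or adapt the script, a minimal sketch that produces output like the above (assuming only the three endpoints shown; the shipped script may differ) is:

```bash
#!/bin/bash
# check-gpu-api.sh - minimal sketch built on the documented endpoints
BASE=http://localhost:5001/api/ollama

echo "=========================================="
echo "Ollama GPU Status Check"
echo "=========================================="
echo "1. GPU Status:"
echo "---"
curl -s "$BASE/gpu-status" | python3 -m json.tool
echo "2. Performance Test:"
echo "---"
curl -s "$BASE/test" | python3 -m json.tool
echo "3. Available Models:"
echo "---"
curl -s "$BASE/models" | python3 -m json.tool
```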
## API Endpoints

### GET /api/ollama/gpu-status

Check whether the GPU is being used by Ollama.

Response Fields:

- `gpu_available` - GPU hardware detected
- `gpu_in_use` - Ollama actively using the GPU
- `gpu_details` - GPU configuration details
- `recommendation` - Setup suggestions
### GET /api/ollama/test

Test Ollama performance with a sample prompt.

Response Fields:

- `duration_seconds` - Time taken for the test
- `performance` - Performance rating
- `recommendation` - Performance suggestions
### GET /api/ollama/models

List all available models.

Response Fields:

- `models` - Array of model names
- `current_model` - Active model from `.env`
### GET /api/ollama/ping

Test basic Ollama connectivity.

### GET /api/ollama/config

View the current Ollama configuration.
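Both are queried the same way as the other endpoints:

```bash
curl http://localhost:5001/api/ollama/ping | python3 -m json.tool
curl http://localhost:5001/api/ollama/config | python3 -m json.tool
```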
## Interpreting Results

### GPU Status

✅ GPU Active:

```json
{
    "gpu_in_use": true,
    "gpu_available": true
}
```

- GPU acceleration is working
- Expect 5-10x faster processing

❌ CPU Mode:

```json
{
    "gpu_in_use": false,
    "gpu_available": false
}
```

- Running on CPU only
- Slower processing (15-30s per article)
### Performance Ratings
| Duration | Rating | Mode |
|---|---|---|
| < 5s | Excellent | GPU likely active |
| 5-15s | Good | GPU may be active |
| 15-30s | Fair | CPU mode |
| > 30s | Slow | CPU mode, GPU recommended |
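To apply the same thresholds in your own scripts, a small helper (a sketch; the thresholds are copied from the table above) could look like:

```bash
# Map a duration in seconds to the ratings from the table above
rate_performance() {
  local d=$1
  if   (( $(echo "$d < 5"  | bc -l) )); then echo "Excellent (GPU likely active)"
  elif (( $(echo "$d < 15" | bc -l) )); then echo "Good (GPU may be active)"
  elif (( $(echo "$d < 30" | bc -l) )); then echo "Fair (CPU mode)"
  else echo "Slow (CPU mode, GPU recommended)"
  fi
}

rate_performance 3.2   # -> Excellent (GPU likely active)
```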
## Troubleshooting

### GPU Not Detected

1. Check whether the GPU compose file is in use:

   ```bash
   docker-compose ps  # Should show GPU configuration
   ```

2. Verify the NVIDIA runtime:

   ```bash
   docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
   ```

3. Check the Ollama logs:

   ```bash
   docker-compose logs ollama | grep -i gpu
   ```
### Slow Performance

If the performance test shows > 15s:

1. Enable GPU acceleration:

   ```bash
   docker-compose down
   docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
   ```

2. Verify the GPU is available:

   ```bash
   nvidia-smi
   ```

3. Check the model size:

   - Larger models = slower
   - Try `phi3:latest` for the fastest performance (see the pull example below)
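If phi3 is not yet installed, it can be pulled through the Ollama container (assuming the service is named `ollama`, as in the compose commands above):

```bash
docker-compose exec ollama ollama pull phi3:latest
```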
### Connection Errors

If the API returns connection errors:

1. Check that the backend is running:

   ```bash
   docker-compose ps backend
   ```

2. Check that Ollama is running:

   ```bash
   docker-compose ps ollama
   ```

3. Restart the services, then confirm connectivity with the probe shown after this list:

   ```bash
   docker-compose restart backend ollama
   ```
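After a restart, a quick probe of the ping endpoint confirms the chain is back up:

```bash
# -f makes curl fail on HTTP errors, so the || branch also catches 5xx responses
curl -sf http://localhost:5001/api/ollama/ping > /dev/null \
  && echo "Ollama reachable" \
  || echo "ERROR: ping endpoint unreachable"
```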
## Monitoring in Production

### Automated Checks

Add to your monitoring. Note that a crontab entry must be a single line (cron does not support backslash continuations), so the command is not wrapped here:

```bash
# Check GPU status every 5 minutes; exits non-zero if the GPU is not in use
*/5 * * * * curl -s http://localhost:5001/api/ollama/gpu-status | python3 -c "import json,sys; sys.exit(0 if json.load(sys.stdin).get('gpu_in_use') else 1)"
```
### Performance Alerts

Alert if performance degrades:

```bash
# Alert if response time > 20s
DURATION=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")
if (( $(echo "$DURATION > 20" | bc -l) )); then
  echo "ALERT: Ollama performance degraded: ${DURATION}s"
fi
```
## Example: Full Health Check

```bash
#!/bin/bash
# health-check.sh

echo "Checking Ollama Health..."

# 1. GPU Status
GPU=$(curl -s http://localhost:5001/api/ollama/gpu-status | \
  python3 -c "import json,sys; print('GPU' if json.load(sys.stdin).get('gpu_in_use') else 'CPU')")

# 2. Performance (kept as a bare number so it can be compared with bc below)
PERF=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

# 3. Models
MODELS=$(curl -s http://localhost:5001/api/ollama/models | \
  python3 -c "import json,sys; print(len(json.load(sys.stdin).get('models', [])))")

echo "Mode: $GPU"
echo "Performance: ${PERF}s"
echo "Models: $MODELS"

# Exit with error if CPU mode and slow
if [ "$GPU" = "CPU" ] && (( $(echo "$PERF > 20" | bc -l) )); then
  echo "WARNING: Running in CPU mode with slow performance"
  exit 1
fi

echo "✓ Health check passed"
```
## Related Documentation
- GPU_SETUP.md - GPU setup guide
- OLLAMA_SETUP.md - Ollama configuration
- CHANGING_AI_MODEL.md - Model switching guide