# How to Check GPU Status via API
## Quick Check
### 1. GPU Status
```bash
curl http://localhost:5001/api/ollama/gpu-status | python3 -m json.tool
```
**Response:**
```json
{
  "status": "success",
  "ollama_running": true,
  "gpu_available": true,
  "gpu_in_use": true,
  "gpu_details": {
    "model": "phi3:latest",
    "gpu_layers": 32,
    "size": 2300000000
  },
  "recommendation": "✓ GPU acceleration is active!"
}
```
### 2. Performance Test
```bash
curl http://localhost:5001/api/ollama/test | python3 -m json.tool
```
**Response:**
```json
{
  "status": "success",
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)",
  "model": "phi3:latest",
  "recommendation": "Performance is good"
}
```
### 3. List Models
```bash
curl http://localhost:5001/api/ollama/models | python3 -m json.tool
```
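**Response** (illustrative values; the fields are described under **GET /api/ollama/models** below):
```json
{
  "status": "success",
  "models": ["phi3:latest", "llama3:8b"],
  "current_model": "phi3:latest"
}
```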
## Using the Check Script
We've created a convenient script:
```bash
./check-gpu-api.sh
```
**Output:**
```
==========================================
Ollama GPU Status Check
==========================================
1. GPU Status:
---
{
  "status": "success",
  "gpu_in_use": true,
  ...
}

2. Performance Test:
---
{
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)"
}

3. Available Models:
---
{
  "models": ["phi3:latest", "llama3:8b"]
}

==========================================
Quick Summary:
==========================================
GPU Status: GPU Active
Performance: 3.2s - Excellent (GPU likely active)
```
## API Endpoints
### GET /api/ollama/gpu-status
Check whether Ollama is currently using the GPU.
**Response Fields:**
- `gpu_available` - GPU hardware detected
- `gpu_in_use` - Ollama actively using GPU
- `gpu_details` - GPU configuration details
- `recommendation` - Setup suggestions
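For scripting, individual fields can be pulled out with the same `curl` + `python3` pattern used throughout this guide, for example:
```bash
# Print just the gpu_in_use flag (True/False)
curl -s http://localhost:5001/api/ollama/gpu-status | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('gpu_in_use'))"
```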
### GET /api/ollama/test
Test Ollama performance with a sample prompt.
**Response Fields:**
- `duration_seconds` - Time taken for the test, in seconds
- `performance` - Performance rating (see the table under Interpreting Results)
- `recommendation` - Performance suggestions
### GET /api/ollama/models
List all available models.
**Response Fields:**
- `models` - Array of model names
- `current_model` - Active model from .env
### GET /api/ollama/ping
Test basic Ollama connectivity.
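Same `curl` pattern as the other endpoints:
```bash
curl http://localhost:5001/api/ollama/ping | python3 -m json.tool
```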
### GET /api/ollama/config
View current Ollama configuration.
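For example:
```bash
curl http://localhost:5001/api/ollama/config | python3 -m json.tool
```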
## Interpreting Results
### GPU Status
**✅ GPU Active:**
```json
{
  "gpu_in_use": true,
  "gpu_available": true
}
```
- GPU acceleration is working
- Expect 5-10x faster processing
**❌ CPU Mode:**
```json
{
  "gpu_in_use": false,
  "gpu_available": false
}
```
- Running on CPU only
- Slower processing (15-30s per article)
### Performance Ratings
| Duration | Rating | Mode |
|----------|--------|------|
| < 5s | Excellent | GPU likely active |
| 5-15s | Good | GPU may be active |
| 15-30s | Fair | CPU mode |
| > 30s | Slow | CPU mode, GPU recommended |
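To see where a run lands in this table, pull the duration and rating out of the test response (same pattern as the health-check script below):
```bash
curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; d=json.load(sys.stdin); print(f\"{d.get('duration_seconds')}s - {d.get('performance')}\")"
```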
## Troubleshooting
### GPU Not Detected
1. **Check whether the stack was started with the GPU compose file:**
```bash
docker-compose ps
# Lists running services; GPU support requires starting the stack with
# docker-compose.gpu.yml (see "Slow Performance" below)
```
2. **Verify NVIDIA runtime:**
```bash
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```
3. **Check Ollama logs:**
```bash
docker-compose logs ollama | grep -i gpu
```
### Slow Performance
If the performance test takes longer than 15s:
1. **Enable GPU acceleration:**
```bash
docker-compose down
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```
2. **Verify GPU is available:**
```bash
nvidia-smi
```
3. **Check model size:**
- Larger models take longer per request
- Try `phi3:latest` for the fastest performance (pull command below)
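If `phi3:latest` isn't available yet, it can be pulled inside the Ollama container; this assumes the service is named `ollama`, as in the other `docker-compose` commands in this guide:
```bash
docker-compose exec ollama ollama pull phi3:latest
```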
### Connection Errors
If the API returns connection errors:
1. **Check backend is running:**
```bash
docker-compose ps backend
```
2. **Check Ollama is running:**
```bash
docker-compose ps ollama
```
3. **Restart services:**
```bash
docker-compose restart backend ollama
```
## Monitoring in Production
### Automated Checks
Add to your monitoring:
```bash
# Crontab entry: check GPU status every 5 minutes.
# Note: a crontab entry must be a single line (no backslash continuations).
*/5 * * * * curl -s http://localhost:5001/api/ollama/gpu-status | python3 -c "import json,sys; sys.exit(0 if json.load(sys.stdin).get('gpu_in_use') else 1)"
```
### Performance Alerts
Alert if performance degrades:
```bash
# Alert if response time > 20s
DURATION=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")
if (( $(echo "$DURATION > 20" | bc -l) )); then
  echo "ALERT: Ollama performance degraded: ${DURATION}s"
fi
```
## Example: Full Health Check
```bash
#!/bin/bash
# health-check.sh
echo "Checking Ollama Health..."

# 1. GPU Status
GPU=$(curl -s http://localhost:5001/api/ollama/gpu-status | \
  python3 -c "import json,sys; print('GPU' if json.load(sys.stdin).get('gpu_in_use') else 'CPU')")

# 2. Performance (numeric seconds, so bc can compare it below)
PERF=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

# 3. Models
MODELS=$(curl -s http://localhost:5001/api/ollama/models | \
  python3 -c "import json,sys; print(len(json.load(sys.stdin).get('models', [])))")

echo "Mode: $GPU"
echo "Performance: ${PERF}s"
echo "Models: $MODELS"

# Exit with error if running on CPU and slow
if [ "$GPU" = "CPU" ] && (( $(echo "$PERF > 20" | bc -l) )); then
  echo "WARNING: Running in CPU mode with slow performance"
  exit 1
fi
echo "✓ Health check passed"
```
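Make the script executable and run it:
```bash
chmod +x health-check.sh
./health-check.sh
```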
## Related Documentation
- [GPU_SETUP.md](GPU_SETUP.md) - GPU setup guide
- [OLLAMA_SETUP.md](OLLAMA_SETUP.md) - Ollama configuration
- [CHANGING_AI_MODEL.md](CHANGING_AI_MODEL.md) - Model switching guide