# How to Check GPU Status via API

## Quick Check

### 1. GPU Status

```bash
curl http://localhost:5001/api/ollama/gpu-status | python3 -m json.tool
```

**Response:**

```json
{
  "status": "success",
  "ollama_running": true,
  "gpu_available": true,
  "gpu_in_use": true,
  "gpu_details": {
    "model": "phi3:latest",
    "gpu_layers": 32,
    "size": 2300000000
  },
  "recommendation": "✓ GPU acceleration is active!"
}
```

### 2. Performance Test

```bash
curl http://localhost:5001/api/ollama/test | python3 -m json.tool
```

**Response:**

```json
{
  "status": "success",
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)",
  "model": "phi3:latest",
  "recommendation": "Performance is good"
}
```

### 3. List Models

```bash
curl http://localhost:5001/api/ollama/models | python3 -m json.tool
```

## Using the Check Script

We've created a convenient script:

```bash
./check-gpu-api.sh
```

**Output:**

```
==========================================
Ollama GPU Status Check
==========================================

1. GPU Status:
---
{
  "status": "success",
  "gpu_in_use": true,
  ...
}

2. Performance Test:
---
{
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)"
}

3. Available Models:
---
{
  "models": ["phi3:latest", "llama3:8b"]
}

==========================================
Quick Summary:
==========================================
GPU Status: GPU Active
Performance: 3.2s - Excellent (GPU likely active)
```

## API Endpoints

### GET /api/ollama/gpu-status

Check whether the GPU is being used by Ollama.

**Response Fields:**

- `gpu_available` - GPU hardware detected
- `gpu_in_use` - Ollama is actively using the GPU
- `gpu_details` - GPU configuration details
- `recommendation` - Setup suggestions

### GET /api/ollama/test

Test Ollama performance with a sample prompt.

**Response Fields:**

- `duration_seconds` - Time taken for the test
- `performance` - Performance rating
- `recommendation` - Performance suggestions

### GET /api/ollama/models

List all available models.

**Response Fields:**

- `models` - Array of model names
- `current_model` - Active model from `.env`

### GET /api/ollama/ping

Test basic Ollama connectivity.

### GET /api/ollama/config

View the current Ollama configuration.

## Interpreting Results

### GPU Status

**✅ GPU Active:**

```json
{
  "gpu_in_use": true,
  "gpu_available": true
}
```

- GPU acceleration is working
- Expect 5-10x faster processing

**❌ CPU Mode:**

```json
{
  "gpu_in_use": false,
  "gpu_available": false
}
```

- Running on CPU only
- Slower processing (15-30s per article)

### Performance Ratings

| Duration | Rating | Mode |
|----------|--------|------|
| < 5s | Excellent | GPU likely active |
| 5-15s | Good | GPU may be active |
| 15-30s | Fair | CPU mode |
| > 30s | Slow | CPU mode, GPU recommended |

## Troubleshooting

### GPU Not Detected

1. **Check that the GPU compose file is in use:**

   ```bash
   docker-compose ps
   # Should show the GPU configuration
   ```

2. **Verify the NVIDIA runtime:**

   ```bash
   docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
   ```

3. **Check the Ollama logs:**

   ```bash
   docker-compose logs ollama | grep -i gpu
   ```

### Slow Performance

If the performance test shows > 15s:

1. **Enable GPU acceleration:**

   ```bash
   docker-compose down
   docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
   ```

2. **Verify the GPU is available:**

   ```bash
   nvidia-smi
   ```

3. **Check the model size:**
   - Larger models are slower
   - Try `phi3:latest` for the fastest performance (a pull sketch follows this list)
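If a large model is the bottleneck, switching to a smaller one is often the quickest win. A minimal sketch, reusing the `ollama` service name and the `phi3:latest` model that appear elsewhere in this guide:

```bash
# Pull a smaller model inside the running Ollama container
docker-compose exec ollama ollama pull phi3:latest
```

After pulling, update the model setting in your `.env` (see [CHANGING_AI_MODEL.md](CHANGING_AI_MODEL.md)) and restart the backend so it picks up the change.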
### Connection Errors

If the API returns connection errors:

1. **Check that the backend is running:**

   ```bash
   docker-compose ps backend
   ```

2. **Check that Ollama is running:**

   ```bash
   docker-compose ps ollama
   ```

3. **Restart the services:**

   ```bash
   docker-compose restart backend ollama
   ```

## Monitoring in Production

### Automated Checks

Add this to your crontab (a cron entry must fit on a single line):

```bash
# Check GPU status every 5 minutes; exit non-zero if the GPU is not in use
*/5 * * * * curl -s http://localhost:5001/api/ollama/gpu-status | python3 -c "import json,sys; data=json.load(sys.stdin); sys.exit(0 if data.get('gpu_in_use') else 1)"
```

### Performance Alerts

Alert if performance degrades:

```bash
# Alert if response time > 20s
DURATION=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

if (( $(echo "$DURATION > 20" | bc -l) )); then
  echo "ALERT: Ollama performance degraded: ${DURATION}s"
fi
```

## Example: Full Health Check

```bash
#!/bin/bash
# health-check.sh

echo "Checking Ollama Health..."

# 1. GPU Status
GPU=$(curl -s http://localhost:5001/api/ollama/gpu-status | \
  python3 -c "import json,sys; print('GPU' if json.load(sys.stdin).get('gpu_in_use') else 'CPU')")

# 2. Performance (kept numeric so it can be compared with bc below)
PERF=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

# 3. Models
MODELS=$(curl -s http://localhost:5001/api/ollama/models | \
  python3 -c "import json,sys; print(len(json.load(sys.stdin).get('models', [])))")

echo "Mode: $GPU"
echo "Performance: ${PERF}s"
echo "Models: $MODELS"

# Exit with an error if running in CPU mode and slow
if [ "$GPU" = "CPU" ] && (( $(echo "$PERF > 20" | bc -l) )); then
  echo "WARNING: Running in CPU mode with slow performance"
  exit 1
fi

echo "✓ Health check passed"
```

To run this check on a schedule, see the cron sketch at the end of this page.

## Related Documentation

- [GPU_SETUP.md](GPU_SETUP.md) - GPU setup guide
- [OLLAMA_SETUP.md](OLLAMA_SETUP.md) - Ollama configuration
- [CHANGING_AI_MODEL.md](CHANGING_AI_MODEL.md) - Model switching guide
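Like the GPU status check in the monitoring section, `health-check.sh` can be run from cron. A minimal scheduling sketch; the script path and log file are placeholders, so adjust them to your environment:

```bash
# Run the full health check hourly; append output to a log
# (script path and log file below are placeholders)
0 * * * * /path/to/health-check.sh >> /var/log/ollama-health.log 2>&1
```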