update
@@ -15,6 +15,21 @@ OLLAMA_MODEL=phi3:latest

## ✅ How to Change the Model

### Important Note

✅ **The model IS automatically checked and downloaded on startup**

The `ollama-setup` service runs on every `docker-compose up` and:

- Checks if the model specified in `.env` exists
- Downloads it if missing
- Skips download if already present
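
For illustration, here is a minimal sketch of what such a startup check can look like (the `ollama` hostname, the port, and the script itself are assumptions for this example, not the project's actual entrypoint):

```bash
#!/bin/sh
# Hypothetical sketch of an ollama-setup check (not the project's actual script).
# Assumes the Ollama container is reachable as "ollama" on port 11434.
MODEL="${OLLAMA_MODEL:-phi3:latest}"

# Wait until the Ollama API answers
until curl -sf http://ollama:11434/api/tags > /dev/null; do
  sleep 2
done

# Pull only if the model is missing from the tag list
if curl -s http://ollama:11434/api/tags | grep -q "\"name\":\"$MODEL\""; then
  echo "$MODEL already present, skipping download"
else
  echo "Pulling $MODEL ..."
  curl -s -X POST http://ollama:11434/api/pull -d "{\"name\":\"$MODEL\"}"
fi
```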

This means you can simply:

1. Change `OLLAMA_MODEL` in `.env`
2. Run `docker-compose up -d`
3. Wait for download (if needed)
4. Done!
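
Or, end to end in the shell (a sketch; the `sed` pattern assumes `OLLAMA_MODEL` sits on its own line in `backend/.env`, and `llama3:8b` is just an example value):

```bash
# Point .env at the new model
sed -i 's/^OLLAMA_MODEL=.*/OLLAMA_MODEL=llama3:8b/' backend/.env

# Recreate the services; ollama-setup pulls the model only if it is missing
docker-compose up -d

# Watch the check/download
docker-compose logs -f ollama-setup
```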

### Step 1: Update .env File

Edit `backend/.env` and change the `OLLAMA_MODEL` value:

@@ -30,22 +45,38 @@ OLLAMA_MODEL=mistral:7b

```
OLLAMA_MODEL=your-custom-model:latest
```

### Step 2: Restart Services (Model Auto-Downloads)

**Option A: Simple restart (Recommended)**

```bash
# Stop services
docker-compose down

# Restart all services
docker-compose up -d

# Watch the model check/download
docker-compose logs -f ollama-setup
```

The `ollama-setup` service will:

- Check if the new model exists
- Download it if missing (2-10 minutes)
- Skip download if already present

**Option B: Manual pull (if you want control)**

```bash
# Pull the model manually first
./pull-ollama-model.sh

# Then restart
docker-compose restart crawler backend
```

**Option C: Full restart**

```bash
docker-compose down
docker-compose up -d
```

**Note:** Model download takes 2-10 minutes depending on model size.

## Supported Models

@@ -264,3 +295,68 @@ A: 5-10GB for small models, 50GB+ for large models. Plan accordingly.

- [OLLAMA_SETUP.md](OLLAMA_SETUP.md) - Ollama installation & configuration
- [GPU_SETUP.md](GPU_SETUP.md) - GPU acceleration setup
- [AI_NEWS_AGGREGATION.md](AI_NEWS_AGGREGATION.md) - AI features overview

## Complete Example: Changing from phi3 to llama3

```bash
# 1. Check current model
curl -s http://localhost:5001/api/ollama/models | python3 -m json.tool
# Shows: "current_model": "phi3:latest"

# 2. Update .env file
# Edit backend/.env and change:
# OLLAMA_MODEL=llama3:8b

# 3. Pull the new model
./pull-ollama-model.sh
# Or manually: docker-compose exec ollama ollama pull llama3:8b

# 4. Restart services
docker-compose restart crawler backend

# 5. Verify the change
curl -s http://localhost:5001/api/ollama/models | python3 -m json.tool
# Shows: "current_model": "llama3:8b"

# 6. Test performance
curl -s http://localhost:5001/api/ollama/test | python3 -m json.tool
# Should show improved quality with llama3
```

## Quick Reference

### Change Model Workflow

```bash
# 1. Edit .env
vim backend/.env  # Change OLLAMA_MODEL

# 2. Pull model
./pull-ollama-model.sh

# 3. Restart
docker-compose restart crawler backend

# 4. Verify
curl http://localhost:5001/api/ollama/test
```

### Common Commands

```bash
# List downloaded models
docker-compose exec ollama ollama list

# Pull a specific model
docker-compose exec ollama ollama pull mistral:7b

# Remove a model
docker-compose exec ollama ollama rm phi3:latest

# Check current config
curl http://localhost:5001/api/ollama/config

# Test performance
curl http://localhost:5001/api/ollama/test
```

276 docs/CHECK_GPU_STATUS.md Normal file

@@ -0,0 +1,276 @@

# How to Check GPU Status via API

## Quick Check

### 1. GPU Status

```bash
curl http://localhost:5001/api/ollama/gpu-status | python3 -m json.tool
```

**Response:**

```json
{
  "status": "success",
  "ollama_running": true,
  "gpu_available": true,
  "gpu_in_use": true,
  "gpu_details": {
    "model": "phi3:latest",
    "gpu_layers": 32,
    "size": 2300000000
  },
  "recommendation": "✓ GPU acceleration is active!"
}
```

### 2. Performance Test

```bash
curl http://localhost:5001/api/ollama/test | python3 -m json.tool
```

**Response:**

```json
{
  "status": "success",
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)",
  "model": "phi3:latest",
  "recommendation": "Performance is good"
}
```

### 3. List Models

```bash
curl http://localhost:5001/api/ollama/models | python3 -m json.tool
```

## Using the Check Script

We've created a convenient script:

```bash
./check-gpu-api.sh
```

**Output:**

```
==========================================
Ollama GPU Status Check
==========================================

1. GPU Status:
---
{
  "status": "success",
  "gpu_in_use": true,
  ...
}

2. Performance Test:
---
{
  "duration_seconds": 3.2,
  "performance": "Excellent (GPU likely active)"
}

3. Available Models:
---
{
  "models": ["phi3:latest", "llama3:8b"]
}

==========================================
Quick Summary:
==========================================
GPU Status: GPU Active
Performance: 3.2s - Excellent (GPU likely active)
```

## API Endpoints

### GET /api/ollama/gpu-status

Check if GPU is being used by Ollama.

**Response Fields:**

- `gpu_available` - GPU hardware detected
- `gpu_in_use` - Ollama actively using GPU
- `gpu_details` - GPU configuration details
- `recommendation` - Setup suggestions

### GET /api/ollama/test

Test Ollama performance with a sample prompt.

**Response Fields:**

- `duration_seconds` - Time taken for test
- `performance` - Performance rating
- `recommendation` - Performance suggestions

### GET /api/ollama/models

List all available models.

**Response Fields:**

- `models` - Array of model names
- `current_model` - Active model from `.env`

### GET /api/ollama/ping

Test basic Ollama connectivity.

### GET /api/ollama/config

View current Ollama configuration.
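
To exercise all five endpoints in one pass, a simple loop works (a sketch, assuming the backend is reachable at `localhost:5001` as in the examples above):

```bash
# Query every status endpoint and pretty-print each response
for endpoint in ping config models gpu-status test; do
  echo "=== /api/ollama/$endpoint ==="
  curl -s "http://localhost:5001/api/ollama/$endpoint" | python3 -m json.tool
done
```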

## Interpreting Results

### GPU Status

**✅ GPU Active:**

```json
{
  "gpu_in_use": true,
  "gpu_available": true
}
```

- GPU acceleration is working
- Expect 5-10x faster processing

**❌ CPU Mode:**

```json
{
  "gpu_in_use": false,
  "gpu_available": false
}
```

- Running on CPU only
- Slower processing (15-30s per article)

### Performance Ratings

| Duration | Rating | Mode |
|----------|--------|------|
| < 5s | Excellent | GPU likely active |
| 5-15s | Good | GPU may be active |
| 15-30s | Fair | CPU mode |
| > 30s | Slow | CPU mode, GPU recommended |
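
The same thresholds can be applied from the shell, for example (a sketch built on the `duration_seconds` field shown above; assumes `bc` is installed):

```bash
# Bucket duration_seconds using the thresholds from the table above
D=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

if (( $(echo "$D < 5" | bc -l) )); then
  echo "${D}s - Excellent (GPU likely active)"
elif (( $(echo "$D < 15" | bc -l) )); then
  echo "${D}s - Good (GPU may be active)"
elif (( $(echo "$D < 30" | bc -l) )); then
  echo "${D}s - Fair (CPU mode)"
else
  echo "${D}s - Slow (CPU mode, GPU recommended)"
fi
```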

## Troubleshooting

### GPU Not Detected

1. **Check if the GPU compose file is used:**
   ```bash
   docker-compose ps
   # Should show GPU configuration
   ```

2. **Verify NVIDIA runtime:**
   ```bash
   docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
   ```

3. **Check Ollama logs:**
   ```bash
   docker-compose logs ollama | grep -i gpu
   ```

### Slow Performance

If the performance test shows > 15s:

1. **Enable GPU acceleration:**
   ```bash
   docker-compose down
   docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
   ```

2. **Verify GPU is available:**
   ```bash
   nvidia-smi
   ```

3. **Check model size** (see the command below):
   - Larger models = slower
   - Try `phi3:latest` for fastest performance
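
Model sizes are easy to compare: the `ollama list` output includes a SIZE column.

```bash
# The SIZE column shows how much disk space each downloaded model needs
docker-compose exec ollama ollama list
```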

### Connection Errors

If the API returns connection errors:

1. **Check backend is running:**
   ```bash
   docker-compose ps backend
   ```

2. **Check Ollama is running:**
   ```bash
   docker-compose ps ollama
   ```

3. **Restart services:**
   ```bash
   docker-compose restart backend ollama
   ```

## Monitoring in Production

### Automated Checks

Add to your monitoring (note that a crontab entry must stay on a single line):

```bash
# Check GPU status every 5 minutes; exits non-zero when the GPU is not in use
*/5 * * * * curl -s http://localhost:5001/api/ollama/gpu-status | python3 -c "import json,sys; sys.exit(0 if json.load(sys.stdin).get('gpu_in_use') else 1)"
```

### Performance Alerts

Alert if performance degrades:

```bash
# Alert if response time > 20s
DURATION=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

if (( $(echo "$DURATION > 20" | bc -l) )); then
  echo "ALERT: Ollama performance degraded: ${DURATION}s"
fi
```

## Example: Full Health Check

```bash
#!/bin/bash
# health-check.sh

echo "Checking Ollama Health..."

# 1. GPU status
GPU=$(curl -s http://localhost:5001/api/ollama/gpu-status | \
  python3 -c "import json,sys; print('GPU' if json.load(sys.stdin).get('gpu_in_use') else 'CPU')")

# 2. Performance (keep the raw number so bc can compare it below)
PERF=$(curl -s http://localhost:5001/api/ollama/test | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('duration_seconds', 999))")

# 3. Models
MODELS=$(curl -s http://localhost:5001/api/ollama/models | \
  python3 -c "import json,sys; print(len(json.load(sys.stdin).get('models', [])))")

echo "Mode: $GPU"
echo "Performance: ${PERF}s"
echo "Models: $MODELS"

# Exit with an error if running in CPU mode and slow
if [ "$GPU" = "CPU" ] && (( $(echo "$PERF > 20" | bc -l) )); then
  echo "WARNING: Running in CPU mode with slow performance"
  exit 1
fi

echo "✓ Health check passed"
```

## Related Documentation

- [GPU_SETUP.md](GPU_SETUP.md) - GPU setup guide
- [OLLAMA_SETUP.md](OLLAMA_SETUP.md) - Ollama configuration
- [CHANGING_AI_MODEL.md](CHANGING_AI_MODEL.md) - Model switching guide