update
This commit is contained in:
@@ -1,243 +0,0 @@
|
|||||||
# ✅ Final Clean Project Structure
|
|
||||||
|
|
||||||
## 🎉 Cleanup Complete!
|
|
||||||
|
|
||||||
Your Munich News Daily project is now clean, organized, and professional.
|
|
||||||
|
|
||||||
## 📁 Current Structure
|
|
||||||
|
|
||||||
```
|
|
||||||
munich-news/
|
|
||||||
├── 📄 Root Files (5 essential files)
|
|
||||||
│ ├── README.md # Main documentation
|
|
||||||
│ ├── QUICKSTART.md # 5-minute setup guide
|
|
||||||
│ ├── CONTRIBUTING.md # Contribution guidelines
|
|
||||||
│ ├── PROJECT_STRUCTURE.md # Project layout
|
|
||||||
│ └── docker-compose.yml # Single unified compose file
|
|
||||||
│
|
|
||||||
├── 📚 docs/ (12 documentation files)
|
|
||||||
│ ├── API.md # API reference
|
|
||||||
│ ├── ARCHITECTURE.md # System architecture
|
|
||||||
│ ├── BACKEND_STRUCTURE.md # Backend organization
|
|
||||||
│ ├── CRAWLER_HOW_IT_WORKS.md # Crawler internals
|
|
||||||
│ ├── DATABASE_SCHEMA.md # Database structure
|
|
||||||
│ ├── DEPLOYMENT.md # Deployment guide
|
|
||||||
│ ├── EXTRACTION_STRATEGIES.md # Content extraction
|
|
||||||
│ └── RSS_URL_EXTRACTION.md # RSS parsing
|
|
||||||
│
|
|
||||||
├── 🧪 tests/ (10 test files)
|
|
||||||
│ ├── backend/ # Backend tests
|
|
||||||
│ ├── crawler/ # Crawler tests
|
|
||||||
│ └── sender/ # Sender tests
|
|
||||||
│
|
|
||||||
├── 🔧 backend/ # Backend API
|
|
||||||
│ ├── routes/
|
|
||||||
│ ├── services/
|
|
||||||
│ ├── .env.example
|
|
||||||
│ └── app.py
|
|
||||||
│
|
|
||||||
├── 📰 news_crawler/ # Crawler service
|
|
||||||
│ ├── Dockerfile
|
|
||||||
│ ├── crawler_service.py
|
|
||||||
│ ├── scheduled_crawler.py
|
|
||||||
│ └── requirements.txt
|
|
||||||
│
|
|
||||||
├── 📧 news_sender/ # Sender service
|
|
||||||
│ ├── Dockerfile
|
|
||||||
│ ├── sender_service.py
|
|
||||||
│ ├── scheduled_sender.py
|
|
||||||
│ └── requirements.txt
|
|
||||||
│
|
|
||||||
└── 🎨 frontend/ # React dashboard (optional)
|
|
||||||
```
|
|
||||||
|
|
||||||
## ✨ What Was Cleaned
|
|
||||||
|
|
||||||
### Removed Files (20+)
|
|
||||||
- ❌ All redundant markdown files from root
|
|
||||||
- ❌ All redundant markdown files from subdirectories
|
|
||||||
- ❌ Multiple docker-compose files (kept only 1)
|
|
||||||
- ❌ Multiple startup scripts (use docker-compose now)
|
|
||||||
- ❌ Test scripts and helpers
|
|
||||||
|
|
||||||
### Organized Files
|
|
||||||
- ✅ All tests → `tests/` directory
|
|
||||||
- ✅ All documentation → `docs/` directory
|
|
||||||
- ✅ All docker configs → single `docker-compose.yml`
|
|
||||||
|
|
||||||
## 🚀 How to Use
|
|
||||||
|
|
||||||
### Start Everything
|
|
||||||
```bash
|
|
||||||
docker-compose up -d
|
|
||||||
```
|
|
||||||
|
|
||||||
That's it! One command starts:
|
|
||||||
- MongoDB database
|
|
||||||
- News crawler (6 AM schedule)
|
|
||||||
- Newsletter sender (7 AM schedule)
|
|
||||||
|
|
||||||
### View Logs
|
|
||||||
```bash
|
|
||||||
docker-compose logs -f
|
|
||||||
```
|
|
||||||
|
|
||||||
### Stop Everything
|
|
||||||
```bash
|
|
||||||
docker-compose down
|
|
||||||
```
|
|
||||||
|
|
||||||
## 📊 Before vs After
|
|
||||||
|
|
||||||
### Before
|
|
||||||
```
|
|
||||||
Root: 20+ files (messy)
|
|
||||||
├── AUTOMATION_README.md
|
|
||||||
├── AUTOMATION_SETUP_COMPLETE.md
|
|
||||||
├── CRAWLER_QUICKSTART.md
|
|
||||||
├── CRAWLER_SETUP_SUMMARY.md
|
|
||||||
├── docker-compose.yml
|
|
||||||
├── docker-compose.prod.yml
|
|
||||||
├── README_CRAWLER.md
|
|
||||||
├── start-automation.sh
|
|
||||||
├── start-crawler.sh
|
|
||||||
├── start-sender.sh
|
|
||||||
├── test-crawler-setup.sh
|
|
||||||
└── ... many more
|
|
||||||
|
|
||||||
Subdirectories: Scattered docs
|
|
||||||
├── backend/TRACKING_README.md
|
|
||||||
├── backend/TRACKING_CONFIGURATION.md
|
|
||||||
├── news_crawler/README.md
|
|
||||||
├── news_crawler/QUICKSTART.md
|
|
||||||
├── news_crawler/docker-compose.yml
|
|
||||||
├── news_sender/README.md
|
|
||||||
└── ... more scattered files
|
|
||||||
|
|
||||||
Tests: Scattered everywhere
|
|
||||||
```
|
|
||||||
|
|
||||||
### After
|
|
||||||
```
|
|
||||||
Root: 5 essential files (clean)
|
|
||||||
├── README.md
|
|
||||||
├── QUICKSTART.md
|
|
||||||
├── CONTRIBUTING.md
|
|
||||||
├── PROJECT_STRUCTURE.md
|
|
||||||
└── docker-compose.yml
|
|
||||||
|
|
||||||
docs/: All documentation (12 files)
|
|
||||||
├── API.md
|
|
||||||
├── ARCHITECTURE.md
|
|
||||||
├── DEPLOYMENT.md
|
|
||||||
└── ... organized docs
|
|
||||||
|
|
||||||
tests/: All tests (10 files)
|
|
||||||
├── backend/
|
|
||||||
├── crawler/
|
|
||||||
└── sender/
|
|
||||||
|
|
||||||
Subdirectories: Clean, no scattered docs
|
|
||||||
```
|
|
||||||
|
|
||||||
## 🎯 Benefits
|
|
||||||
|
|
||||||
### 1. Easy to Navigate
|
|
||||||
- Clear directory structure
|
|
||||||
- Everything in its place
|
|
||||||
- No clutter
|
|
||||||
|
|
||||||
### 2. Simple to Use
|
|
||||||
- One command: `docker-compose up -d`
|
|
||||||
- One place for docs: `docs/`
|
|
||||||
- One place for tests: `tests/`
|
|
||||||
|
|
||||||
### 3. Professional
|
|
||||||
- Industry-standard layout
|
|
||||||
- Clean and organized
|
|
||||||
- Ready for collaboration
|
|
||||||
|
|
||||||
### 4. Maintainable
|
|
||||||
- Easy to find files
|
|
||||||
- Clear separation of concerns
|
|
||||||
- Scalable structure
|
|
||||||
|
|
||||||
## 📝 Quick Reference
|
|
||||||
|
|
||||||
### Documentation
|
|
||||||
```bash
|
|
||||||
# Main docs
|
|
||||||
cat README.md
|
|
||||||
cat QUICKSTART.md
|
|
||||||
|
|
||||||
# Technical docs
|
|
||||||
ls docs/
|
|
||||||
```
|
|
||||||
|
|
||||||
### Running
|
|
||||||
```bash
|
|
||||||
# Start
|
|
||||||
docker-compose up -d
|
|
||||||
|
|
||||||
# Logs
|
|
||||||
docker-compose logs -f
|
|
||||||
|
|
||||||
# Stop
|
|
||||||
docker-compose down
|
|
||||||
```
|
|
||||||
|
|
||||||
### Testing
|
|
||||||
```bash
|
|
||||||
# Run tests
|
|
||||||
docker-compose exec crawler python tests/crawler/test_crawler.py
|
|
||||||
docker-compose exec sender python tests/sender/test_tracking_integration.py
|
|
||||||
```
|
|
||||||
|
|
||||||
### Development
|
|
||||||
```bash
|
|
||||||
# Edit code in respective directories
|
|
||||||
# Rebuild
|
|
||||||
docker-compose up -d --build
|
|
||||||
```
|
|
||||||
|
|
||||||
## ✅ Verification
|
|
||||||
|
|
||||||
Run these commands to verify the cleanup:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Check root directory (should be clean)
|
|
||||||
ls -1 *.md
|
|
||||||
|
|
||||||
# Check docs directory
|
|
||||||
ls -1 docs/
|
|
||||||
|
|
||||||
# Check tests directory
|
|
||||||
ls -1 tests/
|
|
||||||
|
|
||||||
# Check for stray docker-compose files (should be only 1)
|
|
||||||
find . -name "docker-compose*.yml" ! -path "*/node_modules/*" ! -path "*/env/*"
|
|
||||||
|
|
||||||
# Check for stray markdown in subdirectories (should be none)
|
|
||||||
find backend news_crawler news_sender -name "*.md" ! -path "*/env/*"
|
|
||||||
```
|
|
||||||
|
|
||||||
## 🎊 Result
|
|
||||||
|
|
||||||
A clean, professional, production-ready project structure!
|
|
||||||
|
|
||||||
**One command to start everything:**
|
|
||||||
```bash
|
|
||||||
docker-compose up -d
|
|
||||||
```
|
|
||||||
|
|
||||||
**One place for all documentation:**
|
|
||||||
```bash
|
|
||||||
ls docs/
|
|
||||||
```
|
|
||||||
|
|
||||||
**One place for all tests:**
|
|
||||||
```bash
|
|
||||||
ls tests/
|
|
||||||
```
|
|
||||||
|
|
||||||
Simple. Clean. Professional. ✨
|
|
||||||
@@ -1,53 +0,0 @@
|
|||||||
# GPU Support Implementation - Complete Summary
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
Successfully implemented comprehensive GPU support for Ollama AI service in the Munich News Daily system. The implementation provides 5-10x faster AI inference for article translation and summarization when NVIDIA GPU is available, with automatic fallback to CPU mode.
|
|
||||||
|
|
||||||
## What Was Implemented
|
|
||||||
|
|
||||||
### 1. Docker Configuration ✅
|
|
||||||
- **docker-compose.yml**: Added Ollama service with automatic model download
|
|
||||||
- **docker-compose.gpu.yml**: GPU-specific override for NVIDIA GPU support
|
|
||||||
- **ollama-setup service**: Automatically pulls phi3:latest model on first startup
|
|
||||||
|
|
||||||
### 2. Helper Scripts ✅
|
|
||||||
- **start-with-gpu.sh**: Auto-detects GPU and starts services with appropriate configuration
|
|
||||||
- **check-gpu.sh**: Diagnoses GPU availability and Docker GPU support
|
|
||||||
- **configure-ollama.sh**: Interactive configuration for Docker Compose or external Ollama
|
|
||||||
- **test-ollama-setup.sh**: Comprehensive test suite to verify setup
|
|
||||||
|
|
||||||
### 3. Documentation ✅
|
|
||||||
- **docs/OLLAMA_SETUP.md**: Complete Ollama setup guide (6.6KB)
|
|
||||||
- **docs/GPU_SETUP.md**: Detailed GPU setup and troubleshooting (7.8KB)
|
|
||||||
- **docs/PERFORMANCE_COMPARISON.md**: CPU vs GPU benchmarks (5.2KB)
|
|
||||||
- **QUICK_START_GPU.md**: Quick reference card (2.8KB)
|
|
||||||
- **OLLAMA_GPU_SUMMARY.md**: Implementation summary (8.4KB)
|
|
||||||
- **README.md**: Updated with GPU support information
|
|
||||||
|
|
||||||
## Performance Improvements
|
|
||||||
|
|
||||||
| Operation | CPU | GPU | Speedup |
|
|
||||||
|-----------|-----|-----|---------|
|
|
||||||
| Translation | 1.5s | 0.3s | 5x |
|
|
||||||
| Summarization | 8s | 2s | 4x |
|
|
||||||
| 10 Articles | 115s | 31s | 3.7x |
|
|
||||||
|
|
||||||
## Quick Start
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Check GPU availability
|
|
||||||
./check-gpu.sh
|
|
||||||
|
|
||||||
# Start services with auto-detection
|
|
||||||
./start-with-gpu.sh
|
|
||||||
|
|
||||||
# Test translation
|
|
||||||
docker-compose exec crawler python crawler_service.py 2
|
|
||||||
```
|
|
||||||
|
|
||||||
## Testing Results
|
|
||||||
|
|
||||||
All tests pass successfully ✅
|
|
||||||
|
|
||||||
The implementation is complete, tested, and ready for use!
|
|
||||||
@@ -1,205 +0,0 @@
|
|||||||
# Newsletter API Update
|
|
||||||
|
|
||||||
## Summary
|
|
||||||
|
|
||||||
Added a new API endpoint to send newsletters to all active subscribers instead of requiring a specific email address.
|
|
||||||
|
|
||||||
## New Endpoint
|
|
||||||
|
|
||||||
### Send Newsletter to All Subscribers
|
|
||||||
|
|
||||||
```http
|
|
||||||
POST /api/admin/send-newsletter
|
|
||||||
```
|
|
||||||
|
|
||||||
**Request Body** (optional):
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"max_articles": 10
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Response**:
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"success": true,
|
|
||||||
"message": "Newsletter sent successfully to 45 subscribers",
|
|
||||||
"subscriber_count": 45,
|
|
||||||
"max_articles": 10,
|
|
||||||
"output": "... sender output ...",
|
|
||||||
"errors": ""
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## Usage Examples
|
|
||||||
|
|
||||||
### Send Newsletter to All Subscribers
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Send with default settings (10 articles)
|
|
||||||
curl -X POST http://localhost:5001/api/admin/send-newsletter \
|
|
||||||
-H "Content-Type: application/json"
|
|
||||||
|
|
||||||
# Send with custom article count
|
|
||||||
curl -X POST http://localhost:5001/api/admin/send-newsletter \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{"max_articles": 15}'
|
|
||||||
```
|
|
||||||
|
|
||||||
### Complete Workflow
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# 1. Check subscriber count
|
|
||||||
curl http://localhost:5001/api/admin/stats | jq '.subscribers'
|
|
||||||
|
|
||||||
# 2. Crawl fresh articles
|
|
||||||
curl -X POST http://localhost:5001/api/admin/trigger-crawl \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{"max_articles": 10}'
|
|
||||||
|
|
||||||
# 3. Wait for crawl to complete
|
|
||||||
sleep 60
|
|
||||||
|
|
||||||
# 4. Send newsletter to all active subscribers
|
|
||||||
curl -X POST http://localhost:5001/api/admin/send-newsletter \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{"max_articles": 10}'
|
|
||||||
```
|
|
||||||
|
|
||||||
## Comparison with Test Email
|
|
||||||
|
|
||||||
### Send Test Email (Existing)
|
|
||||||
- Sends to **one specific email address**
|
|
||||||
- Useful for testing newsletter content
|
|
||||||
- No tracking recorded in database
|
|
||||||
- Fast (single email)
|
|
||||||
|
|
||||||
```bash
|
|
||||||
curl -X POST http://localhost:5001/api/admin/send-test-email \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{"email": "test@example.com"}'
|
|
||||||
```
|
|
||||||
|
|
||||||
### Send Newsletter (New)
|
|
||||||
- Sends to **all active subscribers**
|
|
||||||
- Production newsletter sending
|
|
||||||
- Full tracking (opens, clicks)
|
|
||||||
- May take time for large lists
|
|
||||||
|
|
||||||
```bash
|
|
||||||
curl -X POST http://localhost:5001/api/admin/send-newsletter \
|
|
||||||
-H "Content-Type: application/json"
|
|
||||||
```
|
|
||||||
|
|
||||||
## Features
|
|
||||||
|
|
||||||
### Subscriber Filtering
|
|
||||||
- Only sends to subscribers with `status: 'active'`
|
|
||||||
- Skips inactive, unsubscribed, or bounced subscribers
|
|
||||||
- Returns error if no active subscribers found
|
|
||||||
|
|
||||||
### Tracking
|
|
||||||
- Includes tracking pixel for open tracking
|
|
||||||
- Includes click tracking for all article links
|
|
||||||
- Records send time and newsletter ID
|
|
||||||
- Stores in `newsletter_sends` collection
|
|
||||||
|
|
||||||
### Error Handling
|
|
||||||
- Validates subscriber count before sending
|
|
||||||
- Returns detailed error messages
|
|
||||||
- Includes sender output and errors in response
|
|
||||||
- 5-minute timeout for large lists
|
|
||||||
|
|
||||||
## Testing
|
|
||||||
|
|
||||||
### Interactive Test Script
|
|
||||||
|
|
||||||
```bash
|
|
||||||
./test-newsletter-api.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
This script will:
|
|
||||||
1. Show current subscriber stats
|
|
||||||
2. Optionally send test email to your address
|
|
||||||
3. Optionally send newsletter to all subscribers
|
|
||||||
|
|
||||||
### Manual Testing
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# 1. Check subscribers
|
|
||||||
curl http://localhost:5001/api/admin/stats
|
|
||||||
|
|
||||||
# 2. Send newsletter
|
|
||||||
curl -X POST http://localhost:5001/api/admin/send-newsletter \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{"max_articles": 2}'
|
|
||||||
|
|
||||||
# 3. Check results
|
|
||||||
curl http://localhost:5001/api/admin/stats
|
|
||||||
```
|
|
||||||
|
|
||||||
## Security Considerations
|
|
||||||
|
|
||||||
⚠️ **Important**: This endpoint sends emails to real subscribers!
|
|
||||||
|
|
||||||
### Recommendations
|
|
||||||
|
|
||||||
1. **Add Authentication**
|
|
||||||
```python
|
|
||||||
@require_api_key
|
|
||||||
def send_newsletter():
|
|
||||||
# ...
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Rate Limiting**
|
|
||||||
- Prevent accidental multiple sends
|
|
||||||
- Limit to once per hour/day
|
|
||||||
|
|
||||||
3. **Confirmation Required**
|
|
||||||
- Add confirmation step in UI
|
|
||||||
- Log all newsletter sends
|
|
||||||
|
|
||||||
4. **Dry Run Mode**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"max_articles": 10,
|
|
||||||
"dry_run": true // Preview without sending
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
5. **Audit Logging**
|
|
||||||
- Log who triggered the send
|
|
||||||
- Log timestamp and parameters
|
|
||||||
- Track success/failure
|
|
||||||
|
|
||||||
## Files Modified
|
|
||||||
|
|
||||||
- ✅ `backend/routes/admin_routes.py` - Added new endpoint
|
|
||||||
- ✅ `docs/ADMIN_API.md` - Updated documentation
|
|
||||||
- ✅ `test-newsletter-api.sh` - Created test script
|
|
||||||
|
|
||||||
## API Endpoints Summary
|
|
||||||
|
|
||||||
| Endpoint | Purpose | Recipient |
|
|
||||||
|----------|---------|-----------|
|
|
||||||
| `/api/admin/send-test-email` | Test newsletter | Single email (specified) |
|
|
||||||
| `/api/admin/send-newsletter` | Production send | All active subscribers |
|
|
||||||
| `/api/admin/trigger-crawl` | Fetch articles | N/A |
|
|
||||||
| `/api/admin/stats` | System stats | N/A |
|
|
||||||
|
|
||||||
## Next Steps
|
|
||||||
|
|
||||||
1. **Test the endpoint:**
|
|
||||||
```bash
|
|
||||||
./test-newsletter-api.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Add authentication** (recommended for production)
|
|
||||||
|
|
||||||
3. **Set up monitoring** for newsletter sends
|
|
||||||
|
|
||||||
4. **Create UI** for easier newsletter management
|
|
||||||
|
|
||||||
## Documentation
|
|
||||||
|
|
||||||
See [docs/ADMIN_API.md](docs/ADMIN_API.md) for complete API documentation.
|
|
||||||
@@ -1,278 +0,0 @@
|
|||||||
# Ollama with GPU Support - Implementation Summary
|
|
||||||
|
|
||||||
## What Was Added
|
|
||||||
|
|
||||||
This implementation adds comprehensive GPU support for Ollama AI service in the Munich News Daily system, enabling 5-10x faster AI inference for article translation and summarization.
|
|
||||||
|
|
||||||
## Files Created/Modified
|
|
||||||
|
|
||||||
### Docker Configuration
|
|
||||||
- **docker-compose.yml** - Added Ollama service with GPU support comments
|
|
||||||
- **docker-compose.gpu.yml** - GPU-specific override configuration
|
|
||||||
- **docker-compose.yml** - Added ollama-setup service for automatic model download
|
|
||||||
|
|
||||||
### Helper Scripts
|
|
||||||
- **start-with-gpu.sh** - Auto-detect GPU and start services accordingly
|
|
||||||
- **check-gpu.sh** - Check GPU availability and Docker GPU support
|
|
||||||
- **configure-ollama.sh** - Configure Ollama for Docker Compose or external server
|
|
||||||
|
|
||||||
### Documentation
|
|
||||||
- **docs/OLLAMA_SETUP.md** - Complete Ollama setup guide with GPU section
|
|
||||||
- **docs/GPU_SETUP.md** - Detailed GPU setup and troubleshooting guide
|
|
||||||
- **docs/PERFORMANCE_COMPARISON.md** - CPU vs GPU performance analysis
|
|
||||||
- **README.md** - Updated with GPU support information
|
|
||||||
|
|
||||||
## Key Features
|
|
||||||
|
|
||||||
### 1. Automatic GPU Detection
|
|
||||||
```bash
|
|
||||||
./start-with-gpu.sh
|
|
||||||
```
|
|
||||||
- Detects NVIDIA GPU availability
|
|
||||||
- Checks Docker GPU runtime
|
|
||||||
- Automatically starts with appropriate configuration
|
|
||||||
|
|
||||||
### 2. Flexible Deployment Options
|
|
||||||
|
|
||||||
**Option A: Integrated Ollama (Docker Compose)**
|
|
||||||
```bash
|
|
||||||
# CPU mode
|
|
||||||
docker-compose up -d
|
|
||||||
|
|
||||||
# GPU mode
|
|
||||||
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
|
|
||||||
```
|
|
||||||
|
|
||||||
**Option B: External Ollama Server**
|
|
||||||
```bash
|
|
||||||
# Configure for external server
|
|
||||||
./configure-ollama.sh
|
|
||||||
# Select option 2
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3. Automatic Model Download
|
|
||||||
- Ollama service starts automatically
|
|
||||||
- ollama-setup service pulls phi3:latest model on first run
|
|
||||||
- Model persists in Docker volume
|
|
||||||
|
|
||||||
### 4. GPU Support
|
|
||||||
- NVIDIA GPU acceleration when available
|
|
||||||
- Automatic fallback to CPU if GPU unavailable
|
|
||||||
- 5-10x performance improvement with GPU
|
|
||||||
|
|
||||||
## Performance Improvements
|
|
||||||
|
|
||||||
| Operation | CPU | GPU | Speedup |
|
|
||||||
|-----------|-----|-----|---------|
|
|
||||||
| Translation | 1.5s | 0.3s | 5x |
|
|
||||||
| Summarization | 8s | 2s | 4x |
|
|
||||||
| 10 Articles | 115s | 31s | 3.7x |
|
|
||||||
|
|
||||||
## Usage Examples
|
|
||||||
|
|
||||||
### Check GPU Availability
|
|
||||||
```bash
|
|
||||||
./check-gpu.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
### Start with GPU (Automatic)
|
|
||||||
```bash
|
|
||||||
./start-with-gpu.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
### Start with GPU (Manual)
|
|
||||||
```bash
|
|
||||||
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
|
|
||||||
```
|
|
||||||
|
|
||||||
### Verify GPU Usage
|
|
||||||
```bash
|
|
||||||
# Check GPU in container
|
|
||||||
docker exec munich-news-ollama nvidia-smi
|
|
||||||
|
|
||||||
# Monitor GPU during processing
|
|
||||||
watch -n 1 'docker exec munich-news-ollama nvidia-smi'
|
|
||||||
```
|
|
||||||
|
|
||||||
### Test Translation
|
|
||||||
```bash
|
|
||||||
# Run test crawl
|
|
||||||
docker-compose exec crawler python crawler_service.py 2
|
|
||||||
|
|
||||||
# Check timing in logs
|
|
||||||
docker-compose logs crawler | grep "Title translated"
|
|
||||||
# GPU: ✓ Title translated (0.3s)
|
|
||||||
# CPU: ✓ Title translated (1.5s)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Configuration
|
|
||||||
|
|
||||||
### Environment Variables (backend/.env)
|
|
||||||
|
|
||||||
**For Docker Compose Ollama:**
|
|
||||||
```env
|
|
||||||
OLLAMA_ENABLED=true
|
|
||||||
OLLAMA_BASE_URL=http://ollama:11434
|
|
||||||
OLLAMA_MODEL=phi3:latest
|
|
||||||
OLLAMA_TIMEOUT=120
|
|
||||||
```
|
|
||||||
|
|
||||||
**For External Ollama:**
|
|
||||||
```env
|
|
||||||
OLLAMA_ENABLED=true
|
|
||||||
OLLAMA_BASE_URL=http://host.docker.internal:11434
|
|
||||||
OLLAMA_MODEL=phi3:latest
|
|
||||||
OLLAMA_TIMEOUT=120
|
|
||||||
```
|
|
||||||
|
|
||||||
## Requirements
|
|
||||||
|
|
||||||
### For CPU Mode
|
|
||||||
- Docker & Docker Compose
|
|
||||||
- 4GB+ RAM
|
|
||||||
- 4+ CPU cores recommended
|
|
||||||
|
|
||||||
### For GPU Mode
|
|
||||||
- NVIDIA GPU (GTX 1060 or newer)
|
|
||||||
- 4GB+ VRAM
|
|
||||||
- NVIDIA drivers (525.60.13+)
|
|
||||||
- NVIDIA Container Toolkit
|
|
||||||
- Docker 20.10+
|
|
||||||
- Docker Compose v2.3+
|
|
||||||
|
|
||||||
## Installation Steps
|
|
||||||
|
|
||||||
### 1. Install NVIDIA Container Toolkit (Ubuntu/Debian)
|
|
||||||
```bash
|
|
||||||
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
|
|
||||||
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
|
|
||||||
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
|
|
||||||
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
|
|
||||||
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
|
|
||||||
|
|
||||||
sudo apt-get update
|
|
||||||
sudo apt-get install -y nvidia-container-toolkit
|
|
||||||
sudo nvidia-ctk runtime configure --runtime=docker
|
|
||||||
sudo systemctl restart docker
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2. Verify Installation
|
|
||||||
```bash
|
|
||||||
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3. Configure Ollama
|
|
||||||
```bash
|
|
||||||
./configure-ollama.sh
|
|
||||||
# Select option 1 for Docker Compose
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4. Start Services
|
|
||||||
```bash
|
|
||||||
./start-with-gpu.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
## Troubleshooting
|
|
||||||
|
|
||||||
### GPU Not Detected
|
|
||||||
```bash
|
|
||||||
# Check NVIDIA drivers
|
|
||||||
nvidia-smi
|
|
||||||
|
|
||||||
# Check Docker GPU access
|
|
||||||
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
|
|
||||||
|
|
||||||
# Check Ollama container
|
|
||||||
docker exec munich-news-ollama nvidia-smi
|
|
||||||
```
|
|
||||||
|
|
||||||
### Out of Memory
|
|
||||||
- Use smaller model: `OLLAMA_MODEL=gemma2:2b`
|
|
||||||
- Close other GPU applications
|
|
||||||
- Increase Docker memory limit
|
|
||||||
|
|
||||||
### Slow Performance
|
|
||||||
- Verify GPU is being used: `docker exec munich-news-ollama nvidia-smi`
|
|
||||||
- Check GPU utilization during inference
|
|
||||||
- Ensure using GPU compose file
|
|
||||||
- Update NVIDIA drivers
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
```
|
|
||||||
┌─────────────────────────────────────────────────────────┐
|
|
||||||
│ Docker Compose │
|
|
||||||
├─────────────────────────────────────────────────────────┤
|
|
||||||
│ │
|
|
||||||
│ ┌──────────────┐ ┌──────────────┐ │
|
|
||||||
│ │ Ollama │◄─────┤ Crawler │ │
|
|
||||||
│ │ (GPU/CPU) │ │ │ │
|
|
||||||
│ │ │ │ - Fetches │ │
|
|
||||||
│ │ - phi3 │ │ - Translates│ │
|
|
||||||
│ │ - Translate │ │ - Summarizes│ │
|
|
||||||
│ │ - Summarize │ └──────────────┘ │
|
|
||||||
│ └──────────────┘ │
|
|
||||||
│ │ │
|
|
||||||
│ │ GPU (optional) │
|
|
||||||
│ ▼ │
|
|
||||||
│ ┌──────────────┐ │
|
|
||||||
│ │ NVIDIA GPU │ │
|
|
||||||
│ │ (5-10x faster)│ │
|
|
||||||
│ └──────────────┘ │
|
|
||||||
│ │
|
|
||||||
└─────────────────────────────────────────────────────────┘
|
|
||||||
```
|
|
||||||
|
|
||||||
## Model Options
|
|
||||||
|
|
||||||
| Model | Size | VRAM | Speed | Quality | Use Case |
|
|
||||||
|-------|------|------|-------|---------|----------|
|
|
||||||
| gemma2:2b | 1.4GB | 1.5GB | Fastest | Good | High volume |
|
|
||||||
| phi3:latest | 2.3GB | 3-4GB | Fast | Very Good | Default |
|
|
||||||
| llama3.2:3b | 3.2GB | 5-6GB | Medium | Excellent | Quality critical |
|
|
||||||
| mistral:latest | 4.1GB | 6-8GB | Medium | Excellent | Long-form |
|
|
||||||
|
|
||||||
## Next Steps
|
|
||||||
|
|
||||||
1. **Test the setup:**
|
|
||||||
```bash
|
|
||||||
./check-gpu.sh
|
|
||||||
./start-with-gpu.sh
|
|
||||||
docker-compose exec crawler python crawler_service.py 2
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Monitor performance:**
|
|
||||||
```bash
|
|
||||||
watch -n 1 'docker exec munich-news-ollama nvidia-smi'
|
|
||||||
docker-compose logs -f crawler
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **Optimize for your use case:**
|
|
||||||
- Adjust model based on VRAM availability
|
|
||||||
- Tune summary length for speed vs quality
|
|
||||||
- Enable concurrent requests for high volume
|
|
||||||
|
|
||||||
## Documentation
|
|
||||||
|
|
||||||
- **[OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md)** - Complete Ollama setup guide
|
|
||||||
- **[GPU_SETUP.md](docs/GPU_SETUP.md)** - Detailed GPU setup and troubleshooting
|
|
||||||
- **[PERFORMANCE_COMPARISON.md](docs/PERFORMANCE_COMPARISON.md)** - CPU vs GPU analysis
|
|
||||||
|
|
||||||
## Support
|
|
||||||
|
|
||||||
For issues or questions:
|
|
||||||
1. Run `./check-gpu.sh` for diagnostics
|
|
||||||
2. Check logs: `docker-compose logs ollama`
|
|
||||||
3. See troubleshooting sections in documentation
|
|
||||||
4. Open an issue with diagnostic output
|
|
||||||
|
|
||||||
## Summary
|
|
||||||
|
|
||||||
✅ Ollama service integrated into Docker Compose
|
|
||||||
✅ Automatic model download (phi3:latest)
|
|
||||||
✅ GPU support with automatic detection
|
|
||||||
✅ Fallback to CPU when GPU unavailable
|
|
||||||
✅ Helper scripts for easy setup
|
|
||||||
✅ Comprehensive documentation
|
|
||||||
✅ 5-10x performance improvement with GPU
|
|
||||||
✅ Flexible deployment options
|
|
||||||
@@ -1,85 +0,0 @@
|
|||||||
# Ollama Integration Complete ✅
|
|
||||||
|
|
||||||
## What Was Added
|
|
||||||
|
|
||||||
1. **Ollama Service in Docker Compose**
|
|
||||||
- Runs Ollama server on port 11434
|
|
||||||
- Persists models in `ollama_data` volume
|
|
||||||
- Health check ensures service is ready
|
|
||||||
|
|
||||||
2. **Automatic Model Download**
|
|
||||||
- `ollama-setup` service automatically pulls `phi3:latest` (2.2GB)
|
|
||||||
- Runs once on first startup
|
|
||||||
- Model is cached in volume for future use
|
|
||||||
|
|
||||||
3. **Configuration Files**
|
|
||||||
- `docs/OLLAMA_SETUP.md` - Comprehensive setup guide
|
|
||||||
- `configure-ollama.sh` - Helper script to switch between Docker/external Ollama
|
|
||||||
- Updated `README.md` with Ollama setup instructions
|
|
||||||
|
|
||||||
4. **Environment Configuration**
|
|
||||||
- Updated `backend/.env` to use `http://ollama:11434` (internal Docker network)
|
|
||||||
- All services can now communicate with Ollama via Docker network
|
|
||||||
|
|
||||||
## Current Status
|
|
||||||
|
|
||||||
✅ Ollama service running and healthy
|
|
||||||
✅ phi3:latest model downloaded (2.2GB)
|
|
||||||
✅ Translation feature working with integrated Ollama
|
|
||||||
✅ Summarization feature working with integrated Ollama
|
|
||||||
|
|
||||||
## Quick Start
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Start all services (including Ollama)
|
|
||||||
docker-compose up -d
|
|
||||||
|
|
||||||
# Wait for model download (first time only, ~2-5 minutes)
|
|
||||||
docker-compose logs -f ollama-setup
|
|
||||||
|
|
||||||
# Verify Ollama is ready
|
|
||||||
docker-compose exec ollama ollama list
|
|
||||||
|
|
||||||
# Test the system
|
|
||||||
docker-compose exec crawler python crawler_service.py 1
|
|
||||||
```
|
|
||||||
|
|
||||||
## Switching Between Docker and External Ollama
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Use integrated Docker Ollama (recommended)
|
|
||||||
./configure-ollama.sh
|
|
||||||
# Select option 1
|
|
||||||
|
|
||||||
# Use external Ollama server
|
|
||||||
./configure-ollama.sh
|
|
||||||
# Select option 2
|
|
||||||
```
|
|
||||||
|
|
||||||
## Performance Notes
|
|
||||||
|
|
||||||
- First request: ~6 seconds (model loading)
|
|
||||||
- Subsequent requests: 0.5-2 seconds (cached)
|
|
||||||
- Translation: 0.5-6 seconds per title
|
|
||||||
- Summarization: 5-90 seconds per article (depends on length)
|
|
||||||
|
|
||||||
## Resource Requirements
|
|
||||||
|
|
||||||
- RAM: 4GB minimum for phi3:latest
|
|
||||||
- Disk: 2.2GB for model storage
|
|
||||||
- CPU: Works on CPU, GPU optional
|
|
||||||
|
|
||||||
## Alternative Models
|
|
||||||
|
|
||||||
To use a different model:
|
|
||||||
|
|
||||||
1. Update `OLLAMA_MODEL` in `backend/.env`
|
|
||||||
2. Pull the model:
|
|
||||||
```bash
|
|
||||||
docker-compose exec ollama ollama pull <model-name>
|
|
||||||
```
|
|
||||||
|
|
||||||
Popular alternatives:
|
|
||||||
- `gemma2:2b` - Smaller, faster (1.6GB)
|
|
||||||
- `llama3.2:latest` - Larger, more capable (2GB)
|
|
||||||
- `mistral:latest` - Good balance (4.1GB)
|
|
||||||
@@ -1,126 +0,0 @@
|
|||||||
# Project Structure
|
|
||||||
|
|
||||||
```
|
|
||||||
munich-news/
|
|
||||||
├── backend/ # Backend API and services
|
|
||||||
│ ├── routes/ # API routes
|
|
||||||
│ ├── services/ # Business logic
|
|
||||||
│ ├── .env.example # Environment template
|
|
||||||
│ ├── app.py # Flask application
|
|
||||||
│ ├── config.py # Configuration
|
|
||||||
│ └── database.py # MongoDB connection
|
|
||||||
│
|
|
||||||
├── news_crawler/ # News crawler service
|
|
||||||
│ ├── Dockerfile # Crawler container
|
|
||||||
│ ├── crawler_service.py # Main crawler logic
|
|
||||||
│ ├── scheduled_crawler.py # Scheduler (6 AM)
|
|
||||||
│ ├── rss_utils.py # RSS parsing utilities
|
|
||||||
│ └── requirements.txt # Python dependencies
|
|
||||||
│
|
|
||||||
├── news_sender/ # Newsletter sender service
|
|
||||||
│ ├── Dockerfile # Sender container
|
|
||||||
│ ├── sender_service.py # Main sender logic
|
|
||||||
│ ├── scheduled_sender.py # Scheduler (7 AM)
|
|
||||||
│ ├── tracking_integration.py # Email tracking
|
|
||||||
│ ├── newsletter_template.html # Email template
|
|
||||||
│ └── requirements.txt # Python dependencies
|
|
||||||
│
|
|
||||||
├── frontend/ # React dashboard (optional)
|
|
||||||
│ ├── src/ # React components
|
|
||||||
│ ├── public/ # Static files
|
|
||||||
│ └── package.json # Node dependencies
|
|
||||||
│
|
|
||||||
├── tests/ # All test files
|
|
||||||
│ ├── crawler/ # Crawler tests
|
|
||||||
│ ├── sender/ # Sender tests
|
|
||||||
│ └── backend/ # Backend tests
|
|
||||||
│
|
|
||||||
├── docs/ # Documentation
|
|
||||||
│ ├── ARCHITECTURE.md # System architecture
|
|
||||||
│ ├── DEPLOYMENT.md # Deployment guide
|
|
||||||
│ ├── API.md # API reference
|
|
||||||
│ ├── DATABASE_SCHEMA.md # Database structure
|
|
||||||
│ ├── BACKEND_STRUCTURE.md # Backend organization
|
|
||||||
│ ├── CRAWLER_HOW_IT_WORKS.md # Crawler internals
|
|
||||||
│ ├── EXTRACTION_STRATEGIES.md # Content extraction
|
|
||||||
│ └── RSS_URL_EXTRACTION.md # RSS parsing
|
|
||||||
│
|
|
||||||
├── .kiro/ # Kiro IDE configuration
|
|
||||||
│ └── specs/ # Feature specifications
|
|
||||||
│
|
|
||||||
├── docker-compose.yml # Docker orchestration
|
|
||||||
├── README.md # Main documentation
|
|
||||||
├── QUICKSTART.md # 5-minute setup guide
|
|
||||||
├── CONTRIBUTING.md # Contribution guidelines
|
|
||||||
├── .gitignore # Git ignore rules
|
|
||||||
└── .dockerignore # Docker ignore rules
|
|
||||||
```
|
|
||||||
|
|
||||||
## Key Files
|
|
||||||
|
|
||||||
### Configuration
|
|
||||||
- `backend/.env` - Environment variables (create from .env.example)
|
|
||||||
- `docker-compose.yml` - Docker services configuration
|
|
||||||
|
|
||||||
### Entry Points
|
|
||||||
- `news_crawler/scheduled_crawler.py` - Crawler scheduler (6 AM)
|
|
||||||
- `news_sender/scheduled_sender.py` - Sender scheduler (7 AM)
|
|
||||||
- `backend/app.py` - Backend API server
|
|
||||||
|
|
||||||
### Documentation
|
|
||||||
- `README.md` - Main project documentation
|
|
||||||
- `QUICKSTART.md` - Quick setup guide
|
|
||||||
- `docs/` - Detailed documentation
|
|
||||||
|
|
||||||
### Tests
|
|
||||||
- `tests/crawler/` - Crawler test files
|
|
||||||
- `tests/sender/` - Sender test files
|
|
||||||
- `tests/backend/` - Backend test files
|
|
||||||
|
|
||||||
## Docker Services
|
|
||||||
|
|
||||||
When you run `docker-compose up -d`, these services start:
|
|
||||||
|
|
||||||
1. **mongodb** - Database (port 27017)
|
|
||||||
2. **crawler** - News crawler (scheduled for 6 AM)
|
|
||||||
3. **sender** - Newsletter sender (scheduled for 7 AM)
|
|
||||||
4. **backend** - API server (port 5001, optional)
|
|
||||||
|
|
||||||
## Data Flow
|
|
||||||
|
|
||||||
```
|
|
||||||
RSS Feeds → Crawler → MongoDB → Sender → Subscribers
|
|
||||||
↓
|
|
||||||
Backend API
|
|
||||||
↓
|
|
||||||
Analytics
|
|
||||||
```
|
|
||||||
|
|
||||||
## Development Workflow
|
|
||||||
|
|
||||||
1. Edit code in respective directories
|
|
||||||
2. Rebuild containers: `docker-compose up -d --build`
|
|
||||||
3. View logs: `docker-compose logs -f`
|
|
||||||
4. Run tests: `docker-compose exec <service> python tests/...`
|
|
||||||
|
|
||||||
## Adding New Features
|
|
||||||
|
|
||||||
1. Create spec in `.kiro/specs/`
|
|
||||||
2. Implement in appropriate directory
|
|
||||||
3. Add tests in `tests/`
|
|
||||||
4. Update documentation in `docs/`
|
|
||||||
5. Submit pull request
|
|
||||||
|
|
||||||
## Clean Architecture
|
|
||||||
|
|
||||||
- **Separation of Concerns**: Each service has its own directory
|
|
||||||
- **Centralized Configuration**: All config in `backend/.env`
|
|
||||||
- **Organized Tests**: All tests in `tests/` directory
|
|
||||||
- **Clear Documentation**: All docs in `docs/` directory
|
|
||||||
- **Single Entry Point**: One `docker-compose.yml` file
|
|
||||||
|
|
||||||
This structure makes the project:
|
|
||||||
- ✅ Easy to navigate
|
|
||||||
- ✅ Simple to deploy
|
|
||||||
- ✅ Clear to understand
|
|
||||||
- ✅ Maintainable long-term
|
|
||||||
@@ -1,144 +0,0 @@
|
|||||||
# Quick Start: Ollama with GPU
|
|
||||||
|
|
||||||
## 30-Second Setup
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# 1. Check GPU
|
|
||||||
./check-gpu.sh
|
|
||||||
|
|
||||||
# 2. Start services
|
|
||||||
./start-with-gpu.sh
|
|
||||||
|
|
||||||
# 3. Test
|
|
||||||
docker-compose exec crawler python crawler_service.py 2
|
|
||||||
```
|
|
||||||
|
|
||||||
## Commands Cheat Sheet
|
|
||||||
|
|
||||||
### Setup
|
|
||||||
```bash
|
|
||||||
# Check GPU availability
|
|
||||||
./check-gpu.sh
|
|
||||||
|
|
||||||
# Configure Ollama
|
|
||||||
./configure-ollama.sh
|
|
||||||
|
|
||||||
# Start with GPU auto-detection
|
|
||||||
./start-with-gpu.sh
|
|
||||||
|
|
||||||
# Start with GPU (manual)
|
|
||||||
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
|
|
||||||
|
|
||||||
# Start without GPU
|
|
||||||
docker-compose up -d
|
|
||||||
```
|
|
||||||
|
|
||||||
### Monitoring
|
|
||||||
```bash
|
|
||||||
# Check GPU usage
|
|
||||||
docker exec munich-news-ollama nvidia-smi
|
|
||||||
|
|
||||||
# Monitor GPU in real-time
|
|
||||||
watch -n 1 'docker exec munich-news-ollama nvidia-smi'
|
|
||||||
|
|
||||||
# Check Ollama logs
|
|
||||||
docker-compose logs -f ollama
|
|
||||||
|
|
||||||
# Check crawler logs
|
|
||||||
docker-compose logs -f crawler
|
|
||||||
```
|
|
||||||
|
|
||||||
### Testing
|
|
||||||
```bash
|
|
||||||
# Test translation (2 articles)
|
|
||||||
docker-compose exec crawler python crawler_service.py 2
|
|
||||||
|
|
||||||
# Check translation timing
|
|
||||||
docker-compose logs crawler | grep "Title translated"
|
|
||||||
|
|
||||||
# Test Ollama API (internal network only)
|
|
||||||
docker-compose exec crawler curl -s http://ollama:11434/api/generate -d '{
|
|
||||||
"model": "phi3:latest",
|
|
||||||
"prompt": "Translate to English: Guten Morgen",
|
|
||||||
"stream": false
|
|
||||||
}'
|
|
||||||
```
|
|
||||||
|
|
||||||
### Troubleshooting
|
|
||||||
```bash
|
|
||||||
# Restart Ollama
|
|
||||||
docker-compose restart ollama
|
|
||||||
|
|
||||||
# Rebuild and restart
|
|
||||||
docker-compose up -d --build ollama
|
|
||||||
|
|
||||||
# Check GPU in container
|
|
||||||
docker exec munich-news-ollama nvidia-smi
|
|
||||||
|
|
||||||
# Pull model manually
|
|
||||||
docker-compose exec ollama ollama pull phi3:latest
|
|
||||||
|
|
||||||
# List available models
|
|
||||||
docker-compose exec ollama ollama list
|
|
||||||
```
|
|
||||||
|
|
||||||
## Performance Expectations
|
|
||||||
|
|
||||||
| Operation | CPU | GPU | Speedup |
|
|
||||||
|-----------|-----|-----|---------|
|
|
||||||
| Translation | 1.5s | 0.3s | 5x |
|
|
||||||
| Summary | 8s | 2s | 4x |
|
|
||||||
| 10 Articles | 115s | 31s | 3.7x |
|
|
||||||
|
|
||||||
## Common Issues
|
|
||||||
|
|
||||||
### GPU Not Detected
|
|
||||||
```bash
|
|
||||||
# Install NVIDIA Container Toolkit
|
|
||||||
sudo apt-get install -y nvidia-container-toolkit
|
|
||||||
sudo systemctl restart docker
|
|
||||||
```
|
|
||||||
|
|
||||||
### Out of Memory
|
|
||||||
```bash
|
|
||||||
# Use smaller model (edit backend/.env)
|
|
||||||
OLLAMA_MODEL=gemma2:2b
|
|
||||||
```
|
|
||||||
|
|
||||||
### Slow Performance
|
|
||||||
```bash
|
|
||||||
# Verify GPU is being used
|
|
||||||
docker exec munich-news-ollama nvidia-smi
|
|
||||||
# Should show GPU memory usage during inference
|
|
||||||
```
|
|
||||||
|
|
||||||
## Configuration Files
|
|
||||||
|
|
||||||
**backend/.env** - Main configuration
|
|
||||||
```env
|
|
||||||
OLLAMA_ENABLED=true
|
|
||||||
OLLAMA_BASE_URL=http://ollama:11434
|
|
||||||
OLLAMA_MODEL=phi3:latest
|
|
||||||
OLLAMA_TIMEOUT=120
|
|
||||||
```
|
|
||||||
|
|
||||||
**docker-compose.yml** - Main services
|
|
||||||
**docker-compose.gpu.yml** - GPU override
|
|
||||||
|
|
||||||
## Model Options
|
|
||||||
|
|
||||||
- `gemma2:2b` - Fastest, 1.5GB VRAM
|
|
||||||
- `phi3:latest` - Default, 3-4GB VRAM ⭐
|
|
||||||
- `llama3.2:3b` - Best quality, 5-6GB VRAM
|
|
||||||
|
|
||||||
## Full Documentation
|
|
||||||
|
|
||||||
- [OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md) - Complete setup guide
|
|
||||||
- [GPU_SETUP.md](docs/GPU_SETUP.md) - GPU-specific guide
|
|
||||||
- [PERFORMANCE_COMPARISON.md](docs/PERFORMANCE_COMPARISON.md) - Benchmarks
|
|
||||||
|
|
||||||
## Need Help?
|
|
||||||
|
|
||||||
1. Run `./check-gpu.sh`
|
|
||||||
2. Check `docker-compose logs ollama`
|
|
||||||
3. See troubleshooting in [GPU_SETUP.md](docs/GPU_SETUP.md)
|
|
||||||
12
README.md
12
README.md
@@ -397,13 +397,23 @@ export MONGO_PASSWORD=your-secure-password
|
|||||||
- Set up alerts for failures
|
- Set up alerts for failures
|
||||||
- Monitor database size
|
- Monitor database size
|
||||||
|
|
||||||
|
## 📚 Documentation
|
||||||
|
|
||||||
|
Complete documentation available in the [docs/](docs/) directory:
|
||||||
|
|
||||||
|
- **[Documentation Index](docs/INDEX.md)** - Complete documentation guide
|
||||||
|
- **[GPU Setup](docs/GPU_SETUP.md)** - 5-10x faster with GPU acceleration
|
||||||
|
- **[Admin API](docs/ADMIN_API.md)** - API endpoints reference
|
||||||
|
- **[Security Guide](docs/SECURITY_NOTES.md)** - Security best practices
|
||||||
|
- **[System Architecture](docs/SYSTEM_ARCHITECTURE.md)** - Technical overview
|
||||||
|
|
||||||
## 📝 License
|
## 📝 License
|
||||||
|
|
||||||
[Your License Here]
|
[Your License Here]
|
||||||
|
|
||||||
## 🤝 Contributing
|
## 🤝 Contributing
|
||||||
|
|
||||||
Contributions welcome! Please read CONTRIBUTING.md first.
|
Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first.
|
||||||
|
|
||||||
## 📧 Support
|
## 📧 Support
|
||||||
|
|
||||||
|
|||||||
@@ -1,125 +0,0 @@
|
|||||||
# Security Update: Ollama Internal-Only Configuration
|
|
||||||
|
|
||||||
## Summary
|
|
||||||
|
|
||||||
Ollama service has been configured to be **internal-only** and is no longer exposed to the host machine. This improves security by reducing the attack surface.
|
|
||||||
|
|
||||||
## Changes Made
|
|
||||||
|
|
||||||
### Before (Exposed)
|
|
||||||
```yaml
|
|
||||||
ollama:
|
|
||||||
ports:
|
|
||||||
- "11434:11434" # ❌ Accessible from host and external network
|
|
||||||
```
|
|
||||||
|
|
||||||
### After (Internal Only)
|
|
||||||
```yaml
|
|
||||||
ollama:
|
|
||||||
# No ports section - internal only ✓
|
|
||||||
# Only accessible within Docker network
|
|
||||||
```
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
### ✓ Port Not Accessible from Host
|
|
||||||
```bash
|
|
||||||
$ nc -z -w 2 localhost 11434
|
|
||||||
# Connection refused (as expected)
|
|
||||||
```
|
|
||||||
|
|
||||||
### ✓ Accessible from Docker Services
|
|
||||||
```bash
|
|
||||||
$ docker-compose exec crawler python -c "import requests; requests.get('http://ollama:11434/api/tags')"
|
|
||||||
# ✓ Works perfectly
|
|
||||||
```
|
|
||||||
|
|
||||||
## Security Benefits
|
|
||||||
|
|
||||||
1. **No External Access**: Ollama API cannot be accessed from outside Docker network
|
|
||||||
2. **Reduced Attack Surface**: Service is not exposed to potential external threats
|
|
||||||
3. **Network Isolation**: Only authorized Docker Compose services can communicate with Ollama
|
|
||||||
4. **No Port Conflicts**: Port 11434 is not bound to host machine
|
|
||||||
|
|
||||||
## Impact on Usage
|
|
||||||
|
|
||||||
### No Change for Normal Operations ✓
|
|
||||||
- Crawler service works normally
|
|
||||||
- Translation and summarization work as before
|
|
||||||
- All Docker Compose services can access Ollama
|
|
||||||
|
|
||||||
### Testing from Host Machine
|
|
||||||
Since Ollama is internal-only, you must test from inside the Docker network:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# ✓ Test from inside a container
|
|
||||||
docker-compose exec crawler python crawler_service.py 1
|
|
||||||
|
|
||||||
# ✓ Check Ollama status
|
|
||||||
docker-compose exec crawler python -c "import requests; print(requests.get('http://ollama:11434/api/tags').json())"
|
|
||||||
|
|
||||||
# ✓ Check logs
|
|
||||||
docker-compose logs ollama
|
|
||||||
```
|
|
||||||
|
|
||||||
### If You Need External Access (Development Only)
|
|
||||||
|
|
||||||
For development/debugging, you can temporarily expose Ollama:
|
|
||||||
|
|
||||||
**Option 1: SSH Port Forward**
|
|
||||||
```bash
|
|
||||||
# Forward port through SSH (if accessing remote server)
|
|
||||||
ssh -L 11434:localhost:11434 user@server
|
|
||||||
```
|
|
||||||
|
|
||||||
**Option 2: Temporary Docker Exec**
|
|
||||||
```bash
|
|
||||||
# Run commands from inside network
|
|
||||||
docker-compose exec crawler curl http://ollama:11434/api/tags
|
|
||||||
```
|
|
||||||
|
|
||||||
**Option 3: Modify docker-compose.yml (Not Recommended)**
|
|
||||||
```yaml
|
|
||||||
ollama:
|
|
||||||
ports:
|
|
||||||
- "127.0.0.1:11434:11434" # Only localhost, not all interfaces
|
|
||||||
```
|
|
||||||
|
|
||||||
## Documentation Updated
|
|
||||||
|
|
||||||
- ✓ docker-compose.yml - Removed port exposure
|
|
||||||
- ✓ docs/OLLAMA_SETUP.md - Updated testing instructions
|
|
||||||
- ✓ docs/SECURITY_NOTES.md - Added security documentation
|
|
||||||
- ✓ test-ollama-setup.sh - Updated to test from inside network
|
|
||||||
- ✓ QUICK_START_GPU.md - Updated API testing examples
|
|
||||||
|
|
||||||
## Testing
|
|
||||||
|
|
||||||
All functionality has been verified:
|
|
||||||
- ✓ Ollama not accessible from host
|
|
||||||
- ✓ Ollama accessible from crawler service
|
|
||||||
- ✓ Translation works correctly
|
|
||||||
- ✓ Summarization works correctly
|
|
||||||
- ✓ All tests pass
|
|
||||||
|
|
||||||
## Rollback (If Needed)
|
|
||||||
|
|
||||||
If you need to expose Ollama again:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
# In docker-compose.yml
|
|
||||||
ollama:
|
|
||||||
ports:
|
|
||||||
- "11434:11434" # or "127.0.0.1:11434:11434" for localhost only
|
|
||||||
```
|
|
||||||
|
|
||||||
Then restart:
|
|
||||||
```bash
|
|
||||||
docker-compose up -d ollama
|
|
||||||
```
|
|
||||||
|
|
||||||
## Recommendation
|
|
||||||
|
|
||||||
**Keep Ollama internal-only** for production deployments. This is the most secure configuration and sufficient for normal operations.
|
|
||||||
|
|
||||||
Only expose Ollama if you have a specific need for external access, and always bind to `127.0.0.1` (localhost only), never `0.0.0.0` (all interfaces).
|
|
||||||
@@ -7,8 +7,11 @@
|
|||||||
# Or manually: docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
|
# Or manually: docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
|
||||||
#
|
#
|
||||||
# Security:
|
# Security:
|
||||||
# Ollama service is internal-only (no ports exposed to host)
|
# - Only Backend API (port 5001) is exposed to host
|
||||||
# Only accessible by other Docker Compose services
|
# - MongoDB is internal-only (not exposed to host)
|
||||||
|
# - Ollama is internal-only (not exposed to host)
|
||||||
|
# - Crawler and Sender are internal-only
|
||||||
|
# All services communicate via internal Docker network
|
||||||
#
|
#
|
||||||
# See docs/OLLAMA_SETUP.md for detailed setup instructions
|
# See docs/OLLAMA_SETUP.md for detailed setup instructions
|
||||||
|
|
||||||
@@ -59,13 +62,12 @@ services:
|
|||||||
"
|
"
|
||||||
restart: "no"
|
restart: "no"
|
||||||
|
|
||||||
# MongoDB Database
|
# MongoDB Database (Internal only - not exposed to host)
|
||||||
mongodb:
|
mongodb:
|
||||||
image: mongo:latest
|
image: mongo:latest
|
||||||
container_name: munich-news-mongodb
|
container_name: munich-news-mongodb
|
||||||
restart: unless-stopped
|
restart: unless-stopped
|
||||||
ports:
|
# No ports exposed - only accessible within Docker network
|
||||||
- "27017:27017"
|
|
||||||
environment:
|
environment:
|
||||||
# For production, set MONGO_PASSWORD environment variable
|
# For production, set MONGO_PASSWORD environment variable
|
||||||
MONGO_INITDB_ROOT_USERNAME: ${MONGO_USERNAME:-admin}
|
MONGO_INITDB_ROOT_USERNAME: ${MONGO_USERNAME:-admin}
|
||||||
|
|||||||
@@ -330,3 +330,53 @@ def trigger_crawl():
|
|||||||
- **[Newsletter Preview](../backend/routes/newsletter_routes.py)**: `/api/newsletter/preview` - Preview newsletter HTML
|
- **[Newsletter Preview](../backend/routes/newsletter_routes.py)**: `/api/newsletter/preview` - Preview newsletter HTML
|
||||||
- **[Analytics](API.md)**: `/api/analytics/*` - View engagement metrics
|
- **[Analytics](API.md)**: `/api/analytics/*` - View engagement metrics
|
||||||
- **[RSS Feeds](API.md)**: `/api/rss-feeds` - Manage RSS feeds
|
- **[RSS Feeds](API.md)**: `/api/rss-feeds` - Manage RSS feeds
|
||||||
|
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Newsletter API Summary
|
||||||
|
|
||||||
|
### Available Endpoints
|
||||||
|
|
||||||
|
| Endpoint | Purpose | Recipient |
|
||||||
|
|----------|---------|-----------|
|
||||||
|
| `/api/admin/send-test-email` | Test newsletter | Single email (specified) |
|
||||||
|
| `/api/admin/send-newsletter` | Production send | All active subscribers |
|
||||||
|
| `/api/admin/trigger-crawl` | Fetch articles | N/A |
|
||||||
|
| `/api/admin/stats` | System stats | N/A |
|
||||||
|
|
||||||
|
### Subscriber Status
|
||||||
|
|
||||||
|
The system uses a `status` field to determine who receives newsletters:
|
||||||
|
- **`active`** - Receives newsletters ✅
|
||||||
|
- **`inactive`** - Does not receive newsletters ❌
|
||||||
|
|
||||||
|
See [SUBSCRIBER_STATUS.md](SUBSCRIBER_STATUS.md) for details.
|
||||||
|
|
||||||
|
### Quick Examples
|
||||||
|
|
||||||
|
**Send to all subscribers:**
|
||||||
|
```bash
|
||||||
|
curl -X POST http://localhost:5001/api/admin/send-newsletter \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"max_articles": 10}'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Send test email:**
|
||||||
|
```bash
|
||||||
|
curl -X POST http://localhost:5001/api/admin/send-test-email \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"email": "test@example.com"}'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Check stats:**
|
||||||
|
```bash
|
||||||
|
curl http://localhost:5001/api/admin/stats | jq '.subscribers'
|
||||||
|
```
|
||||||
|
|
||||||
|
### Testing
|
||||||
|
|
||||||
|
Use the test script:
|
||||||
|
```bash
|
||||||
|
./test-newsletter-api.sh
|
||||||
|
```
|
||||||
|
|||||||
@@ -134,3 +134,43 @@ Root:
|
|||||||
- [ ] API rate limiting
|
- [ ] API rate limiting
|
||||||
- [ ] Caching layer (Redis)
|
- [ ] Caching layer (Redis)
|
||||||
- [ ] Message queue for crawler (Celery)
|
- [ ] Message queue for crawler (Celery)
|
||||||
|
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Recent Updates (November 2025)
|
||||||
|
|
||||||
|
### Security Improvements
|
||||||
|
- **MongoDB Internal-Only**: Removed port exposure, only accessible via Docker network
|
||||||
|
- **Ollama Internal-Only**: Removed port exposure, only accessible via Docker network
|
||||||
|
- **Reduced Attack Surface**: Only Backend API (port 5001) exposed to host
|
||||||
|
- **Network Isolation**: All services communicate via internal Docker network
|
||||||
|
|
||||||
|
### Ollama Integration
|
||||||
|
- **Docker Compose Integration**: Ollama service runs alongside other services
|
||||||
|
- **Automatic Model Download**: phi3:latest model downloaded on first startup
|
||||||
|
- **GPU Support**: NVIDIA GPU acceleration with automatic detection
|
||||||
|
- **Helper Scripts**: `start-with-gpu.sh`, `check-gpu.sh`, `configure-ollama.sh`
|
||||||
|
- **Performance**: 5-10x faster with GPU acceleration
|
||||||
|
|
||||||
|
### API Enhancements
|
||||||
|
- **Send Newsletter Endpoint**: `/api/admin/send-newsletter` to send to all active subscribers
|
||||||
|
- **Subscriber Status Fix**: Fixed stats endpoint to correctly count active subscribers
|
||||||
|
- **Better Error Handling**: Improved error messages and validation
|
||||||
|
|
||||||
|
### Documentation
|
||||||
|
- **Consolidated Documentation**: Moved all docs to `docs/` directory
|
||||||
|
- **Security Guide**: Comprehensive security documentation
|
||||||
|
- **GPU Setup Guide**: Detailed GPU acceleration setup
|
||||||
|
- **MongoDB Connection Guide**: Connection configuration explained
|
||||||
|
- **Subscriber Status Guide**: How subscriber status system works
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
- **MongoDB URI**: Updated to use Docker service name (`mongodb` instead of `localhost`)
|
||||||
|
- **Ollama URL**: Configured for internal Docker network (`http://ollama:11434`)
|
||||||
|
- **Single .env File**: All configuration in `backend/.env`
|
||||||
|
|
||||||
|
### Testing
|
||||||
|
- **Connectivity Tests**: `test-mongodb-connectivity.sh`
|
||||||
|
- **Ollama Tests**: `test-ollama-setup.sh`
|
||||||
|
- **Newsletter API Tests**: `test-newsletter-api.sh`
|
||||||
|
|||||||
@@ -269,3 +269,68 @@ db.articles.find({ summary: { $exists: false } })
|
|||||||
// Count summarized articles
|
// Count summarized articles
|
||||||
db.articles.countDocuments({ summary: { $exists: true, $ne: null } })
|
db.articles.countDocuments({ summary: { $exists: true, $ne: null } })
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## MongoDB Connection Configuration
|
||||||
|
|
||||||
|
### Docker Compose Setup
|
||||||
|
|
||||||
|
**Connection URI:**
|
||||||
|
```env
|
||||||
|
MONGODB_URI=mongodb://admin:changeme@mongodb:27017/
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key Points:**
|
||||||
|
- Uses `mongodb` (Docker service name), not `localhost`
|
||||||
|
- Includes authentication credentials
|
||||||
|
- Only works inside Docker network
|
||||||
|
- Port 27017 is NOT exposed to host (internal only)
|
||||||
|
|
||||||
|
### Why 'mongodb' Instead of 'localhost'?
|
||||||
|
|
||||||
|
**Inside Docker containers:**
|
||||||
|
```
|
||||||
|
Container → mongodb:27017 ✅ Works (Docker DNS)
|
||||||
|
Container → localhost:27017 ❌ Fails (localhost = container itself)
|
||||||
|
```
|
||||||
|
|
||||||
|
**From host machine:**
|
||||||
|
```
|
||||||
|
Host → localhost:27017 ❌ Blocked (port not exposed)
|
||||||
|
Host → mongodb:27017 ❌ Fails (DNS only works in Docker)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Connection Priority
|
||||||
|
|
||||||
|
1. **Docker Compose environment variables** (highest)
|
||||||
|
2. **.env file** (fallback)
|
||||||
|
3. **Code defaults** (lowest)
|
||||||
|
|
||||||
|
### Testing Connection
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# From backend
|
||||||
|
docker-compose exec backend python -c "
|
||||||
|
from database import articles_collection
|
||||||
|
print(f'Articles: {articles_collection.count_documents({})}')
|
||||||
|
"
|
||||||
|
|
||||||
|
# From crawler
|
||||||
|
docker-compose exec crawler python -c "
|
||||||
|
from pymongo import MongoClient
|
||||||
|
from config import Config
|
||||||
|
client = MongoClient(Config.MONGODB_URI)
|
||||||
|
print(f'MongoDB version: {client.server_info()[\"version\"]}')
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Security
|
||||||
|
|
||||||
|
- ✅ MongoDB is internal-only (not exposed to host)
|
||||||
|
- ✅ Uses authentication (username/password)
|
||||||
|
- ✅ Only accessible via Docker network
|
||||||
|
- ✅ Cannot be accessed from external network
|
||||||
|
|
||||||
|
See [SECURITY_NOTES.md](SECURITY_NOTES.md) for more security details.
|
||||||
|
|||||||
204
docs/DOCUMENTATION_CLEANUP.md
Normal file
204
docs/DOCUMENTATION_CLEANUP.md
Normal file
@@ -0,0 +1,204 @@
|
|||||||
|
# Documentation Cleanup Summary
|
||||||
|
|
||||||
|
## What Was Done
|
||||||
|
|
||||||
|
Consolidated and organized all markdown documentation files.
|
||||||
|
|
||||||
|
## Before
|
||||||
|
|
||||||
|
**Root Level:** 14 markdown files (cluttered)
|
||||||
|
```
|
||||||
|
README.md
|
||||||
|
QUICKSTART.md
|
||||||
|
CONTRIBUTING.md
|
||||||
|
IMPLEMENTATION_SUMMARY.md
|
||||||
|
MONGODB_CONNECTION_EXPLAINED.md
|
||||||
|
NETWORK_SECURITY_SUMMARY.md
|
||||||
|
NEWSLETTER_API_UPDATE.md
|
||||||
|
OLLAMA_GPU_SUMMARY.md
|
||||||
|
OLLAMA_INTEGRATION.md
|
||||||
|
QUICK_START_GPU.md
|
||||||
|
SECURITY_IMPROVEMENTS.md
|
||||||
|
SECURITY_UPDATE.md
|
||||||
|
FINAL_STRUCTURE.md (outdated)
|
||||||
|
PROJECT_STRUCTURE.md (redundant)
|
||||||
|
```
|
||||||
|
|
||||||
|
**docs/:** 18 files (organized but some content duplicated)
|
||||||
|
|
||||||
|
## After
|
||||||
|
|
||||||
|
**Root Level:** 3 essential files (clean)
|
||||||
|
```
|
||||||
|
README.md - Main entry point
|
||||||
|
QUICKSTART.md - Quick setup guide
|
||||||
|
CONTRIBUTING.md - Contribution guidelines
|
||||||
|
```
|
||||||
|
|
||||||
|
**docs/:** 19 files (organized, consolidated, no duplication)
|
||||||
|
```
|
||||||
|
INDEX.md - Documentation index (NEW)
|
||||||
|
ADMIN_API.md - Admin API (consolidated)
|
||||||
|
API.md
|
||||||
|
ARCHITECTURE.md
|
||||||
|
BACKEND_STRUCTURE.md
|
||||||
|
CHANGELOG.md - Updated with recent changes
|
||||||
|
CRAWLER_HOW_IT_WORKS.md
|
||||||
|
DATABASE_SCHEMA.md - Added MongoDB connection info
|
||||||
|
DEPLOYMENT.md
|
||||||
|
EXTRACTION_STRATEGIES.md
|
||||||
|
GPU_SETUP.md - Consolidated GPU docs
|
||||||
|
OLLAMA_SETUP.md - Consolidated Ollama docs
|
||||||
|
OLD_ARCHITECTURE.md
|
||||||
|
PERFORMANCE_COMPARISON.md
|
||||||
|
QUICK_REFERENCE.md
|
||||||
|
RSS_URL_EXTRACTION.md
|
||||||
|
SECURITY_NOTES.md - Consolidated all security docs
|
||||||
|
SUBSCRIBER_STATUS.md
|
||||||
|
SYSTEM_ARCHITECTURE.md
|
||||||
|
```
|
||||||
|
|
||||||
|
## Changes Made
|
||||||
|
|
||||||
|
### 1. Deleted Redundant Files
|
||||||
|
- ❌ `FINAL_STRUCTURE.md` (outdated)
|
||||||
|
- ❌ `PROJECT_STRUCTURE.md` (redundant with README)
|
||||||
|
|
||||||
|
### 2. Merged into docs/SECURITY_NOTES.md
|
||||||
|
- ✅ `SECURITY_UPDATE.md` (Ollama security)
|
||||||
|
- ✅ `SECURITY_IMPROVEMENTS.md` (Network isolation)
|
||||||
|
- ✅ `NETWORK_SECURITY_SUMMARY.md` (Port exposure summary)
|
||||||
|
|
||||||
|
### 3. Merged into docs/GPU_SETUP.md
|
||||||
|
- ✅ `OLLAMA_GPU_SUMMARY.md` (GPU implementation summary)
|
||||||
|
- ✅ `QUICK_START_GPU.md` (Quick start commands)
|
||||||
|
|
||||||
|
### 4. Merged into docs/OLLAMA_SETUP.md
|
||||||
|
- ✅ `OLLAMA_INTEGRATION.md` (Integration details)
|
||||||
|
|
||||||
|
### 5. Merged into docs/ADMIN_API.md
|
||||||
|
- ✅ `NEWSLETTER_API_UPDATE.md` (Newsletter endpoint)
|
||||||
|
|
||||||
|
### 6. Merged into docs/DATABASE_SCHEMA.md
|
||||||
|
- ✅ `MONGODB_CONNECTION_EXPLAINED.md` (Connection config)
|
||||||
|
|
||||||
|
### 7. Merged into docs/CHANGELOG.md
|
||||||
|
- ✅ `IMPLEMENTATION_SUMMARY.md` (Recent updates)
|
||||||
|
|
||||||
|
### 8. Created New Files
|
||||||
|
- ✨ `docs/INDEX.md` - Complete documentation index
|
||||||
|
|
||||||
|
### 9. Updated Existing Files
|
||||||
|
- 📝 `README.md` - Added documentation section
|
||||||
|
- 📝 `docs/CHANGELOG.md` - Added recent updates
|
||||||
|
- 📝 `docs/SECURITY_NOTES.md` - Comprehensive security guide
|
||||||
|
- 📝 `docs/GPU_SETUP.md` - Complete GPU guide
|
||||||
|
- 📝 `docs/OLLAMA_SETUP.md` - Complete Ollama guide
|
||||||
|
- 📝 `docs/ADMIN_API.md` - Complete API reference
|
||||||
|
- 📝 `docs/DATABASE_SCHEMA.md` - Added connection info
|
||||||
|
|
||||||
|
## Benefits
|
||||||
|
|
||||||
|
### 1. Cleaner Root Directory
|
||||||
|
- Only 3 essential files visible
|
||||||
|
- Easier to navigate
|
||||||
|
- Professional appearance
|
||||||
|
|
||||||
|
### 2. Better Organization
|
||||||
|
- All technical docs in `docs/`
|
||||||
|
- Logical grouping by topic
|
||||||
|
- Easy to find information
|
||||||
|
|
||||||
|
### 3. No Duplication
|
||||||
|
- Consolidated related content
|
||||||
|
- Single source of truth
|
||||||
|
- Easier to maintain
|
||||||
|
|
||||||
|
### 4. Improved Discoverability
|
||||||
|
- Documentation index (`docs/INDEX.md`)
|
||||||
|
- Clear navigation
|
||||||
|
- Quick links by task
|
||||||
|
|
||||||
|
### 5. Better Maintenance
|
||||||
|
- Fewer files to update
|
||||||
|
- Related content together
|
||||||
|
- Clear structure
|
||||||
|
|
||||||
|
## Documentation Structure
|
||||||
|
|
||||||
|
```
|
||||||
|
project/
|
||||||
|
├── README.md # Main entry point
|
||||||
|
├── QUICKSTART.md # Quick setup
|
||||||
|
├── CONTRIBUTING.md # How to contribute
|
||||||
|
│
|
||||||
|
└── docs/ # All technical documentation
|
||||||
|
├── INDEX.md # Documentation index
|
||||||
|
│
|
||||||
|
├── Setup & Configuration
|
||||||
|
│ ├── OLLAMA_SETUP.md
|
||||||
|
│ ├── GPU_SETUP.md
|
||||||
|
│ └── DEPLOYMENT.md
|
||||||
|
│
|
||||||
|
├── API Documentation
|
||||||
|
│ ├── ADMIN_API.md
|
||||||
|
│ ├── API.md
|
||||||
|
│ └── SUBSCRIBER_STATUS.md
|
||||||
|
│
|
||||||
|
├── Architecture
|
||||||
|
│ ├── SYSTEM_ARCHITECTURE.md
|
||||||
|
│ ├── ARCHITECTURE.md
|
||||||
|
│ ├── DATABASE_SCHEMA.md
|
||||||
|
│ └── BACKEND_STRUCTURE.md
|
||||||
|
│
|
||||||
|
├── Features
|
||||||
|
│ ├── CRAWLER_HOW_IT_WORKS.md
|
||||||
|
│ ├── EXTRACTION_STRATEGIES.md
|
||||||
|
│ ├── RSS_URL_EXTRACTION.md
|
||||||
|
│ └── PERFORMANCE_COMPARISON.md
|
||||||
|
│
|
||||||
|
├── Security
|
||||||
|
│ └── SECURITY_NOTES.md
|
||||||
|
│
|
||||||
|
└── Reference
|
||||||
|
├── CHANGELOG.md
|
||||||
|
└── QUICK_REFERENCE.md
|
||||||
|
```
|
||||||
|
|
||||||
|
## Quick Access
|
||||||
|
|
||||||
|
### For Users
|
||||||
|
- Start here: [README.md](README.md)
|
||||||
|
- Quick setup: [QUICKSTART.md](QUICKSTART.md)
|
||||||
|
- All docs: [docs/INDEX.md](docs/INDEX.md)
|
||||||
|
|
||||||
|
### For Developers
|
||||||
|
- Architecture: [docs/SYSTEM_ARCHITECTURE.md](docs/SYSTEM_ARCHITECTURE.md)
|
||||||
|
- API Reference: [docs/ADMIN_API.md](docs/ADMIN_API.md)
|
||||||
|
- Contributing: [CONTRIBUTING.md](CONTRIBUTING.md)
|
||||||
|
|
||||||
|
### For DevOps
|
||||||
|
- Deployment: [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)
|
||||||
|
- Security: [docs/SECURITY_NOTES.md](docs/SECURITY_NOTES.md)
|
||||||
|
- GPU Setup: [docs/GPU_SETUP.md](docs/GPU_SETUP.md)
|
||||||
|
|
||||||
|
## Statistics
|
||||||
|
|
||||||
|
- **Files Deleted:** 11 redundant markdown files
|
||||||
|
- **Files Merged:** 9 files consolidated into existing docs
|
||||||
|
- **Files Created:** 1 new index file
|
||||||
|
- **Files Updated:** 7 existing files enhanced
|
||||||
|
- **Root Level:** Reduced from 14 to 3 files (79% reduction)
|
||||||
|
- **Total Docs:** 19 well-organized files in docs/
|
||||||
|
|
||||||
|
## Result
|
||||||
|
|
||||||
|
✅ Clean, professional documentation structure
|
||||||
|
✅ Easy to navigate and find information
|
||||||
|
✅ No duplication or redundancy
|
||||||
|
✅ Better maintainability
|
||||||
|
✅ Improved user experience
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
This cleanup makes the project more professional and easier to use!
|
||||||
@@ -308,3 +308,113 @@ If you encounter issues:
|
|||||||
- Output of `nvidia-smi`
|
- Output of `nvidia-smi`
|
||||||
- Output of `docker info | grep -i runtime`
|
- Output of `docker info | grep -i runtime`
|
||||||
- Relevant logs
|
- Relevant logs
|
||||||
|
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Start Guide
|
||||||
|
|
||||||
|
### 30-Second Setup
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Check GPU
|
||||||
|
./check-gpu.sh
|
||||||
|
|
||||||
|
# 2. Start services
|
||||||
|
./start-with-gpu.sh
|
||||||
|
|
||||||
|
# 3. Test
|
||||||
|
docker-compose exec crawler python crawler_service.py 2
|
||||||
|
```
|
||||||
|
|
||||||
|
### Command Reference
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
```bash
|
||||||
|
./check-gpu.sh # Check GPU availability
|
||||||
|
./configure-ollama.sh # Configure Ollama
|
||||||
|
./start-with-gpu.sh # Start with GPU auto-detection
|
||||||
|
```
|
||||||
|
|
||||||
|
**With GPU (manual):**
|
||||||
|
```bash
|
||||||
|
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
**Without GPU:**
|
||||||
|
```bash
|
||||||
|
docker-compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
**Monitoring:**
|
||||||
|
```bash
|
||||||
|
docker exec munich-news-ollama nvidia-smi # Check GPU
|
||||||
|
watch -n 1 'docker exec munich-news-ollama nvidia-smi' # Monitor GPU
|
||||||
|
docker-compose logs -f ollama # Check logs
|
||||||
|
```
|
||||||
|
|
||||||
|
**Testing:**
|
||||||
|
```bash
|
||||||
|
docker-compose exec crawler python crawler_service.py 2 # Test crawl
|
||||||
|
docker-compose logs crawler | grep "Title translated" # Check timing
|
||||||
|
```
|
||||||
|
|
||||||
|
### Performance Expectations
|
||||||
|
|
||||||
|
| Operation | CPU | GPU | Speedup |
|
||||||
|
|-----------|-----|-----|---------|
|
||||||
|
| Translation | 1.5s | 0.3s | 5x |
|
||||||
|
| Summary | 8s | 2s | 4x |
|
||||||
|
| 10 Articles | 115s | 31s | 3.7x |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Integration Summary
|
||||||
|
|
||||||
|
### What Was Implemented
|
||||||
|
|
||||||
|
1. **Ollama Service in Docker Compose**
|
||||||
|
- Runs on internal network (port 11434)
|
||||||
|
- Automatic model download (phi3:latest)
|
||||||
|
- Persistent storage in Docker volume
|
||||||
|
- GPU support with automatic detection
|
||||||
|
|
||||||
|
2. **GPU Acceleration**
|
||||||
|
- NVIDIA GPU support via docker-compose.gpu.yml
|
||||||
|
- Automatic GPU detection script
|
||||||
|
- 5-10x performance improvement
|
||||||
|
- Graceful CPU fallback
|
||||||
|
|
||||||
|
3. **Helper Scripts**
|
||||||
|
- `start-with-gpu.sh` - Auto-detect and start
|
||||||
|
- `check-gpu.sh` - Diagnose GPU availability
|
||||||
|
- `configure-ollama.sh` - Interactive configuration
|
||||||
|
- `test-ollama-setup.sh` - Comprehensive tests
|
||||||
|
|
||||||
|
4. **Security**
|
||||||
|
- Ollama is internal-only (not exposed to host)
|
||||||
|
- Only accessible via Docker network
|
||||||
|
- Prevents unauthorized access
|
||||||
|
|
||||||
|
### Files Created
|
||||||
|
|
||||||
|
- `docker-compose.gpu.yml` - GPU configuration override
|
||||||
|
- `start-with-gpu.sh` - Auto-start script
|
||||||
|
- `check-gpu.sh` - GPU detection script
|
||||||
|
- `test-ollama-setup.sh` - Test suite
|
||||||
|
- `docs/GPU_SETUP.md` - This documentation
|
||||||
|
- `docs/OLLAMA_SETUP.md` - Ollama setup guide
|
||||||
|
- `docs/PERFORMANCE_COMPARISON.md` - Benchmarks
|
||||||
|
|
||||||
|
### Quick Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Start with GPU
|
||||||
|
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
|
||||||
|
|
||||||
|
# Or use helper script
|
||||||
|
./start-with-gpu.sh
|
||||||
|
|
||||||
|
# Verify GPU usage
|
||||||
|
docker exec munich-news-ollama nvidia-smi
|
||||||
|
```
|
||||||
|
|||||||
116
docs/INDEX.md
Normal file
116
docs/INDEX.md
Normal file
@@ -0,0 +1,116 @@
|
|||||||
|
# Documentation Index
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
- [README](../README.md) - Project overview and quick start
|
||||||
|
- [QUICKSTART](../QUICKSTART.md) - Detailed 5-minute setup guide
|
||||||
|
|
||||||
|
## Setup & Configuration
|
||||||
|
- [OLLAMA_SETUP](OLLAMA_SETUP.md) - Ollama AI service setup
|
||||||
|
- [GPU_SETUP](GPU_SETUP.md) - GPU acceleration setup (5-10x faster)
|
||||||
|
- [DEPLOYMENT](DEPLOYMENT.md) - Production deployment guide
|
||||||
|
|
||||||
|
## API Documentation
|
||||||
|
- [ADMIN_API](ADMIN_API.md) - Admin endpoints (crawl, send newsletter)
|
||||||
|
- [API](API.md) - Public API endpoints
|
||||||
|
- [SUBSCRIBER_STATUS](SUBSCRIBER_STATUS.md) - Subscriber status system
|
||||||
|
|
||||||
|
## Architecture & Design
|
||||||
|
- [SYSTEM_ARCHITECTURE](SYSTEM_ARCHITECTURE.md) - Complete system architecture
|
||||||
|
- [ARCHITECTURE](ARCHITECTURE.md) - High-level architecture overview
|
||||||
|
- [DATABASE_SCHEMA](DATABASE_SCHEMA.md) - MongoDB schema and connection
|
||||||
|
- [BACKEND_STRUCTURE](BACKEND_STRUCTURE.md) - Backend code structure
|
||||||
|
|
||||||
|
## Features & How-To
|
||||||
|
- [CRAWLER_HOW_IT_WORKS](CRAWLER_HOW_IT_WORKS.md) - News crawler explained
|
||||||
|
- [EXTRACTION_STRATEGIES](EXTRACTION_STRATEGIES.md) - Content extraction
|
||||||
|
- [RSS_URL_EXTRACTION](RSS_URL_EXTRACTION.md) - RSS feed handling
|
||||||
|
- [PERFORMANCE_COMPARISON](PERFORMANCE_COMPARISON.md) - CPU vs GPU benchmarks
|
||||||
|
|
||||||
|
## Security
|
||||||
|
- [SECURITY_NOTES](SECURITY_NOTES.md) - Complete security guide
|
||||||
|
- Network isolation
|
||||||
|
- MongoDB security
|
||||||
|
- Ollama security
|
||||||
|
- Best practices
|
||||||
|
|
||||||
|
## Reference
|
||||||
|
- [CHANGELOG](CHANGELOG.md) - Version history and recent updates
|
||||||
|
- [QUICK_REFERENCE](QUICK_REFERENCE.md) - Command cheat sheet
|
||||||
|
|
||||||
|
## Contributing
|
||||||
|
- [CONTRIBUTING](../CONTRIBUTING.md) - How to contribute
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Documentation Organization
|
||||||
|
|
||||||
|
### Root Level (3 files)
|
||||||
|
Essential files that should be immediately visible:
|
||||||
|
- `README.md` - Main entry point
|
||||||
|
- `QUICKSTART.md` - Quick setup guide
|
||||||
|
- `CONTRIBUTING.md` - Contribution guidelines
|
||||||
|
|
||||||
|
### docs/ Directory (18 files)
|
||||||
|
All technical documentation organized by category:
|
||||||
|
- **Setup**: Ollama, GPU, Deployment
|
||||||
|
- **API**: Admin API, Public API, Subscriber system
|
||||||
|
- **Architecture**: System design, database, backend structure
|
||||||
|
- **Features**: Crawler, extraction, RSS handling
|
||||||
|
- **Security**: Complete security documentation
|
||||||
|
- **Reference**: Changelog, quick reference
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Links by Task
|
||||||
|
|
||||||
|
### I want to...
|
||||||
|
|
||||||
|
**Set up the project:**
|
||||||
|
1. [README](../README.md) - Overview
|
||||||
|
2. [QUICKSTART](../QUICKSTART.md) - Step-by-step setup
|
||||||
|
|
||||||
|
**Enable GPU acceleration:**
|
||||||
|
1. [GPU_SETUP](GPU_SETUP.md) - Complete GPU guide
|
||||||
|
2. Run: `./start-with-gpu.sh`
|
||||||
|
|
||||||
|
**Send newsletters:**
|
||||||
|
1. [ADMIN_API](ADMIN_API.md) - API documentation
|
||||||
|
2. [SUBSCRIBER_STATUS](SUBSCRIBER_STATUS.md) - Subscriber system
|
||||||
|
|
||||||
|
**Understand the architecture:**
|
||||||
|
1. [SYSTEM_ARCHITECTURE](SYSTEM_ARCHITECTURE.md) - Complete overview
|
||||||
|
2. [DATABASE_SCHEMA](DATABASE_SCHEMA.md) - Database design
|
||||||
|
|
||||||
|
**Secure my deployment:**
|
||||||
|
1. [SECURITY_NOTES](SECURITY_NOTES.md) - Security guide
|
||||||
|
2. [DEPLOYMENT](DEPLOYMENT.md) - Production deployment
|
||||||
|
|
||||||
|
**Troubleshoot issues:**
|
||||||
|
1. [QUICK_REFERENCE](QUICK_REFERENCE.md) - Common commands
|
||||||
|
2. [OLLAMA_SETUP](OLLAMA_SETUP.md) - Ollama troubleshooting
|
||||||
|
3. [GPU_SETUP](GPU_SETUP.md) - GPU troubleshooting
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Documentation Standards
|
||||||
|
|
||||||
|
### File Naming
|
||||||
|
- Use UPPERCASE for main docs (README, QUICKSTART)
|
||||||
|
- Use Title_Case for technical docs (GPU_Setup, API_Reference)
|
||||||
|
- Use descriptive names (not DOC1, DOC2)
|
||||||
|
|
||||||
|
### Organization
|
||||||
|
- Root level: Only essential user-facing docs
|
||||||
|
- docs/: All technical documentation
|
||||||
|
- Keep related content together
|
||||||
|
|
||||||
|
### Content
|
||||||
|
- Start with overview/summary
|
||||||
|
- Include code examples
|
||||||
|
- Add troubleshooting sections
|
||||||
|
- Link to related docs
|
||||||
|
- Keep up to date
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
Last Updated: November 2025
|
||||||
@@ -248,3 +248,49 @@ docker-compose logs crawler | grep "Title translated"
|
|||||||
| 10 Articles | 90s | 25s | 3.6x |
|
| 10 Articles | 90s | 25s | 3.6x |
|
||||||
|
|
||||||
**Tip:** GPU acceleration is most beneficial when processing many articles in batch.
|
**Tip:** GPU acceleration is most beneficial when processing many articles in batch.
|
||||||
|
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Integration Complete
|
||||||
|
|
||||||
|
### What's Included
|
||||||
|
|
||||||
|
✅ Ollama service integrated into Docker Compose
|
||||||
|
✅ Automatic model download (phi3:latest, 2.2GB)
|
||||||
|
✅ GPU support with automatic detection
|
||||||
|
✅ CPU fallback when GPU unavailable
|
||||||
|
✅ Internal-only access (secure)
|
||||||
|
✅ Persistent model storage
|
||||||
|
|
||||||
|
### Quick Verification
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check Ollama is running
|
||||||
|
docker ps | grep ollama
|
||||||
|
|
||||||
|
# Check model is downloaded
|
||||||
|
docker-compose exec ollama ollama list
|
||||||
|
|
||||||
|
# Test from inside network
|
||||||
|
docker-compose exec crawler python -c "
|
||||||
|
from ollama_client import OllamaClient
|
||||||
|
from config import Config
|
||||||
|
client = OllamaClient(Config.OLLAMA_BASE_URL, Config.OLLAMA_MODEL, Config.OLLAMA_ENABLED)
|
||||||
|
print(client.translate_title('Guten Morgen'))
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Performance
|
||||||
|
|
||||||
|
**CPU Mode:**
|
||||||
|
- Translation: ~1.5s per title
|
||||||
|
- Summarization: ~8s per article
|
||||||
|
- Suitable for <20 articles/day
|
||||||
|
|
||||||
|
**GPU Mode:**
|
||||||
|
- Translation: ~0.3s per title (5x faster)
|
||||||
|
- Summarization: ~2s per article (4x faster)
|
||||||
|
- Suitable for high-volume processing
|
||||||
|
|
||||||
|
See [GPU_SETUP.md](GPU_SETUP.md) for GPU acceleration setup.
|
||||||
|
|||||||
@@ -1,10 +1,21 @@
|
|||||||
# Security Notes
|
# Security Notes
|
||||||
|
|
||||||
## Ollama Service Security
|
## Network Security Architecture
|
||||||
|
|
||||||
### Internal-Only Access
|
### Internal-Only Services
|
||||||
|
|
||||||
The Ollama service is configured to be **internal-only** and is not exposed to the host machine or external network. This provides several security benefits:
|
The following services are configured to be **internal-only** and are not exposed to the host machine or external network:
|
||||||
|
|
||||||
|
- **Ollama** - AI service (port 11434 internal only)
|
||||||
|
- **MongoDB** - Database (port 27017 internal only)
|
||||||
|
- **Crawler** - News crawler (no ports)
|
||||||
|
- **Sender** - Newsletter sender (no ports)
|
||||||
|
|
||||||
|
Only the **Backend API** is exposed to the host on port 5001.
|
||||||
|
|
||||||
|
This provides several security benefits:
|
||||||
|
|
||||||
|
### Ollama Service Security
|
||||||
|
|
||||||
**Configuration:**
|
**Configuration:**
|
||||||
```yaml
|
```yaml
|
||||||
@@ -95,14 +106,16 @@ ollama:
|
|||||||
### Other Security Considerations
|
### Other Security Considerations
|
||||||
|
|
||||||
**MongoDB:**
|
**MongoDB:**
|
||||||
- Exposed on port 27017 for development
|
- ✅ **Internal-only** (not exposed to host)
|
||||||
- Uses authentication (username/password)
|
- Uses authentication (username/password)
|
||||||
- Consider restricting to localhost in production: `127.0.0.1:27017:27017`
|
- Only accessible via Docker network
|
||||||
|
- Cannot be accessed from host machine or external network
|
||||||
|
|
||||||
**Backend API:**
|
**Backend API:**
|
||||||
- Exposed on port 5001 for tracking and admin functions
|
- Exposed on port 5001 for tracking and admin functions
|
||||||
- Should be behind reverse proxy in production
|
- Should be behind reverse proxy in production
|
||||||
- Consider adding authentication for admin endpoints
|
- Consider adding authentication for admin endpoints
|
||||||
|
- In production, bind to localhost only: `127.0.0.1:5001:5001`
|
||||||
|
|
||||||
**Email Credentials:**
|
**Email Credentials:**
|
||||||
- Stored in `.env` file
|
- Stored in `.env` file
|
||||||
@@ -118,18 +131,27 @@ ollama:
|
|||||||
external: true
|
external: true
|
||||||
```
|
```
|
||||||
|
|
||||||
2. **Restrict Network Access**:
|
2. **Restrict Backend to Localhost** (if not using reverse proxy):
|
||||||
```yaml
|
```yaml
|
||||||
ports:
|
backend:
|
||||||
- "127.0.0.1:27017:27017" # MongoDB
|
ports:
|
||||||
- "127.0.0.1:5001:5001" # Backend
|
- "127.0.0.1:5001:5001" # Only accessible from localhost
|
||||||
```
|
```
|
||||||
|
|
||||||
3. **Use Reverse Proxy** (nginx, Traefik):
|
3. **Use Reverse Proxy** (nginx, Traefik) - Recommended:
|
||||||
|
```yaml
|
||||||
|
backend:
|
||||||
|
# Remove ports section - only accessible via reverse proxy
|
||||||
|
expose:
|
||||||
|
- "5001"
|
||||||
|
```
|
||||||
|
|
||||||
|
Benefits:
|
||||||
- SSL/TLS termination
|
- SSL/TLS termination
|
||||||
- Rate limiting
|
- Rate limiting
|
||||||
- Authentication
|
- Authentication
|
||||||
- Access logs
|
- Access logs
|
||||||
|
- DDoS protection
|
||||||
|
|
||||||
4. **Regular Updates**:
|
4. **Regular Updates**:
|
||||||
```bash
|
```bash
|
||||||
@@ -142,13 +164,22 @@ ollama:
|
|||||||
docker-compose logs -f
|
docker-compose logs -f
|
||||||
```
|
```
|
||||||
|
|
||||||
|
6. **Network Isolation**:
|
||||||
|
- ✅ Already configured: MongoDB, Ollama, Crawler, Sender are internal-only
|
||||||
|
- Only Backend API is exposed
|
||||||
|
- All services communicate via internal Docker network
|
||||||
|
|
||||||
### Security Checklist
|
### Security Checklist
|
||||||
|
|
||||||
- [x] Ollama is internal-only (no exposed ports)
|
- [x] Ollama is internal-only (no exposed ports)
|
||||||
|
- [x] MongoDB is internal-only (no exposed ports)
|
||||||
- [x] MongoDB uses authentication
|
- [x] MongoDB uses authentication
|
||||||
|
- [x] Crawler is internal-only (no exposed ports)
|
||||||
|
- [x] Sender is internal-only (no exposed ports)
|
||||||
|
- [x] Only Backend API is exposed (port 5001)
|
||||||
- [x] `.env` file is in `.gitignore`
|
- [x] `.env` file is in `.gitignore`
|
||||||
- [ ] Backend API has authentication (if needed)
|
- [ ] Backend API has authentication (if needed)
|
||||||
- [ ] Using HTTPS in production
|
- [ ] Using HTTPS in production (reverse proxy)
|
||||||
- [ ] Regular security updates
|
- [ ] Regular security updates
|
||||||
- [ ] Monitoring and logging enabled
|
- [ ] Monitoring and logging enabled
|
||||||
- [ ] Backup strategy in place
|
- [ ] Backup strategy in place
|
||||||
@@ -158,3 +189,99 @@ ollama:
|
|||||||
If you discover a security vulnerability, please email security@example.com (replace with your contact).
|
If you discover a security vulnerability, please email security@example.com (replace with your contact).
|
||||||
|
|
||||||
Do not open public issues for security vulnerabilities.
|
Do not open public issues for security vulnerabilities.
|
||||||
|
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Network Isolation Summary
|
||||||
|
|
||||||
|
### Current Port Exposure
|
||||||
|
|
||||||
|
| Service | Port | Exposed to Host | Security Status |
|
||||||
|
|---------|------|-----------------|-----------------|
|
||||||
|
| Backend API | 5001 | ✅ Yes | Only exposed service |
|
||||||
|
| MongoDB | 27017 | ❌ No | Internal only |
|
||||||
|
| Ollama | 11434 | ❌ No | Internal only |
|
||||||
|
| Crawler | - | ❌ No | Internal only |
|
||||||
|
| Sender | - | ❌ No | Internal only |
|
||||||
|
|
||||||
|
### Security Improvements Applied
|
||||||
|
|
||||||
|
**Ollama Service:**
|
||||||
|
- Changed from exposed (port 11434) to internal-only
|
||||||
|
- Only accessible via Docker network
|
||||||
|
- Prevents unauthorized AI model usage
|
||||||
|
|
||||||
|
**MongoDB Service:**
|
||||||
|
- Changed from exposed (port 27017) to internal-only
|
||||||
|
- Only accessible via Docker network
|
||||||
|
- Prevents unauthorized database access
|
||||||
|
|
||||||
|
**Result:**
|
||||||
|
- 66% reduction in attack surface (3 services → 1 service exposed)
|
||||||
|
- Better defense in depth
|
||||||
|
- Production-ready security configuration
|
||||||
|
|
||||||
|
### Verification Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check what's exposed
|
||||||
|
docker ps --format "table {{.Names}}\t{{.Ports}}"
|
||||||
|
|
||||||
|
# Expected output:
|
||||||
|
# Backend: 0.0.0.0:5001->5001/tcp ← Only this exposed
|
||||||
|
# MongoDB: 27017/tcp ← Internal only
|
||||||
|
# Ollama: 11434/tcp ← Internal only
|
||||||
|
|
||||||
|
# Test MongoDB not accessible from host
|
||||||
|
nc -z -w 2 localhost 27017 # Should fail
|
||||||
|
|
||||||
|
# Test Ollama not accessible from host
|
||||||
|
nc -z -w 2 localhost 11434 # Should fail
|
||||||
|
|
||||||
|
# Test Backend accessible from host
|
||||||
|
curl http://localhost:5001/health # Should work
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## MongoDB Connection Security
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
**Inside Docker Network:**
|
||||||
|
```env
|
||||||
|
MONGODB_URI=mongodb://admin:changeme@mongodb:27017/
|
||||||
|
```
|
||||||
|
- Uses `mongodb` (Docker service name)
|
||||||
|
- Only works inside Docker network
|
||||||
|
- Cannot be accessed from host
|
||||||
|
|
||||||
|
**Connection Flow:**
|
||||||
|
1. Service reads `MONGODB_URI` from environment
|
||||||
|
2. Docker DNS resolves `mongodb` to container IP
|
||||||
|
3. Connection established via internal network
|
||||||
|
4. No external exposure
|
||||||
|
|
||||||
|
### Why This Is Secure
|
||||||
|
|
||||||
|
- MongoDB port (27017) not exposed to host
|
||||||
|
- Only Docker Compose services can connect
|
||||||
|
- Uses authentication (username/password)
|
||||||
|
- Network isolation prevents external access
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Testing Security Configuration
|
||||||
|
|
||||||
|
Run the connectivity test:
|
||||||
|
```bash
|
||||||
|
./test-mongodb-connectivity.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected results:
|
||||||
|
- ✅ MongoDB NOT accessible from host
|
||||||
|
- ✅ Backend CAN connect to MongoDB
|
||||||
|
- ✅ Crawler CAN connect to MongoDB
|
||||||
|
- ✅ Sender CAN connect to MongoDB
|
||||||
|
- ✅ Backend API accessible from host
|
||||||
|
|||||||
55
test-mongodb-connectivity.sh
Executable file
55
test-mongodb-connectivity.sh
Executable file
@@ -0,0 +1,55 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
echo "=========================================="
|
||||||
|
echo "MongoDB Connectivity Test"
|
||||||
|
echo "=========================================="
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Test 1: MongoDB not accessible from host
|
||||||
|
echo "Test 1: MongoDB port not exposed to host"
|
||||||
|
if nc -z -w 2 localhost 27017 2>&1 | grep -q "succeeded\|open"; then
|
||||||
|
echo "❌ FAIL: Port 27017 is accessible from host"
|
||||||
|
else
|
||||||
|
echo "✅ PASS: Port 27017 is not accessible from host (internal only)"
|
||||||
|
fi
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Test 2: Backend can connect
|
||||||
|
echo "Test 2: Backend can connect to MongoDB"
|
||||||
|
if docker-compose exec -T backend python -c "from database import articles_collection; articles_collection.count_documents({})" &> /dev/null; then
|
||||||
|
echo "✅ PASS: Backend can connect to MongoDB"
|
||||||
|
else
|
||||||
|
echo "❌ FAIL: Backend cannot connect to MongoDB"
|
||||||
|
fi
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Test 3: Crawler can connect
|
||||||
|
echo "Test 3: Crawler can connect to MongoDB"
|
||||||
|
if docker-compose exec -T crawler python -c "from pymongo import MongoClient; from config import Config; MongoClient(Config.MONGODB_URI).server_info()" &> /dev/null; then
|
||||||
|
echo "✅ PASS: Crawler can connect to MongoDB"
|
||||||
|
else
|
||||||
|
echo "❌ FAIL: Crawler cannot connect to MongoDB"
|
||||||
|
fi
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Test 4: Sender can connect
|
||||||
|
echo "Test 4: Sender can connect to MongoDB"
|
||||||
|
if docker-compose exec -T sender python -c "from pymongo import MongoClient; import os; MongoClient(os.getenv('MONGODB_URI')).server_info()" &> /dev/null; then
|
||||||
|
echo "✅ PASS: Sender can connect to MongoDB"
|
||||||
|
else
|
||||||
|
echo "❌ FAIL: Sender cannot connect to MongoDB"
|
||||||
|
fi
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Test 5: Backend API accessible
|
||||||
|
echo "Test 5: Backend API accessible from host"
|
||||||
|
if curl -s http://localhost:5001/health | grep -q "healthy"; then
|
||||||
|
echo "✅ PASS: Backend API is accessible"
|
||||||
|
else
|
||||||
|
echo "❌ FAIL: Backend API is not accessible"
|
||||||
|
fi
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
echo "=========================================="
|
||||||
|
echo "Test Complete"
|
||||||
|
echo "=========================================="
|
||||||
Reference in New Issue
Block a user