# Munich News Daily - Automated Newsletter System

A fully automated news aggregation and newsletter system that crawls Munich news sources, generates AI summaries, and sends daily newsletters with engagement tracking.

## ✨ Key Features

- **🤖 AI-Powered Clustering** - Automatically detects duplicate stories from different sources
- **📰 Neutral Summaries** - Combines multiple perspectives into balanced coverage
- **🎯 Smart Prioritization** - Shows most important stories first (multi-source coverage)
- **🎨 Personalized Newsletters** - AI-powered content recommendations based on user interests
- **📊 Engagement Tracking** - Open rates, click tracking, and analytics
- **⚡ GPU Acceleration** - 5-10x faster AI processing with GPU support
- **🔒 GDPR Compliant** - Privacy-first with data retention controls

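
As a rough illustration of the prioritization idea in the list above, the sketch below ranks story clusters by how many distinct sources cover them. The `Cluster` type, its field names, and the tie-breaking rule are hypothetical and not taken from the project's code; the actual ranking logic may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical container for one story and the sources that reported it."""
    title: str
    sources: set[str] = field(default_factory=set)


def prioritize(clusters: list[Cluster]) -> list[Cluster]:
    """Order clusters so stories covered by more distinct sources come first.

    Ties are broken alphabetically by title to keep the order deterministic.
    """
    return sorted(clusters, key=lambda c: (-len(c.sources), c.title))


if __name__ == "__main__":
    stories = [
        Cluster("Oktoberfest dates announced", {"source-a"}),
        Cluster("S-Bahn disruption", {"source-a", "source-b", "source-c"}),
        Cluster("New bike lanes", {"source-b", "source-c"}),
    ]
    for cluster in prioritize(stories):
        print(len(cluster.sources), cluster.title)
```
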
**🚀 NEW:** GPU acceleration support for 5-10x faster AI processing! See [docs/GPU_SETUP.md](docs/GPU_SETUP.md)

## 🚀 Quick Start

```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings

# 2. Start everything
docker-compose up -d

# 3. View logs
docker-compose logs -f
```

That's it! Once the containers are running, the system provides these services and scheduled jobs:
- **Frontend**: Web interface and admin dashboard (http://localhost:3000)
- **Backend API**: Runs continuously for tracking and analytics (http://localhost:5001)
- **6:00 AM Berlin time**: Crawls news articles and generates summaries
- **7:00 AM Berlin time**: Sends the newsletter to all subscribers

### Access Points

- **Newsletter Page**: http://localhost:3000
- **Admin Dashboard**: http://localhost:3000/admin.html
- **Backend API**: http://localhost:5001

📖 **New to the project?** See [QUICKSTART.md](QUICKSTART.md) for a detailed 5-minute setup guide.

🚀 **GPU Acceleration:** Enable 5-10x faster AI processing with [GPU Setup Guide](docs/GPU_SETUP.md)

## 📋 System Overview

```
6:00 AM → News Crawler
    ↓
    Fetches articles from RSS feeds
    Extracts full content
    Generates AI summaries
    Saves to MongoDB
    ↓
7:00 AM → Newsletter Sender
    ↓
    Waits for crawler to finish
    Fetches today's articles
    Generates newsletter with tracking
    Sends to all subscribers
    ↓
    ✅ Done! Repeat tomorrow
```

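The hand-off in the diagram, where the sender waits for the crawler before it fetches articles, could be sketched as follows. This is a minimal single-process illustration using `threading.Event`; the function names and the timeout are assumptions. The real services run in separate containers, so they presumably coordinate differently (for example through MongoDB).

```python
import threading

crawl_finished = threading.Event()
log: list[str] = []


def run_crawler() -> None:
    """Stand-in for the crawl job: fetch, extract, summarize, save."""
    log.append("crawler: articles saved")
    crawl_finished.set()  # signal that today's articles are ready


def run_sender(timeout_seconds: float = 5.0) -> bool:
    """Stand-in for the send job: wait for the crawler, then send.

    Returns False if the crawler did not finish within the timeout.
    """
    if not crawl_finished.wait(timeout=timeout_seconds):
        log.append("sender: crawler not finished, skipping send")
        return False
    log.append("sender: newsletter sent")
    return True


if __name__ == "__main__":
    sender = threading.Thread(target=run_sender)
    sender.start()   # the sender starts first and blocks on the event
    run_crawler()    # the crawler finishes and releases the sender
    sender.join()
    print("\n".join(log))
```

The timeout keeps the sender from blocking forever if the crawl fails; what should happen in that case (skip, retry, send yesterday's articles) is a design decision this sketch leaves open.
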
## 🏗️ Architecture

### Components

- **Ollama**: AI service for summarization and translation (internal only, GPU-accelerated)
- **MongoDB**: Data storage for articles, subscribers, and tracking (internal only)
- **Backend API**: Flask API for tracking and analytics (port 5001, the only backend service exposed to the host)
- **News Crawler**: Automated RSS feed crawler with AI summarization (internal only)
- **Newsletter Sender**: Automated email sender with tracking (internal only)
- **Frontend**: React dashboard (optional, port 3000)

### Technology Stack

- Python 3.11
- MongoDB 7.0
- Ollama (phi3:latest model for AI)
- Docker & Docker Compose
- Flask (API)
- Schedule (automation)
- Jinja2 (email templates)

## 📦 Installation

### Prerequisites

- Docker & Docker Compose
- 4GB+ RAM (for Ollama AI models)
- (Optional) NVIDIA GPU for 5-10x faster AI processing

### Setup

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd munich-news
   ```

2. **Configure environment**
   ```bash
   cp backend/.env.example backend/.env
   # Edit backend/.env with your settings
   ```

3. **Configure Ollama (AI features)**
   ```bash
   # Option 1: Use integrated Docker Compose Ollama (recommended)
   ./configure-ollama.sh
   # Select option 1

   # Option 2: Use external Ollama server
   # Install from https://ollama.ai/download
   # Then run: ollama pull phi3:latest
   ```

4. **Start the system**
   ```bash
   # Auto-detect GPU and start (recommended)
   ./start-with-gpu.sh

   # Or start manually
   docker-compose up -d

   # First time: Wait for Ollama model download (2-5 minutes)
   docker-compose logs -f ollama-setup
   ```

📖 **For detailed Ollama setup & GPU acceleration:** See [docs/OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md)

💡 **To change AI model:** Edit `OLLAMA_MODEL` in `.env`, then run `./pull-ollama-model.sh`. See [docs/CHANGING_AI_MODEL.md](docs/CHANGING_AI_MODEL.md)

## ⚙️ Configuration

Edit `backend/.env`:

```env
# MongoDB
MONGODB_URI=mongodb://localhost:27017/

# Email (SMTP)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password

# Newsletter
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24

# Tracking
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90

# Ollama (AI Summarization)
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest
```

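A minimal sketch of how a service might read these settings at startup, assuming they reach the process as environment variables (for example through a Docker Compose `env_file`). The `Settings` class and the `load_settings` helper are hypothetical; the project's own configuration code may be organized differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view of a few values from backend/.env."""
    smtp_server: str
    smtp_port: int
    max_articles: int
    hours_lookback: int
    tracking_enabled: bool
    retention_days: int


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ).

    The fallback values mirror the example file above.
    """
    env = os.environ if env is None else env
    return Settings(
        smtp_server=env.get("SMTP_SERVER", "smtp.gmail.com"),
        smtp_port=int(env.get("SMTP_PORT", "587")),
        max_articles=int(env.get("NEWSLETTER_MAX_ARTICLES", "10")),
        hours_lookback=int(env.get("NEWSLETTER_HOURS_LOOKBACK", "24")),
        tracking_enabled=_as_bool(env.get("TRACKING_ENABLED", "true")),
        retention_days=int(env.get("TRACKING_DATA_RETENTION_DAYS", "90")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Converting values once at startup means a malformed entry such as `SMTP_PORT=abc` fails immediately with a `ValueError` instead of surfacing later during a send.
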
## 📊 Usage

### View Logs

```bash
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f crawler
docker-compose logs -f sender
docker-compose logs -f mongodb
```

### Manual Operations

```bash
# Run crawler manually
docker-compose exec crawler python crawler_service.py 10

# Send test newsletter
docker-compose exec sender python sender_service.py test your-email@example.com

# Preview newsletter
docker-compose exec sender python sender_service.py preview
```

### Database Access

```bash
# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news

// The following lines are entered inside the mongosh shell

// View articles
db.articles.find().sort({ crawled_at: -1 }).limit(5).pretty()

// View subscribers
db.subscribers.find({ active: true }).pretty()

// View tracking data
db.newsletter_sends.find().sort({ created_at: -1 }).limit(10).pretty()
```

## 🔧 Management

### Add RSS Feeds

```bash
# MongoDB is internal only, so open the shell inside the container
docker-compose exec mongodb mongosh munich_news

db.rss_feeds.insertOne({
  name: "Source Name",
  url: "https://example.com/rss",
  active: true
})
```

### Add Subscribers

```bash
# MongoDB is internal only, so open the shell inside the container
docker-compose exec mongodb mongosh munich_news

db.subscribers.insertOne({
  email: "user@example.com",
  active: true,
  tracking_enabled: true,
  subscribed_at: new Date()
})
```

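The same subscriber document can be assembled in Python before it is inserted. The sketch below only builds and checks the dictionary; `make_subscriber` is a hypothetical helper, the email check is deliberately simplistic, lowercasing the address is an assumption, and the actual insert (for example with a MongoDB driver) is left out.

```python
from datetime import datetime, timezone


def make_subscriber(email: str, tracking_enabled: bool = True) -> dict:
    """Return a subscriber document with the fields shown in the mongosh example.

    Raises ValueError if the address is obviously malformed.
    """
    normalized = email.strip().lower()
    local, sep, domain = normalized.partition("@")
    if not (local and sep and "." in domain):
        raise ValueError(f"not a plausible email address: {email!r}")
    return {
        "email": normalized,
        "active": True,
        "tracking_enabled": tracking_enabled,
        "subscribed_at": datetime.now(timezone.utc),
    }


if __name__ == "__main__":
    print(make_subscriber("user@example.com"))
```
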
### View Analytics

```bash
# Newsletter metrics
curl http://localhost:5001/api/analytics/newsletter/2024-01-15

# Article performance
curl http://localhost:5001/api/analytics/article/https://example.com/article

# Subscriber activity
curl http://localhost:5001/api/analytics/subscriber/user@example.com
```

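Article URLs and email addresses contain characters such as `/`, `:` and `@` that have special meaning inside a URL path. Whether the API expects them percent-encoded is not stated here, so treat the following as a sketch under that assumption; the helper name `analytics_url` is hypothetical.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:5001/api/analytics"


def analytics_url(kind: str, identifier: str) -> str:
    """Build an analytics URL, percent-encoding the identifier as one path segment."""
    return f"{BASE_URL}/{kind}/{quote(identifier, safe='')}"


if __name__ == "__main__":
    print(analytics_url("newsletter", "2024-01-15"))
    print(analytics_url("article", "https://example.com/article"))
    print(analytics_url("subscriber", "user@example.com"))
```

If the backend instead matches the raw URL as the remainder of the path, the unencoded form shown in the `curl` examples is the one to use.
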
## ⏰ Schedule Configuration

### Change Crawler Time (default: 6:00 AM)

Edit `news_crawler/scheduled_crawler.py`:
```python
schedule.every().day.at("06:00").do(run_crawler)  # Change time
```

### Change Sender Time (default: 7:00 AM)

Edit `news_sender/scheduled_sender.py`:
```python
schedule.every().day.at("07:00").do(run_sender)  # Change time
```

After changes:
```bash
docker-compose up -d --build
```

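To sanity-check a new time before rebuilding, the next run can be computed by hand. The sketch below is independent of the `schedule` library; `next_run` is a hypothetical helper that takes a timezone-aware "now" and an `"HH:MM"` string like the ones above.

```python
from datetime import datetime, time, timedelta, timezone


def next_run(now: datetime, hhmm: str) -> datetime:
    """Return the next occurrence of the wall-clock time hhmm after `now`.

    `now` should be timezone-aware; the result uses the same tzinfo.
    """
    hour, minute = (int(part) for part in hhmm.split(":"))
    run_time = time(hour, minute)
    candidate = datetime.combine(now.date(), run_time, tzinfo=now.tzinfo)
    if candidate <= now:
        # Today's slot has already passed, so use tomorrow's.
        tomorrow = now.date() + timedelta(days=1)
        candidate = datetime.combine(tomorrow, run_time, tzinfo=now.tzinfo)
    return candidate


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("crawler:", next_run(now, "06:00"))
    print("sender: ", next_run(now, "07:00"))
```

For Berlin time, pass a `now` created with `zoneinfo.ZoneInfo("Europe/Berlin")` instead of UTC.
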
## 📈 Monitoring

### Container Status

```bash
docker-compose ps
```

### Check Next Scheduled Runs

```bash
# Crawler
docker-compose logs crawler | grep "Next scheduled run"

# Sender
docker-compose logs sender | grep "Next scheduled run"
```

### Engagement Metrics

```bash
# MongoDB is internal only, so open the shell inside the container
docker-compose exec mongodb mongosh munich_news

// Open rate
var sent = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15" })
var opened = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15", opened: true })
print("Open Rate: " + ((opened / sent) * 100).toFixed(2) + "%")

// Click rate
var clicks = db.link_clicks.countDocuments({ newsletter_id: "2024-01-15" })
print("Click Rate: " + ((clicks / sent) * 100).toFixed(2) + "%")
```

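The mongosh snippet divides by `sent`, which produces `NaN` or `Infinity` when no sends are recorded for that newsletter ID. A minimal Python sketch of the same arithmetic with a guard for that case is shown below; the function names are illustrative and the counts are passed in rather than read from the database.

```python
def rate(part: int, total: int) -> float:
    """Return part/total as a percentage, or 0.0 when nothing was sent."""
    if total <= 0:
        return 0.0
    return round(part / total * 100, 2)


def engagement_summary(sent: int, opened: int, clicks: int) -> dict:
    """Compute open and click rates the same way as the mongosh example."""
    return {
        "open_rate": rate(opened, sent),
        "click_rate": rate(clicks, sent),
    }


if __name__ == "__main__":
    summary = engagement_summary(sent=200, opened=50, clicks=30)
    print(f"Open Rate: {summary['open_rate']:.2f}%")
    print(f"Click Rate: {summary['click_rate']:.2f}%")
```

Note that the click rate here counts every click event, so it can exceed 100% if subscribers click several links.
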
## 🐛 Troubleshooting

### Crawler Not Finding Articles

```bash
# Check RSS feeds
docker-compose exec mongodb mongosh munich_news --eval "db.rss_feeds.find({ active: true })"

# Test manually
docker-compose exec crawler python crawler_service.py 5
```

### Newsletter Not Sending

```bash
# Check email config
docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"

# Test email
docker-compose exec sender python sender_service.py test your-email@example.com
```

### Containers Not Starting

```bash
# Check logs
docker-compose logs

# Rebuild
docker-compose up -d --build

# Reset everything (WARNING: -v also removes volumes, including stored MongoDB data)
docker-compose down -v
docker-compose up -d
```

## 🔐 Privacy & Compliance

### GDPR Features

- **Data Retention**: Automatic anonymization after 90 days
- **Opt-Out**: Subscribers can disable tracking
- **Data Deletion**: Full data removal on request
- **Transparency**: Privacy notice in all emails

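
How the retention rule might work in principle: records older than the retention window lose their identifying field. The record layout, the `anonymize_old_records` name, and the choice to blank the email rather than hash it are all assumptions for illustration; the 90-day default comes from `TRACKING_DATA_RETENTION_DAYS`.

```python
from datetime import datetime, timedelta, timezone


def anonymize_old_records(records: list[dict], now: datetime,
                          retention_days: int = 90) -> int:
    """Blank the email on records older than the retention window.

    Mutates the records in place and returns how many were anonymized.
    """
    cutoff = now - timedelta(days=retention_days)
    changed = 0
    for record in records:
        if record["created_at"] < cutoff and record.get("email") is not None:
            record["email"] = None        # drop the identifying field
            record["anonymized"] = True   # keep the event for aggregate stats
            changed += 1
    return changed


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"email": "old@example.com", "created_at": now - timedelta(days=120)},
        {"email": "new@example.com", "created_at": now - timedelta(days=10)},
    ]
    print(anonymize_old_records(sample, now), "record(s) anonymized")
```
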
### Privacy Endpoints

```bash
# Delete subscriber data
curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com

# Anonymize old data
curl -X POST http://localhost:5001/api/tracking/anonymize

# Opt out of tracking
curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out
```

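The same three calls can be prepared from Python with the standard library. The sketch below only constructs the requests and does not send them; the helper names are hypothetical, and the paths are copied from the `curl` examples above.

```python
from urllib.parse import quote
from urllib.request import Request

API_BASE = "http://localhost:5001/api/tracking"


def delete_subscriber_request(email: str) -> Request:
    """DELETE request that asks the API to remove a subscriber's data."""
    return Request(f"{API_BASE}/subscriber/{quote(email, safe='@')}", method="DELETE")


def opt_out_request(email: str) -> Request:
    """POST request that opts a subscriber out of tracking."""
    return Request(f"{API_BASE}/subscriber/{quote(email, safe='@')}/opt-out", method="POST")


def anonymize_request() -> Request:
    """POST request that triggers anonymization of old tracking data."""
    return Request(f"{API_BASE}/anonymize", method="POST")


if __name__ == "__main__":
    for req in (
        delete_subscriber_request("user@example.com"),
        opt_out_request("user@example.com"),
        anonymize_request(),
    ):
        print(req.get_method(), req.full_url)
```

To actually send one of these, pass it to `urllib.request.urlopen` while the backend is running.
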
## 📚 Documentation

### Getting Started
- **[QUICKSTART.md](QUICKSTART.md)** - 5-minute setup guide
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines

### Core Features
- **[docs/AI_NEWS_AGGREGATION.md](docs/AI_NEWS_AGGREGATION.md)** - AI-powered clustering & neutral summaries
- **[docs/PERSONALIZATION.md](docs/PERSONALIZATION.md)** - Personalized newsletter system
- **[docs/PERSONALIZATION_COMPLETE.md](docs/PERSONALIZATION_COMPLETE.md)** - Personalization implementation guide
- **[docs/FEATURES.md](docs/FEATURES.md)** - Complete feature list
- **[docs/API.md](docs/API.md)** - API endpoints reference

### Technical Documentation
- **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** - System architecture
- **[docs/SETUP.md](docs/SETUP.md)** - Detailed setup guide
- **[docs/OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md)** - AI/Ollama configuration
- **[docs/GPU_SETUP.md](docs/GPU_SETUP.md)** - GPU acceleration setup
- **[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)** - Production deployment
- **[docs/SECURITY.md](docs/SECURITY.md)** - Security best practices
- **[docs/REFERENCE.md](docs/REFERENCE.md)** - Complete reference
- **[docs/DATABASE_SCHEMA.md](docs/DATABASE_SCHEMA.md)** - Database structure
- **[docs/BACKEND_STRUCTURE.md](docs/BACKEND_STRUCTURE.md)** - Backend organization

### Component Documentation
- **[docs/CRAWLER_HOW_IT_WORKS.md](docs/CRAWLER_HOW_IT_WORKS.md)** - Crawler internals
- **[docs/EXTRACTION_STRATEGIES.md](docs/EXTRACTION_STRATEGIES.md)** - Content extraction
- **[docs/RSS_URL_EXTRACTION.md](docs/RSS_URL_EXTRACTION.md)** - RSS parsing

## 🧪 Testing

All test files are organized in the `tests/` directory:

```bash
# Run crawler tests
docker-compose exec crawler python tests/crawler/test_crawler.py

# Run sender tests
docker-compose exec sender python tests/sender/test_tracking_integration.py

# Run backend tests
docker-compose exec backend python tests/backend/test_tracking.py

# Test personalization system (all 4 phases)
docker exec munich-news-local-backend python test_personalization_system.py
```

## 🚀 Production Deployment

### Environment Setup

1. Update `backend/.env` with production values
2. Set strong MongoDB password
3. Use HTTPS for tracking URLs
4. Configure proper SMTP server

### Security

```bash
# Set MongoDB password (export it before starting the stack so Compose can read it)
export MONGO_PASSWORD=your-secure-password

# Use production compose file
docker-compose -f docker-compose.prod.yml up -d
```

### Monitoring

- Set up log rotation
- Configure health checks
- Set up alerts for failures
- Monitor database size

## 📚 Documentation

Complete documentation available in the [docs/](docs/) directory:

- **[Documentation Index](docs/INDEX.md)** - Complete documentation guide
- **[GPU Setup](docs/GPU_SETUP.md)** - 5-10x faster with GPU acceleration
- **[Admin API](docs/ADMIN_API.md)** - API endpoints reference
- **[Security Guide](docs/SECURITY_NOTES.md)** - Security best practices
- **[System Architecture](docs/SYSTEM_ARCHITECTURE.md)** - Technical overview

## 📝 License

[Your License Here]

## 🤝 Contributing

Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first.

## 📧 Support

For issues or questions, please open a GitHub issue.

---

**Built with ❤️ for Munich News Daily**