Munich News Daily - Automated Newsletter System
A fully automated news aggregation and newsletter system that crawls Munich news sources, generates AI summaries, and sends daily newsletters with engagement tracking.
✨ Key Features
- 🤖 AI-Powered Clustering - Automatically detects duplicate stories from different sources (see the sketch below)
- 📰 Neutral Summaries - Combines multiple perspectives into balanced coverage
- 🎯 Smart Prioritization - Shows most important stories first (multi-source coverage)
- 📊 Engagement Tracking - Open rates, click tracking, and analytics
- ⚡ GPU Acceleration - 5-10x faster AI processing with GPU support
- 🔒 GDPR Compliant - Privacy-first with data retention controls
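How the clustering works is covered in docs/AI_NEWS_AGGREGATION.md. As a rough, hedged illustration only, duplicate detection can be approximated by grouping articles whose titles are very similar; the function below is a sketch, not the project's actual algorithm (which may rely on embeddings or the LLM itself), and `cluster_articles` with its 0.6 threshold is illustrative.

```python
import difflib

def cluster_articles(articles, threshold=0.6):
    """Group articles whose titles look like the same story (illustrative only)."""
    clusters = []
    for article in articles:
        for cluster in clusters:
            similarity = difflib.SequenceMatcher(
                None, article["title"].lower(), cluster[0]["title"].lower()
            ).ratio()
            if similarity >= threshold:
                cluster.append(article)  # same story, different source
                break
        else:
            clusters.append([article])  # new story
    # Stories covered by multiple sources come first ("smart prioritization")
    return sorted(clusters, key=len, reverse=True)
```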
🚀 NEW: GPU acceleration support for 5-10x faster AI processing! See docs/GPU_SETUP.md
🚀 Quick Start
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings
# 2. Start everything
docker-compose up -d
# 3. View logs
docker-compose logs -f
That's it! Once running, the system operates automatically:
- Backend API: runs continuously for tracking and analytics (http://localhost:5001)
- 6:00 AM Berlin time: crawls news articles and generates summaries
- 7:00 AM Berlin time: sends the newsletter to all subscribers
📖 New to the project? See QUICKSTART.md for a detailed 5-minute setup guide.
🚀 GPU Acceleration: Enable 5-10x faster AI processing with GPU Setup Guide
📋 System Overview
6:00 AM → News Crawler
↓
Fetches articles from RSS feeds
Extracts full content
Generates AI summaries
Saves to MongoDB
↓
7:00 AM → Newsletter Sender
↓
Waits for crawler to finish
Fetches today's articles
Generates newsletter with tracking
Sends to all subscribers
↓
✅ Done! Repeat tomorrow
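In code, the crawler stage reduces to a fetch, summarize, store loop. Below is a minimal sketch of that loop, assuming feedparser for RSS and pymongo for storage; the function name, field names, and the shortcut of storing the RSS summary instead of extracted full text are illustrative, not taken from news_crawler/.

```python
import feedparser
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]

def crawl_once(max_per_feed=10):
    """Fetch new entries from every active RSS feed and store them."""
    for feed in db.rss_feeds.find({"active": True}):
        parsed = feedparser.parse(feed["url"])
        for entry in parsed.entries[:max_per_feed]:
            if db.articles.find_one({"url": entry.link}):
                continue  # already crawled
            db.articles.insert_one({
                "title": entry.title,
                "url": entry.link,
                "source": feed["name"],
                # The real crawler extracts full content and adds an AI summary here.
                "content": entry.get("summary", ""),
                "crawled_at": datetime.now(timezone.utc),
            })

if __name__ == "__main__":
    crawl_once()
```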
🏗️ Architecture
Components
- Ollama: AI service for summarization and translation (internal only, GPU-accelerated)
- MongoDB: Data storage (articles, subscribers, tracking) (internal only)
- Backend API: Flask API for tracking and analytics (port 5001 - only exposed service)
- News Crawler: Automated RSS feed crawler with AI summarization (internal only)
- Newsletter Sender: Automated email sender with tracking (internal only)
- Frontend: React dashboard (optional)
Technology Stack
- Python 3.11
- MongoDB 7.0
- Ollama (phi3:latest model for AI)
- Docker & Docker Compose
- Flask (API)
- Schedule (automation)
- Jinja2 (email templates)
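To show how Jinja2 and the tracking backend fit together on the sending side, here is a hedged sketch of rendering a newsletter with per-subscriber tracking URLs. The template markup and the /api/tracking/... paths are assumptions for illustration; the real template and routes live in news_sender/ and the backend.

```python
from jinja2 import Template

TEMPLATE = Template("""
<h1>Munich News Daily</h1>
{% for a in articles %}
  <h2><a href="{{ base }}/api/tracking/click/{{ newsletter_id }}/{{ a.id }}?email={{ email }}">
    {{ a.title }}</a></h2>
  <p>{{ a.summary }}</p>
{% endfor %}
<img src="{{ base }}/api/tracking/open/{{ newsletter_id }}?email={{ email }}" width="1" height="1">
""")

def render_newsletter(articles, email, newsletter_id, base="http://localhost:5001"):
    """Render HTML with click-through links and an open-tracking pixel.

    A production sender would URL-encode or tokenize the address instead of
    embedding the raw email.
    """
    return TEMPLATE.render(articles=articles, email=email,
                           newsletter_id=newsletter_id, base=base)

html = render_newsletter(
    [{"id": "a1", "title": "Example story", "summary": "Two neutral sentences..."}],
    email="user@example.com", newsletter_id="2024-01-15")
```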
📦 Installation
Prerequisites
- Docker & Docker Compose
- 4GB+ RAM (for Ollama AI models)
- (Optional) NVIDIA GPU for 5-10x faster AI processing
Setup
1. Clone the repository
git clone <repository-url>
cd munich-news
2. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your settings
3. Configure Ollama (AI features)
# Option 1: Use the integrated Docker Compose Ollama (recommended)
./configure-ollama.sh  # Select option 1
# Option 2: Use an external Ollama server
# Install from https://ollama.ai/download
# Then run: ollama pull phi3:latest
4. Start the system
# Auto-detect GPU and start (recommended)
./start-with-gpu.sh
# Or start manually
docker-compose up -d
# First time: wait for the Ollama model download (2-5 minutes)
docker-compose logs -f ollama-setup
📖 For detailed Ollama setup & GPU acceleration: See docs/OLLAMA_SETUP.md
💡 To change AI model: Edit OLLAMA_MODEL in .env, then run ./pull-ollama-model.sh. See docs/CHANGING_AI_MODEL.md
⚙️ Configuration
Edit backend/.env:
# MongoDB
MONGODB_URI=mongodb://localhost:27017/
# Email (SMTP)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Newsletter
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24
# Tracking
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90
# Ollama (AI Summarization)
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest
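For reference, those Ollama settings translate into a plain HTTP call against Ollama's /api/generate endpoint. A minimal summarization helper might look like the sketch below; the prompt text is an assumption, the real prompt lives in the crawler code.

```python
import os
import requests

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://127.0.0.1:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "phi3:latest")

def summarize(text: str) -> str:
    """Ask the configured Ollama model for a short, neutral summary."""
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/generate",
        json={
            "model": OLLAMA_MODEL,
            "prompt": "Summarize this Munich news article in three neutral sentences:\n\n" + text,
            "stream": False,  # return one JSON object instead of a stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()
```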
📊 Usage
View Logs
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f crawler
docker-compose logs -f sender
docker-compose logs -f mongodb
Manual Operations
# Run crawler manually
docker-compose exec crawler python crawler_service.py 10
# Send test newsletter
docker-compose exec sender python sender_service.py test your-email@example.com
# Preview newsletter
docker-compose exec sender python sender_service.py preview
Database Access
# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news
# View articles
db.articles.find().sort({ crawled_at: -1 }).limit(5).pretty()
# View subscribers
db.subscribers.find({ active: true }).pretty()
# View tracking data
db.newsletter_sends.find().sort({ created_at: -1 }).limit(10).pretty()
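The same queries can be scripted from Python with pymongo; a short sketch mirroring the mongosh commands above:

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]

# Latest five crawled articles
for article in db.articles.find().sort("crawled_at", -1).limit(5):
    print(article["title"], "-", article["url"])

# Active subscribers
print(db.subscribers.count_documents({"active": True}), "active subscribers")

# Most recent newsletter sends
for send in db.newsletter_sends.find().sort("created_at", -1).limit(10):
    print(send.get("email"), "opened:", send.get("opened", False))
```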
🔧 Management
Add RSS Feeds
mongosh munich_news
db.rss_feeds.insertOne({
name: "Source Name",
url: "https://example.com/rss",
active: true
})
Add Subscribers
mongosh munich_news
db.subscribers.insertOne({
email: "user@example.com",
active: true,
tracking_enabled: true,
subscribed_at: new Date()
})
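If you prefer scripting these inserts, the pymongo equivalent uses the same fields as the mongosh commands above:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]

db.rss_feeds.insert_one({
    "name": "Source Name",
    "url": "https://example.com/rss",
    "active": True,
})

db.subscribers.insert_one({
    "email": "user@example.com",
    "active": True,
    "tracking_enabled": True,
    "subscribed_at": datetime.now(timezone.utc),
})
```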
View Analytics
# Newsletter metrics
curl http://localhost:5001/api/analytics/newsletter/2024-01-15
# Article performance
curl http://localhost:5001/api/analytics/article/https://example.com/article
# Subscriber activity
curl http://localhost:5001/api/analytics/subscriber/user@example.com
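The same endpoints are easy to call from Python when building reports; a tiny sketch (the shape of the JSON response depends on the backend and is not documented here):

```python
import requests

BASE = "http://localhost:5001"

metrics = requests.get(f"{BASE}/api/analytics/newsletter/2024-01-15", timeout=10)
metrics.raise_for_status()
print(metrics.json())  # whatever metrics the backend returns for that newsletter
```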
⏰ Schedule Configuration
Change Crawler Time (default: 6:00 AM)
Edit news_crawler/scheduled_crawler.py:
schedule.every().day.at("06:00").do(run_crawler) # Change time
Change Sender Time (default: 7:00 AM)
Edit news_sender/scheduled_sender.py:
schedule.every().day.at("07:00").do(run_sender) # Change time
After changes:
docker-compose up -d --build
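For context, both scheduled services wrap those schedule.every() calls in the same simple loop. A minimal sketch follows; the real scripts in news_crawler/ and news_sender/ may differ in detail.

```python
import time
import schedule

def run_crawler():
    # Placeholder: the real job invokes the crawler service.
    print("Crawling Munich news feeds...")

# Times are interpreted in the container's local time (Berlin, per the schedule above).
schedule.every().day.at("06:00").do(run_crawler)

while True:
    schedule.run_pending()
    time.sleep(60)  # wake up once a minute to check for due jobs
```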
📈 Monitoring
Container Status
docker-compose ps
Check Next Scheduled Runs
# Crawler
docker-compose logs crawler | grep "Next scheduled run"
# Sender
docker-compose logs sender | grep "Next scheduled run"
Engagement Metrics
mongosh munich_news
// Open rate
var sent = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15" })
var opened = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15", opened: true })
print("Open Rate: " + ((opened / sent) * 100).toFixed(2) + "%")
// Click rate
var clicks = db.link_clicks.countDocuments({ newsletter_id: "2024-01-15" })
print("Click Rate: " + ((clicks / sent) * 100).toFixed(2) + "%")
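The same open and click rates in Python with pymongo, convenient for scripted reporting:

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]
newsletter_id = "2024-01-15"

sent = db.newsletter_sends.count_documents({"newsletter_id": newsletter_id})
opened = db.newsletter_sends.count_documents({"newsletter_id": newsletter_id, "opened": True})
clicks = db.link_clicks.count_documents({"newsletter_id": newsletter_id})

if sent:
    print(f"Open rate:  {opened / sent:.1%}")
    print(f"Click rate: {clicks / sent:.1%}")
```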
🐛 Troubleshooting
Crawler Not Finding Articles
# Check RSS feeds
mongosh munich_news --eval "db.rss_feeds.find({ active: true })"
# Test manually
docker-compose exec crawler python crawler_service.py 5
Newsletter Not Sending
# Check email config
docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"
# Test email
docker-compose exec sender python sender_service.py test your-email@example.com
Containers Not Starting
# Check logs
docker-compose logs
# Rebuild
docker-compose up -d --build
# Reset everything
docker-compose down -v
docker-compose up -d
🔐 Privacy & Compliance
GDPR Features
- Data Retention: Automatic anonymization after 90 days (see the sketch below)
- Opt-Out: Subscribers can disable tracking
- Data Deletion: Full data removal on request
- Transparency: Privacy notice in all emails
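As an illustration of the retention rule, anonymization amounts to a single update over tracking records older than the retention window. The field names in this sketch are assumptions and it is not the code behind the /api/tracking/anonymize endpoint listed next; prefer that endpoint in practice.

```python
from datetime import datetime, timedelta, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]
cutoff = datetime.now(timezone.utc) - timedelta(days=90)  # TRACKING_DATA_RETENTION_DAYS

result = db.newsletter_sends.update_many(
    {"created_at": {"$lt": cutoff}},
    {"$unset": {"email": ""}, "$set": {"anonymized": True}},
)
print(f"Anonymized {result.modified_count} old tracking records")
```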
Privacy Endpoints
# Delete subscriber data
curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com
# Anonymize old data
curl -X POST http://localhost:5001/api/tracking/anonymize
# Opt out of tracking
curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out
📚 Documentation
Complete documentation is available in the docs/ directory; start with the documentation index for a full guide.
Getting Started
- QUICKSTART.md - 5-minute setup guide
- CONTRIBUTING.md - Contribution guidelines
Core Features
- docs/AI_NEWS_AGGREGATION.md - AI-powered clustering & neutral summaries
- docs/FEATURES.md - Complete feature list
- docs/API.md - API endpoints reference
Technical Documentation
- docs/ARCHITECTURE.md - System architecture
- docs/SETUP.md - Detailed setup guide
- docs/OLLAMA_SETUP.md - AI/Ollama configuration
- docs/GPU_SETUP.md - GPU acceleration setup
- docs/DEPLOYMENT.md - Production deployment
- docs/SECURITY.md - Security best practices
- docs/REFERENCE.md - Complete reference
- docs/DATABASE_SCHEMA.md - Database structure
- docs/BACKEND_STRUCTURE.md - Backend organization
Component Documentation
- docs/CRAWLER_HOW_IT_WORKS.md - Crawler internals
- docs/EXTRACTION_STRATEGIES.md - Content extraction
- docs/RSS_URL_EXTRACTION.md - RSS parsing
🧪 Testing
All test files are organized in the tests/ directory:
# Run crawler tests
docker-compose exec crawler python tests/crawler/test_crawler.py
# Run sender tests
docker-compose exec sender python tests/sender/test_tracking_integration.py
# Run backend tests
docker-compose exec backend python tests/backend/test_tracking.py
🚀 Production Deployment
Environment Setup
- Update backend/.env with production values
- Set a strong MongoDB password
- Use HTTPS for tracking URLs
- Configure a proper SMTP server
Security
# Use production compose file
docker-compose -f docker-compose.prod.yml up -d
# Set MongoDB password
export MONGO_PASSWORD=your-secure-password
Monitoring
- Set up log rotation
- Configure health checks
- Set up alerts for failures
- Monitor database size
📝 License
[Your License Here]
🤝 Contributing
Contributions welcome! Please read CONTRIBUTING.md first.
📧 Support
For issues or questions, please open a GitHub issue.
Built with ❤️ for Munich News Daily