System Architecture
Overview
Munich News Daily is a fully automated news aggregation and newsletter system with the following components:
┌─────────────────────────────────────────────────────────────────┐
│ Munich News Daily System │
└─────────────────────────────────────────────────────────────────┘
6:00 AM Berlin → News Crawler
↓
Fetches RSS feeds
Extracts full content
Generates AI summaries
Saves to MongoDB
↓
7:00 AM Berlin → Newsletter Sender
↓
Waits for crawler
Fetches articles
Generates newsletter
Sends to subscribers
↓
✅ Done!
Components
1. MongoDB Database
- Purpose: Central data storage
- Collections:
  - articles: News articles with summaries
  - subscribers: Email subscribers
  - rss_feeds: RSS feed sources
  - newsletter_sends: Email tracking data
  - link_clicks: Link click tracking
  - subscriber_activity: Engagement metrics
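To make the collections concrete, here is a sketch of what a document in the articles collection might look like. The field names are illustrative assumptions, not the project's actual schema:

```python
from datetime import datetime, timezone

# Hypothetical shape of one document in the `articles` collection.
# Every field name here is an assumption for illustration only.
article = {
    "title": "Example headline",
    "url": "https://example.com/article",      # natural de-duplication key
    "source_feed": "https://example.com/rss",  # matches an entry in `rss_feeds`
    "content": "Full extracted article text...",
    "summary": "One-paragraph AI summary produced by Ollama.",
    "published_at": datetime.now(timezone.utc),
    "crawled_at": datetime.now(timezone.utc),
}
```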
2. News Crawler
- Schedule: Daily at 6:00 AM Berlin time
- Functions:
- Fetches articles from RSS feeds
- Extracts full article content
- Generates AI summaries using Ollama
- Saves to MongoDB
- Technology: Python, BeautifulSoup, Ollama
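The extraction and summarization steps above can be sketched as two small functions. The real crawler uses BeautifulSoup; this sketch substitutes the standard library's html.parser so it is dependency-free, and the Ollama call assumes the default local server on port 11434 with the phi3 model:

```python
import json
import urllib.request
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    """Reduce an article page to plain text (the project uses BeautifulSoup)."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

def summarize(text: str, host: str = "http://localhost:11434") -> str:
    """Request a summary from a local Ollama server (endpoint and model
    name are assumptions based on Ollama's standard /api/generate API)."""
    payload = json.dumps({
        "model": "phi3",
        "prompt": f"Summarize this news article:\n\n{text}",
        "stream": False,
    }).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```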
3. Newsletter Sender
- Schedule: Daily at 7:00 AM Berlin time
- Functions:
- Waits for crawler to finish (max 30 min)
- Fetches today's articles
- Generates HTML newsletter
- Injects tracking pixels
- Sends to all subscribers
- Technology: Python, Jinja2, SMTP
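The rendering and sending steps can be sketched as follows. The real sender renders with Jinja2; string.Template is used here only to keep the sketch self-contained, and the template fields and SMTP defaults are assumptions:

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from string import Template

# Minimal HTML skeleton with the tracking pixel injected at the bottom.
# (The project uses Jinja2 templates; field names are illustrative.)
PAGE = Template("""\
<html><body>
<h1>Munich News Daily</h1>
$items
<img src="$pixel_url" width="1" height="1" alt="">
</body></html>""")
ITEM = Template("<h2>$title</h2><p>$summary</p>")

def render_newsletter(articles, pixel_url):
    """Build the HTML body from a list of {'title', 'summary'} dicts."""
    items = "".join(ITEM.substitute(a) for a in articles)
    return PAGE.substitute(items=items, pixel_url=pixel_url)

def send_newsletter(html, recipient, smtp_host="localhost"):
    """Send one rendered newsletter over SMTP (credentials via env vars)."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = "Munich News Daily"
    msg["To"] = recipient
    msg.attach(MIMEText(html, "html"))
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```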
4. Backend API (Optional)
- Purpose: Tracking and analytics
- Endpoints:
  - /api/track/pixel/<id> - Email open tracking
  - /api/track/click/<id> - Link click tracking
  - /api/analytics/* - Engagement metrics
  - /api/tracking/* - Privacy controls
- Technology: Flask, Python
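A minimal sketch of the two tracking endpoints, assuming the route shapes listed above; the database writes are elided and noted in comments:

```python
from flask import Flask, redirect

app = Flask(__name__)

# Smallest valid 1x1 transparent GIF, served as the open-tracking pixel.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00"
         b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01"
         b"\x00\x00\x02\x02D\x01\x00;")

@app.route("/api/track/pixel/<send_id>")
def track_pixel(send_id):
    # The real service would record an open event in `newsletter_sends`.
    return PIXEL, 200, {"Content-Type": "image/gif"}

@app.route("/api/track/click/<link_id>")
def track_click(link_id):
    # The real service would log to `link_clicks`, look up the original
    # URL, then redirect; the target here is a placeholder.
    return redirect("https://example.com/")
```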
Data Flow
RSS Feeds → Crawler → MongoDB → Sender → Subscribers
↓
Backend API
↓
Analytics
Coordination
The sender waits for the crawler to ensure fresh content:
- Sender starts at 7:00 AM
- Checks for recent articles every 30 seconds
- Maximum wait time: 30 minutes
- Proceeds once the crawler finishes, or once the timeout elapses
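The polling loop above can be sketched as a small function. The freshness check, which would query MongoDB for articles crawled today, is injected as a callable, along with the clock, so the loop is testable; all names are illustrative:

```python
import time

def wait_for_crawler(has_fresh_articles,
                     poll_seconds=30,
                     timeout_seconds=30 * 60,
                     clock=time.monotonic,
                     sleep=time.sleep):
    """Block until `has_fresh_articles()` returns True or the timeout passes.

    `has_fresh_articles` stands in for a MongoDB query for today's articles.
    Returns True if the crawler finished, False on timeout; either way the
    caller proceeds, matching the coordination rules above.
    """
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        if has_fresh_articles():
            return True
        sleep(poll_seconds)
    return False
```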
Technology Stack
- Backend: Python 3.11
- Database: MongoDB 7.0
- AI: Ollama (Phi3 model)
- Scheduling: Python schedule library
- Email: SMTP with HTML templates
- Tracking: Pixel tracking + redirect URLs
- Infrastructure: Docker & Docker Compose
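The jobs are scheduled with the Python schedule library; the part worth sketching is the Berlin-time next-run computation, since DST shifts the UTC offset twice a year. A minimal sketch using the standard library's zoneinfo (function name assumed):

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

BERLIN = ZoneInfo("Europe/Berlin")

def next_run(hour, now=None):
    """Next occurrence of `hour`:00 Berlin time.

    zoneinfo handles the CET/CEST transition, so 6:00 Berlin stays
    6:00 local year-round regardless of the UTC offset.
    """
    now = now or datetime.now(BERLIN)
    candidate = datetime.combine(now.date(), time(hour), tzinfo=BERLIN)
    if candidate <= now:
        candidate = datetime.combine(now.date() + timedelta(days=1),
                                     time(hour), tzinfo=BERLIN)
    return candidate
```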
Deployment
All components run in Docker containers:
docker-compose up -d
Containers:
- munich-news-mongodb - Database
- munich-news-crawler - Crawler service
- munich-news-sender - Sender service
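A sketch of the corresponding Compose file, using the container names above and the MongoDB 7.0 image from the technology stack; the build contexts, volume name, and environment variable names are assumptions:

```yaml
# Sketch only: paths and variable names are illustrative, not the real file.
services:
  mongodb:
    image: mongo:7.0
    container_name: munich-news-mongodb
    environment:
      MONGO_INITDB_ROOT_USERNAME: ${MONGO_USER}
      MONGO_INITDB_ROOT_PASSWORD: ${MONGO_PASSWORD}
    volumes:
      - mongo-data:/data/db
  crawler:
    container_name: munich-news-crawler
    build: ./crawler        # hypothetical build context
    depends_on:
      - mongodb
  sender:
    container_name: munich-news-sender
    build: ./sender         # hypothetical build context
    depends_on:
      - mongodb
volumes:
  mongo-data:
```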
Security
- MongoDB authentication enabled
- Environment variables for secrets
- HTTPS for tracking URLs (production)
- GDPR-compliant data retention
- Privacy controls (opt-out, deletion)
Monitoring
- Docker logs for all services
- MongoDB for data verification
- Health checks on containers
- Engagement metrics via API
Scalability
- Horizontal: Add more crawler instances
- Vertical: Increase container resources
- Database: MongoDB sharding if needed
- Caching: Redis for API responses (future)