# System Architecture

## Overview

Munich News Daily is a fully automated news aggregation and newsletter system with the following components:

```
┌─────────────────────────────────────────────────────────────────┐
│                    Munich News Daily System                     │
└─────────────────────────────────────────────────────────────────┘

  6:00 AM Berlin  →  News Crawler
                          ↓
                     Fetches RSS feeds
                     Extracts full content
                     Generates AI summaries
                     Saves to MongoDB
                          ↓
  7:00 AM Berlin  →  Newsletter Sender
                          ↓
                     Waits for crawler
                     Fetches articles
                     Generates newsletter
                     Sends to subscribers
                          ↓
                     ✅ Done!
```

## Components
### 1. MongoDB Database
- **Purpose**: Central data storage
- **Collections**:
  - `articles`: News articles with summaries
  - `subscribers`: Email subscribers
  - `rss_feeds`: RSS feed sources
  - `newsletter_sends`: Email tracking data
  - `link_clicks`: Link click tracking
  - `subscriber_activity`: Engagement metrics

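As a point of reference, here is a minimal sketch of writing one summarized article into the `articles` collection with `pymongo`; the connection string, database name, and field names are illustrative assumptions rather than the actual schema:

```
from datetime import datetime, timezone
from pymongo import MongoClient

# Connection details are placeholders; the real values come from configuration.
client = MongoClient("mongodb://localhost:27017")
db = client["munich_news"]

# Upsert keyed on the article URL so re-crawls do not create duplicates.
db.articles.update_one(
    {"url": "https://example.com/some-article"},
    {"$set": {
        "title": "Example headline",
        "content": "Full extracted article text ...",
        "summary": "AI-generated summary ...",
        "source": "Example RSS feed",
        "crawled_at": datetime.now(timezone.utc),
    }},
    upsert=True,
)
```
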
### 2. News Crawler
- **Schedule**: Daily at 6:00 AM Berlin time
- **Functions**:
  - Fetches articles from RSS feeds
  - Extracts full article content
  - Generates AI summaries using Ollama
  - Saves to MongoDB
- **Technology**: Python, BeautifulSoup, Ollama

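A condensed sketch of that crawl-summarize-store loop. It assumes the `feedparser` and `requests` libraries and Ollama's local HTTP API; the prompt, endpoint, and field names are illustrative:

```
import feedparser
import requests
from bs4 import BeautifulSoup

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed local Ollama endpoint

def summarize(text: str) -> str:
    """Ask the local Ollama instance (Phi3 model) for a short summary."""
    resp = requests.post(OLLAMA_URL, json={
        "model": "phi3",
        "prompt": f"Summarize this news article in 3 sentences:\n\n{text}",
        "stream": False,
    }, timeout=120)
    return resp.json()["response"]

def crawl_feed(feed_url: str) -> list[dict]:
    """Fetch a feed, extract full article text, and attach an AI summary."""
    articles = []
    for entry in feedparser.parse(feed_url).entries:
        html = requests.get(entry.link, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        text = " ".join(p.get_text(strip=True) for p in soup.find_all("p"))
        articles.append({
            "url": entry.link,
            "title": entry.title,
            "content": text,
            "summary": summarize(text),
        })
    return articles
```
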
### 3. Newsletter Sender
- **Schedule**: Daily at 7:00 AM Berlin time
- **Functions**:
  - Waits for crawler to finish (max 30 min)
  - Fetches today's articles
  - Generates HTML newsletter
  - Injects tracking pixels
  - Sends to all subscribers
- **Technology**: Python, Jinja2, SMTP

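A minimal sketch of the render-and-send step with Jinja2 and `smtplib`. The inline template, SMTP settings, and subscriber fields are placeholders, and tracking-pixel injection is omitted here:

```
import smtplib
from email.mime.text import MIMEText
from jinja2 import Template

# Inline template for illustration; the real system presumably loads an HTML file.
TEMPLATE = Template("""
<h1>Munich News Daily</h1>
{% for a in articles %}
  <h2><a href="{{ a.url }}">{{ a.title }}</a></h2>
  <p>{{ a.summary }}</p>
{% endfor %}
""")

def send_newsletter(articles, subscribers, smtp_host="localhost", smtp_port=587):
    html = TEMPLATE.render(articles=articles)
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.starttls()
        server.login("user", "password")  # credentials would come from env vars
        for sub in subscribers:
            msg = MIMEText(html, "html")
            msg["Subject"] = "Munich News Daily"
            msg["From"] = "newsletter@example.com"
            msg["To"] = sub["email"]
            server.send_message(msg)
```
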
### 4. Backend API (Optional)
- **Purpose**: Tracking and analytics
- **Endpoints**:
  - `/api/track/pixel/<id>` - Email open tracking
  - `/api/track/click/<id>` - Link click tracking
  - `/api/analytics/*` - Engagement metrics
  - `/api/tracking/*` - Privacy controls
- **Technology**: Flask, Python

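A rough Flask sketch of the two tracking routes: the pixel endpoint records an open and returns a 1×1 transparent GIF, and the click endpoint records the click and redirects. Passing the target URL as a query parameter, plus the collection and field names, are assumptions:

```
from datetime import datetime, timezone
from flask import Flask, Response, redirect, request
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient("mongodb://localhost:27017")["munich_news"]  # names illustrative

# Smallest valid transparent GIF, used as the open-tracking pixel.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff!"
         b"\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
         b"\x00\x02\x02D\x01\x00;")

@app.route("/api/track/pixel/<send_id>")
def track_pixel(send_id):
    # Record the open against the matching send record (id format assumed).
    db.newsletter_sends.update_one(
        {"send_id": send_id},
        {"$set": {"opened_at": datetime.now(timezone.utc)}},
    )
    return Response(PIXEL, mimetype="image/gif")

@app.route("/api/track/click/<click_id>")
def track_click(click_id):
    target = request.args.get("url", "/")
    db.link_clicks.insert_one({
        "click_id": click_id,
        "target": target,
        "clicked_at": datetime.now(timezone.utc),
    })
    return redirect(target)
```
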
## Data Flow

```
RSS Feeds → Crawler → MongoDB → Sender → Subscribers
                                              ↓
                                         Backend API
                                              ↓
                                          Analytics
```

## Coordination

The sender waits for the crawler to ensure fresh content:

1. Sender starts at 7:00 AM
2. Checks for recent articles every 30 seconds
3. Maximum wait time: 30 minutes
4. Proceeds once the crawler finishes or the wait times out

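A sketch of that wait loop, assuming the crawler stamps each stored article with a `crawled_at` timestamp; the field name, the "recent" cutoff, and the database handle are assumptions:

```
import time
from datetime import datetime, timezone, timedelta
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["munich_news"]  # illustrative

def wait_for_crawler(max_wait_minutes=30, poll_seconds=30) -> bool:
    """Poll MongoDB until today's articles appear or the timeout is reached."""
    deadline = time.monotonic() + max_wait_minutes * 60
    cutoff = datetime.now(timezone.utc) - timedelta(hours=2)  # "recent" threshold
    while time.monotonic() < deadline:
        if db.articles.count_documents({"crawled_at": {"$gte": cutoff}}) > 0:
            return True   # crawler has produced fresh articles
        time.sleep(poll_seconds)
    return False          # timed out; proceed anyway or alert
```
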
## Technology Stack

- **Backend**: Python 3.11
- **Database**: MongoDB 7.0
- **AI**: Ollama (Phi3 model)
- **Scheduling**: Python `schedule` library
- **Email**: SMTP with HTML templates
- **Tracking**: Pixel tracking + redirect URLs
- **Infrastructure**: Docker & Docker Compose

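For the scheduling part, a sketch with the `schedule` library. It assumes the container's clock is already on Berlin time (e.g. `TZ=Europe/Berlin` in the environment), since `schedule` runs jobs in local time:

```
import time
import schedule

def run_crawler():
    ...  # fetch feeds, summarize, save to MongoDB (see the crawler sketch above)

# Local time is assumed to be Europe/Berlin via the container's TZ setting.
schedule.every().day.at("06:00").do(run_crawler)

while True:
    schedule.run_pending()
    time.sleep(30)
```
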
## Deployment

All components run in Docker containers:

```
docker-compose up -d
```

Containers:
- `munich-news-mongodb` - Database
- `munich-news-crawler` - Crawler service
- `munich-news-sender` - Sender service

## Security

- MongoDB authentication enabled
- Environment variables for secrets
- HTTPS for tracking URLs (production)
- GDPR-compliant data retention
- Privacy controls (opt-out, deletion)

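As an illustration of the environment-variable approach, each service would read its secrets at startup rather than hard-coding them; the variable names below are assumptions:

```
import os

# Variable names are illustrative; the real ones live in docker-compose / .env.
MONGO_URI = os.environ["MONGO_URI"]          # e.g. mongodb://user:pass@mongodb:27017
SMTP_USER = os.environ.get("SMTP_USER", "newsletter@example.com")
SMTP_PASSWORD = os.environ["SMTP_PASSWORD"]
```
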
## Monitoring

- Docker logs for all services
- MongoDB for data verification
- Health checks on containers
- Engagement metrics via API

## Scalability

- Horizontal: Add more crawler instances
- Vertical: Increase container resources
- Database: MongoDB sharding if needed
- Caching: Redis for API responses (future)