commit 1075a91eac (parent bcd0a10576) · 2025-11-11 14:09:21 +01:00
57 changed files with 5598 additions and 1366 deletions

docs/ARCHITECTURE.md (new file, 131 lines)
# System Architecture
## Overview
Munich News Daily is a fully automated news aggregation and newsletter system with the following components:
```
┌─────────────────────────────────────────────────────────────────┐
│                    Munich News Daily System                     │
└─────────────────────────────────────────────────────────────────┘

6:00 AM Berlin → News Crawler
                   ├─ Fetches RSS feeds
                   ├─ Extracts full content
                   ├─ Generates AI summaries
                   └─ Saves to MongoDB

7:00 AM Berlin → Newsletter Sender
                   ├─ Waits for crawler
                   ├─ Fetches articles
                   ├─ Generates newsletter
                   └─ Sends to subscribers
                         ↓
                      ✅ Done!
```
## Components
### 1. MongoDB Database
- **Purpose**: Central data storage
- **Collections**:
  - `articles`: News articles with summaries
  - `subscribers`: Email subscribers
  - `rss_feeds`: RSS feed sources
  - `newsletter_sends`: Email tracking data
  - `link_clicks`: Link click tracking
  - `subscriber_activity`: Engagement metrics
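
The document schema itself is not spelled out in this doc; the sketch below shows how a service might write and read the `articles` collection with PyMongo. The database name and all field names are illustrative assumptions, not the authoritative schema.

```
from datetime import datetime, timezone
from pymongo import MongoClient

# Credentials come from environment variables in the real deployment
client = MongoClient("mongodb://munich-news-mongodb:27017/")
db = client["munich_news"]  # hypothetical database name

# Illustrative article document; real field names may differ
db.articles.insert_one({
    "title": "Example headline",
    "url": "https://example.com/article",
    "summary": "AI-generated summary text ...",
    "crawled_at": datetime.now(timezone.utc),
})

# The sender later fetches today's articles, e.g.:
midnight = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
todays_articles = list(db.articles.find({"crawled_at": {"$gte": midnight}}))
```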
### 2. News Crawler
- **Schedule**: Daily at 6:00 AM Berlin time
- **Functions**:
  - Fetches articles from RSS feeds
  - Extracts full article content
  - Generates AI summaries using Ollama
  - Saves to MongoDB
- **Technology**: Python, BeautifulSoup, Ollama
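
A minimal sketch of one crawl pass under these assumptions: `feedparser` for the RSS step, plain `requests` plus BeautifulSoup for content extraction, and Ollama's HTTP API for summaries. The function names, prompt, and Ollama endpoint are illustrative, not the crawler's actual code.

```
import feedparser
import requests
from bs4 import BeautifulSoup

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def summarize(text: str) -> str:
    # Non-streaming call to the local Phi3 model
    resp = requests.post(OLLAMA_URL, json={
        "model": "phi3",
        "prompt": f"Summarize this news article in three sentences:\n\n{text}",
        "stream": False,
    }, timeout=120)
    return resp.json()["response"]

def crawl_feed(feed_url: str) -> list[dict]:
    articles = []
    for entry in feedparser.parse(feed_url).entries:
        html = requests.get(entry.link, timeout=10).text
        # Crude full-text extraction; real selectors depend on each site's layout
        text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
        articles.append({"title": entry.title, "url": entry.link,
                         "summary": summarize(text)})
    return articles  # the caller saves these to MongoDB
```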
### 3. Newsletter Sender
- **Schedule**: Daily at 7:00 AM Berlin time
- **Functions**:
  - Waits for crawler to finish (max 30 min)
  - Fetches today's articles
  - Generates HTML newsletter
  - Injects tracking pixels
  - Sends to all subscribers
- **Technology**: Python, Jinja2, SMTP
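
A minimal sketch of the rendering and sending step, assuming a Jinja2 template named `newsletter.html` and standard-library SMTP. The template name, SMTP host, credentials, and subscriber fields are placeholders; in practice they come from environment variables.

```
import smtplib
from email.mime.text import MIMEText
from jinja2 import Environment, FileSystemLoader

def send_newsletter(articles: list[dict], subscribers: list[dict]) -> None:
    env = Environment(loader=FileSystemLoader("templates"))
    template = env.get_template("newsletter.html")  # hypothetical template file
    with smtplib.SMTP("smtp.example.com", 587) as smtp:  # placeholder host
        smtp.starttls()
        smtp.login("user", "password")  # from environment variables in practice
        for sub in subscribers:
            # Render per subscriber so the tracking pixel and links carry their ID
            html = template.render(articles=articles, subscriber_id=str(sub["_id"]))
            msg = MIMEText(html, "html")
            msg["Subject"] = "Munich News Daily"
            msg["From"] = "news@example.com"
            msg["To"] = sub["email"]
            smtp.send_message(msg)
```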
### 4. Backend API (Optional)
- **Purpose**: Tracking and analytics
- **Endpoints**:
  - `/api/track/pixel/<id>` - Email open tracking
  - `/api/track/click/<id>` - Link click tracking
  - `/api/analytics/*` - Engagement metrics
  - `/api/tracking/*` - Privacy controls
- **Technology**: Flask, Python
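
A minimal Flask sketch of the two tracking endpoints; `record_open` and `record_click` are hypothetical stand-ins for the MongoDB writes to `newsletter_sends` and `link_clicks`.

```
from flask import Flask, redirect, send_file

app = Flask(__name__)

def record_open(tracking_id: str) -> None:
    """Hypothetical helper: would write an open event to newsletter_sends."""

def record_click(tracking_id: str) -> str:
    """Hypothetical helper: would log to link_clicks and return the target URL."""
    return "https://example.com/article"

@app.route("/api/track/pixel/<id>")
def track_pixel(id):
    record_open(id)
    # A 1x1 transparent GIF renders invisibly in the email client
    return send_file("pixel.gif", mimetype="image/gif")

@app.route("/api/track/click/<id>")
def track_click(id):
    target_url = record_click(id)
    return redirect(target_url, code=302)
```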
## Data Flow
```
RSS Feeds → Crawler → MongoDB → Sender → Subscribers
                         ↕
                    Backend API
                         ↓
                     Analytics
```
## Coordination
The sender waits for the crawler to ensure fresh content:
1. Sender starts at 7:00 AM
2. Checks for recent articles every 30 seconds
3. Maximum wait time: 30 minutes
4. Proceeds once the crawler finishes, or after the timeout elapses
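
A minimal sketch of that wait loop; the `crawled_at` field and the two-hour freshness window are assumptions, not the sender's actual check.

```
import time
from datetime import datetime, timedelta, timezone

def wait_for_fresh_articles(db, timeout_minutes: int = 30, poll_seconds: int = 30) -> bool:
    """Return True once recent articles appear, False if the timeout elapses."""
    deadline = datetime.now(timezone.utc) + timedelta(minutes=timeout_minutes)
    cutoff = datetime.now(timezone.utc) - timedelta(hours=2)  # assumed "recent" window
    while datetime.now(timezone.utc) < deadline:
        if db.articles.count_documents({"crawled_at": {"$gte": cutoff}}) > 0:
            return True
        time.sleep(poll_seconds)
    return False  # timed out: the sender proceeds (or skips) per policy
```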
## Technology Stack
- **Backend**: Python 3.11
- **Database**: MongoDB 7.0
- **AI**: Ollama (Phi3 model)
- **Scheduling**: Python `schedule` library (see the sketch after this list)
- **Email**: SMTP with HTML templates
- **Tracking**: Pixel tracking + redirect URLs
- **Infrastructure**: Docker & Docker Compose
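
A minimal sketch of how the `schedule` library could drive the 6:00 AM job. It assumes the container sets `TZ=Europe/Berlin` so local times line up with the schedule above; `run_crawl` is a placeholder entry point.

```
import time
import schedule

def run_crawl() -> None:
    """Placeholder for the crawler's entry point."""
    print("crawl started")

# "06:00" is interpreted in the container's local time, hence the assumed
# TZ=Europe/Berlin in the image
schedule.every().day.at("06:00").do(run_crawl)

while True:
    schedule.run_pending()
    time.sleep(30)
```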
## Deployment
All components run in Docker containers:
```
docker-compose up -d
```
Containers:
- `munich-news-mongodb` - Database
- `munich-news-crawler` - Crawler service
- `munich-news-sender` - Sender service
## Security
- MongoDB authentication enabled
- Environment variables for secrets
- HTTPS for tracking URLs (production)
- GDPR-compliant data retention
- Privacy controls (opt-out, deletion)
## Monitoring
- Docker logs for all services
- MongoDB for data verification
- Health checks on containers
- Engagement metrics via API
## Scalability
- Horizontal: Add more crawler instances
- Vertical: Increase container resources
- Database: MongoDB sharding if needed
- Caching: Redis for API responses (future)