docs/ARCHITECTURE.md
# System Architecture

## Overview

Munich News Daily is a fully automated news aggregation and newsletter system with the following components:

```
┌─────────────────────────────────────────────────────────────────┐
│                    Munich News Daily System                     │
└─────────────────────────────────────────────────────────────────┘

6:00 AM Berlin → News Crawler
                      ↓
                 Fetches RSS feeds
                 Extracts full content
                 Generates AI summaries
                 Saves to MongoDB
                      ↓
7:00 AM Berlin → Newsletter Sender
                      ↓
                 Waits for crawler
                 Fetches articles
                 Generates newsletter
                 Sends to subscribers
                      ↓
                 ✅ Done!
```

## Components

### 1. MongoDB Database
- **Purpose**: Central data storage
- **Collections**:
  - `articles`: News articles with summaries
  - `subscribers`: Email subscribers
  - `rss_feeds`: RSS feed sources
  - `newsletter_sends`: Email tracking data
  - `link_clicks`: Link click tracking
  - `subscriber_activity`: Engagement metrics

### 2. News Crawler
- **Schedule**: Daily at 6:00 AM Berlin time
- **Functions**:
  - Fetches articles from RSS feeds
  - Extracts full article content
  - Generates AI summaries using Ollama
  - Saves to MongoDB
- **Technology**: Python, BeautifulSoup, Ollama
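
The first crawler step, reading article entries out of an RSS feed, can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree` rather than whatever parser the crawler really uses, and `SAMPLE_FEED` and `parse_rss_items` are hypothetical names that do not come from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real crawler would download this text
# from the URLs stored in the rss_feeds collection.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/a</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/b</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict (title, link) per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the child element is missing.
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], entry["link"])
```

The later steps (full-content extraction, summarization with Ollama, saving to MongoDB) would consume the `link` of each entry; they are left out here because their interfaces are not described in this document.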

### 3. Newsletter Sender
- **Schedule**: Daily at 7:00 AM Berlin time
- **Functions**:
  - Waits for crawler to finish (max 30 min)
  - Fetches today's articles
  - Generates HTML newsletter
  - Injects tracking pixels
  - Sends to all subscribers
- **Technology**: Python, Jinja2, SMTP
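
To make the "injects tracking pixels" step concrete, here is a minimal sketch of one way it could be done. The `/api/track/pixel/<id>` path is taken from this document; the function name `inject_tracking_pixel`, the default `base_url`, and the choice to insert the image just before `</body>` are assumptions for illustration, not the project's actual code.

```python
from html import escape
from urllib.parse import quote


def inject_tracking_pixel(html_body, tracking_id,
                          base_url="https://news.example.com"):
    """Insert a 1x1 tracking image before </body>, or append it if absent.

    base_url is a placeholder; the pixel path follows the
    /api/track/pixel/<id> endpoint named in this document.
    """
    pixel_url = f"{base_url}/api/track/pixel/{quote(tracking_id, safe='')}"
    pixel_tag = (
        f'<img src="{escape(pixel_url, quote=True)}" '
        'width="1" height="1" alt="">'
    )
    # Search case-insensitively, but slice the original string.
    index = html_body.lower().rfind("</body>")
    if index == -1:
        return html_body + pixel_tag
    return html_body[:index] + pixel_tag + html_body[index:]


if __name__ == "__main__":
    print(inject_tracking_pixel("<html><body><p>Hi</p></body></html>", "abc123"))
```

Each recipient would need a distinct `tracking_id` so that an image request can be attributed to one send.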

### 4. Backend API (Optional)
- **Purpose**: Tracking and analytics
- **Endpoints**:
  - `/api/track/pixel/<id>` - Email open tracking
  - `/api/track/click/<id>` - Link click tracking
  - `/api/analytics/*` - Engagement metrics
  - `/api/tracking/*` - Privacy controls
- **Technology**: Flask, Python
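
The click endpoint implies a redirect scheme: links in the newsletter point at `/api/track/click/<id>`, and the API maps the id back to the real URL. The sketch below shows that idea with plain Python and no Flask. The names `rewrite_links` and `resolve_click` are hypothetical, the in-memory dict stands in for the `link_clicks` collection, and the regular expression is a deliberate simplification that only handles double-quoted absolute `href` values.

```python
import re
import uuid

# Simplified pattern: double-quoted absolute http(s) links only.
HREF_PATTERN = re.compile(r'href="(https?://[^"]+)"')


def rewrite_links(html_body, base_url, new_id=lambda: uuid.uuid4().hex):
    """Replace each link with a tracking URL; return (html, id -> target)."""
    click_targets = {}

    def replace(match):
        click_id = new_id()
        click_targets[click_id] = match.group(1)
        return f'href="{base_url}/api/track/click/{click_id}"'

    return HREF_PATTERN.sub(replace, html_body), click_targets


def resolve_click(click_targets, click_id):
    """Return the original URL for a click id, or None if it is unknown.

    A real endpoint would record the click and answer with an HTTP redirect.
    """
    return click_targets.get(click_id)


if __name__ == "__main__":
    html, targets = rewrite_links(
        '<a href="https://example.com/a">A</a>', "https://news.example.com"
    )
    print(html)
    print(targets)
```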

## Data Flow

```
RSS Feeds → Crawler → MongoDB → Sender → Subscribers
                                   ↓
                              Backend API
                                   ↓
                               Analytics
```

## Coordination

The sender waits for the crawler to ensure fresh content:

1. Sender starts at 7:00 AM
2. Checks for recent articles every 30 seconds
3. Maximum wait time: 30 minutes
4. Proceeds once the crawler finishes or the timeout expires
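
The polling behaviour in the steps above can be sketched as a small loop. The 30-second interval and 30-minute limit come from this document; the function name `wait_for_crawler`, the `has_recent_articles` callback (which would query the `articles` collection in practice), and the injectable `clock` and `sleep` parameters are assumptions added so the loop can be exercised without real waiting.

```python
import time

POLL_INTERVAL_SECONDS = 30      # from the text: check every 30 seconds
MAX_WAIT_SECONDS = 30 * 60      # from the text: wait at most 30 minutes


def wait_for_crawler(has_recent_articles,
                     poll_interval=POLL_INTERVAL_SECONDS,
                     max_wait=MAX_WAIT_SECONDS,
                     clock=time.monotonic,
                     sleep=time.sleep):
    """Poll until fresh articles exist or the timeout expires.

    Returns True if articles were found, False on timeout. In both cases
    the caller proceeds with sending, as described above.
    """
    deadline = clock() + max_wait
    while True:
        if has_recent_articles():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))


if __name__ == "__main__":
    # Hypothetical check that succeeds immediately.
    print(wait_for_crawler(lambda: True))
```

Using a monotonic clock rather than wall-clock time keeps the timeout correct even if the system clock is adjusted while the sender is waiting.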

## Technology Stack

- **Backend**: Python 3.11
- **Database**: MongoDB 7.0
- **AI**: Ollama (Phi3 model)
- **Scheduling**: Python schedule library
- **Email**: SMTP with HTML templates
- **Tracking**: Pixel tracking + redirect URLs
- **Infrastructure**: Docker & Docker Compose
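
Both services are scheduled in Berlin local time, so the UTC instant of each run shifts with daylight saving time. The sketch below illustrates that with the standard library's `zoneinfo` only; it does not use the `schedule` library from the stack, and `next_run` is a hypothetical helper rather than project code. It assumes time zone data is available on the host.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

BERLIN = ZoneInfo("Europe/Berlin")


def next_run(now, run_at):
    """Return the next occurrence of run_at (Berlin wall time) after now.

    now must be a timezone-aware datetime in any zone.
    """
    local_now = now.astimezone(BERLIN)
    candidate = datetime.combine(local_now.date(), run_at, tzinfo=BERLIN)
    if candidate <= local_now:
        # Today's slot has passed, so schedule for tomorrow.
        tomorrow = local_now.date() + timedelta(days=1)
        candidate = datetime.combine(tomorrow, run_at, tzinfo=BERLIN)
    return candidate


if __name__ == "__main__":
    now = datetime.now(tz=BERLIN)
    print("Next crawler run:", next_run(now, time(6, 0)).isoformat())
    print("Next sender run: ", next_run(now, time(7, 0)).isoformat())
```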

## Deployment

All components run in Docker containers:

```
docker-compose up -d
```

Containers:
- `munich-news-mongodb` - Database
- `munich-news-crawler` - Crawler service
- `munich-news-sender` - Sender service

## Security

- MongoDB authentication enabled
- Environment variables for secrets
- HTTPS for tracking URLs (production)
- GDPR-compliant data retention
- Privacy controls (opt-out, deletion)
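
As an illustration of keeping secrets in environment variables, a service could load and validate them at startup as sketched below. The variable names `MONGODB_URI` and `SMTP_PASSWORD` and the helper `load_secrets` are hypothetical; the project's real configuration keys are not listed in this document.

```python
import os

# Hypothetical variable names, chosen for illustration only.
REQUIRED_SETTINGS = ("MONGODB_URI", "SMTP_PASSWORD")


def load_secrets(environ=os.environ, required=REQUIRED_SETTINGS):
    """Read required secrets from the environment and fail fast if any is missing."""
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in required}


if __name__ == "__main__":
    try:
        secrets = load_secrets()
        # Print only the names, never the secret values.
        print("Loaded:", ", ".join(sorted(secrets)))
    except RuntimeError as exc:
        print(exc)
```

Failing at startup gives a clear error in the container logs instead of a connection failure later in the run.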

## Monitoring

- Docker logs for all services
- MongoDB for data verification
- Health checks on containers
- Engagement metrics via API

## Scalability

- Horizontal: Add more crawler instances
- Vertical: Increase container resources
- Database: MongoDB sharding if needed
- Caching: Redis for API responses (future)