# System Architecture

## Overview

Munich News Daily is a fully automated news aggregation and newsletter system with the following components:

```
┌─────────────────────────────────────────────────────────────────┐
│                    Munich News Daily System                     │
└─────────────────────────────────────────────────────────────────┘

6:00 AM Berlin → News Crawler
                   ↓
                 Fetches RSS feeds
                 Extracts full content
                 Generates AI summaries
                 Saves to MongoDB
                   ↓
7:00 AM Berlin → Newsletter Sender
                   ↓
                 Waits for crawler
                 Fetches articles
                 Generates newsletter
                 Sends to subscribers
                   ↓
                 ✅ Done!
```

## Components

### 1. MongoDB Database

- **Purpose**: Central data storage
- **Collections**:
  - `articles`: News articles with summaries
  - `subscribers`: Email subscribers
  - `rss_feeds`: RSS feed sources
  - `newsletter_sends`: Email tracking data
  - `link_clicks`: Link click tracking
  - `subscriber_activity`: Engagement metrics
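As a point of reference, the snippet below sketches what a write to the `articles` collection might look like with `pymongo`. The field names, database name, and connection string are illustrative assumptions, not the actual schema:

```python
# Hypothetical example: upserting an article document with pymongo.
# Field names below are illustrative assumptions, not the actual schema.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["munich_news"]                         # assumed database name

db.articles.update_one(
    {"url": "https://example.com/some-article"},   # dedupe on the source URL
    {"$set": {
        "title": "Example headline",
        "content": "Full extracted article text ...",
        "summary": "One-paragraph AI summary ...",
        "source": "Example Feed",
        "fetched_at": datetime.now(timezone.utc),
    }},
    upsert=True,                                   # insert if not already crawled
)
```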
### 2. News Crawler

- **Schedule**: Daily at 6:00 AM Berlin time
- **Functions**:
  - Fetches articles from RSS feeds
  - Extracts full article content
  - Generates AI summaries using Ollama
  - Saves to MongoDB
- **Technology**: Python, BeautifulSoup, Ollama
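The sketch below condenses that pipeline into a single loop, assuming `feedparser` for RSS parsing, `requests` plus BeautifulSoup for content extraction, and Ollama's HTTP generate endpoint for summaries. The prompt, database and field names, and the extraction logic are simplified illustrations, not the actual crawler code:

```python
# Minimal crawler sketch (assumed libraries: feedparser, requests, bs4, pymongo).
import feedparser
import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["munich_news"]  # assumed names

def summarize(text: str) -> str:
    """Ask a local Ollama instance (phi3 model) for a short summary."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "phi3",
              "prompt": f"Summarize this news article in 3 sentences:\n\n{text}",
              "stream": False},
        timeout=120,
    )
    return resp.json()["response"]

for feed in db.rss_feeds.find():                   # "url" field is assumed
    for entry in feedparser.parse(feed["url"]).entries:
        html = requests.get(entry.link, timeout=30).text
        # Crude full-text extraction; the real crawler is likely more careful.
        text = " ".join(p.get_text(" ", strip=True)
                        for p in BeautifulSoup(html, "html.parser").find_all("p"))
        db.articles.update_one(
            {"url": entry.link},
            {"$set": {"title": entry.title, "content": text,
                      "summary": summarize(text)}},
            upsert=True,
        )
```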
### 3. Newsletter Sender

- **Schedule**: Daily at 7:00 AM Berlin time
- **Functions**:
  - Waits for crawler to finish (max 30 min)
  - Fetches today's articles
  - Generates HTML newsletter
  - Injects tracking pixels
  - Sends to all subscribers
- **Technology**: Python, Jinja2, SMTP
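A minimal sketch of the send step follows, assuming a Jinja2 template, SMTP credentials in environment variables, and a per-subscriber pixel URL. The inlined template, field names, and tracking URL shape are illustrative assumptions:

```python
# Minimal sender sketch (assumed: Jinja2 template, SMTP credentials in env vars).
import os
import smtplib
from email.mime.text import MIMEText

from jinja2 import Template
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["munich_news"]  # assumed names

# In the real service the template lives in a file; inlined here for brevity.
template = Template("""
<h1>Munich News Daily</h1>
{% for a in articles %}<h2>{{ a.title }}</h2><p>{{ a.summary }}</p>{% endfor %}
<img src="{{ pixel_url }}" width="1" height="1" alt="">
""")

articles = list(db.articles.find().sort("fetched_at", -1).limit(10))

with smtplib.SMTP(os.environ["SMTP_HOST"], 587) as smtp:
    smtp.starttls()
    smtp.login(os.environ["SMTP_USER"], os.environ["SMTP_PASSWORD"])
    for sub in db.subscribers.find():              # "email" field is assumed
        html = template.render(
            articles=articles,
            # Per-subscriber pixel URL so opens can be attributed (illustrative).
            pixel_url=f"https://example.com/api/track/pixel/{sub['_id']}",
        )
        msg = MIMEText(html, "html")
        msg["Subject"] = "Munich News Daily"
        msg["From"] = os.environ["SMTP_USER"]
        msg["To"] = sub["email"]
        smtp.send_message(msg)
```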
### 4. Backend API (Optional)

- **Purpose**: Tracking and analytics
- **Endpoints**:
  - `/api/track/pixel/` - Email open tracking
  - `/api/track/click/` - Link click tracking
  - `/api/analytics/*` - Engagement metrics
  - `/api/tracking/*` - Privacy controls
- **Technology**: Flask, Python
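A minimal Flask sketch of the two tracking endpoints is shown below. The route parameters, collection updates, and redirect handling are assumptions about how tracking might be wired up, not the actual implementation:

```python
# Minimal Flask sketch of the two tracking endpoints (route shapes assumed).
from flask import Flask, Response, redirect, request
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient("mongodb://localhost:27017")["munich_news"]  # assumed names

# Smallest valid 1x1 transparent GIF, returned for open tracking.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\xff\xff\xff\x00\x00\x00"
         b"!\xf9\x04\x01\x00\x00\x00\x00"
         b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;")

@app.route("/api/track/pixel/<send_id>")
def track_pixel(send_id):
    # Mark the send as opened; the _id shape is an assumption.
    db.newsletter_sends.update_one({"_id": send_id}, {"$set": {"opened": True}})
    return Response(PIXEL, mimetype="image/gif")

@app.route("/api/track/click/<link_id>")
def track_click(link_id):
    db.link_clicks.insert_one({"link": link_id, "ip": request.remote_addr})
    # The real target URL would be looked up from the stored link; assumed here.
    return redirect(request.args.get("url", "https://example.com"))
```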
## Data Flow

```
RSS Feeds → Crawler → MongoDB → Sender → Subscribers
                         ↓
                    Backend API
                         ↓
                     Analytics
```

## Coordination

The sender waits for the crawler to ensure fresh content:

1. The sender starts at 7:00 AM.
2. It checks for recent articles every 30 seconds.
3. Maximum wait time: 30 minutes.
4. It proceeds once the crawler finishes or the timeout elapses, as sketched below.
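A sketch of that wait loop, assuming articles carry a `fetched_at` timestamp (as in the earlier examples) and that "recent" means crawled since midnight UTC:

```python
# Sketch of the sender's wait-for-crawler loop (query criteria assumed).
import time
from datetime import datetime, timezone

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["munich_news"]  # assumed names

def wait_for_fresh_articles(max_wait_s: int = 30 * 60, poll_s: int = 30) -> bool:
    """Poll every 30 s for articles crawled today; give up after 30 min."""
    today = datetime.now(timezone.utc).replace(hour=0, minute=0,
                                               second=0, microsecond=0)
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        if db.articles.count_documents({"fetched_at": {"$gte": today}}) > 0:
            return True          # crawler has produced today's articles
        time.sleep(poll_s)
    return False                 # timeout elapsed: proceed or skip per policy
```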
## Technology Stack

- **Backend**: Python 3.11
- **Database**: MongoDB 7.0
- **AI**: Ollama (Phi3 model)
- **Scheduling**: Python `schedule` library
- **Email**: SMTP with HTML templates
- **Tracking**: Pixel tracking + redirect URLs
- **Infrastructure**: Docker & Docker Compose
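The scheduling itself can be as simple as the sketch below. It assumes the container clock is set to Europe/Berlin (for example via a `TZ` environment variable), since `schedule` evaluates `.at()` times in local time by default; the job bodies are placeholders:

```python
# Sketch of the daily scheduling (job bodies omitted; times from this document).
import time

import schedule

def run_crawler():
    ...  # placeholder for the crawl job

def run_sender():
    ...  # placeholder for the send job

# Assumes local time is Europe/Berlin (e.g. TZ env var in the container).
schedule.every().day.at("06:00").do(run_crawler)
schedule.every().day.at("07:00").do(run_sender)

while True:
    schedule.run_pending()
    time.sleep(60)
```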
## Deployment

All components run in Docker containers:

```
docker-compose up -d
```

Containers:

- `munich-news-mongodb` - Database
- `munich-news-crawler` - Crawler service
- `munich-news-sender` - Sender service

## Security

- MongoDB authentication enabled
- Environment variables for secrets
- HTTPS for tracking URLs (production)
- GDPR-compliant data retention
- Privacy controls (opt-out, deletion)

## Monitoring

- Docker logs for all services
- MongoDB queries for data verification
- Health checks on containers
- Engagement metrics via API

## Scalability

- Horizontal: Add more crawler instances
- Vertical: Increase container resources
- Database: MongoDB sharding if needed
- Caching: Redis for API responses (future)