System Architecture

Overview

Munich News Daily is a fully automated news aggregation and newsletter system with the following components:

┌─────────────────────────────────────────────────────────────────┐
│                    Munich News Daily System                      │
└─────────────────────────────────────────────────────────────────┘

6:00 AM Berlin → News Crawler
                 ↓
                 Fetches RSS feeds
                 Extracts full content
                 Generates AI summaries
                 Saves to MongoDB
                 ↓
7:00 AM Berlin → Newsletter Sender
                 ↓
                 Waits for crawler
                 Fetches articles
                 Generates newsletter
                 Sends to subscribers
                 ↓
                 ✅ Done!

Components

1. MongoDB Database

  • Purpose: Central data storage
  • Collections:
    • articles: News articles with AI summaries (sample document below)
    • subscribers: Email subscribers
    • rss_feeds: RSS feed sources
    • newsletter_sends: Email tracking data
    • link_clicks: Link click tracking
    • subscriber_activity: Engagement metrics
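
To make the collection layout concrete, here is a minimal sketch of inserting one articles document with pymongo. The database name, connection string, and field names are illustrative assumptions, not the actual schema.

```python
# Hypothetical shape of one articles document; all field names
# and the database name are assumptions, not the real schema.
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["munich_news"]

db.articles.insert_one({
    "title": "Example headline",
    "url": "https://example.com/article",
    "source": "Example RSS feed",
    "content": "Full extracted article text...",
    "summary": "AI-generated summary...",
    "created_at": datetime.now(timezone.utc),
})
```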

2. News Crawler

  • Schedule: Daily at 6:00 AM Berlin time
  • Functions:
    • Fetches articles from RSS feeds
    • Extracts full article content
    • Generates AI summaries using Ollama
    • Saves to MongoDB
  • Technology: Python, BeautifulSoup, Ollama (pipeline sketched below)
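
A condensed sketch of that pipeline, assuming feedparser for RSS parsing and Ollama's standard /api/generate endpoint on its default port; the prompt text and helper names are illustrative, not the project's actual code.

```python
# Crawler sketch: RSS -> full text -> AI summary.
import feedparser
import requests
from bs4 import BeautifulSoup

def summarize(text: str) -> str:
    # Ollama's /api/generate endpoint with the Phi3 model noted
    # in the Technology Stack section; prompt is an assumption.
    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": "phi3",
        "prompt": f"Summarize this news article in 3 sentences:\n\n{text}",
        "stream": False,
    })
    return resp.json()["response"]

def crawl(feed_url: str) -> list[dict]:
    articles = []
    for entry in feedparser.parse(feed_url).entries:
        # Fetch the linked page and strip it down to plain text
        html = requests.get(entry.link, timeout=10).text
        text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
        articles.append({"title": entry.title, "url": entry.link,
                         "content": text, "summary": summarize(text)})
    return articles
```

Each returned dict would then be written to the articles collection, as in the document sketch above.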

3. Newsletter Sender

  • Schedule: Daily at 7:00 AM Berlin time
  • Functions:
    • Waits for crawler to finish (max 30 min)
    • Fetches today's articles
    • Generates HTML newsletter
    • Injects tracking pixels
    • Sends to all subscribers
  • Technology: Python, Jinja2, SMTP (sending loop sketched below)
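
A minimal sketch of the render-and-send step, assuming a templates/ directory, an SMTP relay, and subscriber dicts with id and email fields; the template name, addresses, and pixel URL pattern are assumptions.

```python
# Sender sketch: render one HTML email per subscriber and send it.
import smtplib
from email.mime.text import MIMEText
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("templates"))

def send_newsletter(articles, subscribers, smtp_host="localhost"):
    template = env.get_template("newsletter.html")  # assumed name
    with smtplib.SMTP(smtp_host) as smtp:
        for sub in subscribers:
            html = template.render(
                articles=articles,
                # Per-recipient tracking pixel (see Backend API below);
                # URL pattern is an assumption.
                pixel_url=f"https://example.com/api/track/pixel/{sub['id']}",
            )
            msg = MIMEText(html, "html")
            msg["Subject"] = "Munich News Daily"
            msg["From"] = "newsletter@example.com"
            msg["To"] = sub["email"]
            smtp.send_message(msg)
```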

4. Backend API (Optional)

  • Purpose: Tracking and analytics
  • Endpoints:
    • /api/track/pixel/<id> - Email open tracking
    • /api/track/click/<id> - Link click tracking
    • /api/analytics/* - Engagement metrics
    • /api/tracking/* - Privacy controls
  • Technology: Python, Flask (pixel endpoint sketched below)
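
As an illustration, a sketch of the open-tracking endpoint: it serves a 1x1 transparent GIF and records the open. The newsletter_sends update and ID handling are assumptions about the actual implementation.

```python
# Sketch of /api/track/pixel/<id>; schema details are assumptions.
from datetime import datetime, timezone

from flask import Flask, Response
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient("mongodb://localhost:27017")["munich_news"]

# Smallest transparent 1x1 GIF, served as the tracking "pixel"
PIXEL = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00"
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)

@app.route("/api/track/pixel/<send_id>")
def track_pixel(send_id):
    # Record the open against the matching newsletter_sends entry
    db.newsletter_sends.update_one(
        {"_id": send_id},
        {"$set": {"opened_at": datetime.now(timezone.utc)}},
    )
    return Response(PIXEL, mimetype="image/gif")
```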

Data Flow

RSS Feeds → Crawler → MongoDB → Sender → Subscribers
                         ↓
                    Backend API
                         ↓
                    Analytics

Coordination

The sender waits for the crawler to ensure fresh content:

  1. Sender starts at 7:00 AM
  2. Checks for recent articles every 30 seconds
  3. Maximum wait time: 30 minutes
  4. Proceeds once the crawler finishes or the timeout is reached (wait loop sketched below)
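
A sketch of that wait loop, assuming articles carry a created_at timestamp as in the document sketch above; the two-hour freshness window is an illustrative choice, not the project's actual value.

```python
# Poll MongoDB until fresh articles appear or the wait times out.
import time
from datetime import datetime, timedelta, timezone

def wait_for_crawler(db, max_wait=30 * 60, poll=30) -> bool:
    deadline = time.monotonic() + max_wait  # 30-minute cap
    while time.monotonic() < deadline:
        cutoff = datetime.now(timezone.utc) - timedelta(hours=2)
        if db.articles.count_documents({"created_at": {"$gte": cutoff}}) > 0:
            return True   # crawler has produced fresh articles
        time.sleep(poll)  # check every 30 seconds
    return False          # timed out; proceed with whatever exists
```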

Technology Stack

  • Backend: Python 3.11
  • Database: MongoDB 7.0
  • AI: Ollama (Phi3 model)
  • Scheduling: Python schedule library (example below)
  • Email: SMTP with HTML templates
  • Tracking: Pixel tracking + redirect URLs
  • Infrastructure: Docker & Docker Compose
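
For instance, wiring both jobs to the schedule library could look like the following; timezone-aware .at() requires schedule >= 1.1.0 with pytz installed, and the job functions are placeholders for the services described above.

```python
# Scheduling sketch: two daily jobs at Berlin time.
import time

import schedule

def run_crawler():
    """Fetch feeds, summarize, store (see "News Crawler" above)."""

def run_sender():
    """Wait for crawler, render, send (see "Newsletter Sender" above)."""

schedule.every().day.at("06:00", "Europe/Berlin").do(run_crawler)
schedule.every().day.at("07:00", "Europe/Berlin").do(run_sender)

while True:
    schedule.run_pending()
    time.sleep(30)
```

In this deployment each service runs its own scheduler loop inside its container, so the crawler and sender containers stay up between runs.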

Deployment

All components run in Docker containers:

docker-compose up -d

Containers:

  • munich-news-mongodb - Database
  • munich-news-crawler - Crawler service
  • munich-news-sender - Sender service

Security

  • MongoDB authentication enabled
  • Environment variables for secrets
  • HTTPS for tracking URLs (production)
  • GDPR-compliant data retention
  • Privacy controls (opt-out, deletion)

Monitoring

  • Docker logs for all services
  • MongoDB for data verification
  • Health checks on containers
  • Engagement metrics via API

Scalability

  • Horizontal: Add more crawler instances
  • Vertical: Increase container resources
  • Database: MongoDB sharding if needed
  • Caching: Redis for API responses (future)