Munich-news/docs/OLD_ARCHITECTURE.md
2025-11-11 14:09:21 +01:00

Munich News Daily - Architecture

System Overview

┌─────────────────────────────────────────────────────────────┐
│                        Users / Browsers                      │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    Frontend (Port 3000)                      │
│                  Node.js + Express + Vanilla JS              │
│  - Subscription form                                         │
│  - News display                                              │
│  - RSS feed management UI (future)                           │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP/REST
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    Backend API (Port 5001)                   │
│                      Flask + Python                          │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ Routes (Blueprints)                                  │  │
│  │  - subscription_routes.py  (subscribe/unsubscribe)   │  │
│  │  - news_routes.py          (get news, stats)         │  │
│  │  - rss_routes.py           (manage RSS feeds)        │  │
│  │  - ollama_routes.py        (AI features)             │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ Services (Business Logic)                            │  │
│  │  - news_service.py         (fetch & save articles)   │  │
│  │  - email_service.py        (send newsletters)        │  │
│  │  - ollama_service.py       (AI integration)          │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ Core                                                 │  │
│  │  - config.py               (configuration)           │  │
│  │  - database.py             (DB connection)           │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    MongoDB (Port 27017)                      │
│                                                              │
│  Collections:                                                │
│  - articles         (news articles with full content)        │
│  - subscribers      (email subscribers)                      │
│  - rss_feeds        (RSS feed sources)                       │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          │ Read/Write
                          │
┌─────────────────────────┴───────────────────────────────────┐
│              News Crawler Microservice                       │
│                    (Standalone)                              │
│                                                              │
│  - Fetches RSS feeds from MongoDB                            │
│  - Crawls full article content                               │
│  - Extracts text, metadata, word count                       │
│  - Stores back to MongoDB                                    │
│  - Can run independently or scheduled                        │
└──────────────────────────────────────────────────────────────┘

                          │
                          │ HTTP from Backend API (optional)
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                  Ollama AI Server (Port 11434)               │
│                    (Optional, External)                      │
│                                                              │
│  - Article summarization                                     │
│  - Content analysis                                          │
│  - AI-powered features                                       │
└──────────────────────────────────────────────────────────────┘

Component Details

Frontend (Port 3000)

  • Technology: Node.js, Express, Vanilla JavaScript
  • Responsibilities:
    • User interface
    • Subscription management
    • News display
    • API proxy to backend
  • Communication: HTTP REST to Backend

Backend API (Port 5001)

  • Technology: Python, Flask
  • Architecture: Modular with Blueprints
  • Responsibilities:
    • REST API endpoints
    • Business logic
    • Database operations
    • Email sending
    • AI integration
  • Communication:
    • HTTP REST from Frontend
    • MongoDB driver to Database
    • HTTP to Ollama (optional)

MongoDB (Port 27017)

  • Technology: MongoDB 7.0
  • Responsibilities:
    • Persistent data storage
    • Articles, subscribers, RSS feeds
  • Communication: MongoDB protocol

News Crawler (Standalone)

  • Technology: Python, BeautifulSoup
  • Architecture: Microservice (can run independently)
  • Responsibilities:
    • Fetch RSS feeds
    • Crawl article content
    • Extract and clean text
    • Store in database
  • Communication: MongoDB driver to Database
  • Execution:
    • Manual: python crawler_service.py
    • Scheduled: cron, systemd timer, or Docker
    • On-demand: Via backend API (future)
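
The "extract and clean text" step can be sketched roughly as follows. This is a minimal illustration, not the crawler's actual code: it uses the standard library's html.parser instead of BeautifulSoup to stay dependency-free, and the names TextExtractor, extract_article, and the word_count field are assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse every run of whitespace into a single space.
        return " ".join(" ".join(self._chunks).split())


def extract_article(html):
    """Return the cleaned text and its word count for one HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = parser.text()
    # full_content matches the field named in the data flow;
    # word_count is a hypothetical field name.
    return {"full_content": text, "word_count": len(text.split())}
```

The returned dictionary would then be written to the articles collection by whatever MongoDB code the crawler uses.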

Ollama AI Server (Optional, External)

  • Technology: Ollama
  • Responsibilities:
    • AI model inference
    • Text summarization
    • Content analysis
  • Communication: HTTP REST API

Data Flow

1. News Aggregation Flow

RSS Feeds → Backend (news_service) → MongoDB (articles)
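
A rough sketch of the parsing step in this flow, assuming plain RSS 2.0 feeds. The function name parse_rss and the dictionary keys are hypothetical and do not come from news_service.py; saving to MongoDB is left out.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text, source_name):
    """Turn an RSS 2.0 document into a list of article dictionaries."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            # An item without a link cannot be crawled later, so skip it.
            continue
        articles.append({
            "title": (item.findtext("title") or "").strip(),
            "link": link,
            "published": (item.findtext("pubDate") or "").strip(),
            "source": source_name,
        })
    return articles
```

Using the link as the unique key when saving would keep repeated fetches of the same feed from creating duplicate articles.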

2. Content Crawling Flow

MongoDB (rss_feeds) → Crawler → Article URLs → 
Web Scraping → MongoDB (articles with full_content)

3. Subscription Flow

User → Frontend → Backend (subscription_routes) → 
MongoDB (subscribers)
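
The backend side of this flow might look roughly like the sketch below. It is an illustration under assumptions: a plain dict stands in for the subscribers collection, the email pattern is deliberately loose, and normalize_email, subscribe, and the record fields are hypothetical names.

```python
import re
from datetime import datetime, timezone

# Loose sanity check only; it does not attempt full RFC 5322 validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_email(raw):
    """Trim and lowercase an address, rejecting obviously invalid input."""
    email = raw.strip().lower()
    if not EMAIL_RE.match(email):
        raise ValueError(f"invalid email address: {raw!r}")
    return email


def subscribe(subscribers, raw_email):
    """Add the address if it is new; return True when a record was created.

    `subscribers` is a dict standing in for the MongoDB collection.
    """
    email = normalize_email(raw_email)
    if email in subscribers:
        return False
    subscribers[email] = {
        "email": email,
        "subscribed_at": datetime.now(timezone.utc),
        "active": True,
    }
    return True
```

Normalizing before the lookup makes repeated submissions of the same address idempotent, regardless of case or surrounding whitespace.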

4. Newsletter Flow (Future)

Scheduler → Backend (email_service) → 
MongoDB (articles + subscribers) → SMTP → Users
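
Since this flow is still planned, the following is only a sketch of how one message could be assembled and sent with the standard library. The function names, the subject line, and the article keys (title, link) are assumptions.

```python
import smtplib
from email.message import EmailMessage


def build_newsletter(sender, recipient, articles):
    """Build a plain-text newsletter listing each article title and link."""
    msg = EmailMessage()
    msg["Subject"] = "Munich News Daily"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"- {a['title']}\n  {a['link']}" for a in articles]
    msg.set_content("Today's articles:\n\n" + "\n".join(lines) + "\n")
    return msg


def send_all(host, port, messages):
    """Send prepared messages over a single SMTP connection."""
    with smtplib.SMTP(host, port) as smtp:
        for message in messages:
            smtp.send_message(message)
```

Building one message per subscriber keeps recipient addresses private; a production setup would most likely also need TLS and SMTP authentication, which this sketch omits.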

5. AI Processing Flow (Optional)

MongoDB (articles) → Backend (ollama_service) → 
Ollama Server → AI Summary → MongoDB (articles)
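
The call to Ollama in this flow could be sketched as below, using Ollama's /api/generate endpoint with streaming disabled. The model name llama3, the prompt wording, and the function names are assumptions and are not taken from ollama_service.py.

```python
import json
import urllib.request

# Default Ollama port from the system overview; the host is an assumption.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_summary_request(article_text, model="llama3"):
    """Prepare the POST request asking Ollama for a short summary."""
    payload = {
        "model": model,  # hypothetical default; use any locally pulled model
        "prompt": "Summarize the following news article in three sentences:\n\n"
                  + article_text,
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(article_text, model="llama3", timeout=60):
    """Send the request and return the generated summary text."""
    request = build_summary_request(article_text, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["response"].strip()
```

Because Ollama is optional, a caller would probably wrap summarize in error handling and store the article without a summary when the server is unreachable.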

Deployment Options

Development

  • All services run locally
  • MongoDB via Docker Compose
  • Manual crawler execution

Production

  • Backend: Cloud VM, Container, or PaaS
  • Frontend: Static hosting or same server
  • MongoDB: MongoDB Atlas or self-hosted
  • Crawler: Scheduled job (cron, systemd timer)
  • Ollama: Separate GPU server (optional)

Scalability Considerations

Current Architecture

  • Monolithic backend (single Flask instance)
  • Standalone crawler (can run multiple instances)
  • Shared MongoDB

Future Improvements

  • Load balancer for backend
  • Message queue for crawler jobs (Celery + Redis)
  • Caching layer (Redis)
  • CDN for frontend
  • Read replicas for MongoDB

Security

  • CORS enabled for frontend-backend communication
  • MongoDB authentication (production)
  • Environment variables for secrets
  • Input validation on all endpoints
  • Rate limiting (future)
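
As an illustration of keeping secrets in environment variables, configuration loading might look like the sketch below. The variable names MONGODB_URI, SMTP_PASSWORD, and OLLAMA_ENABLED are hypothetical and may not match what config.py actually reads.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    mongodb_uri: str
    smtp_password: str
    ollama_enabled: bool


def load_settings(env=os.environ):
    """Read settings from the environment and fail fast on missing secrets."""
    required = ("MONGODB_URI", "SMTP_PASSWORD")  # hypothetical names
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        mongodb_uri=env["MONGODB_URI"],
        smtp_password=env["SMTP_PASSWORD"],
        # Ollama is optional, so it stays off unless explicitly enabled.
        ollama_enabled=env.get("OLLAMA_ENABLED", "false").lower() == "true",
    )
```

Failing at startup when a secret is missing is usually easier to diagnose than a connection error on the first request.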

Monitoring (Future)

  • Application logs
  • MongoDB metrics
  • Crawler success/failure tracking
  • API response times
  • Error tracking (Sentry)