# Munich News Daily - Architecture

## System Overview

```
┌─────────────────────────────────────────────────────────────┐
│                      Users / Browsers                       │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    Frontend (Port 3000)                     │
│               Node.js + Express + Vanilla JS                │
│  - Subscription form                                        │
│  - News display                                             │
│  - RSS feed management UI (future)                          │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP/REST
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                   Backend API (Port 5001)                   │
│                       Flask + Python                        │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                 Routes (Blueprints)                  │   │
│  │ - subscription_routes.py (subscribe/unsubscribe)     │   │
│  │ - news_routes.py (get news, stats)                   │   │
│  │ - rss_routes.py (manage RSS feeds)                   │   │
│  │ - ollama_routes.py (AI features)                     │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              Services (Business Logic)               │   │
│  │ - news_service.py (fetch & save articles)            │   │
│  │ - email_service.py (send newsletters)                │   │
│  │ - ollama_service.py (AI integration)                 │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                         Core                         │   │
│  │ - config.py (configuration)                          │   │
│  │ - database.py (DB connection)                        │   │
│  └──────────────────────────────────────────────────────┘   │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    MongoDB (Port 27017)                     │
│                                                             │
│  Collections:                                               │
│  - articles (news articles with full content)               │
│  - subscribers (email subscribers)                          │
│  - rss_feeds (RSS feed sources)                             │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          │ Read/Write
                          │
┌─────────────────────────┴───────────────────────────────────┐
│                  News Crawler Microservice                  │
│                        (Standalone)                         │
│                                                             │
│  - Fetches RSS feeds from MongoDB                           │
│  - Crawls full article content                              │
│  - Extracts text, metadata, word count                      │
│  - Stores back to MongoDB                                   │
│  - Can run independently or scheduled                       │
└────────────────────────┬────────────────────────────────────┘
                         │
                         │ (Optional)
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                Ollama AI Server (Port 11434)                │
│                    (Optional, External)                     │
│                                                             │
│  - Article summarization                                    │
│  - Content analysis                                         │
│  - AI-powered features                                      │
└─────────────────────────────────────────────────────────────┘
```

## Component Details

### Frontend (Port 3000)
- **Technology**: Node.js, Express, Vanilla JavaScript
- **Responsibilities**:
  - User interface
  - Subscription management
  - News display
  - API proxy to backend
- **Communication**: HTTP REST to Backend

### Backend API (Port 5001)
- **Technology**: Python, Flask
- **Architecture**: Modular with Blueprints
- **Responsibilities**:
  - REST API endpoints
  - Business logic
  - Database operations
  - Email sending
  - AI integration
- **Communication**:
  - HTTP REST from Frontend
  - MongoDB driver to Database
  - HTTP to Ollama (optional)

### MongoDB (Port 27017)
- **Technology**: MongoDB 7.0
- **Responsibilities**:
  - Persistent data storage
  - Articles, subscribers, RSS feeds
- **Communication**: MongoDB protocol

### News Crawler (Standalone)
- **Technology**: Python, BeautifulSoup
- **Architecture**: Microservice (can run independently)
- **Responsibilities**:
  - Fetch RSS feeds
  - Crawl article content
  - Extract and clean text
  - Store in database
- **Communication**: MongoDB driver to Database
- **Execution**:
  - Manual: `python crawler_service.py`
  - Scheduled: Cron, systemd, Docker
  - On-demand: via backend API (future)

### Ollama AI Server (Optional, External)
- **Technology**: Ollama
- **Responsibilities**:
  - AI model inference
  - Text summarization
  - Content analysis
- **Communication**: HTTP REST API

## Data Flow

### 1. News Aggregation Flow

```
RSS Feeds → Backend (news_service) → MongoDB (articles)
```

### 2. Content Crawling Flow

```
MongoDB (rss_feeds) → Crawler → Article URLs → Web Scraping → MongoDB (articles with full_content)
```

### 3. Subscription Flow

```
User → Frontend → Backend (subscription_routes) → MongoDB (subscribers)
```

### 4. Newsletter Flow (Future)

```
Scheduler → Backend (email_service) → MongoDB (articles + subscribers) → SMTP → Users
```

### 5. AI Processing Flow (Optional)

```
MongoDB (articles) → Backend (ollama_service) → Ollama Server → AI Summary → MongoDB (articles)
```

## Deployment Options

### Development
- All services run locally
- MongoDB via Docker Compose
- Manual crawler execution

### Production
- Backend: cloud VM, container, or PaaS
- Frontend: static hosting or same server
- MongoDB: MongoDB Atlas or self-hosted
- Crawler: scheduled job (cron, systemd timer)
- Ollama: separate GPU server (optional)

## Scalability Considerations

### Current Architecture
- Monolithic backend (single Flask instance)
- Standalone crawler (multiple instances can run in parallel)
- Shared MongoDB

### Future Improvements
- Load balancer in front of the backend
- Message queue for crawler jobs (Celery + Redis)
- Caching layer (Redis)
- CDN for the frontend
- Read replicas for MongoDB

## Security
- CORS enabled for frontend-backend communication
- MongoDB authentication (production)
- Environment variables for secrets
- Input validation on all endpoints
- Rate limiting (future)

## Monitoring (Future)
- Application logs
- MongoDB metrics
- Crawler success/failure tracking
- API response times
- Error tracking (Sentry)
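The subscription flow (`User → Frontend → subscription_routes → MongoDB`) can be sketched in miniature as follows. This is an illustrative sketch, not the project's actual code: the function names and return shapes are assumptions, and a plain dict stands in for the `subscribers` collection.

```python
import re

# Loose e-mail shape check; real validation may differ.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Stand-in for the MongoDB `subscribers` collection: email -> document.
subscribers = {}

def subscribe(email: str) -> dict:
    """Validate input and store the subscriber (the job of subscription_routes.py)."""
    email = email.strip().lower()
    if not EMAIL_RE.match(email):
        return {"ok": False, "error": "invalid email"}
    if email in subscribers:
        return {"ok": False, "error": "already subscribed"}
    subscribers[email] = {"email": email, "active": True}
    return {"ok": True}

def unsubscribe(email: str) -> dict:
    """Deactivate a subscriber instead of deleting the document."""
    doc = subscribers.get(email.strip().lower())
    if doc is None:
        return {"ok": False, "error": "not found"}
    doc["active"] = False
    return {"ok": True}
```

In the real service the two dict operations would be a `find_one`/`insert_one` and an `update_one` against MongoDB.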
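The crawler's extract step ("Extracts text, metadata, word count") could look roughly like this. The actual service uses BeautifulSoup; this sketch uses only the standard library's `html.parser`, and the class/function names and the paragraphs-only heuristic are assumptions for illustration.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects text found inside <p> tags; a stdlib stand-in for the
    BeautifulSoup extraction the crawler performs."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.chunks.append(data.strip())

def extract_article(html: str) -> dict:
    """Return the fields the crawler stores back to MongoDB:
    the full text and its word count."""
    parser = ParagraphExtractor()
    parser.feed(html)
    text = " ".join(c for c in parser.chunks if c)
    return {"full_content": text, "word_count": len(text.split())}
```

The resulting dict maps directly onto the `articles with full_content` step of the crawling flow.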
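The AI processing flow (`ollama_service → Ollama Server`) amounts to a single HTTP call against Ollama's `/api/generate` endpoint on port 11434. A minimal sketch, assuming a locally running server; the model name, prompt wording, and function names are assumptions, not the project's configuration.

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # port from the diagram

def build_summary_request(article_text: str, model: str = "llama3") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,  # assumed model name
        "prompt": f"Summarize this news article in two sentences:\n\n{article_text}",
        "stream": False,  # one complete response instead of a chunk stream
    }

def summarize(article_text: str) -> str:
    """POST the article to the Ollama server and return the generated summary."""
    body = json.dumps(build_summary_request(article_text)).encode()
    req = request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

In the full flow, `ollama_service.py` would write the returned summary back onto the article document in MongoDB.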