Munich News Daily - Architecture
System Overview
┌──────────────────────────────────────────────────────────────┐
│                       Users / Browsers                        │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                     Frontend (Port 3000)                      │
│                Node.js + Express + Vanilla JS                 │
│  - Subscription form                                          │
│  - News display                                               │
│  - RSS feed management UI (future)                            │
└─────────────────────────┬────────────────────────────────────┘
                          │ HTTP/REST
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                   Backend API (Port 5001)                     │
│                        Flask + Python                         │
│                                                               │
│ ┌──────────────────────────────────────────────────────┐     │
│ │ Routes (Blueprints)                                  │     │
│ │  - subscription_routes.py (subscribe/unsubscribe)    │     │
│ │  - news_routes.py (get news, stats)                  │     │
│ │  - rss_routes.py (manage RSS feeds)                  │     │
│ │  - ollama_routes.py (AI features)                    │     │
│ └──────────────────────────────────────────────────────┘     │
│                                                               │
│ ┌──────────────────────────────────────────────────────┐     │
│ │ Services (Business Logic)                            │     │
│ │  - news_service.py (fetch & save articles)           │     │
│ │  - email_service.py (send newsletters)               │     │
│ │  - ollama_service.py (AI integration)                │     │
│ └──────────────────────────────────────────────────────┘     │
│                                                               │
│ ┌──────────────────────────────────────────────────────┐     │
│ │ Core                                                 │     │
│ │  - config.py (configuration)                         │     │
│ │  - database.py (DB connection)                       │     │
│ └──────────────────────────────────────────────────────┘     │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                     MongoDB (Port 27017)                      │
│                                                               │
│  Collections:                                                 │
│   - articles (news articles with full content)               │
│   - subscribers (email subscribers)                           │
│   - rss_feeds (RSS feed sources)                              │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          │ Read/Write
                          │
┌─────────────────────────┴────────────────────────────────────┐
│                  News Crawler Microservice                    │
│                         (Standalone)                          │
│                                                               │
│  - Fetches RSS feeds from MongoDB                             │
│  - Crawls full article content                                │
│  - Extracts text, metadata, word count                        │
│  - Stores back to MongoDB                                     │
│  - Can run independently or scheduled                         │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          │ (Optional)
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                Ollama AI Server (Port 11434)                  │
│                     (Optional, External)                      │
│                                                               │
│  - Article summarization                                      │
│  - Content analysis                                           │
│  - AI-powered features                                        │
└──────────────────────────────────────────────────────────────┘
Component Details
Frontend (Port 3000)
- Technology: Node.js, Express, Vanilla JavaScript
- Responsibilities:
  - User interface
  - Subscription management
  - News display
  - API proxy to backend
- Communication: HTTP REST to Backend
Backend API (Port 5001)
- Technology: Python, Flask
- Architecture: Modular with Blueprints
- Responsibilities:
  - REST API endpoints
  - Business logic
  - Database operations
  - Email sending
  - AI integration
- Communication:
  - HTTP REST from Frontend
  - MongoDB driver to Database
  - HTTP to Ollama (optional)
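The backend wiring is intentionally simple: an application factory creates the Flask app and registers one blueprint per route module. A minimal sketch, assuming a create_app() factory, flask-cors for the CORS setup mentioned under Security, and blueprint objects named after the modules above (actual names and URL prefixes may differ):

    # app.py (sketch) - hypothetical application factory; function and
    # blueprint names are assumptions based on the modules listed above.
    from flask import Flask
    from flask_cors import CORS


    def create_app():
        app = Flask(__name__)
        CORS(app)  # allow the frontend (port 3000) to call the API

        # Each feature area lives in its own blueprint module.
        from routes.subscription_routes import subscription_bp
        from routes.news_routes import news_bp
        from routes.rss_routes import rss_bp
        from routes.ollama_routes import ollama_bp

        app.register_blueprint(subscription_bp, url_prefix="/api")
        app.register_blueprint(news_bp, url_prefix="/api")
        app.register_blueprint(rss_bp, url_prefix="/api")
        app.register_blueprint(ollama_bp, url_prefix="/api")

        return app


    if __name__ == "__main__":
        create_app().run(port=5001)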
MongoDB (Port 27017)
- Technology: MongoDB 7.0
- Responsibilities:
  - Persistent data storage
  - Articles, subscribers, RSS feeds
- Communication: MongoDB protocol
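A rough sketch of how core/database.py could expose these collections to the services, assuming pymongo and a connection URI taken from the environment (the database name and variable names are illustrative):

    # core/database.py (sketch) - URI, database name, and collection
    # handles are assumptions for illustration.
    import os
    from pymongo import MongoClient

    MONGO_URI = os.environ.get("MONGO_URI", "mongodb://localhost:27017")

    client = MongoClient(MONGO_URI)
    db = client["munich_news"]        # database name is an assumption

    articles = db["articles"]         # news articles with full content
    subscribers = db["subscribers"]   # email subscribers
    rss_feeds = db["rss_feeds"]       # RSS feed sources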
News Crawler (Standalone)
- Technology: Python, BeautifulSoup
- Architecture: Microservice (can run independently)
- Responsibilities:
  - Fetch RSS feeds
  - Crawl article content
  - Extract and clean text
  - Store in database
- Communication: MongoDB driver to Database
- Execution:
  - Manual: python crawler_service.py
  - Scheduled: Cron, systemd, Docker
  - On-demand: Via backend API (future)
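A simplified sketch of one crawler pass, assuming feedparser for the RSS step and requests + BeautifulSoup for the article pages; the entry point and document fields shown here are illustrative, not the exact implementation:

    # crawler_service.py (simplified sketch) - one pass over all configured
    # feeds; library choices and field names are assumptions.
    import requests
    import feedparser
    from bs4 import BeautifulSoup
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["munich_news"]


    def crawl_once():
        for feed in db.rss_feeds.find():
            parsed = feedparser.parse(feed["url"])
            for entry in parsed.entries:
                url = entry.get("link")
                if not url:
                    continue
                html = requests.get(url, timeout=10).text
                soup = BeautifulSoup(html, "html.parser")
                # Very naive extraction: join all paragraph text.
                text = " ".join(p.get_text(" ", strip=True)
                                for p in soup.find_all("p"))
                db.articles.update_one(
                    {"url": url},
                    {"$set": {
                        "title": entry.get("title", ""),
                        "url": url,
                        "full_content": text,
                        "word_count": len(text.split()),
                        "source_feed": feed["url"],
                    }},
                    upsert=True,
                )


    if __name__ == "__main__":
        crawl_once()

Keying the upsert on the article URL keeps repeated or scheduled runs from creating duplicate articles.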
Ollama AI Server (Optional, External)
- Technology: Ollama
- Responsibilities:
  - AI model inference
  - Text summarization
  - Content analysis
- Communication: HTTP REST API
Data Flow
1. News Aggregation Flow
RSS Feeds → Backend (news_service) → MongoDB (articles)
2. Content Crawling Flow
MongoDB (rss_feeds) → Crawler → Article URLs →
Web Scraping → MongoDB (articles with full_content)
3. Subscription Flow
User → Frontend → Backend (subscription_routes) →
MongoDB (subscribers)
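To make flow 3 concrete, a hedged sketch of the subscribe endpoint in subscription_routes.py; the blueprint name, route path, and validation rule are assumptions:

    # routes/subscription_routes.py (sketch) - names and paths are assumptions.
    import re
    from flask import Blueprint, request, jsonify
    from core.database import subscribers  # hypothetical collection handle

    subscription_bp = Blueprint("subscriptions", __name__)

    EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


    @subscription_bp.route("/subscribe", methods=["POST"])
    def subscribe():
        email = (request.get_json(silent=True) or {}).get("email", "").strip().lower()
        if not EMAIL_RE.match(email):
            return jsonify({"error": "invalid email"}), 400
        # Upsert so repeated submissions do not create duplicate subscribers.
        subscribers.update_one({"email": email},
                               {"$set": {"email": email, "active": True}},
                               upsert=True)
        return jsonify({"message": "subscribed"}), 201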
4. Newsletter Flow (Future)
Scheduler → Backend (email_service) →
MongoDB (articles + subscribers) → SMTP → Users
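Flow 4 is not implemented yet; one way email_service.py could send the digest over SMTP, with host, credentials, and message format as assumptions taken from environment variables:

    # services/email_service.py (sketch) - everything here is an assumption
    # about the future newsletter implementation.
    import os
    import smtplib
    from email.message import EmailMessage


    def send_newsletter(recipients, articles):
        body = "\n\n".join(f"{a['title']}\n{a['url']}" for a in articles)
        host = os.environ["SMTP_HOST"]
        port = int(os.environ.get("SMTP_PORT", 587))
        with smtplib.SMTP(host, port) as smtp:
            smtp.starttls()
            smtp.login(os.environ["SMTP_USER"], os.environ["SMTP_PASSWORD"])
            for email in recipients:
                msg = EmailMessage()
                msg["Subject"] = "Munich News Daily"
                msg["From"] = os.environ["SMTP_USER"]
                msg["To"] = email
                msg.set_content(body)
                smtp.send_message(msg)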
5. AI Processing Flow (Optional)
MongoDB (articles) → Backend (ollama_service) →
Ollama Server → AI Summary → MongoDB (articles)
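For flow 5, ollama_service.py could call Ollama's /api/generate endpoint roughly like this; the model name and prompt wording are assumptions:

    # services/ollama_service.py (sketch) - summarize one article via Ollama.
    import requests

    OLLAMA_URL = "http://localhost:11434"


    def summarize(article_text, model="llama3"):
        response = requests.post(
            f"{OLLAMA_URL}/api/generate",
            json={
                "model": model,
                "prompt": f"Summarize this news article in 3 sentences:\n\n{article_text}",
                "stream": False,
            },
            timeout=120,
        )
        response.raise_for_status()
        return response.json()["response"]

The returned summary would then be written back onto the article document, completing the flow above.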
Deployment Options
Development
- All services run locally
- MongoDB via Docker Compose
- Manual crawler execution
Production
- Backend: Cloud VM, Container, or PaaS
- Frontend: Static hosting or same server
- MongoDB: MongoDB Atlas or self-hosted
- Crawler: Scheduled job (cron, systemd timer)
- Ollama: Separate GPU server (optional)
Scalability Considerations
Current Architecture
- Monolithic backend (single Flask instance)
- Standalone crawler (can run multiple instances)
- Shared MongoDB
Future Improvements
- Load balancer for backend
- Message queue for crawler jobs (Celery + Redis)
- Caching layer (Redis)
- CDN for frontend
- Read replicas for MongoDB
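If crawler jobs move behind a message queue as proposed above, a Celery task wrapping the crawl of a single feed might look like this sketch (broker URL and helper name are assumptions):

    # tasks.py (sketch) - queueing crawl jobs with Celery + Redis, as proposed
    # under Future Improvements; the crawl_single_feed helper is hypothetical.
    from celery import Celery

    app = Celery("crawler", broker="redis://localhost:6379/0")


    @app.task
    def crawl_feed(feed_url):
        from crawler_service import crawl_single_feed  # hypothetical helper
        crawl_single_feed(feed_url)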
Security
- CORS enabled for frontend-backend communication
- MongoDB authentication (production)
- Environment variables for secrets
- Input validation on all endpoints
- Rate limiting (future)
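A common pattern for the "environment variables for secrets" point, along the lines of what core/config.py could do (the variable names are illustrative):

    # core/config.py (sketch) - read settings and secrets from the environment;
    # the specific variable names are assumptions.
    import os

    MONGO_URI = os.environ.get("MONGO_URI", "mongodb://localhost:27017")
    SMTP_PASSWORD = os.environ.get("SMTP_PASSWORD", "")
    OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")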
Monitoring (Future)
- Application logs
- MongoDB metrics
- Crawler success/failure tracking
- API response times
- Error tracking (Sentry)