# Munich News Daily - Architecture
## System Overview
```
┌─────────────────────────────────────────────────────────────┐
│                      Users / Browsers                       │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────┴────────────────────────────────────┐
│                    Frontend (Port 3000)                      │
│               Node.js + Express + Vanilla JS                 │
│   - Subscription form                                        │
│   - News display                                             │
│   - RSS feed management UI (future)                          │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP/REST
┌────────────────────────┴────────────────────────────────────┐
│                   Backend API (Port 5001)                    │
│                       Flask + Python                         │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                 Routes (Blueprints)                  │   │
│  │  - subscription_routes.py (subscribe/unsubscribe)    │   │
│  │  - news_routes.py (get news, stats)                  │   │
│  │  - rss_routes.py (manage RSS feeds)                  │   │
│  │  - ollama_routes.py (AI features)                    │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              Services (Business Logic)               │   │
│  │  - news_service.py (fetch & save articles)           │   │
│  │  - email_service.py (send newsletters)               │   │
│  │  - ollama_service.py (AI integration)                │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                         Core                         │   │
│  │  - config.py (configuration)                         │   │
│  │  - database.py (DB connection)                       │   │
│  └──────────────────────────────────────────────────────┘   │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────┴────────────────────────────────────┐
│                    MongoDB (Port 27017)                      │
│                                                              │
│   Collections:                                               │
│   - articles (news articles with full content)               │
│   - subscribers (email subscribers)                          │
│   - rss_feeds (RSS feed sources)                             │
└────────────────────────┬────────────────────────────────────┘
                         │ Read/Write
┌────────────────────────┴────────────────────────────────────┐
│                  News Crawler Microservice                   │
│                        (Standalone)                          │
│                                                              │
│   - Fetches RSS feeds from MongoDB                           │
│   - Crawls full article content                              │
│   - Extracts text, metadata, word count                      │
│   - Stores back to MongoDB                                   │
│   - Can run independently or scheduled                       │
└────────────────────────┬────────────────────────────────────┘
                         │ (Optional)
┌────────────────────────┴────────────────────────────────────┐
│                Ollama AI Server (Port 11434)                 │
│                    (Optional, External)                      │
│                                                              │
│   - Article summarization                                    │
│   - Content analysis                                         │
│   - AI-powered features                                      │
└─────────────────────────────────────────────────────────────┘
```
## Component Details
### Frontend (Port 3000)
- **Technology**: Node.js, Express, Vanilla JavaScript
- **Responsibilities**:
- User interface
- Subscription management
- News display
- API proxy to backend
- **Communication**: HTTP REST to Backend
### Backend API (Port 5001)
- **Technology**: Python, Flask
- **Architecture**: Modular with Blueprints
- **Responsibilities**:
- REST API endpoints
- Business logic
- Database operations
- Email sending
- AI integration
- **Communication**:
- HTTP REST from Frontend
- MongoDB driver to Database
- HTTP to Ollama (optional)
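The modular Blueprint layout above maps naturally onto a Flask app factory. A minimal sketch, assuming blueprint objects named `*_bp` and an `/api` URL prefix (both assumptions, not the repo's actual wiring):
```python
# app.py-style sketch: wire the blueprints listed above into one Flask app.
from flask import Flask
from flask_cors import CORS

from routes.subscription_routes import subscription_bp
from routes.news_routes import news_bp
from routes.rss_routes import rss_bp
from routes.ollama_routes import ollama_bp

def create_app() -> Flask:
    app = Flask(__name__)
    CORS(app)  # lets the frontend on port 3000 call this API (see Security)

    # One blueprint per feature area, matching the Routes box above.
    for bp in (subscription_bp, news_bp, rss_bp, ollama_bp):
        app.register_blueprint(bp, url_prefix="/api")
    return app

if __name__ == "__main__":
    create_app().run(port=5001)
```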
### MongoDB (Port 27017)
- **Technology**: MongoDB 7.0
- **Responsibilities**:
- Persistent data storage
- Articles, subscribers, RSS feeds
- **Communication**: MongoDB protocol
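The three collections map one-to-one onto module-level handles in `database.py`. A minimal sketch, assuming a `munich_news` database name, a `MONGO_URI` environment variable, and a unique URL index (all assumptions):
```python
# core/database.py-style sketch; collection names come from this document.
import os
from pymongo import MongoClient

client = MongoClient(os.getenv("MONGO_URI", "mongodb://localhost:27017"))
db = client["munich_news"]  # database name is an assumption

articles = db["articles"]        # news articles with full content
subscribers = db["subscribers"]  # email subscribers
rss_feeds = db["rss_feeds"]      # RSS feed sources

# Deduplicate re-crawled stories by URL.
articles.create_index("url", unique=True)
```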
### News Crawler (Standalone)
- **Technology**: Python, BeautifulSoup
- **Architecture**: Microservice (can run independently)
- **Responsibilities**:
- Fetch RSS feeds
- Crawl article content
- Extract and clean text
- Store in database
- **Communication**: MongoDB driver to Database
- **Execution**:
- Manual: `python crawler_service.py`
- Scheduled: Cron, systemd, Docker
- On-demand: Via backend API (future)
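A compressed sketch of one crawl pass. BeautifulSoup and the `full_content` field are named in this document; `feedparser`, the other field names, and the upsert-by-URL strategy are assumptions:
```python
# crawler_service.py-style sketch of a single pass over all feeds.
import feedparser
import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["munich_news"]  # name assumed

def crawl_once() -> None:
    for feed in db.rss_feeds.find():
        for entry in feedparser.parse(feed["url"]).entries:
            html = requests.get(entry.link, timeout=10).text
            soup = BeautifulSoup(html, "html.parser")
            # Naive extraction: join all paragraph text on the page.
            text = " ".join(p.get_text(" ", strip=True)
                            for p in soup.find_all("p"))
            db.articles.update_one(
                {"url": entry.link},
                {"$set": {"title": entry.get("title", ""),
                          "full_content": text,
                          "word_count": len(text.split())}},
                upsert=True,  # new articles insert, known ones refresh
            )

if __name__ == "__main__":
    crawl_once()  # matches the manual `python crawler_service.py` run
```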
### Ollama AI Server (Optional, External)
- **Technology**: Ollama
- **Responsibilities**:
- AI model inference
- Text summarization
- Content analysis
- **Communication**: HTTP REST API
## Data Flow
### 1. News Aggregation Flow
```
RSS Feeds → Backend (news_service) → MongoDB (articles)
```
### 2. Content Crawling Flow
```
MongoDB (rss_feeds) → Crawler → Article URLs →
Web Scraping → MongoDB (articles with full_content)
```
### 3. Subscription Flow
```
User → Frontend → Backend (subscription_routes) →
MongoDB (subscribers)
```
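A sketch of the subscribe half of this flow; the endpoint path, payload shape, and regex validation are assumptions:
```python
# subscription_routes.py-style sketch for the subscribe endpoint.
import re
from flask import Blueprint, jsonify, request

from core.database import subscribers  # see the database sketch above

subscription_bp = Blueprint("subscriptions", __name__)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

@subscription_bp.route("/subscribe", methods=["POST"])
def subscribe():
    email = (request.get_json(silent=True) or {}).get("email", "")
    email = email.strip().lower()
    if not EMAIL_RE.match(email):  # input validation (see Security)
        return jsonify({"error": "invalid email"}), 400
    # Upsert makes re-subscribing idempotent instead of a duplicate error.
    subscribers.update_one({"email": email}, {"$set": {"email": email}},
                           upsert=True)
    return jsonify({"status": "subscribed"}), 201
```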
### 4. Newsletter Flow (Future)
```
Scheduler → Backend (email_service) →
MongoDB (articles + subscribers) → SMTP → Users
```
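Since this flow is not built yet, the following is only a sketch of the likely shape of `email_service.py`, using the standard library's `smtplib`; the SMTP settings, `published` sort field, and digest format are all assumptions:
```python
# email_service.py-style sketch: one plain-text digest per subscriber.
import os
import smtplib
from email.message import EmailMessage

from core.database import articles, subscribers  # see database sketch above

def send_newsletter() -> None:
    latest = articles.find().sort("published", -1).limit(10)
    body = "\n\n".join(f"{a['title']}\n{a['url']}" for a in latest)
    with smtplib.SMTP(os.getenv("SMTP_HOST", "localhost"), 587) as smtp:
        smtp.starttls()  # encrypt before sending credentials
        smtp.login(os.environ["SMTP_USER"], os.environ["SMTP_PASSWORD"])
        for sub in subscribers.find():
            msg = EmailMessage()
            msg["Subject"] = "Munich News Daily"
            msg["From"] = os.environ["SMTP_USER"]
            msg["To"] = sub["email"]
            msg.set_content(body)
            smtp.send_message(msg)
```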
### 5. AI Processing Flow (Optional)
```
MongoDB (articles) → Backend (ollama_service) →
Ollama Server → AI Summary → MongoDB (articles)
```
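A sketch of one round trip through `ollama_service.py`. The `/api/generate` endpoint, `stream` flag, and `response` field follow Ollama's documented HTTP API; the model name and `ai_summary` field are assumptions:
```python
# ollama_service.py-style sketch: summarize one article, store the result.
import requests

from core.database import articles  # see the database sketch above

def summarize(article_id) -> str:
    article = articles.find_one({"_id": article_id})
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3",  # model choice is an assumption
              "prompt": "Summarize in two sentences:\n"
                        + article["full_content"],
              "stream": False},   # one JSON object, not a token stream
        timeout=120,
    )
    resp.raise_for_status()
    summary = resp.json()["response"]
    articles.update_one({"_id": article_id},
                        {"$set": {"ai_summary": summary}})
    return summary
```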
## Deployment Options
### Development
- All services run locally
- MongoDB via Docker Compose
- Manual crawler execution
### Production
- Backend: Cloud VM, Container, or PaaS
- Frontend: Static hosting or same server
- MongoDB: MongoDB Atlas or self-hosted
- Crawler: Scheduled job (cron, systemd timer)
- Ollama: Separate GPU server (optional)
## Scalability Considerations
### Current Architecture
- Monolithic backend (single Flask instance)
- Standalone crawler (can run multiple instances)
- Shared MongoDB
### Future Improvements
- Load balancer for backend
- Message queue for crawler jobs (Celery + Redis; sketch after this list)
- Caching layer (Redis)
- CDN for frontend
- Read replicas for MongoDB
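For the message-queue item above, a hypothetical sketch of the Celery + Redis shape: the backend enqueues one task per feed, and any number of workers drain the queue:
```python
# Hypothetical Celery worker for crawl jobs; Redis acts as the broker.
from celery import Celery

app = Celery("crawler", broker="redis://localhost:6379/0")

@app.task
def crawl_feed(feed_url: str) -> None:
    # The per-feed logic from the crawler sketch above would live here;
    # a stub keeps this example focused on the queueing shape.
    print(f"crawling {feed_url}")

# Enqueued from the backend (or a beat schedule), one job per feed:
#   for feed in rss_feeds.find():
#       crawl_feed.delay(feed["url"])
```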
## Security
- CORS enabled for frontend-backend communication
- MongoDB authentication (production)
- Environment variables for secrets
- Input validation on all endpoints
- Rate limiting (future)
## Monitoring (Future)
- Application logs
- MongoDB metrics
- Crawler success/failure tracking
- API response times
- Error tracking (Sentry)