# Munich News Daily - Architecture

## System Overview

```
┌─────────────────────────────────────────────────────────────┐
│                      Users / Browsers                       │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    Frontend (Port 3000)                     │
│               Node.js + Express + Vanilla JS                │
│  - Subscription form                                        │
│  - News display                                             │
│  - RSS feed management UI (future)                          │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP/REST
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                   Backend API (Port 5001)                   │
│                       Flask + Python                        │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Routes (Blueprints)                                   │  │
│  │ - subscription_routes.py (subscribe/unsubscribe)      │  │
│  │ - news_routes.py         (get news, stats)            │  │
│  │ - rss_routes.py          (manage RSS feeds)           │  │
│  │ - ollama_routes.py       (AI features)                │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Services (Business Logic)                             │  │
│  │ - news_service.py        (fetch & save articles)      │  │
│  │ - email_service.py       (send newsletters)           │  │
│  │ - ollama_service.py      (AI integration)             │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Core                                                  │  │
│  │ - config.py              (configuration)              │  │
│  │ - database.py            (DB connection)              │  │
│  └───────────────────────────────────────────────────────┘  │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    MongoDB (Port 27017)                     │
│                                                             │
│  Collections:                                               │
│  - articles    (news articles with full content)            │
│  - subscribers (email subscribers)                          │
│  - rss_feeds   (RSS feed sources)                           │
└────────────────────────┬────────────────────────────────────┘
                         │
                         │ Read/Write
                         │
┌────────────────────────┴────────────────────────────────────┐
│                  News Crawler Microservice                  │
│                        (Standalone)                         │
│                                                             │
│  - Fetches RSS feeds from MongoDB                           │
│  - Crawls full article content                              │
│  - Extracts text, metadata, word count                      │
│  - Stores back to MongoDB                                   │
│  - Can run independently or scheduled                       │
└────────────────────────┬────────────────────────────────────┘
                         │
                         │ (Optional)
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                Ollama AI Server (Port 11434)                │
│                    (Optional, External)                     │
│                                                             │
│  - Article summarization                                    │
│  - Content analysis                                         │
│  - AI-powered features                                      │
└─────────────────────────────────────────────────────────────┘
```

## Component Details

### Frontend (Port 3000)
- **Technology**: Node.js, Express, Vanilla JavaScript
- **Responsibilities**:
  - User interface
  - Subscription management
  - News display
  - API proxy to backend
- **Communication**: HTTP REST to Backend

### Backend API (Port 5001)
- **Technology**: Python, Flask
- **Architecture**: Modular with Blueprints
- **Responsibilities**:
  - REST API endpoints
  - Business logic
  - Database operations
  - Email sending
  - AI integration
- **Communication**:
  - HTTP REST from Frontend
  - MongoDB driver to Database
  - HTTP to Ollama (optional)

### MongoDB (Port 27017)
- **Technology**: MongoDB 7.0
- **Responsibilities**:
  - Persistent data storage
  - Articles, subscribers, RSS feeds
- **Communication**: MongoDB protocol

### News Crawler (Standalone)
- **Technology**: Python, BeautifulSoup
- **Architecture**: Microservice (can run independently)
- **Responsibilities**:
  - Fetch RSS feeds
  - Crawl article content
  - Extract and clean text
  - Store in database
- **Communication**: MongoDB driver to Database
- **Execution**:
  - Manual: `python crawler_service.py`
  - Scheduled: Cron, systemd, Docker
  - On-demand: Via backend API (future)

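The "extract and clean text" step of the crawler can be illustrated with a minimal sketch. The crawler itself uses BeautifulSoup; this example swaps in the standard library's `html.parser` so it runs without extra dependencies. The class and function names are hypothetical, and `word_count` is an assumed field name (`full_content` is the field named in the Content Crawling Flow).

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_article(html):
    """Return the cleaned text and its word count for one crawled page."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    text = parser.text()
    # Assumed document shape; the real crawler may store more metadata.
    return {"full_content": text, "word_count": len(text.split())}
```

A real crawler would also need to pick out the article body from navigation and ads, which is where a library such as BeautifulSoup is more convenient than this sketch.
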
### Ollama AI Server (Optional, External)
- **Technology**: Ollama
- **Responsibilities**:
  - AI model inference
  - Text summarization
  - Content analysis
- **Communication**: HTTP REST API

## Data Flow

### 1. News Aggregation Flow
```
RSS Feeds → Backend (news_service) → MongoDB (articles)
```

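As a rough illustration of the aggregation flow, the sketch below parses an RSS 2.0 feed with the standard library and merges new items into a store keyed by link. It is not the code in `news_service.py`: the function names and article fields are hypothetical, and a plain dict stands in for the `articles` collection.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml, source):
    """Turn the <item> elements of an RSS 2.0 document into article dicts."""
    root = ET.fromstring(rss_xml)
    articles = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # without a link there is nothing to crawl or dedupe on
        articles.append({
            "title": (item.findtext("title") or "").strip(),
            "link": link,
            "published": (item.findtext("pubDate") or "").strip(),
            "source": source,
        })
    return articles


def merge_new_articles(store, incoming):
    """Insert unseen articles into `store` (a dict keyed by link).

    The dict is a stand-in for the MongoDB articles collection.
    Returns the number of articles that were actually added.
    """
    added = 0
    for article in incoming:
        if article["link"] not in store:
            store[article["link"]] = article
            added += 1
    return added
```

Keying on the link makes repeated fetches of the same feed idempotent, which matters if the fetch runs on a schedule.
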
### 2. Content Crawling Flow
```
MongoDB (rss_feeds) → Crawler → Article URLs →
Web Scraping → MongoDB (articles with full_content)
```

### 3. Subscription Flow
```
User → Frontend → Backend (subscription_routes) →
MongoDB (subscribers)
```

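The backend side of the subscription flow might look roughly like the sketch below. It is framework-free so it runs on its own: each function returns a `(body, status)` pair of the kind a Flask route could return, and a dict stands in for the `subscribers` collection. The function names, response messages, status codes and the `subscribed_at` field are assumptions.

```python
import re
from datetime import datetime, timezone

# Deliberately loose check: one "@", no whitespace, a dot in the domain part.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def subscribe(subscribers, email):
    """Validate and store a subscriber; `subscribers` is a dict keyed by email."""
    normalized = email.strip().lower()
    if not EMAIL_PATTERN.match(normalized):
        return {"error": "Invalid email address"}, 400
    if normalized in subscribers:
        return {"message": "Already subscribed"}, 200
    subscribers[normalized] = {
        "email": normalized,
        "subscribed_at": datetime.now(timezone.utc),
    }
    return {"message": "Subscribed"}, 201


def unsubscribe(subscribers, email):
    """Remove a subscriber if present."""
    normalized = email.strip().lower()
    if subscribers.pop(normalized, None) is None:
        return {"error": "Email not found"}, 404
    return {"message": "Unsubscribed"}, 200
```

Normalizing the address before the lookup keeps `Reader@Example.com` and `reader@example.com` from becoming two subscribers.
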
### 4. Newsletter Flow (Future)
```
Scheduler → Backend (email_service) →
MongoDB (articles + subscribers) → SMTP → Users
```

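Since the newsletter flow is still planned, the following is only a sketch of one way the message-building step could work, using the standard library's `email.message.EmailMessage`. The function name, subject line and article fields (`title`, `link`) are hypothetical.

```python
from email.message import EmailMessage


def build_newsletter(sender, recipient, articles):
    """Build one plain-text newsletter for a single subscriber."""
    message = EmailMessage()
    message["Subject"] = f"Munich News Daily: {len(articles)} articles"
    message["From"] = sender
    message["To"] = recipient
    # One bullet per article: title on the first line, link on the second.
    lines = [f"- {article['title']}\n  {article['link']}" for article in articles]
    message.set_content("\n".join(lines))
    return message
```

Building one message per subscriber avoids exposing the subscriber list in the `To` header. Each message could then be handed to `smtplib.SMTP(...).send_message(message)`, with the host and credentials taken from configuration.
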
### 5. AI Processing Flow (Optional)
```
MongoDB (articles) → Backend (ollama_service) →
Ollama Server → AI Summary → MongoDB (articles)
```

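The call from the backend to Ollama could be sketched as follows, using Ollama's `/api/generate` endpoint with streaming disabled. This is not the code in `ollama_service.py`: the function names, the prompt and the model name `llama3` are placeholders, and the URL assumes Ollama runs locally on the port shown in the overview.

```python
import json
import urllib.request

# Assumed local address; the port matches the one in the system overview.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_summary_request(article_text, model="llama3", url=OLLAMA_URL):
    """Build (but do not send) a non-streaming generate request."""
    payload = {
        "model": model,  # placeholder model name
        "prompt": "Summarize this news article in two sentences:\n\n" + article_text,
        "stream": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(article_text, timeout=60):
    """Return the summary text, or None if Ollama is unreachable."""
    request = build_summary_request(article_text)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read())["response"]
    except OSError:
        # Covers connection errors and timeouts; Ollama is optional,
        # so callers can store the article without a summary.
        return None
```

Returning `None` on failure keeps the rest of the pipeline working when the optional AI server is not running.
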
## Deployment Options

### Development
- All services run locally
- MongoDB via Docker Compose
- Manual crawler execution

### Production
- Backend: Cloud VM, Container, or PaaS
- Frontend: Static hosting or same server
- MongoDB: MongoDB Atlas or self-hosted
- Crawler: Scheduled job (cron, systemd timer)
- Ollama: Separate GPU server (optional)

## Scalability Considerations

### Current Architecture
- Monolithic backend (single Flask instance)
- Standalone crawler (can run multiple instances)
- Shared MongoDB

### Future Improvements
- Load balancer for backend
- Message queue for crawler jobs (Celery + Redis)
- Caching layer (Redis)
- CDN for frontend
- Read replicas for MongoDB

## Security

- CORS enabled for frontend-backend communication
- MongoDB authentication (production)
- Environment variables for secrets
- Input validation on all endpoints
- Rate limiting (future)

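Reading secrets from environment variables could look like the sketch below. The variable names (`MONGODB_URI`, `OLLAMA_BASE_URL`, `FLASK_PORT`, `SMTP_PASSWORD`) and the `load_config` function are assumptions for illustration, not the contents of `config.py`; only the port numbers come from this document.

```python
import os

# Hypothetical setting names with development defaults.
DEFAULTS = {
    "MONGODB_URI": "mongodb://localhost:27017/",
    "OLLAMA_BASE_URL": "http://localhost:11434",
    "FLASK_PORT": "5001",
}
# Settings that must be set explicitly when running in production.
REQUIRED_IN_PRODUCTION = ("MONGODB_URI", "SMTP_PASSWORD")


def load_config(environ=None, production=False):
    """Read settings from the environment, failing fast on missing secrets."""
    env = os.environ if environ is None else environ
    if production:
        missing = [name for name in REQUIRED_IN_PRODUCTION if not env.get(name)]
        if missing:
            raise RuntimeError("Missing required settings: " + ", ".join(missing))
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    config["FLASK_PORT"] = int(config["FLASK_PORT"])
    config["SMTP_PASSWORD"] = env.get("SMTP_PASSWORD")  # no default for secrets
    return config
```

Passing a plain dict as `environ` makes the loader easy to test, and refusing to start in production without the required values avoids silently falling back to development defaults.
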
## Monitoring (Future)

- Application logs
- MongoDB metrics
- Crawler success/failure tracking
- API response times
- Error tracking (Sentry)