# Munich News Daily - Architecture

## System Overview

```
┌─────────────────────────────────────────────────────────────┐
│  Users / Browsers                                           │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Frontend (Port 3000)                                       │
│  Node.js + Express + Vanilla JS                             │
│  - Subscription form                                        │
│  - News display                                             │
│  - RSS feed management UI (future)                          │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP/REST
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Backend API (Port 5001)                                    │
│  Flask + Python                                             │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Routes (Blueprints)                                  │  │
│  │  - subscription_routes.py (subscribe/unsubscribe)     │  │
│  │  - news_routes.py (get news, stats)                   │  │
│  │  - rss_routes.py (manage RSS feeds)                   │  │
│  │  - ollama_routes.py (AI features)                     │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Services (Business Logic)                            │  │
│  │  - news_service.py (fetch & save articles)            │  │
│  │  - email_service.py (send newsletters)                │  │
│  │  - ollama_service.py (AI integration)                 │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Core                                                 │  │
│  │  - config.py (configuration)                          │  │
│  │  - database.py (DB connection)                        │  │
│  └───────────────────────────────────────────────────────┘  │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  MongoDB (Port 27017)                                       │
│                                                             │
│  Collections:                                               │
│  - articles (news articles with full content)               │
│  - subscribers (email subscribers)                          │
│  - rss_feeds (RSS feed sources)                             │
└────────────────────────┬────────────────────────────────────┘
                         │
                         │ Read/Write
                         │
┌────────────────────────┴────────────────────────────────────┐
│  News Crawler Microservice                                  │
│  (Standalone)                                               │
│                                                             │
│  - Fetches RSS feeds from MongoDB                           │
│  - Crawls full article content                              │
│  - Extracts text, metadata, word count                      │
│  - Stores back to MongoDB                                   │
│  - Can run independently or scheduled                       │
└─────────────────────────────────────────────────────────────┘

                         │
                         │ HTTP from Backend (optional)
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Ollama AI Server (Port 11434)                              │
│  (Optional, External)                                       │
│                                                             │
│  - Article summarization                                    │
│  - Content analysis                                         │
│  - AI-powered features                                      │
└─────────────────────────────────────────────────────────────┘
```

## Component Details

### Frontend (Port 3000)
- **Technology**: Node.js, Express, Vanilla JavaScript
- **Responsibilities**:
  - User interface
  - Subscription management
  - News display
  - API proxy to backend
- **Communication**: HTTP REST to Backend

### Backend API (Port 5001)
- **Technology**: Python, Flask
- **Architecture**: Modular with Blueprints
- **Responsibilities**:
  - REST API endpoints
  - Business logic
  - Database operations
  - Email sending
  - AI integration
- **Communication**:
  - HTTP REST from Frontend
  - MongoDB driver to Database
  - HTTP to Ollama (optional)

### MongoDB (Port 27017)
- **Technology**: MongoDB 7.0
- **Responsibilities**:
  - Persistent data storage
  - Holds the `articles`, `subscribers`, and `rss_feeds` collections
- **Communication**: MongoDB wire protocol

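MongoDB does not enforce a schema by default, so the shape of each collection effectively lives in code. The sketch below is a minimal, hypothetical model of the three collections using Python dataclasses. Apart from `full_content` (named in the crawling flow) and the word count the crawler extracts, every field name here (`link`, `status`, `created_at`, and so on) is an assumption, not the project's actual schema.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class RssFeed:                       # rss_feeds collection (hypothetical fields)
    name: str
    url: str
    active: bool = True              # assumed on/off switch for the crawler


@dataclass
class Article:                       # articles collection (hypothetical fields)
    link: str                        # assumed unique key: one document per URL
    title: str
    source: str
    full_content: Optional[str] = None   # filled in later by the crawler
    word_count: int = 0
    created_at: datetime = field(default_factory=_utc_now)


@dataclass
class Subscriber:                    # subscribers collection (hypothetical fields)
    email: str
    status: str = "active"           # assumed values: "active" / "inactive"
    subscribed_at: datetime = field(default_factory=_utc_now)


def to_document(record) -> dict:
    """Convert a record into a plain dict, the shape a MongoDB driver stores."""
    return asdict(record)
```

`to_document()` returns an ordinary dict with `datetime` values, which is what PyMongo's `insert_one()` accepts.
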
### News Crawler (Standalone)
- **Technology**: Python, BeautifulSoup
- **Architecture**: Microservice (can run independently)
- **Responsibilities**:
  - Fetch RSS feeds
  - Crawl article content
  - Extract and clean text
  - Store in database
- **Communication**: MongoDB driver to Database
- **Execution**:
  - Manual: `python crawler_service.py`
  - Scheduled: Cron, systemd, Docker
  - On-demand: Via backend API (future)

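The extract-and-clean step can be sketched with the standard library's `html.parser` standing in for BeautifulSoup, which the crawler itself uses. The heuristic here (article body = text of the `<p>` elements, word count = whitespace-separated tokens) is an assumption for illustration; real news pages usually need site-specific selectors and boilerplate removal.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Collects the <title> and the text of every <p> element.

    Stand-in for the BeautifulSoup extraction; treating <p> text as the
    article body is a simplifying assumption.
    """

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.title = ""
        self.paragraphs: list[str] = []
        self._in_title = False
        self._in_p = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            # Collapse runs of spaces and newlines inside the paragraph.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buffer.append(data)  # includes text of nested tags like <a>, <b>


def extract_article(html: str) -> dict:
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    full_content = "\n\n".join(parser.paragraphs)
    return {
        "title": parser.title.strip(),
        "full_content": full_content,
        "word_count": len(full_content.split()),
    }
```
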
### Ollama AI Server (Optional, External)
- **Technology**: Ollama
- **Responsibilities**:
  - AI model inference
  - Text summarization
  - Content analysis
- **Communication**: HTTP REST API on port 11434, called by the backend (`ollama_service.py`)

## Data Flow

### 1. News Aggregation Flow
```
RSS Feeds → Backend (news_service) → MongoDB (articles)
```

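The aggregation step boils down to turning each feed's XML into article records. This is a minimal sketch assuming plain RSS 2.0 (a `<channel>` containing `<item>` elements) and using only `xml.etree.ElementTree`; Atom feeds use different element names, and this is not necessarily how `news_service.py` does it. The output keys are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str, source: str) -> list[dict]:
    """Turn an RSS 2.0 document into article records (hypothetical keys)."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):          # every <item>, wherever it is nested
        link = (item.findtext("link") or "").strip()
        if not link:
            continue                        # the crawler needs a URL to fetch later
        articles.append({
            "title": (item.findtext("title") or "").strip(),
            "link": link,
            "summary": (item.findtext("description") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
            "source": source,
        })
    return articles
```

Items without a `<link>` are dropped because the content crawler has nothing to fetch for them.
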
### 2. Content Crawling Flow
```
MongoDB (rss_feeds) → Crawler → Article URLs →
Web Scraping → MongoDB (articles with full_content)
```

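The crawling flow can be expressed as a loop over active feeds that skips articles already holding `full_content`. Because each write is keyed by URL, running the crawler twice (or as several instances) updates documents rather than duplicating them. In this hypothetical sketch the network calls are injected as functions and a dict keyed by URL stands in for the `articles` collection; the `crawl_feeds` name and the `active` flag are assumptions.

```python
from typing import Callable, Dict, Iterable, List


def crawl_feeds(
    feeds: Iterable[dict],
    list_article_urls: Callable[[str], List[str]],   # feed URL -> article URLs
    fetch_html: Callable[[str], str],                # article URL -> raw HTML
    extract: Callable[[str], dict],                  # raw HTML -> extracted fields
    articles: Dict[str, dict],                       # stand-in for the articles collection
) -> Dict[str, int]:
    stats = {"crawled": 0, "skipped": 0, "failed": 0}
    for feed in feeds:
        if not feed.get("active", True):
            continue                                 # disabled sources are ignored
        for url in list_article_urls(feed["url"]):
            existing = articles.get(url)
            if existing and existing.get("full_content"):
                stats["skipped"] += 1                # already crawled: reruns are cheap
                continue
            try:
                fields = extract(fetch_html(url))
            except Exception:                        # a real crawler would log the error
                stats["failed"] += 1
                continue
            # Insert-or-update keyed by URL, so repeated runs never duplicate documents.
            articles[url] = {**(existing or {}), "link": url,
                             "source": feed.get("name", ""), **fields}
            stats["crawled"] += 1
    return stats
```

Against MongoDB, the dict assignment would become PyMongo's `update_one({"link": url}, {"$set": doc}, upsert=True)`, which gives the same insert-or-update behaviour.
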
### 3. Subscription Flow
```
User → Frontend → Backend (subscription_routes) →
MongoDB (subscribers)
```

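The subscription endpoints need input validation and idempotent behaviour: subscribing twice should not create two records. Below is a minimal sketch of that logic with a dict standing in for the `subscribers` collection. The regex, the lower-casing of addresses, and the `status` values are assumptions, and a production flow would typically add email confirmation (double opt-in).

```python
import re
from typing import Dict

# Deliberately loose check: something@something.tld with no spaces.
# Full RFC 5322 validation is out of scope; a confirmation email is the real test.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_email(raw: str) -> str:
    email = raw.strip().lower()   # assumption: treat addresses case-insensitively
    if not EMAIL_RE.match(email):
        raise ValueError(f"invalid email address: {raw!r}")
    return email


def subscribe(subscribers: Dict[str, dict], raw_email: str) -> bool:
    """Return True if the address is newly (re)activated, False if already active."""
    email = normalize_email(raw_email)
    current = subscribers.get(email)
    if current and current["status"] == "active":
        return False
    subscribers[email] = {"email": email, "status": "active"}
    return True


def unsubscribe(subscribers: Dict[str, dict], raw_email: str) -> bool:
    """Return True if an active subscription was deactivated."""
    email = normalize_email(raw_email)
    current = subscribers.get(email)
    if not current or current["status"] != "active":
        return False
    current["status"] = "inactive"   # keep the record instead of deleting it
    return True
```
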
### 4. Newsletter Flow (Future)
```
Scheduler → Backend (email_service) →
MongoDB (articles + subscribers) → SMTP → Users
```

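Composing the newsletter can use the standard library's `email.message.EmailMessage`, with sending as a separate `smtplib` step. This sketch is hypothetical: the subject line, the plain-text layout, and the `title`/`link` keys on each article are assumptions, and the SMTP host and credentials would come from environment variables rather than code.

```python
import smtplib
from email.message import EmailMessage


def build_newsletter(sender: str, recipient: str, articles: list,
                     subject: str = "Munich News Daily") -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # One bullet per article: title on the first line, link indented below.
    lines = [f"- {a['title']}\n  {a['link']}" for a in articles]
    msg.set_content("Today's articles:\n\n" + "\n".join(lines) + "\n")
    return msg


def send_newsletter(msg: EmailMessage, host: str, port: int,
                    user: str, password: str) -> None:
    # Upgrade the connection with STARTTLS before authenticating.
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Keeping composition separate from sending lets the message be built and checked without an SMTP server.
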
### 5. AI Processing Flow (Optional)
```
MongoDB (articles) → Backend (ollama_service) →
Ollama Server → AI Summary → MongoDB (articles)
```

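The summarization call is a plain HTTP POST to the Ollama server. This sketch targets Ollama's `/api/generate` endpoint on the default port 11434 with `stream` set to false, so the reply arrives as a single JSON object whose `response` field holds the generated text. The model name `llama3` and the prompt wording are placeholders; use whatever model is pulled on the server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"   # default Ollama port


def build_summary_request(article_text: str, model: str = "llama3",
                          url: str = OLLAMA_URL) -> urllib.request.Request:
    payload = {
        "model": model,      # placeholder: any model pulled on the Ollama server
        "prompt": "Summarize this news article in three sentences:\n\n" + article_text,
        "stream": False,     # one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(article_text: str, timeout: float = 120.0) -> str:
    with urllib.request.urlopen(build_summary_request(article_text), timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]  # non-streaming replies carry the text in "response"
```

The returned summary would then be written back to the article document, completing the flow.
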
## Deployment Options

### Development
- All services run locally
- MongoDB via Docker Compose
- Manual crawler execution

### Production
- Backend: Cloud VM, container, or PaaS
- Frontend: Same server as the backend, or static hosting (only if the Express API proxy is not needed)
- MongoDB: MongoDB Atlas or self-hosted
- Crawler: Scheduled job (cron, systemd timer)
- Ollama: Separate GPU server (optional)

## Scalability Considerations

### Current Architecture
- Monolithic backend (single Flask instance)
- Standalone crawler (can run multiple instances)
- Shared MongoDB

### Future Improvements
- Load balancer for backend
- Message queue for crawler jobs (Celery + Redis)
- Caching layer (Redis)
- CDN for frontend
- Read replicas for MongoDB

## Security

- CORS enabled for frontend-backend communication
- MongoDB authentication (production)
- Environment variables for secrets
- Input validation on all endpoints
- Rate limiting (future)

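Keeping secrets in environment variables usually means `config.py` reads them once at startup. Below is a minimal sketch of that pattern. The variable names (`MONGODB_URI`, `FLASK_PORT`, `OLLAMA_ENABLED`, `OLLAMA_BASE_URL`, `SMTP_PASSWORD`) and the defaults are assumptions chosen to match the ports in this document, not the project's actual configuration keys.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mongodb_uri: str
    flask_port: int
    ollama_enabled: bool
    ollama_base_url: str
    smtp_password: Optional[str]


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return Settings(
        mongodb_uri=env.get("MONGODB_URI", "mongodb://localhost:27017/"),
        flask_port=int(env.get("FLASK_PORT", "5001")),
        ollama_enabled=_as_bool(env.get("OLLAMA_ENABLED", "false")),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        smtp_password=env.get("SMTP_PASSWORD"),   # None if unset; never hard-coded
    )
```

Accepting an optional mapping instead of always reading `os.environ` keeps the loader easy to test.
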
## Monitoring (Future)

- Application logs
- MongoDB metrics
- Crawler success/failure tracking
- API response times
- Error tracking (Sentry)

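Crawler success/failure tracking can start with the standard `logging` module plus in-process counters, before any dedicated monitoring stack exists. The `CrawlMetrics` class below is a hypothetical sketch; grouping failures by domain is one possible choice that makes a broken source easy to spot.

```python
import logging
from collections import Counter
from urllib.parse import urlsplit

logger = logging.getLogger("news_crawler")   # hypothetical logger name


class CrawlMetrics:
    """In-process counters for crawl outcomes, logged as they happen."""

    def __init__(self) -> None:
        self.outcomes: Counter = Counter()
        self.failures_by_domain: Counter = Counter()

    def record_success(self, url: str) -> None:
        self.outcomes["success"] += 1
        logger.info("crawled %s", url)

    def record_failure(self, url: str, error: str) -> None:
        self.outcomes["failure"] += 1
        self.failures_by_domain[urlsplit(url).netloc] += 1
        logger.warning("crawl failed for %s: %s", url, error)

    def success_rate(self) -> float:
        total = self.outcomes["success"] + self.outcomes["failure"]
        return self.outcomes["success"] / total if total else 0.0
```

At the end of a run, `success_rate()` and `failures_by_domain` could be logged as a summary line or exported to a metrics system.
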