# Munich News Daily - Automated Newsletter System
A fully automated news aggregation system that crawls Munich news sources, generates AI-powered summaries, tracks local transport disruptions, and delivers personalized daily newsletters.

![Munich News Dashboard](docs/images/dashboard.png)

## ✨ Key Features
- **🤖 AI-Powered Clustering** - Detects duplicate stories and groups related articles using ChromaDB vector search (see the sketch after this list).
- **📝 Neutral Summaries** - Generates balanced, multi-perspective summaries using local LLMs (Ollama).
- **🚇 Transport Updates** - Real-time tracking of Munich public transport (MVG) disruptions.
- **🎯 Smart Prioritization** - Ranks stories based on relevance and user preferences.
- **🎨 Personalized Newsletters** - Delivers daily newsletters tailored to each subscriber's interests.
- **📊 Engagement Analytics** - Detailed tracking of open rates, click-throughs, and user interests.
- **⚡ GPU Acceleration** - Integrated support for NVIDIA GPUs for faster AI processing.
- **🔒 Privacy First** - GDPR-compliant with automatic data retention policies and anonymization.
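
To make the clustering step concrete, here is a minimal Python sketch of duplicate detection with ChromaDB. It is illustrative only: the collection name, distance threshold, and ID scheme are assumptions, not the crawler's actual schema.

```python
import chromadb

# Assumed threshold; the real service may tune this differently.
DUPLICATE_DISTANCE = 0.25

client = chromadb.Client()  # in-memory for the sketch; the stack runs a ChromaDB server
articles = client.get_or_create_collection("articles")

def is_duplicate(article_id: str, text: str) -> bool:
    """Flag an article as a duplicate if its nearest stored neighbor is very close."""
    if articles.count() > 0:
        hits = articles.query(query_texts=[text], n_results=1)
        if hits["distances"][0][0] < DUPLICATE_DISTANCE:
            return True
    # New story: index it so later articles can cluster against it.
    articles.add(ids=[article_id], documents=[text])
    return False

print(is_duplicate("a1", "U3 service suspended between Münchner Freiheit and Olympiazentrum"))
print(is_duplicate("a2", "MVG: U3 closed between Münchner Freiheit and Olympiazentrum"))  # likely True
```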
## 🚀 Quick Start
For a detailed 5-minute setup guide, see [QUICKSTART.md](QUICKSTART.md).

```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings

# 2. Start everything (Auto-detects GPU)
./start-with-gpu.sh

# Questions?
# See logs: docker-compose logs -f
```
The system will automatically:
1. **6:00 AM**: Crawl news & transport updates.
2. **6:30 AM**: Generate AI summaries & clusters.
3. **7:00 AM**: Send personalized newsletters.
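
Under the hood, each service simply fires its job at the configured time. A minimal sketch of such a loop in Python (using the `schedule` package; `run_crawl` is a hypothetical stand-in for the crawler's real entry point, and `CRAWLER_TIME` is the variable from the Configuration section below):

```python
import os
import time

import schedule  # third-party: pip install schedule


def run_crawl() -> None:
    # Hypothetical placeholder for the crawler's actual entry point.
    print("Crawling news and transport updates...")


# Run once a day at the configured time, defaulting to 6:00 AM.
schedule.every().day.at(os.getenv("CRAWLER_TIME", "06:00")).do(run_crawl)

while True:
    schedule.run_pending()
    time.sleep(30)  # check twice a minute; plenty for a daily job
```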
## 📋 System Architecture
The system is built as a set of microservices orchestrated by Docker Compose.
```mermaid
graph TD
    User[Subscribers] -->|Email| Sender[Newsletter Sender]
    User -->|Web| Frontend[React Frontend]
    Frontend -->|API| Backend[Backend API]

    subgraph "Core Services"
        Crawler[News Crawler]
        Transport[Transport Crawler]
        Sender
        Backend
    end

    subgraph "Data & AI"
        Mongo[(MongoDB)]
        Redis[(Redis)]
        Chroma[(ChromaDB)]
        Ollama[Ollama AI]
    end

    Crawler -->|Save| Mongo
    Crawler -->|Embeddings| Chroma
    Crawler -->|Summarize| Ollama

    Transport -->|Save| Mongo

    Sender -->|Read| Mongo
    Sender -->|Track| Backend

    Backend -->|Read/Write| Mongo
    Backend -->|Cache| Redis
```
### Core Components
| Service | Description | Port |
|---------|-------------|------|
| **Frontend** | React-based user dashboard and admin interface. | 3000 |
| **Backend API** | Flask API for tracking, analytics, and management. | 5001 |
| **News Crawler** | Fetches RSS feeds, extracts content, and runs AI clustering. | - |
| **Transport Crawler** | Monitors MVG (Munich Transport) for delays and disruptions. | - |
| **Newsletter Sender** | Manages subscribers, generates templates, and sends emails. | - |
| **Ollama** | Local LLM runner for on-premise AI (Phi-3, Llama3, etc.). | - |
| **ChromaDB** | Vector database for semantic search and article clustering. | - |
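
To illustrate how a service talks to the local Ollama runner, here is a minimal sketch against Ollama's standard `/api/generate` HTTP endpoint. The prompt wording and article text are made up; the model name matches the `OLLAMA_MODEL` default from the Configuration section.

```python
import requests

# From another container on the Compose network the host would be "ollama";
# "localhost" works from the host machine if the port is published.
OLLAMA_URL = "http://localhost:11434/api/generate"

article_text = "The city of Munich is planning new bike lanes along Leopoldstrasse..."

response = requests.post(
    OLLAMA_URL,
    json={
        "model": "phi3:latest",  # default OLLAMA_MODEL
        "prompt": "Summarize this Munich news article in a neutral, "
                  "multi-perspective way:\n\n" + article_text,
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
print(response.json()["response"])
```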
## 📂 Project Structure
```text
munich-news/
├── backend/             # Flask API for tracking & analytics
├── frontend/            # React dashboard & admin UI
├── news_crawler/        # RSS fetcher & AI summarizer service
├── news_sender/         # Email generation & dispatch service
├── transport_crawler/   # MVG transport disruption monitor
├── docker-compose.yml   # Main service orchestration
└── docs/                # Detailed documentation
```
## 🛠️ Installation & Setup
1. **Clone the repository**

   ```bash
   git clone https://github.com/yourusername/munich-news.git
   cd munich-news
   ```

2. **Environment Configuration**

   ```bash
   cp backend/.env.example backend/.env
   nano backend/.env
   ```

   *Critical settings:* `SMTP_SERVER`, `EMAIL_USER`, `EMAIL_PASSWORD`.

3. **Start the System**

   ```bash
   # Recommended: Helper script (handles GPU & Model setup)
   ./start-with-gpu.sh

   # Alternative: Standard Docker Compose
   docker-compose up -d
   ```

4. **Initial Setup (First Run)**

   * The system needs to download the AI model (approx. 2 GB).
   * Watch progress: `docker-compose logs -f ollama-setup`
## ⚙️ Configuration
Key configuration options in `backend/.env`:

| Category | Variable | Description |
|----------|----------|-------------|
| **Email** | `SMTP_SERVER` | SMTP server (e.g., `smtp.gmail.com`) |
| | `EMAIL_USER` | Your sending email address |
| **AI** | `OLLAMA_MODEL` | Model to use (default: `phi3:latest`) |
| **Schedule** | `CRAWLER_TIME` | Time to start crawling (e.g., `"06:00"`) |
| | `SENDER_TIME` | Time to send emails (e.g., `"07:00"`) |
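
Putting those together, a minimal `backend/.env` could look like the following. All values are placeholders; `EMAIL_PASSWORD` is the third critical setting named in the setup steps above.

```text
SMTP_SERVER=smtp.gmail.com
EMAIL_USER=newsletter@example.com
EMAIL_PASSWORD=replace-with-your-app-password
OLLAMA_MODEL=phi3:latest
CRAWLER_TIME=06:00
SENDER_TIME=07:00
```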
## 📊 Usage & Monitoring
### Access Points

* **Web Dashboard**: [http://localhost:3000](http://localhost:3000) (or configured domain)
* **API**: [http://localhost:5001](http://localhost:5001)
### Useful Commands
**View Logs**

```bash
docker-compose logs -f [service_name]
# e.g., docker-compose logs -f crawler
```
**Manual Trigger**

```bash
# Run News Crawler immediately
docker-compose exec crawler python crawler_service.py 10

# Run Transport Crawler immediately
docker-compose exec transport-crawler python transport_service.py

# Send Test Newsletter
docker-compose exec sender python sender_service.py test user@example.com
```
**Database Access**

```bash
# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news
```
## 🌐 Production Deployment (Traefik)
This project is configured to work with **Traefik** as a reverse proxy.
The `docker-compose.yml` includes labels for:

- `news.dongho.kim` (Frontend)
- `news-api.dongho.kim` (Backend)

To use this locally, add these to your `/etc/hosts`:

```text
127.0.0.1 news.dongho.kim news-api.dongho.kim
```
For production, ensure your Traefik proxy network is named `proxy` or update the `docker-compose.yml` accordingly.
## 🤝 Contributing
We welcome contributions! Please check [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## 📄 License
MIT License - see [LICENSE](LICENSE) for details.