# Munich News Daily - Automated Newsletter System
A fully automated news aggregation system that crawls Munich news sources, generates AI-powered summaries, tracks local transport disruptions, and delivers personalized daily newsletters.
![Munich News Daily](https://via.placeholder.com/800x400?text=Munich+News+Daily+Dashboard)
## ✨ Key Features
- **🤖 AI-Powered Clustering** - Detects duplicate stories and groups related articles using ChromaDB vector search (sketched below this list).
- **📝 Neutral Summaries** - Generates balanced, multi-perspective summaries using local LLMs (Ollama).
- **🚇 Transport Updates** - Real-time tracking of Munich public transport (MVG) disruptions.
- **🎯 Smart Prioritization** - Ranks stories based on relevance and user preferences.
- **🎨 Personalized Newsletters** - Delivers daily newsletters tailored to each subscriber's preferences and interests.
- **📊 Engagement Analytics** - Detailed tracking of open rates, click-throughs, and user interests.
- **⚡ GPU Acceleration** - Integrated support for NVIDIA GPUs for faster AI processing.
- **🔒 Privacy First** - GDPR-compliant with automatic data retention policies and anonymization.
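
How the clustering step might work, as a minimal sketch: embed each incoming article, ask ChromaDB for its nearest stored neighbour, and treat very close matches as the same story. Collection names, the distance threshold, and the in-process client are illustrative assumptions, not the crawler's actual code.

```python
# Minimal sketch of duplicate detection with ChromaDB (names are assumptions;
# the real service likely connects to the chromadb container instead).
import chromadb

client = chromadb.Client()
articles = client.get_or_create_collection("articles")

def is_duplicate(article_id: str, text: str, threshold: float = 0.25) -> bool:
    """Treat an article as a duplicate if a previously stored one is very close."""
    if articles.count() > 0:
        result = articles.query(query_texts=[text], n_results=1)
        distances = result["distances"][0]
        if distances and distances[0] < threshold:
            return True  # close neighbour found: same story, skip or merge
    # Otherwise store it so later articles can be compared against it
    articles.add(ids=[article_id], documents=[text])
    return False
```

The threshold depends on the embedding model and distance metric, so a real deployment would tune it against known duplicates.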
## 🚀 Quick Start
For a detailed 5-minute setup guide, see [QUICKSTART.md](QUICKSTART.md).
```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings
# 2. Start everything (Auto-detects GPU)
./start-with-gpu.sh
# Questions?
# See logs: docker-compose logs -f
```
The system will automatically:
1. **6:00 AM**: Crawl news & transport updates.
2. **6:30 AM**: Generate AI summaries & clusters.
3. **7:00 AM**: Send personalized newsletters.
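
A minimal sketch of how such a daily pipeline can be driven in Python with the `schedule` library; the actual services read their times from `CRAWLER_TIME` and `SENDER_TIME` (see Configuration) and may use a different mechanism.

```python
# Illustrative daily scheduling loop (the real services may differ).
import os
import time

import schedule  # pip install schedule

def crawl():
    ...  # fetch RSS feeds and transport updates

def summarize():
    ...  # run AI clustering and summarization

def send_newsletters():
    ...  # render and send personalized emails

schedule.every().day.at(os.getenv("CRAWLER_TIME", "06:00")).do(crawl)
schedule.every().day.at("06:30").do(summarize)
schedule.every().day.at(os.getenv("SENDER_TIME", "07:00")).do(send_newsletters)

while True:
    schedule.run_pending()
    time.sleep(30)
```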
## 📋 System Architecture
The system is built as a set of microservices orchestrated by Docker Compose.
```mermaid
graph TD
User[Subscribers] -->|Email| Sender[Newsletter Sender]
User -->|Web| Frontend[React Frontend]
Frontend -->|API| Backend[Backend API]
subgraph "Core Services"
Crawler[News Crawler]
Transport[Transport Crawler]
Sender
Backend
end
subgraph "Data & AI"
Mongo[(MongoDB)]
Redis[(Redis)]
Chroma[(ChromaDB)]
Ollama[Ollama AI]
end
Crawler -->|Save| Mongo
Crawler -->|Embeddings| Chroma
Crawler -->|Summarize| Ollama
Transport -->|Save| Mongo
Sender -->|Read| Mongo
Sender -->|Track| Backend
Backend -->|Read/Write| Mongo
Backend -->|Cache| Redis
```
### Core Components
| Service | Description | Port |
|---------|-------------|------|
| **Frontend** | React-based user dashboard and admin interface. | 3000 |
| **Backend API** | Flask API for tracking, analytics, and management. | 5001 |
| **News Crawler** | Fetches RSS feeds, extracts content, and runs AI clustering. | - |
| **Transport Crawler** | Monitors MVG (Munich Transport) for delays and disruptions. | - |
| **Newsletter Sender** | Manages subscribers, generates templates, and sends emails. | - |
| **Ollama** | Local LLM runner for on-premise AI (Phi-3, Llama3, etc.). | - |
| **ChromaDB** | Vector database for semantic search and article clustering. | - |
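
To illustrate the tracking path in the diagram (Sender → Backend), a hypothetical open-tracking endpoint on the Flask backend could look like the sketch below. The route, collection, and field names are assumptions for illustration only, not the project's actual API.

```python
# Hypothetical open-tracking endpoint; route and field names are assumptions.
from datetime import datetime, timezone

from flask import Flask, request
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient("mongodb://mongodb:27017")["munich_news"]

@app.route("/track/open/<subscriber_id>")
def track_open(subscriber_id: str):
    db["events"].insert_one({
        "type": "open",
        "subscriber_id": subscriber_id,
        "newsletter_id": request.args.get("newsletter"),
        "timestamp": datetime.now(timezone.utc),
    })
    # A real endpoint would usually return a 1x1 transparent pixel image;
    # an empty 204 keeps this sketch short.
    return "", 204
```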
## 📂 Project Structure
```text
munich-news/
├── backend/ # Flask API for tracking & analytics
├── frontend/ # React dashboard & admin UI
├── news_crawler/ # RSS fetcher & AI summarizer service
├── news_sender/ # Email generation & dispatch service
├── transport_crawler/ # MVG transport disruption monitor
├── docker-compose.yml # Main service orchestration
└── docs/ # Detailed documentation
```
## 🛠️ Installation & Setup
1. **Clone the repository**
```bash
git clone https://github.com/yourusername/munich-news.git
cd munich-news
```
2. **Environment Configuration**
```bash
cp backend/.env.example backend/.env
nano backend/.env
```
*Critical settings:* `SMTP_SERVER`, `EMAIL_USER`, `EMAIL_PASSWORD`.
3. **Start the System**
```bash
# Recommended: Helper script (handles GPU & Model setup)
./start-with-gpu.sh
# Alternative: Standard Docker Compose
docker-compose up -d
```
4. **Initial Setup (First Run)**
* The system needs to download the AI model (approx. 2GB).
* Watch progress: `docker-compose logs -f ollama-setup`
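
If you want to check or trigger the model download yourself, here is a minimal sketch against Ollama's REST API, assuming the container's default port 11434 is reachable from the host:

```python
# Check installed models and pull one via Ollama's REST API (sketch; assumes
# the Ollama container is reachable on localhost:11434).
import requests

OLLAMA_URL = "http://localhost:11434"
MODEL = "phi3:latest"  # should match OLLAMA_MODEL in backend/.env

tags = requests.get(f"{OLLAMA_URL}/api/tags").json()
installed = {m["name"] for m in tags.get("models", [])}

if MODEL not in installed:
    # Blocks until the ~2GB download finishes; you can watch
    # `docker-compose logs -f ollama-setup` in the meantime.
    requests.post(f"{OLLAMA_URL}/api/pull", json={"model": MODEL, "stream": False})
    print(f"Pulled {MODEL}")
else:
    print(f"{MODEL} already installed")
```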
## ⚙️ Configuration
Key configuration options in `backend/.env`:
| Category | Variable | Description |
|----------|----------|-------------|
| **Email** | `SMTP_SERVER` | SMTP Server (e.g., smtp.gmail.com) |
| | `EMAIL_USER` | Your sending email address |
| **AI** | `OLLAMA_MODEL` | Model to use (default: phi3:latest) |
| **Schedule** | `CRAWLER_TIME` | Time to start crawling (e.g., "06:00") |
| | `SENDER_TIME` | Time to send emails (e.g., "07:00") |
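
For illustration, this is how a service might consume these variables; it is a sketch using `python-dotenv`, not the project's actual loader, and the defaults shown are assumptions.

```python
# Illustrative settings loader (sketch; defaults are assumptions).
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv("backend/.env")

SMTP_SERVER = os.getenv("SMTP_SERVER", "smtp.gmail.com")
EMAIL_USER = os.getenv("EMAIL_USER")
EMAIL_PASSWORD = os.getenv("EMAIL_PASSWORD")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "phi3:latest")
CRAWLER_TIME = os.getenv("CRAWLER_TIME", "06:00")
SENDER_TIME = os.getenv("SENDER_TIME", "07:00")
```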
## 📊 Usage & Monitoring
### Access Points
* **Web Dashboard**: [http://localhost:3000](http://localhost:3000) (or configured domain)
* **API**: [http://localhost:5001](http://localhost:5001)
### Useful Commands
**View Logs**
```bash
docker-compose logs -f [service_name]
# e.g., docker-compose logs -f crawler
```
**Manual Trigger**
```bash
# Run News Crawler immediately
docker-compose exec crawler python crawler_service.py 10
# Run Transport Crawler immediately
docker-compose exec transport-crawler python transport_service.py
# Send Test Newsletter
docker-compose exec sender python sender_service.py test user@example.com
```
**Database Access**
```bash
# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news
```
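The same data can be inspected from Python with `pymongo`; the collection and field names below are assumptions about the schema, and the example assumes MongoDB's port is published to the host (inside the compose network, use `mongodb:27017`).

```python
# Inspect recent articles with pymongo (collection/field names are assumptions).
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["munich_news"]

# Newest documents first; _id sorts roughly by insertion time.
for article in db["articles"].find().sort("_id", -1).limit(5):
    print(article.get("title", "<no title>"))
```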
## 🌐 Production Deployment (Traefik)
This project is configured to work with **Traefik** as a reverse proxy.
The `docker-compose.yml` includes labels for:
- `news.dongho.kim` (Frontend)
- `news-api.dongho.kim` (Backend)
To use this locally, add these to your `/etc/hosts`:
```text
127.0.0.1 news.dongho.kim news-api.dongho.kim
```
For production, ensure your Traefik proxy network is named `proxy` or update the `docker-compose.yml` accordingly.
## 🤝 Contributing
We welcome contributions! Please check [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## 📄 License
MIT License - see [LICENSE](LICENSE) for details.