Munich News Daily - Automated Newsletter System

A fully automated news aggregation system that crawls Munich news sources, generates AI-powered summaries, tracks local transport disruptions, and delivers personalized daily newsletters.

Key Features

  • 🤖 AI-Powered Clustering - Detects duplicate stories and groups related articles using ChromaDB vector search.
  • 📝 Neutral Summaries - Generates balanced, multi-perspective summaries using local LLMs (Ollama).
  • 🚇 Transport Updates - Real-time tracking of Munich public transport (MVG) disruptions.
  • 🎯 Smart Prioritization - Ranks stories based on relevance and user preferences.
  • 🎨 Personalized Newsletters - Delivers daily newsletters tailored to each subscriber's interests.
  • 📊 Engagement Analytics - Detailed tracking of open rates, click-throughs, and user interests.
  • ⚡ GPU Acceleration - Integrated support for NVIDIA GPUs for faster AI processing.
  • 🔒 Privacy First - GDPR-compliant with automatic data retention policies and anonymization.
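
In the real pipeline the clustering feature relies on ChromaDB vector search; purely as an illustration of the underlying idea, here is a minimal, dependency-free sketch of greedy duplicate detection by cosine similarity over article embeddings (the similarity threshold is an assumption, not the project's actual setting):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster(embeddings, threshold=0.85):
    """Greedy clustering: each article joins the first cluster whose
    representative embedding is within `threshold` cosine similarity,
    otherwise it starts a new cluster."""
    clusters = []  # list of (representative_embedding, member_indices)
    for i, emb in enumerate(embeddings):
        for rep, members in clusters:
            if cosine(emb, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((emb, [i]))
    return [members for _, members in clusters]
```

A vector database performs the same nearest-neighbor lookup, but with an index instead of this O(n²) scan.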

🚀 Quick Start

For a detailed 5-minute setup guide, see QUICKSTART.md.

# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings

# 2. Start everything (Auto-detects GPU)
./start-with-gpu.sh

# Questions?
# See logs: docker-compose logs -f

The system will automatically:

  1. 6:00 AM: Crawl news & transport updates.
  2. 6:30 AM: Generate AI summaries & clusters.
  3. 7:00 AM: Send personalized newsletters.
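The scheduler's actual implementation is not shown in this README; as a sketch of how a service could turn a CRAWLER_TIME/SENDER_TIME-style "HH:MM" string into its next run time:

```python
from datetime import datetime, timedelta

def next_run(schedule_hhmm: str, now: datetime) -> datetime:
    """Return the next datetime matching a daily 'HH:MM' schedule.

    `schedule_hhmm` mirrors the CRAWLER_TIME / SENDER_TIME format
    used in backend/.env (e.g. "06:00").
    """
    hour, minute = map(int, schedule_hhmm.split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot already passed -> run tomorrow
    return candidate
```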

📋 System Architecture

The system is built as a set of microservices orchestrated by Docker Compose.

graph TD
    User[Subscribers] -->|Email| Sender[Newsletter Sender]
    User -->|Web| Frontend[React Frontend]
    Frontend -->|API| Backend[Backend API]
    
    subgraph "Core Services"
        Crawler[News Crawler]
        Transport[Transport Crawler]
        Sender
        Backend
    end
    
    subgraph "Data & AI"
        Mongo[(MongoDB)]
        Redis[(Redis)]
        Chroma[(ChromaDB)]
        Ollama[Ollama AI]
    end
    
    Crawler -->|Save| Mongo
    Crawler -->|Embeddings| Chroma
    Crawler -->|Summarize| Ollama
    
    Transport -->|Save| Mongo
    
    Sender -->|Read| Mongo
    Sender -->|Track| Backend
    
    Backend -->|Read/Write| Mongo
    Backend -->|Cache| Redis

Core Components

| Service | Description | Port |
| --- | --- | --- |
| Frontend | React-based user dashboard and admin interface. | 3000 |
| Backend API | Flask API for tracking, analytics, and management. | 5001 |
| News Crawler | Fetches RSS feeds, extracts content, and runs AI clustering. | - |
| Transport Crawler | Monitors MVG (Munich Transport) for delays and disruptions. | - |
| Newsletter Sender | Manages subscribers, generates templates, and sends emails. | - |
| Ollama | Local LLM runner for on-premise AI (Phi-3, Llama3, etc.). | - |
| ChromaDB | Vector database for semantic search and article clustering. | - |
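
Ollama exposes a local HTTP API (on port 11434 by default) that the crawler can call for summarization. The prompt and URL below are illustrative assumptions, not the crawler's actual code:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, article_text: str) -> dict:
    """Construct a request body for Ollama's /api/generate endpoint.
    The prompt wording here is a hypothetical example."""
    return {
        "model": model,
        "prompt": f"Summarize this news article neutrally:\n\n{article_text}",
        "stream": False,  # return one complete response instead of a stream
    }

def summarize(article_text: str, model: str = "phi3:latest") -> str:
    """POST to the local Ollama server and return the generated summary."""
    body = json.dumps(build_payload(model, article_text)).encode()
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```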

📂 Project Structure

munich-news/
├── backend/            # Flask API for tracking & analytics
├── frontend/           # React dashboard & admin UI
├── news_crawler/       # RSS fetcher & AI summarizer service
├── news_sender/        # Email generation & dispatch service
├── transport_crawler/  # MVG transport disruption monitor
├── docker-compose.yml  # Main service orchestration
└── docs/               # Detailed documentation

🛠️ Installation & Setup

  1. Clone the repository

    git clone https://github.com/yourusername/munich-news.git
    cd munich-news
    
  2. Environment Configuration

    cp backend/.env.example backend/.env
    nano backend/.env
    

    Critical settings: SMTP_SERVER, EMAIL_USER, EMAIL_PASSWORD.

  3. Start the System

    # Recommended: Helper script (handles GPU & Model setup)
    ./start-with-gpu.sh
    
    # Alternative: Standard Docker Compose
    docker-compose up -d
    
  4. Initial Setup (First Run)

    • The system needs to download the AI model (approx. 2GB).
    • Watch progress: docker-compose logs -f ollama-setup

⚙️ Configuration

Key configuration options in backend/.env:

| Category | Variable | Description |
| --- | --- | --- |
| Email | SMTP_SERVER | SMTP server (e.g., smtp.gmail.com) |
| Email | EMAIL_USER | Your sending email address |
| AI | OLLAMA_MODEL | Model to use (default: phi3:latest) |
| Schedule | CRAWLER_TIME | Time to start crawling (e.g., "06:00") |
| Schedule | SENDER_TIME | Time to send emails (e.g., "07:00") |
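
A minimal sketch of how a service might read these variables from the environment; the variable names match backend/.env, while the fallback values simply mirror the defaults and examples documented above:

```python
import os

def load_config(env=None):
    """Read the settings from the configuration table, falling back to
    the documented defaults when a variable is unset."""
    if env is None:
        env = os.environ
    return {
        "smtp_server": env.get("SMTP_SERVER", "smtp.gmail.com"),
        "email_user": env.get("EMAIL_USER", ""),
        "ollama_model": env.get("OLLAMA_MODEL", "phi3:latest"),
        "crawler_time": env.get("CRAWLER_TIME", "06:00"),
        "sender_time": env.get("SENDER_TIME", "07:00"),
    }
```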

📊 Usage & Monitoring

Access Points

  • Frontend dashboard: http://localhost:3000
  • Backend API: http://localhost:5001

Useful Commands

View Logs

docker-compose logs -f [service_name]
# e.g., docker-compose logs -f crawler

Manual Trigger

# Run News Crawler immediately
docker-compose exec crawler python crawler_service.py 10

# Run Transport Crawler immediately
docker-compose exec transport-crawler python transport_service.py

# Send Test Newsletter
docker-compose exec sender python sender_service.py test user@example.com

Database Access

# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news

🌐 Production Deployment (Traefik)

This project is configured to work with Traefik as a reverse proxy. The docker-compose.yml includes labels for:

  • news.dongho.kim (Frontend)
  • news-api.dongho.kim (Backend)

To use this locally, add these to your /etc/hosts:

127.0.0.1 news.dongho.kim news-api.dongho.kim

For production, ensure your Traefik proxy network is named proxy or update the docker-compose.yml accordingly.

🤝 Contributing

We welcome contributions! Please check CONTRIBUTING.md for guidelines.

📄 License

MIT License - see LICENSE for details.
