Munich News Daily - Automated Newsletter System

A fully automated news aggregation and newsletter system that crawls Munich news sources, generates AI summaries, and sends daily newsletters with engagement tracking.

Key Features

  • 🤖 AI-Powered Clustering - Automatically detects duplicate stories from different sources (see the sketch below)
  • 📰 Neutral Summaries - Combines multiple perspectives into balanced coverage
  • 🎯 Smart Prioritization - Shows the most important stories first, prioritizing those covered by multiple sources
  • 📊 Engagement Tracking - Open rates, click tracking, and analytics
  • ⚡ GPU Acceleration - 5-10x faster AI processing with GPU support
  • 🔒 GDPR Compliant - Privacy-first with data retention controls

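How the clustering works internally is not documented here, but the core idea is to group near-duplicate headlines coming from different feeds. A rough illustration of that idea in Python (not the project's actual implementation; the function names and similarity threshold are made up):

# Illustrative only: group near-duplicate headlines by fuzzy title similarity.
# The real pipeline may use embeddings or the LLM instead of difflib.
from difflib import SequenceMatcher

def same_story(a: str, b: str, threshold: float = 0.75) -> bool:
    """Treat two headlines as the same story if their titles are close enough."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cluster_headlines(headlines: list[str]) -> list[list[str]]:
    clusters: list[list[str]] = []
    for title in headlines:
        for cluster in clusters:
            if same_story(title, cluster[0]):
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters

print(cluster_headlines([
    "U-Bahn line U6 closed after water damage",
    "Water damage shuts down U6 line",
    "New bike lanes approved for Schwabing",
]))
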
🚀 NEW: GPU acceleration support for 5-10x faster AI processing! See docs/GPU_SETUP.md

🚀 Quick Start

# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings

# 2. Start everything
docker-compose up -d

# 3. View logs
docker-compose logs -f

That's it! The system will automatically:

  • 6:00 AM Berlin time: Crawl news articles and generate summaries
  • 7:00 AM Berlin time: Send newsletter to all subscribers

Access Points

  • Frontend: Web interface and admin dashboard (http://localhost:3000)
  • Backend API: Runs continuously for tracking and analytics (http://localhost:5001)

📖 New to the project? See QUICKSTART.md for a detailed 5-minute setup guide.

🚀 GPU Acceleration: Enable 5-10x faster AI processing with the GPU Setup Guide (docs/GPU_SETUP.md)

📋 System Overview

6:00 AM → News Crawler
          ↓
          Fetches articles from RSS feeds
          Extracts full content
          Generates AI summaries
          Saves to MongoDB
          ↓
7:00 AM → Newsletter Sender
          ↓
          Waits for crawler to finish
          Fetches today's articles
          Generates newsletter with tracking
          Sends to all subscribers
          ↓
          ✅ Done! Repeat tomorrow

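The two scheduled jobs share state only through MongoDB: the crawler writes article documents, and the sender later reads back whatever was crawled inside the lookback window. A minimal sketch of that hand-off (only the articles collection and the crawled_at field appear elsewhere in this README; the other field names are assumptions):

# Illustrative hand-off between crawler and sender via MongoDB.
# "articles" and "crawled_at" come from this README; url/title/summary are assumed names.
from datetime import datetime, timedelta
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]

# Crawler side: store one summarized article.
db.articles.insert_one({
    "url": "https://example.com/article",
    "title": "Example headline",
    "summary": "AI-generated summary goes here.",
    "crawled_at": datetime.utcnow(),
})

# Sender side: pick up everything crawled in the last NEWSLETTER_HOURS_LOOKBACK hours.
cutoff = datetime.utcnow() - timedelta(hours=24)
todays_articles = list(db.articles.find({"crawled_at": {"$gte": cutoff}}))
print(f"{len(todays_articles)} article(s) ready for the newsletter")
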
🏗️ Architecture

Components

  • Ollama: AI service for summarization and translation (internal only, GPU-accelerated)
  • MongoDB: Data storage (articles, subscribers, tracking) (internal only)
  • Backend API: Flask API for tracking and analytics (exposed on port 5001)
  • News Crawler: Automated RSS feed crawler with AI summarization (internal only)
  • Newsletter Sender: Automated email sender with tracking (internal only)
  • Frontend: React web interface and admin dashboard (optional, port 3000)

Technology Stack

  • Python 3.11
  • MongoDB 7.0
  • Ollama (phi3:latest model for AI)
  • Docker & Docker Compose
  • Flask (API)
  • Schedule (automation)
  • Jinja2 (email templates; see the sketch below)

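Jinja2 is what turns the day's articles into the newsletter HTML. A stripped-down sketch of that rendering step (the template and variable names are invented for illustration, not the project's actual template):

# Minimal illustration of rendering a newsletter body with Jinja2.
# The real template ships with the sender service; this one is made up.
from jinja2 import Template

template = Template("""
<h1>Munich News Daily</h1>
{% for article in articles %}
  <h2><a href="{{ article.url }}">{{ article.title }}</a></h2>
  <p>{{ article.summary }}</p>
{% endfor %}
""")

html = template.render(articles=[
    {"url": "https://example.com/article",
     "title": "Example headline",
     "summary": "Short neutral summary."},
])
print(html)
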
📦 Installation

Prerequisites

  • Docker & Docker Compose
  • 4GB+ RAM (for Ollama AI models)
  • (Optional) NVIDIA GPU for 5-10x faster AI processing

Setup

  1. Clone the repository

    git clone <repository-url>
    cd munich-news
    
  2. Configure environment

    cp backend/.env.example backend/.env
    # Edit backend/.env with your settings
    
  3. Configure Ollama (AI features)

    # Option 1: Use integrated Docker Compose Ollama (recommended)
    ./configure-ollama.sh
    # Select option 1
    
    # Option 2: Use external Ollama server
    # Install from https://ollama.ai/download
    # Then run: ollama pull phi3:latest
    
  4. Start the system

    # Auto-detect GPU and start (recommended)
    ./start-with-gpu.sh
    
    # Or start manually
    docker-compose up -d
    
    # First time: Wait for Ollama model download (2-5 minutes)
    docker-compose logs -f ollama-setup
    

📖 For detailed Ollama setup & GPU acceleration: See docs/OLLAMA_SETUP.md

💡 To change AI model: Edit OLLAMA_MODEL in .env, then run ./pull-ollama-model.sh. See docs/CHANGING_AI_MODEL.md

⚙️ Configuration

Edit backend/.env:

# MongoDB
MONGODB_URI=mongodb://localhost:27017/

# Email (SMTP)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password

# Newsletter
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24

# Tracking
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90

# Ollama (AI Summarization)
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest

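With OLLAMA_ENABLED=true, the crawler calls the Ollama HTTP API at OLLAMA_BASE_URL to generate summaries. A quick way to sanity-check the settings above is a single non-streaming generate call from Python (the prompt is only an example):

# Sanity-check the Ollama settings from backend/.env with one summarization call.
import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",   # OLLAMA_BASE_URL + /api/generate
    json={
        "model": "phi3:latest",               # OLLAMA_MODEL
        "prompt": "Summarize in two sentences: <paste article text here>",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
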
📊 Usage

View Logs

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f crawler
docker-compose logs -f sender
docker-compose logs -f mongodb

Manual Operations

# Run crawler manually
docker-compose exec crawler python crawler_service.py 10

# Send test newsletter
docker-compose exec sender python sender_service.py test your-email@example.com

# Preview newsletter
docker-compose exec sender python sender_service.py preview

Database Access

# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news

# View articles
db.articles.find().sort({ crawled_at: -1 }).limit(5).pretty()

# View subscribers
db.subscribers.find({ active: true }).pretty()

# View tracking data
db.newsletter_sends.find().sort({ created_at: -1 }).limit(10).pretty()

🔧 Management

Add RSS Feeds

docker-compose exec mongodb mongosh munich_news

db.rss_feeds.insertOne({
  name: "Source Name",
  url: "https://example.com/rss",
  active: true
})

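Newly added feeds are picked up on the next crawl. Conceptually, the crawler iterates over every active feed along these lines (an illustrative sketch using feedparser; the real service additionally extracts full article content and generates AI summaries):

# Illustrative crawl loop over the active RSS feeds stored above.
import feedparser
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]

for feed in db.rss_feeds.find({"active": True}):
    parsed = feedparser.parse(feed["url"])
    for entry in parsed.entries[:10]:
        print(feed["name"], "->", entry.title, entry.link)
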
Add Subscribers

docker-compose exec mongodb mongosh munich_news

db.subscribers.insertOne({
  email: "user@example.com",
  active: true,
  tracking_enabled: true,
  subscribed_at: new Date()
})

View Analytics

# Newsletter metrics
curl http://localhost:5001/api/analytics/newsletter/2024-01-15

# Article performance
curl http://localhost:5001/api/analytics/article/https://example.com/article

# Subscriber activity
curl http://localhost:5001/api/analytics/subscriber/user@example.com

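The same endpoints can be queried from scripts, for example to dump the raw metrics for one newsletter (the response fields depend on the backend and are not listed here):

# Fetch newsletter metrics for one day from the tracking API.
import json
import requests

resp = requests.get("http://localhost:5001/api/analytics/newsletter/2024-01-15", timeout=10)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
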
Schedule Configuration

Change Crawler Time (default: 6:00 AM)

Edit news_crawler/scheduled_crawler.py:

schedule.every().day.at("06:00").do(run_crawler)  # Change time

Change Sender Time (default: 7:00 AM)

Edit news_sender/scheduled_sender.py:

schedule.every().day.at("07:00").do(run_sender)  # Change time
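
Both scheduled services follow the usual schedule-library pattern of registering a job and polling it in a loop, roughly like this (simplified, not the exact project code):

# Simplified version of the scheduling loop used by the crawler/sender services.
import time
import schedule

def run_crawler():
    print("crawling...")  # placeholder for the real job

schedule.every().day.at("06:00").do(run_crawler)

while True:
    schedule.run_pending()
    time.sleep(60)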

After changes:

docker-compose up -d --build

📈 Monitoring

Container Status

docker-compose ps

Check Next Scheduled Runs

# Crawler
docker-compose logs crawler | grep "Next scheduled run"

# Sender
docker-compose logs sender | grep "Next scheduled run"

Engagement Metrics

docker-compose exec mongodb mongosh munich_news

// Open rate
var sent = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15" })
var opened = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15", opened: true })
print("Open Rate: " + ((opened / sent) * 100).toFixed(2) + "%")

// Click rate
var clicks = db.link_clicks.countDocuments({ newsletter_id: "2024-01-15" })
print("Click Rate: " + ((clicks / sent) * 100).toFixed(2) + "%")

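The opened flag used above is typically set when a subscriber's mail client loads a per-send tracking pixel served by the backend. A schematic Flask route showing the idea (the route, parameter, and _id handling are invented for illustration, not the backend's actual API):

# Schematic open-tracking endpoint (illustrative only, not the real backend route).
from flask import Flask, Response
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient("mongodb://localhost:27017/")["munich_news"]

@app.route("/api/tracking/open/<send_id>")   # hypothetical route
def track_open(send_id):
    db.newsletter_sends.update_one({"_id": send_id}, {"$set": {"opened": True}})
    # A real implementation would return a 1x1 transparent image here.
    return Response(status=204)
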
🐛 Troubleshooting

Crawler Not Finding Articles

# Check RSS feeds
docker-compose exec mongodb mongosh munich_news --eval "db.rss_feeds.find({ active: true })"

# Test manually
docker-compose exec crawler python crawler_service.py 5

Newsletter Not Sending

# Check email config
docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"

# Test email
docker-compose exec sender python sender_service.py test your-email@example.com

Containers Not Starting

# Check logs
docker-compose logs

# Rebuild
docker-compose up -d --build

# Reset everything
docker-compose down -v
docker-compose up -d

🔐 Privacy & Compliance

GDPR Features

  • Data Retention: Automatic anonymization after 90 days
  • Opt-Out: Subscribers can disable tracking
  • Data Deletion: Full data removal on request
  • Transparency: Privacy notice in all emails

Privacy Endpoints

# Delete subscriber data
curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com

# Anonymize old data
curl -X POST http://localhost:5001/api/tracking/anonymize

# Opt out of tracking
curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out

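Conceptually, the anonymize endpoint applies the TRACKING_DATA_RETENTION_DAYS policy by stripping identifying fields from old tracking records. A rough pymongo equivalent (the newsletter_sends collection and created_at field appear elsewhere in this README; the email field is an assumption):

# Rough illustration of the 90-day anonymization policy (not the actual backend code).
from datetime import datetime, timedelta
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]
cutoff = datetime.utcnow() - timedelta(days=90)  # TRACKING_DATA_RETENTION_DAYS

result = db.newsletter_sends.update_many(
    {"created_at": {"$lt": cutoff}},
    {"$unset": {"email": ""}},   # "email" is an assumed field name
)
print(f"Anonymized {result.modified_count} old tracking record(s)")
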
📚 Documentation

Complete documentation is available in the docs/ directory, including:

  • QUICKSTART.md - detailed 5-minute setup guide
  • docs/GPU_SETUP.md - GPU acceleration setup
  • docs/OLLAMA_SETUP.md - Ollama setup and configuration
  • docs/CHANGING_AI_MODEL.md - changing the AI model

🧪 Testing

All test files are organized in the tests/ directory:

# Run crawler tests
docker-compose exec crawler python tests/crawler/test_crawler.py

# Run sender tests
docker-compose exec sender python tests/sender/test_tracking_integration.py

# Run backend tests
docker-compose exec backend python tests/backend/test_tracking.py

🚀 Production Deployment

Environment Setup

  1. Update backend/.env with production values
  2. Set strong MongoDB password
  3. Use HTTPS for tracking URLs
  4. Configure proper SMTP server

Security

# Set a strong MongoDB password (used by the production compose file)
export MONGO_PASSWORD=your-secure-password

# Start with the production compose file
docker-compose -f docker-compose.prod.yml up -d

Monitoring

  • Set up log rotation
  • Configure health checks
  • Set up alerts for failures
  • Monitor database size


📝 License

[Your License Here]

🤝 Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

📧 Support

For issues or questions, please open a GitHub issue.


Built with ❤️ for Munich News Daily
