# Munich News Daily - Automated Newsletter System

A fully automated news aggregation and newsletter system that crawls Munich news sources, generates AI summaries, and sends daily newsletters with engagement tracking.

## ✨ Key Features

- **🤖 AI-Powered Clustering** - Automatically detects duplicate stories from different sources
- **📰 Neutral Summaries** - Combines multiple perspectives into balanced coverage
- **🎯 Smart Prioritization** - Shows the most important stories first (multi-source coverage)
- **📊 Engagement Tracking** - Open rates, click tracking, and analytics
- **⚡ GPU Acceleration** - 5-10x faster AI processing with GPU support
- **🔒 GDPR Compliant** - Privacy-first with data retention controls

## 🚀 Quick Start

```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings

# 2. Start everything
docker-compose up -d

# 3. View logs
docker-compose logs -f
```

That's it! Once the containers are up:

- **Backend API**: Runs continuously for tracking and analytics (http://localhost:5001)
- **6:00 AM Berlin time**: Crawls news articles and generates summaries
- **7:00 AM Berlin time**: Sends the newsletter to all subscribers

📖 **New to the project?** See [QUICKSTART.md](QUICKSTART.md) for a detailed 5-minute setup guide.

🚀 **GPU Acceleration:** Enable 5-10x faster AI processing with the [GPU Setup Guide](docs/GPU_SETUP.md)

## 📋 System Overview

```
6:00 AM → News Crawler
    ↓
  Fetches articles from RSS feeds
  Extracts full content
  Generates AI summaries
  Saves to MongoDB
    ↓
7:00 AM → Newsletter Sender
    ↓
  Waits for crawler to finish
  Fetches today's articles
  Generates newsletter with tracking
  Sends to all subscribers
    ↓
✅ Done! Repeat tomorrow
```
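The two fixed Berlin-time stages above determine when each service fires next. As an illustrative sketch (the `next_run` helper below is hypothetical, not part of the project code), the "next scheduled run" the services log can be computed like this:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

BERLIN = ZoneInfo("Europe/Berlin")

def next_run(now: datetime, hour: int) -> datetime:
    """Return the next occurrence of `hour`:00 Berlin time at or after `now`."""
    local = now.astimezone(BERLIN)
    candidate = local.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= local:
        candidate += timedelta(days=1)  # already past today's slot
    return candidate

# At 06:30 Berlin time the crawler (06:00) next runs tomorrow,
# while the sender (07:00) still runs today.
now = datetime(2024, 1, 15, 6, 30, tzinfo=BERLIN)
print(next_run(now, 6))
print(next_run(now, 7))
```

This is also why the sender waits for the crawler: both slots are wall-clock times in the same timezone, one hour apart.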
## 🏗️ Architecture

### Components

- **Ollama**: AI service for summarization and translation (internal only, GPU-accelerated)
- **MongoDB**: Data storage for articles, subscribers, and tracking (internal only)
- **Backend API**: Flask API for tracking and analytics (port 5001 - the only exposed service)
- **News Crawler**: Automated RSS feed crawler with AI summarization (internal only)
- **Newsletter Sender**: Automated email sender with tracking (internal only)
- **Frontend**: React dashboard (optional)

### Technology Stack

- Python 3.11
- MongoDB 7.0
- Ollama (phi3:latest model for AI)
- Docker & Docker Compose
- Flask (API)
- Schedule (automation)
- Jinja2 (email templates)

## 📦 Installation

### Prerequisites

- Docker & Docker Compose
- 4GB+ RAM (for Ollama AI models)
- (Optional) NVIDIA GPU for 5-10x faster AI processing

### Setup

1. **Clone the repository**

   ```bash
   git clone
   cd munich-news
   ```

2. **Configure environment**

   ```bash
   cp backend/.env.example backend/.env
   # Edit backend/.env with your settings
   ```

3. **Configure Ollama (AI features)**

   ```bash
   # Option 1: Use the integrated Docker Compose Ollama (recommended)
   ./configure-ollama.sh  # Select option 1

   # Option 2: Use an external Ollama server
   # Install from https://ollama.ai/download
   # Then run: ollama pull phi3:latest
   ```

4. **Start the system**

   ```bash
   # Auto-detect GPU and start (recommended)
   ./start-with-gpu.sh

   # Or start manually
   docker-compose up -d

   # First time: wait for the Ollama model download (2-5 minutes)
   docker-compose logs -f ollama-setup
   ```
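After setup, it can be useful to confirm that the configured model was actually pulled. Ollama exposes a tags endpoint (`GET /api/tags`) listing local models; the sketch below is a hypothetical helper, not project code, with the parsing kept separate so it works without a running server:

```python
import json
import urllib.request

def has_model(tags_json: str, model: str) -> bool:
    """Check an Ollama /api/tags response body for a pulled model by name."""
    data = json.loads(tags_json)
    return any(m.get("name") == model for m in data.get("models", []))

def check_ollama(base_url: str = "http://127.0.0.1:11434",
                 model: str = "phi3:latest") -> bool:
    """Query a running Ollama server for the model (requires the server up)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return has_model(resp.read().decode(), model)

# Offline example with a sample payload shaped like the real response:
sample = '{"models": [{"name": "phi3:latest"}]}'
print(has_model(sample, "phi3:latest"))
```

If this returns `False` against your server, re-run `ollama pull phi3:latest` (or the `ollama-setup` container).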
📖 **For detailed Ollama setup & GPU acceleration:** See [docs/OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md)

## ⚙️ Configuration

Edit `backend/.env`:

```env
# MongoDB
MONGODB_URI=mongodb://localhost:27017/

# Email (SMTP)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password

# Newsletter
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24

# Tracking
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90

# Ollama (AI Summarization)
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest
```

## 📊 Usage

### View Logs

```bash
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f crawler
docker-compose logs -f sender
docker-compose logs -f mongodb
```

### Manual Operations

```bash
# Run the crawler manually
docker-compose exec crawler python crawler_service.py 10

# Send a test newsletter
docker-compose exec sender python sender_service.py test your-email@example.com

# Preview the newsletter
docker-compose exec sender python sender_service.py preview
```

### Database Access

```bash
# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news
```

```javascript
// View articles
db.articles.find().sort({ crawled_at: -1 }).limit(5).pretty()

// View subscribers
db.subscribers.find({ active: true }).pretty()

// View tracking data
db.newsletter_sends.find().sort({ created_at: -1 }).limit(10).pretty()
```

## 🔧 Management

### Add RSS Feeds

```javascript
// In mongosh munich_news
db.rss_feeds.insertOne({
  name: "Source Name",
  url: "https://example.com/rss",
  active: true
})
```

### Add Subscribers

```javascript
// In mongosh munich_news
db.subscribers.insertOne({
  email: "user@example.com",
  active: true,
  tracking_enabled: true,
  subscribed_at: new Date()
})
```
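If you add subscribers programmatically rather than through mongosh, it helps to build documents with the same fields consistently. A minimal sketch (the `make_subscriber` helper is hypothetical; the pymongo call is shown only as a comment and assumes a `db` handle):

```python
from datetime import datetime, timezone

def make_subscriber(email: str, tracking: bool = True) -> dict:
    """Build a subscriber document matching the fields used in mongosh above."""
    return {
        "email": email.strip().lower(),   # normalize to avoid duplicate addresses
        "active": True,
        "tracking_enabled": tracking,
        "subscribed_at": datetime.now(timezone.utc),
    }

# With pymongo, an idempotent insert could look like:
# db.subscribers.update_one(
#     {"email": doc["email"]},
#     {"$setOnInsert": doc},
#     upsert=True,
# )
doc = make_subscriber("  User@Example.com ")
print(doc["email"])
```

The upsert variant avoids duplicate rows when the same address is submitted twice.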
### View Analytics

```bash
# Newsletter metrics
curl http://localhost:5001/api/analytics/newsletter/2024-01-15

# Article performance
curl http://localhost:5001/api/analytics/article/https://example.com/article

# Subscriber activity
curl http://localhost:5001/api/analytics/subscriber/user@example.com
```

## ⏰ Schedule Configuration

### Change Crawler Time (default: 6:00 AM)

Edit `news_crawler/scheduled_crawler.py`:

```python
schedule.every().day.at("06:00").do(run_crawler)  # Change time
```

### Change Sender Time (default: 7:00 AM)

Edit `news_sender/scheduled_sender.py`:

```python
schedule.every().day.at("07:00").do(run_sender)  # Change time
```

After changes:

```bash
docker-compose up -d --build
```

## 📈 Monitoring

### Container Status

```bash
docker-compose ps
```

### Check Next Scheduled Runs

```bash
# Crawler
docker-compose logs crawler | grep "Next scheduled run"

# Sender
docker-compose logs sender | grep "Next scheduled run"
```

### Engagement Metrics

```javascript
// In mongosh munich_news

// Open rate
var sent = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15" })
var opened = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15", opened: true })
print("Open Rate: " + ((opened / sent) * 100).toFixed(2) + "%")

// Click rate
var clicks = db.link_clicks.countDocuments({ newsletter_id: "2024-01-15" })
print("Click Rate: " + ((clicks / sent) * 100).toFixed(2) + "%")
```

## 🐛 Troubleshooting

### Crawler Not Finding Articles

```bash
# Check RSS feeds
mongosh munich_news --eval "db.rss_feeds.find({ active: true })"

# Test manually
docker-compose exec crawler python crawler_service.py 5
```

### Newsletter Not Sending

```bash
# Check email config
docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"

# Test email
docker-compose exec sender python sender_service.py test your-email@example.com
```

### Containers Not Starting

```bash
# Check logs
docker-compose logs

# Rebuild
docker-compose up -d --build

# Reset everything (deletes data volumes)
docker-compose down -v
docker-compose up -d
```
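The open/click-rate arithmetic from the Monitoring section can also be wrapped in a small helper, which is handy in scripts or tests. This is an illustrative sketch (the `engagement_rates` function is hypothetical), with a guard for newsletters that were sent to nobody:

```python
def engagement_rates(sent: int, opened: int, clicks: int) -> dict:
    """Compute open and click rates as percentages, guarding against zero sends."""
    if sent <= 0:
        return {"open_rate": 0.0, "click_rate": 0.0}
    return {
        "open_rate": round(opened / sent * 100, 2),
        "click_rate": round(clicks / sent * 100, 2),
    }

print(engagement_rates(200, 84, 31))  # {'open_rate': 42.0, 'click_rate': 15.5}
```

The mongosh snippet above divides by `sent` directly and will print `NaN%` when no sends match; the guard here avoids that edge case.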
## 🔐 Privacy & Compliance

### GDPR Features

- **Data Retention**: Automatic anonymization after 90 days
- **Opt-Out**: Subscribers can disable tracking
- **Data Deletion**: Full data removal on request
- **Transparency**: Privacy notice in all emails

### Privacy Endpoints

```bash
# Delete subscriber data
curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com

# Anonymize old data
curl -X POST http://localhost:5001/api/tracking/anonymize

# Opt out of tracking
curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out
```

## 📚 Documentation

### Getting Started

- **[QUICKSTART.md](QUICKSTART.md)** - 5-minute setup guide
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines

### Core Features

- **[docs/AI_NEWS_AGGREGATION.md](docs/AI_NEWS_AGGREGATION.md)** - AI-powered clustering & neutral summaries
- **[docs/FEATURES.md](docs/FEATURES.md)** - Complete feature list
- **[docs/API.md](docs/API.md)** - API endpoints reference

### Technical Documentation

- **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** - System architecture
- **[docs/SETUP.md](docs/SETUP.md)** - Detailed setup guide
- **[docs/OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md)** - AI/Ollama configuration
- **[docs/GPU_SETUP.md](docs/GPU_SETUP.md)** - GPU acceleration setup
- **[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)** - Production deployment
- **[docs/SECURITY.md](docs/SECURITY.md)** - Security best practices
- **[docs/REFERENCE.md](docs/REFERENCE.md)** - Complete reference
- **[docs/DATABASE_SCHEMA.md](docs/DATABASE_SCHEMA.md)** - Database structure
- **[docs/BACKEND_STRUCTURE.md](docs/BACKEND_STRUCTURE.md)** - Backend organization

### Component Documentation

- **[docs/CRAWLER_HOW_IT_WORKS.md](docs/CRAWLER_HOW_IT_WORKS.md)** - Crawler internals
- **[docs/EXTRACTION_STRATEGIES.md](docs/EXTRACTION_STRATEGIES.md)** - Content extraction
- **[docs/RSS_URL_EXTRACTION.md](docs/RSS_URL_EXTRACTION.md)** - RSS parsing
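To illustrate the retention behavior from the GDPR features above, here is a hypothetical sketch of what an anonymization pass might do; none of these names come from the project, and the real `/api/tracking/anonymize` endpoint may work differently:

```python
import hashlib
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # mirrors TRACKING_DATA_RETENTION_DAYS in backend/.env

def anonymize_if_expired(record: dict, now: datetime) -> dict:
    """Replace the email with a one-way hash once a record exceeds retention."""
    cutoff = now - timedelta(days=RETENTION_DAYS)
    if record["created_at"] < cutoff:
        digest = hashlib.sha256(record["email"].encode()).hexdigest()[:16]
        return {**record, "email": f"anon-{digest}"}
    return record

old = {"email": "user@example.com",
       "created_at": datetime(2020, 1, 1, tzinfo=timezone.utc)}
print(anonymize_if_expired(old, datetime(2024, 1, 1, tzinfo=timezone.utc))["email"])
```

Hashing (rather than deleting) keeps aggregate statistics countable while removing the personal identifier.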
## 🧪 Testing

All test files are organized in the `tests/` directory:

```bash
# Run crawler tests
docker-compose exec crawler python tests/crawler/test_crawler.py

# Run sender tests
docker-compose exec sender python tests/sender/test_tracking_integration.py

# Run backend tests
docker-compose exec backend python tests/backend/test_tracking.py
```

## 🚀 Production Deployment

### Environment Setup

1. Update `backend/.env` with production values
2. Set a strong MongoDB password
3. Use HTTPS for tracking URLs
4. Configure a proper SMTP server

### Security

```bash
# Use the production compose file
docker-compose -f docker-compose.prod.yml up -d

# Set the MongoDB password
export MONGO_PASSWORD=your-secure-password
```

### Monitoring

- Set up log rotation
- Configure health checks
- Set up alerts for failures
- Monitor database size

## 📚 Additional Documentation

See the [docs/](docs/) directory for the full set:

- **[Documentation Index](docs/INDEX.md)** - Complete documentation guide
- **[GPU Setup](docs/GPU_SETUP.md)** - 5-10x faster with GPU acceleration
- **[Admin API](docs/ADMIN_API.md)** - API endpoints reference
- **[Security Guide](docs/SECURITY_NOTES.md)** - Security best practices
- **[System Architecture](docs/SYSTEM_ARCHITECTURE.md)** - Technical overview

## 📝 License

[Your License Here]

## 🤝 Contributing

Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first.

## 📧 Support

For issues or questions, please open a GitHub issue.

---

**Built with ❤️ for Munich News Daily**