# Munich News Daily - Automated Newsletter System
A fully automated news aggregation and newsletter system that crawls Munich news sources, generates AI summaries, and sends daily newsletters with engagement tracking.
**🚀 NEW:** GPU acceleration support for 5-10x faster AI processing! See [QUICK_START_GPU.md](QUICK_START_GPU.md)
## 🚀 Quick Start
```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings
# 2. Start everything
docker-compose up -d
# 3. View logs
docker-compose logs -f
```
That's it! Once the containers are up, everything runs automatically:
- **Backend API**: runs continuously for tracking and analytics (http://localhost:5001)
- **6:00 AM Berlin time**: crawls news articles and generates AI summaries
- **7:00 AM Berlin time**: sends the newsletter to all subscribers
📖 **New to the project?** See [QUICKSTART.md](QUICKSTART.md) for a detailed 5-minute setup guide.
## 📋 System Overview
```
6:00 AM → News Crawler
          ├─ Fetches articles from RSS feeds
          ├─ Extracts full content
          ├─ Generates AI summaries
          └─ Saves to MongoDB

7:00 AM → Newsletter Sender
          ├─ Waits for crawler to finish
          ├─ Fetches today's articles
          ├─ Generates newsletter with tracking
          └─ Sends to all subscribers

✅ Done! Repeat tomorrow
```
## 🏗️ Architecture
### Components
- **Ollama**: AI service for summarization and translation (port 11434)
- **MongoDB**: Data storage (articles, subscribers, tracking)
- **Backend API**: Flask API for tracking and analytics (port 5001)
- **News Crawler**: Automated RSS feed crawler with AI summarization
- **Newsletter Sender**: Automated email sender with tracking
- **Frontend**: React dashboard (optional)
### Technology Stack
- Python 3.11
- MongoDB 7.0
- Ollama (phi3:latest model for AI)
- Docker & Docker Compose
- Flask (API)
- Schedule (automation)
- Jinja2 (email templates)
## 📦 Installation
### Prerequisites
- Docker & Docker Compose
- 4GB+ RAM (for Ollama AI models)
- (Optional) NVIDIA GPU for 5-10x faster AI processing
### Setup
1. **Clone the repository**
```bash
git clone <repository-url>
cd munich-news
```
2. **Configure environment**
```bash
cp backend/.env.example backend/.env
# Edit backend/.env with your settings
```
3. **Configure Ollama (AI features)**
```bash
# Option 1: Use integrated Docker Compose Ollama (recommended)
./configure-ollama.sh
# Select option 1
# Option 2: Use external Ollama server
# Install from https://ollama.ai/download
# Then run: ollama pull phi3:latest
```
4. **Start the system**
```bash
# Auto-detect GPU and start (recommended)
./start-with-gpu.sh
# Or start manually
docker-compose up -d
# First time: Wait for Ollama model download (2-5 minutes)
docker-compose logs -f ollama-setup
```
📖 **For detailed Ollama setup & GPU acceleration:** See [docs/OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md)
## ⚙️ Configuration
Edit `backend/.env`:
```env
# MongoDB
MONGODB_URI=mongodb://localhost:27017/
# Email (SMTP)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Newsletter
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24
# Tracking
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90
# Ollama (AI Summarization)
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest
```
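The services read these values from the environment at startup. A minimal sketch of that pattern (illustrative only; the actual `Config` classes in the repo may organize this differently):
```python
import os

# Illustrative only: shows how the .env values above map to Python settings.
MONGODB_URI = os.getenv("MONGODB_URI", "mongodb://localhost:27017/")
SMTP_SERVER = os.getenv("SMTP_SERVER", "smtp.gmail.com")
SMTP_PORT = int(os.getenv("SMTP_PORT", "587"))
NEWSLETTER_MAX_ARTICLES = int(os.getenv("NEWSLETTER_MAX_ARTICLES", "10"))
TRACKING_ENABLED = os.getenv("TRACKING_ENABLED", "true").lower() == "true"
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://127.0.0.1:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "phi3:latest")
```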
## 📊 Usage
### View Logs
```bash
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f crawler
docker-compose logs -f sender
docker-compose logs -f mongodb
```
### Manual Operations
```bash
# Run crawler manually
docker-compose exec crawler python crawler_service.py 10
# Send test newsletter
docker-compose exec sender python sender_service.py test your-email@example.com
# Preview newsletter
docker-compose exec sender python sender_service.py preview
```
### Database Access
```bash
# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news
// View articles
db.articles.find().sort({ crawled_at: -1 }).limit(5).pretty()
// View subscribers
db.subscribers.find({ active: true }).pretty()
// View tracking data
db.newsletter_sends.find().sort({ created_at: -1 }).limit(10).pretty()
```
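The same queries can be run from Python with `pymongo`, e.g. from a one-off script. A sketch (collection and field names follow the mongosh examples above; adjust the URI if running outside Docker):
```python
from pymongo import MongoClient

# Connect to the database used by the services.
client = MongoClient("mongodb://localhost:27017/")
db = client["munich_news"]

# Five most recently crawled articles
for article in db.articles.find().sort("crawled_at", -1).limit(5):
    print(article.get("title"), article.get("crawled_at"))

# Number of active subscribers
print("Active subscribers:", db.subscribers.count_documents({"active": True}))
```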
## 🔧 Management
### Add RSS Feeds
```bash
mongosh munich_news
db.rss_feeds.insertOne({
  name: "Source Name",
  url: "https://example.com/rss",
  active: true
})
```
### Add Subscribers
```bash
mongosh munich_news
db.subscribers.insertOne({
  email: "user@example.com",
  active: true,
  tracking_enabled: true,
  subscribed_at: new Date()
})
```
### View Analytics
```bash
# Newsletter metrics
curl http://localhost:5001/api/analytics/newsletter/2024-01-15
# Article performance
curl http://localhost:5001/api/analytics/article/https://example.com/article
# Subscriber activity
curl http://localhost:5001/api/analytics/subscriber/user@example.com
```
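The same endpoints can also be queried from Python, for example to pull a day's metrics into a reporting script. A sketch using `requests` (response fields depend on the API; see [docs/API.md](docs/API.md)):
```python
import requests

BASE_URL = "http://localhost:5001"

# Newsletter metrics for a given day (same endpoint as the curl example above)
resp = requests.get(f"{BASE_URL}/api/analytics/newsletter/2024-01-15", timeout=10)
resp.raise_for_status()
print(resp.json())

# Subscriber activity
resp = requests.get(f"{BASE_URL}/api/analytics/subscriber/user@example.com", timeout=10)
print(resp.status_code, resp.json())
```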
## ⏰ Schedule Configuration
### Change Crawler Time (default: 6:00 AM)
Edit `news_crawler/scheduled_crawler.py`:
```python
schedule.every().day.at("06:00").do(run_crawler) # Change time
```
### Change Sender Time (default: 7:00 AM)
Edit `news_sender/scheduled_sender.py`:
```python
schedule.every().day.at("07:00").do(run_sender) # Change time
```
After changes:
```bash
docker-compose up -d --build
```
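Both scheduled services use the `schedule` library with the same basic loop. A minimal sketch of the pattern (the real `scheduled_crawler.py` / `scheduled_sender.py` add logging and error handling):
```python
import time
import schedule

def run_crawler():
    # In the real service this triggers the crawl + AI summarization run.
    print("Crawling news sources...")

# Daily run at 06:00; times are interpreted in the container's timezone (Berlin in this setup)
schedule.every().day.at("06:00").do(run_crawler)

while True:
    schedule.run_pending()
    time.sleep(60)  # check once a minute
```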
## 📈 Monitoring
### Container Status
```bash
docker-compose ps
```
### Check Next Scheduled Runs
```bash
# Crawler
docker-compose logs crawler | grep "Next scheduled run"
# Sender
docker-compose logs sender | grep "Next scheduled run"
```
### Engagement Metrics
```bash
mongosh munich_news
// Open rate
var sent = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15" })
var opened = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15", opened: true })
print("Open Rate: " + ((opened / sent) * 100).toFixed(2) + "%")
// Click rate
var clicks = db.link_clicks.countDocuments({ newsletter_id: "2024-01-15" })
print("Click Rate: " + ((clicks / sent) * 100).toFixed(2) + "%")
```
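Equivalent rates can be computed from Python, e.g. for a small reporting script. A sketch with `pymongo` (collection and field names follow the mongosh queries above):
```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["munich_news"]
newsletter_id = "2024-01-15"

sent = db.newsletter_sends.count_documents({"newsletter_id": newsletter_id})
opened = db.newsletter_sends.count_documents({"newsletter_id": newsletter_id, "opened": True})
clicks = db.link_clicks.count_documents({"newsletter_id": newsletter_id})

if sent:
    print(f"Open rate:  {opened / sent * 100:.2f}%")
    print(f"Click rate: {clicks / sent * 100:.2f}%")
```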
## 🐛 Troubleshooting
### Crawler Not Finding Articles
```bash
# Check RSS feeds
mongosh munich_news --eval "db.rss_feeds.find({ active: true })"
# Test manually
docker-compose exec crawler python crawler_service.py 5
```
### Newsletter Not Sending
```bash
# Check email config
docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"
# Test email
docker-compose exec sender python sender_service.py test your-email@example.com
```
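If the test send fails, the SMTP credentials themselves can be checked with a short standard-library script, independent of the project code (it expects the same values as `backend/.env` to be exported in the environment):
```python
import os
import smtplib

# Quick SMTP login check using the values from backend/.env.
server = os.getenv("SMTP_SERVER", "smtp.gmail.com")
port = int(os.getenv("SMTP_PORT", "587"))
user = os.getenv("EMAIL_USER")
password = os.getenv("EMAIL_PASSWORD")

with smtplib.SMTP(server, port, timeout=10) as smtp:
    smtp.starttls()  # Gmail and most providers require TLS on port 587
    smtp.login(user, password)
    print("SMTP login OK")
```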
### Containers Not Starting
```bash
# Check logs
docker-compose logs
# Rebuild
docker-compose up -d --build
# Reset everything
docker-compose down -v
docker-compose up -d
```
## 🔐 Privacy & Compliance
### GDPR Features
- **Data Retention**: Automatic anonymization after 90 days (see the sketch after this list)
- **Opt-Out**: Subscribers can disable tracking
- **Data Deletion**: Full data removal on request
- **Transparency**: Privacy notice in all emails
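The retention window comes from `TRACKING_DATA_RETENTION_DAYS` (90 by default). Purely as an illustration, a retention pass can be thought of as stripping the subscriber reference from old tracking records; the actual logic is exposed via the `/api/tracking/anonymize` endpoint listed below, and the field names here are hypothetical (see [docs/DATABASE_SCHEMA.md](docs/DATABASE_SCHEMA.md) for the real schema):
```python
from datetime import datetime, timedelta, timezone
from pymongo import MongoClient

# Illustrative only: conceptually what a 90-day anonymization pass does.
# Field names ("email", "anonymized") are hypothetical; the real job runs
# behind POST /api/tracking/anonymize.
client = MongoClient("mongodb://localhost:27017/")
db = client["munich_news"]

cutoff = datetime.now(timezone.utc) - timedelta(days=90)
result = db.newsletter_sends.update_many(
    {"created_at": {"$lt": cutoff}},
    {"$unset": {"email": ""}, "$set": {"anonymized": True}},
)
print(f"Anonymized {result.modified_count} records")
```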
### Privacy Endpoints
```bash
# Delete subscriber data
curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com
# Anonymize old data
curl -X POST http://localhost:5001/api/tracking/anonymize
# Opt out of tracking
curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out
```
## 📚 Documentation
### Getting Started
- **[QUICKSTART.md](QUICKSTART.md)** - 5-minute setup guide
- **[PROJECT_STRUCTURE.md](PROJECT_STRUCTURE.md)** - Project layout
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines
### Technical Documentation
- **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** - System architecture
- **[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)** - Deployment guide
- **[docs/API.md](docs/API.md)** - API reference
- **[docs/DATABASE_SCHEMA.md](docs/DATABASE_SCHEMA.md)** - Database structure
- **[docs/BACKEND_STRUCTURE.md](docs/BACKEND_STRUCTURE.md)** - Backend organization
### Component Documentation
- **[docs/CRAWLER_HOW_IT_WORKS.md](docs/CRAWLER_HOW_IT_WORKS.md)** - Crawler internals
- **[docs/EXTRACTION_STRATEGIES.md](docs/EXTRACTION_STRATEGIES.md)** - Content extraction
- **[docs/RSS_URL_EXTRACTION.md](docs/RSS_URL_EXTRACTION.md)** - RSS parsing
## 🧪 Testing
All test files are organized in the `tests/` directory:
```bash
# Run crawler tests
docker-compose exec crawler python tests/crawler/test_crawler.py
# Run sender tests
docker-compose exec sender python tests/sender/test_tracking_integration.py
# Run backend tests
docker-compose exec backend python tests/backend/test_tracking.py
```
## 🚀 Production Deployment
### Environment Setup
1. Update `backend/.env` with production values
2. Set strong MongoDB password
3. Use HTTPS for tracking URLs
4. Configure proper SMTP server
### Security
```bash
# Use production compose file
docker-compose -f docker-compose.prod.yml up -d
# Set MongoDB password
export MONGO_PASSWORD=your-secure-password
```
### Monitoring
- Set up log rotation
- Configure health checks
- Set up alerts for failures
- Monitor database size
## 📝 License
[Your License Here]
## 🤝 Contributing
Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first.
## 📧 Support
For issues or questions, please open a GitHub issue.
---
**Built with ❤️ for Munich News Daily**