# Munich News Daily - Automated Newsletter System

A fully automated news aggregation and newsletter system that crawls Munich news sources, generates AI summaries, and sends daily newsletters with engagement tracking.

## 🚀 Quick Start

```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings

# 2. Start everything
docker-compose up -d

# 3. View logs
docker-compose logs -f
```

That's it! Once up, everything runs automatically:

- **Backend API**: Runs continuously for tracking and analytics (http://localhost:5001)
- **6:00 AM Berlin time**: Crawls news articles and generates summaries
- **7:00 AM Berlin time**: Sends the newsletter to all subscribers

📖 **New to the project?** See [QUICKSTART.md](QUICKSTART.md) for a detailed 5-minute setup guide.

## 📋 System Overview

```
6:00 AM → News Crawler
              ↓
   Fetches articles from RSS feeds
   Extracts full content
   Generates AI summaries
   Saves to MongoDB
              ↓
7:00 AM → Newsletter Sender
              ↓
   Waits for the crawler to finish
   Fetches today's articles
   Generates newsletter with tracking
   Sends to all subscribers
              ↓
   ✅ Done! Repeat tomorrow
```
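
The two jobs meet in MongoDB: the crawler writes articles, the sender reads them back an hour later. A minimal sketch of that hand-off (illustration only, not the services' actual code; collection and field names follow the queries shown later in this README):

```python
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/").munich_news

# Crawler side: store a crawled, summarized article.
db.articles.insert_one({
    "title": "Example headline",
    "url": "https://example.com/article",
    "summary": "AI-generated summary ...",
    "crawled_at": datetime.now(timezone.utc),
})

# Sender side: pick up the newest articles for today's newsletter.
latest = db.articles.find().sort("crawled_at", -1).limit(10)
```
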
## 🏗️ Architecture

### Components

- **MongoDB**: Data storage (articles, subscribers, tracking)
- **Backend API**: Flask API for tracking and analytics (port 5001)
- **News Crawler**: Automated RSS feed crawler with AI summarization
- **Newsletter Sender**: Automated email sender with tracking
- **Frontend**: React dashboard (optional)

### Technology Stack

- Python 3.11
- MongoDB 7.0
- Docker & Docker Compose
- Flask (API)
- Ollama (AI summarization)
- schedule (Python job scheduling)
- Jinja2 (email templates)

## 📦 Installation

### Prerequisites

- Docker & Docker Compose
- (Optional) Ollama for AI summarization

### Setup

1. **Clone the repository**

```bash
git clone <repository-url>
cd munich-news
```

2. **Configure environment**

```bash
cp backend/.env.example backend/.env
# Edit backend/.env with your settings
```

3. **Start the system**

```bash
docker-compose up -d
```

## ⚙️ Configuration

Edit `backend/.env`:

```env
# MongoDB
MONGODB_URI=mongodb://localhost:27017/

# Email (SMTP)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password

# Newsletter
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24

# Tracking
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90

# Ollama (AI Summarization)
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest
```
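
The crawler's summarization client isn't reproduced here, but a non-streaming request against Ollama's standard `/api/generate` endpoint, using the values configured above, looks roughly like this (the prompt wording is an assumption):

```python
import requests

article_text = "..."  # full article content extracted by the crawler

# Hypothetical summarization call; base URL and model match
# OLLAMA_BASE_URL and OLLAMA_MODEL from backend/.env.
resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={
        "model": "phi3:latest",
        "prompt": "Summarize this news article in 2-3 sentences:\n\n" + article_text,
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
summary = resp.json()["response"]
```
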
## 📊 Usage

### View Logs

```bash
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f crawler
docker-compose logs -f sender
docker-compose logs -f mongodb
```

### Manual Operations

```bash
# Run crawler manually
docker-compose exec crawler python crawler_service.py 10

# Send test newsletter
docker-compose exec sender python sender_service.py test your-email@example.com

# Preview newsletter
docker-compose exec sender python sender_service.py preview
```

### Database Access

```bash
# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news

# View articles
db.articles.find().sort({ crawled_at: -1 }).limit(5).pretty()

# View subscribers
db.subscribers.find({ active: true }).pretty()

# View tracking data
db.newsletter_sends.find().sort({ created_at: -1 }).limit(10).pretty()
```

## 🔧 Management

### Add RSS Feeds

```bash
mongosh munich_news

db.rss_feeds.insertOne({
  name: "Source Name",
  url: "https://example.com/rss",
  active: true
})
```

### Add Subscribers

```bash
mongosh munich_news

db.subscribers.insertOne({
  email: "user@example.com",
  active: true,
  tracking_enabled: true,
  subscribed_at: new Date()
})
```

### View Analytics

```bash
# Newsletter metrics
curl http://localhost:5001/api/analytics/newsletter/2024-01-15

# Article performance (URL-encode the article URL if it contains special characters)
curl http://localhost:5001/api/analytics/article/https://example.com/article

# Subscriber activity
curl http://localhost:5001/api/analytics/subscriber/user@example.com
```

## ⏰ Schedule Configuration

### Change Crawler Time (default: 6:00 AM)

Edit `news_crawler/scheduled_crawler.py`:

```python
schedule.every().day.at("06:00").do(run_crawler)  # Change time
```

### Change Sender Time (default: 7:00 AM)

Edit `news_sender/scheduled_sender.py`:

```python
schedule.every().day.at("07:00").do(run_sender)  # Change time
```
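
Both services use the Python `schedule` library, which only fires jobs when the process polls it. A minimal sketch of the surrounding loop (the real services' logging and error handling are omitted; `run_crawler` here is a placeholder):

```python
import time

import schedule

def run_crawler():
    print("crawling...")  # placeholder for the real job

schedule.every().day.at("06:00").do(run_crawler)

while True:
    schedule.run_pending()  # runs any job whose time has come
    time.sleep(60)          # poll once a minute
```
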
After changes:

```bash
docker-compose up -d --build
```

## 📈 Monitoring

### Container Status

```bash
docker-compose ps
```

### Check Next Scheduled Runs

```bash
# Crawler
docker-compose logs crawler | grep "Next scheduled run"

# Sender
docker-compose logs sender | grep "Next scheduled run"
```

### Engagement Metrics

```bash
mongosh munich_news

// Open rate
var sent = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15" })
var opened = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15", opened: true })
print("Open Rate: " + ((opened / sent) * 100).toFixed(2) + "%")

// Click rate
var clicks = db.link_clicks.countDocuments({ newsletter_id: "2024-01-15" })
print("Click Rate: " + ((clicks / sent) * 100).toFixed(2) + "%")
```
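
The same numbers are easy to pull from a script. A pymongo equivalent of the queries above, with a guard for days when nothing was sent:

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/").munich_news
nid = "2024-01-15"

sent = db.newsletter_sends.count_documents({"newsletter_id": nid})
opened = db.newsletter_sends.count_documents({"newsletter_id": nid, "opened": True})
clicks = db.link_clicks.count_documents({"newsletter_id": nid})

if sent:
    print(f"Open Rate:  {opened / sent:.2%}")
    print(f"Click Rate: {clicks / sent:.2%}")
else:
    print("No sends recorded for", nid)
```
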
## 🐛 Troubleshooting

### Crawler Not Finding Articles

```bash
# Check RSS feeds
mongosh munich_news --eval "db.rss_feeds.find({ active: true })"

# Test manually
docker-compose exec crawler python crawler_service.py 5
```

### Newsletter Not Sending

```bash
# Check email config
docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"

# Test email
docker-compose exec sender python sender_service.py test your-email@example.com
```

### Containers Not Starting

```bash
# Check logs
docker-compose logs

# Rebuild
docker-compose up -d --build

# Reset everything (also deletes volumes, including stored data)
docker-compose down -v
docker-compose up -d
```

## 🔐 Privacy & Compliance
|
|
|
|
### GDPR Features
|
|
|
|
- **Data Retention**: Automatic anonymization after 90 days
|
|
- **Opt-Out**: Subscribers can disable tracking
|
|
- **Data Deletion**: Full data removal on request
|
|
- **Transparency**: Privacy notice in all emails
|
|
|
|
### Privacy Endpoints
|
|
|
|
```bash
|
|
# Delete subscriber data
|
|
curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com
|
|
|
|
# Anonymize old data
|
|
curl -X POST http://localhost:5001/api/tracking/anonymize
|
|
|
|
# Opt out of tracking
|
|
curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out
|
|
```
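
What "anonymize" amounts to at the database level isn't spelled out here; a plausible sketch against the collections above, keeping aggregate counts usable while dropping the identifying field (the exact field names are an assumption):

```python
from datetime import datetime, timedelta, timezone

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/").munich_news

# Matches TRACKING_DATA_RETENTION_DAYS=90 from backend/.env.
cutoff = datetime.now(timezone.utc) - timedelta(days=90)

# Hypothetical: null out subscriber emails on old send records so
# open/click counts survive but the rows are no longer personal data.
result = db.newsletter_sends.update_many(
    {"created_at": {"$lt": cutoff}},
    {"$set": {"email": None}},
)
print(f"Anonymized {result.modified_count} send records")
```
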
## 📚 Documentation

### Getting Started

- **[QUICKSTART.md](QUICKSTART.md)** - 5-minute setup guide
- **[PROJECT_STRUCTURE.md](PROJECT_STRUCTURE.md)** - Project layout
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines

### Technical Documentation

- **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** - System architecture
- **[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)** - Deployment guide
- **[docs/API.md](docs/API.md)** - API reference
- **[docs/DATABASE_SCHEMA.md](docs/DATABASE_SCHEMA.md)** - Database structure
- **[docs/BACKEND_STRUCTURE.md](docs/BACKEND_STRUCTURE.md)** - Backend organization

### Component Documentation

- **[docs/CRAWLER_HOW_IT_WORKS.md](docs/CRAWLER_HOW_IT_WORKS.md)** - Crawler internals
- **[docs/EXTRACTION_STRATEGIES.md](docs/EXTRACTION_STRATEGIES.md)** - Content extraction
- **[docs/RSS_URL_EXTRACTION.md](docs/RSS_URL_EXTRACTION.md)** - RSS parsing

## 🧪 Testing

All test files are organized in the `tests/` directory:

```bash
# Run crawler tests
docker-compose exec crawler python tests/crawler/test_crawler.py

# Run sender tests
docker-compose exec sender python tests/sender/test_tracking_integration.py

# Run backend tests
docker-compose exec backend python tests/backend/test_tracking.py
```

## 🚀 Production Deployment

### Environment Setup

1. Update `backend/.env` with production values
2. Set a strong MongoDB password
3. Use HTTPS for tracking URLs
4. Configure a proper SMTP server

### Security

```bash
# Set the MongoDB password first, so Compose can pick it up
export MONGO_PASSWORD=your-secure-password

# Use the production compose file
docker-compose -f docker-compose.prod.yml up -d
```

### Monitoring

- Set up log rotation
- Configure health checks (see the sketch below)
- Set up alerts for failures
- Monitor database size
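
For health checks, even a trivial probe that exits non-zero when the API is unreachable can feed cron or an uptime monitor. A minimal sketch (there may be no dedicated health endpoint; this only checks that the backend answers at all):

```python
import sys

import requests

# Probe the backend API from the host; the exit code signals health.
try:
    requests.get("http://localhost:5001/", timeout=5)
    sys.exit(0)
except requests.RequestException:
    sys.exit(1)
```
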
## 📝 License

[Your License Here]

## 🤝 Contributing

Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first.

## 📧 Support

For issues or questions, please open a GitHub issue.

---

**Built with ❤️ for Munich News Daily**