This commit is contained in:
2025-11-11 14:09:21 +01:00
parent bcd0a10576
commit 1075a91eac
57 changed files with 5598 additions and 1366 deletions

README.md

@@ -1,327 +1,390 @@
# Munich News Daily - Automated Newsletter System
A fully automated news aggregation and newsletter system that crawls Munich news sources, generates AI summaries, and sends daily newsletters with engagement tracking.
## 🚀 Quick Start
```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings
# 2. Start everything
docker-compose up -d
# 3. View logs
docker-compose logs -f
```
That's it! The system will automatically:
- **Backend API**: Runs continuously for tracking and analytics (http://localhost:5001)
- **6:00 AM Berlin time**: Crawl news articles and generate summaries
- **7:00 AM Berlin time**: Send newsletter to all subscribers
📖 **New to the project?** See [QUICKSTART.md](QUICKSTART.md) for a detailed 5-minute setup guide.
## 📋 System Overview
```
6:00 AM  →  News Crawler
            ├─ Fetches articles from RSS feeds
            ├─ Extracts full content
            ├─ Generates AI summaries
            └─ Saves to MongoDB

7:00 AM  →  Newsletter Sender
            ├─ Waits for crawler to finish
            ├─ Fetches today's articles
            ├─ Generates newsletter with tracking
            └─ Sends to all subscribers

✅ Done! Repeat tomorrow
```
## 🏗️ Architecture
### Components
- **MongoDB**: Data storage (articles, subscribers, tracking)
- **Backend API**: Flask API for tracking and analytics (port 5001)
- **News Crawler**: Automated RSS feed crawler with AI summarization
- **Newsletter Sender**: Automated email sender with tracking
- **Frontend**: React dashboard (optional)
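For a sense of how the tracking side works: newsletter opens are typically recorded by a 1x1 pixel served from the backend. The sketch below is illustrative only; the real routes live in `backend/routes/`, and the route path and field names here are assumptions:

```python
# Illustrative sketch of an open-tracking endpoint; route path and field names
# are assumptions, the project's actual routes live in backend/routes/.
import base64

from flask import Flask, Response
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient("mongodb://localhost:27017/")["munich_news"]

# Standard 1x1 transparent GIF returned to the email client
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

@app.route("/track/open/<newsletter_id>/<email>")
def track_open(newsletter_id: str, email: str):
    # Mark this send as opened, then return the pixel
    db.newsletter_sends.update_one(
        {"newsletter_id": newsletter_id, "email": email},
        {"$set": {"opened": True}},
    )
    return Response(PIXEL, mimetype="image/gif")
```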
### Technology Stack
- Python 3.11
- MongoDB 7.0
- Docker & Docker Compose
- Flask (API)
- Ollama (AI summarization)
- Schedule (automation)
- Jinja2 (email templates)
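For illustration, a summary can be requested from an Ollama server through its HTTP `/api/generate` endpoint. This is only a sketch; the project's real client is `news_crawler/ollama_client.py`, and the prompt and error handling below are simplified:

```python
# Illustrative sketch of calling Ollama's /api/generate endpoint.
# The project's actual client lives in news_crawler/ollama_client.py.
import requests

OLLAMA_BASE_URL = "http://127.0.0.1:11434"
OLLAMA_MODEL = "phi3:latest"

def summarize(article_text: str) -> str:
    response = requests.post(
        f"{OLLAMA_BASE_URL}/api/generate",
        json={
            "model": OLLAMA_MODEL,
            "prompt": f"Summarize this Munich news article in three sentences:\n\n{article_text}",
            "stream": False,  # ask for a single JSON object instead of a stream
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```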
## 📦 Installation
### Prerequisites
- Docker & Docker Compose
- (Optional) Ollama for AI summarization
### Setup
1. **Clone the repository**
```bash
git clone <repository-url>
cd munich-news
```
2. **Configure environment**
```bash
cp backend/.env.example backend/.env
# Edit backend/.env with your settings
```
3. **Start the system**
```bash
# From the project root directory
docker-compose up -d
```
## ⚙️ Configuration
Edit `backend/.env`:
```env
# MongoDB
MONGODB_URI=mongodb://localhost:27017/
# Email (SMTP)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Newsletter
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24
# Tracking
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90
# Ollama (AI Summarization)
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest
```
**Notes:**
- The backend API listens on port 5001 (rather than 5000) to avoid a conflict with AirPlay on macOS.
- For Gmail, use an [App Password](https://support.google.com/accounts/answer/185833) instead of your regular password.
- `MONGODB_URI` points at the bundled Docker MongoDB by default; a local MongoDB installation or a [MongoDB Atlas](https://www.mongodb.com/cloud/atlas) connection string (`mongodb+srv://...`) also works.
- For Ollama, set `OLLAMA_BASE_URL` to your server's IP or domain and `OLLAMA_ENABLED=true` to enable the AI features. An optional `OLLAMA_API_KEY` can be set if your Ollama server requires authentication.
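For reference, the services read these values from the environment at startup. A minimal sketch of that pattern (the actual logic lives in `backend/config.py`; the use of python-dotenv here is an assumption):

```python
# Illustrative sketch of loading settings from backend/.env.
# The project's actual configuration handling lives in backend/config.py.
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # read backend/.env into the process environment

MONGODB_URI = os.getenv("MONGODB_URI", "mongodb://localhost:27017/")
SMTP_SERVER = os.getenv("SMTP_SERVER", "smtp.gmail.com")
SMTP_PORT = int(os.getenv("SMTP_PORT", "587"))
OLLAMA_ENABLED = os.getenv("OLLAMA_ENABLED", "false").lower() == "true"
```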
## Frontend Setup (Optional)
1. Navigate to the frontend directory:
```bash
cd frontend
```
2. Install dependencies:
```bash
npm install
```
3. Run the frontend server:
```bash
npm start
```
The frontend will run on `http://localhost:3000`
Once the frontend is running:
1. Open your browser and go to `http://localhost:3000`
2. Enter your email address to subscribe to the newsletter
3. Browse the latest Munich news on the homepage
## Project Structure
```
munich-news/
├── backend/ # Main API server
│ ├── app.py # Flask application entry point
│ ├── config.py # Configuration management
│ ├── database.py # Database connection
│ ├── routes/ # API endpoints (blueprints)
│ ├── services/ # Business logic
│ ├── templates/ # Email templates
│ └── requirements.txt # Python dependencies
├── news_crawler/ # Crawler microservice
│ ├── crawler_service.py # Standalone crawler
│ ├── ollama_client.py # AI summarization client
│ ├── requirements.txt # Crawler dependencies
│ └── README.md # Crawler documentation
├── news_sender/ # Newsletter sender microservice
│ ├── sender_service.py # Standalone email sender
│ ├── newsletter_template.html # Email template
│ ├── requirements.txt # Sender dependencies
│ └── README.md # Sender documentation
├── frontend/ # Web interface
│ ├── server.js # Express server
│ ├── package.json # Node.js dependencies
│ └── public/
│ ├── index.html # Main page
│ ├── styles.css # Styling
│ └── app.js # Frontend JavaScript
├── docker-compose.yml # Docker Compose for the full stack (development)
├── docker-compose.prod.yml # Docker Compose with authentication (production)
└── README.md
```
## API Endpoints
### `POST /api/subscribe`
Subscribe to the newsletter
- Body: `{ "email": "user@example.com" }`
### `POST /api/unsubscribe`
Unsubscribe from the newsletter
- Body: `{ "email": "user@example.com" }`
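Example:
```bash
curl -X POST http://localhost:5001/api/subscribe \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com"}'
```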
### `GET /api/news`
Get latest Munich news articles
### `GET /api/stats`
Get subscription statistics
- Returns: `{ "subscribers": number, "articles": number, "crawled_articles": number }`
### `GET /api/news/<article_url>`
Get full article content by URL
- Returns: Full article with content, author, word count, etc.
### `GET /api/ollama/ping`
Test connection to Ollama server
- Returns: Connection status and Ollama configuration
- Response examples:
- Success: `{ "status": "success", "message": "...", "response": "...", "ollama_config": {...} }`
- Disabled: `{ "status": "disabled", "message": "...", "ollama_config": {...} }`
- Error: `{ "status": "error", "message": "...", "error_details": "...", "troubleshooting": {...}, "ollama_config": {...} }`
### `GET /api/ollama/models`
List available models on Ollama server
- Returns: List of available models and current configuration
- Response: `{ "status": "success", "models": [...], "current_model": "...", "ollama_config": {...} }`
### `GET /api/rss-feeds`
Get all RSS feeds
- Returns: `{ "feeds": [...] }`
### `POST /api/rss-feeds`
Add a new RSS feed
- Body: `{ "name": "Feed Name", "url": "https://example.com/rss" }`
- Returns: `{ "message": "...", "id": "..." }`
### `DELETE /api/rss-feeds/<feed_id>`
Remove an RSS feed
- Returns: `{ "message": "..." }`
### `PATCH /api/rss-feeds/<feed_id>/toggle`
Toggle RSS feed active status
- Returns: `{ "message": "...", "active": boolean }`
## Database Schema
### Articles Collection
```javascript
{
_id: ObjectId,
title: String,
link: String (unique),
summary: String,
source: String,
published_at: String,
created_at: DateTime
}
```
### Subscribers Collection
```javascript
{
_id: ObjectId,
email: String (unique, lowercase),
subscribed_at: DateTime,
status: String ('active' | 'inactive')
}
```
**Indexes:**
- `articles.link` - Unique index to prevent duplicate articles
- `articles.created_at` - For efficient sorting
- `subscribers.email` - Unique index for email lookups
- `subscribers.subscribed_at` - For analytics
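For reference, the same indexes can be expressed in a few lines of pymongo (a sketch; the application may create them differently):

```python
# Illustrative sketch: create the indexes listed above with pymongo.
from pymongo import ASCENDING, DESCENDING, MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]

db.articles.create_index([("link", ASCENDING)], unique=True)      # prevent duplicate articles
db.articles.create_index([("created_at", DESCENDING)])            # efficient sorting
db.subscribers.create_index([("email", ASCENDING)], unique=True)  # fast email lookups
db.subscribers.create_index([("subscribed_at", DESCENDING)])      # analytics
```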
## News Crawler Microservice

The project includes a standalone crawler microservice that fetches full article content from RSS feeds.

### Running the Crawler
```bash
cd news_crawler

# Install dependencies
pip install -r requirements.txt

# Run crawler
python crawler_service.py 10
```
See `news_crawler/README.md` for detailed documentation.
### What It Does
- Crawls full article content from RSS feed links
- Extracts text, word count, and metadata
- Stores in MongoDB for AI processing
- Skips already-crawled articles
- Rate-limited (1 second between requests)
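In spirit, the crawl loop looks roughly like the sketch below. Library choices (`feedparser`, `requests`, `pymongo`) and field names are assumptions for illustration; the actual implementation is `crawler_service.py`:

```python
# Illustrative sketch of the crawl loop; the real implementation is
# news_crawler/crawler_service.py and differs in detail.
import time

import feedparser
import requests
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]

def crawl_feed(feed_url: str, limit: int = 10) -> None:
    feed = feedparser.parse(feed_url)
    for entry in feed.entries[:limit]:
        # Skip already-crawled articles (articles.link has a unique index)
        if db.articles.find_one({"link": entry.link}):
            continue
        page = requests.get(entry.link, timeout=10)
        db.articles.insert_one({
            "title": entry.title,
            "link": entry.link,
            "content": page.text,  # the real crawler extracts clean article text
            "source": feed.feed.get("title", feed_url),
        })
        time.sleep(1)  # rate limit: 1 second between requests
```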
## 📊 Usage

### View Logs
```bash
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f crawler
docker-compose logs -f sender
docker-compose logs -f mongodb
```

### Manual Operations
```bash
# Run crawler manually
docker-compose exec crawler python crawler_service.py 10

# Send test newsletter
docker-compose exec sender python sender_service.py test your-email@example.com

# Preview newsletter
docker-compose exec sender python sender_service.py preview
```

### Adding News Sources
Use the API to add RSS feeds dynamically:
```bash
curl -X POST http://localhost:5001/api/rss-feeds \
  -H "Content-Type: application/json" \
  -d '{"name": "Your Source Name", "url": "https://example.com/rss"}'
```
### Database Access
```bash
# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news

# View articles
db.articles.find().sort({ crawled_at: -1 }).limit(5).pretty()

# View subscribers
db.subscribers.find({ active: true }).pretty()

# View tracking data
db.newsletter_sends.find().sort({ created_at: -1 }).limit(10).pretty()
```

### Styling
Modify `frontend/public/styles.css` to customize the appearance.
## 🔧 Management
### Add RSS Feeds
```bash
mongosh munich_news
db.rss_feeds.insertOne({
name: "Source Name",
url: "https://example.com/rss",
active: true
})
```
### Add Subscribers
```bash
mongosh munich_news
db.subscribers.insertOne({
email: "user@example.com",
active: true,
tracking_enabled: true,
subscribed_at: new Date()
})
```
### View Analytics
```bash
# Newsletter metrics
curl http://localhost:5001/api/analytics/newsletter/2024-01-15
# Article performance
curl http://localhost:5001/api/analytics/article/https://example.com/article
# Subscriber activity
curl http://localhost:5001/api/analytics/subscriber/user@example.com
```
## ⏰ Schedule Configuration
### Change Crawler Time (default: 6:00 AM)
Edit `news_crawler/scheduled_crawler.py`:
```python
schedule.every().day.at("06:00").do(run_crawler) # Change time
```
### Change Sender Time (default: 7:00 AM)
Edit `news_sender/scheduled_sender.py`:
```python
schedule.every().day.at("07:00").do(run_sender) # Change time
```
After changes:
```bash
docker-compose up -d --build
```
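Both scheduled services follow the same pattern built on the `schedule` package, roughly as sketched below (illustrative; assumes the container clock is set to Berlin time):

```python
# Illustrative sketch of the scheduler loop used by the crawler and sender
# services; the real loops live in scheduled_crawler.py and scheduled_sender.py.
import time

import schedule

def run_crawler() -> None:
    print("Starting daily crawl...")  # the real job invokes the crawler

schedule.every().day.at("06:00").do(run_crawler)

while True:
    schedule.run_pending()
    time.sleep(60)  # check once per minute
```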
## 📈 Monitoring
### Container Status
```bash
docker-compose ps
```
### Check Next Scheduled Runs
```bash
# Crawler
docker-compose logs crawler | grep "Next scheduled run"
# Sender
docker-compose logs sender | grep "Next scheduled run"
```
### Engagement Metrics
```bash
mongosh munich_news
// Open rate
var sent = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15" })
var opened = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15", opened: true })
print("Open Rate: " + ((opened / sent) * 100).toFixed(2) + "%")
// Click rate
var clicks = db.link_clicks.countDocuments({ newsletter_id: "2024-01-15" })
print("Click Rate: " + ((clicks / sent) * 100).toFixed(2) + "%")
```
## 🐛 Troubleshooting
### Crawler Not Finding Articles
```bash
# Check RSS feeds
mongosh munich_news --eval "db.rss_feeds.find({ active: true })"
# Test manually
docker-compose exec crawler python crawler_service.py 5
```
### Newsletter Not Sending
```bash
# Check email config
docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"
# Test email
docker-compose exec sender python sender_service.py test your-email@example.com
```
### Containers Not Starting
```bash
# Check logs
docker-compose logs
# Rebuild
docker-compose up -d --build
# Reset everything
docker-compose down -v
docker-compose up -d
```
## 🔐 Privacy & Compliance
### GDPR Features
- **Data Retention**: Automatic anonymization after 90 days
- **Opt-Out**: Subscribers can disable tracking
- **Data Deletion**: Full data removal on request
- **Transparency**: Privacy notice in all emails
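In practice the retention rule boils down to a periodic update along these lines (a sketch only; field names are assumptions, and the real job is exposed through the anonymize endpoint below):

```python
# Illustrative sketch of 90-day anonymization; field names are assumptions.
from datetime import datetime, timedelta

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]
cutoff = datetime.utcnow() - timedelta(days=90)

db.newsletter_sends.update_many(
    {"created_at": {"$lt": cutoff}},
    {"$set": {"email": None, "anonymized": True}},
)
```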
### Privacy Endpoints
```bash
# Delete subscriber data
curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com
# Anonymize old data
curl -X POST http://localhost:5001/api/tracking/anonymize
# Opt out of tracking
curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out
```
## 📚 Documentation
### Getting Started
- **[QUICKSTART.md](QUICKSTART.md)** - 5-minute setup guide
- **[PROJECT_STRUCTURE.md](PROJECT_STRUCTURE.md)** - Project layout
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines
### Technical Documentation
- **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** - System architecture
- **[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)** - Deployment guide
- **[docs/API.md](docs/API.md)** - API reference
- **[docs/DATABASE_SCHEMA.md](docs/DATABASE_SCHEMA.md)** - Database structure
- **[docs/BACKEND_STRUCTURE.md](docs/BACKEND_STRUCTURE.md)** - Backend organization
### Component Documentation
- **[docs/CRAWLER_HOW_IT_WORKS.md](docs/CRAWLER_HOW_IT_WORKS.md)** - Crawler internals
- **[docs/EXTRACTION_STRATEGIES.md](docs/EXTRACTION_STRATEGIES.md)** - Content extraction
- **[docs/RSS_URL_EXTRACTION.md](docs/RSS_URL_EXTRACTION.md)** - RSS parsing
## 🧪 Testing
All test files are organized in the `tests/` directory:
```bash
# Run crawler tests
docker-compose exec crawler python tests/crawler/test_crawler.py
# Run sender tests
docker-compose exec sender python tests/sender/test_tracking_integration.py
# Run backend tests
docker-compose exec backend python tests/backend/test_tracking.py
```
## 🚀 Production Deployment
### Environment Setup
1. Update `backend/.env` with production values
2. Set strong MongoDB password
3. Use HTTPS for tracking URLs
4. Configure proper SMTP server
### Security
```bash
# Set MongoDB password first so the production compose file can pick it up
export MONGO_PASSWORD=your-secure-password

# Use production compose file
docker-compose -f docker-compose.prod.yml up -d
```
### Monitoring
- Set up log rotation
- Configure health checks
- Set up alerts for failures
- Monitor database size
## 📝 License
[Your License Here]
## 🤝 Contributing
Contributions welcome! Please read CONTRIBUTING.md first.
## 📧 Support
For issues or questions, please open a GitHub issue.
---
**Built with ❤️ for Munich News Daily**