QUICKSTART.md (143 lines changed)
@@ -1,56 +1,36 @@
-# Quick Start Guide
+# ⚡ Quick Start Guide

 Get Munich News Daily running in 5 minutes!

-## Prerequisites
+## 📋 Prerequisites
+- **Docker** & **Docker Compose** installed
+- **4GB+ RAM** (for AI models)
+- *(Optional)* NVIDIA GPU for faster processing

-- Docker & Docker Compose installed
+## 🚀 Setup Steps
-- 4GB+ RAM (for Ollama AI models)
-- (Optional) NVIDIA GPU for 5-10x faster AI processing

-## Setup

 ### 1. Configure Environment

 ```bash
-# Copy example environment file
 cp backend/.env.example backend/.env

-# Edit with your settings (required: email configuration)
 nano backend/.env
 ```
+**Required:** Update `SMTP_SERVER`, `EMAIL_USER`, and `EMAIL_PASSWORD`.

-**Minimum required settings:**
+### 2. Start the System
-```env
-SMTP_SERVER=smtp.gmail.com
-SMTP_PORT=587
-EMAIL_USER=your-email@gmail.com
-EMAIL_PASSWORD=your-app-password
-```

-### 2. Start System

 ```bash
-# Option 1: Auto-detect GPU and start (recommended)
+# Auto-detects GPU capabilities and starts services
 ./start-with-gpu.sh

-# Option 2: Start without GPU
+# Watch installation progress (first-time model download, ~2GB)
-docker-compose up -d

-# View logs
-docker-compose logs -f

-# Wait for Ollama model download (first time only, ~2-5 minutes)
 docker-compose logs -f ollama-setup
 ```
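Whether the GPU path applies can be checked before starting; a minimal sketch, assuming the `check-gpu.sh` helper referenced in the section this commit removes is still present at the repo root:

```bash
# Check whether an NVIDIA GPU is usable by Docker before starting the
# stack (script referenced in the removed "New Features" section).
./check-gpu.sh

# start-with-gpu.sh then picks GPU or CPU mode automatically.
./start-with-gpu.sh
```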

-**Note:** First startup downloads the phi3:latest AI model (2.2GB). This happens automatically.
+### 3. Add News Sources

-### 3. Add RSS Feeds

 ```bash
-mongosh munich_news
+# Connect to database
+docker-compose exec mongodb mongosh munich_news

+# Paste this into the mongo shell:
 db.rss_feeds.insertMany([
   {
     name: "Süddeutsche Zeitung München",
@@ -65,11 +45,9 @@ db.rss_feeds.insertMany([
 ])
 ```
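The hunk above elides the body of the feed array. For reference, each entry follows the `name`/`url`/`active` shape from the previous README's `insertOne` example; the URL below is a placeholder:

```bash
# Hypothetical single-feed insert; field shape (name, url, active) taken
# from the previous README. Replace the URL with a real RSS endpoint.
docker-compose exec mongodb mongosh munich_news --eval '
  db.rss_feeds.insertOne({
    name: "Source Name",
    url: "https://example.com/rss",
    active: true
  })'
```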

-### 4. Add Subscribers
+### 4. Add Yourself as Subscriber

 ```bash
-mongosh munich_news
+# Still in the mongo shell:

 db.subscribers.insertOne({
   email: "your-email@example.com",
   active: true,
@@ -78,90 +56,35 @@ db.subscribers.insertOne({
 })
 ```
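The elided fields match the subscriber document the previous README created; a complete sketch using those fields (`tracking_enabled`, `subscribed_at`):

```bash
# Subscriber document sketch; fields mirror the previous README's
# "Add Subscribers" example.
docker-compose exec mongodb mongosh munich_news --eval '
  db.subscribers.insertOne({
    email: "your-email@example.com",
    active: true,
    tracking_enabled: true,
    subscribed_at: new Date()
  })'
```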

-### 5. Test It
+### 5. Verify Installation

 ```bash
-# Test crawler
+# 1. Run the crawler manually to fetch news
 docker-compose exec crawler python crawler_service.py 5

-# Test newsletter
+# 2. Send a test email to yourself
 docker-compose exec sender python sender_service.py test your-email@example.com
 ```
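To confirm the crawl actually stored articles, the query from the previous version of this guide still applies (collection and field names as used elsewhere in this diff):

```bash
# Show the five most recently crawled articles.
docker-compose exec mongodb mongosh munich_news --eval \
  'db.articles.find().sort({ crawled_at: -1 }).limit(5)'
```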

-## What Happens Next?
+## 🎮 Dashboard Access

-The system will automatically:
+Once running, access the services:
-- **Backend API**: Runs continuously at http://localhost:5001 for tracking and analytics
+- **Dashboard**: [http://localhost:3000](http://localhost:3000)
-- **6:00 AM Berlin time**: Crawl news articles
+- **API**: [http://localhost:5001](http://localhost:5001)
-- **7:00 AM Berlin time**: Send newsletter to subscribers

-## View Results
+## ⏭️ What's Next?

+The system is now fully automated:
+1. **6:00 AM**: Crawls news and generates AI summaries.
+2. **7:00 AM**: Sends the daily newsletter.

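To see when the next automated runs are due, the log check from the previous README should still work:

```bash
# Check the next scheduled crawl and send times (log phrasing taken
# from the previous README's monitoring section).
docker-compose logs crawler | grep "Next scheduled run"
docker-compose logs sender | grep "Next scheduled run"
```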
+### Useful Commands
 ```bash
-# Check articles
+# Stop everything
-mongosh munich_news
-db.articles.find().sort({ crawled_at: -1 }).limit(5)

-# Check logs
-docker-compose logs -f crawler
-docker-compose logs -f sender
-```

-## Common Commands

-```bash
-# Stop system
 docker-compose down

-# Restart system
+# View logs for a service
-docker-compose restart
+docker-compose logs -f crawler

-# View logs
+# Update code & rebuild
-docker-compose logs -f

-# Rebuild after changes
 docker-compose up -d --build
 ```

-## New Features

-### GPU Acceleration (5-10x Faster)
-Enable GPU support for faster AI processing:
-```bash
-./check-gpu.sh        # Check if GPU is available
-./start-with-gpu.sh   # Start with GPU support
-```
-See [docs/GPU_SETUP.md](docs/GPU_SETUP.md) for details.

-### Send Newsletter to All Subscribers
-```bash
-# Send newsletter to all active subscribers
-curl -X POST http://localhost:5001/api/admin/send-newsletter \
-  -H "Content-Type: application/json" \
-  -d '{"max_articles": 10}'
-```

-### Security Features
-- ✅ Only Backend API exposed (port 5001)
-- ✅ MongoDB internal-only (secure)
-- ✅ Ollama internal-only (secure)
-- ✅ All services communicate via internal Docker network

-## Need Help?

-- **Documentation Index**: [docs/INDEX.md](docs/INDEX.md)
-- **GPU Setup**: [docs/GPU_SETUP.md](docs/GPU_SETUP.md)
-- **API Reference**: [docs/ADMIN_API.md](docs/ADMIN_API.md)
-- **Security Guide**: [docs/SECURITY_NOTES.md](docs/SECURITY_NOTES.md)
-- **Full Documentation**: [README.md](README.md)

-## Next Steps

-1. ✅ **Enable GPU acceleration** - [docs/GPU_SETUP.md](docs/GPU_SETUP.md)
-2. Set up tracking API (optional)
-3. Customize newsletter template
-4. Add more RSS feeds
-5. Monitor engagement metrics
-6. Review security settings - [docs/SECURITY_NOTES.md](docs/SECURITY_NOTES.md)

-That's it! Your automated news system is running. 🎉
README.md (527 lines changed)
@@ -1,460 +1,193 @@

 # Munich News Daily - Automated Newsletter System

-A fully automated news aggregation and newsletter system that crawls Munich news sources, generates AI summaries, and sends daily newsletters with engagement tracking.
+A fully automated news aggregation system that crawls Munich news sources, generates AI-powered summaries, tracks local transport disruptions, and delivers personalized daily newsletters.

+

 ## ✨ Key Features

-- **🤖 AI-Powered Clustering** - Automatically detects duplicate stories from different sources
+- **🤖 AI-Powered Clustering** - Automatically detects duplicate stories and groups related articles using ChromaDB vector search.
-- **📰 Neutral Summaries** - Combines multiple perspectives into balanced coverage
+- **📝 Neutral Summaries** - Generates balanced, multi-perspective summaries using local LLMs (Ollama).
-- **🎯 Smart Prioritization** - Shows most important stories first (multi-source coverage)
+- **🚇 Transport Updates** - Real-time tracking of Munich public transport (MVG) disruptions.
-- **🎨 Personalized Newsletters** - AI-powered content recommendations based on user interests
+- **🎯 Smart Prioritization** - Ranks stories based on relevance and user preferences.
-- **📊 Engagement Tracking** - Open rates, click tracking, and analytics
+- **🎨 Personalized Newsletters** - Content recommendations tailored to each subscriber's interests.
-- **⚡ GPU Acceleration** - 5-10x faster AI processing with GPU support
+- **📊 Engagement Analytics** - Detailed tracking of open rates, click-throughs, and user interests.
-- **🔒 GDPR Compliant** - Privacy-first with data retention controls
+- **⚡ GPU Acceleration** - Integrated support for NVIDIA GPUs for faster AI processing.
+- **🔒 Privacy First** - GDPR-compliant with automatic data retention policies and anonymization.

-**🚀 NEW:** GPU acceleration support for 5-10x faster AI processing! See [docs/GPU_SETUP.md](docs/GPU_SETUP.md)

 ## 🚀 Quick Start

+For a detailed 5-minute setup guide, see [QUICKSTART.md](QUICKSTART.md).

 ```bash
 # 1. Configure environment
 cp backend/.env.example backend/.env
 # Edit backend/.env with your email settings

-# 2. Start everything
+# 2. Start everything (auto-detects GPU)
-docker-compose up -d
+./start-with-gpu.sh

-# 3. View logs
+# Questions?
-docker-compose logs -f
+# See logs: docker-compose logs -f
 ```

-That's it! The system will automatically:
+The system will automatically:
-- **Frontend**: Web interface and admin dashboard (http://localhost:3000)
+1. **6:00 AM**: Crawl news & transport updates.
-- **Backend API**: Runs continuously for tracking and analytics (http://localhost:5001)
+2. **6:30 AM**: Generate AI summaries & clusters.
-- **6:00 AM Berlin time**: Crawl news articles and generate summaries
+3. **7:00 AM**: Send personalized newsletters.
-- **7:00 AM Berlin time**: Send newsletter to all subscribers

-### Access Points
+## 📋 System Architecture

-- **Newsletter Page**: http://localhost:3000
+The system is built as a set of microservices orchestrated by Docker Compose.
-- **Admin Dashboard**: http://localhost:3000/admin.html
-- **Backend API**: http://localhost:5001

-📖 **New to the project?** See [QUICKSTART.md](QUICKSTART.md) for a detailed 5-minute setup guide.
+```mermaid
+graph TD
+    User[Subscribers] -->|Email| Sender[Newsletter Sender]
+    User -->|Web| Frontend[React Frontend]
+    Frontend -->|API| Backend[Backend API]
+
-🚀 **GPU Acceleration:** Enable 5-10x faster AI processing with [GPU Setup Guide](docs/GPU_SETUP.md)
+    subgraph "Core Services"
+        Crawler[News Crawler]
+        Transport[Transport Crawler]
+        Sender
+        Backend
+    end
+
-## 📋 System Overview
+    subgraph "Data & AI"
+        Mongo[(MongoDB)]
+        Redis[(Redis)]
+        Chroma[(ChromaDB)]
+        Ollama[Ollama AI]
+    end
+
-```
+    Crawler -->|Save| Mongo
-6:00 AM → News Crawler
+    Crawler -->|Embeddings| Chroma
-    ↓
+    Crawler -->|Summarize| Ollama
-Fetches articles from RSS feeds
-Extracts full content
+    Transport -->|Save| Mongo
-Generates AI summaries
-Saves to MongoDB
+    Sender -->|Read| Mongo
-    ↓
+    Sender -->|Track| Backend
-7:00 AM → Newsletter Sender
-    ↓
+    Backend -->|Read/Write| Mongo
-Waits for crawler to finish
+    Backend -->|Cache| Redis
-Fetches today's articles
-Generates newsletter with tracking
-Sends to all subscribers
-    ↓
-✅ Done! Repeat tomorrow
 ```

-## 🏗️ Architecture
+### Core Components

-### Components
+| Service | Description | Port |
+|---------|-------------|------|
+| **Frontend** | React-based user dashboard and admin interface. | 3000 |
+| **Backend API** | Flask API for tracking, analytics, and management. | 5001 |
+| **News Crawler** | Fetches RSS feeds, extracts content, and runs AI clustering. | - |
+| **Transport Crawler** | Monitors MVG (Munich Transport) for delays and disruptions. | - |
+| **Newsletter Sender** | Manages subscribers, generates templates, and sends emails. | - |
+| **Ollama** | Local LLM runner for on-premise AI (Phi-3, Llama3, etc.). | - |
+| **ChromaDB** | Vector database for semantic search and article clustering. | - |
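Once the stack is up, the running services and their published ports can be verified with the standard Compose status command:

```bash
# List container status and port mappings for all services.
docker-compose ps
```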

-- **Ollama**: AI service for summarization and translation (internal only, GPU-accelerated)
+## 📂 Project Structure
-- **MongoDB**: Data storage (articles, subscribers, tracking) (internal only)
-- **Backend API**: Flask API for tracking and analytics (port 5001 - only exposed service)
-- **News Crawler**: Automated RSS feed crawler with AI summarization (internal only)
-- **Newsletter Sender**: Automated email sender with tracking (internal only)
-- **Frontend**: React dashboard (optional)

-### Technology Stack
+```text
+munich-news/
+├── backend/            # Flask API for tracking & analytics
+├── frontend/           # React dashboard & admin UI
+├── news_crawler/       # RSS fetcher & AI summarizer service
+├── news_sender/        # Email generation & dispatch service
+├── transport_crawler/  # MVG transport disruption monitor
+├── docker-compose.yml  # Main service orchestration
+└── docs/               # Detailed documentation
+```

-- Python 3.11
+## 🛠️ Installation & Setup
-- MongoDB 7.0
-- Ollama (phi3:latest model for AI)
-- Docker & Docker Compose
-- Flask (API)
-- Schedule (automation)
-- Jinja2 (email templates)

-## 📦 Installation
+1. **Clone the repository**
+   ```bash
+   git clone https://github.com/yourusername/munich-news.git
+   cd munich-news
+   ```

-### Prerequisites
+2. **Environment Configuration**
+   ```bash
+   cp backend/.env.example backend/.env
+   nano backend/.env
+   ```
+   *Critical settings:* `SMTP_SERVER`, `EMAIL_USER`, `EMAIL_PASSWORD`.

-- Docker & Docker Compose
+3. **Start the System**
-- 4GB+ RAM (for Ollama AI models)
+   ```bash
-- (Optional) NVIDIA GPU for 5-10x faster AI processing
+   # Recommended: Helper script (handles GPU & model setup)
+   ./start-with-gpu.sh

-### Setup
+   # Alternative: Standard Docker Compose
+   docker-compose up -d
+   ```

-1. **Clone the repository**
+4. **Initial Setup (First Run)**
-```bash
+   * The system needs to download the AI model (approx. 2GB).
-git clone <repository-url>
+   * Watch progress: `docker-compose logs -f ollama-setup`
-cd munich-news
-```

-2. **Configure environment**
-```bash
-cp backend/.env.example backend/.env
-# Edit backend/.env with your settings
-```

-3. **Configure Ollama (AI features)**
-```bash
-# Option 1: Use integrated Docker Compose Ollama (recommended)
-./configure-ollama.sh
-# Select option 1

-# Option 2: Use external Ollama server
-# Install from https://ollama.ai/download
-# Then run: ollama pull phi3:latest
-```

-4. **Start the system**
-```bash
-# Auto-detect GPU and start (recommended)
-./start-with-gpu.sh

-# Or start manually
-docker-compose up -d

-# First time: Wait for Ollama model download (2-5 minutes)
-docker-compose logs -f ollama-setup
-```

-📖 **For detailed Ollama setup & GPU acceleration:** See [docs/OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md)

-💡 **To change AI model:** Edit `OLLAMA_MODEL` in `.env`, then run `./pull-ollama-model.sh`. See [docs/CHANGING_AI_MODEL.md](docs/CHANGING_AI_MODEL.md)

 ## ⚙️ Configuration

-Edit `backend/.env`:
+Key configuration options in `backend/.env`:

-```env
+| Category | Variable | Description |
-# MongoDB
+|----------|----------|-------------|
-MONGODB_URI=mongodb://localhost:27017/
+| **Email** | `SMTP_SERVER` | SMTP server (e.g., smtp.gmail.com) |
+| | `EMAIL_USER` | Your sending email address |
+| **AI** | `OLLAMA_MODEL` | Model to use (default: phi3:latest) |
+| **Schedule** | `CRAWLER_TIME` | Time to start crawling (e.g., "06:00") |
+| | `SENDER_TIME` | Time to send emails (e.g., "07:00") |
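As a sketch of how these variables might look in `backend/.env` (the email values come from the previous README's configuration block; `CRAWLER_TIME` and `SENDER_TIME` use the example times from the table above):

```env
# backend/.env sketch - placeholder values only
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
OLLAMA_MODEL=phi3:latest
CRAWLER_TIME=06:00
SENDER_TIME=07:00
```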

-# Email (SMTP)
+## 📊 Usage & Monitoring
-SMTP_SERVER=smtp.gmail.com
-SMTP_PORT=587
-EMAIL_USER=your-email@gmail.com
-EMAIL_PASSWORD=your-app-password

-# Newsletter
+### Access Points
-NEWSLETTER_MAX_ARTICLES=10
+* **Web Dashboard**: [http://localhost:3000](http://localhost:3000) (or configured domain)
-NEWSLETTER_HOURS_LOOKBACK=24
+* **API**: [http://localhost:5001](http://localhost:5001)

-# Tracking
+### Useful Commands
-TRACKING_ENABLED=true
-TRACKING_API_URL=http://localhost:5001
-TRACKING_DATA_RETENTION_DAYS=90

-# Ollama (AI Summarization)
+**View Logs**
-OLLAMA_ENABLED=true
+```bash
-OLLAMA_BASE_URL=http://127.0.0.1:11434
+docker-compose logs -f [service_name]
-OLLAMA_MODEL=phi3:latest
+# e.g., docker-compose logs -f crawler
 ```

-## 📊 Usage
+**Manual Trigger**

-### View Logs

 ```bash
-# All services
+# Run News Crawler immediately
-docker-compose logs -f

-# Specific service
-docker-compose logs -f crawler
-docker-compose logs -f sender
-docker-compose logs -f mongodb
-```

-### Manual Operations

-```bash
-# Run crawler manually
 docker-compose exec crawler python crawler_service.py 10

-# Send test newsletter
+# Run Transport Crawler immediately
-docker-compose exec sender python sender_service.py test your-email@example.com
+docker-compose exec transport-crawler python transport_service.py

-# Preview newsletter
+# Send Test Newsletter
-docker-compose exec sender python sender_service.py preview
+docker-compose exec sender python sender_service.py test user@example.com
 ```

-### Database Access
+**Database Access**

 ```bash
 # Connect to MongoDB
 docker-compose exec mongodb mongosh munich_news

-# View articles
-db.articles.find().sort({ crawled_at: -1 }).limit(5).pretty()

-# View subscribers
-db.subscribers.find({ active: true }).pretty()

-# View tracking data
-db.newsletter_sends.find().sort({ created_at: -1 }).limit(10).pretty()
 ```
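The ready-made queries removed above can also be run non-interactively; for example:

```bash
# List active subscribers without opening an interactive shell
# (query taken from the previous README).
docker-compose exec mongodb mongosh munich_news --eval \
  'db.subscribers.find({ active: true }).pretty()'
```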

-## 🔧 Management
+## 🌐 Production Deployment (Traefik)

-### Add RSS Feeds
+This project is configured to work with **Traefik** as a reverse proxy.
+The `docker-compose.yml` includes labels for:
+- `news.dongho.kim` (Frontend)
+- `news-api.dongho.kim` (Backend)

-```bash
+To use this locally, add these to your `/etc/hosts`:
-mongosh munich_news
+```text
+127.0.0.1 news.dongho.kim news-api.dongho.kim
-db.rss_feeds.insertOne({
-  name: "Source Name",
-  url: "https://example.com/rss",
-  active: true
-})
 ```

-### Add Subscribers
+For production, ensure your Traefik proxy network is named `proxy` or update the `docker-compose.yml` accordingly.
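If that external network does not exist yet on the host, creating it is a single command (assuming the default name `proxy` from the paragraph above):

```bash
# Create the shared Docker network that Traefik and this stack attach to.
docker network create proxy
```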

-```bash
-mongosh munich_news

-db.subscribers.insertOne({
-  email: "user@example.com",
-  active: true,
-  tracking_enabled: true,
-  subscribed_at: new Date()
-})
-```

-### View Analytics

-```bash
-# Newsletter metrics
-curl http://localhost:5001/api/analytics/newsletter/2024-01-15

-# Article performance
-curl http://localhost:5001/api/analytics/article/https://example.com/article

-# Subscriber activity
-curl http://localhost:5001/api/analytics/subscriber/user@example.com
-```

-## ⏰ Schedule Configuration

-### Change Crawler Time (default: 6:00 AM)

-Edit `news_crawler/scheduled_crawler.py`:
-```python
-schedule.every().day.at("06:00").do(run_crawler)  # Change time
-```

-### Change Sender Time (default: 7:00 AM)

-Edit `news_sender/scheduled_sender.py`:
-```python
-schedule.every().day.at("07:00").do(run_sender)  # Change time
-```

-After changes:
-```bash
-docker-compose up -d --build
-```

-## 📈 Monitoring

-### Container Status

-```bash
-docker-compose ps
-```

-### Check Next Scheduled Runs

-```bash
-# Crawler
-docker-compose logs crawler | grep "Next scheduled run"

-# Sender
-docker-compose logs sender | grep "Next scheduled run"
-```

-### Engagement Metrics

-```bash
-mongosh munich_news

-// Open rate
-var sent = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15" })
-var opened = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15", opened: true })
-print("Open Rate: " + ((opened / sent) * 100).toFixed(2) + "%")

-// Click rate
-var clicks = db.link_clicks.countDocuments({ newsletter_id: "2024-01-15" })
-print("Click Rate: " + ((clicks / sent) * 100).toFixed(2) + "%")
-```

-## 🐛 Troubleshooting

-### Crawler Not Finding Articles

-```bash
-# Check RSS feeds
-mongosh munich_news --eval "db.rss_feeds.find({ active: true })"

-# Test manually
-docker-compose exec crawler python crawler_service.py 5
-```

-### Newsletter Not Sending

-```bash
-# Check email config
-docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"

-# Test email
-docker-compose exec sender python sender_service.py test your-email@example.com
-```

-### Containers Not Starting

-```bash
-# Check logs
-docker-compose logs

-# Rebuild
-docker-compose up -d --build

-# Reset everything
-docker-compose down -v
-docker-compose up -d
-```

-## 🔐 Privacy & Compliance

-### GDPR Features

-- **Data Retention**: Automatic anonymization after 90 days
-- **Opt-Out**: Subscribers can disable tracking
-- **Data Deletion**: Full data removal on request
-- **Transparency**: Privacy notice in all emails

-### Privacy Endpoints

-```bash
-# Delete subscriber data
-curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com

-# Anonymize old data
-curl -X POST http://localhost:5001/api/tracking/anonymize

-# Opt out of tracking
-curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out
-```

-## 📚 Documentation

-### Getting Started
-- **[QUICKSTART.md](QUICKSTART.md)** - 5-minute setup guide
-- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines

-### Core Features
-- **[docs/AI_NEWS_AGGREGATION.md](docs/AI_NEWS_AGGREGATION.md)** - AI-powered clustering & neutral summaries
-- **[docs/PERSONALIZATION.md](docs/PERSONALIZATION.md)** - Personalized newsletter system
-- **[docs/PERSONALIZATION_COMPLETE.md](docs/PERSONALIZATION_COMPLETE.md)** - Personalization implementation guide
-- **[docs/FEATURES.md](docs/FEATURES.md)** - Complete feature list
-- **[docs/API.md](docs/API.md)** - API endpoints reference

-### Technical Documentation
-- **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** - System architecture
-- **[docs/SETUP.md](docs/SETUP.md)** - Detailed setup guide
-- **[docs/OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md)** - AI/Ollama configuration
-- **[docs/GPU_SETUP.md](docs/GPU_SETUP.md)** - GPU acceleration setup
-- **[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)** - Production deployment
-- **[docs/SECURITY.md](docs/SECURITY.md)** - Security best practices
-- **[docs/REFERENCE.md](docs/REFERENCE.md)** - Complete reference
-- **[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)** - Deployment guide
-- **[docs/API.md](docs/API.md)** - API reference
-- **[docs/DATABASE_SCHEMA.md](docs/DATABASE_SCHEMA.md)** - Database structure
-- **[docs/BACKEND_STRUCTURE.md](docs/BACKEND_STRUCTURE.md)** - Backend organization

-### Component Documentation
-- **[docs/CRAWLER_HOW_IT_WORKS.md](docs/CRAWLER_HOW_IT_WORKS.md)** - Crawler internals
-- **[docs/EXTRACTION_STRATEGIES.md](docs/EXTRACTION_STRATEGIES.md)** - Content extraction
-- **[docs/RSS_URL_EXTRACTION.md](docs/RSS_URL_EXTRACTION.md)** - RSS parsing

-## 🧪 Testing

-All test files are organized in the `tests/` directory:

-```bash
-# Run crawler tests
-docker-compose exec crawler python tests/crawler/test_crawler.py

-# Run sender tests
-docker-compose exec sender python tests/sender/test_tracking_integration.py

-# Run backend tests
-docker-compose exec backend python tests/backend/test_tracking.py

-# Test personalization system (all 4 phases)
-docker exec munich-news-local-backend python test_personalization_system.py
-```

-## 🚀 Production Deployment

-### Environment Setup

-1. Update `backend/.env` with production values
-2. Set strong MongoDB password
-3. Use HTTPS for tracking URLs
-4. Configure proper SMTP server

-### Security

-```bash
-# Use production compose file
-docker-compose -f docker-compose.prod.yml up -d

-# Set MongoDB password
-export MONGO_PASSWORD=your-secure-password
-```

-### Monitoring

-- Set up log rotation
-- Configure health checks
-- Set up alerts for failures
-- Monitor database size

-## 📚 Documentation

-Complete documentation available in the [docs/](docs/) directory:

-- **[Documentation Index](docs/INDEX.md)** - Complete documentation guide
-- **[GPU Setup](docs/GPU_SETUP.md)** - 5-10x faster with GPU acceleration
-- **[Admin API](docs/ADMIN_API.md)** - API endpoints reference
-- **[Security Guide](docs/SECURITY_NOTES.md)** - Security best practices
-- **[System Architecture](docs/SYSTEM_ARCHITECTURE.md)** - Technical overview

-## 📝 License

-[Your License Here]

 ## 🤝 Contributing

-Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first.
+We welcome contributions! Please check [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

-## 📧 Support
+## 📄 License

-For issues or questions, please open a GitHub issue.
+MIT License - see [LICENSE](LICENSE) for details.

----

-**Built with ❤️ for Munich News Daily**