This commit is contained in:
2025-11-11 14:09:21 +01:00
parent bcd0a10576
commit 1075a91eac
57 changed files with 5598 additions and 1366 deletions

4
.gitignore vendored

@@ -179,8 +179,8 @@ mongodb_data/
ollama_data/
# Spec artifacts (optional - uncomment if you don't want to track specs)
# .kiro/specs/
.kiro/specs/
.vscode
# Test outputs
test-results/
coverage/

79
CONTRIBUTING.md Normal file

@@ -0,0 +1,79 @@
# Contributing to Munich News Daily
Thank you for your interest in contributing!
## Getting Started
1. Fork the repository
2. Clone your fork
3. Create a feature branch
4. Make your changes
5. Run tests
6. Submit a pull request
## Development Setup
```bash
# Clone repository
git clone <your-fork-url>
cd munich-news
# Copy environment file
cp backend/.env.example backend/.env
# Start development environment
docker-compose up -d
# View logs
docker-compose logs -f
```
## Running Tests
```bash
# Run all tests
docker-compose exec crawler python -m pytest tests/crawler
docker-compose exec sender python -m pytest tests/sender
docker-compose exec backend python -m pytest tests/backend
# Run specific test
docker-compose exec crawler python tests/crawler/test_crawler.py
```
## Code Style
- Follow PEP 8 for Python code
- Use meaningful variable names
- Add docstrings to functions (see the short example after this list)
- Keep functions small and focused
- Write tests for new features
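A short, hypothetical example of the style we aim for; the function name and logic are illustrative only and not taken from the codebase:
```python
import re


def extract_plain_text(html: str, max_words: int = 500) -> str:
    """Return the visible text of an HTML snippet, truncated to max_words.

    Args:
        html: Raw HTML of an article page.
        max_words: Upper bound on the number of words returned.
    """
    # Naive tag stripping; real crawler code would use a proper HTML parser.
    text = re.sub(r"<[^>]+>", " ", html)
    words = text.split()
    return " ".join(words[:max_words])
```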
## Commit Messages
- Use clear, descriptive commit messages
- Start with a verb (Add, Fix, Update, etc.)
- Keep first line under 50 characters
- Add details in the body if needed
Example:
```
Add RSS feed validation
- Validate URL format
- Check feed accessibility
- Add error handling
```
## Pull Request Process
1. Update documentation if needed
2. Add tests for new features
3. Ensure all tests pass
4. Update CHANGELOG.md
5. Request review from maintainers
## Questions?
Open an issue or reach out to the maintainers.
Thank you for contributing! 🎉

243
FINAL_STRUCTURE.md Normal file

@@ -0,0 +1,243 @@
# ✅ Final Clean Project Structure
## 🎉 Cleanup Complete!
Your Munich News Daily project is now clean, organized, and professional.
## 📁 Current Structure
```
munich-news/
├── 📄 Root Files (5 essential files)
│ ├── README.md # Main documentation
│ ├── QUICKSTART.md # 5-minute setup guide
│ ├── CONTRIBUTING.md # Contribution guidelines
│ ├── PROJECT_STRUCTURE.md # Project layout
│ └── docker-compose.yml # Single unified compose file
├── 📚 docs/ (12 documentation files)
│ ├── API.md # API reference
│ ├── ARCHITECTURE.md # System architecture
│ ├── BACKEND_STRUCTURE.md # Backend organization
│ ├── CRAWLER_HOW_IT_WORKS.md # Crawler internals
│ ├── DATABASE_SCHEMA.md # Database structure
│ ├── DEPLOYMENT.md # Deployment guide
│ ├── EXTRACTION_STRATEGIES.md # Content extraction
│ └── RSS_URL_EXTRACTION.md # RSS parsing
├── 🧪 tests/ (10 test files)
│ ├── backend/ # Backend tests
│ ├── crawler/ # Crawler tests
│ └── sender/ # Sender tests
├── 🔧 backend/ # Backend API
│ ├── routes/
│ ├── services/
│ ├── .env.example
│ └── app.py
├── 📰 news_crawler/ # Crawler service
│ ├── Dockerfile
│ ├── crawler_service.py
│ ├── scheduled_crawler.py
│ └── requirements.txt
├── 📧 news_sender/ # Sender service
│ ├── Dockerfile
│ ├── sender_service.py
│ ├── scheduled_sender.py
│ └── requirements.txt
└── 🎨 frontend/ # React dashboard (optional)
```
## ✨ What Was Cleaned
### Removed Files (20+)
- ❌ All redundant markdown files from root
- ❌ All redundant markdown files from subdirectories
- ❌ Multiple docker-compose files (kept only 1)
- ❌ Multiple startup scripts (use docker-compose now)
- ❌ Test scripts and helpers
### Organized Files
- ✅ All tests → `tests/` directory
- ✅ All documentation → `docs/` directory
- ✅ All docker configs → single `docker-compose.yml`
## 🚀 How to Use
### Start Everything
```bash
docker-compose up -d
```
That's it! One command starts:
- MongoDB database
- News crawler (6 AM schedule)
- Newsletter sender (7 AM schedule)
### View Logs
```bash
docker-compose logs -f
```
### Stop Everything
```bash
docker-compose down
```
## 📊 Before vs After
### Before
```
Root: 20+ files (messy)
├── AUTOMATION_README.md
├── AUTOMATION_SETUP_COMPLETE.md
├── CRAWLER_QUICKSTART.md
├── CRAWLER_SETUP_SUMMARY.md
├── docker-compose.yml
├── docker-compose.prod.yml
├── README_CRAWLER.md
├── start-automation.sh
├── start-crawler.sh
├── start-sender.sh
├── test-crawler-setup.sh
└── ... many more
Subdirectories: Scattered docs
├── backend/TRACKING_README.md
├── backend/TRACKING_CONFIGURATION.md
├── news_crawler/README.md
├── news_crawler/QUICKSTART.md
├── news_crawler/docker-compose.yml
├── news_sender/README.md
└── ... more scattered files
Tests: Scattered everywhere
```
### After
```
Root: 5 essential files (clean)
├── README.md
├── QUICKSTART.md
├── CONTRIBUTING.md
├── PROJECT_STRUCTURE.md
└── docker-compose.yml
docs/: All documentation (12 files)
├── API.md
├── ARCHITECTURE.md
├── DEPLOYMENT.md
└── ... organized docs
tests/: All tests (10 files)
├── backend/
├── crawler/
└── sender/
Subdirectories: Clean, no scattered docs
```
## 🎯 Benefits
### 1. Easy to Navigate
- Clear directory structure
- Everything in its place
- No clutter
### 2. Simple to Use
- One command: `docker-compose up -d`
- One place for docs: `docs/`
- One place for tests: `tests/`
### 3. Professional
- Industry-standard layout
- Clean and organized
- Ready for collaboration
### 4. Maintainable
- Easy to find files
- Clear separation of concerns
- Scalable structure
## 📝 Quick Reference
### Documentation
```bash
# Main docs
cat README.md
cat QUICKSTART.md
# Technical docs
ls docs/
```
### Running
```bash
# Start
docker-compose up -d
# Logs
docker-compose logs -f
# Stop
docker-compose down
```
### Testing
```bash
# Run tests
docker-compose exec crawler python tests/crawler/test_crawler.py
docker-compose exec sender python tests/sender/test_tracking_integration.py
```
### Development
```bash
# Edit code in respective directories
# Rebuild
docker-compose up -d --build
```
## ✅ Verification
Run these commands to verify the cleanup:
```bash
# Check root directory (should be clean)
ls -1 *.md
# Check docs directory
ls -1 docs/
# Check tests directory
ls -1 tests/
# Check for stray docker-compose files (should be only 1)
find . -name "docker-compose*.yml" ! -path "*/node_modules/*" ! -path "*/env/*"
# Check for stray markdown in subdirectories (should be none)
find backend news_crawler news_sender -name "*.md" ! -path "*/env/*"
```
## 🎊 Result
A clean, professional, production-ready project structure!
**One command to start everything:**
```bash
docker-compose up -d
```
**One place for all documentation:**
```bash
ls docs/
```
**One place for all tests:**
```bash
ls tests/
```
Simple. Clean. Professional. ✨

126
PROJECT_STRUCTURE.md Normal file

@@ -0,0 +1,126 @@
# Project Structure
```
munich-news/
├── backend/ # Backend API and services
│ ├── routes/ # API routes
│ ├── services/ # Business logic
│ ├── .env.example # Environment template
│ ├── app.py # Flask application
│ ├── config.py # Configuration
│ └── database.py # MongoDB connection
├── news_crawler/ # News crawler service
│ ├── Dockerfile # Crawler container
│ ├── crawler_service.py # Main crawler logic
│ ├── scheduled_crawler.py # Scheduler (6 AM)
│ ├── rss_utils.py # RSS parsing utilities
│ └── requirements.txt # Python dependencies
├── news_sender/ # Newsletter sender service
│ ├── Dockerfile # Sender container
│ ├── sender_service.py # Main sender logic
│ ├── scheduled_sender.py # Scheduler (7 AM)
│ ├── tracking_integration.py # Email tracking
│ ├── newsletter_template.html # Email template
│ └── requirements.txt # Python dependencies
├── frontend/ # React dashboard (optional)
│ ├── src/ # React components
│ ├── public/ # Static files
│ └── package.json # Node dependencies
├── tests/ # All test files
│ ├── crawler/ # Crawler tests
│ ├── sender/ # Sender tests
│ └── backend/ # Backend tests
├── docs/ # Documentation
│ ├── ARCHITECTURE.md # System architecture
│ ├── DEPLOYMENT.md # Deployment guide
│ ├── API.md # API reference
│ ├── DATABASE_SCHEMA.md # Database structure
│ ├── BACKEND_STRUCTURE.md # Backend organization
│ ├── CRAWLER_HOW_IT_WORKS.md # Crawler internals
│ ├── EXTRACTION_STRATEGIES.md # Content extraction
│ └── RSS_URL_EXTRACTION.md # RSS parsing
├── .kiro/ # Kiro IDE configuration
│ └── specs/ # Feature specifications
├── docker-compose.yml # Docker orchestration
├── README.md # Main documentation
├── QUICKSTART.md # 5-minute setup guide
├── CONTRIBUTING.md # Contribution guidelines
├── .gitignore # Git ignore rules
└── .dockerignore # Docker ignore rules
```
## Key Files
### Configuration
- `backend/.env` - Environment variables (create from .env.example)
- `docker-compose.yml` - Docker services configuration
### Entry Points
- `news_crawler/scheduled_crawler.py` - Crawler scheduler (6 AM)
- `news_sender/scheduled_sender.py` - Sender scheduler (7 AM)
- `backend/app.py` - Backend API server
### Documentation
- `README.md` - Main project documentation
- `QUICKSTART.md` - Quick setup guide
- `docs/` - Detailed documentation
### Tests
- `tests/crawler/` - Crawler test files
- `tests/sender/` - Sender test files
- `tests/backend/` - Backend test files
## Docker Services
When you run `docker-compose up -d`, these services start:
1. **mongodb** - Database (port 27017)
2. **crawler** - News crawler (scheduled for 6 AM)
3. **sender** - Newsletter sender (scheduled for 7 AM)
4. **backend** - API server (port 5001, optional)
## Data Flow
```
RSS Feeds → Crawler → MongoDB → Sender → Subscribers
                         ↓
                    Backend API
                         ↓
                     Analytics
```
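A quick way to check that these services are actually reachable from the host is a small script like the sketch below. It assumes the default ports (MongoDB on 27017, the optional backend on 5001) and uses the `/health` endpoint added in `backend/app.py`:
```python
import requests
from pymongo import MongoClient

# MongoDB (port 27017 is the default mapping in docker-compose.yml)
client = MongoClient("mongodb://localhost:27017/", serverSelectionTimeoutMS=3000)
print("MongoDB ping:", client.admin.command("ping"))
print("Stored articles:", client["munich_news"]["articles"].count_documents({}))

# Backend API health check (endpoint added in backend/app.py); only works if the
# optional backend service is running.
resp = requests.get("http://localhost:5001/health", timeout=5)
print("Backend:", resp.status_code, resp.json())
```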
## Development Workflow
1. Edit code in respective directories
2. Rebuild containers: `docker-compose up -d --build`
3. View logs: `docker-compose logs -f`
4. Run tests: `docker-compose exec <service> python tests/...`
## Adding New Features
1. Create spec in `.kiro/specs/`
2. Implement in appropriate directory
3. Add tests in `tests/`
4. Update documentation in `docs/`
5. Submit pull request
## Clean Architecture
- **Separation of Concerns**: Each service has its own directory
- **Centralized Configuration**: All config in `backend/.env`
- **Organized Tests**: All tests in `tests/` directory
- **Clear Documentation**: All docs in `docs/` directory
- **Single Entry Point**: One `docker-compose.yml` file
This structure makes the project:
- ✅ Easy to navigate
- ✅ Simple to deploy
- ✅ Clear to understand
- ✅ Maintainable long-term

131
QUICKSTART.md Normal file

@@ -0,0 +1,131 @@
# Quick Start Guide
Get Munich News Daily running in 5 minutes!
## Prerequisites
- Docker & Docker Compose installed
- (Optional) Ollama for AI summarization
## Setup
### 1. Configure Environment
```bash
# Copy example environment file
cp backend/.env.example backend/.env
# Edit with your settings (required: email configuration)
nano backend/.env
```
**Minimum required settings:**
```env
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
```
### 2. Start System
```bash
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
```
### 3. Add RSS Feeds
```bash
mongosh munich_news
db.rss_feeds.insertMany([
{
name: "Süddeutsche Zeitung München",
url: "https://www.sueddeutsche.de/muenchen/rss",
active: true
},
{
name: "Merkur München",
url: "https://www.merkur.de/lokales/muenchen/rss/feed.rss",
active: true
}
])
```
### 4. Add Subscribers
```bash
mongosh munich_news
db.subscribers.insertOne({
email: "your-email@example.com",
active: true,
tracking_enabled: true,
subscribed_at: new Date()
})
```
### 5. Test It
```bash
# Test crawler
docker-compose exec crawler python crawler_service.py 5
# Test newsletter
docker-compose exec sender python sender_service.py test your-email@example.com
```
## What Happens Next?
The system will automatically:
- **Backend API**: Runs continuously at http://localhost:5001 for tracking and analytics
- **6:00 AM Berlin time**: Crawl news articles
- **7:00 AM Berlin time**: Send newsletter to subscribers
## View Results
```bash
# Check articles
mongosh munich_news
db.articles.find().sort({ crawled_at: -1 }).limit(5)
# Check logs
docker-compose logs -f crawler
docker-compose logs -f sender
```
## Common Commands
```bash
# Stop system
docker-compose down
# Restart system
docker-compose restart
# View logs
docker-compose logs -f
# Rebuild after changes
docker-compose up -d --build
```
## Need Help?
- Check [README.md](README.md) for full documentation
- See [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) for detailed setup
- View [docs/API.md](docs/API.md) for API reference
## Next Steps
1. Configure Ollama for AI summaries (optional)
2. Set up tracking API (optional)
3. Customize newsletter template
4. Add more RSS feeds
5. Monitor engagement metrics
That's it! Your automated news system is running. 🎉

639
README.md

@@ -1,327 +1,390 @@
# Munich News Daily 📰
A TLDR/Morning Brew-style news email platform specifically for Munich. Get the latest Munich news delivered to your inbox every morning.
## Features
- 📧 Email newsletter subscription system
- 📰 Aggregated news from multiple Munich news sources
- 🎨 Beautiful, modern web interface
- 📊 Subscription statistics
- 🔄 Real-time news updates
## Tech Stack
- **Backend**: Python (Flask) - Modular architecture with blueprints
- **Frontend**: Node.js (Express + Vanilla JavaScript)
- **Database**: MongoDB
- **News Crawler**: Standalone Python microservice
- **News Sources**: RSS feeds from major Munich news outlets
## Setup Instructions
# Munich News Daily - Automated Newsletter System
A fully automated news aggregation and newsletter system that crawls Munich news sources, generates AI summaries, and sends daily newsletters with engagement tracking.
## 🚀 Quick Start
```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your email settings
# 2. Start everything
docker-compose up -d
# 3. View logs
docker-compose logs -f
```
That's it! The system will automatically:
- **Backend API**: Runs continuously for tracking and analytics (http://localhost:5001)
- **6:00 AM Berlin time**: Crawl news articles and generate summaries
- **7:00 AM Berlin time**: Send newsletter to all subscribers
📖 **New to the project?** See [QUICKSTART.md](QUICKSTART.md) for a detailed 5-minute setup guide.
## 📋 System Overview
```
6:00 AM → News Crawler
    ├─ Fetches articles from RSS feeds
    ├─ Extracts full content
    ├─ Generates AI summaries
    └─ Saves to MongoDB
7:00 AM → Newsletter Sender
    ├─ Waits for crawler to finish
    ├─ Fetches today's articles
    ├─ Generates newsletter with tracking
    └─ Sends to all subscribers
✅ Done! Repeat tomorrow
```
## 🏗️ Architecture
### Components
- **MongoDB**: Data storage (articles, subscribers, tracking)
- **Backend API**: Flask API for tracking and analytics (port 5001)
- **News Crawler**: Automated RSS feed crawler with AI summarization
- **Newsletter Sender**: Automated email sender with tracking
- **Frontend**: React dashboard (optional)
### Technology Stack
- Python 3.11
- MongoDB 7.0
- Docker & Docker Compose
- Flask (API)
- Ollama (AI summarization)
- Schedule (automation)
- Jinja2 (email templates; see the rendering sketch below)
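The sender fills `newsletter_template.html` with Jinja2. The template's actual context variables are not part of this commit, so the sketch below only illustrates the rendering call with assumed field names (`articles`, `newsletter_date`):
```python
from jinja2 import Environment, FileSystemLoader, select_autoescape

env = Environment(
    loader=FileSystemLoader("news_sender"),
    autoescape=select_autoescape(["html"]),
)
template = env.get_template("newsletter_template.html")

# Field names below are assumptions; the real template may expect different keys.
html_body = template.render(
    newsletter_date="2024-01-15",
    articles=[
        {
            "title": "New U-Bahn Line Opens in Munich",
            "summary": "The new U-Bahn line connecting the city center...",
            "link": "https://example.com/article-123",
        }
    ],
)
print(html_body[:200])
```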
## 📦 Installation
### Prerequisites
- Docker & Docker Compose
- (Optional) Ollama for AI summarization
### Setup
1. **Clone the repository**
```bash
git clone <repository-url>
cd munich-news
```
2. **Configure environment**
```bash
cp backend/.env.example backend/.env
# Edit backend/.env with your settings
```
3. **Start the system**
```bash
docker-compose up -d
```
## ⚙️ Configuration
Edit `backend/.env`:
```env
# MongoDB
MONGODB_URI=mongodb://localhost:27017/
# Email (SMTP)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Newsletter
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24
# Tracking
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90
# Ollama (AI Summarization)
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest
```
## 📊 Usage
### View Logs
### Prerequisites
- Python 3.8+
- Node.js 14+
- npm or yarn
- Docker and Docker Compose (recommended for MongoDB) OR MongoDB (local installation or MongoDB Atlas account)
### Backend Setup
1. Navigate to the backend directory:
```bash
cd backend
```
2. Create a virtual environment (recommended):
```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Set up MongoDB using Docker Compose (recommended):
```bash
# From the project root directory
docker-compose up -d
```
This will start MongoDB in a Docker container. The database will be available at `mongodb://localhost:27017/`
**Useful Docker commands:**
```bash
# Start MongoDB
docker-compose up -d
# Stop MongoDB
docker-compose down
# View MongoDB logs
docker-compose logs -f mongodb
# Restart MongoDB
docker-compose restart mongodb
# Remove MongoDB and all data (WARNING: deletes all data)
docker-compose down -v
```
**Alternative options:**
- **Local MongoDB**: Install MongoDB locally and make sure it's running
- **MongoDB Atlas** (Cloud): Create a free account at [mongodb.com/cloud/atlas](https://www.mongodb.com/cloud/atlas) and get your connection string
5. Create a `.env` file in the backend directory:
```bash
# Copy the template file
cp env.template .env
```
Then edit `.env` with your configuration:
```env
# MongoDB connection (default: mongodb://localhost:27017/)
# For Docker Compose (no authentication):
MONGODB_URI=mongodb://localhost:27017/
# For Docker Compose with authentication (if you modify docker-compose.yml):
# MONGODB_URI=mongodb://admin:password@localhost:27017/
# Or for MongoDB Atlas:
# MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
# Email configuration (optional for testing)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Ollama Configuration (for AI-powered features)
# Remote Ollama server URL
OLLAMA_BASE_URL=http://your-remote-server-ip:11434
# Optional: API key if your Ollama server requires authentication
# OLLAMA_API_KEY=your-api-key-here
# Model name to use (e.g., llama2, mistral, codellama, llama3)
OLLAMA_MODEL=llama2
# Enable/disable Ollama features (true/false)
OLLAMA_ENABLED=false
```
**Notes:**
- For Gmail, you'll need to use an [App Password](https://support.google.com/accounts/answer/185833) instead of your regular password.
- For Ollama, replace `your-remote-server-ip` with your actual server IP or domain. Set `OLLAMA_ENABLED=true` to enable AI features.
6. Run the backend server:
```bash
python app.py
```
The backend will run on `http://localhost:5001` (port 5001 to avoid conflict with AirPlay on macOS)
### Frontend Setup
1. Navigate to the frontend directory:
```bash
cd frontend
```
2. Install dependencies:
```bash
npm install
```
3. Run the frontend server:
```bash
npm start
```
The frontend will run on `http://localhost:3000`
## Usage
1. Open your browser and go to `http://localhost:3000`
2. Enter your email address to subscribe to the newsletter
3. View the latest Munich news on the homepage
4. The backend will aggregate news from multiple Munich news sources
## Sending Newsletters
To send newsletters to all subscribers, you can add a scheduled task or manually trigger the `send_newsletter()` function in `app.py`. For production, consider using:
- **Cron jobs** (Linux/Mac)
- **Task Scheduler** (Windows)
- **Celery** with Redis/RabbitMQ for more advanced scheduling
- **Cloud functions** (AWS Lambda, Google Cloud Functions)
Example cron job to send daily at 8 AM:
```
0 8 * * * cd /path/to/munich-news/backend && python -c "from app import send_newsletter; send_newsletter()"
```
## Project Structure
```
munich-news/
├── backend/ # Main API server
│ ├── app.py # Flask application entry point
│ ├── config.py # Configuration management
│ ├── database.py # Database connection
│ ├── routes/ # API endpoints (blueprints)
│ ├── services/ # Business logic
│ ├── templates/ # Email templates
│ └── requirements.txt # Python dependencies
├── news_crawler/ # Crawler microservice
│ ├── crawler_service.py # Standalone crawler
│ ├── ollama_client.py # AI summarization client
│ ├── requirements.txt # Crawler dependencies
│ └── README.md # Crawler documentation
├── news_sender/ # Newsletter sender microservice
│ ├── sender_service.py # Standalone email sender
│ ├── newsletter_template.html # Email template
│ ├── requirements.txt # Sender dependencies
│ └── README.md # Sender documentation
├── frontend/ # Web interface
│ ├── server.js # Express server
│ ├── package.json # Node.js dependencies
│ └── public/
│ ├── index.html # Main page
│ ├── styles.css # Styling
│ └── app.js # Frontend JavaScript
├── docker-compose.yml # Docker Compose for MongoDB (development)
├── docker-compose.prod.yml # Docker Compose with authentication (production)
└── README.md
```
## API Endpoints
### `POST /api/subscribe`
Subscribe to the newsletter
- Body: `{ "email": "user@example.com" }`
### `POST /api/unsubscribe`
Unsubscribe from the newsletter
- Body: `{ "email": "user@example.com" }`
### `GET /api/news`
Get latest Munich news articles
### `GET /api/stats`
Get subscription statistics
- Returns: `{ "subscribers": number, "articles": number, "crawled_articles": number }`
### `GET /api/news/<article_url>`
Get full article content by URL
- Returns: Full article with content, author, word count, etc.
### `GET /api/ollama/ping`
Test connection to Ollama server
- Returns: Connection status and Ollama configuration
- Response examples:
- Success: `{ "status": "success", "message": "...", "response": "...", "ollama_config": {...} }`
- Disabled: `{ "status": "disabled", "message": "...", "ollama_config": {...} }`
- Error: `{ "status": "error", "message": "...", "error_details": "...", "troubleshooting": {...}, "ollama_config": {...} }`
### `GET /api/ollama/models`
List available models on Ollama server
- Returns: List of available models and current configuration
- Response: `{ "status": "success", "models": [...], "current_model": "...", "ollama_config": {...} }`
### `GET /api/rss-feeds`
Get all RSS feeds
- Returns: `{ "feeds": [...] }`
### `POST /api/rss-feeds`
Add a new RSS feed
- Body: `{ "name": "Feed Name", "url": "https://example.com/rss" }`
- Returns: `{ "message": "...", "id": "..." }`
### `DELETE /api/rss-feeds/<feed_id>`
Remove an RSS feed
- Returns: `{ "message": "..." }`
### `PATCH /api/rss-feeds/<feed_id>/toggle`
Toggle RSS feed active status
- Returns: `{ "message": "...", "active": boolean }`
## Database Schema
### Articles Collection
```javascript
{
_id: ObjectId,
title: String,
link: String (unique),
summary: String,
source: String,
published_at: String,
created_at: DateTime
}
```
### Subscribers Collection
```javascript
{
_id: ObjectId,
email: String (unique, lowercase),
subscribed_at: DateTime,
status: String ('active' | 'inactive')
}
```
**Indexes:**
- `articles.link` - Unique index to prevent duplicate articles
- `articles.created_at` - For efficient sorting
- `subscribers.email` - Unique index for email lookups
- `subscribers.subscribed_at` - For analytics
## News Crawler Microservice
The project includes a standalone crawler microservice that fetches full article content from RSS feeds.
### Running the Crawler
```bash
cd news_crawler
# Install dependencies
pip install -r requirements.txt
# Run crawler
python crawler_service.py 10
```
See `news_crawler/README.md` for detailed documentation.
### What It Does
- Crawls full article content from RSS feed links
- Extracts text, word count, and metadata
- Stores in MongoDB for AI processing
- Skips already-crawled articles
- Rate-limited (1 second between requests)
## Customization
### Adding News Sources
Use the API to add RSS feeds dynamically:
```bash
curl -X POST http://localhost:5001/api/rss-feeds \
  -H "Content-Type: application/json" \
  -d '{"name": "Your Source Name", "url": "https://example.com/rss"}'
```
### Styling
Modify `frontend/public/styles.css` to customize the appearance.
## License
MIT
## Contributing
Feel free to submit issues and enhancement requests!
```bash
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f crawler
docker-compose logs -f sender
docker-compose logs -f mongodb
```
### Manual Operations
```bash
# Run crawler manually
docker-compose exec crawler python crawler_service.py 10
# Send test newsletter
docker-compose exec sender python sender_service.py test your-email@example.com
# Preview newsletter
docker-compose exec sender python sender_service.py preview
```
### Database Access
```bash
# Connect to MongoDB
docker-compose exec mongodb mongosh munich_news
# View articles
db.articles.find().sort({ crawled_at: -1 }).limit(5).pretty()
# View subscribers
db.subscribers.find({ active: true }).pretty()
# View tracking data
db.newsletter_sends.find().sort({ created_at: -1 }).limit(10).pretty()
```
## 🔧 Management
### Add RSS Feeds
```bash
mongosh munich_news
db.rss_feeds.insertOne({
name: "Source Name",
url: "https://example.com/rss",
active: true
})
```
### Add Subscribers
```bash
mongosh munich_news
db.subscribers.insertOne({
email: "user@example.com",
active: true,
tracking_enabled: true,
subscribed_at: new Date()
})
```
### View Analytics
```bash
# Newsletter metrics
curl http://localhost:5001/api/analytics/newsletter/2024-01-15
# Article performance
curl http://localhost:5001/api/analytics/article/https://example.com/article
# Subscriber activity
curl http://localhost:5001/api/analytics/subscriber/user@example.com
```
## ⏰ Schedule Configuration
### Change Crawler Time (default: 6:00 AM)
Edit `news_crawler/scheduled_crawler.py`:
```python
schedule.every().day.at("06:00").do(run_crawler) # Change time
```
### Change Sender Time (default: 7:00 AM)
Edit `news_sender/scheduled_sender.py`:
```python
schedule.every().day.at("07:00").do(run_sender) # Change time
```
After changes:
```bash
docker-compose up -d --build
```
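Both schedulers follow the standard `schedule` polling loop around the lines shown above. A minimal sketch of that pattern (the real scripts in `news_crawler/` and `news_sender/` may add logging and retries not shown here):
```python
import time

import schedule


def run_crawler():
    # Placeholder; the real job calls into crawler_service.py
    print("Crawling RSS feeds...")


# Same registration call as shown above
schedule.every().day.at("06:00").do(run_crawler)

while True:
    schedule.run_pending()
    time.sleep(60)  # wake up once a minute to check for due jobs
```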
## 📈 Monitoring
### Container Status
```bash
docker-compose ps
```
### Check Next Scheduled Runs
```bash
# Crawler
docker-compose logs crawler | grep "Next scheduled run"
# Sender
docker-compose logs sender | grep "Next scheduled run"
```
### Engagement Metrics
```bash
mongosh munich_news
// Open rate
var sent = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15" })
var opened = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15", opened: true })
print("Open Rate: " + ((opened / sent) * 100).toFixed(2) + "%")
// Click rate
var clicks = db.link_clicks.countDocuments({ newsletter_id: "2024-01-15" })
print("Click Rate: " + ((clicks / sent) * 100).toFixed(2) + "%")
```
## 🐛 Troubleshooting
### Crawler Not Finding Articles
```bash
# Check RSS feeds
mongosh munich_news --eval "db.rss_feeds.find({ active: true })"
# Test manually
docker-compose exec crawler python crawler_service.py 5
```
### Newsletter Not Sending
```bash
# Check email config
docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"
# Test email
docker-compose exec sender python sender_service.py test your-email@example.com
```
### Containers Not Starting
```bash
# Check logs
docker-compose logs
# Rebuild
docker-compose up -d --build
# Reset everything
docker-compose down -v
docker-compose up -d
```
## 🔐 Privacy & Compliance
### GDPR Features
- **Data Retention**: Automatic anonymization after 90 days
- **Opt-Out**: Subscribers can disable tracking
- **Data Deletion**: Full data removal on request
- **Transparency**: Privacy notice in all emails
### Privacy Endpoints
```bash
# Delete subscriber data
curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com
# Anonymize old data
curl -X POST http://localhost:5001/api/tracking/anonymize
# Opt out of tracking
curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out
```
## 📚 Documentation
### Getting Started
- **[QUICKSTART.md](QUICKSTART.md)** - 5-minute setup guide
- **[PROJECT_STRUCTURE.md](PROJECT_STRUCTURE.md)** - Project layout
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines
### Technical Documentation
- **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** - System architecture
- **[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)** - Deployment guide
- **[docs/API.md](docs/API.md)** - API reference
- **[docs/DATABASE_SCHEMA.md](docs/DATABASE_SCHEMA.md)** - Database structure
- **[docs/BACKEND_STRUCTURE.md](docs/BACKEND_STRUCTURE.md)** - Backend organization
### Component Documentation
- **[docs/CRAWLER_HOW_IT_WORKS.md](docs/CRAWLER_HOW_IT_WORKS.md)** - Crawler internals
- **[docs/EXTRACTION_STRATEGIES.md](docs/EXTRACTION_STRATEGIES.md)** - Content extraction
- **[docs/RSS_URL_EXTRACTION.md](docs/RSS_URL_EXTRACTION.md)** - RSS parsing
## 🧪 Testing
All test files are organized in the `tests/` directory:
```bash
# Run crawler tests
docker-compose exec crawler python tests/crawler/test_crawler.py
# Run sender tests
docker-compose exec sender python tests/sender/test_tracking_integration.py
# Run backend tests
docker-compose exec backend python tests/backend/test_tracking.py
```
## 🚀 Production Deployment
### Environment Setup
1. Update `backend/.env` with production values
2. Set strong MongoDB password
3. Use HTTPS for tracking URLs
4. Configure proper SMTP server
### Security
```bash
# Use production compose file
docker-compose -f docker-compose.prod.yml up -d
# Set MongoDB password
export MONGO_PASSWORD=your-secure-password
```
### Monitoring
- Set up log rotation
- Configure health checks
- Set up alerts for failures
- Monitor database size
## 📝 License
[Your License Here]
## 🤝 Contributing
Contributions welcome! Please read CONTRIBUTING.md first.
## 📧 Support
For issues or questions, please open a GitHub issue.
---
**Built with ❤️ for Munich News Daily**


@@ -1,132 +0,0 @@
# Testing RSS Feed URL Extraction
## Quick Test (Recommended)
Run this from the project root with backend virtual environment activated:
```bash
# 1. Activate backend virtual environment
cd backend
source venv/bin/activate # On Windows: venv\Scripts\activate
# 2. Go back to project root
cd ..
# 3. Run the test
python test_feeds_quick.py
```
This will:
- ✓ Check what RSS feeds are in your database
- ✓ Fetch each feed
- ✓ Test URL extraction on first 3 articles
- ✓ Show what fields are available
- ✓ Verify summary and date extraction
## Expected Output
```
================================================================================
RSS Feed Test - Checking Database Feeds
================================================================================
✓ Found 3 feed(s) in database
================================================================================
Feed: Süddeutsche Zeitung München
URL: https://www.sueddeutsche.de/muenchen/rss
Active: True
================================================================================
Fetching RSS feed...
✓ Found 20 entries
--- Entry 1 ---
Title: New U-Bahn Line Opens in Munich
✓ URL extracted: https://www.sueddeutsche.de/muenchen/article-123
✓ Summary: The new U-Bahn line connecting the city center...
✓ Date: Mon, 10 Nov 2024 10:00:00 +0100
--- Entry 2 ---
Title: Munich Weather Update
✓ URL extracted: https://www.sueddeutsche.de/muenchen/article-124
✓ Summary: Weather forecast for the week...
✓ Date: Mon, 10 Nov 2024 09:30:00 +0100
...
```
## If No Feeds Found
Add a feed first:
```bash
curl -X POST http://localhost:5001/api/rss-feeds \
-H "Content-Type: application/json" \
-d '{"name": "Süddeutsche Politik", "url": "https://rss.sueddeutsche.de/rss/Politik"}'
```
## Testing News Crawler
Once feeds are verified, test the crawler:
```bash
# 1. Install crawler dependencies
cd news_crawler
pip install -r requirements.txt
# 2. Run the test
python test_rss_feeds.py
# 3. Or run the actual crawler
python crawler_service.py 5
```
## Troubleshooting
### "No module named 'pymongo'"
- Activate the backend virtual environment first
- Or install dependencies: `pip install -r backend/requirements.txt`
### "No RSS feeds in database"
- Make sure backend is running
- Add feeds via API (see above)
- Or check if MongoDB is running: `docker-compose ps`
### "Could not extract URL"
- The test will show available fields
- Check if the feed uses `guid`, `id`, or `links` instead of `link`
- Our utility should handle most cases automatically
### "No entries found"
- The RSS feed URL might be invalid
- Try opening the URL in a browser
- Check if it returns valid XML
## Manual Database Check
Using mongosh:
```bash
mongosh
use munich_news
db.rss_feeds.find()
db.articles.find().limit(3)
```
## What to Look For
**Good signs:**
- URLs are extracted successfully
- URLs start with `http://` or `https://`
- Summaries are present
- Dates are extracted
⚠️ **Warning signs:**
- "Could not extract URL" messages
- Empty summaries (not critical)
- Missing dates (not critical)
**Problems:**
- No entries found in feed
- All URL extractions fail
- Feed parsing errors

28
backend/.env.example Normal file

@@ -0,0 +1,28 @@
# MongoDB Configuration
MONGODB_URI=mongodb://localhost:27017/
# Email Configuration (Required)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Newsletter Settings
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24
WEBSITE_URL=http://localhost:3000
# Tracking Configuration
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90
# Ollama Configuration (AI Summarization)
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_TIMEOUT=120
SUMMARY_MAX_WORDS=150
# Flask Server Configuration
FLASK_PORT=5001
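For reference, the Ollama settings above map onto a single call to the Ollama server's `/api/generate` endpoint. The actual client lives in `news_crawler/ollama_client.py` (not part of this diff), so this is only a sketch of the idea:
```python
import os

import requests

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://127.0.0.1:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "phi3:latest")
OLLAMA_TIMEOUT = int(os.getenv("OLLAMA_TIMEOUT", "120"))
SUMMARY_MAX_WORDS = int(os.getenv("SUMMARY_MAX_WORDS", "150"))


def summarize(article_text: str) -> str:
    """Ask the Ollama server for a short summary of one article."""
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/generate",
        json={
            "model": OLLAMA_MODEL,
            "prompt": f"Summarize in at most {SUMMARY_MAX_WORDS} words:\n\n{article_text}",
            "stream": False,
        },
        timeout=OLLAMA_TIMEOUT,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()
```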

20
backend/Dockerfile Normal file

@@ -0,0 +1,20 @@
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application files
COPY . .
# Set timezone to Berlin
ENV TZ=Europe/Berlin
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
# Expose Flask port
EXPOSE 5001
# Run the Flask application
CMD ["python", "-u", "app.py"]


@@ -7,6 +7,8 @@ from routes.news_routes import news_bp
from routes.rss_routes import rss_bp
from routes.ollama_routes import ollama_bp
from routes.newsletter_routes import newsletter_bp
from routes.tracking_routes import tracking_bp
from routes.analytics_routes import analytics_bp
# Initialize Flask app
app = Flask(__name__)
@@ -21,9 +23,17 @@ app.register_blueprint(news_bp)
app.register_blueprint(rss_bp)
app.register_blueprint(ollama_bp)
app.register_blueprint(newsletter_bp)
app.register_blueprint(tracking_bp)
app.register_blueprint(analytics_bp)
# Health check endpoint
@app.route('/health')
def health():
return {'status': 'healthy', 'service': 'munich-news-backend'}, 200
# Print configuration
Config.print_config()
if __name__ == '__main__':
    app.run(debug=True, port=Config.FLASK_PORT, host='127.0.0.1')
    # Use 0.0.0.0 to allow Docker container access
    app.run(debug=True, port=Config.FLASK_PORT, host='0.0.0.0')


@@ -40,6 +40,11 @@ class Config:
# Flask
FLASK_PORT = int(os.getenv('FLASK_PORT', '5000'))
# Tracking
TRACKING_ENABLED = os.getenv('TRACKING_ENABLED', 'true').lower() == 'true'
TRACKING_API_URL = os.getenv('TRACKING_API_URL', f'http://localhost:{os.getenv("FLASK_PORT", "5000")}')
TRACKING_DATA_RETENTION_DAYS = int(os.getenv('TRACKING_DATA_RETENTION_DAYS', '90'))
@classmethod
def print_config(cls):
"""Print configuration (without sensitive data)"""
@@ -50,3 +55,5 @@ class Config:
print(f" Ollama Base URL: {cls.OLLAMA_BASE_URL}") print(f" Ollama Base URL: {cls.OLLAMA_BASE_URL}")
print(f" Ollama Model: {cls.OLLAMA_MODEL}") print(f" Ollama Model: {cls.OLLAMA_MODEL}")
print(f" Ollama Enabled: {cls.OLLAMA_ENABLED}") print(f" Ollama Enabled: {cls.OLLAMA_ENABLED}")
print(f" Tracking Enabled: {cls.TRACKING_ENABLED}")
print(f" Tracking API URL: {cls.TRACKING_API_URL}")


@@ -11,6 +11,11 @@ articles_collection = db['articles']
subscribers_collection = db['subscribers']
rss_feeds_collection = db['rss_feeds']
# Tracking Collections
newsletter_sends_collection = db['newsletter_sends']
link_clicks_collection = db['link_clicks']
subscriber_activity_collection = db['subscriber_activity']
def init_db():
"""Initialize database with indexes"""
@@ -25,6 +30,9 @@ def init_db():
# Create unique index on RSS feed URLs
rss_feeds_collection.create_index('url', unique=True)
# Initialize tracking collections indexes
init_tracking_collections()
# Initialize default RSS feeds if collection is empty
if rss_feeds_collection.count_documents({}) == 0:
default_feeds = [
@@ -51,3 +59,37 @@ def init_db():
print(f"Initialized {len(default_feeds)} default RSS feeds") print(f"Initialized {len(default_feeds)} default RSS feeds")
print("Database initialized with indexes") print("Database initialized with indexes")
def init_tracking_collections():
"""Initialize tracking collections with indexes for email tracking system"""
# Newsletter Sends Collection Indexes
# Unique index on tracking_id for fast pixel/click lookups
newsletter_sends_collection.create_index('tracking_id', unique=True)
# Index on newsletter_id for analytics queries
newsletter_sends_collection.create_index('newsletter_id')
# Index on subscriber_email for user activity queries
newsletter_sends_collection.create_index('subscriber_email')
# Index on sent_at for time-based queries
newsletter_sends_collection.create_index('sent_at')
# Link Clicks Collection Indexes
# Unique index on tracking_id for fast redirect lookups
link_clicks_collection.create_index('tracking_id', unique=True)
# Index on newsletter_id for analytics queries
link_clicks_collection.create_index('newsletter_id')
# Index on article_url for article performance queries
link_clicks_collection.create_index('article_url')
# Index on subscriber_email for user activity queries
link_clicks_collection.create_index('subscriber_email')
# Subscriber Activity Collection Indexes
# Unique index on email for fast lookups
subscriber_activity_collection.create_index('email', unique=True)
# Index on status for filtering by activity level
subscriber_activity_collection.create_index('status')
# Index on last_opened_at for time-based queries
subscriber_activity_collection.create_index('last_opened_at')
print("Tracking collections initialized with indexes")


@@ -30,3 +30,12 @@ OLLAMA_TIMEOUT=30
# Port for Flask server (default: 5001 to avoid AirPlay conflict on macOS)
FLASK_PORT=5001
# Tracking Configuration
# Enable/disable email tracking features (true/false)
TRACKING_ENABLED=true
# Base URL for tracking API (used in tracking pixel and link URLs)
# In production, use your actual domain (e.g., https://yourdomain.com)
TRACKING_API_URL=http://localhost:5001
# Number of days to retain tracking data before anonymization
TRACKING_DATA_RETENTION_DAYS=90

107
backend/init_tracking_db.py Normal file

@@ -0,0 +1,107 @@
#!/usr/bin/env python3
"""
Database initialization script for email tracking system.
This script creates the necessary MongoDB collections and indexes
for tracking email opens and link clicks in the newsletter system.
Collections created:
- newsletter_sends: Tracks each newsletter sent to each subscriber
- link_clicks: Tracks individual link clicks
- subscriber_activity: Aggregated activity status for each subscriber
Usage:
python init_tracking_db.py
"""
from pymongo import MongoClient, ASCENDING
from config import Config
from datetime import datetime
def init_tracking_database():
"""Initialize tracking collections with proper indexes"""
print("Connecting to MongoDB...")
client = MongoClient(Config.MONGODB_URI)
db = client[Config.DB_NAME]
print(f"Connected to database: {Config.DB_NAME}")
# Get collection references
newsletter_sends = db['newsletter_sends']
link_clicks = db['link_clicks']
subscriber_activity = db['subscriber_activity']
print("\n=== Setting up Newsletter Sends Collection ===")
# Newsletter Sends Collection Indexes
newsletter_sends.create_index('tracking_id', unique=True)
print("✓ Created unique index on 'tracking_id'")
newsletter_sends.create_index('newsletter_id')
print("✓ Created index on 'newsletter_id'")
newsletter_sends.create_index('subscriber_email')
print("✓ Created index on 'subscriber_email'")
newsletter_sends.create_index('sent_at')
print("✓ Created index on 'sent_at'")
print("\n=== Setting up Link Clicks Collection ===")
# Link Clicks Collection Indexes
link_clicks.create_index('tracking_id', unique=True)
print("✓ Created unique index on 'tracking_id'")
link_clicks.create_index('newsletter_id')
print("✓ Created index on 'newsletter_id'")
link_clicks.create_index('article_url')
print("✓ Created index on 'article_url'")
link_clicks.create_index('subscriber_email')
print("✓ Created index on 'subscriber_email'")
print("\n=== Setting up Subscriber Activity Collection ===")
# Subscriber Activity Collection Indexes
subscriber_activity.create_index('email', unique=True)
print("✓ Created unique index on 'email'")
subscriber_activity.create_index('status')
print("✓ Created index on 'status'")
subscriber_activity.create_index('last_opened_at')
print("✓ Created index on 'last_opened_at'")
# Display collection statistics
print("\n=== Collection Statistics ===")
print(f"newsletter_sends: {newsletter_sends.count_documents({})} documents")
print(f"link_clicks: {link_clicks.count_documents({})} documents")
print(f"subscriber_activity: {subscriber_activity.count_documents({})} documents")
# List all indexes for verification
print("\n=== Index Verification ===")
print("\nNewsletter Sends Indexes:")
for index in newsletter_sends.list_indexes():
print(f" - {index['name']}: {index.get('key', {})}")
print("\nLink Clicks Indexes:")
for index in link_clicks.list_indexes():
print(f" - {index['name']}: {index.get('key', {})}")
print("\nSubscriber Activity Indexes:")
for index in subscriber_activity.list_indexes():
print(f" - {index['name']}: {index.get('key', {})}")
print("\n✅ Tracking database initialization complete!")
client.close()
if __name__ == '__main__':
try:
init_tracking_database()
except Exception as e:
print(f"\n❌ Error initializing tracking database: {e}")
import traceback
traceback.print_exc()
exit(1)


@@ -0,0 +1,127 @@
"""
Analytics routes for email tracking metrics and subscriber engagement.
"""
from flask import Blueprint, jsonify, request
from services.analytics_service import (
get_newsletter_metrics,
get_article_performance,
get_subscriber_activity_status,
update_subscriber_activity_statuses
)
from database import subscriber_activity_collection
analytics_bp = Blueprint('analytics', __name__)
@analytics_bp.route('/api/analytics/newsletter/<newsletter_id>', methods=['GET'])
def get_newsletter_analytics(newsletter_id):
"""
Get comprehensive metrics for a specific newsletter.
Args:
newsletter_id: Unique identifier for the newsletter batch
Returns:
JSON response with newsletter metrics including:
- total_sent, total_opened, open_rate
- total_clicks, unique_clickers, click_through_rate
"""
try:
metrics = get_newsletter_metrics(newsletter_id)
return jsonify(metrics), 200
except Exception as e:
return jsonify({'error': str(e)}), 500
@analytics_bp.route('/api/analytics/article/<path:article_url>', methods=['GET'])
def get_article_analytics(article_url):
"""
Get performance metrics for a specific article.
Args:
article_url: The original article URL (passed as path parameter)
Returns:
JSON response with article performance metrics including:
- total_sent, total_clicks, click_rate
- unique_clickers, newsletters
"""
try:
performance = get_article_performance(article_url)
return jsonify(performance), 200
except Exception as e:
return jsonify({'error': str(e)}), 500
@analytics_bp.route('/api/analytics/subscriber/<email>', methods=['GET'])
def get_subscriber_analytics(email):
"""
Get activity status and engagement metrics for a specific subscriber.
Args:
email: Subscriber email address
Returns:
JSON response with subscriber activity data including:
- status, last_opened_at, last_clicked_at
- total_opens, total_clicks
- newsletters_received, newsletters_opened
"""
try:
# Get current activity status
status = get_subscriber_activity_status(email)
# Get detailed activity record from database
activity_record = subscriber_activity_collection.find_one(
{'email': email},
{'_id': 0} # Exclude MongoDB _id field
)
if activity_record:
# Convert datetime objects to ISO format strings
if activity_record.get('last_opened_at'):
activity_record['last_opened_at'] = activity_record['last_opened_at'].isoformat()
if activity_record.get('last_clicked_at'):
activity_record['last_clicked_at'] = activity_record['last_clicked_at'].isoformat()
if activity_record.get('updated_at'):
activity_record['updated_at'] = activity_record['updated_at'].isoformat()
return jsonify(activity_record), 200
else:
# Return basic status if no detailed record exists yet
return jsonify({
'email': email,
'status': status,
'last_opened_at': None,
'last_clicked_at': None,
'total_opens': 0,
'total_clicks': 0,
'newsletters_received': 0,
'newsletters_opened': 0
}), 200
except Exception as e:
return jsonify({'error': str(e)}), 500
@analytics_bp.route('/api/analytics/update-activity', methods=['POST'])
def update_activity_statuses():
"""
Trigger batch update of subscriber activity statuses.
Updates the subscriber_activity collection with current engagement
metrics for all subscribers.
Returns:
JSON response with count of updated records
"""
try:
updated_count = update_subscriber_activity_statuses()
return jsonify({
'success': True,
'updated_count': updated_count,
'message': f'Updated activity status for {updated_count} subscribers'
}), 200
except Exception as e:
return jsonify({'error': str(e)}), 500
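Besides `curl`, these endpoints can be consumed from a small script; a sketch assuming the backend is reachable on the default port 5001 (the newsletter ID and article URL below are placeholders):
```python
import requests

BASE = "http://localhost:5001"

# Newsletter-level metrics (open rate, clicks, click-through rate)
print(requests.get(f"{BASE}/api/analytics/newsletter/2024-01-15", timeout=10).json())

# Article performance; the article URL is passed in the path, as in the README examples
article_url = "https://example.com/article-123"
print(requests.get(f"{BASE}/api/analytics/article/{article_url}", timeout=10).json())

# Per-subscriber activity status
print(requests.get(f"{BASE}/api/analytics/subscriber/user@example.com", timeout=10).json())

# Trigger the batch activity-status update
print(requests.post(f"{BASE}/api/analytics/update-activity", timeout=30).json())
```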


@@ -0,0 +1,285 @@
"""
Tracking routes for email open and link click tracking.
"""
from flask import Blueprint, request, redirect, make_response, jsonify
from datetime import datetime
import base64
from database import newsletter_sends_collection, link_clicks_collection
from services.tracking_service import delete_subscriber_tracking_data, anonymize_old_tracking_data
from config import Config
tracking_bp = Blueprint('tracking', __name__)
# 1x1 transparent PNG image (decoded from base64)
TRANSPARENT_PNG = base64.b64decode(
'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII='
)
@tracking_bp.route('/api/track/pixel/<tracking_id>', methods=['GET'])
def track_pixel(tracking_id):
"""
Track email opens via tracking pixel.
Serves a 1x1 transparent PNG image and logs the email open event.
Handles multiple opens by updating last_opened_at and open_count.
Fails silently if tracking_id is invalid to avoid breaking email rendering.
Args:
tracking_id: Unique tracking ID for the newsletter send
Returns:
Response: 1x1 transparent PNG image with proper headers
"""
try:
# Look up tracking record
tracking_record = newsletter_sends_collection.find_one({'tracking_id': tracking_id})
if tracking_record:
# Get user agent for logging
user_agent = request.headers.get('User-Agent', '')
current_time = datetime.utcnow()
# Update tracking record
update_data = {
'opened': True,
'last_opened_at': current_time,
'user_agent': user_agent
}
# Set first_opened_at only if this is the first open
if not tracking_record.get('opened'):
update_data['first_opened_at'] = current_time
# Increment open count
newsletter_sends_collection.update_one(
{'tracking_id': tracking_id},
{
'$set': update_data,
'$inc': {'open_count': 1}
}
)
except Exception as e:
# Log error but don't fail - we still want to return the pixel
print(f"Error tracking pixel for {tracking_id}: {str(e)}")
# Always return the transparent PNG, even if tracking fails
response = make_response(TRANSPARENT_PNG)
response.headers['Content-Type'] = 'image/png'
response.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate'
response.headers['Pragma'] = 'no-cache'
response.headers['Expires'] = '0'
return response
@tracking_bp.route('/api/track/click/<tracking_id>', methods=['GET'])
def track_click(tracking_id):
"""
Track link clicks and redirect to original article URL.
Logs the click event and redirects the user to the original article URL.
Handles invalid tracking_id by redirecting to homepage.
Ensures redirect completes within 200ms.
Args:
tracking_id: Unique tracking ID for the article link
Returns:
Response: 302 redirect to original article URL or homepage
"""
# Default redirect URL (homepage)
redirect_url = Config.TRACKING_API_URL or 'http://localhost:5001'
try:
# Look up tracking record
tracking_record = link_clicks_collection.find_one({'tracking_id': tracking_id})
if tracking_record:
# Get the original article URL
redirect_url = tracking_record.get('article_url', redirect_url)
# Get user agent for logging
user_agent = request.headers.get('User-Agent', '')
current_time = datetime.utcnow()
# Update tracking record with click event
link_clicks_collection.update_one(
{'tracking_id': tracking_id},
{
'$set': {
'clicked': True,
'clicked_at': current_time,
'user_agent': user_agent
}
}
)
except Exception as e:
# Log error but still redirect
print(f"Error tracking click for {tracking_id}: {str(e)}")
# Redirect to the article URL (or homepage if tracking failed)
return redirect(redirect_url, code=302)
@tracking_bp.route('/api/tracking/subscriber/<email>', methods=['DELETE'])
def delete_subscriber_data(email):
"""
Delete all tracking data for a specific subscriber.
Removes all tracking records associated with the subscriber's email address
from all tracking collections (newsletter_sends, link_clicks, subscriber_activity).
Supports GDPR right to be forgotten.
Args:
email: Email address of the subscriber
Returns:
JSON response with deletion counts and confirmation message
"""
try:
# Delete all tracking data for the subscriber
result = delete_subscriber_tracking_data(email)
return jsonify({
'success': True,
'message': f'All tracking data deleted for {email}',
'deleted_counts': result
}), 200
except Exception as e:
return jsonify({
'success': False,
'error': str(e)
}), 500
@tracking_bp.route('/api/tracking/anonymize', methods=['POST'])
def anonymize_tracking_data():
"""
Anonymize tracking data older than the retention period.
Removes email addresses from old tracking records while preserving
aggregated metrics. Default retention period is 90 days.
Request body (optional):
{
"retention_days": 90 // Number of days to retain personal data
}
Returns:
JSON response with anonymization counts
"""
try:
# Get retention days from request body (default: 90)
data = request.get_json() or {}
retention_days = data.get('retention_days', 90)
# Validate retention_days
if not isinstance(retention_days, int) or retention_days < 1:
return jsonify({
'success': False,
'error': 'retention_days must be a positive integer'
}), 400
# Anonymize old tracking data
result = anonymize_old_tracking_data(retention_days)
return jsonify({
'success': True,
'message': f'Anonymized tracking data older than {retention_days} days',
'anonymized_counts': result
}), 200
except Exception as e:
return jsonify({
'success': False,
'error': str(e)
}), 500
@tracking_bp.route('/api/tracking/subscriber/<email>/opt-out', methods=['POST'])
def opt_out_tracking(email):
"""
Opt a subscriber out of tracking.
Sets the tracking_enabled field to False for the subscriber,
preventing future tracking of their email opens and link clicks.
Args:
email: Email address of the subscriber
Returns:
JSON response with confirmation message
"""
try:
from database import subscribers_collection
# Update subscriber to opt out of tracking
result = subscribers_collection.update_one(
{'email': email},
{'$set': {'tracking_enabled': False}},
upsert=False
)
if result.matched_count == 0:
return jsonify({
'success': False,
'error': f'Subscriber {email} not found'
}), 404
return jsonify({
'success': True,
'message': f'Subscriber {email} has opted out of tracking'
}), 200
except Exception as e:
return jsonify({
'success': False,
'error': str(e)
}), 500
@tracking_bp.route('/api/tracking/subscriber/<email>/opt-in', methods=['POST'])
def opt_in_tracking(email):
"""
Opt a subscriber back into tracking.
Sets the tracking_enabled field to True for the subscriber,
enabling tracking of their email opens and link clicks.
Args:
email: Email address of the subscriber
Returns:
JSON response with confirmation message
"""
try:
from database import subscribers_collection
# Update subscriber to opt in to tracking
result = subscribers_collection.update_one(
{'email': email},
{'$set': {'tracking_enabled': True}},
upsert=False
)
if result.matched_count == 0:
return jsonify({
'success': False,
'error': f'Subscriber {email} not found'
}), 404
return jsonify({
'success': True,
'message': f'Subscriber {email} has opted in to tracking'
}), 200
except Exception as e:
return jsonify({
'success': False,
'error': str(e)
}), 500
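For these routes to resolve anything, the sender has to generate a `tracking_id` per send and per article link, store the matching documents in `newsletter_sends` and `link_clicks`, and embed the resulting URLs in the email HTML. A rough sketch of the URL construction, with illustrative ID generation (the real sender code is not shown in this section):
```python
import uuid

TRACKING_API_URL = "http://localhost:5001"  # TRACKING_API_URL from backend/.env

# One open-tracking record per (newsletter, subscriber); tracking_id must match
# the document inserted into the newsletter_sends collection.
send_tracking_id = str(uuid.uuid4())
pixel_url = f"{TRACKING_API_URL}/api/track/pixel/{send_tracking_id}"
pixel_tag = f'<img src="{pixel_url}" width="1" height="1" alt="" />'

# One click-tracking record per article link; tracking_id must match a
# document in the link_clicks collection holding the original article_url.
click_tracking_id = str(uuid.uuid4())
tracked_link = f"{TRACKING_API_URL}/api/track/click/{click_tracking_id}"

print(pixel_tag)
print(tracked_link)
```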


@@ -0,0 +1,306 @@
"""
Analytics service for email tracking metrics and subscriber engagement.
Calculates open rates, click rates, and subscriber activity status.
"""
from datetime import datetime, timedelta
from typing import Dict, Optional
from database import (
newsletter_sends_collection,
link_clicks_collection,
subscriber_activity_collection
)
def get_open_rate(newsletter_id: str) -> float:
"""
Calculate the percentage of subscribers who opened a specific newsletter.
Args:
newsletter_id: Unique identifier for the newsletter batch
Returns:
float: Open rate as a percentage (0-100)
"""
# Count total sends for this newsletter
total_sends = newsletter_sends_collection.count_documents({
'newsletter_id': newsletter_id
})
if total_sends == 0:
return 0.0
# Count how many were opened
opened_count = newsletter_sends_collection.count_documents({
'newsletter_id': newsletter_id,
'opened': True
})
# Calculate percentage
open_rate = (opened_count / total_sends) * 100
return round(open_rate, 2)
def get_click_rate(article_url: str) -> float:
"""
Calculate the percentage of subscribers who clicked a specific article link.
Args:
article_url: The original article URL
Returns:
float: Click rate as a percentage (0-100)
"""
# Count total link tracking records for this article
total_links = link_clicks_collection.count_documents({
'article_url': article_url
})
if total_links == 0:
return 0.0
# Count how many were clicked
clicked_count = link_clicks_collection.count_documents({
'article_url': article_url,
'clicked': True
})
# Calculate percentage
click_rate = (clicked_count / total_links) * 100
return round(click_rate, 2)
def get_newsletter_metrics(newsletter_id: str) -> Dict:
"""
Get comprehensive metrics for a specific newsletter.
Args:
newsletter_id: Unique identifier for the newsletter batch
Returns:
dict: Dictionary containing:
- newsletter_id: The newsletter ID
- total_sent: Total number of emails sent
- total_opened: Number of emails opened
- open_rate: Percentage of emails opened
- total_clicks: Total number of link clicks
- unique_clickers: Number of unique subscribers who clicked
- click_through_rate: Percentage of recipients who clicked any link
"""
# Get total sends
total_sent = newsletter_sends_collection.count_documents({
'newsletter_id': newsletter_id
})
# Get total opened
total_opened = newsletter_sends_collection.count_documents({
'newsletter_id': newsletter_id,
'opened': True
})
# Calculate open rate
open_rate = (total_opened / total_sent * 100) if total_sent > 0 else 0.0
# Get total clicks for this newsletter
total_clicks = link_clicks_collection.count_documents({
'newsletter_id': newsletter_id,
'clicked': True
})
# Get unique clickers (distinct subscriber emails who clicked)
unique_clickers = len(link_clicks_collection.distinct(
'subscriber_email',
{'newsletter_id': newsletter_id, 'clicked': True}
))
# Calculate click-through rate (unique clickers / total sent)
click_through_rate = (unique_clickers / total_sent * 100) if total_sent > 0 else 0.0
return {
'newsletter_id': newsletter_id,
'total_sent': total_sent,
'total_opened': total_opened,
'open_rate': round(open_rate, 2),
'total_clicks': total_clicks,
'unique_clickers': unique_clickers,
'click_through_rate': round(click_through_rate, 2)
}
def get_article_performance(article_url: str) -> Dict:
"""
Get performance metrics for a specific article across all newsletters.
Args:
article_url: The original article URL
Returns:
dict: Dictionary containing:
- article_url: The article URL
- total_sent: Total times this article was sent
- total_clicks: Total number of clicks
- click_rate: Percentage of recipients who clicked
- unique_clickers: Number of unique subscribers who clicked
- newsletters: List of newsletter IDs that included this article
"""
# Get all link tracking records for this article
total_sent = link_clicks_collection.count_documents({
'article_url': article_url
})
# Get total clicks
total_clicks = link_clicks_collection.count_documents({
'article_url': article_url,
'clicked': True
})
# Calculate click rate
click_rate = (total_clicks / total_sent * 100) if total_sent > 0 else 0.0
# Get unique clickers
unique_clickers = len(link_clicks_collection.distinct(
'subscriber_email',
{'article_url': article_url, 'clicked': True}
))
# Get list of newsletters that included this article
newsletters = link_clicks_collection.distinct(
'newsletter_id',
{'article_url': article_url}
)
return {
'article_url': article_url,
'total_sent': total_sent,
'total_clicks': total_clicks,
'click_rate': round(click_rate, 2),
'unique_clickers': unique_clickers,
'newsletters': newsletters
}
def get_subscriber_activity_status(email: str) -> str:
"""
Get the activity status for a specific subscriber.
Classifies subscribers based on their last email open:
- 'active': Opened an email in the last 30 days
- 'inactive': No opens in 30-60 days
- 'dormant': No opens in 60+ days
- 'new': No opens yet
Args:
email: Subscriber email address
Returns:
str: Activity status ('active', 'inactive', 'dormant', or 'new')
"""
# Find the most recent open for this subscriber
most_recent_open = newsletter_sends_collection.find_one(
{'subscriber_email': email, 'opened': True},
sort=[('last_opened_at', -1)]
)
    if not most_recent_open:
        # No recorded opens yet, so the subscriber is classified as 'new'
        return 'new'
# Calculate days since last open
last_opened_at = most_recent_open.get('last_opened_at')
if not last_opened_at:
return 'new'
days_since_open = (datetime.utcnow() - last_opened_at).days
# Classify based on days since last open
if days_since_open <= 30:
return 'active'
elif days_since_open <= 60:
return 'inactive'
else:
return 'dormant'
def update_subscriber_activity_statuses() -> int:
"""
Batch update activity statuses for all subscribers.
Updates the subscriber_activity collection with current activity status,
engagement metrics, and last interaction timestamps for all subscribers
who have received newsletters.
Returns:
int: Number of subscriber records updated
"""
# Get all unique subscriber emails from newsletter sends
all_subscribers = newsletter_sends_collection.distinct('subscriber_email')
updated_count = 0
for email in all_subscribers:
# Get activity status
status = get_subscriber_activity_status(email)
# Get last opened timestamp
last_open_record = newsletter_sends_collection.find_one(
{'subscriber_email': email, 'opened': True},
sort=[('last_opened_at', -1)]
)
last_opened_at = last_open_record.get('last_opened_at') if last_open_record else None
# Get last clicked timestamp
last_click_record = link_clicks_collection.find_one(
{'subscriber_email': email, 'clicked': True},
sort=[('clicked_at', -1)]
)
last_clicked_at = last_click_record.get('clicked_at') if last_click_record else None
# Count total opens
total_opens = newsletter_sends_collection.count_documents({
'subscriber_email': email,
'opened': True
})
# Count total clicks
total_clicks = link_clicks_collection.count_documents({
'subscriber_email': email,
'clicked': True
})
# Count newsletters received
newsletters_received = newsletter_sends_collection.count_documents({
'subscriber_email': email
})
# Count newsletters opened (distinct newsletter_ids)
newsletters_opened = len(newsletter_sends_collection.distinct(
'newsletter_id',
{'subscriber_email': email, 'opened': True}
))
# Update or insert subscriber activity record
subscriber_activity_collection.update_one(
{'email': email},
{
'$set': {
'email': email,
'status': status,
'last_opened_at': last_opened_at,
'last_clicked_at': last_clicked_at,
'total_opens': total_opens,
'total_clicks': total_clicks,
'newsletters_received': newsletters_received,
'newsletters_opened': newsletters_opened,
'updated_at': datetime.utcnow()
}
},
upsert=True
)
updated_count += 1
return updated_count

View File

@@ -0,0 +1,215 @@
"""
Email tracking service for Munich News Daily newsletter system.
Handles tracking ID generation and tracking record creation.
"""
import uuid
from datetime import datetime, timedelta
from typing import Dict, List, Optional
from database import newsletter_sends_collection, link_clicks_collection, subscriber_activity_collection, subscribers_collection
def generate_tracking_id() -> str:
"""
Generate a unique tracking ID using UUID4.
Returns:
str: A unique UUID4 string for tracking purposes
"""
return str(uuid.uuid4())
def create_newsletter_tracking(
newsletter_id: str,
subscriber_email: str,
article_links: Optional[List[Dict[str, str]]] = None
) -> Dict[str, any]:
"""
Create tracking records for a newsletter send.
Creates a tracking record in newsletter_sends collection for email open tracking,
and creates tracking records in link_clicks collection for each article link.
Respects subscriber opt-out preferences.
Args:
newsletter_id: Unique identifier for the newsletter batch (e.g., date-based)
subscriber_email: Email address of the recipient
article_links: Optional list of article dictionaries with 'url' and 'title' keys
Returns:
dict: Tracking information containing:
- pixel_tracking_id: ID for the tracking pixel (None if opted out)
- link_tracking_map: Dict mapping original URLs to tracking IDs (empty if opted out)
- newsletter_id: The newsletter batch ID
- subscriber_email: The recipient email
- tracking_enabled: Boolean indicating if tracking is enabled for this subscriber
"""
# Check if subscriber has opted out of tracking
subscriber = subscribers_collection.find_one({'email': subscriber_email})
tracking_enabled = subscriber.get('tracking_enabled', True) if subscriber else True
# If tracking is disabled, return empty tracking data
if not tracking_enabled:
return {
'pixel_tracking_id': None,
'link_tracking_map': {},
'newsletter_id': newsletter_id,
'subscriber_email': subscriber_email,
'tracking_enabled': False
}
# Generate tracking ID for the email open pixel
pixel_tracking_id = generate_tracking_id()
# Create newsletter send tracking record
newsletter_send_doc = {
'newsletter_id': newsletter_id,
'subscriber_email': subscriber_email,
'tracking_id': pixel_tracking_id,
'sent_at': datetime.utcnow(),
'opened': False,
'first_opened_at': None,
'last_opened_at': None,
'open_count': 0,
'created_at': datetime.utcnow()
}
newsletter_sends_collection.insert_one(newsletter_send_doc)
# Create tracking records for article links
link_tracking_map = {}
if article_links:
for article in article_links:
article_url = article.get('url')
article_title = article.get('title', '')
if article_url:
link_tracking_id = generate_tracking_id()
# Create link click tracking record
link_click_doc = {
'tracking_id': link_tracking_id,
'newsletter_id': newsletter_id,
'subscriber_email': subscriber_email,
'article_url': article_url,
'article_title': article_title,
'clicked': False,
'clicked_at': None,
'user_agent': None,
'created_at': datetime.utcnow()
}
link_clicks_collection.insert_one(link_click_doc)
# Map original URL to tracking ID
link_tracking_map[article_url] = link_tracking_id
return {
'pixel_tracking_id': pixel_tracking_id,
'link_tracking_map': link_tracking_map,
'newsletter_id': newsletter_id,
'subscriber_email': subscriber_email,
'tracking_enabled': True
}
def anonymize_old_tracking_data(retention_days: int = 90) -> Dict[str, int]:
"""
Anonymize tracking data older than the specified retention period.
Removes email addresses from tracking records while preserving aggregated metrics.
This helps comply with privacy regulations by not retaining personal data indefinitely.
Args:
retention_days: Number of days to retain personal data (default: 90)
Returns:
dict: Count of anonymized records for each collection:
- newsletter_sends_anonymized: Number of newsletter send records anonymized
- link_clicks_anonymized: Number of link click records anonymized
- total_anonymized: Total number of records anonymized
"""
cutoff_date = datetime.utcnow() - timedelta(days=retention_days)
# Anonymize newsletter_sends records
newsletter_result = newsletter_sends_collection.update_many(
{
'sent_at': {'$lt': cutoff_date},
'subscriber_email': {'$ne': 'anonymized'} # Don't re-anonymize
},
{
'$set': {
'subscriber_email': 'anonymized',
'anonymized_at': datetime.utcnow()
}
}
)
# Anonymize link_clicks records
link_clicks_result = link_clicks_collection.update_many(
{
'created_at': {'$lt': cutoff_date},
'subscriber_email': {'$ne': 'anonymized'} # Don't re-anonymize
},
{
'$set': {
'subscriber_email': 'anonymized',
'anonymized_at': datetime.utcnow()
}
}
)
newsletter_count = newsletter_result.modified_count
link_clicks_count = link_clicks_result.modified_count
return {
'newsletter_sends_anonymized': newsletter_count,
'link_clicks_anonymized': link_clicks_count,
'total_anonymized': newsletter_count + link_clicks_count
}
def delete_subscriber_tracking_data(subscriber_email: str) -> Dict[str, int]:
"""
Delete all tracking data for a specific subscriber.
Removes all tracking records associated with a subscriber's email address
from all tracking collections. This supports GDPR right to be forgotten.
Args:
subscriber_email: Email address of the subscriber
Returns:
dict: Count of deleted records for each collection:
- newsletter_sends_deleted: Number of newsletter send records deleted
- link_clicks_deleted: Number of link click records deleted
- subscriber_activity_deleted: Number of activity records deleted
- total_deleted: Total number of records deleted
"""
# Delete from newsletter_sends
newsletter_result = newsletter_sends_collection.delete_many({
'subscriber_email': subscriber_email
})
# Delete from link_clicks
link_clicks_result = link_clicks_collection.delete_many({
'subscriber_email': subscriber_email
})
# Delete from subscriber_activity
activity_result = subscriber_activity_collection.delete_many({
'email': subscriber_email
})
newsletter_count = newsletter_result.deleted_count
link_clicks_count = link_clicks_result.deleted_count
activity_count = activity_result.deleted_count
return {
'newsletter_sends_deleted': newsletter_count,
'link_clicks_deleted': link_clicks_count,
'subscriber_activity_deleted': activity_count,
'total_deleted': newsletter_count + link_clicks_count + activity_count
}

View File

@@ -1,33 +0,0 @@
version: '3.8'
# Production version with authentication enabled
# Usage: docker-compose -f docker-compose.prod.yml up -d
services:
mongodb:
image: mongo:7.0
container_name: munich-news-mongodb
restart: unless-stopped
ports:
- "27017:27017"
environment:
MONGO_INITDB_ROOT_USERNAME: admin
MONGO_INITDB_ROOT_PASSWORD: ${MONGO_PASSWORD:-changeme}
MONGO_INITDB_DATABASE: munich_news
volumes:
- mongodb_data:/data/db
- mongodb_config:/data/configdb
networks:
- munich-news-network
command: mongod --bind_ip_all --auth
volumes:
mongodb_data:
driver: local
mongodb_config:
driver: local
networks:
munich-news-network:
driver: bridge

View File

@@ -1,24 +1,106 @@
version: '3.8'
services:
  # MongoDB Database
  mongodb:
    image: mongo:latest
    container_name: munich-news-mongodb
    restart: unless-stopped
    ports:
      - "27017:27017"
    environment:
      # For production, set MONGO_PASSWORD environment variable
      MONGO_INITDB_ROOT_USERNAME: ${MONGO_USERNAME:-admin}
      MONGO_INITDB_ROOT_PASSWORD: ${MONGO_PASSWORD:-changeme}
      MONGO_INITDB_DATABASE: munich_news
    volumes:
      - mongodb_data:/data/db
      - mongodb_config:/data/configdb
    networks:
      - munich-news-network
    command: mongod --bind_ip_all ${MONGO_AUTH:---auth}
healthcheck:
test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
interval: 30s
timeout: 10s
retries: 3
# News Crawler - Runs at 6 AM Berlin time
crawler:
build:
context: .
dockerfile: news_crawler/Dockerfile
container_name: munich-news-crawler
restart: unless-stopped
depends_on:
- mongodb
environment:
- MONGODB_URI=mongodb://${MONGO_USERNAME:-admin}:${MONGO_PASSWORD:-changeme}@mongodb:27017/
- TZ=Europe/Berlin
volumes:
- ./backend/.env:/app/.env:ro
- ./backend/config.py:/app/config.py:ro
- ./backend/ollama_client.py:/app/ollama_client.py:ro
- ./news_crawler:/app:ro
networks:
- munich-news-network
healthcheck:
test: ["CMD", "python", "-c", "import sys; sys.exit(0)"]
interval: 1m
timeout: 10s
retries: 3
# Backend API - Tracking and analytics
backend:
build:
context: ./backend
dockerfile: Dockerfile
container_name: munich-news-backend
restart: unless-stopped
depends_on:
- mongodb
ports:
- "5001:5001"
environment:
- MONGODB_URI=mongodb://${MONGO_USERNAME:-admin}:${MONGO_PASSWORD:-changeme}@mongodb:27017/
- FLASK_PORT=5001
- TZ=Europe/Berlin
volumes:
- ./backend/.env:/app/.env:ro
networks:
- munich-news-network
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:5001/health')"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
# Newsletter Sender - Runs at 7 AM Berlin time
sender:
build:
context: .
dockerfile: news_sender/Dockerfile
container_name: munich-news-sender
restart: unless-stopped
depends_on:
- mongodb
- backend
- crawler
environment:
- MONGODB_URI=mongodb://${MONGO_USERNAME:-admin}:${MONGO_PASSWORD:-changeme}@mongodb:27017/
- TZ=Europe/Berlin
volumes:
- ./backend/.env:/app/.env:ro
- ./backend/services:/app/backend/services:ro
- ./news_sender:/app:ro
networks:
- munich-news-network
healthcheck:
test: ["CMD", "python", "-c", "import sys; sys.exit(0)"]
interval: 1m
timeout: 10s
retries: 3
volumes:
  mongodb_data:
@@ -29,4 +111,3 @@ volumes:
networks:
  munich-news-network:
    driver: bridge

223
docs/API.md Normal file
View File

@@ -0,0 +1,223 @@
# API Reference
## Tracking Endpoints
### Track Email Open
```http
GET /api/track/pixel/<tracking_id>
```
Returns a 1x1 transparent PNG and logs the email open event.
**Response**: Image (image/png)
### Track Link Click
```http
GET /api/track/click/<tracking_id>
```
Logs the click event and redirects to the original article URL.
**Response**: 302 Redirect
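Both endpoints are normally consumed by the newsletter sender rather than called by hand: the pixel URL is embedded as a 1x1 image and each article link is rewritten to point at the click redirect. Below is a rough sketch of that rewriting step; the helper name `add_tracking`, and the assumption that the base URL comes from `TRACKING_API_URL` and the IDs from `create_newsletter_tracking()`, are illustrative rather than taken from the sender code.
```python
def add_tracking(html: str, pixel_tracking_id: str, link_tracking_map: dict, base_url: str) -> str:
    """Rewrite article links and append the open-tracking pixel (sketch only)."""
    # Point every original article URL at the click-tracking redirect
    for original_url, tracking_id in link_tracking_map.items():
        html = html.replace(original_url, f"{base_url}/api/track/click/{tracking_id}")
    # Append a 1x1 tracking pixel just before the closing body tag
    if pixel_tracking_id:
        pixel = f'<img src="{base_url}/api/track/pixel/{pixel_tracking_id}" width="1" height="1" alt="" />'
        html = html.replace("</body>", pixel + "</body>")
    return html
```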
## Analytics Endpoints
### Get Newsletter Metrics
```http
GET /api/analytics/newsletter/<newsletter_id>
```
Returns comprehensive metrics for a specific newsletter.
**Response**:
```json
{
"newsletter_id": "2024-01-15",
"total_sent": 100,
"total_opened": 75,
"open_rate": 75.0,
"unique_openers": 70,
"total_clicks": 45,
"unique_clickers": 30,
"click_through_rate": 30.0
}
```
### Get Article Performance
```http
GET /api/analytics/article/<article_url>
```
Returns performance metrics for a specific article.
**Response**:
```json
{
"article_url": "https://example.com/article",
"total_sent": 100,
"total_clicks": 25,
"click_rate": 25.0,
"unique_clickers": 20,
"newsletters": ["2024-01-15", "2024-01-16"]
}
```
### Get Subscriber Activity
```http
GET /api/analytics/subscriber/<email>
```
Returns activity status and engagement metrics for a subscriber.
**Response**:
```json
{
"email": "user@example.com",
"status": "active",
"last_opened_at": "2024-01-15T10:30:00",
"last_clicked_at": "2024-01-15T10:35:00",
"total_opens": 45,
"total_clicks": 20,
"newsletters_received": 50,
"newsletters_opened": 45
}
```
## Privacy Endpoints
### Delete Subscriber Data
```http
DELETE /api/tracking/subscriber/<email>
```
Deletes all tracking data for a subscriber (GDPR compliance).
**Response**:
```json
{
"success": true,
"message": "All tracking data deleted for user@example.com",
"deleted_counts": {
"newsletter_sends": 50,
"link_clicks": 25,
"subscriber_activity": 1
}
}
```
### Anonymize Old Data
```http
POST /api/tracking/anonymize
```
Anonymizes tracking data older than the retention period.
**Request Body** (optional):
```json
{
"retention_days": 90
}
```
**Response**:
```json
{
"success": true,
"message": "Anonymized tracking data older than 90 days",
"anonymized_counts": {
"newsletter_sends": 1250,
"link_clicks": 650
}
}
```
### Opt Out of Tracking
```http
POST /api/tracking/subscriber/<email>/opt-out
```
Disables tracking for a subscriber.
**Response**:
```json
{
"success": true,
"message": "Subscriber user@example.com has opted out of tracking"
}
```
### Opt In to Tracking
```http
POST /api/tracking/subscriber/<email>/opt-in
```
Re-enables tracking for a subscriber.
**Response**:
```json
{
"success": true,
"message": "Subscriber user@example.com has opted in to tracking"
}
```
## Examples
### Using curl
```bash
# Get newsletter metrics
curl http://localhost:5001/api/analytics/newsletter/2024-01-15
# Delete subscriber data
curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com
# Anonymize old data
curl -X POST http://localhost:5001/api/tracking/anonymize \
-H "Content-Type: application/json" \
-d '{"retention_days": 90}'
# Opt out of tracking
curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out
```
### Using Python
```python
import requests
# Get newsletter metrics
response = requests.get('http://localhost:5001/api/analytics/newsletter/2024-01-15')
metrics = response.json()
print(f"Open rate: {metrics['open_rate']}%")
# Delete subscriber data
response = requests.delete('http://localhost:5001/api/tracking/subscriber/user@example.com')
result = response.json()
print(result['message'])
```
## Error Responses
All endpoints return standard error responses:
```json
{
"success": false,
"error": "Error message here"
}
```
HTTP Status Codes:
- `200` - Success
- `404` - Not found
- `500` - Server error

131
docs/ARCHITECTURE.md Normal file
View File

@@ -0,0 +1,131 @@
# System Architecture
## Overview
Munich News Daily is a fully automated news aggregation and newsletter system with the following components:
```
┌─────────────────────────────────────────────────────────────────┐
│ Munich News Daily System │
└─────────────────────────────────────────────────────────────────┘
6:00 AM Berlin → News Crawler
Fetches RSS feeds
Extracts full content
Generates AI summaries
Saves to MongoDB
7:00 AM Berlin → Newsletter Sender
Waits for crawler
Fetches articles
Generates newsletter
Sends to subscribers
✅ Done!
```
## Components
### 1. MongoDB Database
- **Purpose**: Central data storage
- **Collections**:
- `articles`: News articles with summaries
- `subscribers`: Email subscribers
- `rss_feeds`: RSS feed sources
- `newsletter_sends`: Email tracking data
- `link_clicks`: Link click tracking
- `subscriber_activity`: Engagement metrics
### 2. News Crawler
- **Schedule**: Daily at 6:00 AM Berlin time
- **Functions**:
- Fetches articles from RSS feeds
- Extracts full article content
- Generates AI summaries using Ollama
- Saves to MongoDB
- **Technology**: Python, BeautifulSoup, Ollama
### 3. Newsletter Sender
- **Schedule**: Daily at 7:00 AM Berlin time
- **Functions**:
- Waits for crawler to finish (max 30 min)
- Fetches today's articles
- Generates HTML newsletter
- Injects tracking pixels
- Sends to all subscribers
- **Technology**: Python, Jinja2, SMTP
### 4. Backend API (Optional)
- **Purpose**: Tracking and analytics
- **Endpoints**:
- `/api/track/pixel/<id>` - Email open tracking
- `/api/track/click/<id>` - Link click tracking
- `/api/analytics/*` - Engagement metrics
- `/api/tracking/*` - Privacy controls
- **Technology**: Flask, Python
## Data Flow
```
RSS Feeds → Crawler → MongoDB → Sender → Subscribers
                          ↓
                     Backend API
                          ↓
                      Analytics
```
## Coordination
The sender waits for the crawler to ensure fresh content:
1. Sender starts at 7:00 AM
2. Checks for recent articles every 30 seconds
3. Maximum wait time: 30 minutes
4. Proceeds once the crawler finishes or the timeout is reached (see the sketch below)
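A minimal sketch of that wait loop, using assumed helper and parameter names rather than the shipped sender code:
```python
import time
from datetime import datetime, timedelta

def wait_for_crawler(articles_collection, max_wait_minutes=30, poll_seconds=30) -> bool:
    """Poll MongoDB until today's articles appear, or give up after the timeout."""
    deadline = datetime.utcnow() + timedelta(minutes=max_wait_minutes)
    today_start = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
    while datetime.utcnow() < deadline:
        if articles_collection.count_documents({'crawled_at': {'$gte': today_start}}) > 0:
            return True           # fresh articles found, proceed with the send
        time.sleep(poll_seconds)  # check again in 30 seconds
    return False                  # timeout reached; caller decides whether to send anyway
```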
## Technology Stack
- **Backend**: Python 3.11
- **Database**: MongoDB 7.0
- **AI**: Ollama (Phi3 model)
- **Scheduling**: Python schedule library
- **Email**: SMTP with HTML templates
- **Tracking**: Pixel tracking + redirect URLs
- **Infrastructure**: Docker & Docker Compose
## Deployment
All components run in Docker containers:
```
docker-compose up -d
```
Containers:
- `munich-news-mongodb` - Database
- `munich-news-crawler` - Crawler service
- `munich-news-sender` - Sender service
## Security
- MongoDB authentication enabled
- Environment variables for secrets
- HTTPS for tracking URLs (production)
- GDPR-compliant data retention
- Privacy controls (opt-out, deletion)
## Monitoring
- Docker logs for all services
- MongoDB for data verification
- Health checks on containers
- Engagement metrics via API
## Scalability
- Horizontal: Add more crawler instances
- Vertical: Increase container resources
- Database: MongoDB sharding if needed
- Caching: Redis for API responses (future)

View File

@@ -17,13 +17,17 @@ backend/
│   ├── subscription_routes.py   # /api/subscribe, /api/unsubscribe
│   ├── news_routes.py           # /api/news, /api/stats
│   ├── rss_routes.py            # /api/rss-feeds (CRUD operations)
│   ├── ollama_routes.py         # /api/ollama/* (AI features)
│   ├── tracking_routes.py       # /api/track/* (email tracking)
│   └── analytics_routes.py      # /api/analytics/* (engagement metrics)
└── services/                    # Business logic layer
    ├── __init__.py
    ├── news_service.py          # News fetching and storage logic
    ├── email_service.py         # Newsletter email sending
    ├── ollama_service.py        # Ollama AI integration
    ├── tracking_service.py      # Email tracking (opens/clicks)
    └── analytics_service.py     # Engagement analytics
```
## Key Components
@@ -49,12 +53,16 @@ Each route file is a Flask Blueprint handling specific API endpoints:
- **news_routes.py**: News fetching and statistics
- **rss_routes.py**: RSS feed management (add/remove/list/toggle)
- **ollama_routes.py**: AI/Ollama integration endpoints
- **tracking_routes.py**: Email tracking (pixel, click redirects, data deletion)
- **analytics_routes.py**: Engagement analytics (open rates, click rates, subscriber activity)
### services/
Business logic separated from route handlers:
- **news_service.py**: Fetches news from RSS feeds, saves to database
- **email_service.py**: Sends newsletter emails to subscribers
- **ollama_service.py**: Communicates with Ollama AI server
- **tracking_service.py**: Email tracking logic (tracking IDs, pixel generation, click logging)
- **analytics_service.py**: Analytics calculations (open rates, click rates, activity classification)
## Benefits of This Structure

View File

@@ -78,6 +78,134 @@ Stores all newsletter subscribers.
}
```
### 3. Newsletter Sends Collection (`newsletter_sends`)
Tracks each newsletter sent to each subscriber for email open tracking.
**Document Structure:**
```javascript
{
_id: ObjectId, // Auto-generated MongoDB ID
newsletter_id: String, // Unique ID for this newsletter batch (date-based)
subscriber_email: String, // Recipient email
tracking_id: String, // Unique tracking ID for this send (UUID)
sent_at: DateTime, // When email was sent (UTC)
opened: Boolean, // Whether email was opened
first_opened_at: DateTime, // First open timestamp (null if not opened)
last_opened_at: DateTime, // Most recent open timestamp
open_count: Number, // Number of times opened
created_at: DateTime // Record creation time (UTC)
}
```
**Indexes:**
- `tracking_id` - Unique index for fast pixel request lookups
- `newsletter_id` - Index for analytics queries
- `subscriber_email` - Index for user activity queries
- `sent_at` - Index for time-based queries
**Example Document:**
```javascript
{
_id: ObjectId("507f1f77bcf86cd799439013"),
newsletter_id: "2024-01-15",
subscriber_email: "user@example.com",
tracking_id: "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
sent_at: ISODate("2024-01-15T08:00:00.000Z"),
opened: true,
first_opened_at: ISODate("2024-01-15T09:30:00.000Z"),
last_opened_at: ISODate("2024-01-15T14:20:00.000Z"),
open_count: 3,
created_at: ISODate("2024-01-15T08:00:00.000Z")
}
```
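The indexes above are not created automatically by MongoDB; a plausible PyMongo setup for them (connection string and collection handle assumed, not taken from the repo's init code) could look like this:
```python
from pymongo import MongoClient, ASCENDING

db = MongoClient('mongodb://localhost:27017/')['munich_news']
sends = db['newsletter_sends']

sends.create_index([('tracking_id', ASCENDING)], unique=True)  # fast pixel-request lookups
sends.create_index([('newsletter_id', ASCENDING)])             # per-newsletter analytics
sends.create_index([('subscriber_email', ASCENDING)])          # per-subscriber activity
sends.create_index([('sent_at', ASCENDING)])                   # time-based queries
```
The `link_clicks` and `subscriber_activity` collections below would get equivalent `create_index` calls for the fields listed in their sections.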
### 4. Link Clicks Collection (`link_clicks`)
Tracks individual link clicks from newsletters.
**Document Structure:**
```javascript
{
_id: ObjectId, // Auto-generated MongoDB ID
tracking_id: String, // Unique tracking ID for this link (UUID)
newsletter_id: String, // Which newsletter this link was in
subscriber_email: String, // Who clicked
article_url: String, // Original article URL
article_title: String, // Article title for reporting
clicked_at: DateTime, // When link was clicked (UTC)
user_agent: String, // Browser/client info
created_at: DateTime // Record creation time (UTC)
}
```
**Indexes:**
- `tracking_id` - Unique index for fast redirect request lookups
- `newsletter_id` - Index for analytics queries
- `article_url` - Index for article performance queries
- `subscriber_email` - Index for user activity queries
**Example Document:**
```javascript
{
_id: ObjectId("507f1f77bcf86cd799439014"),
tracking_id: "b2c3d4e5-f6a7-8901-bcde-f12345678901",
newsletter_id: "2024-01-15",
subscriber_email: "user@example.com",
article_url: "https://www.sueddeutsche.de/muenchen/ubahn-1.123456",
article_title: "New U-Bahn Line Opens in Munich",
clicked_at: ISODate("2024-01-15T09:35:00.000Z"),
user_agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
created_at: ISODate("2024-01-15T09:35:00.000Z")
}
```
### 5. Subscriber Activity Collection (`subscriber_activity`)
Aggregated activity status for each subscriber.
**Document Structure:**
```javascript
{
_id: ObjectId, // Auto-generated MongoDB ID
email: String, // Subscriber email (unique)
status: String, // 'active', 'inactive', or 'dormant'
last_opened_at: DateTime, // Most recent email open (UTC)
last_clicked_at: DateTime, // Most recent link click (UTC)
total_opens: Number, // Lifetime open count
total_clicks: Number, // Lifetime click count
newsletters_received: Number, // Total newsletters sent
newsletters_opened: Number, // Total newsletters opened
updated_at: DateTime // Last status update (UTC)
}
```
**Indexes:**
- `email` - Unique index for fast lookups
- `status` - Index for filtering by activity level
- `last_opened_at` - Index for time-based queries
**Activity Status Classification:**
- **active**: Opened an email in the last 30 days
- **inactive**: No opens in 30-60 days
- **dormant**: No opens in 60+ days
**Example Document:**
```javascript
{
_id: ObjectId("507f1f77bcf86cd799439015"),
email: "user@example.com",
status: "active",
last_opened_at: ISODate("2024-01-15T09:30:00.000Z"),
last_clicked_at: ISODate("2024-01-15T09:35:00.000Z"),
total_opens: 45,
total_clicks: 23,
newsletters_received: 60,
newsletters_opened: 45,
updated_at: ISODate("2024-01-15T10:00:00.000Z")
}
```
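As an illustration of how this collection can be queried for reporting, here is a hypothetical status breakdown (connection details assumed, not from the repo):
```python
from pymongo import MongoClient

db = MongoClient('mongodb://localhost:27017/')['munich_news']
pipeline = [
    {'$group': {'_id': '$status', 'count': {'$sum': 1}}},  # count subscribers per status
    {'$sort': {'count': -1}},
]
for row in db['subscriber_activity'].aggregate(pipeline):
    print(f"{row['_id']}: {row['count']} subscribers")
```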
## Design Decisions
### Why MongoDB?

274
docs/DEPLOYMENT.md Normal file
View File

@@ -0,0 +1,274 @@
# Deployment Guide
## Quick Start
```bash
# 1. Clone repository
git clone <repository-url>
cd munich-news
# 2. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your settings
# 3. Start system
docker-compose up -d
# 4. View logs
docker-compose logs -f
```
## Environment Configuration
### Required Settings
Edit `backend/.env`:
```env
# Email (Required)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# MongoDB (Optional - defaults provided)
MONGODB_URI=mongodb://localhost:27017/
# Tracking (Optional)
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
```
### Optional Settings
```env
# Newsletter
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24
# Ollama AI
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest
# Tracking
TRACKING_DATA_RETENTION_DAYS=90
```
## Production Deployment
### 1. Set MongoDB Password
```bash
export MONGO_PASSWORD=your-secure-password
docker-compose up -d
```
### 2. Use HTTPS for Tracking
Update `backend/.env`:
```env
TRACKING_API_URL=https://yourdomain.com
```
### 3. Configure Log Rotation
Add to `docker-compose.yml`:
```yaml
services:
crawler:
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
```
### 4. Set Up Backups
```bash
# Daily MongoDB backup
0 3 * * * docker exec munich-news-mongodb mongodump --out=/data/backup/$(date +\%Y\%m\%d)
```
### 5. Enable Backend API
Uncomment backend service in `docker-compose.yml`:
```yaml
backend:
build:
context: ./backend
ports:
- "5001:5001"
# ... rest of config
```
## Schedule Configuration
### Change Crawler Time
Edit `news_crawler/scheduled_crawler.py`:
```python
schedule.every().day.at("06:00").do(run_crawler) # Change time
```
### Change Sender Time
Edit `news_sender/scheduled_sender.py`:
```python
schedule.every().day.at("07:00").do(run_sender) # Change time
```
Rebuild after changes:
```bash
docker-compose up -d --build
```
## Database Setup
### Add RSS Feeds
```bash
mongosh munich_news
db.rss_feeds.insertMany([
{
name: "Süddeutsche Zeitung München",
url: "https://www.sueddeutsche.de/muenchen/rss",
active: true
},
{
name: "Merkur München",
url: "https://www.merkur.de/lokales/muenchen/rss/feed.rss",
active: true
}
])
```
### Add Subscribers
```bash
mongosh munich_news
db.subscribers.insertMany([
{
email: "user1@example.com",
active: true,
tracking_enabled: true,
subscribed_at: new Date()
},
{
email: "user2@example.com",
active: true,
tracking_enabled: true,
subscribed_at: new Date()
}
])
```
## Monitoring
### Check Container Status
```bash
docker-compose ps
```
### View Logs
```bash
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f crawler
docker-compose logs -f sender
```
### Check Database
```bash
mongosh munich_news
// Count articles
db.articles.countDocuments()
// Count subscribers
db.subscribers.countDocuments({ active: true })
// View recent articles
db.articles.find().sort({ crawled_at: -1 }).limit(5)
```
## Troubleshooting
### Containers Won't Start
```bash
# Check logs
docker-compose logs
# Rebuild
docker-compose up -d --build
# Reset everything
docker-compose down -v
docker-compose up -d
```
### Crawler Not Finding Articles
```bash
# Check RSS feeds
mongosh munich_news --eval "db.rss_feeds.find({ active: true })"
# Test manually
docker-compose exec crawler python crawler_service.py 5
```
### Newsletter Not Sending
```bash
# Test email
docker-compose exec sender python sender_service.py test your-email@example.com
# Check SMTP config
docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"
```
## Maintenance
### Update System
```bash
git pull
docker-compose up -d --build
```
### Backup Database
```bash
docker exec munich-news-mongodb mongodump --out=/data/backup
```
### Clean Old Data
```bash
mongosh munich_news
// Delete articles older than 90 days
db.articles.deleteMany({
crawled_at: { $lt: new Date(Date.now() - 90*24*60*60*1000) }
})
```
## Security Checklist
- [ ] Set strong MongoDB password
- [ ] Use HTTPS for tracking URLs
- [ ] Secure SMTP credentials
- [ ] Enable firewall rules
- [ ] Set up log rotation
- [ ] Configure backups
- [ ] Monitor for failures
- [ ] Keep dependencies updated

View File

@@ -84,6 +84,33 @@ curl http://localhost:5001/api/ollama/ping
curl http://localhost:5001/api/ollama/models
```
### Email Tracking & Analytics
**Get newsletter metrics:**
```bash
curl http://localhost:5001/api/analytics/newsletter/<newsletter_id>
```
**Get article performance:**
```bash
curl http://localhost:5001/api/analytics/article/<article_url>
```
**Get subscriber activity:**
```bash
curl http://localhost:5001/api/analytics/subscriber/<email>
```
**Delete subscriber tracking data:**
```bash
curl -X DELETE http://localhost:5001/api/tracking/subscriber/<email>
```
**Anonymize old tracking data:**
```bash
curl -X POST http://localhost:5001/api/tracking/anonymize
```
### Database
**Connect to MongoDB:**
@@ -110,6 +137,13 @@ db.subscribers.countDocuments({status: "active"})
db.rss_feeds.find()
```
**Check tracking data:**
```javascript
db.newsletter_sends.find().limit(5)
db.link_clicks.find().limit(5)
db.subscriber_activity.find()
```
## File Locations
### Configuration
@@ -186,6 +220,9 @@ EMAIL_PASSWORD=your-app-password
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_ENABLED=true
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90
```
## Development Workflow

412
docs/SYSTEM_ARCHITECTURE.md Normal file
View File

@@ -0,0 +1,412 @@
# Munich News Daily - System Architecture
## 📊 Complete System Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ Munich News Daily System │
│ Fully Automated Pipeline │
└─────────────────────────────────────────────────────────────────┘
Daily Schedule
┌──────────────────────┐
│ 6:00 AM Berlin │
│ News Crawler │
└──────────┬───────────┘
┌──────────────────────────────────────────────────────────────────┐
│ News Crawler │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐│
│ │ Fetch RSS │→ │ Extract │→ │ Summarize │→ │ Save to ││
│ │ Feeds │ │ Content │ │ with AI │ │ MongoDB ││
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘│
│ │
│ Sources: Süddeutsche, Merkur, BR24, etc. │
│ Output: Full articles + AI summaries │
└──────────────────────────────────────────────────────────────────┘
│ Articles saved
┌──────────────────────┐
│ MongoDB │
│ (Data Storage) │
└──────────┬───────────┘
│ Wait for crawler
┌──────────────────────┐
│ 7:00 AM Berlin │
│ Newsletter Sender │
└──────────┬───────────┘
┌──────────────────────────────────────────────────────────────────┐
│ Newsletter Sender │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐│
│ │ Wait for │→ │ Fetch │→ │ Generate │→ │ Send to ││
│ │ Crawler │ │ Articles │ │ Newsletter │ │ Subscribers││
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘│
│ │
│ Features: Tracking pixels, link tracking, HTML templates │
│ Output: Personalized newsletters with engagement tracking │
└──────────────────────────────────────────────────────────────────┘
│ Emails sent
┌──────────────────────┐
│ Subscribers │
│ (Email Inboxes) │
└──────────┬───────────┘
│ Opens & clicks
┌──────────────────────┐
│ Tracking System │
│ (Analytics API) │
└──────────────────────┘
```
## 🔄 Data Flow
### 1. Content Acquisition (6:00 AM)
```
RSS Feeds → Crawler → Full Content → AI Summary → MongoDB
```
**Details** (see the sketch after this list):
- Fetches from multiple RSS sources
- Extracts full article text
- Generates concise summaries using Ollama
- Stores with metadata (author, date, source)
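Condensed sketch of the acquisition step (structure assumed for illustration; the real logic lives in `crawler_service.py`):
```python
import feedparser
import requests
from bs4 import BeautifulSoup

def crawl_feed(feed_url, feed_name, articles_collection, summarize):
    """Fetch one RSS feed, extract each article's text, summarize and store it."""
    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        html = requests.get(entry.link, timeout=10,
                            headers={'User-Agent': 'Mozilla/5.0 (Munich News Crawler)'}).text
        soup = BeautifulSoup(html, 'html.parser')
        # Try the main article container first, then fall back to the page body
        body = soup.find('article') or soup.find('main') or soup.body
        text = ' '.join(p.get_text(' ', strip=True) for p in body.find_all('p'))
        articles_collection.update_one(
            {'link': entry.link},
            {'$set': {
                'title': entry.title,
                'content': text,
                'summary': summarize(text),   # e.g. a call into the Ollama client
                'source': feed_name,
                'word_count': len(text.split()),
            }},
            upsert=True,
        )
```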
### 2. Newsletter Generation (7:00 AM)
```
MongoDB → Articles → Template → HTML → Email
```
**Details**:
- Waits for crawler to finish (max 30 min)
- Fetches today's articles with summaries
- Applies Jinja2 template
- Injects tracking pixels
- Replaces links with tracking URLs
### 3. Engagement Tracking (Ongoing)
```
Email Open → Pixel Load → Log Event → Analytics
Link Click → Redirect → Log Event → Analytics
```
**Details** (see the sketch after this list):
- Tracks email opens via 1x1 pixel
- Tracks link clicks via redirect URLs
- Stores engagement data in MongoDB
- Provides analytics API
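Roughly, the click half of that flow looks like the following Flask handler; this is a sketch assumed to mirror `/api/track/click/<tracking_id>`, not copied from `tracking_routes.py`, and the fallback URL is a placeholder:
```python
from datetime import datetime
from flask import Flask, redirect, request
from database import link_clicks_collection  # the repo's shared collection handle

app = Flask(__name__)

@app.route('/api/track/click/<tracking_id>')
def track_click(tracking_id):
    # Log the click, then send the reader on to the original article
    doc = link_clicks_collection.find_one_and_update(
        {'tracking_id': tracking_id},
        {'$set': {'clicked': True,
                  'clicked_at': datetime.utcnow(),
                  'user_agent': request.headers.get('User-Agent')}},
    )
    return redirect(doc['article_url'] if doc else 'https://example.com/', code=302)  # placeholder fallback
```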
## 🏗️ Component Architecture
### Docker Containers
```
┌─────────────────────────────────────────────────────────┐
│ Docker Network │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ MongoDB │ │ Crawler │ │ Sender │ │
│ │ │ │ │ │ │ │
│ │ Port: 27017 │←─│ Schedule: │←─│ Schedule: │ │
│ │ │ │ 6:00 AM │ │ 7:00 AM │ │
│ │ Storage: │ │ │ │ │ │
│ │ - articles │ │ Depends on: │ │ Depends on: │ │
│ │ - subscribers│ │ - MongoDB │ │ - MongoDB │ │
│ │ - tracking │ │ │ │ - Crawler │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ All containers auto-restart on failure │
│ All use Europe/Berlin timezone │
└─────────────────────────────────────────────────────────┘
```
### Backend Services
```
┌─────────────────────────────────────────────────────────┐
│ Backend Services │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Flask API (Port 5001) │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Tracking │ │ Analytics │ │ Privacy │ │ │
│ │ │ Endpoints │ │ Endpoints │ │ Endpoints │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Services Layer │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Tracking │ │ Analytics │ │ Ollama │ │ │
│ │ │ Service │ │ Service │ │ Client │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
```
## 📅 Daily Timeline
```
Time (Berlin) │ Event │ Duration
───────────────┼──────────────────────────┼──────────
05:59:59 │ System idle │ -
06:00:00 │ Crawler starts │ ~10-20 min
06:00:01 │ - Fetch RSS feeds │
06:02:00 │ - Extract content │
06:05:00 │ - Generate summaries │
06:15:00 │ - Save to MongoDB │
06:20:00 │ Crawler finishes │
06:20:01 │ System idle │ ~40 min
07:00:00 │ Sender starts │ ~5-10 min
07:00:01 │ - Wait for crawler │ (checks every 30s)
07:00:30 │ - Crawler confirmed done │
07:00:31 │ - Fetch articles │
07:01:00 │ - Generate newsletters │
07:02:00 │ - Send to subscribers │
07:10:00 │ Sender finishes │
07:10:01 │ System idle │ Until tomorrow
```
## 🔐 Security & Privacy
### Data Protection
```
┌─────────────────────────────────────────────────────────┐
│ Privacy Features │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Data Retention │ │
│ │ - Personal data: 90 days │ │
│ │ - Anonymization: Automatic │ │
│ │ - Deletion: On request │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ User Rights │ │
│ │ - Opt-out: Anytime │ │
│ │ - Data access: API available │ │
│ │ - Data deletion: Full removal │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Compliance │ │
│ │ - GDPR compliant │ │
│ │ - Privacy notice in emails │ │
│ │ - Transparent tracking │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
```
## 📊 Database Schema
### Collections
```
MongoDB (munich_news)
├── articles
│ ├── title
│ ├── author
│ ├── content (full text)
│ ├── summary (AI generated)
│ ├── link
│ ├── source
│ ├── published_at
│ └── crawled_at
├── subscribers
│ ├── email
│ ├── active
│ ├── tracking_enabled
│ └── subscribed_at
├── rss_feeds
│ ├── name
│ ├── url
│ └── active
├── newsletter_sends
│ ├── tracking_id
│ ├── newsletter_id
│ ├── subscriber_email
│ ├── opened
│ ├── first_opened_at
│ └── open_count
├── link_clicks
│ ├── tracking_id
│ ├── newsletter_id
│ ├── subscriber_email
│ ├── article_url
│ ├── clicked
│ └── clicked_at
└── subscriber_activity
├── email
├── status (active/inactive/dormant)
├── last_opened_at
├── last_clicked_at
├── total_opens
└── total_clicks
```
## 🚀 Deployment Architecture
### Development
```
Local Machine
├── Docker Compose
│ ├── MongoDB (no auth)
│ ├── Crawler
│ └── Sender
├── Backend (manual start)
│ └── Flask API
└── Ollama (optional)
└── AI Summarization
```
### Production
```
Server
├── Docker Compose (prod)
│ ├── MongoDB (with auth)
│ ├── Crawler
│ └── Sender
├── Backend (systemd/pm2)
│ └── Flask API (HTTPS)
├── Ollama (optional)
│ └── AI Summarization
└── Nginx (reverse proxy)
└── SSL/TLS
```
## 🔄 Coordination Mechanism
### Crawler-Sender Synchronization
```
┌─────────────────────────────────────────────────────────┐
│ Coordination Flow │
│ │
│ 6:00 AM → Crawler starts │
│ ↓ │
│ Crawling articles... │
│ ↓ │
│ Saves to MongoDB │
│ ↓ │
│ 6:20 AM → Crawler finishes │
│ ↓ │
│ 7:00 AM → Sender starts │
│ ↓ │
│ Check: Recent articles? ──→ No ──┐ │
│ ↓ Yes │ │
│ Proceed with send │ │
│ │ │
│ ← Wait 30s ← Wait 30s ← Wait 30s┘ │
│ (max 30 minutes) │
│ │
│ 7:10 AM → Newsletter sent │
└─────────────────────────────────────────────────────────┘
```
## 📈 Monitoring & Observability
### Key Metrics
```
┌─────────────────────────────────────────────────────────┐
│ Metrics to Monitor │
│ │
│ Crawler: │
│ - Articles crawled per day │
│ - Crawl duration │
│ - Success/failure rate │
│ - Summary generation rate │
│ │
│ Sender: │
│ - Newsletters sent per day │
│ - Send duration │
│ - Success/failure rate │
│ - Wait time for crawler │
│ │
│ Engagement: │
│ - Open rate │
│ - Click-through rate │
│ - Active subscribers │
│ - Dormant subscribers │
│ │
│ System: │
│ - Container uptime │
│ - Database size │
│ - Error rate │
│ - Response times │
└─────────────────────────────────────────────────────────┘
```
## 🛠️ Maintenance Tasks
### Daily
- Check logs for errors
- Verify newsletters sent
- Monitor engagement metrics
### Weekly
- Review article quality
- Check subscriber growth
- Analyze engagement trends
### Monthly
- Archive old articles
- Clean up dormant subscribers
- Update dependencies
- Review system performance
## 📚 Technology Stack
```
┌─────────────────────────────────────────────────────────┐
│ Technology Stack │
│ │
│ Backend: │
│ - Python 3.11 │
│ - Flask (API) │
│ - PyMongo (Database) │
│ - Schedule (Automation) │
│ - Jinja2 (Templates) │
│ - BeautifulSoup (Parsing) │
│ │
│ Database: │
│ - MongoDB 7.0 │
│ │
│ AI/ML: │
│ - Ollama (Summarization) │
│ - Phi3 Model (default) │
│ │
│ Infrastructure: │
│ - Docker & Docker Compose │
│ - Linux (Ubuntu/Debian) │
│ │
│ Email: │
│ - SMTP (configurable) │
│ - HTML emails with tracking │
└─────────────────────────────────────────────────────────┘
```
---
**Last Updated**: 2024-01-16
**Version**: 1.0
**Status**: Production Ready ✅

View File

@@ -1,191 +0,0 @@
# Recent Changes - Full Content Storage
## ✅ What Changed
### 1. Removed Content Length Limit
**Before:**
```python
'content': content_text[:10000] # Limited to 10k chars
```
**After:**
```python
'content': content_text # Full content, no limit
```
### 2. Simplified Database Schema
**Before:**
```javascript
{
summary: String, // Short summary
full_content: String // Limited content
}
```
**After:**
```javascript
{
content: String // Full article content, no limit
}
```
### 3. Enhanced API Response
**Before:**
```javascript
{
title: "...",
link: "...",
summary: "..."
}
```
**After:**
```javascript
{
title: "...",
author: "...", // NEW!
link: "...",
preview: "...", // First 200 chars
word_count: 1250, // NEW!
has_full_content: true // NEW!
}
```
## 📊 Database Structure
### Articles Collection
```javascript
{
_id: ObjectId,
title: String, // Article title
author: String, // Article author (extracted)
link: String, // Article URL (unique)
content: String, // FULL article content (no limit)
word_count: Number, // Word count
source: String, // RSS feed name
published_at: String, // Publication date
crawled_at: DateTime, // When crawled
created_at: DateTime // When added
}
```
## 🆕 New API Endpoint
### GET /api/news/<article_url>
Get full article content by URL.
**Example:**
```bash
# URL encode the article URL
curl "http://localhost:5001/api/news/https%3A%2F%2Fexample.com%2Farticle"
```
**Response:**
```json
{
"title": "New U-Bahn Line Opens in Munich",
"author": "Max Mustermann",
"link": "https://example.com/article",
"content": "The full article text here... (complete, no truncation)",
"word_count": 1250,
"source": "Süddeutsche Zeitung München",
"published_at": "2024-11-10T10:00:00Z",
"crawled_at": "2024-11-10T16:30:00Z",
"created_at": "2024-11-10T16:00:00Z"
}
```
## 📈 Enhanced Stats
### GET /api/stats
Now includes crawled article count:
```json
{
"subscribers": 150,
"articles": 500,
"crawled_articles": 350 // NEW!
}
```
## 🎯 Benefits
1. **Complete Content** - No truncation, full articles stored
2. **Better for AI** - Full context for summarization/analysis
3. **Cleaner Schema** - Single `content` field instead of `summary` + `full_content`
4. **More Metadata** - Author, word count, crawl timestamp
5. **Better API** - Preview in list, full content on demand
## 🔄 Migration
If you have existing articles with `full_content` field, they will continue to work. New articles will use the `content` field.
To migrate old articles:
```javascript
// MongoDB shell
db.articles.updateMany(
{ full_content: { $exists: true } },
[
{
$set: {
content: "$full_content"
}
},
{
$unset: ["full_content", "summary"]
}
]
)
```
## 🚀 Usage
### Crawl Articles
```bash
cd news_crawler
python crawler_service.py 10
```
### Get Article List (with previews)
```bash
curl http://localhost:5001/api/news
```
### Get Full Article Content
```bash
# Get the article URL from the list, then:
curl "http://localhost:5001/api/news/<encoded_url>"
```
### Check Stats
```bash
curl http://localhost:5001/api/stats
```
## 📝 Example Workflow
1. **Add RSS Feed**
```bash
curl -X POST http://localhost:5001/api/rss-feeds \
-H "Content-Type: application/json" \
-d '{"name": "News Source", "url": "https://example.com/rss"}'
```
2. **Crawl Articles**
```bash
cd news_crawler
python crawler_service.py 10
```
3. **View Articles**
```bash
curl http://localhost:5001/api/news
```
4. **Get Full Content**
```bash
# Copy article link from above, URL encode it
curl "http://localhost:5001/api/news/https%3A%2F%2Fexample.com%2Farticle"
```
Now you have complete article content ready for AI processing! 🎉

View File

@@ -6,8 +6,20 @@ WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy crawler files
COPY . .
# Copy backend config files (needed for Config class)
COPY ../backend/config.py /app/config.py
COPY ../backend/ollama_client.py /app/ollama_client.py
COPY ../backend/.env /app/.env
# Make the scheduler executable
RUN chmod +x scheduled_crawler.py
# Set timezone to Berlin
ENV TZ=Europe/Berlin
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
# Run the scheduled crawler
CMD ["python", "-u", "scheduled_crawler.py"]

View File

@@ -1,127 +0,0 @@
# News Crawler - Quick Start
## 1. Install Dependencies
```bash
cd news_crawler
pip install -r requirements.txt
```
## 2. Configure Environment
Make sure MongoDB is running and accessible. The crawler will use the same database as the backend.
Default connection: `mongodb://localhost:27017/`
To use a different MongoDB URI, create a `.env` file:
```env
MONGODB_URI=mongodb://localhost:27017/
```
## 3. Run the Crawler
```bash
# Crawl up to 10 articles per feed
python crawler_service.py
# Crawl up to 20 articles per feed
python crawler_service.py 20
```
## 4. Verify Results
Check your MongoDB database:
```bash
# Using mongosh
mongosh
use munich_news
db.articles.find({full_content: {$exists: true}}).count()
db.articles.findOne({full_content: {$exists: true}})
```
## 5. Schedule Regular Crawling
### Option A: Cron (Linux/Mac)
```bash
# Edit crontab
crontab -e
# Add this line to run every 6 hours
0 */6 * * * cd /path/to/news_crawler && /path/to/venv/bin/python crawler_service.py
```
### Option B: Docker
```bash
# Build and run
docker-compose up
# Or run as a one-off
docker-compose run --rm crawler
```
### Option C: Manual
Just run the script whenever you want to fetch new articles:
```bash
python crawler_service.py
```
## What Gets Crawled?
The crawler:
1. Fetches all active RSS feeds from the database
2. For each feed, gets the latest articles
3. Crawls the full content from each article URL
4. Saves: title, full_content, word_count, crawled_at
5. Skips articles that already have content
## Output Example
```
============================================================
🚀 Starting RSS Feed Crawler
============================================================
Found 3 active feed(s)
📰 Crawling feed: Süddeutsche Zeitung München
URL: https://www.sueddeutsche.de/muenchen/rss
🔍 Crawling: New U-Bahn Line Opens in Munich...
✓ Saved (1250 words)
🔍 Crawling: Munich Weather Update...
✓ Saved (450 words)
✓ Crawled 2 articles from Süddeutsche Zeitung München
============================================================
✓ Crawling Complete!
Total feeds processed: 3
Total articles crawled: 15
Duration: 45.23 seconds
============================================================
```
## Troubleshooting
**No feeds found:**
- Make sure you've added RSS feeds via the backend API
- Check MongoDB connection
**Can't extract content:**
- Some sites block scrapers
- Some sites require JavaScript (not supported yet)
- Check if the URL is accessible
**Timeout errors:**
- Increase timeout in the code
- Check your internet connection
## Next Steps
Once articles are crawled, you can:
- View them in the frontend
- Use Ollama to summarize them
- Generate newsletters with full content
- Perform text analysis

View File

@@ -1,225 +0,0 @@
# News Crawler Microservice
A standalone microservice that crawls full article content from RSS feeds and stores it in MongoDB.
## Features
- 🔍 Extracts full article content from RSS feed links
- 📊 Calculates word count
- 🔄 Avoids re-crawling already processed articles
- ⏱️ Rate limiting (1 second delay between requests)
- 🎯 Smart content extraction using multiple selectors
- 🧹 Cleans up scripts, styles, and navigation elements
## Installation
1. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Configure environment variables:
Create a `.env` file in the project root (or use the backend's `.env`):
```env
MONGODB_URI=mongodb://localhost:27017/
```
## Usage
### Standalone Execution
Run the crawler directly:
```bash
# Crawl up to 10 articles per feed (default)
python crawler_service.py
# Crawl up to 20 articles per feed
python crawler_service.py 20
```
### As a Module
```python
from crawler_service import crawl_all_feeds, crawl_rss_feed
# Crawl all active feeds
result = crawl_all_feeds(max_articles_per_feed=10)
print(result)
# Crawl a specific feed
crawl_rss_feed(
feed_url='https://example.com/rss',
feed_name='Example News',
max_articles=10
)
```
### Via Backend API
The backend has integrated endpoints:
```bash
# Start crawler
curl -X POST http://localhost:5001/api/crawler/start
# Check status
curl http://localhost:5001/api/crawler/status
# Crawl specific feed
curl -X POST http://localhost:5001/api/crawler/feed/<feed_id>
```
## How It Works
1. **Fetch RSS Feeds**: Gets all active RSS feeds from MongoDB
2. **Parse Feed**: Extracts article links from each feed
3. **Crawl Content**: For each article:
- Fetches HTML page
- Removes scripts, styles, navigation
- Extracts main content using smart selectors
- Calculates word count
4. **Store Data**: Saves to MongoDB with metadata
5. **Skip Duplicates**: Avoids re-crawling articles with existing content
## Content Extraction Strategy
The crawler tries multiple selectors in order:
1. `<article>` tag
2. Elements with class containing "article-content", "article-body"
3. Elements with class containing "post-content", "entry-content"
4. `<main>` tag
5. Fallback to all `<p>` tags in body
## Database Schema
Articles are stored with these fields:
```javascript
{
title: String, // Article title
link: String, // Article URL (unique)
summary: String, // Short summary
full_content: String, // Full article text (max 10,000 chars)
word_count: Number, // Number of words
source: String, // RSS feed name
published_at: String, // Publication date
crawled_at: DateTime, // When content was crawled
created_at: DateTime // When added to database
}
```
## Scheduling
### Using Cron (Linux/Mac)
```bash
# Run every 6 hours
0 */6 * * * cd /path/to/news_crawler && /path/to/venv/bin/python crawler_service.py
```
### Using systemd Timer (Linux)
Create `/etc/systemd/system/news-crawler.service`:
```ini
[Unit]
Description=News Crawler Service
[Service]
Type=oneshot
WorkingDirectory=/path/to/news_crawler
ExecStart=/path/to/venv/bin/python crawler_service.py
User=your-user
```
Create `/etc/systemd/system/news-crawler.timer`:
```ini
[Unit]
Description=Run News Crawler every 6 hours
[Timer]
OnBootSec=5min
OnUnitActiveSec=6h
[Install]
WantedBy=timers.target
```
Enable and start:
```bash
sudo systemctl enable news-crawler.timer
sudo systemctl start news-crawler.timer
```
### Using Docker
Create `Dockerfile`:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY crawler_service.py .
CMD ["python", "crawler_service.py"]
```
Build and run:
```bash
docker build -t news-crawler .
docker run --env-file ../.env news-crawler
```
## Configuration
Environment variables:
- `MONGODB_URI` - MongoDB connection string (default: `mongodb://localhost:27017/`)
## Rate Limiting
- 1 second delay between article requests
- Respects server resources
- User-Agent header included
## Troubleshooting
**Issue: Can't extract content**
- Some sites block scrapers
- Try adjusting User-Agent header
- Some sites require JavaScript (consider Selenium)
**Issue: Timeout errors**
- Increase timeout in `extract_article_content()`
- Check network connectivity
**Issue: Memory usage**
- Reduce `max_articles_per_feed`
- Content limited to 10,000 characters per article
## Architecture
This is a standalone microservice that:
- Can run independently of the main backend
- Shares the same MongoDB database
- Can be deployed separately
- Can be scheduled independently
## Next Steps
Once articles are crawled, you can:
- Use Ollama to summarize articles
- Perform sentiment analysis
- Extract keywords and topics
- Generate newsletter content
- Create article recommendations

View File

@@ -1,33 +0,0 @@
version: '3.8'
services:
crawler:
build: .
container_name: news-crawler
environment:
- MONGODB_URI=mongodb://mongodb:27017/
networks:
- munich-news-network
depends_on:
- mongodb
# Run once and exit
restart: "no"
mongodb:
image: mongo:7.0
container_name: munich-news-mongodb
restart: unless-stopped
ports:
- "27017:27017"
volumes:
- mongodb_data:/data/db
networks:
- munich-news-network
volumes:
mongodb_data:
driver: local
networks:
munich-news-network:
driver: bridge

View File

@@ -4,3 +4,5 @@ requests==2.31.0
feedparser==6.0.10
pymongo==4.6.1
python-dotenv==1.0.0
schedule==1.2.0
pytz==2023.3

View File

@@ -0,0 +1,75 @@
#!/usr/bin/env python3
"""
Scheduled crawler that runs daily at 6 AM Berlin time
"""
import schedule
import time
from datetime import datetime
import pytz
from crawler_service import crawl_all_feeds
# Berlin timezone
BERLIN_TZ = pytz.timezone('Europe/Berlin')
def run_crawler():
"""Run the crawler and log the execution"""
berlin_time = datetime.now(BERLIN_TZ)
print(f"\n{'='*60}")
print(f"🕐 Scheduled crawler started at {berlin_time.strftime('%Y-%m-%d %H:%M:%S %Z')}")
print(f"{'='*60}\n")
try:
# Run crawler with max 20 articles per feed
result = crawl_all_feeds(max_articles_per_feed=20)
print(f"\n{'='*60}")
print(f"✓ Scheduled crawler completed successfully")
print(f" Articles crawled: {result['total_articles_crawled']}")
print(f" Duration: {result['duration_seconds']}s")
print(f"{'='*60}\n")
except Exception as e:
print(f"\n{'='*60}")
print(f"✗ Scheduled crawler failed: {e}")
print(f"{'='*60}\n")
def main():
"""Main scheduler loop"""
print("🤖 Munich News Crawler Scheduler")
print("="*60)
print("Schedule: Daily at 6:00 AM Berlin time")
print("Timezone: Europe/Berlin (CET/CEST)")
print("="*60)
# Schedule the crawler to run at 6 AM Berlin time
schedule.every().day.at("06:00").do(run_crawler)
# Show next run time
berlin_time = datetime.now(BERLIN_TZ)
print(f"\nCurrent time (Berlin): {berlin_time.strftime('%Y-%m-%d %H:%M:%S %Z')}")
# Get next scheduled run
next_run = schedule.next_run()
if next_run:
# Convert to Berlin time for display
next_run_berlin = next_run.astimezone(BERLIN_TZ)
print(f"Next scheduled run: {next_run_berlin.strftime('%Y-%m-%d %H:%M:%S %Z')}")
print("\n⏳ Scheduler is running... (Press Ctrl+C to stop)\n")
# Run immediately on startup (optional - comment out if you don't want this)
print("🚀 Running initial crawl on startup...")
run_crawler()
# Keep the scheduler running
while True:
schedule.run_pending()
time.sleep(60) # Check every minute
if __name__ == '__main__':
try:
main()
except KeyboardInterrupt:
print("\n\n👋 Scheduler stopped by user")
except Exception as e:
print(f"\n\n✗ Scheduler error: {e}")

24
news_sender/Dockerfile Normal file
View File

@@ -0,0 +1,24 @@
FROM python:3.11-slim
WORKDIR /app
# NOTE: Docker COPY cannot reach outside the build context, so the paths below
# assume the build context is the repository root, e.g.:
#   docker build -f news_sender/Dockerfile .
# Install dependencies
COPY news_sender/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy sender files
COPY news_sender/ .
# Copy backend files (needed for tracking and config)
COPY backend/services /app/backend/services
COPY backend/.env /app/.env
# Make the scheduler executable
RUN chmod +x scheduled_sender.py
# Set timezone to Berlin
ENV TZ=Europe/Berlin
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
# Run the scheduled sender
CMD ["python", "-u", "scheduled_sender.py"]

View File

@@ -1,303 +0,0 @@
# News Sender Microservice
Standalone service for sending Munich News Daily newsletters to subscribers.
## Features
- 📧 Sends beautiful HTML newsletters
- 🤖 Uses AI-generated article summaries
- 📊 Tracks sending statistics
- 🧪 Test mode for development
- 📝 Preview generation
- 🔄 Fetches data from shared MongoDB
## Installation
```bash
cd news_sender
pip install -r requirements.txt
```
## Configuration
The service uses the same `.env` file as the backend (`../backend/.env`):
```env
# MongoDB
MONGODB_URI=mongodb://localhost:27017/
# Email (Gmail example)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Newsletter Settings (optional)
NEWSLETTER_MAX_ARTICLES=10
WEBSITE_URL=http://localhost:3000
```
**Gmail Setup:**
1. Enable 2-factor authentication
2. Generate an App Password: https://support.google.com/accounts/answer/185833
3. Use the App Password (not your regular password)
## Usage
### 1. Preview Newsletter
Generate HTML preview without sending:
```bash
python sender_service.py preview
```
This creates `newsletter_preview.html` - open it in your browser to see how the newsletter looks.
### 2. Send Test Email
Send to a single email address for testing:
```bash
python sender_service.py test your-email@example.com
```
### 3. Send to All Subscribers
Send newsletter to all active subscribers:
```bash
# Send with default article count (10)
python sender_service.py send
# Send with custom article count
python sender_service.py send 15
```
### 4. Use as Python Module
```python
from sender_service import send_newsletter, preview_newsletter
# Send newsletter
result = send_newsletter(max_articles=10)
print(f"Sent to {result['sent_count']} subscribers")
# Generate preview
html = preview_newsletter(max_articles=5)
```
## How It Works
```
┌─────────────────────────────────────────────────────────┐
│ 1. Fetch Articles from MongoDB │
│ - Get latest articles with AI summaries │
│ - Sort by creation date (newest first) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 2. Fetch Active Subscribers │
│ - Get all subscribers with status='active' │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 3. Render Newsletter HTML │
│ - Load newsletter_template.html │
│ - Populate with articles and metadata │
│ - Generate beautiful HTML email │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 4. Send Emails │
│ - Connect to SMTP server │
│ - Send to each subscriber │
│ - Track success/failure │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 5. Report Statistics │
│ - Total sent │
│ - Failed sends │
│ - Error details │
└─────────────────────────────────────────────────────────┘
```
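In code, the same flow condenses to roughly the following sketch, using the module's own helpers (per-recipient error handling and the final statistics step are omitted):
```python
from datetime import datetime
from sender_service import (get_latest_articles, get_active_subscribers,
                            render_newsletter_html, send_email)

articles = get_latest_articles(max_articles=10)      # 1. fetch articles with AI summaries
subscribers = get_active_subscribers()               # 2. fetch active subscribers
html = render_newsletter_html(articles)              # 3. render the HTML template
subject = f"Munich News Daily - {datetime.now().strftime('%B %d, %Y')}"
for email in subscribers:                            # 4. send to each subscriber
    success, error = send_email(email, subject, html)  # 5. per-send success/failure
```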
## Output Example
```
======================================================================
📧 Munich News Daily - Newsletter Sender
======================================================================
Fetching latest 10 articles with AI summaries...
✓ Found 10 articles
Fetching active subscribers...
✓ Found 150 active subscriber(s)
Rendering newsletter HTML...
✓ Newsletter rendered
Sending newsletter: 'Munich News Daily - November 10, 2024'
----------------------------------------------------------------------
[1/150] Sending to user1@example.com... ✓
[2/150] Sending to user2@example.com... ✓
[3/150] Sending to user3@example.com... ✓
...
======================================================================
📊 Sending Complete
======================================================================
✓ Successfully sent: 148
✗ Failed: 2
📰 Articles included: 10
======================================================================
```
## Scheduling
### Using Cron (Linux/Mac)
Send newsletter daily at 8 AM:
```bash
# Edit crontab
crontab -e
# Add this line
0 8 * * * cd /path/to/news_sender && /path/to/venv/bin/python sender_service.py send
```
### Using systemd Timer (Linux)
Create `/etc/systemd/system/news-sender.service`:
```ini
[Unit]
Description=Munich News Sender
[Service]
Type=oneshot
WorkingDirectory=/path/to/news_sender
ExecStart=/path/to/venv/bin/python sender_service.py send
User=your-user
```
Create `/etc/systemd/system/news-sender.timer`:
```ini
[Unit]
Description=Send Munich News Daily at 8 AM
[Timer]
OnCalendar=daily
OnCalendar=*-*-* 08:00:00
[Install]
WantedBy=timers.target
```
Enable and start:
```bash
sudo systemctl enable news-sender.timer
sudo systemctl start news-sender.timer
```
### Using Docker
Create `Dockerfile`:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY sender_service.py newsletter_template.html ./
CMD ["python", "sender_service.py", "send"]
```
Build and run:
```bash
docker build -t news-sender .
docker run --env-file ../backend/.env news-sender
```
## Troubleshooting
### "Email credentials not configured"
- Check that `EMAIL_USER` and `EMAIL_PASSWORD` are set in `.env`
- For Gmail, use an App Password, not your regular password
### "No articles with summaries found"
- Run the crawler first: `cd ../news_crawler && python crawler_service.py 10`
- Make sure Ollama is enabled and working
- Check that MongoDB has articles with a `summary` field (see the check below)
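A quick way to verify that last point directly against MongoDB (the database and collection names are assumptions; adjust to your setup):
```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017/")["munich_news"]  # assumed database name
with_summary = db["articles"].count_documents({"summary": {"$exists": True, "$ne": ""}})
print(f"Articles with AI summaries: {with_summary}")
```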
### "No active subscribers found"
- Add subscribers via the backend API
- Check subscriber status is 'active' in MongoDB
### SMTP Connection Errors
- Verify SMTP server and port are correct
- Check firewall isn't blocking SMTP port
- For Gmail, use an App Password ("Less secure app access" is no longer available)
### Emails Going to Spam
- Set up SPF, DKIM, and DMARC records for your domain
- Use a verified email address
- Avoid spam trigger words in subject/content
- Include unsubscribe link (already included in template)
## Architecture
This is a standalone microservice that:
- Runs independently of the backend
- Shares the same MongoDB database
- Can be deployed separately
- Can be scheduled independently
- Has no dependencies on backend code
## Integration with Other Services
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Backend │ │ Crawler │ │ Sender │
│ (Flask) │ │ (Scraper) │ │ (Email) │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
│ │ │
└────────────────────┴─────────────────────┘
┌───────▼────────┐
│ MongoDB │
│ (Shared DB) │
└────────────────┘
```
## Next Steps
1. **Test the newsletter:**
```bash
python sender_service.py test your-email@example.com
```
2. **Schedule daily sending:**
- Set up cron job or systemd timer
- Choose appropriate time (e.g., 8 AM)
3. **Monitor sending:**
- Check logs for errors
- Track open rates (requires email tracking service)
- Monitor spam complaints
4. **Optimize:**
- Add email tracking pixels
- A/B test subject lines
- Personalize content per subscriber

View File

@@ -146,6 +146,14 @@
<a href="{{ unsubscribe_link }}" style="color: #999999; text-decoration: none;">Unsubscribe</a>
</p>
{% if tracking_enabled %}
<!-- Privacy Notice -->
<p style="margin: 20px 0 0 0; font-size: 11px; color: #666666; line-height: 1.4;">
This email contains tracking to measure engagement and improve our content.<br>
We respect your privacy and anonymize data after 90 days.
</p>
{% endif %}
<p style="margin: 20px 0 0 0; font-size: 11px; color: #666666;">
© {{ year }} Munich News Daily. All rights reserved.
</p>

View File

@@ -1,3 +1,6 @@
pymongo==4.6.1
python-dotenv==1.0.0
Jinja2==3.1.2
beautifulsoup4==4.12.2
schedule==1.2.0
pytz==2023.3

178
news_sender/scheduled_sender.py Executable file
View File

@@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
Scheduled newsletter sender that runs daily at 7 AM Berlin time
Waits for crawler to finish before sending to ensure fresh content
"""
import schedule
import time
from datetime import datetime, timedelta
import pytz
from pathlib import Path
import sys
# Add current directory to path
sys.path.insert(0, str(Path(__file__).parent))
from sender_service import send_newsletter, get_latest_articles, Config
# Berlin timezone
BERLIN_TZ = pytz.timezone('Europe/Berlin')
# Maximum time to wait for crawler (in minutes)
MAX_WAIT_TIME = 30
def check_crawler_finished():
"""
Check if crawler has finished by looking for recent articles
Returns: (bool, str) - (is_finished, message)
"""
try:
# Check if we have articles from today
articles = get_latest_articles(max_articles=1, hours=2)
if articles:
# Check if the most recent article was crawled recently (within last 2 hours)
latest_article = articles[0]
crawled_at = latest_article.get('crawled_at')
if crawled_at:
time_since_crawl = datetime.utcnow() - crawled_at
minutes_since = time_since_crawl.total_seconds() / 60
if minutes_since < 120: # Within last 2 hours
return True, f"Crawler finished {int(minutes_since)} minutes ago"
return False, "No recent articles found"
except Exception as e:
return False, f"Error checking crawler status: {e}"
def wait_for_crawler(max_wait_minutes=30):
"""
Wait for crawler to finish before sending newsletter
Args:
max_wait_minutes: Maximum time to wait in minutes
Returns:
bool: True if crawler finished, False if timeout
"""
berlin_time = datetime.now(BERLIN_TZ)
print(f"\n⏳ Waiting for crawler to finish...")
print(f" Current time: {berlin_time.strftime('%H:%M:%S %Z')}")
print(f" Max wait time: {max_wait_minutes} minutes")
start_time = time.time()
check_interval = 30 # Check every 30 seconds
while True:
elapsed_minutes = (time.time() - start_time) / 60
# Check if crawler finished
is_finished, message = check_crawler_finished()
if is_finished:
print(f"{message}")
return True
# Check if we've exceeded max wait time
if elapsed_minutes >= max_wait_minutes:
print(f" ⚠ Timeout after {max_wait_minutes} minutes")
print(f" Proceeding with available articles...")
return False
# Show progress
remaining = max_wait_minutes - elapsed_minutes
print(f" ⏳ Still waiting... ({remaining:.1f} minutes remaining) - {message}")
# Wait before next check
time.sleep(check_interval)
def run_sender():
"""Run the newsletter sender with crawler coordination"""
berlin_time = datetime.now(BERLIN_TZ)
print(f"\n{'='*70}")
print(f"📧 Scheduled newsletter sender started")
print(f" Time: {berlin_time.strftime('%Y-%m-%d %H:%M:%S %Z')}")
print(f"{'='*70}\n")
try:
# Wait for crawler to finish (max 30 minutes)
crawler_finished = wait_for_crawler(max_wait_minutes=MAX_WAIT_TIME)
if not crawler_finished:
print(f"\n⚠ Crawler may still be running, but proceeding anyway...")
print(f"\n{'='*70}")
print(f"📧 Starting newsletter send...")
print(f"{'='*70}\n")
# Send newsletter to all subscribers
result = send_newsletter(max_articles=Config.MAX_ARTICLES)
if result['success']:
print(f"\n{'='*70}")
print(f"✅ Newsletter sent successfully!")
print(f" Sent: {result['sent_count']}/{result['total_subscribers']}")
print(f" Articles: {result['article_count']}")
print(f" Failed: {result['failed_count']}")
print(f"{'='*70}\n")
else:
print(f"\n{'='*70}")
print(f"❌ Newsletter send failed: {result.get('error', 'Unknown error')}")
print(f"{'='*70}\n")
except Exception as e:
print(f"\n{'='*70}")
print(f"❌ Scheduled sender error: {e}")
print(f"{'='*70}\n")
import traceback
traceback.print_exc()
def main():
"""Main scheduler loop"""
print("📧 Munich News Newsletter Scheduler")
print("="*70)
print("Schedule: Daily at 7:00 AM Berlin time")
print("Timezone: Europe/Berlin (CET/CEST)")
print("Coordination: Waits for crawler to finish (max 30 min)")
print("="*70)
# Schedule the sender to run at 7 AM Berlin time
schedule.every().day.at("07:00").do(run_sender)
# Show next run time
berlin_time = datetime.now(BERLIN_TZ)
print(f"\nCurrent time (Berlin): {berlin_time.strftime('%Y-%m-%d %H:%M:%S %Z')}")
# Get next scheduled run
next_run = schedule.next_run()
if next_run:
# Convert to Berlin time for display
next_run_berlin = next_run.astimezone(BERLIN_TZ)
print(f"Next scheduled run: {next_run_berlin.strftime('%Y-%m-%d %H:%M:%S %Z')}")
print("\n⏳ Scheduler is running... (Press Ctrl+C to stop)\n")
# Optional: Run immediately on startup (comment out if you don't want this)
# print("🚀 Running initial send on startup...")
# run_sender()
# Keep the scheduler running
while True:
schedule.run_pending()
time.sleep(60) # Check every minute
if __name__ == '__main__':
try:
main()
except KeyboardInterrupt:
print("\n\n👋 Scheduler stopped by user")
except Exception as e:
print(f"\n\n❌ Scheduler error: {e}")
import traceback
traceback.print_exc()

View File

@@ -11,8 +11,17 @@ from pathlib import Path
from jinja2 import Template
from pymongo import MongoClient
import os
import sys
from dotenv import load_dotenv
# Add backend directory to path for importing tracking service
backend_dir = Path(__file__).parent.parent / 'backend'
sys.path.insert(0, str(backend_dir))
# Import tracking modules
from services import tracking_service
from tracking_integration import inject_tracking_pixel, replace_article_links, generate_tracking_urls
# Load environment variables from backend/.env
backend_dir = Path(__file__).parent.parent / 'backend'
env_path = backend_dir / '.env'
@@ -41,6 +50,11 @@ class Config:
HOURS_LOOKBACK = int(os.getenv('NEWSLETTER_HOURS_LOOKBACK', '24'))
WEBSITE_URL = os.getenv('WEBSITE_URL', 'http://localhost:3000')
# Tracking
TRACKING_ENABLED = os.getenv('TRACKING_ENABLED', 'true').lower() == 'true'
TRACKING_API_URL = os.getenv('TRACKING_API_URL', 'http://localhost:5001')
TRACKING_DATA_RETENTION_DAYS = int(os.getenv('TRACKING_DATA_RETENTION_DAYS', '90'))
# MongoDB connection
client = MongoClient(Config.MONGODB_URI)
@@ -117,15 +131,20 @@ def get_active_subscribers():
return [doc['email'] for doc in cursor]
def render_newsletter_html(articles):
def render_newsletter_html(articles, tracking_enabled=False, pixel_tracking_id=None,
link_tracking_map=None, api_url=None):
"""
Render newsletter HTML from template
Render newsletter HTML from template with optional tracking integration
Args:
articles: List of article dictionaries
tracking_enabled: Whether to inject tracking pixel and replace links
pixel_tracking_id: Tracking ID for the email open pixel
link_tracking_map: Dictionary mapping original URLs to tracking IDs
api_url: Base URL for the tracking API
Returns:
str: Rendered HTML content
str: Rendered HTML content with tracking injected if enabled
"""
# Load template
template_path = Path(__file__).parent / 'newsletter_template.html'
@@ -142,11 +161,23 @@ def render_newsletter_html(articles):
'article_count': len(articles),
'articles': articles,
'unsubscribe_link': f'{Config.WEBSITE_URL}/unsubscribe',
'website_link': Config.WEBSITE_URL
'website_link': Config.WEBSITE_URL,
'tracking_enabled': tracking_enabled
}
# Render HTML
return template.render(**template_data)
html = template.render(**template_data)
# Inject tracking if enabled
if tracking_enabled and pixel_tracking_id and api_url:
# Inject tracking pixel
html = inject_tracking_pixel(html, pixel_tracking_id, api_url)
# Replace article links with tracking URLs
if link_tracking_map:
html = replace_article_links(html, link_tracking_map, api_url)
return html
def send_email(to_email, subject, html_content):
@@ -246,14 +277,14 @@ def send_newsletter(max_articles=None, test_email=None):
'error': 'No active subscribers'
}
# Render newsletter
print("\nRendering newsletter HTML...")
html_content = render_newsletter_html(articles)
print("✓ Newsletter rendered")
# Generate newsletter ID (date-based)
newsletter_id = f"newsletter-{datetime.now().strftime('%Y-%m-%d')}"
# Send to subscribers
subject = f"Munich News Daily - {datetime.now().strftime('%B %d, %Y')}"
print(f"\nSending newsletter: '{subject}'")
print(f"Newsletter ID: {newsletter_id}")
print(f"Tracking enabled: {Config.TRACKING_ENABLED}")
print("-" * 70)
sent_count = 0 sent_count = 0
@@ -262,6 +293,34 @@ def send_newsletter(max_articles=None, test_email=None):
for i, email in enumerate(subscribers, 1):
print(f"[{i}/{len(subscribers)}] Sending to {email}...", end=' ')
# Generate tracking data for this subscriber if tracking is enabled
if Config.TRACKING_ENABLED:
try:
tracking_data = generate_tracking_urls(
articles=articles,
newsletter_id=newsletter_id,
subscriber_email=email,
tracking_service=tracking_service
)
# Render newsletter with tracking
html_content = render_newsletter_html(
articles=articles,
tracking_enabled=True,
pixel_tracking_id=tracking_data['pixel_tracking_id'],
link_tracking_map=tracking_data['link_tracking_map'],
api_url=Config.TRACKING_API_URL
)
except Exception as e:
print(f"⚠ Tracking error: {e}, sending without tracking...", end=' ')
# Fallback: send without tracking
html_content = render_newsletter_html(articles)
else:
# Render newsletter without tracking
html_content = render_newsletter_html(articles)
# Send email
success, error = send_email(email, subject, html_content)
if success:
@@ -310,12 +369,11 @@ def preview_newsletter(max_articles=None, hours=None):
today_date = datetime.now().strftime('%B %d, %Y')
return f"<h1>No articles from today found</h1><p>No articles published today ({today_date}). Run the crawler with Ollama enabled to get fresh content.</p>"
return render_newsletter_html(articles)
# Preview without tracking
return render_newsletter_html(articles, tracking_enabled=False)
if __name__ == '__main__':
import sys
# Parse command line arguments
if len(sys.argv) > 1:
command = sys.argv[1]

View File

@@ -0,0 +1,150 @@
"""
Tracking integration module for Munich News Daily newsletter system.
Handles injection of tracking pixels and replacement of article links with tracking URLs.
"""
import re
from typing import Dict, List
from bs4 import BeautifulSoup
def inject_tracking_pixel(html: str, tracking_id: str, api_url: str) -> str:
"""
Inject tracking pixel into newsletter HTML before closing </body> tag.
The tracking pixel is a 1x1 transparent image that loads when the email is opened,
allowing us to track email opens.
Args:
html: Original newsletter HTML content
tracking_id: Unique tracking ID for this newsletter send (None if tracking disabled)
api_url: Base URL for the tracking API (e.g., http://localhost:5001)
Returns:
str: HTML with tracking pixel injected (unchanged if tracking_id is None)
Example:
>>> html = '<html><body><p>Content</p></body></html>'
>>> inject_tracking_pixel(html, 'abc-123', 'http://api.example.com')
'<html><body><p>Content</p><img src="http://api.example.com/api/track/pixel/abc-123" width="1" height="1" alt="" /></body></html>'
"""
# Skip tracking if no tracking_id provided (subscriber opted out)
if not tracking_id:
return html
# Construct tracking pixel URL
pixel_url = f"{api_url}/api/track/pixel/{tracking_id}"
# Create tracking pixel HTML
pixel_html = f'<img src="{pixel_url}" width="1" height="1" alt="" style="display:block;" />'
# Inject pixel before closing </body> tag
if '</body>' in html:
html = html.replace('</body>', f'{pixel_html}</body>')
else:
# Fallback: append to end if no </body> tag found
html += pixel_html
return html
def replace_article_links(
html: str,
link_tracking_map: Dict[str, str],
api_url: str
) -> str:
"""
Replace article links in newsletter HTML with tracking URLs.
Finds all article links in the HTML and replaces them with tracking redirect URLs
that log clicks before redirecting to the original article.
Args:
html: Original newsletter HTML content
link_tracking_map: Dictionary mapping original URLs to tracking IDs (empty if tracking disabled)
api_url: Base URL for the tracking API (e.g., http://localhost:5001)
Returns:
str: HTML with article links replaced by tracking URLs (unchanged if map is empty)
Example:
>>> html = '<a href="https://example.com/article">Read</a>'
>>> mapping = {'https://example.com/article': 'track-123'}
>>> replace_article_links(html, mapping, 'http://api.example.com')
'<a href="http://api.example.com/api/track/click/track-123">Read</a>'
"""
# Skip tracking if no tracking map provided (subscriber opted out)
if not link_tracking_map:
return html
# Parse HTML with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# Find all <a> tags with href attributes
for link in soup.find_all('a', href=True):
original_url = link['href']
# Check if this URL should be tracked
if original_url in link_tracking_map:
tracking_id = link_tracking_map[original_url]
tracking_url = f"{api_url}/api/track/click/{tracking_id}"
# Replace the href with tracking URL
link['href'] = tracking_url
# Return modified HTML
return str(soup)
def generate_tracking_urls(
articles: List[Dict],
newsletter_id: str,
subscriber_email: str,
tracking_service
) -> Dict[str, str]:
"""
Generate tracking records for all article links and return URL mapping.
Creates tracking records in the database for each article link and returns
a mapping of original URLs to tracking IDs.
Args:
articles: List of article dictionaries with 'link' and 'title' keys
newsletter_id: Unique identifier for the newsletter batch
subscriber_email: Email address of the recipient
tracking_service: Tracking service module with create_newsletter_tracking function
Returns:
dict: Dictionary containing:
- pixel_tracking_id: ID for the tracking pixel
- link_tracking_map: Dict mapping original URLs to tracking IDs
Example:
>>> articles = [{'link': 'https://example.com/1', 'title': 'Article 1'}]
>>> generate_tracking_urls(articles, 'news-2024-01-01', 'user@example.com', tracking_service)
{
'pixel_tracking_id': 'uuid-for-pixel',
'link_tracking_map': {'https://example.com/1': 'uuid-for-link'}
}
"""
# Prepare article links for tracking
article_links = []
for article in articles:
if 'link' in article and article['link']:
article_links.append({
'url': article['link'],
'title': article.get('title', '')
})
# Create tracking records using the tracking service
tracking_data = tracking_service.create_newsletter_tracking(
newsletter_id=newsletter_id,
subscriber_email=subscriber_email,
article_links=article_links
)
return {
'pixel_tracking_id': tracking_data['pixel_tracking_id'],
'link_tracking_map': tracking_data['link_tracking_map'],
'tracking_enabled': tracking_data.get('tracking_enabled', True)
}

View File

@@ -0,0 +1,451 @@
#!/usr/bin/env python
"""
Test analytics functionality for email tracking
Run from backend directory with venv activated:
cd backend
source venv/bin/activate # or venv\Scripts\activate on Windows
python test_analytics.py
"""
import sys
import os
from datetime import datetime, timedelta
# Add backend directory to path
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from services.analytics_service import (
get_open_rate,
get_click_rate,
get_newsletter_metrics,
get_article_performance,
get_subscriber_activity_status,
update_subscriber_activity_statuses
)
from database import (
newsletter_sends_collection,
link_clicks_collection,
subscriber_activity_collection
)
from app import app
print("\n" + "="*80)
print("Analytics Service Tests")
print("="*80)
# Test counters
tests_passed = 0
tests_failed = 0
def test_result(test_name, passed, message=""):
"""Print test result"""
global tests_passed, tests_failed
if passed:
tests_passed += 1
print(f"{test_name}")
if message:
print(f" {message}")
else:
tests_failed += 1
print(f"{test_name}")
if message:
print(f" {message}")
# Setup test data
print("\n" + "-"*80)
print("Setting up test data...")
print("-"*80)
try:
# Clean up existing test data
newsletter_sends_collection.delete_many({'newsletter_id': {'$regex': '^test-analytics-'}})
link_clicks_collection.delete_many({'newsletter_id': {'$regex': '^test-analytics-'}})
subscriber_activity_collection.delete_many({'email': {'$regex': '^test-analytics-'}})
# Create test newsletter sends
test_newsletter_id = 'test-analytics-newsletter-001'
# Create 10 newsletter sends: 7 opened, 3 not opened
for i in range(10):
opened = i < 7 # First 7 are opened
doc = {
'newsletter_id': test_newsletter_id,
'subscriber_email': f'test-analytics-user{i}@example.com',
'tracking_id': f'test-pixel-{i}',
'sent_at': datetime.utcnow(),
'opened': opened,
'first_opened_at': datetime.utcnow() if opened else None,
'last_opened_at': datetime.utcnow() if opened else None,
'open_count': 1 if opened else 0,
'created_at': datetime.utcnow()
}
newsletter_sends_collection.insert_one(doc)
# Create test link clicks for an article
test_article_url = 'https://example.com/test-analytics-article'
# Create 10 link tracking records: 4 clicked, 6 not clicked
for i in range(10):
clicked = i < 4 # First 4 are clicked
doc = {
'tracking_id': f'test-link-{i}',
'newsletter_id': test_newsletter_id,
'subscriber_email': f'test-analytics-user{i}@example.com',
'article_url': test_article_url,
'article_title': 'Test Analytics Article',
'clicked': clicked,
'clicked_at': datetime.utcnow() if clicked else None,
'user_agent': 'Test Agent' if clicked else None,
'created_at': datetime.utcnow()
}
link_clicks_collection.insert_one(doc)
print("✓ Test data created")
except Exception as e:
print(f"❌ Error setting up test data: {str(e)}")
import traceback
traceback.print_exc()
sys.exit(1)
# Test 1: Open Rate Calculation
print("\n" + "-"*80)
print("Test 1: Open Rate Calculation")
print("-"*80)
try:
open_rate = get_open_rate(test_newsletter_id)
# Expected: 7 out of 10 = 70%
is_correct = open_rate == 70.0
test_result("Calculate open rate", is_correct, f"Open rate: {open_rate}% (expected 70%)")
# Test with non-existent newsletter
open_rate_empty = get_open_rate('non-existent-newsletter')
handles_empty = open_rate_empty == 0.0
test_result("Handle non-existent newsletter", handles_empty,
f"Open rate: {open_rate_empty}% (expected 0%)")
except Exception as e:
test_result("Open rate calculation", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 2: Click Rate Calculation
print("\n" + "-"*80)
print("Test 2: Click Rate Calculation")
print("-"*80)
try:
click_rate = get_click_rate(test_article_url)
# Expected: 4 out of 10 = 40%
is_correct = click_rate == 40.0
test_result("Calculate click rate", is_correct, f"Click rate: {click_rate}% (expected 40%)")
# Test with non-existent article
click_rate_empty = get_click_rate('https://example.com/non-existent')
handles_empty = click_rate_empty == 0.0
test_result("Handle non-existent article", handles_empty,
f"Click rate: {click_rate_empty}% (expected 0%)")
except Exception as e:
test_result("Click rate calculation", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 3: Newsletter Metrics
print("\n" + "-"*80)
print("Test 3: Newsletter Metrics")
print("-"*80)
try:
metrics = get_newsletter_metrics(test_newsletter_id)
# Verify all expected fields
has_all_fields = all(key in metrics for key in [
'newsletter_id', 'total_sent', 'total_opened', 'open_rate',
'total_clicks', 'unique_clickers', 'click_through_rate'
])
test_result("Returns all required fields", has_all_fields)
# Verify values
correct_sent = metrics['total_sent'] == 10
test_result("Correct total_sent", correct_sent, f"Total sent: {metrics['total_sent']}")
correct_opened = metrics['total_opened'] == 7
test_result("Correct total_opened", correct_opened, f"Total opened: {metrics['total_opened']}")
correct_open_rate = metrics['open_rate'] == 70.0
test_result("Correct open_rate", correct_open_rate, f"Open rate: {metrics['open_rate']}%")
correct_clicks = metrics['total_clicks'] == 4
test_result("Correct total_clicks", correct_clicks, f"Total clicks: {metrics['total_clicks']}")
correct_unique_clickers = metrics['unique_clickers'] == 4
test_result("Correct unique_clickers", correct_unique_clickers,
f"Unique clickers: {metrics['unique_clickers']}")
correct_ctr = metrics['click_through_rate'] == 40.0
test_result("Correct click_through_rate", correct_ctr,
f"CTR: {metrics['click_through_rate']}%")
except Exception as e:
test_result("Newsletter metrics", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 4: Article Performance
print("\n" + "-"*80)
print("Test 4: Article Performance")
print("-"*80)
try:
performance = get_article_performance(test_article_url)
# Verify all expected fields
has_all_fields = all(key in performance for key in [
'article_url', 'total_sent', 'total_clicks', 'click_rate',
'unique_clickers', 'newsletters'
])
test_result("Returns all required fields", has_all_fields)
# Verify values
correct_sent = performance['total_sent'] == 10
test_result("Correct total_sent", correct_sent, f"Total sent: {performance['total_sent']}")
correct_clicks = performance['total_clicks'] == 4
test_result("Correct total_clicks", correct_clicks, f"Total clicks: {performance['total_clicks']}")
correct_click_rate = performance['click_rate'] == 40.0
test_result("Correct click_rate", correct_click_rate, f"Click rate: {performance['click_rate']}%")
correct_unique = performance['unique_clickers'] == 4
test_result("Correct unique_clickers", correct_unique,
f"Unique clickers: {performance['unique_clickers']}")
has_newsletters = len(performance['newsletters']) > 0
test_result("Returns newsletter list", has_newsletters,
f"Newsletters: {performance['newsletters']}")
except Exception as e:
test_result("Article performance", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 5: Activity Status Classification
print("\n" + "-"*80)
print("Test 5: Activity Status Classification")
print("-"*80)
try:
# Create test data for activity classification
now = datetime.utcnow()
# Active user (opened 10 days ago)
newsletter_sends_collection.insert_one({
'newsletter_id': 'test-analytics-activity',
'subscriber_email': 'test-analytics-active@example.com',
'tracking_id': 'test-active-pixel',
'sent_at': now - timedelta(days=10),
'opened': True,
'first_opened_at': now - timedelta(days=10),
'last_opened_at': now - timedelta(days=10),
'open_count': 1,
'created_at': now - timedelta(days=10)
})
# Inactive user (opened 45 days ago)
newsletter_sends_collection.insert_one({
'newsletter_id': 'test-analytics-activity',
'subscriber_email': 'test-analytics-inactive@example.com',
'tracking_id': 'test-inactive-pixel',
'sent_at': now - timedelta(days=45),
'opened': True,
'first_opened_at': now - timedelta(days=45),
'last_opened_at': now - timedelta(days=45),
'open_count': 1,
'created_at': now - timedelta(days=45)
})
# Dormant user (opened 90 days ago)
newsletter_sends_collection.insert_one({
'newsletter_id': 'test-analytics-activity',
'subscriber_email': 'test-analytics-dormant@example.com',
'tracking_id': 'test-dormant-pixel',
'sent_at': now - timedelta(days=90),
'opened': True,
'first_opened_at': now - timedelta(days=90),
'last_opened_at': now - timedelta(days=90),
'open_count': 1,
'created_at': now - timedelta(days=90)
})
# New user (never opened)
newsletter_sends_collection.insert_one({
'newsletter_id': 'test-analytics-activity',
'subscriber_email': 'test-analytics-new@example.com',
'tracking_id': 'test-new-pixel',
'sent_at': now - timedelta(days=5),
'opened': False,
'first_opened_at': None,
'last_opened_at': None,
'open_count': 0,
'created_at': now - timedelta(days=5)
})
# Test classifications
active_status = get_subscriber_activity_status('test-analytics-active@example.com')
is_active = active_status == 'active'
test_result("Classify active user", is_active, f"Status: {active_status}")
inactive_status = get_subscriber_activity_status('test-analytics-inactive@example.com')
is_inactive = inactive_status == 'inactive'
test_result("Classify inactive user", is_inactive, f"Status: {inactive_status}")
dormant_status = get_subscriber_activity_status('test-analytics-dormant@example.com')
is_dormant = dormant_status == 'dormant'
test_result("Classify dormant user", is_dormant, f"Status: {dormant_status}")
new_status = get_subscriber_activity_status('test-analytics-new@example.com')
is_new = new_status == 'new'
test_result("Classify new user", is_new, f"Status: {new_status}")
except Exception as e:
test_result("Activity status classification", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 6: Batch Update Activity Statuses
print("\n" + "-"*80)
print("Test 6: Batch Update Activity Statuses")
print("-"*80)
try:
updated_count = update_subscriber_activity_statuses()
# Should update all test subscribers
has_updates = updated_count > 0
test_result("Updates subscriber records", has_updates,
f"Updated {updated_count} subscribers")
# Verify a record was created
activity_record = subscriber_activity_collection.find_one({
'email': 'test-analytics-active@example.com'
})
record_exists = activity_record is not None
test_result("Creates activity record", record_exists)
if activity_record:
has_required_fields = all(key in activity_record for key in [
'email', 'status', 'total_opens', 'total_clicks',
'newsletters_received', 'newsletters_opened', 'updated_at'
])
test_result("Activity record has required fields", has_required_fields)
correct_status = activity_record['status'] == 'active'
test_result("Activity record has correct status", correct_status,
f"Status: {activity_record['status']}")
except Exception as e:
test_result("Batch update activity statuses", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 7: Analytics API Endpoints
print("\n" + "-"*80)
print("Test 7: Analytics API Endpoints")
print("-"*80)
try:
with app.test_client() as client:
# Test newsletter analytics endpoint
response = client.get(f'/api/analytics/newsletter/{test_newsletter_id}')
is_200 = response.status_code == 200
test_result("Newsletter endpoint returns 200", is_200, f"Status: {response.status_code}")
if is_200:
data = response.get_json()
has_data = data is not None and 'open_rate' in data
test_result("Newsletter endpoint returns data", has_data)
# Test article analytics endpoint
response = client.get(f'/api/analytics/article/{test_article_url}')
is_200 = response.status_code == 200
test_result("Article endpoint returns 200", is_200, f"Status: {response.status_code}")
if is_200:
data = response.get_json()
has_data = data is not None and 'click_rate' in data
test_result("Article endpoint returns data", has_data)
# Test subscriber analytics endpoint
response = client.get('/api/analytics/subscriber/test-analytics-active@example.com')
is_200 = response.status_code == 200
test_result("Subscriber endpoint returns 200", is_200, f"Status: {response.status_code}")
if is_200:
data = response.get_json()
has_data = data is not None and 'status' in data
test_result("Subscriber endpoint returns data", has_data)
# Test update activity endpoint
response = client.post('/api/analytics/update-activity')
is_200 = response.status_code == 200
test_result("Update activity endpoint returns 200", is_200, f"Status: {response.status_code}")
if is_200:
data = response.get_json()
has_count = data is not None and 'updated_count' in data
test_result("Update activity endpoint returns count", has_count)
except Exception as e:
test_result("Analytics API endpoints", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Clean up test data
print("\n" + "-"*80)
print("Cleaning up test data...")
print("-"*80)
try:
newsletter_sends_collection.delete_many({'newsletter_id': {'$regex': '^test-analytics-'}})
link_clicks_collection.delete_many({'newsletter_id': {'$regex': '^test-analytics-'}})
subscriber_activity_collection.delete_many({'email': {'$regex': '^test-analytics-'}})
print("✓ Test data cleaned up")
except Exception as e:
print(f"⚠ Error cleaning up: {str(e)}")
# Summary
print("\n" + "="*80)
print("TEST SUMMARY")
print("="*80)
print(f"Total tests: {tests_passed + tests_failed}")
print(f"✓ Passed: {tests_passed}")
print(f"❌ Failed: {tests_failed}")
if tests_failed == 0:
print("\n🎉 All tests passed!")
else:
print(f"\n{tests_failed} test(s) failed")
print("="*80 + "\n")
# Exit with appropriate code
sys.exit(0 if tests_failed == 0 else 1)

View File

@@ -0,0 +1,389 @@
#!/usr/bin/env python
"""
Test privacy compliance features for email tracking
Run from backend directory with venv activated:
cd backend
source venv/bin/activate # or venv\Scripts\activate on Windows
python test_privacy.py
"""
import sys
import os
from datetime import datetime, timedelta
from pymongo import MongoClient
# Add backend directory to path
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from config import Config
from services.tracking_service import (
create_newsletter_tracking,
anonymize_old_tracking_data,
delete_subscriber_tracking_data
)
from database import (
newsletter_sends_collection,
link_clicks_collection,
subscriber_activity_collection,
subscribers_collection
)
from app import app
print("\n" + "="*80)
print("Privacy Compliance Tests")
print("="*80)
# Test counters
tests_passed = 0
tests_failed = 0
def test_result(test_name, passed, message=""):
"""Print test result"""
global tests_passed, tests_failed
if passed:
tests_passed += 1
print(f"{test_name}")
if message:
print(f" {message}")
else:
tests_failed += 1
print(f"{test_name}")
if message:
print(f" {message}")
# Setup: Clean up test data
print("\n" + "-"*80)
print("Setup: Cleaning test data")
print("-"*80)
test_newsletter_id = 'privacy-test-newsletter'
test_email = 'privacy-test@example.com'
test_email_opted_out = 'opted-out@example.com'
newsletter_sends_collection.delete_many({'newsletter_id': test_newsletter_id})
link_clicks_collection.delete_many({'newsletter_id': test_newsletter_id})
subscriber_activity_collection.delete_many({'email': {'$in': [test_email, test_email_opted_out]}})
subscribers_collection.delete_many({'email': {'$in': [test_email, test_email_opted_out]}})
print("✓ Test data cleaned")
# Test 1: Data Anonymization
print("\n" + "-"*80)
print("Test 1: Data Anonymization")
print("-"*80)
try:
# Create old tracking records (older than 90 days)
old_date = datetime.utcnow() - timedelta(days=100)
old_newsletter_doc = {
'newsletter_id': test_newsletter_id,
'subscriber_email': 'old-user@example.com',
'tracking_id': 'old-tracking-id-1',
'sent_at': old_date,
'opened': True,
'first_opened_at': old_date,
'last_opened_at': old_date,
'open_count': 3,
'created_at': old_date
}
newsletter_sends_collection.insert_one(old_newsletter_doc)
old_link_doc = {
'tracking_id': 'old-link-tracking-id-1',
'newsletter_id': test_newsletter_id,
'subscriber_email': 'old-user@example.com',
'article_url': 'https://example.com/old-article',
'article_title': 'Old Article',
'clicked': True,
'clicked_at': old_date,
'created_at': old_date
}
link_clicks_collection.insert_one(old_link_doc)
# Create recent tracking records (within 90 days)
recent_date = datetime.utcnow() - timedelta(days=30)
recent_newsletter_doc = {
'newsletter_id': test_newsletter_id,
'subscriber_email': 'recent-user@example.com',
'tracking_id': 'recent-tracking-id-1',
'sent_at': recent_date,
'opened': True,
'first_opened_at': recent_date,
'last_opened_at': recent_date,
'open_count': 1,
'created_at': recent_date
}
newsletter_sends_collection.insert_one(recent_newsletter_doc)
# Run anonymization
result = anonymize_old_tracking_data(retention_days=90)
# Check that old records were anonymized
old_newsletter_after = newsletter_sends_collection.find_one({'tracking_id': 'old-tracking-id-1'})
old_anonymized = old_newsletter_after and old_newsletter_after['subscriber_email'] == 'anonymized'
test_result("Anonymizes old newsletter records", old_anonymized,
f"Email: {old_newsletter_after.get('subscriber_email', 'N/A') if old_newsletter_after else 'N/A'}")
old_link_after = link_clicks_collection.find_one({'tracking_id': 'old-link-tracking-id-1'})
link_anonymized = old_link_after and old_link_after['subscriber_email'] == 'anonymized'
test_result("Anonymizes old link click records", link_anonymized,
f"Email: {old_link_after.get('subscriber_email', 'N/A') if old_link_after else 'N/A'}")
# Check that aggregated metrics are preserved
metrics_preserved = (
old_newsletter_after and
old_newsletter_after['open_count'] == 3 and
old_newsletter_after['opened'] == True
)
test_result("Preserves aggregated metrics", metrics_preserved,
f"Open count: {old_newsletter_after.get('open_count', 0) if old_newsletter_after else 0}")
# Check that recent records were NOT anonymized
recent_newsletter_after = newsletter_sends_collection.find_one({'tracking_id': 'recent-tracking-id-1'})
recent_not_anonymized = (
recent_newsletter_after and
recent_newsletter_after['subscriber_email'] == 'recent-user@example.com'
)
test_result("Does not anonymize recent records", recent_not_anonymized,
f"Email: {recent_newsletter_after.get('subscriber_email', 'N/A') if recent_newsletter_after else 'N/A'}")
# Check return counts
correct_counts = result['newsletter_sends_anonymized'] >= 1 and result['link_clicks_anonymized'] >= 1
test_result("Returns correct anonymization counts", correct_counts,
f"Newsletter: {result['newsletter_sends_anonymized']}, Links: {result['link_clicks_anonymized']}")
except Exception as e:
test_result("Data anonymization", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 2: User Data Deletion
print("\n" + "-"*80)
print("Test 2: User Data Deletion")
print("-"*80)
try:
# Create tracking records for a specific user
article_links = [
{'url': 'https://example.com/article1', 'title': 'Article 1'},
{'url': 'https://example.com/article2', 'title': 'Article 2'}
]
tracking_data = create_newsletter_tracking(
newsletter_id=test_newsletter_id,
subscriber_email=test_email,
article_links=article_links
)
# Create subscriber activity record
subscriber_activity_collection.insert_one({
'email': test_email,
'status': 'active',
'last_opened_at': datetime.utcnow(),
'total_opens': 5,
'total_clicks': 3
})
# Verify records exist
newsletter_count_before = newsletter_sends_collection.count_documents({'subscriber_email': test_email})
link_count_before = link_clicks_collection.count_documents({'subscriber_email': test_email})
activity_count_before = subscriber_activity_collection.count_documents({'email': test_email})
records_exist = newsletter_count_before > 0 and link_count_before > 0 and activity_count_before > 0
test_result("Creates test tracking records", records_exist,
f"Newsletter: {newsletter_count_before}, Links: {link_count_before}, Activity: {activity_count_before}")
# Delete all tracking data for the user
delete_result = delete_subscriber_tracking_data(test_email)
# Verify all records were deleted
newsletter_count_after = newsletter_sends_collection.count_documents({'subscriber_email': test_email})
link_count_after = link_clicks_collection.count_documents({'subscriber_email': test_email})
activity_count_after = subscriber_activity_collection.count_documents({'email': test_email})
all_deleted = newsletter_count_after == 0 and link_count_after == 0 and activity_count_after == 0
test_result("Deletes all tracking records", all_deleted,
f"Remaining - Newsletter: {newsletter_count_after}, Links: {link_count_after}, Activity: {activity_count_after}")
# Check return counts
correct_delete_counts = (
delete_result['newsletter_sends_deleted'] == newsletter_count_before and
delete_result['link_clicks_deleted'] == link_count_before and
delete_result['subscriber_activity_deleted'] == activity_count_before
)
test_result("Returns correct deletion counts", correct_delete_counts,
f"Deleted - Newsletter: {delete_result['newsletter_sends_deleted']}, Links: {delete_result['link_clicks_deleted']}, Activity: {delete_result['subscriber_activity_deleted']}")
except Exception as e:
test_result("User data deletion", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 3: Tracking Opt-Out
print("\n" + "-"*80)
print("Test 3: Tracking Opt-Out")
print("-"*80)
try:
# Create subscriber with tracking disabled
subscribers_collection.insert_one({
'email': test_email_opted_out,
'subscribed_at': datetime.utcnow(),
'tracking_enabled': False
})
# Try to create tracking for opted-out subscriber
article_links = [
{'url': 'https://example.com/article1', 'title': 'Article 1'}
]
tracking_data_opted_out = create_newsletter_tracking(
newsletter_id=test_newsletter_id,
subscriber_email=test_email_opted_out,
article_links=article_links
)
# Check that no tracking was created
no_pixel_id = tracking_data_opted_out['pixel_tracking_id'] is None
test_result("Does not create pixel tracking for opted-out users", no_pixel_id,
f"Pixel ID: {tracking_data_opted_out['pixel_tracking_id']}")
empty_link_map = len(tracking_data_opted_out['link_tracking_map']) == 0
test_result("Does not create link tracking for opted-out users", empty_link_map,
f"Link map size: {len(tracking_data_opted_out['link_tracking_map'])}")
tracking_disabled_flag = tracking_data_opted_out.get('tracking_enabled') == False
test_result("Returns tracking_enabled=False for opted-out users", tracking_disabled_flag)
# Verify no database records were created
newsletter_count = newsletter_sends_collection.count_documents({'subscriber_email': test_email_opted_out})
link_count = link_clicks_collection.count_documents({'subscriber_email': test_email_opted_out})
no_db_records = newsletter_count == 0 and link_count == 0
test_result("Does not create database records for opted-out users", no_db_records,
f"Newsletter records: {newsletter_count}, Link records: {link_count}")
# Test opt-in/opt-out endpoints
with app.test_client() as client:
# Create a subscriber with tracking enabled
subscribers_collection.insert_one({
'email': test_email,
'subscribed_at': datetime.utcnow(),
'tracking_enabled': True
})
# Opt out
response = client.post(f'/api/tracking/subscriber/{test_email}/opt-out')
opt_out_success = response.status_code == 200 and response.json.get('success') == True
test_result("Opt-out endpoint works", opt_out_success,
f"Status: {response.status_code}")
# Verify tracking is disabled
subscriber = subscribers_collection.find_one({'email': test_email})
tracking_disabled = subscriber and subscriber.get('tracking_enabled') == False
test_result("Opt-out disables tracking in database", tracking_disabled)
# Opt back in
response = client.post(f'/api/tracking/subscriber/{test_email}/opt-in')
opt_in_success = response.status_code == 200 and response.json.get('success') == True
test_result("Opt-in endpoint works", opt_in_success,
f"Status: {response.status_code}")
# Verify tracking is enabled
subscriber = subscribers_collection.find_one({'email': test_email})
tracking_enabled = subscriber and subscriber.get('tracking_enabled') == True
test_result("Opt-in enables tracking in database", tracking_enabled)
except Exception as e:
test_result("Tracking opt-out", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 4: Privacy API Endpoints
print("\n" + "-"*80)
print("Test 4: Privacy API Endpoints")
print("-"*80)
try:
with app.test_client() as client:
# Create test tracking data
article_links = [{'url': 'https://example.com/test', 'title': 'Test'}]
create_newsletter_tracking(
newsletter_id=test_newsletter_id,
subscriber_email='api-test@example.com',
article_links=article_links
)
# Test deletion endpoint
response = client.delete('/api/tracking/subscriber/api-test@example.com')
delete_endpoint_works = response.status_code == 200 and response.json.get('success') == True
test_result("Deletion endpoint returns success", delete_endpoint_works,
f"Status: {response.status_code}")
# Verify data was deleted
remaining_records = newsletter_sends_collection.count_documents({'subscriber_email': 'api-test@example.com'})
data_deleted = remaining_records == 0
test_result("Deletion endpoint removes data", data_deleted,
f"Remaining records: {remaining_records}")
# Test anonymization endpoint
response = client.post('/api/tracking/anonymize', json={'retention_days': 90})
anonymize_endpoint_works = response.status_code == 200 and response.json.get('success') == True
test_result("Anonymization endpoint returns success", anonymize_endpoint_works,
f"Status: {response.status_code}")
has_counts = 'anonymized_counts' in response.json
test_result("Anonymization endpoint returns counts", has_counts)
except Exception as e:
test_result("Privacy API endpoints", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Clean up test data
print("\n" + "-"*80)
print("Cleaning up test data...")
print("-"*80)
try:
newsletter_sends_collection.delete_many({'newsletter_id': test_newsletter_id})
link_clicks_collection.delete_many({'newsletter_id': test_newsletter_id})
subscriber_activity_collection.delete_many({'email': {'$in': [test_email, test_email_opted_out, 'api-test@example.com']}})
subscribers_collection.delete_many({'email': {'$in': [test_email, test_email_opted_out, 'api-test@example.com']}})
# Clean up anonymized records
newsletter_sends_collection.delete_many({'subscriber_email': 'anonymized'})
link_clicks_collection.delete_many({'subscriber_email': 'anonymized'})
print("✓ Test data cleaned up")
except Exception as e:
print(f"⚠ Error cleaning up: {str(e)}")
# Summary
print("\n" + "="*80)
print("TEST SUMMARY")
print("="*80)
print(f"Total tests: {tests_passed + tests_failed}")
print(f"✓ Passed: {tests_passed}")
print(f"❌ Failed: {tests_failed}")
if tests_failed == 0:
print("\n🎉 All privacy compliance tests passed!")
else:
print(f"\n{tests_failed} test(s) failed")
print("="*80 + "\n")
# Exit with appropriate code
sys.exit(0 if tests_failed == 0 else 1)

View File

@@ -0,0 +1,260 @@
#!/usr/bin/env python
"""
Test email tracking functionality
Run from backend directory with venv activated:
cd backend
source venv/bin/activate # or venv\Scripts\activate on Windows
python test_tracking.py
"""
import sys
import os
from datetime import datetime
from pymongo import MongoClient
# Add backend directory to path
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from config import Config
from services.tracking_service import generate_tracking_id, create_newsletter_tracking
from database import newsletter_sends_collection, link_clicks_collection
from app import app
print("\n" + "="*80)
print("Email Tracking System Tests")
print("="*80)
# Test counters
tests_passed = 0
tests_failed = 0
def test_result(test_name, passed, message=""):
"""Print test result"""
global tests_passed, tests_failed
if passed:
tests_passed += 1
print(f"{test_name}")
if message:
print(f" {message}")
else:
tests_failed += 1
print(f"{test_name}")
if message:
print(f" {message}")
# Test 1: Tracking ID Generation
print("\n" + "-"*80)
print("Test 1: Tracking ID Generation")
print("-"*80)
try:
tracking_id = generate_tracking_id()
# Check format (UUID4)
is_valid_uuid = len(tracking_id) == 36 and tracking_id.count('-') == 4
test_result("Generate tracking ID", is_valid_uuid, f"Generated ID: {tracking_id}")
# Check uniqueness
tracking_id2 = generate_tracking_id()
is_unique = tracking_id != tracking_id2
test_result("Tracking IDs are unique", is_unique, f"ID1: {tracking_id[:8]}... ID2: {tracking_id2[:8]}...")
except Exception as e:
test_result("Generate tracking ID", False, f"Error: {str(e)}")
# Test 2: Create Newsletter Tracking
print("\n" + "-"*80)
print("Test 2: Create Newsletter Tracking")
print("-"*80)
try:
# Clean up test data first
newsletter_sends_collection.delete_many({'newsletter_id': 'test-newsletter-001'})
link_clicks_collection.delete_many({'newsletter_id': 'test-newsletter-001'})
# Create tracking with article links
article_links = [
{'url': 'https://example.com/article1', 'title': 'Test Article 1'},
{'url': 'https://example.com/article2', 'title': 'Test Article 2'}
]
tracking_data = create_newsletter_tracking(
newsletter_id='test-newsletter-001',
subscriber_email='test@example.com',
article_links=article_links
)
# Verify return data structure
has_pixel_id = 'pixel_tracking_id' in tracking_data
test_result("Returns pixel tracking ID", has_pixel_id)
has_link_map = 'link_tracking_map' in tracking_data
test_result("Returns link tracking map", has_link_map)
correct_link_count = len(tracking_data.get('link_tracking_map', {})) == 2
test_result("Creates tracking for all links", correct_link_count,
f"Created {len(tracking_data.get('link_tracking_map', {}))} link tracking records")
# Verify database records
newsletter_record = newsletter_sends_collection.find_one({
'tracking_id': tracking_data['pixel_tracking_id']
})
record_exists = newsletter_record is not None
test_result("Creates newsletter_sends record", record_exists)
if newsletter_record:
correct_initial_state = (
newsletter_record['opened'] == False and
newsletter_record['open_count'] == 0 and
newsletter_record['first_opened_at'] is None
)
test_result("Newsletter record has correct initial state", correct_initial_state)
# Verify link click records
link_records = list(link_clicks_collection.find({'newsletter_id': 'test-newsletter-001'}))
correct_link_records = len(link_records) == 2
test_result("Creates link_clicks records", correct_link_records,
f"Created {len(link_records)} link click records")
except Exception as e:
test_result("Create newsletter tracking", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 3: Tracking Pixel Endpoint
print("\n" + "-"*80)
print("Test 3: Tracking Pixel Endpoint")
print("-"*80)
try:
with app.test_client() as client:
# Test with valid tracking ID
pixel_tracking_id = tracking_data['pixel_tracking_id']
response = client.get(f'/api/track/pixel/{pixel_tracking_id}')
is_png = response.content_type == 'image/png'
test_result("Returns PNG for valid tracking_id", is_png,
f"Content-Type: {response.content_type}")
is_200 = response.status_code == 200
test_result("Returns 200 status", is_200, f"Status: {response.status_code}")
# Verify database was updated
updated_record = newsletter_sends_collection.find_one({
'tracking_id': pixel_tracking_id
})
was_logged = (
updated_record and
updated_record['opened'] == True and
updated_record['open_count'] == 1 and
updated_record['first_opened_at'] is not None
)
test_result("Logs email open event", was_logged,
f"Open count: {updated_record.get('open_count', 0) if updated_record else 0}")
# Test multiple opens
response2 = client.get(f'/api/track/pixel/{pixel_tracking_id}')
updated_record2 = newsletter_sends_collection.find_one({
'tracking_id': pixel_tracking_id
})
handles_multiple = (
updated_record2 and
updated_record2['open_count'] == 2 and
updated_record2['last_opened_at'] != updated_record2['first_opened_at']
)
test_result("Handles multiple opens", handles_multiple,
f"Open count: {updated_record2.get('open_count', 0) if updated_record2 else 0}")
# Test with invalid tracking ID
response3 = client.get('/api/track/pixel/invalid-tracking-id-12345')
fails_silently = response3.status_code == 200 and response3.content_type == 'image/png'
test_result("Returns PNG for invalid tracking_id (fails silently)", fails_silently)
except Exception as e:
test_result("Tracking pixel endpoint", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Test 4: Link Redirect Endpoint
print("\n" + "-"*80)
print("Test 4: Link Redirect Endpoint")
print("-"*80)
try:
with app.test_client() as client:
# Test with valid tracking ID
article_url = 'https://example.com/article1'
link_tracking_id = tracking_data['link_tracking_map'][article_url]
response = client.get(f'/api/track/click/{link_tracking_id}', follow_redirects=False)
is_redirect = response.status_code == 302
test_result("Returns 302 redirect", is_redirect, f"Status: {response.status_code}")
correct_location = response.location == article_url
test_result("Redirects to correct URL", correct_location,
f"Location: {response.location}")
# Verify database was updated
click_record = link_clicks_collection.find_one({
'tracking_id': link_tracking_id
})
was_logged = (
click_record and
click_record['clicked'] == True and
click_record['clicked_at'] is not None
)
test_result("Logs click event", was_logged)
# Test with invalid tracking ID
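        # Unknown IDs are still expected to return a 302 (presumably to a fallback URL);
        # only the status code is asserted here, the destination is just printed for inspection.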
response2 = client.get('/api/track/click/invalid-tracking-id-12345', follow_redirects=False)
redirects_on_invalid = response2.status_code == 302
test_result("Redirects on invalid tracking_id", redirects_on_invalid,
f"Redirects to: {response2.location}")
except Exception as e:
test_result("Link redirect endpoint", False, f"Error: {str(e)}")
import traceback
traceback.print_exc()
# Clean up test data
print("\n" + "-"*80)
print("Cleaning up test data...")
print("-"*80)
try:
newsletter_sends_collection.delete_many({'newsletter_id': 'test-newsletter-001'})
link_clicks_collection.delete_many({'newsletter_id': 'test-newsletter-001'})
print("✓ Test data cleaned up")
except Exception as e:
print(f"⚠ Error cleaning up: {str(e)}")
# Summary
print("\n" + "="*80)
print("TEST SUMMARY")
print("="*80)
print(f"Total tests: {tests_passed + tests_failed}")
print(f"✓ Passed: {tests_passed}")
print(f"❌ Failed: {tests_failed}")
if tests_failed == 0:
print("\n🎉 All tests passed!")
else:
print(f"\n{tests_failed} test(s) failed")
print("="*80 + "\n")
# Exit with appropriate code
sys.exit(0 if tests_failed == 0 else 1)

View File

@@ -0,0 +1,208 @@
#!/usr/bin/env python
"""
Integration test for newsletter with tracking.
Tests the full flow of generating a newsletter with tracking enabled.
"""
import sys
from pathlib import Path
from datetime import datetime
# Add backend directory to path
backend_dir = Path(__file__).parent.parent / 'backend'
sys.path.insert(0, str(backend_dir))
# Mock the tracking service to avoid database dependency
class MockTrackingService:
"""Mock tracking service for testing"""
@staticmethod
def create_newsletter_tracking(newsletter_id, subscriber_email, article_links=None):
"""Mock create_newsletter_tracking function"""
link_tracking_map = {}
if article_links:
for i, article in enumerate(article_links):
link_tracking_map[article['url']] = f"mock-link-{i}"
return {
'pixel_tracking_id': 'mock-pixel-123',
'link_tracking_map': link_tracking_map,
'newsletter_id': newsletter_id,
'subscriber_email': subscriber_email
}
# Import after setting up path
from tracking_integration import inject_tracking_pixel, replace_article_links, generate_tracking_urls
from jinja2 import Template
def test_newsletter_with_tracking():
"""Test generating a newsletter with tracking enabled"""
print("\n" + "="*70)
print("NEWSLETTER TRACKING INTEGRATION TEST")
print("="*70)
# Mock article data
articles = [
{
'title': 'Munich Tech Summit Announces 2025 Dates',
'author': 'Tech Reporter',
'link': 'https://example.com/tech-summit',
'summary': 'The annual Munich Tech Summit will return in 2025 with exciting new features.',
'source': 'Munich Tech News',
'published_at': datetime.now()
},
{
'title': 'New Public Transport Routes Launched',
'author': 'Transport Desk',
'link': 'https://example.com/transport-routes',
'summary': 'MVG announces three new bus routes connecting suburban areas.',
'source': 'Munich Transport',
'published_at': datetime.now()
}
]
# Configuration
newsletter_id = 'test-newsletter-2025-11-11'
subscriber_email = 'test@example.com'
api_url = 'http://localhost:5001'
print(f"\nNewsletter ID: {newsletter_id}")
print(f"Subscriber: {subscriber_email}")
print(f"Articles: {len(articles)}")
print(f"API URL: {api_url}")
# Step 1: Generate tracking URLs
print("\n" + "-"*70)
print("Step 1: Generate tracking data")
print("-"*70)
tracking_data = generate_tracking_urls(
articles=articles,
newsletter_id=newsletter_id,
subscriber_email=subscriber_email,
tracking_service=MockTrackingService
)
print(f"✓ Pixel tracking ID: {tracking_data['pixel_tracking_id']}")
print(f"✓ Link tracking map: {len(tracking_data['link_tracking_map'])} links")
for url, tracking_id in tracking_data['link_tracking_map'].items():
print(f" - {url}{tracking_id}")
# Step 2: Load and render template
print("\n" + "-"*70)
print("Step 2: Render newsletter template")
print("-"*70)
template_path = Path(__file__).parent / 'newsletter_template.html'
with open(template_path, 'r', encoding='utf-8') as f:
template_content = f.read()
template = Template(template_content)
now = datetime.now()
template_data = {
'date': now.strftime('%A, %B %d, %Y'),
'year': now.year,
'article_count': len(articles),
'articles': articles,
'unsubscribe_link': 'http://localhost:3000/unsubscribe',
'website_link': 'http://localhost:3000',
'tracking_enabled': True
}
html = template.render(**template_data)
print("✓ Template rendered")
# Step 3: Inject tracking pixel
print("\n" + "-"*70)
print("Step 3: Inject tracking pixel")
print("-"*70)
html = inject_tracking_pixel(
html,
tracking_data['pixel_tracking_id'],
api_url
)
pixel_url = f"{api_url}/api/track/pixel/{tracking_data['pixel_tracking_id']}"
if pixel_url in html:
print(f"✓ Tracking pixel injected: {pixel_url}")
else:
print(f"✗ Tracking pixel NOT found")
return False
# Step 4: Replace article links
print("\n" + "-"*70)
print("Step 4: Replace article links with tracking URLs")
print("-"*70)
html = replace_article_links(
html,
tracking_data['link_tracking_map'],
api_url
)
# Verify all article links were replaced
success = True
for article in articles:
original_url = article['link']
tracking_id = tracking_data['link_tracking_map'].get(original_url)
if tracking_id:
tracking_url = f"{api_url}/api/track/click/{tracking_id}"
if tracking_url in html:
print(f"✓ Link replaced: {original_url}")
print(f"{tracking_url}")
else:
print(f"✗ Link NOT replaced: {original_url}")
success = False
# Verify original URL is NOT in the HTML (should be replaced)
if f'href="{original_url}"' in html:
print(f"✗ Original URL still present: {original_url}")
success = False
# Step 5: Verify privacy notice
print("\n" + "-"*70)
print("Step 5: Verify privacy notice")
print("-"*70)
if "This email contains tracking to measure engagement" in html:
print("✓ Privacy notice present in footer")
else:
print("✗ Privacy notice NOT found")
success = False
# Step 6: Save output for inspection
print("\n" + "-"*70)
print("Step 6: Save test output")
print("-"*70)
output_file = 'test_newsletter_with_tracking.html'
with open(output_file, 'w', encoding='utf-8') as f:
f.write(html)
print(f"✓ Test newsletter saved to: {output_file}")
print(f" Open it in your browser to inspect the tracking integration")
return success
if __name__ == '__main__':
print("\n" + "="*70)
print("TESTING NEWSLETTER WITH TRACKING")
print("="*70)
success = test_newsletter_with_tracking()
print("\n" + "="*70)
if success:
print("✓ ALL TESTS PASSED")
print("="*70 + "\n")
sys.exit(0)
else:
print("✗ SOME TESTS FAILED")
print("="*70 + "\n")
sys.exit(1)

View File

@@ -0,0 +1,179 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<title>Munich News Daily</title>
<!--[if mso]>
<style type="text/css">
body, table, td {font-family: Arial, Helvetica, sans-serif !important;}
</style>
<![endif]-->
</head>
<body style="margin: 0; padding: 0; background-color: #f4f4f4; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;">
<!-- Wrapper Table -->
<table border="0" cellpadding="0" cellspacing="0" role="presentation" style="background-color: #f4f4f4;" width="100%">
<tr>
<td align="center" style="padding: 20px 0;">
<!-- Main Container -->
<table border="0" cellpadding="0" cellspacing="0" role="presentation" style="background-color: #ffffff; max-width: 600px;" width="600">
<!-- Header -->
<tr>
<td style="background-color: #1a1a1a; padding: 30px 40px; text-align: center;">
<h1 style="margin: 0 0 8px 0; font-size: 28px; font-weight: 700; color: #ffffff; letter-spacing: -0.5px;">
Munich News Daily
</h1>
<p style="margin: 0; font-size: 14px; color: #999999; letter-spacing: 0.5px;">
Tuesday, November 11, 2025
</p>
</td>
</tr>
<!-- Greeting -->
<tr>
<td style="padding: 30px 40px 20px 40px;">
<p style="margin: 0; font-size: 16px; line-height: 1.5; color: #333333;">
Good morning ☀️
</p>
<p style="margin: 15px 0 0 0; font-size: 15px; line-height: 1.6; color: #666666;">
Here's what's happening in Munich today. We've summarized 2 stories using AI so you can stay informed in under 5 minutes.
</p>
</td>
</tr>
<!-- Divider -->
<tr>
<td style="padding: 0 40px;">
<div style="height: 1px; background-color: #e0e0e0;"></div>
</td>
</tr>
<!-- Articles -->
<tr>
<td style="padding: 25px 40px;">
<!-- Article Number Badge -->
<table border="0" cellpadding="0" cellspacing="0" role="presentation" width="100%">
<tr>
<td>
<span style="display: inline-block; background-color: #000000; color: #ffffff; width: 24px; height: 24px; line-height: 24px; text-align: center; border-radius: 50%; font-size: 12px; font-weight: 600;">
1
</span>
</td>
</tr>
</table>
<!-- Article Title -->
<h2 style="margin: 12px 0 8px 0; font-size: 19px; font-weight: 700; line-height: 1.3; color: #1a1a1a;">
Munich Tech Summit Announces 2025 Dates
</h2>
<!-- Article Meta -->
<p style="margin: 0 0 12px 0; font-size: 13px; color: #999999;">
<span style="color: #000000; font-weight: 600;">Munich Tech News</span>
<span> • Tech Reporter</span>
</p>
<!-- Article Summary -->
<p style="margin: 0 0 15px 0; font-size: 15px; line-height: 1.6; color: #333333;">
The annual Munich Tech Summit will return in 2025 with exciting new features.
</p>
<!-- Read More Link -->
<a href="http://localhost:5001/api/track/click/mock-link-0" style="display: inline-block; color: #000000; text-decoration: none; font-size: 14px; font-weight: 600; border-bottom: 2px solid #000000; padding-bottom: 2px;">
Read more →
</a>
</td>
</tr>
<!-- Article Divider -->
<tr>
<td style="padding: 0 40px;">
<div style="height: 1px; background-color: #f0f0f0;"></div>
</td>
</tr>
<tr>
<td style="padding: 25px 40px;">
<!-- Article Number Badge -->
<table border="0" cellpadding="0" cellspacing="0" role="presentation" width="100%">
<tr>
<td>
<span style="display: inline-block; background-color: #000000; color: #ffffff; width: 24px; height: 24px; line-height: 24px; text-align: center; border-radius: 50%; font-size: 12px; font-weight: 600;">
2
</span>
</td>
</tr>
</table>
<!-- Article Title -->
<h2 style="margin: 12px 0 8px 0; font-size: 19px; font-weight: 700; line-height: 1.3; color: #1a1a1a;">
New Public Transport Routes Launched
</h2>
<!-- Article Meta -->
<p style="margin: 0 0 12px 0; font-size: 13px; color: #999999;">
<span style="color: #000000; font-weight: 600;">Munich Transport</span>
<span> • Transport Desk</span>
</p>
<!-- Article Summary -->
<p style="margin: 0 0 15px 0; font-size: 15px; line-height: 1.6; color: #333333;">
MVG announces three new bus routes connecting suburban areas.
</p>
<!-- Read More Link -->
<a href="http://localhost:5001/api/track/click/mock-link-1" style="display: inline-block; color: #000000; text-decoration: none; font-size: 14px; font-weight: 600; border-bottom: 2px solid #000000; padding-bottom: 2px;">
Read more →
</a>
</td>
</tr>
<!-- Article Divider -->
<!-- Bottom Divider -->
<tr>
<td style="padding: 25px 40px 0 40px;">
<div style="height: 1px; background-color: #e0e0e0;"></div>
</td>
</tr>
<!-- Summary Box -->
<tr>
<td style="padding: 30px 40px;">
<table border="0" cellpadding="0" cellspacing="0" role="presentation" style="background-color: #f8f8f8; border-radius: 8px;" width="100%">
<tr>
<td style="padding: 25px; text-align: center;">
<p style="margin: 0 0 8px 0; font-size: 13px; color: #666666; text-transform: uppercase; letter-spacing: 1px; font-weight: 600;">
Today's Digest
</p>
<p style="margin: 0; font-size: 36px; font-weight: 700; color: #000000;">
2
</p>
<p style="margin: 8px 0 0 0; font-size: 14px; color: #666666;">
stories • AI-summarized • 5 min read
</p>
</td>
</tr>
</table>
</td>
</tr>
<!-- Footer -->
<tr>
<td style="background-color: #1a1a1a; padding: 30px 40px; text-align: center;">
<p style="margin: 0 0 15px 0; font-size: 14px; color: #ffffff; font-weight: 600;">
Munich News Daily
</p>
<p style="margin: 0 0 20px 0; font-size: 13px; color: #999999; line-height: 1.5;">
AI-powered news summaries for busy people.<br/>
Delivered daily to your inbox.
</p>
<!-- Footer Links -->
<p style="margin: 0; font-size: 12px; color: #666666;">
<a href="http://localhost:3000" style="color: #999999; text-decoration: none;">Visit Website</a>
<span style="color: #444444;"></span>
<a href="http://localhost:3000/unsubscribe" style="color: #999999; text-decoration: none;">Unsubscribe</a>
</p>
<!-- Privacy Notice -->
<p style="margin: 20px 0 0 0; font-size: 11px; color: #666666; line-height: 1.4;">
This email contains tracking to measure engagement and improve our content.<br/>
We respect your privacy and anonymize data after 90 days.
</p>
<p style="margin: 20px 0 0 0; font-size: 11px; color: #666666;">
© 2025 Munich News Daily. All rights reserved.
</p>
</td>
</tr>
</table>
<!-- End Main Container -->
</td>
</tr>
</table>
<!-- End Wrapper Table -->
<img alt="" height="1" src="http://localhost:5001/api/track/pixel/mock-pixel-123" style="display:block;" width="1"/></body>
</html>

View File

@@ -0,0 +1,187 @@
#!/usr/bin/env python
"""
Test script for tracking integration in newsletter sender.
Tests tracking pixel injection and link replacement.
"""
import sys
from pathlib import Path
# Add backend directory to path
backend_dir = Path(__file__).parent.parent / 'backend'
sys.path.insert(0, str(backend_dir))
from tracking_integration import inject_tracking_pixel, replace_article_links
def test_inject_tracking_pixel():
"""Test that tracking pixel is correctly injected into HTML"""
print("\n" + "="*70)
print("TEST 1: Inject Tracking Pixel")
print("="*70)
# Test HTML
html = """<html>
<body>
<p>Newsletter content</p>
</body>
</html>"""
tracking_id = "test-tracking-123"
api_url = "http://localhost:5001"
# Inject pixel
result = inject_tracking_pixel(html, tracking_id, api_url)
# Verify pixel is present
expected_pixel = f'<img src="{api_url}/api/track/pixel/{tracking_id}" width="1" height="1" alt="" style="display:block;" />'
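    # Exact substring match: this assumes inject_tracking_pixel emits the attributes in this
    # order. Tools that re-serialize the HTML afterwards may reorder them (the saved sample
    # newsletter shows the pixel with alphabetized attributes).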
if expected_pixel in result:
print("✓ Tracking pixel correctly injected")
print(f" Pixel URL: {api_url}/api/track/pixel/{tracking_id}")
return True
else:
print("✗ Tracking pixel NOT found in HTML")
print(f" Expected: {expected_pixel}")
print(f" Result: {result}")
return False
def test_replace_article_links():
"""Test that article links are correctly replaced with tracking URLs"""
print("\n" + "="*70)
print("TEST 2: Replace Article Links")
print("="*70)
# Test HTML with article links
html = """<html>
<body>
<a href="https://example.com/article1">Article 1</a>
<a href="https://example.com/article2">Article 2</a>
<a href="https://example.com/untracked">Untracked Link</a>
</body>
</html>"""
# Tracking map
link_tracking_map = {
"https://example.com/article1": "track-id-1",
"https://example.com/article2": "track-id-2"
}
api_url = "http://localhost:5001"
# Replace links
result = replace_article_links(html, link_tracking_map, api_url)
# Verify replacements
success = True
# Check article 1 link
expected_url_1 = f"{api_url}/api/track/click/track-id-1"
if expected_url_1 in result:
print(f"✓ Article 1 link replaced: {expected_url_1}")
else:
print(f"✗ Article 1 link NOT replaced")
success = False
# Check article 2 link
expected_url_2 = f"{api_url}/api/track/click/track-id-2"
if expected_url_2 in result:
print(f"✓ Article 2 link replaced: {expected_url_2}")
else:
print(f"✗ Article 2 link NOT replaced")
success = False
# Check untracked link remains unchanged
if "https://example.com/untracked" in result:
print(f"✓ Untracked link preserved: https://example.com/untracked")
else:
print(f"✗ Untracked link was modified (should remain unchanged)")
success = False
return success
def test_full_integration():
"""Test full integration: pixel + link replacement"""
print("\n" + "="*70)
print("TEST 3: Full Integration (Pixel + Links)")
print("="*70)
# Test HTML
html = """<html>
<body>
<h1>Newsletter</h1>
<a href="https://example.com/article">Read Article</a>
</body>
</html>"""
api_url = "http://localhost:5001"
pixel_tracking_id = "pixel-123"
link_tracking_map = {
"https://example.com/article": "link-456"
}
# First inject pixel
html = inject_tracking_pixel(html, pixel_tracking_id, api_url)
# Then replace links
html = replace_article_links(html, link_tracking_map, api_url)
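    # The two transforms are independent: the pixel is appended as an <img> near </body>
    # (see the saved sample output), while link replacement only rewrites matching href
    # values, so applying them in either order should give the same result.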
# Verify both are present
success = True
pixel_url = f"{api_url}/api/track/pixel/{pixel_tracking_id}"
if pixel_url in html:
print(f"✓ Tracking pixel present: {pixel_url}")
else:
print(f"✗ Tracking pixel NOT found")
success = False
link_url = f"{api_url}/api/track/click/link-456"
if link_url in html:
print(f"✓ Tracking link present: {link_url}")
else:
print(f"✗ Tracking link NOT found")
success = False
if success:
print("\n✓ Full integration successful!")
print("\nFinal HTML:")
print("-" * 70)
print(html)
print("-" * 70)
return success
if __name__ == '__main__':
print("\n" + "="*70)
print("TRACKING INTEGRATION TEST SUITE")
print("="*70)
results = []
# Run tests
results.append(("Inject Tracking Pixel", test_inject_tracking_pixel()))
results.append(("Replace Article Links", test_replace_article_links()))
results.append(("Full Integration", test_full_integration()))
# Summary
print("\n" + "="*70)
print("TEST SUMMARY")
print("="*70)
passed = sum(1 for _, result in results if result)
total = len(results)
for test_name, result in results:
status = "✓ PASS" if result else "✗ FAIL"
print(f"{status}: {test_name}")
print("-" * 70)
print(f"Results: {passed}/{total} tests passed")
print("="*70 + "\n")
# Exit with appropriate code
sys.exit(0 if passed == total else 1)