# Email Tracking System Design

## Overview

The email tracking system enables Munich News Daily to measure subscriber engagement through email opens and link clicks. The system uses industry-standard techniques (tracking pixels and redirect URLs) while maintaining privacy compliance and performance.

## Architecture

### High-Level Components

```
┌─────────────────────────────────────────────────────────────┐
│                      Newsletter System                      │
│                                                             │
│  ┌──────────────┐      ┌──────────────┐                     │
│  │    Sender    │─────▶│   Tracking   │                     │
│  │   Service    │      │  Generator   │                     │
│  └──────────────┘      └──────────────┘                     │
│         │                     │                             │
│         │                     ▼                             │
│         │              ┌──────────────┐                     │
│         │              │   MongoDB    │                     │
│         │              │  (tracking)  │                     │
│         │              └──────────────┘                     │
│         ▼                                                   │
│  ┌──────────────┐                                           │
│  │    Email     │                                           │
│  │    Client    │                                           │
│  └──────────────┘                                           │
└─────────────────────────────────────────────────────────────┘
          │                     ▲
          │                     │
          ▼                     │
┌─────────────────────────────────────────────────────────────┐
│                     Backend API Server                      │
│                                                             │
│  ┌──────────────┐      ┌──────────────┐                     │
│  │    Pixel     │      │     Link     │                     │
│  │   Endpoint   │      │   Redirect   │                     │
│  └──────────────┘      └──────────────┘                     │
│         │                     │                             │
│         └──────────┬──────────┘                             │
│                    ▼                                        │
│             ┌──────────────┐                                │
│             │   MongoDB    │                                │
│             │  (tracking)  │                                │
│             └──────────────┘                                │
└─────────────────────────────────────────────────────────────┘
```

### Technology Stack

- **Backend**: Flask (Python) - existing backend server
- **Database**: MongoDB - existing database with new collections
- **Email**: SMTP (existing sender service)
- **Tracking**: UUID-based unique identifiers
- **Image**: 1x1 transparent PNG (base64 encoded)

## Components and Interfaces

### 1. Tracking ID Generator

**Purpose**: Generate unique tracking identifiers for emails and links

**Module**: `backend/services/tracking_service.py`

**Functions**:

```python
import uuid


def generate_tracking_id() -> str:
    """Generate a unique tracking ID using UUID4"""
    return str(uuid.uuid4())


def create_newsletter_tracking(newsletter_id: str, subscriber_email: str) -> dict:
    """Create tracking record for a newsletter send"""
    # Returns tracking document with IDs for pixel and links
```

### 2. Tracking Pixel Endpoint

**Purpose**: Serve 1x1 transparent PNG and log email opens

**Endpoint**: `GET /api/track/pixel/<tracking_id>`

**Flow**:

1. Receive request with tracking_id
2. Look up tracking record in database
3. Log open event (email, timestamp, user-agent)
4. Return 1x1 transparent PNG image
5. Handle multiple opens (update `last_opened_at` and `open_count` on the existing record)

**Response**:

- Status: 200 OK
- Content-Type: image/png
- Body: 1x1 transparent PNG (under 100 bytes)

### 3. Link Tracking Endpoint

**Purpose**: Track link clicks and redirect to original URL

**Endpoint**: `GET /api/track/click/<tracking_id>`

**Flow**:

1. Receive request with tracking_id
2. Look up tracking record and original URL
3. Log click event (email, article_url, timestamp, user-agent)
4. Redirect to original article URL (302 redirect)
5. Handle errors gracefully (redirect to homepage if invalid)

**Response**:

- Status: 302 Found
- Location: Original article URL
- Performance: < 200ms redirect time

### 4. Newsletter Template Modifier

**Purpose**: Inject tracking pixel and replace article links

**Module**: `news_sender/tracking_integration.py`

**Functions**:

```python
def inject_tracking_pixel(html: str, tracking_id: str, api_url: str) -> str:
    """Inject tracking pixel before closing </body> tag"""
    pixel_url = f"{api_url}/api/track/pixel/{tracking_id}"
    # Assumed minimal markup: a 1x1 <img> pointing at the pixel URL
    pixel_html = f'<img src="{pixel_url}" width="1" height="1" alt="" />'
    return html.replace('</body>', f'{pixel_html}</body>')


def replace_article_links(html: str, articles: list, tracking_map: dict, api_url: str) -> str:
    """Replace article links with tracking URLs"""
    # For each article link, replace with tracking URL
```

### 5. Analytics Service

**Purpose**: Calculate engagement metrics and identify active users

**Module**: `backend/services/analytics_service.py`

**Functions**:

```python
def get_open_rate(newsletter_id: str) -> float:
    """Calculate percentage of subscribers who opened"""


def get_click_rate(article_url: str) -> float:
    """Calculate percentage of subscribers who clicked"""


def get_subscriber_activity_status(email: str) -> str:
    """Return 'active', 'inactive', or 'dormant'"""


def update_subscriber_activity_statuses():
    """Batch update all subscriber activity statuses"""
```

## Data Models

### Newsletter Sends Collection (`newsletter_sends`)

Tracks each newsletter sent to each subscriber.
```javascript
{
  _id: ObjectId,
  newsletter_id: String,      // Unique ID for this newsletter batch (date-based)
  subscriber_email: String,   // Recipient email
  tracking_id: String,        // Unique tracking ID for this send (UUID)
  sent_at: DateTime,          // When email was sent
  opened: Boolean,            // Whether email was opened
  first_opened_at: DateTime,  // First open timestamp (null if not opened)
  last_opened_at: DateTime,   // Most recent open timestamp
  open_count: Number,         // Number of times opened
  created_at: DateTime        // Record creation time
}
```

**Indexes**:

- `tracking_id` (unique) - Fast lookup for pixel requests
- `newsletter_id` - Analytics queries
- `subscriber_email` - User activity queries
- `sent_at` - Time-based queries

### Link Clicks Collection (`link_clicks`)

Tracks individual link clicks.

```javascript
{
  _id: ObjectId,
  tracking_id: String,        // Unique tracking ID for this link (UUID)
  newsletter_id: String,      // Which newsletter this link was in
  subscriber_email: String,   // Who clicked
  article_url: String,        // Original article URL
  article_title: String,      // Article title for reporting
  clicked_at: DateTime,       // When link was clicked
  user_agent: String,         // Browser/client info
  created_at: DateTime        // Record creation time
}
```

**Indexes**:

- `tracking_id` (unique) - Fast lookup for redirect requests
- `newsletter_id` - Analytics queries
- `article_url` - Article performance queries
- `subscriber_email` - User activity queries

### Subscriber Activity Collection (`subscriber_activity`)

Aggregated activity status for each subscriber.
```javascript
{
  _id: ObjectId,
  email: String,                 // Subscriber email (unique)
  status: String,                // 'active', 'inactive', or 'dormant'
  last_opened_at: DateTime,      // Most recent email open
  last_clicked_at: DateTime,     // Most recent link click
  total_opens: Number,           // Lifetime open count
  total_clicks: Number,          // Lifetime click count
  newsletters_received: Number,  // Total newsletters sent
  newsletters_opened: Number,    // Total newsletters opened
  updated_at: DateTime           // Last status update
}
```

**Indexes**:

- `email` (unique) - Fast lookup
- `status` - Filter by activity level
- `last_opened_at` - Time-based queries

## Error Handling

### Tracking Pixel Failures

- **Invalid tracking_id**: Return 1x1 transparent PNG anyway (don't break email rendering)
- **Database error**: Log error, return pixel (fail silently)
- **Multiple opens**: Update existing record, don't create duplicate

### Link Redirect Failures

- **Invalid tracking_id**: Redirect to website homepage
- **Database error**: Log error, redirect to homepage
- **Missing original URL**: Redirect to homepage

### Privacy Compliance

- **Data retention**: Anonymize tracking data after 90 days
  - Remove email addresses
  - Keep aggregated metrics
- **Opt-out**: Check subscriber preferences before tracking
- **GDPR deletion**: Provide endpoint to delete all tracking data for a user

## Testing Strategy

### Unit Tests

1. **Tracking ID Generation**
   - Test UUID format
   - Test uniqueness
2. **Pixel Endpoint**
   - Test valid tracking_id returns PNG
   - Test invalid tracking_id returns PNG
   - Test database logging
3. **Link Redirect**
   - Test valid tracking_id redirects correctly
   - Test invalid tracking_id redirects to homepage
   - Test click logging
4. **Analytics Calculations**
   - Test open rate calculation
   - Test click rate calculation
   - Test activity status classification

### Integration Tests

1. **End-to-End Newsletter Flow**
   - Send newsletter with tracking
   - Simulate email open (pixel request)
   - Simulate link click
   - Verify database records
2. **Privacy Compliance**
   - Test data anonymization
   - Test user data deletion
   - Test opt-out handling

### Performance Tests

1. **Redirect Speed**
   - Measure redirect time (target: < 200ms)
   - Test under load (100 concurrent requests)
2. **Pixel Serving**
   - Test pixel response time
   - Test caching headers

## API Endpoints

### Tracking Endpoints

```
GET /api/track/pixel/<tracking_id>
- Returns: 1x1 transparent PNG
- Logs: Email open event

GET /api/track/click/<tracking_id>
- Returns: 302 redirect to article URL
- Logs: Link click event
```

### Analytics Endpoints

```
GET /api/analytics/newsletter/
- Returns: Open rate, click rate, engagement metrics

GET /api/analytics/article/
- Returns: Click count, click rate for specific article

GET /api/analytics/subscriber/
- Returns: Activity status, engagement history

POST /api/analytics/update-activity
- Triggers: Batch update of subscriber activity statuses
- Returns: Update count
```

### Privacy Endpoints

```
DELETE /api/tracking/subscriber/
- Deletes: All tracking data for subscriber
- Returns: Deletion confirmation

POST /api/tracking/anonymize
- Triggers: Anonymize tracking data older than 90 days
- Returns: Anonymization count
```

## Implementation Phases

### Phase 1: Core Tracking (MVP)

- Tracking ID generation
- Pixel endpoint
- Link redirect endpoint
- Database collections
- Newsletter template integration

### Phase 2: Analytics

- Open rate calculation
- Click rate calculation
- Activity status classification
- Analytics API endpoints

### Phase 3: Privacy & Compliance

- Data anonymization
- User data deletion
- Opt-out handling
- Privacy notices

### Phase 4: Optimization

- Caching for pixel endpoint
- Performance monitoring
- Batch processing for activity updates

## Security Considerations

1. **Rate Limiting**: Prevent abuse of tracking endpoints
2. **Input Validation**: Validate all tracking_ids (UUID format)
3. **Query Injection**: Pass tracking_ids to MongoDB only as validated plain strings, never as request-supplied objects, so query operators cannot be injected
4. **Privacy**: Don't expose subscriber emails in URLs
5. **HTTPS**: Ensure all tracking URLs use HTTPS in production

## Configuration

Add to `backend/.env`:

```env
# Tracking Configuration
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5000
TRACKING_DATA_RETENTION_DAYS=90
```

## Monitoring and Metrics

### Key Metrics to Track

1. **Email Opens**
   - Overall open rate
   - Open rate by newsletter
   - Time to first open
2. **Link Clicks**
   - Overall click rate
   - Click rate by article
   - Click-through rate (CTR)
3. **Subscriber Engagement**
   - Active subscriber count
   - Inactive subscriber count
   - Dormant subscriber count
4. **System Performance**
   - Pixel response time
   - Redirect response time
   - Database query performance
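
### Example: Computing Open Rate and Activity Status

The open-rate and subscriber-engagement metrics above could be computed along the following lines. This is a minimal sketch, not the project's implementation: it works on plain dictionaries shaped like `newsletter_sends` documents rather than on a live MongoDB collection, the helper names `open_rate` and `activity_status` are hypothetical, and the 30-day and 60-day thresholds are assumptions, since this design does not define when a subscriber counts as active, inactive, or dormant.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def open_rate(sends: Iterable[dict]) -> float:
    """Return the percentage of send records whose `opened` flag is set."""
    records = list(sends)
    if not records:
        # No sends means no meaningful rate; report 0.0 instead of dividing by zero.
        return 0.0
    opened = sum(1 for record in records if record.get("opened"))
    return 100.0 * opened / len(records)


def activity_status(
    last_opened_at: Optional[datetime],
    now: datetime,
    active_days: int = 30,   # assumed threshold, not specified in this design
    dormant_days: int = 60,  # assumed threshold, not specified in this design
) -> str:
    """Classify a subscriber as 'active', 'inactive', or 'dormant'."""
    if last_opened_at is None:
        # A subscriber who never opened an email is treated as dormant here.
        return "dormant"
    age = now - last_opened_at
    if age <= timedelta(days=active_days):
        return "active"
    if age <= timedelta(days=dormant_days):
        return "inactive"
    return "dormant"


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    sends = [{"opened": True}, {"opened": False}, {"opened": True}, {"opened": False}]
    print(open_rate(sends))
    print(activity_status(now - timedelta(days=10), now))
```

Keeping the thresholds as parameters lets the batch status update and the unit tests share one classification rule, and passing `now` explicitly keeps the function deterministic under test.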