Files
2025-11-11 14:34:43 +01:00

408 lines
14 KiB
Markdown

# Email Tracking System Design
## Overview
The email tracking system enables Munich News Daily to measure subscriber engagement through email opens and link clicks. The system uses industry-standard techniques (tracking pixels and redirect URLs) while maintaining privacy compliance and performance.
## Architecture
### High-Level Components
```
┌─────────────────────────────────────────────────────────────┐
│ Newsletter System │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Sender │─────▶│ Tracking │ │
│ │ Service │ │ Generator │ │
│ └──────────────┘ └──────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌──────────────┐ │
│ │ │ MongoDB │ │
│ │ │ (tracking) │ │
│ │ └──────────────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Email │ │
│ │ Client │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ ▲
│ │
▼ │
┌─────────────────────────────────────────────────────────────┐
│ Backend API Server │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Pixel │ │ Link │ │
│ │ Endpoint │ │ Redirect │ │
│ └──────────────┘ └──────────────┘ │
│ │ │ │
│ └──────────┬───────────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ MongoDB │ │
│ │ (tracking) │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### Technology Stack
- **Backend**: Flask (Python) - existing backend server
- **Database**: MongoDB - existing database with new collections
- **Email**: SMTP (existing sender service)
- **Tracking**: UUID-based unique identifiers
- **Image**: 1x1 transparent PNG (base64 encoded)
## Components and Interfaces
### 1. Tracking ID Generator
**Purpose**: Generate unique tracking identifiers for emails and links
**Module**: `backend/services/tracking_service.py`
**Functions**:
```python
def generate_tracking_id() -> str:
"""Generate a unique tracking ID using UUID4"""
return str(uuid.uuid4())
def create_newsletter_tracking(newsletter_id: str, subscriber_email: str) -> dict:
"""Create tracking record for a newsletter send"""
# Returns tracking document with IDs for pixel and links
```
### 2. Tracking Pixel Endpoint
**Purpose**: Serve 1x1 transparent PNG and log email opens
**Endpoint**: `GET /api/track/pixel/<tracking_id>`
**Flow**:
1. Receive request with tracking_id
2. Look up tracking record in database
3. Log open event (email, timestamp, user-agent)
4. Return 1x1 transparent PNG image
5. Handle multiple opens (update last_opened_at)
**Response**:
- Status: 200 OK
- Content-Type: image/png
- Body: 1x1 transparent PNG (43 bytes)
### 3. Link Tracking Endpoint
**Purpose**: Track link clicks and redirect to original URL
**Endpoint**: `GET /api/track/click/<tracking_id>`
**Flow**:
1. Receive request with tracking_id
2. Look up tracking record and original URL
3. Log click event (email, article_url, timestamp, user-agent)
4. Redirect to original article URL (302 redirect)
5. Handle errors gracefully (redirect to homepage if invalid)
**Response**:
- Status: 302 Found
- Location: Original article URL
- Performance: < 200ms redirect time
### 4. Newsletter Template Modifier
**Purpose**: Inject tracking pixel and replace article links
**Module**: `news_sender/tracking_integration.py`
**Functions**:
```python
def inject_tracking_pixel(html: str, tracking_id: str, api_url: str) -> str:
"""Inject tracking pixel before closing </body> tag"""
pixel_url = f"{api_url}/api/track/pixel/{tracking_id}"
pixel_html = f'<img src="{pixel_url}" width="1" height="1" alt="" />'
return html.replace('</body>', f'{pixel_html}</body>')
def replace_article_links(html: str, articles: list, tracking_map: dict, api_url: str) -> str:
"""Replace article links with tracking URLs"""
# For each article link, replace with tracking URL
```
### 5. Analytics Service
**Purpose**: Calculate engagement metrics and identify active users
**Module**: `backend/services/analytics_service.py`
**Functions**:
```python
def get_open_rate(newsletter_id: str) -> float:
"""Calculate percentage of subscribers who opened"""
def get_click_rate(article_url: str) -> float:
"""Calculate percentage of subscribers who clicked"""
def get_subscriber_activity_status(email: str) -> str:
"""Return 'active', 'inactive', or 'dormant'"""
def update_subscriber_activity_statuses():
"""Batch update all subscriber activity statuses"""
```
## Data Models
### Newsletter Sends Collection (`newsletter_sends`)
Tracks each newsletter sent to each subscriber.
```javascript
{
_id: ObjectId,
newsletter_id: String, // Unique ID for this newsletter batch (date-based)
subscriber_email: String, // Recipient email
tracking_id: String, // Unique tracking ID for this send (UUID)
sent_at: DateTime, // When email was sent
opened: Boolean, // Whether email was opened
first_opened_at: DateTime, // First open timestamp (null if not opened)
last_opened_at: DateTime, // Most recent open timestamp
open_count: Number, // Number of times opened
created_at: DateTime // Record creation time
}
```
**Indexes**:
- `tracking_id` (unique) - Fast lookup for pixel requests
- `newsletter_id` - Analytics queries
- `subscriber_email` - User activity queries
- `sent_at` - Time-based queries
### Link Clicks Collection (`link_clicks`)
Tracks individual link clicks.
```javascript
{
_id: ObjectId,
tracking_id: String, // Unique tracking ID for this link (UUID)
newsletter_id: String, // Which newsletter this link was in
subscriber_email: String, // Who clicked
article_url: String, // Original article URL
article_title: String, // Article title for reporting
clicked_at: DateTime, // When link was clicked
user_agent: String, // Browser/client info
created_at: DateTime // Record creation time
}
```
**Indexes**:
- `tracking_id` (unique) - Fast lookup for redirect requests
- `newsletter_id` - Analytics queries
- `article_url` - Article performance queries
- `subscriber_email` - User activity queries
### Subscriber Activity Collection (`subscriber_activity`)
Aggregated activity status for each subscriber.
```javascript
{
_id: ObjectId,
email: String, // Subscriber email (unique)
status: String, // 'active', 'inactive', or 'dormant'
last_opened_at: DateTime, // Most recent email open
last_clicked_at: DateTime, // Most recent link click
total_opens: Number, // Lifetime open count
total_clicks: Number, // Lifetime click count
newsletters_received: Number, // Total newsletters sent
newsletters_opened: Number, // Total newsletters opened
updated_at: DateTime // Last status update
}
```
**Indexes**:
- `email` (unique) - Fast lookup
- `status` - Filter by activity level
- `last_opened_at` - Time-based queries
## Error Handling
### Tracking Pixel Failures
- **Invalid tracking_id**: Return 1x1 transparent PNG anyway (don't break email rendering)
- **Database error**: Log error, return pixel (fail silently)
- **Multiple opens**: Update existing record, don't create duplicate
### Link Redirect Failures
- **Invalid tracking_id**: Redirect to website homepage
- **Database error**: Log error, redirect to homepage
- **Missing original URL**: Redirect to homepage
### Privacy Compliance
- **Data retention**: Anonymize tracking data after 90 days
- Remove email addresses
- Keep aggregated metrics
- **Opt-out**: Check subscriber preferences before tracking
- **GDPR deletion**: Provide endpoint to delete all tracking data for a user
## Testing Strategy
### Unit Tests
1. **Tracking ID Generation**
- Test UUID format
- Test uniqueness
2. **Pixel Endpoint**
- Test valid tracking_id returns PNG
- Test invalid tracking_id returns PNG
- Test database logging
3. **Link Redirect**
- Test valid tracking_id redirects correctly
- Test invalid tracking_id redirects to homepage
- Test click logging
4. **Analytics Calculations**
- Test open rate calculation
- Test click rate calculation
- Test activity status classification
### Integration Tests
1. **End-to-End Newsletter Flow**
- Send newsletter with tracking
- Simulate email open (pixel request)
- Simulate link click
- Verify database records
2. **Privacy Compliance**
- Test data anonymization
- Test user data deletion
- Test opt-out handling
### Performance Tests
1. **Redirect Speed**
- Measure redirect time (target: < 200ms)
- Test under load (100 concurrent requests)
2. **Pixel Serving**
- Test pixel response time
- Test caching headers
## API Endpoints
### Tracking Endpoints
```
GET /api/track/pixel/<tracking_id>
- Returns: 1x1 transparent PNG
- Logs: Email open event
GET /api/track/click/<tracking_id>
- Returns: 302 redirect to article URL
- Logs: Link click event
```
### Analytics Endpoints
```
GET /api/analytics/newsletter/<newsletter_id>
- Returns: Open rate, click rate, engagement metrics
GET /api/analytics/article/<article_id>
- Returns: Click count, click rate for specific article
GET /api/analytics/subscriber/<email>
- Returns: Activity status, engagement history
POST /api/analytics/update-activity
- Triggers: Batch update of subscriber activity statuses
- Returns: Update count
```
### Privacy Endpoints
```
DELETE /api/tracking/subscriber/<email>
- Deletes: All tracking data for subscriber
- Returns: Deletion confirmation
POST /api/tracking/anonymize
- Triggers: Anonymize tracking data older than 90 days
- Returns: Anonymization count
```
## Implementation Phases
### Phase 1: Core Tracking (MVP)
- Tracking ID generation
- Pixel endpoint
- Link redirect endpoint
- Database collections
- Newsletter template integration
### Phase 2: Analytics
- Open rate calculation
- Click rate calculation
- Activity status classification
- Analytics API endpoints
### Phase 3: Privacy & Compliance
- Data anonymization
- User data deletion
- Opt-out handling
- Privacy notices
### Phase 4: Optimization
- Caching for pixel endpoint
- Performance monitoring
- Batch processing for activity updates
## Security Considerations
1. **Rate Limiting**: Prevent abuse of tracking endpoints
2. **Input Validation**: Validate all tracking_ids (UUID format)
3. **SQL Injection**: Use parameterized queries (MongoDB safe by default)
4. **Privacy**: Don't expose subscriber emails in URLs
5. **HTTPS**: Ensure all tracking URLs use HTTPS in production
## Configuration
Add to `backend/.env`:
```env
# Tracking Configuration
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5000
TRACKING_DATA_RETENTION_DAYS=90
```
## Monitoring and Metrics
### Key Metrics to Track
1. **Email Opens**
- Overall open rate
- Open rate by newsletter
- Time to first open
2. **Link Clicks**
- Overall click rate
- Click rate by article
- Click-through rate (CTR)
3. **Subscriber Engagement**
- Active subscriber count
- Inactive subscriber count
- Dormant subscriber count
4. **System Performance**
- Pixel response time
- Redirect response time
- Database query performance