update
24
news_sender/Dockerfile
Normal file
@@ -0,0 +1,24 @@
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy sender files
COPY . .

# Copy backend files (needed for tracking and config).
# NOTE: COPY sources are resolved inside the build context, so these ../backend
# paths only work if the context passed to `docker build` contains backend/.
COPY ../backend/services /app/backend/services
COPY ../backend/.env /app/.env

# Make the scheduler executable
RUN chmod +x scheduled_sender.py

# Set timezone to Berlin
ENV TZ=Europe/Berlin
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Run the scheduled sender
CMD ["python", "-u", "scheduled_sender.py"]
@@ -1,303 +0,0 @@
# News Sender Microservice

Standalone service for sending Munich News Daily newsletters to subscribers.

## Features

- 📧 Sends beautiful HTML newsletters
- 🤖 Uses AI-generated article summaries
- 📊 Tracks sending statistics
- 🧪 Test mode for development
- 📝 Preview generation
- 🔄 Fetches data from shared MongoDB

## Installation

```bash
cd news_sender
pip install -r requirements.txt
```

## Configuration

The service uses the same `.env` file as the backend (`../backend/.env`):

```env
# MongoDB
MONGODB_URI=mongodb://localhost:27017/

# Email (Gmail example)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password

# Newsletter Settings (optional)
NEWSLETTER_MAX_ARTICLES=10
WEBSITE_URL=http://localhost:3000
```
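
A minimal sketch of how these variables might be read in Python, using only the standard library. The `SenderSettings` class and `load_settings` helper are hypothetical names for illustration, not part of the service. The defaults for `NEWSLETTER_MAX_ARTICLES` and `WEBSITE_URL` follow the values shown above; the remaining defaults are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SenderSettings:
    """Hypothetical container for the variables listed above."""
    mongodb_uri: str
    smtp_server: str
    smtp_port: int
    email_user: str
    email_password: str
    max_articles: int
    website_url: str


def load_settings(env=None):
    """Read settings from a mapping; falls back to os.environ when none is given."""
    env = os.environ if env is None else env
    return SenderSettings(
        mongodb_uri=env.get("MONGODB_URI", "mongodb://localhost:27017/"),
        smtp_server=env.get("SMTP_SERVER", "smtp.gmail.com"),
        smtp_port=int(env.get("SMTP_PORT", "587")),
        email_user=env.get("EMAIL_USER", ""),          # assumed default: empty
        email_password=env.get("EMAIL_PASSWORD", ""),  # assumed default: empty
        max_articles=int(env.get("NEWSLETTER_MAX_ARTICLES", "10")),
        website_url=env.get("WEBSITE_URL", "http://localhost:3000"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.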

**Gmail Setup:**
1. Enable 2-factor authentication
2. Generate an App Password: https://support.google.com/accounts/answer/185833
3. Use the App Password (not your regular password)
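
As a rough illustration of where the App Password ends up, here is a sketch of sending one HTML message over SMTP with STARTTLS using Python's standard `smtplib` and `email` modules. The function names `build_message` and `send_via_starttls` are hypothetical and are not taken from `sender_service.py`.

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, recipient, subject, html):
    """Build a multipart/alternative message carrying a single HTML part."""
    msg = MIMEMultipart("alternative")
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.attach(MIMEText(html, "html", "utf-8"))
    return msg


def send_via_starttls(msg, server, port, user, app_password):
    """Open an SMTP session, upgrade it with STARTTLS, log in, and send."""
    with smtplib.SMTP(server, port, timeout=30) as smtp:
        smtp.starttls()                  # port 587 expects STARTTLS before login
        smtp.login(user, app_password)   # the App Password, not the account password
        smtp.send_message(msg)
```

Building the message separately from sending it makes it possible to inspect headers and body without a network connection.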

## Usage

### 1. Preview Newsletter

Generate HTML preview without sending:

```bash
python sender_service.py preview
```

This creates `newsletter_preview.html` - open it in your browser to see how the newsletter looks.

### 2. Send Test Email

Send to a single email address for testing:

```bash
python sender_service.py test your-email@example.com
```

### 3. Send to All Subscribers

Send newsletter to all active subscribers:

```bash
# Send with default article count (10)
python sender_service.py send

# Send with custom article count
python sender_service.py send 15
```

### 4. Use as Python Module

```python
from sender_service import send_newsletter, preview_newsletter

# Send newsletter
result = send_newsletter(max_articles=10)
print(f"Sent to {result['sent_count']} subscribers")

# Generate preview
html = preview_newsletter(max_articles=5)
```

## How It Works

```
┌─────────────────────────────────────────────────────────┐
│  1. Fetch Articles from MongoDB                         │
│     - Get latest articles with AI summaries             │
│     - Sort by creation date (newest first)              │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  2. Fetch Active Subscribers                            │
│     - Get all subscribers with status='active'          │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  3. Render Newsletter HTML                              │
│     - Load newsletter_template.html                     │
│     - Populate with articles and metadata               │
│     - Generate beautiful HTML email                     │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  4. Send Emails                                         │
│     - Connect to SMTP server                            │
│     - Send to each subscriber                           │
│     - Track success/failure                             │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  5. Report Statistics                                   │
│     - Total sent                                        │
│     - Failed sends                                      │
│     - Error details                                     │
└─────────────────────────────────────────────────────────┘
```
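
The five steps above can be sketched as one function with the I/O injected as callables. This is an illustrative outline, not the code in `sender_service.py`: `run_pipeline` and its four parameters are hypothetical, the `errors` list and the "no articles" message are assumptions, and the remaining result keys mirror the ones the scheduler reads (`success`, `sent_count`, `failed_count`, `total_subscribers`, `article_count`).

```python
def run_pipeline(fetch_articles, fetch_subscribers, render, send):
    """Run the five newsletter steps using injected (hypothetical) callables."""
    articles = fetch_articles()                      # step 1: fetch articles
    if not articles:
        return {"success": False, "error": "No articles with summaries"}

    subscribers = fetch_subscribers()                # step 2: fetch subscribers
    if not subscribers:
        return {"success": False, "error": "No active subscribers"}

    html = render(articles)                          # step 3: render HTML once

    sent, errors = 0, []
    for email in subscribers:                        # step 4: send per subscriber
        ok, err = send(email, html)
        if ok:
            sent += 1
        else:
            errors.append({"email": email, "error": err})

    return {                                         # step 5: report statistics
        "success": True,
        "sent_count": sent,
        "failed_count": len(errors),
        "total_subscribers": len(subscribers),
        "article_count": len(articles),
        "errors": errors,
    }
```

One failed recipient is recorded and skipped rather than aborting the whole run, which matches the "Track success/failure" step in the diagram.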

## Output Example

```
======================================================================
📧 Munich News Daily - Newsletter Sender
======================================================================

Fetching latest 10 articles with AI summaries...
✓ Found 10 articles

Fetching active subscribers...
✓ Found 150 active subscriber(s)

Rendering newsletter HTML...
✓ Newsletter rendered

Sending newsletter: 'Munich News Daily - November 10, 2024'
----------------------------------------------------------------------
[1/150] Sending to user1@example.com... ✓
[2/150] Sending to user2@example.com... ✓
[3/150] Sending to user3@example.com... ✓
...

======================================================================
📊 Sending Complete
======================================================================
✓ Successfully sent: 148
✗ Failed: 2
📰 Articles included: 10
======================================================================
```

## Scheduling

### Using Cron (Linux/Mac)

Send newsletter daily at 8 AM:

```bash
# Edit crontab
crontab -e

# Add this line
0 8 * * * cd /path/to/news_sender && /path/to/venv/bin/python sender_service.py send
```

### Using systemd Timer (Linux)

Create `/etc/systemd/system/news-sender.service`:

```ini
[Unit]
Description=Munich News Sender

[Service]
Type=oneshot
WorkingDirectory=/path/to/news_sender
ExecStart=/path/to/venv/bin/python sender_service.py send
User=your-user
```

Create `/etc/systemd/system/news-sender.timer`:

```ini
[Unit]
Description=Send Munich News Daily at 8 AM

[Timer]
OnCalendar=*-*-* 08:00:00

[Install]
WantedBy=timers.target
```

Enable and start:

```bash
sudo systemctl enable news-sender.timer
sudo systemctl start news-sender.timer
```

### Using Docker

Create `Dockerfile`:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY sender_service.py newsletter_template.html ./

CMD ["python", "sender_service.py", "send"]
```

Build and run:

```bash
docker build -t news-sender .
docker run --env-file ../backend/.env news-sender
```

## Troubleshooting

### "Email credentials not configured"
- Check that `EMAIL_USER` and `EMAIL_PASSWORD` are set in `.env`
- For Gmail, use an App Password, not your regular password

### "No articles with summaries found"
- Run the crawler first: `cd ../news_crawler && python crawler_service.py 10`
- Make sure Ollama is enabled and working
- Check MongoDB has articles with `summary` field
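
To check the last point, a query filter along these lines could be used. This is a sketch under assumptions: the `created_at` field name is inferred from "Sort by creation date" and may differ in the real schema, and `summary_filter` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def summary_filter(hours=24, now=None):
    """Build a MongoDB filter for recent articles that have a non-empty summary."""
    now = now or datetime.now(timezone.utc)
    return {
        # `summary` must exist and be neither null nor an empty string
        "summary": {"$exists": True, "$nin": [None, ""]},
        # assumed field name for the article creation timestamp
        "created_at": {"$gte": now - timedelta(hours=hours)},
    }
```

With pymongo, the filter could then be passed to `count_documents` on the articles collection (the collection name is not shown in this document); a count of zero would explain the error message.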

### "No active subscribers found"
- Add subscribers via the backend API
- Check subscriber status is 'active' in MongoDB

### SMTP Connection Errors
- Verify SMTP server and port are correct
- Check firewall isn't blocking SMTP port
- For Gmail, use an App Password (Google has retired the "Less secure app access" setting)

### Emails Going to Spam
- Set up SPF, DKIM, and DMARC records for your domain
- Use a verified email address
- Avoid spam trigger words in subject/content
- Include unsubscribe link (already included in template)

## Architecture

This is a standalone microservice that:
- Runs independently of the backend
- Shares the same MongoDB database
- Can be deployed separately
- Can be scheduled independently
- Has no dependencies on backend code

## Integration with Other Services

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Backend    │     │   Crawler    │     │    Sender    │
│   (Flask)    │     │  (Scraper)   │     │   (Email)    │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       │                    │                    │
       └────────────────────┴────────────────────┘
                            │
                    ┌───────▼────────┐
                    │    MongoDB     │
                    │  (Shared DB)   │
                    └────────────────┘
```

## Next Steps

1. **Test the newsletter:**
   ```bash
   python sender_service.py test your-email@example.com
   ```

2. **Schedule daily sending:**
   - Set up cron job or systemd timer
   - Choose appropriate time (e.g., 8 AM)

3. **Monitor sending:**
   - Check logs for errors
   - Track open rates (requires email tracking service)
   - Monitor spam complaints

4. **Optimize:**
   - Add email tracking pixels
   - A/B test subject lines
   - Personalize content per subscriber
@@ -146,6 +146,14 @@
 <a href="{{ unsubscribe_link }}" style="color: #999999; text-decoration: none;">Unsubscribe</a>
 </p>

+{% if tracking_enabled %}
+<!-- Privacy Notice -->
+<p style="margin: 20px 0 0 0; font-size: 11px; color: #666666; line-height: 1.4;">
+This email contains tracking to measure engagement and improve our content.<br>
+We respect your privacy and anonymize data after 90 days.
+</p>
+{% endif %}
+
 <p style="margin: 20px 0 0 0; font-size: 11px; color: #666666;">
 © {{ year }} Munich News Daily. All rights reserved.
 </p>
@@ -1,3 +1,6 @@
 pymongo==4.6.1
 python-dotenv==1.0.0
 Jinja2==3.1.2
+beautifulsoup4==4.12.2
+schedule==1.2.0
+pytz==2023.3
178
news_sender/scheduled_sender.py
Executable file
@@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
Scheduled newsletter sender that runs daily at 7 AM Berlin time
Waits for crawler to finish before sending to ensure fresh content
"""
import schedule
import time
from datetime import datetime, timedelta
import pytz
from pathlib import Path
import sys

# Add current directory to path
sys.path.insert(0, str(Path(__file__).parent))

from sender_service import send_newsletter, get_latest_articles, Config

# Berlin timezone
BERLIN_TZ = pytz.timezone('Europe/Berlin')

# Maximum time to wait for crawler (in minutes)
MAX_WAIT_TIME = 30

def check_crawler_finished():
    """
    Check if crawler has finished by looking for recent articles
    Returns: (bool, str) - (is_finished, message)
    """
    try:
        # Check if we have articles from the last 2 hours
        articles = get_latest_articles(max_articles=1, hours=2)

        if articles:
            # Check if the most recent article was crawled recently (within last 2 hours)
            latest_article = articles[0]
            crawled_at = latest_article.get('crawled_at')

            if crawled_at:
                time_since_crawl = datetime.utcnow() - crawled_at
                minutes_since = time_since_crawl.total_seconds() / 60

                if minutes_since < 120:  # Within last 2 hours
                    return True, f"Crawler finished {int(minutes_since)} minutes ago"

        return False, "No recent articles found"

    except Exception as e:
        return False, f"Error checking crawler status: {e}"


def wait_for_crawler(max_wait_minutes=30):
    """
    Wait for crawler to finish before sending newsletter

    Args:
        max_wait_minutes: Maximum time to wait in minutes

    Returns:
        bool: True if crawler finished, False if timeout
    """
    berlin_time = datetime.now(BERLIN_TZ)
    print(f"\n⏳ Waiting for crawler to finish...")
    print(f"   Current time: {berlin_time.strftime('%H:%M:%S %Z')}")
    print(f"   Max wait time: {max_wait_minutes} minutes")

    start_time = time.time()
    check_interval = 30  # Check every 30 seconds

    while True:
        elapsed_minutes = (time.time() - start_time) / 60

        # Check if crawler finished
        is_finished, message = check_crawler_finished()

        if is_finished:
            print(f"   ✓ {message}")
            return True

        # Check if we've exceeded max wait time
        if elapsed_minutes >= max_wait_minutes:
            print(f"   ⚠ Timeout after {max_wait_minutes} minutes")
            print(f"   Proceeding with available articles...")
            return False

        # Show progress
        remaining = max_wait_minutes - elapsed_minutes
        print(f"   ⏳ Still waiting... ({remaining:.1f} minutes remaining) - {message}")

        # Wait before next check
        time.sleep(check_interval)


def run_sender():
    """Run the newsletter sender with crawler coordination"""
    berlin_time = datetime.now(BERLIN_TZ)
    print(f"\n{'='*70}")
    print(f"📧 Scheduled newsletter sender started")
    print(f"   Time: {berlin_time.strftime('%Y-%m-%d %H:%M:%S %Z')}")
    print(f"{'='*70}\n")

    try:
        # Wait for crawler to finish (max 30 minutes)
        crawler_finished = wait_for_crawler(max_wait_minutes=MAX_WAIT_TIME)

        if not crawler_finished:
            print(f"\n⚠ Crawler may still be running, but proceeding anyway...")

        print(f"\n{'='*70}")
        print(f"📧 Starting newsletter send...")
        print(f"{'='*70}\n")

        # Send newsletter to all subscribers
        result = send_newsletter(max_articles=Config.MAX_ARTICLES)

        if result['success']:
            print(f"\n{'='*70}")
            print(f"✅ Newsletter sent successfully!")
            print(f"   Sent: {result['sent_count']}/{result['total_subscribers']}")
            print(f"   Articles: {result['article_count']}")
            print(f"   Failed: {result['failed_count']}")
            print(f"{'='*70}\n")
        else:
            print(f"\n{'='*70}")
            print(f"❌ Newsletter send failed: {result.get('error', 'Unknown error')}")
            print(f"{'='*70}\n")

    except Exception as e:
        print(f"\n{'='*70}")
        print(f"❌ Scheduled sender error: {e}")
        print(f"{'='*70}\n")
        import traceback
        traceback.print_exc()


def main():
    """Main scheduler loop"""
    print("📧 Munich News Newsletter Scheduler")
    print("="*70)
    print("Schedule: Daily at 7:00 AM Berlin time")
    print("Timezone: Europe/Berlin (CET/CEST)")
    print("Coordination: Waits for crawler to finish (max 30 min)")
    print("="*70)

    # Schedule the sender for 07:00 local time (the container sets TZ=Europe/Berlin)
    schedule.every().day.at("07:00").do(run_sender)

    # Show next run time
    berlin_time = datetime.now(BERLIN_TZ)
    print(f"\nCurrent time (Berlin): {berlin_time.strftime('%Y-%m-%d %H:%M:%S %Z')}")

    # Get next scheduled run
    next_run = schedule.next_run()
    if next_run:
        # Convert to Berlin time for display
        next_run_berlin = next_run.astimezone(BERLIN_TZ)
        print(f"Next scheduled run: {next_run_berlin.strftime('%Y-%m-%d %H:%M:%S %Z')}")

    print("\n⏳ Scheduler is running... (Press Ctrl+C to stop)\n")

    # Optional: run immediately on startup (uncomment the two lines below to enable)
    # print("🚀 Running initial send on startup...")
    # run_sender()

    # Keep the scheduler running
    while True:
        schedule.run_pending()
        time.sleep(60)  # Check every minute


if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        print("\n\n👋 Scheduler stopped by user")
    except Exception as e:
        print(f"\n\n❌ Scheduler error: {e}")
        import traceback
        traceback.print_exc()
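
Review note on `check_crawler_finished`: it subtracts `crawled_at` from `datetime.utcnow()`, which works only while `crawled_at` is a naive UTC value, and `datetime.utcnow()` is deprecated as of Python 3.12. A possible timezone-aware variant is sketched below. `minutes_since` and `is_fresh` are hypothetical helpers, and treating naive timestamps as UTC is an assumption about how the crawler stores them.

```python
from datetime import datetime, timezone


def minutes_since(crawled_at, now=None):
    """Minutes elapsed since crawled_at; naive values are assumed to be UTC."""
    now = now or datetime.now(timezone.utc)
    if crawled_at.tzinfo is None:
        crawled_at = crawled_at.replace(tzinfo=timezone.utc)
    return (now - crawled_at).total_seconds() / 60


def is_fresh(crawled_at, max_age_minutes=120, now=None):
    """True when the timestamp lies within the freshness window (not in the future)."""
    return 0 <= minutes_since(crawled_at, now) < max_age_minutes
```

Accepting `now` as a parameter keeps the check deterministic in tests.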
|
||||
@@ -11,8 +11,17 @@ from pathlib import Path
 from jinja2 import Template
 from pymongo import MongoClient
 import os
+import sys
 from dotenv import load_dotenv

+# Add backend directory to path for importing tracking service
+backend_dir = Path(__file__).parent.parent / 'backend'
+sys.path.insert(0, str(backend_dir))
+
+# Import tracking modules
+from services import tracking_service
+from tracking_integration import inject_tracking_pixel, replace_article_links, generate_tracking_urls
+
 # Load environment variables from backend/.env
 backend_dir = Path(__file__).parent.parent / 'backend'
 env_path = backend_dir / '.env'
||||
@@ -40,6 +49,11 @@ class Config:
     MAX_ARTICLES = int(os.getenv('NEWSLETTER_MAX_ARTICLES', '10'))
     HOURS_LOOKBACK = int(os.getenv('NEWSLETTER_HOURS_LOOKBACK', '24'))
     WEBSITE_URL = os.getenv('WEBSITE_URL', 'http://localhost:3000')
+
+    # Tracking
+    TRACKING_ENABLED = os.getenv('TRACKING_ENABLED', 'true').lower() == 'true'
+    TRACKING_API_URL = os.getenv('TRACKING_API_URL', 'http://localhost:5001')
+    TRACKING_DATA_RETENTION_DAYS = int(os.getenv('TRACKING_DATA_RETENTION_DAYS', '90'))


 # MongoDB connection
||||
@@ -117,15 +131,20 @@ def get_active_subscribers():
     return [doc['email'] for doc in cursor]


-def render_newsletter_html(articles):
+def render_newsletter_html(articles, tracking_enabled=False, pixel_tracking_id=None,
+                           link_tracking_map=None, api_url=None):
     """
-    Render newsletter HTML from template
+    Render newsletter HTML from template with optional tracking integration

     Args:
         articles: List of article dictionaries
+        tracking_enabled: Whether to inject tracking pixel and replace links
+        pixel_tracking_id: Tracking ID for the email open pixel
+        link_tracking_map: Dictionary mapping original URLs to tracking IDs
+        api_url: Base URL for the tracking API

     Returns:
-        str: Rendered HTML content
+        str: Rendered HTML content with tracking injected if enabled
     """
     # Load template
     template_path = Path(__file__).parent / 'newsletter_template.html'
||||
@@ -142,11 +161,23 @@ def render_newsletter_html(articles):
         'article_count': len(articles),
         'articles': articles,
         'unsubscribe_link': f'{Config.WEBSITE_URL}/unsubscribe',
-        'website_link': Config.WEBSITE_URL
+        'website_link': Config.WEBSITE_URL,
+        'tracking_enabled': tracking_enabled
     }

     # Render HTML
-    return template.render(**template_data)
+    html = template.render(**template_data)
+
+    # Inject tracking if enabled
+    if tracking_enabled and pixel_tracking_id and api_url:
+        # Inject tracking pixel
+        html = inject_tracking_pixel(html, pixel_tracking_id, api_url)
+
+        # Replace article links with tracking URLs
+        if link_tracking_map:
+            html = replace_article_links(html, link_tracking_map, api_url)
+
+    return html


 def send_email(to_email, subject, html_content):
||||
@@ -246,14 +277,14 @@ def send_newsletter(max_articles=None, test_email=None):
             'error': 'No active subscribers'
         }

-    # Render newsletter
-    print("\nRendering newsletter HTML...")
-    html_content = render_newsletter_html(articles)
-    print("✓ Newsletter rendered")
+    # Generate newsletter ID (date-based)
+    newsletter_id = f"newsletter-{datetime.now().strftime('%Y-%m-%d')}"

     # Send to subscribers
     subject = f"Munich News Daily - {datetime.now().strftime('%B %d, %Y')}"
     print(f"\nSending newsletter: '{subject}'")
+    print(f"Newsletter ID: {newsletter_id}")
+    print(f"Tracking enabled: {Config.TRACKING_ENABLED}")
     print("-" * 70)

     sent_count = 0
||||
@@ -262,6 +293,34 @@ def send_newsletter(max_articles=None, test_email=None):

     for i, email in enumerate(subscribers, 1):
         print(f"[{i}/{len(subscribers)}] Sending to {email}...", end=' ')
+
+        # Generate tracking data for this subscriber if tracking is enabled
+        if Config.TRACKING_ENABLED:
+            try:
+                tracking_data = generate_tracking_urls(
+                    articles=articles,
+                    newsletter_id=newsletter_id,
+                    subscriber_email=email,
+                    tracking_service=tracking_service
+                )
+
+                # Render newsletter with tracking
+                html_content = render_newsletter_html(
+                    articles=articles,
+                    tracking_enabled=True,
+                    pixel_tracking_id=tracking_data['pixel_tracking_id'],
+                    link_tracking_map=tracking_data['link_tracking_map'],
+                    api_url=Config.TRACKING_API_URL
+                )
+            except Exception as e:
+                print(f"⚠ Tracking error: {e}, sending without tracking...", end=' ')
+                # Fallback: send without tracking
+                html_content = render_newsletter_html(articles)
+        else:
+            # Render newsletter without tracking
+            html_content = render_newsletter_html(articles)
+
+        # Send email
         success, error = send_email(email, subject, html_content)

         if success:
||||
@@ -310,12 +369,11 @@ def preview_newsletter(max_articles=None, hours=None):
         today_date = datetime.now().strftime('%B %d, %Y')
         return f"<h1>No articles from today found</h1><p>No articles published today ({today_date}). Run the crawler with Ollama enabled to get fresh content.</p>"

-    return render_newsletter_html(articles)
+    # Preview without tracking
+    return render_newsletter_html(articles, tracking_enabled=False)


 if __name__ == '__main__':
-    import sys
-
     # Parse command line arguments
     if len(sys.argv) > 1:
         command = sys.argv[1]
150
news_sender/tracking_integration.py
Normal file
@@ -0,0 +1,150 @@
"""
Tracking integration module for Munich News Daily newsletter system.
Handles injection of tracking pixels and replacement of article links with tracking URLs.
"""

import re
from typing import Dict, List
from bs4 import BeautifulSoup


def inject_tracking_pixel(html: str, tracking_id: str, api_url: str) -> str:
    """
    Inject tracking pixel into newsletter HTML before closing </body> tag.

    The tracking pixel is a 1x1 transparent image that loads when the email is opened,
    allowing us to track email opens.

    Args:
        html: Original newsletter HTML content
        tracking_id: Unique tracking ID for this newsletter send (None if tracking disabled)
        api_url: Base URL for the tracking API (e.g., http://localhost:5001)

    Returns:
        str: HTML with tracking pixel injected (unchanged if tracking_id is None)

    Example:
        >>> html = '<html><body><p>Content</p></body></html>'
        >>> inject_tracking_pixel(html, 'abc-123', 'http://api.example.com')
        '<html><body><p>Content</p><img src="http://api.example.com/api/track/pixel/abc-123" width="1" height="1" alt="" style="display:block;" /></body></html>'
    """
    # Skip tracking if no tracking_id provided (subscriber opted out)
    if not tracking_id:
        return html

    # Construct tracking pixel URL
    pixel_url = f"{api_url}/api/track/pixel/{tracking_id}"

    # Create tracking pixel HTML
    pixel_html = f'<img src="{pixel_url}" width="1" height="1" alt="" style="display:block;" />'

    # Inject pixel before closing </body> tag
    if '</body>' in html:
        html = html.replace('</body>', f'{pixel_html}</body>')
    else:
        # Fallback: append to end if no </body> tag found
        html += pixel_html

    return html


def replace_article_links(
    html: str,
    link_tracking_map: Dict[str, str],
    api_url: str
) -> str:
    """
    Replace article links in newsletter HTML with tracking URLs.

    Finds all article links in the HTML and replaces them with tracking redirect URLs
    that log clicks before redirecting to the original article.

    Args:
        html: Original newsletter HTML content
        link_tracking_map: Dictionary mapping original URLs to tracking IDs (empty if tracking disabled)
        api_url: Base URL for the tracking API (e.g., http://localhost:5001)

    Returns:
        str: HTML with article links replaced by tracking URLs (unchanged if map is empty)

    Example:
        >>> html = '<a href="https://example.com/article">Read</a>'
        >>> mapping = {'https://example.com/article': 'track-123'}
        >>> replace_article_links(html, mapping, 'http://api.example.com')
        '<a href="http://api.example.com/api/track/click/track-123">Read</a>'
    """
    # Skip tracking if no tracking map provided (subscriber opted out)
    if not link_tracking_map:
        return html

    # Parse HTML with BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')

    # Find all <a> tags with href attributes
    for link in soup.find_all('a', href=True):
        original_url = link['href']

        # Check if this URL should be tracked
        if original_url in link_tracking_map:
            tracking_id = link_tracking_map[original_url]
            tracking_url = f"{api_url}/api/track/click/{tracking_id}"

            # Replace the href with tracking URL
            link['href'] = tracking_url

    # Return modified HTML
    return str(soup)


def generate_tracking_urls(
    articles: List[Dict],
    newsletter_id: str,
    subscriber_email: str,
    tracking_service
) -> Dict[str, object]:
    """
    Generate tracking records for all article links and return URL mapping.

    Creates tracking records in the database for each article link and returns
    a mapping of original URLs to tracking IDs.

    Args:
        articles: List of article dictionaries with 'link' and 'title' keys
        newsletter_id: Unique identifier for the newsletter batch
        subscriber_email: Email address of the recipient
        tracking_service: Tracking service module with create_newsletter_tracking function

    Returns:
        dict: Dictionary containing:
            - pixel_tracking_id: ID for the tracking pixel
            - link_tracking_map: Dict mapping original URLs to tracking IDs
            - tracking_enabled: Flag reported by the tracking service (defaults to True)

    Example:
        >>> articles = [{'link': 'https://example.com/1', 'title': 'Article 1'}]
        >>> generate_tracking_urls(articles, 'news-2024-01-01', 'user@example.com', tracking_service)
        {
            'pixel_tracking_id': 'uuid-for-pixel',
            'link_tracking_map': {'https://example.com/1': 'uuid-for-link'},
            'tracking_enabled': True
        }
    """
    # Prepare article links for tracking
    article_links = []
    for article in articles:
        if 'link' in article and article['link']:
            article_links.append({
                'url': article['link'],
                'title': article.get('title', '')
            })

    # Create tracking records using the tracking service
    tracking_data = tracking_service.create_newsletter_tracking(
        newsletter_id=newsletter_id,
        subscriber_email=subscriber_email,
        article_links=article_links
    )

    return {
        'pixel_tracking_id': tracking_data['pixel_tracking_id'],
        'link_tracking_map': tracking_data['link_tracking_map'],
        'tracking_enabled': tracking_data.get('tracking_enabled', True)
    }
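
Review note: `generate_tracking_urls` only needs an object that exposes `create_newsletter_tracking(newsletter_id, subscriber_email, article_links)` and returns the three keys read above. For unit tests, an in-memory stand-in along these lines might be enough. `InMemoryTrackingService` is a hypothetical name, and the record layout it stores is an assumption rather than the backend's actual schema.

```python
import uuid


class InMemoryTrackingService:
    """Hypothetical test double for the backend tracking_service module."""

    def __init__(self):
        self.records = []  # one entry per create_newsletter_tracking call

    def create_newsletter_tracking(self, newsletter_id, subscriber_email, article_links):
        """Issue one pixel ID plus one tracking ID per article URL."""
        pixel_id = str(uuid.uuid4())
        link_map = {link["url"]: str(uuid.uuid4()) for link in article_links}
        self.records.append({
            "newsletter_id": newsletter_id,
            "subscriber_email": subscriber_email,
            "pixel_tracking_id": pixel_id,
            "link_tracking_map": link_map,
        })
        return {
            "pixel_tracking_id": pixel_id,
            "link_tracking_map": link_map,
            "tracking_enabled": True,
        }
```

Passing an instance as the `tracking_service` argument would let the link-mapping logic be tested without a database.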