2025-11-11 14:09:21 +01:00
parent bcd0a10576
commit 1075a91eac
57 changed files with 5598 additions and 1366 deletions

news_sender/Dockerfile Normal file

@@ -0,0 +1,24 @@
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy sender files
COPY . .
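# Copy backend files (needed for tracking and config)
# NOTE: COPY sources must lie inside the build context. "../backend" cannot reach
# a sibling of the context directory, so build with the repository root as context
# (and, depending on the builder, write the sources as backend/services and backend/.env).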
COPY ../backend/services /app/backend/services
COPY ../backend/.env /app/.env
# Make the scheduler executable
RUN chmod +x scheduled_sender.py
# Set timezone to Berlin
ENV TZ=Europe/Berlin
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
# Run the scheduled sender
CMD ["python", "-u", "scheduled_sender.py"]


@@ -1,303 +0,0 @@
# News Sender Microservice
Standalone service for sending Munich News Daily newsletters to subscribers.
## Features
- 📧 Sends beautiful HTML newsletters
- 🤖 Uses AI-generated article summaries
- 📊 Tracks sending statistics
- 🧪 Test mode for development
- 📝 Preview generation
- 🔄 Fetches data from shared MongoDB
## Installation
```bash
cd news_sender
pip install -r requirements.txt
```
## Configuration
The service uses the same `.env` file as the backend (`../backend/.env`):
```env
# MongoDB
MONGODB_URI=mongodb://localhost:27017/
# Email (Gmail example)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Newsletter Settings (optional)
NEWSLETTER_MAX_ARTICLES=10
WEBSITE_URL=http://localhost:3000
```
**Gmail Setup:**
1. Enable 2-factor authentication
2. Generate an App Password: https://support.google.com/accounts/answer/185833
3. Use the App Password (not your regular password)
## Usage
### 1. Preview Newsletter
Generate HTML preview without sending:
```bash
python sender_service.py preview
```
This creates `newsletter_preview.html` - open it in your browser to see how the newsletter looks.
### 2. Send Test Email
Send to a single email address for testing:
```bash
python sender_service.py test your-email@example.com
```
### 3. Send to All Subscribers
Send newsletter to all active subscribers:
```bash
# Send with default article count (10)
python sender_service.py send
# Send with custom article count
python sender_service.py send 15
```
### 4. Use as Python Module
```python
from sender_service import send_newsletter, preview_newsletter
# Send newsletter
result = send_newsletter(max_articles=10)
print(f"Sent to {result['sent_count']} subscribers")
# Generate preview
html = preview_newsletter(max_articles=5)
```
## How It Works
```
┌─────────────────────────────────────────────────────────┐
│ 1. Fetch Articles from MongoDB │
│ - Get latest articles with AI summaries │
│ - Sort by creation date (newest first) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 2. Fetch Active Subscribers │
│ - Get all subscribers with status='active' │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 3. Render Newsletter HTML │
│ - Load newsletter_template.html │
│ - Populate with articles and metadata │
│ - Generate beautiful HTML email │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 4. Send Emails │
│ - Connect to SMTP server │
│ - Send to each subscriber │
│ - Track success/failure │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 5. Report Statistics │
│ - Total sent │
│ - Failed sends │
│ - Error details │
└─────────────────────────────────────────────────────────┘
```
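The five steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `sender_service.py`: the in-memory `ARTICLES` and `SUBSCRIBERS` lists stand in for MongoDB, `fake_send` stands in for the SMTP call, and every helper name here is hypothetical.

```python
from datetime import datetime

# Hypothetical in-memory stand-ins for the MongoDB collections.
ARTICLES = [
    {"title": "Old story", "summary": "s1", "created_at": datetime(2024, 11, 9)},
    {"title": "New story", "summary": "s2", "created_at": datetime(2024, 11, 10)},
    {"title": "No summary yet", "summary": None, "created_at": datetime(2024, 11, 10)},
]
SUBSCRIBERS = [
    {"email": "user1@example.com", "status": "active"},
    {"email": "user2@example.com", "status": "inactive"},
    {"email": "bad-address", "status": "active"},
]


def fetch_articles(limit):
    """Step 1: keep only summarized articles, newest first."""
    with_summary = [a for a in ARTICLES if a["summary"]]
    with_summary.sort(key=lambda a: a["created_at"], reverse=True)
    return with_summary[:limit]


def fetch_active_subscribers():
    """Step 2: e-mail addresses of subscribers with status='active'."""
    return [s["email"] for s in SUBSCRIBERS if s["status"] == "active"]


def render(articles):
    """Step 3: a trivial stand-in for the HTML template."""
    items = "".join(f"<li>{a['title']}</li>" for a in articles)
    return f"<ul>{items}</ul>"


def fake_send(email, html):
    """Step 4 stand-in: 'fails' for addresses without an @ sign."""
    if "@" not in email:
        return False, "invalid address"
    return True, None


def run_pipeline(limit=10, send=fake_send):
    articles = fetch_articles(limit)
    html = render(articles)
    sent, errors = 0, []
    for email in fetch_active_subscribers():
        ok, error = send(email, html)
        if ok:
            sent += 1
        else:
            errors.append({"email": email, "error": error})
    # Step 5: report statistics.
    return {"sent_count": sent, "failed_count": len(errors),
            "errors": errors, "article_count": len(articles)}


result = run_pipeline()
print(result)
```

Passing the send function as a parameter keeps the loop easy to exercise without an SMTP server.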
## Output Example
```
======================================================================
📧 Munich News Daily - Newsletter Sender
======================================================================
Fetching latest 10 articles with AI summaries...
✓ Found 10 articles
Fetching active subscribers...
✓ Found 150 active subscriber(s)
Rendering newsletter HTML...
✓ Newsletter rendered
Sending newsletter: 'Munich News Daily - November 10, 2024'
----------------------------------------------------------------------
[1/150] Sending to user1@example.com... ✓
[2/150] Sending to user2@example.com... ✓
[3/150] Sending to user3@example.com... ✓
...
======================================================================
📊 Sending Complete
======================================================================
✓ Successfully sent: 148
✗ Failed: 2
📰 Articles included: 10
======================================================================
```
## Scheduling
### Using Cron (Linux/Mac)
Send newsletter daily at 8 AM:
```bash
# Edit crontab
crontab -e
# Add this line
0 8 * * * cd /path/to/news_sender && /path/to/venv/bin/python sender_service.py send
```
### Using systemd Timer (Linux)
Create `/etc/systemd/system/news-sender.service`:
```ini
[Unit]
Description=Munich News Sender
[Service]
Type=oneshot
WorkingDirectory=/path/to/news_sender
ExecStart=/path/to/venv/bin/python sender_service.py send
User=your-user
```
Create `/etc/systemd/system/news-sender.timer`:
```ini
[Unit]
Description=Send Munich News Daily at 8 AM
[Timer]
OnCalendar=*-*-* 08:00:00
[Install]
WantedBy=timers.target
```
Enable and start:
```bash
sudo systemctl enable news-sender.timer
sudo systemctl start news-sender.timer
```
### Using Docker
Create `Dockerfile`:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY sender_service.py newsletter_template.html ./
CMD ["python", "sender_service.py", "send"]
```
Build and run:
```bash
docker build -t news-sender .
docker run --env-file ../backend/.env news-sender
```
## Troubleshooting
### "Email credentials not configured"
- Check that `EMAIL_USER` and `EMAIL_PASSWORD` are set in `.env`
- For Gmail, use an App Password, not your regular password
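A quick way to see which of the two settings is missing is a small check like the one below. It is a sketch, not part of the service: `missing_email_settings` is a hypothetical helper, and it reads the process environment, so run it in a shell where the `.env` values have been exported.

```python
import os

# The two variables named in the error message above.
REQUIRED = ("EMAIL_USER", "EMAIL_PASSWORD")


def missing_email_settings(env=os.environ):
    """Return the names of required settings that are unset or blank."""
    return [name for name in REQUIRED if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_email_settings()
    print("Missing:", ", ".join(missing) if missing else "none")
```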
### "No articles with summaries found"
- Run the crawler first: `cd ../news_crawler && python crawler_service.py 10`
- Make sure Ollama is enabled and working
- Check MongoDB has articles with `summary` field
### "No active subscribers found"
- Add subscribers via the backend API
- Check subscriber status is 'active' in MongoDB
### SMTP Connection Errors
- Verify SMTP server and port are correct
- Check firewall isn't blocking SMTP port
- For Gmail, use an App Password (Google has retired the "Less secure app access" setting)
### Emails Going to Spam
- Set up SPF, DKIM, and DMARC records for your domain
- Use a verified email address
- Avoid spam trigger words in subject/content
- Include unsubscribe link (already included in template)
## Architecture
This is a standalone microservice that:
- Runs independently of the backend
- Shares the same MongoDB database
- Can be deployed separately
- Can be scheduled independently
- Has no dependencies on backend code
## Integration with Other Services
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Backend    │     │   Crawler    │     │    Sender    │
│   (Flask)    │     │  (Scraper)   │     │   (Email)    │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       │                    │                    │
       └────────────────────┴────────────────────┘
                    ┌───────▼────────┐
                    │    MongoDB     │
                    │  (Shared DB)   │
                    └────────────────┘
```
## Next Steps
1. **Test the newsletter:**
```bash
python sender_service.py test your-email@example.com
```
2. **Schedule daily sending:**
- Set up cron job or systemd timer
- Choose appropriate time (e.g., 8 AM)
3. **Monitor sending:**
- Check logs for errors
- Track open rates (requires email tracking service)
- Monitor spam complaints
4. **Optimize:**
- Add email tracking pixels
- A/B test subject lines
- Personalize content per subscriber


@@ -146,6 +146,14 @@
<a href="{{ unsubscribe_link }}" style="color: #999999; text-decoration: none;">Unsubscribe</a>
</p>
+{% if tracking_enabled %}
+<!-- Privacy Notice -->
+<p style="margin: 20px 0 0 0; font-size: 11px; color: #666666; line-height: 1.4;">
+This email contains tracking to measure engagement and improve our content.<br>
+We respect your privacy and anonymize data after 90 days.
+</p>
+{% endif %}
<p style="margin: 20px 0 0 0; font-size: 11px; color: #666666;">
© {{ year }} Munich News Daily. All rights reserved.
</p>


@@ -1,3 +1,6 @@
 pymongo==4.6.1
 python-dotenv==1.0.0
 Jinja2==3.1.2
+beautifulsoup4==4.12.2
+schedule==1.2.0
+pytz==2023.3

news_sender/scheduled_sender.py Executable file

@@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
Scheduled newsletter sender that runs daily at 7 AM Berlin time
Waits for crawler to finish before sending to ensure fresh content
"""
import schedule
import time
from datetime import datetime, timedelta
import pytz
from pathlib import Path
import sys
# Add current directory to path
sys.path.insert(0, str(Path(__file__).parent))
from sender_service import send_newsletter, get_latest_articles, Config
# Berlin timezone
BERLIN_TZ = pytz.timezone('Europe/Berlin')
# Maximum time to wait for crawler (in minutes)
MAX_WAIT_TIME = 30
def check_crawler_finished():
    """
    Check if crawler has finished by looking for recent articles
    Returns: (bool, str) - (is_finished, message)
    """
    try:
        # Check if we have articles from the last 2 hours
        articles = get_latest_articles(max_articles=1, hours=2)
        if articles:
            # Check if the most recent article was crawled recently (within last 2 hours)
            latest_article = articles[0]
            crawled_at = latest_article.get('crawled_at')
            if crawled_at:
                time_since_crawl = datetime.utcnow() - crawled_at
                minutes_since = time_since_crawl.total_seconds() / 60
                if minutes_since < 120:  # Within last 2 hours
                    return True, f"Crawler finished {int(minutes_since)} minutes ago"
        return False, "No recent articles found"
    except Exception as e:
        return False, f"Error checking crawler status: {e}"
def wait_for_crawler(max_wait_minutes=30):
    """
    Wait for crawler to finish before sending newsletter
    Args:
        max_wait_minutes: Maximum time to wait in minutes
    Returns:
        bool: True if crawler finished, False if timeout
    """
    berlin_time = datetime.now(BERLIN_TZ)
    print(f"\n⏳ Waiting for crawler to finish...")
    print(f" Current time: {berlin_time.strftime('%H:%M:%S %Z')}")
    print(f" Max wait time: {max_wait_minutes} minutes")
    start_time = time.time()
    check_interval = 30  # Check every 30 seconds
    while True:
        elapsed_minutes = (time.time() - start_time) / 60
        # Check if crawler finished
        is_finished, message = check_crawler_finished()
        if is_finished:
            print(f"{message}")
            return True
        # Check if we've exceeded max wait time
        if elapsed_minutes >= max_wait_minutes:
            print(f" ⚠ Timeout after {max_wait_minutes} minutes")
            print(f" Proceeding with available articles...")
            return False
        # Show progress
        remaining = max_wait_minutes - elapsed_minutes
        print(f" ⏳ Still waiting... ({remaining:.1f} minutes remaining) - {message}")
        # Wait before next check
        time.sleep(check_interval)
def run_sender():
    """Run the newsletter sender with crawler coordination"""
    berlin_time = datetime.now(BERLIN_TZ)
    print(f"\n{'='*70}")
    print(f"📧 Scheduled newsletter sender started")
    print(f" Time: {berlin_time.strftime('%Y-%m-%d %H:%M:%S %Z')}")
    print(f"{'='*70}\n")
    try:
        # Wait for crawler to finish (max 30 minutes)
        crawler_finished = wait_for_crawler(max_wait_minutes=MAX_WAIT_TIME)
        if not crawler_finished:
            print(f"\n⚠ Crawler may still be running, but proceeding anyway...")
        print(f"\n{'='*70}")
        print(f"📧 Starting newsletter send...")
        print(f"{'='*70}\n")
        # Send newsletter to all subscribers
        result = send_newsletter(max_articles=Config.MAX_ARTICLES)
        if result['success']:
            print(f"\n{'='*70}")
            print(f"✅ Newsletter sent successfully!")
            print(f" Sent: {result['sent_count']}/{result['total_subscribers']}")
            print(f" Articles: {result['article_count']}")
            print(f" Failed: {result['failed_count']}")
            print(f"{'='*70}\n")
        else:
            print(f"\n{'='*70}")
            print(f"❌ Newsletter send failed: {result.get('error', 'Unknown error')}")
            print(f"{'='*70}\n")
    except Exception as e:
        print(f"\n{'='*70}")
        print(f"❌ Scheduled sender error: {e}")
        print(f"{'='*70}\n")
        import traceback
        traceback.print_exc()
def main():
    """Main scheduler loop"""
    print("📧 Munich News Newsletter Scheduler")
    print("="*70)
    print("Schedule: Daily at 7:00 AM Berlin time")
    print("Timezone: Europe/Berlin (CET/CEST)")
    print("Coordination: Waits for crawler to finish (max 30 min)")
    print("="*70)
    # Schedule the sender to run at 7 AM Berlin time. Passing the timezone
    # explicitly keeps the schedule correct even if the host's local time
    # is not Europe/Berlin (supported by schedule>=1.2.0 together with pytz).
    schedule.every().day.at("07:00", "Europe/Berlin").do(run_sender)
    # Show next run time
    berlin_time = datetime.now(BERLIN_TZ)
    print(f"\nCurrent time (Berlin): {berlin_time.strftime('%Y-%m-%d %H:%M:%S %Z')}")
    # Get next scheduled run
    next_run = schedule.next_run()
    if next_run:
        # Convert to Berlin time for display
        next_run_berlin = next_run.astimezone(BERLIN_TZ)
        print(f"Next scheduled run: {next_run_berlin.strftime('%Y-%m-%d %H:%M:%S %Z')}")
    print("\n⏳ Scheduler is running... (Press Ctrl+C to stop)\n")
    # Optional: Run immediately on startup (comment out if you don't want this)
    # print("🚀 Running initial send on startup...")
    # run_sender()
    # Keep the scheduler running
    while True:
        schedule.run_pending()
        time.sleep(60)  # Check every minute
if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        print("\n\n👋 Scheduler stopped by user")
    except Exception as e:
        print(f"\n\n❌ Scheduler error: {e}")
        import traceback
        traceback.print_exc()
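
The wait loop in `wait_for_crawler` is an instance of polling with a timeout. Below is a minimal standalone sketch of that pattern, with the clock and sleep function injected so it can be exercised without actually waiting. `poll_until`, `FakeClock` and their parameters are hypothetical names for illustration and are not part of this commit.

```python
import time


def poll_until(check, timeout_s, interval_s, clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or timeout_s seconds have elapsed.

    Returns True if check() succeeded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, max(0.0, deadline - clock())))


class FakeClock:
    """Test double: time only advances when sleep() is called."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
attempts = []


def ready_on_third_try():
    attempts.append(clock())
    return len(attempts) >= 3


ok = poll_until(ready_on_third_try, timeout_s=1800, interval_s=30,
                clock=clock, sleep=clock.sleep)
print(ok, attempts)
```

The sketch defaults to `time.monotonic` rather than `time.time` so that system clock adjustments cannot shorten or lengthen the wait.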


@@ -11,8 +11,17 @@ from pathlib import Path
 from jinja2 import Template
 from pymongo import MongoClient
 import os
+import sys
 from dotenv import load_dotenv
+# Add backend directory to path for importing tracking service
+backend_dir = Path(__file__).parent.parent / 'backend'
+sys.path.insert(0, str(backend_dir))
+# Import tracking modules
+from services import tracking_service
+from tracking_integration import inject_tracking_pixel, replace_article_links, generate_tracking_urls
 # Load environment variables from backend/.env
 backend_dir = Path(__file__).parent.parent / 'backend'
 env_path = backend_dir / '.env'
@@ -40,6 +49,11 @@ class Config:
     MAX_ARTICLES = int(os.getenv('NEWSLETTER_MAX_ARTICLES', '10'))
     HOURS_LOOKBACK = int(os.getenv('NEWSLETTER_HOURS_LOOKBACK', '24'))
     WEBSITE_URL = os.getenv('WEBSITE_URL', 'http://localhost:3000')
+    # Tracking
+    TRACKING_ENABLED = os.getenv('TRACKING_ENABLED', 'true').lower() == 'true'
+    TRACKING_API_URL = os.getenv('TRACKING_API_URL', 'http://localhost:5001')
+    TRACKING_DATA_RETENTION_DAYS = int(os.getenv('TRACKING_DATA_RETENTION_DAYS', '90'))
 # MongoDB connection
@@ -117,15 +131,20 @@ def get_active_subscribers():
     return [doc['email'] for doc in cursor]
-def render_newsletter_html(articles):
+def render_newsletter_html(articles, tracking_enabled=False, pixel_tracking_id=None,
+                           link_tracking_map=None, api_url=None):
     """
-    Render newsletter HTML from template
+    Render newsletter HTML from template with optional tracking integration
     Args:
         articles: List of article dictionaries
+        tracking_enabled: Whether to inject tracking pixel and replace links
+        pixel_tracking_id: Tracking ID for the email open pixel
+        link_tracking_map: Dictionary mapping original URLs to tracking IDs
+        api_url: Base URL for the tracking API
     Returns:
-        str: Rendered HTML content
+        str: Rendered HTML content with tracking injected if enabled
     """
     # Load template
     template_path = Path(__file__).parent / 'newsletter_template.html'
@@ -142,11 +161,23 @@ def render_newsletter_html(articles):
         'article_count': len(articles),
         'articles': articles,
         'unsubscribe_link': f'{Config.WEBSITE_URL}/unsubscribe',
-        'website_link': Config.WEBSITE_URL
+        'website_link': Config.WEBSITE_URL,
+        'tracking_enabled': tracking_enabled
     }
     # Render HTML
-    return template.render(**template_data)
+    html = template.render(**template_data)
+    # Inject tracking if enabled
+    if tracking_enabled and pixel_tracking_id and api_url:
+        # Inject tracking pixel
+        html = inject_tracking_pixel(html, pixel_tracking_id, api_url)
+        # Replace article links with tracking URLs
+        if link_tracking_map:
+            html = replace_article_links(html, link_tracking_map, api_url)
+    return html
 def send_email(to_email, subject, html_content):
@@ -246,14 +277,14 @@ def send_newsletter(max_articles=None, test_email=None):
             'error': 'No active subscribers'
         }
-    # Render newsletter
-    print("\nRendering newsletter HTML...")
-    html_content = render_newsletter_html(articles)
-    print("✓ Newsletter rendered")
+    # Generate newsletter ID (date-based)
+    newsletter_id = f"newsletter-{datetime.now().strftime('%Y-%m-%d')}"
     # Send to subscribers
     subject = f"Munich News Daily - {datetime.now().strftime('%B %d, %Y')}"
     print(f"\nSending newsletter: '{subject}'")
+    print(f"Newsletter ID: {newsletter_id}")
+    print(f"Tracking enabled: {Config.TRACKING_ENABLED}")
     print("-" * 70)
     sent_count = 0
@@ -262,6 +293,34 @@ def send_newsletter(max_articles=None, test_email=None):
     for i, email in enumerate(subscribers, 1):
         print(f"[{i}/{len(subscribers)}] Sending to {email}...", end=' ')
+        # Generate tracking data for this subscriber if tracking is enabled
+        if Config.TRACKING_ENABLED:
+            try:
+                tracking_data = generate_tracking_urls(
+                    articles=articles,
+                    newsletter_id=newsletter_id,
+                    subscriber_email=email,
+                    tracking_service=tracking_service
+                )
+                # Render newsletter with tracking
+                html_content = render_newsletter_html(
+                    articles=articles,
+                    tracking_enabled=True,
+                    pixel_tracking_id=tracking_data['pixel_tracking_id'],
+                    link_tracking_map=tracking_data['link_tracking_map'],
+                    api_url=Config.TRACKING_API_URL
+                )
+            except Exception as e:
+                print(f"⚠ Tracking error: {e}, sending without tracking...", end=' ')
+                # Fallback: send without tracking
+                html_content = render_newsletter_html(articles)
+        else:
+            # Render newsletter without tracking
+            html_content = render_newsletter_html(articles)
         # Send email
         success, error = send_email(email, subject, html_content)
         if success:
@@ -310,12 +369,11 @@ def preview_newsletter(max_articles=None, hours=None):
         today_date = datetime.now().strftime('%B %d, %Y')
         return f"<h1>No articles from today found</h1><p>No articles published today ({today_date}). Run the crawler with Ollama enabled to get fresh content.</p>"
-    return render_newsletter_html(articles)
+    # Preview without tracking
+    return render_newsletter_html(articles, tracking_enabled=False)
 if __name__ == '__main__':
-    import sys
     # Parse command line arguments
     if len(sys.argv) > 1:
         command = sys.argv[1]


@@ -0,0 +1,150 @@
"""
Tracking integration module for Munich News Daily newsletter system.
Handles injection of tracking pixels and replacement of article links with tracking URLs.
"""
import re
from typing import Dict, List
from bs4 import BeautifulSoup
def inject_tracking_pixel(html: str, tracking_id: str, api_url: str) -> str:
    """
    Inject tracking pixel into newsletter HTML before closing </body> tag.
    The tracking pixel is a 1x1 transparent image that loads when the email is opened,
    allowing us to track email opens.
    Args:
        html: Original newsletter HTML content
        tracking_id: Unique tracking ID for this newsletter send (None if tracking disabled)
        api_url: Base URL for the tracking API (e.g., http://localhost:5001)
    Returns:
        str: HTML with tracking pixel injected (unchanged if tracking_id is None)
    Example:
        >>> html = '<html><body><p>Content</p></body></html>'
        >>> inject_tracking_pixel(html, 'abc-123', 'http://api.example.com')
        '<html><body><p>Content</p><img src="http://api.example.com/api/track/pixel/abc-123" width="1" height="1" alt="" style="display:block;" /></body></html>'
    """
    # Skip tracking if no tracking_id provided (subscriber opted out)
    if not tracking_id:
        return html
    # Construct tracking pixel URL
    pixel_url = f"{api_url}/api/track/pixel/{tracking_id}"
    # Create tracking pixel HTML
    pixel_html = f'<img src="{pixel_url}" width="1" height="1" alt="" style="display:block;" />'
    # Inject pixel before closing </body> tag
    if '</body>' in html:
        html = html.replace('</body>', f'{pixel_html}</body>')
    else:
        # Fallback: append to end if no </body> tag found
        html += pixel_html
    return html
def replace_article_links(
    html: str,
    link_tracking_map: Dict[str, str],
    api_url: str
) -> str:
    """
    Replace article links in newsletter HTML with tracking URLs.
    Finds all article links in the HTML and replaces them with tracking redirect URLs
    that log clicks before redirecting to the original article.
    Args:
        html: Original newsletter HTML content
        link_tracking_map: Dictionary mapping original URLs to tracking IDs (empty if tracking disabled)
        api_url: Base URL for the tracking API (e.g., http://localhost:5001)
    Returns:
        str: HTML with article links replaced by tracking URLs (unchanged if map is empty)
    Example:
        >>> html = '<a href="https://example.com/article">Read</a>'
        >>> mapping = {'https://example.com/article': 'track-123'}
        >>> replace_article_links(html, mapping, 'http://api.example.com')
        '<a href="http://api.example.com/api/track/click/track-123">Read</a>'
    """
    # Skip tracking if no tracking map provided (subscriber opted out)
    if not link_tracking_map:
        return html
    # Parse HTML with BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    # Find all <a> tags with href attributes
    for link in soup.find_all('a', href=True):
        original_url = link['href']
        # Check if this URL should be tracked
        if original_url in link_tracking_map:
            tracking_id = link_tracking_map[original_url]
            tracking_url = f"{api_url}/api/track/click/{tracking_id}"
            # Replace the href with tracking URL
            link['href'] = tracking_url
    # Return modified HTML
    return str(soup)
def generate_tracking_urls(
    articles: List[Dict],
    newsletter_id: str,
    subscriber_email: str,
    tracking_service
) -> Dict[str, object]:
    """
    Generate tracking records for all article links and return URL mapping.
    Creates tracking records in the database for each article link and returns
    a mapping of original URLs to tracking IDs.
    Args:
        articles: List of article dictionaries with 'link' and 'title' keys
        newsletter_id: Unique identifier for the newsletter batch
        subscriber_email: Email address of the recipient
        tracking_service: Tracking service module with create_newsletter_tracking function
    Returns:
        dict: Dictionary containing:
            - pixel_tracking_id: ID for the tracking pixel
            - link_tracking_map: Dict mapping original URLs to tracking IDs
            - tracking_enabled: Flag reported by the tracking service (True if absent)
    Example:
        >>> articles = [{'link': 'https://example.com/1', 'title': 'Article 1'}]
        >>> generate_tracking_urls(articles, 'news-2024-01-01', 'user@example.com', tracking_service)
        {
            'pixel_tracking_id': 'uuid-for-pixel',
            'link_tracking_map': {'https://example.com/1': 'uuid-for-link'},
            'tracking_enabled': True
        }
    """
    # Prepare article links for tracking
    article_links = []
    for article in articles:
        if 'link' in article and article['link']:
            article_links.append({
                'url': article['link'],
                'title': article.get('title', '')
            })
    # Create tracking records using the tracking service
    tracking_data = tracking_service.create_newsletter_tracking(
        newsletter_id=newsletter_id,
        subscriber_email=subscriber_email,
        article_links=article_links
    )
    return {
        'pixel_tracking_id': tracking_data['pixel_tracking_id'],
        'link_tracking_map': tracking_data['link_tracking_map'],
        'tracking_enabled': tracking_data.get('tracking_enabled', True)
    }
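
`inject_tracking_pixel` uses `str.replace`, which rewrites every `</body>` occurrence and inserts the tracking ID into the URL verbatim. A possible variation, sketched here with the standard library only, inserts the pixel before the last `</body>` and percent-encodes the ID. `pixel_tag` and `inject_before_last_body_close` are hypothetical names, not functions from this commit; only the `/api/track/pixel/` path is taken from the code above.

```python
from urllib.parse import quote


def pixel_tag(api_url, tracking_id):
    """Build a 1x1 tracking <img>; the ID is percent-encoded for use in a URL path."""
    src = f"{api_url.rstrip('/')}/api/track/pixel/{quote(tracking_id, safe='')}"
    return f'<img src="{src}" width="1" height="1" alt="" style="display:block;" />'


def inject_before_last_body_close(html, tag):
    """Insert tag before the final </body>, or append it if there is none."""
    head, sep, tail = html.rpartition("</body>")
    if not sep:
        # rpartition found no </body>; fall back to appending.
        return html + tag
    return f"{head}{tag}{sep}{tail}"


page = "<html><body><p>Content</p></body></html>"
print(inject_before_last_body_close(page, pixel_tag("http://api.example.com", "abc-123")))
```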