Features Guide
Complete guide to Munich News Daily features.
Core Features
1. Automated News Crawling
- Fetches articles from RSS feeds
- Scheduled daily at 6:00 AM Berlin time
- Extracts full article content
- Handles multiple news sources
2. AI-Powered Summarization
- Generates concise summaries (up to 150 words by default)
- Uses Ollama AI (phi3:latest model)
- GPU acceleration available (roughly 4-5x faster, per the timings under AI Features)
- Configurable summary length
3. Title Translation
- Translates German titles to English
- Uses Ollama AI
- Displays both languages in newsletter
- Stores both versions in database
4. Newsletter Generation
- Beautiful HTML email template
- Responsive design
- Numbered articles
- Summary statistics
- Scheduled daily at 7:00 AM Berlin time
5. Engagement Tracking
- Email open tracking (pixel)
- Link click tracking
- Analytics dashboard ready
- Subscriber engagement metrics
News Crawler
How It Works
1. Fetch RSS feeds from database
2. Parse RSS XML
3. Extract article URLs
4. Fetch full article content
5. Extract text from HTML
6. Translate title (German → English)
7. Generate AI summary
8. Store in MongoDB
Content Extraction
Strategies (in order):
- Article Tag - Look for <article> tags
- Main Tag - Look for <main> content
- Content Divs - Common class names (content, article-body, etc.)
- Paragraph Aggregation - Collect all <p> tags
- Fallback - Use RSS description
Cleaning:
- Remove scripts and styles
- Remove navigation elements
- Remove ads and sidebars
- Extract clean text
- Preserve paragraphs
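The strategy order and cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the crawler's actual code: `ParagraphCollector`, `STRATEGIES`, and `extract_text` are hypothetical names, only `<p>` text inside the matched container is kept, and "noise" is limited to `script`, `style`, `nav`, and `aside` elements.

```python
from html.parser import HTMLParser

NOISE_TAGS = {"script", "style", "nav", "aside"}   # assumed noise set
CONTENT_CLASSES = {"content", "article-body"}      # class names from the list above


class ParagraphCollector(HTMLParser):
    """Collect <p> text inside the first element accepted by `matches`.

    With matches=None the whole document is treated as the container.
    """

    def __init__(self, matches=None):
        super().__init__()
        self.matches = matches                  # predicate(tag, attrs_dict) -> bool
        self.container = None                   # tag name of the matched container
        self.depth = 1 if matches is None else 0
        self.noise = 0                          # > 0 while inside a noise element
        self.in_p = False
        self.buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise += 1
        elif (self.matches is not None and self.container is None
              and self.matches(tag, dict(attrs))):
            self.container, self.depth = tag, 1
        elif tag == self.container and self.depth > 0:
            self.depth += 1                     # nested element of the same name
        if tag == "p" and self.depth > 0 and self.noise == 0:
            self.in_p, self.buf = True, []

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise > 0:
            self.noise -= 1
        elif tag == self.container and self.depth > 0:
            self.depth -= 1
        if tag == "p" and self.in_p:
            self.in_p = False
            text = " ".join("".join(self.buf).split())   # collapse whitespace
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self.in_p and self.noise == 0:
            self.buf.append(data)


# Tried in order; None means "aggregate every <p> in the document".
STRATEGIES = [
    lambda tag, attrs: tag == "article",
    lambda tag, attrs: tag == "main",
    lambda tag, attrs: tag == "div"
    and bool(CONTENT_CLASSES & set((attrs.get("class") or "").split())),
    None,
]


def extract_text(html, rss_description=""):
    """Return paragraphs from the first strategy that finds any, else the RSS text."""
    for matches in STRATEGIES:
        parser = ParagraphCollector(matches)
        parser.feed(html)
        parser.close()
        if parser.paragraphs:
            return "\n\n".join(parser.paragraphs)
    return rss_description
```

Paragraphs are joined with blank lines so the paragraph structure survives into the stored text. A production extractor would likely also handle text outside `<p>` elements and ad containers identified by class name.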
RSS Feed Handling
Supported Formats:
- RSS 2.0
- Atom
- Custom formats
Extracted Data:
- Title
- Link
- Description/Summary
- Published date
- Author (if available)
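As a rough illustration of the extracted fields, the sketch below reads them from an RSS 2.0 document with `xml.etree.ElementTree`. The function name `parse_rss_items` and the dictionary keys are assumptions; Atom feeds use namespaced elements and would need separate handling that is not shown here.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):

        def text(tag, item=item):
            # findtext returns None for a missing element, "" for an empty one
            value = item.findtext(tag)
            return value.strip() if value else None

        items.append({
            "title": text("title"),
            "link": text("link"),
            "description": text("description"),
            "published": text("pubDate"),   # left as the raw RFC 822 string
            "author": text("author"),       # None when the feed omits it
        })
    return items
```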
Error Handling:
- Retry failed requests
- Skip invalid URLs
- Log errors
- Continue with next article
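One way to combine these four behaviours is sketched below. The retry count, the fixed delay, and the names `fetch_with_retry` and `crawl_all` are assumptions for illustration; `fetch` stands in for whatever HTTP call the crawler really makes.

```python
import logging
import time

log = logging.getLogger("crawler")


def fetch_with_retry(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url); retry on failure and return None after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            log.warning("attempt %d/%d failed for %s: %s", attempt, attempts, url, exc)
            if attempt < attempts:
                time.sleep(delay)
    return None


def crawl_all(fetch, urls, attempts=3, delay=1.0):
    """Fetch every valid URL; skip invalid or failing ones and keep going."""
    results = {}
    for url in urls:
        if not url.startswith(("http://", "https://")):
            log.warning("skipping invalid URL: %r", url)
            continue
        body = fetch_with_retry(fetch, url, attempts, delay)
        if body is not None:
            results[url] = body
    return results
```

Passing `fetch` in as a parameter keeps the retry logic independent of the HTTP library and easy to exercise with a stub.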
AI Features
Summarization
Process:
- Send article text to Ollama
- Request 150-word summary
- Receive AI-generated summary
- Store with article
Configuration:
OLLAMA_ENABLED=true
OLLAMA_MODEL=phi3:latest
SUMMARY_MAX_WORDS=150
OLLAMA_TIMEOUT=120
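The request behind this process might look like the following sketch. It uses Ollama's `/api/generate` endpoint with streaming disabled; the prompt wording, the function names, and the use of `OLLAMA_BASE_URL` (listed under Environment Variables) with these defaults are assumptions rather than the project's confirmed implementation.

```python
import json
import os
import urllib.request


def build_summary_request(article_text, env=os.environ):
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    model = env.get("OLLAMA_MODEL", "phi3:latest")
    max_words = int(env.get("SUMMARY_MAX_WORDS", "150"))
    # Prompt wording is illustrative only.
    prompt = (
        f"Summarize the following article in at most {max_words} words:\n\n"
        f"{article_text}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def summarize(article_text, env=os.environ):
    """Return the summary text, or None when Ollama is disabled."""
    if env.get("OLLAMA_ENABLED", "true").lower() != "true":
        return None
    base_url = env.get("OLLAMA_BASE_URL", "http://ollama:11434")
    timeout = float(env.get("OLLAMA_TIMEOUT", "120"))
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_summary_request(article_text, env)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # With "stream": False the server replies with a single JSON object
    # whose "response" field holds the generated text.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"].strip()
```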
Performance:
- CPU: ~8s per article
- GPU: ~2s per article (4x faster)
Translation
Process:
- Send German title to Ollama
- Request English translation
- Receive translated title
- Store both versions
Configuration:
OLLAMA_ENABLED=true
OLLAMA_MODEL=phi3:latest
Performance:
- CPU: ~1.5s per title
- GPU: ~0.3s per title (5x faster)
Newsletter Display:
English Title (Primary)
Original: German Title (Subtitle)
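The two-line display can be produced by a small helper such as the one below. The name `format_title` and the fallback to the German title alone when no translation is available are assumptions; the guide does not say how a missing translation is rendered.

```python
def format_title(title_en, title_de):
    """Render the English title first with the German original beneath it."""
    if not title_en or title_en == title_de:
        # Assumed fallback: show only the original when there is no translation.
        return title_de
    return f"{title_en}\nOriginal: {title_de}"
```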
Newsletter System
Template Features
- Responsive Design - Works on all devices
- Clean Layout - Easy to read
- Numbered Articles - Clear organization
- Summary Box - Quick stats
- Tracking Links - Click tracking
- Unsubscribe Link - Easy opt-out
Personalization
- Greeting message
- Date formatting
- Article count
- Source attribution
- Author names
Tracking
Open Tracking:
- Invisible 1x1 pixel image
- Loaded when email opened
- Records timestamp
- Tracks unique opens
Click Tracking:
- All article links tracked
- Redirect through backend
- Records click events
- Tracks which articles were clicked
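To make the pixel and redirect mechanics concrete, here is a hedged sketch of how the newsletter could embed both. The URL paths `/api/track/pixel/...` and `/api/track/click/...` are hypothetical placeholders (see API.md for the real routes), as are the function names.

```python
import uuid
from urllib.parse import urlencode


def new_tracking_id():
    """Generate an opaque ID tying one newsletter to one subscriber."""
    return uuid.uuid4().hex


def pixel_tag(api_url, tracking_id):
    """Return the invisible 1x1 image tag; loading it records an open."""
    src = f"{api_url}/api/track/pixel/{tracking_id}"   # hypothetical path
    return f'<img src="{src}" width="1" height="1" alt="">'


def tracked_link(api_url, tracking_id, article_url):
    """Return a backend URL that records the click, then redirects to the article."""
    query = urlencode({"url": article_url})            # percent-encodes the target
    return f"{api_url}/api/track/click/{tracking_id}?{query}"   # hypothetical path
```

Here `api_url` would come from `TRACKING_API_URL`. Encoding the target URL as a query parameter keeps any `?` or `&` in the article link from being misread by the backend.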
Subscriber Management
Status System
| Status | Description | Receives Newsletters |
|---|---|---|
| active | Subscribed | ✅ Yes |
| inactive | Unsubscribed | ❌ No |
Operations
Subscribe:
curl -X POST http://localhost:5001/api/subscribe \
-H "Content-Type: application/json" \
-d '{"email": "user@example.com"}'
Unsubscribe:
curl -X POST http://localhost:5001/api/unsubscribe \
-H "Content-Type: application/json" \
-d '{"email": "user@example.com"}'
Check Stats:
curl http://localhost:5001/api/admin/stats | jq '.subscribers'
Admin Features
Manual Crawl
Trigger crawl anytime:
curl -X POST http://localhost:5001/api/admin/trigger-crawl \
-H "Content-Type: application/json" \
-d '{"max_articles": 10}'
Test Email
Send test newsletter:
curl -X POST http://localhost:5001/api/admin/send-test-email \
-H "Content-Type: application/json" \
-d '{"email": "test@example.com"}'
Send Newsletter
Send to all subscribers:
curl -X POST http://localhost:5001/api/admin/send-newsletter \
-H "Content-Type: application/json" \
-d '{"max_articles": 10}'
System Stats
View system statistics:
curl http://localhost:5001/api/admin/stats
Automation
Scheduled Tasks
Crawler (6:00 AM Berlin time):
- Fetches new articles
- Processes with AI
- Stores in database
Sender (7:00 AM Berlin time):
- Waits for crawler to finish
- Fetches today's articles
- Generates newsletter
- Sends to all active subscribers
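The daily schedule boils down to finding the next 6:00 or 7:00 wall-clock time. The helper below is a minimal sketch of that calculation, not the project's scheduler; `next_run` and the two hour constants are illustrative names.

```python
from datetime import datetime, timedelta

CRAWLER_HOUR = 6   # 6:00 AM, from the schedule above
SENDER_HOUR = 7    # 7:00 AM, from the schedule above


def next_run(now, hour, minute=0):
    """Return the next hour:minute at or after `now`, in now's own timezone."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < now:
        # Today's slot has passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For Berlin time, `now` would be a timezone-aware value such as `datetime.now(ZoneInfo("Europe/Berlin"))` from the standard `zoneinfo` module, so the result stays at 6:00 or 7:00 local time across daylight-saving changes.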
Manual Execution
# Run crawler manually
docker-compose exec crawler python crawler_service.py 10
# Run sender manually
docker-compose exec sender python sender_service.py send 10
# Send test email
docker-compose exec sender python sender_service.py test your@email.com
Configuration
Environment Variables
# Newsletter Settings
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24
WEBSITE_URL=http://localhost:3000
# Ollama AI
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_TIMEOUT=120
SUMMARY_MAX_WORDS=150
# Tracking
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90
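A service could read these variables into a typed object along the lines of the sketch below. The `Settings` class and `load_settings` function are hypothetical, and treating the values shown above as defaults is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    max_articles: int
    hours_lookback: int
    ollama_enabled: bool
    ollama_timeout: int
    summary_max_words: int
    tracking_enabled: bool
    retention_days: int


def _flag(value):
    """Interpret common truthy strings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes"}


def load_settings(env=os.environ):
    """Build Settings from a mapping, falling back to the values listed above."""
    return Settings(
        max_articles=int(env.get("NEWSLETTER_MAX_ARTICLES", "10")),
        hours_lookback=int(env.get("NEWSLETTER_HOURS_LOOKBACK", "24")),
        ollama_enabled=_flag(env.get("OLLAMA_ENABLED", "true")),
        ollama_timeout=int(env.get("OLLAMA_TIMEOUT", "120")),
        summary_max_words=int(env.get("SUMMARY_MAX_WORDS", "150")),
        tracking_enabled=_flag(env.get("TRACKING_ENABLED", "true")),
        retention_days=int(env.get("TRACKING_DATA_RETENTION_DAYS", "90")),
    )
```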
RSS Feeds
Add feeds in MongoDB:
db.rss_feeds.insertOne({
name: "Süddeutsche Zeitung München",
url: "https://www.sueddeutsche.de/muenchen/rss",
active: true
})
Performance Optimization
GPU Acceleration
Enable for roughly 4-5x faster processing (per the measured timings under AI Features):
./start-with-gpu.sh
Benefits:
- Faster summarization (8s → 2s)
- Faster translation (1.5s → 0.3s)
- Process more articles
- Lower CPU usage
Batch Processing
Process multiple articles efficiently:
- Model stays loaded in memory
- Reduced overhead
- Better throughput
Caching
- Model caching (Ollama)
- Database connection pooling
- Persistent storage
Monitoring
Logs
# Crawler logs
docker-compose logs -f crawler
# Sender logs
docker-compose logs -f sender
# Backend logs
docker-compose logs -f backend
Metrics
- Articles crawled
- Summaries generated
- Newsletters sent
- Open rate
- Click-through rate
- Processing time
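Open rate and click-through rate can be derived from the tracking counts as sketched here. The guide does not define the denominators, so this example assumes both rates are measured against newsletters sent; `engagement_rates` is a hypothetical name.

```python
def engagement_rates(sent, unique_opens, unique_clicks):
    """Return open and click-through rates as fractions of newsletters sent."""
    if sent <= 0:
        # Avoid dividing by zero before any newsletter has gone out.
        return {"open_rate": 0.0, "click_through_rate": 0.0}
    return {
        "open_rate": unique_opens / sent,
        "click_through_rate": unique_clicks / sent,
    }
```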
Health Checks
# Backend health
curl http://localhost:5001/health
# System stats
curl http://localhost:5001/api/admin/stats
Troubleshooting
Crawler Issues
No articles found:
- Check RSS feed URLs
- Verify feeds are active
- Check network connectivity
Extraction failed:
- Article structure changed
- Paywall detected
- Network timeout
AI processing failed:
- Ollama not running
- Model not downloaded
- Timeout too short
Newsletter Issues
Not sending:
- Check email configuration
- Verify SMTP credentials
- Check subscriber count
Tracking not working:
- Verify tracking is enabled
- Check that the backend API is accessible
- Verify tracking URLs
See SETUP.md for configuration, API.md for API reference, and ARCHITECTURE.md for system design.