Newsletter Personalization Implementation
Overview
Personalized newsletters based on user click behavior, using keywords and categories to build interest profiles.
Implementation Phases
✅ Phase 1: Keyword Extraction (COMPLETED)
Status: Implemented
Files Modified:
- news_crawler/ollama_client.py - Added extract_keywords() method
- news_crawler/crawler_service.py - Integrated keyword extraction into crawl process
What it does:
- Extracts 5 keywords from each article using Ollama AI
- Keywords stored in articles collection: keywords: ["Bayern Munich", "Football", ...]
- Runs automatically during news crawling
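As a rough illustration of the extraction step, the sketch below asks Ollama for a comma-separated keyword list and normalizes the reply. It is not the code in news_crawler/ollama_client.py: the prompt text, the parse_keywords() helper, the 2000-character truncation, and the signature of extract_keywords() are assumptions made for this example. Only the /api/generate endpoint and the default model and base URL (taken from the Configuration section) come from existing sources.

```python
import json
import urllib.request

MAX_KEYWORDS = 5  # the crawler stores 5 keywords per article


def parse_keywords(raw: str, limit: int = MAX_KEYWORDS) -> list[str]:
    """Turn a comma- or newline-separated model reply into a clean keyword list."""
    seen = set()
    keywords = []
    for part in raw.replace("\n", ",").split(","):
        # Drop surrounding whitespace, quotes, and simple bullet characters.
        word = part.strip().strip("\"'-*• ").strip()
        key = word.lower()
        if word and key not in seen:  # skip blanks and case-insensitive duplicates
            seen.add(key)
            keywords.append(word)
    return keywords[:limit]


def extract_keywords(title: str, content: str,
                     base_url: str = "http://ollama:11434",
                     model: str = "gemma3:12b") -> list[str]:
    """Ask Ollama's /api/generate endpoint for keywords (hypothetical prompt)."""
    prompt = (
        f"Extract {MAX_KEYWORDS} keywords from this article "
        "as a comma-separated list.\n"
        f"Title: {title}\n\n{content[:2000]}"
    )
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        reply = json.load(resp)["response"]
    return parse_keywords(reply)
```

Keeping the parsing separate from the network call makes the cleanup logic testable without a running Ollama instance.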
Test it:
# Trigger a crawl
curl -X POST http://localhost:5001/api/admin/trigger-crawl -H "Content-Type: application/json" -d '{"max_articles": 2}'
# Check articles have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.articles.findOne({}, {title: 1, keywords: 1})"
✅ Phase 2: Click Tracking Enhancement (COMPLETED)
Status: Implemented
Goal: Track clicks with keyword metadata
Files Modified:
- backend/services/tracking_service.py - Enhanced create_newsletter_tracking() to look up article metadata
What it does:
- When creating tracking links, looks up article from database
- Stores article ID, category, and keywords in tracking record
- Enables building user interest profiles from click behavior
Database Schema:
// link_clicks collection
{
tracking_id: "uuid",
newsletter_id: "2024-11-18",
subscriber_email: "user@example.com",
article_url: "https://...",
article_title: "Article Title",
article_id: "673abc123...", // NEW: Article database ID
category: "sports", // NEW: Article category
keywords: ["Bayern Munich", "Bundesliga"], // NEW: Keywords for personalization
clicked: false,
clicked_at: null,
user_agent: null,
created_at: ISODate()
}
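To make the schema concrete, here is a minimal sketch of how a tracking record with the new fields might be assembled from an article document. The helper name build_tracking_record() and the article field names (_id, url, title) are assumptions for illustration; the real logic lives in create_newsletter_tracking() and may differ.

```python
import uuid
from datetime import datetime, timezone


def build_tracking_record(article: dict, newsletter_id: str,
                          subscriber_email: str) -> dict:
    """Build a link_clicks document that carries article metadata."""
    article_id = article.get("_id")  # assumed field name on the article document
    return {
        "tracking_id": str(uuid.uuid4()),
        "newsletter_id": newsletter_id,
        "subscriber_email": subscriber_email,
        "article_url": article.get("url"),      # assumed field name
        "article_title": article.get("title"),  # assumed field name
        # Fields added in Phase 2; they fall back to empty values when the
        # article has no metadata yet.
        "article_id": str(article_id) if article_id is not None else None,
        "category": article.get("category"),
        "keywords": list(article.get("keywords", [])),
        # Click state is filled in later, when the link is actually followed.
        "clicked": False,
        "clicked_at": None,
        "user_agent": None,
        "created_at": datetime.now(timezone.utc),
    }
```

Copying category and keywords into the tracking record at send time means a later click can update the interest profile without a second article lookup.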
Test it:
# Send a test newsletter
curl -X POST http://localhost:5001/api/admin/send-newsletter
# Check tracking records have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.link_clicks.findOne({}, {article_title: 1, keywords: 1, category: 1})"
✅ Phase 3: User Interest Profiling (COMPLETED)
Status: Implemented
Goal: Build user interest profiles from click history
Files Created:
- backend/services/interest_profiling_service.py - Core profiling logic
- backend/routes/interests_routes.py - API endpoints for interest management
Files Modified:
- backend/routes/tracking_routes.py - Auto-update interests on click
- backend/app.py - Register interests routes
What it does:
- Automatically builds interest profiles when users click articles
- Tracks interest scores for categories and keywords (0.0 to 1.0)
- Increments scores by 0.1 per click, capped at 1.0
- Provides decay mechanism for old interests
- Supports rebuilding profiles from click history
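The scoring rules above can be sketched in a few lines. This is an illustrative in-memory version, not the code in interest_profiling_service.py: the function names are hypothetical, scores are rounded to four decimals to avoid floating-point drift, and the decay rule (multiply every score by decay_factor once a profile has not been updated for days_threshold days) is an assumption inferred from the decay endpoint's parameters.

```python
from datetime import datetime, timedelta, timezone

CLICK_INCREMENT = 0.1  # score added per click
MAX_SCORE = 1.0        # scores are capped here


def bump(scores: dict, key: str) -> None:
    """Raise one interest score by CLICK_INCREMENT, capped at MAX_SCORE."""
    scores[key] = round(min(MAX_SCORE, scores.get(key, 0.0) + CLICK_INCREMENT), 4)


def record_click(profile: dict, category: str, keywords: list[str]) -> dict:
    """Update a user_interests document in place for one clicked article."""
    if category:
        bump(profile.setdefault("categories", {}), category)
    for keyword in keywords:
        bump(profile.setdefault("keywords", {}), keyword)
    profile["total_clicks"] = profile.get("total_clicks", 0) + 1
    profile["last_updated"] = datetime.now(timezone.utc)
    return profile


def decay_profile(profile: dict, decay_factor: float = 0.95,
                  days_threshold: int = 7, now: datetime = None) -> dict:
    """Shrink all scores of a profile that has been idle for days_threshold days."""
    now = now or datetime.now(timezone.utc)
    if now - profile["last_updated"] < timedelta(days=days_threshold):
        return profile  # recently active profiles are left untouched
    for field in ("categories", "keywords"):
        profile[field] = {
            key: round(score * decay_factor, 4)
            for key, score in profile.get(field, {}).items()
        }
    return profile
```

With these constants, ten clicks on the same category take its score from 0.0 to the 1.0 cap, and further clicks only increase total_clicks.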
Database Schema:
// user_interests collection
{
email: "user@example.com",
categories: {
sports: 0.8,
local: 0.5,
science: 0.2
},
keywords: {
"Bayern Munich": 0.9,
"Oktoberfest": 0.7,
"AI": 0.3
},
total_clicks: 15,
last_updated: ISODate(),
created_at: ISODate()
}
API Endpoints:
# Get user interests
GET /api/interests/<email>
# Get top interests
GET /api/interests/<email>/top?top_n=10
# Rebuild from history
POST /api/interests/<email>/rebuild
Body: {"days_lookback": 30}
# Decay old interests
POST /api/interests/decay
Body: {"decay_factor": 0.95, "days_threshold": 7}
# Get statistics
GET /api/interests/statistics
# Delete profile (GDPR)
DELETE /api/interests/<email>
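For scripted use, the endpoints above could be called from Python with the standard library alone. The sketch below only builds the requests; the helper names are hypothetical, the base URL is taken from the curl examples in this document, and the assumption that responses are JSON is not confirmed by the source.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:5001"  # host and port from the curl examples


def _interest_url(email: str, suffix: str = "") -> str:
    # Keep "@" readable in the path, percent-encode everything else unsafe.
    return f"{BASE_URL}/api/interests/{quote(email, safe='@')}{suffix}"


def rebuild_request(email: str, days_lookback: int = 30) -> urllib.request.Request:
    """POST /api/interests/<email>/rebuild with a JSON body."""
    body = json.dumps({"days_lookback": days_lookback}).encode("utf-8")
    return urllib.request.Request(
        _interest_url(email, "/rebuild"),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_request(email: str) -> urllib.request.Request:
    """DELETE /api/interests/<email> (profile removal, e.g. for GDPR requests)."""
    return urllib.request.Request(_interest_url(email), method="DELETE")


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request and decode the reply, assuming a JSON response."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

A call such as send(rebuild_request("user@example.com", days_lookback=30)) would then trigger a rebuild against a locally running backend.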
Test it:
# Run test script
docker exec munich-news-local-backend python test_interest_profiling.py
# View a user's interests
curl http://localhost:5001/api/interests/user@example.com
# Get statistics
curl http://localhost:5001/api/interests/statistics
✅ Phase 4: Personalized Newsletter (COMPLETED)
Status: Implemented
Goal: Rank and select articles based on user interests
Files Created:
- backend/services/personalization_service.py - Core personalization logic
- backend/routes/personalization_routes.py - API endpoints for testing
Files Modified:
- backend/app.py - Register personalization routes
What it does:
- Scores articles based on user's category and keyword interests
- Ranks articles by personalization score (0.0 to 1.0)
- Selects mix of personalized (70%) + trending (30%) content
- Provides explanations for recommendations
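One way to realize the 70/30 mix is sketched below. It is an assumption about the selection step, not the code in personalization_service.py: the function name, the "id" field, and the backfill behavior when too few trending articles exist are all hypothetical.

```python
def select_articles(ranked: list, trending: list,
                    max_articles: int = 10,
                    personalization_ratio: float = 0.7) -> list:
    """Take the top personalized articles, then fill the rest with trending ones.

    `ranked` is assumed to be sorted by personalization score (best first) and
    `trending` by popularity; each article is a dict with a unique "id".
    """
    n_personal = round(max_articles * personalization_ratio)
    picked = list(ranked[:n_personal])
    picked_ids = {article["id"] for article in picked}

    # Fill remaining slots with trending articles, then backfill from the
    # personalized ranking if trending ran short. Duplicates are skipped.
    for article in list(trending) + list(ranked[n_personal:]):
        if len(picked) >= max_articles:
            break
        if article["id"] not in picked_ids:
            picked.append(article)
            picked_ids.add(article["id"])
    return picked
```

Reserving a share of slots for trending content keeps the newsletter from narrowing to a reader's past clicks and gives new interests a chance to register.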
Algorithm:
score = (category_match * 0.4) + (keyword_match * 0.6)
# Example:
# User interests: sports=0.8, "Bayern Munich"=0.9
# Article: sports category, keywords=["Bayern Munich", "Football"]
# Score = (0.8 * 0.4) + (0.9 * 0.6) = 0.32 + 0.54 = 0.86
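The formula can be written as a small function. This sketch assumes that keyword_match is the highest interest score among the article's keywords, which reproduces the worked example above; the actual service may combine keyword scores differently, and the function name is hypothetical.

```python
CATEGORY_WEIGHT = 0.4
KEYWORD_WEIGHT = 0.6


def score_article(article: dict, interests: dict) -> float:
    """Return a personalization score between 0.0 and 1.0 for one article."""
    category_scores = interests.get("categories", {})
    keyword_scores = interests.get("keywords", {})

    category_match = category_scores.get(article.get("category"), 0.0)
    # Assumption: the best-matching keyword determines keyword_match.
    keyword_match = max(
        (keyword_scores.get(kw, 0.0) for kw in article.get("keywords", [])),
        default=0.0,
    )
    return category_match * CATEGORY_WEIGHT + keyword_match * KEYWORD_WEIGHT


interests = {"categories": {"sports": 0.8}, "keywords": {"Bayern Munich": 0.9}}
article = {"category": "sports", "keywords": ["Bayern Munich", "Football"]}
print(round(score_article(article, interests), 2))  # 0.86
```

Because both inputs lie in the range 0.0 to 1.0 and the weights sum to 1.0, the result stays within the documented score range.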
API Endpoints:
# Preview personalized newsletter
GET /api/personalize/preview/<email>?max_articles=10&hours_lookback=24
# Explain recommendation
POST /api/personalize/explain
Body: {"email": "user@example.com", "article_id": "..."}
Test it:
# Run test script
docker exec munich-news-local-backend python test_personalization.py
# Preview personalized newsletter
curl "http://localhost:5001/api/personalize/preview/demo@example.com?max_articles=5"
✅ All Phases Complete!
- Phase 1: Keyword extraction from articles ✅ DONE
- Phase 2: Click tracking with keywords ✅ DONE
- Phase 3: User interest profiling ✅ DONE
- Phase 4: Personalized newsletter generation ✅ DONE
Next Steps for Production
- Integrate with newsletter sender - Modify news_sender/sender_service.py to use personalization
- A/B testing - Compare personalized vs non-personalized engagement
- Tune parameters - Adjust personalization_ratio, weights, decay rates
- Monitor metrics - Track click-through rates, open rates by personalization score
- User controls - Add UI for users to view/edit their interests
Configuration
No configuration needed yet. Keyword extraction uses existing Ollama settings from backend/.env:
OLLAMA_ENABLED=true
OLLAMA_MODEL=gemma3:12b
OLLAMA_BASE_URL=http://ollama:11434
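A minimal sketch of how these variables might be read at startup is shown below. The OllamaSettings class, the load_ollama_settings() helper, the accepted truthy strings, and the fallback defaults are assumptions for illustration; the project's own settings loader may behave differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class OllamaSettings:
    enabled: bool
    model: str
    base_url: str


def load_ollama_settings(env=None) -> OllamaSettings:
    """Read the Ollama settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    enabled = env.get("OLLAMA_ENABLED", "false").strip().lower() in {"1", "true", "yes"}
    return OllamaSettings(
        enabled=enabled,
        # Fallback values mirror the example .env and are assumptions.
        model=env.get("OLLAMA_MODEL", "gemma3:12b"),
        base_url=env.get("OLLAMA_BASE_URL", "http://ollama:11434").rstrip("/"),
    )
```

When OLLAMA_ENABLED is false or missing, a crawler using this helper could skip keyword extraction and store articles without keywords.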