🎉 Newsletter Personalization System - Complete!
All 4 phases of the personalization system have been successfully implemented and tested.
✅ What Was Built
Phase 1: Keyword Extraction
- AI-powered keyword extraction from articles using Ollama
- 5 keywords per article automatically extracted during crawling
- Keywords stored in database for personalization
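A minimal sketch of the extraction step, assuming the model is asked for a comma-separated list and its reply is cleaned up afterwards. The prompt wording and the names `build_keyword_prompt` and `parse_keywords` are illustrative assumptions, not the actual code in `news_crawler/ollama_client.py`; the call to Ollama itself is left out.

```python
MAX_KEYWORDS = 5  # the system stores 5 keywords per article


def build_keyword_prompt(title: str, content: str) -> str:
    """Build a prompt asking for a comma-separated keyword list (hypothetical wording)."""
    return (
        f"Extract {MAX_KEYWORDS} keywords from this article. "
        "Reply with a comma-separated list only.\n\n"
        f"Title: {title}\n\n{content}"
    )


def parse_keywords(raw_response: str, limit: int = MAX_KEYWORDS) -> list[str]:
    """Turn a raw model reply into at most `limit` clean, unique keywords."""
    keywords: list[str] = []
    seen: set[str] = set()
    # Models sometimes answer with newlines or bullets instead of commas.
    for part in raw_response.replace("\n", ",").split(","):
        keyword = part.strip(" \t-*\"'.")
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())  # case-insensitive deduplication
            keywords.append(keyword)
    return keywords[:limit]
```

Parsing defensively like this keeps malformed model output from reaching the database, which matters because later phases match keywords by exact string.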
Phase 2: Click Tracking Enhancement
- Enhanced tracking to capture article keywords and category
- Tracking records now include metadata for building interest profiles
- Privacy-compliant with opt-out and GDPR support
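As a rough illustration of what an enriched tracking record might hold, the sketch below attaches category and keywords to each click and skips users who opted out. The `ClickRecord` schema, the `record_click` helper, and the article dictionary keys are hypothetical and only meant to show the idea.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ClickRecord:
    """One tracked click, enriched with article metadata (hypothetical schema)."""
    email: str
    article_url: str
    category: str
    keywords: list[str] = field(default_factory=list)
    clicked_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def record_click(email: str, article: dict, opted_out: set[str]) -> Optional[ClickRecord]:
    """Return a record for the click, or None if the user opted out of tracking."""
    if email in opted_out:
        return None  # respect the opt-out: store nothing
    return ClickRecord(
        email=email,
        article_url=article["url"],
        category=article["category"],
        keywords=list(article.get("keywords", [])),
    )
```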
Phase 3: User Interest Profiling
- Automatic profile building from click behavior
- Interest scores (0.0-1.0) for categories and keywords
- Decay mechanism for old interests
- API endpoints for viewing and managing profiles
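The profile update and decay can be sketched as follows. The 0.1 per-click increment and the 0.0-1.0 range come from this document; the decay factor, the floor, and the function names are assumptions chosen for illustration.

```python
CLICK_INCREMENT = 0.1  # per-click increment described in this document
MAX_SCORE = 1.0        # scores stay within 0.0-1.0
DECAY_FACTOR = 0.95    # hypothetical: applied once per decay run


def _bump(scores: dict[str, float], key: str) -> None:
    """Raise one score by the click increment, capped at MAX_SCORE."""
    scores[key] = round(min(MAX_SCORE, scores.get(key, 0.0) + CLICK_INCREMENT), 4)


def apply_click(profile: dict, category: str, keywords: list[str]) -> dict:
    """Update a profile in place after one click and return it."""
    _bump(profile.setdefault("categories", {}), category)
    for keyword in keywords:
        _bump(profile.setdefault("keywords", {}), keyword)
    profile["total_clicks"] = profile.get("total_clicks", 0) + 1
    return profile


def apply_decay(profile: dict, factor: float = DECAY_FACTOR, floor: float = 0.01) -> dict:
    """Shrink every score by `factor`, dropping interests that fall below `floor`."""
    for section in ("categories", "keywords"):
        decayed = {k: round(v * factor, 4) for k, v in profile.get(section, {}).items()}
        profile[section] = {k: v for k, v in decayed.items() if v >= floor}
    return profile
```

Three clicks on sports articles tagged "Bayern Munich" would give both interests a score of 0.3, matching the shape of the example profile further down.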
Phase 4: Personalized Newsletter Generation
- Article scoring based on user interests
- Smart ranking algorithm (40% category + 60% keywords)
- Mix of personalized (70%) + trending (30%) content
- Explanation system for recommendations
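One way to realize the 70/30 mix is to reserve a share of the slots for personalized picks and fill the rest from the trending list. This is a sketch under assumptions: `mix_articles` is a hypothetical helper, both inputs are taken to be sorted best-first, and each article is a dict with a unique `"url"` key.

```python
def mix_articles(personalized: list[dict], trending: list[dict],
                 max_articles: int = 10, ratio: float = 0.7) -> list[dict]:
    """Fill about `ratio` of the slots from `personalized`, the rest from `trending`."""
    personalized_slots = round(max_articles * ratio)
    selected: list[dict] = []
    seen: set[str] = set()

    def take(source: list[dict], limit: int) -> None:
        # Append unseen articles from `source` until `selected` holds `limit` items.
        for article in source:
            if len(selected) >= limit:
                break
            if article["url"] not in seen:
                seen.add(article["url"])
                selected.append(article)

    take(personalized, personalized_slots)  # personalized picks first
    take(trending, max_articles)            # trending fills the remainder
    take(personalized, max_articles)        # backfill if trending ran short
    return selected
```

The backfill step means a subscriber still receives a full newsletter when few trending articles are available, at the cost of a less diverse mix.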
📊 How It Works
1. User clicks article in newsletter
↓
2. System records: keywords + category
↓
3. Interest profile updates automatically
↓
4. Next newsletter: articles ranked by interests
↓
5. User receives personalized content
🧪 Testing
All phases have been tested and verified:
# Run comprehensive test suite (tests all 4 phases)
docker exec munich-news-local-backend python test_personalization_system.py
# Or test keyword extraction separately
docker exec munich-news-local-crawler python -c "from crawler_service import crawl_all_feeds; crawl_all_feeds(max_articles_per_feed=2)"
🔌 API Endpoints
Interest Management
GET /api/interests/<email> # View profile
GET /api/interests/<email>/top # Top interests
POST /api/interests/<email>/rebuild # Rebuild from history
GET /api/interests/statistics # Platform stats
DELETE /api/interests/<email> # Delete (GDPR)
Personalization
GET /api/personalize/preview/<email> # Preview personalized newsletter
POST /api/personalize/explain # Explain recommendation
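The endpoints above can be called with Python's standard library alone. The sketch below only builds the requests; `BASE_URL` is a hypothetical address, and the helper names are illustrative. The request body for `/api/personalize/explain` is not specified in this document, so it is left out.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:5001"  # hypothetical; use your backend's address


def interests_request(email: str, action: str = "view") -> Request:
    """Build a request for one of the interest-management endpoints."""
    path = f"/api/interests/{quote(email, safe='@')}"
    routes = {
        "view": ("GET", path),
        "top": ("GET", f"{path}/top"),
        "rebuild": ("POST", f"{path}/rebuild"),
        "delete": ("DELETE", path),
    }
    method, route = routes[action]
    return Request(BASE_URL + route, method=method)


def preview_request(email: str) -> Request:
    """Build a request for the personalized newsletter preview."""
    return Request(f"{BASE_URL}/api/personalize/preview/{quote(email, safe='@')}", method="GET")
```

Passing the result to `urllib.request.urlopen` sends the request, for example `urlopen(interests_request("user@example.com", "top"))`.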
📈 Example Results
User Profile
{
  "email": "user@example.com",
  "categories": {
    "sports": 0.30,
    "local": 0.10
  },
  "keywords": {
    "Bayern Munich": 0.30,
    "Football": 0.20,
    "Transportation": 0.10
  },
  "total_clicks": 5
}
Personalized Newsletter
{
  "articles": [
    {
      "title": "Bayern Munich wins championship",
      "personalization_score": 0.86,
      "category": "sports",
      "keywords": ["Bayern Munich", "Football"]
    },
    {
      "title": "New S-Bahn line opens",
      "personalization_score": 0.42,
      "category": "local",
      "keywords": ["Transportation", "Munich"]
    }
  ],
  "statistics": {
    "highly_personalized": 1,
    "moderately_personalized": 1,
    "trending": 0
  }
}
🎯 Scoring Algorithm
# Article score calculation
category_score = user_interests.categories[article.category]
keyword_score = average(user_interests.keywords[kw] for kw in article.keywords)
final_score = (category_score * 0.4) + (keyword_score * 0.6)
Example:
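- User: sports=0.8, "Bayern Munich"=0.9 (no score for "Football")
- Article: sports category, keywords=["Bayern Munich", "Football"]
- Keyword score = 0.9, taking the average over the keywords the user has a score for
- Score = (0.8 × 0.4) + (0.9 × 0.6) = 0.32 + 0.54 = 0.86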
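A runnable version of the calculation, written as a sketch rather than the project's actual implementation. The worked example reaches 0.86 only if the keyword average skips keywords that are missing from the profile, so the code makes that assumption explicit; `score_article` and the profile layout are illustrative.

```python
CATEGORY_WEIGHT = 0.4
KEYWORD_WEIGHT = 0.6


def score_article(profile: dict, category: str, keywords: list[str]) -> float:
    """Combine category and keyword interest into a single 0.0-1.0 score."""
    category_score = profile.get("categories", {}).get(category, 0.0)
    known = profile.get("keywords", {})
    # Assumption: average only over keywords the user already has a score for.
    matched = [known[kw] for kw in keywords if kw in known]
    keyword_score = sum(matched) / len(matched) if matched else 0.0
    return category_score * CATEGORY_WEIGHT + keyword_score * KEYWORD_WEIGHT


profile = {"categories": {"sports": 0.8}, "keywords": {"Bayern Munich": 0.9}}
score = score_article(profile, "sports", ["Bayern Munich", "Football"])
print(round(score, 2))  # 0.86
```

If unmatched keywords counted as 0.0 instead, the same article would score (0.8 × 0.4) + (0.45 × 0.6) = 0.59, so the averaging rule is worth pinning down in the real service.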
🚀 Production Integration
To integrate with the newsletter sender:
- Modify news_sender/sender_service.py:
from services.personalization_service import select_personalized_articles

# For each subscriber
personalized_articles = select_personalized_articles(
    all_articles,
    subscriber_email,
    max_articles=10
)
- Enable personalization flag in config:
PERSONALIZATION_ENABLED=true
PERSONALIZATION_RATIO=0.7 # 70% personalized, 30% trending
- Monitor metrics:
  - Click-through rate by personalization score
  - Open rates for personalized vs non-personalized
  - User engagement over time
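For the first metric, click-through rate by personalization score, a small aggregation like the one below may be enough to start with. The score thresholds, the record layout, and the reuse of the statistics labels for score buckets are all assumptions made for this sketch; the document does not state the real cut-offs.

```python
from collections import defaultdict

# Hypothetical thresholds for grouping scores into buckets.
HIGH_THRESHOLD = 0.7
MODERATE_THRESHOLD = 0.3


def bucket(score: float) -> str:
    """Map a personalization score to a bucket label (assumed mapping)."""
    if score >= HIGH_THRESHOLD:
        return "highly_personalized"
    if score >= MODERATE_THRESHOLD:
        return "moderately_personalized"
    return "trending"


def ctr_by_bucket(sent: list[dict]) -> dict[str, float]:
    """Compute click-through rate per bucket from sent-article records.

    Each record is assumed to look like
    {"personalization_score": 0.86, "clicked": True}.
    """
    totals: dict[str, int] = defaultdict(int)
    clicks: dict[str, int] = defaultdict(int)
    for record in sent:
        name = bucket(record["personalization_score"])
        totals[name] += 1
        clicks[name] += int(record["clicked"])
    return {name: clicks[name] / totals[name] for name in totals}
```

If personalization works as intended, the highly personalized bucket should show a higher rate than the others; a flat result would suggest revisiting the weights.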
🔐 Privacy & Compliance
- ✅ Users can opt out of tracking
- ✅ Interest profiles can be deleted (GDPR)
- ✅ Automatic anonymization after 90 days
- ✅ No PII beyond email address
- ✅ Transparent recommendation explanations
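The 90-day anonymization could look roughly like this. It is a sketch under assumptions: records are plain dicts with an `"email"` and a timezone-aware `"clicked_at"`, and anonymizing means clearing the email while keeping category and keywords; the real service may store and anonymize data differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=90)  # anonymization window stated in this document


def anonymize_old_records(records: list[dict], now: Optional[datetime] = None) -> int:
    """Clear the email on records older than 90 days; return how many changed."""
    now = now or datetime.now(timezone.utc)
    changed = 0
    for record in records:
        is_old = now - record["clicked_at"] > RETENTION
        if is_old and record.get("email") is not None:
            record["email"] = None  # category and keywords remain for aggregate stats
            changed += 1
    return changed
```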
📁 Files Created/Modified
New Files
- backend/services/interest_profiling_service.py
- backend/services/personalization_service.py
- backend/routes/interests_routes.py
- backend/routes/personalization_routes.py
- backend/test_tracking_phase2.py
- backend/test_interest_profiling.py
- backend/test_personalization.py
- docs/PERSONALIZATION.md
Modified Files
- news_crawler/ollama_client.py - Added keyword extraction
- news_crawler/crawler_service.py - Integrated keyword extraction
- backend/services/tracking_service.py - Enhanced with metadata
- backend/routes/tracking_routes.py - Auto-update interests
- backend/app.py - Registered new routes
🎓 Key Learnings
- Incremental scoring works well - 0.1 per click prevents over-weighting
- Mix is important - 70/30 personalized/trending avoids filter bubbles
- Keywords > Categories - 60/40 weight reflects keyword importance
- Decay is essential - Prevents stale interests from dominating
- Transparency matters - Explanation API helps users understand recommendations
🎉 Status: COMPLETE
All 4 phases implemented, tested, and documented. The personalization system is ready for production integration!