Munich-news/docs/PERSONALIZATION_COMPLETE.md
2025-11-18 14:45:41 +01:00

# 🎉 Newsletter Personalization System - Complete!
All 4 phases of the personalization system have been successfully implemented and tested.
## ✅ What Was Built
### Phase 1: Keyword Extraction
- AI-powered keyword extraction from articles using Ollama
- 5 keywords per article automatically extracted during crawling
- Keywords stored in database for personalization
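As a rough illustration of the extraction step, the sketch below sends one non-streaming request to Ollama's `/api/generate` endpoint and parses the reply into at most five keywords. The prompt wording, the model name, the local URL, and the helper names (`build_prompt`, `parse_keywords`, `extract_keywords`) are assumptions for illustration, not the code in `news_crawler/ollama_client.py`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed local Ollama address
MODEL = "llama3"  # hypothetical model name


def build_prompt(title: str, content: str, limit: int = 5) -> str:
    """Ask the model for a comma-separated keyword list."""
    return (
        f"Extract exactly {limit} keywords from this news article. "
        "Reply with a comma-separated list only.\n\n"
        f"Title: {title}\n\n{content}"
    )


def parse_keywords(raw: str, limit: int = 5) -> list[str]:
    """Split the model reply into at most `limit` unique, trimmed keywords."""
    seen: set[str] = set()
    keywords: list[str] = []
    for part in raw.replace("\n", ",").split(","):
        keyword = part.strip(" \t-*\"'.")  # drop bullets, quotes, trailing dots
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            keywords.append(keyword)
    return keywords[:limit]


def extract_keywords(title: str, content: str, limit: int = 5) -> list[str]:
    """Call Ollama once (non-streaming) and parse its reply."""
    payload = json.dumps(
        {"model": MODEL, "prompt": build_prompt(title, content, limit), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return parse_keywords(reply.get("response", ""), limit)
```

Keeping the parsing separate from the network call makes the cleanup logic testable without a running Ollama instance.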
### Phase 2: Click Tracking Enhancement
- Enhanced tracking to capture article keywords and category
- Tracking records now include metadata for building interest profiles
- Privacy-compliant with opt-out and GDPR support
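An enriched tracking record might look like the sketch below. The field names, the `ClickRecord` class, the `"uncategorized"` fallback, and the opt-out lookup are all hypothetical; the actual schema lives in `backend/services/tracking_service.py`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ClickRecord:
    """Hypothetical shape of a click record enriched with article metadata."""
    email: str
    article_url: str
    category: str
    keywords: list[str]
    clicked_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def build_click_record(
    email: str, article: dict, opted_out: set[str]
) -> Optional[ClickRecord]:
    """Return an enriched record, or None when the subscriber opted out of tracking."""
    if email.lower() in opted_out:
        return None
    return ClickRecord(
        email=email,
        article_url=article["url"],
        category=article.get("category", "uncategorized"),  # assumed fallback label
        keywords=list(article.get("keywords", [])),
    )
```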
### Phase 3: User Interest Profiling
- Automatic profile building from click behavior
- Interest scores (0.0-1.0) for categories and keywords
- Decay mechanism for old interests
- API endpoints for viewing and managing profiles
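The profile update can be pictured as follows. The 0.1 step per click comes from the Key Learnings section; the cap at 1.0 follows from the stated score range, while the decay factor, the drop threshold, and all function names are assumptions made for this sketch.

```python
CLICK_INCREMENT = 0.1  # per-click step, as noted under Key Learnings
DECAY_FACTOR = 0.9     # assumed multiplier applied on each decay run
MIN_SCORE = 0.01       # assumed cutoff below which an interest is dropped


def new_profile(email: str) -> dict:
    """Create an empty profile shaped like the JSON under Example Results."""
    return {"email": email, "categories": {}, "keywords": {}, "total_clicks": 0}


def _bump(scores: dict, key: str) -> None:
    """Raise one interest by a fixed step, capped at 1.0."""
    scores[key] = round(min(1.0, scores.get(key, 0.0) + CLICK_INCREMENT), 4)


def record_click(profile: dict, category: str, keywords: list[str]) -> dict:
    """Update the profile in place for one clicked article."""
    _bump(profile["categories"], category)
    for keyword in keywords:
        _bump(profile["keywords"], keyword)
    profile["total_clicks"] += 1
    return profile


def apply_decay(profile: dict) -> dict:
    """Shrink every interest and drop those that fall below MIN_SCORE."""
    for group in ("categories", "keywords"):
        decayed = {k: round(v * DECAY_FACTOR, 4) for k, v in profile[group].items()}
        profile[group] = {k: v for k, v in decayed.items() if v >= MIN_SCORE}
    return profile
```

With these values, three clicks on sports articles give `sports` a score of 0.3, and one decay run lowers it to 0.27.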
### Phase 4: Personalized Newsletter Generation
- Article scoring based on user interests
- Weighted ranking score (40% category match + 60% keyword match)
- Mix of personalized (70%) + trending (30%) content
- Explanation system for recommendations
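One way to realize the 70/30 mix is sketched below: the top-scored articles fill the personalized share of the slots, and trending articles fill the rest. The function name `mix_newsletter`, the input shapes, and the back-fill behavior when trending runs short are assumptions, not the behavior of `select_personalized_articles`.

```python
def mix_newsletter(
    scored: list[tuple[str, float]],
    trending: list[str],
    max_articles: int = 10,
    ratio: float = 0.7,
) -> list[str]:
    """Pick top-scored articles for `ratio` of the slots, then fill with trending ones.

    `scored` holds (article_id, personalization_score) pairs; `trending` holds
    article ids ordered by popularity.
    """
    personalized_slots = round(max_articles * ratio)
    ranked = [aid for aid, _ in sorted(scored, key=lambda item: item[1], reverse=True)]

    selection = ranked[:personalized_slots]
    # Fill remaining slots with trending articles first, then leftover scored ones.
    for aid in trending + ranked[personalized_slots:]:
        if len(selection) >= max_articles:
            break
        if aid not in selection:
            selection.append(aid)
    return selection
```

Skipping articles that are already selected keeps a story that is both trending and highly scored from appearing twice.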
## 📊 How It Works
```
1. User clicks article in newsletter
2. System records: keywords + category
3. Interest profile updates automatically
4. Next newsletter: articles ranked by interests
5. User receives personalized content
```
## 🧪 Testing
All phases have been tested and verified:
```bash
# Run comprehensive test suite (tests all 4 phases)
docker exec munich-news-local-backend python test_personalization_system.py
# Or test keyword extraction separately
docker exec munich-news-local-crawler python -c "from crawler_service import crawl_all_feeds; crawl_all_feeds(max_articles_per_feed=2)"
```
## 🔌 API Endpoints
### Interest Management
```bash
GET /api/interests/<email> # View profile
GET /api/interests/<email>/top # Top interests
POST /api/interests/<email>/rebuild # Rebuild from history
GET /api/interests/statistics # Platform stats
DELETE /api/interests/<email> # Delete (GDPR)
```
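A small client for these routes could look like the sketch below. The base URL and port, the helper names, and the assumption that every route returns JSON are not confirmed by this document. The email is percent-encoded because it is used as a path segment.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:5001"  # hypothetical backend address


def interests_url(email: str, action: str = "") -> str:
    """Build an interest endpoint URL, percent-encoding the email path segment."""
    url = f"{BASE_URL}/api/interests/{quote(email, safe='')}"
    return f"{url}/{action}" if action else url


def call(method: str, url: str) -> dict:
    """Send a request without a body and decode the JSON reply (assumed format)."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def demo(email: str) -> None:
    """Walk through the interest routes; requires a running backend."""
    print(call("GET", interests_url(email)))              # view profile
    print(call("GET", interests_url(email, "top")))       # top interests
    print(call("POST", interests_url(email, "rebuild")))  # rebuild from history
    print(call("DELETE", interests_url(email)))           # delete (GDPR)
```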
### Personalization
```bash
GET /api/personalize/preview/<email> # Preview personalized newsletter
POST /api/personalize/explain # Explain recommendation
```
## 📈 Example Results
### User Profile
```json
{
  "email": "user@example.com",
  "categories": {
    "sports": 0.30,
    "local": 0.10
  },
  "keywords": {
    "Bayern Munich": 0.30,
    "Football": 0.20,
    "Transportation": 0.10
  },
  "total_clicks": 5
}
```
### Personalized Newsletter
```json
{
  "articles": [
    {
      "title": "Bayern Munich wins championship",
      "personalization_score": 0.86,
      "category": "sports",
      "keywords": ["Bayern Munich", "Football"]
    },
    {
      "title": "New S-Bahn line opens",
      "personalization_score": 0.42,
      "category": "local",
      "keywords": ["Transportation", "Munich"]
    }
  ],
  "statistics": {
    "highly_personalized": 1,
    "moderately_personalized": 1,
    "trending": 0
  }
}
```
## 🎯 Scoring Algorithm
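```python
# Article score calculation.
# Assumption: interests the user has never clicked count as 0.0.
category_score = user_interests.categories.get(article.category, 0.0)
keyword_scores = [user_interests.keywords.get(kw, 0.0) for kw in article.keywords]
keyword_score = sum(keyword_scores) / len(keyword_scores) if keyword_scores else 0.0
final_score = (category_score * 0.4) + (keyword_score * 0.6)
```
**Example:**
- User: sports=0.8, "Bayern Munich"=0.9, "Football"=0.9 (assumed here so that the keyword average is 0.9)
- Article: sports category, keywords=["Bayern Munich", "Football"]
- Keyword score = (0.9 + 0.9) / 2 = 0.9
- Score = (0.8 × 0.4) + (0.9 × 0.6) = 0.32 + 0.54 = **0.86**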
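To make the formula concrete, here is a self-contained sketch that scores and ranks articles against a profile shaped like the JSON under Example Results. It treats interests the user has never clicked as 0.0, which this document does not specify; under that assumption the 0.86 above is reproduced only when "Football" also scores 0.9. The function names are illustrative.

```python
CATEGORY_WEIGHT = 0.4
KEYWORD_WEIGHT = 0.6


def score_article(profile: dict, article: dict) -> float:
    """Weighted sum of the category score and the mean keyword score."""
    category_score = profile.get("categories", {}).get(article.get("category"), 0.0)
    keyword_scores = [
        profile.get("keywords", {}).get(kw, 0.0)  # unseen keyword counts as 0.0 (assumption)
        for kw in article.get("keywords", [])
    ]
    keyword_score = sum(keyword_scores) / len(keyword_scores) if keyword_scores else 0.0
    return round(category_score * CATEGORY_WEIGHT + keyword_score * KEYWORD_WEIGHT, 2)


def rank_articles(profile: dict, articles: list[dict]) -> list[dict]:
    """Return articles ordered from highest to lowest personalization score."""
    return sorted(articles, key=lambda a: score_article(profile, a), reverse=True)


profile = {
    "categories": {"sports": 0.8},
    "keywords": {"Bayern Munich": 0.9, "Football": 0.9},
}
articles = [
    {"title": "New S-Bahn line opens", "category": "local",
     "keywords": ["Transportation", "Munich"]},
    {"title": "Bayern Munich wins championship", "category": "sports",
     "keywords": ["Bayern Munich", "Football"]},
]
ranked = rank_articles(profile, articles)
```

If "Football" were absent from the profile, the same article would score (0.8 × 0.4) + (0.45 × 0.6) = 0.59 under this sketch, so how unmatched keywords are handled has a visible effect on ranking.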
## 🚀 Production Integration
To integrate with the newsletter sender:
1. **Modify `news_sender/sender_service.py`:**
   ```python
   from services.personalization_service import select_personalized_articles

   # For each subscriber
   personalized_articles = select_personalized_articles(
       all_articles,
       subscriber_email,
       max_articles=10
   )
   ```
2. **Enable personalization flag in config:**
   ```env
   PERSONALIZATION_ENABLED=true
   PERSONALIZATION_RATIO=0.7 # 70% personalized, 30% trending
   ```
3. **Monitor metrics:**
   - Click-through rate by personalization score
   - Open rates for personalized vs non-personalized
   - User engagement over time
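On the sender side, the two flags from step 2 could be read as sketched below. The function name, the defaults (disabled, ratio 0.7), and the clamping to the 0.0-1.0 range are assumptions. The inline-comment handling is defensive, since not every `.env` loader strips a trailing `# ...` from the value.

```python
import os
from typing import Mapping


def load_personalization_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the two personalization flags, tolerating inline comments and bad values."""

    def clean(name: str, default: str) -> str:
        # Drop a trailing "# comment" in case the .env loader kept it in the value.
        return env.get(name, default).split("#", 1)[0].strip()

    enabled = clean("PERSONALIZATION_ENABLED", "false").lower() in {"1", "true", "yes", "on"}
    try:
        ratio = float(clean("PERSONALIZATION_RATIO", "0.7"))
    except ValueError:
        ratio = 0.7  # fall back to the documented 70/30 split
    return {"enabled": enabled, "ratio": min(1.0, max(0.0, ratio))}
```

When `enabled` is false, the sender can skip the call to `select_personalized_articles` and send the same article list to every subscriber.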
## 🔐 Privacy & Compliance
- ✅ Users can opt out of tracking
- ✅ Interest profiles can be deleted (GDPR)
- ✅ Automatic anonymization after 90 days
- ✅ No PII beyond email address
- ✅ Transparent recommendation explanations
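The 90-day rule could be applied by a periodic job along these lines. What "anonymization" means here (clearing the email while keeping category and keyword data for aggregate statistics), the record fields, and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)


def anonymize_old_records(records: list[dict], now: datetime) -> int:
    """Clear the email on records older than RETENTION; return how many changed."""
    cutoff = now - RETENTION
    changed = 0
    for record in records:
        if record.get("email") is not None and record["clicked_at"] < cutoff:
            record["email"] = None  # category and keywords stay for aggregate stats
            changed += 1
    return changed


now = datetime(2025, 11, 18, tzinfo=timezone.utc)
records = [
    {"email": "user@example.com", "category": "sports",
     "clicked_at": now - timedelta(days=120)},
    {"email": "user@example.com", "category": "local",
     "clicked_at": now - timedelta(days=10)},
]
anonymized = anonymize_old_records(records, now)
```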
## 📁 Files Created/Modified
### New Files
- `backend/services/interest_profiling_service.py`
- `backend/services/personalization_service.py`
- `backend/routes/interests_routes.py`
- `backend/routes/personalization_routes.py`
- `backend/test_tracking_phase2.py`
- `backend/test_interest_profiling.py`
- `backend/test_personalization.py`
- `docs/PERSONALIZATION.md`
### Modified Files
- `news_crawler/ollama_client.py` - Added keyword extraction
- `news_crawler/crawler_service.py` - Integrated keyword extraction
- `backend/services/tracking_service.py` - Enhanced with metadata
- `backend/routes/tracking_routes.py` - Auto-update interests
- `backend/app.py` - Registered new routes
## 🎓 Key Learnings
1. **Incremental scoring works well** - 0.1 per click prevents over-weighting
2. **Mix is important** - 70/30 personalized/trending avoids filter bubbles
3. **Keywords > Categories** - 60/40 weight reflects keyword importance
4. **Decay is essential** - Prevents stale interests from dominating
5. **Transparency matters** - Explanation API helps users understand recommendations
## 🎉 Status: COMPLETE
All 4 phases implemented, tested, and documented. The personalization system is ready for production integration!