update
docs/PERSONALIZATION_COMPLETE.md
# 🎉 Newsletter Personalization System - Complete!

All 4 phases of the personalization system have been successfully implemented and tested.

## ✅ What Was Built

### Phase 1: Keyword Extraction
- AI-powered keyword extraction from articles using Ollama
- 5 keywords per article automatically extracted during crawling
- Keywords stored in database for personalization

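The Phase 1 step can be pictured roughly as follows. This is a minimal sketch, not the code in `news_crawler/ollama_client.py`: the function names, the prompt wording, the model name, and the comma-separated reply format are all assumptions made for illustration. Only the `/api/generate` endpoint and its `model`/`prompt`/`stream` fields come from Ollama's public REST API.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default local Ollama address
MODEL = "llama3"  # assumption: any locally pulled model would do


def parse_keywords(raw: str, limit: int = 5) -> list[str]:
    """Turn a comma- or newline-separated model reply into at most `limit` unique keywords."""
    seen: set[str] = set()
    keywords: list[str] = []
    for part in raw.replace("\n", ",").split(","):
        # Strip whitespace, list bullets, quotes and trailing periods.
        keyword = part.strip(" \t-*\"'.")
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            keywords.append(keyword)
    return keywords[:limit]


def extract_keywords(title: str, content: str) -> list[str]:
    """Ask Ollama for keywords (hypothetical helper; needs a running Ollama server)."""
    prompt = (
        "Extract exactly 5 keywords from this news article. "
        "Reply with a comma-separated list only.\n\n"
        f"Title: {title}\n\n{content[:2000]}"
    )
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        reply = json.load(response)["response"]
    return parse_keywords(reply)
```

Keeping the parsing separate from the network call means the clean-up rules (de-duplication, the 5-keyword cap) can be tested without a model running.
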
### Phase 2: Click Tracking Enhancement
- Enhanced tracking to capture article keywords and category
- Tracking records now include metadata for building interest profiles
- Privacy-compliant with opt-out and GDPR support

### Phase 3: User Interest Profiling
- Automatic profile building from click behavior
- Interest scores (0.0-1.0) for categories and keywords
- Decay mechanism for old interests
- API endpoints for viewing and managing profiles

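As a rough illustration of Phase 3, the sketch below updates a profile on each click and decays it over time. The 0.1 increment and the 0.0-1.0 range come from this document; the multiplicative decay factor, the pruning threshold, and every function name are assumptions, since the actual decay mechanism in `interest_profiling_service.py` is not described here.

```python
CLICK_INCREMENT = 0.1  # from this document: 0.1 per click
MAX_SCORE = 1.0        # scores stay within 0.0-1.0
DECAY_FACTOR = 0.95    # assumption: scores shrink by 5% per decay run
MIN_SCORE = 0.01       # assumption: interests below this are dropped


def new_profile(email):
    """Create an empty profile in the shape shown under Example Results."""
    return {"email": email, "categories": {}, "keywords": {}, "total_clicks": 0}


def record_click(profile, category, keywords):
    """Raise the clicked article's category and keyword scores, capped at MAX_SCORE."""
    def bump(scores, key):
        scores[key] = round(min(MAX_SCORE, scores.get(key, 0.0) + CLICK_INCREMENT), 4)

    bump(profile["categories"], category)
    for keyword in keywords:
        bump(profile["keywords"], keyword)
    profile["total_clicks"] += 1
    return profile


def apply_decay(profile, factor=DECAY_FACTOR):
    """Shrink every score and drop interests that have faded below MIN_SCORE."""
    for field in ("categories", "keywords"):
        profile[field] = {
            key: round(score * factor, 4)
            for key, score in profile[field].items()
            if score * factor >= MIN_SCORE
        }
    return profile


profile = new_profile("user@example.com")
for _ in range(3):
    record_click(profile, "sports", ["Bayern Munich"])
print(profile["categories"])  # {'sports': 0.3}
```
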
### Phase 4: Personalized Newsletter Generation
- Article scoring based on user interests
- Smart ranking algorithm (40% category + 60% keywords)
- Mix of personalized (70%) + trending (30%) content
- Explanation system for recommendations

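The 70/30 mix could be assembled along these lines. This is a sketch under stated assumptions: `mix_articles` is a hypothetical helper, both inputs are assumed to be already ranked best-first, and article titles stand in for whatever unique ID the real system uses.

```python
def mix_articles(personalized, trending, max_articles=10, ratio=0.7):
    """Combine two ranked lists of article dicts into one newsletter selection.

    Titles are the de-duplication key here (an assumption; a real system
    would more likely use an article ID).
    """
    personalized_slots = round(max_articles * ratio)
    picked = []
    seen = set()

    def take(candidates, limit):
        # Append unseen candidates until `picked` holds `limit` articles.
        for article in candidates:
            if len(picked) >= limit:
                break
            if article["title"] not in seen:
                seen.add(article["title"])
                picked.append(article)

    take(personalized, personalized_slots)  # personalized share first
    take(trending, max_articles)            # then fill up with trending
    take(personalized, max_articles)        # backfill if trending ran short
    return picked
```

The final backfill pass keeps the newsletter at full length when there are too few trending articles, at the cost of a more personalized mix than the configured ratio.
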
## 📊 How It Works

```
1. User clicks article in newsletter
↓
2. System records: keywords + category
↓
3. Interest profile updates automatically
↓
4. Next newsletter: articles ranked by interests
↓
5. User receives personalized content
```

## 🧪 Testing

All phases have been tested and verified:

```bash
# Run comprehensive test suite (tests all 4 phases)
docker exec munich-news-local-backend python test_personalization_system.py

# Or test keyword extraction separately
docker exec munich-news-local-crawler python -c "from crawler_service import crawl_all_feeds; crawl_all_feeds(max_articles_per_feed=2)"
```

## 🔌 API Endpoints

### Interest Management
```bash
GET /api/interests/<email>           # View profile
GET /api/interests/<email>/top       # Top interests
POST /api/interests/<email>/rebuild  # Rebuild from history
GET /api/interests/statistics        # Platform stats
DELETE /api/interests/<email>        # Delete (GDPR)
```

### Personalization
```bash
GET /api/personalize/preview/<email>  # Preview personalized newsletter
POST /api/personalize/explain         # Explain recommendation
```

## 📈 Example Results

### User Profile
```json
{
  "email": "user@example.com",
  "categories": {
    "sports": 0.30,
    "local": 0.10
  },
  "keywords": {
    "Bayern Munich": 0.30,
    "Football": 0.20,
    "Transportation": 0.10
  },
  "total_clicks": 5
}
```

### Personalized Newsletter
```json
{
  "articles": [
    {
      "title": "Bayern Munich wins championship",
      "personalization_score": 0.86,
      "category": "sports",
      "keywords": ["Bayern Munich", "Football"]
    },
    {
      "title": "New S-Bahn line opens",
      "personalization_score": 0.42,
      "category": "local",
      "keywords": ["Transportation", "Munich"]
    }
  ],
  "statistics": {
    "highly_personalized": 1,
    "moderately_personalized": 1,
    "trending": 0
  }
}
```

## 🎯 Scoring Algorithm

```python
# Article score calculation (pseudocode)
category_score = user_interests.categories[article.category]
keyword_score = average(user_interests.keywords[kw] for kw in article.keywords)

final_score = (category_score * 0.4) + (keyword_score * 0.6)
```

**Example:**
- User: sports=0.8, "Bayern Munich"=0.9
- Article: sports category, keywords=["Bayern Munich", "Football"]
- Keyword score = 0.9, assuming "Football", which has no score in this profile, is left out of the average (if it counted as 0.0, the keyword score would be 0.45 and the final score 0.59)
- Score = (0.8 × 0.4) + (0.9 × 0.6) = 0.32 + 0.54 = **0.86**

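A runnable version of the scoring formula might look like the sketch below. The 40/60 weights are from this document; the function name, the dict-based profile and article shapes, and the choice to average only over keywords the user has a score for are assumptions (the last one is chosen because it reproduces the 0.86 in the example).

```python
from statistics import mean

CATEGORY_WEIGHT = 0.4  # from this document
KEYWORD_WEIGHT = 0.6   # from this document


def score_article(profile, article):
    """Score one article against a user profile, returning a value in 0.0-1.0."""
    # Unknown categories contribute nothing instead of raising KeyError.
    category_score = profile["categories"].get(article["category"], 0.0)

    # Assumption: only keywords present in the profile enter the average.
    matched = [
        profile["keywords"][keyword]
        for keyword in article["keywords"]
        if keyword in profile["keywords"]
    ]
    keyword_score = mean(matched) if matched else 0.0

    return round(category_score * CATEGORY_WEIGHT + keyword_score * KEYWORD_WEIGHT, 4)


profile = {"categories": {"sports": 0.8}, "keywords": {"Bayern Munich": 0.9}}
article = {"category": "sports", "keywords": ["Bayern Munich", "Football"]}
print(score_article(profile, article))  # 0.86
```
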
## 🚀 Production Integration

To integrate with the newsletter sender:

1. **Modify `news_sender/sender_service.py`:**
   ```python
   from services.personalization_service import select_personalized_articles

   # For each subscriber
   personalized_articles = select_personalized_articles(
       all_articles,
       subscriber_email,
       max_articles=10
   )
   ```

2. **Enable the personalization flag in the config:**
   ```env
   PERSONALIZATION_ENABLED=true
   # 70% personalized, 30% trending
   PERSONALIZATION_RATIO=0.7
   ```

3. **Monitor metrics:**
   - Click-through rate by personalization score
   - Open rates for personalized vs non-personalized
   - User engagement over time

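One way the sender could read the two settings from step 2 is sketched below. `load_personalization_config` is a hypothetical helper, and the defaults (disabled, ratio 0.7) and the accepted truthy strings are assumptions rather than documented behavior.

```python
import os

DEFAULT_RATIO = 0.7  # assumption: fall back to the 70/30 mix described above


def load_personalization_config(env=None):
    """Return (enabled, ratio) from environment-style settings."""
    env = os.environ if env is None else env

    enabled = env.get("PERSONALIZATION_ENABLED", "false").strip().lower() in {
        "1", "true", "yes", "on",
    }

    # Tolerate a trailing inline comment in case the env loader kept it.
    raw_ratio = env.get("PERSONALIZATION_RATIO", str(DEFAULT_RATIO)).split("#", 1)[0].strip()
    try:
        ratio = float(raw_ratio)
    except ValueError:
        ratio = DEFAULT_RATIO

    # Clamp to a valid share of the newsletter.
    return enabled, min(1.0, max(0.0, ratio))


enabled, ratio = load_personalization_config(
    {"PERSONALIZATION_ENABLED": "true", "PERSONALIZATION_RATIO": "0.7"}
)
print(enabled, ratio)  # True 0.7
```

When `enabled` is false, the sender can skip `select_personalized_articles` and keep its existing article selection, which makes the flag a safe rollback switch.
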
## 🔐 Privacy & Compliance

- ✅ Users can opt out of tracking
- ✅ Interest profiles can be deleted (GDPR)
- ✅ Automatic anonymization after 90 days
- ✅ No PII beyond email address
- ✅ Transparent recommendation explanations

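The 90-day anonymization rule could be implemented along the following lines. This is only a sketch: the record shape (`email` and `clicked_at` fields), the in-memory list, and the choice to blank the email rather than delete the record are assumptions, and the real job would run against the database.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # from this document: anonymize after 90 days


def anonymize_old_clicks(records, now=None):
    """Blank the email on click records older than RETENTION; return how many changed."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    changed = 0
    for record in records:
        if record["clicked_at"] < cutoff and record.get("email") is not None:
            record["email"] = None  # keywords and category stay for aggregate stats
            changed += 1
    return changed
```
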
## 📁 Files Created/Modified

### New Files
- `backend/services/interest_profiling_service.py`
- `backend/services/personalization_service.py`
- `backend/routes/interests_routes.py`
- `backend/routes/personalization_routes.py`
- `backend/test_tracking_phase2.py`
- `backend/test_interest_profiling.py`
- `backend/test_personalization.py`
- `docs/PERSONALIZATION.md`

### Modified Files
- `news_crawler/ollama_client.py` - Added keyword extraction
- `news_crawler/crawler_service.py` - Integrated keyword extraction
- `backend/services/tracking_service.py` - Enhanced with metadata
- `backend/routes/tracking_routes.py` - Auto-update interests
- `backend/app.py` - Registered new routes

## 🎓 Key Learnings

1. **Incremental scoring works well** - Adding 0.1 per click keeps any single click from over-weighting a profile
2. **Mix is important** - The 70/30 personalized/trending split avoids filter bubbles
3. **Keywords > Categories** - The 60/40 weighting reflects that keywords are the stronger signal
4. **Decay is essential** - It prevents stale interests from dominating
5. **Transparency matters** - The explanation API helps users understand recommendations

## 🎉 Status: COMPLETE

All 4 phases implemented, tested, and documented. The personalization system is ready for production integration!