2025-11-18 14:45:41 +01:00
parent 2e80d64ff6
commit 84fce9a82c
19 changed files with 2437 additions and 3 deletions


@@ -0,0 +1,195 @@
# 🎉 Newsletter Personalization System - Complete!
All 4 phases of the personalization system have been successfully implemented and tested.
## ✅ What Was Built
### Phase 1: Keyword Extraction
- AI-powered keyword extraction from articles using Ollama
- 5 keywords per article automatically extracted during crawling
- Keywords stored in database for personalization
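A minimal sketch of how five keywords per article might be requested from Ollama and cleaned up. The prompt wording, the `phi3` model name, the host, and the helper names are assumptions for illustration; the actual logic lives in `news_crawler/ollama_client.py` and may differ.
```python
import json
import urllib.request


def build_prompt(title, text, count=5):
    """Build a prompt asking for a plain comma-separated keyword list."""
    return (
        f"Extract exactly {count} keywords from this news article. "
        f"Reply with a comma-separated list only.\n\nTitle: {title}\n\n{text}"
    )


def parse_keywords(raw, count=5):
    """Split the model reply, strip list markers, drop duplicates, keep `count`."""
    seen, keywords = set(), []
    for part in raw.replace("\n", ",").split(","):
        keyword = part.strip(" \t-*\"'.")
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            keywords.append(keyword)
    return keywords[:count]


def extract_keywords(title, text, model="phi3", host="http://localhost:11434"):
    """Call Ollama's /api/generate endpoint (model and host are assumptions)."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(title, text), "stream": False}
    ).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_keywords(json.loads(response.read())["response"])
```
Keeping the parsing separate from the network call makes it possible to test the cleanup step without a running Ollama instance.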
### Phase 2: Click Tracking Enhancement
- Enhanced tracking to capture article keywords and category
- Tracking records now include metadata for building interest profiles
- Privacy-compliant with opt-out and GDPR support
### Phase 3: User Interest Profiling
- Automatic profile building from click behavior
- Interest scores (0.0-1.0) for categories and keywords
- Decay mechanism for old interests
- API endpoints for viewing and managing profiles
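The profile update can be sketched as follows. The 0.1 increment per click and the 0.0-1.0 range come from this document; the function names, the decay factor of 0.95, and the 0.01 cut-off are assumptions, not the values used in `interest_profiling_service.py`.
```python
def record_click(profile, category, keywords, increment=0.1):
    """Raise the interest score of the clicked article's category and keywords."""
    for bucket, names in (("categories", [category]), ("keywords", keywords)):
        scores = profile.setdefault(bucket, {})
        for name in names:
            # Cap at 1.0 so scores stay inside the documented 0.0-1.0 range.
            scores[name] = round(min(1.0, scores.get(name, 0.0) + increment), 4)
    profile["total_clicks"] = profile.get("total_clicks", 0) + 1
    return profile


def apply_decay(profile, factor=0.95, floor=0.01):
    """Shrink every score by `factor` and drop scores that fall below `floor`."""
    for bucket in ("categories", "keywords"):
        decayed = {
            name: round(score * factor, 4)
            for name, score in profile.get(bucket, {}).items()
        }
        profile[bucket] = {n: s for n, s in decayed.items() if s >= floor}
    return profile


profile = {"email": "user@example.com"}
for _ in range(3):
    record_click(profile, "sports", ["Bayern Munich"])
record_click(profile, "local", ["Transportation"])
print(profile["categories"])  # {'sports': 0.3, 'local': 0.1}
```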
### Phase 4: Personalized Newsletter Generation
- Article scoring based on user interests
- Weighted ranking: 40% category score + 60% keyword score
- Mix of personalized (70%) + trending (30%) content
- Explanation system for recommendations
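One way the 70/30 mix could be assembled, as a sketch only. It assumes both input lists are already sorted best-first and that articles can be de-duplicated by `title`; `mix_articles` is a hypothetical helper, not a function from `personalization_service.py`.
```python
def mix_articles(personalized, trending, max_articles=10, ratio=0.7):
    """Fill about `ratio` of the slots from `personalized`, the rest from `trending`."""
    n_personal = round(max_articles * ratio)
    picked = list(personalized[:n_personal])
    seen = {article["title"] for article in picked}
    # Fill remaining slots with trending articles, then leftover personalized ones.
    for article in list(trending) + list(personalized[n_personal:]):
        if len(picked) >= max_articles:
            break
        if article["title"] not in seen:
            picked.append(article)
            seen.add(article["title"])
    return picked


personalized = [{"title": f"p{i}"} for i in range(10)]
trending = [{"title": f"t{i}"} for i in range(5)]
print([a["title"] for a in mix_articles(personalized, trending)])
```
Falling back to leftover personalized articles keeps the newsletter full when too few trending articles are available.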
## 📊 How It Works
```
1. User clicks article in newsletter
2. System records: keywords + category
3. Interest profile updates automatically
4. Next newsletter: articles ranked by interests
5. User receives personalized content
```
## 🧪 Testing
All phases have been tested and verified:
```bash
# Run comprehensive test suite (tests all 4 phases)
docker exec munich-news-local-backend python test_personalization_system.py
# Or test keyword extraction separately
docker exec munich-news-local-crawler python -c "from crawler_service import crawl_all_feeds; crawl_all_feeds(max_articles_per_feed=2)"
```
## 🔌 API Endpoints
### Interest Management
```bash
GET /api/interests/<email> # View profile
GET /api/interests/<email>/top # Top interests
POST /api/interests/<email>/rebuild # Rebuild from history
GET /api/interests/statistics # Platform stats
DELETE /api/interests/<email> # Delete (GDPR)
```
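A small client sketch for the interest endpoints above. The base URL and port are assumptions (adjust them to wherever the backend is exposed), and `interests_url` / `call_interests_api` are illustrative helpers rather than part of the codebase.
```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:5001"  # assumption: backend host and port


def interests_url(email, action=""):
    """Build an /api/interests URL, percent-encoding the email address."""
    path = f"/api/interests/{quote(email, safe='')}"
    if action:
        path += f"/{action}"
    return BASE_URL + path


def call_interests_api(email, action="", method="GET"):
    """Send the request and decode the JSON response body."""
    request = urllib.request.Request(interests_url(email, action), method=method)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())


print(interests_url("user@example.com", "top"))
```
With the backend running, `call_interests_api("user@example.com", "rebuild", method="POST")` would trigger a rebuild and `call_interests_api("user@example.com", method="DELETE")` would delete the profile.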
### Personalization
```bash
GET /api/personalize/preview/<email> # Preview personalized newsletter
POST /api/personalize/explain # Explain recommendation
```
## 📈 Example Results
### User Profile
```json
{
  "email": "user@example.com",
  "categories": {
    "sports": 0.30,
    "local": 0.10
  },
  "keywords": {
    "Bayern Munich": 0.30,
    "Football": 0.20,
    "Transportation": 0.10
  },
  "total_clicks": 5
}
```
### Personalized Newsletter
```json
{
  "articles": [
    {
      "title": "Bayern Munich wins championship",
      "personalization_score": 0.86,
      "category": "sports",
      "keywords": ["Bayern Munich", "Football"]
    },
    {
      "title": "New S-Bahn line opens",
      "personalization_score": 0.42,
      "category": "local",
      "keywords": ["Transportation", "Munich"]
    }
  ],
  "statistics": {
    "highly_personalized": 1,
    "moderately_personalized": 1,
    "trending": 0
  }
}
```
## 🎯 Scoring Algorithm
```python
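# Article score calculation (sketch: user_interests and article are plain dicts;
# averaging only the keywords found in the profile is an assumption)
category_score = user_interests["categories"].get(article["category"], 0.0)
matched = [user_interests["keywords"][kw] for kw in article["keywords"] if kw in user_interests["keywords"]]
keyword_score = sum(matched) / len(matched) if matched else 0.0
final_score = (category_score * 0.4) + (keyword_score * 0.6)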
```
**Example:**
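- User: sports=0.8, "Bayern Munich"=0.9
- Article: sports category, keywords=["Bayern Munich", "Football"]
- Keyword score = 0.9, assuming only keywords present in the profile are averaged ("Football" has no score in this example)
- Score = (0.8 × 0.4) + (0.9 × 0.6) = 0.32 + 0.54 = **0.86**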
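The example can be checked with a short script. This is a sketch, not the production scorer: `score_article` and its `skip_unknown` flag are hypothetical, and the flag exists only to show how the result depends on whether keywords missing from the profile are skipped or counted as 0.0.
```python
def score_article(article, interests, skip_unknown=True,
                  category_weight=0.4, keyword_weight=0.6):
    """Combine category and keyword interest into one score in [0.0, 1.0]."""
    category_score = interests["categories"].get(article["category"], 0.0)
    known = interests["keywords"]
    if skip_unknown:
        # Average only the keywords the user already has a score for.
        values = [known[kw] for kw in article["keywords"] if kw in known]
    else:
        # Treat keywords missing from the profile as zero interest.
        values = [known.get(kw, 0.0) for kw in article["keywords"]]
    keyword_score = sum(values) / len(values) if values else 0.0
    return round(category_weight * category_score + keyword_weight * keyword_score, 2)


interests = {"categories": {"sports": 0.8}, "keywords": {"Bayern Munich": 0.9}}
article = {"category": "sports", "keywords": ["Bayern Munich", "Football"]}

print(score_article(article, interests))                      # 0.86
print(score_article(article, interests, skip_unknown=False))  # 0.59
```
Only the first variant reproduces the 0.86 shown above; counting "Football" as 0.0 halves the keyword score and yields 0.59.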
## 🚀 Production Integration
To integrate with the newsletter sender:
1. **Modify `news_sender/sender_service.py`:**
```python
from services.personalization_service import select_personalized_articles

# For each subscriber
personalized_articles = select_personalized_articles(
    all_articles,
    subscriber_email,
    max_articles=10
)
```
2. **Enable personalization flag in config:**
```env
PERSONALIZATION_ENABLED=true
# 70% personalized, 30% trending
PERSONALIZATION_RATIO=0.7
```
3. **Monitor metrics:**
- Click-through rate by personalization score
- Open rates for personalized vs non-personalized
- User engagement over time
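The first metric could be computed along these lines. This is a sketch that assumes each delivered article is available as a `(personalization_score, clicked)` pair; how those pairs are exported from the tracking data is not covered here, and `ctr_by_score_bucket` is a hypothetical helper.
```python
from collections import defaultdict


def ctr_by_score_bucket(deliveries, n_buckets=5):
    """Return click-through rate per score range, e.g. '0.8-1.0' -> 0.5."""
    sent = defaultdict(int)
    clicked = defaultdict(int)
    for score, was_clicked in deliveries:
        # A score of exactly 1.0 belongs to the last bucket.
        idx = min(int(score * n_buckets), n_buckets - 1)
        sent[idx] += 1
        clicked[idx] += int(was_clicked)
    return {
        f"{i / n_buckets:.1f}-{(i + 1) / n_buckets:.1f}": clicked[i] / sent[i]
        for i in sorted(sent)
    }


deliveries = [(0.86, True), (0.9, False), (0.42, True), (0.1, False)]
print(ctr_by_score_bucket(deliveries))
```
If personalization works as intended, the rate should rise with the score range; a flat curve suggests the scores carry little signal.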
## 🔐 Privacy & Compliance
- ✅ Users can opt out of tracking
- ✅ Interest profiles can be deleted (GDPR)
- ✅ Automatic anonymization after 90 days
- ✅ No PII beyond email address
- ✅ Transparent recommendation explanations
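As an illustration of the 90-day rule, the sketch below strips the email address from click records older than the cut-off. The record layout (`email`, `clicked_at`) and the choice to anonymize by dropping the email are assumptions; the project's actual anonymization step may work differently.
```python
from datetime import datetime, timedelta, timezone


def anonymize_old_clicks(clicks, now=None, max_age_days=90):
    """Return copies of the click records, without `email` once they are too old."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    result = []
    for click in clicks:
        if click["clicked_at"] < cutoff:
            # Keep keywords and category for aggregate stats, drop the identifier.
            click = {key: value for key, value in click.items() if key != "email"}
        result.append(click)
    return result


now = datetime(2025, 11, 18, tzinfo=timezone.utc)
clicks = [
    {"email": "user@example.com", "category": "sports",
     "clicked_at": now - timedelta(days=120)},
    {"email": "user@example.com", "category": "local",
     "clicked_at": now - timedelta(days=10)},
]
print(anonymize_old_clicks(clicks, now=now))
```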
## 📁 Files Created/Modified
### New Files
- `backend/services/interest_profiling_service.py`
- `backend/services/personalization_service.py`
- `backend/routes/interests_routes.py`
- `backend/routes/personalization_routes.py`
- `backend/test_tracking_phase2.py`
- `backend/test_interest_profiling.py`
- `backend/test_personalization.py`
- `docs/PERSONALIZATION.md`
### Modified Files
- `news_crawler/ollama_client.py` - Added keyword extraction
- `news_crawler/crawler_service.py` - Integrated keyword extraction
- `backend/services/tracking_service.py` - Enhanced with metadata
- `backend/routes/tracking_routes.py` - Auto-update interests
- `backend/app.py` - Registered new routes
## 🎓 Key Learnings
1. **Incremental scoring works well** - 0.1 per click prevents over-weighting
2. **Mix is important** - 70/30 personalized/trending helps avoid filter bubbles
3. **Keywords > Categories** - 60/40 weight reflects keyword importance
4. **Decay is essential** - Prevents stale interests from dominating
5. **Transparency matters** - Explanation API helps users understand recommendations
## 🎉 Status: COMPLETE
All 4 phases implemented, tested, and documented. The personalization system is ready for production integration!