docs/PERSONALIZATION.md
# Newsletter Personalization Implementation

## Overview
Newsletters are personalized based on user click behavior: the keywords and categories of clicked articles are used to build a per-user interest profile.

## Implementation Phases

### ✅ Phase 1: Keyword Extraction (COMPLETED)
**Status:** Implemented
**Files Modified:**
- `news_crawler/ollama_client.py` - Added `extract_keywords()` method
- `news_crawler/crawler_service.py` - Integrated keyword extraction into the crawl process

**What it does:**
- Extracts 5 keywords from each article using Ollama
- Stores keywords in the `articles` collection: `keywords: ["Bayern Munich", "Football", ...]`
- Runs automatically during news crawling
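
A minimal sketch of the kind of post-processing a keyword step like this might need, assuming the model replies with a comma-separated list. `parse_keywords` is a hypothetical helper for illustration only; it is not the project's `extract_keywords()` method, and the reply format is an assumption.

```python
def parse_keywords(raw_response: str, max_keywords: int = 5) -> list[str]:
    """Turn a comma-separated model reply into a clean, de-duplicated keyword list."""
    keywords: list[str] = []
    seen: set[str] = set()
    # Treat newlines like commas so one-keyword-per-line replies also work.
    for part in raw_response.replace("\n", ",").split(","):
        # Drop surrounding whitespace, quotes, and list-bullet characters.
        keyword = part.strip().strip("\"'-•* ").strip()
        key = keyword.lower()
        if keyword and key not in seen:
            seen.add(key)
            keywords.append(keyword)
    # Keep at most `max_keywords` entries (5 per article in this document).
    return keywords[:max_keywords]
```

De-duplication here is case-insensitive and keeps the first spelling seen, so `Football` and `football` count as one keyword.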

**Test it:**
```bash
# Trigger a crawl
curl -X POST http://localhost:5001/api/admin/trigger-crawl \
  -H "Content-Type: application/json" \
  -d '{"max_articles": 2}'

# Check articles have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.articles.findOne({}, {title: 1, keywords: 1})"
```

---

### ✅ Phase 2: Click Tracking Enhancement (COMPLETED)
**Status:** Implemented
**Goal:** Track clicks with keyword metadata

**Files Modified:**
- `backend/services/tracking_service.py` - Enhanced `create_newsletter_tracking()` to look up article metadata

**What it does:**
- When creating tracking links, looks up the article in the database
- Stores the article ID, category, and keywords in the tracking record
- Enables building user interest profiles from click behavior

**Database Schema:**
```javascript
// link_clicks collection
{
  tracking_id: "uuid",
  newsletter_id: "2024-11-18",
  subscriber_email: "user@example.com",
  article_url: "https://...",
  article_title: "Article Title",
  article_id: "673abc123...",  // NEW: Article database ID
  category: "sports",  // NEW: Article category
  keywords: ["Bayern Munich", "Bundesliga"],  // NEW: Keywords for personalization
  clicked: false,
  clicked_at: null,
  user_agent: null,
  created_at: ISODate()
}
```
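
As a rough illustration of how a tracking record with this shape could be assembled, here is a sketch that copies metadata from an article document when one was found. `build_tracking_record` and its parameters are hypothetical names; the real logic lives in `create_newsletter_tracking()` and may differ.

```python
import uuid
from datetime import datetime, timezone


def build_tracking_record(newsletter_id, subscriber_email, article_url,
                          article_title, article=None):
    """Build a link_clicks document; `article` is the looked-up article or None."""
    article = article or {}
    return {
        "tracking_id": str(uuid.uuid4()),
        "newsletter_id": newsletter_id,
        "subscriber_email": subscriber_email,
        "article_url": article_url,
        "article_title": article_title,
        # Metadata copied from the article so clicks can feed interest profiles.
        "article_id": str(article["_id"]) if "_id" in article else None,
        "category": article.get("category"),
        "keywords": list(article.get("keywords", [])),
        # Click state is filled in later, when the link is actually followed.
        "clicked": False,
        "clicked_at": None,
        "user_agent": None,
        "created_at": datetime.now(timezone.utc),
    }
```

In this sketch, a missing article leaves `article_id` and `category` as `None` and `keywords` empty, so a tracking link can still be created when the lookup fails.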

**Test it:**
```bash
# Send a test newsletter
curl -X POST http://localhost:5001/api/admin/send-newsletter

# Check tracking records have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.link_clicks.findOne({}, {article_title: 1, keywords: 1, category: 1})"
```

---

### ✅ Phase 3: User Interest Profiling (COMPLETED)
**Status:** Implemented
**Goal:** Build user interest profiles from click history

**Files Created:**
- `backend/services/interest_profiling_service.py` - Core profiling logic
- `backend/routes/interests_routes.py` - API endpoints for interest management

**Files Modified:**
- `backend/routes/tracking_routes.py` - Auto-update interests on click
- `backend/app.py` - Register interests routes

**What it does:**
- Automatically builds interest profiles when users click articles
- Tracks interest scores for categories and keywords (0.0 to 1.0)
- Increments scores by 0.1 per click, capped at 1.0
- Provides a decay mechanism for old interests
- Supports rebuilding profiles from click history
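
The per-click update described above (add 0.1, cap at 1.0) can be sketched as follows. `apply_click` and `_bump` are hypothetical names, and starting unseen categories and keywords at 0.0 is an assumption; the actual service may differ.

```python
SCORE_INCREMENT = 0.1  # per-click increment, as described above
MAX_SCORE = 1.0        # scores never exceed this cap


def _bump(scores: dict, name: str) -> None:
    """Raise one score by the increment without exceeding the cap."""
    # round() keeps repeated float additions from drifting (0.1 + 0.2 != 0.3).
    scores[name] = min(MAX_SCORE, round(scores.get(name, 0.0) + SCORE_INCREMENT, 4))


def apply_click(profile: dict, category, keywords) -> dict:
    """Update an interest profile in place for one clicked article."""
    categories = profile.setdefault("categories", {})
    keyword_scores = profile.setdefault("keywords", {})
    if category:
        _bump(categories, category)
    for keyword in keywords or []:
        _bump(keyword_scores, keyword)
    profile["total_clicks"] = profile.get("total_clicks", 0) + 1
    return profile
```

Under this sketch, ten clicks on the same category are enough to reach the cap, and further clicks only raise `total_clicks`.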

**Database Schema:**
```javascript
// user_interests collection
{
  email: "user@example.com",
  categories: {
    sports: 0.8,
    local: 0.5,
    science: 0.2
  },
  keywords: {
    "Bayern Munich": 0.9,
    "Oktoberfest": 0.7,
    "AI": 0.3
  },
  total_clicks: 15,
  last_updated: ISODate(),
  created_at: ISODate()
}
```

**API Endpoints:**
```text
# Get user interests
GET /api/interests/<email>

# Get top interests
GET /api/interests/<email>/top?top_n=10

# Rebuild from history
POST /api/interests/<email>/rebuild
Body: {"days_lookback": 30}

# Decay old interests
POST /api/interests/decay
Body: {"decay_factor": 0.95, "days_threshold": 7}

# Get statistics
GET /api/interests/statistics

# Delete profile (GDPR)
DELETE /api/interests/<email>
```
```
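
One plausible reading of the decay parameters is sketched below: a profile whose `last_updated` is older than `days_threshold` days has every score multiplied by `decay_factor`. This interpretation is an assumption, and `decay_profile` is a hypothetical helper, not the service's actual function.

```python
from datetime import datetime, timedelta, timezone


def decay_profile(profile: dict, now: datetime,
                  decay_factor: float = 0.95, days_threshold: int = 7) -> bool:
    """Decay all scores of a stale profile in place; return True if decayed."""
    if now - profile["last_updated"] < timedelta(days=days_threshold):
        return False  # recently active profiles are left untouched
    for field in ("categories", "keywords"):
        profile[field] = {
            name: round(score * decay_factor, 4)
            for name, score in profile.get(field, {}).items()
        }
    return True
```

With a factor of 0.95, a score of 0.8 drops to 0.76 after one decay pass. Whether repeated passes compound, or whether `last_updated` is reset afterwards, is not stated in this document.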

**Test it:**
```bash
# Run test script
docker exec munich-news-local-backend python test_interest_profiling.py

# View a user's interests
curl http://localhost:5001/api/interests/user@example.com

# Get statistics
curl http://localhost:5001/api/interests/statistics
```

---

### ✅ Phase 4: Personalized Newsletter (COMPLETED)
**Status:** Implemented
**Goal:** Rank and select articles based on user interests

**Files Created:**
- `backend/services/personalization_service.py` - Core personalization logic
- `backend/routes/personalization_routes.py` - API endpoints for testing

**Files Modified:**
- `backend/app.py` - Register personalization routes

**What it does:**
- Scores articles based on the user's category and keyword interests
- Ranks articles by personalization score (0.0 to 1.0)
- Selects a mix of personalized (70%) and trending (30%) content
- Provides explanations for recommendations
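
The 70/30 mix could be selected along these lines. This is only a sketch: `select_articles`, the `id` key on each article, and the fallback of topping up with leftover personalized articles are all assumptions made for illustration.

```python
def select_articles(ranked_personalized, trending,
                    max_articles=10, personalization_ratio=0.7):
    """Pick a newsletter mix: mostly top-ranked personalized, the rest trending."""
    n_personalized = int(round(max_articles * personalization_ratio))
    selected = list(ranked_personalized[:n_personalized])
    seen = {article["id"] for article in selected}
    # Fill the remaining slots with trending articles first, then with any
    # leftover personalized ones if there is not enough trending content.
    for article in list(trending) + list(ranked_personalized[n_personalized:]):
        if len(selected) >= max_articles:
            break
        if article["id"] not in seen:
            seen.add(article["id"])
            selected.append(article)
    return selected
```

Skipping articles already in `seen` avoids sending the same story twice when it is both highly ranked for the user and trending.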

**Algorithm:**
```python
score = (category_match * 0.4) + (keyword_match * 0.6)

# Example:
# User interests: sports=0.8, "Bayern Munich"=0.9
# Article: sports category, keywords=["Bayern Munich", "Football"]
# Score = (0.8 * 0.4) + (0.9 * 0.6) = 0.32 + 0.54 = 0.86
```
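
A self-contained sketch of the weighted score above, reproducing the worked example. The document does not say how `keyword_match` is derived when an article has several keywords; this sketch assumes the mean score over the article keywords that appear in the user's profile. `score_article` is a hypothetical name.

```python
CATEGORY_WEIGHT = 0.4
KEYWORD_WEIGHT = 0.6


def score_article(article: dict, interests: dict) -> float:
    """Return a personalization score between 0.0 and 1.0 for one article."""
    category_match = interests.get("categories", {}).get(article.get("category"), 0.0)
    keyword_scores = interests.get("keywords", {})
    # Assumption: average only the keywords the user has shown interest in.
    matched = [keyword_scores[k] for k in article.get("keywords", []) if k in keyword_scores]
    keyword_match = sum(matched) / len(matched) if matched else 0.0
    return category_match * CATEGORY_WEIGHT + keyword_match * KEYWORD_WEIGHT


interests = {"categories": {"sports": 0.8}, "keywords": {"Bayern Munich": 0.9}}
article = {"category": "sports", "keywords": ["Bayern Munich", "Football"]}
print(round(score_article(article, interests), 2))
```

Because both inputs stay within 0.0 to 1.0 and the weights sum to 1.0, the result stays in the same range.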

**API Endpoints:**
```text
# Preview personalized newsletter
GET /api/personalize/preview/<email>?max_articles=10&hours_lookback=24

# Explain recommendation
POST /api/personalize/explain
Body: {"email": "user@example.com", "article_id": "..."}
```

**Test it:**
```bash
# Run test script
docker exec munich-news-local-backend python test_personalization.py

# Preview personalized newsletter
curl "http://localhost:5001/api/personalize/preview/demo@example.com?max_articles=5"
```

---

## ✅ All Phases Complete!

1. ~~**Phase 1:** Keyword extraction from articles~~ ✅ DONE
2. ~~**Phase 2:** Click tracking with keywords~~ ✅ DONE
3. ~~**Phase 3:** User interest profiling~~ ✅ DONE
4. ~~**Phase 4:** Personalized newsletter generation~~ ✅ DONE

## Next Steps for Production

1. **Integrate with newsletter sender** - Modify `news_sender/sender_service.py` to use personalization
2. **A/B testing** - Compare personalized vs non-personalized engagement
3. **Tune parameters** - Adjust `personalization_ratio`, weights, and decay rates
4. **Monitor metrics** - Track click-through and open rates by personalization score
5. **User controls** - Add UI for users to view/edit their interests

## Configuration

No additional configuration is needed yet. Keyword extraction uses the existing Ollama settings from `backend/.env`:
- `OLLAMA_ENABLED=true`
- `OLLAMA_MODEL=gemma3:12b`
- `OLLAMA_BASE_URL=http://ollama:11434`
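
For reference, settings like these could be read from the environment as sketched below. `load_ollama_settings` is a hypothetical helper, and the fallback defaults are assumptions chosen to mirror the values listed above; the project's own config loading may work differently.

```python
import os


def load_ollama_settings(env=os.environ) -> dict:
    """Read the Ollama settings used for keyword extraction from the environment."""
    return {
        # Only the literal string "true" (any case) enables Ollama.
        "enabled": env.get("OLLAMA_ENABLED", "false").strip().lower() == "true",
        "model": env.get("OLLAMA_MODEL", "gemma3:12b"),
        # Strip a trailing slash so paths can be appended safely.
        "base_url": env.get("OLLAMA_BASE_URL", "http://ollama:11434").rstrip("/"),
    }
```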