2025-11-18 14:45:41 +01:00
parent 2e80d64ff6
commit 84fce9a82c
19 changed files with 2437 additions and 3 deletions

docs/PERSONALIZATION.md (new file, 217 lines added)

@@ -0,0 +1,217 @@
# Newsletter Personalization Implementation
## Overview
Newsletters are personalized based on each subscriber's click behavior: article keywords and categories are used to build per-user interest profiles, which then drive article selection.
## Implementation Phases
### ✅ Phase 1: Keyword Extraction (COMPLETED)
**Status:** Implemented
**Files Modified:**
- `news_crawler/ollama_client.py` - Added `extract_keywords()` method
- `news_crawler/crawler_service.py` - Integrated keyword extraction into crawl process
**What it does:**
- Extracts 5 keywords from each article using Ollama AI
- Keywords stored in `articles` collection: `keywords: ["Bayern Munich", "Football", ...]`
- Runs automatically during news crawling
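A minimal sketch of what the `extract_keywords()` step might look like, assuming the client calls Ollama's `/api/generate` HTTP endpoint; the prompt wording and parsing in the real `news_crawler/ollama_client.py` may differ:
```python
import requests

OLLAMA_BASE_URL = "http://ollama:11434"  # from backend/.env
OLLAMA_MODEL = "gemma3:12b"

def extract_keywords(article_text: str, num_keywords: int = 5) -> list[str]:
    """Ask the Ollama model for a short list of keywords describing an article."""
    prompt = (
        f"Extract exactly {num_keywords} short keywords from the article below. "
        f"Answer with a comma-separated list only.\n\n{article_text}"
    )
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/generate",
        json={"model": OLLAMA_MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    answer = resp.json().get("response", "")
    # Split the comma-separated answer and keep at most num_keywords entries
    keywords = [k.strip() for k in answer.split(",") if k.strip()]
    return keywords[:num_keywords]
```
During a crawl, the returned list is then written into the article document's `keywords` field.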
**Test it:**
```bash
# Trigger a crawl
curl -X POST http://localhost:5001/api/admin/trigger-crawl -H "Content-Type: application/json" -d '{"max_articles": 2}'
# Check articles have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.articles.findOne({}, {title: 1, keywords: 1})"
```
---
### ✅ Phase 2: Click Tracking Enhancement (COMPLETED)
**Status:** Implemented
**Goal:** Track clicks with keyword metadata
**Files Modified:**
- `backend/services/tracking_service.py` - Enhanced `create_newsletter_tracking()` to look up article metadata
**What it does:**
- When creating tracking links, looks up article from database
- Stores article ID, category, and keywords in tracking record
- Enables building user interest profiles from click behavior
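A rough sketch of the metadata lookup, assuming a pymongo connection and field names matching the schema below; the connection string and the actual `create_newsletter_tracking()` signature are assumptions:
```python
import uuid
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://mongodb:27017")["munich_news"]  # assumed connection string

def create_newsletter_tracking(newsletter_id: str, subscriber_email: str, article_url: str) -> str:
    """Create a tracking record for one newsletter link, enriched with article metadata."""
    article = db.articles.find_one({"url": article_url}) or {}
    tracking_id = str(uuid.uuid4())
    db.link_clicks.insert_one({
        "tracking_id": tracking_id,
        "newsletter_id": newsletter_id,
        "subscriber_email": subscriber_email,
        "article_url": article_url,
        "article_title": article.get("title"),
        "article_id": str(article.get("_id", "")),  # NEW metadata for personalization
        "category": article.get("category"),
        "keywords": article.get("keywords", []),
        "clicked": False,
        "clicked_at": None,
        "user_agent": None,
        "created_at": datetime.now(timezone.utc),
    })
    return tracking_id
```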
**Database Schema:**
```javascript
// link_clicks collection
{
tracking_id: "uuid",
newsletter_id: "2024-11-18",
subscriber_email: "user@example.com",
article_url: "https://...",
article_title: "Article Title",
article_id: "673abc123...", // NEW: Article database ID
category: "sports", // NEW: Article category
keywords: ["Bayern Munich", "Bundesliga"], // NEW: Keywords for personalization
clicked: false,
clicked_at: null,
user_agent: null,
created_at: ISODate()
}
```
**Test it:**
```bash
# Send a test newsletter
curl -X POST http://localhost:5001/api/admin/send-newsletter
# Check tracking records have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.link_clicks.findOne({}, {article_title: 1, keywords: 1, category: 1})"
```
---
### ✅ Phase 3: User Interest Profiling (COMPLETED)
**Status:** Implemented
**Goal:** Build user interest profiles from click history
**Files Created:**
- `backend/services/interest_profiling_service.py` - Core profiling logic
- `backend/routes/interests_routes.py` - API endpoints for interest management
**Files Modified:**
- `backend/routes/tracking_routes.py` - Auto-update interests on click
- `backend/app.py` - Register interests routes
**What it does:**
- Automatically builds interest profiles when users click articles
- Tracks interest scores for categories and keywords (0.0 to 1.0)
- Increments scores by 0.1 per click, capped at 1.0
- Provides decay mechanism for old interests
- Supports rebuilding profiles from click history
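A simplified sketch of the per-click update (increment by 0.1, cap at 1.0), assuming the click record already carries `category` and `keywords`; the real logic in `interest_profiling_service.py` handles more edge cases:
```python
from datetime import datetime, timezone

CLICK_INCREMENT = 0.1
MAX_SCORE = 1.0

def update_interests_on_click(db, email: str, category, keywords: list[str]) -> None:
    """Bump category and keyword scores for a subscriber after a click."""
    profile = db.user_interests.find_one({"email": email}) or {
        "email": email, "categories": {}, "keywords": {},
        "total_clicks": 0, "created_at": datetime.now(timezone.utc),
    }
    if category:
        current = profile["categories"].get(category, 0.0)
        profile["categories"][category] = min(MAX_SCORE, current + CLICK_INCREMENT)
    for kw in keywords:
        current = profile["keywords"].get(kw, 0.0)
        profile["keywords"][kw] = min(MAX_SCORE, current + CLICK_INCREMENT)
    profile["total_clicks"] += 1
    profile["last_updated"] = datetime.now(timezone.utc)
    db.user_interests.replace_one({"email": email}, profile, upsert=True)
```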
**Database Schema:**
```javascript
// user_interests collection
{
email: "user@example.com",
categories: {
sports: 0.8,
local: 0.5,
science: 0.2
},
keywords: {
"Bayern Munich": 0.9,
"Oktoberfest": 0.7,
"AI": 0.3
},
total_clicks: 15,
last_updated: ISODate(),
created_at: ISODate()
}
```
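The decay mechanism could look roughly like this: scores in profiles that have not been updated recently are multiplied by a factor below 1.0 (defaults mirror the `/api/interests/decay` parameters below). This is an assumed implementation, not the exact service code:
```python
from datetime import datetime, timedelta, timezone

def decay_old_interests(db, decay_factor: float = 0.95, days_threshold: int = 7) -> int:
    """Scale down scores for profiles not updated within days_threshold days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days_threshold)
    updated = 0
    for profile in db.user_interests.find({"last_updated": {"$lt": cutoff}}):
        categories = {k: round(v * decay_factor, 4) for k, v in profile.get("categories", {}).items()}
        keywords = {k: round(v * decay_factor, 4) for k, v in profile.get("keywords", {}).items()}
        db.user_interests.update_one(
            {"_id": profile["_id"]},
            {"$set": {"categories": categories, "keywords": keywords}},
        )
        updated += 1
    return updated
```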
**API Endpoints:**
```bash
# Get user interests
GET /api/interests/<email>
# Get top interests
GET /api/interests/<email>/top?top_n=10
# Rebuild from history
POST /api/interests/<email>/rebuild
Body: {"days_lookback": 30}
# Decay old interests
POST /api/interests/decay
Body: {"decay_factor": 0.95, "days_threshold": 7}
# Get statistics
GET /api/interests/statistics
# Delete profile (GDPR)
DELETE /api/interests/<email>
```
**Test it:**
```bash
# Run test script
docker exec munich-news-local-backend python test_interest_profiling.py
# View a user's interests
curl http://localhost:5001/api/interests/user@example.com
# Get statistics
curl http://localhost:5001/api/interests/statistics
```
---
### ✅ Phase 4: Personalized Newsletter (COMPLETED)
**Status:** Implemented
**Goal:** Rank and select articles based on user interests
**Files Created:**
- `backend/services/personalization_service.py` - Core personalization logic
- `backend/routes/personalization_routes.py` - API endpoints for testing
**Files Modified:**
- `backend/app.py` - Register personalization routes
**What it does:**
- Scores articles based on user's category and keyword interests
- Ranks articles by personalization score (0.0 to 1.0)
- Selects a mix of personalized (70%) and trending (30%) content
- Provides explanations for recommendations
**Algorithm:**
```python
score = (category_match * 0.4) + (keyword_match * 0.6)
# Example:
# User interests: sports=0.8, "Bayern Munich"=0.9
# Article: sports category, keywords=["Bayern Munich", "Football"]
# Score = (0.8 * 0.4) + (0.9 * 0.6) = 0.32 + 0.54 = 0.86
```
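A sketch of how this scoring and the 70/30 mix could be implemented. The weighting follows the formula above; the exact matching rules (for example, how multiple keyword hits are combined and how "trending" is defined) are assumptions rather than the service's actual code:
```python
CATEGORY_WEIGHT = 0.4
KEYWORD_WEIGHT = 0.6
PERSONALIZATION_RATIO = 0.7  # 70% personalized, 30% trending

def score_article(article: dict, interests: dict) -> float:
    """Score one article against a user's interest profile (0.0 to 1.0)."""
    category_match = interests.get("categories", {}).get(article.get("category"), 0.0)
    # Assumption: take the strongest matching keyword as the keyword score
    keyword_match = max(
        (interests.get("keywords", {}).get(kw, 0.0) for kw in article.get("keywords", [])),
        default=0.0,
    )
    return category_match * CATEGORY_WEIGHT + keyword_match * KEYWORD_WEIGHT

def select_articles(articles: list[dict], interests: dict, max_articles: int = 10) -> list[dict]:
    """Pick a 70/30 mix of personalized and trending articles."""
    ranked = sorted(articles, key=lambda a: score_article(a, interests), reverse=True)
    n_personalized = round(max_articles * PERSONALIZATION_RATIO)
    personalized = ranked[:n_personalized]
    # Assumption: the input list is already ordered by "trendiness" (e.g. recency),
    # so the trending slots are filled from the top remaining articles in that order.
    trending = [a for a in articles if a not in personalized]
    return personalized + trending[: max_articles - len(personalized)]
```
With the example above (sports=0.8, "Bayern Munich"=0.9), `score_article` returns 0.8 * 0.4 + 0.9 * 0.6 = 0.86, matching the worked example.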
**API Endpoints:**
```bash
# Preview personalized newsletter
GET /api/personalize/preview/<email>?max_articles=10&hours_lookback=24
# Explain recommendation
POST /api/personalize/explain
Body: {"email": "user@example.com", "article_id": "..."}
```
**Test it:**
```bash
# Run test script
docker exec munich-news-local-backend python test_personalization.py
# Preview personalized newsletter
curl "http://localhost:5001/api/personalize/preview/demo@example.com?max_articles=5"
```
---
## ✅ All Phases Complete!
1. ~~**Phase 1:** Keyword extraction from articles~~ ✅ DONE
2. ~~**Phase 2:** Click tracking with keywords~~ ✅ DONE
3. ~~**Phase 3:** User interest profiling~~ ✅ DONE
4. ~~**Phase 4:** Personalized newsletter generation~~ ✅ DONE
## Next Steps for Production
1. **Integrate with newsletter sender** - Modify `news_sender/sender_service.py` to use personalization (a rough sketch follows this list)
2. **A/B testing** - Compare personalized vs non-personalized engagement
3. **Tune parameters** - Adjust personalization_ratio, weights, decay rates
4. **Monitor metrics** - Track click-through rates, open rates by personalization score
5. **User controls** - Add UI for users to view/edit their interests
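A hypothetical sketch of step 1, assuming the sender calls the documented preview endpoint over HTTP; the backend hostname and the response shape (an `"articles"` list) are assumptions, not existing code:
```python
import requests

BACKEND_URL = "http://backend:5001"  # assumed internal service hostname

def build_newsletter_for(subscriber_email: str, max_articles: int = 10) -> list[dict]:
    """Fetch a personalized article selection for one subscriber via the preview endpoint."""
    resp = requests.get(
        f"{BACKEND_URL}/api/personalize/preview/{subscriber_email}",
        params={"max_articles": max_articles, "hours_lookback": 24},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumption: the preview endpoint returns a JSON object with an "articles" list
    return resp.json().get("articles", [])
```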
## Configuration
No additional configuration is needed yet. Keyword extraction reuses the existing Ollama settings from `backend/.env`:
- `OLLAMA_ENABLED=true`
- `OLLAMA_MODEL=gemma3:12b`
- `OLLAMA_BASE_URL=http://ollama:11434`