# Newsletter Personalization Implementation

## Overview

Personalized newsletters based on user click behavior, using keywords and categories to build interest profiles.

## Implementation Phases

### ✅ Phase 1: Keyword Extraction (COMPLETED)

**Status:** Implemented

**Files Modified:**
- `news_crawler/ollama_client.py` - Added `extract_keywords()` method
- `news_crawler/crawler_service.py` - Integrated keyword extraction into crawl process

**What it does:**
- Extracts 5 keywords from each article using Ollama AI
- Keywords stored in `articles` collection: `keywords: ["Bayern Munich", "Football", ...]`
- Runs automatically during news crawling

**Test it:**
```bash
# Trigger a crawl
curl -X POST http://localhost:5001/api/admin/trigger-crawl -d '{"max_articles": 2}'

# Check articles have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.articles.findOne({}, {title: 1, keywords: 1})"
```

---

### ✅ Phase 2: Click Tracking Enhancement (COMPLETED)

**Status:** Implemented

**Goal:** Track clicks with keyword metadata

**Files Modified:**
- `backend/services/tracking_service.py` - Enhanced `create_newsletter_tracking()` to look up article metadata

**What it does:**
- When creating tracking links, looks up the article in the database
- Stores article ID, category, and keywords in the tracking record
- Enables building user interest profiles from click behavior

**Database Schema:**
```javascript
// link_clicks collection
{
  tracking_id: "uuid",
  newsletter_id: "2024-11-18",
  subscriber_email: "user@example.com",
  article_url: "https://...",
  article_title: "Article Title",
  article_id: "673abc123...",                 // NEW: Article database ID
  category: "sports",                         // NEW: Article category
  keywords: ["Bayern Munich", "Bundesliga"],  // NEW: Keywords for personalization
  clicked: false,
  clicked_at: null,
  user_agent: null,
  created_at: ISODate()
}
```

**Test it:**
```bash
# Send a test newsletter
curl -X POST http://localhost:5001/api/admin/send-newsletter

# Check tracking records have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.link_clicks.findOne({}, {article_title: 1, keywords: 1, category: 1})"
```

---

### ✅ Phase 3: User Interest Profiling (COMPLETED)

**Status:** Implemented

**Goal:** Build user interest profiles from click history

**Files Created:**
- `backend/services/interest_profiling_service.py` - Core profiling logic
- `backend/routes/interests_routes.py` - API endpoints for interest management

**Files Modified:**
- `backend/routes/tracking_routes.py` - Auto-update interests on click
- `backend/app.py` - Register interests routes

**What it does:**
- Automatically builds interest profiles when users click articles
- Tracks interest scores for categories and keywords (0.0 to 1.0)
- Increments scores by 0.1 per click, capped at 1.0
- Provides a decay mechanism for old interests
- Supports rebuilding profiles from click history

**Database Schema:**
```javascript
// user_interests collection
{
  email: "user@example.com",
  categories: {
    sports: 0.8,
    local: 0.5,
    science: 0.2
  },
  keywords: {
    "Bayern Munich": 0.9,
    "Oktoberfest": 0.7,
    "AI": 0.3
  },
  total_clicks: 15,
  last_updated: ISODate(),
  created_at: ISODate()
}
```

**API Endpoints:**
```bash
# Get user interests
GET /api/interests/<email>

# Get top interests
GET /api/interests/<email>/top?top_n=10

# Rebuild from history
POST /api/interests/<email>/rebuild
Body: {"days_lookback": 30}

# Decay old interests
POST /api/interests/decay
Body: {"decay_factor": 0.95, "days_threshold": 7}

# Get statistics
GET /api/interests/statistics

# Delete profile (GDPR)
DELETE /api/interests/<email>
```

**Test it:**
```bash
# Run test script
docker exec munich-news-local-backend python test_interest_profiling.py

# View a user's interests
curl http://localhost:5001/api/interests/user@example.com

# Get statistics
curl http://localhost:5001/api/interests/statistics
```

---

### ✅ Phase 4: Personalized Newsletter (COMPLETED)

**Status:** Implemented

**Goal:** Rank and select articles based on user interests

**Files Created:**
- `backend/services/personalization_service.py` - Core personalization logic
- `backend/routes/personalization_routes.py` - API endpoints for testing

**Files Modified:**
- `backend/app.py` - Register personalization routes

**What it does:**
- Scores articles based on the user's category and keyword interests
- Ranks articles by personalization score (0.0 to 1.0)
- Selects a mix of personalized (70%) + trending (30%) content
- Provides explanations for recommendations

**Algorithm:**
```python
score = (category_match * 0.4) + (keyword_match * 0.6)

# Example:
# User interests: sports=0.8, "Bayern Munich"=0.9
# Article: sports category, keywords=["Bayern Munich", "Football"]
# Score = (0.8 * 0.4) + (0.9 * 0.6) = 0.32 + 0.54 = 0.86
```

**API Endpoints:**
```bash
# Preview personalized newsletter
GET /api/personalize/preview/<email>?max_articles=10&hours_lookback=24

# Explain recommendation
POST /api/personalize/explain
Body: {"email": "user@example.com", "article_id": "..."}
```

**Test it:**
```bash
# Run test script
docker exec munich-news-local-backend python test_personalization.py

# Preview personalized newsletter
curl "http://localhost:5001/api/personalize/preview/demo@example.com?max_articles=5"
```

---

## ✅ All Phases Complete!

1. ~~**Phase 1:** Keyword extraction from articles~~ ✅ DONE
2. ~~**Phase 2:** Click tracking with keywords~~ ✅ DONE
3. ~~**Phase 3:** User interest profiling~~ ✅ DONE
4. ~~**Phase 4:** Personalized newsletter generation~~ ✅ DONE

## Next Steps for Production

1. **Integrate with newsletter sender** - Modify `news_sender/sender_service.py` to use personalization
2. **A/B testing** - Compare personalized vs. non-personalized engagement
3. **Tune parameters** - Adjust personalization_ratio, weights, decay rates
4. **Monitor metrics** - Track click-through rates and open rates by personalization score
5. **User controls** - Add UI for users to view/edit their interests

## Configuration

No configuration needed yet.
Keyword extraction uses the existing Ollama settings from `backend/.env`:
- `OLLAMA_ENABLED=true`
- `OLLAMA_MODEL=gemma3:12b`
- `OLLAMA_BASE_URL=http://ollama:11434`
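As a reference, the Phase 3 update rules (increment by 0.1 per click, cap at 1.0, multiplicative decay with factor 0.95) can be sketched as plain functions. This is an illustrative sketch, not the actual `interest_profiling_service.py` code: the function names, the in-place dict updates, and rounding decayed scores to 4 decimals are assumptions made here for clarity.

```python
def update_interests_on_click(profile, category, keywords, increment=0.1, cap=1.0):
    """Phase 3 rule of thumb: +0.1 per click on a category/keyword, capped at 1.0."""
    cats = profile.setdefault("categories", {})
    kws = profile.setdefault("keywords", {})
    if category:
        cats[category] = min(cats.get(category, 0.0) + increment, cap)
    for kw in keywords:
        kws[kw] = min(kws.get(kw, 0.0) + increment, cap)
    profile["total_clicks"] = profile.get("total_clicks", 0) + 1
    return profile


def decay_interests(profile, decay_factor=0.95):
    """Multiply every interest score by decay_factor (applied to stale profiles)."""
    for bucket in ("categories", "keywords"):
        scores = profile.get(bucket, {})
        for key in scores:
            scores[key] = round(scores[key] * decay_factor, 4)
    return profile


profile = {"categories": {}, "keywords": {}, "total_clicks": 0}
update_interests_on_click(profile, "sports", ["Bayern Munich", "Bundesliga"])
print(profile["categories"]["sports"])  # 0.1
```

One click on a sports article lifts `sports` (and each keyword) from 0.0 to 0.1; ten clicks would saturate a score at the 1.0 cap, which is what keeps long-time readers' profiles bounded.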
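The Phase 4 scoring formula can likewise be sketched end to end. This is a minimal illustration of `score = (category_match * 0.4) + (keyword_match * 0.6)`, not the real `personalization_service.py`: the function name, the dict shapes (mirroring the `user_interests` schema), and the choice to take the *maximum* keyword score as `keyword_match` are assumptions, since the document does not specify how multiple keyword matches are aggregated.

```python
def score_article(article, interests):
    """Score one article against a user's interest profile (0.0 to 1.0)."""
    # Category match: the user's interest score for the article's category.
    category_match = interests.get("categories", {}).get(article.get("category"), 0.0)

    # Keyword match: strongest interest score among the article's keywords
    # (aggregation by max is an assumption of this sketch).
    keyword_scores = interests.get("keywords", {})
    keyword_match = max(
        (keyword_scores.get(kw, 0.0) for kw in article.get("keywords", [])),
        default=0.0,
    )

    return (category_match * 0.4) + (keyword_match * 0.6)


# Worked example from the Algorithm section above:
interests = {"categories": {"sports": 0.8}, "keywords": {"Bayern Munich": 0.9}}
article = {"category": "sports", "keywords": ["Bayern Munich", "Football"]}
print(round(score_article(article, interests), 2))  # 0.86
```

The result reproduces the documented example: (0.8 × 0.4) + (0.9 × 0.6) = 0.86. Weighting keywords (0.6) above categories (0.4) means a strong keyword match like "Bayern Munich" dominates even when the category interest is moderate.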