Munich-news/docs/PERSONALIZATION_COMPLETE.md
2025-11-18 14:45:41 +01:00

🎉 Newsletter Personalization System - Complete!

All 4 phases of the personalization system have been successfully implemented and tested.

What Was Built

Phase 1: Keyword Extraction

  • AI-powered keyword extraction from articles using Ollama
  • 5 keywords extracted automatically per article during crawling
  • Keywords stored in database for personalization
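The crawler's actual prompt and parsing live in news_crawler/ollama_client.py and are not shown here. The snippet below is only a rough sketch of the idea. It asks a local Ollama server for a comma-separated list through the `/api/generate` endpoint, then cleans the reply into at most five keywords. The prompt text, the default model name `llama3.2`, and the helper names are assumptions for illustration.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; model name below is hypothetical.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_keyword_prompt(title: str, content: str, count: int = 5) -> str:
    """Build a prompt asking the model for a comma-separated keyword list."""
    return (
        f"Extract exactly {count} keywords from this news article. "
        "Reply with a comma-separated list and nothing else.\n\n"
        f"Title: {title}\n\n{content[:2000]}"
    )


def parse_keywords(raw: str, count: int = 5) -> list[str]:
    """Split the model reply into clean, case-insensitively de-duplicated keywords."""
    keywords: list[str] = []
    for part in raw.replace("\n", ",").split(","):
        # Strip whitespace, list bullets and quotes; inner hyphens (S-Bahn) survive.
        kw = part.strip().strip("-*•\"' ").strip()
        if kw and kw.lower() not in (k.lower() for k in keywords):
            keywords.append(kw)
    return keywords[:count]


def extract_keywords(title: str, content: str, model: str = "llama3.2") -> list[str]:
    """Call Ollama's /api/generate (non-streaming) and parse the reply."""
    payload = json.dumps({
        "model": model,
        "prompt": build_keyword_prompt(title, content),
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # The generated text is returned in the "response" field.
    return parse_keywords(body.get("response", ""))
```

The parsing step is kept separate from the HTTP call so that it can be tested without a running model.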

Phase 2: Click Tracking Enhancement

  • Enhanced tracking to capture article keywords and category
  • Tracking records now include metadata for building interest profiles
  • Privacy-compliant with opt-out and GDPR support
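The following is a minimal sketch of an enriched click record. It assumes a simple in-memory opt-out set. The `ClickRecord` fields, the `record_click` helper, and the `"general"` fallback category are hypothetical and are not taken from tracking_service.py.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ClickRecord:
    """One click event enriched with the clicked article's metadata.

    Field names are hypothetical; they illustrate the metadata described above.
    """
    email: str
    article_url: str
    category: str
    keywords: list[str]
    clicked_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def record_click(email: str, article: dict, opted_out: set[str]) -> Optional[ClickRecord]:
    """Build a tracking record, or return None if the subscriber opted out."""
    if email in opted_out:
        return None  # respect opt-out: nothing is stored for this user
    return ClickRecord(
        email=email,
        article_url=article["url"],
        category=article.get("category", "general"),  # hypothetical fallback
        keywords=list(article.get("keywords", [])),
    )
```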

Phase 3: User Interest Profiling

  • Automatic profile building from click behavior
  • Interest scores (0.0-1.0) for categories and keywords
  • Decay mechanism for old interests
  • API endpoints for viewing and managing profiles
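The sketch below shows one way clicks could feed the 0.0-1.0 scores. It assumes the +0.1-per-click increment mentioned under Key Learnings and caps every score at 1.0. It also uses a hypothetical multiplicative decay of 5% per week, plus a small cut-off below which an interest is dropped. The real decay rule in interest_profiling_service.py may differ.

```python
# Assumptions: +0.1 per click (from "Key Learnings"), scores capped at 1.0,
# and a hypothetical 5%-per-week multiplicative decay.
CLICK_INCREMENT = 0.1
DECAY_PER_WEEK = 0.95
MIN_SCORE = 0.01  # hypothetical cut-off below which an interest is dropped


def update_profile(profile: dict, category: str, keywords: list[str]) -> dict:
    """Apply one click to the profile's category and keyword scores."""
    cats = profile.setdefault("categories", {})
    kws = profile.setdefault("keywords", {})
    # round() avoids float drift such as 0.30000000000000004
    cats[category] = min(1.0, round(cats.get(category, 0.0) + CLICK_INCREMENT, 4))
    for kw in keywords:
        kws[kw] = min(1.0, round(kws.get(kw, 0.0) + CLICK_INCREMENT, 4))
    profile["total_clicks"] = profile.get("total_clicks", 0) + 1
    return profile


def decay_profile(profile: dict, weeks_elapsed: float) -> dict:
    """Shrink every score by DECAY_PER_WEEK per elapsed week; drop tiny ones."""
    factor = DECAY_PER_WEEK ** weeks_elapsed
    for section in ("categories", "keywords"):
        scores = profile.get(section, {})
        profile[section] = {
            name: round(score * factor, 4)
            for name, score in scores.items()
            if score * factor >= MIN_SCORE
        }
    return profile
```

Capping at 1.0 keeps heavy clickers inside the documented score range. Decay lets old interests fade even when the user never clicks on that topic again.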

Phase 4: Personalized Newsletter Generation

  • Article scoring based on user interests
  • Weighted ranking algorithm (40% category score + 60% keyword score)
  • Mix of 70% personalized and 30% trending content
  • Explanation system for recommendations
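The sketch below illustrates the 70/30 mix. It takes articles that have already been scored, plus a trending list ordered by popularity. About 70% of the slots go to the highest-scoring articles and the rest go to trending ones, with duplicates skipped. The function name and input shapes are assumptions. The project's own entry point appears to be `select_personalized_articles` in personalization_service.py, whose internals are not shown here.

```python
# Hypothetical inputs: `scored` holds (personalization_score, article_id) pairs,
# `trending` holds article ids ordered by popularity.
def mix_articles(
    scored: list[tuple[float, str]],
    trending: list[str],
    max_articles: int = 10,
    ratio: float = 0.7,
) -> list[str]:
    """Fill about `ratio` of the slots with top-scoring articles and the rest with
    trending ones, skipping duplicates and topping up if a pool runs short."""
    n_personal = round(max_articles * ratio)
    ranked = [aid for _, aid in sorted(scored, key=lambda p: p[0], reverse=True)]
    picked = ranked[:n_personal]

    # Trending slots: skip anything already chosen as a personalized pick.
    for aid in trending:
        if len(picked) >= max_articles:
            break
        if aid not in picked:
            picked.append(aid)

    # If trending ran out, use lower-ranked personalized articles instead.
    for aid in ranked[n_personal:]:
        if len(picked) >= max_articles:
            break
        if aid not in picked:
            picked.append(aid)
    return picked
```

With the defaults, a 10-article newsletter gets 7 personalized and 3 trending slots. If an article is both highly scored and trending, it appears only once and the next trending article takes its place.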

📊 How It Works

1. User clicks article in newsletter
   ↓
2. System records: keywords + category
   ↓
3. Interest profile updates automatically
   ↓
4. Next newsletter: articles ranked by interests
   ↓
5. User receives personalized content

🧪 Testing

All phases have been tested and verified:

```bash
# Run the comprehensive test suite (covers all 4 phases)
docker exec munich-news-local-backend python test_personalization_system.py

# Or test keyword extraction separately
docker exec munich-news-local-crawler python -c "from crawler_service import crawl_all_feeds; crawl_all_feeds(max_articles_per_feed=2)"
```

🔌 API Endpoints

Interest Management

```
GET    /api/interests/<email>              # View profile
GET    /api/interests/<email>/top          # Top interests
POST   /api/interests/<email>/rebuild      # Rebuild from history
GET    /api/interests/statistics           # Platform stats
DELETE /api/interests/<email>              # Delete (GDPR)
```

Personalization

```
GET  /api/personalize/preview/<email>      # Preview personalized newsletter
POST /api/personalize/explain              # Explain recommendation
```
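The following is a small client sketch for the endpoints above, using only the Python standard library. Email addresses contain `@` and sometimes `+`, so the helpers percent-encode them before placing them in the URL path. The base URL (port 5001) is an assumption, so adjust it to your deployment. The request body for `POST /api/personalize/explain` is not documented here, so that endpoint is left out.

```python
import json
import urllib.parse
import urllib.request

# Assumption: backend base URL; adjust to your deployment.
BASE_URL = "http://localhost:5001"


def _enc(email: str) -> str:
    """Percent-encode an email for use as a single path segment."""
    return urllib.parse.quote(email, safe="")


def interests_url(email: str, action: str = "") -> str:
    """Build /api/interests/<email>[/<action>], e.g. action="top" or "rebuild"."""
    path = f"{BASE_URL}/api/interests/{_enc(email)}"
    return f"{path}/{action}" if action else path


def preview_url(email: str) -> str:
    """Build /api/personalize/preview/<email>."""
    return f"{BASE_URL}/api/personalize/preview/{_enc(email)}"


def call(url: str, method: str = "GET") -> dict:
    """Send a request and decode the JSON response body."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `call(interests_url("user@example.com", "rebuild"), method="POST")` rebuilds a profile. `call(interests_url("user@example.com"), method="DELETE")` removes it.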

📈 Example Results

User Profile

```json
{
  "email": "user@example.com",
  "categories": {
    "sports": 0.30,
    "local": 0.10
  },
  "keywords": {
    "Bayern Munich": 0.30,
    "Football": 0.20,
    "Transportation": 0.10
  },
  "total_clicks": 5
}
```

Personalized Newsletter

```json
{
  "articles": [
    {
      "title": "Bayern Munich wins championship",
      "personalization_score": 0.86,
      "category": "sports",
      "keywords": ["Bayern Munich", "Football"]
    },
    {
      "title": "New S-Bahn line opens",
      "personalization_score": 0.42,
      "category": "local",
      "keywords": ["Transportation", "Munich"]
    }
  ],
  "statistics": {
    "highly_personalized": 1,
    "moderately_personalized": 1,
    "trending": 0
  }
}
```

🎯 Scoring Algorithm

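```python
# Article score calculation
def article_score(user_interests: dict, article: dict) -> float:
    # Unknown category scores 0.0; the keyword average skips keywords missing from the profile (assumed)
    category_score = user_interests["categories"].get(article["category"], 0.0)
    matched = [user_interests["keywords"][kw] for kw in article["keywords"] if kw in user_interests["keywords"]]
    keyword_score = sum(matched) / len(matched) if matched else 0.0

    return (category_score * 0.4) + (keyword_score * 0.6)
```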

Example:

  • User: sports=0.8, "Bayern Munich"=0.9 (no score for "Football")
  • Article: sports category, keywords=["Bayern Munich", "Football"]
  • Keyword score = 0.9 (only "Bayern Munich" matches the profile)
  • Score = (0.8 × 0.4) + (0.9 × 0.6) = 0.32 + 0.54 = 0.86

🚀 Production Integration

To integrate with the newsletter sender:

  1. Modify news_sender/sender_service.py:

```python
from services.personalization_service import select_personalized_articles

# For each subscriber (all_articles and subscriber_email come from the
# sender's existing per-subscriber loop)
personalized_articles = select_personalized_articles(
    all_articles,
    subscriber_email,
    max_articles=10,
)
```

  2. Enable the personalization flag in the config:

```
PERSONALIZATION_ENABLED=true
PERSONALIZATION_RATIO=0.7  # 70% personalized, 30% trending
```

  3. Monitor these metrics:
  • Click-through rate by personalization score
  • Open rates for personalized vs. non-personalized newsletters
  • User engagement over time
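The sketch below shows how the sender could read the two flags, with conservative defaults and clamping. The function name is hypothetical, and the 0.7 fallback simply mirrors the value in the config example.

```python
import os


def load_personalization_config() -> tuple[bool, float]:
    """Read PERSONALIZATION_ENABLED / PERSONALIZATION_RATIO with safe defaults."""
    raw_enabled = os.environ.get("PERSONALIZATION_ENABLED", "false")
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes"}
    try:
        ratio = float(os.environ.get("PERSONALIZATION_RATIO", "0.7"))
    except ValueError:
        ratio = 0.7  # unparsable value: fall back to the example default
    ratio = min(1.0, max(0.0, ratio))  # clamp to a valid fraction
    return enabled, ratio
```

When `enabled` is false, the sender can keep sending the same article list to every subscriber. This makes it straightforward to compare personalized and non-personalized open rates.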

🔐 Privacy & Compliance

  • Users can opt out of tracking
  • Interest profiles can be deleted (GDPR)
  • Automatic anonymization after 90 days
  • No PII beyond email address
  • Transparent recommendation explanations
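The following sketch shows one way the 90-day anonymization could work. It makes two assumptions. First, click records are dicts with `email` and `clicked_at` fields (hypothetical names). Second, anonymizing means dropping the email while keeping category and keywords for aggregate statistics.

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 90  # retention window from the policy above


def anonymize_old_clicks(
    records: list[dict], now: datetime, retention_days: int = RETENTION_DAYS
) -> list[dict]:
    """Return a new list in which records older than the retention window are
    replaced by copies without the email. Category and keywords are kept.
    Assumed record shape: {"email": ..., "clicked_at": datetime, ...}."""
    cutoff = now - timedelta(days=retention_days)
    anonymized = []
    for rec in records:
        if rec["clicked_at"] < cutoff and rec.get("email") is not None:
            rec = {**rec, "email": None}  # copy; the input record is not mutated
        anonymized.append(rec)
    return anonymized
```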

📁 Files Created/Modified

New Files

  • backend/services/interest_profiling_service.py
  • backend/services/personalization_service.py
  • backend/routes/interests_routes.py
  • backend/routes/personalization_routes.py
  • backend/test_tracking_phase2.py
  • backend/test_interest_profiling.py
  • backend/test_personalization.py
  • docs/PERSONALIZATION.md

Modified Files

  • news_crawler/ollama_client.py - Added keyword extraction
  • news_crawler/crawler_service.py - Integrated keyword extraction
  • backend/services/tracking_service.py - Enhanced with metadata
  • backend/routes/tracking_routes.py - Auto-update interests
  • backend/app.py - Registered new routes

🎓 Key Learnings

  1. Incremental scoring works well: +0.1 per click prevents any single click from being over-weighted
  2. The mix is important: a 70/30 personalized/trending split helps avoid filter bubbles
  3. Keywords > categories: the 60/40 weighting reflects that keywords carry more signal
  4. Decay is essential: it prevents stale interests from dominating
  5. Transparency matters: the explanation API helps users understand recommendations

🎉 Status: COMPLETE

All 4 phases implemented, tested, and documented. The personalization system is ready for production integration!