Munich-news/docs/PERSONALIZATION.md

Newsletter Personalization Implementation

Overview

This document describes how newsletters are personalized from user click behavior: article keywords and categories feed per-user interest profiles, which in turn drive article ranking and selection.

Implementation Phases

Phase 1: Keyword Extraction (COMPLETED)

Status: Implemented

Files Modified:

  • news_crawler/ollama_client.py - Added extract_keywords() method
  • news_crawler/crawler_service.py - Integrated keyword extraction into crawl process

What it does:

  • Extracts 5 keywords from each article using Ollama AI
  • Keywords stored in articles collection: keywords: ["Bayern Munich", "Football", ...]
  • Runs automatically during news crawling
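
The model call itself depends on the configured Ollama instance, but the post-processing step can be sketched on its own. The helper below is illustrative (not the actual `extract_keywords()` implementation): it assumes the model is prompted to reply with a comma-separated keyword list and normalizes that reply.

```python
def parse_keyword_reply(raw: str, max_keywords: int = 5) -> list[str]:
    """Turn a comma-separated model reply into a clean keyword list.

    Trims whitespace, drops empty entries, and caps the result at
    max_keywords so a chatty model cannot inflate the list.
    """
    parts = (p.strip() for p in raw.split(","))
    return [p for p in parts if p][:max_keywords]
```

Capping at five matches the "5 keywords per article" rule above regardless of how verbose the model's reply is.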

Test it:

# Trigger a crawl (Content-Type header so the JSON body is parsed)
curl -X POST http://localhost:5001/api/admin/trigger-crawl \
  -H "Content-Type: application/json" \
  -d '{"max_articles": 2}'

# Check articles have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.articles.findOne({}, {title: 1, keywords: 1})"

Phase 2: Click Tracking Enhancement (COMPLETED)

Status: Implemented

Goal: Track clicks with keyword metadata

Files Modified:

  • backend/services/tracking_service.py - Enhanced create_newsletter_tracking() to look up article metadata

What it does:

  • When creating tracking links, looks up article from database
  • Stores article ID, category, and keywords in tracking record
  • Enables building user interest profiles from click behavior
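
Concretely, the enrichment amounts to copying the article's stored metadata onto each tracking record at send time, so a later click can update the interest profile without a second database lookup. A minimal sketch, with field names taken from the schema below (the function name is illustrative, not the service's actual API):

```python
import uuid
from datetime import datetime, timezone

def build_tracking_record(article: dict, subscriber_email: str, newsletter_id: str) -> dict:
    # Denormalize category and keywords from the article document into
    # the tracking record, so click handling never needs a join.
    return {
        "tracking_id": str(uuid.uuid4()),
        "newsletter_id": newsletter_id,
        "subscriber_email": subscriber_email,
        "article_url": article.get("url"),
        "article_title": article.get("title"),
        "article_id": str(article.get("_id", "")),
        "category": article.get("category"),
        "keywords": article.get("keywords", []),
        "clicked": False,
        "clicked_at": None,
        "user_agent": None,
        "created_at": datetime.now(timezone.utc),
    }
```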

Database Schema:

// link_clicks collection
{
  tracking_id: "uuid",
  newsletter_id: "2024-11-18",
  subscriber_email: "user@example.com",
  article_url: "https://...",
  article_title: "Article Title",
  article_id: "673abc123...",              // NEW: Article database ID
  category: "sports",                      // NEW: Article category
  keywords: ["Bayern Munich", "Bundesliga"], // NEW: Keywords for personalization
  clicked: false,
  clicked_at: null,
  user_agent: null,
  created_at: ISODate()
}

Test it:

# Send a test newsletter
curl -X POST http://localhost:5001/api/admin/send-newsletter

# Check tracking records have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.link_clicks.findOne({}, {article_title: 1, keywords: 1, category: 1})"

Phase 3: User Interest Profiling (COMPLETED)

Status: Implemented

Goal: Build user interest profiles from click history

Files Created:

  • backend/services/interest_profiling_service.py - Core profiling logic
  • backend/routes/interests_routes.py - API endpoints for interest management

Files Modified:

  • backend/routes/tracking_routes.py - Auto-update interests on click
  • backend/app.py - Register interests routes

What it does:

  • Automatically builds interest profiles when users click articles
  • Tracks interest scores for categories and keywords (0.0 to 1.0)
  • Increments scores by 0.1 per click, capped at 1.0
  • Provides decay mechanism for old interests
  • Supports rebuilding profiles from click history
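
The scoring rules above (+0.1 per click, capped at 1.0, multiplicative decay) reduce to plain dictionary updates. A sketch under those rules; function names are illustrative, not the service's actual API:

```python
def record_click(profile, category, keywords, increment=0.1, cap=1.0):
    # Bump the clicked article's category and keyword scores, capping at 1.0.
    if category:
        cats = profile.setdefault("categories", {})
        cats[category] = min(cap, cats.get(category, 0.0) + increment)
    kws = profile.setdefault("keywords", {})
    for kw in keywords:
        kws[kw] = min(cap, kws.get(kw, 0.0) + increment)
    profile["total_clicks"] = profile.get("total_clicks", 0) + 1
    return profile

def decay_interests(profile, decay_factor=0.95):
    # Multiply every score by decay_factor so stale interests fade over time.
    for bucket in ("categories", "keywords"):
        profile[bucket] = {k: v * decay_factor for k, v in profile.get(bucket, {}).items()}
    return profile
```

Multiplicative decay preserves the relative ordering of interests while shrinking all scores toward zero, so a user who stops clicking a topic gradually drops it without an abrupt reset.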

Database Schema:

// user_interests collection
{
  email: "user@example.com",
  categories: {
    sports: 0.8,
    local: 0.5,
    science: 0.2
  },
  keywords: {
    "Bayern Munich": 0.9,
    "Oktoberfest": 0.7,
    "AI": 0.3
  },
  total_clicks: 15,
  last_updated: ISODate(),
  created_at: ISODate()
}

API Endpoints:

# Get user interests
GET /api/interests/<email>

# Get top interests
GET /api/interests/<email>/top?top_n=10

# Rebuild from history
POST /api/interests/<email>/rebuild
Body: {"days_lookback": 30}

# Decay old interests
POST /api/interests/decay
Body: {"decay_factor": 0.95, "days_threshold": 7}

# Get statistics
GET /api/interests/statistics

# Delete profile (GDPR)
DELETE /api/interests/<email>

Test it:

# Run test script
docker exec munich-news-local-backend python test_interest_profiling.py

# View a user's interests
curl http://localhost:5001/api/interests/user@example.com

# Get statistics
curl http://localhost:5001/api/interests/statistics

Phase 4: Personalized Newsletter (COMPLETED)

Status: Implemented

Goal: Rank and select articles based on user interests

Files Created:

  • backend/services/personalization_service.py - Core personalization logic
  • backend/routes/personalization_routes.py - API endpoints for testing

Files Modified:

  • backend/app.py - Register personalization routes

What it does:

  • Scores articles based on the user's category and keyword interests
  • Ranks articles by personalization score (0.0 to 1.0)
  • Selects a mix of personalized (70%) and trending (30%) content
  • Provides explanations for recommendations

Algorithm:

score = (category_match * 0.4) + (keyword_match * 0.6)

# Example:
# User interests: sports=0.8, "Bayern Munich"=0.9
# Article: sports category, keywords=["Bayern Munich", "Football"]
# Score = (0.8 * 0.4) + (0.9 * 0.6) = 0.32 + 0.54 = 0.86
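
A minimal sketch of scoring plus the 70/30 selection. It assumes, as the worked example implies, that keyword_match is the best-matching keyword score; the actual service may aggregate differently, and these function names are illustrative:

```python
def score_article(interests, article, category_weight=0.4, keyword_weight=0.6):
    # category_match: the user's score for the article's category (0.0 if none)
    category_match = interests.get("categories", {}).get(article.get("category"), 0.0)
    # keyword_match: best user score among the article's keywords (0.0 if none match)
    keyword_match = max(
        (interests.get("keywords", {}).get(k, 0.0) for k in article.get("keywords", [])),
        default=0.0,
    )
    return category_match * category_weight + keyword_match * keyword_weight

def select_articles(ranked, trending, max_articles=10, personalization_ratio=0.7):
    # Fill ~70% of slots from the personalized ranking, then top up with trending.
    picks = ranked[: round(max_articles * personalization_ratio)]
    picks += [a for a in trending if a not in picks]
    return picks[:max_articles]
```

With the interests and article from the worked example (sports=0.8, "Bayern Munich"=0.9), `score_article` reproduces the 0.86 result.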

API Endpoints:

# Preview personalized newsletter
GET /api/personalize/preview/<email>?max_articles=10&hours_lookback=24

# Explain recommendation
POST /api/personalize/explain
Body: {"email": "user@example.com", "article_id": "..."}

Test it:

# Run test script
docker exec munich-news-local-backend python test_personalization.py

# Preview personalized newsletter
curl "http://localhost:5001/api/personalize/preview/demo@example.com?max_articles=5"

All Phases Complete!

  1. Phase 1: Keyword extraction from articles DONE
  2. Phase 2: Click tracking with keywords DONE
  3. Phase 3: User interest profiling DONE
  4. Phase 4: Personalized newsletter generation DONE

Next Steps for Production

  1. Integrate with newsletter sender - Modify news_sender/sender_service.py to use personalization
  2. A/B testing - Compare personalized vs non-personalized engagement
  3. Tune parameters - Adjust personalization_ratio, weights, decay rates
  4. Monitor metrics - Track click-through rates, open rates by personalization score
  5. User controls - Add UI for users to view/edit their interests

Configuration

No configuration needed yet. Keyword extraction uses existing Ollama settings from backend/.env:

  • OLLAMA_ENABLED=true
  • OLLAMA_MODEL=gemma3:12b
  • OLLAMA_BASE_URL=http://ollama:11434