Munich-news/docs/PERSONALIZATION.md

Newsletter Personalization Implementation

Overview

This document describes how newsletters are personalized from user click behavior: article keywords and categories feed per-user interest profiles, which in turn drive article ranking and selection.

Implementation Phases

Phase 1: Keyword Extraction (COMPLETED)

Status: Implemented

Files Modified:

  • news_crawler/ollama_client.py - Added extract_keywords() method
  • news_crawler/crawler_service.py - Integrated keyword extraction into crawl process

What it does:

  • Extracts 5 keywords from each article using Ollama AI
  • Keywords stored in articles collection: keywords: ["Bayern Munich", "Football", ...]
  • Runs automatically during news crawling
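
The model call itself depends on the configured Ollama instance, but the post-processing step can be sketched on its own. The helper below is illustrative (not the actual `extract_keywords()` implementation): it assumes the model is prompted to reply with a comma-separated keyword list and normalizes that reply.

```python
def parse_keyword_reply(raw: str, max_keywords: int = 5) -> list[str]:
    """Turn a comma-separated model reply into a clean keyword list.

    Trims whitespace, drops empty entries, and caps the result at
    max_keywords so a chatty model cannot inflate the list.
    """
    parts = (p.strip() for p in raw.split(","))
    return [p for p in parts if p][:max_keywords]
```

Capping at five matches the "5 keywords per article" rule above regardless of how verbose the model's reply is.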

Test it:

# Trigger a crawl (Content-Type header so the JSON body is parsed)
curl -X POST http://localhost:5001/api/admin/trigger-crawl \
  -H "Content-Type: application/json" \
  -d '{"max_articles": 2}'

# Check articles have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.articles.findOne({}, {title: 1, keywords: 1})"

Phase 2: Click Tracking Enhancement (COMPLETED)

Status: Implemented

Goal: Track clicks with keyword metadata

Files Modified:

  • backend/services/tracking_service.py - Enhanced create_newsletter_tracking() to look up article metadata

What it does:

  • When creating tracking links, looks up article from database
  • Stores article ID, category, and keywords in tracking record
  • Enables building user interest profiles from click behavior
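
Concretely, the enrichment amounts to copying the article's stored metadata onto each tracking record at send time, so a later click can update the interest profile without a second database lookup. A minimal sketch, with field names taken from the schema below (the function name is illustrative, not the service's actual API):

```python
import uuid
from datetime import datetime, timezone

def build_tracking_record(article: dict, subscriber_email: str, newsletter_id: str) -> dict:
    # Denormalize category and keywords from the article document into
    # the tracking record, so click handling never needs a join.
    return {
        "tracking_id": str(uuid.uuid4()),
        "newsletter_id": newsletter_id,
        "subscriber_email": subscriber_email,
        "article_url": article.get("url"),
        "article_title": article.get("title"),
        "article_id": str(article.get("_id", "")),
        "category": article.get("category"),
        "keywords": article.get("keywords", []),
        "clicked": False,
        "clicked_at": None,
        "user_agent": None,
        "created_at": datetime.now(timezone.utc),
    }
```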

Database Schema:

// link_clicks collection
{
  tracking_id: "uuid",
  newsletter_id: "2024-11-18",
  subscriber_email: "user@example.com",
  article_url: "https://...",
  article_title: "Article Title",
  article_id: "673abc123...",              // NEW: Article database ID
  category: "sports",                      // NEW: Article category
  keywords: ["Bayern Munich", "Bundesliga"], // NEW: Keywords for personalization
  clicked: false,
  clicked_at: null,
  user_agent: null,
  created_at: ISODate()
}

Test it:

# Send a test newsletter
curl -X POST http://localhost:5001/api/admin/send-newsletter

# Check tracking records have keywords
docker exec munich-news-mongodb mongosh munich_news --eval "db.link_clicks.findOne({}, {article_title: 1, keywords: 1, category: 1})"

Phase 3: User Interest Profiling (COMPLETED)

Status: Implemented

Goal: Build user interest profiles from click history

Files Created:

  • backend/services/interest_profiling_service.py - Core profiling logic
  • backend/routes/interests_routes.py - API endpoints for interest management

Files Modified:

  • backend/routes/tracking_routes.py - Auto-update interests on click
  • backend/app.py - Register interests routes

What it does:

  • Automatically builds interest profiles when users click articles
  • Tracks interest scores for categories and keywords (0.0 to 1.0)
  • Increments scores by 0.1 per click, capped at 1.0
  • Provides decay mechanism for old interests
  • Supports rebuilding profiles from click history
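
The scoring rules above (+0.1 per click, capped at 1.0, multiplicative decay) reduce to plain dictionary updates. A sketch under those rules; function names are illustrative, not the service's actual API:

```python
def record_click(profile, category, keywords, increment=0.1, cap=1.0):
    # Bump the clicked article's category and keyword scores, capping at 1.0.
    if category:
        cats = profile.setdefault("categories", {})
        cats[category] = min(cap, cats.get(category, 0.0) + increment)
    kws = profile.setdefault("keywords", {})
    for kw in keywords:
        kws[kw] = min(cap, kws.get(kw, 0.0) + increment)
    profile["total_clicks"] = profile.get("total_clicks", 0) + 1
    return profile

def decay_interests(profile, decay_factor=0.95):
    # Multiply every score by decay_factor so stale interests fade over time.
    for bucket in ("categories", "keywords"):
        profile[bucket] = {k: v * decay_factor for k, v in profile.get(bucket, {}).items()}
    return profile
```

Multiplicative decay preserves the relative ordering of interests while shrinking all scores toward zero, so a user who stops clicking a topic gradually drops it without an abrupt reset.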

Database Schema:

// user_interests collection
{
  email: "user@example.com",
  categories: {
    sports: 0.8,
    local: 0.5,
    science: 0.2
  },
  keywords: {
    "Bayern Munich": 0.9,
    "Oktoberfest": 0.7,
    "AI": 0.3
  },
  total_clicks: 15,
  last_updated: ISODate(),
  created_at: ISODate()
}

API Endpoints:

# Get user interests
GET /api/interests/<email>

# Get top interests
GET /api/interests/<email>/top?top_n=10

# Rebuild from history
POST /api/interests/<email>/rebuild
Body: {"days_lookback": 30}

# Decay old interests
POST /api/interests/decay
Body: {"decay_factor": 0.95, "days_threshold": 7}

# Get statistics
GET /api/interests/statistics

# Delete profile (GDPR)
DELETE /api/interests/<email>

Test it:

# Run test script
docker exec munich-news-local-backend python test_interest_profiling.py

# View a user's interests
curl http://localhost:5001/api/interests/user@example.com

# Get statistics
curl http://localhost:5001/api/interests/statistics

Phase 4: Personalized Newsletter (COMPLETED)

Status: Implemented

Goal: Rank and select articles based on user interests

Files Created:

  • backend/services/personalization_service.py - Core personalization logic
  • backend/routes/personalization_routes.py - API endpoints for testing

Files Modified:

  • backend/app.py - Register personalization routes

What it does:

  • Scores articles based on the user's category and keyword interests
  • Ranks articles by personalization score (0.0 to 1.0)
  • Selects a mix of personalized (70%) and trending (30%) content
  • Provides explanations for recommendations

Algorithm:

score = (category_match * 0.4) + (keyword_match * 0.6)

# Example:
# User interests: sports=0.8, "Bayern Munich"=0.9
# Article: sports category, keywords=["Bayern Munich", "Football"]
# Score = (0.8 * 0.4) + (0.9 * 0.6) = 0.32 + 0.54 = 0.86
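
A minimal sketch of scoring plus the 70/30 selection. It assumes, as the worked example implies, that keyword_match is the best-matching keyword score; the actual service may aggregate differently, and these function names are illustrative:

```python
def score_article(interests, article, category_weight=0.4, keyword_weight=0.6):
    # category_match: the user's score for the article's category (0.0 if none)
    category_match = interests.get("categories", {}).get(article.get("category"), 0.0)
    # keyword_match: best user score among the article's keywords (0.0 if none match)
    keyword_match = max(
        (interests.get("keywords", {}).get(k, 0.0) for k in article.get("keywords", [])),
        default=0.0,
    )
    return category_match * category_weight + keyword_match * keyword_weight

def select_articles(ranked, trending, max_articles=10, personalization_ratio=0.7):
    # Fill ~70% of slots from the personalized ranking, then top up with trending.
    picks = ranked[: round(max_articles * personalization_ratio)]
    picks += [a for a in trending if a not in picks]
    return picks[:max_articles]
```

With the interests and article from the worked example (sports=0.8, "Bayern Munich"=0.9), `score_article` reproduces the 0.86 result.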

API Endpoints:

# Preview personalized newsletter
GET /api/personalize/preview/<email>?max_articles=10&hours_lookback=24

# Explain recommendation
POST /api/personalize/explain
Body: {"email": "user@example.com", "article_id": "..."}

Test it:

# Run test script
docker exec munich-news-local-backend python test_personalization.py

# Preview personalized newsletter
curl "http://localhost:5001/api/personalize/preview/demo@example.com?max_articles=5"

All Phases Complete!

  1. Phase 1: Keyword extraction from articles DONE
  2. Phase 2: Click tracking with keywords DONE
  3. Phase 3: User interest profiling DONE
  4. Phase 4: Personalized newsletter generation DONE

Next Steps for Production

  1. Integrate with newsletter sender - Modify news_sender/sender_service.py to use personalization
  2. A/B testing - Compare personalized vs non-personalized engagement
  3. Tune parameters - Adjust personalization_ratio, weights, decay rates
  4. Monitor metrics - Track click-through rates, open rates by personalization score
  5. User controls - Add UI for users to view/edit their interests

Configuration

No configuration needed yet. Keyword extraction uses existing Ollama settings from backend/.env:

  • OLLAMA_ENABLED=true
  • OLLAMA_MODEL=gemma3:12b
  • OLLAMA_BASE_URL=http://ollama:11434