# Features Guide

Complete guide to Munich News Daily features.

---

## Core Features

### 1. Automated News Crawling
- Fetches articles from RSS feeds
- Scheduled daily at 6:00 AM Berlin time
- Extracts full article content
- Handles multiple news sources

### 2. AI-Powered Summarization
- Generates concise summaries (150 words)
- Uses Ollama AI (phi3:latest model)
- GPU acceleration available (5-10x faster)
- Configurable summary length

### 3. Title Translation
- Translates German titles to English
- Uses Ollama AI
- Displays both languages in newsletter
- Stores both versions in database

### 4. Newsletter Generation
- Beautiful HTML email template
- Responsive design
- Numbered articles
- Summary statistics
- Scheduled daily at 7:00 AM Berlin time

### 5. Engagement Tracking
- Email open tracking (pixel)
- Link click tracking
- Analytics dashboard ready
- Subscriber engagement metrics

---

## News Crawler

### How It Works

```
1. Fetch RSS feeds from database
2. Parse RSS XML
3. Extract article URLs
4. Fetch full article content
5. Extract text from HTML
6. Translate title (German → English)
7. Generate AI summary
8. Store in MongoDB
```

### Content Extraction

**Strategies (in order):**
1. **Article Tag** - Look for `<article>` tags
2. **Main Tag** - Look for `<main>` content
3. **Content Divs** - Common class names (content, article-body, etc.)
4. **Paragraph Aggregation** - Collect all `<p>` tags
5. **Fallback** - Use RSS description

**Cleaning:**
- Remove scripts and styles
- Remove navigation elements
- Remove ads and sidebars
- Extract clean text
- Preserve paragraphs

### RSS Feed Handling

**Supported Formats:**
- RSS 2.0
- Atom
- Custom formats

**Extracted Data:**
- Title
- Link
- Description/Summary
- Published date
- Author (if available)

**Error Handling:**
- Retry failed requests
- Skip invalid URLs
- Log errors
- Continue with next article

---

## AI Features

### Summarization

**Process:**
1. Send article text to Ollama
2. Request 150-word summary
3. Receive AI-generated summary
4. Store with article

**Configuration:**
```env
OLLAMA_ENABLED=true
OLLAMA_MODEL=phi3:latest
SUMMARY_MAX_WORDS=150
OLLAMA_TIMEOUT=120
```

**Performance:**
- CPU: ~8s per article
- GPU: ~2s per article (4x faster)

### Translation

**Process:**
1. Send German title to Ollama
2. Request English translation
3. Receive translated title
4. Store both versions

**Configuration:**
```env
OLLAMA_ENABLED=true
OLLAMA_MODEL=phi3:latest
```

**Performance:**
- CPU: ~1.5s per title
- GPU: ~0.3s per title (5x faster)

**Newsletter Display:**
```
English Title (Primary)
Original: German Title (Subtitle)
```

---

## Newsletter System

### Template Features

- **Responsive Design** - Works on all devices
- **Clean Layout** - Easy to read
- **Numbered Articles** - Clear organization
- **Summary Box** - Quick stats
- **Tracking Links** - Click tracking
- **Unsubscribe Link** - Easy opt-out

### Personalization

- Greeting message
- Date formatting
- Article count
- Source attribution
- Author names

### Tracking

**Open Tracking:**
- Invisible 1x1 pixel image
- Loaded when email opened
- Records timestamp
- Tracks unique opens

**Click Tracking:**
- All article links tracked
- Redirect through backend
- Records click events
- Tracks which articles clicked

---

## Subscriber Management

### Status System

| Status | Description | Receives Newsletters |
|--------|-------------|---------------------|
| `active` | Subscribed | ✅ Yes |
| `inactive` | Unsubscribed | ❌ No |

### Operations

**Subscribe:**
```bash
curl -X POST http://localhost:5001/api/subscribe \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com"}'
```

**Unsubscribe:**
```bash
curl -X POST http://localhost:5001/api/unsubscribe \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com"}'
```

**Check Stats:**
```bash
curl http://localhost:5001/api/admin/stats | jq '.subscribers'
```

---

## Admin Features

### Manual Crawl

Trigger crawl anytime:
```bash
curl -X POST http://localhost:5001/api/admin/trigger-crawl \
  -H "Content-Type: application/json" \
  -d '{"max_articles": 10}'
```

### Test Email

Send test newsletter:
```bash
curl -X POST http://localhost:5001/api/admin/send-test-email \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com"}'
```

### Send Newsletter

Send to all subscribers:
```bash
curl -X POST http://localhost:5001/api/admin/send-newsletter \
  -H "Content-Type: application/json" \
  -d '{"max_articles": 10}'
```

### System Stats

View system statistics:
```bash
curl http://localhost:5001/api/admin/stats
```

---

## Automation

### Scheduled Tasks

**Crawler (6:00 AM Berlin time):**
- Fetches new articles
- Processes with AI
- Stores in database

**Sender (7:00 AM Berlin time):**
- Waits for crawler to finish
- Fetches today's articles
- Generates newsletter
- Sends to all active subscribers

### Manual Execution

```bash
# Run crawler manually
docker-compose exec crawler python crawler_service.py 10

# Run sender manually
docker-compose exec sender python sender_service.py send 10

# Send test email
docker-compose exec sender python sender_service.py test your@email.com
```

---

## Configuration

### Environment Variables

```env
# Newsletter Settings
NEWSLETTER_MAX_ARTICLES=10
NEWSLETTER_HOURS_LOOKBACK=24
WEBSITE_URL=http://localhost:3000

# Ollama AI
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=phi3:latest
OLLAMA_TIMEOUT=120
SUMMARY_MAX_WORDS=150

# Tracking
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90
```

### RSS Feeds

Add feeds in MongoDB:
```javascript
db.rss_feeds.insertOne({
  name: "Süddeutsche Zeitung München",
  url: "https://www.sueddeutsche.de/muenchen/rss",
  active: true
})
```

---

## Performance Optimization

### GPU Acceleration

Enable for 5-10x faster processing:
```bash
./start-with-gpu.sh
```

**Benefits:**
- Faster summarization (8s → 2s)
- Faster translation (1.5s → 0.3s)
- Process more articles
- Lower CPU usage

### Batch Processing

Process multiple articles efficiently:
- Model stays loaded in memory
- Reduced overhead
- Better throughput

### Caching

- Model caching (Ollama)
- Database connection pooling
- Persistent storage

---

## Monitoring

### Logs

```bash
# Crawler logs
docker-compose logs -f crawler

# Sender logs
docker-compose logs -f sender

# Backend logs
docker-compose logs -f backend
```

### Metrics

- Articles crawled
- Summaries generated
- Newsletters sent
- Open rate
- Click-through rate
- Processing time

### Health Checks

```bash
# Backend health
curl http://localhost:5001/health

# System stats
curl http://localhost:5001/api/admin/stats
```

---

## Troubleshooting

### Crawler Issues

**No articles found:**
- Check RSS feed URLs
- Verify feeds are active
- Check network connectivity

**Extraction failed:**
- Article structure changed
- Paywall detected
- Network timeout

**AI processing failed:**
- Ollama not running
- Model not downloaded
- Timeout too short

### Newsletter Issues

**Not sending:**
- Check email configuration
- Verify SMTP credentials
- Check subscriber count

**Tracking not working:**
- Verify tracking enabled
- Check backend API accessible
- Verify tracking URLs

---

See [SETUP.md](SETUP.md) for configuration, [API.md](API.md) for API reference, and [ARCHITECTURE.md](ARCHITECTURE.md) for system design.
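
As a supplement to the Content Extraction section, the ordered fallback can be illustrated with a small standard-library sketch. This is not the crawler's actual code: the names `extract_text`, `_ParagraphCollector`, and `SKIP_TAGS` are hypothetical, the "Content Divs" strategy is left out for brevity, and ads or sidebars are approximated by skipping `<aside>` only.

```python
from html.parser import HTMLParser

# Hypothetical set of tags whose content is dropped during cleaning.
SKIP_TAGS = {"script", "style", "nav", "aside"}


class _ParagraphCollector(HTMLParser):
    """Collects <p> text and remembers which container it was found in."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0    # > 0 while inside a skipped tag
        self.containers = []   # currently open <article>/<main> tags
        self.buffer = None     # text chunks of the open <p>, else None
        self.paragraphs = {"article": [], "main": [], "any": []}

    def _flush(self):
        """Store the current paragraph (if any) with whitespace collapsed."""
        if self.buffer is None:
            return
        text = " ".join("".join(self.buffer).split())
        self.buffer = None
        if text:
            self.paragraphs["any"].append(text)
            for name in set(self.containers):
                self.paragraphs[name].append(text)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in ("article", "main"):
            self.containers.append(tag)
        elif tag == "p" and self.skip_depth == 0:
            self._flush()      # an unclosed <p> ends where the next begins
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ("article", "main"):
            self._flush()
            if tag in self.containers:
                self.containers.remove(tag)
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self.buffer is not None and self.skip_depth == 0:
            self.buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def extract_text(html, rss_description=""):
    """Return (strategy_name, text) using the first strategy that yields text."""
    parser = _ParagraphCollector()
    parser.feed(html)
    parser.close()
    for strategy in ("article", "main", "any"):
        if parser.paragraphs[strategy]:
            # Paragraphs are preserved as blank-line-separated blocks.
            return strategy, "\n\n".join(parser.paragraphs[strategy])
    return "fallback", rss_description
```

Returning the strategy name alongside the text makes it easy to log which fallback was used when an article's structure changes.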
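
The Summarization process (send text to Ollama, request a summary of `SUMMARY_MAX_WORDS` words, read the result) might look roughly like the following. It assumes Ollama's `/api/generate` endpoint with streaming disabled; the prompt wording and the helper names `build_summary_request` and `summarize` are illustrative assumptions, not the project's actual implementation.

```python
import json
import os
import urllib.request


def build_summary_request(article_text, env=os.environ):
    """Build (url, payload, timeout) from the documented environment variables."""
    base_url = env.get("OLLAMA_BASE_URL", "http://ollama:11434").rstrip("/")
    max_words = int(env.get("SUMMARY_MAX_WORDS", "150"))
    payload = {
        "model": env.get("OLLAMA_MODEL", "phi3:latest"),
        # Assumed prompt; the real crawler may phrase this differently.
        "prompt": (
            f"Summarize the following article in at most {max_words} words:\n\n"
            f"{article_text}"
        ),
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    timeout = float(env.get("OLLAMA_TIMEOUT", "120"))
    return f"{base_url}/api/generate", payload, timeout


def summarize(article_text, env=os.environ):
    """Send the request to Ollama and return the generated summary text."""
    url, payload, timeout = build_summary_request(article_text, env)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"].strip()
```

Keeping request construction separate from the network call means the configuration handling can be checked without a running Ollama instance. A timeout raised here corresponds to the "Timeout too short" case under Troubleshooting.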
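
For the Tracking section, the sketch below shows one way article links could be routed through the backend and how a 1x1 open-tracking pixel could be generated. The endpoint paths `/api/track/click` and `/api/track/pixel`, the query parameter names, and both function names are hypothetical; see API.md for the real routes.

```python
import html
from urllib.parse import urlencode


def tracking_link(api_url, newsletter_id, subscriber_id, article_url):
    """Wrap an article URL in a redirect through the tracking backend."""
    query = urlencode({"n": newsletter_id, "s": subscriber_id, "url": article_url})
    # Hypothetical endpoint: the backend records the click, then redirects to `url`.
    return f"{api_url.rstrip('/')}/api/track/click?{query}"


def tracking_pixel(api_url, newsletter_id, subscriber_id):
    """Return an <img> tag for an invisible 1x1 open-tracking pixel."""
    query = urlencode({"n": newsletter_id, "s": subscriber_id})
    src = f"{api_url.rstrip('/')}/api/track/pixel?{query}"
    # Escape the URL so the '&' separators are valid inside an HTML attribute.
    return f'<img src="{html.escape(src, quote=True)}" width="1" height="1" alt="">'
```

Here `api_url` would be the value of `TRACKING_API_URL`. Passing an opaque subscriber identifier rather than an email address keeps personal data out of the tracked URLs.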