Compare commits

...

3 Commits

Author SHA1 Message Date
7346ee9de2 update 2025-12-10 15:57:07 +00:00 (all checks were successful: dongho-repo/Munich-news/pipeline/head)
6e9fbe44c4 update 2025-12-10 15:52:41 +00:00
4e8b60f77c update 2025-12-10 15:50:11 +00:00
15 changed files with 439 additions and 617 deletions

Jenkinsfile (vendored)

@@ -0,0 +1,25 @@
pipeline {
    agent any
    stages {
        stage('Security Scan') {
            steps {
                withCredentials([string(credentialsId: 'nvd-api-key', variable: 'NVD_API_KEY')]) {
                    // Run OWASP Dependency Check using the specific installation configured in Jenkins
                    // Using NVD API Key to avoid rate limiting
                    dependencyCheck additionalArguments: "--scan ./ --format ALL --nvdApiKey ${NVD_API_KEY}", odcInstallation: 'depcheck'
                }
            }
        }
    }
    post {
        always {
            // Publish the results
            dependencyCheckPublisher pattern: 'dependency-check-report.xml'
            // Archive the reports
            archiveArtifacts allowEmptyArchive: true, artifacts: 'dependency-check-report.html'
        }
    }
}


@@ -1,56 +1,36 @@
# Quick Start Guide # Quick Start Guide
Get Munich News Daily running in 5 minutes! Get Munich News Daily running in 5 minutes!
## Prerequisites ## 📋 Prerequisites
- **Docker** & **Docker Compose** installed
- **4GB+ RAM** (for AI models)
- *(Optional)* NVIDIA GPU for faster processing
- Docker & Docker Compose installed ## 🚀 Setup Steps
- 4GB+ RAM (for Ollama AI models)
- (Optional) NVIDIA GPU for 5-10x faster AI processing
## Setup
### 1. Configure Environment ### 1. Configure Environment
```bash ```bash
# Copy example environment file
cp backend/.env.example backend/.env cp backend/.env.example backend/.env
# Edit with your settings (required: email configuration)
nano backend/.env nano backend/.env
``` ```
**Required:** Update `SMTP_SERVER`, `EMAIL_USER`, and `EMAIL_PASSWORD`.
**Minimum required settings:** ### 2. Start the System
```env
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
```
### 2. Start System
```bash ```bash
# Option 1: Auto-detect GPU and start (recommended) # Auto-detects GPU capabilities and starts services
./start-with-gpu.sh ./start-with-gpu.sh
# Option 2: Start without GPU # Watch installation progress (first time model download ~2GB)
docker-compose up -d
# View logs
docker-compose logs -f
# Wait for Ollama model download (first time only, ~2-5 minutes)
docker-compose logs -f ollama-setup docker-compose logs -f ollama-setup
``` ```
**Note:** First startup downloads the phi3:latest AI model (2.2GB). This happens automatically. ### 3. Add News Sources
### 3. Add RSS Feeds
```bash ```bash
mongosh munich_news # Connect to database
docker-compose exec mongodb mongosh munich_news
# Paste this into the mongo shell:
db.rss_feeds.insertMany([ db.rss_feeds.insertMany([
{ {
name: "Süddeutsche Zeitung München", name: "Süddeutsche Zeitung München",
@@ -65,11 +45,9 @@ db.rss_feeds.insertMany([
]) ])
``` ```
### 4. Add Subscribers ### 4. Add Yourself as Subscriber
```bash ```bash
mongosh munich_news # Still in mongo shell:
db.subscribers.insertOne({ db.subscribers.insertOne({
email: "your-email@example.com", email: "your-email@example.com",
active: true, active: true,
@@ -78,90 +56,35 @@ db.subscribers.insertOne({
}) })
``` ```
### 5. Test It ### 5. Verify Installation
```bash ```bash
# Test crawler # 1. Run the crawler manually to fetch news
docker-compose exec crawler python crawler_service.py 5 docker-compose exec crawler python crawler_service.py 5
# Test newsletter # 2. Send a test email to yourself
docker-compose exec sender python sender_service.py test your-email@example.com docker-compose exec sender python sender_service.py test your-email@example.com
``` ```
## What Happens Next? ## 🎮 Dashboard Access
The system will automatically: Once running, access the services:
- **Backend API**: Runs continuously at http://localhost:5001 for tracking and analytics - **Dashboard**: [http://localhost:3000](http://localhost:3000)
- **6:00 AM Berlin time**: Crawl news articles - **API**: [http://localhost:5001](http://localhost:5001)
- **7:00 AM Berlin time**: Send newsletter to subscribers
## View Results ## ⏭️ What's Next?
The system is now fully automated:
1. **6:00 AM**: Crawls news and generates AI summaries.
2. **7:00 AM**: Sends the daily newsletter.
### Useful Commands
```bash ```bash
# Check articles # Stop everything
mongosh munich_news
db.articles.find().sort({ crawled_at: -1 }).limit(5)
# Check logs
docker-compose logs -f crawler
docker-compose logs -f sender
```
## Common Commands
```bash
# Stop system
docker-compose down docker-compose down
# Restart system # View logs for a service
docker-compose restart docker-compose logs -f crawler
# View logs # Update code & rebuild
docker-compose logs -f
# Rebuild after changes
docker-compose up -d --build docker-compose up -d --build
``` ```
## New Features
### GPU Acceleration (5-10x Faster)
Enable GPU support for faster AI processing:
```bash
./check-gpu.sh # Check if GPU is available
./start-with-gpu.sh # Start with GPU support
```
See [docs/GPU_SETUP.md](docs/GPU_SETUP.md) for details.
### Send Newsletter to All Subscribers
```bash
# Send newsletter to all active subscribers
curl -X POST http://localhost:5001/api/admin/send-newsletter \
-H "Content-Type: application/json" \
-d '{"max_articles": 10}'
```
### Security Features
- ✅ Only Backend API exposed (port 5001)
- ✅ MongoDB internal-only (secure)
- ✅ Ollama internal-only (secure)
- ✅ All services communicate via internal Docker network
## Need Help?
- **Documentation Index**: [docs/INDEX.md](docs/INDEX.md)
- **GPU Setup**: [docs/GPU_SETUP.md](docs/GPU_SETUP.md)
- **API Reference**: [docs/ADMIN_API.md](docs/ADMIN_API.md)
- **Security Guide**: [docs/SECURITY_NOTES.md](docs/SECURITY_NOTES.md)
- **Full Documentation**: [README.md](README.md)
## Next Steps
1.**Enable GPU acceleration** - [docs/GPU_SETUP.md](docs/GPU_SETUP.md)
2. Set up tracking API (optional)
3. Customize newsletter template
4. Add more RSS feeds
5. Monitor engagement metrics
6. Review security settings - [docs/SECURITY_NOTES.md](docs/SECURITY_NOTES.md)
That's it! Your automated news system is running. 🎉

README.md

@@ -1,460 +1,193 @@
# Munich News Daily - Automated Newsletter System # Munich News Daily - Automated Newsletter System
A fully automated news aggregation and newsletter system that crawls Munich news sources, generates AI summaries, and sends daily newsletters with engagement tracking. A fully automated news aggregation system that crawls Munich news sources, generates AI-powered summaries, tracks local transport disruptions, and delivers personalized daily newsletters.
![Munich News Daily](https://via.placeholder.com/800x400?text=Munich+News+Daily+Dashboard)
## ✨ Key Features ## ✨ Key Features
- **🤖 AI-Powered Clustering** - Automatically detects duplicate stories from different sources - **🤖 AI-Powered Clustering** - Smartly detects duplicate stories and groups related articles using ChromaDB vector search.
- **📰 Neutral Summaries** - Combines multiple perspectives into balanced coverage - **📝 Neutral Summaries** - Generates balanced, multi-perspective summaries using local LLMs (Ollama).
- **🎯 Smart Prioritization** - Shows most important stories first (multi-source coverage) - **🚇 Transport Updates** - Real-time tracking of Munich public transport (MVG) disruptions.
- **🎨 Personalized Newsletters** - AI-powered content recommendations based on user interests - **🎯 Smart Prioritization** - Ranks stories based on relevance and user preferences.
- **📊 Engagement Tracking** - Open rates, click tracking, and analytics - **🎨 Personalized Newsletters** - Content recommendations tailored to each subscriber's interests.
- **⚡ GPU Acceleration** - 5-10x faster AI processing with GPU support - **📊 Engagement Analytics** - Detailed tracking of open rates, click-throughs, and user interests.
- **🔒 GDPR Compliant** - Privacy-first with data retention controls - **GPU Acceleration** - Integrated support for NVIDIA GPUs for faster AI processing.
- **🔒 Privacy First** - GDPR-compliant with automatic data retention policies and anonymization.
**🚀 NEW:** GPU acceleration support for 5-10x faster AI processing! See [docs/GPU_SETUP.md](docs/GPU_SETUP.md)
## 🚀 Quick Start ## 🚀 Quick Start
For a detailed 5-minute setup guide, see [QUICKSTART.md](QUICKSTART.md).
```bash ```bash
# 1. Configure environment # 1. Configure environment
cp backend/.env.example backend/.env cp backend/.env.example backend/.env
# Edit backend/.env with your email settings # Edit backend/.env with your email settings
# 2. Start everything # 2. Start everything (Auto-detects GPU)
docker-compose up -d ./start-with-gpu.sh
# 3. View logs # Questions?
docker-compose logs -f # See logs: docker-compose logs -f
``` ```
That's it! The system will automatically: The system will automatically:
- **Frontend**: Web interface and admin dashboard (http://localhost:3000) 1. **6:00 AM**: Crawl news & transport updates.
- **Backend API**: Runs continuously for tracking and analytics (http://localhost:5001) 2. **6:30 AM**: Generate AI summaries & clusters.
- **6:00 AM Berlin time**: Crawl news articles and generate summaries 3. **7:00 AM**: Send personalized newsletters.
- **7:00 AM Berlin time**: Send newsletter to all subscribers
### Access Points ## 📋 System Architecture
- **Newsletter Page**: http://localhost:3000 The system is built as a set of microservices orchestrated by Docker Compose.
- **Admin Dashboard**: http://localhost:3000/admin.html
- **Backend API**: http://localhost:5001
📖 **New to the project?** See [QUICKSTART.md](QUICKSTART.md) for a detailed 5-minute setup guide. ```mermaid
graph TD
🚀 **GPU Acceleration:** Enable 5-10x faster AI processing with [GPU Setup Guide](docs/GPU_SETUP.md) User[Subscribers] -->|Email| Sender[Newsletter Sender]
User -->|Web| Frontend[React Frontend]
## 📋 System Overview Frontend -->|API| Backend[Backend API]
``` subgraph "Core Services"
6:00 AM → News Crawler Crawler[News Crawler]
Transport[Transport Crawler]
Fetches articles from RSS feeds Sender
Extracts full content Backend
Generates AI summaries end
Saves to MongoDB
subgraph "Data & AI"
7:00 AM → Newsletter Sender Mongo[(MongoDB)]
Redis[(Redis)]
Waits for crawler to finish Chroma[(ChromaDB)]
Fetches today's articles Ollama[Ollama AI]
Generates newsletter with tracking end
Sends to all subscribers
Crawler -->|Save| Mongo
✅ Done! Repeat tomorrow Crawler -->|Embeddings| Chroma
Crawler -->|Summarize| Ollama
Transport -->|Save| Mongo
Sender -->|Read| Mongo
Sender -->|Track| Backend
Backend -->|Read/Write| Mongo
Backend -->|Cache| Redis
``` ```
## 🏗️ Architecture ### Core Components
### Components | Service | Description | Port |
|---------|-------------|------|
| **Frontend** | React-based user dashboard and admin interface. | 3000 |
| **Backend API** | Flask API for tracking, analytics, and management. | 5001 |
| **News Crawler** | Fetches RSS feeds, extracts content, and runs AI clustering. | - |
| **Transport Crawler** | Monitors MVG (Munich Transport) for delays and disruptions. | - |
| **Newsletter Sender** | Manages subscribers, generates templates, and sends emails. | - |
| **Ollama** | Local LLM runner for on-premise AI (Phi-3, Llama3, etc.). | - |
| **ChromaDB** | Vector database for semantic search and article clustering. | - |
- **Ollama**: AI service for summarization and translation (internal only, GPU-accelerated) ## 📂 Project Structure
- **MongoDB**: Data storage (articles, subscribers, tracking) (internal only)
- **Backend API**: Flask API for tracking and analytics (port 5001 - only exposed service)
- **News Crawler**: Automated RSS feed crawler with AI summarization (internal only)
- **Newsletter Sender**: Automated email sender with tracking (internal only)
- **Frontend**: React dashboard (optional)
### Technology Stack ```text
munich-news/
├── backend/ # Flask API for tracking & analytics
├── frontend/ # React dashboard & admin UI
├── news_crawler/ # RSS fetcher & AI summarizer service
├── news_sender/ # Email generation & dispatch service
├── transport_crawler/ # MVG transport disruption monitor
├── docker-compose.yml # Main service orchestration
└── docs/ # Detailed documentation
```
- Python 3.11 ## 🛠️ Installation & Setup
- MongoDB 7.0
- Ollama (phi3:latest model for AI)
- Docker & Docker Compose
- Flask (API)
- Schedule (automation)
- Jinja2 (email templates)
## 📦 Installation 1. **Clone the repository**
```bash
git clone https://github.com/yourusername/munich-news.git
cd munich-news
```
### Prerequisites 2. **Environment Configuration**
```bash
cp backend/.env.example backend/.env
nano backend/.env
```
*Critical settings:* `SMTP_SERVER`, `EMAIL_USER`, `EMAIL_PASSWORD`.
- Docker & Docker Compose 3. **Start the System**
- 4GB+ RAM (for Ollama AI models) ```bash
- (Optional) NVIDIA GPU for 5-10x faster AI processing # Recommended: Helper script (handles GPU & Model setup)
./start-with-gpu.sh
# Alternative: Standard Docker Compose
docker-compose up -d
```
### Setup 4. **Initial Setup (First Run)**
* The system needs to download the AI model (approx. 2GB).
1. **Clone the repository** * Watch progress: `docker-compose logs -f ollama-setup`
```bash
git clone <repository-url>
cd munich-news
```
2. **Configure environment**
```bash
cp backend/.env.example backend/.env
# Edit backend/.env with your settings
```
3. **Configure Ollama (AI features)**
```bash
# Option 1: Use integrated Docker Compose Ollama (recommended)
./configure-ollama.sh
# Select option 1
# Option 2: Use external Ollama server
# Install from https://ollama.ai/download
# Then run: ollama pull phi3:latest
```
4. **Start the system**
```bash
# Auto-detect GPU and start (recommended)
./start-with-gpu.sh
# Or start manually
docker-compose up -d
# First time: Wait for Ollama model download (2-5 minutes)
docker-compose logs -f ollama-setup
```
📖 **For detailed Ollama setup & GPU acceleration:** See [docs/OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md)
💡 **To change AI model:** Edit `OLLAMA_MODEL` in `.env`, then run `./pull-ollama-model.sh`. See [docs/CHANGING_AI_MODEL.md](docs/CHANGING_AI_MODEL.md)
## ⚙️ Configuration ## ⚙️ Configuration
Edit `backend/.env`: Key configuration options in `backend/.env`:
```env | Category | Variable | Description |
# MongoDB |----------|----------|-------------|
MONGODB_URI=mongodb://localhost:27017/ | **Email** | `SMTP_SERVER` | SMTP Server (e.g., smtp.gmail.com) |
| | `EMAIL_USER` | Your sending email address |
| **AI** | `OLLAMA_MODEL` | Model to use (default: phi3:latest) |
| **Schedule** | `CRAWLER_TIME` | Time to start crawling (e.g., "06:00") |
| | `SENDER_TIME` | Time to send emails (e.g., "07:00") |
# Email (SMTP) ## 📊 Usage & Monitoring
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Newsletter ### Access Points
NEWSLETTER_MAX_ARTICLES=10 * **Web Dashboard**: [http://localhost:3000](http://localhost:3000) (or configured domain)
NEWSLETTER_HOURS_LOOKBACK=24 * **API**: [http://localhost:5001](http://localhost:5001)
# Tracking ### Useful Commands
TRACKING_ENABLED=true
TRACKING_API_URL=http://localhost:5001
TRACKING_DATA_RETENTION_DAYS=90
# Ollama (AI Summarization) **View Logs**
OLLAMA_ENABLED=true ```bash
OLLAMA_BASE_URL=http://127.0.0.1:11434 docker-compose logs -f [service_name]
OLLAMA_MODEL=phi3:latest # e.g., docker-compose logs -f crawler
``` ```
## 📊 Usage **Manual Trigger**
### View Logs
```bash ```bash
# All services # Run News Crawler immediately
docker-compose logs -f
# Specific service
docker-compose logs -f crawler
docker-compose logs -f sender
docker-compose logs -f mongodb
```
### Manual Operations
```bash
# Run crawler manually
docker-compose exec crawler python crawler_service.py 10 docker-compose exec crawler python crawler_service.py 10
# Send test newsletter # Run Transport Crawler immediately
docker-compose exec sender python sender_service.py test your-email@example.com docker-compose exec transport-crawler python transport_service.py
# Preview newsletter # Send Test Newsletter
docker-compose exec sender python sender_service.py preview docker-compose exec sender python sender_service.py test user@example.com
``` ```
### Database Access **Database Access**
```bash ```bash
# Connect to MongoDB # Connect to MongoDB
docker-compose exec mongodb mongosh munich_news docker-compose exec mongodb mongosh munich_news
# View articles
db.articles.find().sort({ crawled_at: -1 }).limit(5).pretty()
# View subscribers
db.subscribers.find({ active: true }).pretty()
# View tracking data
db.newsletter_sends.find().sort({ created_at: -1 }).limit(10).pretty()
``` ```
## 🔧 Management ## 🌐 Production Deployment (Traefik)
### Add RSS Feeds This project is configured to work with **Traefik** as a reverse proxy.
The `docker-compose.yml` includes labels for:
- `news.dongho.kim` (Frontend)
- `news-api.dongho.kim` (Backend)
```bash To use this locally, add these to your `/etc/hosts`:
mongosh munich_news ```text
127.0.0.1 news.dongho.kim news-api.dongho.kim
db.rss_feeds.insertOne({
name: "Source Name",
url: "https://example.com/rss",
active: true
})
``` ```
### Add Subscribers For production, ensure your Traefik proxy network is named `proxy` or update the `docker-compose.yml` accordingly.
```bash
mongosh munich_news
db.subscribers.insertOne({
email: "user@example.com",
active: true,
tracking_enabled: true,
subscribed_at: new Date()
})
```
### View Analytics
```bash
# Newsletter metrics
curl http://localhost:5001/api/analytics/newsletter/2024-01-15
# Article performance
curl http://localhost:5001/api/analytics/article/https://example.com/article
# Subscriber activity
curl http://localhost:5001/api/analytics/subscriber/user@example.com
```
## ⏰ Schedule Configuration
### Change Crawler Time (default: 6:00 AM)
Edit `news_crawler/scheduled_crawler.py`:
```python
schedule.every().day.at("06:00").do(run_crawler) # Change time
```
### Change Sender Time (default: 7:00 AM)
Edit `news_sender/scheduled_sender.py`:
```python
schedule.every().day.at("07:00").do(run_sender) # Change time
```
After changes:
```bash
docker-compose up -d --build
```
## 📈 Monitoring
### Container Status
```bash
docker-compose ps
```
### Check Next Scheduled Runs
```bash
# Crawler
docker-compose logs crawler | grep "Next scheduled run"
# Sender
docker-compose logs sender | grep "Next scheduled run"
```
### Engagement Metrics
```bash
mongosh munich_news
// Open rate
var sent = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15" })
var opened = db.newsletter_sends.countDocuments({ newsletter_id: "2024-01-15", opened: true })
print("Open Rate: " + ((opened / sent) * 100).toFixed(2) + "%")
// Click rate
var clicks = db.link_clicks.countDocuments({ newsletter_id: "2024-01-15" })
print("Click Rate: " + ((clicks / sent) * 100).toFixed(2) + "%")
```
## 🐛 Troubleshooting
### Crawler Not Finding Articles
```bash
# Check RSS feeds
mongosh munich_news --eval "db.rss_feeds.find({ active: true })"
# Test manually
docker-compose exec crawler python crawler_service.py 5
```
### Newsletter Not Sending
```bash
# Check email config
docker-compose exec sender python -c "from sender_service import Config; print(Config.SMTP_SERVER)"
# Test email
docker-compose exec sender python sender_service.py test your-email@example.com
```
### Containers Not Starting
```bash
# Check logs
docker-compose logs
# Rebuild
docker-compose up -d --build
# Reset everything
docker-compose down -v
docker-compose up -d
```
## 🔐 Privacy & Compliance
### GDPR Features
- **Data Retention**: Automatic anonymization after 90 days
- **Opt-Out**: Subscribers can disable tracking
- **Data Deletion**: Full data removal on request
- **Transparency**: Privacy notice in all emails
### Privacy Endpoints
```bash
# Delete subscriber data
curl -X DELETE http://localhost:5001/api/tracking/subscriber/user@example.com
# Anonymize old data
curl -X POST http://localhost:5001/api/tracking/anonymize
# Opt out of tracking
curl -X POST http://localhost:5001/api/tracking/subscriber/user@example.com/opt-out
```
## 📚 Documentation
### Getting Started
- **[QUICKSTART.md](QUICKSTART.md)** - 5-minute setup guide
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines
### Core Features
- **[docs/AI_NEWS_AGGREGATION.md](docs/AI_NEWS_AGGREGATION.md)** - AI-powered clustering & neutral summaries
- **[docs/PERSONALIZATION.md](docs/PERSONALIZATION.md)** - Personalized newsletter system
- **[docs/PERSONALIZATION_COMPLETE.md](docs/PERSONALIZATION_COMPLETE.md)** - Personalization implementation guide
- **[docs/FEATURES.md](docs/FEATURES.md)** - Complete feature list
- **[docs/API.md](docs/API.md)** - API endpoints reference
### Technical Documentation
- **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** - System architecture
- **[docs/SETUP.md](docs/SETUP.md)** - Detailed setup guide
- **[docs/OLLAMA_SETUP.md](docs/OLLAMA_SETUP.md)** - AI/Ollama configuration
- **[docs/GPU_SETUP.md](docs/GPU_SETUP.md)** - GPU acceleration setup
- **[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)** - Production deployment
- **[docs/SECURITY.md](docs/SECURITY.md)** - Security best practices
- **[docs/REFERENCE.md](docs/REFERENCE.md)** - Complete reference
- **[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md)** - Deployment guide
- **[docs/API.md](docs/API.md)** - API reference
- **[docs/DATABASE_SCHEMA.md](docs/DATABASE_SCHEMA.md)** - Database structure
- **[docs/BACKEND_STRUCTURE.md](docs/BACKEND_STRUCTURE.md)** - Backend organization
### Component Documentation
- **[docs/CRAWLER_HOW_IT_WORKS.md](docs/CRAWLER_HOW_IT_WORKS.md)** - Crawler internals
- **[docs/EXTRACTION_STRATEGIES.md](docs/EXTRACTION_STRATEGIES.md)** - Content extraction
- **[docs/RSS_URL_EXTRACTION.md](docs/RSS_URL_EXTRACTION.md)** - RSS parsing
## 🧪 Testing
All test files are organized in the `tests/` directory:
```bash
# Run crawler tests
docker-compose exec crawler python tests/crawler/test_crawler.py
# Run sender tests
docker-compose exec sender python tests/sender/test_tracking_integration.py
# Run backend tests
docker-compose exec backend python tests/backend/test_tracking.py
# Test personalization system (all 4 phases)
docker exec munich-news-local-backend python test_personalization_system.py
```
## 🚀 Production Deployment
### Environment Setup
1. Update `backend/.env` with production values
2. Set strong MongoDB password
3. Use HTTPS for tracking URLs
4. Configure proper SMTP server
### Security
```bash
# Use production compose file
docker-compose -f docker-compose.prod.yml up -d
# Set MongoDB password
export MONGO_PASSWORD=your-secure-password
```
### Monitoring
- Set up log rotation
- Configure health checks
- Set up alerts for failures
- Monitor database size
## 📚 Documentation
Complete documentation available in the [docs/](docs/) directory:
- **[Documentation Index](docs/INDEX.md)** - Complete documentation guide
- **[GPU Setup](docs/GPU_SETUP.md)** - 5-10x faster with GPU acceleration
- **[Admin API](docs/ADMIN_API.md)** - API endpoints reference
- **[Security Guide](docs/SECURITY_NOTES.md)** - Security best practices
- **[System Architecture](docs/SYSTEM_ARCHITECTURE.md)** - Technical overview
## 📝 License
[Your License Here]
## 🤝 Contributing ## 🤝 Contributing
Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first. We welcome contributions! Please check [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## 📧 Support ## 📄 License
For issues or questions, please open a GitHub issue. MIT License - see [LICENSE](LICENSE) for details.
---
**Built with ❤️ for Munich News Daily**


@@ -13,6 +13,7 @@ from routes.admin_routes import admin_bp
from routes.transport_routes import transport_bp from routes.transport_routes import transport_bp
from routes.interests_routes import interests_bp from routes.interests_routes import interests_bp
from routes.personalization_routes import personalization_bp from routes.personalization_routes import personalization_bp
from routes.search_routes import search_bp
# Initialize Flask app # Initialize Flask app
app = Flask(__name__) app = Flask(__name__)
@@ -33,6 +34,7 @@ app.register_blueprint(admin_bp)
app.register_blueprint(transport_bp) app.register_blueprint(transport_bp)
app.register_blueprint(interests_bp) app.register_blueprint(interests_bp)
app.register_blueprint(personalization_bp) app.register_blueprint(personalization_bp)
app.register_blueprint(search_bp)
# Health check endpoint # Health check endpoint
@app.route('/health') @app.route('/health')
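Review note: a quick way to confirm the new blueprint is actually wired up is Flask's built-in test client. A minimal sketch, not part of the repo's test suite, and it assumes the module-level ChromaClient in search_routes can be constructed in the test environment:

```python
from app import app


def test_search_requires_query():
    # Importing app registers all blueprints, including search_bp
    client = app.test_client()
    # /api/search without ?q= should be rejected with a 400 by the new route
    assert client.get('/api/search').status_code == 400
```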


@@ -87,7 +87,8 @@ class ChromaClient:
# Prepare text for embedding (Title + Summary + Start of Content) # Prepare text for embedding (Title + Summary + Start of Content)
# This gives semantic search a good overview # This gives semantic search a good overview
title = article.get('title', '') # Use English title if available, otherwise original
title = article.get('title_en') if article.get('title_en') else article.get('title', '')
summary = article.get('summary') or '' summary = article.get('summary') or ''
content_snippet = article.get('content', '')[:1000] content_snippet = article.get('content', '')[:1000]


@@ -45,6 +45,11 @@ class Config:
TRACKING_API_URL = os.getenv('TRACKING_API_URL', f'http://localhost:{os.getenv("FLASK_PORT", "5000")}') TRACKING_API_URL = os.getenv('TRACKING_API_URL', f'http://localhost:{os.getenv("FLASK_PORT", "5000")}')
TRACKING_DATA_RETENTION_DAYS = int(os.getenv('TRACKING_DATA_RETENTION_DAYS', '90')) TRACKING_DATA_RETENTION_DAYS = int(os.getenv('TRACKING_DATA_RETENTION_DAYS', '90'))
# ChromaDB
CHROMA_HOST = os.getenv('CHROMA_HOST', 'chromadb')
CHROMA_PORT = int(os.getenv('CHROMA_PORT', '8000'))
CHROMA_COLLECTION = os.getenv('CHROMA_COLLECTION', 'munich_news_articles')
@classmethod @classmethod
def print_config(cls): def print_config(cls):
"""Print configuration (without sensitive data)""" """Print configuration (without sensitive data)"""
@@ -57,3 +62,5 @@ class Config:
print(f" Ollama Enabled: {cls.OLLAMA_ENABLED}") print(f" Ollama Enabled: {cls.OLLAMA_ENABLED}")
print(f" Tracking Enabled: {cls.TRACKING_ENABLED}") print(f" Tracking Enabled: {cls.TRACKING_ENABLED}")
print(f" Tracking API URL: {cls.TRACKING_API_URL}") print(f" Tracking API URL: {cls.TRACKING_API_URL}")
print(f" ChromaDB Host: {cls.CHROMA_HOST}")
print(f" ChromaDB Port: {cls.CHROMA_PORT}")


@@ -8,3 +8,4 @@ Jinja2==3.1.2
redis==5.0.1 redis==5.0.1
chromadb>=0.4.0 chromadb>=0.4.0
sentence-transformers>=2.2.2


@@ -24,8 +24,11 @@ def get_news():
db_articles = [] db_articles = []
for doc in cursor: for doc in cursor:
# Use English title if available, otherwise fallback to original
title = doc.get('title_en') if doc.get('title_en') else doc.get('title', '')
article = { article = {
'title': doc.get('title', ''), 'title': title,
'author': doc.get('author'), 'author': doc.get('author'),
'link': doc.get('link', ''), 'link': doc.get('link', ''),
'source': doc.get('source', ''), 'source': doc.get('source', ''),
@@ -114,8 +117,10 @@ def get_clustered_news_internal():
# Use cluster_articles from aggregation (already fetched) # Use cluster_articles from aggregation (already fetched)
cluster_articles = doc.get('cluster_articles', []) cluster_articles = doc.get('cluster_articles', [])
title = doc.get('title_en') if doc.get('title_en') else doc.get('title', '')
article = { article = {
'title': doc.get('title', ''), 'title': title,
'link': doc.get('link', ''), 'link': doc.get('link', ''),
'source': doc.get('source', ''), 'source': doc.get('source', ''),
'published': doc.get('published_at', ''), 'published': doc.get('published_at', ''),
@@ -173,7 +178,7 @@ def get_article_by_url(article_url):
return jsonify({'error': 'Article not found'}), 404 return jsonify({'error': 'Article not found'}), 404
return jsonify({ return jsonify({
'title': article.get('title', ''), 'title': article.get('title_en') if article.get('title_en') else article.get('title', ''),
'author': article.get('author'), 'author': article.get('author'),
'link': article.get('link', ''), 'link': article.get('link', ''),
'content': article.get('content', ''), 'content': article.get('content', ''),
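Review note: the `title_en` fallback is now repeated in three places in this file. If it keeps spreading, a tiny helper could centralize it; a sketch (the name `display_title` is illustrative, not existing code):

```python
def display_title(doc):
    """Prefer the English title when present, otherwise fall back to the original."""
    return doc.get('title_en') or doc.get('title', '')


# e.g. 'title': display_title(doc) in each response dict
```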


@@ -0,0 +1,88 @@
from flask import Blueprint, jsonify, request
from config import Config
from chroma_client import ChromaClient
import logging

search_bp = Blueprint('search', __name__)

# Initialize ChromaDB client
# Note: We use the hostname 'chromadb' as defined in docker-compose for the backend
chroma_client = ChromaClient(
    host=Config.CHROMA_HOST,
    port=Config.CHROMA_PORT,
    collection_name=Config.CHROMA_COLLECTION
)


@search_bp.route('/api/search', methods=['GET'])
def search_news():
    """
    Semantic search for news articles using ChromaDB.

    Query parameters:
    - q: Search query (required)
    - limit: Number of results (default: 10)
    - category: Filter by category (optional)
    """
    try:
        query = request.args.get('q')
        if not query:
            return jsonify({'error': 'Missing search query'}), 400

        limit = int(request.args.get('limit', 10))
        category = request.args.get('category')

        # Build filter if category provided
        where_filter = None
        if category:
            where_filter = {"category": category}

        # Perform search
        results = chroma_client.search(
            query_text=query,
            n_results=limit,
            where=where_filter
        )

        # Format for frontend
        formatted_response = []
        for item in results:
            metadata = item.get('metadata', {})

            # Chroma metadata is flat and currently stores only title, url, source,
            # category and published_at (see chroma_client.py); title_en is not stored yet.
            # Use the stored title for now -- once ChromaClient writes the English title
            # into metadata, search results will show English automatically.
            title = metadata.get('title', 'Unknown Title')

            formatted_response.append({
                'title': title,
                'link': metadata.get('url', ''),
                'source': metadata.get('source', 'Unknown'),
                'category': metadata.get('category', 'general'),
                'published_at': metadata.get('published_at', ''),
                'relevance_score': 1.0 - item.get('distance', 1.0),  # Convert distance to score (approx)
                'snippet': item.get('document', '')[:200] + '...'  # Preview
            })

        return jsonify({
            'query': query,
            'count': len(formatted_response),
            'results': formatted_response
        }), 200

    except Exception as e:
        logging.error(f"Search error: {str(e)}")
        return jsonify({'error': str(e)}), 500
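Once the backend is up, the endpoint can be exercised directly; a sketch using the `requests` package (not a project dependency, used here only for illustration) against the exposed port 5001:

```python
import requests

# Query the semantic search endpoint added by search_routes.py
resp = requests.get(
    "http://localhost:5001/api/search",
    params={"q": "Munich transport disruptions", "limit": 5},
    timeout=10,
)
resp.raise_for_status()
data = resp.json()

print(f"{data['count']} results for {data['query']!r}")
for hit in data["results"]:
    print(f"  {hit['relevance_score']:.2f}  {hit['title']}  ({hit['source']})")
```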


@@ -1,20 +1,3 @@
# Munich News Daily - Docker Compose Configuration
#
# GPU Support:
# To enable GPU acceleration for Ollama (5-10x faster):
# 1. Check GPU availability: ./check-gpu.sh
# 2. Start with GPU: ./start-with-gpu.sh
# Or manually: docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
#
# Security:
# - Only Backend API (port 5001) is exposed to host
# - MongoDB is internal-only (not exposed to host)
# - Ollama is internal-only (not exposed to host)
# - Crawler and Sender are internal-only
# All services communicate via internal Docker network
#
# See docs/OLLAMA_SETUP.md for detailed setup instructions
services: services:
# Ollama AI Service (Internal only - not exposed to host) # Ollama AI Service (Internal only - not exposed to host)
ollama: ollama:
@@ -29,14 +12,6 @@ services:
dns: dns:
- 8.8.8.8 - 8.8.8.8
- 1.1.1.1 - 1.1.1.1
# GPU support (uncomment if you have NVIDIA GPU)
# deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# count: all
# capabilities: [gpu]
healthcheck: healthcheck:
test: [ "CMD-SHELL", "ollama list || exit 1" ] test: [ "CMD-SHELL", "ollama list || exit 1" ]
interval: 30s interval: 30s


@@ -19,10 +19,10 @@ async function loadCategories() {
const response = await fetch('/api/categories'); const response = await fetch('/api/categories');
const data = await response.json(); const data = await response.json();
const categories = data.categories || []; const categories = data.categories || [];
const container = document.getElementById('categoryCheckboxes'); const container = document.getElementById('categoryCheckboxes');
container.innerHTML = ''; container.innerHTML = '';
categories.forEach(category => { categories.forEach(category => {
const label = document.createElement('label'); const label = document.createElement('label');
label.className = 'flex items-center space-x-3 cursor-pointer'; label.className = 'flex items-center space-x-3 cursor-pointer';
@@ -40,11 +40,11 @@ async function loadCategories() {
async function loadNews() { async function loadNews() {
const newsGrid = document.getElementById('newsGrid'); const newsGrid = document.getElementById('newsGrid');
newsGrid.innerHTML = '<div class="text-center py-10 text-gray-500">Loading news...</div>'; newsGrid.innerHTML = '<div class="text-center py-10 text-gray-500">Loading news...</div>';
try { try {
const response = await fetch('/api/news'); const response = await fetch('/api/news');
const data = await response.json(); const data = await response.json();
if (data.articles && data.articles.length > 0) { if (data.articles && data.articles.length > 0) {
allArticles = data.articles; allArticles = data.articles;
filteredArticles = data.articles; filteredArticles = data.articles;
@@ -63,24 +63,24 @@ async function loadNews() {
function loadMoreArticles() { function loadMoreArticles() {
if (isLoading || displayedCount >= filteredArticles.length) return; if (isLoading || displayedCount >= filteredArticles.length) return;
isLoading = true; isLoading = true;
const newsGrid = document.getElementById('newsGrid'); const newsGrid = document.getElementById('newsGrid');
// Remove loading indicator if exists // Remove loading indicator if exists
const loadingIndicator = document.getElementById('loadingIndicator'); const loadingIndicator = document.getElementById('loadingIndicator');
if (loadingIndicator) loadingIndicator.remove(); if (loadingIndicator) loadingIndicator.remove();
// Get next batch of articles // Get next batch of articles
const nextBatch = filteredArticles.slice(displayedCount, displayedCount + ARTICLES_PER_PAGE); const nextBatch = filteredArticles.slice(displayedCount, displayedCount + ARTICLES_PER_PAGE);
nextBatch.forEach((article, index) => { nextBatch.forEach((article, index) => {
const card = createNewsCard(article, displayedCount + index); const card = createNewsCard(article, displayedCount + index);
newsGrid.appendChild(card); newsGrid.appendChild(card);
}); });
displayedCount += nextBatch.length; displayedCount += nextBatch.length;
// Add loading indicator if more articles available // Add loading indicator if more articles available
if (displayedCount < filteredArticles.length) { if (displayedCount < filteredArticles.length) {
const loader = document.createElement('div'); const loader = document.createElement('div');
@@ -95,17 +95,17 @@ function loadMoreArticles() {
endMessage.textContent = `✓ All ${filteredArticles.length} articles loaded`; endMessage.textContent = `✓ All ${filteredArticles.length} articles loaded`;
newsGrid.appendChild(endMessage); newsGrid.appendChild(endMessage);
} }
isLoading = false; isLoading = false;
} }
function setupInfiniteScroll() { function setupInfiniteScroll() {
window.addEventListener('scroll', () => { window.addEventListener('scroll', () => {
if (isLoading || displayedCount >= filteredArticles.length) return; if (isLoading || displayedCount >= filteredArticles.length) return;
const scrollPosition = window.innerHeight + window.scrollY; const scrollPosition = window.innerHeight + window.scrollY;
const threshold = document.documentElement.scrollHeight - 500; const threshold = document.documentElement.scrollHeight - 500;
if (scrollPosition >= threshold) { if (scrollPosition >= threshold) {
loadMoreArticles(); loadMoreArticles();
} }
@@ -113,53 +113,85 @@ function setupInfiniteScroll() {
} }
// Search functionality // Search functionality
function handleSearch() { let searchTimeout;
async function handleSearch() {
const searchInput = document.getElementById('searchInput'); const searchInput = document.getElementById('searchInput');
const clearBtn = document.getElementById('clearSearch'); const clearBtn = document.getElementById('clearSearch');
searchQuery = searchInput.value.trim().toLowerCase(); const searchStats = document.getElementById('searchStats');
const newsGrid = document.getElementById('newsGrid');
searchQuery = searchInput.value.trim();
// Show/hide clear button // Show/hide clear button
if (searchQuery) { if (searchQuery) {
clearBtn.classList.remove('hidden'); clearBtn.classList.remove('hidden');
} else { } else {
clearBtn.classList.add('hidden'); clearBtn.classList.add('hidden');
} }
// Filter articles // Clear previous timeout
if (searchTimeout) clearTimeout(searchTimeout);
// If empty query, reset to all articles
if (searchQuery === '') { if (searchQuery === '') {
filteredArticles = allArticles; filteredArticles = allArticles;
} else { displayedCount = 0;
filteredArticles = allArticles.filter(article => { newsGrid.innerHTML = '';
const title = article.title.toLowerCase(); updateSearchStats();
const summary = (article.summary || '').toLowerCase().replace(/<[^>]*>/g, '');
const source = formatSourceName(article.source).toLowerCase();
return title.includes(searchQuery) ||
summary.includes(searchQuery) ||
source.includes(searchQuery);
});
}
// Reset display
displayedCount = 0;
const newsGrid = document.getElementById('newsGrid');
newsGrid.innerHTML = '';
// Update stats
updateSearchStats();
// Load filtered articles
if (filteredArticles.length > 0) {
loadMoreArticles(); loadMoreArticles();
} else { return;
newsGrid.innerHTML = `
<div class="text-center py-16">
<div class="text-6xl mb-4">🔍</div>
<p class="text-xl text-gray-600 mb-2">No articles found</p>
<p class="text-gray-400">Try a different search term</p>
</div>
`;
} }
// Debounce search API call
searchTimeout = setTimeout(async () => {
// Show searching state
newsGrid.innerHTML = '<div class="text-center py-10 text-gray-500">Searching...</div>';
try {
const response = await fetch(`/api/search?q=${encodeURIComponent(searchQuery)}&limit=20`);
// Check if response is ok
if (!response.ok) {
const errorText = await response.text();
throw new Error(`Server returned ${response.status}: ${errorText}`);
}
const data = await response.json();
if (data.results && data.results.length > 0) {
// Map results to match card format
filteredArticles = data.results.map(item => ({
title: item.title,
link: item.link,
source: item.source,
summary: item.snippet, // Map snippet to summary
published_at: item.published_at,
score: item.relevance_score
}));
displayedCount = 0;
newsGrid.innerHTML = '';
// Update stats
searchStats.textContent = `Found ${filteredArticles.length} relevant articles`;
loadMoreArticles();
} else {
newsGrid.innerHTML = `
<div class="text-center py-16">
<div class="text-6xl mb-4">🔍</div>
<p class="text-xl text-gray-600 mb-2">No relevant articles found</p>
<p class="text-gray-400">Try different keywords or concepts</p>
</div>
`;
searchStats.textContent = 'No results found';
}
} catch (error) {
console.error('Search failed:', error);
newsGrid.innerHTML = `<div class="text-center py-10 text-red-400">Search failed: ${error.message}</div>`;
}
}, 500); // 500ms debounce
} }
function clearSearch() { function clearSearch() {
@@ -182,11 +214,11 @@ function createNewsCard(article, index) {
const card = document.createElement('div'); const card = document.createElement('div');
card.className = 'group bg-white rounded-xl overflow-hidden shadow-md hover:shadow-xl transition-all duration-300 cursor-pointer border border-gray-100 hover:border-primary/30'; card.className = 'group bg-white rounded-xl overflow-hidden shadow-md hover:shadow-xl transition-all duration-300 cursor-pointer border border-gray-100 hover:border-primary/30';
card.onclick = () => window.open(article.link, '_blank'); card.onclick = () => window.open(article.link, '_blank');
// Extract image from summary if it's an img tag (from Süddeutsche) // Extract image from summary if it's an img tag (from Süddeutsche)
let imageUrl = null; let imageUrl = null;
let cleanSummary = article.summary || 'No summary available.'; let cleanSummary = article.summary || 'No summary available.';
if (cleanSummary.includes('<img')) { if (cleanSummary.includes('<img')) {
const imgMatch = cleanSummary.match(/src="([^"]+)"/); const imgMatch = cleanSummary.match(/src="([^"]+)"/);
if (imgMatch) { if (imgMatch) {
@@ -195,17 +227,17 @@ function createNewsCard(article, index) {
// Remove img tag from summary // Remove img tag from summary
cleanSummary = cleanSummary.replace(/<img[^>]*>/g, '').replace(/<\/?p>/g, '').trim(); cleanSummary = cleanSummary.replace(/<img[^>]*>/g, '').replace(/<\/?p>/g, '').trim();
} }
// Get source icon/emoji // Get source icon/emoji
const sourceIcon = getSourceIcon(article.source); const sourceIcon = getSourceIcon(article.source);
// Format source name // Format source name
const sourceName = formatSourceName(article.source); const sourceName = formatSourceName(article.source);
// Get word count badge // Get word count badge
const wordCount = article.word_count || article.summary_word_count; const wordCount = article.word_count || article.summary_word_count;
const readTime = wordCount ? Math.ceil(wordCount / 200) : null; const readTime = wordCount ? Math.ceil(wordCount / 200) : null;
card.innerHTML = ` card.innerHTML = `
<div class="flex flex-col sm:flex-row"> <div class="flex flex-col sm:flex-row">
<!-- Image --> <!-- Image -->
@@ -237,11 +269,11 @@ function createNewsCard(article, index) {
</div> </div>
</div> </div>
`; `;
// Add staggered animation // Add staggered animation
card.style.opacity = '0'; card.style.opacity = '0';
card.style.animation = `fadeIn 0.5s ease-out ${(index % ARTICLES_PER_PAGE) * 0.1}s forwards`; card.style.animation = `fadeIn 0.5s ease-out ${(index % ARTICLES_PER_PAGE) * 0.1}s forwards`;
return card; return card;
} }
@@ -293,7 +325,7 @@ async function loadStats() {
try { try {
const response = await fetch('/api/stats'); const response = await fetch('/api/stats');
const data = await response.json(); const data = await response.json();
if (data.subscribers !== undefined) { if (data.subscribers !== undefined) {
document.getElementById('subscriberCount').textContent = data.subscribers.toLocaleString(); document.getElementById('subscriberCount').textContent = data.subscribers.toLocaleString();
} }
@@ -306,44 +338,44 @@ async function subscribe() {
const emailInput = document.getElementById('emailInput'); const emailInput = document.getElementById('emailInput');
const subscribeBtn = document.getElementById('subscribeBtn'); const subscribeBtn = document.getElementById('subscribeBtn');
const formMessage = document.getElementById('formMessage'); const formMessage = document.getElementById('formMessage');
const email = emailInput.value.trim(); const email = emailInput.value.trim();
if (!email || !email.includes('@')) { if (!email || !email.includes('@')) {
formMessage.textContent = 'Please enter a valid email address'; formMessage.textContent = 'Please enter a valid email address';
formMessage.className = 'text-red-200 font-medium'; formMessage.className = 'text-red-200 font-medium';
return; return;
} }
// Get selected categories // Get selected categories
const checkboxes = document.querySelectorAll('#categoryCheckboxes input[type="checkbox"]:checked'); const checkboxes = document.querySelectorAll('#categoryCheckboxes input[type="checkbox"]:checked');
const categories = Array.from(checkboxes).map(cb => cb.value); const categories = Array.from(checkboxes).map(cb => cb.value);
if (categories.length === 0) { if (categories.length === 0) {
formMessage.textContent = 'Please select at least one category'; formMessage.textContent = 'Please select at least one category';
formMessage.className = 'text-red-200 font-medium'; formMessage.className = 'text-red-200 font-medium';
return; return;
} }
subscribeBtn.disabled = true; subscribeBtn.disabled = true;
subscribeBtn.textContent = 'Subscribing...'; subscribeBtn.textContent = 'Subscribing...';
subscribeBtn.classList.add('opacity-75', 'cursor-not-allowed'); subscribeBtn.classList.add('opacity-75', 'cursor-not-allowed');
formMessage.textContent = ''; formMessage.textContent = '';
try { try {
const response = await fetch('/api/subscribe', { const response = await fetch('/api/subscribe', {
method: 'POST', method: 'POST',
headers: { headers: {
'Content-Type': 'application/json' 'Content-Type': 'application/json'
}, },
body: JSON.stringify({ body: JSON.stringify({
email: email, email: email,
categories: categories categories: categories
}) })
}); });
const data = await response.json(); const data = await response.json();
if (response.ok) { if (response.ok) {
formMessage.textContent = data.message || 'Successfully subscribed! Check your email for confirmation.'; formMessage.textContent = data.message || 'Successfully subscribed! Check your email for confirmation.';
formMessage.className = 'text-green-200 font-medium'; formMessage.className = 'text-green-200 font-medium';
@@ -384,15 +416,15 @@ function closeUnsubscribe() {
async function unsubscribe() { async function unsubscribe() {
const emailInput = document.getElementById('unsubscribeEmail'); const emailInput = document.getElementById('unsubscribeEmail');
const unsubscribeMessage = document.getElementById('unsubscribeMessage'); const unsubscribeMessage = document.getElementById('unsubscribeMessage');
const email = emailInput.value.trim(); const email = emailInput.value.trim();
if (!email || !email.includes('@')) { if (!email || !email.includes('@')) {
unsubscribeMessage.textContent = 'Please enter a valid email address'; unsubscribeMessage.textContent = 'Please enter a valid email address';
unsubscribeMessage.className = 'text-red-600 font-medium'; unsubscribeMessage.className = 'text-red-600 font-medium';
return; return;
} }
try { try {
const response = await fetch('/api/unsubscribe', { const response = await fetch('/api/unsubscribe', {
method: 'POST', method: 'POST',
@@ -401,9 +433,9 @@ async function unsubscribe() {
}, },
body: JSON.stringify({ email: email }) body: JSON.stringify({ email: email })
}); });
const data = await response.json(); const data = await response.json();
if (response.ok) { if (response.ok) {
unsubscribeMessage.textContent = data.message || 'Successfully unsubscribed.'; unsubscribeMessage.textContent = data.message || 'Successfully unsubscribed.';
unsubscribeMessage.className = 'text-green-600 font-medium'; unsubscribeMessage.className = 'text-green-600 font-medium';
@@ -423,7 +455,7 @@ async function unsubscribe() {
} }
// Close modal when clicking outside // Close modal when clicking outside
window.onclick = function(event) { window.onclick = function (event) {
const modal = document.getElementById('unsubscribeModal'); const modal = document.getElementById('unsubscribeModal');
if (event.target === modal) { if (event.target === modal) {
closeUnsubscribe(); closeUnsubscribe();


@@ -204,6 +204,31 @@ app.get('/api/ollama/config', async (req, res) => {
} }
}); });
app.get('/api/search', async (req, res) => {
    try {
        const { q, limit, category } = req.query;
        const response = await axios.get(`${API_URL}/api/search`, {
            params: { q, limit, category }
        });
        res.json(response.data);
    } catch (error) {
        if (error.response) {
            // The request was made and the server responded with a status code
            // that falls out of the range of 2xx
            console.error('Search API Error:', error.response.status, error.response.data);
            res.status(error.response.status).json(error.response.data);
        } else if (error.request) {
            // The request was made but no response was received
            console.error('Search API No Response:', error.request);
            res.status(502).json({ error: 'Search service unavailable (timeout/connection)' });
        } else {
            // Something happened in setting up the request that triggered an Error
            console.error('Search API Request Error:', error.message);
            res.status(500).json({ error: 'Internal proxy error' });
        }
    }
});
app.listen(PORT, () => { app.listen(PORT, () => {
console.log(`Frontend server running on http://localhost:${PORT}`); console.log(`Frontend server running on http://localhost:${PORT}`);
console.log(`Admin dashboard: http://localhost:${PORT}/admin.html`); console.log(`Admin dashboard: http://localhost:${PORT}/admin.html`);


@@ -87,7 +87,8 @@ class ChromaClient:
# Prepare text for embedding (Title + Summary + Start of Content) # Prepare text for embedding (Title + Summary + Start of Content)
# This gives semantic search a good overview # This gives semantic search a good overview
title = article.get('title', '') # Use English title if available, otherwise original
title = article.get('title_en') if article.get('title_en') else article.get('title', '')
summary = article.get('summary') or '' summary = article.get('summary') or ''
content_snippet = article.get('content', '')[:1000] content_snippet = article.get('content', '')[:1000]
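Review note: following the comment in search_routes.py, the metadata written here could also carry the English title so search results render in English. A hedged sketch; the actual metadata-building code in `add_articles` is not shown in this diff, so the surrounding handling is illustrative only:

```python
# Inside the metadata-building step of add_articles (illustrative sketch):
metadata = {
    'title': article.get('title_en') or article.get('title', ''),  # prefer English title
    'url': article.get('link', ''),
    'source': article.get('source', ''),
    'category': article.get('category', 'general'),
    'published_at': str(article.get('published_at', '')),
}
```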


@@ -340,7 +340,11 @@ def crawl_rss_feed(feed_url, feed_name, feed_category='general', max_articles=10
if not feed.entries: if not feed.entries:
print(f" ⚠ No entries found in feed") print(f" ⚠ No entries found in feed")
return 0 return {
'crawled': 0,
'summarized': 0,
'failed_summaries': 0
}
crawled_count = 0 crawled_count = 0
summarized_count = 0 summarized_count = 0
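Since `crawl_rss_feed` now returns a stats dict instead of a bare count on the empty-feed path, callers can aggregate per-feed results, assuming the success path returns the same dict. A sketch; the `feeds` iterable and the `category` lookup are illustrative, the field names mirror the return value above:

```python
# Aggregate the per-feed stats returned by crawl_rss_feed
totals = {'crawled': 0, 'summarized': 0, 'failed_summaries': 0}

for feed in feeds:  # e.g. documents from db.rss_feeds.find({'active': True})
    stats = crawl_rss_feed(feed['url'], feed['name'],
                           feed.get('category', 'general'), max_articles=10)
    for key in totals:
        totals[key] += stats.get(key, 0)

print(f"Crawled {totals['crawled']} articles, "
      f"{totals['summarized']} summarized, "
      f"{totals['failed_summaries']} summaries failed")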


@@ -37,12 +37,12 @@ def main():
"""Main scheduler loop""" """Main scheduler loop"""
print("🤖 Munich News Crawler Scheduler") print("🤖 Munich News Crawler Scheduler")
print("="*60) print("="*60)
print("Schedule: Daily at 6:00 AM Berlin time") print("Schedule: Every 3 hours")
print("Timezone: Europe/Berlin (CET/CEST)") print("Timezone: Europe/Berlin (CET/CEST)")
print("="*60) print("="*60)
# Schedule the crawler to run at 6 AM Berlin time # Schedule the crawler to run every 3 hours
schedule.every().day.at("06:00").do(run_crawler) schedule.every(3).hours.do(run_crawler)
# Show next run time # Show next run time
berlin_time = datetime.now(BERLIN_TZ) berlin_time = datetime.now(BERLIN_TZ)
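For context, a generic sketch of the `schedule` loop that drives this change (the repo's actual `main()` may differ; `BERLIN_TZ` is assumed to come from pytz, and the crawler invocation is elided):

```python
import time
from datetime import datetime

import pytz
import schedule

BERLIN_TZ = pytz.timezone('Europe/Berlin')


def run_crawler():
    print(f"Crawl started at {datetime.now(BERLIN_TZ)}")
    # ... invoke the crawler service here ...


# Run every 3 hours instead of once per day at 06:00
schedule.every(3).hours.do(run_crawler)

while True:
    schedule.run_pending()
    time.sleep(60)  # check the schedule once a minute
```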