Munich-news/docs/CHANGELOG.md
2025-11-11 17:58:12 +01:00

Changelog

[Unreleased] - 2024-11-10

Added - Major Refactoring

Backend Modularization

  • Restructured backend into modular architecture
  • Created separate route blueprints:
    • subscription_routes.py - User subscriptions
    • news_routes.py - News fetching and stats
    • rss_routes.py - RSS feed management (CRUD)
    • ollama_routes.py - AI integration
  • Created service layer:
    • news_service.py - News fetching logic
    • email_service.py - Newsletter sending
    • ollama_service.py - AI communication
  • Centralized configuration in config.py
  • Separated database logic in database.py
  • Reduced main app.py from 700+ lines to 27 lines

RSS Feed Management

  • Dynamic RSS feed management via API
  • Add/remove/list/toggle RSS feeds without code changes
  • Unique index on RSS feed URLs (prevents duplicates)
  • Default feeds auto-initialized on first run
  • Created fix_duplicates.py utility script
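The add/remove/list/toggle behaviour and the unique-URL rule can be illustrated with a minimal in-memory sketch. This is not the project's code: `FeedRegistry`, `Feed`, and `DuplicateFeedError` are hypothetical names, and a plain dictionary stands in for the MongoDB collection and its unique index.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class DuplicateFeedError(ValueError):
    """Raised when a feed URL is already registered."""


@dataclass
class Feed:
    name: str
    url: str
    active: bool = True
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class FeedRegistry:
    """In-memory stand-in for the RSS feeds collection (hypothetical)."""

    def __init__(self):
        self._feeds = {}  # keyed by URL, which mimics a unique index on url

    def add(self, name, url):
        if url in self._feeds:
            raise DuplicateFeedError(url)  # duplicates are rejected, not overwritten
        feed = Feed(name=name, url=url)
        self._feeds[url] = feed
        return feed

    def remove(self, url):
        """Return True if a feed was removed, False if the URL was unknown."""
        return self._feeds.pop(url, None) is not None

    def toggle(self, url):
        """Flip the active flag and return the new value."""
        feed = self._feeds[url]
        feed.active = not feed.active
        return feed.active

    def list_feeds(self, active_only=False):
        return [f for f in self._feeds.values() if f.active or not active_only]
```

In the real backend the uniqueness check would presumably be enforced by the database index rather than by application code, so a duplicate insert fails even under concurrent requests.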

News Crawler Microservice

  • Created standalone news_crawler/ microservice
  • Web scraping with BeautifulSoup
  • Smart content extraction using multiple selectors
  • Full article content storage in MongoDB
  • Word count calculation
  • Duplicate prevention (skips already-crawled articles)
  • Rate limiting (1 second between requests)
  • Can run independently or on a schedule
  • Docker support for crawler
  • Comprehensive documentation
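The duplicate prevention and rate limiting listed above could be combined in a loop along the following lines. This is a sketch under assumptions, not the code in crawler_service.py: `crawl`, `fetch`, and `already_crawled` are hypothetical names, and `fetch` stands in for the real download-and-extract step.

```python
import time


def crawl(urls, fetch, already_crawled, delay_seconds=1.0, sleep=time.sleep):
    """Fetch each new URL once, pausing between consecutive requests.

    `fetch` is a hypothetical callable returning the article text for a URL;
    `already_crawled` is a set of URLs whose content is already stored.
    """
    results = []
    first_request = True
    for url in urls:
        if url in already_crawled:
            continue  # duplicate prevention: skip articles crawled earlier
        if not first_request:
            sleep(delay_seconds)  # rate limiting: wait before the next request
        first_request = False
        results.append((url, fetch(url)))
        already_crawled.add(url)
    return results
```

Passing `sleep` as a parameter keeps the loop testable without real delays; the default of 1.0 matches the one-second interval stated above.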

API Endpoints

New endpoints added:

  • GET /api/rss-feeds - List all RSS feeds
  • POST /api/rss-feeds - Add new RSS feed
  • DELETE /api/rss-feeds/<id> - Remove RSS feed
  • PATCH /api/rss-feeds/<id>/toggle - Toggle feed active status
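As an illustration, the four endpoints could be called from Python roughly as follows. The base URL (localhost on the default port 5001) and the JSON body of the POST request (`name` and `url`) are assumptions; the changelog does not document the request format.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5001"  # assumption: backend reachable on the default port


def list_feeds_request():
    return Request(f"{BASE_URL}/api/rss-feeds", method="GET")


def add_feed_request(name, url):
    # Assumption: the endpoint accepts a JSON body with `name` and `url`.
    body = json.dumps({"name": name, "url": url}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/rss-feeds",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def remove_feed_request(feed_id):
    return Request(f"{BASE_URL}/api/rss-feeds/{feed_id}", method="DELETE")


def toggle_feed_request(feed_id):
    return Request(f"{BASE_URL}/api/rss-feeds/{feed_id}/toggle", method="PATCH")
```

Each helper only builds the request; passing the result to `urllib.request.urlopen` sends it to a running backend.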

Documentation

  • Created ARCHITECTURE.md - System architecture overview
  • Created backend/STRUCTURE.md - Backend structure guide
  • Created news_crawler/README.md - Crawler documentation
  • Created news_crawler/QUICKSTART.md - Quick start guide
  • Created news_crawler/test_crawler.py - Test suite
  • Updated main README.md with new features
  • Updated DATABASE_SCHEMA.md with new fields

Configuration

  • Added FLASK_PORT environment variable
  • Fixed OLLAMA_MODEL typo in .env
  • Default port set to 5001 to avoid a conflict with macOS AirPlay Receiver on port 5000
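A minimal sketch of how the port might be read from the environment, assuming the value is parsed in config.py. The helper name `get_flask_port` and the range check are illustrative additions, not taken from the project.

```python
import os

DEFAULT_FLASK_PORT = 5001  # default stated in this changelog


def get_flask_port(environ=os.environ):
    """Return the port from FLASK_PORT, falling back to 5001 when unset."""
    raw = environ.get("FLASK_PORT", "").strip()
    if not raw:
        return DEFAULT_FLASK_PORT
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"FLASK_PORT out of range: {port}")
    return port
```

Failing loudly on a malformed value is a design choice for the sketch; silently falling back to the default would hide a misconfigured .env file.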

Changed

  • Backend structure: Monolithic → Modular
  • RSS feeds: Hardcoded → Database-driven
  • Article storage: Summary only → Full content support
  • Configuration: Scattered → Centralized

Technical Improvements

  • Separation of concerns (routes vs services)
  • Better testability
  • Easier maintenance
  • Scalable architecture
  • Independent microservices
  • Proper error handling
  • Comprehensive logging

Database Schema Updates

Articles collection now includes:

  • full_content - Full article text
  • word_count - Number of words
  • crawled_at - When content was crawled
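The three new article fields could be derived from the extracted text as sketched below. The helper name `build_crawl_fields`, the whitespace-based word count, and the UTC timestamp are assumptions; the changelog only names the fields.

```python
from datetime import datetime, timezone


def build_crawl_fields(full_content, now=None):
    """Return the crawl-related fields for an article document."""
    crawled_at = now or datetime.now(timezone.utc)
    return {
        "full_content": full_content,
        "word_count": len(full_content.split()),  # whitespace-separated words
        "crawled_at": crawled_at,
    }
```

A dictionary in this shape could then be merged into an existing article document by whatever update call the database layer uses.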

RSS Feeds collection added:

  • name - Feed name
  • url - Feed URL (unique)
  • active - Active status
  • created_at - Creation timestamp

Files Added

backend/
├── config.py
├── database.py
├── fix_duplicates.py
├── STRUCTURE.md
├── routes/
│   ├── __init__.py
│   ├── subscription_routes.py
│   ├── news_routes.py
│   ├── rss_routes.py
│   └── ollama_routes.py
└── services/
    ├── __init__.py
    ├── news_service.py
    ├── email_service.py
    └── ollama_service.py

news_crawler/
├── crawler_service.py
├── test_crawler.py
├── requirements.txt
├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── README.md
└── QUICKSTART.md

Root:
├── ARCHITECTURE.md
└── CHANGELOG.md

Files Removed

  • Old monolithic backend/app.py (replaced with modular version)

Next Steps (Future Enhancements)

  • Frontend UI for RSS feed management
  • Automatic article summarization with Ollama
  • Scheduled newsletter sending
  • Article categorization and tagging
  • Search functionality
  • User preferences (categories, frequency)
  • Analytics dashboard
  • API rate limiting
  • Caching layer (Redis)
  • Message queue for crawler (Celery)

Recent Updates (November 2025)

Security Improvements

  • MongoDB Internal-Only: Removed host port exposure; accessible only via the Docker network
  • Ollama Internal-Only: Removed host port exposure; accessible only via the Docker network
  • Reduced Attack Surface: Only Backend API (port 5001) exposed to host
  • Network Isolation: All services communicate via internal Docker network

Ollama Integration

  • Docker Compose Integration: Ollama service runs alongside other services
  • Automatic Model Download: phi3:latest model downloaded on first startup
  • GPU Support: NVIDIA GPU acceleration with automatic detection
  • Helper Scripts: start-with-gpu.sh, check-gpu.sh, configure-ollama.sh
  • Performance: 5-10x faster with GPU acceleration than on CPU alone

API Enhancements

  • Send Newsletter Endpoint: /api/admin/send-newsletter sends the newsletter to all active subscribers
  • Subscriber Status Fix: Fixed stats endpoint to correctly count active subscribers
  • Better Error Handling: Improved error messages and validation

Documentation

  • Consolidated Documentation: Moved all docs to docs/ directory
  • Security Guide: Comprehensive security documentation
  • GPU Setup Guide: Detailed GPU acceleration setup
  • MongoDB Connection Guide: Connection configuration explained
  • Subscriber Status Guide: How the subscriber status system works

Configuration

  • MongoDB URI: Updated to use the Docker service name (mongodb instead of localhost)
  • Ollama URL: Configured for internal Docker network (http://ollama:11434)
  • Single .env File: All configuration in backend/.env
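The switch from localhost to Docker service names can be sketched as a pair of defaults plus a check that flags values which would not resolve from inside a container. The environment variable names `MONGODB_URI` and `OLLAMA_BASE_URL` are assumptions, as is the MongoDB port 27017 (its standard default); only the host names and the Ollama URL come from the entries above.

```python
import os
from urllib.parse import urlparse

# Defaults use the Docker service names as hosts. Variable names and the
# MongoDB port are assumptions made for this sketch.
DEFAULTS = {
    "MONGODB_URI": "mongodb://mongodb:27017/",
    "OLLAMA_BASE_URL": "http://ollama:11434",
}


def load_service_urls(environ=os.environ):
    """Read each service URL from the environment, falling back to the defaults."""
    return {key: environ.get(key, default) for key, default in DEFAULTS.items()}


def points_at_localhost(url):
    """True if the URL targets the container itself rather than another service."""
    return urlparse(url).hostname in {"localhost", "127.0.0.1"}
```

Inside a container, localhost refers to that container, so a URI left over from a non-Docker setup would fail to reach MongoDB or Ollama; a check like `points_at_localhost` makes that mistake visible at startup.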

Testing

  • Connectivity Tests: test-mongodb-connectivity.sh
  • Ollama Tests: test-ollama-setup.sh
  • Newsletter API Tests: test-newsletter-api.sh