Changelog

[Unreleased] - 2025-11-10

Added - Major Refactoring

Backend Modularization

  • Restructured backend into modular architecture
  • Created separate route blueprints:
    • subscription_routes.py - User subscriptions
    • news_routes.py - News fetching and stats
    • rss_routes.py - RSS feed management (CRUD)
    • ollama_routes.py - AI integration
  • Created service layer:
    • news_service.py - News fetching logic
    • email_service.py - Newsletter sending
    • ollama_service.py - AI communication
  • Centralized configuration in config.py
  • Separated database logic in database.py
  • Reduced main app.py from 700+ lines to 27 lines (see the wiring sketch below)
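
A minimal sketch of how the slimmed-down app.py can wire the blueprints together (the create_app factory and blueprint variable names such as subscription_bp are assumptions, not the actual code):

```python
# app.py - illustrative sketch; blueprint variable names are assumed
from flask import Flask

from config import Config
from routes.subscription_routes import subscription_bp
from routes.news_routes import news_bp
from routes.rss_routes import rss_bp
from routes.ollama_routes import ollama_bp


def create_app():
    app = Flask(__name__)
    app.config.from_object(Config)

    # Each feature area registers its own blueprint
    app.register_blueprint(subscription_bp)
    app.register_blueprint(news_bp)
    app.register_blueprint(rss_bp)
    app.register_blueprint(ollama_bp)
    return app


if __name__ == "__main__":
    create_app().run(port=Config.FLASK_PORT)
```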

RSS Feed Management

  • Dynamic RSS feed management via API
  • Add/remove/list/toggle RSS feeds without code changes
  • Unique index on RSS feed URLs (prevents duplicates)
  • Default feeds auto-initialized on first run (see the sketch after this list)
  • Created fix_duplicates.py utility script
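
A minimal sketch of the duplicate prevention and default-feed initialization, assuming a database named munich_news and a collection named rss_feeds; field names follow the RSS Feeds schema listed under Database Schema Updates:

```python
# Unique index + idempotent default-feed setup (names are assumptions)
from datetime import datetime, timezone

from pymongo import ASCENDING, MongoClient
from pymongo.errors import DuplicateKeyError

db = MongoClient("mongodb://localhost:27017")["munich_news"]

# Unique index so the same feed URL can only be stored once
db.rss_feeds.create_index([("url", ASCENDING)], unique=True)

DEFAULT_FEEDS = [
    {"name": "Example Feed", "url": "https://example.com/rss"},  # placeholder
]

for feed in DEFAULT_FEEDS:
    try:
        db.rss_feeds.insert_one(
            {**feed, "active": True, "created_at": datetime.now(timezone.utc)}
        )
    except DuplicateKeyError:
        pass  # feed already exists from a previous run
```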

News Crawler Microservice

  • Created standalone news_crawler/ microservice
  • Web scraping with BeautifulSoup
  • Smart content extraction using multiple selectors (see the sketch after this list)
  • Full article content storage in MongoDB
  • Word count calculation
  • Duplicate prevention (skips already-crawled articles)
  • Rate limiting (1 second between requests)
  • Can run independently or on a schedule
  • Docker support for crawler
  • Comprehensive documentation
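
A rough sketch of the crawler loop (multi-selector extraction, word count, duplicate skipping, 1-second delay); the selector list, the link field, and the collection handle are assumptions:

```python
# Crawler sketch: fetch, extract, store full content (selectors assumed)
import time
from datetime import datetime, timezone

import requests
from bs4 import BeautifulSoup

# Try several selectors because article markup varies between news sites
CONTENT_SELECTORS = ["article", "div.article-body", "main"]


def extract_content(html):
    soup = BeautifulSoup(html, "html.parser")
    for selector in CONTENT_SELECTORS:
        node = soup.select_one(selector)
        if node:
            return node.get_text(separator=" ", strip=True)
    return ""


def crawl_articles(articles_collection):
    # Duplicate prevention: only crawl articles without full content yet
    for article in articles_collection.find({"full_content": {"$exists": False}}):
        html = requests.get(article["link"], timeout=10).text
        text = extract_content(html)
        articles_collection.update_one(
            {"_id": article["_id"]},
            {"$set": {
                "full_content": text,
                "word_count": len(text.split()),
                "crawled_at": datetime.now(timezone.utc),
            }},
        )
        time.sleep(1)  # rate limiting: 1 second between requests
```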

API Endpoints

New endpoints added (example calls follow the list):

  • GET /api/rss-feeds - List all RSS feeds
  • POST /api/rss-feeds - Add new RSS feed
  • DELETE /api/rss-feeds/<id> - Remove RSS feed
  • PATCH /api/rss-feeds/<id>/toggle - Toggle feed active status
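
Example client calls against these endpoints; the base URL, request body, and the id field in the response are assumptions:

```python
# Exercising the RSS feed endpoints (request/response shapes assumed)
import requests

BASE = "http://localhost:5001"

# Add a feed, then list all feeds
created = requests.post(
    f"{BASE}/api/rss-feeds",
    json={"name": "Example Feed", "url": "https://example.com/rss"},
).json()
feed_id = created.get("id")  # assumed response field

print(requests.get(f"{BASE}/api/rss-feeds").json())

# Toggle the feed's active status, then remove it
requests.patch(f"{BASE}/api/rss-feeds/{feed_id}/toggle")
requests.delete(f"{BASE}/api/rss-feeds/{feed_id}")
```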

Documentation

  • Created ARCHITECTURE.md - System architecture overview
  • Created backend/STRUCTURE.md - Backend structure guide
  • Created news_crawler/README.md - Crawler documentation
  • Created news_crawler/QUICKSTART.md - Quick start guide
  • Created news_crawler/test_crawler.py - Test suite
  • Updated main README.md with new features
  • Updated DATABASE_SCHEMA.md with new fields

Configuration

  • Added FLASK_PORT environment variable
  • Fixed OLLAMA_MODEL typo in .env
  • Default port set to 5001 to avoid the macOS AirPlay Receiver conflict on port 5000 (see the config sketch below)
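
A sketch of what the centralized config.py could look like; only FLASK_PORT, OLLAMA_MODEL, and the 5001 default come from this changelog, the remaining variables and default values are assumptions:

```python
# config.py - illustrative settings class (extra variables are assumed)
import os


class Config:
    # Default to 5001 because macOS AirPlay Receiver occupies port 5000
    FLASK_PORT = int(os.getenv("FLASK_PORT", "5001"))
    MONGO_URI = os.getenv("MONGO_URI", "mongodb://localhost:27017")
    OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3")
```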

Changed

  • Backend structure: Monolithic → Modular
  • RSS feeds: Hardcoded → Database-driven
  • Article storage: Summary only → Full content support
  • Configuration: Scattered → Centralized

Technical Improvements

  • Separation of concerns (routes vs services)
  • Better testability
  • Easier maintenance
  • Scalable architecture
  • Independent microservices
  • Proper error handling
  • Comprehensive logging

Database Schema Updates

Articles collection now includes:

  • full_content - Full article text
  • word_count - Number of words
  • crawled_at - When content was crawled

New RSS Feeds collection added with the following fields (example documents below):

  • name - Feed name
  • url - Feed URL (unique)
  • active - Active status
  • created_at - Creation timestamp
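
Illustrative documents for both collections; pre-existing article fields such as title and link are assumptions, while the new fields match the lists above:

```python
# Example documents (values are illustrative only)
from datetime import datetime, timezone

article = {
    "title": "Example headline",                # assumed pre-existing field
    "link": "https://example.com/article",      # assumed pre-existing field
    "full_content": "Full article text ...",    # new: full article text
    "word_count": 742,                          # new: number of words
    "crawled_at": datetime.now(timezone.utc),   # new: crawl timestamp
}

rss_feed = {
    "name": "Example Feed",
    "url": "https://example.com/rss",           # unique
    "active": True,
    "created_at": datetime.now(timezone.utc),
}
```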

Files Added

backend/
├── config.py
├── database.py
├── fix_duplicates.py
├── STRUCTURE.md
├── routes/
│   ├── __init__.py
│   ├── subscription_routes.py
│   ├── news_routes.py
│   ├── rss_routes.py
│   └── ollama_routes.py
└── services/
    ├── __init__.py
    ├── news_service.py
    ├── email_service.py
    └── ollama_service.py

news_crawler/
├── crawler_service.py
├── test_crawler.py
├── requirements.txt
├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── README.md
└── QUICKSTART.md

Root:
├── ARCHITECTURE.md
└── CHANGELOG.md

Files Removed

  • Old monolithic backend/app.py (replaced with modular version)

Next Steps (Future Enhancements)

  • Frontend UI for RSS feed management
  • Automatic article summarization with Ollama
  • Scheduled newsletter sending
  • Article categorization and tagging
  • Search functionality
  • User preferences (categories, frequency)
  • Analytics dashboard
  • API rate limiting
  • Caching layer (Redis)
  • Message queue for crawler (Celery)