# Changelog

## [Unreleased] - 2024-11-10

### Added - Major Refactoring

#### Backend Modularization
- ✅ Restructured backend into modular architecture
- ✅ Created separate route blueprints:
  - `subscription_routes.py` - User subscriptions
  - `news_routes.py` - News fetching and stats
  - `rss_routes.py` - RSS feed management (CRUD)
  - `ollama_routes.py` - AI integration
- ✅ Created service layer:
  - `news_service.py` - News fetching logic
  - `email_service.py` - Newsletter sending
  - `ollama_service.py` - AI communication
- ✅ Centralized configuration in `config.py`
- ✅ Separated database logic in `database.py`
- ✅ Reduced main `app.py` from 700+ lines to 27 lines

#### RSS Feed Management
- ✅ Dynamic RSS feed management via API
- ✅ Add/remove/list/toggle RSS feeds without code changes
- ✅ Unique index on RSS feed URLs (prevents duplicates)
- ✅ Default feeds auto-initialized on first run
- ✅ Created `fix_duplicates.py` utility script

#### News Crawler Microservice
- ✅ Created standalone `news_crawler/` microservice
- ✅ Web scraping with BeautifulSoup
- ✅ Smart content extraction using multiple selectors
- ✅ Full article content storage in MongoDB
- ✅ Word count calculation
- ✅ Duplicate prevention (skips already-crawled articles)
- ✅ Rate limiting (1 second between requests)
- ✅ Can run independently or scheduled
- ✅ Docker support for crawler
- ✅ Comprehensive documentation

#### API Endpoints
New endpoints added:
- `GET /api/rss-feeds` - List all RSS feeds
- `POST /api/rss-feeds` - Add new RSS feed
- `DELETE /api/rss-feeds/` - Remove RSS feed
- `PATCH /api/rss-feeds//toggle` - Toggle feed active status

#### Documentation
- ✅ Created `ARCHITECTURE.md` - System architecture overview
- ✅ Created `backend/STRUCTURE.md` - Backend structure guide
- ✅ Created `news_crawler/README.md` - Crawler documentation
- ✅ Created `news_crawler/QUICKSTART.md` - Quick start guide
- ✅ Created `news_crawler/test_crawler.py` - Test suite
- ✅ Updated main `README.md` with new features
- ✅ Updated `DATABASE_SCHEMA.md` with new fields

#### Configuration
- ✅ Added `FLASK_PORT` environment variable
- ✅ Fixed `OLLAMA_MODEL` typo in `.env`
- ✅ Port 5001 default to avoid macOS AirPlay conflict

### Changed
- Backend structure: Monolithic → Modular
- RSS feeds: Hardcoded → Database-driven
- Article storage: Summary only → Full content support
- Configuration: Scattered → Centralized

### Technical Improvements
- Separation of concerns (routes vs services)
- Better testability
- Easier maintenance
- Scalable architecture
- Independent microservices
- Proper error handling
- Comprehensive logging

### Database Schema Updates

Articles collection now includes:
- `full_content` - Full article text
- `word_count` - Number of words
- `crawled_at` - When content was crawled

RSS Feeds collection added:
- `name` - Feed name
- `url` - Feed URL (unique)
- `active` - Active status
- `created_at` - Creation timestamp

### Files Added
```
backend/
├── config.py
├── database.py
├── fix_duplicates.py
├── STRUCTURE.md
├── routes/
│   ├── __init__.py
│   ├── subscription_routes.py
│   ├── news_routes.py
│   ├── rss_routes.py
│   └── ollama_routes.py
└── services/
    ├── __init__.py
    ├── news_service.py
    ├── email_service.py
    └── ollama_service.py

news_crawler/
├── crawler_service.py
├── test_crawler.py
├── requirements.txt
├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── README.md
└── QUICKSTART.md

Root:
├── ARCHITECTURE.md
└── CHANGELOG.md
```

### Files Removed
- Old monolithic `backend/app.py` (replaced with modular version)

### Next Steps (Future Enhancements)
- [ ] Frontend UI for RSS feed management
- [ ] Automatic article summarization with Ollama
- [ ] Scheduled newsletter sending
- [ ] Article categorization and tagging
- [ ] Search functionality
- [ ] User preferences (categories, frequency)
- [ ] Analytics dashboard
- [ ] API rate limiting
- [ ] Caching layer (Redis)
- [ ] Message queue for crawler (Celery)
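
### Example: Crawled Article Fields

The crawler entries above (word count calculation, duplicate prevention, 1 second rate limiting) and the new article fields (`full_content`, `word_count`, `crawled_at`) can be pictured with a small sketch. This is not the code in `news_crawler/crawler_service.py`, which this changelog does not show. The function names `build_crawl_fields` and `crawl_pending`, the `fetch_text` callable, and the `link` key are hypothetical. Plain dictionaries stand in for MongoDB documents, and an injected callable stands in for the BeautifulSoup scraping step.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Iterable, List


def build_crawl_fields(text: str) -> dict:
    """Build the three article fields named under Database Schema Updates."""
    return {
        "full_content": text,
        "word_count": len(text.split()),  # counts whitespace-separated tokens
        "crawled_at": datetime.now(timezone.utc),
    }


def crawl_pending(
    articles: Iterable[dict],
    fetch_text: Callable[[str], str],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Fill in crawl fields for articles that do not have them yet.

    `fetch_text` is a hypothetical stand-in for the real scraping step.
    The "link" key is an assumed field name for the article URL.
    """
    updated = []
    for article in articles:
        if article.get("full_content"):
            continue  # duplicate prevention: skip already-crawled articles
        if updated:
            sleep(delay_seconds)  # rate limit: pause between requests
        article.update(build_crawl_fields(fetch_text(article["link"])))
        updated.append(article)
    return updated


if __name__ == "__main__":
    # Fake page store so the sketch runs without network access.
    pages = {"https://example.com/a": "Hello crawler world"}
    articles = [{"link": "https://example.com/a"}]
    done = crawl_pending(articles, pages.__getitem__, sleep=lambda s: None)
    print(done[0]["word_count"])  # prints 3
```

Passing `sleep` and `fetch_text` as parameters keeps the sketch testable without real delays or HTTP calls. The actual service may structure this differently.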