# Changelog
## [Unreleased] - 2024-11-10
### Added - Major Refactoring
#### Backend Modularization
- ✅ Restructured backend into modular architecture
- ✅ Created separate route blueprints:
  - `subscription_routes.py` - User subscriptions
  - `news_routes.py` - News fetching and stats
  - `rss_routes.py` - RSS feed management (CRUD)
  - `ollama_routes.py` - AI integration
- ✅ Created service layer:
  - `news_service.py` - News fetching logic
  - `email_service.py` - Newsletter sending
  - `ollama_service.py` - AI communication
- ✅ Centralized configuration in `config.py`
- ✅ Separated database logic in `database.py`
- ✅ Reduced main `app.py` from 700+ lines to 27 lines
#### RSS Feed Management
- ✅ Dynamic RSS feed management via API
- ✅ Add/remove/list/toggle RSS feeds without code changes
- ✅ Unique index on RSS feed URLs (prevents duplicates)
- ✅ Default feeds auto-initialized on first run
- ✅ Created `fix_duplicates.py` utility script
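
The duplicate protection can be pictured with a small in-memory stand-in. This is only a sketch: `FeedStore`, `DuplicateFeedError`, and the method names are hypothetical, and a dict keyed by URL is used here to mimic the effect of the unique index on feed URLs rather than to show the project's actual database code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class DuplicateFeedError(ValueError):
    """Raised when a feed URL is already registered."""


@dataclass
class FeedStore:
    # Hypothetical in-memory stand-in for the RSS feeds collection.
    # Keying by URL mimics the unique index on feed URLs.
    feeds: dict = field(default_factory=dict)

    def add(self, name: str, url: str) -> dict:
        if url in self.feeds:
            raise DuplicateFeedError(f"feed already exists: {url}")
        feed = {
            "name": name,
            "url": url,
            "active": True,
            "created_at": datetime.now(timezone.utc),
        }
        self.feeds[url] = feed
        return feed

    def toggle(self, url: str) -> bool:
        """Flip the active flag and return the new value."""
        feed = self.feeds[url]
        feed["active"] = not feed["active"]
        return feed["active"]

    def remove(self, url: str) -> None:
        del self.feeds[url]

    def list_feeds(self) -> list:
        return list(self.feeds.values())
```

Adding the same URL twice raises `DuplicateFeedError`, which is roughly what a caller would see when the database rejects a duplicate insert.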
#### News Crawler Microservice
- ✅ Created standalone `news_crawler/` microservice
- ✅ Web scraping with BeautifulSoup
- ✅ Smart content extraction using multiple selectors
- ✅ Full article content storage in MongoDB
- ✅ Word count calculation
- ✅ Duplicate prevention (skips already-crawled articles)
- ✅ Rate limiting (1 second between requests)
- ✅ Can run independently or scheduled
- ✅ Docker support for crawler
- ✅ Comprehensive documentation
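
The crawl loop described above (skip known articles, count words, pause between requests) might look roughly like the sketch below. The function names and the injected `fetch_text` and `sleep` callables are assumptions made for illustration; the HTML extraction step with BeautifulSoup is deliberately left out.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Iterable

REQUEST_DELAY_SECONDS = 1.0  # the 1 second pause between requests


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def crawl(urls: Iterable[str],
          already_crawled: set,
          fetch_text: Callable[[str], str],
          sleep: Callable[[float], None] = time.sleep) -> list:
    """Fetch each new URL once, pausing after every request."""
    results = []
    for url in urls:
        if url in already_crawled:
            continue  # duplicate prevention: skip articles crawled earlier
        text = fetch_text(url)  # hypothetical: returns extracted article text
        results.append({
            "url": url,
            "full_content": text,
            "word_count": count_words(text),
            "crawled_at": datetime.now(timezone.utc),
        })
        already_crawled.add(url)
        sleep(REQUEST_DELAY_SECONDS)  # rate limiting
    return results
```

Injecting `fetch_text` and `sleep` keeps the loop testable without network access or real delays.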
#### API Endpoints
New endpoints added:
- `GET /api/rss-feeds` - List all RSS feeds
- `POST /api/rss-feeds` - Add new RSS feed
- `DELETE /api/rss-feeds/<id>` - Remove RSS feed
- `PATCH /api/rss-feeds/<id>/toggle` - Toggle feed active status
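
As a rough illustration of how a client could address these endpoints, the sketch below builds (but does not send) the four requests with the standard library. The base URL and the JSON body fields `name` and `url` are assumptions, since this changelog does not document request or response bodies.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5001"  # assumption: local backend on the default port


def list_feeds_request() -> Request:
    return Request(f"{BASE_URL}/api/rss-feeds", method="GET")


def add_feed_request(name: str, url: str) -> Request:
    # Assumption: the endpoint accepts a JSON body with `name` and `url`.
    body = json.dumps({"name": name, "url": url}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/rss-feeds",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_feed_request(feed_id: str) -> Request:
    return Request(f"{BASE_URL}/api/rss-feeds/{feed_id}", method="DELETE")


def toggle_feed_request(feed_id: str) -> Request:
    return Request(f"{BASE_URL}/api/rss-feeds/{feed_id}/toggle", method="PATCH")
```

Passing one of these objects to `urllib.request.urlopen` would send it to a running backend.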
#### Documentation
- ✅ Created `ARCHITECTURE.md` - System architecture overview
- ✅ Created `backend/STRUCTURE.md` - Backend structure guide
- ✅ Created `news_crawler/README.md` - Crawler documentation
- ✅ Created `news_crawler/QUICKSTART.md` - Quick start guide
- ✅ Created `news_crawler/test_crawler.py` - Test suite
- ✅ Updated main `README.md` with new features
- ✅ Updated `DATABASE_SCHEMA.md` with new fields
#### Configuration
- ✅ Added `FLASK_PORT` environment variable
- ✅ Fixed `OLLAMA_MODEL` typo in `.env`
- ✅ Default port set to 5001 to avoid the macOS AirPlay conflict on port 5000
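
A centralized config module along these lines would read the port from the environment and fall back to 5001. This is a minimal sketch: `load_config` is a hypothetical helper, and the empty default for `OLLAMA_MODEL` is an assumption because the changelog does not name a model.

```python
import os


def load_config(env=None) -> dict:
    """Read settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    return {
        # Default 5001 sidesteps the macOS AirPlay receiver on port 5000.
        "FLASK_PORT": int(env.get("FLASK_PORT", "5001")),
        # Assumption: no default model name is documented, so use an empty string.
        "OLLAMA_MODEL": env.get("OLLAMA_MODEL", ""),
    }
```

Accepting an optional mapping instead of always reading `os.environ` makes the defaults easy to check in tests.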
### Changed
- Backend structure: Monolithic → Modular
- RSS feeds: Hardcoded → Database-driven
- Article storage: Summary only → Full content support
- Configuration: Scattered → Centralized
### Technical Improvements
- Separation of concerns (routes vs services)
- Better testability
- Easier maintenance
- Scalable architecture
- Independent microservices
- Proper error handling
- Comprehensive logging
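
The routes-versus-services split and its effect on testability can be sketched without any web framework. All names below are hypothetical and the stats returned are invented for illustration; the point is only that the service is a plain function and the route is a thin wrapper around it.

```python
from typing import Callable


# Service layer: plain function with no web-framework imports.
def get_news_stats(fetch_articles: Callable[[], list]) -> dict:
    articles = fetch_articles()
    crawled = [a for a in articles if a.get("full_content")]
    return {"total": len(articles), "crawled": len(crawled)}


# Route layer: turns the service result into an HTTP-style (body, status) pair.
def stats_route(fetch_articles: Callable[[], list]) -> tuple:
    try:
        return {"status": "ok", "stats": get_news_stats(fetch_articles)}, 200
    except Exception as exc:  # errors are handled at the edge, not in the service
        return {"status": "error", "message": str(exc)}, 500
```

Because the data source is passed in, the service can be exercised with a simple lambda instead of a live database.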
### Database Schema Updates
Articles collection now includes:
- `full_content` - Full article text
- `word_count` - Number of words
- `crawled_at` - When content was crawled

RSS Feeds collection added:
- `name` - Feed name
- `url` - Feed URL (unique)
- `active` - Active status
- `created_at` - Creation timestamp
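
For readers who prefer types to prose, the fields above could be written down as follows. The Python types are assumptions inferred from the field descriptions, and the class names are hypothetical; the database itself does not enforce them.

```python
from datetime import datetime, timezone
from typing import TypedDict


class ArticleCrawlFields(TypedDict):
    """Fields the crawler adds to an article document."""
    full_content: str
    word_count: int
    crawled_at: datetime


class RssFeed(TypedDict):
    """One document in the RSS feeds collection."""
    name: str
    url: str  # unique across the collection
    active: bool
    created_at: datetime


# Illustrative document only; the values are made up.
example_feed: RssFeed = {
    "name": "Example Feed",
    "url": "https://example.com/rss",
    "active": True,
    "created_at": datetime(2024, 11, 10, tzinfo=timezone.utc),
}
```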
### Files Added
```
backend/
├── config.py
├── database.py
├── fix_duplicates.py
├── STRUCTURE.md
├── routes/
│   ├── __init__.py
│   ├── subscription_routes.py
│   ├── news_routes.py
│   ├── rss_routes.py
│   └── ollama_routes.py
└── services/
    ├── __init__.py
    ├── news_service.py
    ├── email_service.py
    └── ollama_service.py
news_crawler/
├── crawler_service.py
├── test_crawler.py
├── requirements.txt
├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── README.md
└── QUICKSTART.md
Root:
├── ARCHITECTURE.md
└── CHANGELOG.md
```
### Files Removed
- Old monolithic `backend/app.py` (replaced with modular version)
### Next Steps (Future Enhancements)
- [ ] Frontend UI for RSS feed management
- [ ] Automatic article summarization with Ollama
- [ ] Scheduled newsletter sending
- [ ] Article categorization and tagging
- [ ] Search functionality
- [ ] User preferences (categories, frequency)
- [ ] Analytics dashboard
- [ ] API rate limiting
- [ ] Caching layer (Redis)
- [ ] Message queue for crawler (Celery)