Changelog
[Unreleased] - 2024-11-10
Added - Major Refactoring
Backend Modularization
- ✅ Restructured backend into modular architecture
- ✅ Created separate route blueprints:
  - subscription_routes.py - User subscriptions
  - news_routes.py - News fetching and stats
  - rss_routes.py - RSS feed management (CRUD)
  - ollama_routes.py - AI integration
- ✅ Created service layer:
  - news_service.py - News fetching logic
  - email_service.py - Newsletter sending
  - ollama_service.py - AI communication
- ✅ Centralized configuration in config.py
- ✅ Separated database logic in database.py
- ✅ Reduced main app.py from 700+ lines to 27 lines
RSS Feed Management
- ✅ Dynamic RSS feed management via API
- ✅ Add/remove/list/toggle RSS feeds without code changes
- ✅ Unique index on RSS feed URLs (prevents duplicates)
- ✅ Default feeds auto-initialized on first run
- ✅ Created fix_duplicates.py utility script
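MongoDB refuses to build a unique index on a collection that already contains duplicate values, so existing duplicate feeds have to be cleared first. The contents of fix_duplicates.py are not shown here; the following is a minimal sketch, assuming feed documents are plain dicts carrying the _id, url and created_at fields listed under Database Schema Updates, of how one copy per URL could be kept. The function name find_duplicate_ids is hypothetical.

```python
from datetime import datetime
from typing import Any


def find_duplicate_ids(feeds: list[dict[str, Any]]) -> list[Any]:
    """Return the _id of every feed whose url was already seen.

    Hypothetical helper: keeps the oldest document per URL (by created_at)
    and reports the rest so they can be deleted before a unique index on
    url is created.
    """
    # Oldest first, so the first document seen for each URL is the one kept.
    ordered = sorted(feeds, key=lambda feed: feed["created_at"])
    seen: set[str] = set()
    duplicates: list[Any] = []
    for feed in ordered:
        url = feed["url"]
        if url in seen:
            duplicates.append(feed["_id"])
        else:
            seen.add(url)
    return duplicates


if __name__ == "__main__":
    feeds = [
        {"_id": 1, "url": "https://example.com/rss", "created_at": datetime(2024, 11, 1)},
        {"_id": 2, "url": "https://example.com/rss", "created_at": datetime(2024, 11, 2)},
        {"_id": 3, "url": "https://example.org/feed", "created_at": datetime(2024, 11, 3)},
    ]
    print(find_duplicate_ids(feeds))  # [2]
```

With pymongo, the returned ids could then be removed with delete_many({"_id": {"$in": ids}}) before calling create_index("url", unique=True).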
News Crawler Microservice
- ✅ Created standalone news_crawler/ microservice
- ✅ Web scraping with BeautifulSoup
- ✅ Smart content extraction using multiple selectors
- ✅ Full article content storage in MongoDB
- ✅ Word count calculation
- ✅ Duplicate prevention (skips already-crawled articles)
- ✅ Rate limiting (1 second between requests)
- ✅ Can run standalone or on a schedule
- ✅ Docker support for crawler
- ✅ Comprehensive documentation
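The crawler's skip-and-throttle behavior can be illustrated with a small, standard-library-only sketch. It assumes a fetch callable that returns the extracted article text (standing in for the BeautifulSoup extraction) and a set of already-crawled URLs (standing in for a MongoDB lookup); crawl_articles, fetch and the sleep parameter are hypothetical names, not the crawler_service.py API.

```python
import time
from typing import Callable


def crawl_articles(
    urls: list[str],
    fetch: Callable[[str], str],
    already_crawled: set[str],
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch each new URL once, pausing `delay` seconds between requests.

    Hypothetical sketch: `fetch` returns the article's extracted text and
    `already_crawled` stands in for a database lookup. The set is updated
    in place as URLs are crawled.
    """
    results: dict[str, str] = {}
    first_request = True
    for url in urls:
        # Duplicate prevention: skip URLs crawled in an earlier run
        # or earlier in this batch.
        if url in already_crawled:
            continue
        # Rate limiting: wait before every request except the first.
        if not first_request:
            sleep(delay)
        first_request = False
        results[url] = fetch(url)
        already_crawled.add(url)
    return results
```

Injecting sleep keeps the one-second throttle testable without real waiting; in normal use the default time.sleep applies.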
API Endpoints
New endpoints added:
- GET /api/rss-feeds - List all RSS feeds
- POST /api/rss-feeds - Add new RSS feed
- DELETE /api/rss-feeds/<id> - Remove RSS feed
- PATCH /api/rss-feeds/<id>/toggle - Toggle feed active status
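A minimal sketch of calling these endpoints with Python's standard library. It assumes the backend listens on http://localhost:5001 and that POST /api/rss-feeds accepts a JSON body with name and url fields (mirroring the RSS Feeds collection); the request body format, the helper name feed_request and the placeholder id "abc123" are assumptions. The helper only builds urllib.request.Request objects, so nothing is sent until one is passed to urllib.request.urlopen.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:5001/api/rss-feeds"  # assumed host and port


def feed_request(
    method: str,
    feed_id: Optional[str] = None,
    toggle: bool = False,
    body: Optional[dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for one of the RSS feed endpoints."""
    url = BASE_URL
    if feed_id is not None:
        url += f"/{feed_id}"
        if toggle:
            url += "/toggle"
    # Only POST carries a body; it is sent as JSON (assumed format).
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


list_feeds = feed_request("GET")
add_feed = feed_request("POST", body={"name": "Example", "url": "https://example.com/rss"})
remove_feed = feed_request("DELETE", feed_id="abc123")  # placeholder id
toggle_feed = feed_request("PATCH", feed_id="abc123", toggle=True)
```

Sending is then a matter of `with urllib.request.urlopen(list_feeds) as resp: resp.read()`; the shape of the JSON the backend returns is not documented here.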
Documentation
- ✅ Created ARCHITECTURE.md - System architecture overview
- ✅ Created backend/STRUCTURE.md - Backend structure guide
- ✅ Created news_crawler/README.md - Crawler documentation
- ✅ Created news_crawler/QUICKSTART.md - Quick start guide
- ✅ Created news_crawler/test_crawler.py - Test suite
- ✅ Updated main README.md with new features
- ✅ Updated DATABASE_SCHEMA.md with new fields
Configuration
- ✅ Added FLASK_PORT environment variable
- ✅ Fixed OLLAMA_MODEL typo in .env
- ✅ Default port is 5001 to avoid conflict with the macOS AirPlay Receiver, which listens on Flask's usual port 5000
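Centralizing configuration usually means reading environment variables once and importing the resulting values elsewhere. The following is a minimal sketch of what that could look like, assuming FLASK_PORT and OLLAMA_MODEL are read from the process environment (for example after .env has been loaded); the Config and load_config names are hypothetical and not necessarily what config.py contains.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    flask_port: int
    ollama_model: Optional[str]


def load_config(environ: Mapping[str, str] = os.environ) -> Config:
    """Read all settings from the environment in one place (hypothetical sketch)."""
    return Config(
        # 5001 rather than Flask's usual 5000, which macOS AirPlay Receiver occupies.
        flask_port=int(environ.get("FLASK_PORT", "5001")),
        # No default: the model name is expected to come from .env.
        ollama_model=environ.get("OLLAMA_MODEL"),
    )
```

Passing the environment as a parameter keeps the function testable with a plain dict; other modules would import the loaded values instead of calling os.getenv themselves.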
Changed
- Backend structure: Monolithic → Modular
- RSS feeds: Hardcoded → Database-driven
- Article storage: Summary only → Full content support
- Configuration: Scattered → Centralized
Technical Improvements
- Separation of concerns (routes vs services)
- Better testability
- Easier maintenance
- Scalable architecture
- Independent microservices
- Proper error handling
- Comprehensive logging
Database Schema Updates
Articles collection now includes:
- full_content - Full article text
- word_count - Number of words
- crawled_at - When content was crawled
RSS Feeds collection added:
- name - Feed name
- url - Feed URL (unique)
- active - Active status
- created_at - Creation timestamp
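The new fields can be illustrated as the documents a service might write. This sketch assumes crawled_at and created_at are stored as timezone-aware UTC datetimes, word_count counts whitespace-separated tokens, and new feeds start out active; the helper names article_content_update and new_feed_document are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def article_content_update(full_content: str, now: Optional[datetime] = None) -> dict[str, Any]:
    """Build a MongoDB-style $set update that adds crawled content to an article."""
    return {
        "$set": {
            "full_content": full_content,
            # Word count as whitespace-separated tokens (an assumption).
            "word_count": len(full_content.split()),
            "crawled_at": now or datetime.now(timezone.utc),
        }
    }


def new_feed_document(name: str, url: str, now: Optional[datetime] = None) -> dict[str, Any]:
    """Build an RSS feed document; new feeds start out active (an assumption)."""
    return {
        "name": name,
        "url": url,
        "active": True,
        "created_at": now or datetime.now(timezone.utc),
    }
```

With pymongo, the first result could be passed as the update argument of update_one and the second to insert_one; the unique index on url then rejects a second feed with the same URL.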
Files Added
backend/
├── config.py
├── database.py
├── fix_duplicates.py
├── STRUCTURE.md
├── routes/
│   ├── __init__.py
│   ├── subscription_routes.py
│   ├── news_routes.py
│   ├── rss_routes.py
│   └── ollama_routes.py
└── services/
    ├── __init__.py
    ├── news_service.py
    ├── email_service.py
    └── ollama_service.py
news_crawler/
├── crawler_service.py
├── test_crawler.py
├── requirements.txt
├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── README.md
└── QUICKSTART.md
Root:
├── ARCHITECTURE.md
└── CHANGELOG.md
Files Removed
- Old monolithic backend/app.py (replaced with modular version)
Next Steps (Future Enhancements)
- Frontend UI for RSS feed management
- Automatic article summarization with Ollama
- Scheduled newsletter sending
- Article categorization and tagging
- Search functionality
- User preferences (categories, frequency)
- Analytics dashboard
- API rate limiting
- Caching layer (Redis)
- Message queue for crawler (Celery)