
# Changelog
## [Unreleased] - 2024-11-10
### Added - Major Refactoring
#### Backend Modularization
- ✅ Restructured backend into modular architecture
- ✅ Created separate route blueprints:
  - `subscription_routes.py` - User subscriptions
  - `news_routes.py` - News fetching and stats
  - `rss_routes.py` - RSS feed management (CRUD)
  - `ollama_routes.py` - AI integration
- ✅ Created service layer:
  - `news_service.py` - News fetching logic
  - `email_service.py` - Newsletter sending
  - `ollama_service.py` - AI communication
- ✅ Centralized configuration in `config.py`
- ✅ Separated database logic in `database.py`
- ✅ Reduced main `app.py` from 700+ lines to 27 lines
#### RSS Feed Management
- ✅ Dynamic RSS feed management via API
- ✅ Add/remove/list/toggle RSS feeds without code changes
- ✅ Unique index on RSS feed URLs (prevents duplicates)
- ✅ Default feeds auto-initialized on first run
- ✅ Created `fix_duplicates.py` utility script
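
The feed operations listed above (add, remove, list, toggle, with duplicate URLs rejected) can be modeled in a few lines. This is a minimal in-memory sketch, not the project's implementation: `Feed`, `FeedRegistry`, and the method names are hypothetical, and a plain dict keyed by URL stands in for the MongoDB collection and its unique index.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Feed:
    name: str
    url: str
    active: bool = True
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class FeedRegistry:
    """Hypothetical in-memory stand-in for the RSS feeds collection."""

    def __init__(self):
        # Keyed by URL, which mimics a unique index on the feed URL.
        self._feeds = {}

    def add(self, name, url):
        if url in self._feeds:
            raise ValueError(f"feed already exists: {url}")
        feed = Feed(name=name, url=url)
        self._feeds[url] = feed
        return feed

    def remove(self, url):
        # True if a feed was removed, False if the URL was unknown.
        return self._feeds.pop(url, None) is not None

    def toggle(self, url):
        # Flip the active flag and return the new state.
        feed = self._feeds[url]
        feed.active = not feed.active
        return feed.active

    def list_feeds(self, active_only=False):
        return [f for f in self._feeds.values() if f.active or not active_only]
```

Adding the same URL twice raises `ValueError` here; a database-backed version would presumably surface the unique-index violation as an error response instead.
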
#### News Crawler Microservice
- ✅ Created standalone `news_crawler/` microservice
- ✅ Web scraping with BeautifulSoup
- ✅ Smart content extraction using multiple selectors
- ✅ Full article content storage in MongoDB
- ✅ Word count calculation
- ✅ Duplicate prevention (skips already-crawled articles)
- ✅ Rate limiting (1 second between requests)
- ✅ Can run independently or scheduled
- ✅ Docker support for crawler
- ✅ Comprehensive documentation
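
Three of the crawler behaviors above (word count, duplicate prevention, and the one-second rate limit) fit in one short loop. The sketch below is an illustration under stated assumptions, not the code in `crawler_service.py`: `crawl`, `count_words`, and the `fetch` callable are hypothetical names, and a Python set stands in for the lookup against already-stored articles.

```python
import time


def count_words(text):
    """Count whitespace-separated words in the article text."""
    return len(text.split())


def crawl(urls, fetch, already_crawled, delay=1.0, sleep=time.sleep):
    """Fetch each new URL once, pausing `delay` seconds between requests.

    `fetch` is a hypothetical callable returning the article text for a URL;
    `already_crawled` is a set of URLs whose content is already stored.
    """
    results = []
    requests_made = 0
    for url in urls:
        if url in already_crawled:
            continue  # duplicate prevention: skip stored articles
        if requests_made > 0:
            sleep(delay)  # rate limiting: wait before every request after the first
        text = fetch(url)
        requests_made += 1
        already_crawled.add(url)
        results.append(
            {"url": url, "full_content": text, "word_count": count_words(text)}
        )
    return results
```

Passing `sleep` as a parameter keeps the loop testable without real pauses; in normal use the default `time.sleep` applies.
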
#### API Endpoints
New endpoints added:
- `GET /api/rss-feeds` - List all RSS feeds
- `POST /api/rss-feeds` - Add new RSS feed
- `DELETE /api/rss-feeds/<id>` - Remove RSS feed
- `PATCH /api/rss-feeds/<id>/toggle` - Toggle feed active status
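
As a rough illustration of how a client might address these endpoints, the sketch below builds one request per route with the standard library. The base URL, the helper names, and the JSON body fields (`name`, `url`) are assumptions; the changelog does not specify the request or response format.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # assumption: backend on the default port


def list_feeds_request():
    return urllib.request.Request(f"{BASE_URL}/api/rss-feeds", method="GET")


def add_feed_request(name, url):
    # Assumption: the endpoint accepts a JSON body with `name` and `url`.
    body = json.dumps({"name": name, "url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/rss-feeds",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_feed_request(feed_id):
    return urllib.request.Request(
        f"{BASE_URL}/api/rss-feeds/{feed_id}", method="DELETE"
    )


def toggle_feed_request(feed_id):
    return urllib.request.Request(
        f"{BASE_URL}/api/rss-feeds/{feed_id}/toggle", method="PATCH"
    )
```

Each helper only constructs the request; passing the result to `urllib.request.urlopen` would send it to a running backend.
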
#### Documentation
- ✅ Created `ARCHITECTURE.md` - System architecture overview
- ✅ Created `backend/STRUCTURE.md` - Backend structure guide
- ✅ Created `news_crawler/README.md` - Crawler documentation
- ✅ Created `news_crawler/QUICKSTART.md` - Quick start guide
- ✅ Created `news_crawler/test_crawler.py` - Test suite
- ✅ Updated main `README.md` with new features
- ✅ Updated `DATABASE_SCHEMA.md` with new fields
#### Configuration
- ✅ Added `FLASK_PORT` environment variable
- ✅ Fixed `OLLAMA_MODEL` typo in `.env`
- ✅ Default port set to 5001 to avoid the macOS AirPlay conflict on port 5000
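
The port setting can be read with a fallback along the following lines. This is a sketch only: `get_flask_port` and `DEFAULT_PORT` are hypothetical names, and the actual lookup in `config.py` may differ.

```python
import os

DEFAULT_PORT = 5001  # documented default from this changelog


def get_flask_port(env=None):
    """Return the port from FLASK_PORT, falling back to the default."""
    env = os.environ if env is None else env
    return int(env.get("FLASK_PORT", DEFAULT_PORT))
```

Accepting an optional mapping instead of reading `os.environ` directly makes the function easy to exercise in tests.
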
### Changed
- Backend structure: Monolithic → Modular
- RSS feeds: Hardcoded → Database-driven
- Article storage: Summary only → Full content support
- Configuration: Scattered → Centralized
### Technical Improvements
- Separation of concerns (routes vs services)
- Better testability
- Easier maintenance
- Scalable architecture
- Independent microservices
- Proper error handling
- Comprehensive logging
### Database Schema Updates
Articles collection now includes:
- `full_content` - Full article text
- `word_count` - Number of words
- `crawled_at` - When content was crawled
RSS Feeds collection added:
- `name` - Feed name
- `url` - Feed URL (unique)
- `active` - Active status
- `created_at` - Creation timestamp
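
For orientation, documents with these fields might look like the following. The field names come from the lists above; the value types (UTC datetimes, an integer count, a boolean flag) and the sample values are assumptions, and other existing article fields are omitted.

```python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
text = "Full text of the crawled article goes here."

# Fields the crawler adds to an article document (assumed types).
article_update = {
    "full_content": text,
    "word_count": len(text.split()),
    "crawled_at": now,
}

# A document in the RSS feeds collection (assumed types).
rss_feed = {
    "name": "Example Feed",
    "url": "https://example.com/rss",  # must be unique across feeds
    "active": True,
    "created_at": now,
}
```
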
### Files Added
```
backend/
├── config.py
├── database.py
├── fix_duplicates.py
├── STRUCTURE.md
├── routes/
│   ├── __init__.py
│   ├── subscription_routes.py
│   ├── news_routes.py
│   ├── rss_routes.py
│   └── ollama_routes.py
└── services/
    ├── __init__.py
    ├── news_service.py
    ├── email_service.py
    └── ollama_service.py
news_crawler/
├── crawler_service.py
├── test_crawler.py
├── requirements.txt
├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── README.md
└── QUICKSTART.md
Root:
├── ARCHITECTURE.md
└── CHANGELOG.md
```
### Files Removed
- Old monolithic `backend/app.py` (replaced with modular version)
### Next Steps (Future Enhancements)
- [ ] Frontend UI for RSS feed management
- [ ] Automatic article summarization with Ollama
- [ ] Scheduled newsletter sending
- [ ] Article categorization and tagging
- [ ] Search functionality
- [ ] User preferences (categories, frequency)
- [ ] Analytics dashboard
- [ ] API rate limiting
- [ ] Caching layer (Redis)
- [ ] Message queue for crawler (Celery)
---
## Recent Updates (November 2025)
### Security Improvements
- **MongoDB Internal-Only**: Removed port exposure, only accessible via Docker network
- **Ollama Internal-Only**: Removed port exposure, only accessible via Docker network
- **Reduced Attack Surface**: Only Backend API (port 5001) exposed to host
- **Network Isolation**: All services communicate via internal Docker network
### Ollama Integration
- **Docker Compose Integration**: Ollama service runs alongside other services
- **Automatic Model Download**: phi3:latest model downloaded on first startup
- **GPU Support**: NVIDIA GPU acceleration with automatic detection
- **Helper Scripts**: `start-with-gpu.sh`, `check-gpu.sh`, `configure-ollama.sh`
- **Performance**: 5-10x faster inference with GPU acceleration than on CPU alone
### API Enhancements
- **Send Newsletter Endpoint**: `/api/admin/send-newsletter` to send to all active subscribers
- **Subscriber Status Fix**: Fixed stats endpoint to correctly count active subscribers
- **Better Error Handling**: Improved error messages and validation
### Documentation
- **Consolidated Documentation**: Moved all docs to `docs/` directory
- **Security Guide**: Comprehensive security documentation
- **GPU Setup Guide**: Detailed GPU acceleration setup
- **MongoDB Connection Guide**: Connection configuration explained
- **Subscriber Status Guide**: How subscriber status system works
### Configuration
- **MongoDB URI**: Updated to use Docker service name (`mongodb` instead of `localhost`)
- **Ollama URL**: Configured for internal Docker network (`http://ollama:11434`)
- **Single .env File**: All configuration in `backend/.env`
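
Inside the Docker network, services address each other by service name rather than `localhost`. The sketch below shows one way such defaults could be loaded; the environment variable names (`MONGODB_URI`, `OLLAMA_BASE_URL`), the helper name, and the MongoDB port 27017 (MongoDB's standard default) are assumptions, while the `mongodb` host and `http://ollama:11434` come from the entries above.

```python
import os
from urllib.parse import urlparse


def load_service_urls(env=None):
    """Return service URLs, defaulting to Docker service names as hosts."""
    env = os.environ if env is None else env
    return {
        # Variable names are assumptions, not confirmed by the changelog.
        "mongodb": env.get("MONGODB_URI", "mongodb://mongodb:27017/"),
        "ollama": env.get("OLLAMA_BASE_URL", "http://ollama:11434"),
    }


if __name__ == "__main__":
    for service, url in load_service_urls().items():
        parsed = urlparse(url)
        print(f"{service}: host={parsed.hostname} port={parsed.port}")
```

These hostnames resolve only from containers attached to the same Docker network, which is consistent with MongoDB and Ollama no longer being exposed to the host.
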
### Testing
- **Connectivity Tests**: `test-mongodb-connectivity.sh`
- **Ollama Tests**: `test-ollama-setup.sh`
- **Newsletter API Tests**: `test-newsletter-api.sh`