update
143
backend/DATABASE_SCHEMA.md
Normal file
@@ -0,0 +1,143 @@
# MongoDB Database Schema

This document describes the MongoDB collections and their structure for Munich News Daily.

## Collections

### 1. Articles Collection (`articles`)

Stores all news articles aggregated from Munich news sources.

**Document Structure:**

```javascript
{
  _id: ObjectId,               // Auto-generated MongoDB ID
  title: String,               // Article title (required)
  author: String,              // Article author (optional, extracted during crawl)
  link: String,                // Article URL (required, unique)
  content: String,             // Full article content (no length limit)
  summary: String,             // AI-generated English summary (≤150 words)
  word_count: Number,          // Word count of full content
  summary_word_count: Number,  // Word count of AI summary
  source: String,              // News source name (e.g., "Süddeutsche Zeitung München")
  published_at: String,        // Original publication date from RSS feed or crawled
  crawled_at: DateTime,        // When article content was crawled (UTC)
  summarized_at: DateTime,     // When AI summary was generated (UTC)
  created_at: DateTime         // When article was added to database (UTC)
}
```

**Indexes:**

- `link` - Unique index to prevent duplicate articles
- `created_at` - Index for efficient sorting by date

**Example Document:**

```javascript
{
  _id: ObjectId("507f1f77bcf86cd799439011"),
  title: "New U-Bahn Line Opens in Munich",
  author: "Max Mustermann",
  link: "https://www.sueddeutsche.de/muenchen/ubahn-1.123456",
  content: "The new U-Bahn line connecting the city center with the airport opened today. Mayor Dieter Reiter attended the opening ceremony... [full article text continues]",
  summary: "Munich's new U-Bahn line connecting the city center to the airport opened today with Mayor Dieter Reiter in attendance. The line features 10 stations and runs every 10 minutes during peak hours, significantly reducing travel time. Construction took five years and cost approximately 2 billion euros.",
  word_count: 1250,
  summary_word_count: 48,
  source: "Süddeutsche Zeitung München",
  published_at: "Mon, 15 Jan 2024 10:00:00 +0100",
  crawled_at: ISODate("2024-01-15T09:30:00.000Z"),
  summarized_at: ISODate("2024-01-15T09:30:15.000Z"),
  created_at: ISODate("2024-01-15T09:00:00.000Z")
}
```

### 2. Subscribers Collection (`subscribers`)

Stores all newsletter subscribers.

**Document Structure:**

```javascript
{
  _id: ObjectId,            // Auto-generated MongoDB ID
  email: String,            // Subscriber email (required, unique, lowercase)
  subscribed_at: DateTime,  // When user subscribed (UTC)
  status: String            // Subscription status: 'active' or 'inactive'
}
```

**Indexes:**

- `email` - Unique index for email lookups and preventing duplicates
- `subscribed_at` - Index for analytics and sorting

**Example Document:**

```javascript
{
  _id: ObjectId("507f1f77bcf86cd799439012"),
  email: "user@example.com",
  subscribed_at: ISODate("2024-01-15T08:30:00.000Z"),
  status: "active"
}
```

## Design Decisions

### Why MongoDB?

1. **Flexibility**: Easy to add new fields without schema migrations
2. **Scalability**: Handles large volumes of articles and subscribers efficiently
3. **Performance**: Indexes on frequently queried fields (link, email, created_at)
4. **Document Model**: Natural fit for news articles and subscriber data

### Schema Choices

1. **Unique Link Index**: Prevents duplicate articles from being stored, even if fetched multiple times
2. **Status Field**: Soft delete for subscribers (set to 'inactive' instead of deleting) - allows for analytics and easy re-subscription
3. **UTC Timestamps**: All dates stored in UTC for consistency across timezones
4. **Lowercase Emails**: Emails stored in lowercase to prevent case-sensitivity issues
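Choices 2 and 4 can be sketched in a few lines of Python (illustrative helper names, not the actual service code):

```python
def normalize_email(raw):
    """Store emails lowercase so lookups are case-insensitive."""
    return raw.strip().lower()

def soft_unsubscribe(subscriber):
    """Soft delete: flip status instead of removing the document."""
    subscriber['status'] = 'inactive'
    return subscriber

# The stored document keeps its email and history after unsubscribing
doc = {'email': normalize_email('  User@Example.COM '), 'status': 'active'}
soft_unsubscribe(doc)
```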

### Future Enhancements

Potential fields to add in the future:

**Articles:**
- `category`: String (e.g., "politics", "sports", "culture")
- `tags`: Array of Strings
- `image_url`: String
- `sent_in_newsletter`: Boolean (track if article was sent)
- `sent_at`: DateTime (when article was included in newsletter)

**Subscribers:**
- `preferences`: Object (newsletter frequency, categories, etc.)
- `last_sent_at`: DateTime (last newsletter sent date)
- `unsubscribed_at`: DateTime (when user unsubscribed)
- `verification_token`: String (for email verification)

## AI Summarization Workflow

When the crawler processes an article:

1. **Extract Content**: Full article text is extracted from the webpage
2. **Summarize with Ollama**: If `OLLAMA_ENABLED=true`, the content is sent to Ollama for summarization
3. **Store Both**: Both the original `content` and AI-generated `summary` are stored
4. **Fallback**: If Ollama is unavailable or fails, only the original content is stored
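The steps above can be sketched as follows (a hypothetical helper, not the crawler's actual code — the `summarize` callable stands in for the Ollama client):

```python
def process_article(content, summarize=None):
    """Build an article document; attach a summary only when the
    summarizer is configured and succeeds (the step-4 fallback)."""
    doc = {
        'content': content,
        'word_count': len(content.split()),
    }
    if summarize is not None:
        try:
            summary = summarize(content)
        except Exception:
            summary = None  # Ollama unreachable or failed
        if summary:
            doc['summary'] = summary
            doc['summary_word_count'] = len(summary.split())
    return doc

# With a working summarizer both fields are stored;
# with a failing or absent one, only the original content is kept.
ok = process_article("long article text", summarize=lambda c: "short summary")
fallback = process_article("long article text", summarize=None)
```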

### Summary Field Details

- **Language**: Always in English, regardless of source article language
- **Length**: Maximum 150 words
- **Format**: Plain text, concise and clear
- **Purpose**: Quick preview for newsletters and frontend display
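A minimal way to enforce the 150-word cap as a safety net (a sketch; the primary limit is requested in the summarization prompt itself):

```python
MAX_SUMMARY_WORDS = 150

def trim_summary(text, limit=MAX_SUMMARY_WORDS):
    """Truncate a plain-text summary to at most `limit` words."""
    words = text.split()
    return ' '.join(words[:limit])
```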

### Querying Articles

```javascript
// Get articles with AI summaries
db.articles.find({ summary: { $exists: true, $ne: null } })

// Get articles without summaries
db.articles.find({ summary: { $exists: false } })

// Count summarized articles
db.articles.countDocuments({ summary: { $exists: true, $ne: null } })
```
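To make the filter semantics concrete, here is a tiny in-memory evaluator covering just the two operators the queries above use (`$exists` and `$ne`) — an illustration only, not a substitute for MongoDB's full query engine (in particular, real MongoDB treats a missing field as equal to `null` for `$ne`):

```python
def matches(doc, query):
    """Evaluate a filter like {'summary': {'$exists': True, '$ne': None}}.
    Handles only $exists and $ne on top-level fields."""
    for field, cond in query.items():
        present = field in doc
        if '$exists' in cond and cond['$exists'] != present:
            return False
        if '$ne' in cond and present and doc[field] == cond['$ne']:
            return False
    return True

docs = [{'summary': 'text'}, {'summary': None}, {}]
# Only the first document has a real summary
with_summary = [d for d in docs
                if matches(d, {'summary': {'$exists': True, '$ne': None}})]
```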
98
backend/STRUCTURE.md
Normal file
@@ -0,0 +1,98 @@
# Backend Structure

The backend has been modularized for better maintainability and scalability.

## Directory Structure

```
backend/
├── app.py                # Main Flask application entry point
├── config.py             # Configuration management
├── database.py           # Database connection and initialization
├── requirements.txt      # Python dependencies
├── .env                  # Environment variables
│
├── routes/               # API route handlers (blueprints)
│   ├── __init__.py
│   ├── subscription_routes.py  # /api/subscribe, /api/unsubscribe
│   ├── news_routes.py          # /api/news, /api/stats
│   ├── rss_routes.py           # /api/rss-feeds (CRUD operations)
│   └── ollama_routes.py        # /api/ollama/* (AI features)
│
└── services/             # Business logic layer
    ├── __init__.py
    ├── news_service.py   # News fetching and storage logic
    ├── email_service.py  # Newsletter email sending
    └── ollama_service.py # Ollama AI integration
```

## Key Components

### app.py
- Main Flask application
- Registers all blueprints
- Minimal code, just wiring things together

### config.py
- Centralized configuration
- Loads environment variables
- Single source of truth for all settings

### database.py
- MongoDB connection setup
- Collection definitions
- Database initialization with indexes

### routes/
Each route file is a Flask Blueprint handling specific API endpoints:
- **subscription_routes.py**: User subscription management
- **news_routes.py**: News fetching and statistics
- **rss_routes.py**: RSS feed management (add/remove/list/toggle)
- **ollama_routes.py**: AI/Ollama integration endpoints

### services/
Business logic separated from route handlers:
- **news_service.py**: Fetches news from RSS feeds, saves to database
- **email_service.py**: Sends newsletter emails to subscribers
- **ollama_service.py**: Communicates with the Ollama AI server

## Benefits of This Structure

1. **Separation of Concerns**: Routes handle HTTP, services handle business logic
2. **Testability**: Each module can be tested independently
3. **Maintainability**: Easy to find and modify specific functionality
4. **Scalability**: Easy to add new routes or services
5. **Reusability**: Services can be used by multiple routes

## Adding New Features

### To add a new API endpoint:
1. Create a new route file in `routes/` or add to an existing one
2. Create a Blueprint and define routes
3. Register the blueprint in `app.py`

### To add new business logic:
1. Create a new service file in `services/`
2. Import and use it in your route handlers

### Example:
```python
# services/my_service.py
def my_business_logic():
    return "Hello"


# routes/my_routes.py
from flask import Blueprint
from services.my_service import my_business_logic

my_bp = Blueprint('my', __name__)

@my_bp.route('/api/my-endpoint')
def my_endpoint():
    result = my_business_logic()
    return {'message': result}


# app.py
from routes.my_routes import my_bp
app.register_blueprint(my_bp)
```
29
backend/app.py
Normal file
@@ -0,0 +1,29 @@
from flask import Flask
from flask_cors import CORS
from config import Config
from database import init_db
from routes.subscription_routes import subscription_bp
from routes.news_routes import news_bp
from routes.rss_routes import rss_bp
from routes.ollama_routes import ollama_bp
from routes.newsletter_routes import newsletter_bp

# Initialize Flask app
app = Flask(__name__)
CORS(app)

# Initialize database
init_db()

# Register blueprints
app.register_blueprint(subscription_bp)
app.register_blueprint(news_bp)
app.register_blueprint(rss_bp)
app.register_blueprint(ollama_bp)
app.register_blueprint(newsletter_bp)

# Print configuration
Config.print_config()

if __name__ == '__main__':
    app.run(debug=True, port=Config.FLASK_PORT, host='127.0.0.1')
52
backend/config.py
Normal file
@@ -0,0 +1,52 @@
import os
from dotenv import load_dotenv
from pathlib import Path

# Get the directory where this script is located
backend_dir = Path(__file__).parent
env_path = backend_dir / '.env'

# Load .env file
load_dotenv(dotenv_path=env_path)

# Debug: Print whether the .env file exists (for troubleshooting)
if env_path.exists():
    print(f"✓ Loading .env file from: {env_path}")
else:
    print(f"⚠ Warning: .env file not found at {env_path}")
    print(f"  Current working directory: {os.getcwd()}")
    print(f"  Looking for .env in: {env_path}")


class Config:
    """Application configuration"""

    # MongoDB
    MONGODB_URI = os.getenv('MONGODB_URI', 'mongodb://localhost:27017/')
    DB_NAME = 'munich_news'

    # Email
    SMTP_SERVER = os.getenv('SMTP_SERVER', 'smtp.gmail.com')
    SMTP_PORT = int(os.getenv('SMTP_PORT', '587'))
    EMAIL_USER = os.getenv('EMAIL_USER', '')
    EMAIL_PASSWORD = os.getenv('EMAIL_PASSWORD', '')

    # Ollama
    OLLAMA_BASE_URL = os.getenv('OLLAMA_BASE_URL', 'http://localhost:11434')
    OLLAMA_MODEL = os.getenv('OLLAMA_MODEL', 'llama2')
    OLLAMA_API_KEY = os.getenv('OLLAMA_API_KEY', '')
    OLLAMA_ENABLED = os.getenv('OLLAMA_ENABLED', 'false').lower() == 'true'

    # Flask
    FLASK_PORT = int(os.getenv('FLASK_PORT', '5000'))

    @classmethod
    def print_config(cls):
        """Print configuration (without sensitive data)"""
        print("\nApplication Configuration:")
        print(f"  MongoDB URI: {cls.MONGODB_URI}")
        print(f"  Database: {cls.DB_NAME}")
        print(f"  Flask Port: {cls.FLASK_PORT}")
        print(f"  Ollama Base URL: {cls.OLLAMA_BASE_URL}")
        print(f"  Ollama Model: {cls.OLLAMA_MODEL}")
        print(f"  Ollama Enabled: {cls.OLLAMA_ENABLED}")
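The pattern `config.py` relies on — `os.getenv` with a string default, then an explicit cast or comparison — can be exercised on its own (a standalone sketch, independent of the `Config` class):

```python
import os

# Unset variable: the default string is returned and cast to int
os.environ.pop('FLASK_PORT', None)
port = int(os.getenv('FLASK_PORT', '5000'))

# Set variable: the environment value wins; boolean flags are
# compared case-insensitively, matching OLLAMA_ENABLED above
os.environ['OLLAMA_ENABLED'] = 'True'
enabled = os.getenv('OLLAMA_ENABLED', 'false').lower() == 'true'
```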
53
backend/database.py
Normal file
@@ -0,0 +1,53 @@
from pymongo import MongoClient
from datetime import datetime
from config import Config

# MongoDB setup
client = MongoClient(Config.MONGODB_URI)
db = client[Config.DB_NAME]

# Collections
articles_collection = db['articles']
subscribers_collection = db['subscribers']
rss_feeds_collection = db['rss_feeds']


def init_db():
    """Initialize database with indexes"""
    # Create unique index on article links to prevent duplicates
    articles_collection.create_index('link', unique=True)
    # Create index on created_at for faster sorting
    articles_collection.create_index('created_at')
    # Create unique index on subscriber emails
    subscribers_collection.create_index('email', unique=True)
    # Create index on subscribed_at
    subscribers_collection.create_index('subscribed_at')
    # Create unique index on RSS feed URLs
    rss_feeds_collection.create_index('url', unique=True)

    # Initialize default RSS feeds if the collection is empty
    if rss_feeds_collection.count_documents({}) == 0:
        default_feeds = [
            {
                'name': 'Süddeutsche Zeitung München',
                'url': 'https://www.sueddeutsche.de/muenchen/rss',
                'active': True,
                'created_at': datetime.utcnow()
            },
            {
                'name': 'Münchner Merkur',
                'url': 'https://www.merkur.de/muenchen/rss',
                'active': True,
                'created_at': datetime.utcnow()
            },
            {
                'name': 'Abendzeitung München',
                'url': 'https://www.abendzeitung-muenchen.de/rss',
                'active': True,
                'created_at': datetime.utcnow()
            }
        ]
        rss_feeds_collection.insert_many(default_feeds)
        print(f"Initialized {len(default_feeds)} default RSS feeds")

    print("Database initialized with indexes")
32
backend/env.template
Normal file
@@ -0,0 +1,32 @@
# MongoDB Configuration
# For Docker Compose (no authentication):
MONGODB_URI=mongodb://localhost:27017/
# For Docker Compose with authentication:
# MONGODB_URI=mongodb://admin:password@localhost:27017/
# For MongoDB Atlas (cloud):
# MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/

# Email Configuration (for sending newsletters)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
EMAIL_USER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Note: For Gmail, use an App Password: https://support.google.com/accounts/answer/185833

# Ollama Configuration (for AI-powered features)
# Remote Ollama server URL (e.g., http://your-server-ip:11434 or https://your-domain.com)
OLLAMA_BASE_URL=http://localhost:11434
# Optional: API key if your Ollama server requires authentication
# OLLAMA_API_KEY=your-api-key-here
# Model name to use (e.g., llama2, mistral, codellama, llama3, phi3:latest)
OLLAMA_MODEL=phi3:latest
# Enable/disable Ollama features (true/false)
# When enabled, the crawler will automatically summarize articles in English (≤150 words)
OLLAMA_ENABLED=true
# Timeout for Ollama requests in seconds (default: 30)
OLLAMA_TIMEOUT=30

# Flask Server Configuration
# Port for the Flask server (default: 5001 to avoid the AirPlay conflict on macOS)
FLASK_PORT=5001
61
backend/fix_duplicates.py
Normal file
@@ -0,0 +1,61 @@
"""
Script to fix duplicate RSS feeds and create a unique index.
Run this once: python fix_duplicates.py
"""
from pymongo import MongoClient
from config import Config

# Connect to MongoDB
client = MongoClient(Config.MONGODB_URI)
db = client[Config.DB_NAME]
rss_feeds_collection = db['rss_feeds']

print("Fixing duplicate RSS feeds...")

# Get all feeds
all_feeds = list(rss_feeds_collection.find())
print(f"Total feeds found: {len(all_feeds)}")

# Find duplicates by URL
seen_urls = {}
duplicates_to_remove = []

for feed in all_feeds:
    url = feed.get('url')
    if url in seen_urls:
        # This is a duplicate, mark it for removal
        duplicates_to_remove.append(feed['_id'])
        print(f"  Duplicate found: {feed['name']} - {url}")
    else:
        # First occurrence, keep it
        seen_urls[url] = feed['_id']

# Remove duplicates
if duplicates_to_remove:
    result = rss_feeds_collection.delete_many({'_id': {'$in': duplicates_to_remove}})
    print(f"Removed {result.deleted_count} duplicate feeds")
else:
    print("No duplicates found")

# Drop existing indexes (if any)
print("\nDropping existing indexes...")
try:
    rss_feeds_collection.drop_indexes()
    print("Indexes dropped")
except Exception as e:
    print(f"Note: {e}")

# Create unique index on URL
print("\nCreating unique index on 'url' field...")
rss_feeds_collection.create_index('url', unique=True)
print("✓ Unique index created successfully")

# Verify
remaining_feeds = list(rss_feeds_collection.find())
print(f"\nFinal feed count: {len(remaining_feeds)}")
print("\nRemaining feeds:")
for feed in remaining_feeds:
    print(f"  - {feed['name']}: {feed['url']}")

print("\n✓ Done! Duplicates removed and unique index created.")
print("You can now restart your Flask app.")
8
backend/requirements.txt
Normal file
@@ -0,0 +1,8 @@
Flask==3.0.0
flask-cors==4.0.0
feedparser==6.0.10
python-dotenv==1.0.0
pymongo==4.6.1
requests==2.31.0
Jinja2==3.1.2
1
backend/routes/__init__.py
Normal file
@@ -0,0 +1 @@
# Routes package
123
backend/routes/news_routes.py
Normal file
@@ -0,0 +1,123 @@
from flask import Blueprint, jsonify
from database import articles_collection
from services.news_service import fetch_munich_news, save_articles_to_db

news_bp = Blueprint('news', __name__)


@news_bp.route('/api/news', methods=['GET'])
def get_news():
    """Get latest Munich news"""
    try:
        # Fetch fresh news and save to database
        articles = fetch_munich_news()
        save_articles_to_db(articles)

        # Get articles from MongoDB, sorted by created_at (newest first)
        cursor = articles_collection.find().sort('created_at', -1).limit(20)

        db_articles = []
        for doc in cursor:
            article = {
                'title': doc.get('title', ''),
                'author': doc.get('author'),
                'link': doc.get('link', ''),
                'source': doc.get('source', ''),
                'published': doc.get('published_at', ''),
                'word_count': doc.get('word_count'),
                'has_full_content': bool(doc.get('content')),
                'has_summary': bool(doc.get('summary'))
            }

            # Include AI summary if available
            if doc.get('summary'):
                article['summary'] = doc.get('summary', '')
                article['summary_word_count'] = doc.get('summary_word_count')
                article['summarized_at'] = doc.get('summarized_at').isoformat() if doc.get('summarized_at') else None
            # Fallback: include a preview of the content if there is no summary (first 200 chars)
            elif doc.get('content'):
                article['preview'] = doc.get('content', '')[:200] + '...'

            db_articles.append(article)

        # Combine fresh articles with database articles and deduplicate
        seen_links = set()
        combined = []

        # Add fresh articles first (they're more recent)
        for article in articles:
            link = article.get('link', '')
            if link and link not in seen_links:
                seen_links.add(link)
                combined.append(article)

        # Add database articles
        for article in db_articles:
            link = article.get('link', '')
            if link and link not in seen_links:
                seen_links.add(link)
                combined.append(article)

        return jsonify({'articles': combined[:20]}), 200
    except Exception as e:
        return jsonify({'error': str(e)}), 500


@news_bp.route('/api/news/<path:article_url>', methods=['GET'])
def get_article_by_url(article_url):
    """Get full article content by URL"""
    try:
        # Decode URL
        from urllib.parse import unquote
        decoded_url = unquote(article_url)

        # Find article by link
        article = articles_collection.find_one({'link': decoded_url})

        if not article:
            return jsonify({'error': 'Article not found'}), 404

        return jsonify({
            'title': article.get('title', ''),
            'author': article.get('author'),
            'link': article.get('link', ''),
            'content': article.get('content', ''),
            'summary': article.get('summary'),
            'word_count': article.get('word_count', 0),
            'summary_word_count': article.get('summary_word_count'),
            'source': article.get('source', ''),
            'published_at': article.get('published_at', ''),
            'crawled_at': article.get('crawled_at').isoformat() if article.get('crawled_at') else None,
            'summarized_at': article.get('summarized_at').isoformat() if article.get('summarized_at') else None,
            'created_at': article.get('created_at').isoformat() if article.get('created_at') else None
        }), 200
    except Exception as e:
        return jsonify({'error': str(e)}), 500


@news_bp.route('/api/stats', methods=['GET'])
def get_stats():
    """Get subscription statistics"""
    try:
        from database import subscribers_collection

        # Count only active subscribers
        subscriber_count = subscribers_collection.count_documents({'status': 'active'})

        # Also get the total article count
        article_count = articles_collection.count_documents({})

        # Count crawled articles
        crawled_count = articles_collection.count_documents({'content': {'$exists': True, '$ne': ''}})

        # Count summarized articles
        summarized_count = articles_collection.count_documents({'summary': {'$exists': True, '$ne': ''}})

        return jsonify({
            'subscribers': subscriber_count,
            'articles': article_count,
            'crawled_articles': crawled_count,
            'summarized_articles': summarized_count
        }), 200
    except Exception as e:
        return jsonify({'error': str(e)}), 500
62
backend/routes/newsletter_routes.py
Normal file
@@ -0,0 +1,62 @@
from flask import Blueprint, Response
from pathlib import Path
from jinja2 import Template
from datetime import datetime
from database import articles_collection

newsletter_bp = Blueprint('newsletter', __name__)


@newsletter_bp.route('/api/newsletter/preview', methods=['GET'])
def preview_newsletter():
    """Preview the newsletter HTML (for testing)"""
    try:
        # Get latest articles with AI summaries
        cursor = articles_collection.find(
            {'summary': {'$exists': True, '$ne': None}}
        ).sort('created_at', -1).limit(10)

        articles = []
        for doc in cursor:
            articles.append({
                'title': doc.get('title', ''),
                'author': doc.get('author'),
                'link': doc.get('link', ''),
                'summary': doc.get('summary', ''),
                'source': doc.get('source', ''),
                'published_at': doc.get('published_at', '')
            })

        if not articles:
            return Response(
                "<h1>No articles with summaries found</h1><p>Run the crawler with Ollama enabled first.</p>",
                mimetype='text/html'
            )

        # Load template
        template_path = Path(__file__).parent.parent / 'templates' / 'newsletter_template.html'
        with open(template_path, 'r', encoding='utf-8') as f:
            template_content = f.read()

        template = Template(template_content)

        # Prepare data
        now = datetime.now()
        template_data = {
            'date': now.strftime('%A, %B %d, %Y'),
            'year': now.year,
            'article_count': len(articles),
            'articles': articles,
            'unsubscribe_link': 'http://localhost:3000/unsubscribe',
            'website_link': 'http://localhost:3000'
        }

        # Render and return HTML
        html_content = template.render(**template_data)
        return Response(html_content, mimetype='text/html')

    except Exception as e:
        return Response(
            f"<h1>Error</h1><p>{str(e)}</p>",
            mimetype='text/html'
        ), 500
158
backend/routes/ollama_routes.py
Normal file
@@ -0,0 +1,158 @@
from flask import Blueprint, jsonify
from config import Config
from services.ollama_service import call_ollama, list_ollama_models
import os

ollama_bp = Blueprint('ollama', __name__)


@ollama_bp.route('/api/ollama/ping', methods=['GET', 'POST'])
def ping_ollama():
    """Test connection to the Ollama server"""
    try:
        # Check if Ollama is enabled
        if not Config.OLLAMA_ENABLED:
            return jsonify({
                'status': 'disabled',
                'message': 'Ollama is not enabled. Set OLLAMA_ENABLED=true in your .env file.',
                'ollama_config': {
                    'base_url': Config.OLLAMA_BASE_URL,
                    'model': Config.OLLAMA_MODEL,
                    'enabled': False
                }
            }), 200

        # Send a simple test prompt
        test_prompt = "Say 'Hello! I am connected and working.' in one sentence."
        system_prompt = "You are a helpful assistant. Respond briefly and concisely."

        response_text, error_message = call_ollama(test_prompt, system_prompt)

        if response_text:
            return jsonify({
                'status': 'success',
                'message': 'Successfully connected to Ollama',
                'response': response_text,
                'ollama_config': {
                    'base_url': Config.OLLAMA_BASE_URL,
                    'model': Config.OLLAMA_MODEL,
                    'enabled': True
                }
            }), 200
        else:
            # Try to get available models for a better error message
            available_models, _ = list_ollama_models()

            troubleshooting = {
                'check_server': f'Verify Ollama is running at {Config.OLLAMA_BASE_URL}',
                'check_model': f'Verify model "{Config.OLLAMA_MODEL}" is available (run: ollama list)',
                'test_connection': f'Test manually: curl {Config.OLLAMA_BASE_URL}/api/generate -d \'{{"model":"{Config.OLLAMA_MODEL}","prompt":"test"}}\''
            }

            if available_models:
                troubleshooting['available_models'] = available_models
                troubleshooting['suggestion'] = f'Try setting OLLAMA_MODEL to one of: {", ".join(available_models[:5])}'

            return jsonify({
                'status': 'error',
                'message': error_message or 'Failed to get response from Ollama',
                'error_details': error_message,
                'ollama_config': {
                    'base_url': Config.OLLAMA_BASE_URL,
                    'model': Config.OLLAMA_MODEL,
                    'enabled': True
                },
                'troubleshooting': troubleshooting
            }), 500

    except Exception as e:
        return jsonify({
            'status': 'error',
            'message': f'Error connecting to Ollama: {str(e)}',
            'ollama_config': {
                'base_url': Config.OLLAMA_BASE_URL,
                'model': Config.OLLAMA_MODEL,
                'enabled': Config.OLLAMA_ENABLED
            }
        }), 500


@ollama_bp.route('/api/ollama/config', methods=['GET'])
def get_ollama_config():
    """Get current Ollama configuration (for debugging)"""
    try:
        from pathlib import Path
        backend_dir = Path(__file__).parent.parent
        env_path = backend_dir / '.env'

        return jsonify({
            'ollama_config': {
                'base_url': Config.OLLAMA_BASE_URL,
                'model': Config.OLLAMA_MODEL,
                'enabled': Config.OLLAMA_ENABLED,
                'has_api_key': bool(Config.OLLAMA_API_KEY)
            },
            'env_file_path': str(env_path),
            'env_file_exists': env_path.exists(),
            'current_working_directory': os.getcwd()
        }), 200
    except Exception as e:
        return jsonify({
            'error': str(e),
            'ollama_config': {
                'base_url': Config.OLLAMA_BASE_URL,
                'model': Config.OLLAMA_MODEL,
                'enabled': Config.OLLAMA_ENABLED
            }
        }), 500


@ollama_bp.route('/api/ollama/models', methods=['GET'])
def get_ollama_models():
    """List available models on the Ollama server"""
    try:
        if not Config.OLLAMA_ENABLED:
            return jsonify({
                'status': 'disabled',
                'message': 'Ollama is not enabled. Set OLLAMA_ENABLED=true in your .env file.',
                'ollama_config': {
                    'base_url': Config.OLLAMA_BASE_URL,
                    'model': Config.OLLAMA_MODEL,
                    'enabled': False
                }
            }), 200

        models, error_message = list_ollama_models()

        if models is not None:
            return jsonify({
                'status': 'success',
                'models': models,
                'current_model': Config.OLLAMA_MODEL,
                'ollama_config': {
                    'base_url': Config.OLLAMA_BASE_URL,
                    'model': Config.OLLAMA_MODEL,
                    'enabled': True
                }
            }), 200
        else:
            return jsonify({
                'status': 'error',
                'message': error_message or 'Failed to list models',
                'ollama_config': {
                    'base_url': Config.OLLAMA_BASE_URL,
                    'model': Config.OLLAMA_MODEL,
                    'enabled': True
                }
            }), 500

    except Exception as e:
        return jsonify({
            'status': 'error',
            'message': f'Error listing models: {str(e)}',
            'ollama_config': {
                'base_url': Config.OLLAMA_BASE_URL,
                'model': Config.OLLAMA_MODEL,
                'enabled': Config.OLLAMA_ENABLED
            }
        }), 500
124
backend/routes/rss_routes.py
Normal file
@@ -0,0 +1,124 @@
from flask import Blueprint, request, jsonify
from datetime import datetime
from pymongo.errors import DuplicateKeyError
from bson.objectid import ObjectId
import feedparser
from database import rss_feeds_collection

rss_bp = Blueprint('rss', __name__)


@rss_bp.route('/api/rss-feeds', methods=['GET'])
def get_rss_feeds():
    """Get all RSS feeds"""
    try:
        cursor = rss_feeds_collection.find().sort('created_at', -1)
        feeds = []
        for feed in cursor:
            feeds.append({
                'id': str(feed['_id']),
                'name': feed.get('name', ''),
                'url': feed.get('url', ''),
                'active': feed.get('active', True),
                'created_at': feed.get('created_at').isoformat() if feed.get('created_at') else ''
            })
        return jsonify({'feeds': feeds}), 200
    except Exception as e:
        return jsonify({'error': str(e)}), 500


@rss_bp.route('/api/rss-feeds', methods=['POST'])
def add_rss_feed():
    """Add a new RSS feed"""
    data = request.json
    name = data.get('name', '').strip()
    url = data.get('url', '').strip()

    if not name or not url:
        return jsonify({'error': 'Name and URL are required'}), 400

    if not url.startswith('http://') and not url.startswith('https://'):
        return jsonify({'error': 'URL must start with http:// or https://'}), 400

    try:
        # Test if the RSS feed is valid
        try:
            feed = feedparser.parse(url)
            if not feed.entries:
                return jsonify({'error': 'Invalid RSS feed or no entries found'}), 400
        except Exception as e:
            return jsonify({'error': f'Failed to parse RSS feed: {str(e)}'}), 400

        feed_doc = {
            'name': name,
            'url': url,
            'active': True,
            'created_at': datetime.utcnow()
        }

        try:
            result = rss_feeds_collection.insert_one(feed_doc)
            return jsonify({
                'message': 'RSS feed added successfully',
                'id': str(result.inserted_id)
            }), 201
        except DuplicateKeyError:
            return jsonify({'error': 'RSS feed URL already exists'}), 409

    except Exception as e:
        return jsonify({'error': str(e)}), 500


@rss_bp.route('/api/rss-feeds/<feed_id>', methods=['DELETE'])
def remove_rss_feed(feed_id):
    """Remove an RSS feed"""
    try:
        # Validate ObjectId
        try:
            obj_id = ObjectId(feed_id)
        except Exception:
            return jsonify({'error': 'Invalid feed ID'}), 400

        result = rss_feeds_collection.delete_one({'_id': obj_id})

        if result.deleted_count > 0:
            return jsonify({'message': 'RSS feed removed successfully'}), 200
        else:
            return jsonify({'error': 'RSS feed not found'}), 404

    except Exception as e:
        return jsonify({'error': str(e)}), 500


@rss_bp.route('/api/rss-feeds/<feed_id>/toggle', methods=['PATCH'])
def toggle_rss_feed(feed_id):
    """Toggle RSS feed active status"""
    try:
        # Validate ObjectId
        try:
            obj_id = ObjectId(feed_id)
        except Exception:
            return jsonify({'error': 'Invalid feed ID'}), 400

        # Get current status
        feed = rss_feeds_collection.find_one({'_id': obj_id})
        if not feed:
            return jsonify({'error': 'RSS feed not found'}), 404

        # Toggle status
        new_status = not feed.get('active', True)
        result = rss_feeds_collection.update_one(
            {'_id': obj_id},
            {'$set': {'active': new_status}}
        )

        if result.modified_count > 0:
            return jsonify({
                'message': f'RSS feed {"activated" if new_status else "deactivated"} successfully',
                'active': new_status
            }), 200
        else:
            return jsonify({'error': 'Failed to update RSS feed'}), 500

    except Exception as e:
        return jsonify({'error': str(e)}), 500
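The two request checks in `add_rss_feed` are simple enough to restate as a plain function; a minimal sketch, assuming the non-empty fields and the http(s)-scheme test are the whole contract (`validate_feed_input` is a hypothetical helper, not part of this commit):

```python
def validate_feed_input(data):
    """Mirror of add_rss_feed's request checks; returns (ok, error)."""
    name = (data.get('name') or '').strip()
    url = (data.get('url') or '').strip()
    if not name or not url:
        return False, 'Name and URL are required'
    if not url.startswith(('http://', 'https://')):
        return False, 'URL must start with http:// or https://'
    return True, None

print(validate_feed_input({'name': 'SZ', 'url': 'https://rss.sueddeutsche.de/rss/Politik'}))  # (True, None)
print(validate_feed_input({'name': 'SZ', 'url': 'ftp://example.org/feed'}))
```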
63
backend/routes/subscription_routes.py
Normal file
@@ -0,0 +1,63 @@
from flask import Blueprint, request, jsonify
from datetime import datetime
from pymongo.errors import DuplicateKeyError
from database import subscribers_collection

subscription_bp = Blueprint('subscription', __name__)


@subscription_bp.route('/api/subscribe', methods=['POST'])
def subscribe():
    """Subscribe a user to the newsletter"""
    data = request.json
    email = data.get('email', '').strip().lower()

    if not email or '@' not in email:
        return jsonify({'error': 'Invalid email address'}), 400

    try:
        subscriber_doc = {
            'email': email,
            'subscribed_at': datetime.utcnow(),
            'status': 'active'
        }

        # Try to insert; a duplicate key error means the subscriber already exists
        try:
            subscribers_collection.insert_one(subscriber_doc)
            return jsonify({'message': 'Successfully subscribed!'}), 201
        except DuplicateKeyError:
            # Check if subscriber is active
            existing = subscribers_collection.find_one({'email': email})
            if existing and existing.get('status') == 'active':
                return jsonify({'message': 'Email already subscribed'}), 200
            else:
                # Reactivate if previously unsubscribed
                subscribers_collection.update_one(
                    {'email': email},
                    {'$set': {'status': 'active', 'subscribed_at': datetime.utcnow()}}
                )
                return jsonify({'message': 'Successfully re-subscribed!'}), 200

    except Exception as e:
        return jsonify({'error': str(e)}), 500


@subscription_bp.route('/api/unsubscribe', methods=['POST'])
def unsubscribe():
    """Unsubscribe a user from the newsletter"""
    data = request.json
    email = data.get('email', '').strip().lower()

    try:
        result = subscribers_collection.update_one(
            {'email': email},
            {'$set': {'status': 'inactive'}}
        )

        if result.matched_count > 0:
            return jsonify({'message': 'Successfully unsubscribed'}), 200
        else:
            return jsonify({'error': 'Email not found in subscribers'}), 404
    except Exception as e:
        return jsonify({'error': str(e)}), 500
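The insert-or-reactivate flow in `subscribe` can be sketched without MongoDB by keying an in-memory dict on the normalized email, which plays the same role as the collection's unique index; `subscribe_once` and the `subscribers` dict are illustrative stand-ins, not part of this commit:

```python
from datetime import datetime

subscribers = {}  # in-memory stand-in for subscribers_collection, keyed by email


def subscribe_once(email):
    """Condensed restatement of the route's insert-or-reactivate flow."""
    email = email.strip().lower()
    if not email or '@' not in email:
        return 'Invalid email address', 400
    existing = subscribers.get(email)
    if existing is None:
        # First sight of this address: plain insert
        subscribers[email] = {'status': 'active', 'subscribed_at': datetime.utcnow()}
        return 'Successfully subscribed!', 201
    if existing['status'] == 'active':
        return 'Email already subscribed', 200
    # Previously unsubscribed: reactivate in place
    existing.update(status='active', subscribed_at=datetime.utcnow())
    return 'Successfully re-subscribed!', 200


print(subscribe_once('Alice@Example.com'))   # new address → 201
print(subscribe_once('alice@example.com'))   # duplicate after case-normalization → 200
```

Normalizing with `.strip().lower()` before the lookup is what makes `Alice@Example.com` and `alice@example.com` hit the same document.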
1
backend/services/__init__.py
Normal file
@@ -0,0 +1 @@
# Services package
88
backend/services/email_service.py
Normal file
@@ -0,0 +1,88 @@
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.utils import formatdate, make_msgid
from datetime import datetime
from pathlib import Path
from jinja2 import Template
from config import Config
from database import subscribers_collection, articles_collection


def send_newsletter(max_articles=10):
    """Send newsletter to all subscribers with AI-summarized articles"""
    if not Config.EMAIL_USER or not Config.EMAIL_PASSWORD:
        print("Email credentials not configured")
        return

    # Get latest articles with AI summaries from database
    cursor = articles_collection.find(
        {'summary': {'$exists': True, '$ne': None}}
    ).sort('created_at', -1).limit(max_articles)

    articles = []
    for doc in cursor:
        articles.append({
            'title': doc.get('title', ''),
            'author': doc.get('author'),
            'link': doc.get('link', ''),
            'summary': doc.get('summary', ''),
            'source': doc.get('source', ''),
            'published_at': doc.get('published_at', '')
        })

    if not articles:
        print("No articles with summaries to send")
        return

    # Load email template
    template_path = Path(__file__).parent.parent / 'templates' / 'newsletter_template.html'
    with open(template_path, 'r', encoding='utf-8') as f:
        template_content = f.read()

    template = Template(template_content)

    # Prepare template data
    now = datetime.now()
    template_data = {
        'date': now.strftime('%A, %B %d, %Y'),
        'year': now.year,
        'article_count': len(articles),
        'articles': articles,
        'unsubscribe_link': 'http://localhost:3000',  # Update with actual unsubscribe link
        'website_link': 'http://localhost:3000'
    }

    # Render HTML
    html_content = template.render(**template_data)

    # Get all active subscribers
    subscribers_cursor = subscribers_collection.find({'status': 'active'})
    subscribers = [doc['email'] for doc in subscribers_cursor]

    # Send emails
    for subscriber in subscribers:
        try:
            msg = MIMEMultipart('alternative')
            msg['Subject'] = f'Munich News Daily - {now.strftime("%B %d, %Y")}'
            msg['From'] = f'Munich News Daily <{Config.EMAIL_USER}>'
            msg['To'] = subscriber
            # formatdate/make_msgid produce RFC-compliant headers; strftime('%z') on a
            # naive datetime yields an empty timezone, and a raw recipient address in
            # the Message-ID local part is malformed
            msg['Date'] = formatdate(localtime=True)
            msg['Message-ID'] = make_msgid(domain='dongho.kim')
            msg['X-Mailer'] = 'Munich News Daily'

            # Add plain text version as fallback
            plain_text = "This email requires HTML support. Please view it in an HTML-capable email client."
            msg.attach(MIMEText(plain_text, 'plain', 'utf-8'))

            # Add HTML version
            msg.attach(MIMEText(html_content, 'html', 'utf-8'))

            server = smtplib.SMTP(Config.SMTP_SERVER, Config.SMTP_PORT)
            server.starttls()
            server.login(Config.EMAIL_USER, Config.EMAIL_PASSWORD)
            server.send_message(msg)
            server.quit()

            print(f"Newsletter sent to {subscriber}")
        except Exception as e:
            print(f"Error sending to {subscriber}: {e}")
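The message assembly above follows the standard `multipart/alternative` layout: the plain-text fallback is attached first and the HTML version last, because clients render the last part they support. A self-contained stdlib sketch of that structure:

```python
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

# Same shape the newsletter builds: plain-text fallback first, HTML last
msg = MIMEMultipart('alternative')
msg['Subject'] = 'Munich News Daily - test'
msg.attach(MIMEText('Plain-text fallback', 'plain', 'utf-8'))
msg.attach(MIMEText('<h1>HTML version</h1>', 'html', 'utf-8'))

print(msg.get_content_type())                             # multipart/alternative
print([p.get_content_type() for p in msg.get_payload()])  # ['text/plain', 'text/html']
```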
90
backend/services/news_service.py
Normal file
@@ -0,0 +1,90 @@
import feedparser
from datetime import datetime
from pymongo.errors import DuplicateKeyError
from database import articles_collection, rss_feeds_collection
from utils.rss_utils import extract_article_url, extract_article_summary, extract_published_date


def get_active_rss_feeds():
    """Get all active RSS feeds from database"""
    feeds = []
    cursor = rss_feeds_collection.find({'active': True})
    for feed in cursor:
        feeds.append({
            'name': feed.get('name', ''),
            'url': feed.get('url', '')
        })
    return feeds


def fetch_munich_news():
    """Fetch news from Munich news sources"""
    articles = []

    # Get RSS feeds from database instead of hardcoded list
    sources = get_active_rss_feeds()

    for source in sources:
        try:
            feed = feedparser.parse(source['url'])
            for entry in feed.entries[:5]:  # Get top 5 from each source
                # Extract article URL using utility function
                article_url = extract_article_url(entry)

                if not article_url:
                    print(f"  ⚠ No valid URL for: {entry.get('title', 'Unknown')[:50]}")
                    continue  # Skip entries without valid URL

                # Extract summary
                summary = extract_article_summary(entry)
                if summary:
                    summary = summary[:200] + '...' if len(summary) > 200 else summary

                articles.append({
                    'title': entry.get('title', ''),
                    'link': article_url,
                    'summary': summary,
                    'source': source['name'],
                    'published': extract_published_date(entry)
                })
        except Exception as e:
            print(f"Error fetching from {source['name']}: {e}")

    return articles


def save_articles_to_db(articles):
    """Save articles to MongoDB, avoiding duplicates"""
    saved_count = 0

    for article in articles:
        try:
            # Prepare article document
            article_doc = {
                'title': article.get('title', ''),
                'link': article.get('link', ''),
                'summary': article.get('summary', ''),
                'source': article.get('source', ''),
                'published_at': article.get('published', ''),
                'created_at': datetime.utcnow()
            }

            # Upsert keyed on link: insert if the link is new, leave existing docs alone
            result = articles_collection.update_one(
                {'link': article_doc['link']},
                {'$setOnInsert': article_doc},  # Only set on insert, don't update existing
                upsert=True
            )

            if result.upserted_id:
                saved_count += 1

        except DuplicateKeyError:
            # Link already exists, skip
            pass
        except Exception as e:
            print(f"Error saving article {article.get('link', 'unknown')}: {e}")

    if saved_count > 0:
        print(f"Saved {saved_count} new articles to database")
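The `$setOnInsert` upsert above means the first document written for a given link wins, and repeated crawls of the same URL are no-ops. A pure-Python sketch of that semantics, with a dict standing in for the collection's unique index on `link` (the dict and `save_articles` are illustrative, not part of this commit):

```python
seen = {}  # stand-in for articles_collection, keyed by the unique 'link' field


def save_articles(articles):
    """Condensed $setOnInsert semantics: first write wins, repeats change nothing."""
    saved = 0
    for article in articles:
        if article['link'] not in seen:  # would be an upsert-insert
            seen[article['link']] = article
            saved += 1
    return saved


batch = [
    {'link': 'https://example.org/a', 'title': 'A'},
    {'link': 'https://example.org/a', 'title': 'A (repeat)'},
    {'link': 'https://example.org/b', 'title': 'B'},
]
print(save_articles(batch))                     # 2
print(seen['https://example.org/a']['title'])   # 'A' – existing doc untouched
```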
96
backend/services/ollama_service.py
Normal file
@@ -0,0 +1,96 @@
import requests
from config import Config


def list_ollama_models():
    """List available models on Ollama server"""
    if not Config.OLLAMA_ENABLED:
        return None, "Ollama is not enabled"

    try:
        url = f"{Config.OLLAMA_BASE_URL}/api/tags"
        headers = {}
        if Config.OLLAMA_API_KEY:
            headers["Authorization"] = f"Bearer {Config.OLLAMA_API_KEY}"

        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()

        result = response.json()
        models = result.get('models', [])
        model_names = [model.get('name', '') for model in models]

        return model_names, None
    except requests.exceptions.RequestException as e:
        return None, f"Error listing models: {str(e)}"
    except Exception as e:
        return None, f"Unexpected error: {str(e)}"


def call_ollama(prompt, system_prompt=None):
    """Call Ollama API to generate text"""
    if not Config.OLLAMA_ENABLED:
        return None, "Ollama is not enabled"

    try:
        url = f"{Config.OLLAMA_BASE_URL}/api/generate"
        payload = {
            "model": Config.OLLAMA_MODEL,
            "prompt": prompt,
            "stream": False
        }

        if system_prompt:
            payload["system"] = system_prompt

        headers = {}
        if Config.OLLAMA_API_KEY:
            headers["Authorization"] = f"Bearer {Config.OLLAMA_API_KEY}"

        print(f"Calling Ollama at {url} with model {Config.OLLAMA_MODEL}")
        response = requests.post(url, json=payload, headers=headers, timeout=30)
        response.raise_for_status()

        result = response.json()
        response_text = result.get('response', '').strip()

        if not response_text:
            return None, "Ollama returned empty response"

        return response_text, None
    except requests.exceptions.ConnectionError:
        error_msg = f"Cannot connect to Ollama server at {Config.OLLAMA_BASE_URL}. Is Ollama running?"
        print(f"Connection error: {error_msg}")
        return None, error_msg
    except requests.exceptions.Timeout:
        error_msg = "Request to Ollama timed out after 30 seconds"
        print(f"Timeout error: {error_msg}")
        return None, error_msg
    except requests.exceptions.HTTPError as e:
        # Check if it's a model not found error
        if e.response.status_code == 404:
            try:
                error_data = e.response.json()
                error_text = error_data.get('error', '').lower()
                if 'model' in error_text and 'not found' in error_text:
                    # Try to get available models
                    available_models, _ = list_ollama_models()
                    if available_models:
                        error_msg = f"Model '{Config.OLLAMA_MODEL}' not found. Available models: {', '.join(available_models)}"
                    else:
                        error_msg = f"Model '{Config.OLLAMA_MODEL}' not found. Use 'ollama list' on the server to see available models."
                else:
                    error_msg = f"HTTP error from Ollama: {e.response.status_code} - {e.response.text}"
            except (ValueError, KeyError):
                error_msg = f"HTTP error from Ollama: {e.response.status_code} - {e.response.text}"
        else:
            error_msg = f"HTTP error from Ollama: {e.response.status_code} - {e.response.text}"
        print(f"HTTP error: {error_msg}")
        return None, error_msg
    except requests.exceptions.RequestException as e:
        error_msg = f"Request error: {str(e)}"
        print(f"Request error: {error_msg}")
        return None, error_msg
    except Exception as e:
        error_msg = f"Unexpected error: {str(e)}"
        print(f"Unexpected error: {error_msg}")
        return None, error_msg
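`list_ollama_models` expects the `/api/tags` response to carry a `models` array of objects with a `name` field. A sketch of that parsing step against a sample payload (the model names and sizes are illustrative values, not a live call):

```python
# Typical shape of an Ollama /api/tags response body (sample data, not a live call)
sample = {
    'models': [
        {'name': 'llama3:8b', 'size': 4661224676},
        {'name': 'mistral:latest', 'size': 4109865159},
    ]
}

# Same extraction list_ollama_models performs on response.json()
model_names = [m.get('name', '') for m in sample.get('models', [])]
print(model_names)  # ['llama3:8b', 'mistral:latest']
```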
162
backend/templates/newsletter_template.html
Normal file
@@ -0,0 +1,162 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <title>Munich News Daily</title>
    <!--[if mso]>
    <style type="text/css">
        body, table, td {font-family: Arial, Helvetica, sans-serif !important;}
    </style>
    <![endif]-->
</head>
<body style="margin: 0; padding: 0; background-color: #f4f4f4; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;">
    <!-- Wrapper Table -->
    <table role="presentation" width="100%" cellpadding="0" cellspacing="0" border="0" style="background-color: #f4f4f4;">
        <tr>
            <td align="center" style="padding: 20px 0;">
                <!-- Main Container -->
                <table role="presentation" width="600" cellpadding="0" cellspacing="0" border="0" style="background-color: #ffffff; max-width: 600px;">

                    <!-- Header -->
                    <tr>
                        <td style="background-color: #1a1a1a; padding: 30px 40px; text-align: center;">
                            <h1 style="margin: 0 0 8px 0; font-size: 28px; font-weight: 700; color: #ffffff; letter-spacing: -0.5px;">
                                Munich News Daily
                            </h1>
                            <p style="margin: 0; font-size: 14px; color: #999999; letter-spacing: 0.5px;">
                                {{ date }}
                            </p>
                        </td>
                    </tr>

                    <!-- Greeting -->
                    <tr>
                        <td style="padding: 30px 40px 20px 40px;">
                            <p style="margin: 0; font-size: 16px; line-height: 1.5; color: #333333;">
                                Good morning ☀️
                            </p>
                            <p style="margin: 15px 0 0 0; font-size: 15px; line-height: 1.6; color: #666666;">
                                Here's what's happening in Munich today. We've summarized {{ article_count }} stories using AI so you can stay informed in under 5 minutes.
                            </p>
                        </td>
                    </tr>

                    <!-- Divider -->
                    <tr>
                        <td style="padding: 0 40px;">
                            <div style="height: 1px; background-color: #e0e0e0;"></div>
                        </td>
                    </tr>

                    <!-- Articles -->
                    {% for article in articles %}
                    <tr>
                        <td style="padding: 25px 40px;">
                            <!-- Article Number Badge -->
                            <table role="presentation" width="100%" cellpadding="0" cellspacing="0" border="0">
                                <tr>
                                    <td>
                                        <span style="display: inline-block; background-color: #000000; color: #ffffff; width: 24px; height: 24px; line-height: 24px; text-align: center; border-radius: 50%; font-size: 12px; font-weight: 600;">
                                            {{ loop.index }}
                                        </span>
                                    </td>
                                </tr>
                            </table>

                            <!-- Article Title -->
                            <h2 style="margin: 12px 0 8px 0; font-size: 19px; font-weight: 700; line-height: 1.3; color: #1a1a1a;">
                                {{ article.title }}
                            </h2>

                            <!-- Article Meta -->
                            <p style="margin: 0 0 12px 0; font-size: 13px; color: #999999;">
                                <span style="color: #000000; font-weight: 600;">{{ article.source }}</span>
                                {% if article.author %}
                                <span> • {{ article.author }}</span>
                                {% endif %}
                            </p>

                            <!-- Article Summary -->
                            <p style="margin: 0 0 15px 0; font-size: 15px; line-height: 1.6; color: #333333;">
                                {{ article.summary }}
                            </p>

                            <!-- Read More Link -->
                            <a href="{{ article.link }}" style="display: inline-block; color: #000000; text-decoration: none; font-size: 14px; font-weight: 600; border-bottom: 2px solid #000000; padding-bottom: 2px;">
                                Read more →
                            </a>
                        </td>
                    </tr>

                    <!-- Article Divider -->
                    {% if not loop.last %}
                    <tr>
                        <td style="padding: 0 40px;">
                            <div style="height: 1px; background-color: #f0f0f0;"></div>
                        </td>
                    </tr>
                    {% endif %}
                    {% endfor %}

                    <!-- Bottom Divider -->
                    <tr>
                        <td style="padding: 25px 40px 0 40px;">
                            <div style="height: 1px; background-color: #e0e0e0;"></div>
                        </td>
                    </tr>

                    <!-- Summary Box -->
                    <tr>
                        <td style="padding: 30px 40px;">
                            <table role="presentation" width="100%" cellpadding="0" cellspacing="0" border="0" style="background-color: #f8f8f8; border-radius: 8px;">
                                <tr>
                                    <td style="padding: 25px; text-align: center;">
                                        <p style="margin: 0 0 8px 0; font-size: 13px; color: #666666; text-transform: uppercase; letter-spacing: 1px; font-weight: 600;">
                                            Today's Digest
                                        </p>
                                        <p style="margin: 0; font-size: 36px; font-weight: 700; color: #000000;">
                                            {{ article_count }}
                                        </p>
                                        <p style="margin: 8px 0 0 0; font-size: 14px; color: #666666;">
                                            stories • AI-summarized • 5 min read
                                        </p>
                                    </td>
                                </tr>
                            </table>
                        </td>
                    </tr>

                    <!-- Footer -->
                    <tr>
                        <td style="background-color: #1a1a1a; padding: 30px 40px; text-align: center;">
                            <p style="margin: 0 0 15px 0; font-size: 14px; color: #ffffff; font-weight: 600;">
                                Munich News Daily
                            </p>
                            <p style="margin: 0 0 20px 0; font-size: 13px; color: #999999; line-height: 1.5;">
                                AI-powered news summaries for busy people.<br>
                                Delivered daily to your inbox.
                            </p>

                            <!-- Footer Links -->
                            <p style="margin: 0; font-size: 12px; color: #666666;">
                                <a href="{{ website_link }}" style="color: #999999; text-decoration: none;">Visit Website</a>
                                <span style="color: #444444;"> • </span>
                                <a href="{{ unsubscribe_link }}" style="color: #999999; text-decoration: none;">Unsubscribe</a>
                            </p>

                            <p style="margin: 20px 0 0 0; font-size: 11px; color: #666666;">
                                © {{ year }} Munich News Daily. All rights reserved.
                            </p>
                        </td>
                    </tr>

                </table>
                <!-- End Main Container -->
            </td>
        </tr>
    </table>
    <!-- End Wrapper Table -->
</body>
</html>
128
backend/test_rss_extraction.py
Normal file
@@ -0,0 +1,128 @@
#!/usr/bin/env python
"""
Test RSS feed URL extraction

Run from backend directory with venv activated:
    cd backend
    source venv/bin/activate  # or venv\Scripts\activate on Windows
    python test_rss_extraction.py
"""
from pymongo import MongoClient
from config import Config
import feedparser
from utils.rss_utils import extract_article_url, extract_article_summary, extract_published_date

print("\n" + "=" * 80)
print("RSS Feed URL Extraction Test")
print("=" * 80)

# Connect to database
print(f"\nConnecting to MongoDB: {Config.MONGODB_URI}")
client = MongoClient(Config.MONGODB_URI)
db = client[Config.DB_NAME]

# Get RSS feeds
print("Fetching RSS feeds from database...")
feeds = list(db['rss_feeds'].find())

if not feeds:
    print("\n❌ No RSS feeds in database!")
    print("\nAdd a feed first:")
    print("  curl -X POST http://localhost:5001/api/rss-feeds \\")
    print("    -H 'Content-Type: application/json' \\")
    print("    -d '{\"name\": \"Süddeutsche Politik\", \"url\": \"https://rss.sueddeutsche.de/rss/Politik\"}'")
    exit(1)

print(f"✓ Found {len(feeds)} feed(s)\n")

# Test each feed
total_success = 0
total_fail = 0

for feed_doc in feeds:
    name = feed_doc.get('name', 'Unknown')
    url = feed_doc.get('url', '')
    active = feed_doc.get('active', True)

    print("\n" + "=" * 80)
    print(f"Feed: {name}")
    print(f"URL: {url}")
    print(f"Active: {'Yes' if active else 'No'}")
    print("=" * 80)

    if not active:
        print("⏭ Skipping (inactive)")
        continue

    try:
        # Parse RSS
        print("\nFetching RSS feed...")
        feed = feedparser.parse(url)

        if not feed.entries:
            print("❌ No entries found in feed")
            continue

        print(f"✓ Found {len(feed.entries)} entries")

        # Test first 3 entries
        print("\nTesting first 3 entries:")
        print("-" * 80)

        for i, entry in enumerate(feed.entries[:3], 1):
            print(f"\n📰 Entry {i}:")

            # Title
            title = entry.get('title', 'No title')
            print(f"  Title: {title[:65]}")

            # Test URL extraction
            article_url = extract_article_url(entry)
            if article_url:
                print(f"  ✓ URL: {article_url}")
                total_success += 1
            else:
                print("  ❌ Could not extract URL")
                print(f"    Available fields: {list(entry.keys())[:10]}")
                print(f"    link: {entry.get('link', 'N/A')}")
                print(f"    guid: {entry.get('guid', 'N/A')}")
                print(f"    id: {entry.get('id', 'N/A')}")
                total_fail += 1

            # Test summary
            summary = extract_article_summary(entry)
            if summary:
                print(f"  ✓ Summary: {summary[:70]}...")
            else:
                print("  ⚠ No summary")

            # Test date
            pub_date = extract_published_date(entry)
            if pub_date:
                print(f"  ✓ Date: {pub_date}")
            else:
                print("  ⚠ No date")

    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()

# Summary
print("\n" + "=" * 80)
print("SUMMARY")
print("=" * 80)
print(f"Total URLs tested: {total_success + total_fail}")
print(f"✓ Successfully extracted: {total_success}")
print(f"❌ Failed to extract: {total_fail}")

if total_fail == 0:
    print("\n🎉 All URLs extracted successfully!")
    print("\nYou can now run the crawler:")
    print("  cd ../news_crawler")
    print("  pip install -r requirements.txt")
    print("  python crawler_service.py 5")
else:
    print(f"\n⚠ {total_fail} URL(s) could not be extracted")
    print("Check the output above for details")

print("=" * 80 + "\n")
1
backend/utils/__init__.py
Normal file
@@ -0,0 +1 @@
# Utils package
98
backend/utils/rss_utils.py
Normal file
@@ -0,0 +1,98 @@
"""
Utility functions for RSS feed processing
"""


def extract_article_url(entry):
    """
    Extract article URL from RSS entry.
    Different RSS feeds use different fields for the article URL.

    Args:
        entry: feedparser entry object

    Returns:
        str: Article URL or None if not found

    Examples:
        - Most feeds use 'link'
        - Some use 'guid' as the URL
        - Some use 'id' as the URL
        - Some have guid as a dict with 'href'
    """
    # Try 'link' first (most common)
    if entry.get('link') and entry.get('link', '').startswith('http'):
        return entry.get('link')

    # Try 'guid' if it's a valid URL
    if entry.get('guid'):
        guid = entry.get('guid')
        # guid can be a string
        if isinstance(guid, str) and guid.startswith('http'):
            return guid
        # or a dict with 'href'
        elif isinstance(guid, dict) and guid.get('href', '').startswith('http'):
            return guid.get('href')

    # Try 'id' if it's a valid URL
    if entry.get('id') and entry.get('id', '').startswith('http'):
        return entry.get('id')

    # Try 'links' array (some feeds have multiple links)
    if entry.get('links'):
        for link in entry.get('links', []):
            if isinstance(link, dict) and link.get('href', '').startswith('http'):
                # Prefer 'alternate' type, but accept any http link
                if link.get('type') == 'text/html' or link.get('rel') == 'alternate':
                    return link.get('href')
        # If no alternate found, return first http link
        for link in entry.get('links', []):
            if isinstance(link, dict) and link.get('href', '').startswith('http'):
                return link.get('href')

    return None


def extract_article_summary(entry):
    """
    Extract article summary/description from RSS entry.

    Args:
        entry: feedparser entry object

    Returns:
        str: Article summary or empty string
    """
    # Try different fields
    if entry.get('summary'):
        return entry.get('summary', '')
    elif entry.get('description'):
        return entry.get('description', '')
    elif entry.get('content'):
        # content is usually a list of dicts
        content = entry.get('content', [])
        if content and isinstance(content, list) and len(content) > 0:
            return content[0].get('value', '')

    return ''


def extract_published_date(entry):
    """
    Extract published date from RSS entry.

    Args:
        entry: feedparser entry object

    Returns:
        str: Published date or empty string
    """
    # Try different fields
    if entry.get('published'):
        return entry.get('published', '')
    elif entry.get('updated'):
        return entry.get('updated', '')
    elif entry.get('created'):
        return entry.get('created', '')

    return ''
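The fallback order in `extract_article_url` (`link`, then `guid` as string or dict, then `id`) can be exercised with plain dicts, since feedparser entries are dict-like; `pick_url` below is a condensed reimplementation for illustration, not the module's function:

```python
def pick_url(entry):
    """Condensed sketch of extract_article_url's fallback order on dict-like entries."""
    link = entry.get('link', '')
    if isinstance(link, str) and link.startswith('http'):
        return link
    guid = entry.get('guid')
    if isinstance(guid, str) and guid.startswith('http'):
        return guid
    if isinstance(guid, dict) and guid.get('href', '').startswith('http'):
        return guid['href']
    ident = entry.get('id', '')
    if isinstance(ident, str) and ident.startswith('http'):
        return ident
    return None


print(pick_url({'link': 'https://a.example/x'}))                        # 'link' wins
print(pick_url({'link': '', 'guid': {'href': 'https://b.example/y'}}))  # dict-shaped guid
print(pick_url({'link': 'not-a-url', 'id': 'tag:no-url'}))              # None – nothing usable
```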