update
3
.gitignore
vendored
@@ -186,8 +186,9 @@ docker-compose.override.yml
 mongodb_data/
 ollama_data/
 
-# Database initialization script (may contain sensitive data)
+# Database scripts (may contain sensitive data)
 init-database.sh
+clean-database.sh
 
 # Spec artifacts (optional - uncomment if you don't want to track specs)
 # .kiro/specs/
328
.kiro/specs/article-title-translation/design.md
Normal file
@@ -0,0 +1,328 @@
|
|||||||
|
# Design Document: Article Title Translation
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
This feature extends the existing Ollama AI integration to translate German article titles to English during the crawling process. The translation will be performed immediately after article content extraction and before AI summarization. Both the original German title and English translation will be stored in the MongoDB article document, and the newsletter template will be updated to display the English title prominently with the original as a subtitle.
|
||||||
|
|
||||||
|
The design leverages the existing Ollama infrastructure (same server, configuration, and error handling patterns) to minimize complexity and maintain consistency with the current summarization feature.
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
### Component Interaction Flow
|
||||||
|
|
||||||
|
```
RSS Feed Entry
    ↓
Crawler Service (extract_article_content)
    ↓
Article Data (with German title)
    ↓
Ollama Client (translate_title) ← New Method
    ↓
Translation Result
    ↓
Crawler Service (prepare article_doc)
    ↓
MongoDB (articles collection with title + title_en)
    ↓
Newsletter Service (fetch articles)
    ↓
Newsletter Template (display English title + German subtitle)
    ↓
Email to Subscribers
```
|
||||||
|
|
||||||
|
### Integration Points
|
||||||
|
|
||||||
|
1. **Ollama Client** - Add new `translate_title()` method alongside existing `summarize_article()` method
|
||||||
|
2. **Crawler Service** - Call translation after content extraction, before summarization
|
||||||
|
3. **Article Document Schema** - Add `title_en` and `translated_at` fields
|
||||||
|
4. **Newsletter Template** - Update title display logic to show English/German titles
|
||||||
|
|
||||||
|
## Components and Interfaces
|
||||||
|
|
||||||
|
### 1. Ollama Client Extension
|
||||||
|
|
||||||
|
**New Method: `translate_title(title, target_language='English')`**
|
||||||
|
|
||||||
|
```python
def translate_title(self, title, target_language='English'):
    """
    Translate article title to target language

    Args:
        title (str): Original German title
        target_language (str): Target language (default: 'English')

    Returns:
        dict: {
            'success': bool,
            'translated_title': str or None,
            'error': str or None,
            'duration': float
        }
    """
```
|
||||||
|
|
||||||
|
**Implementation Details:**
|
||||||
|
|
||||||
|
- **Prompt Engineering**: Clear, concise prompt instructing the model to translate only the headline without explanations
|
||||||
|
- **Temperature**: 0.3 (lower than summarization's 0.7) for more consistent, deterministic translations
|
||||||
|
- **Token Limit**: 100 tokens (sufficient for title-length outputs)
|
||||||
|
- **Response Cleaning**:
|
||||||
|
- Remove surrounding quotes (single and double)
|
||||||
|
- Extract first line only (ignore any extra text)
|
||||||
|
- Trim whitespace
|
||||||
|
- **Error Handling**: Same pattern as `summarize_article()` - catch timeouts, connection errors, HTTP errors
|
||||||
|
- **Validation**: Check for empty title input before making API call
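The response-cleaning steps listed above can be sketched as a small helper. This is only an illustration under stated assumptions: the name `clean_translation` is hypothetical and does not appear in the codebase, and it takes the first non-empty line rather than strictly the first line, so a leading blank line from the model does not produce an empty title.

```python
def clean_translation(raw: str) -> str:
    """Reduce a raw model response to a single clean title line.

    Hypothetical helper: the name and exact rules are assumptions,
    not the project's actual implementation.
    """
    # Keep only the first non-empty line; any extra text after it is ignored.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return ""
    first = lines[0]
    # Remove matching surrounding quotes (single or double), layer by layer.
    while len(first) >= 2 and first[0] == first[-1] and first[0] in "\"'":
        first = first[1:-1].strip()
    return first
```

Quotes are stripped only when the same quote character both opens and closes the line, so a headline such as `'Historic day' for Munich` keeps its inner quotes.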
|
||||||
|
|
||||||
|
### 2. Crawler Service Integration
|
||||||
|
|
||||||
|
**Location**: In `crawl_rss_feed()` function, after content extraction
|
||||||
|
|
||||||
|
**Execution Order**:
|
||||||
|
1. Extract article content (existing)
|
||||||
|
2. **Translate title** (new)
|
||||||
|
3. Summarize article (existing)
|
||||||
|
4. Save to database (modified)
|
||||||
|
|
||||||
|
**Implementation Pattern**:
|
||||||
|
|
||||||
|
```python
# After article_data extraction
translation_result = None
original_title = article_data.get('title') or entry.get('title', '')

if Config.OLLAMA_ENABLED:
    # Translate title
    print("   🌐 Translating title...")
    translation_result = ollama_client.translate_title(original_title)

    if translation_result and translation_result['success']:
        print(f"   ✓ Title translated ({translation_result['duration']:.1f}s)")
    else:
        # translation_result may be None here, so do not index it unconditionally
        error = translation_result['error'] if translation_result else 'no result returned'
        print(f"   ⚠ Translation failed: {error}")

# Then summarize (existing code)
...
```
|
||||||
|
|
||||||
|
**Console Output Format**:
|
||||||
|
- Success: `✓ Title translated (0.8s)`
|
||||||
|
- Failure: `⚠ Translation failed: Request timed out`
|
||||||
|
|
||||||
|
### 3. Data Models
|
||||||
|
|
||||||
|
**MongoDB Article Document Schema Extension**:
|
||||||
|
|
||||||
|
```javascript
|
||||||
|
{
|
||||||
|
// Existing fields
|
||||||
|
title: String, // Original German title
|
||||||
|
author: String,
|
||||||
|
link: String,
|
||||||
|
content: String,
|
||||||
|
summary: String,
|
||||||
|
word_count: Number,
|
||||||
|
summary_word_count: Number,
|
||||||
|
source: String,
|
||||||
|
category: String,
|
||||||
|
published_at: Date,
|
||||||
|
crawled_at: Date,
|
||||||
|
summarized_at: Date,
|
||||||
|
created_at: Date,
|
||||||
|
|
||||||
|
// New fields
|
||||||
|
title_en: String, // English translation of title (nullable)
|
||||||
|
translated_at: Date // Timestamp when translation completed (nullable)
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Field Behavior**:
|
||||||
|
- `title_en`: NULL if translation fails or Ollama is disabled
|
||||||
|
- `translated_at`: NULL if translation fails, set to `datetime.utcnow()` on success
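The field behavior above could be expressed as a small mapping from the translation result to the two new document fields. This is a minimal sketch: `translation_fields` and its `now` parameter are hypothetical names introduced here for illustration, and the result dictionary is assumed to have the shape described for `translate_title()`.

```python
from datetime import datetime


def translation_fields(translation_result, now=None):
    """Map a translate_title() result to the title_en / translated_at fields.

    Hypothetical helper; `now` exists only so the timestamp can be
    injected in tests.
    """
    succeeded = bool(translation_result and translation_result.get("success"))
    title_en = translation_result.get("translated_title") if succeeded else None
    if not title_en:
        # Translation failed, returned nothing, or Ollama is disabled
        # (translation_result is None): both fields stay NULL.
        return {"title_en": None, "translated_at": None}
    # The design document specifies datetime.utcnow() for the timestamp.
    return {"title_en": title_en, "translated_at": now or datetime.utcnow()}
```

The returned dictionary can then be merged into the article document, for example with `article_doc.update(translation_fields(translation_result))`.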
|
||||||
|
|
||||||
|
### 4. Newsletter Template Updates
|
||||||
|
|
||||||
|
**Current Title Display**:
|
||||||
|
```html
|
||||||
|
<h2 style="...">
|
||||||
|
{{ article.title }}
|
||||||
|
</h2>
|
||||||
|
```
|
||||||
|
|
||||||
|
**New Title Display Logic**:
|
||||||
|
```html
|
||||||
|
<!-- Primary title: English if available, otherwise German -->
|
||||||
|
<h2 style="margin: 12px 0 8px 0; font-size: 19px; font-weight: 700; line-height: 1.3; color: #1a1a1a;">
|
||||||
|
{{ article.title_en if article.title_en else article.title }}
|
||||||
|
</h2>
|
||||||
|
|
||||||
|
<!-- Subtitle: Original German title (only if English translation exists and differs) -->
|
||||||
|
{% if article.title_en and article.title_en != article.title %}
|
||||||
|
<p style="margin: 0 0 12px 0; font-size: 13px; color: #999999; font-style: italic;">
|
||||||
|
Original: {{ article.title }}
|
||||||
|
</p>
|
||||||
|
{% endif %}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Display Rules**:
|
||||||
|
1. If `title_en` exists and differs from `title`: Show English as primary, German as subtitle
|
||||||
|
2. If `title_en` is NULL or same as `title`: Show only the original title
|
||||||
|
3. Subtitle styling: Smaller font (13px), gray color (#999999), italic
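For unit testing, the display rules can be mirrored in plain Python, independent of the template engine. The function below is a hypothetical sketch (the name `select_titles` is not part of the project); it is meant to return the same primary title and subtitle that the template logic above would render.

```python
def select_titles(article):
    """Return (primary_title, subtitle) according to the display rules.

    Hypothetical helper mirroring the template conditionals; subtitle is
    None when only the original title should be shown.
    """
    title = article.get("title", "")
    title_en = article.get("title_en")
    # Rule 1: an English title that differs from the original becomes primary.
    if title_en and title_en != title:
        return title_en, title
    # Rule 2: missing, empty, or identical translation shows the original only.
    return title, None
```

Keeping this logic in one testable place is a design choice rather than a requirement; the template conditionals remain the source of truth for rendering.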
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
### Translation Failure Scenarios
|
||||||
|
|
||||||
|
| Scenario | Behavior | User Impact |
|----------|----------|-------------|
| Ollama server unavailable | Skip translation, continue with summarization | Newsletter shows German title only |
| Translation timeout | Log error, store NULL in title_en | Newsletter shows German title only |
| Empty title input | Return error immediately, skip API call | Newsletter shows German title only |
| Ollama disabled in config | Skip translation entirely | Newsletter shows German title only |
| Network error | Catch exception, log error, continue | Newsletter shows German title only |
|
||||||
|
|
||||||
|
### Error Handling Principles
|
||||||
|
|
||||||
|
1. **Non-blocking**: Translation failures never prevent article processing
|
||||||
|
2. **Graceful degradation**: Fall back to original German title
|
||||||
|
3. **Consistent logging**: All errors logged with descriptive messages
|
||||||
|
4. **No retry logic**: Single attempt per article (same as summarization)
|
||||||
|
5. **Silent failures**: Newsletter displays seamlessly regardless of translation status
|
||||||
|
|
||||||
|
### Console Output Examples
|
||||||
|
|
||||||
|
**Success Case**:
|
||||||
|
```
|
||||||
|
🔍 Crawling: Neuer U-Bahn-Ausbau in München geplant...
|
||||||
|
🌐 Translating title...
|
||||||
|
✓ Title translated (0.8s)
|
||||||
|
🤖 Summarizing with AI...
|
||||||
|
✓ Summary: 45 words (from 320 words, 2.3s)
|
||||||
|
✓ Saved (320 words)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Translation Failure Case**:
|
||||||
|
```
|
||||||
|
🔍 Crawling: Neuer U-Bahn-Ausbau in München geplant...
|
||||||
|
🌐 Translating title...
|
||||||
|
⚠ Translation failed: Request timed out after 30 seconds
|
||||||
|
🤖 Summarizing with AI...
|
||||||
|
✓ Summary: 45 words (from 320 words, 2.3s)
|
||||||
|
✓ Saved (320 words)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Testing Strategy
|
||||||
|
|
||||||
|
### Unit Testing
|
||||||
|
|
||||||
|
**Ollama Client Tests** (`test_ollama_client.py`):
|
||||||
|
1. Test successful translation with valid German title
|
||||||
|
2. Test empty title input handling
|
||||||
|
3. Test timeout handling
|
||||||
|
4. Test connection error handling
|
||||||
|
5. Test response cleaning (quotes, newlines, whitespace)
|
||||||
|
6. Test translation with special characters
|
||||||
|
7. Test translation with very long titles
|
||||||
|
|
||||||
|
**Test Data Examples**:
|
||||||
|
- Simple: "München plant neue U-Bahn-Linie"
|
||||||
|
- With quotes: "\"Historischer Tag\" für München"
|
||||||
|
- With special chars: "Oktoberfest 2024: 7,5 Millionen Besucher"
|
||||||
|
- Long: "Stadtrat beschließt umfassende Maßnahmen zur Verbesserung der Verkehrsinfrastruktur..."
|
||||||
|
|
||||||
|
### Integration Testing
|
||||||
|
|
||||||
|
**Crawler Service Tests**:
|
||||||
|
1. Test article processing with translation enabled
|
||||||
|
2. Test article processing with translation disabled
|
||||||
|
3. Test article processing when translation fails
|
||||||
|
4. Test database document structure includes new fields
|
||||||
|
5. Test console output formatting
|
||||||
|
|
||||||
|
### Manual Testing
|
||||||
|
|
||||||
|
**End-to-End Workflow**:
|
||||||
|
1. Enable Ollama in configuration
|
||||||
|
2. Trigger crawl with `max_articles=2`
|
||||||
|
3. Verify console shows translation status
|
||||||
|
4. Check MongoDB for `title_en` and `translated_at` fields
|
||||||
|
5. Send test newsletter
|
||||||
|
6. Verify email displays English title with German subtitle
|
||||||
|
|
||||||
|
**Test Scenarios**:
|
||||||
|
- Fresh crawl with Ollama enabled
|
||||||
|
- Re-crawl existing articles (should skip translation)
|
||||||
|
- Crawl with Ollama disabled
|
||||||
|
- Crawl with Ollama server stopped (simulate failure)
|
||||||
|
|
||||||
|
### Performance Testing
|
||||||
|
|
||||||
|
**Metrics to Monitor**:
|
||||||
|
- Translation duration per article (target: < 2 seconds)
|
||||||
|
- Impact on total crawl time (translation + summarization)
|
||||||
|
- Ollama server resource usage
|
||||||
|
|
||||||
|
**Expected Performance**:
|
||||||
|
- Translation: ~0.5-1.5 seconds per title
|
||||||
|
- Total per article: ~3-5 seconds (translation + summarization)
|
||||||
|
- Acceptable for batch processing during scheduled crawls
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
### No New Configuration Required
|
||||||
|
|
||||||
|
The translation feature uses existing Ollama configuration:
|
||||||
|
|
||||||
|
```python
# From config.py (existing)
OLLAMA_ENABLED = True  # or False
OLLAMA_BASE_URL = "http://ollama:11434"
OLLAMA_MODEL = "phi3:latest"
OLLAMA_TIMEOUT = 30
```
|
||||||
|
|
||||||
|
**Rationale**: Simplifies deployment and maintains consistency. Translation is automatically enabled/disabled with the existing `OLLAMA_ENABLED` flag.
|
||||||
|
|
||||||
|
## Deployment Considerations
|
||||||
|
|
||||||
|
### Docker Container Updates
|
||||||
|
|
||||||
|
**Affected Services**:
|
||||||
|
- `crawler` service: Needs rebuild to include new translation code
|
||||||
|
- `sender` service: Needs rebuild to include updated newsletter template
|
||||||
|
|
||||||
|
**Deployment Steps**:
|
||||||
|
1. Update code in `news_crawler/ollama_client.py`
|
||||||
|
2. Update code in `news_crawler/crawler_service.py`
|
||||||
|
3. Update template in `news_sender/newsletter_template.html`
|
||||||
|
4. Rebuild containers: `docker-compose up -d --build crawler sender`
|
||||||
|
5. No database migration needed (new fields are nullable)
|
||||||
|
|
||||||
|
### Backward Compatibility
|
||||||
|
|
||||||
|
**Existing Articles**: Articles without `title_en` will display German title only (graceful fallback)
|
||||||
|
|
||||||
|
**No Breaking Changes**: Newsletter template handles NULL `title_en` values
|
||||||
|
|
||||||
|
### Rollback Plan
|
||||||
|
|
||||||
|
If issues arise:
|
||||||
|
1. Revert code changes
|
||||||
|
2. Rebuild containers
|
||||||
|
3. Existing articles with `title_en` will continue to work
|
||||||
|
4. New articles will only have German titles
|
||||||
|
|
||||||
|
## Future Enhancements
|
||||||
|
|
||||||
|
### Potential Improvements (Out of Scope)
|
||||||
|
|
||||||
|
1. **Batch Translation**: Translate multiple titles in single API call for efficiency
|
||||||
|
2. **Translation Caching**: Cache common phrases/words to reduce API calls
|
||||||
|
3. **Multi-language Support**: Add configuration for target language selection
|
||||||
|
4. **Translation Quality Metrics**: Track and log translation quality scores
|
||||||
|
5. **Retry Logic**: Implement retry with exponential backoff for failed translations
|
||||||
|
6. **Admin API**: Add endpoint to re-translate existing articles
|
||||||
|
|
||||||
|
These enhancements are not included in the current implementation to maintain simplicity and focus on core functionality.
|
||||||
75
.kiro/specs/article-title-translation/requirements.md
Normal file
@@ -0,0 +1,75 @@
|
|||||||
|
# Requirements Document
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
This feature adds automatic translation of German article titles to English using the Ollama AI service. The translation will occur during the article crawling process and both the original German title and English translation will be stored in the database. The newsletter will display the English title prominently with the original German title as a subtitle when available.
|
||||||
|
|
||||||
|
## Glossary
|
||||||
|
|
||||||
|
- **Crawler Service**: The Python service that fetches articles from RSS feeds and processes them
|
||||||
|
- **Ollama Client**: The Python client that communicates with the Ollama AI server for text processing
|
||||||
|
- **Article Document**: The MongoDB document structure that stores article data
|
||||||
|
- **Newsletter Template**: The HTML template used to render the email newsletter sent to subscribers
|
||||||
|
- **Translation Result**: The response object returned by the Ollama translation function containing the translated title and metadata
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
### Requirement 1
|
||||||
|
|
||||||
|
**User Story:** As a newsletter subscriber, I want to see article titles in English, so that I can quickly understand the content without knowing German
|
||||||
|
|
||||||
|
#### Acceptance Criteria
|
||||||
|
|
||||||
|
1. WHEN the Crawler Service processes an article, THE Ollama Client SHALL translate the German title to English
|
||||||
|
2. THE Article Document SHALL store both the original German title and the English translation
|
||||||
|
3. THE Newsletter Template SHALL display the English title as the primary heading
|
||||||
|
4. WHERE an English translation exists, THE Newsletter Template SHALL display the original German title as a subtitle
|
||||||
|
5. IF the translation fails, THEN THE Newsletter Template SHALL display the original German title as the primary heading
|
||||||
|
|
||||||
|
### Requirement 2
|
||||||
|
|
||||||
|
**User Story:** As a system administrator, I want translation to be integrated with the existing Ollama service, so that I don't need to configure additional services
|
||||||
|
|
||||||
|
#### Acceptance Criteria
|
||||||
|
|
||||||
|
1. THE Ollama Client SHALL provide a translate_title method that accepts a German title and returns an English translation
|
||||||
|
2. THE translate_title method SHALL use the same Ollama server configuration as the existing summarization feature
|
||||||
|
3. THE translate_title method SHALL use a temperature setting of 0.3 for consistent translations
|
||||||
|
4. THE translate_title method SHALL limit the response to 100 tokens maximum for title-length outputs
|
||||||
|
5. THE translate_title method SHALL return a Translation Result containing success status, translated title, error message, and duration
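As an illustration of criteria 3 and 4, the request body sent to Ollama's `/api/generate` endpoint might look like the sketch below. The function name `build_translation_payload` and the prompt wording are assumptions made for this example; only the `temperature` and `num_predict` values come from the requirements.

```python
def build_translation_payload(title, model="phi3:latest", target_language="English"):
    """Build a JSON-serializable body for Ollama's /api/generate endpoint.

    Hypothetical helper: the prompt text is illustrative, not the
    project's actual prompt.
    """
    prompt = (
        f"Translate this German news headline to {target_language}. "
        "Reply with the translation only, without explanations.\n\n"
        f"Headline: {title}"
    )
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # request a single JSON response instead of a stream
        "options": {
            "temperature": 0.3,   # Requirement 2.3: consistent translations
            "num_predict": 100,   # Requirement 2.4: cap output at 100 tokens
        },
    }
```

The default model name matches the `OLLAMA_MODEL` value shown in the design document's configuration section; in practice it would be read from the existing configuration rather than hard-coded.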
|
||||||
|
|
||||||
|
### Requirement 3
|
||||||
|
|
||||||
|
**User Story:** As a developer, I want translation errors to be handled gracefully, so that article processing continues even when translation fails
|
||||||
|
|
||||||
|
#### Acceptance Criteria
|
||||||
|
|
||||||
|
1. IF the Ollama server is unavailable, THEN THE Crawler Service SHALL continue processing articles without translations
|
||||||
|
2. IF a translation request times out, THEN THE Crawler Service SHALL log the error and store the article with only the original title
|
||||||
|
3. THE Crawler Service SHALL display translation status in the console output during crawling
|
||||||
|
4. THE Article Document SHALL include a translated_at timestamp field when translation succeeds
|
||||||
|
5. THE Article Document SHALL store NULL in the title_en field when translation fails
|
||||||
|
|
||||||
|
### Requirement 4
|
||||||
|
|
||||||
|
**User Story:** As a newsletter subscriber, I want translations to be accurate and natural, so that the English titles read fluently
|
||||||
|
|
||||||
|
#### Acceptance Criteria
|
||||||
|
|
||||||
|
1. THE Ollama Client SHALL provide a clear prompt instructing the model to translate German news headlines to English
|
||||||
|
2. THE Ollama Client SHALL instruct the model to provide only the translation without explanations
|
||||||
|
3. THE Ollama Client SHALL clean the translation output by removing quotes and extra text
|
||||||
|
4. THE Ollama Client SHALL extract only the first line of the translation response
|
||||||
|
5. THE Ollama Client SHALL trim whitespace from the translated title
|
||||||
|
|
||||||
|
### Requirement 5
|
||||||
|
|
||||||
|
**User Story:** As a system operator, I want to see translation performance metrics, so that I can monitor the translation feature effectiveness
|
||||||
|
|
||||||
|
#### Acceptance Criteria
|
||||||
|
|
||||||
|
1. THE Crawler Service SHALL log the translation duration for each article
|
||||||
|
2. THE Crawler Service SHALL display a success indicator when translation completes
|
||||||
|
3. THE Crawler Service SHALL display an error message when translation fails
|
||||||
|
4. THE Translation Result SHALL include the duration in seconds
|
||||||
|
5. THE Article Document SHALL store the translated_at timestamp for successful translations
|
||||||
47
.kiro/specs/article-title-translation/tasks.md
Normal file
@@ -0,0 +1,47 @@
|
|||||||
|
# Implementation Plan
|
||||||
|
|
||||||
|
- [x] 1. Add translate_title method to Ollama client
|
||||||
|
- Create the `translate_title()` method in `news_crawler/ollama_client.py` that accepts a title string and target language parameter
|
||||||
|
- Implement the translation prompt that instructs the model to translate German headlines to English without explanations
|
||||||
|
- Configure Ollama API call with temperature=0.3 and num_predict=100 for consistent title-length translations
|
||||||
|
- Implement response cleaning logic to remove quotes, extract first line only, and trim whitespace
|
||||||
|
- Add error handling for timeout, connection errors, HTTP errors, and empty title input
|
||||||
|
- Return a dictionary with success status, translated_title, error message, and duration fields
|
||||||
|
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 4.1, 4.2, 4.3, 4.4, 4.5_
|
||||||
|
|
||||||
|
- [x] 2. Integrate translation into crawler service
|
||||||
|
- [x] 2.1 Add translation call in crawl_rss_feed function
|
||||||
|
- Locate the article processing section in `news_crawler/crawler_service.py` after content extraction
|
||||||
|
- Store the original title from article_data or entry
|
||||||
|
- Add conditional check for Config.OLLAMA_ENABLED before calling translation
|
||||||
|
- Call `ollama_client.translate_title()` with the original title
|
||||||
|
- Store the translation_result for later use in article document
|
||||||
|
- _Requirements: 1.1, 2.1_
|
||||||
|
|
||||||
|
- [x] 2.2 Add console logging for translation status
|
||||||
|
- Add "🌐 Translating title..." message before translation call
|
||||||
|
- Add success message with duration: "✓ Title translated (X.Xs)"
|
||||||
|
- Add failure message with error: "⚠ Translation failed: {error}"
|
||||||
|
- _Requirements: 5.1, 5.2, 5.3_
|
||||||
|
|
||||||
|
- [x] 2.3 Update article document structure
|
||||||
|
- Modify the article_doc dictionary to include `title_en` field with translated title or None
|
||||||
|
- Add `translated_at` field set to datetime.utcnow() on success or None on failure
|
||||||
|
- Ensure the original `title` field still contains the German title
|
||||||
|
- _Requirements: 1.2, 3.5_
|
||||||
|
|
||||||
|
- [x] 3. Update newsletter template for bilingual title display
|
||||||
|
- Modify `news_sender/newsletter_template.html` to display English title as primary heading when available
|
||||||
|
- Add conditional logic to show original German title as subtitle only when English translation exists and differs
|
||||||
|
- Style the subtitle with smaller font (13px), gray color (#999999), and italic formatting
|
||||||
|
- Ensure fallback to German title when title_en is NULL or missing
|
||||||
|
- _Requirements: 1.3, 1.4, 1.5_
|
||||||
|
|
||||||
|
- [x] 4. Test the translation feature end-to-end
|
||||||
|
- Rebuild the crawler Docker container with the new translation code
|
||||||
|
- Clear existing articles from the database for clean testing
|
||||||
|
- Trigger a test crawl with max_articles=2 to process fresh articles
|
||||||
|
- Verify console output shows translation status messages
|
||||||
|
- Check MongoDB to confirm title_en and translated_at fields are populated
|
||||||
|
- Send a test newsletter email to verify English titles display correctly with German subtitles
|
||||||
|
- _Requirements: 1.1, 1.2, 1.3, 1.4, 5.1, 5.2, 5.4, 5.5_
|
||||||
@@ -2,7 +2,21 @@ FROM python:3.11-slim
 
 WORKDIR /app
 
-# Install dependencies
+# Install Docker CLI for admin endpoints
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+    ca-certificates \
+    curl \
+    gnupg \
+    && install -m 0755 -d /etc/apt/keyrings \
+    && curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc \
+    && chmod a+r /etc/apt/keyrings/docker.asc \
+    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian bookworm stable" > /etc/apt/sources.list.d/docker.list \
+    && apt-get update \
+    && apt-get install -y --no-install-recommends docker-ce-cli \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Python dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 
@@ -9,6 +9,7 @@ from routes.ollama_routes import ollama_bp
 from routes.newsletter_routes import newsletter_bp
 from routes.tracking_routes import tracking_bp
 from routes.analytics_routes import analytics_bp
+from routes.admin_routes import admin_bp
 
 # Initialize Flask app
 app = Flask(__name__)
@@ -25,6 +26,7 @@ app.register_blueprint(ollama_bp)
 app.register_blueprint(newsletter_bp)
 app.register_blueprint(tracking_bp)
 app.register_blueprint(analytics_bp)
+app.register_blueprint(admin_bp)
 
 # Health check endpoint
 @app.route('/health')
193
backend/routes/admin_routes.py
Normal file
@@ -0,0 +1,193 @@
|
|||||||
|
"""
|
||||||
|
Admin routes for testing and manual operations
|
||||||
|
"""
|
||||||
|
from flask import Blueprint, request, jsonify
|
||||||
|
import subprocess
|
||||||
|
import os
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
admin_bp = Blueprint('admin', __name__)
|
||||||
|
|
||||||
|
|
||||||
|
@admin_bp.route('/api/admin/trigger-crawl', methods=['POST'])
|
||||||
|
def trigger_crawl():
|
||||||
|
"""
|
||||||
|
Manually trigger the news crawler
|
||||||
|
|
||||||
|
Request body (optional):
|
||||||
|
{
|
||||||
|
"max_articles": 10 // Number of articles per feed
|
||||||
|
}
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
data = request.get_json() or {}
|
||||||
|
max_articles = data.get('max_articles', 10)
|
||||||
|
|
||||||
|
# Validate max_articles
|
||||||
|
if not isinstance(max_articles, int) or max_articles < 1 or max_articles > 100:
|
||||||
|
return jsonify({
|
||||||
|
'success': False,
|
||||||
|
'error': 'max_articles must be an integer between 1 and 100'
|
||||||
|
}), 400
|
||||||
|
|
||||||
|
# Execute crawler in crawler container using docker exec
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
['docker', 'exec', 'munich-news-crawler', 'python', 'crawler_service.py', str(max_articles)],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=300 # 5 minute timeout
|
||||||
|
)
|
||||||
|
|
||||||
|
# Check result
|
||||||
|
success = result.returncode == 0
|
||||||
|
|
||||||
|
return jsonify({
|
||||||
|
'success': success,
|
||||||
|
'message': f'Crawler {"executed successfully" if success else "failed"}',
|
||||||
|
'max_articles': max_articles,
|
||||||
|
'output': result.stdout[-1000:] if result.stdout else '', # Last 1000 chars
|
||||||
|
'errors': result.stderr[-500:] if result.stderr else ''
|
||||||
|
}), 200 if success else 500
|
||||||
|
|
||||||
|
except FileNotFoundError:
|
||||||
|
return jsonify({
|
||||||
|
'success': False,
|
||||||
|
'error': 'Docker command not found. Make sure Docker is installed and the socket is mounted.'
|
||||||
|
}), 500
|
||||||
|
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
return jsonify({
|
||||||
|
'success': False,
|
||||||
|
'error': 'Crawler timed out after 5 minutes'
|
||||||
|
}), 500
|
||||||
|
except Exception as e:
|
||||||
|
return jsonify({
|
||||||
|
'success': False,
|
||||||
|
'error': f'Failed to run crawler: {str(e)}'
|
||||||
|
}), 500
|
||||||
|
|
||||||
|
|
||||||
|
@admin_bp.route('/api/admin/send-test-email', methods=['POST'])
|
||||||
|
def send_test_email():
|
||||||
|
"""
|
||||||
|
Send a test newsletter to a specific email
|
||||||
|
|
||||||
|
Request body:
|
||||||
|
{
|
||||||
|
"email": "test@example.com",
|
||||||
|
"max_articles": 10 // Optional, defaults to 10
|
||||||
|
}
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
data = request.get_json()
|
||||||
|
|
||||||
|
if not data or 'email' not in data:
|
||||||
|
return jsonify({
|
||||||
|
'success': False,
|
||||||
|
'error': 'Email address is required'
|
||||||
|
}), 400
|
||||||
|
|
||||||
|
email = data.get('email', '').strip()
|
||||||
|
max_articles = data.get('max_articles', 10)
|
||||||
|
|
||||||
|
# Validate email
|
||||||
|
if not email or '@' not in email:
|
||||||
|
return jsonify({
|
||||||
|
'success': False,
|
||||||
|
'error': 'Invalid email address'
|
||||||
|
}), 400
|
||||||
|
|
||||||
|
# Validate max_articles (not used currently but validated for future use)
|
||||||
|
if not isinstance(max_articles, int) or max_articles < 1 or max_articles > 50:
|
||||||
|
return jsonify({
|
||||||
|
'success': False,
|
||||||
|
'error': 'max_articles must be an integer between 1 and 50'
|
||||||
|
}), 400
|
||||||
|
|
||||||
|
# Execute sender in sender container using docker exec
|
    try:
        result = subprocess.run(
            ['docker', 'exec', 'munich-news-sender', 'python', 'sender_service.py', 'test', email],
            capture_output=True,
            text=True,
            timeout=60  # 1 minute timeout
        )

        # Check if successful
        success = result.returncode == 0

        return jsonify({
            'success': success,
            'message': f'Test email {"sent" if success else "failed"} to {email}',
            'email': email,
            'output': result.stdout[-1000:] if result.stdout else '',  # Last 1000 chars
            'errors': result.stderr[-500:] if result.stderr else ''
        }), 200 if success else 500

    except FileNotFoundError:
        return jsonify({
            'success': False,
            'error': 'Docker command not found. Make sure Docker is installed and the socket is mounted.'
        }), 500

    except subprocess.TimeoutExpired:
        return jsonify({
            'success': False,
            'error': 'Email sending timed out after 1 minute'
        }), 500
    except Exception as e:
        return jsonify({
            'success': False,
            'error': f'Failed to send email: {str(e)}'
        }), 500


@admin_bp.route('/api/admin/stats', methods=['GET'])
def get_stats():
    """Get system statistics"""
    try:
        from database import (
            articles_collection,
            subscribers_collection,
            rss_feeds_collection,
            newsletter_sends_collection,
            link_clicks_collection
        )

        stats = {
            'articles': {
                'total': articles_collection.count_documents({}),
                'with_summary': articles_collection.count_documents({'summary': {'$exists': True, '$ne': None}}),
                'today': articles_collection.count_documents({
                    'crawled_at': {
                        '$gte': datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
                    }
                })
            },
            'subscribers': {
                'total': subscribers_collection.count_documents({}),
                'active': subscribers_collection.count_documents({'active': True})
            },
            'rss_feeds': {
                'total': rss_feeds_collection.count_documents({}),
                'active': rss_feeds_collection.count_documents({'active': True})
            },
            'tracking': {
                'total_sends': newsletter_sends_collection.count_documents({}),
                'total_opens': newsletter_sends_collection.count_documents({'opened': True}),
                'total_clicks': link_clicks_collection.count_documents({'clicked': True})
            }
        }

        return jsonify(stats), 200

    except Exception as e:
        return jsonify({
            'success': False,
            'error': str(e)
        }), 500


# Import datetime for stats endpoint
from datetime import datetime
@@ -62,6 +62,7 @@ services:
|
|||||||
- TZ=Europe/Berlin
|
- TZ=Europe/Berlin
|
||||||
volumes:
|
volumes:
|
||||||
- ./backend/.env:/app/.env:ro
|
- ./backend/.env:/app/.env:ro
|
||||||
|
- /var/run/docker.sock:/var/run/docker.sock:ro
|
||||||
networks:
|
networks:
|
||||||
- munich-news-network
|
- munich-news-network
|
||||||
healthcheck:
|
healthcheck:
|
||||||
|
|||||||
240
docs/ADMIN_API.md
Normal file
@@ -0,0 +1,240 @@
# Admin API Reference

Admin endpoints for testing and manual operations.

## Overview

The admin API allows you to trigger manual operations like crawling news and sending test emails directly through HTTP requests.

**How it works**: The backend container has access to the Docker socket, allowing it to execute commands in other containers via `docker exec`.

---

## API Endpoints

### Trigger Crawler

Manually trigger the news crawler to fetch new articles.

```http
POST /api/admin/trigger-crawl
```

**Request Body** (optional):
```json
{
  "max_articles": 10
}
```

**Parameters**:
- `max_articles` (integer, optional): Number of articles to crawl per feed (1-100, default: 10)

**Response**:
```json
{
  "success": true,
  "message": "Crawler executed successfully",
  "max_articles": 10,
  "output": "... crawler output (last 1000 chars) ...",
  "errors": ""
}
```

**Example**:
```bash
# Crawl 5 articles per feed
curl -X POST http://localhost:5001/api/admin/trigger-crawl \
  -H "Content-Type: application/json" \
  -d '{"max_articles": 5}'

# Use default (10 articles)
curl -X POST http://localhost:5001/api/admin/trigger-crawl
```
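
The same call can be prepared from Python. The sketch below is a minimal illustration using only the standard library; the helper name `build_trigger_crawl_request` and the default base URL are assumptions made for this example and are not part of the project's code. It checks `max_articles` against the documented 1-100 range before building the request.

```python
import json
import urllib.request


def build_trigger_crawl_request(max_articles=10, base_url="http://localhost:5001"):
    """Build (but do not send) a POST request for /api/admin/trigger-crawl.

    Hypothetical helper: the name and the base_url default are illustrative only.
    """
    # Mirror the documented parameter range (1-100) on the client side.
    if not 1 <= max_articles <= 100:
        raise ValueError("max_articles must be between 1 and 100")
    body = json.dumps({"max_articles": max_articles}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/admin/trigger-crawl",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; that step requires the backend to be running and is left out here.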

---

### Send Test Email

Send a test newsletter to a specific email address.

```http
POST /api/admin/send-test-email
```

**Request Body**:
```json
{
  "email": "test@example.com",
  "max_articles": 10
}
```

**Parameters**:
- `email` (string, required): Email address to send test newsletter to
- `max_articles` (integer, optional): Number of articles to include (1-50, default: 10)

**Response**:
```json
{
  "success": true,
  "message": "Test email sent to test@example.com",
  "email": "test@example.com",
  "output": "... sender output ...",
  "errors": ""
}
```

**Example**:
```bash
# Send test email
curl -X POST http://localhost:5001/api/admin/send-test-email \
  -H "Content-Type: application/json" \
  -d '{"email": "your-email@example.com"}'

# Send with custom article count
curl -X POST http://localhost:5001/api/admin/send-test-email \
  -H "Content-Type: application/json" \
  -d '{"email": "your-email@example.com", "max_articles": 5}'
```

---

### Get System Statistics

Get an overview of system statistics.

```http
GET /api/admin/stats
```

**Response**:
```json
{
  "articles": {
    "total": 150,
    "with_summary": 120,
    "today": 15
  },
  "subscribers": {
    "total": 50,
    "active": 45
  },
  "rss_feeds": {
    "total": 4,
    "active": 4
  },
  "tracking": {
    "total_sends": 200,
    "total_opens": 150,
    "total_clicks": 75
  }
}
```

**Example**:
```bash
curl http://localhost:5001/api/admin/stats
```
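
The `tracking` counters can be combined into rough engagement ratios. The sketch below is illustrative only: `engagement_rates` is a hypothetical helper, and it assumes that dividing `total_opens` and `total_clicks` by `total_sends` is a meaningful measure, which the API itself does not promise.

```python
def engagement_rates(stats):
    """Derive open and click rates from an /api/admin/stats payload.

    Hypothetical helper; both rates are None when nothing has been sent yet.
    """
    tracking = stats.get("tracking", {})
    sends = tracking.get("total_sends", 0)
    if not sends:
        # Avoid dividing by zero on a fresh installation.
        return {"open_rate": None, "click_rate": None}
    return {
        "open_rate": tracking.get("total_opens", 0) / sends,
        "click_rate": tracking.get("total_clicks", 0) / sends,
    }


sample = {"tracking": {"total_sends": 200, "total_opens": 150, "total_clicks": 75}}
rates = engagement_rates(sample)
```

For the sample response above this gives an open rate of 0.75 and a click rate of 0.375.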

---

## Workflow Examples

### Test Complete System

```bash
# 1. Check current stats
curl http://localhost:5001/api/admin/stats

# 2. Trigger crawler to fetch new articles
curl -X POST http://localhost:5001/api/admin/trigger-crawl \
  -H "Content-Type: application/json" \
  -d '{"max_articles": 5}'

# 3. Wait a moment for crawler to finish, then check stats again
sleep 30
curl http://localhost:5001/api/admin/stats

# 4. Send test email
curl -X POST http://localhost:5001/api/admin/send-test-email \
  -H "Content-Type: application/json" \
  -d '{"email": "your-email@example.com"}'
```

### Quick Test Newsletter

```bash
# Send test email with latest articles
curl -X POST http://localhost:5001/api/admin/send-test-email \
  -H "Content-Type: application/json" \
  -d '{"email": "your-email@example.com", "max_articles": 3}'
```

### Fetch Fresh Content

```bash
# Crawl more articles from each feed
curl -X POST http://localhost:5001/api/admin/trigger-crawl \
  -H "Content-Type: application/json" \
  -d '{"max_articles": 20}'
```

---

## Error Responses

All endpoints return standard error responses:

```json
{
  "success": false,
  "error": "Error message here"
}
```

**Common HTTP Status Codes**:
- `200` - Success
- `400` - Bad request (invalid parameters)
- `500` - Server error
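
A client can read the HTTP status code and the `success` flag together. The following sketch assumes the error format shown above; `describe_response` is a hypothetical name. Because `/api/admin/stats` returns no `success` field when it succeeds, the helper treats a missing flag on a `200` response as success.

```python
def describe_response(status_code, payload):
    """Summarize an admin API response as an (ok, message) pair.

    Hypothetical helper for illustration only.
    """
    # A 200 without a 'success' key (e.g. the stats endpoint) counts as success.
    ok = status_code == 200 and bool(payload.get("success", True))
    if ok:
        return True, payload.get("message", "OK")
    # Error responses carry 'error'; failed crawl/send runs may carry 'errors' or 'message'.
    detail = (
        payload.get("error")
        or payload.get("errors")
        or payload.get("message")
        or "Unknown error"
    )
    return False, f"HTTP {status_code}: {detail}"
```

Keeping the status code in the failure message makes it easier to tell a `400` caused by bad input from a `500` raised by the container call.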

---

## Security Notes

⚠️ **Important**: These are admin endpoints and should be protected in production!

Recommendations:
1. Add authentication/authorization
2. Add rate limiting
3. Restrict access by IP allowlist
4. Require an API key
5. Enable audit logging

Example protection (add to routes):
```python
import os
from functools import wraps
from flask import jsonify, request

def require_api_key(f):
    @wraps(f)
    def decorated_function(*args, **kwargs):
        api_key = request.headers.get('X-API-Key')
        expected_key = os.getenv('ADMIN_API_KEY')
        # Reject when the header is missing or ADMIN_API_KEY is not configured
        if not api_key or not expected_key or api_key != expected_key:
            return jsonify({'error': 'Unauthorized'}), 401
        return f(*args, **kwargs)
    return decorated_function

@admin_bp.route('/api/admin/trigger-crawl', methods=['POST'])
@require_api_key
def trigger_crawl():
    ...  # endpoint code
```
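
A further hardening step for a check like the one above is a constant-time comparison that also refuses to authorize when either key is missing. This is a sketch rather than the project's implementation: `is_authorized` is a hypothetical helper that a decorator could call with the `X-API-Key` header value and the `ADMIN_API_KEY` environment variable.

```python
import hmac


def is_authorized(provided_key, expected_key):
    """Return True only when both keys are present and identical.

    Hypothetical helper; hmac.compare_digest avoids leaking how many
    leading characters matched through response timing.
    """
    if not provided_key or not expected_key:
        # A missing header or an unset server-side key never authorizes.
        return False
    return hmac.compare_digest(
        provided_key.encode("utf-8"),
        expected_key.encode("utf-8"),
    )
```

Encoding both values to bytes lets the comparison accept keys that contain non-ASCII characters.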

---

## Related Endpoints

- **[Newsletter Preview](../backend/routes/newsletter_routes.py)**: `/api/newsletter/preview` - Preview newsletter HTML
- **[Analytics](API.md)**: `/api/analytics/*` - View engagement metrics
- **[RSS Feeds](API.md)**: `/api/rss-feeds` - Manage RSS feeds
@@ -6,14 +6,25 @@ from dotenv import load_dotenv
|
|||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
# Load environment variables from backend/.env
|
# Load environment variables from backend/.env
|
||||||
backend_dir = Path(__file__).parent.parent / 'backend'
|
# Try multiple locations (Docker vs local)
|
||||||
env_path = backend_dir / '.env'
|
env_locations = [
|
||||||
|
Path('/app/.env'), # Docker location
|
||||||
|
Path(__file__).parent.parent / 'backend' / '.env', # Local location
|
||||||
|
Path(__file__).parent / '.env', # Current directory
|
||||||
|
]
|
||||||
|
|
||||||
if env_path.exists():
|
env_loaded = False
|
||||||
|
for env_path in env_locations:
|
||||||
|
if env_path.exists():
|
||||||
load_dotenv(dotenv_path=env_path)
|
load_dotenv(dotenv_path=env_path)
|
||||||
print(f"✓ Loaded configuration from: {env_path}")
|
print(f"✓ Loaded configuration from: {env_path}")
|
||||||
else:
|
env_loaded = True
|
||||||
print(f"⚠ Warning: .env file not found at {env_path}")
|
break
|
||||||
|
|
||||||
|
if not env_loaded:
|
||||||
|
print(f"⚠ Warning: .env file not found in any of these locations:")
|
||||||
|
for loc in env_locations:
|
||||||
|
print(f" - {loc}")
|
||||||
|
|
||||||
|
|
||||||
class Config:
|
class Config:
|
||||||
|
|||||||
@@ -344,6 +344,21 @@ def crawl_rss_feed(feed_url, feed_name, feed_category='general', max_articles=10
|
|||||||
article_data = extract_article_content(article_url)
|
article_data = extract_article_content(article_url)
|
||||||
|
|
||||||
if article_data and article_data.get('content'):
|
if article_data and article_data.get('content'):
|
||||||
|
# Store original title
|
||||||
|
original_title = article_data.get('title') or entry.get('title', '')
|
||||||
|
|
||||||
|
# Translate title with Ollama if enabled
|
||||||
|
translation_result = None
|
||||||
|
if Config.OLLAMA_ENABLED and original_title:
|
||||||
|
print(f" 🌐 Translating title...")
|
||||||
|
translation_result = ollama_client.translate_title(original_title)
|
||||||
|
|
||||||
|
if translation_result and translation_result['success']:
|
||||||
|
print(f" ✓ Title translated ({translation_result['duration']:.1f}s)")
|
||||||
|
else:
|
||||||
|
error_msg = translation_result['error'] if translation_result else 'Unknown error'
|
||||||
|
print(f" ⚠ Translation failed: {error_msg}")
|
||||||
|
|
||||||
# Summarize with Ollama if enabled
|
# Summarize with Ollama if enabled
|
||||||
summary_result = None
|
summary_result = None
|
||||||
if Config.OLLAMA_ENABLED and article_data.get('content'):
|
if Config.OLLAMA_ENABLED and article_data.get('content'):
|
||||||
@@ -362,7 +377,8 @@ def crawl_rss_feed(feed_url, feed_name, feed_category='general', max_articles=10
|
|||||||
|
|
||||||
# Prepare document
|
# Prepare document
|
||||||
article_doc = {
|
article_doc = {
|
||||||
'title': article_data.get('title') or entry.get('title', ''),
|
'title': original_title,
|
||||||
|
'title_en': translation_result['translated_title'] if translation_result and translation_result['success'] else None,
|
||||||
'author': article_data.get('author'),
|
'author': article_data.get('author'),
|
||||||
'link': article_url,
|
'link': article_url,
|
||||||
'content': article_data.get('content', ''), # Full article content
|
'content': article_data.get('content', ''), # Full article content
|
||||||
@@ -373,6 +389,7 @@ def crawl_rss_feed(feed_url, feed_name, feed_category='general', max_articles=10
|
|||||||
'category': feed_category,
|
'category': feed_category,
|
||||||
'published_at': extract_published_date(entry) or article_data.get('published_date', ''),
|
'published_at': extract_published_date(entry) or article_data.get('published_date', ''),
|
||||||
'crawled_at': article_data.get('crawled_at'),
|
'crawled_at': article_data.get('crawled_at'),
|
||||||
|
'translated_at': datetime.utcnow() if translation_result and translation_result['success'] else None,
|
||||||
'summarized_at': datetime.utcnow() if summary_result and summary_result['success'] else None,
|
'summarized_at': datetime.utcnow() if summary_result and summary_result['success'] else None,
|
||||||
'created_at': datetime.utcnow()
|
'created_at': datetime.utcnow()
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -160,6 +160,147 @@ class OllamaClient:
|
|||||||
'duration': time.time() - start_time
|
'duration': time.time() - start_time
|
||||||
}
|
}
|

    def translate_title(self, title, target_language='English'):
        """
        Translate article title to target language

        Args:
            title: Original title (typically German)
            target_language: Target language (default: 'English')

        Returns:
            {
                'success': bool,  # Whether translation succeeded
                'translated_title': str or None,  # Translated title
                'error': str or None,  # Error message if failed
                'duration': float  # Time taken in seconds
            }
        """
        if not self.enabled:
            return {
                'success': False,
                'translated_title': None,
                'error': 'Ollama is not enabled',
                'duration': 0
            }

        if not title or len(title.strip()) == 0:
            return {
                'success': False,
                'translated_title': None,
                'error': 'Title is empty',
                'duration': 0
            }

        start_time = time.time()

        try:
            # Construct prompt
            prompt = self._build_translation_prompt(title, target_language)

            # Prepare request
            url = f"{self.base_url}/api/generate"
            headers = {'Content-Type': 'application/json'}
            if self.api_key:
                headers['Authorization'] = f'Bearer {self.api_key}'

            payload = {
                'model': self.model,
                'prompt': prompt,
                'stream': False,
                'options': {
                    'temperature': 0.3,  # Lower temperature for consistent translations
                    'num_predict': 100  # Limit response length for title-length outputs
                }
            }

            # Make request
            response = requests.post(
                url,
                json=payload,
                headers=headers,
                timeout=self.timeout
            )
            response.raise_for_status()

            # Parse response
            result = response.json()
            translated_title = result.get('response', '').strip()

            if not translated_title:
                return {
                    'success': False,
                    'translated_title': None,
                    'error': 'Ollama returned empty translation',
                    'duration': time.time() - start_time
                }

            # Clean the translation output
            translated_title = self._clean_translation(translated_title)

            return {
                'success': True,
                'translated_title': translated_title,
                'error': None,
                'duration': time.time() - start_time
            }

        except requests.exceptions.Timeout:
            return {
                'success': False,
                'translated_title': None,
                'error': f'Request timed out after {self.timeout} seconds',
                'duration': time.time() - start_time
            }
        except requests.exceptions.ConnectionError:
            return {
                'success': False,
                'translated_title': None,
                'error': f'Cannot connect to Ollama server at {self.base_url}',
                'duration': time.time() - start_time
            }
        except requests.exceptions.HTTPError as e:
            return {
                'success': False,
                'translated_title': None,
                'error': f'HTTP error: {e.response.status_code} - {e.response.text[:100]}',
                'duration': time.time() - start_time
            }
        except Exception as e:
            return {
                'success': False,
                'translated_title': None,
                'error': f'Unexpected error: {str(e)}',
                'duration': time.time() - start_time
            }

    def _build_translation_prompt(self, title, target_language):
        """Build prompt for title translation"""
        prompt = f"""Translate the following German news headline to {target_language}. Provide only the translation without any explanations, quotes, or additional text.

German headline:
{title}

{target_language} translation:"""

        return prompt

    def _clean_translation(self, translation):
        """Clean translation output by removing quotes and extra text"""
        # Extract first line only
        translation = translation.split('\n')[0]

        # Remove surrounding quotes (single and double)
        translation = translation.strip()
        if (translation.startswith('"') and translation.endswith('"')) or \
           (translation.startswith("'") and translation.endswith("'")):
            translation = translation[1:-1]

        # Trim whitespace again after quote removal
        translation = translation.strip()

        return translation

||||||
def _build_summarization_prompt(self, content, max_words):
|
def _build_summarization_prompt(self, content, max_words):
|
||||||
"""Build prompt for article summarization"""
|
"""Build prompt for article summarization"""
|
||||||
# Truncate content if too long (keep first 5000 words)
|
# Truncate content if too long (keep first 5000 words)
|
||||||
|
|||||||
@@ -67,9 +67,16 @@
|
|||||||
|
|
||||||
<!-- Article Title -->
|
<!-- Article Title -->
|
||||||
<h2 style="margin: 12px 0 8px 0; font-size: 19px; font-weight: 700; line-height: 1.3; color: #1a1a1a;">
|
<h2 style="margin: 12px 0 8px 0; font-size: 19px; font-weight: 700; line-height: 1.3; color: #1a1a1a;">
|
||||||
{{ article.title }}
|
{{ article.title_en if article.title_en else article.title }}
|
||||||
</h2>
|
</h2>
|
||||||
|
|
||||||
|
<!-- Original German Title (subtitle) -->
|
||||||
|
{% if article.title_en and article.title_en != article.title %}
|
||||||
|
<p style="margin: 0 0 12px 0; font-size: 13px; color: #999999; font-style: italic;">
|
||||||
|
Original: {{ article.title }}
|
||||||
|
</p>
|
||||||
|
{% endif %}
|
||||||
|
|
||||||
<!-- Article Meta -->
|
<!-- Article Meta -->
|
||||||
<p style="margin: 0 0 12px 0; font-size: 13px; color: #999999;">
|
<p style="margin: 0 0 12px 0; font-size: 13px; color: #999999;">
|
||||||
<span style="color: #000000; font-weight: 600;">{{ article.source }}</span>
|
<span style="color: #000000; font-weight: 600;">{{ article.source }}</span>
|
||||||
|
|||||||
@@ -27,14 +27,25 @@ from services import tracking_service
|
|||||||
from tracking_integration import inject_tracking_pixel, replace_article_links, generate_tracking_urls
|
from tracking_integration import inject_tracking_pixel, replace_article_links, generate_tracking_urls
|
||||||
|
|
||||||
# Load environment variables from backend/.env
|
# Load environment variables from backend/.env
|
||||||
backend_dir = Path(__file__).parent.parent / 'backend'
|
# Try multiple locations (Docker vs local)
|
||||||
env_path = backend_dir / '.env'
|
env_locations = [
|
||||||
|
Path('/app/.env'), # Docker location
|
||||||
|
Path(__file__).parent.parent / 'backend' / '.env', # Local location
|
||||||
|
Path(__file__).parent / '.env', # Current directory
|
||||||
|
]
|
||||||
|
|
||||||
if env_path.exists():
|
env_loaded = False
|
||||||
|
for env_path in env_locations:
|
||||||
|
if env_path.exists():
|
||||||
load_dotenv(dotenv_path=env_path)
|
load_dotenv(dotenv_path=env_path)
|
||||||
print(f"✓ Loaded configuration from: {env_path}")
|
print(f"✓ Loaded configuration from: {env_path}")
|
||||||
else:
|
env_loaded = True
|
||||||
print(f"⚠ Warning: .env file not found at {env_path}")
|
break
|
||||||
|
|
||||||
|
if not env_loaded:
|
||||||
|
print(f"⚠ Warning: .env file not found in any of these locations:")
|
||||||
|
for loc in env_locations:
|
||||||
|
print(f" - {loc}")
|
||||||
|
|
||||||
|
|
||||||
class Config:
|
class Config:
|
||||||
@@ -114,6 +125,8 @@ def get_latest_articles(max_articles=10, hours=24):
|
|||||||
|
|
||||||
articles.append({
|
articles.append({
|
||||||
'title': doc.get('title', ''),
|
'title': doc.get('title', ''),
|
||||||
|
'title_en': doc.get('title_en'),
|
||||||
|
'translated_at': doc.get('translated_at'),
|
||||||
'author': doc.get('author'),
|
'author': doc.get('author'),
|
||||||
'link': doc.get('link', ''),
|
'link': doc.get('link', ''),
|
||||||
'summary': doc.get('summary', ''),
|
'summary': doc.get('summary', ''),
|
||||||
|
|||||||