2025-11-10 19:13:33 +01:00
commit ac5738c29d
64 changed files with 9445 additions and 0 deletions

@@ -0,0 +1,92 @@
# Implementation Plan
- [x] 1. Create Ollama client module
- Create `news_crawler/ollama_client.py` with OllamaClient class
- Implement `summarize_article()` method with prompt construction and API call
- Implement `is_available()` method for health checks
- Implement `test_connection()` method for diagnostics
- Add timeout handling (30 seconds)
- Add error handling for connection, timeout, and invalid responses
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 4.1, 4.2, 4.3, 5.2_
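A minimal sketch of what the client in task 1 could look like, using only the standard library. The `/api/generate` and `/api/tags` endpoints are part of Ollama's REST API; everything else here (the `build_prompt` helper, the prompt wording, the 150-word limit, the `llama3` placeholder model, and the shape of the returned dict) is an assumption for illustration, not the contents of the actual `ollama_client.py`.

```python
import json
import urllib.error
import urllib.request


class OllamaClient:
    """Hypothetical sketch of the client described in task 1."""

    def __init__(self, base_url="http://localhost:11434", model="llama3", timeout=30):
        self.base_url = base_url.rstrip("/")
        self.model = model      # placeholder model name (assumption)
        self.timeout = timeout  # seconds; the plan specifies 30

    def build_prompt(self, content, max_words=150):
        # Prompt wording and word limit are illustrative assumptions.
        return (
            f"Summarize the following article in at most {max_words} words.\n\n"
            f"{content.strip()}\n\nSummary:"
        )

    def is_available(self):
        """Health check: True if the server answers GET /api/tags."""
        try:
            with urllib.request.urlopen(f"{self.base_url}/api/tags", timeout=5) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    def summarize_article(self, content):
        """Return a result dict; connection and timeout errors are reported, not raised."""
        if not content or not content.strip():
            return {"success": False, "summary": None, "error": "empty content"}
        payload = json.dumps(
            {"model": self.model, "prompt": self.build_prompt(content), "stream": False}
        ).encode("utf-8")
        request = urllib.request.Request(
            f"{self.base_url}/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(request, timeout=self.timeout) as resp:
                body = json.loads(resp.read().decode("utf-8"))
        except (urllib.error.URLError, OSError) as exc:
            # Covers refused connections, HTTP errors, and socket timeouts.
            return {"success": False, "summary": None, "error": f"connection error: {exc}"}
        except ValueError as exc:
            # Covers malformed JSON and undecodable bytes.
            return {"success": False, "summary": None, "error": f"invalid response: {exc}"}
        text = body.get("response") if isinstance(body, dict) else None
        if not isinstance(text, str) or not text.strip():
            return {"success": False, "summary": None, "error": "invalid response: no text"}
        return {"success": True, "summary": text.strip(), "error": None}
```

Returning a result dict instead of raising lets the crawler keep processing articles when Ollama is down, which is the behavior task 3 asks for.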
- [x] 2. Create configuration module for crawler
- Create `news_crawler/config.py` with Config class
- Load environment variables (`OLLAMA_BASE_URL`, `OLLAMA_MODEL`, `OLLAMA_ENABLED`, `OLLAMA_API_KEY`, `OLLAMA_TIMEOUT`)
- Add validation for required configuration
- Add default values for optional configuration
- _Requirements: 2.1, 2.2, 2.3, 2.4_
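The configuration loading in task 2 might be sketched as follows. The environment variable names come from the plan; the defaults (`http://localhost:11434` is Ollama's standard local address, `llama3` is a placeholder model name, summarization disabled by default) and the `from_env` helper are assumptions rather than the actual contents of `config.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    ollama_base_url: str
    ollama_model: str
    ollama_enabled: bool
    ollama_api_key: Optional[str]
    ollama_timeout: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Config":
        """Build a Config from os.environ, or from a mapping passed in for tests."""
        env = os.environ if env is None else env

        # Default values below are assumptions for illustration.
        enabled = env.get("OLLAMA_ENABLED", "false").strip().lower() in {"1", "true", "yes"}
        base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434").strip()
        model = env.get("OLLAMA_MODEL", "llama3").strip()
        api_key = env.get("OLLAMA_API_KEY") or None

        timeout_raw = env.get("OLLAMA_TIMEOUT", "30")
        try:
            timeout = int(timeout_raw)
        except ValueError:
            raise ValueError(f"OLLAMA_TIMEOUT must be an integer, got {timeout_raw!r}") from None
        if timeout <= 0:
            raise ValueError("OLLAMA_TIMEOUT must be positive")

        # Base URL and model are only required when summarization is enabled.
        if enabled and not (base_url and model):
            raise ValueError("OLLAMA_BASE_URL and OLLAMA_MODEL are required when OLLAMA_ENABLED is true")

        return cls(base_url, model, enabled, api_key, timeout)
```

Accepting an optional mapping keeps the class testable without mutating the process environment.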
- [x] 3. Integrate Ollama client into crawler service
- Import OllamaClient in `news_crawler/crawler_service.py`
- Initialize Ollama client at module level using Config
- Modify `crawl_rss_feed()` to call summarization after content extraction
- Add conditional logic to skip summarization if OLLAMA_ENABLED is false
- Add error handling to continue processing if summarization fails
- Add logging for summarization start, success, and failure
- Add rate limiting delay after summarization
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 2.3, 2.4, 4.1, 4.5, 5.1, 5.3, 6.1, 6.2, 6.3_
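The integration steps in task 3 (conditional skip, fail-soft error handling, logging, and a rate-limiting delay) can be sketched as one helper called from the crawl loop. The helper name `summarize_if_enabled`, the one-second default delay, and the assumption that `summarize_article()` returns a dict with `success`, `summary`, and `error` keys are all hypothetical.

```python
import logging
import time

logger = logging.getLogger("news_crawler")


def summarize_if_enabled(article, client, enabled, delay_seconds=1.0):
    """Attach a summary to `article` in place; return True only on success.

    Never raises: a failed summarization must not stop the crawl.
    """
    if not enabled or client is None:
        return False  # summarization switched off: skip silently

    title = article.get("title", "<untitled>")
    logger.info("Summarizing: %s", title)
    try:
        result = client.summarize_article(article.get("content", ""))
    except Exception as exc:  # defensive: treat any client bug as a failure
        result = {"success": False, "error": str(exc)}

    success = bool(result.get("success"))
    if success:
        article["summary"] = result["summary"]
        logger.info("Summarized: %s", title)
    else:
        logger.warning("Summarization failed for %s: %s", title, result.get("error"))

    # Rate limiting: pause after every attempt so the model server is not flooded.
    time.sleep(delay_seconds)
    return success
```

In this sketch the delay also applies after failed attempts, on the assumption that a struggling server benefits from the pause as much as a healthy one.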
- [x] 4. Update database schema and storage
- Modify article document structure in `crawl_rss_feed()` to include:
  - `summary` field (AI-generated summary)
  - `summary_word_count` field
  - `summarized_at` field (timestamp)
- Update MongoDB upsert logic to handle new fields
- Add check to skip re-summarization if article already has summary
- _Requirements: 3.1, 3.2, 3.3, 3.4, 8.4_
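The storage changes in task 4 can be illustrated with plain dictionaries. The three summary field names come from the plan; using `link` as the unique key, the whitespace-based word count, and the helper names are assumptions. The returned filter and update documents are in the shape PyMongo's `update_one(filter, update, upsert=True)` accepts.

```python
from datetime import datetime, timezone


def needs_summary(existing):
    """True if the stored document is missing or has no non-empty summary."""
    return not (existing and existing.get("summary"))


def build_summary_fields(summary, now=None):
    """Build the three summary fields named in the plan."""
    now = now or datetime.now(timezone.utc)
    return {
        "summary": summary,
        "summary_word_count": len(summary.split()),  # simple whitespace count
        "summarized_at": now,
    }


def build_upsert(article, summary=None, now=None):
    """Return (filter, update) for an upsert keyed on the article link.

    Assumption: `link` uniquely identifies an article.
    """
    fields = dict(article)
    if summary:
        fields.update(build_summary_fields(summary, now))
    return {"link": article["link"]}, {"$set": fields}


# Usage with PyMongo (not executed here):
#   flt, update = build_upsert(article, summary)
#   collection.update_one(flt, update, upsert=True)
```

Because `$set` only touches the listed fields, an upsert without a new summary leaves any previously stored `summary` untouched, which pairs naturally with the `needs_summary` check.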
- [x] 5. Update backend API to return summaries
- Modify `backend/routes/news_routes.py` GET /api/news endpoint
- Add `summary`, `summary_word_count`, `summarized_at` fields to response
- Add `has_summary` boolean field to indicate if AI summarization was performed
- Modify `GET /api/news/<url>` endpoint to include summary fields
- Add fallback to content preview if no summary exists
- _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 8.1, 8.2, 8.3_
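The response shaping in task 5 might look like the sketch below, kept framework-independent so it can be called from any route handler. The `serialize_article` name, the 200-character preview length, and the choice to place the fallback preview in the `summary` field are assumptions; the plan only states that a content preview is used when no summary exists.

```python
from datetime import datetime


def serialize_article(doc, preview_chars=200):
    """Convert a stored article document into a JSON-friendly response dict."""
    summary = doc.get("summary")
    has_summary = bool(summary)

    # Fallback: a truncated content preview when no AI summary is stored.
    content = doc.get("content") or ""
    preview = content[:preview_chars]

    summarized_at = doc.get("summarized_at")
    if isinstance(summarized_at, datetime):
        summarized_at = summarized_at.isoformat()  # datetimes are not JSON-serializable

    return {
        "title": doc.get("title"),
        "link": doc.get("link"),
        "summary": summary if has_summary else preview,
        "summary_word_count": doc.get("summary_word_count") if has_summary else None,
        "summarized_at": summarized_at if has_summary else None,
        "has_summary": has_summary,
    }
```

Clients can then rely on `has_summary` to tell an AI-generated summary from a plain preview without inspecting the text.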
- [x] 6. Update database schema documentation
- Update `backend/DATABASE_SCHEMA.md` with new summary fields
- Add example document showing summary fields
- Document the summarization workflow
- _Requirements: 3.1, 3.2, 3.3_
- [x] 7. Add environment variable configuration
- Update `backend/env.template` with Ollama configuration
- Add comments explaining each Ollama setting
- Document default values
- _Requirements: 2.1, 2.2_
- [x] 8. Create test script for Ollama integration
- Create `news_crawler/test_ollama.py` to test Ollama connection
- Test summarization with sample article
- Test error handling (timeout, connection failure)
- Display configuration and connection status
- _Requirements: 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 4.1, 4.2_
- [x] 9. Update crawler statistics and logging
- Add summarization statistics to final report in `crawl_all_feeds()`
- Track total articles summarized vs failed
- Log average summarization time
- Display progress indicators during summarization
- _Requirements: 5.4, 6.1, 6.2, 6.3, 6.4, 6.5_
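The statistics in task 9 can be tracked with a small accumulator. The `SummarizationStats` class, its field names, and the report format are hypothetical; averaging over successful summarizations only is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class SummarizationStats:
    summarized: int = 0
    failed: int = 0
    total_seconds: float = 0.0

    def record(self, success, seconds):
        """Record one attempt; only successful attempts count toward the average."""
        if success:
            self.summarized += 1
            self.total_seconds += seconds
        else:
            self.failed += 1

    @property
    def average_seconds(self):
        return self.total_seconds / self.summarized if self.summarized else 0.0

    def report(self):
        return (
            f"Summarized: {self.summarized}, failed: {self.failed}, "
            f"avg time: {self.average_seconds:.2f}s"
        )
```

In the crawl loop, each attempt could be timed with `time.perf_counter()` before and after the call and passed to `record()`, with `report()` printed at the end of `crawl_all_feeds()`.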
- [x] 10. Create documentation for AI summarization
- Create `news_crawler/AI_SUMMARIZATION.md` explaining the feature
- Document configuration options
- Provide troubleshooting guide
- Add examples of usage
- _Requirements: 2.1, 2.2, 2.3, 2.4, 6.1, 6.2, 6.3_
- [x] 11. Update main README with AI summarization info
- Add section about AI summarization feature
- Document Ollama setup requirements
- Add configuration examples
- Update API endpoint documentation
- _Requirements: 2.1, 2.2, 7.1, 7.2_
- [x] 12. Test end-to-end workflow
- Run crawler with Ollama enabled
- Verify articles are summarized correctly
- Check database contains all expected fields
- Test API endpoints return summaries
- Verify error handling when Ollama is disabled/unavailable
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 4.3, 4.4, 4.5, 7.1, 7.2, 7.3, 7.4, 7.5, 8.1, 8.2, 8.3, 8.4, 8.5_