# Implementation Plan

- [x] 1. Create Ollama client module
  - Create `news_crawler/ollama_client.py` with OllamaClient class
  - Implement `summarize_article()` method with prompt construction and API call
  - Implement `is_available()` method for health checks
  - Implement `test_connection()` method for diagnostics
  - Add timeout handling (30 seconds)
  - Add error handling for connection, timeout, and invalid responses
  - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 4.1, 4.2, 4.3, 5.2_
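
As a rough illustration of task 1, the sketch below shows one way the client could be shaped. It assumes Ollama's `/api/generate` and `/api/tags` HTTP endpoints and uses only the standard library; the prompt wording, the default model name, the `max_words` parameter, and the shape of the returned dict are assumptions, not taken from the actual `ollama_client.py`.

```python
import json
import socket
import urllib.error
import urllib.request


class OllamaClient:
    """Illustrative client for an Ollama server; not the project's real module."""

    def __init__(self, base_url="http://localhost:11434", model="example-model", timeout=30):
        # "example-model" is a placeholder; the plan reads the model from OLLAMA_MODEL.
        self.base_url = base_url.rstrip("/")
        self.model = model
        self.timeout = timeout  # seconds, matching the 30-second limit in the plan

    @staticmethod
    def _failure(error):
        return {"success": False, "summary": None, "summary_word_count": 0, "error": error}

    def build_prompt(self, content, max_words=150):
        # Hypothetical prompt wording; the plan does not specify it.
        return (
            f"Summarize the following article in at most {max_words} words.\n\n"
            f"{content.strip()}\n\nSummary:"
        )

    def summarize_article(self, content, max_words=150):
        """Return a result dict; network and parsing errors are reported, not raised."""
        if not content or not content.strip():
            return self._failure("empty content")
        payload = json.dumps({
            "model": self.model,
            "prompt": self.build_prompt(content, max_words),
            "stream": False,  # ask for a single JSON object instead of a stream
        }).encode("utf-8")
        request = urllib.request.Request(
            f"{self.base_url}/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(request, timeout=self.timeout) as response:
                body = json.loads(response.read().decode("utf-8"))
            summary = body["response"].strip()
        except OSError as exc:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            reason = getattr(exc, "reason", exc)
            if isinstance(reason, socket.timeout):
                return self._failure("timeout")
            return self._failure(f"connection error: {reason}")
        except (ValueError, KeyError, TypeError, AttributeError):
            return self._failure("invalid response")
        if not summary:
            return self._failure("invalid response")
        return {
            "success": True,
            "summary": summary,
            "summary_word_count": len(summary.split()),
            "error": None,
        }

    def is_available(self):
        """Health check: True if the server answers GET /api/tags with HTTP 200."""
        try:
            with urllib.request.urlopen(f"{self.base_url}/api/tags", timeout=5) as response:
                return response.status == 200
        except (OSError, ValueError):
            return False
```

Returning a result dict instead of raising keeps the crawler loop simple: a failed summary becomes a logged warning rather than an aborted crawl.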

- [x] 2. Create configuration module for crawler
  - Create `news_crawler/config.py` with Config class
  - Load environment variables (OLLAMA_BASE_URL, OLLAMA_MODEL, OLLAMA_ENABLED, OLLAMA_API_KEY, OLLAMA_TIMEOUT)
  - Add validation for required configuration
  - Add default values for optional configuration
  - _Requirements: 2.1, 2.2, 2.3, 2.4_
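
The configuration loading in task 2 could look roughly like the sketch below. The variable names come from the plan; which settings count as required, the accepted boolean spellings, and the default values (other than the 30-second timeout and Ollama's usual `localhost:11434` address) are assumptions.

```python
import os
from dataclasses import dataclass


def _parse_bool(value):
    # Assumed set of truthy spellings; anything else counts as false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Config:
    """Illustrative crawler settings loaded from environment variables."""

    ollama_base_url: str
    ollama_model: str
    ollama_enabled: bool
    ollama_api_key: str
    ollama_timeout: int

    @classmethod
    def from_env(cls, env=None):
        # Accept any mapping so the loader can be exercised without touching os.environ.
        env = os.environ if env is None else env
        enabled = _parse_bool(env.get("OLLAMA_ENABLED", "false"))
        base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/")
        model = env.get("OLLAMA_MODEL", "").strip()
        try:
            timeout = int(env.get("OLLAMA_TIMEOUT", "30"))
        except ValueError as exc:
            raise ValueError("OLLAMA_TIMEOUT must be a whole number of seconds") from exc
        if timeout <= 0:
            raise ValueError("OLLAMA_TIMEOUT must be positive")
        if not base_url.startswith(("http://", "https://")):
            raise ValueError("OLLAMA_BASE_URL must start with http:// or https://")
        # Assumption: the model name is only required when summarization is enabled.
        if enabled and not model:
            raise ValueError("OLLAMA_MODEL is required when OLLAMA_ENABLED is true")
        return cls(
            ollama_base_url=base_url,
            ollama_model=model,
            ollama_enabled=enabled,
            ollama_api_key=env.get("OLLAMA_API_KEY", ""),
            ollama_timeout=timeout,
        )
```

Calling `Config.from_env()` with no argument reads the real process environment; passing a plain dict is convenient in tests.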

- [x] 3. Integrate Ollama client into crawler service
  - Import OllamaClient in `news_crawler/crawler_service.py`
  - Initialize Ollama client at module level using Config
  - Modify `crawl_rss_feed()` to call summarization after content extraction
  - Add conditional logic to skip summarization if OLLAMA_ENABLED is false
  - Add error handling to continue processing if summarization fails
  - Add logging for summarization start, success, and failure
  - Add rate limiting delay after summarization
  - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 2.3, 2.4, 4.1, 4.5, 5.1, 5.3, 6.1, 6.2, 6.3_
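
The per-article step described in task 3 might be sketched as follows. The helper name `summarize_if_enabled`, the article dict keys, the one-second default delay, and the result-dict shape returned by the client are all hypothetical; the real logic lives inside `crawl_rss_feed()`.

```python
import logging
import time

logger = logging.getLogger("news_crawler")


def summarize_if_enabled(article, client, enabled, delay_seconds=1.0, sleep=time.sleep):
    """Add a summary to `article` in place when possible; never raises."""
    if not enabled or client is None:
        # OLLAMA_ENABLED is false: store the article without a summary.
        return article

    title = article.get("title", "<untitled>")
    logger.info("Summarizing: %s", title)
    try:
        result = client.summarize_article(article.get("content", ""))
    except Exception:
        # A bug in the client must not abort the rest of the crawl.
        logger.exception("Summarization crashed for: %s", title)
        result = {"success": False, "error": "unexpected error"}

    if result.get("success"):
        article["summary"] = result["summary"]
        logger.info("Summarized: %s", title)
    else:
        logger.warning("Summarization failed for %s: %s", title, result.get("error"))

    # Rate limiting: pause after every request so the model server is not flooded.
    sleep(delay_seconds)
    return article
```

Passing `sleep` as a parameter is only a testing convenience; production code would rely on the `time.sleep` default.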

- [x] 4. Update database schema and storage
  - Modify article document structure in `crawl_rss_feed()` to include:
    - `summary` field (AI-generated summary)
    - `summary_word_count` field
    - `summarized_at` field (timestamp)
  - Update MongoDB upsert logic to handle new fields
  - Add check to skip re-summarization if article already has summary
  - _Requirements: 3.1, 3.2, 3.3, 3.4, 8.4_
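
One possible shape for the storage changes in task 4 is sketched below. The three summary field names come from the plan; the `link` key used to identify an article, the helper names, and the use of UTC timestamps are assumptions.

```python
from datetime import datetime, timezone


def needs_summary(existing):
    """True when the stored document is missing or has no non-empty summary."""
    return not (existing and existing.get("summary"))


def build_upsert(article, summary=None, now=None):
    """Return (filter, update) arguments for an upsert keyed on the article link."""
    # The key lives in the filter, so it is left out of the $set document.
    fields = {key: value for key, value in article.items() if key != "link"}
    if summary:
        fields.update({
            "summary": summary,
            "summary_word_count": len(summary.split()),
            "summarized_at": now or datetime.now(timezone.utc),
        })
    # Only $set fields that are present: an existing summary is never
    # overwritten with an empty value when summarization was skipped or failed.
    return {"link": article["link"]}, {"$set": fields}


# Usage with PyMongo (not executed here; `collection` is a hypothetical handle):
#   existing = collection.find_one({"link": article["link"]})
#   summary = make_summary(article) if needs_summary(existing) else None
#   query, update = build_upsert(article, summary)
#   collection.update_one(query, update, upsert=True)
```

Keeping the document construction separate from the database call makes the new fields easy to check without a running MongoDB instance.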

- [x] 5. Update backend API to return summaries
  - Modify `backend/routes/news_routes.py` `GET /api/news` endpoint
  - Add `summary`, `summary_word_count`, `summarized_at` fields to response
  - Add `has_summary` boolean field to indicate if AI summarization was performed
  - Modify `GET /api/news/<url>` endpoint to include summary fields
  - Add fallback to content preview if no summary exists
  - _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 8.1, 8.2, 8.3_
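
The response shaping in task 5 could be handled by a small serializer like the one below. This is one interpretation of the fallback rule: when no AI summary exists, `summary` carries a content preview and `has_summary` is false. The function name, the `title`, `link`, and `content` keys, and the 200-character preview length are assumptions.

```python
from datetime import datetime


def serialize_article(doc, preview_chars=200):
    """Build the JSON-ready dict returned for one stored article document."""
    summary = doc.get("summary")
    has_summary = bool(summary)
    content = doc.get("content") or ""
    summarized_at = doc.get("summarized_at")
    if isinstance(summarized_at, datetime):
        # datetime objects are not JSON serializable; send ISO 8601 text instead.
        summarized_at = summarized_at.isoformat()
    return {
        "title": doc.get("title"),
        "link": doc.get("link"),
        # Fall back to a plain content preview when no AI summary was stored.
        "summary": summary if has_summary else content[:preview_chars],
        "summary_word_count": doc.get("summary_word_count", 0) if has_summary else 0,
        "summarized_at": summarized_at if has_summary else None,
        "has_summary": has_summary,
    }
```

With this shape, clients can always render `summary` and use `has_summary` to decide whether to label the text as AI-generated.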

- [x] 6. Update database schema documentation
  - Update `backend/DATABASE_SCHEMA.md` with new summary fields
  - Add example document showing summary fields
  - Document the summarization workflow
  - _Requirements: 3.1, 3.2, 3.3_

- [x] 7. Add environment variable configuration
  - Update `backend/env.template` with Ollama configuration
  - Add comments explaining each Ollama setting
  - Document default values
  - _Requirements: 2.1, 2.2_

- [x] 8. Create test script for Ollama integration
  - Create `news_crawler/test_ollama.py` to test Ollama connection
  - Test summarization with sample article
  - Test error handling (timeout, connection failure)
  - Display configuration and connection status
  - _Requirements: 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 4.1, 4.2_

- [x] 9. Update crawler statistics and logging
  - Add summarization statistics to final report in `crawl_all_feeds()`
  - Track total articles summarized vs failed
  - Log average summarization time
  - Display progress indicators during summarization
  - _Requirements: 5.4, 6.1, 6.2, 6.3, 6.4, 6.5_
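
The counters described in task 9 could be collected in a small helper such as the sketch below. The class name, the report wording, and the choice to average only over successful summarizations (so that long timeouts do not skew the figure) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SummarizationStats:
    """Running totals for one crawl; illustrative only."""

    succeeded: int = 0
    failed: int = 0
    success_seconds: float = 0.0

    def record(self, success, seconds):
        # Only successful calls contribute to the timing average.
        if success:
            self.succeeded += 1
            self.success_seconds += seconds
        else:
            self.failed += 1

    @property
    def average_seconds(self):
        # Avoid dividing by zero when nothing was summarized.
        return self.success_seconds / self.succeeded if self.succeeded else 0.0

    def report(self):
        total = self.succeeded + self.failed
        return (
            f"Summarized {self.succeeded}/{total} articles "
            f"({self.failed} failed), avg {self.average_seconds:.2f}s"
        )


stats = SummarizationStats()
stats.record(True, 2.0)
stats.record(True, 4.0)
stats.record(False, 30.0)
print(stats.report())  # Summarized 2/3 articles (1 failed), avg 3.00s
```

In the crawler, `record()` would be called once per summarization attempt, with the elapsed time measured around the client call, and `report()` once at the end of `crawl_all_feeds()`.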

- [x] 10. Create documentation for AI summarization
  - Create `news_crawler/AI_SUMMARIZATION.md` explaining the feature
  - Document configuration options
  - Provide troubleshooting guide
  - Add examples of usage
  - _Requirements: 2.1, 2.2, 2.3, 2.4, 6.1, 6.2, 6.3_

- [x] 11. Update main README with AI summarization info
  - Add section about AI summarization feature
  - Document Ollama setup requirements
  - Add configuration examples
  - Update API endpoint documentation
  - _Requirements: 2.1, 2.2, 7.1, 7.2_

- [x] 12. Test end-to-end workflow
  - Run crawler with Ollama enabled
  - Verify articles are summarized correctly
  - Check database contains all expected fields
  - Test API endpoints return summaries
  - Verify error handling when Ollama is disabled/unavailable
  - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 4.3, 4.4, 4.5, 7.1, 7.2, 7.3, 7.4, 7.5, 8.1, 8.2, 8.3, 8.4, 8.5_