Implementation Plan

1. Create Ollama client module
- Create news_crawler/ollama_client.py with OllamaClient class
- Implement summarize_article() method with prompt construction and API call
- Implement is_available() method for health checks
- Implement test_connection() method for diagnostics
- Add timeout handling (30 seconds)
- Add error handling for connection, timeout, and invalid responses
- Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 4.1, 4.2, 4.3, 5.2
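
The client described in this task could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual implementation: the default model name, the prompt wording, the shape of the returned dict, and the injectable `opener` parameter are all hypothetical. It assumes Ollama's `/api/generate` and `/api/tags` HTTP endpoints and uses only the standard library.

```python
import json
import socket
import urllib.error
import urllib.request


def build_prompt(title, content, max_words=150):
    """Build the summarization prompt. Wording and word limit are illustrative."""
    return (
        f"Summarize the following news article in at most {max_words} words.\n\n"
        f"Title: {title}\n\n{content}"
    )


def _failure(error):
    """Uniform failure result (this result shape is an assumption)."""
    return {"success": False, "summary": None, "error": error}


class OllamaClient:
    # The default model name is a placeholder assumption, not taken from the plan.
    def __init__(self, base_url="http://localhost:11434", model="llama3",
                 timeout=30, opener=urllib.request.urlopen):
        self.base_url = base_url.rstrip("/")
        self.model = model
        self.timeout = timeout  # seconds; the plan calls for 30
        self._open = opener     # injectable so tests need no running server

    def summarize_article(self, title, content):
        """Return a result dict; connection and timeout errors never propagate."""
        payload = json.dumps({
            "model": self.model,
            "prompt": build_prompt(title, content),
            "stream": False,  # ask for a single JSON object, not a stream
        }).encode("utf-8")
        request = urllib.request.Request(
            f"{self.base_url}/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        try:
            with self._open(request, timeout=self.timeout) as response:
                body = json.loads(response.read().decode("utf-8"))
        except OSError as exc:
            # URLError wraps connect timeouts in .reason; read timeouts are raised directly.
            reason = getattr(exc, "reason", exc)
            timed_out = isinstance(exc, socket.timeout) or isinstance(reason, socket.timeout)
            return _failure("timeout" if timed_out else "connection error")
        except ValueError:
            # Covers malformed JSON and undecodable bytes.
            return _failure("invalid response")
        summary = body.get("response") if isinstance(body, dict) else None
        if not isinstance(summary, str) or not summary.strip():
            return _failure("invalid response")
        return {"success": True, "summary": summary.strip(), "error": None}

    def is_available(self):
        """Health check: True if the server answers the model-list endpoint."""
        try:
            with self._open(f"{self.base_url}/api/tags", timeout=self.timeout) as response:
                return response.status == 200
        except (OSError, ValueError):
            return False

    def test_connection(self):
        """Collect basic diagnostics for display."""
        return {
            "base_url": self.base_url,
            "model": self.model,
            "timeout": self.timeout,
            "available": self.is_available(),
        }
```

Returning a result dict instead of raising keeps the crawler loop simple: a failed summary becomes a value to log and skip, not an exception that could abort the feed.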

2. Create configuration module for crawler
- Create news_crawler/config.py with Config class
- Load environment variables (OLLAMA_BASE_URL, OLLAMA_MODEL, OLLAMA_ENABLED, OLLAMA_API_KEY, OLLAMA_TIMEOUT)
- Add validation for required configuration
- Add default values for optional configuration
- Requirements: 2.1, 2.2, 2.3, 2.4
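
One possible shape for the configuration loader is sketched below. The environment variable names come from the plan; everything else is an assumption made for illustration, including which settings count as required (here, only the model when summarization is enabled), the accepted boolean spellings, and the `from_env` helper name. The 30-second default mirrors the timeout mentioned for the client, and `http://localhost:11434` is Ollama's default local address.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _parse_bool(value: str) -> bool:
    """Accept a few common truthy spellings (the accepted set is an assumption)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Config:
    ollama_base_url: str
    ollama_model: str
    ollama_enabled: bool
    ollama_api_key: Optional[str]
    ollama_timeout: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Config":
        """Load settings from a mapping, defaulting to the process environment."""
        env = os.environ if env is None else env
        enabled = _parse_bool(env.get("OLLAMA_ENABLED", "false"))
        base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434").strip()
        model = env.get("OLLAMA_MODEL", "").strip()
        api_key = env.get("OLLAMA_API_KEY") or None

        try:
            timeout = int(env.get("OLLAMA_TIMEOUT", "30"))
        except ValueError:
            raise ValueError("OLLAMA_TIMEOUT must be an integer number of seconds") from None
        if timeout <= 0:
            raise ValueError("OLLAMA_TIMEOUT must be greater than zero")

        # Assumption: a model is only required when summarization is switched on.
        if enabled and not model:
            raise ValueError("OLLAMA_MODEL is required when OLLAMA_ENABLED is true")

        return cls(
            ollama_base_url=base_url,
            ollama_model=model,
            ollama_enabled=enabled,
            ollama_api_key=api_key,
            ollama_timeout=timeout,
        )
```

Passing the environment in as a mapping keeps validation testable without touching `os.environ`; calling `Config.from_env()` with no argument reads the real environment.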

3. Integrate Ollama client into crawler service
- Import OllamaClient in news_crawler/crawler_service.py
- Initialize Ollama client at module level using Config
- Modify crawl_rss_feed() to call summarization after content extraction
- Add conditional logic to skip summarization if OLLAMA_ENABLED is false
- Add error handling to continue processing if summarization fails
- Add logging for summarization start, success, and failure
- Add rate limiting delay after summarization
- Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 2.3, 2.4, 4.1, 4.5, 5.1, 5.3, 6.1, 6.2, 6.3
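
The per-article step inside crawl_rss_feed() might look something like the helper below. It is a sketch only: the helper name `summarize_if_enabled`, the article dict keys (`title`, `content`), the one-second default delay, and the assumption that the client returns a dict with `success`, `summary`, and `error` keys are all hypothetical.

```python
import logging
import time

logger = logging.getLogger("news_crawler")


def summarize_if_enabled(article, client, enabled, delay_seconds=1.0, sleep=time.sleep):
    """Return the summary text, or None when skipped or failed. Never raises."""
    if not enabled:
        # OLLAMA_ENABLED is false: skip without contacting the server.
        return None

    title = article.get("title", "")
    logger.info("Summarization started: %s", title)
    try:
        result = client.summarize_article(title, article.get("content", ""))
    except Exception:
        # Any unexpected failure is logged so the crawl can continue.
        logger.exception("Summarization raised unexpectedly: %s", title)
        result = {"success": False, "summary": None, "error": "unexpected exception"}

    if result.get("success"):
        logger.info("Summarization succeeded: %s", title)
        summary = result.get("summary")
    else:
        logger.warning("Summarization failed (%s): %s", result.get("error"), title)
        summary = None

    # Rate limiting: pause after every attempt, successful or not.
    sleep(delay_seconds)
    return summary
```

The caller stores the article either way; a `None` return simply means the summary fields are left out of the document for that article.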

4. Update database schema and storage
- Modify article document structure in crawl_rss_feed() to include:
  - summary field (AI-generated summary)
  - summary_word_count field
  - summarized_at field (timestamp)
- Update MongoDB upsert logic to handle new fields
- Add check to skip re-summarization if article already has summary
- Requirements: 3.1, 3.2, 3.3, 3.4, 8.4
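
A sketch of how the new fields could be attached to the upsert follows. The two helper names are hypothetical, and the article's other fields are copied through unchanged because the plan does not list them. Word count is computed by whitespace splitting, which is an assumption about how summary_word_count is defined.

```python
from datetime import datetime, timezone


def needs_summary(existing_doc):
    """True when the stored article is missing or has no non-empty summary."""
    return not (existing_doc and existing_doc.get("summary"))


def build_article_update(article, summary=None, now=None):
    """Build a MongoDB update document for an upsert.

    Summary fields are only included when a summary was produced, so a
    re-crawl without a new summary leaves any stored summary untouched.
    """
    fields = dict(article)
    if summary:
        fields["summary"] = summary
        fields["summary_word_count"] = len(summary.split())
        fields["summarized_at"] = now or datetime.now(timezone.utc)
    return {"$set": fields}
```

With PyMongo, the result could be passed as `collection.update_one({"link": article["link"]}, update, upsert=True)`. Using `link` as the unique key is an assumption here; substitute whatever identifier the crawler already upserts on.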

5. Update backend API to return summaries
- Modify backend/routes/news_routes.py GET /api/news endpoint
- Add summary, summary_word_count, summarized_at fields to response
- Add has_summary boolean field to indicate if AI summarization was performed
- Modify GET /api/news/ endpoint to include summary fields
- Add fallback to content preview if no summary exists
- Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 8.1, 8.2, 8.3
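
The response shaping for both endpoints could be factored into one framework-independent function, roughly as below. This is a sketch under assumptions: the function name, the `title`, `link`, and `content` keys, the 200-character preview length, and the choice to place the fallback preview in the `summary` field are all illustrative, since the plan does not specify them.

```python
def serialize_article(doc, preview_chars=200):
    """Convert a stored article document into an API response dict."""
    summary = doc.get("summary")
    has_summary = bool(summary)

    # Fallback: a truncated content preview when no AI summary exists.
    content = doc.get("content") or ""
    preview = content[:preview_chars]
    if len(content) > preview_chars:
        preview = preview.rstrip() + "..."

    summarized_at = doc.get("summarized_at")
    if has_summary and hasattr(summarized_at, "isoformat"):
        summarized_at = summarized_at.isoformat()  # datetime -> JSON-friendly string
    elif not has_summary:
        summarized_at = None

    return {
        "title": doc.get("title"),
        "link": doc.get("link"),
        "summary": summary if has_summary else preview,
        "summary_word_count": doc.get("summary_word_count") if has_summary else None,
        "summarized_at": summarized_at,
        "has_summary": has_summary,
    }
```

Clients can rely on `has_summary` to tell an AI-generated summary apart from the preview fallback, so the `summary` field is always safe to display.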

6. Update database schema documentation
- Update backend/DATABASE_SCHEMA.md with new summary fields
- Add example document showing summary fields
- Document the summarization workflow
- Requirements: 3.1, 3.2, 3.3

7. Add environment variable configuration
- Update backend/env.template with Ollama configuration
- Add comments explaining each Ollama setting
- Document default values
- Requirements: 2.1, 2.2

8. Create test script for Ollama integration
- Create news_crawler/test_ollama.py to test Ollama connection
- Test summarization with sample article
- Test error handling (timeout, connection failure)
- Display configuration and connection status
- Requirements: 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 4.1, 4.2

9. Update crawler statistics and logging
- Add summarization statistics to final report in crawl_all_feeds()
- Track total articles summarized vs failed
- Log average summarization time
- Display progress indicators during summarization
- Requirements: 5.4, 6.1, 6.2, 6.3, 6.4, 6.5
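
The counters for the final report could be kept in a small accumulator such as the one below. The class name, the report wording, and the decision to average over successful summaries only are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class SummarizationStats:
    succeeded: int = 0
    failed: int = 0
    total_seconds: float = 0.0  # time spent on successful summaries only

    def record(self, success, seconds):
        """Record one summarization attempt and how long it took."""
        if success:
            self.succeeded += 1
            self.total_seconds += seconds
        else:
            self.failed += 1

    @property
    def average_seconds(self):
        """Average duration of successful summaries; 0.0 when there are none."""
        return self.total_seconds / self.succeeded if self.succeeded else 0.0

    def report(self):
        """One-line summary suitable for the end-of-crawl log."""
        return (
            f"Summarized: {self.succeeded}, failed: {self.failed}, "
            f"average time: {self.average_seconds:.2f}s"
        )
```

In the crawler, each attempt can be timed with `time.perf_counter()` before and after the summarization call and passed to `record()`. Failed attempts are excluded from the average so that a run of 30-second timeouts does not distort the typical summarization time.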

10. Create documentation for AI summarization
- Create news_crawler/AI_SUMMARIZATION.md explaining the feature
- Document configuration options
- Provide troubleshooting guide
- Add examples of usage
- Requirements: 2.1, 2.2, 2.3, 2.4, 6.1, 6.2, 6.3

11. Update main README with AI summarization info
- Add section about AI summarization feature
- Document Ollama setup requirements
- Add configuration examples
- Update API endpoint documentation
- Requirements: 2.1, 2.2, 7.1, 7.2

12. Test end-to-end workflow
- Run crawler with Ollama enabled
- Verify articles are summarized correctly
- Check database contains all expected fields
- Test API endpoints return summaries
- Verify error handling when Ollama is disabled/unavailable
- Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 4.3, 4.4, 4.5, 7.1, 7.2, 7.3, 7.4, 7.5, 8.1, 8.2, 8.3, 8.4, 8.5