# Crawler Tests

Test suite for the news crawler, AI clustering, and neutral summary generation.

## Test Files

### AI Clustering & Aggregation Tests

- **`test_clustering_real.py`** - Tests AI-powered article clustering with realistic fake articles
- **`test_neutral_summaries.py`** - Tests neutral summary generation from clustered articles
- **`test_complete_workflow.py`** - End-to-end test of clustering + neutral summaries

### Core Crawler Tests

- **`test_crawler.py`** - Basic crawler functionality
- **`test_ollama.py`** - Ollama AI integration tests
- **`test_rss_feeds.py`** - RSS feed parsing tests

## Running Tests

### Run All Tests

```bash
# From project root
docker-compose exec crawler python -m pytest tests/crawler/
```

### Run Specific Test

```bash
# AI clustering test
docker-compose exec crawler python tests/crawler/test_clustering_real.py

# Neutral summaries test
docker-compose exec crawler python tests/crawler/test_neutral_summaries.py

# Complete workflow test
docker-compose exec crawler python tests/crawler/test_complete_workflow.py
```

### Run Tests Inside Container

```bash
# Enter container
docker-compose exec crawler bash

# Run tests
python test_clustering_real.py
python test_neutral_summaries.py
python test_complete_workflow.py
```

## Test Data

Tests use fake articles to avoid depending on external RSS feeds:

**Test Scenarios:**

1. **Same story, different sources** - Should cluster together
2. **Different stories** - Should remain separate
3. **Multi-source clustering** - Should generate neutral summaries

**Expected Results:**

- Housing story (2 sources) → Cluster together → Neutral summary
- Bayern transfer (2 sources) → Cluster together → Neutral summary
- Single-source stories → Individual summaries

## Cleanup

Tests create temporary data in MongoDB.
To clean up:

```bash
# Clean test articles
# -T disables TTY allocation so the heredoc can be piped to python
docker-compose exec -T crawler python << 'EOF'
from pymongo import MongoClient

client = MongoClient("mongodb://admin:changeme@mongodb:27017/")
db = client["munich_news"]

# Remove only the fake test articles (links under example.com)
db.articles.delete_many({"link": {"$regex": "^https://example.com/"}})
# Note: the empty filter removes ALL cluster summaries, not only test ones
db.cluster_summaries.delete_many({})
print("✓ Test data cleaned")
EOF
```

## Requirements

- Docker containers must be running
- Ollama service must be available
- MongoDB must be accessible
- AI model (phi3:latest) must be downloaded

## Troubleshooting

### Ollama Not Available

```bash
# Check Ollama status
docker-compose logs ollama

# Restart Ollama
docker-compose restart ollama
```

### Tests Timing Out

- Increase the timeout in the test files (default: 60s)
- Check that the Ollama model is downloaded
- Verify that GPU acceleration is working, if enabled

### MongoDB Connection Issues

```bash
# Check MongoDB status
docker-compose logs mongodb

# Restart MongoDB
docker-compose restart mongodb
```
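
### Checking Requirements Up Front

Several of the issues above come down to one of the requirements not being met. The script below is a minimal sketch of a preflight check that could be run inside the crawler container before the tests. It is not part of the test suite. It assumes that Ollama is reachable under the hostname `ollama` on its default port 11434 and that MongoDB is reachable at `mongodb:27017` as in the cleanup snippet; the helper names (`port_open`, `model_available`, `fetch_ollama_tags`, `main`) are hypothetical.

```python
import json
import socket
import urllib.request

# Assumptions, not confirmed by this README: Ollama is reachable as "ollama"
# on its default port 11434; MongoDB is on "mongodb:27017" as in the cleanup
# snippet. Adjust these to match your docker-compose setup.
OLLAMA_URL = "http://ollama:11434"
MONGO_HOST, MONGO_PORT = "mongodb", 27017
MODEL = "phi3:latest"


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def model_available(tags: dict, model: str) -> bool:
    """Return True if `model` is listed in an Ollama /api/tags response."""
    return any(m.get("name") == model for m in tags.get("models", []))


def fetch_ollama_tags(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the list of locally downloaded models from Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def main() -> int:
    """Print any unmet requirement; return 0 if all checks pass, else 1."""
    ok = True
    if not port_open(MONGO_HOST, MONGO_PORT):
        print("MongoDB is not reachable")
        ok = False
    try:
        tags = fetch_ollama_tags(OLLAMA_URL)
    except (OSError, ValueError) as exc:
        # URLError is a subclass of OSError; ValueError covers invalid JSON
        print(f"Ollama is not reachable: {exc}")
        ok = False
    else:
        if not model_available(tags, MODEL):
            print(f"Model {MODEL} is not downloaded")
            ok = False
    if ok:
        print("All requirements look satisfied")
    return ok and 0 or 1
```

Calling `main()` prints any requirement that is not met and returns a non-zero value in that case, so it can be wrapped in `raise SystemExit(main())` and chained in front of a test run. The MongoDB check only verifies that the port accepts connections; it does not validate credentials.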