Crawler Tests
Test suite for the news crawler, AI clustering, and neutral summary generation.
Test Files
AI Clustering & Aggregation Tests
- test_clustering_real.py - Tests AI-powered article clustering with realistic fake articles
- test_neutral_summaries.py - Tests neutral summary generation from clustered articles
- test_complete_workflow.py - End-to-end test of clustering + neutral summaries
Core Crawler Tests
- test_crawler.py - Basic crawler functionality
- test_ollama.py - Ollama AI integration tests
- test_rss_feeds.py - RSS feed parsing tests
Running Tests
Run All Tests
# From project root
docker-compose exec crawler python -m pytest tests/crawler/
Run Specific Test
# AI clustering test
docker-compose exec crawler python tests/crawler/test_clustering_real.py
# Neutral summaries test
docker-compose exec crawler python tests/crawler/test_neutral_summaries.py
# Complete workflow test
docker-compose exec crawler python tests/crawler/test_complete_workflow.py
Run Tests Inside Container
# Enter container
docker-compose exec crawler bash
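# Run tests (same paths as the docker-compose exec commands above, since the shell starts in the same working directory)
python tests/crawler/test_clustering_real.py
python tests/crawler/test_neutral_summaries.py
python tests/crawler/test_complete_workflow.py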
Test Data
Tests use fake articles to avoid depending on external RSS feeds:
Test Scenarios:
- Same story, different sources - Should cluster together
- Different stories - Should remain separate
- Multi-source clustering - Should generate neutral summaries
Expected Results:
- Housing story (2 sources) → Cluster together → Neutral summary
- Bayern transfer (2 sources) → Cluster together → Neutral summary
- Single-source stories → Individual summaries
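The scenarios above can be sketched as a small fixture. This is only an illustration, not the suite's actual test data: the field names (`story`, `source`, `title`, `link`), the headlines, and the helpers `make_fake_article`, `expected_clusters`, and `needs_neutral_summary` are all assumptions. Only the `https://example.com/` link prefix is taken from the cleanup script in this README.

```python
from collections import defaultdict


def make_fake_article(story, source, title):
    """Hypothetical helper: build one fake article dict (illustrative schema)."""
    slug = f"{story}-{source}".lower().replace(" ", "-")
    return {
        "story": story,    # ground-truth label, used only by the test
        "source": source,
        "title": title,
        "link": f"https://example.com/{slug}",  # prefix matched by the cleanup script
    }


# Two two-source stories and one single-source story (made-up headlines).
FAKE_ARTICLES = [
    make_fake_article("housing", "Source A", "City approves new housing plan"),
    make_fake_article("housing", "Source B", "Council backs housing programme"),
    make_fake_article("bayern-transfer", "Source A", "Bayern signs new striker"),
    make_fake_article("bayern-transfer", "Source B", "Striker joins Bayern"),
    make_fake_article("weather", "Source C", "Sunny weekend ahead"),
]


def expected_clusters(articles):
    """Group article links by story label: the grouping a clustering test would expect."""
    clusters = defaultdict(list)
    for article in articles:
        clusters[article["story"]].append(article["link"])
    return dict(clusters)


def needs_neutral_summary(cluster_links):
    """Multi-source clusters get a neutral summary; single-source ones do not."""
    return len(cluster_links) >= 2
```

A test could compare the clusters produced by the AI step against `expected_clusters(FAKE_ARTICLES)` and assert that only the multi-source clusters receive a neutral summary.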
Cleanup
Tests create temporary data in MongoDB. To clean up:
# Clean test articles
docker-compose exec crawler python << 'EOF'
from pymongo import MongoClient
client = MongoClient("mongodb://admin:changeme@mongodb:27017/")
db = client["munich_news"]
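# Delete only the fake test articles, identified by their example.com links
db.articles.delete_many({"link": {"$regex": "^https://example.com/"}})
# Caution: the empty filter removes every cluster summary, not just test ones
db.cluster_summaries.delete_many({})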
print("✓ Test data cleaned")
EOF
Requirements
- Docker containers must be running
- Ollama service must be available
- MongoDB must be accessible
- AI model (phi3:latest) must be downloaded
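The model requirement can be checked before a test run with a short preflight script. The sketch below assumes Ollama is reachable at `http://ollama:11434` (the compose service name plus Ollama's default port; adjust to your setup) and uses Ollama's model-listing endpoint, `GET /api/tags`. The function names `fetch_tags` and `model_is_available` are hypothetical and not part of the test suite.

```python
import json
import urllib.request

# Assumption: Ollama is reachable under the compose service name "ollama"
# on its default port 11434; adjust to match your docker-compose.yml.
OLLAMA_URL = "http://ollama:11434"


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Return the parsed JSON from Ollama's model-listing endpoint (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_is_available(tags_payload, model="phi3:latest"):
    """Check whether `model` appears by name in a /api/tags response."""
    names = {entry.get("name") for entry in tags_payload.get("models", [])}
    return model in names
```

Calling `model_is_available(fetch_tags())` from inside the crawler container returns `False` when the model has not been downloaded yet, and `fetch_tags()` raises a `urllib.error.URLError` when the Ollama service cannot be reached.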
Troubleshooting
Ollama Not Available
# Check Ollama status
docker-compose logs ollama
# Restart Ollama
docker-compose restart ollama
Tests Timing Out
- Increase the timeout in the test files (default: 60s)
- Check that the Ollama model (phi3:latest) is downloaded
- If GPU acceleration is enabled, verify that it is actually being used
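One way to raise the timeout without editing each test file is to read it from the environment. This is a sketch under assumptions: the variable name `CRAWLER_TEST_TIMEOUT` and the helper `get_test_timeout` are hypothetical and do not exist in the test files; only the 60-second default comes from this README.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 60  # default stated in this README


def get_test_timeout(env=None):
    """Return the timeout in seconds, honoring a hypothetical CRAWLER_TEST_TIMEOUT override."""
    env = os.environ if env is None else env
    raw = env.get("CRAWLER_TEST_TIMEOUT")
    if raw is None:
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = int(raw)
    except ValueError:
        # Ignore values that are not integers and fall back to the default.
        return DEFAULT_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS
```

With something like this in place, a slow machine could run the tests with the variable set to a larger value (for example 180) and leave the files unchanged.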
MongoDB Connection Issues
# Check MongoDB status
docker-compose logs mongodb
# Restart MongoDB
docker-compose restart mongodb