Crawler Tests

Test suite for the news crawler, AI clustering, and neutral summary generation.

Test Files

AI Clustering & Aggregation Tests

  • test_clustering_real.py - Tests AI-powered article clustering with realistic fake articles
  • test_neutral_summaries.py - Tests neutral summary generation from clustered articles
  • test_complete_workflow.py - End-to-end test of clustering + neutral summaries

Core Crawler Tests

  • test_crawler.py - Basic crawler functionality
  • test_ollama.py - Ollama AI integration tests
  • test_rss_feeds.py - RSS feed parsing tests (see the sketch after this list)
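
The feed-parsing tests presumably exercise logic along these lines. This is a minimal sketch assuming the feedparser library and a placeholder feed URL; the real crawler code may differ:

# Sketch of the RSS parsing test_rss_feeds.py covers (feedparser usage is an
# assumption; the feed URL is a placeholder, not a real fixture)
import feedparser

FEED_URL = "https://example.com/rss"

feed = feedparser.parse(FEED_URL)
assert not feed.bozo, f"feed failed to parse: {feed.bozo_exception}"
for entry in feed.entries:
    # Each entry should expose at least a title and a link
    print(entry.title, entry.link)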

Running Tests

Run All Tests

# From project root
docker-compose exec crawler python -m pytest tests/crawler/

Run Specific Test

# AI clustering test
docker-compose exec crawler python tests/crawler/test_clustering_real.py

# Neutral summaries test
docker-compose exec crawler python tests/crawler/test_neutral_summaries.py

# Complete workflow test
docker-compose exec crawler python tests/crawler/test_complete_workflow.py

Run Tests Inside Container

# Enter container
docker-compose exec crawler bash

# Run tests (same relative paths as from the host)
python tests/crawler/test_clustering_real.py
python tests/crawler/test_neutral_summaries.py
python tests/crawler/test_complete_workflow.py

Test Data

Tests use fake articles to avoid depending on external RSS feeds (a sketch of their shape follows the lists below):

Test Scenarios:

  1. Same story, different sources - Should cluster together
  2. Different stories - Should remain separate
  3. Multi-source clustering - Should generate neutral summaries

Expected Results:

  • Housing story (2 sources) → Cluster together → Neutral summary
  • Bayern transfer (2 sources) → Cluster together → Neutral summary
  • Single-source stories → Individual summaries
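
The fixtures look roughly like the following. Field names, source names, and the single-source story are illustrative assumptions, not the exact schema in the test files; only the https://example.com/ link prefix is taken from the cleanup snippet below:

# Illustrative fake-article fixtures (field names are assumptions,
# not the exact schema used by the tests)
fake_articles = [
    {"title": "Munich approves 2,000 new housing units",
     "link": "https://example.com/sz/housing", "source": "SZ"},
    {"title": "City council greenlights major housing project",
     "link": "https://example.com/merkur/housing", "source": "Merkur"},
    {"title": "Bayern sign new midfielder",
     "link": "https://example.com/sz/bayern", "source": "SZ"},
    {"title": "Transfer done: Bayern land midfield target",
     "link": "https://example.com/bild/bayern", "source": "Bild"},
    {"title": "U-Bahn line U9 planning update",
     "link": "https://example.com/merkur/u9", "source": "Merkur"},
]

# Expected outcome: the two housing articles cluster together, the two
# Bayern articles cluster together, and the U9 story stays on its own.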

Cleanup

Tests create temporary data in MongoDB. The snippet below removes only articles whose links carry the https://example.com/ test prefix, plus all generated cluster summaries:

# Clean test articles (-T disables the pseudo-TTY so the heredoc reaches stdin)
docker-compose exec -T crawler python << 'EOF'
from pymongo import MongoClient

client = MongoClient("mongodb://admin:changeme@mongodb:27017/")
db = client["munich_news"]

# Test fixtures all use links under https://example.com/
db.articles.delete_many({"link": {"$regex": "^https://example.com/"}})
# Drop every generated cluster summary
db.cluster_summaries.delete_many({})
print("✓ Test data cleaned")
EOF
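
Because the filter matches only the example.com test prefix, real crawled articles are left untouched; note, however, that cluster_summaries is wiped entirely.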

Requirements

  • Docker containers must be running
  • Ollama service must be available
  • MongoDB must be accessible
  • AI model (phi3:latest) must be downloaded (the preflight sketch below checks all four)
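
A quick preflight along these lines can verify the prerequisites from inside the crawler container. The hostnames and credentials are assumptions carried over from the compose service names and the cleanup snippet above:

# Preflight: confirm Ollama, the phi3 model, and MongoDB are reachable.
# Hostnames/credentials are assumed from the compose setup above.
import requests
from pymongo import MongoClient

# Ollama's /api/tags endpoint lists locally downloaded models
tags = requests.get("http://ollama:11434/api/tags", timeout=5).json()
models = [m["name"] for m in tags.get("models", [])]
assert any(name.startswith("phi3") for name in models), f"phi3 missing: {models}"

# A ping confirms MongoDB is accepting connections
client = MongoClient("mongodb://admin:changeme@mongodb:27017/",
                     serverSelectionTimeoutMS=5000)
client.admin.command("ping")
print("✓ Ollama, phi3 and MongoDB are available")

Run it the same way as the cleanup snippet (docker-compose exec -T crawler python << 'EOF' … EOF).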

Troubleshooting

Ollama Not Available

# Check Ollama status
docker-compose logs ollama

# Restart Ollama
docker-compose restart ollama

Tests Timing Out

  • Increase the timeout in the test files (default: 60s; see the sketch below)
  • Check that the AI model (phi3:latest) is downloaded
  • Verify that GPU acceleration is working, if enabled
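
If the tests reach Ollama over HTTP with the requests library (an assumption; the actual client code may differ), raising the timeout looks roughly like this:

# Hypothetical: raise the per-request timeout for slow model responses.
# Endpoint and payload follow Ollama's /api/generate API; the 60s default
# is the one mentioned in the list above.
import requests

response = requests.post(
    "http://ollama:11434/api/generate",
    json={"model": "phi3:latest", "prompt": "...", "stream": False},
    timeout=120,  # raised from the 60s default
)
print(response.json()["response"])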

MongoDB Connection Issues

# Check MongoDB status
docker-compose logs mongodb

# Restart MongoDB
docker-compose restart mongodb