# Crawler Tests
Test suite for the news crawler, AI clustering, and neutral summary generation.
## Test Files
### AI Clustering & Aggregation Tests
- **`test_clustering_real.py`** - Tests AI-powered article clustering with realistic fake articles
- **`test_neutral_summaries.py`** - Tests neutral summary generation from clustered articles
- **`test_complete_workflow.py`** - End-to-end test of clustering + neutral summaries
### Core Crawler Tests
- **`test_crawler.py`** - Basic crawler functionality
- **`test_ollama.py`** - Ollama AI integration tests
- **`test_rss_feeds.py`** - RSS feed parsing tests
## Running Tests
### Run All Tests
```bash
# From project root
docker-compose exec crawler python -m pytest tests/crawler/
```
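The usual pytest flags pass straight through, e.g. verbose output, stopping on the first failure, or selecting tests by keyword:
```bash
# Verbose, stop on first failure, run only clustering-related tests
docker-compose exec crawler python -m pytest tests/crawler/ -v -x -k clustering
```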
### Run Specific Test
```bash
# AI clustering test
docker-compose exec crawler python tests/crawler/test_clustering_real.py

# Neutral summaries test
docker-compose exec crawler python tests/crawler/test_neutral_summaries.py

# Complete workflow test
docker-compose exec crawler python tests/crawler/test_complete_workflow.py
```
### Run Tests from a Container Shell
```bash
# Enter the crawler container
docker-compose exec crawler bash

# Run the tests from the test directory
cd tests/crawler
python test_clustering_real.py
python test_neutral_summaries.py
python test_complete_workflow.py
```
## Test Data
Tests use fake articles, so they do not depend on external RSS feeds.
**Test Scenarios:**
1. **Same story, different sources** - Should cluster together
2. **Different stories** - Should remain separate
3. **Multi-source clustering** - Should generate neutral summaries
**Expected Results:**
- Housing story (2 sources) → Cluster together → Neutral summary
- Bayern transfer (2 sources) → Cluster together → Neutral summary
- Single-source stories → Individual summaries
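For reference, seeding one two-source scenario by hand could look like the sketch below. The field names (`title`, `link`, `source`) are illustrative assumptions, not the tests' actual schema; the `https://example.com/` links match the cleanup filter in the next section, so seeded data gets removed:
```bash
docker-compose exec crawler python << 'EOF'
from pymongo import MongoClient

client = MongoClient("mongodb://admin:changeme@mongodb:27017/")
db = client["munich_news"]

# Two fake articles covering the same housing story from different sources
db.articles.insert_many([
    {"title": "Munich approves new housing plan",
     "link": "https://example.com/source-a/housing",
     "source": "Source A"},
    {"title": "City council passes Munich housing plan",
     "link": "https://example.com/source-b/housing",
     "source": "Source B"},
])
print("✓ Seeded 2 test articles")
EOF
```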
## Cleanup
Tests create temporary data in MongoDB. To clean up:
```bash
# Clean test articles
docker-compose exec crawler python << 'EOF'
from pymongo import MongoClient
client = MongoClient("mongodb://admin:changeme@mongodb:27017/")
db = client["munich_news"]
db.articles.delete_many({"link": {"$regex": "^https://example.com/"}})
db.cluster_summaries.delete_many({})
print("✓ Test data cleaned")
EOF
```
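To verify the cleanup worked, count what is left (same connection string as above):
```bash
docker-compose exec crawler python << 'EOF'
from pymongo import MongoClient
client = MongoClient("mongodb://admin:changeme@mongodb:27017/")
db = client["munich_news"]
remaining = db.articles.count_documents({"link": {"$regex": "^https://example.com/"}})
print(f"Remaining test articles: {remaining}")  # should be 0
EOF
```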
## Requirements
- Docker containers must be running
- Ollama service must be available
- MongoDB must be accessible
- AI model (phi3:latest) must be downloaded
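To check all four at once, a quick pre-flight sketch (service names `crawler`, `ollama`, and `mongodb` are taken from the commands in this README):
```bash
# Containers running?
docker-compose ps

# Ollama reachable and phi3 model present?
docker-compose exec ollama ollama list | grep phi3

# MongoDB accepting connections?
docker-compose exec crawler python -c \
  "from pymongo import MongoClient; MongoClient('mongodb://admin:changeme@mongodb:27017/').admin.command('ping'); print('MongoDB OK')"
```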
## Troubleshooting
### Ollama Not Available
```bash
# Check Ollama status
docker-compose logs ollama

# Restart Ollama
docker-compose restart ollama
```
### Tests Timing Out
- Increase the timeout in the test files (default: 60s)
- Check that the AI model is downloaded (see the pull command below)
- Verify GPU acceleration is working, if enabled
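If the model is missing, it can be pulled from inside the Ollama container (assuming the `ollama` service name used above):
```bash
docker-compose exec ollama ollama pull phi3:latest
```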
### MongoDB Connection Issues
```bash
# Check MongoDB status
docker-compose logs mongodb

# Restart MongoDB
docker-compose restart mongodb
```
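If the logs look clean but tests still cannot connect, a direct ping with the credentials from the cleanup snippet can isolate the problem (a sketch; assumes `mongosh` is available in the MongoDB image, as it is in recent official images):
```bash
docker-compose exec mongodb mongosh -u admin -p changeme \
  --eval "db.adminCommand('ping')"
```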