# Crawler Tests

Test suite for the news crawler, AI clustering, and neutral summary generation.

## Test Files

### AI Clustering & Aggregation Tests

- **`test_clustering_real.py`** - Tests AI-powered article clustering with realistic fake articles
- **`test_neutral_summaries.py`** - Tests neutral summary generation from clustered articles
- **`test_complete_workflow.py`** - End-to-end test of clustering + neutral summaries

### Core Crawler Tests

- **`test_crawler.py`** - Basic crawler functionality
- **`test_ollama.py`** - Ollama AI integration tests
- **`test_rss_feeds.py`** - RSS feed parsing tests

## Running Tests

### Run All Tests

```bash
# From project root
docker-compose exec crawler python -m pytest tests/crawler/
```

### Run Specific Test

```bash
# AI clustering test
docker-compose exec crawler python tests/crawler/test_clustering_real.py

# Neutral summaries test
docker-compose exec crawler python tests/crawler/test_neutral_summaries.py

# Complete workflow test
docker-compose exec crawler python tests/crawler/test_complete_workflow.py
```
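Individual tests can also be selected through pytest instead of invoking the scripts directly. This is a sketch using pytest's `-k` keyword filter; the filter strings are assumptions based on the file names above:

```bash
# Select tests whose name (or file name) matches a keyword
docker-compose exec crawler python -m pytest tests/crawler/ -k clustering

# Combine keywords to run several related tests
docker-compose exec crawler python -m pytest tests/crawler/ -k "neutral or workflow"
```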
### Run Tests Inside Container

```bash
# Enter container
docker-compose exec crawler bash

# Run tests
python test_clustering_real.py
python test_neutral_summaries.py
python test_complete_workflow.py
```

## Test Data

Tests use fake articles to avoid depending on external RSS feeds:

**Test Scenarios:**

1. **Same story, different sources** - Should cluster together
2. **Different stories** - Should remain separate
3. **Multi-source clustering** - Should generate neutral summaries

**Expected Results:**

- Housing story (2 sources) → Cluster together → Neutral summary
- Bayern transfer (2 sources) → Cluster together → Neutral summary
- Single-source stories → Individual summaries
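To check these expectations against what actually landed in the database, here is a minimal inspection sketch. It reuses the connection details from the Cleanup section below and the fact that fake test articles use `https://example.com/` links; anything beyond that is an assumption:

```bash
docker-compose exec crawler python << 'EOF'
from pymongo import MongoClient

# Connection details taken from the Cleanup section below
client = MongoClient("mongodb://admin:changeme@mongodb:27017/")
db = client["munich_news"]

# Fake test articles are identifiable by their example.com links
fake_articles = db.articles.count_documents({"link": {"$regex": "^https://example.com/"}})
summaries = db.cluster_summaries.count_documents({})

print(f"Fake test articles in DB: {fake_articles}")
print(f"Cluster summaries in DB:  {summaries}")
EOF
```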
## Cleanup

Tests create temporary data in MongoDB. To clean up:

```bash
# Clean test articles
docker-compose exec crawler python << 'EOF'
from pymongo import MongoClient

client = MongoClient("mongodb://admin:changeme@mongodb:27017/")
db = client["munich_news"]

db.articles.delete_many({"link": {"$regex": "^https://example.com/"}})
db.cluster_summaries.delete_many({})

print("✓ Test data cleaned")
EOF
```

## Requirements

- Docker containers must be running
- Ollama service must be available
- MongoDB must be accessible
- AI model (phi3:latest) must be downloaded
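If the model is missing, it can be pulled through the Ollama container. This sketch assumes the Compose service is named `ollama`, as in the Troubleshooting commands below:

```bash
# Download the model used by the clustering and summary tests
docker-compose exec ollama ollama pull phi3:latest

# Confirm it shows up in the local model list
docker-compose exec ollama ollama list
```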
## Troubleshooting

### Ollama Not Available

```bash
# Check Ollama status
docker-compose logs ollama

# Restart Ollama
docker-compose restart ollama
```
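If the service is up but the tests still cannot reach it, a quick connectivity check from the crawler container can narrow things down. This sketch assumes Ollama is reachable as `ollama` on its default port 11434:

```bash
docker-compose exec crawler python << 'EOF'
import json
import urllib.request

# Assumes the Ollama service is reachable as "ollama" on the default port 11434
with urllib.request.urlopen("http://ollama:11434/api/tags", timeout=10) as resp:
    models = [m["name"] for m in json.load(resp).get("models", [])]

print("Available models:", models)
print("phi3 downloaded:", any(name.startswith("phi3") for name in models))
EOF
```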
### Tests Timing Out

- Increase the timeout in the test files (default: 60s)
- Check that the Ollama model (phi3:latest) is downloaded
- Verify GPU acceleration is working, if enabled

### MongoDB Connection Issues

```bash
# Check MongoDB status
docker-compose logs mongodb

# Restart MongoDB
docker-compose restart mongodb
```
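If restarting does not help, a direct connection check from the crawler container can confirm whether MongoDB is reachable at all. This sketch reuses the connection string from the Cleanup section:

```bash
docker-compose exec crawler python << 'EOF'
from pymongo import MongoClient
from pymongo.errors import PyMongoError

# Same connection string as the cleanup snippet; fail fast after 5 seconds
client = MongoClient("mongodb://admin:changeme@mongodb:27017/", serverSelectionTimeoutMS=5000)
try:
    client.admin.command("ping")
    print("MongoDB is reachable")
except PyMongoError as exc:
    print(f"MongoDB connection failed: {exc}")
EOF
```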