# Crawler Tests

Test suite for the news crawler, AI clustering, and neutral summary generation.

## Test Files

### AI Clustering & Aggregation Tests

- **`test_clustering_real.py`** - Tests AI-powered article clustering with realistic fake articles
- **`test_neutral_summaries.py`** - Tests neutral summary generation from clustered articles
- **`test_complete_workflow.py`** - End-to-end test of clustering + neutral summaries
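
The clustering tests above all follow the same shape: feed in fake articles, then assert on the groups that come back. A minimal, self-contained sketch of that pattern, using a naive keyword-overlap stand-in instead of the crawler's Ollama-backed clusterer (all names here are illustrative, not the project's API):

```python
# Sketch of the clustering-test pattern. naive_cluster is a keyword-overlap
# stand-in for illustration only -- the real tests call the AI clusterer.

def naive_cluster(articles, threshold=0.4):
    """Group articles whose title keywords overlap above a Jaccard threshold."""
    clusters = []
    for article in articles:
        words = set(article["title"].lower().split())
        for cluster in clusters:
            seed = set(cluster[0]["title"].lower().split())
            overlap = len(words & seed) / len(words | seed)  # Jaccard similarity
            if overlap >= threshold:
                cluster.append(article)
                break
        else:
            clusters.append([article])  # no match: start a new cluster
    return clusters

fake_articles = [
    {"title": "Munich housing prices rise sharply", "source": "A"},
    {"title": "Housing prices in Munich rise sharply", "source": "B"},
    {"title": "Zoo welcomes new elephant calf", "source": "C"},
]

clusters = naive_cluster(fake_articles)
assert len(clusters) == 2     # same story merged, unrelated one kept separate
assert len(clusters[0]) == 2  # two sources for the housing story
```

The real tests make the same kind of assertions (cluster count, sources per cluster), just against the AI pipeline.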

### Core Crawler Tests

- **`test_crawler.py`** - Basic crawler functionality
- **`test_ollama.py`** - Ollama AI integration tests
- **`test_rss_feeds.py`** - RSS feed parsing tests

## Running Tests

### Run All Tests

```bash
# From project root
docker-compose exec crawler python -m pytest tests/crawler/
```

### Run Specific Test

```bash
# AI clustering test
docker-compose exec crawler python tests/crawler/test_clustering_real.py

# Neutral summaries test
docker-compose exec crawler python tests/crawler/test_neutral_summaries.py

# Complete workflow test
docker-compose exec crawler python tests/crawler/test_complete_workflow.py
```

### Run Tests Inside Container

```bash
# Enter container
docker-compose exec crawler bash

# Run tests
python test_clustering_real.py
python test_neutral_summaries.py
python test_complete_workflow.py
```

## Test Data

Tests use fake articles to avoid depending on external RSS feeds:

**Test Scenarios:**

1. **Same story, different sources** - Should cluster together
2. **Different stories** - Should remain separate
3. **Multi-source clustering** - Should generate neutral summaries

**Expected Results:**

- Housing story (2 sources) → Cluster together → Neutral summary
- Bayern transfer (2 sources) → Cluster together → Neutral summary
- Single-source stories → Individual summaries
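
A sketch of what such fixtures might look like (the field names are illustrative assumptions, not necessarily the crawler's actual schema). Note that every link lives under `https://example.com/`, which is what the Cleanup step keys on:

```python
# Illustrative fake-article fixtures for the scenarios above.
# Field names are assumptions for this sketch, not the crawler's schema;
# the example.com links let the cleanup step find test data later.

FAKE_ARTICLES = [
    # Housing story from two sources -> should cluster together
    {"title": "Munich rents hit record high", "source": "Source A",
     "link": "https://example.com/a/housing"},
    {"title": "Record rents reported across Munich", "source": "Source B",
     "link": "https://example.com/b/housing"},
    # Bayern transfer story from two sources -> should cluster together
    {"title": "Bayern sign new midfielder", "source": "Source A",
     "link": "https://example.com/a/bayern"},
    {"title": "New midfielder joins Bayern", "source": "Source C",
     "link": "https://example.com/c/bayern"},
    # Unrelated single-source story -> should stay separate
    {"title": "Oktoberfest dates announced", "source": "Source D",
     "link": "https://example.com/d/oktoberfest"},
]

# Keeping all test links under one fake host makes cleanup a safe, targeted delete.
assert all(a["link"].startswith("https://example.com/") for a in FAKE_ARTICLES)
```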

## Cleanup

Tests create temporary data in MongoDB. To clean up:

```bash
# Clean test articles (-T disables TTY allocation so the heredoc is piped in)
docker-compose exec -T crawler python << 'EOF'
from pymongo import MongoClient

client = MongoClient("mongodb://admin:changeme@mongodb:27017/")
db = client["munich_news"]
db.articles.delete_many({"link": {"$regex": "^https://example.com/"}})
db.cluster_summaries.delete_many({})
print("✓ Test data cleaned")
EOF
```
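
The `$regex` filter above is anchored with `^`, so only links that *start* with `https://example.com/` are deleted; real articles are left untouched. The same anchoring behavior can be checked with Python's `re` module (the non-test links below are made-up examples):

```python
import re

# Same anchored pattern as the MongoDB $regex filter in the cleanup script.
pattern = re.compile(r"^https://example.com/")

links = [
    "https://example.com/a/housing",        # test data -> matches, deleted
    "https://real-site.example/muenchen",   # real article -> kept
    "https://news.example.com/story",       # different host -> kept
]

deleted = [link for link in links if pattern.search(link)]
assert deleted == ["https://example.com/a/housing"]
```

Without the `^` anchor, the third link would also match, so the anchor is what makes the cleanup safe.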

## Requirements

- Docker containers must be running
- Ollama service must be available
- MongoDB must be accessible
- AI model (phi3:latest) must be downloaded
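
A quick preflight sketch that checks the service ports are reachable before running tests. The hostnames and ports are assumptions based on the compose setup used elsewhere in this README (`mongodb:27017`, `ollama:11434`); adjust them to your environment:

```python
import socket

def service_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refused connection, or timeout
        return False

# Hostnames and ports assume the project's docker-compose defaults.
for name, host, port in [("MongoDB", "mongodb", 27017),
                         ("Ollama", "ollama", 11434)]:
    status = "ok" if service_reachable(host, port) else "NOT reachable"
    print(f"{name} ({host}:{port}): {status}")
```

Run it inside the crawler container so the compose service names resolve.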
## Troubleshooting
|
||||
|
||||
### Ollama Not Available
|
||||
```bash
|
||||
# Check Ollama status
|
||||
docker-compose logs ollama
|
||||
|
||||
# Restart Ollama
|
||||
docker-compose restart ollama
|
||||
```
|
||||
|
||||
### Tests Timing Out
|
||||
- Increase timeout in test files (default: 60s)
|
||||
- Check Ollama model is downloaded
|
||||
- Verify GPU acceleration if enabled
|
||||
|
||||
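
Beyond raising the timeout, a small retry-with-backoff wrapper around the model call can make slow-Ollama runs less flaky. A generic sketch (the wrapped function stands for whatever issues the Ollama request; nothing here is the project's actual API):

```python
import time

def retry(fn, attempts=3, backoff=2.0):
    """Call fn(); on failure wait, double the delay, and try again."""
    delay = backoff
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)
            delay *= 2

# Demo with a flaky stand-in that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("model busy")
    return "summary"

assert retry(flaky, backoff=0.01) == "summary"
assert calls["n"] == 3
```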
### MongoDB Connection Issues
|
||||
```bash
|
||||
# Check MongoDB status
|
||||
docker-compose logs mongodb
|
||||
|
||||
# Restart MongoDB
|
||||
docker-compose restart mongodb
|
||||
```
|
||||
Reference in New Issue
Block a user