# Testing RSS Feed URL Extraction

## Quick Test (Recommended)

Run this from the project root with the backend virtual environment activated:

```bash
# 1. Activate backend virtual environment
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 2. Go back to the project root
cd ..

# 3. Run the test
python test_feeds_quick.py
```

This will:
- ✓ Check which RSS feeds are in your database
- ✓ Fetch each feed
- ✓ Test URL extraction on the first 3 articles
- ✓ Show which fields are available
- ✓ Verify summary and date extraction

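Under the hood, the quick test follows roughly this flow. The snippet below is a minimal sketch, not the actual `test_feeds_quick.py`; it assumes MongoDB on `localhost:27017` and the `munich_news` database shown in the manual check further down, and the printed field names are illustrative:

```python
# Sketch of the quick-test flow (illustrative; field names are assumptions).
import feedparser
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")        # assumed default connection
feeds = list(client.munich_news.rss_feeds.find())
print(f"Found {len(feeds)} feed(s) in database")

for feed in feeds:
    parsed = feedparser.parse(feed["url"])
    print(f"\nFeed: {feed.get('name')} -> {len(parsed.entries)} entries")
    for entry in parsed.entries[:3]:                      # only the first 3 articles
        print("Title:  ", entry.get("title"))
        print("URL:    ", entry.get("link") or entry.get("id"))
        print("Summary:", (entry.get("summary") or "")[:60])
        print("Date:   ", entry.get("published", "n/a"))
```
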
## Expected Output

```
================================================================================
RSS Feed Test - Checking Database Feeds
================================================================================

✓ Found 3 feed(s) in database

================================================================================
Feed: Süddeutsche Zeitung München
URL: https://www.sueddeutsche.de/muenchen/rss
Active: True
================================================================================
Fetching RSS feed...
✓ Found 20 entries

--- Entry 1 ---
Title: New U-Bahn Line Opens in Munich
✓ URL extracted: https://www.sueddeutsche.de/muenchen/article-123
✓ Summary: The new U-Bahn line connecting the city center...
✓ Date: Mon, 10 Nov 2024 10:00:00 +0100

--- Entry 2 ---
Title: Munich Weather Update
✓ URL extracted: https://www.sueddeutsche.de/muenchen/article-124
✓ Summary: Weather forecast for the week...
✓ Date: Mon, 10 Nov 2024 09:30:00 +0100

...
```

## If No Feeds Found

Add a feed first:

```bash
curl -X POST http://localhost:5001/api/rss-feeds \
  -H "Content-Type: application/json" \
  -d '{"name": "Süddeutsche Politik", "url": "https://rss.sueddeutsche.de/rss/Politik"}'
```

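If you prefer to stay in Python, the same request can be sent with `requests` (a sketch; it assumes the backend is running on port 5001 as in the curl call above):

```python
# Same request as the curl example above, sent from Python (sketch).
import requests

resp = requests.post(
    "http://localhost:5001/api/rss-feeds",
    json={
        "name": "Süddeutsche Politik",
        "url": "https://rss.sueddeutsche.de/rss/Politik",
    },
)
print(resp.status_code, resp.text)
```
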
## Testing News Crawler

Once feeds are verified, test the crawler:

```bash
# 1. Install crawler dependencies
cd news_crawler
pip install -r requirements.txt

# 2. Run the test
python test_rss_feeds.py

# 3. Or run the actual crawler
python crawler_service.py 5
```

## Troubleshooting

### "No module named 'pymongo'"
- Activate the backend virtual environment first
- Or install dependencies: `pip install -r backend/requirements.txt`

### "No RSS feeds in database"
- Make sure the backend is running
- Add feeds via the API (see above)
- Or check if MongoDB is running: `docker-compose ps`

### "Could not extract URL"
|
||||
- The test will show available fields
|
||||
- Check if the feed uses `guid`, `id`, or `links` instead of `link`
|
||||
- Our utility should handle most cases automatically
|
||||
|
||||
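For reference, the fallback meant here looks roughly like the following. This is an illustrative sketch over `feedparser` entry fields, not the project's actual utility:

```python
# Illustrative URL-extraction fallback (not the actual project utility).
def extract_entry_url(entry):
    # Most feeds fill the standard `link` field.
    if entry.get("link"):
        return entry["link"]
    # Some feeds only provide a `links` list of {"href": ...} dicts.
    for link in entry.get("links", []):
        if link.get("href"):
            return link["href"]
    # As a last resort, `id`/`guid` is sometimes a usable permalink.
    candidate = entry.get("id") or entry.get("guid")
    if candidate and candidate.startswith(("http://", "https://")):
        return candidate
    return None
```
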
### "No entries found"
|
||||
- The RSS feed URL might be invalid
|
||||
- Try opening the URL in a browser
|
||||
- Check if it returns valid XML
|
||||
|
||||
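One quick way to check that last point from Python (a sketch; `feedparser` sets its `bozo` flag when a document cannot be parsed as well-formed XML):

```python
# Sanity-check a feed URL (sketch). The URL is the example feed from above.
import feedparser

parsed = feedparser.parse("https://www.sueddeutsche.de/muenchen/rss")
if parsed.bozo:
    print("Feed could not be parsed cleanly:", parsed.bozo_exception)
else:
    print(f"Feed looks valid: {len(parsed.entries)} entries")
```
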
## Manual Database Check

Using mongosh:

```bash
mongosh
use munich_news
db.rss_feeds.find()
db.articles.find().limit(3)
```

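The equivalent check from Python, if you would rather not open a Mongo shell (a sketch assuming the default local MongoDB; field names are illustrative):

```python
# Inspect feeds and articles with pymongo (sketch; field names are assumptions).
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").munich_news

for feed in db.rss_feeds.find():
    print("Feed:", feed.get("name"), feed.get("url"), "active =", feed.get("active"))

for article in db.articles.find().limit(3):
    print("Article:", article.get("title"))
```
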
## What to Look For

✅ **Good signs:**
- URLs are extracted successfully
- URLs start with `http://` or `https://`
- Summaries are present
- Dates are extracted

⚠️ **Warning signs:**
- "Could not extract URL" messages
- Empty summaries (not critical)
- Missing dates (not critical)

❌ **Problems:**
- No entries found in feed
- All URL extractions fail
- Feed parsing errors