Munich-news/TEST_INSTRUCTIONS.md
2025-11-10 19:13:33 +01:00

Testing RSS Feed URL Extraction

Run this from the project root with the backend virtual environment activated:

# 1. Activate backend virtual environment
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 2. Go back to project root
cd ..

# 3. Run the test
python test_feeds_quick.py

This will:

  • ✓ Check what RSS feeds are in your database
  • ✓ Fetch each feed
  • ✓ Test URL extraction on first 3 articles
  • ✓ Show what fields are available
  • ✓ Verify summary and date extraction
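As a rough illustration of what these checks amount to, here is a standard-library sketch of pulling title, URL, summary, and date out of an RSS item. The real test_feeds_quick.py reads feeds from the database (and likely uses feedparser), so `SAMPLE_RSS` and `check_entries` below are hypothetical names for illustration only:

```python
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item>
    <title>New U-Bahn Line Opens in Munich</title>
    <link>https://www.sueddeutsche.de/muenchen/article-123</link>
    <description>The new U-Bahn line connecting the city center...</description>
    <pubDate>Mon, 10 Nov 2024 10:00:00 +0100</pubDate>
  </item>
</channel></rss>"""

def check_entries(rss_xml, limit=3):
    """Extract title, URL, summary, and date from the first `limit` items."""
    results = []
    for item in ET.fromstring(rss_xml).iter("item"):
        if len(results) >= limit:
            break
        results.append({
            "title": item.findtext("title"),
            "url": item.findtext("link"),
            "summary": item.findtext("description"),
            "date": item.findtext("pubDate"),
        })
    return results

print(check_entries(SAMPLE_RSS)[0]["url"])
# → https://www.sueddeutsche.de/muenchen/article-123
```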

Expected Output

================================================================================
RSS Feed Test - Checking Database Feeds
================================================================================

✓ Found 3 feed(s) in database

================================================================================
Feed: Süddeutsche Zeitung München
URL: https://www.sueddeutsche.de/muenchen/rss
Active: True
================================================================================
Fetching RSS feed...
✓ Found 20 entries

--- Entry 1 ---
Title: New U-Bahn Line Opens in Munich
✓ URL extracted: https://www.sueddeutsche.de/muenchen/article-123
✓ Summary: The new U-Bahn line connecting the city center...
✓ Date: Mon, 10 Nov 2024 10:00:00 +0100

--- Entry 2 ---
Title: Munich Weather Update
✓ URL extracted: https://www.sueddeutsche.de/muenchen/article-124
✓ Summary: Weather forecast for the week...
✓ Date: Mon, 10 Nov 2024 09:30:00 +0100

...

If No Feeds Found

Add a feed first:

curl -X POST http://localhost:5001/api/rss-feeds \
  -H "Content-Type: application/json" \
  -d '{"name": "Süddeutsche Politik", "url": "https://rss.sueddeutsche.de/rss/Politik"}'

Testing News Crawler

Once feeds are verified, test the crawler:

# 1. Install crawler dependencies
cd news_crawler
pip install -r requirements.txt

# 2. Run the test
python test_rss_feeds.py

# 3. Or run the actual crawler
python crawler_service.py 5

Troubleshooting

"No module named 'pymongo'"

  • Activate the backend virtual environment first
  • Or install dependencies: pip install -r backend/requirements.txt

"No RSS feeds in database"

  • Make sure the backend is running
  • Add feeds via API (see above)
  • Or check if MongoDB is running: docker-compose ps

"Could not extract URL"

  • The test will show available fields
  • Check if the feed uses guid, id, or links instead of link
  • Our utility should handle most cases automatically
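The fallback order can be sketched like this, assuming feedparser-style entry dicts. The function name `extract_url` and the exact order are illustrative; check the utility in the codebase for the real behavior:

```python
def extract_url(entry):
    """Try common feed fields in order: link, then id/guid, then the links list."""
    url = entry.get("link")
    if url:
        return url
    # Some feeds put the article URL in <guid> or Atom <id> instead of <link>.
    for key in ("id", "guid"):
        val = entry.get(key)
        if isinstance(val, str) and val.startswith(("http://", "https://")):
            return val
    # feedparser also exposes a list of link objects with an href attribute.
    for link in entry.get("links", []):
        href = link.get("href")
        if href:
            return href
    return None

print(extract_url({"guid": "tag:not-a-url",
                   "links": [{"href": "https://example.com/article"}]}))
# → https://example.com/article
```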

"No entries found"

  • The RSS feed URL might be invalid
  • Try opening the URL in a browser
  • Check if it returns valid XML
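A quick way to check "valid XML" without opening a browser is to try parsing the response body and look for an RSS or Atom root element. This helper is a minimal sketch (stdlib only; a 404 page or HTML error body will fail it):

```python
import xml.etree.ElementTree as ET

def looks_like_feed(text):
    """Return True if text parses as XML with an RSS or Atom root element."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    tag = root.tag.rsplit("}", 1)[-1]  # strip any XML namespace prefix
    return tag in ("rss", "feed")

print(looks_like_feed("<rss version='2.0'><channel/></rss>"))  # → True
print(looks_like_feed("<html><body>404 Not Found</body></html>"))  # → False
```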

Manual Database Check

Using mongosh:

mongosh
use munich_news
db.rss_feeds.find()
db.articles.find().limit(3)

What to Look For

✓ Good signs:

  • URLs are extracted successfully
  • URLs start with http:// or https://
  • Summaries are present
  • Dates are extracted

⚠️ Warning signs:

  • "Could not extract URL" messages
  • Empty summaries (not critical)
  • Missing dates (not critical)

✗ Problems:

  • No entries found in feed
  • All URL extractions fail
  • Feed parsing errors
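The checklist above can be folded into a small triage helper when scanning test output by hand. This is a hypothetical sketch, not part of the test script:

```python
def entry_health(url, summary, date):
    """Classify one parsed entry against the checklist above."""
    # A missing or non-absolute URL is the one hard failure.
    if not (isinstance(url, str) and url.startswith(("http://", "https://"))):
        return "problem"
    # Empty summaries and missing dates are non-critical warnings.
    if not summary or not date:
        return "warning"
    return "ok"

print(entry_health("https://example.com/a", "Some summary...", "Mon, 10 Nov 2024"))
# → ok
```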