Munich-news/TEST_INSTRUCTIONS.md
2025-11-10 19:13:33 +01:00

Testing RSS Feed URL Extraction

Run this from the project root with the backend virtual environment activated:

# 1. Activate backend virtual environment
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 2. Go back to project root
cd ..

# 3. Run the test
python test_feeds_quick.py

This will:

  • ✓ Check what RSS feeds are in your database
  • ✓ Fetch each feed
  • ✓ Test URL extraction on first 3 articles
  • ✓ Show what fields are available
  • ✓ Verify summary and date extraction
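As a rough illustration of what these checks amount to, here is a standard-library sketch of pulling title, URL, summary, and date out of an RSS item. The real test_feeds_quick.py reads feeds from the database (and likely uses feedparser), so `SAMPLE_RSS` and `check_entries` below are hypothetical names for illustration only:

```python
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item>
    <title>New U-Bahn Line Opens in Munich</title>
    <link>https://www.sueddeutsche.de/muenchen/article-123</link>
    <description>The new U-Bahn line connecting the city center...</description>
    <pubDate>Mon, 10 Nov 2024 10:00:00 +0100</pubDate>
  </item>
</channel></rss>"""

def check_entries(rss_xml, limit=3):
    """Extract title, URL, summary, and date from the first `limit` items."""
    results = []
    for item in ET.fromstring(rss_xml).iter("item"):
        if len(results) >= limit:
            break
        results.append({
            "title": item.findtext("title"),
            "url": item.findtext("link"),
            "summary": item.findtext("description"),
            "date": item.findtext("pubDate"),
        })
    return results

print(check_entries(SAMPLE_RSS)[0]["url"])
# → https://www.sueddeutsche.de/muenchen/article-123
```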

Expected Output

================================================================================
RSS Feed Test - Checking Database Feeds
================================================================================

✓ Found 3 feed(s) in database

================================================================================
Feed: Süddeutsche Zeitung München
URL: https://www.sueddeutsche.de/muenchen/rss
Active: True
================================================================================
Fetching RSS feed...
✓ Found 20 entries

--- Entry 1 ---
Title: New U-Bahn Line Opens in Munich
✓ URL extracted: https://www.sueddeutsche.de/muenchen/article-123
✓ Summary: The new U-Bahn line connecting the city center...
✓ Date: Mon, 10 Nov 2024 10:00:00 +0100

--- Entry 2 ---
Title: Munich Weather Update
✓ URL extracted: https://www.sueddeutsche.de/muenchen/article-124
✓ Summary: Weather forecast for the week...
✓ Date: Mon, 10 Nov 2024 09:30:00 +0100

...

If No Feeds Found

Add a feed first:

curl -X POST http://localhost:5001/api/rss-feeds \
  -H "Content-Type: application/json" \
  -d '{"name": "Süddeutsche Politik", "url": "https://rss.sueddeutsche.de/rss/Politik"}'

Testing News Crawler

Once feeds are verified, test the crawler:

# 1. Install crawler dependencies
cd news_crawler
pip install -r requirements.txt

# 2. Run the test
python test_rss_feeds.py

# 3. Or run the actual crawler
python crawler_service.py 5

Troubleshooting

"No module named 'pymongo'"

  • Activate the backend virtual environment first
  • Or install dependencies: pip install -r backend/requirements.txt

"No RSS feeds in database"

  • Make sure the backend is running
  • Add feeds via API (see above)
  • Or check if MongoDB is running: docker-compose ps

"Could not extract URL"

  • The test will show available fields
  • Check if the feed uses guid, id, or links instead of link
  • Our utility should handle most cases automatically
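The fallback order can be sketched like this, assuming feedparser-style entry dicts. The function name `extract_url` and the exact order are illustrative; check the utility in the codebase for the real behavior:

```python
def extract_url(entry):
    """Try common feed fields in order: link, then id/guid, then the links list."""
    url = entry.get("link")
    if url:
        return url
    # Some feeds put the article URL in <guid> or Atom <id> instead of <link>.
    for key in ("id", "guid"):
        val = entry.get(key)
        if isinstance(val, str) and val.startswith(("http://", "https://")):
            return val
    # feedparser also exposes a list of link objects with an href attribute.
    for link in entry.get("links", []):
        href = link.get("href")
        if href:
            return href
    return None

print(extract_url({"guid": "tag:not-a-url",
                   "links": [{"href": "https://example.com/article"}]}))
# → https://example.com/article
```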

"No entries found"

  • The RSS feed URL might be invalid
  • Try opening the URL in a browser
  • Check if it returns valid XML
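A quick way to check "valid XML" without opening a browser is to try parsing the response body and look for an RSS or Atom root element. This helper is a minimal sketch (stdlib only; a 404 page or HTML error body will fail it):

```python
import xml.etree.ElementTree as ET

def looks_like_feed(text):
    """Return True if text parses as XML with an RSS or Atom root element."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    tag = root.tag.rsplit("}", 1)[-1]  # strip any XML namespace prefix
    return tag in ("rss", "feed")

print(looks_like_feed("<rss version='2.0'><channel/></rss>"))  # → True
print(looks_like_feed("<html><body>404 Not Found</body></html>"))  # → False
```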

Manual Database Check

Using mongosh:

mongosh
use munich_news
db.rss_feeds.find()
db.articles.find().limit(3)

What to Look For

✓ Good signs:

  • URLs are extracted successfully
  • URLs start with http:// or https://
  • Summaries are present
  • Dates are extracted

⚠️ Warning signs:

  • "Could not extract URL" messages
  • Empty summaries (not critical)
  • Missing dates (not critical)

✗ Problems:

  • No entries found in feed
  • All URL extractions fail
  • Feed parsing errors
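The checklist above can be folded into a small triage helper when scanning test output by hand. This is a hypothetical sketch, not part of the test script:

```python
def entry_health(url, summary, date):
    """Classify one parsed entry against the checklist above."""
    # A missing or non-absolute URL is the one hard failure.
    if not (isinstance(url, str) and url.startswith(("http://", "https://"))):
        return "problem"
    # Empty summaries and missing dates are non-critical warnings.
    if not summary or not date:
        return "warning"
    return "ok"

print(entry_health("https://example.com/a", "Some summary...", "Mon, 10 Nov 2024"))
# → ok
```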