# Testing RSS Feed URL Extraction

## Quick Test (Recommended)

Run this from the project root with the backend virtual environment activated:

```bash
# 1. Activate the backend virtual environment
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 2. Go back to the project root
cd ..

# 3. Run the test
python test_feeds_quick.py
```

This will:
- ✓ Check what RSS feeds are in your database
- ✓ Fetch each feed
- ✓ Test URL extraction on the first 3 articles
- ✓ Show what fields are available
- ✓ Verify summary and date extraction

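The URL-extraction step above can be sketched as follows. This is a minimal, hypothetical version of the lookup (the project's real utility may differ); feedparser entries behave like dictionaries, so a plain dict stands in for one here:

```python
def extract_url(entry):
    """Return the first http(s) URL found in an RSS entry dict, or None."""
    # Most feeds populate `link` directly; some use `id` or `guid` instead.
    for key in ("link", "id", "guid"):
        value = entry.get(key)
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            return value
    # Fall back to the `links` list that some feeds provide.
    for link in entry.get("links", []):
        href = link.get("href", "")
        if href.startswith(("http://", "https://")):
            return href
    return None

# Example entry shaped like feedparser output:
entry = {
    "title": "New U-Bahn Line Opens in Munich",
    "links": [{"rel": "alternate", "href": "https://www.sueddeutsche.de/muenchen/article-123"}],
}
print(extract_url(entry))  # https://www.sueddeutsche.de/muenchen/article-123
```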
## Expected Output

```
================================================================================
RSS Feed Test - Checking Database Feeds
================================================================================

✓ Found 3 feed(s) in database

================================================================================
Feed: Süddeutsche Zeitung München
URL: https://www.sueddeutsche.de/muenchen/rss
Active: True
================================================================================
Fetching RSS feed...
✓ Found 20 entries

--- Entry 1 ---
Title: New U-Bahn Line Opens in Munich
✓ URL extracted: https://www.sueddeutsche.de/muenchen/article-123
✓ Summary: The new U-Bahn line connecting the city center...
✓ Date: Mon, 10 Nov 2024 10:00:00 +0100

--- Entry 2 ---
Title: Munich Weather Update
✓ URL extracted: https://www.sueddeutsche.de/muenchen/article-124
✓ Summary: Weather forecast for the week...
✓ Date: Mon, 10 Nov 2024 09:30:00 +0100

...
```

## If No Feeds Found

Add a feed first:

```bash
curl -X POST http://localhost:5001/api/rss-feeds \
  -H "Content-Type: application/json" \
  -d '{"name": "Süddeutsche Politik", "url": "https://rss.sueddeutsche.de/rss/Politik"}'
```

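If you prefer to add the feed from Python, a stdlib-only equivalent of the curl call above might look like this (assuming the backend listens on `localhost:5001` as in the example):

```python
import json
import urllib.request

# Same payload as the curl example above.
payload = json.dumps({
    "name": "Süddeutsche Politik",
    "url": "https://rss.sueddeutsche.de/rss/Politik",
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:5001/api/rss-feeds",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(resp.status, resp.read().decode())
except Exception as exc:  # backend not running, connection refused, etc.
    print("Request failed:", exc)
```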
## Testing News Crawler

Once feeds are verified, test the crawler:

```bash
# 1. Install crawler dependencies
cd news_crawler
pip install -r requirements.txt

# 2. Run the test
python test_rss_feeds.py

# 3. Or run the actual crawler
python crawler_service.py 5
```

## Troubleshooting

### "No module named 'pymongo'"
- Activate the backend virtual environment first
- Or install the dependencies: `pip install -r backend/requirements.txt`

### "No RSS feeds in database"
- Make sure the backend is running
- Add feeds via the API (see above)
- Or check whether MongoDB is running: `docker-compose ps`

### "Could not extract URL"
- The test will show the available fields
- Check whether the feed uses `guid`, `id`, or `links` instead of `link`
- Our utility should handle most cases automatically

### "No entries found"
- The RSS feed URL might be invalid
- Try opening the URL in a browser
- Check whether it returns valid XML

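For the last case, a quick stdlib-only sanity check can tell you whether a feed URL returns parseable XML at all. This sketch counts RSS `<item>` elements (note that Atom feeds use `<entry>` instead); the URL is the one from the example output above:

```python
import urllib.request
import xml.etree.ElementTree as ET

def count_items(xml_bytes):
    """Count <item> elements in an RSS document (0 means no entries)."""
    root = ET.fromstring(xml_bytes)  # raises ParseError on invalid XML
    return sum(1 for _ in root.iter("item"))

def check_feed(url, timeout=10):
    """Fetch a feed URL; return (True, item_count) or (False, error_text)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, count_items(resp.read())
    except Exception as exc:  # network failure or invalid XML
        return False, str(exc)

ok, detail = check_feed("https://www.sueddeutsche.de/muenchen/rss")
print("entries found:" if ok else "feed check failed:", detail)
```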
## Manual Database Check

Using mongosh:

```bash
mongosh
use munich_news
db.rss_feeds.find()
db.articles.find().limit(3)
```

## What to Look For

✅ **Good signs:**
- URLs are extracted successfully
- URLs start with `http://` or `https://`
- Summaries are present
- Dates are extracted

⚠️ **Warning signs:**
- "Could not extract URL" messages
- Empty summaries (not critical)
- Missing dates (not critical)

❌ **Problems:**
- No entries found in feed
- All URL extractions fail
- Feed parsing errors