Article Testing Strategy

This document explains how article validation tests work in this repository.

Overview

We have two types of article tests with different purposes:

1. Unit Tests (Standard Test Suite)

- Location: Most tests in the `tests/` directory
- When they run: On every push and PR, via the `lint-and-test.yml` workflow
- What they test: Core functionality, utilities, and business logic
- What they DON'T test: Article content validation

2. Article Validation Tests (Generated Content)

- Location: `tests/test_new_articles.py`
- When they run: Only after article generation, in the `fetch-trends.yml` workflow
- What they test: Newly generated articles only (not historical articles)

Why This Separation?

Problem with Testing All Articles

Previously, article validation tests checked ALL articles in the repository, including:

- Historical articles that may use older formats
- Articles written under earlier structural requirements
- Articles that are valid but don't meet current standards

This caused:

- ❌ False positives in CI on unrelated PRs
- ❌ Failures caused by old articles on PRs that only changed code
- ❌ No way to evolve article standards without first fixing every old article

Solution: Test Only New Articles

Now we:

- ✅ Only validate articles generated in the current workflow run
- ✅ Use git to detect which articles are new
- ✅ Fall back to date-based detection (today/yesterday) when needed
- ✅ Skip article validation entirely in the standard test suite

Test Files

`tests/test_new_articles.py` (ACTIVE)

- Purpose: Validate newly generated articles
- Scope: Only articles from the current workflow run
- Detection method:
  - Primary: `git diff` to find added/modified articles
  - Fallback: Articles from today and yesterday
- Usage: Run automatically by `fetch-trends.yml` after article generation
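
A minimal Python sketch of that detection order follows. It is not the code in `tests/test_new_articles.py`. Several details are hypothetical:

- the function names
- the `HEAD~1` base ref
- the convention that article filenames begin with a `YYYY-MM-DD` date, which drives the today/yesterday fallback

`git diff` does not report untracked files, so the sketch also assumes new articles have been staged or committed before the tests run.

```python
from __future__ import annotations

import re
import subprocess
from datetime import date, timedelta
from pathlib import Path

# Hypothetical convention: article filenames start with an ISO date, e.g. 2024-05-01-topic.md
DATE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def git_new_articles(articles_dir: Path, base_ref: str = "HEAD~1") -> list[Path]:
    """Primary detection: Markdown files added (A) or modified (M) relative to base_ref.

    Raises CalledProcessError if articles_dir is not in a git repository or base_ref
    does not exist, and FileNotFoundError if git is not installed.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", "--relative", "--diff-filter=AM", base_ref],
        cwd=articles_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    # --relative limits output to articles_dir and prints paths relative to it
    return sorted(
        articles_dir / name for name in result.stdout.splitlines() if name.endswith(".md")
    )


def dated_articles(articles_dir: Path, today: date | None = None) -> list[Path]:
    """Fallback detection: articles whose filename date is today or yesterday."""
    today = today or date.today()
    wanted = {today.isoformat(), (today - timedelta(days=1)).isoformat()}
    matches = []
    for path in articles_dir.glob("*.md"):
        m = DATE_PREFIX.match(path.name)
        if m and m.group(1) in wanted:
            matches.append(path)
    return sorted(matches)


def find_new_articles(articles_dir: Path, today: date | None = None) -> list[Path]:
    """Try git first; fall back to filename dates if git fails or reports nothing."""
    try:
        found = git_new_articles(articles_dir)
    except (OSError, subprocess.CalledProcessError):
        # e.g. git missing, not a repository, or a shallow CI clone without HEAD~1
        found = []
    return found or dated_articles(articles_dir, today)
```

The sketch falls back when git reports nothing, not only when git errors. That is a deliberate design choice. The workflow runs these tests only when `has_valid_articles` is true, so an empty diff more likely signals a detection problem than a run that produced no articles.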

`tests/test_article_integrity.py` (LEGACY)

- Purpose: Comprehensive validation of ALL articles
- Status: Excluded from the standard test suite
- Usage: Manual debugging only
- Note: Expected to fail on historical articles that use older formats

Workflows

`lint-and-test.yml`

```yaml
- name: Run tests
  run: uv run pytest tests/ -v --ignore=tests/test_article_integrity.py --ignore=tests/test_new_articles.py
```

- Runs on every push/PR
- Excludes all article validation tests
- Only runs unit tests for code functionality

`fetch-trends.yml`

```yaml
- name: Run tests on generated content
  if: steps.aggregate.outputs.has_valid_articles == 'true'
  env:
    NEW_ARTICLES_EXPECTED: 'true'
  run: uv run pytest tests/test_new_articles.py -v
```

- Runs after article generation and aggregation, and only when the `aggregate` step reports `has_valid_articles == 'true'`
- Sets `NEW_ARTICLES_EXPECTED=true` to enable validation
- Only validates articles from the current run
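
A test module could honor `NEW_ARTICLES_EXPECTED` with a small opt-in check like the one below. This is a sketch. The helper name and the case-insensitive comparison are assumptions, not taken from `tests/test_new_articles.py`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def new_articles_expected(environ: Mapping[str, str] | None = None) -> bool:
    """True only when the workflow explicitly opted in with NEW_ARTICLES_EXPECTED=true."""
    env = os.environ if environ is None else environ
    # Unset, empty, or any other value means "not expected" (assumed behavior)
    return env.get("NEW_ARTICLES_EXPECTED", "").strip().lower() == "true"
```

With pytest installed, the helper could gate a whole module through pytest's module-level marker:

`pytestmark = pytest.mark.skipif(not new_articles_expected(), reason="NEW_ARTICLES_EXPECTED is not 'true'")`

A plain `uv run pytest tests/test_new_articles.py` would then report the tests as skipped.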

Running Tests Locally

Run unit tests (no article validation):

```bash
uv run pytest tests/ -v --ignore=tests/test_article_integrity.py --ignore=tests/test_new_articles.py
```

Run article validation for new articles:

```bash
export NEW_ARTICLES_EXPECTED=true
uv run pytest tests/test_new_articles.py -v
```

Run legacy article integrity tests (will likely fail on old articles):

```bash
uv run pytest tests/test_article_integrity.py -v
```

Article Validation Checks

New articles are validated for:

- Structure:
  - Has a title (`# Title`)
  - Has a summary section (`## 📌 Összefoglaló`, "Summary", or variants)
  - Has a sources section (`## 🔗 Forrásanyagok`, "Source materials")
- Content quality:
  - Summary is at least 50 characters long
  - Contains real URLs (not placeholders)
  - No error keywords (`Error:`, `TODO:`, `FIXME:`, etc.)
- Format:
  - Valid markdown structure
  - Proper section headers
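
The checks above could be expressed as a single function that returns a list of problems. This is a hedged sketch, not the repository's implementation. The following are all assumptions:

- the regular expressions
- the placeholder-host list
- the exact error keywords (the source list ends in "etc.")
- treating any `##` heading that contains "Összefoglaló" as a summary-heading variant

```python
from __future__ import annotations

import re

# Assumed patterns; the real test may accept more summary-heading variants.
TITLE_RE = re.compile(r"^# \S", re.MULTILINE)
SUMMARY_RE = re.compile(r"^## .*Összefoglaló.*$", re.MULTILINE)
SOURCES_RE = re.compile(r"^## 🔗 Forrásanyagok\s*$", re.MULTILINE)
URL_RE = re.compile(r"https?://([^/\s)]+)")  # captures the host part of each URL
PLACEHOLDER_HOSTS = {"example.com", "www.example.com", "localhost"}  # hypothetical list
ERROR_KEYWORDS = ("Error:", "TODO:", "FIXME:")  # partial; the source says "etc."
MIN_SUMMARY_CHARS = 50


def summary_text(md: str) -> str:
    """Text between the summary heading and the next level-2 heading."""
    match = SUMMARY_RE.search(md)
    if not match:
        return ""
    rest = md[match.end():]
    next_heading = re.search(r"^## ", rest, re.MULTILINE)
    body = rest[: next_heading.start()] if next_heading else rest
    return body.strip()


def validate_article(md: str) -> list[str]:
    """Return every problem found; an empty list means the article passes."""
    problems = []
    if not TITLE_RE.search(md):
        problems.append("missing '# Title'")
    if not SUMMARY_RE.search(md):
        problems.append("missing summary section")
    elif len(summary_text(md)) < MIN_SUMMARY_CHARS:
        problems.append(f"summary shorter than {MIN_SUMMARY_CHARS} characters")
    if not SOURCES_RE.search(md):
        problems.append("missing sources section")
    hosts = [h.lower() for h in URL_RE.findall(md)]
    if not any(h not in PLACEHOLDER_HOSTS for h in hosts):
        problems.append("no real URLs")
    for keyword in ERROR_KEYWORDS:
        if keyword in md:
            problems.append(f"contains error keyword {keyword!r}")
    return problems
```

The function collects every problem instead of stopping at the first one. That way, a failing CI log lists all issues in an article in one run.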

Future Improvements

Possible enhancements to consider:

- Incremental Validation: Track which articles have been validated to avoid re-testing them
- Migration Tool: Gradually update old articles to meet current standards
- Version Markers: Add format-version metadata to articles
- Separate History Tests: Create optional tests that can validate old articles when needed
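
For the incremental-validation idea, one possible shape is a cache of content hashes. An article is re-validated only if its bytes changed since it last passed. Nothing like this exists in the repository yet. The JSON cache file and the helper names below are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from collections.abc import Iterable
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 of the file's bytes, so any edit invalidates the cached result."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_cache(cache_file: Path) -> dict[str, str]:
    """Map of article path -> hash recorded when it last passed validation."""
    if not cache_file.exists():
        return {}
    return json.loads(cache_file.read_text(encoding="utf-8"))


def needs_validation(paths: Iterable[Path], cache_file: Path) -> list[Path]:
    """Articles that are new or whose content changed since they last passed."""
    cache = load_cache(cache_file)
    return [p for p in paths if cache.get(p.as_posix()) != content_hash(p)]


def record_validated(paths: Iterable[Path], cache_file: Path) -> None:
    """Call only after validation succeeded for every path given."""
    cache = load_cache(cache_file)
    cache.update({p.as_posix(): content_hash(p) for p in paths})
    cache_file.write_text(json.dumps(cache, indent=2, sort_keys=True), encoding="utf-8")
```

Keys are the paths as passed in. Callers would need to use stable, repo-relative paths for the cache to survive between CI runs.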

Questions?

If you need to:

- Add new validation rules → update `tests/test_new_articles.py`
- Debug article generation → run `test_new_articles.py` locally
- Check all articles → run `test_article_integrity.py` manually
- Update the test strategy → modify this document and the relevant test files