Article Testing Strategy

This document explains how article validation tests work in this repository.

Overview

The repository has two kinds of tests, each with a different purpose:

1. Unit Tests (Standard Test Suite)

  • Location: Most tests in the tests/ directory
  • When they run: On every push and PR, via the lint-and-test.yml workflow
  • What they test: Core functionality, utilities, business logic
  • What they DON'T test: Article content validation

2. Article Validation Tests (Generated Content)

  • Location: tests/test_new_articles.py
  • When they run: Only after article generation, in the fetch-trends.yml workflow
  • What they test: Newly generated articles only (not historical articles)

Why This Separation?

Problem with Testing All Articles

Previously, article validation tests checked ALL articles in the repository, including:

  • Historical articles that may use older formats
  • Articles with different structural requirements from the past
  • Articles that are valid but don't meet current standards

This caused:

  • ❌ False positives in CI on unrelated PRs
  • ❌ Failures due to old articles when making code changes
  • ❌ Inability to evolve article standards without fixing all old articles

Solution: Test Only New Articles

Now we:

  • ✅ Only validate articles generated in the current workflow run
  • ✅ Use git to detect which articles are new
  • ✅ Fall back to date-based detection (today/yesterday) when needed
  • ✅ Skip article validation entirely in the standard test suite

Test Files

tests/test_new_articles.py (ACTIVE)

  • Purpose: Validate newly generated articles
  • Scope: Only articles from current workflow run
  • Detection Method:
      • Primary: Git diff to find added/modified articles
      • Fallback: Articles from today and yesterday
  • Usage: Run automatically by fetch-trends.yml after article generation
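
The detection order described above (git first, dates second) can be sketched as follows. This is a minimal illustration, not the code in tests/test_new_articles.py: the `content` directory, the `YYYY-MM-DD-` file-name prefix, and all function names are assumptions made for the example. Note that `git diff` does not list untracked files, so a real implementation may need to stage generated files first or inspect `git status`.

```python
from __future__ import annotations

import subprocess
from datetime import date, timedelta
from pathlib import Path


def articles_from_git(base_ref: str = "HEAD", articles_dir: str = "content") -> list[Path]:
    """Return added/modified Markdown files under articles_dir (hypothetical layout)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=AM", base_ref, "--", articles_dir],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # Not a git checkout, or the ref is unknown: let the caller fall back.
        return []
    return [Path(p) for p in result.stdout.splitlines() if p.endswith(".md")]


def articles_from_dates(paths: list[Path], today: date) -> list[Path]:
    """Keep files whose name starts with today's or yesterday's ISO date (assumed naming)."""
    wanted = {today.isoformat(), (today - timedelta(days=1)).isoformat()}
    return [p for p in paths if p.name[:10] in wanted]


def find_new_articles(all_paths: list[Path], today: date,
                      git_paths: list[Path] | None = None) -> list[Path]:
    """Prefer the git result; use the date-based fallback when git finds nothing."""
    if git_paths is None:
        git_paths = articles_from_git()
    return git_paths or articles_from_dates(all_paths, today)
```

Passing `git_paths` explicitly keeps the fallback logic testable without a git checkout.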

tests/test_article_integrity.py (LEGACY)

  • Purpose: Comprehensive validation of ALL articles
  • Status: Excluded from standard test suite
  • Usage: Manual debugging only
  • Note: Will fail on historical articles with different formats

Workflows

lint-and-test.yml

- name: Run tests
  run: uv run pytest tests/ -v --ignore=tests/test_article_integrity.py --ignore=tests/test_new_articles.py

  • Runs on every push/PR
  • Excludes all article validation tests
  • Only runs unit tests for code functionality

fetch-trends.yml

- name: Run tests on generated content
  if: steps.aggregate.outputs.has_valid_articles == 'true'
  env:
    NEW_ARTICLES_EXPECTED: 'true'
  run: uv run pytest tests/test_new_articles.py -v

  • Runs after article generation and aggregation
  • Sets NEW_ARTICLES_EXPECTED=true to enable validation
  • Only validates articles from the current run
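
How the test module reads NEW_ARTICLES_EXPECTED is not shown in this document. One plausible reading, sketched below under that assumption, is a small helper that gates the tests on the variable; `validation_enabled` is a hypothetical name, not a function from the repository.

```python
import os


def validation_enabled(env=None):
    """Return True when NEW_ARTICLES_EXPECTED is set to 'true' (case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("NEW_ARTICLES_EXPECTED", "").strip().lower() == "true"


# Possible use inside a test module (requires pytest):
#
#   import pytest
#
#   pytestmark = pytest.mark.skipif(
#       not validation_enabled(),
#       reason="NEW_ARTICLES_EXPECTED is not 'true'",
#   )
```

Accepting the environment as a parameter lets the helper be checked with a plain dictionary instead of mutating `os.environ`.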

Running Tests Locally

Run unit tests (no article validation):

uv run pytest tests/ -v --ignore=tests/test_article_integrity.py --ignore=tests/test_new_articles.py

Run article validation for new articles:

export NEW_ARTICLES_EXPECTED=true
uv run pytest tests/test_new_articles.py -v

Run legacy article integrity tests (will likely fail on old articles):

uv run pytest tests/test_article_integrity.py -v

Article Validation Checks

New articles are validated for:

  1. Structure:
      • Has a title (# Title)
      • Has a summary section (## 📌 Összefoglaló or variants)
      • Has a sources section (## 🔗 Forrásanyagok)

  2. Content Quality:
      • Summary is at least 50 characters
      • Contains real URLs (not placeholders)
      • No error keywords (Error:, TODO:, FIXME:, etc.)

  3. Format:
      • Valid markdown structure
      • Proper section headers
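
As an illustration of the checks listed above, here is a minimal validator sketch. It is not the repository's implementation: the function names, the placeholder host list, and the exact matching rules are assumptions, and only the primary header spellings are handled (the "variants" mentioned above are not).

```python
import re

SUMMARY_HEADER = "## 📌 Összefoglaló"
SOURCES_HEADER = "## 🔗 Forrásanyagok"
ERROR_KEYWORDS = ("Error:", "TODO:", "FIXME:")
PLACEHOLDER_HOSTS = ("example.com", "localhost")  # assumed placeholder markers


def section_body(text, header):
    """Return the text between `header` and the next '## ' heading, or None if absent."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith(header):
            body = []
            for following in lines[i + 1:]:
                if following.startswith("## "):
                    break
                body.append(following)
            return "\n".join(body).strip()
    return None


def validate_article(text):
    """Return a list of problem descriptions; an empty list means the article passed."""
    problems = []
    if not re.search(r"^# \S", text, flags=re.MULTILINE):
        problems.append("missing title")

    summary = section_body(text, SUMMARY_HEADER)
    if summary is None:
        problems.append("missing summary section")
    elif len(summary) < 50:
        problems.append("summary shorter than 50 characters")

    if section_body(text, SOURCES_HEADER) is None:
        problems.append("missing sources section")

    urls = re.findall(r"https?://[^\s)>\]]+", text)
    if not any(all(host not in url for host in PLACEHOLDER_HOSTS) for url in urls):
        problems.append("no real URLs")

    for keyword in ERROR_KEYWORDS:
        if keyword in text:
            problems.append(f"contains error keyword {keyword!r}")
    return problems
```

Returning a list of problems rather than raising on the first one makes a failing test report every issue in an article at once.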

Future Improvements

Possible enhancements to consider:

  1. Incremental Validation: Track which articles have been validated to avoid re-testing
  2. Migration Tool: Gradually update old articles to meet current standards
  3. Version Markers: Add format version metadata to articles
  4. Separate History Tests: Create optional tests that can validate old articles when needed

Questions?

If you need to:

  • Add new validation rules → Update tests/test_new_articles.py
  • Debug article generation → Run test_new_articles.py locally
  • Check all articles → Run test_article_integrity.py manually
  • Update test strategy → Modify this document and relevant test files