Article Testing Strategy¶
This document explains how article validation tests work in this repository.
Overview¶
The repository has two kinds of tests, each with a different purpose:
1. Unit Tests (Standard Test Suite)¶
Location: Most tests in the tests/ directory
When they run: On every push and PR, via the lint-and-test.yml workflow
What they test: Core functionality, utilities, business logic
What they DON'T test: Article content validation
2. Article Validation Tests (Generated Content)¶
Location: tests/test_new_articles.py
When they run: Only after article generation, in the fetch-trends.yml workflow
What they test: Newly generated articles only (not historical articles)
Why This Separation?¶
Problem with Testing All Articles¶
Previously, article validation tests checked ALL articles in the repository, including:
- Historical articles that may use older formats
- Articles with different structural requirements from the past
- Articles that are valid but don't meet current standards
This caused:
- ❌ False positives in CI on unrelated PRs
- ❌ Failures due to old articles when making code changes
- ❌ Inability to evolve article standards without fixing all old articles
Solution: Test Only New Articles¶
Now we:
- ✅ Only validate articles generated in the current workflow run
- ✅ Use git to detect which articles are new
- ✅ Fall back to date-based detection (today/yesterday) when needed
- ✅ Skip article validation entirely in the standard test suite
Test Files¶
tests/test_new_articles.py (ACTIVE)¶
- Purpose: Validate newly generated articles
- Scope: Only articles from current workflow run
- Detection Method:
  - Primary: Git diff to find added/modified articles
  - Fallback: Articles from today and yesterday
- Usage: Run automatically by fetch-trends.yml after article generation
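The detection order listed above (git first, dates second) can be sketched roughly as follows. This is an illustration only, not the code in tests/test_new_articles.py: the function names, the `content/articles` directory, and the assumption that article filenames begin with a `YYYY-MM-DD` date are all hypothetical.

```python
import subprocess
from datetime import date, timedelta
from pathlib import Path

# Hypothetical location of generated articles; adjust to the real layout.
ARTICLES_DIR = "content/articles"


def articles_from_git(articles_dir: str = ARTICLES_DIR) -> list:
    """Primary method: ask git for added (A) or modified (M) Markdown files.

    Compares the working tree against HEAD. Untracked files do not show up
    in `git diff`, which is one reason a fallback is useful.
    """
    try:
        result = subprocess.run(
            ["git", "diff", "--name-only", "--diff-filter=AM", "HEAD", "--", articles_dir],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []  # git unavailable or not a repository: caller falls back
    return [line for line in result.stdout.splitlines() if line.endswith(".md")]


def articles_from_dates(paths: list, today: date) -> list:
    """Fallback: keep files whose name starts with today's or yesterday's date."""
    prefixes = {today.isoformat(), (today - timedelta(days=1)).isoformat()}
    return [p for p in paths if Path(p).name[:10] in prefixes]


def select_new_articles(git_paths: list, all_paths: list, today: date) -> list:
    """Prefer the git result; use the date-based fallback only when it is empty."""
    return git_paths or articles_from_dates(all_paths, today)
```

Keeping the fallback as a pure function of a path list and a date makes it easy to exercise without a git checkout.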
tests/test_article_integrity.py (LEGACY)¶
- Purpose: Comprehensive validation of ALL articles
- Status: Excluded from standard test suite
- Usage: Manual debugging only
- Note: Will fail on historical articles with different formats
Workflows¶
lint-and-test.yml¶
- name: Run tests
  run: uv run pytest tests/ -v --ignore=tests/test_article_integrity.py --ignore=tests/test_new_articles.py
fetch-trends.yml¶
- name: Run tests on generated content
  if: steps.aggregate.outputs.has_valid_articles == 'true'
  env:
    NEW_ARTICLES_EXPECTED: 'true'
  run: uv run pytest tests/test_new_articles.py -v
- Sets NEW_ARTICLES_EXPECTED=true to enable validation
- Only validates articles from the current run
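One way a test module could read the NEW_ARTICLES_EXPECTED switch is sketched below. How tests/test_new_articles.py actually interprets the variable is not shown in this document; the helper name `validation_enabled` and the case-insensitive comparison are assumptions.

```python
import os
from typing import Mapping, Optional


def validation_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when NEW_ARTICLES_EXPECTED is set to 'true'.

    `environ` defaults to the process environment; passing a plain dict
    makes the check easy to test.
    """
    env = os.environ if environ is None else environ
    return env.get("NEW_ARTICLES_EXPECTED", "").strip().lower() == "true"
```

In a pytest module, a helper like this could feed a module-level `pytestmark = pytest.mark.skipif(not validation_enabled(), reason="no new articles expected")`, so the tests are skipped rather than failed when the variable is absent.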
Running Tests Locally¶
Run unit tests (no article validation):¶
uv run pytest tests/ -v --ignore=tests/test_article_integrity.py --ignore=tests/test_new_articles.py
Run article validation for new articles:¶
NEW_ARTICLES_EXPECTED=true uv run pytest tests/test_new_articles.py -v
Run legacy article integrity tests (will likely fail on old articles):¶
uv run pytest tests/test_article_integrity.py -v
Article Validation Checks¶
New articles are validated for:
- Structure:
  - Has a title (# Title)
  - Has a summary section (## 📌 Összefoglaló or variants)
  - Has a sources section (## 🔗 Forrásanyagok)
- Content Quality:
  - Summary is at least 50 characters
  - Contains real URLs (not placeholders)
  - No error keywords (Error:, TODO:, FIXME:, etc.)
- Format:
  - Valid markdown structure
  - Proper section headers
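The checks listed above could be expressed along the following lines. This is a minimal sketch rather than the repository's validator: the function names, the placeholder host list, and the URL regex are assumptions, and header "variants" are only handled as suffixes after the base header text.

```python
import re
from typing import List, Optional

SUMMARY_HEADER = "## 📌 Összefoglaló"
SOURCES_HEADER = "## 🔗 Forrásanyagok"
ERROR_KEYWORDS = ("Error:", "TODO:", "FIXME:")
MIN_SUMMARY_CHARS = 50
# Hypothetical placeholder hosts; the real list may differ.
PLACEHOLDER_HOSTS = ("example.com", "localhost")


def section_body(text: str, header: str) -> Optional[str]:
    """Return the text between `header` and the next '## ' heading, or None."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith(header):
            body = []
            for following in lines[i + 1:]:
                if following.startswith("## "):
                    break
                body.append(following)
            return "\n".join(body).strip()
    return None


def validate_article(text: str) -> List[str]:
    """Return a list of problems; an empty list means the article passed."""
    problems = []
    # Structure: a level-1 title, a summary section, a sources section.
    if not re.search(r"^# \S", text, flags=re.MULTILINE):
        problems.append("missing title")
    summary = section_body(text, SUMMARY_HEADER)
    if summary is None:
        problems.append("missing summary section")
    elif len(summary) < MIN_SUMMARY_CHARS:
        problems.append("summary too short")
    if section_body(text, SOURCES_HEADER) is None:
        problems.append("missing sources section")
    # Content quality: at least one URL that is not a placeholder.
    urls = re.findall(r"https?://[^\s)>\]]+", text)
    if not [u for u in urls if not any(h in u for h in PLACEHOLDER_HOSTS)]:
        problems.append("no real URLs")
    # Content quality: no leftover error markers.
    found = [k for k in ERROR_KEYWORDS if k in text]
    if found:
        problems.append("error keywords: " + ", ".join(found))
    return problems
```

Returning a list of problems instead of raising on the first failure lets a single test report everything wrong with an article at once.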
Future Improvements¶
Possible enhancements to consider:
- Incremental Validation: Track which articles have been validated to avoid re-testing
- Migration Tool: Gradually update old articles to meet current standards
- Version Markers: Add format version metadata to articles
- Separate History Tests: Create optional tests that can validate old articles when needed
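As an illustration of the incremental validation idea, one possible approach is to record a content hash for each article that has passed validation and skip articles whose hash is unchanged. Nothing like this exists in the repository as described here; the manifest format and all function names below are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict


def content_hash(text: str) -> str:
    """SHA-256 of the article text, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_validation(path: str, text: str, manifest: Dict[str, str]) -> bool:
    """True if the article is new or has changed since it last passed."""
    return manifest.get(path) != content_hash(text)


def mark_validated(path: str, text: str, manifest: Dict[str, str]) -> None:
    """Record the current hash after the article passes validation."""
    manifest[path] = content_hash(text)


def load_manifest(file: Path) -> Dict[str, str]:
    """Read the manifest from disk, or start empty if it does not exist."""
    if not file.exists():
        return {}
    return json.loads(file.read_text(encoding="utf-8"))


def save_manifest(file: Path, manifest: Dict[str, str]) -> None:
    """Write the manifest as sorted JSON so diffs stay small."""
    file.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
```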
Questions?¶
If you need to:
- Add new validation rules → Update tests/test_new_articles.py
- Debug article generation → Run test_new_articles.py locally
- Check all articles → Run test_article_integrity.py manually
- Update test strategy → Modify this document and relevant test files