document_redaction_vlm / .github /workflow_README.md
seanpedrickcase's picture
Sync: Changed search text tab title
d864d45

A newer version of the Gradio SDK is available: 6.0.2

Upgrade

GitHub Actions CI/CD Setup

This directory contains GitHub Actions workflows for automated testing of the CLI redaction application.

Workflows Overview

1. Simple Test Run (.github/workflows/simple-test.yml)

  • Purpose: Basic test execution
  • Triggers: Push to main/dev, Pull requests
  • OS: Ubuntu Latest
  • Python: 3.11
  • Features:
    • Installs system dependencies
    • Sets up test data
    • Runs CLI tests
    • Runs pytest

2. Comprehensive CI/CD (.github/workflows/ci.yml)

  • Purpose: Full CI/CD pipeline
  • Features:
    • Linting (Ruff, Black)
    • Unit tests (Python 3.10, 3.11, 3.12)
    • Integration tests
    • Security scanning (Safety, Bandit)
    • Coverage reporting
    • Package building (on main branch)

3. Multi-OS Testing (.github/workflows/multi-os-test.yml)

  • Purpose: Cross-platform testing
  • OS: Ubuntu, macOS (Windows not included currently but may be reintroduced)
  • Python: 3.10, 3.11, 3.12
  • Features: Tests compatibility across different operating systems

4. Basic Test Suite (.github/workflows/test.yml)

  • Purpose: Original test workflow
  • Features:
    • Multiple Python versions
    • System dependency installation
    • Test data creation
    • Coverage reporting

Setup Scripts

Test Data Setup (.github/scripts/setup_test_data.py)

Creates dummy test files when example data is not available:

  • PDF documents
  • CSV files
  • Word documents
  • Images
  • Allow/deny lists
  • OCR output files

Usage

Running Tests Locally

# Install dependencies
pip install -r requirements.txt
pip install pytest pytest-cov

# Setup test data
python .github/scripts/setup_test_data.py

# Run tests
cd test
python test.py

GitHub Actions Triggers

  1. Push to main/dev: Runs all tests
  2. Pull Request: Runs tests and linting
  3. Daily Schedule: Runs tests at 2 AM UTC
  4. Manual Trigger: Can be triggered manually from GitHub

Configuration

Environment Variables

  • PYTHON_VERSION: Default Python version (3.11)
  • PYTHONPATH: Set automatically for test discovery

Caching

  • Pip dependencies are cached for faster builds
  • Cache key based on requirements.txt hash

Artifacts

  • Test results (JUnit XML)
  • Coverage reports (HTML, XML)
  • Security reports
  • Build artifacts (on main branch)

Test Data

The workflows automatically create test data when example files are missing:

Required Files Created:

  • example_data/example_of_emails_sent_to_a_professor_before_applying.pdf
  • example_data/combined_case_notes.csv
  • example_data/Bold minimalist professional cover letter.docx
  • example_data/example_complaint_letter.jpg
  • example_data/test_allow_list_*.csv
  • example_data/partnership_toolkit_redact_*.csv
  • example_data/example_outputs/doubled_output_joined.pdf_ocr_output.csv

Dependencies Installed:

  • System: tesseract-ocr, poppler-utils, OpenGL libraries
  • Python: All requirements.txt packages + pytest, reportlab, pillow

Workflow Status

Success Criteria:

  • ✅ All tests pass
  • ✅ No linting errors
  • ✅ Security checks pass
  • ✅ Coverage meets threshold (if configured)

Failure Handling:

  • Tests are designed to skip gracefully if files are missing
  • AWS tests are expected to fail without credentials
  • System dependency failures are handled with fallbacks

Customization

Adding New Tests:

  1. Add test methods to test/test.py
  2. Update test data in setup_test_data.py if needed
  3. Tests will automatically run in all workflows

Modifying Workflows:

  1. Edit the appropriate .yml file
  2. Test locally first
  3. Push to trigger the workflow

Environment-Specific Settings:

  • Ubuntu: Full system dependencies
  • Windows: Python packages only
  • macOS: Homebrew dependencies

Troubleshooting

Common Issues:

  1. Missing Dependencies:

    • Check system dependency installation
    • Verify Python package versions
  2. Test Failures:

    • Check test data creation
    • Verify file paths
    • Review test output logs
  3. AWS Test Failures:

    • Expected without credentials
    • Tests are designed to handle this gracefully
  4. System Dependency Issues:

    • Different OS have different requirements
    • Check the specific OS section in workflows

Debug Mode:

Add --verbose or -v flags to pytest commands for more detailed output.

Security

  • Dependencies are scanned with Safety
  • Code is scanned with Bandit
  • No secrets are exposed in logs
  • Test data is temporary and cleaned up

Performance

  • Tests run in parallel where possible
  • Dependencies are cached
  • Only necessary system packages are installed
  • Test data is created efficiently

Monitoring

  • Workflow status is visible in GitHub Actions tab
  • Coverage reports are uploaded to Codecov
  • Test results are available as artifacts
  • Security reports are generated and stored