GitHub Actions CI/CD Setup

This directory contains GitHub Actions workflows for automated testing of the CLI redaction application.

Workflows Overview

1. Simple Test Run (`.github/workflows/simple-test.yml`)

Purpose: Basic test execution
Triggers: Push to main/dev, Pull requests
OS: Ubuntu Latest
Python: 3.11
Features:
- Installs system dependencies
- Sets up test data
- Runs CLI tests
- Runs pytest

2. Comprehensive CI/CD (`.github/workflows/ci.yml`)

Purpose: Full CI/CD pipeline
Features:
- Linting (Ruff, Black)
- Unit tests (Python 3.10, 3.11, 3.12)
- Integration tests
- Security scanning (Safety, Bandit)
- Coverage reporting
- Package building (on main branch)

3. Multi-OS Testing (`.github/workflows/multi-os-test.yml`)

Purpose: Cross-platform testing
OS: Ubuntu, macOS (Windows is not currently included but may be reintroduced)
Python: 3.10, 3.11, 3.12
Features: Tests compatibility across different operating systems

4. Basic Test Suite (`.github/workflows/test.yml`)

Purpose: Original test workflow
Features:
- Multiple Python versions
- System dependency installation
- Test data creation
- Coverage reporting

Setup Scripts

Test Data Setup (`.github/scripts/setup_test_data.py`)

Creates dummy test files when example data is not available:

PDF documents
CSV files
Word documents
Images
Allow/deny lists
OCR output files
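
The "create only when missing" behaviour described above can be sketched as follows. This is a minimal illustration, not the contents of `setup_test_data.py`: the helper name `ensure_dummy_csv`, the file name and the row contents are all hypothetical, and only the CSV case is shown.

```python
import csv
import tempfile
from pathlib import Path


def ensure_dummy_csv(path: Path, rows: list[list[str]]) -> bool:
    """Create a small CSV at `path` unless a file already exists there.

    Returns True if a dummy file was written, False if an existing file was kept.
    """
    if path.exists():
        return False  # real example data is present; leave it untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return True


if __name__ == "__main__":
    # Use a throwaway directory so the sketch never touches real example data.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "example_data" / "dummy_case_notes.csv"  # hypothetical name
        rows = [["id", "note"], ["1", "Placeholder text for testing"]]
        print(ensure_dummy_csv(target, rows))  # first call writes the file
        print(ensure_dummy_csv(target, rows))  # second call keeps the existing file
```

Returning a boolean lets a caller log which files were generated and which were already present.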

Usage

Running Tests Locally

# Install dependencies
pip install -r requirements.txt
pip install pytest pytest-cov

# Set up test data
python .github/scripts/setup_test_data.py

# Run tests
cd test
python test.py

GitHub Actions Triggers

Push to main/dev: Runs all tests
Pull Request: Runs tests and linting
Daily Schedule: Runs tests at 2 AM UTC
Manual Trigger: Can be triggered manually from GitHub

Configuration

Environment Variables

PYTHON_VERSION: Default Python version (3.11)
PYTHONPATH: Set automatically for test discovery

Caching

Pip dependencies are cached for faster builds
Cache key based on requirements.txt hash
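
To illustrate why a hash-based key works, the sketch below derives a key from the SHA-256 digest of a requirements file, so the key changes whenever the file's contents change. The `pip-<digest>` format and the `cache_key` helper are assumptions made for illustration; the actual key is defined in the workflow files.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(requirements: Path, prefix: str = "pip") -> str:
    """Build a cache key from the SHA-256 digest of a requirements file."""
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()
    return f"{prefix}-{digest}"  # hypothetical key format


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        req = Path(tmp) / "requirements.txt"
        req.write_text("pytest\n", encoding="utf-8")
        before = cache_key(req)
        req.write_text("pytest\npytest-cov\n", encoding="utf-8")
        after = cache_key(req)
        # Any edit to the file yields a new key, so a stale cache is not reused.
        print(before != after)
```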

Artifacts

Test results (JUnit XML)
Coverage reports (HTML, XML)
Security reports
Build artifacts (on main branch)
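
A downloaded JUnit XML artifact can be inspected locally. The following is a rough sketch that counts outcomes in a JUnit-style report using only the standard library; the `summarize_junit` helper and the sample report are hypothetical, and real reports may carry more attributes than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample report with one passing, one failing and one skipped test.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="example" tests="3">
    <testcase classname="t" name="a"/>
    <testcase classname="t" name="b"><failure message="boom"/></testcase>
    <testcase classname="t" name="c"><skipped/></testcase>
  </testsuite>
</testsuites>"""


def summarize_junit(xml_text: str) -> dict[str, int]:
    """Count test cases by outcome in a JUnit-style XML report."""
    root = ET.fromstring(xml_text)
    counts = {"passed": 0, "failed": 0, "errors": 0, "skipped": 0}
    for case in root.iter("testcase"):
        if case.find("failure") is not None:
            counts["failed"] += 1
        elif case.find("error") is not None:
            counts["errors"] += 1
        elif case.find("skipped") is not None:
            counts["skipped"] += 1
        else:
            counts["passed"] += 1
    return counts


if __name__ == "__main__":
    print(summarize_junit(SAMPLE_REPORT))
```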

Test Data

The workflows automatically create test data when example files are missing:

Required Files Created:

example_data/example_of_emails_sent_to_a_professor_before_applying.pdf
example_data/combined_case_notes.csv
example_data/Bold minimalist professional cover letter.docx
example_data/example_complaint_letter.jpg
example_data/test_allow_list_*.csv
example_data/partnership_toolkit_redact_*.csv
example_data/example_outputs/doubled_output_joined.pdf_ocr_output.csv

Dependencies Installed:

System: tesseract-ocr, poppler-utils, OpenGL libraries
Python: All requirements.txt packages + pytest, reportlab, pillow

Workflow Status

Success Criteria:

✅ All tests pass
✅ No linting errors
✅ Security checks pass
✅ Coverage meets threshold (if configured)

Failure Handling:

Tests are designed to skip gracefully if files are missing
AWS tests are expected to fail without credentials
System dependency failures are handled with fallbacks
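
One way to get this kind of graceful skipping is with the `unittest` skip decorators, sketched below. This is an assumption about how such tests could be written, not a copy of `test/test.py`: the class and test names are hypothetical, and the AWS test is skipped here rather than left to fail.

```python
import os
import unittest
from pathlib import Path

EXAMPLE_PDF = (
    Path("example_data") / "example_of_emails_sent_to_a_professor_before_applying.pdf"
)


def has_aws_credentials() -> bool:
    """Report whether the standard AWS credential environment variables are set.

    Credentials can also come from other sources (for example a profile file);
    this sketch checks the environment only.
    """
    return bool(
        os.environ.get("AWS_ACCESS_KEY_ID") and os.environ.get("AWS_SECRET_ACCESS_KEY")
    )


class ExampleSkipTests(unittest.TestCase):  # hypothetical test class
    @unittest.skipUnless(EXAMPLE_PDF.exists(), "example PDF not found")
    def test_pdf_is_not_empty(self):
        self.assertGreater(EXAMPLE_PDF.stat().st_size, 0)

    @unittest.skipUnless(has_aws_credentials(), "AWS credentials not configured")
    def test_aws_placeholder(self):
        # A real test would exercise the AWS-backed code path here.
        self.assertTrue(has_aws_credentials())


if __name__ == "__main__":
    unittest.main()
```

Skipped tests are reported separately from failures, which keeps a run without example data or credentials green while still showing what was not exercised.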

Customization

Adding New Tests:

Add test methods to test/test.py
Update test data in setup_test_data.py if needed
Tests will automatically run in all workflows

Modifying Workflows:

Edit the appropriate .yml file
Test locally first
Push to trigger the workflow

Environment-Specific Settings:

Ubuntu: Full system dependencies
Windows: Python packages only (Windows is not currently part of the multi-OS test matrix)
macOS: Homebrew dependencies

Troubleshooting

Common Issues:

Missing Dependencies:
- Check system dependency installation
- Verify Python package versions
Test Failures:
- Check test data creation
- Verify file paths
- Review test output logs
AWS Test Failures:
- Expected without credentials
- Tests are designed to handle this gracefully
System Dependency Issues:
- Different operating systems have different requirements
- Check the specific OS section in workflows
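
When checking system dependency installation, a quick first step is to confirm the command-line tools are on `PATH`. The sketch below assumes the `tesseract` executable (from tesseract-ocr) and `pdftoppm` (from poppler-utils) are the relevant binaries; the `missing_tools` helper is hypothetical, and the OpenGL libraries are not covered because they are shared libraries rather than executables.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # `tesseract` ships with tesseract-ocr; `pdftoppm` ships with poppler-utils.
    missing = missing_tools(["tesseract", "pdftoppm"])
    print("Missing tools:", ", ".join(missing) if missing else "none")
```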

Debug Mode:

Add the --verbose (or -v) flag to pytest commands for more detailed output.

Security

Dependencies are scanned with Safety
Code is scanned with Bandit
No secrets are exposed in logs
Test data is temporary and cleaned up

Performance

Tests run in parallel where possible
Dependencies are cached
Only necessary system packages are installed
Test data is created efficiently

Monitoring

Workflow status is visible in GitHub Actions tab
Coverage reports are uploaded to Codecov
Test results are available as artifacts
Security reports are generated and stored