# GitHub Actions CI/CD Setup
This directory contains GitHub Actions workflows for automated testing of the CLI redaction application.
## Workflows Overview
### 1. **Simple Test Run** (`.github/workflows/simple-test.yml`)
- **Purpose**: Basic test execution
- **Triggers**: Push to main/dev, Pull requests
- **OS**: Ubuntu Latest
- **Python**: 3.11
- **Features**:
  - Installs system dependencies
  - Sets up test data
  - Runs CLI tests
  - Runs pytest
### 2. **Comprehensive CI/CD** (`.github/workflows/ci.yml`)
- **Purpose**: Full CI/CD pipeline
- **Features**:
  - Linting (Ruff, Black)
  - Unit tests (Python 3.10, 3.11, 3.12)
  - Integration tests
  - Security scanning (Safety, Bandit)
  - Coverage reporting
  - Package building (on main branch)
### 3. **Multi-OS Testing** (`.github/workflows/multi-os-test.yml`)
- **Purpose**: Cross-platform testing
- **OS**: Ubuntu, macOS (Windows is not currently included but may be reintroduced)
- **Python**: 3.10, 3.11, 3.12
- **Features**: Tests compatibility across different operating systems
### 4. **Basic Test Suite** (`.github/workflows/test.yml`)
- **Purpose**: Original test workflow
- **Features**:
  - Multiple Python versions
  - System dependency installation
  - Test data creation
  - Coverage reporting
## Setup Scripts
### Test Data Setup (`.github/scripts/setup_test_data.py`)
Creates dummy test files when example data is not available:
- PDF documents
- CSV files
- Word documents
- Images
- Allow/deny lists
- OCR output files
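As a rough illustration of the "create only if missing" behavior, the sketch below writes placeholder CSV inputs using only the standard library. It is not the contents of `setup_test_data.py`: the helper names `ensure_csv` and `setup_dummy_data`, the column headers, the row values and the file name `test_allow_list_example.csv` are all hypothetical, and the PDF, Word and image files (which need reportlab or pillow) are left out.

```python
import csv
from pathlib import Path


def ensure_csv(path: Path, header: list[str], rows: list[list[str]]) -> bool:
    """Write a small CSV to `path` unless a file is already there.

    Returns True if a placeholder was written, False if the file existed.
    """
    if path.exists():
        return False  # keep real example data untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return True


def setup_dummy_data(root: Path) -> list[Path]:
    """Create placeholder CSV inputs under `root` and return the paths written."""
    data_dir = root / "example_data"
    # Hypothetical contents: the real script's columns and rows may differ.
    placeholders = {
        data_dir / "combined_case_notes.csv": (
            ["Case Note", "Client"],
            [["Met client to discuss housing options.", "Jane Doe"]],
        ),
        data_dir / "test_allow_list_example.csv": (
            ["term"],
            [["Jane Doe"]],
        ),
    }
    created = []
    for path, (header, rows) in placeholders.items():
        if ensure_csv(path, header, rows):
            created.append(path)
    return created
```

From the repository root this would be called as `setup_dummy_data(Path("."))`. Existing files are never overwritten, so real example data always takes precedence over placeholders.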
## Usage
### Running Tests Locally
```bash
# Install dependencies
pip install -r requirements.txt
pip install pytest pytest-cov
# Setup test data
python .github/scripts/setup_test_data.py
# Run tests
cd test
python test.py
```
### GitHub Actions Triggers
1. **Push to main/dev**: Runs all tests
2. **Pull Request**: Runs tests and linting
3. **Daily Schedule**: Runs tests at 2 AM UTC
4. **Manual Trigger**: Can be triggered manually from GitHub
## Configuration
### Environment Variables
- `PYTHON_VERSION`: Default Python version (3.11)
- `PYTHONPATH`: Set automatically for test discovery
### Caching
- Pip dependencies are cached for faster builds
- The cache key is derived from a hash of `requirements.txt`, so the cache is refreshed whenever dependencies change
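To see why editing `requirements.txt` produces a fresh cache, the sketch below builds a key in the spirit of the common `${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}` pattern. Whether these workflows use exactly that key (or the built-in `cache: pip` option of `actions/setup-python`) is an assumption, and `pip_cache_key` is a hypothetical helper whose SHA-256 digest is not byte-for-byte what `hashFiles()` returns; only the principle carries over.

```python
import hashlib
from pathlib import Path


def pip_cache_key(requirements: Path, runner_os: str = "Linux") -> str:
    """Build a cache key of the form <os>-pip-<hash of requirements file>.

    Illustrative only: GitHub's hashFiles() computes its digest differently,
    but in both cases any change to the file produces a new key.
    """
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()
    return f"{runner_os}-pip-{digest}"
```

Any edit to `requirements.txt`, even a single version bump, yields a different key, so the next run misses the cache and reinstalls the packages.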
### Artifacts
- Test results (JUnit XML)
- Coverage reports (HTML, XML)
- Security reports
- Build artifacts (on main branch)
## Test Data
The workflows automatically create test data when example files are missing:
### Required Files Created:
- `example_data/example_of_emails_sent_to_a_professor_before_applying.pdf`
- `example_data/combined_case_notes.csv`
- `example_data/Bold minimalist professional cover letter.docx`
- `example_data/example_complaint_letter.jpg`
- `example_data/test_allow_list_*.csv`
- `example_data/partnership_toolkit_redact_*.csv`
- `example_data/example_outputs/doubled_output_joined.pdf_ocr_output.csv`
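Deciding which of these files are missing can be sketched with `pathlib` globbing, which handles the literal paths and the `*` patterns alike. `missing_files` and `REQUIRED` are hypothetical names for illustration, not code taken from the repository.

```python
from pathlib import Path

# Paths and glob patterns from the "Required Files Created" list.
REQUIRED = (
    "example_data/example_of_emails_sent_to_a_professor_before_applying.pdf",
    "example_data/combined_case_notes.csv",
    "example_data/Bold minimalist professional cover letter.docx",
    "example_data/example_complaint_letter.jpg",
    "example_data/test_allow_list_*.csv",
    "example_data/partnership_toolkit_redact_*.csv",
    "example_data/example_outputs/doubled_output_joined.pdf_ocr_output.csv",
)


def missing_files(root: Path, patterns: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return the entries of `patterns` that match no existing file under `root`."""
    missing = []
    for pattern in patterns:
        # Path.glob matches literal names as well as `*` wildcards.
        if not any(p.is_file() for p in root.glob(pattern)):
            missing.append(pattern)
    return missing
```

Only the entries this returns would need placeholders; any real example data already in `example_data/` is left alone.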
### Dependencies Installed:
- **System**: tesseract-ocr, poppler-utils, OpenGL libraries
- **Python**: All packages in `requirements.txt`, plus pytest, reportlab and pillow
## Workflow Status
### Success Criteria:
- ✅ All tests pass
- ✅ No linting errors
- ✅ Security checks pass
- ✅ Coverage meets threshold (if configured)
### Failure Handling:
- Tests are designed to skip gracefully if files are missing
- AWS tests are expected to fail without credentials
- System dependency failures are handled with fallbacks
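One way to get skip-rather-than-fail behavior is `unittest.skipUnless`, sketched below. The class, the test names and `aws_credentials_available` are hypothetical; `test/test.py` may handle these cases differently (for example by catching the error). The credential check is deliberately rough, since AWS credentials can also come from a profile file or an IAM role rather than environment variables.

```python
import os
import unittest
from pathlib import Path

# Example input from the test-data list; the path is relative to the repo root.
EXAMPLE_CSV = Path("example_data/combined_case_notes.csv")


def aws_credentials_available() -> bool:
    """Rough check: are AWS keys supplied through environment variables?"""
    return bool(os.environ.get("AWS_ACCESS_KEY_ID")) and bool(
        os.environ.get("AWS_SECRET_ACCESS_KEY")
    )


class GracefulSkipExample(unittest.TestCase):
    @unittest.skipUnless(EXAMPLE_CSV.is_file(), f"{EXAMPLE_CSV} not available")
    def test_example_csv_is_not_empty(self):
        self.assertGreater(EXAMPLE_CSV.stat().st_size, 0)

    @unittest.skipUnless(aws_credentials_available(), "AWS credentials not set")
    def test_aws_backed_option(self):
        # Placeholder: a real test would run an AWS-backed CLI option here.
        self.assertTrue(aws_credentials_available())
```

Skipped tests still count as a successful run, so a job without credentials or example files stays green while the skip reasons appear in the test log.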
## Customization
### Adding New Tests:
1. Add test methods to `test/test.py`
2. Update test data in `setup_test_data.py` if needed
3. Tests will automatically run in all workflows
### Modifying Workflows:
1. Edit the appropriate `.yml` file
2. Run the tests locally first
3. Push to trigger the workflow
### Environment-Specific Settings:
- **Ubuntu**: Full system dependencies
- **Windows** (not currently in the test matrix): Python packages only
- **macOS**: Homebrew dependencies
## Troubleshooting
### Common Issues:
1. **Missing Dependencies**:
   - Check the system dependency installation step
   - Verify Python package versions
2. **Test Failures**:
   - Check that the test data was created
   - Verify file paths
   - Review the test output logs
3. **AWS Test Failures**:
   - Expected without credentials
   - Tests are designed to handle this gracefully
4. **System Dependency Issues**:
   - Different operating systems have different requirements
   - Check the OS-specific steps in the relevant workflow file
### Debug Mode:
Add the `-v` (`--verbose`) flag to pytest commands for more detailed output.
## Security
- Dependencies are scanned with Safety
- Code is scanned with Bandit
- No secrets are exposed in logs
- Test data is temporary and cleaned up
## Performance
- Tests run in parallel where possible
- Dependencies are cached
- Only necessary system packages are installed
- Test data is created efficiently
## Monitoring
- Workflow status is visible in GitHub Actions tab
- Coverage reports are uploaded to Codecov
- Test results are available as artifacts
- Security reports are generated and stored