# CLI Redaction Test Suite
This test suite exercises the `cli_redact.py` script by running every example shown in the CLI epilog.
## Overview
The test suite includes tests for:
1. **PDF Redaction Examples**
   - Default settings (local OCR)
   - Text extraction only (no redaction)
   - Text extraction with whole page redaction
   - Redaction with allow lists
   - Limited pages with custom fuzzy matching
   - Custom deny/allow/whole page lists
   - Image redaction
2. **Tabular Anonymisation Examples**
   - CSV anonymisation with specific columns
   - Different anonymisation strategies
   - Word document anonymisation
3. **AWS Services Examples**
   - Textract and Comprehend redaction
   - Signature extraction
   - Layout extraction
4. **Duplicate Detection Examples**
   - Duplicate pages in OCR files
   - Line-level duplicate detection
   - Tabular duplicate detection
5. **Textract Batch Operations**
   - Submit documents for analysis
   - Retrieve results by job ID
   - List recent jobs
## Running the Tests
### Method 1: Run the test suite directly
```bash
cd test
python test.py
```
### Method 2: Use the convenience script
```bash
cd test
python run_tests.py
```
### Method 3: Run with unittest
```bash
cd test
python -m unittest test.TestCLIRedactExamples -v
```
## Test Behavior
- **File Dependencies**: Tests will be skipped if required example files are not found in the `example_data/` directory
- **AWS Tests**: AWS-related tests may fail if AWS credentials are not configured; such failures are expected
- **Temporary Output**: All tests use temporary output directories that are cleaned up automatically
- **Timeout**: Each test has a 10-minute timeout to prevent hanging
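
The skip and cleanup behaviour listed above could be implemented along the following lines. This is a minimal sketch, not the suite's actual code: the class name `ExampleBehaviourTest`, the helper `require_file()`, and the file name `hypothetical_input.pdf` are all assumptions made for illustration.

```python
import os
import shutil
import tempfile
import unittest

# Assumed relative location of the example files.
EXAMPLE_DATA_DIR = "example_data"


class ExampleBehaviourTest(unittest.TestCase):
    """Hypothetical test case: skip on missing input, clean up temp output."""

    def setUp(self):
        # Each test gets its own temporary output directory.
        self.temp_output_dir = tempfile.mkdtemp(prefix="test_output_")
        # Registered cleanups run even if the test fails or is skipped.
        self.addCleanup(shutil.rmtree, self.temp_output_dir, ignore_errors=True)

    def require_file(self, filename):
        """Return the path to an example file, or skip the test if it is absent."""
        path = os.path.join(EXAMPLE_DATA_DIR, filename)
        if not os.path.isfile(path):
            self.skipTest(f"Example file not found: {path}")
        return path

    def test_needs_example_file(self):
        # Hypothetical file name; the test is skipped when it does not exist.
        input_path = self.require_file("hypothetical_input.pdf")
        self.assertTrue(os.path.isfile(input_path))
        self.assertTrue(os.path.isdir(self.temp_output_dir))
```

Using `addCleanup` rather than `tearDown` means the directory is removed on every exit path once `setUp` has created it.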
## Test Structure
The test suite uses Python's `unittest` framework with the following structure:
- `TestCLIRedactExamples`: Main test class containing all test methods
- `run_cli_redact()`: Helper function that executes the CLI script with specified parameters
- `run_all_tests()`: Main function that runs all tests and provides a summary
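
As a rough illustration of what a helper such as `run_cli_redact()` might do, the sketch below runs a script in a subprocess with a timeout and reports success or failure. The signature, the return value, and the constant `TIMEOUT_SECONDS` are assumptions; the real helper in `test.py` may differ.

```python
import subprocess
import sys

# Assumed constant matching the 10-minute per-test limit.
TIMEOUT_SECONDS = 10 * 60


def run_cli_redact(script_path, args, timeout=TIMEOUT_SECONDS):
    """Run the CLI script with the given arguments.

    Returns True if the process exits with code 0, and False if it exits
    with a non-zero code or exceeds the timeout. (Hypothetical signature.)
    """
    command = [sys.executable, script_path] + list(args)
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        print(f"Command timed out after {timeout} seconds")
        return False

    if completed.returncode != 0:
        # Surface stderr to help with debugging a failed run.
        print(completed.stderr)
    return completed.returncode == 0
```

Launching the script with `sys.executable` keeps the subprocess on the same interpreter (and virtual environment) as the test run.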
## Example Output
```
================================================================================
DOCUMENT REDACTION CLI TEST SUITE
================================================================================
This test suite runs through all the examples from the CLI epilog.
Tests will be skipped if required example files are not found.
AWS-related tests may fail if credentials are not configured.
================================================================================
Test setup complete. Script: /path/to/cli_redact.py
Example data directory: /path/to/example_data
Temp output directory: /tmp/test_output_xyz
=== Testing PDF redaction with default settings ===
✅ PDF redaction with default settings passed
=== Testing PDF text extraction only ===
✅ PDF text extraction only passed
...
================================================================================
TEST SUMMARY
================================================================================
Tests run: 20
Failures: 0
Errors: 0
Skipped: 2
Overall result: ✅ PASSED
================================================================================
```
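
A summary block like the one above can be derived from a standard `unittest` result object. The following is a sketch only: the function name `summarise()`, the demo test class, and the `❌ FAILED` wording are assumptions, not taken from the suite.

```python
import io
import unittest


def summarise(result):
    """Return summary lines for a unittest result, mirroring the layout above."""
    passed = result.wasSuccessful()  # skipped tests do not count as failures
    return [
        "=" * 80,
        "TEST SUMMARY",
        "=" * 80,
        f"Tests run: {result.testsRun}",
        f"Failures: {len(result.failures)}",
        f"Errors: {len(result.errors)}",
        f"Skipped: {len(result.skipped)}",
        f"Overall result: {'✅ PASSED' if passed else '❌ FAILED'}",
        "=" * 80,
    ]


class DemoTests(unittest.TestCase):
    """Tiny hypothetical test case used only to produce a result object."""

    def test_passes(self):
        self.assertEqual(1 + 1, 2)

    @unittest.skip("demonstrates a skipped test")
    def test_skipped(self):
        pass


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTests)
    # Send the runner's own output to a buffer so only the summary is printed.
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    print("\n".join(summarise(result)))
```

Note that `testsRun` includes skipped tests, which is consistent with a run reporting skips while still passing overall.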
## Requirements
- Python 3.6+
- All dependencies for the main CLI script
- Example data files in the `example_data/` directory (for full test coverage)
- AWS credentials (for AWS-related tests)
## Notes
- Tests are designed to be robust and will skip gracefully if files are missing
- AWS tests are marked as completed even if they fail due to missing credentials
- The test suite provides detailed output for debugging
- All temporary files are cleaned up automatically