Spaces:
Running
on
Zero
Running
on
Zero
| # CLI Redaction Test Suite | |
| This test suite provides comprehensive testing for the `cli_redact.py` script based on all the examples shown in the CLI epilog. | |
| ## Overview | |
| The test suite includes tests for: | |
| 1. **PDF Redaction Examples** | |
| - Default settings (local OCR) | |
| - Text extraction only (no redaction) | |
| - Text extraction with whole page redaction | |
| - Redaction with allow lists | |
| - Limited pages with custom fuzzy matching | |
| - Custom deny/allow/whole page lists | |
| - Image redaction | |
| 2. **Tabular Anonymisation Examples** | |
| - CSV anonymisation with specific columns | |
| - Different anonymisation strategies | |
| - Word document anonymisation | |
| 3. **AWS Services Examples** | |
| - Textract and Comprehend redaction | |
| - Signature extraction | |
| - Layout extraction | |
| 4. **Duplicate Detection Examples** | |
| - Duplicate pages in OCR files | |
| - Line-level duplicate detection | |
| - Tabular duplicate detection | |
| 5. **Textract Batch Operations** | |
| - Submit documents for analysis | |
| - Retrieve results by job ID | |
| - List recent jobs | |
| ## Running the Tests | |
| ### Method 1: Run the test suite directly | |
| ```bash | |
| cd test | |
| python test.py | |
| ``` | |
| ### Method 2: Use the convenience script | |
| ```bash | |
| cd test | |
| python run_tests.py | |
| ``` | |
| ### Method 3: Run with unittest | |
| ```bash | |
| cd test | |
| python -m unittest test.test.TestCLIRedactExamples -v | |
| ``` | |
| ## Test Behavior | |
| - **File Dependencies**: Tests will be skipped if required example files are not found in the `example_data/` directory | |
| - **AWS Tests**: AWS-related tests may fail if credentials are not configured, but this is expected | |
| - **Temporary Output**: All tests use temporary output directories that are cleaned up automatically | |
| - **Timeout**: Each test has a 10-minute timeout to prevent hanging | |
| ## Test Structure | |
| The test suite uses Python's `unittest` framework with the following structure: | |
| - `TestCLIRedactExamples`: Main test class containing all test methods | |
| - `run_cli_redact()`: Helper function that executes the CLI script with specified parameters | |
| - `run_all_tests()`: Main function that runs all tests and provides a summary | |
| ## Example Output | |
| ``` | |
| ================================================================================ | |
| DOCUMENT REDACTION CLI TEST SUITE | |
| ================================================================================ | |
| This test suite runs through all the examples from the CLI epilog. | |
| Tests will be skipped if required example files are not found. | |
| AWS-related tests may fail if credentials are not configured. | |
| ================================================================================ | |
| Test setup complete. Script: /path/to/cli_redact.py | |
| Example data directory: /path/to/example_data | |
| Temp output directory: /tmp/test_output_xyz | |
| === Testing PDF redaction with default settings === | |
| ✅ PDF redaction with default settings passed | |
| === Testing PDF text extraction only === | |
| ✅ PDF text extraction only passed | |
| ... | |
| ================================================================================ | |
| TEST SUMMARY | |
| ================================================================================ | |
| Tests run: 20 | |
| Failures: 0 | |
| Errors: 0 | |
| Skipped: 2 | |
| Overall result: ✅ PASSED | |
| ================================================================================ | |
| ``` | |
| ## Requirements | |
| - Python 3.6+ | |
| - All dependencies for the main CLI script | |
| - Example data files in the `example_data/` directory (for full test coverage) | |
| - AWS credentials (for AWS-related tests) | |
| ## Notes | |
| - Tests are designed to be robust and will skip gracefully if files are missing | |
| - AWS tests are marked as completed even if they fail due to missing credentials | |
| - The test suite provides detailed output for debugging | |
| - All temporary files are cleaned up automatically | |