---
title: Historical OCR
emoji: ⚙️
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: gpl-3.0
short_description: advanced OCR application for historical document analysis
---
# Historical OCR

An advanced OCR application for historical document analysis using Mistral AI.

> **Note:** This tool is designed to assist scholars in historical research by extracting text from challenging documents. While it may not achieve 100% accuracy for all materials, it serves as a valuable research aid for navigating historical documents, particularly historical newspapers, handwritten documents, and photos of archival materials.

## Features

- **OCR with Context:** AI-enhanced OCR optimized for historical documents
- **Document Type Detection:** Automatically identifies handwritten letters, recipes, scientific texts, and more
- **Advanced Image Preprocessing:**
  - Automatic deskewing to correct document orientation
  - Smart thresholding with Otsu and adaptive methods
  - Morphological operations to clean up text
  - Document-type specific optimization
- **Custom Prompting:** Tailor the AI analysis with document-specific instructions
- **Structured Output:** Returns organized, structured information based on document type
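
To illustrate the Otsu thresholding step listed under image preprocessing, here is a minimal pure-Python sketch. It is not the app's actual code, which this README does not show. The function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be a flat list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes between-class variance.

    `pixels` is assumed to be a flat, non-empty list of integers in 0..255.
    """
    # Build a 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of gray levels at or below the candidate threshold

    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is background; nothing left to separate

        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t

    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Hypothetical sample: dark ink (around 10) on a light page (around 200).
    sample = [10] * 50 + [200] * 50
    t = otsu_threshold(sample)
    print("threshold:", t)
    print("binarized:", binarize([10, 200], t))
```

In practice an image library would supply the pixel data and a faster implementation. The sketch only shows why a single global threshold can separate ink from page when the histogram has two clear peaks, and why unevenly lit scans may call for the adaptive methods mentioned above.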

## Using This App

1. Upload a historical document (image or PDF)
2. Add optional context or special instructions
3. Get detailed, structured OCR results with historical context

## Supported Document Types

- Handwritten letters and correspondence
- Historical recipes and cookbooks
- Travel accounts and exploration logs
- Scientific papers and experiments
- Legal documents and certificates
- Historical newspaper articles
- General historical texts

## Technical Details

Built with Streamlit and Mistral AI's OCR and large language model capabilities.
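
As a rough sketch of how an image might be sent to Mistral's OCR endpoint, the example below encodes the image as a base64 data URL and passes it to the `mistralai` Python client. It assumes the v1 `mistralai` SDK and the `mistral-ocr-latest` model name. The helper names `build_image_document` and `run_ocr` are hypothetical and are not taken from this app's source.

```python
import base64


def build_image_document(image_bytes, mime_type="image/jpeg"):
    """Wrap raw image bytes as a data-URL document for the OCR request."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": f"data:{mime_type};base64,{encoded}",
    }


def run_ocr(image_bytes, api_key, model="mistral-ocr-latest"):
    """Send one image to Mistral OCR and return the recognized text as Markdown.

    Assumes the `mistralai` package (v1 SDK) is installed; it is imported
    lazily so the helper above stays usable without it.
    """
    from mistralai import Mistral

    client = Mistral(api_key=api_key)
    response = client.ocr.process(
        model=model,
        document=build_image_document(image_bytes),
    )
    # Each page of the response carries its text as Markdown.
    return "\n\n".join(page.markdown for page in response.pages)


if __name__ == "__main__":
    # Show the request payload for a tiny placeholder byte string.
    print(build_image_document(b"abc", "image/png"))
```

Calling `run_ocr` requires a valid API key and network access. Any custom prompting or structured-output step would be a separate language-model call layered on top of the OCR text.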

---

Created by Zach Muhlbauer, CUNY Graduate Center