OCRInsight
title: OCRInsight emoji: "🧾" colorFrom: "0A84FF" colorTo: "7C3AED" sdk: docker sdk_version: "1.0" app_file: streamlit_app.py pinned: false
This Streamlit application allows users to perform OCR (Optical Character Recognition) using multiple open-source OCR engines and optionally process the OCR results using LLMs (Large Language Models). Users can compare the outputs of different OCR models and perform tasks such as summarization or text generation based on the OCR results.
Here the link of comprehensive explanation: (https://medium.com/@alperenclk/ocrinsight-building-a-modular-ocr-and-llm-application-f3d3a1ea7a18)

Features
Multiple OCR Engines Supported:
- EasyOCR
- DocTR
- Tesseract OCR
- PaddleOCR
Optional LLM Processing:
Use models like llama3.1, llama3, gemma2 via Ollama. Perform tasks such as summarization or text generation based on OCR results.
Compare OCR Outputs:
Select multiple OCR models to compare their outputs side by side.
Save Outputs:
Option to save OCR and LLM outputs to text files.
Installation
Prerequisites
- Python 3.7 or higher
- pip package manager
Clone the Repository
git clone https://github.com/Alperenclk/OCRInsight-open-source-OCRs-Plus-LLM.git
cd ocr-llm-app
Create a Virtual Environment (Recommended)
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
Install Required Python Packages
Install the required packages using pip:
pip install -r requirements.txt
Note: The requirements.txt file includes basic dependencies. Depending on the OCR engines and LLM support you want to use, you may need to install additional dependencies as described below.
Install OCR Engine Dependencies
EasyOCR
pip install easyocr
DocTR
pip install python-doctr[torch]
Note: For GPU support, ensure that PyTorch is installed with CUDA support.
Tesseract OCR
Install Tesseract OCR Engine:
Windows:
Download the Tesseract installer from UB Mannheim: https://github.com/UB-Mannheim/tesseract/wiki.
Run the installer and follow the instructions. Note the installation path (e.g., C:\Program Files\Tesseract-OCR\tesseract.exe). Update the pytesseract.pytesseract.tesseract_cmd variable in ocr_engines.py to point to the Tesseract executable.
macOS:
brew install tesseract
Ubuntu/Linux:
sudo apt-get update
sudo apt-get install tesseract-ocr
Install Python Wrapper:
pip install pytesseract
Language Data Files:
Ensure that the language data files for the languages you intend to use are installed. For example, to install Turkish language data on Ubuntu:
sudo apt-get install tesseract-ocr-tur
PaddleOCR
Install PaddlePaddle:
CPU Version:
pip install paddlepaddle
GPU Version:
Refer to the PaddlePaddle Installation Guide for GPU support.
Install PaddleOCR:
pip install paddleocr
Install LLM Dependencies (Optional)
If you want to use the LLM features, install Ollama:
pip install ollama
Note: If you do not wish to use the LLM features, you can skip this step. The application will work in OCR-only mode.
Usage
Run the Application
streamlit run app.py
Application Interface
Settings Sidebar:
Select Device: Choose between CPU and GPU (if available).
Language Selection: Choose the language for OCR processing.
Select OCR Models: Choose one or more OCR models to use.
LLM Model Selection: Choose an LLM model or select "Only OCR Mode" to disable LLM features.
LLM Command and Task Type: Enter commands and select tasks if LLM is enabled.
Save Outputs: Option to save OCR and LLM outputs to files.
Main Area:
File Upload: Upload a PDF or image file for OCR processing.
OCR Results: View the OCR results from the selected models.
LLM Processing: Perform LLM processing on the combined OCR text (if enabled).
Notes
Language Support:
Ensure that the necessary language data files or models are installed for each OCR engine you intend to use. Some OCR engines may require specific language codes or configurations.
GPU Support:
For GPU acceleration, ensure that your hardware supports it and that the necessary libraries (e.g., CUDA) are installed. Not all OCR engines support GPU acceleration.
Performance:
Processing multiple OCR engines simultaneously may consume significant resources. Processing large files or images may take longer. Modular Code Structure The application is structured modularly to enhance maintainability and extensibility.
app.py: The main Streamlit application script.
ocr_engines.py: Contains functions to initialize and perform OCR using different engines.
llm_processor.py: Contains functions for LLM processing (optional). Modifying the Code
Adding a New OCR Engine:
Create a new function in ocr_engines.py to initialize and perform OCR with the new engine. Update initialize_ocr_models and perform_ocr functions accordingly.
Modifying LLM Functionality:
Update llm_processor.py with new LLM models or processing methods.
Disabling LLM Features:
If you don't want to use LLM features, you don't need to install ollama. The application will automatically disable LLM features if ollama is not installed.
Troubleshooting
Import Errors:
If you encounter import errors, ensure that all required packages are installed. For optional features (like LLM), missing packages will disable those features without affecting the rest of the application.
Tesseract Not Found:
Ensure that the Tesseract executable path is correctly set in ocr_engines.py. Verify that Tesseract is installed and the path is correct.
Language Data Missing:
Install the necessary language data files for the OCR engines. Contributing Contributions are welcome! Please fork the repository and submit a pull request for any improvements or new features.
License
This project is licensed under the MIT License.