Spaces:

Beijuka
/

ocr

Configuration error

App Files Files Community

ocr / README.md

Beijuka

Upload folder using huggingface_hub

0f922c9 verified about 2 months ago

preview code

raw

history blame contribute delete

6.28 kB

OCRInsight

title: OCRInsight emoji: "🧾" colorFrom: "0A84FF" colorTo: "7C3AED" sdk: docker sdk_version: "1.0" app_file: streamlit_app.py pinned: false

This Streamlit application allows users to perform OCR (Optical Character Recognition) using multiple open-source OCR engines and optionally process the OCR results using LLMs (Large Language Models). Users can compare the outputs of different OCR models and perform tasks such as summarization or text generation based on the OCR results.

Here the link of comprehensive explanation: (https://medium.com/@alperenclk/ocrinsight-building-a-modular-ocr-and-llm-application-f3d3a1ea7a18)

Features

Multiple OCR Engines Supported:

EasyOCR
DocTR
Tesseract OCR
PaddleOCR

Optional LLM Processing:

Use models like llama3.1, llama3, gemma2 via Ollama. Perform tasks such as summarization or text generation based on OCR results.

Compare OCR Outputs:

Select multiple OCR models to compare their outputs side by side.

Save Outputs:

Option to save OCR and LLM outputs to text files.

Installation

Prerequisites

Python 3.7 or higher
pip package manager

Clone the Repository

git clone https://github.com/Alperenclk/OCRInsight-open-source-OCRs-Plus-LLM.git
cd ocr-llm-app

Create a Virtual Environment (Recommended)

python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

Install Required Python Packages

Install the required packages using pip:

pip install -r requirements.txt

Note: The requirements.txt file includes basic dependencies. Depending on the OCR engines and LLM support you want to use, you may need to install additional dependencies as described below.

Install OCR Engine Dependencies

EasyOCR

pip install easyocr

DocTR

pip install python-doctr[torch]

Note: For GPU support, ensure that PyTorch is installed with CUDA support.

Tesseract OCR

Install Tesseract OCR Engine:

Windows:

Download the Tesseract installer from UB Mannheim: https://github.com/UB-Mannheim/tesseract/wiki.

Run the installer and follow the instructions. Note the installation path (e.g., C:\Program Files\Tesseract-OCR\tesseract.exe). Update the pytesseract.pytesseract.tesseract_cmd variable in ocr_engines.py to point to the Tesseract executable.

macOS:

brew install tesseract

Ubuntu/Linux:

sudo apt-get update
sudo apt-get install tesseract-ocr

Install Python Wrapper:

pip install pytesseract

Language Data Files:

Ensure that the language data files for the languages you intend to use are installed. For example, to install Turkish language data on Ubuntu:

sudo apt-get install tesseract-ocr-tur

PaddleOCR

Install PaddlePaddle:

CPU Version:

pip install paddlepaddle

GPU Version:

Refer to the PaddlePaddle Installation Guide for GPU support.

Install PaddleOCR:

pip install paddleocr

Install LLM Dependencies (Optional)

If you want to use the LLM features, install Ollama:

pip install ollama

Note: If you do not wish to use the LLM features, you can skip this step. The application will work in OCR-only mode.

Usage

Run the Application

streamlit run app.py

Application Interface

Settings Sidebar:

Select Device: Choose between CPU and GPU (if available).

Language Selection: Choose the language for OCR processing.

Select OCR Models: Choose one or more OCR models to use.

LLM Model Selection: Choose an LLM model or select "Only OCR Mode" to disable LLM features.

LLM Command and Task Type: Enter commands and select tasks if LLM is enabled.

Save Outputs: Option to save OCR and LLM outputs to files.

Main Area:

File Upload: Upload a PDF or image file for OCR processing.

OCR Results: View the OCR results from the selected models.

LLM Processing: Perform LLM processing on the combined OCR text (if enabled).

Notes

Language Support:

Ensure that the necessary language data files or models are installed for each OCR engine you intend to use. Some OCR engines may require specific language codes or configurations.

GPU Support:

For GPU acceleration, ensure that your hardware supports it and that the necessary libraries (e.g., CUDA) are installed. Not all OCR engines support GPU acceleration.

Performance:

Processing multiple OCR engines simultaneously may consume significant resources. Processing large files or images may take longer. Modular Code Structure The application is structured modularly to enhance maintainability and extensibility.

app.py: The main Streamlit application script.

ocr_engines.py: Contains functions to initialize and perform OCR using different engines.

llm_processor.py: Contains functions for LLM processing (optional). Modifying the Code

Adding a New OCR Engine:

Create a new function in ocr_engines.py to initialize and perform OCR with the new engine. Update initialize_ocr_models and perform_ocr functions accordingly.

Modifying LLM Functionality:

Update llm_processor.py with new LLM models or processing methods.

Disabling LLM Features:

If you don't want to use LLM features, you don't need to install ollama. The application will automatically disable LLM features if ollama is not installed.

Troubleshooting

Import Errors:

If you encounter import errors, ensure that all required packages are installed. For optional features (like LLM), missing packages will disable those features without affecting the rest of the application.

Tesseract Not Found:

Ensure that the Tesseract executable path is correctly set in ocr_engines.py. Verify that Tesseract is installed and the path is correct.

Language Data Missing:

Install the necessary language data files for the OCR engines. Contributing Contributions are welcome! Please fork the repository and submit a pull request for any improvements or new features.

License

This project is licensed under the MIT License.