File size: 6,282 Bytes
0f922c9 e55c550 0f922c9 e55c550 0f922c9 e55c550 0f922c9 e55c550 0f922c9 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 |
# OCRInsight
---
title: OCRInsight
emoji: "🧾"
colorFrom: "0A84FF"
colorTo: "7C3AED"
sdk: docker
sdk_version: "1.0"
app_file: streamlit_app.py
pinned: false
---
This Streamlit application allows users to perform OCR (Optical Character Recognition) using multiple open-source OCR engines and optionally process the OCR results using LLMs (Large Language Models). Users can compare the outputs of different OCR models and perform tasks such as summarization or text generation based on the OCR results.
Here the link of comprehensive explanation: (https://medium.com/@alperenclk/ocrinsight-building-a-modular-ocr-and-llm-application-f3d3a1ea7a18)

## Features
### Multiple OCR Engines Supported:
* EasyOCR
* DocTR
* Tesseract OCR
* PaddleOCR
#### Optional LLM Processing:
Use models like llama3.1, llama3, gemma2 via Ollama.
Perform tasks such as summarization or text generation based on OCR results.
#### Compare OCR Outputs:
Select multiple OCR models to compare their outputs side by side.
#### Save Outputs:
Option to save OCR and LLM outputs to text files.
## Installation
### Prerequisites
- Python 3.7 or higher
- pip package manager
### Clone the Repository
```bash
git clone https://github.com/Alperenclk/OCRInsight-open-source-OCRs-Plus-LLM.git
cd ocr-llm-app
```
### Create a Virtual Environment (Recommended)
```bash
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
```
### Install Required Python Packages
#### Install the required packages using pip:
```bash
pip install -r requirements.txt
```
Note: The requirements.txt file includes basic dependencies. Depending on the OCR engines and LLM support you want to use, you may need to install additional dependencies as described below.
## Install OCR Engine Dependencies
### EasyOCR
```bash
pip install easyocr
```
### DocTR
```bash
pip install python-doctr[torch]
```
Note: For GPU support, ensure that PyTorch is installed with CUDA support.
### Tesseract OCR
Install Tesseract OCR Engine:
#### Windows:
Download the Tesseract installer from UB Mannheim: <https://github.com/UB-Mannheim/tesseract/wiki>.
**Run the installer and follow the instructions.
Note the installation path (e.g., C:\Program Files\Tesseract-OCR\tesseract.exe).
Update the pytesseract.pytesseract.tesseract_cmd variable in ocr_engines.py to point to the Tesseract executable.**
#### macOS:
```bash
brew install tesseract
```
#### Ubuntu/Linux:
``` bash
sudo apt-get update
sudo apt-get install tesseract-ocr
```
##### Install Python Wrapper:
```bash
pip install pytesseract
```
##### Language Data Files:
Ensure that the language data files for the languages you intend to use are installed. For example, to install Turkish language data on Ubuntu:
```bash
sudo apt-get install tesseract-ocr-tur
```
### PaddleOCR
#### Install PaddlePaddle:
#### CPU Version:
```bash
pip install paddlepaddle
```
#### GPU Version:
Refer to the PaddlePaddle Installation Guide for GPU support.
### Install PaddleOCR:
```bash
pip install paddleocr
```
## Install LLM Dependencies (Optional)
If you want to use the LLM features, install **Ollama**:
```bash
pip install ollama
```
Note: If you do not wish to use the LLM features, **you can skip this step**. The application will work in OCR-only mode.
## Usage
### Run the Application
```bash
streamlit run app.py
```
## Application Interface
### Settings Sidebar:
**Select Device:** Choose between CPU and GPU (if available).
**Language Selection:** Choose the language for OCR processing.
**Select OCR Models:** Choose one or more OCR models to use.
**LLM Model Selection:** Choose an LLM model or select "Only OCR Mode" to disable LLM features.
**LLM Command and Task Type:** Enter commands and select tasks if LLM is enabled.
**Save Outputs:** Option to save OCR and LLM outputs to files.
### Main Area:
**File Upload:** Upload a PDF or image file for OCR processing.
**OCR Results:** View the OCR results from the selected models.
**LLM Processing:** Perform LLM processing on the combined OCR text (if enabled).
## Notes
**Language Support:**
Ensure that the necessary language data files or models are installed for each OCR engine you intend to use.
Some OCR engines may require specific language codes or configurations.
**GPU Support:**
For GPU acceleration, ensure that your hardware supports it and that the necessary libraries (e.g., CUDA) are installed.
Not all OCR engines support GPU acceleration.
**Performance:**
Processing multiple OCR engines simultaneously may consume significant resources.
Processing large files or images may take longer.
Modular Code Structure
The application is structured modularly to enhance maintainability and extensibility.
**app.py:** The main Streamlit application script.
**ocr_engines.py:** Contains functions to initialize and perform OCR using different engines.
**llm_processor.py:** Contains functions for LLM processing (optional).
Modifying the Code
#### **Adding a New OCR Engine:**
Create a new function in ocr_engines.py to initialize and perform OCR with the new engine.
Update initialize_ocr_models and perform_ocr functions accordingly.
**Modifying LLM Functionality:**
Update llm_processor.py with new LLM models or processing methods.
**Disabling LLM Features:**
If you don't want to use LLM features, you don't need to install ollama.
The application will automatically disable LLM features if ollama is not installed.
## Troubleshooting
**Import Errors:**
If you encounter import errors, ensure that all required packages are installed.
For optional features (like LLM), missing packages will disable those features without affecting the rest of the application.
**Tesseract Not Found:**
Ensure that the Tesseract executable path is correctly set in ocr_engines.py.
Verify that Tesseract is installed and the path is correct.
**Language Data Missing:**
Install the necessary language data files for the OCR engines.
Contributing
Contributions are welcome! Please fork the repository and submit a pull request for any improvements or new features.
### License
This project is licensed under the **MIT** License.
# OCR
|