.. _algorithm_formula_detection:

===========================
Formula Detection Algorithm
===========================

Introduction
====================

Formula detection identifies the positions of all formulas, both inline and block, in a given input image.

.. note::

   Formula detection is technically a subtask of layout detection. However, because of its complexity, we recommend decoupling it by using a dedicated formula detection model. This approach typically makes data annotation easier and improves detection performance.

Model Usage
====================

Once the environment is set up, run the formula detection script ``scripts/formula_detection.py``:

.. code:: shell

   $ python scripts/formula_detection.py --config configs/formula_detection.yaml

Model Configuration
--------------------

.. code:: yaml

   inputs: assets/demo/formula_detection
   outputs: outputs/formula_detection
   tasks:
     formula_detection:
       model: formula_detection_yolo
       model_config:
         img_size: 1280
         conf_thres: 0.25
         iou_thres: 0.45
         batch_size: 1
         model_path: models/MFD/yolov8/weights.pt
         visualize: True

- ``inputs`` / ``outputs``: The input file path and the directory for visualization output, respectively.
- ``tasks``: The tasks to run. This configuration contains only a formula detection task.
- ``model``: The model type. Currently, only the YOLO formula detection model is available.
- ``model_config``: The model configuration.

  - ``img_size``: Target size of the image's longer side; the shorter side is scaled proportionally.
  - ``conf_thres``: Confidence threshold; only detections scoring above this value are kept.
  - ``iou_thres``: IoU threshold; a detection that overlaps a higher-confidence detection by more than this value is removed.
  - ``batch_size``: Number of images inferred simultaneously. A larger batch size generally gives faster inference, and a GPU with more memory allows a larger batch size.
  - ``model_path``: Path to the model weights.
  - ``visualize``: Whether to visualize the model results. Visualized results are saved in the ``outputs`` directory.
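
As a rough illustration of what ``img_size``, ``conf_thres``, and ``iou_thres`` control, the following pure-Python sketch reproduces the arithmetic described above. The function names ``resized_shape``, ``iou``, and ``filter_detections`` are hypothetical and are not part of PDF-Extract-Kit; the YOLO model performs these steps internally and may differ in detail (for example, by padding the resized image).

```python
def resized_shape(width, height, img_size=1280):
    """Scale so the longer side equals img_size, keeping the aspect ratio."""
    scale = img_size / max(width, height)
    return round(width * scale), round(height * scale)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf_thres=0.25, iou_thres=0.45):
    """Filter (box, score) pairs by confidence, then by overlap.

    Detections at or below conf_thres are dropped. The rest are visited
    from highest to lowest score, and a detection is kept only if its IoU
    with every already-kept detection does not exceed iou_thres.
    """
    candidates = sorted(
        (d for d in dets if d[1] > conf_thres),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score))
    return kept


# Illustrative data only: two heavily overlapping boxes, one separate box,
# and one low-confidence box.
example = [
    ((0, 0, 100, 100), 0.9),
    ((10, 10, 110, 110), 0.8),
    ((200, 200, 300, 300), 0.6),
    ((0, 0, 50, 50), 0.1),
]
kept = filter_detections(example)
```

With the default thresholds, the second box is removed because it overlaps the first with an IoU of about 0.68, and the fourth is removed because its score is below ``conf_thres``. Raising ``iou_thres`` keeps more overlapping boxes; raising ``conf_thres`` keeps fewer low-scoring ones.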

Diverse Input Support
---------------------

The formula detection script in PDF-Extract-Kit accepts several input formats: a single image, a directory of image files, a single PDF file, or a directory of PDF files.

.. note::

   Set the ``inputs`` path in ``configs/formula_detection.yaml`` according to your data format:

   - Single image: ``path/to/image``
   - Image directory: ``path/to/images``
   - Single PDF file: ``path/to/pdf``
   - PDF directory: ``path/to/pdfs``

.. note::

   When using PDF input, change ``predict_images`` to ``predict_pdfs`` in ``formula_detection.py``.

   .. code:: python

      # for image detection
      detection_results = model_formula_detection.predict_images(input_data, result_path)

   Change this to:

   .. code:: python

      # for pdf detection
      detection_results = model_formula_detection.predict_pdfs(input_data, result_path)
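
If you switch between image and PDF inputs often, the choice of method could be derived from the ``inputs`` path instead of editing the script each time. The sketch below is one possible approach, not something the script currently does: ``choose_predict_method`` and ``IMAGE_SUFFIXES`` are hypothetical names, and the set of image extensions is an assumption. Only the method names ``predict_images`` and ``predict_pdfs`` come from the script.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def choose_predict_method(inputs):
    """Return 'predict_pdfs' or 'predict_images' for an inputs path.

    A directory is classified by the files directly inside it;
    anything else is treated as a single file path.
    """
    path = Path(inputs)
    if path.is_dir():
        files = [p for p in path.iterdir() if p.is_file()]
    else:
        files = [path]
    suffixes = {p.suffix.lower() for p in files}
    if suffixes and suffixes <= {".pdf"}:
        return "predict_pdfs"
    if suffixes and suffixes <= IMAGE_SUFFIXES:
        return "predict_images"
    raise ValueError(f"unsupported or mixed inputs: {inputs}")
```

Assuming the variables from the snippets above, the result could then be used as ``getattr(model_formula_detection, choose_predict_method(input_data))(input_data, result_path)``. Mixed or empty directories raise ``ValueError`` rather than guessing.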

Viewing Visualization Results
-----------------------------

When the ``visualize`` option in the config file is set to ``True``, visualization results are saved in the ``outputs/formula_detection`` directory.

.. note::

   Visualization facilitates the analysis of model results. However, for large-scale tasks, it is recommended to disable visualization (set ``visualize`` to ``False``) to reduce memory and disk usage.