	.. _algorithm_formula_detection:

	===========================
	Formula Detection Algorithm
	===========================

	Introduction
	====================

	Formula detection involves identifying the positions of all formulas (including inline and block formulas) in a given input image.

	.. note::

	    Formula detection is technically a subtask of layout detection. However, because it is a complex task in its own right, we recommend decoupling it and using a dedicated formula detection model. This approach typically makes data annotation easier and improves detection performance.

	Model Usage
	====================

	With the environment properly set up, run the formula detection algorithm by executing ``scripts/formula_detection.py``:

	.. code:: shell

	    $ python scripts/formula_detection.py --config configs/formula_detection.yaml

	Model Configuration
	--------------------

	.. code:: yaml

	    inputs: assets/demo/formula_detection
	    outputs: outputs/formula_detection
	    tasks:
	      formula_detection:
	        model: formula_detection_yolo
	        model_config:
	          img_size: 1280
	          conf_thres: 0.25
	          iou_thres: 0.45
	          batch_size: 1
	          model_path: models/MFD/yolov8/weights.pt
	          visualize: True

	- inputs/outputs: Define the input file path and the visualization output directory, respectively.
	- tasks: Define the task type. Currently, only the formula detection task is included.

	  - model: Define the specific model type. Currently, only the YOLO formula detection model is available.
	  - model_config: Define the model configuration.

	    - img_size: Define the size of the image's longer side; the shorter side is scaled proportionally.
	    - conf_thres: Define the confidence threshold; only detections with a confidence score above this threshold are kept.
	    - iou_thres: Define the IoU threshold used to remove duplicate detections; boxes that overlap an already kept box by more than this value are discarded.
	    - batch_size: Define the batch size, that is, the number of images inferred simultaneously. Generally, a larger batch size gives faster inference, and a GPU with more memory allows a larger batch size.
	    - model_path: Path to the model weights.
	    - visualize: Whether to visualize the model results. Visualized results are saved in the ``outputs`` directory.
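
	To make the roles of ``img_size``, ``conf_thres``, and ``iou_thres`` concrete, the following is a minimal, dependency-free sketch of how such parameters are commonly applied. It is not the code used by the YOLO model in PDF-Extract-Kit; the function names ``scaled_size``, ``iou``, and ``filter_detections`` are hypothetical, and the overlap removal shown here is a plain greedy non-maximum suppression that ignores class labels.

	.. code:: python

	    def scaled_size(width, height, img_size=1280):
	        """Scale so the longer side equals img_size, keeping the aspect ratio."""
	        scale = img_size / max(width, height)
	        return round(width * scale), round(height * scale)


	    def iou(a, b):
	        """Intersection over union of two (x1, y1, x2, y2) boxes."""
	        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
	        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
	        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
	        area_a = (a[2] - a[0]) * (a[3] - a[1])
	        area_b = (b[2] - b[0]) * (b[3] - b[1])
	        union = area_a + area_b - inter
	        return inter / union if union > 0 else 0.0


	    def filter_detections(detections, conf_thres=0.25, iou_thres=0.45):
	        """Keep confident boxes, then drop boxes overlapping a higher-scored one.

	        detections: list of (box, score) pairs, with box = (x1, y1, x2, y2).
	        """
	        # Step 1: confidence filter (conf_thres).
	        candidates = sorted(
	            (d for d in detections if d[1] > conf_thres),
	            key=lambda d: d[1],
	            reverse=True,
	        )
	        # Step 2: greedy suppression of overlapping boxes (iou_thres).
	        kept = []
	        for box, score in candidates:
	            if all(iou(box, kept_box) <= iou_thres for kept_box, _ in kept):
	                kept.append((box, score))
	        return kept

	With the default values from the configuration above, a 2000 x 1000 page would be resized to 1280 x 640, and of two heavily overlapping formula boxes only the higher-scoring one would survive.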

	Diverse Input Support
	---------------------

	The formula detection script in PDF-Extract-Kit supports several input types: a single image, a directory of image files, a single PDF file, and a directory of PDF files.

	.. note::

	    Modify the ``inputs`` path in ``configs/formula_detection.yaml`` according to your actual data format:

	    - Single image: path/to/image
	    - Image directory: path/to/images
	    - Single PDF file: path/to/pdf
	    - PDF directory: path/to/pdfs

	.. note::

	    When using a PDF as input, you need to change ``predict_images`` to ``predict_pdfs`` in ``formula_detection.py``.

	    .. code:: python

	        # for image detection
	        detection_results = model_formula_detection.predict_images(input_data, result_path)

	    Change to:

	    .. code:: python

	        # for pdf detection
	        detection_results = model_formula_detection.predict_pdfs(input_data, result_path)
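
	Instead of editing the script by hand for each input type, the choice between the two methods could be derived from the ``inputs`` path. The sketch below is only an illustration: ``pick_predict_method`` and ``IMAGE_SUFFIXES`` are hypothetical names that do not exist in PDF-Extract-Kit, and the set of image extensions is an assumption.

	.. code:: python

	    from pathlib import Path

	    # Assumed image extensions; adjust to the formats you actually use.
	    IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


	    def pick_predict_method(inputs):
	        """Return "predict_pdfs" or "predict_images" for an ``inputs`` path."""
	        path = Path(inputs)
	        # For a directory, look at the files it contains; otherwise at the path itself.
	        if path.is_dir():
	            files = [p for p in path.iterdir() if p.is_file()]
	        else:
	            files = [path]
	        suffixes = {p.suffix.lower() for p in files}
	        if suffixes and suffixes <= {".pdf"}:
	            return "predict_pdfs"
	        if suffixes and suffixes <= IMAGE_SUFFIXES:
	            return "predict_images"
	        raise ValueError(f"Unsupported or mixed input types: {sorted(suffixes)}")

	Inside ``formula_detection.py``, the returned name could then be resolved with ``getattr(model_formula_detection, pick_predict_method(input_data))`` and called with ``(input_data, result_path)`` as in the snippets above.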


	Viewing Visualization Results
	-----------------------------

	When the ``visualize`` option in the config file is set to ``True``, visualization results will be saved in the ``outputs/formula_detection`` directory.

	.. note::

	    Visualization facilitates the analysis of model results. However, for large-scale tasks, we recommend disabling visualization (set ``visualize`` to ``False``) to reduce memory and disk usage.