YOLOv8s: SEC IPO Filing Image Classifier

A fine-tuned YOLOv8s model trained to classify images extracted from U.S. IPO registration statements (S-1 and F-1 filings) on SEC EDGAR. This model serves as the initial classification stage in the pipeline used to construct the gtfintechlab/ipo-images dataset.


Classes

The model classifies images into 5 categories:

Label        Description
chart        Bar charts, line charts, pie charts, org charts, flow charts, etc.
logo         Company logos and branding marks
map          Geographic maps
infographic  Composite visuals combining data, icons, and text
other        Decorative images, photographs, signatures, and other visuals

Usage

Install dependencies

pip install ultralytics

Run inference

from ultralytics import YOLO

model = YOLO("<path/to/model.pt>")

# Single image
results = model("path/to/image.png")
print(results[0].probs.top1)        # top class index
print(results[0].names)             # class name mapping

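# With a confidence threshold: `conf` filters detection boxes, but classification
# results always carry all class probabilities, so compare the top-1 score directly
result = model("path/to/image.png")[0]
if float(result.probs.top1conf) >= 0.5:
    print(result.names[result.probs.top1])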

# Batch inference
results = model(["image1.png", "image2.png", "image3.png"])
for r in results:
    print(r.probs.top1, r.names[r.probs.top1])  # index and label of the top class
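To apply a probability cutoff to every result in a batch, the check can be pulled into a small helper. This is a minimal sketch, not part of Ultralytics: `label_or_none` is a hypothetical name, and it assumes you pass the per-class probability vector (for example `r.probs.data.tolist()`) and the index-to-label mapping `r.names`. Keeping it in plain Python makes it testable without loading the model.

```python
from typing import Mapping, Optional, Sequence


def label_or_none(
    probs: Sequence[float],
    names: Mapping[int, str],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the top-1 label if its probability is >= threshold, else None.

    probs: per-class probabilities, e.g. r.probs.data.tolist() (assumption).
    names: index -> label mapping, e.g. r.names.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    # Index of the highest probability (first one wins on ties).
    top1 = max(range(len(probs)), key=probs.__getitem__)
    # Low-confidence images return None so they can be routed to manual review.
    return names[top1] if probs[top1] >= threshold else None
```

Inside the batch loop above, this would be called as `label_or_none(r.probs.data.tolist(), r.names, threshold=0.5)`; returning `None` rather than a label makes uncertain images easy to collect separately.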

Get the predicted label as a string

result = model("image.png")[0]
label = result.names[result.probs.top1]
print(label)  # e.g. "chart"
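Some classes (for example chart and infographic) can be visually close, so it can help to look beyond the single top label. Ultralytics already exposes `probs.top5` and `probs.top5conf` for this; the sketch below does the same ranking in plain Python over a probability list so that `k` is adjustable and the logic is easy to test. `top_k_labels` is a hypothetical helper, and feeding it `result.probs.data.tolist()` and `result.names` is an assumption about how you would wire it up.

```python
from typing import List, Mapping, Sequence, Tuple


def top_k_labels(
    probs: Sequence[float],
    names: Mapping[int, str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to k (label, probability) pairs, highest probability first."""
    # sorted() is stable, so tied classes keep their index order.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(names[i], float(probs[i])) for i in ranked[:k]]
```

For instance, `top_k_labels(result.probs.data.tolist(), result.names, k=2)` would list the two most likely labels with their scores.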

Relation to the IPO Image Dataset

This model is the first stage of the classification pipeline used to build the gtfintechlab/ipo-images dataset, a large-scale collection of 76,000+ labeled images from SEC IPO filings spanning 1994–2026.

The pipeline works as follows:

  1. This model generates an initial prediction (initial_yolo_prediction) for each image
  2. An ensemble of 8 Vision-Language Models verifies the prediction, producing a consensus score (llm_yolo_verification_score) and per-model votes (llm_yolo_verification_votes)
  3. The final label in the dataset reflects this verified output
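This card does not specify how the consensus score is computed or how the per-model votes are encoded. The sketch below only illustrates the idea of a verification step, assuming each of the 8 votes is a boolean (agrees / disagrees with the YOLO prediction) and that the score is the fraction of agreeing models. `agreement_score`, `is_verified`, the model names, and the strict-majority rule are all hypothetical, not the dataset's actual logic; check the `llm_yolo_verification_score` and `llm_yolo_verification_votes` columns in gtfintechlab/ipo-images for the real format.

```python
from typing import Mapping


def agreement_score(votes: Mapping[str, bool]) -> float:
    """Fraction of verifier models that agree with the YOLO prediction.

    ASSUMPTION: votes maps a model name to True (agree) or False (disagree).
    """
    if not votes:
        raise ValueError("votes must be non-empty")
    return sum(votes.values()) / len(votes)


def is_verified(votes: Mapping[str, bool], min_agreement: float = 0.5) -> bool:
    """HYPOTHETICAL rule: accept the YOLO label only on a strict majority."""
    return agreement_score(votes) > min_agreement
```

Under these assumptions, 6 of 8 agreeing models gives a score of 0.75 and the prediction is accepted, while a 4-4 split gives 0.5 and is not.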

Citation

If you use this model in your work, please cite:

@misc{galarnyk2026ipomine,
  title  = {IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents},
  author = {Galarnyk, Michael and Lohani, Siddharth and Nandi, Sagnik and Patel, Aman and Kannan, Vidhyakshaya and Banerjee, Prasun and Routu, Rutwik and Ye, Liqin and Hiray, Arnav and Somani, Siddhartha and Chava, Sudheer},
  year   = {2026},
  url    = {https://huggingface.co/datasets/gtfintechlab/ipo-images},
  note   = {Preprint/Working Paper}
}

Model tree for gtfintechlab/ipomine-yolov8-classifier

Base model: Ultralytics/YOLOv8 (fine-tuned)
