GLiNER2 Dataset Mention Extractor

LoRA adapter for GLiNER2, fine-tuned to extract structured dataset mentions from research documents.

Task

Given a document passage, the model extracts:

  • Entity fields: dataset_name, acronym, producer, geography, description, etc.
  • Classifications: dataset_tag (named/descriptive/vague), usage_context (primary/supporting/background), is_used (True/False)

Training

  • Base model: fastino/gliner2-base-v1
  • Method: LoRA (r=16, alpha=32)
  • Data: 1,197 synthetic training examples
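
For context on the LoRA settings: in the standard LoRA formulation, the low-rank update is scaled by alpha / r, and adapting a weight matrix of shape d_out × d_in adds r × (d_in + d_out) trainable parameters. The sketch below works through that arithmetic for r=16 and alpha=32. The 768 × 768 layer size is an illustrative assumption, not a documented property of the base model, and the helper `lora_stats` is hypothetical.

```python
from typing import Dict


def lora_stats(d_in: int, d_out: int, r: int = 16, alpha: int = 32) -> Dict[str, float]:
    """Return the LoRA scaling factor and parameter counts for one weight matrix."""
    scaling = alpha / r                 # multiplier applied to the low-rank update B @ A
    lora_params = r * (d_in + d_out)    # A is (r x d_in), B is (d_out x r)
    full_params = d_in * d_out          # parameters in the frozen base matrix
    return {
        "scaling": scaling,
        "lora_params": lora_params,
        "full_params": full_params,
        "trainable_fraction": lora_params / full_params,
    }


# Assumed layer size for illustration only.
stats = lora_stats(d_in=768, d_out=768)
```

With these settings the update is scaled by 2, and the adapter trains only a small fraction of the weights in each adapted matrix.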

Usage

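from gliner2 import GLiNER2

# Load the base model, then attach the fine-tuned LoRA adapter.
extractor = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
extractor.load_adapter("rafmacalaba/gliner2-datause-v1")

# One "dataset_mention" structure: free-text entity fields plus
# classification fields restricted to a fixed set of choices.
schema = (
    extractor.create_schema()
    .structure("dataset_mention")
        .field("dataset_name", dtype="str")
        .field("acronym", dtype="str")
        .field("producer", dtype="str")
        .field("geography", dtype="str")
        .field("dataset_tag", dtype="str", choices=["named", "descriptive", "vague"])
        .field("usage_context", dtype="str", choices=["primary", "supporting", "background"])
        .field("is_used", dtype="str", choices=["True", "False"])
)

# Illustrative passage only; replace with your own document text.
text = "We use the 2018 Demographic and Health Survey (DHS) for Nigeria to estimate child mortality."

results = extractor.extract(text, schema)
dataset_mentions = results["dataset_mention"]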
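
Extracted mentions usually need filtering before downstream use. The sketch below assumes, without confirmation from this model card, that `results["dataset_mention"]` is a list of dicts keyed by the schema's field names with plain string values. The helper `select_used_mentions` and the sample records are hypothetical and are not real model output.

```python
from typing import Dict, List

Mention = Dict[str, str]


def select_used_mentions(mentions: List[Mention], tag: str = "named") -> List[Mention]:
    """Keep mentions flagged as used whose dataset_tag equals `tag`."""
    return [
        m for m in mentions
        # is_used is declared as a string choice in the schema, so compare to "True".
        if m.get("is_used") == "True" and m.get("dataset_tag") == tag
    ]


# Hypothetical records shaped like the schema (assumed output format).
sample_mentions = [
    {
        "dataset_name": "Demographic and Health Survey",
        "acronym": "DHS",
        "dataset_tag": "named",
        "usage_context": "primary",
        "is_used": "True",
    },
    {
        "dataset_name": "household survey data",
        "dataset_tag": "descriptive",
        "usage_context": "background",
        "is_used": "False",
    },
]

used_named = select_used_mentions(sample_mentions)
```

If the actual output nests values differently (for example, with confidence scores or spans), the comparison inside the list comprehension would need to be adapted.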