esg_emissions_ner

Overview

This model is a fine-tuned RoBERTa-base model for Named Entity Recognition (NER) in corporate sustainability reports. It extracts greenhouse gas (GHG) emission figures for Scope 1, 2, and 3, along with their associated units and target net-zero years.

Model Architecture

The model uses a RoBERTa-base backbone with a token classification head.

Tokenization: Uses RoBERTa's byte-level Byte-Pair Encoding (BPE), which splits unfamiliar technical and corporate terms into subword units instead of mapping them to an unknown token.
Contextual Embeddings: 768-dimensional vectors that capture the relationship between metrics and their qualitative context (e.g., distinguishing between a "target" and a "current" value).
Head: A dropout layer (0.1) followed by a linear classification layer mapping to 10 BIO-tagged labels.
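
To illustrate how BIO-tagged output turns into usable entities, here is a minimal sketch of span decoding. The label names (`SCOPE1`, `UNIT`) are hypothetical, since this card does not list the 10 labels, and the function operates on whole words rather than on the model's subword tokens.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open span (if any) and reset the accumulator.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag extends the open span of the same type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span:
            # close whatever is open and skip this token.
            flush()
    flush()
    return spans


# Hypothetical label names, used for illustration only.
tokens = ["Scope", "1", "emissions", "were", "12,400",
          "metric", "tonnes", "of", "CO2e"]
tags = ["O", "O", "O", "O", "B-SCOPE1",
        "B-UNIT", "I-UNIT", "I-UNIT", "I-UNIT"]
print(decode_bio(tokens, tags))
# [('SCOPE1', '12,400'), ('UNIT', 'metric tonnes of CO2e')]
```

Dropping a stray `I-` tag that has no preceding `B-` is one possible policy. Another reasonable choice is to treat it as the start of a new span.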

Intended Use

Automated ESG Auditing: Extracting data from thousands of PDF annual reports to populate sustainability databases.
Investment Screening: Helping analysts verify whether a company's reported emissions align with its public climate commitments.
Regulatory Compliance: Streamlining the reporting process for CSRD (Corporate Sustainability Reporting Directive) requirements.
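
For uses like those above, inference could be sketched with the Hugging Face `transformers` token-classification pipeline. This is an illustration under stated assumptions, not the author's documented workflow: `MODEL_ID` is a placeholder because the card does not state the hosting namespace, the 0.8 score threshold is arbitrary, and `load_tagger` and `keep_confident` are hypothetical helper names.

```python
from typing import Any, Callable, Dict, List

# Placeholder: the hosting namespace is not stated in this card.
MODEL_ID = "your-namespace/esg_emissions_ner"


def load_tagger(model_id: str = MODEL_ID) -> Callable[[str], List[Dict[str, Any]]]:
    """Build a token-classification pipeline.

    Requires the `transformers` package and a backend such as PyTorch.
    """
    # Imported lazily so keep_confident() below works without transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges subword pieces into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def keep_confident(
    entities: List[Dict[str, Any]], min_score: float = 0.8
) -> List[Dict[str, Any]]:
    """Drop low-confidence spans; keep label, text, and character offsets.

    The 0.8 default is an assumed threshold, not a tuned value.
    """
    return [
        {
            "label": e["entity_group"],
            "text": e["word"].strip(),
            "start": e["start"],
            "end": e["end"],
        }
        for e in entities
        if e["score"] >= min_score
    ]


# Example call (downloads the model, so it is left commented out):
# tagger = load_tagger()
# print(keep_confident(tagger("Scope 1 emissions were 12,400 tonnes of CO2e.")))
```

Keeping the character offsets makes it possible to trace each extracted figure back to its position in the source report, which matters for audit use cases.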

Limitations

Complex Tables: Struggles with data embedded in complex multi-column PDF tables; works best on narrative text or simple lists.
Unit Conversion: The model extracts units (e.g., "metric tonnes of CO2e") but does not perform mathematical conversions between different units.
Language: Currently optimized for English-language reports only; non-English terms may be misclassified.
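
Because the model does not convert units, downstream code has to normalize the extracted values. The following is a minimal post-processing sketch, assuming English-style thousands separators and a small, illustrative set of metric unit spellings. The names `TO_TONNES`, `normalise_unit`, and `to_tonnes` are hypothetical and not part of this model.

```python
import re
from decimal import Decimal

# Multipliers to metric tonnes of CO2e. The spellings are illustrative,
# not an exhaustive list of what appears in real reports.
TO_TONNES = {
    "kg": Decimal("0.001"),
    "t": Decimal("1"),
    "tonnes": Decimal("1"),
    "metric tonnes": Decimal("1"),
    "kilotonnes": Decimal("1000"),
    "megatonnes": Decimal("1000000"),
    "million metric tonnes": Decimal("1000000"),
}


def normalise_unit(unit: str) -> str:
    """Lowercase, collapse whitespace, and strip a trailing 'of CO2e' / 'CO2e'."""
    u = " ".join(unit.lower().split())
    return re.sub(r"\s*(of\s+)?co2e?$", "", u)


def to_tonnes(value_text: str, unit_text: str) -> Decimal:
    """Convert an extracted (value, unit) pair to metric tonnes of CO2e."""
    # Assumes commas are thousands separators, as in English-language reports.
    value = Decimal(value_text.replace(",", ""))
    key = normalise_unit(unit_text)
    if key not in TO_TONNES:
        # Fail loudly instead of guessing at ambiguous units such as "tons".
        raise ValueError(f"unrecognised unit: {unit_text!r}")
    return value * TO_TONNES[key]


print(to_tonnes("12,400", "metric tonnes of CO2e"))
print(to_tonnes("1.2", "million metric tonnes CO2e"))
```

`Decimal` is used instead of `float` so that reported figures are not altered by binary rounding. Unknown units raise an error because silently guessing a multiplier would corrupt the resulting database.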
