	---
	library_name: transformers
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: roberta_nli_ensemble
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# roberta_nli_ensemble

	<!-- Provide a quick summary of what the model is/does. -->

	A fine-tuned RoBERTa model for Natural Language Inference (NLI): given a premise and a hypothesis, it classifies the relationship between the two sentences.


	## Model Details

	### Model Description

	<!-- Provide a longer summary of what this model is. -->

	This model builds on the roberta-base architecture and adds a multi-layer classification head for NLI. It computes average-pooled representations of the premise and hypothesis tokens (identified via `token_type_ids`) and concatenates them before passing the result through additional linear and non-linear layers. The final output is used to classify the sentence pair into one of three classes.
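
The pooling step described above can be sketched in plain Python. This is a minimal illustration, not the model's actual code. It assumes that `token_type_ids` equal to 0 mark premise tokens and 1 mark hypothesis tokens, and that padding is excluded via `attention_mask`. The helper names `masked_mean` and `pooled_pair_features` are hypothetical. In the real model the inputs would be PyTorch tensors, and each token vector would have roberta-base's hidden size of 768, giving a 1536-dimensional concatenated feature.

```python
from typing import List

Vector = List[float]


def masked_mean(hidden: List[Vector], mask: List[int]) -> Vector:
    """Average the token vectors whose mask entry is 1."""
    count = sum(mask)
    if count == 0:
        raise ValueError("mask selects no tokens")
    total = [0.0] * len(hidden[0])
    for vec, keep in zip(hidden, mask):
        if keep:
            for i, value in enumerate(vec):
                total[i] += value
    return [value / count for value in total]


def pooled_pair_features(
    hidden: List[Vector],
    token_type_ids: List[int],
    attention_mask: List[int],
) -> Vector:
    """Concatenate mean-pooled premise and hypothesis representations.

    Assumption: token_type_ids == 0 marks premise tokens, 1 marks hypothesis
    tokens, and attention_mask == 0 marks padding that must be ignored.
    """
    pairs = list(zip(token_type_ids, attention_mask))
    premise_mask = [int(t == 0 and a == 1) for t, a in pairs]
    hypothesis_mask = [int(t == 1 and a == 1) for t, a in pairs]
    # List concatenation: [premise features..., hypothesis features...]
    return masked_mean(hidden, premise_mask) + masked_mean(hidden, hypothesis_mask)


# Toy example: 4 tokens with 2-dimensional vectors; the last token is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
print(pooled_pair_features(hidden, [0, 0, 1, 1], [1, 1, 1, 0]))
```

Pooling each segment separately, rather than using a single `<s>` token vector, gives the head a fixed-size summary of each sentence, which the subsequent linear layers can compare.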

	- Developed by: Dev Soneji and Patrick Mermelstein Lyons
	- Language(s): English
	- Model type: Supervised
	- Model architecture: RoBERTa encoder with a multi-layer classification head
	- Finetuned from model: roberta-base

	### Model Resources

	<!-- Provide links where applicable. -->

	- Repository: [Devtrick/roberta_nli_ensemble](https://huggingface.co/Devtrick/roberta_nli_ensemble)
	- Paper or documentation: [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)

	## Training Details

	### Training Data

	<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->

	The model was trained on the dataset in `train.csv`, which comprises 24K premise-hypothesis pairs. Each pair carries a binary label indicating whether the hypothesis is true given the premise: 0 = hypothesis is false, 1 = hypothesis is true. No further details were provided about the dataset's origin or validity.

	The data was tokenized with [AutoTokenizer](https://huggingface.co/docs/transformers/v4.50.0/en/model_doc/auto#transformers.AutoTokenizer) from the Hugging Face Transformers library. No other pre-processing was done, apart from renaming columns to match the expected format.
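
The column-renaming step can be sketched with the standard library. The card does not give the original or target column names, so the names below (`premise`, `hypothesis`, `label` renamed to `labels`) and the helper `load_pairs` are hypothetical. With the Hugging Face `datasets` library the equivalent would typically be `load_dataset("csv", data_files="train.csv")` followed by `Dataset.rename_column`.

```python
import csv
import io
from typing import Dict, List


def load_pairs(
    csv_text: str, rename: Dict[str, str], label_column: str = "labels"
) -> List[Dict[str, object]]:
    """Read premise-hypothesis rows, rename columns and check binary labels.

    `rename` maps original column names to the names expected downstream;
    columns not listed are kept unchanged. All names here are hypothetical.
    """
    rows: List[Dict[str, object]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        renamed: Dict[str, object] = {rename.get(k, k): v for k, v in row.items()}
        try:
            label = int(renamed[label_column])
        except (KeyError, TypeError, ValueError):
            raise ValueError(f"missing or non-integer label in row {row!r}") from None
        # The card states labels are binary: 0 = false, 1 = true.
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        renamed[label_column] = label
        rows.append(renamed)
    return rows


sample = "premise,hypothesis,label\nA man is sleeping.,A person is asleep.,1\n"
print(load_pairs(sample, {"label": "labels"}))
```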

	### Training Procedure

	<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

	The model was trained in the following way:
	- The model was trained on the [training data](#training-data) after renaming its columns and tokenizing the text.
	- The model was initialised with a custom configuration class, `roBERTaConfig`, which sets the essential parameters. The model itself, `roBERTaClassifier`, extends the pretrained RoBERTa model with multiple linear layers for pooling and classification.
	- Hyperparameters were selected by a separate grid search; the best-performing values are listed under [Training Hyperparameters](#training-hyperparameters).
	- The model was validated on the [test data](#testing-data); see [Results](#results) for the outcome.
	- Checkpoints were saved after each epoch, and finally the best checkpoint was reloaded and pushed to the Hugging Face Hub.


	#### Training Hyperparameters

	<!-- This is a summary of the values of hyperparameters used in training the model. -->

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 128
	- eval_batch_size: 128
	- weight_decay: 0.01
	- seed: 42
	- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 10
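
As a worked example of the `linear` scheduler, the sketch below computes the learning rate after a given number of optimizer steps. It is a standalone re-implementation of the formula used by `transformers.get_linear_schedule_with_warmup`, not code from this project. It assumes no warmup (none is listed above) and takes the total step count as 191 steps per epoch, read from the results table, times the 10 configured epochs.

```python
def linear_lr(
    step: int,
    base_lr: float = 3e-05,
    total_steps: int = 1910,  # assumed: 191 steps/epoch x 10 epochs
    warmup_steps: int = 0,  # assumed: the card lists no warmup
) -> float:
    """Learning rate after `step` steps: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 191, 955, 1910):
    print(f"step {step:4d}: lr = {linear_lr(step):.2e}")
```

Because the decay is scheduled over all 10 epochs, stopping after epoch 5 (step 955) would leave the learning rate at about half its initial value, roughly 1.5e-05, under these assumptions.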

	#### Speeds, Sizes, Times

	<!-- This section provides information about roughly how long it takes to train the model and the size of the resulting model. -->

	- Training time: 12 minutes 17 seconds on the hardware listed under [Hardware](#hardware). Training was configured for 10 epochs, but early stopping ended it after 5.

	- Model size: 126M parameters.
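
The card does not say which metric early stopping monitored or what patience it used. The sketch below illustrates the patience logic, in a simplified form similar in spirit to `transformers.EarlyStoppingCallback` (parameter `early_stopping_patience`), and replays the per-epoch values from the results table. The helper `early_stop_epoch` is hypothetical. Monitoring validation loss with a patience of 3 is one configuration consistent with stopping after epoch 5; monitoring accuracy with a patience of 2 is another.

```python
from typing import List, Optional, Tuple


def early_stop_epoch(
    values: List[float], patience: int, higher_is_better: bool = False
) -> Tuple[Optional[int], int]:
    """Return (epoch at which training stops, best epoch), both 1-indexed.

    Training stops once the monitored value has failed to improve for
    `patience` consecutive epochs; the stop epoch is None if that never happens.
    """
    best = float("-inf") if higher_is_better else float("inf")
    best_epoch, bad_epochs = 0, 0
    for epoch, value in enumerate(values, start=1):
        improved = value > best if higher_is_better else value < best
        if improved:
            best, best_epoch, bad_epochs = value, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return None, best_epoch


val_losses = [0.3383, 0.3045, 0.3255, 0.3963, 0.4849]  # from the results table
accuracies = [0.8685, 0.8778, 0.8854, 0.8829, 0.8848]  # from the results table
print(early_stop_epoch(val_losses, patience=3))
print(early_stop_epoch(accuracies, patience=2, higher_is_better=True))
```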

	## Evaluation

	<!-- This section describes the evaluation protocols and provides the results. -->

	### Testing Data & Metrics

	#### Testing Data

	<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->

	The development set, located in `dev.csv`, was used for validation and effectively serves as the test set. It contains 6K premise-hypothesis pairs in the same format as the training data. No further details were provided about its origin or validity.

	This data was tokenized with [AutoTokenizer](https://huggingface.co/docs/transformers/v4.50.0/en/model_doc/auto#transformers.AutoTokenizer) in the same way as the training data. No other pre-processing was done, apart from renaming columns to match the expected format.

	#### Metrics

	<!-- These are the evaluation metrics being used. -->

	- Accuracy: Proportion of correct predictions.
	- Matthews Correlation Coefficient (MCC): Correlation coefficient between predicted and true labels, ranging from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction).
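
For binary labels, both metrics can be computed from the confusion-matrix counts, with MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a plain-Python illustration. The card does not say which implementation was used during training; `sklearn.metrics.accuracy_score` and `sklearn.metrics.matthews_corrcoef` are common library equivalents.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(accuracy(y_true, y_pred), binary_mcc(y_true, y_pred))
```

MCC is reported alongside accuracy because it accounts for all four confusion-matrix cells, so it stays informative even if the two classes are imbalanced.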

	### Results

	Final results on the evaluation set:

	- Loss: 0.4849
	- Accuracy: 0.8848
	- MCC: 0.7695

	| Training Loss | Epoch | Step | Validation Loss | Accuracy | MCC |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
	| 0.6552 | 1.0 | 191 | 0.3383 | 0.8685 | 0.7377 |
	| 0.2894 | 2.0 | 382 | 0.3045 | 0.8778 | 0.7559 |
	| 0.1891 | 3.0 | 573 | 0.3255 | 0.8854 | 0.7705 |
	| 0.1209 | 4.0 | 764 | 0.3963 | 0.8829 | 0.7657 |
	| 0.0843 | 5.0 | 955 | 0.4849 | 0.8848 | 0.7695 |

	## Technical Specifications

	### Hardware

	The model was trained on a PC with the following specifications:

	- CPU: AMD Ryzen 7 7700X
	- GPU: NVIDIA GeForce RTX 5070 Ti
	- Memory: 32GB DDR5
	- Motherboard: MSI MAG B650 TOMAHAWK WIFI

	### Software

	- Transformers 4.50.2
	- PyTorch 2.8.0.dev20250326+cu128
	- Datasets 3.5.0
	- Tokenizers 0.21.1

	## Bias, Risks, and Limitations

	<!-- This section is meant to convey both technical and sociotechnical limitations. -->

	- The model's performance and biases depend on the data it was trained on. Because nothing is known about the data's origin, these cannot be assessed.
	- Predictions should not be trusted without manual verification: the model can make mistakes.
	- The training data cannot cover every premise-hypothesis combination the model may encounter in real-world use. Additional training and validation data would have helped.

	## Additional Information

	<!-- Any other information that would be useful for other people to know. -->

	- This model was pushed to the Hugging Face Hub with `trainer.push_to_hub()` after training locally.