SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 22.7M parameters (F32)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
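The stack is a BERT encoder followed by mean pooling over the token embeddings and L2 normalization, so cosine similarity between outputs reduces to a dot product. For illustration, a minimal sketch of assembling the same three-module architecture by hand (this carries the base model's weights; for the fine-tuned weights, load the checkpoint directly as shown under Usage):

from sentence_transformers import SentenceTransformer, models

# Transformer -> Pooling -> Normalize, mirroring the architecture above.
word_embedding = models.Transformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    max_seq_length=256,  # matches 'max_seq_length': 256
)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),  # 384
    pooling_mode="mean",  # pooling_mode_mean_tokens: True
)
normalize = models.Normalize()  # unit-length output embeddings
model = SentenceTransformer(modules=[word_embedding, pooling, normalize])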

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/embed-all-MiniLM-L6-v2-squad-9-epochs")
# Run inference
sentences = [
    "Who is often listed amongst the world's worst offenders when it comes to human rights?",
    'The United Nations Organization and its children\'s agency UNICEF withdrew their staff, saying that it wasn\'t sure the event would help its mission of raising awareness of conditions for children and amid concerns that the relay would be used as a propaganda stunt. "It was unconscionable," said a UN official who was briefed on the arguments. North Korea is frequently listed among the world’s worst offenders against human rights.',
    'The United Nations Organization and its children\'s agency UNICEF withdrew their staff, saying that it wasn\'t sure the event would help its mission of raising awareness of conditions for children and amid concerns that the relay would be used as a propaganda stunt. "It was unconscionable," said a UN official who was briefed on the arguments. North Korea is frequently listed among the world’s worst offenders against human rights.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
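Because the Normalize module makes every embedding unit-length, the model is a drop-in for semantic search, where cosine similarity is just a dot product. A short illustrative sketch using the library's util.semantic_search helper (the corpus and query here are made up):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ayushexel/embed-all-MiniLM-L6-v2-squad-9-epochs")

# Hypothetical mini-corpus for illustration.
corpus = [
    "North Korea is frequently listed among the world's worst offenders against human rights.",
    "The Royal Canadian Navy includes 33 warships and submarines deployed in two fleets.",
]
query = "Which country is a notorious human rights offender?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Top-k most similar corpus entries for the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)[0]
for hit in hits:
    print(f"{hit['score']:.3f}", corpus[hit["corpus_id"]])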

Evaluation

Metrics

Triplet

Metric            Value
cosine_accuracy   0.4042

Note that this accuracy sits below the 0.5 chance level, at least in part because many evaluation triplets pair a question with identical positive and negative passages (see the evaluation samples below), which can never be ranked correctly.
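cosine_accuracy is the fraction of (question, context, negative) triplets for which the question embedding is closer, by cosine similarity, to its context than to the negative. The gooqa-dev_cosine_accuracy column in the training logs suggests the score was produced by a TripletEvaluator named "gooqa-dev"; a minimal sketch of that kind of evaluation (the single triplet below is illustrative, adapted from samples later in this card):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("ayushexel/embed-all-MiniLM-L6-v2-squad-9-epochs")

# One illustrative triplet; the reported 0.4042 came from the full
# 5,000-sample evaluation split described under "Evaluation Dataset".
evaluator = TripletEvaluator(
    anchors=["Who leads the Defence Medical Services?"],
    positives=["The Surgeon General represents the Defence Medical Services on the Defence Staff."],
    negatives=["The Royal Navy is constructing two new larger STOVL aircraft carriers."],
    name="gooqa-dev",
)
print(evaluator(model))  # e.g. {'gooqa-dev_cosine_accuracy': 1.0}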

Training Details

Training Dataset

Unnamed Dataset

  • Size: 44,286 training samples
  • Columns: question, context, and negative
  • Approximate statistics based on the first 1000 samples:

             question             context                negative
    type     string               string                 string
    details  min: 7 tokens        min: 28 tokens         min: 29 tokens
             mean: 14.37 tokens   mean: 146.42 tokens    mean: 146.64 tokens
             max: 34 tokens       max: 256 tokens        max: 256 tokens
  • Samples:

    Sample 1
      question: What was one name of a power that the szlachta was dependent on.
      context: During the Partitions of Poland from 1772 to 1795, its members began to lose these legal privileges and social status. From that point until 1918, the legal status of the nobility was essentially dependent upon the policies of the three partitioning powers: the Russian Empire, the Kingdom of Prussia, and the Habsburg Monarchy. The legal privileges of the szlachta were legally abolished in the Second Polish Republic by the March Constitution of 1921.
      negative: Poles of the 17th century assumed that "szlachta" came from the German "schlachten" ("to slaughter" or "to butcher"); also suggestive is the German "Schlacht" ("battle"). Early Polish historians thought the term may have derived from the name of the legendary proto-Polish chief, Lech, mentioned in Polish and Czech writings.

    Sample 2
      question: How many warships does the Royal Canadian Navy have?
      context: The Royal Canadian Navy (RCN), headed by the Commander of the Royal Canadian Navy, includes 33 warships and submarines deployed in two fleets: Maritime Forces Pacific (MARPAC) at CFB Esquimalt on the west coast, and Maritime Forces Atlantic (MARLANT) at Her Majesty's Canadian Dockyard in Halifax on the east coast, as well as one formation: the Naval Reserve Headquarters (NAVRESHQ) at Quebec City, Quebec. The fleet is augmented by various aircraft and supply vessels. The RCN participates in NATO exercises and operations, and ships are deployed all over the world in support of multinational deployments.
      negative: The Royal Navy is constructing two new larger STOVL aircraft carriers, the Queen Elizabeth class, to replace the three now retired Invincible-class carriers. The ships are HMS Queen Elizabeth and HMS Prince of Wales. They will be able to operate up to 40 aircraft on peace time operations with a tailored group of up to 50, and will have a displacement of 70,600 tonnes. HMS Queen Elizabeth is projected to commission in 2017 followed by Prince of Wales in about 2020. The ships are due to become operational starting in 2020. Their primary aircraft complement will be made up of F-35B Lightning IIs, and their ship's company will number around 680 with the total complement rising to about 1600 when the air group is embarked. The two ships will be the largest warships ever built for the Royal Navy.

    Sample 3
      question: In this period, who usually inherited fiefs?
      context: Other sections of society included the nobility, clergy, and townsmen. Nobles, both the titled nobility and simple knights, exploited the manors and the peasants, although they did not own lands outright but were granted rights to the income from a manor or other lands by an overlord through the system of feudalism. During the 11th and 12th centuries, these lands, or fiefs, came to be considered hereditary, and in most areas they were no longer divisible between all the heirs as had been the case in the early medieval period. Instead, most fiefs and lands went to the eldest son.[R] The dominance of the nobility was built upon its control of the land, its military service as heavy cavalry, control of castles, and various immunities from taxes or other impositions.[S] Castles, initially in wood but later in stone, began to be constructed in the 9th and 10th centuries in response to the disorder of the time, and provided protection from invaders as well as allowing lords defence from riva...
      negative: In European history, the Middle Ages or medieval period lasted from the 5th to the 15th century. It began with the collapse of the Western Roman Empire and merged into the Renaissance and the Age of Discovery. The Middle Ages is the middle period of the three traditional divisions of Western history: Antiquity, Medieval period, and Modern period. The Medieval period is itself subdivided into the Early, the High, and the Late Middle Ages.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
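For each question in a batch, MultipleNegativesRankingLoss treats the paired context as the positive and every other context in the batch (plus the explicit negative column) as negatives, then applies cross-entropy over the similarity scores multiplied by the scale. A minimal construction sketch with the logged parameters (both are the library defaults):

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# scale=20.0 and cosine similarity match the logged parameters above.
loss = losses.MultipleNegativesRankingLoss(
    model, scale=20.0, similarity_fct=util.cos_sim
)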
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,000 evaluation samples
  • Columns: question, context, and negative_1
  • Approximate statistics based on the first 1000 samples:

             question             context                negative_1
    type     string               string                 string
    details  min: 3 tokens        min: 28 tokens         min: 28 tokens
             mean: 14.52 tokens   mean: 146.86 tokens    mean: 144.27 tokens
             max: 38 tokens       max: 256 tokens        max: 256 tokens
  • Samples:

    Sample 1
      question: In which video did Madonna wear a rosary?
      context: Madonna's Italian-Catholic background and her relationship with her parents are reflected in the album Like a Prayer. It was an evocation of the impact religion had on her career. Her video for the title track contains Catholic symbolism, such as the stigmata. During The Virgin Tour, she wore a rosary and prayed with it in the music video for "La Isla Bonita". The "Open Your Heart" video sees her boss scolding her in the Italian language. On the Who's That Girl World Tour, she dedicated the song "Papa Don't Preach" to Pope John Paul II.
      negative_1: (identical to the context above)

    Sample 2
      question: Who leads the Defence Medical Services?
      context: There are also three Deputy Chiefs of the Defence Staff with particular remits, Deputy Chief of the Defence Staff (Capability), Deputy CDS (Personnel and Training) and Deputy CDS (Operations). The Surgeon General, represents the Defence Medical Services on the Defence Staff, and is the clinical head of that service.
      negative_1: (identical to the context above)

    Sample 3
      question: Which of the Marshall Islands did Salazar most likely see?
      context: Spanish explorer Alonso de Salazar was the first European to see the islands in 1526, commanding the ship Santa Maria de la Victoria, the only surviving vessel of the Loaísa Expedition. On August 21, he sighted an island (probably Taongi) at 14°N that he named "San Bartolome".
      negative_1: The Ministry of Education (Marshall Islands) operates the state schools in the Marshall Islands. There are two tertiary institutions operating in the Marshall Islands, the College of the Marshall Islands and the University of the South Pacific.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 9
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
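
A sketch of a training run matching these non-default hyperparameters, using the Sentence Transformers v3+ trainer API; the dataset construction is schematic, since the actual 44,286-sample dataset is unnamed in this card:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Schematic placeholder: the real dataset has columns (question, context, negative).
train_dataset = Dataset.from_dict({
    "question": ["Who leads the Defence Medical Services?"],
    "context": ["The Surgeon General is the clinical head of that service."],
    "negative": ["The Royal Navy is constructing two new aircraft carriers."],
})
eval_dataset = train_dataset  # placeholder; the card used a 5,000-sample eval split

loss = MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="embed-all-MiniLM-L6-v2-squad-9-epochs",
    num_train_epochs=9,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()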

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 9
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss gooqa-dev_cosine_accuracy
-1 -1 - - 0.3266
0.2890 100 0.4583 0.8001 0.3838
0.5780 200 0.4002 0.7751 0.3890
0.8671 300 0.3768 0.7717 0.3980
1.1561 400 0.3325 0.7565 0.4026
1.4451 500 0.3036 0.7464 0.4064
1.7341 600 0.3052 0.7472 0.4092
2.0231 700 0.2979 0.7500 0.3984
2.3121 800 0.2108 0.7424 0.4018
2.6012 900 0.2212 0.7453 0.4102
2.8902 1000 0.2195 0.7432 0.4078
3.1792 1100 0.1834 0.7468 0.4064
3.4682 1200 0.1621 0.7497 0.4036
3.7572 1300 0.1689 0.7472 0.4026
4.0462 1400 0.1632 0.7478 0.4130
4.3353 1500 0.1348 0.7569 0.4012
4.6243 1600 0.1347 0.7463 0.4136
4.9133 1700 0.1391 0.7500 0.4060
5.2023 1800 0.1168 0.7589 0.4062
5.4913 1900 0.1132 0.7526 0.4066
5.7803 2000 0.1148 0.7572 0.4052
6.0694 2100 0.1134 0.7579 0.4048
6.3584 2200 0.0965 0.7603 0.4052
6.6474 2300 0.0992 0.7622 0.4024
6.9364 2400 0.0993 0.7645 0.4064
7.2254 2500 0.0927 0.7659 0.4038
7.5145 2600 0.0895 0.7657 0.4068
7.8035 2700 0.0922 0.7619 0.4084
8.0925 2800 0.0904 0.7652 0.4084
8.3815 2900 0.0845 0.7653 0.4112
8.6705 3000 0.0815 0.7646 0.4100
8.9595 3100 0.0819 0.7656 0.4120
-1 -1 - - 0.4042

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1
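
To approximate this environment, the listed versions can be pinned at install time (a sketch; the CUDA-specific PyTorch build may require the matching extra index URL):

pip install sentence-transformers==4.0.1 transformers==4.50.3 torch==2.6.0 accelerate==1.5.2 datasets==3.5.0 tokenizers==0.21.1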

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}