SentenceTransformer based on Alibaba-NLP/gte-Qwen2-1.5B-instruct

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-Qwen2-1.5B-instruct. It maps sentences & paragraphs to a 1536-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-Qwen2-1.5B-instruct
  • Maximum Sequence Length: 32768 tokens
  • Output Dimensionality: 1536 dimensions
  • Similarity Function: Cosine Similarity
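
These properties can be double-checked after loading the model. A minimal sketch (the printed values are the ones listed above; the repository id is the one this card is published under):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mata5764/gte-Qwen2-1.5B-instruct-myfi-v3")

# These should match the values listed above.
print(model.get_sentence_embedding_dimension())  # 1536
print(model.max_seq_length)                      # 32768
print(model.similarity_fn_name)                  # "cosine"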

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False, 'architecture': 'Qwen2Model'})
  (1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
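
For reference, an equivalent module stack could be assembled by hand from the sentence_transformers building blocks. This is a minimal sketch, not the original training code; loading the base model may additionally require trust_remote_code arguments depending on your transformers version:

from sentence_transformers import SentenceTransformer, models

# Assemble Transformer -> last-token Pooling -> Normalize, mirroring the architecture above.
word_embedding_model = models.Transformer(
    "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    max_seq_length=32768,
)
pooling = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 1536
    pooling_mode="lasttoken",
)
model = SentenceTransformer(modules=[word_embedding_model, pooling, models.Normalize()])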

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mata5764/gte-Qwen2-1.5B-instruct-myfi-v3")
# Run inference
queries = [
    "Instruct: Compare Canonical and its colloquial Financial Instrument name: shradha infra ltd",
]
documents = [
    'Instruct: Compare Canonical and its colloquial Financial Instrument name: shradha infraprojects ltd.',
    'Instruct: Compare Canonical and its colloquial Financial Instrument name: boi axa fixed maturity plan - series 12 (386 days)',
    'Instruct: Compare Canonical and its colloquial Financial Instrument name: 360 one silver etf',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 1536] [3, 1536]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.6971, 0.0176, 0.1806]])
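
Building on the snippet above, a typical application is to rank candidate canonical names against one colloquial query and keep the best match. A minimal sketch, where the prompt prefix and candidates mirror the examples in this card and the 0.5 threshold is an assumption to tune on your own data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mata5764/gte-Qwen2-1.5B-instruct-myfi-v3")

prefix = "Instruct: Compare Canonical and its colloquial Financial Instrument name: "
colloquial = "shradha infra ltd"
canonical_names = [
    "shradha infraprojects ltd.",
    "boi axa fixed maturity plan - series 12 (386 days)",
    "360 one silver etf",
]

query_emb = model.encode_query([prefix + colloquial])
doc_embs = model.encode_document([prefix + name for name in canonical_names])

# Cosine similarities, since the model normalizes its embeddings.
scores = model.similarity(query_emb, doc_embs)[0]
best = int(scores.argmax())
if float(scores[best]) > 0.5:  # assumed threshold, tune on your own data
    print("match:", canonical_names[best], float(scores[best]))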

Training Details

Training Dataset

Unnamed Dataset

  • Size: 60,995 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
              sentence1                              sentence2                              label
    type      string                                 string                                 int
    details   min: 15, mean: 20.99, max: 46 tokens   min: 17, mean: 22.98, max: 43 tokens   0: 100.00%
  • Samples:
    • sentence1: Instruct: Compare Canonical and its colloquial Financial Instrument name: synergy green industries ltd
      sentence2: Instruct: Compare Canonical and its colloquial Financial Instrument name: synergy green industries ltd.
      label: 0
    • sentence1: Instruct: Compare Canonical and its colloquial Financial Instrument name: nfp sampoorna foods ltd
      sentence2: Instruct: Compare Canonical and its colloquial Financial Instrument name: nfp sampoorna foods ltd.
      label: 0
    • sentence1: Instruct: Compare Canonical and its colloquial Financial Instrument name: alpex
      sentence2: Instruct: Compare Canonical and its colloquial Financial Instrument name: alpex solar ltd.
      label: 0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
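
    This loss treats each (sentence1, sentence2) row as a positive pair and uses the other sentence2 values in the batch as in-batch negatives. A hedged sketch of how it would be instantiated, with scale and similarity_fct matching the parameters above (variable names are placeholders, not the author's code):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

# trust_remote_code is assumed to be needed for the base gte-Qwen2 model.
base = SentenceTransformer("Alibaba-NLP/gte-Qwen2-1.5B-instruct", trust_remote_code=True)
loss = MultipleNegativesRankingLoss(base, scale=20.0, similarity_fct=cos_sim)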
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • learning_rate: 3e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
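
Under the framework versions listed below, these non-default values would map onto roughly the following training arguments. A sketch in which output_dir is a placeholder and everything not listed keeps its default:

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/gte-qwen2-myfi-v3",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    learning_rate=3e-5,
    num_train_epochs=5,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,  # needs an eval dataset; save and eval strategies must match
)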

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0005 1 1.8786
0.0010 2 1.9664
0.0016 3 1.4735
0.0021 4 2.6475
0.0026 5 2.2149
0.0031 6 1.6354
0.0037 7 1.6902
0.0042 8 1.5221
0.0047 9 1.5243
0.0052 10 1.1373
0.0058 11 1.2209
0.0063 12 1.4964
0.0068 13 1.5428
0.0073 14 1.3659
0.0079 15 0.7927
0.0084 16 0.9309
0.0089 17 1.2404
0.0094 18 0.7762
0.0100 19 0.8889
0.0105 20 0.657
0.0110 21 0.67
0.0115 22 0.5714
0.0121 23 0.5005
0.0126 24 0.6801
0.0131 25 0.3774
0.0136 26 0.3306
0.0142 27 0.549
0.0147 28 0.1291
0.0152 29 0.3316
0.0157 30 0.0576
0.0163 31 0.0699
0.0168 32 0.1169
0.0173 33 0.0951
0.0178 34 0.0854
0.0184 35 0.0519
0.0189 36 0.0247
0.0194 37 0.1768
0.0199 38 0.045
0.0205 39 0.0202
0.0210 40 0.0776
0.0215 41 0.1327
0.0220 42 0.0103
0.0225 43 0.0899
0.0231 44 0.0559
0.0236 45 0.088
0.0241 46 0.0052
0.0246 47 0.0429
0.0252 48 0.0016
0.0257 49 0.1128
0.0262 50 0.0746
0.0267 51 0.1085
0.0273 52 0.0332
0.0278 53 0.0667
0.0283 54 0.0363
0.0288 55 0.0375
0.0294 56 0.0693
0.0299 57 0.1447
0.0304 58 0.045
0.0309 59 0.0029
0.0315 60 0.022
0.0320 61 0.0174
0.0325 62 0.3009
0.0330 63 0.0153
0.0336 64 0.1176
0.0341 65 0.3625
0.0346 66 0.055
0.0351 67 0.0178
0.0357 68 0.0054
0.0362 69 0.0559
0.0367 70 0.057
0.0372 71 0.0689
0.0378 72 0.0042
0.0383 73 0.0145
0.0388 74 0.0188
0.0393 75 0.0093
0.0399 76 0.0496
0.0404 77 0.0071
0.0409 78 0.004
0.0414 79 0.0141
0.0420 80 0.0107
0.0425 81 0.0372
0.0430 82 0.1183
0.0435 83 0.0012
0.0440 84 0.1094
0.0446 85 0.0007

Framework Versions

  • Python: 3.13.2
  • Sentence Transformers: 5.0.0
  • Transformers: 4.54.1
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}