--- tags: - sentence-transformers - sentence-similarity - feature-extraction - dense - generated_from_trainer - dataset_size:100 - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss base_model: nomic-ai/nomic-embed-text-v1.5 widget: - source_sentence: "func SetFactory(ctx context.Context, f Factory) context.Context\ \ {\n\treturn" sentences: - rm -r path - 'Transforms an array into a DateTime. @param array $value Array value. @return DateTime DateTime value.' - ' context.WithValue(ctx, &clockKey, f) }' - source_sentence: "public function hyvesTipUrl($title, $body, $categoryId = 12, $rating\ \ = 5) {\n\n $url = 'http://www.hyves-share.nl/button/tip/?tipcategoryid=%s&rating=%s&title=%s&body=%s';\n" sentences: - " by a TLS client to\n\t// authenticate itself to the TLS server.\n\ttemplate.ExtKeyUsage\ \ = append(template.ExtKeyUsage, x509.ExtKeyUsageClientAuth)\n\n\tt := time.Now().UnixNano()\n\ \ttemplate.SerialNumber = pki.BuildPKISerial(t)\n\n\tcertificate, err := pki.SignNewCertificate(privateKey,\ \ template, caCert.Certificate, caKey)\n\tif err != nil {\n\t\treturn nil, fmt.Errorf(\"\ error signing certificate for master kubelet: %v\", err)\n\t}\n\n\tcaBytes, err\ \ := caCert.AsBytes()\n\tif err != nil {\n\t\treturn nil, fmt.Errorf(\"failed\ \ to get certificate authority data: %s\", err)\n\t}\n\tcertBytes, err := certificate.AsBytes()\n\ \tif err != nil {\n\t\treturn nil, fmt.Errorf(\"failed to get certificate data:\ \ %s\", err)\n\t}\n\tkeyBytes, err := privateKey.AsBytes()\n\tif err != nil {\n\ \t\treturn nil, fmt.Errorf(\"failed to get private key data: %s\", err)\n\t}\n\ \n\tcontent, err := b.BuildKubeConfig(\"kubelet\", caBytes, certBytes, keyBytes)\n\ \tif err != nil {\n\t\treturn nil, err\n\t}\n\n\treturn &nodetasks.File{\n\t\t\ Path: b.KubeletKubeConfig(),\n\t\tContents: fi.NewStringResource(content),\n\ \t\tType: nodetasks.FileType_File,\n\t\tMode: s(\"600\"),\n\t}, nil\n}" - 'Executes the current query and returns the response @throws \Cassandra\Response\Exception @return \Cassandra\Response' - " $title = $title;\n $body = $body;\n return sprintf($url,\ \ $categoryId, $rating, $title, $body);\n }" - source_sentence: "public function get($key, $default = null, $dot_syntax = true)\n\ \ {\n if ($dot_syntax === true) {\n $paths = explode('.',\ \ $key);\n $node =& $this->_data;\n \n foreach\ \ ($paths as $path) {\n if (!is_array($node) || !isset($node[$path]))\ \ {\n // error occurred\n return $default;\n\ \ }\n $node =& $node[$path];\n }\n \ \ \n return $node;\n \n } else {\n \ \ \n return isset($this->_data[$key]) ? $this->_data[$key] :\ \ $default;\n \n }\n }" sentences: - // PrintShortName turns a pkix.Name into a string of RDN tuples. - "Here is the code to create an array, add elements, sort in ascending order, and\ \ print the elements in reverse order in Java:\n\n```java\nimport java.util.Arrays;\n\ \npublic class Main {\n public static void main(String[] args) {\n //\ \ Create an array\n int[] array = {5, 7, 3};\n\n // Sort the array\ \ in ascending order\n Arrays.sort(array);\n\n // Print the elements\ \ in reverse order\n for (int i = array.length - 1; i >= 0; i--) {\n \ \ System.out.println(array[i]);\n }\n }\n}\n```\n\nOutput:\n\ ```\n7\n5\n3\n```\n\nIn the code above, we import the `Arrays` class from the\ \ `java.util` package to use the `sort()` method for sorting the array. We create\ \ an integer array `array` with the given elements. The `Arrays.sort(array)` method\ \ sorts the array in ascending order. Finally, we loop through the array in reverse\ \ order starting from the last index (`array.length - 1`) and print each element\ \ using `System.out.println()`." - 'Returns a single item from the collection data. @param string $key @return mixed' - source_sentence: "def iter(self, query, *parameters, **kwargs):\n \"\"\"\ Returns a generator for records from the query.\"\"\"\n cursor = self._cursor()\n\ \ try:\n self._execute(cursor, query, parameters or None, kwargs)\n\ \ if cursor.description:\n column_names = [column.name\ \ for column in cursor.description]\n while True:\n \ \ record = cursor.fetchone()\n if not record:\n \ \ break\n yield Row(zip(column_names, record))\n\ \ raise StopIteration\n\n except:\n cursor.close()\n\ \ raise" sentences: - "def exit(exit_code=0):\n r\"\"\"A function to support exiting from exit hooks.\n\ \n Could also be used to exit from the calling scripts in a thread safe manner.\n\ \ \"\"\"\n core.processExitHooks()\n\n if state.isExitHooked and not hasattr(sys,\ \ 'exitfunc'): # The function is called from the exit hook\n sys.stderr.flush()\n\ \ sys.stdout.flush()\n os._exit(exit_code) #pylint: disable=W0212\n\n sys.exit(exit_code)" - Returns a generator for records from the query. - " \"\"\"\n\n url = self.file['url']\n args = ['{0}={1}'.format(k,\ \ v) for k, v in kwargs.items()]\n\n if args:\n url += '?{0}'.format('&'.join(args))\n\ \n return url" - source_sentence: What is the total CO2 emission from all aquaculture farms in the year 2021? sentences: - " && value.size == value.uniq.size\n else\n result\n end\n \ \ end" - "\n\treturn c.postJSON(\"joberror\", args)\n}" - SELECT SUM(co2_emission) FROM co2_emission WHERE year = 2021; pipeline_tag: sentence-similarity library_name: sentence-transformers --- # SentenceTransformer based on nomic-ai/nomic-embed-text-v1.5 This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) - **Maximum Sequence Length:** 8192 tokens - **Output Dimensionality:** 768 dimensions - **Similarity Function:** Cosine Similarity ### Model Sources - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) ### Full Model Architecture ``` SentenceTransformer( (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'NomicBertModel'}) (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) ) ``` ## Usage ### Direct Usage (Sentence Transformers) First install the Sentence Transformers library: ```bash pip install -U sentence-transformers ``` Then you can load this model and run inference. ```python from sentence_transformers import SentenceTransformer # Download from the 🤗 Hub model = SentenceTransformer("JahnaviKumar/nomic-embed-text1.5-ftcode") # Run inference queries = [ "What is the total CO2 emission from all aquaculture farms in the year 2021?", ] documents = [ 'SELECT SUM(co2_emission) FROM co2_emission WHERE year = 2021;', '\n\treturn c.postJSON("joberror", args)\n}', ' && value.size == value.uniq.size\n else\n result\n end\n end', ] query_embeddings = model.encode_query(queries) document_embeddings = model.encode_document(documents) print(query_embeddings.shape, document_embeddings.shape) # [1, 768] [3, 768] # Get the similarity scores for the embeddings similarities = model.similarity(query_embeddings, document_embeddings) print(similarities) # tensor([[0.7075, 0.3913, 0.3213]]) ``` ## Training Details ### Training Dataset #### Unnamed Dataset * Size: 100 training samples * Columns: query and corpus * Approximate statistics based on the first 100 samples: | | query | corpus | |:--------|:-------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| | type | string | string | | details | | | * Samples: | query | corpus | |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | def add_data_file(data_files, target, source):
"""Add an entry to data_files"""
for t, f in data_files:
if t == target:
break
else:
| data_files.append((target, []))
f = data_files[-1][1]
if source not in f:
f.append(source)
| | function verify (token, options) {
options = options \|\| {}
options.issuer = options.issuer \|\| this.issuer
options.client_id = options.client_id \|\| this.client_id
options.client_secret = options.client_secret \|\| this.client_secret
options.scope = options.scope \|\| this.scope
options.key = options.key \|\| this.jwks.sig

return new Promise(function (resolve, reject) {
AccessToken.verify(token, options, function (err, claims) {
if (err) { return reject(err) }
resolve(claims)
})
})
}
| Verifies a given OIDC token
@method verify
@param token {String} JWT AccessToken for OpenID Connect (base64 encoded)
@param [options={}] {Object} Options hashmap
@param [options.issuer] {String} OIDC Provider/Issuer URL
@param [options.key] {Object} Issuer's public key for signatures (jwks.sig)
@param [options.client_id] {String}
@param [options.client_secret {String}
@param [options.scope] {String}
@throws {UnauthorizedError} HTTP 401 or 403 errors (invalid tokens etc)
@return {Promise}
| | def _combine_lines(self, lines):
"""
Combines a list of JSON objects into one JSON object.
"""
| lines = filter(None, map(lambda x: x.strip(), lines))
return '[' + ','.join(lines) + ']'
| * Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: ```json { "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 768, 512, 256, 128, 64 ], "matryoshka_weights": [ 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 } ``` ### Framework Versions - Python: 3.10.12 - Sentence Transformers: 5.1.1 - Transformers: 4.54.1 - PyTorch: 2.9.0+cu128 - Accelerate: 1.10.1 - Datasets: 4.2.0 - Tokenizers: 0.21.4 ## Citation ### BibTeX #### Sentence Transformers ```bibtex @inproceedings{reimers-2019-sentence-bert, title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", author = "Reimers, Nils and Gurevych, Iryna", booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", month = "11", year = "2019", publisher = "Association for Computational Linguistics", url = "https://arxiv.org/abs/1908.10084", } ``` #### MatryoshkaLoss ```bibtex @misc{kusupati2024matryoshka, title={Matryoshka Representation Learning}, author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi}, year={2024}, eprint={2205.13147}, archivePrefix={arXiv}, primaryClass={cs.LG} } ``` #### MultipleNegativesRankingLoss ```bibtex @misc{henderson2017efficient, title={Efficient Natural Language Response Suggestion for Smart Reply}, author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, year={2017}, eprint={1705.00652}, archivePrefix={arXiv}, primaryClass={cs.CL} } ```