ddrg/codecolbert

AnReu committed on Oct 11, 2023

Commit

02ba4f5

1 Parent(s): 063da31

Create README

Files changed (1)

README.md ADDED

+# CodeColBERT
+This model serves as the base for our semantic code retrieval system SELMA. It can be used for indexing and retrieval through the PyTerrier bindings for ColBERT.
+## Training Details
+This model was trained for code retrieval. It uses CodeBERT as its base model and was trained with the official ColBERTv2 code
+([GitHub](https://github.com/stanford-futuredata/ColBERT)).
+Our data source is the [CodeSearchNet Challenge](https://github.com/github/CodeSearchNet).
+Training ColBERT requires triples of queries, positive examples, and negative examples. As queries, we used the documentation
+provided for each sample in the CodeSearchNet data set, while the sample's code snippet serves as the positive example. Negative examples were
+sampled randomly from the corpus. In total, we trained for 400,000 steps.
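
The triple construction described in the README can be sketched roughly as follows. This is a minimal illustration and not the authors' actual preprocessing script: the function name `build_triples`, the in-memory `(docstring, code)` pair format, and the choice to exclude a sample's own snippet from its negatives are all assumptions made for this example. The README only states that negatives were sampled randomly from the corpus.

```python
import random


def build_triples(pairs, seed=0):
    """Build (query, positive, negative) triples from (docstring, code) pairs.

    Hypothetical helper: `pairs` stands in for CodeSearchNet samples that
    have already been loaded into memory.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two samples to draw a negative")

    rng = random.Random(seed)
    triples = []
    for i, (docstring, code) in enumerate(pairs):
        # Draw a random index from the corpus, skipping the sample itself
        # (assumption: a snippet should not be its own negative).
        j = rng.randrange(len(pairs) - 1)
        if j >= i:
            j += 1
        negative_code = pairs[j][1]
        triples.append((docstring, code, negative_code))
    return triples


if __name__ == "__main__":
    toy_pairs = [
        ("Add two numbers.", "def add(a, b):\n    return a + b"),
        ("Reverse a string.", "def rev(s):\n    return s[::-1]"),
        ("Check whether a list is empty.", "def empty(xs):\n    return not xs"),
    ]
    for query, positive, negative in build_triples(toy_pairs, seed=1):
        print(repr(query), "| pos:", repr(positive), "| neg:", repr(negative))
```

Uniform random negatives are cheap to produce but tend to be easy to tell apart from the positive; the sketch follows that simple strategy because it is the one the README names.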