Create README
README.md
ADDED
@@ -0,0 +1,12 @@
# CodeColBERT
This model serves as the base for SELMA, our semantic code retrieval system. It can be used for indexing and retrieval through the PyTerrier bindings for ColBERT.

## Training Details
This model was trained for code retrieval, with CodeBERT as the base model. Training used the official ColBERTv2 code ([GitHub](https://github.com/stanford-futuredata/ColBERT)).

Our data source is the [CodeSearchNet Challenge](https://github.com/github/CodeSearchNet).
Training ColBERT requires triples of a query, a positive example, and a negative example. As queries, we used the documentation provided for each sample in the CodeSearchNet data set, while its code snippet serves as the positive example. Negative examples were sampled randomly from the corpus. In total, we trained for 400,000 steps.
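
The triple construction described above could be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used for this model: the function name `build_triples`, the field names `docstring` and `code`, and the toy corpus are all assumptions made for the example. Each sample's documentation becomes the query, its own code becomes the positive, and the negative is the code of a different, randomly chosen sample.

```python
import random


def build_triples(samples, seed=0):
    """Build (query, positive, negative) triples from documented code samples.

    `samples` is a list of dicts with hypothetical keys "docstring" and "code".
    The negative for each sample is the code of another sample drawn uniformly
    at random, so a sample is never paired with itself.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to draw a negative")

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    triples = []
    for i, sample in enumerate(samples):
        # Draw an index from the other len(samples) - 1 positions,
        # then shift it past i so that j != i.
        j = rng.randrange(len(samples) - 1)
        if j >= i:
            j += 1
        triples.append((sample["docstring"], sample["code"], samples[j]["code"]))
    return triples


# Toy corpus standing in for CodeSearchNet records (illustrative only).
toy_corpus = [
    {"docstring": "Add two numbers.", "code": "def add(a, b): return a + b"},
    {"docstring": "Reverse a string.", "code": "def rev(s): return s[::-1]"},
    {"docstring": "Check if a list is empty.", "code": "def empty(x): return not x"},
]

triples = build_triples(toy_corpus)
for query, positive, negative in triples:
    print(query, "|", positive, "|", negative)
```

Uniform random sampling is the simplest way to obtain negatives; it assumes the corpus is large enough that a random snippet is rarely relevant to the query.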