---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-0.5B
pipeline_tag: text-generation
datasets:
- seniruk/git-diff_to_commit_msg_large
---

# Hi, I’m Seniru Epasinghe

I’m an AI undergraduate and AI enthusiast working on machine learning projects and open-source contributions. I enjoy exploring AI pipelines and natural language processing, and building tools that make development easier.

---

## Connect with me

- [Hugging Face](https://huggingface.co/seniruk)
- [Medium](https://medium.com/@senirukepasinghe)
- [LinkedIn](https://www.linkedin.com/in/seniru-epasinghe-b34b86232/)
- [GitHub](https://github.com/seth2k2)

### Qwen2.5-Coder-0.5B fine-tuned on 100,000 rows of a custom dataset of Git diffs and their commit messages

- [Dataset on Hugging Face](https://huggingface.co/datasets/seniruk/git-diff_to_commit_msg_large)
- [Dataset on Kaggle](https://www.kaggle.com/datasets/seniruepasinghe/git-diff-to-commit-msg-large)

### Each dataset row was formatted with the prompt below to suit Qwen2.5-Coder fine-tuning, so use the same prompt at inference time for best results

```
"""Generate a concise and meaningful commit message based on the provided Git diff.

### Git Diff:
{Git diff from dataset}

### Commit Message:"""
```
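
A minimal sketch of filling this template in Python. `PROMPT_TEMPLATE`, `build_prompt`, and `build_training_text` are hypothetical names. Stripping blank lines around the diff is an assumption, as is joining the prompt and target commit message with a single space for training; neither is a confirmed detail of how this model was fine-tuned.

```python
# Prompt template reproduced from the section above; {diff} is the only placeholder.
PROMPT_TEMPLATE = (
    "Generate a concise and meaningful commit message based on the provided Git diff.\n"
    "\n"
    "### Git Diff:\n"
    "{diff}\n"
    "\n"
    "### Commit Message:"
)


def build_prompt(diff: str) -> str:
    """Return the inference prompt for a single Git diff."""
    # Assumption: strip surrounding blank lines so the diff sits directly under its header.
    return PROMPT_TEMPLATE.format(diff=diff.strip("\n"))


def build_training_text(diff: str, message: str) -> str:
    """Hypothetical: join a prompt and its target commit message into one training string.

    The single-space separator is an assumption, not a confirmed training detail.
    """
    return build_prompt(diff) + " " + message.strip()


if __name__ == "__main__":
    example_diff = "diff --git a/app.py b/app.py\n-print('hi')\n+print('hello')\n"
    print(build_prompt(example_diff))
```

Only the template string is parsed by `str.format`, so braces inside the diff itself (common in code) are inserted verbatim and do not need escaping.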

### Inference code for the GGUF model

```
# Requires: pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Download the fine-tuned GGUF file from the Hugging Face Hub and load it
modelGGUF = Llama.from_pretrained(
    repo_id="seniruk/qwen2.5coder-0.5B_commit_msg",
    filename="qwen0.5-finetuned.gguf",
    rope_scaling={"type": "linear", "factor": 2.0},
    chat_format=None,  # Disables any chat formatting
    n_ctx=32768,  # Set the context size explicitly
)

# Same prompt format that was used for fine-tuning (plain completion, no chat template)
commit_prompt = """Generate a concise and meaningful commit message based on the provided Git diff.

### Git Diff:
{}

### Commit Message:"""  # Nothing follows "Commit Message:", so the model writes the message itself

# Git diff example for commit message generation
git_diff_example = """
diff --git a/index.html b/index.html
index 89abcde..f123456 100644
--- a/index.html
+++ b/index.html
@@ -5,16 +5,6 @@ <body>
<h1>Welcome to My Page</h1>

- <table border="1">
- <tr>
- <th>Name</th>
- <th>Age</th>
- </tr>
- <tr>
- <td>John Doe</td>
- <td>30</td>
- </tr>
- </table>

+ <p>This is a newly added paragraph replacing the table.</p>
</body>
</html>
"""

# Prepare the raw input prompt
input_prompt = commit_prompt.format(git_diff_example)

# Generate the commit message
output = modelGGUF(
    input_prompt,
    max_tokens=64,
    temperature=0.6,  # Balanced randomness
    top_p=0.8,  # Nucleus sampling threshold
    top_k=50,  # Sample only from the 50 most likely tokens
)

# Extract and print the generated text
commit_message = output["choices"][0]["text"].strip()
print("\nGenerated Commit Message:\n{}".format(commit_message))
```
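
With `max_tokens=64` and no stop sequence, the raw completion may run past the commit message, for example into a new `### Git Diff:` section. A minimal post-processing sketch, assuming the message ends at the first `###` marker; `clean_commit_message` is a hypothetical helper, not part of llama-cpp-python. Passing `stop=["###"]` to the `modelGGUF(...)` call is an alternative that halts generation at the same point.

```python
def clean_commit_message(raw: str, marker: str = "###") -> str:
    """Trim a raw completion down to the commit message text.

    Assumes the message ends where the next "###" section header begins.
    """
    # Keep only the text before the first section marker, if one appears.
    text = raw.split(marker, 1)[0]
    # Drop surrounding whitespace and trailing spaces on each line.
    lines = [line.rstrip() for line in text.strip().splitlines()]
    # Collapse runs of blank lines into a single blank line.
    cleaned = []
    for line in lines:
        if line or (cleaned and cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned)


if __name__ == "__main__":
    raw = "Replace table with paragraph\n\n\n### Git Diff:\ndiff --git a/x b/x"
    print(clean_commit_message(raw))
```

In the inference script above, this would be applied as `commit_message = clean_commit_message(output["choices"][0]["text"])`.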