---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-0.5B
pipeline_tag: text-generation
datasets:
- seniruk/git-diff_to_commit_msg_large
---
# Hi, I’m Seniru Epasinghe 👋
I’m an AI undergraduate and an AI enthusiast, working on machine learning projects and open-source contributions.
I enjoy exploring AI pipelines, natural language processing, and building tools that make development easier.
---
## 🌐 Connect with me
[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-seniruk-orange?logo=huggingface&logoColor=white)](https://huggingface.co/seniruk)   
[![Medium](https://img.shields.io/badge/Medium-seniruk_epasinghe-black?logo=medium&logoColor=white)](https://medium.com/@senirukepasinghe)   
[![LinkedIn](https://img.shields.io/badge/LinkedIn-seniru_epasinghe-blue?logo=linkedin&logoColor=white)](https://www.linkedin.com/in/seniru-epasinghe-b34b86232/)   
[![GitHub](https://img.shields.io/badge/GitHub-seth2k2-181717?logo=github&logoColor=white)](https://github.com/seth2k2)
### Qwen2.5-Coder-0.5B fine-tuned on 100,000 rows of a custom dataset containing Git diffs and their respective commit messages
- [Dataset on Hugging Face](https://huggingface.co/datasets/seniruk/git-diff_to_commit_msg_large)
- [Dataset on Kaggle](https://www.kaggle.com/datasets/seniruepasinghe/git-diff-to-commit-msg-large)
### Each row of the dataset was formatted as shown below to suit the fine-tuning requirements of the Qwen2.5-Coder model, so use the same prompt at inference time for better results
```
"""Generate a concise and meaningful commit message based on the provided Git diff.
### Git Diff:
{Git diff from dataset}
### Commit Message:"""
```
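
As a minimal sketch of how this template can be filled in at inference time, the helper below substitutes a diff into the same text. The names `TRAINING_PROMPT` and `build_prompt` are hypothetical and not part of the model repository, and the line breaks are assumed to match the template as it is shown above.

```python
# Hypothetical helper: fills the training-time template with one Git diff.
# The wording is copied from the template above; the exact line breaks are an assumption.
TRAINING_PROMPT = (
    "Generate a concise and meaningful commit message based on the provided Git diff.\n"
    "### Git Diff:\n"
    "{diff}\n"
    "### Commit Message:"
)


def build_prompt(diff: str) -> str:
    """Return the prompt text for a single Git diff."""
    # Trim only surrounding newlines so the leading spaces of diff context lines survive.
    return TRAINING_PROMPT.format(diff=diff.strip("\n"))


if __name__ == "__main__":
    example = "diff --git a/a.txt b/a.txt\n-old\n+new"
    print(build_prompt(example))
```

The prompt ends right after `### Commit Message:` so that the model's completion is the commit message itself.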
### Inference code for the GGUF model
```python
from llama_cpp import Llama

# Download the GGUF weights from the Hugging Face Hub and load them
modelGGUF = Llama.from_pretrained(
    repo_id="seniruk/qwen2.5coder-0.5B_commit_msg",
    filename="qwen0.5-finetuned.gguf",
    rope_scaling={"type": "linear", "factor": 2.0},
    chat_format=None,  # Disables any chat formatting
    n_ctx=32768,  # Set the context size explicitly
)

# Define the commit message prompt (minimal format, avoids assistant behavior)
commit_prompt = """Generate a meaningful commit message explaining all the changes in the provided Git diff.
### Git Diff:
{}
### Commit Message:"""  # Nothing follows "Commit Message:", so the model writes the message itself

# Git diff example for commit message generation
git_diff_example = """
diff --git a/index.html b/index.html
index 89abcde..f123456 100644
--- a/index.html
+++ b/index.html
@@ -5,16 +5,6 @@ <body>
<h1>Welcome to My Page</h1>
- <table border="1">
- <tr>
- <th>Name</th>
- <th>Age</th>
- </tr>
- <tr>
- <td>John Doe</td>
- <td>30</td>
- </tr>
- </table>
+ <p>This is a newly added paragraph replacing the table.</p>
</body>
</html>
"""

# Prepare the raw input prompt
input_prompt = commit_prompt.format(git_diff_example)

# Generate commit message
output = modelGGUF(
    input_prompt,
    max_tokens=64,
    temperature=0.6,  # Balanced randomness
    top_p=0.8,  # Controls nucleus sampling
    top_k=50,  # Limits vocabulary selection
)

# Extract and print the generated text
commit_message = output["choices"][0]["text"].strip()
print("\nGenerated Commit Message:\n{}".format(commit_message))
```
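
One way to wire the snippet above into a real workflow is to feed it the staged changes of a repository and tidy the raw completion. The sketch below assumes `git` is available on the PATH. `staged_diff`, `truncate_diff` and `first_line` are hypothetical helpers, and the 20,000-character cap is an arbitrary illustration rather than a tuned value.

```python
import subprocess


def staged_diff(repo_dir: str = ".") -> str:
    """Return the staged changes of a repository as unified diff text."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # Raise CalledProcessError if git exits with an error
    )
    return result.stdout


def truncate_diff(diff: str, max_chars: int = 20_000) -> str:
    """Cut an oversized diff at a line boundary (character count is a rough proxy for tokens)."""
    if len(diff) <= max_chars:
        return diff
    cut = diff.rfind("\n", 0, max_chars)
    # Fall back to a hard cut when there is no newline within the limit
    return diff[:cut] if cut != -1 else diff[:max_chars]


def first_line(completion: str) -> str:
    """Keep the first non-empty line of the model output as the commit subject."""
    for line in completion.splitlines():
        if line.strip():
            return line.strip()
    return ""
```

With the model loaded as above, passing `commit_prompt.format(truncate_diff(staged_diff()))` to `modelGGUF` and applying `first_line` to `output["choices"][0]["text"]` should yield a one-line message for the staged changes.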