
	---
	license: apache-2.0
	language:
	- en
	base_model:
	- Qwen/Qwen2.5-Coder-0.5B
	pipeline_tag: text-generation
	datasets:
	- seniruk/git-diff_to_commit_msg_large
	---

	# Hi, I’m Seniru Epasinghe 👋

	I’m an AI undergraduate and an AI enthusiast, working on machine learning projects and open-source contributions.
	I enjoy exploring AI pipelines, natural language processing, and building tools that make development easier.

	---

	## 🌐 Connect with me

	[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-seniruk-orange?logo=huggingface&logoColor=white)](https://huggingface.co/seniruk)
	[![Medium](https://img.shields.io/badge/Medium-seniruk_epasinghe-black?logo=medium&logoColor=white)](https://medium.com/@senirukepasinghe)
	[![LinkedIn](https://img.shields.io/badge/LinkedIn-seniru_epasinghe-blue?logo=linkedin&logoColor=white)](https://www.linkedin.com/in/seniru-epasinghe-b34b86232/)
	[![GitHub](https://img.shields.io/badge/GitHub-seth2k2-181717?logo=github&logoColor=white)](https://github.com/seth2k2)


	### Qwen2.5-Coder-0.5B fine-tuned on 100,000 rows of a custom dataset of Git diffs and their corresponding commit messages
	- [dataset huggingface link](https://huggingface.co/datasets/seniruk/git-diff_to_commit_msg_large)
	- [dataset kaggle link](https://www.kaggle.com/datasets/seniruepasinghe/git-diff-to-commit-msg-large)

	### Each row of the dataset was formatted as shown below to suit the fine-tuning requirements of the Qwen2.5-Coder model, so use the same prompt at inference time for better results
	```
	"""Generate a concise and meaningful commit message based on the provided Git diff.

	### Git Diff:
	{Git diff from dataset}

	### Commit Message:"""
	```
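
The template above can be filled in programmatically. The following is a minimal sketch, assuming the diff is already available as a string; the names `PROMPT_TEMPLATE` and `build_commit_prompt` are illustrative and are not part of the model or the dataset.

```python
# Prompt template copied from the training format shown above.
PROMPT_TEMPLATE = """Generate a concise and meaningful commit message based on the provided Git diff.

### Git Diff:
{diff}

### Commit Message:"""


def build_commit_prompt(git_diff: str) -> str:
    """Insert a Git diff into the training-time prompt template.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    # Drop surrounding blank lines so the diff sits directly under its header.
    # Braces inside the diff are safe: str.format only parses the template.
    return PROMPT_TEMPLATE.format(diff=git_diff.strip("\n"))
```

The returned string ends with `### Commit Message:` and nothing after it, so the model's completion is the commit message itself.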

	### Inference code for the GGUF model

	```python
	from llama_cpp import Llama

	modelGGUF = Llama.from_pretrained(
	    repo_id="seniruk/qwen2.5coder-0.5B_commit_msg",
	    filename="qwen0.5-finetuned.gguf",
	    rope_scaling={"type": "linear", "factor": 2.0},
	    chat_format=None,  # Disables any chat formatting
	    n_ctx=32768,  # Set the context size explicitly
	)

	# Define the commit message prompt (Minimal format, avoids assistant behavior)
	commit_prompt = """Generate a meaningful commit message explaining all the changes in the provided Git diff.

	### Git Diff:
	{}

	### Commit Message:"""  # No placeholder after "Commit Message:", so the model writes the message itself.

	# Git diff example for commit message generation
	git_diff_example = """
	diff --git a/index.html b/index.html
	index 89abcde..f123456 100644
	--- a/index.html
	+++ b/index.html
	@@ -5,16 +5,6 @@ <body>
	<h1>Welcome to My Page</h1>

	- <table border="1">
	- <tr>
	- <th>Name</th>
	- <th>Age</th>
	- </tr>
	- <tr>
	- <td>John Doe</td>
	- <td>30</td>
	- </tr>
	- </table>

	+ <p>This is a newly added paragraph replacing the table.</p>
	</body>
	</html>
	"""

	# Prepare the raw input prompt
	input_prompt = commit_prompt.format(git_diff_example)

	# Generate commit message
	output = modelGGUF(
	    input_prompt,
	    max_tokens=64,
	    temperature=0.6,  # Balanced randomness
	    top_p=0.8,  # Controls nucleus sampling
	    top_k=50,  # Limits vocabulary selection
	)

	# Decode and print the output
	commit_message = output["choices"][0]["text"].strip()

	print("\nGenerated Commit Message:\n{}".format(commit_message))
	```
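
Because the model is prompted as a plain completion model, it may keep generating after the commit message, for example by starting another `###` section. Below is a minimal sketch of one way to trim the output. `clean_commit_message` is a hypothetical helper, and using `###` as the cut-off marker is an assumption based on the prompt format, not something this model card specifies.

```python
def clean_commit_message(raw_text: str, marker: str = "###") -> str:
    """Keep only the text before the first section marker (hypothetical helper)."""
    # Anything after the marker is assumed to be a runaway continuation.
    message = raw_text.split(marker, 1)[0]
    return message.strip()


# Example with a made-up completion string (not real model output).
example = " Replace table with paragraph in index.html\n\n### Git Diff:\n..."
print(clean_commit_message(example))
```

A similar effect can usually be had at generation time by passing `stop=["###"]` to the `modelGGUF(...)` call, since llama-cpp-python accepts a list of stop strings there.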