---
license: llama3.1
language: en
base_model: meta-llama/Llama-3.1-8B-Instruct
---
# MathBite/self_corrective_llama_3.1_8B_end
This is a fine-tuned version of `meta-llama/Llama-3.1-8B-Instruct` in which the LoRA adapter has been merged into the base model. The model includes a custom hallucination detection head that can intervene during generation to insert corrective instructions.
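The actual head is defined in this repository's `modeling.py`. Purely as an illustrative mental model (an assumption, not this repo's implementation), a detection head of this kind is often a small probe over the decoder's hidden states whose score decides when to inject a corrective instruction:

```python
import torch
import torch.nn as nn

class IllustrativeHallucinationHead(nn.Module):
    """Hypothetical sketch of a hallucination detection head.

    The real class ships in this repo's modeling.py and may differ in
    architecture, inputs, and how its score triggers an intervention.
    """

    def __init__(self, hidden_size: int = 4096, threshold: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)  # per-token hallucination score
        self.threshold = threshold

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        # returns per-token probabilities of shape (batch, seq_len)
        return torch.sigmoid(self.scorer(hidden_states)).squeeze(-1)

    def should_intervene(self, hidden_states: torch.Tensor) -> bool:
        # Intervene when the score for the newest token crosses the threshold
        return bool(self.forward(hidden_states)[:, -1].max() > self.threshold)
```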
## How to Use
Because this model uses a custom architecture with a modified `generate` method, you must pass `trust_remote_code=True` when loading it. The required `modeling.py` file is included in this repository.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "MathBite/self_corrective_llama_3.1_8B_end"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Important: you must trust the remote code
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # or your preferred dtype
).to("cuda")  # move the model to the GPU

prompt = "YOUR PROMPT HERE"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# The custom generate method requires the tokenizer instance
generated_ids = model.generate(
    inputs.input_ids,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature=0.7,
)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
```
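Because the base model is the Instruct variant of Llama 3.1, prompts formatted with the tokenizer's chat template will generally behave better than raw text. The snippet below is standard `transformers` usage, shown only as a suggestion; it assumes the custom `generate` method accepts the same arguments as in the example above:

```python
messages = [{"role": "user", "content": "YOUR PROMPT HERE"}]

# Build Llama-3.1 chat-formatted input ids
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(
    input_ids,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature=0.7,
)

# Decode only the newly generated tokens
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```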
## Model Details
This model was programmatically merged and uploaded using a deployment script. The custom class `SelfCorrectiveLlama` can be found in the `modeling.py` file.
The code in `modeling.py` is licensed under the Apache 2.0 License. The model weights are subject to the original license of the base model.