---
license: llama3.1
language: en
base_model: meta-llama/Llama-3.1-8B-Instruct
---
# MathBite/self_corrective_llama_3.1_8B_end
This is a fine-tuned version of `meta-llama/Llama-3.1-8B-Instruct` in which the LoRA adapter has been merged into the base model. The model includes a custom hallucination detection head that can intervene during generation to insert corrective instructions.
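The actual head is defined in this repository's `modeling.py`. Purely as an illustrative mental model (an assumption, not this repo's implementation), a detection head of this kind is often a small probe over the decoder's hidden states whose score decides when to inject a corrective instruction:

```python
import torch
import torch.nn as nn

class IllustrativeHallucinationHead(nn.Module):
    """Hypothetical sketch of a hallucination detection head.

    The real class ships in this repo's modeling.py and may differ in
    architecture, inputs, and how its score triggers an intervention.
    """

    def __init__(self, hidden_size: int = 4096, threshold: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)  # per-token hallucination score
        self.threshold = threshold

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        # returns per-token probabilities of shape (batch, seq_len)
        return torch.sigmoid(self.scorer(hidden_states)).squeeze(-1)

    def should_intervene(self, hidden_states: torch.Tensor) -> bool:
        # Intervene when the score for the newest token crosses the threshold
        return bool(self.forward(hidden_states)[:, -1].max() > self.threshold)
```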
## How to Use
Because this model uses a custom architecture with a modified `generate` method, you must pass `trust_remote_code=True` when loading it. The required `modeling.py` file is included in this repository.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "MathBite/self_corrective_llama_3.1_8B_end"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Important: you must trust the remote code
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # or your preferred dtype
).to("cuda")  # move the model to the GPU

prompt = "YOUR PROMPT HERE"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# The custom generate method requires the tokenizer instance
generated_ids = model.generate(
    inputs.input_ids,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature=0.7,
)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
```
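Because the base model is the Instruct variant of Llama 3.1, prompts formatted with the tokenizer's chat template will generally behave better than raw text. The snippet below is standard `transformers` usage, shown only as a suggestion; it assumes the custom `generate` method accepts the same arguments as in the example above:

```python
messages = [{"role": "user", "content": "YOUR PROMPT HERE"}]

# Build Llama-3.1 chat-formatted input ids
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(
    input_ids,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature=0.7,
)

# Decode only the newly generated tokens
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```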
## Model Details
This model was programmatically merged and uploaded using a deployment script. The custom class `SelfCorrectiveLlama` can be found in the `modeling.py` file.
The code in `modeling.py` is licensed under the Apache 2.0 License. The model weights are subject to the original license of the base model.