arvindkaphley
/

finetune_starcoder2_with_Ruby_Data

Generated from Trainer


arvindkaphley committed on Mar 30, 2024

Commit 1ce788a (verified) · 1 parent: f62bbdd

Update README.md

Files changed (1)

README.md +2 -0


@@ -40,10 +40,12 @@ Ruby Code Generator is a versatile tool crafted to streamline the interaction be
     - Load the bigcode/the-stack-smol dataset using the Hugging Face Datasets library.
     - Filter for the specified subset (data/ruby) and split (train).
     - Load the bigcode/starcoder2-3b model from the Hugging Face Hub with '4-bit' quantization.
 **2. Data Preprocessing:**
     - Tokenize the code text using the appropriate tokenizer for the chosen model.
     - Apply necessary cleaning or normalization (e.g., removing comments, handling indentation).
     - Create input examples suitable for the model's architecture (e.g., with masked language modeling objectives).
 **3. Configure Training:**
     - Initialize a Trainer object (likely from a library like Transformers).
     - Set training arguments based on the provided args:
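The cleaning step in the preprocessing list above (removing comments, handling indentation) could be sketched roughly as follows. This is a minimal illustration, not the code used in this repository: the function name `clean_ruby_source` and the `tab_width` parameter are hypothetical, only full-line `#` comments are dropped (inline comments and `=begin`/`=end` blocks are left alone, because `#` also appears in string interpolation such as `#{name}`), and tabs are expanded to two spaces on the assumption that the usual Ruby convention is wanted.

```python
def clean_ruby_source(source: str, tab_width: int = 2) -> str:
    """Hypothetical helper: light normalization of Ruby source before tokenization.

    Assumptions (not taken from the original README):
    - only lines consisting solely of a `#` comment are removed;
    - shebang lines (`#!...`) are kept;
    - tabs are expanded to `tab_width` spaces and trailing whitespace is stripped.
    """
    cleaned = []
    for line in source.splitlines():
        stripped = line.lstrip()
        # Skip full-line comments, but keep a shebang such as `#!/usr/bin/env ruby`.
        if stripped.startswith("#") and not stripped.startswith("#!"):
            continue
        # Normalize indentation and drop trailing whitespace.
        cleaned.append(line.expandtabs(tab_width).rstrip())
    # Rejoin with a single trailing newline.
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    sample = '# greet the user\ndef hi(name)\n\tputs "Hello, #{name}"  \nend\n'
    print(clean_ruby_source(sample))
```

A function like this would typically be applied to each example's text field before tokenization; whether comments should be removed at all depends on whether the fine-tuned model is expected to generate them.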
