How do you fine-tune?

#8
by Roman1111111 - opened

How do you fine-tune? In what precision? What GPU do you use, and what context window? I would like to know about this, as I also work on a similar project. Could you please help me with that?

I also have a question about that

TeichAI org

How I fine-tune is a big question. We put some docs together to help people replicate what we do, although things can vary depending on how much compute you have at your disposal.

https://docs.teichai.com
Note: the "open in colab" buttons don't work at the moment as we haven't made the notebooks yet

For tuning something like GLM Flash you want at the very least 64 GB of VRAM.
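For reference, here is a minimal sketch of the kind of memory-friendly QLoRA-style setup Unsloth supports; the checkpoint name, sequence length, and LoRA settings are placeholders for illustration, not the exact GLM Flash recipe.

```python
# QLoRA-style setup with Unsloth (sketch only; checkpoint and hyperparameters
# are placeholders, not the exact configuration used for GLM Flash).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # stand-in checkpoint; swap in the model you are tuning
    max_seq_length=8192,                       # shorter context keeps activation memory down
    load_in_4bit=True,                         # 4-bit weights are what make large models fit
)

# Attach LoRA adapters so only a small fraction of parameters is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```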

Well, what GPUs do you use?

TeichAI org

Again, it depends on the model, but for models like this I use the 80 GB A100 on Colab.

What is the max I could fine-tune with my 5070 Ti?

TeichAI org

Same GPU I use. Max of about 14B (depending on how optimized the model is in Unsloth).
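As a rough back-of-envelope (illustrative numbers, not measurements): a 14B model in 4-bit weights already takes up roughly half of a 16 GB card before activations and adapter states.

```python
# Rough VRAM budget for QLoRA on a 16 GB GPU; all numbers are assumptions.
params = 14e9                         # 14B parameters
weights_gb = params * 0.5 / 1e9       # ~0.5 bytes/param in 4-bit -> ~7 GB
lora_gb = 0.3                         # adapters + their optimizer states (assumed small)
overhead_gb = 1.5                     # CUDA context and framework buffers (assumed)
headroom_gb = 16 - weights_gb - lora_gb - overhead_gb
print(f"weights ~{weights_gb:.1f} GB, ~{headroom_gb:.1f} GB left for activations")
# Roughly 7 GB of headroom, which is why ~14B is about the ceiling on this card.
```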

Is it worth it? If I fine-tune GLM 4.7 Flash on a 100k+ row dataset, answered by Gemini 3 Flash, covering various domains and concepts so it's really diverse, could you please give advice?

Well, one thing that's great about synthetic data is that it can be specialized. I would build a 50k dataset with Claude 4.6 Sonnet for maybe the ultimate Rust LLM.
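As a sketch of how such a specialized synthetic set can be generated (the model ID, prompt list, and file name here are illustrative placeholders, not a recipe from this thread):

```python
# Generate synthetic SFT rows by asking an API model to answer your prompts.
# Model ID, prompts, and output path are placeholders for illustration.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompts = [
    "Explain ownership and borrowing in Rust with a small worked example.",
    # ...thousands more prompts covering your target domain
]

with open("synthetic_rust_sft.jsonl", "w") as f:
    for prompt in prompts:
        reply = client.messages.create(
            model="claude-sonnet-4-5",  # placeholder ID; use whichever model you have access to
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        )
        row = {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reply.content[0].text},
        ]}
        f.write(json.dumps(row) + "\n")
```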

What about SFT? Or just logic-specialized? If I train it on logical samples and include general conversations too, will it improve on all benchmarks?

More than likely you will see gains and losses depending on how diverse your dataset is, especially with tool calling.

My recommendation is to pick a use case that you want your tuned model to excel at, then build your dataset with that in mind. It's still good to include other examples, but a prompt distribution skewed toward that domain will help the model specialize (see the mixing sketch below). Then train from there.

Improvement across all benchmarks would require lots of different post-training techniques and lots of diverse data.
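A minimal sketch of what a skewed mix can look like, assuming the Hugging Face `datasets` library and JSONL files whose names are made up for illustration:

```python
# Skew the training mix toward the target domain while keeping some general data.
# File names and ratios are illustrative, not a recommendation from this thread.
from datasets import load_dataset, interleave_datasets

target = load_dataset("json", data_files="target_domain.jsonl", split="train")
general = load_dataset("json", data_files="general_chat.jsonl", split="train")

# ~80% target-domain examples, ~20% general conversation.
mixed = interleave_datasets([target, general], probabilities=[0.8, 0.2], seed=42)
```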

I have about 300 domains and 6,000 concepts (20 concepts for each domain), with about 15 rows per concept at different levels (medium, hard, and extreme), about 80k rows in total. But all are highly complex. The domains cover logic, strategy, math, science, complex code, money and profit, legal, psychology and emotional intelligence, opportunity finding and control, resource optimization and efficiency, and more. I also have 10,000 rows of just chit-chat with imperfect, human-like simple prompts, and 20,000 general-purpose rows. So is it better to fine-tune on the full high-quality, diverse 100k rows, or decrease them to 40k, for maximum effect on reasoning and most of the benchmarks?

The more data, the better the results, in my experience. Sounds like you've thought this through and have a good prompt distribution. Ideally you see improvements in all benchmarks, but there's no way to know how it turns out until you run training and experiment!

Good luck!

Thanks so much for the advice, but that's the problem: I won't be sure the model turns out good after that much data.

I'm sure it will. I recommend a larger effective batch size (batch size * gradient accumulation) so you don't kill the model with SFT, but there are plenty of examples of models turning out great when trained on large datasets.
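For the batch-size point, here is a minimal sketch of what that can look like with TRL's `SFTConfig`; the specific numbers are illustrative placeholders, not a prescription from this thread.

```python
# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps.
# The values below are placeholders; raise or lower them to fit your VRAM.
from trl import SFTConfig

config = SFTConfig(
    output_dir="glm-sft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,   # effective batch of 64 sequences per optimizer step
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
)
```

A larger effective batch averages the gradient over more examples per step, which smooths updates and makes aggressive SFT less likely to wreck the base model's general abilities.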

Depending on the size of the model, I can run benchmarks for you after training.

I will try to train GLM 4.7 Flash in Google Colab with an 80 GB A100, but I'm not sure if it's enough for a 32k context window and a dataset of about 400 million tokens.
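A rough way to sanity-check that scale (back-of-envelope only, with assumed numbers, and not a guarantee anything fits in 80 GB):

```python
# Back-of-envelope: what 400M tokens means at a 32k context window.
# All numbers below are assumptions for illustration.
total_tokens = 400_000_000
seq_len = 32_768                      # 32k context window
effective_batch = 32                  # batch size * gradient accumulation (assumed)

sequences = total_tokens / seq_len             # ~12,200 packed sequences
steps_per_epoch = sequences / effective_batch  # ~380 optimizer steps
print(f"~{sequences:,.0f} sequences, ~{steps_per_epoch:,.0f} steps per epoch")
```

Whether a 32k window fits in 80 GB depends mostly on the model size, quantization, and gradient checkpointing rather than on dataset size; the dataset size mainly determines how long training takes.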
