How do you fine-tune?

#8
by Roman1111111 - opened

How do you fine-tune? In what precision? What GPU do you use, and what context window? I would like to know about this, as I also work on a similar project. Could you please help me with that?

I also have a question about that

TeichAI org

How I fine-tune is a big question. We put some docs together to help people replicate what we do, although things can vary depending on how much compute you have at your disposal.

https://docs.teichai.com
Note: the "open in colab" buttons don't work at the moment as we haven't made the notebooks yet

For tuning something like GLM Flash you want at the very least 64 GB of VRAM.
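For reference, here is a minimal sketch of the kind of memory-friendly QLoRA-style setup Unsloth supports; the checkpoint name, sequence length, and LoRA settings are placeholders for illustration, not the exact GLM Flash recipe.

```python
# QLoRA-style setup with Unsloth (sketch only; checkpoint and hyperparameters
# are placeholders, not the exact configuration used for GLM Flash).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # stand-in checkpoint; swap in the model you are tuning
    max_seq_length=8192,                       # shorter context keeps activation memory down
    load_in_4bit=True,                         # 4-bit weights are what make large models fit
)

# Attach LoRA adapters so only a small fraction of parameters is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```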

Well, what GPUs do you use?

TeichAI org

Again, it depends on the model, but for models like this I use the 80 GB A100 on Colab.

What is the max I could fine-tune with my 5070 Ti?

TeichAI org

Same GPU I use. Max of about 14B (depending on how optimized the model is in Unsloth).
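As a rough back-of-envelope (illustrative numbers, not measurements): a 14B model in 4-bit weights already takes up roughly half of a 16 GB card before activations and adapter states.

```python
# Rough VRAM budget for QLoRA on a 16 GB GPU; all numbers are assumptions.
params = 14e9                         # 14B parameters
weights_gb = params * 0.5 / 1e9       # ~0.5 bytes/param in 4-bit -> ~7 GB
lora_gb = 0.3                         # adapters + their optimizer states (assumed small)
overhead_gb = 1.5                     # CUDA context and framework buffers (assumed)
headroom_gb = 16 - weights_gb - lora_gb - overhead_gb
print(f"weights ~{weights_gb:.1f} GB, ~{headroom_gb:.1f} GB left for activations")
# Roughly 7 GB of headroom, which is why ~14B is about the ceiling on this card.
```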

Is it worth it? If I fine-tune GLM 4.7 Flash on a 100k+ row dataset, answered by Gemini 3 Flash, covering various domains and concepts so it's really diverse, could you please give advice?

Well, one thing that's great about synthetic data is that it can be specialized. I would build a 50k dataset with Claude 4.6 Sonnet for maybe the ultimate Rust LLM.
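As a sketch of how such a specialized synthetic set can be generated (the model ID, prompt list, and file name here are illustrative placeholders, not a recipe from this thread):

```python
# Generate synthetic SFT rows by asking an API model to answer your prompts.
# Model ID, prompts, and output path are placeholders for illustration.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompts = [
    "Explain ownership and borrowing in Rust with a small worked example.",
    # ...thousands more prompts covering your target domain
]

with open("synthetic_rust_sft.jsonl", "w") as f:
    for prompt in prompts:
        reply = client.messages.create(
            model="claude-sonnet-4-5",  # placeholder ID; use whichever model you have access to
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        )
        row = {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reply.content[0].text},
        ]}
        f.write(json.dumps(row) + "\n")
```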

What about SFT? Or just logic-specialized? If I train it on logical samples and include general conversations too, will it improve on all benchmarks?

More than likely you will see gains and losses depending on how diverse your dataset is, especially with tool calling.

My recommendation is to pick a use case that you want your tuned model to excel at, then build your dataset with that in mind. It's still good to include other examples, but a prompt distribution skewed toward that domain will help the model specialize (see the mixing sketch below). Then train from there.

Improvement across all benchmarks would require lots of different post-training techniques and lots of diverse data.
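A minimal sketch of what a skewed mix can look like, assuming the Hugging Face `datasets` library and JSONL files whose names are made up for illustration:

```python
# Skew the training mix toward the target domain while keeping some general data.
# File names and ratios are illustrative, not a recommendation from this thread.
from datasets import load_dataset, interleave_datasets

target = load_dataset("json", data_files="target_domain.jsonl", split="train")
general = load_dataset("json", data_files="general_chat.jsonl", split="train")

# ~80% target-domain examples, ~20% general conversation.
mixed = interleave_datasets([target, general], probabilities=[0.8, 0.2], seed=42)
```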

I have about 300 domains and 6,000 concepts (20 concepts for each domain), with about 15 rows per concept at different levels (medium, hard, and extreme), about 80k rows in total. But all are highly complex. The domains cover logic, strategy, math, science, complex code, money and profit, legal, psychology and emotional intelligence, opportunity finding and control, resource optimization and efficiency, and more. I also have 10,000 rows of just chit-chat with imperfect, human-like simple prompts, and 20,000 general-purpose rows. So is it better to fine-tune on the full high-quality, diverse 100k rows, or decrease them to 40k, for maximum effect on reasoning and most of the benchmarks?

The more data, the better the results, in my experience. Sounds like you've thought this through and have a good prompt distribution. Ideally you see improvements in all benchmarks, but there's no way to know how it turns out until you run training and experiment!

Good luck!

Thanks so much for the advice, but that's the problem: I won't be sure the model turns out good after that much data.

I'm sure it will. I recommend a larger effective batch size (batch size * gradient accumulation) so you don't kill the model with SFT, but there are plenty of examples of models turning out great when trained on large datasets.
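For the batch-size point, here is a minimal sketch of what that can look like with TRL's `SFTConfig`; the specific numbers are illustrative placeholders, not a prescription from this thread.

```python
# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps.
# The values below are placeholders; raise or lower them to fit your VRAM.
from trl import SFTConfig

config = SFTConfig(
    output_dir="glm-sft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,   # effective batch of 64 sequences per optimizer step
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
)
```

A larger effective batch averages the gradient over more examples per step, which smooths updates and makes aggressive SFT less likely to wreck the base model's general abilities.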

Depending on the size of the model, I can run benchmarks for you after training.

I will try to train GLM 4.7 Flash in Google Colab with an 80 GB A100, but I'm not sure if it's enough for a 32k context window and a dataset of about 400 million tokens.
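A rough way to sanity-check that scale (back-of-envelope only, with assumed numbers, and not a guarantee anything fits in 80 GB):

```python
# Back-of-envelope: what 400M tokens means at a 32k context window.
# All numbers below are assumptions for illustration.
total_tokens = 400_000_000
seq_len = 32_768                      # 32k context window
effective_batch = 32                  # batch size * gradient accumulation (assumed)

sequences = total_tokens / seq_len             # ~12,200 packed sequences
steps_per_epoch = sequences / effective_batch  # ~380 optimizer steps
print(f"~{sequences:,.0f} sequences, ~{steps_per_epoch:,.0f} steps per epoch")
```

Whether a 32k window fits in 80 GB depends mostly on the model size, quantization, and gradient checkpointing rather than on dataset size; the dataset size mainly determines how long training takes.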
