
Checkpoints

#15
by borgr - opened

There are multiple checkpoints inside the OLMo-7B repo. How can one of them have the LR annealed to 0 while a later one in the same repo does not? And what does that imply about the rest of the checkpoints in the repo?

Hi @borgr, for the revisions from step 0 to step 556k we follow a linear LR schedule, and then in the last 1000 steps we anneal the LR to 0. We found this to be better for the performance of the final model.
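
For illustration, here is a minimal sketch of the shape of that schedule (a linear phase followed by a short anneal to 0). The peak and end LR values and the helper name are placeholders, not the actual OLMo-7B training config:

```python
# Sketch of the described schedule: a linear LR schedule over the main run
# (steps 0..556k), then annealing to 0 over the final ~1000 steps.
# PEAK_LR and END_LR are illustrative placeholders, not the real OLMo config.
PEAK_LR = 3.0e-4      # placeholder peak learning rate
END_LR = 3.0e-5       # placeholder LR reached at the end of the linear phase
MAIN_STEPS = 556_000  # steps following the linear schedule
ANNEAL_STEPS = 1_000  # final steps annealed to 0

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under this sketch."""
    if step <= MAIN_STEPS:
        # Linear interpolation from PEAK_LR to END_LR over the main run.
        frac = step / MAIN_STEPS
        return PEAK_LR + frac * (END_LR - PEAK_LR)
    # Final phase: linear anneal from END_LR down to 0.
    frac = min(1.0, (step - MAIN_STEPS) / ANNEAL_STEPS)
    return END_LR * (1.0 - frac)
```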

I don't think I put the question well.

The differences between those checkpoints are still unclear to me, specifically the ones that are part of allenai/OLMo-7B: how can the non-annealed one be the one with more tokens, batches, and steps?
[attached screenshot: image.png]

@borgr This might make it clearer:

| Revision | Tokens | LR schedule |
| --- | --- | --- |
| OLMo-7B step452k | 2T | following linear schedule (not annealed) |
| OLMo-7B step556k | 2.460T | still following linear schedule (not annealed) |
| OLMo-7B step557k (main) | 2.464T | LR annealed to 0 |
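
A minimal sketch of loading one of these revisions with the `revision` argument of `from_pretrained` (the revision string below is an illustrative placeholder; use a branch name actually listed in the repo):

```python
# Load a specific intermediate checkpoint (revision) of OLMo-7B.
# The revision string is illustrative; pick one of the branches listed
# under "Files and versions" in the allenai/OLMo-7B repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

REVISION = "step452000-tokens2007B"  # placeholder branch name, not verified

tokenizer = AutoTokenizer.from_pretrained(
    "allenai/OLMo-7B", revision=REVISION, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B", revision=REVISION, trust_remote_code=True
)

# Omitting `revision` uses "main", which points at the final,
# LR-annealed checkpoint (step 557k).
annealed_model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B", trust_remote_code=True
)
```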

Maybe then add something to the name and the note that makes the difference between the second and third rows clear?

Ai2 org

Hi, thanks again for the inquiry! We’re currently working on closing out old tickets, so we’re closing this out for now, but if you require a follow-up response, please re-open this ticket or a new one and we will get back to you!

baileyk changed discussion status to closed
