Feels the same as the base model.

#1
by Dobratr - opened

In my search for a good 24B model I found your model, downloaded it blindly, and started testing it my own way for its roleplaying capabilities.
I usually test with a KoboldCpp + ST setup on my 16GB GPU.
I prefer 1024 Response (tokens) and 16384 Context (tokens) settings in ST.
So, I downloaded the Mag-Mell-R1-21B.i1-Q6_K version of your model.
Then I saw that your model is a merge of 4x MN-12B-Mag-Mell-R1.
I downloaded the MN-12B-Mag-Mell-R1.Q8_0 version, tested the two side by side, and couldn't find a noticeable difference between them.
You haven't given any information about the intention behind this merge.
I accept that I'm not an experienced enthusiast, but please give me some guidance on how to test it properly, or at least let me know what the purpose of merging the same model into itself is.
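
Roughly, the comparison I'm doing looks like this if scripted (I actually do it by hand in KoboldCpp + ST; the prompt here is just a placeholder, and the model paths are the two quants I downloaded):

```python
# Hypothetical side-by-side check with llama-cpp-python, mirroring my ST settings
# (16384-token context, 1024-token responses). Not my actual setup, just a sketch.
from llama_cpp import Llama

PROMPT = "<a roleplay opener and character card would go here>"  # placeholder

for path in ["MN-12B-Mag-Mell-R1.Q8_0.gguf", "Mag-Mell-R1-21B.i1-Q6_K.gguf"]:
    llm = Llama(model_path=path, n_ctx=16384, n_gpu_layers=-1, seed=42)
    out = llm.create_completion(PROMPT, max_tokens=1024, temperature=0.8)
    print(f"--- {path} ---")
    print(out["choices"][0]["text"])
```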

There is nothing wrong with your testing approach. This model is the result of what the community usually calls an "upscale" of the base model. That means the base model has been expanded to be bigger, but that's all that was done with it. There was no further training of the model, or at least there is no mention of it in the model card, and the model card itself seems to be a generic one generated by MergeKit, the tool used for merging models (and, in this case, upscaling them). Without further training of the upscaled model, there will be no difference in its output. The only difference is the size, so you're basically getting the same output quality from a model that is more demanding on your hardware, because it's bigger and requires more resources to load. As it stands, the model is more suitable for further training than for inference. If you're not interested in training it yourself and are only looking for a model you can run as is, you'd be better off using the base model, which is smaller and not as demanding as this one. I hope that helps.
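
To give a rough idea of what such an upscale looks like under the hood: a MergeKit "passthrough" recipe just stacks (partly overlapping) slices of layers taken from the same base model. This is not the actual recipe used for this model, and the layer ranges below are invented purely for illustration:

```python
# Minimal sketch of a MergeKit passthrough "upscale" config, emitted with PyYAML.
# The layer ranges are made up; the point is that layers of the same base model
# are copied and stacked, with no new training involved.
import yaml  # pip install pyyaml

base = "MN-12B-Mag-Mell-R1"  # the base model being upscaled

config = {
    "merge_method": "passthrough",  # layers are copied as-is, not averaged
    "dtype": "bfloat16",
    "slices": [
        {"sources": [{"model": base, "layer_range": [0, 16]}]},
        {"sources": [{"model": base, "layer_range": [8, 24]}]},
        {"sources": [{"model": base, "layer_range": [16, 32]}]},
        {"sources": [{"model": base, "layer_range": [24, 40]}]},
    ],
}

print(yaml.safe_dump(config, sort_keys=False))
```

The duplicated layers make the network deeper (hence the larger parameter count), but since every layer is an exact copy of weights the base model already had, the output doesn't improve until the upscaled model is trained further.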

@MrDevolver thanks for replying to my message. I couldn't understand why a user would want to use a more demanding model when they can get the same output from a more resource-friendly alternative. So, you are saying the 21B version is more suitable for further training. I'm also getting the hang of using these models and trying to understand how to train one.
But I'm a bit confused. Please tell me more about what makes a 21B model more suitable for training than a 12B model. In my logic, a 12B model can be trained on more data to become a 21B one with fewer resources than taking a 21B model and training it into something even bigger. Please let me know if my logic is wrong, or give me a link I can read to understand this better.
