SerialKicked
/

ModelTestingBed

SerialKicked committed on May 27, 2024

Commit 7e65b3f · verified · 1 Parent(s): f6fde87

Update README.md

Files changed (1)

README.md +2 -2

@@ -35,7 +35,7 @@ Simply put, I'm making my methodology to evaluate RP models public. While none o
 ### DoggoEval
-The goal of this test featuring Rex (a dog), and his master (EsKa) is to determine if a model is good at obeying a system prompt and character card. The trick being that dogs can't talk, but LLM love to.
+The goal of this test, featuring a dog (Rex) and his owner (EsKa), is to determine if a model is good at obeying a system prompt and character card. The trick being that dogs can't talk, but LLM love to.
 - [Results and discussions are hosted in this thread](https://huggingface.co/SerialKicked/ModelTestingBed/discussions/1) ([old thread here](https://huggingface.co/LWDCLS/LLM-Discussions/discussions/13))
 - [Files, cards and settings can be found here](https://huggingface.co/SerialKicked/ModelTestingBed/tree/main/DoggoEval)
@@ -51,6 +51,6 @@ TODO: The goal of this test is to check if a model is able of following a very s
 # Limitations
-I'm testing for things I'm interested in. Do not ask for ERP-specific tests. I do not pretend any of this is very scientific or accurate: as much as I try to reduce the amount of variables, a small LLM is still a small LLM at the end of the day. The results for other seeds, or with the smallest of change, are bound to give very different results.
+I'm testing for things I'm interested in. I do not pretend any of this is very scientific or accurate: as much as I try to reduce the amount of variables, a small LLM is still a small LLM at the end of the day. The results for other seeds, or with the smallest of change, are bound to give very different results.
 I usually give the different models I'm testing a fair shake in a more casual settings. I regen tons of outputs with random seeds, and while there are (large) variations, it tends to even out to the results shown in testing. Otherwise I'll make a note of it.