Update README.md
# Testing Environment

- Frontend is the staging version of Silly Tavern.
- Backend is the latest version of KoboldCPP for Windows using CUDA 12 (see the sketches after this list).
- Using **CuBLAS** but **not using QuantMatMul (mmq)**.
- **7-10B Models:**
  - All models are loaded in Q8_0 (GGUF).
  - All models are extended to **16K context length** (auto rope from KCPP).
  - **Flash Attention** and **ContextShift** are enabled.
- **11-15B Models:**
  - All models are loaded in Q4_K_M or the highest/closest quant available (GGUF).
  - All models are extended to **12K context length** (auto rope from KCPP).
  - **Flash Attention** and **8-bit cache compression** are enabled.
- Response size is set to 1024 tokens max.
- Fixed seed for all tests: **123**
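
To make the backend settings above concrete, here is a minimal sketch of how they could map onto a KoboldCPP launch for a 7-10B model. The flag names (`--usecublas`, `--contextsize`, `--flashattention`, `--quantkv`) and the model filename are assumptions based on recent KoboldCPP builds, not the exact command used for these tests.

```python
# Hypothetical KoboldCPP launch mirroring the 7-10B settings above.
# Flag names may differ between KoboldCPP releases; the model path is a placeholder.
import subprocess

cmd = [
    "koboldcpp.exe",
    "--model", "some-7b-model.Q8_0.gguf",  # placeholder: Q8_0 GGUF for 7-10B models
    "--usecublas",                         # CuBLAS; 'mmq' simply not passed (no QuantMatMul)
    "--contextsize", "16384",              # 16K context; KCPP auto-ropes the model
    "--flashattention",                    # Flash Attention on
    # ContextShift is KoboldCPP's default behaviour, so no extra flag is shown here.
    # For 11-15B models the run would instead use a Q4_K_M file,
    # "--contextsize", "12288", and "--quantkv", "1" (8-bit KV cache compression).
]
subprocess.run(cmd, check=True)
```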
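
The per-request settings (the 1024-token response cap and the fixed seed) are applied on every test. As an illustration only, the snippet below issues an equivalent request directly against KoboldCPP's KoboldAI-style `/api/v1/generate` endpoint; the field names and the placeholder prompt are assumptions, since the real prompts are built and sent by Silly Tavern.

```python
# Illustrative direct request to a locally running KoboldCPP instance.
# Field names follow the KoboldAI-style API that KoboldCPP exposes, but exact
# names may vary by version; the prompt below is a placeholder.
import requests

payload = {
    "prompt": "<character card + chat history as rendered by Silly Tavern>",
    "max_context_length": 16384,  # 16K for 7-10B models (12K for 11-15B)
    "max_length": 1024,           # response size capped at 1024 tokens
    "sampler_seed": 123,          # fixed seed used for every test
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```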