Update README.md
Simply put, I'm making my methodology to evaluate RP models public.
- All models are loaded in Q8_0 (GGUF) with all layers offloaded to the GPU (NVidia RTX 3060 12GB).
- Backend is the latest version of KoboldCPP for Windows using CUDA 12.
- Using **CuBLAS** but **not using QuantMatMul (mmq)**.
- 7-10B models:
  - All models are extended to **16K context length** (auto rope from KCPP).
  - **Flash Attention** and **ContextShift** are enabled.
- 11-15B models:
  - All models are extended to **12K context length** (auto rope from KCPP).
  - **Flash Attention** and **8-bit cache compression** are enabled.
- Frontend is the staging version of Silly Tavern.
- Response size is set to 1024 tokens max.
- Fixed seed for all tests: **123**
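As a rough sketch, the backend settings above could be reproduced with a KoboldCPP launch along these lines. This is an assumption-laden illustration, not the author's actual command: the model path is invented, and flag spellings (`--usecublas`, `--gpulayers`, `--contextsize`, `--flashattention`, `--quantkv`) should be verified against your installed build's `--help` output.

```shell
# Hypothetical launch for the 7-10B bracket (model path is illustrative).
# mmq is deliberately NOT passed after --usecublas, so CuBLAS runs
# without QuantMatMul, matching the settings listed above.
python koboldcpp.py \
  --model ./models/example-7b.Q8_0.gguf \
  --usecublas \
  --gpulayers 999 \
  --contextsize 16384 \
  --flashattention
```

For the 11-15B bracket, the context size would drop to 12288 and the 8-bit KV-cache compression would be requested (in recent KoboldCPP builds, assumed here to be `--quantkv 1`). The fixed seed of 123 is applied on the frontend side in Silly Tavern's sampler settings rather than at launch.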