---
language:
- en
- zh
- fr
- es
- de
- pt
- ru
- it
- ja
- ko
- vi
- ar
tags:
- pytorch
- text-generation
- causal-lm
- rwkv
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
- mlfoundations/dclm-baseline-1.0
- cerebras/SlimPajama-627B
- EleutherAI/pile
- bigcode/starcoderdata
- oscar-corpus/OSCAR-2301
---

# RWKV7-G1 "GooseOne" pure RNN reasoning model

**These are BASE models** (pretrained on web/code/synthetic data plus instruction/chat/reasoning data), suitable for post-training and fine-tuning (check https://huggingface.co/spaces/Jellyfish042/UncheatableEval to see their language-modeling performance).

More info & Gradio demo: https://rwkv.com/

For developers: https://github.com/BlinkDL/RWKV-LM

RWKV-7 pth => GGUF script: https://github.com/MollySophia/rwkv-mobile/blob/master/converter/convert_rwkv_pth_to_gguf.py

GGUF: https://huggingface.co/collections/shoumenchougou/rwkv7-g0a-g1a-gguf

Ollama GGUF: https://ollama.com/mollysama

Use rwkv pip package 0.8.29+ for RWKV-7 inference: https://pypi.org/project/rwkv/

Efficient inference project: https://github.com/BlinkDL/Albatross

RWKV APP: https://github.com/RWKV-APP/RWKV_APP (local inference on Android/iOS)

Please always use the **latest G#a# models** (better at everything).

```
Gxx = Data Version

G0x = less than 1 epoch, as training 1 epoch for a large model is expensive :(
G0 G0a G0a2 G0a3 ... G0b ... = adding more (newer and better) data, so G0a has better-quality (but less) data than G1

G1x = more than 1 epoch
G1 G1a G1a2 G1a3 ... G1b ... = adding more (newer and better) data, note G1a has better-quality (and more) data than G0a
```

Decoding parameters (note: these are for the RWKV pip package, which applies temperature after top-p; see the usage sketches at the end of this card):

```
Math: temp 0.3, topp 0.3, alpha_presence 0, alpha_frequency 0, alpha_decay 0.996
Chat: temp 1, topp 0.3, alpha_presence 0.5, alpha_frequency 0.5, alpha_decay 0.996
Creative (great for fiction etc.): temp 0.6, topp 0.6 ~ 0.8, alpha_presence 1 ~ 2, alpha_frequency 0.2, alpha_decay 0.99
```

**There must not be any space at the end of your input (so strip it), or you will upset the tokenizer and see non-English responses.**

Chat prompt (note: it is better to replace all \n\n in USER_PROMPT with \n, as I am using \n\n as the "chat round separator" in the pretraining data):

```
System: YOU_CAN_USE_SYSTEM_IF_NEEDED

User: PREVIOUS_STUFF

Assistant: PREVIOUS_STUFF

User: USER_PROMPT

Assistant:
```

Think prompt:

```
User: USER_PROMPT

Assistant:
```
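Putting the formatting rules above together, here is a minimal prompt-building sketch. The `build_chat_prompt` helper is hypothetical (it is not part of any RWKV package); it simply joins rounds with `\n\n`, collapses any `\n\n` inside the user text to `\n`, and never leaves trailing whitespace.

```python
# Hypothetical helper, not part of the rwkv package: builds a prompt in the chat
# format above. Rounds are joined with "\n\n", "\n\n" inside the user text is
# collapsed to "\n", and the result never ends with a space.

def build_chat_prompt(user_prompt, history=None, system=None):
    rounds = []
    if system:
        rounds.append("System: " + system)
    for user_turn, assistant_turn in (history or []):
        rounds.append("User: " + user_turn)
        rounds.append("Assistant: " + assistant_turn)
    # "\n\n" is reserved as the chat-round separator, so collapse it inside the turn
    cleaned = user_prompt.strip()
    while "\n\n" in cleaned:
        cleaned = cleaned.replace("\n\n", "\n")
    rounds.append("User: " + cleaned)
    rounds.append("Assistant:")
    return "\n\n".join(rounds)


if __name__ == "__main__":
    print(repr(build_chat_prompt("How many r's are in strawberry?")))
    # -> "User: How many r's are in strawberry?\n\nAssistant:"
```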
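As a worked example of the "Chat" decoding line above, the sketch below runs generation with the rwkv pip package. Treat it as an illustration under stated assumptions: the model path and prompt are placeholders, and the environment flags and `PIPELINE_ARGS` fields (including `alpha_decay`) should be checked against the version of the package you have installed.

```python
# Minimal inference sketch with the rwkv pip package (0.8.29+) using the "Chat"
# decoding settings listed above. Model path is a placeholder; verify PIPELINE_ARGS
# fields (e.g. alpha_decay) against your installed rwkv version.
import os

os.environ["RWKV_V7_ON"] = "1"    # enable RWKV-7 support (set before importing rwkv)
os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_CUDA_ON"] = "0"  # "1" compiles the CUDA kernel for faster inference

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model = RWKV(model="/path/to/rwkv7-g1.pth", strategy="cuda fp16")  # or "cpu fp32"
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # RWKV World tokenizer

chat_args = PIPELINE_ARGS(
    temperature=1.0,
    top_p=0.3,
    alpha_presence=0.5,
    alpha_frequency=0.5,
    alpha_decay=0.996,
)

# The prompt must follow the chat format above and must not end with a space.
prompt = "User: What is RWKV-7, in one paragraph?\n\nAssistant:"
output = pipeline.generate(prompt, token_count=300, args=chat_args)
print(output)
```

For the Math or Creative presets, swap in the corresponding values from the decoding table; everything else stays the same.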