---
license: mit
datasets:
  - >-
    CreitinGameplays/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B-filtered-mistral
language:
  - en
base_model:
  - mistralai/Mistral-Nemo-Instruct-2407
pipeline_tag: text-generation
library_name: transformers
---

## Chat template:

```
[SYSTEM]You are an AI focused on providing systematic, well-reasoned responses.
Response Structure:
- Format: {{reasoning}}{{answer}}
- Reasoning: Minimum 6 logical steps only when it required in block
- Process: Think first, then answer.[/SYSTEM]
[INST]{user_input}[/INST]
```

## Run the model:

```python
import os

import torch
import torch._dynamo
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, BitsAndBytesConfig

# Suppress torch.compile/dynamo errors that can surface with quantized models
torch._dynamo.config.suppress_errors = True
os.environ["TORCHDYNAMO_DISABLE"] = "1"

# Load the model in 8-bit to reduce VRAM usage; layers that do not fit on the GPU
# are offloaded to the CPU in fp32
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True
)

model_id = "CreitinGameplays/Mistral-Nemo-12B-R1-v0.2-exp"

# Initialize model and tokenizer with streaming support
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True)

# Custom streamer that prints tokens as they are generated and also
# collects the decoded text so it can be appended to the chat history
class CollectingStreamer(TextStreamer):
    def __init__(self, tokenizer):
        # skip_prompt=True keeps the echoed prompt out of the collected output
        super().__init__(tokenizer, skip_prompt=True, skip_special_tokens=True)
        self.output = ""

    def on_finalized_text(self, text: str, stream_end: bool = False):
        self.output += text
        super().on_finalized_text(text, stream_end=stream_end)  # prints the text as it arrives

print("Chat session started. Type 'exit' to quit.\n")

# Initialize chat history as a list of messages
chat_history = [
    {"role": "system", "content": "You are an AI assistant made by Mistral AI"}
]

while True:
    user_input = input("You: ")
    if user_input.strip().lower() == "exit":
        break

    # Append the user message to the chat history
    chat_history.append({"role": "user", "content": user_input})

    # Prepare the prompt by formatting the complete chat history
    inputs = tokenizer.apply_chat_template(
        chat_history,
        return_tensors="pt",
        add_special_tokens=False
    ).to(model.device)

    # Create a new streamer for the current generation
    streamer = CollectingStreamer(tokenizer)

    # Generate the streamed response
    model.generate(
        inputs,
        streamer=streamer,
        temperature=0.3,
        top_p=0.8,
        top_k=50,
        repetition_penalty=1.1,
        max_new_tokens=4096,
        do_sample=True
    )

    # The complete response text is collected in streamer.output
    response_text = streamer.output
    print("\nAssistant:", response_text)

    # Append the assistant response to the chat history
    chat_history.append({"role": "assistant", "content": response_text})
```

### Note:

This model was fine-tuned for only 2000 training steps.
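
## Minimal prompt example

For a quick, non-streaming sanity check, a prompt can also be built directly from the chat template shown above and passed to `model.generate`. The sketch below is illustrative rather than part of the original card: it assumes `model` and `tokenizer` have already been loaded as in the script above, and the example question and decoding settings are placeholders.

```python
# Minimal, non-streaming sanity check (illustrative sketch).
# Assumes `model` and `tokenizer` are already loaded as in the script above.
system_prompt = (
    "You are an AI focused on providing systematic, well-reasoned responses.\n"
    "Response Structure:\n"
    "- Format: {{reasoning}}{{answer}}\n"
    "- Reasoning: Minimum 6 logical steps only when it required in block\n"
    "- Process: Think first, then answer."
)
user_input = "How many r's are in the word strawberry?"  # placeholder question

# Build the prompt string following the chat template section above
prompt = f"[SYSTEM]{system_prompt}[/SYSTEM]\n[INST]{user_input}[/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

For multi-turn chat, prefer `tokenizer.apply_chat_template` as in the full script above, since it formats the whole history with the model's own template.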