:::   :::   ::::::::::: ::::::::  :::::::::   ::::::::  :::::::::
:+:+: :+:+:      :+:    :+:    :+: :+:    :+: :+:    :+: :+:    :+:
+:+ +:+:+ +:+     +:+    +:+        +:+    +:+ +:+    +:+ +:+    +:+
+#+  +:+  +#+     +#+    +#+        +#++:++#:  +#+    +:+ +#+    +:+
+#+       +#+     +#+    +#+        +#+    +#+ +#+    +#+ +#+    +#+
#+#       #+#     #+#    #+#    #+# #+#    #+# #+#    #+# #+#    #+#
###       ###  ########### ########  ###    ###  ########  #########

MICROD v1.0 (micro-distill-grpo-vae)

This model was made with the Micro Distillery app available at:

webxos.netlify.app/MICROD

- Model Distillation Training: Simulate GRPO optimization with VAE filtering for small LLMs (42M-345M params).
- Policy Experimentation: Test group sizes, KL penalties, and cache reuse for RLHF-like training.
- VAE Filtering: Apply latent-space compression to improve distillation quality.
- Sandbox Testing: Execute safe Python code with feedback masking.
- Export & Deployment: Generate deployable models for inference in various frameworks.
- Offline Usage: The PWA supports offline training simulation and exports.
by webXOS

Model Description

This is a distilled language model trained using Group Relative Policy Optimization (GRPO) with VAE filtering. MICROD v1.0 (micro-distill-grpo-vae) is a small template model designed to be built upon in custom ground-up builds: it is distilled into a small set of files that users can use to template their own agents. It is designed for educational learning and micro-scaling. Use MICROD v1.0 (micro-distill-grpo-vae) in your own custom projects and train it from the ground up.

Model Details

  • Model type: micro-distill-grpo-vae
  • Model size: 42M parameters
  • Language: English
  • License: Apache 2.0

Training Methodology

  • GRPO (Group Relative Policy Optimization): 8 groups (see the first sketch below)
  • VAE Filtering: 32D latent space (see the second sketch below)
  • KV-Cache Reuse: 512 cache size
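
As a point of reference, here is a minimal sketch of the group-relative advantage and KL-penalized loss that GRPO is built around, using the group size of 8 listed above. The reward values, penalty weight, and function names are illustrative assumptions, not the Micro Distillery app's actual implementation.

import torch

def grpo_advantages(rewards, eps=1e-6):
    # Group-relative advantages: normalize each completion's reward against
    # the mean and std of its own group. rewards: (num_prompts, group_size).
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_loss(logprobs, ref_logprobs, rewards, kl_coef=0.1):
    # REINFORCE-style objective on group-relative advantages, plus a simple
    # KL penalty that keeps the policy close to the reference (teacher) model.
    advantages = grpo_advantages(rewards)
    kl = logprobs - ref_logprobs
    return -(advantages * logprobs).mean() + kl_coef * kl.mean()

# Toy example: 4 prompts, 8 sampled completions per prompt (group size above)
rewards = torch.randn(4, 8)                       # scalar reward per completion
logprobs = torch.randn(4, 8, requires_grad=True)  # sequence log-probs under the policy
ref_logprobs = torch.randn(4, 8)                  # sequence log-probs under the reference
loss = grpo_loss(logprobs, ref_logprobs, rewards)
loss.backward()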

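Likewise, a sketch of the VAE filtering idea: hidden states are compressed into the 32D latent space noted above, and only samples that a trained VAE reconstructs well are kept for distillation. The layer shapes other than the 32D latent, the threshold, and all names here are illustrative assumptions.

import torch
import torch.nn as nn

class FilterVAE(nn.Module):
    # Tiny VAE: compresses 512-dim hidden states into a 32D latent and back.
    def __init__(self, hidden_size=512, latent_dim=32):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, 2 * latent_dim)  # predicts mean and log-variance
        self.decoder = nn.Linear(latent_dim, hidden_size)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, logvar

def filter_batch(vae, hidden_states, threshold=1.0):
    # Keep only the samples the (trained) VAE reconstructs well, i.e. those
    # close to the latent manifold of good distillation examples.
    recon, _, _ = vae(hidden_states)
    errors = ((recon - hidden_states) ** 2).mean(dim=-1)
    return hidden_states[errors < threshold]

vae = FilterVAE()
batch = torch.randn(16, 512)      # e.g. pooled hidden states, one row per sample
kept = filter_batch(vae, batch)
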
Architecture Details

  • Hidden size: 512
  • Number of layers: 8
  • Attention heads: 8
  • Vocabulary size: 50257
  • Maximum sequence length: 1024
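
The hyperparameters above correspond to a small GPT-2-style decoder-only model. The sketch below instantiates an untrained model of roughly this shape using transformers' GPT2Config; it is only an illustration of the listed sizes, not the actual MICROD architecture class, and the exact parameter count depends on details (feed-forward width, weight tying) that are not specified here.

from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50257,   # vocabulary size
    n_positions=1024,   # maximum sequence length
    n_embd=512,         # hidden size
    n_layer=8,          # number of layers
    n_head=8,           # attention heads
)
model = GPT2LMHeadModel(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")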

Usage

Using Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (pass the local export directory, or the
# Hub repo id such as "webxos/microd_v1" if the model is hosted there)
model = AutoModelForCausalLM.from_pretrained("micro-distill-grpo-vae")
tokenizer = AutoTokenizer.from_pretrained("micro-distill-grpo-vae")

# Tokenize a prompt and generate a continuation
# (max_length counts the prompt tokens plus the newly generated ones)
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
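
The same export can also be used through the transformers pipeline API; the model path below is the same placeholder as above.

from transformers import pipeline

# Text-generation pipeline over the same export (path or repo id as above)
generator = pipeline("text-generation", model="micro-distill-grpo-vae")
print(generator("Hello, world!", max_length=50)[0]["generated_text"])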
