---
title: Kimi 48B Fine-tuned - Inference
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
suggested_hardware: l40sx4
---

# 🚀 Kimi Linear 48B A3B Instruct - Fine-tuned

Professional inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model.

## Model Information

## Features

### ✨ Professional Chat Interface

- Clean, modern UI for seamless conversations
- Chat history with copy functionality
- System prompt customization

### ⚙️ Advanced Generation Settings

- Temperature control for creativity
- Top-P and Top-K sampling
- Repetition penalty adjustment
- Configurable response length

### 🎮 Optimized Performance

- Multi-GPU support (4x L40S recommended)
- Automatic device mapping
- bfloat16 precision for efficiency
- ~96GB VRAM requirement
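The device mapping and precision settings above correspond to a standard Transformers loading call. A minimal sketch — the model ID is illustrative; substitute your fine-tuned checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model ID -- replace with your actual fine-tuned checkpoint.
MODEL_ID = "moonshotai/Kimi-Linear-48B-A3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # halves memory vs. float32 (~96GB total)
    device_map="auto",           # shards layers across all visible GPUs
    trust_remote_code=True,
)
```

With `device_map="auto"`, Accelerate places the model's layers across however many GPUs are visible, which is what makes the 4x L40S / 4x L4 configurations interchangeable.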

## Usage

1. **Click "Load Model"** - Initialize the model (takes 2-5 minutes)
2. **Set System Prompt** (optional) - Define the assistant's behavior
3. **Start Chatting** - Type your message and hit send
4. **Adjust Settings** - Fine-tune generation parameters as needed

## Generation Parameters

### Temperature (0.0 - 2.0)

- Low (0.1-0.5): Focused, deterministic responses
- Medium (0.6-0.9): Balanced creativity
- High (1.0-2.0): More creative and diverse outputs

### Top P (0.0 - 1.0)

- 0.9 (recommended): Good balance
- Lower values: More focused
- Higher values: More diverse

### Max New Tokens

- Maximum length of the generated response, in tokens
- 1024 (default): Good for most use cases
- Increase for longer responses
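Conceptually, temperature rescales the model's logits before the softmax, and top-p keeps only the smallest "nucleus" of tokens whose cumulative probability reaches the threshold. A self-contained toy sketch of that interaction (not the Space's actual sampling code — Transformers handles this internally during `generate()`):

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Toy illustration of temperature and top-p (nucleus) sampling.

    logits: dict mapping token -> raw score from the model.
    """
    # Temperature rescales logits: <1 sharpens, >1 flattens the distribution.
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exp = {t: math.exp(l - m) for t, l in scaled.items()}  # stable softmax
    z = sum(exp.values())
    probs = {t: p / z for t, p in exp.items()}

    # Top-p filtering: keep the smallest high-probability set of tokens
    # whose cumulative mass reaches top_p, then renormalize over that set.
    kept, cum = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    z = sum(p for _, p in kept)
    tokens, weights = zip(*[(t, p / z) for t, p in kept])
    return rng.choices(tokens, weights=weights)[0]
```

At very low temperature the distribution collapses onto the highest-logit token, which is why low settings feel deterministic; lowering top-p shrinks the candidate set the same way.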

## Hardware Requirements

- Recommended: 4x NVIDIA L40S GPUs (192GB total VRAM)
- Minimum: 4x NVIDIA L4 GPUs (96GB total VRAM)
- Memory: ~96GB VRAM in bfloat16 precision

## Fine-tuning Details

This model was fine-tuned using QLoRA with the following configuration:

- LoRA Rank (r): 16
- LoRA Alpha: 32
- Target Modules: q_proj, k_proj, v_proj, o_proj (attention layers only)
- Dropout: 0.05
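In PEFT terms, the configuration above corresponds to a `LoraConfig` roughly like the following. This is a reconstruction from the listed values — the actual training script is not included in this Space:

```python
from peft import LoraConfig

# Reconstructed from the hyperparameters listed above (assumption:
# standard QLoRA setup with bias untouched, causal-LM task type).
lora_config = LoraConfig(
    r=16,                     # LoRA rank
    lora_alpha=32,            # scaling factor (alpha / r = 2.0)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```

Targeting only the attention projections keeps the adapter small relative to a 48B base model, which is what makes fine-tuning feasible on the same hardware tier used for inference.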

## Support

For issues or questions:


Built with ❤️ using Transformers and Gradio