---
license: apache-2.0
tags:
- llm
- tinyllama
- function-calling
- cpu-optimized
- low-resource
---

# TinyLlama Function Calling (CPU Optimized)

This is a CPU-optimized version of TinyLlama, fine-tuned for function calling.

## Model Details

- **Base Model**: TinyLlama-1.1B-Chat-v1.0
- **Parameters**: 1.1 billion
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Data**: Function calling examples from the Glaive Function Calling v2 dataset
- **Optimization**: Merged LoRA weights, converted to float32 for CPU deployment

## Key Features

1. **Function Calling**: The model can identify when a function should be called and generate the corresponding function-call syntax
2. **CPU Optimized**: Runs efficiently on low-end hardware without a GPU
3. **Lightweight**: Only 1.1B parameters, suitable for older hardware
4. **Low Resource Requirements**: Needs only 4-6 GB of RAM to load

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model
model = AutoModelForCausalLM.from_pretrained("tinyllama-function-calling-cpu-optimized")
tokenizer = AutoTokenizer.from_pretrained("tinyllama-function-calling-cpu-optimized")

# Example prompt for function calling
prompt = """### Instruction:
Given the available functions and the user query, determine which function(s) to call and with what arguments.

Available functions:
{
  "name": "get_exchange_rate",
  "description": "Get the exchange rate between two currencies",
  "parameters": {
    "type": "object",
    "properties": {
      "base_currency": {
        "type": "string",
        "description": "The currency to convert from"
      },
      "target_currency": {
        "type": "string",
        "description": "The currency to convert to"
      }
    },
    "required": [
      "base_currency",
      "target_currency"
    ]
  }
}

User query: What is the exchange rate from USD to EUR?

### Response:"""

# Tokenize and generate a response
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
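
Once you have the decoded text, the function call still has to be extracted from it. A minimal stdlib-only sketch, assuming the model emits the call as a JSON object after the `### Response:` marker (the exact output format depends on the fine-tuning data, so treat this as illustrative):

```python
import json
import re

def extract_function_call(generated_text):
    """Pull the first JSON object out of the text after '### Response:'.

    Returns the parsed call as a dict, or None if no valid JSON is found.
    """
    # Keep only the model's answer, discarding the echoed prompt.
    response = generated_text.split("### Response:")[-1]
    # Greedily match from the first '{' to the last '}' so nested
    # objects (e.g. an "arguments" dict) stay intact.
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

# Hypothetical model output, for illustration only.
sample = (
    "### Response:\n"
    '{"name": "get_exchange_rate", '
    '"arguments": {"base_currency": "USD", "target_currency": "EUR"}}'
)
call = extract_function_call(sample)
print(call["name"])       # get_exchange_rate
print(call["arguments"])  # {'base_currency': 'USD', 'target_currency': 'EUR'}
```

From here, dispatching to the real function is a dictionary lookup on `call["name"]` with `call["arguments"]` passed as keyword arguments.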

## Performance on Low-End Hardware

The CPU-optimized model requires approximately:
- 4-6 GB of RAM for loading
- 2-4 CPU cores for inference
- No GPU required
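
The RAM figure follows directly from the parameter count: a 1.1B-parameter model stored in float32 uses 4 bytes per parameter, so the weights alone occupy roughly 4.4 GB, with the rest of the 4-6 GB budget going to activations and runtime overhead. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope memory estimate for the float32 weights.
params = 1.1e9          # 1.1 billion parameters
bytes_per_param = 4     # float32 = 4 bytes per parameter
weight_gb = params * bytes_per_param / 1e9

print(f"weights alone: ~{weight_gb:.1f} GB")  # ~4.4 GB
```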

This makes it suitable for:
- Older laptops (2018 and newer)
- Low-end desktops
- Edge devices with ARM processors
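
On machines in this class, it can also help to cap the number of math-library threads at the core count before loading the model, since thread oversubscription slows CPU inference. A small stdlib-only sketch (these environment variables are honored by OpenMP/MKL-backed builds of PyTorch, and must be set before `torch` is imported):

```python
import os

# Cap math-library threads BEFORE importing torch/transformers; on
# 2-4 core machines, oversubscription can slow inference down.
n_threads = min(4, os.cpu_count() or 1)
os.environ["OMP_NUM_THREADS"] = str(n_threads)
os.environ["MKL_NUM_THREADS"] = str(n_threads)

print(f"limiting inference to {n_threads} threads")
```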

## Training Process

The model was fine-tuned using LoRA (Low-Rank Adaptation) on the Glaive Function Calling v2 dataset. Only a subset of 50 examples was used, for demonstration purposes.
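
LoRA trains two small low-rank matrices (A and B) per target weight instead of the full matrix; at export time their product is folded back into the base weight as W' = W + (α/r)·B·A, which is what the "merged LoRA weights" optimization step above refers to. A minimal NumPy sketch of the merge, with illustrative shapes and rank (real TinyLlama projection matrices are larger):

```python
import numpy as np

# Illustrative sizes, not TinyLlama's actual dimensions.
d_out, d_in, r, alpha = 512, 512, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in)).astype(np.float32)  # frozen base weight
A = rng.standard_normal((r, d_in)).astype(np.float32)      # LoRA "down" matrix
B = np.zeros((d_out, r), dtype=np.float32)                 # LoRA "up" matrix (init 0)
B[:, 0] = 1.0                                              # pretend training moved it

# Merge: W' = W + (alpha / r) * B @ A -- afterwards, inference
# needs no adapter and runs at the base model's speed.
W_merged = W + (alpha / r) * (B @ A)

full = W.size
lora = A.size + B.size
print(f"trainable params: {lora} vs full fine-tune: {full} "
      f"({100 * lora / full:.1f}%)")  # → 3.1%
```

Only A and B are updated during training, which is why LoRA fine-tuning fits on modest hardware even when full fine-tuning would not.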
|
| | ## License |
| |
|
| | This model is licensed under the Apache 2.0 license. |