Nanbeige4.1-3B for Apple Neural Engine (ctx4096)

CoreML conversion of Nanbeige/Nanbeige4.1-3B for on-device inference on Apple Neural Engine using ANEMLL.

Available Context Lengths

Variant   Context   Use Case
ctx512    512       Quick Q&A, simple tasks
ctx1024   1024      Tool calls, short conversations
ctx2048   2048      Multi-turn chat
ctx4096   4096      Long conversations, complex tool workflows

Model Details

Parameter        Value
Source Model     Nanbeige/Nanbeige4.1-3B
Architecture     LLaMA
Context Length   4096
Batch Size       16
Chunks           3
LUT (FFN)        6-bit, per_channel=4
LUT (LM Head)    6-bit, per_channel=4

Recommended Sampling

Parameter        Value
Temperature      0.6
Top-p            0.95
Repeat penalty   1.0
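
To make these settings concrete, here is a minimal sketch of temperature plus top-p (nucleus) sampling over a list of raw logits, using only the Python standard library. The function name `sample_token` and its signature are illustrative assumptions, not the code used by chat.py. A repeat penalty of 1.0 leaves the logits unchanged, so it is omitted here.

```python
import math
import random

def sample_token(logits, temperature=0.6, top_p=0.95, rng=random):
    """Pick a token id from raw logits (hypothetical helper, not from chat.py)."""
    if temperature <= 0:
        # Treat non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample from the kept tokens in proportion to their probabilities.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which can help when comparing outputs across context variants.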

Requirements

  • macOS 15 (Sequoia) or later with Apple Neural Engine
  • 8 GB RAM or more
  • Python 3.9+ with coremltools>=9.0 and transformers

Quick Start

# Install dependencies
pip install "coremltools>=9.0" transformers huggingface_hub

# Clone the model
git lfs install
git clone https://huggingface.co/anemll/anemll-Nanbeige-Nanbeige4.1-3B-ctx4096_0.3.5
cd anemll-Nanbeige-Nanbeige4.1-3B-ctx4096_0.3.5

# Extract CoreML model files
find . -type f -name "*.zip" -exec unzip {} \;

# Run chat
python chat.py --meta ./meta.yaml

For conversation mode with history:

python chat_full.py --meta ./meta.yaml

Note: The first load takes a while because macOS has to place the model on the ANE. Subsequent loads are much faster. Press Ctrl-D to exit or Ctrl-C to interrupt.

Tool Calling

Nanbeige4.1-3B natively supports tool/function calling. The model uses <tool_call> tags to invoke functions.

ctx4096 provides ample context for complex tool calling with multiple tools and long <think> reasoning.

Live Demo

The included test_tool_call.py runs a full tool-calling loop on ANE with live HTTP calls:

  1. Model decides which tool to call
  2. Script executes the tool (real API request)
  3. Model summarizes the result

# Weather in San Francisco (default)
python test_tool_call.py --meta ./meta.yaml

# Custom city
python test_tool_call.py --meta ./meta.yaml --question "Weather in Paris?"
python test_tool_call.py --meta ./meta.yaml --question "Weather in Tokyo?"

The get_weather tool fetches live data from open-meteo.com (free, no API key).
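
As an illustration of what such a tool might do, the sketch below builds an Open-Meteo forecast URL and reduces a response payload to the fields seen in the sample result in this section. It is not the code in test_tool_call.py: the helper names `forecast_url` and `summarize_current`, the selected variables, and the small weather-code table are assumptions, and the caller is assumed to already know the coordinates of the location.

```python
from urllib.parse import urlencode

# A small subset of the WMO weather codes returned by Open-Meteo.
WMO_CONDITIONS = {0: "Clear sky", 1: "Mainly clear", 2: "Partly cloudy", 3: "Overcast"}

def forecast_url(latitude, longitude):
    """Build a current-conditions request URL (hypothetical helper)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m,relative_humidity_2m,weather_code",
        "temperature_unit": "fahrenheit",
    }
    return "https://api.open-meteo.com/v1/forecast?" + urlencode(params)

def summarize_current(location, payload):
    """Reduce a decoded JSON response to a compact result for the model."""
    current = payload["current"]
    code = current["weather_code"]
    return {
        "location": location,
        "temp_f": current["temperature_2m"],
        "humidity": current["relative_humidity_2m"],
        # Fall back to the raw code when it is not in the subset above.
        "condition": WMO_CONDITIONS.get(code, f"WMO code {code}"),
    }
```

The payload itself could be fetched with `urllib.request.urlopen` and decoded with `json.load`. Keeping the network call separate from the parsing makes the parsing step easy to check offline.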

Example output:

[Step 3] Parsed: get_weather({"location": "San Diego"})

[Step 4] Executing 'get_weather' (live HTTP call)...
  Result: {"location": "San Diego", "temp_f": 57.3, "humidity": 78, "condition": "Partly cloudy"}

[Step 5] Second inference (model summarizes result)...

  Final answer:
  ──────────────────────────────────────────────────
  The current weather in San Diego is:
  - **Temperature**: 57.3°F
  - **Conditions**: Partly cloudy
  - **Humidity**: 78%

How It Works

The model receives tools as JSON schemas in the system prompt using ChatML format:

<|im_start|>system
You are a helpful assistant with access to tools.
<tools>
{"type": "function", "function": {"name": "get_weather", ...}}
</tools>
<|im_end|>
<|im_start|>user
What is the weather in San Francisco?<|im_end|>
<|im_start|>assistant
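
The prompt layout above can be assembled with a few lines of Python. This is a sketch that mirrors the text shown here; `build_prompt` is a hypothetical helper, and the `get_weather` schema in the usage note is an illustrative guess because the original schema is elided. In practice the tokenizer's chat template would normally produce this string.

```python
import json

def build_prompt(tools, question,
                 system="You are a helpful assistant with access to tools."):
    """Assemble a ChatML prompt with tool schemas (hypothetical helper)."""
    # One JSON schema per line inside the <tools> block.
    tool_lines = "\n".join(json.dumps(tool) for tool in tools)
    return (
        "<|im_start|>system\n"
        f"{system}\n"
        "<tools>\n"
        f"{tool_lines}\n"
        "</tools>\n"
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"{question}<|im_end|>\n"
        # Leave the assistant turn open so the model continues from here.
        "<|im_start|>assistant\n"
    )

# Illustrative schema only; the real one ships with test_tool_call.py.
EXAMPLE_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}
```

Calling `build_prompt([EXAMPLE_TOOL], "What is the weather in San Francisco?")` yields a string with the same structure as the prompt above.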

The model responds with a tool call:

<think>
The user wants weather info. I should call get_weather.
</think>

<tool_call>
{"name": "get_weather", "arguments": {"location": "San Francisco"}}
</tool_call>
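
A response in this shape can be parsed with a regular expression and `json.loads`. The following is a minimal sketch, assuming each `<tool_call>` block holds one JSON object with `name` and `arguments` keys as in the example above; `parse_tool_calls` is a hypothetical name, and the parser in test_tool_call.py may differ.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def parse_tool_calls(text):
    """Return a list of (name, arguments) pairs found in model output."""
    # Drop the reasoning block so tags mentioned inside it are ignored.
    visible = THINK_RE.sub("", text)
    calls = []
    for match in TOOL_CALL_RE.finditer(visible):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # Skip malformed JSON rather than crash the loop.
        if isinstance(call, dict) and "name" in call:
            calls.append((call["name"], call.get("arguments", {})))
    return calls
```

An empty list means the model answered directly, so the caller can print the text without running any tool.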

The tool result is fed back as a <tool_response>, and the model generates a natural language summary.
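
The feedback step might look like the sketch below, which runs a parsed call against a registry of Python callables and wraps the JSON result in `<tool_response>` tags. The `run_tool` helper, the registry dictionary, and the newline layout inside the tags are assumptions; how the wrapped block is placed into the next turn is decided by the model's chat template and is not shown here.

```python
import json

def run_tool(name, arguments, registry):
    """Execute a registered tool and wrap its result (hypothetical helper)."""
    func = registry.get(name)
    if func is None:
        result = {"error": f"unknown tool: {name}"}
    else:
        try:
            result = func(**arguments)
        except Exception as exc:
            # Report the failure to the model instead of aborting the loop.
            result = {"error": str(exc)}
    return "<tool_response>\n" + json.dumps(result) + "\n</tool_response>"
```

Returning errors as data lets the second inference explain what went wrong in natural language instead of failing silently.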

iOS/macOS App

Try the sample Chat-Bot app on TestFlight:

  1. Install TestFlight from the App Store
  2. Join beta: TestFlight Link
  3. Add custom models via HuggingFace URLs

License

  • ANEMLL conversion pipeline: MIT License
  • Source model (Nanbeige4.1-3B): Apache 2.0
