---
license: gemma
language:
- en
base_model:
- google/functiongemma-270m-it
pipeline_tag: text-generation
tags:
- function-calling
- infrastructure
- devops
- litertlm
---
# FunctionGemma Infrastructure Tools v8
A fine-tuned [FunctionGemma 270M](https://huggingface.co/google/functiongemma-270m-it) model for infrastructure error diagnosis and remediation. Achieves **100% accuracy** across its 7 supported infrastructure tools when used with the exact tool definitions listed below.
## Model Details
- **Base Model**: google/functiongemma-270m-it
- **Format**: LiteRT-LM (.litertlm) - optimized for on-device inference
- **Quantization**: INT8 (Q8)
- **Size**: ~271MB
- **Training**: 50 epochs on 10,500 examples (1,500 per tool)
## Supported Tools
| Tool | Description | Use Case |
|------|-------------|----------|
| `enableCors` | Enable CORS for a specific origin | CORS policy errors, blocked cross-origin requests |
| `updateConnectionUrl` | Update service connection URL | ECONNREFUSED errors, localhost connection issues in containers |
| `setEnvVar` | Set environment variable | Missing configuration, undefined env vars |
| `addHostMapping` | Add hostname to IP mapping | DNS resolution (ENOTFOUND) errors |
| `increaseMemory` | Increase memory limit | OOMKilled errors, out of memory crashes |
| `increaseTimeout` | Increase timeout value | 504 Gateway Timeout, connection timeout errors |
| `restartService` | Restart a service | Stuck processes, stale data after deployment |
## Usage with LiteRT-LM
### Download the Model
```bash
# Using huggingface-cli
huggingface-cli download macmacmacmac/functiongemma-nextjs functiongemma-infra-v8_q8_ekv1024.litertlm
```

Or using Python:

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="macmacmacmac/functiongemma-nextjs",
    filename="functiongemma-infra-v8_q8_ekv1024.litertlm",
)
```
### Required Tool Definitions
**Important**: You must use these exact tool definitions for optimal accuracy. The model was trained with these specific descriptions.
```javascript
const tools = [
{
type: "function",
function: {
name: "enableCors",
description: "Enable CORS for a specific origin to fix blocked cross-origin requests.",
parameters: {
type: "object",
properties: {
origin: { type: "string", description: "The origin to allow (e.g., http://localhost:3000)" },
methods: { type: "string", description: "Allowed HTTP methods (e.g., GET,POST,PUT,DELETE)" }
},
required: ["origin"]
}
}
},
{
type: "function",
function: {
name: "updateConnectionUrl",
description: "Update a service connection URL to fix ECONNREFUSED errors, typically changing localhost to the correct service hostname.",
parameters: {
type: "object",
properties: {
service: { type: "string", description: "The service to update (e.g., database, redis, api)" },
hostname: { type: "string", description: "The correct hostname to connect to" },
port: { type: "integer", description: "The port number to connect to" }
},
required: ["service", "hostname", "port"]
}
}
},
{
type: "function",
function: {
name: "setEnvVar",
description: "Set an environment variable to fix missing configuration errors.",
parameters: {
type: "object",
properties: {
name: { type: "string", description: "Environment variable name (e.g., DATABASE_URL, API_KEY)" },
value: { type: "string", description: "The value to set" }
},
required: ["name", "value"]
}
}
},
{
type: "function",
function: {
name: "addHostMapping",
description: "Add a hostname to IP mapping to fix DNS resolution (ENOTFOUND) errors.",
parameters: {
type: "object",
properties: {
hostname: { type: "string", description: "The hostname to map" },
ip: { type: "string", description: "The IP address to map to" }
},
required: ["hostname", "ip"]
}
}
},
{
type: "function",
function: {
name: "increaseMemory",
description: "Increase memory limit for a service to fix OOMKilled errors.",
parameters: {
type: "object",
properties: {
service: { type: "string", description: "The service/container/pod name" },
memoryMb: { type: "integer", description: "Memory limit in megabytes" }
},
required: ["service", "memoryMb"]
}
}
},
{
type: "function",
function: {
name: "increaseTimeout",
description: "Increase timeout value to fix 504 Gateway Timeout or connection timeout errors.",
parameters: {
type: "object",
properties: {
service: { type: "string", description: "The service to configure" },
timeoutMs: { type: "integer", description: "Timeout value in milliseconds" }
},
required: ["service", "timeoutMs"]
}
}
},
{
type: "function",
function: {
name: "restartService",
description: "Restart a service to apply configuration changes or fix a stuck process.",
parameters: {
type: "object",
properties: {
service: { type: "string", description: "The service/container/pod name to restart" }
},
required: ["service"]
}
}
}
];
```
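Since accuracy depends on these exact definitions, a quick structural check before handing the array to the engine can catch accidental edits. This is a minimal sketch; `validateTools` is a hypothetical helper, not part of dad-express:

```typescript
// Shape of one entry in the tools array above.
type ToolDef = {
  type: string;
  function: {
    name: string;
    description: string;
    parameters: {
      type: string;
      properties: Record<string, { type: string; description?: string }>;
      required: string[];
    };
  };
};

// Return a list of problems: missing fields, or `required` keys
// that are not declared under `properties`.
function validateTools(tools: ToolDef[]): string[] {
  const errors: string[] = [];
  for (const tool of tools) {
    const fn = tool.function;
    if (tool.type !== "function" || !fn?.name || !fn.description) {
      errors.push(`malformed tool entry: ${fn?.name ?? "<unnamed>"}`);
      continue;
    }
    const props = Object.keys(fn.parameters.properties);
    for (const req of fn.parameters.required) {
      if (!props.includes(req)) {
        errors.push(`${fn.name}: required param "${req}" not in properties`);
      }
    }
  }
  return errors;
}
```

Running `validateTools(tools)` at startup and failing fast on a non-empty result is cheaper than debugging silently degraded tool-call accuracy.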
### Example Usage with dad-express
```javascript
const { FunctionGemmaEngine } = require('dad-express');

const engine = new FunctionGemmaEngine({
  modelPath: './functiongemma-infra-v8_q8_ekv1024.litertlm',
  tools: JSON.stringify(tools)
});

// Diagnose an error. Top-level await is not available in CommonJS,
// so wrap the call in an async function.
async function main() {
  const result = await engine.call('Container api was OOMKilled - out of memory');
  console.log(result.tool_calls[0].function);
  // { name: 'increaseMemory', arguments: { service: 'api', memoryMb: 1024 } }
}

main();
```
## Training Data
The model was trained on 10,500 synthetic examples covering common infrastructure errors:
| Error Pattern | Tool | Examples |
|--------------|------|----------|
| CORS policy errors | enableCors | 1,500 |
| ECONNREFUSED errors | updateConnectionUrl | 1,500 |
| Missing env vars | setEnvVar | 1,500 |
| DNS/ENOTFOUND errors | addHostMapping | 1,500 |
| OOMKilled errors | increaseMemory | 1,500 |
| Timeout errors | increaseTimeout | 1,500 |
| Stuck services | restartService | 1,500 |
### Sample Training Examples
```
"CORS error: No 'Access-Control-Allow-Origin' header from http://localhost:3000" → enableCors
"Error: connect ECONNREFUSED 127.0.0.1:5432 - database connection failed" → updateConnectionUrl
"Missing required environment variable: DATABASE_URL" → setEnvVar
"getaddrinfo ENOTFOUND db" → addHostMapping
"Container api was OOMKilled" → increaseMemory
"504 Gateway Timeout from backend" → increaseTimeout
"nginx container is not responding" → restartService
```
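These samples can double as a quick regression check: run each line through the engine and compare the returned tool name against the expectation from the table above. This is an illustrative sketch; `scoreAccuracy` and the predictions shape are hypothetical, not part of dad-express:

```typescript
// Expected tool per sample error, taken from the training-data table above.
const expectedTool: Record<string, string> = {
  "getaddrinfo ENOTFOUND db": "addHostMapping",
  "Container api was OOMKilled": "increaseMemory",
  "504 Gateway Timeout from backend": "increaseTimeout",
  "nginx container is not responding": "restartService",
};

// Fraction of predictions whose tool name matches the expectation.
function scoreAccuracy(predictions: Array<{ input: string; tool: string }>): number {
  if (predictions.length === 0) return 0;
  const correct = predictions.filter((p) => expectedTool[p.input] === p.tool).length;
  return correct / predictions.length;
}
```

In practice the `tool` values would come from `engine.call(...)` as in the dad-express example above; a score below 1.0 usually means the tool definitions have drifted from the ones in this card.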
## Fully Loaded Serving
**Fully Loaded Serving** is an end-to-end intelligent error remediation pipeline that runs entirely on-device. It combines:
1. **Low-latency vector embeddings** (EmbeddingGemma) for streaming log classification
2. **Semantic clustering** to group similar errors/issues/outliers
3. **Function calling** (FunctionGemma) to automatically diagnose and fix infrastructure issues
4. **Prompt optimization** via [Ax](https://github.com/ax-llm/ax) with MiPRO for continuous improvement
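Step 2 can be as simple as a cosine-similarity threshold over incoming embeddings, which is essentially what the full serving example below does. A minimal self-contained sketch; `assignCluster`, the 0.85 threshold, and the tiny 3-dimensional vectors are illustrative (the real pipeline uses 768-dim EmbeddingGemma vectors):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assign an embedding to the nearest existing cluster above the
// similarity threshold, or open a new cluster for it.
function assignCluster(
  embedding: number[],
  clusters: Map<string, number[]>,
  threshold = 0.85
): string {
  let best: string | null = null;
  let bestSim = threshold;
  for (const [id, centroid] of clusters) {
    const sim = cosineSimilarity(embedding, centroid);
    if (sim > bestSim) {
      bestSim = sim;
      best = id;
    }
  }
  if (best) return best;
  const id = `cluster_${clusters.size}`;
  clusters.set(id, embedding);
  return id;
}
```

Near-duplicate errors collapse into one cluster, so FunctionGemma is invoked once per error pattern rather than once per log line.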
### Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Next.js Application │
├─────────────────────────────────────────────────────────────────────────┤
│ stdout/stderr ──▶ Log Stream ──▶ dad-express middleware │
│ │ │
│ ┌─────────────────────┼──────────────────────┐ │
│ │ ▼ │ │
│ │ ┌──────────────────────────────────┐ │ │
│ │ │ EmbeddingGemma (~5ms) │ │ │
│ │ │ 768-dim vector per log line │ │ │
│ │ └──────────────┬───────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────────────────────────┐ │ │
│ │ │ Semantic Clustering (cosine) │ │ │
│ │ │ • Group similar errors │ │ │
│ │ │ • Detect outliers │ │ │
│ │ │ • Identify recurring patterns │ │ │
│ │ └──────────────┬───────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────────────────────────┐ │ │
│ │ │ FunctionGemma (~50ms/call) │ │ │
│ │ │ → enableCors, setEnvVar, etc. │ │ │
│ │ └──────────────┬───────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────────────────────────┐ │ │
│ │ │ Auto-Remediation Layer │ │ │
│ │ │ Execute fix or notify developer │ │ │
│ │ └──────────────────────────────────┘ │ │
│ │ │ │
│ │ LiteRT-LM (on-device, ~300MB RAM) │ │
│ └────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
```
### Ax Integration with MiPRO
[Ax](https://github.com/ax-llm/ax) is a TypeScript DSPy-style framework for declarative AI programming. dad-express provides `AxLiteRTProvider` to run Ax signatures entirely on-device:
```typescript
import { AxGen } from "@ax-llm/ax";
import { AxLiteRTProvider, EmbeddingEngine, FunctionGemmaEngine } from "dad-express";
// Create on-device provider with both embedding and chat models
const provider = new AxLiteRTProvider({
chat: {
modelPath: "./models/functiongemma-infra-v8_q8_ekv1024.litertlm",
tools: infrastructureTools, // The 7 tools from this repo
},
embed: {
modelPath: "./models/embedding_gemma.tflite",
tokenizerPath: "./models/tokenizer.model",
},
});
// Define Ax signature for error diagnosis (MiPRO-optimizable)
const diagnoseError = new AxGen(`
errorMessage:string "The error log line",
errorCluster:string? "Similar errors seen recently"
->
diagnosis:string "Root cause analysis",
toolName:string "Which infrastructure tool to call",
confidence:class "high, medium, low"
`);
// Run inference on-device
const result = await diagnoseError.forward(provider, {
errorMessage: "CORS error from http://localhost:3000",
errorCluster: "3 similar CORS errors in last 5 minutes",
});
console.log(result);
// { diagnosis: "Frontend origin not in allowed list",
// toolName: "enableCors",
// confidence: "high" }
```
### Example: Hosting Next.js with Fully Loaded Serving
```typescript
// server.ts - Next.js with intelligent error remediation
import { createApp, FunctionGemmaEngine, EmbeddingEngine } from "dad-express";
import { spawn } from "child_process";
// Infrastructure tools (exact definitions for 100% accuracy)
const tools = [
{ type: "function", function: { name: "enableCors", description: "Enable CORS for a specific origin to fix blocked cross-origin requests.", parameters: { type: "object", properties: { origin: { type: "string", description: "The origin to allow" } }, required: ["origin"] } } },
{ type: "function", function: { name: "updateConnectionUrl", description: "Update a service connection URL to fix ECONNREFUSED errors.", parameters: { type: "object", properties: { service: { type: "string" }, hostname: { type: "string" }, port: { type: "integer" } }, required: ["service", "hostname", "port"] } } },
{ type: "function", function: { name: "setEnvVar", description: "Set an environment variable to fix missing configuration errors.", parameters: { type: "object", properties: { name: { type: "string" }, value: { type: "string" } }, required: ["name", "value"] } } },
{ type: "function", function: { name: "addHostMapping", description: "Add a hostname to IP mapping to fix DNS resolution errors.", parameters: { type: "object", properties: { hostname: { type: "string" }, ip: { type: "string" } }, required: ["hostname", "ip"] } } },
{ type: "function", function: { name: "increaseMemory", description: "Increase memory limit for a service to fix OOMKilled errors.", parameters: { type: "object", properties: { service: { type: "string" }, memoryMb: { type: "integer" } }, required: ["service", "memoryMb"] } } },
{ type: "function", function: { name: "increaseTimeout", description: "Increase timeout value to fix 504 Gateway Timeout errors.", parameters: { type: "object", properties: { service: { type: "string" }, timeoutMs: { type: "integer" } }, required: ["service", "timeoutMs"] } } },
{ type: "function", function: { name: "restartService", description: "Restart a service to apply changes or fix stuck processes.", parameters: { type: "object", properties: { service: { type: "string" } }, required: ["service"] } } },
];
// Initialize on-device models
const embedEngine = new EmbeddingEngine({
modelPath: "./models/embedding_gemma.tflite",
tokenizerPath: "./models/tokenizer.model",
});
const functionGemma = new FunctionGemmaEngine({
modelPath: "./models/functiongemma-infra-v8_q8_ekv1024.litertlm",
tools: JSON.stringify(tools),
});
// Error clustering state
const errorClusters = new Map<string, { embedding: Float32Array; count: number; lastSeen: Date }>();
async function classifyAndCluster(logLine: string): Promise<string | null> {
// Skip non-error lines
if (!logLine.match(/error|fail|exception|timeout|refused|denied/i)) {
return null;
}
// Generate embedding (~5ms on CPU)
const embedding = await embedEngine.encodeAsync(logLine);
// Find similar errors via cosine similarity
let bestMatch: string | null = null;
let bestSimilarity = 0.85; // Threshold for clustering
for (const [clusterId, cluster] of errorClusters) {
const similarity = EmbeddingEngine.cosineSimilarity(embedding, cluster.embedding);
if (similarity > bestSimilarity) {
bestSimilarity = similarity;
bestMatch = clusterId;
}
}
if (bestMatch) {
// Update existing cluster
const cluster = errorClusters.get(bestMatch)!;
cluster.count++;
cluster.lastSeen = new Date();
return bestMatch;
}
// Create new cluster
const clusterId = `cluster_${Date.now()}`;
errorClusters.set(clusterId, { embedding, count: 1, lastSeen: new Date() });
return clusterId;
}
async function diagnoseAndFix(errorLog: string, clusterId: string): Promise<void> {
const cluster = errorClusters.get(clusterId);
// Call FunctionGemma for diagnosis (~50ms)
const result = await functionGemma.sendMessage(errorLog);
if (result.functionCalls && result.functionCalls.length > 0) {
const call = result.functionCalls[0];
console.log(`[AutoFix] Detected ${cluster?.count || 1}x: ${call.name}`);
console.log(`[AutoFix] Args: ${JSON.stringify(call.arguments)}`);
// Execute remediation (in production, this would call actual infrastructure APIs)
switch (call.name) {
case "enableCors":
console.log(`[AutoFix] Would enable CORS for: ${call.arguments.origin}`);
break;
case "restartService":
console.log(`[AutoFix] Would restart: ${call.arguments.service}`);
break;
case "increaseMemory":
console.log(`[AutoFix] Would increase memory for ${call.arguments.service} to ${call.arguments.memoryMb}MB`);
break;
// ... handle other tools
}
}
}
// Create dad-express app
const app = createApp();
// API routes
app.get("/health", () => ({ status: "ok", models: { embed: true, functionGemma: true } }));
app.get("/clusters", () => {
const clusters = [];
for (const [id, cluster] of errorClusters) {
clusters.push({ id, count: cluster.count, lastSeen: cluster.lastSeen });
}
return clusters;
});
// Start Next.js as child process with log monitoring
const nextProcess = spawn("npx", ["next", "start"], {
stdio: ["inherit", "pipe", "pipe"],
env: { ...process.env, PORT: "3001" },
});
// Stream stdout
nextProcess.stdout.on("data", (data) => {
const line = data.toString().trim();
console.log(`[next] ${line}`);
});
// Stream stderr with intelligent processing
nextProcess.stderr.on("data", async (data) => {
const line = data.toString().trim();
console.log(`[next:err] ${line}`);
// Classify and cluster error
const clusterId = await classifyAndCluster(line);
if (clusterId) {
// Diagnose and auto-fix
await diagnoseAndFix(line, clusterId);
}
});
// Start dad-express on separate port for monitoring
app.listen(4000, () => {
console.log("dad-express monitoring on http://localhost:4000");
console.log("Next.js app on http://localhost:3001");
});
```
### Key Benefits
| Feature | Latency | Memory | Cloud Calls |
|---------|---------|--------|-------------|
| EmbeddingGemma | ~5ms/embed | ~50MB | 0 |
| FunctionGemma | ~50ms/call | ~271MB | 0 |
| Semantic clustering | <1ms | Varies | 0 |
| **Total pipeline** | **~60ms** | **~350MB** | **0** |
- **Zero cloud dependency**: All inference runs locally via LiteRT-LM
- **Sub-100ms latency**: Fast enough for real-time log processing
- **Privacy-preserving**: Error logs never leave the device
- **Continuous improvement**: Use Ax MiPRO to optimize prompts over time
## Limitations
- Optimized for the 7 specific infrastructure tools listed above
- Requires exact tool definitions for best accuracy
- May not generalize well to error patterns not seen in training
## License
This model inherits the [Gemma license](https://ai.google.dev/gemma/terms) from the base model.