macmacmacmac committed (verified) commit e1be71a · parent 2079dd4

Add Fully Loaded Serving section with Ax MiPRO integration

Files changed (1): README.md (+248 −0)
"nginx container is not responding" → restartService
```

## Fully Loaded Serving

**Fully Loaded Serving** is an end-to-end intelligent error remediation pipeline that runs entirely on-device. It combines:

1. **Low-latency vector embeddings** (EmbeddingGemma) for streaming log classification
2. **Semantic clustering** to group similar errors, issues, and outliers
3. **Function calling** (FunctionGemma) to automatically diagnose and fix infrastructure issues
4. **Prompt optimization** via [Ax](https://github.com/ax-llm/ax) with MiPRO for continuous improvement
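The clustering step (item 2) can be sketched without any model in the loop: embeddings are just vectors, and cluster membership is a cosine-similarity threshold test. The vectors, cluster names, and threshold below are illustrative placeholders, not EmbeddingGemma output:

```typescript
// Cosine similarity between two dense vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assign a vector to the most similar cluster centroid above the threshold,
// or report no match (the caller would then open a new cluster).
function nearestCluster(
  embedding: number[],
  centroids: Map<string, number[]>,
  threshold = 0.85
): string | null {
  let best: string | null = null;
  let bestSim = threshold;
  for (const [id, centroid] of centroids) {
    const sim = cosineSimilarity(embedding, centroid);
    if (sim > bestSim) {
      bestSim = sim;
      best = id;
    }
  }
  return best;
}

// Toy 3-dimensional "embeddings" standing in for 768-dim model output.
const centroids = new Map<string, number[]>([
  ["cors", [0.9, 0.1, 0.0]],
  ["oom", [0.0, 0.2, 0.95]],
]);

console.log(nearestCluster([0.88, 0.15, 0.05], centroids)); // → "cors"
console.log(nearestCluster([0.5, 0.5, 0.5], centroids));    // → null (new cluster)
```

The same threshold test drives the `classifyAndCluster` function in the full example below, with the model supplying real embeddings.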
### Architecture

```
┌────────────────────────────────────────────────────────────┐
│                    Next.js Application                     │
├────────────────────────────────────────────────────────────┤
│  stdout/stderr ──▶ Log Stream ──▶ dad-express middleware   │
│                        │                                   │
│   ┌────────────────────┼────────────────────┐              │
│   │                    ▼                    │              │
│   │  ┌──────────────────────────────────┐   │              │
│   │  │ EmbeddingGemma (~5ms)            │   │              │
│   │  │ 768-dim vector per log line      │   │              │
│   │  └──────────────┬───────────────────┘   │              │
│   │                 │                       │              │
│   │                 ▼                       │              │
│   │  ┌──────────────────────────────────┐   │              │
│   │  │ Semantic Clustering (cosine)     │   │              │
│   │  │ • Group similar errors           │   │              │
│   │  │ • Detect outliers                │   │              │
│   │  │ • Identify recurring patterns    │   │              │
│   │  └──────────────┬───────────────────┘   │              │
│   │                 │                       │              │
│   │                 ▼                       │              │
│   │  ┌──────────────────────────────────┐   │              │
│   │  │ FunctionGemma (~50ms/call)       │   │              │
│   │  │ → enableCors, setEnvVar, etc.    │   │              │
│   │  └──────────────┬───────────────────┘   │              │
│   │                 │                       │              │
│   │                 ▼                       │              │
│   │  ┌──────────────────────────────────┐   │              │
│   │  │ Auto-Remediation Layer           │   │              │
│   │  │ Execute fix or notify developer  │   │              │
│   │  └──────────────────────────────────┘   │              │
│   │                                         │              │
│   │  LiteRT-LM (on-device, ~300MB RAM)      │              │
│   └─────────────────────────────────────────┘              │
└────────────────────────────────────────────────────────────┘
```

### Ax Integration with MiPRO

[Ax](https://github.com/ax-llm/ax) is a TypeScript DSPy-style framework for declarative AI programming. dad-express provides `AxLiteRTProvider` to run Ax signatures entirely on-device:

```typescript
import { AxGen } from "@ax-llm/ax";
import { AxLiteRTProvider, EmbeddingEngine, FunctionGemmaEngine } from "dad-express";

// Create on-device provider with both embedding and chat models
const provider = new AxLiteRTProvider({
  chat: {
    modelPath: "./models/functiongemma-infra-v8_q8_ekv1024.litertlm",
    tools: infrastructureTools, // The 7 tools from this repo
  },
  embed: {
    modelPath: "./models/embedding_gemma.tflite",
    tokenizerPath: "./models/tokenizer.model",
  },
});

// Define Ax signature for error diagnosis (MiPRO-optimizable)
const diagnoseError = new AxGen(`
  errorMessage:string "The error log line",
  errorCluster:string? "Similar errors seen recently"
  ->
  diagnosis:string "Root cause analysis",
  toolName:string "Which infrastructure tool to call",
  confidence:class "high, medium, low"
`);

// Run inference on-device
const result = await diagnoseError.forward(provider, {
  errorMessage: "CORS error from http://localhost:3000",
  errorCluster: "3 similar CORS errors in last 5 minutes",
});

console.log(result);
// { diagnosis: "Frontend origin not in allowed list",
//   toolName: "enableCors",
//   confidence: "high" }
```
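The signature above is marked MiPRO-optimizable, but the snippet never runs the optimizer. As a rough, library-free illustration of what MiPRO-style prompt optimization does (the real Ax optimizer proposes candidate instructions with a model and searches over them; the dev set, candidates, and scoring stub below are invented for this sketch), consider scoring candidate instructions against a small labeled set and keeping the best:

```typescript
type Example = { error: string; expectedTool: string };

// Small labeled dev set: error log line → tool we expect to be called.
const devSet: Example[] = [
  { error: "CORS error from http://localhost:3000", expectedTool: "enableCors" },
  { error: "connect ECONNREFUSED 127.0.0.1:5432", expectedTool: "updateConnectionUrl" },
  { error: "container OOMKilled", expectedTool: "increaseMemory" },
];

// Stand-in for running the model under a given instruction. Artificially,
// only the instruction mentioning "all error kinds" unlocks every mapping.
function predictTool(instruction: string, error: string): string {
  const hints: Array<[RegExp, string]> = [
    [/CORS/i, "enableCors"],
    [/ECONNREFUSED/i, "updateConnectionUrl"],
    [/OOM/i, "increaseMemory"],
  ];
  const usable = instruction.includes("all error kinds") ? hints : hints.slice(0, 1);
  for (const [pattern, tool] of usable) {
    if (pattern.test(error)) return tool;
  }
  return "restartService"; // fallback
}

// Metric: fraction of dev examples where the predicted tool is correct.
function score(instruction: string): number {
  let correct = 0;
  for (const ex of devSet) {
    if (predictTool(instruction, ex.error) === ex.expectedTool) correct++;
  }
  return correct / devSet.length;
}

const candidates = [
  "Diagnose the error.",
  "Diagnose the error and map all error kinds to the matching tool.",
];

// Keep the highest-scoring candidate instruction.
const best = candidates.reduce((a, b) => (score(b) > score(a) ? b : a));
console.log(best, score(best)); // the second, more specific candidate scores 1
```

The real optimizer replaces the scoring stub with actual on-device inference and the hand-written candidates with model-proposed ones, but the select-by-metric loop is the core idea.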

### Example: Hosting Next.js with Fully Loaded Serving

```typescript
// server.ts - Next.js with intelligent error remediation
import { createApp, FunctionGemmaEngine, EmbeddingEngine } from "dad-express";
import { spawn } from "child_process";

// Infrastructure tools (exact definitions for 100% accuracy)
const tools = [
  { type: "function", function: { name: "enableCors", description: "Enable CORS for a specific origin to fix blocked cross-origin requests.", parameters: { type: "object", properties: { origin: { type: "string", description: "The origin to allow" } }, required: ["origin"] } } },
  { type: "function", function: { name: "updateConnectionUrl", description: "Update a service connection URL to fix ECONNREFUSED errors.", parameters: { type: "object", properties: { service: { type: "string" }, hostname: { type: "string" }, port: { type: "integer" } }, required: ["service", "hostname", "port"] } } },
  { type: "function", function: { name: "setEnvVar", description: "Set an environment variable to fix missing configuration errors.", parameters: { type: "object", properties: { name: { type: "string" }, value: { type: "string" } }, required: ["name", "value"] } } },
  { type: "function", function: { name: "addHostMapping", description: "Add a hostname to IP mapping to fix DNS resolution errors.", parameters: { type: "object", properties: { hostname: { type: "string" }, ip: { type: "string" } }, required: ["hostname", "ip"] } } },
  { type: "function", function: { name: "increaseMemory", description: "Increase memory limit for a service to fix OOMKilled errors.", parameters: { type: "object", properties: { service: { type: "string" }, memoryMb: { type: "integer" } }, required: ["service", "memoryMb"] } } },
  { type: "function", function: { name: "increaseTimeout", description: "Increase timeout value to fix 504 Gateway Timeout errors.", parameters: { type: "object", properties: { service: { type: "string" }, timeoutMs: { type: "integer" } }, required: ["service", "timeoutMs"] } } },
  { type: "function", function: { name: "restartService", description: "Restart a service to apply changes or fix stuck processes.", parameters: { type: "object", properties: { service: { type: "string" } }, required: ["service"] } } },
];

// Initialize on-device models
const embedEngine = new EmbeddingEngine({
  modelPath: "./models/embedding_gemma.tflite",
  tokenizerPath: "./models/tokenizer.model",
});

const functionGemma = new FunctionGemmaEngine({
  modelPath: "./models/functiongemma-infra-v8_q8_ekv1024.litertlm",
  tools: JSON.stringify(tools),
});

// Error clustering state
const errorClusters = new Map<string, { embedding: Float32Array; count: number; lastSeen: Date }>();

async function classifyAndCluster(logLine: string): Promise<string | null> {
  // Skip non-error lines
  if (!logLine.match(/error|fail|exception|timeout|refused|denied/i)) {
    return null;
  }

  // Generate embedding (~5ms on CPU)
  const embedding = await embedEngine.encodeAsync(logLine);

  // Find similar errors via cosine similarity
  let bestMatch: string | null = null;
  let bestSimilarity = 0.85; // Threshold for clustering

  for (const [clusterId, cluster] of errorClusters) {
    const similarity = EmbeddingEngine.cosineSimilarity(embedding, cluster.embedding);
    if (similarity > bestSimilarity) {
      bestSimilarity = similarity;
      bestMatch = clusterId;
    }
  }

  if (bestMatch) {
    // Update existing cluster
    const cluster = errorClusters.get(bestMatch)!;
    cluster.count++;
    cluster.lastSeen = new Date();
    return bestMatch;
  }

  // Create new cluster
  const clusterId = `cluster_${Date.now()}`;
  errorClusters.set(clusterId, { embedding, count: 1, lastSeen: new Date() });
  return clusterId;
}

async function diagnoseAndFix(errorLog: string, clusterId: string): Promise<void> {
  const cluster = errorClusters.get(clusterId);

  // Call FunctionGemma for diagnosis (~50ms)
  const result = await functionGemma.sendMessage(errorLog);

  if (result.functionCalls && result.functionCalls.length > 0) {
    const call = result.functionCalls[0];
    console.log(`[AutoFix] Detected ${cluster?.count || 1}x: ${call.name}`);
    console.log(`[AutoFix] Args: ${JSON.stringify(call.arguments)}`);

    // Execute remediation (in production, this would call actual infrastructure APIs)
    switch (call.name) {
      case "enableCors":
        console.log(`[AutoFix] Would enable CORS for: ${call.arguments.origin}`);
        break;
      case "restartService":
        console.log(`[AutoFix] Would restart: ${call.arguments.service}`);
        break;
      case "increaseMemory":
        console.log(`[AutoFix] Would increase memory for ${call.arguments.service} to ${call.arguments.memoryMb}MB`);
        break;
      // ... handle other tools
    }
  }
}

// Create dad-express app
const app = createApp();

// API routes
app.get("/health", () => ({ status: "ok", models: { embed: true, functionGemma: true } }));

app.get("/clusters", () => {
  const clusters: Array<{ id: string; count: number; lastSeen: Date }> = [];
  for (const [id, cluster] of errorClusters) {
    clusters.push({ id, count: cluster.count, lastSeen: cluster.lastSeen });
  }
  return clusters;
});

// Start Next.js as child process with log monitoring
const nextProcess = spawn("npx", ["next", "start"], {
  stdio: ["inherit", "pipe", "pipe"],
  env: { ...process.env, PORT: "3001" },
});

// Stream stdout
nextProcess.stdout?.on("data", (data: Buffer) => {
  const line = data.toString().trim();
  console.log(`[next] ${line}`);
});

// Stream stderr with intelligent processing
nextProcess.stderr?.on("data", async (data: Buffer) => {
  const line = data.toString().trim();
  console.log(`[next:err] ${line}`);

  // Classify and cluster error
  const clusterId = await classifyAndCluster(line);

  if (clusterId) {
    // Diagnose and auto-fix
    await diagnoseAndFix(line, clusterId);
  }
});

// Start dad-express on separate port for monitoring
app.listen(4000, () => {
  console.log("dad-express monitoring on http://localhost:4000");
  console.log("Next.js app on http://localhost:3001");
});
```
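Before executing a remediation the model suggests, it is prudent to validate the call against the tool declarations. Whether dad-express does this internally is not specified here; the `validateCall` helper below is a hypothetical sketch (with a two-tool list trimmed from the seven above for brevity):

```typescript
// Shape of the OpenAI-style tool declarations used above (simplified).
type ToolDecl = {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: { type: "object"; properties: Record<string, unknown>; required: string[] };
  };
};

const tools: ToolDecl[] = [
  { type: "function", function: { name: "enableCors", description: "Enable CORS for a specific origin.", parameters: { type: "object", properties: { origin: { type: "string" } }, required: ["origin"] } } },
  { type: "function", function: { name: "restartService", description: "Restart a service.", parameters: { type: "object", properties: { service: { type: "string" } }, required: ["service"] } } },
];

// Returns null when the call is safe to dispatch, otherwise a reason string.
function validateCall(name: string, args: Record<string, unknown>): string | null {
  const tool = tools.find((t) => t.function.name === name);
  if (!tool) return `unknown tool: ${name}`;
  for (const param of tool.function.parameters.required) {
    if (!(param in args)) return `missing required parameter: ${param}`;
  }
  return null;
}

console.log(validateCall("enableCors", { origin: "http://localhost:3000" })); // → null
console.log(validateCall("enableCors", {}));       // → "missing required parameter: origin"
console.log(validateCall("deleteDatabase", {}));   // → "unknown tool: deleteDatabase"
```

Gating the `switch` in `diagnoseAndFix` on a null result from such a check keeps a hallucinated tool name or malformed arguments from reaching real infrastructure.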

### Key Benefits

| Feature | Latency | Memory | Cloud Calls |
|---------|---------|--------|-------------|
| EmbeddingGemma | ~5ms/embed | ~50MB | 0 |
| FunctionGemma | ~50ms/call | ~271MB | 0 |
| Semantic clustering | <1ms | Varies | 0 |
| **Total pipeline** | **~60ms** | **~350MB** | **0** |

- **Zero cloud dependency**: All inference runs locally via LiteRT-LM
- **Sub-100ms latency**: Fast enough for real-time log processing
- **Privacy-preserving**: Error logs never leave the device
- **Continuous improvement**: Use Ax MiPRO to optimize prompts over time

## Limitations

- Optimized for the 7 specific infrastructure tools listed above