
Deployment Guide - Person Classification TensorFlow Lite Models

This guide provides comprehensive instructions for deploying the person classification models across different platforms and environments.

Table of Contents

  1. Quick Start
  2. Astra MCU SDK Deployment
  3. General Embedded Deployment
  4. Desktop/Server Deployment
  5. Performance Optimization
  6. Troubleshooting

Quick Start

Installation

# Install dependencies
pip install -r requirements.txt

# Or minimal installation for inference only
pip install tensorflow numpy pillow
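
To confirm the environment can load the models, a quick sanity check such as the following can be run (the filename matches the SRAM model shipped in this repository):

import tensorflow as tf

# Load one of the repository's models and print its input/output details
interpreter = tf.lite.Interpreter(model_path='person_classification_sram(256x448).tflite')
interpreter.allocate_tensors()
print("TensorFlow version:", tf.__version__)
print("Input details:", interpreter.get_input_details())
print("Output details:", interpreter.get_output_details())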

Basic Usage

# Test with Flash model (VGA resolution)
python inference_example.py --model flash --image your_image.jpg

# Test with SRAM model (WQVGA resolution) 
python inference_example.py --model sram --image your_image.jpg

Astra MCU SDK Deployment

Prerequisites

  • Astra MCU SDK installed and configured
  • GCC/AC6 build environment
  • SynaToolkit for debugging and deployment
  • Astra Machina Micro Kit hardware

Model Selection Strategy

| Scenario | Recommended Model | Resolution | Memory Location | Use Case |
|----------|-------------------|------------|-----------------|----------|
| High Accuracy Required | Flash Model | 640×480 | Flash Memory | Security systems, detailed detection |
| Real-time Processing | SRAM Model | 480×270 | SRAM | IoT sensors, battery devices |
| Memory Constrained | SRAM Model | 480×270 | SRAM | Low-power applications |
| Balanced Performance | Flash Model | 640×480 | Flash Memory | General purpose applications |
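
As a rough illustration of the table above, the choice can be expressed as a small helper; the function below is hypothetical and simply maps the two scenarios to the model files in this repository:

def select_model(high_accuracy=False, memory_constrained=True):
    """Hypothetical helper mapping deployment constraints to a model file."""
    if high_accuracy and not memory_constrained:
        # VGA input, stored in flash memory
        return 'person_classification_flash(448x640).tflite'
    # WQVGA input, loaded into SRAM
    return 'person_classification_sram(256x448).tflite'

print(select_model(high_accuracy=True, memory_constrained=False))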

Step-by-Step Deployment

1. Project Configuration

For WQVGA Resolution (SRAM Model):

make cm55_person_classification_defconfig

For VGA Resolution (Flash Model):

make cm55_person_classification_defconfig
make menuconfig
# Navigate to: COMPONENTS CONFIGURATION → Off Chip Components → Display Resolution
# Change to: VGA(640x480)

2. Model Integration

SRAM Model Setup:

  • Copy person_classification_sram(256x448).tflite to your project's model directory
  • The model weights are loaded into SRAM during initialization
  • Faster access, but consumes SRAM space

Flash Model Setup:

  • Copy person_classification_flash(448x640).tflite to your project's model directory
  • Generate a binary file for flash deployment:
    # Use the Vela compilation guide to generate the .bin file
    # Flash to address: 0x629000 (calculated from your NVM_data.json)
    

3. Build Process

# Build the application
make build

# Or simply
make

4. Binary Generation

  1. Open Astra MCU SDK VSCode Extension
  2. Navigate to AXF/ELF TO BIN → Bin Conversion
  3. Load generated sr110_cm55_fw.elf or sr110_cm55_fw.axf
  4. Click Run Image Generator

5. Flashing

WQVGA (SRAM Model):

# Flash the main application binary
# File: B0_flash_full_image_GD25LE128_67Mhz_secured.bin
# The model is loaded into SRAM during runtime

VGA (Flash Model):

# 1. Flash the model binary first
# File: person_classification_flash(448x640).bin  
# Address: 0x629000

# 2. Flash the main application binary
# File: B0_flash_full_image_GD25LE128_67Mhz_secured.bin

6. Verification

  1. Connect to the Application SR110 USB port
  2. Open SynaToolkit
  3. Connect to the COM port for logging
  4. Use Tools → Video Streamer for testing
  5. Set the UC ID to PERSON_CLASSIFICATION

General Embedded Deployment

TensorFlow Lite Micro Integration

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Model data (the .tflite file converted to a C array; see the conversion sketch below)
extern const unsigned char person_model[];
extern const int person_model_len;

// Tensor arena size (adjust based on the model)
constexpr int kTensorArenaSize = 100 * 1024;  // 100KB for the SRAM model
// constexpr int kTensorArenaSize = 150 * 1024;  // use 150KB for the Flash model instead

class PersonClassifier {
private:
    uint8_t tensor_arena[kTensorArenaSize];
    tflite::MicroInterpreter* interpreter;
    TfLiteTensor* input;
    TfLiteTensor* output;

public:
    bool Initialize() {
        // Load the model and verify the schema version
        const tflite::Model* model = tflite::GetModel(person_model);
        if (model->version() != TFLITE_SCHEMA_VERSION) {
            return false;
        }

        // Set up the error reporter, resolver, and interpreter
        static tflite::MicroErrorReporter error_reporter;
        tflite::AllOpsResolver resolver;
        static tflite::MicroInterpreter static_interpreter(
            model, resolver, tensor_arena, kTensorArenaSize, &error_reporter);
        interpreter = &static_interpreter;

        // Allocate tensors
        TfLiteStatus allocate_status = interpreter->AllocateTensors();
        if (allocate_status != kTfLiteOk) {
            return false;
        }

        // Get input and output tensors
        input = interpreter->input(0);
        output = interpreter->output(0);
        
        return true;
    }

    float ClassifyImage(uint8_t* image_data) {
        // Copy image data to input tensor
        memcpy(input->data.uint8, image_data, input->bytes);

        // Run inference
        if (interpreter->Invoke() != kTfLiteOk) {
            return -1.0f;  // Error
        }

        // Get result (dequantize if needed)
        if (output->type == kTfLiteUInt8) {
            uint8_t output_quantized = output->data.uint8[0];
            return (output_quantized - output->params.zero_point) * output->params.scale;
        } else {
            return output->data.f[0];
        }
    }
};
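
The person_model array referenced above has to be generated from the .tflite file. A minimal Python sketch for this conversion follows (file and array names are placeholders; tools such as xxd -i produce an equivalent result, and some toolchains additionally require an alignment attribute on the array):

# Convert a .tflite model into a C array for TensorFlow Lite Micro (sketch; names are placeholders)
def tflite_to_c_array(tflite_path, output_path, array_name='person_model'):
    with open(tflite_path, 'rb') as f:
        data = f.read()
    with open(output_path, 'w') as f:
        f.write(f'const unsigned char {array_name}[] = {{\n')
        for i in range(0, len(data), 12):
            chunk = ', '.join(f'0x{b:02x}' for b in data[i:i+12])
            f.write(f'  {chunk},\n')
        f.write('};\n')
        f.write(f'const int {array_name}_len = {len(data)};\n')

tflite_to_c_array('person_classification_sram(256x448).tflite', 'person_model.cc')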

Memory Requirements

| Model | Tensor Arena | Model Size | Total RAM | Flash Usage |
|-------|--------------|------------|-----------|-------------|
| SRAM Model | ~80KB | 1.5MB | ~2.5MB | Minimal |
| Flash Model | ~120KB | 1.5MB | ~200KB | 1.5MB |

Desktop/Server Deployment

Python Implementation

#!/usr/bin/env python3
import tensorflow as tf
import numpy as np
from PIL import Image
import argparse

class PersonClassificationServer:
    def __init__(self, model_path):
        self.interpreter = tf.lite.Interpreter(model_path=model_path)
        self.interpreter.allocate_tensors()
        self.input_details = self.interpreter.get_input_details()
        self.output_details = self.interpreter.get_output_details()
    
    def preprocess_image(self, image_path):
        image = Image.open(image_path).convert('RGB')
        input_shape = self.input_details[0]['shape'][1:3]  # height, width
        image = image.resize((input_shape[1], input_shape[0]))
        return np.expand_dims(np.array(image, dtype=np.uint8), axis=0)
    
    def classify(self, image_path):
        input_data = self.preprocess_image(image_path)
        self.interpreter.set_tensor(self.input_details[0]['index'], input_data)
        self.interpreter.invoke()
        output_data = self.interpreter.get_tensor(self.output_details[0]['index'])
        
        # Handle quantization
        scale = self.output_details[0]['quantization'][0]
        zero_point = self.output_details[0]['quantization'][1]
        
        if scale != 0:
            dequantized = scale * (output_data.astype(np.float32) - zero_point)
            probability = 1 / (1 + np.exp(-dequantized[0][0]))
        else:
            probability = float(output_data[0][0])
        
        return {
            'probability': probability,
            'prediction': 'person' if probability > 0.5 else 'non-person',
            'confidence': probability if probability > 0.5 else 1 - probability
        }

# Example usage
if __name__ == '__main__':
    classifier = PersonClassificationServer('person_classification_sram(256x448).tflite')
    result = classifier.classify('test_image.jpg')
    print(f"Prediction: {result['prediction']} (confidence: {result['confidence']:.2%})")

REST API Server

from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
import os

app = Flask(__name__)
# PersonClassificationServer is the class defined in the Python implementation above
classifier = PersonClassificationServer('person_classification_sram(256x448).tflite')

@app.route('/classify', methods=['POST'])
def classify_image():
    if 'image' not in request.files:
        return jsonify({'error': 'No image file'}), 400
    
    file = request.files['image']
    if file.filename == '':
        return jsonify({'error': 'No file selected'}), 400
    
    filename = secure_filename(file.filename)
    filepath = os.path.join('/tmp', filename)
    file.save(filepath)
    
    try:
        result = classifier.classify(filepath)
        os.remove(filepath)  # Cleanup
        return jsonify(result)
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
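
A simple client for testing the endpoint from Python (assumes the requests package and a server running locally on port 5000; the image path is a placeholder):

import requests

# Send an image to the /classify endpoint and print the JSON result
with open('test_image.jpg', 'rb') as f:
    response = requests.post('http://localhost:5000/classify', files={'image': f})
print(response.json())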

Performance Optimization

Model Selection Guidelines

  1. Choose the SRAM model when:

    • Memory is extremely constrained
    • Real-time processing is critical
    • Power consumption is a concern
    • The input resolution is sufficient for the use case
  2. Choose the Flash model when:

    • Higher accuracy is required
    • Sufficient flash storage is available
    • Processing higher-resolution images
    • A slightly longer inference time is acceptable

Optimization Techniques

Input Image Optimization

from PIL import Image
import numpy as np

# Efficient preprocessing
def optimize_preprocessing(image_path, target_size):
    """Optimized image preprocessing; target_size is (width, height)"""
    image = Image.open(image_path)
    
    # Convert only if necessary
    if image.mode != 'RGB':
        image = image.convert('RGB')
    
    # Use high-quality resampling for better accuracy
    image = image.resize(target_size, Image.Resampling.LANCZOS)
    
    # Convert to numpy efficiently
    return np.asarray(image, dtype=np.uint8)

Batch Processing

def batch_classify(classifier, image_paths, batch_size=8):
    """Process multiple images in groups (the interpreter still runs one image per invoke)"""
    results = []
    
    for i in range(0, len(image_paths), batch_size):
        batch = image_paths[i:i+batch_size]
        batch_results = []
        
        for image_path in batch:
            result = classifier.classify(image_path)
            batch_results.append(result)
        
        results.extend(batch_results)
    
    return results

Performance Benchmarks

| Platform | Model | Resolution | Inference Time | Memory Usage |
|----------|-------|------------|----------------|--------------|
| Astra MCU (400MHz) | SRAM | 480×270 | ~15ms | 80KB RAM |
| Astra MCU (400MHz) | Flash | 640×480 | ~25ms | 120KB RAM |
| Raspberry Pi 4 | SRAM | 480×270 | ~8ms | 50MB RAM |
| Raspberry Pi 4 | Flash | 640×480 | ~12ms | 55MB RAM |
| Desktop CPU | SRAM | 480×270 | ~2ms | 30MB RAM |
| Desktop CPU | Flash | 640×480 | ~3ms | 35MB RAM |
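
These figures are indicative; actual numbers depend on the build, clock speed, and input pipeline. On desktop-class platforms, latency can be measured with a small sketch like the one below (reusing the PersonClassificationServer class from above; the image path is a placeholder):

import time

def benchmark(classifier, image_path, runs=50):
    """Measure average inference latency over several runs."""
    input_data = classifier.preprocess_image(image_path)
    index = classifier.input_details[0]['index']
    # Warm-up invocation before timing
    classifier.interpreter.set_tensor(index, input_data)
    classifier.interpreter.invoke()
    start = time.perf_counter()
    for _ in range(runs):
        classifier.interpreter.set_tensor(index, input_data)
        classifier.interpreter.invoke()
    elapsed = (time.perf_counter() - start) / runs
    print(f"Average inference time: {elapsed * 1000:.2f} ms")

benchmark(PersonClassificationServer('person_classification_sram(256x448).tflite'), 'test_image.jpg')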

Troubleshooting

Common Issues

Model Loading Errors

# Issue: "Model file not found"
# Solution: Check file path and permissions
import os
import tensorflow as tf
if not os.path.exists(model_path):
    print(f"Model not found: {model_path}")
    
# Issue: "Invalid model format"  
# Solution: Verify .tflite file integrity
try:
    interpreter = tf.lite.Interpreter(model_path=model_path)
except Exception as e:
    print(f"Model loading error: {e}")

Input Shape Mismatch

# Get expected input shape
input_details = interpreter.get_input_details()
expected_shape = input_details[0]['shape']
print(f"Expected input shape: {expected_shape}")

# Ensure the image matches the expected dimensions
if tuple(image_data.shape) != tuple(expected_shape):
    print(f"Shape mismatch: got {image_data.shape}, expected {expected_shape}")

Quantization Issues

# Check if model is quantized
output_details = interpreter.get_output_details()
scale = output_details[0]['quantization'][0]
zero_point = output_details[0]['quantization'][1]

if scale == 0:
    print("Model uses float32 output")
else:
    print(f"Quantized model: scale={scale}, zero_point={zero_point}")

Memory Issues on MCU

// Increase tensor arena size if needed
constexpr int kTensorArenaSize = 150 * 1024;  // Increase from 100KB

// Check allocation status
TfLiteStatus allocate_status = interpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
    printf("Failed to allocate tensors - increase kTensorArenaSize\n");
}

Debugging Tips

  1. Enable Verbose Logging:

    tf.get_logger().setLevel('DEBUG')
    
  2. Check Model Details:

    interpreter = tf.lite.Interpreter(model_path=model_path)
    print("Input details:", interpreter.get_input_details())
    print("Output details:", interpreter.get_output_details())
    
  3. Validate Input Data:

    print(f"Input shape: {input_data.shape}")
    print(f"Input dtype: {input_data.dtype}")
    print(f"Input range: [{input_data.min()}, {input_data.max()}]")
    

Support Resources

  • Astra MCU SDK: Official documentation and support forums
  • TensorFlow Lite: Official TFLite documentation
  • Model Issues: Check GitHub issues or create new issue with model details
  • Performance Optimization: TensorFlow Lite optimization guide

For additional support or specific deployment questions, please refer to the main README.md or create an issue in the repository.