Deployment Guide - Person Classification TensorFlow Lite Models
This guide provides comprehensive instructions for deploying the person classification models across different platforms and environments.
Table of Contents
- Quick Start
- Astra MCU SDK Deployment
- General Embedded Deployment
- Desktop/Server Deployment
- Performance Optimization
- Troubleshooting
Quick Start
Installation
# Install dependencies
pip install -r requirements.txt
# Or minimal installation for inference only
pip install tensorflow numpy pillow
Basic Usage
# Test with Flash model (VGA resolution)
python inference_example.py --model flash --image your_image.jpg
# Test with SRAM model (WQVGA resolution)
python inference_example.py --model sram --image your_image.jpg
Astra MCU SDK Deployment
Prerequisites
- Astra MCU SDK installed and configured
- GCC/AC6 build environment
- SynaToolkit for debugging and deployment
- Astra Machina Micro Kit hardware
Model Selection Strategy
| Scenario | Recommended Model | Resolution | Memory Location | Use Case |
|---|---|---|---|---|
| High Accuracy Required | Flash Model | 640×480 | Flash Memory | Security systems, detailed detection |
| Real-time Processing | SRAM Model | 480×270 | SRAM | IoT sensors, battery devices |
| Memory Constrained | SRAM Model | 480×270 | SRAM | Low-power applications |
| Balanced Performance | Flash Model | 640×480 | Flash Memory | General purpose applications |
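If the choice needs to be made in a deployment script rather than by hand, the table above can be encoded in a small helper. The sketch below is a minimal example; the constraint flags are hypothetical, while the file names match the models referenced in this guide:
def select_model(memory_constrained: bool, high_accuracy_required: bool) -> str:
    """Map deployment constraints to a model file, following the selection table."""
    if memory_constrained and not high_accuracy_required:
        return 'person_classification_sram(256x448).tflite'   # WQVGA / SRAM model
    return 'person_classification_flash(448x640).tflite'      # VGA / Flash model

print(select_model(memory_constrained=True, high_accuracy_required=False))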
Step-by-Step Deployment
1. Project Configuration
For WQVGA Resolution (SRAM Model):
make cm55_person_classification_defconfig
For VGA Resolution (Flash Model):
make cm55_person_classification_defconfig
make menuconfig
# Navigate to: COMPONENTS CONFIGURATION → Off Chip Components → Display Resolution
# Change to: VGA(640x480)
2. Model Integration
SRAM Model Setup:
- Copy person_classification_sram(256x448).tflite to your project's model directory
- Model weights loaded into SRAM during initialization
- Faster access but uses SRAM space
Flash Model Setup:
- Copy person_classification_flash(448x640).tflite to your project's model directory
- Generate binary file for flash deployment:
# Use Vela compilation guide to generate .bin file
# Flash to address: 0x629000 (calculated based on your NVM_data.json)
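Before flashing, it can be worth sanity-checking the generated binary against the flash layout. The sketch below only prints the size and end address; the .bin file name is assumed to mirror the model name, and comparing the result against the regions in your NVM_data.json is left to you:
import os

MODEL_BIN = 'person_classification_flash(448x640).bin'  # assumed output name from the Vela flow
FLASH_BASE = 0x629000                                    # model flash address used in this guide

size = os.path.getsize(MODEL_BIN)
print(f"Model binary: {size} bytes, occupies {FLASH_BASE:#x}..{FLASH_BASE + size:#x}")
# Check that the end address does not overlap the next region in your NVM_data.json layout.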
3. Build Process
# Build the application
make build
# Or simply
make
4. Binary Generation
- Open Astra MCU SDK VSCode Extension
- Navigate to AXF/ELF TO BIN → Bin Conversion
- Load generated sr110_cm55_fw.elf or sr110_cm55_fw.axf
- Click Run Image Generator
5. Flashing
WQVGA (SRAM Model):
# Flash the main application binary
# File: B0_flash_full_image_GD25LE128_67Mhz_secured.bin
# The model is loaded into SRAM during runtime
VGA (Flash Model):
# 1. Flash the model binary first
# File: person_classification_flash(448x640).bin
# Address: 0x629000
# 2. Flash the main application binary
# File: B0_flash_full_image_GD25LE128_67Mhz_secured.bin
6. Verification
- Connect to Application SR110 USB port
- Open SynaToolkit
- Connect to COM port for logging
- Use Tools → Video Streamer for testing
- Configure UC ID: PERSON_CLASSIFICATION
General Embedded Deployment
TensorFlow Lite Micro Integration
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
// Model data (convert .tflite to C array)
extern const unsigned char person_model[];
extern const int person_model_len;
// Tensor arena size (adjust based on model; keep only one definition)
constexpr int kTensorArenaSize = 100 * 1024; // 100KB for SRAM model
// constexpr int kTensorArenaSize = 150 * 1024; // use 150KB for Flash model
class PersonClassifier {
private:
alignas(16) uint8_t tensor_arena[kTensorArenaSize]; // 16-byte alignment recommended for TFLM
tflite::MicroInterpreter* interpreter;
TfLiteTensor* input;
TfLiteTensor* output;
public:
bool Initialize() {
// Load model
const tflite::Model* model = tflite::GetModel(person_model);
// Set up error reporter, resolver, and interpreter
static tflite::MicroErrorReporter error_reporter;
tflite::AllOpsResolver resolver;
// Static interpreter assumes a single PersonClassifier instance
static tflite::MicroInterpreter static_interpreter(
model, resolver, tensor_arena, kTensorArenaSize, &error_reporter);
interpreter = &static_interpreter;
// Allocate tensors
TfLiteStatus allocate_status = interpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
return false;
}
// Get input and output tensors
input = interpreter->input(0);
output = interpreter->output(0);
return true;
}
float ClassifyImage(uint8_t* image_data) {
// Copy image data to input tensor
memcpy(input->data.uint8, image_data, input->bytes);
// Run inference
if (interpreter->Invoke() != kTfLiteOk) {
return -1.0f; // Error
}
// Get result (dequantize if needed)
if (output->type == kTfLiteUInt8) {
uint8_t output_quantized = output->data.uint8[0];
return (output_quantized - output->params.zero_point) * output->params.scale;
} else {
return output->data.f[0];
}
}
};
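The person_model array and person_model_len symbols referenced above are typically generated from the .tflite file (for example with xxd -i). A minimal Python sketch of that conversion follows; the output file name and array name are assumptions and should match your build, and on C++ builds the array should also be declared in a header (as in the externs above) and may need alignas(16):
def tflite_to_c_array(tflite_path: str, out_path: str, array_name: str = 'person_model') -> None:
    """Write a .tflite model out as a C byte array plus a length constant."""
    with open(tflite_path, 'rb') as f:
        data = f.read()
    lines = [f'const unsigned char {array_name}[] = {{']
    for i in range(0, len(data), 12):
        chunk = ', '.join(f'0x{b:02x}' for b in data[i:i + 12])
        lines.append(f'  {chunk},')
    lines.append('};')
    lines.append(f'const int {array_name}_len = {len(data)};')
    with open(out_path, 'w') as f:
        f.write('\n'.join(lines) + '\n')

tflite_to_c_array('person_classification_sram(256x448).tflite', 'person_model.cc')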
Memory Requirements
| Model | Tensor Arena | Model Size | Total RAM | Flash Usage |
|---|---|---|---|---|
| SRAM Model | ~80KB | 1.5MB | ~2.5MB | Minimal |
| Flash Model | ~120KB | 1.5MB | ~200KB | 1.5MB |
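The arena figures above are starting points; the exact size depends on the model's operators and intermediate tensors and is usually settled by trial and error with AllocateTensors(). To inspect the model structure when choosing an initial value, the TFLite analyzer in recent TensorFlow releases can help:
import tensorflow as tf

# Prints the model's subgraphs, operators, and tensor shapes/types,
# which helps when picking an initial kTensorArenaSize to try.
tf.lite.experimental.Analyzer.analyze(
    model_path='person_classification_sram(256x448).tflite')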
Desktop/Server Deployment
Python Implementation
#!/usr/bin/env python3
import tensorflow as tf
import numpy as np
from PIL import Image
import argparse
class PersonClassificationServer:
def __init__(self, model_path):
self.interpreter = tf.lite.Interpreter(model_path=model_path)
self.interpreter.allocate_tensors()
self.input_details = self.interpreter.get_input_details()
self.output_details = self.interpreter.get_output_details()
def preprocess_image(self, image_path):
image = Image.open(image_path).convert('RGB')
input_shape = self.input_details[0]['shape'][1:3] # height, width
image = image.resize((input_shape[1], input_shape[0]))
return np.expand_dims(np.array(image, dtype=np.uint8), axis=0)
def classify(self, image_path):
input_data = self.preprocess_image(image_path)
self.interpreter.set_tensor(self.input_details[0]['index'], input_data)
self.interpreter.invoke()
output_data = self.interpreter.get_tensor(self.output_details[0]['index'])
# Handle quantization
scale = self.output_details[0]['quantization'][0]
zero_point = self.output_details[0]['quantization'][1]
if scale != 0:
dequantized = scale * (output_data.astype(np.float32) - zero_point)
probability = 1 / (1 + np.exp(-dequantized[0][0]))
else:
probability = float(output_data[0][0])
return {
'probability': probability,
'prediction': 'person' if probability > 0.5 else 'non-person',
'confidence': probability if probability > 0.5 else 1 - probability
}
# Example usage
if __name__ == '__main__':
classifier = PersonClassificationServer('person_classification_sram(256x448).tflite')
result = classifier.classify('test_image.jpg')
print(f"Prediction: {result['prediction']} (confidence: {result['confidence']:.2%})")
REST API Server
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
import os
# PersonClassificationServer is the class defined in the Python implementation above
app = Flask(__name__)
classifier = PersonClassificationServer('person_classification_sram(256x448).tflite')
@app.route('/classify', methods=['POST'])
def classify_image():
if 'image' not in request.files:
return jsonify({'error': 'No image file'}), 400
file = request.files['image']
if file.filename == '':
return jsonify({'error': 'No file selected'}), 400
filename = secure_filename(file.filename)
filepath = os.path.join('/tmp', filename)
file.save(filepath)
try:
result = classifier.classify(filepath)
os.remove(filepath) # Cleanup
return jsonify(result)
except Exception as e:
return jsonify({'error': str(e)}), 500
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000)
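A quick way to exercise the endpoint is with a small client script. The sketch below assumes the server above is running locally on port 5000 and that test_image.jpg exists:
import requests

with open('test_image.jpg', 'rb') as f:
    resp = requests.post('http://localhost:5000/classify',
                         files={'image': ('test_image.jpg', f, 'image/jpeg')})
print(resp.status_code, resp.json())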
Performance Optimization
Model Selection Guidelines
Choose SRAM model when:
- Memory is extremely constrained
- Real-time processing is critical
- Power consumption is a concern
- Input resolution is sufficient for use case
Choose Flash model when:
- Higher accuracy is required
- Sufficient flash storage available
- Processing higher resolution images
- Can afford slightly longer inference time
Optimization Techniques
Input Image Optimization
# Efficient preprocessing
def optimize_preprocessing(image_path, target_size):
"""Optimized image preprocessing"""
image = Image.open(image_path)
# Convert only if necessary
if image.mode != 'RGB':
image = image.convert('RGB')
# Use high-quality resampling for better accuracy
image = image.resize(target_size, Image.Resampling.LANCZOS)
# Convert to numpy efficiently
return np.asarray(image, dtype=np.uint8)
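Note that this helper returns a 3-D array; the interpreter still expects a leading batch dimension, so a typical call site looks like the following (the (448, 256) width/height pair is assumed to match the SRAM model's input):
import numpy as np

pixels = optimize_preprocessing('your_image.jpg', target_size=(448, 256))  # (width, height) for PIL
input_data = np.expand_dims(pixels, axis=0)  # add the batch dimension the interpreter expects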
Batch Processing
def batch_classify(classifier, image_paths, batch_size=8):
"""Process images in fixed-size chunks; each image is still classified individually by the interpreter"""
results = []
for i in range(0, len(image_paths), batch_size):
batch = image_paths[i:i+batch_size]
batch_results = []
for image_path in batch:
result = classifier.classify(image_path)
batch_results.append(result)
results.extend(batch_results)
return results
Performance Benchmarks
| Platform | Model | Resolution | Inference Time | Memory Usage |
|---|---|---|---|---|
| Astra MCU (400MHz) | SRAM | 480×270 | ~15ms | 80KB RAM |
| Astra MCU (400MHz) | Flash | 640×480 | ~25ms | 120KB RAM |
| Raspberry Pi 4 | SRAM | 480×270 | ~8ms | 50MB RAM |
| Raspberry Pi 4 | Flash | 640×480 | ~12ms | 55MB RAM |
| Desktop CPU | SRAM | 480×270 | ~2ms | 30MB RAM |
| Desktop CPU | Flash | 640×480 | ~3ms | 35MB RAM |
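These figures depend on the exact build, clock speed, and thermal conditions; on desktop-class platforms they are easy to re-measure with a short timing script (model path and iteration count are placeholders):
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='person_classification_sram(256x448).tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
dummy = np.zeros(inp['shape'], dtype=inp['dtype'])  # synthetic input for timing only

interpreter.set_tensor(inp['index'], dummy)
interpreter.invoke()  # warm-up run

runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp['index'], dummy)
    interpreter.invoke()
print(f"Mean inference time: {(time.perf_counter() - start) / runs * 1000:.2f} ms")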
Troubleshooting
Common Issues
Model Loading Errors
# Issue: "Model file not found"
# Solution: Check file path and permissions
import os
if not os.path.exists(model_path):
print(f"Model not found: {model_path}")
# Issue: "Invalid model format"
# Solution: Verify .tflite file integrity
try:
interpreter = tf.lite.Interpreter(model_path=model_path)
except Exception as e:
print(f"Model loading error: {e}")
Input Shape Mismatch
# Get expected input shape
input_details = interpreter.get_input_details()
expected_shape = input_details[0]['shape']
print(f"Expected input shape: {expected_shape}")
# Ensure image matches expected dimensions
if image_data.shape != expected_shape:
print(f"Shape mismatch: got {image_data.shape}, expected {expected_shape}")
Quantization Issues
# Check if model is quantized
output_details = interpreter.get_output_details()
scale = output_details[0]['quantization'][0]
zero_point = output_details[0]['quantization'][1]
if scale == 0:
print("Model uses float32 output")
else:
print(f"Quantized model: scale={scale}, zero_point={zero_point}")
Memory Issues on MCU
// Increase tensor arena size if needed
constexpr int kTensorArenaSize = 150 * 1024; // Increase from 100KB
// Check allocation status
TfLiteStatus allocate_status = interpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
printf("Failed to allocate tensors - increase kTensorArenaSize\n");
}
Debugging Tips
Enable Verbose Logging:
tf.get_logger().setLevel('DEBUG')
Check Model Details:
interpreter = tf.lite.Interpreter(model_path=model_path)
print("Input details:", interpreter.get_input_details())
print("Output details:", interpreter.get_output_details())
Validate Input Data:
print(f"Input shape: {input_data.shape}")
print(f"Input dtype: {input_data.dtype}")
print(f"Input range: [{input_data.min()}, {input_data.max()}]")
Support Resources
- Astra MCU SDK: Official documentation and support forums
- TensorFlow Lite: Official TFLite documentation
- Model Issues: Check GitHub issues or create new issue with model details
- Performance Optimization: TensorFlow Lite optimization guide
For additional support or specific deployment questions, please refer to the main README.md or create an issue in the repository.