# Deployment Guide - Person Classification TensorFlow Lite Models

This guide provides comprehensive instructions for deploying the person classification models across different platforms and environments.

## Table of Contents

1. [Quick Start](#quick-start)
2. [Astra MCU SDK Deployment](#astra-mcu-sdk-deployment)
3. [General Embedded Deployment](#general-embedded-deployment)
4. [Desktop/Server Deployment](#desktopserver-deployment)
5. [Performance Optimization](#performance-optimization)
6. [Troubleshooting](#troubleshooting)

## Quick Start

### Installation

```bash
# Install dependencies
pip install -r requirements.txt

# Or minimal installation for inference only
pip install tensorflow numpy pillow
```

### Basic Usage

```bash
# Test with Flash model (VGA resolution)
python inference_example.py --model flash --image your_image.jpg

# Test with SRAM model (WQVGA resolution)
python inference_example.py --model sram --image your_image.jpg
```

## Astra MCU SDK Deployment

### Prerequisites

- Astra MCU SDK installed and configured
- GCC/AC6 build environment
- SynaToolkit for debugging and deployment
- Astra Machina Micro Kit hardware

### Model Selection Strategy

| Scenario | Recommended Model | Resolution | Memory Location | Use Case |
|----------|-------------------|------------|-----------------|----------|
| **High Accuracy Required** | Flash Model | 640×480 | Flash Memory | Security systems, detailed detection |
| **Real-time Processing** | SRAM Model | 480×270 | SRAM | IoT sensors, battery devices |
| **Memory Constrained** | SRAM Model | 480×270 | SRAM | Low-power applications |
| **Balanced Performance** | Flash Model | 640×480 | Flash Memory | General purpose applications |

### Step-by-Step Deployment

#### 1. Project Configuration

**For WQVGA Resolution (SRAM Model):**

```bash
make cm55_person_classification_defconfig
```

**For VGA Resolution (Flash Model):**

```bash
make cm55_person_classification_defconfig
make menuconfig
# Navigate to: COMPONENTS CONFIGURATION → Off Chip Components → Display Resolution
# Change to: VGA(640x480)
```

#### 2. Model Integration

**SRAM Model Setup:**

- Copy `person_classification_sram(256x448).tflite` to your project's model directory
- Model weights are loaded into SRAM during initialization
- Faster access, but consumes SRAM space

**Flash Model Setup:**

- Copy `person_classification_flash(448x640).tflite` to your project's model directory
- Generate a binary file for flash deployment:

```bash
# Use the Vela compilation guide to generate the .bin file
# Flash to address: 0x629000 (calculated based on your NVM_data.json)
```

#### 3. Build Process

```bash
# Build the application
make build

# Or simply
make
```

#### 4. Binary Generation

1. Open the Astra MCU SDK VSCode Extension
2. Navigate to **AXF/ELF TO BIN** → **Bin Conversion**
3. Load the generated `sr110_cm55_fw.elf` or `sr110_cm55_fw.axf`
4. Click **Run Image Generator**

#### 5. Flashing

**WQVGA (SRAM Model):**

```bash
# Flash the main application binary
# File: B0_flash_full_image_GD25LE128_67Mhz_secured.bin
# The model is loaded into SRAM at runtime
```

**VGA (Flash Model):**

```bash
# 1. Flash the model binary first
#    File: person_classification_flash(448x640).bin
#    Address: 0x629000

# 2. Flash the main application binary
#    File: B0_flash_full_image_GD25LE128_67Mhz_secured.bin
```

#### 6. Verification

1. Connect to the Application SR110 USB port
2. Open SynaToolkit
3. Connect to the COM port for logging
4. Use Tools → Video Streamer for testing
5. Configure UC ID: `PERSON_CLASSIFICATION`
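Before flashing, it can also help to confirm on a host machine that the `.tflite` file expects the input resolution you configured above. Below is a minimal sketch using the standard TensorFlow Lite Python interpreter; the `check_model_resolution` helper is ours and not part of the SDK.

```python
import tensorflow as tf

def check_model_resolution(model_path):
    """Host-side sanity check: print a .tflite model's input/output tensor details."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # TFLite vision models typically use a [1, height, width, channels] input layout
    print(f"Input shape : {inp['shape']}, dtype: {inp['dtype'].__name__}")
    print(f"Output shape: {out['shape']}, dtype: {out['dtype'].__name__}")

# The file names suggest a 256x448 input for the SRAM model and 448x640 for the Flash model
check_model_resolution('person_classification_sram(256x448).tflite')
```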
## General Embedded Deployment

### TensorFlow Lite Micro Integration

```cpp
#include <cstdint>
#include <cstring>

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Model data (convert .tflite to C array)
extern const unsigned char person_model[];
extern const int person_model_len;

// Tensor arena size (adjust based on model):
// ~100KB for the SRAM model, ~150KB for the Flash model
constexpr int kTensorArenaSize = 100 * 1024;

class PersonClassifier {
 private:
  alignas(16) uint8_t tensor_arena[kTensorArenaSize];
  tflite::MicroInterpreter* interpreter;
  TfLiteTensor* input;
  TfLiteTensor* output;

 public:
  bool Initialize() {
    // Load model
    const tflite::Model* model = tflite::GetModel(person_model);

    // Set up error reporter, resolver and interpreter
    // (static so they outlive Initialize(); the interpreter keeps references to them)
    static tflite::MicroErrorReporter error_reporter;
    static tflite::AllOpsResolver resolver;
    static tflite::MicroInterpreter static_interpreter(
        model, resolver, tensor_arena, kTensorArenaSize, &error_reporter);
    interpreter = &static_interpreter;

    // Allocate tensors
    TfLiteStatus allocate_status = interpreter->AllocateTensors();
    if (allocate_status != kTfLiteOk) {
      return false;
    }

    // Get input and output tensors
    input = interpreter->input(0);
    output = interpreter->output(0);
    return true;
  }

  float ClassifyImage(uint8_t* image_data) {
    // Copy image data to input tensor
    memcpy(input->data.uint8, image_data, input->bytes);

    // Run inference
    if (interpreter->Invoke() != kTfLiteOk) {
      return -1.0f;  // Error
    }

    // Get result (dequantize if needed)
    if (output->type == kTfLiteUInt8) {
      uint8_t output_quantized = output->data.uint8[0];
      return (output_quantized - output->params.zero_point) * output->params.scale;
    } else {
      return output->data.f[0];
    }
  }
};
```

### Memory Requirements

| Model | Tensor Arena | Model Size | Total RAM | Flash Usage |
|-------|--------------|------------|-----------|-------------|
| **SRAM Model** | ~80KB | 1.5MB | ~2.5MB | Minimal |
| **Flash Model** | ~120KB | 1.5MB | ~200KB | 1.5MB |

## Desktop/Server Deployment

### Python Implementation

```python
#!/usr/bin/env python3
import numpy as np
import tensorflow as tf
from PIL import Image


class PersonClassificationServer:
    def __init__(self, model_path):
        self.interpreter = tf.lite.Interpreter(model_path=model_path)
        self.interpreter.allocate_tensors()
        self.input_details = self.interpreter.get_input_details()
        self.output_details = self.interpreter.get_output_details()

    def preprocess_image(self, image_path):
        image = Image.open(image_path).convert('RGB')
        input_shape = self.input_details[0]['shape'][1:3]  # height, width
        image = image.resize((input_shape[1], input_shape[0]))
        return np.expand_dims(np.array(image, dtype=np.uint8), axis=0)

    def classify(self, image_path):
        input_data = self.preprocess_image(image_path)
        self.interpreter.set_tensor(self.input_details[0]['index'], input_data)
        self.interpreter.invoke()
        output_data = self.interpreter.get_tensor(self.output_details[0]['index'])

        # Handle quantization
        scale = self.output_details[0]['quantization'][0]
        zero_point = self.output_details[0]['quantization'][1]
        if scale != 0:
            dequantized = scale * (output_data.astype(np.float32) - zero_point)
            probability = 1 / (1 + np.exp(-dequantized[0][0]))
        else:
            probability = float(output_data[0][0])

        return {
            'probability': probability,
            'prediction': 'person' if probability > 0.5 else 'non-person',
            'confidence': probability if probability > 0.5 else 1 - probability
        }


# Example usage
if __name__ == '__main__':
    classifier = PersonClassificationServer('person_classification_sram(256x448).tflite')
    result = classifier.classify('test_image.jpg')
    print(f"Prediction: {result['prediction']} (confidence: {result['confidence']:.2%})")
```
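For scripted use, the `PersonClassificationServer` class above can also be wrapped in a small command-line interface, similar in spirit to `inference_example.py` from the Quick Start. The sketch below is illustrative only: the flag names and `main()` helper are ours, and it assumes the class above is defined in the same file.

```python
import argparse

# Hypothetical CLI wrapper; assumes PersonClassificationServer (defined above)
# is available in the same module. Flag names are illustrative.
def main():
    parser = argparse.ArgumentParser(description='Person classification inference')
    parser.add_argument('--model', required=True, help='Path to a .tflite model file')
    parser.add_argument('--image', required=True, help='Path to the image to classify')
    parser.add_argument('--threshold', type=float, default=0.5,
                        help='Decision threshold for the "person" class')
    args = parser.parse_args()

    classifier = PersonClassificationServer(args.model)
    result = classifier.classify(args.image)
    label = 'person' if result['probability'] > args.threshold else 'non-person'
    print(f"{label} (p={result['probability']:.3f})")


if __name__ == '__main__':
    main()
```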
### REST API Server

```python
import os

from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename

app = Flask(__name__)
# PersonClassificationServer as defined in the Python implementation above
classifier = PersonClassificationServer('person_classification_sram(256x448).tflite')


@app.route('/classify', methods=['POST'])
def classify_image():
    if 'image' not in request.files:
        return jsonify({'error': 'No image file'}), 400

    file = request.files['image']
    if file.filename == '':
        return jsonify({'error': 'No file selected'}), 400

    filename = secure_filename(file.filename)
    filepath = os.path.join('/tmp', filename)
    file.save(filepath)

    try:
        result = classifier.classify(filepath)
        os.remove(filepath)  # Cleanup
        return jsonify(result)
    except Exception as e:
        return jsonify({'error': str(e)}), 500


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

## Performance Optimization

### Model Selection Guidelines

1. **Choose the SRAM model when:**
   - Memory is extremely constrained
   - Real-time processing is critical
   - Power consumption is a concern
   - The input resolution is sufficient for the use case

2. **Choose the Flash model when:**
   - Higher accuracy is required
   - Sufficient flash storage is available
   - Processing higher-resolution images
   - A slightly longer inference time is acceptable

### Optimization Techniques

#### Input Image Optimization

```python
import numpy as np
from PIL import Image

# Efficient preprocessing
def optimize_preprocessing(image_path, target_size):
    """Optimized image preprocessing"""
    image = Image.open(image_path)

    # Convert only if necessary
    if image.mode != 'RGB':
        image = image.convert('RGB')

    # Use high-quality resampling for better accuracy
    image = image.resize(target_size, Image.Resampling.LANCZOS)

    # Convert to numpy efficiently
    return np.asarray(image, dtype=np.uint8)
```

#### Batch Processing

```python
def batch_classify(classifier, image_paths, batch_size=8):
    """Process multiple images efficiently"""
    results = []

    for i in range(0, len(image_paths), batch_size):
        batch = image_paths[i:i + batch_size]
        batch_results = []

        for image_path in batch:
            result = classifier.classify(image_path)
            batch_results.append(result)

        results.extend(batch_results)

    return results
```

### Performance Benchmarks

| Platform | Model | Resolution | Inference Time | Memory Usage |
|----------|-------|------------|----------------|--------------|
| **Astra MCU (400MHz)** | SRAM | 480×270 | ~15ms | 80KB RAM |
| **Astra MCU (400MHz)** | Flash | 640×480 | ~25ms | 120KB RAM |
| **Raspberry Pi 4** | SRAM | 480×270 | ~8ms | 50MB RAM |
| **Raspberry Pi 4** | Flash | 640×480 | ~12ms | 55MB RAM |
| **Desktop CPU** | SRAM | 480×270 | ~2ms | 30MB RAM |
| **Desktop CPU** | Flash | 640×480 | ~3ms | 35MB RAM |

## Troubleshooting

### Common Issues

#### Model Loading Errors

```python
import os
import tensorflow as tf

# Issue: "Model file not found"
# Solution: Check the file path and permissions
if not os.path.exists(model_path):
    print(f"Model not found: {model_path}")

# Issue: "Invalid model format"
# Solution: Verify the .tflite file's integrity
try:
    interpreter = tf.lite.Interpreter(model_path=model_path)
except Exception as e:
    print(f"Model loading error: {e}")
```

#### Input Shape Mismatch

```python
# Get the expected input shape
input_details = interpreter.get_input_details()
expected_shape = input_details[0]['shape']
print(f"Expected input shape: {expected_shape}")

# Ensure the image matches the expected dimensions
# (compare as tuples; expected_shape is a numpy array)
if tuple(image_data.shape) != tuple(expected_shape):
    print(f"Shape mismatch: got {image_data.shape}, expected {expected_shape}")
```
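When the shapes differ, resizing the image to the model's expected height and width and adding the leading batch dimension usually resolves it. A minimal sketch, assuming `image_data` is an H×W×3 `uint8` array and `expected_shape` is `[1, height, width, channels]` as printed above:

```python
import numpy as np
from PIL import Image

# Illustrative fix: resize to the model's expected spatial size and add a batch dimension
_, height, width, _ = expected_shape
resized = Image.fromarray(image_data).resize((width, height))
image_data = np.expand_dims(np.asarray(resized, dtype=np.uint8), axis=0)
print(f"Reshaped input: {image_data.shape}")  # should now match expected_shape
```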
#### Quantization Issues

```python
# Check if the model is quantized
output_details = interpreter.get_output_details()
scale = output_details[0]['quantization'][0]
zero_point = output_details[0]['quantization'][1]

if scale == 0:
    print("Model uses float32 output")
else:
    print(f"Quantized model: scale={scale}, zero_point={zero_point}")
```

#### Memory Issues on MCU

```cpp
// Increase the tensor arena size if needed
constexpr int kTensorArenaSize = 150 * 1024;  // Increase from 100KB

// Check the allocation status
TfLiteStatus allocate_status = interpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
  printf("Failed to allocate tensors - increase kTensorArenaSize\n");
}
```

### Debugging Tips

1. **Enable verbose logging:**

   ```python
   tf.get_logger().setLevel('DEBUG')
   ```

2. **Check model details:**

   ```python
   interpreter = tf.lite.Interpreter(model_path=model_path)
   print("Input details:", interpreter.get_input_details())
   print("Output details:", interpreter.get_output_details())
   ```

3. **Validate input data:**

   ```python
   print(f"Input shape: {input_data.shape}")
   print(f"Input dtype: {input_data.dtype}")
   print(f"Input range: [{input_data.min()}, {input_data.max()}]")
   ```

### Support Resources

- **Astra MCU SDK**: Official documentation and support forums
- **TensorFlow Lite**: [Official TFLite documentation](https://www.tensorflow.org/lite)
- **Model Issues**: Check existing GitHub issues or open a new issue with the model details
- **Performance Optimization**: TensorFlow Lite optimization guide

---

For additional support or specific deployment questions, please refer to the main README.md or create an issue in the repository.