# Deployment Guide - Person Classification TensorFlow Lite Models

This guide provides comprehensive instructions for deploying the person classification models across different platforms and environments.

## Table of Contents

1. [Quick Start](#quick-start)
2. [Astra MCU SDK Deployment](#astra-mcu-sdk-deployment)
3. [General Embedded Deployment](#general-embedded-deployment)
4. [Desktop/Server Deployment](#desktopserver-deployment)
5. [Performance Optimization](#performance-optimization)
6. [Troubleshooting](#troubleshooting)

## Quick Start

### Installation

```bash
# Install dependencies
pip install -r requirements.txt

# Or minimal installation for inference only
pip install tensorflow numpy pillow
```

### Basic Usage

```bash
# Test with Flash model (VGA resolution)
python inference_example.py --model flash --image your_image.jpg

# Test with SRAM model (WQVGA resolution)
python inference_example.py --model sram --image your_image.jpg
```

## Astra MCU SDK Deployment

### Prerequisites

- Astra MCU SDK installed and configured
- GCC/AC6 build environment
- SynaToolkit for debugging and deployment
- Astra Machina Micro Kit hardware

### Model Selection Strategy

| Scenario | Recommended Model | Resolution | Memory Location | Use Case |
|----------|-------------------|------------|-----------------|----------|
| **High Accuracy Required** | Flash Model | 640×480 | Flash Memory | Security systems, detailed detection |
| **Real-time Processing** | SRAM Model | 480×270 | SRAM | IoT sensors, battery devices |
| **Memory Constrained** | SRAM Model | 480×270 | SRAM | Low-power applications |
| **Balanced Performance** | Flash Model | 640×480 | Flash Memory | General purpose applications |

### Step-by-Step Deployment

#### 1. Project Configuration

**For WQVGA Resolution (SRAM Model):**

```bash
make cm55_person_classification_defconfig
```

**For VGA Resolution (Flash Model):**

```bash
make cm55_person_classification_defconfig
make menuconfig
# Navigate to: COMPONENTS CONFIGURATION → Off Chip Components → Display Resolution
# Change to: VGA(640x480)
```

#### 2. Model Integration

**SRAM Model Setup:**

- Copy `person_classification_sram(256x448).tflite` to your project's model directory
- Model weights are loaded into SRAM during initialization
- Faster access, but consumes SRAM space

**Flash Model Setup:**

- Copy `person_classification_flash(448x640).tflite` to your project's model directory
- Generate a binary file for flash deployment:

```bash
# Use the Vela compilation guide to generate the .bin file
# Flash to address: 0x629000 (calculated based on your NVM_data.json)
```

#### 3. Build Process

```bash
# Build the application
make build

# Or simply
make
```

#### 4. Binary Generation

1. Open the Astra MCU SDK VSCode Extension
2. Navigate to **AXF/ELF TO BIN** → **Bin Conversion**
3. Load the generated `sr110_cm55_fw.elf` or `sr110_cm55_fw.axf`
4. Click **Run Image Generator**

#### 5. Flashing

**WQVGA (SRAM Model):**

```bash
# Flash the main application binary
# File: B0_flash_full_image_GD25LE128_67Mhz_secured.bin
# The model is loaded into SRAM at runtime
```

**VGA (Flash Model):**

```bash
# 1. Flash the model binary first
#    File: person_classification_flash(448x640).bin
#    Address: 0x629000

# 2. Flash the main application binary
#    File: B0_flash_full_image_GD25LE128_67Mhz_secured.bin
```

#### 6. Verification

1. Connect to the Application SR110 USB port
2. Open SynaToolkit
3. Connect to the COM port for logging
4. Use Tools → Video Streamer for testing
5. Configure UC ID: `PERSON_CLASSIFICATION`
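Before flashing, it can also help to confirm on a host machine that the `.tflite` file expects the input resolution you configured above. Below is a minimal sketch using the standard TensorFlow Lite Python interpreter; the `check_model_resolution` helper is ours and not part of the SDK.

```python
import tensorflow as tf

def check_model_resolution(model_path):
    """Host-side sanity check: print a .tflite model's input/output tensor details."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # TFLite vision models typically use a [1, height, width, channels] input layout
    print(f"Input shape : {inp['shape']}, dtype: {inp['dtype'].__name__}")
    print(f"Output shape: {out['shape']}, dtype: {out['dtype'].__name__}")

# The file names suggest a 256x448 input for the SRAM model and 448x640 for the Flash model
check_model_resolution('person_classification_sram(256x448).tflite')
```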
## General Embedded Deployment

### TensorFlow Lite Micro Integration

```cpp
#include <cstdint>
#include <cstring>

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Model data (convert .tflite to C array)
extern const unsigned char person_model[];
extern const int person_model_len;

// Tensor arena size (adjust based on model):
// ~100KB for the SRAM model, ~150KB for the Flash model
constexpr int kTensorArenaSize = 100 * 1024;

class PersonClassifier {
 private:
  alignas(16) uint8_t tensor_arena[kTensorArenaSize];
  tflite::MicroInterpreter* interpreter;
  TfLiteTensor* input;
  TfLiteTensor* output;

 public:
  bool Initialize() {
    // Load model
    const tflite::Model* model = tflite::GetModel(person_model);

    // Set up error reporter, resolver and interpreter
    // (static so they outlive Initialize(); the interpreter keeps references to them)
    static tflite::MicroErrorReporter error_reporter;
    static tflite::AllOpsResolver resolver;
    static tflite::MicroInterpreter static_interpreter(
        model, resolver, tensor_arena, kTensorArenaSize, &error_reporter);
    interpreter = &static_interpreter;

    // Allocate tensors
    TfLiteStatus allocate_status = interpreter->AllocateTensors();
    if (allocate_status != kTfLiteOk) {
      return false;
    }

    // Get input and output tensors
    input = interpreter->input(0);
    output = interpreter->output(0);
    return true;
  }

  float ClassifyImage(uint8_t* image_data) {
    // Copy image data to input tensor
    memcpy(input->data.uint8, image_data, input->bytes);

    // Run inference
    if (interpreter->Invoke() != kTfLiteOk) {
      return -1.0f;  // Error
    }

    // Get result (dequantize if needed)
    if (output->type == kTfLiteUInt8) {
      uint8_t output_quantized = output->data.uint8[0];
      return (output_quantized - output->params.zero_point) * output->params.scale;
    } else {
      return output->data.f[0];
    }
  }
};
```

### Memory Requirements

| Model | Tensor Arena | Model Size | Total RAM | Flash Usage |
|-------|--------------|------------|-----------|-------------|
| **SRAM Model** | ~80KB | 1.5MB | ~2.5MB | Minimal |
| **Flash Model** | ~120KB | 1.5MB | ~200KB | 1.5MB |

## Desktop/Server Deployment

### Python Implementation

```python
#!/usr/bin/env python3
import numpy as np
import tensorflow as tf
from PIL import Image


class PersonClassificationServer:
    def __init__(self, model_path):
        self.interpreter = tf.lite.Interpreter(model_path=model_path)
        self.interpreter.allocate_tensors()
        self.input_details = self.interpreter.get_input_details()
        self.output_details = self.interpreter.get_output_details()

    def preprocess_image(self, image_path):
        image = Image.open(image_path).convert('RGB')
        input_shape = self.input_details[0]['shape'][1:3]  # height, width
        image = image.resize((input_shape[1], input_shape[0]))
        return np.expand_dims(np.array(image, dtype=np.uint8), axis=0)

    def classify(self, image_path):
        input_data = self.preprocess_image(image_path)
        self.interpreter.set_tensor(self.input_details[0]['index'], input_data)
        self.interpreter.invoke()
        output_data = self.interpreter.get_tensor(self.output_details[0]['index'])

        # Handle quantization
        scale = self.output_details[0]['quantization'][0]
        zero_point = self.output_details[0]['quantization'][1]
        if scale != 0:
            dequantized = scale * (output_data.astype(np.float32) - zero_point)
            probability = 1 / (1 + np.exp(-dequantized[0][0]))
        else:
            probability = float(output_data[0][0])

        return {
            'probability': probability,
            'prediction': 'person' if probability > 0.5 else 'non-person',
            'confidence': probability if probability > 0.5 else 1 - probability
        }


# Example usage
if __name__ == '__main__':
    classifier = PersonClassificationServer('person_classification_sram(256x448).tflite')
    result = classifier.classify('test_image.jpg')
    print(f"Prediction: {result['prediction']} (confidence: {result['confidence']:.2%})")
```
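For scripted use, the `PersonClassificationServer` class above can also be wrapped in a small command-line interface, similar in spirit to `inference_example.py` from the Quick Start. The sketch below is illustrative only: the flag names and `main()` helper are ours, and it assumes the class above is defined in the same file.

```python
import argparse

# Hypothetical CLI wrapper; assumes PersonClassificationServer (defined above)
# is available in the same module. Flag names are illustrative.
def main():
    parser = argparse.ArgumentParser(description='Person classification inference')
    parser.add_argument('--model', required=True, help='Path to a .tflite model file')
    parser.add_argument('--image', required=True, help='Path to the image to classify')
    parser.add_argument('--threshold', type=float, default=0.5,
                        help='Decision threshold for the "person" class')
    args = parser.parse_args()

    classifier = PersonClassificationServer(args.model)
    result = classifier.classify(args.image)
    label = 'person' if result['probability'] > args.threshold else 'non-person'
    print(f"{label} (p={result['probability']:.3f})")


if __name__ == '__main__':
    main()
```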
### REST API Server

```python
import os

from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename

app = Flask(__name__)
# PersonClassificationServer as defined in the Python implementation above
classifier = PersonClassificationServer('person_classification_sram(256x448).tflite')


@app.route('/classify', methods=['POST'])
def classify_image():
    if 'image' not in request.files:
        return jsonify({'error': 'No image file'}), 400

    file = request.files['image']
    if file.filename == '':
        return jsonify({'error': 'No file selected'}), 400

    filename = secure_filename(file.filename)
    filepath = os.path.join('/tmp', filename)
    file.save(filepath)

    try:
        result = classifier.classify(filepath)
        os.remove(filepath)  # Cleanup
        return jsonify(result)
    except Exception as e:
        return jsonify({'error': str(e)}), 500


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

## Performance Optimization

### Model Selection Guidelines

1. **Choose the SRAM model when:**
   - Memory is extremely constrained
   - Real-time processing is critical
   - Power consumption is a concern
   - The input resolution is sufficient for the use case

2. **Choose the Flash model when:**
   - Higher accuracy is required
   - Sufficient flash storage is available
   - Processing higher-resolution images
   - A slightly longer inference time is acceptable

### Optimization Techniques

#### Input Image Optimization

```python
import numpy as np
from PIL import Image

# Efficient preprocessing
def optimize_preprocessing(image_path, target_size):
    """Optimized image preprocessing"""
    image = Image.open(image_path)

    # Convert only if necessary
    if image.mode != 'RGB':
        image = image.convert('RGB')

    # Use high-quality resampling for better accuracy
    image = image.resize(target_size, Image.Resampling.LANCZOS)

    # Convert to numpy efficiently
    return np.asarray(image, dtype=np.uint8)
```

#### Batch Processing

```python
def batch_classify(classifier, image_paths, batch_size=8):
    """Process multiple images efficiently"""
    results = []

    for i in range(0, len(image_paths), batch_size):
        batch = image_paths[i:i + batch_size]
        batch_results = []

        for image_path in batch:
            result = classifier.classify(image_path)
            batch_results.append(result)

        results.extend(batch_results)

    return results
```

### Performance Benchmarks

| Platform | Model | Resolution | Inference Time | Memory Usage |
|----------|-------|------------|----------------|--------------|
| **Astra MCU (400MHz)** | SRAM | 480×270 | ~15ms | 80KB RAM |
| **Astra MCU (400MHz)** | Flash | 640×480 | ~25ms | 120KB RAM |
| **Raspberry Pi 4** | SRAM | 480×270 | ~8ms | 50MB RAM |
| **Raspberry Pi 4** | Flash | 640×480 | ~12ms | 55MB RAM |
| **Desktop CPU** | SRAM | 480×270 | ~2ms | 30MB RAM |
| **Desktop CPU** | Flash | 640×480 | ~3ms | 35MB RAM |

## Troubleshooting

### Common Issues

#### Model Loading Errors

```python
import os
import tensorflow as tf

# Issue: "Model file not found"
# Solution: Check the file path and permissions
if not os.path.exists(model_path):
    print(f"Model not found: {model_path}")

# Issue: "Invalid model format"
# Solution: Verify the .tflite file's integrity
try:
    interpreter = tf.lite.Interpreter(model_path=model_path)
except Exception as e:
    print(f"Model loading error: {e}")
```

#### Input Shape Mismatch

```python
# Get the expected input shape
input_details = interpreter.get_input_details()
expected_shape = input_details[0]['shape']
print(f"Expected input shape: {expected_shape}")

# Ensure the image matches the expected dimensions
# (compare as tuples; expected_shape is a numpy array)
if tuple(image_data.shape) != tuple(expected_shape):
    print(f"Shape mismatch: got {image_data.shape}, expected {expected_shape}")
```
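When the shapes differ, resizing the image to the model's expected height and width and adding the leading batch dimension usually resolves it. A minimal sketch, assuming `image_data` is an H×W×3 `uint8` array and `expected_shape` is `[1, height, width, channels]` as printed above:

```python
import numpy as np
from PIL import Image

# Illustrative fix: resize to the model's expected spatial size and add a batch dimension
_, height, width, _ = expected_shape
resized = Image.fromarray(image_data).resize((width, height))
image_data = np.expand_dims(np.asarray(resized, dtype=np.uint8), axis=0)
print(f"Reshaped input: {image_data.shape}")  # should now match expected_shape
```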
#### Quantization Issues

```python
# Check if the model is quantized
output_details = interpreter.get_output_details()
scale = output_details[0]['quantization'][0]
zero_point = output_details[0]['quantization'][1]

if scale == 0:
    print("Model uses float32 output")
else:
    print(f"Quantized model: scale={scale}, zero_point={zero_point}")
```

#### Memory Issues on MCU

```cpp
// Increase the tensor arena size if needed
constexpr int kTensorArenaSize = 150 * 1024;  // Increase from 100KB

// Check the allocation status
TfLiteStatus allocate_status = interpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
  printf("Failed to allocate tensors - increase kTensorArenaSize\n");
}
```

### Debugging Tips

1. **Enable verbose logging:**

   ```python
   tf.get_logger().setLevel('DEBUG')
   ```

2. **Check model details:**

   ```python
   interpreter = tf.lite.Interpreter(model_path=model_path)
   print("Input details:", interpreter.get_input_details())
   print("Output details:", interpreter.get_output_details())
   ```

3. **Validate input data:**

   ```python
   print(f"Input shape: {input_data.shape}")
   print(f"Input dtype: {input_data.dtype}")
   print(f"Input range: [{input_data.min()}, {input_data.max()}]")
   ```

### Support Resources

- **Astra MCU SDK**: Official documentation and support forums
- **TensorFlow Lite**: [Official TFLite documentation](https://www.tensorflow.org/lite)
- **Model Issues**: Check existing GitHub issues or open a new issue with the model details
- **Performance Optimization**: TensorFlow Lite optimization guide

---

For additional support or specific deployment questions, please refer to the main README.md or create an issue in the repository.