# Deployment Guide - Person Classification TensorFlow Lite Models

This guide provides comprehensive instructions for deploying the person classification models across different platforms and environments.

## Table of Contents
1. [Quick Start](#quick-start)
2. [Astra MCU SDK Deployment](#astra-mcu-sdk-deployment)  
3. [General Embedded Deployment](#general-embedded-deployment)
4. [Desktop/Server Deployment](#desktopserver-deployment)
5. [Performance Optimization](#performance-optimization)
6. [Troubleshooting](#troubleshooting)

## Quick Start

### Installation
```bash
# Install dependencies
pip install -r requirements.txt

# Or minimal installation for inference only
pip install tensorflow numpy pillow
```

### Basic Usage
```bash
# Test with Flash model (VGA resolution)
python inference_example.py --model flash --image your_image.jpg

# Test with SRAM model (WQVGA resolution) 
python inference_example.py --model sram --image your_image.jpg
```

## Astra MCU SDK Deployment

### Prerequisites
- Astra MCU SDK installed and configured
- GCC/AC6 build environment
- SynaToolkit for debugging and deployment
- Astra Machina Micro Kit hardware

### Model Selection Strategy

| Scenario | Recommended Model | Resolution | Memory Location | Use Case |
|----------|------------------|------------|-----------------|----------|
| **High Accuracy Required** | Flash Model | 640×480 | Flash Memory | Security systems, detailed detection |
| **Real-time Processing** | SRAM Model | 480×270 | SRAM | IoT sensors, battery devices |
| **Memory Constrained** | SRAM Model | 480×270 | SRAM | Low-power applications |
| **Balanced Performance** | Flash Model | 640×480 | Flash Memory | General purpose applications |

### Step-by-Step Deployment

#### 1. Project Configuration

**For WQVGA Resolution (SRAM Model):**
```bash
make cm55_person_classification_defconfig
```

**For VGA Resolution (Flash Model):**
```bash
make cm55_person_classification_defconfig
make menuconfig
# Navigate to: COMPONENTS CONFIGURATION → Off Chip Components → Display Resolution
# Change to: VGA(640x480)
```

#### 2. Model Integration

**SRAM Model Setup:**
- Copy `person_classification_sram(256x448).tflite` to your project's model directory
- Model weights loaded into SRAM during initialization
- Faster access but uses SRAM space

**Flash Model Setup:**
- Copy `person_classification_flash(448x640).tflite` to your project's model directory  
- Generate binary file for flash deployment:
  ```bash
  # Use Vela compilation guide to generate .bin file
  # Flash to address: 0x629000 (calculated based on your NVM_data.json)
  ```

#### 3. Build Process
```bash
# Build the application
make build

# Or simply
make
```

#### 4. Binary Generation
1. Open Astra MCU SDK VSCode Extension
2. Navigate to **AXF/ELF TO BIN** → **Bin Conversion**
3. Load generated `sr110_cm55_fw.elf` or `sr110_cm55_fw.axf`
4. Click **Run Image Generator**

#### 5. Flashing

**WQVGA (SRAM Model):**
```bash
# Flash the main application binary
# File: B0_flash_full_image_GD25LE128_67Mhz_secured.bin
# The model is loaded into SRAM during runtime
```

**VGA (Flash Model):**
```bash
# 1. Flash the model binary first
# File: person_classification_flash(448x640).bin  
# Address: 0x629000

# 2. Flash the main application binary
# File: B0_flash_full_image_GD25LE128_67Mhz_secured.bin
```

#### 6. Verification
1. Connect to Application SR110 USB port
2. Open SynaToolkit
3. Connect to COM port for logging
4. Use Tools → Video Streamer for testing
5. Configure UC ID: PERSON_CLASSIFICATION

## General Embedded Deployment

### TensorFlow Lite Micro Integration

```cpp
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Model data (convert .tflite to C array)
extern const unsigned char person_model[];
extern const int person_model_len;

// Tensor arena size (adjust based on model)
constexpr int kTensorArenaSize = 100 * 1024;  // ~100KB for the SRAM model; use ~150KB for the Flash model

class PersonClassifier {
private:
    uint8_t tensor_arena[kTensorArenaSize];
    tflite::MicroInterpreter* interpreter;
    TfLiteTensor* input;
    TfLiteTensor* output;

public:
    bool Initialize() {
        // Load model
        const tflite::Model* model = tflite::GetModel(person_model);
        
        // Set up error reporter, resolver, and interpreter
        static tflite::MicroErrorReporter error_reporter;
        tflite::AllOpsResolver resolver;
        static tflite::MicroInterpreter static_interpreter(
            model, resolver, tensor_arena, kTensorArenaSize, &error_reporter);
        interpreter = &static_interpreter;

        // Allocate tensors
        TfLiteStatus allocate_status = interpreter->AllocateTensors();
        if (allocate_status != kTfLiteOk) {
            return false;
        }

        // Get input and output tensors
        input = interpreter->input(0);
        output = interpreter->output(0);
        
        return true;
    }

    float ClassifyImage(uint8_t* image_data) {
        // Copy image data to input tensor
        memcpy(input->data.uint8, image_data, input->bytes);

        // Run inference
        if (interpreter->Invoke() != kTfLiteOk) {
            return -1.0f;  // Error
        }

        // Get result (dequantize if needed)
        if (output->type == kTfLiteUInt8) {
            uint8_t output_quantized = output->data.uint8[0];
            return (output_quantized - output->params.zero_point) * output->params.scale;
        } else {
            return output->data.f[0];
        }
    }
};
```
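
The `person_model` / `person_model_len` symbols above assume the `.tflite` file has been converted to a C array (for example with `xxd -i`). The sketch below is one way to generate such an array without external tools; the file and symbol names follow the `extern` declarations above and are otherwise illustrative.

```python
# Sketch: convert a .tflite file into a C source file containing the model
# as a byte array (an alternative to `xxd -i`). Names are illustrative.
from pathlib import Path

def tflite_to_c_array(tflite_path, out_path, symbol="person_model"):
    data = Path(tflite_path).read_bytes()
    lines = [f"// Generated from {tflite_path}",
             f"const unsigned char {symbol}[] = {{"]
    for i in range(0, len(data), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        lines.append(f"    {chunk},")
    lines.append("};")
    lines.append(f"const int {symbol}_len = {len(data)};")
    Path(out_path).write_text("\n".join(lines) + "\n")

# Example:
# tflite_to_c_array("person_classification_sram(256x448).tflite", "person_model_data.cc")
```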

### Memory Requirements

| Model | Tensor Arena | Model Size | Total RAM | Flash Usage |
|-------|-------------|------------|-----------|-------------|
| **SRAM Model** | ~80KB | 1.5MB | ~2.5MB | Minimal |
| **Flash Model** | ~120KB | 1.5MB | ~200KB | 1.5MB |

## Desktop/Server Deployment

### Python Implementation

```python
#!/usr/bin/env python3
import tensorflow as tf
import numpy as np
from PIL import Image
import argparse

class PersonClassificationServer:
    def __init__(self, model_path):
        self.interpreter = tf.lite.Interpreter(model_path=model_path)
        self.interpreter.allocate_tensors()
        self.input_details = self.interpreter.get_input_details()
        self.output_details = self.interpreter.get_output_details()
    
    def preprocess_image(self, image_path):
        image = Image.open(image_path).convert('RGB')
        input_shape = self.input_details[0]['shape'][1:3]  # height, width
        image = image.resize((input_shape[1], input_shape[0]))
        return np.expand_dims(np.array(image, dtype=np.uint8), axis=0)
    
    def classify(self, image_path):
        input_data = self.preprocess_image(image_path)
        self.interpreter.set_tensor(self.input_details[0]['index'], input_data)
        self.interpreter.invoke()
        output_data = self.interpreter.get_tensor(self.output_details[0]['index'])
        
        # Handle quantization
        scale = self.output_details[0]['quantization'][0]
        zero_point = self.output_details[0]['quantization'][1]
        
        if scale != 0:
            dequantized = scale * (output_data.astype(np.float32) - zero_point)
            probability = 1 / (1 + np.exp(-dequantized[0][0]))
        else:
            probability = float(output_data[0][0])
        
        return {
            'probability': probability,
            'prediction': 'person' if probability > 0.5 else 'non-person',
            'confidence': probability if probability > 0.5 else 1 - probability
        }

# Example usage
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Person classification inference')
    parser.add_argument('--model', default='person_classification_sram(256x448).tflite')
    parser.add_argument('--image', default='test_image.jpg')
    args = parser.parse_args()

    classifier = PersonClassificationServer(args.model)
    result = classifier.classify(args.image)
    print(f"Prediction: {result['prediction']} (confidence: {result['confidence']:.2%})")
```

### REST API Server

```python
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
import os

app = Flask(__name__)
classifier = PersonClassificationServer('person_classification_sram(256x448).tflite')

@app.route('/classify', methods=['POST'])
def classify_image():
    if 'image' not in request.files:
        return jsonify({'error': 'No image file'}), 400
    
    file = request.files['image']
    if file.filename == '':
        return jsonify({'error': 'No file selected'}), 400
    
    filename = secure_filename(file.filename)
    filepath = os.path.join('/tmp', filename)
    file.save(filepath)
    
    try:
        result = classifier.classify(filepath)
        os.remove(filepath)  # Cleanup
        return jsonify(result)
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
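
A quick way to exercise the endpoint above is a small client script. The sketch below assumes the server is running locally on port 5000 and that the `requests` library is installed; the image file name is illustrative.

```python
# Minimal client sketch for the /classify endpoint above.
# Assumes the Flask server is running locally on port 5000.
import requests

with open('test_image.jpg', 'rb') as f:
    response = requests.post(
        'http://localhost:5000/classify',
        files={'image': ('test_image.jpg', f, 'image/jpeg')},
    )

print(response.status_code, response.json())
```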

## Performance Optimization

### Model Selection Guidelines

1. **Choose the SRAM model when:**
   - Memory is extremely constrained
   - Real-time processing is critical
   - Power consumption is a concern
   - The input resolution is sufficient for the use case

2. **Choose the Flash model when** (see the selection sketch after this list):
   - Higher accuracy is required
   - Sufficient flash storage is available
   - You are processing higher-resolution images
   - A slightly longer inference time is acceptable
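
These criteria can be folded into a small helper when model selection happens at application start. The sketch below is illustrative only; the parameter names and flash threshold are hypothetical (not part of the SDK) and simply encode the guidelines above.

```python
# Illustrative helper encoding the selection guidelines above.
# Parameter names and the flash threshold are hypothetical.
def select_model(flash_available_mb, need_high_accuracy=False):
    """Return the model file suggested by the guidelines in this guide."""
    if need_high_accuracy and flash_available_mb >= 2:
        # Higher accuracy / higher resolution, model stored in flash
        return 'person_classification_flash(448x640).tflite'
    # Memory-, power-, or latency-constrained: SRAM model
    return 'person_classification_sram(256x448).tflite'

# Example:
# model_path = select_model(flash_available_mb=4, need_high_accuracy=True)
```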

### Optimization Techniques

#### Input Image Optimization
```python
# Efficient preprocessing
def optimize_preprocessing(image_path, target_size):
    """Optimized image preprocessing"""
    image = Image.open(image_path)
    
    # Convert only if necessary
    if image.mode != 'RGB':
        image = image.convert('RGB')
    
    # Use high-quality resampling for better accuracy
    image = image.resize(target_size, Image.Resampling.LANCZOS)
    
    # Convert to numpy efficiently
    return np.asarray(image, dtype=np.uint8)
```

#### Batch Processing
```python
def batch_classify(classifier, image_paths, batch_size=8):
    """Process multiple images efficiently"""
    results = []
    
    for i in range(0, len(image_paths), batch_size):
        batch = image_paths[i:i+batch_size]
        batch_results = []
        
        for image_path in batch:
            result = classifier.classify(image_path)
            batch_results.append(result)
        
        results.extend(batch_results)
    
    return results
```

### Performance Benchmarks

| Platform | Model | Resolution | Inference Time | Memory Usage |
|----------|--------|------------|----------------|--------------|
| **Astra MCU (400MHz)** | SRAM | 480×270 | ~15ms | 80KB RAM |
| **Astra MCU (400MHz)** | Flash | 640×480 | ~25ms | 120KB RAM |
| **Raspberry Pi 4** | SRAM | 480×270 | ~8ms | 50MB RAM |
| **Raspberry Pi 4** | Flash | 640×480 | ~12ms | 55MB RAM |
| **Desktop CPU** | SRAM | 480×270 | ~2ms | 30MB RAM |
| **Desktop CPU** | Flash | 640×480 | ~3ms | 35MB RAM |
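
The desktop and Raspberry Pi figures above can be reproduced approximately with a simple timing loop. The sketch below is a rough measurement harness, not the one used to produce this table; results will vary with hardware, system load, and TensorFlow version.

```python
# Rough latency measurement for a .tflite model on desktop/Raspberry Pi.
# Not the harness used for the table above; numbers will vary by machine.
import time
import numpy as np
import tensorflow as tf

def measure_latency(model_path, runs=100):
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    dummy = np.zeros(input_details['shape'], dtype=input_details['dtype'])

    # Warm-up so one-time allocations are not counted
    interpreter.set_tensor(input_details['index'], dummy)
    interpreter.invoke()

    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1000.0  # ms per inference

# Example:
# print(f"{measure_latency('person_classification_sram(256x448).tflite'):.1f} ms")
```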

## Troubleshooting

### Common Issues

#### Model Loading Errors
```python
# Issue: "Model file not found"
# Solution: Check file path and permissions
import os
if not os.path.exists(model_path):
    print(f"Model not found: {model_path}")
    
# Issue: "Invalid model format"  
# Solution: Verify .tflite file integrity
try:
    interpreter = tf.lite.Interpreter(model_path=model_path)
except Exception as e:
    print(f"Model loading error: {e}")
```

#### Input Shape Mismatch
```python
# Get expected input shape
input_details = interpreter.get_input_details()
expected_shape = input_details[0]['shape']
print(f"Expected input shape: {expected_shape}")

# Ensure image matches expected dimensions
# (convert both to tuples: 'shape' from input details is a numpy array)
if tuple(image_data.shape) != tuple(expected_shape):
    print(f"Shape mismatch: got {image_data.shape}, expected {tuple(expected_shape)}")
```

#### Quantization Issues
```python
# Check if model is quantized
output_details = interpreter.get_output_details()
scale = output_details[0]['quantization'][0]
zero_point = output_details[0]['quantization'][1]

if scale == 0:
    print("Model uses float32 output")
else:
    print(f"Quantized model: scale={scale}, zero_point={zero_point}")
```

#### Memory Issues on MCU
```cpp
// Increase tensor arena size if needed
constexpr int kTensorArenaSize = 150 * 1024;  // Increase from 100KB

// Check allocation status
TfLiteStatus allocate_status = interpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
    printf("Failed to allocate tensors - increase kTensorArenaSize\n");
}
```

### Debugging Tips

1. **Enable Verbose Logging:**
   ```python
   tf.get_logger().setLevel('DEBUG')
   ```

2. **Check Model Details:**
   ```python
   interpreter = tf.lite.Interpreter(model_path=model_path)
   print("Input details:", interpreter.get_input_details())
   print("Output details:", interpreter.get_output_details())
   ```

3. **Validate Input Data:**
   ```python
   print(f"Input shape: {input_data.shape}")
   print(f"Input dtype: {input_data.dtype}")
   print(f"Input range: [{input_data.min()}, {input_data.max()}]")
   ```

### Support Resources

- **Astra MCU SDK**: Official documentation and support forums
- **TensorFlow Lite**: [Official TFLite documentation](https://www.tensorflow.org/lite)
- **Model Issues**: Check existing GitHub issues or open a new issue with model details
- **Performance Optimization**: TensorFlow Lite optimization guide

---

For additional support or specific deployment questions, please refer to the main README.md or create an issue in the repository.