# ML Training Pipeline

Complete machine learning pipeline for waste classification using PyTorch and EfficientNet-B0.

## Setup

### 1. Install Dependencies

\`\`\`bash
pip install -r ml/requirements.txt
\`\`\`

### 2. Prepare Dataset

#### Option A: Use Public Datasets

\`\`\`bash
# View available datasets
python ml/dataset_prep.py info

# Download datasets from sources in DATASET_SOURCES.txt
# Extract to ml/data/raw/ with category folders

# Organize dataset into train/val/test splits
python ml/dataset_prep.py
\`\`\`

#### Option B: Use Custom Data

Place your images in:
\`\`\`
ml/data/raw/
    recyclable/
    organic/
    wet-waste/
    dry-waste/
    ewaste/
    hazardous/
    landfill/
\`\`\`

Then run:
\`\`\`bash
python ml/dataset_prep.py
\`\`\`
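
How `ml/dataset_prep.py` performs the split is not shown in this README. As a rough sketch only, a per-category train/val/test split could look like the following. The 70/15/15 ratios, the output layout, and the `split_dataset` helper are assumptions made for illustration, not the script's actual behavior.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def split_dataset(raw_dir, out_dir, ratios=(0.7, 0.15, 0.15), seed=42):
    """Copy images from raw_dir/<category>/ into out_dir/<split>/<category>/.

    Hypothetical helper: the ratios and folder layout are assumptions.
    Returns {category: {"train": n, "val": n, "test": n}}.
    """
    raw_dir, out_dir = Path(raw_dir), Path(out_dir)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    counts = {}
    for category_dir in sorted(p for p in raw_dir.iterdir() if p.is_dir()):
        images = sorted(
            f for f in category_dir.iterdir()
            if f.suffix.lower() in IMAGE_EXTENSIONS
        )
        rng.shuffle(images)
        n_train = int(len(images) * ratios[0])
        n_val = int(len(images) * ratios[1])
        splits = {
            "train": images[:n_train],
            "val": images[n_train:n_train + n_val],
            "test": images[n_train + n_val:],  # remainder goes to test
        }
        counts[category_dir.name] = {}
        for split_name, files in splits.items():
            target = out_dir / split_name / category_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, target / f.name)
            counts[category_dir.name][split_name] = len(files)
    return counts
```

Splitting each category separately keeps the class proportions roughly the same across the three splits.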

## Training

### Initial Training

Train the initial model, starting from an ImageNet-pretrained EfficientNet-B0:

\`\`\`bash
python ml/train.py
\`\`\`

Training will:
- Use transfer learning with ImageNet pretrained weights
- Apply data augmentation for better generalization
- Save best model to `ml/models/best_model.pth`
- Generate confusion matrix
- Log training history

### Model Architecture

- **Base**: EfficientNet-B0 (pretrained on ImageNet)
- **Input**: 224x224 RGB images
- **Output**: 7 waste categories
- **Parameters**: ~5.3M
- **Inference Time**: ~50ms per image on CPU (varies with hardware)

### Why EfficientNet-B0?

1. **Accuracy**: Strong ImageNet accuracy for its size (about 77% top-1)
2. **Speed**: Lightweight enough for mobile/edge devices
3. **Size**: Compact model (~20MB)
4. **Efficiency**: High accuracy relative to its parameter count
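
As a back-of-the-envelope check, the ~20MB size is consistent with the ~5.3M parameter count if the weights are stored as 32-bit floats, which is an assumption here:

```python
PARAMS = 5.3e6       # approximate parameter count from this README
BYTES_PER_PARAM = 4  # assumption: float32 weights

size_mb = PARAMS * BYTES_PER_PARAM / 1e6
print(f"{size_mb:.1f} MB")  # 21.2 MB
```

A 7-class head is smaller than the 1000-class ImageNet head, so a fine-tuned checkpoint may come out somewhat smaller than this estimate.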

## Inference

### Python Inference

\`\`\`python
from ml.predict import WasteClassifier

classifier = WasteClassifier('ml/models/best_model.pth')

# From file path
result = classifier.predict('image.jpg')

# From base64
result = classifier.predict('data:image/jpeg;base64,...')

print(result)
# {
#   'category': 'recyclable',
#   'confidence': 0.95,
#   'probabilities': {...},
#   'timestamp': 1234567890
# }
\`\`\`
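
The example passes either a file path or a base64 data URI to `predict`. How `WasteClassifier` tells the two apart is not shown. One possible approach is sketched here, where `load_image_bytes` is a hypothetical helper and not part of `ml/predict.py`:

```python
import base64
from pathlib import Path


def load_image_bytes(source):
    """Return raw image bytes from a file path or a base64 data URI.

    Hypothetical helper, not part of ml/predict.py.
    """
    if source.startswith("data:"):
        # Data URI layout: data:<mime>;base64,<payload>
        header, _, payload = source.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("only base64-encoded data URIs are supported")
        return base64.b64decode(payload, validate=True)
    return Path(source).read_bytes()
```

The returned bytes could then be opened with an image library before preprocessing.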

### Export to ONNX

For production deployment:

\`\`\`bash
python -c "from ml.predict import export_to_onnx; export_to_onnx()"
\`\`\`

## Continuous Learning

### Collect Feedback

User corrections are saved to:
\`\`\`
ml/data/retraining/
    recyclable/
    organic/
    ...
\`\`\`

### Retrain Model

Fine-tune model with new samples:

\`\`\`bash
python ml/retrain.py
\`\`\`

Retraining will:
1. Add new samples to training set
2. Fine-tune existing model (lower learning rate)
3. Evaluate improvement
4. Promote model if accuracy improves by >1%
5. Version models (v1, v2, v3, ...)
6. Archive retraining samples
7. Log retraining events
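
The promotion rule in step 4 could be expressed as follows. This is a sketch that assumes accuracies are fractions in [0, 1] and that ">1%" means more than one percentage point. `should_promote` is a hypothetical name, not a function from `ml/retrain.py`.

```python
def should_promote(current_accuracy, candidate_accuracy, min_gain=0.01):
    """Return True if the candidate beats the current model by more than min_gain.

    Assumption: accuracies are fractions in [0, 1] and ">1%" means more than
    one percentage point (0.01 absolute), not a 1% relative gain.
    """
    return (candidate_accuracy - current_accuracy) > min_gain
```

For a fair comparison, both accuracies should be measured on the same held-out set.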

### Automated Retraining

Set up a cron job or scheduled task:

\`\`\`bash
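# Weekly retraining (Sundays at 02:00)
# cron does not start in the project directory, so cd there first (placeholder path)
0 2 * * 0 cd /path/to/project && python ml/retrain.py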
\`\`\`

## Model Versioning

Models are versioned automatically:
- `best_model.pth` - Current production model
- `model_v1.pth` - Version 1 (archived)
- `model_v2.pth` - Version 2 (archived)
- `best_model_backup_*.pth` - Backup before promotion
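
A minimal sketch of how this naming scheme could be maintained is shown below. The helper names, the timestamp format used for the backup suffix, and the order of operations are assumptions, not the behavior of `ml/retrain.py`.

```python
import re
import shutil
import time
from pathlib import Path


def next_version(models_dir):
    """Return the next unused N for model_vN.pth in models_dir."""
    pattern = re.compile(r"model_v(\d+)\.pth")
    versions = [
        int(m.group(1))
        for p in Path(models_dir).iterdir()
        if (m := pattern.fullmatch(p.name))
    ]
    return max(versions, default=0) + 1


def promote(candidate_path, models_dir):
    """Back up best_model.pth, store the candidate as model_vN.pth, then promote it."""
    models_dir = Path(models_dir)
    best = models_dir / "best_model.pth"
    if best.exists():
        stamp = time.strftime("%Y%m%d_%H%M%S")  # assumed timestamp suffix
        shutil.copy2(best, models_dir / f"best_model_backup_{stamp}.pth")
    version = next_version(models_dir)
    shutil.copy2(candidate_path, models_dir / f"model_v{version}.pth")
    shutil.copy2(candidate_path, best)
    return version
```

Taking the backup before overwriting `best_model.pth` leaves a way to roll back if the promoted model misbehaves.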

## Evaluation Metrics

- **Accuracy**: Overall classification accuracy
- **F1 Score (Macro)**: Unweighted mean of the per-category F1 scores
- **F1 Score (Weighted)**: Mean of the per-category F1 scores, weighted by class frequency
- **Confusion Matrix**: Counts of true vs. predicted category, showing which classes get confused
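
To make the definitions concrete, here is a pure-Python sketch of the four metrics. The pipeline itself may well compute them with a library such as scikit-learn; the function names here are illustrative only.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def evaluate(y_true, y_pred, labels):
    """Return accuracy, macro F1, weighted F1, and the confusion matrix."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    total = len(y_true)
    support = Counter(y_true)  # number of true samples per category
    f1_scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        f1_scores[label] = 2 * tp / denom if denom else 0.0
    return {
        "accuracy": sum(matrix[i][i] for i in range(len(labels))) / total,
        "f1_macro": sum(f1_scores.values()) / len(labels),
        "f1_weighted": sum(f1_scores[l] * support[l] for l in labels) / total,
        "confusion_matrix": matrix,
    }
```

Macro F1 treats every category equally, so it drops noticeably when a rare category is classified poorly, while weighted F1 follows the frequent categories more closely.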

## Dataset Requirements

### Minimum Samples per Category

- Training: 500+ images per category
- Validation: 100+ images per category
- Test: 100+ images per category
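
These minimums can be checked before training. The sketch below assumes the splits are stored as `<data_dir>/<split>/<category>/`, which is a guess at the layout; `find_small_categories` is a hypothetical helper.

```python
from pathlib import Path

MIN_IMAGES = {"train": 500, "val": 100, "test": 100}  # minimums from this README
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_small_categories(data_dir, minimums=MIN_IMAGES):
    """Return [(split, category, count)] for folders below the minimum.

    Assumes a data_dir/<split>/<category>/ layout, which is a guess.
    """
    shortfalls = []
    for split, minimum in minimums.items():
        split_dir = Path(data_dir) / split
        if not split_dir.is_dir():
            continue  # skip splits that have not been created yet
        for category_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            count = sum(
                1 for f in category_dir.iterdir()
                if f.suffix.lower() in IMAGE_EXTENSIONS
            )
            if count < minimum:
                shortfalls.append((split, category_dir.name, count))
    return shortfalls
```

An empty result means every category folder that was found meets its minimum.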

### Image Quality

- Resolution: 640x480 or higher
- Format: JPG or PNG
- Lighting: Various conditions
- Backgrounds: Real-world environments
- Variety: Different angles, distances, overlaps

## Performance Optimization

### CPU Inference

- Uses the compact EfficientNet-B0 architecture
- Inference time: ~50ms per image
- No GPU required for deployment

### GPU Training

- Training is typically 10-20x faster on GPU than on CPU
- Automatically detects CUDA availability
- Falls back to CPU if no GPU

## Troubleshooting

### Low Accuracy

1. Add more diverse training data
2. Balance dataset (equal samples per category)
3. Increase training epochs
4. Adjust learning rate

### Overfitting

1. Increase dropout rate
2. Add more data augmentation
3. Use early stopping (already enabled)
4. Collect more training data
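
For reference, early stopping is commonly implemented along these lines. This is a generic sketch: the `EarlyStopping` class and its `patience` default are illustrative and not taken from `ml/train.py`.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0  # improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step` would be called once per epoch with the validation loss, and the loop would break when it returns `True`.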

### Class Confusion

1. Check confusion matrix
2. Add more examples for confused classes
3. Ensure clear visual differences
4. Review mislabeled data
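
When reading the confusion matrix, it can help to list the largest off-diagonal cells first. Here is a small sketch, assuming rows are true labels and columns are predicted labels; `most_confused_pairs` is a hypothetical helper.

```python
def most_confused_pairs(matrix, labels, top_k=3):
    """Return the largest off-diagonal cells as (true_label, predicted_label, count).

    Assumes rows are true labels and columns are predicted labels.
    """
    pairs = [
        (labels[i], labels[j], matrix[i][j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and matrix[i][j] > 0
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)[:top_k]
```

The pairs at the top of the list are the first candidates for extra examples or a label review.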

## Next Steps

1. **Collect Data**: Gather Indian waste images
2. **Initial Training**: Train base model
3. **Deploy**: Integrate with backend API
4. **Monitor**: Track prediction accuracy
5. **Improve**: Continuous learning pipeline