ML Training Pipeline
Complete machine learning pipeline for waste classification using PyTorch and EfficientNet-B0.
Setup
1. Install Dependencies
```bash
pip install -r ml/requirements.txt
```
2. Prepare Dataset
Option A: Use Public Datasets
```bash
# View available datasets
python ml/dataset_prep.py info

# Download datasets from sources in DATASET_SOURCES.txt
# Extract to ml/data/raw/ with category folders

# Organize dataset into train/val/test splits
python ml/dataset_prep.py
```
Option B: Use Custom Data
Place your images in:
```
ml/data/raw/
  recyclable/
  organic/
  wet-waste/
  dry-waste/
  ewaste/
  hazardous/
  landfill/
```
Then run:
```bash
python ml/dataset_prep.py
```
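The split step can be pictured roughly as follows. This is a minimal sketch, not the actual contents of ml/dataset_prep.py: the helper name `split_files`, the 70/15/15 percentages, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def split_files(files, train_pct=70, val_pct=15, seed=42):
    """Shuffle file names and cut them into train/val/test lists.

    The percentages and seed are illustrative assumptions, not values
    taken from ml/dataset_prep.py.
    """
    files = sorted(files)                  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(files)
    n_train = len(files) * train_pct // 100
    n_val = len(files) * val_pct // 100
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],   # whatever remains goes to test
    }


splits = split_files([f"img_{i:03d}.jpg" for i in range(100)])
print({name: len(items) for name, items in splits.items()})
# {'train': 70, 'val': 15, 'test': 15}
```

Splitting each category folder separately, rather than the pooled file list, keeps the class proportions similar across the three splits.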
Training
Initial Training
Train the initial model, starting from ImageNet-pretrained EfficientNet-B0 weights:
```bash
python ml/train.py
```
Training will:
- Use transfer learning with ImageNet pretrained weights
- Apply data augmentation for better generalization
- Save best model to ml/models/best_model.pth
- Generate confusion matrix
- Log training history
Model Architecture
- Base: EfficientNet-B0 (pretrained on ImageNet)
- Input: 224x224 RGB images
- Output: 7 waste categories
- Parameters: ~5.3M
- Inference Time: ~50ms on CPU
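A 7-way classifier head typically emits one raw score (logit) per category, which softmax turns into probabilities. The sketch below shows that step in plain Python. The category order and the logit values are assumptions made up for illustration, and the real model may order its outputs differently.

```python
import math

# Category order is an assumption; check the training code for the real mapping.
CATEGORIES = ["recyclable", "organic", "wet-waste", "dry-waste",
              "ewaste", "hazardous", "landfill"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, 0.1, -1.0, -0.3, 0.0, 1.2]   # made-up model outputs
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(CATEGORIES[best], round(probs[best], 2))
```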
Why EfficientNet-B0?
- Accuracy: Strong ImageNet accuracy for its size (about 77% top-1)
- Speed: Optimized for mobile/edge devices
- Size: Compact model (~20MB)
- Efficiency: High accuracy-to-parameters ratio compared with larger CNNs
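The ~20MB size is consistent with the ~5.3M parameter count if the weights are stored as 32-bit floats. That storage type is an assumption here (it is PyTorch's default), and the estimate ignores anything else a checkpoint may contain, such as optimizer state.

```python
# Rough size estimate: parameters x bytes per parameter.
params = 5_300_000        # ~5.3M parameters, as listed under Model Architecture
bytes_per_param = 4       # float32 (assumed storage dtype)

size_mb = params * bytes_per_param / 1_000_000
print(f"approx. {size_mb:.1f} MB")  # approx. 21.2 MB
```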
Inference
Python Inference
```python
from ml.predict import WasteClassifier

classifier = WasteClassifier('ml/models/best_model.pth')

# From file path
result = classifier.predict('image.jpg')

# From base64
result = classifier.predict('data:image/jpeg;base64,...')

print(result)
# {
#     'category': 'recyclable',
#     'confidence': 0.95,
#     'probabilities': {...},
#     'timestamp': 1234567890
# }
```
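`predict` accepts either a file path or a base64 data URI. How `WasteClassifier` tells the two apart is not shown here. One plausible approach, sketched with a hypothetical helper `load_image_bytes`, is to check for the `data:` prefix and decode the payload after the comma:

```python
import base64
from pathlib import Path


def load_image_bytes(source: str) -> bytes:
    """Return raw image bytes from a data URI or a file path.

    Hypothetical helper for illustration; not part of ml/predict.py.
    """
    if source.startswith("data:"):
        # A data URI looks like "data:image/jpeg;base64,<payload>".
        header, _, payload = source.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("only base64-encoded data URIs are handled")
        return base64.b64decode(payload)
    return Path(source).read_bytes()


# Round-trip check with a stand-in payload (not a real image).
uri = "data:image/jpeg;base64," + base64.b64encode(b"fake-jpeg").decode("ascii")
print(load_image_bytes(uri))  # b'fake-jpeg'
```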
Export to ONNX
For production deployment:
```bash
python -c "from ml.predict import export_to_onnx; export_to_onnx()"
```
Continuous Learning
Collect Feedback
User corrections are saved to:
```
ml/data/retraining/
  recyclable/
  organic/
  ...
```
Retrain Model
Fine-tune model with new samples:
```bash
python ml/retrain.py
```
Retraining will:
- Add new samples to training set
- Fine-tune existing model (lower learning rate)
- Evaluate improvement
- Promote model if accuracy improves by >1%
- Version models (v1, v2, v3, ...)
- Archive retraining samples
- Log retraining events
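The promotion rule can be sketched as below. The list does not say whether ">1%" means one percentage point of absolute accuracy or a relative gain. This sketch assumes the absolute reading, and the function name and `min_gain` parameter are illustrative rather than taken from ml/retrain.py.

```python
def should_promote(current_acc: float, candidate_acc: float,
                   min_gain: float = 0.01) -> bool:
    """Return True if the candidate beats the current model by more than min_gain.

    Accuracies are fractions in [0, 1]. min_gain=0.01 is read here as one
    percentage point of absolute accuracy (an assumption).
    """
    return candidate_acc - current_acc > min_gain


print(should_promote(0.900, 0.915))  # True
print(should_promote(0.900, 0.905))  # False
```

Comparing both models on the same held-out test set keeps the gain from being an artifact of different evaluation data.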
Automated Retraining
Set up a cron job or scheduled task:
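```bash
# Weekly retraining (Sundays at 02:00).
# cron does not start in the project directory, so cd there first (placeholder path).
0 2 * * 0 cd /path/to/project && python ml/retrain.py
```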
Model Versioning
Models are versioned automatically:
- best_model.pth - Current production model
- model_v1.pth - Version 1 (archived)
- model_v2.pth - Version 2 (archived)
- best_model_backup_*.pth - Backup before promotion
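A scheme like this needs a way to pick the next version number. One way to do it, assuming archived files follow the model_vN.pth pattern listed above (the helper name `next_version_name` is hypothetical):

```python
import re


def next_version_name(existing: list[str]) -> str:
    """Return the next model_vN.pth name given the existing file names."""
    versions = [
        int(m.group(1))
        for name in existing
        if (m := re.fullmatch(r"model_v(\d+)\.pth", name))
    ]
    # With no archived versions yet, numbering starts at v1.
    return f"model_v{max(versions, default=0) + 1}.pth"


print(next_version_name(["best_model.pth", "model_v1.pth", "model_v2.pth"]))
# model_v3.pth
```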
Evaluation Metrics
- Accuracy: Overall classification accuracy
- F1 Score (Macro): Average F1 across all categories
- F1 Score (Weighted): Weighted by class frequency
- Confusion Matrix: Per-category performance
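To make the definitions concrete, here is a small pure-Python sketch that computes all four metrics from lists of true and predicted labels. The labels and predictions are made up, and the pipeline itself may well use a library such as scikit-learn instead.

```python
from collections import Counter


def evaluate(y_true, y_pred, labels):
    """Compute accuracy, macro/weighted F1, and a confusion matrix."""
    # confusion[t][p] counts samples with true label t predicted as p.
    confusion = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    support = Counter(y_true)          # number of true samples per class
    f1 = {}
    for c in labels:
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in labels if t != c)
        fn = sum(confusion[c][p] for p in labels if p != c)
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0

    total = len(y_true)
    return {
        "accuracy": sum(confusion[c][c] for c in labels) / total,
        "f1_macro": sum(f1.values()) / len(labels),
        "f1_weighted": sum(f1[c] * support[c] for c in labels) / total,
        "confusion": confusion,
    }


labels = ["organic", "recyclable"]
y_true = ["organic", "organic", "organic", "recyclable"]
y_pred = ["organic", "organic", "recyclable", "recyclable"]
metrics = evaluate(y_true, y_pred, labels)
print(metrics["accuracy"])  # 0.75
```

In this toy example macro F1 (about 0.73) is lower than weighted F1 (about 0.77) because the smaller class has the worse score and macro averaging gives it equal weight.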
Dataset Requirements
Minimum Samples per Category
- Training: 500+ images per category
- Validation: 100+ images per category
- Test: 100+ images per category
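These minimums are easy to check before training. The sketch below flags every category that falls short. The thresholds come from the list above, while the helper name and the example counts are made up for illustration.

```python
MIN_SAMPLES = {"train": 500, "val": 100, "test": 100}  # thresholds from the list above


def find_shortfalls(counts):
    """Return (split, category, count) for every category below its minimum.

    counts maps split name -> {category: number of images}.
    """
    return [
        (split, category, n)
        for split, per_category in counts.items()
        for category, n in per_category.items()
        if n < MIN_SAMPLES[split]
    ]


# Made-up counts for two of the seven categories.
counts = {
    "train": {"recyclable": 620, "ewaste": 310},
    "val": {"recyclable": 120, "ewaste": 95},
    "test": {"recyclable": 110, "ewaste": 100},
}
print(find_shortfalls(counts))
# [('train', 'ewaste', 310), ('val', 'ewaste', 95)]
```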
Image Quality
- Resolution: 640x480 or higher
- Format: JPG or PNG
- Lighting: Various conditions
- Backgrounds: Real-world environments
- Variety: Different angles, distances, overlaps
Performance Optimization
CPU Inference
- Uses optimized EfficientNet-B0
- Inference time: ~50ms per image
- No GPU required for deployment
GPU Training
- Trains 10-20x faster on GPU
- Automatically detects CUDA availability
- Falls back to CPU if no GPU
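Device selection of this kind usually comes down to a single check. A minimal sketch, assuming the training script relies on `torch.cuda.is_available()` (the import is guarded only so the snippet also runs where PyTorch is not installed):

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The returned string can be passed to `model.to(...)` and to each batch tensor's `.to(...)` so that the same script runs on either kind of machine.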
Troubleshooting
Low Accuracy
- Add more diverse training data
- Balance dataset (equal samples per category)
- Increase training epochs
- Adjust learning rate
Overfitting
- Increase dropout rate
- Add more data augmentation
- Use early stopping (already enabled)
- Collect more training data
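For reference, early stopping usually amounts to tracking the best validation loss and stopping once it has failed to improve for a set number of epochs. The sketch below shows that idea. The class, its `patience` default, and the loss values are illustrative and not taken from ml/train.py.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
losses = [0.90, 0.70, 0.72, 0.71, 0.69]   # made-up validation losses
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 4
```

Note the trade-off: with `patience=2` the run stops before the fifth epoch, which would have been the best one. A larger patience tolerates such plateaus at the cost of longer training.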
Class Confusion
- Check confusion matrix
- Add more examples for confused classes
- Ensure clear visual differences
- Review mislabeled data
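When reading a confusion matrix, the largest off-diagonal cells show which class pairs to look at first. A small sketch of that lookup, using made-up counts and a hypothetical helper `most_confused`:

```python
def most_confused(confusion, top=3):
    """Return the largest off-diagonal entries as (true, predicted, count)."""
    pairs = [
        (true, pred, count)
        for true, row in confusion.items()
        for pred, count in row.items()
        if true != pred and count > 0
    ]
    return sorted(pairs, key=lambda item: item[2], reverse=True)[:top]


# Made-up counts, indexed as confusion[true][predicted].
confusion = {
    "wet-waste": {"wet-waste": 80, "organic": 15, "landfill": 5},
    "organic": {"wet-waste": 9, "organic": 88, "landfill": 3},
    "landfill": {"wet-waste": 2, "organic": 1, "landfill": 97},
}
print(most_confused(confusion, top=2))
# [('wet-waste', 'organic', 15), ('organic', 'wet-waste', 9)]
```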
Next Steps
- Collect Data: Gather Indian waste images
- Initial Training: Train base model
- Deploy: Integrate with backend API
- Monitor: Track prediction accuracy
- Improve: Continuous learning pipeline