
ML Training Pipeline

Complete machine learning pipeline for waste classification using PyTorch and EfficientNet-B0.

Setup

1. Install Dependencies

```bash
pip install -r ml/requirements.txt
```

2. Prepare Dataset

Option A: Use Public Datasets

```bash
# View available datasets
python ml/dataset_prep.py info

# Download datasets from sources in DATASET_SOURCES.txt
# Extract to ml/data/raw/ with category folders

# Organize dataset into train/val/test splits
python ml/dataset_prep.py
```
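As an illustration of what the split step might do, here is a minimal sketch that copies each category folder into train/val/test subfolders. The function name `split_dataset`, the 70/15/15 ratios, and the output layout are assumptions for illustration, not the actual behavior of `ml/dataset_prep.py`.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def split_dataset(raw_dir, out_dir, ratios=(0.7, 0.15, 0.15), seed=42):
    """Copy raw_dir/<category>/* into out_dir/{train,val,test}/<category>/.

    Hypothetical helper: the ratios and folder layout are assumptions.
    Returns {category: {split: count}}.
    """
    raw_dir, out_dir = Path(raw_dir), Path(out_dir)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    counts = {}
    for category_dir in sorted(p for p in raw_dir.iterdir() if p.is_dir()):
        images = sorted(f for f in category_dir.iterdir()
                        if f.suffix.lower() in IMAGE_EXTS)
        rng.shuffle(images)
        n_train = round(len(images) * ratios[0])
        n_val = round(len(images) * ratios[1])
        splits = {
            "train": images[:n_train],
            "val": images[n_train:n_train + n_val],
            "test": images[n_train + n_val:],  # remainder goes to test
        }
        counts[category_dir.name] = {}
        for split, files in splits.items():
            dest = out_dir / split / category_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, dest / f.name)
            counts[category_dir.name][split] = len(files)
    return counts
```

Splitting per category, rather than over the whole pool, keeps every class represented in each split.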

Option B: Use Custom Data

Place your images in:

```
ml/data/raw/
  recyclable/
  organic/
  wet-waste/
  dry-waste/
  ewaste/
  hazardous/
  landfill/
```

Then run:

```bash
python ml/dataset_prep.py
```

Training

Initial Training

Train the initial model, starting from ImageNet-pretrained EfficientNet-B0 weights:

```bash
python ml/train.py
```

Training will:

  • Use transfer learning with ImageNet pretrained weights
  • Apply data augmentation for better generalization
  • Save best model to ml/models/best_model.pth
  • Generate confusion matrix
  • Log training history

Model Architecture

  • Base: EfficientNet-B0 (pretrained on ImageNet)
  • Input: 224x224 RGB images
  • Output: 7 waste categories
  • Parameters: ~5.3M
  • Inference Time: ~50ms on CPU

Why EfficientNet-B0?

  1. Accuracy: Strong ImageNet accuracy for a model of its size
  2. Speed: Fast enough for CPU and mobile/edge inference
  3. Size: Compact model (~20MB)
  4. Efficiency: High accuracy relative to its parameter count
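The ~20MB size is consistent with the ~5.3M parameter count listed under Model Architecture. A rough check, assuming the weights are stored as 32-bit floats and ignoring checkpoint overhead:

```python
# Rough checkpoint size estimate, assuming float32 weights (4 bytes each).
NUM_PARAMS = 5.3e6      # ~5.3M parameters
BYTES_PER_PARAM = 4     # float32

size_mb = NUM_PARAMS * BYTES_PER_PARAM / 1e6
print(f"Estimated size: {size_mb:.1f} MB")  # Estimated size: 21.2 MB
```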

Inference

Python Inference

```python
from ml.predict import WasteClassifier

classifier = WasteClassifier('ml/models/best_model.pth')

# From file path
result = classifier.predict('image.jpg')

# From base64
result = classifier.predict('data:image/jpeg;base64,...')

print(result)
# {
#     'category': 'recyclable',
#     'confidence': 0.95,
#     'probabilities': {...},
#     'timestamp': 1234567890
# }
```
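Since `predict` accepts a base64 data URI as well as a file path, the caller side can be sketched as below. `to_data_uri` and `decode_data_uri` are hypothetical helpers for illustration; how `WasteClassifier` parses its input internally is not shown in this README.

```python
import base64


def to_data_uri(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a data URI (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def decode_data_uri(data_uri):
    """Split a base64 data URI into (mime type, raw bytes)."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


uri = to_data_uri(b"\xff\xd8\xff")  # sample bytes standing in for a JPEG
mime, raw = decode_data_uri(uri)
```

In practice the bytes would come from reading an image file or an upload, and the resulting string would be passed to `classifier.predict`.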

Export to ONNX

For production deployment:

```bash
python -c "from ml.predict import export_to_onnx; export_to_onnx()"
```

Continuous Learning

Collect Feedback

User corrections are saved to:

```
ml/data/retraining/
  recyclable/
  organic/
  ...
```

Retrain Model

Fine-tune model with new samples:

```bash
python ml/retrain.py
```

Retraining will:

  1. Add new samples to training set
  2. Fine-tune existing model (lower learning rate)
  3. Evaluate improvement
  4. Promote model if accuracy improves by >1%
  5. Version models (v1, v2, v3, ...)
  6. Archive retraining samples
  7. Log retraining events
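The promotion rule in step 4 could look like the sketch below. It reads ">1%" as more than one percentage point of absolute accuracy; that reading, and the name `should_promote`, are assumptions rather than the logic in `ml/retrain.py`.

```python
def should_promote(current_acc, candidate_acc, min_gain=0.01):
    """Return True if the candidate beats the current model by more than min_gain.

    Accuracies are fractions in [0, 1]; min_gain=0.01 is one percentage point.
    """
    return (candidate_acc - current_acc) > min_gain


print(should_promote(0.90, 0.92))   # True: gain of 2 points
print(should_promote(0.90, 0.905))  # False: gain of 0.5 points
```

Requiring a margin rather than any improvement guards against promoting a model whose gain is only evaluation noise.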

Automated Retraining

Set up a cron job or scheduled task:

```bash
# Weekly retraining (Sundays at 02:00)
0 2 * * 0 python ml/retrain.py
```

Model Versioning

Models are versioned automatically:

  • best_model.pth - Current production model
  • model_v1.pth - Version 1 (archived)
  • model_v2.pth - Version 2 (archived)
  • best_model_backup_*.pth - Backup before promotion
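One way to pick the next version number from the files already in the models directory is sketched below. `next_version_path` is a hypothetical helper; it relies only on the `model_v<N>.pth` naming listed above.

```python
import re
from pathlib import Path

VERSION_RE = re.compile(r"^model_v(\d+)\.pth$")


def next_version_path(models_dir):
    """Return the path for the next archived version, e.g. model_v3.pth."""
    models_dir = Path(models_dir)
    versions = [
        int(m.group(1))
        for p in models_dir.glob("model_v*.pth")
        if (m := VERSION_RE.match(p.name))
    ]
    # With no archived versions yet, numbering starts at 1.
    return models_dir / f"model_v{max(versions, default=0) + 1}.pth"
```

Parsing the number instead of sorting filenames avoids placing `model_v10.pth` before `model_v2.pth`.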

Evaluation Metrics

  • Accuracy: Overall classification accuracy
  • F1 Score (Macro): Average F1 across all categories
  • F1 Score (Weighted): Weighted by class frequency
  • Confusion Matrix: Per-category performance
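To make the difference between macro and weighted F1 concrete, the sketch below derives both from a confusion matrix in plain Python. The two-class matrix is made-up data, and `f1_scores` is an illustrative helper, not part of the training code.

```python
def f1_scores(confusion):
    """Compute per-class, macro, and weighted F1 from a square confusion matrix.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    n = len(confusion)
    per_class, support = [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true other
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        support.append(sum(confusion[k]))
    macro = sum(per_class) / n
    weighted = sum(f * s for f, s in zip(per_class, support)) / sum(support)
    return per_class, macro, weighted


# Toy example: 90 samples of class 0, 10 samples of class 1.
matrix = [[80, 10],
          [5, 5]]
per_class, macro, weighted = f1_scores(matrix)
print(f"macro={macro:.3f} weighted={weighted:.3f}")
```

Here the rare class has a much lower F1 than the common one, so macro F1 comes out lower than weighted F1. On an imbalanced dataset, macro F1 is the more sensitive indicator of weak minority classes.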

Dataset Requirements

Minimum Samples per Category

  • Training: 500+ images per category
  • Validation: 100+ images per category
  • Test: 100+ images per category
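A quick way to check a prepared dataset against these minimums is sketched below. The counts are made-up, and `find_shortfalls` is a hypothetical helper; in practice the counts would come from listing the split folders.

```python
MIN_SAMPLES = {"train": 500, "val": 100, "test": 100}


def find_shortfalls(counts, minimums=MIN_SAMPLES):
    """Return (split, category, count, required) for each under-filled category.

    counts maps split name -> {category: number of images}.
    """
    shortfalls = []
    for split, required in minimums.items():
        for category, count in sorted(counts.get(split, {}).items()):
            if count < required:
                shortfalls.append((split, category, count, required))
    return shortfalls


example_counts = {  # illustrative numbers only
    "train": {"organic": 620, "ewaste": 310},
    "val": {"organic": 120, "ewaste": 100},
    "test": {"organic": 95, "ewaste": 130},
}
for split, category, count, required in find_shortfalls(example_counts):
    print(f"{split}/{category}: {count} images, need {required}+")
```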

Image Quality

  • Resolution: 640x480 or higher
  • Format: JPG or PNG
  • Lighting: Various conditions
  • Backgrounds: Real-world environments
  • Variety: Different angles, distances, overlaps

Performance Optimization

CPU Inference

  • Uses optimized EfficientNet-B0
  • Inference time: ~50ms per image
  • No GPU required for deployment

GPU Training

  • Trains 10-20x faster on GPU
  • Automatically detects CUDA availability
  • Falls back to CPU if no GPU

Troubleshooting

Low Accuracy

  1. Add more diverse training data
  2. Balance dataset (equal samples per category)
  3. Increase training epochs
  4. Adjust learning rate
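When collecting equal samples per category is not practical, a common alternative is to weight the loss by inverse class frequency. The sketch below computes such weights; the counts are made-up, and whether `ml/train.py` supports class weights is not stated in this README.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count).

    Balanced classes all get weight 1.0; rarer classes get larger weights.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: total / (n_classes * count)
            for name, count in class_counts.items()}


counts = {"recyclable": 1200, "organic": 800, "hazardous": 400}  # illustrative
weights = inverse_frequency_weights(counts)
for name, w in weights.items():
    print(f"{name}: {w:.2f}")
```

In PyTorch, weights like these can be supplied as a tensor through the `weight` argument of `torch.nn.CrossEntropyLoss`, ordered by class index.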

Overfitting

  1. Increase dropout rate
  2. Add more data augmentation
  3. Use early stopping (already enabled)
  4. Collect more training data
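For reference, early stopping can be sketched as a small counter over validation loss. The class below is illustrative: the `patience` and `min_delta` parameters and the loss values are assumptions, and the criterion actually used in `ml/train.py` may differ.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
losses = [0.90, 0.70, 0.72, 0.71, 0.60]  # made-up validation losses
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)  # 4 0.7
```

Note that with `patience=2` training stops at epoch 4 and never sees the better loss at epoch 5. A larger patience trades extra training time for a lower risk of stopping too early.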

Class Confusion

  1. Check confusion matrix
  2. Add more examples for confused classes
  3. Ensure clear visual differences
  4. Review mislabeled data
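Step 1 can be made systematic by ranking the off-diagonal entries of the confusion matrix. The sketch below uses made-up counts for three of the categories; `most_confused_pairs` is a hypothetical helper.

```python
def most_confused_pairs(confusion, labels, top_k=3):
    """Return the top_k (true, predicted, count) off-diagonal entries."""
    pairs = [
        (labels[i], labels[j], confusion[i][j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and confusion[i][j] > 0
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:top_k]


labels = ["organic", "wet-waste", "landfill"]
matrix = [[50, 12, 1],   # rows = true class, columns = predicted class
          [9, 40, 3],
          [2, 0, 45]]    # made-up counts
top = most_confused_pairs(matrix, labels, top_k=2)
for true_label, predicted, count in top:
    print(f"{true_label} -> {predicted}: {count}")
```

The pairs at the top of this ranking are the ones worth targeting first with extra examples or a label review.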

Next Steps

  1. Collect Data: Gather Indian waste images
  2. Initial Training: Train base model
  3. Deploy: Integrate with backend API
  4. Monitor: Track prediction accuracy
  5. Improve: Continuous learning pipeline