garbage-segregate / ml /README.md

Rahiq

Deploy waste classification backend with ML model

bf17f74 25 days ago

ML Training Pipeline

Complete machine learning pipeline for waste classification using PyTorch and EfficientNet-B0.

Setup

1. Install Dependencies

```bash
pip install -r ml/requirements.txt
```

2. Prepare Dataset

Option A: Use Public Datasets

```bash
# View available datasets
python ml/dataset_prep.py info

# Download datasets from sources in DATASET_SOURCES.txt
# Extract to ml/data/raw/ with category folders

# Organize dataset into train/val/test splits
python ml/dataset_prep.py
```

Option B: Use Custom Data

Place your images in:

```
ml/data/raw/
    recyclable/
    organic/
    wet-waste/
    dry-waste/
    ewaste/
    hazardous/
    landfill/
```

Then run:

```bash
python ml/dataset_prep.py
```
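The split step can be pictured roughly as follows. This is a minimal sketch, not the actual logic in `ml/dataset_prep.py`: the 70/15/15 ratios, the fixed seed, and the helper name `split_files` are assumptions made for illustration.

```python
import random


def split_files(files, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle file names and cut them into train/val/test lists.

    The ratios and seed are illustrative assumptions, not values
    taken from ml/dataset_prep.py.
    """
    files = sorted(files)               # stable starting order
    random.Random(seed).shuffle(files)  # reproducible shuffle
    n_train = int(len(files) * ratios[0])
    n_val = int(len(files) * ratios[1])
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],  # remainder goes to test
    }


splits = split_files([f"img_{i:03d}.jpg" for i in range(100)])
print({name: len(items) for name, items in splits.items()})
```

Splitting per category rather than over the whole dataset keeps every class represented in each split.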

Training

Initial Training

Train the initial model by fine-tuning an ImageNet-pretrained EfficientNet-B0:

```bash
python ml/train.py
```

Training will:

Use transfer learning with ImageNet pretrained weights
Apply data augmentation for better generalization
Save best model to ml/models/best_model.pth
Generate confusion matrix
Log training history

Model Architecture

Base: EfficientNet-B0 (pretrained on ImageNet)
Input: 224x224 RGB images
Output: 7 waste categories
Parameters: ~5.3M
Inference Time: ~50ms on CPU

Why EfficientNet-B0?

Accuracy: About 77% top-1 on ImageNet, strong for a model of this size
Speed: Designed for mobile/edge devices
Size: Compact model (~20MB)
Efficiency: High accuracy relative to its parameter count
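The size figure follows from the parameter count. As a quick check, assuming the weights are stored as 32-bit floats (an assumption, since the checkpoint format is not stated here):

```python
# Rough size estimate for a float32 checkpoint: 4 bytes per parameter.
PARAMS = 5_300_000     # ~5.3M parameters, as listed under Model Architecture
BYTES_PER_PARAM = 4    # float32 weights (assumption)

size_mb = PARAMS * BYTES_PER_PARAM / 1_000_000
print(f"{size_mb:.1f} MB")  # 21.2 MB
```

The saved file can differ somewhat from this estimate depending on the classifier head and any extra state stored alongside the weights.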

Inference

Python Inference

```python
from ml.predict import WasteClassifier

classifier = WasteClassifier('ml/models/best_model.pth')

# From file path
result = classifier.predict('image.jpg')

# From base64
result = classifier.predict('data:image/jpeg;base64,...')

print(result)
# {
#     'category': 'recyclable',
#     'confidence': 0.95,
#     'probabilities': {...},
#     'timestamp': 1234567890
# }
```
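For the base64 case, the input is a data URL. The sketch below shows one way such a string can be split and decoded into raw image bytes. The helper `decode_data_url` is hypothetical and is not part of `ml/predict.py`, which may handle this differently.

```python
import base64


def decode_data_url(data_url):
    """Split a base64 data URL into its MIME type and decoded bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Round-trip a few fake bytes to show the shape of the input.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image"
url = "data:image/jpeg;base64," + base64.b64encode(fake_jpeg).decode("ascii")
mime_type, raw = decode_data_url(url)
print(mime_type, len(raw))
```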

Export to ONNX

For production deployment:

```bash
python -c "from ml.predict import export_to_onnx; export_to_onnx()"
```

Continuous Learning

Collect Feedback

User corrections are saved to:

```
ml/data/retraining/
    recyclable/
    organic/
    ...
```
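As an illustration of how a correction might land in that layout, here is a minimal sketch. The function `save_correction`, the timestamp-based file name, and the `.jpg` extension are assumptions. The backend's real feedback handler is not shown in this README.

```python
import tempfile
import time
from pathlib import Path

# Category folder names as listed under "Option B: Use Custom Data".
CATEGORIES = {"recyclable", "organic", "wet-waste", "dry-waste",
              "ewaste", "hazardous", "landfill"}


def save_correction(image_bytes, category, root="ml/data/retraining"):
    """Write a corrected sample into root/<category>/ and return its path."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    target_dir = Path(root) / category
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{time.time_ns()}.jpg"  # hypothetical naming scheme
    path.write_bytes(image_bytes)
    return path


# Demo in a temporary directory so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_correction(b"fake image bytes", "organic", root=tmp)
    print(saved.parent.name, saved.exists())
```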

Retrain Model

Fine-tune model with new samples:

```bash
python ml/retrain.py
```

Retraining will:

Add new samples to training set
Fine-tune existing model (lower learning rate)
Evaluate improvement
Promote model if accuracy improves by >1%
Version models (v1, v2, v3, ...)
Archive retraining samples
Log retraining events
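The promotion rule in the list above can be sketched as a simple comparison. This assumes ">1%" means more than one percentage point of absolute accuracy, measured on the same evaluation set. `ml/retrain.py` may define it differently, and `should_promote` is a hypothetical name.

```python
def should_promote(current_acc, candidate_acc, min_gain=0.01):
    """Return True when the candidate beats the current model by more than min_gain.

    Accuracies are fractions in [0, 1]. min_gain=0.01 reads ">1%" as one
    percentage point, which is an assumption about ml/retrain.py.
    """
    return (candidate_acc - current_acc) > min_gain


print(should_promote(0.90, 0.92))   # True: gain of 2 points
print(should_promote(0.90, 0.905))  # False: gain of 0.5 points
```

Requiring a margin rather than any improvement helps avoid swapping models on differences that are within evaluation noise.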

Automated Retraining

Set up a cron job or scheduled task:

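```bash
# Weekly retraining: every Sunday at 02:00
# cron does not start in the project directory, so use absolute paths or cd there first
0 2 * * 0 python ml/retrain.py
```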

Model Versioning

Models are versioned automatically:

best_model.pth - Current production model
model_v1.pth - Version 1 (archived)
model_v2.pth - Version 2 (archived)
best_model_backup_*.pth - Backup before promotion
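One way to pick the next version number from the files already on disk is sketched below. The helper `next_version_name` and the example backup suffix are hypothetical. The actual versioning code is not shown in this README.

```python
import re


def next_version_name(existing_files):
    """Return the next model_vN.pth name given the files already present."""
    versions = [
        int(m.group(1))
        for name in existing_files
        if (m := re.fullmatch(r"model_v(\d+)\.pth", name))
    ]
    return f"model_v{max(versions, default=0) + 1}.pth"


# The backup suffix below is made up for the example.
files = ["best_model.pth", "model_v1.pth", "model_v2.pth",
         "best_model_backup_example.pth"]
print(next_version_name(files))  # model_v3.pth
```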

Evaluation Metrics

Accuracy: Overall classification accuracy
F1 Score (Macro): Average F1 across all categories
F1 Score (Weighted): Weighted by class frequency
Confusion Matrix: Per-category performance
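To make the difference between macro and weighted F1 concrete, the sketch below computes both from a small confusion matrix in plain Python. The three-class matrix is made up for illustration and is not a result from this model.

```python
def f1_scores(confusion, labels):
    """Compute per-class, macro, and weighted F1 from a confusion matrix.

    confusion[i][j] counts samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    per_class, weighted = {}, 0.0
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        support = sum(confusion[i])                         # true count of class i
        predicted = sum(confusion[r][i] for r in range(n))  # predicted as class i
        denom = support + predicted                         # equals 2TP + FP + FN
        f1 = 2 * tp / denom if denom else 0.0
        per_class[label] = f1
        weighted += f1 * support / total
    macro = sum(per_class.values()) / n
    return per_class, macro, weighted


labels = ["recyclable", "organic", "hazardous"]
confusion = [[80, 15, 5],   # illustrative counts only
             [10, 85, 5],
             [4, 2, 4]]
per_class, macro, weighted = f1_scores(confusion, labels)
print(f"macro={macro:.2f} weighted={weighted:.2f}")  # macro=0.67 weighted=0.81
```

The rare, poorly classified class pulls the macro score well below the weighted one, which is why both are worth tracking when categories are imbalanced.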

Dataset Requirements

Minimum Samples per Category

Training: 500+ images per category
Validation: 100+ images per category
Test: 100+ images per category
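A quick check against these minimums might look like the following sketch. The counts are hypothetical, and `underfilled_categories` is an illustrative helper rather than part of `ml/dataset_prep.py`.

```python
MIN_TRAIN_IMAGES = 500  # training minimum from the list above


def underfilled_categories(counts, minimum=MIN_TRAIN_IMAGES):
    """Return categories below the minimum, mapped to how many images are missing."""
    return {name: minimum - n for name, n in counts.items() if n < minimum}


# Hypothetical per-category training counts, for illustration only.
counts = {"recyclable": 812, "organic": 640, "ewaste": 310, "hazardous": 95}
print(underfilled_categories(counts))  # {'ewaste': 190, 'hazardous': 405}
```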

Image Quality

Resolution: 640x480 or higher
Format: JPG or PNG
Lighting: Various conditions
Backgrounds: Real-world environments
Variety: Different angles, distances, overlaps

Performance Optimization

CPU Inference

Uses optimized EfficientNet-B0
Inference time: ~50ms per image
No GPU required for deployment

GPU Training

Trains 10-20x faster on GPU
Automatically detects CUDA availability
Falls back to CPU if no GPU

Troubleshooting

Low Accuracy

Add more diverse training data
Balance dataset (equal samples per category)
Increase training epochs
Adjust learning rate
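When collecting equal samples per category is not practical, a common alternative is to weight the loss by inverse class frequency. The sketch below uses the same formula scikit-learn applies for `class_weight="balanced"`. Whether `ml/train.py` supports class weights is not stated here, so treat this as an optional idea, with made-up counts.

```python
def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * count)."""
    total = sum(counts.values())
    k = len(counts)
    return {name: total / (k * n) for name, n in counts.items()}


# Hypothetical training counts, for illustration only.
counts = {"recyclable": 800, "organic": 600, "hazardous": 100}
weights = inverse_frequency_weights(counts)
print({name: round(w, 3) for name, w in weights.items()})
```

Rare classes receive larger weights, so their errors contribute more to the loss.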

Overfitting

Increase dropout rate
Add more data augmentation
Use early stopping (already enabled)
Collect more training data
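For reference, early stopping can be sketched in a few lines. The patience value, the use of validation loss as the monitored quantity, and the class name `EarlyStopping` are assumptions. The version in `ml/train.py` may differ.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch and return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.64]  # made-up values
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}")  # stopping after epoch 5
        break
```

A small patience stops sooner but can end training during a temporary plateau, as the last value in the example shows.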

Class Confusion

Check confusion matrix
Add more examples for confused classes
Ensure clear visual differences
Review mislabeled data

Next Steps

Collect Data: Gather Indian waste images
Initial Training: Train base model
Deploy: Integrate with backend API
Monitor: Track prediction accuracy
Improve: Continuous learning pipeline