# ML Training Pipeline

Complete machine learning pipeline for waste classification using PyTorch and EfficientNet-B0.

## Setup

### 1. Install Dependencies

```bash
pip install -r ml/requirements.txt
```
### 2. Prepare Dataset

#### Option A: Use Public Datasets

```bash
# View available datasets
python ml/dataset_prep.py info

# Download datasets from the sources listed in DATASET_SOURCES.txt,
# then extract them to ml/data/raw/ with one folder per category.

# Organize the dataset into train/val/test splits
python ml/dataset_prep.py
```
#### Option B: Use Custom Data

Place your images in:

```
ml/data/raw/
  recyclable/
  organic/
  wet-waste/
  dry-waste/
  ewaste/
  hazardous/
  landfill/
```

Then run:

```bash
python ml/dataset_prep.py
```
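Once the script finishes, the splits can be consumed directly with torchvision. A minimal sketch, assuming `dataset_prep.py` writes `ImageFolder`-style splits under `ml/data/train`, `ml/data/val`, and `ml/data/test` (the output paths are not documented here, so verify them in the script):

```python
# Minimal sketch: loading a prepared split with torchvision.
# The split paths below are assumptions about dataset_prep.py's output.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # match the model's 224x224 input
    transforms.ToTensor(),
])

train_ds = datasets.ImageFolder('ml/data/train', transform=transform)
print(train_ds.classes)  # category folder names become class labels
```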
## Training

### Initial Training

Fine-tune a pretrained EfficientNet-B0 on your dataset:

```bash
python ml/train.py
```

Training will:

- Use transfer learning with ImageNet pretrained weights
- Apply data augmentation for better generalization (sketched below)
- Save the best model to `ml/models/best_model.pth`
- Generate a confusion matrix
- Log training history
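The exact augmentations live in `ml/train.py`; the pipeline below is an illustrative sketch of the kind of transforms typically applied, not a copy of the script:

```python
# Illustrative augmentation pipeline; ml/train.py may use different transforms.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    # ImageNet statistics, matching the pretrained backbone
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```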
### Model Architecture

- **Base**: EfficientNet-B0 (pretrained on ImageNet)
- **Input**: 224x224 RGB images
- **Output**: 7 waste categories
- **Parameters**: ~5.3M
- **Inference Time**: ~50ms on CPU
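Building such a model with torchvision takes a few lines. This is a sketch of the setup described above, not necessarily how `ml/train.py` constructs it:

```python
# Sketch: EfficientNet-B0 backbone with a 7-class waste head.
import torch.nn as nn
from torchvision import models

model = models.efficientnet_b0(
    weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
# Replace the 1000-class ImageNet head with a 7-class waste head
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 7)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
# ~4.0M with the 7-class head (stock B0 with its 1000-class head is ~5.3M)
```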
### Why EfficientNet-B0?

1. **Accuracy**: Strong classification accuracy for its size
2. **Speed**: Optimized for mobile/edge devices
3. **Size**: Compact model (~20MB)
4. **Efficiency**: Excellent accuracy-to-parameter ratio
## Inference

### Python Inference

```python
from ml.predict import WasteClassifier

classifier = WasteClassifier('ml/models/best_model.pth')

# From a file path
result = classifier.predict('image.jpg')

# From base64
result = classifier.predict('data:image/jpeg;base64,...')

print(result)
# {
#     'category': 'recyclable',
#     'confidence': 0.95,
#     'probabilities': {...},
#     'timestamp': 1234567890
# }
```
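If you need to construct that base64 data URL yourself, a small helper does it; `to_data_url` is a hypothetical name, not part of `ml/predict`:

```python
# Hypothetical helper: build the base64 data URL accepted by predict().
import base64

def to_data_url(path: str) -> str:
    with open(path, 'rb') as f:
        encoded = base64.b64encode(f.read()).decode('ascii')
    return f'data:image/jpeg;base64,{encoded}'

result = classifier.predict(to_data_url('image.jpg'))
```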
### Export to ONNX

For production deployment:

```bash
python -c "from ml.predict import export_to_onnx; export_to_onnx()"
```
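A quick sanity check on the exported graph can be run with `onnxruntime`. The `.onnx` path below is an assumption; check `export_to_onnx()` for the actual output location:

```python
# Sanity-check the exported model; the file path is an assumption.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('ml/models/best_model.onnx')
input_name = session.get_inputs()[0].name

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # NCHW, 224x224 RGB
logits = session.run(None, {input_name: dummy})[0]
print(logits.shape)  # expect (1, 7): one score per waste category
```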
## Continuous Learning

### Collect Feedback

User corrections are saved to:

```
ml/data/retraining/
  recyclable/
  organic/
  ...
```
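A correction is just an image filed under its true category. A minimal sketch (the file-naming scheme is illustrative, not necessarily what the backend uses):

```python
# Minimal sketch: record a user-corrected sample for retraining.
import shutil
import time
from pathlib import Path

def save_correction(image_path: str, corrected_category: str) -> None:
    target = Path('ml/data/retraining') / corrected_category
    target.mkdir(parents=True, exist_ok=True)
    # Timestamped name avoids collisions between feedback samples
    dest = target / f'{int(time.time())}_{Path(image_path).name}'
    shutil.copy(image_path, dest)
```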
### Retrain Model

Fine-tune the model with new samples:

```bash
python ml/retrain.py
```

Retraining will:

1. Add new samples to the training set
2. Fine-tune the existing model (with a lower learning rate)
3. Evaluate improvement
4. Promote the model if accuracy improves by >1% (sketched below)
5. Version models (v1, v2, v3, ...)
6. Archive retraining samples
7. Log retraining events
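The promote-and-version rule (steps 4-5) can be expressed compactly. This sketch mirrors the description above; `candidate_model.pth` is a hypothetical filename for the freshly retrained weights, and the real logic lives in `ml/retrain.py`:

```python
# Sketch of steps 4-5 above; candidate_model.pth is a hypothetical name.
import shutil
from pathlib import Path

def promote_if_better(new_acc: float, old_acc: float, version: int) -> bool:
    if new_acc - old_acc <= 0.01:  # require a > 1% absolute accuracy gain
        return False
    models = Path('ml/models')
    # Archive the current production model, then promote the candidate
    shutil.copy(models / 'best_model.pth', models / f'model_v{version}.pth')
    shutil.move(models / 'candidate_model.pth', models / 'best_model.pth')
    return True
```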
### Automated Retraining

Set up a cron job or scheduled task:

```bash
# Retrain weekly (every Sunday at 02:00)
0 2 * * 0 python ml/retrain.py
```
## Model Versioning

Models are versioned automatically:

- `best_model.pth` - Current production model
- `model_v1.pth` - Version 1 (archived)
- `model_v2.pth` - Version 2 (archived)
- `best_model_backup_*.pth` - Backup created before promotion
## Evaluation Metrics

- **Accuracy**: Overall classification accuracy
- **F1 Score (Macro)**: Unweighted average F1 across all categories
- **F1 Score (Weighted)**: F1 weighted by class frequency
- **Confusion Matrix**: Per-category performance
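All four come straight from scikit-learn given the test-set labels and the model's predictions:

```python
# Computing the listed metrics with scikit-learn.
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [0, 1, 2, 2, 1]  # placeholder labels; use your test-set labels
y_pred = [0, 1, 2, 1, 1]  # and the model's predictions

accuracy = accuracy_score(y_true, y_pred)
f1_macro = f1_score(y_true, y_pred, average='macro')        # unweighted mean over classes
f1_weighted = f1_score(y_true, y_pred, average='weighted')  # weighted by class frequency
cm = confusion_matrix(y_true, y_pred)                       # rows = true, cols = predicted
```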
## Dataset Requirements

### Minimum Samples per Category

- Training: 500+ images per category
- Validation: 100+ images per category
- Test: 100+ images per category
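A quick way to check your raw data against these minimums before running the split:

```python
# Count images per category folder under ml/data/raw/.
from pathlib import Path

for category in sorted(Path('ml/data/raw').iterdir()):
    if category.is_dir():
        n = sum(1 for p in category.iterdir()
                if p.suffix.lower() in {'.jpg', '.jpeg', '.png'})
        print(f'{category.name}: {n} images')
```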
### Image Quality

- Resolution: 640x480 or higher
- Format: JPG or PNG
- Lighting: Varied conditions
- Backgrounds: Real-world environments
- Variety: Different angles, distances, and overlapping items
## Performance Optimization

### CPU Inference

- Uses optimized EfficientNet-B0
- Inference time: ~50ms per image
- No GPU required for deployment

### GPU Training

- Trains 10-20x faster on a GPU
- Automatically detects CUDA availability
- Falls back to CPU if no GPU is present
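The detection is presumably the standard PyTorch idiom:

```python
# Standard PyTorch device selection: prefer CUDA, fall back to CPU.
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)
# model = model.to(device)  # then move the model (and each batch) to it
```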
## Troubleshooting

### Low Accuracy

1. Add more diverse training data
2. Balance the dataset (equal samples per category)
3. Increase training epochs
4. Adjust the learning rate

### Overfitting

1. Increase the dropout rate
2. Add more data augmentation
3. Use early stopping (already enabled; see the sketch below)
4. Collect more training data
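Early stopping amounts to watching validation loss and halting once it stops improving. An illustrative loop; the `patience` value and helper names are assumptions, not taken from `ml/train.py`:

```python
# Illustrative early stopping; train_one_epoch/validate are hypothetical
# stand-ins for the real loop in ml/train.py.
max_epochs, patience = 50, 5
best_val_loss, bad_epochs = float('inf'), 0

for epoch in range(max_epochs):
    train_one_epoch()
    val_loss = validate()
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # no improvement for `patience` epochs: stop
```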
### Class Confusion

1. Check the confusion matrix
2. Add more examples for the confused classes
3. Ensure clear visual differences between categories
4. Review the data for mislabeled images
## Next Steps

1. **Collect Data**: Gather Indian waste images
2. **Initial Training**: Train the base model
3. **Deploy**: Integrate with the backend API
4. **Monitor**: Track prediction accuracy
5. **Improve**: Run the continuous learning pipeline