jennamk14 committed
Commit 8d15d01 · verified · 1 Parent(s): c2289fa

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +242 -106
README.md CHANGED
@@ -1,127 +1,263 @@
- # MMLA Repo
- Multi-Environment, Multi-Species, Low-Altitude Aerial Footage Dataset
-
- ![zebras_giraffes](vizualizations/location_1_session_5_DJI_0211_partition_1_DJI_0211_002590.jpg)
- Example photo from the MMLA dataset and labels generated by the model. The image shows a group of zebras and giraffes at the Mpala Research Centre in Kenya.
- ## Table of Contents
- - [How to use the scripts in this repo](#how-to-use-the-scripts-in-this-repo)
- - [Requirements](#requirements)
- - [Baseline YOLO evaluation](#baseline-yolo-evaluation)
- - [Download evaluation data from HuggingFace](#download-evaluation-data-from-huggingface)
- - [Run the evaluate_yolo script](#run-the-evaluate_yolo-script)
- - [Model Training](#model-training)
- - [Prepare the dataset](#prepare-the-dataset)
- - [Optional: Downsample the frames](#optional-downsample-the-frames)
- - [Run the training script](#run-the-training-script)
- - [Evaluation](#evaluation)
- - [Optional: Perform bootstrapping](#optional-perform-bootstrapping)
- - [Results](#results)
- - [Fine-Tuned Model Weights](#fine-tuned-model)
- - [Paper](#paper)
- - [Dataset](#dataset)
-
- This repo provides scripts to fine-tune YOLO models on the MMLA dataset. The [MMLA dataset](https://huggingface.co/collections/imageomics/wildwing-67f572d3ba17fca922c80182) is a collection of low-altitude aerial footage of various species in different environments. The dataset is designed to help researchers and practitioners develop and evaluate object detection models for wildlife monitoring and conservation.
-
-
- # How to use the scripts in this repo
-
- ### Requirements
- ```bash
- # install packages from requirements
- conda create --name yolo_env --file requirements.txt
- # OR using pip
- pip install -r requirements.txt
- ```

- ## Baseline YOLO evaluation
- ### Download evaluation data from HuggingFace
- This dataset contains an evenly distributed set of frames from the MMLA dataset, with bounding box annotations for each frame. The dataset is designed to help researchers and practitioners evaluate the performance of object detection models on low-altitude aerial footage containing a variety of environments and species.
-
- ```bash
- # download the datasets from HuggingFace to local /data directory
-
- git clone
- ```

- ### Run the evaluate_yolo script
- ```bash
- # example usage
- python model_eval/evaluate_yolo.py --model model_eval/yolov5mu.pt --images model_eval/eval_data/frames_500_coco --annotations model_eval/eval_data/frames_500_coco --output model_eval/results/frames_500_coco/yolov5m
- ```
- ## Model Training
-
- ### Prepare the dataset
- ```bash
- # download the datasets from HuggingFace to local /data directory
-
- # wilds dataset
- git clone https://huggingface.co/datasets/imageomics/wildwing_wilds
- # opc dataset
- git clone https://huggingface.co/datasets/imageomics/wildwing_opc
- # mpala dataset
- git clone https://huggingface.co/datasets/imageomics/wildwing_mpala
-
- # run the script to split the dataset into train and test sets
- python prepare_yolo_dataset.py
- ```

- #### Alternatively, you can create your own dataset from video frames and bounding box annotations
- ```bash
- python frame_extractor.py --dataset wilds --dataset_path ./wildwing_wilds --output_dir ./wildwing_wilds
- ```
- ### Optional: Downsample the frames to extract a subset of frames from each video
- ```bash
- python downsample.py --dataset wilds --dataset_path ./wildwing_wilds --output_dir ./wildwing_wilds --downsample_rate 0.1
  ```

- ### Run the training script
- ```bash
- # run the training script
- python train.py
  ```

  ## Evaluation
- To evaluate the trained model on the test data:
- ```bash
- # run the validate script
- python validate.py
- ```
-
- ### Optional: Perform bootstrapping to get confidence intervals
- ```bash
- # run the bootstrapping notebook to compute confidence intervals
- jupyter notebook bootstrap.ipynb
  ```
- #### Download inference results from baseline and fine-tuned model
-
- ## Results
- Our fine-tuned YOLO11m model achieves the following performance on the MMLA dataset:
- | Class   | Images | Instances | Box(P) | R     | mAP50 | mAP50-95 |
- |---------|--------|-----------|--------|-------|-------|----------|
- | all     | 7,658  | 44,619    | 0.867  | 0.764 | 0.801 | 0.488    |
- | Zebra   | 4,430  | 28,219    | 0.768  | 0.647 | 0.675 | 0.273    |
- | Giraffe | 868    | 1,357     | 0.788  | 0.634 | 0.678 | 0.314    |
- | Onager  | 172    | 1,584     | 0.939  | 0.776 | 0.857 | 0.505    |
- | Dog     | 3,022  | 13,459    | 0.973  | 0.998 | 0.995 | 0.860    |
-
-
- # Fine-Tuned Model
- See [HuggingFace Repo](https://huggingface.co/imageomics/mmla) for details and weights.
-
- # Dataset
- See [HuggingFace Repo](https://huggingface.co/collections/imageomics/wildwing-67f572d3ba17fca922c80182) for the MMLA dataset.
-
- # Paper
- ```bibtex
- @article{kline2025mmla,
-   title={MMLA: Multi-Environment, Multi-Species, Low-Altitude Aerial Footage Dataset},
-   author={Kline, Jenna and Stevens, Samuel and Maalouf, Guy and Saint-Jean, Camille Rondeau and Ngoc, Dat Nguyen and Mirmehdi, Majid and Guerin, David and Burghardt, Tilo and Pastucha, Elzbieta and Costelloe, Blair and others},
-   journal={arXiv preprint arXiv:2504.07744},
-   year={2025}
  }
- ```

+ ---
+ license: mit
+ language:
+ - en
+ library_name: ultralytics
+ tags:
+ - biology
+ - CV
+ - images
+ - animals
+ - object-detection
+ - YOLO
+ - fine-tuned
+ datasets:
+ - custom_animal_dataset
+ metrics:
+ - precision
+ - recall
+ - mAP50
+ - mAP50-95
+ ---

+ # Model Card for Fine-Tuned YOLOv11m Animal Detection Model
+
+ This model is a fine-tuned version of YOLOv11m optimized for detecting and classifying wildlife in low-altitude drone imagery. It has been trained to identify zebras (plains and Grevy's), giraffes (reticulated and Masai), Persian onagers, and African painted dogs with high accuracy across diverse environmental conditions.
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** Jenna Kline
+ - **Model type:** Object Detection and Classification
+ - **Language(s) (NLP):** Not applicable (Computer Vision model)
+ - **Fine-tuned from model:** YOLOv11m (ultralytics/yolo11m.pt)
+
+ ### Model Sources
+
+ - **Repository:** [https://github.com/Imageomics/mmla](https://github.com/Imageomics/mmla)
+ - **Paper:** [MMLA: Multi-Environment, Multi-Species, Low-Altitude Aerial Footage Dataset](https://arxiv.org/abs/2504.07744)
+
+ ## Uses
+
+ ### Direct Use
+
+ This model is designed for direct use in wildlife monitoring applications, ecological research, and biodiversity studies. It can:
+
+ - Detect and classify zebras, giraffes, onagers, and African wild dogs in low-altitude drone images
+ - Monitor wildlife populations in their natural habitats
+ - Automate animal ecology data collection using drones and computer vision
+ - Support biodiversity assessments by identifying species present in field surveys
+
+ The model can be used by researchers, conservationists, wildlife managers, and citizen scientists to automate and scale up wildlife monitoring efforts, particularly in African ecosystems.
+
+ ### Downstream Use
+
+ This model can be integrated into larger ecological monitoring systems, including:
+ - Wildlife conservation monitoring platforms
+ - Ecological research workflows
+ - Environmental impact assessment tools
+
+ ### Out-of-Scope Use
+
+ This model is not suitable for:
+ - Security or surveillance applications targeting humans
+ - Applications where errors in detection could lead to harmful conservation decisions without human verification
+ - Real-time detection systems requiring extremely low latency (the model prioritizes accuracy over speed)
+ - Detection of species not included in the training set (only trained on zebras, giraffes, onagers, and dogs)
+
+ ## Bias, Risks, and Limitations
+
+ - **Species representation bias:** The model may perform better on species that were well-represented in the training data.
+ - **Environmental bias:** Performance may degrade in environmental conditions not represented in the training data (e.g., extreme weather, unusual lighting).
+ - **Morphological bias:** Similar-looking species may be confused with one another (particularly among equids like zebras and onagers).
+ - **Geospatial bias:** The model may perform better in biomes similar to those present in the training data, particularly African savanna environments.
+ - **Seasonal bias:** Detection accuracy may vary based on seasonal appearance changes in animals or environments.
+ - **Technical limitations:** Performance depends on image quality, with reduced accuracy in low-resolution, blurry, or poorly exposed images.
+
+ ### Recommendations
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model:
+ - Always verify critical detections with human review, especially for rare species or conservation decision-making
+ - Consider confidence scores when evaluating detections (a filtering sketch follows the example code below)
+ - Be cautious when applying the model to new geographic regions or habitats not represented in training data
+ - Periodically validate model performance on new data to ensure continued reliability
+ - Consider fine-tuning the model on domain-specific data when applying to new regions or species
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model:
+
+ ```python
+ from ultralytics import YOLO
+
+ # Load the model
+ model = YOLO('path/to/your/model.pt')
+
+ # Run inference on an image
+ results = model('path/to/image.jpg')
+
+ # Process results
+ for result in results:
+     boxes = result.boxes  # Boxes object for bounding box outputs
+     for box in boxes:
+         x1, y1, x2, y2 = box.xyxy[0]  # box coordinates
+         conf = box.conf[0]  # confidence score
+         cls = int(box.cls[0])  # class id
+         class_name = model.names[cls]  # class name (Zebra, Giraffe, Onager, or Dog)
+         print(f"Detected {class_name} with confidence {conf:.2f} at position {x1:.1f}, {y1:.1f}, {x2:.1f}, {y2:.1f}")
+
+ # Visualize results (returns an annotated image as a NumPy array)
+ annotated = results[0].plot()
  ```
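+
+ Following the recommendation above to consider confidence scores, the snippet below is a minimal sketch of filtering detections by a confidence threshold at inference time; the 0.5 threshold and the file paths are illustrative assumptions rather than project defaults.
+
+ ```python
+ from ultralytics import YOLO
+
+ # Minimal sketch: discard low-confidence detections at inference time.
+ # The 0.5 threshold and the paths are assumptions, not values from the MMLA repo.
+ model = YOLO("path/to/your/model.pt")
+ results = model("path/to/image.jpg", conf=0.5)  # predictions below 0.5 are dropped
+
+ for box in results[0].boxes:
+     class_name = model.names[int(box.cls[0])]
+     print(f"{class_name}: {float(box.conf[0]):.2f}")
+ ```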
+
+ ## Training Details
+
+ ### Training Data
+
+ The dataset is available on [Hugging Face](https://huggingface.co/collections/imageomics/wildwing-67f572d3ba17fca922c80182). See prepare_yolo_dataset.py for details on the train/test splits.
+
+ #### Dataset splitting strategy
+ We applied a stratified 60/40 train-test split across species and locations to evaluate model generalizability. Data was collected from three distinct environments: Mpala Research Centre (location_1), Ol Pejeta Conservancy (location_2), and The Wilds Conservation Center (location_3). The dataset includes four target classes: Zebra, Giraffe, Onager, and African Wild Dog.
+
+ To prevent overlap in individual animals or environmental conditions between training and testing, we split video sessions at the file level, ensuring that no frames from a given session appear in both the train and test sets. This also allows consistent per-frame sampling at a fixed interval (every 10th frame). A sketch of this session-level split is given below.
+
+ Training set includes:
+ - Mpala (location_1): Multiple full sessions for Giraffes, Plains Zebras, and Grevy's Zebras, including mixed-species scenes.
+ - Ol Pejeta (location_2): Full sessions of Plains Zebras.
+ - The Wilds (location_3): 70% of sessions for Painted Dogs, Giraffes, and Persian Onagers.
+
+ Test set includes:
+ - The Wilds (location_3): The remaining 30% of sessions, including additional Grevy's Zebra sessions used exclusively for testing.
+ - Mpala (location_1) and Ol Pejeta (location_2): Separate zebra and mixed-species sessions not used during training.
+
+ This careful division by session and location ensures that the model is evaluated on unseen environments, individuals, and contexts, making it a robust benchmark for testing generalization across ecological and geographic domains.
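+
+ As a rough illustration of this strategy (not the actual prepare_yolo_dataset.py implementation), the sketch below assigns whole sessions to train or test and keeps every 10th frame; the directory layout, file extension, and interpretation of the 60/40 ratio are assumptions.
+
+ ```python
+ import random
+ from pathlib import Path
+
+ # Sketch of a session-level split: whole sessions go to train or test,
+ # so no session contributes frames to both sets. Paths are assumptions.
+ frames_root = Path("./frames")  # one sub-directory per video session
+ sessions = sorted(p for p in frames_root.iterdir() if p.is_dir())
+
+ random.seed(0)
+ random.shuffle(sessions)
+ n_test = int(0.4 * len(sessions))  # 60/40 split as described above
+ test_sessions, train_sessions = sessions[:n_test], sessions[n_test:]
+
+ def sample_frames(session_dirs, step=10):
+     """Keep every `step`-th frame from each session."""
+     kept = []
+     for session in session_dirs:
+         kept.extend(sorted(session.glob("*.jpg"))[::step])
+     return kept
+
+ train_frames = sample_frames(train_sessions)
+ test_frames = sample_frames(test_sessions)
+ print(f"{len(train_frames)} training frames, {len(test_frames)} test frames")
+ ```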
+
+ ### Training Procedure
+
+ #### Preprocessing
+
+ - Images were resized to 640x640 pixels (as specified in the training script)
+ - The standard YOLOv11 augmentation pipeline was applied
+
+ #### Training Hyperparameters
+
+ The model was trained with the following hyperparameters, as specified in the training script:
+ - **Base model:** YOLOv11m (yolo11m.pt)
+ - **Epochs:** 50
+ - **Image size:** 640
+ - **Dataset configuration:** Custom YAML file defining 4 classes (Zebra, Giraffe, Onager, Dog)
+ - **Training regime:** Default YOLOv11 training parameters
+
+ ```python
+ # Training script
+ from ultralytics import YOLO
+
+ model = YOLO("yolo11m.pt")
+ results = model.train(
+     data="/data/dataset.yaml",
+     epochs=50,
+     imgsz=640,
+ )
  ```
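+
+ The training call above points at /data/dataset.yaml, which is not reproduced in this card. The snippet below writes a plausible minimal configuration for the four classes; the dataset root, split directories, and class-id ordering are assumptions rather than values taken from the repo.
+
+ ```python
+ import yaml  # PyYAML, installed as a dependency of ultralytics
+
+ # Hypothetical reconstruction of the dataset YAML referenced by model.train().
+ config = {
+     "path": "/data",          # dataset root (assumption)
+     "train": "images/train",  # training images, relative to the root
+     "val": "images/test",     # held-out images, relative to the root
+     "names": {0: "Zebra", 1: "Giraffe", 2: "Onager", 3: "Dog"},  # class-id order is an assumption
+ }
+
+ with open("/data/dataset.yaml", "w") as f:
+     yaml.safe_dump(config, f, sort_keys=False)
+ ```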
+
+ #### Speeds, Sizes, Times
+
+ - **Training hardware:** 2× Tesla V100-PCIE-16GB (16,144 MiB each)
+ - **Training time:** 2 hours, 11 minutes
+ - **Model size:** YOLO11m summary: 231 layers, 20,056,092 parameters, 20,056,076 gradients, 68.2 GFLOPs
+ - **Inference speed:** 0.1 ms preprocess, 4.6 ms inference, 0.0 ms loss, 0.9 ms postprocess per image on a Tesla V100-PCIE-16GB
+

  ## Evaluation
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ The model was evaluated on a held-out test set located at `/fs/ess/PAS2136/Kenya-2023/yolo_benchmark/HerdYOLO/data/images/test`, containing:
+ - 7,658 test images with instances of Zebra, Giraffe, Onager, and Dog
+
+ #### Factors
+
+ The evaluation disaggregated performance by:
+ - Species category (Zebra, Giraffe, Onager, African wild dog)
+
+ #### Metrics
+
+ The model was evaluated using standard object detection metrics:
+ - **Precision:** Ratio of true positives to all predicted positives
+ - **Recall:** Ratio of true positives to all actual positives (ground truth)
+ - **mAP50:** Mean Average Precision at an IoU threshold of 0.5
+ - **mAP50-95:** Mean Average Precision averaged over IoU thresholds from 0.5 to 0.95
+
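+ The reported numbers are consistent with the standard Ultralytics validation output; the sketch below shows one way to recompute them on the held-out split, assuming the weights path and dataset YAML location (and that the attribute names match the current Ultralytics API).
+
+ ```python
+ from ultralytics import YOLO
+
+ # Sketch: validate the fine-tuned weights to obtain precision, recall, mAP50 and mAP50-95.
+ # The weights file and dataset YAML paths are assumptions.
+ model = YOLO("path/to/your/model.pt")
+ metrics = model.val(data="/data/dataset.yaml", imgsz=640)
+
+ print(f"mAP50:     {metrics.box.map50:.3f}")
+ print(f"mAP50-95:  {metrics.box.map:.3f}")
+ print(f"Precision: {metrics.box.mp:.3f}")  # mean precision over classes
+ print(f"Recall:    {metrics.box.mr:.3f}")  # mean recall over classes
+ ```
+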
+ ### Results
+
+ #### Summary
+
+ - **Overall mAP50:** 80.1%
+ - **Overall mAP50-95:** 48.8%
+ - **Per-class performance:**
+   - Zebra: mAP50 = 67.5%, Precision = 76.5%, Recall = 64.7%
+   - Giraffe: mAP50 = 67.8%, Precision = 78.8%, Recall = 63.4%
+   - Onager: mAP50 = 85.7%, Precision = 93.9%, Recall = 77.6%
+   - Dog: mAP50 = 99.5%, Precision = 97.3%, Recall = 99.8%
+
+ ## Technical Specifications
+
+ ### Model Architecture and Objective
+
+ - Base architecture: YOLOv11m
+ - Detection heads: Standard YOLOv11 architecture
+ - Classes: 4 (Zebra, Giraffe, Onager, Dog)
+
+ ### Compute Infrastructure
+
+ #### Software
+
+ - Python 3.8+
+ - PyTorch 2.0+
+ - Ultralytics YOLOv11 framework
+ - CUDA 11.7+ (for GPU acceleration)
+
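+ A quick environment check against these requirements (the version floors simply mirror the list above):
+
+ ```python
+ # Sketch: confirm the runtime meets the software requirements listed above.
+ import sys
+ import torch
+ import ultralytics
+
+ print("Python:", sys.version.split()[0])        # expect 3.8+
+ print("PyTorch:", torch.__version__)            # expect 2.0+
+ print("Ultralytics:", ultralytics.__version__)  # a YOLOv11-capable release
+ print("CUDA available:", torch.cuda.is_available())
+ ```
+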
+ ## Citation
+
+ **BibTeX:**
+
  ```
+ @software{mmla_finetuned_yolo11m,
+   author = {Kline, Jenna},
+   title = {Fine-Tuned YOLOv11m Animal Detection Model},
+   version = {1.0.0},
+   year = {2025},
+   url = {https://huggingface.co/imageomics/mmla}
  }
+ ```
+
+ ## Acknowledgements
+
+ This work was supported by both the [Imageomics Institute](https://imageomics.org) and the [AI and Biodiversity Change (ABC) Global Center](http://abcresearchcenter.org). The Imageomics Institute is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under [Award #2118240](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2118240) (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). The ABC Global Center is funded by the US National Science Foundation under [Award No. 2330423](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2330423&HistoricalAwards=false) and the Natural Sciences and Engineering Research Council of Canada under [Award No. 585136](https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=782440). This model draws on research supported by the Social Sciences and Humanities Research Council.
+
+ Additional support was provided by the National Ecological Observatory Network (NEON), a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle Memorial Institute. This material is based in part upon work supported by the National Science Foundation through the NEON Program.
+
+ Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, the Natural Sciences and Engineering Research Council of Canada, or the Social Sciences and Humanities Research Council.
+
+ ## Glossary
+
+ - **YOLO:** You Only Look Once, a family of real-time object detection models
+ - **mAP:** mean Average Precision, a standard metric for evaluating object detection models
+ - **IoU:** Intersection over Union, a measure of overlap between predicted and ground truth bounding boxes (see the sketch after this list)
+ - **Onager:** Also known as the Asian wild ass, a species of equid native to Asia
+ - **YOLOv11m:** The medium-sized variant of the YOLOv11 architecture
+
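+ For concreteness, IoU for two axis-aligned boxes in (x1, y1, x2, y2) form can be computed as in the short sketch below; the example boxes are arbitrary.
+
+ ```python
+ def iou(box_a, box_b):
+     """Intersection over Union of two (x1, y1, x2, y2) boxes."""
+     ax1, ay1, ax2, ay2 = box_a
+     bx1, by1, bx2, by2 = box_b
+     inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
+     inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
+     inter = inter_w * inter_h
+     union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
+     return inter / union if union > 0 else 0.0
+
+ print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143
+ ```
+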
+ ## Model Card Authors
+
+ Jenna Kline, The Ohio State University
+
+ ## Model Card Contact
+
+ kline.377 at osu dot edu