---
license: mit
tags:
- tabular-regression
- sklearn
- xgboost
- random-forest
- motorsport
- lap-time-prediction
datasets:
- Haxxsh/gdgc-datathon-data
language:
- en
pipeline_tag: tabular-regression
---

# GDGC Datathon 2025 - Lap Time Prediction Models

Trained models for predicting Formula racing lap times, built for the GDGC Datathon 2025 competition.

## Model Description

This repository contains ensemble models trained to predict `Lap_Time_Seconds` for Formula racing events. The models combine Random Forest and XGBoost regressors, trained with cross-validation.

### Models Included

| File | Description | Size |
|------|-------------|------|
| `rf_final.pkl` | Final Random Forest model | 158 MB |
| `xgb_final.pkl` | Final XGBoost model | 2.6 MB |
| `rf_cv_models.pkl` | Random Forest CV fold models | 13.4 GB |
| `xgb_cv_models.pkl` | XGBoost CV fold models | 103 MB |
| `rf_model.pkl` | Base Random Forest model | 95 MB |
| `xgb_model.pkl` | Base XGBoost model | 2 MB |
| `feature_engineer.pkl` | Feature preprocessing pipeline | 6 KB |
| `best_params.json` | Optimal hyperparameters | 1 KB |
| `cv_results.json` | Cross-validation results | 1 KB |

## Training Data

The models were trained on the [GDGC Datathon 2025 dataset](https://huggingface.co/datasets/Haxxsh/gdgc-datathon-data):

- **Training samples:** 734,002
- **Target variable:** `Lap_Time_Seconds` (continuous)
- **Target range:** 70.001s - 109.999s
- **Target distribution:** Nearly symmetric (mean ≈ 90s, std ≈ 11.5s)

### Features

The dataset includes features such as:
- Circuit characteristics (length, corners, laps)
- Weather conditions (temperature, humidity, track condition)
- Rider/driver information (championship points, position, history)
- Tire compounds and degradation factors
- Pit stop durations

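The actual preprocessing is serialized in `feature_engineer.pkl`, so its exact steps are opaque here. A minimal sketch of what such a pipeline might look like for these feature groups, using a scikit-learn `ColumnTransformer` (all column names below are hypothetical illustrations, not the dataset's real schema):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names standing in for the feature groups above
numeric_cols = ["Circuit_Length_km", "Corners", "Ambient_Temp_C"]
categorical_cols = ["Tire_Compound", "Track_Condition"]

# Scale numeric features; one-hot encode categoricals, tolerating
# unseen categories at inference time
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Tiny illustrative frame to exercise the pipeline
df = pd.DataFrame({
    "Circuit_Length_km": [4.3, 5.8],
    "Corners": [14, 19],
    "Ambient_Temp_C": [24.0, 31.5],
    "Tire_Compound": ["soft", "hard"],
    "Track_Condition": ["dry", "wet"],
})
X = preprocess.fit_transform(df)
# 3 scaled numeric columns + 2 + 2 one-hot columns = 7 features
```
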
## Usage

### Loading the Models

```python
import pickle

# Load the final models
with open("rf_final.pkl", "rb") as f:
    rf_model = pickle.load(f)

with open("xgb_final.pkl", "rb") as f:
    xgb_model = pickle.load(f)

# Load the feature engineering pipeline
with open("feature_engineer.pkl", "rb") as f:
    feature_engineer = pickle.load(f)
```

### Making Predictions

```python
import pandas as pd

# Load test data
test_df = pd.read_csv("test.csv")

# Apply feature engineering
X_test = feature_engineer.transform(test_df)

# Predict with the ensemble (simple average of RF and XGB)
rf_preds = rf_model.predict(X_test)
xgb_preds = xgb_model.predict(X_test)
ensemble_preds = (rf_preds + xgb_preds) / 2
```

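The 50/50 average above is one choice of blend; the weight can also be tuned on a held-out validation split. A minimal sketch of that idea, with purely illustrative numbers (the validation arrays are made up, not real model output):

```python
import numpy as np

# Hypothetical validation targets and per-model predictions (seconds)
y_val = np.array([88.0, 92.0, 80.0])
rf_val = np.array([88.2, 91.5, 79.9])
xgb_val = np.array([87.6, 92.1, 80.4])

def rmse(y, p):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - p) ** 2)))

# Scan blend weights in [0, 1] and keep the lowest validation RMSE
weights = np.linspace(0.0, 1.0, 101)
best_w = min(weights, key=lambda w: rmse(y_val, w * rf_val + (1 - w) * xgb_val))
blended = best_w * rf_val + (1 - best_w) * xgb_val
```

With `best_w` fixed from validation, the same blend is applied to the test predictions.
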
### Download from Hugging Face

```python
import pickle

from huggingface_hub import hf_hub_download

# Download a specific model file
model_path = hf_hub_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    filename="xgb_final.pkl",
)

# Load it
with open(model_path, "rb") as f:
    model = pickle.load(f)
```

## Hyperparameters

Best parameters found via cross-validation (see `best_params.json`):

```json
{
  "random_forest": {
    "n_estimators": 100,
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1
  },
  "xgboost": {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "max_depth": 6
  }
}
```

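These parameters map directly onto the scikit-learn and XGBoost constructors. A minimal sketch of instantiating fresh models from them (the `random_state` and the synthetic smoke-test data are additions for reproducibility, not part of the saved configuration; the XGBoost import is guarded in case the library is absent):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Parameters copied from best_params.json
rf_params = {"n_estimators": 100, "max_depth": None,
             "min_samples_split": 2, "min_samples_leaf": 1}
rf = RandomForestRegressor(random_state=42, **rf_params)  # random_state is our addition

try:
    from xgboost import XGBRegressor
    xgb_params = {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 6}
    xgb = XGBRegressor(random_state=42, **xgb_params)
except ImportError:
    xgb = None  # xgboost not installed; RF alone still works

# Smoke test on synthetic data shaped like the task:
# targets in the 70-110 s lap-time range
rng = np.random.default_rng(0)
X = rng.random((50, 4))
y = 70 + 40 * rng.random(50)
rf.fit(X, y)
preds = rf.predict(X)
```
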
## Evaluation

Cross-validation results are stored in `cv_results.json`. The primary metric is **RMSE** (root mean squared error).

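For reference, RMSE on a prediction vector can be computed as follows (the lap times here are illustrative, not actual CV-fold output):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, the competition's primary metric."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Illustrative lap times in seconds: errors of 1, 1, and 2 s
score = rmse([90.0, 85.0, 100.0], [91.0, 84.0, 102.0])
# mean squared error = (1 + 1 + 4) / 3 = 2, so RMSE = sqrt(2)
```
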
## Training Code

The training code is available on GitHub: [ezylopx5/DATATHON](https://github.com/ezylopx5/DATATHON)

Key files:
- `train.py` - Main training script
- `features.py` - Feature engineering
- `predict.py` - Inference script

## Framework Versions

- Python 3.8+
- scikit-learn
- XGBoost
- pandas
- numpy

## License

MIT License

## Citation

```bibtex
@misc{gdgc-datathon-2025,
  author = {Haxxsh},
  title = {GDGC Datathon 2025 Lap Time Prediction Models},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Haxxsh/gdgc-datathon-models}
}
```