---
license: mit
tags:
- tabular-regression
- sklearn
- xgboost
- random-forest
- motorsport
- lap-time-prediction
datasets:
- Haxxsh/gdgc-datathon-data
language:
- en
pipeline_tag: tabular-regression
---

# GDGC Datathon 2025 - Lap Time Prediction Models

Trained models for predicting Formula racing lap times, built for the GDGC Datathon 2025 competition.

## Model Description

This repository contains ensemble models trained to predict `Lap_Time_Seconds` for Formula racing events. The models combine Random Forest and XGBoost regressors, trained with cross-validation.

### Models Included

| File | Description | Size |
|------|-------------|------|
| `rf_final.pkl` | Final Random Forest model | 158 MB |
| `xgb_final.pkl` | Final XGBoost model | 2.6 MB |
| `rf_cv_models.pkl` | Random Forest CV fold models | 13.4 GB |
| `xgb_cv_models.pkl` | XGBoost CV fold models | 103 MB |
| `rf_model.pkl` | Base Random Forest model | 95 MB |
| `xgb_model.pkl` | Base XGBoost model | 2 MB |
| `feature_engineer.pkl` | Feature preprocessing pipeline | 6 KB |
| `best_params.json` | Optimal hyperparameters | 1 KB |
| `cv_results.json` | Cross-validation results | 1 KB |

## Training Data

The models were trained on the [GDGC Datathon 2025 dataset](https://huggingface.co/datasets/Haxxsh/gdgc-datathon-data):

- **Training samples:** 734,002
- **Target variable:** `Lap_Time_Seconds` (continuous)
- **Target range:** 70.001s - 109.999s
- **Target distribution:** Nearly symmetric (mean ≈ 90s, std ≈ 11.5s)

### Features

The dataset includes features such as:
- Circuit characteristics (length, corners, laps)
- Weather conditions (temperature, humidity, track condition)
- Rider/driver information (championship points, position, history)
- Tire compounds and degradation factors
- Pit stop durations

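The actual preprocessing is serialized in `feature_engineer.pkl`, so its exact steps are opaque here. A minimal sketch of what such a pipeline might look like for these feature groups, using a scikit-learn `ColumnTransformer` (all column names below are hypothetical illustrations, not the dataset's real schema):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names standing in for the feature groups above
numeric_cols = ["Circuit_Length_km", "Corners", "Ambient_Temp_C"]
categorical_cols = ["Tire_Compound", "Track_Condition"]

# Scale numeric features; one-hot encode categoricals, tolerating
# unseen categories at inference time
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Tiny illustrative frame to exercise the pipeline
df = pd.DataFrame({
    "Circuit_Length_km": [4.3, 5.8],
    "Corners": [14, 19],
    "Ambient_Temp_C": [24.0, 31.5],
    "Tire_Compound": ["soft", "hard"],
    "Track_Condition": ["dry", "wet"],
})
X = preprocess.fit_transform(df)
# 3 scaled numeric columns + 2 + 2 one-hot columns = 7 features
```
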
## Usage

### Loading the Models

```python
import pickle

# Load the final models
with open("rf_final.pkl", "rb") as f:
    rf_model = pickle.load(f)

with open("xgb_final.pkl", "rb") as f:
    xgb_model = pickle.load(f)

# Load the feature engineering pipeline
with open("feature_engineer.pkl", "rb") as f:
    feature_engineer = pickle.load(f)
```

### Making Predictions

```python
import pandas as pd

# Load test data
test_df = pd.read_csv("test.csv")

# Apply feature engineering
X_test = feature_engineer.transform(test_df)

# Predict with the ensemble (simple average of RF and XGB)
rf_preds = rf_model.predict(X_test)
xgb_preds = xgb_model.predict(X_test)
ensemble_preds = (rf_preds + xgb_preds) / 2
```

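The 50/50 average above is one choice of blend; the weight can also be tuned on a held-out validation split. A minimal sketch of that idea, with purely illustrative numbers (the validation arrays are made up, not real model output):

```python
import numpy as np

# Hypothetical validation targets and per-model predictions (seconds)
y_val = np.array([88.0, 92.0, 80.0])
rf_val = np.array([88.2, 91.5, 79.9])
xgb_val = np.array([87.6, 92.1, 80.4])

def rmse(y, p):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - p) ** 2)))

# Scan blend weights in [0, 1] and keep the lowest validation RMSE
weights = np.linspace(0.0, 1.0, 101)
best_w = min(weights, key=lambda w: rmse(y_val, w * rf_val + (1 - w) * xgb_val))
blended = best_w * rf_val + (1 - best_w) * xgb_val
```

With `best_w` fixed from validation, the same blend is applied to the test predictions.
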
### Download from Hugging Face

```python
import pickle

from huggingface_hub import hf_hub_download

# Download a specific model file
model_path = hf_hub_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    filename="xgb_final.pkl",
)

# Load it
with open(model_path, "rb") as f:
    model = pickle.load(f)
```

## Hyperparameters

Best parameters found via cross-validation (see `best_params.json`):

```json
{
  "random_forest": {
    "n_estimators": 100,
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1
  },
  "xgboost": {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "max_depth": 6
  }
}
```

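These parameters map directly onto the scikit-learn and XGBoost constructors. A minimal sketch of instantiating fresh models from them (the `random_state` and the synthetic smoke-test data are additions for reproducibility, not part of the saved configuration; the XGBoost import is guarded in case the library is absent):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Parameters copied from best_params.json
rf_params = {"n_estimators": 100, "max_depth": None,
             "min_samples_split": 2, "min_samples_leaf": 1}
rf = RandomForestRegressor(random_state=42, **rf_params)  # random_state is our addition

try:
    from xgboost import XGBRegressor
    xgb_params = {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 6}
    xgb = XGBRegressor(random_state=42, **xgb_params)
except ImportError:
    xgb = None  # xgboost not installed; RF alone still works

# Smoke test on synthetic data shaped like the task:
# targets in the 70-110 s lap-time range
rng = np.random.default_rng(0)
X = rng.random((50, 4))
y = 70 + 40 * rng.random(50)
rf.fit(X, y)
preds = rf.predict(X)
```
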
## Evaluation

Cross-validation results are stored in `cv_results.json`. The primary metric is **RMSE** (root mean squared error).

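For reference, RMSE on a prediction vector can be computed as follows (the lap times here are illustrative, not actual CV-fold output):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, the competition's primary metric."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Illustrative lap times in seconds: errors of 1, 1, and 2 s
score = rmse([90.0, 85.0, 100.0], [91.0, 84.0, 102.0])
# mean squared error = (1 + 1 + 4) / 3 = 2, so RMSE = sqrt(2)
```
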
## Training Code

The training code is available on GitHub: [ezylopx5/DATATHON](https://github.com/ezylopx5/DATATHON)

Key files:
- `train.py` - Main training script
- `features.py` - Feature engineering
- `predict.py` - Inference script

## Framework Versions

- Python 3.8+
- scikit-learn
- XGBoost
- pandas
- numpy

## License

MIT License

## Citation

```bibtex
@misc{gdgc-datathon-2025,
  author = {Haxxsh},
  title = {GDGC Datathon 2025 Lap Time Prediction Models},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Haxxsh/gdgc-datathon-models}
}
```