# GDGC Datathon 2025 - Lap Time Prediction Models
Trained models for predicting Formula racing lap times from the GDGC Datathon 2025 competition.
## Model Description
This repository contains ensemble models trained to predict `Lap_Time_Seconds` for Formula racing events. The models combine Random Forest and XGBoost regressors with cross-validation.
## Models Included
| File | Description | Size |
|---|---|---|
| `rf_final.pkl` | Final Random Forest model | 158 MB |
| `xgb_final.pkl` | Final XGBoost model | 2.6 MB |
| `rf_cv_models.pkl` | Random Forest CV fold models | 13.4 GB |
| `xgb_cv_models.pkl` | XGBoost CV fold models | 103 MB |
| `rf_model.pkl` | Base Random Forest model | 95 MB |
| `xgb_model.pkl` | Base XGBoost model | 2 MB |
| `feature_engineer.pkl` | Feature preprocessing pipeline | 6 KB |
| `best_params.json` | Optimal hyperparameters | 1 KB |
| `cv_results.json` | Cross-validation results | 1 KB |
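The two `*_cv_models.pkl` files presumably hold one fitted model per CV fold. A minimal sketch of fold averaging, assuming each pickle contains a plain Python list of fitted regressors (the exact structure isn't documented on this card):

```python
import pickle

import numpy as np

# Assumption: the pickle holds a plain list of per-fold fitted regressors.
with open("xgb_cv_models.pkl", "rb") as f:
    cv_models = pickle.load(f)

def cv_average(models, X):
    """Average per-fold predictions into a single bagged estimate."""
    return np.column_stack([m.predict(X) for m in models]).mean(axis=1)

# X is a feature matrix from feature_engineer.transform (see Usage below), e.g.:
# preds = cv_average(cv_models, X_test)
```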
## Training Data
The models were trained on the GDGC Datathon 2025 dataset:
- Training samples: 734,002
- Target variable: `Lap_Time_Seconds` (continuous)
- Target range: 70.001s - 109.999s
- Target distribution: nearly symmetric (mean ≈ 90s, std ≈ 11.5s)
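These statistics are easy to re-check, assuming the competition training file is named `train.csv` (a hypothetical filename):

```python
import pandas as pd

# Hypothetical filename; substitute the actual competition training CSV.
train_df = pd.read_csv("train.csv")
print(train_df["Lap_Time_Seconds"].describe())  # expect mean ~90, std ~11.5
```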
### Features
The dataset includes features such as:
- Circuit characteristics (length, corners, laps)
- Weather conditions (temperature, humidity, track condition)
- Rider/driver information (championship points, position, history)
- Tire compounds and degradation factors
- Pit stop durations
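The exact column names aren't listed on this card. Purely as an illustration of the feature groups above, a hypothetical raw input row might look like this (every column name below is an assumption, not the real schema):

```python
import pandas as pd

# All column names here are hypothetical illustrations, not the real schema.
raw_row = pd.DataFrame([{
    "circuit_length_km": 4.3,        # circuit characteristics
    "corners": 14,
    "laps": 22,
    "ambient_temp_c": 27.5,          # weather conditions
    "humidity_pct": 48.0,
    "track_condition": "dry",
    "championship_points": 112,      # rider/driver information
    "grid_position": 5,
    "tire_compound": "soft",         # tire compound and degradation
    "tire_degradation_factor": 0.12,
    "pit_stop_duration_s": 21.4,     # pit stops
}])
```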
## Usage

### Loading the Models
```python
import pickle

import joblib  # optional fallback loader (see note below)

# Load the final models
with open("rf_final.pkl", "rb") as f:
    rf_model = pickle.load(f)

with open("xgb_final.pkl", "rb") as f:
    xgb_model = pickle.load(f)

# Load the feature engineering pipeline
with open("feature_engineer.pkl", "rb") as f:
    feature_engineer = pickle.load(f)
```
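The card doesn't say whether these files were written with `pickle` or `joblib.dump`. If `pickle.load` fails on the larger files, `joblib.load` is worth trying, since it reads both formats:

```python
import joblib

# joblib.load handles both joblib dumps and plain pickles.
rf_model = joblib.load("rf_final.pkl")
```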
### Making Predictions
```python
import pandas as pd

# Load test data
test_df = pd.read_csv("test.csv")

# Apply feature engineering
X_test = feature_engineer.transform(test_df)

# Predict with the ensemble (simple average of RF and XGB)
rf_preds = rf_model.predict(X_test)
xgb_preds = xgb_model.predict(X_test)
ensemble_preds = (rf_preds + xgb_preds) / 2
```
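To write the predictions out, assuming a typical two-column submission with an id column (both column names below depend on the competition spec and are assumptions here):

```python
# Hypothetical submission layout; adjust "id" to the competition's spec.
submission = pd.DataFrame({
    "id": test_df["id"],
    "Lap_Time_Seconds": ensemble_preds,
})
submission.to_csv("submission.csv", index=False)
```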
### Download from Hugging Face
```python
import pickle

from huggingface_hub import hf_hub_download

# Download a specific model file
model_path = hf_hub_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    filename="xgb_final.pkl",
)

# Load it
with open(model_path, "rb") as f:
    model = pickle.load(f)
```
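To mirror the whole repository at once, `snapshot_download` works too; note that `rf_cv_models.pkl` alone is 13.4 GB, so it can be skipped with `ignore_patterns`:

```python
from huggingface_hub import snapshot_download

# Download every file except the 13.4 GB CV fold models.
local_dir = snapshot_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    ignore_patterns=["rf_cv_models.pkl"],
)
print(local_dir)  # path to the local copy
```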
## Hyperparameters

Best parameters found via cross-validation (see `best_params.json`):
```json
{
  "random_forest": {
    "n_estimators": 100,
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1
  },
  "xgboost": {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "max_depth": 6
  }
}
```
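For retraining from scratch, these keys map directly onto the scikit-learn and XGBoost constructors. A sketch (the `n_jobs` and `random_state` arguments are additions for speed and reproducibility, not part of `best_params.json`):

```python
import json

from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

with open("best_params.json") as f:
    params = json.load(f)

# JSON null becomes Python None, i.e. unlimited depth for the forest.
rf = RandomForestRegressor(**params["random_forest"], n_jobs=-1, random_state=42)
xgb = XGBRegressor(**params["xgboost"], random_state=42)
```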
## Evaluation

Cross-validation results are stored in `cv_results.json`. Primary metric: RMSE (root mean squared error).
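For reference, RMSE on a held-out split can be computed with a small helper (a generic sketch, not the competition's scoring script):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    # RMSE = sqrt(mean((y_true - y_pred)^2)); taking the square root with
    # numpy keeps this compatible across scikit-learn versions.
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))
```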
## Training Code

The training code is available on GitHub: `ezylopx5/DATATHON`.

Key files:

- `train.py` - Main training script
- `features.py` - Feature engineering
- `predict.py` - Inference script
## Framework Versions
- Python 3.8+
- scikit-learn
- XGBoost
- pandas
- numpy
## License
MIT License
## Citation
```bibtex
@misc{gdgc-datathon-2025,
  author    = {Haxxsh},
  title     = {GDGC Datathon 2025 Lap Time Prediction Models},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Haxxsh/gdgc-datathon-models}
}
```