GDGC Datathon 2025 - Lap Time Prediction Models

Trained models for predicting Formula racing lap times from the GDGC Datathon 2025 competition.

Model Description

This repository contains ensemble models trained to predict Lap_Time_Seconds for Formula racing events. The ensemble averages the predictions of a Random Forest regressor and an XGBoost regressor, with hyperparameters selected by cross-validation.

Models Included

File	Description	Size
`rf_final.pkl`	Final Random Forest model	158 MB
`xgb_final.pkl`	Final XGBoost model	2.6 MB
`rf_cv_models.pkl`	Random Forest CV fold models	13.4 GB
`xgb_cv_models.pkl`	XGBoost CV fold models	103 MB
`rf_model.pkl`	Base Random Forest model	95 MB
`xgb_model.pkl`	Base XGBoost model	2 MB
`feature_engineer.pkl`	Feature preprocessing pipeline	6 KB
`best_params.json`	Optimal hyperparameters	1 KB
`cv_results.json`	Cross-validation results	1 KB

Training Data

The models were trained on the GDGC Datathon 2025 dataset:

Training samples: 734,002
Target variable: Lap_Time_Seconds (continuous)
Target range: 70.001s - 109.999s
Target distribution: Nearly symmetric (mean ≈ 90s, std ≈ 11.5s)

Features

The dataset includes features such as:

Circuit characteristics (length, corners, laps)
Weather conditions (temperature, humidity, track condition)
Rider/driver information (championship points, position, history)
Tire compounds and degradation factors
Pit stop durations

Usage

Loading the Models

import pickle

# Load the final models
with open("rf_final.pkl", "rb") as f:
    rf_model = pickle.load(f)

with open("xgb_final.pkl", "rb") as f:
    xgb_model = pickle.load(f)

# Load feature engineering pipeline
with open("feature_engineer.pkl", "rb") as f:
    feature_engineer = pickle.load(f)

Making Predictions

import pandas as pd

# Load test data
test_df = pd.read_csv("test.csv")

# Apply feature engineering
X_test = feature_engineer.transform(test_df)

# Predict with ensemble (average of RF and XGB)
rf_preds = rf_model.predict(X_test)
xgb_preds = xgb_model.predict(X_test)
ensemble_preds = (rf_preds + xgb_preds) / 2
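
The CV fold files (rf_cv_models.pkl, xgb_cv_models.pkl) can be used in a similar way by averaging over the fold models. The sketch below assumes each file unpickles to a plain list of fitted regressors, which this card does not confirm; check the object type after loading. ConstantModel and average_fold_predictions are hypothetical names used only for illustration, and the demo uses stand-in models so it runs without the large pickle files.

```python
import numpy as np


def average_fold_predictions(fold_models, X):
    """Return the mean prediction across a list of fitted fold models.

    Assumes every model exposes a scikit-learn style predict(X) method.
    """
    if not fold_models:
        raise ValueError("fold_models is empty")
    # Stack to shape (n_folds, n_samples), then average over the fold axis.
    preds = np.stack([np.asarray(m.predict(X), dtype=float) for m in fold_models])
    return preds.mean(axis=0)


class ConstantModel:
    """Hypothetical stand-in for a fitted regressor, used only in this demo."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)


# Demo with three stand-in fold models instead of the real pickled ones.
X_demo = np.zeros((3, 2))
fold_models = [ConstantModel(88.0), ConstantModel(90.0), ConstantModel(92.0)]
fold_avg = average_fold_predictions(fold_models, X_demo)
print(fold_avg)
```

With the real files, fold_models would be the unpickled list and X would be the output of feature_engineer.transform, again assuming the list layout holds.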

Download from Hugging Face

import pickle

from huggingface_hub import hf_hub_download

# Download a specific model file
model_path = hf_hub_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    filename="xgb_final.pkl"
)

# Load it
with open(model_path, "rb") as f:
    model = pickle.load(f)

Hyperparameters

Best parameters found via cross-validation (see best_params.json):

{
  "random_forest": {
    "n_estimators": 100,
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1
  },
  "xgboost": {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "max_depth": 6
  }
}
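
As a rough illustration of how this JSON maps onto estimator arguments, the sketch below parses an inline copy of it and splits it into one keyword dictionary per model. The helper name split_params is hypothetical, and the inline string stands in for reading best_params.json from disk.

```python
import json

# Inline copy of the JSON shown above, so the sketch runs without the file.
BEST_PARAMS_TEXT = """
{
  "random_forest": {
    "n_estimators": 100,
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1
  },
  "xgboost": {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "max_depth": 6
  }
}
"""


def split_params(text):
    """Parse the JSON text and return (rf_kwargs, xgb_kwargs)."""
    params = json.loads(text)
    return params["random_forest"], params["xgboost"]


rf_kwargs, xgb_kwargs = split_params(BEST_PARAMS_TEXT)

# JSON null is parsed as Python None. For scikit-learn's random forest,
# max_depth=None means the tree depth is not capped.
print(rf_kwargs)
print(xgb_kwargs)

# With scikit-learn and XGBoost installed, the dictionaries can be unpacked
# into the constructors, for example:
#   RandomForestRegressor(**rf_kwargs)
#   XGBRegressor(**xgb_kwargs)
```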

Evaluation

Cross-validation results are stored in cv_results.json. Primary metric: RMSE (Root Mean Squared Error).
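
For reference, RMSE can be computed as in the minimal sketch below. The sample values are made up for illustration and are not results from this repository. The sketch also shows that always predicting the mean of the targets gives an RMSE equal to their standard deviation, so for this dataset (std ≈ 11.5s) a useful model should score well below roughly 11.5 seconds.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up lap times in seconds, for illustration only.
y_true = np.array([80.0, 90.0, 100.0])
y_pred = np.array([82.0, 89.0, 97.0])

score = rmse(y_true, y_pred)

# Baseline: predict the mean lap time for every sample.
baseline = rmse(y_true, np.full_like(y_true, y_true.mean()))

print(f"model RMSE:    {score:.4f}")
print(f"baseline RMSE: {baseline:.4f}")
print(f"target std:    {np.std(y_true):.4f}")
```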

Training Code

The training code is available on GitHub: ezylopx5/DATATHON

Key files:

train.py - Main training script
features.py - Feature engineering
predict.py - Inference script

Framework Versions

Python 3.8+
scikit-learn
XGBoost
pandas
numpy

License

MIT License

Citation

@misc{gdgc-datathon-2025,
  author = {Haxxsh},
  title = {GDGC Datathon 2025 Lap Time Prediction Models},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Haxxsh/gdgc-datathon-models}
}
