GDGC Datathon 2025 - Lap Time Prediction Models

Trained models from the GDGC Datathon 2025 competition for predicting Formula racing lap times.

Model Description

This repository contains ensemble models trained to predict Lap_Time_Seconds for Formula racing events. The ensemble averages a Random Forest regressor and an XGBoost regressor, with hyperparameters selected by cross-validation.

Models Included

File                  Description                     Size
rf_final.pkl          Final Random Forest model       158 MB
xgb_final.pkl         Final XGBoost model             2.6 MB
rf_cv_models.pkl      Random Forest CV fold models    13.4 GB
xgb_cv_models.pkl     XGBoost CV fold models          103 MB
rf_model.pkl          Base Random Forest model        95 MB
xgb_model.pkl         Base XGBoost model              2 MB
feature_engineer.pkl  Feature preprocessing pipeline  6 KB
best_params.json      Optimal hyperparameters         1 KB
cv_results.json       Cross-validation results        1 KB

Training Data

The models were trained on the GDGC Datathon 2025 dataset:

  • Training samples: 734,002
  • Target variable: Lap_Time_Seconds (continuous)
  • Target range: 70.001s - 109.999s
  • Target distribution: Nearly symmetric (mean ≈ 90s, std ≈ 11.5s)

Features

The dataset includes features such as:

  • Circuit characteristics (length, corners, laps)
  • Weather conditions (temperature, humidity, track condition)
  • Rider/driver information (championship points, position, history)
  • Tire compounds and degradation factors
  • Pit stop durations

Usage

Loading the Models

import pickle

# Load the final models
with open("rf_final.pkl", "rb") as f:
    rf_model = pickle.load(f)

with open("xgb_final.pkl", "rb") as f:
    xgb_model = pickle.load(f)

# Load feature engineering pipeline
with open("feature_engineer.pkl", "rb") as f:
    feature_engineer = pickle.load(f)

Making Predictions

import pandas as pd

# Load test data
test_df = pd.read_csv("test.csv")

# Apply feature engineering
X_test = feature_engineer.transform(test_df)

# Predict with ensemble (average of RF and XGB)
rf_preds = rf_model.predict(X_test)
xgb_preds = xgb_model.predict(X_test)
ensemble_preds = (rf_preds + xgb_preds) / 2
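
The same averaging could be extended to the cross-validation fold models. The sketch below assumes that rf_cv_models.pkl and xgb_cv_models.pkl each unpickle to a plain list of fitted regressors. That layout is not documented here, so inspect the loaded objects before relying on it. The helper name average_predictions is hypothetical.

```python
import numpy as np


def average_predictions(models, X):
    """Average the predictions of several fitted regressors.

    Hypothetical helper, not part of the repository. `models` is assumed
    to be an iterable of objects that expose a `.predict(X)` method.
    """
    # One prediction vector per model
    preds = [np.asarray(m.predict(X), dtype=float) for m in models]
    if not preds:
        raise ValueError("models must contain at least one fitted regressor")
    # Element-wise mean across models
    return np.mean(preds, axis=0)
```

With the fold lists loaded as, say, rf_folds and xgb_folds, the expression (average_predictions(rf_folds, X_test) + average_predictions(xgb_folds, X_test)) / 2 would mirror the final-model ensemble above. Since rf_cv_models.pkl is 13.4 GB on disk, expect loading it to need memory on that order.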

Download from Hugging Face

import pickle

from huggingface_hub import hf_hub_download

# Download a specific model file
model_path = hf_hub_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    filename="xgb_final.pkl"
)

# Load it
with open(model_path, "rb") as f:
    model = pickle.load(f)

Hyperparameters

Best parameters found via cross-validation (see best_params.json):

{
  "random_forest": {
    "n_estimators": 100,
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1
  },
  "xgboost": {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "max_depth": 6
  }
}
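
As a sketch of how these values could be consumed, the snippet below reads the file and splits it into one keyword-argument dictionary per model. It assumes best_params.json has exactly the layout shown above; the function name load_best_params is hypothetical.

```python
import json


def load_best_params(path="best_params.json"):
    """Return (rf_kwargs, xgb_kwargs) read from the hyperparameter file.

    Assumes the top-level keys "random_forest" and "xgboost" shown above.
    JSON null is parsed as Python None, which scikit-learn treats as
    "no depth limit" for max_depth.
    """
    with open(path, "r", encoding="utf-8") as f:
        params = json.load(f)
    return dict(params["random_forest"]), dict(params["xgboost"])
```

The two dictionaries can then be unpacked into the estimators, for example RandomForestRegressor(**rf_kwargs) from scikit-learn and XGBRegressor(**xgb_kwargs) from XGBoost.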

Evaluation

Cross-validation results are stored in cv_results.json. Primary metric: RMSE (Root Mean Squared Error).
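
For reference, RMSE can be computed as in the sketch below. It uses NumPy and is not taken from the repository's evaluation code; the function name rmse is an assumption. As a baseline for reading the scores, a model that always predicts the training mean would have an RMSE equal to the target's standard deviation, roughly 11.5s on the training data.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of the same shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Square the residuals, average them, then take the square root
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

For example, rmse(y_val, ensemble_preds) would score the averaged ensemble on a held-out set, where y_val is a placeholder name for the true lap times.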

Training Code

The training code is available on GitHub: ezylopx5/DATATHON

Key files:

  • train.py - Main training script
  • features.py - Feature engineering
  • predict.py - Inference script

Framework Versions

  • Python 3.8+
  • scikit-learn
  • XGBoost
  • pandas
  • numpy

License

MIT License

Citation

@misc{gdgc-datathon-2025,
  author = {Haxxsh},
  title = {GDGC Datathon 2025 Lap Time Prediction Models},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Haxxsh/gdgc-datathon-models}
}