---
language: en
license: mit
tags:
- tabular-classification
- hospitality
- cancellations
- sri-lanka
- mlops
- shap
model-index:
- name: hotel-cancellation-predictor
  results:
  - task:
      type: tabular-classification
      name: Hotel Booking Cancellation
    metrics:
    - type: f1
      value: 0.8046506137865911
    - type: roc_auc
      value: 0.9384035807110922
    - type: precision
      value: 0.841708852944808
    - type: recall
      value: 0.7707179197286602
    - type: accuracy
      value: 0.8613786749308987
---
Hotel Booking Cancellation Predictor
Predicts the probability that a hotel booking will be cancelled (Sri Lankan hospitality context). The champion model is XGBoost; threshold-based decisions currently use a cutoff of 0.35 (see champion_meta.json).
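A minimal sketch of how the threshold turns a predicted probability into a decision. It assumes a booking is flagged when its probability is greater than or equal to the threshold; the card does not state whether the comparison is inclusive. The function name `flag_cancellations` and the sample probabilities are illustrative, and in practice the threshold would be read from champion_meta.json rather than hard-coded.

```python
def flag_cancellations(probabilities, threshold=0.35):
    """Return True for each booking whose cancellation probability
    meets or exceeds the decision threshold (inclusive by assumption)."""
    return [p >= threshold for p in probabilities]


# Illustrative probabilities, not model output
probs = [0.10, 0.35, 0.62]
flags = flag_cancellations(probs)
print(flags)
```

A threshold below 0.5 flags more bookings as likely cancellations, which generally trades some precision for higher recall.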
Last updated: 2025-10-05 16:43 UTC
Key Metrics (Holdout)
| Metric | Value |
|---|---|
| F1 | 0.8047 |
| ROC-AUC | 0.9384 |
| Precision | 0.8417 |
| Recall | 0.7707 |
| Accuracy | 0.8614 |
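As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two holdout values. The sketch below uses the full-precision figures from the card's metadata; the variable names are illustrative.

```python
# Holdout precision and recall as reported in the card metadata
precision = 0.841708852944808
recall = 0.7707179197286602

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
```

The recomputed value agrees with the reported F1 to within floating-point rounding.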
Top Features (SHAP importance)
- deposit_type
- country__te
- market_segment
- total_of_special_requests
- lead_time
- required_car_parking_spaces
- assigned_room_type
- customer_type_target_encoded
- reserved_room_type
- previous_cancellations
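The card does not say how the SHAP values were aggregated into a ranking. A common convention is to rank features by their mean absolute SHAP value across samples, and the sketch below assumes that convention. The function name `rank_by_mean_abs` and the SHAP values are made up for illustration; they are not taken from the model's artifacts.

```python
def rank_by_mean_abs(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, largest first.

    shap_rows holds one list of per-feature SHAP values per sample.
    """
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical SHAP values for three bookings (illustration only)
features = ["lead_time", "deposit_type", "previous_cancellations"]
shap_rows = [
    [-0.2, 0.9, 0.05],
    [0.3, -0.7, 0.0],
    [-0.1, 0.8, -0.1],
]
ranking = rank_by_mean_abs(shap_rows, features)
print(ranking)
```

Taking absolute values matters here: a feature that pushes predictions strongly in both directions would otherwise average out to near zero.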
Quickstart
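from huggingface_hub import snapshot_download
import json
import joblib
import pandas as pd

# Download the model repository and load the serialized artifacts
local_dir = snapshot_download(repo_id="j2damax/hotel-cancel-model")
model = joblib.load(f"{local_dir}/champion_model.pkl")
preprocessor = joblib.load(f"{local_dir}/preprocessor.pkl")
with open(f"{local_dir}/champion_meta.json") as f:
    meta = json.load(f)  # holdout metrics and decision threshold

# Illustrative booking; the preprocessor may require additional columns
sample = pd.DataFrame([{
    'lead_time': 45, 'arrival_month': 7, 'adults': 2, 'children': 0, 'adr': 110.0
}])
X = preprocessor.transform(sample)
# Column 1 of predict_proba holds the probability of the positive (cancelled) class
proba = float(model.predict_proba(X)[0, 1])
print('Cancellation probability:', round(proba, 4))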
Files
- champion_model.pkl – serialized champion estimator
- preprocessor.pkl – unified preprocessing / feature pipeline
- champion_meta.json – metrics & threshold
- Optional SHAP / feature importance JSON artifacts
Notes
The model was trained with stratified 5-fold cross-validation. The primary selection metric was F1, with ROC-AUC as the tie-breaker. Class imbalance was handled via class weights.
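The selection rule and the class weighting can be sketched in a few lines. Both parts rest on assumptions: the candidate names (other than XGBoost) and their scores are hypothetical, and the weighting uses the common "balanced" heuristic, n_samples / (n_classes * class_count), since the card does not say which weighting scheme was actually used.

```python
from collections import Counter

# Hypothetical cross-validated scores for candidate models (illustration only)
candidates = [
    {"name": "logistic_regression", "f1": 0.78, "roc_auc": 0.91},
    {"name": "random_forest", "f1": 0.80, "roc_auc": 0.93},
    {"name": "xgboost", "f1": 0.80, "roc_auc": 0.94},
]

# Tuples compare element by element, so F1 decides first
# and ROC-AUC only breaks ties
champion = max(candidates, key=lambda c: (c["f1"], c["roc_auc"]))
print(champion["name"])


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels: 6 kept bookings (0) and 4 cancellations (1)
labels = [0] * 6 + [1] * 4
weights = balanced_class_weights(labels)
print(weights)
```

Exact F1 ties are rare with unrounded floats, so a tie-breaker of this kind usually only takes effect if scores are rounded before comparison.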
Citation
Academic coursework (NIB 7072) — Sri Lankan tourism cancellation risk analysis.