codealchemist01 committed on
Commit
02a96af
·
verified ·
1 Parent(s): c7e4da8

Upload README.md with huggingface_hub

  1. README.md +170 -0
README.md ADDED
@@ -0,0 +1,170 @@
+ ---
+ library_name: sklearn
+ tags:
+ - energy-consumption
+ - regression
+ - random-forest
+ - xgboost
+ - building-energy
+ - sustainability
+ - carbon-footprint
+ pipeline_tag: tabular-regression
+ ---
+
+ # Ecologia Electricity Consumption Model
+
+ ## Model Description
+
+ This model predicts **electricity_consumption (kWh)** for buildings using machine learning ensemble methods.
+
+ - **Model Architecture**: Random Forest Regressor (Ensemble)
+ - **Task**: Regression (Energy Consumption Prediction)
+ - **Target Variable**: electricity_consumption (kWh)
+ - **Input Features**: 22 features
+ - **Training Dataset**: Building Data Genome Project 2
+ - **Training Samples**: ~15 million
+
+ ## Model Performance
+
+ ### Random Forest Model
+ - **RMSE**: 37.6519
+ - **MAE**: 17.5059
+ - **R² Score**: 0.9587
+
+ ### XGBoost Model
+ - **RMSE**: 59.3440
+ - **MAE**: 29.7273
+ - **R² Score**: 0.8973
+
+ ### Best Model
+ The best-performing model, selected by validation RMSE, is saved as `electricity_model.joblib`. Of the two results reported above, the Random Forest has the lower RMSE (37.6519 vs. 59.3440).
+
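For readers who want to reproduce this kind of report, the three metrics can be computed with scikit-learn roughly as in the sketch below. The helper name `regression_report` and the small `y_true` / `y_pred` arrays are illustrative assumptions, not values or code from this model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return RMSE, MAE and R² for one set of predictions (hypothetical helper)."""
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    mae = float(mean_absolute_error(y_true, y_pred))
    r2 = float(r2_score(y_true, y_pred))
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Made-up readings in kWh; every prediction is off by exactly 10 kWh.
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])

report = regression_report(y_true, y_pred)
print(report)
```

RMSE and MAE are expressed in the units of the target, while R² is unitless, so the first two are only comparable between models evaluated on the same data.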
+ ## Training Details
+
+ ### Dataset
+ - **Source**: [Building Data Genome Project 2](https://www.kaggle.com/datasets/claytonmiller/buildingdatagenomeproject2)
+ - **Training Samples**: ~15 million
+ - **Data Preprocessing**:
+   - Outlier removal (99th percentile)
+   - Feature engineering (temporal, building, weather features)
+   - Missing value imputation
+   - Normalization
+
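As an illustration of the outlier-removal step, a percentile cut in pandas might look like the sketch below. The README does not say which column is filtered or whether only the upper tail is trimmed, so the column choice, the upper-tail-only rule, and the helper name `drop_above_percentile` are all assumptions.

```python
import pandas as pd


def drop_above_percentile(df, column, q=0.99):
    """Keep rows whose `column` value is at or below the q-th quantile (hypothetical helper)."""
    threshold = df[column].quantile(q)
    return df[df[column] <= threshold]


# Made-up readings: 1.0, 2.0, ..., 100.0 kWh.
readings = pd.DataFrame(
    {"electricity_consumption": [float(v) for v in range(1, 101)]}
)
cleaned = drop_above_percentile(readings, "electricity_consumption")
print(len(readings), "->", len(cleaned))
```

The threshold should be computed on the training portion only if the same cut is later applied to validation or test data.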
+ ### Training Method
+ - **Algorithms**: Random Forest and XGBoost, trained as separate candidates
+ - **Best Model Selection**: Based on validation RMSE
+ - **Validation**: Hold-out train/validation/test split (60/20/20)
+ - **Hyperparameters**: Optimized for large-scale datasets
+
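The split-and-select procedure described above can be sketched as follows. This is not the project's training script: the data is synthetic, the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so that the example needs only scikit-learn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

# 60/20/20: hold out 40 % first, then split that part in half.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

candidates = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    # Stand-in for XGBoost (assumption).
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each candidate on the training split and score it on the validation split.
val_rmse = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_rmse[name] = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))

# Keep the candidate with the lowest validation RMSE.
best_name = min(val_rmse, key=val_rmse.get)
print(best_name, val_rmse)
```

The test split stays untouched during selection, so it can give an unbiased estimate of the chosen model afterwards.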
+ ### Feature Engineering
+ The model uses 22 engineered features including:
+ - **Building Features**: Type, area, age, location
+ - **Temporal Features**: Hour, day, month, season, day of week
+ - **Weather Features**: Temperature, humidity, dew point
+ - **Interaction Features**: Building-weather interactions
+ - **Lag Features**: Previous consumption patterns
+
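A minimal sketch of how temporal and lag features of this kind could be derived with pandas is shown below. The column names (`timestamp`, `lag_1h`, `lag_24h`), the chosen lags, and the single-building hourly series are assumptions; the actual 22 features are defined by `electricity_features.joblib`.

```python
import pandas as pd


def add_temporal_and_lag_features(df, target="electricity_consumption"):
    """Add calendar fields and two lagged copies of the target (hypothetical helper)."""
    out = df.sort_values("timestamp").reset_index(drop=True)
    ts = out["timestamp"].dt
    out["hour"] = ts.hour
    out["day_of_week"] = ts.dayofweek  # Monday = 0
    out["month"] = ts.month
    # Lags assume one row per hour for a single building.
    out["lag_1h"] = out[target].shift(1)
    out["lag_24h"] = out[target].shift(24)
    return out


# Two days of made-up hourly readings starting on a Monday.
hours = pd.date_range("2024-06-03 00:00", periods=48, freq=pd.Timedelta(hours=1))
raw = pd.DataFrame(
    {"timestamp": hours, "electricity_consumption": [float(i) for i in range(48)]}
)
features = add_temporal_and_lag_features(raw)
print(features.tail(3))
```

With several buildings in one frame, the shifts would need to be applied per building (for example via `groupby`), otherwise a lag can leak values from a different building.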
+ ## Usage
+
+ ### Installation
+ ```bash
+ pip install scikit-learn xgboost joblib huggingface_hub pandas
+ ```
+
+ ### Load Model
+ ```python
+ from huggingface_hub import hf_hub_download
+ import joblib
+
+ # Download the model and its feature list from the Hub
+ model_path = hf_hub_download(
+     repo_id="codealchemist01/ecologia-electricity-model",
+     filename="electricity_model.joblib",
+     token="YOUR_HF_TOKEN"  # Only needed for a private repo; omit if public
+ )
+
+ features_path = hf_hub_download(
+     repo_id="codealchemist01/ecologia-electricity-model",
+     filename="electricity_features.joblib",
+     token="YOUR_HF_TOKEN"  # Only needed for a private repo; omit if public
+ )
+
+ # Load the fitted model and the ordered list of feature names
+ model = joblib.load(model_path)
+ feature_columns = joblib.load(features_path)
+ ```
+
+ ### Prediction Example
+ ```python
+ import pandas as pd
+
+ # Prepare input data (example values).
+ # Note: tree models in scikit-learn expect numeric input, so a categorical
+ # value such as building_type likely needs the same encoding used in training.
+ input_data = pd.DataFrame({
+     'building_type': ['Office'],
+     'area_sqm': [1000],
+     'year_built': [2020],
+     'temperature': [20.5],
+     'humidity': [65],
+     'hour': [14],
+     'day_of_week': [1],
+     'month': [6],
+     # ... other required features
+ })
+
+ # Ensure all features are present.
+ # Placeholder only: a 0 default is not a real measurement and can distort the prediction.
+ for col in feature_columns:
+     if col not in input_data.columns:
+         input_data[col] = 0
+
+ # Select features in the order the model was trained on
+ input_data = input_data[feature_columns]
+
+ # Make prediction
+ prediction = model.predict(input_data)
+ print(f"Predicted electricity_consumption (kWh): {prediction[0]:.2f}")
+ ```
+
+ ## Model Limitations
+
+ - Model performance may vary based on building characteristics and regional differences
+ - Training data is primarily from North American buildings
+ - Predictions are estimates and should be validated with actual consumption data
+ - Model requires all input features to be provided
+
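Because the model requires every input feature, a stricter alternative to filling gaps with a default value is to fail early when a column is missing. The sketch below shows one way to do that; the helper name `align_features` and the three example column names are hypothetical.

```python
import pandas as pd


def align_features(input_data, feature_columns):
    """Return the columns in training order, or raise if any are missing (hypothetical helper)."""
    missing = [c for c in feature_columns if c not in input_data.columns]
    if missing:
        raise ValueError(f"Missing required features: {missing}")
    return input_data[list(feature_columns)]


# Hypothetical feature list; the real one comes from electricity_features.joblib.
expected = ["hour", "temperature", "humidity"]
row = pd.DataFrame({"temperature": [20.5], "hour": [14]})

try:
    align_features(row, expected)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

complete = align_features(row.assign(humidity=65), expected)
print(error_message)
print(list(complete.columns))
```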
+ ## Ethical Considerations
+
+ - Model is designed to help reduce energy consumption and carbon footprint
+ - No personal or sensitive data is used in training
+ - Model predictions should be used responsibly for sustainability purposes
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @software{ecologia_energy_model,
+   title = {Ecologia Electricity Consumption Model},
+   author = {Ecologia Energy Team},
+   year = {2024},
+   url = {https://huggingface.co/codealchemist01/ecologia-electricity-model},
+   note = {Trained on Building Data Genome Project 2 dataset}
+ }
+ ```
+
+ ## License
+
+ This model is released under the MIT License.
+
+ ## Contact
+
+ For questions or issues, please open an issue on the repository or contact the Ecologia Energy team.
+
+ ## Acknowledgments
+
+ - Building Data Genome Project 2 dataset creators
+ - scikit-learn and XGBoost communities
+ - HuggingFace for model hosting
+
+ ---
+ *This model is part of the Ecologia sustainability platform for energy consumption prediction and carbon footprint calculation.*