Dataset:
Use the Data Science Salaries 2023 dataset, available on Kaggle.
Tasks and Requirements:
1. Data Exploration and Preprocessing:
   - Load the dataset and perform exploratory data analysis (EDA).
   - Clean the data, handle missing values, and encode categorical variables.
   - Split the data into training and testing sets.
2. Model Training:
   - Train multiple machine learning models (e.g., Linear Regression, Decision Trees, Random Forest, Gradient Boosting).
   - Use MLflow to track experiments, including parameters, metrics, and artifacts.
   - Evaluate the models using appropriate metrics (e.g., RMSE, MAE, R²).
3. Model Selection and Optimization:
   - Compare the performance of different models.
   - Optimize the best-performing model using hyperparameter tuning.
   - Record all experiments and their results using MLflow.
4. Streamlit Application:
   - Create a Streamlit app to interact with the trained model.
   - The app should allow users to input features and get salary predictions.
   - Display relevant model performance metrics and visualizations in the app.
5. Model Registration and Deployment:
   - Register the best model in the MLflow Model Registry.
   - Deploy the model using Hugging Face Spaces.
   - Ensure the deployed model is accessible via an API for inference.
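
For task 1 (data exploration and preprocessing), a minimal sketch with pandas and scikit-learn could look like the following. The column names and the `ds_salaries.csv` file name are assumptions based on the Kaggle dataset and should be checked against the downloaded file. Dropping exact duplicates and the local-currency `salary`/`salary_currency` columns (which restate the target) are judgment calls, not requirements of the task.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names assumed from the Kaggle CSV (ds_salaries.csv); verify against the actual file.
CATEGORICAL = ["experience_level", "employment_type", "job_title",
               "employee_residence", "company_location", "company_size"]
NUMERIC = ["work_year", "remote_ratio"]
TARGET = "salary_in_usd"


def quick_eda(df: pd.DataFrame) -> None:
    """Print basic summaries: shape, dtypes, missing values, target distribution."""
    print(df.shape)
    print(df.dtypes)
    print(df.isna().sum())
    print(df[TARGET].describe())
    for col in CATEGORICAL:
        print(df[col].value_counts().head(10))


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and rows without a target value."""
    df = df.drop_duplicates().dropna(subset=[TARGET])
    # salary/salary_currency restate the target in local currency: drop them to avoid leakage.
    return df.drop(columns=["salary", "salary_currency"], errors="ignore")


def build_preprocessor() -> ColumnTransformer:
    """Impute missing values, one-hot encode categoricals, scale numerics."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    return ColumnTransformer([
        ("cat", categorical, CATEGORICAL),
        ("num", numeric, NUMERIC),
    ])


def split(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Return X_train, X_test, y_train, y_test."""
    X = df[CATEGORICAL + NUMERIC]
    y = df[TARGET]
    return train_test_split(X, y, test_size=test_size, random_state=seed)
```

Typical use: `df = clean(pd.read_csv("ds_salaries.csv"))`, then `quick_eda(df)` and `X_train, X_test, y_train, y_test = split(df)`. Fitting the preprocessor only on `X_train` (ideally inside a `Pipeline` together with the model) keeps test-set statistics out of training.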