Dataset:
Use the Data Science Salaries 2023 dataset, available on Kaggle.

Tasks and Requirements:
1. Data Exploration and Preprocessing:
   - Load the dataset and perform exploratory data analysis (EDA).
   - Clean the data, handle missing values, and encode categorical variables.
   - Split the data into training and testing sets.
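
As a rough illustration of task 1, the sketch below cleans, encodes, and splits a salary table with pandas and scikit-learn. The column names (`salary_in_usd` as the target, `salary` and `salary_currency` as columns that leak the target) and the helper `prepare_data` are assumptions made for illustration; verify them against the actual Kaggle file before relying on them.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed column names; verify against the downloaded CSV.
TARGET = "salary_in_usd"
LEAKY_COLUMNS = ["salary", "salary_currency"]  # restate the target in another currency


def prepare_data(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Clean, one-hot encode, and split a salary DataFrame (hypothetical helper)."""
    df = df.drop_duplicates().dropna(subset=[TARGET])
    df = df.drop(columns=[c for c in LEAKY_COLUMNS if c in df.columns])

    X = df.drop(columns=[TARGET])
    y = df[TARGET]

    # Fill remaining gaps: median for numeric columns, a placeholder for the rest.
    numeric_cols = X.select_dtypes(include="number").columns
    categorical_cols = X.columns.difference(numeric_cols)
    fill_values = X[numeric_cols].median().to_dict()
    fill_values.update({col: "unknown" for col in categorical_cols})
    X = X.fillna(fill_values)

    # One-hot encode the categorical columns; numeric columns pass through unchanged.
    X = pd.get_dummies(X, columns=list(categorical_cols), dtype=int)

    # Returns X_train, X_test, y_train, y_test.
    return train_test_split(X, y, test_size=test_size, random_state=seed)
```

A typical call would be `X_train, X_test, y_train, y_test = prepare_data(pd.read_csv("ds_salaries.csv"))`, where the file name is again an assumption. Note that the median here is computed on the full table for brevity; a stricter pipeline would fit the imputation on the training split only.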
2. Model Training:
   - Train multiple machine learning models (e.g., Linear Regression, Decision Trees, Random Forest, Gradient Boosting).
   - Use MLflow to track experiments, including parameters, metrics, and artifacts.
   - Evaluate the models using appropriate metrics (e.g., RMSE, MAE, R²).
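
One possible shape for task 2 is sketched below: each model is fitted inside its own MLflow run, and its parameters, metrics, and fitted model are logged. The helper names `regression_metrics` and `train_and_track` and the experiment name `ds-salaries` are hypothetical; only the MLflow and scikit-learn calls themselves are library API.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred) -> dict:
    """Return RMSE, MAE, and R² for one set of predictions."""
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "r2": float(r2_score(y_true, y_pred)),
    }


def train_and_track(models: dict, X_train, X_test, y_train, y_test,
                    experiment: str = "ds-salaries") -> dict:
    """Fit each model inside its own MLflow run and return metrics per model name."""
    # Imported here so regression_metrics stays usable without MLflow installed.
    import mlflow
    import mlflow.sklearn

    mlflow.set_experiment(experiment)
    results = {}
    for name, model in models.items():
        with mlflow.start_run(run_name=name):
            model.fit(X_train, y_train)
            metrics = regression_metrics(y_test, model.predict(X_test))
            mlflow.log_params(model.get_params())
            mlflow.log_metrics(metrics)
            mlflow.sklearn.log_model(model, "model")  # stored as a run artifact
            results[name] = metrics
    return results
```

`models` would be a dictionary such as `{"linear": LinearRegression(), "forest": RandomForestRegressor()}`. The returned dictionary makes it easy to compare models side by side, while the MLflow UI keeps the full history of runs.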
3. Model Selection and Optimization:
   - Compare the performance of different models.
   - Optimize the best-performing model using hyperparameter tuning.
   - Record all experiments and their results using MLflow.
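
For the tuning step in task 3, a grid search is one common option. The sketch below assumes, purely for illustration, that a Random Forest turned out to be the best-performing model; the helper `tune_random_forest`, the grid values, and the synthetic data are all placeholders rather than recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def tune_random_forest(X, y, param_grid: dict, cv: int = 3) -> GridSearchCV:
    """Grid-search a RandomForestRegressor, scoring by cross-validated RMSE."""
    search = GridSearchCV(
        RandomForestRegressor(random_state=42),
        param_grid=param_grid,
        scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence the sign
        cv=cv,
    )
    return search.fit(X, y)


# Synthetic stand-in for the encoded training data.
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)
grid = {"n_estimators": [20, 40], "max_depth": [3, 6]}
search = tune_random_forest(X, y, grid)

best_params = search.best_params_   # one value chosen per grid key
best_rmse = -search.best_score_     # flip the sign back to a plain RMSE
print(best_params, round(best_rmse, 2))
```

To record the result, `best_params` and `best_rmse` could be passed to `mlflow.log_params` and `mlflow.log_metric` inside an active run, so the tuned model sits next to the baseline runs in the same experiment.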
4. Streamlit Application:
   - Create a Streamlit app to interact with the trained model.
   - The app should allow users to input features and get salary predictions.
   - Display relevant model performance metrics and visualizations in the app.
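
A minimal sketch of the app in task 4 follows. The part that tends to go wrong is turning a single user input into a row whose columns match the one-hot encoded training data, so that step is isolated in `build_feature_row`. The function names, the three input fields, and their option codes are assumptions for illustration; the real app should take them from the dataset.

```python
import pandas as pd


def build_feature_row(inputs: dict, train_columns: list) -> pd.DataFrame:
    """One-hot encode a single user input and align it to the training columns."""
    row = pd.get_dummies(pd.DataFrame([inputs]), dtype=int)
    # Categories the user did not pick become 0; columns unseen in training are dropped.
    return row.reindex(columns=train_columns, fill_value=0)


def main(model, train_columns: list, metrics: dict) -> None:
    """Render the prediction form and the stored performance metrics."""
    import streamlit as st  # imported lazily so the helper above works without it

    st.title("Data Science Salary Predictor")
    inputs = {
        # Option codes below are assumed values; take them from the real data.
        "experience_level": st.selectbox("Experience level", ["EN", "MI", "SE", "EX"]),
        "company_size": st.selectbox("Company size", ["S", "M", "L"]),
        "remote_ratio": st.slider("Remote ratio (%)", 0, 100, 50, step=50),
    }
    if st.button("Predict"):
        prediction = model.predict(build_feature_row(inputs, train_columns))[0]
        st.metric("Predicted salary (USD)", f"{prediction:,.0f}")

    st.subheader("Test-set performance")
    for name, value in metrics.items():
        st.metric(name.upper(), f"{value:,.2f}")
```

In an `app.py`, `main` would be called with the loaded model, the list of training column names, and a metrics dictionary, and the script started with `streamlit run app.py`. Charts such as predicted versus actual salaries could be added with `st.pyplot` or `st.bar_chart`.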
5. Model Registration and Deployment:
   - Register the best model in the MLflow Model Registry.
   - Deploy the model using Hugging Face Spaces.
   - Ensure the deployed model is accessible via an API for inference.
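
The registration step in task 5 could look roughly like the sketch below, which picks the run with the lowest RMSE and registers its model. It assumes each run logged a metric named `rmse` and stored its model under the artifact path `model`; both names, and the helpers themselves, are hypothetical. The `runs:/` URI form follows MLflow 2.x conventions and may need adjusting for other versions. Deployment to Hugging Face Spaces is not covered here.

```python
def model_uri_for_run(run_id: str, artifact_path: str = "model") -> str:
    """Build the `runs:/` URI MLflow uses to locate a model logged in a run."""
    return f"runs:/{run_id}/{artifact_path}"


def register_best_model(experiment: str, registered_name: str, metric: str = "rmse"):
    """Register the model from the run with the lowest value of `metric`."""
    # Imported lazily; the tracking backend must support the Model Registry.
    import mlflow

    runs = mlflow.search_runs(
        experiment_names=[experiment],
        order_by=[f"metrics.{metric} ASC"],
        max_results=1,
    )
    if runs.empty:
        raise ValueError(f"No runs found in experiment {experiment!r}")
    best_run_id = runs.iloc[0]["run_id"]
    return mlflow.register_model(model_uri_for_run(best_run_id), registered_name)
```

A call such as `register_best_model("ds-salaries", "salary-predictor")` would create a new version of the registered model, with both names chosen here only as examples.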