New Content-1


Good [afternoon] everyone,

Today, I'm excited to share with you the findings from our capstone project on HR data analysis and
modeling at Delta Ltd.

The goal of our project is to develop a model that can automatically determine the salary range for
employees with similar profiles. This aims to bring transparency and fairness to salary determination,
eliminating biases in the process.

Slide 2: Need of the Study/Project

This project addresses the need to use historical data to predict salaries based on factors such as
experience, education, and industry. By minimizing human judgment, we aim to ensure fairness and
transparency in salary decisions, ultimately benefiting both the company and its employees.

Slide 3: Understanding Business/Social Opportunity

This project offers several benefits, such as attracting and retaining talented employees with competitive
salaries, enhancing employee satisfaction, and promoting social justice by eliminating discrimination in
salary among similar profiles.

Data Report: The dataset, comprising 25,000 records of Delta Ltd. applicants,
encompasses 29 columns, including applicant ID, current and expected CTC, total
experience, education, and industry. Primary data collection methods, such as surveys and
interviews, were employed.

Data Preprocessing: Preprocessing involved removing duplicates and unwanted variables,
handling missing values through imputation, treating outliers using the IQR method, and
encoding categorical variables using label encoding.
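
As a rough illustration of two of these steps, here is a minimal sketch of IQR-based outlier treatment and label encoding. It assumes outliers are capped at the 1.5 x IQR fences rather than dropped, which the project does not specify, and the sample values and the names `iqr_cap` and `label_encode` are hypothetical.

```python
import statistics

def iqr_cap(values, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at the nearest fence."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]

def label_encode(categories):
    """Map each distinct category to an integer code, assigned in sorted order."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping

# Hypothetical sample values, not taken from the Delta Ltd. dataset.
current_ctc = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 40.0]
education = ["Graduate", "PG", "Doctorate", "PG"]

capped_ctc = iqr_cap(current_ctc)                        # 40.0 is pulled down to the upper fence
education_codes, education_map = label_encode(education)
```

Capping keeps every record in the dataset, whereas dropping outliers would shrink it. Either reading is consistent with "treating outliers using the IQR method".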

Exploratory Data Analysis (EDA) - Univariate:

For univariate analysis, histograms and boxplots were used to understand the distribution of each
variable. Notably, total experience exhibits a right-skewed distribution, while current and
expected CTC display normal distributions with outliers at higher values.

EDA - Bivariate: Bivariate analysis was performed using scatter plots and a correlation matrix to
understand the relationships between the different variables.

Bivariate analysis highlighted significant positive correlations between current and expected
CTC, and between total experience and current CTC.
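
To make the idea of a positive correlation concrete, the sketch below computes a Pearson correlation coefficient by hand. The helper `pearson` and the sample figures are hypothetical and are not drawn from the project data. Each cell of a correlation matrix holds this value for one pair of variables.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical sample values for illustration only.
current_ctc = [4.0, 5.0, 6.0, 8.0, 10.0]
expected_ctc = [5.0, 6.5, 7.0, 10.0, 12.5]

r = pearson(current_ctc, expected_ctc)  # close to +1 for a strong positive relationship
```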

Business Insights from EDA: Key insights include the need for scaling, since several of the
models used (KNN, lasso, and ridge) are sensitive to variable magnitudes. The marketing
department attracts the highest number of applicants, followed by finance and IT, indicating
competitiveness. Diverse roles and skills are observed among applicants, and the training, IT,
and banking industries are the most relevant.
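
The scaling step can be sketched as follows, assuming z-score standardization. The project does not say which scaling method was used, and the function name and sample values are hypothetical.

```python
import statistics

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical years of experience, for illustration only.
total_experience = [1.0, 3.0, 5.0, 7.0, 9.0]
scaled_experience = standardize(total_experience)
```

After this step, a variable measured in years and one measured in currency units contribute on a comparable scale to distance calculations and penalty terms.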

Model Building and Interpretation: Regression models, including linear regression,
random forest, KNN, XGBoost, lasso, and ridge regression, were employed. Evaluation
metrics included mean squared error, root mean squared error, mean absolute error, and
R-squared score.
The models were compared based on their performance on both training and test sets, as well as their
interpretability and robustness.
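
For reference, the four evaluation metrics can be computed as in the sketch below. The function name `regression_metrics` and the sample predictions are hypothetical, and the project's own tooling is not stated.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": 1 - ss_res / ss_tot}

# Hypothetical actual and predicted salaries, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
```

Computing the same metrics on both the training and the test set is what makes overfitting visible: a large gap between the two suggests the model has memorized the training data.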

Model performance observations:

● Linear regression achieves a perfect fit, explaining all variance.
● Random forest performs well with moderate errors and a strong fit.
● KNN exhibits poor performance with higher errors and lower explained variance.
● XGBoost demonstrates strong performance, explaining most data variance.
● Lasso performs moderately well, explaining a high proportion of variance.
● Ridge excels with low errors and high explanatory power.

BEST PERFORMER
The top-performing models, including ridge regression, linear regression, and XGBoost
regression, exhibit superior accuracy, minimal errors, and a robust fit to the data. These
models effectively capture intricate data relationships, ensuring reliable and consistent
predictions. Moreover, they provide valuable insights into feature importance, aiding in the
identification of key factors influencing salary outcomes.

Model Tuning: Models underwent tuning using ensemble modeling, hyperparameter
optimization, and cross-validation. The KNN Regression model, after tuning with Grid Search
CV, showed improved performance on the test set.
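
The idea behind grid search with cross-validation can be sketched in plain Python: try each candidate number of neighbors, score it on held-out folds, and keep the value with the lowest average error. This is a simplified illustration with a single feature and hypothetical data and helper names, not the project's actual tuning code.

```python
def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points nearest to x (one feature)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def cv_mse(xs, ys, k, folds):
    """Mean squared error of k-NN, averaged over interleaved held-out folds."""
    fold_scores = []
    for f in range(folds):
        test_idx = set(range(f, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        sq_errors = [(ys[i] - knn_predict(train_x, train_y, xs[i], k)) ** 2
                     for i in sorted(test_idx)]
        fold_scores.append(sum(sq_errors) / len(sq_errors))
    return sum(fold_scores) / folds

def grid_search_k(xs, ys, candidates, folds=3):
    """Score every candidate k by cross-validation and return the best one."""
    scores = {k: cv_mse(xs, ys, k, folds) for k in candidates}
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Hypothetical experience (years) and salary values, for illustration only.
experience = [1, 2, 3, 4, 5, 6, 7, 8, 9]
salary = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
best_k, scores = grid_search_k(experience, salary, candidates=[1, 2, 5])
```

Because the winning value is chosen on held-out folds rather than on the training data, the tuned model is less likely to look better than it really is.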

Optimal Model and Business Implications: Random forest and XGBoost models
emerged as potentially optimal due to high accuracy and strong predictive power. Business
implications include supporting decision-making in salary negotiation, employee retention,
and talent acquisition, leading to improved operational efficiency and customer satisfaction.

Limitations and Future Scope: Limitations include potential data representativeness
issues, biases, and lack of generalizability. Future opportunities involve enriching data with
external sources, validating data, and refining models using advanced techniques.

Conclusion: The project successfully achieved its objectives, providing valuable insights
and building robust regression models for HR data analysis at Delta Ltd.
