ASM2-Vu Lam Le-3975055


Assessment 2: Visualisations and Predictive Analytics Report

Made by Vu Lam Le

MAY 19, 2024


RMIT
Introduction to Business Analytics (2410)
Table of Contents
I. Executive summary
II. Introduction
III. Data Pre-processing
IV. Analytics and Findings/Discussions
Question 1.
The Years of Employment associated with Age and Occupation
The Years of Employment associated with Education Years and Sex
The Years of Employment associated with Work Hours and Sex
Question 2.
Interpret the Regression Analysis
Summary of the Regression Analysis
Question 3.
Interpretation of Logistic Regression Model
V. Recommendations

I. Executive summary
This report examines employment data through structured data pre-processing followed by
visual, regression, and logistic regression analysis. The dataset's thirteen variables were
divided into numerical and categorical groups, and the categorical variables were converted to
numeric format so that they could be used with machine learning techniques. This preparation
made it possible to examine employment patterns across demographic groups and occupational
categories.

The results highlight associations among age, years of employment, and educational attainment
across occupational categories. The regression analysis quantifies these relationships: age
and working hours are positively associated with employment duration, whereas more years of
education are associated with shorter employment tenure, plausibly because more educated
employees start their careers later.

The report also presents a logistic regression model that predicts the likelihood of
long-term employment (more than 10 years). In this model, age and working hours increase the
probability of extended employment, whereas more years of education decrease it.
These observations are vital in the formulation of focused human resource strategies and
workforce planning, with the objective of improving employee retention and catering to
distinct demographic requirements.

Taken together, the report covers the data preparation stage, the analytical techniques
employed, and the practical implications of the results for organizational strategy and
workforce management.

II. Introduction

This report examines employee retention in a diverse organizational structure, focusing on the
effects of education, age, and working hours. After data pre-processing, the dataset consists
of thirteen variables, classified into numerical and categorical groups so that they can be
used with machine learning techniques. Linear regression and logistic regression models are
then used to assess how these factors relate to employee longevity. The analysis identifies
the significant predictors of years of employment and supports the formulation of focused
human resource strategies that improve employee retention and satisfaction, helping to
cultivate a work environment that is both supportive and productive.

III. Data Pre-processing

The dataset was pre-processed before analysis. It has 13 variables, which are divided into two
groups: numerical (IdNum, WorkHrs, Age, Educ_Yrs, WrkYears, EmpYears, NumPromo) and
categorical (Occup_n, Sex, MemUnion, FutPromo, SexPromo, AwareI4). The categorical variables
were transformed into numeric format using one-hot encoding, in which each category becomes
its own 0/1 indicator column, so that they can be used with machine learning techniques.
Particular emphasis was placed on EmpYears, which is the dependent variable in this analysis.
These pre-processing steps leave the dataset consistently structured and ready for the
regression and logistic regression analyses that follow.
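
As a minimal sketch of the one-hot encoding step described above, the following plain Python
example turns one categorical column into 0/1 indicator columns. The helper name one_hot and
the two sample rows are hypothetical and are used only for illustration; the report does not
state which tool performed the encoding.

```python
# Minimal one-hot encoding sketch in plain Python (no external libraries).
# The helper name and the sample rows are hypothetical; they are NOT taken
# from the report's dataset or from the tool the author actually used.

def one_hot(rows, column):
    """Replace `column` in each row with one 0/1 indicator per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


sample_rows = [
    {"Age": 34, "Sex": "Female"},
    {"Age": 51, "Sex": "Male"},
]
encoded_rows = one_hot(sample_rows, "Sex")
print(encoded_rows)
```

When the indicators feed a regression with an intercept, one indicator per variable is
usually dropped to avoid perfect collinearity among the columns.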

IV. Analytics and Findings/Discussions

Question 1.
The Years of Employment associated with Age and Occupation

The scatter plot shows the relationship between age (Age) and years of employment (EmpYears)
across seven occupational categories (Occup_n). Each occupation is color-coded, making the
categories easy to distinguish. In general, as individuals in these occupations age, their
years of employment tend to increase. The categories fall into three groups by strength of
correlation. The strongly correlated group comprises Management, Services, and Workers, which
suggests steady growth in tenure in these roles as age increases. The moderately correlated
group comprises Professional and Admin, which still show a correlation, but a weaker one than
the first group. The weakly correlated group comprises Production and Tech/Sale, where the
data points are more spread out and the trend is relatively flat, suggesting greater
variability in career length among individuals.
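
The grouping above is read from the plot. As a hedged sketch of how the strength of the
Age and EmpYears relationship could be quantified per occupation, the example below computes
a Pearson correlation for each group. The function names and the toy records (groups "A" and
"B") are hypothetical; they are not values from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_group(records):
    """Map each group label to the Age/EmpYears correlation within that group."""
    groups = {}
    for label, age, emp_years in records:
        ages, years = groups.setdefault(label, ([], []))
        ages.append(age)
        years.append(emp_years)
    return {label: pearson(ages, years) for label, (ages, years) in groups.items()}


# Hypothetical toy records: (group label, Age, EmpYears). Not real data.
toy_records = [
    ("A", 25, 2), ("A", 35, 12), ("A", 45, 22),   # tenure rises steadily with age
    ("B", 25, 5), ("B", 35, 3), ("B", 45, 6),     # tenure is nearly flat in age
]
group_correlations = correlation_by_group(toy_records)
print(group_correlations)
```

A coefficient close to 1 corresponds to the tightly clustered upward trends described for
the strongly correlated group, while values nearer 0 correspond to the flatter trends.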

The Years of Employment associated with Education Years and Sex

The provided visualization displays the relationship between years of education (Educ_Yrs)
and years of employment (EmpYears), separated by gender (Female on the left, Male on the
right). The plot for females shows a considerable number of data points, with a scatter
distribution heavily concentrated in lower years of education (around 5 to 15 years). The
regression line for females suggests a slight negative trend, indicating that as years of
education increase, the years of employment slightly decrease. However, the confidence
interval is wide, reflecting substantial variability in the data and potentially weak correlation.
For males, the points are more widely distributed across the education range, though they are
concentrated mainly between 5 and 20 years of education. The regression line for males exhibits a
similarly slight negative slope, suggesting a small decrease in years of employment as
education increases. Like the female plot, the confidence interval is wide, suggesting
variability. Both plots show considerable scatter, indicating variability in how education
correlates with employment duration across individuals. Notably, there are outliers in the
female plot, particularly for those with lower years of education but high employment years,
which might suggest cases of early career starts or long tenure in specific roles without
correspondingly high formal education. The trend lines for both genders are similarly sloped,
indicating that the relationship between education and employment years does not differ
markedly between genders.

The Years of Employment associated with Work Hours and Sex

The visualization illustrates the relationship between working hours (WorkHrs) and years of
employment (EmpYears), split by gender (Female on the left, Male on the right). For females,
the plot shows a slight negative trend, suggesting that as working hours increase, the years
of employment slightly decrease. The regression line's slope is negative, but the confidence
interval is broad, indicating considerable uncertainty around the estimate. For males, there
is a similarly slight negative trend, although the regression line is flatter than for
females, which suggests a weaker relationship between working hours and years of employment.
The data for females are more clustered at lower working hours (below 40 hours), while for
males the data points are more evenly distributed across the range of working hours. Both
plots exhibit some outliers, particularly in the upper range of working hours, where a few
individuals have significantly higher years of employment regardless of the general trend.
The trend appears slightly more pronounced for females than for males, suggesting that changes
in working hours may have a slightly more noticeable association with employment duration for
females. Males show a wider distribution in both working hours and years of employment,
indicating greater variability among this group.

Question 2.
Interpret the Regression Analysis

The regression model aims to predict the number of EmpYears (years of employment) based
on three independent variables: Age, WorkHrs (working hours), and Educ_Yrs (years of
education).

Interpretation of Coefficients:

Intercept (-5.99943):

This is the value of EmpYears when all independent variables (Age, WorkHrs, and Educ_Yrs)
are zero. While it might not have a practical interpretation in this context (since Age,
WorkHrs, and Educ_Yrs cannot realistically be zero), it serves as a baseline value from which
the effects of the other variables are measured.

Age (0.48644):

For each additional year of age, the number of years of employment (EmpYears) increases by
approximately 0.486 years, holding WorkHrs and Educ_Yrs constant. This positive
relationship suggests that older employees tend to have more years of employment.

WorkHrs (0.08607):

For each additional hour of work per week, the number of years of employment (EmpYears)
increases by approximately 0.086 years, holding Age and Educ_Yrs constant. This indicates a
positive, but relatively small, relationship between working hours and years of employment.

Educ_Yrs (-0.60483):

For each additional year of education, the number of years of employment (EmpYears)
decreases by approximately 0.605 years, holding Age and WorkHrs constant. This negative
relationship might suggest that individuals with more years of education tend to start their
careers later, thus having fewer years of employment relative to their age.

Overall Model Analysis:

Age and WorkHrs have a positive effect on the number of years of employment, meaning that
older employees and those who work more hours tend to have more years of employment.
Educ_Yrs, by contrast, has a negative effect, indicating that more educated employees might
start their careers later, leading to fewer years of employment compared to less educated
employees of the same age. This regression model provides insight into how age, working
hours, and education level relate to employees' years of employment, helping to understand the
dynamics of employee tenure within the organization.
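
To make the coefficients concrete, the sketch below applies the fitted equation
EmpYears = -5.99943 + 0.48644 * Age + 0.08607 * WorkHrs - 0.60483 * Educ_Yrs to a single
profile. The coefficients are those reported above; the function name and the example
profile (40 years old, 40 hours per week, 12 years of education) are hypothetical choices
made for illustration.

```python
# Coefficients as reported in the regression output above.
COEFFICIENTS = {
    "Intercept": -5.99943,
    "Age": 0.48644,
    "WorkHrs": 0.08607,
    "Educ_Yrs": -0.60483,
}


def predict_emp_years(age, work_hrs, educ_yrs):
    """Return the fitted EmpYears for one employee profile."""
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["Age"] * age
        + COEFFICIENTS["WorkHrs"] * work_hrs
        + COEFFICIENTS["Educ_Yrs"] * educ_yrs
    )


# Hypothetical profile chosen for illustration; not a row from the dataset.
baseline = predict_emp_years(age=40, work_hrs=40, educ_yrs=12)
one_year_older = predict_emp_years(age=41, work_hrs=40, educ_yrs=12)

print(round(baseline, 2))                   # 9.64 predicted years of employment
print(round(one_year_older - baseline, 3))  # 0.486, the Age coefficient
```

Changing one input while holding the others fixed reproduces the "holding constant"
reading of each coefficient.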

Summary of the Regression Analysis

The regression analysis aims to predict the number of employment years (EmpYears) based
on three predictors: age (Age), working hours (WorkHrs), and years of education (Educ_Yrs).
The results of the regression model are as follows:

Residuals:

The residuals indicate the differences between observed and predicted values of EmpYears.
The residuals range from -19.9695 to 30.4403, with a median of -0.3459, suggesting a
relatively balanced distribution of errors around the predicted values.

Model Statistics:

Residual Standard Error (7.657): This indicates the typical size of the residuals.

Multiple R-squared (0.3374): Approximately 33.74% of the variability in EmpYears is
explained by the model, which is moderate.

Adjusted R-squared (0.3354): This value is adjusted for the number of predictors. It is only
slightly lower than the multiple R-squared, so the explanatory power is not inflated by the
number of predictors.

F-statistic (169.1): The overall significance of the model (p < 2.2e-16), indicating that the
predictors collectively have a significant effect on EmpYears.
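
The reported statistics can be cross-checked against one another. The sketch below
recomputes the adjusted R-squared and the F-statistic from the multiple R-squared. It
assumes a sample size of 1,000, which is inferred from the 999 null degrees of freedom
reported for the logistic model in Question 3 and is not stated directly for this
regression; the function names are hypothetical.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


def f_statistic(r_squared, n, p):
    """Overall F-statistic of a linear model computed from its R-squared."""
    return (r_squared / p) / ((1.0 - r_squared) / (n - p - 1))


R_SQUARED = 0.3374   # reported multiple R-squared
N_OBS = 1000         # ASSUMPTION: inferred from 999 null degrees of freedom
N_PREDICTORS = 3     # Age, WorkHrs, Educ_Yrs

adj_r2 = adjusted_r_squared(R_SQUARED, N_OBS, N_PREDICTORS)
f_stat = f_statistic(R_SQUARED, N_OBS, N_PREDICTORS)

print(round(adj_r2, 4))  # 0.3354, matching the reported adjusted R-squared
print(round(f_stat, 1))  # 169.1, matching the reported F-statistic
```

Under that assumption, both recomputed values agree with the reported output.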

Significance:

The *** notation in the output indicates that the p-values for all predictors are below 0.001.
This suggests that years of education, working hours, and age are all statistically
significant predictors of employment years. However, the multiple R-squared is only 33.74%,
so about two thirds of the variation in employment years is left unexplained and predictions
for individual employees will be imprecise. Additional data that may improve the model
include employee satisfaction levels, the remuneration policy of the organization, and
employees' attitudes toward the company culture.

Question 3.
Interpretation of Logistic Regression Model
The logistic regression model predicts the probability that an employee has more than 10
years of employment (EmpYears_more10yrs) based on WorkHrs (working hours), Age, and
Educ_Yrs (years of education). The results of the model are as follows:

Coefficients:

Intercept (-4.010420): This is the baseline log-odds of having more than 10 years of
employment when all predictors are zero.

WorkHrs (0.018138): For each additional hour of work per week, the log-odds of having
more than 10 years of employment increases by approximately 0.0181 (p = 0.0185),
indicating a significant positive relationship.

Age (0.103903): For each additional year of age, the log-odds of having more than 10 years
of employment increases by approximately 0.1039 (p < 2e-16), indicating a highly significant
positive relationship.

Educ_Yrs (-0.142795): For each additional year of education, the log-odds of having more
than 10 years of employment decreases by approximately 0.1428 (p = 4.42e-06), indicating a
highly significant negative relationship.
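
Log-odds are hard to read directly, so the sketch below converts the reported coefficients
to odds ratios and to a predicted probability. The coefficients are those listed above; the
function names and the example profile (40 hours per week, age 40, 12 years of education)
are hypothetical and serve only as an illustration.

```python
from math import exp

# Coefficients as reported in the logistic regression output above.
LOGIT_COEFFICIENTS = {
    "Intercept": -4.010420,
    "WorkHrs": 0.018138,
    "Age": 0.103903,
    "Educ_Yrs": -0.142795,
}


def odds_ratio(name):
    """Multiplicative change in the odds for a one-unit increase in `name`."""
    return exp(LOGIT_COEFFICIENTS[name])


def probability_more_than_10_years(work_hrs, age, educ_yrs):
    """Predicted probability that EmpYears exceeds 10 for one profile."""
    log_odds = (
        LOGIT_COEFFICIENTS["Intercept"]
        + LOGIT_COEFFICIENTS["WorkHrs"] * work_hrs
        + LOGIT_COEFFICIENTS["Age"] * age
        + LOGIT_COEFFICIENTS["Educ_Yrs"] * educ_yrs
    )
    # The logistic function maps log-odds to a probability between 0 and 1.
    return 1.0 / (1.0 + exp(-log_odds))


odds_ratios = {name: odds_ratio(name) for name in ("WorkHrs", "Age", "Educ_Yrs")}
# Hypothetical profile chosen for illustration; not a row from the dataset.
example_probability = probability_more_than_10_years(work_hrs=40, age=40, educ_yrs=12)

print({name: round(value, 3) for name, value in odds_ratios.items()})
print(round(example_probability, 3))
```

On the odds scale, each additional year of age multiplies the odds of long-term employment
by about 1.11, each additional weekly working hour by about 1.02, and each additional year
of education by about 0.87.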

Model Statistics:

Null Deviance (1239.8): This measures the goodness of fit of a model with only the intercept
(no predictors) on 999 degrees of freedom.

Residual Deviance (1006.9): This measures the goodness of fit of the model with all
predictors on 996 degrees of freedom. A lower residual deviance indicates a better fit.

AIC (1014.9): The Akaike Information Criterion is used to compare models; a lower AIC
indicates a better fit.

Number of Fisher Scoring Iterations (4): This is the number of iterations the fitting
algorithm needed to converge. A small number such as this indicates that the model converged
without difficulty.
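
The deviance figures above can be turned into a few summary checks. The sketch below
computes the drop in deviance, a likelihood-ratio p-value, McFadden's pseudo R-squared, and
the AIC implied by the residual deviance. It assumes ungrouped binary data, for which the
deviance equals minus twice the log-likelihood. McFadden's measure and the helper name
chi2_sf_3df are additions for illustration and do not appear in the original output.

```python
from math import erfc, exp, pi, sqrt

NULL_DEVIANCE = 1239.8      # reported, 999 degrees of freedom
RESIDUAL_DEVIANCE = 1006.9  # reported, 996 degrees of freedom
N_PARAMETERS = 4            # intercept plus WorkHrs, Age, Educ_Yrs


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df.

    Closed form valid for 3 degrees of freedom only.
    """
    return erfc(sqrt(x / 2.0)) + sqrt(2.0 * x / pi) * exp(-x / 2.0)


# Likelihood-ratio test: the three predictors use 999 - 996 = 3 df.
deviance_drop = NULL_DEVIANCE - RESIDUAL_DEVIANCE
lr_p_value = chi2_sf_3df(deviance_drop)

# ASSUMPTION: ungrouped binary outcome, so deviance = -2 * log-likelihood.
mcfadden_r2 = 1.0 - RESIDUAL_DEVIANCE / NULL_DEVIANCE
aic = RESIDUAL_DEVIANCE + 2 * N_PARAMETERS

print(round(deviance_drop, 1))  # 232.9
print(lr_p_value < 0.001)       # True
print(round(mcfadden_r2, 3))    # 0.188
print(round(aic, 1))            # 1014.9, matching the reported AIC
```

The drop of 232.9 in deviance on 3 degrees of freedom shows that the predictors improve the
fit far beyond chance, although a pseudo R-squared near 0.19 still leaves much of the
outcome unexplained.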

Conclusion

The logistic regression model reveals that WorkHrs and Age have significant positive effects
on the likelihood of having more than 10 years of employment. Educ_Yrs, by contrast, has a
significant negative effect, suggesting that higher education is associated with a lower
probability of having more than 10 years of employment, potentially due to later career starts.

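The predictors clearly improve the fit: the deviance falls from 1239.8 for the intercept-only
model to 1006.9 for the full model. The AIC (1014.9) is informative only in comparison with
the AIC of competing models, so it does not by itself show that the fit is good. These
insights help in understanding the factors that influence long-term employment, allowing for
more targeted HR strategies and workforce planning.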

V. Recommendations

In light of the findings presented in this report, a number of strategies can be implemented
to increase employee retention. It is essential to develop distinct career advancement
strategies, particularly for those in management, service, and worker positions. Employee
retention can also be enhanced through flexible working hours and work-life balance
initiatives, including remote work options and adaptable schedules. These measures may matter
most for female employees, for whom the link between working hours and employment duration
appeared slightly more pronounced. By offering educational incentives that encourage
employees to obtain qualifications without postponing their career initiation, the inverse
relationship between years of employment and higher education can be remedied. Consistent
employee engagement and satisfaction can be increased through the implementation of
recognition programs and feedback and satisfaction surveys. In addition to implementing
performance-based incentives, routinely evaluating and modifying compensation packages to
maintain industry competitiveness can inspire and retain personnel. One way to cultivate a
favourable corporate culture is by coordinating team-building exercises and advocating for an
inclusive work environment. Tailored retention strategies can effectively target distinct
challenges and requirements of various occupational categories. Lastly, proactive retention
strategies can be informed by the use of predictive analytics to anticipate potential employee
attrition and the ongoing analysis of employee data to identify trends and patterns. By
implementing these solutions, a rewarding and supportive workplace can be established,
which can promote long-term employment.
