
“Predicting Qualified Beneficiaries in the TUPAD Program using Machine Learning Techniques”

BSIS 4B

Leader: DelaPeña, Lyster S.

Members:

Erguiza, John Ivin V.

Erguiza, Jeric E.

Magnawa, Mark Jonnel

Paduhilag, Mark Steven


Chapter 1

The Problem and Its Background

1.1 Introduction

This research concerns the prediction of qualified beneficiaries of TUPAD, or Tulong Panghanapbuhay sa Ating Disadvantaged/Displaced Workers, a program that aims to provide short-term emergency employment for displaced, underemployed, and disadvantaged workers, especially those impacted by disasters or economic downturns, according to the Department of Labor and Employment (DOLE). Through the mobilization of local workers for various tasks, TUPAD improves the physical environment of communities, making them cleaner, safer, and more livable. It also supports workers in earning an income through community-based projects, enhances their skills for future employment, and aids in their livelihood recovery. The increasing adoption of machine learning models and advanced algorithms for predictive analytics, coupled with a growing emphasis on data-driven decision-making, marks a significant trend in social welfare programs. As poverty reduction and employment generation initiatives gain prominence, efficient resource allocation, timely support, and community development have become critical priorities. Research in this area focuses on improving predictive accuracy through optimized algorithm selection, hyperparameter tuning, and diverse datasets. Ultimately, leveraging technology and data analytics aims to enhance social welfare outcomes, addressing the challenges faced by disadvantaged workers and communities affected by disasters or economic downturns.

The TUPAD program faces several challenges for both beneficiaries and management.

Beneficiaries often struggle with limited job opportunities that don't match their skills, leading to financial instability due to the program's temporary nature. Issues of equitable opportunity distribution and insufficient support services, such as training, further complicate their experiences (Luh Putu Saraswati Devia Jayanti, 2022). For management, effective resource allocation and accurate data tracking are significant challenges that require robust systems (Murat Levent Demircan & Kaan Aksaç, 2022). Coordination with various stakeholders can be complex, and public perception of the program's effectiveness plays a crucial role in its reception. Additionally, ensuring the long-term sustainability of TUPAD initiatives to create lasting impacts on beneficiaries' livelihoods remains a key concern (Department of Labor and Employment, 2023).

This study aims to enhance the predictive accuracy of beneficiary qualification for the TUPAD program by developing an improved machine learning model that utilizes an expanded dataset. Previous studies have employed Random Forest and Naive Bayes classification algorithms, achieving accuracy levels of 70-80% (Luh Putu Saraswati Devia Jayanti, 2022). In another study, a Logistic Regression model reached a 71.19% accuracy rate under five-fold validation, while Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), and Multi-Layer Perceptron (MLP) algorithms were also assessed, with the best configuration optimized to a 73.14% accuracy level; testing the developed model on a fresh dataset yielded 71.67% accuracy (Murat Levent Demircan & Kaan Aksaç, 2022). This investigation seeks to address these limitations by incorporating a larger and more diverse dataset, additional relevant features, and optimized algorithm selection with hyperparameter tuning. By leveraging the enhanced model, the TUPAD program can improve beneficiary selection accuracy, reduce errors, and optimize resource allocation, ultimately contributing to more effective poverty reduction and employment generation initiatives.

The goal is to develop an enhanced machine learning model for predicting TUPAD beneficiaries that achieves the highest accuracy reported to date. With this model, TUPAD can facilitate fast and accurate selection of beneficiaries, ensuring timely support for disadvantaged workers and communities. With improved predictive accuracy exceeding 73%, this research enables efficient beneficiary selection, reducing processing time and resources while enhancing community impact by targeting those most in need. Ultimately, the findings contribute to more effective poverty reduction, employment generation, livelihood recovery, and community development, informing data-driven decision-making for social welfare initiatives.
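As a concrete illustration of the comparison described above, the following sketch shows one way candidate classifiers could be evaluated with five-fold cross-validation, similar in spirit to the validation schemes reported in the cited studies. It is only a minimal example under stated assumptions: scikit-learn is assumed as the library, and the file name tupad_applicants.csv and the qualified label column are hypothetical placeholders, not the actual data of this study.

    # Minimal sketch: comparing candidate classifiers with five-fold cross-validation.
    # The CSV file and the "qualified" label column are hypothetical placeholders.
    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression

    data = pd.read_csv("tupad_applicants.csv")            # hypothetical applicant records
    X = pd.get_dummies(data.drop(columns=["qualified"]))  # one-hot encode categorical fields
    y = data["qualified"]

    models = {
        "Random Forest": RandomForestClassifier(random_state=42),
        "Naive Bayes": GaussianNB(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }

    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name}: mean accuracy = {scores.mean():.4f}")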

1.2 Review of Related Literature

Machine learning techniques can improve the selection process for qualified beneficiaries in the TUPAD program by automating it and increasing accuracy. Traditional methods are prone to biases and inefficiencies, while machine learning models like decision trees and random forests can analyze complex data to predict eligibility more effectively. This approach ensures faster, more accurate beneficiary selection, helping resources reach those most in need.


Prediction of Employee Performance

Employee performance prediction has become a critical factor for effective human resource management (Ikram et al., 2019). The use of machine learning models, such as decision trees and random forests, has demonstrated improved accuracy in identifying high-performing employees (Son & Kim, 2019). These models analyze various employee characteristics, including skills, experience, and past performance, to predict future productivity. This approach enables organizations to make data-driven decisions, optimize talent management, and tailor employee development programs (Kim, 2019).

Employee Turnover Prediction Using Machine Learning

Employee turnover is a crucial concern for organizations, with machine learning models offering predictive insights to mitigate turnover rates (Ikram et al., 2024). By analyzing factors such as job satisfaction, compensation, and work environment, predictive models like random forests and neural networks can forecast turnover risk, enabling organizations to take proactive measures (Perkins & Neumayer, 2024). This data-driven approach helps improve employee retention strategies, ensuring better workforce stability and cost savings (Son & Kim, 2024).


Gender-Based Analysis of Employee Attrition Prediction Using Machine Learning

In the study by Nandy and Kamila (2022), machine learning techniques are employed to enhance the recruitment process by predicting the suitability of candidates for specific roles. By utilizing models such as logistic regression, support vector machines, and random forests, the analysis identifies key features like skills, experience, and qualifications that contribute to successful job placement. This approach not only improves the efficiency of recruitment but also reduces bias and helps organizations make data-driven decisions.

Predictive Analysis of Job Recruitment Using Machine Learning

The integration of machine learning techniques in job recruitment processes has gained significant attention, enhancing the ability to predict candidate success and streamline hiring (Khan et al., 2022). By leveraging various algorithms, organizations can analyze large datasets to identify patterns and factors that contribute to successful job placements (Shah et al., 2022). Machine learning models enable recruiters to assess candidate qualifications, predict turnover risks, and improve overall hiring efficiency (Ali et al., 2022). These predictive insights not only help in making informed hiring decisions but also foster a more data-driven recruitment strategy that aligns with organizational goals.

Predictive Analysis of Job Recruitment Using Machine Learning

This study aims to develop a predictor model that proposes suitable job positions based on a candidate's resume, considering factors like academic performance, professional experience, education, and publications. The Naive Bayesian classifier is proposed as an effective algorithm for predicting employability. The model will be trained on a well-known employed/unemployed resume dataset, consisting of three phases to train pre-labelled resumes and apply the Bayesian classifier.
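A minimal sketch of how a Naive Bayes resume classifier of this kind might be set up is shown below, assuming scikit-learn. The two toy resumes and their labels are invented placeholders, not the dataset used in the cited work.

    # Minimal sketch of a bag-of-words Naive Bayes employability classifier.
    # The resume texts and labels below are toy placeholders, not the cited dataset.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    resumes = [
        "BS Computer Science, three years of software experience, two publications",
        "High school graduate, no professional experience listed",
    ]
    labels = ["employed", "unemployed"]                        # pre-labelled training examples

    model = make_pipeline(CountVectorizer(), MultinomialNB())  # vectorize resumes, then train the classifier
    model.fit(resumes, labels)
    print(model.predict(["BS Information Systems, internship experience, one publication"]))  # classify a new resume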

1.3 Significance of the Study

This study seeks to explore methods for predicting beneficiaries of TUPAD, or the Tulong Panghanapbuhay sa Ating Disadvantaged/Displaced Workers program. It aims to identify individuals who qualify for assistance by utilizing various machine learning tools. The study will help in the selection of qualified beneficiaries, ensuring that support reaches those who need it most. It also suggests that social welfare programs use technology to identify the right individuals who need assistance, resulting in more effective outcomes.

Future Researchers: This research can serve as a valuable resource for future researchers, providing them with insights and guidance for their own investigations. The findings of this study can serve as a reference point and inform their future work.

Social Workers and Practitioners: The research may benefit social workers and practitioners by providing a better understanding of the factors influencing beneficiaries' needs, enabling them to deliver more targeted support and interventions.

Educational Institutions: This research can help educational institutions, such as the University of Caloocan City, understand how machine learning can be integrated into social policies, supporting the development of an innovative environment in public service education.

Community Organizations: Community organizations such as NGOs can use the study's findings to advocate for better support systems for working individuals, enhancing their programs to meet community needs more effectively.

Beneficiaries: The study may also benefit the beneficiaries themselves, as it may lead to better selection procedures that help poor individuals and families receive timely assistance, which may improve their quality of life.

1.4 Theoretical/Conceptual Framework
Figure 1.

Table 1 shows how the project aligns with business goals by first defining objectives and success criteria, such as improving candidate selection. Next, it assesses resources and risks and conducts a cost-benefit analysis. Data mining goals are then set to determine technical success metrics, such as predictive accuracy. Finally, a detailed project plan outlines tools, technologies, and timelines to ensure the process meets both business and technical requirements.

Table 2 shows that the process starts with collecting initial data, ensuring all necessary datasets are acquired and loaded into the analysis tools. Next, the data is described by examining its structure, such as format, number of records, and field identities. Afterward, the data is explored more deeply through queries, visualizations, and relationship analysis. Finally, data quality is verified by identifying and documenting any issues or inconsistencies to ensure it is suitable for further processing.

Table 3 covers data preparation. The first step is to select relevant datasets and document the reasons for their inclusion or exclusion. Next, data cleaning is performed, often the most time-consuming task, to correct, impute, or remove errors, ensuring the quality of the inputs. The process also involves constructing new attributes, such as calculating a body mass index from height and weight. Data integration follows, where multiple sources are combined to create comprehensive datasets. Finally, the data is formatted appropriately, such as converting strings to numeric values, to ensure compatibility with further analysis and modeling.
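For illustration only, the data preparation tasks in Table 3 can be expressed as the short Python sketch below; the file name and column names (income, weight_kg, height_m, age) are hypothetical placeholders chosen to mirror the cleaning, attribute construction, and formatting steps described.

    # Illustrative sketch of the Table 3 data preparation steps; column names are hypothetical.
    import pandas as pd

    df = pd.read_csv("raw_records.csv")                      # hypothetical source file

    # Clean: remove duplicate rows and impute missing numeric values with the median
    df = df.drop_duplicates()
    df["income"] = df["income"].fillna(df["income"].median())

    # Construct a new attribute, e.g. body mass index from height (m) and weight (kg)
    df["bmi"] = df["weight_kg"] / (df["height_m"] ** 2)

    # Format: convert a string field to numeric, marking unparseable entries as missing
    df["age"] = pd.to_numeric(df["age"], errors="coerce")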

Table 4 shows that the process starts by selecting appropriate modeling techniques, such as regression or neural networks. A test design is created by splitting the data into training, testing, and validation sets. The model is then built, often through straightforward code execution, such as “reg = LinearRegression().fit(X, y)”. Finally, models are assessed by comparing their performance and interpreting results based on domain knowledge, predefined success criteria, and the test design.
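To make the modeling and assessment steps of Table 4 concrete, the sketch below expands the quoted LinearRegression line into a minimal test design with a train/test split; the synthetic data generated by make_regression is a stand-in, not the study's dataset.

    # Minimal sketch of Table 4: test design, model building, and assessment.
    # Synthetic data stands in for the prepared dataset from the earlier phases.
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    reg = LinearRegression().fit(X_train, y_train)             # build the model
    rmse = mean_squared_error(y_test, reg.predict(X_test)) ** 0.5
    print(f"Test RMSE: {rmse:.4f}")                            # assess against the test design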

All the criteria and figures discussed thus far lead to Table 5, which focuses on the Evaluation phase of the CRISP-DM model. In this phase, the aim is to assess the effectiveness and capability of the models developed during the previous phases. By examining performance metrics and ensuring alignment with the defined business objectives and success criteria, this evaluation provides critical insights into the reliability and accuracy of the data mining process. Table 6 illustrates the Deployment phase, where the successful models are implemented in a real-world environment. This phase ensures that the insights gained from the evaluation are translated into actionable strategies, allowing businesses to apply the findings effectively. Overall, these tables emphasize the importance of evaluating the models' capabilities and ultimately deploying them to achieve the desired business outcomes.

We chose the CRISP-DM model for its structured yet flexible approach to data mining, which is adaptable to various projects. Its phases (business understanding, data understanding, data preparation, modeling, evaluation, and deployment) provide a comprehensive process that promotes clear objectives and effective communication.


1.5 Statement of the Problem

The main concern of the study is to develop a machine learning model that can predict whether TUPAD members are qualified or unqualified. Specifically, the study sought to answer the following questions:

1. What attributes are used in classifying whether a TUPAD member is qualified or unqualified?

2. How accurately can the machine learning model predict whether a TUPAD member is qualified or unqualified?

3. What machine learning algorithms are most effective for predicting the qualification status of TUPAD members?

4. How does the performance of the machine learning model compare to traditional assessment methods in predicting qualification?

5. What implications does the model's accuracy have for improving the overall fairness and efficiency of the TUPAD selection process?

6. How can the predictions from the machine learning model inform the decision-making process for TUPAD program administrators?

1.6 Synthesis
The issues faced in our study revolve around the challenge of selecting qualified beneficiaries for the TUPAD program. Based on our references, organizations often encounter significant costs and lengthy processes when recruiting suitable candidates (Luh Putu Saraswati Devia Jayanti, 2022). To address this, the proposed study will utilize classification algorithms such as J48, Random Forest, and Naive Bayes. These algorithms have been effectively combined in previous research to analyze garment employees' productivity datasets. The evaluation of these models will focus on three key metrics: accuracy, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). Results from prior studies indicate that Random Forest outperforms other standard algorithms, achieving an accuracy of 0.983 and an RMSE of 0.1423 (Ruba Obiedat, Sara Amjad Taubasi, 2022). Furthermore, this study will implement the Cross Industry Standard Process for Data Mining (CRISP-DM) model, which is well suited to guiding our research as it provides a comprehensive framework for the data mining project lifecycle (John M. Kirimi, 2016).
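For illustration, the sketch below evaluates the three candidate classifiers with the three metrics named above (accuracy, MAE, RMSE), assuming scikit-learn. Since J48 (Weka's C4.5 implementation) has no direct scikit-learn equivalent, a standard DecisionTreeClassifier is used as a rough stand-in, and the dataset is synthetic rather than the study's own.

    # Illustrative sketch: accuracy, MAE, and RMSE for the three candidate classifiers.
    # DecisionTreeClassifier stands in for J48 (Weka's C4.5); the data is synthetic.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    models = {
        "J48 stand-in (decision tree)": DecisionTreeClassifier(random_state=42),
        "Random Forest": RandomForestClassifier(random_state=42),
        "Naive Bayes": GaussianNB(),
    }

    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        acc = accuracy_score(y_test, pred)
        mae = mean_absolute_error(y_test, pred)
        rmse = mean_squared_error(y_test, pred) ** 0.5
        print(f"{name}: accuracy={acc:.3f}, MAE={mae:.3f}, RMSE={rmse:.3f}")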

1.7 Definition of Terms

Predictive Modeling: Predictive modeling is used in our research to apply statistical techniques and machine learning algorithms to forecast future outcomes based on the data we gather. In this research, predictive modeling refers to the application of machine learning techniques to predict whether an applicant is a qualified TUPAD beneficiary.

Machine Learning: We use machine learning in our study so that the system can automatically learn and improve from experience without being explicitly programmed. For the purposes of this study, machine learning involves using algorithms to analyze applicant data and predict qualification status.

Algorithm: An algorithm is a step-by-step procedure or formula for solving a problem or completing a task. It consists of a set of defined rules or instructions that can be followed to achieve a specific outcome, often used in computing and mathematics to process data or perform calculations. Our research encountered many algorithms; the most frequently used are Decision Tree, Random Forest, SVM, J48, and Linear Regression, which align with the machine learning model we propose to develop.

CRISP-DM: This iterative framework helps ensure that projects align with business objectives while effectively managing data throughout the process. We use the CRISP-DM model to analyze our proposed machine learning approach efficiently and accurately, and to make decision-making more data-driven, preventing risks or errors that may be encountered during the working phases of the proposed study.

1.8 Scope and Delimitation of the Study

This study focuses on enhancing the predictive accuracy of beneficiary selection for the TUPAD (Tulong Panghanapbuhay sa Ating Disadvantaged/Displaced Workers) program by employing advanced machine learning classification techniques using tools such as Weka and Akkiko Inc. The dataset utilized in this research will consist of 3,000 instances, sourced from verified government records and previous studies related to socioeconomic factors affecting eligibility for TUPAD assistance. The dataset will include crucial demographic information, such as age, employment status, education level, income, and geographic location. The treatment approach will emphasize retaining relevant features that significantly impact beneficiary selection while excluding irrelevant or redundant data points. This ensures that the model focuses on individuals who genuinely require assistance, thereby improving the accuracy of predictions.

For classification, this study will specifically utilize three algorithms: Random Forest, Naive Bayes, and J48 (a decision tree algorithm). These algorithms were chosen for their effectiveness in handling classification tasks and their ability to provide insights into feature importance. The study will also include system figures that illustrate the data processing workflow, model evaluation metrics, and comparative performance results of the selected algorithms.
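As an illustration of how the listed demographic fields might be encoded and how the chosen algorithms can expose feature importance, the sketch below uses a Random Forest in scikit-learn; the file name, column names, and qualified label are hypothetical placeholders rather than the actual 3,000-instance dataset.

    # Illustrative sketch: encoding the listed demographic fields and inspecting feature importance.
    # File name, column names, and label are hypothetical placeholders.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv("tupad_dataset.csv")                   # hypothetical beneficiary records
    features = ["age", "employment_status", "education_level", "income", "geographic_location"]
    X = pd.get_dummies(df[features])                        # one-hot encode categorical fields
    y = df["qualified"]

    rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
    importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
    print(importances.head(10))                             # most influential encoded features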

Limitations of the Study

The study has several limitations that could impact its findings and applicability. Firstly, it relies heavily on the availability and quality of data from verified sources; any gaps or inconsistencies in this data may adversely affect the performance of the model. Additionally, the complexity of social factors influencing eligibility may not be fully captured in the dataset, which could lead to biases in the predictions made by the model. While the focus on specific algorithms such as Random Forest, Naive Bayes, and J48 allows for a targeted analysis, it also means that other potentially valuable algorithms are excluded from consideration. Furthermore, although the developed model may perform well on training data, it could encounter challenges when generalizing to new or unseen data in real-world applications. Lastly, limitations in computational resources may restrict the ability to conduct extensive hyperparameter tuning or employ cross-validation techniques, potentially affecting the robustness of the model's performance.


CHAPTER 2

Data Case Analysis

The TUPAD program, run by the Department of Labor and Employment (DOLE), helps displaced, underemployed, and disadvantaged workers by offering short-term emergency jobs. The program has several strengths. It not only helps individuals by providing income and improving their skills, but it also benefits the community. TUPAD works on local projects that make communities safer and better places to live.

However, there are also some weaknesses. Beneficiaries often struggle to find jobs that match their skills, which can lead to financial problems since the program only offers temporary work. Also, there are not enough support services, like training programs, that could help beneficiaries find long-term work. From a management perspective, it is difficult to track and manage the large amount of data, making it harder to allocate resources effectively.

Despite these challenges, there are opportunities to improve the program. Our research will use machine learning to help select beneficiaries more accurately and efficiently. By improving the process of choosing who should receive support, we can reduce mistakes and ensure that help reaches those who need it most. Expanding the data and including more useful information can help the program make better decisions. There is also the chance to offer long-term solutions for beneficiaries, making the program's impact last longer than just temporary jobs. Collaborating with local governments, NGOs, and other groups can strengthen the program and make it more effective.
