
Students Performance: From Detection of Failures and Anomaly Cases to the Solutions-Based Mining Algorithms
Article in International Journal of Engineering Research and Technology · November 2020
DOI: 10.37624/IJERT/13.10.2020.2895-2908

International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 13, Number 10 (2020), pp. 2895-2908
© International Research Publication House. http://www.irphouse.com
Students Performance: From Detection of Failures and Anomaly Cases to the Solutions-Based Mining Algorithms
Ebtehal Ibrahim Al-Fairouz¹, Mohammed Abdullah Al-Hagery²
¹,² Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia.
¹ ORCID: 0000-0002-2667-5925, ² ORCID: 0000-0001-6939-013X
Abstract
Educational Data Mining (EDM) helps to recognise students' performance and to predict their academic achievement, including successes as well as failures, negative aspects and challenges. Educational systems have collected a massive amount of student data, which has become difficult for officials to search through by traditional methods to obtain the knowledge required to discover the challenges facing students and universities. The underlying problem, therefore, is how to explore these data and discover the real challenges facing both the students and the universities. The main aim of this research is to extract hidden, significant patterns and new insights from students' historical data that can help solve current problems, enhance the educational process and improve academic performance. The data mining tools used for this task are classification, regression, and association rules for frequent pattern generation. The research data sets were gathered from the College of Business and Economics (CBE). The findings of this research can help in making appropriate decisions for certain circumstances and provide better suggestions for overcoming students' weaknesses and failures. The findings revealed numerous important problems related to students' performance at different levels and in various courses. Consequently, suitable solutions are suggested, which can be presented to the relevant authorities to improve student performance and activate academic advising.

Keywords: Educational Data Mining, Students Failures, Student Performance, Academic Advising, Association Rules, Anomaly Detection.

1. INTRODUCTION
Educational institutions have information systems designed to provide the information necessary for management and for the educational development process. The Educational Information System (EIS) is a means of collecting, analysing, maintaining and distributing information and data, which supports decision making [1]. Data mining processes and tools can extract useful knowledge from these systems, which have accumulated educational data over several years.

Data mining is an essential step in what is referred to as Knowledge Discovery in Databases (KDD) [2]. It can be briefly defined as extracting useful features or unseen patterns from a large data set [3], [4]. The KDD process consists of several steps. The first involves gathering appropriate data from different sources. The second, data selection, determines which data is to be used. The third, data pre-processing, involves filling in missing values, removing outliers and resolving inconsistencies in the data. The fourth, data transformation, converts the data into a format that is appropriate for the mining process. The fifth, data mining, applies intelligent techniques to extract useful patterns. Finally, the results are evaluated, yielding the patterns that represent the discovered knowledge [2], [5].

Predicting student performance is a significant concern for educational institutions [6]. To this end, the field of education has adopted data mining techniques to detect and analyse student performance and to predict learning achievement. These techniques have proven capable of preventing failure and of focusing on poor performance in order to guide students and help them overcome difficulties. Student performance depends on many factors, such as social, economic and personal ones; knowledge derived from these factors can be used to assess the academic performance of students [7]. Other benefits include better evaluation of the institution, improvement of the education process, identification of future requirements, and better decision making [8]. The importance of this research comes from the promise this field offers in serving the educational process, the university and the student.

This study aims to discover new patterns and features in students' academic records. It contributes to predicting and improving academic performance by applying regression and classification techniques to the data of the last five years. Moreover, it identifies students' weaknesses and failures and explores knowledge that helps to improve the educational process. Furthermore, it uses association rules to find the reasons for students' repeated failure in particular courses, and it seeks to activate academic advising so that students can overcome or minimise their problems and failures. Finally, this research contributes to discovering anomalous values that may help meet the requirements for raising the quality of education.
The motivation for this research is to help the CBE find useful solutions that support quality in the educational process, both by searching for the reasons behind some students' weak level and low academic achievement and by identifying outstanding students in its various departments, so that others can benefit from their experience in achieving high academic performance.

2. RELATED WORK
Research in EDM is an interesting domain for academics and researchers, especially in educational institutions. Research in this area generates useful knowledge related to students, instructors, courses and the educational management system as a whole. Since the knowledge contained in data collected by educational systems is a veritable gold mine, it is important to use it to make accurate decisions, as this helps to raise the level of the educational process, increase the quality of the educational institution and reduce failure.

Data mining can be used in education to better understand the learning process and to acquire practical knowledge. This, in turn, helps to identify problems facing students and to reduce failure in academic performance [9]. Data mining in the educational area is called Educational Data Mining (EDM). It has contributed significantly to measuring student academic performance, preventing dropouts and better understanding failure [7]. EDM is a research field that assists in discovering ways to enhance the quality of education [10], [11]. It is a computer-based method that helps discover new patterns in the data sets of educational institutions and represents one particular field of data mining [8].

EDM serves various groups of users, including the educational institution's administrators, teaching staff, students, curriculum developers, and planners [10], [12]. Since 1993, many research works have employed EDM, and the number of these studies has grown significantly since then [13], [14]. These works have focused on extracting knowledge from student data, predicting performance, evaluating student performance in specific courses, or finding associations between courses using various data mining techniques [15].

Some related works obtained their data from the learning management system (LMS) known as Kalboard 360 [16]–[18], whereas many studies relied on the analysis of real data from different institutional environments, such as colleges, universities or schools, using common classification methods; for example, data sets were collected from the College of Computer Applications in India and from the National Defense University in Malaysia. Some of these datasets were not large enough [19]–[23]. Additionally, some previous works used only a limited set of methods, such as classification and regression [24], [25].

One of the common methods employed in this field is the decision tree; using a decision tree model as a classifier or predictor on students' academic data can help to analyse the data, study student performance and discover their achievements [8], [26]. Besides, data mining tools can constitute a practical guide for decision-makers and teachers in higher education institutions to identify hidden problems related to student success and failure [27]. Furthermore, classification techniques are useful for predicting a student's career [28].

Association rule algorithms are used extensively in EDM studies alongside other algorithms. The benefit of association rule extraction is finding frequent patterns in databases and exploring the relationships between the various attributes that affect students' academic achievement [29], [30]. Association rules can also reveal useful information from students' behavioural data: they yield frequent behaviour patterns that have a significant impact on student performance, and they allow students' failure cases to be identified. This may help educational institutions understand and improve students' behaviour and make appropriate decisions; the association rule method also offers insight into improving admissions planning [14], [28], [31].

In this paper, we selected the most significant tools to analyse the historical dataset of CBE students, to identify aspects of student failure and success, and to predict academic performance. These tools include classification, regression and outlier analysis, where outlier analysis represents the anomalous cases. We also use association rules to discover students' achievements and to find the reasons behind some students' failure. In addition, this paper contributes to anomaly detection, which may reveal distinctive cases in the college and help in making appropriate decisions in the interest of those students.

3. METHODOLOGY
The proposed method uses several techniques to analyse the performance of CBE students. The overall architecture of the proposed method is shown in Figure 1. In this study, we used the Orange data mining platform, open-source software for data mining and machine learning [32]. The data mining techniques include Linear Regression, Association Rules, Decision Tree, Naive Bayes, and Random Forest. The classification and regression techniques were used to predict students' performance, whereas the association rules technique was used to detect frequent items among students' records in order to understand the reasons for their failure.
Fig 1. Methodology framework

3.1 Data Collection
This study was conducted on the five-year (2014–2018) data set of undergraduate students enrolled in the CBE. The data set contains male and female students' data from the different departments, namely Management Information Systems (MIS), Finance, Accounting, Economics, Business Administration (BA), and Pre-Major. It contains 72,259 records and 14 attributes. The attributes used in this study are described in Table 1.

Table 1: Data set information
# | Attribute Name | Description
1 | SEMESTER | The semester code, such as 382, 391, etc. In 382, (38) is the year 1438 in the Hijri calendar and (2) is the second semester of that year.
2 | COURSE_CODE | The code of the course.
3 | COURSE_NAME | The full name of the course.
4 | CRD_HRS | The credit hours per semester.
5 | STUDENT_ID | A unique number for each student.
6 | GENDER_NAME | Female, Male.
7 | ENTRY_DATE | Date of adding the course to the student schedule.
8 | CONFIRMED_MARK | Student mark out of 100 in every course.
9 | GRADE_DESC | A+, A, B+, B, etc.
10 | CUM_GPA | Cumulative Grade Point Average (CGPA) out of 5.0.
11 | SEMESTER_GPA | Semester Grade Point Average (SGPA) out of 5.0.
12 | STSTATUS_DESC | Active, graduate, dropped out, etc.
13 | MAJOR_NAME | MIS, Accounting, Finance, Economics, BA, or Pre-Major.
14 | STUDENT_LEVEL | Actual level of the student, such as first level, second level, etc.

3.2 Data Pre-processing
Real-world data tends to be noisy, incomplete, and inconsistent. For this reason, it is best practice to apply data pre-processing before data mining techniques, to ensure error-free, high-quality data. The data pre-processing steps are shown in Figure 2.

Fig 2. Data Pre-processing

3.2.1 Data Cleaning
(1) Remove missing values
In the first step, we used the Orange platform to clean the data and remove missing values: records containing empty values were deleted entirely. After these records were deleted, the data set was reduced to 52,430 records.
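The authors performed this step in Orange. As a minimal stand-alone sketch of the same rule (delete any record that has an empty field), the plain-Python code below assumes a hypothetical layout of one dictionary per record keyed by the Table 1 attribute names; the helper names and the sample rows are invented for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: one dict per row, keyed by Table 1 attribute names.
Record = Dict[str, Optional[str]]


def is_missing(value: Optional[str]) -> bool:
    """Treat None and blank strings as missing values."""
    return value is None or value.strip() == ""


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Listwise deletion: keep only records in which every attribute has a value."""
    return [r for r in records if not any(is_missing(v) for v in r.values())]


# Invented sample rows for illustration only.
sample: List[Record] = [
    {"STUDENT_ID": "1001", "SEMESTER": "382", "CUM_GPA": "3.90", "CONFIRMED_MARK": "85"},
    {"STUDENT_ID": "1002", "SEMESTER": "382", "CUM_GPA": "", "CONFIRMED_MARK": "71"},
    {"STUDENT_ID": "1003", "SEMESTER": None, "CUM_GPA": "2.40", "CONFIRMED_MARK": "58"},
]

clean = drop_incomplete(sample)
print(len(sample), "->", len(clean))  # 3 -> 1
```

Deleting every incomplete record is the simplest policy, but here it removed about 27% of the rows (72,259 to 52,430); imputing missing values is the usual alternative when that loss matters.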
(2) Resolve inconsistencies
Inconsistent data contains discrepancies in names or values. This step was carried out in Microsoft Excel by checking the data set, and it served as a means of avoiding future errors and conflicts.

(3) Detect outliers and anomalies
Outliers can arise from incorrect value entry, sampling error, or exceptional but true values. We checked the outlier values to identify them and to make sure they were not incorrect values. In this third step of data pre-processing, we revealed the outliers using the Outliers widget in the Orange platform. The widget revealed 525 outlier cases. Figure 3 displays the outlier detection in a scatter plot.

Fig 3. Scatter plot detecting outliers/anomaly cases

3.2.2 Data Transformation
Following data cleaning, we used data transformation to provide more effective results. Some of the proposed algorithms require the GPA to be categorised because they cannot handle continuous numerical values, so our study classified the GPA into five categories.

The first new feature, called "Class_GPA", was constructed from the CUM_GPA attribute and is used to split students by their CGPA into a multiclass target. The CGPA was classified into five categories, as shown in Table 2.

Table 2: CUM_GPA classification
# | CGPA | Class
1 | >=4.5 | Excellent
2 | >=3.75 | Very Good
3 | >=2.75 | Good
4 | >=2.0 | Acceptable
5 | <2.0 | Fail

The second new feature, called "Class_Semesters", groups the semesters into years using the SEMESTER attribute, as shown in Table 3.

Table 3: SEMESTER classification
# | Semester | Class
1 | 342-351 (2014) | First Year
2 | 352-361 (2015) | Second Year
3 | 362-371 (2016) | Third Year
4 | 372-381 (2017) | Fourth Year
5 | 382-391 (2018) | Fifth Year

The third new feature, called "Class_Marks", groups students' marks into two sub-groups using the CONFIRMED_MARK attribute, as shown in Table 4.

Table 4: CONFIRMED_MARK classification
# | CONFIRMED_MARK | Class
1 | >=60 | P (Pass)
2 | <60 | F (Fail)

3.3 Application of Data Mining Techniques
(1) Classification Methods
Decision Tree (DT) is a decision-support tool that uses a tree-shaped flow chart containing a set of rules that can be expressed in "IF-THEN" form [2], [4], [33].

Random Forest (RF), an ensemble method, is normally used to improve accuracy [34]. The principle of ensemble methods is that weak classifiers can be combined to form a strong ensemble classifier. An RF is a collection of DTs (weak classifiers) whose outputs are combined, by averaging or majority voting, to predict the final result [16], [35], [36].

Naïve Bayes (NB) is a simple probabilistic classification technique based on Bayes' theorem. It is called naïve because it assumes that all attributes are conditionally independent of each other given the class [24], [37]. The algorithm is fast because it requires small amounts of training data and less computation than other algorithms [2].

Model performance is measured using the confusion matrix, a table whose numbers of rows and columns equal the number of classes. It displays the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Several measures can be derived from the confusion matrix to evaluate the
performance of models. In this study, we focus on Classification Accuracy (CA), Precision, F1-score and Recall, as given in Equations (1) to (4) [2].

$$\mathrm{CA} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{F1-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$

(2) Regression Methods
Linear Regression (LR) is a predictive model used to predict the value of a dependent variable (y) from the value of an independent variable (x) [10], [38]. LR can produce accurate predictions and is considered one of the easiest techniques to apply. In the LR model, two-dimensional data are represented as points lying along a straight line, where the x-axis is the predictor and the y-axis is the target [39]. The performance of a regression model is evaluated using four of the most popular metrics: Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R-squared) [40], given in Equations (5) to (8), where $n$ is the total number of observations (rows), $y_i$ is the actual value, $\hat{y}_i$ is the predicted (estimated) value, $\bar{y}$ is the mean of the actual values $y_i$, and $i$ ranges from 1 to $n$.

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{5}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{6}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{7}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{8}$$

(3) Association Rules
Finding meaningful rules in student data requires association rules, which help to extract frequent patterns from the data. Support and confidence measures are used to identify the relationships between items in transactions. Support is the probability that a transaction contains both itemsets A and B. Confidence, in contrast, measures the certainty of the discovered association: it is the probability that a transaction containing A also contains B [2]. The user sets minimum support and confidence thresholds for producing association rules: itemsets whose support is below the predefined minimum are not accepted as frequent itemsets, and rules whose support or confidence is below the minimum are rejected [30], [41], [42]. Support and confidence are given in Equations (9) and (10), respectively, where A and B are itemsets, freq(·) is the number of transactions containing the given items, N is the total number of transactions, and P denotes probability (P(A ∪ B) is the probability that a transaction contains every item of A ∪ B) [2].

$$\mathrm{Support}(A \Rightarrow B) = P(A \cup B) = \frac{\mathrm{freq}(A,B)}{N} \tag{9}$$

$$\mathrm{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{freq}(A,B)}{\mathrm{freq}(A)} \tag{10}$$

4. EXPERIMENTS AND ANALYSIS
4.1 The General Analysis of Student Performance
Students' performance was analysed in the Orange platform. We used the Distribution widget to show the values of Class_GPA over the five years of study, and we compared the performance of students across the five years to determine the likelihood of failure and excellence. Table 5 shows the probability of failure and excellence for each year and the total number of student records in each year. It also displays the percentage of students who excel and fail.

Next, we compared the failure and excellence rates with the records divided into ten semesters. The goal was to find the semesters with large numbers of failing and excelling students. Table 6 compares the numbers and rates of excellence and failure across the ten semesters.

Students' GPA was also analysed by MAJOR_NAME to determine which majors include the largest numbers of excellent and failed students. Table 7 shows data on excellent and failed students by major. Since the number of students influences the failure and excellence rates, the table also gives the total number of records in each major.

Next, the data were analysed by gender to identify which gender more often fails to achieve a high CGPA. Table 8 presents data on failed and excellent students by gender: there were 2,289 failed male student records, compared with 798 failed female student records. The table also shows the probability of failure and excellence in student records. It shows clearly that female students earned a higher percentage of distinctions.
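The year, major and gender breakdowns in this section rest on the derived features of Section 3.2.2. The plain-Python sketch below shows one way to express those mappings; the thresholds and year groups come from Tables 2 to 4 (with the ten semester codes of Table 6), while the function names and the sample rows are hypothetical and not the authors' Orange workflow.

```python
from collections import Counter
from typing import Iterable, Tuple


def class_gpa(cgpa: float) -> str:
    """Class_GPA: map CUM_GPA (out of 5.0) to the categories of Table 2."""
    if cgpa >= 4.5:
        return "Excellent"
    if cgpa >= 3.75:
        return "Very Good"
    if cgpa >= 2.75:
        return "Good"
    if cgpa >= 2.0:
        return "Acceptable"
    return "Fail"


def class_marks(mark: float) -> str:
    """Class_Marks: map CONFIRMED_MARK (out of 100) to Pass/Fail as in Table 4."""
    return "P" if mark >= 60 else "F"


# Class_Semesters: the ten semester codes of Table 6 grouped into years (Table 3).
SEMESTER_YEAR = {
    "342": "First Year", "351": "First Year",
    "352": "Second Year", "361": "Second Year",
    "362": "Third Year", "371": "Third Year",
    "372": "Fourth Year", "381": "Fourth Year",
    "382": "Fifth Year", "391": "Fifth Year",
}


def counts_by_year(rows: Iterable[Tuple[str, float]]) -> Counter:
    """Count (year, Class_GPA) pairs from hypothetical (SEMESTER, CUM_GPA) rows."""
    return Counter((SEMESTER_YEAR[sem], class_gpa(gpa)) for sem, gpa in rows)


# Invented (SEMESTER, CUM_GPA) rows for illustration only.
rows = [("372", 4.70), ("381", 1.80), ("381", 3.10), ("391", 4.60)]
table = counts_by_year(rows)
print(table[("Fourth Year", "Excellent")], table[("Fourth Year", "Fail")])  # 1 1
```

Dividing a year's excellent or failed count by that year's total record count gives the probability columns of Table 5, while dividing it by the number of excellent or failed records across all years gives the percentage columns (for example, 682 / 3,958 = 17.23% for first-year excellence).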
Table 5: Students who excel and fail, by year
Year | Total No. of Student Records | No. of Excellent Students' Records | No. of Failed Students' Records | % of Excellent Students' Records | % of Failed Students' Records | Probability of Excellent Students | Probability of Failed Students
First Year | 9648 | 682 | 673 | 17.23% | 21.80% | 0.071 ± 0.005 | 0.070 ± 0.005
Second Year | 10697 | 731 | 509 | 18.47% | 16.49% | 0.068 ± 0.005 | 0.048 ± 0.004
Third Year | 8104 | 478 | 425 | 12.08% | 13.77% | 0.059 ± 0.005 | 0.052 ± 0.005
Fourth Year | 13737 | 1105 | 783 | 27.92% | 25.36% | 0.080 ± 0.005 | 0.057 ± 0.004
Fifth Year | 10221 | 962 | 697 | 24.31% | 22.58% | 0.094 ± 0.006 | 0.068 ± 0.005
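The paper does not state how the ± margins in Tables 5, 7 and 8 were computed. They agree, to the three decimals printed, with a 95% normal-approximation (Wald) interval for a proportion, p ± 1.96·√(p(1 − p)/n), where p is the count divided by the row total n. The sketch below checks that assumption against Table 5; the function and variable names are illustrative only.

```python
import math
from typing import Tuple

Z_95 = 1.96  # two-sided 95% standard-normal quantile (assumed confidence level)


def wald_interval(count: int, total: int, z: float = Z_95) -> Tuple[float, float]:
    """Return (proportion, half-width) of a normal-approximation interval."""
    p = count / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, half_width


# (year, total records, excellent records, failed records) copied from Table 5.
TABLE_5 = [
    ("First Year", 9648, 682, 673),
    ("Second Year", 10697, 731, 509),
    ("Third Year", 8104, 478, 425),
    ("Fourth Year", 13737, 1105, 783),
    ("Fifth Year", 10221, 962, 697),
]

for year, total, excellent, failed in TABLE_5:
    p_exc, m_exc = wald_interval(excellent, total)
    p_fail, m_fail = wald_interval(failed, total)
    print(f"{year}: {p_exc:.3f} ± {m_exc:.3f}   {p_fail:.3f} ± {m_fail:.3f}")
```

If this is indeed how the margins were obtained, the intervals for the fourth and fifth years overlap only slightly, which is worth keeping in mind when reading small year-to-year differences.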
Table 6: Students who excel and fail, by semester
Semester | Total No. of Students' Records | No. of Excellent Students' Records | No. of Failed Students' Records | % of Excellent Students' Records | % of Failed Students' Records
342 | 4845 | 291 | 384 | 7.35% | 12.44%
351 | 4803 | 391 | 289 | 9.88% | 9.36%
352 | 5242 | 391 | 252 | 9.88% | 8.16%
361 | 5455 | 340 | 257 | 8.59% | 8.33%
362 | 5288 | 295 | 271 | 7.45% | 8.78%
371 | 2816 | 183 | 154 | 4.62% | 4.99%
372 | 6630 | 551 | 243 | 13.92% | 7.87%
381 | 7107 | 554 | 540 | 14.00% | 17.49%
382 | 7363 | 774 | 496 | 19.56% | 16.07%
391 | 2858 | 188 | 201 | 4.75% | 6.51%
Table 7: Students who excel and fail, by majors
Major Name | Total No. of Students' Records | No. of Failed Students' Records | No. of Excellent Students' Records | % of Failed Students' Records | % of Excellent Students' Records | Probability of Failed Students' Records | Probability of Excellent Students' Records
MIS | 6731 | 60 | 640 | 1.94% | 16.17% | 0.009 ± 0.002 | 0.095 ± 0.007
Finance | 8031 | 76 | 498 | 2.46% | 12.58% | 0.009 ± 0.002 | 0.062 ± 0.005
Pre-Major | 5691 | 2313 | 235 | 74.93% | 5.94% | 0.406 ± 0.013 | 0.041 ± 0.005
Accounting | 11081 | 159 | 1202 | 5.15% | 30.37% | 0.014 ± 0.002 | 0.108 ± 0.006
Economics | 3440 | 0 | 695 | 0% | 17.56% | - | 0.202 ± 0.013
BA | 17433 | 479 | 688 | 15.52% | 17.38% | 0.027 ± 0.002 | 0.039 ± 0.003
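Equations (9) and (10) can be checked against these counts. Assuming, as the figures suggest, that the failed records here are those with Class_GPA = Fail and that rule mining ran on the same 52,407 records covered by this table, rules 8 and 9 of Table 11 in Section 4.2.3 follow directly from the Pre-Major row. The dictionary layout and variable names below are illustrative only.

```python
from typing import Dict, Tuple

# Per-major counts from Table 7: (total records, records with Class_GPA = Fail).
TABLE_7: Dict[str, Tuple[int, int]] = {
    "MIS": (6731, 60),
    "Finance": (8031, 76),
    "Pre-Major": (5691, 2313),
    "Accounting": (11081, 159),
    "Economics": (3440, 0),
    "BA": (17433, 479),
}
MIN_SUPPORT = 0.02      # thresholds stated in Section 4.2.3
MIN_CONFIDENCE = 0.30


def support(freq_ab: int, n: int) -> float:
    """Equation (9): share of all records that contain both A and B."""
    return freq_ab / n


def confidence(freq_ab: int, freq_a: int) -> float:
    """Equation (10): share of records containing A that also contain B."""
    return freq_ab / freq_a


n_records = sum(total for total, _ in TABLE_7.values())   # 52,407 records
n_fail = sum(failed for _, failed in TABLE_7.values())    # 3,087 Fail records
pre_total, pre_fail = TABLE_7["Pre-Major"]

rules = {
    "MAJOR_NAME=Pre-Major => Class_GPA=Fail": (
        support(pre_fail, n_records), confidence(pre_fail, pre_total)),
    "Class_GPA=Fail => MAJOR_NAME=Pre-Major": (
        support(pre_fail, n_records), confidence(pre_fail, n_fail)),
}

for rule, (sup, conf) in rules.items():
    kept = sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE
    print(f"{rule}: support {sup:.1%}, confidence {conf:.1%}, kept={kept}")
```

Both directions share the same support (4.4%) but differ in confidence (40.6% versus 74.9%), because the denominator of Equation (10) counts only the antecedent.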
Table 8: Students who excel and fail, by gender
Gender Name | Total No. of Students' Records | No. of Excellent Students' Records | No. of Failed Students' Records | % of Excellent Students' Records | % of Failed Students' Records | Probability of Excellent Students' Records | Probability of Failed Students' Records
Female | 27177 | 3380 | 798 | 85.40% | 25.85% | 0.124 ± 0.004 | 0.029 ± 0.002
Male | 25230 | 578 | 2289 | 14.60% | 74.15% | 0.023 ± 0.002 | 0.091 ± 0.004
4.2 The Experimental Results of Data Mining Methods
4.2.1 Experimental Results of Classification
The data set was divided into 75% training data and 25% test data, with Class_GPA as the target variable. Table 9 presents the evaluation results of DT, RF, and NB in terms of CA, F1-score, Precision, and Recall. Figure 4 shows the evaluation results of the three models.

Table 9: The evaluation results of the prediction
Model | CA | F1-score | Precision | Recall
RF | 0.713 | 0.712 | 0.715 | 0.713
DT | 0.698 | 0.697 | 0.699 | 0.698
NB | 0.594 | 0.595 | 0.605 | 0.594

Fig. 4. Results evaluation of the three models

4.2.2 Experimental Results of Regression
Initially, the correlation between CUM_GPA and SEMESTER_GPA was examined and drawn on a scatter plot to establish the relationship between the two variables. Figure 5 shows CUM_GPA as the dependent variable and SEMESTER_GPA as the independent variable. As the figure shows, the correlation coefficient is r = 0.88, which indicates a strong positive relationship: the points follow a straight line and the two variables increase together.

Fig. 5. Scatter plot of CUM_GPA and SEMESTER_GPA

Two regression models were used to predict student performance: LR and DT. The target variable was the CUM_GPA attribute, and the predictor was SEMESTER_GPA, an attribute that has a meaningful impact on CGPA. Figure 6 visualises the error rates of the models using various measures, and Table 10 presents the evaluation results of the regression models.

Table 10: Regression models results' evaluation
Model | MSE | RMSE | MAE | R-squared
LR | 0.154 | 0.393 | 0.315 | 0.773
DT | 0.135 | 0.368 | 0.288 | 0.801
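Tables 9 and 10 were produced with Orange. As a hedged illustration of Equations (1) to (8), the plain-Python sketch below computes the same measures for invented predictions. For a five-class target such as Class_GPA, Equations (2) to (4) are applied to each class in turn and then averaged; the sketch assumes a support-weighted average, since the paper does not name its averaging scheme. Under that assumption the weighted Recall always equals CA, which is consistent with the identical CA and Recall columns of Table 9. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def classification_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """CA (Eq. 1) plus support-weighted Precision, Recall and F1-score (Eqs. 2-4)."""
    n = len(y_true)
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    scores = {"CA": sum(t == p for t, p in pairs) / n,
              "Precision": 0.0, "Recall": 0.0, "F1-score": 0.0}
    for c in set(y_true) | set(y_pred):
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[c] / n  # share of class c among the true labels
        scores["Precision"] += weight * prec
        scores["Recall"] += weight * rec
        scores["F1-score"] += weight * f1
    return scores


def regression_scores(y: Sequence[float], y_hat: Sequence[float]) -> Dict[str, float]:
    """MSE, RMSE, MAE and R-squared following Equations (5)-(8)."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - y_bar) ** 2 for a in y)
    return {"MSE": sse / n,
            "RMSE": math.sqrt(sse / n),
            "MAE": sum(abs(a - b) for a, b in zip(y, y_hat)) / n,
            "R-squared": 1 - sse / sst}


# Invented predictions for illustration only.
true_cls = ["Good", "Good", "Fail", "Excellent", "Acceptable", "Good"]
pred_cls = ["Good", "Acceptable", "Fail", "Good", "Acceptable", "Good"]
cls = classification_scores(true_cls, pred_cls)
reg = regression_scores([3.0, 4.0, 2.5, 4.5], [3.2, 3.8, 2.9, 4.4])
print({k: round(v, 3) for k, v in cls.items()})
print({k: round(v, 3) for k, v in reg.items()})
```

A macro average, which weights all five classes equally, would no longer force Recall to equal CA, so the choice of averaging should be reported alongside Table 9.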
Fig. 6. Error rate of the models

4.2.3 Experimental Results of Association Rules
Finding strong associations between items in a multidimensional data set is not always easy because of the variance of the data. For this reason, the associations were examined in three approaches, each selecting two attributes. All generated rules must meet both the minimum support value and the minimum confidence value [41]. We set the minimum support to 2% and the minimum confidence to 30% for all approaches.

The first approach examines the impact of the major on the CGPA; thirteen rules were extracted based on the two attributes MAJOR_NAME and Class_GPA, as shown in Table 11.

Table 11: The first approach of Association Rules
Rule # | If (Antecedent) | Then (Consequent) | Support | Confidence
1 | MAJOR_NAME=BA | Class_GPA=Good | 14.4% | 43.4%
2 | Class_GPA=Good | MAJOR_NAME=BA | 14.4% | 36.1%
3 | MAJOR_NAME=BA | Class_GPA=Acceptable | 12.8% | 38.5%
4 | Class_GPA=Acceptable | MAJOR_NAME=BA | 12.8% | 46.6%
5 | MAJOR_NAME=Accounting | Class_GPA=Good | 8.5% | 40.2%
6 | MAJOR_NAME=Finance | Class_GPA=Good | 7.1% | 46.1%
7 | MAJOR_NAME=MIS | Class_GPA=Good | 5.9% | 46.1%
8 | MAJOR_NAME=Pre-Major | Class_GPA=Fail | 4.4% | 40.6%
9 | Class_GPA=Fail | MAJOR_NAME=Pre-Major | 4.4% | 74.9%
10 | MAJOR_NAME=Pre-Major | Class_GPA=Acceptable | 3.4% | 31.5%
11 | MAJOR_NAME=Economics | Class_GPA=Very_Good | 2.4% | 36.3%
12 | Class_GPA=Excellent | MAJOR_NAME=Accounting | 2.3% | 30.4%
13 | MAJOR_NAME=Economics | Class_GPA=Good | 2.2% | 32.8%

The second approach examines the impact of the major on students' marks; ten rules were extracted based on the two attributes MAJOR_NAME and Class_Marks, as shown in Table 12.

Table 12: The second approach of Association Rules
Rule # | If (Antecedent) | Then (Consequent) | Support | Confidence
1 | MAJOR_NAME=BA | Class_Marks=P | 29.5% | 88.8%
2 | Class_Marks=P | MAJOR_NAME=BA | 29.5% | 33.4%
3 | MAJOR_NAME=Accounting | Class_Marks=P | 19.3% | 91.2%
4 | MAJOR_NAME=Finance | Class_Marks=P | 14.3% | 93%
5 | MAJOR_NAME=MIS | Class_Marks=P | 11.8% | 92%
6 | MAJOR_NAME=Pre-Major | Class_Marks=P | 7.1% | 65.7%
7 | MAJOR_NAME=Economics | Class_Marks=P | 6.4% | 97.8%
8 | Class_Marks=F | MAJOR_NAME=BA | 3.7% | 32.3%
9 | MAJOR_NAME=Pre-Major | Class_Marks=F | 3.7% | 34.3%
10 | Class_Marks=F | MAJOR_NAME=Pre-Major | 3.7% | 32.2%

The third approach examines the impact of courses on students' marks; seven rules were extracted based on the two attributes COURSE_NAME and Class_Marks, as presented in Table 13.

Table 13: The third approach of Association Rules
Rule # | If (Antecedent) | Then (Consequent) | Support | Confidence
1 | COURSE_NAME=Feasibility analysis of projects | Class_Marks=P | 3.3% | 99.4%
2 | COURSE_NAME=Operations Management | Class_Marks=P | 3% | 79.6%
3 | COURSE_NAME=Introduction to management information systems | Class_Marks=P | 2.9% | 92.6%
4 | COURSE_NAME=Organizational Behavior | Class_Marks=P | 2.8% | 95.2%
5 | COURSE_NAME=Strategic Management | Class_Marks=P | 2.8% | 98%
6 | COURSE_NAME=Saudi Commercial Law | Class_Marks=P | 2.5% | 98.2%
7 | COURSE_NAME=Principles of Management Accounting | Class_Marks=P | 2.1% | 72%

4.2.4 Experimental Results of Anomalies' Analysis
In EDM, anomaly detection is used not only to find students with academic problems and poor performance but also to discover students who excel academically. The detection of outliers also helps the educational institution make effective decisions that help students avoid making wrong decisions. In our work, outliers were discovered in the CBE students' data through outlier analysis: about 525 anomalies and 51,882 inliers were obtained after applying the outlier detection method. This analysis helps to detect anomalies that may be distinctive and useful to the CBE. It also helps in discovering cases that may turn into problems to be avoided, and it supports decisions such as finding solutions that would help improve the performance of students with low GPAs.

5. FINDING AND DISCUSSION
The following are the results obtained in pursuit of the study's goals. Comparing the five years, the analysis of student academic records showed that the highest excellence and failure rates occurred in the students' fourth year. In contrast, the lowest excellence and failure rates occurred in the students' third year. It is important to note that the number of student records in the fourth year is higher than in the other years, possibly owing to an increase in the admission rate in that year (2017). Furthermore, comparing the fourth and fifth years for excellent students, we found that the probability of excelling in the fifth year lies between 0.088 and 0.100, which is slightly higher than in the fourth year, by 0.013 to 0.015. Comparing the fourth and fifth years for failed students, we noted that the probability of failure in the fifth year lies between 0.063 and 0.073, a slight increase of 0.010 to 0.012 compared to the fourth year. Comparing the ten semesters, the highest failure rate, 17.49%, was observed in semester 381 (2017), and the lowest, 4.99%, in semester 371 (2016). As for students who excelled, the highest rate of excellence, 19.56%, was in semester 382 (2018), and the lowest, 4.62%, was in semester 371 (2016).

Consequently, these results support an important observation: the number of students in the fourth year has a strong impact on the percentage of excellence and failure in that year, which reached 27.92% and 25.36%, respectively.

Further, the students' GPA was analysed by major. Through this analysis, we noted that Accounting students outperformed all other students in the excellence class, whereas Pre-Major students had the highest share of the failure class: 74.93% of the failed records, with a probability of failure estimated to be between 0.396 and 0.419. Since Pre-Major contains general study courses from various majors, we found that most students fail in some courses, especially in the first three semesters of study at the college. In contrast, the probability of excellence in the Economics department is the highest among the majors, estimated to be between 18.9% and 21.5%. This may be due to one of the following reasons. First, the failure rate in the Economics major is 0%, which, in our opinion, is likely to increase excellence. Second, we think this major may be easier, as it depends more on theoretical than on practical courses.

Therefore, these results can help decision-makers find alternative methodological plans for difficult specialisations that have a high failure rate and develop study plans that contribute to raising students' academic achievement. Moreover, the performance of students was analysed by gender, and we observed that female students outperformed male students: the failed male students' records exceeded the failed female students' records by 1,491. The probability of failure in male students' records was between 0.086 and 0.095, whereas in female students' records it was between 0.027 and 0.031. Furthermore, the highest percentage of distinction was in the records of female students, whose share of excellence was 85.4%, compared with 14.6% for male students. We also noted that
2903
the probability of a high GPA in female students' records was between 0.120 and 0.128, whereas in male students' records it was between 0.021 and 0.025. Overall, the results indicate that female students outperformed male students and are less likely to fail. For male students, the probability of obtaining a failing GPA is about 7 percentage points higher than the probability of excelling, whereas for female students, the probability of excelling is about 9.7 percentage points higher than the probability of failing.

Accordingly, these results suggest that female students are more diligent in obtaining high grades and avoiding failure. They can help the college investigate the reasons behind male students' failure, support students through seminars aimed at raising their academic performance, and take the decisions needed to reduce this failure in the coming years.

On the other hand, the evaluation of the classification methods showed that RF achieved the highest scores: 71.3% on CA and Recall, 71.5% on Precision and 71.2% on F1-score. DT came next, with 69.7% on F1-score and 69.8% on CA and Recall; its Precision was 0.1 percentage points higher, at 69.9%. NB performed worst, with 59.4% on CA and Recall, 59.5% on F1-score and 60.5% on Precision. We conclude from these findings that the RF algorithm performs best on this type of data set. One point to take into account is that RF, as an ensemble learning method, suits our data set, which is structured data. The basic principle of RF is that a group of weak learners can be combined to form a strong collective learner, and this principle helped to obtain an adequate evaluation in classifying student performance. Furthermore, DT scored 1.5 percentage points lower than RF on CA, which indicates that RF produces more accurate results than DT, which builds its model as IF-THEN rules [2]. Accordingly, we concluded from this assessment that a rule-based classifier is well suited to the data set used in this study.

Finally, RF outperformed the other algorithms on all evaluation measures. It can therefore be used to meet the university's requirements for achieving quality and identifying weak students, as well as students who show excellent and exceptional capabilities.

As for the evaluation of the regression models, the mean squared error (MSE) was 0.154 for LR and 0.135 for DT. The root mean squared error (RMSE), the square root of the MSE, was 0.393 for LR and 0.368 for DT. The mean absolute error (MAE), the average of the absolute differences between the predicted and actual values, was 0.316 for LR and 0.288 for DT. Moreover, the coefficient of determination (R-squared), the proportion of the variance of the dependent variable explained by the independent variables, was 77.3% for LR, meaning that the model explains 77.3% of the variability in CUM_GPA (the target variable), whereas for DT it was 80.1%, meaning that the model explains 80.1% of the variability in CUM_GPA. Overall, the DT model performed better than LR, with lower values on all three error measures and a higher R-squared.

The association rules were analysed using three approaches. The first approach examines the impact of the major on the CGPA. The first and third rules show that BA students are most likely to obtain a good GPA (43.4% confidence) or an acceptable GPA (38.5% confidence). Conversely, the second and fourth rules show that students who obtain a good GPA (36.1% confidence) or an acceptable GPA (46.6% confidence) most likely belong to BA.

According to the fifth, sixth and seventh rules, Accounting, Finance and MIS students are likely to obtain a good GPA, with 40.2%, 46.1% and 46.1% confidence, respectively. The eighth and ninth rules show that Pre-Major students often obtain a failing GPA (40.6% confidence) and that failed students most probably belong to the Pre-Major (74.9% confidence). The 10th rule states that Pre-Major students may obtain an acceptable GPA, with 31.5% confidence.

For Economics students, the 11th and 13th rules show that they are likely to have a very good GPA (36.3% confidence) or a good GPA (32.8% confidence). The 12th rule has the lowest confidence, 30.4%; it states that if the GPA class is excellent, then the major is Accounting, indicating that the largest group of excelling students belongs to the Accounting major.

Across these thirteen rules, the highest confidence, 74.9%, is for the rule stating that failed students belong to the Pre-Major, which shows that failures occur most often in the Pre-Major. As mentioned previously, the Pre-Major is taken before specialisation and comprises courses from all majors. We surmise that its students often fail because some of their courses are from disciplines they do not like.

The rule with the next-highest confidence, 46.6%, states that if the GPA class is "acceptable," then the major is BA. The BA is dominated by acceptable GPAs, and it is the most popular specialisation in the CBE, with 17,433 records. This may indicate that many students choose this specialisation because of a widespread belief that its courses are easy. It may also be due to the popularity of this major, which provides graduates with jobs at many companies and organisations.

The second approach examines the impact of the major on the students' marks. The rule with the highest confidence, 97.8%, is the seventh rule, which shows that if the major is Economics, the mark is likely to be "P", meaning that Economics students are likely to pass their courses. It is followed by the fourth rule, with 93% confidence, which shows that Finance students are likely to pass their courses. The fifth rule, with 92% confidence, indicates that if the major is Management Information Systems, the marks will constitute a pass. The next
rule is the third rule, with 91.2% confidence, denoting that Accounting students are also likely to pass their courses. The last high-confidence rule is the first rule, at 88.8%, which shows that BA students are most likely to obtain a pass mark. On the other hand, according to the sixth rule, Pre-Major students pass their courses with a confidence of 65.7%, whereas the ninth rule states that Pre-Major students fail a course with a confidence of 34.3%. The second rule states that if the mark class is "P," the major is likely to be BA, with 33.4% confidence, whereas the eighth and 10th rules state that if the mark class is "F," the major is BA or Pre-Major, with low confidence of 32.3% and 32.2%, respectively. Finally, we conclude that students of Economics, Finance, MIS and Accounting are likely to pass their courses, with high confidence of over 90%. This suggests that excellent and interested students tend to belong to these majors. Furthermore, students tend to give their best when they belong to the fields they prefer.

The third approach examines the impact of courses on the students' marks. According to the first rule, a student registered in the course "Feasibility analysis of projects" is most likely to obtain a pass mark, with 99.4% confidence. The fifth and sixth rules, with 98.2% and 98% confidence, show that students taking "Saudi Commercial Law" and "Strategic Management" will most likely pass these courses. The fourth rule states that students taking "Organisational Behavior" will pass it, with 95.2% confidence, and the third rule states, with 92.6% confidence, that students taking "Introduction to Management Information Systems" are likely to pass it. Two rules, the second and seventh, have noticeably lower confidence, 79.6% and 72%, respectively, that is, at least 13 percentage points below the third rule's 92.6%. The second rule states that students taking "Operations Management" will succeed in this course, and the seventh rule states that students taking "Principles of Management Accounting" will also pass it. Finally, based on our experience with CBE courses, "Feasibility analysis of projects," "Saudi Commercial Law" and "Organisational Behavior" are general education courses in all five departments (Management Information Systems, Accounting, Finance, Economics and Business Administration), while "Strategic Management" is a general education course in Management Information Systems, Accounting, Economics and Business Administration. These general education courses aim to broaden students' understanding by adding courses from different specialisations, so that students graduate with knowledge of majors other than the one they primarily studied.

The knowledge obtained from the third approach suggests that students often pass general education courses. Many students intentionally add these courses, either to raise their GPA because the course is easy or because of the rapport they feel with the lecturer. These courses may also be added to fill gaps in the academic schedule, as some students prefer not to have too much free time. Furthermore, the college requires students to complete the courses of the first three levels before specialisation and to obtain a GPA higher than 2. These conditions have kept some students out of specialisations, so they add these general courses to complete the required courses or to raise their GPA. We also noted that no rules with the F mark appeared in this analysis under the selected threshold values of the measures, from which we conclude that there were more instances of success than failure.

Furthermore, the outlier analysis showed that significant anomalous data appeared in the records of Pre-Major students. The anomalies were due to weak SGPA and CGPA, in addition to course failures. Student failure at the first level was often due to several reasons, such as the difficulty of the courses, differences in the lecturers' teaching methods, or the standardisation of questions (standardised tests) between the female and male students' departments. There may also be personal reasons related to the student's social life. Consequently, a strategic plan should be designed to understand the difficulties and problems experienced by first-level students, so that practical decisions appropriate to these problems can be made to prevent students from failing in future years. This underlines the need for academic advisors, especially for Pre-Major students, to guide them in continuing their studies and overcoming difficulties. We observed the problem of "academic separation" in the academic cases of most Pre-Major students, and the statuses "dropout," "discontinuation of study" and "termination" also appeared. We also discovered a group of anomalies that can serve the college in many respects, especially in achieving high-quality standards in the education process: a group of students who maintained a high CGPA at all levels of study and graduated with an excellent CGPA. The college could, in turn, build on these excellent students' experience by organising volunteer courses, offered by the students who excel, to assist students of the same major. These students' experiences could also be used, for example through social media, to advise those who wish to join the major.

6. CONCLUSION AND FUTURE WORK

The purpose of this study was to analyse student data in the CBE by extracting new patterns and features from students' academic data and to detect anomaly cases. To this end, it used data mining techniques to predict the academic performance of students over five years, from 2014 to 2018. It also identified students' weaknesses and failures, explored knowledge that can help improve the educational process, and sought the reasons for students' repeated failure in particular courses.

Through data analysis, this study found, first, that the probabilities of excellence and failure were higher in the fifth year (the first and second semesters of 2018) than in the fourth year, and that the rate of excellence in the last year exceeded the failure rate by 2.7%. Second, the probability of increasing excellence among
students of the department of Economics was the highest among the majors, at more than 18.9%. On the other hand, the probability of increased failure in the Pre-Major was estimated to be more than 39.3%. Third, the probability of excellence was estimated to be between 12% and 12.8% in female students' records, compared with between 2.1% and 2.5% in male students' records. The analysis therefore leads us to conclude that male students and Pre-Major students are more likely to fail and need follow-up with academic advisors during this period.

In the classification experiments, RF outperformed the other algorithms on all evaluation measures, with 71.3% CA and Recall, 71.2% F1-score and 71.5% Precision. In the regression experiments, DT performed better than LR, with lower error rates. On this basis, we conclude that, among the models evaluated in this study, the best classification model is RF and the best regression model is DT.

Moreover, the association rules show, from the first approach, that failures most often appeared in the Pre-Major, with an estimated confidence of 74.9%, and that if the GPA class is "acceptable," then the major is BA, with an estimated confidence of 46.6%. From the second approach, students of Economics, Finance, MIS and Accounting are likely to obtain pass marks in their courses, with over 90% confidence. Our findings from the third approach show that students often pass general education courses. Our research and the questions we submitted to CBE officials made it clear that many students intentionally add these courses, either to raise their GPA because the course is easy or because of the rapport they feel with the lecturer. Some students also enrol in these courses out of passion and curiosity and obtain valuable knowledge that will benefit them in the future.

In future work, additional data will be needed to increase the accuracy of the prediction. Researchers may also focus on features that have a substantial impact on student performance, such as the high school grade, absences, the number of notifications and the number of failures in a course. Additional models, such as traditional neural networks and deep learning, could also be employed.

ACKNOWLEDGMENT

The authors would like to thank the College of Business and Economics at Qassim University for providing the data required for this research.

REFERENCES

[1] R. M. Damin, M. A. Kadry, and E. M. Hamed, “An investigation into the use of the education Management Information System (EMIS) in Iraq: Case study,” in 2014 International Conference on Engineering and Technology (ICET), 2014, pp. 1–6.
[2] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Elsevier, 2012.
[3] R. Lawrance and V. Shanmugarajeshwari, “An assay of teachers’ attainment using decision tree based classification techniques,” in Proceedings of IEEE International Conference on Circuit, Power and Computing Technologies, ICCPCT 2017, 2017.
[4] K. N. Shah, M. R. Patel, N. V. Trivedi, P. N. Gadariya, R. H. Shah, and N. Adhvaryu, “Study of Data Mining in Higher Education-A Review,” International Journal of Computer Science and Information Technologies, vol. 6, no. 1, pp. 455–458, 2015.
[5] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From Data Mining to Knowledge Discovery in Databases,” AI Magazine, vol. 17, no. 3, pp. 37–54, Mar. 1996.
[6] B. Guo, R. Zhang, G. Xu, C. Shi, and L. Yang, “Predicting Students Performance in Educational Data Mining,” in 2015 International Symposium on Educational Technology (ISET), 2015, pp. 125–128.
[7] B. Kumar and S. Pal, “Mining Educational Data to Analyze Students Performance,” International Journal of Advanced Computer Science and Applications, vol. 2, no. 6, pp. 63–69, 2011.
[8] A. I. Adekitan and E. Noma-Osaghae, “Data mining approach to predicting the performance of first year student in a university using the admission requirements,” Education and Information Technologies, vol. 24, no. 2, pp. 1527–1543, Mar. 2018.
[9] L. A. Buschetto Macarini, C. Cechinel, M. F. Batista Machado, V. Faria Culmant Ramos, and R. Munoz, “Predicting Students Success in Blended Learning—Evaluating Different Interactions Inside Learning Management Systems,” Applied Sciences, vol. 9, no. 24, p. 5523, Dec. 2019.
[10] S. Angra and S. Ahuja, “Implementation of data mining algorithms on student’s data using rapid miner,” in 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC), 2017, pp. 387–391.
[11] R. S. J. Baker, “Data Mining for Education,” 3rd ed., vol. 7, 2010, pp. 112–118.
[12] C. Romero and S. Ventura, “Educational Data Mining: A Review of the State of the Art,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 40, no. 6, pp. 601–618, Nov. 2010.
[13] S. Roy and A. Garg, “Predicting academic performance of student using classification techniques,” 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics, UPCON 2017, vol. 2018-Janua, pp. 568–572, 2018.
[14] S. Ahmed, R. Paul, and A. S. M. L. Hoque, “Knowledge discovery from academic data using Association Rule Mining,” in 2014 17th International Conference on Computer and Information Technology (ICCIT), 2014, pp. 314–319.
[15] M. Hasibur Rahman and M. Rabiul Islam, “Predict Student’s Academic Performance and Evaluate the Impact of Different Attributes on the Performance Using Data Mining Techniques,” 2nd International Conference on Electrical and Electronic Engineering, ICEEE 2017, no. December, pp. 1–4, 2018.
[16] B. Kapur, N. Ahluwalia, and S. R, “Comparative Study on Marks Prediction using Data Mining and Classification Algorithms,” International Journal of Advanced Research in Computer Science, vol. 8, no. 3, pp. 632–636, Apr. 2017.
[17] C. Jalota and R. Agrawal, “Analysis of Educational Data Mining using Classification,” in Proceedings of the International Conference on Machine Learning, Big Data, Cloud and Parallel Computing: Trends, Perspectives and Prospects, COMITCon 2019, 2019, pp. 243–247.
[18] J. H. Sharp and L. A. Sharp, “A comparison of student academic performance with traditional, online, and flipped instructional approaches in a C# programming course,” Journal of Information Technology Education: Innovations in Practice, vol. 16, no. 1, pp. 215–231, 2017.
[19] V. Shanmugarajeshwari and R. Lawrance, “Analysis of students’ performance evaluation using classification techniques,” 2016 International Conference on Computing Technologies and Intelligent Data Engineering, ICCTIDE 2016, pp. 1–7, 2016.
[20] S. B. Rahayu, N. D. Kamarudin, and Z. Zainol, “Case Study of UPNM Students Performance Classification Algorithms,” International Journal of Engineering and Technology, vol. 7, no. December 2018, pp. 285–289, 2018.
[21] R. Hasan, S. Palaniappan, A. R. A. Raziff, S. Mahmood, K. U. Sarker, and A. Rafi, “Student Academic Performance Prediction by using Decision Tree Algorithm,” in 2018 4th International Conference on Computer and Information Sciences (ICCOINS), 2018, pp. 1–5.
[22] A. Marwaha and A. Singla, “A study of factors to predict at-risk students based on machine learning techniques,” in Advances in Intelligent Systems and Computing, 2020, vol. 989, pp. 133–141.
[23] A. I. Adekitan and O. Salau, “Toward an improved learning process: the relevance of ethnicity to data mining prediction of students’ performance,” SN Applied Sciences, vol. 2, no. 1, pp. 1–15, Jan. 2020.
[24] S. S. Al-Nadabi and C. Jayakumari, “Predict the selection of mathematics subject for 11th grade students using Data Mining technique,” in 2019 4th MEC International Conference on Big Data and Smart City (ICBDSC), 2019, pp. 1–4.
[25] H. Mousa and A. Maghari, “School Students’ Performance Predication Using Data Mining Classification,” International Journal of Advanced Research in Computer and Communication Engineering, vol. 6, no. 8, pp. 136–141, 2017.
[26] A. Al Mazidi and E. Abusham, “Study of general education diploma students’ performance and prediction in Sultanate of Oman, based on data mining approaches,” International Journal of Engineering Business Management, vol. 10, pp. 1–11, 2018.
[27] K. Sunday, P. Ocheja, S. Hussain, S. S. Oyelere, B. O. Samson, and F. J. Agbo, “Analyzing Student Performance in Programming Education Using Classification Techniques,” International Journal of Emerging Technologies in Learning (iJET), vol. 15, no. 02, p. 127, Jan. 2020.
[28] P. Rojanavasu, “Educational data analytics using association rule mining and classification,” in ECTI DAMT-NCON 2019 - 4th International Conference on Digital Arts, Media and Technology and 2nd ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering, 2019, pp. 142–145.
[29] S. Kotsiantis and D. Kanellopoulos, “Association Rules Mining: A Recent Overview,” GESTS International Transactions on Computer Science and Engineering, vol. 32, no. 1, pp. 71–82, 2006.
[30] V. Nida Uzel, S. Sevgi Turgut, and S. Ayse Ozel, “Prediction of Students’ Academic Success Using Data Mining Methods,” in 2018 Innovations in Intelligent Systems and Applications Conference (ASYU), 2018, pp. 1–5.
[31] A. F. Meghji, N. Ahmed Mahoto, M. A. Unar, and M. Akram Shaikh, “Analysis of Student Performance using EDM Methods,” in 2018 5th International Multi-Topic ICT Conference (IMTIC), 2018, pp. 1–7.
[32] A. Naik and L. Samant, “Correlation Review of Classification Algorithm Using Data Mining Tool: WEKA, Rapidminer, Tanagra, Orange and Knime,” Procedia Computer Science, vol. 85, pp. 662–668, Jan. 2016.
[33] M. A. Al-Hagery, “Classifiers’ Accuracy Based on Breast Cancer Medical Data and Data Mining Techniques,” International Journal of Advanced Biotechnology and Research, vol. 7, no. 2, pp. 760–772, 2016.
[34] K. Limsathitwong, K. Tiwatthanont, and T. Yatsungnoen, “Dropout prediction system to reduce discontinue study rate of information technology students,” in 2018 5th International Conference on Business and Industrial Research (ICBIR), 2018, pp. 110–114.
[35] E. A. Amrieh, T. Hamtini, and I. Aljarah, “Mining Educational Data to Predict Student’s academic Performance using Ensemble Methods,” International Journal of Database Theory and Application, vol. 9, no. 8, pp. 119–136, 2016.
[36] M. A. Al-Hagery, E. I. Al-Fairouz, and N. A. Al-Humaidan, “Improvement of Alzheimer disease diagnosis accuracy using ensemble methods,” Indonesian Journal of Electrical Engineering and Informatics (IJEEI), vol. 8, no. 1, pp. 132–139, 2020.
[37] A. Abu Saa, “Educational Data Mining & Students’ Performance Prediction,” International Journal of Advanced Computer Science and Applications, vol. 7, no. 5, pp. 212–220, 2016.
[38] P. M. Arsad, N. Buniyamin, and J. Ab Manan, “Neural Network and Linear Regression methods for prediction of students’ academic achievement,” in 2014 IEEE Global Engineering Education Conference (EDUCON), 2014, no. April, pp. 916–921.
[39] A. Ahlemeyer-Stubbe and S. Coleman, A Practical Guide to Data Mining for Business and Industry, 1st ed. Chichester, UK: John Wiley & Sons, Ltd, 2014.
[40] A. Sandoval, C. Gonzalez, R. Alarcon, K. Pichara, and M. Montenegro, “Centralized student performance prediction in large courses based on low-cost variables in an institutional context,” Internet and Higher Education, vol. 37, no. June 2017, pp. 76–89, Apr. 2018.
[41] M. A. Al-Hagery, “Extracting hidden patterns from dates’ product data using a machine learning technique,” IAES International Journal of Artificial Intelligence (IJ-AI), vol. 8, no. 3, pp. 205–214, Dec. 2019.
[42] S. Hussain et al., “Educational data mining and analysis of students’ academic performance using WEKA,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 9, no. 2, p. 447, Feb. 2018.

ABOUT THE AUTHORS:

Ebtehal Ibrahim Al-Fairouz received her BSc in Computer Science from Qassim University, Buraydah, KSA. She is a teaching assistant in the Department of Management Information Systems (MIS) at the College of Business and Economics (CBE) and a Master's student in the Computer Science Department, Qassim University, KSA. Her research interests include data mining, data analytics, data visualisation and machine learning.

Mohammed Abdullah Al-Hagery received his BSc in Computer Science from the University of Technology, Baghdad, Iraq, in 1994 and his MSc in Computer Science from the University of Science and Technology, Yemen, in 1998. He completed his PhD in Computer Science and Information Technology (Software Engineering) at the Faculty of Computer Science and IT, University of Putra Malaysia (UPM), in November 2004. He was the head of the Computer Science Department at the College of Science and Engineering, USTY, Sana'a, from 2004 to 2007. Since 2007, he has been a staff member in the Department of Computer Science, College of Computer, Qassim University, Buraydah, KSA. He has published more than 31 papers in various international journals. From September 2012 to October 2018, Dr. Al-Hagery was the head of the Research Centre at the Computer College and a council member of the Scientific Research Deanship, Qassim University, KSA. He currently teaches master's degree students and supervises four master's theses. He has served as an internal and external examiner on several PhD and master's thesis committees in his field of specialisation.