10 1109@educon 2019 8725114
Records
Mohammed Hussain
College of Technological Innovation
Zayed University
Dubai, UAE, P.O. Box 19282
mohammed.hussain@zu.ac.ae

Abdullah Hussein
Department of Computer Science
University of Sharjah
Sharjah, UAE, P.O. Box 27272
ahussein@sharjah.ac.ae
Abstract: Many educational institutions enforce attendance policies, where students are expected to keep their absences below a certain percentage in each class. Attendance records are collected to enforce such policies, but they are rarely utilized for anything else. In this paper, we investigate the value of analyzing the records pulled from student attendance systems. We apply a data mining technique, the market basket analysis, on student attendance data. The contribution of this analysis is the identification of student groups who share highly similar absence records. Such similarity may indicate that the students are missing classes due to peer pressure, rather than valid excuses. The presented method helps instructors and advisors discover this behavior, which is more efficient than relying on instructors, who may teach many classes. To minimize the number of false alarms, student groups are ranked based on their absence similarity. We tested our method by analyzing student attendance data for over two thousand students for one semester at a public higher education institution. The results were helpful in identifying students who miss classes due to their friends missing the classes.

Keywords: Educational data mining; Learning analytics; Mining student behavior

This research was funded through Zayed University Research Incentive Fund R71068

I. INTRODUCTION
Student advisors and counselors strive to ensure the success of their students. Apart from advising students with respect to the courses they need to register for the next semester, it is expected that advisors and counselors encourage students to put more effort into their courses, motivate students to engage in extracurricular activities and help students reflect on their behavior. Instructors may help advisors by providing feedback with regards to the performance of the advisees. The authors' institution uses a system to keep track of student attendance. The institution uses the system to enforce attendance, where students are withdrawn from a course if they exceed a certain number of absences. The system notifies advisors once their students are in danger of being withdrawn from a course. The system falls short of reporting the following behavior. Alice and Bob are two students and they are classmates in a few courses. Bob and Alice are also friends. Should Bob skip a class, Alice may feel pressured to skip that class. The same may be true the other way around. This behavior may be repeated a few times in each course, which will amount to many absences. However, their behavior will fly under the attendance system radar, since they are skipping classes from different courses. Neither the instructor nor the advisor will suspect that one of the students is skipping the classes because of peer pressure. Such behavior may result in students failing their courses. This is a problem that remains unsolved and it constitutes the problem addressed in this paper.

The research question investigated in this paper is the value of analyzing student attendance records, which are typically only collected to enforce attendance policies. Specifically, would mining such data help instructors and advisors gain insights with regards to identifying students who miss their classes due to peer pressure, as described earlier? The importance of this work comes from the fact that the number of students may reach tens of thousands in many educational institutions. Thus, any meaningful insight, even if it applies to a small percentage of students, will have great value.

This paper addresses the stated research question through the field of educational data mining and learning analytics [1-4]. Educational data mining and learning analytics have gained great attention as tools which provide valuable insights to higher educational institutions through the analysis of student data collected from the various IT systems. Numerous papers have used data mining techniques to predict student academic performance, engagement in classes and preferences [5-10]. This work utilizes this field through the application of a popular data mining technique, specifically the market basket analysis, to analyze student data. Market basket analysis is used in e-commerce, such as Amazon and eBay, to analyze how frequently a group of products are purchased together and to recommend products to customers.

The rest of the paper is organized as follows. Section II discusses the background and related work on educational data mining and learning analytics relevant to this paper. Section III presents our method of analyzing student attendance records, based on a market basket analysis. Section IV illustrates the method through a case study involving the attendance records of over two thousand students throughout a semester, collected from the attendance system at the authors' institution. Section V concludes the paper and presents the current limitations and planned future work.

II. BACKGROUND
Students in higher educational institutions interact with many systems, such as registration and learning management. Mining student data collected from these systems helps discover useful insights with respect to student behavior. Researchers in the field of educational data mining and learning analytics have already demonstrated that such data can be used to predict student academic performance, preferences and engagement [5-9] and [13-14].

A survey on the application of data mining methods to achieve different educational purposes, such as the way the input and output of the educational process affect each other, was presented in [4]. The survey describes the way different research work on determining student failure/success rate in
978-1-5386-9506-7/19/$31.00 ©2019 IEEE 9–11 April, 2019 - American University in Dubai, Dubai, UAE
2019 IEEE Global Engineering Education Conference (EDUCON)
Page 1198
order to help students before they reach the risk of failure, and effective resource utilization and cost minimization were also studied. The authors clustered the students based on their class performance and overall attendance (low, medium and high). This helped the authors in predicting the students' graduation performance in the final year at university using only pre-university marks and examination marks of early years at university. The authors of [5] studied the effect of student class attendance on their academic scores and registered a significant relation between attendance and the academic score. The study used the T-test to measure the relation between the percentage of attendance and the percentage score of the students. A survey on the application of data mining in learning management systems is presented in [6]. The survey describes the way each step of the data mining process is applied to the field of e-learning, from preprocessing to interpreting results. The survey focused on the open-source Modular Object-Oriented Dynamic Learning Environment (MOODLE) as the source of the educational data.

The authors of [7] extracted knowledge that describes students' performance in the end-of-semester examination to help in identifying the dropouts and students who need special attention. A study to identify patterns of interaction between the students is proposed by [8]. The study related these patterns to student performance. A case study was presented where students were trained on the use of the Internet to accomplish education-related tasks. By monitoring the students' communications while solving the difficulties that arose, the course instructors searched for patterns of interaction and related these patterns to student performance and final course grades. As the prediction of student performance, early in the course, is vital to student success, the authors of [9] presented an approach to evaluate student data and predict the student performance in courses during their early period of study. Students were asked to fill in a questionnaire which included questions related to several personal, socio-economic, psychological, school and college related variables that were expected to affect student performance. The authors then built data mining models based on the data collected from the questionnaire.

Learning analytics is an essential component for leveraging the benefits of big data [10] in educational contexts. The challenges and opportunities of big data for educational institutions are studied in [11-12]. The authors of [13] presented a method to enhance the academic accreditation process through the application of big data. The method analyzes assessment tools and learning outcomes and helps educators in aligning assessment with outcomes. A method to identify student utilization of campus facilities through tracking student access to campus wireless networks was presented in [14].

A common approach among the educational data mining literature described earlier [5-8] is the use of student attendance, performance in assessment and interaction with learning resources to predict student performance and success. Student attendance is modeled as a qualitative value, such as low, average and high attendance. This paper utilizes the actual attendance records. Using the actual records allows us to utilize more data mining techniques, such as the market basket analysis used in this paper. Further, existing literature does not help in finding groups of students who share very similar attendance records across different classes, which may signify missing classes due to peer pressure. The next section describes the presented method for finding patterns within student attendance records.

III. METHOD

We propose a method that utilizes the attendance records for all students. The collected data is then passed to a data mining technique for analysis. Specifically, the method mines the records for association rules, where each rule links the absences of one student to the absences of one or more students. The method makes use of the market basket analysis [15], which is a data mining technique that retailers, such as Amazon and eBay, use to find associations between their products. Retailers generate association rules by inspecting their transactions and finding items that frequently appear together. For example, the following rule may be used by a retailer recommender system.

{Gaming_Console, Motion_Detector} ⇒ {Motion_Game}

The rule states that if a customer buys a certain gaming console, as well as that console's motion detector, then the customer will likely buy a motion-based game, such as a dancing game. The set of the two items, Gaming_Console and Motion_Detector, represents the left side of the association rule, whereas the set of one item, Motion_Game, represents the right side of the rule. Such rules help retailers recommend products to customers. A popular algorithm to generate association rules is the apriori algorithm [16].

In our work, we use the apriori algorithm to find rules associating the absences of one student to the absences of one or more students. The generated rules help instructors and advisors identify students who are frequently absent together. Instructors and advisors may meet with such students to investigate the reason behind this behavior. Such intervention supports student academic success. Please refer to Fig. 1, which illustrates the proposed method. The next subsection defines the necessary terminology for the apriori algorithm. Please refer to [16] for the exact algorithm.

Fig. 1. The system architecture (components: Instructor, Attendance System, Learning Analytics, Advisor; flows: class attendance, attendance dataset, absence rules)

A. The Apriori Algorithm
The apriori algorithm works on transactional datasets where each row represents a transaction consisting of a set of items. Table I shows an example of a dataset of five transactions. Consider the set of available items for purchase, gaming consoles, controllers, virtual reality sets, virtual reality
TABLE I SAMPLE TRANSACTION DATASET
Trans Items
1 Gaming_Console, VR_Set, VR_Game
2 Gaming_Console, Controller, Action_Game, Online_Pass
3 VR_Set, VR_Game
4 Gaming_Console, Action_Game, Online_Pass
5 VR_Set, VR_Game, Controller
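
The transactions in Table I can be used to illustrate how association rules are scored. The following is a minimal sketch, not the authors' implementation (which used R): the support count of an itemset is the number of transactions containing it, and the confidence of a rule X ⇒ Y is the support count of X ∪ Y divided by the support count of X. The function names (support_count, frequent_itemsets, rules) and the thresholds in the usage lines are assumptions chosen for illustration. The search is a simplified level-wise variant of the idea in [16] and omits the candidate pruning step of the full algorithm.

```python
from itertools import combinations

# Transactions copied from Table I.
transactions = [
    {"Gaming_Console", "VR_Set", "VR_Game"},
    {"Gaming_Console", "Controller", "Action_Game", "Online_Pass"},
    {"VR_Set", "VR_Game"},
    {"Gaming_Console", "Action_Game", "Online_Pass"},
    {"VR_Set", "VR_Game", "Controller"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_count):
    """Level-wise search: larger candidates are built only from frequent sets."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        counted = {c: support_count(c, transactions) for c in level}
        kept = {c: n for c, n in counted.items() if n >= min_count}
        frequent.update(kept)
        # Join step: unions of two frequent k-itemsets that have k+1 items.
        size = len(next(iter(kept))) + 1 if kept else 0
        level = list({a | b for a, b in combinations(kept, 2)
                      if len(a | b) == size})
    return frequent


def rules(frequent, min_conf):
    """Return (lhs, rhs, support_count, confidence) for rules above min_conf."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                # Every subset of a frequent itemset is frequent, so the
                # lookup of the left-hand side always succeeds.
                conf = count / frequent[lhs]
                if conf >= min_conf:
                    found.append((lhs, itemset - lhs, count, conf))
    return found


if __name__ == "__main__":
    # Illustrative thresholds: at least 2 transactions, confidence >= 0.7.
    freq = frequent_itemsets(transactions, min_count=2)
    for lhs, rhs, count, conf in rules(freq, min_conf=0.7):
        print(sorted(lhs), "=>", sorted(rhs), count, round(conf, 2))
```

In the attendance setting described in this paper, each transaction would be a class session and each item a student who was absent from it, so the same scoring applies to absence rules.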
TABLE III. Sample Anonymized Student Absence Rules
CS101_3_Apr12, where CS101_3 is the section id and Apr12 is the date of one of the sessions. Each element of Master_Sec_Date_List is therefore a transaction label.

Step 3. Create a matrix where the rows represent the elements of Master_Sec_Date_List and the columns represent the student ids. The cell (i, j) in the created matrix represents whether student j was present (1) or absent (0) during session i (note that i is composed of a section id and a date id). Table II is an illustration of the transaction matrix that we supply to the apriori algorithm. The second row shows that student1 was absent along with student4 on Jan 7th in section CS101_1.

Step 4. Fill the matrix created in Step 3 with student absence values based on the records pulled from the attendance system.

Step 5. Run the apriori algorithm on the computed matrix from Step 4. The support and confidence thresholds need to be specified. The support threshold is based on the number of sessions. If a student takes 5 courses in a semester, where each section holds 30 sessions throughout the semester, then the maximum number of sessions is 150. For two students having exactly the same class schedule, the maximum number of times both are absent is 150. However, it is highly unlikely that a student misses all of the sessions in all of the classes. Therefore, one may set the support threshold to a reasonable value. For example, a threshold of 10 helps generate association rules, where each rule is supported by at least 10 incidents where the two students were absent together. The confidence threshold is based on the required rule strength. Setting the threshold to 0.7 helps generate rules, where each rule associates the absence of one student to another with a confidence of at least 70%. Setting the support and confidence thresholds also depends on the number of rules the user is willing to go through. The lower the thresholds, the more rules the apriori algorithm generates.

Step 6. The output of the apriori algorithm is a set of rules that meet the thresholds set in Step 5. If a student appears in a rule, then the advisor of that student may receive a notification with regards to that rule. An example output is displayed in Table III. Table III lists a set of the actual anonymized rules generated based on the attendance records collected from the authors' institution. The experiment details are explained in the next section.

Step 7. The advisor needs to investigate the rules. Note that some rules may be the result of students coincidentally missing classes, without peer pressure. The advisor may check on the students' overall performance to set the priority for meeting students. Students who miss classes and are under probation need urgent attention, compared to students with good standing status.

Step 8. Finally, advisors provide feedback to the person responsible for generating the rules with regards to the usefulness of the generated rules. The feedback helps tweak the support and confidence thresholds for the next rounds.

IV. EXPERIMENT
To demonstrate the presented method, we analyzed the attendance records of more than 2000 students, over the period of one semester at the authors' institution. The apriori algorithm was applied to perform the basket analysis. The authors implemented their approach using the R programming language [17], a popular data mining software environment. The following is a summary of the data collection, analysis and results of the experiment.

A. Data Collection and Preprocessing

The authors collected more than 50,000 absence records, where each record documents the absence of a student with respect to a class. Each record in the collected attendance data consists of the following items. St_ID is the student id, C_ID is the course id, Sec_ID is the section id and A_Date is the date of the absence:

Record_i = (St_ID, C_ID, Sec_ID, A_Date)

The records were then preprocessed to remove the first and last week of classes. The rationale is to avoid the collection of absence records for the days on which many students are absent. In the first week, students are busy with add and drop, while in the last week, many students miss classes to prepare for assessment. The authors also preprocessed the records for students who change sections. An attendance matrix was generated for each section, where rows represent the dates on which that section had a session and columns represent the students registered in that section, i.e., the class roster.
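
The preprocessing and thresholding described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' R implementation: the function names (build_transactions, pair_rules), the sample records and the semester dates are hypothetical. Each session is represented as the set of absent student ids rather than as a row of a 0/1 matrix, the session label follows the section-id-plus-date pattern (such as CS101_3_Apr12), and only rules between pairs of students are considered instead of running the full apriori algorithm.

```python
from collections import defaultdict
from datetime import date
from itertools import permutations


def build_transactions(records, first_day, last_day):
    """Group absent student ids by session label (section id + date).

    records: iterable of (St_ID, C_ID, Sec_ID, A_Date) tuples.
    Absences outside [first_day, last_day] are dropped, which mirrors
    removing the first and last week of classes.
    """
    sessions = defaultdict(set)
    for st_id, _c_id, sec_id, a_date in records:
        if first_day <= a_date <= last_day:
            sessions[f"{sec_id}_{a_date:%b%d}"].add(st_id)
    return dict(sessions)


def pair_rules(sessions, min_support, min_confidence):
    """Rules {a} => {b}: b is often absent in the sessions that a misses."""
    absences = defaultdict(int)   # sessions missed by each student
    together = defaultdict(int)   # sessions missed by both a and b
    for students in sessions.values():
        for s in students:
            absences[s] += 1
        for a, b in permutations(sorted(students), 2):
            together[(a, b)] += 1
    found = []
    for (a, b), n in together.items():
        conf = n / absences[a]
        if n >= min_support and conf >= min_confidence:
            found.append((a, b, n, conf))
    # Strongest rules first, so the most similar pairs are reviewed first.
    return sorted(found, key=lambda r: (-r[2], -r[3], r[0], r[1]))


if __name__ == "__main__":
    # Hypothetical records; real data would come from the attendance system.
    records = [
        ("s1", "CS101", "CS101_1", date(2018, 1, 7)),   # first week, dropped
        ("s4", "CS101", "CS101_1", date(2018, 1, 7)),   # first week, dropped
        ("s1", "CS101", "CS101_1", date(2018, 2, 4)),
        ("s4", "CS101", "CS101_1", date(2018, 2, 4)),
        ("s2", "CS101", "CS101_1", date(2018, 2, 4)),
        ("s1", "CS101", "CS101_1", date(2018, 2, 11)),
        ("s4", "CS101", "CS101_1", date(2018, 2, 11)),
        ("s1", "MTH200", "MTH200_2", date(2018, 3, 1)),
        ("s4", "MTH200", "MTH200_2", date(2018, 3, 1)),
        ("s1", "CS101", "CS101_1", date(2018, 3, 8)),
    ]
    sessions = build_transactions(records, date(2018, 1, 14), date(2018, 4, 30))
    for rule in pair_rules(sessions, min_support=3, min_confidence=0.7):
        print(rule)
```

With real data, the thresholds of 10 and 0.7 discussed in Step 5 would replace the small values used for the toy records here.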
Fig. 2. A Graph of the Anonymized Rules – the weight of an edge is the support of the rule
1 RStudio: https://www.rstudio.com
records can be useful for identifying students who miss classes due to peer pressure. We investigated the value of analyzing the records pulled from the student attendance system. We presented a method that applies a market basket analysis, using the apriori algorithm, on student attendance data. The method was used to mine over 50,000 attendance records that belong to over 2000 students at the authors' institution. The method was successful in generating more than 100 association rules. Each rule links the absences of one student to the absences of one or more students.

The main limitation of this work is the fact that the method needs a significant number of records to be able to generate the rules. At least half of the semester may pass before enough absences can be recorded by class instructors. Nevertheless, advisors may benefit from this approach to better advise students for the next semester.

REFERENCES
[1] R. S. Baker and P. S. Inventado, “Educational data mining and learning analytics,” Learning Analytics, pp. 61-75, 2014.
[2] A. Dutt, M. A. Ismail and T. Herawan, “A systematic review on educational data mining,” IEEE Access, vol. 5, pp. 15991-16005, 2017.
[3] A. Peña-Ayala, “Educational data mining: A survey and a data mining-based analysis of recent works,” Expert Systems with Applications, vol. 41, no. 4, pp. 1432-1462, 2014.
[4] H. Alagib, A. Hamza and P. Kommers, “A Review of Educational Data Mining Tools & Techniques,” International Journal of Educational Technology and Learning, vol. 3, no. 1, pp. 17-23, 2018.
[5] O. D. Ayodele, “Class attendance and academic performance of second year university students in an organic chemistry course,” African Journal of Chemical Education, vol. 7, no. 1, pp. 63-75, 2017.
[6] C. Romero, S. Ventura and E. García, “Data mining in course management systems: Moodle case study and tutorial,” Computers & Education, vol. 51, no. 1, pp. 368-384, 2008.
[7] B. K. Baradwaj and S. Pal, “Mining educational data to analyze students' performance,” International Journal of Advanced Computer Science and Applications, vol. 2, no. 6, pp. 63-69, 2011.
[8] L. Talavera and E. Gaudioso, “Mining student data to characterize similar behavior groups in unstructured collaboration spaces,” in Proc. of the 16th European Conference on Artificial Intelligence, pp. 17-23, 2004.
[9] V. Ramesh, P. Thenmozhi and K. Ramar, “Study of influencing factors of academic performance of students: A data mining approach,” International Journal of Scientific & Engineering Research, vol. 3, no. 7, pp. 1-5, 2012.
[10] A. Gandomi and M. Haider, “Beyond the hype: Big data concepts, methods, and analytics,” International Journal of Information Management, vol. 35, no. 2, pp. 137-144, 2015.
[11] B. Daniel, “Big Data and analytics in higher education: Opportunities and challenges,” British Journal of Educational Technology, vol. 46, no. 5, pp. 904-920, 2015.
[12] V. Kellen, “Applying Big Data in higher education: A case study,” Cutter Consortium white paper, vol. 13, no. 8, 2013.
[13] M. Hussain, M. Al-Mourad, S. Mathew and A. Hussein, “Mining educational data for academic accreditation: aligning assessment with outcomes,” Global Journal of Flexible Systems Management, vol. 18, no. 1, pp. 51-60, 2017.
[14] M. Hussain, M. B. Al-Mourad, A. Hussein, S. Mathew and E. Morsy, “A novel approach for analyzing student interaction with educational systems,” in Proc. of Global Engineering Education Conference, pp. 1332-1336, 2017.
[15] M. J. Berry and G. Linoff, Data mining techniques: for marketing, sales, and customer support. John Wiley & Sons, Inc., 1997.
[16] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proc. of the 20th International Conference on Very Large Data Bases, pp. 487-499, 1994.
[17] K. Hornik, “R FAQ,” The Comprehensive R Archive Network. 2.1 What is R?, 2015, Retrieved 2019-01-13, from https://cran.r-project.org/doc/FAQ/R-FAQ.html