Big Educational Data & Analytics Survey

This document provides a comprehensive literature review on big educational data and analytics. It surveys the current state of the field in five parts: (1) an overview and classification of big education research, (2) data sources such as learning management systems and MOOCs, (3) data collection, mining and databases, (4) technological architectures and tools, and (5) different approaches to data analytics including predictive analytics, learning analytics, recommendation systems, and challenges. The review aims to map out the full landscape of this emerging field and provide a more inclusive discussion of data analytics in higher education beyond traditional forms of learning analysis.

Received April 10, 2020, accepted April 25, 2020, date of publication May 14, 2020, date of current version July 2, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.2994561

Big Educational Data & Analytics: Survey, Architecture and Challenges

KENNETH LI-MINN ANG¹ (Senior Member, IEEE), FENG LU GE², AND KAH PHOOI SENG³,⁴ (Member, IEEE)
¹ School of Science and Engineering, University of Sunshine Coast, Petrie, QLD 4502, Australia
² PacificTelecom & Navigation Ltd., Hong Kong
³ School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
⁴ Sydney Imperial Polytechnic Institute, Sydney, NSW 2000, Australia

Corresponding author: Kenneth Li-Minn Ang (lang@usc.edu.au)

ABSTRACT The proliferation of mobile devices and the rapid development of information and communication technologies (ICT) have led to an increasingly large volume and variety of data being generated at an unprecedented pace. Big data have started to demonstrate significant value in higher education. This paper makes several contributions to the state of the art for Big data in higher education and learning technologies research. Currently, there is no comprehensive survey or literature review for Big educational data. Most existing literature reviews focus on a single field, such as educational data mining or learning analytics, and discuss only one or two aspects, such as Big data technologies without an educational focus or social media data in education. Most of these reviews are also too short to cover Big educational data inclusively. In this paper, we present a comprehensive literature review of the current and emerging paradigms for Big educational data. The survey is presented in five parts: (1) The first part presents an overview and classification of Big education research to show the full landscape of this field, which also gives a concise summary of the overall scope of this paper; (2) The second part discusses the various data sources from education platforms or systems, including learning management systems (LMS), massive open online courses (MOOC), learning object repositories (LOR), OpenCourseWare (OCW), open educational resources (OER), social media, linked data and mobile learning, that contribute to Big education data; (3) The third part presents data collection, data mining and databases in Big education data; (4) The fourth part presents the technological aspects, including Big data platforms and architectures such as Hadoop, Spark and Samza, and Big data tools for Big education data; and (5) The fifth part presents different approaches to data analytics for Big education data. This part provides a more inclusive discussion of data analytics that goes beyond traditional forms of learning analysis in higher education. It includes predictive analytics; learning analytics covering collaborative, behavior and personal learning and assessment; recommendation systems; graph analytics; visual analytics; and immersive learning and analytics. The final part of the paper discusses social (e.g. privacy and ethical issues) and technological challenges for Big data in education. This part also illustrates the technological challenges by giving an example of utilizing graph-based analytics for a cross-institution learning analytics scenario.

INDEX TERMS Big data, learning technologies, educational data, learning analytics.

The associate editor coordinating the review of this manuscript and approving it for publication was Mauro Gaggero.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
116392 VOLUME 8, 2020
K. L.-M. Ang et al.: Big Educational Data & Analytics: Survey, Architecture and Challenges

I. INTRODUCTION
In a world of data deluge, vast amounts of information are generated in every area of our lives with the rapid development of new technologies such as the Internet, social media, Internet of Things (IoT), cloud, smart and mobile devices. The public, commercial and social sectors also ceaselessly produce huge amounts of data in a variety of formats from different sources. The volume, variety and velocity (3Vs) of data generated daily lead to the phenomenon of Big data with the potential to further improve the values of products and services in different industries [147], [148]. One of the sectors in which the 3Vs coexist is the higher education and professional education industry. Educational data are captured and generated rapidly in the higher educational ecosystem which embraces different systems and platforms

such as course management and learning management systems (LMS), massive open online courses (MOOC), OpenCourseWare (OCW), Open Educational Resources (OER), and social media sites such as Twitter, Facebook, YouTube and personal learning environments (PLE). The scalability of data processing and analysis enables the development of new insights and valuable information from these educational data and has further shown promise in higher education to benefit academics, students and the whole education ecosystem. Since Big data and analytics are employed to draw useful insights or values (the 4th V) from the educational data, we use the term Big educational data to describe this emerging field.

There has been growing interest in the education community to gain insights from Big educational data to improve the learning performance of students, recommend courses, analyze learning patterns, predict dropout, improve the working effectiveness of instructors and reduce administrative workload.

Big data technologies comprise architectures and technologies which are designed to extract valuable information from very large volumes of data from a wide variety of sources. Some common platforms for Big data technologies which have been developed are Hadoop, Samza and Spark. Hadoop is commonly used for the information processing of complex Big data systems and off-line processing. Samza is mainly used to address large volumes of high-rate stream data processing, and Spark is often used for off-line rapid Big data processing. In the context of Big data in education, some specific Big data architectures or frameworks [1]–[10] have been proposed for education. The authors in [1] proposed a distributed architecture for the information processing of Big education data and predicting student performance with and without sentiment analytics. The authors in [2] proposed a five-layered architecture termed the Concept Definition for Big Data Architecture for education. The authors in [3] proposed a cloud-based architecture to analyze educational data from the Moodle system in the cloud using Apache Hadoop. The authors in [4] proposed a Big data architecture for education using Spark to identify patterns of lecture data that students have taken for the year and semester. The authors in [5] proposed a logging architecture for an E-Learning Big Data Ecosystem. The authors in [6] proposed a Big data infrastructure using the Hadoop platform. The platform is deployed within the e-learning infrastructure of a laboratory. The authors in [7] proposed an architecture based on the Apache Hadoop distributed computing architecture to process the Big data of Holland vocational interest theory. Other works on frameworks and platforms for Big education data can be found in [8]–[10]. Further details will be discussed later in the paper. Big data analytics is changing the educational industry and gives new opportunities for both learners and instructors. In general, there are three challenges for Big educational data analysis to be addressed: (1) The huge amount of data to be processed; (2) The complex and unstructured data analytics; and (3) The difficulty of finding the hidden value in the Big education data in a timely manner.

The authors in [153] reported on a case study applying a Big data framework to an LMS, which was conducted at the Catholic University of Murcia. The authors commented on the challenges of managing the large volume of data generated by users in the LMS and employed statistical and association rule techniques to speed up the statistical analysis of the data. In this study the size of the Big data generated by the LMS was 70 GB from data sources such as student activity, learning modality (e.g. on-campus, online, and blended), number of accesses to the LMS, tools employed by students and their associated events. In the era of Big education data, educational data mining (EDM) and data analytics are becoming essential tools to address the challenges. Data mining, also termed knowledge discovery, is known for its effectiveness in discovering hidden information embedded in the educational data. A recent literature review paper on EDM can be found in [11]. This review work presented twenty years of data mining research in e-learning environments, from an educational perspective. This paper presented a wide-scale review of 525 papers where both terms “data mining” and “education” were analyzed and used as keywords. The review included 72 papers focused on teaching-learning evaluation. The analyzed papers showed that research in EDM has expanded into several different sub-areas and themes.

Other literature review papers on EDM can be found in [12]–[18]. Learning analytics (LA), sometimes referred to as academic analytics, and EDM are interconnected areas in education research. A recent literature review paper on EDM and LA together for 21st century higher education can be found in [19]. There are different definitions of LA from different authors. Some authors define it in terms of the use of student-generated data for the prediction of educational outcomes for tailoring education, whereas other authors define LA as a tool to help educators examine, understand and support student study behaviors and change their learning environments. A literature review of the current landscape of the usage of LA in higher education can be found in [20]. This study was based on the analysis of 252 papers on learning analytics in higher education published between 2012 and 2018. The work by [21] presented a literature review of the LA landscape covering its evolution, status and trends. The authors discussed LA as arising from a knowledge discovery paradigm to understand the learning process. The work by [22] discussed the evidence on four propositions of LA including whether LA improves learning outcomes and student retention, completion and progression. The work by [23] focused on the current research trends of LA and its limitations and methods. Another literature review focused on the use of LA in higher educational settings can be found in [24]. Up to this point, we can see that there is no comprehensive survey or review for Big educational data. Most reviews have focused on either EDM or LA from only the education aspects. There are some short papers on Big education data but they only provide short overviews of Big data in education and challenges. Therefore, there is a need for


a solid review that combines all aspects of both technologies and education for Big education data. A comprehensive literature review of Big education data which emphasizes all aspects of Big data technologies, architectures and data analytics for education is the major contribution of this paper.

The literature review in this paper has been comprehensively carried out using an extensive search of the relevant databases including IEEE Xplore, Springer, ScienceDirect, ACM conference proceedings and other sources using combinations of keywords such as “Big data”, “Education”, “Learning analytics”, “Education data mining”, “Learning management system”, “MOOC”, “immersive learning”, etc. For example, when using IEEE Xplore, a search with the keyword combination of “Big data” and “Education” returned 585 journals and 1452 conference papers. Of these, recent papers most relevant to Big educational data were surveyed.

In this paper, the data sources from education platforms or systems including LMS, MOOC, learning object repository (LOR), OCW, OER, social media, linked data and mobile learning contributing to Big education data are discussed. This is followed by the data collection, data mining and databases for education. This paper also discusses the technological aspects, which include Big data platforms such as Hadoop, Spark and Samza and Big data tools for Big education data. The Big data architectures or frameworks specifically proposed for education are reviewed and discussed in detail. The most challenging part of this paper is to present a comprehensive literature review on data analytics from both technology and education aspects, which goes beyond traditional forms of analysis in education. The works on data analytics are classified into predictive analytics and learning analytics, which includes collaborative and interactive learning, behavior learning, personal learning and others. Recommendation systems (recommenders) for education, an emerging topic in data analytics, are also presented. Other emerging analytics such as graph analytics, visual analytics, immersive learning and analytics are also included. The final part of the paper provides some experimental insights for utilizing graph analytics for a university-based learning analytics scenario. The technological and social challenges for Big data in education and insights for future direction are also discussed. The rest of the paper is organized as follows. Section II gives background information and research classifications. Section III describes the data sources from education systems that form the Big education data. Section IV reviews the data collection, mining and databases in education systems. Section V presents the technological aspects for Big education data. Section VI gives a comprehensive literature review on data analytics. Section VII discusses future challenges for Big data in education. This section also illustrates the usefulness and technological challenges faced by giving an example for utilizing graph-based analytics for a cross-institution learning analytics scenario. The paper is concluded with some comments and remarks in Section VIII.

II. OVERVIEW AND RESEARCH CLASSIFICATION
The paper first presents the overview and classification of Big educational data and analytics research as shown in Table 1 to give a concise summary of the overall scope of this paper. The research works are classified into the various categories based on the following: (1) Big educational data; (2) Technological aspects for Big data for education; (3) Data analytics for Big education data; and (4) Future challenges for Big education data. Table 1 also allows the reader to see the full landscape of the research field of Big education data.

TABLE 1. Overall classification of big educational data research.

III. DATA SOURCES FROM EDUCATION SYSTEMS CONTRIBUTING TO BIG EDUCATION DATA
Data from education systems can be found in various sources such as student information systems, student administrative


TABLE 2. Summary of survey contributions for EDM research.

systems, learning management systems and library information systems. New education developments and applications of information technology together with Internet technology have led to the online education industry. Higher education institutions are increasingly offering and delivering online learning resulting in a large volume and availability of educational digital libraries, storage repositories and tools. Furthermore, enrolled students and offered courses from massive open online courses (MOOC) are becoming large and diverse, resulting in a growing abundance of data for analytics. There are also increasingly different varieties and formats of audio, video, text, and images besides the data in relational databases from institutions. This section presents sources that contribute to Big educational data by reviewing the current education systems or platforms. Fig. 1 shows a pictorial overview of research areas and data sources in Big education data.

A. LEARNING MANAGEMENT SYSTEMS (LMS)
Learning management systems (LMS) are educational management platforms for the administration, delivery, tracking


FIGURE 1. Overview of research areas and data sources for big education data.

and reporting of educational curriculum and courses. Moodle [28] is one of the most popular open source LMS options available today. Other examples of LMS [29] are Canvas [151], Sakai [152], ATutor, Eliademy, Forma LMS, Dokeos and OpenOLAT. The LMS concept emerged from e-Learning. In general, LMS have three major functions: (1) Management of educational courses and students; (2) Management of online assessments and tracking student progress and attendance; and (3) Providing feedback to users and students. The LMS provides services and tools to instructors to create course content which contains text, images, tables, interactive tests, and slideshows. The LMS can also be used to engage the student with contact tools and control access to the educational content. For instructors, the LMS enables the management of courses and modules, enrollment of students, and generation of reports on students. Most modern LMS are web-based information technology systems. With the advancement of technology, various tools and strategies can be employed for embedding content into LMS such as SCORM (Sharable Content Object Reference Model) [26], and LTI (Learning Tools Interoperability) [27].

B. MASSIVE OPEN ONLINE COURSES (MOOC)
Massive Open Online Courses (MOOC) employ web-based learning technologies to enroll large numbers of students worldwide. MOOC learning materials and contents can be delivered as text-based or video-based materials. Two pedagogical approaches, called c-MOOC and x-MOOC, are often used to distinguish MOOC [30]. The c-MOOC emphasize the openness and networking among learners and facilitators where anyone can contribute to the contents, whereas x-MOOC are more facilitator-centric; the contents are prepared by the facilitators. Coursera [31] and edX [32] are two established MOOC. Other examples of MOOC [33] include Udacity, Duolingo, Treehouse and Google Primer.

C. OPEN EDUCATIONAL RESOURCES (OER) & OpenCourseWare
Open educational resources (OER) are educational materials that are freely available in the public domain. The OER include licensed text, media, and other digital assets that are useful for teaching, learning, and assessment. The term OER was introduced at the 2002 UNESCO Forum on Open Courseware [34]. Some examples of OER include: (1) University curriculum and courses, video lectures and assignments; (2) Interactive simulations about a specific topic (e.g. mathematics, chemistry, etc.); (3) Digital textbooks that are supported with additional learning materials; (4) Lesson plans, worksheets and learning activities; and (5) Translations and adaptations of previously-published OER. Some well-known examples of OER [35] include Khan Academy, OpenStax CNX, Open Textbook Library, Curriki, and Wikimedia Commons. OpenCourseWare (OCW) [36] is a subset of OER. OCW refers to the free and open digital publication


of high-quality college and university level educational materials. Examples of OCW include MIT OCW, Johns Hopkins OCW and CORE (China Open Resources for Education).

D. SOCIAL MEDIA
Social media sites such as Twitter, Facebook and YouTube provide a platform for learners to share their educational experiences, emotions, concerns about the learning process and seek social support from peers. These digital data provide knowledge and perspectives for instructors to understand the student’s experiences outside the classroom environment. The data from social-based environments can provide valuable knowledge to inform on student learning and assist institutional decision-making on interventions for at-risk students, improve education quality and increase student retention and success [37]. The abundance and diversity of the social media data raises challenges for algorithms to capture the embedded information within the data.

E. LINKED DATA
Linked Data (LD) uses Internet technologies to create connections among data which may be stored in databases distributed across several geographic locations. LD extends the Web of Documents to a Web of Data, where data may be directly connected. LD principles and technologies are being investigated in various areas. Several studies aim to use LD to solve problems of interoperability of educational data and resources. The authors in [39] presented a systematic mapping of proposals which have been adopting Linked Data to support education objectives. The authors discussed the challenges and provided a research landscape of the area. Some notable projects in the LD area are the LinkedUp project, Linked Education Cloud, and mEducator. LD technologies have the potential to drive the development of applications in the LA and EDM areas. The work in [40] describes the Learning Analytics and Knowledge (LAK) dataset which contains a five-year collection of bibliographic resources about learning analytics and educational data mining. Other examples of works applying LD in LA can be found in [41] and [42]. The authors in [41] developed a metric to identify the relative ranking of universities worldwide based on educational Linked Data. The authors in [42] proposed using education and economic LD for analysis of school performance in Brazilian schools.

IV. DATA COLLECTION, MINING AND DATABASES IN EDUCATION
In Big education data, a variety of data is collected, stored and explored to unlock the value accrued from Big data. This section presents a literature review of previous works from three aspects: (1) Educational data collection; (2) Educational datasets; and (3) Educational data mining.

A. EDUCATIONAL DATA COLLECTION
Traditionally, educational researchers have been using methods such as surveys, interviews and classroom activities for data collection about student learning and experiences. Educational data can be collected at a rapid pace with the advance of online technologies (e.g. MOOC and LMS) which have the capability to track and collect a huge amount of educational data about learner experience. The Experience API (xAPI) [25] is an open data specification for data collection across learning tools. The authors in [43] use the xAPI standard to collect, track and store educational data retrieved from an e-learning environment called Kalboard 360. The tracked data is classified into three features (behavioral, demographic and academic background features). Another major source of educational data can be obtained from social media (e.g. blogs, online social networks, microblogs). It is challenging to collect social media data related to student learning experiences and behavior because of the variety and diversity of the language used. The authors in [37] performed data collection from Twitter using an educational account on a commercial social media monitoring tool.

B. EDUCATIONAL DATASETS
Educational datasets can be considered from two aspects [44]: (1) Datasets directly related to educational information containing educational resources, institutional data and educational indicators; and (2) Datasets from different domains which may be used in educational settings. Some examples of educational datasets are DBpedia [45], Freebase [46] and GeoNames [47]. The data in these educational datasets can be used for enriching the available educational content, discovery of new information which can help educational practices and connecting local datasets to the cloud. For example, the authors in [48] used DBpedia to analyze the ranking of universities based on their structured information, and the authors in [49] used the categories provided by DBpedia to select the suitable categories for describing learning objects. Examples of datasets from different domains which may be used in educational settings include TEDTalks [50] which contains various conferences on a wide range of topics. Examples of other datasets could be from different domains and fields such as agriculture, medicine and tourism. Examples of datasets cited in the agriculture field are organic.edunet, Agris, AGROVOC, ASFA and JITA. Some examples of datasets cited in the medical field are PubMed and mEducator. PubMed is a service of the US National Library of Medicine which includes citations from MEDLINE and other scientific journals in life sciences. The mEducator Linked Educational Resources dataset is intended to provide educational resources in a linked data format, and is focused on the medical field, covering content ranging from traditional teaching to open learning, and experimental studies. Another project cited by several studies in the education domain is LinkedUp, which has the objectives to collect and make available various types of data sources relevant for education, to provide a shared resource and to develop the community interested in the Web of Data for Education [51]. Other examples of university initiatives for linked datasets include the University of Southampton Open Data service,


the Greek University Open Data, and the Linking Italian University Statistics Project.

C. EDUCATIONAL DATA MINING
Data mining techniques are increasingly gaining significance in the education sector and the outcomes from data mining techniques can provide invaluable support for decision making. The field of data mining in education is termed Educational Data Mining (EDM). EDM is an emerging discipline that focuses on applying data mining tools and techniques to education related data. This section reviews the literature review and survey papers for EDM and highlights their main contributions. A recent literature review or survey paper can be found in [11]. This review presents twenty years of data mining research in e-learning environments, from an educational perspective. The authors identified and classified challenges for research to improve student learning performance. Another literature review paper by [19] published in 2019 focused on EDM and learning analytics in higher education. The work in this literature review covered four main areas: (1) computer-supported learning analytics (CSLA) and the use of DM techniques to derive actionable information based on student interaction in LMS environments; (2) computer-supported predictive analytics (CSPA) and the use of EDM and LA to predict student performance and retention in courses based on assessment, engagement and domain knowledge in a learning activity; (3) computer-supported behavioral analytics (CSBA) and the use of DM techniques to identify student behavioral patterns and preferences when participating in online learning activities; and (4) computer-supported visualization analytics (CSVA) and the combination of information visualization techniques with advances in data mining and knowledge representation to offer a visual analysis of student behavior with respect to the learning activity.

Other review papers on EDM for education can be found in the works by [12]–[18], [52], [53]. Table 2 shows a summary of the various surveys which have been proposed for EDM. The table gives various details including the year, survey objectives, and remarks and comments. The authors in [15] surveyed the history and applications of data mining techniques in the educational field for traditional educational system, web-based educational system, intelligent tutoring system, and e-learning. The authors discussed concepts for EDM such as prediction, clustering, relationship mining, outlier detection, text mining, and social network analysis. In [12], the authors aimed to highlight the main data mining techniques applied in the e-learning environment and proposed three useful orientations for EDM research: (1) Orientation towards students and using EDM to recommend activities, resources and learning tasks to learners based on the tasks already accomplished by the learner and their successes; (2) Orientation towards educators and using EDM to obtain objective feedback for instruction, evaluate the structure of the course content and its effectiveness on the learning process; and (3) Orientation towards academics and administrators and using EDM to set parameters to improve site efficiency and adapt it to the behavior of users.

The authors in [13] presented a systematic review on EDM focusing on clustering algorithms and their applicability and usability in the context of EDM. The authors term this approach, when applied to analyze datasets from educational systems, Educational Data Clustering (EDC). Different approaches for EDC were reviewed including 166 studies for e-learning and clustering, examination failure and clustering, intelligent tutor system and clustering, learning style and clustering, student modeling and clustering, student motivation and clustering, student profiling and clustering, etc. In [14], the authors performed a literature review focused on the different agents in the educational context such as students, educators, researchers, institutions, and managers. The survey reviewed DM techniques applied to education, and models to provide updated information and improve institutional efficiency. The review of techniques included forecast performance modelling, undesired behaviour detection, monitoring support, recommendation planning and scheduling, and intelligent tutoring. Other review works on EDM can be found in [16]–[18]. The literature review paper of [16] discussed an explanation of the DM techniques in order of relevance, tendencies, and limitations faced by e-learning environments. In [17], the authors introduced a new perspective on the individualization and interaction between the educational actors and highlighted the trends and challenges of EDM from the perspectives of educational actors. In [18], the authors discussed research results on behavior detection, personalization and student performance evaluation obtained by DM techniques such as clustering, classification, and regression. In the work by [52], the authors focused on detecting the students’ circumvention risks through predictive models and providing custom recommendations to students by identifying their needs and learning disabilities. The objectives were to present a literature review of EDM focused on student retention and evasion, recommendation systems and course administration. The work in [53] covered ubiquitous and pervasive data mining applied to education for fraud detection and identification of students that require special attention.

V. TECHNOLOGICAL ASPECTS FOR BIG EDUCATION DATA
In this section, some common platforms for Big data such as Hadoop, Spark and Samza will be discussed. Hadoop, Samza and Spark are currently the popular systems for Big data analysis. Hadoop is used for off-line and complex educational Big data processing, Samza is mainly used for streaming education data processing at high data rates and in large amounts, and Spark is often used for off-line rapid education Big data processing. The authors in [6] provided a general overview of Big data computing and discussed main characteristics such as data organization, decision-making, domain specific tools and platform tools. The authors illustrate the infrastructure that enables users to extract the maximum

116398 VOLUME 8, 2020

K. L.-M. Ang et al.: Big Educational Data & Analytics: Survey, Architecture and Challenges

benefit from the large amounts of data available. In our context of Big data in education, this section aims to give a literature review of the Big data architectures or frameworks specially proposed for education. Our focus is on architectures and frameworks for the higher education setting. Specific software tools for data analytics/Big data which are increasingly being used in education will also be discussed.

A. BIG DATA PLATFORMS

Big data can be handled on different platforms. Hadoop and
Spark are two commonly used platforms. In general, Apache
Spark is used to manage massive amounts of data and to
provide real-time analytics power.

1) HADOOP PLATFORM
Hadoop is an open-source distributed data processing infrastructure developed by the Apache Software Foundation. It enables distributed and parallel processing of large data sets across clusters of many computers. It features low cost, high efficiency, high reliability, high scalability, and high fault tolerance. Hadoop consists of the HDFS distributed file system, MapReduce and several general-purpose tools.

MapReduce: MapReduce is a paradigm of parallel programming across big datasets working with many computers (nodes). It supports the use of inexpensive computer clusters to perform distributed parallel computing on large datasets up to petabytes. The data can be in structured or unstructured forms (e.g. weblog records, e-commerce click trails, binary or multi-line records). It is mainly composed of two functions: (1) Map function; and (2) Reduce function. The Map function is responsible for processing standardized data whereas the Reduce function mainly summarizes the results after the Map function.

HDFS: HDFS is a distributed, scalable and portable filesystem for the Hadoop framework written in Java. HDFS stores large files (from gigabytes to terabytes) across many servers. HDFS provides unstructured data storage for Big data. HDFS is characterized by “write once read many times” and is very suitable for reading Big data. HDFS is a typical master-slave architecture. HDFS has the advantages of high fault tolerance and high scalability.

Hive: Hive is a data warehouse infrastructure built on top of Hadoop which provides summarization of data, query and analysis. Hive supports analysis of big datasets stored in HDFS, the Amazon S3 file system etc. It provides an SQL-like language called HiveQL and supports indexes.

NoSQL: NoSQL is a database system providing a mechanism for storage and retrieval of data that is less constrained than traditional SQL (relational) databases.

Hadoop Common: Hadoop Common provides Java libraries and utilities which are required by other Hadoop modules.

Mahout: Mahout is an open-source library based on Hadoop which implements many machine learning and data mining algorithms.

Other Business Intelligence (BI) Tools: Although the growth of Big data technology is huge, it does not mean the end of classical BI tools such as Cognos, QlikView and SPSS. The trend is for BI tools to work side by side with new Big data technologies.

Data Storage: NoSQL databases are inherently schema-less and highly scalable. These databases support frameworks like MapReduce, Dryad etc. for the parallel processing of large amounts of data. The paper by [54] investigated educational technology for Big data analysis and the exploration of the development trend for online education. The authors gathered data, attached importance to the basic function and value of education data, and explored the education technology that matches the Big data analysis. The work by [55] discussed the relationship between Big data and cloud computing, Big data storage systems and Apache Hadoop technology.

2) SPARK PLATFORM
Apache Spark is a distributed computing framework like MapReduce but maintains data in Resilient Distributed Datasets (RDDs). It is useful for algorithms that perform iterative operations and data flow processing. Spark provides Shark, an interactive query analyzer; Bagel, a high-volume graph processor and analyzer; Spark Streaming, a real-time analyzer; and MLlib, a machine learning library.

3) SAMZA PLATFORM
Samza is a distributed stream processing framework for real-time data processing. In Samza, the data stream is partitioned, and each partition is given a specific ID or offset. Samza places the storage and processing on the same machine and does not load additional memory while maintaining processing efficiency and providing a framework for a flexible pluggable API. Fig. 2 shows the Samza technology core architecture [56].

FIGURE 2. Samza technology core architecture [56].
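As a minimal illustration of the Map and Reduce functions described for the Hadoop platform above, the following single-process Python sketch counts activity records per student. It is not Hadoop code: the record format, the sample data and the names `map_fn`, `reduce_fn` and `run_mapreduce` are assumptions introduced only for this example, and the grouping step stands in for the shuffle that a real framework would perform across nodes.

```python
from collections import defaultdict

# Hypothetical weblog records of the form "student_id action" (illustrative only).
LOG_LINES = [
    "s1 login",
    "s2 login",
    "s1 view_lecture",
    "s1 submit_quiz",
    "s2 view_lecture",
]


def map_fn(line):
    """Map: turn one raw record into (key, value) pairs."""
    student_id, _action = line.split()
    yield student_id, 1


def reduce_fn(key, values):
    """Reduce: summarize all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Run mapper and reducer locally, grouping by key in between."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # local stand-in for the shuffle step
    return dict(reducer(key, values) for key, values in groups.items())


counts = run_mapreduce(LOG_LINES, map_fn, reduce_fn)
print(counts)  # {'s1': 3, 's2': 2}
```

In a cluster setting the same two functions would be applied to data blocks held on different nodes, which is why the Map step is kept free of shared state.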

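The partitioning of a data stream described for Samza above can be sketched in plain Python as follows. This is not the Samza API: the partition count, the toy hash in `partition_for` and the event tuples are assumptions made for illustration. The sketch only shows the general idea that events with the same key go to the same partition and that each event receives an increasing offset within its partition.

```python
from collections import defaultdict

NUM_PARTITIONS = 2  # assumed partition count for this illustration


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition deterministically from the message key (toy hash)."""
    return sum(key.encode("utf-8")) % num_partitions


def append_events(events):
    """Append (key, payload) events to partitions; offsets start at 0."""
    partitions = defaultdict(list)
    placed = []
    for key, payload in events:
        pid = partition_for(key)
        offset = len(partitions[pid])  # next free position in this partition
        partitions[pid].append((key, payload))
        placed.append((pid, offset, key, payload))
    return partitions, placed


# Hypothetical student activity events.
events = [("s1", "login"), ("s2", "login"), ("s1", "quiz"), ("s3", "login")]
partitions, placed = append_events(events)
for pid, offset, key, payload in placed:
    print(f"partition={pid} offset={offset} key={key} payload={payload}")
```

Because a consumer only needs to remember the last offset it processed in each partition, it can resume from that position after a restart.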

B. FRAMEWORKS AND ARCHITECTURES FOR BIG EDUCATION DATA
This section discusses several frameworks and architectures for Big education data. The authors in [1] proposed a distributed architecture for the information processing of Big education data. The authors use this architecture to predict student performance with and without sentiment analytics. Fig. 3 shows their proposed architecture which consists of three layers: (1) Data Access Layer; (2) Data Storage Layer; and (3) Data Processing Layer. The Data Access Layer comprises all the data sources the processing engine requires for the information processing (such as student logs, student records and historical data), and a student mobile application which can generate data based on a student’s activity. The data sources are connected to the Storage Layer (HBase) using Sqoop and the REST API-HBase Connector. The second layer is the Data Storage Layer which comprises HBase and the HDFS distributed storage. The third layer is the Processing Layer which performs the sentiment and predictive analytics. This layer uses the Spark cluster. In this layer, the features were transformed to the Spark Resilient Distributed Dataset (RDD) format to perform the predictive analytics. The predictive modeling procedures were performed via a process of ensemble modeling.

FIGURE 3. Distributed architecture for big education data [1].

The authors in [2] proposed an architecture termed Concept Definition for Big Data Architecture in the Education System. Their architecture consists of five layers: (1) Data Sources; (2) Big Data Processing; (3) Data Warehouse; (4) Data Mining Tools; and (5) Reporting. In the Data Sources layer, the data can be stored in traditional SQL databases (e.g. classical relational data) or NoSQL databases (e.g. data from social networks). The Big Data Processing layer uses Apache Hadoop to process the huge amount of data from the earlier layers. The third layer is the Data Warehouse layer which is technology and vendor independent for creating the data cubes. The Data Mining layer uses tools from IBM SPSS or SAS. The highest layer (Reporting Layer) performs the creation of useful analysis from the obtained data for different types of users (e.g. teachers, administrators, other stakeholders of the university). The Cognos software application from IBM could be used for the reporting functionality.

The authors in [3] proposed an architecture to analyze educational data from the Moodle system in the cloud using Apache Hadoop. Their cloud-based architecture consists of four stages: (1) Big Educational Data; (2) Data Collection; (3) Data Transport; and (4) Cloud Computing Infrastructure. The Big educational data are collected through the API or other interfaces and transported to the data storage with the use of the most suitable platform, tool or service. The data storage and data processing are performed in the cloud. The data-intensive computing framework is applied to analyze massive amounts of data to reveal the valuable information. The main contribution of this paper is the newly proposed model approach for processing big educational data generated from the Moodle system, which was also implemented and validated as an experimental architecture as shown in Fig. 4. The architecture was constructed based on open-source platforms, tools and services. The API was used to limit programming only to the computational tasks and data transfer from the Moodle system to the cloud. The experimental implementation of the proposed model approach was performed with the use of the following platforms: Apache Flume, Apache Hadoop and Hadoop Distributed File System (HDFS), Apache HBase, Apache Hive, Apache Sqoop and OpenStack.

FIGURE 4. Processing of big educational data in the cloud [3].

The authors in [4] presented a Big data architecture for education using Spark. As shown in Fig. 5, the various data are delivered to HDFS according to each attribute. The structured data is transferred from the RDBMS to HDFS using SQL-to-Hadoop (Sqoop). Among these collected data, lecture data is an important item, and the FP-Growth algorithm is performed using MLlib, the Spark machine learning library. The resulting data can be used to identify patterns of lecture


data that students have taken for the year and semester. These patterns include pattern information for students’ preferred lectures, and based on this, a recommendation system was implemented that recommends lectures to students. In addition, by using the data collected from the sensor information of the classroom attendance and the dormitory entrance information, it is possible to determine the population of the students, predict the density of the population and control the temperature of the classroom and buildings by pattern analysis.

FIGURE 5. Educational big data architecture using Spark [4].

The authors in [5] proposed an architecture for an E-Learning Big Data Ecosystem. It is composed of five modules as shown in Fig. 6: (1) Collection Module; (2) Transport Module; (3) Storage Module; (4) Computation Module; and (5) Service Module. The Collection Module contains collectors distributed in each layer. Each collector records the log data produced by different objects and normalizes the collected data. The Transport Module transfers the collected log data to where the data is required. The Storage Module includes two categories of storage systems. The first storage system is the Raw Data Storage system which stores historical data for future data mining and analysis. The second storage system is the Result Data Storage system which offers rapid access to data to provide high I/O performance for stream computing in the Computation Module and the Service Module. The Computation Module contains two computing frameworks. The first is the data-intensive computing framework which is applied to analyze massive raw data to dig out valuable information, and the second is the stream computing framework which is applied to deal with each incoming data item in real time. The computing frameworks are required to support parallel computing to guarantee low latency of the analysis process and improve computational efficiency. The Service Module reads the needed data from the Result Data Storage system for all objects and roles in each layer of the e-learning ecosystem.

FIGURE 6. E-learning big data ecosystem [5].

The authors in [6] proposed a Big data infrastructure deployed as a Hadoop platform in order to improve the education process. The platform is integrated with the Moodle learning management system (LMS) platform. The platform is deployed within the e-learning infrastructure of a laboratory. Fig. 7 shows the implemented Hadoop e-learning infrastructure. The Hadoop cluster contains three nodes (Master node, Slave 1 node, and Slave 2 node). The Hadoop cluster is also connected to the Email server, Moodle server, network data storage and SharePoint cluster through the Results server. The cluster communicates with other components using the TCP/IP protocol and all data is transferred through the Ethernet infrastructure.

FIGURE 7. Big data Hadoop infrastructure for education [6].

The authors in [7] proposed an architecture based on Apache’s Hadoop open-source distributed Big data computing architecture. It is used to process the Big data of Holland vocational interest theory. The core module is divided into two parts: (1) Hadoop Distributed File System (HDFS); and (2) Hadoop Parallel Programming Framework (MapReduce). The overall architecture of the system is composed of three layers: (1) Data Layer; (2) Logic Layer; and (3) Presentation Layer. The Data Layer supplies the basic data support for the entire system and stores the mass data of student behavior


data including teaching, education management, scientific research, campus life and so on. The Logic Layer is the core part of the whole system and is where the value of the data mining lies. The Presentation Layer provides a visual interface for users. The graphical data analysis interface can help users to perform the Holland analysis, curriculum optimization and student employment decisions. Other works on frameworks and platforms for Big education data can be found in [8]–[10]. The work in [8] used the Hadoop platform to conduct parallel mining of educational literature on Big data. The paper analyzed the main function of text mining technology, and combined Canopy and the k-means algorithm to analyze and research the educational Big data literature. The authors in [9] presented a framework for a Big data education system based on Hadoop. They examined the MapReduce system for the education system and the huge volumes of data were stored in HDFS. The authors in [10] provided a comparison of the Hadoop, Spark and Samza platforms, and presented an architecture of Spark for education.

VI. DATA ANALYTICS FOR BIG EDUCATION DATA
This section gives comprehensive discussions of data analytics for Big education data from two areas: (1) Predictive analytics; and (2) Learning analytics. A brief literature review of some emerging trends and opportunities in applications of Big data in educational data mining and learning analytics can be found in [57] and [58].

A. PREDICTIVE ANALYTICS (PA)
The prediction of how well a student or a group will perform on a learning task is one of the most popular and useful applications of educational predictive analytics. It can also be used to identify at-risk students who are likely to fail. However, this is a challenging problem to solve due to the large number of circumstances that can impact student performance, such as socioeconomic status, cultural background, demographic characteristics and psychological profile. This section gives discussions of predictive analytics from three application areas: (1) Student performance; (2) Dropout prediction and academic early warning systems; and (3) Courses selection.

1) STUDENT PERFORMANCE PREDICTION
The authors in [1] provide a discussion on Big data, learning analytics and the use of natural language processing (NLP) in higher education. They proposed an integrated analytics model with predictive analytics for student performance on their Big data architecture with data access, storage and processing layers. The architecture has been discussed in Section V. Their analytics model utilizes different types of data to predict student performance and support student progress. The authors incorporate the usage of sentiment analysis in their predictive analytics and employ a distributed technology system capable of supporting academic authorities and advisors at educational institutions in making decisions. Their experiment results showed that the features derived from unstructured data gave a 10% improvement in the accuracy of results compared with the traditional single predictive model. The authors in [59] proposed an approach using predictive analytics for e-learning with the Hadoop Big data platform. Their work used the decision tree classification approach (C4.5) in a Hadoop framework to predict student performance. The C4.5 algorithm was proposed because: (1) It is able to handle both discrete and continuous attributes; (2) It can process incomplete training data sets with missing values; and (3) Pruning can be done while constructing the trees to prevent the over-fitting problem. The work by [60] proposed a two-stage model, supported by data mining techniques, that uses the information available at the end of the first year of students’ academic career (path) to predict their overall academic performance. This study proposed to segment students based on the evidence of failure or high performance at the beginning of the degree program, and the students’ performance levels predicted by the model. A data set of 2459 students spanning the years from 2003 to 2015 from a European Engineering School of a public research University was used to validate the proposed methodology. The empirical results demonstrated the ability of the proposed model to predict the students’ performance level with an accuracy above 95%.

The ASSISTment [61] system designed by Worcester Polytechnic Institute and Carnegie Mellon University can tutor students and assess student learning at the same time. This system targets the problem that instructors wish to assist and assess at the same time in class. The system gives assessment results by predicting the student’s performance on standard tests given by official assessment systems such as MCAS (the Massachusetts Comprehensive Assessment System). It collects the student’s reaction information (such as accuracy, speed, the number of hints required and performance on sub-steps) and predicts the student’s performance based on a correlation model trained on data from past months and years. Since the students work on the system every week, the ASSISTment system can keep updating the value of metrics and provide increasingly accurate predictions. The authors in [62] developed a predictive model to forecast student performance in higher-level modules based on contextual factors. The authors analyzed data from 1037 students across various specializations, with different modes of study, age groups, genders and sponsors. The RapidMiner open-source tool for predictive analytics and visualization was chosen for the study. The outcome of the work showed that a negative correlation exists between age and academic performance, whereas a positive correlation exists between lower-level and higher-level modules.

Other examples of predictive analytics for student performance can be found in [63]–[70]. The authors in [63] used student information like attendance, class test, seminar and assignment marks collected from the student management system to predict the performance at the end of the semester. This paper investigated the accuracy of decision


tree techniques for predicting student performance. The work in [64] analyzed live video streaming together with the students’ online learning behaviors and their performance in their courses. The student participation and login frequency, as well as the number of chat messages and questions that they submitted to their instructors, were analyzed together with the students’ final grades. The results of the study showed a considerable variability in students’ questions and chat messages and revealed that combining EDM with traditional statistical analysis provides a strong and coherent analytical framework capable of enabling a deeper and richer understanding of students’ learning behaviors and experience. The authors in [65] explored the use of predictive modeling methods for identifying students in virtual learning environments (VLE) who will benefit most from tutor interventions. The methods discussed included decision-tree classification, support vector machine (SVM), general unary hypotheses automaton (GUHA), Bayesian networks, and linear and logistic regression. The methods were trialed through building and testing predictive models using data from several Open University (OU) modules. This work highlighted the importance of understanding how a student’s pattern of behavior changes during the course. The authors commented on two findings: (1) VLE activity is a useful data source to include for predicting student outcome but should not be viewed as an absolute measure of engagement but rather with reference to a student’s own past behavior; and (2) Feature selection has a big impact on the reliability of a model generated from the data regardless of which model type is chosen.

The work in [66] demonstrated how web usage mining can be applied in e-learning systems to predict the marks that university students will obtain in the final exam of a course. In this work, the authors developed a specific Moodle mining tool and compared the performance of different data mining techniques for classifying students. Several well-known classification methods were used such as statistical methods, decision trees, rule and fuzzy rule induction methods, and neural networks. The authors carried out several experiments using available and filtered data to try to obtain higher accuracy. The authors in [67] used predictive analytics to identify the factors influencing the performance of students in final examinations and found a suitable data mining algorithm to predict the grade of students. The authors designed a neural network (multilayer perceptron) tool using the .NET framework to predict the grade of the student when given the various parameters as input and achieved an accuracy of 72%, which showed the potential efficiency of the MLP algorithm. The obtained results from hypothesis testing showed that the type of school did not influence student performance, whereas the parents’ occupation played a major role in predicting grades. The work in [68] proposed an approach to predict student performance through genetic programming. The authors used participation indicators derived from activity theory as inputs into a Genetic Programming (GP) model to develop a student performance prediction model. Their GP model was able to build a prediction model without assuming any a priori structure of functions. The proposed GP model also provided instructors with individualized suggestions for students in any performance state (at-risk, just surviving, average or good) and increased students’ awareness.

The authors in [69] proposed an educational data mining (EDM) case study based on data collected from the learning management system (LMS) of the e-learning center and the electronic education system of Iran University of Science and Technology (IUST). The authors implemented a model to predict the GPA of graduated students. To achieve this, a common data mining methodology called CRISP was utilized. Their results showed that there can be confident models for predicting educational attributes. The work in [70] also used data mining as a predictive tool for performance improvement of engineering students. The authors applied the C4.5, ID3 and CART decision tree algorithms on engineering student data to predict their performance in the final exam. The authors showed that the outcome of the decision tree classifiers predicted the number of students who are likely to pass, fail or be promoted to the next year. Their results provided steps to improve the performance of the students who were predicted to fail or be promoted. The comparative analysis of the results also showed that the prediction helped the weaker students to improve and brought out better outcomes in the result.

2) DROPOUT PREDICTION AND ACADEMIC EARLY WARNING SYSTEMS
One of the biggest challenges every institution faces is how to improve student retention and reduce attrition. There could be several reasons for student attrition including academic issues (inadequate preparation, student disinterest with content or delivery method); motivational issues (low level of commitment to the institution, perceived irrelevance of the institution’s experience); psychosocial issues (social factors, emotional issues); and financial issues (inability to afford fees, perception that cost outweighs benefits) [71]. Two emerging areas to improve student retention and reduce attrition are (1) Dropout prediction; and (2) Development of academic early warning systems. Dropout prediction is one of the major research topics in learning analytics (LA) for Big education data. Dropout prediction is very useful to instructors, as it allows them to identify how likely a student is to drop out during the course. The instructor can make some adjustments during the teaching process to mitigate and reduce the likelihood (e.g. send email reminders or give positive feedback to students who have been identified to be very likely to drop out during the course).

Some examples of LA for dropout prediction can be found in the works by [72]–[80]. The authors in [72] investigated dropout prediction in massive open online courses (MOOC). The objective was to predict from the student behavior log data the likelihood of students dropping out from the MOOC in the next ten days. In this work, the authors collected data from 39 courses on the XuetangX platform which is one of the largest online learning platforms in China. The authors


used four supervised classification models (SVM, logistic regression, random forest and gradient boosting decision tree (GBDT)) to perform the dropout prediction task and achieved the highest classification accuracy of 88% with the GBDT. The work in [73] used machine learning (ML) techniques to demonstrate that categorizing student performance data and exercise sets were adequate parameters for identifying possible dropouts during a course. The authors used experimental data from a computer science course and showed that their ML techniques could provide automatic detection of student dropouts during the second week of the eight-week courses.

The work in [74] utilized educational data mining to analyze the factors affecting student academic performance which contributed towards student failure and dropout. The authors showed that their techniques enabled the identification of weak students shown to have poor performance. The authors in [75] used learning analytics to manage dropout rates based on a set of pedagogical actions in distance education courses and reported an average of 87% prediction accuracy and an average reduction of 11% in dropout rates. Other works for dropout prediction can be found in [76]–[80]. The authors in [77] conducted experiments using a dataset of 419 students to determine the best predictors of dropout at different stages in a course. The authors in [77] extracted features from student behavior from completed curriculum and applied machine learning algorithms to predict the dropout rate. The authors in [78] used data mining algorithms to predict student failure from high dimensional and imbalanced behavior data. A second emerging area in LA for Big education data is the development of academic early warning systems (AEWS). The objective of an AEWS is to discover and identify existing and potential academic problems of students in the early stages of education and inform students so that remedial actions can be taken to mitigate the risks. The authors in [81] proposed an AEWS based on Big education data collected from different departments of the university such as the academic affairs, library and other departments. The authors used principal component analysis (PCA) to locate the key predictors and utilized three machine learning algorithms to train and test their classifiers from their sample data. Their results showed that the naïve Bayesian algorithm gave the best accuracy rate of 86% for three-semester data and 85.4% for one-semester data.

3) COURSES SELECTION
This section focuses on the articles or works where learning analytics is used as a tool for courses selection. The authors in [82] proposed a system termed Degree Compass to be used by students who are not familiar with navigating their way through a degree program. The Degree Compass system uses data from hundreds of thousands of past students with the data of a particular student (course grades, standardized test scores, college transcript grades, etc.) to recommend courses in which the student is most likely to achieve the best grade and which also fit the student’s program of study. The system has been shown to be able to correctly distinguish if the student will get either an ABC grade or a DF grade with 92% accuracy. The authors in [83] proposed an approach for Big data analytics for predicting academic course preference using Hadoop and MapReduce. In their work, they derived preferable courses for pursuing training for students based on course combinations. The input dataset collected from students is split into various clusters and provided to the mapper that maps data to the output, which is represented as <key, value> pairs. The output obtained from the mapper is then combined in the combiner and sent to the reducer. The authors in [84] used educational data mining to develop educational models that predict how learning materials might be designed to fit the knowledge of the student.

B. LEARNING ANALYTICS
Learning Analytics (LA) is the collection and analysis of usage data associated with student learning. This section gives discussions of LA from five areas: (1) Collaborative and interactive learning; (2) Behavior learning; (3) Personalized learning; (4) Social network analytics; and (5) Learning and assessment analytics.

1) COLLABORATIVE & INTERACTIVE LEARNING
Collaborative analytics are commonly used to deal with issues related to providing instructional strategies that support and enhance the collaboration process among students who work together in small groups. A collaborative learning environment (CLE) aims to improve continuous and reciprocal student-educator interaction, cooperation towards knowledge construction, and knowledge and experience exchange to reach common goals. The work in [85] presented an empirical case study to investigate the impact of collaborative learning patterns on student achievements with educational data captured from a CLE platform. The authors analyzed the progress time series reflecting students’ contributions to an assignment to investigate different styles of collaboration. By comparing the collaborative learning patterns of the same groups in completing different assignments, the authors explored the pattern impact on the grades received as a result of teacher assessments of these assignments and identified the characteristic patterns that lead to better learning outcomes either in terms of quality or efficiency. The authors showed that continuous focus, self-reflection, live collaboration, and even distribution of workload and contributions were more likely to lead to more refined and coherent assignments, and consequently achieve better marks. A different approach was taken by the authors in [86], who proposed using student interaction to measure the effectiveness of collaboration in virtual learning environments (VLE). In this work, the user activity logs from the learning platform were used as the main tool for inferring learners’ activities to fit certain behaviors and preferences. The work by [87] examined the effects of learning analytics as supporting tools for instructors to guide

cooperating groups. Other examples of papers on collaborative and interactive learning for LA can be found in the works by [88]–[91].

2) BEHAVIOUR LEARNING

The concept of behavior learning is important for understanding student learning and evaluating student performance. The authors in [92] proposed searching for student behavioral patterns while accessing and browsing educational resources. In this work, the authors extracted behavioral patterns related to student interactions with the educational media. Their results demonstrated the usefulness of student perception and identified the trends regarding the use of educational media for learning. The authors in [93] developed an evaluation system for student learning through analysis of the factors that influence student behavior during media usage. The goal was to improve the evaluation method in order to improve the students’ behavior in relation to use of the learning media. To evaluate the level of student learning, the decision tree technique was used. The authors in [94] developed a system to explore and visualize generated data in virtual learning environments and analyzed these data using web-mining and statistical techniques to extract behavior patterns of the student. The authors in [95] grouped and analyzed access data in order to recognize behavior patterns (e.g. identify whether the instructions were inadequate or insufficient, or to identify visibility problems in the content posted) in order to review and organize the educational content. The authors in [96] presented a framework for analyzing student activity data in open-ended learning environments (OELE) that integrates model-driven behavior characterization and data-driven pattern discovery. The model-driven approach used linked task and strategy models to provide more precise interpretation of student activity sequences as learning and problem-solving strategies, while the pattern mining approach enables the identification of new variations of strategies and of gaps in the coverage of the current strategy model. Other examples of papers on behavior learning can be found in the works by [97], [98].

3) PERSONALIZED LEARNING

Personalized learning is aimed at customizing the learning journey of a student to maximize his/her learning potential and hence fulfill the goal of education and career with satisfaction and accomplishment. With the help of Big data technologies, learning can be made increasingly personalized, and instructors can watch learners and track which areas within a program of study they find challenging and spend most of their time on, the learning materials they revisit often, the sections they recommend to their peers, the learning styles they prefer, and the time of day they learn better [99]. With the emergence of various learning strategies such as micro-learning, multimedia learning and flipped classroom, learning personalization has been recognized as an effective and adaptable interface between the student and the knowledge to allow effective learning and knowledge transfer. For example, in micro-learning, information is delivered in small portions that are easy to learn effectively [100] and content can be delivered according to tailored knowledge composition patterns that are best retained by individual students. Personalized learning has been advocated as an effective approach that could be applied at different stages of the curriculum to ensure deep learning and leave students with knowledge absorbed more quickly and retained longer.

4) SOCIAL LEARNING AND NETWORK-BASED ANALYTICS

Social and network-based learning and analytics benefit from the utilization of technology to establish connections between students, instructors, communities and resources [101]. The use of EDM and LA for social network analysis has been reported to be associated with student learning and building knowledge in social and cultural settings to discover patterns of collaboration, assessment and communications. The work by [102] showed that by collecting data about user behavior, LA could be useful for providing recommendations about learning resources and activities. The work by [103] showed that mining students’ online social interaction was important for recommending appropriate learning partners in a web-based cooperative learning environment. Another work for EDM and LA, which aids educational decision makers by providing an environment to share and collaborate with other team members to take the appropriate actions for a given learning task, can be found in [104].

5) LEARNING & ASSESSMENT ANALYTICS USING EXPERIENCE API

The Experience API (xAPI) standard is a specification for learning technologies which can be used for data collection describing the wide range of experiences of the learner in the context of formal learning, informal learning and social learning [105]. The authors in [109] gave two classifications for research works using the xAPI specification in the context of learning analytics: (1) The first category deals with the deficiencies of the xAPI specification, such as limitations of learning interactions and inconsistency of learning behaviors across platforms, in addressing specific issues related to the learning context; and (2) The second category deals with tracking and analyzing the learning experience using the xAPI specification. The work by [43] used the xAPI standard to track educational data from an e-learning environment called Kalboard 360. The tracked data is classified into behavioral, demographic and academic background features, and three data mining techniques (ANN, naive Bayes and decision tree classifier) were employed to evaluate the impact of such features on student performance. The experimental results showed that there was a strong relationship between learner behaviors and their academic achievement. The authors in [107] proposed a 3D design activity stream for STEM education based on xAPI. The xAPI can describe learner experiences as active statements with eight attributes (UUID, ACTOR, VERB, OBJECT, RESULT, CONTEXT, TIMESTAMP and VERSION). For example, the specification <ACTOR, VERB,
OBJECT, CONTEXT> composes a simple activity flow. Experiments were carried out at the Li Jun School in China. The authors collected more than 22,000 data elements and showed that their xAPI could completely record the learning paths of students. Their results also showed that students had different operating habits and learning paths which provided the basis for the evaluation of students’ spatial thinking ability and engineering design skill in the interactive learning environment. The authors in [108] discussed some experiences and lessons learnt from implementing xAPI for projects in the Netherlands. The authors remarked on the need for a centralized approach for data collection to get a complete picture of student behavior which may be stored on many heterogeneous IT systems. Furthermore, the xAPI recipes need to be seen in their infrastructural context. An ETL (Extract Transform Load) layer with communal best practices encoded in the transforms and applied across the higher education sector can enforce the authoritative standard and decrease the overall costs.

The authors in [106] discussed a case study to show the suitability of using xAPI (Tin Can API) for self-regulated learning (SRL). The authors proposed an extension of xAPI for recording SRL-related actions termed xAPI-SRL. Their monitoring system had several steps: (1) Author – filter statements from the selected author; (2) SRL – filter SRL-related actions; (3) Time – select time window and organize records time-wise; (4) Object – filter or organize statements attending to the object; (5) Grouping and analysis – analyze groups of statements attending to how they relate to each other. A recent work by [109] explored the use of xAPI in learning analytics for MOOC environments which generated big assessment data (Big data) given the massive number of courses proposed and the high number of learners enrolled. These assessment data must be tracked, processed and analyzed as the learning data. The authors in [110] commented that assessment analytics has the potential to make valuable contributions to the field of learning analytics by extending its scope and increasing its usefulness. The authors also state that the role that assessment analytics could play in the learning process is significant and yet it is underdeveloped and underexplored.

C. RECOMMENDATION SYSTEMS

A recommendation system or recommender is an information filtering system that seeks to predict the rating or preference a user would give to an item. These systems have been very helpful in applications such as e-commerce (e.g. Amazon), entertainment (e.g. Netflix, YouTube and Spotify), service industries, and social media platforms (e.g. Twitter and Facebook). Recently, recommender systems have gained popularity in the education sector to generate various kinds of recommendations for learning institutions, instructors and students. This sub-section explores recommendation systems for Big data in education. The various recommendation techniques can be broadly categorized into four types [111]: (1) Collaborative-based filtering; (2) Content-based filtering; (3) Knowledge-based systems; and (4) Hybrid-based systems. In collaborative-based filtering systems, an item will be recommended to the user based on the preference of other similar users for the same item. The users whose past preferences correlate most strongly with those of the target user are identified as nearest neighbors, and the score of a new item is predicted from the scores given by these nearest neighbors. The correlation or log-likelihood ratio measures can be used to identify preferred items for the user. Content-based filtering recommender systems utilize a series of discrete and pre-tagged characteristics of an item in order to recommend additional items which have similar properties. Content-based recommendation systems find items of interest for users by analyzing item descriptions. These systems generate lists of item profiles for the users based on the data provided by users. They use two metrics called term frequency (TF) and inverse document frequency (IDF). The TF measures how many times a term occurs in a document, whereas the IDF reflects how rare, and hence how informative, the term is across documents. The product TF×IDF is used to weight the importance of the term. Knowledge-based recommendation systems are based upon the knowledge of a user’s need for an item and can therefore reason about the relationship between a need and a possible recommendation. Knowledge about the user’s needs, preferences, etc. is used to perform the recommendation. Current recommender systems typically combine two or more approaches into a hybrid recommendation system to improve the recommendation accuracy. Examples of recommendation systems for educational data can be found in [112]–[119].

For course recommendation in MOOCs specifically, some approaches such as collaborative filtering, content-based filtering and hybrid recommendation systems can be found in [113]–[115], [116]. The authors in [113] proposed a systematic methodology for recommending personalized courses while considering the sequence of the learning curriculum. In their system, they considered a measurable context space with a Lipschitz condition, where the space is divided into many subspaces to represent different types of students. The course clusters are defined to capture the prerequisite dependencies among courses. Their dataset is composed of three parts: (1) Data of courses; (2) Context information of the students; and (3) Feedback reward records. The course data was obtained from the biggest MOOC platform in China called ‘iCourse’ which contains nearly all the Chinese online courses. The context information was collected from 4939 anonymized students in Huazhong University of Science and Technology and Central China Normal University (∼20,000 learning records). The reward records are the scores of courses and the degree of satisfaction. The authors in [114] proposed a Big data solution on the Hadoop platform for recommendation of pedagogical documents that meet the identified needs of the learner. This system will be established by using Big data as a tool to analyze the performance and skill level of students individually and then create personalized learning experiences that fit into their specific learning paths. The authors used a semantic approach which recommends learning objects by comparing the textual contents of resources
that form a corpus of pedagogical documents and proposed an algorithm for similarity measurement between the document viewed by the learner and the documents in the available corpus of pedagogical documents in order to select those which are most similar to the viewed document. Their work was implemented and tested on the Hadoop Big data platform. For the implementation of the recommendation algorithm, modules were coded in Python using the scikit-learn and NLTK Python packages. For parallelization, MapReduce was leveraged to process the data stored in Google File System (GFS). The authors in [115] also designed and implemented a personalized recommendation system on a Big data platform. Their system can help people to automatically extract interesting and valuable information from target data. A personalized education resource recommendation system which can handle Big data is studied and implemented. The results showed that the personalized recommendation system of educational resources based on Big data has been put into use in a university network and achieved the expected design goal. This system, combining the discipline classification tree and the recommended structure, provides resilient processing ability as the data increases, and a personalized recommendation function based on the security, high efficiency and real-time capability of Big data. It provides effective help for the students and teachers to make use of the valuable teaching resources. However, when they evaluated their recommendation algorithm, the MovieLens dataset (not educational data) was used to verify the performance.

Other educational recommendation systems can be found in [116]–[119]. The authors in [116] built a personalized English learning recommender system for students to set basic score of lessons. The collaborative filtering technique and a content-based method were used. Another work [117] developed a recommender system for predicting student performance. Their approach mapped educational data to user/item. The matrix factorization technique was used to generate the recommendation and logistic regression to validate their approach. An automated recommender system for course selection can be found in [118]. The collaborative recommendation technique was used to recommend elective courses to students by using association rule mining to generate course association rules. The authors in [119] built a semantic educational recommender system in formal e-learning scenarios. They used a conceptual approach which can be used as a personalized recommender in e-learning scenarios in their work. Other examples of earlier works on recommendation systems for e-learning can be found in [120]–[126]. The Recommendation Agent for e-learning systems is one of the first collaborative filtering educational recommendation systems that have been established [120].

D. GRAPH ANALYTICS

Graph analytics can be used to determine the strength and direction of relationships between objects in a graph. This section discusses some research works to address challenges of Big data from online education data using graph analysis. Some examples of using graph-based analytics and machine learning to address challenges and opportunities for education can be found in the works by [127]–[129]. The authors in [127] used observed prerequisite relations among courses to learn a directed universal concept graph and used the induced graph to predict unobserved prerequisite relations among a broader range of courses. This is particularly useful to infer prerequisite relations among courses from different providers (e.g. universities, MOOCs). The authors proposed a new framework called Concept Graph Learning (CGL) for inference within and across two graphs at the course level and at the induced concept level. The explicit learning of the directed graph for universal concepts is the key part of the framework. Once the concept graph is learned, it could be used to predict unobserved prerequisite relations among different courses including those not in the training set and from multiple sources. Their experiments showed promising results for cross-university settings. The universal transferability is particularly desirable in MOOC environments where courses are offered by different universities and instructors.

The authors in [128] addressed the graph analysis problem in multi-source relational learning for educational data. When the numbers of nodes in multiple graphs are large, the labeled training instances are extremely sparse. Existing methods such as tensor factorization or tensor kernel machines do not work well because of the lack of convex formulation for the optimization, the poor scalability of the algorithms in handling combinatorial numbers of tuples and the non-transductive nature of the learning methods which limits their ability to leverage unlabeled data in training. The authors proposed a Cross-graph Relational Learning (CGRL) approach for predicting the strengths or labels of multi-relational tuples of heterogeneous object types. They formulated the CGRL as a convex optimization problem which enables transductive learning using both labeled and unlabeled tuples and proposed a scalable algorithm that guarantees the optimal solution and enjoys a linear time complexity with respect to the sizes of input graphs. The authors conducted the experiments on 34,340 DBLP publication records in the domain of Artificial Intelligence. Tuples in the form of (Author, Paper, Venue) were extracted from the publication records leading to 15,514 tuples (cross-graph interactions) after preprocessing. The authors showed that their proposed method successfully scaled to the large cross-graph inference problem, and outperformed other representative approaches significantly. A recent work on graph analytics by [129] presented the early detection of learning outcomes in online short courses via learner behaviors. Through evaluation on data captured from three two-week courses hosted through delivery platforms, the authors made three key observations: (1) Behavioral data contains signals predictive of learning outcomes in short courses (with classifiers achieving AUCs ≥ 0.8 after the two weeks); (2) Early detection is possible within the first week (AUCs ≥ 0.7 with the first week of data); and (3) Content features have an “earliest” detection
capability (with higher AUC in the first few days), while the SLN features become the more predictive set over time as the network matures. They also discuss how their method can generate behavioral analytics for instructors.

E. VISUAL ANALYTICS

Visual analytics (VA) focuses on analytical reasoning facilitated by interactive visual interfaces and scientific visualization. This section reviews and discusses applications of VA in Big education data. The authors in [130] presented a systematic review of the emerging field of visual learning analytics of educational data. The authors found that: (1) Few works have been done to bring visual learning analytics tools into classroom settings; (2) Few studies have considered the background information from the students such as demographics or prior performance; (3) Traditional statistical visualization techniques such as bar plots and scatter plots are still commonly used in learning analytics contexts; and (4) While some studies employ sophisticated visualizations, there is a lack of studies that employ sophisticated visualizations and engage deeply with educational theories. Two other studies for visual data mining can be found in [131] and [132]. The use of VA methods can help turn the features of education into a visible type of representation, with the ability of being seen and interpreted by means of a variety of diagrams, charts, tables, infographics and other forms of visual factors [133]. For example, the activities have a geolocation characteristic which can be projected onto a map, while the resources of knowledge can also be converted into the map. A map-based management and visual analysis method will largely benefit the users and the researchers in taking advantage of the Big data in education.

The authors in [134] proposed a novel map-based method to manage and analyze the mobile learning in Big education data. They retrieved the geographic location information from the GPS for the activities of participants recorded by the mobile learning systems and projected the data onto a map with a geographic reference and projection parameters. The layers of the newly generated map can be subsequently integrated with an open map service like Google Map or Baidu Map. The learning activities and resources can be described as points, lines or polygons in vector form on the map. The map-based representations provide new methods (e.g. map browsing) to perform exploration of learning practices. With their approach, the activities of users scattered among the space are reorganized on a geographic map with location changes in time series, and the resources are geo-tagged with the information from the developers or adopters, which are converted to a map style according to their hierarchical structures. The authors performed experiments using mobile learning data from the platform named M-starC of Central China Normal University (CCNU), which allows participants to use a mobile learning application for the access of the learning resources. Their experiment aimed to analyze the personal learning patterns. Classes were obtained from the data using the k-means method. The clusters revealed the spatial patterns of the individual who learned during the test duration. They also analyzed the group learning patterns from mobile learners and the location distribution.

The authors in [135] developed a novel approach, Be the Data, which exploits embodiment in visual analytics to invoke experiential learning. The authors designed and proposed a visual analytics approach to teach students about exploring alternative two-dimensional (2D) projections of high-dimensional data points using weighted multi-dimensional scaling. In their approach, each student embodies a data point, and the position of students in a physical space represents a 2D projection of the high-dimensional data. Students physically move within the room with respect to each other to collaboratively construct alternative projections and receive visual feedback about relevant data dimensions. The approach exploits a large interactive room called the Cube and includes a large overhead display, a vision-based motion tracking system, and a software system for direct manipulation of high-dimensional data. To use the system, a group of students enter the Cube and embody virtual data points by wearing trackable hats which detect the locations of students in real time. Their experimental findings indicate that the Be the Data approach provided the engagement to enable students to quickly learn about high-dimensional data and analysis processes despite their minimal prior knowledge. They identified student data analytical strategies that employ this form of embodiment and found both qualitative and quantitative evidence of student improvement in understanding high-dimensional data. Visual analytics approaches can also be usefully employed in MOOCs. For example, VisMOOC [136] is an interactive visual analytics system which can analyze video clickstream data by using a seeking diagram, PeakVizor [137] uses a correlation view and a flow view to uncover spatial and temporal information of peaks in video clickstreams from MOOCs, and the DropoutSeer system [138] uses a timeline view with stacked timelines and glyphs to uncover the participants’ learning activities and patterns, which can also predict dropout.

F. IMMERSIVE LEARNING & ANALYTICS

The emergence of immersive learning approaches enabled by virtual reality (VR) technologies has given instructors and educators more flexibility and tools in designing active-based learning environments. Immersive learning techniques use computer graphics and human-computer interaction technologies to create simulated virtual worlds in which student learning can take place by employing suitable pedagogical approaches to create virtual worlds where learners could learn collaboratively [139], [140]. For example, the Second Life virtual world enables learners to create avatars in the virtual world for interaction with virtual objects and virtual environments [141]. Compared to traditional learning environments, immersive learning environments allow learners to explore problems and experience solutions in the virtual environment through experiential learning. The authors in [142] proposed an empirical study of designing and evaluating
an immersive learning experience for a MOOC termed the VirtualHK MOOC. The authors’ work showed that an immersive learning experience may not directly impact the knowledge gain of learning but can improve the overall learning experience by better motivating learners and making the learning more enjoyable. The student feedback and sentiment analysis showed that 52.73% of the learners gave “positive” comments and 47.27% gave “neutral” comments for the immersive learning experience.

G. SOCIAL MEDIA ANALYTICS

Student interactions and informal conversations on social media (e.g. Twitter, Facebook) give useful insights into their educational experiences, emotions and concerns about the learning process. However, data collection and analytics from social media data can be challenging due to its complexity. The collection of social media data has been presented in the previous section (Section IV). Normally, the student learning experiences acquired from social media content would require human interpretation. However, the growing scale of data volume and variety demands automatic data analytics techniques. This section gives a brief overview of mining social media data such as Twitter, followed by the inductive content analysis which is frequently used in social media analytics, and prominent themes. The previous section (Section IV) only presents the reviews of education data mining research. Here we give some examples of studies on Twitter from the fields of data mining, machine learning and natural language processing for education models and algorithms. The authors in [37] presented a work on mining social media data for understanding student learning experience from Twitter posts at Purdue University. The authors conducted a qualitative analysis taken from 25,000 tweets from engineering students and implemented a classification algorithm for tweets reflecting the students’ problems. Their work presented a methodology that showed how data from social media can be used to provide insight into student learning experiences. The proliferation of multimedia technology in social learning spaces allows student emotions and sentiments to be captured and automatically classified from audio-visual devices such as web cameras and microphones [149].

VII. CHALLENGES FOR BIG DATA IN EDUCATION AND LEARNING ANALYTICS

This section presents challenges for Big data in education and learning analytics from two perspectives: (1) social challenges; and (2) technological challenges. The technological and practical challenges are illustrated by giving an example for utilizing graph analytics for a university-based learning analytics scenario.

A. SOCIAL CHALLENGES

As in many fields where large amounts of data are being collected, there are also several important social challenges including privacy, ethical, security and safety issues to be addressed for Big education data. The authors in [144] considered a scenario where learning analytics (LA) is used to track students, and a student’s flagged performance is used in institutional decision-making to deny that student access to future education programs based on preconceived ability, leading to unintended outcomes. The authors in [145] remarked that LA presents significant student privacy challenges for higher education institutions. In their work, the authors also posited four propositions that LA must justify in relation to the use of student data: (1) LA systems should provide controls for differential access to private student data; (2) Institutions must be able to justify their data collection using specific criteria; (3) The actual or perceived positive consequences of LA may not be equally beneficial for all students. A full accounting is required of how benefits are distributed between institutions and students and among students; and (4) Students should be made aware of collection and use of their data and permitted reasonable choices regarding collection and use of that data. The authors in [146] remarked that privacy and data protection are major stumbling blocks for a data-driven educational future. In this work, the authors proposed three principles to guide the practical deployment of LA and Big education data systems: (1) Privacy and data protection in LA are achieved by negotiating data sharing with each student; (2) How the educational institution will use data and act upon the insights of analysis should be clarified in close dialogue with the students; and (3) In negotiating privacy and data protection measures with students, schools and universities should use this opportunity to strengthen their personal data literacies.

B. TECHNOLOGICAL CHALLENGES

There are several technological opportunities and challenges for employing Big data in education and learning analytics due to the large and increasing amounts of online education data. As discussed in Section V, Big education systems would require access to a high-performance computational infrastructure which can handle a large amount of data for capture, storage, processing and visualization. There are also several issues and considerations for practical deployment of Big education data systems due to lack of interoperability of institutional data systems and different forms of data storage in disparate databases [143]. The absence of cross-institutional policies for data sharing and integration creates another major challenge to be addressed for Big education systems [38]. To illustrate some technological challenges and the usefulness and potential of exploiting cross-institutional Big education data, we performed an investigation for practical deployment of a Big education data system across some institutions in Australia. The objective of the system is to detect the unobserved prerequisite dependencies among online courses for different universities in Australia. This system would be useful for students to infer prerequisite relations among courses from different providers (e.g. universities, MOOCs) to chart their learning pathways.

Our approach is based on graph-based analytics like the techniques proposed by [127], [128]. The graph-based

TABLE 3. Data statistics for crawled university subject data.
TABLE 4. Performance using MAP for cross-institution subject data.
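As an illustration of the kind of subject-description preprocessing that yields key-word counts such as those summarized in Table 3, the following minimal Python sketch lowercases, tokenizes, stems and filters text, and then builds Bag of Words count vectors. It is not the authors' code: the stop-word list, the `crude_stem` suffix stripper, the function names and the sample subject descriptions are all hypothetical stand-ins. Only the `min_count=2` threshold mirrors the stated removal of words with a training set frequency of 1.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"and", "is", "the", "a", "an", "of", "to", "in", "for", "this"}


def crude_stem(token):
    """Strip a few common English suffixes (an assumed stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean(text):
    """Lowercase, tokenize on letters only (which drops symbols), remove stop words, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def bag_of_words(descriptions, min_count=2):
    """Return (vocabulary, count vectors), dropping words seen fewer than min_count times."""
    cleaned = [clean(d) for d in descriptions]
    totals = Counter(tok for doc in cleaned for tok in doc)
    vocab = sorted(w for w, c in totals.items() if c >= min_count)
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in cleaned:
        vec = [0] * len(vocab)
        for tok in doc:
            if tok in index:
                vec[index[tok]] += 1
        vectors.append(vec)
    return vocab, vectors


# Made-up subject descriptions, used only to exercise the functions above.
subjects = [
    "Introduction to programming and algorithms.",
    "Algorithms and data structures; programming in Python.",
    "Statistics for data analysis.",
]
vocab, vectors = bag_of_words(subjects)
```

Each row of `vectors` is the count vector of one subject description over the shared vocabulary. Merging the vocabularies of several universities, as the "Total key words" field does, would amount to calling `bag_of_words` on the concatenated descriptions.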

TABLE 5. Performance using AUC for cross-institution subject data.
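Tables 4 and 5 report MAP and AUC. As a hedged sketch of how these two ranking metrics can be computed for ranked lists of candidate prerequisite links, the Python below implements average precision, its mean over several lists, and a pairwise form of AUC. The function names and the toy ranking are assumptions made for illustration, the rankings are assumed to contain no ties, and this is not the evaluation code used to produce the tables.

```python
def average_precision(labels_ranked):
    """AP for one ranked list; labels_ranked[i] is 1 if the item at rank i+1 is a true link."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels_ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this true positive's rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """MAP: the mean of AP over several ranked lists (e.g. one list per subject)."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


def auc(labels_ranked):
    """Fraction of (positive, negative) pairs in which the positive is ranked first."""
    pos = sum(labels_ranked)
    neg = len(labels_ranked) - pos
    if pos == 0 or neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    negatives_seen = 0
    wrong_pairs = 0
    for label in labels_ranked:
        if label:
            wrong_pairs += negatives_seen  # negatives ranked above this positive
        else:
            negatives_seen += 1
    return 1.0 - wrong_pairs / (pos * neg)


# Toy ranking (hypothetical): ten candidate links, the only true link sits at rank 3.
toy = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_ap = average_precision(toy)
toy_auc = auc(toy)
```

For this toy ranking the AP is 1/3 while the AUC is 7/9, which shows how the same ranking can score well on AUC and poorly on MAP: AUC counts every correctly ordered positive-negative pair equally, whereas AP is dominated by how close to the top the true positives appear.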

analytics approach was selected due to its effectiveness for cross-university transfer learning, where courses may come from different providers and institutions. For an initial investigation, we collected data from three universities in Australia, namely the Australian National University (ANU), the Australian Catholic University (ACU), and Bond University (BU), using regular web scraping techniques in Python on the respective university subject data available on the Internet. One challenge in the data collection process was scraping the dynamically generated subject data from ANU, for which we used Selenium. Another challenge was cleaning the raw data. We used standard text preprocessing methods to remove stop words (e.g. "and", "is", "the") and rare words with a training set frequency of 1. The raw data was cleaned in four steps: (1) conversion of data to lowercase; (2) tokenization; (3) word stemming; and (4) removal of stop words and symbols. The Bag of Words (BoW) model was used to represent the cleaned data; the extracted data statistics and total key words for the crawled university subject data are shown in Table 3. The BoW approach is a representation technique originating from Natural Language Processing (NLP) which is commonly used to extract features from text documents and other objects [150]. The "Subjects" field lists the number of subjects in each university. The "Prerequisites" field shows the number of dependencies among subjects in the university. The "Key Words" field gives the number of key words extracted from the "Subject description" in each university. The "Total key words" field shows the number of key words after merging the key words from the three universities. The generated links (Prerequisites) and the BoW model were imported into Matlab via libSVM as inputs into the graph-based algorithms. Two metrics, Mean Average Precision (MAP) and Area Under the Curve (AUC), were used to evaluate the performance of the algorithms. The experiments were carried out on a workstation with an Intel i7-6800K CPU and 32 GB RAM under Ubuntu 18.04 LTS. Table 4 and Table 5 show the performance of the graph-based analytics among the three universities. The resulting subject data were split into training sets and test sets.
The models were trained using the dataset from one university, and then tested using the dataset from a different university. Some observations can be made from the results in Table 4 and Table 5. The AUC scores are higher than the MAP scores for the within-university performance (shown as the diagonals in the tables). On the one hand, AUC gives equal weight to all predicted true positives, so different rankings that reach the same goal may be evaluated as giving similar performance. On the other hand, MAP rewards ranked lists that place true positives above false positives, so the MAP score is low when true positives fail to reach the higher ranks. The results in Table 4 indicate that true positives were not always ranked highest. For cross-university performance, the AUC scores were still higher than the MAP scores, but lower than the within-university AUC scores. The cross-university MAP scores were likewise lower than the within-university MAP scores. Cross-university performance was lower than within-university performance because different key words (labels) were used as inputs.

VIII. CONCLUSION
This paper has presented a comprehensive survey of research works on Big education data, including the data sources, data collection, technological aspects, data analytics and challenges. The different sources for input into Big education data systems have also been discussed, including learning management systems (LMS), open educational resources (OER), MOOCs, social media and linked data. A classification of the various approaches for analytics has also been given, which includes predictive analytics, learning analytics (collaborative/interactive learning, behaviour learning, personalized learning, and social learning), recommendation systems, graph analytics, visual analytics, social media analytics, and immersive learning and analytics. The paper has also discussed social (privacy and ethical issues) and technological challenges for Big education data to be addressed in future research. Investigations for a cross-institution learning analytics scenario have also been given to illustrate the usefulness and technological challenges of practical deployment of Big education systems. The research area of Big education data is constantly evolving and, amongst other sources, readers can refer to learning forums such as LAK (Learning Analytics & Knowledge Conference),


Learning@Scale and AIED (Artificial Intelligence in Education), and periodicals such as JEDM (Journal of Educational Data Mining) and IEEE Transactions on Learning Technologies, for the latest research.

REFERENCES
[1] A. S. Alblawi and A. A. Alhamed, "Big data and learning analytics in higher education: Demystifying variety, acquisition, storage, NLP and analytics," in Proc. IEEE Conf. Big Data Analytics (ICBDA), Nov. 2017, pp. 124–129.
[2] P. Michalik, J. Stofa, and I. Zolotova, "Concept definition for big data architecture in the education system," in Proc. IEEE 12th Int. Symp. Appl. Mach. Intell. Informat. (SAMI), Jan. 2014, pp. 331–334.
[3] R. Machova, J. Komarkova, and M. Lnenicka, "Processing of big educational data in the cloud using apache Hadoop," in Proc. Int. Conf. Inf. Soc. (i-Soc.), Oct. 2016, pp. 46–49.
[4] M.-S. Lee, E. Kim, C.-S. Nam, and D.-R. Shin, "Design of educational big data application using spark," in Proc. 19th Int. Conf. Adv. Commun. Technol. (ICACT), Feb. 2017, pp. 355–357.
[5] Q. Zheng, H. He, T. Ma, N. Xue, B. Li, and B. Dong, "Big log analysis for E-Learning ecosystem," in Proc. IEEE 11th Int. Conf. e-Bus. Eng., Nov. 2014, pp. 258–263.
[6] D. Marjanovic, M. Milovanovic, and B. Radenkovic, "Hadoop infrastructure for education," in Proc. 14th Int. Symp. New Bus. Models Sustain. Competitiveness, 2014, pp. 365–370.
[7] C. Zhenyu, "The application of big data in higher vocational education based on holland vocational interest theory," in Proc. Int. Conf. Ind. Informat. Comput. Technol., Intell. Technol., Ind. Inf. Integr. (ICIICII), Dec. 2017, pp. 37–40.
[8] H. Wang, Q. Wang, and W. Wang, "Text mining for educational literature on big data with Hadoop," in Proc. IEEE Int. Conf. Smart Cloud (SmartCloud), Sep. 2018, pp. 166–170.
[9] R. Swathi, N. P. Kumar, L. Kirankranth, L. S. Madhav, and R. Seshadri, "Systematic approach on big data analytics in education systems," in Proc. Int. Conf. Intell. Comput. Control Syst. (ICICCS), Jun. 2017, pp. 420–423.
[10] J. Chen, J. Tang, Q. Jiang, Y. Wang, C. Tao, X. Zhang, and J. Liao, "Research on architecture of education big data analysis system," in Proc. IEEE 2nd Int. Conf. Big Data Anal. (ICBDA), Mar. 2017, pp. 601–605.
[11] M. W. Rodrigues, S. Isotani, and L. E. Zárate, "Educational data mining: A review of evaluation process in the e-learning," Telematics Informat., vol. 35, no. 6, pp. 1701–1717, Sep. 2018.
[12] C. Romero and S. Ventura, "Educational data mining: A survey from 1995 to 2005," Expert Syst. Appl., vol. 33, no. 1, pp. 135–146, Jul. 2007.
[13] A. Dutt, M. A. Ismail, and T. Herawan, "A systematic review on educational data mining," IEEE Access, vol. 5, pp. 15991–16005, 2017.
[14] C. Romero and S. Ventura, "Educational data mining: A review of the state of the art," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 6, pp. 601–618, Nov. 2010.
[15] R. Sachin and M. Vijay, "A survey and future vision of data mining in educational field, advanced computing communication technologies (ACCT)," in Proc. 2nd Int. Conf., Jan. 2012, pp. 96–100.
[16] S. K. Mohamad and Z. Tasir, "Educational data mining: A review," in Proc. 9th Int. Conf. Cognit. Sci., vol. 97, pp. 320–324, Nov. 2013.
[17] R. Jindal and M. D. Borah, "A survey on educational data mining and research trends," Int. J. Database Manage. Syst., vol. 5, no. 3, pp. 53–73, Jun. 2013.
[18] A. Peña-Ayala, "Educational data mining: A survey and a data mining-based analysis of recent works," Expert Syst. Appl., vol. 41, no. 4, pp. 1432–1462, Mar. 2014.
[19] H. Aldowah, H. Al-Samarraie, and W. M. Fauzy, "Educational data mining and learning analytics for 21st century higher education: A review and synthesis," Telematics Informat., vol. 37, pp. 13–49, Apr. 2019.
[20] O. Viberg, M. Hatakka, O. Bälter, and A. Mavroudi, "The current landscape of learning analytics in higher education," Comput. Hum. Behav., vol. 89, pp. 98–110, Dec. 2018.
[21] A. Peña-Ayala, "Learning analytics: A glance of evolution, status, and trends according to a proposed taxonomy," Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 8, no. 3, May 2018, Art. no. e1243.
[22] R. Ferguson and D. Clow, "Where is the evidence?: A call to action for learning analytics," in Proc. 7th Int. Learn. Analytics Knowl. Conf., Mar. 2017, pp. 56–65.
[23] P. Leitner, M. Khalil, and M. Ebner, "Learning analytics in higher education—A literature review," in Learning Analytics: Fundaments, Applications, and Trends (Studies in Systems, Decision and Control), vol. 94, A. Peña-Ayala, Ed. Cham, Switzerland: Springer, 2017, pp. 1–23.
[24] Z. K. Papamitsiou and A. A. Economides, "Learning analytics and educational data mining in practice: A systematic literature review of empirical evidence," Educ. Technol. Soc., vol. 17, no. 4, pp. 49–64, Oct. 2014.
[25] L. K. Chew, "Using xAPI and learning analytics in education," in Elearning Forum Asia, 2016, pp. 13–15.
[26] O. Bohl, J. Scheuhase, R. Sengler, and U. Winand, "The sharable content object reference model (SCORM)—A critical review," in Proc. Int. Conf. Comput. Edu., Dec. 2002, pp. 950–951.
[27] J. P. Leal and R. Queirós, "Using the learning tools interoperability framework for LMS integration in service oriented architectures," Technol. Enhanced Learn. Tech-Educ., to be published.
[28] M. Dougiamas and P. Taylor, "Moodle: Using learning communities to create an open source course management system," in Proc. EdMedia+ Innovate Learn., Assoc. Advancement Comput. Educ. (AACE), 2003, pp. 171–178.
[29] The Top Open Source Learning Management Systems. Accessed: Feb. 2020. [Online]. Available: https://elearningindustry.com/top-open-source-learning-management-systems
[30] M. H. Mohamed and M. Hammond, "MOOCs: A differentiation by pedagogy, content and assessment," Int. J. Inf. Learn. Technol., vol. 35, no. 1, pp. 2–11, Jan. 2018.
[31] A. Agrawal, A. Kumar, and P. Agrawal, "Massive open online courses: EdX. org, Coursera. com and NPTEL, a comparative study based on usage statistics and features with special reference to India," INFLIBNET Centre, Tech. Rep., 2015.
[32] S. I. El Ahrache, H. Badir, Y. Tabaa, and A. Medouri, "Massive open online courses: A new dawn for higher education," Int. J. Comput. Sci. Eng., vol. 5, no. 5, p. 323, 2013.
[33] [Online]. Available: http://sociallearningcommunity.com/10-of-the-best-mooc-providers/
[34] [Online]. Available: https://en.unesco.org/events/experts-meeting-defining-open-educational-resources-oer-indicators
[35] [Online]. Available: http://discourse.col.org/t/what-are-examples-of-oer/27
[36] J. C. Taylor, "Open courseware futures: Creating a parallel universe," e-JIST, vol. 10, no. 1, pp. 1–7, 2007.
[37] X. Chen, M. Vorvoreanu, and K. P. C. Madhavan, "Mining social media data for understanding students' learning experiences," IEEE Trans. Learn. Technol., vol. 7, no. 3, pp. 246–259, Jul. 2014.
[38] A. Dix, "Challenge and potential of fine grain, cross-institutional learning data," in Proc. 3rd ACM Conf. Learn. Scale L@S, 2016, pp. 261–264.
[39] C. K. Pereira, S. W. M. Siqueira, B. P. Nunes, and S. Dietze, "Linked data in education: A survey and a synthesis of actual research and future challenges," IEEE Trans. Learn. Technol., vol. 11, no. 3, pp. 400–412, Jul. 2018.
[40] D. Taibi and S. Dietze, "Fostering analytics on learning analytics research: The LAK dataset," in Proc. CEUR Workshop, vol. 974, 2013, pp. 5–7.
[41] R. Meymandpour and J. G. Davis, "Ranking universities using linked open data," J. Stud. Int. Educ., vol. 18, no. 2, pp. 318–327, 2007.
[42] B. E. Penteado, "Correlational analysis between school performance and municipal indicators in Brazil supported by linked open data," in Proc. 25th Int. Conf. Companion World Wide Web WWW Companion, 2016, pp. 507–512.
[43] E. A. Amrieh, T. Hamtini, and I. Aljarah, "Preprocessing and analyzing educational data set using X-API for improving student's performance," in Proc. IEEE Jordan Conf. Appl. Electr. Eng. Comput. Technol. (AEECT), Nov. 2015, pp. 1–5.
[44] C. Keßler, M. d'Aquin, and S. Dietze, "Linked data for science and education," Semantic Web, vol. 4, no. 1, pp. 1–2, 2013.
[45] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives, "Dbpedia: A nucleus for a Web of open data," in The Semantic Web. Berlin, Germany: Springer, 2007, pp. 722–735.
[46] K. Bollacker, R. Cook, and P. Tufts, "Freebase: A shared database of structured general human knowledge," in Proc. AAAI, vol. 7, Jul. 2007, pp. 1962–1963.


[47] T. Rebele, F. Suchanek, J. Hoffart, J. Biega, E. Kuzey, and G. Weikum, "YAGO: A multilingual knowledge base from wikipedia, wordnet, and geonames," in Proc. Int. Semantic Web Conf. Cham, Switzerland: Springer, Oct. 2016, pp. 177–185.
[48] N. Bassiliades, "Collecting university rankings for comparison using Web extraction and entity linking techniques," in Information and Communication Technologies in Education, Research, and Industrial Applications (Communications in Computer and Information Science), vol. 469, 2014, pp. 23–46.
[49] J. Robinson, J. Stan, and M. Ribière, "Using linked data to reduce learning latency for e-book readers," in Proc. Extended Semantic Web Conf., 2012, pp. 28–34.
[50] L. D. Rubenstein, "Using TED talks to inspire thoughtful practice," Teacher Educator, vol. 47, no. 4, pp. 261–267, Oct. 2012.
[51] [Online]. Available: http://data.linkededucation.org/linkedup/catalog/
[52] R. A. Huebner, "A survey of educational data-mining research," Res. Higher Educ. J., vol. 19, no. 4, pp. 1–13, 2013.
[53] P. Guleria and M. Sood, "Data mining in education: A review on the knowledge discovery perspective," Int. J. Data Mining Knowl. Manage. Process, vol. 4, no. 5, pp. 47–60, Sep. 2014.
[54] S. Yu, D. Yang, and X. Feng, "A big data analysis method for online education," in Proc. 10th Int. Conf. Intell. Comput. Technol. Autom. (ICICTA), Oct. 2017, pp. 291–294.
[55] I. A. T. Hashem, I. Yaqoob, N. B. Anuar, S. Mokhtar, A. Gani, and S. U. Khan, "The rise of 'big data' on cloud computing: Review and open research issues," Inf. Syst., vol. 47, pp. 98–115, Jan. 2015.
[56] S. A. Noghabi, K. Paramasivam, Y. Pan, N. Ramesh, J. Bringhurst, I. Gupta, and R. H. Campbell, "Samza: Stateful scalable stream processing at LinkedIn," Proc. VLDB Endowment, vol. 10, no. 12, pp. 1634–1645, Aug. 2017.
[57] S. Roy and S. N. Singh, "Emerging trends in applications of big data in educational data mining and learning analytics," in Proc. 7th Int. Conf. Cloud Comput., Data Sci. Eng. Confluence, Jan. 2017, pp. 193–198.
[58] L. Cen, D. Ruta, and J. Ng, "Big education: Opportunities for big data analytics," in Proc. IEEE Int. Conf. Digit. Signal Process. (DSP), Jul. 2015, pp. 502–506, doi: 10.1109/ICDSP.2015.7251923.
[59] M. S. Vyas and R. Gulwani, "Predictive analytics for e learning system," in Proc. Int. Conf. Inventive Syst. Control (ICISC), Jan. 2017, pp. 1–4.
[60] V. L. Miguéis, A. Freitas, P. J. V. Garcia, and A. Silva, "Early segmentation of students according to their academic performance: A predictive modelling approach," Decis. Support Syst., vol. 115, pp. 36–51, Nov. 2018.
[61] M. Feng, N. Heffernan, and K. Koedinger, "Addressing the assessment challenge with an online system that tutors as it assesses," User Model. User-Adapted Interact., vol. 19, no. 3, pp. 243–266, Aug. 2009.
[62] M. Jose, P. S. Kurian, and V. Biju, "Progression analysis of students in a higher education institution using big data open source predictive modeling tool," in Proc. 3rd MEC Int. Conf. Big Data Smart City (ICBDSC), Mar. 2016, pp. 1–5.
[63] B. Kumar and S. Pal, "Mining educational data to analyze students performance," Int. J. Adv. Comput. Sci. Appl., vol. 2, no. 6, pp. 1–8, 2012.
[64] M. H. Abdous, H. Wu, and C. J. Yen, "Using data mining for predicting relationships between online question theme and final grade," J. Educ. Technol. Soc., vol. 15, no. 3, p. 77, 2012.
[65] A. Wolff, Z. Zdrahal, D. Herrmannova, and P. Knoth, "Predicting student performance from combined data sources," in Educational Data Mining, 2013.
[66] C. Romero, P. G. Espejo, A. Zafra, J. R. Romero, and S. Ventura, "Web usage mining for predicting final marks of students that use moodle courses," Comput. Appl. Eng. Edu., vol. 21, no. 1, pp. 135–146, Mar. 2013.
[67] V. Ramesh, P. Parkavi, and K. Ramar, "Predicting student performance: A statistical and data mining approach," Int. J. Comput. Appl., vol. 63, no. 8, pp. 35–39, 2012.
[68] W. Xing, R. Guo, E. Petakovic, and S. Goggins, "Participation-based student final performance prediction model through interpretable genetic programming: Integrating learning analytics, educational data mining and theory," Comput. Hum. Behav., vol. 47, pp. 168–181, Jun. 2015.
[69] M. Nasiri, B. Minaei, and F. Vafaei, "Predicting GPA and academic dismissal in LMS using educational data mining: A case mining," in Proc. 6th Nat. 3rd Int. Conf. E-Learn. E-Teach., Feb. 2012, pp. 53–58.
[70] K. Bunkar, U. K. Singh, B. Pandya, and R. Bunkar, "Data mining: Prediction for performance improvement of graduate students using classification," in Proc. 9th Int. Conf. Wireless Opt. Commun. Netw. (WOCN), Sep. 2012, pp. 1–5.
[71] [Online]. Available: web.ysu.edu/gen/ysu_generated_bin/documents/basic_module/Key_Causes_of_Student_AttritionComprehensive_Retention_Plan.pdf
[72] J. Liang, J. Yang, Y. Wu, C. Li, and L. Zheng, "Big data application in education: Dropout prediction in edx MOOCs," in Proc. IEEE 2nd Int. Conf. Multimedia Big Data (BigMM), Apr. 2016, pp. 440–443.
[73] R. Kanth, M.-J. Laakso, P. Nevalainen, and J. Heikkonen, "Future educational technology with big data and learning analytics," in Proc. IEEE 27th Int. Symp. Ind. Electron. (ISIE), Jun. 2018, pp. 906–910.
[74] A. Pradeep, S. Das, and J. J. Kizhekkethottam, "Students dropout factor prediction using EDM techniques," in Proc. Int. Conf. Soft-Comput. Netw. Secur. (ICSNS), Feb. 2015, pp. 1–7.
[75] W. L. Cambruzzi, S. J. Rigo, and J. L. Barbosa, "Dropout prediction and reduction in distance education courses with the learning analytics multitrail approach," J. UCS, vol. 21, no. 1, pp. 23–47, 2015.
[76] C. Márquez-Vera, A. Cano, C. Romero, A. Y. M. Noaman, H. Mousa Fardoun, and S. Ventura, "Early dropout prediction using data mining: A case study with high school students," Expert Syst., vol. 33, no. 1, pp. 107–124, Feb. 2016.
[77] G. Dekker, M. Pechenizkiy, and J. Vleeshouwers, "Predicting students drop out: A case study," Educ. Data Mining, to be published.
[78] C. Márquez-Vera, A. Cano, C. Romero, and S. Ventura, "Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data," Int. J. Speech Technol., vol. 38, no. 3, pp. 315–330, Apr. 2013.
[79] G. Dekker, M. Pechenizkiy, and J. Vleeshouwers, "Predicting students drop out: A case study," presented at the Educ. Data Mining, Jul. 2009.
[80] J. Bayer, H. Bydzovská, J. Géryk, T. Obsivac, and L. Popelinsky, "Predicting drop-out from social Behaviour of students," Int. Educ. Data Mining Soc., to be published.
[81] Z. Wang, C. Zhu, Z. Ying, Y. Zhang, B. Wang, X. Jin, and H. Yang, "Design and implementation of early warning system based on educational big data," in Proc. 5th Int. Conf. Syst. Informat. (ICSAI), Nov. 2018, pp. 549–553.
[82] T. Denley, "Degree compass: A course recommendation system," Educause Rev. Online, pp. 1–5, Jun. 2013.
[83] P. Guleria and M. Sood, "Big data analytics: Predicting academic course preference using Hadoop inspired mapreduce," in Proc. 4th Int. Conf. Image Inf. Process. (ICIIP), Dec. 2017, pp. 1–4.
[84] A. Pejic and P. S. Molcer, "Exploring data mining possibilities on computer based problem solving data," in Proc. IEEE 14th Int. Symp. Intell. Syst. Informat. (SISY), Aug. 2016, pp. 171–176.
[85] L. Cen, D. Ruta, L. Powell, and J. Ng, "Learning alone or in a group - an empirical case study of the collaborative learning patterns and their impact on student grades," in Proc. Int. Conf. Interact. Collaborative Learn. (ICL), Dec. 2014.
[86] Á. F. Agudo-Peregrina, S. Iglesias-Pradas, M. Á. Conde-González, and Á. Hernández-García, "Can we predict success from log data in VLEs? Classification of interactions for learning analytics and their relation with performance in VLE-supported F2F and online learning," Comput. Hum. Behav., vol. 31, pp. 542–550, Feb. 2014.
[87] A. van Leeuwen, J. Janssen, G. Erkens, and M. Brekelmans, "Supporting teachers in guiding collaborating students: Effects of learning analytics in CSCL," Comput. Edu., vol. 79, pp. 28–39, Oct. 2014.
[88] J. Janssen, G. Erkens, and G. Kanselaar, "Visualization of agreement and discussion processes during computer-supported collaborative learning," Comput. Hum. Behav., vol. 23, no. 3, pp. 1105–1125, May 2007.
[89] R. Cerezo, M. Sánchez-Santillán, M. P. Paule-Ruiz, and J. C. Núñez, "Students' LMS interaction patterns and their relationship with achievement: A case study in higher education," Comput. Edu., vol. 96, pp. 42–54, May 2016.
[90] Á. Fidalgo-Blanco, M. L. Sein-Echaluce, F. J. García-Peñalvo, and M. Á. Conde, "Using learning analytics to improve teamwork assessment," Comput. Hum. Behav., vol. 47, pp. 149–156, Jun. 2015.
[91] P. Williams, "Assessing collaborative learning: Big data, analytics and university futures," Assessment Eval. Higher Edu., vol. 42, no. 6, pp. 978–989, Aug. 2017.
[92] L. dos Santos Machado and K. Becker, "Distance education: A Web usage mining case study for the evaluation of learning sites," in Proc. 3rd IEEE Int. Conf. Adv. Technol., Jul. 2003, pp. 360–361.


[93] L. Wang, J. Li, L. Ding, and P. Li, "E-learning evaluation system based on data mining," in Proc. 2nd Inf. Eng. Electron. Commerce (IEEC), Jul. 2010, pp. 1–3.
[94] V. Pascual-Cid, L. Vigentini, and M. Quixal, "Visualising virtual learning environments: Case studies of the Website exploration tool," in Proc. 14th Int. Conf. Inf. Visualisation, Jul. 2010, pp. 149–155.
[95] I. L. M. Ricarte and G. R. F. Junior, "A methodology for mining data from computer-supported learning environments," Informática na educação: Teoria Prática, vol. 14, no. 2, pp. 83–94, 2011.
[96] J. S. Kinnebrew, J. R. Segedy, and G. Biswas, "Integrating model-driven and data-driven techniques for analyzing learning behaviors in open-ended learning environments," IEEE Trans. Learn. Technol., vol. 10, no. 2, pp. 140–153, Apr. 2017.
[97] A. Nussbaumer, E.-C. Hillemann, C. Gütl, and D. Albert, "A competence-based service for supporting self-regulated learning in virtual environments," J. Learn. Anal., vol. 2, no. 1, pp. 101–133, 2015.
[98] J. L. Sabourin, B. W. Mott, and J. C. Lester, "Early prediction of student self-regulation strategies by combining multiple models," Int. Educ. Data Mining Soc., to be published.
[99] K. Pietrosanti. When E-Learning Technologies Embrace Big Data. Accessed: Feb. 2020. [Online]. Available: https://www.docebo.com/2013/12/06/when-elearning-technologiesembrace-big-data-2/
[100] K. Habitzel, T. D. Mrk, B. Stehno, and S. Prock, "Microlearning: Emerging concepts, practices and technologies after e-learning," Proc. Microlearning Learn. Work. New Media, vol. 5, no. 3, 2006.
[101] R. Ferguson and S. B. Shum, "Social learning analytics: Five approaches," presented at the Proc. 2nd Int. Conf. Learn. Anal. Knowl., 2012.
[102] E. Duval, "Attention please!: Learning analytics for visualization and recommendation," presented at the Proc. 1st Int. Conf. Learn. Anal. Knowl., 2011.
[103] C.-M. Chen, C.-M. Hong, and C.-C. Chang, "Mining interactive social network for recommending appropriate learning partners in a Web-based cooperative learning environment," in Proc. IEEE Conf. Cybern. Intell. Syst., Sep. 2008, pp. 642–647.
[104] E. A. Heathcote and S. P. Dawson, "Data mining for evaluation, benchmarking and reflective practice in a LMS," presented at the E-Learn World Conf. E-Learn. Corporate, Government, Healthcare Higher Educ., Vancouver, BC, Canada, Oct. 2005.
[105] [Online]. Available: https://xapi.com/overview/
[106] M. Manso-Vazquez, M. Caeiro-Rodriguez, and M. Llamas-Nistal, "XAPI-SRL: Uses of an application profile for self-regulated learning based on the analysis of learning strategies," in Proc. IEEE Frontiers Conf. (FIE), Oct. 2015, pp. 1–8.
[107] Y. Wu, S. Guo, and L. Zhu, "Design and implementation of data collection mechanism for 3D design course based on xAPI standard," Interact. Learn. Environments, pp. 1–18, Dec. 2019.
[108] A. Berg, M. Scheffel, H. Drachsler, S. Ternier, and M. Specht, "Dutch cooking with xAPI recipes: The good, the bad, and the consistent," in Proc. IEEE 16th Int. Conf. Adv. Learn. Technol. (ICALT), Jul. 2016, pp. 234–236.
[109] A. Nouira, L. Cheniti-Belcadhi, and R. Braham, "An enhanced xAPI data model supporting assessment analytics," Procedia Comput. Sci., vol. 126, pp. 566–575, Jan. 2018.
[110] C. Ellis, "Broadening the scope and increasing the usefulness of learning analytics: The case for assessment analytics," Brit. J. Educ. Technol., vol. 44, no. 4, pp. 662–664, Jul. 2013.
[111] L. Cao, "Non-IID recommender systems: A review and framework of recommendation paradigm shifting," Engineering, vol. 2, no. 2, pp. 212–224, Jun. 2016.
[112] S. Dwivedi and V. S. K. Roshni, "Recommender system for big data in education," in Proc. 5th Nat. Conf. E-Learn. E-Learn. Technol. (ELELTECH), Aug. 2017, pp. 1–4.
[113] Y. Hou, P. Zhou, J. Xu, and D. O. Wu, "Course recommendation of MOOC with big data support: A contextual online learning approach," in Proc. IEEE INFOCOM Conf. Comput. Commun. Workshops (INFOCOM WKSHPS), Apr. 2018, pp. 106–111.
[114] M. Qbadou, I. Salhi, and K. Mansouri, "Towards an educational recommendation system based on big data techniques-case of Hadoop," in Proc. 4th Int. Conf. Optim. Appl. (ICOA), Apr. 2018, pp. 1–5.
[116] M.-H. Hsu, "A personalized english learning recommender system for ESL students," Expert Syst. Appl., vol. 34, no. 1, pp. 683–688, Jan. 2008.
[117] N. Thai-Nghe, L. Drumond, A. Krohn-Grimberghe, and L. Schmidt-Thieme, "Recommender system for predicting student performance," Procedia Comput. Sci., vol. 1, no. 2, pp. 2811–2819, 2010.
[118] O. C. Santos and J. G. Boticario, "Requirements for semantic educational recommender systems in formal E-learning scenarios," Algorithms, vol. 4, no. 2, pp. 131–154, 2011.
[119] O. R. Zaiane, "Building a recommender agent for e-learning systems," in Proc. Int. Conf. Comput. Edu., Dec. 2002, pp. 55–59.
[120] J. Lu, "Personalized e-learning material recommender system," in Proc. Int. Conf. Inf. Technol. Appl., 2004, pp. 374–379.
[121] F.-H. Wang and H.-M. Shao, "Effective personalized recommendation based on time-framed navigation clustering and association mining," Expert Syst. Appl., vol. 27, no. 3, pp. 365–377, Oct. 2004.
[122] N. Baloian, P. Galdames, C. A. Collazos, and L. A. Guerrero, "A model for a collaborative recommender system for multimedia learning material," in Proc. Int. Conf. Collaboration Technol., Sep. 2004, pp. 281–288.
[123] C.-M. Chen, H.-M. Lee, and Y.-H. Chen, "Personalized e-learning system using item response theory," Comput. Edu., vol. 44, no. 3, pp. 237–255, Apr. 2005.
[124] M. Gomez-Albarran and G. Jimenez-Diaz, "Recommendation and students' authoring in repositories of learning objects: A case-based reasoning approach," Int. J. Emerg. Technol. Learn. (iJET), vol. 4, pp. 35–40, Oct. 2009.
[125] M. K. Khribi, M. Jemni, and O. Nasraoui, "Toward a hybrid recommender system for e-learning personalization based on Web usage mining techniques and information retrieval," in Proc. World Conf. E-Learn. Corporate, Government, Healthcare Higher Educ., Oct. 2007, pp. 6136–6145.
[126] Y. Yang, H. Liu, J. Carbonell, and W. Ma, "Concept graph learning from educational data," in Proc. 8th ACM Int. Conf. Web Search Data Mining WSDM, 2015, pp. 159–168.
[127] H. Liu and Y. Yang, "Cross-graph learning of multi-relational associations," in Proc. 33rd Int. Conf. Mach. Learn., 2016, pp. 2235–2243.
[128] W. Chen, C. G. Brinton, D. Cao, A. Mason-Singh, C. Lu, and M. Chiang, "Early detection prediction of learning outcomes in online short-courses via learning behaviors," IEEE Trans. Learn. Technol., vol. 12, no. 1, pp. 44–58, Jan. 2019.
[129] C. Vieira, P. Parsons, and V. Byrd, "Visual learning analytics of educational data: A systematic literature review and research agenda," Comput. Edu., vol. 122, pp. 119–135, Jul. 2018.
[130] J. Yoo, S. Yoo, C. Lance, and J. Hankins, "Student progress monitoring tool using treeview," presented at the ACM SIGCSE Bulletin, 2006.
[131] L. P. Macfadyen and P. Sorenson, "Using LiMS (the learner interaction monitoring system) to track online learner engagement and evaluate course design," presented at the Educ. Data Mining, Jun. 2010.
[132] J. Zeitz, N. Self, L. House, J. R. Evia, S. Leman, and C. North, "Bringing interactive visual analytics to the classroom for developing EDA skills," J. Comput. Sci. Colleges, vol. 33, no. 3, pp. 115–125, 2018.
[133] D. Zhou, H. Li, S. Liu, B. Song, and T. Hu, "A map-based visual analysis method for patterns discovery of mobile learning in education with big data," in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2017, pp. 3482–3491.
[134] X. Chen, J. Zeitz Self, L. House, J. Wenskovitch, M. Sun, N. Wycoff, J. Robertson Evia, S. Leman, and C. North, "Be the data: Embodied visual analytics," IEEE Trans. Learn. Technol., vol. 11, no. 1, pp. 81–95, Mar. 2018.
[135] C. Shi, S. Fu, Q. Chen, and H. Qu, "VisMOOC: Visualizing video clickstream data from massive open online courses," in Proc. IEEE Pacific Visualizat. Symp. (PacificVis), Apr. 2015, pp. 159–166.
[136] Q. Chen, Y. Chen, D. Liu, C. Shi, Y. Wu, and H. Qu, "PeakVizor: Visual analytics of peaks in video clickstreams from massive open online courses," IEEE Trans. Vis. Comput. Graphics, vol. 22, no. 10, pp. 2315–2330, Oct. 2016.
[137] Y. Chen, Q. Chen, M. Zhao, S. Boyer, K. Veeramachaneni, and H. Qu, "DropoutSeer: Visualizing learning patterns in massive open online courses for dropout reasoning and prediction," in Proc. IEEE Conf. Vis. Analytics Sci. Technol. (VAST), Oct. 2016, pp. 111–120.
[115] L. Feng and G. Wei-wei, ‘‘Design and implementation of personalized [138] J. Herrington, T. C. Reeves, and R. Oliver, ‘‘Immersive learning technolo-
recommendation system under big data platform,’’ in Proc. 11th Int. Conf. gies: Realism and online authentic learning,’’ J. Comput. Higher Edu.,
Intell. Comput. Technol. Autom. (ICICTA), Sep. 2018, pp. 291–294. vol. 19, no. 1, pp. 80–99, Sep. 2007.


[139] Z. Pan, A. D. Cheok, H. Yang, J. Zhu, and J. Shi, ‘‘Virtual reality and mixed reality for virtual learning environments,’’ Comput. Graph., vol. 30, no. 1, pp. 20–28, Feb. 2006.
[140] S. C. Baker, R. K. Wentz, and M. M. Woods, ‘‘Using virtual worlds in education: Second Life as an educational tool,’’ Teach. Psychol., vol. 36, no. 1, pp. 59–64, Jan. 2009.
[141] H. H. S. Ip, C. Li, S. Leoni, Y. Chen, K.-F. Ma, C. H.-T. Wong, and Q. Li, ‘‘Design and evaluate immersive learning experience for massive open online courses (MOOCs),’’ IEEE Trans. Learn. Technol., vol. 12, no. 4, pp. 503–515, Oct. 2019.
[142] B. Daniel, ‘‘Big data and analytics in higher education: Opportunities and challenges,’’ Brit. J. Educ. Technol., vol. 46, no. 5, pp. 904–920, Sep. 2015.
[143] B. K. Daniel, ‘‘Big data and data science: A critical review of issues for educational research,’’ Brit. J. Educ. Technol., vol. 50, no. 1, pp. 101–113, Jan. 2019.
[144] A. Rubel and K. M. L. Jones, ‘‘Student privacy in learning analytics: An information ethics perspective,’’ Inf. Soc., vol. 32, no. 2, pp. 143–159, Mar. 2016.
[145] T. Hoel and W. Chen, ‘‘Privacy and data protection in learning analytics should be motivated by an educational maxim—Towards a proposal,’’ Res. Pract. Technol. Enhanced Learn., vol. 13, no. 1, pp. 1–14, Dec. 2018.
[146] H. V. Jagadish, J. Gehrke, A. Labrinidis, Y. Papakonstantinou, J. M. Patel, R. Ramakrishnan, and C. Shahabi, ‘‘Big data and its technical challenges,’’ Commun. ACM, vol. 57, no. 7, pp. 86–94, Jul. 2014.
[147] R. H. L. Ip, L.-M. Ang, K. P. Seng, J. C. Broster, and J. E. Pratley, ‘‘Big data and machine learning for crop protection,’’ Comput. Electron. Agricult., vol. 151, pp. 376–383, Aug. 2018.
[148] K. P. Seng, L. M. Ang, and C. S. Ooi, ‘‘A combined rule-based & machine learning audio-visual emotion recognition approach,’’ IEEE Trans. Affect. Comput., vol. 9, no. 1, pp. 3–13, Jan./Mar. 2018.
[149] Y. Zhang, R. Jin, and Z.-H. Zhou, ‘‘Understanding bag-of-words model: A statistical framework,’’ Int. J. Mach. Learn. Cybern., vol. 1, nos. 1–4, pp. 43–52, Dec. 2010.
[150] [Online]. Available: https://www.instructure.com/canvas/
[151] B. Flanagan and H. Ogata, ‘‘Learning analytics platform in higher education in Japan,’’ Knowl. Manage. E-Learn. (KM&EL), vol. 10, no. 4, pp. 469–484, Nov. 2018.
[152] M. Cantabella, R. Martínez-España, B. Ayuso, J. A. Yáñez, and A. Muñoz, ‘‘Analysis of student behavior in learning management systems through a big data framework,’’ Future Gener. Comput. Syst., vol. 90, pp. 262–272, Jan. 2019.
[153] O. K. Akputu, K. P. Seng, Y. Lee, and L.-M. Ang, ‘‘Emotion recognition using multiple kernel learning toward E-learning applications,’’ ACM Trans. Multimedia Comput., Commun., Appl., vol. 14, no. 1, pp. 1–20, Jan. 2018.

KENNETH LI-MINN ANG (Senior Member, IEEE) received the B.Eng. and Ph.D. degrees from Edith Cowan University, Australia. He was an Associate Professor of networked and computer systems with the School of Information and Communication Technology (ICT), Griffith University. He is currently a Professor with the School of Science and Engineering, University of Sunshine Coast. His research interests include big data analytics, multimedia Internet-of-Things, embedded systems, wireless multimedia sensor systems, reconfigurable computing and the development of real-world computer systems, and machine learning. He has published over 180 articles in journals and international refereed conferences. He is a Fellow of the Higher Education Academy, U.K.

FENG LU GE received the B.Sc. degree in information engineering from the Dalian University of Technology, China, the M.Sc. degree from the University of Wollongong, Australia, and the Ph.D. degree from Charles Sturt University. He is currently an Engineer with Pacific Telecom & Navigation Ltd., Hong Kong. He was previously a Postdoctoral Researcher with Charles Sturt University. His research interests include data analytics, computer vision, and robotics.

KAH PHOOI SENG (Member, IEEE) received the B.Eng. and Ph.D. degrees from the University of Tasmania, Australia. She is currently an Adjunct Professor with the School of Engineering and Information Technology, UNSW. Before returning to Australia, she was a Professor and the Department Head of computer science and networked system with Sunway University. Before joining Sunway University, she was an Associate Professor with the School of Electrical and Electronic Engineering, Nottingham University. She has published over 230 articles in journals and international refereed conferences. She is the lead author of the book Multimodal Analytics for Next-Generation Big Data Technologies and Applications. Her research interests include data analytics, big data, machine learning, artificial intelligence (AI) and intelligent systems, the Internet of Things (IoT), multimodal signal processing, pervasive computing and sensor networks, HCI and affective computing, and mobile software development.

