Artificial Intelligence in Computer Science and Mathematics Education

David Azcona

PhD Thesis
School of Computing
Dublin City University
August 2019
I hereby certify that this material, which I now submit for assessment on the
programme of study leading to the award of Doctor of Philosophy is entirely my
own work, that I have exercised reasonable care to ensure that the work is original,
and does not to the best of my knowledge breach any law of copyright, and has not
been taken from the work of others save and to the extent that such work has been
cited and acknowledged within the text of my work.
David Azcona
ID No.: 15212605
August 2019
Acknowledgements
I would like to thank my tireless supervisor Prof. Alan F. Smeaton for his continu-
ous support, patience, knowledge and encouragement over the last four years.
I would also like to thank my wonderful hosts at Arizona State University, John
Rome and Prof. I-Han Sharon Hsiao, for believing in my potential and guiding me
throughout the year during which I conducted research as a Fulbright scholar in the
United States.
Finally, I would like to acknowledge support from the following funding sources:
• Irish Research Council in partnership with The National Forum for the En-
hancement of Teaching & Learning in Ireland under project number GOIPG/2015/3497
• Fulbright Ireland
Contents

Acknowledgements
List of Tables
Abstract
1 Introduction
2 Literature Review
2.1 Introduction
4.1 Introduction
4.10.2 Features
4.10.4 Conclusions
4.11.6 Conclusion
5.1 Introduction
6.1 Introduction
8 Conclusions
8.1 Introduction
8.2 Modelling Student Behaviour
8.3 Modelling Students With Embeddings
8.4 Providing Adaptive Feedback to Students
8.5 Using Graph Theory and Networks to Model Students
8.6 Final thoughts
Appendices
D Awards
Bibliography
List of Figures

3.1 Screengrab from the Virtual Learning Environment for the Teaching of Computer Programming at Dublin City University
4.2 Number of Students Enrolled in CA114 from the 2009 - 2010 Academic Year to the 2015 - 2016 Academic Year
4.3 CA114's Numbers per Examination from the 2009 - 2010 Academic Year to the 2015 - 2016 Academic Year
4.4 CA114's Failure Rates per Examination from the 2009 - 2010 Academic Year to the 2015 - 2016 Academic Year
4.7 Empirical Risk for CA116 for the Training Data using Accuracy
4.8 Empirical Risk for CA116 for the Training Data using F1-Score
4.9 Empirical Risk for CA116 for the Training Data using F1-Score for the Fail Class Only
4.13 Confusion Matrix for Week 4 for CA116's Incoming 2018/2019 Cohort
4.14 Confusion Matrix for Week 8 for CA116's Incoming 2018/2019 Cohort
4.15 Confusion Matrix for Week 12 for CA116's Incoming 2018/2019 Cohort
4.17 Feature Importance Across Periods for ASU's Data Structures Course
4.18 Classification Performance using ROC AUC for ASU's Data Structures Course
4.20 Linear Regression Predictions vs. Actual Results Before the Third Exam for ASU's Data Structures Course
4.22 Scatter Plot between CAO Points and the Precision Mark, Color-Coded by Faculty
4.23 Scatter Plots between CAO Points and the Precision Mark by Faculty
4.25 Mutual Information Score between a Feature and the Precision Mark, for Several Features
5.6 Embeddings for the Top Words & Token Words, Projected from 100 Dimensions to 2 Dimensions for Visualization Using Principal Component Analysis (PCA); the Axes in the Graphs Are the PCA's Two Principal Components
6.3 Frequency of Access to Material and Labsheets from the Notifications
7.1 Number of Students that Took each Type of Assessment for MAT117 and MAT170 in ASU's GFA via EdX
List of Tables

4.3 CA116 Prediction Metrics including passing rates and at-risk rates
4.5 Number of Features per Period and Students below the Threshold
5.3 Top-5 Token Words & AST Nodes in terms of Number of Occurrences
6.2 Difference and Normalised Gain Index between the examinations for CA117 and CA114 in the 2015/2016 academic year
6.3 Demographic information and prior information from the 2016/2017 student groups in CA117, CA114 and CA278
6.4 Difference and Normalised Gain Index among the examinations for CA117, CA114 and CA278 in the 2016/2017 academic year
6.5 Difference and Normalised Gain Index between the examinations for CA117, CA114 and CA278 in the 2017/2018 academic year
6.6 Comparison between the 2015/2016, 2016/2017 and 2018/2019 academic years
6.7 2016/2017 student survey responses about the project
Abstract
Artificial Intelligence in Computer Science and Mathematics Education
David Azcona
In this thesis I examine how Artificial Intelligence (AI) techniques can help Com-
puter Science students learn programming and mathematics skills more efficiently
using algorithms and concepts such as Predictive Modelling, Machine Learning,
Deep Learning, Representational Learning, Recommender Systems and Graph The-
ory.
To do this, I apply Learning Analytics (LA) and Educational Data Mining
(EDM) principles. In Learning Analytics, one collects and analyses data about stu-
dents and their contexts for the purpose of understanding and improving their learning
and the environments they interact with. Educational Data Mining applies Data
Mining, Machine Learning and statistics to the data captured during these learning pro-
cesses.
I have used these models not only to predict outcomes and exam performance but
also to automatically generate feedback to students in a variety of ways, including
recommending better programming techniques. My research questions are explored
by examining the performance of the AI techniques in helping to improve student
learning.
Chapter 1
Introduction
A large contributor to these low progression rates in computing degrees is the fact
that students quite often struggle on their first introductory programming course
[102, 82]. The mean worldwide pass rate for introductory programming has been
estimated at 67% [15], a figure revisited in 2014 [102].
Learning to program a computer is challenging for most people and few students
find it easy at first. Moreover, first-year students often struggle with making the
transition into University as they adapt to what is likely to be a very different form
of independent study and learning. For instance, Dublin City University provides
comprehensive orientation programmes and information for first-year students in
order to ease this transition between second- and third-level education, as well as
student support and career guidance for students at risk of stopping out (leaving and
later returning).
In this section of this introductory chapter, we will introduce the areas we will be
discussing throughout this thesis, our motivation to work on them and the research
questions we derived for each, which are used to frame the experiments carried out
and reported later.
There are many different types of data which are gathered about our University stu-
dents, ranging from static demographics to dynamic behaviour logs. These can be
harnessed from a variety of data sources at Higher Education Institutions. Combin-
ing these into one location assembles a rich student digital footprint, which can
enable institutions to better understand student behaviour and to better prepare
for guiding students towards reaching their academic potential.
… performance in the learning task, in some way. Automated collection of data on com-
puter programming activities, the online activities that students carry out during
their learning process, is typically used in isolation within designated programming
learning environments such as WebCAT [38]. Yet combining this automatically
collected data with other complementary data sources (e.g. performance in class
assignments, demographic information or information on prior learning) means it
may have to be retrieved and aggregated from different course or University man-
agement systems. As a result, most of the data collection in the reported studies in
CS learning is extremely customised and impossible to replicate and reproduce at
other institutions.
Today, the majority of computer programming classes are delivered via a blended
instructional strategy with face-to-face instruction in classrooms supported by online
tools such as intelligent tutors, self-assessment quizzes, online assignment submis-
sion, and course management systems. New attempts in today’s classrooms seek to
combine multiple modalities of data such as gestures, gaze, speech or writing from
video cameras, lecture recordings, etc. to leverage students’ digital footprints [19,
79].
In this thesis, we propose, build and then evaluate a series of traditional Ma-
chine Learning Predictive Analytics models using student characteristics,
prior academic history, students' programming laboratory work, and all of the logged in-
teractions between students and their offline and online resources. We generate predictions of
end-of-course outcomes weekly during the semester. Furthermore, lecturers on the
courses were updated each week regarding their students' progress.
… techniques are adopted to develop mathematical models which provide
real-time prediction of course outcomes as well as personalised dynamic feedback to
students on their progress. This approach incorporates static and dynamic student
data features to enhance predictive model scalability, so that it can be extrapolated to
other blended classrooms, other subjects and other higher education
institutions. Additionally, the approach we develop is not only generic, it also remains
applicable when only limited data sets are available (e.g. log files
for access to online laboratory material only), in order to be beneficial in helping
students in need. Most importantly, the generated predictions allow us to auto-
matically create and provide adaptive feedback to each student according to that
student's progression, and to provide guidance when it is needed.
We explore how the Predictive Analytics models developed here work in
distinguishing students who may be struggling in computer program-
ming courses. We have access to, and we use, two years of groundtruth student
data as training data from which we can learn. To demonstrate the theory and
address the research questions, we implemented multimodal models for each course that
aggregate sources of student data including student characteristics, prior academic
history, students' programming laboratory work, and all the logged interactions
between students and their offline and online resources. Classification models are built by
developing data features and automatically identifying and extracting patterns of
success on these courses. These are then trained and cross-validated to de-
termine and then refine their accuracy, and finally predictions are generated every
week with incoming student data. This gives us experimental data that we can use
to validate our underlying research questions and hypotheses. A report stating
whether each student is likely to pass or fail their next formal assessment, and the
confidence associated with that estimate, is sent to the lecturers for each course.
In summary, the single most important research question derived in this section
can be stated as follows:
RQ1: When working with new cohorts of University students about whom we
have little historical interaction data, how accurate are the traditional Predictive
Analytics models when used with generic static and dynamic student data features,
in identifying those students in need of assistance in computer programming courses?
In the next section we will look at the technique of embeddings and how it can
be used in this thesis.
Embeddings
Online learning tools and platforms including Massive Open Online Courses (MOOCs)
provide a rich mechanism for students to engage and interact with educational ma-
terial based on their individual existing knowledge and requirements for their own
academic development. Such tools also provide a mechanism to support person-
alised learning effectively through the use of customised recommendations. These
recommendations should be developed based on users’ understanding, effort and
their logged interaction with the learning systems to date by interpreting historical
data from previous cohorts of students as well as data from the current students.
Interest in, and the use of, students' digital footprints and, particularly, their interactions
with VLE systems have been rising over the last decade because of their advantage in
better supporting individualised learning. However, developing a richer represen-
tation of student digital footprints effectively and efficiently is still a challenging
problem which has been an area of recent research interest, and it is the focus of our
work in this thesis.
Machine Learning (ML) is a subset of AI that provides computers with the abil-
ity to learn without being explicitly programmed [17]. That is done by combining
the study of algorithms with statistical models. ML algorithms build a statistical
model based on a collection of existing data with known outcomes such as retail
data, bank loan applications or customer data from telecoms companies [57]. Those
trained models are then used to predict outcomes for unknown data such as new
customers, new bank loan applications or new telecoms customers. There is a range
of algorithms and ML techniques to do this including Support Vector Machines
(SVM), Naive Bayes, Decision Trees and the recently popular Deep Learning or
Neural Networks. New advances and techniques are being discovered regularly. One
common challenge across all ML techniques and applications is deciding what data
to use to represent customers, bank loan applicants or telecoms customers, whatever
the application is: which kind of customer data is most important, which is of little
value and which can be discarded. This is sometimes called "data wrangling"; it
involves manual feature engineering, including data cleaning, and can take far more
time than the actual data analytics.
One of the main objectives of the work in this thesis in the area of learning
analytics is to explore the latent signals or information buried in raw data by building
high dimensional and distributional representations of student profiles and their
programming codes or the outputs of their programming assignments. We propose
a new methodology to profile individual CS students based on their programming
design using a technique called embeddings. An embedding is a mapping from
discrete objects to real number vectors. The dimensions of such a mapping may not
always be meaningful or easily explainable in Machine Learning terms. However,
the patterns of location and the distances between vectors derived from embeddings
may uncover numerous latent factors. In recent research in Deep Learning and
Artificial Intelligence, the value of the amount of data available has surpassed that of
the complexity of the models. Thus, we investigate the use of hundreds of thousands
of code submissions as input to a Deep Learning model.
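To make the notion of an embedding concrete, the following is a minimal sketch rather than the model used in this work: a lookup table that maps a small, made-up vocabulary of code tokens to dense real-valued vectors, randomly initialised with NumPy instead of being learned.

```python
import numpy as np

# Made-up vocabulary of discrete objects (code tokens).
vocab = ["def", "return", "for", "if", "print"]
token_to_index = {tok: i for i, tok in enumerate(vocab)}

# An embedding is a |vocab| x d table of real numbers; in practice its values
# are learned (e.g. by a neural network layer), here they are random purely
# for illustration.
embedding_dim = 4
rng = np.random.default_rng(42)
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

def embed(token):
    """Map a discrete token to its real-valued vector."""
    return embedding_matrix[token_to_index[token]]

# Locations of, and distances between, embedded vectors can expose latent structure.
print(embed("def"))
print(np.linalg.norm(embed("for") - embed("if")))
```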
The research questions that we investigate in this particular aspect of the thesis
work can be enumerated as follows:
RQ2: How can students’ programming submissions be encoded into vectors for use
as internal representations of those students?
RQ3: By leveraging the vectorisation of code submissions for a given course, how
can we represent students based on their programming work?
The accuracy of our Predictive Analytics models is crucial as students will receive
customised feedback regarding their predicted performance. We were then able to
measure students' engagement with these customised notifications and how that
engagement could itself be an indicator of their performance. In addition, students were surveyed for their
views and impressions.
The research questions derived from this subsection are the following:
RQ4: What are the effects of timely automatic adaptive support and peer-
programming feedback on students’ performance in computer programming courses?
RQ5: What are students’ and teachers’ perspectives and experiences after adopting
a predictive modelling and adaptive feedback system into their own classes?
MOOCs are revolutionising education by giving students around the world open ac-
cess to first-class education via the web. Lectures, readings, exercises and discussion
forums are now one click away for anybody with an internet connection and a com-
puter, anywhere. MOOCs gained popularity in 2012 which, according to the New
York Times, was "the year of the MOOC". Since then, the leading providers
have been Coursera, Udacity and edX.
The single research question derived in this area is stated as follows:
RQ6: Can we extract valuable insights from massive open online learning platforms
utilising the sequences of learning states?
(i) Introduction: the current chapter, which presents some of the context for
the work reported later.
(ii) Literature Review (Chapter 2): this chapter explores state-of-the-art re-
search in Learning Analytics and Educational Data Mining available in the
literature.
(iii) Students’ Digital Footprints and Data Used in the Thesis (Chapter
3): we introduce the datasets used for the studies which we use throughout
the thesis, and these datasets are taken from two institutions.
(iv) Modelling Student Online Behaviour (Chapter 4): this chapter gives
an overview of how to deploy a traditional Machine Learning model in an
educational environment using students' digital footprints taken from that
environment.
(v) Modelling Students With Embeddings (Chapter 5): building on the pre-
vious chapter we explore how we can model students using their code submis-
sions by leveraging the technique of embeddings.
(vi) Adaptive Feedback to Students (Chapter 6): we study how students im-
prove their performance in end-of-semester module examinations based on the
feedback provided to them.
(vii) Using Graph Theory and Networks to Model Students (Chapter 7):
we look at graph theory to explore how students learn mathematical concepts.
(viii) Conclusions (Chapter 8): this final chapter summarises the research pre-
sented, revisits the research questions and asks whether they have been answered,
and proposes future directions for further research.
Chapter 2
Literature Review
2.1 Introduction
There has been significant research interest in searching for the
factors which motivate students to succeed in their first computer programming
module as they master a programming skill set. In particular, researchers have been
trying to identify the so-called "weak" students by looking at their characteristics,
demographics, online and offline behaviour, and performance in assessments [48].
Demographic, academic and psychological factors are all examples of static char-
acteristics. When used for predicting computer programming success they include
such things as prior programming knowledge [62], prior academic history such as math-
ematics scores, the number of hours spent playing video games and programming self-esteem
[16, 88]. All of these have been used in analyses of the learning of computer programming,
with some success.
However useful these are, these factors do have some limitations [105]. First,
this information has typically been gathered using written questionnaires. Lectur-
ers in University settings have to process these questionnaires and, by the time they
have done so, some students may already have disengaged from their course. Second, and
more importantly, these parameters do not reflect the students' actual effort and
their learning progress throughout their course, and they might discourage students who
are working on the material but who possess characteristics, such as previous mathematics
results, that suggest they are likely to experience difficulties.
More recently, researchers have shifted their focus to a more data-driven approach
to predicting student outcomes by analysing computer programming behaviour, in-
cluding patterns in compilations and programming states associated with the com-
puter programs that students write and submit for assessment. These are substan-
tially more effective at reflecting actual programming ability and competence, as
well as progress in learning, than the characteristics and test performance gathered
prior to the commencement of the course [104, 18].
The two main predictive measures are the Error Quotient [54] and the Watwin
Score [105], which measure a student's behaviour between compilations and the transi-
tions in their learning from compilation errors. These metrics gather snapshots of
the student's code on compilation, using BlueJ or Microsoft Visual Studio with the
OSBIDE plug-in in courses taught in the Java or C++ programming languages, and
they can potentially augment the programming environment to offer dynamic feedback
or pathways.
Based on these predictors, new models, like the Normalised Programming State
Model [23] which focuses on learning transitions, or data-driven approaches using
machine learning, are emerging for these types of courses [1, 8]. In Computer Science
education research, there are further studies to evaluate how students learn and to
identify “at-risk” students by detecting changes in their behaviour as they learn
computer programming, over time [39, 24].
… identify those in need [51]. Combining learning analytics engagement features with
programming states or behaviours in large classes can enable lecturers to automat-
ically identify students having difficulties at an earlier stage [58].
In Computer Science Education research, different models of student programming
learning behaviour have been developed depending on the granularity, namely the
frequency and type, of the events captured. The digital footprints used to drive these models
include keystrokes, program edits, compilations, executions and submissions [51].
In our work, explained in more detail in the following sections, we leverage an
automated grading system for the teaching of programming. We collect submissions,
a fine-grained footprint about each submission, and web logs regarding students’
interactions with the material. However, we should note, we are limited by the
frequency of the students submitting their solutions and we miss the programming
actions in between.
(i) Data management and storage: data typically comes from files, databases
or streams (for real-time processing of live data). Files are good for distri-
bution, and they can hold structured or unstructured data. Databases are a
good choice for centralised information and network access, and their structure
can be enforced using schemas. In our research, we sync all the students'
programming file submissions to our own systems and we are also provided with
access to other files such as grades and student demographics. These files are
usually in plain-text formats such as CSV or JSON, which are human readable.
Binary formats are also used for storing numeric arrays. In addition, when we
develop web platforms for Faculty to use, we store this type of information
in structured or unstructured databases.
(ii) Data wrangling and cleaning: datasets typically contain errors, inaccu-
racies, missing values, duplicates, inconsistencies, etc. Data wrangling is the
process of transforming raw data into data we can process for extracting useful
information. Raw data should be kept separate from cleaned data. Typically,
we fix inaccuracies of the data and deal with missing values at this stage of
our work.
(iii) Data summarisation: from the data distributions of each of the variables
in our dataset, we analyse measures of central tendency (statistics that cap-
ture the middle of a distribution) such as the mean, median and mode, as well
as measures of statistical dispersion (statistics that measure how stretched a
distribution is) such as the variance, standard deviation and inter-quartile range.
In addition, we can measure statistics of association between variables such
as the covariance (how much two variables vary together), the linear (Pear-
son) and non-linear (Spearman) correlation coefficients (normalised versions of
the covariance for measuring the relationship between quantitative variables)
and the mutual information (a measure of the mutual dependence between two
variables, also known as the "correlation for the 21st century" [99]).
In our work, we confirm the predictive power of our features by analysing their
correlation coefficients with a target variable. For instance, the percentage of
programming work done by students is typically highly correlated with their
performance on examinations. A short illustrative sketch of these statistics is
given after this list.
(iv) Data visualisation: exploring the distributions of our variables and their
relationships visually is incredibly useful. This provides us with sanity checks
for our datasets and we can generate or confirm any hypothesis we may have
at this stage. Visualisations are also key to communicate our hypotheses,
conclusions and findings. For that, we typically use histograms for distribu-
tions, scatter plots for relationships between variables or bar charts to compare
quantities.
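As a short, hedged illustration of the summary and association statistics described in item (iii) above, the following sketch uses pandas and scikit-learn's mutual information estimator on made-up weekly data; none of the values are from our actual datasets.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Made-up weekly data: programming work done (%) and final exam grade (%).
df = pd.DataFrame({
    "programming_pct": [10, 35, 50, 55, 70, 80, 90, 95],
    "grade":           [20, 30, 45, 50, 60, 75, 85, 90],
})

# Central tendency and dispersion.
print(df["programming_pct"].mean(), df["programming_pct"].median())
print(df["programming_pct"].var(), df["programming_pct"].std())

# Association: covariance, Pearson (linear) and Spearman (rank) correlations.
print(df.cov())
print(df.corr(method="pearson"))
print(df.corr(method="spearman"))

# Mutual information between the feature and the continuous target.
print(mutual_info_regression(df[["programming_pct"]], df["grade"], random_state=0))
```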
Our dataset will be split into three sets: training, validation and testing.
The training set will be used to fit the model and the validation data to optimise
the hyperparameters of the learning function. Cross-Validation is a technique
used to validate the model by repeatedly training and testing it on subsets of
the data (also known as folds). The testing set will be used
to calculate the error using a scoring method such as Accuracy or F1-
Score. Some of the more commonly used supervised learning algorithms
are: Linear Regression (regression), Logistic Regression (classification),
Decision Trees (typically classification), Random Forests (typically clas-
sification), Support Vector Machines (typically classification) and many
more. A description of these different algorithms is beyond the scope
of this thesis but can be found in any good online learning material or
textbook such as [43] and [17].
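The following is a minimal sketch of this splitting and cross-validation workflow using scikit-learn; the feature matrix, labels and model choice are illustrative assumptions rather than the exact setup used in our experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split

# Made-up feature matrix (rows: students, columns: features) and pass/fail labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Hold out a test set; a further validation split (or a grid search) would be
# used to optimise hyperparameters of the learning function.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = RandomForestClassifier(random_state=0)

# Cross-validation: repeatedly train and evaluate on folds of the training data.
print(cross_val_score(model, X_train, y_train, cv=5, scoring="f1"))

# Final fit and error estimate on the untouched test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred), f1_score(y_test, y_pred))
```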
Figure 2.1: Venn diagram showing how deep learning is a kind of representation
learning, which is in turn a kind of machine learning, which is used for many but
not all approaches to AI. Image taken from [43].
Figure 2.2: Flowcharts showing how the different parts of an AI system relate to
each other within different AI disciplines. Shaded boxes indicate components that
are able to learn from data. Image taken from [43].
Even more recently, [6] developed a code2vec neural attention network that col-
lects AST paths and aggregates them to extract syntactic information from code
snippets. Their objective was to predict semantic properties such as method names
by representing snippets of code as continuous distributed vectors, also known as
Code Embeddings. In our work, we build similar higher-level distributed vectors to
predict the correctness of code solutions, verifying that patterns and meaningful
information can be extracted from them.
… that human tutors would have given. Also, [44] proposed feedback strategies and
automatic example assignments using structured solution spaces. More recently,
[86] collected a dataset of rich event streams. Instead of studying artifacts after
the fact, they built FeedBaG, a general-purpose interaction tracker for Vi-
sual Studio that monitors development activities, and collected data from software
developers.
In our work, code solutions from students are transformed into continuous dis-
tributed vectors, Code Embeddings, to be used as a representation of their program-
ming submissions (code2vec). These vectors are leveraged to construct a matrix that
represents each user in a comparable way (user2code2vec). [93] proposed a Tensor
Factorisation approach for modelling learning and predicting students' performance
that does not need any prior knowledge. This work outperformed state-of-the-art
approaches for measuring learning and predicting performance such as Bayesian
Knowledge Tracing and other tensor factorisation approaches. We were inspired
by this work [93] to develop a similar representation for users who learn coding at
our University and we use embeddings to learn higher level representations of that
information.
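As an illustration only of the general idea, not the exact user2code2vec construction described later in the thesis, one simple way to turn per-submission code embeddings into a comparable per-student representation is to stack them into a fixed-size matrix; the dimensions, exercise indices and random vectors below are made up.

```python
import numpy as np

EMBEDDING_DIM = 100   # dimensionality of each code embedding (assumed)
NUM_EXERCISES = 20    # exercises in the course (assumed)

def student_matrix(submission_embeddings):
    """Build a (NUM_EXERCISES x EMBEDDING_DIM) matrix for one student.

    submission_embeddings maps an exercise index to the embedding vector of
    that student's latest submission; unattempted exercises remain zero rows,
    so every student is represented in a directly comparable shape.
    """
    mat = np.zeros((NUM_EXERCISES, EMBEDDING_DIM))
    for exercise, vector in submission_embeddings.items():
        mat[exercise] = vector
    return mat

# Example with two hypothetical submission embeddings.
rng = np.random.default_rng(0)
student = student_matrix({0: rng.normal(size=EMBEDDING_DIM),
                          3: rng.normal(size=EMBEDDING_DIM)})
print(student.shape)  # (20, 100)
```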
Feedback has always been one of the most effective methods in enhancing stu-
dents’ learning [47]. There is an abundance of factors that affect educational achieve-
ment. Some factors are more influential than others. For instance, feedback types
and formats and the timing of providing feedback [95] are both important. Stud-
ies have reported that positive feedback is not always positive for students’ growth
and achievement [47]; “critical” rather than “confirmatory” feedback is the most
beneficial for learning regardless of whether feedback was chosen or assigned [31].
Content feedback achieves significantly better learning effects than progress feedback,
where the former refers to qualitative information about the domain content
and its accuracy, and the latter describes the quantitative assessment of the student's
advancement through the material being covered [53]. Several of the different
feedback factors were explored at the intersection with the learner’s variables (i.e.
skills, affects) and reported to support personalised learning [75]. For instance, cog-
nitive feedback was found to make a significant difference in the outcomes of student
learning gains in an intelligent dialogue tutor [21]. Students’ affects were adapted to
improve motivational outcome (self-efficacy) in work reported in [21, 32] while using
student characteristics as input to tutoring feedback strategies to optimise students’
learning in adaptive educational systems was reported in [76]. While a large body of
empirical studies investigate the feedback impacts in the context of learning [110],
we focused on researching educational technology to support delivering adaptive
feedback for computer programming courses.
In the final topic for our literature review, we briefly summarise work in the area
of graph theory and networks. The reason for including this topic is that a system
called ALEKS (Assessment and Learning in Knowledge Spaces), which we describe
in the next Chapter in Section 3.2.2, leverages AI techniques in order to map stu-
dents’ knowledge. ALEKS is based on knowledge spaces, which was introduced in
25
Artificial Intelligence in Computer Science and Mathematics Education
1985 by Doignon and Falmagne, who describe the possible states of knowledge of a
learner [35]. In order to develop a knowledge space, a domain like Algebra or Chem-
istry is modelled and divided into a set of concepts and feasible states of knowledge
where the student’s knowledge is at any given point in time. This technology adapts
and navigates the students by determining what the student may know and may
not know in a course and guides her to the topics she is most ready to learn. It as-
sesses the student’s knowledge periodically to ensure topics are learned and retained
[36]. Recent research has shown that using ALEKS for learning Mathematics has a
positive learning impact on an after-school program for more than 200 sixth graders
[29, 30].
Related to our use of data from the ALEKS system, recent research has shown
that Network Analysis measurements can be used as predictive features for machine
learning models in addition to generic content-based features [26]. Moreover, se-
quential modelling (e.g. Hidden Markov Models (HMMs)) can be useful for uncovering
student progress and students' learning behaviours [84, 83, 50]. We hypothesise that
modelling the evolution of a large number of students' working behaviours with so-
cial network features will allow us to uncover students' progression. This, in turn,
opens up the possibility of enhancing the student experience with further personalised
interventions in these Intelligent Tutoring Systems, as they gather rich information
about concepts, topics and learning states.
Chapter 3
Students' Digital Footprints and Data Used in the Thesis
In the work reported in this thesis, we leveraged data from two Higher Education
Institutions, namely Dublin City University (DCU) and Arizona State University
(ASU).
Higher Education Institutions collect data about their students at multiple points
during the student journey and they store this in different locations and on different
institutional systems. This data includes information on students’ background and
demographics at the time of initial registration, interaction with the institution’s on-
line learning environments and other online resources like WiFi access, online library
resources and student support services, geolocated data from physical locations like
lecture attendance or library accesses, and some aspects of their social activities like
memberships of clubs and societies. Leveraging all these sources of information and
many more, if they were integrated together, could shape a picture of the students’
engagement and involvement on campus.
… an even greater digital footprint than learning in other disciplines [51, 52]. Some of
these platforms are: Web-CAT [38], CloudCoder [81], CodeWorkout [37], Blackbox
[22] using the BlueJ plugin [59], and many more.
These types of computer program submission platforms are often used to eval-
uate the correctness of the students’ computer programming work which acts as a
measure of their progression through the course, and the effectiveness of their learn-
ing. Analytics platforms could be used to make use of this information in order
to understand students’ engagement and behaviour, which could, in turn, be an
indicator of their learning experience. Unfortunately, these automated assessment
systems are not the only tools that the students and the instructors will use, espe-
cially when students take multiple courses or modules in the computer science area,
and they often have to switch among several online educational platforms for each
course. Therefore, without collecting all the diverse interaction data, plus all the
other data on students that an institution has, it is challenging to establish reliable
groundtruth data in order to train predictive models.
Dublin City University offers two honours Bachelor degrees in Computer Science
through its School of Computing, a B.Sc. in Computer Applications (CA) and a
B.Sc. in Enterprise Computing (EC). CA prepares students for a career in com-
puting and information technology by giving them in-depth knowledge of software
engineering and the practical skills to apply this knowledge to develop the tech-
nology behind computing-based products. EC prepares students to use computing
technology to help organisations to work together and give companies a competitive
edge in the marketplace. The EC degree is more focused on topics like managing
information technology and developing and using systems to improve and even to
re-design the way organisations do business. It is safe to say that CA teaches students
deeper computer programming skills while EC is more business and project manage-
ment oriented.
The data used in this thesis from DCU was drawn from students registered in
the CA and EC degree programs. Specifically, the data sources we made use of in
order to model student interaction, engagement and effort in computer programming
courses in DCU consists of:
• Interaction logs: Students interact online with the custom VLE developed
for computer programming courses and every instance of a student’s access to
a page of any kind is recorded and stored. These are web logs from an Apache
web server for the resource or page requested, the date and time of access, the
unique student identifier, and the IP address of the device used for access.
Table 3.2 presents a list of the courses we used to test our various research
questions and hypotheses, which will be discussed in the following sections. Some of
these courses were delivered multiple times and we use data gathered from multiple
runs of these courses.
These courses use the custom VLE for the teaching of computer programming
which allows us to capture a fine-grained digital footprint of students interacting
with computer programming learning material and submitting their code solutions.
1 Dr. Stephen Blott is an Associate Professor at the School of Computing in Dublin City
University: http://www.computing.dcu.ie/~sblott/
Figure 3.1: Screengrab from the Virtual Learning Environment for the Teaching of
Computer Programming at Dublin City University
Figure 3.2: Instant Feedback Provided to the Student After Submitting a Program
to the Automated Grading Assistant
In addition to log and interaction data from the custom VLE for learning computer
programming, a range of other sources have been used to extract data on all first-year
undergraduate students at the university. This was gathered in order to model their
behaviour and chances of success at the university, and it includes the following:
• Library: dates of borrowing instances (books) and dates and times of each
occasion entering library building.
• General VLE usage: students at Dublin City University use Moodle as their
main VLE and records associated with assignment submissions were extracted.
Approval for access to this confidential non-anonymised student data was granted
by DCU Research Ethics Committee, reference DCUREC/2014/195.
(which mostly are students’ clickstreams) are logged along with their timestamp.
Examples of the clickstream data which is logged include logging in and out, clicking
on a question to review, bookmarking a question, navigating through an exam, and
taking of notes.
The data used in this thesis was collected from a classroom study conducted
in a Data Structures and Algorithms course offered during the Fall 2016 semester.
This class had a total of 3 exams and 13 quizzes. Among the 13 quizzes, only 6
were graded while the remaining 7 were recorded only for attendance (full credit
was given regardless of the answers). There were 283 students enrolled in the class
but only 246 (86.93%) were included in the study as those who dropped the course
in the middle of the semester, did not take the three exams, or did not use the
reviewing platform at all had to be removed. In the study presented in this thesis,
we analysed review actions performed by students. A review action is an event where
a student examines his or her graded answer. It includes reading the question, the
answer, the assigned score and the feedback provided by the grader (see Figure 3.3
for an example). These review actions are collected via the web logs. We consider that a
student has read a question or answer when he or she has clicked on the corresponding material resource.
In 2016, Arizona State University (ASU) launched the Global Freshman Academy
(GFA) where they provide first-year university courses through the EdX platform
allowing students to earn transferable ASU credits from anywhere. GFA makes
university education available to anybody, from high school students to retirees
going back to study at college. ASU currently offers 13 courses and our analysis will
focus on two Mathematics modules: “College Algebra and Problem Solving I”, and
“Precalculus”.
These courses leverage the Assessment and Learning in Knowledge Spaces (ALEKS)
technology, which is a web-based artificially intelligent assessment and learning sys-
tem owned by McGraw-Hill Education. This technology was developed at New York
University.
Figure 3.3: Web Programming Grading Assistant Platform When a Student Reviews
Her Graded Response to an Exercise
… ALEKS, and daily aggregates of the topics learned and retained were generated. We
tracked 40,000+ assessments and 8+ million daily aggregates in this work. In addi-
tion, 5+ million transactions of students navigating through the concepts have been
extracted from the EdX logs. Each timestamped transaction contains information
on the student, the concept being studied and a learning state. The learning states
are the following, and final states are determined automatically by the system:
• L: Initial state for each concept where a student reads the Lesson
In the next chapter we will describe how we used established machine learning
techniques to model students’ learning based on their digital footprints.
In order to determine which features are most important we used two approaches:
• The first one is an extra trees classifier, a type of forest that fits a number of
randomized decision trees [41]. We utilized this technique for the predictive
models developed for Dublin City University's computing courses, explained
in more detail in Chapter 4. We fitted an extra trees classifier with 250
estimators to measure the importance of each feature. Interestingly, we can
observe how static features such as previous university scores or the entry-to-
university Mathematics score are important in the first week of the semester,
while the dynamic programming work features and the effort students put in
gain importance as the semester progresses.
• The second one is an ablation study, which refers to removing a feature from
the model and seeing how that affects performance [65]. We
leveraged this approach for determining the most meaningful features for all
first-year students at Dublin City University, also explained in more detail
in Chapter 4. We excluded each feature in turn from a linear model in order
to measure how much of the variance that feature explained. The top features were
considered the most meaningful ones. A short, hedged sketch of both approaches
is given after this list.
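The sketch below illustrates both approaches on made-up data with scikit-learn: an extra-trees classifier with 250 estimators (mirroring the setting mentioned above) whose impurity-based importances are inspected, and a simple ablation loop with a linear model; the feature names and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical feature names and made-up data.
feature_names = ["cao_points", "maths_score", "prog_work", "web_hits"]
rng = np.random.default_rng(2)
X = rng.normal(size=(300, len(feature_names)))
y = (X[:, 2] + 0.3 * X[:, 0] + rng.normal(scale=0.7, size=300) > 0).astype(int)

# Approach 1: impurity-based importances from a 250-estimator extra-trees model.
forest = ExtraTreesClassifier(n_estimators=250, random_state=0).fit(X, y)
for name, importance in zip(feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")

# Approach 2: ablation study - drop one feature at a time from a linear model
# and measure how much the cross-validated score falls.
baseline = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
for i, name in enumerate(feature_names):
    reduced = np.delete(X, i, axis=1)
    score = cross_val_score(LogisticRegression(max_iter=1000), reduced, y, cv=5).mean()
    print(f"without {name}: score drop {baseline - score:.3f}")
```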
Finally, in our projects we can reduce the number of features in our models by
selecting the most meaningful ones. In addition, we can also reduce this di-
mensionality by using techniques such as Principal Component Analysis (PCA) [56],
which projects the data into a subspace that maximizes the variance retained. These are
areas of great interest in ML, whether selecting important features or reducing
dimensionality, but they are outside the scope of this thesis.
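For completeness, a minimal sketch of PCA-based dimensionality reduction with scikit-learn on a made-up feature matrix; the number of components retained is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up high-dimensional student feature matrix.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 30))

# Project onto the two directions that retain the most variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # proportion of variance retained per component
```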
Chapter 4
Modelling Student Online Behaviour
4.1 Introduction
Dublin City University’s academic year is divided in two semesters with one week of
inter-semester break in between. Semesters are comprised of a 12-week teaching or
classes period, a 2-week study period and a 2-week exam period. Laboratory sessions
and computer-based examinations are carried out during the teaching period. We
have developed predictive models for a range of computer programming modules
including the following:
… involves two hours of supervised laboratory work each week using the Bash
Unix shell and command language. This course has been taught for the past
seven academic years, since 2010/2011.
CA116 and CA277 are taught in the first semester (Fall). CA117, CA114 and CA278
are taught in the second semester (Spring). In all courses, students are assessed by
taking two laboratory computer-based programming exams, a mid-semester and
an end-of-semester assessment, during the teaching period. In CA278, instead of an
end-of-semester lab exam, students demonstrate a working project. Each laboratory
exam or demo contributes equally to their continuous assessment mark; 15% in
CA117, 25% in CA114 and 20% in CA278. Students are not required to submit
their laboratory work for CA114 or CA277. In contrast, laboratory work counts
towards the final grade of the course for CA117 and CA278, in both cases contributing 10% of
the overall grade for the course.
The university also provides in-lab peer-mentoring for some courses as well as the
lecturer attending the laboratory sessions. In CA116 and CA117 around eight CA
second-year students give tutoring support during laboratory sessions. In CA114
and CA278 a postgraduate student has been providing support to students during
laboratory sessions. The automated grading platform, introduced earlier in Sec-
tion 3, is currently used in a variety of programming courses across computing at
DCU including CA116, CA117, CA114, CA277 and CA278. Students can browse
course material, submit and verify their laboratory work.
We know that students’ digital footprints commence prior to their arrival at the
university as demographics and GPA (CAO points) are collected at the time of
application.
We analysed 950 first-year Computer Science (CS) entrants across a seven year
period through the Leaving Certificate entry route. Early analysis showed a high
correlation between the entry level GPA equivalent and first year final exams aggre-
gate as shown in Figure 4.1. CAO (Central Applications Office) points have a maximum of
600 and, while there is no theoretical minimum, there is a minimum number of CAO
points required for entry into each University course in Ireland and generally this
is above 300 points. The Precision Mark shown on the y-axis in Figure 4.1 is the
overall percentage aggregated across all subjects in first year, including computer
programming modules.
Figure 4.1: University Entry Points Correlated with First-Year Precision Mark for
Computing Students at Dublin City University from 2013 - 2014 Academic Year to
2016 - 2017 Academic Year
… where our work would have the most impact. For instance, Figure 4.2 shows the
number of students enrolled in one of these courses, CA114, since it was introduced
into the CS curriculum in the 2010/2011 academic year.
Figure 4.3 shows the number of examinations taken in May compared with the
resit examinations taken in August, from the 2009 - 2010 Academic Year to
the 2015 - 2016 Academic Year.
Figure 4.4 shows how CA114 is one of the courses with the highest failure rates
in some past years and where students were having the most issues. This module,
along with the others mentioned earlier in first and second year, was then added to
our studies.
Figure 4.2: Number of Students Enrolled in CA114 from the 2009 - 2010 Academic
Year to the 2015 - 2016 Academic Year

Figure 4.3: CA114's Numbers per Examination from the 2009 - 2010 Academic
Year to the 2015 - 2016 Academic Year
Figure 4.4: CA114's Failure Rates per Examination from the 2009 - 2010 Academic
Year to the 2015 - 2016 Academic Year

For each course, a number of features was extracted from the data on a weekly
basis. A combination of static and dynamic student features was used for the weekly
learning function which we will introduce in the next section. A set of static features was
extracted before the start of the semester for each course and each student based
on their characteristics and prior academic performance. Then, each week, a set
of dynamic features was collected for each student by building engagement and
progression features based on their interactions and submissions to the platforms.
The data sources we leverage in order to model student interaction, engagement and
effort in computer programming courses are student characteristics, prior aca-
demic history, interaction logs and programming submissions, as explained
in Chapter 3.
The following set of static features were extracted before the start of the
semester for each course and student:
• Student characteristics:
– Irish CAO points and Leaving Certificate exam scores (equivalent to High
School GPA and SAT exams in the US)
Then, each week, a set of dynamic features was extracted for each student based
on raw log data, interaction events for students accessing material, and the corresponding
computer programming submissions. For instance, in [69] it was found that a high
level of VLE activity was a good indicator of performance and, in particular, evening activity was
an indicator of good performance. The following dynamic features were extracted:
• programming effort:
• engagement:
– Resources clicked with respect to all resources made available each week.
– Average lapse time between a resource being made available and the
student accessing it for the first time.
Table 4.1 lists the features with an associated short name that we will use for graphs
and tables in the remainder of this chapter.
At this point in the thesis, we will focus on just one of the modules in order
to make the research methodology easier to understand. The module we choose is
CA116, Programming I, for first-year Computer Applications students.
In order to identify the predictive power of the various features in our student
model, we measured the linear and non-linear relationships between their values
and our predicted target, the next laboratory exam results. We used the Pearson
and Spearman correlation coefficients and their p-values (which indicate the probability
of observing such a correlation under the null hypothesis that the variables are
uncorrelated).
For instance, in the last week of the teaching period (week 12), see Figure 4.5,
the cumulative programming work and the grade are highly correlated (a correlation
coefficient of 0.62 with an associated p-value of less than 0.01).
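A small sketch of how such correlation coefficients and their p-values can be computed with SciPy; the arrays below are illustrative rather than the actual CA116 week-12 data.

```python
from scipy import stats

# Illustrative week-12 values: cumulative programming work (%) and exam grade (%).
cumulative_work = [15, 30, 40, 55, 60, 70, 85, 95]
exam_grade = [25, 35, 30, 50, 65, 70, 80, 90]

pearson_r, pearson_p = stats.pearsonr(cumulative_work, exam_grade)
spearman_r, spearman_p = stats.spearmanr(cumulative_work, exam_grade)

print(f"Pearson  r={pearson_r:.2f}, p={pearson_p:.3f}")
print(f"Spearman r={spearman_r:.2f}, p={spearman_p:.3f}")
```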
For each week in the academic year 2016/2017, we measured the correlation
among the features of the same week for CA116, see Figure 4.6. The labels for each graph
are in the same order as in Figure 4.5: Campus Rate, Coverage, Cum. Programs,
Programs Correct, Week Rate (all of them for each week) and the Grade (the target
of our predictions and the variable against which we calculate the correlation coefficients).
This analysis confirms the predictive power of our features: the weekly and
cumulative programming progress features increasingly gain importance throughout
the semester as students put more effort into the module and the model learns
more about each student because it has more data on each student to learn from.
Training, Validation and Testing
Over the last few years, at Dublin City University, we have been developing pre-
dictive models to identify students struggling or having issues with course material
that they are studying. We do this by training these models with past student data
from previous cohorts of students and running these in pseudo real-time (running
them every week) with incoming new student data on the present cohort of students
based on their engagement activities during the previous week.
For the course CA116 and the academic year 2018/2019, we trained weekly
predictive models leveraging student data from the 2016/2017 and 2017/2018
cohorts. See Table 4.2 for details. This module had three laboratory examina-
tions during the semester, which took place in weeks 4, 8 and 12. The results of
those exams are the target of our predictions. The proportion of students that
passed vs. failed is roughly the same in the training and validation sets, so the
class proportions are consistent across the two sets.
Table 4.2: CA116 Split between Training, Validation & Test sets
A set of binary classifiers, one per week, was built to predict each student's likelihood
of passing or failing the next computer-based laboratory exam based on their data.
CA116 had three laboratory exams in the semester so, to clarify, the classi-
fiers for weeks 1 to 4 were trained to predict the laboratory exam outcome (pass
or fail for each student) in week 4; those for weeks 5 to 8, the laboratory exam outcome in
week 8; and those for weeks 9 to 12, the end-of-semester laboratory exam outcome in
week 12.
At a given week, the dynamic features mentioned above were extracted from that
week's activity log and programming submissions. Then, every week, a classifier was
built by concatenating the static student data, the dynamic features from the previous
weeks and that week's dynamic features, in order to account for each student's
progression throughout the course.
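As a hedged sketch of this weekly concatenation, the helper below stacks static features with the dynamic features of all weeks up to the current one; the array shapes, feature counts and data are illustrative assumptions rather than our actual feature set.

```python
import numpy as np

def weekly_feature_matrix(static, dynamic_by_week, week):
    """Concatenate static features with dynamic features up to `week` (1-indexed).

    static:          (n_students, n_static) array, fixed before the semester.
    dynamic_by_week: list of (n_students, n_dynamic) arrays, one per week.
    """
    return np.hstack([static] + dynamic_by_week[:week])

# Illustrative sizes: 100 students, 5 static and 4 dynamic features per week.
rng = np.random.default_rng(4)
static = rng.normal(size=(100, 5))
dynamic = [rng.normal(size=(100, 4)) for _ in range(12)]

X_week3 = weekly_feature_matrix(static, dynamic, week=3)
print(X_week3.shape)  # (100, 5 + 3 * 4) = (100, 17)
```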
In terms of implementation, the empirical error minimization approach was em-
ployed to determine the learning algorithm with the fewest empirical errors from
a bag of classifiers C [33]. The bag of classifiers consists of the following learning
algorithms:
(b) Decision Tree: interpretable model where a tree is built by splitting the train-
ing data using the classification features.
(c) Random Forest: ensemble classifier that fits a number of decision tree classi-
fiers and uses averaging to improve accuracy and control overfitting.
(d) K-Neighbors (k-NN): algorithm that memorizes the labelled data during train-
ing and makes a decision on classification by looking at the k closest training
examples. k is a hyper-parameter.
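A minimal sketch of this empirical-error-minimisation step over such a bag of classifiers, using scikit-learn and assuming the weekly feature matrices (X_train, y_train, X_val, y_val) have already been built; the hyper-parameter values shown are illustrative defaults, not the exact configuration used.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# The bag of classifiers C described above.
bag = {
    'mlp': MLPClassifier(max_iter=500),
    'decision_tree': DecisionTreeClassifier(),
    'random_forest': RandomForestClassifier(n_estimators=100),
    'knn': KNeighborsClassifier(n_neighbors=5),
}

def select_by_empirical_risk(bag, X_train, y_train, X_val, y_val):
    """Fit each classifier and keep the one with the fewest errors on held-out data."""
    risks = {}
    for name, clf in bag.items():
        clf.fit(X_train, y_train)
        risks[name] = float(np.mean(clf.predict(X_val) != y_val))  # empirical risk = error rate
    best = min(risks, key=risks.get)
    return best, bag[best], risks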
These models were trained every week with the training data and evaluated on the validation data using several scoring metrics including the receiver operating characteristic area under the curve (ROC AUC), accuracy (see Figure 4.7), F1 score (see Figure 4.8), precision and recall. In addition, instead of just taking the learning algorithm with the lowest empirical risk or the highest metric (namely accuracy or F1-score), we also looked at these metrics per class. Generally, the outcome of the next laboratory exam, our target variable, is quite imbalanced, as on some courses more students pass than fail the exams. The resulting accuracy of a learning algorithm could therefore be misleading if the predictions are weighted by the numbers per class. Our goal is to identify weak students: we would rather classify students “on the edge” as likely to fail than not flag them at all and miss the opportunity to intervene and help them.
Figure 4.7: Empirical Risk for CA116 for the Training Data using Accuracy
We should note that we only have two years of archival data from previous student
cohorts and thus not many samples of past student data for a machine learning
algorithm. Hence, most classifiers perform very similarly.
Figure 4.8: Empirical Risk for CA116 for the Training Data using F1-Score
Figure 4.9: Empirical Risk for CA116 for the Training Data using F1-Score for the
Fail Class Only
Following this approach, we chose to use the learning algorithm which minimized
the empirical risk on average over the 12 weeks, and this was then used in deployment for each weekly classifier. Figures 4.8 and 4.9 show that some of these classifiers do a poor job of detecting students who are likely to fail. Classes are typically imbalanced as there may be more students passing a particular exam than failing it. Hence, the F1-Score is an appropriate metric, and it shows that the MLP classifier is not learning at all, predicting that all students would pass in some of the weeks. Random Forest is the chosen algorithm from the bag of classifiers as it minimized the empirical risk on average over the 12 weeks and keeps a good balance between both classes. Random Forest has been shown to outperform other classifiers when mining student data for predicting performance [68]. Other algorithms in our bag of classifiers, such as the MLP, may suffer from poor initialization and fail to reach a good minimum.
In our study, we chose a Random Forest classifier; one of the hyperparameters that has to be tuned, for instance, is the number of trees. So, for every week, using the validation data we held out, we optimized these hyperparameters using Grid Search. Then, we stored these weekly learned
models to be used with incoming student data. Figure 4.10 shows how this trained
model performs on the validation data we held out.
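A minimal sketch of this weekly tuning step with scikit-learn's Grid Search, using the fixed held-out validation split described above rather than random folds; the parameter grid shown is illustrative.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# X_train/X_val, y_train/y_val are the weekly training and validation sets built above.
X_all = np.concatenate([X_train, X_val])
y_all = np.concatenate([y_train, y_val])
split = PredefinedSplit(test_fold=[-1] * len(y_train) + [0] * len(y_val))

param_grid = {'n_estimators': [10, 50, 100, 200], 'max_depth': [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                      scoring='f1', cv=split)
search.fit(X_all, y_all)
weekly_model = search.best_estimator_  # stored and later applied to incoming student data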
For the incoming cohort of 2018/2019 students taking the CA116 module, we leveraged the models trained on previous cohorts to predict, every week, whether each incoming student would pass or fail the next laboratory examination, together with the associated probability. A summary report was sent to the lecturers of the module and other faculty associated with the classes. See Figure 4.11 for an anonymized example of the feedback on (in this case just 3) students' predicted outcomes. These reports were also posted on a web application, accessible to lecturers, where the retrospective analysis, the feature values tabulated for each student and more detailed analysis could be found, similar to [34].
elling
In section 1.1 the first research question in the thesis was introduced and it is re-
stated here for convenience:
RQ1: When working with new cohorts of University students about whom we
have little historical interaction data, how accurate are the traditional Predictive
Analytics models when used with generic static and dynamic student data features,
in identifying those students in need of assistance in computer programming courses?
We analysed the preliminary results obtained and their impact on the first research question proposed in this chapter. In order to evaluate the accuracy of the predictions, we compared the corresponding weeks' predictions with the actual results of the three laboratory exams that took place in weeks 4, 8 and 12 for CA116. This meant we were able to investigate how our predictions performed with respect to the actual 2018/2019 students' grades; the details are shown in Table 4.3. As the semester progresses, the pass rates for the three laboratory exams decrease. We only show the weeks where there was a laboratory exam, but we could also examine the remaining weeks to see how our model performed on new incoming data.
For each of those exam weeks we created a confusion matrix with the expected
pass/fail and the actual results by looking at the true positives, true negatives,
Table 4.3: CA116 Prediction Metrics including passing rates and at-risk rates
false positives and false negatives, and from these we are able to calculate accuracy, precision, recall and F1-score as shown in the Table. Also, for each laboratory exam, we looked at the passing rate to compare it with the percentage of at-risk students predicted by our models. These metrics were also shared with the lecturer. The evaluation metrics are explained further below.
Figure 4.12 shows how accurately the predictions worked for each week of the semester (not only the exam weeks) using the F1 measure. We chose F1 as the main evaluation metric as classes might be imbalanced: generally, the number of students that pass is not the same as the number that fail, so we cannot rely on accuracy alone.
Figure 4.12: Evaluation using F1 for CA116’s Incoming 2018/2019 Cohort Shown
Weekly
We now take a deeper look at the confusion matrices for each of the exam weeks.
Figure 4.13 shows the confusion matrix for Week 4 while Figure 4.14 shows the
confusion matrix for Week 8 and Figure 4.15 shows the confusion matrix for Week
12.
Figure 4.13: Confusion Matrix for Week 4 for CA116’s Incoming 2018/2019 Cohort
Condition Positive means the student passed the examination and Condition Negative means the student failed it. Predicted Condition Positive means the student was predicted to pass the examination and Predicted Condition Negative means the student was predicted to fail it.
Figure 4.14: Confusion Matrix for Week 8 for CA116’s Incoming 2018/2019 Cohort
Figure 4.15: Confusion Matrix for Week 12 for CA116’s Incoming 2018/2019 Cohort
Moreover, True Positives are students who passed the examination and were predicted to do so, while True Negatives are students who failed the examination and were predicted to do so. False Positives are students who were predicted to pass the examination but in reality failed, and False Negatives are students who were predicted to fail but in reality passed.
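A minimal sketch of how each exam week's confusion matrix and the derived metrics can be computed with scikit-learn; y_true and y_pred are placeholder names for the actual outcomes and the corresponding weekly predictions.

from sklearn.metrics import confusion_matrix, precision_recall_fscore_support, accuracy_score

def weekly_report(y_true, y_pred):
    """Confusion matrix counts plus accuracy, precision, recall and F1 (pass = 1, fail = 0)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average='binary', pos_label=1)
    return {'accuracy': accuracy_score(y_true, y_pred),
            'precision': precision, 'recall': recall, 'f1': f1,
            'tp': int(tp), 'tn': int(tn), 'fp': int(fp), 'fn': int(fn)}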
The analysis carried out on CA116, as a use case for this chapter1, showed we were successfully able to:
(i) gather student data about each student's learning progress by combining static information, such as their characteristics and prior academic record, with dynamic engagement information;
1 The code for this work has been made available as a GitHub repository at https://github.com/dazcona/edm-modelling
(ii) leverage that digital footprint to predict how new incoming students are likely to perform, reaching a usable accuracy.
Behaviours at ASU
Students were excluded from this study if they withdrew during the semester, did not take the three exams, or did not use the research platform. In this study, review actions performed by students were analyzed. A review action is an event where a student examines his or her graded answer.
Predictive models were trained after extracting patterns and tested with the goal
of identifying students’ academic performance and those who might be in need of
assistance. The results of the retrospective analysis show a reasonable accuracy. This
suggests the possibility of developing interventions for students, such as providing
feedback in the form of effective reviewing strategies.
4.10.2 Features
First, a set of features was extracted based on the students’ interaction within the
system and the different actions they performed. For each assessment, the following
were gathered:
1. their grades
3. when they reviewed it for the first time (after being posted online)
In addition, the following general engagement features were computed for each student:
1. the number of distinct days when the system was used by each student
2. the number of interactions, i.e. how many assessments were reviewed per student
The predictive power of these features is measured by correlating their values with
the cumulative exam average (target). Table 4.4 shows a few of those features and
the corresponding correlation coefficient.
Students were more likely to obtain a better score on the average exam grade the more assessments they covered and the more distinct days they used the system.
On the other hand, the later the student reviewed the assessments (on average), the lower the chance he or she had of getting a high grade. This analysis indicates the high correlation of some of the features, such as the number of assessments covered or the number of distinct days. In contrast, other parameters, such as whether the students reviewed specific assessments or how long it took them to review particular assessments, were individually not correlated with the average grade target.
A bag of learning algorithms was trained retrospectively using the students' digital footprint and their observed behavioural patterns. We analyzed their power to predict the students' performance and their generalizability. We used cross-validation to train and test on the same dataset. The target was to predict whether a student would score above or below the threshold of each cumulative exam average. Note that the cumulative exam averages (our target for each period) are: (1) the first exam score, before it took place; (2) the average of the first and second exams, before the second exam; and (3) the average of the three exams, before the third exam was taken by students. The threshold, 77.60%, was used to divide high- and low-performing students.
The three exams divided the entire semester into four periods, namely Before
Exam 1, Exam 1 - Exam 2, Exam 2 - Exam 3, and After Exam 3. For each period,
a learning function is trained to predict a student’s likelihood of scoring above or
below the performance threshold on their cumulative average grades. For instance,
the first classifier was trained to predict the first exam outcome (above or below the threshold), the second one to predict the average outcome of the first two exams, and so on. In a given period, the features mentioned above (Table 4.4)
were extracted from the student’s interactions with the system along with their
reviewing patterns. A classifier was built by concatenating all the features from
previous assessments, such as scores and reviewing times.
Table 4.5 shows how more features were concatenated as students were being
assessed throughout the semester. The percentage of students below the threshold
was also checked. It shows that the two target classes (above and below) were
balanced.
Table 4.5: Number of Features per Period and Students below the Threshold
In terms of the features' importance, their weights were plotted in Figure 4.17 using a heatmap. The general engagement features, such as the number of assessments reviewed by students or the number of distinct days students logged in, were used individually in the model. Their weights were calculated per period and plotted on the graph. In addition, the features developed specifically for each assessment were grouped (mark or score, whether they reviewed the assessment and how early they reviewed it) into three single parameters: Mark, Reviewed and Time. Those three parameters aggregate the importance for each of those categories. For
instance, the features that capture the time students review each assessment for the
first time were clustered into one parameter, Time, that adds up the weights of all
of them for the importance graph. Therefore, Figure 4.17 shows the following:
• Across all periods, these three parameters (the mark of the assessment, the
review patterns and the time to attend to review ) consistently remained the
key predictors among all the classifiers.
• The feature importance converged over time. There were more diverse predictors for the first classifier. This could possibly be due to the relatively fewer items that could be reviewed and/or to students still learning how to use the system.
• The parameters review patterns and the time to attend to review gradually increased their importance over time until the third exam, which highlights the cumulative nature of programming. Students relied on studying past assessments and attended to reviewing them sooner.
• Another key parameter, the mark of the assessment, gained importance from the first to the second exam and maintained it from the second to the third. It became equally important how and when students had reviewed, and their previous scores in quizzes and exams.
The empirical error minimization approach was employed to determine the learning algorithm with the fewest empirical errors from a bag of classifiers C [45]. Cross-validation was utilised to train and test the bag of classifiers using 10 folds. Figure 4.18 shows a visual comparative analysis between classifiers using the Receiver Operating Characteristic Area Under the Curve (ROC AUC), a well-known metric for evaluating binary classifiers; the number for each classifier in each period is an average of the metric over the folds.
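A minimal sketch of this comparison for one period, assuming the period's feature matrix and labels (X_period, y_period) have already been extracted; only two members of the bag are shown for brevity.

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

for name, clf in [('linear_svm', SVC(kernel='linear')),
                  ('random_forest', RandomForestClassifier(n_estimators=100))]:
    scores = cross_val_score(clf, X_period, y_period, cv=10, scoring='roc_auc')
    print(f'{name}: ROC AUC = {scores.mean():.3f} (+/- {scores.std():.3f})')  # mean over 10 folds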
In addition, Table 4.6 shows the chosen learning algorithm, SVM with a linear
kernel, and the results for the weighted average precision and F1-metric, which
Figure 4.17: Feature Importance Across Periods for ASU’s Data Structures Course
Figure 4.18: Classification Performance using ROC AUC for ASU’s Data Structures
Course
combines precision and recall, for each period. The values for the metrics shown are
the mean and the standard deviation for the cross validation folds.
Table 4.6: Linear SVM Classification Performance throughout the periods for ASU’s
Data Structures Course
In addition, a regression model was built to predict the precise cumulative exam average grade for each student in these periods. In a similar manner, a linear regression function was constructed retrospectively per period. Cross-validation was employed using 10 folds. The performance of the linear regression function can be found in Figure 4.19.
Figure 4.19: Linear Regression Performance using R2 for ASU’s Data Structures
Course
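A minimal sketch of how the per-period regression can be evaluated with 10-fold cross-validation, reporting R2 and MAE; X_period and y_grade are placeholder names for the period's features and cumulative exam averages.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

cv_results = cross_validate(LinearRegression(), X_period, y_grade, cv=10,
                            scoring=('r2', 'neg_mean_absolute_error'))
r2_folds = cv_results['test_r2']
mae_folds = -cv_results['test_neg_mean_absolute_error']
print(f'R2 = {r2_folds.mean():.3f} ({r2_folds.std():.3f}), '
      f'MAE = {mae_folds.mean():.2f} ({mae_folds.std():.2f})')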
Table 4.7 shows the means and the standard deviations of the folds for each
period using the Coefficient of Determination (R2 ) and the Mean Absolute Error
(MAE).
Table 4.7: Linear Regression Performance throughout the periods for ASU’s Data
Structures Course
Period   R2 Mean (SD)   MAE Mean (SD)
In Figure 4.20, the predicted cumulative target grades were plotted with respect
to the actual results for each of the students before the third exam period.
Figure 4.20: Linear Regression Predictions vs. Actual Results Before the Third
Exam for ASU’s Data Structures Course
4.10.4 Conclusions
1. the gathering of student data, which tells us about their reviewing learning process
RQ1 was addressed: these predictions reached a usable accuracy for potential interventions and feedback to students. Both models worked well and increased their performance every week and period as students completed and reviewed more assessments and as more timing and engagement features were extracted.
After the last exam was finished, the three exam scores could be utilised to calculate the average exam score and, therefore, the classification and regression models made no mistake. It is now possible to leverage the patterns extracted from this reviewing data to predict how a new incoming cohort of students will perform in future versions of this course, to intervene, and to provide personalized help to those in need so that they follow desired reviewing strategies2.
years at DCU
As part of a DCU internal project, we were provided with a dataset covering 16,799 first-year students over a 5-year period. Each student has up to 138 data columns (mostly categorical). This data was compiled using a variety of data sources at our university, as explained in more detail in Chapter 3. We explored each of the 138 parameters individually, to see how their values are distributed and to verify that the dataset given to us was valid.
2 The code for this work can be shared on request given approval of the governing parties
Figure 4.22 shows there is a correlation between CAO points and the Precision Mark. However, students can still do well even if they enter University with low points. If we divide this data up by Faculty, Figure 4.23 shows that there are differences in this correlation across Faculties. For instance, the Institute of Education has a small variance and most students score between 50% and 75%. This “flat” distribution may happen because the work students do in college is more important than their previous studies. In the Engineering and Computing faculty, students entering with high points can perform very well or not so well; however, students near the 600-point mark do excel. In Science and Health, there is a high correlation between CAO points and the Precision Mark, but some students still perform very well despite entering with low CAO points.
Figure 4.22: Scatter Plot between CAO Points and the Precision Mark, color coded
by Faculty
The 138 parameters (mostly categorical) were encoded into 891 features for a ML algorithm to use in building a model to predict the Precision Mark of students. It is worth noting that the resulting matrix was very sparse, with many non-applicable values and gaps to do with repeat students and students transferring across courses. Students were split between Training (80%) and Testing (20%) sets, keeping a balance between students that passed or failed the first assessment.
Figure 4.23: Scatter Plots between CAO Points and the Precision Mark by Faculty
To understand the variation or impact each feature had on a predictive model, we leveraged the concept of ablation studies. An ablation study typically refers to removing some feature of a model from the algorithm and seeing how that affects performance [65]. In our work, we excluded each feature in turn from a linear model in order to measure how much variance the feature contributed.
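A minimal sketch of this ablation loop, assuming a pandas DataFrame X of encoded features and y holding the Precision Mark (placeholder names); the drop in cross-validated R2 when a feature is removed is used as its importance.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def ablation_importance(X, y, cv=5):
    """Importance of each feature = drop in cross-validated R2 when it is left out."""
    baseline = cross_val_score(LinearRegression(), X, y, cv=cv, scoring='r2').mean()
    drops = {}
    for col in X.columns:
        score = cross_val_score(LinearRegression(), X.drop(columns=[col]), y,
                                cv=cv, scoring='r2').mean()
        drops[col] = baseline - score  # larger drop => more variance explained by this feature
    return drops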
The results show that the most impactful features, those which removed the greatest amount of variance from the model generated, are quite noisy, as most of them only refer to repeat students and are blank for the rest. This is because there is a small number of repeat vs. non-repeat students, so there is a bias in that aspect of the data.
Ablation studies have recently been used in Deep Learning research and Computer Vision [42].
In addition, to explore the importance of the features in this student dataset, we looked at a different metric, Mutual Information. The Mutual Information of two random variables measures their mutual dependence (how they vary together) [74]. Mutual Information is more general than the correlation coefficient: it can cover categorical variables and it determines how similar the joint distribution of the pair (X, Y) is to the product of the marginal distributions of X and Y. See Figure 4.24 for a number of graphical examples illustrating what the mutual information between two variables looks like for different relationships [2].
In our work, we trained a linear model using all the features given to us for first-year students. Then, we trained models removing one feature at a time to understand which features had the greatest individual impact on the performance of the model. After discarding features that are only relevant to students who repeated the year, the most meaningful features from the ablation studies are the following:
• NUM EARLY LOANS: Number of early loans (to end Nov). Number of items
Figure 4.24: Examples of Mutual Information between Two Variables. Image taken
from [2].
• NUM GRANTS: SUSI Grants. Number of grants received in year (2015 on)
• STUDENT TYPE: Student Type, 1-9 depending on year of leaving cert and
if Irish / non Irish.
• DAYS TO FIRST LAB: Days to First Lab Session (full year). If null, set to
360
• LEAV CERT MATHS: Leaving Cert Maths points. Points from best maths
result (may be from resit), excludes bonus points
Then, we measured the mutual information between each of the meaningful fea-
tures extracted from the ablation studies and the Precision Mark. The Precision
Mark is an average grade from all first year courses. Figure 4.25 shows the mutual
information analysis between these features and the Precision Mark. Each graph
shows the Pearson correlation coefficient (and p-value) along with the Mutual In-
formation coefficient.
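A minimal sketch of how these two quantities can be computed per feature; mutual_info_regression is scikit-learn's estimator for continuous targets, and the variable names are placeholders.

import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_regression

def mi_and_pearson(feature: np.ndarray, precision_mark: np.ndarray):
    """Mutual Information and Pearson correlation between one feature and the Precision Mark."""
    mi = mutual_info_regression(feature.reshape(-1, 1), precision_mark)[0]
    r, p = pearsonr(feature, precision_mark)
    return mi, r, p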
4.11.6 Conclusion
Based on our work in exploring the longitudinal student data from DCU that we
have access to, we could not identify a single independent feature of student data
that stands above others in predicting how students are going to perform in their
first-year examinations as determined by their overall Precision Mark. From this
we conclude that there has to be a combination of features that will model how
students behave using different data sources.
Figure 4.25: Mutual Information Score between a Feature and the Precision Mark,
for Several Features
3 The code for this work can be shared on request given approval of the governing parties
Chapter 5
5.1 Introduction
(i) Several approaches to representing students’ source code submissions are in-
vestigated, describing the merits associated with each approach;
(ii) The performance of different source code representations for predicting the correctness of students' code submissions is evaluated and compared.
At Dublin City University, students learn how to code by taking a variety of programming modules. Students develop algorithms in code for problems proposed by Faculty. Many of these courses or modules are delivered through the Virtual Learning Environment (VLE) built for the purpose of teaching and learning computer programming, introduced earlier in Chapter 3. This custom VLE enables students to access course information, material and slides for each module. In addition, the system integrates an automatic grading platform where students can verify their code submissions for various programming exercises. Students typically develop solutions locally for what are called laboratory sheets, sets of exercises and programming tasks for the computer programming courses. Then, they submit their individual programs online to the automatic grading platform, which runs a number of
testcases specified by the Lecturer for each exercise. This provides instant feedback to students based on the suite of testcases run and ultimately tells the student whether the program is considered correct or incorrect depending on whether any of the testcases fail. This information is invaluable to students' learning, and such a platform is very beneficial for verifying that students' programs work as expected.
Machine Learning (ML) extracts patterns from data and learns rules without being
explicitly programmed [17]. In Chapter 4 we introduced ML and covered how it can
be used for predictive modelling. Data used by ML algorithms has to be structured
information. For instance, images are processed into matrices of numbers before
inputting them to a ML algorithm. However, data is rarely presented in a straight-
forward structured way. ML algorithms generally need to find a way to process text
information like Natural Language or multimedia such as images and videos.
For text, Natural Language Processing (NLP) techniques are employed. In our case, code submissions or programs cannot be treated as natural language and need to be parsed and analysed in a different way. We explored the following representations of programming submissions obtained by tokenising the code: Words, Python Token Categories, Python Token Words and AST Nodes.
In the following sections we dig deeper into vectorised representations within a math-
ematical model based on using student programming code. For that, we support our
narrative with Listings 5.1, 5.2, and 5.3. These examples are code snippets similar
to students’ submissions in our programming courses.
Listing 5.1:
#!/usr/bin/env python
print "Hello, World!"

Listing 5.2:
#!/usr/bin/env python
def say_hello():
    print("Hello, World!")

say_hello()

Listing 5.3:
#!/usr/bin/env python
# read from input
a = int(raw_input())  # first
b = int(raw_input())  # second
print a + b
Listings 5.4, 5.5 and 5.6 show how such word vectors are extracted, prepared
and made ready to use for some of our snippets.
['print', '"hello,', 'world"']

['a', '=', 'int(raw_input())',
 'b', '=', 'int(raw_input())',
 'print', 'a', '+', 'b']
These word vectors may not represent a programming submission in a very com-
parable way to other submissions that have, for instance, different variable names.
Even though the special characters like operands carry important information re-
garding these code programs, splitting the words only using spaces may not give a
useful representation.
These token categories or types have an associated identifier that can also be used
for vectorisation, as shown in Listing 5.9.
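A minimal sketch of this tokenisation using Python's standard tokenize module; note that our corpus consists of Python 2 programs, whereas this sketch uses the Python 3 API, so the exact token streams differ slightly.

import io
import token
import tokenize

def code_tokens(source: str):
    """Return (token category, token word) pairs for a piece of Python source code."""
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        pairs.append((token.tok_name[tok.type], tok.string))
    return pairs

# e.g. code_tokens('a = int(raw_input())') yields pairs such as
# ('NAME', 'a'), ('OP', '='), ('NAME', 'int'), ('OP', '('), ('NAME', 'raw_input'), ...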
Although these tokens appear to represent code solutions more meaningfully than
word vectors do, information regarding the structure, design and flow of the program
is still not captured and to do this requires an even more complex representation,
as we shall see in the next sub-section.
Figure 5.1: Abstract Syntax Tree (AST) for Hello World Example
2 https://github.com/hchasestevens/show_ast
After traversing the ASTs, nodes can be represented using their parents in a
pair-wise way. See Listings 5.10 and 5.11 for two of the example snippets. The
ASTs are traversed using a Breadth-first search (BFS) approach.
'Print' 'Str'
'Print' 'True'
'Str' 'Hello\tWorld!'
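A minimal sketch of this traversal with Python's ast module; because it runs under Python 3, node names differ slightly from the Python 2 examples above (e.g. Constant rather than Str, and print becomes a Call).

import ast
from collections import deque

def ast_parent_child_pairs(source: str):
    """Breadth-first traversal of the AST, emitting (parent, child) node-type pairs."""
    tree = ast.parse(source)
    pairs, queue = [], deque([tree])
    while queue:
        node = queue.popleft()
        for child in ast.iter_child_nodes(node):
            pairs.append((type(node).__name__, type(child).__name__))
            queue.append(child)
    return pairs

# e.g. ast_parent_child_pairs('print("Hello, World!")') contains
# ('Module', 'Expr'), ('Expr', 'Call'), ('Call', 'Name'), ('Call', 'Constant')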
We now investigate how student code submissions can be transformed into mean-
ingful vectors as a form of representation of the program code, and implicitly as a
representation of the student who submitted that code. As mentioned earlier, com-
puters do not understand text data and text needs to be represented and encoded
into vectors of numbers as the input into a Machine Learning algorithm. For that,
we use the following two approaches:
1. Bag of Words (BOW)
2. Code Embeddings
After running the tokeniser on our data, 231,659 distinct words were extracted from the 591,707 code submissions it was fitted with. A lot of computer memory is required to generate the large sparse matrices for learning code2vec and user2code2vec representations. Although our experiments are run on a GPU for faster computation, running a classification algorithm on more than half a million source code files is computationally expensive, hence we limited the number of Words, Python Token Categories, Python Token Words and AST Nodes to 2,000. Overall, there are fewer Token Words than Words.
The bag-of-words (BOW) model, also called the vector space model, is a simple representation of documents and queries that has been used in Information Retrieval for almost 50 years [94]. According to this model, a text (such as a sentence, a whole document or a user query) is represented as a bag of its words, disregarding grammar and even word order but keeping multiplicity. In our work, we leverage the BOW model to represent code submissions by looking at either:
(a) Words, (b) Token Categories, (c) Token Words, or (d) AST Nodes.
The ordering of these items in each of the alternatives is ignored and only their frequency is stored in a large sparse matrix. This matrix can be populated using the count, presence (binary), frequency or TF-IDF of each item.
Table 5.1 shows a simple BOW example using the count of each Token Word for
Listings 5.1 and 5.2 as the corpus. This BOW approach can be used for classification
methods where the count, frequency, presence or TF-IDF of occurrence of each item
(Word, Token Category, Token Word or AST Node) is used as a feature for training
a classifier.
Table 5.1: Count Occurrence Matrix for Listings 5.1 and 5.2

              UNK   ‘(’   ‘)’   ‘print’   ‘"Hello, World!"’   ‘say_hello’   ‘def’   ‘:’
Listing 5.1    0     1     1      1              1                 0          0      0
Listing 5.2    0     3     3      1              1                 2          1      1
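A minimal sketch of building such count, presence and TF-IDF matrices with scikit-learn; `submissions` is a list of source-code strings and `token_words` is any callable returning the token words of one submission (for instance built on the tokenize sketch earlier), both placeholder names.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

bow = CountVectorizer(tokenizer=token_words, lowercase=False, max_features=2000)
X_counts = bow.fit_transform(submissions)          # sparse matrix, one row per submission

X_presence = CountVectorizer(tokenizer=token_words, lowercase=False, binary=True,
                             max_features=2000).fit_transform(submissions)
X_tfidf = TfidfVectorizer(tokenizer=token_words, lowercase=False,
                          max_features=2000).fit_transform(submissions)

Any of these matrices can then be fed, together with the correct/incorrect labels from the grading platform, to a classifier such as Multinomial Naive Bayes.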
For each course and academic year, a User Representation Matrix is con-
structed for each student using the code vectors of the submissions to the proposed
labsheets by the Lecturer. Having a vector representation of code submissions allows
researchers to generate a higher-level representation for each student or user. This
User Representation Matrix is built by vectorising the submissions. Submissions are vectorised using either:
1. Word Tokeniser
This results in a User Representation Matrix of shape (number_tasks, MAX_LENGTH). MAX_LENGTH is the limit for each sequence that we use for padding the code submission after tokenisation; MAX_LENGTH is set to 50. The User Representation Matrix for each student is flattened out as a long vector. Principal Component Analysis (PCA) [101] is leveraged as the dimensionality reduction technique to visualise these representations in two dimensions.
In this section, the results of the code2vec technique will be discussed for both ap-
proaches: BOW and Embeddings. We train the models and learn representations
using all the Python programs submitted by students from previous cohorts in our
University over a number of years, and we use these representations to predict the
correctness of code submitted by a student from the present cohort. As a reminder
to the reader, in section 1.1 the second research question in the thesis was introduced
and it is re-stated below:
RQ2: How can students’ programming submissions be encoded into vectors for use
as internal representations of those students?
This will help the reader to understand the context of the experiments we now
present.
First, we built four tokenisers, constructed and fitted with the code submissions using (a) Words, (b) Python Token Categories, (c) Python Token Words and (d) AST Nodes, respectively. The dictionaries of items and their counts are shown in Tables 5.2 and 5.3. It is interesting to see the differences between the top occurrences for each tokenisation, where Token Words are a generalisation of Words, Token Categories are a generalisation of Token Words, and the AST nodes are at an abstract level which captures the structure of the code submission.
By processing student data in this way, we can construct matrices where each
row is a code submission and we count the number of occurrences for each (a) Word,
(b) Token Category, (c) Token Word and (d) AST Node. Figure 5.4a shows details
Table 5.2: Top-5 Words & Token Categories in terms of Number of Occurrences
Table 5.3: Top-5 Token Words & AST Nodes in terms of Number of Occurrences
of the performance of these model combinations (a), (b), (c) and (d) just using the count of items. In addition, we focus on the (a) Words (as they work better) and perform a similar analysis looking at the count, presence (binary), frequency and TF-IDF of the Words instead of the pure count only. Figure 5.4b does not show a meaningful difference between them, except that the frequency model works slightly
worse than the others. These models are trained using a Naive Bayes classification
algorithm [63] holding out 20% of the data as the testset. The models are trained
using around half a million code submissions (less for the Tokens or AST Trees as
some code submissions could not be tokenised using the Python Tokeniser library or
an AST could not be extracted when the programs are incorrectly constructed). The
classes for this classification problem are well balanced. For instance, for the model
that uses the Words, 194,451 submissions were correct and 296,369 were incorrect
based on the output of the grading platform. That is the target of our predictions
for training the models.
Interestingly, the least generalised model that uses the Words instead of Tokens
or AST Nodes is the one that performs slightly better than the rest using BOW.
The less generalised the model is, the better it performs, and using Words performs
better than Tokens and Tokens perform better than AST Nodes.
(a) Words vs. Token Categories vs. Token Words vs. AST Nodes Using Count
The performance of the models using (a) Words and (b) Token Words is shown in Figure 5.5. These models are trained using Cross Validation with 20% of the dataset
as the holdout set. Utilising Neural Networks with an embeddings layer allows us
to learn better patterns and representations of the code solutions submitted to the
grading platform. The models perform better than the baseline BOW and the Word
Tokens are better able to distinguish between correct and incorrect programs. We
expect that incorporating the structure of the program using ASTs will create a
richer model.
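A minimal sketch of such an embeddings-based classifier in Keras, where padded sequences of token indices are mapped to 100-dimensional embeddings and classified as correct or incorrect; the pooling layer and dense sizes are illustrative choices rather than the exact architecture used.

from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM = 2000, 100   # vocabulary limit and embedding size as described above

model = models.Sequential([
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),          # correct vs. incorrect submission
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(padded_token_sequences, correctness_labels, validation_split=0.2, epochs=5)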
Table 5.4 shows the results from the different BOW and embeddings models in
a comparable way.
(a) Embeddings for the Top 20 Most Common Words; (b) Embeddings for the Top 20 Most Common Token Words
Figure 5.6: Embeddings for the Top Words & Token Words. These Embeddings Are Projected from 100 Dimensions to 2 Dimensions for Visualization Using Principal Component Analysis (PCA). Axes in the Graphs Are the PCA's Two Principal Components.
After the models are trained using the code submissions with the correct or in-
correct target for each, the learned embeddings can be extracted. Figure 5.6a shows
the embeddings of the top 20 most common words. It is interesting to note how
operands are clustered together as are numbers. This confirms that the network is
learning efficient representations. Similarly, we can explore the top 20 most common
tokens in Figure 5.6b.
These vectors exhibit interesting properties similar to word embeddings. Table 5.5 shows some cosine similarities between pairs of words that are very close to other pairs. The neighbours of these embeddings can also be inspected, and numbers can be found beside other numbers in String format; in general, however, our learned embeddings contain noise such as variable names and Strings that prevents us from seeing other relationships of the kind found in word2vec [67].
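A minimal sketch of how such similarities can be computed once the embedding matrix is read back from the trained network; word_index is a placeholder for the tokeniser's vocabulary mapping.

import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two learned embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# embeddings = model.layers[0].get_weights()[0]        # shape (vocab_size, 100) in Keras
# cosine_similarity(embeddings[word_index['+']], embeddings[word_index['-']])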
RQ3: By leveraging the vectorisation of code submissions for a given course, how
can we represent students based on their programming work?
User Representation Matrices were constructed using the code submissions for each
student. Then, we flattened them to input them to a Deep Learning Network
similar to code2vec with an embeddings layer that learns representations of users
in a continuous space with reduced dimensionality. User embeddings are given 100
dimensions. The input data are very large and sparse vectors with the indexes of
the vocabularies for the code submissions.
Figure 5.7a shows the raw user representations before training the Network, and Figure 5.7b shows how difficult it is to distinguish among a few hundred
students with such a large sparse matrix of code submissions. Unfortunately, we
cannot add more data as there were no more students in that cohort, unlike other
domains which allow downloading of more tweets or crawling more websites when
a similar situation with insufficient data occurs. The vectors are transformed to 2
dimensions using PCA. The variance retained is very low (between 2% and 6%).
Each dot in the graphs represents a student based on the projection of their student
vector. The colour used represents the average grade of the exams that student took
that year on that course3 .
(a) User Raw Representations Using Word Tokens; (b) User Learned Embeddings Using Word Tokens
from each submission and to concatenate key features across the code submissions.
In short, the approach is to keep the number of features small in order to effectively
learn from constrained data. These user2code2vec representations can then be used
to identify student neighbours for programming recommendations.
Chapter 6
6.1 Introduction
This chapter builds on the work developed in Chapter 4, where we built models to distinguish higher-performing students from lower performers in semester examinations. Students who are struggling to understand course material may be doing so for a variety of reasons, and each student may not have the same issues in understanding as the others. In order to personalise the way our students learn
programming skills and to support adaptive feedback in the computer programming
modules, we started sending customised weekly notifications via email. We provided
this feedback typically after week six (mid-semester) of the teaching period. This
chapter then presents a study on students’ engagement with the weekly personalised
performance notifications. Overall, the predictive and personalised feedback helped
to reduce the gap between the lower and higher-performing students. Furthermore,
students praised the prediction and the personalised feedback, conveying strong rec-
ommendations for future students to use the system. We also found that students
who followed their personalised guidance and recommendations performed better in
subsequent examinations.
Students have different needs for support at different times during their learning
periods. Understanding which students are finding difficulties with course material
at different stages is a potentially great resource to help improve learning and ul-
timately success in passing a course. However, students are likely to have different
knowledge gaps from one another (Figure 6.1).
Figure 6.1: Students Struggling with Programming Concepts May Have Different
Learning Issues
In order to adapt to students' learning on this VLE platform, in the middle of the semester a feature is enabled on the VLE for students to opt in to or out of receiving weekly personalised notifications on their progress relative to other students. These notifications include: a performance message, based on the predictions run on the class of students to which they belong, from models trained with a combination of data from historical student cohorts; recommended learning material and laboratory sheet resources to review based on their progress; and, finally, programming code solutions from among the top-ranked students in their class, as well as additional support resources.
Feedback was sent via email to students who decided to opt in to receive these notifications. The feedback to each student was personalised in the following ways:
• If a student did not spend any time logged onto the platform and active, we
added the message “Remember, computer programming is a skill that requires
practice and this module is no exception.” in bold.
• For each student we checked whether (s)he attended any of the lab sessions the week before and, regardless of the predicted probability, we acknowledged it if they did, or we added the message “Try to make it to the next lab session so the lecturer and tutors can help you resolve any issue.” if they did not.
• In addition, as a form of peer feedback, for each notification that was sent we included one computer programming suggestion if the student had submitted a program that failed any of the testcases and was thus incorrect. We developed a knowledge graph and, based on the concepts the lecturer considered most important each week for the course, we started suggesting computer programs for those gaps in knowledge. Typically, the most recent labsheet exercises were suggested first, then the previous labsheet's exercises and so on. The order of the exercises selected from a particular labsheet is the normal order in the lab: the first exercise in the labsheet will be offered before the second if both were failed submissions for that particular student. The solution suggested is the closest program from a top-rated student in the class that week who got that program working as expected. The top students are the 10% highest-ranked in that class from our predictions each week. We recommended the closest submission by text similarity between the programs after removing comments (a minimal sketch of this matching step is given after this list).
• Students were given an explanation about this project which included sending
them alerts, and how the predictions are computed. They were also provided
with support resources to reach out to the Lecturer of the module, this project
or the Support Services at the University if they needed assistance.
• At the end of the note, students could find links to read the Terms and Con-
ditions for this project, or to unsubscribe from these notifications if desired.
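As referenced in the list above, a minimal sketch of the peer-solution matching step; the text only states that text similarity after removing comments is used, so the difflib ratio shown here is an illustrative choice of similarity measure.

import difflib
import re

def strip_comments(source: str) -> str:
    """Drop '#' comments before comparing programs (a simplification that ignores strings)."""
    return '\n'.join(re.sub(r'#.*', '', line) for line in source.splitlines())

def closest_working_program(failed_program: str, top_student_programs: list) -> str:
    """Return the top-rated student's correct solution most textually similar to the failed one."""
    return max(top_student_programs,
               key=lambda p: difflib.SequenceMatcher(
                   None, strip_comments(failed_program), strip_comments(p)).ratio())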
• The percentage of students who attended any of the laboratory sessions that week. This was computed by examining the web logs, the IP address associated with those log entries and whether that IP belonged to the university or not;
• The percentage of students who had opted-in up to that week to receive cus-
tomised notifications;
• The amount of time spent on the platform by students on average that week;
• The distribution of the predicted performance for students using the associated
probability of our predictions;
• Similarly, the top-5 most suggested laboratory sheets and their associated learning resources.
In this study, we explore how students engaged with the system notifications, which include personalised performance messages and resources to focus on based on the students' progression with laboratory work. The recommended resources are suggested by creating a knowledge map in which labsheets are associated with concepts and those concepts are in turn associated with slides to review. For instance, Figure 6.2 recommended that the student work on the first labsheet from week 2 and also revise the associated material on lists and files.
A difference index for each student i, shown in Equation 6.1, measures the dif-
ference between the second examination mark and the first one for a particular
student.
d_i(e_1, e_2) = e_2 − e_1    (6.1)
A gain index was developed to measure each student's improvement between two examinations (Equation 6.2); a normalised version is given in Equation 6.3:
g_i(e_1, e_2) = (e_2 − e_1) / e_1    (6.2)

normg_i(e_1, e_2) =
    0                 if e_1 = 0 and e_2 = 0
    1                 if e_1 = 0 and e_2 ≠ 0
    1                 if g_i(e_1, e_2) > 1
    g_i(e_1, e_2)     otherwise    (6.3)
It is important to note that we should be careful when using our own Equation 6.3 for measuring the impact of our interventions. A student going from an examination grade of 1% to 2% improves by 100% with respect to the first examination mark, which gives a normalised gain index of 1. However, a student going from 60% to 100% will not have such a high normalised gain index, and a student going from 1% to 100% will have their normalised gain index truncated to 1. Results may be skewed due to this approach. An alternative approach to calculating the normalised gain for assessing students' performance in pre- and post-examinations is the g-factor [12].
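For clarity, a small helper implementing Equations 6.2 and 6.3 directly, which makes the truncation behaviour discussed above explicit:

def normalised_gain(e1: float, e2: float) -> float:
    """Normalised gain index (Equation 6.3) for first and second exam marks e1, e2."""
    if e1 == 0 and e2 == 0:
        return 0.0
    if e1 == 0:
        return 1.0
    g = (e2 - e1) / e1          # gain index, Equation 6.2
    return min(g, 1.0)          # gains above 1 are truncated to 1

# normalised_gain(1, 2) == 1.0 while normalised_gain(60, 100) is about 0.67,
# illustrating the skew discussed above.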
back
Earlier in the thesis, in section 1.3, the fourth research question was introduced and it is re-stated here for convenience:
RQ4: What are the effects of timely automatic adaptive support and peer-
programming feedback on students’ performance in computer programming courses?
We will now examine the impact of this feedback for the three academic years
2015/2016, 2016/2017 and 2017/2018.
In the 2015/2016 academic year, students using our system did not receive any notifications. We leverage the data from that year to train our student models. CA278 did not exist at that point, as it was taught for the first time in the 2016/2017 academic year. The data from this year is, however, used as a baseline against which to compare the levels of engagement among students in the following academic years. Even though these would be different actual students from year to year, we believe that because of the size of the classes, the behaviour of students would be acceptably consistent.
Table 6.1 shows some basic characteristics of students who passed and who failed the first assessment in that year. We see that the first assessment had a high failure rate for both courses, but there is little difference in terms of age and CAO Points between students that passed or failed this assessment. “Students CAO Route” indicates how many of the total students in each group came via their CAO application rather than other routes such as DARE, Access or a Sports scholarship. Table 6.2 shows how students who failed then subsequently improved more than students who passed the first assessment with respect to our normalised gain index. This is usually the case, as these students who failed their first assessment have more room for improvement. Note that the numbers of students that passed and failed each course in Table 6.2 are fewer than the numbers that passed and failed in Table 6.1. That is because for some of the students, their demographics and prior
Table 6.1: Demographics and prior information for students in 2015/2016 in courses
CA117 and CA114
Course   Group    Number Students   Mean Age   Students CAO Route   Mean CAO Points
CA117    Passed   66                18.77      49                   445.31
CA117    Failed   83                19.07      61                   430.08
CA114    Passed   22                18.45      18                   387.50
CA114    Failed   49                18.31      42                   388.93
information such as route to university could not be retrieved. For instance, that
is the case for exchange students. We worked with the university quality office to
collect this data but it was not possible for some of the students.
In 2016/2017, we trained models using one year of groundtruth data and generated predictions for incoming students that year; that is, we trained on data from the year 2015/2016. We now analyse the effect this type of notification had on students in the 2016/2017 academic year for CA117, CA114 and CA278. For that, we extracted several groups from the students who were enrolled in each course, namely:
(a) Students who Opted-IN vs. students who Opted-OUT of receiving customised notifications;
Table 6.2: Difference and Normalised Gain Index between the examinations for CA117 and CA114 in the 2015/2016 academic year
Course (First/Second Exam Week)   Group (Number)   First Exam Mean (Std.Dev.)   Second Exam Mean (Std.Dev.)   d_i Mean (Std.Dev.)   normg_i Mean (Std.Dev.)   normg_i Difference
Passed (66) 75.23 (20.08) 55.06 (29.94) 0.00 (36.51) -0.28 (0.36)
CA117 W6 W12 -0.54**
Failed (83) 14.70 (13.65) 24.40 (24.74) 9.70 (23.37) 0.26 (0.71)
Passed (24) 64.17 (22.35) 64.17 (34.15) 36.08 (34.07) 0.05 (0.59)
CA114 W6 W12 -0.6**
Failed (51) 5.88 (9.11) 41.96 (31.75) 22.81 (37.71) 0.65 (0.65)
** p-value < 0.001
(b) Students who Fixed vs. students who Did-not-fix the programs suggested to them in the notifications.
(c) Students who Passed vs. students who Failed their first laboratory exam.
The last division, students who passed or failed the first laboratory exam, contains all the students in the class. However, the other two divisions do not. Students had to opt in or opt out before they could submit any more programs to be analysed in the grading platform, but some of them were not engaged and did not reply. In addition, some students were not suggested any programs in their notifications because they had no failed submissions to fix, so they were in neither the fixed nor the did-not-fix group.
Table 6.3 shows some characteristics of those students in the different groups for the courses being analysed. There are no major differences between the groups. In general, students who passed the first assessment in all courses have higher CAO points if they came to University through that route. Also, in general, most students would opt in and would not fix the programs suggested in their notifications.
Table 6.4 shows the differences among the groups with respect to the examinations. Examinations happened in Week 6 (mid-semester) and Week 12 (end of semester classes). Again, students who failed the first assessment had more room for improvement than students who passed that assessment, for all courses. In general, for this academic year, there are only significant differences between the opt-in and opt-out groups for CA117's notifications and not for the other courses. However, students who fixed the programming submissions that were suggested in the notifications improved more, in terms of the normalised gain index, than students who did not fix their programs, for all courses: CA117, CA114 and CA278. Again, we do not have demographics and prior information for all the students. This can be seen in that there are fewer students in the pass and fail groups in Table 6.3 than in Table 6.4.
Table 6.3: Demographic information and prior information from the 2016/2017
student groups in CA117, CA114 and CA278
Table 6.4: Difference and Normalised Gain Index among the examinations for CA117, CA114 and CA278 in the 2016/2017 academic year
Course   Group (Number)   First Exam Mean (Std.Dev.)   Second Exam Mean (Std.Dev.)   d_i Mean (Std.Dev.)   normg_i Mean (Std.Dev.)   normg_i Difference
Passed (82) 76.22 (21.70) 47.85 (26.42) -28.37 (22.28) -0.38 (0.30)
-0.56**
Failed (58) 8.62 (11.88) 12.02 (14.64) 3.40 (12.79) 0.18 (0.53)
Opted-IN (122) 54.51 (36.36) 37.75 (27.42) -16.76 (25.51) -0.15 (0.52)
CA117 +0.18
Opted-OUT (11) 52.27 (34.47) 19.45 (17.07) -32.82 (28.52) -0.33 (0.68)
Fixed (16) 32.81 (39.25) 27.62 (25.45) -5.19 (23.28) 0.23 (0.68)
+0.37*
Did-Not-Fix (52) 45.19 (31.79) 34.42 (27.46) -10.77 (21.89) -0.14 (0.45)
Passed (57) 72.81 (18.90) 55.44 (29.02) -17.37 (31.21) -0.20 (0.46)
-0.55**
Failed (16) 17.19 (11.59) 40.00 (29.15) 22.81 (37.71) 0.35 (0.69)
Opted-IN (62) 62.10 (28.31) 53.23 (29.17) -8.87 (37.09) -0.11 (0.54)
CA114 -0.04
Opted-OUT (7) 57.14 (25.75) 42.86 (32.83) -14.29 (38.68) -0.07 (0.74)
Fixed (18) 62.50 (29.17) 56.67 (28.48) -5.83 (36.83) -0.03 (0.51)
+0.12
Did-Not-Fix (34) 61.03 (22.84) 50.59 (29.99) -10.44 (32.21) -0.15 (0.54)
Passed (53) 62.57 (17.65) 73.91 (17.44) 11.34 (19.87) 0.23 (0.36)
-0.73**
Failed (5) 15.80 (12.94) 64.80 (8.13) 49.00 (17.45) 0.96 (0.09)
Opted-IN (42) 60.88 (23.24) 76.45 (14.91) 15.57 (23.74) 0.31 (0.41)
CA278 -0.01
Opted-OUT (8) 58.12 (19.81) 71.38 (11.79) 13.25 (13.45) 0.32 (0.32)
Fixed (7) 53.57 (27.49) 73.00 (14.90) 19.43 (33.00) 0.41 (0.50)
+0.11
Did-Not-Fix (28) 58.25 (20.88) 73.93 (14.90) 15.68 (22.12) 0.30 (0.39)
* p-value = 0.01; ** p-value < 0.001
Figure 6.3: Frequency of Access to Material and Labsheets from the Notifications
In 2017/2018, we trained models using two years of groundtruth data and we generated predictions for incoming students in that year. Again, in the second part of the semester, we sent customised notifications to students who decided to opt in. In addition, in this 2017/2018 academic year, we measured whether a student clicked on the material resources suggested in the notifications. Figure 6.3 shows how we tracked when students accessed the material or labsheets from the links in the notifications.
Table 6.5 shows that students who clicked on any resource (who were not a majority) showed a greater normalised gain between examinations than students who did not click on those resources. Examinations happened in Week 6 and Week 12, as in previous years. The normalised gain difference between the two groups was found to be statistically significant.
Table 6.5: Difference and Normalised Gain Index between the examinations for CA117, CA114 and CA278 in the 2017/2018 academic year
Course   Group (Number)   First Exam Mean (Std.Dev.)   Second Exam Mean (Std.Dev.)   d_i Mean (Std.Dev.)   normg_i Mean (Std.Dev.)   normg_i Difference
Passed (90) 75.83 (20.90) 55.11 (28.53) -20.72 (23.05) -0.28 (0.33)
-0.59**
Failed (58) 12.93 (12.49) 25.00 (26.08) 12.07 (24.74) 0.31 (0.67)
CA117
Clicked (19) 51.32 (39.30) 53.16 (24.72) 1.84 (28.20) 0.22 (0.58)
+0.311
Did-not-click (129) 51.16 (35.06) 41.86 (31.86) -9.30 (28.41) -0.09 (0.56)
Passed (53) 83.02 (21.60) 79.62 (27.88) -3.40 (32.96) 0.04 (0.47)
-0.38*
Failed (16) 10.94 (12.40) 40.00 (36.74) 29.06 (34.52) 0.42 (0.71)
CA114
Clicked (4) 50.00 (30.62) 90.00 (10.00) 40.00 (35.88) 0.70 (0.52)
+0.611
Did-not-click (65) 67.31 (36.41) 69.23 (35.10) 1.92 (34.86) 0.09 (0.55)
Passed (70) 69.06 (16.34) 65.99 (17.24) -0.01 (0.27) 65.99 (17.24)
-0.01**
Failed (10) 31.90 (10.71) 50.00 (22.03) 0.47 (0.37) 50.00 (22.03)
CA278
Clicked (5) 66.40 (21.14) 60.00 (17.99) -0.08 (0.14) 60.00 (17.99)
+0.11
Did-not-click (75) 64.28 (19.89) 64.25 (18.68) 0.05 (0.33) 64.25 (18.68)
1
p − value = 0.03
*
p − value = 0.01
**
p − value < 0.001
Table 6.6 shows a comparison between students who passed and failed the first laboratory exam in the 2015/2016, 2016/2017 and 2017/2018 academic years. Recall that in 2015/2016 no interventions were sent to students, while in 2016/2017 and 2017/2018 customised notifications were sent. The normalised gain index difference between students who passed and failed the first examination is reduced for CA114 and slightly increased for CA117. We cannot state that students who are lower performers in the first examination have their gap with respect to the higher performers reduced unless they engage with these notifications by fixing their suggested failed submissions or clicking on the resources suggested.
In terms of addressing RQ4, re-stated at the start of this section, engaging with these personalised notifications by fixing programming submissions or clicking on the learning material appears to have a positive effect on the progression from one examination to the next one¹. This means the research question is addressed with a positive answer.
We believe it is worth the effort of making this type of intervention and putting our predictive modelling work into practice. We feel morally compelled to implement such interventions, as we can potentially guide, help and motivate our student body. In the future, we could explore how students engage with the programming code solutions from higher performers and how that affects their learning of program design.
In Section 1.3 earlier in this thesis, the fifth research question was introduced and it is re-stated here for convenience:
RQ5: What are students' and teachers' perspectives and experiences after adopting a predictive modelling and adaptive feedback system into their own classes?
• Q3: How useful did you find the weekly notifications? [1 to 5 star rating]
• Q4: Did you run any of the working programs suggested to you? [Yes / No / I was never suggested any]
• Q5: Would you recommend the system to a student taking this same module next year? [Yes / No]
• Q6: Would you like to see the weekly system notifications for other modules? [Yes / No]
• Q7: How could we improve the system for next year? Any other comments. [Comment]
Overall, feedback from students in these student surveys was very positive from stu-
dents and a summary of the responses can be found in Table 6.7 for the 2016/2017
academic year. The survey results were anonymised. Most students would recom-
mend this system to other students attending the same module next year or would
like to see this system included in other modules as shown in answers to questions
5 and 6 respectively.
Table 6.7: Student survey responses about the project in the 2016/2017 academic year
For first-year courses, the questionnaire was completed on the second-last day of semester classes during an evaluation for another module which all CA and EC first-year students should have attended, so we were able to gather a good number of responses. The response to the first question shows the percentage of students who opted in and the fourth shows whether they ran any of the suggested programs. The responses to both questions are somewhat inaccurate: for instance, some students claimed they opted in when they did not, or that they were never suggested any program when in reality they were. This indicates that some students may not check their mail regularly, or that they opted in without realising what they were signing up for or what the system would do with their digital footprint. Notifications via email may not be the best way to communicate with students, as some of them pointed out in the improvements and comments section of the survey, and a better way to measure how they interact with these customised messages should be tried.
The final question, regarding how students would like to improve the system, received some really interesting comments. Students who were doing well or very well received a similar response every week, so the notifications could seem monotonous to them. In general, students asked for more personalised notifications as well as some additional learning resources. Finally, the following are some positive and negative quotes, comments and suggested improvements from students in response to this last question of the survey:
• “Feedback on each step of each task would be good, such as, better ways to
do things”
Others did not enjoy the notifications as much, as they could be very repetitive for students who are well on top of the module and do not have any failed programs for which they could receive suggestions.
Overall, the feedback was very positive and some students were motivated by the weekly notifications:
• “Good service, very helpful and effective way to manage your module”
CA116 and CA114’s lecturers also completed a survey and indicated the follow-
ing:
1. “With large class sizes for our programming modules, it is practically im-
possible for the Lecturer to monitor each student personally. Therefore, an
automated approach is useful”.
2. “Simply seeing the list of students marked red or green each week gives a sense
for how things are going”.
CA117's lecturer said he "would be happy for you to run further experiments in future deliveries of CA117" and "liked the fact you could tailor the release of concepts to students in order to keep pace with the delivery of the module e.g. avoiding sharing solutions by advanced students that used lambda expressions not yet covered". The following year, he indicated that he "likes the fact it does its thing without my input" and that he "does use it for checking overall interaction by students with lab exercises and that information he finds valuable".
As a summary for addressing RQ5, the platform was well received by students and lecturers. Students would like to see this kind of feedback in other modules and would recommend it to other students taking those courses in the following year. Lecturers see the recommendations as extra help that automates the personalisation of the experience of hundreds of students.
Conventionally, Learning Analytics is used to process student data for a variety of possible applications, including feedback to university administrators, or to notify students regarding their predicted performance and further available resources using email or via a university's Learning Management System.
• Short code snippets: Students can avail of code snippets that showcase functionality such as slicing lists, reading from files or printing arguments. 100+ snippets have been hosted on GitHub's gists, as that website is already optimised for easy reading of programming code on smartphones.

² WhatsApp Messenger is a freeware and cross-platform messaging and Voice over IP service owned by Facebook.
³ The code assistant has a separate repository at https://github.com/dazcona/code-assistant
In addition to the above, students can ask for further help from the Lecturer or
the University’s support services, consult the terms of the project and opt-out at
any time. Phone numbers used for WhatsApp are deleted at the end of the semester.
Efforts are now being made to include Natural Language Understanding so students
can ask questions in natural language such as “How am I doing?” or “What can I
learn next?”
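As an illustration of where such Natural Language Understanding could start, the following is a minimal, hypothetical sketch of keyword-based intent matching; the intent names and trigger phrases are invented for the example and a production system would use a proper NLU library.

```python
# Hypothetical sketch of intent matching for the chatbot.
# Intent names and trigger phrases are invented; a real system would use an NLU library.
INTENTS = {
    "progress": ["how am i doing", "my progress", "am i on track"],
    "next_topic": ["what can i learn next", "next topic", "what should i do next"],
    "opt_out": ["opt out", "stop messages", "unsubscribe"],
}

def detect_intent(message: str) -> str:
    text = message.lower()
    for intent, phrases in INTENTS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return "fallback"  # hand over to a default reply or a human

# Example: detect_intent("How am I doing?") -> "progress"
```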
In 2018/2019, in terms of RQ4, we ran the same quantitative analysis: students who engaged and talked to the virtual assistant (52 students) showed a greater normalised gain than students who did not talk to the chatbot (80 students), although the normalised gain difference between the two groups was not found to be statistically significant. Engaging with this type of feedback appears to have an effect, and many more students engaged with the chatbot than with the previous email notifications.
Chapter 7
7.1 Introduction
In this chapter, in order to address the last of the six research questions which form the basis of this thesis, we collected a unique dataset of college students' learning states and navigation traces from student access logs to a large MOOC. We present a straightforward way for researchers and lecturers to quickly follow up on a particular student's progression using networks. Students have different strategies for studying concepts and these can be exploited to personalise each student's learning. Unsupervised machine learning approaches such as HMMs, or deep learning, might reveal hidden and higher-level representations of learning states that can be applied to Intelligent Tutoring Systems and online courses, and we explore that further in this chapter.
Arizona State University
Arizona State University (ASU)'s Global Freshman Academy (GFA)¹ provides first-year university courses through the online MOOC platform EdX. EdX is a massive open online course (MOOC) provider which supports access to online courses from numerous universities worldwide. It was formed by a coalition of MIT and Harvard University but has opened up to host courses from many other universities in a range of topics. The GFA was launched in 2015 in partnership with EdX and since its launch it has enrolled more than 230,000 students from more than 180 countries.

¹ https://gfa.asu.edu/

The GFA offers a wide range of courses in Business, Engineering and General Studies and in our analysis we focus on only a small subset of these, specifically the following two Mathematics modules:

• Algebra (MAT117)
• Precalculus (MAT170)

Additionally, both the Algebra and Precalculus courses use the cutting-edge adaptive technology ALEKS. ALEKS is a personalized math tutor that helps students
learn mathematics skills at their own pace. The courses tailor content and personalise the learning experience around the student's skill level, allowing the student to achieve mastery of a concept before moving on to the next. Using the ALEKS learning system, students can expect to be instructed on the topics they are most ready to learn. Further details regarding the ALEKS platform for this study can be found in Chapter 3. The log data we extracted, which records students being assessed continuously while navigating through ALEKS, is summarised in Table 7.1.
Table 7.1: Summary of the log data extracted from ALEKS

| Count | Description |
|-------|-------------|
| 16,022 | students |
| 40,356 | assessments |
| 8,808,675 | daily aggregate events of the topics learned and retained |
| 186,224 | records of students mastering topics |
| 5,022,091 | transactions of students navigating through the concepts |
As a further breakdown of this dataset, the following is the information which was provided to us for this analysis and which we stored in a non-relational (NoSQL) database for easy access to explore and analyse (a minimal storage sketch is given after this list):
– Objective Completion;
For instance, a student is assessed when she first comes onto the platform as
an Initial Knowledge Check, as a pretest given at the beginning of the course,
and the content shown afterwards depends on the topics the student already
knows. Also, students are assessed periodically with a Progress Knowledge
Check to see how many new topics have been mastered by them. See the
numbers for each type of assessment in Figure 7.1;
• Daily rollup: a daily progress record regarding the topics mastered by each
student and the percentage goal towards completion of the course. For in-
stance, for a student that has some activity on a particular day a list of topics
retained and learned are calculated and stored as a daily row in this collection;
• Concepts: topics students work on, each of which belongs to a section. For instance, the topic “Evaluating functions: Linear and quadratic or cubic” belongs to Section 4 of the course MAT117x;
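The following is a minimal sketch of how collections like those described above could be stored and queried in a document database; MongoDB via pymongo is assumed here and the collection and field names are illustrative rather than the actual schema used.

```python
# Illustrative sketch: storing and querying the log collections in MongoDB.
# Database, collection and field names are assumptions, not the actual schema.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["gfa_logs"]

def store_transactions(rows):
    """rows: iterable of dicts, e.g.
    {"student": "u1", "timestamp": "...", "concept": "...", "state": "L", "duration": 12.5}"""
    db.transactions.insert_many(list(rows))

def transactions_for_student(student_id):
    """All navigation transactions for one student, ordered by time."""
    return list(db.transactions.find({"student": student_id}).sort("timestamp", 1))
```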
We carried out a range of sanity checks to ensure the data we were working with did not have any gaps. The data provided covered 13th April 2016 to 1st October 2017. However, we discovered there was no transaction or assessment data for the Summer of 2017. We believe there was a break in the data collection at that time, but the cause is unclear and the end result is that we cannot make any inferences about what happened during that period or even afterwards, when data collection resumed. In order to progress with our analysis, we removed the data collected after this gap for the retrospective analysis. Thus the actual dataset that we used ran until August 2017.

Figure 7.1: Number of Students that Took each Type of Assessment for MAT117 and MAT170 in ASU's GFA via EdX
In a student’s learning journey on the MOOC the system navigates them to dif-
ferent concepts or topics based on their understanding and performance in previous
topics. For instance, some of the topic names are “Distributive property: Integer
coefficients” or “Writing and evaluating a function modeling continuous exponential
growth or decay given two outputs”. In any topic, students generally start with the
initial learning state of reading the lesson (marked with an L), then their understanding and mastery is evaluated via some exercises that they can get correct (C)
or wrong (W). They could also request a working example of the concept (E). After
some work, the system marks the understanding of the students as a provisional
mastery (S) or a failure to learn a concept (F). The student is then redirected to
another concept.
There are a total of 797 topics in the 2 courses and Figure 7.3 shows the time
spent by all students in each learning state, shown in seconds using the transactional
data, ordered by the total time spent by all students on that topic.
Figure 7.3: Total Duration for each Topic Using All Students’ Data, Ordered, Split
and Colour Coded for each Learning State
Figure 7.4: Number and Percentage of Students Who Worked on Each Section for
Both Courses
We developed a web application to explore this dataset and, especially, the trans-
actions recorded while students move from one learning state to another, one topic
to another and one slice or section (higher-level representations of concepts) to
another. The EdX and ALEKS log data was stored and then indexed in a non-
relational database. This information was made available using our web application
where Faculty can explore individual students and how they are redirected through
the material, concepts and their underlying difficulty based on completion and likely
sequence patterns around concepts and slices.
The first transactions for a particular student are shown in Figure 7.5. We can explore all of this student's transactions grouped by date. Figure 7.6 shows the time distribution for each learning state for this student with respect to the other states. In
addition, we can create an interactive visualization for each of the topic’s transac-
tions using networks. See Figure 7.7. We can appreciate differences between simpler
concepts such as “Ordering integers” (shown in Figure 7.6) and more challenging
ones such as “Evaluating a linear expression: Integer multiplication with addition
or subtraction” (shown in Figure 7.7). This will be further analysed in the next
section utilizing graph theory.
Figure 7.5: Screengrab from the Web Application which shows the First Transactions
for a Particular Student
Sequences of Learning States
In Section 1.3 the sixth research question in this thesis was introduced and it is re-stated here for convenience:
RQ6: Can we extract valuable insights from massive open online learning platforms
utilising the sequences of learning states?
Figure 7.6: Screengrab from the Web Application which shows the Distribution of
Learning States Grouped By Date for a Particular Student
Figure 7.7: Screengrab from the Web Application which shows Network Visualisa-
tions for the First Three Topics for a Particular Student
Of the 6 research questions posed in this thesis, this one is probably the most open-
ended in that the question does not have an obvious yes/no answer and requires us
to experiment with, and find, the so-called “valuable insights”. This requires a dif-
ferent approach to what we have done in earlier chapters, namely building predictive
models of student outcomes and deploying them in practice, or using them as a form
of factor analysis to explore features which influence student performance. What
we did here is to try to learn useful structure but without labelled classes as we had
in the previous chapters (student exam performance being the labels). This is also
known as unsupervised learning and we proceeded in this work by deriving graph
representations of the transactional data logs from the MOOC. The 5 million trans-
actions contain the following information: student, timestamp, concept, learning
state and duration of the activity and this formed the basis for the investigation.
Through the ALEKS platform, we are able to observe the learning states students
will go through as they consume content through the GFA platform. We analysed
how students transition from one learning state to another and as an example,
Figure 7.8 shows how a particular student traversed through the learning states for
two particular topics: “Ordering integers” and “Exponents and integers: Problem
type 1”. Even for the same student, there are different learning paths being taken
for different topics and we can see this by examining the learning states. Figure 7.8
shows two very different learning paths, a linear learning path versus the student
failing to master the concept at the beginning but being brought back to that concept
later on and mastering it eventually.
Figure 7.8: Visualizations of Networks for Two Students Going Through Various
Learning States on Two Different Topics
Topic Networks
Students are navigated through topics, also known as concepts, by using the ALEKS
technology. We developed directed networks for each topic by using the data from
all students. There are 413 topics in MAT170 and 384 in MAT117. For instance, a
student can follow the learning path “LWCEWCECS” for a particular topic such as
“Ordering integers”.
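As a sketch of how one such learning path can be turned into a directed topic network with per-transition counts and durations (the 'Sum' and 'Average' parameters listed below), the following code uses networkx; the path durations shown are hypothetical and this is not the exact pipeline used for the ASU data.

```python
# Illustrative sketch: build a directed topic network from learning-state paths.
# States: L = lesson, C = correct, W = wrong, E = example, S = mastered, F = failed.
import networkx as nx

def add_path(G: nx.DiGraph, path: str, durations) -> None:
    """Add one traversal of a topic; durations[i] is the seconds spent on the
    transition from path[i] to path[i + 1]."""
    for (src, dst), secs in zip(zip(path, path[1:]), durations):
        if G.has_edge(src, dst):
            G[src][dst]["count"] += 1
            G[src][dst]["sum_duration"] += secs
        else:
            G.add_edge(src, dst, count=1, sum_duration=secs)

G = nx.DiGraph()
add_path(G, "LWCEWCECS", durations=[30, 45, 20, 60, 40, 25, 50, 15])  # hypothetical times
for _, _, data in G.edges(data=True):
    data["avg_duration"] = data["sum_duration"] / data["count"]  # the 'Average' parameter
```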
For each topic, a graph was developed using the following parameters:
• Sum: adding up the duration of time spent between two learning states;
• Average: the average duration of time spent between two learning states, e.g. the average duration of L→C transitions;
2. For each learning state we extracted the following in-coming and out-going
metrics:
Once this data had been extracted, metrics were derived for each learning state in a
topic network based on the in-coming and out-going metrics. These metrics derived
for a topic network can be compared with other topic networks using the network
or graph topology alone. We select two topic networks to illustrate this point:
We can compute and plot the differences between these two topic networks based on their metrics, as shown in Figure 7.9. Degree centrality measures the popularity of the learning states in these topic networks. Reading is more central for the “easy topic”, whereas the exercises are more central for the “challenging topic”. This might mean that returning to the reading matters more for the “easy topic”, while re-attempting exercises matters more for the “challenging topic”.
Figure 7.9: Network Degree Metrics Extracted from Two Topic Networks and the
Values for Each Learning State. Metrics for One Topic Network are shown in Blue
and for the Other Topic Network in Red.
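A comparison such as the one plotted in Figure 7.9 can be sketched in a few lines with networkx, assuming the two topic networks have been built as directed graphs; this is illustrative rather than the exact metric pipeline used.

```python
# Illustrative sketch: compare the centrality of learning states in two topic networks.
import networkx as nx

def centrality_difference(G_easy: nx.DiGraph, G_hard: nx.DiGraph) -> dict:
    easy = nx.degree_centrality(G_easy)
    hard = nx.degree_centrality(G_hard)
    states = sorted(set(easy) | set(hard))
    # Positive values: the state is more central in the "easy" topic network.
    return {s: easy.get(s, 0.0) - hard.get(s, 0.0) for s in states}
```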
In the ASU GFA MOOC, topics belong to sections, also known as slices, which are higher-level abstractions of topics. We also developed directed networks for each section by using all the transactional data for the topics that belong to each section (or slice). Figure 7.10 shows the in-coming and out-going metrics calculated for each of the slices.
• The first figures, in both Figure 7.10a and Figure 7.10b, show the number of edges when students move from one learning state to another, in-coming and out-going respectively. It is clear that the number of students decreases considerably from one section to a more advanced section. From Figure 7.4 shown earlier, we know that only 0.44% of the students who started progressed to Section 8 and only around 5% of students completed 90% of the online content.
• The second figures, for both graphs again, show the sum of all the durations for each of the edges from one learning state to another. Again, in Figure 7.10a we look at the in-coming edges and in Figure 7.10b we look at the out-going edges. The first shows there are only a few in-coming edges for the Lesson learning state and the second illustrates there are only a few out-going edges from the Failed state. We should note that Section 4 has a higher number of edges from the transactions because that section contains more concepts than other sections such as Section 3; see Figure 7.11 for details.
• The third figures, for both graphs, show the average duration for each of the edges from one learning state to another. These are the most interesting figures. For both the in-coming and out-going figures, the more advanced the section, as measured by graph traversal distance, the longer students take to move to the next learning state, on average. That could mean either that (1) students take longer on each learning state for more advanced sections, or that (2) students who advance further through a course have different learning strategies.
Figure 7.10: Custom Network Metrics Extracted from the Section Networks Divided
by In-coming and Out-going Metrics
Figure 7.11: Number of Concepts per Section for MAT117 and MAT170 in ASU's GFA via EdX

The learning states extracted from the transactions, and the underlying navigation through the course for which we have presented graph network statistics above, can be modelled using a Markovian procedure [40] by assuming that future learning states depend only on the current learning state. This could be further developed
by looking at sequences of learning states and topics learned in order to model the
likelihood of future learning states. In our case we consider the learning states as
observable states and we can model their learning using an HMM, see Figure 7.12.
In this, the unobserved or hidden states can be estimated from the sequences of learning states that students follow as they are navigated through the system [89].
Figure 7.12: Diagram of Modelling How Students Learn on MOOC Platforms using
Learning States and Hidden Markov Models
Figure 7.13 shows how the observable states can be mapped to two hidden states.
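A minimal sketch of fitting such a two-hidden-state model from observable learning-state sequences is shown below, using the hmmlearn library (CategoricalHMM in recent versions; older releases expose the equivalent behaviour as MultinomialHMM). The sequences are hypothetical and this is not the analysis pipeline used for the ASU data.

```python
# Illustrative sketch: fit a 2-hidden-state HMM to observable learning-state
# sequences with hmmlearn. The sequences below are hypothetical.
import numpy as np
from hmmlearn import hmm

STATES = ["L", "C", "W", "E", "S", "F"]          # observable learning states
INDEX = {s: i for i, s in enumerate(STATES)}

sequences = ["LWCEWCECS", "LCCS", "LWWEWF"]      # one string per student-topic traversal
X = np.concatenate([[INDEX[s] for s in seq] for seq in sequences]).reshape(-1, 1)
lengths = [len(seq) for seq in sequences]

model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=0)
model.fit(X, lengths)                    # learn hidden-state transition & emission probabilities
hidden_states = model.predict(X, lengths)  # most likely hidden state for each observation
print(model.transmat_)                   # 2 x 2 transition matrix between the hidden states
```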
With this level of analysis possible on log data from learning state transitions on a MOOC, some of which we have actually demonstrated on log files from a MOOC at ASU, we have shown that we can learn useful structure from the transactional learning data using unsupervised approaches, potentially extracting valuable insights from these massive open online learning platforms.
Chapter 8
Conclusions
8.1 Introduction
This chapter concludes the thesis by examining each area studied previously, in turn, re-stating some of the conclusions from earlier chapters and offering some discussion of those outcomes. The central thesis hypothesis was subdivided into smaller areas
that were tackled. I have focused on automatically modelling students, automati-
cally detecting students having difficulties in their computer programming modules,
automatically offering them assistance and measuring how that aids their learning.
Good progress was made towards identifying good predictors of student outcomes
and developing interventions for students in computer programming modules.
We are eager to add more features to our prediction algorithms which would
enrich the models’ representation of students and their activities. These additional
features could include the use of the laboratory resources or physical access to build-
ings in our university. Not only would this better describe our students' digital footprints, but it would also allow us to select a subset of the most discriminating features per week, or even per day, giving us an overview of student behaviour and improving the predictive functions early in the semester.
It is important to note that we are not tracking every time a student runs a program locally, which would be similar to the source code snapshots used by other research studies carried out elsewhere. However, we could look at programming states and behaviours between submissions to the platform as additional features.
The techniques developed in the thesis have been deployed in a variety of com-
puter programming modules including CA116, CA117, CA114, CA277 and CA278
at Dublin City University. In general, models were trained using past student data
from previous years of the same module and have worked well. However, there were
times we used data from a different course or even all data captured in the platform.
In those instances, we were not expecting the predictions to work as well as the
other models with better training data as the courseware in different modules would
not be the same, but they did a good job. The concepts taught across different computer programming modules are similar but progressively more advanced, and utilising patterns from other courses as training data has, surprisingly, resulted in good outcomes. With more data, classifiers are usually better even if the material and programming exercises are different. This approach could be applicable to other courses
not only in computer programming but even Mathematics and other courses with
a significant amount of laboratory material or programming work which students
need to check and complete on a weekly basis.
In Chapter 5 of this thesis we explored in some depth a range of Machine Learning algorithms and techniques, including Deep Learning, as we gathered
more student data. Recent work using Deep Learning tends to work better when
more and more data is provided. However, in Learning Analytics, the number of
students taking a course is an unavoidable limit. Thus we cannot simply generate
more data as is done in other domains such as FinTech or Social Network Analysis.
Our findings indicate there is a need to develop better mechanisms to extract and learn effective data features from limited amounts of student data so as to analyse students' progression and performance effectively.
Our code2vec implementation and results confirm the power of code embeddings
as a technique in this application and the latent learning analytics properties in
student code submissions. Code embeddings are an alternative to using bag-of-
words based representations of source code. In future work we will explore combining
Tokens and Abstract Syntax Trees (ASTs) for creating an even richer model of a
student that factors in code details from their computer programming assessments
including structure and context. In addition, we will explore Concrete Syntax Trees
which are parse trees, typically built by a parser during the source code translation
and compilation process, adding subsequent processing to ASTs such as contextual
information.
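To make the idea of combining tokens and syntax trees concrete, the sketch below extracts token-type and AST-node-type sequences from a Python submission using only the standard library; the feature choice is illustrative and is not the representation proposed in this thesis.

```python
# Illustrative sketch: extract token and AST-node sequences from a student's
# Python submission, the raw ingredients for richer code representations.
import ast
import io
import tokenize

def token_types(source: str) -> list:
    """Names of the lexical tokens in the submission."""
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)]

def ast_node_types(source: str) -> list:
    """Names of the AST nodes encountered while walking the syntax tree."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

submission = "def add(a, b):\n    return a + b\n"
print(token_types(submission)[:5])   # e.g. ['NAME', 'NAME', 'OP', 'NAME', 'OP']
print(ast_node_types(submission))    # e.g. ['Module', 'FunctionDef', 'arguments', ...]
```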
User2Code2vec is a novel technique that was developed and used in the thesis to
represent students in a high-dimensional space, i.e. when there are many features
forming the students’ digital footprints. It uses distributional representations of
student profiles and their programming code. Other techniques such as Matrix
Factorisation can be used to find and to group students with similar coding patterns.
In addition, a User Representation Matrix could be built as a Tensor with an extra dimension, using all of a student's submissions instead of only the last one or one chosen at random. Any of these might give us a better representation of student learning and progression that we could use.
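As an illustration of the matrix-factorisation idea, the sketch below factorises a small, hypothetical user representation matrix with a truncated SVD and finds the most similar student in the latent space; it is not the user2code2vec implementation.

```python
# Illustrative sketch: factorise a small, hypothetical user representation matrix
# (students x exercises) and find students with similar latent coding patterns.
import numpy as np

R = np.array([
    [1.0, 0.8, 0.0, 0.2],
    [0.9, 0.7, 0.1, 0.3],
    [0.1, 0.2, 0.9, 1.0],
])  # rows = students, columns = exercises; the values here are made up

k = 2                                             # size of the latent space
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_vectors = U[:, :k] * s[:k]                   # low-dimensional student embeddings

def most_similar(i: int) -> int:
    """Index of the student closest (cosine similarity) to student i in latent space."""
    v = user_vectors[i]
    norms = np.linalg.norm(user_vectors, axis=1) * np.linalg.norm(v) + 1e-12
    sims = user_vectors @ v / norms
    sims[i] = -np.inf
    return int(np.argmax(sims))

print(most_similar(0))  # student 1 has the coding pattern closest to student 0
```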
Several factors affect the quality of the vectors, such as the amount and quality of the training data, the size of the vectors and the learning algorithm used. The quality of these vectors is crucial for the representations, but trying out different hyper-parameters takes a lot of computation and time. Developing and then using pre-trained vectors learned from a large corpus is a standard approach in other domains, for example the word vectors developed using Google's News dataset. The Learning Analytics community should make the effort to develop good Code Embeddings that can be used to learn higher-level abstractions like User Embeddings.
The use of embeddings for source code submissions and student code representations is still at an early stage of development, but it has the potential to change how we understand learning to program, how we recommend code, and how we support peer learning of programming using higher-level abstractions.
In our future work in this specific area we plan to focus on two main aspects: learning better distributional semantics using abstract syntax trees to capture syntactic structure effectively, following recent work proposed in [83, 6], and using the recommendations learned from the user2code2vec representation proposed in this work to evaluate how this representation helps students to improve their learning.
A further focus of this thesis was Predictive Modelling: identifying those students having difficulties with course material, also in programming courses [11], and offering remediation, personalised feedback and interventions to students using Machine Learning techniques [9, 10].
Notifying the Lecturers or Professors who deliver computer programming modules
and sending personalised assistance to students helped those students at-risk to learn
more and reduced the gap in performance in examinations between them and the
higher-performing students. It is important to note that higher-performing students
do not have the same room for improvement as lower-performing students, so for higher-performing students, maintaining their grade is an accomplishment in itself.
However, we are trying to measure learning and we expect that lower-performing
students tend to learn more in our blended classrooms and complete more assessment
programs with mentoring and further assistance.
The approach chosen for the programming recommendations was to pick the
closest text program from among those submitted by the top-ranked students in the
class that year. This could be further advanced by identifying variables and choosing
the closest program syntactically and semantically as we studied in the embeddings
chapter, Chapter 5. Other approaches tested and deserving of further exploration include Collaborative Filtering, as used in recommender systems: looking at the closest student to a given student, taken from within the class or from within the top students, and recommending one of their programs. Netflix recommends movies and Amazon recommends products from people with the same tastes, assuming those tastes are constant. It is reasonable to assume that learning computer programming is a similarly stable process and to recommend sample computer programs to a student from their closest peer. Last semester, we were more interested in the students being able to identify what is wrong with their programs, and the closest solution from a top student worked very well, especially for shorter programs like the ones suggested in CA114. We also want to explore the use of crowdsourcing, recommending the program uploaded most often based on previous years' solutions if most of the programs remain the same in the future.
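A minimal sketch of the "closest program from a top student" idea is shown below using plain text similarity from the standard library; the similarity measure (difflib) is an assumption and the deployed system's distance may differ.

```python
# Illustrative sketch: recommend the top student's program that is closest in
# text to a student's failed submission. The difflib ratio is an assumed
# similarity measure, not necessarily the one used by the deployed system.
from difflib import SequenceMatcher

def closest_program(failed_submission, top_student_programs):
    def similarity(a, b):
        return SequenceMatcher(None, a, b).ratio()   # 0.0 (different) .. 1.0 (identical)
    return max(top_student_programs, key=lambda prog: similarity(failed_submission, prog))

# Usage: suggestion = closest_program(my_failed_code, programs_by_top_students)
```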
In addition, eventually, we would like to go a step further and solve programs for
students by suggesting a solution that will fix or improve their own submitted code
and that would meet the lecturer’s learning criteria. This could be done by tokenising
the student’s programs to identify the variables used, calculate the similarity and
differences with the solutions stored and actually solve the student’s program instead
of just offering an alternative solution. We need to be cautious with this approach
and use it in moderation. However, research shows students also learn by example
[92] and a good number of them were demanding more solutions and explanatory
guided code in our survey.
Students
A research visit to Arizona State University (ASU) during the time of this PhD
study enabled collaboration in the area of Educational Analytics, a field where both
institutions have demonstrated expertise. I was involved with ASU’s Action Lab,
a dedicated digital teaching and learning laboratory. They are engaging in deep
learning analytics, providing continuous program improvement, ultimately resulting
in student success. As part of this visit I was able to collect a distinct dataset of
digital footprints from students learning Mathematics at ASU Online via EdX.
• Students spend the most time working on problems they get correct, followed by the time they spend reading the lessons, and then by problems they get incorrect;
• The first slices of learning material, and in particular the very first one which contains the introductory topics, are where students spent the greatest amount of time. This is evidence that many students drop out at the beginning of these online courses. However, the topic that most students have worked on belongs to the fifth slice (out of seven) and is called “Adding or subtracting complex numbers”. The ALEKS technology, an AI-based recommender system used by students at ASU, redirects students to where they should learn next in their learning journey, and that concept seems to be very connected to other components in the knowledge space;
• In a similar manner, the two most frequently taken assessment types are the Initial Knowledge Check and the Progress Knowledge Check, which are carried out at the beginning and continuously, respectively. However, one of the assessment types taken least by students is the Class Completion Knowledge Check.
What all these findings give us is an insight into student progress, and the high
number of drop-outs that are a characteristic of online MOOCs.
The main contribution of this thesis, apart from answering a specific set of re-
search questions, is in providing a set of tools that help Lecturers and Professors
and encourage students’ learning and interest in computer programming by using
cutting-edge data mining techniques. I believe computer programming is an ability
but also a skill that needs work to help it develop.
As our students requested in the questionnaire responses they provided, I am eager to provide them with more detailed programming recommendations, suitable material and other actions to fill the programming knowledge gaps they may have when learning CS programming in blended classrooms at our university.
Appendices
Appendix A
2. Azcona, D., & Casey, K. (2015). Micro-analytics for Student Performance Pre-
diction. International Journal of Computer Science and Software Engineering
(IJCSSE), 4, 218–223.
3. Azcona, D., Arora, P., Hsiao, I.-H., & Smeaton, A. F. (2019). user2code2vec:
Embeddings for Profiling Students Based on Distributional Representations of
Source Code. In Proceedings of the 9th International Learning Analytics &
Knowledge Conference (LAK’19). ACM.
5. Azcona, D., Hsiao, I.-H., & Smeaton, A. F. (2018). Modelling Math Learning
on an Open Access Intelligent Tutor. In The 19th International Conference
on Artificial Intelligence in Education (AIED 2018).
8. Vance, Y., Azcona, D., Hsiao, I.-H., & Smeaton, A. F. (2018). Predictive
Modelling of Student Reviewing Behaviors in an Introductory Programming
Course. In Educational Data Mining in Computer Science Education Work-
shop (CSEDM’18).
9. Vance, Y., Azcona, D., Hsiao, I.-H., & Smeaton, A. F. (2018). Learning by
Reviewing Paper-based Programming Assessments. In European Conference
on Technology Enhanced Learning (EC-TEL’18). NY, USA: Springer.
10. Azcona, D., Corrigan, O., Scanlon, P., & Smeaton, A. F. (2017). Innova-
tive learning analytics research at a data-driven HEI. In Third International
Conference on Higher Education Advances (HEAd’17). Editorial Universitat
Politècnica de València.
11. Azcona, D., & Smeaton, A. F. (2017). Targeting At-risk Students Using Engagement and Effort Predictors in an Introductory Computer Programming Course. In European Conference on Technology Enhanced Learning. Springer.
A.3 Demos
12. Azcona, D., Moreu, E., Hsiao, I.-H., & Smeaton, A. F. (2019). CoderBot:
AI Chatbot to Support Adaptive Feedback for Programming Courses. Com-
panion Proceedings of the 9th International Learning Analytics & Knowledge
Conference (LAK’19). ACM.
Appendix B
Organisational Activities
B.1 Workshops
B.2 Proceedings
I was the Proceedings Editor for the 9th International Learning Analytics & Knowl-
edge Conference (LAK 2019) at Arizona State University, Arizona, USA.
https://dl.acm.org/citation.cfm?id=3303772
Appendix C
2. How to win a Hackathon with AI at the Insight Seminar Series (April 2019)
3. Data Mining & Embeddings to Offer Fresh Insights on Irish Politics at the Insight Student Conference 2018 in University College Dublin, Ireland (ISC 2018)
11. Demoed at the Learning Analytics & Knowledge 2019 conference in Arizona
State University
13. Presented at the Dublin City University’s Teaching & Learning Day 2018
15. Demoed about Educational Analytics in Computer Science at the 1st. Insight
Augmented Human Demonstrator Event in March 2017
16. Attended the Fulbright Enrichment April 2018 Seminar Philadelphia, Penn-
sylvania, USA
17. Attended the Amazon’s AWS re:Invent 2017 in Las Vegas, Nevada, USA
18. Presented a poster at Ireland's Data Summit in 2017, where I met Leo Varadkar, the Taoiseach (Prime Minister) of Ireland
19. Attended the Big Data and Analytics Summer School at University of Essex
in September 2016 as a YERUN awardee
20. Presented posters at Dublin City University’s Faculty of Engineering and Com-
puting Research Day in 2017 and 2018
Appendix D
Awards
• Winner of the Ulster Bank Hackathon (2019) at Dogpatch Labs Dublin, Ireland
• Microsoft Imagine Cup Awardee (2018) in San Francisco and Seattle, USA.
National Awards: Best Use of Artificial Intelligence & 4th place at USA Na-
tionals. World Awards: Top-6 in Artificial Intelligence & Semifinalists at
World Finals
• Invited scholar at Data Science Conference (2017) by Open Data Science Con-
ference in San Francisco, CA, USA
Bibliography
[2] Davide Albanese et al. “Minerva and minepy: a C engine for the MINE suite
and its R, Python and MATLAB wrappers”. In: Bioinformatics 29.3 (2012),
pp. 407–408.
[3] Miltiadis Allamanis, Hao Peng, and Charles Sutton. “A convolutional atten-
tion network for extreme summarization of source code”. In: International
Conference on Machine Learning. 2016, pp. 2091–2100.
[4] Miltiadis Allamanis et al. “Learning natural coding conventions”. In: Proceed-
ings of the 22nd ACM SIGSOFT International Symposium on Foundations
of Software Engineering. ACM. 2014, pp. 281–293.
[5] Uri Alon et al. “A general path-based representation for predicting program
properties”. In: arXiv preprint arXiv:1803.09544 (2018).
[7] Kimberly E Arnold and Matthew D Pistilli. “Course Signals at Purdue: Using
learning analytics to increase student success”. In: Proceedings of the 2nd In-
ternational Conference on Learning Analytics and Knowledge (LAK). ACM.
2012, pp. 267–270.
[8] David Azcona and Kevin Casey. “Micro-analytics for Student Performance
Prediction”. In: International Journal of Computer Science and Software En-
gineering (IJCSSE) 4.8 (2015), pp. 218–223.
[9] David Azcona, I-Han Hsiao, and Alan F Smeaton. “Detecting Students-In-
Need in Programming Classes with Multimodal Learning Analytics”. In: In-
ternational Journal of Artificial Intelligence in Education (ijAIED) (2018).
[10] David Azcona, I-Han Hsiao, and Alan F Smeaton. “PredictCS: Personalizing
Programming Learning by Leveraging Learning Analytics”. In: Companion
Proceedings 8th International Conference on Learning Analytics & Knowledge
(LAK) (2018).
[11] David Azcona and Alan F Smeaton. “Targeting At-risk Students Using En-
gagement and Effort Predictors in an Introductory Computer Programming
Course”. In: European Conference on Technology Enhanced Learning. Springer.
2017, pp. 361–366.
[13] Marco Baroni, Georgiana Dinu, and Germán Kruszewski. “Don’t count, pre-
dict! A systematic comparison of context-counting vs. context-predicting se-
mantic vectors”. In: Proceedings of the 52nd Annual Meeting of the Associ-
ation for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2014,
pp. 238–247.
[14] Yoshua Bengio et al. “A neural probabilistic language model”. In: Journal of
Machine Learning Research 3.Feb (2003), pp. 1137–1155.
[16] Susan Bergin and Ronan Reilly. “Predicting introductory programming per-
formance: A multi-institutional multivariate study”. In: Computer Science
Education 16.4 (2006), pp. 303–323.
[19] Paulo Blikstein and Marcelo Worsley. “Multimodal Learning Analytics and
Education Data Mining: using computational technologies to measure com-
plex learning tasks”. In: Journal of Learning Analytics 3.2 (2016), pp. 220–
238.
[20] Phillip Bonacich. “Some unique properties of eigenvector centrality”. In: So-
cial networks 29.4 (2007), pp. 555–564.
[21] Kristy Elizabeth Boyer et al. “Balancing cognitive and motivational scaffold-
ing in tutorial dialogue”. In: International Conference on Intelligent Tutoring
Systems. Springer. 2008, pp. 239–249.
[22] Neil Christopher Charles Brown et al. “Blackbox: a large scale repository
of novice programmers’ activity”. In: Proceedings of the 45th ACM technical
symposium on Computer science education. ACM. 2014, pp. 223–228.
[24] Adam Scott Carter and Christopher David Hundhausen. “Using Program-
ming Process Data to Detect Differences in Students’ Patterns of Program-
ming”. In: Proceedings of the 2017 ACM SIGCSE Technical Symposium on
Computer Science Education. ACM. 2017, pp. 105–110.
[25] Ana Carvalho, Nelson Areal, and Joaquim Silva. “Students’ perceptions of
Blackboard and Moodle in a Portuguese university”. In: British Journal of
Educational Technology 42.5 (2011), pp. 824–841.
[27] Marc Claesen and Bart De Moor. “Hyperparameter search in machine learn-
ing”. In: arXiv preprint arXiv:1502.02127 (2015).
[28] Owen Corrigan et al. “Using Educational Analytics to Improve Test Per-
formance”. In: Design for Teaching and Learning in a Networked World.
Springer, 2015, pp. 42–55.
[29] Scotty D Craig et al. “Learning with ALEKS: The Impact of Students’ Atten-
dance in a Mathematics After-School Program”. In: International Conference
on Artificial Intelligence in Education. Springer. 2011, pp. 435–437.
[31] Maria Cutumisu and Daniel L Schwartz. “Choosing versus Receiving Feed-
back: The Impact of Feedback Valence on Learning in an Assessment Game.”
In: Proceedings of the 9th International Conference on Educational Data Min-
ing (EDM). 2016, pp. 341–346.
[32] Matt Dennis, Judith Masthoff, and Chris Mellish. “Adapting progress feed-
back and emotional support to learner personality”. In: International Journal
of Artificial Intelligence in Education 26.3 (2016), pp. 877–931.
[33] Luc Devroye, László Györfi, and Gábor Lugosi. A probabilistic theory of pat-
tern recognition. Vol. 31. Springer Science & Business Media, 2013.
[34] Nicholas Diana et al. “An instructor dashboard for real-time analytics in
interactive programming assignments”. In: Proceedings of the Seventh Inter-
national Learning Analytics & Knowledge Conference (LAK). ACM. 2017,
pp. 272–279.
[35] Jean-Paul Doignon and Jean-Claude Falmagne. “Spaces for the assessment
of knowledge”. In: International journal of man-machine studies 23.2 (1985),
pp. 175–196.
[36] Jean-Paul Doignon, Jean-Claude Falmagne, and Eric Cosyn. “Learning Spaces:
A Mathematical Compendium”. In: Knowledge Spaces. Springer, 2013, pp. 131–
145.
[39] Anthony Estey, Hieke Keuning, and Yvonne Coady. “Automatically Classi-
fying Students in Need of Support by Detecting Changes in Programming
Behaviour”. In: Proceedings of the 2017 ACM SIGCSE Technical Symposium
on Computer Science Education. ACM. 2017, pp. 189–194.
[40] Paul A Gagniuc. Markov chains: from theory to implementation and experi-
mentation. John Wiley & Sons, 2017.
[41] Pierre Geurts, Damien Ernst, and Louis Wehenkel. “Extremely randomized
trees”. In: Machine learning 63.1 (2006), pp. 3–42.
[42] Ross Girshick et al. “Rich feature hierarchies for accurate object detection
and semantic segmentation”. In: Proceedings of the IEEE conference on com-
puter vision and pattern recognition. 2014, pp. 580–587.
[43] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[46] Bjorn Hartmann et al. “What would other programmers do: suggesting so-
lutions to error messages”. In: Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems. ACM. 2010, pp. 1019–1028.
[47] John Hattie and Helen Timperley. “The power of feedback”. In: Review of
Educational Research 77.1 (2007), pp. 81–112.
[49] I-H Hsiao, Sergey Sosnovsky, and Peter Brusilovsky. “Guiding students to
the right questions: adaptive navigation support in an E-Learning system for
Java programming”. In: Journal of Computer Assisted Learning 26.4 (2010),
pp. 270–283.
[50] I-Han Hsiao, Po-Kai Huang, and Hannah Murphy. “Uncovering reviewing
and reflecting behaviors from paper-based formal assessment”. In: Proceed-
ings of the Seventh International Learning Analytics & Knowledge Confer-
ence. ACM. 2017, pp. 319–328.
[51] Petri Ihantola et al. “Educational data mining and learning analytics in pro-
gramming: Literature review and case studies”. In: Proceedings of the 2015
ITiCSE on Working Group Reports. ACM. 2015, pp. 41–63.
[52] Petri Ihantola et al. “Review of recent systems for automatic assessment of
programming assignments”. In: Proceedings of the 10th Koli calling interna-
tional conference on computing education research. ACM. 2010, pp. 86–93.
[54] Matthew C Jadud. “Methods and tools for exploring novice compilation be-
haviour”. In: Proceedings of the Second International Workshop on Comput-
ing Education Research. ACM. 2006, pp. 73–84.
[55] George F Jenks. “The data model concept in statistical mapping”. In: Inter-
national yearbook of cartography 7 (1967), pp. 186–190.
[57] John D Kelleher, Brian Mac Namee, and Aoife D’arcy. Fundamentals of ma-
chine learning for predictive data analytics: algorithms, worked examples, and
case studies. MIT Press, 2015.
[58] Hassan Khosravi and Kendra ML Cooper. “Using Learning Analytics to In-
vestigate Patterns of Performance and Engagement in Large Classes”. In:
Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer
Science Education. ACM. 2017, pp. 309–314.
[59] Michael Kölling et al. “The BlueJ system and its pedagogy”. In: Computer
Science Education 13.4 (2003), pp. 249–268.
[60] Jakub Kuzilek et al. “OU analyse: Analysing at-risk students at the Open
University”. In: Learning Analytics Review (2015), pp. 1–16.
[61] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. “Deep learning”. In: na-
ture 521.7553 (2015), p. 436.
[65] Richard Meyes et al. “Ablation Studies in Artificial Neural Networks”. In:
arXiv preprint arXiv:1901.08644 (2019).
[66] Tomas Mikolov et al. “Distributed representations of words and phrases and
their compositionality”. In: Advances in neural information processing sys-
tems. 2013, pp. 3111–3119.
[68] Tripti Mishra, Dharminder Kumar, and Sangeeta Gupta. “Mining students’
data for prediction performance”. In: 2014 Fourth International Conference
on Advanced Computing & Communication Technologies. IEEE. 2014, pp. 255–
262.
[70] Oliver Mooney et al. “A study of progression in Irish higher education”. In:
Dublin: Higher Education Authority (2010).
[71] Lili Mou et al. “Building program vector representations for deep learning”.
In: arXiv preprint arXiv:1409.3358 (2014).
[72] Lili Mou et al. “Convolutional Neural Networks over Tree Structures for
Programming Language Processing.” In: AAAI. Vol. 2. 3. 2016, p. 4.
[75] Susanne Narciss. “Feedback strategies for interactive learning tasks”. In:
Handbook of research on educational communications and technology 3 (2008),
pp. 125–144.
[76] Susanne Narciss et al. “Exploring feedback and student characteristics rel-
evant for personalizing feedback strategies”. In: Computers & Education 71
(2014), pp. 56–76.
[77] Iulian Neamtiu, Jeffrey S Foster, and Michael Hicks. “Understanding source
code evolution using abstract syntax tree matching”. In: ACM SIGSOFT
Software Engineering Notes 30.4 (2005), pp. 1–5.
[78] Andrew Ng. “What data scientists should know about deep learning”. 2015.
[79] Xabier Ochoa. “Multimodal Learning Analytics”. In: The Handbook of Learn-
ing Analytics, 1 ed., C. Lang, G. Siemens, A. F. Wise and D. Gasevic, Eds.
Alberta, Canada: Society for Learning Analytics Research (SoLAR), 2017,
pp. 129–141.
[81] Andrei Papancea, Jaime Spacco, and David Hovemeyer. “An open platform
for managing short programming exercises”. In: Proceedings of the ninth an-
nual international ACM conference on International computing education re-
search. ACM. 2013, pp. 47–52.
[82] Andrew Petersen et al. “Revisiting why students drop CS1”. In: Proceedings
of the 16th Koli Calling International Conference on Computing Education
Research. ACM. 2016, pp. 71–80.
[84] Chris Piech et al. “Modeling how students learn to program”. In: Proceed-
ings of the 43rd ACM Technical Symposium on Computer Science Education.
ACM. 2012, pp. 153–160.
[85] Victor Pigott and Denise Frawley. “An Analysis of Completion in Irish Higher
Education: 2007/08 Entrants”. In: Higher Education Authority in Ireland
(2019).
[86] Sebastian Proksch, Sven Amann, and Sarah Nadi. “Enriched event streams:
a general dataset for empirical studies on in-IDE activities of software devel-
opers”. In: Proceedings of the International Conference on Mining Software
Repositories. 2018.
[87] Sebastian Proksch et al. “A dataset of simplified syntax trees for C”. In:
Proceedings of the 13th International Conference on Mining Software Repos-
itories. ACM. 2016, pp. 476–479.
[88] Keith Quille, Susan Bergin, and Aidan Mooney. “PreSS#, A Web-Based
Educational System to Predict Programming Performance”. In: International
Journal of Computer Science and Software Engineering (IJCSSE) 4.7 (2015),
pp. 178–189.
[90] Maxim Rabinovich, Mitchell Stern, and Dan Klein. “Abstract syntax net-
works for code generation and semantic parsing”. In: arXiv preprint arXiv:1704.07535
(2017).
[91] Veselin Raychev, Martin Vechev, and Eran Yahav. “Code completion with
statistical language models”. In: ACM Sigplan Notices. Vol. 49. 6. ACM.
2014, pp. 419–428.
[92] Rebecca Reynolds and Ming Ming Chiu. “Formal and informal context fac-
tors as contributors to student engagement in a guided discovery-based pro-
gram of game design learning”. In: Learning, Media and Technology 38.4
(2013), pp. 429–462.
[93] Shaghayegh Sahebi, Yu-Ru Lin, and Peter Brusilovsky. “Tensor factorization
for student modeling and performance prediction in unstructured domain”.
In: Proceedings of the 9th International Conference on Educational Data Min-
ing. IEDMS. 2016, pp. 502–506.
[94] Gerard Salton, Anita Wong, and Chung-Shu Yang. “A vector space model for
automatic indexing”. In: Communications of the ACM 18.11 (1975), pp. 613–
620.
[95] Valerie J Shute and Diego Zapata-Rivera. “Adaptive technologies”. In: ETS
Research Report Series 2007.1 (2007).
[96] George Siemens and Phil Long. “Penetrating the fog: Analytics in learning
and education.” In: EDUCAUSE review 46.5 (2011), p. 30.
[98] Sergey Sosnovsky, I-Han Hsiao, and Peter Brusilovsky. “Adaptation “in the
Wild”: ontology-based personalization of open-corpus learning material”. In:
European Conference on Technology Enhanced Learning. Springer. 2012, pp. 425–
431.
[99] Terry Speed. “A correlation for the 21st century”. In: Science 334.6062
(2011), pp. 1502–1503.
[102] Christopher Watson and Frederick WB Li. “Failure rates in introductory pro-
gramming revisited”. In: Proc. 2014 Conference on Innovation & Technology
in Computer Science Education. ACM. 2014, pp. 39–44.
[103] Christopher Watson, Frederick WB Li, and Jamie L Godwin. “Bluefix: Using
crowd-sourced feedback to support programming students in error diagnosis
and repair”. In: International Conference on Web-Based Learning. Springer.
2012, pp. 228–239.
[104] Christopher Watson, Frederick WB Li, and Jamie L Godwin. “No tests re-
quired: comparing traditional and dynamic predictors of programming suc-
cess”. In: Proceedings of the 45th ACM Technical Symposium on Computer
Science Education. ACM. 2014, pp. 469–474.
[107] Doug Wightman et al. “Snipmatch: using source code context to enhance
snippet retrieval and parameterization”. In: Proceedings of the 25th Annual
ACM Symposium on User Interface Software and Technology. ACM. 2012,
pp. 219–228.
[108] Ben Williamson. “The hidden architecture of higher education: building a big
data infrastructure for the ‘smarter university’”. In: International Journal of
Educational Technology in Higher Education 15.1 (2018), p. 12.
[110] Billy Tak-ming Wong and Kam Cheong Li. “A review of learning analytics
intervention in higher education (2011–2018)”. In: Journal of Computers in
Education (2019), pp. 1–22.