Automatic Answer Checker Using Machine Learning

¹Tushar Singh, ²Shaifullah Ahmed, ³Dr. Ajay Kr. Sahu
¹ts807742@gmail.com, ²mdsaif96328@gmail.com, ³ajaysahu.it@gniot.net.in
Department of Information Technology, Greater Noida Institute of Technology (Engg. Institute), Greater Noida
Abstract: This paper presents an automated evaluation model designed to reduce human involvement, minimize
biases caused by fluctuating psychological states, save time, ensure consistent tracking of evaluations, and
simplify data extraction. The proposed method dynamically assesses key elements and generates results closely
aligned with those of human evaluators. To validate the model's effectiveness, its scoring outcomes were
compared against teacher-assigned scores and results obtained through various keyword-matching and
similarity-assessment techniques.
Keywords: Natural Language Processing, Machine Learning, Subjective Answer Evaluation, Learning
Assessments.
I. INTRODUCTION

Exam questions can be broadly categorized as either Objective or Subjective. Objective questions consist of selecting a response from a list of alternatives or providing a word or brief sentence. These types of questions have only one correct answer and can easily be graded automatically by an online assessment platform. On the other hand, Subjective questions require answers in the form of explanations. Essay questions, short answers, definitions, scenarios, and opinion questions are among them. It is vital to include human knowledge of the concepts when grading these detailed answers using Artificial Intelligence techniques, as well as to take into account linguistic factors like vocabulary, sentence structure, and syntax.

Due to the ongoing pandemic, education has undergone a significant transformation that has rapidly increased online learning, where instruction is delivered remotely via digital platforms and online classrooms. As teaching-learning sessions have become virtual, online descriptive tests and assessments can be the best carriers of skills and personality enhancement. Students can learn better and experiment with their writing patterns by working on spontaneous thoughts regarding the subject. Asking descriptive questions can deepen students' engagement with the electronic content they regularly encounter. Students must adhere to the rules of content, syntax, and punctuation while submitting subjective answers, and they must explain their reasoning by giving examples, writing figures, or even sketching an illustration. The subjective components are more captivating and noteworthy due to these factors.

2. RELATED WORK

Numerous academic approaches have been developed to facilitate the automated evaluation of subjective responses. A summary of some notable methods is provided below:

Assessment of Answers in Online Subjective Examinations
In this method, questions were categorized into types such as Define, Describe/Illustrate, Differentiate/Distinguish, Discuss/Explain, Enumerate/List/Identify/Outline, Interpret, and Justify/Prove, focusing on responses limited to a single sentence. A paragraph indexing module used query terms from the question-processing module to retrieve relevant information. For answers, part-of-speech tagging (e.g., using Python POS taggers) and shallow parsing were employed to extract pertinent words or phrases. Lexical resources, such as WordNet for synonym matching, aided in determining answer correctness. Paraphrasing techniques based on synonym substitution, lexical/structural changes, and alterations emphasized the intent behind responses. Semantic analysis, conducted with WordNet, measured word density in sequences; responses matching the expected pattern by over 50% were considered correct. The system achieved a performance accuracy of 70%. However, it struggled to evaluate questions requiring diagrams, examples, or mathematical formulas (Dhokrat et al., 20…).
3. PROPOSED SYSTEM

Depending on the kind of question posed, the techniques and methods operate differently; the suggested model permits a variety of method combinations to obtain the best grade. For assessing the answers, not every evaluation criterion has to be given equal weight. As a result, the proposed method assigns marks in accordance with the weightage of the evaluation criteria. The proposed model has the following evaluation criteria:
1. Keyword Matching. Check for the presence of important keywords.
2. Similarity Check. Find the sentence similarity between the student's and model's responses.
3. Grammar/Language Check. The language score is determined by examining spelling and grammatical errors.

The developed model, as presented in Figure 1, consists of two parts: one for the evaluation of answers (Checker), and another for finding the optimal combination of evaluation techniques that can be used to evaluate answers to a particular question (Evaluator). The sum of the similarity, language/grammar, and keyword scores determines the final score.
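As an illustration, the weighted combination of the three criterion scores can be sketched as follows; the function name, the example weights, and the normalised 0–1 score scale are assumptions made for this sketch, not details taken from the paper:

```python
def final_score(keyword_score, similarity_score, grammar_score,
                weights=(0.4, 0.4, 0.2), total_marks=10):
    """Combine per-criterion scores (each in [0, 1]) into final marks.

    The weights mimic the user-chosen weightage of each evaluation
    criterion; they must sum to 1.
    """
    w_kw, w_sim, w_gr = weights
    assert abs(w_kw + w_sim + w_gr - 1.0) < 1e-9, "weights must sum to 1"
    combined = (w_kw * keyword_score
                + w_sim * similarity_score
                + w_gr * grammar_score)
    return round(combined * total_marks, 2)
```

For example, with the default weights, a student scoring 0.8 on keywords, 0.9 on similarity, and a perfect grammar score receives 8.8 out of 10 marks.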
Proposed Algorithm:

1] Keyword search algorithm: A search algorithm is an algorithm that retrieves information stored within some data structure. Data structures can include linked lists, arrays, search trees, hash tables, or various other storage methods; the appropriate search algorithm often depends on the data structure being searched. Searching also encompasses algorithms that query the data structure, such as the SQL SELECT command. Search algorithms can be classified based on their mechanism of searching. Linear search algorithms check every record for the one associated with the target key in a linear fashion. Binary search repeatedly targets the center of the search structure and divides the search space in half, as do digital search algorithms; hashing directly maps keys to records based on a hash function. Searches other than a linear search require that the data be sorted in some way. Search functions are also evaluated on the basis of their complexity, or maximum theoretical runtime. Keyword search is the most popular information discovery method because the user does not need to know either a query language or the underlying structure of the data. The search engines available today provide keyword search on top of sets of documents: when a set of keywords is provided by the user, the search engine returns all documents that are associated with these keywords. Typically, a keyword is associated with a document when it is contained in that document, and the degree of association is often measured by the distance between keywords. Keyword research is a practice search engine optimization professionals use to find and research the actual search terms that people enter into search engines, in order to achieve better rankings.

2] Stemming algorithm: Stemming is the process of removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalization process that is usually done when setting up an information retrieval system. Stemming refers to the process of removing affixes (prefixes and suffixes) from words. In the information retrieval context, stemming is used to conflate word forms to avoid mismatches that may undermine recall. As a simple example, consider searching for a document entitled "How to write": if the user issues the query "writing", there will be no match with the title; however, if the query is stemmed so that "writing" becomes "write", then retrieval will be successful. Stemming is thus the process of finding the root word. Given below is the system architecture of this Automatic Answer Sheet Checker.
[Figure: System architecture of the Automatic Answer Sheet Checker]
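To illustrate the stemming step, here is a minimal toy suffix-stripper in Python. It is not the Porter algorithm, nor the stemmer the system actually uses (the paper does not name one); the suffix list is invented purely for illustration:

```python
def naive_stem(word):
    """Strip a common English suffix (a toy, Porter-style sketch).

    Real stemmers apply ordered rewrite rules; here we just remove the
    first matching suffix, keeping at least a 3-letter stem.
    """
    for suffix in ("ation", "ing", "ers", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word
```

With this sketch, "writing", "write", and "writes" all conflate to the stem "writ", so a query and a title using different forms of the word can still match, which is exactly the recall benefit described above.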
4 CHECKER

The implemented system offers the user a choice of techniques for sentence similarity analysis, keyword extraction, and summarization. The combination obtained by the evaluator can be used in this part, or the user can try their own combinations. The system is implemented as a web application using Flask that takes a question with up to three model answers (Expected Answers), a student's answer, and total marks as user input. For the manual option, the user must enter the desired keywords, separated by commas, and for the automatic option, the user must provide the desired number of keywords. Additionally, a selection for the keyword-matching technique is provided. In case the student's response is too lengthy, the user can select a technique to summarise it. The maximum number of grammatical mistakes that may appear in the response can be specified. When comparing expected and student replies, the user has a choice of methods. The user must enter the percentage weighting of grammar, keyword matches, and similarity checks in order to calculate the final marks.

The process of extracting keywords involves selecting the most pertinent words and phrases from the text. Both the Manual and Automatic options are provided for keyword extraction. For the manual method, an input of comma-separated keywords is required. Methods used for automatic keyword extraction are:

1. Term Frequency–Inverse Document Frequency (TF-IDF). It is a statistical method for determining how pertinent a word is to a document within a group of documents. To accomplish this, the frequency of a word within a document and its inverse document frequency across a collection of documents are multiplied (Ramos et al., 2003).

2. CountVectorizer. It is a utility offered by the Python scikit-learn package that turns a given text into a vector based on the frequency of each word that appears across the full text (Cou, n.d.).

3. SpaCy. It is an open-source natural language processing library based on Python and Cython. It has built-in support for trainable features such as named entity recognition, part-of-speech tagging, dependency parsing, text classification, and entity linking. It segments the paragraph into pieces, and keywords can be identified by using part-of-speech tagging and noun extraction (spa, n.d.).

4. Rapid Automatic Keyword Extraction (RAKE). To identify the significant words or phrases in a document's text, it employs a set of stopwords and phrase delimiters (rak, n.d.).

5. Yet Another Keyword Extractor (YAKE). It is a simple unsupervised automatic keyword extraction technique that chooses the most significant keywords from a text by using statistical text features acquired from individual documents (yak, n.d.).

4.1 Methods Used for Keyword Matching

For keyword matching, the RAKE and YAKE approaches are utilized, with the optimum strategy chosen based on the needs. The extracted keywords of the model answers are compared with the keywords of the student's response. The keywords are also matched with the synonyms of the keywords extracted from the model answer if the check synonym option is chosen.
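The keyword-matching step described above can be sketched as follows; the function name and the optional synonym map are illustrative assumptions, not the paper's implementation (which extracts keywords with RAKE/YAKE and looks up synonyms via a lexical resource):

```python
def keyword_match_score(model_keywords, student_answer, synonyms=None):
    """Fraction of model-answer keywords found in the student's answer.

    `synonyms` maps a keyword to a set of acceptable alternatives,
    mimicking the 'check synonym' option.
    """
    synonyms = synonyms or {}
    words = set(student_answer.lower().split())
    hits = 0
    for kw in model_keywords:
        accepted = {kw.lower()} | {s.lower() for s in synonyms.get(kw, set())}
        if accepted & words:  # keyword or any synonym present
            hits += 1
    return hits / len(model_keywords) if model_keywords else 0.0
```

For instance, if the model answer's keywords are "stemming" and "recall" and the student writes about "retrieval", only the synonym map makes the second keyword count as a hit.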
4.2 Summarization Method

Summarization can be defined as the task of producing a concise and fluent summary while preserving essential information and overall meaning. If a student's answer is lengthy, summarization helps the evaluator to understand the answer's gist and determine the student's level. Methods used for summarization are:

1. Cosine Similarity. It is a Natural Language Processing method used for measuring the text similarity between two documents regardless of their size. The similarity between each pair of sentences in a paragraph is calculated and ranked, and the highest-ranked sentences are used in the summary (Rahutomo et al., 2012).

2. BM25 Okapi. BM is an abbreviation for best matching. It is a ranking algorithm that ranks a set of documents based on the search phrases that exist in each of them, independent of how a document's search phrases relate to each other (Robertson et al., 2009).

3. BM25L. It is an extension of BM25, developed to overcome the previous model's unfair preference for shorter documents over longer ones (Lv and Zhai, 2011). We observed that the BLEU (bilingual evaluation understudy) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores for BM25L were the best among the summarization methods.

4.3 Similarity Check

The similarity between the student's and model's answers is checked to determine how closely the student's response corresponds to the model's response. Methods used for checking similarity are:

1. FuzzyWuzzy. It is a Python library for string matching that uses Levenshtein distance to determine the differences between sequences. The Levenshtein distance between two words is the smallest number of insertions, deletions, or substitutions (single-character changes) required to change one word into the other (Fuz, n.d.).

2. Jaccard Similarity. Also known as the Jaccard index and Intersection over Union, it is a metric used to determine the similarity between two text documents by dividing the number of common words by the total number of words (Bag et al., 2019).

3. TF-IDF. It displays a word's frequency in a document as well as its inverse document frequency across a collection of documents (Ramos et al., 2003).

4. BERT. Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based model used to measure the semantic similarity between sentences. It converts all the sentences into vector form and then determines the sentences that are closest to one another in terms of Euclidean distance or cosine similarity (Devlin et al., 2018).
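A minimal sketch of three of the measures discussed above — the Jaccard index, cosine similarity over word counts, and the Levenshtein distance underlying FuzzyWuzzy — written in plain Python rather than with the libraries the paper uses:

```python
from collections import Counter
import math

def jaccard(a, b):
    """Jaccard index: shared distinct words over total distinct words."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def cosine(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def levenshtein(s, t):
    """Minimum insertions/deletions/substitutions to turn s into t
    (single-row dynamic program)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]
```

For example, "write" and "writing" are three single-character edits apart, while two identical sentences have cosine similarity 1 and Jaccard index 1.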
Grammar is the structural foundation of one's ability to express oneself. It can help foster precision, detect ambiguity, and exploit the richness of expression available in the language. Automated grammar checking is implemented using the language_check library of Python, which reports each mistake along with its Rule Id, Message, Suggestion, and line number in the document. The user can choose the maximum number of errors permitted as a cutoff point beyond which grammar marks are deducted.

5 EXPERIMENTS AND RESULTS

The end-to-end model is implemented using Python programming, Flask, HTML, and CSS on the front end. For an introductory computer science course, the assessment model has been tested with more than 20 questions and responses from 14 students. The expected answer and student responses are compared to determine the similarity score. The keywords, grammar, and semantics of words are checked to ensure that the response is accurate. The evaluator module determines the best approach for similarity checking, or it can also be selected manually.
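The grammar-score cutoff can be expressed as a small scoring rule; the linear deduction below is an assumed policy for illustration, since the paper only states that errors beyond a user-chosen cutoff reduce the grammar marks:

```python
def grammar_score(num_errors, max_errors_allowed):
    """Normalised grammar score in [0, 1].

    Assumed policy: scale the score down linearly as the error count
    approaches the user-chosen cutoff, and award zero beyond it.
    """
    if max_errors_allowed <= 0:
        return 1.0 if num_errors == 0 else 0.0
    return max(0.0, 1.0 - num_errors / max_errors_allowed)
```

With a cutoff of 4 errors, a response containing 2 errors keeps half of the grammar marks, and one with 5 or more errors keeps none.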
The total score is the sum of the similarity, grammar/language, and keyword scores. According to Table 1, the student with ID 8 does remarkably well. In order to evaluate how effectively the developed model performs, its final scores were compared with the teacher's grades, and the results are shown in Figure 3. Figures 4 and 5 compare the scores given by different keyword matching and similarity check algorithms.

Table 1: Ques no | Student id | Marks awarded by model | Total marks

Figure 3: A comparison of the final scores obtained by the students using the proposed model and marks awarded by the teacher.

Figure 4: A comparison of the marks obtained by the students using keyword matching algorithms.

Figure 5: A comparison of the marks obtained by the students using various similarity methods.

6 CONCLUSION AND FUTURE WORK

The grading of student responses is difficult under the existing evaluation procedure: the evaluation scheme has significant issues and requires a lot of human resources, time, and expertise. To overcome these challenges, this work developed a mechanism for automatically assessing answer scripts that uses the question, the expected answer, and the student's response as input. The proposed model is trained to categorize questions according to marks, which can assist in automatically assigning weights to components. The proposed method does not consider answers that include non-textual information like equations, graphs, and tables, which could be the direction of future research. Additionally, batch processing of all students' responses is a viable alternative to assessing one question at a time.
7. REFERENCES

[1] Partha Pakray, Santanu Pal and Sivaji Bandyopadhyay, "Automatic Answer Validation System on English Language", IEEE, 2010.
[2] Gunjal M.S., Sanap K.N., Sable R.G., Nannaware P.S., Ghuge R.B., "Automatic answer sheet checker", International Journal of Advanced Engineering and Science Research (IJAES), Volume 5, Issue 1, March 2017.
[3] Chhanda Roy, Chitrita Chaudhuri, "Case Based Modeling of Answer Points to Expedite Semi-Automated Evaluation of Subjective Papers", IEEE, 2018.
[4] http://eduexamsoftware.weebly.com
[5] www.projectcorner.in/online-examination-system-college-project-asp-net
[6] http://oes.sourceforge.net
[7] www.codeguru.com
[8] Ivan Bayross, "Web Enabled Commercial Application Development".
[9] Bryan Basham, Kathy Sierra and Bert Bates, "Head First Servlets and JSP".
[10] V. Senthil Kumaran and A. Sankar, "Towards an automated system for short-answer assessment using ontology mapping", International Arab Journal of e-Technology, Vol. 4, No. 1, pp. 17-24, January 2015.
[11] H. Mittal and M. Syamala Devi, "Computerized Evaluation of Subjective Answers Using Hybrid Technique", International Conference on Innovations in Computer Science and Engineering (ICICSE 2015), Advances in Intelligent Systems and Computing book series, Volume 413, pp. 295-303.
[12] R. C. Schank, "Dynamic Memory: A theory of reminding and learning in computers and people", Cambridge, UK: Cambridge University Press, 1982.
[13] R. C. Schank, "Memory-based expert systems", Technical Report (# AFOSR. TR. 84-0814), Yale University, New Haven, USA, 1984.
[14] K. D. Ashley, "Case-based reasoning and its implications for legal expert systems", Artificial Intelligence and Law, Volume 1, Issue 2-3, pp. 113-208, June 1992.
[15] M. Nilsson, M. SollenBorn, "Advancement and Development Trends in Medical Case-Based Reasoning: An Overview of Systems and System Development", American Association for Artificial Intelligence, 2004.
[16] D. W. Aha, "The omnipresence of case-based reasoning in science and application", Knowledge-Based Systems, Vol. 11, No. 5-6, pp. 261-273, 1998.
[17] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, "Introduction to Algorithms", 2nd Edition, Prentice Hall of India (Eastern Economy Edition), 2001.
[18] T. Nawaz, K. I. Qazi, and Md. I. Ashraf, "An Efficient Algorithm of Computerized Checking System for Hard Copy MCQs Test (HCMCQST)", IJCSNS International Journal of Computer Science and Network Security, Vol. 9, No. 5, pp. 228-236, May 2009.