Chapter 2 - Literature Review
Updates
1.0 Domain
(1) Xue (2021)
Summary: Xue (2021) found that a stacking method yielded the most effective results in predicting the NASDAQ Composite Index.
(2) Kovacs et al. (2021)
Summary: Kovacs et al. (2021) used recurrent neural networks to model and predict stock market changes, providing a good approximation of trends.
(3) Yan & Wu (2021)
Summary: Yan & Wu (2021) used a random walk model for short-term simulation of the NASDAQ Composite Index but had difficulty with longer-term simulation.
(4) Feng et al. (2019)
Summary: Feng et al. (2019) proposed a deep learning solution called Relational Stock Ranking (RSR) that captures time-sensitive stock relations and, in their evaluation, outperformed other prediction methods.
Critical Thoughts: The literature indicates that the stock prediction domain, specifically the NASDAQ stock market, has gained significant attention because of its importance to investors and traders in making profitable decisions.
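The short-term versus long-term difficulty reported by Yan & Wu (2021) is characteristic of random walks in general. The sketch below is a generic additive Gaussian random walk, not their specific model; the starting level, daily volatility, path count, and seed are made-up assumptions. It shows that the spread of simulated outcomes grows roughly with the square root of the number of steps, so a 250-step simulation is far less certain than a 5-step one.

```python
import random
import statistics

def simulate_end_level(start, steps, daily_sigma, rng):
    """Simulate one path of an additive Gaussian random walk and return its end level."""
    level = start
    for _ in range(steps):
        # Each step adds an independent, zero-mean normal shock
        level += rng.gauss(0.0, daily_sigma)
    return level

def spread_of_outcomes(start, steps, daily_sigma, n_paths=2000, seed=42):
    """Sample standard deviation of end levels across many simulated paths."""
    rng = random.Random(seed)
    ends = [simulate_end_level(start, steps, daily_sigma, rng) for _ in range(n_paths)]
    return statistics.stdev(ends)

# Hypothetical index level and daily volatility, chosen only for illustration
START_LEVEL = 15000.0
DAILY_SIGMA = 150.0

short_term = spread_of_outcomes(START_LEVEL, steps=5, daily_sigma=DAILY_SIGMA)
long_term = spread_of_outcomes(START_LEVEL, steps=250, daily_sigma=DAILY_SIGMA)
print(f"Spread after 5 steps:   {short_term:.0f}")
print(f"Spread after 250 steps: {long_term:.0f}")
```

An additive walk keeps the example simple; stock simulations more commonly apply the random walk to log returns (a geometric random walk), which keeps prices positive.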
2.0 Data Analysis / EDA / Descriptive Analysis
(1) Ali (2019)
Summary: Ali (2019) explains the essential components of descriptive statistics, namely measures of central tendency (such as the mean, median, and mode) and measures of dispersion (such as variance and standard deviation).
(2) Acosta (2021)
Summary: Acosta (2021) suggests that, when combined with more complex statistics, descriptive statistics can help focus research on meaningful problems and facilitate clear communication of findings.
(3) Turner (2019)
Summary: Turner (2019) provides a comprehensive overview of the descriptive statistics commonly used in headache research. By showing how these measures are applied in a specialized field, the paper deepens understanding of their relevance and utility in medical research.
(4) Bulanov (2021)
Summary: Bulanov (2021) covers the basic principles of descriptive statistics in medical research, including the distinction between qualitative and quantitative variables and the application of measures of central tendency and variation.
Critical Thoughts: The reviewed papers show that measures of central tendency and dispersion, such as the mean, median, mode, variance, and standard deviation, play a pivotal role in describing the distribution of data and providing valuable insights. Integrating descriptive statistics with more complex statistical techniques allows researchers to focus on meaningful problems and communicate their findings effectively.
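As a minimal illustration of the measures Ali (2019) and Bulanov (2021) describe, the sketch below uses Python's standard-library statistics module on a small hypothetical sample; the values are made up. statistics.variance and statistics.stdev compute the sample versions (dividing by n − 1); statistics.pvariance and statistics.pstdev give the population versions.

```python
import statistics

# Hypothetical sample values, made up for illustration only
values = [12, 15, 15, 18, 20, 22, 25]

def describe(data):
    """Return common measures of central tendency and dispersion."""
    return {
        # Central tendency
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # Dispersion: sample variance / standard deviation (n - 1 denominator)
        "variance": statistics.variance(data),
        "std_dev": statistics.stdev(data),
    }

summary = describe(values)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```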
3.0 Data Modelling
Sabah Al-Fedaghi (2021). Conceptual Data Modeling: Entity-Relationship Models as Thinging Machines.
Details of literature: This paper studies the representation and organisation of complex data structures and relationships in various domains through the use of ER models.
Critical Thoughts: ER models are essential and powerful for conceptual data modelling because they can be used to analyse data structures and relationships.
Jonathan Cohen and Jorge Gil (2021). An entity-relationship model of the flow of waste and resources in city-regions: Improving knowledge management for the circular economy.
Details of literature: This article discusses the development process of an entity-relationship model (ERM) and the use of the ERM as a tool for understanding and mapping the flows of waste and resources.
Critical Thoughts: An ERD is the visual representation of the entities, attributes, and relationships derived from the ERM.
Noraizan Amran, Haslinda Mohamed and Farrah Diana Saiful Bahry (2018). Developing Human Resource Training Management (HRTM) Conceptual Model Using Entity Relationship Diagram (ERD).
Details of literature: This paper presents the idea of using an ERD to design the human resource training management conceptual model. The authors describe the ERD components and identify their project's main entities and their attributes.
Critical Thoughts: This paper provides a better understanding of ERDs through detailed descriptions of their components and relationships.
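To connect the ER concepts above (entities, attributes, and relationships) to an implementation, the following sketch maps a small ER design onto relational tables using Python's built-in sqlite3 module. The employee and training entities and the many-to-many "attends" relationship are hypothetical, loosely inspired by the HRTM domain of Amran et al. (2018) rather than taken from their model.

```python
import sqlite3

# Hypothetical ER design for illustration only (not Amran et al.'s model):
# entities EMPLOYEE and TRAINING, many-to-many relationship ATTENDS.
SCHEMA = """
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,  -- key attribute
    name        TEXT NOT NULL,        -- descriptive attribute
    department  TEXT                  -- descriptive attribute
);
CREATE TABLE training (
    training_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL
);
-- The relationship becomes a junction table holding both foreign keys
CREATE TABLE attends (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    training_id INTEGER NOT NULL REFERENCES training(training_id),
    PRIMARY KEY (employee_id, training_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO employee VALUES (1, 'Aisha', 'HR')")
conn.execute("INSERT INTO training VALUES (10, 'Data Privacy')")
conn.execute("INSERT INTO attends VALUES (1, 10)")

# Traverse the relationship: which employee attends which training?
rows = conn.execute(
    """
    SELECT e.name, t.title
    FROM attends AS a
    JOIN employee AS e ON e.employee_id = a.employee_id
    JOIN training AS t ON t.training_id = a.training_id
    """
).fetchall()
print(rows)
```

Each entity becomes a table, its attributes become columns, and the many-to-many relationship becomes a junction table whose composite primary key combines the two foreign keys.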
4.0 Machine Learning Model
Jawahar et al. (2019). What does BERT learn about the structure of language?
Summary: Introduces the BERT language model, its technical capabilities, the underlying technology, and its results on language understanding benchmarks.
Critical Ideas: BERT is presented as a viable language model for text classification tasks that is superior to previous models, with remarkable results on NLP benchmarks such as GLUE.
Koroteev (2021). BERT: A Review of Applications in Natural Language Processing and Understanding.
Summary: Presents BERT's flexible customisation features in the form of retraining and fine-tuning. BERT can be retrained on domain-specific text data to serve domain-specific needs, and it can be fine-tuned to produce outputs for tasks beyond text classification. The paper also notes that BERT's contextual capabilities have helped establish it as an industry standard over unidirectional language models.
Critical Ideas: This paper discusses BERT's customisation features, which enable the model to be tuned for a wide variety of NLP tasks. It provides industry-level examples, such as BioBERT, and insight into the model's capabilities.
4.0 Machine Learning Model (continued)
Grießhaber et al. (2020). Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning.
Summary: This study focuses on training the BERT model in a low-resource setting, which is common in practice. With a training dataset as small as 1,000 data points, BERT was able to maintain its performance. Techniques such as the Monte Carlo Dropout strategy, layer freezing, and Bayesian approximations were presented for selecting informative training data and improving the accuracy and performance of fine-tuned BERT models.
Critical Ideas: BERT proves to be a valid model even in lower-resource environments, specifically where training data is scarce. The paper also presents proven metrics for evaluating the trained or tuned model, which we can implement in the development and testing phase.
Devlin et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
Summary: Expands on the fine-tuning ability of the BERT model and how fine-tuning is executed. Fine-tuning BERT is straightforward because of its Transformer architecture, whose self-attention mechanism lets the model capture relationships between different words within a sequence by design.
Critical Ideas: With the built-in Transformer architecture, fine-tuning the model for our specific use case will be in line with the timeline provided for the development of SkillSync. The detailed project timeline can be viewed in the Gantt chart in Chapter 5.
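The self-attention mechanism mentioned for Devlin et al. (2018) lets every word in a sequence draw on every other word. The sketch below shows single-head scaled dot-product self-attention, softmax(QKᵀ / √d_k)·V, in plain Python. It is illustrative only: the 3-token, 2-dimensional embeddings are made up, identity matrices stand in for the query, key, and value projections that BERT learns, and multi-head attention, positional information, and the rest of the encoder are omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows (tokens) of x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Score every token against every token, scaled by sqrt(d_k)
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted mix of all value vectors
    return matmul(weights, v), weights

# Hypothetical 2-dimensional embeddings for a 3-token sequence
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity projections keep the example readable; BERT learns these matrices
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1 and says how strongly that token draws on every token in the sequence; in BERT this happens in every layer and attention head, in both directions.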
(Rathi et al., 2018). Sentiment analysis of tweets using machine learning approach.
Summary: TF-IDF is used to measure how important a term is to a document within a given collection of documents. TF-IDF weights are used for text mining and information retrieval; a term's weight increases with how often it appears in a document and decreases with how many documents in the collection contain it. The frequency of each distinct word in a document is counted, while the inverse document frequency reduces the weight of words that appear frequently across documents. TF-IDF is used to convert the string input into a numerical representation.
Critical Thoughts: It gives a clear explanation of TF-IDF and some examples of it.
(Li et al., 2018). News text classification based on improved Bi-…
Summary: The category keywords are extracted using the TF-IDF model, and the cosine similarity between the category keywords and the text keywords that need to be …
Critical Thoughts: It explains TF-IDF but does not give any examples or …
(Aninditya et al., 2019). Text mining approach using TF-IDF and naive Bayes for classification of exam questions based on cognitive level of Bloom's taxonomy.
Summary: TF-IDF is a technique that combines term frequency (TF) and inverse document frequency (IDF). The goal of the TF-IDF method is to find a representation of each document's value from a training data set, after which a document-term vector is formed. Next, prototype vectors, also known as cluster centroids, are used to determine how similar the document is to each cluster.
Critical Thoughts: It gives a clear explanation of TF-IDF vectorization.
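The TF-IDF weighting summarised above can be sketched in a few lines. This sketch assumes the plain variant tf-idf(t, d) = (count of t in d / number of terms in d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t, together with simple whitespace tokenisation; it is not the exact formulation used by Rathi et al. (2018), Li et al. (2018), or Aninditya et al. (2019). The cosine-similarity function illustrates the kind of keyword comparison Li et al. (2018) describe. The three short documents are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (plain, unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # tf = relative frequency in this document; idf = log(N / df)
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents, made up for illustration
docs = [
    "stock prices rise as market rallies",
    "market falls as stock prices drop",
    "new football season starts this week",
]
vectors = tf_idf(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(round(sim_related, 3), round(sim_unrelated, 3))
```

Library implementations differ in the details; for example, scikit-learn's TfidfVectorizer smooths the IDF term and L2-normalises each document vector by default, so its weights will not match this sketch exactly.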
5.0 Data Visualization
Donckt et al. (2023). Data Point Selection for Line Chart Visualization: Methodological Assessment and Evidence-Based Guidelines.
Details of literature: The paper explores the representation and effectiveness of line chart visualization in analyzing time series data.
Critical Thoughts: Line charts provide a clear representation of continuous data and are ideal for visualizing trends.
Xiong et al. (2022). Visual Arrangements of Bar Charts Influence Comparisons in Viewer Takeaways.
Details of literature: This article studies the use of simple bar charts and compares the affordances of four common bar chart arrangements.
Critical Thoughts: Bar charts are useful for representing categorical data and presenting it clearly and concisely.
Peng et al. (2019). Radar Chart for Estimation Performance Evaluation.
Details of literature: This paper discusses the use of radar charts to visualize and compare estimation results.
Critical Thoughts: Radar charts, also known as spider charts, provide a clear visualization for comparing performance across different categories.
6.0 SDLC / BALC / MLOPS
Zaharia et al. (2018). Accelerating the machine learning lifecycle with MLflow.
Summary: The paper discusses the challenges of the model development lifecycle and proposes a new open-source tool, MLflow, to streamline model development while giving users flexibility when developing ML models.
Critical Thoughts: When choosing an SDLC framework, a flexible option should be adopted, as there will be numerous rounds of trial and error when developing an ML model and the system simultaneously.
Qian et al. (2018). Orchestrating the development lifecycle of machine learning-based IoT applications: A taxonomy and survey.
Summary: The paper discusses the development process of an IoT mobile application with an integrated ML model. A flexible development process must be used, and the ML model must be developed separately because of the large amount of data processed.
Critical Thoughts: The ML model and the system should be developed separately and integrated later to avoid repeating errors within the system during the development process.
Ranawana and Karunananda (2021). An Agile Software Development Life Cycle Model for Machine Learning Application Development.
Summary: The paper discusses the challenges of building software with ML models. The authors propose a new development lifecycle called MLASDLC.
Critical Thoughts: The MLASDLC model by Ranawana and Karunananda is a good example of a hybrid of SDLC and ML development methodologies. The proposed project will use a simplified version of this lifecycle framework.
6.1 RPA
(Axmann & Harmoko, 2020). Robotic process automation: An overview and comparison to other technology in industry 4.0.
Discussion summary: There are three kinds of RPA model: attended, unattended, and hybrid. The hybrid RPA model combines the attended and unattended models and is used to carry out a lengthy procedure split into two stages (the decision stage and the complete automation stage).
Critical thought: Adopting the hybrid RPA model offers flexibility, an improved user experience, and cost optimization.
(Mbiu, 2022). An Assessment of the Adoption of Robotic Process Automation in Kenyan Insurance Companies.
Discussion summary: Hybrid RPA combines both attended and unattended models, making it easier for humans to delegate tasks to machines so that both can focus on their strengths. The robot can handle certain jobs on its own while still assisting workers with manual tasks.
Critical thought: By combining human and robot skills, this partnership increases production.
(Baytar, 2022). Model Proposal for Testing Websites in Multiple Browsers: Case of Selenium Test Tool.
Discussion summary: Selenium WebDriver is needed to control the operations of a web browser, and drivers are required to connect Selenium WebDriver to the target web browsers (Baytar, 2022).
Critical thought: Selenium WebDriver is a potent tool for automating web browser interactions, but it depends on drivers to communicate with different browsers.
(Raharjana et al., 2023). Conversion of User Story Scenarios to Python-Based Selenium Source Code for Automated Testing.
Discussion summary: ChromeDriver is the WebDriver implementation that provides test automation in the Chrome web browser. It is a prerequisite for running Selenium tests in Chrome.
Critical thought: ChromeDriver is essential for Selenium test automation in the Chrome browser, allowing developers to take control of and automate browser processes for effective testing.
(Boni et al., 2021). Automated driver management for Selenium WebDriver.
Discussion summary: Controlling Chrome and Chromium through Selenium WebDriver requires a browser driver, and ChromeDriver is the driver for these browsers.
Critical thought: ChromeDriver is the driver Selenium WebDriver uses to manage and automate Chrome and Chromium browsers.
7.0 Verification & Validation
Geetha & Renuka (2021). Improving the performance of aspect based sentiment analysis using fine-tuned Bert Base Uncased model.
Summary: The paper explores the use of the BERT base uncased model for sentiment analysis of reviews, evaluated with performance metrics such as precision, recall, F-measure, and accuracy.
Critical Thoughts: These metrics are appropriate and commonly used to evaluate the usefulness of a model.
Geetha & Renuka (2019). Research on recommendation systems using deep learning models.
Summary: The paper compares deep learning methods used in recommender systems, evaluated with performance metrics such as RMSE, MSE, MAE, precision, recall, F1, and ROC.
Critical Thoughts: Error metrics such as RMSE and MSE can be used to evaluate how closely predicted values (for example, ratings) match the validation data, while metrics such as precision, recall, and F1 can be used to evaluate the model's classification or recommendation quality.
Yu, Su & Luo (2019). Improving BERT-based text classification with auxiliary sentence and domain knowledge.
Summary: This paper addresses the limitations of BERT in multi-class classification; the performance metrics used are accuracy, F1, and precision.
Critical Thoughts: Metrics such as accuracy, F1, and precision are commonly used to evaluate model performance.
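The metrics named in these papers can be computed directly from predictions. This is a minimal sketch, assuming a binary task with label 1 as the positive class for the classification metrics and numeric ratings for the error metrics; the labels and ratings are made up, and none of this reproduces the evaluation code of the cited papers.

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """MSE, RMSE and MAE between true and predicted numeric values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse),
            "mae": sum(abs(e) for e in errors) / len(errors)}

# Hypothetical labels and ratings, made up for illustration
labels = [1, 0, 1, 1, 0, 1]
predictions = [1, 0, 0, 1, 1, 1]
ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.5, 3.0, 4.0, 2.5]

clf = classification_metrics(labels, predictions)
reg = regression_metrics(ratings, predicted_ratings)
print(clf)
print(reg)
```

For multi-class tasks, per-class precision, recall, and F1 are usually averaged (for example, macro or weighted averaging); scikit-learn's sklearn.metrics module provides tested implementations of these metrics.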
8.0 Expected Contribution
(Liermann & Stegmann, 2021). Faster Turnaround Time.
Critical Thoughts: Robotic process automation (RPA) is not meant to replace workers but to assist them in daily tasks and free them from mundane, unimaginative work. The goals are restoring process visibility, streamlining and speeding up execution, and improving process quality, not cutting down on employees. Managing change effectively is essential for the success of all partners in any transition.
(El Khatib et al., 2023). Streamlined Workflow and Collaboration.
Critical Thoughts: The research shows that expanding the use of RPA in current processes would not only increase monitoring and control capabilities but also motivate the creation of cutting-edge new platforms.
What You Need to Do
INDIVIDUAL TASKS
Individual Task
Slide 1
• Project Title
• Group Members
• Highlight your Task
Slide 2
• Distribution of Tasks among your team members
Slide 3/4/5
• Summary Table
• Significant findings
• 3–5 Journals / References
Summary
• Summarize your Points
• What will you adopt?
Example
INDIVIDUAL UPDATES
Group 1A
TITLE: TIME SHEET MANAGEMENT FOR MANPOWER CONSULTANTS – INTEGRATED PAYMENT AND BILLING PROCESS
Domain: Time Sheet Management (Payment and Billing Process)
Data Analysis
Data Modelling
Expected Contribution
(Liermann & Stegmann, 2021). Faster Turnaround Time.
Tabulated Summary: Your Topic (continued)
Authors Title Critical Thoughts
Summary