Machine learning with unstructured data

The document is an in-house project report titled 'Machine Learning With Unstructured Data' submitted by Rachit Goel for his Bachelor of Technology degree at Amity University. It explores the challenges and methodologies of applying machine learning techniques to unstructured data, including text, images, audio, and video, and discusses various applications and preprocessing techniques. The report emphasizes the significance of feature extraction and engineering in effectively utilizing unstructured data for machine learning applications.
In-House NTCC Report

On

MACHINE LEARNING WITH UNSTRUCTURED DATA


Submitted in partial fulfilment of the requirements for the award of the degree of
Bachelor of Technology
in

Computer Science & Engineering


By
Rachit Goel
Under the guidance of
Dr. Rashmi Bhel
(Associate Professor)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


AMITY SCHOOL OF ENGINEERING AND TECHNOLOGY
AMITY UNIVERSITY UTTAR PRADESH, NOIDA

2023
DECLARATION

I, Rachit Goel of B.Tech. (CSE), hereby declare that the in-house project report titled
“Machine Learning With Unstructured Data”, which is submitted by me to the Department of
Computer Science & Engineering, Amity School of Engineering and Technology, Amity
University Uttar Pradesh, Noida, in partial fulfilment of the requirements for the award of
the degree of Bachelor of Technology in Computer Science & Engineering, has not
previously formed the basis for the award of any degree, diploma or other similar title or
recognition.

Rachit Goel
A2305222645

ACKNOWLEDGEMENT

Inspiration and motivation have always played a key role in the success of any venture, and
the right guidance, assistance and encouragement of other people have played an essential part.

I am grateful to my faculty guide Dr. Rashmi Bhel, Associate Professor, Amity School of
Engineering and Technology (ASET), for her able guidance and support. Her guidance helped
me in every aspect of writing this report. I could not have imagined having a better advisor
and mentor for my report.
Lastly, I would like to acknowledge my family, whose support enabled me to complete this
report on time. They have helped and supported me throughout. It
would have been unimaginable without their support.

Rachit Goel
A2305222645

TABLE OF CONTENTS

S.no Topic

1. DECLARATION
2. CERTIFICATE
3. ACKNOWLEDGEMENT
4. ABSTRACT
5. INTRODUCTION
6. REVIEW OF LITERATURE
7. METHODOLOGY
8. DISCUSSION
9. CONCLUSION
10. REFERENCES

ABSTRACT

Machine learning methods have transformed data analysis by revealing insights and patterns
that were previously inaccessible. Machine learning has historically focused on structured
data with well-defined formats and organisation; unstructured data presents new
difficulties and opportunities. Text documents, images, audio recordings, and
video files all fall under the category of "unstructured data," which describes information that
does not adhere to a predefined format. This study investigates the use of machine learning
techniques on unstructured data, focusing on the specific features and complexity of this
kind of data. It covers methods for preprocessing, transforming, and representing
unstructured data for machine learning applications. The effect of feature extraction and
engineering techniques tailored to unstructured data is also investigated. The study then
explores the ways in which machine learning may be applied to unstructured data.
Methods for analysing textual data using natural language processing (NLP) are studied,
including sentiment analysis, text classification, and information extraction.
Computer vision problems such as image recognition, object detection, and image synthesis
are described, along with the corresponding methods. Speech recognition,
speaker identification, and video summarisation are among the audio and video
processing methods investigated.

Keywords: Machine learning, Unstructured data, Data analysis, Structured data, Text
documents, Images.

CHAPTER-1

INTRODUCTION
Machine learning has driven remarkable developments and transformative applications in
several fields in recent years. Structured data, which is clearly defined and
already organised, has been the primary focus of machine learning so far. Unstructured data
has proliferated with the exponential growth of digital information, presenting new
difficulties and possibilities for machine learning methods.

Because it lacks a predefined structure, unstructured data is harder to analyse
and to extract insight from. Many different types of content fall under this category,
including text, images, audio, and video. Unlike structured data organised in rows and
columns, unstructured data is challenging to analyse because of its variety, volume, and
heterogeneity.

The growth of unstructured data has changed machine learning considerably, providing
new opportunities to process and benefit from previously inaccessible
material. In fields as varied as healthcare, finance, marketing, and social media analysis,
researchers and practitioners can benefit from unstructured data
by discovering previously unseen patterns, extracting relevant features, and making
informed decisions.

Machine learning on unstructured data presents a unique set of obstacles, not the least of
which is preparing the data for analysis through preprocessing and transformation. Natural
language processing (NLP) methods, for instance, are needed to turn textual input into a
form that machine learning algorithms can work with. Tokenization, stemming, and
stop-word removal are all steps towards a numerical representation of
the text that captures its semantic meaning.

Preprocessing methods are also required for images, audio recordings, and video
data. For applications like image recognition and object detection, computer
vision algorithms extract useful properties such as edges, shapes, and colours from visual data.
Speech recognition methods transcribe audio data into text, whereas audio and video
processing methods support tasks like speaker identification and video summarisation.

Once the unstructured data has been preprocessed and converted into a suitable form, feature
extraction and engineering become critical for representing the data accurately for
machine learning. These operations involve identifying and selecting the aspects of the data
that are useful for learning. Word embeddings (such as Word2Vec and GloVe) and TF-IDF (Term
Frequency-Inverse Document Frequency) are frequently used for text
data. For images, convolutional neural networks (CNNs) extract features automatically at
several levels of abstraction. Similarly, deep learning architectures may be
applied to audio and video data to derive useful features for further study.
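
As an illustration of the TF-IDF weighting mentioned above, the following is a minimal sketch in plain Python. It assumes whitespace tokenization, term frequency defined as count divided by document length, and inverse document frequency defined as ln(N / df); libraries often use smoothed variants, and the sample corpus is made up for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up corpus for illustration only.
corpus = [
    "the patient reports chest pain",
    "the patient reports mild headache",
    "the scan shows no fracture",
]
scores = tf_idf(corpus)
```

A term that occurs in every document (here "the") receives weight zero, while a term confined to one document (here "chest") is weighted more heavily than one shared by two documents (here "patient").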

1.2 BACKGROUND OF THE STUDY

The explosion of digital media has produced an unprecedented volume of unstructured
information. Text documents, images, audio recordings, and video files are all examples of
unstructured data that pose distinctive problems and possibilities for machine learning
systems. Historically, machine learning has focused much of its attention on "structured
data," or data with clear definitions and organisational structure. The exponential growth
of unstructured data, however, calls for specialised methods for efficiently
extracting knowledge and insights from such large and varied stores of information.

Many fields and industries deal with large amounts of unstructured data. Electronic
health records, diagnostic data, and clinical notes are rich sources of
information in the healthcare sector that may influence patient outcomes and care. Insights
into consumer behaviour, sentiment, and emerging trends may be gleaned from the vast
amounts of unstructured data produced by social media platforms, such as posts, comments,
and user profiles. Automakers can now build autonomous driving systems and predictive
maintenance algorithms thanks to the massive volumes of unstructured data generated by
sensors, cameras, and telematics.

The increasing prevalence of unstructured data presents new difficulties for machine learning
systems. Unstructured data, in contrast to structured data, does not have a predetermined
organisation and may be represented in a variety of ways. Because of this complexity, the data
has to be preprocessed and transformed before it can be analysed. Unstructured
data is also high-dimensional and heterogeneous, so specialised feature extraction and
engineering methods are needed to capture the relevant information across data types.

Natural language processing (NLP) has come a long way in tackling the difficulties of
dealing with unstructured text. Tokenization, semantic analysis, and
sentiment classification are just a few of the methods that have paved the way for modern
uses of NLP, such as document classification, sentiment analysis, and information retrieval.
In computer vision tasks like image recognition, object detection, and image
generation, deep learning architectures such as convolutional neural networks (CNNs) have
shown strong performance. Speech recognition, speaker identification, and video
summarisation are some of the applications of recent developments in audio and video
processing.

1.3 PREPROCESSING TECHNIQUES FOR UNSTRUCTURED DATA

Unstructured data, which includes text documents, images, audio recordings,
and video files, has proliferated in the big data era. However, machine learning algorithms
normally operate on structured input, so the raw form of unstructured data
presents hurdles for them. Preprocessing methods are essential for preparing such data for
analysis or machine learning. This section describes the main preprocessing
techniques for each type of unstructured data.

When working with unstructured textual data, text preprocessing is essential. The process
converts unstructured material into a format that can be processed by
machine learning programmes. Tokenization is a typical method that splits the text into
individual tokens, such as words, for further analysis. Stemming, which reduces words to
their root form by stripping affixes, is another useful tool for dealing with inflection.
Stop-word removal discards very common words like "and," "the," and "is" that contribute
little to the meaning of the text. Preprocessing is further improved by normalisation
methods such as consistent casing and lemmatization, which reduces words to their
dictionary base form (lemma).
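
The text preprocessing steps described above can be sketched as follows. This is a toy illustration rather than a production pipeline: the stop-word list is deliberately tiny, and `crude_stem` is a hypothetical suffix-stripping helper standing in for a real stemmer such as the Porter algorithm.

```python
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}

# Suffixes stripped by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text (case normalisation) and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


tokens = preprocess("The cameras are recording and the sensors reported errors.")
# tokens == ["camera", "record", "sensor", "report", "error"]
```

Lemmatization is not shown because it needs a dictionary of word forms; a suffix stripper like the one above can produce non-words for irregular forms, which is the main practical difference between the two.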

Image preprocessing methods prepare raw image data for
analysis. To maintain consistency across the dataset, it is usual practice to resize
images to the same dimensions. Noise-reduction methods can improve the accuracy of
subsequent analysis. By adjusting the colour distribution across images,
colour normalisation reduces the impact of differences in lighting or camera settings.
Relevant features may also be extracted from images using methods like edge
detection and image segmentation.
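
Three of the image operations above (resizing, intensity normalisation, and a very simple form of segmentation) can be sketched on a greyscale image stored as a list of rows. The choices here are assumptions made for illustration: nearest-neighbour sampling for resizing, min-max scaling for normalisation, and a fixed global threshold for segmentation. Real pipelines typically use an image library and more robust methods.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2-D greyscale image by nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def normalise(image):
    """Rescale pixel values linearly to the range [0, 1] (min-max scaling)."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on constant images
    return [[(p - lo) / span for p in row] for row in image]


def threshold_segment(image, threshold):
    """Binary segmentation: 1 where the pixel exceeds the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Made-up 4x4 image: dark left half, bright right half.
image = [
    [10, 10, 200, 210],
    [12, 11, 205, 220],
    [10, 13, 198, 215],
    [11, 10, 202, 210],
]
small = resize_nearest(image, 2, 2)      # [[10, 200], [10, 198]]
scaled = normalise(image)                # values now lie in [0, 1]
mask = threshold_segment(scaled, 0.5)    # bright region marked with 1
```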

The goal of audio preprocessing is to transform raw audio files into a format that can
be processed by machine learning programmes. Audio segmentation
cuts long audio files into manageable chunks for analysis and feature
extraction. To improve the intelligibility of audio signals, noise reduction methods
suppress background noise. To represent sound numerically in a form
usable for analysis and modelling, feature extraction techniques such as Mel-frequency
cepstral coefficients (MFCCs) are used.
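
A minimal sketch of audio segmentation is given below, assuming the signal is a list of samples and that "active" regions can be found by comparing per-frame root-mean-square energy with a fixed threshold. The full MFCC computation (windowing, Fourier transform, mel filterbank, logarithm, discrete cosine transform) is not reproduced; only the commonly used Hz-to-mel conversion is included. The sampling rate, frame size, and threshold are illustrative assumptions.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into fixed-length frames, `hop` samples apart."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def active_frames(samples, frame_len, hop, threshold):
    """Indices of frames whose RMS energy exceeds the threshold."""
    frames = frame_signal(samples, frame_len, hop)
    return [i for i, f in enumerate(frames) if rms(f) > threshold]


def hz_to_mel(freq_hz):
    """A widely used mel-scale formula, applied when building MFCC filterbanks."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# Synthetic signal: silence, a 440 Hz tone, then silence (8 kHz sampling assumed).
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
signal = [0.0] * 800 + tone + [0.0] * 800
voiced = active_frames(signal, frame_len=400, hop=400, threshold=0.1)
```

With these settings the signal is cut into six non-overlapping frames, and only the two frames containing the tone are reported as active.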

Video preprocessing methods prepare video data for analysis. To reduce the
amount of data while retaining the essential visual information, video frame extraction
selects key frames from videos. Keyframe selection is essential for effective summarisation
and analysis of video footage. Video compression methods reduce the file size of
videos without a discernible drop in visual quality, allowing for more
efficient storage and processing. Motion in video may be captured using optical flow
methods, allowing for the study of changing patterns and actions.
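
One simple way to select key frames, sketched here under stated assumptions, is to keep a frame whenever it differs enough from the last frame that was kept. Frames are assumed to be equally sized greyscale images stored as lists of rows; the difference measure (mean absolute pixel difference), the threshold, and the `solid` helper are hypothetical choices for this illustration and not a method prescribed by the report.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = sum(
        abs(a - b)
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    )
    return total / (len(frame_a) * len(frame_a[0]))


def select_keyframes(frames, threshold):
    """Keep frame 0, then every frame that differs enough from the last kept one."""
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[kept[-1]], frames[i]) > threshold:
            kept.append(i)
    return kept


def solid(value, h=2, w=2):
    """Hypothetical helper: a frame filled with a single grey level."""
    return [[value] * w for _ in range(h)]


# Three "scenes": grey level near 10, near 200, then 90.
video = [solid(10), solid(12), solid(11), solid(200), solid(198), solid(90)]
keyframes = select_keyframes(video, threshold=30)
# keyframes == [0, 3, 5]
```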

1.4 APPLICATIONS OF MACHINE LEARNING WITH UNSTRUCTURED DATA

Many fields and industries have been transformed by the application of machine
learning to unstructured data, which has allowed the extraction of previously inaccessible
insights, the automation of previously manual processes, and data-driven
decision-making. Text documents, images, audio recordings, and video files are all examples
of unstructured data that may be used in a wide variety of contexts. Some of the most
important uses of machine learning on unstructured data are discussed here.

Uses of NLP (Natural Language Processing):

Sentiment Analysis: By analysing textual data such as customer reviews and social media
posts to assess the sentiment expressed, organisations can gauge public opinion, monitor
their brand's reputation, and improve customer satisfaction.

Text Classification: Machine learning algorithms can assign text to categories
automatically, supporting tasks such as spam filtering, content categorisation, and news
classification.
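
Sentiment analysis and text classification can both be framed as assigning a label to a piece of text. The sketch below uses multinomial naive Bayes with add-one smoothing, chosen here only because it is compact; it is an assumed example technique, not one named in the report, and the four training sentences are made up.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        n_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            # Log prior plus the summed log likelihood of each token.
            score = math.log(doc_count / n_docs)
            score += sum(math.log((counts[t] + 1) / total) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data for illustration only.
train_texts = [
    "great product fast delivery",
    "love it works great",
    "terrible quality broke fast",
    "awful service never again",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesText().fit(train_texts, train_labels)

print(model.predict("works great"))        # positive
print(model.predict("terrible service"))   # negative
```

The same class could be trained with labels such as "spam" and "not spam" or with news categories; only the training data changes.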

Information Extraction: Organisations can automate data entry, construct knowledge
graphs, and extract useful facts from textual material with techniques like Named Entity
Recognition (NER) and relation extraction.

Question Answering: Machine learning models can be trained to understand and answer
questions posed in natural language, which makes possible virtual assistants and chatbots
that respond to user enquiries immediately.

Uses for Computer Vision:

Image Classification: Machine learning algorithms, in particular convolutional neural
networks (CNNs), can reliably categorise and recognise objects in images, enabling
applications such as face identification, object recognition, and visual search.
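
To make the role of convolution in a CNN concrete, the sketch below applies one convolution, a ReLU activation, and 2x2 max pooling to a small greyscale image. It is a forward pass through a single hand-written filter only: the kernel values are picked by hand to respond to dark-to-bright vertical edges, whereas a trained CNN learns many such kernels from data.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling; an odd trailing row or column is dropped."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# Made-up 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # hand-picked, not learnt

response = conv2d(image, kernel)            # each row is [0, 2, 0, 0]
features = max_pool_2x2(relu(response))     # [[2, 0], [2, 0]]
```

The filter responds only where the edge is, and pooling keeps that response while halving the resolution, which is the basic mechanism behind feature extraction at several levels of abstraction.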

Object Detection: Machine learning algorithms can locate and identify objects within
images, supporting applications such as autonomous driving, surveillance, and inventory
management.

Image Generation: Generative adversarial networks (GANs) and other deep learning
architectures can produce new, realistic images based on learnt patterns, enabling
applications in art, design, and virtual reality.

Uses for Sound and Voice Processing:

Speech Recognition: Machine learning algorithms such as Hidden Markov
Models (HMMs) and deep learning models such as Recurrent Neural Networks
(RNNs) convert speech to text, powering voice assistants, transcription services, and
voice-controlled systems.

Speaker Identification: Machine learning algorithms can determine and verify the
identity of a person from their speech patterns, supporting applications such as speaker
authentication, security systems, and forensic investigations.

Emotion Recognition: Machine learning models can infer a speaker's
emotional state by analysing acoustic features and speech patterns, which may benefit
customer sentiment analysis, contact centre analytics, and mental health monitoring.

Uses of Multiple Media:

Multimodal Sentiment Analysis: By combining data from several modalities (such as text,
images, audio, and video), machine learning models can perform richer sentiment analysis,
yielding a more complete picture of the user's feelings and opinions.

Visual-Semantic Alignment: Machine learning models can align visual material (images or
videos) with the accompanying textual descriptions, enabling
applications such as image captioning, video summarisation, and content-based image
retrieval.
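
Visual-semantic alignment is commonly implemented by mapping images and text into a shared vector space and comparing them there. The sketch below assumes such embeddings already exist and shows only the retrieval step, using cosine similarity. The three-dimensional vectors are made up for illustration; real systems learn much higher-dimensional embeddings from paired image-text data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_caption(image_vec, caption_vecs):
    """Return the caption whose embedding is most similar to the image embedding."""
    return max(
        caption_vecs,
        key=lambda caption: cosine_similarity(image_vec, caption_vecs[caption]),
    )


# Made-up embeddings in a shared 3-D space (illustration only).
caption_embeddings = {
    "a dog running on grass": [0.9, 0.1, 0.2],
    "a plate of pasta": [0.1, 0.8, 0.3],
    "a city skyline at night": [0.2, 0.2, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]
match = best_caption(image_embedding, caption_embeddings)
# match == "a dog running on grass"
```

Swapping the roles of the two inputs (one caption embedding against many image embeddings) gives text-to-image retrieval with the same similarity function.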

Clinical and Biomedical Uses:

Medical Image Analysis: Machine learning algorithms can analyse medical images such as
X-rays, MRIs, and CT scans to help doctors spot anomalies, make more accurate diagnoses,
and predict how a disease will progress.

Clinical Natural Language Processing: Machine learning algorithms can analyse clinical
notes and reports to extract structured data, which may then be used to improve clinical
decision support systems and automate medical billing and coding.

Drug Discovery: By applying machine learning algorithms to large quantities of
unstructured biomedical literature, researchers can better predict the efficacy and safety
of novel compounds, find new therapeutic targets, and speed up the discovery process.

1.5 LIMITATIONS IN MACHINE LEARNING WITH UNSTRUCTURED DATA

CHAPTER-2

REVIEW OF LITERATURE

Many applications in science and engineering involve data fields sampled at
irregularly spaced points. Convolutional neural networks (CNNs) have been applied
effectively to regular grids, where they benefit from weight sharing and invariances.
Trask et al. (2019) generalise CNNs to data on unstructured point clouds using methods
based on Generalised Moving Least Squares (GMLS). GMLS is a non-parametric method for
estimating linear bounded functionals from scattered data and has recently been used to
solve partial differential equations. Parameterising the GMLS estimator yields learning
methods for operators with unstructured stencils. The required
computations in GMLS-Nets are local and easily parallelisable, and the estimator is backed
by a rigorous approximation theory. The authors show how this framework may be used on
unstructured physical data sets to identify associated differential operators and to
regress quantities of interest. Their findings point to these architectures as a promising
foundation for data-driven model development in scientific machine learning.

Big data is concerned with processing data to obtain useful, value-added information.
With the advent of the cloud computing era, the reach of big data has expanded steadily,
attracting interest from all walks of life. Big data analysis is increasingly
used in contemporary society for tasks such as forecasting, risk assessment,
and understanding the current state of the market. As sectors of society develop
rapidly and the exchange of information over the Internet grows,
big data is being used in more and more contexts. Machine
learning is the field of research that studies how computers can learn from data and
experience.

Storm, Baylis and Heckelei (2020) discuss machine learning (ML) methods from the viewpoint
of an applied economist. They first give an overview of the most important ML approaches by
relating them to standard econometric practice. They then highlight current gaps in the
econometric and simulation model toolkit of applied economics and investigate the
opportunities offered by ML. They examine cases such as inflexible functional forms,
unstructured data sources, and large numbers of explanatory variables in both prediction
and causal analysis, and highlight the challenges posed by complex simulation models. They
conclude that economists have a crucial role in addressing the shortcomings of ML when it
is used for quantitative economic research.

Computational fluid dynamics (CFD) faces a potential development bottleneck due to
difficulties with mesh generation and adaptation, so research on automatic and
intelligent mesh generation remains necessary. Machine learning, a branch of artificial
intelligence, has been applied effectively to various domains, including fluid dynamics,
and has advanced these fields alongside rapid progress in
computing power and big data technologies. Nianhua et al. (2021) give a short
overview of the use of machine learning techniques in unstructured mesh generation
for CFD and analyse the primary challenges of machine-learning-based
mesh generation. They design a sample data format and extract unstructured mesh sample
data automatically. Combining the advancing front technique (AFT) with an artificial
neural network, they develop a machine-learning-based approach to generating
two-dimensional triangular meshes.

CHAPTER-3

METHODOLOGY

Aim of Study

The study's goal is to examine the potential of machine learning methods when applied to
unstructured data, such as text documents, images, audio recordings, and
video files, in order to extract useful insights and patterns.

Objectives of the Study

• To analyse how different preprocessing methods for unstructured data affect data quality
and suitability for machine learning applications.

• To investigate and assess the performance of feature extraction
and engineering techniques designed for unstructured data.

• To understand how machine learning may be applied to unstructured data
in fields like NLP, computer vision, and audio processing.

• To recognise the difficulties and limitations of machine learning on unstructured data,
such as privacy issues, biases, and scalability problems.

• To design and build a framework for testing and evaluating
machine learning methods on real-world unstructured data.

CHAPTER-4

DISCUSSION
This chapter analyses and interprets the findings of the study on machine learning with
unstructured data. Its purpose is to address the study's
research questions and shed light on the performance of machine learning methods applied
to various forms of unstructured data. The most salient points are discussed below:

• Preprocessing Methods and Data Quality:

First, the discussion considers how different preprocessing methods affect data quality and
its suitability for machine learning applications. Methods for improving the quality and
usability of unstructured data, such as text preprocessing, image resizing, noise reduction,
and audio segmentation, are examined.

The findings indicate that appropriate preprocessing considerably improves data
quality by reducing noise, standardising formats, and making unstructured
data more amenable to machine learning.

• Feature Extraction and Engineering:

The effectiveness of feature extraction and engineering techniques designed for
unstructured data is explored, including term frequency-inverse document frequency
(TF-IDF), word embeddings, convolutional neural networks (CNNs), and other deep learning
architectures.

The findings suggest that feature extraction methods designed for unstructured data can
improve machine learning performance. These methods represent the underlying patterns in
the data more accurately and extract useful features that improve the
discriminative ability of models.

• Applications of Machine Learning to Unstructured Data:

The potential of machine learning in fields such as natural language processing,
computer vision, audio processing, and multimodal analysis is discussed, with a focus
on the effectiveness of machine learning models in tasks including sentiment analysis,
image recognition, speech recognition, and visual-semantic alignment.

The findings demonstrate the usefulness of machine learning techniques for gaining
insight and making predictions from previously unanalysed data. They show the
value of these methods in applications including multimodal sentiment analysis, medical
imaging, voice-controlled devices, and customer satisfaction surveys.

• Limitations and Challenges:

The limitations and difficulties of applying machine learning to unstructured data are
discussed. Data quality, scalability, privacy concerns, bias, interpretability, and a lack
of standard formats are among the topics covered.

According to the findings, addressing these problems is crucial for using machine learning
on unstructured data in a responsible and trustworthy manner. The discussion highlights the
need for privacy-preserving measures, bias mitigation strategies,
interpretability methods, and scalable infrastructure.

• Directions for Future Study:

Finally, several directions for further research and development are suggested for applying
machine learning to unstructured data. These include further work in
preprocessing, feature extraction, algorithms, interpretability, and fairness-aware
models. Data quality assessment, automated preprocessing methods,
explainable machine learning models for unstructured data, and standardisation
efforts that promote interoperability and comparability of results across domains and
sources are all areas that would benefit from additional study.

CHAPTER-5

CONCLUSION

In conclusion, the study of machine learning with unstructured data has illuminated the
opportunities and obstacles associated with applying machine learning methods to various
forms of unstructured data, such as text documents, images, audio recordings, and
video files. The purpose of this research was to examine how well preprocessing approaches,
feature extraction strategies, and machine learning models perform when applied to
unstructured data for analysis and insight extraction.

The study's findings stress the value
of preprocessing methods for cleaning and organising unstructured material before analysis.
Techniques including text preprocessing, image resizing, noise reduction, and audio
segmentation make unstructured data more useful for machine learning. The research also
indicates that unstructured data benefits from feature extraction and engineering
techniques: machine learning models achieve better pattern recognition and prediction with
feature extraction methods such as term frequency-inverse
document frequency (TF-IDF), word embeddings, convolutional neural networks
(CNNs), and other deep learning architectures.

In addition, the research examines the ways
in which machine learning may be applied to unstructured data in a variety of settings,
including sentiment analysis, image recognition, speech recognition, and visual-semantic
alignment. The findings show how machine learning approaches may be used to
automate processes and extract insights in areas such as multimodal sentiment analysis,
voice-controlled devices, medical imaging, and customer satisfaction surveys.

However, the
research also recognises the difficulties and limitations of applying machine learning to
unstructured data. Significant obstacles include poor data quality, limited scalability,
privacy concerns, bias, limited interpretability, and a lack of standardised formats.
Reliable and ethical application of machine learning methods to unstructured data depends on
overcoming these obstacles.

REFERENCES

Trask, N., Patel, R. G., Gross, B. J., & Atzberger, P. J. (2019). GMLS-Nets: A framework for
learning from unstructured data. arXiv preprint arXiv:1909.05371.

Nianhua, W., Peng, L., Xinghua, C., & Laiping, Z. (2021). Preliminary investigation on
unstructured mesh generation technique based on advancing front method and machine
learning methods. 力学学报, 53(3), 740-751.

Storm, H., Baylis, K., & Heckelei, T. (2020). Machine learning in agricultural and applied
economics. European Review of Agricultural Economics, 47(3), 849-892.

Hou, R., Kong, Y., Cai, B., & Liu, H. (2020). Unstructured big data analysis algorithm and
simulation of Internet of Things based on machine learning. Neural Computing and
Applications, 32, 5399-5407.

Zhang, D., Yin, C., Zeng, J., Yuan, X., & Zhang, P. (2020). Combining structured and
unstructured data for predictive models: a deep learning approach. BMC medical informatics
and decision making, 20(1), 1-11.

Wróblewska, A., Stanislawek, T., Prus-Zajaczkowski, B., & Garncarek, L. (2018, September).
Robotic Process Automation of Unstructured Data with Machine Learning. In FedCSIS
(Position Papers) (pp. 9-16).

Feldman, R. (1999, August). Mining unstructured data. In Tutorial notes of the fifth ACM
SIGKDD international conference on Knowledge discovery and data mining (pp. 182-236).

Barker, J. (2020). Machine learning in M4: What makes a good unstructured model?
International Journal of Forecasting, 36(1), 150-155.

Jiang, S., Nocera, A., Tatar, C., Yoder, M. M., Chao, J., Wiedemann, K., ... & Rosé, C. P.
(2022). An empirical analysis of high school students' practices of modelling with
unstructured data. British Journal of Educational Technology, 53(5), 1114-1133.

Wang, Z., Shah, A. D., Tate, A. R., Denaxas, S., Shawe-Taylor, J., & Hemingway, H. (2012).
Extracting diagnoses and investigation results from unstructured text in electronic health
records by semi-supervised machine learning. PLoS One, 7(1), e30412.

Ma, L., & Sun, B. (2020). Machine learning and AI in marketing–Connecting computing
power to human insights. International Journal of Research in Marketing, 37(3), 481-504.
