0% found this document useful (0 votes)
10 views

Soranson Python-Machine-Learning RuLit Me 683600

Uploaded by

maxqa007
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views

Soranson Python-Machine-Learning RuLit Me 683600

Uploaded by

maxqa007
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 99

Python Machine Learning

How to Learn Machine Learning with


Python, The Complete Guide to
Understand Python Machine Learning
for Beginners and Artificial Intelligence
Table of Contents
Introduction
Chapter 1: Introduction (A Small History of Machine Learning)
History of Machine Learning
Key Machine Learning Terms and Their Definitions
Chapter 2: The Concept of Machine Learning
Definition and Application of Machine Learning
Key Elements of Machine Learning
Types of Artificial Intelligence Learning
Chapter 3: Mathematical Notation, Basic Terminology, and Building
Machine Learning Systems
Mathematical Notation for Machine Learning
Terminologies Used For Machine Learning
Road Map to Building Your Machine Learning Systems
Chapter 4: Using Python for Machine Learning
Variables
Essential Operator
Functions
Conditional Statements
Loop
Chapter 5: Artificial Neural Networks
Introduction to Artificial Neural Networks
Types of Artificial Neural Networks
Artificial Neural Network Layers
Advantages and Disadvantages of Neural Networks
Chapter 6: Machine Learning Classification
What Is Machine Learning Classification?
Types Of Classifiers In Python Machine Learning
Machine learning classification models
Metrics for evaluating machine learning classification models
Chapter 7: Machine Learning Training Model
Simple Machine Training Model in Python
Simple ML Python Model using Linear Regression
Chapter 8: Developing a Machine Learning Model with Python
Installing the Python and SciPy Platforms
Loading the Dataset
Summarizing the Dataset
Visualizing the Dataset
Evaluating some Algorithms
Making some Predictions
Chapter 9: Training Simple Machine Learning Algorithms for Classification
Linear Regression
Logistic Regression
Decision Tree
SVM
Naïve Bayes
KNN
K-Means Clustering
Random Forest
Dimensionality Reduction Algorithms
Gradient Boosting Algorithms
Chapter 10: Building Good Training Sets
How to Build the Data
Data Selection
Data Preprocessing
Data Conversion
Conclusion
Introduction
Congratulations on purchasing Python Machine Learning: How to Learn
Machine Learning with Python, The Complete Guide to Understand Python
Machine Learning for Beginners and Artificial Intelligence and thank you for
doing so. The following chapters will discuss everything a beginner would
want to know about Machine learning, Artificial intelligence, and Python.

The first chapter is an introduction to Machine Learning and a history of


where it all began back in the 1940s to where it is at today. The chapter also
covers the terms popular in Machine learning and artificial intelligence
circles and their definitions. This is so that a beginner will understand the
language in the book without much struggle.
The second chapter is about the concept of machine learning. This chapter
offers an in-depth explanation of how machines gain the ability to think for
themselves the same way human beings do and the many ways people apply
machine learning in various fields. It continues to explain the key elements of
machine learning and gives a description of the types of Artificial
Intelligence learning available today.
The third chapter is about the mathematical notation for machine learning,
where the reader will understand the relationship between mathematical
nomenclatures and machine learning. The chapter also explains the
terminologies common in machine learning, and it concludes with a roadmap
for machine learning exploration.
The fourth chapter is an introduction to using Python for machine learning,
and it explains the basics an individual would need to understand about this
excellent coding language. The chapter explains the various stages involved
in machine learning using Python and it contains real-life explanations of the
integral features and functions making up this language.
The fifth chapter is an explanation of Artificial Neural Networks in machine
learning. This chapter goes into detail to show how the human brain is the
main inspiration for machine learning and how with time machines will have
the ability to reason the same way human beings do. The chapter explains the
meaning of neural networks, the classifiers in Python machine learning, the
machine learning models, and the metrics for evaluating machine-learning
models.
The seventh chapter is about Machine Learning training models. It covers the
process of training machines in Python and goes into detail on the use of
linear regression in training. The eighth chapter is about developing machine-
learning models with Python, all the way from installing this coding
language, loading the dataset and summarizing them, evaluating algorithms
and making predictions.
The ninth chapter is about training machine learning algorithms for
classification, and it goes deep into explaining all the steps involved and
processes implemented. Some of the areas covered in this chapter include
Linear Regression, Logistic regression, decision tree, random forest, and
dimensionality reduction algorithms among others. The final chapter is about
building good training sets for machine learning. The chapter explains how a
beginner can build data, select the data, process it, and then convert it.
There are plenty of books on this subject on the market, thanks again for
choosing this one! Every effort was made to ensure it is full of as much
useful information as possible, please enjoy it!
Chapter 1: Introduction (A Small History of
Machine Learning)
History of Machine Learning
Machine learning is the platform from which we develop neural networks.
Currently, there are many applications of the power and capacity of neural
networks in everyday life. Humans need these artificial intelligence models to
assist in technical time demanding tasks. These are tasks, which would
otherwise be prone to errors due to our biological limitations.
The machines need us as well to continue their process of learning and to
gain more knowledge as technological intelligence improves. When we
observe how the improvement and developments do go hand in hand, we can
only imagine what is in store for the future of humankind. All these leaps and
bounds we are currently undergoing had to start somewhere. That somewhere
and some time was in the past, not far past but within the last century.
Given the progress we have made so far, it would be hard to bet against
achieving exponentially massive leaps in technology in our very near future.
However, before that fantastical future gets here, let us all look back to where
it all started. Let us examine how all these machine learning and artificial
intelligence concepts came into existence. Here is the roughly accurate
timeline of events:
The 40s
There were two leading experts in the field of math (W. Pitts) and
neurophysiology (W. McCulloch). They first had the idea of combining their
areas of expertise. Both of them realized than human brains are the center of
decision making for humans. Neurons govern the minds.
Neurons communicate by electrical neurotransmission. Humans already
knew how electricity works, but what if the operation of neurons and
electrical circuits had a relationship. The idea of combining the terms neural
and networks was born then, in 1943.
The 50s
Inventions were borne out of world war two and dominated this particular
decade. Many ideas that spanned this decade came about due to the war
effort. The previous conscription of scientific minds to assist in the war effort
had left many ideas worth further pursuing. For instance, you had the Turing
test developed by the British genius, Alan Turing.
The test was reasonably paradoxical as for it to be a success; it had to prove
that a contrary assumption was right. For instance, the trial test had to prove
to the masses that the machine was a biological human and not a mechanical
computing device. This specific year was 1950.
Two years later, people all over the technological field were astonished by
the success of a computer, which was programmed to play a game of
checkers efficiently. The machine correctly assimilated all the rules relevant
to the match. In addition, its program executed the algorithms according to
the laws of the game. This event was the initial instance that people could
believe that a computer could concurrently learn as it operated at the same
time from a set of rules. This achievement was courtesy of a Mr. Arthur
Samuel in1952.
While these initial developments were groundbreaking at their time, they
were just the baby steps that the industry needed. To cover an unimaginably
massive distance, you start with a single step. These initial developments
could mirror as such individual steps. These achievements were just the
teetering between early concepts of machine learning Vis a Vis advanced
information technology and computer science. With hindsight, you could
view these initial successes as the beginning of programming a computer to
perform whatever task you commanded.
The expectation came into existence from the creative mind of Mr. Frank
Rosenblatt. He developed a system of programs, which was capable of
identifying and subsequently, recognizing various shapes and patterns. He
almost appropriately named his invention, Perceptron. His was a machine
that operated based on a system and had the added advantage of recognition.
It was mainly due to these factors that his invention widely claimed the title
of the world's first artificial neural network. The year was 1958.
Since the revelation of the Perceptron, many machine linguists started
investing more of their time and resources. These investments went directly
into this relatively new subject of neural networks. Progress on neural
networks carried on at a pace that only technological advancements would
allow. Neural networking was a popular knowledge base that attracted a lot
of civilized technical interest from scholarly and intellectual sources.
An event at Stanford University would bring assurance to a great extent,
which the neural network massive interest had generated. Two researchers
who were working on the topic of neural networks managed to develop
synthetic artificial models of them-selves. The aim was to monitor their
observation capacity, identification ability, and learning progress of these
synthetic variants.
The intellectual community confirmed that when it came to sets of binary
data, the models could identify the data sets as such. The models tested well
on binary probability, and furthermore, they gave correct speculations as to
the expected binary sequences. In addition, the models were responsible for
minimizing the interference within telegraphic lines. The models got the
names, Adeline and Madeline. These events happened in 1959.
The 70s
The seventies was the decade of the Von Neumann architecture and its
preferred application in existing and upcoming developments in neural
networks. Fundamentally, this architecture rightfully advocated that the
storage of relevant data and program instructions be performed by the same
memory, as opposed to the norm at the time.
This new system was ideally more understandable than the typical network
systems of that age. Further meaningful progress in the field was dependent
on the favourability of this modern architecture. Most of the program
developers within the domain of neural networks found it perceptually
simple, cost-effective and started developing their respective components
based on the Von Neumann architecture.
The 80s
After the seventies, the new decade brought with it new ideas. For instance,
we explored the concept of neural networks that comprised more than one
line. This unique idea was meant to mimic the modus operandi of human
biological neurons. An impressive champion of this alternative thought
process was Mr. John Hopfield.
The notion of developing multi-directional neural networks generated quite a
buzz in the technological disciplines at the time. The sudden focus that
resulted from this increased attention caused a generous spike in resource
allocations, especially from countries like the United States and Japan. This
year was 1982.
Stanford University once again sets the stage for the next step in the
evolution of machine learning. This institution produced the researches that
developed an essential algorithm for the functionality of neural networks. The
algorithm is called backpropagation. Backpropagation was mainly an
extension of a previous algorithm by Hoff and Widrow.
Because neural networks could then be distinguished based on the learning
period, this extension arrived just in time. Hence, saving the limited vital time
you had in the first place. Backpropagation, in essence, made it possible for
your neural networks to utilize multiple layers. It was thereby resulting in
'slow learners', which are the networks that need a longer duration for
learning. Backpropagation had its introduction in 1986.
The 90s
Progress in machine learning was rather slow in the nineties. There were a
couple of gaming computers, which went head to head against human
intelligence. However, unlike the previous networks from the early fifties,
these current models had the benefit of profiting from technological progress
made since then. The newer machines were much more advanced in
programming and execution. For instance, the world chess champion in 1997
lost his match against an IBM computer.
Furthermore, researchers found that neural network systems had consistently
high scores while analyzing digit identification/recognition. This particular
project experimentation eventually led to a highly accurate degree of
postcode detection that was, in fact, handwritten. This level of accuracy was
possible through the algorithms used by backpropagation. Therefore,
backpropagation contributed to neural network history in 1998.
The 2010s
Machine learning now has to be universally accepted as the way of the future.
Every endeavor made by humans since the beginning of this new millennium
is towards advancement in artificial intelligence. From an early age, humans
need to know that machine learning forms the basis for the possibility of an
eventual artificial intelligence society.
The capacity for profitability in our bottom lines is unlimited. Any business
that has the resources will apply artificial intelligence techniques to maximize
their profits. Besides, such companies will invest significantly in AI research
and development to beat their competition. To improve your chances of
succeeding in a future intelligence-based world, you need to get up to speed
with machine learning today.
The following are the primary inventions currently capturing our attentions
and turning heads in the field of machine learning, and by extension, possible
artificial intelligence:
2012 – GoogleBrain by Jeff Dean. This deep neural network cantered on
recognition and detection of patterns. Various patterns found in images and
videos were detectable by GoogleBrain. However, the sources of models
were limited to Google resources such as YouTube videos. This level of
quality made it unparalleled to the more humble neural networks.
2012 – AlexNet. This image detection system has the highest level of detail
in clarity. This network resulted in the current widespread use of GPU in
neural network systems and machine learning in general. AlexNet also
developed the Convolutional Neural Networks (CNN) in machine language.
Besides, it designed an accompanying ReLU activation feature to increase
CNN efficiency massively.
2014 – DeepFace by Facebook. The company claims that its deep neural
network's accuracy and specificity in facial recognition is similar to that of a
person.
2014 – DeepMind. This system is equal if not better than humans at certain
games considered complex. A professional in playing Go board games lost to
DeepMind at this game in 2016.
2015 – OpenAI by Elon Musk. This project began based on humanitarian
grounds. The main intention is to develop secure AI environments for the
wellbeing of all humans.
2015 – Amazon Machine Learning Platform. This platform belongs to
Amazon Web Services and caters to massive companies. Such huge
companies trade in a highly significant amount of data regularly to warrant
assistance from a machine learning application or artificial intelligence. In
addition, multinational companies and big franchises deal with a vast array of
services that can only get accomplished with the help of new intelligent
systems. Distribution services also benefit a lot from such machine learning
platforms.
Key Machine Learning Terms and Their Definitions
Model - This term describes the real-world interpretation in a mathematical
format that the machine can understand. Training data is essential for the
device to learn from; hence, your model should contain algorithms based on
appropriate training data.
Algorithms - This term is the set of assumptions that the machine applies to
the input data and produces outputs based on these assumptions. Algorithms
are almost synonymous with program instructions. During training, devices
have to follow specific algorithms to result in particular outcomes.
Training - This activity involves repeated subjection of the machine to the
same set of training data and corresponding algorithms. The aim of this
repetitive nature in training is to achieve learning. Hence, the overall process
is called machine learning. The immediate purpose of machine training is to
achieve synergy between the training data and their corresponding
algorithms. This synergy is an indicator of progress in the right direction for
your machine learning process.
Regression - This training technique is applied when the machine labeled
output does not correspond to the format of the expected known outcome.
Regression implies a problem in the learning process of the machine-learning
model. For instance, the machine produces a fixed value output based on
continually variable outcomes such as time range.
Classification - This activity involves grouping data sets into predetermined
classes. Classification is carried out based on various factors relevant to the
particular data classes.
Target - Target is the output produced by your machine after processing the
input. The outcome depends on the input variable, and the algorithms
applied. In training, the output produced is an already known value.
Feature - The feature is a term used to refer to the input variables to your
machine. In machine learning, the feature data is synonymous with input
data. Dimensions are the number of features used as inputs. Feature
engineering is the act of turning previous features into new features.
Predictions are typically made using features. Features may also identify as
attributes.
Label - Labels are the opposite of features. The final output or outcome from
machine learning is called label data.
Overfitting and Generalization - Overfitting uses the extents of
generalizations to offer a description of itself. Both overfitting and
generalization are end-process concepts of machine learning. Generalization
is essentially the extent to which your output precision in the real world
matches your target precision during training. Remember, during practice,
you used training data, and different types of data will invariably behave
differently.
Generalization describes the ease with which you can switch your training
data for separate data while still maintaining your target precision.
Generalization can be excellent or weak based on your signaling to noisiness
rate. A high value in this rate in the material used as training data indicates a
useful generalization, while the opposite situation holds. Overfitting is a
condition in which both the model and training data exactly match each other,
resulting in a poor generalization of new data.
Regularization - Regulation is a mutually agreeable coming to terms on the
permissible extent of complexity in your machine learning models. The
ultimate objectives of this process are to increase generalizations and to
eliminate overfitting or under-fitting situations.
A figure of authority and integrity should carry out this exercise to avoid bias
or favoritism. He or she must be a third party and non-partisan. To achieve
these objectives, the regulator will need to instill and charge a substantial fine
on the conflicting or inconsistent features of your model. You should
understand that regularization would result in limitations to the independent
status of your model.
Parameter and hyperparameter - Parameters are features of the machine-
learning model that form a part of its design layout and enhances its
mechanical, physical looks. Parameters are modifiable and variable.
Functional parameters are settings. Parameters can be technical as to affect
the machine mechanism of prediction, or visual parameters to enhance the
observational appearance of your machine-learning model. Parameters can be
targeted for regularization to achieve model uniformity among the warring
parties. Functional parameters depend on alterations in the algorithms or
training data.
Hyperparameters are those knowledge aspects that you obtain from your
extensive familiarity with a particular field of study or research.
Hyperparameters cannot bear significant influence by the output from your
machine-learning model. You need to fine-tune or initialize your model with
these hyperparameters before training. This action will build bias towards
your point of view. Your final model output will favor your already preferred
outcome. You receive training to ignore certain aspects of the model output
entirely or partially, which do not conform to your predetermined
professional bias.
GPU - GPUs are the processing units that are used by neural networks and
machine language systems to carry out their necessary functions. It is
comparable to a personal computer's CPU except that it has hundreds or
thousands more processors per chip than the CPU.
A significant characteristic of GPU is that it performs arithmetic operations
on vectors. This fact means that a GPU is pre-programmed to particular use
situations. Each process will be carried out in parallel with another different
task at the same time. On the contrary, a regular CPU can carry out all kinds
of mathematical operations in a sequential order making it slower than the
GPU.
Vectors and Matrices - These terms represent lists of vectors or just
numbers placed in particular grids. This attention to detail in the specificity of
the vectors, grids, matrix, and data sets is crucial in the successful functioning
of your neural network. You can also apply these traits to machine learning
systems.
These features are intentionally small to fit as many of them as possible into a
chip. Therefore, since GPU uses more processors per chip, it will need a very
high number of grids for every available chip. Vectors and matrices are vital
elements for the learning process in machine language and artificial
intelligence.
Cloud computing with GPU - You can apply existing cloud computing
aspects and features to machine learning as well. All you need is rental space
on the cloud, and you can lease out GPUs from the large companies offering
these services. This move is more cost-effective than entirely purchasing a
whole bunch of GPUs since you get to pay for its use at affordable rates.
In addition, this is an efficient way to utilize the massive potential of GPUs.
Only units that needed at a particular time will be in use. Others will be on
standby for the next available task. This concept maximizes the usage of
available GPUs. Individual enthusiasts could also rent out affordable GPUs
for their personal intellectual development and promote innovation.
CNN - This term is Convolutional Neural Networks. This system is used in
machine learning to improve the clarity, accuracy, and precision of a
particular feature, in this case, an image. The resulting labeled output should
achieve very high scores in all the relevant features during prediction.
The mechanism of action in CNN involves passing an image through a range
of filters to get a classified, labeled output. The availability of both filtering
and convolution layers enables your CNN to improve its specificity. Your
network can then handle complicated or abstract images better than the
typical systems.
ReLU - ReLU is the short form of the Rectified Linear Unit. ReLU is a layer
added as an activation feature to CNN to improve its efficiency further.
Remember that there were filtering and convolution layers on CNN. After the
insertion of each convolution layer, the application of the ReLU layer is
subsequent. In this case, the ReLU typically serves as the activation function.
Currently, it is a commonly used application whose purpose is to mimic the
actual neurons on and off switch. The ReLU primarily eliminates negative
values by setting them to zero. In the subject of machine learning, CNN has
so many redundancies that success and the highest efficiency are almost
guaranteed outright.
U-Net - In convolutional neural networks, there is an architectural component
termed as U-Net. This structural feature focuses on biomedical image
segmentation, particularly in the field of radiology. U-Net established similar
numbers of up and downsampling layers resulting in CNN's positive
contribution to the medical field, albeit radiology.
Backpropagation - This procedure is the core algorithm behind how the
neural networks learn. What we mean by learning is that we want to find out
which weights and balances minimize a specific cost function. The aim is to
try to reduce this cost function. You are looking for a negative gradient of
this cost function. A negative slope will indicate how to alter the weights and
balances to decrease the cost most efficiently.
People arrive at the total cost of the network by calculation as follows: you
get all the outcomes given by the system. Compare the findings to the known
expected results that you already have. Take the differences in all these run
instances - Square all the values representing the various individual
differences. Finally, find the mean. The mean is the halfway point of all the
total squared numerical values. To achieve learning and convincingly
evaluate performance, machine learning must subject to backpropagation.
This activity introduces many chances for the machine to learn or relearn
while the chances of failing at prediction (cost function) minimize
significantly. Without backpropagation, the advancements currently achieved
in machine learning would not have been possible.
Chapter 2: The Concept of Machine Learning
Machine Learning is getting a computer to program itself automatically. A
programmer inputs data sets with specific accompanying commands that
enable the system to differentiate between particular objects of concern. They
do it continuously until the computer adapts and learns to distinguish the
objects automatically. A person feeds the software consistently to ensure the
algorithms adjust and optimize the models, which leads to automatic
programming.
In contrast with traditional programming, machine learning enables the data
to do the job as opposed to people. A programmer using machine learning
runs data and output on the computer to form a program. He or she can then
utilize that program to carry out traditional programming processes. The
customary programming runs data and application in the system to obtain the
output.
Therefore, machine learning enables IT systems to identify resolutions to
issues through recognition of patterns or designs that exist in the databases.
They base this recognition on present algorithms and sets of data and
afterward use it to form appropriate concepts of solving issues. It is essential
that the programmer inputs the proper data and algorithms, as well as clearly
define the corresponding analysis rules for pattern recognition in a data
inventory.
Such prior preparations allow machine learning to carry out various duties. It
can use identified patterns to optimize processes, use data analysis to make
specific predictions, and independently adapt to particular developments.
Additionally, it can also find, extract, and summarize essential data, as well
as calculate the probabilities for specific results.
Here, we have information regarding the application of machine learning,
along with the elements and types of Artificial Intelligence learning. This
knowledge will enable an individual to understand the concept of machine
learning and its application to the system and its surroundings.
Definition and Application of Machine Learning
Machine learning refers to the process of making a system to work
automatically by making the computer program itself. It works in the same
way that human beings learn. Humans learn throughout their lives and
broaden their knowledge as they go. They learn to recognize and distinguish
between various items while in childhood and continue learning about new
things as they later go through the education system and their professional
lives. Machine learning also works in a similar fashion where it teaches a
computer to determine and differentiate particular things or people. It uses the
input of data along with specific commands to train the system to learn the
differentiation of objects.
The programmer continuously provides the computer software with feedback,
and then the algorithms utilize the resulting signals to modify and make the
most of the model. The more the new collection of data person inputs, the
better optimized the model becomes, and it can make a better distinction
between objects or persons of interest. For example, a programmer can input
data that informs the computer that a specific item is a fruit while another
thing is not a fruit. They continually feed the software, and this constant
update of data input optimizes the model until it comes to differentiate
between fruits and those objects that are not fruits correctly.

1. Types of Machine Learning

There are four main types of machine learning that a person can use. A
programmer chooses the kind that they prefer or need to accomplish
automated programming. Certain factors, such as need or disregard of the
desired output can determine the type of machine learning that a person
utilizes. The four forms of machine learning sort out as supervised,
unsupervised, semi-supervised as well as reinforcement machine learning.
a) Supervised Machine Learning
The supervised machine learning employs the environment on the outside as
a teacher. The external conditions, people utilizing the programs, and other
factors guide the algorithms of Artificial Intelligence. They provide feedback
to the learning functions that outline the observations concerning inputs to
output findings. One supervises learning in this type, and the training data
also takes into account the desired outputs. Supervised learning is also an
inductive learning type. It is the most common type and the one about which
people studied the most. Additionally, it is also the most mature in
comparison to the other learning types. This model provides better facilitation
because learning with supervision is a lot easier than it is without
guidance.
b) Unsupervised Machine Learning
This type concentrates on learning an arrangement or pattern in the data one
inputs without including any feedback from the external conditions.
Examples of unsupervised learning are clustering and association techniques.
Clustering deals with the collection of data points. A person uses these
collections of data to categorize each data point into a particular group by
using a clustering algorithm. Data points that exist in the same group should,
in theory, possess comparable characteristics or features.
On the other hand, the data points that are in different groups should
theoretically have very different properties. People use this technique to
analyze statistical data in various disciplines and fields. One uses the
association technique when they want to find out rules that explain large
sections of their data. The training data in this unsupervised learning
technique does not take into account the desired outputs.
c) Semi-supervised Machine Learning
This learning type strives to deduce or infer new features or titles on fresh
data sets by employing a group of data that a person selects carefully and
labels them. Semi-supervised learning is the middle ground between the
unsupervised type and the supervised learning kind. The training data in this
learning model incorporates a few outputs that one desires.
d) Reinforcement Machine Learning
Reinforcement learning type strengthens and improves knowledge by
employing contrasting dynamics. It uses contrary moves such as rewarding
and punishing to reinforce the information. A succession of actions can
produce rewards. Artificial Intelligence favors this learning type, and it is the
most ambitious one of all the learning kinds.
e) Active Machine Learning
This type of learning allows an algorithm to identify questions with high
significance. The algorithm examines and queries the findings for particular
input data on the grounds of relevant predetermined questions.
2. Applications of Machine Learning
A person experiences machine learning in almost everything automated that
they do. Machine learning takes place in various activities, such as an
individual's daily functions to serious financial decisions. Sometimes a
person does not even recognize when they experience a machine learning
process. The following are some of the areas and the processes of machine
learning occur in them. Understanding its application in different aspects of
life can help an individual to appreciate its importance and value.
a) Social Networks Application
Machine learning in social networks enables applications such as Facebook
and Instagram to exist. It uses face recognition to identify and differentiate
between various users of the services. A person inputs data into the social
network site and Facebook or Instagram optimize and trains the visual
recognition of the machine learning systems. It takes various data such as
bitmaps of the face and the corresponding names that a user types to update
its face database. It then enhances the visual identification features of the
social network application.
b) Internet Application
Machine learning allows spam detection capabilities in internet functions
such as e-mail programs. Machine learning evaluates and categorizes the data
that exists in the e-mails to establish a pattern that indicates spam or non-
spam. It recognizes spam e-mails faster if the e-mail is junk mail. In addition
to spam detection, machine learning also identifies virus threats and block or
warn about the related sites. Such recognition of dangers enables the users to
prevent computer attacks as well as fighting against cyber-crimes such as
hacking. Furthermore, it solves problems like debugging by evaluating the
data it has to identify the possible location of a bug in the software.
Machine learning also enables optimization of websites where a user
establishes specific keywords to generate views or clicks. People repetitively
input that particular word until the machine learning systems adapt and
enhance information regarding the keyword. This optimization leads to
ranking in search engines depending on the number of inputs of the keyword
of interest.
c) Financial Application
A person can use machine learning to make money and even prevent any
potential losses in the future. They can utilize the automatic analysis of
reports concerning movement or loss of clients. They can get feedback
regarding previous clients who left the company and look at the ones who
requested to leave. If there are, characteristics that the two groups have in
common, one can identify the mutual problem and find possible solutions to
prevent the possible further loss of clients.
Similarly, a person can input data that indicates the values and preferences of
customers. Machine learning analyzes and evaluates the data. It then provides
findings that can assist a business person to create measures that cater to
those customers. Additionally, service industries like banks can employ
machine-learning systems to find out if a client is viable to receive a loan or
credit. The bank analyzes the client's property and wealth information. The
results will determine whether the enterprise approves the customer's credit
or loan request.
d) Technological Application
Machine learning enables development and advancements in the robotics
industry. Self-driving cars are an example of how machine learning can work.
A driver inputs their travel details such as the destination and speed, and the
vehicle proceeds to use machine learning to drive itself automatically.
Moreover, there are automated programs in several customer service
companies, which use robots to converse with clients. The chatbots modify
and maximize their cognitive skills through interpretation of the customer's
vocal tone. The bots transfer the call to a human employee if the sound and
situation are too complicated.
In addition to robots, one can apply machine learning in other independent
systems such as medical devices and analytical tools in the stock market. The
devices provide automated diagnosis and make the work of the doctor more
efficient, which provides better health services. Traders also analyze and
interpret the stock market through indicators that use machine learning to
offer patterns that determine their movements in the trade.
e) Computational Application
A machine learning system enables a computer to form a logical design. The
system identifies previous experiments in the database and uses it to produce
a systematic pattern in the software. Further to providing this design, machine
learning extracts information by asking questions in the various databases
across the internet. This extraction gives the user valuable results where they
can determine and separate essential and unnecessary information.
Conclusively, the machine learning process is a cycle that needs a person to
repeat the loop until they get outcomes that they can use. If the data changes,
one may need to create and use a new circuit. A person starts the loop and
proceeds to research to understand the goals, domain, and gain prior
knowledge from other user experts.
They then integrate data, clean and pre-process it by retaining the significant
and eliminating the others. The user next learns the models and then
interprets the results. Finally, they merge and distribute the discovered
information before ending the loop. A successful cycle ensures that an
individual can effectively apply the machine-learning process in various
aspects as needed.
Key Elements of Machine Learning
There are three essential elements that that form the components of a
machine-learning algorithm. People create numerous algorithms yearly, but
they all share these three features. A person needs to study and understand
these components and their functions concerning the machine learning
process. This comprehension enables one to possess vital knowledge
regarding the framework for discerning all algorithms.
1. Representation – It refers to how machine learning represents
knowledge. The algorithms could portray information by using means
such as model ensembles, impartial networks, graphic models,
instances, and sets of rules, among others.
2. Evaluation – It refers to how the algorithm evaluates prospect
programs or hypotheses. Machine learning can examine through
likelihood, accuracy, cost, prediction and recall, squared error, and
posterior probability amongst others.
3. Optimization – It refers to the search process where it shows how
machine-learning algorithms generate hypotheses or prospect
programs. The machine learning employs methods such as constrained
optimization, combinatorial optimization, and convex optimization.
Types of Artificial Intelligence Learning
Learning is the process where a person uses observations made in the
environment of an Artificial Intelligence program and uses it to enhance the
knowledge or information of Artificial Intelligence. The learning processes
concentrate on handling a collection of input to output pairings for a
particular function. They then foretell the output or results for recent data.
The human learning process comprises of two essential features, which are
feedback and knowledge.
One also categorizes the types of Artificial Intelligence learning into those
under information and those appertaining to feedback. The knowledge point
of view classifies learning models according to the depiction of input and
output points of data. The feedback perspective categorizes the models
regarding the engagement with the external environment, user, and other
extrinsic elements.

1. Models Based on Feedback Classification

An individual can categorize Artificial-learning types according to the


characteristics of the feedback that they obtain. They can classify the models
as supervised, semi-supervised, unsupervised as well as reinforced types.
a) Supervised Learning Model
Supervised learning is also an inductive type where a person supervises or
monitors the learning process. This model utilizes feedback from the outside
to the functions of machine learning that indicate the inputs to output
information that one observes. The external feedback provides guidance or
teaches the algorithms of Artificial Intelligence.
b) Unsupervised Learning Model
This model concentrates on learning design or pattern in the data that
individual inputs, without any extrinsic feedback. Examples of the
unsupervised learning model are association and clustering techniques.
c) Semi-supervised Learning Model
One uses this type of learning when they use a group of data that they
carefully selected. They then attempt to guess new characteristics or terms on
the latest collection of data. The mid-point between the unsupervised model
and the supervised learning type is the semi-supervised learning model.
d) Reinforcement Learning Model
This learning type utilizes opposite dynamics like punishment and rewards to
strengthen various types of information. Modern Artificial Intelligence
solutions are making this learning model grow in popularity.

2. Models Based on Knowledge Classification

Artificial Intelligence learning using classifications based on this model deal


with how one represents knowledge. A person uses the information and
applies it in two ways, that is, deductive learning type and the inductive
learning model.
a) Deductive Learning Type
This learning technique begins with a series of rules that infer or deduce other
new laws. These new rules work more efficiently in the context of a
particular Artificial Intelligence algorithm. Deductive techniques like
Explanation-Based Learning (EBL) draw the general rules from instances by
the generalization of the explanations. Other methods like the Relevance-
Based Learning (RBL) concentrate on simple examples by determining the
attributes along with the deductive generalizations.
b) Inductive Learning Type
Inductive learning infers a general rule from a collection of data of input and
output pairs. It provides function examples by way of data or input samples
(x) and the output of the function, which is (f(x)). A person uses this learning
type to find out the service (f) for the new data (x). It also provides the
overall theory at the back of supervised learning. It is difficult to approximate
the function when practicing inductive learning. As a result, one tries to find
out the best possible estimations of the service. Therefore, an individual
inputs specific data (x) and from the results (f(x)) that they receive, they can
interpret the meaning of the data they input. People use inductive learning in
day-to-day activities without really noticing the process behind it.
For instance, they use it in medicine, transport systems, security, and
financial industries. Doctors and patients alike input the symptoms that they
have (x) and get the results that diagnose the disease or problem (f(x)) of the
patient. In transportation, drivers input their destination names in the Global
Positioning Systems (GPS), and they get information regarding the distance
and directions (f(x)) to follow. The security personnel at police stations or
airport stations input fingerprints or a person's photo (x) into their computers
and get the name (f(x)) of that person. Businesses like banks in the finance
industry use it to evaluate if a client is fit to receive credit or loan. They input
the client's properties (x), and the resulting information (f(x)) will determine
if they will approve their request.
One can describe the function in inductive learning in three different ways,
depending on its characteristics. Classification is where the service (f) that a
person is learning is distinct. Regression refers to the case where the user
deals with a continuous function. Probability estimation is where the output
of the task is a probability.
A person uses inductive learning in appropriate circumstances to solve
different problems. One uses it when there are issues due to frequent changes
in the desired function. It is not cost-effective to write a program that changes
too much continuously. An individual also applies it when there are problems
due to every user needing a custom function since it is expensive to write
tailored programs.
They also use it to deal with issues arising from the absence of a human
expert since people cannot write software to deal with an unknown answer.
There are also difficulties with the inability of a computer to do specific
social activities like riding a motorbike. It is troublesome because no one can
describe to the machine on how to act.
In conclusion, machine learning is an essential concept that an individual
needs to understand. Many factors in people's lives incorporate machine
learning. Sometimes one is aware of the learning process behind certain
activities while at other times they carry on without noticing it. The
information above explains the meaning of machine learning as well as the
types of Artificial Intelligence models.
It also indicates how machine learning takes place and serves users in
different aspects, such as financial, social, security, and other sectors.
Therefore, it is imperative to study and fully understand the concept and
application of machine learning. Such knowledge will enable a person to
contribute to the development and improvement of various machine-learning
systems around the world.
Chapter 3: Mathematical Notation, Basic
Terminology, and Building Machine Learning
Systems
Mathematical Notation for Machine Learning
In your process of machine learning, you will realize that mathematical
nomenclature and notations go hand in hand throughout your project. There
is a variety of signs, symbols, values, and variables used in the course of
mathematics to describe whatever algorithms you may be trying to
accomplish.
You will find yourself using some of the mathematical notations within this
field of model development. You will find that values that deal with data and
the process of learning or memory formation will always take precedence.
Therefore, the following six examples are the most commonly used notations.
Each of these notations has a description for which its algorithm explains:
1. Algebra
To indicate a change or difference: Delta
To give the total summation of all values: Summation
To describe a nested function: Composite function
To indicate Euler's number and Epsilon where necessary
To describe the product of all values: Capital pi
2. Calculus
To describe a particular gradient: Nabla
To describe the first derivative: Derivative
To describe the second derivative: Second derivative
To describe a function value as x approaches zero: Limit
3. Linear Algebra
To describe capitalized variables are matrices: Matrix
To describe matrix transpose: Transpose
To describe a matrix or vector: Brackets
To describe a dot product: Dot
To describe a Hadamard product: Hadamard
To describe a vector: Vector
To describe a vector of magnitude 1: Unit vector
4. Probability
The probability of an event: Probability
5. Set theory
To describe a list of distinct elements: Set
6. Statistics
To describe the median value of variable x: Median
To describe the correlation between variables X and Y: Correlation
To describe the standard deviation of a sample set: Sample standard deviation
To describe the population standard deviation: Standard deviation
To describe the variance of a subset of a population: Sample variance
To describe the variance of a population value: Population variance
To describe the mean of a subset of a population: Sample mean
To describe the mean of population values: Population means
Terminologies Used For Machine Learning
The following terminologies are what you will encounter most often during
machine learning. You may be getting into machine learning for professional
purposes or even as an artificial intelligence (AI) enthusiast. Anyway,
whatever your reasons, the following are categories and subcategories of
terminologies that you will need to know and probably understand to get
along with your colleagues. In this section, you will get to see the significant
picture explanation and then delve into the subcategories. Here are machine-
learning terms that you need to know:

1. Natural language processing (NLP)

Natural language is what you as a human, use, i.e., human language. By


definition, NLP is a way of machine learning where the machine learns your
human form of communication. NLP is the standard base for all if not most
machine languages that allow your device to make use of human (natural)
language. This NLP ability enables your machine to hear your natural
(human) input, understand it, execute it then give a data output. The device
can realize humans and interact appropriately or as close to appropriate as
possible.
There are five primary stages in NLP: machine translation, information
retrieval, sentiment analysis, information extraction, and finally question
answering. It begins with the human query which straight-up leads to
machine translation and then through all the four other processes and finally
ending up in question explaining itself. You can now break down these five
stages into subcategories as suggested earlier:
Text classification and ranking - This step is a filtering mechanism that
determines the class of importance based on relevance algorithms that filter
out unwanted stuff such as spam or junk mail. It filters out what needs
precedence and the order of execution up to the final task.
Sentiment analysis - This analysis predicts the emotional reaction of a
human towards the feedback provided by the machine. Customer relations
and satisfaction are factors that may benefit from sentiment analysis.
Document summarization - As the phrase suggests, this is a means of
developing short and precise definitions of complex and complicated
descriptions. The overall purpose is to make it easy to understand.
Named-Entity Recognition (NER) - This activity involves getting
structured and identifiable data from an unstructured set of words. The
machine learning process learns to identify the most appropriate keywords,
applies those words to the context of the speech, and tries to come up with
the most appropriate response. Keywords are things like company name,
employee name, calendar date, and time.
Speech recognition - An example of this mechanism can easily be
appliances such as Alexa. The machine learns to associate the spoken text to
the speech originator. The device can identify audio signals from human
speech and vocal sources.
It understands Natural language and generation - As opposed to Named-
Entity Recognition; these two concepts deal with human to computer and
vice versa conversions. Natural language understanding allows the machine
to convert and interpret the human form of spoken text into a coherent set of
understandable computer format. On the other hand, natural language
generation does the reverse function, i.e., transforming the incorrect computer
format to the human audio format that is understandable by the human ear.
Machine translation - This action is an automated system of converting one
written human language into another human language. Conversion enables
people from different ethnic backgrounds and different styles to understand
each other. An artificial intelligence entity that has gone through the process
of machine learning carries out this job.

2. Dataset

A dataset is a range of variables that you can use to test the viability and
progress of your machine learning. Data is an essential component of your
machine learning progress. It gives results that are indicative of your
development and areas that need adjustments and tweaking for fine-tuning
specific factors. There are three types of datasets:
Training data - As the name suggests, training data is used to predict
patterns by letting the model learn via deduction. Due to the enormity of
factors to be trained on, yes, there will be factors that are more important than
others are. These features get a training priority. Your machine-learning
model will use the more prominent features to predict the most appropriate
patterns required. Over time, your model will learn through training.
Validation data - This set is the data that are used to micro tune the small
tiny aspects of the different models that are at the completion phase.
Validation testing is not a training phase; it is a final comparison phase. The
data obtained from your validation is used to choose your final model. You
get to validate the various aspects of the models under comparison and then
make a final decision based on this validation data.
Test data - Once you have decided on your final model, test data is a stage
that will give you vital information on how the model will handle in real life.
The test data will be carried out using an utterly different set of parameters
from the ones used during both training and validation. Having the model go
through this kind of test data will give you an indication of how your model
will handle the types of other types of inputs. You will get answers to
questions such as how will the fail-safe mechanism react. Will the fail-safe
even come online in the first place?

3. Computer vision

Computer vision is responsible for the tools providing a high-level analysis of


image and video data. Challenges that you should look out for in computer
vision are:
Image classification - This training allows the model to identify and learn
what various images and pictorial representations are. The model needs to
retain a memory of a familiar-looking image to maintain mind and identify
the correct image even with minor alterations such as color changes.
Object detection - Unlike image classification, which detects whether there
is an image in your model field of view, object detection allows it to identify
objects. Object identification enables the model to take a large set of data and
then frames them to detect a pattern recognition. It is akin to facial
recognition since it looks for patterns within a given field of view.
Image segmentation - The model will associate a specific image or video
pixel with a previously encountered pixel. This association depends on the
concept of a most likely scenario based on the frequency of association
between a particular pixel and a corresponding specific predetermined set.
Saliency detection - In this case, it will involve that you train and get your
model accustomed to increase its visibility. For instance, advertisements are
best at locations with higher human traffic. Hence, your model will learn to
place itself at positions of maximum social visibility. This computer vision
feature will naturally attract human attention and curiosity.

4. Supervised learning

You achieve supervised learning by having the models teach themselves by


using targeted examples. If you wanted to show the models how to recognize
a given task, then you would label the dataset for that particular supervised
task. You will then present the model with the set of labeled examples and
monitor its learning through supervision.
The models get to learn themselves through constant exposure to the correct
patterns. You want to promote brand awareness; you could apply supervised
learning where the model leans by using the product example and mastering
its art of advertisement.

5. Unsupervised learning

This learning style is the opposite of supervised learning. In this case, your
models learn through observations. There is no supervision involved, and the
datasets are not labeled; hence, there is no correct base value as learned from
the supervised method.
Here, through constant observations, your models will get to determine their
right truths. Unsupervised models most often learn through associations
between different structures and elemental characteristics common to the
datasets. Since unsupervised learning deals with similar groups of related
datasets, they are useful in clustering.

6. Reinforcement learning

Reinforcement learning teaches your model to strive for the best result
always. In addition to only performing its assigned tasks correctly, the model
gets rewarded with a treat. This learning technique is a form of
encouragement to your model to always deliver the correct action and
perform it well or to the best of its ability. After some time, your model will
learn to expect a present or favor, and therefore, the model will always strive
for the best outcome.
This example is a form of positive reinforcement. It rewards good behavior.
However, there is another type of support called negative reinforcement.
Negative reinforcement aims to punish or discourage bad behavior. The
model gets reprimanded in cases where the supervisor did not meet the
expected standards. The model learns as well that lousy behavior attracts
penalties, and it will always strive to do good continually.

7. Neural networks

The neural network is a concept of interconnected models connected through


artificial intelligence. These models though synthetic, have the same level of
interactions that is observable between humans. Due to the long period for
the models for training and learning, their level of interconnectivity will
depend on an automated base system.
Problems will arise when self-awareness becomes possible. The ability of
such artificial intelligence to make their own decisions based on their
definitions of right and wrong will be worrying to humans. The need for
humans existence could be worthless. Neural networks are into layers that go
deeper and deeper, depending on the level of sophistication and artificial
intelligence. The top-level layer handles simple queries or inputs while the
more profound layers handle algorithms that are more complex and consider
best-case scenarios.
All the while, these models keep learning more and more through
observations, training, supervision, and examples. This constant cycle of
learning without an apparent need for downtime may make these models
achieve a much deeper level of intelligence and knowledge than we humans
could ever understand. It is at this advanced level of information that we refer
to the models as a deep learning model. There is also a legitimate fear of
letting artificial intelligence develop unchecked because they may turn
against humans sometime in the future.

8. Overfitting

Overfitting is a concept used to describe the assumption developed by the


model through an insufficient presentation of a relevant dataset. The model
will generate a bias based on its performance. If at all times, it gets corrupted
or an inadequate number of data points, then the model will treat the corrupt
data as legitimate. Based on this preconditioned bias, the model will give you
an output that is correct as far as it is concerned.
According to the model, the data warrants that particular output. You, as the
human, will understand that the output data is wrong based on the limited or
corrupted information you presented to your model in the first place. A model
will always function according to its programmed algorithms. A faulty
output is most likely due to your human error from an insufficient dataset or a
corrupted set of input.
Road Map to Building Your Machine Learning Systems
Machine learning and artificial intelligence are no longer just purely for the
super-wealthy. With the right amount of talented resources and financial
might, anybody can get into the front door and try his or her hand at machine
learning exploration. This short three-step system will offer you a simple
guideline towards achieving your dream. You should not be afraid.
Success tends to favor those of us who dare. Therefore, when developing a
road map for building machine-learning systems, it is essential to understand
the terminologies explained before. These terms are the language used in the
science of machine learning and may be used from time to time when making
a particular point. Machine learning was initially a means to an end, with the
end being artificial intelligence.
However, nowadays, we have come to realize that machine learning may be
utilized as an adjunct to ease our everyday life and its associated challenges.
Therefore, machine learning is a field that has suddenly become rather
indispensable over the past few years. In addition, given the progress
happening at exponential rates, it is undoubtedly here to stay. There are three
main steps for your roadmap to building machine learning systems:

I. Pre-processing data

Data is always the first step towards building a machine-learning project.


You will need an up to date comprehensive source of relevant information
that will match your target. Pre-processing data refers to the concept that you
will probably need to fine-tune the specific aspects you will need from a vast
data set.
Depending on the algorithm you will apply, the type of data you will need
has to be pre-processed from its raw form and into the specificity, you need.
You also have to pay attention to the compliance of the data set in terms of
range and distribution. With a lot of data, comes confusion. The more
trouble you have, the more redundant your data will be. You will end up with
a multitude of data sets that correlate the same aspect of your algorithm,
hence, the importance of pre-processing your data.
In addition, you will need data for the testing and training stages of the
project development. Your model depends on its ability to learn, hence,
machine learning. To learn, you will have to train the model using training
data sets, and afterward, during the validation period, you will apply
validation data. Finally, when you are ready to unveil your model, test data
will let you know how the model will handle your target market.

II. Conduct training and select a predictive model

This step is a crucial stage in machine learning. This step is the stage where
you identify your patterns and associations with your data. Is your data
performing as you expected? You can only know the answer through training
your model. Therefore, you should get to compare a range of different
models against a predetermined set of parameters.
This comparison will provide you with an idea of the most likely model
(predictive model) based on higher performance. This most predictive model
thus has the tolerability to withstand the training session that will come
afterward. Vow that you have decided on the predictive model, you get
started on the testing stages. Training is the only way to determine whether
your model will learn according to their algorithms. After all, this is a
roadmap to building machine-learning systems.
The tests that you will carry out will have to meet various conditions
depending on your algorithms. During training, you test the model using
training data sets, during validation; you get to use the validation data sets.
By the time, you get to the validation stage, and then your model will likely
have been through all the necessary tests and passed them all. In addition,
you will probably have confidence in your model to follow the set algorithms
and be tested in the market.

III. Evaluating models and predicting unseen data instances

Now that we have selected our final model, we need to run a test run to see
how it behaves. Remember, all the previous data sets that we subjected to the
model during training and valuation were not original. We will now use
altered data to meet specific algorithmic criteria. Now that the model is
through validation, we need to subject it to previously untested data set to
mimic the real-world application.
This stage involves the project leaving the laptops and other development
platforms and availing them to applications. Only when you subject your
model to this test data represented by the real-world application can we
determine its evaluability. This stage is your deployment phase. You get to
see how your algorithms play with others.
Finally, once you have had a successful deployment, your machine learning
system can be released to the market. You should remember that you are
dealing with a machine learning system. Therefore, every aspect of the entire
order will continuously want to keep learning new things and adapt
accordingly. Maintenance becomes almost a full-time task to ensure peak
performance all the time.
Chapter 4: Using Python for Machine Learning
Machine learning is a popular topic in the field of artificial intelligence. For
some time now, it has been in the spotlight because of the massive
opportunities it offers. Machine learning is a type of AI or coding then gives
computers the ability to learn without the need for explicit programming. It
involves the development of software programs that change or learn when
exposed to a particular data set, and use it to predict the properties of new
data. Essentially, it is learning based on experience.
Starting a career in this field is not as difficult as most people seem to think.
Even people with no experience in programming can do it, as long as they
have the motivation and interest to learn. People who want to become
successful coders need to learn many things; however, to succeed in the field
of machine learning, they simply need to master and use one coding
language.
The first step towards success is choosing the right coding language right
from the start, as this choice will determine one's future. To make the right
choice, one needs to have the right priorities and think strategically. In
addition, it is important to avoid focusing on unnecessary things. Python is a
good choice for beginners to make their first forays into the field of machine
learning.
Python for machine learning is an intuitive and minimalistic language with
full-featured frameworks, which help reduce the time it takes developers to
achieve their first results. For starters, new developers need to understand that
machine learning involves several stages:
1. Collection of data
2. Sorting this data
3. Analyzing data
4. Developing an algorithm
5. Checking the algorithm generated
6. Using an algorithm to further objectives
Variables
It would be an amazing thing to be able to predict what is going to happen.
For example, an investor might want to predict how a certain stock will
behave based on some information that he/she just happens to possess. This is
where multiple linear regression comes in.
The beauty of multiple linear regression is that it analyzes relationships
within huge amounts of data, instead of simply identifying how one thing
relates to another. Using this model, the investor referenced above would be
able to look at the connection between many different things and the outcome
he/she wants to predict. The linear regression model is one of the main
building blocks of machine learning.
MLR makes use of different variables to predict a different variable's
outcome. Its goal is to model the relationship between dependent variables
and independent variables. The analysis of multiple-regression has three main
purposes:
1. Predicting future values and trends
2. Determining how much the dependent variable will change if there is a
change in the independent variables
3. Measuring the impact of independent variables on the dependent
variable
For the purpose of this post, assume 'Peter' is the investor references above,
and he now works for a venture capitalist. He has a dataset with information
on 100 companies, and five columns with information about the amount of
money those companies spend on marketing, research and development,
administration, profits, and location by state. However, he does not know the
real identity of any of these companies.
Peter needs to study this information and design a model that will identify
companies that are good investment prospects, based on the previous year's
profits. As such, the profits column will be the dependent variable, and the
other columns will be independent variables.
To achieve his objective, Peter needs to learn about the profit based on all the
information he has. That said, Peter's boss does not intend to invest in any of
these companies; rather, he/she wants to use the dataset's information as a
sample to help him/her how to analyze investment opportunities in the future.
Therefore, Peter needs to help his boss determine the following:
1. Whether to invest in companies spending more on research and
development,
2. Whether to invest in those focusing on marketing,
3. Whether to choose companies based in particular locations and so on
The first thing that Peter needs to do is help his boss create a set of
guidelines; for example, he/she is interested in a company based in Chicago
that spends more money on research and development and less on
administration. Therefore, he will need to design a model that will allow him
to assess into which companies his boss needs to invest to maximize his/her
profits.
Some of the assumptions he will need to consider include:
1. There is a linear relationship between independent and dependent
variables
2. His observations for dependent variables are random and independent
3. Independent variables are not often connected with each other
4. There is a normal distribution of regression residuals
Before building a model, Peter needs to ensure that these assumptions are
true. In this case, since the company location is an independent variable and
profit is the dependent variable, he will need his independent variables to be
numbered. For the purpose of this example, assume all companies operate in
either Chicago or Florida.
This means that Peter will need to divide the column representing location
into two different columns of 1s and 0s, with each state representing a
specific column. Companies based in Chicago will have a '1' in its column,
while one based in Florida will have a '0' in its column.
However, if he were considering more than two states, he would use a '1' in
the Chicago column; a '0' in the Florida column; a '0' in the New York
column; a '0' in the California column, and so on. Essentially, he will not use
the original column for locations because he will not need it. In other words,
'1' is for yes and '0' is for no. Peter will need to determine which columns to
get rid of and which to keep before building his model.
Application of Variables in Python
A variable in python is a memory location set aside to store values.
Essentially, it is a label for a specific location in memory. Different types of
data in python are strings, numbers, dictionary, list, and so on, and every
value has a type of data. Developers can declare variables using any name or
alphabet, such as a, abc, aa, and more. They can also re-declare a variable
even if they have already declared it once.
There are different predetermined types of variables in statistically typed
computer languages, and since variables hold value, they can only hold
values o the same type. In python, however, developers can reuse the same
variable to hold any type of value. A variable is different from the memory
function of a calculator in that developers can have different variables
holding different types of values, with each variable having a particular name
or label.

Types of Variables

Python is an object-oriented computer language, which means that it is not a


statistical type. Therefore, developers do not need to declare variables or their
type before using them because every variable is an object in Python. The
main types of variables are:

1. Numbers

There are two basic types of numbers in Python; i.e., floating-point numbers
and integers; however, it also supports complex numbers.

2. Strings

These are either double quotes or a single quote. The difference between
these two is that double quotes allow for the inclusion apostrophes, which
would lead to the termination of a string using single quotes. However, when
it comes to defining strings, certain variations can make it possible to include
characters such as Unicode, backslashes, and carriage returns.
Strings and numbers allow for the execution of simple operators, as well as
the assignment of multiple variables at the same time on the same line.
However, Python does not support mixing operators between strings and
numbers.
In python, developers simply assign an integer value to a name to define a
new variable. For example, by typing count = 0, one will define a variable
called count, which has a value of zero. To assign a new value to this
variable, one will use the same syntax. That said, developers should use
meaningful names for variables in keeping with proper programming style.
It is important to understand that different variables exist for different
amounts of time, and not all of them are accessible from all parts of a
program. How long a particular variable exists and from where it is
accessible depends on how developers define them. In other words,
developers define a variable's lifetime and scope.
For instance, a global variable, referring to a variable defined on a file's main
body, is visible and accessible from any part of the file, as well as inside any
file that imports that particular file. Due to its wide-ranging effects and scope,
this type of variable can have unintended consequences, which is why it is a
good idea to use it as little as possible. Only classes, functions, and other
objects intended for this type of scope should occupy the global namespace.
Any variable defined within a function is limited or local to that function.
Essentially, its scope begins from its definition point and terminates at the
end of the function. In the same way, its lifetime will end as soon as the
function stops executing. When the assignment operator "=" appears within a
function, by default, it creates a new local variable. However, this will not
happen if there is another variable with the same name defined inside the
function.
Within a class body, there is a new local variable scope. Class attributes are
any variable defined outside any class method but within this scope.
Referenced using their bare names, class attributes are accessible from
outside their local scope if developers use the .”" access operator on a class or
any object that uses that particular class as its type. Developers share these
class attributes between all objects that use that class, also called instances,
but each instance has a different instance attribute.
Initializing a variable is the process of assigning a new value to a variable. In
Python, this involves the definition and assignment of a value in a single step.
As such, developers rarely encounter situations where a variable has a
definition but not a value, which could lead to unexpected behavior and
errors when developers try to use such valueless variables.
Essential Operator
Python supports many different types of operators, including logic operators,
arithmetic operators, bitwise operators, rational operators, identity operators,
assignment operators, and membership operators. Operators are various types
of symbols that indicate the need for the performance of some sort of
computation. Operators act on values known as operands, for example:

1. a=5
2. b = 10
3. a+b
15
Essentially, they facilitate operations between two variables or values, with
the output varying depending on the type of operator used. These special
constructs or symbols manipulate the values of operands, whose value can be
either a data type or variable. In the example above, the '+' operator adds two
operands, i.e., 'a' and 'b', together. In this case, these operands may be
variables that reference a certain object or a literal value.
A sequence of operators and operands, such as a + b – 2, is known as an
expression. Python supports a wide range of operators for combining
different data objects into various expressions. Essentially, the basic
expression supported by Python looks like operand 1 & operator & operand 2
= result.
Python uses arithmetic operators or symbols to perform arithmetic equations
and assignment operators to assign values to objects or variables. For
example:

1. x = 20
2. x += 10
3. #this is the same as x = x + 10

In addition, Python uses comparison operators to compare two different


values, such as:
1. a = 10
2. b = 5
3. #equal
4. a ==10
5. #not equal
6. a != 5
7. #greater than
8. a > b
9. #less than
10. a<b
11. #greater than or equal to
12. a >= b
Logical operators in Python compare two conditional statements, such as #
logical and 4 > 2 and 4 > 1, which will return true. A statement such as, 4 > 2
or 4 < 1, will also return true because one of the statements is accurate.
However, 4 > 3 and 4 < 2 will return false because one of the statements is
not true.
Identity operators, on the other hand, compare two objects; membership
operators determine whether there is a sequence in an object, while bitwise
operators compare binary values. For example, bitwise AND 10 & 12 will
return 8, bitwise OR 10 / 12 will return 14, and bitwise XOR 10 ^ 12 will
return 6.
To understand the results generated by bitwise operators, one needs to
consider the binary equivalent of the values. For example, the binary
equivalent of 10 is 1010, while that of 12 is 1100. Therefore, in the case of an
AND operation between these two figures, i.e., 1010 and 1100, both the bits
are 1. As such, the binary equivalent will be 8, which is 1000 converted into
decimal.
In summary, arithmetic operators in Python include addition, subtraction,
multiplication, modulus, division floor, and division float. Relational
operators include less than, greater than, not equal to, equal to, less than or
equal to, and greater than or equal to. The main logical operands include
AND, NOT, and OR, while bitwise operators are Bitwise AND, Bitwise
NOT, Bitwise OR, Bitwise XOR, Bitwise left shift, and Bitwise right shift.
Assignment operators include Add AND, Assign value, Subtract AND,
Modulus AND, Divide AND, Multiply AND, Exponent AND, Divide (floor)
AND, Perform Bitwise AND, Perform Bitwise left shift, Perform Bitwise
right shift, and Perform Bitwise xOR. Not in and in are the two main
membership operators.
Functions
A function in Python is a group of statements that take input, perform certain
calculations, and generate output. The idea is to combine common tasks to
perform a certain function, with the aim of calling the function instead of
writing the same code multiple times for different inputs. Python comes with
built-in functionalities, but developers can still add their own functions,
commonly referred to as user-defined functions.
An example of a simple Python function to check whether a certain value is
an even number or an odd number is:
1. def evenOdd(x):
2. if (x % 2 ==0):
3. print "even"
4. else:
5. print "odd"
6. # Driver code
7. evenOdd(2)
8. evenOdd(3)

Developers should understand that, in Python, the name of every variable is a


reference. Therefore, when they pass a variable to a particular function, they
will create a new reference to the object. Essentially, it is Python's version of
reference passing in Java.
Functions in Python are very important when it comes to data analysis and
machine learning. They save time and effort applied in coding. In other
words, they allow developers to reuse a certain subset of code in their
program whenever they need to. A basic function contains the following
things:
1. Code that the developer wants certain functions to run whenever they
call it
2. Function name to use when calling a particular function
3. Def keyword to let the program know when developers are adding their
own functions
Essentially, when developers want to use a certain function, they simply call
it using its name followed by parenthesis, i.e., function name (), for example,
if the aim of the function is to print the word 'Thanks' with the name of the
user, the developer can use the following code:
def func(name):
print("Thanks" +" " + name)
Whenever the developer needs to call this function, all he/she needs to do is
pass its name as a parameter in string format, which will generate the desired
output. For example, func("Paul") will return "Thanks Paul."
On the other hand, if a developer wants a function to generate a value, he/she
can define it as follows:

1. def square(value):
2. new_value = value ** 2
3. return new_value
4. Output-
5. number = square(4)
6. print(number)
7. 16

Developers can also define default arguments to return the predefined default
values if someone does not assign any value for that argument.
It is normal to wonder why it is important to use functions for machine
learning when using Python. Sometimes, developers may be trying to solve a
problem where they need to analyze different models of machine learning to
achieve better accuracy or any other metric, and then plot their results using
different data visualization packages.
In such situations, they can write the same code multiple times, which is
often frustrating and time-consuming, or simply create a function with certain
parameters for each model and call it whenever needed. Obviously, the last
option is more simple and efficient.
In addition, when a dataset contains tons of information and features, the
amount of work and effort required will also increase. Therefore, after
analyzing and engineering different information and features, developers can
easily define a function that combines different tasks or features and create
plots easily and automatically.
Essentially, when developers define a function, they will drastically reduce
the complexity and length of their code, as well as the time and resources
required to run it efficiently. In other words, it will help them automate the
whole process.
Python also allows for a defined function in the program to call itself, also
known as function recursion. This programming and mathematical concept
allow developers to loop through data to achieve a specific result. However,
they should be very careful with this function because it is possible to code a
function that uses too much processing power and memory, or one that never
terminates. When done properly, however, it can be an elegant and efficient
approach to programming.
Conditional Statements
The world is a complicated place; therefore, there is no reason why coding
should be easy, right? Often, a program needs to choose between different
statements, execute certain statements multiple times, or skip over others to
execute. This is why there is a need for control structures, which direct or
manage the order of statement execution within a program.
To function in the real world, people often have to analyze information or
situations and choose the right action to take based on what they observe and
understand. In the same way, in Python, conditional statements are the tool
developers use to code the decision-making process.
Controlled by IF statements in Python, conditional statements perform
various actions and computations depending on whether a certain constraint
is true or false. Essentially, a conditional statement works as a decision-
making tool and runs the body of code only when it is true. Developers use
this statement when they want to justify one of two conditions.
On the other hand, they use the 'else condition, when they need to judge one
statement based on another. In other words, if one condition is not true, then
there should be a condition justifying this logic. However, sometimes, this
condition might not generate the expected results; instead, it gives the wrong
result due to an error in the program logic, which often happens when
developers need to justify more than two conditions or statements in a
program.
In its most basic form, the IF statement looks as follows:

1. If <expr>:
2. <statement>

In this case, if the first part of the statement is true, then the second part will
execute. On the other hand, if the first one is false, the program will skip the
second one or not execute the statement at all. That said, the colon symbol
following the first part is necessary. However, unlike most other
programming languages, in Python, developers do not need to enclose the
first part of the statement in parentheses.
There are situations where a developer may want to evaluate a particular
condition and do several things if it proves to be true. For example:
If the sun comes out, I will:
1. Take a walk
2. Weed the garden
3. Mow the lawn
If the sun does not come out, then I will not perform any of these functions.
In most other programming languages, the developer will group all three
statements into one block, and, if the first condition returns true, then the
program will execute all three statements in the block. If it returns false,
however, none will execute.
Python, on the other hand, follows the offside rule, which is all about
indentation. Peter J Landin, a computer scientist from Britain, coined this
term taken from the offside law in soccer. The relatively few machine
languages that follow this rule define these compound statements or blocks
using indentation.
In Python, indentation plays an important function, in addition to defining
blocks. Python considers contiguous statements indented to the same level to
constitute the same compound statement. Therefore, the program executes the
whole block if the statement returns true, or skips it if false. In Python, a suite
is a group of statements with the same level of indentation level.
Most other machine languages, on the other hand, use unique tokens to
identify the beginning and end of a block, while others use keywords. Beauty,
however, is in the eye of the beholder. On the one hand, the use of
indentation by Python is consistent, concise, and clean.
On the other hand, in machine languages that do not adhere to the offside
rule, code indentation is independent of code function and block definition.
Therefore, developers can write indent their code in a way that does not
match how it executes, which can create a wrong impression when someone
looks at it. In Python, this type of mistake cannot happen. The use of
indentation to define compound statements forces developers to remain true
to their standards of formatting code.
Sometimes, while evaluating a certain condition, developers might want to
perform a certain function if the condition is true, and perform an alternative
function if it turns out to be false. This is where the 'else' clause comes in. for
example:

1. if <expr>:
2. <statement/statements>
3. else:
4. <statement/statements>

If the first line of the statement returns true, the program executes the first
statement or group of statements and skips the second condition. On the other
hand, if the first condition returns false, the program skips the first condition
and executes the second one. Whatever happens, however, the program
resumes after the second set of conditions.
In any case, indentations define both suits of statements, as discussed above.
As opposed to machine languages that use delimiters, Python uses
indentation; therefore, developers cannot specify an empty block, which is a
good thing.
Other types of conditional statements supported by Python that developers
need to look into include:
1. Python's ternary operator
2. One line IF statements
3. The PASS statement
4. While statement
5. For statement
Conditional statements are important when it comes to writing a more
complex and powerful Python code. These control statements or structures
allow for the facilitation of iteration, which is the execution of a block or
single statement repeatedly.
Loop
Traditionally, developers used loops when they needed to repeat a block of
code a certain number of times. Every important activity in life needs practice
to be perfect. In the same way, machine-learning programs also need
repetition to learn, adapt, and perform the desired functions, which means
looping back over the same code or block of code multiple times.
Python supports several ways of executing loops. While all these ways offer
the same basic functionality, they are different when it comes to their
condition checking time and syntax. The main types of loops in Python are:

1. While loop
2. For in loop

The first type of loop repeats as long as certain conditions return true.
Developers use this loop to execute certain blocks of statements until the
program satisfies certain conditions. When this happens, the line or code
following the look executes. However, this is a never-ending loop unless
forcefully terminated; therefore, developers need to be careful when using it
Fortunately, developers can use a 'break statement' to exit the loop, or a
'continue statement' to skip the current block and return to the original
statement. They can also use the 'else' clause if the 'while' or 'for' statement
fails.
On the other hand, developers use for loops to traverse an array, string, or
list. These loops repeat over a specific sequence of numbers using the
'xrange' or 'range' functions. The difference between these two ranges is that
the second one generates a new sequence with the same range, while the first
one generates an iterator, making it more helpful and efficient.
Other types of loops developers need to look into include:
1. Iterating by the index of elements
2. Using else-statements with for in loops
3. Nested loops
4. Loop control statements
When Guido van Rossum released Python back in 1991, he had no idea that it
would come to be one of the most popular and fastest-growing computer
learning languages on the market. For many developers, it is the perfect
computer language for fast prototyping because of its readability,
adaptability, understandability, and flexibility.
Chapter 5: Artificial Neural Networks
Over time, the human brain adapted and developed as part of the evolution
process. It became better after it modified many characteristics that were
suitable and useful. Some of the qualities that enhanced the brain include
learning ability, fault tolerance, adaptation, massive parallelism, low energy
utilization, distributed representation and computation, generalization ability,
and innate contextual processing of information.
The aim of developing artificial neural networks is the hope that the systems
will have some of these features in their possession. Human beings are
excellent at solving complicated perceptual issues such as identifying a
person in a crowded area from just a glimpse almost instantaneously.
On the other hand, computer systems calculate numerically and manipulate
the related symbols at super-speed rates. This difference shows how the
biological and artificial information processing varies, and a person can study
both systems to learn how to improve artificial neural networks best.
These networks seek to apply some organizational elements that the human
brain uses, such as learning ability and generalization skills, among others.
Thus, it is also essential for a person to have some understanding of
neurophysiology and cognitive science. It will help them to comprehend
artificial neural networks that seek to mirror how the brain works to function
effectively.
Here, we have the information regarding the types, layers, as well as the
advantages and disadvantages of artificial neural networks. Read on to find
out how artificial neural systems function in the machine learning process.
Introduction to Artificial Neural Networks
Artificial neural networks are one of the vital tools that a person uses in
machine learning. They refer to the materially cellular systems or
computational algorithms that can obtain, store, and use experimental
knowledge. These systems show the similarities that exist between the
functioning styles of artificial neural networks and the human information
processing system.
The biological neural networks inspire the design and development of
artificial neural systems. People use artificial neural networks to solve several
issues such as prediction, associative memory, optimization, recognition, and
control. As a result, the neural systems consist of a high number of linked
units of processing which operate together to process information and
produce significant results.
The human brain comprises millions of neurons that process and send cues
by using chemical and electrical signals. Synapses are specific structures that
connect neurons and permit messages to pass between them. The neural
networks form a significant number of simulated neurons. Similarly, artificial
neural systems have a series of interlinked neurons that compute values from
inputs.
They use a vast number of linked processing units, which act as neurons, to
accomplish information processing. The numerous groups of units form the
neural networks in the computational algorithm. These networks comprise
input and output layers, along with hidden layers, which can change the input
into that which an output layer can utilize.
Artificial neural networks are perfect instruments for identifying patterns that
are too complex for a person to retrieve and teach the computer to recognize.
It can also classify a significant amount of data after a programmer carefully
trains it. In this case, deep learning neural networks provide classification.
Afterward, the programmer uses backpropagation to rectify mistakes that
took place in the classification process.
Backpropagation is a method that an individual uses to calculate the gradient
failure or loss function. Deep learning networks provide multilayer systems
that identify and extract certain relevant elements from the inputs and
combine them to produce the final significant output. It uses complex
detection of features to recognize and classify data as needed by the user. The
programmers who train the networks label the outputs before conducting
backpropagation.
Therefore, artificial neural networks are the computational models that use
the biological central nervous system as a blueprint and inspiration.
Understanding neural networks enable a person to discover and comprehend
the similarities that exist. It also allows them to note the differences that exist
between the artificial and biological intelligence systems. They also know
how they can apply the suitable characteristics to improve and optimize
information and function.
Types of Artificial Neural Networks
Artificial Neural Networks are computational algorithms that draw
inspiration from the human brain to operate. Some types of networks employ
mathematical processes along with a group of parameters to establish output.
The different kinds of neural networks are:

1. Kohonen Self Organizing Neural Network

This type of system uses Kohonen's self-organizing neural map to visualize


and classify data of high dimensions. The neural network forms a layout from
the N-dimensional space into the neurons, that is, units place. In this case, the
N-dimensional space is arbitrary. The map maintains the topological
connections of a random dimensional scope. The U-matrix method visualizes
designs or systems in the N-dimensional space. It also identifies irregularities
in the layout that this network provides.
A programmer trains the mapping that the network produces and enables it to
form its arrangement of training or teaching data. During the training process,
the value determines how the weights differ while the neuron stays
unchanged. The first phase of the self-organizing process consists of the
initialization of each neuron value by the input vector and small weight. The
second phase comprises the winning neurons, which are those that are nearest
to a point. Other neurons linked with these winning one also proceed towards
the spot. The Euclidean distance calculates the distance between the neurons
and the point where the minimum distance produces the winner. Recurrence
leads to clustering of the places and every neuron symbolizes every type of
cluster.
A person can apply Kohonen's self-organizing neural network to cluster data
because it can identify the limits between various subgroups of data. One
uses association to categorize new inputs. This neural system also enables a
user to acquire knowledge and analyze exploratory data. One uses it to
identify data and analyze cluster data into various classifications with
accuracy.

2. Feedforward Neural Network – Artificial Neuron

This type is among the most accessible forms of neural networks because the
data moves in a single direction. The data goes through the input nodes and
comes out of the output ones. Feedforward neural network type does not have
backpropagation and only possesses front propagation. As a result, there may
or may not be hidden layers in its possession. It also usually uses an
activation function for classifying purposes.
The system calculates the totality of the products of weights and inputs and
feeds that sum to the output. The output produced is activated or deactivated,
depending on the values on the threshold. This boundary is usually zero, and
if the value is equal to or above one, the neuron discharges an activated
output. On the other hand, if the product is below zero, the neuron does not
release it, and instead, the network sends out the deactivated value.
The Feedforward Neural Network responds to noisy input, and it is easy to
sustain. It exists in speech and vision recognition features of a computer
because it classifies complicated target classes.

3. Radial Basis Function Neural Network

This neural system looks at a point’s distant concerning the center, and it
consists of two layers. The features initially join with the Radial Basis
Function that exists in the inner layer. The system takes into account the
resulting outputs of the elements concerned while processing the same
product in the subsequent step, which is the memory. One can use the
Euclidean method to measure the distances of interest.
This neural network classifies the points into various groups according to the
circle's radius or the highest reach. The closer the point is to the range of the
ring, the more likely the system will categorize that new point into that
particular class. The beta function controls the conversion that sometimes
takes place when there is a switch from one region to another.
The most prominent application of the Radial Function Neural Network is in
Power Restoration Systems, which are enormous and complex. The network
restores power quickly in an orderly manner where urgent clients like
hospitals receive power first, followed by the majority and lastly the common
few. The restoration goes following the arrangement of the power lines and
the customers within the radius of the tracks. The first line will resolve power
outage among the emergency clients, the second one will fix the power for
the majority, and the last one will deal with the remaining few.

4. Recurrent Neural Network – Long Short Term Memory

The Recurrent neural network uses the idea where it saves the output of the
layer and then feeds it back into the input. It seeks to assist in providing
predictions concerning the outcome or result of the stratum. This type of
neural network takes place in some models, such as the text to speech
conversion prototypes. In this case, the system first changes the text into a
phoneme.
A synthesis or combining audio model then converts it into speech. The
initial process used texting as input which resulted in a phonological sound as
the output. The network then reutilized the output phoneme and inputted it
into the system, which produced the outcome that was the speech.
Neurons in Recurrent neural networks perform like memory cells while
carrying out computations. Each neuron remembers some details or data that
it possessed in the preceding time-step. Thus, the more the information passes
from one neuron to the next, the more knowledge a neuron recalls. The
system works using the front propagation and recollects which knowledge it
will require for future use.
In case the prediction that the network makes is wrong, a person utilizes
backpropagation. They correct errors and employ the learning rate to create
small modifications. These alterations will guide the system to make correct
predictions. It is essential also to note that the sum of the features and weights
produce the outputs, which create the first layer of the Recurrent Neural
Network type.

5. Convolutional Neural Network

This neural network type can receive an input image and attribute importance
to different features or items in the representation. Afterward, it can manage
to distinguish one figure from another. Convolutional neural networks require
less pre-processing in comparison to other algorithms used in classification.
The different methods need to utilize filters that a programmer engineers by
hand. In contrast, the deep learning algorithms of this neural network type
have the capability of learning the necessary filters or qualities.
The structure of the Convolutional neural network gets its inspiration from
the formation and structure of the Visual cortex in the brain. Its organization
is similar to the patterns of connection of neurons in the human brain. The
single neurons react to stimuli only in a confined region of the optical area
called the Receptive field. A group of this field covers the whole visual area
by overlapping.
The neural network type, therefore, uses relevant filters to fit the image data
collection better and consequently comprehend more the complexity of an
image. The screens refer to the system taking in the input features in batches
which help it to recall the image in sections and compute processes. The
system carries out appropriate screening and reduces the number of
frameworks concerned as well as enables the recycling of weights. As a
result, the Convolutional neural network type can obtain the Temporal and
Spatial dependencies effectively.
This neural system has biases and learn-able weights in their neurons that
enable a person to use it in the processing of signals and images in the
computer vision area. The computation process includes converting an image
to Gray-scale from a HIS or RGB scale. This conversion changes the value of
the pixel, which in turn assists in identifying the edges. It then enables the
classification of images into various categories. Techniques in computer
vision utilize this neural network significantly because it processes signals
and provides accurate image classification.

6. Modular Neural Network

This neural system type has a group of various networks that work
individually to supply to the output. They do not transmit signals to one
another while carrying out the necessary tasks. Every neural system contains
a collection of inputs that are exclusive to the related network. Thus, the
structures do not interact with each other in any way while they work to
create and perform the assigned sub-duties. They reduce the sophistication of
a process by breaking it down into sub-tasks. It leads to the network
functioning without interacting with one another because the amount of links
between the processes lessens.
As a result, efficiency in the accomplishment of the sub-tasks leads to an
increased speed of the computation process. Nevertheless, the time it takes to
complete the process is contingent on the number of neurons and their
engagement in the computation of results.
Artificial Neural Network Layers
Three layers make up the Artificial Neural Networks. These layers are the
input, hidden, and output ones. Numerous interlinked nodes form each one of
these layers, and the nodes possess an activation function, which is an output
of a specific input.

1. Input Layer
This layer takes in the values of the descriptive qualities for every
observation. It offers the patterns to the system that communicate to the
hidden layers. The nodes in this layer do not modify the data because they are
passive. It duplicates every value that it receives and sends them onto the
hidden nodes. Additionally, the number of descriptive variables is the same
as the number of nodes present in this layer.

2. Hidden Layer

This layer utilizes a structure of weighted links to process values. It


multiplies the values getting into the stratum by weights and then adds the
weighted inputs to obtain a single number. The weighting, in this case, refers
to a group of preset numbers that the program stores. The hidden layer
implements particular conversions to the input values in the system.

3. Output Layer

This layer obtains links from the input layer or the hidden layer and provides
an output that matches the prediction of the feedback variable. The selection
of suitable weights leads to the layer producing relevant manipulation of data.
In this layer, the active nodes merge and modify the data to generate the
output value.
Conclusively, the input layer includes input neurons that convey information
to the hidden layer. The hidden layer then passes the data on to the output
layer. Synapses in a neural network are the flexible framework or boundaries
that transform a neural system into a parameterized network. Hence, every
neuron in the neural network contains balanced inputs, activation functions,
and an output. The weighted data represent the synapses while the activation
function establishes the production in a particular entry.
Advantages and Disadvantages of Neural Networks
There are some advantages and disadvantages to using Artificial Neural
Networks. Understanding them can enable one to comprehend better the
operations and shortcomings involved in the neural networks.
Advantages of Artificial Neural Networks
The following are the advantages of Artificial Neural Networks, which help a
person to learn the benefits of using the neural systems.
a) Ability to Make Machine Learning – Artificial neural systems learn
circumstances and events, and decide by remarking on similar
occurrences.
b) Ability to Work with Incomplete Knowledge – Proper training
enables the network to provide output even when the information is
insufficient or incomplete.
c) Having a Distributed Memory – A programmer uses examples to
teach the neural network and get it to produce desired outputs. They
train the neural system by the desired outcome by utilizing as many
details as possible. The example comprises of all the information and
sections necessary to ensure that the network will learn the most. This
training provides the system with a memory that has different relevant
details that will allow for the production of suitable outputs. The better
the example in the training process is, the less likely the neural
network is to provide false outputs.
d) Parallel Processing Capability – The networks can carry out several
jobs at the same time due to their numerical advantage. They process
several numerics simultaneously and at a high speed.
e) Storing Information on the Entire Network – The system stores
information on the whole network and not in databases. It enables the
system to continue operating even in cases where a section loses some
information details.
f) Having Fault Tolerance – The network is fault-tolerant in that it can
carry on providing output even in situations where one or more cells
decay.
g) Gradual Corruption – Degradation in the network does not take place
abruptly and instantly. It occurs slowly and progressively over a
period, which allows one to use the system despite the corrosion still.
Disadvantages of Artificial Neural Networks
Below are a few difficulties associated with Artificial Neural Networks. They
indicate the downside of a person utilizing the system.
a) The difficulty of showing the Issue to the Network – This neural
system only deals with information in numerical form. The user must,
therefore, convert problems in numerical values to utilize the network.
The user may find it challenging to transform the display mechanism
as needed.
b) Dependence on Hardware – The neural networks depend on hardware
with parallel processing ability, without which they cannot function.
c) The Duration of the Network is Unidentified – The system signals the
completion of training when the network drops to a particular error
value on the example. The estimate does not represent the optimal
results.
d) Unexplained Behavior of the Network – There is decreased trust in
the network because it provides solutions for an inquisition without an
accompanying explanation.
e) Determining the Proper Network Structure – Relevant neural
systems form through experiments and experiences. There is no set
rule for establishing an appropriate network structure.
In conclusion, Artificial Neural Networks are a crucial tool in Artificial
Intelligence and machine learning. A person needs to understand what the
neural system means and how it functions. The types of networks indicate
how the network operates in different circumstances while the layers show
how data moves in this neural system. The advantages and disadvantages also
inform a user about the highs and lows of the networks. The information
above provides significant knowledge that a person can use to learn about
Artificial Neural Networks. Subsequently, they can employ it in their
application of the systems.
Chapter 6: Machine Learning Classification
What Is Machine Learning Classification?
Machine learning classification is a concept in which the machine learns new
information from any further data provided to it. In addition, the device uses
the knowledge it obtains to classify a new similar observation. This practice
is also known as machine training at the computer development stage. Only
data sets and algorithms will be in use in the preparation and not the finalized
product models.
More so, this machine learning technique is carried out under a supervised
learning approach where the input data is already labeled. Due to labeling, the
essential features will be put into separate classes before learning. As a result,
the machine-learning network will know which areas of the input features are
vital, and it can compare itself to the ground truth for comparison.
On the other hand, unsupervised learning involves the machine trying to sort
out the importance of the features by itself. It does this through multiple
observations since the input data is unlabeled. Your machine learning
classification will only apply to the supervised approach.
Types Of Classifiers In Python Machine Learning
When you get involved in machine learning, it is crucial to understand the
success of your product model, depends on rigorous training and testing
procedures. When training a model, give it a set of data inputs called features.
Let the model learn patterns and extract relevant data sets. The model then
predicts an output called labels. During training, the model will depend on
both the feature and label data sets.
However, during testing, the model will only be tested on a different feature
data set. You want to determine whether your model has indeed undergone
machine learning and can predict an outcome by itself. The model will have
already learned from previous training datasets and exposed to the standard
features and labels in training. Hence, the need to apply a new feature data set
only. No label data set is needed during testing. You will conduct this
seemingly one-sided test to maintain integrity and eliminate bias.
The best approach to applying python machine learning is to try out multiple
classifiers. Depending on your specific algorithm, you can test each model
type. After getting an idea of how each type of classifier performs, you can
then choose your best option. The following are the types of classifiers in
python machine learning:
K-Nearest Neighbours - This classifier is an algorithm that selects the model
that provides the nearest parameters to the target result. You run a handful of
tests on the different algorithms or models. All along, you have the correct
known value of the parameter under testing. This type is a training
methodology even though you will be testing a model's proximity or
deviation from perceived value. This proximity determination will enable you
to select data sets that are more likely to learn faster and memorize better.
The model or algorithm that results in a value that is closest to the correct
known amount is the data set that is selected.
Support Vector Machines - This machine training method works by getting
a bunch of data points and then drawing a line between the different data
points. Where these data points end up landing will determine their respective
classes. Now, the position of the classifier is to try to maximize its associated
data point distance from the vector line. The longer the distance from the
vector, the stronger the model's confidence in belonging to a particular class.
The machine learns this algorithm based on estimating the data point distance
from the vector line. Knowing this algorithm, the model will try as much as
possible to distance its data point from the specific vector line. Success is
imminent when the model succeeds in maximizing this distance. This
classifier depends on random data points as inputs and a predetermined class
preference as the outcome.
Decision Tree Classifiers/Random Forests - In this training technique, your
objective is to break down your model's data sets into smaller and smaller
subsets. You will keep this process going on using different and various
criteria until you will have exhausted all the approaches available. The model
is trained to divide data sets from a more significant set up until it cannot
break down the subsets any longer. The model then remains with only one
indivisible unit. This unit then forms a key. This training method takes a
pictorial example of a large tree being broken down into its smallest parts.
When you link a range of decision tree classifiers, you get random forest
classifiers.
Naïve Bayes - This classifier is a training method that uses probability as its
mathematical notation. Your model predicts the likelihood of an event based
on a range of independent predictors. The assumption is that all these
predictors have a similar effect on the overall outcome. Naïve Bayes is a
probability training, and the performance of your model will give you a more
honest indication of the model's decision-making process.
The result is independent of your data inputs as most training scenarios. The
model has to make the most appropriate and most correct outcome, given the
limited amount of feedback provided. The best description of this classifier
could be a best-case scenario or model's hunch depending on the likelihood
of a predicted outcome.
Linear Discriminant Analysis - This classifier is best applied to machine
training when the data sets in use have a linear relationship. All the possible
dimensions in the data sets decrease to the smallest size reasonable: a line.
Once all the data sets get into a direct form, a center point is determined. In
addition, the data sets are grouped into classes based on their distances from
this point. Any other criterion may be used to determine the grouping
classification. Maybe, the lasses could depend on regular intervals between
successive data sets.
Another approach used may be arranging the data sets in an ascending or
descending order, and then grouping the data sets into classes based on size
factor or numerical value. You could even train your model to group your
liner data sets into even and odd numbers as a criterion. As you can see, there
are various ways you could reduce the dimensionality of your data. The data
sets are only limited to the number of mathematical notations you can use.
Logistic Regression - Just like in the linear discriminant analysis, this
training model is also a linear classifier that uses the direct relationships
between the data sets. However, unlike the linear discriminant analysis, this
logistic regression will only deal with binary data: zero or one. The data sets
are not as unlimited as before. Both the input data and the predicted outcome
are strictly binary. Any numerical value of less than 0.5 will be expressed as
zero, while higher benefits as 0.5 classify as class 1.
This classifier only has two levels, both in its inputs (features) and outputs
(labels). This training approach limits the model's decision-making process to
an either-or proposition, especially when you present it with insufficient data.
The model will have to learn how to pick the best available scenario from a
bunch of inappropriate choices. It is akin to learning morality in humans, i.e.,
asking yourself, which is the better of two bad options?
Once you have determined your classifier, the next step is for you to
implement your preferred classifier. During implementation, you need to
transfer your specific classifier into the python system. The application will
be followed by instantiation, which typically involves creating a variable and
calling a particular function associated with your classifier. Doing this
instantiation procedure will eventually set your specific classifier into an
existent form in the python program. Training using features and labels data
sets will enable appropriate prediction. In addition, you will well be on your
way to developing a functional artificial intelligence model.
Machine learning classification models
Data pre-processing - This activity is the very first step when embarking on
a machine learning endeavor. The data meant for the classifier has to be
pristine, checked, and rechecked. Any anomalies, minor errors, or
mishandling of the data will lead to irregularities. Your work will have
suffered an early setback before it has even begun. It is therefore vital to
monitor the data handling procedures strictly to avoid any negative impact on
the performance of your classifiers. Remember, garbage in, garbage out. The
classifier will only give you rubbish outcomes when you input rubbish data in
the first place.
You are creating training and testing sets - Once you are satisfied with the
purity and pristineness of your data pre-processing, you can then create your
training and testing data sets. These data sets are essential as a progress-
monitoring tool for your classifiers. Useful data prepared from the previous
stage will, in turn, give you a proper collection of data that you can classify
into training and testing sets. Your input data sets are features while the
classifier outputs are labels. You will use the training sets to determine how
the level of accuracy of your classifier to your known predetermined set
labels.
Repeated training will improve the predictability rate of your classifier.
Regular high ranges of predictions will eventually reach the point of carrying
out a test run. Remember, testing is different from training since the classifier
has never interacted with the test data. There will not be any labels, only
features that have never been present before. This time is typically a nervous
period in the project process. The test feature data set should be trustworthy
since both training and test data sets went through similar pre-processing. If
the test data proves to be good, then the test result will pass as expected.
You are instantiating the classifier - As explained during the python
classification stage, instantiating the classifier involves bringing your
classifier to life as presented on a python program. This process is vital as it
gives you a pictorial representation of a bunch of data sets, your seemingly
random algorithms, and your applied mathematical notations. You can now
finally visualize these formerly abstract concepts. You get to appreciate the
value of the work you have achieved so far. You are now ready and filled
with confidence in your probability of success in your next stage.
You are training the classifier - Training is a learning process for your
classifier. Training involves taking an estimate of how accurate your
classifier is to the actual correct parameters. During training, the classifier is
made aware of both the features and labels since this approach is a form of
supervised practice. You need your classifier to observe, memorize,
internalize, process, and finally give a prediction. You aim to have a
prediction that is as close as possible to your already known outcome.
Training is an extensive and intensive process that involves various data sets
and algorithms. You want to increase the probability of a correct prediction.
Appropriate features are provided to the classifier and depending on its
previous encounters with similar situations, the classifier's ability to predict is
thus improved. This improvement in prediction is a sign that the training
process is working. Success in training is imminent when the classifier
achieves a series of very high forecasts continuously.
You are making predictions - Training your classifier and making
predictions go hand in hand. The purpose of training is to achieve the highest
prediction possible by your classifier. Making a prediction is a feature that
indicates the classifier's ability to process feature data sets. In addition, make
the best decision to provide labels that are considering the right outcome. If
the classifier can maintain high prediction scores, then it will have taken a
massive step towards self-awareness and a capacity for intelligent thoughts.
A high prediction score often warrants testing before possible deployment
into the market.
You are evaluating performance - Evaluating your classifier's performance
is just a confirmatory stage to prove whether your classifier can do what it
claims. This stage is the testing stage. Now, unlike during training, where the
classifier was aware of both the features and expected labels, this stage is
different. Testing is meant to eliminate bias. Different feature data sets will
be in use. The classifier in the course of training and prediction will never
have experienced these features before.
Furthermore, testing does not have a predictable label for the classifier to aim
for and achieve. The training was dependent on your classifier's high
prediction scores. Testing is open and meant to mimic the real world as much
as possible. However, due to the intensity of training, you subjected to the
classifier and the various types of machine learning classification based on
python, your classifier is bound to pass this test. Although it may be highly
improbable to meet all of your expectations on the first test run, some little
elements of tweaking may be warranted.
You are tweaking parameters - After performing the test run and achieving
a certain level of success, you might want to tweak specific settings that
affect particular aspects of your classifier. During testing, it is likely that your
classifier performed excellently in certain areas but not so much in others.
Tweaking is a way to improve performance on the parameters that may be
lacking in a perfect performance.
You should take note that you do not end up overhauling the whole machine
in a vain attempt to achieve perfection. This vanity is highly discouraged as
nothing will ever be perfect enough, at least no more accurate than humans
themselves will. You should be proud of your achievements and alter only
the bare minimum that deserves this tweaking exercise.
Metrics for evaluating machine learning classification models
These metrics describe the available methods of measuring the performance
of your classifier. They include the following:

I. Classification accuracy

This method is the most common method and widely used metric for
measuring the performance of your classifier. This metric is the simplest to
use since it uses the fraction of your overall numerical predictions that turned
out to be correct. It is a function of reliability but only as a factor of repeated
speculation runs, i.e., the more tests you run, the closer you get to an accurate
value. This metric depends on constant practice over time to get better at your
accuracy. This evaluation method is useful in comparing roughly equivalent
sets of observations since it gives you a quick idea of how your classifier is
performing.

II. Logarithmic Loss

Logarithmic loss evaluation short-form is Logloss. This evaluation technique


uses prediction values of zero to one to determine the level of confidence in
your classifier. A probability-based system evaluates your classifier's
confidence in its predicting accuracy. A zero score indicates no faith in your
classifier's ability to achieve the prediction. A score of one is complete
confidence in your classifier's probability in hitting its prediction value. The
Logloss mechanism primarily evaluates classifiers based on faith in their
respective possibilities. You may result in a paradoxical evaluation, such as
having high confidence in a low probability of success.

III. AUC – Area Under ROC Curve

This evaluation method is useful for binary classifications for rating your
ideal prediction. This method is also a form of a probability curve for various
or different classes of data sets. This difference in data classes is strictly
binary, as well. AUC uses the well-known binary principle: if one of two
possibilities is a good indicator, then the other will invariably be the opposite.
This technique indicates how good your classifier is for differentiating
between the available classes. Remember that although this distinction will
be in terms of predicted probability, the results can only be binary. For your
information, ROC is an acronym that spells receiving operating
characteristics. Most AUC graphs typically have prediction indicators on
both the horizontal and vertical axes.
Additionally, the two axes go only up to a maximum of 1 in value. When
plotting your values, the resulting graph will be your ROC. Your AUC is
your graphical region covered between your ROC and the horizontal axis.
The larger the area covered, the better your classifier is at distinguishing the
particular classes. The ideal value is one as it indicates that your classifier is
perfect. An AUC of 0.5 is just as good as a random hunch.

IV. Confusion matrix

Just like the AUC, a confusion matrix also uses graphical representation to
conduct your evaluation. However, unlike AUC, a confusion matrix is not
strictly based on binary variables as the only possible outcomes. A confusion
matrix makes use of a chart, a table, or a graph to interpret the relationship
between your classifier and its corresponding output graphically. You will
notice that the confusion matrix has an inverse relationship between your
classifier and your expected outcome.
This evaluation method comprises a graph with predictions indicated on the
X-axis and results (prediction accuracy) are shown on the Y-axis. Your
realization visualizes the inverse relationship that correct predictions will lie
at a point roughly on the right-sided downward trajectory. A high prediction
value within your confusion matrix, the better it is for you since it means that
your classifier made many right predictions. Such a high yield prediction
value from your confusion matrix typically indicates a high potential for
success in your classifier project and possible future intelligence models.
This evaluation technique is rare since its interpretation can sometimes be
confusing and misleading. A misinterpreted evaluation metric may lead to
often-disastrous decisions down the line. Hence, many machine-learning
enthusiasts typically prefer evaluations that easily catch your eye. In addition,
such a metric should be interpretable at a glance. An example of such a
simple, straightforward metric is a classification accuracy ratio based on your
predictions.

V. Classification report

This metric is essentially a report, which provides you with a brief intuition
of the progress of your classifier or model. This particular technique was built
to solve most of the classification difficulties encountered in machine
learning. This evaluation metric makes use of terms such as recall, f1-score,
and precision.
A recall is the report of your proximity or discrepancy in data set classes
categorized as a specific class against the correct actual number classified
within that particular class. Precision has a close relationship to recall since it
indicates your accurate estimation of one particular data type, and it turned
out to be correct. This ratio is a reflection of your true positives against false
positives. The mean point definitions between precision and recall is an f1-
score.
Chapter 7: Machine Learning Training Model
In machine learning, a model is a mathematical or digital representation of a
real-world process. To build a good Machine Learning (ML) model,
developers need to provide the right training data to an algorithm. An
algorithm, on the other hand, is a hypothetical set taken before training
begins with real-world data.
A linear regression algorithm, for example, is a set of functions defining
similar characteristics or features as defined by linear regression. Developers
choose the function that fits most of the training data from a set or group of
functions. The process of training for machine learning involves providing an
algorithm with training data.
The basic purpose of creating any ML model is to expose it to a lot of input,
as well as the output applicable to it, allowing it to analyze this data and use it
to determine the relationship between it and the results. For example, if a
person wants to decide whether to carry an umbrella or not depending on the
weather, he/she will need to look at the weather conditions, which, in this
case, is the training data.
Professional data scientists spend more of their time and effort on the steps
preceding the following processes:
1. Data exploration
2. Data cleaning
3. Engineering new features
Simple Machine Training Model in Python
When it comes to machine learning, having the right data is more important
than having the ability to write a fancy algorithm. A good modeling process
will protect against over-fitting and maximize performance. In machine
learning, data is a limited resource, which developers should spend doing the
following:

1. Feeding their algorithm or training their model


2. Testing their model

However, they cannot reuse the same data to perform both functions. If they
do this, they could over-fit their model and they would not even know. The
effectiveness of a model depends on its ability to predict unseen or new data;
therefore, it is important to have separate training and test different sections
of the dataset. The primary aim of using training sets is to fit and fine-tune
one's model. Test sets, on the other hand, are new datasets for the evaluation
of one's model.
Before doing anything else, it is important to split data to get the best
estimates of the model's performance. After doing this, one should avoid
touching the test sets until one is ready to choose the final model. Comparing
training versus test performance allows developers to avoid over-fitting. If a
model's performance is adequate or exceptional on the training data but
inadequate on the test data, then the model has this problem.
In the field of machine learning, over-fitting is one of the most important
considerations. It describes how well the target function's approximation
correlates with the training data provided. It happens when the training data
provided has a high signal to noise ratio, which will lead to poor predictions.
Essentially, an ML model is over-fitting if it fits the training data
exceptionally well while generalizing new data poorly. Developers overcome
this problem by creating a penalty on the model's parameters, thereby
limiting the model's freedom.
When professionals talk about tuning models in machine learning, they
usually mean working on hyper-parameters. In machine learning, there are
two main types of parameters, i.e., model parameters and hyper-parameters.
The first type defines individual models and is a learned attribute, such as
decision tree locations and regression coefficients.
The second type, however, defines higher-level settings for machine learning
algorithms, such as the number of trees in a random forest algorithm or the
strength of the penalty used in regression algorithms.
The process of training a machine-learning model involves providing an
algorithm with training data. The term machine-learning model refers to the
model artifact created by the ML training process. This data should contain
the right answer, known as the target attribute. The algorithm looks for
patterns in the data that point to the answer it wants to predict and creates a
model that captures these different patterns.
Developers can use machine-learning models to generate predictions on new
data for which they do not know the target attributes. Supposing a developer
wanted to train a model to predict whether an email is legitimate or spam, for
example, he/she would give it training data containing emails with known
labels that define the emails as either spam or not spam. Using this data to
train the model will result in it trying to predict whether a new email is
legitimate or spam.
Simple ML Python Model using Linear Regression
When it comes to building a simple ML model in Python, beginners need to
download and install sci-kit-learn, an open-source Python library with a wide
variety of visualization, cross-validation, pre-processing, and machine
learning algorithms using a unified user-interface. It offers easy-to-
understand and use functions designed to save a significant amount of time
and effort. Developers also need to have Python Version 3 installed in their
system.
Some of the most important features of sci-kit-learn include;
1. Efficient and easy-to-use tools for data analysis and data mining
2. BSD license
3. Reusable in many different contexts and highly accessible
4. Built on the top of matplotlib, SciPy, and NumPy
5. Functionality for companion tasks
6. Excellent documentation
7. Tuning parameters with sensible defaults
8. User-interface supporting various ML models
Before installing this library, users need to have SciPy and NumPy installed.
If they already have a data set, they need to split it into training data, testing
data, and validation data. However, in this example, they are creating their
own training set, which will contain both the input and desired output values
of the data set they want to use to train their model. To load an external
dataset, they can use the Panda library, which will allow them to easily load
and manipulate datasets.
Their input data will consist of random integer values, which will generate a
random integer N; for instance, a <= N <= b. As such, they will create a
function that will determine the output. Recall a function uses some input
value to return some output value. Having created their training set, they will
split each row into an input training set and its related output training set,
resulting in two lists of all inputs and their corresponding outputs.
Benefits of splitting datasets include:
1. Gaining the ability to train and test the model on different types of data
than the data used for training
2. Testing the model's accuracy, which is better than testing the accuracy
of out-of-sample training
3. Ability to evaluate predictions using response values for the test
datasets
They will then use the linear regression method from Python's sci-kit-learn
library to create and train their model, which will try to imitate the function
they created for the ML training dataset. At this point, they will need to
determine whether their model can imitate the programmed function and
generate the correct answer or accurate prediction.
Here, the ML model analyzes the training data and uses it to calculate the
coefficients or weights to assign to the inputs to return the right outputs. By
providing it with the right test data, the model will arrive at the correct
answer.
Chapter 8: Developing a Machine Learning Model
with Python
The benefits of machine learning are the models that make predictions and
the predictions themselves. To succeed in this field, developers need to
understand how to deliver accurate, reliable, and consistent predictions,
which requires a systematic approach to building an ML model. Essentially,
it depends on the mathematics applied in harnessing and quantifying
uncertainty.
The main steps involved in developing a powerful machine-learning model
are:

1. Define the problem to solve or objectives to meet


2. Adjusting one's mindset
3. Choosing a systematic process
4. Picking a tool, such as Python
5. Choosing the right datasets to work on and practicing the process
6. Building an ML portfolio

Python is one of the fastest-growing and most popular ML platforms. When it


comes to developing an ML model with Python, the first step is to define
one's problem or objectives. Using the fanciest and most powerful algorithms
is meaningless if one is focusing on the wrong problem. Therefore, it is
important to think deeply about one's problem or objectives before starting.
This is one of the most important steps when it comes to developing a good
ML model.
When defining a new problem, one needs to use a simple framework that will
help one understand the motivation and elements of the problem. This
involves answering a few important questions, such as:

1. What is the main problem or objective?


2. Why do I need to solve this problem?
3. How should I go about solving it?

Describing the problem as though one were describing it to a friend is a good


way to begin. Doing this will help one identify areas that one needs to focus
on or address. In addition, it will provide the foundation for a specific
sentence description one will use to analyze the problem. For example, "I
need a model that will identify which emails need a response."
Having described the problem, one needs to think about why one needs to
solve it, which means considering the motivation for solving it, the benefits
of solving it, and the possible solutions, including the type of lifetime one
expects from the chosen solution.
The final step is to determine how one would solve the problem or achieve
one's objectives manually. This involves identifying the data one needs to
gather, how to prepare it, and how to design a model to load the data and
solve the problem. This step may include experiments and prototypes one
will need to use.
They are helpful because they can help identify uncertainties and knowledge
about a potentially useful domain to explore. They can also help identify
problems that simply need a manually implemented solution, in addition to
uncovering hidden but valuable knowledge.
Installing the Python and SciPy Platforms
People typically install python packages from either PyPI or Conda package
repositories. Every version of Python comes with its own set of packages;
however, it is usually good practice to isolate them within a Python
environment using the reticulate package so than updating a package for one
project does not affect other projects. Reticulate functions, on the other hand,
create Python environments and create packages within them.
Many professionals recommend using the Anaconda Python distribution to
install Python. For computational modeling and scientific computing,
developers need additional packages that are not available in the standard
Python library. These packages will allow them to use specialized numerical
methods and create plots. These packages are numpy, pandas, scipy, and
matplotlib, and are the building blocks for computational work with Python.
Anaconda is one of the most popular Python distributions, providing the
Python interpreter and a list of packages and applications. Generally,
installing Python is easy; however, installing additional packages can be a bit
difficult. The Anaconda Python distribution is available for free download for
Linux operation systems, OS X, and Windows. For OS X and Windows,
users can download either the next based installer or the graphical installer.
For beginners, the graphical installer is a better choice.
Once they have installed the Python distribution of their choice, users should
download a testing program, such as Spyder, and execute it. If it produces the
desired results, chances are that the installation of Python and other packages,
including SciPy, was successful.
SciPy, or scientific Python, provide useful functionalities. It is an open-
source Python-based library used in scientific computing, mathematics,
technical computing, and engineering. It is available for download from
www.scipy.org/download. Once downloaded, users simply need to unpack,
compile, and install it using a custom installation path.
It contains many different sub-packages, including image manipulation; file
input and output, signal processing, special function, numerical integration,
linear algebra operation, statistics, and random numbers, and others.
Loading the Dataset
Before a developer can begin an ML project, he/she must load his/her dataset.
The most common format for ML data is CSV files. There are several
considerations when it comes to loading an ML dataset from these types of
files. The first consideration is the file header. If the data lacks a file header,
one may need to name each file attribute manually. However, if it has a file
header, it will help in naming each column of data. In both cases, one needs
to specify whether one's data has a file header before loading.
It is also important to consider whether the CSV file contains comments,
usually indicated using a hash sign at the start of a line. If it has comments,
depending on the method used to load the dataset, one may have to indicate
the characters to expect to identify a comment line. Other considerations
when it comes to loading CSV files are quotes and delimiters.
Each process for loading ML datasets is standalone, which means users have
the ability to copy and paste it into their projects. Python's standard library,
for example, provides the function reader and the module CSV to load CSV
files. Once loaded, users can convert their datasets to a NumPy array and
employ it for machine learning.
Another way to load datasets is by using the NumPy and numpy.loadtxt
function, which assumes that all data has the same format, and there is no
header row. Developers can also load their CSV files using Pandas, which is
an easy and flexible process that returns a data-frame that they can
immediately begin plotting and summarizing.
ML algorithms learn from data; therefore, it is important to provide them
with the right data for the problem one wants to solve. It is also important to
ensure the data is in a useful format and scale, in addition to having
meaningful features. The more disciplined users are in their handling of ML
datasets, the more consistent and accurate predictions they are likely to
achieve. The process of preparing data for an ML algorithm involves three
main steps, which are:

1. Data selection
2. Data preprocessing
3. Data transformation

The first step involves choosing the subset of all available data; however, the
data selected should address the problem one wants to solve. The second step
is about considering how one will use the data and getting it into a workable
format. This involves three common preprocessing steps, which are
formatting, cleaning, and sampling the data.
The third step is to transform the preprocessed data. Here, knowledge of the
problem at hand and the specific algorithm one is working with will have an
influence on this step. The main processes involved in this step are scaling,
decomposition of attributes, and attribute aggregation.
Data preparation is one of the most important processes in an ML project. It
involves a lot of analysis, exploration, and iterations; however, learning how
to do it properly will help one succeed in this field.
Summarizing the Dataset
People today have access to tons of information due to the rise of the internet.
In fact, this information bombards people from different sources, such as
social media, news media, emails, and many other sources. People wish there
was someone to summarize the most important information for them.
Actually, machine learning is getting there.
Using the latest technological advances in the field of ML, developers can
now group large datasets by different variables and use summary functions
on each group. Python's rich Pandas library, for example, is an excellent
analysis tool designed to help developers summarize their datasets.
To demonstrate the simplicity and effectiveness of such an analysis tool,
assume one has a dataset containing mobile phone usage records will 1,000
entries from the phone log spanning 6 months. One can simply load a CSV
file containing this data using Panda's DataFrame function, which will create
several columns in the file, including date, duration, item, month, network,
and network type.
Having loaded this data into Python, the calculation of different statistics for
columns will be very simple. These calculations might include standard
deviations, minimum, maximum, mean, and more. For example:
1. How many rows are in the dataset
2. Data['item'].count()
3. Out[40]:1000
Unless one has very specific requirements, the need to create and use custom
functions is minimal or not necessary. The standard Pandas package comes
with a wide range of quickly calculable basic statistics, including sum, count,
median, mean, min, max, prod, abs, mode, skew, var, std, cunsum, sem,
quantile, and more. Developers can quickly apply these functions to gain
summary data for each group, which is a very helpful function.
In addition, they can include or exclude different variables from each
summary requirement, or combine more than one variable to perform more
detailed and complex queries.
Visualizing the Dataset
Sometimes, data might fail to make sense until one uses a visual form to look
at it; for example, using plots and charts. Having the ability to visualize a
dataset is a critical skill in both machine learning and applied statistics. Data
visualization provides useful tools for gaining a deeper understanding, which
is important when it comes to exploring and understanding a dataset. It is also
useful in identifying outliers, corrupt data, patterns, and more.
Developers can use data visualizations, with a little domain knowledge, to
identify and show important relationships in charts and plots to stakeholders.
In fact, data visualization is a whole field in itself. To understand the basics
of data visualization, developers need to explore and understand several key
plots, including:
1. Line plot
2. Scatter plot
3. Bar chart
4. Whisker and box plot
5. Histogram plot
Understanding these plots will help them gain a qualitative understanding of
any data they come across. Python offers many useful plotting libraries, and
it is important to explore them to learn how to build effective visual graphics.
Developers can use the matplotlib library, for example, to create quick plots
meant for their own use. This library acts as the foundation for other more
complex libraries and plotting support.

Line Plot

A line plot provides observations gathered at regular intervals, with the x-axis
representing the interval, such as time, and the y-axis representing certain
observations connected to the line and ordered by the x-axis. Developers can
create this type of visualization plot by using the 'plot()' function and
providing the interval data and observations. It is helpful when it comes to
presenting any sequence data, such as time-series data, where there is a
connection between observations.

Scatter Plot

This visualization plot summarizes the relationship between two combined


data samples. Essentially, combined data samples are two different measures
of a particular observation, such as the height and weight of an individual. In
a scatter plot, the x-axis represents the observation values of one sample,
while the y-axis represents those of the other sample. This type of plot can
help developers identify and quantify the relationship between two variables.

Bar Chart

This tool is ideal for presenting relative quantities for many categories. A bar
chart usually features an e-axis with evenly spaced categories, and a y-axis
representing each category's quality. Developers use the bar() function to
create this chart and pass the qualities for the y-axis and category
names/labels for the x-axis. This type of visualization tool is helpful for
comparing estimations and multiple point qualities.

Whisker and Box Plot

Also called boxplot, in short, this visualization tool aims to summarize the
distribution of a data sample. The data sample appears on the x-axis, which
can include many boxplots drawn side by side. The y-axis, on the other hand,
features observation values. Essentially, this type of plot shows the
distribution of variables to help developers see the outlying points, spread,
location, tile length, and skewness of sample data to provide an idea of the
range of sensible and common values in the whisker and in the box
respectively.

Histogram Plot

This type of plot is ideal for summarizing the distribution of a sample dataset.
Its x-axis represents intervals or discrete bins for different observations, and
its y-axis represents count or frequency of the observations belonging to each
interval. In other words, this visual plot is a density estimate that transforms
sample data into a bar chart to provide a clear impression of the distribution
of data.
Evaluating some Algorithms
Once developers have defined their problem and prepared their datasets, they
need to use ML algorithms to solve their problems. They can spend most of
their time selecting, running and tuning their algorithms; however, they
should ensure they are using their time effectively to improve their chances of
meeting their goals.

Test Harness

To evaluate some algorithms, developers need to define a test harness, which


refers to the data they will use to train and test their algorithm, as well as the
measures they will employ to assess its performance. Using a well-defined
test harness will help them think deeply about their specific problems and
focus on evaluating various algorithms against a fair representation of the
problem they are facing.
The outcome of doing this will be an estimation of how different algorithms
perform based on a specific performance measure. They will be able to
identify algorithms that might be worth tuning further and those they need to
set aside. The results produced by their test harness will also give them an
idea of how learnable their problem really is, especially if different
algorithms perform poorly on their specific problem, which might point to
the use of a poor ML structure.

Performance Measure

This is the measurement of an algorithm's predictions based on the test data.


Usually, performance measures focus on the class of problem one is working
with, such as clustering, regression, and classification. Most performance
measures provide a meaningful score to one's problem domain, such as the
accuracy of classification.
Other common evaluation techniques include:

1. Cross-validation
2. Test and train datasets

Evaluating some algorithms is useful because it helps developers to


determine whether there are any learnable structures for their model in the
estimate and data, which may be effective in solving their problem. It also
helps them work out any difficulties and ensure the achievement of the
desired results or performance.
Making some Predictions
After evaluating some algorithms, it is now time to use the model and make
some predictions. Developers may need to revisit their reason for building the
algorithm and remind themselves of the form they want their solution to take.
It is important to continue to test the model to improve its performance by
doing the following:
1. Revisiting the original forward selection process
2. Selecting important features and creating new ones to determine their
correlation
3. Feeding the model with data to check the results and performance
4. Keeping track of any features that might improve the model, and get rid
of any features that do not improve the model
Predictions in machine learning refer to the output of algorithms trained on
historical data and provided new data to forecast the probability of a certain
outcome. In the process of making predictions, algorithms should provide
values for an unknown variable based on new data, allowing developers to
determine what the values will likely be.
ML model predictions allow users to make highly accurate guesses as to the
probable answers or outcomes of a question based on old data. They provide
users with insights that lead to tangible value. For example, if an ML model
predicts the likelihood of a consumer making a purchase, the business
concerned will target him/her with specific communication and information
to encourage him/her to make the purchase.
In order to begin making predictions, users need to deploy their models into
production or operational application. Developers could simply install their
model into their software and release it; however, this is not advisable. By
adding a few simple steps, they can build confidence that the model they
want to deploy remains accurate and maintainable over the long-term. Some
of the steps to take include:

1. Specifying performance requirements


2. Separating the predictive algorithm from the model internals
3. Developing automated tests for the model
4. Developing current testing and back-testing infrastructure
5. Updating the model

Developers use the final ML model to make predictions on new data.


Essentially, it is the pinnacle of the entire process. Developers pretend the test
data is new and withhold the output values from the algorithm. They then
collect predictions from the final model and compare them with the withheld
output values to help them determine the model's performance. If everything
goes well, they apply the ML procedure to all their data. Essentially, with the
finalized model, they can make predictions on new data or save it for
operational or later use.
When many beginners start looking into machine learning, often, they have
no idea about what they are reading. This is because most articles, blogs, and
other resources use scary equations and weird jargon that is difficult to figure
out. However, with time, patience, and effort, people can gain adequate
knowledge about this amazing and exciting field of data science. Anyone can
do it.
Chapter 9: Training Simple Machine Learning
Algorithms for Classification
People are living in one of the most defining periods of human history.
However, it is not what is happening now that makes the period so exciting;
rather, it is what is coming in the near future. Today, data scientists and
developers can use complex algorithms to build powerful data-crunching
machines that would have seemed like something out of science fiction only
a few years ago.
There are two ways to group the main algorithms, these are:
1. Grouping them by their similarity in function or form
2. Grouping them by their learning style
Grouping algorithms by their similarity in terms of how they work is a useful
method of categorization. However, it is important to understand that certain
algorithms can easily fit into several different categories.
On the other hand, it is common in artificial intelligence and machine
learning to first look at the different learning styles that an algorithm can use.
Algorithms have a small number of machine-learning models and learning
styles. This way of grouping them is useful because it forces developers to
consider the model preparation process and the roles of the input data to find
the most appropriate option needed to achieve the best results. The main
learning styles in algorithms include supervised learning, unsupervised
learning, and semi-supervised learning.

1. Supervised Learning

Supervised learning aims to predict a target/dependent variable from a given


set of independent variables or predictors. Using these predictors, developers
generate functions that match inputs to desired outputs. The training or
learning process continues until the algorithm achieves the desired level of
accuracy. Examples of this type of learning model include Logistic
Regression, KNN, Random Forest, Decision Tree, and others.

2. Unsupervised Learning

An algorithm using unsupervised learning does not have any outcome or


target variable to estimate or predict. It is ideal for clustering or grouping
population, which is helpful when it comes to segmenting consumers in
different groups to achieve certain objectives. Examples of this type of
algorithm are K-means and Apriori algorithm.

3. Reinforcement Learning

Developers use this type of algorithm to train machines to take specific


actions. Essentially, they expose their machine to certain environments where
it continually trains itself using trial and error, thereby learning from
experiences. In the process, the machine gains more knowledge to make
accurate decisions. An example of this type of machine learning is the
Markov Decision Process.

4. Semi-Supervised Learning

This model falls somewhere between supervised and unsupervised learning.


Developers combine some features present in both models to produce the
desired outcome. It is one of the most important and useful algorithms for
real-world scenarios, where all the information available is a combination of
unlabeled and labeled data.
Common Machine Learning Algorithms
Most machine-learning algorithms are applicable to almost any data problem.
Some of the commonly used learning algorithms include:
Linear Regression
Initially developed to analyze the relationship between input and output
numerical values in statistics, linear regression soon caught the attention of
the machine learning community. Developers use it to estimate or predict real
values based on continuous variables. These values can include total sales,
home costs, number of calls, and more. It works by establishing a relationship
between dependent variables and independent variables by fitting a
regression line, also called the best-fit line.
A regression line in the form of y= a + bx represents linear regression, where
'y' is the output variable and 'x' is the input variable. The goal, therefore, is to
determine the values of coefficients 'a', which is the intercept, and 'b', which
is the slope of the regression line. The aim is to plot a line that is near most of
the points, which would minimize the error or distance between the line and
the 'y' value of the data point.
A good way to understand this concept is to use the example of a kid asked to
arrange his/her classmates based on their weights without measuring their
weights or asking them to state their weights. Obviously, the kid will analyze
them visually by looking at their build and height and arrange them
accordingly. This is a real-life application of linear regression. Essentially,
he/she will know that build and height would have a relationship to the
weight, which explains the equation above.
There are two types of linear regression. These are:

1. Simple linear regression


2. Multiple linear regression

SLR comprises a single independent variable, while MLR, as the name


suggests, has multiple independent variables. To find the best-fit line, one
needs to fit a curvilinear or polynomial regression.
Sometimes, instead of a straight line, a curve may represent the relationship
between two different variables. Nevertheless, in terms of linear regression,
this does not mean that such relationships are non-linear. Rather, this is a type
of relationship where an increase in one variable will lead to an increase in
the other variable up to a certain level. Afterward, as continues to increase,
the other one will decrease.
Therefore, graphing this type of curvilinear relationship will produce an
inverted-U. A curvilinear relationship might also happen whereas one
variable goes up, the other one goes down up to a certain point, after which,
both go up together, which will produce a U-shaped curve.
A real-world example of this type of relationship is the relationship between
customer satisfaction and employee satisfaction. When the level of employee
satisfaction is high, customer satisfaction tends to go up as well, but up to a
certain point. When employees are too cheerful or happy, customers may
perceive it to be annoying or fake, which would lower their level of
satisfaction.
Polynomial regression fits a non-linear relationship between the independent
variable and the corresponding conditional mean of the dependent variable.
Developers hypothesize certain relationships as curvilinear. Often, such
relationships will have a polynomial term. If they try to fit a linear model to
such relationships, the result achieved will not be appropriate. In usual MLR,
the assumption is that all independent variables are independent; however,
this assumption does satisfy polynomial regression.
Often, developers use polynomial regression to describe or define this type of
non-linear relationships, which might include:
1. The spread of disease epidemics
2. Tissue growth rates
3. Carbon isotope distribution in lake sediments
By using polynomial regression, developers or researchers can fit a wide
range of functions under it to fit a wide range of curvature. In addition, this
regression model offers the best approximation of the relationships between
independent and dependent variables. However, it is important to understand
that polynomial regression is extremely sensitive to outliers in the data.
Logistic Regression
Whereas predictions for linear regression are continuous values, those
applying to logistic regression are discrete values after employing a
transformation function. This model is ideal for binary classification, which
refers to data sets where (y = 1 or 0), where 1 identifies the default class.
Suppose a researcher wants to predict whether something will happen or not.
In this case, there are only two possibilities; i.e., it will happen, which he will
denote as (1), or it will not happen, which will be (0).
That said, in spite of its name, it is important to understand that this is a
classification algorithm, not a regression technique. Many experts, however,
insist it is a regression model. Essentially, this algorithm estimates discrete
values, such as true/false, yes/no, or 0/1, based on independent variables. For
example, if a teacher gives a student a puzzle to solve, the student will either
solve it or fail to solve it.
In linear regression, the assumption is that data follows a linear function. In
the same way, logistic regression algorithms apply the sigmoid function when
it comes to modeling data. Only after reaching the decision threshold does
logistic regression become a classification model; therefore, this threshold is
a critical aspect of this type of algorithm.
The values of precision and recall are the main determinants of the value of
the threshold. Ideally, both should be 1; however, this is rarely the case.
Therefore, to choose the right threshold value, the following factors come
into play:

1. High Recall/Low Precision

In programs where developers want to minimize false negatives without


reducing the number of false positives, they often go for a threshold value
with a high value for recall and low value for precision. A real-world
application of this model is an application to diagnose cancer.
Users do not want the application to classify patients with cancer as healthy
without considering the possibility of a wrong diagnosis. This is because
further medical issues or procedures can confirm the absence of cancer, but
the detection of cancer cannot happen in an already cleared or rejected
candidate.

2. Low Recall/High Precision

In programs where developers want to lower the occurrence of false positives


without lowering the number of false negatives, they choose this decision or
threshold value. For example, if their aim is to classify consumers based on
their positive or negative reaction to a certain type of marketing strategy, they
want to be sure of a positive reaction. Otherwise, a negative reaction can have
a terrible impact on the company's sales and revenue.
Therefore, based on the broad range of categories, logistic regression can be
ordinal, multinomial, or binomial. In other words, it can focus on target
variables with ordered categories, target variables with more than two
possible disordered categories, or target variables with only two possible
types of categories.
Decision Tree
This is one of the most popular algorithms used in machine learning. Every
time people ask themselves a question before making a choice or decision,
their brain is working like a decision tree. For example, is it raining outside?
If yes, I will take an umbrella. Essentially, the aim of this algorithm is to
break down information into smaller pieces of data based on a particular
feature value, until all variables targeted fall under a single category.
For instance, suppose a developer wants to build a decision tree to determine
whether an individual is a child or an adult based on height and weight. He
could break down the data points based on a set of values of one of the two
characteristics, such as, for a weight greater than 40 kg, the individual is an
adult. However, if it is less than that, his subset will contain two kids and one
adult, which means he will need to separate it again, until only one class
remains, or until all his subsets are pure.
This is a type of supervised machine learning. The aim of this type of
algorithm is to solve both classification and regression problems. Decision
tree algorithms use the representation of a tree to solve problems, whereby
each leaf node represents a class label, while the internal node of the tree
represents attributes. Using this algorithm, developers can represent any type
of Boolean function on certain attributes.
This type of algorithm makes several assumptions, including:
1. There is a recursive distribution of records based on attribute values
2. The whole training set is the root at the beginning
3. The use of statistical methods to order attributes as the internal node or
root
4. Feature values should be categorical
The major challenge in a decision tree algorithm is the process of identifying
the attribute for the root node, also referred to as attribute selection. The two
main methods of attribute selection are:

1. Information gain
2. Gini index

To build a decision tree using information gain, developers start with all
training instances related to the root node, choose the attributes for each
node, and recursively build each sub-tree based on training instances that will
flow down that path in the decision tree.
The Gini index, on the other hand, is a metric to determine the number of
times an element chosen at random will get a wrong identification.
Essentially, developers want an attribute with a lower Gini index. The most
common types of decision tree algorithms are C4.5, Iterative Dichotomiser 3,
and Classification and Regression Tree.
SVM
Most beginners in the field of machine learning start by learning about the
different regression algorithms because they are relatively simple to learn and
use. However, beginners who want to succeed in this field need to learn much
more. A carpenter, for example, needs to know how to use the wide range of
tools used in carpentry. In the same way, to master machine learning, one
needs to learn about different types of algorithms.
If one thinks of regression as a sword capable of dicing and slicing data with
a few strong swings, then a support vector machine, or SVM, is like a very
sharp knife ideal for working on smaller datasets; however, it is also a
powerful building model. Essentially, SVM is a supervised learning
algorithm, and developers can use it for both regression and classification
problems.
Using SVM, developers plot each item of data as a point in n-dimensional
space with the value of each feature being a certain coordinate's value. In this
case, n is the number of features they have. Afterward, they classify this data
by identifying the hyper-plane that separates the two classes. That is quite a
mouthful. To put it simply, support vectors are the coordinates of certain
observations, and SVM is the tool that best differentiates the two classes of
data points.
To separate these classes, developers can choose one of many hyper-planes to
find one that has the maximum distance between data points of different
classes. Maximizing this distance offers some protection for a more confident
classification of future data points.
In support vector machines, hyper-planes represent decision boundaries that
help developers classify data points. As such, developers attribute any data
point appearing above or below the hyper-plane or line a particular class. In
addition, the hyperplane's dimension depends on the number of features
present.
Therefore, if there are two input features, like height and weight, the hyper-
plane will be a line; on the other hand, it will be a two-dimensional plane if
there are three features, such as hair color, height, and eye color. If there are
more than three features, it is difficult to imagine what it becomes.
Naïve Bayes
A classification model based on Bayes' theorem, Naïve Bayes is an effective,
yet simple machine learning classifier that makes classifications using a
decision rule, known as the Maximum A Posteriori, in a Bayesian setting.
This probabilistic classifier algorithm assumes that the availability of a
certain class feature is unrelated to the availability of any other feature, and is
very popular for text classification.
To explain this model in a simple way, one may consider a certain fruit an
apple if it is about three inches in diameter, round, and red. Even if these
features depend upon other features or on each other, this classification
model would consider all of these features as contributing to the probability
of the fruit being an apple.
This algorithm is especially useful when it comes to working on large data
sets. It addition to being simple to understand and easy to build, it can
actually perform better than more complex classification methods. It also
provides an easy way to calculate the posterior probability.
KNN
The K-Nearest Neighbor, KNN, is another simple and easy-to-use
classification algorithm. Considered a lazy learning and non-parametric
algorithm, it uses data and classifies new data points based on certain
similarities, such as the distance function, through a majority vote to its
neighbors.
KNN is a supervised machine learning algorithm used to solve both
regression and classification problems, which contain a discrete value as their
output. For example, "likes cream in coffee" and "does not like cream in
coffee" are discrete values because there is no middle ground.
The assumption when it comes to KNN algorithms is that similar things are
near each other, or exist in close proximity. The popular saying that goes,
'birds of a feather flock together,' comes to mind. This type of machine
learning algorithm captures the idea of closeness, proximity, or distance with
some simple calculations, involving the process of figuring out the distance
between two points on a graph.
There are many different ways of calculating this distance, and, the chosen
method will depend on the problem at hand. However, the Euclidean
distance, also known as the straight-line distance, is one of the most familiar
and popular choices. To determine the right value for K, developers run the
algorithm many times using different K-values and choose the one that
returns the least amount of errors while maintaining the program's ability to
make accurate predictions when given unfamiliar data.
Some of the things to remember when using KNN are:
1. Predictions become more stable with the increase in the value of K due
to averaging or majority voting; therefore, up to a certain point, the
algorithm is will make predictions that are more accurate. Eventually,
however, it will begin making more errors, which is the point where
developers know they have pushed the K-value too far.
2. On the other hand, predictions will become less accurate as the value of
K decreases.
3. In the case of a majority vote among labels, developers make the value
of K an odd number to make a tiebreaker.
There are several benefits of using this type of algorithm. In addition to it
being simple and easy to implement, developers do not need to make
additional assumptions, tune several parameters, or build a model. In
addition, it is versatile and applicable to search, regression, and classification.
The biggest disadvantage of this algorithm is that it gets a lot slower as the
number of independent variables, predictors, or examples increases.
K-Means Clustering
This is a popular and simple unsupervised machine-learning algorithm that
refers from datasets solely through input vectors, without the need to
reference labeled or known outcomes. This algorithm's objective is to
combine similar data points and discover any possible patterns. K-means
achieves its objective by identifying a fixed number of clusters in a dataset.
In this case, clusters refer to a group of data points connected by certain
similarities. Developers define a target 'k' value, which represents the number
of real or imaginary locations they need in the dataset. In addition, they
allocate each data point to a particular cluster by reducing the sum of squares
in the in-cluster. Another name for these real or imaginary locations is
centroids.
Essentially, this algorithm identifies the 'k' number of centroids and then
allocates each data point to the neighboring cluster, while making sure each
centroid remains as small as possible.
To process learning data, this algorithm starts with a randomly selected group
of centroids and uses them as the starting points for each cluster. It then
performs repetitive or iterative calculations to improve the centroids'
positions, only stopping when the centroids stabilize or when it reaches the
defined number of repetitions.
This algorithm is versatile and applicable in many types of groupings, such
as:
1. Behavioral segmentation, including the creation of profiles based on
activity monitoring, purchase history, interests, or activity
2. Inventory grouping based on manufacturing metrics, sales activity, and
more
3. Sorting sensor measurements
4. Detecting anomalies and bots
This is an easy-to-understand and popular technique for data cluster analysis,
which often delivers quick training results. However, its performance may
not be as powerful as that of other more complex clustering techniques since
any small change in the data could cause a massive variance. In addition,
clusters tend to be evenly sized and spherical, which may negatively affect
the accuracy of this algorithm.
Random Forest
Everything about machine learning is interesting, not least some of these
algorithm names. The random forest algorithm offers a good solution for both
regression and classification problems, in addition to offering many other
advantages. To begin with, this is a supervised classification algorithm,
which, as its name suggests, aims to create a forest and make it random.
The level of accuracy and number of results it can generate will depend on
the number of trees planted in the forest. However, it is important to
understand that there is a difference between creating the forest and
constructing the decision through the process of information gain, also known
as the gain index approach.
In this case, a decision tree refers to a tool used to support decisions using a
tree-like representation or graph that identifies the possible results or
consequences. Every time a user inputs a training dataset with features and
targets into the tree, the algorithm will generate some rule-sets to perform
certain predictions.
For example, suppose a user wants to predict whether a particular kid will
enjoy a new animated movie. He/she needs to identify other animated movies
the kid likes and use some of the features of those movies as the input.
Having done this, he/she will generate the rules through this algorithm. Then,
he/she will input the features of the target movie and determine whether the
kid will enjoy watching it, which involves using Gini index calculations and
information gain.
The main difference between this algorithm and the decision tree algorithm
discussed earlier is that in this machine-learning algorithm, the processes
involved in identifying the root node and separating feature nodes run
randomly. If the forest has enough trees, this algorithm will not overwhelm
the model; in addition, it can handle missing values and categorical values.
Here is another real-life example to make this classifier easy to understand.
Suppose Simon wants to take his 3-week vacation in a different location but
does not know where to go. He asks his brother, Paul, for advice and Paul
asks him where he has already vacationed and what he liked about those
places. Based on Simon's answers, Paul offers him several recommendations.
Essentially, Paul is helping his brother form a decision tree.
However, Simon decides to ask more people for advice because he needs
more recommendations to make the best decision. Other people also ask him
random questions before offering suggestions. Simon considers the
suggestion with the most votes as his possible vacation destination.
His brother asked him some questions and offered ideas of the ideal venue
based on his answers. This is an example of a typical approach to a decision
tree algorithm because his brother created the rules based on the answers
Simon provided and used the rules to determine the option that matched the
rules.
The other people he asked also asked random questions and offered their
suggestions, which for him were the votes for each particular place.
Eventually, he chose the place with the most votes. This is a simple, yet
perfect example of a random forest algorithm approach.
Dimensionality Reduction Algorithms
The last few years have seen an exponential rise in data capturing at every
possible level or stage. Research organizations, corporations, and government
agencies are coming up with new sources of information and capturing data
in more detail.
E-commerce businesses, for example, are capturing more details about their
current and potential customers, such as what they like, their demographic,
dislikes, web activities, purchase history, and more. They use this
personalized information to offer better and more personalized attention.
This information consists of many different features, which appears to be a
good thing. However, there is a challenge when it comes to identifying the
most important variables out of massive amounts of data. Fortunately, using
dimensionality reduction algorithms along with other algorithms can help
them identify these variables using a missing value ratio, correlation matrix,
and others.
Gradient Boosting Algorithms
Most winning programs in Kaggle competitions are a combination of several
different advanced machine-learning algorithms. However, a common feature
in most of them is the Gradient Boosting Machine model. Most users still use
this boosting algorithm as a black box, in spite of its massive potential and
popularity.
Gradient boosting is the process of boosting the strength and efficiency of
weak machine learning programs in a sequential, additive, and gradual
manner. It does this by identifying weaknesses using gradients in the loss
function, which indicates how good a particular algorithm's coefficients are at
applying the underlying data. The logical process of understanding the loss
function depends on what users are trying to boost or optimize.
For example, if users were trying to predict the sales price of a particular
asset using a regression model, then the difference between predicted and true
prices would be the basis of the loss function. The main objective of using
this algorithm is to optimize or boost user-specified functions.
Gradient boosting algorithms aim to imitate nature by transforming weak
learners into strong learners by adding features of new models to correct
weaknesses in an existing model using various engines, such as margin-
maximizing classification algorithm and decision stamp. The common types
of gradient boosting algorithms include:

1. XG Boost

Extreme gradient boosting or XG Boost aims to boost performance,


scalability, and computational speed. It uses many different features to
achieve its objectives, such as cache optimization, cross-validation, parallel
processing, and distributed computing.

2. GMB
This is a machine-learning algorithm designed to deal with classification and
regression problems by building a prediction model based on an ensemble of
other weak algorithms, usually decision trees. It constructs models in a stage-
wise manner and generalizes them by allowing for the improvement of a
random loss function.

3. LightGMB

Using tree-based learning algorithms, this gradient boosting model provides


better accuracy, lower memory consumption, higher efficiency, faster
training, ability to handle large-scale data, and support for GPU and parallel
learning.

4. CatBoost

This is a high-performance, scalable, and fast gradient boosting framework


based on decision tree algorithms. It optimizes regression, classification,
ranking, and other machine learning processes for Python, C++, R, and Java.
It also supports computation on both GPU and CPU.
There are so many machine-learning algorithms that beginners can feel
overwhelmed when they hear or come across different algorithms they need
to learn about and understand where they fit. Therefore, it is important to tour
the common algorithms to get a better feeling of the various options
available.
Chapter 10: Building Good Training Sets
Machine learning refers to the process of getting the automatic operations of
a system to function automatically with particular computational algorithms.
It takes place when these algorithms receive training from datasets input by
the programmer. They eventually learn how to complete processes following
what they discovered. Therefore, the most critical part of machine learning
involves training sets. The data that one input to teach the software
determines the success or failure of the learning process and its predictions.
The more precise the collection that one creates and applies, the more
accurate the eventual findings of the algorithms will be.
So, how does a person ensure to build good training sets? A good foundation
leads to success because it is what holds up the subsequent procedures or
structures. Similarly, data sets form the basis of the machine learning process
as the signal the beginning of the learning procedure. Here, we have the
information regarding the creation of data sets, from the collection to the
conversion of data. Knowledge of the procedures involved can help an
individual to learn how they can build suitable training sets. Such
understanding will also result in a programmer optimizing the machine
learning process of their systems.
How to Build the Data
Data is the key to starting any machine learning process. A person needs to
understand the procedures involved in creating data. A few vital steps ensure
that an individual obtains and applies the most suitable data to a machine-
learning algorithm. Following the process correctly allows one to feed the
best possible and most unbiased input to the system and obtain the most
accurate predictions as needed. These steps of preparing and building data
are:

1. Collecting Data

The collection of data is the first measure of building data. It refers to


obtaining the information or values needed according to the training that an
individual wants to place in the machine learning process. One collects these
inputs in line with the requirements of the software model. It is essential for a
person also to ensure that a set of files merge into single information. The
merging of the records of interest results in the machine learning type
ingesting a large number of necessary data in an organized way. For instance,
a programmer can assemble the data, which represents everyday dealings into
one input. They can then place it into a learning model that has a limited
capacity, such as taking in a couple of months or a year’s worth of data.
Additionally, it is imperative to obtain the best results from the sampling
process. One must ensure that they gain the eventual inputs objectively. Data
collection is a vital step that deals with regular issues. It involves looking for
and determining essential data from external storage, parsing complex data
into tabular form. The breakdown enables the scanning and identification of a
pattern to be more comfortable.

2. Exploring and Profiling of Data

This second step aims to study and find out anything that may compromise
the accuracy of the data. A programmer uses the data they collect to train the
machine-learning model. Thus, they must ensure that the information is as
close to perfect as possible to safeguard against findings that are full of errors
once the model eventually completes its processes.
One evaluates the state of the data they collect and search for details such as
deviations, trends, and exceptions, inconsistent, incorrect, skewed, or missing
information. They assess the whole collection and not just portions of it to
make sure that they do not miss potentially significant findings. Hence,
correct exploration and profiling of data lead to a user appropriately and
accurately informing their learning model's conclusions.

3. Formatting the Data

The third step of preparation is to format the data. Different machine learning
models have various characteristics. A user must make sure that their data or
data sets fit appropriately with the learning type that they have. There will be
errors or anomalies in the results if the data has too many differences.
The variances can come because of inputting values in different styles or
having various programmers handling the sets in different ways. For this
reason, the key to ensuring a person undertakes proper formatting process has
consistency. Maintaining uniform arrangements eliminates discrepancies in
the whole collection of data by making sure that the input formatting
procedure is the same.
4. Improving the Quality of Data

The next step in the preparation process is enhancing the quality of data. It
involves possessing a plan for handling the mistakes that are in the data,
which in turn improves the data. This step addresses and solves issues such as
missing values, outliers, extreme values, or erroneous data. However, a
person should deal with each problem carefully and sensibly to avoid ruining
the data collection. For instance, one should not eliminate too much
information with missing values or other errors. They can end up making the
data set inadequate and consequently, inaccurate.
It is also essential to note outliers that may exist in the collection, along with
the meanings that they represent. Sometimes they are just errors of inputting,
while at other times, they genuinely reflect useful findings. These results may
guide future events depending on the conditions. In addition to this, some
tools possess inherent intelligent capabilities. These instruments or
algorithms assist in matching similar characteristics of the data from different
collections. They then merge them intelligently and enable a user to look at
different information in one view of the datasets.

5. Engineering Data into Features

This step of preparing data deals with breaking down data into forms that the
machine-learning algorithm can best understand. Different algorithms
recognize various patterns of learning, and a person looks to present them
with the best possible representations. They engineer by changing the raw
data into elements that show an enhanced pattern to the learning algorithms.
The better the system can read the features, the more critical information that
the computation receives. Subsequently, it leads to better and more accurate
findings from the learning process. Thus, in this step, a programmer breaks
down data into several parts, which indicate, connections that are more
precise and better inform the algorithm of interest.

6. Dividing the Data into Evaluation and Training Sets

The last step of preparation is to split the data into two sets. A person uses the
first set to train the algorithm and employs the other one in evaluation
procedures. The most suitable collections to choose in this step are those that
do not overlap with each other. Such selections ensure that an individual
carries out appropriate testing. Proper division and subsequent organization
of the data can enable a user to increase accuracy in the findings.
This also allows them to trace the data's path from the results, all the way
back to the input. This tracking enables processes such as backpropagation,
which assist a user in correcting errors in the system and optimizing the
learning. Hence, one should utilize instruments that sort and arrange the
primary source along with the developed data. These tools seek to input the
machine learning algorithms in a clear and orderly manner.
Data Selection
Selecting data involves choosing features and engineering them as
appropriate. Suitable data selection leads to accurate findings. It also reduces
meaningless data, which enables the algorithms to train quicker. Three ways
of proper feature selection, which determine the type of data that one utilizes
in the end include:

1. Univariate Selection

This procedure of selecting involves using statistical examination. These tests


indicate the individual features that possess the most reliable connection with
the response variable. Selection methods here are easy to administer and to
comprehend which helps a person also to understand data better. Some of the
methods applied here include the Pearson correlation efficient, distance
correlation, mutual information, and maximal information coefficient, and
ranking based on a model.

2. Correlation Matrix with Heatmap

The correlation, in this case, indicates how the features link with one another
or with the output variable. The relationship is negative when the increment
of the value of the element reduces the output variable's factor. On the other
hand, the correlation is positive when an increase in a feature's value leads to
an increment in the value of the response variable. The heatmap, used in this
instance, enables a person to determine the elements that connect the most
with the target variable quickly.
3. Feature Importance

It refers to a characteristic of the learning model that an individual can


employ to find the feature importance of each element of the dataset. It
provides a grade for every feature of the data. The higher the degree is for the
quality, the more significant the aspect is to the response variable. Feature
importance utilizes Tree-Based Classifiers to extract the most relevant
features for a collection of data.
Data selection involves choosing the best suitable features, which increase
accuracy in the output but also eliminates noisy irrelevant ones.

Data Preprocessing
It is an essential stage in machine learning as it provides valuable information
that informs and influences how a model learns. The concepts involved here
are:
1. Dealing with Null Values – Null values are always present in the
collection of data. Human users deal with these values because the
models cannot manage them by themselves. The person initially
confirms the existence of the null values in the set and then proceeds to
remove the data containing them. One should be mindful of the amount
they delete to prevent loss of information or skewing the entire data.
They can use the substitution process of imputation to deal with such
circumstances where values may be missing.
2. Imputation – It refers to the approach of substituting values that are
missing in the data collection. The process can apply the Imputer class
or a customized function to replace the values and obtain a rough
indication of the data shape.
3. Standardization – In this preprocessing, a person converts their values
in a way that the standard or average deviation is one, while the mean
is zero. It helps to avoid getting results that ignored potentially relevant
information. It ensures to capture all the critical data by calculating the
value's mean and standard deviation. It uses the StandardScaler
function to do the calculations. One then subtracts the mean and
divides it by the average variance to address every data point.
4. Handling of Categorical Variables – It refers to dealing with distinct
and non-continuous variables. The nominal categorical variables refer
to those qualities, which a person cannot order due to lack of a
relationship between himself or herself like, color. In contrast, ordinal
categorical variables are those features that a person can order such as,
size or amount of an item. One must understand the difference between
the two variables to avoid errors in the application of the map function.
5. One-Hot Encoding – One uses this to know the number of unique
values that the nominal feature can receive. If they are present in a
particular set or column, the computes will be one, and if there are no
unique ones, the value will be zero.
6. Multicollinearity – It occurs when there are features, which strongly
depend on each other and can influence the learning model. It prevents
one from utilizing the weight vector to compute feature importance and
can change in the decision boundary.
Data Conversion
It refers to transforming data from one format into another by the use of
software and rarely, by human intervention. The process bases the conversion
on rules, which the application, programming language, programmer or
operating system creates. It aims to allow internetworking and embed as
much information as possible to sustain the entire data. The target format
must be able to facilitate the same constructs and features of the original data.
A person can apply reverse engineering to obtain a close estimate of the
source stipulation if the format specifications are unknown. Conversion
agents or applications delete information, but it is more challenging to try to
add it while in data conversion.
The data formats employed along with the circumstances concerned
determine the simplicity or complexity of data conversion. One needs to
convert data accordingly because of the systems and applications of a
computer function differently. Effective data conversion enables applications
to operate without difficulty. Nonetheless, it is essential to note that the data
conversion of multiple data formats can be tasking. It could cause
information loss if one handles it carelessly.
Therefore, one should study and understand the steps and elements involved
in developing suitable data collections. Building good training sets lead to
relevant and accurate outputs from the learning model.

Conclusion
Thank you for making it through to the end of Python Machine Learning:
How to Learn Machine Learning with Python, The Complete Guide to
Understand Python Machine Learning for Beginners and Artificial
Intelligence. Let’s hope it was informative and able to provide you with all of
the tools you need to achieve your goals whatever they may be.
Since you have made it to the end of the book, you have learned that Machine
learning is the ability that Information and Technology systems acquire to
identify resolutions to issues through recognition of patterns or designs that
exist in their databases. They base this recognition on present algorithms and
sets of data and afterward use it to form appropriate concepts for solving
issues.
Machine learning is universally accepted as the way of the future and most
new endeavors human beings are making are towards advancement in
artificial intelligence. Human beings need artificial intelligence models to
assist in technical and time-demanding tasks, which would otherwise be
prone to errors due to our biological limitations.
There are four main types of machine learning that a person can use and they
include supervised, unsupervised, semi-supervised, and reinforcement
machine learning.
The three main steps to building machine learning systems include:

Pre-processing data – This refers to the concept that you will


probably need to fine-tune the specific aspects you will need from
a vast data set.
Conduct training and select a predictive model - This step is the
stage where you identify your patterns and associations with your
data.
Evaluating models and predicting unseen data instances – In this
step, an individual executes a test run to see how the data behaves.

Python for machine learning is an intuitive and minimalistic language with


full-featured frameworks, which help reduce the time it takes developers to
achieve their first results.
Artificial neural networks are one of the vital tools that a person uses in
machine learning. They refer to the materially cellular systems or
computational algorithms that can obtain, store, and use experimental
knowledge. These systems show the similarities that exist between the
functioning styles of artificial neural networks and the human information
processing system.
Machine learning classification is a concept in which the machine learns new
information from data provided to it. This machine learning technique is
carried out under a supervised learning approach where the input data is
already labeled. Unsupervised learning involves the machine trying to sort
out the importance of the features by itself. It does this through multiple
observations since the input data is unlabeled.
To build a good Machine Learning (ML) model, developers need to provide
the right training data to an algorithm; an algorithm is a hypothetical set taken
before training begins with real-world data.
Python is one of the fastest-growing and most popular ML platforms. The
main steps involved in developing a powerful machine-learning model are:
1. Define the problem to solve or objectives to meet
2. Adjusting your mindset
3. Choosing a systematic process
4. Picking Python
5. Choosing the right datasets to work on and practicing the process
6. Building an ML portfolio
The next step is to build your own program or software that can learn and be
of value to you in your career or even just for fun.
Finally, if you found this book useful in any way, a review on Amazon is
always appreciated!

You might also like