Unit 1 - Machine Learning
Subject Name: Machine Learning
Subject Code: CS-601
Semester: 6th
Figure: 1.1
Important Terms of Machine Learning
• Algorithm: A Machine Learning algorithm is a set of rules and statistical techniques used to
learn patterns from data and draw significant information from it. It is the logic behind a
Machine Learning model. An example of a Machine Learning algorithm is the Linear
Regression algorithm.
• Model: A model is the main component of Machine Learning. A model is trained by using a
Machine Learning Algorithm. An algorithm maps all the decisions that a model is supposed to
take based on the given input, in order to get the correct output.
• Predictor Variable: It is a feature (or set of features) of the data that can be used to predict the output.
• Response Variable: It is the feature or the output variable that needs to be predicted by using
the predictor variable(s).
• Training Data: The Machine Learning model is built using the training data. The training data
helps the model to identify key trends and patterns essential to predict the output.
• Testing Data: After the model is trained, it must be tested to evaluate how accurately it can
predict an outcome. This is done by the testing data set.
Note: A Machine Learning process begins by feeding the machine lots of data. Using this data, the machine is trained to detect hidden insights and trends. These insights are then used to build a Machine Learning model, by applying an algorithm, in order to solve a problem, as shown in Figure 1.2.
Figure: 1.2
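To make these terms concrete, here is a minimal sketch using Scikit-Learn on a small synthetic dataset (the data and the choice of Linear Regression are purely illustrative): the predictor variable X is used to predict the response variable y, the model is fitted on training data and then evaluated on testing data.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Predictor variable X and response variable y (synthetic, illustrative data)
X = rng.rand(100, 1)
y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=100)

# Training data is used to fit the model; testing data evaluates it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()     # the algorithm
model.fit(X_train, y_train)    # training produces the model

print(model.score(X_test, y_test))  # quality of predictions on unseen data (R^2 here)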
Scope
• Increase in Data Generation: Due to the excessive production of data, we need a method that can be used to structure, analyze and draw useful insights from it. This is where Machine Learning comes in. It uses data to solve problems and find solutions to the most complex tasks faced by organizations.
• Improve Decision Making: By making use of various algorithms, Machine Learning can be
used to make better business decisions.
For example, Machine Learning is used to forecast sales, predict downfalls in the stock market,
identify risks and anomalies, etc.
• Uncover patterns & trends in data: Finding hidden patterns and extracting key insights from
data is the most essential part of Machine Learning. By building predictive models and using
statistical techniques, Machine Learning allows you to dig beneath the surface and explore
the data at a minute scale. Understanding data and extracting patterns manually will take
days, whereas Machine Learning algorithms can perform such computations in less than a
second.
• Solve complex problems: Machine Learning can be used to solve the most complex problems, such as building self-driving cars.
Limitations
1. What algorithms exist for learning general target functions from specific training examples?
2. In what settings will a particular algorithm converge to the desired function, given sufficient training data?
3. Which algorithm performs best for which types of problems and representations?
Polynomial Regression
In polynomial regression, we transform the original features into polynomial features of a given
degree and then apply Linear Regression on it. Consider the linear model Y = a + bX being transformed to something like Y = a + bX + cX².
It is still a linear model (linear in its coefficients), but the fitted curve is now quadratic rather than a straight line. Scikit-Learn provides the PolynomialFeatures class to transform the features.
If we increase the degree to a very high value, the curve becomes overfitted as it learns the noise in
the data as well.
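As a rough sketch of this idea, assuming Scikit-Learn is available and using made-up data with a quadratic relationship:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: roughly Y = 1 + 2X + 0.5X^2 plus noise (illustrative only)
rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel() + 0.5 * X.ravel() ** 2 + rng.normal(scale=0.5, size=50)

# Transform X into [X, X^2] and fit an ordinary linear regression on the new features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

model = LinearRegression()
model.fit(X_poly, y)

print(model.intercept_, model.coef_)  # approximately a, then b and c of Y = a + bX + cX^2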
Support Vector Regression
In SVR, we identify a hyperplane with maximum margin such that the maximum number of data points lies within that margin. SVR is very similar to the SVM classification algorithm. Instead of minimizing the error rate as in simple linear regression, we try to fit the error within a certain threshold. Our objective in SVR is basically to consider the points that are within the margin. Our best-fit line is the hyperplane that contains the maximum number of points.
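A minimal sketch of SVR using Scikit-Learn on synthetic one-dimensional data (the kernel and parameter values are arbitrary illustrative choices):

import numpy as np
from sklearn.svm import SVR

# Synthetic data: a noisy sine curve (illustrative only)
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# epsilon defines the margin (tube) within which errors are tolerated;
# points inside the tube contribute no loss
svr = SVR(kernel="rbf", C=100, epsilon=0.1)
svr.fit(X, y)

y_pred = svr.predict(X)
print(y_pred[:5])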
Random Variable
Each value of a random variable may or may not be equally likely. There is only 1 combination of two dice with sum 2, namely {(1,1)}, while a sum of 5 can be achieved by {(1,4), (2,3), (3,2), (4,1)}. So, 5 is more likely to occur than 2. In contrast, the likelihood of a head or a tail in a coin toss is equal, 50-50.
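A small sketch that enumerates the 36 equally likely outcomes of two dice confirms this:

from itertools import product
from collections import Counter

# Count how many of the 36 outcomes of two dice give each possible sum
outcomes = Counter(a + b for a, b in product(range(1, 7), repeat=2))

print(outcomes[2], outcomes[5])            # 1 combination vs 4 combinations
print(outcomes[2] / 36, outcomes[5] / 36)  # P(sum=2) = 1/36, P(sum=5) = 4/36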
Sometimes, a random variable can only take fixed values, or values only in a certain interval. For example, the top face of a die will only show values between 1 and 6; it cannot take a 2.25 or a 1.5. Similarly, when a coin is flipped, it can only show heads or tails and nothing else. On the other hand, if I define my random variable to be the amount of sugar in an orange, it can take any value like 1.4 g, 1.45 g, 1.456 g, 1.4568 g and so on. All these values are possible, and all the infinitely many values in between are also possible. So, in this case, the random variable is continuous, with any real value in the range being possible.
Don't think of a random variable as a traditional variable (even though both are called variables) like y = x + 2, where the value of y is dependent on x. A random variable is defined in terms of the outcome of a process; we quantify the process using the random variable.
Statistics
Machine learning and statistics are two tightly related fields of study. So much so that statisticians
refer to machine learning as “applied statistics” or “statistical learning” rather than the computer-
science-centric name.
Raw observations alone are data, but they are not information or knowledge.
Data raises questions, such as:
What is the most common or expected observation?
What are the limits on the observations?
What does the data look like?
Although they appear simple, these questions must be answered in order to turn raw observations
into information that we can use and share.
Beyond raw data, we may design experiments in order to collect observations. From these
experimental results we may have more sophisticated questions, such as:
What variables are most relevant?
What is the difference in an outcome between two experiments?
Are the differences real or the result of noise in the data?
Questions of this type are important. The results matter to the project, to stakeholders, and to
effective decision making.
Statistical methods are required to find answers to the questions that we have about data.
We can see that statistical methods are required both to understand the data used to train a machine learning model and to interpret the results of testing different machine learning models. Statistics is a subfield of mathematics.
It refers to a collection of methods for working with data and using data to answer questions.
Descriptive Statistics
Descriptive statistics refer to methods for summarizing raw observations into information that we
can understand and share.
Commonly, we think of descriptive statistics as the calculation of statistical values on samples of data
in order to summarize properties of the sample of data, such as the common expected value (e.g. the
mean or median) and the spread of the data (e.g. the variance or standard deviation).
Descriptive statistics may also cover graphical methods that can be used to visualize samples of data.
Charts and graphics can provide a useful qualitative understanding of both the shape or distribution
of observations as well as how variables may relate to each other.
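A minimal sketch of such summary values, computed with NumPy on a made-up sample:

import numpy as np

# A small sample of observations (illustrative values)
data = np.array([2.3, 3.1, 2.8, 4.0, 3.6, 2.9, 3.3])

print("mean:  ", np.mean(data))         # common expected value of the sample
print("median:", np.median(data))       # robust central value
print("var:   ", np.var(data, ddof=1))  # sample variance (spread)
print("std:   ", np.std(data, ddof=1))  # sample standard deviation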
Inferential Statistics
Inferential statistics is a fancy name for methods that aid in quantifying properties of the domain or
population from a smaller set of obtained observations called a sample.
Commonly, we think of inferential statistics as the estimation of quantities from the population
distribution, such as the expected value or the amount of spread.
More sophisticated statistical inference tools can be used to quantify the likelihood of observing data
samples given an assumption. These are often referred to as tools for statistical hypothesis testing,
where the base assumption of a test is called the null hypothesis.
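As a minimal, illustrative sketch of a hypothesis test, the example below runs a two-sample t-test from SciPy on two synthetic samples; the choice of test and the data are assumptions for illustration, not something fixed by these notes.

import numpy as np
from scipy import stats

rng = np.random.RandomState(0)

# Two samples that might come from two experiments (synthetic, illustrative)
sample_a = rng.normal(loc=50.0, scale=5.0, size=30)
sample_b = rng.normal(loc=52.0, scale=5.0, size=30)

# Null hypothesis: both samples come from populations with the same mean
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(t_stat, p_value)  # a small p-value suggests the difference is not just noise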
Linear Algebra
Linear Algebra is a branch of mathematics that lets you concisely describe coordinates and interactions of planes in higher dimensions and perform operations on them. It is concerned with vectors, matrices, and linear transforms.
Although linear algebra is integral to the field of machine learning, the tight relationship is often left
unexplained or explained using abstract concepts such as vector spaces or specific matrix operations.
Linear Algebra is required -
When working with data, such as tabular datasets and images.
When working with data preparation, such as one hot encoding and dimensionality reduction.
When working in sub-fields that make ingrained use of linear algebra notation and methods, such as deep learning, natural language processing, and recommender systems.
Examples of linear algebra in machine learning-
1. Dataset and Data Files
2. Images and Photographs
3. Linear Regression
4. Regularization
5. Principal Component Analysis
6. Singular-Value Decomposition
7. Latent Semantic Analysis
8. Recommender Systems
9. Deep Learning
For instance-
Images and Photographs
1. Perhaps you are more used to working with images or photographs in computer vision
applications.
2. Each image that you work with is itself a table structure with a width and height and one pixel
value in each cell for black and white images or 3 pixel values in each cell for a color image.
3. A photo is yet another example of a matrix from linear algebra.
4. Operations on the image, such as cropping, scaling, shearing, and so on are all described using the notation and operations of linear algebra; a small sketch of this is given below.
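As a rough illustration, assuming the image is held as a NumPy array (the pixel values below are made up), some of these operations reduce to simple matrix manipulations:

import numpy as np

# A tiny grayscale "image": a matrix of pixel intensities (illustrative values)
image = np.arange(100, dtype=np.float32).reshape(10, 10)

# Cropping is just matrix slicing
crop = image[2:8, 2:8]

# Uniform brightness scaling is element-wise (scalar) multiplication
brighter = 1.5 * image

# A horizontal flip reverses the column order of the matrix
flipped = image[:, ::-1]

print(image.shape, crop.shape, brighter.max(), flipped[0, 0])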
Linear Regression
1. Linear regression is an old method from statistics for describing the relationships between
variables.
2. It is often used in machine learning for predicting numerical values in simpler regression
problems.
3. There are many ways to describe and solve the linear regression problem, i.e. finding a set of coefficients that, when multiplied by each of the input variables and added together, results in the best prediction of the output variable; one such way is sketched below.
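One such way, sketched here with NumPy on synthetic data, is to solve the least-squares problem directly; this is only one of the many possible approaches mentioned above.

import numpy as np

rng = np.random.RandomState(0)

# Synthetic data: y is roughly 4 + 1.5x plus noise (illustrative only)
X = rng.rand(100, 1)
y = 4.0 + 1.5 * X.ravel() + 0.1 * rng.randn(100)

# Add a column of ones so the intercept becomes one of the coefficients
A = np.hstack([np.ones((100, 1)), X])

# Least-squares solution of A @ coef ≈ y
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [4.0, 1.5]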
Convex Optimization
Optimization is a big part of machine learning. It is the core of most popular methods, from least
squares regression to artificial neural networks.
These methods are useful in the core implementation of a machine learning algorithm. You may also need to implement your own algorithm tuning scheme to optimize the parameters of a model for some cost function.
A good example may be the case where we want to optimize the hyper-parameters of a blend of
predictions from an ensemble of multiple child models.
Machine learning algorithms use optimization all the time. We minimize loss or error, or maximize some kind of score function. Gradient descent is the "hello world" optimization algorithm covered in probably any machine learning course. It is obvious in the case of regression or classification models, but even with tasks such as clustering we are looking for a solution that optimally fits our data (e.g. k-means minimizes the within-cluster sum of squares). So if you want to understand how machine learning algorithms work, learning more about optimization helps. Moreover, if you need to do things like hyperparameter tuning, then you are also directly using optimization.
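A minimal gradient descent sketch on a one-parameter model; the data, learning rate and iteration count are arbitrary choices made only for illustration.

import numpy as np

rng = np.random.RandomState(0)

# Data generated from y = 3x plus noise (illustrative)
x = rng.rand(200)
y = 3.0 * x + 0.05 * rng.randn(200)

w = 0.0   # single weight to learn
lr = 0.1  # learning rate (step size)

for _ in range(500):
    y_pred = w * x
    # Gradient of the mean squared error 0.5 * mean((w*x - y)^2) with respect to w
    grad = np.mean((y_pred - y) * x)
    w -= lr * grad  # step downhill along the gradient

print(w)  # close to 3.0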
Data visualization
Data visualization is an important skill in applied statistics and machine learning.
Statistics does indeed focus on quantitative descriptions and estimations of data. Data visualization
provides an important suite of tools for gaining a qualitative understanding.
This can be helpful when exploring and getting to know a dataset and can help with identifying
patterns, corrupt data, outliers, and much more. With a little domain knowledge, data visualizations
can be used to express and demonstrate key relationships in plots and charts that are more visceral
to yourself and stakeholders than measures of association or significance.
There are five key plots that you need to know well for basic data visualization. They are:
Line Plot
Bar Chart
Histogram Plot
Box and Whisker Plot
Scatter Plot
With knowledge of these plots, you can quickly get a qualitative understanding of most data that you
come across.
Line Plot
A line plot is generally used to present observations collected at regular intervals.
The x-axis represents the regular interval, such as time. The y-axis shows the observations, ordered
by the x-axis and connected by a line.
Bar Chart
A bar chart is generally used to present relative quantities for multiple categories.
The x-axis represents the categories, which are spaced evenly. The y-axis represents the quantity for each category and is drawn as a bar from the baseline to the appropriate level on the y-axis.
A bar chart can be created by calling the bar() function and passing the category names for the x-axis
and the quantities for the y-axis.
Bar charts can be useful for comparing multiple point quantities or estimations.
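A minimal matplotlib sketch of such a bar chart; the category names and quantities here are made up for illustration.

import matplotlib.pyplot as plt

# Made-up categories and quantities, purely for illustration
categories = ["red", "green", "blue"]
quantities = [10, 25, 17]

plt.bar(categories, quantities)  # one bar per category, height = quantity
plt.xlabel("Category")
plt.ylabel("Quantity")
plt.show()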
Scatter Plot
A scatter plot (or ‘scatterplot’) is generally used to summarize the relationship between two paired
data samples.
Paired data samples means that two measures were recorded for a given observation, such as the
weight and height of a person.
The x-axis represents observation values for the first sample, and the y-axis represents the
observation values for the second sample. Each point on the plot represents a single observation.
Scatter plots are useful for showing the association or correlation between two variables. A correlation can be quantified, such as with a line of best fit, which can also be drawn as a line plot on the same chart, making the relationship clearer.
A dataset may have more than two measures (variables or columns) for a given observation. A scatter plot matrix is a chart containing a scatter plot for each pair of variables in a dataset with more than two variables.
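A minimal matplotlib sketch of a scatter plot on two synthetic paired samples (the values are made up):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)

# Two paired samples for each observation (synthetic, illustrative values)
first_sample = rng.normal(loc=170, scale=10, size=100)
second_sample = 0.5 * first_sample + rng.normal(loc=0, scale=5, size=100)

plt.scatter(first_sample, second_sample)  # one point per paired observation
plt.xlabel("First sample")
plt.ylabel("Second sample")
plt.show()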
Data Pre-processing
• To achieve better results from the applied model in Machine Learning projects, the data has to be in a proper format. Some Machine Learning models need information in a specified format; for example, the Random Forest algorithm does not support null values, so to execute the Random Forest algorithm, null values have to be managed in the original raw data set.
• Another aspect is that the data set should be formatted in such a way that more than one Machine Learning or Deep Learning algorithm can be executed on the same data set, and the best of them is chosen.
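As one possible illustration of managing null values before running an algorithm such as Random Forest, the sketch below uses pandas; the column names and values are hypothetical, and mean imputation is only one of several reasonable strategies.

import numpy as np
import pandas as pd

# A toy dataset with missing values; the column names are hypothetical
df = pd.DataFrame({
    "age":    [25, np.nan, 47, 31],
    "income": [50000, 62000, np.nan, 58000],
})

# One simple strategy: fill each missing value with the mean of its column
df_filled = df.fillna(df.mean())

print(df_filled)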
Data Augmentation
Data augmentation is the process of increasing the amount and diversity of data. We do not collect new data; rather, we transform the data that is already present. For instance, consider images: there are various ways to transform and augment image data.
Need for data augmentation
Data augmentation is an integral process in deep learning: large amounts of data are needed, and in some cases it is not feasible to collect thousands or millions of images, so data augmentation comes to the rescue. It helps us to increase the size of the dataset and introduce variability into the dataset.
Operations in data augmentation
The most commonly used operations are listed below; a small sketch applying several of them follows the list.
1. Rotation
2. Shearing
3. Zooming
4. Cropping
5. Flipping
6. Changing the brightness level
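One way to apply several of these operations is sketched below using Keras's ImageDataGenerator; this assumes TensorFlow/Keras is installed, and other libraries offer similar utilities. The "image" here is random noise, purely for illustration.

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# A batch containing one random RGB "image" of shape (height, width, channels)
image = np.random.rand(1, 64, 64, 3)

# Configure several of the operations listed above
datagen = ImageDataGenerator(
    rotation_range=20,     # rotation
    shear_range=0.2,       # shearing
    zoom_range=0.2,        # zooming
    horizontal_flip=True,  # flipping
)

# Each call to next() yields a new randomly transformed copy of the image
augmented = next(datagen.flow(image, batch_size=1))
print(augmented.shape)  # (1, 64, 64, 3)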
Normalizing Data Sets
Normalization is a technique often applied as part of data preparation for machine learning. The goal
of normalization is to change the values of numeric columns in the dataset to a common scale,
without distorting differences in the ranges of values. Not every dataset requires normalization for machine learning; it is required only when features have different ranges.
The goal of normalization is to transform features to be on a similar scale. This improves the
performance and training stability of the model.
Four common normalization techniques may be useful:
scaling to a range
clipping
log scaling
z-score
Normalization is also required for some algorithms to model the data correctly.
For example, assume your input dataset contains one column with values ranging from 0 to 1, and
another column with values ranging from 10,000 to 100,000. The great difference in the scale of the
numbers could cause problems when you attempt to combine the values as features during
modelling. Normalization avoids these problems by creating new values that maintain the general
distribution and ratios in the source data, while keeping values within a scale applied across all
numeric columns used in the model.
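A minimal sketch of two of these techniques, scaling to a range and z-score standardization, using NumPy; the column values are made up to mirror the example above.

import numpy as np

# Two columns on very different scales, as in the example above (values made up)
small = np.array([0.1, 0.4, 0.35, 0.8])
large = np.array([12000.0, 45000.0, 70000.0, 98000.0])

def min_max(x):
    """Scale values to the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

def z_score(x):
    """Standardize values to zero mean and unit standard deviation."""
    return (x - x.mean()) / x.std()

print(min_max(large))  # both columns now share the same 0-1 scale
print(min_max(small))
print(z_score(large))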
In a feed-forward neural network, each unit passes its output to all the units on the next layer, but there is no feedback to the previous layer. Weightings are applied to the signals passing from one unit to another, and it is these weightings which are tuned in the training phase to adapt the network to the particular problem at hand.
Types of Machine Learning
Machine learning is sub-categorized into three types:
1. Supervised Learning – Train Me!
2. Unsupervised Learning – I am self-sufficient in learning
3. Reinforcement Learning – My life My rules! (Hit & Trial)
Supervised Learning
Supervised Learning is the one where you can consider that the learning is guided by a teacher. We have a dataset which acts as a teacher, and its role is to train the model or the machine. Once the model gets trained, it can start making a prediction or decision when new data is given to it.
Reinforcement Learning
Reinforcement Learning follows the concept of the hit and trial method. The agent is rewarded or penalized with a point for a correct or a wrong answer, and on the basis of the positive reward points gained, the model trains itself. Again, once trained, it is ready to make predictions on new data presented to it.