ML Unit 1
Tech-III-I,ECE
UNIT I: Introduction: Definition of learning systems, Goals and applications of machine
learning, Aspects of developing a learning system: training data, concept representation,
function approximation. Inductive Classification: The concept learning task, Concept
learning as search through a hypothesis space, General-to-specific ordering of hypotheses,
Finding maximally specific hypotheses, Version spaces and the candidate elimination
algorithm, Learning conjunctive concepts, The importance of inductive bias.
Introduction to Machine learning
In the real world, we are surrounded by humans who can learn from their experiences, and we have computers or machines that only work on our instructions. But can a machine also learn from experience or past data the way a human does? This is where Machine Learning comes in.
***A machine is said to learn if it can improve its performance by gaining more data.***
Image recognition is one of the most common applications of machine learning. A popular use case of image recognition and face detection is the automatic friend-tagging suggestion.
While using Google, we get a "Search by voice" option; this comes under speech recognition, another popular application of machine learning.
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.
One of the most exciting applications of machine learning is self-driving cars. Machine learning plays a significant role in self-driving cars.
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Fraud can take place through fake accounts, fake IDs, or money being stolen in the middle of a transaction.
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in share prices, so machine learning's long short-term memory (LSTM) neural network is used for the prediction of stock market trends.
In medical science, machine learning is used for disease diagnosis. With this, medical technology is growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.
Nowadays, if we visit a new place and are not aware of the local language, it is not a problem at all: Google's GNMT (Google Neural Machine Translation) provides automatic translation, using a neural machine translation model to translate text into a language we are familiar with.
Machine learning has given computer systems the ability to learn automatically without being explicitly programmed.
The machine learning life cycle is a cyclic process for building an efficient machine learning project. The main purpose of the life cycle is to find a solution to the problem or project.
1. Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify the different data sources and collect all the data related to the problem. The data can be collected from various sources such as files, databases, the internet, or mobile devices. The quantity and quality of the collected data determine the efficiency of the output.
2. Data preparation
After collecting the data, we need to prepare it for the further steps. Data preparation is the step where we put our data into a suitable place and prepare it for use in machine learning training.
3. Data wrangling
Data wrangling is the step where the raw data is cleaned and converted into a usable format. In real-world applications, collected data may have various issues, including:
o Missing Values
o Duplicate data
o Invalid data
o Noise
The cleaned and prepared data is then passed on to the analysis step.
4. Data analysis
The aim of this step is to build a machine learning model that analyzes the data using various analytical techniques and to review the outcome.
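The data-wrangling issues listed above can be handled with a few standard operations; the short Python sketch below (using the pandas library and made-up numbers, both assumptions made purely for illustration) removes a duplicate row, fills a missing value, and drops an invalid entry.

```python
# Minimal data-wrangling sketch on hypothetical raw data (pandas assumed installed).
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value, a duplicate row, and an invalid age.
raw = pd.DataFrame({
    "age":    [25, 32, np.nan, 32, -5],
    "salary": [30000, 45000, 40000, 45000, 52000],
})

clean = raw.drop_duplicates()                              # duplicate data
clean["age"] = clean["age"].fillna(clean["age"].median())  # missing values
clean = clean[clean["age"].between(0, 120)]                # invalid data / noise

print(clean)
```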
5. Train Model
The next step is to train the model. In this step, we train the model on the prepared data so that it can understand the various patterns, rules, and features and give a better outcome for the problem.
6. Test Model
Once the machine learning model has been trained on a given dataset, we test the model. In this step, we check the accuracy of our model by providing a test dataset to it. Testing the model determines the percentage accuracy of the model as per the requirements of the project or problem.
7. Deployment
The last step of the machine learning life cycle is deployment, where we deploy the model in the real-world system. If the above-prepared model produces accurate results as per our requirements with acceptable speed, then we deploy the model in the real system.
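To make the train, test, and deployment steps concrete, here is a small sketch using scikit-learn (an illustrative library choice, not one the life cycle mandates): the data is split into training and test sets, a model is trained, and its percentage accuracy is measured on the held-out test data.

```python
# Train/test sketch for steps 5-7 of the life cycle (scikit-learn assumed installed).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stand-in for steps 1-4: a small synthetic labelled dataset.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hold out a test set so the model is evaluated on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression()      # Step 5: train the model
model.fit(X_train, y_train)

y_pred = model.predict(X_test)    # Step 6: test the model
print("Test accuracy: %.1f%%" % (100 * accuracy_score(y_test, y_pred)))

# Step 7 (deployment) would typically persist the trained model,
# e.g. with joblib.dump(model, "model.pkl"), and serve it in the real system.
```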
2. Choosing the target function: The next design decision is to figure out exactly what type of knowledge will be learned and how the performance program will put it to use. Let us take the classic example of the checkers game to understand this better. The program only needs to learn how to select the best move from among the legal moves (the set of all possible moves is called the legal moves).
The choice of the target function is a key decision in designing the entire system. The target function is V: B -> R; this notation denotes that V maps any legal board state from the set B to a real value. Values can be assigned to the target function in the checkers game as follows:
o If b is a final board state that is won, then V(b) = 100
o If b is a final board state that is lost, then V(b) = -100
o If b is a final board state that is drawn, then V(b) = 0
o If b is not a final state, then V(b) = V(b'), where b' is the best final board state that can be reached from b by playing optimally until the end of the game.
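To sketch how such a target function can be represented and used, the snippet below approximates V with a linear combination of simple board features, w0 + w1*x1 + ... + w4*x4. The particular features and weights are illustrative assumptions, in the spirit of the classic checkers discussion, not values given in these notes.

```python
# Linear approximation of the checkers target function: V_hat(b) = w0 + sum(w_i * x_i).
# Hypothetical features x1..x4: black pieces, red pieces, black kings, red kings.

def board_features(board):
    """Map a board (here just a dict of piece counts) to the feature vector x1..x4."""
    return [
        board["black_pieces"],
        board["red_pieces"],
        board["black_kings"],
        board["red_kings"],
    ]

def v_hat(board, weights):
    """Evaluate w0 + w1*x1 + ... + w4*x4 for the given board."""
    x = board_features(board)
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

# Hypothetical weights; a learning rule (e.g. LMS) would adjust these from training examples.
weights = [0.0, 1.0, -1.0, 1.5, -1.5]
board = {"black_pieces": 8, "red_pieces": 7, "black_kings": 1, "red_kings": 0}
print(v_hat(board, weights))   # a higher value means the board looks better for black
```

The performance system would compute this estimate for the board resulting from each legal move and pick the move with the highest value.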
i. The performance system: The performance system solves the given performance
task.
ii. Critic: The critic takes the history of the game and generates training examples.
iii. Generalizer: It outputs the hypothesis that is its estimate of the target function.
iv. Experiment Generator: It creates a new problem after taking in the hypothesis for
the performance system to explore.
Traditional programming requires a developer to hand-write explicit rules for every task; machine learning instead learns those rules from data. Machine learning is a subset of AI that enables a machine to automatically learn from data, improve its performance from past experiences, and make predictions. ML algorithms help to solve different business problems such as regression, classification, forecasting, clustering, and association. Based on the methods and the way of learning, machine learning is divided into mainly four types, which are:
o Supervised Machine Learning
o Unsupervised Machine Learning
o Semi-Supervised Machine Learning
o Reinforcement Learning
1. Supervised Machine Learning
In supervised learning, the model is trained on a labelled dataset, i.e., each training input is paired with the correct output. Supervised machine learning can be classified into two types of problems, which are given below:
o Classification
o Regression
Classification algorithms are used to solve classification problems, in which the output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. Popular classification algorithms include, for example, Logistic Regression, Decision Trees, Random Forests, and Support Vector Machines.
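For instance, a decision tree classifier (one of many possible choices; scikit-learn and the toy data below are assumptions made purely for illustration) can be fit on a tiny labelled dataset with a categorical output:

```python
# Tiny classification sketch: categorical output ("Yes"/"No") on hypothetical data.
from sklearn.tree import DecisionTreeClassifier

X = [[25, 0], [40, 1], [35, 1], [22, 0]]   # e.g. [age, owns_house]
y = ["No", "Yes", "Yes", "No"]             # categorical target

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[30, 1]]))              # outputs one of the two classes
```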
Regression algorithms are used to solve regression problems, in which the output variable is a continuous numerical value (for example, a price or a temperature) that depends on the input variables. Popular regression algorithms include, for example, Linear Regression, Decision Tree Regression, and Lasso Regression.
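In the same spirit, the sketch below fits a linear regression model (again scikit-learn and made-up numbers, assumed only for illustration) so that the output is a continuous value:

```python
# Tiny regression sketch: continuous output on hypothetical data.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]     # e.g. years of experience
y = [30, 35, 41, 46]         # e.g. salary in thousands

reg = LinearRegression().fit(X, y)
print(reg.predict([[5]]))    # predicts a continuous value for an unseen input
```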
Advantages:
o Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
2. Unsupervised Machine Learning
In unsupervised learning, the model is trained on an unlabelled dataset. The main aim of an unsupervised learning algorithm is to group or categorize the unsorted dataset according to its similarities, patterns, and differences.
For example, suppose the model is given a set of images that are totally unknown to it. The task of the machine is to find the patterns and categories of the objects on its own, such as differences in colour and shape, and then to predict the output when it is tested with a test dataset.
Advantages:
o These algorithms can be used for more complicated tasks than supervised ones, because they work on unlabelled datasets.
o Unsupervised algorithms are preferable for various tasks because obtaining an unlabelled dataset is easier than obtaining a labelled one.
Disadvantages:
o The output of an unsupervised algorithm can be less accurate, because the dataset is not labelled and the algorithm is not trained with the exact output in advance.
o Working with unsupervised learning is more difficult, because it uses an unlabelled dataset that does not map to a known output.
Unsupervised Learning can be further classified into two types, which are given below:
o Clustering
o Association
1) Clustering: The clustering technique is used when we want to find the inherent groups in the data. Popular clustering algorithms include, for example, K-Means clustering, hierarchical clustering, and DBSCAN.
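A minimal clustering sketch with K-Means (scikit-learn and the two-dimensional points below are assumptions for illustration) shows how group labels are discovered without any labelled output:

```python
# K-Means clustering sketch on hypothetical unlabelled 2-D points.
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0],       # points near one group
     [10, 2], [10, 4], [10, 0]]    # points near another group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)     # cluster indices found from similarities alone
print(labels)
```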
4. Reinforcement Learning
In reinforcement learning, an agent learns by interacting with its environment: it receives rewards for good actions and penalties for bad actions, and it tries to maximize the total reward.
Advantages
Disadvantages
Concept Learning
“A task of acquiring potential hypothesis (solution) that best fits the given training
examples.”
Consider the example task of learning the target concept "days on which my friend Prabhas enjoys his favorite water sport." The table below describes a set of example days, each represented by a set of attributes. The attribute EnjoySport indicates whether or not Prabhas enjoys his favorite water sport on that day. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
The most general possible hypothesis, that every day is a positive example, is represented by (?, ?, ?, ?, ?, ?), and the most specific possible hypothesis, that no day is a positive example, is represented by (ø, ø, ø, ø, ø, ø).
As a smaller illustration, consider two features F1 with values (A, B) and F2 with values (X, Y).
Instance Space: (A, X), (A, Y), (B, X), (B, Y) – 4 instances
Syntactically Distinct Hypothesis Space: (A, X), (A, Y), (A, ø), (A, ?), (B, X), (B, Y), (B, ø), (B, ?), (ø, X), (ø, Y), (ø, ø), (ø, ?), (?, X), (?, Y), (?, ø), (?, ?) – 16 hypotheses
Semantically Distinct Hypothesis Space: (A, X), (A, Y), (A, ?), (B, X), (B, Y), (B, ?), (?, X), (?, Y), (?, ?), (ø, ø) – 10 hypotheses
General-to-Specific Ordering of Hypotheses
• Consider two hypotheses, for example h1 = (Sunny, ?, ?, Strong, ?, ?) and h2 = (Sunny, ?, ?, ?, ?, ?), and the sets of instances that are classified positive by h1 and by h2. Because h2 imposes fewer constraints on the instance, it classifies more instances as positive.
• Any instance classified positive by h1 will also be classified positive by h2. Therefore,
h2 is more general than h1.
• For any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1. The more-general-than-or-equal-to relation is defined in terms of the sets of instances that satisfy the two hypotheses: h2 is more general than or equal to h1 if and only if every instance that satisfies h1 also satisfies h2.
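These two relations can be written down directly for the conjunctive hypothesis representation used in these notes. The sketch below is a small assumed encoding, with tuples for hypotheses, '?' for "any value", and '0' standing in for ø:

```python
# Hypotheses are tuples of constraints: a specific value, '?' (any value), or '0' (no value, i.e. ø).

def satisfies(x, h):
    """x satisfies h  <=>  h(x) = 1 for the conjunctive representation."""
    return all(hv != '0' and (hv == '?' or hv == xv) for xv, hv in zip(x, h))

def more_general_or_equal(h2, h1):
    """h2 >= h1: every constraint of h2 is at least as weak as the corresponding one in h1."""
    return all(a2 == '?' or a2 == a1 or a1 == '0' for a2, a1 in zip(h2, h1))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
x  = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')

print(satisfies(x, h1), satisfies(x, h2))   # True True: x satisfies both hypotheses
print(more_general_or_equal(h2, h1))        # True: h2 is more general than h1
```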
The Find-S algorithm (also called the S algorithm) is a concept-learning algorithm that seeks a maximally specific hypothesis consistent with the labeled training data. It starts with the most specific hypothesis and generalizes it by incorporating the positive examples, ignoring the negative ones.
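A minimal Find-S sketch, run on the classic EnjoySport training examples (the tuple-with-'?' encoding is our illustrative assumption):

```python
# Find-S: start from the first positive example and generalize on positive examples only.

def find_s(examples):
    positives = [x for x, label in examples if label == "Yes"]
    h = list(positives[0])              # the maximally specific hypothesis covering example 1
    for x in positives[1:]:
        for i, value in enumerate(x):
            if h[i] != value:           # constraint contradicted by this positive example
                h[i] = '?'              # minimally generalize it
    return tuple(h)

enjoy_sport = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]

print(find_s(enjoy_sport))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

Negative examples are simply ignored, which is why Find-S returns only the maximally specific boundary and not the whole version space.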
Candidate Elimination Algorithm
The Candidate Elimination algorithm maintains two boundary sets: S, the most specific hypotheses consistent with the data, and G, the most general ones. For each training example d:
o If d is a positive example, remove from G every hypothesis inconsistent with d, and minimally generalize the hypotheses in S until they are consistent with d.
o If d is a negative example, remove from S every hypothesis inconsistent with d, and minimally specialize the hypotheses in G until they are consistent with d.
Numerical example:
Sol:
Step1: S0: (0, 0, 0) Most Specific Boundary
G0: (?, ?, ?) Most Generic Boundary
Iteration 1: The first example is negative. The hypothesis at the specific boundary is consistent with it, hence we retain it; the hypothesis at the generic boundary is inconsistent, hence we write all consistent hypotheses obtained by removing one "?" at a time.
S1: (0, 0, 0)
S2: (0, 0, 0)
G2: (Small, Blue, ?), (Small, ?, Circle), (?, Blue, ?), (Big, ?, Triangle), (?, Blue, Triangle)
Learned Version Space by Candidate Elimination Algorithm for given data set is:
S: G: (Small, ?, Circle)
Consistent Hypothesis
Example
Origin  Manufacturer  Color  Decade  Type  Example Type
Japan Honda Blue 1980 Economy Positive
Japan Toyota Green 1970 Sports Negative
Japan Toyota Blue 1990 Economy Positive
USA Chrysler Red 1980 Economy Negative
Japan Honda White 1980 Economy Positive
Solution:
Initially, G is the maximally general hypothesis (?, ?, ?, ?, ?) and S is the maximally specific hypothesis (ø, ø, ø, ø, ø). These two models represent the most general and the most specific concepts one might learn; the actual concept to be learned, "Japanese Economy Car", lies between them somewhere within the version space. After processing the five training examples with the Candidate Elimination algorithm, both boundaries converge to the same hypothesis:
G = { (Japan, ?, ?, ?, Economy) }
S = { (Japan, ?, ?, ?, Economy) }
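The converged boundaries above can be reproduced with a compact Candidate Elimination sketch. The code below is a simplified illustration for the conjunctive-hypothesis case: it keeps S as a single hypothesis and only generates those specializations of G that stay more general than S, so it is not a fully general implementation of the algorithm.

```python
# Simplified Candidate Elimination for conjunctive hypotheses ('?' means any value).

def covers(h, x):
    """True when hypothesis h classifies instance x as positive."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h2, h1):
    return all(a2 == '?' or a2 == a1 for a2, a1 in zip(h2, h1))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = None                               # stands for the all-ø, maximally specific hypothesis
    G = [tuple('?' for _ in range(n))]     # the maximally general hypothesis
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]          # drop G members that reject x
            S = tuple(x) if S is None else tuple(       # minimally generalize S to cover x
                sv if sv == xv else '?' for sv, xv in zip(S, x))
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)                     # g already rejects the negative example
                    continue
                for i in range(n):                      # minimal specializations of g
                    if g[i] == '?' and S is not None and S[i] != '?' and S[i] != x[i]:
                        new_G.append(g[:i] + (S[i],) + g[i + 1:])
            # keep only the maximally general members of G
            G = [g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
    return S, G

cars = [
    (("Japan", "Honda",    "Blue",  "1980", "Economy"), True),
    (("Japan", "Toyota",   "Green", "1970", "Sports"),  False),
    (("Japan", "Toyota",   "Blue",  "1990", "Economy"), True),
    (("USA",   "Chrysler", "Red",   "1980", "Economy"), False),
    (("Japan", "Honda",    "White", "1980", "Economy"), True),
]

S, G = candidate_elimination(cars)
print("S =", S)   # ('Japan', '?', '?', '?', 'Economy')
print("G =", G)   # [('Japan', '?', '?', '?', 'Economy')]
```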
List-Then-Eliminate algorithm
Example:
F1 → (A, B); F2 → (X, Y). Here F1 and F2 are two features (attributes), each with two possible values.
Instance Space: (A, X), (A, Y), (B, X), (B, Y) – 4 instances
Hypothesis Space: (A, X), (A, Y), (A, ø), (A, ?), (B, X), (B, Y), (B, ø), (B, ?), (ø, X), (ø, Y), (ø, ø), (ø, ?), (?, X), (?, Y), (?, ø), (?, ?) – 16 hypotheses
Semantically Distinct Hypotheses: (A, X), (A, Y), (A, ?), (B, X), (B, Y), (B, ?), (?, X), (?, Y), (?, ?), (ø, ø) – 10 hypotheses
List-Then-Eliminate Algorithm Steps
1. Initialize VersionSpace to a list containing every hypothesis in H.
2. For each training example <x, c(x)>, remove from VersionSpace any hypothesis h for which h(x) ≠ c(x).
3. Output the list of hypotheses in VersionSpace.
Initial Version Space (all semantically distinct hypotheses): (A, X), (A, Y), (A, ?), (B, X), (B, Y), (B, ?), (?, X), (?, Y), (?, ?), (ø, ø)
• Training Instances
F1    F2    Target
A     X     Yes
A     Y     Yes
After eliminating every hypothesis that is inconsistent with these two positive examples, the remaining version space is: (A, ?), (?, ?)
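The same reduction can be checked with a tiny List-Then-Eliminate sketch that enumerates the semantically distinct hypotheses and removes every one that misclassifies a training instance (the '?'/'0' tuple encoding is our illustrative assumption, with '0' standing for ø):

```python
# List-Then-Eliminate on the two-feature example: F1 in {A, B}, F2 in {X, Y}.

def h_of_x(h, x):
    """h(x) is "Yes" if x satisfies every constraint of h, otherwise "No"."""
    ok = all(hv != '0' and (hv == '?' or hv == xv) for hv, xv in zip(h, x))
    return "Yes" if ok else "No"

# The 10 semantically distinct hypotheses listed above ('0' stands for the empty constraint ø).
hypotheses = [('A', 'X'), ('A', 'Y'), ('A', '?'),
              ('B', 'X'), ('B', 'Y'), ('B', '?'),
              ('?', 'X'), ('?', 'Y'), ('?', '?'), ('0', '0')]

training = [(('A', 'X'), "Yes"), (('A', 'Y'), "Yes")]

# Step 2: eliminate every hypothesis inconsistent with some training example.
version_space = [h for h in hypotheses
                 if all(h_of_x(h, x) == label for x, label in training)]
print(version_space)   # [('A', '?'), ('?', '?')]
```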
Inductive Learning:
This basically means learning from examples, i.e., learning on the go. In inductive learning we are given input samples x and the corresponding output samples f(x), and the objective is to estimate the function f. The goal is to generalize from the samples and learn the mapping so that the output can be estimated for fresh samples in the future.
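As a tiny illustration of estimating f from samples (x, f(x)), the sketch below fits a straight line to a handful of made-up points with NumPy's least-squares polynomial fit and then predicts the output for a fresh sample:

```python
# Estimating an unknown function f from samples (x, f(x)): inductive learning in miniature.
import numpy as np

x  = np.array([1.0, 2.0, 3.0, 4.0])
fx = np.array([2.1, 3.9, 6.2, 8.1])        # noisy samples of a roughly linear f

slope, intercept = np.polyfit(x, fx, 1)    # least-squares fit of a degree-1 polynomial
print(slope * 5.0 + intercept)             # estimated f(5) for a fresh, unseen sample
```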
Deductive Learning:
Learners are initially exposed to concepts and generalizations, followed by particular examples and exercises that aid learning. Already existing rules are applied to the training examples.