MACHINE LEARNING
MODULE – 1
Course Code BCS602
TOPICS
• Introduction
• Need for Machine Learning
• Machine Learning Explained
• Machine Learning in Relation to other Fields
• Types of Machine Learning
• Challenges of Machine Learning
• Machine Learning Process
• Machine Learning Applications
• Understanding Data – 1: Introduction
• Big Data Analysis Framework
• Descriptive Statistics
• Univariate Data Analysis and Visualization
Machine learning (ML) allows computers to learn and make
decisions without being explicitly programmed. It involves
feeding data into algorithms to identify patterns and make
predictions on new data.
https://medium.com/enjoy-algorithm/introduction-to-machine-learning-74393e6b7b9d
Need for Machine Learning
Machine learning has become popular for three reasons:
1. High volume of available data: Big companies such as Facebook, Twitter, and
YouTube generate huge amounts of data that grow at a phenomenal rate. It is
estimated that the volume of data approximately doubles every year.
2. Reduced cost of storage: The cost of storage and of hardware has dropped.
Therefore, it is now easier to capture, process, store, distribute, and transmit
digital information.
3. Availability of complex algorithms: Especially with the advent of deep
learning, many algorithms are now available for machine learning.
Understanding the Knowledge Pyramid
The Knowledge Pyramid explains how raw data is transformed into useful
knowledge and intelligence. It has five levels:
• Data (Raw Facts)
  • Basic facts and numbers stored in different formats like databases or spreadsheets.
  • Example: A store collects data on daily sales transactions.
• Information (Processed Data)
  • Data that has been analyzed to find patterns or useful details.
  • Example: Identifying the best-selling product from sales data.
• Knowledge (Condensed Information)
  • Insights gained from information, which help in decision-making.
  • Example: Noticing seasonal trends in sales data, like increased sales during holidays.
• Intelligence (Applied Knowledge)
  • Using knowledge to take actions or make strategic decisions.
  • Example: A business using sales trends to decide on stock levels or marketing strategies.
• Wisdom (Final Stage – Human Expertise)
  • The ability to make the best decisions based on intelligence, experience, and judgment.
  • Example: A business leader using experience and insights to create long-term strategies.
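The first step of the pyramid, turning data into information, can be sketched in a few lines of Python. The transaction records and the function name best_selling_product are hypothetical and serve only to illustrate the store example above.

```python
from collections import Counter

# Data: raw facts, one record per sale (hypothetical values).
transactions = [
    {"day": "Mon", "product": "pen", "quantity": 3},
    {"day": "Mon", "product": "notebook", "quantity": 2},
    {"day": "Tue", "product": "pen", "quantity": 4},
    {"day": "Tue", "product": "bag", "quantity": 1},
    {"day": "Wed", "product": "notebook", "quantity": 1},
]


def best_selling_product(records):
    """Information: summarise the raw records into units sold per product."""
    totals = Counter()
    for record in records:
        totals[record["product"]] += record["quantity"]
    product, units = totals.most_common(1)[0]
    return product, units


top_product, top_units = best_selling_product(transactions)
print(top_product, top_units)
```

The higher levels (knowledge, intelligence, wisdom) depend on interpretation and judgment, so they are not modelled in this sketch.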
Machine Learning Explained
"Machine learning is the field of study that gives computers the ability to learn
without being explicitly programmed."
How is Machine Learning Different from Traditional Programming?
Traditional Programming:
• A programmer writes a set of rules for the computer to follow.
• Example: A program for spam filtering checks emails for specific keywords
like "lottery" or "free money" and marks them as spam.
• Problem: This method cannot adapt to new types of spam emails that do
not contain predefined keywords.
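A minimal sketch of the rule-based spam filter described above may make the limitation concrete. The keyword list comes from the example; the function name is_spam and the sample emails are assumptions made for illustration.

```python
# Hand-written rule: an email is spam if it contains a known keyword.
SPAM_KEYWORDS = ("lottery", "free money")


def is_spam(email_text, keywords=SPAM_KEYWORDS):
    """Return True if any predefined keyword appears in the email."""
    text = email_text.lower()
    return any(keyword in text for keyword in keywords)


# Hypothetical emails used only to exercise the rule.
known_spam = "You won the LOTTERY! Claim your free money now."
new_spam = "Congratulations, you have been selected for a cash prize."

print(is_spam(known_spam))  # caught by the keyword rule
print(is_spam(new_spam))    # missed: no predefined keyword is present
```

The second email is spam to a human reader, but the fixed rule cannot flag it until a programmer adds a new keyword by hand.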
Real-World Example:
• Amazon’s Customer Insights:
• Amazon uses data mining to analyze customer purchase history.
• This helps in identifying shopping trends and recommending products
accordingly.
4️⃣ Data Analytics: Extracting Actionable Insights
Data analytics involves analyzing raw data to find actionable insights that
can be used for decision-making.
Labelled Data
To illustrate labelled data, let us take one example dataset called the Iris
flower dataset or Fisher’s Iris dataset. The dataset has 150 samples of Iris
flowers (50 per class) with four attributes: the length and width of the sepals
and petals. The target variable is called class. There are three classes: Iris
setosa, Iris virginica, and Iris versicolor.
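As a rough illustration, labelled data can be pictured as rows that carry both the four attributes and the class. The rows below are a small illustrative sample in the style of the Iris dataset, not the full dataset, and the variable names are assumptions.

```python
# Each row: (sepal length, sepal width, petal length, petal width, class).
# Illustrative measurements in cm.
labelled_rows = [
    (5.1, 3.5, 1.4, 0.2, "Iris setosa"),
    (7.0, 3.2, 4.7, 1.4, "Iris versicolor"),
    (6.3, 3.3, 6.0, 2.5, "Iris virginica"),
]

# An unlabelled row has the same attributes but no class (no "answer").
unlabelled_row = (6.3, 2.9, 5.6, 1.8)

# Separate the attributes (features) from the target variable (class).
features = [row[:4] for row in labelled_rows]
labels = [row[4] for row in labelled_rows]

print(features[0], "->", labels[0])
```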
Supervised Learning: Learning with Answers
In supervised learning, the model learns from labelled data (data with correct
answers).
The goal is to predict labels for new, unseen data.
Example:
Imagine a teacher giving students a math problem and the correct answer. The
students learn from these examples and solve similar problems on their own.
Types of Supervised Learning:
Regression – Predicts continuous values (e.g., house prices, temperature).
Classification – Predicts categories (e.g., spam vs. non-spam emails, dog vs. cat
images).
Real-World Use Cases:
Spam Detection – Classifies emails as spam or not.
Disease Prediction – Predicts if a patient has a disease based on symptoms.
Stock Market Prediction – Predicts stock prices.
CLASSIFICATION
What is Classification?
Classification is a supervised learning technique used to predict labels
(categories) for new data. It works by learning from labelled data and then
using that knowledge to classify new, unseen data.
Example:
Imagine you have a set of pictures of cats and dogs, where each image is
labelled as either "cat" or "dog." A classification model learns from these
images and can later identify whether a new, unknown image is a cat or a
dog.
How Does Classification Work?
The classification process happens in two stages:
1️⃣ Training Stage
• The model learns from a dataset where each data point is labelled.
• Example: A dataset of animals where each image has a label ("cat" or "dog").
• The model builds a classification structure to understand patterns.
2️⃣ Testing Stage
• The trained model is given new, unseen data.
• It predicts the correct label based on what it learned.
• Example: If given an unknown animal image, the model classifies it as a "cat"
or "dog."
Example with the Iris Dataset:
• Suppose the Iris dataset contains flower details like petal length, petal width,
etc.
• If we provide new flower data (6.3, 2.9, 5.6, 1.8, ?), the classification model
predicts the flower type.
Some of the key algorithms of classification are:
• Decision Tree
• Random Forest
• Support Vector Machines
• Naïve Bayes
• Artificial Neural Network and Deep Learning networks like CNN
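The two stages can be sketched with a 1-nearest-neighbour classifier. This algorithm is not in the list above; it is used here only because it fits in a few lines. The training rows are a small illustrative sample in the style of the Iris dataset, and the names TRAINING_DATA and classify are assumptions.

```python
import math

# Training stage: for nearest neighbour, "training" is just storing
# the labelled examples (attributes, class).
TRAINING_DATA = [
    ((5.1, 3.5, 1.4, 0.2), "Iris setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Iris setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Iris versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Iris versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Iris virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Iris virginica"),
]


def classify(sample, training_data=TRAINING_DATA):
    """Testing stage: return the class of the closest stored example."""
    _, nearest_label = min(
        training_data, key=lambda pair: math.dist(sample, pair[0])
    )
    return nearest_label


# New flower data with an unknown class: (6.3, 2.9, 5.6, 1.8, ?)
predicted = classify((6.3, 2.9, 5.6, 1.8))
print(predicted)
```

With this small sample the new flower lies closest to a virginica example, so that is the predicted class. A practical model would be trained on the full dataset and checked on held-out test data.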
Regression Models
What is Regression?
Regression is a supervised learning technique used to predict
continuous values (numbers). Unlike classification (which
predicts categories like "cat" or "dog"), regression predicts
numerical values like sales, temperature, or house prices.
Example:
Can we predict the future sales of a product based on the previous weeks' sales?
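One simple way to answer such a question is to fit a straight line through the past values by least squares. The sketch below assumes hypothetical weekly sales figures; the function name fit_line is also an assumption.

```python
# Hypothetical sales (units) for weeks 1 to 5.
weeks = [1, 2, 3, 4, 5]
sales = [12.0, 15.0, 18.0, 21.0, 24.0]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(weeks, sales)
forecast_week_6 = intercept + slope * 6  # a continuous value, not a category
print(slope, intercept, forecast_week_6)
```

The output is a number on a continuous scale, which is what separates regression from classification.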
Dimensionality Reduction
Dimensionality reduction algorithms are examples of unsupervised
algorithms. They take higher-dimensional data as input and output the
data in a lower dimension by taking advantage of the variance of the
data. The task is to reduce the dataset to a few features without
losing generality.
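As a hedged illustration, the sketch below reduces two-dimensional points to one dimension by projecting them onto the direction of greatest variance, which is the idea behind principal component analysis (PCA). The text does not name a specific algorithm, so PCA is an assumption here, as are the data values and the function name.

```python
import math

# Hypothetical 2-D data in which the two features are strongly related.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]


def project_to_one_dimension(data):
    """Project 2-D points onto the direction of greatest variance."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    centred = [(x - mean_x, y - mean_y) for x, y in data]

    var_x = sum(dx * dx for dx, _ in centred) / n
    var_y = sum(dy * dy for _, dy in centred) / n
    cov_xy = sum(dx * dy for dx, dy in centred) / n

    # Closed-form principal axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(angle), math.sin(angle))

    projected = [dx * direction[0] + dy * direction[1] for dx, dy in centred]
    retained = (sum(p * p for p in projected) / n) / (var_x + var_y)
    return projected, retained


projected, retained = project_to_one_dimension(points)
print(projected, retained)
```

Each point is now described by one number instead of two, and retained reports the fraction of the original variance that the single new feature keeps.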
Differences between Supervised and Unsupervised Learning
Semi-supervised Learning
There are circumstances where the dataset has a huge collection of
unlabelled data and only some labelled data. Labelling is a costly
process and is difficult for humans to perform. Semi-supervised
algorithms use the unlabelled data by assigning a pseudo-label to each
item. The labelled and pseudo-labelled datasets can then be combined.
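A minimal sketch of pseudo-labelling follows. It assumes one-dimensional data and assigns each unlabelled value the label of its nearest labelled neighbour; the data, the labels "A" and "B", and the function name are hypothetical, and real systems usually use a trained model and keep only confident pseudo-labels.

```python
# A few labelled examples (value, label) and more unlabelled values.
labelled = [(1.0, "A"), (2.0, "A"), (8.0, "B"), (9.0, "B")]
unlabelled = [1.5, 8.5, 3.0]


def pseudo_label(value, labelled_data):
    """Assign the label of the nearest labelled example."""
    _, nearest_label = min(labelled_data, key=lambda pair: abs(pair[0] - value))
    return nearest_label


# Step 1: assign a pseudo-label to every unlabelled value.
pseudo_labelled = [(value, pseudo_label(value, labelled)) for value in unlabelled]

# Step 2: combine the labelled and pseudo-labelled data into one dataset.
combined = labelled + pseudo_labelled
print(combined)
```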
Reinforcement Learning
Reinforcement Learning (RL) is a type of machine learning
where an agent learns by interacting with an environment
to achieve a goal.
Agent: The learner (it could be a robot, program, or even
a human).
Environment: The world where the agent operates.
Actions: The choices an agent can make.
Rewards & Punishments: The feedback an agent
receives for its actions.
The goal of RL is to maximize rewards over time!
How Does It Work?
Just like how humans learn from trial and error, RL agents learn by taking
actions and getting feedback (positive rewards or negative punishments).
Example: Learning to Play a Grid Game
In the Grid Game (Figure 1.10):
Goal: Reach the target tile.
Danger: Avoid stepping on danger tiles.
Block: Some paths are blocked.
Actions: Move left, right, up, or down.
The agent starts from the bottom-left tile and tries different moves.
If it steps into danger, it gets a negative reward (punishment).
If it moves closer to the goal, it gets a positive reward.
Over time, the agent learns the best path to the goal through experience.
https://www.geeksforgeeks.org/what-is-reinforcement-learning/
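The trial-and-error loop can be sketched with tabular Q-learning, one common RL algorithm (the text above does not name a specific one). Everything concrete below is an assumption: the 3×3 grid layout is not the one in Figure 1.10, and the reward values (+10 for the goal, −10 for danger, −1 per move) and learning parameters are chosen only for illustration.

```python
import random

ROWS, COLS = 3, 3
START, GOAL = (2, 0), (0, 2)   # bottom-left start, top-right target
DANGER = {(1, 1)}
BLOCKED = {(1, 2)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply an action and return (next_state, reward, episode_finished)."""
    d_row, d_col = ACTIONS[action]
    row, col = state[0] + d_row, state[1] + d_col
    if not (0 <= row < ROWS and 0 <= col < COLS) or (row, col) in BLOCKED:
        row, col = state  # walls and blocked tiles leave the agent in place
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 10.0, True
    if nxt in DANGER:
        return nxt, -10.0, True
    return nxt, -1.0, False


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values from repeated episodes."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(ROWS) for c in range(COLS) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        for _ in range(50):  # cap on moves per episode
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the best-valued action from the start tile."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
print(path)
```

After training, following the highest-valued action from each tile should lead the agent around the danger tile to the goal.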
CHALLENGES OF MACHINE LEARNING
Computers are better than humans at tasks like computation. For example, when
calculating the square root of a large number, an average human may blink, but
a computer can display the result in seconds. Computers can play games like
chess and Go, and even beat professional players of those games.
However, humans are better than computers in many aspects like recognition.
But deep learning systems challenge human beings in this aspect as well.
Machines can recognize human faces in a second. Still, there are tasks where
humans are better, as machine learning systems still require quality data for
model construction. The quality of a learning system depends on the quality of
its data.
Some of the challenges are listed below:
1. Problems – Machine learning can deal with ‘well-posed’ problems where the
specifications are complete and available. Computers cannot solve ‘ill-posed’
problems. Consider one simple example (shown in Table 1.3).
Can a model for this test data be multiplication, that is, y = x1 × x2? Yes, it can. But it
is equally true that the model may be y = x1 ÷ x2 or y = x1^x2. So, there are three functions
that fit the data, which means that the problem is ill-posed. To solve this problem, one
needs more examples to check the model. Puzzles and games that do not have a sufficient
specification may become ill-posed problems, and scientific computation has many
ill-posed problems.
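The ambiguity can be checked directly in code. The data rows below are hypothetical stand-ins chosen so that x2 is always 1; they are not the contents of Table 1.3. The function name fits is also an assumption.

```python
# Hypothetical examples (x1, x2, y) in which x2 is always 1.
examples = [(1, 1, 1), (2, 1, 2), (3, 1, 3)]

# Three candidate models taken from the discussion above.
candidates = {
    "multiplication": lambda x1, x2: x1 * x2,
    "division": lambda x1, x2: x1 / x2,
    "power": lambda x1, x2: x1 ** x2,
}


def fits(model, data):
    """Return True if the model reproduces y for every example."""
    return all(model(x1, x2) == y for x1, x2, y in data)


# With the original examples every candidate fits, so the problem is ill-posed.
fitting_before = [name for name, model in candidates.items()
                  if fits(model, examples)]

# One additional example with x2 != 1 removes the ambiguity.
more_examples = examples + [(2, 3, 6)]
fitting_after = [name for name, model in candidates.items()
                 if fits(model, more_examples)]

print(fitting_before)
print(fitting_after)
```

A single extra example in which x2 differs from 1 is enough to rule out two of the three candidates.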
2. Huge Data
• Machine learning needs a lot of data to learn properly.
• The data should be high quality—it should not have missing values or incorrect
information.
• Example: If you are teaching a computer to recognize dogs, but half of your images are
missing labels, the model will not learn properly.
3. High Computation Power
• More data means more processing power is needed.
• Special hardware like GPUs (Graphics Processing Units) and TPUs (Tensor Processing
Units) help speed up calculations.
• Example: Training a self-driving car model requires processing millions of images, which
needs powerful computers.
4. Complexity of Algorithms
• There are many different machine learning algorithms, and choosing the right one is
difficult.
• Data scientists must compare, test, and tune different algorithms to get the best results.
• Example: If you are building a recommendation system (like Netflix or YouTube), you must
test multiple algorithms to see which one gives the best movie recommendations.
5. Bias/Variance Problem
• Bias: The model is too simple and does not learn well (underfitting).
• Variance: The model is too complex and learns too much from the training
data but fails on new data (overfitting).
• Example:
• Underfitting: A model that always predicts "the temperature is 20°C" no matter the
weather. It does not learn well.
• Overfitting: A student memorizing answers instead of understanding concepts—does
well in practice but fails in real exams.
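The contrast can be made concrete with three small models on hypothetical data that roughly follows y = 2x. The data values and model names are assumptions: the constant model stands for underfitting, the memorising model (which returns the answer of the nearest training point) stands for overfitting, and a fitted straight line sits in between.

```python
# Hypothetical training and test data, roughly y = 2x with small noise.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
test = [(1.5, 3.0), (2.5, 5.1), (3.5, 6.9)]


def mse(model, data):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)


def constant_model(x):
    """Underfitting: always predicts the same value, whatever the input."""
    return mean_y


def memorising_model(x):
    """Overfitting: repeats the answer of the nearest training example."""
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y


slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x


def linear_model(x):
    """A model of suitable complexity for this data."""
    return intercept + slope * x


for name, model in [("constant", constant_model),
                    ("memorising", memorising_model),
                    ("linear", linear_model)]:
    print(name, mse(model, train), mse(model, test))
```

The memorising model has zero error on the training data but a larger error than the line on the test data, while the constant model is poor on both.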
MACHINE LEARNING PROCESS
The emerging process model for data mining solutions for business
organizations is CRISP-DM. Since machine learning is like data mining, except
for the aim, this process can also be used for machine learning. CRISP-DM stands
for Cross-Industry Standard Process for Data Mining. This process involves six
steps. The steps are listed below in Figure 1.11.
1. Understanding the Business
• Before analyzing data, you need to understand the problem the business is
facing.
• You also define what kind of solution is needed.
• Example: A retail store wants to predict which customers are likely to return and
shop again.
2. Understanding the Data
• You collect all the data and study its characteristics.
• You look for patterns and form a hypothesis about what might be happening.
• Example: The store collects data on customer purchases and looks for patterns
(e.g., do customers who buy shoes also buy socks?).
3. Preparing the Data
• Raw data often has missing values or errors, so it must be cleaned.
• Missing or incorrect data can lead to wrong predictions.
• Example: If half the customer purchase history is missing, your prediction model
4. Modeling
• You apply data mining algorithms to the cleaned data to build a model that
finds patterns.
• Example: A model might predict which customers will return based on their
past shopping habits.
5. Evaluating the Model
• You test the model to see how well it performs.
• The model should make accurate predictions to be useful.
• Example: If the model predicts that a customer will return, but they don’t,
then the model needs improvement.
6. Deploying the Model
• Once the model is working well, it is used in the real world to improve
decision-making.
• Example: The store uses the model to send discount coupons to customers
likely to return.
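Steps 3 to 5 can be sketched for the retail example. All of the following are assumptions made for illustration: the customer records, the field names, and the very simple model (a threshold on the number of past purchases).

```python
# Hypothetical customer records; None marks a missing value.
train_records = [
    {"past_purchases": 8, "returned": True},
    {"past_purchases": 1, "returned": False},
    {"past_purchases": None, "returned": True},
    {"past_purchases": 5, "returned": True},
    {"past_purchases": 2, "returned": False},
    {"past_purchases": 6, "returned": True},
    {"past_purchases": 3, "returned": False},
]
test_records = [
    {"past_purchases": 7, "returned": True},
    {"past_purchases": 2, "returned": False},
    {"past_purchases": 4, "returned": True},
    {"past_purchases": 1, "returned": False},
]


def prepare(records):
    """Step 3: drop records with missing values."""
    return [r for r in records if r["past_purchases"] is not None]


def accuracy(threshold, records):
    """Share of records where 'purchases >= threshold' matches the outcome."""
    correct = sum((r["past_purchases"] >= threshold) == r["returned"]
                  for r in records)
    return correct / len(records)


def build_model(records):
    """Step 4: pick the threshold that best separates the training data."""
    candidates = sorted({r["past_purchases"] for r in records})
    return max(candidates, key=lambda t: accuracy(t, records))


clean_train = prepare(train_records)
threshold = build_model(clean_train)
test_accuracy = accuracy(threshold, prepare(test_records))  # Step 5: evaluate
print(threshold, test_accuracy)
```

The model is evaluated on records it did not see during modeling. A wrong prediction on those records is the signal that the model needs improvement before deployment.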
MACHINE LEARNING APPLICATIONS
• Machine learning is used everywhere today, making daily tasks easier. Here
are some common applications:
2. Time-Series Database
Stores time-related data (data collected over time).
Data is organized based on timestamps (hourly, daily, weekly, etc.).
Helps track trends and patterns over time.
Example:
• Stock Market Data – Records daily price changes of stocks.
• Weather Monitoring – Tracks temperature, humidity, and rainfall over time.
• Website Traffic Logs – Stores number of website visitors per hour/day.
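As a small illustration of timestamp-organized data, the sketch below groups hourly website traffic into daily totals. The log entries and the function name daily_totals are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly log: (timestamp, number of visitors).
traffic_log = [
    ("2024-01-01 09:00", 120),
    ("2024-01-01 10:00", 150),
    ("2024-01-02 09:00", 90),
    ("2024-01-02 10:00", 110),
]


def daily_totals(log):
    """Group hourly records by calendar day and sum the visitors."""
    totals = defaultdict(int)
    for stamp, visitors in log:
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M").date()
        totals[day] += visitors
    return dict(sorted(totals.items()))


per_day = daily_totals(traffic_log)
for day, visitors in per_day.items():
    print(day, visitors)
```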
3. Spatial Database
Stores location-based (spatial) data in two formats:
• Raster Format → Uses bitmaps or pixel maps (e.g., satellite images).
• Vector Format → Stores geometric shapes like points, lines, and polygons (e.g.,
maps).
Example:
• Google Maps & GPS Navigation – Stores geographic locations, routes, and places.
• Weather Maps – Uses raster images to display temperature, pressure, etc.
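The two formats can be contrasted with plain Python structures. This is only a sketch: the temperature values, the route coordinates, and the function name route_length are hypothetical, and real spatial databases use dedicated storage formats.

```python
import math

# Raster format: a grid of cells (pixels), each holding one value.
temperature_raster = [
    [21.0, 21.5, 22.0],
    [20.5, 21.0, 21.5],
]

# Vector format: geometric shapes described by coordinates,
# here a route stored as a sequence of points.
route = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]


def route_length(points):
    """Total length of the line that joins consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


print(temperature_raster[1][2])  # value stored in one raster cell
print(route_length(route))       # length computed from vector geometry
```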