Lecture 1 - Introduction To ML
Email: miqbal@cct.ie
Agenda
• Introduction to Machine Learning (ML)
• Traditional Programming and Machine Learning
• What is Machine Learning?
• Application Areas of ML
• Branches of Machine Learning
• Supervised, Unsupervised, Semi-supervised and Reinforcement Learning
• ML Algorithm Selection
• Cross Industry Standard Process (CRISP-DM)
Introduction to ML
• In 1950, Alan Turing asked a question in his paper, “Computing
Machinery and Intelligence”,
• “Can machines think?” or “Can machines behave intelligently?”
• The paper describes the “Imitation Game,” which involves three
participants: a human acting as a judge, another human,
and a computer that is attempting to convince the judge that it
is human.
• The judge would type into a terminal program to “talk” to the
other two participants. Both the human and the computer
would respond, and the judge would decide which response
came from the computer.
• If the judge couldn’t consistently tell the difference between the
human and computer responses, then the computer won the
game. The test was later run as the Loebner Prize, an annual
competition in artificial intelligence held from 1991 to 2019.
• The aim is simple enough: convince the judges that they are
chatting to a human instead of a computer chat bot program.
Introduction to Machine Learning
• The figure shows how AI, machine learning, deep learning and data
science relate to and overlap with one another.
• Machine learning is a subset of AI that consists of
techniques enabling computers to identify patterns in data
and to deliver AI applications. Deep learning is a subset of
machine learning that enables computers to solve more
complex problems.
• Data science isn’t exactly a subset of machine learning, but
it uses machine learning, deep learning, and AI to analyze
data and reach actionable conclusions.
• It combines machine learning, deep learning and AI with
other disciplines, such as big data analytics and cloud
computing.
TP and ML
• The traditional programming (TP) approach is
shown in the figure. We have rules that act on
data and give us answers.
• The programmer provides both the rules and the
data to the program, and the program produces
its output by applying those rules to the data.
• In machine learning, by contrast, we get lots of
data about our scenario, we label that data, and
the computer figures out the rules that make one
piece of data match a particular label and another
piece of data match a different label.
• For example, suppose we collect many instances
of data from people while they’re doing different
activities.
• We end up with data that says “This is what
walking looks like,” “This is what running looks
like,” and so on.
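The contrast can be sketched in a few lines of Python. Everything here is a hypothetical illustration: a single made-up feature (average speed in km/h), invented example data, and helper names of our own. In the traditional version a person writes the rule; in the machine learning version the rule (a single speed threshold) is chosen from labeled examples.

```python
def rule_based_activity(speed_kmh):
    # Traditional programming: a human writes the rule by hand.
    return "running" if speed_kmh > 8.0 else "walking"


def learn_threshold(examples):
    # Machine learning (a very simple version): derive the rule from labeled
    # data by trying each observed speed as a threshold and keeping the one
    # that classifies the most training examples correctly.
    best_threshold, best_correct = None, -1
    for candidate, _ in examples:
        correct = sum(
            (speed > candidate) == (label == "running")
            for speed, label in examples
        )
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


# Hypothetical labeled data: (average speed in km/h, activity label).
data = [(4.5, "walking"), (5.0, "walking"), (6.2, "walking"),
        (9.5, "running"), (11.0, "running"), (12.3, "running")]

threshold = learn_threshold(data)  # 6.2 for this data


def learned_activity(speed_kmh):
    # The learned rule has the same shape, but its threshold came from data.
    return "running" if speed_kmh > threshold else "walking"


print(rule_based_activity(7.0))  # walking
print(learned_activity(7.0))     # running
```

The two versions disagree at 7 km/h: the hand-written threshold reflects the programmer's guess, while the learned one reflects the examples it was given.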
TP and ML
• We can develop a spam filter using traditional programming
techniques with the following steps:
1. First, we would consider what spam typically looks like. We
might notice that some words or phrases (such as “win,”
“credit card,” “free,” and “amazing”) tend to come up a lot in
the subject line. We would also notice a few other patterns in
the sender’s name, the email’s body, and other parts of the email.
2. We would write a detection algorithm for each of the
patterns that we noticed, and our program would flag
emails as spam if a number of these patterns were
detected.
3. We would test the program and repeat steps 1 and 2 until it
was good enough to launch.
Figure: The traditional approach
Figure: Machine Learning can help humans learn
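A minimal Python sketch of steps 1 and 2, assuming only the subject-line keywords from the slide plus one invented sender pattern (four or more digits in the sender address). The sender pattern, the `min_hits` threshold and the function names are hypothetical.

```python
import re

# Step 1: patterns a human noticed. The keywords come from the slide;
# the sender pattern is a hypothetical example.
SUBJECT_KEYWORDS = ["win", "credit card", "free", "amazing"]
SUSPICIOUS_SENDER = re.compile(r"\d{4,}")  # four or more digits in a row


def count_patterns(subject, sender):
    """Step 2: one hand-written detector per pattern; count how many fire."""
    subject = subject.lower()
    hits = sum(1 for keyword in SUBJECT_KEYWORDS if keyword in subject)
    if SUSPICIOUS_SENDER.search(sender):
        hits += 1
    return hits


def is_spam(subject, sender, min_hits=2):
    # Flag the email when "a number of" patterns are detected.
    return count_patterns(subject, sender) >= min_hits


print(is_spam("WIN a FREE cruise!", "promo12345@example.com"))  # True
print(is_spam("Free lunch today", "bob@example.com"))           # False
```

Plain substring matching also fires on words such as “window” or “freedom”; fixing cases like this by hand is why such rule lists tend to grow long and hard to maintain.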
What is Machine Learning?
• Machine Learning is the science (and art) of programming computers so they
can learn from data.
• Machine Learning is the field of study that gives computers the ability to learn
without being explicitly programmed. (Arthur Samuel, 1959)
• A set of tools for making inferences and predictions from data.
• Predict future events
• Will it rain tomorrow?
• Yes (70% probability)
• Infer the causes of events and behaviors
• Why does it rain?
• Time of the year, humidity levels, temperature, location etc.
• Infer patterns
• What are the different types of weather conditions?
• Rain, sunny, overcast, fog, etc.
Machine Learning Framework
• Machine learning is a unified algorithmic
framework designed to identify
computational models that accurately
describe empirical data and the
phenomena underlying it, with little or no
human involvement.
Figure: example data (a table with the attributes Tid, Refund, Marital Status, Taxable Income and Cheat)
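As one concrete instance of identifying a computational model that describes empirical data, the sketch below fits a straight line y ≈ a + b·x to a handful of invented measurements with ordinary least squares. Linear least squares is our choice of illustration; the slide does not name a particular model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ a + b*x, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical noisy observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # 0.09 1.97
```

The "model" here is just the pair (a, b); no human chose those values, they were computed from the data.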
Branches of Machine Learning
• Machine learning algorithms are commonly grouped into two broad categories,
supervised and unsupervised learning; semi-supervised and reinforcement
learning are two further branches.
Figure: Supervised Learning, Unsupervised Learning, Semi-supervised Learning, Reinforcement Learning
• Supervised learning algorithms are trained with labeled data.
In other words, they learn from data composed of examples of the desired
answers. For instance, a model that identifies fraudulent credit
card use would be trained from a dataset with labeled data points
of known fraudulent and valid charges. Most machine learning is
supervised.
• Examples:
Figure: an example decision tree that splits on Education (High school, Undergrad, Graduate) and on Number of years, with Yes/No leaves
Classification
Applications
• Fraud Detection
  ◦ Goal: Predict fraudulent cases in credit card transactions.
  ◦ Approach:
    ◦ Use credit card transactions and the information on the account holder as attributes: when does a customer buy, what does he/she buy, how often he/she pays on time, etc.
    ◦ Label past transactions as fraud or fair transactions. This forms the class attribute.
    ◦ Learn a model for the class of the transactions.
    ◦ Use this model to detect fraud by observing credit card transactions on an account.
• Churn prediction for telephone customers
  ◦ Goal: To predict whether a customer is likely to be lost to a competitor.
  ◦ Approach:
    ◦ Use a detailed record of transactions with each of the past and present customers to find attributes: how often the customer calls, where he/she calls, what time of day he/she calls most, his/her financial status, marital status, etc.
    ◦ Label the customers as loyal or disloyal.
    ◦ Find a model for loyalty.
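The fraud-detection recipe above (choose attributes, label past transactions, learn a model, apply it to new transactions) can be sketched with a k-nearest-neighbors classifier. The choice of k-NN, the two attributes (amount and hour of day) and all of the data are assumptions for illustration only.

```python
import math
from collections import Counter

# Hypothetical labeled history: ((amount in euro, hour of day), class label).
past_transactions = [
    ((25.0, 14), "fair"), ((40.0, 10), "fair"), ((12.5, 19), "fair"),
    ((60.0, 12), "fair"), ((950.0, 3), "fraud"), ((1200.0, 2), "fraud"),
    ((870.0, 4), "fraud"),
]


def knn_predict(history, query, k=3):
    """Label a new transaction by majority vote among its k nearest past ones."""
    nearest = sorted(history, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(past_transactions, (30.0, 13)))   # fair
print(knn_predict(past_transactions, (1000.0, 3)))  # fraud
```

In practice the attributes would be scaled first; here the amount dominates the distance because its range is much larger than that of the hour.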
Machine Learning Modelling
Clustering
• Finding groups of objects such that the objects in a group will be similar (or related) to one
another and different from (or unrelated to) the objects in other groups.
• In unsupervised learning, as we might guess, the training
data is unlabeled. The system tries to learn without a
teacher.
• The most important unsupervised learning algorithms include:
• Clustering: K-Means, DBSCAN and Hierarchical
Cluster Analysis (HCA).
• Anomaly Detection and Novelty Detection: One-class
SVM and Isolation Forest.
• Visualization and Dimensionality Reduction: Principal
Component Analysis (PCA), Locally Linear Embedding
(LLE) and t-Distributed Stochastic Neighbor Embedding
(t-SNE).
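To show how one of the listed clustering algorithms works, here is a minimal from-scratch sketch of K-Means on invented 2-D points. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points until the centroids stop changing. The data, seed and iteration cap are arbitrary choices.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(coord) / len(cluster) for coord in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of points, no labels.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```

Note that no labels are used anywhere: the groups emerge only from the distances between points.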
Clustering
Applications
• Market Segmentation:
  ◦ Goal: Subdivide a market into distinct subsets of customers, where any subset may be selected as a market target to be reached with a distinct marketing mix.
  ◦ Approach:
    ◦ Collect different attributes of customers based on their geographical and lifestyle-related information.
    ◦ Find clusters of similar customers.
    ◦ Measure the clustering quality by observing the buying patterns of customers in the same cluster vs. those from different clusters.
• Document Clustering:
  ◦ Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.
  ◦ Approach: Identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
  ◦ Figure: Enron email dataset
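Document clustering needs a similarity measure based on term frequencies. One common choice, assumed here rather than taken from the slide, is the cosine similarity of term-count vectors. This sketch splits on whitespace and keeps every word; a real pipeline would also drop unimportant words (stop words) and would need to guard against empty documents, which make the denominator zero.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each lower-cased word occurs in a document."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """1.0 for identical term proportions, 0.0 when no terms are shared."""
    shared = set(tf_a) & set(tf_b)
    dot = sum(tf_a[t] * tf_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    return dot / (norm_a * norm_b)


# Hypothetical documents.
docs = ["energy trading report", "energy trading prices", "team lunch friday"]
tfs = [term_frequencies(d) for d in docs]
print(round(cosine_similarity(tfs[0], tfs[1]), 3))  # 0.667
print(cosine_similarity(tfs[0], tfs[2]))            # 0.0
```

A clustering algorithm can then group documents whose pairwise similarity is high.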
Association Rule Discovery
• Given a set of records, each of which contains some number of
items from a given collection:
• Produce dependency rules that predict the occurrence of an item based
on occurrences of other items.
TID Items
1 Bread, Coke, Milk
2 Beer, Bread
3 Beer, Coke, Diaper, Milk
4 Beer, Bread, Diaper, Milk
5 Coke, Diaper, Milk
Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
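The slide does not say how the rules were found. A standard way to score candidate rules is by support (the fraction of transactions containing all the items of the rule) and confidence (how often the right-hand side occurs when the left-hand side does). The sketch below computes both on the five transactions from the table.

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs): support(lhs and rhs together) / support(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


print(round(confidence({"Milk"}, {"Coke"}, transactions), 2))            # 0.75
print(round(confidence({"Diaper", "Milk"}, {"Beer"}, transactions), 2))  # 0.67
```

On this table, {Milk} --> {Coke} has support 3/5 = 0.6 and confidence 3/4 = 0.75, and {Diaper, Milk} --> {Beer} has support 2/5 = 0.4 and confidence 2/3 ≈ 0.67. A rule whose left-hand side never occurs would divide by zero, so rule miners usually score only itemsets that have already passed a minimum-support filter.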
Association Analysis
Applications
• Market-basket analysis
• Rules are used for sales promotion, shelf management, and inventory
management.
• Medical Informatics
• Rules are used to find combinations of patient symptoms and test results
associated with certain diseases.
Deviation/Anomaly/Change Detection
Semi-supervised Learning
• Since labeling data is time-consuming and costly, we often have
plenty of unlabeled instances and few labeled instances. Semi-supervised
algorithms can learn from such partially labeled data.
• For example, a photo-hosting service may automatically recognize that the
same person appears in several of your photos. This is the unsupervised part
of the algorithm (clustering). Now all the system needs is for you to tell it
who these people are. Add one label per person and it is able to name everyone
in every photo, which is useful for searching photos.
Figure: Semi-supervised learning with two classes (triangles and squares): the unlabeled examples (circles) help classify a new instance (the cross) into the triangle class rather than the square class, even though it is closer to the labeled squares.
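The effect in the figure can be imitated with a simple form of self-training, sketched below under our own assumptions: invented 2-D points and a greedy nearest-neighbor rule. This is one semi-supervised technique among many, not necessarily the one behind the figure. Unlabeled points are labeled one at a time, always choosing the point closest to something already labeled, so a chain of unlabeled points can carry the triangle label toward the cross.

```python
import math


def predict_1nn(labeled, query):
    """Return the label of the labeled point nearest to query."""
    nearest = min(labeled, key=lambda p: math.dist(p, query))
    return labeled[nearest]


def self_train(labeled, unlabeled):
    """Greedy self-training: repeatedly give the unlabeled point closest to
    any labeled point the label of that neighbor."""
    labeled = dict(labeled)  # point -> label (copy, so the input is untouched)
    remaining = list(unlabeled)
    while remaining:
        best = None  # (distance, unlabeled point, label)
        for u in remaining:
            for p, label in labeled.items():
                d = math.dist(u, p)
                if best is None or d < best[0]:
                    best = (d, u, label)
        _, point, label = best
        labeled[point] = label
        remaining.remove(point)
    return labeled


# Hypothetical data in the spirit of the figure.
labeled = {(0.0, 0.0): "triangle", (4.0, -1.5): "square"}
unlabeled = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
cross = (3.3, 0.0)

print(predict_1nn(labeled, cross))                         # square
print(predict_1nn(self_train(labeled, unlabeled), cross))  # triangle
```

With the labeled points alone, the cross is nearest to the square; once the unlabeled chain has been labeled, its nearest neighbor belongs to the triangle class.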
Reinforcement Learning
• Consider teaching a dog to catch a ball. When we throw the ball and the dog catches it, we give
it a cookie; if it fails to catch the ball, we do not give it a cookie. The dog will figure out which
actions earned it a cookie and repeat those actions.
• Similarly, in an RL environment, we do not teach the agent what to do or how to do it; instead, we
give the agent feedback for each action it takes. The feedback may be positive (reward)
or negative (punishment).
• The learning system uses this feedback to improve itself. It is a trial-and-error
process: the reinforcement learning algorithm favors the actions that maximize the reward
it receives over time.
• In the above analogy, the dog represents the agent; giving the dog a cookie when it catches the
ball is a reward, and not giving a cookie is a punishment.
• It is up to us, as the designers of the task, whether to give a reward after each step or only
after a number of steps have been completed.
• An RL agent can explore different actions that might give a good reward, or it can exploit
the previous action that resulted in a good reward. If the RL agent explores different
actions, there is a good chance it will receive a poor reward in the short term, but exploring is
how it discovers better actions.
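Exploration versus exploitation can be illustrated with an epsilon-greedy agent on a toy multi-armed bandit, a simplified reinforcement learning setting with a single state. The reward probabilities, `epsilon`, step count and seed are all assumptions chosen for the sketch; the agent never sees the probabilities, only the rewards it receives.

```python
import random


def epsilon_greedy_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Toy agent: action a pays reward 1 with probability reward_probs[a]
    (unknown to the agent). With probability epsilon the agent explores a
    random action; otherwise it exploits its best current estimate."""
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n    # how often each action was taken
    values = [0.0] * n  # running average reward per action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)                        # explore
        else:
            action = max(range(n), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < reward_probs[action] else 0
        counts[action] += 1
        # Incremental mean: Q <- Q + (reward - Q) / N
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return counts, values, total_reward


counts, values, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(counts, [round(v, 2) for v in values], total)
```

Because most steps exploit, the agent ends up choosing the third action (reward probability 0.8) far more often than the others, while the occasional exploratory steps keep its estimates of the other actions up to date.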
ML Algorithm Selection
• The flow chart shows how a machine
learning algorithm can be selected
based on the data.
• The flow chart considers three
major kinds of algorithms that are
important in machine learning.
• It is the responsibility of the data
scientist to select an appropriate
ML algorithm that provides
accurate results for the problem
at hand.
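The flow chart itself is not reproduced here, but the kind of reasoning it encodes can be sketched as a small decision function. This is a deliberately simplified, hypothetical rule based only on whether the data is labeled and what kind of target it has; it is not the chart from the slide.

```python
def suggest_task(has_labels, target_is_category=False):
    """Hypothetical, heavily simplified stand-in for an algorithm-selection chart."""
    if has_labels:
        # Supervised learning: the kind of target decides the task.
        if target_is_category:
            return "classification"  # e.g. fraud vs. fair transactions
        return "regression"          # e.g. predicting a numeric amount
    # No labels: look for structure, such as groups, in the data.
    return "clustering"


print(suggest_task(has_labels=True, target_is_category=True))  # classification
print(suggest_task(has_labels=False))                          # clustering
```

A real selection chart asks further questions (for example, how much data there is), which this sketch omits.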
Cross Industry Standard Process
CRISP-DM
• Fits data mining into the general problem-solving strategy of a
business/research unit
• Industry, tool and application neutral
• Data mining projects follow an iterative, adaptive life cycle
consisting of 6 phases
Figure: CRISP-DM Lifecycle
• The iterative CRISP-DM process is shown in the outer circle
• The most significant dependencies between phases are shown
• The next phase depends on the results from the preceding phase
• Returning to an earlier phase is possible before moving forward
Cross Industry Standard Process
CRISP-DM
1. Business/Research Understanding Phase
◦ Define project requirements and objectives
◦ Translate objectives into a data exploration problem definition
◦ Prepare a preliminary strategy to meet objectives
2. Data Understanding Phase
◦ Collect data
◦ Perform exploratory data analysis (EDA)
◦ Assess data quality
◦ Optionally, select interesting subsets
3. Data Preparation Phase
◦ Prepare for modeling in subsequent phases
◦ Select cases and variables appropriate for analysis
◦ Cleanse and prepare data so it is ready for modeling tools
◦ Perform transformation of certain variables, if needed
4. Modeling Phase
◦ Select and apply one or more modeling techniques
◦ Calibrate model settings to optimize results
5. Evaluation Phase
◦ Evaluate one or more models for effectiveness
◦ Determine whether the defined objectives have been achieved
◦ Make a decision regarding data exploration results before deploying to the field
6. Deployment Phase
◦ Make use of the models created
◦ Simple deployment example: generate a report
◦ Complex deployment example: implement a parallel data exploration effort in another department
◦ In businesses, the customer often carries out deployment based on your model
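A minimal sketch of the Data Preparation phase in plain Python, on invented records whose attribute names (Tid, Refund, Taxable Income) echo the example data shown earlier in the lecture. It selects variables (dropping the Tid identifier), cleanses the data (dropping cases with a missing income) and transforms variables (encoding Refund as 0/1 and min-max scaling income). These particular choices are assumptions for illustration.

```python
# Hypothetical raw records.
raw = [
    {"Tid": 1, "Refund": "Yes", "Taxable Income": 125000},
    {"Tid": 2, "Refund": "No", "Taxable Income": None},  # missing value
    {"Tid": 3, "Refund": "No", "Taxable Income": 70000},
    {"Tid": 4, "Refund": "Yes", "Taxable Income": 120000},
]


def prepare(records):
    # Cleanse: drop cases with a missing income value.
    clean = [r for r in records if r["Taxable Income"] is not None]
    incomes = [r["Taxable Income"] for r in clean]
    lo, hi = min(incomes), max(incomes)
    prepared = []
    for r in clean:
        # Select variables: keep Refund and income, drop the Tid identifier.
        prepared.append({
            "Refund": 1 if r["Refund"] == "Yes" else 0,             # encode category
            "Income (scaled)": (r["Taxable Income"] - lo) / (hi - lo),  # min-max
        })
    return prepared


for row in prepare(raw):
    print(row)
```

The min-max step assumes at least two distinct incomes remain after cleansing; otherwise `hi - lo` would be zero.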
Resources/ References
• Hands-On Machine Learning with Scikit-Learn, Keras, and
TensorFlow, 2nd Edition, Aurélien Géron, O'Reilly Media,
September 2019, ISBN: 9781492032649.
• Statlib: http://lib.stat.cmu.edu/