
EXAMPLE ML in real life

The document outlines three examples of machine learning applications: house price prediction, book genre exploration, and spill detection from video. Each example details the steps involved, including problem definition, dataset building, model training, evaluation, and inference, emphasizing the importance of data preprocessing and model selection. Key concepts such as linear models, k-means clustering, and convolutional neural networks are introduced to illustrate the methodologies used in these tasks.


EXAMPLE 1 House price prediction

Step One: Defining the Problem

Step Two: Building a Dataset

• Data collection: You collect numerous examples of homes sold in your neighbourhood within
the past year, and pay a real estate appraiser to appraise the homes whose selling price is not
known.
• Data exploration: You confirm that all of your data is numerical because most machine
learning models operate on sequences of numbers. If there is textual data, you need to
transform it into numbers. You'll see this in the next example.
• Data cleaning: Look for things such as missing information or outliers, such as the 10-room
mansion. Several techniques can be used to handle outliers, but you can also just remove
those from your dataset.
• Data visualization: You can plot home values against each of your input variables to
look for trends in your data. In the following chart, you see that as lot size increases,
the house value also increases (a short pandas sketch of these data preparation steps follows this list).
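These data preparation steps can be sketched in a few lines of pandas and matplotlib. This is a minimal illustration, not the lesson's actual code; the file name homes.csv and the column names lot_size, num_rooms, and sale_price are assumptions.

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset of homes sold in the neighbourhood (file and column names assumed).
homes = pd.read_csv("homes.csv")

# Data exploration: confirm every column is numerical and look for missing information.
print(homes.dtypes)
print(homes.isna().sum())

# Data cleaning: drop rows with missing values and remove outliers such as the 10-room mansion.
homes = homes.dropna()
homes = homes[homes["num_rooms"] < 10]

# Data visualization: plot home value against lot size to look for trends.
homes.plot.scatter(x="lot_size", y="sale_price")
plt.show()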

Step Three: Model Training


Prior to actually training your model, you need to split your data. The standard practice is to
put 80% of your dataset into a training dataset and 20% into a test dataset.
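A minimal sketch of that 80/20 split with scikit-learn, continuing with the hypothetical homes DataFrame from the previous sketch:

from sklearn.model_selection import train_test_split

# Input variables and the value we want to predict (column names assumed).
X = homes[["lot_size", "num_rooms"]]
y = homes["sale_price"]

# 80% of the data goes into the training dataset, 20% into the test dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)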

Linear model selection


As you see in the preceding chart, when lot size increases, home values increase too. This
relationship is simple enough that a linear model can be used to represent this relationship.

A linear model across a single input variable can be represented as a line. It becomes a plane
for two variables, and a hyperplane for more than two variables. The intuition of a straight line
with a constant slope carries over to these higher-dimensional cases.
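As a sketch, fitting such a linear model with scikit-learn might look like the following, continuing with the split from the previous sketch:

from sklearn.linear_model import LinearRegression

# With one input variable the fitted model is a line, with two a plane,
# and with more than two a hyperplane.
model = LinearRegression()
model.fit(X_train, y_train)

print(model.coef_)       # one slope per input variable
print(model.intercept_)  # the model's baseline value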

Step Four: Evaluation


One of the most common evaluation metrics in a regression scenario is called root mean
square error, or RMSE. The math is beyond the scope of this lesson, but RMSE can be thought of roughly as
the "average error" across your test dataset, so you want this value to be low.

The math behind RMSE


In the following chart, you can see where the data points are in relation to the blue line. You want
the data points to be as close to the "average" line as possible, which would mean less net error.
You compute the root mean square error between your model's predictions for the data points in your
test dataset and the true values from your data. The actual calculation is beyond the scope of this
lesson, but it's good to understand the process at a high level.
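Although the underlying math is out of scope for the lesson, the calculation itself is short. A minimal sketch, continuing with the fitted model and test dataset from the previous sketches:

import numpy as np

# RMSE = square root of the mean of the squared differences
# between predictions and true values.
predictions = model.predict(X_test)
rmse = np.sqrt(np.mean((predictions - y_test) ** 2))
print(f"RMSE: {rmse:,.0f}")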

Interpreting Results
In general, as your model improves, you see a lower RMSE. You may still not be confident
about whether the specific value you've computed is good or bad.

Many machine learning engineers manually count how many predictions were off by a threshold
(for example, $50,000 in this house pricing problem) to help determine and verify the model's
accuracy.
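A sketch of that manual check, counting how many test predictions were off by more than a $50,000 threshold (continuing with the predictions from the previous sketch):

import numpy as np

threshold = 50_000
errors = np.abs(predictions - y_test)

num_off = int((errors > threshold).sum())
print(f"{num_off} of {len(y_test)} predictions were off by more than ${threshold:,}")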

Step Five: Inference: Try out your model


Now you are ready to put your model into action. As you can see in the following image, this
means seeing how well it predicts with new data not seen during model training.
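A minimal inference sketch, assuming a hypothetical new home with a 6,000 square-foot lot and 3 rooms; the specific values are illustrative only:

import pandas as pd

# New data not seen during model training, with the same columns used for training.
new_home = pd.DataFrame({"lot_size": [6000], "num_rooms": [3]})
predicted_price = model.predict(new_home)[0]
print(f"Predicted price: ${predicted_price:,.0f}")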

Terminology

• Continuous: Floating-point values with an infinite range of possible values. The opposite
of categorical or discrete values, which take on a limited number of possible values.
• Hyperplane: A mathematical term for a flat surface, like a plane, in a space with more than three dimensions.
• Plane: A mathematical term for a flat surface (like a piece of paper) on which two points
can be joined by a straight line.
• Regression: A common task in supervised machine learning in which the model predicts a continuous numerical value, such as a house price.

EXAMPLE 2 Book Genre Exploration

Step One: Define the Problem

Step Two: Build your Dataset


To test the hypothesis, you gather book description text for 800 romance books published in the
current year.
Before you can train the model, you need to do some data pre-processing, called data vectorization, to
convert text into numbers.
You transform this book description text into what is called a bag of words representation, shown
in the following image, so that it is understandable by machine learning models.
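A minimal sketch of this vectorization step using scikit-learn's CountVectorizer. The two short descriptions below are invented placeholders standing in for the 800 real book descriptions:

from sklearn.feature_extraction.text import CountVectorizer

# Placeholder strings standing in for the 800 gathered book descriptions.
descriptions = [
    "They kept their love alive across two cities and three time zones.",
    "A small-town baker falls for the food critic who panned her shop.",
]

# Bag of words: each row is one book, each column counts one word.
vectorizer = CountVectorizer(stop_words="english")
bag_of_words = vectorizer.fit_transform(descriptions)

print(bag_of_words.shape)
print(vectorizer.get_feature_names_out()[:10])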
Step Three: Train the Model
You pick a common cluster-finding model called k-means. In this model, you can change a
model parameter, k, to be equal to how many clusters the model will try to find in your
dataset.
Your data is unlabelled: you don't know how many microgenres might exist. So you train your
model multiple times, using a different value for k each time (a minimal sketch of this loop follows the figure below).

Clustering results for K=2 and K=3
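A minimal sketch of this training loop with scikit-learn's KMeans, assuming bag_of_words is the matrix built from all 800 descriptions in the previous sketch:

from sklearn.cluster import KMeans

# Train the model multiple times, changing the model parameter k each time.
models = {}
for k in range(2, 21):
    model = KMeans(n_clusters=k, random_state=42, n_init=10)
    model.fit(bag_of_words)
    models[k] = model

# models[2].labels_ holds the cluster assigned to each book when k=2, and so on.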
Step Four: Model Evaluation
In machine learning, numerous statistical metrics or methods are available to evaluate a
model. In this use case, the silhouette coefficient is a good choice. This metric describes how
well your data was clustered by the model. To find the optimal number of clusters, you plot
the silhouette coefficient against k, as shown in the following chart. You find the optimal value is
at k=19.
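A sketch of computing and plotting the silhouette coefficient for each value of k, continuing from the previous sketch:

import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score

# A higher silhouette coefficient means better-separated clusters.
ks = sorted(models)
scores = [silhouette_score(bag_of_words, models[k].labels_) for k in ks]

plt.plot(ks, scores, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("Silhouette coefficient")
plt.show()
# The optimal k (k=19 in this example) is read off the peak of this curve.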

Step Five: Inference (Use the Model)


As you inspect the different clusters found when k=19, you find a surprisingly large cluster of
books. Here's an example from fictionalized cluster #7.

Clustered data

As you inspect the preceding table, you can see that most of these text snippets are indicating
that the characters are in some kind of long-distance relationship. You see a few other self-
consistent clusters and feel you now have enough useful data to begin writing an article on
unexpected modern romance microgenres.
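A sketch of how the descriptions assigned to a single cluster could be pulled out for this kind of manual inspection, continuing from the earlier sketches (cluster 7 mirrors the fictionalized example above):

# One cluster label per book, from the k=19 model.
labels = models[19].labels_

cluster_id = 7
cluster_books = [text for text, label in zip(descriptions, labels) if label == cluster_id]

print(f"{len(cluster_books)} books in cluster {cluster_id}")
for snippet in cluster_books[:5]:
    print("-", snippet[:80])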

***********************************************************************************

EXAMPLE 3 Spill Detection from Video


Step One: Defining the Problem

Detecting spills with machine learning

This is a supervised classification task, as shown in the following image. Your goal will be to
predict whether each image belongs to one of the following classes:

• Contains spill
• Does not contain spill

Step Two: Building a Dataset


• Collecting: Using historical data, as well as safely staged spills, you quickly build a collection of
images that contain both spills and non-spills in multiple lighting conditions and
environments.
• Exploring and cleaning: You go through all the photos to ensure the spill is clearly in the shot. There are
Python tools and other techniques available to improve image quality, which you
can use later if you determine a need to iterate.
• Data vectorization (converting to numbers): Many models require numerical data, so all your image data needs to be
transformed into a numerical format. Python tools can help you do this
automatically. In the following image, you can see how each pixel in the image on the left can be
represented in the image on the right by a number between 0 and 1, with 0 being
completely black and 1 being completely white (a minimal sketch of this conversion follows this list).

Chemical spill image and its numeric representation


• Splitting the data: You split your image data into a training dataset and a test dataset.
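A minimal sketch of the pixel-to-number conversion described in the vectorization step, using Pillow and NumPy; the file name spill_001.jpg is an assumption:

import numpy as np
from PIL import Image

# Load one photo, convert it to grayscale, and scale every pixel to the 0-1 range:
# 0 is completely black and 1 is completely white.
image = Image.open("spill_001.jpg").convert("L")
pixels = np.asarray(image, dtype=np.float32) / 255.0

print(pixels.shape)                # height x width
print(pixels.min(), pixels.max())  # values now lie between 0 and 1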

Step Three: Model Training

Traditionally, solving this problem would require hand-engineering features on top of the
underlying pixels (for example, locations of prominent edges and corners in the image), and
then training a model on these features.

Today, deep neural networks are the most common tool used for solving this kind of
problem. Many deep neural network models are structured to learn the features on top of
the underlying pixels, so you don't have to hand-engineer them. You'll have a chance to take a deeper
look at this in the next lesson, so we'll keep things high-level for now.

CNN (convolutional neural network)


Neural networks are beyond the scope of this lesson, but you can think of them as a
collection of very simple models connected together. These simple models are called neurons,
and the connections between these models are trainable model parameters called weights.
Convolutional neural networks are a special type of neural network particularly good at
processing images.
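The lesson keeps neural networks high-level, but purely as an illustration, a small convolutional neural network for the two spill classes could be sketched with Keras as follows. The layer sizes and the 128x128 grayscale input shape are assumptions, not the lesson's actual model.

from tensorflow import keras
from tensorflow.keras import layers

# Convolution layers learn features from the raw pixels; the final layer
# outputs the probability that an image contains a spill.
model = keras.Sequential([
    keras.Input(shape=(128, 128, 1)),        # 128x128 grayscale images (assumed)
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # 1 = contains spill, 0 = does not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])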

Step Four: Model Evaluation


There are many different statistical metrics you can use to evaluate your model. As you gain
more experience in machine learning, you will learn how to research which metrics can help
you evaluate your model most effectively. Here's a list of common metrics:

• Accuracy
• Confusion matrix
• F1 Score
• Log Loss
• Precision
• Recall
• False positive rate
• False negative rate
• Negative predictive value
• Specificity
• ROC curve

For this particular use case, precision and recall are effective metrics. You can think
of precision as answering the question, "Of all predictions of a spill, how many were right?"
and recall as answering the question, "Of all actual spills, how many did we detect?"
Manual evaluation also plays an important role. You are unsure whether your staged spills are
sufficiently realistic compared to actual spills. To get a better sense of how well your model
performs with actual spills, you find additional examples from historical records. This allows
you to confirm that your model is performing satisfactorily.
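A sketch of computing these two metrics with scikit-learn, using a tiny set of made-up labels purely to show the calls (1 = contains spill, 0 = does not contain spill):

from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels, only to illustrate the metric functions.
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual spills
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0]   # model predictions

# Precision: of all predictions of a spill, how many were right?
print("precision:", precision_score(y_true, y_pred))
# Recall: of all actual spills, how many did we detect?
print("recall:", recall_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))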

Step Five: Model Inference


The model can be deployed on a system that enables you to run machine learning workloads
such as AWS Panorama.
Thankfully, most of the time, the results will be from the class 'Does not contain spill.'

No spill detected

But, when the class 'Contains spill' is detected, a simple paging system could alert the team to
respond.

Spill detected
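As an illustration only (this is not the AWS Panorama API), a hypothetical piece of glue code around the deployed model might page the team like this; predict_spill_probability and notify_team are invented placeholder functions.

CLASSES = ["Does not contain spill", "Contains spill"]

def handle_frame(frame, predict_spill_probability, notify_team):
    """Classify one video frame and alert the team only when a spill is detected."""
    probability = predict_spill_probability(frame)   # assumed to return P(contains spill)
    predicted_class = CLASSES[int(probability >= 0.5)]
    if predicted_class == "Contains spill":
        notify_team(f"Spill detected (confidence {probability:.0%})")
    return predicted_class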
