EXAMPLE ML in real life
• Data collection: You collect numerous examples of homes sold in your neighbourhood within
the past year, and pay a real estate appraiser to appraise the homes whose selling price is not
known.
• Data exploration: You confirm that all of your data is numerical because most machine
learning models operate on sequences of numbers. If there is textual data, you need to
transform it into numbers. You'll see this in the next example.
• Data cleaning: Look for problems such as missing information or outliers, like the 10-room
mansion. Several techniques can be used to handle outliers, but you can also just remove
them from your dataset.
• Data visualization: You can plot home values against each of your input variables to
look for trends in your data. In the following chart, you see that when lot size increases,
the house value increases.
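The data cleaning step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field order (lot size, rooms, price), and the `max_rooms` cutoff are all hypothetical and do not come from the original dataset.

```python
# Hypothetical records: (lot_size_sqft, rooms, price_usd).
# None marks a missing value.
homes = [
    (5000, 3, 250_000),
    (6000, 4, 310_000),
    (5500, None, 280_000),   # missing room count
    (7000, 4, 350_000),
    (20000, 10, 1_500_000),  # the 10-room mansion (outlier)
]


def clean(records, max_rooms=8):
    """Drop rows with a missing field or with more rooms than max_rooms.

    max_rooms is an assumed cutoff chosen for illustration only.
    """
    cleaned = []
    for row in records:
        if any(value is None for value in row):
            continue  # skip rows with missing information
        if row[1] > max_rooms:
            continue  # skip outliers by room count
        cleaned.append(row)
    return cleaned


cleaned_homes = clean(homes)
print(len(homes), "rows before cleaning,", len(cleaned_homes), "after")
```

Removing rows is the simplest option. Other techniques, such as filling in missing values, keep more data but need more care.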
A linear model across a single input variable can be represented as a line. It becomes a plane
for two variables, and then a hyperplane for more than two variables. The intuition, as a line
with a constant slope, doesn't change.
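To make the single-variable case concrete, here is a minimal sketch that fits a line to lot size versus price using ordinary least squares. The sample values and the function name `fit_line` are assumptions made for illustration. The text does not say which fitting method the lesson uses.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: lot size in square feet, price in dollars.
lot_sizes = [4000, 5000, 6000, 7000]
prices = [200_000, 250_000, 300_000, 350_000]

slope, intercept = fit_line(lot_sizes, prices)
predicted_price = slope * 5500 + intercept  # prediction for a 5,500 sq ft lot
```

With two input variables the same idea yields a plane, with one coefficient per variable plus an intercept.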
Interpreting Results
In general, as your model improves, you see a lower root mean square (RMS) error. You may
still not be confident about whether the specific value you’ve computed is good or bad.
Many machine learning engineers manually count how many predictions were off by a threshold
(for example, $50,000 in this house pricing problem) to help determine and verify the model's
accuracy.
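Both checks described above can be sketched as follows. The prices are made-up values and the helper names `rmse` and `count_off_by` are hypothetical. Only the $50,000 threshold comes from the text.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def count_off_by(actual, predicted, threshold):
    """Count predictions whose absolute error exceeds the threshold."""
    return sum(1 for a, p in zip(actual, predicted) if abs(a - p) > threshold)


# Hypothetical selling prices and model predictions, in dollars.
actual = [250_000, 310_000, 400_000, 280_000]
predicted = [260_000, 300_000, 330_000, 285_000]

error = rmse(actual, predicted)
misses = count_off_by(actual, predicted, threshold=50_000)
print(f"RMS error: {error:,.0f}; predictions off by more than $50,000: {misses}")
```

The RMS error summarizes all predictions in one number, while the count shows how often the model misses by an amount that matters for the problem.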
Terminology
• Continuous: Floating-point values with an infinite range of possible values. The opposite
of categorical or discrete values, which take on a limited number of possible values.
• Hyperplane: A mathematical term for the generalization of a plane to more than three
dimensions, as when a linear model has more than two input variables.
• Plane: A mathematical term for a flat surface (like a piece of paper) on which two points
can be joined by a straight line.
• Regression: A common task in supervised machine learning in which the model predicts a
continuous value, such as a house price.
Figure: clustering results for K=2 and K=3
Step Four: Model Evaluation
In machine learning, numerous statistical metrics or methods are available to evaluate a
model. In this use case, the silhouette coefficient is a good choice. This metric describes how
well your data was clustered by the model. To find the optimal number of clusters, you plot
the silhouette coefficient for each value of k, as shown in the following image. You find the
optimal value is k=19.
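As a rough illustration of what the silhouette coefficient measures, the sketch below computes it from scratch for a tiny set of 2-D points. The points, labels, and function name are assumptions for illustration. In practice you would run your clustering model once per candidate k, score each result this way (or with a library implementation), and keep the k with the highest score.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-point clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical points forming two obvious groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the data
bad = silhouette_score(points, [0, 1, 0, 1])   # groups cut across the data
```

Scores near 1 indicate tight, well-separated clusters, while scores near 0 or below indicate overlapping or misassigned points.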
Clustered data
As you inspect the preceding table, you can see that most of these text snippets indicate
that the characters are in some kind of long-distance relationship. You see a few other self-
consistent clusters and feel you now have enough useful data to begin writing an article on
unexpected modern romance microgenres.
***********************************************************************************
• Contains spill
• Does not contain spill
Traditionally, solving this problem would require hand-engineering features on top of the
underlying pixels (for example, locations of prominent edges and corners in the image), and
then training a model on these features.
Today, deep neural networks are the most common tool used for solving this kind of
problem. Many deep neural network models are structured to learn the features on top of
the underlying pixels so you don’t have to engineer them by hand. You’ll have a chance to take a deeper
look at this in the next lesson, so we’ll keep things high-level for now.
As in many classification problems, precision and recall are effective metrics here. You can think
of precision as answering the question, "Of all predictions of a spill, how many were right?"
and recall as answering the question, "Of all actual spills, how many did we detect?"
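The two questions above translate directly into counts of true positives, false positives, and false negatives. The following is a minimal sketch. The label strings and the example predictions are hypothetical.

```python
def precision_recall(actual, predicted, positive="spill"):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(actual, predicted))
    # Spill predicted and a spill was really there.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    # Spill predicted but there was none.
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    # A real spill that the model missed.
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth and model output for five images.
actual = ["spill", "spill", "no_spill", "no_spill", "spill"]
predicted = ["spill", "no_spill", "spill", "no_spill", "spill"]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here one false alarm lowers precision and one missed spill lowers recall. Which of the two matters more depends on the cost of each kind of mistake.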
Manual evaluation plays an important role. You are unsure if your staged spills are
sufficiently realistic compared to actual spills. To get a better sense of how well your model
performs with actual spills, you find additional examples from historical records. This allows
you to confirm that your model is performing satisfactorily.
No spill detected
But, when the class 'Contains spill' is detected, a simple paging system could alert the team to
respond.
Spill detected