ML and AI Notes
Objectives
On completion of this Machine Learning tutorial, you will learn how to:
Define Artificial Intelligence (AI) and understand its relationship with data
Define Machine Learning (ML) and understand its relationship with Artificial
Intelligence
Understand Machine Learning approach and its relationship with data science
Identify the applications of Machine Learning
Data Economy
Let us quickly understand the importance of Data in brief.
The world is witnessing the real-time flow of all types of structured and
unstructured data from social media, communication, transportation, sensors,
and devices.
International Data Corporation (IDC) forecasts that 180 zettabytes of data will
be generated by 2025.
This explosion of data has given rise to a new economy known as the Data
Economy.
Data is the new oil that is precious but useful only when cleaned and
processed.
There is a constant battle for ownership of data between enterprises to derive
benefits from it.
Well-known examples of AI in action include:
Self-driving cars
Applications like Siri that understand and respond to human speech
Google’s AlphaGo AI has defeated many Go champions such as Ke Jie
Implementing AI in chess
Amazon ECHO product (home control chatbot device)
Hilton using Connie – concierge robot from IBM Watson
Powerful Processing
Better Decision Making & Prediction
Quicker Processing
Accurate
Affordable Data Management
Inexpensive
Analyzing Complex Big Data
Example: Learning from new spam words or new speech (also called incremental
learning)
As you go from rule-based systems to the deep learning ones, more complex
features and input-output relationships become learnable.
The Relationship between Data Science and Machine Learning
Let us understand the relationship between Data Science and Machine Learning.
Data Science and Machine Learning go hand in hand. Data Science helps
evaluate data for Machine Learning algorithms
Data science is the use of statistical methods to find patterns in the data.
Statistical machine learning uses the same math and techniques as data
science.
These techniques are integrated into algorithms that learn and improve on
their own.
Machine Learning facilitates Artificial Intelligence as it enables machines to
learn from the patterns in data.
1. Classification
2. Categorization
3. Clustering
4. Trend analysis
5. Anomaly detection
6. Visualization
7. Decision making
Machine Learning can learn from labeled data (known as supervised learning)
or unlabelled data (known as unsupervised learning).
Machine Learning algorithms involving unlabelled data (unsupervised learning) are
more complicated than those involving labeled data (supervised learning).
Machine Learning algorithms can be used to make decisions in subjective
areas as well.
Examples:
Logistic Regression can be used to predict which party will win at the ballots.
Naïve Bayes algorithm can separate valid emails from spam.
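As a rough illustration of the Naïve Bayes example above, the sketch below trains scikit-learn's MultinomialNB on a handful of made-up messages; the messages and labels are invented purely for demonstration.

```python
# A minimal, illustrative spam filter using scikit-learn's Naive Bayes.
# The toy messages and labels below are made up for demonstration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now",                # spam
    "limited offer, claim your money",     # spam
    "meeting rescheduled to noon",         # valid
    "please review the attached report",   # valid
]
labels = ["spam", "spam", "valid", "valid"]

vectorizer = CountVectorizer()           # bag-of-words features
X = vectorizer.fit_transform(messages)   # word-count matrix

model = MultinomialNB()
model.fit(X, labels)

# Classify a new, unseen message
new_msg = vectorizer.transform(["claim your free prize"])
print(model.predict(new_msg))            # expected to lean towards 'spam'
```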
Image Processing
Robotics
Data Mining
Video Games
Text Analysis
Healthcare
Robotics: human simulation, industrial robotics
Data Mining: anomaly detection, grouping and predictions, association rules
Healthcare: healthcare startups
Machine Learning Tutorial Overview
By the end of this Machine Learning tutorial, you will be able to:
Target Audience
There is an increasing demand for skilled Machine Learning Engineers across all
industries, making this Machine Learning certification course well-suited for
participants at the intermediate level of experience. We recommend this Machine
Learning training course for the following professionals in particular:
Let’s look into the prerequisites below in this Machine Learning Tutorial.
Prerequisites
For this Machine Learning tutorial, you should have:
Lessons Covered
Let’s look into the lessons covered below in this Machine Learning tutorial.
Chapter No. | Chapter Name | Topics covered in this lesson
Lesson 2 | Techniques of Machine Learning | Supervised learning; Unsupervised learning; Semi-supervised and Reinforcement learning; Bias and variance trade-off; Representation learning
Lesson 3 | Data Preprocessing | Data preparation; Feature engineering; Feature scaling; Datasets; Dimensionality reduction
Key Takeaways
Let us quickly look at what you have learned so far in this Machine Learning tutorial.
The explosion of data has given rise to a new economy known as the Data
Economy
AI refers to the intelligence in machines that simulates human intelligence.
The capability of AI systems to learn by extracting patterns from data is known
as Machine Learning
Statistical machine learning uses the same math and techniques as data
science.
Artificial intelligence and Machine learning are being increasingly used in
various functions such as image processing, text analysis, healthcare, data
mining, robotics, and video games.
Objectives
Let us look at some of the objectives under this Techniques of Machine Learning
tutorial.
Amazon uses supervised learning algorithms to predict what items the user may like
based on the purchase history of similar classes of users.
(Diagram: New Input → Algorithm Trained on Historical Data → Predicted Output.)
Data Preparation
Training Step
Evaluation or Test Step
Production Deployment
1. Once the algorithm is trained, test it with test data (a set of data instances that do not
appear in the training set).
2. A well-trained algorithm can predict well for new test data.
3. If the learning is poor, we have an underfit situation. The algorithm will not work well
on test data. Retraining may be needed to find a better fit.
4. If learning on training data is too intensive, it may lead to overfitting – a situation
where the algorithm is not able to handle new testing data that it has not seen before.
The technique used to keep the model generic, so that it does not overfit the training data, is called regularization.
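A minimal sketch of the train/test/evaluate flow described above, using scikit-learn on a synthetic dataset; the choice of Ridge regression is simply one example of a regularized model, not a prescribed method.

```python
# Sketch of the train -> test -> evaluate loop with a regularized model.
# The synthetic data and the choice of Ridge regression are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)

# Hold out test data that the algorithm never sees during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# alpha controls the strength of regularization (keeps the model generic)
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

print("train R^2:", model.score(X_train, y_train))
print("test  R^2:", model.score(X_test, y_test))  # a large gap suggests overfitting
```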
Classification
Regression
Example: Social media sentiment analysis has three potential outcomes: positive,
negative, or neutral.
Example: Given the age and salary of consumers, predict whether they will be
interested in purchasing a house. You can perform this in your lab environment with
the dataset available in the LMS.
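The age-and-salary example above can be reproduced roughly as follows; the handful of records here is invented, since the LMS dataset mentioned in the text is not included in these notes.

```python
# Illustrative classification: predict purchase interest from age and salary.
# The small set of records below is made up; the real exercise uses the LMS dataset.
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X = [[22, 25000], [25, 32000], [47, 90000], [52, 110000], [33, 60000], [41, 72000]]
y = [0, 0, 1, 1, 0, 1]  # 1 = interested in purchasing a house

# Scale features so salary does not dominate age, then fit a classifier
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)

print(clf.predict([[30, 40000], [50, 100000]]))
```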
Regression Supervised Learning
Given below are some elements of Regression Supervised learning.
Answers “How much?”
Applied when the output is a continuous number
A simple regression algorithm: y = wx + b. Example: the relationship between
environmental temperature (y) and humidity levels (x)
Example
Given the details of the area in which a house is located, predict its price. You can perform
this in your lab environment with the dataset available in the LMS.
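A bare-bones version of the regression y = wx + b described above, assuming a single numeric feature such as house area; the area/price pairs are placeholders standing in for the LMS dataset.

```python
# Simple linear regression y = w*x + b: predict house price from area.
# The area/price values are placeholders, not real market data.
import numpy as np
from sklearn.linear_model import LinearRegression

area = np.array([[500], [750], [1000], [1250], [1500]])      # square feet
price = np.array([150000, 200000, 260000, 310000, 360000])   # currency units

reg = LinearRegression()
reg.fit(area, price)

print("w (slope):", reg.coef_[0])
print("b (intercept):", reg.intercept_)
print("predicted price for 1100 sq ft:", reg.predict([[1100]])[0])
```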
1. Clustering
2. Visualization Algorithms
3. Anomaly Detection
Anomaly Detection
This algorithm detects anomalies in data without any prior training. It can detect
suspicious credit card transactions and differentiate a criminal from a set of people.
What is Semi-Supervised Learning?
It is a hybrid approach (combination of Supervised and Unsupervised Learning) with
some labeled and some non-labeled data.
Example of Semi-Supervised Learning
Google Photos automatically detects the same person in multiple photos from a
vacation trip (clustering – unsupervised). One has to just name the person once
(supervised), and the name tag gets attached to that person in all the photos.
The learning system (agent) observes the environment, selects and takes certain
actions, and gets rewards in return (or penalties in certain cases).
The agent learns the strategy or policy (choice of actions) that maximizes its rewards
over time.
Bias refers to the error in the machine learning model due to wrong assumptions. A
high-bias model will underfit the training data.
Variance refers to problems caused due to overfitting. This is a result of the over-
sensitivity of the model to small variations in the training data. A model with many
degrees of freedom (such as a high-degree polynomial model) is likely to have high
variance and thus overfit the training data.
Conversely, reducing a model’s complexity will increase its bias and reduce its
variance. This is why it is called a tradeoff.
If the two classes can't be separated by a linear decision boundary, you can set a
maximum number of passes over the training dataset (epochs) and/or a threshold for
the number of tolerated misclassifications. The perceptron would never stop
updating the weights otherwise.
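The stopping criteria mentioned above map onto parameters of scikit-learn's Perceptron; the particular settings shown are examples, not recommended values.

```python
# Perceptron with a capped number of passes (epochs) over the training data,
# so training terminates even when the classes are not linearly separable.
from sklearn.linear_model import Perceptron
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# max_iter caps the epochs; tol stops early once improvement becomes negligible
clf = Perceptron(max_iter=50, tol=1e-3, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```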
Key Takeaways
Let us run through what you have covered in this tutorial of Machine Learning
Techniques.
Objectives
Let’s look at the objectives of Data Preprocessing Tutorial.
Data Selection
Data Preprocessing
Data Transformation
Data Selection
The steps involved in Data Selection are:
There is a vast volume, variety, and velocity of available data for a Machine Learning
problem.
This step involves selecting only a subset of the available data.
The selected sample must be an accurate representation of the entire population.
Some data can be derived or simulated from the available data if required.
Data not relevant to the problem at hand can be excluded.
Data Preprocessing
Let’s understand Data Preprocessing in detail below.
After the data has been selected, it needs to be preprocessed using the given steps:
Data cleaning at this stage involves filtering it based on the following variables:
Insufficient Data
The amount of data required for ML algorithms can vary from thousands to millions,
depending upon the complexity of the problem and the chosen algorithm.
Non-Representative Data
The sample selected must be an exact representation of the entire data, as non-
representative data might train an algorithm such that it won't generalize well on new
test data.
Substandard Data
Outliers, errors, and noise can be eliminated to get a better fitment of the model.
Missing features such as age for 10% of the audience may be ignored completely, or
an average value can be assumed for the missing component.
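The two options mentioned for missing features (dropping incomplete records or substituting an average value) can be expressed roughly as below; the small table is fabricated for illustration.

```python
# Handling a missing feature (age): either drop the incomplete rows
# or impute the mean value. The tiny table below is fabricated.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, 32, np.nan, 41, np.nan, 29],
                   "salary": [30, 45, 52, 60, 38, 41]})

dropped = df.dropna(subset=["age"])          # option 1: ignore incomplete records

imputer = SimpleImputer(strategy="mean")     # option 2: assume the average value
df["age"] = imputer.fit_transform(df[["age"]]).ravel()

print(dropped)
print(df)
```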
Data Preprocessing (Contd.)
Selecting the right size of the sample is a key step in data preparation. Samples that
are too large or too small might give skewed results.
Sampling Noise
Smaller samples cause sampling noise since they get trained on non-representative
data. For example, checking voter sentiment from a very small subset of voters.
Sampling Bias
Larger samples work well as long as there is no sampling bias, that is, when the right
data is picked. For example, sampling bias would occur when checking voter
sentiment only for the technically sound subset of voters, while ignoring others.
Example
Let us look at the Data Sample below:
1. Scaling: It involves selecting the right feature scaling for the selected and
preprocessed data.
2. Aggregation: This is the last step to collate a bunch of data features into a single one.
Types of Data
Let us look at the Types of Data below.
Labeled Data or Training Data
Unlabeled Data
Test Data
Data provided to test a hypothesis created via prior learning is known as test
data.
Typically, 20% of the labeled data is reserved for testing.
Validation data
It is a dataset used to retest the hypothesis (in case the algorithm got overfitted to
even the test data due to multiple attempts at testing).
The illustration given below depicts how total available labeled data may be
segregated into the training dataset, test dataset, and validation dataset.
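One common way to realize that segregation is shown below. The 20% test share follows the text; the validation share used here is an example choice, not a figure from the tutorial.

```python
# Splitting labeled data into training, test, and validation sets.
# The 20% test share follows the text; the validation share is an example choice.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000)

# First carve out 20% as the test set
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# Then take a validation set from what remains (here 25% of the rest = 20% overall)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```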
Feature Engineering
The transformation stage in the data preparation process includes an important step
known as Feature Engineering.
Definition of Feature Engineering
Feature Engineering refers to selecting and extracting the right features from the
data that are relevant to the task and model in consideration.
Feature Engineering in ML
The place of feature engineering in the machine learning workflow is shown below:
Feature Scaling
Feature scaling is an important step in the data transformation stage of the data
preparation process.
Definition of Feature Scaling
Feature Scaling is a method used in Machine Learning for standardization of
independent variables of data features.
Let's consider a situation where the input data has two features, one ranging from 1
to 100 and the other from 1 to 10,000.
This can skew machine learning algorithms that minimize an error measure such as
mean squared error, because the optimizer concentrates on the larger errors
contributed by the second feature.
The computed Euclidean distances between samples will be dominated by the
second feature axis in the K-nearest neighbors (KNN) algorithm.
The solution lies in scaling all the features on a similar scale (0 to 1) or (1 to 10).
1. Standardization
2. Normalization
Standardization is a popular feature scaling method, which gives data the property of
a standard normal distribution (also known as Gaussian distribution).
All features are standardized on the normal distribution (a mathematical model).
The mean of each feature is centered at zero, and the feature column has a standard
deviation of one.
Standardization: Example
To standardize the jth feature, you subtract the sample mean μj from every
training sample and divide by the standard deviation σj: x'j = (xj − μj) / σj.
Here, xj is a vector consisting of the jth feature values of all n training samples.
Given below is a sample NumPy code that uses NumPy mean and standard
functions to standardize features from a sample data set X (x0, x1...) :
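A minimal NumPy version consistent with that description might look like the following; the sample matrix X here is invented.

```python
# Standardize each feature column of X to zero mean and unit standard deviation
# using NumPy's mean and std functions. The sample data X is made up.
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # approximately 0 for every feature
print(X_std.std(axis=0))   # approximately 1 for every feature
```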
Normalization (min-max scaling) rescales each feature to the [0, 1] range; in effect, it
measures the relative distance of each instance from the minimum value for that feature.
The ML library scikit-learn has a MinMaxScaler class for normalization.
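Min-max normalization can be done either with the formula directly or with scikit-learn's MinMaxScaler, as sketched below; the toy matrix is again invented.

```python
# Min-max normalization: each feature is rescaled to the [0, 1] range,
# i.e. the relative distance of each value from the feature's minimum.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

# Direct formula
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Equivalent, using scikit-learn
scaler = MinMaxScaler()
X_norm_sklearn = scaler.fit_transform(X)

print(np.allclose(X_norm, X_norm_sklearn))  # True
```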
MNIST Dataset
Modified National Institute of Standards and Technology (MNIST) dataset is another
popular dataset used in ML algorithms.
Growing Datasets
As the amount of data grows in the world, the size of datasets available for ML
development also grows.
Dimensionality Reduction
Let’s look at some aspects of Dimensionality Reduction below.
Example: Car attributes might contain the maximum speed in two units, kilometers per
hour and miles per hour. One of these can be safely discarded in order to reduce
the dimensions and simplify the data.
Before the PCA algorithm is applied, you need to preprocess the data to
normalize its mean and variance.
Steps 1 and 2 zero out the mean of the data, and steps 3 and 4 rescale each
coordinate to have unit variance. This ensures that different attributes are treated on the
same scale.
For instance, if x1 was the max speed in mph (taking values in the high tens or low
hundreds) and x2 was the number of seats (taking values 2-4), then this
renormalization rescales the attributes to make them more comparable to each other.
When you project this data to lie along the axis of the unit vector, you would like to
preserve most of it, such that its variance is maximized (which means most data is
covered).
Intuitively, the data starts off with some amount of variance (information).
The figure shows this normalized data.
Let’s project data onto different u axes as shown in the charts given on the left.
Dots represent the projection of data points on this line.
In figure A, projected data has a large amount of variance, and the points are far from
zero.
In figure B, projected data has a low amount of variance, and the points are closer to
zero.
Hence, figure A is a better choice to project the data.
The length of the projection of x onto a unit vector u is given by xᵀu. This also represents
the distance of the projection of x from the origin.
Hence, to maximize the variance of the projections, you choose a unit-length u that
maximizes uᵀ∑u, where ∑ is the matrix given below.
∑ is also known as the covariance matrix of the data (assuming that it has zero mean).
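Written out, the maximization referred to above is the standard PCA objective, stated here for completeness:

```latex
\max_{\|u\|=1}\;\frac{1}{m}\sum_{i=1}^{m}\bigl(x^{(i)\top}u\bigr)^{2}
\;=\;\max_{\|u\|=1}\;u^{\top}\Sigma\,u,
\qquad
\Sigma=\frac{1}{m}\sum_{i=1}^{m}x^{(i)}x^{(i)\top}
```

The maximizing u is the principal (top) eigenvector of ∑.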
Generally, if you need to project data onto a k-dimensional subspace (k < n), you
choose u1, u2...uk to be the top k eigenvectors of ∑.
All the ui now form a new orthogonal basis for the data.
Then, to represent x(i) in this new basis, you compute the corresponding vector
y(i) = (u1ᵀx(i), u2ᵀx(i), ..., ukᵀx(i))ᵀ.
The vector y(i) is a lower k-dimensional approximation of x(i). This is known as the
dimensionality reduction.
The vectors u1,u2...uk are called the first k principal components of the data.
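A compact NumPy sketch of the projection just described: center the data, form the covariance matrix, take its top-k eigenvectors, and project. The random data and the choice of k = 2 are arbitrary.

```python
# Dimensionality reduction with PCA via the covariance matrix's eigenvectors.
# The random data and the choice of k = 2 are for illustration only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # 100 samples, 5 features
k = 2                                   # target number of dimensions

X_centered = X - X.mean(axis=0)                         # zero-mean the data
cov = (X_centered.T @ X_centered) / len(X_centered)     # covariance matrix Σ

# eigh returns eigenvalues in ascending order for a symmetric matrix
eigvals, eigvecs = np.linalg.eigh(cov)
top_k = eigvecs[:, ::-1][:, :k]         # columns u1..uk: top-k principal components

Y = X_centered @ top_k                  # y(i): k-dimensional approximation of x(i)
print(Y.shape)                          # (100, 2)
```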
Applications of PCA
Given below are the applications of PCA.
Noise Reduction
PCA can eliminate noise or noncritical aspects of the data set to reduce complexity.
Also, during image processing or comparison, image compression can be done with
PCA, eliminating the noise such as lighting variations in face images.
Compression
It is used to map high dimensional data to lower dimensions. For example, instead of
having to deal with multiple car types (dimensions), we can cluster them into fewer
types.
Preprocess
It reduces data dimensions before running a supervised learning program and saves
on computations as well as reduces overfitting.
PCA: 3D to 2D Conversion
After PCA, only two of the three original dimensions (red and green) turn out to be
important, as they carry most of the variance. The blue dimension has limited
variance, and hence it is eliminated.
Key Takeaways
Let us go through what you have learned so far in this Data Preprocessing tutorial.
Data preparation allows simplification of data to make it ready for Machine Learning
and involves data selection, filtering, and transformation.
Data must be sufficient, representative of real-world data, and of high quality.
Feature Engineering helps in selecting the right features and extracting the most
relevant features.
Feature scaling transforms features to bring them on a similar scale, in order to make
them comparable in ML routines.
Dimensionality Reduction allows reducing dimensions in datasets to simplify ML
training.
Deep learning trends will dominate 2019 and the years to come, creating a disruptive
impact in the technology and business world. Here are the top five deep learning
trends that will dominate 2019.
Deep learning has dramatically improved the way we live and interact with
technology. Amazon's deep learning offering, Alexa, can carry out a number
of functions via voice interactions, like playing music, making online purchases, and
answering factual questions. Amazon's latest offering, Amazon Go, which runs on AI,
allows shoppers to walk out of a shop with their shopping bags and automatically be
charged, with a purchase invoice sent directly to their phone.
In the future times to come, AI will be explored and deployed for groundbreaking
applications like drug discovery which can have a detrimental impact on human life if
an incorrect decision is made. Thus, audit trails to AI and deep learning predictions
are extremely important.
Every company is now a data company, capable of using machine learning in the
cloud to deploy intelligent apps at scale, thanks to three machine learning
trends: data flywheels, the algorithm economy, and cloud-hosted intelligence.
That was the takeaway from the inaugural Machine Learning / Artificial Intelligence
Summit, hosted by Madrona Venture Group last month in Seattle, where more than
100 experts, researchers, and journalists converged to discuss the future of artificial
intelligence, trends in machine learning, and how to build smarter applications.
With hosted machine learning models, companies can now quickly analyze large,
complex data, and deliver faster, more accurate insights without the high cost of
deploying and maintaining machine learning systems.
“Every successful new application built today will be an intelligent application,” said
Soma Somasegar, venture partner at Madrona Venture Group. “Intelligent building
blocks and learning services will be the brains behind apps.”
Data Flywheels
Digital data and cloud storage follow Moore’s law: the world’s data doubles every two
years, while the cost of storing that data declines at roughly the same
rate. This abundance of data enables more features, and better machine learning
models to be created.
“In the world of intelligent applications, data will be king, and the services that can
generate the highest-quality data will have an unfair advantage from their data
flywheel — more data leading to better models, leading to a better user experience,
leading to more users, leading to more data,” Somasegar says.
For instance, Tesla has collected 780 million miles of driving data, and they’re
adding another million every 10 hours.
This data is fed into Autopilot, their assisted driving program that uses ultrasonic
sensors, radar, and cameras to steer, change lanes, and avoid collisions with little
human interaction. Ultimately, this data will be the basis for the autonomous, self-
driving car they plan to release in 2018.
Compared to Google’s self-driving program, which has amassed just over 1.5 million
miles of driving data, Tesla’s data flywheel is in full effect.
The Algorithm Economy
“Algorithm marketplaces are similar to the mobile app stores that created the ‘app
economy,'” Alexander Linden, research director at Gartner said. “The essence of the
app economy is to allow all kinds of individuals to distribute and sell software globally
without the need to pitch their idea to investors or set up their own sales, marketing
and distribution channels.”
Cloud-Hosted Intelligence
Using algorithmic machine intelligence to iteratively learn from data is the only
scalable way for a company to discover insights about its business. Historically, it has
been an expensive upfront investment with no guarantee of a significant return.
“Analytics and data science today are like tailoring 40 years ago,” Sirosh said. “It
takes a long time and a tremendous amount of effort.”
For instance, an organization needs to first collect custom data, hire a team of data
scientists, continually develop the models, and optimize them to keep pace with the
rapidly changing and growing volumes of data — that’s just to get started.
With more data becoming available, and the cost to store it dropping, machine
learning is starting to move to the cloud, where a scalable web service is an API call
away. Data scientists will no longer need to manage infrastructure or implement
custom code. The systems will scale for them, generating new models on the fly, and
delivering faster, more accurate results.
“When the effort to build and deploy machine learning models becomes a lot less —
when you can ‘mass manufacture’ it — then the data to do that becomes widely
available in the cloud,” Sirosh said.
With open source machine learning and deep learning frameworks running in the
cloud, like scikit-learn, NLTK, NumPy, Caffe, TensorFlow, Theano, or Torch,
companies will be able to easily leverage pre-trained, hosted models to tag images,
recommend products, and perform general natural language processing tasks.
“Our world view is that every company today is a data company, and every
application is an intelligent application,” Somasegar said. “How can companies get
insights from huge amounts of data and learn from that? That’s something that has
to be brought up with every organization in the world.”
As the data flywheels begin to turn, the cost to acquire, store, and compute that data
will continue to drop.
Artificial Intelligence (AI) and associated
technologies will be present across many industries, within a considerable number of
software packages, and part of our daily lives by 2020. Gartner has also predicted that by
2020, AI will become one of the top five investment priorities for at least 30 percent of Chief
Information Officers. Global software vendors are after this new gold rush. Unfortunately,
though the promise of new revenue has pushed software business owners to invest in AI
technologies, the truth is that most organizations do not have skilled staff to embrace AI.
An implicit note of warning in many industry surveys on AI and its impact on industries is
that software vendors should first focus on understanding the business-customer needs and
potential business benefits from AI, before chasing the gold rush, which has been termed as
“AI Washing,” as suggested in How Enterprise Software Providers Should (and Should Not)
Exploit the AI Disruption.
The trust deficit in the “capabilities of tech-enabled solutions” that exists today will vanish in
the next 10 years, states In Ten Years: The Future of AI and ML. Over the next decade, we
will witness a radical shift from partial mistrust and skepticism to complete dependence on
AI and other advanced technologies. Most AI-powered applications are consumer facing,
which is another solid reason for mainstream users to overcome the trust barrier over time.
With more exposure and more access to technological solutions for their daily business, the
Citizen Data Science community will pave the way for a new-technology-order world.
Leveraging AI and Machine Learning as Competitive Business Drivers claims that while
technologies like the cloud bring agility to business processes, AI and Machine Learning
have the power to influence business outcomes.
According to Gartner:
“Artificial Intelligence and Machine Learning have reached a critical tipping point and will
increasingly augment and extend virtually every technology enabled service, thing, or
application.”
The Future of AI
In the post-industrialization era, people have worked to create a machine that behaves like a
human. The thinking machine is AI’s biggest gift to humankind; the grand entry of this self-
propelled machine has suddenly changed the operative rules of business. In recent years,
self-driving vehicles, digital assistants, robotic factory staff, and smart cities have proven that
intelligent machines are possible. AI has transformed most industry sectors like retail,
manufacturing, finance, healthcare, and media and continues to invade new territories.
Here are some predictions about Machine Learning, based on current technology trends and
ML’s systematic progression toward maturity:
The blog post, 5 Predictions for the Future of Machine Learning from IBM Big Data Hub,
offers descriptions of the above trends.
A seasoned user of ML techniques shares his insights into the world of ML, suggesting these
trends are imminent in the field of ML:
Use of Multiple Technologies in ML: The emergence of IoT has benefitted Machine Learning in many
ways. The use of multiple technological strategies to achieve better learning is currently in practice in
ML; in the future, more “collaborative learning” that utilizes multiple technologies is probable.
Personalized Computing Environment: Developers will have access to API kits to design and deliver
“more intelligent applications.” In a way, this effort is akin to “assisted programming.” Through these
API kits, developers will easily embed facial, speech, or vision-recognition features into their
systems.
Quantum Computing will greatly enhance the speed of execution of ML algorithms in high-
dimensional vector processing. This will be the next conquest in the field of ML research.
Future advancement in “unsupervised ML algorithms” will lead to higher business outcomes.
Tuned Recommendation Engines: ML-enabled services of the future will become more accurate and
relevant. For example, the Recommendation Engines of the future will be far more relevant and
closer to an individual user’s personal preferences and tastes.
Machine Learning and Artificial Intelligence Trends in 2018 provides a quick roundup of the
most salient technology trends for 2018. Gartner’s Top 10 Technology Trends of 2017 sums
up the all-pervading digital fever as the existence of people, machines, and business
processes in a unified system.
It is hard to ignore the global impact of “AI Washing” in the current business market, and
how AI and ML may change the application-development markets of tomorrow.
AI and ML have jointly been given the same importance as the discovery of electricity at the
beginning of the Industrial Revolution. These frontier technologies, just like electricity, have
ushered in a new era in the history of Information Technology.
Today, AI- and ML-powered systems are drastically changing the way business is done
across all industry sectors. These frontier technologies are gradually bringing about
transformative changes across industry sectors, a few of which are listed here:
In Healthcare
Gradually, human practitioners and machines will work in tandem to deliver improved
outcomes. Advanced machines will be expected to deliver accurate and timely diagnosis of
patient conditions, while the practitioners can focus more on patients.
In Finance
AI And Machine Learning are the New Future Technology Trends discusses how the latest
technologies like blockchain are impacting India’s capital markets. For instance, capital-
market operators can use blockchain to predict movements in the market and to detect fraud.
AI technologies not only provide opportunities for newer business models in the financial
market, but also solidify the AI technologist’s position in the business-investment ecosystem.
In Real Estate
Contactually.com, an advanced CRM system for the real estate business, has been
specifically designed to connect Washington DC-based investors and startups. The additional
power of Machine Learning algorithms transforms the static system into a live, interactive
machine, which responds, approves, and recommends.
In Database Administration
The repetitive tasks in an average DBA system provide opportunity for AI technologies to
automate processes and tasks. Today’s DBA is empowered with advanced tools, so that they
can make value-added contributions to their organizations rather than just performing rote
functions, as explored in What Do AI and Machine Learning Mean for DBAs.