Unit 5 Notes
Machine learning enables a machine to automatically learn from data, improve performance
from experience, and predict outcomes without being explicitly programmed.
With the help of sample historical data, which is known as training data, machine learning
algorithms build a mathematical model that helps in making predictions or decisions without
being explicitly programmed. Machine learning brings computer science and statistics together
for creating predictive models. Machine learning constructs or uses the algorithms that learn
from historical data. The more information we provide, the better the performance will be.
A machine has the ability to learn if it can improve its performance by gaining more data.
We can train machine learning algorithms by providing them with large amounts of data and letting
them explore the data, construct models, and predict the required output automatically. The
performance of the machine learning algorithm depends on the amount of data, and it can be
determined by the cost function. With the help of machine learning, we can save both time and
money.
The importance of machine learning can be easily understood by its use cases. Currently,
machine learning is used in self-driving cars, cyber fraud detection, face recognition,
friend suggestions on Facebook, and more. Various top companies such as Netflix and Amazon
have built machine learning models that use a vast amount of data to analyze user
interests and recommend products accordingly.
Machine learning is broadly classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
What is R Analytics?
R has become increasingly popular over many years and remains a top analytics
language for many universities and colleges. It is well established today within
academia as well as among corporations around the world for delivering robust,
reliable, and accurate analytics. While R programming was originally seen as
difficult for non-statisticians to learn, the user interface has become more user-
friendly in recent years. It also now allows for extensions and other plugins like
RStudio and R Excel, making the learning process easier and faster for new
business analysts and other users. It has become the industry standard for
statistical analysis and data mining projects and is due to grow in use as more
graduates enter the workforce as R-trained analysts.
What are the Benefits of R Analytics?
Business analytics in R allows users to analyze business data more efficiently.
The following are some of the main benefits realized by companies employing R
in their analytics programs:
Leveraging Big Data: R can help with querying big data and is used by many
industry leaders to leverage big data across the business. With R analytics,
organizations can surface new insights in their large data sets and make sense of
their data. R can handle these big datasets and is arguably as easy to use as, if not easier
than, any of the other analytics tools available to most analysts today.
Common uses of R analytics include:
• Statistical testing
• Prescriptive analytics
• Predictive analytics
• Time-series analysis
• What-if analysis
• Regression models
• Data exploration
• Forecasting
• Text mining
• Data mining
• Visual analytics
• Web analytics
• Social media analytics
• Sentiment analysis
• It provides good explanatory code. For example, if you are at the early stage
of a machine learning project and you need to explain the work you do, it is
easier to do so with the R language than with Python, as R provides proper
statistical methods for working with data in fewer lines of code.
• The R language is well suited to data visualization and provides a good
environment for prototyping machine learning models.
• The R language has strong tools and library packages for machine
learning projects. Developers can use these packages in the pre-modelling,
modelling, and post-modelling stages of a project. Also, several R
packages are more advanced and extensive than their Python counterparts,
which makes R a strong choice for machine learning projects.
• lattice: The lattice package supports the creation of the graphs displaying
the variable or relation between multiple variables with conditions.
• DataExplorer: This R package focuses on automating data visualization
and data handling so that the user can concentrate on the data insights of the
project.
• DALEX (Descriptive Machine Learning Explanations): This package
helps provide various explanations for the relation between the input
variables and the output. It helps in understanding complex machine
learning models.
• dplyr: This R package is used to manipulate and summarize tabular data
organized in rows and columns. It applies the “split-apply-combine”
approach.
• Esquisse: This R package is used to explore data quickly to see the
information it holds. It also allows plotting bar graphs, histograms, curves,
and scatter plots.
• caret: This R package attempts to streamline the process for creating
predictive models.
• janitor: This R package has functions for examining and cleaning dirty
data. It is built to be user-friendly for beginners and intermediate
users.
• rpart: This R package helps to create the classification and regression
models using two-stage procedures. The resulting models are represented
as binary trees.
Many top companies, such as Google, Facebook, and Uber, use the R
language for machine learning applications. The applications include:
• Social Network Analytics
• To analyze trends and patterns
• Getting insights into the behaviour of users
• Finding the relationships between users
• Developing analytical solutions
• Accessing charting components
• Embedding interactive visual graphics
• Voice assistants such as Siri, Alexa, Google Assistant, and Cortana: Recognize the user’s
voice and fulfil the request made
• Social Media Services: Help people connect all over the world and also
show recommendations of people we may know
• Online Customer Support: Provides greater convenience for customers and
efficiency for support agents
• Intelligent Gaming: Uses responsive and adaptive non-player
characters with human-like intelligence
• Product Recommendation: A software tool used to recommend
products that you might like to purchase or engage with
• Virtual Personal Assistants: Software that can perform
tasks according to the instructions provided
• Traffic Alerts: Help provide traffic alerts according to the current
situation
• Online Fraud Detection: Checks for unusual actions performed by the
user and detects fraud
• Healthcare: Machine learning can manage amounts of data beyond
the capacity of a normal human being and helps identify a patient’s illness
from the symptoms
• Real-world example: When you search for a cooking recipe
on YouTube, you will see recommendations below with the title “You
May Also Like This”. This is a common use of machine learning.
Packages
We will be using, directly or indirectly, the following packages through the chapters:
• caret
• ggplot2
• mlbench
• class
• caTools
• randomForest
• impute
• ranger
• kernlab
• glmnet
• naivebayes
• rpart
• rpart.plot
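As a quick illustration of how a few of the packages listed above fit together, the sketch below uses caret to split a dataset, rpart to fit a classification tree, and rpart.plot to draw it. The iris dataset and the 80/20 split are assumptions made purely for illustration.

```r
# Minimal sketch combining caret, rpart and rpart.plot (assumed example data: iris)
library(caret)
library(rpart)
library(rpart.plot)

set.seed(42)
idx   <- createDataPartition(iris$Species, p = 0.8, list = FALSE)  # stratified split
train <- iris[idx, ]
test  <- iris[-idx, ]

# Fit a simple classification tree on the training data
fit <- rpart(Species ~ ., data = train, method = "class")
rpart.plot(fit)                               # visualize the tree

# Evaluate on the held-out test data
pred <- predict(fit, test, type = "class")
confusionMatrix(pred, test$Species)           # accuracy and per-class statistics
```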
Supervised Learning:
Supervised learning is the type of machine learning in which machines are trained using well
"labelled" training data, and on the basis of that data, machines predict the output. Labelled
data means that some input data is already tagged with the correct output.
In supervised learning, the training data provided to the machines works as a
supervisor that teaches the machines to predict the output correctly. It applies the
same concept as a student learning under the supervision of a teacher.
In the real world, supervised learning can be used for risk assessment, image
classification, fraud detection, spam filtering, etc.
The working of Supervised learning can be easily understood by the below example and
diagram:
Suppose we have a dataset of different types of shapes, which includes squares, rectangles,
triangles, and polygons. The first step is to train the model on each shape:
o If the given shape has four sides, and all the sides are equal, then it will be labelled as
a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of the model is to
identify the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it
classifies the shape on the basis of the number of sides and predicts the output.
Supervised learning deals with or learns with “labeled” data. This implies that some data is
already tagged with the correct answer.
Types:-
• Regression
• Logistic Regression
• Classification
• Naive Bayes Classifiers
• K-NN (k nearest neighbors)
• Decision Trees
• Support Vector Machine
Regression:
• Dependent Variable: This is the variable that we are trying to understand or forecast.
• Independent Variable: These are factors that influence the analysis or target variable
and provide us with information regarding the relationship of the variables with the
target variable.
Regression analysis is used for prediction and forecasting. This statistical method is
used across different industries such as,
• Financial Industry- Understand the trend in the stock prices, forecast the prices, and
evaluate risks in the insurance domain
• Marketing- Understand the effectiveness of market campaigns, and forecast pricing
and sales of the product.
• Manufacturing- Evaluate the relationships among the variables that determine how to design
a better engine and provide better performance
• Medicine- Forecast the different combinations of medicines to prepare generic
medicines for diseases.
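As a minimal sketch of regression in R, the example below fits a linear model with base R's lm(). The built-in mtcars dataset, and the choice of mpg as the dependent variable with wt and hp as independent variables, are assumptions for illustration only.

```r
# Minimal linear regression sketch (assumed example: mtcars)
data(mtcars)

# mpg (fuel efficiency) is the dependent variable; wt and hp are independent variables
model <- lm(mpg ~ wt + hp, data = mtcars)
summary(model)                 # coefficients, R-squared, p-values

# Forecast mpg for two hypothetical cars
new_cars <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
predict(model, newdata = new_cars)
```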
Logistic Regression
Classification
In Regression algorithms, we have predicted the output for
continuous values, but to predict the categorical values, we
need Classification algorithms.
What is the Classification Algorithm?
The Classification algorithm is a Supervised Learning
technique that is used to identify the category of new
observations on the basis of training data. In
Classification, a program learns from the given dataset or
observations and then classifies new observation into a
number of classes or groups, such as Yes or No, 0 or 1,
Spam or Not Spam, cat or dog, etc. Classes can be called
targets/labels or categories.
Unlike regression, the output variable of Classification is
a category, not a value, such as "Green or Blue", "fruit or
animal", etc. Since the Classification algorithm is a
Supervised learning technique, hence it takes labeled input
data, which means it contains input with the
corresponding output.
Naïve Bayes classifier
• Naïve Bayes algorithm is a supervised learning algorithm,
which is based on Bayes theorem and used for solving
classification problems.
• It is mainly used in text classification that includes a high-
dimensional training dataset.
• Naïve Bayes Classifier is one of the simplest and most
effective Classification algorithms, and it helps in building
fast machine learning models that can make quick
predictions.
• It is a probabilistic classifier, which means it predicts on
the basis of the probability of an object.
• Some popular examples of Naïve Bayes Algorithm
are spam filtration, Sentimental analysis, and classifying
articles.
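A minimal Naïve Bayes sketch using the naivebayes package from the Packages list above. The iris dataset is an assumed stand-in for a labelled training set.

```r
# Minimal Naive Bayes sketch with the naivebayes package (assumed example: iris)
library(naivebayes)

set.seed(1)
idx   <- sample(nrow(iris), 0.8 * nrow(iris))
train <- iris[idx, ]
test  <- iris[-idx, ]

# Fit the probabilistic classifier: P(class | features) via Bayes' theorem
nb <- naive_bayes(Species ~ ., data = train)

predict(nb, newdata = test[, 1:4])                        # predicted class labels
head(predict(nb, newdata = test[, 1:4], type = "prob"))   # posterior probabilities
```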
K-NN (k nearest neighbors)
• K-Nearest Neighbour is one of the simplest Machine
Learning algorithms based on Supervised Learning
technique.
• K-NN algorithm assumes the similarity between the new
case/data and available cases and puts the new case into the
category that is most similar to the available categories.
• K-NN algorithm stores all the available data and classifies
a new data point based on the similarity.
• This means when new data appears, it can be easily
classified into a well-suited category by using the K-NN
algorithm.
• K-NN algorithm can be used for Regression as well as for
Classification but mostly it is used for the Classification
problems.
• K-NN is a non-parametric algorithm, which means it
does not make any assumption on underlying data.
• It is also called a lazy learner algorithm because it does
not learn from the training set immediately instead it stores
the dataset and at the time of classification, it performs an
action on the dataset.
• The KNN algorithm at the training phase just stores the dataset,
and when it gets new data, it classifies that data into
the category most similar to the new data.
Example: Suppose we have an image of a creature that
looks similar to both a cat and a dog, but we want to know whether
it is a cat or a dog. For this identification, we can use the
KNN algorithm, as it works on a similarity measure. Our
KNN model will find the features of the new image that are
similar to the cat and dog images, and based on the most
similar features it will put it in either the cat or the dog category.
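A minimal K-NN sketch with the class package from the Packages list above. Since the cat and dog images are not available here, the iris measurements are an assumed stand-in: numeric features plus a class label.

```r
# Minimal KNN sketch with the class package (assumed example: iris)
library(class)

set.seed(1)
idx     <- sample(nrow(iris), 0.8 * nrow(iris))
train_x <- iris[idx, 1:4]          # feature columns only
test_x  <- iris[-idx, 1:4]
train_y <- iris$Species[idx]       # labels for the training rows

# Classify each test point by majority vote among its k = 5 nearest neighbours
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)
table(pred, iris$Species[-idx])    # confusion table against the true labels
```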
Support Vector Machine (SVM)
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put new data points
in the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed Support
Vector Machine. Consider the below diagram, in which there are two different categories
that are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used for the KNN
classifier. Suppose we see a strange cat that also has some features of dogs; if we want
a model that can accurately identify whether it is a cat or a dog, such a model can be
created by using the SVM algorithm. We will first train our model with lots of images of
cats and dogs so that it can learn the different features of cats and dogs, and then we
test it with this strange creature. Because the support vectors create a decision boundary
between these two classes (cat and dog) and pick out the extreme cases (support vectors),
the model will consider the extreme cases of cat and dog. On the basis of the support
vectors, it will classify it as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data, which means that if a
dataset can be classified into two classes by using a single straight line, then such
data is termed linearly separable data, and the classifier used is called the Linear
SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which
means that if a dataset cannot be classified by using a straight line, then such data is
termed non-linear data, and the classifier used is called the Non-linear SVM classifier.
The dimensions of the hyperplane depend on the number of features present in the dataset,
which means that if there are 2 features (as shown in the image), then the hyperplane will be
a straight line, and if there are 3 features, then the hyperplane will be a 2-dimensional plane.
We always create the hyperplane that has the maximum margin, which means the maximum
distance between the data points.
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect the
position of the hyperplane are termed support vectors. Since these vectors support the
hyperplane, they are called support vectors.
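A minimal SVM sketch with the kernlab package from the Packages list above. The iris dataset is again an assumed stand-in; "vanilladot" gives a Linear SVM, while "rbfdot" handles non-linearly separable data.

```r
# Minimal SVM sketch with kernlab (assumed example: iris)
library(kernlab)

set.seed(1)
idx   <- sample(nrow(iris), 0.8 * nrow(iris))
train <- iris[idx, ]
test  <- iris[-idx, ]

# Non-linear SVM with a radial basis (RBF) kernel; use kernel = "vanilladot" for Linear SVM
svm_fit <- ksvm(Species ~ ., data = train, kernel = "rbfdot", C = 1)

pred <- predict(svm_fit, test)
mean(pred == test$Species)        # test-set accuracy
```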
Unsupervised learning
It uses machine learning algorithms to analyze and cluster
unlabeled datasets.
These algorithms discover hidden patterns or data groupings
without the need for human intervention.
• Unsupervised machine learning finds all kinds of unknown
patterns in data.
• Unsupervised methods help you to find features which can
be useful for categorization.
• It takes place in real time, so all the input data can be
analyzed and labeled in the presence of learners.
• It is easier to get unlabeled data from a computer than
labeled data, which needs manual intervention.
Working of Unsupervised Learning
Types of Unsupervised Learning Algorithms
Unsupervised learning problems are further grouped into
clustering and association problems.
Clustering
Hierarchical Clustering
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-
shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar,
but they differ in how they work, as there is no requirement to
predetermine the number of clusters as we do in the K-means algorithm.
o Step-1: Treat each data point as a single cluster. Hence, if there are N data points,
there will be N clusters at the start.
o Step-2: Take the two closest data points or clusters and merge them to form one
cluster. There will now be N-1 clusters.
o Step-3: Again, take the two closest clusters and merge them together to form one
cluster. There will be N-2 clusters.
o Step-4: Repeat Step 3 until only one cluster is left. We will then get the following
clusters. Consider the below images:
o Step-5: Once all the clusters are combined into one big cluster, develop the
dendrogram to divide the clusters as per the problem.
1. Single Linkage: It is the shortest distance between the closest points of two different
clusters.
2. Complete Linkage: It is the farthest distance between the two points of two different
clusters. It is one of the popular linkage methods as it forms tighter clusters than single
linkage.
3. Average Linkage: It is the linkage method in which the distance between each pair of
datasets is added up and then divided by the total number of datasets to calculate the
average distance between two clusters. It is also one of the most popular linkage methods.
4. Centroid Linkage: It is the linkage method in which the distance between the centroids of
the clusters is calculated. Consider the below image:
From the above-given approaches, we can apply any of them according to the type of
problem or business requirement.
K-means Clustering
K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset
into different clusters. Here K defines the number of pre-defined clusters that need to be created
in the process, as if K=2, there will be two clusters, and for K=3, there will be three clusters, and
so on.
It allows us to cluster the data into different groups and is a convenient way to discover the
categories of groups in an unlabeled dataset on its own, without the need for any training.
o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. Those data points which are near to the
particular k-center, create a cluster.
Hence each cluster has data points with some commonalities and is away from other
clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They can be points other than those in the
input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K
clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, which means reassigning each data point to the new closest
centroid of each cluster.
Step-6: If any reassignment occurs, go to Step-4; otherwise, the model is ready.
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables
is given below:
o Let's take number k of clusters, i.e., K=2, to identify the dataset and to put them
into different clusters. It means here we will try to group these datasets into two
different clusters.
o We need to choose some random K points or centroids to form the clusters. These
points can either be points from the dataset or any other points. So, here we are
selecting the below two points as K points, which are not part of our dataset.
Consider the below image:
o Now we will assign each data point of the scatter plot to its closest K-point or
centroid. We will compute it by applying some mathematics that we have studied
to calculate the distance between two points. So, we will draw a median between
both the centroids. Consider the below image:
From the above image, it is clear that the points on the left side of the line are near the K1 or
blue centroid, and the points to the right of the line are close to the yellow centroid. Let's
color them blue and yellow for clear visualization.
o As we need to find the closest cluster, we will repeat the process by choosing
new centroids. To choose the new centroids, we will compute the center of gravity
of each cluster and find new centroids as below:
o Next, we will reassign each datapoint to the new centroid. For this, we will repeat
the same process of finding a median line. The median will be like below image:
From the above image, we can see that one yellow point is on the left side of the line, and two
blue points are to the right of the line. So, these three points will be assigned to new centroids.
As reassignment has taken place, we will again go to Step-4, which is finding new
centroids or K-points.
o We will repeat the process by finding the center of gravity of centroids, so the new
centroids will be as shown in the below image:
o As we have the new centroids, we will again draw the median line and reassign the
data points. So, the image will be:
o We can see in the above image that there are no dissimilar data points on either side
of the line, which means our model is formed. Consider the below image:
As our model is ready, we can now remove the assumed centroids, and the two final
clusters will be as shown in the below image:
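A minimal sketch of the procedure above using base R's kmeans() with K = 2. The two variables M1 and M2 are simulated here, since the original scatter-plot data are not included in these notes.

```r
# Minimal K-means sketch (simulated stand-in for the M1/M2 example above)
set.seed(42)
M1  <- c(rnorm(25, mean = 2), rnorm(25, mean = 7))
M2  <- c(rnorm(25, mean = 2), rnorm(25, mean = 7))
dat <- data.frame(M1, M2)

# kmeans() alternates centroid placement and point reassignment until convergence
km <- kmeans(dat, centers = 2, nstart = 25)

km$centers                                        # final centroids
plot(dat, col = km$cluster, pch = 19,             # points coloured by assigned cluster
     main = "K-means with K = 2")
points(km$centers, col = 1:2, pch = 8, cex = 2)   # mark the centroids
```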
Agglomerative clustering
It is also known as the bottom-up approach or hierarchical agglomerative
clustering (HAC). It produces a structure that is more informative than the unstructured
set of clusters returned by flat clustering. This clustering algorithm does not
require us to prespecify the number of clusters. Bottom-up algorithms treat
each data point as a singleton cluster at the outset and then successively
agglomerate pairs of clusters until all clusters have been merged into a single
cluster that contains all the data.
Steps:
• Consider each alphabet as a single cluster and calculate the
distance of one cluster from all the other clusters.
• In the second step, comparable clusters are merged together to
form a single cluster. Let's say cluster (B) and cluster (C) are very
similar to each other, so we merge them in the second step, and
similarly clusters (D) and (E); at last, we get the clusters [(A),
(BC), (DE), (F)].
• We recalculate the proximity according to the algorithm and merge
the two nearest clusters ((DE) and (F)) together to form new clusters
as [(A), (BC), (DEF)].
• Repeating the same process, the clusters DEF and BC are
comparable and are merged together to form a new cluster. We are now
left with the clusters [(A), (BCDEF)].
• At last, the two remaining clusters are merged together to form a
single cluster [(ABCDEF)].
Dendrogram
A dendrogram is a diagram that shows the hierarchical relationship between
objects. It is most commonly created as an output from hierarchical
clustering. The main use of a dendrogram is to work out the best way to
allocate objects to clusters. A dendrogram can, for example, show the hierarchical
clustering of six observations from a scatterplot. (Dendrogram is often miswritten
as dendogram.)
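A minimal sketch of agglomerative clustering and its dendrogram in base R. The six points labelled A to F are simulated for illustration, and the linkage method can be swapped for any of those described above.

```r
# Minimal agglomerative clustering sketch with base R (assumed example data)
set.seed(42)
x <- matrix(rnorm(12), ncol = 2,
            dimnames = list(LETTERS[1:6], c("f1", "f2")))   # six points A-F

d  <- dist(x, method = "euclidean")     # pairwise distance matrix
hc <- hclust(d, method = "complete")    # complete linkage; also "single", "average", "centroid"

plot(hc, main = "Dendrogram")           # the tree-shaped dendrogram
cutree(hc, k = 3)                       # cut the tree into 3 clusters
```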
K- Nearest neighbors
K-Nearest Neighbours is one of the most basic yet essential classification
algorithms in Machine Learning. It belongs to the supervised learning domain
and finds intense application in pattern recognition, data mining, and
intrusion detection.
It is widely used in real-life scenarios since it is non-parametric,
meaning it does not make any underlying assumptions about the distribution
of data (as opposed to other algorithms such as GMM, which assume
a Gaussian distribution of the given data). We are given some prior data (also
called training data), which classifies coordinates into groups identified by an
attribute.
Distance Metrics Used in KNN Algorithm
The KNN algorithm helps us identify the nearest points or
the groups for a query point. But to determine the closest groups or the
nearest points for a query point, we need some metric. For this purpose, we
use the distance metrics below:
• Euclidean Distance
• Manhattan Distance
• Minkowski Distance
Euclidean Distance
This is nothing but the Cartesian distance between two points that lie
in the plane/hyperplane. Euclidean distance can also be visualized as the
length of the straight line that joins the two points under
consideration. This metric helps us calculate the net displacement
between the two states of an object.
Manhattan Distance
This distance metric is generally used when we are interested in the total
distance traveled by the object instead of the displacement. This metric is
calculated by summing the absolute difference between the coordinates of
the points in n-dimensions.
Minkowski Distance
We can say that the Euclidean, as well as the Manhattan distance, are
special cases of the Minkowski distance.
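For reference, the standard Minkowski distance between two n-dimensional points x and y with parameter p is:

$$D(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{p} \right)^{1/p}$$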
From the formula above we can say that when p = 2 then it is the same as
the formula for the Euclidean distance and when p = 1 then we obtain the
formula for the Manhattan distance.
The metrics discussed above are the most common when dealing with
a Machine Learning problem, but there are other distance metrics as well,
such as Hamming distance, which comes in handy for problems
that require overlapping comparisons between two vectors whose contents
can be boolean as well as string values.
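A small base R sketch computing the three distances above for two assumed example points, confirming that Minkowski with p = 2 and p = 1 reduces to the Euclidean and Manhattan distances respectively:

```r
# Distance metrics used by KNN, written as plain base R functions
euclidean <- function(a, b) sqrt(sum((a - b)^2))            # straight-line distance
manhattan <- function(a, b) sum(abs(a - b))                 # sum of absolute differences
minkowski <- function(a, b, p) (sum(abs(a - b)^p))^(1 / p)  # generalises both

a <- c(1, 2, 3)
b <- c(4, 6, 8)

euclidean(a, b)      # 7.07
manhattan(a, b)      # 12
minkowski(a, b, 2)   # same as the Euclidean distance
minkowski(a, b, 1)   # same as the Manhattan distance
```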
How to choose the value of k for KNN Algorithm?
The value of k is very crucial in the KNN algorithm to define the number of
neighbors in the algorithm. The value of k in the k-nearest neighbors (k-NN)
algorithm should be chosen based on the input data. If the input data has
more outliers or noise, a higher value of k would be better. It is recommended
to choose an odd value for k to avoid ties in classification. Cross-
validation methods can help in selecting the best k value for the given
dataset.
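A sketch of choosing k by cross-validation with the caret package from the Packages list above. The 5-fold setup, the grid of odd k values, and the iris dataset are assumptions made for illustration.

```r
# Choosing k for KNN via cross-validation with caret (assumed example: iris)
library(caret)

set.seed(42)
ctrl <- trainControl(method = "cv", number = 5)    # 5-fold cross-validation

knn_fit <- train(Species ~ ., data = iris,
                 method     = "knn",
                 trControl  = ctrl,
                 preProcess = c("center", "scale"),                # scale features for a distance-based model
                 tuneGrid   = data.frame(k = seq(1, 15, by = 2)))  # odd values of k to avoid ties

knn_fit$bestTune   # the k with the best cross-validated accuracy
plot(knn_fit)      # accuracy as a function of k
```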
Applications of the KNN Algorithm
• Data Preprocessing – While dealing with any Machine Learning
problem, we first perform the EDA part, and if we find that the
data contains missing values, there are multiple imputation
methods available. One such method is the KNN
Imputer, which is quite effective and generally used for sophisticated
imputation methodologies.
• Pattern Recognition – KNN algorithms work very well for pattern
recognition; if you train a KNN algorithm on the MNIST dataset and then
evaluate it, you will find that the accuracy is quite high.
• Recommendation Engines – The main task performed by
a KNN algorithm is to assign a new query point to a pre-existing
group that has been created using a huge corpus of data. This
is exactly what is required in recommender systems to assign
each user to a particular group and then provide them
recommendations based on that group’s preferences.
Advantages of the KNN Algorithm
• Easy to implement as the complexity of the algorithm is not that
high.
• Adapts Easily – The KNN algorithm stores
all the data in memory, and hence whenever a new example
or data point is added, the algorithm adjusts itself to that
new example, which contributes to future predictions as
well.
• Few Hyperparameters – The only parameters required
in the training of a KNN algorithm are the value of k and the choice
of the distance metric.
Disadvantages of the KNN Algorithm
• Does not scale – The KNN
algorithm is also considered a lazy algorithm, which means that it
takes a lot of computing power as well as
data storage. This makes the algorithm both time-consuming and
resource-exhausting.
• Curse of Dimensionality – There is a term known as the peaking
phenomenon; according to this, the KNN algorithm is affected by
the curse of dimensionality, which implies that the algorithm has a hard
time classifying the data points properly when the dimensionality is
too high.
• Prone to Overfitting – As the algorithm is affected by the
curse of dimensionality, it is prone to the problem of overfitting as
well. Hence, feature selection as well as dimensionality
reduction techniques are generally applied to deal with this problem.
Principal Components Analysis
Principal Component Analysis is an unsupervised learning algorithm that is used for the
dimensionality reduction in machine learning. It is a statistical process that converts the
observations of correlated features into a set of linearly uncorrelated features with the
help of orthogonal transformation. These new transformed features are called
the Principal Components. It is one of the popular tools used for exploratory data
analysis and predictive modeling. It is a technique for drawing strong patterns from a
dataset by reducing its dimensionality.
PCA generally tries to find the lower-dimensional surface to project the high-dimensional
data.
PCA works by considering the variance of each attribute, because an attribute with high
variance shows a good split between the classes, and hence it reduces the dimensionality.
Some real-world applications of PCA are image processing, movie recommendation
systems, and optimizing the power allocation in various communication channels. It is a
feature extraction technique, so it retains the important variables and drops the least
important variables.
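A minimal PCA sketch using base R's prcomp(). The iris measurements are an assumed example; centering and scaling are applied because PCA is driven by variance.

```r
# Minimal PCA sketch with prcomp() (assumed example: iris measurements)
x <- iris[, 1:4]                              # numeric features only

pca <- prcomp(x, center = TRUE, scale. = TRUE)

summary(pca)          # proportion of variance explained by each principal component
head(pca$x[, 1:2])    # data projected onto the first two principal components
biplot(pca)           # observations and original features in PC space
```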
Association
Association rules allow you to establish associations amongst
data objects inside large databases. This unsupervised
technique is about discovering interesting relationships
between variables in large databases.
For example, people who buy a new home are most likely to buy
new furniture.
Other Examples:
• A subgroup of cancer patients grouped by their gene
expression measurements.
• Groups of shoppers based on their browsing and
purchasing histories.
• Movies grouped by the ratings given by movie viewers.
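A sketch of association-rule mining in R. The arules package is not among the packages listed earlier in these notes, so treat this as an illustrative assumption; Groceries is a transactions dataset that ships with arules.

```r
# Association-rule mining sketch with the arules package (assumed, not in the package list above)
library(arules)
data("Groceries")                  # built-in basket (transactions) data

# Find rules with at least 1% support and 50% confidence
rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.5))

inspect(head(sort(rules, by = "lift"), 5))   # the five strongest associations
```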
Supervised vs. Unsupervised Machine Learning
Computational Complexity: Supervised learning is a simpler method, while unsupervised
learning is computationally complex.
Mobile Analytics
Mobile analytics involves measuring and analysing
data generated by mobile platforms and properties,
such as mobile sites and mobile applications. AT
Internet's analytics solution lets you track, measure
and understand how your mobile users are
interacting with your mobile sites and mobile apps.
Why do companies use mobile analytics?
Mobile analytics gives companies unparalleled
insights into the otherwise hidden lives of app users.
Analytics usually comes in the form of software that
integrates into companies’ existing websites and
apps to capture, store, and analyze the data.
This data is vitally important to marketing, sales,
and product management teams who use it to make
more informed decisions.
Without a mobile analytics solution, companies are
left flying blind. They’re unable to tell what users
engage with, who those users are, what brings them
to the site or app, and why they leave.
Why are mobile analytics important?
• Mobile usage surpassed that of desktop in 2015
and smartphones are fast becoming consumers’
preferred portal to the internet. Consumers spend 70
percent of their media consumption and screen
time on mobile devices, and most of that time in
mobile apps.
• This is a tremendous opportunity for companies to
reach their consumers, but it’s also a highly
saturated market. There are more than 6.5 million
apps in the major mobile app stores, millions of web
apps, and more than a billion websites in existence.
• Companies use mobile analytics platforms to gain a
competitive edge in building mobile experiences
that stand out. Mobile analytics tools also give teams
a much-needed edge in advertising.
• As more businesses compete for customers on
mobile, teams need to understand how their ads
perform in detail, and whether app users who
interact with ads end up purchasing.
How do mobile analytics work?
Mobile analytics typically track:
• Page views
• Visits
• Visitors
• Source data
• Strings of actions
• Location
• Device information
• Login / logout
• Custom event data
Companies use this data to figure out what users want in
order to deliver a more satisfying user experience.
For example, they’re able to see:
• What draws visitors to the mobile site or app
• How long visitors typically stay
• What features visitors interact with
• Where visitors encounter problems
• What factors are correlated with outcomes like
purchases
• What factors lead to higher usage and long-term
retention
How different teams use mobile analytics:
The actual installation of mobile analytics involves adding tracking code to the sites and
SDKs to the mobile applications teams want to track. Most mobile analytics platforms will
be set up to automatically track website visits.
Platforms with codeless mobile features will be able to automatically track certain basic
features of apps such as crashes, errors, and clicks, but you’ll want to expand that by
manually tagging additional actions for tracking. With mobile analytics in place, you’ll
have deeper insights into your mobile web and app users which you can use to create
competitive, world-class products and experiences.
Collecting the data necessary for successful mobile analytics is often the
greatest challenge organizations face when attempting to understand
consumer behavior on mobile devices. Many devices do not allow
cookies to track actions or do not use JavaScript, which can also help with
website data tracking.