pma unit 3 pdf


Different Types of Clustering Models

There are several types of clustering models, each with its own strengths
and limitations. Let’s explore a few prominent ones:

• K-Means Clustering: This model partitions data into k clusters, with
each data point assigned to the cluster whose centroid is closest to
it. It is a popular and efficient clustering algorithm, widely used in
applications such as image compression, document clustering, and
market segmentation.

• Hierarchical Clustering: This model organizes data in a tree-like
hierarchy of clusters. It is useful for visualizing relationships
between clusters at different levels. Hierarchical clustering can be
agglomerative, where each data point starts as a separate cluster and
clusters are successively merged, or divisive, where all data points
start in one cluster and are successively split.

• Density-Based Clustering: This model identifies dense regions of
data points and groups them into clusters. It can discover clusters of
arbitrary shapes and handle noise effectively. Density-based
clustering algorithms, such as DBSCAN (Density-Based Spatial
Clustering of Applications with Noise), are particularly useful in
applications where clusters have irregular shapes or the data
contains outliers. (DBSCAN uses a single density threshold, so
clusters of very different densities can be harder to separate.)
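
The assignment and update steps behind K-Means can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the sample points, and the choice of starting centroids are all assumptions made for the example, and real libraries add smarter initialization and convergence checks.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
# Assumed starting centroids: one point taken from each group.
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

Here `labels` holds the cluster index of each point and `centroids` the final cluster centers. The result of K-Means depends on the starting centroids, which is why the example fixes them explicitly.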

Benefits of Using Clustering Models


Clustering models offer numerous benefits in machine learning:

1. Pattern Discovery: By organizing data into clusters, we can uncover
hidden patterns and relationships that might not be apparent at first
glance. This can be useful in various domains, such as customer
segmentation in marketing or anomaly detection in cybersecurity.
2. Data Reduction: Clustering summarizes a dataset by grouping similar
data points together, so that many observations can be represented by
a small number of clusters. This makes it easier to interpret and
analyze the data, especially when dealing with large or
high-dimensional datasets.
3. Anomaly Detection: Clustering models can identify outliers or
anomalies in a dataset, which can be crucial in detecting fraudulent
activities or anomalies in medical diagnosis. By comparing data
points to the established clusters, we can identify instances that
deviate significantly from the norm.
4. Feature Engineering: Clustering assists in feature engineering by
creating new features based on the clusters formed. These new
features can capture the underlying structure of the data and
enhance the predictive power of the machine learning models. For
example, in image recognition, clustering can be used to extract
visual features that represent different objects or patterns.
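
Two of the benefits above, anomaly detection and feature engineering, can be sketched together. The example assumes that cluster centroids are already available (for instance from a K-Means run); the centroid values, the distance threshold, and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical centroids, e.g. produced by clustering historical data.
centroids = [(1.0, 1.0), (8.0, 8.0)]


def nearest_cluster(point, centroids):
    """Return (cluster index, distance) for the closest centroid."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


def is_anomaly(point, centroids, threshold):
    """Flag a point that lies farther than `threshold` from every centroid."""
    _, distance = nearest_cluster(point, centroids)
    return distance > threshold


new_points = [(1.1, 0.9), (7.9, 8.3), (4.5, 4.5)]
# Feature engineering: cluster index and distance become new columns.
features = [nearest_cluster(p, centroids) for p in new_points]
# Anomaly detection: points far from all established clusters are flagged.
flags = [is_anomaly(p, centroids, threshold=2.0) for p in new_points]
```

With these assumed values, only the third point is flagged, because it sits roughly halfway between the two clusters. The threshold is a judgment call and would normally be tuned on real data.
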

Collaborative filtering (CF) is a technique used by recommender
systems. [1] Collaborative filtering has two senses, a narrow one and a more general
one. [2]

In the newer, narrower sense, collaborative filtering is a method of making
automatic predictions (filtering) about the interests of a user by collecting
preferences or taste information from many users (collaborating). The underlying
assumption of the approach is that if persons A and B have the same opinion on one
issue, then they are more likely to agree on other issues than are A and a randomly
chosen person. For example, a collaborative filtering recommendation system for
preferences in television programming could make predictions about which television
show a user should like given a partial list of that user's tastes (likes or
dislikes). [3] These predictions are specific to the user, but use information gleaned
from many users. This differs from the simpler approach of giving an average
(non-specific) score for each item of interest, for example based on its number of votes.
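
The contrast between a user-specific prediction and a non-specific average can be sketched as follows. This is one simple variant (user-based filtering with cosine similarity), not the only way to do collaborative filtering; the users, shows, ratings, and function names are all invented for the example.

```python
import math

# Hypothetical ratings (1-5) of television shows; missing keys are unrated.
ratings = {
    "alice": {"drama": 5, "sitcom": 1, "news": 4},
    "bob": {"drama": 4, "sitcom": 2, "news": 5, "sports": 4},
    "carol": {"drama": 1, "sitcom": 5, "sports": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


def item_average(item, ratings):
    """Non-specific baseline: the same score for every user."""
    scores = [r[item] for r in ratings.values() if item in r]
    return sum(scores) / len(scores)


personalized = predict("alice", "sports", ratings)
baseline = item_average("sports", ratings)
```

In this toy data alice's tastes are much closer to bob's than to carol's, so her predicted rating for "sports" is pulled toward bob's rating and ends up above the plain average of the two available ratings.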

In the more general sense, collaborative filtering is the process of filtering information
or patterns using techniques involving collaboration among multiple agents,
viewpoints, data sources, etc. [2] Applications of collaborative filtering typically involve
very large data sets. Collaborative filtering methods have been applied to many
kinds of data including: sensing and monitoring data, such as in mineral exploration,
environmental sensing over large areas or multiple sensors; financial data, such as
financial service institutions that integrate many financial sources; and user data
from electronic commerce and web applications.

Application on social web


Unlike the traditional model of mainstream media, in which there are few editors who
set guidelines, collaboratively filtered social media can have a very large number of
editors, and content improves as the number of participants increases. Services
like Reddit, YouTube, and Last.fm are typical examples of collaborative filtering
based media. [18]

One scenario of collaborative filtering application is to recommend interesting or
popular information as judged by the community. As a typical example, stories
appear on the front page of Reddit as they are "voted up" (rated positively) by the
community. As the community becomes larger and more diverse, the promoted
stories can better reflect the average interest of the community members.

Wikipedia is another application of collaborative filtering. Volunteers contribute to the
encyclopedia by filtering out facts from falsehoods. [19]

Another aspect of collaborative filtering systems is the ability to generate more
personalized recommendations by analyzing information from the past activity of a
specific user, or the history of other users deemed to be of similar taste to a given
user. These resources are used for user profiling and help the site recommend
content on a user-by-user basis. The more a given user makes use of the system,
the better the recommendations become, as the system gains data to improve its
model of that user.

Propensity modeling is a statistical approach that attempts to predict the likelihood that visitors, leads,
and customers will perform certain actions. It accounts for the independent and confounding variables
that affect customer behavior.

Propensity modeling allows you to make more use of your data, giving you a better understanding of the
interactions and behaviors associated with an event: a prospect converting, a member canceling their
subscription, a client adding an additional service to their plan.

What is Propensity Modeling?

Propensity Modeling is a statistical technique used to predict the chances of certain events happening in
the future. With the increasing use of machine learning, companies can build robust propensity models
and make accurate forecasts. In marketing, for example, propensity models are used to
predict customer behavior.

A propensity model can be as basic as estimating whether a customer is likely to respond to a particular
offer, or purchase a product, given a certain set of circumstances. Understanding the behavior of a
customer helps businesses fine-tune their marketing efforts and allocate resources accordingly.

Importance of Propensity Modeling

As a general term, “propensity model” refers to different types of statistical models designed to predict
binary outcomes; that is, either something will happen or it won’t. Each such model estimates the
likelihood of the event occurring.

Propensity modeling is a powerful tool that can be used to improve marketing campaigns, target
customers more effectively, make better business decisions, and even predict customer churn.

While propensity modeling as a technique goes back to the early 1930s, today machine learning is being
deployed to develop these models.

There are some preparatory steps before you can begin making these models. An enterprise needs to
first collect data on customer behavior. This data can be collected through surveys, focus groups, or
customer transaction history.

Once this data is collected, it can be used to create a statistical model that can be used to predict
customer behavior.

There are three basic sources of data. The first is the demography of the customer, which tells you
who your customer is.

The second is transactional data, i.e. purchase history, which tells you what the customer has done or
what action they have taken.

The third is the customer's opinions and comments, posted in the “Comments” section or on social
media, which help you understand why the customer completed that particular action.

What Are The Various Propensity Models?

Propensity models are typically used by businesses to target customers with specific marketing
campaigns or identify which customers are most likely to respond to a particular offer or
measure customer churn.

There are several propensity models that can be used. The most commonly used are probit (a type of
regression model) and logit (logistic regression) models.

Simply put: probit and logit models are both regression models in which the dependent variable can take
only two values, and both estimate the likelihood that an item or event falls into one of the two
categories, like married or unmarried.

The two are similar, but they are based on different functions. Probit models map the predictors to a
probability through the cumulative distribution function of the standard normal distribution, while logit
models use the logistic function, which makes the model linear in the log-odds of the event.
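
As a sketch of the difference, the two models can be written side by side. The symbols are assumptions introduced here for illustration: x is the vector of independent variables, \beta_0 and \beta are the fitted coefficients, p is the probability that the event occurs, and \Phi is the cumulative distribution function of the standard normal distribution.

```latex
\[
\text{logit:}\quad
p = P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta^{\top} x)}}
\quad\Longleftrightarrow\quad
\log\frac{p}{1 - p} = \beta_0 + \beta^{\top} x
\]
\[
\text{probit:}\quad
p = P(y = 1 \mid x) = \Phi(\beta_0 + \beta^{\top} x)
\quad\Longleftrightarrow\quad
\Phi^{-1}(p) = \beta_0 + \beta^{\top} x
\]
```

Both functions map any real number to a value between 0 and 1, which is what allows the output to be read as a probability.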

One common use of propensity modeling is to predict customer purchase behavior.

Logistic regression: Logit models are commonly used in classification and predictive analytics.
Based on a dataset of independent variables, logistic regression estimates the probability of an event
occurring, such as voting or not voting. The dependent variable is binary (0 or 1), and the predicted
probability lies between 0 and 1.

In addition to making predictions about categorical variables, logistic regression can also be used to
estimate the relationship between the dependent variable and the independent variables.
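
To make this concrete, the following is a minimal sketch of a logistic-regression propensity model fitted by plain gradient descent. It is an illustration under stated assumptions rather than a recommended implementation: the data, the feature meanings, the learning rate, and the function names are all hypothetical, and in practice a statistics or machine learning library would be used instead.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            predicted = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = predicted - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= learning_rate * grad_b / len(rows)
        weights = [
            w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)
        ]
    return weights, bias


def propensity(x, weights, bias):
    """Estimated probability that the event (here, a purchase) occurs."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical data: [visits last month, emails opened] -> purchased (1) or not (0).
rows = [[1, 0], [2, 1], [3, 1], [6, 4], [7, 5], [8, 6]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(rows, labels)

low = propensity([2, 0], weights, bias)   # resembles the non-buyers
high = propensity([7, 6], weights, bias)  # resembles the buyers
```

The fitted model returns a score between 0 and 1 for any customer, so `low` and `high` can be compared directly or checked against a cut-off such as 0.5.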

Random Forest: Random Forest is a supervised machine learning algorithm that is often used for both
classification and regression problems.

It builds many decision trees on different samples of the data and combines their outputs: the majority
vote of the trees for classification, and the average of their predictions for regression.

What is more, the Random Forest algorithm can handle both continuous and categorical variables,
which is why it can be used for regression and classification. Because it combines many trees, it often
gives better results than a single decision tree, particularly on classification problems.
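
The way a forest combines its trees can be sketched without building the trees themselves. The example below shows only the aggregation step, using made-up outputs from five trees; the function names and values are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_predictions):
    """Majority vote: the class predicted by the most trees wins."""
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Average of the trees' numeric predictions."""
    return mean(tree_predictions)


# Hypothetical outputs of five trees for a single customer.
class_votes = ["churn", "stay", "churn", "churn", "stay"]
value_estimates = [120.0, 95.0, 110.0, 130.0, 105.0]

predicted_class = aggregate_classification(class_votes)
predicted_value = aggregate_regression(value_estimates)
```

In a full Random Forest each tree would be trained on a different bootstrap sample of the data; that training step is left out here to keep the focus on how the individual predictions are combined.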

Benefits of Propensity Modeling

Propensity modeling is based on the idea that past behavior is a good predictor of future conduct. By
analyzing past data, businesses can build models that estimate the chances of events happening.

This information can then be used to decide on marketing, product development, and other areas of the
business.

There are a number of benefits to using propensity modeling in your business:

1. Such models can be used to predict customer behavior. By understanding the factors that
influence customer behavior, businesses can use them to target their marketing and sales
efforts in an effective manner.

2. Propensity modeling can help you make smarter decisions by surfacing insights that would
otherwise not be available to an enterprise.

3. They can also be used to predict the value each customer brings in real time. By understanding
which customers are most likely to make a purchase, businesses can gauge the value of those
customers and allocate resources accordingly.

4. Such models can be used to optimize customer acquisition strategies. By figuring out which
customers are most likely to become regulars, businesses can identify which campaigns are
resonating.

5. These models can be used to optimize customer retention strategies. By knowing in advance
which customers are most likely to churn, businesses can identify the measures that can then
stop them from leaving.

6. Propensity models can be used to predict the profitability of a given customer segment. By
understanding which customers are most likely to generate revenue, businesses can optimize
their marketing and sales efforts accordingly.

7. They can also be used to improve customer service and raise the levels of customer
satisfaction.
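
As a small illustration of the retention use case in the list above, propensity scores can be used to rank customers and pick the ones to contact first. The customer IDs, scores, and budget are invented for the example; in practice the scores would come from a fitted propensity model.

```python
# Hypothetical churn propensities, e.g. produced by a fitted model.
churn_scores = {
    "c001": 0.82,
    "c002": 0.15,
    "c003": 0.64,
    "c004": 0.07,
    "c005": 0.91,
}


def top_at_risk(scores, budget):
    """Return the `budget` customers with the highest churn propensity."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]


# With a budget of two retention offers, contact the two riskiest customers.
retention_targets = top_at_risk(churn_scores, budget=2)
```

The same ranking idea applies to the acquisition and profitability cases: only the event being scored changes.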

The Limitations of Propensity Modeling

Propensity modeling also has limitations. For one, it is based on past data, so its predictions hold only
as long as customers keep behaving as they did in the past. Also, propensity models can be biased if the
data used to create them is not representative of the population as a whole.

What’s more, propensity models are only as good as the assumptions made about customer behavior.
If these assumptions are inaccurate, the predictions will be as well. But despite such limitations,
propensity modeling can be a valuable tool for businesses looking to better understand and predict
customer behavior.

What are predictive statistical models?

Predictive modeling is a mathematical process used to predict future events or
outcomes by analyzing patterns in a given set of input data. It is a crucial component
of predictive analytics, a type of data analytics which uses current and historical data
to forecast activity, behavior and trends.

Statistical Modeling Techniques

Some popular statistical model examples include logistic regression, time-series models, clustering,
and decision trees.
