DMPPT 557

Text Mining
Hierarchy of Categories in Text Mining

Text Mining
• Text mining, also known as text data mining or text analytics, is
the process of extracting meaningful information and patterns
from large sets of textual data.
• It combines natural language processing (NLP), machine learning,
and data analytics to process unstructured text (like articles, social
media posts, and emails) and convert it into structured data that
can be analyzed and used for various purposes.
• Text mining helps in discovering hidden insights, trends, and
relationships in the text data that might not be immediately
obvious.
Example of Text Mining
In e-commerce, text mining helps companies understand what customers think and
improve their experience. Here’s how it works in simple terms:
1. Understanding Reviews: By analyzing words in customer reviews, the company
can see if people like a product or not. For example, if a lot of reviews mention
"bad quality," it’s a sign something might be wrong with that product.
2. Personalized Suggestions: Based on what people search or write about, text
mining helps show customers products they’re likely to enjoy.
This way, text mining helps e-commerce platforms boost customer satisfaction and
provide better shopping experiences.
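The "bad quality" signal from the first example can be sketched in a few lines of Python. This is a minimal sketch, assuming reviews arrive as hypothetical (product, text) pairs and that a case-insensitive substring match is good enough; the sample reviews and the `count_phrase_mentions` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical sample data: (product, review text) pairs invented for illustration.
reviews = [
    ("Headphones", "Bad quality, the strap broke in a week."),
    ("Headphones", "Sound is great but bad quality plastic."),
    ("Charger", "Works perfectly, fast charging."),
    ("Headphones", "Comfortable and clear."),
]


def count_phrase_mentions(reviews, phrase):
    """Count, per product, how many reviews contain `phrase` (case-insensitive substring match)."""
    counts = Counter()
    for product, text in reviews:
        if phrase.lower() in text.lower():
            counts[product] += 1
    return counts


mentions = count_phrase_mentions(reviews, "bad quality")
print(mentions)  # Counter({'Headphones': 2})
```

In practice the count would usually be divided by the number of reviews per product, since a popular product naturally collects more mentions of any phrase.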
Text Mining vs Web Mining
• Text mining and web mining are related but different.
• Text Mining focuses specifically on extracting insights from text data, regardless of where it’s stored
(documents, emails, reviews, etc.). It analyzes words and phrases to find patterns and meaning in text.
• Web Mining, on the other hand, deals with extracting information from web sources, like websites
and social media. It includes text mining but also covers other data types, like images, links, and
website structure.
• Example: Suppose a company wants to understand customer sentiment about its products online:
• With Text Mining: They analyze the text in customer reviews to find positive or negative feelings.
• With Web Mining: They also analyze how often these reviews are shared, which websites link to
them, and look at other types of data, like social media trends.
• In short, web mining uses text mining for the text it finds online, but it also studies links, site structure, and
other data types, while text mining studies only text, from any source (web or otherwise).
Hierarchy of Categories in Text Mining
In text mining, the "hierarchy of categories" represents the layered process by which
raw text data is transformed into meaningful insights. It starts with simple steps, like
cleaning the text, and advances to more complex analysis, such as detecting patterns
and making predictions.
The term is also used in a narrower sense: a hierarchy of categories is a structure that organizes text
documents into classes based on their content. This structure is called a hierarchical
category space, and the process of assigning documents to categories is called
hierarchical text classification (HTC).
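One common way to do HTC is top-down: choose a top-level category first, then choose among that category's children. The sketch below assumes a hypothetical two-level category tree (`CATEGORY_TREE`) and uses keyword overlap at each node as a simple stand-in for the trained per-node classifiers a real HTC system would use.

```python
import re

# Hypothetical category tree: name -> (keywords for this node, child categories).
CATEGORY_TREE = {
    "Electronics": ({"phone", "laptop", "battery", "screen"}, {
        "Phones": ({"phone", "iphone", "call", "sim"}, {}),
        "Laptops": ({"laptop", "keyboard", "trackpad"}, {}),
    }),
    "Clothing": ({"shirt", "size", "fabric", "fit"}, {
        "Shirts": ({"shirt", "collar", "sleeve"}, {}),
        "Shoes": ({"shoe", "sole", "heel"}, {}),
    }),
}


def classify_top_down(text, tree):
    """Return the path of categories chosen from the root down.

    At each level, the child whose keywords overlap most with the text wins;
    the walk stops at a leaf or when no child shares any keyword.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    path, level = [], tree
    while level:
        scores = {name: len(words & keywords) for name, (keywords, _) in level.items()}
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            break
        path.append(best)
        level = level[best][1]  # descend into the winning node's children
    return path


print(classify_top_down("The phone battery dies fast", CATEGORY_TREE))
# ['Electronics', 'Phones']
```

Stopping early lets a document sit at a broad category when no subcategory fits: `classify_top_down("The screen is cracked", CATEGORY_TREE)` returns `['Electronics']`.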
1. Text Preprocessing
• This is the first and most crucial step, where raw text is cleaned up and prepared for analysis.
• Example: Suppose you have customer reviews with sentences like "I absolutely LOVED this
product!! The battery life is incredible."
• Steps:
• Tokenization: Breaks the text into words or "tokens" like ["I", "absolutely", "LOVED", "this", "product",
"The", "battery", "life", "is", "incredible"] (punctuation such as "!!" is discarded here).
• Lowercasing: Converts all text to lowercase to treat "LOVED" and "loved" as the same.
• Removing Stop Words: Removes common words like "I," "this," and "is" to focus on words with meaning.
• Stemming/Lemmatization: Converts words to their root form (e.g., "loved" becomes "love").
• Purpose: Text preprocessing makes data more manageable, ensuring that the next stages are
more accurate and meaningful.
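The four preprocessing steps can be chained in a small Python function. This is a minimal, dependency-free sketch: the stop-word list and the lookup-table lemmatizer (`LEMMAS`) are tiny hypothetical stand-ins for the much larger resources that libraries such as NLTK or spaCy provide.

```python
import re

# Tiny illustrative stop-word list (assumption; libraries ship far longer lists).
STOP_WORDS = {"i", "this", "is", "the", "a", "an", "and", "of", "to"}

# Hypothetical lookup-table lemmatizer: maps inflected forms to their base form.
LEMMAS = {"loved": "love", "loves": "love", "loving": "love", "batteries": "battery"}


def tokenize(text):
    """Split text into word tokens, dropping punctuation such as '!!' and '.'."""
    return re.findall(r"[A-Za-z']+", text)


def preprocess(text):
    tokens = tokenize(text)                              # 1. tokenization
    tokens = [t.lower() for t in tokens]                 # 2. lowercasing
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. stop-word removal
    return [LEMMAS.get(t, t) for t in tokens]            # 4. lemmatization (lookup)


review = "I absolutely LOVED this product!! The battery life is incredible."
print(preprocess(review))
# ['absolutely', 'love', 'product', 'battery', 'life', 'incredible']
```

Stemming would replace the lookup table with suffix-stripping rules (for example, the Porter algorithm), which needs no dictionary but can produce stems that are not real words.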
2. Information Extraction
• Here, specific, relevant pieces of information are extracted from the text. This level
pulls out details that make the text easier to analyze, often turning it into structured data.
• Example: From reviews like "John purchased an iPhone from New York last Tuesday,"
information extraction would pull out:
• Entities: Recognizes "John" as a person, "iPhone" as a product, "New York" as a location, and
"last Tuesday" as a date.
• Keywords: Identifies important words like "purchased" and "iPhone."
• Relationship Extraction: Detects that "John" and "iPhone" have a "purchase" relationship.
• Purpose: This step simplifies the data by extracting essential details, making it easier to
analyze or store in a structured format.
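A rule-based version of the entity and relationship steps can be sketched with lookup lists (gazetteers) and a regular expression. The entity lists, the date pattern, and the purchase-verb list below are hypothetical and deliberately tiny; production systems typically rely on trained named-entity recognizers instead.

```python
import re

# Hypothetical gazetteers: known names for each entity type.
GAZETTEER = {
    "PERSON": ["John", "Maria"],
    "PRODUCT": ["iPhone", "Kindle"],
    "LOCATION": ["New York", "London"],
}
# Relative weekday expressions such as "Tuesday" or "last Tuesday".
DATE_PATTERN = re.compile(r"\b(?:last |next )?(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b")
PURCHASE_VERBS = {"purchased", "bought", "ordered"}


def extract_entities(sentence):
    """Return (text, label) pairs in the order they appear in the sentence."""
    found = []  # (start offset, matched text, label)
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(rf"\b{re.escape(name)}\b", sentence):
                found.append((m.start(), m.group(), label))
    for m in DATE_PATTERN.finditer(sentence):
        found.append((m.start(), m.group(), "DATE"))
    return [(text, label) for _, text, label in sorted(found)]


def extract_purchase(sentence, entities):
    """Return (person, 'purchase', product) if a purchase verb links them, else None."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    people = [text for text, label in entities if label == "PERSON"]
    products = [text for text, label in entities if label == "PRODUCT"]
    if people and products and words & PURCHASE_VERBS:
        return (people[0], "purchase", products[0])
    return None


sentence = "John purchased an iPhone from New York last Tuesday"
entities = extract_entities(sentence)
print(entities)
# [('John', 'PERSON'), ('iPhone', 'PRODUCT'), ('New York', 'LOCATION'), ('last Tuesday', 'DATE')]
print(extract_purchase(sentence, entities))
# ('John', 'purchase', 'iPhone')
```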
3. Text Classification
• Text classification involves sorting texts into predefined categories or labels.
This step helps organize large amounts of data, making it easier to analyze
specific segments.
• Example: Imagine sorting customer reviews into categories like "Positive,"
"Negative," and "Neutral."
• A review like "The product quality is amazing!" would be classified as Positive.
• A review like "This is terrible, it broke after one day" would be classified as Negative.
• Other Uses: Classification can also label topics (e.g., sorting news articles into
categories like "Politics," "Sports," or "Technology").
• Purpose: Text classification enables you to filter, sort, and analyze data based on
specific labels, making insights more actionable.
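One classic way to learn such labels from examples is multinomial Naive Bayes. The sketch below implements it from scratch with add-one (Laplace) smoothing and ignores words never seen in training; the six training reviews are invented for illustration, and a real classifier would need far more labeled data (scikit-learn, for instance, provides a tested `MultinomialNB`).

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for w in tokens(text):
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                # Smoothed likelihood: (count + 1) / (words in class + vocabulary size)
                score += math.log((self.word_counts[label][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled reviews.
train = [
    ("The product quality is amazing", "Positive"),
    ("Great value, works perfectly", "Positive"),
    ("This is terrible, it broke after one day", "Negative"),
    ("Awful quality, stopped working", "Negative"),
    ("It arrived on Monday", "Neutral"),
    ("The box contains a charger", "Neutral"),
]
model = NaiveBayes().fit([t for t, _ in train], [label for _, label in train])
print(model.predict("Amazing quality, great value"))  # Positive
print(model.predict("It broke, terrible"))            # Negative
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.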
4. Clustering and Topic Modeling
• Clustering and topic modeling organize texts based on similarities or common
themes, without predefined categories. This helps reveal patterns and connections
in the data.
• Example: Suppose you have a collection of product reviews and want to see what
topics customers are talking about.
• Clustering: Groups similar reviews together. For example, it might group reviews related
to "battery life" in one cluster and "camera quality" in another.
• Topic Modeling: Discovers common topics in a set of reviews. For example, it might
reveal topics like "affordability," "durability," or "customer service" across various reviews.
• Purpose: Clustering and topic modeling are helpful for exploring data when you
don’t know what themes exist. They can reveal patterns or popular topics in large
sets of text.
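Clustering can be illustrated with a very simple single-pass ("leader") algorithm that groups reviews by shared content words, measured with Jaccard similarity. This is a toy stand-in, not k-means or a topic model; the stop-word list and the similarity threshold of 0.15 are assumptions chosen for the sample reviews. Real pipelines more often use TF-IDF vectors with k-means, or Latent Dirichlet Allocation for topic modeling (both available in scikit-learn).

```python
import re

# Small illustrative stop-word list (assumption).
STOP_WORDS = {"the", "is", "a", "an", "and", "it", "be", "could"}


def content_words(text):
    """Lowercased set of words in `text`, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(texts, threshold=0.15):
    """Single pass: join the first cluster whose leader (first member) is similar enough,
    otherwise start a new cluster. Returns a list of clusters (lists of texts)."""
    clusters = []  # each item: (leader's word set, member texts)
    for text in texts:
        words = content_words(text)
        for leader_words, members in clusters:
            if jaccard(words, leader_words) >= threshold:
                members.append(text)
                break
        else:  # no existing cluster was similar enough
            clusters.append((words, [text]))
    return [members for _, members in clusters]


reviews = [
    "Battery life is short",
    "The battery drains fast, poor battery life",
    "Camera quality is excellent",
    "Great camera, sharp photos",
    "Battery life could be better",
]
for cluster in leader_cluster(reviews):
    print(cluster)
# ['Battery life is short', 'The battery drains fast, poor battery life', 'Battery life could be better']
# ['Camera quality is excellent', 'Great camera, sharp photos']
```

The result depends on input order and on the threshold, which is why methods such as k-means that revisit assignments are usually preferred on real data.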
5. Sentiment Analysis
• Sentiment analysis identifies the emotions or attitudes expressed in the text. It
goes beyond basic classification by interpreting the tone of the message.
• Example: In customer feedback, sentiment analysis can tell if people are happy,
angry, or indifferent about a product.
• For instance, "I’m thrilled with this purchase!" would be tagged as Positive Sentiment.
• "This product is the worst!" would be tagged as Negative Sentiment.
• Some sentiment analysis tools also identify specific emotions like joy, sadness, anger, or
surprise.
• Aspect-Based Sentiment Analysis: Analyzes specific parts of a review. For
instance, in "The camera is fantastic, but the battery life is disappointing," the
analysis can rate the camera as positive and battery life as negative.
• Purpose: Sentiment analysis gives businesses insight into customer feelings,
helping them respond to issues and improve satisfaction.
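A lexicon-based scorer is one simple way to implement both plain and aspect-based sentiment analysis. In this sketch the word scores (`LEXICON`), the aspect list, and the clause-splitting rule (split at commas, "but", and "and") are hypothetical choices; published lexicons such as VADER contain thousands of scored entries and handle negation, which this toy version does not.

```python
import re

# Hypothetical mini lexicon: word -> sentiment score.
LEXICON = {"thrilled": 2, "fantastic": 2, "love": 2, "great": 1,
           "worst": -2, "terrible": -2, "disappointing": -1, "broke": -1}
ASPECTS = ("camera", "battery", "screen", "price")


def sentiment_score(text):
    """Sum the lexicon scores of the words in `text` (unknown words score 0)."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", text.lower()))


def to_label(score):
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def aspect_sentiment(review):
    """Split the review into clauses, then label each clause that names a known aspect."""
    results = {}
    for clause in re.split(r",|\bbut\b|\band\b", review.lower()):
        for aspect in ASPECTS:
            if aspect in clause:
                results[aspect] = to_label(sentiment_score(clause))
    return results


print(to_label(sentiment_score("I'm thrilled with this purchase!")))  # Positive
print(to_label(sentiment_score("This product is the worst!")))       # Negative
print(aspect_sentiment("The camera is fantastic, but the battery life is disappointing"))
# {'camera': 'Positive', 'battery': 'Negative'}
```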
6. Advanced Analytics and Pattern Recognition
• This is the final level, where deeper insights are uncovered. Advanced analytics
can identify trends, predict outcomes, and detect anomalies in text data.
• Example: A company analyzing customer support chats might use pattern
recognition to detect:
• Trends: A rising number of complaints about "delivery time" may indicate a shipping
issue.
• Anomalies: Sudden negative comments about "website performance" could point to a
technical problem.
• Predictive Analytics: By analyzing past issues, the system might predict which products
are likely to receive complaints in the future.
• Purpose: Advanced analytics transforms data into actionable insights, helping
organizations make informed, data-driven decisions.
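The anomaly example can be sketched as a spike detector over weekly keyword counts: a week is flagged when its count rises well above the recent average. The four-week window, the two-standard-deviation rule, and the floor of 1.0 on the spread are assumptions rather than tuned values, and the chats and weekly counts are invented.

```python
import statistics


def weekly_mentions(chats_by_week, keyword):
    """For each week, count the chats that mention `keyword` (case-insensitive)."""
    return [sum(keyword.lower() in chat.lower() for chat in week) for week in chats_by_week]


def flag_spikes(counts, window=4, z=2.0):
    """Return indices i where counts[i] > mean + z * stdev of the previous `window` counts."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.mean(history)
        # Floor the spread at 1.0 so that a perfectly flat history does not flag tiny bumps.
        spread = max(statistics.pstdev(history), 1.0)
        if counts[i] > mean + z * spread:
            flagged.append(i)
    return flagged


chats = [["Delivery was late", "Great app"], ["Where is my delivery?", "Delivery delayed again"]]
print(weekly_mentions(chats, "delivery"))  # [1, 2]

# Hypothetical weekly counts of "delivery time" complaints.
complaints = [3, 4, 3, 5, 4, 12, 4]
print(flag_spikes(complaints))  # [5]
```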
Applications of Text Mining
• Sentiment Analysis: Understanding user sentiments from social
media or product reviews.
• Spam Detection: Identifying and filtering unwanted emails based on
content.
• Topic Modeling: Discovering abstract topics within a collection of
documents.
• Information Retrieval: Enhancing search engines by improving query
understanding.
• Customer Feedback Analysis: Analyzing reviews to derive
actionable insights for products or services.
