DMPPT 557
This way, text mining helps e-commerce platforms boost customer satisfaction and
provide better shopping experiences.
Text Mining vs Web Mining
• Text mining and web mining are related but different.
• Text Mining focuses specifically on extracting insights from text data, regardless of where it’s stored
(documents, emails, reviews, etc.). It analyzes words and phrases to find patterns and meaning in text.
• Web Mining, on the other hand, deals with extracting information from web sources, like websites
and social media. It includes text mining but also covers other data types, like images, links, and
website structure.
• Example: Suppose a company wants to understand customer sentiment about its products online:
• With Text Mining: They analyze the text in customer reviews to find positive or negative feelings.
• With Web Mining: They also analyze how often these reviews are shared, which websites link to
them, and look at other types of data, like social media trends.
• In short, the two overlap: web mining applies text mining to the text found on the web, and it
also studies other web data such as links, images, and site structure.
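The difference can be sketched in a few lines of Python. This is a minimal illustration using only the standard library's `html.parser`; the sample page and the `PageParser` class are hypothetical and not part of the original slides. The extracted text is what text mining would analyze, while the extracted links are an example of the extra, non-text data that web mining also considers.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects visible text and hyperlink targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # input for text mining
        self.links = []       # link structure, an input for web mining

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())

# Hypothetical page content, used for illustration only.
page = '<p>Great phone, <a href="https://example.com/review">full review here</a>.</p>'

parser = PageParser()
parser.feed(page)
parser.close()

text = " ".join(parser.text_parts)
print("Text for text mining:", text)
print("Links for web mining:", parser.links)
```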
Hierarchy of Categories in Text Mining
In text mining, the "hierarchy of categories" represents the layered process by which
raw text data is transformed into meaningful insights. It starts with simple steps, like
cleaning the text, and advances to more complex analysis, such as detecting patterns
and making predictions.
1. Text Preprocessing
• Example: Suppose you have customer reviews with sentences like "I absolutely LOVED this
product!! The battery life is incredible."
• Steps:
• Tokenization: Breaks the text into words or "tokens" (punctuation dropped): ["I", "absolutely", "LOVED", "this",
"product", "The", "battery", "life", "is", "incredible"].
• Lowercasing: Converts all text to lowercase so that "LOVED" and "loved" are treated as the same word.
• Removing Stop Words: Removes common words like "I," "this," "the," and "is" to focus on words that carry meaning.
• Stemming/Lemmatization: Reduces words to a root form (e.g., "loved" becomes "love"). Stemming strips
suffixes by rule; lemmatization looks up the word's dictionary form.
• Purpose: Text preprocessing makes data more manageable, ensuring that the next stages are
more accurate and meaningful.
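The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the stop-word list and the `LEMMAS` lookup table are tiny, hypothetical stand-ins for the much larger resources a real NLP library would provide.

```python
import re

# Assumption: a tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"i", "this", "is", "the", "a", "an"}

# Assumption: a hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"loved": "love", "batteries": "battery"}

def preprocess(text):
    tokens = re.findall(r"[A-Za-z']+", text)              # tokenization (punctuation dropped)
    tokens = [t.lower() for t in tokens]                  # lowercasing
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [LEMMAS.get(t, t) for t in tokens]             # lemmatization by lookup

review = "I absolutely LOVED this product!! The battery life is incredible."
print(preprocess(review))
# ['absolutely', 'love', 'product', 'battery', 'life', 'incredible']
```

Each step is one line, so steps can be reordered or swapped out (for example, replacing the lookup table with a rule-based stemmer) without touching the rest.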
2. Information Extraction
• Here, specific, relevant pieces of information are extracted from the text. This level
pulls out details that make the text easier to analyze, often turning it into structured data.
• Example: From a sentence like "John purchased an iPhone from New York last Tuesday,"
information extraction would pull out:
• Entities: Recognizes "John" as a person, "iPhone" as a product, "New York" as a location, and
"Tuesday" as a date.
• Keywords: Identifies important words like "purchased" and "iPhone."
• Relationship Extraction: Detects that "John" and "iPhone" have a "purchase" relationship.
• Purpose: This step simplifies the data by extracting essential details, making it easier to
analyze or store in a structured format.
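A rough sketch of entity and relationship extraction follows. It assumes hand-written lookup tables (`GAZETTEER`) and regular expressions, all hypothetical; real systems typically rely on trained named-entity recognition models rather than fixed lists.

```python
import re

# Assumption: hypothetical lookup tables (gazetteers) for illustration only.
GAZETTEER = {
    "PERSON": ["John"],
    "PRODUCT": ["iPhone"],
    "LOCATION": ["New York"],
}
DAY_PATTERN = r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b"

def extract_entities(text):
    """Return (entity, label) pairs found by list lookup and a weekday pattern."""
    entities = []
    for label, names in GAZETTEER.items():
        for name in names:
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                entities.append((name, label))
    for match in re.finditer(DAY_PATTERN, text):
        entities.append((match.group(), "DATE"))
    return entities

def extract_purchase(text):
    """Very rough pattern for '<buyer> purchased a/an/the <item>'."""
    m = re.search(r"(\w+) purchased (?:an?|the) (\w+)", text)
    return (m.group(1), "purchase", m.group(2)) if m else None

sentence = "John purchased an iPhone from New York last Tuesday"
print(extract_entities(sentence))
# [('John', 'PERSON'), ('iPhone', 'PRODUCT'), ('New York', 'LOCATION'), ('Tuesday', 'DATE')]
print(extract_purchase(sentence))
# ('John', 'purchase', 'iPhone')
```

The output is already structured (tuples of values and labels), which is the point of this level: it can be stored in a table or database row directly.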
3. Text Classification
• Text classification involves sorting texts into predefined categories or labels.
This step helps organize large amounts of data, making it easier to analyze
specific segments.
• Example: Imagine sorting customer reviews into categories like "Positive,"
"Negative," and "Neutral."
• A review like "The product quality is amazing!" would be classified as Positive.
• A review like "This is terrible, it broke after one day" would be classified as Negative.
• Other Uses: Classification can also label topics (e.g., sorting news articles into
categories like "Politics," "Sports," or "Technology").
• Purpose: Text classification enables you to filter, sort, and analyze data based on
specific labels, making insights more actionable.
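One common way to classify text is a Naive Bayes model trained on labeled examples. The sketch below is a minimal pure-Python version with add-one smoothing; the four training reviews are made up for illustration, and only two labels are used to keep it short. It is one possible approach, not the method prescribed by these slides.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:             # ignore unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Assumption: a tiny, made-up training set for illustration.
texts = [
    "The product quality is amazing",
    "I love it, works great",
    "This is terrible, it broke after one day",
    "Awful quality, stopped working",
]
labels = ["Positive", "Positive", "Negative", "Negative"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("amazing product, works great"))  # Positive
print(model.predict("terrible, it broke"))            # Negative
```

With so little training data the model only recognizes words it has seen, so a real classifier would need far more labeled reviews, including "Neutral" examples.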
4. Clustering and Topic Modeling
• Clustering and topic modeling organize texts based on similarities or common
themes, without predefined categories. This helps reveal patterns and connections
in the data.
• Example: Suppose you have a collection of product reviews and want to see what
topics customers are talking about.
• Clustering: Groups similar reviews together. For example, it might group reviews related
to "battery life" in one cluster and "camera quality" in another.
• Topic Modeling: Discovers common topics in a set of reviews. For example, it might
reveal topics like "affordability," "durability," or "customer service" across various reviews.
• Purpose: Clustering and topic modeling are helpful for exploring data when you
don’t know what themes exist. They can reveal patterns or popular topics in large
sets of text.
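The clustering idea can be sketched without any labels. The example below is a simplified illustration that groups reviews by word overlap (Jaccard similarity) in a single greedy pass; the reviews, stop-word list, and threshold are assumptions chosen for the demo. Full topic modeling (for example, LDA) needs a dedicated library and is not shown; the shared words per cluster serve only as a crude topic label.

```python
import re

# Assumption: a small illustrative stop-word list.
STOP_WORDS = {"the", "is", "a", "and", "it", "my", "all", "very"}

def token_set(text):
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS}

def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(texts, threshold=0.2):
    """Greedy single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of indices; the first index is its seed
    for i, text in enumerate(texts):
        tokens = token_set(text)
        for members in clusters:
            if jaccard(tokens, token_set(texts[members[0]])) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found, start a new one
    return clusters

def shared_terms(texts, members):
    """Words common to every review in a cluster, used as a crude topic label."""
    return sorted(set.intersection(*(token_set(texts[i]) for i in members)))

reviews = [
    "The battery life is great",
    "Camera quality is sharp and clear",
    "Battery life lasts all day",
    "The camera quality is poor",
]

groups = cluster(reviews)
print(groups)  # [[0, 2], [1, 3]]
for members in groups:
    print(shared_terms(reviews, members))
# ['battery', 'life']
# ['camera', 'quality']
```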
5. Sentiment Analysis
• Sentiment analysis identifies the emotions or attitudes expressed in the text. It
goes beyond basic classification by interpreting the tone of the message.
• Example: In customer feedback, sentiment analysis can tell if people are happy,
angry, or indifferent about a product.
• For instance, "I’m thrilled with this purchase!" would be tagged as Positive Sentiment.
• "This product is the worst!" would be tagged as Negative Sentiment.
• Some sentiment analysis tools also identify specific emotions like joy, sadness, anger, or
surprise.
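A simple way to approximate sentiment analysis is a lexicon lookup: each known word carries a polarity score and an emotion label, and the scores are summed. The `LEXICON` below is a small hypothetical word list written for this sketch, not a published sentiment resource.

```python
import re

# Assumption: a hypothetical mini-lexicon mapping word -> (polarity score, emotion).
LEXICON = {
    "thrilled": (2, "joy"),
    "love": (2, "joy"),
    "great": (1, "joy"),
    "worst": (-2, "anger"),
    "terrible": (-2, "anger"),
    "disappointed": (-1, "sadness"),
}

def analyze(text):
    """Return (sentiment label, total score, sorted list of emotions found)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    score = sum(s for s, _ in hits)
    label = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
    emotions = sorted({e for _, e in hits})
    return label, score, emotions

print(analyze("I'm thrilled with this purchase!"))  # ('Positive', 2, ['joy'])
print(analyze("This product is the worst!"))        # ('Negative', -2, ['anger'])
print(analyze("It arrived on Monday."))             # ('Neutral', 0, [])
```

This sketch ignores negation and sarcasm ("not great" would still score as positive), which is one reason practical tools use trained models instead of word lists alone.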