Knowledge Representation in Data Mining


KNOWLEDGE REPRESENTATION IN DATA MINING
• Knowledge representation is the presentation of knowledge to the user through visual forms such as trees, tables, rules, graphs, charts, and matrices.

• For example: Histograms
Example: Histogram of an electricity bill generated over 4 months, as shown in the diagram given below.
Histograms
A histogram represents the distribution of values of a single attribute.
It consists of a set of rectangles whose heights reflect the counts or frequencies of the classes present in the given data.
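As a minimal sketch of this idea, the Python snippet below counts how many values of a single attribute fall into each class and prints one bar per class. The bill amounts and the class width of 500 are hypothetical values chosen for illustration; they are not data from the slides.

```python
from collections import Counter

# Hypothetical bill amounts (values of one attribute); not taken from the slides.
bills = [420, 480, 510, 530, 760, 790, 1010, 1250]
BIN_WIDTH = 500  # assumed class width


def histogram(values, bin_width):
    """Return {class_start: count} for equal-width classes."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


hist = histogram(bills, BIN_WIDTH)
for start, count in hist.items():
    # Each run of '#' stands in for one rectangle of the histogram.
    print(f"{start:>5}-{start + BIN_WIDTH - 1:<5} {'#' * count}")
```

The length of each bar corresponds to the frequency of its class, which is what the rectangles of a drawn histogram encode.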
DATA VISUALIZATION

Some of the vital data visualization techniques are:


1. Pixel-oriented visualization technique
In pixel-oriented visualization techniques, each attribute has its own sub-window, and each attribute value is represented by one colored pixel.
• It maximizes the amount of information represented at one time without any overlap.
• A tuple with 'm' variables is represented by 'm' colored pixels, one per variable, and each variable has its own sub-window.
• The color mapping of the pixels is decided on the basis of the data characteristics and the visualization task.
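The following Python sketch illustrates the layout described above under simple assumptions: it builds one sub-window per attribute and turns each value into one pixel. The records, the linear gray-scale mapping, and the window width are all hypothetical; a real tool would choose the color mapping from the data characteristics and the task.

```python
# Hypothetical data set: each tuple has m = 3 attributes.
records = [
    (10, 200, 7),
    (20, 150, 1),
    (30, 100, 10),
    (40, 50, 4),
]


def to_gray(value, lo, hi):
    """Map a value linearly to a gray level in 0..255 (assumed color mapping)."""
    if hi == lo:
        return 0
    return round(255 * (value - lo) / (hi - lo))


def pixel_windows(rows, width):
    """Build one sub-window per attribute; each value becomes one pixel."""
    windows = []
    for column in zip(*rows):  # one column of values per attribute
        lo, hi = min(column), max(column)
        pixels = [to_gray(v, lo, hi) for v in column]
        # Arrange the pixels row by row in a window of the given width.
        windows.append([pixels[i:i + width] for i in range(0, len(pixels), width)])
    return windows


windows = pixel_windows(records, width=2)
for window in windows:
    print(window)
```

Because every tuple occupies the same position in every sub-window, patterns across attributes can be compared position by position.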
2. Geometric projection visualization technique
• Techniques used to find geometric transformations and projections of the data are:

• Scatter-plot matrices
• Hyperslice
• Parallel coordinates
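To make the first technique concrete: a scatter-plot matrix shows every pair of attributes as its own two-dimensional scatter plot. The sketch below only prepares the point lists for each panel and does not draw anything; the attribute names and values are hypothetical.

```python
from itertools import product

# Hypothetical data set with three numeric attributes.
data = {
    "age": [23, 35, 47],
    "income": [30, 52, 61],
    "spend": [12, 20, 18],
}


def scatter_matrix(columns):
    """Return {(x_attr, y_attr): [(x, y), ...]} for every pair of attributes."""
    return {
        (x, y): list(zip(columns[x], columns[y]))
        for x, y in product(columns, repeat=2)
    }


panels = scatter_matrix(data)
print(len(panels))                 # m * m panels for m attributes
print(panels[("age", "income")])   # points for one panel
```

With m attributes the matrix has m × m panels, so the display grows quickly as dimensions are added.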
3. Icon-based visualization techniques
- The most commonly used technique is Chernoff faces.
• Chernoff faces
• This concept was introduced by Herman Chernoff in 1973.
• Each data item is drawn as a human face. Because people readily notice differences in facial features and expressions, it becomes easy to identify the differences between the faces.
• Different data dimensions are mapped to different facial features.
For example: the face width, the length of the mouth, and the length of the nose, as shown in the following diagram.
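The mapping from data dimensions to facial features can be sketched as follows. This is an illustrative assumption, not Chernoff's original parameterization: the dimension names, their value bounds, and the feature ranges (in arbitrary drawing units) are all hypothetical, and the code only computes feature sizes without drawing a face.

```python
# Assumed drawing ranges for each facial feature (arbitrary units).
FEATURE_RANGES = {
    "face_width": (40, 100),
    "mouth_length": (10, 50),
    "nose_length": (5, 25),
}

# Hypothetical assignment of data dimensions to facial features.
MAPPING = {"price": "face_width", "weight": "mouth_length", "rating": "nose_length"}

# Hypothetical (min, max) bounds of each data dimension.
BOUNDS = {"price": (0, 100), "weight": (0, 10), "rating": (0, 10)}


def face_parameters(record):
    """Scale each data dimension linearly onto its facial feature range."""
    face = {}
    for dimension, feature in MAPPING.items():
        lo, hi = BOUNDS[dimension]
        f_lo, f_hi = FEATURE_RANGES[feature]
        fraction = (record[dimension] - lo) / (hi - lo)
        face[feature] = f_lo + (f_hi - f_lo) * fraction
    return face


face = face_parameters({"price": 50, "weight": 0, "rating": 10})
print(face)
```

Two records that differ in one dimension then differ in exactly one facial feature, which is what makes the faces easy to compare.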
4. Hierarchical visualization techniques
- Hierarchical visualization techniques partition all the dimensions into subsets (subspaces), which are then visualized in a hierarchical manner.
Some of the visualization techniques are:

• Dimensional stacking
• In dimensional stacking, the n-dimensional attribute space is partitioned into 2-dimensional subspaces.
• Attribute values are partitioned into various classes.
• Each element is a two-dimensional space in the form of an xy plot.
• The important attributes should be identified and used on the outer levels.
• Mosaic plot
• A mosaic plot gives a graphical representation of successive decompositions.
• Rectangles are used to represent the counts of categorical data, and at every stage the rectangles are split in parallel.
• Worlds within worlds
• Worlds within worlds is useful for generating an interactive hierarchy of displays.
• The innermost world must have a function and its two most important parameters.
• The remaining parameters are fixed at constant values.
• With the N-Vision system, dynamic interaction is possible through a data glove and stereo displays, including rotation, scaling (inner) and translation (inner/outer).
• Using queries, static interaction is possible.
• Tree maps
• Tree map visualization techniques are well suited for displaying large amounts of hierarchically structured data.
• The visualization space is divided into multiple rectangles that are sized and ordered according to a quantitative variable.
• The levels in the hierarchy are seen as rectangles containing other rectangles.
• Each set of rectangles on the same level in the hierarchy represents a category, a column, or an expression in a data set.
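The nesting of rectangles in a tree map can be sketched with a simple slice-and-dice layout, which splits the space in one direction for the top level and in the other direction for the level below. This is only one of several possible tree map layouts, and the two-level hierarchy and its quantities are hypothetical.

```python
# Hypothetical two-level hierarchy: category -> item -> quantity.
hierarchy = {
    "fruit": {"apple": 30, "pear": 10},
    "dairy": {"milk": 45, "cheese": 15},
}


def slice_layout(x, y, width, height, sizes, horizontal):
    """Split a rectangle into strips whose areas are proportional to sizes."""
    total = sum(sizes)
    rects, offset = [], 0.0
    for size in sizes:
        if horizontal:
            w = width * size / total
            rects.append((x + offset, y, w, height))
            offset += w
        else:
            h = height * size / total
            rects.append((x, y + offset, width, h))
            offset += h
    return rects


def treemap(tree, width, height):
    """Return {name: (x, y, width, height)} for categories and their items."""
    layout = {}
    totals = [sum(children.values()) for children in tree.values()]
    parents = slice_layout(0, 0, width, height, totals, horizontal=True)
    for (name, children), (px, py, pw, ph) in zip(tree.items(), parents):
        layout[name] = (px, py, pw, ph)
        # Children are laid out inside the parent's rectangle, split the other way.
        child_rects = slice_layout(px, py, pw, ph, list(children.values()), horizontal=False)
        for child, rect in zip(children, child_rects):
            layout[child] = rect
    return layout


layout = treemap(hierarchy, 100, 100)
for name, rect in layout.items():
    print(name, rect)
```

Each item's rectangle lies inside its category's rectangle, and its area is proportional to its quantity.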
5. Visualizing complex data and relations
This technique is used to visualize non-numeric data.

For example: text, pictures, blog entries, and product reviews.

• A tag cloud is a visualization method that helps in understanding the information in user-generated tags.
• The tags can be arranged alphabetically or according to user preferences, with different font sizes and colors.
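A tag cloud can be sketched in a few lines. The snippet below assumes the common convention that a tag's font size grows with how often it is used; the tag list and the 10 to 30 point size range are hypothetical, and the tags are arranged alphabetically.

```python
from collections import Counter

# Hypothetical user-generated tags.
tags = ["python", "data", "python", "mining", "data", "python"]
MIN_PT, MAX_PT = 10, 30  # assumed font size range in points


def tag_cloud(tag_list, min_pt, max_pt):
    """Return {tag: font_size}, alphabetically ordered, sized by frequency."""
    counts = Counter(tag_list)
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for tag in sorted(counts):  # alphabetical arrangement
        if hi == lo:
            sizes[tag] = min_pt  # all tags equally frequent
        else:
            sizes[tag] = min_pt + (max_pt - min_pt) * (counts[tag] - lo) // (hi - lo)
    return sizes


cloud = tag_cloud(tags, MIN_PT, MAX_PT)
for tag, size in cloud.items():
    print(f"{tag}: {size}pt")
```

Sorting by a user preference instead of alphabetically would only change the `sorted` call.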
PRE-PROCESSING OF THE DATA

• The process of transforming data into information by using different methods such as classifying, sorting, merging, recording, retrieving, and transmitting is called data processing. Data processing can be performed manually or automatically.
• Data pre-processing is required if the data is incomplete (attributes or attribute values are missing), noisy (the data contains errors), unreliable, or irrelevant.
THE DATA PRE-PROCESSING INVOLVES SEVERAL
OPERATIONS SUCH AS CLEANING, INTEGRATION,
REDUCTION, TRANSFORMATION AND DISCRETIZATION.

1. Data cleaning
Data cleaning is also known as scrubbing or cleansing.
The different steps involved in data cleaning are:
i. Parsing
ii. Correcting
iii. Standardizing
iv. Matching
v. Consolidating
The data cleansing process detects and removes errors and inconsistencies to improve the quality of the data.
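The five steps can be sketched on a toy example. The interpretation of each step here is an assumption for illustration (parsing splits a raw line into fields, correcting fixes known bad values, standardizing applies one format, matching finds records that refer to the same entity, consolidating merges them); the input lines and the `CITY_FIXES` lookup table are hypothetical.

```python
# Hypothetical raw records in the form "name ; city".
raw = ["  alice SMITH ; NY ", "Alice Smith;new york", "bob jones; LA"]

# Hypothetical lookup table used to correct city values.
CITY_FIXES = {"ny": "New York", "new york": "New York", "la": "Los Angeles"}


def clean(lines):
    """Apply the five cleaning steps and return the consolidated records."""
    consolidated = {}
    for line in lines:
        # i. Parsing: isolate the individual fields.
        name, city = (part.strip() for part in line.split(";"))
        # ii. Correcting: replace known variants with the correct value.
        city = CITY_FIXES.get(city.lower(), city)
        # iii. Standardizing: one spacing and capitalization format for names.
        name = " ".join(name.split()).title()
        # iv. Matching: records with the same key describe the same entity.
        key = (name, city)
        # v. Consolidating: merge matched records into a single record.
        record = consolidated.setdefault(key, {"name": name, "city": city, "sources": 0})
        record["sources"] += 1
    return list(consolidated.values())


cleaned = clean(raw)
for record in cleaned:
    print(record)
```

The first two input lines collapse into one record because they match after correction and standardization.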
2. Data integration
It is the process of collecting and combining data from several available data sources to provide a standardized view of the data to database users.
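A minimal sketch of integration, assuming two hypothetical sources (`crm` and `billing`) that describe the same customers with different field names. The code combines them into one standardized view keyed by customer id; missing values are left as `None`.

```python
# Two hypothetical sources with different field names for the same customers.
crm = [
    {"cust_id": 1, "full_name": "Alice Smith"},
    {"cust_id": 2, "full_name": "Bob Jones"},
]
billing = [
    {"customer": 1, "amount_due": 120.0},
    {"customer": 3, "amount_due": 75.5},
]


def integrate(crm_rows, billing_rows):
    """Combine both sources into one standardized view keyed by customer id."""
    view = {}
    for row in crm_rows:
        view[row["cust_id"]] = {"id": row["cust_id"], "name": row["full_name"], "amount_due": None}
    for row in billing_rows:
        entry = view.setdefault(
            row["customer"], {"id": row["customer"], "name": None, "amount_due": None}
        )
        entry["amount_due"] = row["amount_due"]
    return [view[key] for key in sorted(view)]


unified = integrate(crm, billing)
for row in unified:
    print(row)
```

Users of the unified view see the same three fields for every customer, regardless of which source supplied them.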
3. Data transformation
Data in operational databases keeps changing according to the requirements, so the data warehouse can face the problem of inconsistency while integrating data from multiple data sources. Data transformation converts the data into a consistent format to resolve this.
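4. Data reduction
The amount of data extracted into the data warehouse may be very large, and mining and analyzing such data may be time-consuming. Data reduction produces a smaller representation of the data that yields the same, or nearly the same, analytical results.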
5. Data discretization
Data discretization is used to reduce the size of the data.
In this method, the range of a continuous attribute is divided into intervals, and interval labels replace the actual values.
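One simple way to divide a continuous attribute into intervals is equal-width binning, sketched below. The slides do not name a specific method, so this choice is an assumption, and the ages and the number of intervals are hypothetical.

```python
def equal_width_bins(values, k):
    """Replace each value by the index (0..k-1) of its equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: a single interval
    # The maximum value would land in interval k, so clamp it into the last one.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical continuous attribute.
ages = [18, 22, 35, 47, 51, 60, 78]
labels = equal_width_bins(ages, k=3)
print(labels)
```

Seven distinct ages are reduced to three interval labels, which is the size reduction the method aims for.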
