Knowledge Representation in Data Mining
• Knowledge representation is the presentation of
knowledge to the user for visualization in the form of trees,
tables, rules, graphs, charts, matrices, etc.
• For example: histograms.
Example: a histogram of an electricity bill generated for
4 months, as shown in the diagram given below.
Histograms
A histogram represents the distribution of values of
a single attribute.
It consists of a set of rectangles that reflect the counts or
frequencies of the classes present in the given data.
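A minimal sketch of how the rectangle heights of a histogram can be computed, assuming equal-width classes (bins) over a single numeric attribute. The function names `histogram` and `render` and the sample values are hypothetical, chosen only for illustration.

```python
def histogram(values, num_bins):
    """Return the count of values falling into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    # Guard against a zero bin width when every value is identical.
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


def render(counts):
    """Draw each bin as a horizontal bar of '#' characters, one per counted value."""
    return "\n".join(f"bin {i + 1} | {'#' * c}" for i, c in enumerate(counts))


readings = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]  # hypothetical attribute values
print(histogram(readings, 4))  # [1, 2, 3, 4]
print(render(histogram(readings, 4)))
```

Each count becomes the height (here, the length) of one rectangle, which is exactly what the histogram displays.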
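DATA VISUALIZATION
2. Geometric projection visualization techniques
-Used to find interesting projections of multidimensional data sets.
Some of the techniques are:
• Scatter-plot matrices
• Hyperslice
• Parallel coordinates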
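As an illustration of one geometric projection technique, the sketch below computes the vertices of a parallel-coordinates plot: each attribute gets its own vertical axis and each record becomes a polyline across those axes. Scaling every axis to [0, 1] is an assumption made here so that attributes with different ranges fit on axes of equal height; the function name is hypothetical.

```python
def parallel_coordinates(records):
    """Map each record to a polyline: one (axis_index, height) vertex per attribute.

    Values are min-max scaled to [0, 1] per axis so that attributes with
    different ranges are drawn on axes of the same height.
    """
    num_attrs = len(records[0])
    mins = [min(r[i] for r in records) for i in range(num_attrs)]
    maxs = [max(r[i] for r in records) for i in range(num_attrs)]
    polylines = []
    for r in records:
        line = []
        for i, v in enumerate(r):
            span = maxs[i] - mins[i]
            # An attribute with no spread is placed in the middle of its axis.
            height = (v - mins[i]) / span if span else 0.5
            line.append((i, height))
        polylines.append(line)
    return polylines


print(parallel_coordinates([(1, 10, 5), (3, 20, 5)]))
```

Joining the vertices of each polyline with straight segments gives the familiar parallel-coordinates picture, where similar records produce similar line shapes.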
3. Icon-based visualization techniques
-The most commonly used technique is Chernoff faces.
• Chernoff faces
• This concept was introduced by Herman Chernoff in 1973.
• Chernoff faces rely on the human ability to read facial expressions and
features, so differences between the faces are easy to identify.
• Each data dimension is mapped to a different facial feature.
For example: the face width, the length of the mouth and
the length of the nose, etc., as shown in the following diagram.
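A minimal sketch of the mapping step, assuming three data dimensions and the three features named above. The drawing ranges in `FEATURE_RANGES` and the function name are hypothetical; a real Chernoff-face renderer would use more features and then draw the resulting shape.

```python
# Hypothetical drawing ranges (arbitrary units) for each facial feature.
FEATURE_RANGES = {
    "face_width": (40.0, 100.0),
    "mouth_length": (10.0, 50.0),
    "nose_length": (5.0, 30.0),
}


def chernoff_parameters(record, mins, maxs):
    """Map dimension i of `record` onto the i-th facial feature in FEATURE_RANGES.

    Each value is first scaled to [0, 1] using the dimension's min and max,
    then stretched into that feature's drawing range.
    """
    params = {}
    for (feature, (lo, hi)), v, vmin, vmax in zip(FEATURE_RANGES.items(), record, mins, maxs):
        t = (v - vmin) / (vmax - vmin) if vmax > vmin else 0.5
        params[feature] = lo + t * (hi - lo)
    return params


print(chernoff_parameters((0, 5, 10), mins=(0, 0, 0), maxs=(10, 10, 10)))
```

Two records that differ mainly in their third dimension would then be drawn as faces that differ mainly in nose length.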
4. Hierarchical visualization techniques
-Hierarchical visualization techniques are used for partitioning all
dimensions into subsets.
Some of the visualization techniques are:
• Dimensional stacking
• In dimensional stacking, the n-dimensional attribute space is partitioned into
2-dimensional subspaces.
• Attribute values are partitioned into various classes.
• Each element is a two-dimensional space in the form of an x-y plot.
• The important attributes are marked and used on the outer levels.
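A minimal sketch of dimensional stacking for four attributes, assuming each attribute has already been partitioned into classes numbered 0, 1, 2, and so on. The outer (important) pair of attributes selects a cell of a coarse grid, and the inner pair selects an x-y position inside that cell. The function name and tuple layout are hypothetical.

```python
def stack_position(point, inner_sizes):
    """Place a 4-D point of class indices on one 2-D grid by dimensional stacking.

    point = (outer_x, outer_y, inner_x, inner_y), each a class index.
    inner_sizes = (number of inner_x classes, number of inner_y classes).
    """
    outer_x, outer_y, inner_x, inner_y = point
    n_inner_x, n_inner_y = inner_sizes
    # The outer attributes pick the coarse cell; each cell is an
    # n_inner_x by n_inner_y x-y plot of the inner attributes.
    x = outer_x * n_inner_x + inner_x
    y = outer_y * n_inner_y + inner_y
    return x, y


print(stack_position((1, 0, 2, 1), inner_sizes=(3, 2)))  # (5, 1)
```

Putting the important attributes on the outer level means they determine the large, easily distinguished blocks of the display.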
• Mosaic plot
• A mosaic plot gives a graphical representation of successive
decompositions.
• Rectangles represent the counts of categorical data; at every stage,
each rectangle is split parallel to one of the axes, alternating between vertical and horizontal splits.
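A minimal sketch of a two-stage mosaic plot, assuming two categorical variables A and B given as a table of counts. Stage 1 splits the unit square into vertical strips by the totals of A; stage 2 splits each strip horizontally by B. The function name and the sample counts are hypothetical.

```python
def mosaic(counts):
    """Return {(a, b): (x, y, width, height)} for a two-variable mosaic plot.

    counts maps (a, b) -> frequency. Each rectangle's area equals that
    cell's share of the total count.
    """
    total = sum(counts.values())
    a_values = sorted({a for a, _ in counts})
    b_values = sorted({b for _, b in counts})
    rects = {}
    x = 0.0
    for a in a_values:
        # Stage 1: a vertical strip whose width is A's share of the total.
        a_total = sum(counts.get((a, b), 0) for b in b_values)
        width = a_total / total
        y = 0.0
        for b in b_values:
            # Stage 2: split the strip horizontally by B's share within this A value.
            height = counts.get((a, b), 0) / a_total if a_total else 0.0
            rects[(a, b)] = (x, y, width, height)
            y += height
        x += width
    return rects


# Hypothetical counts of (gender, answer) pairs.
print(mosaic({("m", "yes"): 30, ("m", "no"): 10, ("f", "yes"): 20, ("f", "no"): 40}))
```

Adding a third variable would split each rectangle vertically again, continuing the alternation.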
• Worlds within worlds
• Worlds within worlds are useful for generating an interactive hierarchy of
displays.
• The innermost world must contain a function and its two most important parameters.
• The remaining parameters are fixed at constant values.
• The n-Vision system uses this approach, with dynamic interaction through a data glove and stereo
displays, including rotation, scaling (inner) and translation (inner/outer).
• Static interaction is possible using queries.
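A minimal sketch of what the innermost world shows: the function evaluated over its two most important parameters while every other parameter is held at a constant chosen in the outer worlds. The parameter names `x`, `y`, `z`, `w` and the example function are hypothetical; moving the origin of the inner world within an outer world corresponds to changing the values in `fixed`.

```python
def inner_world(f, fixed, x_values, y_values):
    """Evaluate f over the inner world's two axes with the other parameters fixed.

    Returns a grid where grid[i][j] = f(x=x_values[j], y=y_values[i], **fixed).
    """
    return [[f(x=x, y=y, **fixed) for x in x_values] for y in y_values]


def f(x, y, z, w):
    """Hypothetical 4-parameter function; x and y are assumed most important."""
    return x + y * z - w


# The outer worlds fix z = 2 and w = 1; the inner world varies x and y.
print(inner_world(f, {"z": 2, "w": 1}, x_values=[0, 1], y_values=[0, 1]))  # [[-1, 0], [1, 2]]
```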
• Tree maps
• Tree-map visualization techniques are well suited for displaying large
amounts of hierarchically structured data.
• The visualization space is divided into multiple rectangles that are
ordered according to a quantitative variable.
• The levels in the hierarchy are seen as rectangles containing other
rectangles.
• Each set of rectangles on the same level in the hierarchy represents a
category, a column or an expression in a data set.
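A minimal sketch of one simple tree-map layout, the slice-and-dice scheme, which is an assumption here since the text does not name a specific layout algorithm. Each node's rectangle is nested inside its parent's, children are ordered by size, and the split direction alternates with depth. Node format and function names are hypothetical.

```python
def size(node):
    """Quantity of a node: a leaf's own value, or the sum over its children."""
    _, content = node
    if isinstance(content, list):
        return sum(size(child) for child in content)
    return content


def treemap(node, x=0.0, y=0.0, w=1.0, h=1.0, depth=0, out=None):
    """Slice-and-dice tree-map layout inside the rectangle (x, y, w, h).

    node is (name, value) for a leaf or (name, [children]) for an inner node.
    Returns a list of (name, (x, y, w, h)) for every node in the hierarchy.
    """
    if out is None:
        out = []
    name, content = node
    out.append((name, (x, y, w, h)))
    if isinstance(content, list):
        total = size(node)
        offset = 0.0
        # Order children by the quantitative variable, largest first.
        for child in sorted(content, key=size, reverse=True):
            frac = size(child) / total if total else 0.0
            if depth % 2 == 0:  # even depth: vertical slices
                treemap(child, x + offset * w, y, w * frac, h, depth + 1, out)
            else:  # odd depth: horizontal slices
                treemap(child, x, y + offset * h, w, h * frac, depth + 1, out)
            offset += frac
    return out


tree = ("root", [("a", 1), ("b", 3)])
print(dict(treemap(tree)))
```

Because every child rectangle lies inside its parent's rectangle, the hierarchy levels appear as rectangles containing other rectangles.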
5. Visualizing complex data and relations
-These techniques are used to visualize non-numeric data.
DATA PREPROCESSING
1. Data cleaning
Data cleaning is also known as scrubbing or cleansing.
The different steps involved in data cleaning are:
i. Parsing
ii. Correcting
iii. Standardizing
iv. Matching
v. Consolidating
The data cleansing process detects and removes errors and inconsistencies to improve the quality of the
data.
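A minimal sketch that walks through the five cleaning steps on raw "name, city" lines. The corrections table, the rule that records with the same name refer to the same entity, and the rule of keeping the most common city when consolidating are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical lookup table of known misspellings (lower-cased) -> corrected value.
CORRECTIONS = {"mumbay": "mumbai", "bombay": "mumbai"}


def clean(raw_lines):
    """Run the five cleaning steps on raw 'name, city' lines; return {name: city}."""
    # i. Parsing: split each raw line into its two fields.
    records = [tuple(f.strip() for f in line.split(",", 1)) for line in raw_lines]
    # ii. Correcting: replace values found in the corrections table.
    records = [(n, CORRECTIONS.get(c.lower(), c)) for n, c in records]
    # iii. Standardizing: bring every field to one consistent letter case.
    records = [(n.title(), c.title()) for n, c in records]
    # iv. Matching: group records that refer to the same entity (same name here).
    groups = {}
    for n, c in records:
        groups.setdefault(n, []).append(c)
    # v. Consolidating: merge each group into one record, keeping the most common city.
    return {n: Counter(cities).most_common(1)[0][0] for n, cities in groups.items()}


raw = ["asha rao, Mumbay", "Asha Rao ,mumbai", "Ravi Kumar, Pune"]
print(clean(raw))  # {'Asha Rao': 'Mumbai', 'Ravi Kumar': 'Pune'}
```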
2. Data integration
It is the process of collecting and combining data from the several
available data sources to provide a standardized view of the data to
database users.
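A minimal sketch of integration, assuming two hypothetical sources that name the same fields differently. A schema mapping renames each source's fields to one standard set, and records describing the same customer are combined on a shared key. Source names, field names and `SCHEMA_MAP` are all hypothetical.

```python
# Hypothetical schema mapping: source-specific field name -> standardized field name.
SCHEMA_MAP = {
    "sales_db": {"cust_id": "customer_id", "amt": "amount"},
    "crm_db": {"CustomerID": "customer_id", "Region": "region"},
}


def integrate(sources):
    """Combine records from several sources into one view keyed by customer_id."""
    unified = {}
    for source_name, records in sources.items():
        mapping = SCHEMA_MAP[source_name]
        for rec in records:
            # Rename fields to the standard schema, dropping unmapped ones.
            std = {mapping[k]: v for k, v in rec.items() if k in mapping}
            # Merge into any record already collected for the same customer.
            unified.setdefault(std["customer_id"], {}).update(std)
    return unified


sources = {
    "sales_db": [{"cust_id": 1, "amt": 250}],
    "crm_db": [{"CustomerID": 1, "Region": "North"}, {"CustomerID": 2, "Region": "South"}],
}
print(integrate(sources))
```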
3. Data Transformation
Data in operational databases keeps changing according to the
requirements, so the data warehouse can face inconsistency problems
when integrating data from multiple data sources. Data transformation
converts the data from these sources into a consistent form suitable for mining.
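One common kind of inconsistency is the same attribute stored in different formats by different sources. The sketch below, assuming two hypothetical branches with their own date formats, transforms every date into one ISO 8601 form using the standard `datetime.strptime` call.

```python
from datetime import datetime

# Hypothetical: each source stores dates in its own format.
SOURCE_DATE_FORMATS = {"branch_a": "%d/%m/%Y", "branch_b": "%m-%d-%Y"}


def to_iso_date(value, source):
    """Transform a source-specific date string into a consistent ISO 8601 date."""
    return datetime.strptime(value, SOURCE_DATE_FORMATS[source]).date().isoformat()


print(to_iso_date("05/03/2024", "branch_a"))  # 2024-03-05
print(to_iso_date("03-05-2024", "branch_b"))  # 2024-03-05
```

After the transformation, the two records that looked different are recognized as the same date.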
4. Data reduction
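The amount of data extracted into the data warehouse may be very
large, and mining and analyzing such data may be time-consuming. Data reduction
obtains a reduced representation of the data that is much smaller in volume
yet produces the same (or almost the same) analytical results.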
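Simple random sampling without replacement is one widely used data reduction technique; the text does not name a specific method, so this choice is an assumption. The sketch keeps roughly a given fraction of the rows using the standard `random.Random.sample` call; the function name is hypothetical.

```python
import random


def sample_rows(rows, fraction, seed=None):
    """Simple random sample without replacement keeping about `fraction` of the rows."""
    if not rows:
        return []
    k = max(1, round(len(rows) * fraction))
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(rows, k)


rows = list(range(1000))  # stand-in for warehouse records
reduced = sample_rows(rows, 0.1, seed=42)
print(len(reduced))  # 100
```

Mining the 100-row sample instead of all 1000 rows is cheaper, at the cost of some sampling error in the results.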
5. Data Discretization
The data discretization method is used to reduce the size of the data.
In this method, a continuous attribute is divided into intervals, and each value is replaced by the label of its interval.
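A minimal sketch of discretization, assuming equal-width intervals (one of several possible ways to choose the intervals). Each continuous value is replaced by a label for the interval it falls in; the last interval is closed so that the maximum value is included. The function name is hypothetical.

```python
def discretize(values, num_intervals):
    """Equal-width discretization: replace each value with its interval label."""
    lo, hi = min(values), max(values)
    # Guard against a zero width when every value is identical.
    width = (hi - lo) / num_intervals or 1.0
    edges = [lo + i * width for i in range(num_intervals + 1)]
    labels = [
        f"[{edges[i]:g}, {edges[i + 1]:g}" + (")" if i < num_intervals - 1 else "]")
        for i in range(num_intervals)
    ]
    result = []
    for v in values:
        # Clamp the maximum value into the last (closed) interval.
        idx = min(int((v - lo) / width), num_intervals - 1)
        result.append(labels[idx])
    return result


print(discretize([0, 5, 10, 20], 2))  # ['[0, 10)', '[0, 10)', '[10, 20]', '[10, 20]']
```

Four distinct numbers collapse to two interval labels, which is how discretization reduces the size of the data.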