
2/ Organizing and Visualizing Variables: Dcova

Data visualization organizes and represents data in a way that allows trends and relationships to be easily perceived, in order to better understand information and drive business decision making. Various methods are described for organizing numerical and categorical data into frequency distributions, contingency tables, scatter plots, histograms and other visual formats to analyze and summarize the data. Care must be taken to present data in a clear way that does not obscure trends or create false impressions.

Uploaded by

Thong Phan
Copyright
© All Rights Reserved

2/ Organizing and Visualizing Variables

Data visualization is about organizing and representing data in a way that lets us perceive
trends and relationships more easily and communicate otherwise complex information
effectively. We do this to understand information better and, ultimately, to drive better
business decision making.

DCOVA

To apply statistics properly, you should follow a framework that minimizes possible
errors:
•   Define the data you want to study in order to solve a problem or meet an objective (e.g.
study sales data in order to solve the problem of advertising expenditure).
•   Collect the data from appropriate sources.
•   Organize the data collected by developing tables.
•   Visualize the data by developing figures/charts.
•   Analyze the data collected to reach conclusions and present results.

A summary table tallies the frequencies or percentages of items in a set of categories so that
you can see differences between categories.
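
As a minimal sketch of a summary table, the tally below counts how often each category occurs and converts the counts to percentages. The `responses` list and the `summary_table` helper are hypothetical, made up for illustration only.

```python
from collections import Counter

# Hypothetical responses for one categorical variable (illustrative data only).
responses = ["Online", "In store", "Online", "Phone", "Online", "In store"]

def summary_table(values):
    """Return (category, frequency, percentage) rows, largest frequency first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

rows = summary_table(responses)
for cat, n, pct in rows:
    print(f"{cat:<10}{n:>4}{pct:>8.1f}%")
```

Listing the categories by descending frequency is one reasonable choice; any fixed category order would serve equally well.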
A Contingency Table Helps Organize Two or More Categorical Variables
•   Used to study patterns that may exist between the responses of two or more
categorical variables.
•   Cross-tabulates, or jointly tallies, the responses of the categorical variables.
•   For two variables, the tallies for one variable are located in the rows and the tallies
for the second variable are located in the columns.
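
The cross-tabulation described above can be sketched in a few lines. The `records` data and the `contingency_table` helper are hypothetical: the first variable goes in the rows and the second in the columns.

```python
from collections import Counter

# Hypothetical paired responses: (gender, payment method); illustrative data only.
records = [
    ("Female", "Card"), ("Male", "Cash"), ("Female", "Card"),
    ("Male", "Card"), ("Female", "Cash"), ("Male", "Cash"),
]

def contingency_table(pairs):
    """Cross-tabulate pairs into {row_category: {column_category: count}}."""
    joint = Counter(pairs)                      # joint tally of both variables
    row_cats = sorted({r for r, _ in pairs})
    col_cats = sorted({c for _, c in pairs})
    return {r: {c: joint[(r, c)] for c in col_cats} for r in row_cats}

table = contingency_table(records)
cols = list(next(iter(table.values())))
print(" " * 8 + "".join(f"{c:>6}" for c in cols))
for row, counts in table.items():
    print(f"{row:<8}" + "".join(f"{counts[c]:>6}" for c in cols))
```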

An ordered array is a sequence of data in rank order, from the smallest value to the largest
value. It shows the range (minimum value to maximum value) and may help identify outliers
(unusual observations).
The frequency distribution is a summary table in which the data are arranged into
numerically ordered classes.
•   You must give attention to selecting the appropriate number of class groupings for the
table, determining a suitable width of a class grouping, and establishing the
boundaries of each class grouping to avoid overlapping.
•   The number of classes depends on the number of values in the data. With a larger
number of values, typically there are more classes. In general, a frequency distribution
should have at least 5 but no more than 15 classes.
•   To determine the width of a class interval, you divide the range (highest value -
lowest value) of the data by the number of class groupings desired.

Organizing Numerical Data: Frequency Distribution Example

•   Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58.
•   Find range: 58 - 12 = 46.
•   Select number of classes: 5 (usually between 5 and 15).
•   Compute class interval (width): 46/5 = 9.2, rounded up to a convenient width of 10.
•   Determine class boundaries (limits):
o   Class 1: 10 but less than 20.
o   Class 2: 20 but less than 30.
o   Class 3: 30 but less than 40.
o   Class 4: 40 but less than 50.
o   Class 5: 50 but less than 60.
•   Compute class midpoints: 15, 25, 35, 45, 55.
•   Count the observations in each class (frequency).
•   Calculate percentage, cumulative frequency, and cumulative percentage.
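
The steps above can be sketched in code using the same 20 values. Two details are assumptions rather than part of the notes: the width is rounded up to the next multiple of 10, and the first lower boundary is the minimum rounded down to a multiple of the width.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

num_classes = 5
data_range = max(data) - min(data)                      # 58 - 12 = 46
# Assumption: round the raw width (46 / 5 = 9.2) up to the next multiple of 10.
width = math.ceil(data_range / num_classes / 10) * 10   # 10
# Assumption: start the first class at a multiple of the width at or below the minimum.
start = (min(data) // width) * width                    # 10

rows = []
cumulative = 0
for i in range(num_classes):
    lower = start + i * width
    upper = lower + width
    # "lower but less than upper": the upper boundary is excluded, so classes do not overlap.
    freq = sum(lower <= x < upper for x in data)
    cumulative += freq
    rows.append({
        "class": f"{lower} but less than {upper}",
        "midpoint": (lower + upper) / 2,
        "frequency": freq,
        "percentage": 100 * freq / len(data),
        "cumulative_frequency": cumulative,
        "cumulative_percentage": 100 * cumulative / len(data),
    })

for r in rows:
    print(f"{r['class']:<22}{r['midpoint']:>6.0f}{r['frequency']:>4}"
          f"{r['percentage']:>7.1f}%{r['cumulative_frequency']:>4}"
          f"{r['cumulative_percentage']:>7.1f}%")
```

For this data set the class frequencies come out as 3, 6, 5, 4 and 2, which sum to the 20 observations.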

Frequency Distribution

A frequency distribution condenses the raw data into a more useful form and allows a quick
visual interpretation of the data. It also makes it possible to determine the major
characteristics of the data set, including where the data are concentrated or clustered.

The bar chart visualizes a categorical variable as a series of bars. The length of each bar
represents either the frequency or percentage of values for each category. Each bar is
separated by a space called a gap.

The pie chart is a circle broken up into slices that represent categories. The size of each slice
of the pie varies according to the percentage in each category.

The doughnut chart is the outer part of a circle broken up into pieces that represent
categories. The size of each piece of the doughnut varies according to the percentage in each
category.

The Pareto Chart

•   Used to portray categorical data (nominal scale).
•   A vertical bar chart, where categories are shown in descending order of frequency.
•   A cumulative polygon is shown in the same graph.
•   Used to separate the “vital few” from the “trivial many.”
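
The numbers behind a Pareto chart can be sketched without any plotting: sort the categories by descending frequency (the bars) and accumulate the percentages (the cumulative polygon). The `complaints` data and the `pareto_table` helper are hypothetical.

```python
from collections import Counter

# Hypothetical complaint categories (illustrative data only).
complaints = (["Late delivery"] * 5 + ["Damaged"] * 3 +
              ["Wrong item"] * 1 + ["Billing"] * 1)

def pareto_table(values):
    """Return (category, frequency, percentage, cumulative percentage) rows."""
    counts = Counter(values).most_common()   # descending order of frequency
    total = sum(n for _, n in counts)
    rows, running = [], 0
    for cat, n in counts:
        running += n
        rows.append((cat, n, 100 * n / total, 100 * running / total))
    return rows

rows = pareto_table(complaints)
for cat, n, pct, cum in rows:
    print(f"{cat:<15}{n:>3}{pct:>7.1f}%{cum:>7.1f}%")
```

In this made-up data the first two categories account for 80% of all complaints, which is the kind of "vital few" pattern the chart is meant to expose.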

The side-by-side bar chart represents the data from a contingency table.

Stem-and-Leaf Display

A stem-and-leaf display is a simple way to see how the data are distributed and where
concentrations of data exist.

METHOD: Separate the sorted data series into leading digits (the stems) and the trailing
digits (the leaves).

A stem-and-leaf display organizes data into groups (called stems) so that the values within
each group (the leaves) branch out to the right on each row.
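
As a minimal sketch of the method, the function below splits each value into a stem (the tens digit and above) and a leaf (the units digit). It assumes non-negative whole numbers; the `stem_and_leaf` name is hypothetical, and the data are the 20 values from the frequency distribution example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (trailing digit)."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting first keeps each row's leaves in order
        stems[v // 10].append(v % 10)
    return dict(stems)

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

display = stem_and_leaf(data)
# Print every stem in the range, including any stem that has no leaves.
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
# Output:
# 1 | 2 3 7
# 2 | 1 4 4 6 7 7
# 3 | 0 2 5 7 8
# 4 | 1 3 4 6
# 5 | 3 8
```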

The Histogram
•   A vertical bar chart of the data in a frequency distribution is called a histogram.
•   In a histogram there are no gaps between adjacent bars.
•   The class boundaries (or class midpoints) are shown on the horizontal axis.
•   The vertical axis is either frequency, relative frequency, or percentage.
•   The height of each bar represents the frequency, relative frequency, or percentage.

The Polygon
•   A percentage polygon is formed by having the midpoint of each class represent
the data in that class and then connecting the sequence of midpoints at their
respective class percentages.
•   The cumulative percentage polygon, or ogive, displays the variable of interest
along the X axis and the cumulative percentages along the Y axis.
•   Polygons are useful when there are two or more groups to compare.
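
The points that each polygon connects can be computed directly. The sketch below reuses the data and class boundaries from the frequency distribution example; `polygon_points` is a hypothetical helper. One assumption is the ogive convention used here: it starts at 0% at the first lower boundary, and each cumulative percentage is plotted at the upper boundary of its class.

```python
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
boundaries = [10, 20, 30, 40, 50, 60]   # class boundaries from the worked example

def polygon_points(values, edges):
    """Return (percentage polygon points, ogive points) as lists of (x, y)."""
    n = len(values)
    percentage = []
    ogive = [(edges[0], 0.0)]            # 0% of values lie below the first boundary
    cumulative = 0
    for lower, upper in zip(edges, edges[1:]):
        freq = sum(lower <= x < upper for x in values)
        cumulative += freq
        percentage.append(((lower + upper) / 2, 100 * freq / n))  # plotted at midpoint
        ogive.append((upper, 100 * cumulative / n))               # plotted at boundary
    return percentage, ogive

percentage, ogive = polygon_points(data, boundaries)
print(percentage)
print(ogive)
```

To compare two groups, the same function could be run on each group with shared boundaries and the resulting point sequences drawn on one set of axes.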

Variables: The Scatter Plot

•   Scatter plots are used for numerical data consisting of paired observations taken from two
numerical variables.
•   One variable is measured on the vertical axis and the other variable is measured on the
horizontal axis.
•   Scatter plots are used to examine possible relationships between two numerical
variables.

Variables: The Time Series Plot

•   A time-series plot is used to study patterns in the values of a numeric variable over time.
•   In a time-series plot, the numeric variable is measured on the vertical axis and the time
period is measured on the horizontal axis.

Visualize Many Variables

A pivot table:
•   Summarizes variables as a multidimensional summary table.
•   Allows interactive changing of the level of summarization and formatting of the
variables.
•   Allows you to interactively “slice” your data to summarize subsets of data that
meet specified criteria.
•   Can be used to discover possible patterns and relationships in multidimensional
data that simpler tables and charts would fail to make apparent.
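
Pivot tables are usually built interactively in spreadsheet software, but the underlying idea can be sketched in code: choose which variables form the rows and columns, sum a value in each cell, and optionally "slice" to a subset first. The `sales` records and the `pivot` helper below are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, quarter, amount).
sales = [
    ("North", "A", "Q1", 100), ("North", "B", "Q1", 150),
    ("South", "A", "Q1", 200), ("South", "A", "Q2", 50),
    ("North", "A", "Q2", 120), ("South", "B", "Q2", 80),
]

def pivot(records, row, col, value, keep=lambda rec: True):
    """Sum field `value` by fields (`row`, `col`) over the records passing `keep`."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        if keep(rec):                       # the "slice": skip records outside the subset
            table[rec[row]][rec[col]] += rec[value]
    return {r: dict(cols) for r, cols in table.items()}

# Region by product, all quarters.
by_region_product = pivot(sales, row=0, col=1, value=3)
# Same layout, sliced to the first quarter only.
q1_only = pivot(sales, row=0, col=1, value=3, keep=lambda rec: rec[2] == "Q1")
# Change the level of summarization: region by quarter instead.
by_region_quarter = pivot(sales, row=0, col=2, value=3)

print(by_region_product)
print(q1_only)
print(by_region_quarter)
```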
When organizing and visualizing data, you need to be mindful of:
•   The limits of others’ ability to perceive and comprehend.
•   Presentation issues that can undercut the usefulness of the methods from this
chapter.

It is easy to create summaries that:
•   Obscure the data, or
•   Create false impressions.
