Statlab Workshop Series 2008 Introduction To Regression/Data Analysis
I. The basics
A. Types of variables
Your variables may take several forms, and later on it will matter that you know and
understand the nature of each one. The following types of variables are those you are
most likely to encounter in your research.
• Categorical variables
Dummy variables take only two possible values, 0 and 1. They signify
conceptual opposites: war vs. peace, fixed exchange rate vs. floating
exchange rate, etc.
Nominal variables can take any number of values, usually coded as
non-negative integers. They signify conceptual categories that have no
inherent order relative to one another: red vs. green vs. black, Christian
vs. Jewish vs. Muslim, etc.
Ordinal variables are like nominal variables, except that their categories
have a natural order: no vs. maybe vs. yes, etc.
• Numerical variables
Such variables describe data that can be readily quantified. As with categorical
variables, numerical variables fall into a few relevant subclasses, most importantly
discrete (e.g., counts) and continuous.
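To make the categorical types concrete, here is a minimal Python sketch of how each might be coded as numbers before analysis. The example values and the helper names (`to_dummy`, `to_nominal_codes`, `to_ordinal`) are hypothetical illustrations, not part of any statistics package. The key idea is that nominal codes are arbitrary labels. Ordinal codes, by contrast, must follow an ordering that you supply.

```python
from typing import Dict, List, Sequence

# Hypothetical example data (not from the workshop materials).
regime = ["war", "peace", "peace", "war"]                  # two categories -> dummy
religion = ["Christian", "Jewish", "Muslim", "Christian"]  # nominal: no order
answer = ["no", "yes", "maybe", "no"]                      # ordinal: no < maybe < yes


def to_dummy(values: Sequence[str], one: str) -> List[int]:
    """Code a two-category variable as 0/1, mapping the category `one` to 1."""
    return [1 if v == one else 0 for v in values]


def to_nominal_codes(values: Sequence[str]) -> Dict[str, int]:
    """Assign non-negative integer codes in order of first appearance.

    The codes are labels only: 2 is not 'more' than 1 for a nominal variable.
    """
    codes: Dict[str, int] = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return codes


def to_ordinal(values: Sequence[str], levels: Sequence[str]) -> List[int]:
    """Map each value to its rank in the user-supplied ordering `levels`."""
    rank = {level: i for i, level in enumerate(levels)}
    return [rank[v] for v in values]


if __name__ == "__main__":
    print(to_dummy(regime, "war"))
    codes = to_nominal_codes(religion)
    print(codes, [codes[v] for v in religion])
    print(to_ordinal(answer, ["no", "maybe", "yes"]))
```

The ordering passed to `to_ordinal` is a substantive choice you make about the data. With nominal codes, no ordering exists, so arithmetic on them (averages, "higher" vs. "lower") has no meaning.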
A useful starting point is to get a handle on your variables. How many are there? Are
they qualitative or quantitative? If they are quantitative, are they discrete or continuous?
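The questions above can be turned into a rough first-pass check. The following Python sketch labels each column of a small dataset. It assumes a hypothetical dataset stored as a dict of lists, and it uses a simple heuristic: non-numeric values mean qualitative, and whole-number values mean discrete. The function `describe_variable` is illustrative only, and its output is a prompt to look more closely, not a verdict.

```python
from numbers import Real
from typing import Dict, List, Sequence


def describe_variable(values: Sequence[object]) -> str:
    """Heuristic label for one variable: qualitative vs. quantitative,
    and for quantitative data, discrete vs. continuous."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    if not numeric:
        return "qualitative"
    # Whole-number values suggest (but do not prove) a discrete variable.
    if all(float(v).is_integer() for v in values):
        return "quantitative, discrete"
    return "quantitative, continuous"


# Hypothetical dataset: one list per variable.
data: Dict[str, List[object]] = {
    "country": ["A", "B", "C"],
    "n_wars": [0, 2, 1],
    "gdp_growth": [2.5, -0.3, 1.8],
}

if __name__ == "__main__":
    print(f"{len(data)} variables")
    for name, column in data.items():
        print(f"{name}: {describe_variable(column)}")
```

A caveat is that a nominal variable stored as integer codes (say, religion coded 0, 1, 2) would be labeled "quantitative, discrete" here. This is exactly why you need to know what your variables mean, not just how they are stored.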
Another useful practice is to explore how your data are distributed. Do your variables all
cluster around the same value, or do you have a large amount of variation in your
variables? Are they normally distributed? Plots are extremely useful at this introductory
stage of data analysis – histograms for single variables, scatter plots for pairs of
continuous variables, or box-and-whisker plots for a continuous variable vs. a categorical
variable. This preliminary data analysis will help you decide upon the appropriate tool
for your data.
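Before reaching for a plotting package, you can get a feel for one numerical variable's distribution with Python's standard `statistics` module and a crude text histogram. This is a minimal sketch. The data values are made up for illustration, and `summarize` and `text_histogram` are hypothetical helper names. A real analysis would normally use proper histograms, scatter plots, or box plots in your statistics software.

```python
import statistics
from typing import Dict, List, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Center and spread of one numerical variable (needs at least 2 values)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def text_histogram(values: Sequence[float], bins: int = 5) -> List[str]:
    """Equal-width bins, one '#' per observation, labeled by each bin's lower edge."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return [f"{lo + i * width:8.2f} | {'#' * c}" for i, c in enumerate(counts)]


# Hypothetical data: annual GDP growth (%) for ten countries.
growth = [2.5, -0.3, 1.8, 3.1, 2.2, 0.9, 2.7, 1.5, 2.0, 8.4]

if __name__ == "__main__":
    for key, value in summarize(growth).items():
        print(f"{key:>6}: {value:.2f}")
    print("\n".join(text_histogram(growth)))
```

In this made-up example, the single large value (8.4) pulls the mean above the median, and the histogram shows it sitting far from the rest of the data. Spotting such values early is the kind of thing this preliminary stage is for.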
http://www.yale.edu/statlab 2