
Guidelines for Real Estate Research and Case Study Analysis

Book · January 2016



2 authors:

Ranthilaka Gedara Ariyawansa, University of Sri Jayewardenepura

Terans Gunawardhana, RMIT University


All content following this page was uploaded by Ranthilaka Gedara Ariyawansa on 30 June 2016.


Guidelines for Real Estate Research and Case Study Analysis

Published by the Department of Estate Management and Valuation,

University of Sri Jayewardenepura

ISBN 978 955 4908 42 0

R. G Ariyawansa
Department of Estate Management and Valuation
Faculty of Management Studies and Commerce
University of Sri Jayewardenepura
Sri Lanka
135/60 A, Jaya Mawatha,
Neelammahara Road,

W.H.T. Gunawardhana
Department of Estate Management and Valuation
Faculty of Management Studies and Commerce
University of Sri Jayewardenepura
Sri Lanka
Email : terans@sjp.ac.lk
TP : +94713238188


Contents

Chapter One: Introduction to Real Estate

 Land
 Real Estate
 Real Property
 Types of Properties
 Real Estate Development
 Real Estate Management
 Real Estate Valuation

Chapter Two: Real Estate Research and Case Study Analysis

 Beginning of a Research
 What is a Research?
 Types of Research
 Real Estate Research…!
 Researcher
 The Research Process

Chapter Three: Real Estate Analysis - Quantitative Approach

 Hypothesis Testing
 Selection of an appropriate statistical test

Chapter Four: Real Estate Analysis - Validity and Reliability

 Validity of Research Outcomes
 Reliability of Research Outcomes

 Bibliography
 Glossary

Purpose of the book
Real Estate Development and Management (REDM) is
multidisciplinary in nature, so its scope is wide and complex,
and it relates to several other subjects. Decisions taken by a
supplier, buyer, investor or policy maker are crucial and
expensive, and changing them later is sometimes impossible or
very costly. Scientific analysis is therefore a fundamental
requirement for sound decision making. Scientific research
also supports innovation in the industry.

Hence, this book attempts to provide some basic guidelines for
real estate research and case analysis. It briefly describes
some fundamentals of real estate, real estate research and
case analysis. Comprehensive analysis, however, also requires
sufficient knowledge of related subjects such as Real Estate
Development and Management, Marketing Management, General
Management, Micro and Macro Economics, Valuation, Law and
Planning, and research methods. The following are some of the
fundamental areas that should be understood by developers,
managers and related professionals such as planners and valuers.
 Principles of Real Estate Development and
 Classification and Identification of Different Properties.

 General Management (aspects including four functional
areas i.e. Financial Management, Human Resources
Management, Production and Marketing Management).
 Strategic Management.
 Role of different stakeholders i.e. developer, investors,
managers etc.
 Analysis of the highest and best use of lands and properties.
 Cost Benefit Analysis.
 Cash flow, Net Present Value, Internal Rate of Return
etc with regard to real estate investment projects.
 Investment appraisal and portfolio management.
 Real Estate Development process, real estate life cycle.
 Property and the macro-economy, globalization of
 Obsolescence of properties and remedies.
 Different costs and values in real estates.
 Environmental effects on Real Estate and vice versa.
 Building facilities & service management.
 Management of Condominiums.
 Ownerships, tenure types, different interest and legal
aspects of real estates.
 Real estate management in non-real estate firms i.e.
Corporate Real Estate Asset Management (CREAM).
 Review of real estate industry and public policy (real
estate sector/real estate markets etc).
 Real estate research and consultations.

Chapter 01
Introduction to Real Estate

Some Basic Concepts

Several basic terms need to be clarified first in the field of
real estate. They are:
- Land
- Real Estate
- Real Property
- Types of Properties
- Real Estate Development
- Real Estate Management
- Real Estate Valuation

Land: Land is defined as the earth’s surface extending
downwards to the centre of the earth and upwards to infinity,
including things permanently attached by nature such as trees
and water. The term “land” thus refers not only to the surface
of the land but also to the underlying soil and things that are
naturally attached to it, such as rocks and plants. Land
includes the minerals and substances far below the earth’s
surface. It also includes the air above the land up into space.
Land therefore consists of three layers, known respectively as
the “surface”, the “subsurface” and the “airspace”.

Land is one of the basic factors of production. Primary
production takes place on rural (non-urban) land, whereas
other functions are associated with urban locations. However,
with the development of infrastructure facilities, it is
difficult to demarcate urban and non-urban uses clearly. In
this book, more attention is given to urban property
development and management.

Real Estate: Real estate is defined as the land above and below
the earth’s surface, including all things that are permanently
attached to it, either natural or artificial. Therefore, the term
“real estate” is broader than the term “land”. It includes not
only the natural components of the land but also all artificial,
immovable improvements made by people.

Any artificial thing that is attached to the land, such as a
building, a structure or a fence, is considered part of real
estate. Land is also converted into real estate when it is
improved by providing access, utilities, sewerage systems and
other services that make it suitable for habitable buildings.
Such land is also called serviced land, improved land or
developed land. Such parcels are called real estate since they
have been reshaped from their natural state.

At the same time, it is clear that land becomes usable when
it is converted into real estate. That is, when land becomes
real estate, it is usable for planned activities. It can
therefore be argued that the “usability of the land” is a more
apparent and logical criterion for recognizing real estate, and
that land becomes real estate when people start to use it.

Real Property: The word “property” has different meanings in
legal terms¹. In this discipline, it means the land, the tangible
features on the land and permanent improvements.

“Real property” is defined as the interests, benefits and rights
inherent in the ownership of real estate. Indeed, real estate is
valuable, usable and marketable because several such rights are
attached to it. Hence, the term “real property” is broader than
both the terms “land” and “real estate”. It includes the physical
surface of the land, what lies above and below it, what is
permanently attached to it, and the bundle of legal rights of
ownership attached to a parcel of real estate.

Real property includes not only the surface, subsurface and
airspace but also the surface rights, subsurface rights and
airspace rights, all of which can be owned by different
individuals. There are, however, some limitations as well. For
instance:

 Surface rights are restricted by different legal conditions
such as Planning Law, which aims to control haphazard
development, and the Law of Delict, which aims to protect
others from nuisance caused by one who enjoys the benefit(s)
of a real estate. Further, restrictions on cutting some
trees such as jack and coconut, and on excavating sand,
granite etc. are examples of limitations of the surface
rights.

 Subsurface rights are restricted by licence and tax (for
instance, gem mining), while some substances such as fossil
oil and some minerals are fully restricted through state
ownership.

 Airspace rights are also restricted by civil aviation law,
electricity and other service lines etc. Further, in many
countries flying by individuals is either fully restricted
or needs very special permission for selected journeys.

¹ Moveable property, immoveable property etc. In this subject, the
term “property” refers only to immoveable properties.

Land = (Soil) + (All natural attachments up to space and down to the centre of the earth)

Land = (Natural Surface) + (Natural Subsurface) + (Natural Airspace)

Real Estate = Land + All Man-Made Fixtures

Real Property = (Real Estate) + (Rights and benefits attached to real estate)
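
The nesting of the three terms above can also be pictured as a small data model. The sketch below is only an illustration and is not taken from the book: the class and field names (`Land`, `RealEstate`, `RealProperty`, `fixtures`, `rights`) and the sample values are assumptions chosen to mirror the equations.

```python
from dataclasses import dataclass, field


@dataclass
class Land:
    """Natural components only: surface, subsurface and airspace."""
    surface: str
    subsurface: str
    airspace: str


@dataclass
class RealEstate:
    """Land plus all man-made fixtures permanently attached to it."""
    land: Land
    fixtures: list[str] = field(default_factory=list)


@dataclass
class RealProperty:
    """Real estate plus the rights and benefits attached to it."""
    real_estate: RealEstate
    rights: list[str] = field(default_factory=list)


# Hypothetical example values, for illustration only.
plot = Land(surface="soil and trees", subsurface="minerals", airspace="air above the plot")
house_lot = RealEstate(land=plot, fixtures=["house", "boundary fence"])
holding = RealProperty(real_estate=house_lot, rights=["surface rights", "airspace rights"])

print(holding.real_estate.land.surface)
print(holding.rights)
```

Each wider term simply wraps the narrower one, which is why real property is described as broader than both land and real estate.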

Classification of Properties (Types of Properties)
When analysing real estate cases, it is essential to know the
different types of properties. Properties can be classified
according to different criteria, as pointed out below.

The law classifies properties by ownership/tenure as:

 Freehold
 Leasehold
 Customary
By use or nature of development, properties are classified as:
 Residential property/ condominiums
 Commercial property
 Agricultural property
 Public property
 Industrial property
 Service property
By ownership, properties are classified as:
 Private (individuals, companies and social
organizations etc.)
 Public (state departments, incorporated bodies etc.)

By location, neighbourhood development etc., properties are
classified as:
 Urban property
 Rural property

Markets and the related socio-economic context may define
properties (especially residential properties) as:
 Luxury
 Semi-luxury/ middle income...
 Low income etc.

Further special classifications can be made if needed. For
instance, properties can be classified according to age,
materials used, technology adopted, value, size and so on.
Property-related cases may therefore fall under any of these
types of properties and their characteristics. (For property
characteristics, read Chapter One of Management of Real
Estate... by R G Ariyawansa, pp. 8-15.)

Real Estate Development (RED)

“Real estate development is an idea that comes to completion
and use the bricks and mortar put in place by the
development team” (Mike et al., 1995).

The above definition can be rephrased as “Real estate
development is a refined and confirmed idea that comes to
satisfactory completion and satisfactory use of the right bricks
and mortar put in place by the specialised development team”.

This definition implies that real estate development is not a
single or simple activity. It is a long-term, continuing process
made up of a number of activities. It has three main processes:

(a) A process of “idea generation”
(b) A process of “physical construction” and
(c) A process of “use of the built property”

A development needs a team of experts to complete it. Whatever
the effort put into a development, it is worthwhile only if the
final product is usable, functional and adaptable enough to
satisfy the different and changing needs and wants of the
property’s different users. Therefore, the idea of acquiring a
particular property should be refined and confirmed through a
properly conducted “need analysis”. Completion of the project
should be properly monitored so that it does not deviate from
the expected needs and wants, while allowing necessary
alterations as those needs and wants change, in order to
maximise the users’ satisfaction.

The development team is responsible for selecting the right
inputs (bricks and mortar) to realize the expected needs and
wants. The “completion” of a development is not achieved by
the completion of physical construction alone. A physically
constructed structure becomes a property only if it is fit for
continuous use to the beneficiaries’ satisfaction. It is
therefore clear that real estate development is linked with
real estate management.

Real Estate Management (REM)
Property management starts with the property purchase idea
(idea of acquiring/ building/ purchasing or simply having a
property) of the users and investors. Generally, there are two
main areas in property management. They are,

 Property portfolio management.

 Property assets management.

Property portfolio management deals with strategies for
formulating and monitoring an organization’s property holdings
so as to achieve the maximum portfolio return at the minimum
portfolio risk. Hence, it mainly deals with the financial and
investment decisions connected with real estate (land and
landed properties) at the strategic level of organizations. For
instance, critical decisions relating to properties, such as
acquiring, constructing, changing the uses and users, and
demolishing or selling properties, are highly expensive for
organizations, and the implications of such decisions for the
organization’s activities as a whole are very significant.
Therefore, such decisions are usually taken by the top
management, as they are strategic level decisions.
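
The trade-off between portfolio return and portfolio risk can be illustrated with a small calculation. The sketch below is not from the book: it assumes a hypothetical portfolio of two properties, uses the standard weighted-average return and two-asset variance formulas, and all figures and names (`portfolio_return`, `two_asset_risk`, the returns, standard deviations and correlation) are illustrative assumptions.

```python
import math


def portfolio_return(weights, returns):
    """Expected portfolio return: the weighted average of asset returns."""
    return sum(w * r for w, r in zip(weights, returns))


def two_asset_risk(w1, sd1, sd2, rho):
    """Standard deviation of a two-asset portfolio.

    w1 is the share invested in asset 1 (asset 2 gets the rest),
    sd1 and sd2 are the assets' standard deviations of return,
    rho is the correlation between the two returns.
    """
    w2 = 1.0 - w1
    variance = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * rho * sd1 * sd2
    return math.sqrt(variance)


# Hypothetical figures: an office block and a retail property.
office_return, retail_return = 0.08, 0.12
office_sd, retail_sd = 0.10, 0.20
rho = 0.3  # assumed correlation between the two returns

for w_office in (0.0, 0.25, 0.5, 0.75, 1.0):
    ret = portfolio_return([w_office, 1 - w_office], [office_return, retail_return])
    risk = two_asset_risk(w_office, office_sd, retail_sd, rho)
    print(f"office share {w_office:.2f}: return {ret:.3f}, risk {risk:.3f}")
```

Comparing the rows shows how changing the mix of properties moves return and risk together, which is the kind of information a strategic level decision would draw on.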

“Property asset management” is about using properties to the
maximum level. It therefore deals with the objectives of
increasing the life span of properties together with their
contribution to the economic worth of the assets. Through
property asset management, organizations expect to ensure
proper care of, and attention to, the benefits of their
properties. Thus, property asset management functions deal with
operational decisions. These decisions are usually taken by the
middle and lower level managers of organizations.

Private sector organizations are expected to have separate
property management plans alongside their corporate business
plans. As far as the management of public sector properties is
concerned, there may be some approved laws relating to the
respective public sector agency. Sometimes, necessary
regulations and circulars are issued from time to time, and
strategic level and operational level decisions are taken
accordingly. Decisions based on circulars are not always
rational, as they tend to reflect administrative rather than
professional requirements.
(1) According to the RICS policy review in 1974, Estate
Management (the generic activity) is considered as “All
facets of use, development and management of urban
land, including the sale, purchase and letting of
residential, commercial and industrial property and
management of urban estate and advice to clients to
planning …”

Further, Corporate Real Estate Asset Management (CREAM), Real
Estate Facility Management, Real Estate Life Cycle Management,
Management of Farm Land, Management of Public Properties etc.
are some of the areas under the subject of real estate
management.

Some issues relating to the increasing real estate assets of
organizations:
 Rapid changes in technology: This is beneficial as
it paves the way for innovations, highest and best
uses etc. At the same time, it is a challenge as it
leads to heavy costs and functional obsolescence of
properties etc.
 Variety of social needs: Since social needs and wants
change rapidly, keeping properties adaptable is an
expensive task.
 Unforeseen expensive maintenance problems: These
may bring property functions to a halt, interrupting
all operations of an organization.
 Continuous management needs: Minimizing the
above risks incurs a heavy cost for overall property
management.
 Having properties without interested parties/
clients/ demand: This can push organizations into
bankruptcy due to unbearable property costs.

The above are general challenges for any organization.
Organizations can face these challenges only by applying proper
management to their real estate.

Similarly, organizations have to deal with some common
macro-economic issues when managing properties.
Some macro level socio-economic issues
 Interest rates over the cost of money.
 Inflation on construction cost.
 Tax/ subsidies etc.
 International trading issues.
 Establishing international property market.
 Oil crisis/ cost of energy/ electricity/ gas etc.
 Scientific development.
 War/crisis/ natural and artificial disasters.
 Ultimate result of determinants of property prices:
difficult to renew/ difficult to test/ difficult to focus.

Decision making is challenging

As explained in the sections above, real estate and its
socio-economic context are very dynamic and challenging. For
some decisions related to real estate in particular, expert
views with proper analysis are essential. The following are
some examples for which scientific analysis is needed.
 Real estate restructuring, especially for large
 Real estate (new) development or redevelopment.
 Estimate values of real estate for different
 Property auditing for necessary changes.
 Research for searching new knowledge.

 Preparing real estate strategic plans to support general
business plans of large organizations.
 Real estate brokerage when selling or purchasing.
 Capital markets/asset advisory for investment
 Feasibility studies, property analysis for new investment
or alteration for the existing real estate.
 Evaluate customer needs and to solve customer
 Etc...

Why do real estate projects need experts’ views?

 As real estate is multidisciplinary by nature, one or a
few individuals cannot complete projects alone.
 Real estate is complicated, so developers, managers,
buyers, sellers and policy makers need scientific
information based on comprehensive analysis.
 High risk should be minimized and high cost should be
controlled through correct information.
 Investments in real estate are very long-lived, and
therefore decision making is a highly responsible task.
 The overall social cost of real estate is very high.

Some common studies required in real estate projects are:
 Appraisal/Valuation.
 Cost-benefit analysis.

 Analysis of economic base of real estates or businesses.
 Analysis of economic impact.
 Study of highest and best use of land and properties.
 Land use study in regions or neighbourhoods.
 Market surveys.
 Marketability study.
 Financial feasibility.
 Environmental impact analysis
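
Several of the studies listed above, notably financial feasibility and cost-benefit analysis, commonly rest on discounted cash flow measures such as Net Present Value (NPV) and Internal Rate of Return (IRR). The following is a minimal sketch of both, not a method prescribed by the book: the function names `npv` and `irr`, the bisection search, its bounds, and the cash flow figures are all assumptions made for illustration.

```python
def npv(rate, cash_flows):
    """Net Present Value of cash flows.

    cash_flows[0] occurs today (year 0); cash_flows[t] occurs at the
    end of year t. Outflows are negative, inflows positive.
    """
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=0.0, high=1.0, tol=1e-7):
    """Internal Rate of Return found by bisection.

    Assumes NPV is positive at `low` and negative at `high`, which
    holds for a single initial outlay followed by inflows when the
    true rate lies between the two bounds.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid   # rate too low: NPV still positive
        else:
            high = mid  # rate too high: NPV negative
    return (low + high) / 2


# Hypothetical project: pay 1000 now, receive 400 a year for 3 years.
cash_flows = [-1000.0, 400.0, 400.0, 400.0]

print(f"NPV at 10%: {npv(0.10, cash_flows):.2f}")
print(f"IRR: {irr(cash_flows):.4f}")
```

A project is usually regarded as financially feasible when its NPV at the required rate of return is positive, or equivalently when its IRR exceeds that rate; in this hypothetical case the NPV at 10% is slightly negative.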

Real Estate Valuation

Ensuring the highest possible value of a real estate is a sine
qua non in the process of real estate development and
management. The value of a real estate may be estimated for
different purposes such as buying or selling, taxation,
compensation, mortgaging, insurance, accounting and so on.

In the process of estimating values, observing the property,
collecting and analysing data, writing reports and making
presentations (if necessary) are all challenging tasks for
valuers. This is because properties are not just physical
entities but also living entities, business entities, built
environments, eco-systems and social systems.

However, the estimation and final judgment of the market value
of a property is heavily based on an individual’s opinion. This
means that the individual’s views, experience, knowledge,
attitudes, beliefs etc. operate as internal determinants when
he/she engages in observing and auditing properties and in
collecting, reporting and analysing data for the purpose of
estimating the market value of a property. Hence, providing an
objective opinion is a crucial task for a valuer, who is a human
being living in an ordinary community. However, it is believed,
and also evident, that through wider experience, updated
scientific knowledge, a range of smart skills, professional
ethics, codes of conduct etc., an ordinary/apprentice valuer
(learner) develops into a capable, confident professional who is
conscious of the given property and the property market, and
who can make correct judgments based on proper analysis.

Chapter 02
Real Estate Research and Case Study Analysis

This chapter consists of two sections: “Beginning of a
Research” and “Designing of a Research”.

Beginning of a Research

How do you find an answer to a problem? There are several
approaches. The main ones are:

(1) Informal/personal/ad hoc/conventional/indigenous approaches.
(2) Technical/legal/procedural/expert-opinion/democratic approaches.
(3) Systematic/purposeful/concerned/rational/scientific
approaches (which are called RESEARCH).

The first approach is very general. Ordinary people tend to
apply such methods to solve issues out of practice, habit or
ignorance. However, the rate of successful results is very
low.

The second approach is somewhat systematic. However, it may
not be enough to solve a crucial problem. Mostly, the first and
second approaches attempt to find solutions for the symptoms,
not for the root cause, of issues.

The third approach is stronger at finding solutions for real
problems as it addresses the root of the problems. This approach:
 Is based on philosophy/theory.
 Is least biased.
 Is least subjective.
 Has a higher ability to generalize.
 Has a higher degree of applicability.
 Is measurable/responsible/correctable/improvable.
 Has higher reliability and validity.

What is a research?
To answer the question “what is a research?”, first try to
answer the questions below.
 Why do you want a research?
 How do you do research?
 When do you do research?

Research cycle: What/Why/How/When …research?

Simply put, research is an attempt to find solutions for a
problem. The kind of problem for which research is needed is
usually a long-term, socially painful matter. It therefore needs
a purposeful, conscious and systematic attempt. Sometimes a
specific problem needs a very specific method. The following
diagram illustrates what research is, and why, how and when it
is done.

Research cycle

Box 1 (top left): Origin of research. There are questions/issues/
difficulties/needs and wants/motivations/frustrations/desires/
expectations/plans etc.

Box 2 (top right): Societal problems. These are multidimensional/
interrelated/interdependent/critical in situations/with multiple
views.

Box 3: A need to solve such problems systematically,
scientifically, rigorously, rationally, using relevant and
valid data.

Box 4: A process of research. An application of systematic
inquiry using relevant and adequate data/information with
a view to find ways and means of solving the given problem.

Individuals, groups, families and firms all have different
problems, needs and wants etc., as mentioned in the top
left-side box. These issues gradually become social problems
because all individuals and organizations interact with other
members of society. That is why systematic, scientific and
rational approaches are needed to address such social issues.
In fact, what is needed is “an application of systematic inquiry
using relevant and adequate data/information with a view to
find ways and means of solving the given problem”. The key
terms in this statement (systematic inquiry, relevant and
adequate data and information, ways and means) are important
for understanding the meaning of research.
Systematic inquiry: This implies a well-thought-out research
design. Identification of the problem, establishing aims and
objectives, setting research questions, review of literature,
determining variables, research tools, analysis etc. are the
features of a systematic approach.

Relevant and adequate data and information: This means
that you need to ensure the validity and reliability of your
study. The data used should be relevant for finding the real
picture of the situation, and you need an adequate amount of
data to obtain reasonably accurate results.

Ways and means: This implies that through research you are
supposed to add new knowledge to the existing body of
knowledge.

Accordingly, we can define research as follows.

Definition for research

Research is a way of thinking and action in inquiring into a
problem/issue/need … with a view to finding a solution or a
way of satisfying the need at a higher level.

The way should be:

 Scientific:- measurable, justifiable, applicable,
approachable, feasible.

 Least subjective (objective) and nonbiased.
Thinking and action:
This covers the whole process of designing the study,
collecting, analysing and interpreting data, drawing
conclusions, writing the report, and presenting the findings.

 Why/how/how much/how many/ when/ and so on are

 You can think and identify researchable questions

relating to suitable procedures/ rules/
regulations/practices/ norms and values/
attitudes/methods/criteria/ formats/formula/
mechanism/processes etc.

Higher level:
This is very significant in research. It emphasises that you
are producing new knowledge. Otherwise, whatever you attempt
will not be research.

Therefore, remember that research is not SPSS or any other
computer application, nor the final report. It is a series of
activities: identifying a problem/need, meeting and discussing
with experts/supervisors, reviewing relevant literature,
collecting, analysing and interpreting data and information,
writing a report, and presenting and justifying the findings.
Also, the end of your research is the beginning of further
research. The research process is generally a cyclical one.

Types of research
(1) According to the nature of what you want to do,
research can be identified as:
a. Descriptive research:
You describe a situation or the nature of a
problem in this approach. You describe what is
the shape of the problem? For instance, (i) Study
on the changes of land use pattern in Colombo
city. (ii) An empirical examination of socio-
economic characteristics of slum dwellers in
b. Correlation research:
You examine the relationship(s) between (or
among) variables. (With what it is related). For
instance, (i) Analysis of the impact of new tax
policy introduced by the 2008 budget on the land
use in the Western Province. (ii) Analysis of the
effectiveness of application of computer
technology for planning in Sri Lanka.
c. Explanatory research:
You study why a certain problem, phenomenon,
situation or relationship exists (why it is
related to that). For instance, (i) Why is the
rate of increase of land prices higher in
Colombo than in major Asian cities? (ii)
Factors for the success or failure of privatization
of state owned investment properties.

(2) According to the major discipline, you may find a

classification of research as,
a. Research in pure sciences:
For instance, (i) A successful medication for
cancer patients. (ii) Impact of soil types on
building cracks.
b. Research in social sciences & Humanities:
This is generally recognized as Social research.
For instance, (i) Study of the food behaviour of
cancer patients. (ii) Building cracks and
developers’ responsibility.
(3) According to the use of the research, you may find a
classification as,
a. Pure research:
This is research about research methodology
itself. For instance, (i) Developing
appropriate criteria for suitable sampling for land
use analysis.
b. Applied research:
Research relating to a project, development of a
product or a service etc is an applied research.
For instance, (i) Pattern of land price of
peripheral agricultural land in Colombo. (ii)
Changes of small scale constructions after the
Tsunami devastation in Southern Sri Lanka.

(4) According to the style of inquiry you find types of
research as,
a. Quantitative research:
In a quantitative research, you use data from
highly structured formats and analyse
accordingly. For instance, you quantify
relationships, classify relationships, order
relationships according to the significance,
compute ratios, use specific samples, and so on.
b. Qualitative research:
In qualitative research, you use data and
information in unstructured forms and analyse them
accordingly. You describe the variables
and their relationships rather than quantifying them,
use selected cases for in-depth analysis and so
on.

c. Hybrid / combination of qualitative and
quantitative research:
This is a mix of qualitative and quantitative
methods. It is therefore often a more practical and
useful method.

(5) According to the research paradigms, you find different

types of research as,
a. Systematic or Scientific/ positivist approach.
b. Qualitative, ethnographic, ecological or
naturalistic approach.

Therefore, it is clear that there is no hard and fast
method/methodology called “the methodology”. Determine a
suitable method for your particular study. That method will
then be “the methodology” for your particular study, for
that particular purpose, on that particular occasion.
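
To make the quantitative and correlation styles of research described above more concrete, the sketch below quantifies a relationship between two variables using Pearson's correlation coefficient. It is an illustration only: the variables (distance from the city centre and land price), the data values and the function name `pearson_r` are invented for the example and are not drawn from the book.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations for five plots of land.
distance_km = [2, 5, 8, 12, 20]             # distance from the city centre
price_per_perch = [9.0, 7.5, 6.0, 4.0, 2.5]  # price in an arbitrary unit

r = pearson_r(distance_km, price_per_perch)
print(f"Correlation between distance and price: {r:.3f}")
```

A value of r close to -1 or +1 indicates a strong linear relationship, while a value near 0 indicates a weak one. Correlation alone describes the relationship; explaining why it exists is the task of explanatory research.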

Real Estate research…!

According to the basic circular model of production and
consumption, you can think of either the producers’ aspects or
the consumers’ aspects relating to real estate, as outlined below.

[Figure: General consumer-production circular model linking households (consumers) and producers (suppliers)]

Research on real estate from the producers’/suppliers’ point of view:
 Financial/investment aspects
 Market/marketing/business/investment/products and services, new
products/innovations etc.
 Environmental/material/procedures/standards etc.
 Cost and value of products and services
 Economic aspect/ policies/ development projects
 Legal aspects
 Technical aspects
 Professional aspects
 Management aspects ...

Research on real estate from the consumers’ point of view:
 Pattern of consumption, needs and wants
 Satisfaction
 Income and expenditure patterns
 Socio-cultural
 Political aspect
 Demographic
 Different users, households, neighbours, visitors etc.
 Consumer behaviours
 Psychological aspects
 Knowledge, belief, values etc.

Research can be described as a process of collecting,
analysing and interpreting information to answer questions.
However, research problems can be answered from different
perspectives and angles. For instance, go through the
following examples.

(1) Why do people pay a lower amount of stamp duty on land
transactions? You will find different answers to the
question, as follows.
 Answer 01 is “In order to minimize the cost”. This
answer reflects a customer’s point of view. It is an
economic aspect of the problem.
 Answer 02 is “Since they don’t know the legal
actions against the fraud”. This may be a
professional’s point of view. It is a legal aspect
of the problem.
 Answer 03 is “Due to the weakness of the market
system, immaturity, informality”. This reflects an
academic’s point of view. It also looks like a policy
aspect of the problem.

Similarly, studying the following examples will help you
learn to formulate research themes.

 How does the existing information system help the
role of the valuer?
 What is the appropriate plot size for residential
properties in Colombo city?
 The appropriateness of plot size and different land uses.

To make an appropriate analysis, a real estate researcher
should have qualities such as:
 Researcher is (or will be) an expert in a particular area
of interest in the field of real estate.
 Lifelong learner.
 Self learner.
 Good consultant.
 Able to diagnose problems accurately.
 Rational thinker.
 Good negotiator.
 Good observer.
 Able to provide objective opinions.
 Philosopher.
 Theory builder.
 Opinion maker.
 Good rapport builder.
 Good investigator.
 Able to read and comprehend.
 Listener.
 Writer.
 Analyser and interpreter.
 Presenter and convincer.

The research process
(1) Identification of a research problem.

[Figure: boxes labelled “Area of study”, “Reviewing literature”, “Research problem” and “Tentative Title”, with the note below.]

This shows you a direction/destination and hints at the
needed data/sample/analysis type etc. to formulate an
attractive title and a good research design.

(2) Research design:

 Your own plan of activities in the research.
 You have to plan everything systematically and
rationally/logically, such as:
o Data/ data collection methods.
o Sample/sampling strategy.
o Analysis types.
o Timeframe/cost etc.
o Limitations/policies/scope etc.

(3) Preparing data collection tools.
 Observation methods/guidelines.
 Interview schedules/ list of interviewees/ organizations
to be visited etc.
 Interview guidelines.
 Questionnaire(s).
 Data recording tools.

(4) Selecting a sample(s)/cases.

 Representing the total population.
 Consider the cost.
 Within the given or available timeframe.
 According to the purpose of the research (objectives).

(5) Research proposal.

Arrange what you have done so far as an informative
document, which guides you to conduct the research. This
tells your supervisor and the funding agency what you will
do in your research, how you will do it, and what sort of
outcomes are likely.

Therefore, you have to include the following in your research
proposal:

 Title.
 Introduction (an argument along with theoretical
 Statement of the problem.

 Objectives (General and special).
 Hypothesis (if needed).
 Research design (methodology).
 Limitations/scope.
 Chapters of your research report.
(These features are enough for BSc and MSc level
academic research proposals)
 Timeframe.
(PhD and other given task, even for MSc programs if
 Budget.
 Qualifications of resource persons (CVs).
 Any other relevant particulars.
(For funded projects)

(6) Collection of data and information.

(7) Analysis of data.
(8) Writing a report.
(9) Presentation and justification of findings.
(10) Preparing for publication of papers in journals or as a

Learning outcome up to now

The following table summarises the learning outcomes that you
have gained up to now from the first part of this chapter.

Learning outcomes
Our attempt: We attempt to help you to understand
(1) What is research?
(2) How do you do research?
(3) Who is a researcher?
(4) What characteristics does a good researcher have?
(5) What is the purpose of doing a research?
Your responsibility:
 Short term: Knowing answers for (1)-(5).
 You will be researchers and you do research (research
oriented/analytical/good observer/maintaining your own
data and information bank/innovator/critical thinker/
theory builder).
 Accordingly you help to upgrade the social well-being
through being a true real estate professional.

Designing of a Research
This part of the chapter covers the following areas.
 Designing a research proposal.
 Title of your research.
 Developing an “Introduction”.

 Methodology.
 Designing of your research.

Designing a research proposal: You need an appropriate
research proposal to conduct research. A research proposal
is the outcome of the process of “research design”. If
you have a sound proposal, you have finished 1/4 or 2/3 of your

Title of your research: This is the “theme/topic” and/or “the
shortest possible meaningful way of expressing your research
study” by means of a “name”. However, you do not necessarily
need a title to begin a research. Constructing a title is a part of
the research designing process. A title should preferably
possess characteristics such as:

 A very short phrase but not a sentence.

 Clear and straightforward meaning and eye-catching.
 Maximum two to three lines.
 It should imply (highlight/hint) the core of the research
problem and the area of the study.
 Avoid two or more research problems (avoid
conjunctions such as “and”, “or” etc…).
 It need not be in the form of a question (What, when,
why…etc.); it is better to avoid questions.
 Should not be in the form of an exclamation or an
expression of sympathy/emotion (!, Oh !, …).
 Should not include abbreviations/jargon/ambiguous
terms (WHO, CMC, UN…).
 Should not be two sentences, but may have one subtitle
and no more than one.

The following are the examples indicated in the first part of
this chapter:

1. Study on the changes of land use pattern in …
2. An empirical examination of socio-economic
characteristics of slum dwellers in…
3. Analysis of the impact of new taxes imposed by the
2008 budget on land use in the Western province
4. Analysis of the effectiveness of application of computer
technology for planning in Sri Lanka
5. (Why) the rate of increase of land price is higher in
Colombo than major Asian cities
6. Factors for the success or failure of privatization of state
owned investment properties
7. A successful medication for cancer patients
8. Impact of soil types for building cracks
9. Food behaviour of cancer patients
10. Building cracks and developers responsibility
11. Developing (suitable criteria for ) a suitable sampling
for land use analysis
12. Pattern of land price of peripheral agricultural land in
13. Changes of small scale constructions after the
Tsunami devastation in Southern Sri Lanka

14. Why do people pay a lower amount of stamp duty on land
transactions?
15. How does the existing information system help the
role of the valuer?
16. What is the appropriate plot size for residential
properties in Colombo city?
17. The appropriateness of plot size and the land use

Go through the following questions and answers carefully.

Question 01: Are these appropriate titles for research, and can
you do research in these areas?

Answer: These are not yet in the form of attractive titles.
However, each shows an appropriate area for research.

Question 02: How do I formulate a good title out of these?

Answer: Simply by rephrasing the given themes!
However, it is not as simple as it sounds. Before
rephrasing or rearticulating, recall how you
extracted the theme: by no other way than
scanning the existing literature. Therefore, you
have to adopt the same way to develop a
researchable theme into an appropriate title, i.e. with
the help of reviewing relevant and updated literature.
Rephrasing alone may be possible if you are smart in
language. However, without knowing the existing situation

you can not do a research. You will not go

Example 01
Suppose that you have roughly identified a research theme as
“Analysis of Housing Market”. This is a relevant researchable
area in your discipline, but it is too vague. So, you have to
rearticulate it into a specific and researchable theme. Go
through the following steps and note the gradual changes made
to the initial theme.
a) Analysis of Housing Market
b) Analysis of Housing Market in Colombo
c) Analysis of Housing Market in Colombo and Suburbs
d) Analysis of Consumer Behaviour of the Housing Market
in Colombo and Suburbs
e) Analysis of Determinants of Consumer Behaviour of the
Housing Market in Colombo and Suburbs
f) Analysis of Determinants of Consumer Behaviour of
Potential Buyers in the Housing Market in Colombo and
g) Analysis of Determinants of Consumer Behaviour of
Potential Buyers in the Housing Market in Colombo and
h) Analysis of Determinants of Consumer Behaviour of the
Housing Market in Colombo and suburbs: Views of
Potential Buyers

i) Analysis of Determinants of Consumer Buying
Behaviour of the Housing Market: Views of Potential
Buyers in Colombo and Suburbs
j) Determinants of Consumer Buying Behaviour of the
Housing Market: Views of Potential Buyers in Colombo
and Suburbs
k) Analysis of Determinants of Consumer Satisfaction of
the Housing Market: Views of Potential Buyers in
Colombo and Suburbs
l) Analysis of Determinants of Consumer’s Expected
Needs and Wants from the Housing Market: Views of
Potential Buyers, Evidence of Colombo and Suburbs

Example 02
Initial title
“Impact of global credit crunch on residential property market
in Sri Lanka with special reference to Kaduwela area in

Improved titles
“Impact of global economic downturn on residential property
market in Sri Lanka: Empirical evidence from Colombo”

“Impact of failure of global financial market on residential

property market in Sri Lanka: Empirical evidence from

“Failure of global financial market on residential property
market in Colombo: Competition of global and local contexts”

Some special lessons….

 Having a small area makes your work easier, but too small
an area will leave you with an empty feeling and is not
encouraging (for beginners).
 Having connections with few other areas and the major
area provides you more opportunities to expand your
 Having a vague, unfamiliar area will stop your journey,
mislead you, or leave you fed up.
 Having many connections with the major area will make
you more ambitious and bring more workload, and
ultimately you will have less or no focus.

Developing an “Introduction”
Suppose that you have finalized a title as “Determinants of
Buying Behaviour of the Housing Market: Views of Potential
Buyers in Colombo and Suburbs”. Accordingly, you have to
provide an “INTRODUCTION” to your study at the beginning
of the proposal. What does the “introduction” mean? What is it
supposed to do?

Different people may have different understandings and/or
different levels of understanding of a title/theme according to
their perceptions. Therefore, you have to show very clearly
what your perception of the title/theme is by analysing the
background. The introduction is an overview of your study. You
describe all the core areas of your research theme. According to
this example, “buying behaviour”, “determinants of buying
behaviour”, “potential buyer” and “housing market” are core
concepts in the study theme. You need to develop a core
ideology/theme/logic/rationale or a central argument for your
study by means of the introduction. In this way, you trace
a “research problem” more specifically for your study. At
the end of the introduction you can state the research
problem of your study under a separate heading called
“Statement of the Problem”.

In the introduction,
 You have to discuss all the major concepts of your
research theme/title.
 For instance, according to the first example, “Consumer
buying behaviour”, “Housing market in general”,
“Housing market experience of Colombo and suburbs”
are major areas of importance.
 Therefore, in your introduction, you have to describe all
these concepts according to their level of significance.
 You should identify and discuss the relationship
between/among these concepts (major concepts and
their variables).
 You have to include a sufficient amount of literature
(evidence) to support your argument, along with
citations in an appropriate way.

Once you have precisely identified a research problem, you can
go ahead with your research by setting aim(s) and objectives
through which you can address the research problem. You can
have one general objective and a few specific objectives.
However, this is not a must.

By setting some research questions for the specific objectives,
commonly or separately, you can simplify your study
further so that you can find solutions for the research problem
from a particular point of view of your interest.

You need to describe particularly the data on which you depend
in finding solutions for the research problem. (In addition, it
is necessary to explain the data collection and analysis
methods, limitations etc.)

Designing your research

Step 01- Through reviewing literature (relevant concepts,
theories, research findings, models, procedures,
practices, definitions, statistics and so on), you have to
trace a researchable problem.

Step 02 – Scale down the scope of your area of study into a
very specific, least complicated, meaningful, and
researchable one. This is possible by means of formulating
general and specific objectives, research questions for
each specific objective (if needed), and adopting some
limitations relating to the variables, case studies,
samples for data collection, methods of analysis etc.

Go carefully through the following tables and the facts in
them. They summarise the total activities of designing your
research as discussed in the chapter.

Examples - 01
Specific objectives (about particular concepts relating to the
research):
 Household’s desires or satisfactions. Objective is: to
evaluate household’s desires on different features of a
house.
Research questions (using indicators of relevant concepts):
 What specific features does a buyer expect from a house
in a flat?
o Distance to service centres
o Quality of structure
o Neighbourhood quality
o etc.
Yardsticks of indicators (relevant variables of indicators):
 About the quality of structures
o Designs
o Size of rooms
o Quality of finishes
o Material used
o etc.
Data/information (working definitions of variables):
 About size of rooms
o Size of bedrooms
o Size of kitchen
o Size of bathrooms
o Size of sitting rooms
o etc.
(These are the expected set of data. You can obtain needed
data through experts or ...)

Examples - 02
Specific objectives (about particular concepts relating to the
research):
 Household’s purchasing power. Objective is: to examine
the level of household’s purchasing power in purchasing
a house.
Research questions (using indicators of relevant concepts):
 Level of income
 Expenditure pattern
 Savings
 Family supports
 etc…
Yardsticks of indicators (relevant variables of indicators):
 About the savings
o Formal savings
o Jewelleries
o etc…
Data/information (working definitions of variables):
 About formal savings
o Saving accounts
o Fixed deposits
o NRFC
o EPF
Or
o Less than Rs. 100,000
o Rs. 100,000-200,000

When the research design is completed, the researcher can start
data collection and data analysis. The next sections present
some data analysis methods. This discussion is on quantitative
methods; however, there are several methods and tools that can
be used for qualitative analysis.

Chapter 03
Real Estate Analysis-Quantitative Approach

Introduction to Quantitative Approach

After clearly understanding the research problem, it is possible
to formulate research objectives and hypotheses accordingly.
Finally, data collection and analysis are carried out in order to
achieve the set objectives.

Statistics, the branch of mathematics that deals with the
analysis and interpretation of numerical data in terms of
samples and populations, is basically of two types:

Descriptive Analysis
 Tables, Graphs, Summary Measures

Statistical Analysis (Statistical Inference)
 Estimation, Hypothesis Testing

Hypothesis Testing

Real Estate Researchers are also interested in answering many
types of questions. For example, a Property Developer might
want to know whether land prices are going up and how. A
mortgagee (bank) might want to know whether a new property
tax will lower mortgage loan disbursements. A Real Estate
Sales Consultant might wish to see whether a new promotion
technique is better than a traditional one.

These types of questions can be addressed through statistical

hypothesis testing, which is a decision-making process for
evaluating claims about a population. In hypothesis testing, the
researcher must define the population under study, state the
particular hypotheses that will be investigated, give the
significance level, select a sample from the population, collect
the data, perform the calculations required for the statistical
test, and reach a conclusion.
Hypotheses concerning parameters such as means and
proportions can be investigated. There are two specific
statistical tests used for hypotheses concerning means: the z test
and the t test.

This chapter aims to explain the basics of the hypothesis-testing
procedure along with the z test and the t test.
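As a rough illustration of the two test statistics named above, the following Python sketch computes a z statistic (population standard deviation assumed known) and a t statistic (standard deviation estimated from the sample). The bill figures, the value of sigma and the function names are hypothetical and chosen only for illustration; they are not taken from the book's data.

```python
import math
from statistics import NormalDist, mean, stdev


def z_test_statistic(sample_mean, mu0, sigma, n):
    """z = (sample mean - mu0) / (sigma / sqrt(n)); sigma is assumed known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def t_test_statistic(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)); s is the sample std. deviation."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))


# Hypothetical monthly air-conditioning bills (Rs) after insulation.
bills = [24100, 23800, 25200, 24500, 23900, 24700, 24300, 24000]
t_stat = t_test_statistic(bills, mu0=25000)

# Hypothetical large-sample case with an assumed known sigma of Rs 1,500.
z_stat = z_test_statistic(sample_mean=24300, mu0=25000, sigma=1500, n=36)
left_tail_p = NormalDist().cdf(z_stat)  # P(Z <= z), for H1: mean < 25,000

print(round(z_stat, 2), round(t_stat, 2), round(left_tail_p, 4))
```

The t statistic would normally be compared with a critical value from the t distribution with n - 1 degrees of freedom, which a statistical package or table provides.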

The three methods used to test hypotheses are;

1. The traditional method

2. The P-value method
3. The confidence interval method

The traditional method has been used since the hypothesis
testing method was formulated. A newer method, called the
P-value method, has become popular with the advent of modern
computers and high-powered statistical calculators. The third
method, the confidence interval method, illustrates the
relationship between hypothesis testing and confidence
intervals.
Hypothesis Testing—Traditional Method

Every hypothesis-testing situation begins with the statement of
a hypothesis.

A statistical hypothesis is a conjecture about a population

parameter. This conjecture may or may not be true.

There are two types of statistical hypotheses for each situation:

the null hypothesis and the alternative hypothesis.

The null hypothesis, symbolized by H0, is a statistical

hypothesis that states that there is no difference between a
parameter and a specific value, or that there is no difference
between two parameters.

The alternative hypothesis, symbolized by H1, is a statistical

hypothesis that states the existence of a difference between a
parameter and a specific value, or states that there is a
difference between two parameters.

As an illustration of how hypotheses should be stated, three

different statistical studies will be used as examples.

Situation A: A Structural Engineer changes his drawing to
increase the physical life (durability) of a condominium.

If the mean life of the condominium without the structural

changes is 25 years, then his hypotheses are

H0: µ= 25 and H1: µ > 25

In this situation, the Structural Engineer is interested only in

increasing the lifetime of the condominium, so his alternative
hypothesis is that the mean is greater than 25 years. The null
hypothesis is that the mean is equal to 25 years. This test is
called right-tailed, since the interest is in an increase only.

Situation B: A building facilities manager wishes to lower air
conditioning bills by using a special type of insulation in
houses. If the average of the monthly air conditioning bills is
Rs 25,000, his hypotheses about air conditioning costs with the
use of insulation are

H0: µ= 25,000 and H1: µ < 25,000

This test is a left-tailed test, since the building facilities
manager is interested only in lowering air conditioning costs.

Situation C: A Real Estate Researcher is interested in finding
out whether a new global economic downturn would have any
effect on land sales. The researcher is particularly
concerned with the sales volume during the global economic
downturn. Will the sales volume increase, decrease, or remain
unchanged during the downturn?

Since the researcher knows that the land sales volume for the
Colombo District under study is Rs.100 million per month, the
hypotheses for this situation are;

H0: µ= 100 and H1: µ ≠ 100

The null hypothesis specifies that the mean will remain

unchanged, and the alternative hypothesis states that it will be
different. This test is called a two-tailed test, since the possible
effects of the global economic downturn could be to raise or
lower the land sales volume.

To state hypotheses correctly, researchers must translate the

conjecture or claim from words into mathematical symbols.
The basic symbols used are as follows:

Equal to = Greater than >

Not equal to ≠ Less than <

The null and alternative hypotheses are stated together, and the
null hypothesis contains the equal sign, as shown (where k
represents a specified number).

Two-tailed test     Right-tailed test     Left-tailed test

H0: µ = k           H0: µ = k             H0: µ = k
H1: µ ≠ k           H1: µ > k             H1: µ < k
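The link between the form of H1 and the tail of the distribution used can be sketched in Python as follows. This assumes a z test statistic has already been computed; the function name and the tail labels are illustrative only.

```python
from statistics import NormalDist


def p_value(z, tail):
    """Return the P-value of a z test statistic for the given form of H1.

    tail = "right" for H1: mean > k, "left" for H1: mean < k,
    and "two" for H1: mean != k.
    """
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two'")


# The same test value gives a different P-value under each alternative.
for tail in ("right", "left", "two"):
    print(tail, round(p_value(1.5, tail), 4))
```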

When a researcher conducts a study, he or she is generally
looking for evidence to support a claim. Therefore, the claim
should be stated as the alternative hypothesis, i.e., using <, >
or ≠. Because of this, the alternative hypothesis is sometimes
called the research hypothesis.

A claim, though, can be stated as either the null hypothesis or

the alternative hypothesis; however, the statistical evidence can
only support the claim if it is the alternative hypothesis.
Statistical evidence can be used to reject the claim if the claim
is the null hypothesis.

The following table should be helpful in translating verbal
conjectures into mathematical symbols.

Hypothesis-Testing Common Phrases

> <
is greater than is less than
is above is below
is higher than is lower than
is longer than is shorter than
is bigger than is smaller than
is increased is decreased or reduced from

= ≠
is equal to is not equal to
is the same as is different from
has not changed from has changed from
is the same as is not the same as

Selection of an appropriate statistical test

Finally, the success or failure of a research study depends
heavily on the selection of an appropriate statistical test.
Hypotheses, variables, the nature of the data etc. should be
considered before selecting a test.

The following diagrams guide researchers in selecting an
appropriate statistical test after considering objectives,
hypotheses, variables etc.


In statistics, correlation is a technique which tells whether
two variables are related and how strong the relationship is.
However, this measure does not indicate a cause-effect
relation or the impact of one variable on another.
A Real Estate Developer wants to know whether there is a
relationship between land sale prices and distances to the town
centre. Accordingly, data were collected on 200 land sales and
the related distances to the town centre, and tested for
normality. Finally, an SPSS output table was generated from the
data and used to test his research hypothesis.

Pearson Correlation - These numbers measure the strength
and direction of the linear relationship between the two
variables. The correlation coefficient can range from -1 to +1,
with -1 indicating a perfect negative correlation, +1 indicating a
perfect positive correlation, and 0 indicating no correlation at
all. (A variable correlated with itself will always have a
correlation coefficient of 1.)
From the scatterplot of the variables land perch price
and distance to town below, it is possible to see that the points
show no clear pattern, which is consistent with a weak negative
correlation. The -0.18 is the numerical description of how
tightly the points lie around the imaginary line.
If the correlation were stronger (closer to -1 or +1), the points
would tend to be closer to the line; if it were weaker (closer to
0), they would tend to be further away from the line. Also note
that, by definition, any variable correlated with itself has a
correlation of 1.
Sig. (2-tailed) - This is the p-value associated with the
correlation.

N - This is the number of cases used in the correlation.
Because the data set has no missing data, all correlations were
based on all 200 cases in the data set. However, if some
variables had missing values, the N's would be different for the
different correlations.
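For readers who wish to see how the coefficient itself is obtained, the following Python sketch computes Pearson's r directly from its definition. The six distance and price values are hypothetical and are not the 200 land sales analysed in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: distance to town centre (km) and perch price (Rs '000).
distance_km = [1, 2, 3, 4, 5, 6]
perch_price = [520, 480, 500, 430, 450, 400]

r = pearson_r(distance_km, perch_price)
print(round(r, 3))                                   # negative: price falls with distance
print(round(pearson_r(distance_km, distance_km), 3)) # a variable with itself
```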


Linear Regression

Linear Regression estimates the coefficients of the linear

equation, involving one or more independent variables that
best predict the value of the dependent variable. For example,
it is possible to predict a land perch price (the dependent
variable) from independent variables such as distance to town
centre, extent, and frontage etc.
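The idea of estimating coefficients can be sketched for the simplest case of one independent variable. The Python code below applies the ordinary least squares formulas for a slope and an intercept; the data are hypothetical and constructed to lie exactly on a line, so the recovered coefficients are easy to verify.

```python
def fit_simple_regression(x, y):
    """Least squares estimates (intercept b0, slope b1) for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data built from price = 500 - 20 * distance.
distance_km = [1, 2, 3, 4, 5]
perch_price = [480, 460, 440, 420, 400]

b0, b1 = fit_simple_regression(distance_km, perch_price)
print(b0, b1)
```

With several independent variables the same principle applies, but the estimates are obtained by matrix methods, which packages such as SPSS carry out internally.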

Statistics. For each variable: number of valid cases, mean, and

standard deviation. For each model: regression coefficients,
correlation matrix, part and partial correlations, multiple R, R2,
adjusted R2, change in R2, standard error of the estimate,
analysis-of-variance table, predicted values, and residuals.
Also, 95%-confidence intervals for each regression
coefficient, variance-covariance matrix, variance inflation
factor, tolerance, Durbin-Watson test, distance measures
(Mahalanobis, Cook, and leverage values), DfBeta, DfFit,
prediction intervals, and casewise diagnostics. Plots:
scatterplots, partial plots, histograms, and normal probability
plots.
Data. The dependent and independent variables should be

quantitative. Categorical variables, such as soil type, slope of
the land, or area of land sale, need to be recoded to binary
(dummy) variables or other types of contrast variables.
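The recoding of a categorical variable into binary (dummy) variables can be sketched as follows. The soil categories, the choice of reference category and the function name are hypothetical; one category is left out as the reference so that the dummies are not redundant.

```python
def dummy_code(values, reference):
    """Recode a categorical variable into 0/1 columns, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    return [
        {f"is_{level}": int(value == level) for level in levels}
        for value in values
    ]


# Hypothetical soil types for four land sales; "clay" is the reference level.
soil_type = ["clay", "sand", "loam", "sand"]
rows = dummy_code(soil_type, reference="clay")
for soil, row in zip(soil_type, rows):
    print(soil, row)
```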

Assumptions. For each value of the independent variable, the

distribution of the dependent variable must be normal. The

variance of the distribution of the dependent variable should
be constant for all values of the independent variable. The
relationship between the dependent variable and each
independent variable should be linear, and all observations
should be independent.

Regression Analysis: Land price Estimation Model

(Hedonic Model)
This case study shows an example regression analysis that
models the land price (dependent variable) as a function of
distance to town, shape, extent, soil type etc.
(independent variables). The data were collected on 200 land
sales. The SPSS 20 package was used for the data analysis.

Model - SPSS allows the researcher to specify multiple models
in a single regression command. This indicates the number of the
model being reported.
Variables Entered - SPSS allows variables to be entered into a
regression in blocks, and it allows stepwise regression. Hence,
researchers need to know which variables were entered into the
current regression. If the researcher did not block the
independent variables or use stepwise regression, this column
should list all of the independent variables that the researcher
specified.
Variables Removed - This column lists the variables that
were removed from the current regression. Usually, this
column will be empty unless the researcher did a stepwise
regression.
Method - This column shows the method that SPSS used to run
the regression. "Enter" means that each independent variable
was entered in the usual fashion. If the researcher did a
stepwise regression, the entry in this column would show that.

Overall Model Fit

Model - SPSS allows to specify multiple models in a
single regression command. This indicates the number of the
model being reported.
R - R is the square root of R-Squared and is the correlation
between the observed and predicted values of the dependent
variable.
R-Square - This is the proportion of variance in the dependent
variable (land perch price) which can be explained by the
independent variables (distance to town, shape, extent, soil
type etc.). This is an overall measure of the strength of
association and does not reflect the extent to which any
particular independent variable is associated with the dependent
variable.
Adjusted R-square - This is an adjustment of the R-squared
that penalizes the addition of extraneous predictors to the
model. Adjusted R-squared is computed using the formula
1 - ((1 - R2)(N - 1) / (N - k - 1)), where N is the number of
cases and k is the number of predictors.
Std. Error of the Estimate - This is also referred to as the root
mean squared error. It is the standard deviation of the error
term and the square root of the Mean Square for the Residuals
in the ANOVA table.
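The adjustment can be checked with a short calculation. The Python sketch below applies the adjusted R-squared formula to hypothetical values of R-squared, N and k; none of these numbers come from the case study output.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R2 = 1 - (1 - R2) * (N - 1) / (N - k - 1).

    n is the number of cases and k is the number of predictors.
    """
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical model: R2 = 0.50 from 101 cases and 10 predictors.
adj = adjusted_r_squared(0.50, n=101, k=10)
print(round(adj, 4))

# Adding predictors without improving R2 lowers the adjusted value.
print(round(adjusted_r_squared(0.50, n=101, k=20), 4))
```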

Model - SPSS allows to specify multiple models in a
single regression command. This indicates the number of the
model being reported.
Regression, Residual, Total - Looking at the breakdown of
variance in the outcome variable, these are the categories to be
examined: Regression, Residual, and Total. The Total variance
is partitioned into the variance which can be explained by the
independent variables (Regression) and the variance which is
not explained by the independent variables (Residual).
Sum of Squares - These are the Sums of Squares associated
with the three sources of variance: Total, Regression and
Residual.
df - These are the degrees of freedom associated with the
sources of variance. The total variance has N-1 degrees of
freedom. The Regression degrees of freedom corresponds to

the number of coefficients estimated minus 1. Including the
intercept, there are 11 coefficients, so the model has 11-1=10
degrees of freedom. The Error degrees of freedom is the DF
total minus the DF model, 151 - 10 =141.
Mean Square - These are the Mean Squares, the Sum of
Squares divided by their respective DF.
F and Sig. - This is the F-statistic and the p-value associated
with it. The F-statistic is the Mean Square (Regression) divided
by the Mean Square (Residual): 11.654/1.072 = 10.86. The p-value
is compared to some alpha level in testing the null hypothesis
that all of the model coefficients are 0.
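How the columns of the ANOVA table relate to each other can be sketched in a few lines of Python. The sums of squares and degrees of freedom used here are hypothetical round numbers chosen to make the arithmetic easy to follow; they are not the values from the SPSS output discussed above.

```python
def anova_f(ss_regression, df_regression, ss_residual, df_residual):
    """Return (MS regression, MS residual, F) from sums of squares and df."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression, ms_residual, ms_regression / ms_residual


# Hypothetical table: 10 predictors, 161 cases (total df = 160).
ms_reg, ms_res, f_stat = anova_f(
    ss_regression=120.0, df_regression=10,
    ss_residual=150.0, df_residual=150,
)
print(ms_reg, ms_res, f_stat)
```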

Variables in the model

Model - SPSS allows to specify multiple models in a

single regression command. This indicates the number of the
model being reported.

B - These are the values for the regression equation for
predicting the dependent variable from the independent
variable. The regression equation is presented in many different
ways, for example:
Y predicted = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4……+ e
The column of estimates provides the values for b0, b1, b2, b3…
and b10 for this equation.
Distance to town - The coefficient for Distance to town is
0.837. So for every unit increase in Distance to town, a 0.837
thousand rupees increase in land perch price is predicted,
holding all other variables constant (this positive impact is
debatable). The coefficients of the other independent variables
can be interpreted in the same way.
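The meaning of "holding all other variables constant" can be illustrated with a small Python sketch of the prediction equation. Only the 0.837 coefficient for distance to town comes from the text; the intercept, the other coefficients and the parcel values are hypothetical.

```python
def predict_price(intercept, coefficients, features):
    """Predicted value b0 + b1*x1 + b2*x2 + ... for one case (error term omitted)."""
    return intercept + sum(b * features[name] for name, b in coefficients.items())


# 0.837 is quoted in the text; every other number here is hypothetical.
intercept = 12.0
coefficients = {"distance_to_town": 0.837, "extent": 0.05, "frontage": 0.2}
parcel = {"distance_to_town": 3.0, "extent": 20.0, "frontage": 15.0}

base = predict_price(intercept, coefficients, parcel)
one_unit_farther = predict_price(
    intercept, coefficients, {**parcel, "distance_to_town": 4.0}
)
# The difference equals the coefficient, since nothing else changed.
print(round(one_unit_farther - base, 3))
```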
Std. Error - These are the standard errors associated with the
coefficients.
Beta - These are the standardized coefficients. These are the
coefficients that the researcher would obtain if the researcher
standardized all of the variables in the regression, including the
dependent and all of the independent variables, and ran the
regression. By standardizing the variables before running the
regression, the researcher puts all of the variables on the same
scale and can compare the magnitude of the coefficients to
see which one has more of an effect. The researcher will also
notice that the larger betas are associated with the larger
t-values and lower p-values.
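One common way to express the link between B and Beta is that the standardized coefficient equals the unstandardized coefficient multiplied by the ratio of the standard deviations of the predictor and the dependent variable. The Python sketch below illustrates this with hypothetical data; the function name is illustrative only.

```python
from statistics import stdev


def standardized_beta(b, x, y):
    """Beta = B * (std. deviation of predictor x) / (std. deviation of outcome y)."""
    return b * stdev(x) / stdev(y)


# Hypothetical data lying exactly on y = 2x, so B = 2 and Beta should be 1.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
beta = standardized_beta(2.0, x, y)
print(round(beta, 6))
```

Because Beta is free of the original units, a distance measured in kilometres and an extent measured in perches can be compared on the same footing.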

t and Sig. - These are the t-statistics and their associated 2-
tailed p-values used in testing whether a given coefficient is
significantly different from zero. Using an alpha of 0.05: The
coefficient for Distance to town (0.837) is significantly
different from 0 because its p-value is 0.017, which is smaller
than 0.05.
95% Confidence Limit for B Lower Bound and Upper
Bound - These are the 95% confidence intervals for the
coefficients. The confidence intervals are related to the p-
values such that the coefficient will not be statistically
significant if the confidence interval includes 0. These
confidence intervals can help the researcher to put the estimate
from the coefficient into perspective by seeing how much the
value could vary.
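The relationship between the confidence interval and significance can be sketched as follows. This Python example uses a normal approximation for the critical value, which is an assumption made here for simplicity (SPSS uses the t distribution with the residual degrees of freedom). The 0.837 coefficient is from the text, while the standard error of 0.35 is hypothetical.

```python
from statistics import NormalDist


def coefficient_interval(b, std_error, confidence=0.95):
    """Approximate confidence interval b +/- z * SE using the normal distribution."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return b - z * std_error, b + z * std_error


# 0.837 is quoted in the text; the standard error 0.35 is a hypothetical value.
lower, upper = coefficient_interval(0.837, 0.35)
significant = not (lower <= 0 <= upper)  # zero outside the interval
print(round(lower, 3), round(upper, 3), significant)
```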

Discriminant Analysis

Discriminant analysis builds a predictive model for group

membership. The model is composed of a discriminant function
(or, for more than two groups, a set of discriminant functions)
based on linear combinations of the predictor variables that
provide the best discrimination between the groups. The
functions are generated from a sample of cases for which group
membership is known; the functions can then be applied to new
cases that have measurements for the predictor variables but
have unknown group membership.

Example. On average, people in rural areas think about the
possibility for more social interaction/ neighbourhood
relationships than people in the urban, and a greater proportion
of the people in the rural areas are farmers. A real estate
researcher wants to combine this information into a function to
determine how well an individual can discriminate between
the two groups of areas. The researcher thinks that population
size and economic information, social factors may also be

Discriminant analysis allows him to estimate coefficients of

the linear discriminant function, which looks like the right side
of a multiple linear regression equation. That is, using
coefficients a, b, c, and d, the function is:

D = a * occupation + b * No of family members + c * level of

education + d * income per month

If these variables are useful for discriminating between the
two areas, the values of D will differ for the urban
and rural areas. If the researcher uses a stepwise variable
selection method, he may find that he does not need to
include all four variables in the function.
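A minimal Python sketch of how a discriminant score might be computed and used is given below. The coefficient values, the coding of the variables and the group mean scores are all hypothetical, and assigning a case to the group with the nearest mean score is a simplification that ignores prior probabilities.

```python
def discriminant_score(weights, case):
    """D = a*x1 + b*x2 + c*x3 + d*x4 for one case."""
    return sum(w * case[name] for name, w in weights.items())


def classify(score, group_means):
    """Assign the group whose mean discriminant score is closest to the case score."""
    return min(group_means, key=lambda group: abs(score - group_means[group]))


# Hypothetical coefficients a, b, c, d and hypothetical group mean scores.
weights = {"occupation": 1.2, "family_members": 0.4,
           "education": -0.3, "income": -0.00001}
group_means = {"rural": 2.0, "urban": -1.0}

# Hypothetical case: farmer (coded 1), 5 family members, education level 2.
case = {"occupation": 1, "family_members": 5, "education": 2, "income": 40000}
d = discriminant_score(weights, case)
print(round(d, 2), classify(d, group_means))
```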

Statistics. For each variable: means, standard deviations,

univariate ANOVA. For each analysis: Box's M, within-
groups correlation matrix, within-groups covariance matrix,
separate-groups covariance matrix, total covariance matrix.
For each canonical discriminant function: eigenvalue,

percentage of variance, canonical correlation, Wilks' lambda,
chi-square. For each step: prior probabilities, Fisher's function
coefficients, unstandardized function coefficients, Wilks'
lambda for each canonical function.

Data. The grouping variable must have a limited number of
distinct categories, coded as integers. Independent variables
that are nominal must be recoded to dummy or contrast
variables.

Assumptions. Cases should be independent. Predictor

variables should have a multivariate normal distribution, and
within-group variance-covariance matrices should be equal
across groups. Group membership is assumed to be mutually
exclusive (that is, no case belongs to more than one group) and
collectively exhaustive (that is, all cases are members of a
group). The procedure is most effective when group
membership is a truly categorical variable; if group
membership is based on values of a continuous variable (for
example, high Land Price versus low Land Price), consider
using linear regression to take advantage of the richer
information that is offered by the continuous variable itself.

Logistic Regression
A mortgage bank needs to know the loan repayment capacity of
its mortgage loan customers. Accordingly, data were
collected on 250 customers.

Loan repayment capacity is a dichotomous variable with a
Yes or No answer. Age, income, education, employment and
other loans are the predictor variables in the model.

This part of the output describes the cases that were included
in and excluded from the analysis, the coding of the dependent
variable, and the coding of any categorical variables listed on
the categorical subcommand.
N - This is the number of cases in each category (e.g., included
in the analysis, missing, total).
Percent - This is the percent of cases in each category (e.g.,
included in the analysis, missing, total).
Included in Analysis - This row gives the number and percent
of cases that were included in the analysis. Because there are no
missing data in this data set, this also corresponds to the total
number of cases.
Missing Cases - This row gives the number and percent of
missing cases. By default, SPSS logistic regression does a
listwise deletion of missing data. This means that if there is a
missing value for any variable in the model, the entire case will
be excluded from the analysis.
Total - This is the sum of the cases that were included in the
analysis and the missing cases. In this example, 250 + 0 = 250.

Block 0: Beginning Block

This part of the output describes a "null model", which is a
model with no predictors and just the intercept. This is why you
will see all of the variables that you put into the model in the
table titled "Variables not in the Equation".

Step 0 - SPSS allows you to have different steps in your logistic
regression model. The difference between the steps is the
predictors that are included. This is similar to blocking
variables into groups and then entering them into the equation
one group at a time. By default,
SPSS logistic regression is run in two steps. The first step,
called Step 0, includes no predictors and just the intercept.
Often, this model is not interesting to researchers.
Observed - This indicates the number of 0's and 1's that are
observed in the dependent variable.
Predicted - In this null model, SPSS has predicted that all cases
are 0 on the dependent variable.
Overall Percentage - This gives the percent of cases for which
the dependent variable was correctly predicted given the
model. In this part of the output, this is the null model:
191/250 = 76.4%.
B - This is the coefficient for the constant (also called the
"intercept") in the null model.
S.E. - This is the standard error around the coefficient for the
constant.

Wald and Sig. - This is the Wald chi-square test that tests the
null hypothesis that the constant equals 0. This hypothesis is
rejected because the p-value (listed in the column called "Sig.")
is smaller than the critical p-value of .05 (or .01). Hence, we
conclude that the constant is not 0. Usually, this finding is not
of interest to researchers.
df - This is the degrees of freedom for the Wald chi-square test.
There is only one degree of freedom because there is only one
predictor in the model, namely the constant.
Exp(B) - This is the exponentiation of the B coefficient, which
is an odds ratio. This value is given by default because odds
ratios can be easier to interpret than the coefficient, which is in
log-odds units. Here it is the odds of a Yes answer:
59/191 = 0.31.
Score and Sig. - This is a Score test that is used to predict
whether or not an independent variable would be significant in
the model. The p-values are located in the column labelled
"Sig.".
df - This column lists the degrees of freedom for each variable.
Each variable to be entered into the model (age, income,
education, employment and other loans) has one degree of
freedom, which leads to the total of five shown at the bottom of
the column.
Overall Statistics - This shows the result of including all of the
predictors into the model.
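The null-model figures quoted above can be reproduced by hand. The short Python sketch below uses only the observed counts given in the text (59 Yes and 191 No out of 250 customers); it relies on the standard result that, in an intercept-only logistic model, Exp(B) equals the observed odds and B is its natural logarithm.

```python
import math

n_yes, n_no = 59, 191            # observed counts quoted in the text
n_total = n_yes + n_no           # 250 customers

odds_yes = n_yes / n_no          # Exp(B) for the constant
b_constant = math.log(odds_yes)  # B for the constant, in log-odds units

# The null model predicts the larger category ("No") for every case,
# so its overall percentage is the share of that category.
null_accuracy = max(n_yes, n_no) / n_total

print(f"Exp(B) = {odds_yes:.3f}")
print(f"B = {b_constant:.3f}")
print(f"Overall percentage = {100 * null_accuracy:.1f}")
```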
Block 1: Method = Enter
The section contains what is frequently the most interesting part
of the output: the overall test of the model (in the "Omnibus

Tests of Model Coefficients" table) and the coefficients and
odds ratios (in the "Variables in the Equation" table).

Step 1 - This is the first step (or model) with predictors in it. In
this case, it is the full model that we specified in the logistic
regression command. You can have more steps if you do
stepwise or use blocking of variables.
Chi-square and Sig. - This is the chi-square statistic and its
significance level. In this example, the statistics for the Step,
Model and Block are the same because we have not used
stepwise logistic regression or blocking. The value given in the
Sig. column is the probability of obtaining the chi-square
statistic given that the null hypothesis is true. In other words,
this is the probability of obtaining this chi-square statistic
(37.635) if there is in fact no effect of the independent
variables, taken together, on the dependent variable. This is, of
course, the p-value, which is compared to a critical value,
perhaps .05 or .01, to determine if the overall model is
statistically significant. In this case, the model is statistically
significant because the p-value (displayed as .000) is less
than .001.
df - This is the number of degrees of freedom for the model.
There is one degree of freedom for each predictor in the model.
In this example, we have five predictors: age, income,
education, employment and other loans

-2 Log likelihood - This is the -2 log likelihood for the final
model. By itself, this number is not very informative.
However, it can be used to compare nested (reduced) models.
Cox & Snell R Square and Nagelkerke R Square - These are
pseudo R-squares. Logistic regression does not have an
equivalent to the R-squared that is found in OLS regression;
however, many people have tried to come up with one. There
are a wide variety of pseudo-R-square statistics (these are only
two of them). Because these statistics do not mean what
R-squared means in OLS regression (the proportion of variance
explained by the predictors), they should be interpreted with
great caution.
Observed - This indicates the number of 0's and 1's that are
observed in the dependent variable.
Predicted - These are the predicted values of the dependent
variable based on the full logistic regression model. This table
shows how many cases are correctly predicted (184 cases are
observed to be No and are correctly predicted to be No; 50
cases are observed to be Yes and are correctly predicted to be
Yes), and how many cases are not correctly predicted (7 cases
are observed to be No but are predicted to be Yes; 9 cases are
observed to be Yes but are predicted to be No).
Overall Percentage - This gives the overall percent of cases
that are correctly predicted by the model. As you can see, this
percentage has increased from 76.4 for the null model to 77.2
for the full model.
B - These are the values for the logistic regression equation for
predicting the dependent variable from the independent
variables. They are in log-odds units. Similar to OLS
regression, the prediction equation is

log(p/(1-p)) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x5

where p is the probability of being able to repay the loan.
Expressed in terms of the variables used in this example, the
logistic regression equation is

log(p/(1-p)) = 0.02 - 0.03*age + 0.24*income - 0.412*education
               - 0.257*employment + 0.154*other loans
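To see how the fitted equation turns into a predicted probability, the following Python sketch plugs the coefficients quoted above into the logistic function. The customer record is hypothetical, and the coding and units of each predictor are assumptions, since they are not stated in the example.

```python
import math

# Coefficients taken from the equation in the text.
INTERCEPT = 0.02
COEFFICIENTS = {
    "age": -0.03,
    "income": 0.24,
    "education": -0.412,
    "employment": -0.257,
    "other_loans": 0.154,
}


def log_odds(customer):
    """Linear predictor: log(p/(1-p)) for one customer."""
    return INTERCEPT + sum(COEFFICIENTS[k] * customer[k] for k in COEFFICIENTS)


def probability(customer):
    """Convert log-odds to the probability p of being able to repay."""
    return 1.0 / (1.0 + math.exp(-log_odds(customer)))


def odds_ratios():
    """Exp(B) for each predictor: change in odds per one-unit increase."""
    return {k: math.exp(b) for k, b in COEFFICIENTS.items()}


# Hypothetical customer; the coding of each variable is assumed.
customer = {"age": 40, "income": 6, "education": 2,
            "employment": 1, "other_loans": 0}

print("log-odds:", round(log_odds(customer), 3))
print("probability:", round(probability(customer), 3))
print("odds ratios:", {k: round(v, 3) for k, v in odds_ratios().items()})
```

A positive B gives an odds ratio above 1 (the odds rise as the predictor increases), and a negative B gives an odds ratio below 1.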

Factor Analysis

In a questionnaire, several questions might be included to
measure one factor. For example, five questions can be put to
respondents to capture their perception of the impact of the
2008 economic crisis on the real estate property market in Sri
Lanka. All five questions are intended to measure one
factor: “impact of the 2008 economic crisis on the real estate
property market”.

Factor analysis attempts to identify underlying variables, or

factors, that explain the pattern of correlations within a set of
observed variables. Factor analysis is often used in data
reduction to identify a small number of factors that explain
most of the variance that is observed in a much larger number
of manifest variables. Factor analysis can also be used to
generate hypotheses regarding causal mechanisms or to screen
variables for subsequent analysis (for example, to identify
collinearity prior to performing a linear regression analysis).

The factor analysis procedure offers a high degree of
flexibility:

• Seven methods of factor extraction are available.

• Five methods of rotation are available, including direct
oblimin and promax for nonorthogonal rotations.

• Three methods of computing factor scores are available,
and scores can be saved as variables for further analysis.

Example. What underlying attitudes lead people to respond to

the questions on a real estate market related survey as they do?
Examining the correlations among the survey items reveals
that there is significant overlap among various subgroups of
items--questions about property taxes tend to correlate with
each other, questions about property ownership issues
correlate with each other, and so on.

With factor analysis, you can investigate the number of
underlying factors and, in many cases, identify what the
factors represent conceptually. Additionally, you can compute
factor scores for each respondent, which can then be used in
subsequent analyses. For example, you might build a logistic
regression model to predict voting behavior based on factor
scores.

Statistics. For each variable: number of valid cases, mean, and

standard deviation. For each factor analysis: correlation matrix
of variables, including significance levels, determinant, and
inverse; reproduced correlation matrix, including anti-image;
initial solution (communalities, eigenvalues, and percentage of
variance explained); Kaiser-Meyer-Olkin measure of sampling
adequacy and Bartlett's test of sphericity; unrotated solution,
including factor loadings, communalities, and eigenvalues;
and rotated solution, including rotated pattern matrix and

transformation matrix. For oblique rotations: rotated pattern
and structure matrices; factor score coefficient matrix and
factor covariance matrix. Plots: scree plot of eigenvalues and
loading plot of first two or three factors.

Data. The variables should be quantitative at the interval or

ratio level. Categorical data (such as education level or
property type) are not suitable for factor analysis. Data for
which Pearson correlation coefficients can sensibly be
calculated should be suitable for factor analysis.

Assumptions. The data should have a bivariate normal

distribution for each pair of variables, and observations should
be independent. The factor analysis model specifies that
variables are determined by common factors (the factors
estimated by the model) and unique factors (which do not
overlap between observed variables); the computed estimates
are based on the assumption that all unique factors are
uncorrelated with each other and with the common factors.

In this example we have included many options, including the

scree plot and the plot of the rotated factors. While you may
not wish to use all of these options, we have included them here
to aid in the explanation of the analysis. We have also created a
page of annotated output for a principal components analysis
that parallels this analysis.

Mean - These are the means of the variables used in the factor
analysis.
Std. Deviation - These are the standard deviations of the
variables used in the factor analysis.
Analysis N - This is the number of cases used in the factor
analysis.

Kaiser-Meyer-Olkin Measure of Sampling Adequacy - This
measure varies between 0 and 1, and values closer to 1 are
better. A value of .6 is a suggested minimum.
Bartlett's Test of Sphericity - This tests the null hypothesis
that the correlation matrix is an identity matrix. An identity
matrix is a matrix in which all of the diagonal elements are 1
and all off-diagonal elements are 0. You want to reject this null
hypothesis.
Taken together, these tests provide a minimum standard which
should be passed before a factor analysis (or a principal
components analysis) is conducted.

Communalities - This is the proportion of each variable's
variance that can be explained by the factors (e.g., the
underlying latent continua). It is also denoted h² and can be
defined as the sum of squared factor loadings for the variable.
Initial - With principal axis factoring, the initial values
on the diagonal of the correlation matrix are determined by the
squared multiple correlation of the variable with the other
variables.
Extraction - The values in this column indicate the proportion
of each variable's variance that can be explained by the retained
factors. Variables with high values are well represented in the
common factor space, while variables with low values are not
well represented. (In this example, we don't have any
particularly low values.) They are the reproduced variances
from the factors that you have extracted. You can find these
values on the diagonal of the reproduced correlation matrix.
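The definition of a communality as a sum of squared loadings can be checked with a small calculation. In the Python sketch below the variable names and loadings are hypothetical and assume an orthogonal two-factor solution; they are not taken from the example output.

```python
# Hypothetical loadings of three variables on two retained factors.
LOADINGS = {
    "location": (0.80, 0.30),
    "price": (0.70, 0.10),
    "shape": (0.20, 0.60),
}


def communality(loadings):
    """h^2: sum of squared loadings of one variable across the factors."""
    return sum(value ** 2 for value in loadings)


communalities = {name: communality(row) for name, row in LOADINGS.items()}

for name, h2 in communalities.items():
    # 1 - h^2 is the share of variance left to the unique factor.
    print(f"{name}: h2 = {h2:.2f}, unique variance = {1 - h2:.2f}")
```

A variable with a low h² is poorly represented in the common factor space and is a candidate for closer inspection.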

Factor - The initial number of factors is the same as the number

of variables used in the factor analysis. However, not all 7
factors will be retained. In this example, only the first two
factors will be retained.
Initial Eigenvalues - Eigenvalues are the variances of the
factors. Because we conducted our factor analysis on the
correlation matrix, the variables are standardized, which means
that each variable has a variance of 1, and the total variance
is equal to the number of variables used in the analysis, in this
case, 7.
Total - This column contains the eigenvalues. The first factor
will always account for the most variance (and hence have the
highest eigenvalue), and the next factor will account for as
much of the left over variance as it can, and so on. Hence, each
successive factor will account for less and less variance.
% of Variance - This column contains the percent of total
variance accounted for by each factor.
Cumulative % - This column contains the cumulative
percentage of variance accounted for by the current and all
preceding factors. For example, the second row shows a value
of 71.81. This means that the first two factors together account
for 71.81% of the total variance.
Extraction Sums of Squared Loadings - The number of rows
in this panel of the table corresponds to the number of factors
retained. In this example, we requested that two factors be
retained, so there are two rows, one for each retained factor.
The values in this panel of the table are calculated in the same
way as the values in the left panel, except that here the values
are based on the common variance. The values in this panel of
the table will always be lower than the values in the left panel
of the table, because they are based on the common variance,
which is always smaller than the total variance.

Rotation Sums of Squared Loadings - The values in this
panel of the table represent the distribution of the variance after
the varimax rotation. Varimax rotation tries to maximize the
variance of each of the factors, so the total amount of variance
accounted for is redistributed over the extracted factors.
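The "% of Variance" and "Cumulative %" columns follow directly from the eigenvalues. The Python sketch below uses seven hypothetical eigenvalues (chosen to sum to 7, as they must for seven standardized variables); they are not the values from the example output. The eigenvalue-greater-than-one rule shown at the end is a common retention rule and is an assumption here, not necessarily the rule used in the example.

```python
# Hypothetical eigenvalues for 7 standardized variables (they sum to 7).
eigenvalues = [3.2, 1.8, 0.7, 0.5, 0.4, 0.25, 0.15]
total_variance = sum(eigenvalues)

# Percent of total variance accounted for by each factor.
percent = [100 * ev / total_variance for ev in eigenvalues]

# Running total of the percentages.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

# Common rule of thumb (assumed here): keep factors with eigenvalue > 1.
retained = sum(1 for ev in eigenvalues if ev > 1)

for i, (ev, p, c) in enumerate(zip(eigenvalues, percent, cumulative), 1):
    print(f"Factor {i}: total={ev:.2f}  % of variance={p:.2f}  cumulative %={c:.2f}")
print("Factors with eigenvalue > 1:", retained)
```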

The scree plot graphs the eigenvalue against the factor number.
You can see these values in the first two columns of the table
immediately above. From the third factor on, you can see that
the line is almost flat, meaning that each successive factor is
accounting for smaller and smaller amounts of the total
variance.

Rotated Factor Matrix - This table contains the rotated factor
loadings (factor pattern matrix), which represent both how the
variables are weighted for each factor and the correlation
between the variables and the factor. Because these are
correlations, possible values range from -1 to +1.
For orthogonal rotations, such as varimax, the factor pattern and
factor structure matrices are the same.
Factor - The columns under this heading are the rotated factors
that have been extracted. These are the factors that analysts are
most interested in and try to name.

Factor Transformation Matrix - This is the matrix by which
you multiply the unrotated factor matrix to get the rotated factor
matrix.

The plot above shows the items (variables) in the rotated factor
space. While this picture may not be particularly helpful on its
own, when you get this graph in the SPSS output you can
interactively rotate it. This may help you to see how the items
(variables) are organized in the common factor space.
Accordingly, Shape lies slightly apart from the other variables.

Chapter 04
Real Estate Analysis- Validity and Reliability

As real estate research is a part of social science research,
the estimation of reliability and validity is a task that is
frequently encountered.

Measurement issues differ in the social sciences in that they are

related to the quantification of abstract, intangible and
unobservable constructs. In many instances, then, the meaning
of quantities is only inferred.

Most concepts in the behavioural sciences have meaning within

the context of the theory that they are a part of. Each concept,
thus, has an operational definition which is governed by the
central theory. If a concept is involved in the testing of
hypothesis to support the theory it has to be measured. So the
first decision that the researcher is faced with is “how shall the
concept be measured?”, that is, the type of measure. At a very
broad level the type of measure can be observational, self-
report, interview, etc. These types ultimately take shape of a
more specific form like observation of ongoing activity,
observing video-taped events, self-report measures like
questionnaires that can be open-ended or close-ended, Likert-
type scales, interviews that are structured, semi-structured or
unstructured and open-ended or close-ended.

Each type of measure has specific types of issues that need to
be addressed to make the measurement meaningful, accurate,
and efficient.

Another important feature is the population for which

the measure is intended. This decision is not entirely dependent
on the theoretical paradigm but more to the immediate research
question at hand.

A third point that needs mentioning is the purpose of the

scale or measure. What is it that the researcher wants to do
with the measure? Is it developed for a specific study or is it
developed with the anticipation of extensive use with similar

Once some of these decisions are made and a measure is

developed, which is a careful and tedious process, the relevant
questions to raise are “how do we know that we are indeed
measuring what we want to measure?” since the construct that
we are measuring is abstract, and “can we be sure that if we
repeated the measurement we will get the same result?”. The
first question is related to validity and second to reliability.
Validity and reliability are two important characteristics of
behavioural measures and are referred to as psychometric
properties.

It is important to bear in mind that validity and
reliability are not an all-or-none issue but a matter of
degree.
Validity of Research Outcomes

Very simply, validity is the extent to which a test
measures what it is supposed to measure. The question of
validity is raised in the context of the three points made above,
the form of the test, the purpose of the test and the population
for whom it is intended. Therefore, it is not possible to ask the
general question “Is this a valid test?” The question to ask is
“how valid is this test for the decision that I need to make?” or
“how valid is the interpretation I propose for the test?” It is
possible to divide the types of validity into logical and empirical.

Content Validity:
When a researcher wants to find out whether the entire content
of the behaviour/construct/area is represented in the test, the
test tasks are compared with the content of the behaviour. This
is a logical method, not an empirical one.

For example, if a researcher wants to analyse the buying
behaviour of Sri Lankan residential property buyers, it is not
appropriate simply to reuse questions or perception items from
the literature or from instruments used in European countries,
because the Sri Lankan context differs, to a greater or lesser
degree, from the European and other contexts.

Face Validity:
Basically face validity refers to the degree to which a
test appears to measure what it purports to measure.
Researchers need to maintain/understand the link among
research problem, objective/s, hypothesis and statistical test/s

Criterion-Oriented or Predictive Validity:

When a researcher wants to predict future land sales from
land sales records obtained currently by the measure, the
current records are correlated with the future sales. The
future land sales are called the criterion and the current land
sales records are the predictor. This is an empirical check on the
value of the test: a criterion-oriented or predictive validation.

Concurrent Validity:
As an example in real estate studies, concurrent validity
is the degree to which the records on a land sale are related to
the records on another, already established, sale that occurred
at the same time, or to some other valid criterion available at
the same time. Logically, predictive and concurrent validation
are the same; the term concurrent validation is used to indicate
that no time elapsed between measures.

Construct Validity:
Construct validity is the degree to which a test measures
an intended hypothetical construct. Many times psychologists
assess/measure abstract attributes or constructs. The process of

validating the interpretations about that construct as indicated
by the test score is construct validation.

Reliability of Research Outcomes

Every research study requires dependable measurement.
Measurements are reliable to the extent that they are
repeatable; any random influence that tends to make
measurements differ from occasion to occasion or from
circumstance to circumstance is a source of measurement error.

Reliability is the degree to which a test consistently

measures whatever it measures. Errors of measurement that
affect reliability are random errors and errors of measurement
that affect validity are systematic or constant errors.

Reliability Analysis

Reliability analysis allows you to study the properties of

measurement scales and the items that compose the scales.
The Reliability Analysis procedure calculates a number of
commonly used measures of scale reliability and also provides
information about the relationships between individual items
in the scale. Intra class correlation coefficients can be used to
compute inter-rater reliability estimates.

Example. Does my questionnaire measure customer

satisfaction in a useful way? Using reliability analysis, you can
determine the extent to which the items in your questionnaire

are related to each other, you can get an overall index of the
repeatability or internal consistency of the scale as a whole,
and you can identify problem items that should be excluded
from the scale.

Statistics. Descriptives for each variable and for the scale,
summary statistics across items, inter-item correlations and
covariances, reliability estimates, ANOVA table, intraclass
correlation coefficients, Hotelling's T², and Tukey's test of
additivity.

Models. The following models of reliability are available:

• Alpha (Cronbach). This model is a model of internal

consistency, based on the average inter-item correlation.

• Split-half. This model splits the scale into two parts and
examines the correlation between the parts.

• Guttman. This model computes Guttman's lower bounds

for true reliability.

• Parallel. This model assumes that all items have equal

variances and equal error variances across replications.

• Strict parallel. This model makes the assumptions of the
Parallel model and also assumes equal means across items.
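As an illustration of the Alpha model listed above, Cronbach's alpha can be computed by hand from a small table of item scores. The Python sketch below uses the usual variance form of the coefficient, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), which is a choice made here for simplicity; the respondents' answers are hypothetical.

```python
from statistics import variance

# Hypothetical answers of 5 respondents to 3 Likert-type items.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 2, 3],
    [4, 4, 4],
]


def cronbach_alpha(rows):
    """Cronbach's alpha from a list of respondents' item scores."""
    k = len(rows[0])                          # number of items
    items = list(zip(*rows))                  # one tuple of scores per item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variances / total_variance)


alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Items that move together across respondents push the variance of the total score well above the sum of the item variances, which is what drives alpha towards 1.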

Data. Data can be dichotomous, ordinal, or interval, but the
data should be coded numerically.

Assumptions. Observations should be independent, and errors

should be uncorrelated between items. Each pair of items
should have a bivariate normal distribution. Scales should be
additive, so that each item is linearly related to the total score.

Related procedures. If you want to explore the

dimensionality of your scale items (to see whether more than
one construct is needed to account for the pattern of item
scores), use factor analysis or multidimensional scaling. To
identify homogeneous groups of variables, use hierarchical
cluster analysis to cluster variables.

Test-retest, equivalent forms and split-half reliability are

all determined through correlation.

Test-retest Reliability:
Test-retest reliability is the degree to which
outcomes/results are consistent over time. It indicates
outcomes/results variation that occurs from
surveying/investigation to surveying/investigation as a result of
errors of measurement.

Split-Half Reliability:
Split-half reliability requires only one administration of the
test and is especially appropriate when the test is very long.
The most commonly used method of splitting the test into two is
the odd-even strategy. Since longer tests tend to be more
reliable, and since split-half reliability represents the
reliability of a test only half as long as the actual test, a
correction formula, the Spearman-Brown prophecy formula, must be
applied to the coefficient. Split-half reliability is a form of
internal consistency reliability.
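The odd-even split and the Spearman-Brown correction described above can be sketched as follows. For two halves of equal length the correction is r_full = 2r / (1 + r), where r is the correlation between the two half-test scores. The response data are hypothetical.

```python
from math import sqrt

# Hypothetical answers of 5 respondents to a 4-item scale.
responses = [
    [4, 5, 4, 5],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 4],
]


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Step the half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


# Odd-even strategy: items 1, 3, ... versus items 2, 4, ...
odd_scores = [sum(row[0::2]) for row in responses]
even_scores = [sum(row[1::2]) for row in responses]

r_half = pearson(odd_scores, even_scores)
r_full = spearman_brown(r_half)
print(f"half-test correlation = {r_half:.3f}")
print(f"Spearman-Brown corrected reliability = {r_full:.3f}")
```

For any positive half-test correlation below 1, the corrected coefficient is larger than the uncorrected one, reflecting the greater reliability of the full-length test.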

Internal Consistency Reliability:

Internal consistency reliability is determined by how all items
on the test relate to all other items. The Kuder-Richardson
formula is an estimate of reliability that is essentially
equivalent to the average of the split-half reliabilities
computed for all possible halves.

In research, for every dimension of interest and specific
question or set of questions, there are a vast number of ways to
word the questions. Although the guiding principle should be the
specific purposes of the research, there are better and worse
questions for any particular operationalization. How should the
measures be evaluated?
Two of the primary criteria of evaluation in any
measurement or observation are:

• Whether we are measuring what we intend to measure.

• Whether the same measurement process yields the same
results.

These two concepts are validity and reliability.

Reliability is concerned with questions of stability and
consistency - does the same measurement tool yield stable and
consistent results when repeated over time?

Example: To calculate the extent of a piece of land, a tape
measure is a highly reliable measuring instrument. Suppose the
extent of the land is 12 perches. You measure it once with the
tape measure and get an extent of 12 perches. Measure it again
and you get 12 perches. Measure it repeatedly and you
consistently get a measurement of 12 perches. The tape measure
yields reliable results.

Validity refers to the extent to which we are measuring what we
hope to measure (and what we think we are measuring). To find
out the extent of a piece of land, a measuring tape that has been
created with accurate spacing for inches, feet, etc. should yield
valid results as well. Measuring this piece of land with a "good"
measuring tape should produce a correct measurement of the
land extent.

These concepts are applicable in social research: we
want to use measurement tools that are both reliable and valid.
We want questions that yield consistent responses when asked
multiple times - this is reliability. Similarly, we want questions
that get accurate responses from respondents - this is validity.

Validity and Reliability Convergence

Source: Kendell, R; Jablensky, A (2003).

Bibliography

1. Ariyawansa R G (2009) Management of Real Estate:
Principles of Real Estate Development & Management
Published by the author, Colombo
2. Ariyawansa R G (2008) “Issues in property development:
Tenure & Ownership, Informality and Price Formation in
Property Markets in Developing Cities”, Published by the
author, Colombo
3. Ariyawansa R G (2008) “Property Market in Colombo:
Evolution and Success”, Published by the author, Colombo
4. Ariyawansa R G (2009) “Housing Market: A Review of
Purchase Decision of Potential Buyers”, published by the
author, Colombo
5. Berk, R., 1979. Generalizability of Behavioral
Observations: A Clarification of Interobserver Agreement
and Interobserver Reliability. American Journal of Mental
Deficiency, Vol. 83, No. 5, p. 460-472.
6. Cronbach, L., 1990. Essentials of psychological testing.
Harper & Row, New York.
7. Cadman, D & Austini, L (1981), Property Development. E
& F.N. Spon Ltd. U.S.A,
8. Charles F. Floyd and Marcus T Allen (1993), “Real Estate
Principles”, 4th Edition, Real Estate Education Company,
9. Carmines, E., and Zeller, R., 1979. Reliability and Validity
Assessment. Sage Publications, Beverly Hills, California.

10. Fillmore W. Galaty, Wellington J. Allaway, Robert C. Kyle
( 2006) “Modern Real Estate Practice”, 17th Edition,
Dearborn Real Estate Education, USA
11. Gay, L., 1987. Educational research: competencies for
analysis and application. Merrill Pub. Co., Columbus.
12. Guilford, J., 1954. Psychometric Methods. McGraw-Hill,
New York.
13. Introduction to SAS. UCLA: Statistical Consulting Group.
From http://www.ats.ucla.edu/stat/sas/notes/ (accessed
November 24, 2015).
14. John B. Corgel, David C. Ling, Halbert C. Smith (2001),
“Real Estate Perspectives: An Introduction to Real Estate”,
4th Edition, McGraw-Hill Higher Education, USA
15. Kendell, R; Jablensky, A (2003). "Distinguishing between
the validity and utility of psychiatric diagnoses". The
American Journal of Psychiatry 160 (1): 4–12
16. Kotler P (1999), “Marketing Management: The Millennium
Edition”, Prentice Hall, New Delhi
17. Mike E Miles, Richard L Haney, Gayle Berens (1995),
“Real Estate Development: Principle and Process”, 2nd
Edition, urban land institute
18. Nunnally, J., 1978. Psychometric Theory. McGraw-Hill,
New York.
19. Stapleton T (1986), “Estate Management Practice”, 2nd
Edition, The eastern gazette ltd, London
20. Winer, B., Brown, D., and Michels, K., 1991. Statistical
Principles in Experimental Design, Third Edition.
McGraw-Hill, New York.


Glossary

Adjusted R² used in multiple regression when n and k are
approximately equal, to provide a more realistic value of R²

Alpha the probability of a type I error, represented by the
Greek letter α

Alternative hypothesis a statistical hypothesis that states a
difference between a parameter and a specific value or states
that there is a difference between two parameters

Analysis of variance (ANOVA) a statistical technique used to
test a hypothesis concerning the means of three or more
populations
ANOVA summary table the table used to summarize the

results of an ANOVA test

Bayes’ theorem a theorem that allows you to compute the

revised probability of an event that occurred before another
event when the events are dependent

Beta the probability of a type II error, represented by the Greek
letter β

Between-group variance a variance estimate using the means

of the groups or between the groups in an F test

Biased sample a sample for which some type of systematic
error has been made in the selection of subjects for the sample

Bimodal a data set with two modes

Binomial distribution the outcomes of a binomial experiment

and the corresponding probabilities of these outcomes

Binomial experiment a probability experiment in which each

trial has only two outcomes, there are a fixed number of trials,
the outcomes of the trials are Independent, and the probability
of success remains the same for each trial

Boxplot a graph used to represent a data set when the dataset

contains a small number of values

Categorical frequency distribution a frequency distribution

used when the data are categorical (nominal)

Central limit theorem a theorem that states that as the sample
size increases, the shape of the distribution of the sample means
taken from the population with mean μ and standard deviation
σ will approach a normal distribution; the distribution will have
a mean μ and a standard deviation σ/√n

Chebyshev’s theorem a theorem that states that the proportion
of values from a data set that fall within k standard deviations
of the mean will be at least 1 − 1/k², where k is a number
greater than 1
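
The bound in Chebyshev’s theorem can be computed directly. The short sketch below is illustrative only; the function name is an assumption and does not appear in the text.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


# At least 75% of values lie within 2 standard deviations of the mean,
# and at least about 88.9% lie within 3, whatever the shape of the data.
for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3))
```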

Chi-square distribution a probability distribution obtained
from the values of (n − 1)s²/σ² when random samples are
selected from a normally distributed population whose variance
is σ²

Class boundaries the upper and lower values of a class for a
grouped frequency distribution whose values have one
additional decimal place more than the data and end in the
digit 5

Class midpoint a value for a class in a frequency distribution
obtained by adding the lower and upper class boundaries (or
the lower and upper class limits) and dividing by 2

Class width the difference between the upper class boundary
and the lower class boundary for a class in a frequency
distribution

Classical probability the type of probability that uses sample
spaces to determine the numerical probability that an event will
occur

Cluster sample a sample obtained by selecting a preexisting or
natural group, called a cluster, and using the members in the
cluster for the sample

Coefficient of determination a measure of the variation of the
dependent variable that is explained by the regression line and
the independent variable; the ratio of the explained variation to
the total variation

Coefficient of variation the standard deviation divided by
the mean; the result is expressed as a percentage
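
A minimal sketch of the coefficient of variation follows. It assumes the sample standard deviation (divisor n − 1) is intended; the function name and the rent figures are hypothetical.

```python
import statistics


def coefficient_of_variation(values) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical monthly rents (in thousands) for five comparable properties.
rents = [10, 12, 14, 16, 18]
print(round(coefficient_of_variation(rents), 2))
```

Because the result is unit-free, it can be used to compare the variability of data sets measured in different units, such as rents and floor areas.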

Combination a selection of objects without regard to order

Complement of an event the set of outcomes in the sample
space that are not among the outcomes of the event itself

Compound event an event that consists of two or more
outcomes or simple events

Conditional probability the probability that an event B occurs
after an event A has already occurred

Confidence interval a specific interval estimate of a parameter
determined by using data obtained from a sample and the
specific confidence level of the estimate

Confidence level the probability that a parameter lies within
the specified interval estimate of the parameter

Confounding variable a variable that influences the outcome
variable but cannot be separated from the other variables that
influence the outcome variable

Consistent estimator an estimator whose value approaches
the value of the parameter estimated as the sample size
increases

Contingency table data arranged in table form for the
chi-square independence test, with R rows and C columns

Continuous variable a variable that can assume all values
between any two specific values; a variable obtained by
measuring

Control group a group in an experimental study that is not
given any special treatment

Convenience sample sample of subjects used because they
are convenient and available

Correction for continuity a correction employed when a
continuous distribution is used to approximate a discrete
distribution

Correlation a statistical method used to determine whether a
linear relationship exists between variables

Correlation coefficient a statistic or parameter that measures
the strength and direction of a linear relationship between two
variables

Critical or rejection region the range of values of the test
value that indicates that there is a significant difference and the
null hypothesis should be rejected in a hypothesis test

Critical value (C.V.) a value that separates the critical region
from the noncritical region in a hypothesis test

Cumulative frequency the sum of the frequencies
accumulated up to the upper boundary of a class in a
frequency distribution

Data measurements or observations for a variable

Data array a data set that has been ordered

Data set a collection of data values

Data value or datum a value in a data set

Decile a location measure of a data value; it divides the
distribution into 10 groups

Degrees of freedom the number of values that are free to vary
after a sample statistic has been computed; used when a
distribution (such as the t distribution) consists of a family of
curves

Dependent events events for which the outcome or occurrence
of the first event affects the outcome or occurrence of the
second event in such a way that the probability is changed

Dependent samples samples in which the subjects are paired
or matched in some way; i.e., the samples are related

Dependent variable a variable in correlation and regression
analysis that cannot be controlled or manipulated

Descriptive statistics a branch of statistics that consists of the
collection, organization, summarization, and presentation of
data

Discrete variable a variable that assumes values that can be
counted

Disordinal interaction an interaction between variables in
ANOVA, indicated when the graphs of the lines connecting the
means intersect

Distribution-free statistics see nonparametric statistics

Double sampling a sampling method in which a very large
population is given a questionnaire to determine those who
meet the qualifications for a study; the questionnaire is
reviewed, a second smaller population is defined, and a sample
is selected from this group

Empirical probability the type of probability that uses
frequency distributions based on observations to determine
numerical probabilities of events

Empirical rule a rule that states that when a distribution is
bell-shaped (normal), approximately 68% of the data values
will fall within 1 standard deviation of the mean;
approximately 95% of the data values will fall within 2
standard deviations of the mean; and approximately 99.7% of
the data values will fall within 3 standard deviations of the
mean

Equally likely events the events in the sample space that have
the same probability of occurring

Estimation the process of estimating the value of a parameter
from information obtained from a sample

Estimator a statistic used to estimate a parameter

Event outcome of a probability experiment

Expected frequency the frequency obtained by calculation (as
if there were no preference) and used in the chi-square test

Expected value the theoretical average of a variable that has
a probability distribution

Experimental study a study in which the researcher
manipulates one of the variables and tries to determine how the
manipulation influences other variables

Explanatory variable a variable that is being manipulated by
the researcher to see if it affects the outcome variable

Exploratory data analysis the act of analyzing data to
determine what information can be obtained by using stem and
leaf plots, medians, interquartile ranges, and boxplots

Extrapolation use of the equation for the regression line to
predict y′ for a value of x which is beyond the range of the data
values of x

F distribution the sampling distribution of the variances when
two independent samples are selected from two normally
distributed populations in which the variances are equal and the
variances s₁² and s₂² are compared as s₁²/s₂²

F test a statistical test used to compare two variances or three
or more means

Factors the independent variables in ANOVA tests

Finite population correction factor a correction factor used to
correct the standard error of the mean when the sample size is
greater than 5% of the population size

Five-number summary five specific values for a data set that
consist of the lowest and highest values, Q1 and Q3, and the
median

Frequency the number of values in a specific class of a
frequency distribution

Frequency distribution an organization of raw data in table
form, using classes and frequencies

Frequency polygon a graph that displays the data by using
lines that connect points plotted for the frequencies at the
midpoints of the classes

Goodness-of-fit test a chi-square test used to see whether a
frequency distribution fits a specific pattern

Grouped frequency distribution a distribution used when the
range is large and classes of several units in width are needed

Hawthorne effect an effect on an outcome variable caused by
the fact that subjects of the study know that they are
participating in the study

Histogram a graph that displays the data by using vertical bars
of various heights to represent the frequencies of a distribution

Homogeneity of proportions test a test used to determine
the equality of three or more proportions

Hypergeometric distribution the distribution of a variable that
has two outcomes when sampling is done without replacement

Hypothesis testing a decision-making process for evaluating
claims about a population

Independence test a chi-square test used to test the
independence of two variables when data are tabulated in table
form in terms of frequencies

Independent events events for which the probability of the
first occurring does not affect the probability of the second
occurring

Independent samples samples that are not related

Independent variable a variable in correlation and regression
analysis that can be controlled or manipulated

Inferential statistics a branch of statistics that consists of
generalizing from samples to populations, performing
hypothesis testing, determining relationships among variables,
and making predictions

Influential observation an observation which when removed
from the data values would markedly change the position of the
regression line

Interaction effect the effect of two or more variables on each
other in a two-way ANOVA study

Interquartile range Q3 − Q1
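
The interquartile range can be sketched with Python’s standard library. Textbooks and software use slightly different conventions for locating quartiles, so the "inclusive" method chosen here is an assumption and may differ from hand calculations; the function name and data are hypothetical.

```python
import statistics


def interquartile_range(values) -> float:
    """Q3 minus Q1, using the 'inclusive' quartile convention."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical data set of nine ordered values.
print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```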

Interval estimate a range of values used to estimate a
parameter

Interval level of measurement a measurement level that ranks
data and in which precise differences between units of measure
exist. See also nominal, ordinal, and ratio levels of
measurement

Kruskal-Wallis test a nonparametric test used to compare three
or more means

Law of large numbers when a probability experiment is
repeated a large number of times, the relative frequency
probability of an outcome will approach its theoretical
probability

Least-squares line another name for the regression line

Left-tailed test a test used on a hypothesis when the critical
region is on the left side of the distribution

Level a treatment in ANOVA for a variable

Level of significance the maximum probability of committing a
type I error in hypothesis testing

Lower class limit the lower value of a class in a frequency
distribution that has the same decimal place value as the data

Lurking variable a variable that influences the relationship
between x and y, but was not considered in the study

Main effect the effect of the factors or independent variables
when there is a nonsignificant interaction effect in a two-way
ANOVA study

Marginal change the magnitude of the change in the
dependent variable when the independent variable changes 1
unit

Maximum error of estimate the maximum likely difference
between the point estimate of a parameter and the actual value
of the parameter

Mean the sum of the values, divided by the total number of
values

Mean square the variance found by dividing the sum of the
squares of a variable by the corresponding degrees of freedom;
used in ANOVA

Measurement scales a type of classification that tells how
variables are categorized, counted, or measured; the four types
of scales are nominal, ordinal, interval, and ratio

Median the midpoint of a data array

Midrange the sum of the lowest and highest data values,
divided by 2

Modal class the class with the largest frequency

Mode the value that occurs most often in a data set

Monte Carlo method a simulation technique using random
numbers

Multimodal a data set with three or more modes

Multinomial distribution a probability distribution for an
experiment in which each trial has more than two outcomes

Multiple correlation coefficient a measure of the strength of
the relationship between the independent variables and the
dependent variable in a multiple regression study

Multiple regression a study that seeks to determine if several
independent variables are related to a dependent variable

Multiple relationship a relationship in which many variables
are under study

Multistage sampling a sampling technique that uses a
combination of sampling methods

Mutually exclusive events probability events that cannot occur
at the same time

Negative relationship a relationship between variables such
that as one variable increases, the other variable decreases, and
vice versa

Negatively skewed or left-skewed distribution a distribution
in which the majority of the data values fall to the right of the
mean

Nominal level of measurement a measurement level that
classifies data into mutually exclusive (nonoverlapping),
exhaustive categories on which no order or ranking can be
imposed. See also interval, ordinal, and ratio levels of
measurement

Noncritical or nonrejection region the range of values of the
test value that indicates that the difference was probably due to
chance and the null hypothesis should not be rejected

Nonparametric statistics a branch of statistics for use when
the population from which the samples are selected is not
normally distributed and for use in testing hypotheses that do
not involve specific population parameters

Nonrejection region see noncritical region

Normal distribution a continuous, symmetric, bell-shaped
distribution of a variable

Normal quantile plot graphical plot used to determine whether
a variable is approximately normally distributed

Null hypothesis a statistical hypothesis that states that there is
no difference between a parameter and a specific value or that
there is no difference between two parameters

Observational study a study in which the researcher merely
observes what is happening or what has happened in the past
and draws conclusions based on these observations

Observed frequency the actual frequency value obtained from
a sample and used in the chi-square test

Ogive a graph that represents the cumulative frequencies for the
classes in a frequency distribution

One-tailed test a test that indicates that the null hypothesis
should be rejected when the test statistic value is in the critical
region on one side of the mean

One-way ANOVA a study used to test for differences among
means for a single independent variable when there are three or
more groups

Open-ended distribution a frequency distribution that has no
specific beginning value or no specific ending value

Ordinal interaction an interaction between variables in
ANOVA, indicated when the graphs of the lines connecting the
means do not intersect

Ordinal level of measurement a measurement level that
classifies data into categories that can be ranked; however,
precise differences between the ranks do not exist. See also
interval, nominal, and ratio levels of measurement

Outcome the result of a single trial of a probability experiment

Outcome variable a variable that is studied to see if it has
changed significantly due to the manipulation of the
explanatory variable

Outlier an extreme value in a data set; it is omitted from a


Parameter a characteristic or measure obtained by using all the
data values for a specific population

Parametric tests statistical tests for population parameters
such as means, variances, and proportions that involve
assumptions about the populations from which the samples
were selected

Pareto chart chart that uses vertical bars to represent
frequencies for a categorical variable

Pearson product moment correlation coefficient (PPMCC) a
statistic used to determine the strength of a relationship when
the variables are normally distributed

Pearson’s index of skewness value used to determine the
degree of skewness of a variable

Percentile a location measure of a data value; it divides the
distribution into 100 groups

Permutation an arrangement of n objects in a specific order

Pie graph a circle that is divided into sections or wedges
according to the percentage of frequencies in each category of
the distribution

Point estimate a specific numerical value estimate of a
parameter

Poisson distribution a probability distribution used when n is
large and p is small and when the independent variables occur
over a period of time

Pooled estimate of the variance a weighted average of the
variance using the two sample variances and their respective
degrees of freedom as the weights
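
The pooled estimate of the variance described above weights each sample variance by its degrees of freedom. The sketch below is illustrative; the function and parameter names and the example figures are assumptions.

```python
def pooled_variance(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Average of two sample variances, weighted by degrees of freedom."""
    df1, df2 = n1 - 1, n2 - 1
    if df1 + df2 <= 0:
        raise ValueError("the samples must contain more than two values in total")
    return (df1 * s1_sq + df2 * s2_sq) / (df1 + df2)


# Hypothetical samples: variance 4 from 10 values and variance 9 from 15 values.
print(round(pooled_variance(4, 10, 9, 15), 4))
```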

Population the totality of all subjects possessing certain
common characteristics that are being studied

Population correlation coefficient the value of the correlation
coefficient computed by using all possible pairs of data values
(x, y) taken from a population

Positive relationship a relationship between two variables
such that as one variable increases, the other variable increases
or as one variable decreases, the other decreases

Positively skewed or right-skewed distribution a distribution
in which the majority of the data values fall to the left of the
mean

Power of a test the probability of rejecting the null hypothesis
when it is false

Prediction interval a confidence interval for a predicted value

Probability the chance of an event occurring

Probability distribution the values a random variable can
assume and the corresponding probabilities of the values

Probability experiment a chance process that leads to
well-defined results called outcomes

Proportion a part of a whole, represented by a fraction, a
decimal, or a percentage

P-value the actual probability of getting the sample mean value
if the null hypothesis is true

Qualitative variable a variable that can be placed into distinct
categories, according to some characteristic or attribute

Quantiles values that separate the data set into approximately
equal groups

Quantitative variable a variable that is numerical in nature and
that can be ordered or ranked

Quartile a location measure of a data value; it divides the
distribution into four groups

Quasi-experimental study a study that uses intact groups
rather than random assignment of subjects to groups

Random sample a sample obtained by using random or chance
methods; a sample for which every member of the population
has an equal chance of being selected

Random variable a variable whose values are determined by
chance

Range the highest data value minus the lowest data value

Range rule of thumb dividing the range by 4 gives an
approximation of the standard deviation

Ranking the positioning of a data value in a data array
according to some rating scale

Ratio level of measurement a measurement level that
possesses all the characteristics of interval measurement and a
true zero; it also has true ratios between different units of
measure. See also interval, nominal, and ordinal levels of
measurement

Raw data data collected in original form

Regression a statistical method used to describe the nature of
the relationship between variables, that is, a positive or
negative, linear or nonlinear relationship

Regression line the line of best fit of the data

Rejection region see critical region

Relative frequency graph a graph using proportions instead of
raw data as frequencies

Relatively efficient estimator an estimator that has the
smallest variance from among all the statistics that can be used
to estimate a parameter

Residual the difference between the actual value of y and the
predicted value y′ for a specific value of x

Resistant statistic a statistic that is not affected by an
extremely skewed distribution

Right-tailed test a test used on a hypothesis when the critical
region is on the right side of the distribution

Run a succession of identical letters preceded by or followed
by a different letter or no letter at all, such as the beginning or
end of the succession

Runs test a nonparametric test used to determine whether
data are random

Sample a group of subjects selected from the population

Sample space the set of all possible outcomes of a probability
experiment

Sampling distribution of sample means a distribution
obtained by using the means computed from random samples
taken from a population

Sampling error the difference between the sample measure
and the corresponding population measure due to the fact that
the sample is not a perfect representation of the population

Scatter plot a graph of the independent and dependent
variables in regression and correlation analysis

Scheffé test a test used after ANOVA, if the null hypothesis is
rejected, to locate significant differences in the means

Sequence sampling a sampling technique used in quality
control in which successive units are taken from production
lines and tested to see whether they meet the standards set by
the manufacturing company

Sign test a nonparametric test used to test the value of the
median for a specific sample or to test sample means in a
comparison of two dependent samples

Simple event an outcome that results from a single trial of a
probability experiment

Simple relationship a relationship in which only two variables
are under study

Simulation techniques techniques that use probability
experiments to mimic real-life situations

Spearman rank correlation coefficient the nonparametric
equivalent to the correlation coefficient, used when the data are
ranked

Standard deviation the square root of the variance

Standard error of the estimate the standard deviation of the
observed y values about the predicted y values in regression
and correlation analysis

Standard error of the mean the standard deviation of the
sample means for samples taken from the same population

Standard normal distribution a normal distribution for which
the mean is equal to 0 and the standard deviation is equal to 1

Standard score the difference between a data value and the
mean, divided by the standard deviation
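
The standard score can be written out as a one-line calculation. The sketch below is for illustration only; the function name and the example values are assumptions.

```python
def standard_score(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that a value lies from the mean."""
    if std_dev <= 0:
        raise ValueError("the standard deviation must be positive")
    return (value - mean) / std_dev


# Hypothetical example: a value of 85 where the mean is 70 and the
# standard deviation is 10 lies 1.5 standard deviations above the mean.
print(standard_score(85, 70, 10))
```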

Statistic a characteristic or measure obtained by using the data
values from a sample

Statistical hypothesis a conjecture about a population
parameter, which may or may not be true

Statistical test a test that uses data obtained from a sample to
make a decision about whether the null hypothesis should be
rejected

Statistics the science of conducting studies to collect, organize,
summarize, analyze, and draw conclusions from data

Stem and leaf plot a data plot that uses part of a data value as
the stem and part of the data value as the leaf to form groups or
classes

Stratified sample a sample obtained by dividing the population
into subgroups, called strata, according to various
homogeneous characteristics and then selecting members from
each stratum

Subjective probability the type of probability that uses a
probability value based on an educated guess or estimate,
employing opinions and inexact information

Sum of squares between groups a statistic computed in the
numerator of the fraction used to find the between-group
variance in ANOVA

Sum of squares within groups a statistic computed in the
numerator of the fraction used to find the within-group
variance in ANOVA

Symmetric distribution a distribution in which the data values
are uniformly distributed about the mean

Systematic sample a sample obtained by numbering each
element in the population and then selecting every kth number
from the population to be included in the sample

t distribution a family of bell-shaped curves based on degrees
of freedom, similar to the standard normal distribution with the
exception that the variance is greater than 1; used when you are
testing small samples and when the population standard
deviation is unknown

t test a statistical test for the mean of a population, used when
the population is normally distributed and the population
standard deviation is unknown

Test value the numerical value obtained from a statistical test,
computed from (observed value − expected value) ÷ standard
error
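
To make the test value concrete, the sketch below computes it for a z test of a single mean, where the standard error is taken as σ/√n. The function names and figures are hypothetical, and the choice of a z test is an assumption made for illustration.

```python
import math


def compute_test_value(observed: float, expected: float, standard_error: float) -> float:
    """(observed value - expected value) divided by the standard error."""
    return (observed - expected) / standard_error


def z_for_mean(sample_mean: float, hypothesized_mean: float, sigma: float, n: int) -> float:
    """Test value for a z test of one mean with known population sigma."""
    return compute_test_value(sample_mean, hypothesized_mean, sigma / math.sqrt(n))


# Hypothetical example: sample mean 52, hypothesized mean 50, sigma 6, n = 36.
print(z_for_mean(52, 50, 6, 36))
```

The resulting value is then compared with the critical value to decide whether the null hypothesis should be rejected.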

Time series graph a graph that represents data that occur over
a specific time

Treatment group a group in an experimental study that has
received some type of treatment

Treatment groups the groups used in an ANOVA study

Tree diagram a device used to list all possibilities of a
sequence of events in a systematic way

Tukey test a test used to make pairwise comparisons of means
in an ANOVA study when samples are the same size

Two-tailed test a test that indicates that the null hypothesis
should be rejected when the test value is in either of the
two critical regions

Two-way ANOVA a study used to test the effects of two or
more independent variables and the possible interaction
between them

Type I error the error that occurs if you reject the null
hypothesis when it is true

Type II error the error that occurs if you do not reject the null
hypothesis when it is false

Unbiased estimator an estimator whose value approximates
the expected value of a population parameter, used for the
variance or standard deviation when the sample size is less than
30; an estimator whose expected value or mean must be equal
to the mean of the parameter being estimated

Unbiased sample a sample chosen at random from the
population that is, for the most part, representative of the
population

Ungrouped frequency distribution a distribution that uses
individual data and has a small range of data

Uniform distribution a distribution whose values are evenly
distributed over its range

Upper class limit the upper value of a class in a frequency
distribution that has the same decimal place value as the data

Variable a characteristic or attribute that can assume different
values

Variance the average of the squares of the distance that each
value is from the mean

Venn diagram a diagram used as a pictorial representation for
a probability concept or rule

Weighted mean the mean found by multiplying each value by
its corresponding weight and dividing by the sum of the weights
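
The weighted mean can be sketched in a few lines. The function name and the example values and weights below are hypothetical and serve only to illustrate the definition.

```python
def weighted_mean(values, weights) -> float:
    """Sum of value-times-weight products divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: three scores weighted by 3, 3 and 4 credits.
print(weighted_mean([4.0, 3.0, 2.0], [3, 3, 4]))
```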

Wilcoxon rank sum test a nonparametric test used to test
independent samples and compare distributions

Wilcoxon signed-rank test a nonparametric test used to test
dependent samples and compare distributions

Within-group variance a variance estimate using all the
sample data for an F test; it is not affected by differences in the
means

z distribution see standard normal distribution

z score see standard score

z test a statistical test for means and proportions of a
population, used when the population is normally distributed
and the population standard deviation is known

