Statistics in Behavioural Sciences
Course Code –PHA314B
Faculty Name – Dr Heenakshi Bhansali
Test Construction
COURSE DESCRIPTION
Test Construction
• Test construction is the set of activities involved in developing and
evaluating a test of some psychological function.
Test Construction
• The items are pretested and selected on the basis of difficulty value,
discrimination power, and relationship to clearly defined objectives
in behavioural terms.
• Practical Criteria
• Technical Criteria
Practical Criteria    Technical Criteria
Time                  Reliability
Cost                  Objectivity
Purpose               Standardization
Acceptability         Items
Purpose of Good Test
• Compare two or more individuals on the same traits.
Steps for Test Construction
Planning the test (Objectivity)
• The purpose of the plan is to present the rationale for the test and to
guide the preparation and evaluation of the items to be used in it. The
stated purpose also serves as a general guideline for a potential user
when judging the quality of the test and how well its purpose was
achieved.
Planning the test (Objectivity)
The test developer should plan well in advance and have a clear idea
about:
Preparing the test (Item Selection)
• After the preparation of a test plan, the next step is item writing and
item evaluation. Item writing is the preparation of the test itself. The
items to be included in a test will depend upon the purpose for which
the test is constructed; if poor items are prepared, or if the items are
not related to the test's purpose, the test objectives cannot be met.
The test developer should have:
• thorough knowledge of the subject matter
• familiarity with different types of items, along with their advantages
and disadvantages
• a large vocabulary, i.e. he/she should know the different meanings
of a word
Preparing the test (Item Selection)
In item writing the following suggestions are taken into
consideration:
Preparing the test (Item Selection)
Item evaluation is the process of judging the adequacy of test items to
fulfill the designated purpose of a test.
Establishment of reliability and validity
• This is the most important part of test standardization, because only
after this process is the test eligible for use. In addition to validity, it
is essential that every test possess a definite degree of reliability; only
then can the conclusions drawn from the test be considered reliable
and worthy of trust.
• Along with this, the norms are also to be developed.
Establishment of reliability and validity
Reliability of the test:
• Reliability of a test refers to the quality that inspires confidence and
trust in the measurement. This quality can be attributed only to a test
that yields the same score every time it is administered to the same
individual.
• The term reliability basically refers to the extent to which a test can
be relied upon, i.e. it gives consistent scores even when administered
to the same group after repeated intervals of time.
• If a test yields one score for an individual at one time, and a
different score for the same individual when administered at another
time, such a test clearly cannot be considered reliable.
Establishment of reliability and validity
• The reliability of a test is a property of the test as a whole, not of
any one part of it; it is essential that the internal parts of a test
possess internal consistency and uniformity.
• When the test is finally composed, it is administered again to a fresh
sample in order to compute the reliability coefficient. This sample,
too, should not be smaller than 100.
• Reliability is calculated through the test-retest method, the split-half
method, and the equivalent-form method. Reliability shows the
consistency of test scores.
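As a sketch of two of these methods, the following computes a test-retest coefficient by correlating scores from two administrations, and a split-half coefficient corrected to full length with the Spearman-Brown formula. All score lists and the helper `pearson_r` are made up for illustration:

```python
# Illustrative sketch (all data are hypothetical): test-retest and
# split-half reliability for a small sample of examinees.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Test-retest: correlate total scores from two administrations
# of the same test to the same group after a time gap.
scores_t1 = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]
scores_t2 = [13, 14, 10, 19, 18, 10, 15, 17, 11, 15]
r_test_retest = pearson_r(scores_t1, scores_t2)

# Split-half: correlate odd-item and even-item half scores from a
# single administration, then correct to full-test length.
odd_half  = [6, 8, 4, 10, 9, 5, 7, 9, 5, 8]
even_half = [6, 7, 5, 10, 8, 6, 7, 9, 5, 8]
r_half = pearson_r(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown correction

print(round(r_test_retest, 3), round(r_half, 3), round(r_full, 3))
```

The Spearman-Brown step is needed because correlating two half-tests estimates the reliability of a test only half as long as the real one.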
Establishment of reliability and validity
Measuring Reliability: Reliability can be measured in the following
four ways.
Validity of the test:
• Validity refers to what the test measures and how well it measures
it. If a test measures well the trait it is intended to measure, it can be
said to be valid. Validity is the correlation of the test with some
outside, independent criterion.
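The validity coefficient described above can be sketched as a Pearson correlation between test scores and an external criterion measure. The scores, criterion ratings, and helper function below are all hypothetical:

```python
# Hypothetical sketch: criterion validity as the correlation between
# test scores and an outside, independent criterion (e.g. ratings).
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

test_scores = [55, 62, 48, 70, 66, 59, 73, 51]     # made-up test scores
criterion   = [3.1, 3.5, 2.8, 4.2, 3.9, 3.3, 4.4, 2.9]  # made-up criterion
validity = pearson_r(test_scores, criterion)        # validity coefficient
print(round(validity, 2))
```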
Norms
• The total raw score on a psychological test is generally interpreted
by referring to norms, which depict the scores of the standardization
sample. Norms are established empirically by determining how
individuals from a specified group actually perform on the test.
• To determine accurately a subject's (individual's) position with
respect to the standardization sample, the raw score is transformed
into a relative measure.
• For example, the average IQ on a standardized intelligence test is
about 100. This means that people typically or normally score at or
near 100 on such a test; scores that are significantly lower or higher
are considered atypical or much less common.
Norms
• Norms have been defined as the standard performance of a group
of pupils on a test. There is a difference between a norm and a
standard: norms indicate the actual achievement of individuals in a
standardization group, while standards indicate the desired level of
performance.
• Norms are averages or values determined by actual measurement of
a group of persons who are representative of a specific population,
whereas a standard expresses a desirable objective, which can be
lower or higher than the obtained norm.
• To prepare norms, a test is administered to a large representative
sample and the raw scores are converted into percentiles or standard
scores. Norms serve as measures of the central tendency of the
scores of a given group.
Characteristics of Norms
Novelty: Being novel, or up to date, means that the norms should not
be outdated, i.e. norms should not be constructed from test scores
gathered long ago, because the passage of time can change students'
abilities.
Types of Norms
• Norms are used to compare an individual's performance with the
performance of the most closely related group. They carry a clear,
well-defined quantitative meaning and can be used in most
statistical analyses.
• Norms can be classified into four types: age norms, grade norms,
percentile norms, and standard score norms.
Age Norms:
• The assumption underlying age norms is that the variable increases
with age. An age norm is obtained by taking the mean raw score of
all members of a given age group within the standardization sample.
Age norms thus represent the average scores of individuals of
different ages. They apply only to variables that increase with age,
such as height, weight, intelligence, reading ability, vocabulary, and
mental age. Hence, the 15-year norm is represented by the mean raw
score of students aged 15 years.
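A minimal sketch of computing age norms from a standardization sample; the (age, score) pairs below are invented for illustration:

```python
# Sketch (hypothetical data): an age norm is the mean raw score of
# all test-takers of a given age in the standardization sample.
from collections import defaultdict

sample = [  # (age, raw score) pairs from a standardization sample
    (14, 30), (14, 34), (14, 32),
    (15, 38), (15, 40), (15, 42),
    (16, 45), (16, 47), (16, 46),
]

by_age = defaultdict(list)
for age, score in sample:
    by_age[age].append(score)

# The norm for each age is the mean raw score of that age group.
age_norms = {age: sum(s) / len(s) for age, s in by_age.items()}
print(age_norms)  # e.g. the 15-year norm is the mean of the 15-year-olds
```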
Types of Norms
Grade Norms:
• Grade norms are related to school grade (class) and are also called
class norms. A grade norm on a test is the average score of students of a
given class. It is calculated by finding the mean raw score earned by
students in a specific grade, established by administering the test to
students of each grade. Grade norms are mostly established for
achievement tests and relate to the performance of average students of
each class.
Percentiles (P(n) and PR):
• Percentiles refer to the percentage of people in the standardization
sample who fall below a given score. In computing a percentile norm, a
candidate is compared with the group of which he/she is a member;
percentiles thus depict an individual's position with respect to the
sample. Counting begins from the bottom, so the higher the percentile,
the better the rank. For example, the 75th percentile tells us that 75% of
students scored below this score and only 25% obtained scores above it.
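A percentile rank of this kind can be sketched as follows. The scores are hypothetical, and this simple version counts only scores strictly below the given one (some definitions also credit half of any tied scores):

```python
# Sketch: percentile rank = percentage of the standardization sample
# scoring below a given raw score (data are hypothetical).

def percentile_rank(scores, raw):
    """Percentage of scores strictly below `raw`."""
    below = sum(1 for s in scores if s < raw)
    return 100 * below / len(scores)

sample_scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
pr = percentile_rank(sample_scores, 75)
print(pr)  # 70.0 -> 70% of the sample scored below a raw score of 75
```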
Types of Norms
Standard Score:
• A standard score expresses the gap between an individual's score
and the mean in units of the standard deviation of the distribution.
Standard score norms convert the raw scores of candidates into
standard scores, computed with the help of the standard deviation
(S.D. or σ).
• The standard deviation is a measure of the spread of the scores of a
group. Standard scores can be derived by a linear or nonlinear
transformation of the original raw scores; common forms are the
Z score and the T score.
• Standard scores can be used to compare raw scores taken from
different tests, especially when the data are at the interval level
of measurement.
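Converting a raw score into a z score and a T score (mean 50, SD 10) can be sketched as below; the raw score list is hypothetical:

```python
# Sketch: a z score is the distance of a raw score from the group mean
# in standard-deviation units; a T score rescales it to mean 50, SD 10.
from statistics import mean, pstdev

raw_scores = [52, 60, 45, 70, 58, 63, 48, 55, 66, 50]  # hypothetical group
m, sd = mean(raw_scores), pstdev(raw_scores)

raw = 70
z = (raw - m) / sd    # standard (z) score
t = 50 + 10 * z       # T score: linear transformation of z
print(round(z, 2), round(t, 1))
```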
Item Writing
Item writing refers to the process of constructing the items required for a
given test or assessment. Item writing is essentially a creative art. There are
no set rules to guide and guarantee the writing of good items. A lot depends
upon the item writer’s intuition, imagination, experience, practice, and
ingenuity.
A good test item must have the following characteristics:
Clarity in meaning
Clarity in reading
Discriminating power
Not too easy or too difficult
Doesn't encourage guesswork
Gets to the point
Item Writing
To make sure that the test items meet these standards, the item writer
follows some general guidelines for item writing:
Item Writing Guidelines
Avoid Opinion-Based Items
• Never ask "What would you ... do", " ... use", " ... try", etc. The
examinee's answer can never be wrong.
• Use caution when asking for the "best" thing, or the "best" way of
doing something, unless it is clearly the best amongst the options.
• If experts' opinions differ about what the "best" is, then avoid
using it.
• Qualify the standard for "best" (i.e., according to ... ).
Item Writing Guidelines
Avoid Absolute Modifiers such as always, never, only, and none.
• The use of absolute modifiers in options makes it easy to eliminate options,
increasing the guessing probability.
Item Writing Guidelines
Avoiding Excessive Verbiage
• "Verbosity is an enemy to clarity."
• Wordy items take longer to read and answer, meaning fewer items
can be presented in a fixed amount of time, reducing reliability.
• Write items as briefly as possible without compromising the
construct and cognitive demand required.
• Get to the point in the stem and present clean, clear options for the
examinee to choose.
• Avoid unnecessary background information.
Item Writing Guidelines
Keep Items Independent
• Content should be independent from item to item.
• Don't give the answer away to one item in the stem of another.
• Don't make answering one item correctly dependent on knowing the
correct answer to another item.
Item Analysis
• Item analysis is a general term for a set of methods used to evaluate
test items. Items can be analyzed qualitatively in terms of their
content and form and quantitatively in terms of their statistical
properties.
• The main purpose of the experimental try-out is to find out the
weaknesses, ambiguities, and inadequacies of the items.
Item Analysis
Pre-Try Out
Item Analysis
Proper Try Out (Item Analysis)
Item analysis is the technique of selecting discriminating items for the
final composition of the test. The sample for the proper try-out should
be about 400 and needs to be similar to those for whom the test is
intended.
It aims at obtaining three kinds of information regarding the items.
1) Item Difficulty
2) Discriminatory power of items
3) Effectiveness of distractors
Importance of Item Analysis
Methods of Item Analysis
There are two basic methods of item analysis: item difficulty and item
discrimination.
Item Difficulty index
• Items that are too easy or too difficult do not affect the variability of
test scores, so they contribute nothing to the reliability or validity of
the test.
• The closer an item's difficulty approaches 1.00 or 0, the less
differential information about test-takers it contributes; conversely,
the closer the difficulty level approaches .50, the more
differentiations the item can make.
• Items within a test tend to be inter-correlated; the more
homogeneous the test, the higher these intercorrelations will be.
Moreover, the higher the item intercorrelations, the wider should
be the spread of item difficulty.
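A minimal sketch of both indices on a made-up response matrix: difficulty as the proportion answering correctly, and discrimination as the difference in that proportion between high and low scorers. Here the ranked sample is simply halved; many texts instead use the top and bottom 27%:

```python
# Illustrative sketch (hypothetical response matrix): item difficulty
# p = proportion of examinees answering the item correctly;
# discrimination D = p(upper group) - p(lower group).

responses = [  # rows = examinees, columns = items; 1 = correct, 0 = wrong
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]

n = len(responses)
totals = [sum(row) for row in responses]        # total score per examinee
order = sorted(range(n), key=lambda i: totals[i], reverse=True)
upper, lower = order[: n // 2], order[n // 2:]  # high vs low scorers

results = []
for item in range(len(responses[0])):
    p = sum(row[item] for row in responses) / n               # difficulty index
    p_upper = sum(responses[i][item] for i in upper) / len(upper)
    p_lower = sum(responses[i][item] for i in lower) / len(lower)
    results.append((p, p_upper - p_lower))                    # (p, D)

for item, (p, d) in enumerate(results):
    print(f"item {item}: difficulty p = {p:.2f}, discrimination D = {d:.2f}")
```

An item with D near 0 (or negative) fails to separate strong from weak examinees and is a candidate for revision or removal.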
Item Analysis
Final Try Out
• The items selected after item analysis constitute the test in its final
form. The final try-out is carried out to detect any minor defects that
may not have been caught in the first two preliminary
administrations.
Scale Construction
• A measurement scale is used to qualify or quantify data variables in
statistics.
Characteristic of Scale
Types of Scale