
Assignment – I

PSY 201: Psychometrics


Name: Mitali Malhotra
Roll no.: 22014725009
Item Response Theory
FRAMEWORKS USED IN PSYCHOMETRICS FOR MEASUREMENT
Item Response Theory (IRT) and Classical Test Theory (CTT) are two frameworks used in
psychometrics for the measurement of latent traits or abilities. Both frameworks also help in
demonstrating the reliability and validity of measurement. However, IRT and CTT make
different assumptions about the relationship between the latent trait and observed test scores.

ITEM RESPONSE THEORY


Item response theory (IRT), often referred to as latent trait theory or item
characteristic curve theory, is a group of mathematical models that aims to explain the
connection between latent traits, i.e., unobservable characteristics or attributes, and their
manifestations in terms of observed behaviours, reactions, or performance. IRT
considers the chances of getting particular items right or wrong. The theory asserts that one
or more qualities of the individual respondents and one or more properties of the test item work
together to determine the probability of a specific response to a test item (Singh, 2009).

THE ASSUMPTIONS OF ITEM RESPONSE THEORY (Columbia University, 2016):


1) Monotonicity – According to this assumption, the likelihood of responding correctly to the
item increases as the trait level rises.
2) Uni-dimensionality – The model assumes that there is one dominant latent trait being
measured which is the driving force for the responses observed for each item in the measure.
3) Local Independence – Responses given to the separate items in a test are mutually
independent given a certain level of ability (a formal statement of this appears just after this list).
4) Invariance – In Item Response Theory, researchers can estimate the item
parameters from any group of subjects who have answered the item. In other words, item
characteristic curve parameters do not depend on the sample from which the test data are drawn.
Thus, the theory produces comparable estimates of item difficulty, item discrimination, and other
parameters across different samples of people who take the test.
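Local independence has a precise consequence worth spelling out: conditional on the trait level, the probability of an entire response pattern factors into a product of per-item probabilities. A standard statement of this (the notation here is supplied for illustration, not drawn from the sources cited above) is:

```latex
% Joint probability of a dichotomous response pattern under local independence
P(X_1 = x_1, \ldots, X_n = x_n \mid \theta)
  = \prod_{i=1}^{n} P_i(\theta)^{x_i} \, \bigl(1 - P_i(\theta)\bigr)^{1 - x_i}
```

where \(x_i \in \{0, 1\}\) is the scored response to item \(i\) and \(P_i(\theta)\) is the probability of a correct response to item \(i\) at trait level \(\theta\).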

ITEM CHARACTERISTIC CURVE - KEY FOUNDATION OF ITEM RESPONSE THEORY


The Item Characteristic Curve (ICC), also known as the Item Response Function (IRF), serves as
the foundation of item response theory. It is a mathematical equation that expresses the
connection between a person's latent trait and the probability that they will respond
in a particular way to a test question meant to measure that trait. In simpler terms, it
reflects the probability of getting the item right or wrong at a specific ability
level of the test taker. Each test item has its own IRF. The ICC provides information about the
item's difficulty and discrimination, i.e., it provides a summary of the information conveyed by
item analysis.
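As a minimal illustration, the Python sketch below evaluates a logistic IRF at several trait levels; the two-parameter logistic form and the parameter names a (discrimination) and b (difficulty), which are introduced formally in the next section, are assumed here.

```python
import math

def icc(theta: float, a: float = 1.0, b: float = 0.0) -> float:
    """Probability of a correct response at trait level theta, using a
    logistic item response function with slope a and location b
    (a common functional form, assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Tabulating one item's ICC shows the monotonically rising S-shape:
# the probability of success grows as the trait level increases.
for theta in [-3, -2, -1, 0, 1, 2, 3]:
    print(f"theta = {theta:+d}  P(correct) = {icc(theta):.2f}")
```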
Three parameters are used when estimating the ability of a person using item
response theory.
First Parameter (Item Difficulty): This model uses item difficulty as the sole parameter for
estimating a person's ability. In the IRT approach, the difficulty level is gauged differently
than in classical test theory: difficulty is indexed by how much of the trait is needed to
answer the item correctly. For example, in Figure 1, item A has the lowest difficulty level,
which indicates that it is passed by almost everyone, including people with low trait levels.
This can be understood graphically too: for difficult items (C), the ICC begins to rise
further to the right of the plot than for easier items.
Figure 1
Three ICCs Showing Varying Difficulty Levels

Items can be classified as easy, moderate, or difficult based on their difficulty parameter.
1) Easy Items: Easy items are those that most people of all skill levels are likely to answer
correctly. They are situated near the bottom of the latent trait continuum and carry a high
probability of a correct response. Easy items are therefore useful only for distinguishing among
individuals low on the relevant trait, since at higher levels everyone answers correctly and no
information is gained.
2) Difficult Items: Difficult items are challenging for most individuals, including those with
higher levels of the latent trait. They have a low probability of a correct response and are located
towards the upper end of the latent trait continuum.
3) Moderate Items: These items are neither overly simple nor overly
complex. They are situated in the middle of the latent trait continuum and have a modest
likelihood of a correct response. Their difficulty parameter marks the point on the latent trait
continuum where the probability of a correct response is 0.5 (50%).
The IRT equation using the single parameter of item difficulty level is:

\[
P(\theta) = \frac{e^{\theta - b}}{1 + e^{\theta - b}}
\]

where \(\theta\) is the person's trait level and \(b\) is the item difficulty level.
Georg Rasch, a Danish mathematician, created this particular formula in 1960; as a result, this
IRT application is also known as a Rasch Model in his honour.
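To make the difficulty parameter concrete, the sketch below evaluates the Rasch model for three hypothetical items whose b values are assumed here to mirror Figure 1 (A easy, B moderate, C difficult):

```python
import math

def rasch(theta: float, b: float) -> float:
    """Rasch (one-parameter logistic) probability of a correct response."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# Assumed difficulty values: A is easy, B is moderate, C is difficult.
items = {"A": -2.0, "B": 0.0, "C": 2.0}

for theta in [-2, 0, 2]:
    row = "  ".join(f"{name}: {rasch(theta, b):.2f}" for name, b in items.items())
    print(f"theta = {theta:+d}  ->  {row}")
# At theta = -2 item A is already at 0.50 while item C is near zero;
# C's curve only rises at the high end of the scale.
```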
Second Parameter (Item Discrimination): This model uses both item difficulty and item
discrimination. The item discrimination parameter is a gauge of how well the item differentiates
among individuals at a specific level of the trait in question. Two items might be equally
difficult overall, e.g., both answered correctly by 50 per cent of the examinees, yet differ in how
well they distinguish between individuals possessing similar levels of the latent construct of
interest (Gregory, 2004).
High Discrimination versus Low Discrimination: The slope of the curve determines the
discriminatory power of the item. Items with a steeper curve possess better discrimination,
meaning that they are better able to differentiate among individuals at a given level of the trait.
An item with high discrimination will show different response patterns for individuals with high
and low trait levels, whereas an item with low discrimination will show similar response patterns
across different trait levels. Items with high discrimination are desirable in assessments because
they provide more precise measurements of individual abilities; such items have a greater
impact on how accurately individuals are ranked along the latent trait continuum. On the
other hand, items with low discrimination are less effective at discriminating between
individuals and may not contribute as much to measurement precision. If an item is found
to exhibit negative discrimination, researchers should proceed with caution, since the likelihood
of selecting the right response should not fall as respondent ability grows.
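The effect of the discrimination parameter can be seen by comparing two items of equal difficulty but different slopes; the parameter values below are assumed for illustration:

```python
import math

def p2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic model: discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Both items have difficulty b = 0, so each is answered correctly by
# about half of examinees at theta = 0 -- yet they separate people
# very differently around that point.
for a in (0.5, 2.0):
    lo, hi = p2pl(-1.0, a, 0.0), p2pl(1.0, a, 0.0)
    print(f"a = {a}: P(theta=-1) = {lo:.2f}, P(theta=+1) = {hi:.2f}")
# With a = 2.0 the two probabilities are far apart (steep curve, high
# discrimination); with a = 0.5 they are close (flat curve, low
# discrimination).
```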
Third Parameter (Guessing): This model uses item difficulty, item discrimination, and the extent
to which the candidate can guess the correct answer (Baker, 2001).
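A sketch of the three-parameter form follows; it adds a lower asymptote c for guessing. The functional form shown is the standard three-parameter logistic model, stated here from general knowledge rather than from Baker (2001), and the parameter values are assumed:

```python
import math

def p3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic model: a = discrimination, b = difficulty,
    c = pseudo-guessing (lower asymptote)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# With c = 0.25 (e.g. a four-option multiple-choice item), even a very
# low-ability examinee keeps roughly a 25% chance of answering correctly.
print(f"{p3pl(-4.0, a=1.0, b=0.0, c=0.25):.2f}")  # close to 0.25
print(f"{p3pl(4.0, a=1.0, b=0.0, c=0.25):.2f}")   # close to 1.00
```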

NEW RULES OF MEASUREMENT IN ITEM RESPONSE THEORY IN COMPARISON TO CLASSICAL TEST THEORY
Standard Error of Measurement: The standard error of measurement is treated as a
constant in classical test theory. In IRT, however, the standard error of measurement
increases significantly at both ability extremes. The IRT model therefore implies that
test results are most trustworthy for those of average ability.
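This point can be illustrated numerically. Under the one-parameter model with unit discrimination, the test information at a trait level is the sum of P(1 − P) over items, and the standard error of measurement is 1/√I(θ); the item difficulties below are assumed for illustration:

```python
import math

def p1pl(theta: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# An assumed test whose item difficulties cluster around the middle.
difficulties = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]

for theta in [-3.0, 0.0, 3.0]:
    info = sum(p1pl(theta, b) * (1 - p1pl(theta, b)) for b in difficulties)
    sem = 1.0 / math.sqrt(info)
    print(f"theta = {theta:+.0f}: information = {info:.2f}, SEM = {sem:.2f}")
# The SEM is smallest near theta = 0 and grows at both extremes,
# matching the point made above.
```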
Length and Reliability: The idea that longer tests are more reliable than shorter ones is
practically an axiom in classical test theory. IRT-based tests, by contrast, are well suited to
computerised adaptive testing, which allows specific questions to be selected flexibly
based on each examinee's answers to earlier items.
Invariance: Item characteristic curve parameters do not depend on the sample from which
test data are drawn. This is not true of classical item analysis statistics.
Different Format Items: IRT can handle items written in different formats, which is
one of its biggest advantages (Hayes, 2000). In Classical Test Theory, all items
need to be in the same format.

Table 1
Differences between Classical Test Theory and Item Response Theory

| Classical Test Theory | Item Response Theory |
| --- | --- |
| The test is the unit of analysis. | The item is the unit of analysis. |
| Measures with more items are deemed to be more reliable. | Measures with fewer items can be equally reliable. |
| Item properties depend on a representative sample. | Item properties do not depend on a representative sample. |
| Position on the latent trait continuum is derived by comparing the total test score against a reference group. | Position on the latent trait continuum is derived from the distance between the person and the items on the ability scale. |
| All items of a measure must have the same scale points. | Items of a measure can have different response categories. |

APPLICATIONS OF ITEM RESPONSE THEORY


IRT has important ramifications. In fact, IRT has been hailed by some as the most significant
advancement in psychological testing in the latter part of the 20th century.
Employed in a Number of Disciplines: IRT is employed in a number of disciplines, including
market research, psychological testing, health outcomes research, and educational testing. It
contributes to more precise and trustworthy evaluations by offering insightful information
about test construction, item properties, and the measurement of latent traits. It can also be used
for specific problems such as the measurement of self-efficacy (Smith et al., 2003),
psychopathology (Nugent, 2003) and industrial psychology (Schneider et al., 2003).
Scores Are No Longer Influenced by the Number of Correct Items: The test-taker's most crucial
takeaway is that the difficulty of the questions they can correctly answer determines their
score, not the overall number of questions they get right.
Various Approaches to the Construction of Tests in IRT: There are various approaches to the
construction of tests using IRT. Some approaches use only the single parameter of difficulty,
others use the two parameters of difficulty and discriminability, and still other approaches add
a third parameter as well.
Computerized Tailored Testing: The most attractive advantage of tests based on IRT is that
one can easily adapt them for computer administration. The computer samples items and
determines the range of ability that best represents each test taker. The remaining testing time
is then spent concentrating on the precise range that presents a challenge to the respondent,
more particularly, on questions with a 50% chance of a correct answer for each person. With
this approach, test takers do not have to suffer the embarrassment of attempting multiple items
beyond their ability, nor do they waste time and energy on items that are well below their
skill level. Additionally, each test-taker can receive a unique set of questions to respond to,
greatly lowering the likelihood of cheating. Computer-adaptive testing is discussed in detail
by Kaplan and Saccuzzo (2017).
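A toy sketch of this selection logic follows. The item bank and the fixed step-size ability update are simplifications assumed for brevity; operational adaptive tests use maximum-likelihood or Bayesian updates instead:

```python
import math
import random

def p1pl(theta: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def adaptive_test(true_theta: float, bank: list[float], n_items: int = 10) -> float:
    """Toy computerized adaptive test: always administer the unused item
    whose difficulty is closest to the current ability estimate (under the
    one-parameter model that item gives roughly a 50% chance of success),
    then nudge the estimate up or down by a fixed step."""
    est, remaining = 0.0, list(bank)
    for _ in range(n_items):
        item = min(remaining, key=lambda b: abs(b - est))
        remaining.remove(item)
        correct = random.random() < p1pl(true_theta, item)  # simulated answer
        est += 0.5 if correct else -0.5
    return est

random.seed(0)
bank = [x / 4.0 for x in range(-12, 13)]  # difficulties from -3 to +3
print(f"estimated ability: {adaptive_test(true_theta=1.0, bank=bank):.2f}")
```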
Getting Information about Ability from Distractors: In classical test theory, a wrong
answer is given a score of 0, and it makes no difference which distractor is chosen. However,
some proponents of item response theory contend that some distractors are better than others
in that they reflect the examinee's partial understanding, as opposed to other distractors that
reveal the examinee's lack of knowledge. Therefore, conclusions regarding an examinee's
ability can also be drawn from their responses to the distractors.
Collectively, the IRFs can be used for many other purposes, including the refinement of the
instrument, the calculation of reliability, and the estimation of examinee trait levels.
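As one concrete instance of using the IRFs to estimate trait levels, the sketch below performs a grid-search maximum-likelihood estimate under the one-parameter model; the item difficulties and response patterns are assumed. The two hypothetical examinees each answer the same number of items correctly yet receive different estimates, because the items they answered differ in difficulty, illustrating the scoring point made earlier:

```python
import math

def p1pl(theta: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses: list[int], difficulties: list[float]) -> float:
    """Grid-search maximum-likelihood estimate of theta under the 1PL,
    using the local-independence product of item probabilities."""
    def log_lik(theta: float) -> float:
        return sum(
            math.log(p1pl(theta, b)) if x else math.log(1.0 - p1pl(theta, b))
            for x, b in zip(responses, difficulties)
        )
    grid = [g / 10.0 for g in range(-40, 41)]  # theta from -4.0 to +4.0
    return max(grid, key=log_lik)

# Two examinees each answer 2 of their 3 items correctly, but one was
# given easy items and the other difficult ones (values assumed).
print(estimate_theta([1, 1, 0], [-2.0, -1.0, 0.0]))  # about -0.2
print(estimate_theta([1, 1, 0], [0.0, 1.0, 2.0]))    # about +1.8
# Same number correct, different trait estimates: the difficulty of the
# items answered correctly drives the score, as noted above.
```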
Figure 2
Applications of Item Response Theory

[A diagram showing five applications of IRT: use in a variety of disciplines such as market research and health outcomes; scores not influenced by the number of correctly answered items; various approaches to test construction; computerized tailored testing; and gaining information about ability from the distractors.]

REFERENCES
Columbia University Mailman School of Public Health. (2016, August 5). Item response theory. https://www.publichealth.columbia.edu/research/population-health-methods/item-response-theory
Gregory, R. J. (2004). Psychological testing: History, principles, and applications. Pearson Education India.
Kaplan, R. M., & Saccuzzo, D. P. (2017). Psychological testing: Principles, applications, and issues. Cengage Learning.
Schneider, B., Hanges, P. J., Smith, D. B., & Salvaggio, A. N. (2003). Which comes first: Employee attitudes or organizational financial and market performance? Journal of Applied Psychology, 88(5), 836.
Singh, A. K. (2009). Tests, measurements and research methods in behavioural sciences. Bharati Bhawan.
