A Taxonomy of Polytomous Item Response
Models
Gerhard Tutz
Ludwig-Maximilians-Universität München
arXiv:2010.01382v1 [stat.ME] 3 Oct 2020
Akademiestraße 1, 80799 München
October 6, 2020
Abstract
A common framework is provided that comprises classical ordinal item response
models as the cumulative, sequential and adjacent categories models as well as
nominal response models and item response tree models. The taxonomy is based
on the ways binary models can be seen as building blocks of the various models.
In particular one can distinguish between conditional and unconditional model
components. Conditional models are by far the larger class of models containing the adjacent categories model and the whole class of hierarchically structured
models. The latter is introduced as a class of models that comprises binary trees
and hierarchically structured models that use ordinal models conditionally. The
study of the binary models contained in latent trait models clarifies the relation
between models and the interpretation of item parameters. It is also used to distinguish between ordinal and nominal models by giving a conceptualization of
ordinal models. The taxonomy differs from previous taxonomies by focusing on
the structured use of dichotomizations instead of the role of parameterizations.
Keywords: Ordered responses, latent trait models, item response theory, graded response model, partial credit model, sequential model, Rasch model, item response
trees
1 Introduction
Various latent trait models for ordered response data have been proposed in the literature, for an overview see, for example, Van der Linden (2016). One can in particular
distinguish between three basic types of models, cumulative models, sequential models and adjacent categories models. One of the objectives of the present paper is to
show how these models are easily built from binary latent trait models. The way how
the binary models are used to construct models helps to understand the structure of
the models and to clarify the meaning of the parameters. It also provides a framework that allows to embed more recently developed ordinal item response models as,
for example, tree-based models, yielding a general taxonomy of ordinal item response
models.
1
The proposed taxonomy is quite different from that given by Thissen and Steinberg
(1986). Their classification into “difference” models and “divide-by-total” models is
based on the form of the response probabilities, which may be represented as differences or as a ratio obtained by dividing by sums of terms. The third class of models
they consider are “left-side added models”, which arise if the parameterization is extended to account for guessing parameters. The taxonomy proposed here is of a different nature. It is based on exploiting how ordinal models can be constructed by using
(conditional or unconditional) dichotomizations of response categories. It also works
the other way, by clarifying which binary models (or dichotomizations) are contained
in ordinal models. By investigating this structural aspect one obtains a taxonomy that
uses that ordinal models can be characterized by the way they determine the choice of
specific subsets of categories.
As a preview let us consider how classical models can be described by considering
dichotomizations. The basic building blocks are binary models, which in its simplest
form specify that the responses of person p on item i are determined by
P (Ypi = 1) = F (θp − δi ),
(1)
where F (.) is a cumulative distribution function, θp is the person parameter, and δi
is the item parameter, typically a difficulty or threshold. An important member of this
class of models is the Rasch model, which is obtained if F (.) is the logistic distribution
function F (η) = exp(η)/(1 + exp(η)).
Given one has a response in ordered categories {0, 1, . . . , k} there are several ways
to construct an ordinal model from binary models of the form (1). The binary models
can be used to compare specific categories or groups of categories from {0, 1, . . . , k}.
One can, in particular,
- compare groups of categories that result from splitting the categories into the
subsets {0, 1, . . . , r − 1} and {r, . . . , k},
- compare (conditionally) between two categories, for example, adjacent categories,
- compare (conditionally) between a category and a set of adjacent categories, for
example, {r − 1} and {r, . . . , k}.
The different ways to compare categories correspond to cumulative models, adjacent
categories and sequential models in that order. The taxonomy given in Figure 1 distinguishes between conditional and non-conditional models, a distinction which follows
from the consideration of the binary models that are contained in the ordinal models.
The cumulative or graded response model is the only non-conditional model from this
class of models. The other two use some sort of conditioning in the binary building
blocks within the models.
The graded response model corresponds to the difference models, and the adjacent categories models to the divide-by-total models in the Thissen-Steinberg taxonomy. Thissen and Steinberg (1986) did not consider sequential models, which were
not known in 1986. Although in the present taxonomy and the Thissen-Steinberg taxonomy two model classes cover the same types of models, the focus is different. Here,
the models are not characterized by the form of the probability but by their building
2
Ordinal Models
Conditional Models
Adjacent Categories Model
compares adjacent categories
Graded Response Model
Simultaneous Splits Model
Sequential Model
compares category and groups
F IGURE 1: Structure of classical ordinal latent trait models.
blocks. The consideration of building blocks allows not only to include the sequential models and other models, but is also helpful to obtain a valid interpretation of the
parameters of the models, which has not always been correct and has been a subject
of debate in the literature on ordinal models, see, for example, Adams et al. (2012),
Andrich (2013, 2015), Garcı́a-Pérez (2017), Tutz (2020b).
The focus on building blocks allows to identify common structures beyond the
choice of the response function F (.) in (1) and the parameterization. For the structuring
it does not matter if one chooses the logistic response function, the normal ogive or any
other strictly monotone distribution function. One might also extend the linear term
to include an item discrimination parameter by using αi (θp − δi ) instead of θp − δi ,
or extend it to include guessing parameters. Although parameterization is considered
secondary when characterizing model types it can be important as a source of ordinality
in latent trait models. This will be investigated separately when considering nominal
models.
In Section 2 classical ordered response models are considered. It is used that they
can be characterized as parameterizations of split variables, which yield the preliminary structure given in Figure 1. Section 3 is devoted to tree structured models. In
particular binary IR-tree models are considered, which have been introduced as flexible models to account for response styles. Within the general hierarchically structured models it is distinguished between binary IR-tree models and partitioning models
that contain ordinal building blocks. Both are embedded into the proposed framework
yielding the taxonomy given in Section 4. In Section 5 the role of parameterizations
and the order in ordinal models are discussed. It is in particular investigated how ordinal models can be obtained by using constraints on parameters in nominal models. The
final chapter completes the taxonomy by including the class of finite mixture models,
which are divided into homogeneous and heterogeneous mixture models.
2 Classical Ordered Response Models
In the following let Ypi ∈ {0, 1, . . . , k}, p = 1, . . . , P , i = 1, . . . , I, denote the ordinal
response of person p on item i. An important partition of the response categories is the
3
partition into the subsets {0, . . . , r − 1} and {r, . . . , k}, which can be represented by
the binary variable
1 Ypi ≥ r
(r)
Ypi =
(2)
0 Ypi < r.
(1)
(k)
The variables Ypi , . . . , Ypi are called split variables because they split the response
categories into two subsets. As shown by Tutz (2020b) they play a major role in the
construction of the traditional ordered latent trait models. In the following we use these
results to derive a taxonomy of the traditional models
2.1 Simultaneous Modelling of Splits: The Graded Response Model
Let us assume that the response categories represent levels of performance in an
achievement test. Then one can consider two groups of categories, {0, 1, . . . , r −1} for
low performance and {r, . . . , k} for high performance, where low and high are relative
terms that refer to “below category r” and “above or in category r”. One might assume
that the split into low and high performance is determined by a binary model with person ability θp and a threshold that depends on the category at which the categories have
been split by specifying
(r)
P (Ypi = 1) = F (θp − δir , )
, r = 1, . . . , k.
(3)
Thus, for each dichotomization into categories {0, 1, . . . , r−1} and {r, . . . , k} a binary
model is assumed to hold. Importantly, the models are assumed to hold simultaneously
with the same person ability θp but different item difficulties δir . Simple rewriting
yields the cumulative model
P (Ypi ≥ r) = F (θp − δir ),
r = 1, . . . , k,
(4)
which is equivalent to a version of Samejima’s graded response model (Samejima,
1995, 2016). Thus, the graded response model can be seen as a model for which the
dichotomizations into the categories Ypi < r and Ypi ≥ r are simultaneously modeled.
One consequence is that item difficulties are ordered. Since P (Ypi = r) = P (Ypi ≥
r)−P (Ypi ≥ r +1) = F (θp −δir )−F (θp −δi,r+1 ) ≥ 0, one obtains that δir ≤ δi,r+1 has
to hold for all categories.
The strong link between the binary responses and the ordinal response yields a
specific view of the graded response model that differs from traditional ones. In an
(1)
(k)
achievement test the sequence of binary responses (Ypi , . . . , Ypi ) can be seen as referring to tasks with increasing difficulties. More concrete, because item difficulties are
(r)
(r+1)
ordered, one has P (Ypi = 1) ≥ P (Ypi
= 1), which means the “task” represented
(r)
(r+1)
(r)
by Ypi is simpler than the “task” Ypi . Moreover, if the task Ypi was completed
(r)
(s)
(s)
(Ypi = 1 or, equivalently, Ypi ≥ r), the simpler tasks Ypi , s < r (Ypi = 1 or, equivalently, Ypi ≥ s) were also completed. Therefore, the outcome of the sequence of binary
variables has the specific form
(1)
(k)
(Ypi . . . , Ypi ) = (1, . . . , 1, 0, . . . , 0),
which means a sequence of ones is followed by a sequence of zeros. Binary variables
that follow this pattern have been called Guttman variables and the resulting response
4
space is usually referred to as Guttman space, a term that was used by Andrich (2013)
when discussing partial credit models.
The more classic derivation of the cumulative model suggests that the item parameters may be seen as thresholds. Let Ỹpi = θp + εpi , where εpi is a noise variable
with symmetric continuous distribution function F (.), denote a latent variable that is
invoked if person p tries to solve item i. Ỹpi is essentially the ability of the person plus a
noise variable and can be seen as the random ability of the person. The category boundaries approach assumes that category r is observed if the latent variable is between
thresholds δir and δi,r+1 . More formally, one has Ypi = r ⇔ δir ≤ Ỹpi < δi,r+1 . It
is easily seen that one obtains the cumulative model and thresholds have to be ordered.
Thissen and Steinberg (1986) called the graded response models “difference” models because the probabilities are given as differences, P (Ypi = r) = F (θp − δir ) −
F (θp − δi,r+1 ). Although they also start with binary models they do not further investigate that the models have to hold simultaneously.
2.2 Conditional Comparison of Categories: the Partial Credit and General Adjacent
Categories Models
Rather than compare groups of categories by utilizing a binary model one can also
compare two categories from the set of categories {0, 1, . . . , k}. A choice that suggests itself are adjacent categories. Let the binary models that compare two adjacent
categories be given by
P (Ypi = r|Ypi ∈ {r − 1, r}) = F (θp − δir ),
r = 1, . . . , k.
(5)
Again all the models contain the same person parameter but model-specific item parameters. For the logistic distribution function one obtains the partial credit model
P
exp( rl=1 (θp − δil ))
, r = 1, . . . , k,
P (Ypi = r) = Pk
Ps
exp(
(θ
−
δ
))
p
il
s=0
l=1
which was propagated by Masters (1982) and Masters and Wright (1984). It is equivalent to the polytomous Rasch model, which is just a different parameterization, see, for
example, Andrich (2010). Thissen and Steinberg (1986) called the partial credit model
model a “divide-by-total” model because of the denominator in the probabilities. However, the family of adjacent categories models is much larger because in (5) any strictly
monotone distribution function can be used, for example, the use of the normal distribution yields a probit version of the adjacent categories model. In the logistic version
sufficient statistics for item and person parameters are available. While the existence
of sufficient statistics is an advantage if one wants to estimate parameters conditionally
it is of lesser importance if one uses marginal estimates. We refer to this model class
more generally as adjacent categories models, of which the polytomous Rasch model
or partial credit model are just the most prominent members.
An alternative form of the partial credit model, which emphasizes the implicit comparison of categories is
P (Ypi = r)
= θp − δir , r = 1, . . . , k.
(6)
log
P (Ypi = r − 1)
5
That means, the PCM directly compares two adjacent categories, and θp determines
the strength of the preference for the higher category.
It should be emphasized that the binary models used as building blocks are conditional models, it is assumed that a binary model holds given the response is in two
categories from the set of available categories. This is seen from the representation
(5) but hidden in the representation (6). However, it has consequences for the interpretation of parameters. The item parameters represent thresholds given the response is
in categories {r − 1, r} and the trait parameters are the abilities to score r rather than
r − 1 given the response is in categories {r − 1, r}. Therefore, the parameters refer to a
local conditional decision or preference although changing the item parameter changes
the probabilities of all possible outcome values since the PCM assumes that the binary
models hold simultaneously.
The conditional structure is also seen if the model is represented by using split
(r−1)
(r+1)
variables. Since the condition Ypi
= 1, Ypi
= 0 is equivalent to Ypi ∈ {r − 1, r}
one obtains that the PCM is equivalent to postulating for all split-variables
(r)
(r−1)
P (Ypi = 1|Ypi
(r+1)
= 1, Ypi
= 0) = F (θp − δir ),
where F (.) is the logistic distribution function. It means that a Rasch model holds for
(r)
(r−1)
the split-variable Ypi given the split Ypi
is in favor of higher categories while the
(r+1)
split Ypi
is in favor of lower categories.
The class of adjacent categories model also contains simplified versions that use
sparser parameterizations. By assuming that the item parameters can be decomposed
into two terms in the form δil = δi + τl , one obtains the Rasch rating scale model
(Andrich, 1978, 2016). The model can also be extended to include a slope parameter
if it is included in the binary model that distinguishes between adjacent categories
(Muraki, 1990, 1997).
2.3 Conditional Comparison of a Single Category and a Group of Categories:
Sequential Models
In achievement tests frequently items are used that are solved in consecutive
observed
√
3
steps. For example, a mathematical problem may have the form:
(
49
−
9)
=?. One
√
can distinguish four levels: no problem solved (level 0), 49 = 7 solved (level 1),
7 − 9 = −2 solved (level 2), (−2)3 = −8 solved (level 3). Obviously the sub problems
have to be solved in a consecutive way. A sub problem can only be solved if the all the
previous sub problems have been solved. A model that explicitly models the solving
of sub problems has the form
P (Ypi ≥ r|Ypi ≥ r − 1) = F (θp − δir ),
r = 1, . . . , k.
(7)
The model is known as sequential model (Tutz, 1990) or step model (Verhelst et al.,
1997). It is a process model for consecutive steps. One models the transition to
higher categories given the previous step was successful. The first step is the only
non-conditional step. If it fails, the response is in category 0 (first sub problem not
solved), if it is successful, the response is larger than 0 (first sub problem solved). In
the latter case the person tries to take the second step. If it is not successful, the response is in category 1 (second sub problem not solved), if it is successful, the response
6
is larger than 1 (second sub problem solved), etc. In the r-th step it is distinguished
between Ypi = r − 1 and Ypi ≥ r given at least level r − 1 is reached (Ypi ≥ r − 1).
In the model the parameter θp represents the person’s ability to successfully perform
each of the steps while δir is the difficulty in step r. Of course, later steps can be easier
than early steps, thus item difficulties are not necessarily
ordered. In the example step
√
2 (7 − 9) is certainly easier to master than step 1 ( 49 = 7). However, sub problem
2 can be only solved after step 1 was successful. Therefore, the item parameters have
local meaning, they refer to the difficulty in a step given that all previous steps were
successful. In contrast, the same ability parameter is present in each of the steps, which
makes the model uni-dimensional in terms of person parameters.
The logistic version of the model, also called logistic sequential model, can be
given in the alternative form of a continuation ratio model,
P (Ypi ≥ r)
log
= θp − δir , r = 1, . . . , k,
(8)
P (Ypi = r − 1)
(Agresti, 2013). The logits on the left hand side compare the categories the probability of a response in the categories {r, . . . , k} to the probability of a response in
category {r − 1}. In this sense the binary models contained in the sequential model
compare groups of categories to single categories. This comparison is also seen from
the tree representation of the model given in Figure 2, which shows the sequence of
(conditional) binary splits in a sequential model with four categories. In the r-th step a
decision between category {r − 1} and categories {r, . . . , k} is obtained. The split is
conditional, given categories {r − 1, . . . , k}, that means, under the condition that the
previous step was successful.
A disadvantage of the model representation (8) is that it does not directly show the
underlying process. The implicit conditioning on responses Ypi ≥ r, which is essential
for the interpretation of the model parameters, gets lost. It is however seen in the model
representation with split variables given by
(r)
(r−1)
P (Ypi = 1|Ypi
(1)
= 1, . . . , Ypi = 1) = F (θp − δir )
(1)
, r = 1, . . . , k,
(9)
(k)
which again shows that the split variables (Ypi . . . , Ypi ) form a Guttman space.
A rating scale version of the model, in which the parameter
P δir is split up into a an
item location parameter δi and a step parameter τr , with r τr = 0 has been considered by Tutz (1990), extended versions with predictor αir θp − δir and nonparameteric
versions have been considered by Hemker et al. (2001).
2.4 Overview on Classical Ordinal Models
The fact that all the models contain binary models that split categories into two subsets
can be exploited to distinguish between models by focusing on the underlying conditioning. In particular, in the partial credit model and the sequential model the splits
are conditional whereas in the cumulative model the splits are simultaneous but not
conditional. Figure 1 visualizes the resulting hierarchy of models.
In Table 1 the models are given in various representations. The left column shows
the logistic versions of the models. It shows which categories or groups of categories
7
0, 1, 2, 3
First Step
1, 2, 3
0
Unsuccessful: level 0
Second Step
2, 3
1
Unsuccessful: level 1
Third Step
2
3
Unsuccessful: level 2
F IGURE 2: The sequential model as a hierarchically structured model.
TABLE 1: Overview of traditional ordinal models.
Cumulative
Adjacent
Sequential
Category Representation
Conditional Representation
Conditional Representation
Logistic Version
log(.) = θp − δir
General Version
P (.) = F (θp − δir )
With split variables
P (.) = F (θp − δir )
P (Y ≥r)
log P (Ypi
pi <r)
P (Y =r)
log P (Ypipi=r−1)
P (Y ≥r)
log P (Ypipi=r−1)
P (Ypi ≥ r)
P (Ypi = 1)
P (Ypi = r|Ypi ∈ {r − 1, r})
P (Ypi ≥ r|Ypi ≥ r − 1)
(r)
(r)
(r−1)
P (Ypi = 1|Ypi
(r)
(r+1)
= 1, Ypi
(r−1)
P (Ypi = 1|Ypi
are compared. In particular it is seen which type of logits are determined by the difference between person parameter and item parameter, θp −δir . For example, in the partial
credit model one has the adjacent categories logits log(P (Ypi = r)/P (Ypi = r − 1)), in
the sequential model one has the continuation ratios log(P (Ypi ≥ r)/P (Ypi = r − 1)).
In the middle column the general conditional representations of the models are given.
In these representations the distribution function F (.) can be any strictly monotonic
distribution function. It shows which conditional binary response models are contained in the ordinal model. In the case of the graded response model the condition is
empty since it is a non-conditional model.
The right column shows the representation of the general models with split variables. It also shows clearly the conditioning implicitly contained in the models. It
(1)
(k)
should be emphasized that in all the models the split variables (Ypi . . . , Ypi ) form a
Guttman space with outcomes having the form (1, . . . , 1, 0, . . . , 0). They can be seen
as generating the Guttman space, which is always defined and is not linked to any
specific model, see Tutz (2020b).
The dichotomizations or Guttman variables also clarify the meaning of parame8
= 1)
= 0)
(r)
ters. In the graded response model the split variables Ypi , which distinguish between
a response Ypi ≥ r (strong performance) and Ypi < r (weak performance), are directly
linked to the difference between ’ability’ and item parameter, θp −δir . The corresponding binary models for split variables are unconditional and have to hold simultaneously.
This allows to see the item difficulties as thresholds that are necessarily ordered, and
(r)
which have to be exceeded to obtain higher levels. The variables Ypi should not be
(r)
seen as steps. Ypi = 1 simply denotes that a person has at least performance level
r. Since performance levels are ordered, that means, its performance cannot be below
(1)
(r)
level r, or, in split variables, Ypi = · · · = Ypi = 1, which is the Guttman property of
(r+1)
the binary responses. One observes Ypi = r, if, in addition Ypi
= 0, which means
that the performance is below level r + 1. However, no steps or transitions are needed
to explain the level of performance. As Andrich (2015) argues, if a performance like
acting is to be classified according to some protocol, the judge places the person’s performance in one of the categories on the trait, not how the person transitioned in getting
to the category. Even in simple binary models for problem solving one observes if the
problem was solved or not, but not the transition. Thus, when considering ordinal models and the binary models contained in them there is no reason to construct a transition.
It might be misleading and is not compatible with the underlying process, which is
determined by simultaneous dichotomizations or the placing on the continuum of the
latent scale, which is divided by the thresholds δi1 ≤ · · · ≤ δi,k .
Interpretation of parameters is quite different in conditional models. Let us start
with the sequential model since it is by construction a step or transition model. The
(r)
(r−1)
split variables representation, P (Ypi = 1|Ypi
= 1) = F (θp − δir ), shows that the
difference between ability and item difficulty determines if the performance is above
or in category r given at least performance level r − 1 has been reached. It makes the
parameter δir a local threshold parameter. There is no ordering of thresholds involved
since later steps might be easier than previous steps.
In the partial credit model the decision that the performance is in category r as a
function of the difference between ability and item parameter is under the condition
(r−1)
(r+1)
Ypi ∈ {r − 1, r}, or equivalently, Ypi
= 1, Ypi
= 0. Thus, the binary models are
conditional models and parameters should be interpreted with reference to the conditional structure. One consequence of the conditional parameterization is that thresholds
do not have to be ordered though there has been some discussion on the ordering of
thresholds (Adams et al., 2012; Andrich, 2013, 2015; Tutz, 2020b).
Consideration of the binary models contained in ordinal models also explains why
some models are robust against the collapsing of categories. In general, if a model
holds for the original categories it does not necessarily hold if adjacent categories are
grouped yielding a smaller set of response categories, although that might be an attractive feature of a model, see also Jansen and Roskam (1986) where the extreme case of
dichotomization in the polytomous Rasch model is considered in detail. The graded
response model (with a logistic distribution function) holds also for dichotomized responses (or other groupings of adjacent categories) since the splits themselves follow
a Rasch model. This is different for the adjacent categories and the sequential model.
They are not robust against collapsing of categories because the Rasch models that are
contained are conditional. However, collapsing of categories changes the conditioning.
If, for example, categories rand r + 1 are collapsed to form a new category, the condi9
1, 2, 3, 4, 5, 6
Query Agreement/Disagreement
1, 2, 3
4, 5, 6
Query Extremity
2, 3
1
4, 5
6
Query Weakness of Attitude
2
3
4
5
F IGURE 3: A tree for six ordered categories, categories 1,2,3 represent levels of disagreement, categories 4,5,6 represent levels of agreement (compare Figure 3 in Böckenholt (2017)).
tions in the binary submodels after collapsing differ from the conditions in the original
set of categories for all conditions that contain the new category.
3 Hierarchically Structured Modeling: Tree-Based Models
The classical models considered in the previous section represent different types of
modelling concerning the conditioning. While the graded response model is a model
that does not rely on conditioning, the partial credit model conditions on a response in
adjacent categories. The sequential model is conditional but, in contrast to the partial
credit model, it can be represented as a tree (see Figure 2). This makes it a special
model, it is hierarchical, that means, it can be represented by a sequence of conditional
splits. Neither the graded response models nor the partial credit model are hierarchical.
More recently with IRTrees a whole class of hierarchical models has been introduced.
In the following we will first consider binary IRTrees and then consider alternative
approaches. For simplicity in the following the response categories are {1, . . . , m},
which is the common notation in IR-Trees.
3.1 Binary IRTree Models
Tree-based models assume a nested structure with the building blocks given as binary models. They were considered, among others, by De Boeck and Partchev (2012),
Böckenholt (2012), Khorramdel and von Davier (2014), Böckenholt (2017) and Böckenholt and Meiser (2017). In the following we use the presentation of IRTree models
given by Böckenholt (2017). IR-tree models are sequential process models, a response
is constructed based on a series of mental questions. For illustration we consider
an ordinal response with six categories representing ordinal outcomes ranging from
“strongly disagree” to “strongly agree”. Figure 3 shows the corresponding tree, which
is equivalent to Figure 3 in Böckenholt (2017). The first query determines a respondent’s agreement or disagreement. The second query determines the extremity of the
(dis)agreement and the third query assesses whether the agreement is weak or not. For
each query in the tree, which corresponds to a conditional binary decision one uses a
10
binary model. For query q the model is given by
(q)
P (Y(q)pi = 1) = F (θp(q) − δi ),
(10)
and the (local) response variable Y(q)pi is often referred to as a pseudo-item.
Pseudo-items are conditional dichotomizations, and can also be represented by split
variables. For example, the query that determines the extremity within agreement categories, distinguishing between category 6 and categories {4, 5} corresponds to mod(6)
(6)
(4)
elling the split variable Ypi |Ypi ∈ {4, 5, 6}, or alternatively Ypi |Ypi = 1. Thus, tree
models implicitly use the same dichotomizations as traditional ordinal models.
However, there is one crucial difference between traditional models and IRTree
models. While the former typically use one person parameter (and split-specific item
parameters) the majority of IRTree models uses query-specific person parameters as
given in (10). This makes the models multi-dimensional in terms of person parameters
and person parameters are interpreted with reference to the specific query, that is, the
conditional decision. In the tree given in Figure 3 the basic propensity to agree or
disagree is modelled in the first query. The person parameters in the next queries
refer to response styles, whether a person prefers extreme or middle categories. The
parameterization seems not to efficiently use the information in the ordered categories
since the propensity to agree or disagree is not present in later queries, though it might
also determine the choice between category groups {1} and {2, 3}.
Only recently more efficient binary trees have been proposed that use the same traits
in more than one query (Tutz and Draxler, 2019; Meiser et al., 2019). In particular the
approach of Meiser et al. (2019) is very attractive. They do not simply use the same
trait in different queries but use scaled versions, which may be seen as factor loadings.
(1)
Let, for example, θp denote the trait in the first pseudo-item, which distinguishes
between agreement and disagreement, then the pseudo item Y(2)pi , which distinguishes
between {1} and {2, 3}, can be parameterized by
(2)
P (Y(2)pi = 1) = F (θp(2) + αθp(1) − δi ),
(2)
where the term θp represents the tendency to prefer categories {2, 3} (given the re(1)
sponse is in categories {1, 2, 3}), and αθp represents the scaled tendency to higher
response categories. In a similar way scaled versions of traits from previous queries
are used in other pseudo-items, for details see Meiser et al. (2019). The strength of
such parameterizations is that the same person parameter is present on several levels
of the tree, and parameters that are specific to pseudo-items get a distinct meaning, for
example as a tendency to extreme or less extreme categories.
A major topic in binary trees is the modelling of response styles. However, IRTrees
provide a wide class of flexible modelling tools that is not limited to response styles.
For example, the first query may assess a person’s tendency to select a midscale answer
indicating neutrality in five or seven-grade Likert scales. Then the first split distinguishes between the middle category and other categories, the following splits model
the response if the neutral middle category is avoided. The resulting tree has an asymmetric form, see Figure 4. Models of this type are useful since the role of the neutral
category is ambivalent. Kulas et al. (2008) investigated whether it is used to indicate
a moderate standing on a trait/item, or rather is viewed by the respondent as a ’dumping ground’ for unsure or non-applicable response. In the latter case the use of the
11
1, 2, 3, 4, 5
neutral or not
1, 2, 4, 5
3
within disagreement/agreement categories
1
2
4
5
F IGURE 4: A tree for five ordered categories, categories 1,2 represent low response
categories, categories 4,5 represent high response categories, 3 is the neutral middle
category.
middle category as part of the integer protocol might yield strongly biased results. An
initial binary split that distinguishes between the neutral category and other categories
can avoid bias. Binary trees of this type have been considered by Jeon and De Boeck
(2016); Böckenholt (2017) and, more recently, by Plieninger (2020); Tutz (2020a).
3.2 Hierarchical Partitioning Using Ordinal Models
Binary splits are simple but yield rather large trees with many nodes. An alternative
that exploits the ordering of categories and provides sparser parameterizations is to use
ordinal models as building blocks.
Let us again consider an example with six ordered categories. Instead of using the
binary splits tree given in Figure 3 one can work with the tree given in Figure 5. It has
a simpler structure with only two levels in addition to the 0-level, which contains all
categories. One can model the propensity to agree or disagree by
(1)
P (Ypi ≥ 4) = F (θp − δi ),
(11)
(1)
where δi is the level 1 item parameter. The conditional propensity to choose from one
of the categories in level 2 can be specified by any simple ordinal model, for example,
by conditional graded response models,
(2)
P (Ypi ≥ r|Ypi ≤ 3) = F (αθp − δir ), r = 2, 3,
P (Ypi ≥ r|Ypi ≥ 4) = F (αθp −
(2)
δir ), r
= 5, 6,
(12)
(13)
where α scales the person parameter at the second level. The model has just one
parameter more than the simple graded response model, however, order restrictions are
(2)
(2)
(2)
(2)
weaker. One just has δi2 ≤ δi3 and δi5 ≤ δi6 whereas in the simple cumulative
model five thresholds have to be ordered. Thissen-Roe and Thissen (2013) considered
a two-decision model of this type, which uses a modified graded response model in the
second level.
Within the model it is straightforward to include response styles by adding just one
(2)
person parameter. In the extended model P (Ypi ≥ r|Ypi ≤ 3) = F (αθp + γp − δir ),
(2)
r = 2, 3, P (Ypi ≥ r|Ypi ≥ 4) = F (αθp − γp − δir ), r = 5, 6, the parameter γp
is a response style parameter that contains the tendency to middle categories. In a
12
1, 2, 3, 4, 5, 6
Query Agreement/Disagreement
1, 2, 3
4, 5, 6
Query Extremity
1
2
3
4
5
6
F IGURE 5: A tree for six ordered categories with three levels .
binary tree as given in Figure 3 several additional parameters are needed to account for
response styles whereas in the simpler structured tree in Figure 5 only one additional
parameter is needed. For the estimation one can use similar methods as in binary
trees, exploiting that likelihood contributions can be written as products of conditional
probabilities (Tutz and Draxler, 2019).
The class of hierarchical partitioning models is characterized by containing ordinal models for more than two categories as constituents. Instead of using just binary
splits the order in responses is exploited efficiently by using ordinal models as building
blocks. When modeling the response within agreement and disagreement categories
any simple ordinal model can be used. Since the ordinal models can be represented
by split variables the same holds for the model built from these blocks. They are in
particular helpful to obtain sparse parameterizations.
4 A Taxonomy of Polytomous Item Response Models Including Tree
Structured Models
The taxonomy of ordinal models given in Figure 1 covers only the basic models. An
extended taxonomy of polytomous IRT models that also contains the general class of
hierarchically structured models is given in Figure 6. It also includes the nominal
model and mixture models to be considered later Here we focus on the structure that
is obtained by they way how ordinal models can be constructed from binary models
as building blocks. At the outset it is distinguished between conditional models and
simultaneous splits, that is, graded response models. The former use binary models
in a conditional way by assuming that the choice between categories has already been
narrowed down to a reduced set of categories. In contrast, the latter assume no conditioning but assume that the splits between categories are simultaneously determined
by the same person parameter.
There are two groups of conditional models. In the first group pairs of categories
are compared by utilizing a binary response model to obtain, for example, the partial
credit model and its simplified or extended versions. The second group is formed by
hierarchical models. The crucial difference between non-hierarchical and hierarchical
models is that in the former the conditions under which binary models are assumed to
hold are overlapping. For example, in the partial credit model one binary sub model
13
Polytomous Models
Nominal Model
Conditional Models
Graded Response Model
Simultaneous Splits Model
Mixture Models
Homogeneous MM
Non-hierarchical Models
Adjacent categories models
Rating scale model
Heterogeneous MM
Hierarchically structured Models
Binary IRTrees
conditional binary splits
Asymmetric Trees
Hierarchical Partitioning
using
conditional ordinal models
Symmetric Trees
F IGURE 6: Hierarchy of polytomous models.
conditions on the the categories {0, 1}, another sub model conditions on {1, 2}. Both
conditions contain the category 1. This overlapping prevents a representation as a
hierarchical model.
Hierarchically structured models can be divided into two types of models, binary
IRTrees and hierarchical partitioning approaches. The former use only binary models to describe the conditional response in subsets of categories while the latter use
traditional models with more than two categories as building blocks.
One can further distinguish between symmetric and asymmetric tree models. Symmetric models are in particular useful for Likert items to account for the symmetry in
answer categories. Symmetric tree models can be defined by considering subsets of
categories S1 , S2 ⊂ {1, . . . , a}, where a = m/2 if m is even, and a = (m − 1)/2 if m
is odd. An IRTree model is symmetric if for any (conditional) split between S1 and S2
there is a split between S3 = {r|m − r + 1, r ∈ S2 } and S4 = {r|m − r + 1, r ∈ S1 }.
An example with an even number of categories is the splitting structure shown in Figure 3. If the number of categories is odd, and Likert items are considered, the first
split typically distinguishes between the neutral category and the other categories. Although the visual appearance of the corresponding tree shows some asymmetry, see
Figure 4, the corresponding model is symmetric, and treats categories in a proper way.
Therefore, it is essential to distinguish between the symmetry of a tree and the symmetry of the model. While the former refers to the tree structure the latter refers to the
corresponding model.
A classical example of a tree model that is not symmetric is the sequential model.
It is, in particular, not invariant under the reverse permutation of categories; if the order
14
of categories is reversed the corresponding sequential model differs from the sequential model for the original categories. In contrast, most symmetric models in common
use are invariant under the reverse permutation, namely symmetric models, in which
the response function F (.) is a symmetric distribution function. The distinction between asymmetric and symmetric models can be made for all hierarchically structured
models. It is included in the taxonomy only for binary IRTrees, which have been investigated in the literature more intensively than other hierarchical models.
In general, the graded response model and the adjacent categories models can be
used for any form of graded responses, in achievement tests as well as in the investigation of attitudes. Hierarchically structured models are somewhat different, they are
process models tailored to model a specific process. The sequential model assumes
that levels of performance are reached successively, and therefore is most useful in
items that are constructed with categories that represent successive solutions levels.
Binary IRTrees and hierarchical partitioning approaches assume a specific conditional
structure that aims at modeling the way how respondents generate a response. In hierarchically structured models, as in all conditional models, item parameters have to be
interpreted locally since they refer to conditional decisions.
5 Nominal Models, Ordinal Models and the Role of Parameterizations
All the polytomous IRT models considered so far can be considered ordinal models in
the sense that they exploit the ordering of categories. A model that is different in this
aspect is the so-called nominal model. It can be seen as a model that aims at detecting
the order rather than using it, but also as a sort of background model from which
specific ordinal models can be derived. In a taxonomy of polytomous IRT models,
which essentially is a taxonomy of ordinal models, it should be included and its role
be investigated.
Another major topic in the following is the role of the parameterization of a model,
which also plays a role in the transformation of the nominal model into an ordinal
model. Variations in parameterizations yield more or less complex models of specific
model types in the hierarchy given in Figure 6, various parameterizations can be used
on every level of the hierarchy (Section 5.2). Another aspect of the parameterization
within the taxonomy considered here concerns the link between parameterization and
the exploitation of the ordering of categories. It is argued that the split variables and
their specific parameterization make models ordinal models.
5.1 Nominal Models
The taxonomy uses that ordinal models can be seen as composed from simpler, in
particular binary models. This is most obvious in IRTrees but holds also for basic
models as the graded response model. What makes the models ordinal ones is that the
binary models are assumed to hold only for specific subsets of categories. For example,
the graded response model assumes binary models to hold for subsets {0, . . . , r} and
{r+1, . . . , k}. None of the ordinal models are built from binary models that distinguish
between subsets such as {3, 7} and {5}. All subsets that are used reflect the ordering of
the categories {0, . . . , k}. More concrete, binary models are assumed to hold, possibly
conditionally, for subsets S1 , S2 , with c1 < c2 for c1 ∈ S1 , c2 ∈ S2 , and the binary
15
models distinguish between S1 an S2 in a way such that an increase in θp increases the
probability of a response in S2 . It can be seen as an ’ordered subsets’ characterization
of ordinal models, which is linked to split variables as considered later.
This aspect is emphasized since ordinal models are sometimes derived from models
that do not use the order of categories. The most widely used model to this end is
Bock’s nominal model (Bock, 1972)
exp(αir θp − βir )
P (Ypi = r) = Pk
,
s=0 exp(αis θp − βis )
r = 1, . . . , k,
(14)
in which additional constraints are needed to ensure identifiability of parameters, see
Bock (1972), Thissen and Cai (2016). In the basic form the model uses only the nominal scale of the response, however, it can be transformed to use the order information.
A first step is to set αir = φr , where φr are considered scoring functions for the categories, yielding Andersen’s version of the model (Andersen, 1977). If, in addition, it
is assumed that the scores are ordered, that is, φ1 ≤ · · · ≤ φk , one obtains a model that
actually uses the ordering of categories. If one assumes equi-distant scores, φr = r one
obtains the partial credit model, which has been noted among others by Thissen and
Steinberg (1986), where much more general transformations were considered. Also a
general-purpose multidimensional model was considered by Thissen et al. (2010).
The nominal model can be seen as a useful background model from which various
ordinal models can be derived as special cases. It is also interesting from a conceptual
point of view since it has also been used in a different way, namely to check the order
of categories if that is not clear. This use is linked to different conceptualizations of ordinal and nominal models. An ordinal model, in the sense used here, exploits the order
of categories while a nominal model is a model that is invariant against permutations
of response categories. In its general form the model (14) is a nominal model but not
an ordinal one since it is stable under permutations.
On the other hand it uses a uni-dimensional trait, which implicitly assumes an
order of the latent trait and therefore on the responses. This makes it a model that
can be used to investigate the order of categories. Fitting the unconstrained model
might yield information on the order, and it can be used to fit responses constructed
for testlets, see, for example, Thissen and Cai (2016). It can also be used to provide
scores using information from all responses even when the response categories are not
clearly ordered. When used in this way it does not exploit the order of categories but
aims at investigating the order empirically. This is a different concept of dealing with
ordinality, namely using the model ’to examine the expected, or empirical, ordering of
response categories’ (Thissen and Cai, 2016).
It becomes an ordinal model in the sense used here if restrictions are imposed.
Moreover, it generates a whole family of models that is strongly linked to the partial
credit model. It should be noted that it does not generate the general adjacent categories
model, but only models that use the logistic link, which, however, are the most widely
used ones. This is visualized in the tree structure given in Figure 6. Adjacent categories
models are sub models of nominal models (if the logistic link is used) but can also be
considered as specific conditional models (for any response function F (.)).
16
5.2 Models and Parameterizations
In the proposed taxonomy the parameterization is considered secondary. That does not
mean that parameterization is not important. It is important, and much of the more recent latent trait literature is devoted to account for specific features like response styles
or differential item functioning, which can be investigated by using specific parameterizations. However, parameterizations do not alter the structure given in Figure 6. They
may be seen as special cases within this framework.
Given the conditional or unconditional structure of a model quite differing parameterizations can be used. Instead of the simple difference between person parameters
θp − δir one can include slope parameters yielding αi (θp − δir ). Response style effects can be modeled by adding additional person parameters yielding models with a
multi-dimensional person parameter, see Johnson (2003) for cumulative type models
with extreme response styles, and Wetzel and Carstensen (2017), Plieninger (2016), Jin
and Wang (2014), Tutz et al. (2018) for partial credit models that account for response
styles.
The taxonomy given in Figure 6 also includes models that make much weaker
assumptions on the response functions. Introduced by Mokken (1971) nonparametric IRT models have been extended to a wide class of models nonparametric IRT
models, see, for example, Sijtsma and Molenaar (2002). Assumptions are much
weaker, only local independence, uni-dimensionality and some form of monotonicity are needed. Ordinal nonparametric models can be derived by using more general
functions in the binary models that are the building blocks of the models in the taxonomy. Instead of using the parametric form F (θp − δi ) one uses a uni-dimensional
monotonic function. For example, the cumulative version is obtained by assuming
P (Ypi ≥ r) = Mir (θp ), where Mir is a strictly increasing function that can depend on
the item i and the response category r. Corresponding adjacent categories and sequential models are obtained by using on the left hand side P (Ypi = r|Ypi ∈ {r − 1, r}) or
P (Ypi = r|Ypi ≥ r − 1), respectively. Models of this form have been considered by
Hemker et al. (1997, 2001).
Parameterizations yield specific hierarchies if the conditioning (the type of model)
and the response function are fixed. An example is the hierarchy given by Hemker
et al. (2001) in their Figure 2 for sequential models, which starts with the very restrictive sequential rating scale model, in which the location parameter is split into an item
location parameter and a step parameter. The most general models in this hierarchy
are Samejima’s acceleration model Samejima (1995) and the nonparametric sequential
model. Similar hierarchies may be built for all of the models identified in Figure 6 but
they are hierarchies generated by parameterization within the taxonomy. The taxonomy itself, which shows the relationship of models characterized by their conditioning,
is unchanged.
Taxonomies that focus on parameterizations have been given by Hemker et al.
(1997), Hemker et al. (2001) and Sijtsma and Hemker (2000). They study carefully
which parameterizations are special cases of other ones and display the structure in
Venn diagrams. They also investigate so-called measurement properties of models as
the monotone likelihood ratio, stochastic ordering properties and invariant item ordering, and show which models have these properties.
17
5.3 Characterization of Ordinal Models
In the following the fundamental structure of ordinal models is investigated. It is argued that binary models for split variables are the essential constituents of models that
are able to exploit the ordering of categories. Although typically there is some intuition
why models are appropriate for ordered responses, for example the ordered thresholds
on the latent scale in cumulative models, and the process from which the sequential
model is derived, these motivations do not yield a general conceptualization of ordinal
models. Nevertheless, in a taxonomy of polytomous models it seems warranted that
one distinguishes between ordinal and nominal models.
All the models considered here contain binary submodels of the form P (Yp+ ∈
S1 |Yp+ ∈ S2 ) = g(θp , {δis }), where S1 ⊂ S2 and {δis } is a set of item parameters. For example, the tree in Figure 3 contains a binary model that distinguishes
between {1} and {2, 3} given {1, 2, 3}. The corresponding model for P (Ypi ∈
{2, 3}|Ypi ∈ {1, 2, 3}) can also be described by split variables, it is equivalent to
(2)
(4)
modelling P (Ypi = 1|Ypi = 0). Also in partitioning models that have ordinal
components binary models for split variables are contained. If a cumulative model
is used to model the response given {1, 2, 3} in the structure given in Figure 5 the
model contains a binary model P (Ypi ∈ {2, 3}|Ypi ∈ {1, 2, 3}), which is equivalent to
(2)
(4)
P (Ypi = 1|Ypi = 0).
However, binary models for split variables are not only submodels of ordinal model
but are also the building blocks of the models. Therefore, a general class of models
can be defined by postulating that there exists a finite number of submodels for split
variables
(l)
(s)
(r)
(15)
P (Ypi = 1|Ypi = 1, Ypi = 0) = gl (θp(l) |{δis }), s < r,
such that
(i) the function gl (.|{δis }) is nondecreasing for at least one l with s ≤ l ≤ r,
(ii) the response probabilities are uniquely determined by these submodels.
The set of submodels are called the model generating binary models, and the class of
generated models split variables generated ordinal models.
The second condition just ensures that the polytomous model can be constructed
from the set of binary models. The crucial ingredients in the definition are the form
of the conditioning in (15) and that the function is nondecreasing. The condition
(s)
(r)
Ypi = 1, Ypi = 0 means that conditioning refers to a sequence of categories since it
is equivalent to Ypi ∈ {s, s + 1, . . . , r − 1}. It can also be empty, which is the case
(k+1)
if s = 0, r = k + 1, and one defines Ypi
= 0. The postulate that the function is
nondecreasing ensures that the order of categories is used in a consistent way. It im(l)
plies that, whatever the conditioning, an increase in θp is in favor of higher categories,
the probability of lower categories can not increase. It excludes, in particular, that one
constructs a binary tree, in which, for example, the response in the lower category 1
given {1, 2} and the response in the higher category 6 given {5, 6} are modeled by
binary Rasch models. This construction would violate the order of categories, and is
avoided by using split variables to define the condition.
18
With the exception of the nominal model, all models considered here are split variables generated ordinal models. In the traditional models, that is, the cumulative, sequential, and adjacent categories model, it is typically assumed that the trait does not
depend on the the split, that means, one has in all models generating binary models
(l)
with θp = θp for all l. The model generating binary models are the ones given in
((l))
Table 1. In IRTrees the parameters θp are not necessarily the same, they can vary
over the binary models.
It is instructive to investigate why the nominal model is not among the class of
models specified by (15). In particular, it clarifies the conditions in the definition of
this class of models. The nominal model (14) is certainly a nominal but not an ordinal
model since it is invariant under permutations. Nevertheless, it can be constructed from
binary submodels log(P (Ypi = r)/P (Ypi = 1)) = γir θp − ξir with γir ≥ 0. These submodels yield log(P (Ypi = r)/P (Ypi = r − 1)) = (γir − γi,r−1 )θp + ξi,r−1 − ξi,r .
This model is equivalent to the nominal model, which is seen by using the reparameterization αir = γir − γi,r−1 , βir = ξi,r − ξi,r−1 . Thus, a nominal model
is constructed from binary models that are nondecreasing in person parameters because γir ≥ 0 is assumed. The crucial point is that the model generating submodels
log(P (Ypi = r)/(P (Ypi = 1) = γir θp − ξir are based on the conditioning Ypi ∈ {1, r},
which is not of the type used in the definition of the class of split variables generated ordinal models. This clarifies that one has to postulate for the (conditional) model
(s)
(r)
generating binary models that the condition is of the form Ypi = 1, Ypi = 0 or, equivalently, Ypi ∈ {s, s + 1, . . . , r − 1}. Without that condition it would not be ensured that
a model exploits the ordering of categories.
The class of split variables generated ordinal models comprises all the traditional
ordinal models as well as the hierarchical models considered previously. It does not
depend on specific parameterizations, it is just assumed that response functions are
nondecreasing, and that the condition in the generating model has a specific form determined by split variables. Although it can not be totally excluded that there might be
alternative ways to find models that use the order in categories, the considered class of
models seems rather exhaustive.
An additional advantage of characterizing ordinal models by the binary submodels they contain is that this characterization is rather flexible and avoids questionable criteria. For example, Adams et al. (2012) consider categories in an item response model as ordered if the expectation E(Yp |θp ) is an increasing function of θp . That means that if a person has a higher value of θp than another person, then the person with the higher value will, on average, score higher. The problem with this definition is that the expectation is a sensible measure only if the response Yp is measured on a metric scale level; however, one wants to characterize the use of the ordinal scale level. The expectation is not helpful for this purpose because it uses a scale level that is not assumed to be available. Alternative ways of characterizing the use of order rely on specific functions. Adams et al. (2012), for example, also suggest that a model uses the order if for any ordered pair s < r the function msr (θp ) = P (Ypi = r|θp )/P (Ypi = s|θp ) is an increasing function of θp . This can be seen as a stochastic ordering property since it implies that for θp1 < θp2 one has msr (θp1 ) < msr (θp2 ). However, one might use quite different functions, for example, require that P (Yp ≥ r|θp ) is an increasing function of θp , yielding quite different conceptualizations of ordinal models that are not compatible. The strength of the conceptualization based on split variables is that no specific functions, which are somewhat arbitrary, are needed.
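To make the function-based criterion discussed above concrete, the following small Python check (an illustration under an assumed partial credit parameterization; it is not part of any referenced software) evaluates msr (θp ) = P (Ypi = r|θp )/P (Ypi = s|θp ) on a grid of θp values and confirms numerically that it increases whenever s < r.

import numpy as np

def pcm_probs(theta, deltas):
    # partial credit model: P(Y = r | theta) proportional to
    # exp(sum_{l <= r} (theta - delta_l)), with the empty sum for r = 0
    cum = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas))))
    p = np.exp(cum)
    return p / p.sum()

deltas = [-0.5, 0.3, 1.0]                 # illustrative threshold parameters
thetas = np.linspace(-2.0, 2.0, 5)        # grid of person parameter values
s, r = 1, 3                               # an ordered pair with s < r
m_sr = [pcm_probs(t, deltas)[r] / pcm_probs(t, deltas)[s] for t in thetas]
print(np.all(np.diff(m_sr) > 0))          # True: m_sr(theta) increases in theta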
6 Mixture Models
An alternative class of models that has been included in Figure 6 is the class of mixture models. They follow a quite different reasoning to account for heterogeneity in responses and response styles and therefore are included as a separate class of models. General finite mixture models for latent traits have the form

P(Yp1 , . . . , YpI ) = ∑_{m=1}^{M} πm Pm (Yp1 , . . . , YpI | θp^(m) , {δir }).

That means the population is subdivided into M latent classes, where Pm (.) denotes the model in latent class m with parameters θp^(m) , {δir }, and π1 , . . . , πM denote the mixture probabilities of the latent classes (∑m πm = 1).
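As a sketch of how such a mixture probability could be evaluated, the following Python code (all function names and numeric values are illustrative assumptions, not a reference implementation) uses partial credit models with class-specific person parameters θp^(m) as components and computes the mixture probability of one response vector for a two-class homogeneous mixture with shared item parameters.

import numpy as np

def pcm_probs(theta, deltas):
    # partial credit model category probabilities for a single item
    cum = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas))))
    p = np.exp(cum)
    return p / p.sum()

def component_prob(y_p, theta_m, item_deltas):
    # P_m(Y_p1, ..., Y_pI | theta_p^(m), {delta_ir}) under local independence
    return np.prod([pcm_probs(theta_m, d)[y] for y, d in zip(y_p, item_deltas)])

def mixture_prob(y_p, pis, thetas, item_deltas):
    # P(Y_p1, ..., Y_pI) = sum_m pi_m * P_m(Y_p | theta_p^(m), {delta_ir})
    return sum(pi_m * component_prob(y_p, th_m, item_deltas)
               for pi_m, th_m in zip(pis, thetas))

# illustrative two-class mixture of partial credit models
item_deltas = [[-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5]]   # two items, four categories each
y_p = [2, 1]                                         # one response vector
print(mixture_prob(y_p, pis=[0.7, 0.3], thetas=[0.4, -1.2], item_deltas=item_deltas))

In practice the class-specific parameters and mixture probabilities would of course be estimated, for example by an EM-type algorithm, rather than fixed as here.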
Mixture item response models, originally developed for Rasch models by Rost (1991), are strong tools for investigating unidimensionality, the presence of response styles, and differential item functioning without assuming that the relevant grouping variable that induces differential item functioning is known. Extensions to ordinal responses have been considered by Rost et al. (1997), Eid and Rauber (2000), Gollwitzer et al. (2005), Maij-de Meij et al. (2008), Moors (2010), Van Rosmalen et al. (2010), and Von Davier and Yamamoto (2004); for an overview see also Von Davier and Yamamoto (2007).
It seems sensible to distinguish between two approaches to specifying mixture models, the homogeneous modelling strategy and the heterogeneous strategy. In homogeneous finite mixture models the same functional form is used in all mixture components, for example, a partial credit model (Eid and Rauber, 2000; Gollwitzer et al., 2005). It is assumed that respondents come from different latent classes, but only the model parameters, not the structure of the model, vary across classes. The approach is not without problems. Typically the number of classes is unknown and has to be chosen in a data-driven way. However, one obtains quite different model parameters when fitting, for example, three or four classes, since all the parameters change when one more class is considered. Even when a number of classes has been chosen, it is sometimes still difficult to interpret the differences between classes and to explain exactly which features the classes represent; they might indicate a response style or some other dimension that is involved when responding to items. Homogeneous models do not explicitly specify which trait is to be detected and are primarily exploratory tools.
Heterogeneous finite mixture models are sharper tools; they allow different models to be used in the components, specifying explicitly which trait is to be detected. Moreover, the number of components is typically fixed. Early mixture models of this type are the HYBRID models proposed by Von Davier (1996) and Von Davier and Yamamoto (2007). Although some HYBRID models can be represented as mixture models that have the same functional form in the components but with constraints in some of the components (Von Davier and Yamamoto, 2007), the constraints specify which traits are modelled. Further models with constraints have been proposed by De Boeck et al. (2011) and Shu et al. (2013).
A specific mixture with fundamentally different components has been considered more recently by Tijmstra et al. (2018). They proposed a two-class mixture of a generalized partial credit model and an IRTree model, carefully designed to distinguish between respondents who consider the middle category of a five-category Likert item as representing one category in a sequence of ordered categories and respondents who use the middle category as a non-response option. While the former follow a partial credit model, the response of the latter is described by a specific IRTree model that separates the middle category.
All mixture models in which at least one of the components is an ordinal model account for the order of categories. Therefore, they are included in the taxonomy, but they form a separate class of models with specific purposes. In particular, they can be used to model response styles in a quite different way than IRTrees and extensions of classical models such as the partial credit model with response styles. In the following, possible approaches are considered briefly.
To avoid the pitfalls of homogeneous mixture models it might be sensible to use structured mixtures, in which the type of response style is explicitly specified. With response vector Yp^T = (Yp1 , . . . , YpI ) a simple two-component model has the form

P(Yp ) = πM PM (Yp | θp , {δir }) + (1 − πM ) PRS (Yp | par),

where in the first component responses are determined by model M, with PM (Yp | θp , {δir }) referring to a partial credit or some other ordinal model, and the second component PRS (Yp | par) specifies the response style that is suspected to be present.
For example, one might investigate whether a portion of respondents shows a non-contingent response style, which is present if persons have a tendency to respond carelessly, randomly, or nonpurposefully (Van Vaerenbergh and Thomas, 2013; Baumgartner and Steenkamp, 2001), by specifying

PRS (Yp | par) = ∏_{i=1}^{I} P(Ypi | {δir^m }),

where {δir^m } are parameters that determine the marginal distribution of responses on item i. The specification means that responses on items are independent and determined only by the item parameters. An alternative is a mixture with the component

PRS (Yp | par) = PM (Yp | θp , {δir^RS }, γp ),

where γp are additional response style parameters in a partial credit model if, for example, a partial credit model determines the first mixture component. The parameters {δir^RS } are parameters for model M in the second mixture component. Then it is assumed that respondents who are affected by a response style have different parameters than respondents without response style. However, it may also be assumed that the parameters are the same as in the first component. Then one allows respondents to be affected by response styles to differing degrees.
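A sketch of the two-component structured mixture with a non-contingent response style component might look as follows in Python; the partial credit form of the ordinal component, all function names, and all numeric values are assumptions made purely for illustration.

import numpy as np

def pcm_probs(theta, deltas):
    # partial credit model category probabilities for a single item
    cum = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas))))
    p = np.exp(cum)
    return p / p.sum()

def p_ordinal(y_p, theta_p, item_deltas):
    # P_M(Y_p | theta_p, {delta_ir}): ordinal component, here a partial credit model
    return np.prod([pcm_probs(theta_p, d)[y] for y, d in zip(y_p, item_deltas)])

def p_noncontingent(y_p, item_margins):
    # P_RS(Y_p | par): independent responses driven only by item-specific
    # marginal category probabilities (non-contingent responding)
    return np.prod([m[y] for y, m in zip(y_p, item_margins)])

def two_component_prob(y_p, pi_M, theta_p, item_deltas, item_margins):
    # P(Y_p) = pi_M * P_M(Y_p | ...) + (1 - pi_M) * P_RS(Y_p | ...)
    return (pi_M * p_ordinal(y_p, theta_p, item_deltas)
            + (1.0 - pi_M) * p_noncontingent(y_p, item_margins))

# illustrative values only
item_deltas = [[-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5]]
item_margins = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
print(two_component_prob([3, 0], pi_M=0.8, theta_p=0.2,
                         item_deltas=item_deltas, item_margins=item_margins))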
Approaches like these go beyond the classical modelling of response styles in mixture models. In classical mixture models respondents are either affected by response styles or not; response style is considered a discrete trait (Bolt and Johnson, 2009). In contrast, in parametric models response styles are represented by parameters, which may be small or large and vary across persons, making response style a continuous trait. The mixture given above combines these two worlds: respondents may not be affected by a response style, or they may be affected, but to differing degrees.
Models of this type seem not to have been considered yet, although there has been some work on modelling uncertainty, which is related to non-contingent response styles; however, these approaches were proposed mainly in the regression context, not for repeated measurements such as item responses. For an overview of uncertainty modelling in regression see Piccolo and Simone (2019); repeated measures were considered by Colombi et al. (2018).
7 Concluding Remarks
It has been shown that an easily comprehensible taxonomy of ordinal item response models can be obtained by investigating the role of building blocks and split variables within the structure of ordinal models. The structure contains the traditional models, IRTree models, and the class of hierarchical partitioning models. Although it is well known that ordinal models contain binary models, their role in the construction of ordinal models seems not to have been investigated in a systematic way to obtain a taxonomy. In particular, the distinction between non-conditional and conditional models, the split of the latter into hierarchical and non-hierarchical models, and the role of the nominal model and how it is to be distinguished from ordinal models contribute to clarifying the structure of polytomous models.
One of the advantages of having a distinct taxonomy of models is that the meaning of parameters becomes clear. In particular, effects in conditional models should be interpreted with regard to the conditioning, which holds for parametric as well as nonparametric approaches. Parameterizations do not determine the taxonomy. They primarily determine the complexity of the model and specify which effects are included in the model. Alternative parameterizations have different meanings, and an additional slope parameter has a quite different interpretation if it is included in an adjacent categories model or in an IRTree. The meaning of parameters depends on the model type, and therefore on the placement in the taxonomy.
References
Adams, R. J., M. L. Wu, and M. Wilson (2012). The Rasch rating model and the disordered threshold controversy. Educational and Psychological Measurement 72(4),
547–573.
Agresti, A. (2013). Categorical Data Analysis, 3rd Edition. New York: Wiley.
Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika 42,
69–81.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika 43(4), 561–573.
Andrich, D. (2010). Sufficiency and conditional estimation of person parameters in the
polytomous Rasch model. Psychometrika 75(2), 292–308.
Andrich, D. (2013). An expanded derivation of the threshold structure of the polytomous Rasch model that dispels any ’threshold disorder controversy’. Educational
and Psychological Measurement 73(1), 78–124.
Andrich, D. (2015). The problem with the step metaphor for polytomous models for
ordinal assessments. Educational Measurement: Issues and Practice 34(2), 8–14.
Andrich, D. (2016). Rasch rating-scale model. In W. Van der Linden (Ed.), Handbook
of Modern Item Response Theory, pp. 75–94. Springer.
Baumgartner, H. and J.-B. E. Steenkamp (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research 38(2), 143–
156.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are
scored in two or more nominal categories. Psychometrika 37(1), 29–51.
Böckenholt, U. (2012). Modeling multiple response processes in judgment and choice.
Psychological Methods 17(4), 665–678.
Böckenholt, U. (2017). Measuring response styles in Likert items. Psychological Methods 22, 69–83.
Böckenholt, U. and T. Meiser (2017). Response style analysis with threshold and multi-process IRT models: A review and tutorial. British Journal of Mathematical and Statistical Psychology 70(1), 159–181.
Bolt, D. M. and T. R. Johnson (2009). Addressing score bias and differential item
functioning due to individual differences in response style. Applied Psychological
Measurement 33(5), 335–352.
Colombi, R., S. Giordano, A. Gottard, and M. Iannario (2018). Hierarchical marginal
models with latent uncertainty. Scandinavian Journal of Statistics, to appear.
De Boeck, P., S.-J. Cho, and M. Wilson (2011). Explanatory secondary dimension
modeling of latent differential item functioning. Applied Psychological Measurement 35(8), 583–603.
De Boeck, P. and I. Partchev (2012). IRTrees: Tree-based item response models of the GLMM family. Journal of Statistical Software 48(1), 1–28.
Eid, M. and M. Rauber (2000). Detecting measurement invariance in organizational
surveys. European Journal of Psychological Assessment 16(1), 20–30.
García-Pérez, M. A. (2017). An analysis of (dis)ordered categories, thresholds, and crossings in difference and divide-by-total IRT models for ordered responses. The Spanish Journal of Psychology 20, 1–27.
Gollwitzer, M., M. Eid, and R. Jürgensen (2005). Response styles in the assessment of anger expression. Psychological Assessment 17(1), 56.
Hemker, B. T., K. Sijtsma, I. W. Molenaar, and B. W. Junker (1997). Stochastic ordering using the latent trait and the sum score in polytomous IRT models. Psychometrika 62(3), 331–347.
Hemker, B. T., L. A. van der Ark, and K. Sijtsma (2001). On measurement properties
of continuation ratio models. Psychometrika 66(4), 487–506.
Jansen, P. G. and E. E. Roskam (1986). Latent trait models and dichotomization of
graded responses. Psychometrika 51(1), 69–91.
Jeon, M. and P. De Boeck (2016). A generalized item response tree model for psychological assessments. Behavior Research Methods 48(3), 1070–1085.
Jin, K.-Y. and W.-C. Wang (2014). Generalized IRT models for extreme response style. Educational and Psychological Measurement 74(1), 116–138.
Johnson, T. R. (2003). On the use of heterogeneous thresholds ordinal regression
models to account for individual differences in response style. Psychometrika 68(4),
563–583.
Khorramdel, L. and M. von Davier (2014). Measuring response styles across the big
five: A multiscale extension of an approach using multinomial processing trees.
Multivariate Behavioral Research 49(2), 161–177.
Kulas, J. T., A. A. Stachowski, and B. A. Haynes (2008). Middle response functioning in Likert-responses to personality items. Journal of Business and Psychology 22(3), 251–259.
Maij-de Meij, A. M., H. Kelderman, and H. van der Flier (2008). Fitting a mixture
item response theory model to personality questionnaire data: Characterizing latent
classes and investigating possibilities for improving prediction. Applied Psychological Measurement 32(8), 611–631.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika 47,
149–174.
Masters, G. N. and B. Wright (1984). The essential process in a family of measurement
models. Psychometrika 49, 529–544.
Meiser, T., H. Plieninger, and M. Henninger (2019). IRTree models with ordinal and multidimensional decision nodes for response styles and trait-based rating responses. British Journal of Mathematical and Statistical Psychology.
Mokken, R. J. (1971). A theory and procedure of scale analysis. Berlin: Walter de
Gruyter.
Moors, G. (2010). Ranking the ratings: A latent-class regression model to control
for overall agreement in opinion research. International Journal of Public Opinion
Research 22(1), 93–119.
Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data.
Applied Psychological Measurement 14(1), 59–71.
Muraki, E. (1997). A generalized partial credit model. Handbook of modern item
response theory, 153–164.
Piccolo, D. and R. Simone (2019). The class of CUB models: statistical foundations, inferential issues and empirical evidence. Statistical Methods and Applications, https://doi.org/10.1007/s10260-019-00461-1.
Plieninger, H. (2016). Mountain or molehill? A simulation study on the impact of response styles. Educational and Psychological Measurement 77, 32–53.
Plieninger, H. (2020). Developing and applying IR-tree models: Guidelines, caveats, and an extension to multiple groups. Organizational Research Methods, doi:10.1177/1094428120911096.
Rost, J. (1991). A logistic mixture distribution model for polychotomous item responses. British Journal of Mathematical and Statistical Psychology 44(1), 75–92.
Rost, J., C. Carstensen, and M. Von Davier (1997). Applying the mixed Rasch model to personality questionnaires. Applications of latent trait and latent class models in the social sciences, 324–332.
Samejima, F. (1995). Acceleration model in the heterogeneous case of the general
graded response model. Psychometrika 60(4), 549–572.
Samejima, F. (2016). Graded response model. In W. Van der Linden (Ed.), Handbook of Item Response Theory, pp. 95–108.
Shu, Z., R. Henson, and R. Luecht (2013). Using deterministic, gated item response
theory model to detect test cheating due to item compromise. Psychometrika 78(3),
481–497.
Sijtsma, K. and B. T. Hemker (2000). A taxonomy of IRT models for ordering persons and items using simple sum scores. Journal of Educational and Behavioral Statistics 25(4), 391–415.
Sijtsma, K. and I. W. Molenaar (2002). Introduction to nonparametric item response
theory, Volume 5. Sage.
Thissen, D. and L. Cai (2016). Nominal categories model. In W. Van der Linden (Ed.),
Handbook of Modern Item Response Theory, pp. 51–73. Springer.
Thissen, D., L. Cai, and R. D. Bock (2010). The nominal categories item response
model. Handbook of polytomous item response theory models, 43–75.
Thissen, D. and L. Steinberg (1986). A taxonomy of item response models. Psychometrika 51(4), 567–577.
Thissen-Roe, A. and D. Thissen (2013). A two-decision model for responses to Likert-type items. Journal of Educational and Behavioral Statistics 38(5), 522–547.
Tijmstra, J., M. Bolsinova, and M. Jeon (2018). Generalized mixture irt models with
different item-response structures: A case study using Likert-scale data. Behavior
Research Methods 55, 1–20.
Tutz, G. (1990). Sequential item response models with an ordered response. British Journal of Mathematical and Statistical Psychology 43, 39–55.
Tutz, G. (2020a). Hierarchical models for the analysis of Likert scales in regression and
item response analysis. International Statistical Review, doi:10.1111/insr.12396.
Tutz, G. (2020b). On the structure of ordered latent trait models. Journal of Mathematical Psychology 96.
Tutz, G. and C. Draxler (2019). A common framework for classical and tree-based item
response models including extended hierarchically structured models. Technical
Report 227, Department of Statistics LMU Munich.
Tutz, G., G. Schauberger, and M. Berger (2018). Response styles in the partial credit
model. Applied Psychological Measurement 42, 407–427.
Van der Linden, W. (2016). Handbook of Item Response Theory. Springer: New York.
Van Rosmalen, J., H. Van Herk, and P. Groenen (2010). Identifying response styles: A
latent-class bilinear multinomial logit model. Journal of Marketing Research 47(1),
157–172.
Van Vaerenbergh, Y. and T. D. Thomas (2013). Response styles in survey research: A
literature review of antecedents, consequences, and remedies. International Journal
of Public Opinion Research 25(2), 195–217.
Verhelst, N. D., C. Glas, and H. De Vries (1997). A steps model to analyze partial
credit. In Handbook of modern item response theory, pp. 123–138. Springer.
Von Davier, M. (1996). Mixtures of polytomous Rasch models and latent class models for ordinal variables. Softstat 95.
Von Davier, M. and K. Yamamoto (2004). Partially observed mixtures of IRT models: An extension of the generalized partial-credit model. Applied Psychological
Measurement 28(6), 389–406.
Von Davier, M. and K. Yamamoto (2007). Mixture-distribution and hybrid Rasch models. In Multivariate and mixture distribution Rasch models, pp. 99–115. Springer.
Wetzel, E. and C. H. Carstensen (2017). Multidimensional modeling of traits and response styles. European Journal of Psychological Assessment 33, 352–364.