Academia.eduAcademia.edu

A Taxonomy of Polytomous Item Response Models

2020, arXiv (Cornell University)

A Taxonomy of Polytomous Item Response Models Gerhard Tutz Ludwig-Maximilians-Universität München arXiv:2010.01382v1 [stat.ME] 3 Oct 2020 Akademiestraße 1, 80799 München October 6, 2020 Abstract A common framework is provided that comprises classical ordinal item response models as the cumulative, sequential and adjacent categories models as well as nominal response models and item response tree models. The taxonomy is based on the ways binary models can be seen as building blocks of the various models. In particular one can distinguish between conditional and unconditional model components. Conditional models are by far the larger class of models containing the adjacent categories model and the whole class of hierarchically structured models. The latter is introduced as a class of models that comprises binary trees and hierarchically structured models that use ordinal models conditionally. The study of the binary models contained in latent trait models clarifies the relation between models and the interpretation of item parameters. It is also used to distinguish between ordinal and nominal models by giving a conceptualization of ordinal models. The taxonomy differs from previous taxonomies by focusing on the structured use of dichotomizations instead of the role of parameterizations. Keywords: Ordered responses, latent trait models, item response theory, graded response model, partial credit model, sequential model, Rasch model, item response trees 1 Introduction Various latent trait models for ordered response data have been proposed in the literature, for an overview see, for example, Van der Linden (2016). One can in particular distinguish between three basic types of models, cumulative models, sequential models and adjacent categories models. One of the objectives of the present paper is to show how these models are easily built from binary latent trait models. The way how the binary models are used to construct models helps to understand the structure of the models and to clarify the meaning of the parameters. It also provides a framework that allows to embed more recently developed ordinal item response models as, for example, tree-based models, yielding a general taxonomy of ordinal item response models. 1 The proposed taxonomy is quite different from that given by Thissen and Steinberg (1986). Their classification into “difference” models and “divide-by-total” models is based on the form of the response probabilities, which may be represented as differences or as a ratio obtained by dividing by sums of terms. The third class of models they consider are “left-side added models”, which arise if the parameterization is extended to account for guessing parameters. The taxonomy proposed here is of a different nature. It is based on exploiting how ordinal models can be constructed by using (conditional or unconditional) dichotomizations of response categories. It also works the other way, by clarifying which binary models (or dichotomizations) are contained in ordinal models. By investigating this structural aspect one obtains a taxonomy that uses that ordinal models can be characterized by the way they determine the choice of specific subsets of categories. As a preview let us consider how classical models can be described by considering dichotomizations. The basic building blocks are binary models, which in its simplest form specify that the responses of person p on item i are determined by P (Ypi = 1) = F (θp − δi ), (1) where F (.) 
is a cumulative distribution function, θp is the person parameter, and δi is the item parameter, typically a difficulty or threshold. An important member of this class of models is the Rasch model, which is obtained if F (.) is the logistic distribution function F (η) = exp(η)/(1 + exp(η)). Given one has a response in ordered categories {0, 1, . . . , k} there are several ways to construct an ordinal model from binary models of the form (1). The binary models can be used to compare specific categories or groups of categories from {0, 1, . . . , k}. One can, in particular, - compare groups of categories that result from splitting the categories into the subsets {0, 1, . . . , r − 1} and {r, . . . , k}, - compare (conditionally) between two categories, for example, adjacent categories, - compare (conditionally) between a category and a set of adjacent categories, for example, {r − 1} and {r, . . . , k}. The different ways to compare categories correspond to cumulative models, adjacent categories and sequential models in that order. The taxonomy given in Figure 1 distinguishes between conditional and non-conditional models, a distinction which follows from the consideration of the binary models that are contained in the ordinal models. The cumulative or graded response model is the only non-conditional model from this class of models. The other two use some sort of conditioning in the binary building blocks within the models. The graded response model corresponds to the difference models, and the adjacent categories models to the divide-by-total models in the Thissen-Steinberg taxonomy. Thissen and Steinberg (1986) did not consider sequential models, which were not known in 1986. Although in the present taxonomy and the Thissen-Steinberg taxonomy two model classes cover the same types of models, the focus is different. Here, the models are not characterized by the form of the probability but by their building 2 Ordinal Models Conditional Models Adjacent Categories Model compares adjacent categories Graded Response Model Simultaneous Splits Model Sequential Model compares category and groups F IGURE 1: Structure of classical ordinal latent trait models. blocks. The consideration of building blocks allows not only to include the sequential models and other models, but is also helpful to obtain a valid interpretation of the parameters of the models, which has not always been correct and has been a subject of debate in the literature on ordinal models, see, for example, Adams et al. (2012), Andrich (2013, 2015), Garcı́a-Pérez (2017), Tutz (2020b). The focus on building blocks allows to identify common structures beyond the choice of the response function F (.) in (1) and the parameterization. For the structuring it does not matter if one chooses the logistic response function, the normal ogive or any other strictly monotone distribution function. One might also extend the linear term to include an item discrimination parameter by using αi (θp − δi ) instead of θp − δi , or extend it to include guessing parameters. Although parameterization is considered secondary when characterizing model types it can be important as a source of ordinality in latent trait models. This will be investigated separately when considering nominal models. In Section 2 classical ordered response models are considered. It is used that they can be characterized as parameterizations of split variables, which yield the preliminary structure given in Figure 1. Section 3 is devoted to tree structured models. 
In particular binary IR-tree models are considered, which have been introduced as flexible models to account for response styles. Within the general hierarchically structured models it is distinguished between binary IR-tree models and partitioning models that contain ordinal building blocks. Both are embedded into the proposed framework yielding the taxonomy given in Section 4. In Section 5 the role of parameterizations and the order in ordinal models are discussed. It is in particular investigated how ordinal models can be obtained by using constraints on parameters in nominal models. The final chapter completes the taxonomy by including the class of finite mixture models, which are divided into homogeneous and heterogeneous mixture models. 2 Classical Ordered Response Models In the following let Ypi ∈ {0, 1, . . . , k}, p = 1, . . . , P , i = 1, . . . , I, denote the ordinal response of person p on item i. An important partition of the response categories is the 3 partition into the subsets {0, . . . , r − 1} and {r, . . . , k}, which can be represented by the binary variable  1 Ypi ≥ r (r) Ypi = (2) 0 Ypi < r. (1) (k) The variables Ypi , . . . , Ypi are called split variables because they split the response categories into two subsets. As shown by Tutz (2020b) they play a major role in the construction of the traditional ordered latent trait models. In the following we use these results to derive a taxonomy of the traditional models 2.1 Simultaneous Modelling of Splits: The Graded Response Model Let us assume that the response categories represent levels of performance in an achievement test. Then one can consider two groups of categories, {0, 1, . . . , r −1} for low performance and {r, . . . , k} for high performance, where low and high are relative terms that refer to “below category r” and “above or in category r”. One might assume that the split into low and high performance is determined by a binary model with person ability θp and a threshold that depends on the category at which the categories have been split by specifying (r) P (Ypi = 1) = F (θp − δir , ) , r = 1, . . . , k. (3) Thus, for each dichotomization into categories {0, 1, . . . , r−1} and {r, . . . , k} a binary model is assumed to hold. Importantly, the models are assumed to hold simultaneously with the same person ability θp but different item difficulties δir . Simple rewriting yields the cumulative model P (Ypi ≥ r) = F (θp − δir ), r = 1, . . . , k, (4) which is equivalent to a version of Samejima’s graded response model (Samejima, 1995, 2016). Thus, the graded response model can be seen as a model for which the dichotomizations into the categories Ypi < r and Ypi ≥ r are simultaneously modeled. One consequence is that item difficulties are ordered. Since P (Ypi = r) = P (Ypi ≥ r)−P (Ypi ≥ r +1) = F (θp −δir )−F (θp −δi,r+1 ) ≥ 0, one obtains that δir ≤ δi,r+1 has to hold for all categories. The strong link between the binary responses and the ordinal response yields a specific view of the graded response model that differs from traditional ones. In an (1) (k) achievement test the sequence of binary responses (Ypi , . . . , Ypi ) can be seen as referring to tasks with increasing difficulties. More concrete, because item difficulties are (r) (r+1) ordered, one has P (Ypi = 1) ≥ P (Ypi = 1), which means the “task” represented (r) (r+1) (r) by Ypi is simpler than the “task” Ypi . 
Moreover, if the task Ypi was completed (r) (s) (s) (Ypi = 1 or, equivalently, Ypi ≥ r), the simpler tasks Ypi , s < r (Ypi = 1 or, equivalently, Ypi ≥ s) were also completed. Therefore, the outcome of the sequence of binary variables has the specific form (1) (k) (Ypi . . . , Ypi ) = (1, . . . , 1, 0, . . . , 0), which means a sequence of ones is followed by a sequence of zeros. Binary variables that follow this pattern have been called Guttman variables and the resulting response 4 space is usually referred to as Guttman space, a term that was used by Andrich (2013) when discussing partial credit models. The more classic derivation of the cumulative model suggests that the item parameters may be seen as thresholds. Let Ỹpi = θp + εpi , where εpi is a noise variable with symmetric continuous distribution function F (.), denote a latent variable that is invoked if person p tries to solve item i. Ỹpi is essentially the ability of the person plus a noise variable and can be seen as the random ability of the person. The category boundaries approach assumes that category r is observed if the latent variable is between thresholds δir and δi,r+1 . More formally, one has Ypi = r ⇔ δir ≤ Ỹpi < δi,r+1 . It is easily seen that one obtains the cumulative model and thresholds have to be ordered. Thissen and Steinberg (1986) called the graded response models “difference” models because the probabilities are given as differences, P (Ypi = r) = F (θp − δir ) − F (θp − δi,r+1 ). Although they also start with binary models they do not further investigate that the models have to hold simultaneously. 2.2 Conditional Comparison of Categories: the Partial Credit and General Adjacent Categories Models Rather than compare groups of categories by utilizing a binary model one can also compare two categories from the set of categories {0, 1, . . . , k}. A choice that suggests itself are adjacent categories. Let the binary models that compare two adjacent categories be given by P (Ypi = r|Ypi ∈ {r − 1, r}) = F (θp − δir ), r = 1, . . . , k. (5) Again all the models contain the same person parameter but model-specific item parameters. For the logistic distribution function one obtains the partial credit model P exp( rl=1 (θp − δil )) , r = 1, . . . , k, P (Ypi = r) = Pk Ps exp( (θ − δ )) p il s=0 l=1 which was propagated by Masters (1982) and Masters and Wright (1984). It is equivalent to the polytomous Rasch model, which is just a different parameterization, see, for example, Andrich (2010). Thissen and Steinberg (1986) called the partial credit model model a “divide-by-total” model because of the denominator in the probabilities. However, the family of adjacent categories models is much larger because in (5) any strictly monotone distribution function can be used, for example, the use of the normal distribution yields a probit version of the adjacent categories model. In the logistic version sufficient statistics for item and person parameters are available. While the existence of sufficient statistics is an advantage if one wants to estimate parameters conditionally it is of lesser importance if one uses marginal estimates. We refer to this model class more generally as adjacent categories models, of which the polytomous Rasch model or partial credit model are just the most prominent members. An alternative form of the partial credit model, which emphasizes the implicit comparison of categories is   P (Ypi = r) = θp − δir , r = 1, . . . , k. 
(6) log P (Ypi = r − 1) 5 That means, the PCM directly compares two adjacent categories, and θp determines the strength of the preference for the higher category. It should be emphasized that the binary models used as building blocks are conditional models, it is assumed that a binary model holds given the response is in two categories from the set of available categories. This is seen from the representation (5) but hidden in the representation (6). However, it has consequences for the interpretation of parameters. The item parameters represent thresholds given the response is in categories {r − 1, r} and the trait parameters are the abilities to score r rather than r − 1 given the response is in categories {r − 1, r}. Therefore, the parameters refer to a local conditional decision or preference although changing the item parameter changes the probabilities of all possible outcome values since the PCM assumes that the binary models hold simultaneously. The conditional structure is also seen if the model is represented by using split (r−1) (r+1) variables. Since the condition Ypi = 1, Ypi = 0 is equivalent to Ypi ∈ {r − 1, r} one obtains that the PCM is equivalent to postulating for all split-variables (r) (r−1) P (Ypi = 1|Ypi (r+1) = 1, Ypi = 0) = F (θp − δir ), where F (.) is the logistic distribution function. It means that a Rasch model holds for (r) (r−1) the split-variable Ypi given the split Ypi is in favor of higher categories while the (r+1) split Ypi is in favor of lower categories. The class of adjacent categories model also contains simplified versions that use sparser parameterizations. By assuming that the item parameters can be decomposed into two terms in the form δil = δi + τl , one obtains the Rasch rating scale model (Andrich, 1978, 2016). The model can also be extended to include a slope parameter if it is included in the binary model that distinguishes between adjacent categories (Muraki, 1990, 1997). 2.3 Conditional Comparison of a Single Category and a Group of Categories: Sequential Models In achievement tests frequently items are used that are solved in consecutive observed √ 3 steps. For example, a mathematical problem may have the form: ( 49 − 9) =?. One √ can distinguish four levels: no problem solved (level 0), 49 = 7 solved (level 1), 7 − 9 = −2 solved (level 2), (−2)3 = −8 solved (level 3). Obviously the sub problems have to be solved in a consecutive way. A sub problem can only be solved if the all the previous sub problems have been solved. A model that explicitly models the solving of sub problems has the form P (Ypi ≥ r|Ypi ≥ r − 1) = F (θp − δir ), r = 1, . . . , k. (7) The model is known as sequential model (Tutz, 1990) or step model (Verhelst et al., 1997). It is a process model for consecutive steps. One models the transition to higher categories given the previous step was successful. The first step is the only non-conditional step. If it fails, the response is in category 0 (first sub problem not solved), if it is successful, the response is larger than 0 (first sub problem solved). In the latter case the person tries to take the second step. If it is not successful, the response is in category 1 (second sub problem not solved), if it is successful, the response 6 is larger than 1 (second sub problem solved), etc. In the r-th step it is distinguished between Ypi = r − 1 and Ypi ≥ r given at least level r − 1 is reached (Ypi ≥ r − 1). 
In the model the parameter θp represents the person’s ability to successfully perform each of the steps while δir is the difficulty in step r. Of course, later steps can be easier than early steps, thus item difficulties are not necessarily ordered. In the example step √ 2 (7 − 9) is certainly easier to master than step 1 ( 49 = 7). However, sub problem 2 can be only solved after step 1 was successful. Therefore, the item parameters have local meaning, they refer to the difficulty in a step given that all previous steps were successful. In contrast, the same ability parameter is present in each of the steps, which makes the model uni-dimensional in terms of person parameters. The logistic version of the model, also called logistic sequential model, can be given in the alternative form of a continuation ratio model,   P (Ypi ≥ r) log = θp − δir , r = 1, . . . , k, (8) P (Ypi = r − 1) (Agresti, 2013). The logits on the left hand side compare the categories the probability of a response in the categories {r, . . . , k} to the probability of a response in category {r − 1}. In this sense the binary models contained in the sequential model compare groups of categories to single categories. This comparison is also seen from the tree representation of the model given in Figure 2, which shows the sequence of (conditional) binary splits in a sequential model with four categories. In the r-th step a decision between category {r − 1} and categories {r, . . . , k} is obtained. The split is conditional, given categories {r − 1, . . . , k}, that means, under the condition that the previous step was successful. A disadvantage of the model representation (8) is that it does not directly show the underlying process. The implicit conditioning on responses Ypi ≥ r, which is essential for the interpretation of the model parameters, gets lost. It is however seen in the model representation with split variables given by (r) (r−1) P (Ypi = 1|Ypi (1) = 1, . . . , Ypi = 1) = F (θp − δir ) (1) , r = 1, . . . , k, (9) (k) which again shows that the split variables (Ypi . . . , Ypi ) form a Guttman space. A rating scale version of the model, in which the parameter P δir is split up into a an item location parameter δi and a step parameter τr , with r τr = 0 has been considered by Tutz (1990), extended versions with predictor αir θp − δir and nonparameteric versions have been considered by Hemker et al. (2001). 2.4 Overview on Classical Ordinal Models The fact that all the models contain binary models that split categories into two subsets can be exploited to distinguish between models by focusing on the underlying conditioning. In particular, in the partial credit model and the sequential model the splits are conditional whereas in the cumulative model the splits are simultaneous but not conditional. Figure 1 visualizes the resulting hierarchy of models. In Table 1 the models are given in various representations. The left column shows the logistic versions of the models. It shows which categories or groups of categories 7 0, 1, 2, 3 First Step 1, 2, 3 0 Unsuccessful: level 0 Second Step 2, 3 1 Unsuccessful: level 1 Third Step 2 3 Unsuccessful: level 2 F IGURE 2: The sequential model as a hierarchically structured model. TABLE 1: Overview of traditional ordinal models. Cumulative Adjacent Sequential Category Representation Conditional Representation Conditional Representation Logistic Version log(.) = θp − δir General Version P (.) = F (θp − δir ) With split variables P (.) 
= F (θp − δir )   P (Y ≥r) log P (Ypi pi <r)   P (Y =r) log P (Ypipi=r−1)   P (Y ≥r) log P (Ypipi=r−1) P (Ypi ≥ r) P (Ypi = 1) P (Ypi = r|Ypi ∈ {r − 1, r}) P (Ypi ≥ r|Ypi ≥ r − 1) (r) (r) (r−1) P (Ypi = 1|Ypi (r) (r+1) = 1, Ypi (r−1) P (Ypi = 1|Ypi are compared. In particular it is seen which type of logits are determined by the difference between person parameter and item parameter, θp −δir . For example, in the partial credit model one has the adjacent categories logits log(P (Ypi = r)/P (Ypi = r − 1)), in the sequential model one has the continuation ratios log(P (Ypi ≥ r)/P (Ypi = r − 1)). In the middle column the general conditional representations of the models are given. In these representations the distribution function F (.) can be any strictly monotonic distribution function. It shows which conditional binary response models are contained in the ordinal model. In the case of the graded response model the condition is empty since it is a non-conditional model. The right column shows the representation of the general models with split variables. It also shows clearly the conditioning implicitly contained in the models. It (1) (k) should be emphasized that in all the models the split variables (Ypi . . . , Ypi ) form a Guttman space with outcomes having the form (1, . . . , 1, 0, . . . , 0). They can be seen as generating the Guttman space, which is always defined and is not linked to any specific model, see Tutz (2020b). The dichotomizations or Guttman variables also clarify the meaning of parame8 = 1) = 0) (r) ters. In the graded response model the split variables Ypi , which distinguish between a response Ypi ≥ r (strong performance) and Ypi < r (weak performance), are directly linked to the difference between ’ability’ and item parameter, θp −δir . The corresponding binary models for split variables are unconditional and have to hold simultaneously. This allows to see the item difficulties as thresholds that are necessarily ordered, and (r) which have to be exceeded to obtain higher levels. The variables Ypi should not be (r) seen as steps. Ypi = 1 simply denotes that a person has at least performance level r. Since performance levels are ordered, that means, its performance cannot be below (1) (r) level r, or, in split variables, Ypi = · · · = Ypi = 1, which is the Guttman property of (r+1) the binary responses. One observes Ypi = r, if, in addition Ypi = 0, which means that the performance is below level r + 1. However, no steps or transitions are needed to explain the level of performance. As Andrich (2015) argues, if a performance like acting is to be classified according to some protocol, the judge places the person’s performance in one of the categories on the trait, not how the person transitioned in getting to the category. Even in simple binary models for problem solving one observes if the problem was solved or not, but not the transition. Thus, when considering ordinal models and the binary models contained in them there is no reason to construct a transition. It might be misleading and is not compatible with the underlying process, which is determined by simultaneous dichotomizations or the placing on the continuum of the latent scale, which is divided by the thresholds δi1 ≤ · · · ≤ δi,k . Interpretation of parameters is quite different in conditional models. Let us start with the sequential model since it is by construction a step or transition model. 
The (r) (r−1) split variables representation, P (Ypi = 1|Ypi = 1) = F (θp − δir ), shows that the difference between ability and item difficulty determines if the performance is above or in category r given at least performance level r − 1 has been reached. It makes the parameter δir a local threshold parameter. There is no ordering of thresholds involved since later steps might be easier than previous steps. In the partial credit model the decision that the performance is in category r as a function of the difference between ability and item parameter is under the condition (r−1) (r+1) Ypi ∈ {r − 1, r}, or equivalently, Ypi = 1, Ypi = 0. Thus, the binary models are conditional models and parameters should be interpreted with reference to the conditional structure. One consequence of the conditional parameterization is that thresholds do not have to be ordered though there has been some discussion on the ordering of thresholds (Adams et al., 2012; Andrich, 2013, 2015; Tutz, 2020b). Consideration of the binary models contained in ordinal models also explains why some models are robust against the collapsing of categories. In general, if a model holds for the original categories it does not necessarily hold if adjacent categories are grouped yielding a smaller set of response categories, although that might be an attractive feature of a model, see also Jansen and Roskam (1986) where the extreme case of dichotomization in the polytomous Rasch model is considered in detail. The graded response model (with a logistic distribution function) holds also for dichotomized responses (or other groupings of adjacent categories) since the splits themselves follow a Rasch model. This is different for the adjacent categories and the sequential model. They are not robust against collapsing of categories because the Rasch models that are contained are conditional. However, collapsing of categories changes the conditioning. If, for example, categories rand r + 1 are collapsed to form a new category, the condi9 1, 2, 3, 4, 5, 6 Query Agreement/Disagreement 1, 2, 3 4, 5, 6 Query Extremity 2, 3 1 4, 5 6 Query Weakness of Attitude 2 3 4 5 F IGURE 3: A tree for six ordered categories, categories 1,2,3 represent levels of disagreement, categories 4,5,6 represent levels of agreement (compare Figure 3 in Böckenholt (2017)). tions in the binary submodels after collapsing differ from the conditions in the original set of categories for all conditions that contain the new category. 3 Hierarchically Structured Modeling: Tree-Based Models The classical models considered in the previous section represent different types of modelling concerning the conditioning. While the graded response model is a model that does not rely on conditioning, the partial credit model conditions on a response in adjacent categories. The sequential model is conditional but, in contrast to the partial credit model, it can be represented as a tree (see Figure 2). This makes it a special model, it is hierarchical, that means, it can be represented by a sequence of conditional splits. Neither the graded response models nor the partial credit model are hierarchical. More recently with IRTrees a whole class of hierarchical models has been introduced. In the following we will first consider binary IRTrees and then consider alternative approaches. For simplicity in the following the response categories are {1, . . . , m}, which is the common notation in IR-Trees. 
3.1 Binary IRTree Models Tree-based models assume a nested structure with the building blocks given as binary models. They were considered, among others, by De Boeck and Partchev (2012), Böckenholt (2012), Khorramdel and von Davier (2014), Böckenholt (2017) and Böckenholt and Meiser (2017). In the following we use the presentation of IRTree models given by Böckenholt (2017). IR-tree models are sequential process models, a response is constructed based on a series of mental questions. For illustration we consider an ordinal response with six categories representing ordinal outcomes ranging from “strongly disagree” to “strongly agree”. Figure 3 shows the corresponding tree, which is equivalent to Figure 3 in Böckenholt (2017). The first query determines a respondent’s agreement or disagreement. The second query determines the extremity of the (dis)agreement and the third query assesses whether the agreement is weak or not. For each query in the tree, which corresponds to a conditional binary decision one uses a 10 binary model. For query q the model is given by (q) P (Y(q)pi = 1) = F (θp(q) − δi ), (10) and the (local) response variable Y(q)pi is often referred to as a pseudo-item. Pseudo-items are conditional dichotomizations, and can also be represented by split variables. For example, the query that determines the extremity within agreement categories, distinguishing between category 6 and categories {4, 5} corresponds to mod(6) (6) (4) elling the split variable Ypi |Ypi ∈ {4, 5, 6}, or alternatively Ypi |Ypi = 1. Thus, tree models implicitly use the same dichotomizations as traditional ordinal models. However, there is one crucial difference between traditional models and IRTree models. While the former typically use one person parameter (and split-specific item parameters) the majority of IRTree models uses query-specific person parameters as given in (10). This makes the models multi-dimensional in terms of person parameters and person parameters are interpreted with reference to the specific query, that is, the conditional decision. In the tree given in Figure 3 the basic propensity to agree or disagree is modelled in the first query. The person parameters in the next queries refer to response styles, whether a person prefers extreme or middle categories. The parameterization seems not to efficiently use the information in the ordered categories since the propensity to agree or disagree is not present in later queries, though it might also determine the choice between category groups {1} and {2, 3}. Only recently more efficient binary trees have been proposed that use the same traits in more than one query (Tutz and Draxler, 2019; Meiser et al., 2019). In particular the approach of Meiser et al. (2019) is very attractive. They do not simply use the same trait in different queries but use scaled versions, which may be seen as factor loadings. (1) Let, for example, θp denote the trait in the first pseudo-item, which distinguishes between agreement and disagreement, then the pseudo item Y(2)pi , which distinguishes between {1} and {2, 3}, can be parameterized by (2) P (Y(2)pi = 1) = F (θp(2) + αθp(1) − δi ), (2) where the term θp represents the tendency to prefer categories {2, 3} (given the re(1) sponse is in categories {1, 2, 3}), and αθp represents the scaled tendency to higher response categories. In a similar way scaled versions of traits from previous queries are used in other pseudo-items, for details see Meiser et al. (2019). 
The strength of such parameterizations is that the same person parameter is present on several levels of the tree, and parameters that are specific to pseudo-items get a distinct meaning, for example as a tendency to extreme or less extreme categories. A major topic in binary trees is the modelling of response styles. However, IRTrees provide a wide class of flexible modelling tools that is not limited to response styles. For example, the first query may assess a person’s tendency to select a midscale answer indicating neutrality in five or seven-grade Likert scales. Then the first split distinguishes between the middle category and other categories, the following splits model the response if the neutral middle category is avoided. The resulting tree has an asymmetric form, see Figure 4. Models of this type are useful since the role of the neutral category is ambivalent. Kulas et al. (2008) investigated whether it is used to indicate a moderate standing on a trait/item, or rather is viewed by the respondent as a ’dumping ground’ for unsure or non-applicable response. In the latter case the use of the 11 1, 2, 3, 4, 5 neutral or not 1, 2, 4, 5 3 within disagreement/agreement categories 1 2 4 5 F IGURE 4: A tree for five ordered categories, categories 1,2 represent low response categories, categories 4,5 represent high response categories, 3 is the neutral middle category. middle category as part of the integer protocol might yield strongly biased results. An initial binary split that distinguishes between the neutral category and other categories can avoid bias. Binary trees of this type have been considered by Jeon and De Boeck (2016); Böckenholt (2017) and, more recently, by Plieninger (2020); Tutz (2020a). 3.2 Hierarchical Partitioning Using Ordinal Models Binary splits are simple but yield rather large trees with many nodes. An alternative that exploits the ordering of categories and provides sparser parameterizations is to use ordinal models as building blocks. Let us again consider an example with six ordered categories. Instead of using the binary splits tree given in Figure 3 one can work with the tree given in Figure 5. It has a simpler structure with only two levels in addition to the 0-level, which contains all categories. One can model the propensity to agree or disagree by (1) P (Ypi ≥ 4) = F (θp − δi ), (11) (1) where δi is the level 1 item parameter. The conditional propensity to choose from one of the categories in level 2 can be specified by any simple ordinal model, for example, by conditional graded response models, (2) P (Ypi ≥ r|Ypi ≤ 3) = F (αθp − δir ), r = 2, 3, P (Ypi ≥ r|Ypi ≥ 4) = F (αθp − (2) δir ), r = 5, 6, (12) (13) where α scales the person parameter at the second level. The model has just one parameter more than the simple graded response model, however, order restrictions are (2) (2) (2) (2) weaker. One just has δi2 ≤ δi3 and δi5 ≤ δi6 whereas in the simple cumulative model five thresholds have to be ordered. Thissen-Roe and Thissen (2013) considered a two-decision model of this type, which uses a modified graded response model in the second level. Within the model it is straightforward to include response styles by adding just one (2) person parameter. In the extended model P (Ypi ≥ r|Ypi ≤ 3) = F (αθp + γp − δir ), (2) r = 2, 3, P (Ypi ≥ r|Ypi ≥ 4) = F (αθp − γp − δir ), r = 5, 6, the parameter γp is a response style parameter that contains the tendency to middle categories. 
In a 12 1, 2, 3, 4, 5, 6 Query Agreement/Disagreement 1, 2, 3 4, 5, 6 Query Extremity 1 2 3 4 5 6 F IGURE 5: A tree for six ordered categories with three levels . binary tree as given in Figure 3 several additional parameters are needed to account for response styles whereas in the simpler structured tree in Figure 5 only one additional parameter is needed. For the estimation one can use similar methods as in binary trees, exploiting that likelihood contributions can be written as products of conditional probabilities (Tutz and Draxler, 2019). The class of hierarchical partitioning models is characterized by containing ordinal models for more than two categories as constituents. Instead of using just binary splits the order in responses is exploited efficiently by using ordinal models as building blocks. When modeling the response within agreement and disagreement categories any simple ordinal model can be used. Since the ordinal models can be represented by split variables the same holds for the model built from these blocks. They are in particular helpful to obtain sparse parameterizations. 4 A Taxonomy of Polytomous Item Response Models Including Tree Structured Models The taxonomy of ordinal models given in Figure 1 covers only the basic models. An extended taxonomy of polytomous IRT models that also contains the general class of hierarchically structured models is given in Figure 6. It also includes the nominal model and mixture models to be considered later Here we focus on the structure that is obtained by they way how ordinal models can be constructed from binary models as building blocks. At the outset it is distinguished between conditional models and simultaneous splits, that is, graded response models. The former use binary models in a conditional way by assuming that the choice between categories has already been narrowed down to a reduced set of categories. In contrast, the latter assume no conditioning but assume that the splits between categories are simultaneously determined by the same person parameter. There are two groups of conditional models. In the first group pairs of categories are compared by utilizing a binary response model to obtain, for example, the partial credit model and its simplified or extended versions. The second group is formed by hierarchical models. The crucial difference between non-hierarchical and hierarchical models is that in the former the conditions under which binary models are assumed to hold are overlapping. For example, in the partial credit model one binary sub model 13 Polytomous Models Nominal Model Conditional Models Graded Response Model Simultaneous Splits Model Mixture Models Homogeneous MM Non-hierarchical Models Adjacent categories models Rating scale model Heterogeneous MM Hierarchically structured Models Binary IRTrees conditional binary splits Asymmetric Trees Hierarchical Partitioning using conditional ordinal models Symmetric Trees F IGURE 6: Hierarchy of polytomous models. conditions on the the categories {0, 1}, another sub model conditions on {1, 2}. Both conditions contain the category 1. This overlapping prevents a representation as a hierarchical model. Hierarchically structured models can be divided into two types of models, binary IRTrees and hierarchical partitioning approaches. The former use only binary models to describe the conditional response in subsets of categories while the latter use traditional models with more than two categories as building blocks. 
One can further distinguish between symmetric and asymmetric tree models. Symmetric models are in particular useful for Likert items to account for the symmetry in answer categories. Symmetric tree models can be defined by considering subsets of categories S1 , S2 ⊂ {1, . . . , a}, where a = m/2 if m is even, and a = (m − 1)/2 if m is odd. An IRTree model is symmetric if for any (conditional) split between S1 and S2 there is a split between S3 = {r|m − r + 1, r ∈ S2 } and S4 = {r|m − r + 1, r ∈ S1 }. An example with an even number of categories is the splitting structure shown in Figure 3. If the number of categories is odd, and Likert items are considered, the first split typically distinguishes between the neutral category and the other categories. Although the visual appearance of the corresponding tree shows some asymmetry, see Figure 4, the corresponding model is symmetric, and treats categories in a proper way. Therefore, it is essential to distinguish between the symmetry of a tree and the symmetry of the model. While the former refers to the tree structure the latter refers to the corresponding model. A classical example of a tree model that is not symmetric is the sequential model. It is, in particular, not invariant under the reverse permutation of categories; if the order 14 of categories is reversed the corresponding sequential model differs from the sequential model for the original categories. In contrast, most symmetric models in common use are invariant under the reverse permutation, namely symmetric models, in which the response function F (.) is a symmetric distribution function. The distinction between asymmetric and symmetric models can be made for all hierarchically structured models. It is included in the taxonomy only for binary IRTrees, which have been investigated in the literature more intensively than other hierarchical models. In general, the graded response model and the adjacent categories models can be used for any form of graded responses, in achievement tests as well as in the investigation of attitudes. Hierarchically structured models are somewhat different, they are process models tailored to model a specific process. The sequential model assumes that levels of performance are reached successively, and therefore is most useful in items that are constructed with categories that represent successive solutions levels. Binary IRTrees and hierarchical partitioning approaches assume a specific conditional structure that aims at modeling the way how respondents generate a response. In hierarchically structured models, as in all conditional models, item parameters have to be interpreted locally since they refer to conditional decisions. 5 Nominal Models, Ordinal Models and the Role of Parameterizations All the polytomous IRT models considered so far can be considered ordinal models in the sense that they exploit the ordering of categories. A model that is different in this aspect is the so-called nominal model. It can be seen as a model that aims at detecting the order rather than using it, but also as a sort of background model from which specific ordinal models can be derived. In a taxonomy of polytomous IRT models, which essentially is a taxonomy of ordinal models, it should be included and its role be investigated. Another major topic in the following is the role of the parameterization of a model, which also plays a role in the transformation of the nominal model into an ordinal model. 
Variations in parameterizations yield more or less complex models of specific model types in the hierarchy given in Figure 6, various parameterizations can be used on every level of the hierarchy (Section 5.2). Another aspect of the parameterization within the taxonomy considered here concerns the link between parameterization and the exploitation of the ordering of categories. It is argued that the split variables and their specific parameterization make models ordinal models. 5.1 Nominal Models The taxonomy uses that ordinal models can be seen as composed from simpler, in particular binary models. This is most obvious in IRTrees but holds also for basic models as the graded response model. What makes the models ordinal ones is that the binary models are assumed to hold only for specific subsets of categories. For example, the graded response model assumes binary models to hold for subsets {0, . . . , r} and {r+1, . . . , k}. None of the ordinal models are built from binary models that distinguish between subsets such as {3, 7} and {5}. All subsets that are used reflect the ordering of the categories {0, . . . , k}. More concrete, binary models are assumed to hold, possibly conditionally, for subsets S1 , S2 , with c1 < c2 for c1 ∈ S1 , c2 ∈ S2 , and the binary 15 models distinguish between S1 an S2 in a way such that an increase in θp increases the probability of a response in S2 . It can be seen as an ’ordered subsets’ characterization of ordinal models, which is linked to split variables as considered later. This aspect is emphasized since ordinal models are sometimes derived from models that do not use the order of categories. The most widely used model to this end is Bock’s nominal model (Bock, 1972) exp(αir θp − βir ) P (Ypi = r) = Pk , s=0 exp(αis θp − βis ) r = 1, . . . , k, (14) in which additional constraints are needed to ensure identifiability of parameters, see Bock (1972), Thissen and Cai (2016). In the basic form the model uses only the nominal scale of the response, however, it can be transformed to use the order information. A first step is to set αir = φr , where φr are considered scoring functions for the categories, yielding Andersen’s version of the model (Andersen, 1977). If, in addition, it is assumed that the scores are ordered, that is, φ1 ≤ · · · ≤ φk , one obtains a model that actually uses the ordering of categories. If one assumes equi-distant scores, φr = r one obtains the partial credit model, which has been noted among others by Thissen and Steinberg (1986), where much more general transformations were considered. Also a general-purpose multidimensional model was considered by Thissen et al. (2010). The nominal model can be seen as a useful background model from which various ordinal models can be derived as special cases. It is also interesting from a conceptual point of view since it has also been used in a different way, namely to check the order of categories if that is not clear. This use is linked to different conceptualizations of ordinal and nominal models. An ordinal model, in the sense used here, exploits the order of categories while a nominal model is a model that is invariant against permutations of response categories. In its general form the model (14) is a nominal model but not an ordinal one since it is stable under permutations. On the other hand it uses a uni-dimensional trait, which implicitly assumes an order of the latent trait and therefore on the responses. 
This makes it a model that can be used to investigate the order of categories. Fitting the unconstrained model might yield information on the order, and it can be used to fit responses constructed for testlets, see, for example, Thissen and Cai (2016). It can also be used to provide scores using information from all responses even when the response categories are not clearly ordered. When used in this way it does not exploit the order of categories but aims at investigating the order empirically. This is a different concept of dealing with ordinality, namely using the model ’to examine the expected, or empirical, ordering of response categories’ (Thissen and Cai, 2016). It becomes an ordinal model in the sense used here if restrictions are imposed. Moreover, it generates a whole family of models that is strongly linked to the partial credit model. It should be noted that it does not generate the general adjacent categories model, but only models that use the logistic link, which, however, are the most widely used ones. This is visualized in the tree structure given in Figure 6. Adjacent categories models are sub models of nominal models (if the logistic link is used) but can also be considered as specific conditional models (for any response function F (.)). 16 5.2 Models and Parameterizations In the proposed taxonomy the parameterization is considered secondary. That does not mean that parameterization is not important. It is important, and much of the more recent latent trait literature is devoted to account for specific features like response styles or differential item functioning, which can be investigated by using specific parameterizations. However, parameterizations do not alter the structure given in Figure 6. They may be seen as special cases within this framework. Given the conditional or unconditional structure of a model quite differing parameterizations can be used. Instead of the simple difference between person parameters θp − δir one can include slope parameters yielding αi (θp − δir ). Response style effects can be modeled by adding additional person parameters yielding models with a multi-dimensional person parameter, see Johnson (2003) for cumulative type models with extreme response styles, and Wetzel and Carstensen (2017), Plieninger (2016), Jin and Wang (2014), Tutz et al. (2018) for partial credit models that account for response styles. The taxonomy given in Figure 6 also includes models that make much weaker assumptions on the response functions. Introduced by Mokken (1971) nonparametric IRT models have been extended to a wide class of models nonparametric IRT models, see, for example, Sijtsma and Molenaar (2002). Assumptions are much weaker, only local independence, uni-dimensionality and some form of monotonicity are needed. Ordinal nonparametric models can be derived by using more general functions in the binary models that are the building blocks of the models in the taxonomy. Instead of using the parametric form F (θp − δi ) one uses a uni-dimensional monotonic function. For example, the cumulative version is obtained by assuming P (Ypi ≥ r) = Mir (θp ), where Mir is a strictly increasing function that can depend on the item i and the response category r. Corresponding adjacent categories and sequential models are obtained by using on the left hand side P (Ypi = r|Ypi ∈ {r − 1, r}) or P (Ypi = r|Ypi ≥ r − 1), respectively. Models of this form have been considered by Hemker et al. (1997, 2001). 
Parameterizations yield specific hierarchies if the conditioning (the type of model) and the response function are fixed. An example is the hierarchy given by Hemker et al. (2001) in their Figure 2 for sequential models, which starts with the very restrictive sequential rating scale model, in which the location parameter is split into an item location parameter and a step parameter. The most general models in this hierarchy are Samejima’s acceleration model Samejima (1995) and the nonparametric sequential model. Similar hierarchies may be built for all of the models identified in Figure 6 but they are hierarchies generated by parameterization within the taxonomy. The taxonomy itself, which shows the relationship of models characterized by their conditioning, is unchanged. Taxonomies that focus on parameterizations have been given by Hemker et al. (1997), Hemker et al. (2001) and Sijtsma and Hemker (2000). They study carefully which parameterizations are special cases of other ones and display the structure in Venn diagrams. They also investigate so-called measurement properties of models as the monotone likelihood ratio, stochastic ordering properties and invariant item ordering, and show which models have these properties. 17 5.3 Characterization of Ordinal Models In the following the fundamental structure of ordinal models is investigated. It is argued that binary models for split variables are the essential constituents of models that are able to exploit the ordering of categories. Although typically there is some intuition why models are appropriate for ordered responses, for example the ordered thresholds on the latent scale in cumulative models, and the process from which the sequential model is derived, these motivations do not yield a general conceptualization of ordinal models. Nevertheless, in a taxonomy of polytomous models it seems warranted that one distinguishes between ordinal and nominal models. All the models considered here contain binary submodels of the form P (Yp+ ∈ S1 |Yp+ ∈ S2 ) = g(θp , {δis }), where S1 ⊂ S2 and {δis } is a set of item parameters. For example, the tree in Figure 3 contains a binary model that distinguishes between {1} and {2, 3} given {1, 2, 3}. The corresponding model for P (Ypi ∈ {2, 3}|Ypi ∈ {1, 2, 3}) can also be described by split variables, it is equivalent to (2) (4) modelling P (Ypi = 1|Ypi = 0). Also in partitioning models that have ordinal components binary models for split variables are contained. If a cumulative model is used to model the response given {1, 2, 3} in the structure given in Figure 5 the model contains a binary model P (Ypi ∈ {2, 3}|Ypi ∈ {1, 2, 3}), which is equivalent to (2) (4) P (Ypi = 1|Ypi = 0). However, binary models for split variables are not only submodels of ordinal model but are also the building blocks of the models. Therefore, a general class of models can be defined by postulating that there exists a finite number of submodels for split variables (l) (s) (r) (15) P (Ypi = 1|Ypi = 1, Ypi = 0) = gl (θp(l) |{δis }), s < r, such that (i) the function gl (.|{δis }) is nondecreasing for at least one l with s ≤ l ≤ r, (ii) the response probabilities are uniquely determined by these submodels. The set of submodels are called the model generating binary models, and the class of generated models split variables generated ordinal models. The second condition just ensures that the polytomous model can be constructed from the set of binary models. 
The crucial ingredients in the definition are the form of the conditioning in (15) and that the function is nondecreasing. The condition (s) (r) Ypi = 1, Ypi = 0 means that conditioning refers to a sequence of categories since it is equivalent to Ypi ∈ {s, s + 1, . . . , r − 1}. It can also be empty, which is the case (k+1) if s = 0, r = k + 1, and one defines Ypi = 0. The postulate that the function is nondecreasing ensures that the order of categories is used in a consistent way. It im(l) plies that, whatever the conditioning, an increase in θp is in favor of higher categories, the probability of lower categories can not increase. It excludes, in particular, that one constructs a binary tree, in which, for example, the response in the lower category 1 given {1, 2} and the response in the higher category 6 given {5, 6} are modeled by binary Rasch models. This construction would violate the order of categories, and is avoided by using split variables to define the condition. 18 With the exception of the nominal model, all models considered here are split variables generated ordinal models. In the traditional models, that is, the cumulative, sequential, and adjacent categories model, it is typically assumed that the trait does not depend on the the split, that means, one has in all models generating binary models (l) with θp = θp for all l. The model generating binary models are the ones given in ((l)) Table 1. In IRTrees the parameters θp are not necessarily the same, they can vary over the binary models. It is instructive to investigate why the nominal model is not among the class of models specified by (15). In particular, it clarifies the conditions in the definition of this class of models. The nominal model (14) is certainly a nominal but not an ordinal model since it is invariant under permutations. Nevertheless, it can be constructed from binary submodels log(P (Ypi = r)/P (Ypi = 1)) = γir θp − ξir with γir ≥ 0. These submodels yield log(P (Ypi = r)/P (Ypi = r − 1)) = (γir − γi,r−1 )θp + ξi,r−1 − ξi,r . This model is equivalent to the nominal model, which is seen by using the reparameterization αir = γir − γi,r−1 , βir = ξi,r − ξi,r−1 . Thus, a nominal model is constructed from binary models that are nondecreasing in person parameters because γir ≥ 0 is assumed. The crucial point is that the model generating submodels log(P (Ypi = r)/(P (Ypi = 1) = γir θp − ξir are based on the conditioning Ypi ∈ {1, r}, which is not of the type used in the definition of the class of split variables generated ordinal models. This clarifies that one has to postulate for the (conditional) model (s) (r) generating binary models that the condition is of the form Ypi = 1, Ypi = 0 or, equivalently, Ypi ∈ {s, s + 1, . . . , r − 1}. Without that condition it would not be ensured that a model exploits the ordering of categories. The class of split variables generated ordinal models comprises all the traditional ordinal models as well as the hierarchical models considered previously. It does not depend on specific parameterizations, it is just assumed that response functions are nondecreasing, and that the condition in the generating model has a specific form determined by split variables. Although it can not be totally excluded that there might be alternative ways to find models that use the order in categories, the considered class of models seems rather exhaustive. 
An additional advantage of characterizing ordinal models by the binary submodels that are contained is that it is rather flexible and avoids questionable criteria. For example, Adams et al. (2012) consider categories in an item response model as ordered if the expectation E(Yp |θp ) is an increasing function of θp . That means, if a person has a higher value of θp than another person, then the person with the higher value will, on average, score more. The problem with the definition is that the expectation is a sensible measure only if the response Yp is measured on a metric scale level, however, one wants to characterize the use of the ordinal scale level. The expectation is not helpful for this purpose because it uses a scale level that is not assumed to be available. Alternative ways of characterizing the use of order rely on specific functions. As Adams et al. (2012) suggest a model uses the order if for any ordered pair s < r the function msr (θp ) = P (Ypi = r|θp )/P (Ypi = s|θp ) is an increasing function of θp . It can be seen as a stochastic ordering property since it implies that for θp1 < θp2 one has msr (θp1 ) < msr (θp2 ). However, one might use quite different functions, for example, that P (Yp ≥ r|θp ) is an increasing function of θp , yielding quite different conceptualizations of ordinal models that are not compatible. The strength of the conceptualiza- 19 tion based on split variables is that no specific functions, which are somewhat arbitrary, are needed. 6 Mixture Models An alternative class of models that has been included in Figure 6 are mixture models. They follow a quite different reasoning to account for heterogeneity in responses and response styles and therefore are included as a separate class of models. General finite mixture models for latent traits have the form P ((Yp1 , . . . , YpI )) = M X m=1 (m) πm Pm ((Yp1 , . . . , YpI )|θp , {δ ir }). That means the population is subdivided into M latent classes, where Pm (.) denotes (m) the model in the latent class m with parameters P θp , {δ ir }, and π1 , . . . , πM denotes the mixture probabilities of the latent classes ( m πm = 1). Mixture item response models, originally developed for Rasch models by Rost (1991), are strong tools to investigate unidimensionality, the presence of response styles, and differential item functioning without assuming that the relevant grouping variable that induces differential item functioning to be known. Extensions to ordinal responses have been considered by Rost et al. (1997), Eid and Rauber (2000), Gollwitzer et al. (2005), Maij-de Meij et al. (2008), Moors (2010),Van Rosmalen et al. (2010), Von Davier and Yamamoto (2004), for an overview see also Von Davier and Yamamoto (2007). It seems sensible to distinguish between two approaches to specifying mixture models, the homogeneous modelling strategy and the heterogeneous strategy. In homogeneous finite mixture models the same functional form is used in all the mixture components, for example, a partial credit model (Eid and Rauber, 2000; Gollwitzer et al., 2005). It is assumed that respondents are from different latent classes but only model parameters, not the structure of the model vary across classes. The approach is not without problems. Typically the number of classes is unknown and has to be chosen driven by data. However, one gets quite different model parameters when fitting, for example, three or four classes, since all the parameters change when considering one more class. 
Even if a number of classes has been chosen, it is sometimes still difficult to interpret the differences between classes and to explain exactly which features the classes represent; they might indicate a response style or some other dimension that is involved when responding to items. Homogeneous models do not explicitly specify which trait is to be detected and are primarily exploratory tools.

Heterogeneous finite mixture models are sharper tools; they allow different models to be used in the components, specifying explicitly which trait is to be detected. Moreover, the number of components is typically fixed. Early mixture models of this type are the HYBRID models proposed by Von Davier (1996) and Von Davier and Yamamoto (2007). Although some HYBRID models can be represented as mixture models that have the same functional form in the components but with constraints in some of the components (Von Davier and Yamamoto, 2007), the constraints specify which traits are modelled. Further models with constraints have been proposed by De Boeck et al. (2011) and Shu et al. (2013).

A specific mixture with fundamentally different components has been considered more recently by Tijmstra et al. (2018). They proposed a two-class mixture of a generalized partial credit model and an IRTree model, carefully designed to distinguish between respondents who consider the middle category of a five-category Likert item as representing one category in a sequence of ordered categories and respondents who use the middle category as a non-response option. While the former follow a partial credit model, the response of the latter is described by a specific IRTree model that separates the middle category.

All mixture models in which at least one of the components is an ordinal model account for the order of categories. Therefore, they are included in the taxonomy, but they form a separate class of models with specific purposes. In particular, they can be used to model response styles in a quite different way than IRTrees and extensions of classical models such as the partial credit model with response styles. In the following, possible approaches are considered briefly.

To avoid the pitfalls of homogeneous mixture models it might be sensible to use structured mixtures, in which the type of response style is explicitly specified. With response vector $\boldsymbol{Y}_p^T = (Y_{p1}, \ldots, Y_{pI})$ a simple two-component model has the form

$$P(\boldsymbol{Y}_p) = \pi_M P_M(\boldsymbol{Y}_p \mid \theta_p, \{\delta_{ir}\}) + (1 - \pi_M) P_{RS}(\boldsymbol{Y}_p \mid \text{par}),$$

where in the first component responses are determined by model $M$, with $P_M(\boldsymbol{Y}_p \mid \theta_p, \{\delta_{ir}\})$ referring to a partial credit or some other ordinal model, and the second model $P_{RS}(\boldsymbol{Y}_p \mid \text{par})$ specifies the response style that is suspected to be present. For example, one might investigate whether a portion of respondents shows a non-contingent response style, which is found if persons have a tendency to respond carelessly, randomly, or nonpurposefully (Van Vaerenbergh and Thomas, 2013; Baumgartner and Steenkamp, 2001), by specifying

$$P_{RS}(\boldsymbol{Y}_p \mid \text{par}) = \prod_{i=1}^{I} P(Y_{pi} \mid \{\delta_{ir}^{m}\}),$$

where $\{\delta_{ir}^{m}\}$ are parameters that determine the marginal distribution of the responses on item $i$. This specification means that responses on items are independent and determined only by the item parameters.
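A minimal sketch of such a structured two-component mixture is given below (again not code from the paper; the partial credit form of the ordinal component, the function names, and all numerical values are assumptions for illustration). The non-contingent component multiplies item-wise marginal category probabilities, while the ordinal component is an ordinary item response likelihood.

```python
import numpy as np

def pcm_likelihood(y, theta, deltas):
    """P_M(Y_p | theta_p, {delta_ir}): partial credit likelihood of a response
    vector, used here as one possible choice for the ordinal component."""
    lik = 1.0
    for y_i, delta_i in zip(y, deltas):
        scores = np.concatenate(([0.0], np.cumsum(theta - delta_i)))
        probs = np.exp(scores) / np.exp(scores).sum()
        lik *= probs[y_i]
    return lik

def noncontingent_likelihood(y, marginal_probs):
    """P_RS(Y_p | par): independent responses; item i is governed only by a
    marginal category distribution, as for non-contingent responding."""
    return float(np.prod([marginal_probs[i][y_i] for i, y_i in enumerate(y)]))

def structured_mixture(y, pi_M, theta, deltas, marginal_probs):
    """P(Y_p) = pi_M * P_M(Y_p | ...) + (1 - pi_M) * P_RS(Y_p | par)."""
    return (pi_M * pcm_likelihood(y, theta, deltas)
            + (1 - pi_M) * noncontingent_likelihood(y, marginal_probs))

# Hypothetical toy setting: 2 items with 3 categories each
deltas = [np.array([-0.4, 0.6]), np.array([0.1, 0.9])]
marginal_probs = [np.array([0.3, 0.4, 0.3]), np.array([0.25, 0.5, 0.25])]
print(structured_mixture([2, 1], pi_M=0.8, theta=0.7,
                         deltas=deltas, marginal_probs=marginal_probs))
```

Replacing the non-contingent component by a partial credit model with additional response-style parameters, as in the alternative described next, only changes the second likelihood function.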
An alternative is a mixture with the component

$$P_{RS}(\boldsymbol{Y}_p \mid \text{par}) = P_M(\boldsymbol{Y}_p \mid \theta_p, \{\delta_{ir}^{RS}\}, \gamma_p),$$

where $\gamma_p$ are additional response style parameters in a partial credit model if, for example, a partial credit model determines the first mixture component. The parameters $\{\delta_{ir}^{RS}\}$ are the parameters of model $M$ in the second mixture component. It is then assumed that respondents who are affected by response style have different parameters than respondents without response style. However, it may also be assumed that the parameters are the same as in the first component. Then one allows respondents to be affected by response styles in differing degrees.

Approaches like these go beyond the classical modelling of response styles in mixture models. In classical mixture models respondents are either affected by response styles or not; response style is considered a discrete trait (Bolt and Johnson, 2009). In contrast, in parametric models response styles are represented by parameters, which may be small or large and vary across persons, making response style a continuous trait. The mixture given above combines these two worlds: respondents may not be affected by response style, or they may be affected, but to differing degrees. Models of this type seem not to have been considered, although there has been some development on modelling uncertainty, which is related to non-contingent response styles; however, these approaches were proposed mainly in the regression context, not for repeated measurements such as item responses. For an overview of uncertainty modelling in regression see Piccolo and Simone (2019); repeated measures were considered by Colombi et al. (2018).

7 Concluding Remarks

It has been shown that an easily comprehensible taxonomy of ordinal item response models can be obtained by investigating the role of building blocks and split variables within the structure of ordinal models. The structure contains traditional models, IRTree models, and the class of hierarchical partitioning models. Although it is well known that ordinal models contain binary models, their role in the construction of ordinal models seems not to have been investigated in a systematic way to obtain a taxonomy. In particular, the distinction between non-conditional and conditional models, the split of the latter into hierarchical and non-hierarchical models, and the role of the nominal model and how it is to be distinguished from ordinal models contribute to clarifying the structure of polytomous models.

One of the advantages of having a distinct taxonomy of models is that the meaning of parameters becomes clear. In particular, effects in conditional models should be interpreted with regard to the conditioning, which holds for parametric and nonparametric approaches. Parameterizations do not determine the taxonomy. They primarily determine the complexity of the model and specify which effects are included in the model. Alternative parameterizations have different meanings; an additional slope parameter, for example, has a quite different interpretation if it is included in an adjacent categories model or in an IRTree. Its meaning depends on the model type, and therefore on the placement in the taxonomy.

References

Adams, R. J., M. L. Wu, and M. Wilson (2012). The Rasch rating model and the disordered threshold controversy. Educational and Psychological Measurement 72(4), 547–573.

Agresti, A. (2013). Categorical Data Analysis, 3rd Edition. New York: Wiley.

Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika 42, 69–81.

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika 43(4), 561–573.

Andrich, D. (2010). Sufficiency and conditional estimation of person parameters in the polytomous Rasch model. Psychometrika 75(2), 292–308.
Andrich, D. (2013). An expanded derivation of the threshold structure of the polytomous Rasch model that dispels any 'threshold disorder controversy'. Educational and Psychological Measurement 73(1), 78–124.

Andrich, D. (2015). The problem with the step metaphor for polytomous models for ordinal assessments. Educational Measurement: Issues and Practice 34(2), 8–14.

Andrich, D. (2016). Rasch rating-scale model. In W. Van der Linden (Ed.), Handbook of Modern Item Response Theory, pp. 75–94. Springer.

Baumgartner, H. and J.-B. E. Steenkamp (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research 38(2), 143–156.

Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika 37(1), 29–51.

Böckenholt, U. (2012). Modeling multiple response processes in judgment and choice. Psychological Methods 17(4), 665–678.

Böckenholt, U. (2017). Measuring response styles in Likert items. Psychological Methods 22, 69–83.

Böckenholt, U. and T. Meiser (2017). Response style analysis with threshold and multi-process IRT models: A review and tutorial. British Journal of Mathematical and Statistical Psychology 70(1), 159–181.

Bolt, D. M. and T. R. Johnson (2009). Addressing score bias and differential item functioning due to individual differences in response style. Applied Psychological Measurement 33(5), 335–352.

Colombi, R., S. Giordano, A. Gottard, and M. Iannario (2018). Hierarchical marginal models with latent uncertainty. Scandinavian Journal of Statistics, to appear.

De Boeck, P., S.-J. Cho, and M. Wilson (2011). Explanatory secondary dimension modeling of latent differential item functioning. Applied Psychological Measurement 35(8), 583–603.

De Boeck, P. and I. Partchev (2012). IRTrees: Tree-based item response models of the GLMM family. Journal of Statistical Software 48(1), 1–28.

Eid, M. and M. Rauber (2000). Detecting measurement invariance in organizational surveys. European Journal of Psychological Assessment 16(1), 20–30.

García-Pérez, M. A. (2017). An analysis of (dis)ordered categories, thresholds, and crossings in difference and divide-by-total IRT models for ordered responses. The Spanish Journal of Psychology 20, 1–27.

Gollwitzer, M., M. Eid, and R. Jürgensen (2005). Response styles in the assessment of anger expression. Psychological Assessment 17(1), 56.

Hemker, B. T., K. Sijtsma, I. W. Molenaar, and B. W. Junker (1997). Stochastic ordering using the latent trait and the sum score in polytomous IRT models. Psychometrika 62(3), 331–347.

Hemker, B. T., L. A. van der Ark, and K. Sijtsma (2001). On measurement properties of continuation ratio models. Psychometrika 66(4), 487–506.

Jansen, P. G. and E. E. Roskam (1986). Latent trait models and dichotomization of graded responses. Psychometrika 51(1), 69–91.

Jeon, M. and P. De Boeck (2016). A generalized item response tree model for psychological assessments. Behavior Research Methods 48(3), 1070–1085.

Jin, K.-Y. and W.-C. Wang (2014). Generalized IRT models for extreme response style. Educational and Psychological Measurement 74(1), 116–138.

Johnson, T. R. (2003). On the use of heterogeneous thresholds ordinal regression models to account for individual differences in response style. Psychometrika 68(4), 563–583.

Khorramdel, L. and M. von Davier (2014). Measuring response styles across the Big Five: A multiscale extension of an approach using multinomial processing trees.
Multivariate Behavioral Research 49(2), 161–177.

Kulas, J. T., A. A. Stachowski, and B. A. Haynes (2008). Middle response functioning in Likert-responses to personality items. Journal of Business and Psychology 22(3), 251–259.

Maij-de Meij, A. M., H. Kelderman, and H. van der Flier (2008). Fitting a mixture item response theory model to personality questionnaire data: Characterizing latent classes and investigating possibilities for improving prediction. Applied Psychological Measurement 32(8), 611–631.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika 47, 149–174.

Masters, G. N. and B. Wright (1984). The essential process in a family of measurement models. Psychometrika 49, 529–544.

Meiser, T., H. Plieninger, and M. Henninger (2019). IRTree models with ordinal and multidimensional decision nodes for response styles and trait-based rating responses. British Journal of Mathematical and Statistical Psychology.

Mokken, R. J. (1971). A Theory and Procedure of Scale Analysis. Berlin: Walter de Gruyter.

Moors, G. (2010). Ranking the ratings: A latent-class regression model to control for overall agreement in opinion research. International Journal of Public Opinion Research 22(1), 93–119.

Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement 14(1), 59–71.

Muraki, E. (1997). A generalized partial credit model. Handbook of Modern Item Response Theory, 153–164.

Piccolo, D. and R. Simone (2019). The class of CUB models: Statistical foundations, inferential issues and empirical evidence. Statistical Methods and Applications, https://doi.org/10.1007/s10260-019-00461-1.

Plieninger, H. (2016). Mountain or molehill? A simulation study on the impact of response styles. Educational and Psychological Measurement 77, 32–53.

Plieninger, H. (2020). Developing and applying IR-tree models: Guidelines, caveats, and an extension to multiple groups. Organizational Research Methods, doi:10.1177/1094428120911096.

Rost, J. (1991). A logistic mixture distribution model for polychotomous item responses. British Journal of Mathematical and Statistical Psychology 44(1), 75–92.

Rost, J., C. Carstensen, and M. Von Davier (1997). Applying the mixed Rasch model to personality questionnaires. Applications of Latent Trait and Latent Class Models in the Social Sciences, 324–332.

Samejima, F. (1995). Acceleration model in the heterogeneous case of the general graded response model. Psychometrika 60(4), 549–572.

Samejima, F. (2016). Graded response model. In W. Van der Linden (Ed.), Handbook of Item Response Theory, pp. 95–108.

Shu, Z., R. Henson, and R. Luecht (2013). Using deterministic, gated item response theory model to detect test cheating due to item compromise. Psychometrika 78(3), 481–497.

Sijtsma, K. and B. T. Hemker (2000). A taxonomy of IRT models for ordering persons and items using simple sum scores. Journal of Educational and Behavioral Statistics 25(4), 391–415.

Sijtsma, K. and I. W. Molenaar (2002). Introduction to Nonparametric Item Response Theory, Volume 5. Sage.

Thissen, D. and L. Cai (2016). Nominal categories model. In W. Van der Linden (Ed.), Handbook of Modern Item Response Theory, pp. 51–73. Springer.

Thissen, D., L. Cai, and R. D. Bock (2010). The nominal categories item response model. Handbook of Polytomous Item Response Theory Models, 43–75.

Thissen, D. and L. Steinberg (1986). A taxonomy of item response models. Psychometrika 51(4), 567–577.

Thissen-Roe, A. and D. Thissen (2013).
A two-decision model for responses to Likert-type items. Journal of Educational and Behavioral Statistics 38(5), 522–547.

Tijmstra, J., M. Bolsinova, and M. Jeon (2018). Generalized mixture IRT models with different item-response structures: A case study using Likert-scale data. Behavior Research Methods 55, 1–20.

Tutz, G. (1990). Sequential item response models with an ordered response. British Journal of Mathematical and Statistical Psychology 43, 39–55.

Tutz, G. (2020a). Hierarchical models for the analysis of Likert scales in regression and item response analysis. International Statistical Review, doi:10.1111/insr.12396.

Tutz, G. (2020b). On the structure of ordered latent trait models. Journal of Mathematical Psychology 96.

Tutz, G. and C. Draxler (2019). A common framework for classical and tree-based item response models including extended hierarchically structured models. Technical Report 227, Department of Statistics, LMU Munich.

Tutz, G., G. Schauberger, and M. Berger (2018). Response styles in the partial credit model. Applied Psychological Measurement 42, 407–427.

Van der Linden, W. (2016). Handbook of Item Response Theory. New York: Springer.

Van Rosmalen, J., H. Van Herk, and P. Groenen (2010). Identifying response styles: A latent-class bilinear multinomial logit model. Journal of Marketing Research 47(1), 157–172.

Van Vaerenbergh, Y. and T. D. Thomas (2013). Response styles in survey research: A literature review of antecedents, consequences, and remedies. International Journal of Public Opinion Research 25(2), 195–217.

Verhelst, N. D., C. Glas, and H. De Vries (1997). A steps model to analyze partial credit. In Handbook of Modern Item Response Theory, pp. 123–138. Springer.

Von Davier, M. (1996). Mixtures of polytomous Rasch models and latent class models for ordinal variables. Softstat 95.

Von Davier, M. and K. Yamamoto (2004). Partially observed mixtures of IRT models: An extension of the generalized partial-credit model. Applied Psychological Measurement 28(6), 389–406.

Von Davier, M. and K. Yamamoto (2007). Mixture-distribution and HYBRID Rasch models. In Multivariate and Mixture Distribution Rasch Models, pp. 99–115. Springer.

Wetzel, E. and C. H. Carstensen (2017). Multidimensional modeling of traits and response styles. European Journal of Psychological Assessment 33, 352–364.