Student Evaluation of Teaching Effectiveness: An Assessment of Student Perception and Motivation
YINING CHEN & LEON B. HOSHOWER, Ohio University, Athens, Ohio, USA
ABSTRACT Over the past century, student ratings have steadily continued to take
precedence in faculty evaluation systems in North America and Australia, are increas-
ingly reported in Asia and Europe and are attracting considerable attention in the Far
East. Since student ratings are the most, if not the only, influential measure of teaching
effectiveness, active participation by and meaningful input from students can be critical
in the success of such teaching evaluation systems. Nevertheless, very few studies have
looked into students’ perception of the teaching evaluation system and their motivation
to participate. This study employs expectancy theory to evaluate some key factors that
motivate students to participate in the teaching evaluation process. The results show that
students generally consider an improvement in teaching to be the most attractive
outcome of a teaching evaluation system. The second most attractive outcome was using
teaching evaluations to improve course content and format. Using teaching evaluations
for a professor’s tenure, promotion and salary rise decisions and making the results of
evaluations available for students’ decisions on course and instructor selection were less
important from the students’ standpoint. Students’ motivation to participate in teaching
evaluations is also impacted significantly by their expectation that they will be able to
provide meaningful feedback. Since quality student input is an essential antecedent of
meaningful student evaluations of teaching effectiveness, the results of this study should
be considered thoughtfully as the evaluation system is designed, implemented and
operated.
Introduction
Student evaluations have become routine at most colleges and universities. Evidence
from many studies indicates that most universities and colleges throughout the world use
student ratings of instruction as part of their evaluation of teaching effectiveness (Seldin,
1985; Abrami, 1989; Wagenaar, 1995; Ahmadi et al., 2001; Hobson & Talbot, 2001).
With the surge in public demand for accountability in higher education and the
great concern for quality of university teaching, the practice of collecting
student ratings of teaching has been widely adopted by universities all over the
world as part of the quality assurance system. (Kwan, 1999, p. 181)
Student evaluations of teaching effectiveness are commonly used to provide: (1)
formative feedback to faculty for improving teaching, course content and structure; (2)
a summary measure of teaching effectiveness for promotion and tenure decisions; (3)
information to students for the selection of courses and teachers (Marsh & Roche, 1993).
Research on student evaluations of teaching effectiveness often examines issues like the
development and validity of an evaluation instrument (Marsh, 1987), the validity
(Cohen, 1981) and reliability (Feldman, 1977) of student ratings in measuring teaching
effectiveness and the potential bias of student ratings (Hofman & Kremer, 1980; Abrami
& Mizener, 1983; Tollefson et al., 1989). Very few studies, however, have examined
students’ perceptions of teaching evaluations and their motivation to participate in the
evaluation. Since students’ input is the root and source of student evaluation data,
meaningful and active participation of students is essential. The usefulness of student
evaluation data is severely undermined unless students willingly provide quality input.
Expectancy theory has been recognised as one of the most promising conceptualisa-
tions of individual motivation (Ferris, 1977). Many researchers have proposed that
expectancy theory can provide an appropriate theoretical framework for research that
examines a user’s acceptance of and intent to use a system (DeSanctis, 1983). However,
empirical research employing expectancy theory within an educational context has been
limited. This study uses expectancy theory as part of a student-based experiment to
examine students’ acceptance of and motivation to participate in a teaching evaluation
system. The following section provides a review of prior research on teaching evaluation
and a discussion of expectancy theory.
The first objective of this study was to assess the relative attractiveness to students of
four uses of teaching evaluations, two formative and two summative, which the literature
identifies as the primary uses of teaching evaluations, as discussed earlier. The two
formative uses are: (1) improving the professor's teaching; and (2) improving the
course's content and format. The two summative uses are: (3) influencing the professor's
tenure, promotion and salary rises; and (4) making these results available for students to
use in the selection of courses and teachers [1]. The second objective of this study was
to examine whether an inappropriately designed teaching evaluation that, in the
perception of students, hinders students
from providing valid or meaningful feedback will affect their motivation to participate
in the evaluation. The third objective is to discover whether these results are consistent
across class rank or whether freshmen and seniors have different preferences in the use
of student-generated teaching evaluations and consequently whether they have different
motivations to participate. Through a better understanding of students’ needs and
behavioural intentions, the results of this study can aid in creating evaluation systems
that truly respond to the needs of those who evaluate teaching performance.
Expectancy Theory
The theory of reasoned action, as proposed by Ajzen and Fishbein (1980), is a well-
researched model that has successfully predicted behaviour in a variety of contexts. They
propose that attitudes and other variables (i.e. an individual’s normative beliefs) do not
directly influence actual behaviour (e.g. participation) but are fully mediated through
behaviour intentions or the strength of one’s intention to perform a specific behaviour.
This would imply that measurement of behavioural intentions (motivation) to participate
in a system is a strong and more appropriate predictor (than just attitudes) of the success
of the system.
Expectancy theory is considered one of the most promising conceptualisations of
individual motivation. It was originally developed by Vroom (1964) and has served as
a theoretical foundation for a large body of studies in psychology, organisational
behaviour and management accounting (Harrell et al., 1985; Brownell & McInnes, 1986;
Hancock, 1995; Snead & Harrell, 1994; Geiger & Cooper, 1996). Expectancy models are
cognitive explanations of human behaviour that cast a person as an active, thinking,
predicting creature in his/her environment. He or she continuously evaluates the
outcomes of his or her behaviour and subjectively assesses the likelihood that each of
his or her possible actions will lead to various outcomes. The choice of the amount of
effort he or she exerts is based on a systematic analysis of: (1) the values of the rewards
from these outcomes; (2) the likelihood that rewards will result from these outcomes;
and (3) the likelihood of reaching these outcomes through his or her actions and efforts.
According to Vroom, expectancy theory is composed of two related models: the
valence model and the force model. In our application of the theory, the valence model
shows that the overall attractiveness of a teaching evaluation system to a student (Vj) is
the summation of the products of the attractiveness of those outcomes associated with
the system (Vk) and the probability that the system will produce those outcomes (Ijk):
V_j = \sum_{k=1}^{n} (V_k \times I_{jk})

where Vj is the valence, or attractiveness, of a teaching evaluation (outcome j, a first
level outcome), Vk is the valence, or attractiveness, of outcome k (a second level
outcome) and Ijk is the perceived probability (instrumentality) that the teaching
evaluation will lead to outcome k.
In our case the four potential outcomes (i.e. n = 4) are the four uses of teaching
evaluations that are described in the literature. They are: (1) improving the professor’s
teaching; (2) influencing the professor’s tenure, promotion and salary rises; (3) improv-
ing the course’s content and format; (4) making these results available for students to use
in the selection of courses and teachers.
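To make the valence model concrete, the following minimal Python sketch computes Vj for one student. The outcome valences and instrumentalities below are hypothetical values chosen only to illustrate the calculation; they are not data from the study.

    # Valence model: V_j = sum over k of (V_k * I_jk).
    # The four second level outcomes follow the paper; the numeric values
    # are hypothetical, chosen only to illustrate the calculation.

    outcome_valences = {            # V_k, on the -5 to +5 scale
        "improve_teaching": 5,      # V1: improving the professor's teaching
        "tenure_promotion": 2,      # V2: tenure, promotion and salary rises
        "improve_course": 4,        # V3: improving course content and format
        "results_available": 3,     # V4: results available for course selection
    }
    instrumentalities = {           # I_jk, at the 10% and 90% levels of the study
        "improve_teaching": 0.9,
        "tenure_promotion": 0.1,
        "improve_course": 0.9,
        "results_available": 0.1,
    }

    V_j = sum(outcome_valences[k] * instrumentalities[k] for k in outcome_valences)
    print(f"Overall attractiveness of the evaluation: V_j = {V_j:.1f}")  # 8.6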
The force model shows that a student’s motivation to exert effort into a teaching
evaluation system (Fi) is the summation of the products of the attractiveness of the
system (Vj) and the probability that a certain level of effort will result in a successful
contribution to the system (Eij):
F_i = \sum_{j=1}^{n} (E_{ij} \times V_j)
where Fi is the motivational force to participate in a teaching evaluation at some level
i, Eij is the expectancy that a particular level of participation (or effort) will result in a
successful contribution to the evaluation and Vj is the valence, or attractiveness, of the
teaching evaluation, derived in the previous equation of the valence model.
In terms of the decision making process, each student first uses the valence model and
then the force model. In the valence model, each participant in a teaching evaluation
system evaluates the system’s outcomes (e.g. improved teaching, rewarding effective
teaching, improved course content and availability of results for students’ decision
making) and subjectively assesses the likelihood that these outcomes will occur. Next,
by placing his or her own intrinsic values (or weights) on the various outcomes, each
student evaluates the overall attractiveness of the teaching evaluation system. Finally, the
student uses the force model to determine the amount of effort he or she is willing to
exert in the evaluation process. This effort level is determined by the product of the
attractiveness generated by the valence model (above) and the likelihood that his or her
effort will result in a successful contribution to the system. Based on this systematic
analysis, the student will determine how much effort he or she would like to exert in
participating in the evaluation system.
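A corresponding sketch of the force model (Decision B) follows, carrying over the hypothetical Vj value from the valence sketch above; since only a single evaluation is being judged, the summation reduces to one term.

    # Force model: F_i = sum over j of (E_ij * V_j). With one evaluation
    # (one first level outcome) the sum has a single term. Values are
    # hypothetical, continuing the valence sketch above.

    V_j = 8.6                     # attractiveness of the evaluation, from Decision A
    for E_ij in (0.10, 0.90):     # the two expectancy levels used in the cases
        F_i = E_ij * V_j
        print(f"expectancy {E_ij:.0%}: motivational force F_i = {F_i:.2f}")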
Research Method
Subject Selection
This study was conducted at a mid-sized (15 000–20 000 total enrollment), mid-western
university. The freshmen participants were gathered from two sections of Western
Civilization, which were designated as ‘freshmen and sophomores only’, although a few
upper classmen were registered for the class. Seniors were gathered from ‘Tier III’
courses. Tier III courses are the capstone course of the general education requirement.
All students are required to take one Tier III class before graduation. Although each Tier
III course is unique, they have a number of common factors. Tier III courses must
integrate more than one academic discipline, are restricted exclusively to seniors, address
widely diverse topics and have few or no prerequisites. As a general education
requirement, Tier III courses are never directed towards a particular major and are usually
populated by students from diverse academic backgrounds.
The instrument was administered at the beginning of a regularly scheduled class
around the middle of the quarter to all the students who were present on that particular
day. We explained the use of the instrument, read the instruction page to the students and
then asked the students to complete the instrument. The entire process took between 15
and 20 minutes. Students other than freshmen and seniors were eliminated from the
sample as were the instruments with incomplete data [2]. This resulted in 208 usable
instruments completed by 105 freshman and 103 senior students. The demographic
information for students in the two groups is summarised in Table 1. We used the
freshmen versus seniors design to examine whether freshmen and seniors have different
motivations to participate in the teaching evaluation system.
Judgment Exercise
The within-person or individual focus of expectancy theory suggests that appropriate
tests of this theory should involve comparing measurements of the same individual’s
motivation under different circumstances (Harrell et al., 1985; Murray & Frazier, 1986).
In response to this suggestion, this study incorporates a well-established within-person
methodology originally developed by Stahl and Harrell (1981) and later proven to be
valid by other studies in various circumstances (see, for example, Snead & Harrell, 1995;
Geiger & Cooper, 1996). This methodology uses a judgment modelling decision exercise
that provides a set of cues that an individual uses in arriving at a particular judgment or
decision. Multiple sets of these cues are presented, each representing a unique combi-
nation of strengths or values associated with the cues. A separate judgment is required
from the individual for each unique combination of cues presented.
We employed a one-half fractional factorial design using the four second level
outcomes [3]. This resulted in eight different combinations of the second level outcomes
(2^4 × 1/2 = 8 combinations). Each of the resulting eight combinations was then presented
at two levels (10 and 90%) of expectancy to obtain 16 unique cases (8 combinations × 2
levels of expectancy = 16 cases). This furnished each participant with multiple cases
that, in turn, provided multiple measures of each individual’s behavioural intentions
under varied circumstances. This is a prerequisite for the within-person application of
expectancy theory (Snead & Harrell, 1995).
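The construction of the 16 cases can be sketched as follows. The choice of the I = ABCD generator for the one-half fraction is our assumption; the paper cites Montgomery (1984) but does not state which fraction was used.

    from itertools import product

    LOW, HIGH = 0.10, 0.90   # the two levels used for instrumentality/expectancy

    # Full 2^4 factorial over the four second level outcomes, coded -1/+1.
    full = list(product([-1, 1], repeat=4))

    # One-half fraction: keep the 8 runs whose signs multiply to +1. This is
    # the standard I = ABCD construction (Montgomery, 1984); the generator
    # actually used in the study is not stated, so this is an assumption.
    half = [run for run in full if run[0] * run[1] * run[2] * run[3] == 1]

    # Cross the 8 combinations with the two expectancy levels -> 16 cases.
    cases = [{"instrumentalities": [LOW if s < 0 else HIGH for s in run],
              "expectancy": e}
             for run in half for e in (LOW, HIGH)]
    print(len(half), len(cases))   # 8 16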
In each of the 16 cases, the participants were asked to make two decisions. The first
decision, Decision A, corresponded to Vj in the valence model and represented the
overall attractiveness of participating in the evaluation, given the likelihood (10 or 90%)
that the four second level outcomes (Ijk) would result from their participation. (The
instructions and a sample case are provided in the Appendix.) As mentioned earlier, the
four second level outcomes are (1) improving the professor’s teaching, (2) influencing
the professor’s tenure, promotion and salary rise, (3) improving the course content and
format and (4) making the results available to students. The second decision, Decision
B, corresponded to Fi in the force model and reflected the strength of a participant’s
motivation to participate in the evaluation, using (1) the attractiveness of the evaluation
(Vj) obtained from Decision A and (2) the expectancy (Eij, 10 or 90%) that if the
participant exerted a great deal of effort he or she would be successful in providing
meaningful or useful input to the evaluation process. We used an 11 point response scale
with a range of −5 to +5 for Decision A and 0 to 10 for Decision B. For Decision
A, −5 represented 'very unattractive' and +5 represented 'very attractive'; for
Decision B, 0 represented 'zero effort' and 10 represented a 'great deal of effort'.
There is a problem with applying the expectancy theory model to teaching evaluations.
Expectancy theory holds that an individual may devote a great deal of effort towards
achieving an outcome and despite this best effort he or she may not be able to achieve
the desired outcome. It may be that in the student’s perception, the ‘successful’
completion of the evaluation is trivial. All one has to do is fill in the multiple choice grid.
Likewise, the range of effort that may be devoted to completing the evaluation may not
be apparent. Consequently, in the ‘Further Information’ supplied between Decision A
and Decision B, we provided a situation in which the student was told that the
hypothetical course evaluation contained ‘several open-ended essay questions which will
require a great deal of effort for you to complete’. Furthermore, we told the students that
despite their best efforts their feedback might not be helpful to the reader. This added
the necessary uncertainty about the reward of effort, as well as providing a feeling that
the required effort could be considerable. The students were further reminded that their
participation in student evaluations is voluntary and they are free to decide to what extent
they would participate in the evaluation.
Open-ended questions are commonly included in a course evaluation to
allow students to express an unconstrained opinion about some aspect of the class and/or
instructor. Such questions usually provide important diagnostic information and insight
for the formative evaluation about the course and instructor (Calderon et al., 1996).
Though important, open-ended questions are more difficult to summarise and report. Our
instrument explained that the reader could misinterpret the evaluator’s feedback in the
essay. Likewise, the data from multiple choice questions could be difficult to interpret
or meaningless if the questionnaire is designed poorly or the questions are ambiguous or
the evaluation is administered inappropriately. Therefore, despite his or her efforts, the
student may not be successful in contributing meaningfully to the evaluation process.
Experimental Controls
The participants were presented with 16 hypothetical situations. They were expected to
detach themselves from their past experiences and evaluate the hypothetical situations
from a third party perspective. If the respondents were successful in doing this, we would
expect to find no correlation between their actual experiences with student-generated
evaluations or background and their responses. To test this, we calculated Pearson’s
correlations between the R2 value of the valence model and four selected demographic
factors. These factors are gender, grade point average (GPA), impression of professors
and perception about the evaluation system. The coding of gender was 1 for male and
0 for female. The impression of professors and perception about the evaluation system
were measured by two 11 point scale demographic questions. Participating students were
asked ‘In general, how do you describe the professors you have had at this institution?’
and ‘What is your general impression about the course evaluation system?’. We also
calculated the correlation between the R2 value of the force model and the four
demographic factors. We used these correlations to assess whether the subjects were able
to evaluate the 16 hypothetical situations objectively without bias and thus were
appropriate for this study.
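This control analysis could be reproduced along the following lines; the data here are simulated, since the study's raw responses are not available.

    # A sketch of the control correlations with simulated data.
    # scipy.stats.pearsonr returns the correlation and its two-sided P value.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    n = 105                                  # freshman group size
    valence_r2 = rng.uniform(0.35, 0.98, n)  # simulated per-subject R2 values
    gender = rng.integers(0, 2, n)           # 1 = male, 0 = female
    gpa = rng.uniform(2.0, 4.0, n)           # simulated GPAs

    for name, factor in [("gender", gender), ("GPA", gpa)]:
        r, p = pearsonr(valence_r2, factor)
        print(f"R2 vs {name}: r = {r:.2f} (P = {p:.2f})")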
Results
Valence Model
Through the use of multiple regression analysis, we sought to determine each student’s
perception of the attractiveness of participating in the evaluation. Decision A (Vj) served
as the dependent variable and the four second level outcome instrumentalities (Ijk) served as the
the independent variables. The resulting standardised regression coefficients represent
the relative importance (attractiveness) of each of the second level outcomes to each
participant in arriving at Decision A. Table 2 presents the mean adjusted R2 of the
regressions and the mean standardised values of each outcome. Detailed regression
results for each participant are not presented but are available from the authors.
As indicated in Table 2, the mean R2 of the individual regression models is 0.69 for
the freshman group and 0.71 for the senior group. The mean R2 represents the percentage
of total variation in responses which is explained by the multiple regression. Thus, these
relatively high mean R2 values indicate that the valence model of expectancy theory
explains students' assessments of the attractiveness of a teaching evaluation system well.
TABLE 2. Valence model regression results (a)

                                                              Frequency of
                                                              significance at
                     n      Mean     SD      Range            0.05 level
Group I: Freshmen
Adjusted R2          105    0.69     0.14     0.35–0.98       104/105
Standardised weight
V1                   105    0.47     0.17     0.03–0.92        86/105
V2                   105    0.26     0.24    −0.69–0.83        50/105
V3                   105    0.42     0.16    −0.08–0.77        80/105
V4                   105    0.39     0.19    −0.43–0.78        72/105
Group II: Seniors
Adjusted R2          103    0.71     0.15     0.20–0.98       101/103
Standardised weight
V1                   103    0.44     0.22    −0.27–0.78        78/103
V2                   103    0.33     0.30    −0.55–0.97        63/103
V3                   103    0.42     0.19    −0.23–0.87        77/103
V4                   103    0.33     0.19    −0.14–0.75        62/103

V1, valence of teaching improvement; V2, valence of tenure and promotion decisions; V3,
valence of course improvement; V4, valence of result availability.
(a) Results (i.e. mean, standard deviation, range and frequency of significance at 0.05) of
individual within-person regression models are reported.
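One within-person regression of this kind can be sketched as follows. The responses are simulated, and standardising both sides yields coefficients comparable to the standardised weights in Table 2; this is our illustration, not the authors' code.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    cues = rng.choice([0.10, 0.90], size=(16, 4))   # I_jk levels in the 16 cases
    true_valences = np.array([5.0, 2.0, 4.0, 3.0])  # hypothetical V_k values
    decision_a = cues @ true_valences + rng.normal(0.0, 1.0, 16)  # simulated V_j

    standardise = lambda x: (x - x.mean(axis=0)) / x.std(axis=0)
    fit = sm.OLS(standardise(decision_a), sm.add_constant(standardise(cues))).fit()
    print(fit.rsquared_adj)    # adjusted R2, as reported in Table 2
    print(fit.params[1:])      # standardised weights, one per outcome (V1-V4)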
Ranked mean standardised weights of the second level outcomes, with P values of
adjacent pairwise comparisons in parentheses:

Group I: Freshmen
V1    0.47
V3    0.42    0.05 (V1 versus V3)
V4    0.39    0.16 (V3 versus V4)
V2    0.26    0.00 (V4 versus V2)
Group II: Seniors
V1    0.44
V3    0.42    0.47 (V1 versus V3)
V2    0.33    0.05 (V3 versus V2)
V4    0.33    0.86 (V2 versus V4)
The comparison between the freshman and senior groups showed a significant difference
for V4 and a close to significant difference for V2. The freshman group considered V4
(the results made available to students) an outcome significantly more attractive than did
the senior group. The senior group, in contrast, considered V2 (tenure and promotion
influence) a close to significantly more attractive outcome. Our interpretation of the
former result is that
freshmen may be seeking more guidance in choosing professors, while seniors may have
an effective word-of-mouth system. Our interpretation of the latter result is that freshmen
are naive about the promotion and tenure system, while seniors are more aware of its
impact on individual professors and upon the composition of the faculty.
Table 3C presents the comparison between male and female students' perceptions of the
four second level outcomes. The results indicate no significant difference in the weights
they placed on the second level outcomes except for V4: male students considered the
results being made available a more attractive outcome of course evaluations than did
female students.

TABLE 4. Force model regression results (a)

                                                              Frequency of
                                                              significance at
                     n      Mean     SD      Range            0.05 level
Group I: Freshmen
Adjusted R2          105    0.75     0.20     0.10–1.00       104/105
Standardised weight
W1                   105    0.50     0.35    −0.19–1.00        73/105
W2                   105    0.53     0.37    −0.23–1.00        75/105
Group II: Seniors
Adjusted R2          103    0.78     0.18     0.13–1.00       101/103
Standardised weight
W1                   103    0.53     0.31    −0.05–0.99        79/103
W2                   103    0.56     0.33    −0.21–1.00        81/103

W1, weight placed on attractiveness of the evaluation; W2, weight placed on the expectancy of
successfully participating in the evaluation.
(a) Results (i.e. mean, standard deviation, range and frequency of significance at 0.05) of individual
within-person regression models are reported.
Force Model
We then used multiple regression analysis to examine the force model (Decision B) in
the experiment. The dependent variable is the individual’s level of effort to participate
in the evaluation (Fi). The two independent variables are (1) each student’s perception
about the attractiveness of the system (Vj) from Decision A and (2) the expectancy
information (Eij = 10 or 90%), which is provided by the 'Further Information' of the test
instrument (see Appendix). The force model results are summarised in Table 4.
The mean R2 values (0.75 and 0.78) indicate that the force model adequately explains the
students' motivation to participate in the evaluation system. The mean standardised
regression coefficient W1 indicates the impact of the overall attractiveness of the
evaluation (Vj) while W2 indicates the impact of the expectation that a certain level of
effort leads to successful participation in the evaluation. Our results found no significant
difference between the mean standardised values for W1 and W2 for either group of
students. The P values of these t-tests were 0.60 and 0.54 for the freshman and senior
groups, respectively. (P values are not shown in Table 4.) These results imply that both
factors, the attractiveness of the evaluation system (W1) and the likelihood that the
student’s efforts will lead to success (W2), are of similar importance to the student’s
motivation to participate in the evaluation.
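Reading these as paired t-tests across participants (the paper does not state the exact test form used), the W1 versus W2 comparison can be sketched as follows with simulated weights.

    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(2)
    w1 = rng.normal(0.50, 0.35, 105)   # weights on attractiveness (V_j)
    w2 = rng.normal(0.53, 0.37, 105)   # weights on expectancy (E_ij)

    t, p = ttest_rel(w1, w2)           # paired comparison across participants
    print(f"t = {t:.2f}, P = {p:.2f}")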
Experimental Controls
Table 5 presents Pearson’s correlations between the R2 values of the valence and force
models and four demographic factors: gender, GPA, impression of professors and
perception about the evaluation system [4]. This creates eight correlations for each group
or a total of 16 correlations. These correlations are shown in the two right hand columns
of Table 5. None of these 16 correlations is significant, suggesting that neither the
TABLE 5. Pearson's correlations between demographic factors and model R2 values (P values in parentheses)

                              GPA            Impression of   Impression of    Valence         Force
                                             professors      evaluation       model R2        model R2
Group I: Freshmen
Gender                        0.03 (0.75)    0.03 (0.73)    −0.18 (0.07)      0.09 (0.37)    −0.00 (0.98)
GPA                                          0.10 (0.37)     0.10 (0.35)      0.18 (0.09)     0.17 (0.11)
Impression of professors                                     0.20 (0.04)     −0.03 (0.73)     0.01 (0.96)
Impression of evaluation                                                      0.08 (0.42)     0.13 (0.08)
Group II: Seniors
Gender                       −0.28 (0.00)   −0.14 (0.16)    −0.10 (0.31)     −0.15 (0.12)    −0.03 (0.76)
GPA                                          0.20 (0.04)    −0.02 (0.86)      0.15 (0.13)     0.04 (0.72)
Impression of professors                                     0.41 (0.00)      0.10 (0.32)     0.03 (0.79)
Impression of evaluation                                                      0.07 (0.47)     0.03 (0.78)
students’ perception of the attractiveness of the evaluation system nor their motivation
to participate is correlated with their background or with their prior experience with
evaluation systems. These results also support our argument that the subjects we used
were appropriate for this study because neither their background nor their prior
experience with professors and teaching evaluations affected their perceptions of the
evaluation systems tested in the questionnaire [5].
To examine if an order effect is present in our experimental design, we administered
two versions of the instrument; each had the order of the 16 hypothetical situations
determined at random. We then ran a regression using the average R2 values from the two
random order versions as the dependent variable and the version order as a dummy
independent variable, and found no association between the two. This result suggests that
there was no order effect in our experimental design.
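One plausible reading of this check, sketched here with simulated data, regresses the per-subject R2 values on a dummy variable for the instrument version received.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    r2 = rng.uniform(0.35, 0.98, 208)   # simulated per-subject R2 values
    version = np.repeat([0, 1], 104)    # dummy: which of the two orderings

    fit = sm.OLS(r2, sm.add_constant(version.astype(float))).fit()
    print(fit.pvalues[1])               # large P value -> no order effect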
An alternative speculation, which competes with our initial speculation, is that this gender–GPA
relationship is due to a self-selection of males into majors that are traditionally stingier
with grades, such as engineering. GPA differences due to majors for freshmen are
minimal since most freshmen are taking similar general education requirements.
Our final interesting finding is drawn from Table 1. The data show that the freshman
group has a significantly higher regard for professors and student-generated teaching
evaluation systems than the senior group, with t-test P values of 0.01 and 0.00,
respectively (P values not shown in Table 1). This result is the opposite of our a priori
belief. We expected that seniors would be engaged in courses required by their major,
which presumably would be more consistent with their educational interests. Typically,
these senior level courses would have a smaller class size and would be staffed by
professors rather than graduate assistants. All of these factors are correlated with higher
evaluations of the professor. So, if these correlations are real, rather than spurious, it
is likely that an important change, which is probably worth investigating, has taken
place.
Concluding Remarks
Limitations and Future Research
Some limitations of this study need to be discussed. First, the selection of subjects was
not random. Students became subjects by virtue of being present the day their class was
surveyed. The selection of classes was arbitrary. Consequently, caution should be used
in generalising the results to other groups and settings. Second, an experimental task was
used in this study and the subjects’ responses were gathered in a controlled environment
rather than in a real world setting, although sitting in a classroom completing a teaching
evaluation and sitting in a classroom completing an instrument about teaching evalua-
tions are similar activities. Third, students were not given the opportunity for input on
the outcomes that motivate them to participate in a teaching evaluation. In the
instrument, four possible outcomes are given to the students. It is possible that other
possible outcomes of teaching evaluations may have a stronger impact on students’
motivation than the four outcomes used in this study. Future research can solicit input
from college students on what specifically they see or would like to see as the outcomes
of an evaluation system. Fourth, extreme levels of instrumentality and expectancy (10
and 90%) were used in the cases. This did not allow us to test for the full range within
the extremes. In another sense, such extremes may not exist in actual practice. Fifth, all
subjects came from only one institution, which may limit the applicability of the results
to other academic environments. Extensions can be made by future studies to examine
the effect of academic environments on the results of this study.
Implications
The expectancy model used in this study provides a good overall explanation of a
student’s motivation to participate in the evaluation of teaching effectiveness. The
valence model significantly explains a student’s assessment of the attractiveness of a
teaching evaluation system. Further, the force model provides a good explanation of a
student’s motivation to participate in the teaching evaluation. By the successful appli-
cation of expectancy theory, this study provides a better understanding of the behavioural
intent (motivation) of students’ participation in the teaching evaluation process.
Our empirical results show that students have strong preferences for the uses of
teaching evaluations and these preferences are remarkably consistent across individuals.
Since quality student participation is an essential antecedent of the success of student
evaluations of teaching effectiveness, this knowledge of student motivation must be
considered thoughtfully when the system is implemented. If, however, students are kept
ignorant of the use of teaching evaluations or if teaching evaluations are used for
purposes that students do not value or if they see no visible results from their
participatory efforts, they will cease to give meaningful input.
Suggestions
Towards the goal of motivating students to participate in the teaching evaluation process,
we make the following practical suggestions. First, consider listing prominently the uses
of the teaching evaluation on the evaluation instrument. This will inform the students of
the uses of the evaluation. If these uses are consistent with the uses that students prefer
(and they believe that the evaluations will truly be used for these purposes), the students
will assign a high valence to the evaluation system. The next step is to show students
that their feedback really is used. Accomplishing this will increase their subjective
probabilities of the secondary outcomes that are stated on the evaluation. It would also
increase their subjective probabilities that they will be successful in providing meaning-
ful feedback, since they will see that previous feedback has been used successfully.
Thus, their force or motivation to participate will be high. One way of showing students
that their feedback has been used successfully is to require every instructor to cite on the
course syllabus one recent example of how student evaluations have helped improve this
particular course or have helped the instructor to improve his or her teaching. This seems
like a low cost, but highly visible way to show students the benefits of teaching
evaluations. (It may also have the salutary effect of encouraging faculty to ponder the
information contained in student evaluations and to act upon it.)
This research shows that, for both seniors and freshmen, the most attractive outcome of
an evaluation system is improving the professor's teaching, while improving the course
is the second most attractive outcome. Thus, students who believe that their feedback
on evaluations will improve teaching or the course or both should be highly motivated
to provide such feedback. Through better understanding of students’ needs and be-
havioural intentions, the results of this study can aid in creating evaluation systems that
truly respond to the needs of those who evaluate teaching performance.
Notes on Contributors
YINING CHEN PhD is Associate Professor, School of Accountancy at Ohio University.
Her current teaching and research interests are in accounting information systems,
financial accounting and auditing. Professor Chen earned her doctorate from the
College of Business Administration, University of South Carolina. Before joining
Ohio University, she was Assistant Professor of Accounting at Concordia University
in Canada for 2 years. She has also held instructional positions at the University of
South Carolina. Professor Chen has authored articles in Auditing: A Journal of
Practice & Theory, Journal of Management Information Systems, Issues in Accounting
Education, Review of Quantitative Finance & Accounting, Journal of End User
Computing, Journal of Computer Information Systems and Internal Auditing. Correspondence:
634 Copeland Hall, Ohio University, Athens, OH 45701, USA. Tel: +1
740 593 4841. Fax: +1 740 593 9342. E-mail: cheny@ohiou.edu
NOTES
[1] The two formative uses of teaching evaluations reflect teaching effectiveness issues.
[2] Five and 14 questionnaires were turned in incomplete or blank for the freshman and senior groups,
respectively.
[3] According to Montgomery (1984, p. 325), ‘If the experimenter can reasonably assume that certain
high-order interactions are negligible, then information on main effects and low-order interactions
may be obtained by running only a fraction of the complete factorial experiment'. A one-half
fraction of the 2^4 design can be found in Montgomery (1984, pp. 331–334). Prior expectancy theory
studies (see, for example, Burton et al., 1992; Snead & Harrell, 1995) also used one-half fractional
factorial design.
[4] Though gender is a dichotomous variable with 1 = male and 0 = female, the Pearson correlation
would still provide a basis for directional inferences.
[5] It is reasonable to expect an association between someone’s prior experience with an evaluation
system and his or her motivation to participate in that particular system. However, the participants
were asked to evaluate the 16 proposed cases (evaluation systems), none of which they had
experienced. Therefore, the non-significant correlations indicate that the subjects were able to
evaluate the proposed systems objectively without bias, thus supporting our argument that the
subjects we used were appropriate for this study.
REFERENCES
ABRAMI, P. C. (1989) How should we use student ratings to evaluate teaching?, Research in Higher
Education, 30 (2), pp. 221–227.
ABRAMI, P. C. & MIZENER, D. A. (1983) Does the attitude similarity of college professors and their
students produce “bias” in the course evaluations?, American Educational Research Journal, 20 (1),
pp. 123–136.
AHMADI, M., HELMS, M. M. & RAISZADEH, F. (2001) Business students' perceptions of faculty
evaluations, The International Journal of Educational Management, 15 (1), pp. 12–22.
AJZEN, I. & FISHBEIN, M. (1980) Understanding Attitudes and Predicting Social Behavior (Englewood
Cliffs, NJ, Prentice Hall).
ARUBAYI, E. A. (1987) Improvement of instruction and teacher effectiveness: are student ratings reliable
and valid?, Higher Education, 16 (3), pp. 267–278.
BROWNELL, P. & MCINNES, M. (1986) Budgetary participation, motivation, and managerial performance,
Accounting Review, 61 (4), pp. 587–600.
BURTON, F. G., CHEN, Y., GROVER, V. & STEWART, K. A. (1992) An application of expectancy theory
for assessing user motivation to utilize an expert system, Journal of Management Information Systems,
9 (3), pp. 183–198.
BYRNE, C. J. (1992) Validity studies of teacher rating instruments: design and interpretation, Research
in Education, 48 (November), pp. 42–54.
CALDERON, T. G., GREEN, B. P. & REIDER, B. P. (1994) Extent of use of multiple information sources
in assessing accounting faculty teaching performance, in: American Accounting Association Ohio
Regional Meeting Proceeding (Columbus, OH, American Accounting Association).
CALDERON, T. G., GABBIN, A. L. & GREEN, B. P. (1996) A Framework for Encouraging Effective
Teaching (Harrisonburg, VA, American Accounting Association Center for Research in Account-
ing Education, James Madison University).
CASHIN, W. E. (1983) Concerns about using student ratings in community colleges, in: A. SMITH (Ed.)
Evaluating Faculty and Staff: new directions for community colleges (San Francisco, CA, Jossey-
Bass).
CASHIN, W. E. & DOWNEY, R. G. (1992) Using global student rating items for summative evaluation,
Journal of Educational Psychology, 84 (4), pp. 563–572.
CENTRA, J. A. (1993) Reflective Faculty Evaluation (San Francisco, CA, Jossey-Bass).
CENTRA, J. A. (1994) The use of the teaching portfolio and student evaluations for summative evaluation,
Journal of Higher Education, 65 (5), pp. 555–570.
COHEN, P. A. (1981) Student ratings of instruction and student achievement: a meta-analysis of
multisection validity studies, Review of Educational Research, 51 (3), pp. 281–309.
DESANCTIS, G. (1983) Expectancy theory as an explanation of voluntary use of a decision support
system, Psychological Reports, 52 (1), pp. 247–260.
DIVOKY, J. J. & ROTHERMEL, M. A. (1989) Improving teaching using systematic differences in student
course ratings, Journal of Education for Business, 65 (2), pp. 116–119.
DOUGLAS, P. D. & CARROLL, S. R. (1987) Faculty evaluations: are college students influenced by
differential purposes?, College Student Journal, 21 (4), pp. 360–365.
DRISCOLL, L. A. & GOODWIN, W. L. (1979) The effects of varying information about use and disposition
of results on university students’ evaluations of faculty courses, American Educational Research
Journal, 16 (1), pp. 25–37.
FELDMAN, K. A. (1977) Consistency and variability among college students in their ratings among
courses: a review and analysis, Research in Higher Education, 6 (3), pp. 223–274.
FERRIS, K. R. (1977) A test of the expectancy theory as motivation in an accounting environment, The
Accounting Review, 52 (3), pp. 605–614.
GEIGER, M. A. & COOPER, E. A. (1996) Using expectancy theory to assess student motivation, Issues in
Accounting Education, 11 (1), pp. 113–129.
GREEN, B. P., CALDERON, T. G. & REIDER, B. P. (1998) A content analysis of teaching evaluation
instruments used in accounting departments, Issues in Accounting Education, 13 (1), pp. 15–30.
HANCOCK, D. R. (1995) What teachers may do to influence student motivation: an application of
expectancy theory, The Journal of General Education, 44 (3), pp. 171–179.
HARRELL, A. M., CALDWELL, C. & DOTY, E. (1985) Within-person expectancy theory predictions of
accounting students’ motivation to achieve academic success, Accounting Review, 60 (4), pp. 724–735.
HOBSON, S. M. & TALBOT, D. M. (2001) Understanding student evaluations, College Teaching, 49 (1),
pp. 26–31.
HOFMAN, F. E. & KREMER, L. (1980) Attitudes toward higher education and course evaluation, Journal
of Educational Psychology, 72 (5), pp. 610–617.
HOWARD, G. S., CONWAY, C. G. & MAXWELL, S. E. (1985) Construct validity of measures of college
teaching effectiveness, Journal of Educational Psychology, 77 (2), pp. 187–196.
KEMP, B. W. & KUMAR, G. S. (1990) Student evaluations: are we using them correctly?, Journal of
Education for Business, 66 (2), pp. 106–111.
KWAN, K. P. (1999) How fair are student ratings in assessing the teaching performance of university
teachers?, Assessment & Evaluation in Higher Education, 24 (2), pp. 181–195.
LIN, Y. G., MCKEACHIE, W. J. & TUCKER, D. G. (1984) The use of student ratings in promotion decisions,
Journal of Higher Education, 55 (5), pp. 583–589.
MARLIN, J. E., JR & GAYNOR, P. (1989) Do anticipated grades affect student evaluations? A discriminant
analysis approach, College Student Journal, 23 (2), pp. 184–192.
Appendix
Instructions
At the end of each quarter you are asked to evaluate the courses you have taken and the professors who
have conducted those courses. These evaluations may be used in various ways, such as: improving
teaching; rewarding (or punishing) the professors’ performance with tenure, promotion, or salary
increases; improving course content; and providing information to future students who are contemplating
taking this course or this professor.
This exercise presents 16 situations. Each situation is different with respect to how the evaluation is
likely to be used. We want to know how attractive participation in the evaluation is to you in each given
situation.
You are asked to make two decisions. You must first decide how attractive it would be for you to
participate in the evaluation (DECISION A). Next you must decide how much effort to exert in
completing the evaluation (DECISION B). Use the information provided in each situation to reach your
decisions. There are no ‘right’ or ‘wrong’ responses, so express your opinions freely.
Situations 2 to 16 vary in their combinations of the second level outcomes and expectancy levels, e.g. situation
2, low/low/low/low/low; situation 3, low/high/low/high/high; situation 4, low/low/low/low/high; situation
5, low/low/high/high/low; situation 6, high/low/low/high/low; situation 7, high/high/high/high/high;
situation 8, high/low/high/low/high; situation 9, high/low/high/low/low; situation 10, low/high/high/low/high;
situation 11, low/high/high/low/low; situation 12, high/high/high/high/low; situation 13, high/high/low/low/high;
situation 14, low/high/low/high/low; situation 15, high/low/low/high/high; situation 16, low/low/high/high/high.