Michel Balinski, Rida Laraki - Majority Judgment
Majority Judgment
Measuring, Ranking, and Electing
Contents
Preface
1 Majority Judgment
2 Voting in Practice
7 Judging in Practice
7.1 Students
7.2 Employees
7.3 Musicians
7.4 Skaters and Gymnasts
7.5 Divers
7.6 Countries
7.7 Wines
7.8 The Paris Wine Tasting of 1976
7.9 Conclusion
8 Common Language
9 New Model
9.1 Inputs
9.2 Social Grading Functions
9.3 Social Ranking Functions
9.4 The Role of Judges' Utilities
10 Strategy in Grading
12 Majority-Grade
13 Majority-Ranking
13.1 Strategy-Proofness in Ranking
13.2 Majority-Value
13.3 Characterization
13.4 Juries of Different Sizes
14 Large Electorates
14.1 Majority-Gauge
14.2 Abbreviated Majority-Value
14.3 Other Rules
17 Point-Summing Methods
18 Approval Voting
22 A Summing Up
References
Name Index
Subject Index
Preface
There was a risk in theorizing. I had witnessed, close up, the fatal, comic effect upon
professors and students of hypotheses which had become unconscious convictions. And
thus warned, I had thrown overboard, as a reporter facing facts, many of my college-bred
notions . . . It was hard to do; ideas harden like arteries; indeed, one theory of mine is that
convictions are identical with hardened arteries. But the facts . . . forced me to drop my
academic theories one by one; and my reward was the discovery that it was as pleasant
to change one's mind as it was to change one's clothes. The practice led one to other,
more fascinating theories.
Lincoln Steffens
Kenneth Arrow begins his classic, Social Choice and Individual Values, with the
sentence, "In a capitalist democracy there are essentially two methods by which
social choices can be made: voting, typically used to make political decisions,
and the market mechanism, typically used to make economic decisions." In
the second paragraph he poses the problem of social choice. "The methods
of voting and the market . . . are methods of amalgamating the tastes of many
individuals in the making of social choices . . . [Any] individual can be rational
in his choices. Can such consistency be attributed to collective modes of choice,
where the wills of many people are involved?" (1951, 1-2). His celebrated
impossibility theorem answers "no!" for voting.
Arrow's conclusion begs refutation.
But Arrow carefully explained the limitations of his analysis. First, he explicitly assumed that "the behavior of an individual in making choices is describable
by means of a preference scale [an ordered list]" (11). Second, he deliberately ignored the strategic aspects of voting, in his words, "the game aspects
of social choice" (7). Third, he clearly stated his acceptance of the standard
though unreasonable view that "individual values . . . are not capable of being
altered by the nature of the process itself" (8). This assumes that a voter's or
a judge's expression (his or her vote, his or her evaluation) is not altered by
the actions of other voters or judges, nor by the mechanism by which all the
expressions are amalgamated. In so doing, Arrow anticipated future developments, showing where to seek an escape from the logical conclusions of his
analysis: a brand new model of social choice not bound by these restrictive and
unrealistic assumptions. Practice, it turns out, suggests how the model should
be formulated.
A mechanism is "an instrument or process, physical or mental, by which
something is done or comes into being."1 Society routinely uses mechanisms
left unmentioned by Arrow to collectively designate who or what is best, second best, and so on, down to worst: it measures in one way or another, then ranks
in accordance with the measures and declares the winner to be the one with the
highest score. Students are graded, then ranked according to their grades; the
attributes of wines (e.g., tannicity, finish, bouquet, body) are assigned numerical
values, then ranked according to the values; figure skaters, gymnasts, and divers
are given marks for a particular exercise, then ranked according to their marks.
In each of these instances there may be several or many judges who assign scores
that may be quite different, yet each uses a language that is common to all and
is understood by all. "Kenneth is an A+ student" is a meaningful statement (or
was before the age of grade inflation). The statement that Sonja's free skating
performance is worth 5.9 when the traditional 0-6 scale was used, or that her
skating skills component is 7.75 with the newly adopted scale, or that Xu
Sang's inward flying 1½ somersault was a 9.0, means something specific to
figure skating or diving enthusiasts, though not all may agree with the score
that was assigned.
The market mechanism (perhaps better thought of as an "invisible hand,"
since its process is but loosely understood) itself provides the world with a
measure: price expressed in terms of money. Money is, of course, a complex
concept that plays many roles in the dynamic workings of an economy, but
its units are mere convention, a numéraire expressed in euros, shekels, dinars,
or dollars, invented to simplify or solve problems, just as are letter grades or
numerical levels of performance. In past times the units differed: grain in ancient
Babylon, rice in Japan, cigarettes in concentration camps, though gold and
silver (more durable commodities) have enjoyed a longer-lasting use. Price
measures, and as a consequence ranks. An instance is the famous classification
of 1855, said to be of the Bordeaux wines, though in fact limited to the Médocs
and Sauternes and exactly one Graves, Haut-Brion. The Exposition universelle
de Paris of that year prompted an official request for a "complete and satisfactory
description of the wines of the department" (Debuigne 1970). This ranking
of the grands crus has steadfastly maintained its importance to this day; it
was determined by the prices of the wines prevalent in those years (Markham
1997). Auctions (the traditional English ascending-price auction, the sealed
second-price auction of Vickrey, the Dutch descending-price auction) are other
well-defined mechanisms that also use price as a measure to determine winners.
1. American Heritage Dictionary. 2d college ed. Boston: Houghton Mifflin, 1982.
Measurement as a means to choose and to rank has accumulated a rich experience and taught lessons well worth learning. Ranking figure skating competitions is a good example: it is an ongoing, dynamic process that reveals many
of the defects known in methods of elections. The traditional system often saw
the lead of one skater over another reverse as a result of a third's performance,
to the obvious consternation of the spectators. This is, of course, nothing but
a violation of Arrow's independence of irrelevant alternatives: no system of
ranking should allow the relative positions of two competitors to be influenced
by the (irrelevant) performance of a third. Moreover, cheating (when a judge
exaggerates the rank of competitors up or down) has provoked major scandals at the Olympics. This is nothing but strategic manipulation, a problem
well known in voting theory and central to the theory of games. What did the
International Skating Union do? It invented a new method that appears to satisfy Arrow's condition and claims to combat the possibility of cheating. It is
a method that measures. It is a mechanism that has flaws (as does the market
mechanism when markets are rigged, and as do auction mechanisms, which are often
plagued by the "winner's curse"2), but these flaws may be overcome.
Up to now, the problem of ranking competitions among athletes, goods, or
musicians has remained completely separate from the problem of elections.
No organized body of knowledge has accompanied ranking competitors. The
methods are, by and large, home-grown, invented by skating enthusiasts for
skaters, by œnologists for classifying wines, by piano maestros for awarding
prizes to pianists. Nevertheless, the methods show good sense: increasingly,
they use measures.
In contrast, the specialists in how to elect and rank have almost exclusively
devoted their attention to elections, either in small committees such as juries
or in large electorates such as nations, where a voter submits either a vote
for one or several candidates or an ordered list of his preferences among all
candidates. The focus of the work has remained resolutely the same for almost
a millennium. It is the model analyzed by Arrow: how to transform the so-called preference lists of individuals or judges into a preference list of society
or the jury. Despite Arrow's devastating result showing that the traditional theory
of social choice has no truly acceptable solution to the problem of how to elect
and how to rank, that model (and the manner in which voters may express
themselves) has remained unquestioned.
2. The winner's curse occurs when the actual value of the good obtained is well below what was
paid for it. This happens typically because the winning bidder made the highest estimate of a good
of unknown value, as for example when bidding for the right to exploit an offshore oil field.
We show that this traditional model is fundamentally flawed, for reasons that
go well beyond the classical paradoxes. First, it assumes that the preferences
of voters and judges are expressed as rank-orderings. This is clearly not true: a
voter confronted with (say) twelve candidates knows from personal experience
that he does not formulate a list of all candidates from first to last. We present
evidence that proves that this conception of what judges or voters have in their
minds is simply false. Second, the traditional model does not make a clear
distinction between the judges' or voters' preferences and their votes, which
are the messages they are allowed to send by the method of voting that is
used. A voter who announces a rank-ordering (or who votes for one candidate
among many) is not and cannot be expressing all of his preferences; he is
only sending a very limited and strategically chosen message. Third, there
is a profound difference between the problem of electing one candidate and
the problem of ranking several or many candidates, though this has not been
widely appreciated. H. Peyton Young (1986) is the first to have made a clear
distinction between them, showing how the same line of reasoning can yield a
ranking of the candidates and a winner who is not the first-place candidate of
the ranking. This, it turns out, is an irreconcilable conflict within the traditional
model; a new impossibility theorem shows how and why the two problems are
incompatible.
In summary, insofar as it concerns ranking and electing, social choice theory
hypothesizes a faulty model of reality to produce an inconsistent theory. So
why on earth use it?
The analogy with wine, sports, and music opens the door to another view.
Lord Kelvin's celebrated warning may be seen on the façade of the Social
Sciences Research Building at the University of Chicago: "If you cannot measure, your knowledge is meager and unsatisfactory" (see Kuhn 1961, 178). The
economist Frank Knight is reported to have quipped that for social scientists
this means, "If you cannot measure, measure anyhow" (Kuhn 1961, 183). His
remark makes more sense than most people at first may suppose.
We have studied what is done in practice. We have learned from the insights
of œnologists, sportsmen, pianists, and others that the fundamental question
should be posed differently. Instead of trying to translate many individual rankings of competitors into a single collective ranking, or many individual lists of
preferences into a single collective list of preferences, a common language to
measure should be defined, individuals should measure and assign grades to
each competitor in that language, and the many individual grades should then
determine the single collective grade of each competitor. In short, the central
problem becomes how to transform many individual grades of a common language into a single collective grade, when the many individuals have unknown
preferences that are too complex to be formulated. Sharing a common language
of grades makes no assumptions about a voter's or a judge's utilities or preferences. Utilities measure the satisfaction of a voter or a judge; grades measure the
merits of competitors. The basic atoms of Arrow's model are the comparisons
between pairs of alternatives, competitors, or candidates. The basic atoms of
our model are the grades of a common language assigned to alternatives, competitors, or candidates. Grades yield rankings, but rankings assuredly do not
yield grades.
The celebrated market mechanism works because it uses a common measure
that facilitates comparisons of goods, services, assets, debts: monetary units.
The evaluations in terms of a common language of grades are no more the
utilities of judges or voters making a collective decision than the evaluations of
items bought and sold in terms of money are the utilities of agents in a market.
The common language of the market is money. The money of collective decision is (in our model) the common language of grades.
The change in point of view (in the premises of the underlying model
of social choice) changes everything. The method and theory that emerge
are simple. They must be, to be practical. We have called the method majority judgment. It
may be used to elect officials, to classify wines, and to rank contestants for
international piano prizes and Olympic competitors in skating, diving, and
gymnastics. It has been tested in classifying wines and electing a candidate
to political office. A simple theory characterizes the methods that satisfy all
the good properties that were stated in Arrow's axioms. Beyond overcoming
Arrow's impossibility, the model makes it possible to address another important
question: What mechanisms are the most robust against cheating and strategic manipulation by the judges or the voters? Or, what mechanisms make it
optimal for judges and voters to give the grades they believe the
competitors and candidates merit? Understanding and discovering the psychology or the possible secret cabals of judges is one perfectly reasonable approach
to combating strategic manipulation. Another is to design methods that make
it impossible for cheating to take place or, if that ideal is unattainable, that
minimize the possibility of manipulation. When grades replace orders, possibility theorems and strategy-proof or least manipulative methods replace
impossibility theorems. The majority judgment method uniquely best combats strategic manipulation while satisfying the desirable properties of classical
social choice theory. In fact, the approach suggests a new mechanism in the
critiques. We are deeply indebted to Jack Nagel, Hannu Nurmi, Maurice Salles,
and John Weymark, each of whom analyzed the entire manuscript with great
care and made very substantial recommendations. It is a particular pleasure to
acknowledge the contributions of two exceptionally gifted graduate students in
mathematics, Andrew Jennings and Cheng Wan. Andy Jennings read the entire
manuscript, checking the proofs, the statements, the reasoning, and the prose,
thereby eliminating errors, ambiguities, and infelicities. Cheng Wan carried
out all the extensive computer analyses of the 2007 Orsay experiment. Claude
Henry and Vincent Renard gave us their unstinting support and encouragement
throughout. Anna Kehres-Diaz accepted and espoused the importance of the
ideas for real, practical use, and backed the application for a patent and the
development of professional software that permitted several experiments to be
carried out on the Web. Last, yet also first, we thank Jacques Blouin. Dissatisfied with the existing methods for evaluating wines, he asked us whether we
could find a better way; that is how this work began.
Institutions The C.N.R.S. (Centre National de la Recherche Scientifique), where
both of us have held permanent positions, is wonderful for those who wish
to pursue long-term projects: they are free to go ahead and do it! The École
Polytechnique, more particularly, the Laboratoire d'Économétrie, provided the
ideal interdisciplinary environment in which to pursue this project that is
at once mathematics, economics, political science, operations research, and
statistics, with occasional dashes of linguistics, psychology, sociology, and philosophy. The D.R.I.P. (Direction des relations industrielles et des partenariats)
of the École Polytechnique has steadfastly supported every endeavor to realize
practical applications of our ideas.
The 2007 Orsay experiment could not have been realized without the generous support of Orsay's Mayor, Madame Marie-Hélène Aubry, the staff of
the Mayor's office, and our friends and colleagues who sacrificed their Sunday (a beautiful spring day) to urging voters to participate and explaining the
idea: Pierre Brochot, Stéphanie Brochot Laraki, David Chavalarias, Sophie
Chemarin, Clémence Christin, Maximilien Laye, Jean-Philippe Nicolai, Matias
Núñez, Vianney Perchet, Jérôme Renault, Claudia Saavedra, Gilles Stoltz,
Tristan Tomala, Marie-Anne Valfort, and Guillaume Vigeral. Thanks to them,
the experiment was successful and its expense limited to the costs of ballots,
envelopes, and posters.
Majority Judgment
For with what judgement ye judge, ye shall be judged: and with what measure ye mete,
it shall be measured to you again.
Matthew 7:2
1.1 Inputs and Outputs
Throughout the world, voters elect candidates, and judges rank competitors,
goods, alternatives, cities, restaurants, universities, employees, and students.
How? Schemes, devices, or mechanisms are invented to reach decisions. Each
defines
• the specific form of the voters' and judges' inputs, the messages used to exert their wills, and
• the procedure by which the inputs or messages are amalgamated or transformed into a final decision, social choice, or output.
rules dictate the way the scores given by the twelve judges to each of the many
parts of a competitor's performance become the number grades that are their
input messages. The output is a rank-ordering determined by, first, eliminating
the grades of three judges chosen at random; second, eliminating the highest
and lowest of the grades that are left; third, ranking the competitors according
to the averages of the seven remaining grades. This complicated procedure is
intended to combat judges who manipulate their inputs to favor or disfavor
one or another competitor (the piano, wine, and figure skating mechanisms are
described in detail in chapter 7).
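The three elimination steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `trimmed_score` is hypothetical, each competitor's input is taken to be a list of twelve number grades already produced by the detailed scoring rules, and the grades below are invented. Because three judges are discarded at random, the same grades can yield slightly different scores on different draws.

```python
import random
from statistics import mean


def trimmed_score(grades, rng=random):
    """Score one competitor from twelve judges' grades (hypothetical helper).

    Step 1: ignore the grades of three judges chosen at random.
    Step 2: ignore the highest and the lowest of the nine grades left.
    Step 3: average the seven grades that remain.
    """
    if len(grades) != 12:
        raise ValueError("this sketch assumes exactly twelve judges")
    kept = sorted(rng.sample(list(grades), 9))  # step 1, sorted for step 2
    return mean(kept[1:-1])                     # steps 2 and 3


# Competitors are then ranked by decreasing score (invented grades).
rng = random.Random(2007)
scores = {
    "P1": trimmed_score([8, 9, 9, 7, 8, 10, 9, 8, 9, 9, 8, 7], rng),
    "P2": trimmed_score([9, 9, 10, 9, 8, 10, 9, 9, 10, 9, 9, 8], rng),
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these particular invented grades, P2 comes out ahead of P1 whichever three judges are drawn, but with closer grades the random draw alone could reverse the order.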
In Australian elections, a voter's input is a complete rank-ordering of the candidates, and the output is a winner. But in most countries a voter's input message
is at most one vote for one candidate, and the output is a winner, the candidate
with the most votes; or the output is a ranking determined by the candidates'
respective total votes. Approval voting is a relatively new mechanism used
by several professional scientific societies: a voter's input message is one vote
or none for every candidate, and the output is a winner or a ranking determined
by the candidates' respective total votes. These electoral schemes (each a pure
invention to elicit the opinions of voters) offer an extremely limited possibility
for voters to express themselves (various voting mechanisms used in practice
are described in detail in chapter 2; other traditional methods in chapters 3
and 4; approval voting is discussed and analyzed in chapter 18; point-summing
methods in chapter 17).
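Plurality and approval voting differ only in what a ballot may contain; both then rank candidates by total votes. A minimal sketch, assuming a plurality ballot is a candidate name or `None` (a blank ballot) and an approval ballot is a set of approved names; the function names are illustrative, and ties are simply left in the order first encountered rather than broken by any official rule.

```python
from collections import Counter


def plurality_tally(ballots):
    """Each ballot names at most one candidate; None stands for a blank ballot."""
    tally = Counter(b for b in ballots if b is not None)
    return tally.most_common()  # [(candidate, votes), ...], most votes first


def approval_tally(ballots, candidates):
    """Each ballot is the set of candidates the voter approves (one vote or none each)."""
    tally = Counter({c: 0 for c in candidates})
    for approved in ballots:
        tally.update(set(approved) & set(candidates))  # ignore unknown names
    return tally.most_common()
```

In both cases the voter's message is tiny compared with everything he might think about the candidates, which is the point made above.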
In fact, every mechanism generates information, notably the candidates' total
scores, that in many situations may be viewed as constituting a part of the
genuine outputs (see chapter 20).
1.2 Messages of a Common Language
them with respect to a set of benchmarks that constitute a shared scale of evaluation. By way of contrast, ranking competitors is only relative; it bars any
scale of evaluation and ignores any sense of shared benchmarks. The common
languages used by judges in wine, figure skating, diving, and other competitions
are described in chapter 7; their connections with measurement in general are
discussed in chapter 8.
Judges and voters have complex aims, ends, purposes, and wishes: their
preferences or utilities. A judge's or a voter's preferences may depend on many
factors, including his beliefs about what is right and wrong, about the common
language, about the method that transforms input messages into decisions, about
the other judges' or voters' acts and behaviors, in addition to his evaluations of
the competitors or candidates. But the judges' or voters' input messages (the
grades they give) are assuredly not their preferences: a judge may dislike a
wine, a dive, or a part of a skater's performance yet give it a high grade because
of its merits; or a judge may like it yet give it a low grade because of its demerits.
Rules and regulations define how certain performances are to be evaluated, yet
votes can be strategic. The fact that voters or judges share a common language
of grades makes no assumptions about their preferences or utilities. Utilities
are measures of the judges' or voters' satisfaction with the output, the decision
of the jury or the society; grades are measures of the merits of competitors
used as inputs. A judge's or a voter's input message is chosen strategically:
depending on the mechanism for transforming inputs into an output, a judge
may exaggerate the grades he gives, upward or downward, in the hopes of
influencing the final result.
Arrow's theorem plays an important role in this approach as well. It proves in
theory what practitioners intuitively have learned by trial and error: without a
common language there can be no consistent collective decision (see chapter 11,
theorem 11.6a). Its true moral is that judges and voters must express themselves
in a common language.
1.3 Majority-Grade
been accepted in practice (e.g., for wine tasting; see Peynaud and Blouin 2006,
104-107). A host of different arguments prove they are the only social decision
functions that satisfy each of various desirable properties.
Two supplementary concepts explained below are linked to the majority-ranking. The majority-value is a sequence of grades that determines the
majority-ranking. The majority-gauge is a simplified majority-value that is
sufficient to determine the majority-ranking when the number of judges or
the electorate is large.
To begin, the aim is to decide on a final grade, given the individual messages of all the judges. Suppose the common language is a set of ten integers
{0, 3, 5, 6, . . . , 11, 13} (from worst to best), the system of school grades previously used in Denmark (it is amusing that 1, 2, 4, and 12 are missing, the
reasons for which are explained in chapter 8; but this has no influence on
the present discussion). Imagine that the grades given to a competitor by all the
judges are listed in ascending order from worst to best. When the number of
judges is odd, the majority-grade is the grade that is in the middle of the list
(the median, in statistics). For example, if there are nine judges who give a competitor the grades {7, 7, 8, 8, 8, 9, 10, 11, 11}, the competitor's majority-grade
is 8. When the number of judges is even, there is a middle-interval (which can,
of course, be reduced to a single grade if the two middle grades are the same),
and the majority-grade is the lowest grade of the middle-interval (the lower
median when there are two in the middle). For example, if there are eight
judges who give a competitor the grades {7, 7, 8, 8, 11, 11, 11, 13}, the middle-interval goes from 8 to 11 and thus is the set of grades {8, 9, 10, 11}, and the
competitor's majority-grade is 8.
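In code, the majority-grade is simply the lower median: sort the grades and take position (n - 1) // 2, counting from zero. The sketch below, an illustration rather than a reference implementation, also lists the middle-interval on a given scale; the names `majority_grade`, `middle_interval`, and `DANISH_SCALE` are introduced here for illustration. Python's standard `statistics.median_low` returns the same value and serves as a cross-check.

```python
from statistics import median_low

DANISH_SCALE = [0, 3, 5, 6, 7, 8, 9, 10, 11, 13]  # worst to best


def majority_grade(grades):
    """Middle grade for an odd number of judges, lower middle grade for an even number."""
    if not grades:
        raise ValueError("at least one grade is needed")
    ordered = sorted(grades)
    return ordered[(len(ordered) - 1) // 2]


def middle_interval(grades, scale):
    """Grades of the scale lying between the two middle grades (inclusive)."""
    ordered = sorted(grades)
    n = len(ordered)
    low, high = ordered[(n - 1) // 2], ordered[n // 2]
    return [g for g in scale if low <= g <= high]


odd_example = [7, 7, 8, 8, 8, 9, 10, 11, 11]
even_example = [7, 7, 8, 8, 11, 11, 11, 13]
assert majority_grade(odd_example) == median_low(odd_example) == 8
assert majority_grade(even_example) == median_low(even_example) == 8
```

For an odd number of judges the two indices coincide, so the middle-interval shrinks to the single middle grade.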
The majority-grade α of a competitor is the highest grade approved by an
absolute majority of the electors: more than 50% of the electors give the competitor at least a grade of α, but every grade lower than α is rejected by an
absolute majority. Thus the majority-grade of a competitor is the final grade
wished by the majority. In the first example, {7, 7, 8, 8, 8, 9, 10, 11, 11}, only
two (of nine) judges would vote for a lower grade, and only four for a higher
grade. In the second example, {7, 7, 8, 8, 11, 11, 11, 13}, only two (of eight)
judges would vote for a lower grade, and only four for a higher grade.
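The two descriptions (the lower median, and the highest grade that more than half the judges meet or exceed) can be compared mechanically. The following is a sketch for checking, not a proof; it draws random profiles from the Danish scale, and the helper names are hypothetical.

```python
import random

DANISH_SCALE = [0, 3, 5, 6, 7, 8, 9, 10, 11, 13]


def lower_median(grades):
    ordered = sorted(grades)
    return ordered[(len(ordered) - 1) // 2]


def highest_majority_approved(grades, scale):
    """Highest grade a such that more than half the judges give at least a."""
    n = len(grades)
    approved = [a for a in scale if 2 * sum(g >= a for g in grades) > n]
    return max(approved)


def count_lower_higher(grades, grade):
    """Number of judges who gave less than `grade`, and number who gave more."""
    return sum(g < grade for g in grades), sum(g > grade for g in grades)


# The two definitions agree on random profiles drawn from the Danish scale.
rng = random.Random(1)
for _ in range(2000):
    profile = [rng.choice(DANISH_SCALE) for _ in range(rng.randint(1, 12))]
    assert highest_majority_approved(profile, DANISH_SCALE) == lower_median(profile)
```

The counts reproduce the text: in both examples, two judges gave less than 8 and four gave more.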
The choice of the smallest grade of the middle-interval when the number of
judges is even is the logical consequence of a principle of consensus. Compare two competitors A and B when there is an even number of judges: if
all of A's grades strictly belong to the middle-interval of B's grades, then,
since there is a greater consensus among the judges for A's grade than for B's
grade, A should be ranked at least as high as B. For example, if B's grades
are {7, 7, 8, 8, 11, 11, 11, 13} (the second example) and all of A's grades are
either 9 or 10 (e.g., {9, 9, 9, 9, 9, 10, 10, 10}) and thus strictly belong to B's
middle-interval, then A should rank higher than B.
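The example can be checked directly. A sketch, with two assumptions spelled out in the comments: the number of judges is even, and "strictly belong to the middle-interval" is read as "lies strictly between B's two middle grades". The helper names are illustrative.

```python
def majority_grade(grades):
    ordered = sorted(grades)
    return ordered[(len(ordered) - 1) // 2]


def strictly_inside_middle_interval(a_grades, b_grades):
    """True if every grade of A lies strictly between B's two middle grades.

    Assumes an even number of judges, and reads "strictly belong" as
    "strictly between the two middle grades" (an interpretation).
    """
    ordered = sorted(b_grades)
    n = len(ordered)
    if n % 2:
        raise ValueError("the property concerns an even number of judges")
    low, high = ordered[n // 2 - 1], ordered[n // 2]
    return all(low < g < high for g in a_grades)


B = [7, 7, 8, 8, 11, 11, 11, 13]
A = [9, 9, 9, 9, 9, 10, 10, 10]
```

Whenever every grade of A exceeds B's lower middle grade, A's lower median exceeds it too, so A's majority-grade (9 here) is higher than B's (8) and A is ranked ahead.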
The majority-grade is necessarily a word that belongs to the common language, and it has an absolute meaning. When an absolute majority of the judges
give a competitor a particular grade α, then the competitor's majority-grade
must necessarily be α, for if the number of judges is odd, the middle grade
is necessarily α, and if the number of judges is even, the two middle grades are
necessarily α.
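A quick randomized check of this property, as a sketch rather than a proof: give some grade α to the smallest absolute majority of judges (n // 2 + 1 of them), fill the other ballots at random, and confirm that the majority-grade is α. The function name and the use of the Danish scale are illustrative assumptions.

```python
import random

DANISH_SCALE = [0, 3, 5, 6, 7, 8, 9, 10, 11, 13]


def majority_grade(grades):
    ordered = sorted(grades)
    return ordered[(len(ordered) - 1) // 2]


def check_absolute_majority(trials=1000, seed=0):
    """Give grade alpha to an absolute majority, fill the rest at random,
    and confirm that the majority-grade is alpha every time."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 15)
        alpha = rng.choice(DANISH_SCALE)
        k = n // 2 + 1  # smallest absolute majority of n judges
        grades = [alpha] * k + [rng.choice(DANISH_SCALE) for _ in range(n - k)]
        rng.shuffle(grades)
        if majority_grade(grades) != alpha:
            return False
    return True
```

The check always succeeds because fewer than half the grades can lie on either side of α, so α must occupy the middle position.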
It is reasonable to suppose that if a judge wishes that a competitor be accorded
a certain grade (say, a 9 in the Denmark school scale), then the more the competitor's final grade deviates from 9, the greater will be the judge's discontent.
When this is true, the best strategy for a judge is always to assign the grade
that she believes the competitor merits, neither more nor less. For suppose that
a judge believes that a candidate merits a grade of 9. If the majority-grade was
higher, say, 11, she might be tempted, in anticipation of such an outcome, to
assign a lower grade than 9. But doing so would change nothing because the
majority-grade 11 would resolutely remain in the middle whatever lower grade
she chose to give. If, on the other hand, the majority-grade was lower, say, 7,
she might anticipate the outcome and be tempted to assign a higher grade than
9. Again this would change nothing because the majority-grade 7 would stay
in the middle. The only other possibility is that the judge assigns a grade of 9
and the majority-grade is 9; in this case the judge is completely content. Thus
in any case honesty is the best policy.
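The argument can be tested by brute force. Assuming, as the text does, that a judge's discontent grows with the distance between the final grade and the grade she believes deserved (measured here, as an added assumption, by the absolute difference on the Danish scale), the sketch below tries every grade she could report against randomly drawn grades of the other judges and confirms that her honest grade is always among her best replies. Helper names are hypothetical.

```python
import random

DANISH_SCALE = [0, 3, 5, 6, 7, 8, 9, 10, 11, 13]


def majority_grade(grades):
    ordered = sorted(grades)
    return ordered[(len(ordered) - 1) // 2]


def best_reports(others, ideal, scale):
    """Grades a judge could report that bring the majority-grade closest to `ideal`.

    Discontent is assumed to be |final majority-grade - ideal|.
    """
    def discontent(report):
        return abs(majority_grade(others + [report]) - ideal)

    least = min(discontent(r) for r in scale)
    return [r for r in scale if discontent(r) == least]


def honesty_is_best(trials=2000, seed=3):
    """Random grades of the other judges; the honest grade must always be a best reply."""
    rng = random.Random(seed)
    for _ in range(trials):
        others = [rng.choice(DANISH_SCALE) for _ in range(rng.randint(0, 10))]
        ideal = rng.choice(DANISH_SCALE)
        if ideal not in best_reports(others, ideal, DANISH_SCALE):
            return False
    return True
```

In the text's first scenario (the others' grades keep the majority-grade at 11), reporting 9 or anything lower leaves the result at 11, so exaggerating downward gains her nothing.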
1.4 Majority-Ranking
In some applications, a complete ranking among all competitors or alternatives is not sought. For example, most wine competitions only wish to discern
one of four medals (grand gold, gold, silver, bronze) or none. In many applications, however, notably in sports competitions, a complete rank-ordering is
essential.
When two competitors have different majority-grades, the competitor with
the higher grade is naturally ranked higher. Suppose, however, that two competitors A and B have the same majority-grade. For example, A's grades are
{7, 9, 9, 11, 11} and B's are {8, 9, 9, 10, 11}, so they both have a majority-grade
of 9. How are they to be compared? Their common (first) majority-grade is
dropped (a single one) because it has already yielded all the information it can
give relevant to comparing A with B, and the majority-grades of the grades that
remain to each competitor (their second majority-grades) are found. In this
example, A's remaining grades are {7, 9, 11, 11} and B's are {8, 9, 10, 11}, so
their second majority-grades are both again 9. If one were higher than the other,
it would designate the competitor who is ranked higher. If, as here, the second
majority-grades are the same, they are discarded, and the third majority-grades
of the competitors (the majority-grades of the grades that remain) are found,
and so on, until one competitor is ranked ahead of the other. In this case, A's
third majority-grade is 11 and B's is 10, so A ranks above B. One of the two
must be ranked ahead of the other unless the competitors have identical sets of
grades. This defines the majority-ranking.
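The tie-breaking procedure amounts to listing each competitor's successive majority-grades (remove one copy of the current majority-grade, take the majority-grade of what remains, and so on) and comparing the two lists from the front; the chapter calls this sequence the majority-value. A minimal Python sketch, assuming both competitors are graded by the same number of judges; Python compares equal-length lists element by element, so the first difference decides. The function names are illustrative.

```python
def majority_value(grades):
    """Successive majority-grades: take the lower median, remove one copy, repeat."""
    remaining = sorted(grades)
    value = []
    while remaining:
        value.append(remaining.pop((len(remaining) - 1) // 2))
    return value


def ranks_ahead(a, b):
    """True if grades `a` rank ahead of grades `b` in the majority-ranking."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes both competitors have the same judges")
    return majority_value(a) > majority_value(b)  # first difference decides


A = [7, 9, 9, 11, 11]
B = [8, 9, 9, 10, 11]
```

Here A's sequence is 9, 9, 11, 7, 11 and B's is 9, 9, 10, 8, 11, so the third entry settles the comparison, exactly as in the text.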
1.5 Majority-Value
Table 1.1a
Hypothetical Example: Four Competitors, Nine Judges
Judge   1   2   3   4   5   6   7   8   9
A      13  10  09  10  04  13  11  10  10
B      09  09  08  11  10  08  07  11  11
C      07  13  11  09  09  00  10  09  11
D      08  05  13  08  09  02  13  08  07

Table 1.1b
Hypothetical Example: Grades Ordered from Best to Worst
A      13  13  11  10  10  10  10  09  04
B      11  11  11  10  09  09  08  08  07
C      13  11  11  10  09  09  09  07  00
D      13  13  09  08  08  08  07  05  02
Table 1.2
Hypothetical Example: Three Wines, Five Judges
St. Amour   Very Good   Very Good   Good   Good       Passable
Bourgueil   Excellent   Very Good   Good   Good       Mediocre
Cahors      Excellent   Excellent   Good   Passable   Mediocre

St. Amour:   Good  Good  Very Good  Passable  . . .
   ≻
Bourgueil:   Good  Good  Very Good  Mediocre  . . .
   ≻
Cahors:      Good  Passable  . . .
The first mention where two majority-values differ determines the order:
thus the St. Amour and the Bourgueil have the same mentions in the first three
positions, and the fourth determines which ranks ahead of the other.
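The wine comparison can be reproduced by mapping each mention to its position on the scale and comparing majority-values. The sketch assumes the scale consists only of the five mentions appearing in Table 1.2, ordered from Mediocre up to Excellent; a complete scale used in practice may contain more mentions. Names are illustrative.

```python
SCALE = ["Mediocre", "Passable", "Good", "Very Good", "Excellent"]  # worst to best (assumed)
RANK = {mention: i for i, mention in enumerate(SCALE)}

WINES = {
    "St. Amour": ["Very Good", "Very Good", "Good", "Good", "Passable"],
    "Bourgueil": ["Excellent", "Very Good", "Good", "Good", "Mediocre"],
    "Cahors": ["Excellent", "Excellent", "Good", "Passable", "Mediocre"],
}


def majority_value(mentions):
    """Successive majority-grades (lower medians) of a list of mentions."""
    remaining = sorted(mentions, key=RANK.__getitem__)
    value = []
    while remaining:
        value.append(remaining.pop((len(remaining) - 1) // 2))
    return value


# Rank by comparing majority-values, converted to positions on the scale.
ranking = sorted(WINES, key=lambda w: [RANK[m] for m in majority_value(WINES[w])], reverse=True)
for wine in ranking:
    print(wine, majority_value(WINES[wine]))
```

Running it lists St. Amour, Bourgueil, and Cahors in that order, with majority-values beginning Good Good Very Good Passable, Good Good Very Good Mediocre, and Good Passable, as shown above.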
Two important points may be made immediately. First, if one or several
competitors withdraw, the majority-ranking among the remaining competitors
necessarily agrees with the majority-ranking among all competitors. So the
majority-ranking satisfies Arrow's independence of irrelevant alternatives (IIA)
condition: the relative positions of two competitors in the majority-ranking do
not depend on the merits of another competitor. This is decidedly not the case
with most voting mechanisms used in practice, as the United States learned in
the presidential election of 2000 (see chapter 2), nor is it the case with the
methods traditionally used to rank figure skaters (see chapter 7).
Second, what is the aim of a competition (or of an election)? It is to reach
a consensual decision. The jury (or the society) seeks to find agreement. The
majority-ranking makes the effect of middle grades more decisive. A few extreme or cranky evaluations should have a less decisive effect, though of
course, every grade counts. If after the k best and the k worst grades of two
competitors are dropped, the grades of a woman rank her ahead of a man by the
majority-ranking, then she is ranked ahead of him by the majority-ranking of
the jury (or the society). To see this in the first example of this section, where
there are nine judges and four competitors, drop the two best and the two worst
grades of competitors B and C. C is ranked ahead of B, C ≻ B, because on
the basis of the remaining five grades the majority-ranking puts C ahead of B
(in this case their grades are all the same except for one, an 8 for B and a
9 for C). It has already been pointed out that the majority-ranking satisfies
another such property, namely, when the number of judges is even and all the
grades of one competitor A strictly belong to the middle-interval of the grades
of another competitor B, then A should be ranked ahead of B. It is proven that
the majority-ranking is the only method that satisfies these two properties.
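Both the trimming property and the full ranking can be checked on the data of the first example (Table 1.1a). In the sketch below, the rows are read as competitors A, B, C, D in the order the table lists them (B and C are the ones the text names), and dropping the k best and k worst grades is done by a hypothetical `trim` helper. This checks the instance in the text; it does not prove the general property.

```python
TABLE_1_1A = {  # judge 1 to judge 9; rows read as competitors A, B, C, D
    "A": [13, 10, 9, 10, 4, 13, 11, 10, 10],
    "B": [9, 9, 8, 11, 10, 8, 7, 11, 11],
    "C": [7, 13, 11, 9, 9, 0, 10, 9, 11],
    "D": [8, 5, 13, 8, 9, 2, 13, 8, 7],
}


def majority_value(grades):
    """Successive majority-grades: take the lower median, remove one copy, repeat."""
    remaining = sorted(grades)
    value = []
    while remaining:
        value.append(remaining.pop((len(remaining) - 1) // 2))
    return value


def trim(grades, k):
    """Drop the k best and the k worst grades (hypothetical helper)."""
    ordered = sorted(grades)
    return ordered[k:len(ordered) - k]


B, C = TABLE_1_1A["B"], TABLE_1_1A["C"]
# After dropping the two best and two worst grades, C is ahead of B ...
assert majority_value(trim(C, 2)) > majority_value(trim(B, 2))
# ... and, as the property requires, C is also ahead of B on all nine grades.
assert majority_value(C) > majority_value(B)

ranking = sorted(TABLE_1_1A, key=lambda c: majority_value(TABLE_1_1A[c]), reverse=True)
print(ranking)  # ['A', 'C', 'B', 'D']
```

The trimmed grades are 8, 9, 9, 10, 11 for B and 9, 9, 9, 10, 11 for C, differing only in one grade, as the text observes.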
1.6 Majority-Gauge
Ballot: Election of the President of the Republic, 2007
To preside over France, having taken all considerations into account, I judge in conscience that this candidate would be:
Excellent (Très Bien), Very Good (Bien), Good (Assez Bien), Acceptable (Passable), Poor (Insuffisant), To Reject (À Rejeter)
Candidates: Olivier Besancenot, Marie-George Buffet, Gérard Schivardi, François Bayrou, José Bové, Dominique Voynet, Philippe de Villiers, Ségolène Royal, Frédéric Nihous, Jean-Marie Le Pen, Arlette Laguiller, Nicolas Sarkozy
Check one grade in the line of each candidate.
No check in the line of a candidate means To Reject him.
Figure 1.1
Ballot, Orsay experiment, 2007 French presidential election.
The meanings of the grades are directly related to the question posed.
The sentences at the bottom of the ballot say, "Check one grade in the line
of each candidate. No check in the line of a candidate means To Reject him."
We believe that every voter must be required to evaluate every candidate. A
voter having no opinion concerning a candidate has not even taken the time
to evaluate him and thus has implicitly rejected him (other possibilities are
discussed in chapters 13 and 14).
Of the 2,383 persons who cast official ballots, 1,752 participated in the experiment (74%). In fact, the rate of participation was slightly higher because in
France a voter is permitted (under certain conditions) to ask someone else to
vote in his place, but no one was allowed to vote twice in the experiment. Nineteen ballots were invalid, usually because more than one grade was assigned
to a candidate, leaving 1,733 valid majority judgment ballots. The results are
given in table 1.3.
Table 1.3
Majority Judgment Results, Three Precincts of Orsay, April 22, 2007

Candidate     Excellent  Very Good  Good    Acceptable  Poor    To Reject  No Grade
Bayrou        13.6%      30.7%      25.1%   14.8%        8.4%    4.5%      2.9%
Royal         16.7%      22.7%      19.1%   16.8%       12.2%   10.8%      1.8%
Sarkozy       19.1%      19.8%      14.3%   11.5%        7.1%   26.5%      1.7%
Voynet         2.9%       9.3%      17.5%   23.7%       26.1%   16.2%      4.3%
Besancenot     4.1%       9.9%      16.3%   16.0%       22.6%   27.9%      3.2%
Buffet         2.5%       7.6%      12.5%   20.6%       26.4%   26.1%      4.3%
Bové           1.5%       6.0%      11.4%   16.0%       25.7%   35.3%      4.2%
Laguiller      2.1%       5.3%      10.2%   16.6%       25.9%   34.8%      5.3%
Nihous         0.3%       1.8%       5.3%   11.0%       26.7%   47.8%      7.2%
de Villiers    2.4%       6.4%       8.7%   11.3%       15.8%   51.2%      4.3%
Schivardi      0.5%       1.0%       3.9%    9.5%       24.9%   54.6%      5.8%
Le Pen         3.0%       4.6%       6.2%    6.5%        5.4%   71.7%      2.7%

Note: When there was no grade, it was counted as a To Reject, as per the instructions on the ballot
(so Bayrou's To Reject was counted as 7.4%, Royal's as 12.6%, . . . , Le Pen's as 74.4%). Such
blank entries were few.
The triple (p, α, q) is the candidate's majority-gauge, where α is the candidate's majority-grade, p the percentage of grades better than α, and q the percentage worse than α. S. Royal's majority-gauge (see table 1.3) is (39.4%, Good, 41.5%), since 39.4% = 16.7% + 22.7% of her grades are better than Good, and 41.5% of them (16.8% + 12.2% + 10.8% + 1.8%, each term rounded) are worse than Good. If the number or percentage p of the grades better than a candidate's majority-grade is higher than the number or percentage q of those worse than the candidate's majority-grade, then the majority-grade is completed by a plus (+); otherwise the majority-grade is completed by a minus (−). Thus S. Royal's majority-grade is Good−. The plus or minus attached to the majority-grade is implied by the majority-gauge, so it is not necessary to show it, but for added clarity it is most often included, so that, for example, Royal's majority-gauge may be written (39.4%, Good−, 41.5%).
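The computation of a majority-gauge from a row of table 1.3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' software: shares are listed from Excellent down to To Reject, No Grade is added to To Reject as the ballot instructs, and the function name `majority_gauge` is hypothetical. The plus or minus follows the rule just stated (plus if p > q, otherwise minus), written with ASCII "+" and "-".

```python
GRADES = ["Excellent", "Very Good", "Good", "Acceptable", "Poor", "To Reject"]


def majority_gauge(shares, no_grade=0.0):
    """Return (p, majority-grade with its sign, q) from percentage shares.

    `shares` lists the six grades from best (Excellent) to worst (To Reject);
    `no_grade` is added to To Reject, as the ballot instructs.
    """
    shares = list(shares)
    shares[-1] += no_grade
    cumulative = 0.0  # share of grades strictly better than the current grade
    for index, share in enumerate(shares):
        # The majority-grade is the first grade, scanning from the best, at which
        # the cumulative share exceeds 50%; with an even number of voters this
        # selects the lower of the two middlemost grades.
        if cumulative + share > 50.0:
            p = round(cumulative, 1)
            q = round(sum(shares[index + 1:]), 1)
            sign = "+" if p > q else "-"
            return p, GRADES[index] + sign, q
        cumulative += share
    raise ValueError("the shares must total more than 50%")


# Royal's row of table 1.3 (No Grade = 1.8%).
print(majority_gauge([16.7, 22.7, 19.1, 16.8, 12.2, 10.8], no_grade=1.8))
```

For Royal this returns p = 39.4 and Good-, but q = 41.6 rather than 41.5: the percentages of table 1.3 are themselves rounded, so recomputing from them can be off by a tenth of a point. Applied literally, the rule also gives a plus to every To Reject majority-grade, since no grade is worse (q = 0).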
Naturally a majority-grade+ is ahead of a majority-grade− in the majority-ranking. Of two majority-grade+s, the one having the higher number or percentage of grades better than the majority-grade is ahead of the other; of two majority-grade−s, the one having the higher number or percentage of grades worse than the majority-grade is behind the other. For example, in table 1.4, S. Royal and N. Sarkozy both have the majority-grade Good−. Royal has 41.5% worse than Good, and Sarkozy has 46.9% worse than Good, so Royal finishes ahead of Sarkozy. O. Besancenot and M.-G. Buffet both have the majority-grade Poor+. Besancenot has 46.3% better than Poor, and Buffet has 43.2% better than Poor, so Besancenot finishes ahead of Buffet. To see a candidate's majority-gauge, imagine a seesaw with all the voters lined up according to the grades they give, from best to worst. Assuming each voter's weight is the same, the grade given by the voter who stands at the fulcrum where the board is in perfect balance is the majority-grade. Now remove all voters who gave the majority-grade and place the fulcrum at the juncture between those who gave better than the majority-grade and those who gave worse. If the board tilts to the better grades, a plus (+) is accorded, and the more it tilts, the better the majority-gauge. If it tilts to the worse grades, a minus (−) is accorded, and the more it tilts, the worse the majority-gauge.
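The comparison rule of the preceding paragraph amounts to a sort key. The Python sketch below assumes each candidate is summarized by a gauge (p, majority-grade, q) as in table 1.4, with q = 0 for the To Reject candidates; the function names are hypothetical. The input is deliberately listed out of order, and sorting with the key recovers the majority-ranking of table 1.4.

```python
GRADES = ["Excellent", "Very Good", "Good", "Acceptable", "Poor", "To Reject"]


def ranking_key(gauge):
    """Sort key for a majority-gauge (p, majority-grade, q); smaller keys rank higher."""
    p, grade, q = gauge
    if p > q:
        # grade+: ahead of every grade- with the same grade; a larger p is better.
        return (GRADES.index(grade), 0, -p)
    # grade-: behind every grade+ with the same grade; a larger q is worse.
    return (GRADES.index(grade), 1, q)


def majority_ranking(gauges):
    """Candidate names from first to last; `gauges` maps name -> (p, grade, q)."""
    return sorted(gauges, key=lambda name: ranking_key(gauges[name]))


# Gauges of table 1.4; q = 0 for the To Reject candidates (no grade is worse).
table_1_4 = {
    "Sarkozy": (38.9, "Good", 46.9),
    "Le Pen": (25.7, "To Reject", 0.0),
    "Bayrou": (44.3, "Good", 30.6),
    "Royal": (39.4, "Good", 41.5),
    "Voynet": (29.7, "Acceptable", 46.6),
    "Besancenot": (46.3, "Poor", 31.2),
    "Buffet": (43.2, "Poor", 30.5),
    "Bové": (34.9, "Poor", 39.4),
    "Laguiller": (34.2, "Poor", 40.0),
    "Nihous": (45.0, "To Reject", 0.0),
    "de Villiers": (44.5, "To Reject", 0.0),
    "Schivardi": (39.7, "To Reject", 0.0),
}
print(majority_ranking(table_1_4))
```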
The majority-gauges and the majority-ranking of the experiment are shown
in table 1.4. It may be seen that the majority-ranking is quite different from the
order of finish given by the official vote in these three voting precincts (which
are not representative of the vote in all of France; see the last two columns of
table 1.4). This is because the majority judgment allows voters to express their
opinions on all the candidates rather than simply singling out one among them.

Table 1.4
The Majority-Gauges (p, α, q) and the Majority-Ranking, Three Precincts of Orsay, April 22, 2007
(p: grades better than the majority-grade; q: grades worse than the majority-grade)

Rank  Candidate     p       Majority-Grade  q       Official Vote,  Official
                                                    3 Precincts     National Vote
1st   Bayrou        44.3%   Good+           30.6%   25.5%           18.6%
2d    Royal         39.4%   Good−           41.5%   29.9%           25.9%
3d    Sarkozy       38.9%   Good−           46.9%   29.0%           31.2%
4th   Voynet        29.7%   Acceptable−     46.6%    1.7%            1.6%
5th   Besancenot    46.3%   Poor+           31.2%    2.5%            4.1%
6th   Buffet        43.2%   Poor+           30.5%    1.4%            1.9%
7th   Bové          34.9%   Poor−           39.4%    0.9%            1.3%
8th   Laguiller     34.2%   Poor−           40.0%    0.8%            1.3%
9th   Nihous        45.0%   To Reject       -        0.3%            1.1%
10th  de Villiers   44.5%   To Reject       -        1.9%            2.2%
11th  Schivardi     39.7%   To Reject       -        0.2%            0.3%
12th  Le Pen        25.7%   To Reject       -        5.9%           10.4%
The reasons for believing in the validity of the experiment are given later,
but several salient observations are made here.
More than one of every three participants gave their highest grade to two or
more candidates.
This proves that voters do not have in their minds rank-orderings of the candidates (and still more evidence supports this claim). A rank-ordering does not allow a voter to express an equal evaluation of candidates, or an intensity of appreciation, or an outright rejection. It also shows that the current system (cast one vote for one among several or many candidates) forced one-third of the voters to opt for a candidate when in fact they saw no difference among two or more of them. These observations are convincing because voters had no incentive to vote strategically, since they were only participating in an experiment. It is precisely such experiments that can elicit the true opinions of voters.
Strategic voting played an important role in the French presidential election
of 2007. Voters had in mind what had happened in 2002, when the vote to the left
was so widely distributed among some eight candidates that instead of a second
round between Jacques Chirac (the incumbent president and major candidate of
the right) and Lionel Jospin (the standing prime minister and major candidate of
the left), Chirac was pitted against Jean-Marie Le Pen, the perennial candidate
of the extreme right (see chapter 2). It seems safe to assert that in the first
round of 2007 a significant number of voters did not vote for the candidate they
preferred. Instead of voting for their favorite (an ecologist, a communist, or a
Trotskyist), many voters of the left opted for Ségolène Royal, the socialist, the
major candidate of the left. And the same phenomenon occurred on the right:
it seems that many voters abandoned the extreme right to vote for the major
candidate of the right, the U.M.P. candidate, Nicolas Sarkozy. In contrast, the
majority judgment encourages a voter to express his convictions (as is proven
via several precise criteria that are defined in later chapters).
To see how the majority judgment resists strategic manipulation in the context of elections, take a candidate, say Ségolène Royal, whose majority-grade
is Good− and whose majority-gauge is
(39.4%, Good−, 41.5%).
Then, 39.4% of her grades are better than Good, 41.5% are worse than Good, so
19.1% are Good. Who are the voters who can change Royal's majority-gauge
by changing the grades they give her, and what are their motivations to change?
Suppose a voter believes a candidate merits a grade of β, and the further
the majority-grade is from β, the less he likes it (a reasonable motivation: the
voter's preferences in grading are then said to be single-peaked). Then, as was
seen, the voter's optimal voting strategy is simply to give the candidate the
grade β: the majority judgment is strategy-proof-in-grading.
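Strategy-proofness-in-grading can be confirmed by brute force on small electorates. The Python sketch below assumes grades coded as integers 0 to 5 (0 = To Reject, 5 = Excellent), a majority-grade defined as the lower middlemost grade, and single-peaked voters whose peak is the grade they honestly give; the function names are hypothetical. It tries every profile and every unilateral change of grade and reports whether any change moves the majority-grade closer to the deviating voter's belief.

```python
from itertools import product


def majority_grade(grades):
    """Lower middlemost grade; grades are integers where higher means better."""
    ordered = sorted(grades)
    return ordered[(len(ordered) - 1) // 2]


def honest_grading_is_optimal(num_voters, num_grades):
    """Exhaustively check every profile and every unilateral deviation.

    Each voter's honest grade is taken to be the grade she believes the
    candidate merits; her preferences are single-peaked at that grade.
    """
    for profile in product(range(num_grades), repeat=num_voters):
        honest_outcome = majority_grade(profile)
        for voter, belief in enumerate(profile):
            for other in range(num_grades):
                deviation = list(profile)
                deviation[voter] = other
                outcome = majority_grade(deviation)
                # A deviation pays only if it moves the majority-grade closer to the belief.
                if abs(outcome - belief) < abs(honest_outcome - belief):
                    return False
    return True


print(honest_grading_is_optimal(5, 6))
```

The check prints True. The reason is monotonicity: lowering one grade can only lower the majority-grade or leave it unchanged, and raising one grade can only raise it, so a voter whose belief lies above the majority-grade can only push it further down, and one below it only further up.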
Similar reasoning shows that the majority-grade mechanism is group
strategy-proof-in-grading. A group of voters who share the same beliefs (e.g.,
they belong to the same political party) has the same optimal strategy, namely, to
give to the candidates the grades it believes they merit. For if the group believed
that Royal merited better than Good, and all raised the grade they gave her,
her majority-gauge would remain the same (p does not change). If all lowered
the grade they gave her, her majority-gauge would decrease (q increases), and
perhaps her majority-grade would be lowered (not their intent). If the group
believed that Royal merited worse than Good, and all lowered the grade they
gave her, her majority-gauge would remain the same (q does not change). If all
raised the grade they gave her, her majority-gauge would increase (p increases),
and perhaps her majority-grade would be raised (not their intent).
These strategy-proof-in-grading properties hold for none of the mechanisms in use today. The strategy of a voter may, however, focus on the
final ranking of the candidates rather than on their final grades. It is impossible
to completely eliminate the possibility of strategic manipulation if a voter is
prepared for a candidates final grade to be either above or below what he thinks
the candidate merits: there is no mechanism that is strategy-proof-in-ranking.
The majority judgment does not escape the Gibbard-Satterthwaite impossibility
theorem (see chapters 5 and 13), but it best resists such manipulation.
One means by which it resists is easy to explain. Take the example of Bayrou with a Good+ and Royal with a Good− (see table 1.4); their respective majority-gauges are
Bayrou: (44.3%, Good+, 30.6%) and Royal: (39.4%, Good−, 41.5%).
How could a voter who graded Royal higher than Bayrou manipulate? By changing the grades assigned to try to lower Bayrou's majority-gauge and to raise Royal's majority-gauge. But the majority judgment is partially strategy-proof-in-ranking: those voters who can lower Bayrou's majority-gauge cannot raise Royal's, and those who can raise Royal's majority-gauge cannot lower Bayrou's. For suppose a voter can lower Bayrou's. Then she must have given Bayrou a Good or better; but having preferred Royal to Bayrou, the voter gave a grade of better than Good to Royal, so she cannot raise Royal's majority-gauge (cannot raise her p). Symmetrically, a voter who can raise Royal's majority-gauge must have given her a Good or worse and thus to Bayrou a worse than Good, so the voter cannot lower Bayrou's majority-gauge (cannot increase his q).
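The same kind of exhaustive check applies to the argument about Bayrou and Royal. In the Python sketch below, candidate A plays Bayrou's role and candidate B plays Royal's. The assumptions are labeled: both candidates share a majority-grade (as Bayrou and Royal share Good), grades are integers with higher meaning better, and the ranking key encodes the comparison rule described above (better majority-grade first, then grade+ before grade−, then larger p or smaller q). All names are hypothetical. For every pair of small profiles with equal majority-grades, it verifies that no voter who grades B strictly above A can both lower A's gauge and raise B's.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def gauge_key(grades):
    """Sort key of a majority-gauge; `grades` is a tuple of integers, higher = better.

    Smaller keys rank higher: a better majority-grade first, then grade+ before
    grade-, then a larger p (for +) or a smaller q (for -).
    """
    ordered = sorted(grades)
    alpha = ordered[(len(ordered) - 1) // 2]  # lower middlemost grade
    p = sum(1 for g in grades if g > alpha)
    q = sum(1 for g in grades if g < alpha)
    return (-alpha, 0, -p) if p > q else (-alpha, 1, q)


def can_move(grades, voter, num_grades, raise_it):
    """Can `voter`, acting alone, raise (or lower) this gauge by changing her grade?"""
    current = gauge_key(grades)
    for other in range(num_grades):
        changed = grades[:voter] + (other,) + grades[voter + 1:]
        new = gauge_key(changed)
        if (new < current) if raise_it else (new > current):
            return True
    return False


def partially_strategy_proof(num_voters, num_grades):
    """No voter grading B above A can both lower A's gauge and raise B's."""
    profiles = list(product(range(num_grades), repeat=num_voters))
    for a in profiles:
        for b in profiles:
            if gauge_key(a)[0] != gauge_key(b)[0]:
                continue  # different majority-grades: outside the argument
            for voter in range(num_voters):
                if (b[voter] > a[voter]
                        and can_move(a, voter, num_grades, raise_it=False)
                        and can_move(b, voter, num_grades, raise_it=True)):
                    return False
    return True


print(partially_strategy_proof(3, 5))
```

The search prints True, matching the argument in the text: a voter who can lower A's gauge graded A at or above the shared majority-grade, so she graded B strictly above it and cannot improve B's gauge.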
Compared with other mechanisms, the majority judgment cuts in half the
possibility of manipulation, however bizarre a voter's motivations or whatever
her utility function. The majority judgment resists manipulation in still other
ways that other methods do not, but seeing how requires information from
the voters' individual ballots that is not shown in the election results of table
1.4. For example, significant numbers of voters cannot contribute at all either
to raising Royal's majority-gauge or to lowering Bayrou's (28% of those who
graded Royal above Bayrou). Moreover, those who can manipulate have no
incentive to exaggerate very much in any case, for it does not pay to do so (a
more detailed analysis is given in chapter 19).
Some critics have averred that a voter should be forced to make up his mind
by expressing a clear-cut preference for one candidate. The first-past-the-post
system has this property (unless the voter abstains or hands in a blank ballot),
as does any mechanism in which the input is a rank-order of the candidates.
Both types of mechanism prevent the voter from expressing any intensity
of preference: the second-ranked candidate is only that, whatever the voter's
evaluation. But why limit any voter's freedom of expression? Shouldn't someone who sees no discernible difference between two or more candidates be
allowed to record this? Shouldn't a voter who believes his second-ranked
candidate is merely acceptable or worse be allowed to say so? The majority
judgment gives voters complete freedom of expression (within the bounds of the
language).
Voters who participated in the Orsay experiment were delighted with the
idea that a candidate could be assigned a final grade. The majority-grade is an
important signal that expresses the electorate's appreciation of a candidate.
Chirac's triumph against Le Pen in 2002 (a majority of over 80% in the second round) would have been very different had the majority judgment been
used: he surely would have won, but with a middling grade (perhaps an
Acceptable or a Good, to Le Pen's To Reject) that would have given a more
sober sense to his reelection. In the election of 2007, Voynet's majority-grade
of Acceptable and her fourth place in the majority-ranking more clearly express
the electorate's concern with environmental issues than her eighth-place finish
in the official national vote. Le Pen's last place in 2007 and solid To Reject evaluation show the electorate's strong rejection of his ideas, whereas the official
vote makes him one of the four major contenders. Even when there is only one
contender, a not infrequent occurrence (in the 2002, 2004, and 2006 U.S. congressional elections, respectively 81, 66, and 59 candidates were elected with
no Democratic or Republican opponent), the majority judgment establishes the
esteem in which the candidate is held.
U.S. presidential primaries leap to mind as an immediately realistic application. The majority judgment would be relatively easy to implement since the
decision to do so may be taken at the state level. It would permit a real expression of the voters' opinions instead of sending a message consisting of one name.
With as many as five to ten candidates, the first-past-the-post system drastically curtails expression of the voters' opinions: a big winner often garners
as little as 25% of the total vote, hardly a mandate to be singled out as the
principal candidate. Indeed, the luck of the draw may determine the winner,
owing to the mere presence of strategic candidates who have no chance of
emerging as serious contenders. Finally, and of real importance, the current system
is divisive for a political party: it opts for one candidate and throws out the
others. With the majority judgment, candidates are not rejected: many, perhaps
all, receive good majority-grades, yet one is singled out because he is first in
the majority-ranking.
A very small-scale experiment was conducted on the Web in late September
and early October 2008. Members of the Institute for Operations Research and the
Management Sciences (INFORMS), a scientific society, were asked:
"Suppose that instead of primary elections in states to designate candidates, then national
elections to choose one among them, the system was one national election in which
all eligible candidates are presented at once. Or, suppose you are in a state holding a
primary where you are asked to evaluate the candidates of all parties (at least one state
primary votes on all candidates at once). A possible slate of candidates for President of
the United States could be [here followed the names of the eight candidates given in the
ballot together with their affiliations.]"
Then they were invited to vote with the ballot shown in figure 1.2.
Figure 1.2: Ballot, INFORMS experiment (grades include Very Good, Good, Acceptable, Poor, To Reject, and No Opinion).
This experiment was certainly not representative of the U.S. electorate (nor
was it meant to be). A large majority of the members of INFORMS are U.S.
citizens, but many members are citizens of other nations. The results, shown
in table 1.5, are nevertheless of interest. In this case the winner stands out as
the only candidate with a Very Good, and the collective opinion of those who
voted is quite clear.
Table 1.5
The Majority-Gauges (p, α, q) and the Majority-Ranking, INFORMS Experiment, Late September to
Early October 2008
(p: grades better than the majority-grade; q: grades worse than the majority-grade)

Rank  Candidate               p       Majority-Grade  q
1st   Barack H. Obama         35.9%   Very Good+      32.0%
2nd   Hillary R. Clinton      45.0%   Good+           33.6%
3rd   Colin L. Powell         32.8%   Good−           41.2%
4th   Michael R. Bloomberg    42.0%   Acceptable+     31.3%
5th   John R. Edwards         36.6%   Acceptable+     32.8%
6th   John S. McCain          33.4%   Acceptable−     44.2%
7th   W. Mitt Romney          46.6%   Poor+           22.9%
8th   Michael D. Huckabee     33.5%   Poor−           47.3%
The descriptions of the majority judgment given in this chapter should permit its use in any application, with few judges or many voters, given that a
common language of grades has been defined and explained.
1.7 Nomenclature
Majority judgment is the name we have chosen to give to the method we advocate. It encompasses several key ideas, each endowed with a name that is used
again and again throughout the book.
Majority-grade: the middlemost or median of the grades given to a competitor by the judges or voters; when there is an even number of judges or voters,
the lower middlemost of the grades (see section 1.3 and chapter 12).
The intent of this book is to show why the majority judgment is superior to any
known method of voting and to any known method of judging competitions.
To do so, it presents the fundamentals of the traditional theory of social choice,
describing the principal known methods, together with simple proofs of the most
important results. It also proposes new characterizations, new incompatibility
theorems, and new methods in the traditional model of social choice. Actual
voting systems and methods used in practice to judge competitors (e.g., wines,
divers, figure skaters) are also described to show how they are in fact subject to
all the paradoxes and failures that are identified in theory. Throughout, theory,
experiments, and practical evidence in voting and in judging competitions are
provided to support the first central point: the traditional model is a bad model,
in theory and in practice.
The new model is then developed. It is shown that a host of properties
uniquely characterize the order functions as the only methods that can be
used. This is established from a variety of points of view. Practice again plays
a central role: experimental evidence is given that shows the majority judgment is a practical method and that common languages in voting and in judging
competitions do in fact exist and can be meaningfully defined. Statistical comparisons that depend on real data are made with other methods to show why the
majority judgment is better in voting; in particular, approval voting is shown
to fail. Often, judging competitions (such as wine, ski jumping, or figure skating) invoke several different criteria: the majority judgment is generalized to
such situations. Throughout, theory, experiments, and practical evidence are
provided to support the second central point: the majority judgment is a better
alternative to all other known methods, in theory and in practice.
Voting in Practice
Voting in practice invokes issues that go well beyond the problem of how to
elect one candidate among several or how to determine their order of finish.
Many candidates are elected as the representatives of regions (constituencies,
congressional districts, states, departments, provinces, or nations) to legislatures (Assemblées nationales, Diets, Houses of Representatives, Knessets, Parliaments, or Senates), or as the representatives of political parties, or of both
regions and parties, to legislatures. A multitude of different systems are used;
they raise different problems, invoke different information, ask for different inputs, and are resolved with different mechanisms. Nevertheless, several
central problems are common to many electoral systems.
First and foremost among them is the subject of this book, the problem of
social choice: to elect one candidate and rank all candidates. The apportionment
of a legislature is a second major problem (see Balinski and Young 1982). In
one guise, apportionment addresses how to allocate a fixed number of seats
among regions according to their respective numbers of inhabitants. The
implicit ideal is an allocation of seats that is proportional to the number of
inhabitants, though that ideal cannot be realized perfectly. In a second guise,
it concerns what is commonly called proportional representation: the apportionment of seats to political parties according to their respective votes. In this
case the ideal is explicit, but "proportional to votes" has no more intrinsic significance than is to be found in the input messages that are the votes. A third
important problem is districting or redistricting: given a region that has been
apportioned k seats in a legislative body, it is to be cut up into k single-member
districts of approximately equal population, each of which is a collection of
Chapter 2
draws by lot a dozen members; [5] this dozen names, in turn, by a qualified
majority of 8 votes, 25 citizens worthy of esteem; [6] they draw 9 others by
lot; [7] these 9, by a majority of 7 votes, elect 11; [8] finally these 11 draw
the real Quarantia of 41 members; [9] the Quarantia elects the Doge. With all
these convolutions they hoped to purify the election of all party influence or
intrigue, distilling to the utmost the quintessence of patriotism and intelligence
(L. Konopczynski, cited in L. Moulin 1953, 115). The rules are not completely
unambiguous, and voting does take place, though exactly how is not made
clear. And yet, might this be a better procedure than the popularity contests that
national presidential elections have become in many democratic nations today?
It certainly would cost less.
This chapter describes elements of an assortment of electoral systems, past
and present, to give examples of how politicians have manipulated them, to
show that the paradoxes of the theory of social choice are real and can have
important practical consequences, and to point to the importance of strategic
behavior on the parts of both candidates and voters.
2.1 United States of America
Presidents of the United States are elected indirectly. Each of the fifty states
is allocated a number of votes in the Electoral College equal to the number
of its Representatives plus its number of Senators; in addition, the District of
Columbia (the city of Washington) has three electoral votes. The House of
Representatives has 435 members and each state has two Senators, so there is a
total of 538 Electoral College votes. A voter's input is one vote for at most one
candidate. A candidate who receives a plurality of the votes in a state (meaning
the most votes, not necessarily a majority) wins all the Electoral College votes
of the state, in every state but two and in the District of Columbia. The states
of Maine (two Representatives by the apportionment of 2000) and Nebraska
(three Representatives) share a different rule (adopted by Maine in
1972, by Nebraska in 1992): a candidate who receives a plurality of the votes in
a congressional district wins one Electoral College vote, and a candidate who
receives a plurality of the votes in the state wins two Electoral College votes.
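The difference between the general rule and the Maine and Nebraska rule can be made concrete with a short Python sketch. The three-district state, the candidate names X and Y, and the vote counts are invented for illustration, the function names are hypothetical, and ties are ignored (`max` simply returns the first candidate with the most votes).

```python
from collections import Counter


def plurality_winner(votes):
    """The candidate with the most votes, not necessarily a majority (ties ignored)."""
    return max(votes, key=votes.get)


def winner_take_all(district_votes, electoral_votes):
    """General rule: the statewide plurality winner takes every electoral vote."""
    statewide = Counter()
    for votes in district_votes:
        statewide.update(votes)
    return {plurality_winner(statewide): electoral_votes}


def district_method(district_votes):
    """Maine/Nebraska rule: one vote per congressional district won, two for the state."""
    result = Counter()
    statewide = Counter()
    for votes in district_votes:
        result[plurality_winner(votes)] += 1
        statewide.update(votes)
    result[plurality_winner(statewide)] += 2
    return dict(result)


# A hypothetical state with three districts, hence five electoral votes.
state = [{"X": 200, "Y": 100}, {"X": 180, "Y": 120}, {"X": 90, "Y": 110}]
print(winner_take_all(state, 5))  # {'X': 5}
print(district_method(state))     # {'X': 4, 'Y': 1}
```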
The system has at least two major drawbacks. First, it is entirely possible
for a candidate to be elected whose popular vote is below an opponent's. This
has happened at least three times,² as shown in table 2.1. Had the number of
2. In 1824 the election was decided in the House of Representatives; some claim that John
F. Kennedy's election against Richard Nixon is another instance.
Table 2.1
U.S. Presidents Elected with a Minority of the Popular Votes

Year   Candidate              Popular Votes   Electoral Votes
1876   Rutherford B. Hayes     4,036,298      185
       Samuel J. Tilden        4,300,590      184
1888   Benjamin Harrison       5,439,853      233
       Grover Cleveland        5,540,390      168
2000   George W. Bush         50,456,002      271
       Albert Gore            50,999,897      266
Table 2.2
U.S. Presidential Election 2000, Florida Popular Votes

George W. Bush    2,912,790
Albert Gore       2,912,253
Ralph Nader          97,488
area into electoral districts, often of highly irregular shape, to give one political
party an unfair advantage by diluting the opposition's voting strength (Black's
Law Dictionary 2001), has in the twenty-first century become a science.
It is generally acknowledged that well upwards of 80% of the seats in the
House of Representatives are safe in the districts established on the basis of
the 2000 census. Many claim the districts determine elections, not votes. If an
election is deemed competitive when the spread in votes between the winner
and the runner-up is 6% or less, then 5.5% of the elections were competitive
in 2002, 2.3% in 2004, and 9.0% in 2006. Many candidates ran unopposed
by a candidate from one of the two major parties in all three elections (19%
in 2002, 15% in 2004, 14% in 2006). In Michigan the Democratic candidates
together out-polled the Republican candidates by some 35,000 votes in 2002
yet elected only six representatives to the Republicans' nine. In the 2002 Maryland elections, Republican representatives needed an average of 376,455 votes
to be elected, the Democratic representatives only 150,708. In all three elections Massachusetts elected only Democrats, at least half without Republican
opposition. Ohio elected eleven Republican and seven Democratic representatives in 2006, and yet the Democratic candidates received 211,347 more votes
than did the Republican candidates. Every one of California's fifty-three districts returned a candidate of the same party in all three elections: fifty were
elected by a margin of at least 20% in 2002, fifty-one in 2004, and forty-nine
in 2006, whereas only one candidate won by a margin of less than 6% in any
of these elections. Gerrymandering is widespread and decidedly ecumenical:
both parties indulge.
How has this situation come about? That is a long and fascinating story culminating in the U.S. Supreme Court's decision of April 28, 2004, that upheld
Pennsylvania's districting plan then in force (see Balinski 2004). Everyone involved
(the attorneys against, the attorneys for, and the Justices) acknowledged that
the plan was a blatant political gerrymander. As the chairman of the National
Republican Congressional Committee had predicted, "Democrats rewrote the
book when they did Georgia, and we would be stupid not to reciprocate . . . [The
Pennsylvania redistricting] will make Georgia look like a picnic." In view of
the confused and often contradictory precedents of some forty years, four
can totally frustrate the popular will on an overwhelming number of critical issues.
The legislature must do more than satisfy "one man, one vote"; it must create a structure which will in fact as well as theory be responsive to the sentiments of the community
. . . Even more than in the past, district lines are likely to be drawn to maximize the political advantage of the party temporarily dominant in public affairs. (Wells v. Rockefeller
1969; our emphasis)
A new structure has been proposed that eliminates the possibility of political
gerrymandering. Called fair majority voting (FMV), it combines single-member
constituencies, as required by federal law, with proportional representation at
the level of the states. Voters vote as they do in the current system: for a candidate
of a party in their congressional district. In every state each party is apportioned
a number of seats according to its total vote in the state; it must then be decided
which candidates of each party are elected. When there are but two parties, the
candidates of each party whose percentages of the votes in their districts are
the highest are elected (Balinski 2008).
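For the two-party case just described, FMV can be sketched as follows. This is a minimal illustration, not Balinski's specification. It assumes that the state's seats are apportioned to the two parties by Webster's (Sainte-Laguë) method, computed here by highest averages with divisors 1, 3, 5, and so on, and the district data and function names are invented. Each party then takes the districts where its percentage is highest.

```python
import heapq


def sainte_lague(votes, seats):
    """Webster / Sainte-Laguë apportionment by highest averages (divisors 1, 3, 5, ...)."""
    allocation = [0] * len(votes)
    # Python's heapq is a min-heap, so averages are negated; the index breaks ties.
    heap = [(-v, i) for i, v in enumerate(votes)]
    heapq.heapify(heap)
    for _ in range(seats):
        _, i = heapq.heappop(heap)
        allocation[i] += 1
        heapq.heappush(heap, (-votes[i] / (2 * allocation[i] + 1), i))
    return allocation


def fmv_two_parties(district_votes):
    """Hypothetical FMV sketch: `district_votes` holds one (votes_A, votes_B) pair
    per single-member district; returns the winning party of each district."""
    totals = [sum(a for a, _ in district_votes), sum(b for _, b in district_votes)]
    seats_a, _ = sainte_lague(totals, len(district_votes))
    share_a = [a / (a + b) for a, b in district_votes]
    # Party A wins the seats_a districts where its percentage is highest; with two
    # parties these are exactly the districts where B's percentage is lowest.
    ranked = sorted(range(len(district_votes)), key=lambda d: share_a[d], reverse=True)
    winners = ["B"] * len(district_votes)
    for d in ranked[:seats_a]:
        winners[d] = "A"
    return winners


districts = [(52, 48), (51, 49), (53, 47), (10, 90)]
print(fmv_two_parties(districts))  # ['A', 'B', 'A', 'B']
```

In this invented example party A carries three districts by narrow margins under plurality voting, but with 41.5% of the statewide vote it receives two of the four seats under the sketch, awarded in the two districts where its percentage is highest.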
2.2 Zürich, Switzerland
Table 2.3
Results, Zürich City Parliament Election, February 12, 2006

Party          SP        SVP       FDP       Greens   CVP      EVP     AL      SD      Dist.-div.
Votes-Seats    23180-44  12633-24  10300-19  7501-14  5418-10  3088-6  2517-5  1692-3
Dist.-Seats
A-12           2377-4    1275-2    1819-3    1033-2   610-1    236-0   201-0   138-0   600
B-16           2846-7    1379-3    653-1     1082-3   541-1    176-0   464-1   198-0   432
C-13           2052-5    629-2     349-1     786-2    315-1    79-0    699-2   108-0   400
D-10           2409-4    968-1     1092-2    842-1    440-1    342-1   230-0   111-0   660
E-17           3632-5    1642-2    3015-5    1499-2   837-1    618-1   323-1   144-0   660
F-16           2628-6    1972-4    754-2     572-1    708-1    615-1   154-0   333-1   473
G-12           2938-4    1630-3    1272-2    807-1    696-1    391-1   212-0   124-0   650
H-19           2976-6    2113-4    1039-2    661-1    777-2    631-2   191-1   328-1   470
J-10           1322-3    1025-3    307-1     219-1    494-1    0-0     43-0    208-1   400
Party-div.     1.01      1         1.01      1        1        0.88    0.8     1

The method was inaugurated on February 12, 2006. The results are given
in table 2.3. Voters vote as they do in the current system: they cast a vote for
at most one of the party lists of candidates in their district. Each district has
a predetermined number of seats that have been apportioned as a function of
their populations; in table 2.3 districts are identified by A, B, . . . , J, followed
by their numbers of seats (e.g., B-16 means district B has been apportioned
sixteen seats).
The parties are apportioned numbers of seats according to their total votes
in all districts, by Webster's or Sainte-Laguë's method.⁴ In table 2.3 the parties
are identified by SP, SVP, . . . , SD, followed by their total votes and numbers
of seats (e.g., 5418-10 under CVP means that party received 5,418 votes in
total and was apportioned ten seats). The entry in the row of a district and
the column of a party gives the number of votes received by the party list
in that district, followed by the number of elected candidates (e.g., 3015-5
means that in district E the list of the FDP party had 3,015 votes and elected
five candidates). The number of candidates elected by each party list is the result
of a computation that requires a computer program. However, the solution is
easily checked given the district divisors and party divisors (in table 2.3, the
last column and last row, respectively): the vote in an entry is divided by both of
its divisors and rounded to the closest integer (e.g., 3015/(660 × 1.01) ≈ 4.52 is
rounded to 5). The solution is unique (up to ties, which are extremely rare).
FMV is the same method except that every district is apportioned exactly one
seat, which simplifies the theory and the computation.
4. Webster's or Sainte-Laguë's method: if the party votes are $(v_1, \dots, v_n)$ and the number of seats
to distribute is $h$, then the apportionment is $(a_1, \dots, a_n)$, where $a_i = [v_i/\lambda]$ and the divisor $\lambda$ is
chosen so that $\sum_i a_i = h$; here $[x]$ is $x$ rounded to the nearest integer, for any real number $x$. When seats
are apportioned to regions and the $p_i$ are their populations, a minimum of 1 seat is usually guaranteed each
region: in that case the computation is the same, with $p_i$ in place of $v_i$, except that $a_i = \max\{1, [p_i/\lambda]\}$.
It should be noted that the number of elected candidates per list may disappoint
the candidates of some lists. The Greens' district B list elects three candidates
with 1,082 votes, whereas the SVP's district E list elects only two with 1,642
votes, and there are other similar instances. This is the price of guaranteeing
equity to parties and districts. The solution was accepted with no complaints;
indeed, the canton of Aargau has decided to adopt this method, too.
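Footnote 4 defines Webster's (Sainte-Laguë's) method through a divisor λ. An equivalent way to compute it, which we assume here rather than take from the text, is the "highest averages" procedure: seats are handed out one at a time to the party with the largest quotient $v_i/(a_i + 1/2)$, where $a_i$ is the number of seats it already holds. The sketch below also supports the minimum-of-one variant; the function name and the rule that ties go to the earlier party are our own choices.

```python
from fractions import Fraction


def webster(votes: list[int], seats: int, minimum: int = 0) -> list[int]:
    """Webster / Sainte-Laguë apportionment computed by highest averages.

    Each successive seat goes to the party with the largest v / (a + 1/2),
    where a is the number of seats it already holds. `minimum=1` starts every
    party (region) at one seat, mirroring a_i = max{1, [v_i / lambda]}.
    Ties go to the earlier party in the list (an arbitrary choice).
    """
    alloc = [minimum] * len(votes)
    if sum(alloc) > seats:
        raise ValueError("too few seats to give every party the minimum")
    while sum(alloc) < seats:
        # 2v / (2a + 1) equals v / (a + 1/2); Fraction keeps the comparison exact
        best = max(range(len(votes)), key=lambda i: Fraction(2 * votes[i], 2 * alloc[i] + 1))
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    print(webster([5400, 3100, 1500], 10))  # hypothetical votes
    print(webster([5400, 3100, 100], 10))
    print(webster([5400, 3100, 100], 10, minimum=1))
```

With the hypothetical votes (5400, 3100, 100) and 10 seats, the plain rule gives (6, 4, 0), whereas the minimum-of-one variant gives (6, 3, 1).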
2.3 Mexico
30
Chapter 2
second, separate vote. These 200 members were to be allocated to the parties
in proportion to their total national second votes by the method of Hamilton,5
except that no party could have more than 300 seats in total, and if a party's
percentage of the total vote was x%, then it could have no more than (x + 8)%
of the 500 seats.
There were, on July 6, 1997, five eligible parties, which elected the following
numbers of members in the 300 single-member constituencies:
PRI: 165,
PAN: 64,
PRD: 70,
PVEM: 0,
PT: 1.
The results of the second votes (for the respective regional party lists) are
given in table 2.4a. By the law, the PRI could receive no more than 39.96% +
8% = 47.96% of 500 = 239.8, so 239 seats. Having already elected 165, it
had to be allocated 74 of the 200 seats, leaving 126 to be distributed among
the remaining parties by a first procedure, Hamilton's method, implying the
following apportionment of the 200 seats:
PRI: 74,
PAN: 57,
PRD: 55,
PVEM: 8,
PT: 6.
The law further stipulated that the seats were to be apportioned among the
regional party lists by a second procedure. First, the seats of a party having
been limited by one of the constraints are apportioned to its five regional lists
by the method of Hamilton (so the PRI's regional lists are apportioned 15, 17,
15, 14, and 13). This leaves 40 − 15 = 25 in region I, 40 − 17 = 23 in region II,
and so on. Second, Hamilton's method is used to apportion the remaining seats
in each region to the other parties. So, for example, the 25 seats remaining in
region I are apportioned among the four other parties. The results of this second
procedure are given in table 2.4b.
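Both procedures of the law can be replayed from table 2.4a. The Python sketch below is a hypothetical reconstruction, not the official computation: it takes the PRI's share from the exact vote totals rather than the rounded 39.96% (both give a cap of 239 seats), applies Hamilton's method (note 5) with exact integer arithmetic, and breaks ties between equal remainders by list order (none occur here). It checks only the constraint the text says was binding, the PRI's.

```python
# Second votes by region I, II, III, IV, V (table 2.4a)
REGIONAL_VOTES = {
    "PRI": [2_379_785, 2_543_570, 2_354_047, 2_098_581, 2_062_736],
    "PAN": [2_504_484, 2_138_564, 909_386, 1_237_297, 1_005_807],
    "PRD": [1_019_822, 687_162, 1_377_933, 2_385_525, 2_048_461],
    "PVEM": [197_098, 112_721, 90_373, 424_672, 291_273],
    "PT": [118_673, 303_794, 126_342, 123_612, 83_704],
}
PRI_SINGLE_MEMBER = 165
LIST_SEATS = 200
SEATS_PER_REGION = 40
OTHERS = ["PAN", "PRD", "PVEM", "PT"]


def hamilton(h: int, votes: list[int]) -> list[int]:
    """Largest remainders (note 5), using exact integer arithmetic."""
    total = sum(votes)
    alloc = [h * v // total for v in votes]  # integer parts of the quotas
    remainders = [h * v % total for v in votes]  # fractions, all over the denominator `total`
    by_remainder = sorted(range(len(votes)), key=lambda i: -remainders[i])
    for i in by_remainder[: h - sum(alloc)]:
        alloc[i] += 1
    return alloc


def first_procedure() -> dict[str, int]:
    """National allocation of the 200 list seats, with the PRI capped at (x + 8)% of 500."""
    national = {p: sum(v) for p, v in REGIONAL_VOTES.items()}
    grand_total = sum(national.values())
    pri_cap = 500 * national["PRI"] // grand_total + 40  # 8% of 500 is exactly 40
    pri = pri_cap - PRI_SINGLE_MEMBER
    others = hamilton(LIST_SEATS - pri, [national[p] for p in OTHERS])
    return {"PRI": pri, **dict(zip(OTHERS, others))}


def second_procedure(pri_total: int) -> dict[str, list[int]]:
    """Regional allocation: PRI first, then each region's remaining seats to the others."""
    table = {"PRI": hamilton(pri_total, REGIONAL_VOTES["PRI"])}
    table.update({p: [] for p in OTHERS})
    for r in range(len(table["PRI"])):
        remaining = SEATS_PER_REGION - table["PRI"][r]
        shares = hamilton(remaining, [REGIONAL_VOTES[p][r] for p in OTHERS])
        for p, s in zip(OTHERS, shares):
            table[p].append(s)
    return table


if __name__ == "__main__":
    due = first_procedure()
    regional = second_procedure(due["PRI"])
    for party, seats in due.items():
        print(f"{party}: due {seats}, received {sum(regional[party])} {regional[party]}")
```

Run as a script, it reports that the first procedure owes PAN, PRD, PVEM, and PT 57, 55, 8, and 6 seats, while the regional second procedure hands them 56, 57, 7, and 6.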
The second procedure allocates to the PRI and the PT the number of seats
they should receive according to the first part of the law. However, no other
party is apportioned the number of members it should receive: the PAN receives
one too few, the PRD two too many, the PVEM one too few. The law is logically
inconsistent. This is hardly surprising because the second procedure ignores the
allocations of the first procedure except for that of the PRI. It should come as
no surprise that PRI politicians were the authors of this law. Unknown to the
general public, the Consejo General del Instituto Federal Electoral invented an
ad hoc rule to correct the error (see the numbers in parentheses in table 2.4b): (1) allocate
the seats of the PRI, as before; (2) for each region compute the quotas of the
5. Hamilton's method: Suppose $h$ seats are to be apportioned to $n$ parties with votes $(v_1, \dots, v_n)$. First the integer part $\lfloor q_i \rfloor$ of the quota $q_i = h v_i / \sum_j v_j$ is apportioned to each party $i$; then any unapportioned seats are assigned to those parties having the largest fractions or remainders, $q_i - \lfloor q_i \rfloor$ (where $\lfloor x \rfloor$ is the real number $x$ rounded down to the nearest integer).
Table 2.4a
Mexican House of Deputies Election, July 6, 1997
Region   PRI          PAN         PRD         PVEM        PT        Total
I        2,379,785    2,504,484   1,019,822   197,098     118,673   6,219,862
II       2,543,570    2,138,564   687,162     112,721     303,794   5,785,811
III      2,354,047    909,386     1,377,933   90,373      126,342   4,858,081
IV       2,098,581    1,237,297   2,385,525   424,672     123,612   6,269,687
V        2,062,736    1,005,807   2,048,461   291,273     83,704    5,491,981
Total    11,438,719   7,795,538   7,518,903   1,116,137   756,125   28,625,422
         39.96%       27.23%      26.27%      3.90%       2.64%     100%
Table 2.4b
Mexican House of Deputies Election, July 6, 1997, showing the Law's Inconsistent Distribution of
200 Seats and the Corrections
Region   PRI   PAN      PRD      PVEM    PT      Total
I        15    16(17)   7(6)     1       1       40
II       17    15       5        1       2       40
III      15    9        14(13)   1       1(2)    40
IV       14    8        15       2(3)    1(0)    40
V        13    8        16       2       1       40
Total    74    56(57)   57(55)   7(8)    6       200
remaining seats due each party list and allocate their integer parts; (3) for each
party, in the order of the largest to smallest vote total, assign any unapportioned
seats to the lists with the largest remainders subject to the further restriction
that no region may be assigned more than forty seats. This procedure is completely arbitrary and has no justification whatsoever, e.g., why not start with
the party with the smallest vote total?
There exists a method to solve this problem that is justified: the biproportional
method used in Zürich. It yields the solution given in table 2.4c.
The Mexican story is not an isolated example of a logically inconsistent law.
Silvio Berlusconi's government, facing upcoming elections in April 2006, completely changed the electoral law for electing the members of the Chamber of
Deputies on December 14, 2005, a mere four months before the elections. It
replaced a mixed system, known as the Legge Mattarella or "Mattarellum,"
whereby 75% of the deputies were elected in single-member districts and 25%
on a proportional basis, with a system that is supposedly proportional. Its difficulties are similar to Mexico's: seats are allocated to parties on the basis of the
total national vote, but the procedures for apportioning seats to regional party
Table 2.4c
Mexican House of Deputies Election, July 6, 1997, showing the Biproportional Apportionment
Region   PRI   PAN   PRD   PVEM   PT   Total
I        14    17    7     1      1    40
II       16    16    5     1      2    40
III      18    8     12    1      1    40
IV       12    8     16    3      1    40
V        14    8     15    2      1    40
Total    74    57    55    8      6    200
Table 2.5
The Winners of the Last Six British Elections
         1983    1987    1992    1997    2001    2005
Votes    42.4%   42.2%   41.9%   43.2%   40.7%   35.2%
Seats    61.1%   57.8%   51.6%   63.4%   62.5%   55.1%
throughout the country could easily end up with no seats at all. A big advantage
accrues to the party that has the highest percentage of votes, even when its
margin over the second-highest party's percentage is very small.
That a method of voting has the property of producing a majority party able
to govern is generally viewed as a good thing, but is this a truly democratic
outcome? The six most recent parliamentary elections in the United Kingdom
have also seen minorities of the votes translated into (for the most part) large
majorities of the seats, as may be seen in table 2.5. The Conservatives benefited
from 1983 to 1992, Labour from 1997 to 2005.
2.5 Australia
Table 2.6
The Winning Coalitions of the Last Six Australian House of Representatives Elections
               1993    1996    1998    2001    2004    2007
First votes    44.9%   47.3%   39.2%   43.0%   46.4%   43.4%
Seats          54.4%   63.5%   54.1%   54.7%   57.3%   55.3%
candidates considered most likely to win. Definitive results are not known until
the following day.
The Australian Senate has seventy-six members, twelve from each of the six
states, two from each of the two territories. It is renewed by halves. Typically,
six Senators are elected simultaneously in a statewide vote using the single
transferable vote (STV) system. It is a multicandidate version of the alternative vote. Again, the voters' rank-orderings of the candidates indicating their
preferences are the input. But in these elections there can be huge numbers
of candidates: in 2004, for example, there were seventy-eight candidates for
the six positions in the New South Wales election. A voter can either choose to
vote "above the line" or "below the line." Above means the voter chooses the
rank-ordering (of all the candidates) specified by one political party, the outcome of prior strategic negotiations. Below means the voter determines her
own rank-ordering. In practice, over 95% vote above the line.
The STV works as follows. First, the (Droop) quota is computed: if
$b$ is the number of ballots and $s$ the number of candidates to be elected,
$$q = 1 + \left\lfloor \frac{b}{s+1} \right\rfloor$$
($q$ is the smallest integer that when multiplied by $s + 1$ yields
a greater number than $b$, so any smaller $q$ could elect $s + 1$ members). The
computation is iterative. Any candidate listed first at least q times is elected
and dropped from all lists. If this elects fewer than the necessary number, then
an elected candidate C's surplus of votes above q is distributed pro rata to the
candidates that immediately follow C's (currently) first-ranked position. The
procedure is repeated until the necessary number of candidates is elected, unless
at some step no new candidate is elected. In that case the candidate D with the
fewest current first places is eliminated, and her votes are distributed pro rata
to the candidates that immediately follow D's (currently) first-ranked position.
The computation continues in the same manner with the distribution of surplus
votes of an elected candidate taking precedence over the distribution of the
votes of a candidate with fewest current first places. In the famous New South
Wales election (seventy-eight candidates was a record), it took seventy-seven
steps to determine all winners. As a multicandidate version of the alternative
vote, STV suffers from the same defects. It is not monotonic and it does not
avoid Arrow's paradox. It is, in addition, a very complex procedure that takes
days to compute and that voters are unable to verify because the complete lists
of preferences of the voters are not published. Moreover, as with any method
whose inputs are rank-orderings, it seriously constrains the voters' abilities to
express intensities of preference: perhaps only one or two or three candidates
are acceptable at all, perhaps one candidate is considered best, the second trails
well behind, and so on.
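The Droop quota and the iterative count described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the exact rule used for the Australian Senate: it elects one candidate per step (the one with the most votes) rather than everyone who reaches the quota at once, passes on a surplus by scaling down the weight of every ballot currently counting for the winner (a fractional transfer), breaks ties alphabetically, and seats the remaining candidates once they are no more numerous than the remaining seats. The function names are hypothetical.

```python
from fractions import Fraction


def droop_quota(ballots: int, seats: int) -> int:
    """Smallest integer q with q * (seats + 1) > ballots, i.e. 1 + floor(b / (s + 1))."""
    return ballots // (seats + 1) + 1


def stv(rankings: list[list[str]], seats: int) -> list[str]:
    """Simplified single transferable vote (see the assumptions above)."""
    quota = droop_quota(len(rankings), seats)
    ballots = [[Fraction(1), ranking] for ranking in rankings]  # [weight, ranking]
    hopeful = sorted({c for ranking in rankings for c in ranking})  # sorted: deterministic ties
    elected: list[str] = []

    def current_first(ranking: list[str]):
        """Highest-ranked candidate on this ballot who is still in the running."""
        return next((c for c in ranking if c in hopeful), None)

    while len(elected) < seats and hopeful:
        if len(elected) + len(hopeful) <= seats:
            elected.extend(hopeful)  # everyone left fills the remaining seats
            break
        tally = {c: Fraction(0) for c in hopeful}
        for weight, ranking in ballots:
            first = current_first(ranking)
            if first is not None:
                tally[first] += weight
        reached = [c for c in hopeful if tally[c] >= quota]
        if reached:
            winner = max(reached, key=lambda c: tally[c])
            # pass on only the surplus: each of the winner's ballots keeps (tally - quota) / tally
            ratio = (tally[winner] - quota) / tally[winner]
            for ballot in ballots:
                if current_first(ballot[1]) == winner:
                    ballot[0] *= ratio
            hopeful.remove(winner)
            elected.append(winner)
        else:
            loser = min(hopeful, key=lambda c: tally[c])
            hopeful.remove(loser)  # its ballots now count for their next choice
    return elected


if __name__ == "__main__":
    example = [["A", "B"]] * 6 + [["B"]] * 2 + [["C"]] * 3  # hypothetical ballots
    print(droop_quota(len(example), 2), stv(example, 2))
```

In the hypothetical example, A's 6 votes exceed the quota of 4; the 2-vote surplus travels to B at one third of a vote per ballot, lifting B from 2 to exactly 4 and electing it ahead of C.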
2.6 France
French electoral laws have undergone more than two centuries of change. The
main manipulative variable of the early years was who was allowed to vote. On
the eve of the Revolution, in the summer of 1789, some 5 million (including
some women) participated in electing the members of what came to be the Assemblée nationale. Successive regimes imposed conditions of ownership and
earnings on the franchise to vote and excluded women. Later in 1789, 4.3 million
participated; in 1793 the number went up to 7 million; with the Empire in 1799
the number decreased to 5 million (other controls concerned who was allowed to
be elected). But then, with the Restoration in 1814, the number dropped drastically to some 90,000, climbing slowly from 94,000 in 1830 to 241,000 on the eve
of the revolution of 1848, when true universal suffrage is generally considered
to have been achieved. Women did not obtain the franchise until 1944.8
A particularly striking example of blatant manipulation was the electoral law
of May 1951. The coalition of center parties that ruled France were afraid of two
major forces: Charles de Gaulle's R.P.F. (Rassemblement du Peuple Français)
and the P.C. (Parti Communiste). The law stipulated that in each department
parties should present lists of candidates, the votes then going to party lists.
Parties could, however, declare themselves as grouped. A party that stands
alone is treated as a group. (1) If a group of parties obtains a majority of the
votes, the group obtains all the seats of the department; otherwise, the seats are
first apportioned to the groups. (2) Then the seats of each group are apportioned
to its parties. The method of Jefferson (or of D'Hondt) is used in each case.9
The inventors were certain that the R.P.F. and the P.C. would never form a
group. The results of the election in Marne, given in table 2.7, show how well
the method worked. Had the method of Jefferson been applied to the parties
directly, the P.C. and the R.P.F. would each have had two seats and the parties
of the group one in total (last column). The method limited the P.C. and the
R.P.F. to one seat each and gave three to the group of parties.
The suburban towns surrounding Paris constituted the P.C.'s fortress, and
the R.P.F. was thought to be popular in Paris and several of its more affluent
suburbs. The architects of the law feared the electoral system just described
would be a disadvantage in these departments. They found a simple expedient:
8. On April 21, 1944, a decree of Charles de Gaulle's provisional government in Algiers established
women's suffrage. Earlier in the twentieth century, the left had been, by and large, opposed to
granting the vote to women: its members claimed that priests would dictate their votes.
9. The method of Jefferson (or D'Hondt): If the votes are $(v_1, \dots, v_n)$ and the number of seats to distribute is $h$, then the apportionment is $(a_1, \dots, a_n)$, where $a_i = \lfloor v_i/\lambda \rfloor$ and the divisor $\lambda$ is chosen so that $\sum_i a_i = h$.
Table 2.7
Elections to the French Assemblée nationale, Marne, Five Seats, June 17, 1951
Party      Vote     Group Vote   Group Seats   Party Seats   Direct
PC         47,216   47,216       1             1             2
RPF        45,912   45,912       1             1             2
Group:              75,735       3
  MRP      36,702                              2             1
  SFIO     18,567                              1             0
  RGR      16,575                              0             0
  GrC       3,891                              0             0
RIF         4,024    4,024       0             0             0
Note: Seven parties presented lists. The MRP, SFIO, RGR, and GrC declared themselves a group.
Party Seats was the actual apportionment. Direct is the apportionment made directly to the party lists. No group
has a majority. So by the law: $\lambda = 25{,}000$ apportions the five seats to the groups, and $\lambda = 17{,}000$
apportions the group's three seats to its parties. $\lambda = 22{,}000$ apportions the five seats directly to
the parties.
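Table 2.7 can be recomputed from the rule of the 1951 law and the divisors λ given in the note, using Jefferson's method as defined in note 9. In the Python sketch below, the function and variable names are our own, and the divisors are passed in (and checked) rather than searched for; a group that won an absolute majority would simply take every seat, as step (1) of the law prescribes.

```python
# Marne, June 17, 1951, five seats (table 2.7)
MARNE = {"PC": 47_216, "RPF": 45_912, "MRP": 36_702, "SFIO": 18_567,
         "RGR": 16_575, "GrC": 3_891, "RIF": 4_024}
GROUP = ["MRP", "SFIO", "RGR", "GrC"]  # parties that declared themselves grouped


def jefferson(votes: list[int], seats: int, divisor: int) -> list[int]:
    """Jefferson / D'Hondt: a_i = floor(v_i / divisor), checked to sum to `seats`."""
    alloc = [v // divisor for v in votes]
    if sum(alloc) != seats:
        raise ValueError(f"divisor {divisor} gives {sum(alloc)} seats, not {seats}")
    return alloc


def grouped_apportionment(votes: dict[str, int], group: list[str], seats: int,
                          group_divisor: int, party_divisor: int) -> dict[str, int]:
    """Sketch of the 1951 rule: seats to groups (a lone party is its own group), then within the group."""
    units = [p for p in votes if p not in group] + ["group"]
    unit_votes = [votes[p] for p in units[:-1]] + [sum(votes[p] for p in group)]
    majority = [u for u, v in zip(units, unit_votes) if 2 * v > sum(unit_votes)]
    if majority:  # step (1): an absolute majority takes every seat
        unit_seats = {u: (seats if u == majority[0] else 0) for u in units}
    else:
        unit_seats = dict(zip(units, jefferson(unit_votes, seats, group_divisor)))
    result = {p: unit_seats[p] for p in units[:-1]}
    # step (2): the group's seats go to its parties, again by Jefferson
    inside = jefferson([votes[p] for p in group], unit_seats["group"], party_divisor)
    result.update(zip(group, inside))
    return result


if __name__ == "__main__":
    print("1951 law:", grouped_apportionment(MARNE, GROUP, 5, 25_000, 17_000))
    print("direct:  ", dict(zip(MARNE, jefferson(list(MARNE.values()), 5, 22_000))))
```

The law gives the P.C. and the R.P.F. one seat each and the group three (two MRP, one SFIO), whereas applying Jefferson's method directly gives the P.C. and the R.P.F. two seats each and the MRP one.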
The Conseil let matters stand, ruling that the Constitution did not confer on it
the power to decide on the equity of the plan nor the power to suggest other
plans. In essence, it followed Justice Felix Frankfurter's famous 1946 dictum,
"Courts ought not to enter this political thicket" (Colegrove v. Green 1946),
upheld in a 2004 U.S. Supreme Court decision written by Justice Antonin Scalia:
"[W]e believe the correct standard which identifies unconstitutional political
districting has not been met . . . [We] do not know what the correct standard
is . . . We would therefore . . . decline to adjudicate these political gerrymandering claims" (Vieth v. Jubelirer 2004). The fact is that there is no theory, no
well-defined set of criteria, by which to decide which of two districting plans
is the more equitable.
France has conducted two censuses since that of 1982, in 1990 and 1999 (but
no census will again be taken because a continuous estimation procedure was
established in 2004). Governments of the left and of the right have ignored the
principle that reapportionment and redistricting should have taken place. The
Pasqua redistricting has remained in force through the legislative elections of
2007. The distortions have become grotesque. France has one hundred departments that share 570 seats, and 7 seats are allocated to territories. According
to the latest available official figures (January 2006), and compared with the
ideal standard of Webster's apportionment, only forty-six departments have
the number of seats they deserve, thirty-one have 1 too many, sixteen have 1
too few, six have 2 too few, and one has 3 too few. Table 2.8 gives examples of
the inequity of the actual apportionment and shows that there are cases of pairs
of departments where the more populated one has fewer seats (in all, there are
eighty-two such pairs).
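The "Equitable" column of table 2.8 is what Webster's rule (note 4 of this chapter) gives. As a rough check, the Python sketch below assumes the national average of 62,999,000 / 570, about 110,525 inhabitants per seat, as the divisor; the text does not state the divisor actually used, and for all one hundred departments it would have to be adjusted until the seats sum to exactly 570. The names are illustrative.

```python
import math

# Extract of table 2.8 (2006 populations)
POPULATIONS = {
    "Seine-et-Marne": 1_267_000,
    "Seine-Maritime": 1_245_000,
    "Haute-Garonne": 1_169_500,
    "Moselle": 1_039_000,
    "Var": 974_000,
    "Ain": 565_000,
    "Saône-et-Loire": 546_000,
}
FRANCE_POPULATION = 62_999_000
DEPARTMENT_SEATS = 570


def webster_share(population: int, divisor: float) -> int:
    """Webster's rule for one department: population / divisor rounded to the nearest integer."""
    return math.floor(population / divisor + 0.5)


# Assumption: the national average serves as the divisor; the one behind the
# "Equitable" column is not stated in the text.
DIVISOR = FRANCE_POPULATION / DEPARTMENT_SEATS
equitable = {d: webster_share(p, DIVISOR) for d, p in POPULATIONS.items()}

if __name__ == "__main__":
    for department, seats in equitable.items():
        print(f"{department:15} {POPULATIONS[department] / DIVISOR:6.2f} -> {seats}")
```

For these seven departments the assumed divisor reproduces the Equitable column (11, 11, 11, 9, 9, 5, 5); Haute-Garonne, for instance, has a quotient of about 10.58 and so deserves 11 seats, not 8.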
Populations of departments are easy to obtain; those of the legislative districts
more recent than the 1999 census were not available (in 2008). According
to the 1999 census (and the distortions were undoubtedly worse in 2007), the
Table 2.8
Extract of Apportionment of Seats in the French Assemblée nationale to Departments, 2006 Populations
Department           Population    Equitable   Actual
Seine-et-Marne        1,267,000    11          9
Seine-Maritime        1,245,000    11          12
Haute-Garonne         1,169,500    11          8
Moselle               1,039,000    9           10
Var                     974,000    9           7
Ain                     565,000    5           4
Saône-et-Loire          546,000    5           6
23 most populated    31,403,000    284         266
77 least populated   31,596,000    286         304
Total                62,999,000    570         570
population of the 2d district of Lozère was 34,374 and that of the 2d district of
Val d'Oise was 188,200, so that two residents of the first weighed as heavily
as eleven residents of the second. Some of this inequality is due to bad apportionment, some to bad definitions of districts. Within the department of Var,
independent of the apportionment, two residents of the 1st district weighed
about as much as five residents of the 6th district, for the 1st district had
73,946 inhabitants and the 6th district had 180,153. Though these examples
are the worst, there is no denying that there are flagrant inequities throughout
the country.
No changes at all were made in the representation of France's departments in
the Assemblée nationale from 1986 through 2009; this is manipulation all the
same (passive manipulation), because the equal, constitutionally guaranteed
rights of voters were ignored for the convenience of the deputies whose districts
remain unchanged. A new electoral law is, at last, expected in 2010.
French Presidential Elections
In each round 11 a voter may cast one vote for at most one candidate, and the
order of finish is determined by the candidates' votes. Except for the provision
of a run-off between the top two finishers, this is exactly the mechanism used
in the U.S. presidential elections: an elector has no way of expressing her or
his opinions concerning candidates except to designate exactly one favorite. In
consequence (imagine for the moment a field of at least three candidates), his
or her vote counts for nothing in designating the winner unless it was cast for
the winner, for no expression concerning the remaining two or more candidates
is possible.
2002 Presidential Election
The French presidential election of 2002 with its sixteen candidates is a veritable storybook example of the inanity of the two-past-the-post mechanism.
Jacques Chirac, the incumbent president, was the candidate of the Rassemblement pour la République (R.P.R.), the big party of the legitimate right; Lionel
Jospin, the incumbent prime minister, that of the Parti Socialiste (P.S.); Jean-Marie Le Pen, that of the extreme right Front National party (F.N.); and François
Bayrou, that of the moderate Union pour la Démocratie Française (U.D.F., the
ex-president Valéry Giscard d'Estaing's party). Arlette Laguiller was the perennial candidate of a party of the extreme left, the Lutte Ouvrière. The extreme
right had two candidates, Le Pen and Bruno Mégret; the moderate right five,
Chirac, Bayrou, Alain Madelin, Christine Boutin, and Corinne Lepage; the left
and the Greens had four, Jospin, Jean-Pierre Chevènement, Christiane Taubira,
and Noël Mamère; and the extreme left had four, Laguiller, Olivier Besancenot, Robert Hue, and Daniel Gluckstein. One group, the hunters (Hunting,
Fishing, Nature, Tradition party), managed to present only one candidate, Jean
Saint-Josse.
The strategic aspects surrounding so many candidates turned the election
into something of a farce (see table 2.9). The public expected a confrontation
between the dominant candidate of the right, Jacques Chirac, and the dominant
candidate of the left, Lionel Jospin. Instead it was offered a choice between
Chirac and Jean-Marie Le Pen of the extreme right. Chirac crushed Le Pen,
obtaining 82.2% of the votes in the second round. Some 20% of Chirac's votes
were obviously for him. Most of his votes are more accurately described as
11. There have always been two rounds. The first direct popular election of the president in the
Fifth Republic (instituted in 1958) was in 1965: in the first round Charles de Gaulle had 44.64% of
the vote, François Mitterrand 31.72%. Together they received 76.36%. In every subsequent election
the top two together received a lower percentage. In 2002 the top seven together received 76.04%.
Table 2.9
French Presidential Election, First-Round Votes, April 21, 2002
J. Chirac           19.88%    J. Saint-Josse   4.23%
J.-M. Le Pen        16.86%    A. Madelin       3.91%
L. Jospin           16.18%    R. Hue           3.37%
F. Bayrou            6.84%    B. Mégret        2.34%
A. Laguiller         5.72%    C. Taubira       2.32%
J.-P. Chevènement    5.33%    C. Lepage        1.88%
N. Mamère            5.25%    C. Boutin        1.19%
O. Besancenot        4.25%    D. Gluckstein    0.47%
against Le Pen: the intrinsic value of a vote when there is only one to cast has
very different meanings.
Had either Jean-Pierre Chevènement, an ex-socialist, or Christiane Taubira,
a socialist, withdrawn, most of his 5.3% or her 2.3% of the votes would have
gone to Jospin, and the second round would have pitted Chirac against Jospin.
According to most of the polls, Jospin would have beaten Chirac, though by
little.12 In fact, Taubira had offered to withdraw if the P.S. was prepared to
cover her expenses, but that offer was refused. It was rumored that the R.P.R.
helped to finance Taubira's campaign (a credible strategic gambit backed by no
specific evidence). But if Charles Pasqua, an aging ally of Chirac, had been a
candidate, as he had announced he would be, then he might have taken away
enough votes from Chirac to result in a second round between Jospin and Le
Pen. In this event Jospin would surely have been the overwhelming victor, and
for the same reason that Chirac emerged the victor: most of his votes would
have been against Le Pen.
What does this story show? The first- and two-past-the-post mechanisms
invite strategic candidacies, candidates who cannot hope to win but can change
the outcome. This is why Arrow's paradox is of great practical significance:
the very possibility of its occurrence completely confuses the ultimate outcome. It also shows the very different meanings or values of votes when these
mechanisms are used.
2007 Presidential Election
French voting behavior in the presidential election of 2007 was very much
influenced by the experience of 2002. There were twelve candidates. Nicolas
12. Sofres predicted a 50%-50% tie on the eve of the first round. In the last eleven predictions,
spanning two months before the first round, Sofres polls showed Jospin the winner seven times,
Chirac the winner twice, and two ties.
Sarkozy was the candidate of the U.M.P. (Union pour un Mouvement Populaire, founded in 2002 by Chirac), its president and the incumbent minister of
the interior; Ségolène Royal, that of the P.S.; Bayrou, again that of the U.D.F.
(though he announced immediately after the first round that he would create a
new party, the MoDem or Mouvement Démocrate); Le Pen, again that of the
F.N.; and Dominique Voynet, the candidate of the Greens. The extreme left had
five candidates, Besancenot (again), Marie-George Buffet, Laguiller (again),
José Bové, and Gérard Schivardi; the extreme right had two, Le Pen (of course)
and Philippe de Villiers; and the hunters had one, Frédéric Nihous. The distribution of the votes among the twelve candidates in the first round is given in
table 2.10. In the second round Nicolas Sarkozy defeated Ségolène Royal by
18,983,138 votes (53.06%) to 16,790,440 (46.94%).
In response to the debacle of 2002, the number of registered voters increased
sharply (from 41.2 million in 2002 to 44.5 million in 2007), and voter participation was mammoth: 84% of registered voters participated in both rounds.
Voting is a strategic act. In 2007 voters were acutely aware of the importance
of who would survive the first round. Many who believed that voting for their
preferred candidate could again lead to a catastrophic second round voted differently. Such behavior (a deliberate strategic vote for a candidate who is not
the elector's favorite, le vote utile) was much debated by the candidates
and the media, and was practiced.
A poll conducted on election day (by TNS Sofres-Unilog, Groupe LogicaCMG, April 22, 2007) asked electors what most determined their votes. One
of the seven possible answers was a deliberate strategic vote: this answer was
given by 22% of those (who said they voted) for Bayrou, 10% of those for
Le Pen, 31% of those for Royal, and 25% of those for Sarkozy. Comparing the
first rounds in 2002 and 2007 also suggests that deliberate strategic votes were
important in 2007. In 2002 the seven minor candidates of the left and the Greens
(Laguiller, Chevènement, Mamère, Besancenot, Hue, Taubira, Gluckstein) had
26.71% of the vote, whereas in 2007 six obtained only 10.57% (Besancenot,
Buffet, Voynet, Laguiller, Bové, Schivardi); in 2002 the five minor candidates
Table 2.10
French Presidential Election, First-Round Votes, April 22, 2007
N. Sarkozy        31.18%    M.-G. Buffet   1.93%
S. Royal          25.87%    D. Voynet      1.57%
F. Bayrou         18.57%    A. Laguiller   1.33%
J.-M. Le Pen      10.44%    J. Bové        1.32%
O. Besancenot      4.08%    F. Nihous      1.15%
P. de Villiers     2.23%    G. Schivardi   0.34%
of the right and the hunters (Saint-Josse, Madelin, Mégret, Lepage, Boutin)
had 13.55% of the vote, whereas in 2007 two obtained only 3.38% (Villiers,
Nihous).
A candidacy can be a strategic act as well, as was shown in the 2002 election.
To become an official candidate requires 500 signatures. They are drawn from a
pool of about 47,000 elected local and national officials who represent the one
hundred departments and must include signatures coming from at least thirty
departments but no more than 10% from any one department. Both Besancenot
and Le Pen appeared to have had difficulty in obtaining them. Sarkozy publicly
announced he would help them obtain the necessary signatures, as a service to
democracy.
In the period leading up to the first round of voting, the major candidates
of the right and the left (Sarkozy of the U.M.P. and Royal of the P.S.) both
argued strenuously against Bayrou, the centrist. Both most feared him in a one-to-one confrontation. The polls show why: as of February 2007 they consistently
suggested that Bayrou would defeat either one of them in the second round.
Immediately after the first round, Royal and Sarkozy both sought Bayrou's
support and tried to incorporate some of his ideas along with theirs. Royal
subsequently revealed that she had offered Bayrou the position of prime minister
at that time. Once elected, Sarkozy, in naming many political personalities of
the left to responsible political positions (ministries, commissions, a coveted
international position), put into effect one of Bayrou's principal promises, the
appointment of persons from the left and the right (l'ouverture).
Polling results (table 2.11) suggest that François Bayrou was the Condorcet-winner: he would have defeated any candidate in a head-to-head confrontation.
Moreover, the pair-by-pair confrontations (of March 28 and April 19) determine an unambiguous order of finish (there is no Condorcet-cycle): Bayrou is
first, Sarkozy second, Royal third, and Le Pen last. On the other hand, linearly
interpolating the percentages in the head-to-head encounters among the principal
three between December 15, 2006, and April 19, 2007, yields a Condorcet-cycle
just under halfway from the first to the second date: Sarkozy defeats Bayrou,
Bayrou defeats Royal, and Royal defeats Sarkozy. This suggests that at some
time in that period a poll may well have revealed the Condorcet paradox. A more
precise actual occurrence of the paradox comes from the 1994 general election of
the Danish Folketing. A pre-election poll elicited the voters' preferences among
the three main contenders for prime minister (H. Engell, U. Ellemann-Jensen,
and P. Nyrup Rasmussen) and found Engell preferred to Ellemann-Jensen (by 50.6%),
Ellemann-Jensen preferred to Rasmussen (by 51.1%), and Rasmussen preferred to
Engell (by 52.8%) (Kurrild-Klitgaard 1999).
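The interpolation argument can be made exact. Writing $t = 0$ for December 15 and $t = 1$ for April 19, Bayrou's share against Sarkozy is $45 + 10t$, against Royal $43 + 15t$, and Sarkozy's share against Royal is $49 + 2t$; all three majorities point around a cycle exactly when $7/15 < t < 1/2$. The Python sketch below scans $t$ in steps of 1/100 using the table 2.11 figures; the straight-line interpolation itself is, as in the text, only a thought experiment, and the function names are our own.

```python
from fractions import Fraction

# First-named candidate's percentage on Dec. 15, 2006 and on Apr. 19, 2007 (table 2.11)
DUELS = {
    ("Bayrou", "Sarkozy"): (45, 55),
    ("Bayrou", "Royal"): (43, 58),
    ("Sarkozy", "Royal"): (49, 51),
}
NAMES = ("Bayrou", "Sarkozy", "Royal")


def majority_edges(t: Fraction) -> set[tuple[str, str]]:
    """(winner, loser) pairs after interpolating linearly; t = 0 is Dec. 15, t = 1 is Apr. 19."""
    edges = set()
    for (x, y), (start, end) in DUELS.items():
        share = start + (end - start) * t
        if share > 50:
            edges.add((x, y))
        elif share < 50:
            edges.add((y, x))
    return edges  # an exact 50-50 tie contributes no edge


def is_cycle(edges: set[tuple[str, str]]) -> bool:
    """With three candidates, strict duels form a cycle exactly when each wins once."""
    return len(edges) == 3 and all(sum(w == n for w, _ in edges) == 1 for n in NAMES)


# Scan the interval in steps of 1/100
cyclic = [Fraction(k, 100) for k in range(101) if is_cycle(majority_edges(Fraction(k, 100)))]

if __name__ == "__main__":
    print([float(t) for t in cyclic])
```

Only $t = 0.47$, $0.48$, and $0.49$ produce a cycle on this grid, consistent with the exact interval $(7/15, 1/2)$.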
Table 2.11
Polls on Potential Head-to-Head Second-Round Results, French Presidential Election, December
2006-April 2007
(each cell: percentage of the first-named candidate / percentage of the second-named candidate)
                       Dec. 15  Jan. 20  Feb. 15  Mar. 15  Mar. 28  Apr. 16  Apr. 19
Bayrou vs. Sarkozy     45/55    49/51    52/48    54/46    54/46             55/45
Bayrou vs. Royal       43/57    50/50    54/46    60/40    57/43             58/42
Bayrou vs. Le Pen                                          84/16             80/20
Sarkozy vs. Royal      49/51    51/49    53/47    54/46    54/46    53/47    51/49
Sarkozy vs. Le Pen                                         84/16             84/16
Royal vs. Le Pen                                           75/25             73/27
The information in table 2.11 (March 28 and April 19) suffices to determine
the Borda-scores among the four candidates. When all head-to-head results
are known for a set of candidates, a candidate's Borda-score is the sum of his
votes against all opponents. It determines the winner and the order among the
candidates. On March 28 the Borda-scores were Bayrou 195, Sarkozy 184,
Royal 164, and Le Pen 57. On April 19 they were Bayrou 193, Sarkozy 180,
Royal 164, and Le Pen 63. Condorcet and Borda agree on the order of finish.
These two ideas, though never (or hardly ever) used in practice, have enjoyed
a peculiar but tenacious hold on the minds of social choice theorists down to
the present day (they are discussed in detail in chapters 3 and 4).
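The Borda-scores quoted above, and the claim that Bayrou was the Condorcet-winner on both dates, follow directly from the March 28 and April 19 columns of table 2.11. The Python sketch below recomputes them; it assumes, as the table does, that each pair of percentages sums to 100, and the function names are illustrative.

```python
# Percentage of the first-named candidate in each head-to-head poll (table 2.11)
MARCH_28 = {
    ("Bayrou", "Sarkozy"): 54, ("Bayrou", "Royal"): 57, ("Bayrou", "Le Pen"): 84,
    ("Sarkozy", "Royal"): 54, ("Sarkozy", "Le Pen"): 84, ("Royal", "Le Pen"): 75,
}
APRIL_19 = {
    ("Bayrou", "Sarkozy"): 55, ("Bayrou", "Royal"): 58, ("Bayrou", "Le Pen"): 80,
    ("Sarkozy", "Royal"): 51, ("Sarkozy", "Le Pen"): 84, ("Royal", "Le Pen"): 73,
}


def borda_scores(duels: dict[tuple[str, str], int]) -> dict[str, int]:
    """A candidate's Borda-score: the sum of his or her percentages against all opponents."""
    scores: dict[str, int] = {}
    for (x, y), share_x in duels.items():
        scores[x] = scores.get(x, 0) + share_x
        scores[y] = scores.get(y, 0) + (100 - share_x)
    return scores


def condorcet_winner(duels: dict[tuple[str, str], int]):
    """The candidate who wins every head-to-head confrontation, or None if there is none."""
    candidates = sorted({c for pair in duels for c in pair})
    for c in candidates:
        own_shares = [s if x == c else 100 - s for (x, y), s in duels.items() if c in (x, y)]
        if all(s > 50 for s in own_shares):
            return c
    return None


if __name__ == "__main__":
    for date, duels in (("March 28", MARCH_28), ("April 19", APRIL_19)):
        scores = borda_scores(duels)
        order = sorted(scores, key=scores.get, reverse=True)
        print(date, scores, "order:", order, "Condorcet-winner:", condorcet_winner(duels))
```

Both dates give the order Bayrou, Sarkozy, Royal, Le Pen, with scores 195, 184, 164, 57 on March 28 and 193, 180, 164, 63 on April 19.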
2.7 The Lessons
Practice proves that the many properties of methods imagined and studied by
the theorists of social choice and voting are real. Arrow's and Condorcet's
paradoxes occur. Borda- and Condorcet-winners can be eliminated in first- and
two-past-the-post systems. Throughout history politicians manipulate. They
manipulate who has the right to vote, who has the right to be elected, how the
district plans are drawn, what systems to use for allocating seats to parties
and regions. When it suits their purposes, and the constitutional rules allow it,
they change systems, even in the several months preceding an election. Ad hoc
systems are routinely invented. Some are incredibly complex, some are contradictory. The historical record (and the bloggers of today) suggests that most
people believe that conceiving an electoral system is a simple matter, to be
settled on the back of an envelope, despite the years of effort and thought that
have gone into the development of the theory of social choice.
Practice also shows that given a particular system, both the politicians and
the voters act strategically. Parties instruct voters how to vote. Minor candidates
throw their hats into the ring, perhaps by personal conviction, perhaps urged
(or paid) by others, sometimes changing the outcome by their very presence.
Voters may cast their one vote not for their favorite candidate but for another;
at other times, when asked or forced to give a rank-ordered list of candidates,
voters may place last a candidate they prefer to most others but who they fear
presents the greatest threat to their favorite. The analysis of electoral systems
must account for such strategic behavior.
Has the theory of social choice and voting responded to these real challenges?
An account of the theory is given in chapters 3, 4, and 5. Although what emerges
gives a largely negative answer, the material provides a rich foundation of ideas,
approaches, concepts, and mechanisms. Chapter 6 argues that the theory fails
and presents experimental evidence to show why.
Many a man doing loud work in the world, stands only on some thin traditionality,
conventionality; to him indubitable, to you incredible.
Thomas Carlyle
Llull seems to be the first to have carefully specified a system of election, and
he specified two. The first is a refinement of what Condorcet proposed five
1. See London and McLean (1990) and Hägele and Pukelsheim (2001; 2008).
48
Chapter 3
end of the list. A clear statistical advantage is given to a candidate who appears
later in the ordered list over one who appears earlier, for the simple reason that
he is put up against fewer opponents. In some cases different orders determine
different winners, so the order itself can determine the winner.
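The order effect can be seen in a small Python sketch. It assumes that the procedure in question is the sequential one in which the survivor of each duel meets the next candidate on the list (our reading of this passage); the three-voter electorate, whose majorities form a cycle, is hypothetical, as is the rule that an exact tie goes to the challenger.

```python
from itertools import permutations


def duel(rankings: list[list[str]], a: str, b: str) -> str:
    """Majority winner between a and b; an exact tie goes to the challenger b (assumption)."""
    a_votes = sum(r.index(a) < r.index(b) for r in rankings)
    return a if 2 * a_votes > len(rankings) else b


def sequential_winner(rankings: list[list[str]], order) -> str:
    """The survivor of each duel meets the next candidate on the list."""
    survivor = order[0]
    for challenger in order[1:]:
        survivor = duel(rankings, survivor, challenger)
    return survivor


# Hypothetical electorate with cyclic majorities: A beats B, B beats C, C beats A
VOTERS = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
winners = {order: sequential_winner(VOTERS, order) for order in permutations("ABC")}

if __name__ == "__main__":
    for order, winner in winners.items():
        print("".join(order), "->", winner)
```

For this electorate every one of the six orders elects the candidate placed last, so each candidate wins under some order.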
Cusanus proposed a quite different scheme:
After a solemn introduction into the electoral business they should decide on the list of
candidates who because of their outward or inner qualities may be worthy of so majestic
an office [the future emperor]. So that the election may be carried out without fear and
in complete freedom and secrecy, they swear oaths at the altar of the Lord that they will
elect the best man in the just judgement of a free conscience.
The names of all those on the list of candidates are put down by a notary on identical
ballots, with only one name on each ballot. Next to each name the numbers One, Two,
Three are written, up to as many as there are persons that have been decided upon as
worthy candidates [for example, ten]. Every elector receives ten ballots with the ten
names.
. . . [Each] elector should go aside alone . . . and consider the name on every ballot. In
the name of God he should ponder, directed by his conscience, who among all candidates
is least qualified, and place a simple long mark in ink above the number One. Thereafter
he should decide who is next least suitable, and mark the number Two with a simple
long overline. Thus he continues until he arrives at the best, in his judgement, and there
he will mark the number Ten, or generally the number corresponding to the number
of candidates. It is a good idea for the electors to use the same ink, identical pens and
the same simple marks . . . so that individual handwritings cannot be identified. This
preserves maximum freedom for the electors and peace among all.
. . . [Every] elector should bring his ballots forward and throw them with his own
hand into an empty sack hanging in the midst of the electors. When all ballots have
been deposited in the sack, the priest who has celebrated the mass should be called, as
well as a teller with a tablet on which the names of the candidates are listed . . . [The]
priest should take the ballots out of the sack in the order in which they come to hand.
He then reads out the name and the number marked, and the teller writes the number next to this name in the tablet. When all ballots are recorded, the teller should
add up the numbers next to each name. The candidate who has the highest total shall
be king.
. . . It is not possible to discover a method which leads to so infallible a decision more
safely. Indeed, all sorts of comparison among all candidates and all confrontations and
arguments likely to be made by every elector are included in this system . . . You may
well believe that no more perfect method can be found. (Hägele and Pukelsheim 2008)
Cusanus neglected the possibility that some electors might not follow the
instructions and give, say, Ten points or One point to several candidates.
In their exhortations, both Llull and Cusanus insisted on honesty, holiness, disposition of the heart, oaths at the altar of the Lord, conscience, and so on. Were they concerned with the possibility that voters might manipulate? One cannot help but wonder whether asking today's voters to swear solemn oaths to elect, with free consciences, the best candidate would in any way alter the outcomes of elections.
Cusanus proposed what today is known as Borda's method (assuming instructions are followed). Borda's 1784 account of the proposal begins by showing that when every elector casts one vote (as they usually do in most countries) and there are (at least) three candidates, A, B, and C, candidate A may well receive the most votes when the electors prefer both B and C to A. He postulated twenty-one electors having the following profile of preference-orders (or preference-profile),2 meaning their rank-orderings based on their assessments of the relative merits of the candidates ($A \succ B$ means A is preferred to B):
1 : A ≻ B ≻ C
7 : A ≻ C ≻ B
7 : B ≻ C ≻ A
6 : C ≻ B ≻ A.
So, for example, seven voters rank A higher than C, and C higher than B, and therefore A higher than B. As usual, individual preference-orders are implicitly assumed to be rational (meaning transitive) and strict. The English and U.S. plurality or first-past-the-post system elects A with eight votes, while B receives seven votes and C receives six votes, yet thirteen voters prefer both B and C to A. The French system eliminates C in the first round and elects B in the second round. And yet the Condorcet-winner is C. With the same preference-orders, three different systems give three different winners. Voters' inputs are treated as strict preferences, though it is easy enough to allow for indifferences (as is done quite often in the literature); this leads to no fundamentally different analyses or conclusions.
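The three outcomes can be checked mechanically. The sketch below is only an illustration under stated assumptions, not a procedure from the text: each preference-order is encoded as a string with the best candidate first, paired with the number of electors holding it, and the helper names (`prefer`, `first_places`, `plurality`, `two_round`, `condorcet_winner`) are hypothetical. Ties are not handled.

```python
from collections import Counter

# Borda's twenty-one electors: (number of electors, preference-order, best first)
PROFILE = [(1, "ABC"), (7, "ACB"), (7, "BCA"), (6, "CBA")]


def prefer(profile, x, y):
    """Number of electors who rank x above y."""
    return sum(n for n, order in profile if order.index(x) < order.index(y))


def first_places(profile):
    """Number of first places won by each candidate."""
    tally = Counter({c: 0 for c in profile[0][1]})
    for n, order in profile:
        tally[order[0]] += n
    return tally


def plurality(profile):
    """First-past-the-post: the most first places wins (ties not handled)."""
    tally = first_places(profile)
    return max(tally, key=tally.get)


def two_round(profile):
    """French-style system: a first-round majority wins outright; otherwise
    the top two meet in a runoff decided by majority (ties not handled)."""
    tally = first_places(profile)
    total = sum(n for n, _ in profile)
    (a, votes_a), (b, _) = tally.most_common(2)
    if 2 * votes_a > total:
        return a
    return a if prefer(profile, a, b) > prefer(profile, b, a) else b


def condorcet_winner(profile):
    """The candidate who beats every other in pairwise majority votes, or None."""
    cands = profile[0][1]
    for x in cands:
        if all(prefer(profile, x, y) > prefer(profile, y, x)
               for y in cands if y != x):
            return x
    return None


print(plurality(PROFILE), two_round(PROFILE), condorcet_winner(PROFILE))  # A B C
```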
Borda explained that the trouble with plurality voting is that some of the preferences of the voters are ignored. He concluded that "in order for a system of election to be good, it must give the electors the means to pronounce on the merits of each candidate, compared successively with the merits of each of his competitors" (1784, 659–660). His emphasis on the merits was, of course, quite right. He went on to propose Cusanus's method: A's score, with 8 firsts and 13 thirds, is $3 \times 8 + 1 \times 13 = 37$; B's score, with 7 firsts, 7 seconds, and 7 thirds, is $3 \times 7 + 2 \times 7 + 1 \times 7 = 42$; and C's score, with 6 firsts, 14 seconds, and 1 third, is $3 \times 6 + 2 \times 14 + 1 \times 1 = 47$; so C is the Borda-winner.

Equivalently, instead of giving 1 point for a last place, 2 points for a second-to-last place, and 3 points for a first place, 0 points could be given for a last, 1 for a second, and 2 for a first; this simply reduces the score of every candidate by the number of electors (in this example, 21) and thus yields exactly the same ordering. Explained in these terms, a voter accords k Borda-points to a candidate if k opponents are ranked below him, and a candidate's Borda-score is the sum of his Borda-points over all voters. This description makes it easier to see a second interpretation of the method, pointed out by Borda: a candidate's score is equal to the sum of the votes he receives in all pair-by-pair votes (see table 3.2b for an example). This is because an elector votes for a candidate whenever he is confronted by a less preferred opponent. Accordingly, to calculate every candidate's Borda-score it suffices to know the tallies of every pairwise vote.

2. We use the word "preferences" in the chapters devoted to the traditional model, although they are but a pale reflection of the real thing.
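Borda's two readings of the score can be compared side by side. In this minimal sketch (an illustration only; the function names are hypothetical and orders are strings, best first), one function awards k points for k opponents ranked below, the other sums the candidate's votes over all pairwise contests; both use the 0, 1, 2 scale, so adding the 21 electors recovers the 37, 42, 47 of the text.

```python
from itertools import permutations

# Borda's twenty-one electors: (number of electors, preference-order, best first)
PROFILE = [(1, "ABC"), (7, "ACB"), (7, "BCA"), (6, "CBA")]


def borda_by_points(profile):
    """A voter gives a candidate k points when k opponents are ranked below him."""
    scores = {c: 0 for c in profile[0][1]}
    for n, order in profile:
        for pos, c in enumerate(order):
            scores[c] += n * (len(order) - 1 - pos)
    return scores


def borda_by_pairwise(profile):
    """Sum, over all opponents, of the votes a candidate receives in each pairwise vote."""
    cands = profile[0][1]
    tally = {(x, y): sum(n for n, o in profile if o.index(x) < o.index(y))
             for x, y in permutations(cands, 2)}
    return {x: sum(tally[x, y] for y in cands if y != x) for x in cands}


points = borda_by_points(PROFILE)
print(points)                                  # {'A': 16, 'B': 21, 'C': 26}
print(points == borda_by_pairwise(PROFILE))    # True
print({c: s + 21 for c, s in points.items()})  # {'A': 37, 'B': 42, 'C': 47}
```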
The Borda-scores may be used to determine two different outputs: a collective rank-ordering among the candidates, the higher-placed candidates having the higher scores, called the Borda-ranking, and a Borda-winner, a candidate with the highest Borda-score (there may be several of each when there are ties). For Borda's example the Borda-ranking is $C \succ_S B \succ_S A$ (the subscript S is systematically used to indicate a society's or a jury's ranking, whatever method is used to establish it). Borda's method has continued to exert a strong appeal to theorists down to the present day, though it has seen very limited use in voting (it is used, together with an ad hoc procedure to break ties, to rank marching bands in Texas; see chapter 7).
On the other hand, beware: not every set of pairwise tallies results from a profile of preference-orders; moreover, a set of pairwise tallies may devolve from more than one profile. The seemingly innocent set of pairwise votes shown in table 3.1 corresponds to no profile of rational preferences. To see why, note that the 80% of the voters who prefer A to C have one of the preference-orders $A \succ B \succ C$, $B \succ A \succ C$, or $A \succ C \succ B$; at most 39% have the first two preference-orders since they all prefer B to C; therefore at least 41% must have the last preference-order. But then at least 41% prefer A to B, whereas table 3.1 shows only 40% doing so, a contradiction. Thus a numerical example of pairwise votes that is not accompanied by a profile of preferences is per se suspicious.

Of course, such pairwise votes could occur in practice, but that would mean that some electors had voted irrationally or strategically, for example, for A against B, for B against C, but also for C against A.
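For three candidates the question "does some profile produce these tallies?" can be settled by a small exhaustive search. The sketch below is an illustration under an explicit assumption: it looks only for integer profiles of exactly n voters (reading the percentages of table 3.1 as counts out of 100), whereas the argument in the text holds for any proportions. The function name `realize_three` is hypothetical.

```python
from itertools import product

ORDERS = ("ABC", "ACB", "BAC", "BCA", "CAB", "CBA")


def realize_three(ab, ac, bc, n):
    """Look for n voters' preference-orders over A, B, C with the given pairwise
    tallies: ab voters put A above B, ac put A above C, bc put B above C.
    Returns one profile (order -> count) or None if no integer profile exists."""
    for abc, acb in product(range(n + 1), repeat=2):
        bac = ac - abc - acb          # A over C: ABC + ACB + BAC
        cab = ab - abc - acb          # A over B: ABC + ACB + CAB
        bca = bc - abc - bac          # B over C: ABC + BAC + BCA
        cba = n - (abc + acb + bac + bca + cab)
        counts = (abc, acb, bac, bca, cab, cba)
        if min(counts) >= 0:
            return dict(zip(ORDERS, counts))
    return None


# Table 3.1 read as counts out of 100 voters: 40 A over B, 80 A over C, 39 B over C.
print(realize_three(40, 80, 39, 100))   # None
# Borda's twenty-one electors: 8 for A over B, 8 for A over C, 8 for B over C.
print(realize_three(8, 8, 8, 21))
```

The second call returns a profile different from Borda's own (1, 7, 7, 6 electors), which illustrates that the same tallies may devolve from more than one profile.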
In summary, Cusanus and Borda count the number of votes of a candidate
in all paired confrontations, whereas Llull and Copeland count the number of
wins in all paired confrontations.
It seems surprising that Llull never noticed what is today known as the
Condorcet paradox. Condorcet gave many examples of this paradox including
one with sixty electors:
23 : A ≻ B ≻ C
2 : B ≻ A ≻ C
17 : B ≻ C ≻ A
10 : C ≻ A ≻ B
8 : C ≻ B ≻ A.
Table 3.1
Pairwise Votes Corresponding to No Profile of Preference-Orders
(entry in row X, column Y: percentage of voters preferring X to Y)
vs.   A      B      C
A     –      40%    80%
B     60%    –      39%
C     20%    61%    –
When A is pitted against B, she wins with 33 votes to B's 27, so this society of electors ranks A ahead of B, or $A \succ_S B$; when B stands alone against C, he receives 42 votes to C's 18, so the society ranks B higher than C, or $B \succ_S C$; and when C runs against A, she amasses 35 votes to A's 25, so $C \succ_S A$. In short, by Llull's and Condorcet's criterion of pair-by-pair comparisons, a society S may have no unchallenged winner and no consistent rank-ordering of the candidates. In symbols, $A \succ_S B \succ_S C \succ_S A$. Not surprisingly, the search for a method that always elects a Condorcet-winner, when he exists, has been the aim of many. Llull (and Copeland after him) proposed exactly such a method, resolving any remaining ties among candidates having the same number of wins by lottery. There have been others as well.
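The cycle in Condorcet's sixty-elector example, and the resulting three-way tie in Llull's (Copeland's) count of wins, can be reproduced with a few lines. This is a minimal sketch with hypothetical helper names; orders are strings, best first, and exact pairwise ties are simply skipped.

```python
from itertools import combinations

# Condorcet's sixty electors: (number of electors, preference-order, best first)
PROFILE = [(23, "ABC"), (2, "BAC"), (17, "BCA"), (10, "CAB"), (8, "CBA")]


def prefer(profile, x, y):
    """Number of electors who rank x above y."""
    return sum(n for n, o in profile if o.index(x) < o.index(y))


def majority_duels(profile):
    """(winner, loser, winner's votes, loser's votes) for every pair;
    exact ties are skipped."""
    duels = []
    for x, y in combinations(profile[0][1], 2):
        vx, vy = prefer(profile, x, y), prefer(profile, y, x)
        if vx > vy:
            duels.append((x, y, vx, vy))
        elif vy > vx:
            duels.append((y, x, vy, vx))
    return duels


def copeland_scores(profile):
    """Llull's (and Copeland's) count: number of pairwise majority wins."""
    wins = {c: 0 for c in profile[0][1]}
    for winner, _, _, _ in majority_duels(profile):
        wins[winner] += 1
    return wins


for duel in majority_duels(PROFILE):
    print("%s beats %s, %d to %d" % duel)
print(copeland_scores(PROFILE))   # {'A': 1, 'B': 1, 'C': 1}
```

Every candidate wins exactly one duel, so Llull's method must fall back on its lottery here.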
Condorcet discovered that Borda's method could elect a candidate other than the Condorcet-winner and so disdainfully rejected it. An example of this possibility may be seen in tables 3.2a and 3.2b. Borda's method elects C for every valid $\varepsilon$ (namely, $0 \le \varepsilon \le 2$), whereas for $1 < \varepsilon \le 2$ the Condorcet-winner is A.

Had Condorcet looked more closely, he might have noticed another damning but more subtle property of the method. Borda's method determines C to be the winner and $C \succ_S B \succ_S A \succ_S D$ to be society's rank-ordering (for any value of $\varepsilon$ in the interval). But if candidate D withdraws, Borda's method makes A the winner and determines society's rank-ordering to be the exact opposite of what it was alleged to be before: $A \succ_S B \succ_S C$. This strange and unacceptable behavior is Arrow's paradox. A winner and society's rank-ordering of any set of candidates should not change when any other candidate enters or leaves the fray. A good procedure for ranking should be (in the jargon of voting theory) independent of irrelevant alternatives (IIA): society's ranking between any two candidates and its designation of a winner should remain the same in the presence or absence of any other candidates. This is a particularly real concern in sports, where tentative standings are announced throughout the competition (see chapter 7).

A Condorcet-winner, when he exists, remains a Condorcet-winner whatever other candidates withdraw, so that mechanism satisfies the IIA condition. This is another major reason that theoreticians defend Condorcet.
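Both claims about tables 3.2a and 3.2b (Borda's reversal when D withdraws, and A becoming the Condorcet-winner once $\varepsilon > 1$) can be checked directly from the profile of table 3.2a. In this sketch `eps` stands for $\varepsilon$, weights are percentages, and `fractions.Fraction` keeps the arithmetic exact; the helper names are hypothetical.

```python
from fractions import Fraction


def table_3_2a(eps):
    """Table 3.2a as (percentage of voters, preference-order), 0 <= eps <= 2."""
    return [(eps, "ABCD"), (4, "ABDC"), (5, "ACBD"), (29, "ACDB"),
            (3, "ADBC"), (3, "ADCB"), (2 - eps, "BACD"), (2, "BADC"),
            (32, "BCDA"), (6, "BDAC"), (5, "CABD"), (4, "CBAD"), (5, "CBDA")]


def restrict(profile, cands):
    """Profile obtained when every candidate not in cands withdraws."""
    return [(w, "".join(c for c in o if c in cands)) for w, o in profile]


def prefer(profile, x, y):
    """Percentage of voters ranking x above y."""
    return sum(w for w, o in profile if o.index(x) < o.index(y))


def borda_ranking(profile):
    """Candidates by decreasing Borda-score (k points for k opponents below)."""
    scores = {c: 0 for c in profile[0][1]}
    for w, o in profile:
        for pos, c in enumerate(o):
            scores[c] += w * (len(o) - 1 - pos)
    return "".join(sorted(scores, key=scores.get, reverse=True))


def condorcet_winner(profile):
    """The candidate beating every other by a majority, or None."""
    cands = profile[0][1]
    for x in cands:
        if all(prefer(profile, x, y) > prefer(profile, y, x)
               for y in cands if y != x):
            return x
    return None


for eps in (Fraction(1, 2), Fraction(3, 2)):
    p = table_3_2a(eps)
    print(eps, borda_ranking(p), borda_ranking(restrict(p, "ABC")),
          condorcet_winner(p))
# 1/2 CBAD ABC None
# 3/2 CBAD ABC A
```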
Table 3.2a
Preference-Profile, 0 ≤ ε ≤ 2
ε%    4%    5%    29%   3%    3%    2−ε%  2%    32%   6%    5%    4%    5%
A     A     A     A     A     A     B     B     B     B     C     C     C
B     B     C     C     D     D     A     A     C     D     A     B     B
C     D     B     D     B     C     C     D     D     A     B     A     D
D     C     D     B     C     B     D     C     A     C     D     D     A
Table 3.2b
Pairwise Votes for Example of Table 3.2a
(entry in row X, column Y: percentage of votes for X against Y)
vs.   A       B       C      D      Borda-Score
A     –       49+ε%   54%    57%    160+ε
B     51−ε%   –       49%    65%    165−ε
C     46%     51%     –      82%    179
D     43%     35%     18%    –      96
10% : A ≻ B ≻ C
25% : A ≻ C ≻ B
25% : B ≻ A ≻ C
40% : C ≻ B ≻ A.
The Borda-scores are 95 for A, 100 for B, and 105 for C; A is eliminated and C wins. But if B withdraws, A is the winner. Donald Saari advocates a slightly different method that he calls "instant-Borda-runoff": "This means (as suggested by E. J. Nanson) that the candidates are first ranked with a Borda-score; the bottom candidate is dropped, and the remaining candidates are reranked with the Borda-score. This continues until one candidate – the winner – remains" (2001b, 103). The following example shows that the methods are different:
10 : A ≻ B ≻ D ≻ C
11 : C ≻ A ≻ D ≻ B
10 : B ≻ D ≻ C ≻ A.
Nanson's rule first eliminates C and D, and elects A. Saari's rule first eliminates D, then B, and elects C. Instant-Borda-runoff also always elects a Condorcet-winner, when he exists, for the same reason, and it also fails independence of irrelevant alternatives. Why one of these methods should be used rather than the other is unclear and not explained.
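The difference between the two elimination schemes is easy to see in code. In this sketch (an illustration, not either author's implementation; function names are hypothetical), `nanson` drops every candidate whose Borda-score is below the average at each step, while `instant_borda_runoff` drops only the single lowest; ties for lowest are broken alphabetically, which is purely an assumption of the sketch.

```python
def borda(profile, cands):
    """Borda-scores among the candidates in cands (the others are ignored)."""
    scores = {c: 0 for c in cands}
    for w, order in profile:
        kept = [c for c in order if c in cands]
        for pos, c in enumerate(kept):
            scores[c] += w * (len(kept) - 1 - pos)
    return scores


def nanson(profile):
    """Repeatedly drop every candidate whose Borda-score is below the average."""
    cands = set(profile[0][1])
    while len(cands) > 1:
        s = borda(profile, cands)
        total = sum(s.values())
        below = {c for c in cands if s[c] * len(cands) < total}
        if not below:                 # all scores equal: leave the tie
            break
        cands -= below
    return "".join(sorted(cands))


def instant_borda_runoff(profile):
    """Repeatedly drop the candidate with the lowest Borda-score
    (ties for lowest broken alphabetically, an arbitrary assumption)."""
    cands = set(profile[0][1])
    while len(cands) > 1:
        s = borda(profile, cands)
        cands.remove(min(cands, key=lambda c: (s[c], c)))
    return cands.pop()


EXAMPLE = [(10, "ABDC"), (11, "CADB"), (10, "BDCA")]
print(borda(EXAMPLE, "ABCD"))                            # {'A': 52, 'B': 50, 'C': 43, 'D': 41}
print(nanson(EXAMPLE), instant_borda_runoff(EXAMPLE))    # A C
```

With the average of 46.5, both C and D fall below it at Nanson's first step, whereas instant-Borda-runoff removes only D and then finds B lowest among the remaining three.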
Australia uses a similar system, the alternative vote, to elect the members
of its House of Representatives (see chapter 2). This procedure behaves very
badly. To begin, consider the following example in which A is eliminated and
B wins:
4% : A ≻ B ≻ C
47% : B ≻ A ≻ C
49% : C ≻ A ≻ B.

4% : A ≻ B ≻ C
14% : C ≻ B ≻ A
28% : A ≻ C ≻ B
16% : C ≻ A ≻ B
38% : B ≻ C ≻ A.
C is eliminated in the first round, and B defeats A in the second round. But if
the 4% move B to first place, producing the profile
4% : B ≻ A ≻ C
14% : C ≻ B ≻ A
28% : A ≻ C ≻ B
16% : C ≻ A ≻ B
38% : B ≻ C ≻ A,
Table 3.3a
Preference-Profile: B the Nanson and Instant-Borda-Runoff Winner
22%   11%   11%   23%   22%   11%
A     A     B     B     C     C
B     D     A     C     A     B
D     C     D     D     D     D
C     B     C     A     B     A
Table 3.3b
Preference-Profile: A the Nanson and Instant-Borda-Runoff Winner
22%   11%   11%   23%   22%   11%
A     A     B     B     C     C
B     B     A     C     A     B
D     D     D     D     D     D
C     C     C     A     B     A
then A is eliminated in the first round, and C defeats B in the second round. The same example shows that the French system is not monotonic. Interpreting it in the opposite direction shows that the alternative vote and the French system are both manipulable: they elect C with the second preference-profile, but by lowering B from first to second place, the 4% who put C in the last position can make B the winner. For example, in the French 2007 election, if Sarkozy's first-round vote had sufficiently increased at the expense of Royal's, then Bayrou would have been his opponent in the second round, and Sarkozy would have lost (according to the overwhelming evidence).
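The non-monotonicity of the alternative vote can be replayed on the two five-group profiles above. This is a minimal sketch with a hypothetical function name; orders are strings, best first, weights are percentages, and ties in first places are broken arbitrarily by Python's `max`/`min`. With three candidates and no first-round majority it also reproduces the outcome of the French two-round system.

```python
from collections import Counter


def alternative_vote(profile):
    """Repeatedly eliminate the candidate with the fewest first places until
    someone has a majority (ties broken arbitrarily by max/min)."""
    cands = set(profile[0][1])
    total = sum(w for w, _ in profile)
    while True:
        firsts = Counter({c: 0 for c in cands})
        for w, order in profile:
            firsts[next(c for c in order if c in cands)] += w
        leader = max(firsts, key=firsts.get)
        if 2 * firsts[leader] > total or len(cands) == 1:
            return leader
        cands.remove(min(firsts, key=firsts.get))


BEFORE = [(4, "ABC"), (14, "CBA"), (28, "ACB"), (16, "CAB"), (38, "BCA")]
# The 4% move B to first place
AFTER = [(4, "BAC"), (14, "CBA"), (28, "ACB"), (16, "CAB"), (38, "BCA")]
print(alternative_vote(BEFORE), alternative_vote(AFTER))   # B C
```

Raising B in some ballots turns B from winner into loser.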
Nanson's method and instant-Borda-runoff are not monotonic either; the example of tables 3.3a and 3.3b shows that a winner, if ranked higher, may lose. Suppose the preference-profile is as given in table 3.3a. D's Borda-score of 111 is the only one below the average of 150, so D is eliminated; among the remaining three, A's 99 is the only score below the average of 100, so A is eliminated; thus with either method B defeats C.

Now suppose that B becomes the second-ranked candidate of all those who had ranked the candidates $A \succ D \succ C \succ B$, so that the preference-orders become those of table 3.3b. Then D and C, with Borda-scores of 100 and 145, are eliminated at the first step with Nanson's method, and A wins against poor B, whose only fault was to have become more attractive to the electorate. Instant-Borda-runoff gives the same result: it first eliminates D, then C, and again A defeats B. Thus neither method is monotonic.
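The Borda-scores quoted for tables 3.3a and 3.3b, and the switch of winner from B to A, can be verified with the same two elimination schemes. The sketch repeats the hypothetical helpers so that it runs on its own; alphabetical tie-breaking in `instant_borda_runoff` is again an assumption that plays no role here.

```python
TABLE_3_3A = [(22, "ABDC"), (11, "ADCB"), (11, "BADC"),
              (23, "BCDA"), (22, "CADB"), (11, "CBDA")]
# Table 3.3b: the 11% with A > D > C > B now report A > B > D > C
TABLE_3_3B = [(22, "ABDC"), (11, "ABDC"), (11, "BADC"),
              (23, "BCDA"), (22, "CADB"), (11, "CBDA")]


def borda(profile, cands):
    """Borda-scores among the candidates in cands (the others are ignored)."""
    scores = {c: 0 for c in cands}
    for w, order in profile:
        kept = [c for c in order if c in cands]
        for pos, c in enumerate(kept):
            scores[c] += w * (len(kept) - 1 - pos)
    return scores


def nanson(profile):
    """Repeatedly drop every candidate whose Borda-score is below the average."""
    cands = set(profile[0][1])
    while len(cands) > 1:
        s = borda(profile, cands)
        below = {c for c in cands if s[c] * len(cands) < sum(s.values())}
        if not below:
            break
        cands -= below
    return "".join(sorted(cands))


def instant_borda_runoff(profile):
    """Repeatedly drop the lowest Borda-score (ties broken alphabetically)."""
    cands = set(profile[0][1])
    while len(cands) > 1:
        s = borda(profile, cands)
        cands.remove(min(cands, key=lambda c: (s[c], c)))
    return cands.pop()


for name, table in (("3.3a", TABLE_3_3A), ("3.3b", TABLE_3_3B)):
    print(name, borda(table, "ABCD"), nanson(table), instant_borda_runoff(table))
```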
An alternative interpretation is that the 11% of voters whose true preference-orders are $A \succ D \succ C \succ B$ manipulate by sending a message other than their true order of preference, in which their least preferred candidate is moved up to second place, and thus they elect their first choice A. When a method is not monotonic, there is an opportunity to manipulate. A method that admits Arrow's paradox creates still more transparent opportunities to manipulate.
3.2 IIA and Arrow's Impossibility Theorem
Perhaps the most astonishing aspect of the theory of voting is that despite Arrow's celebrated impossibility theorem and others, the search for a best method of election has continued within the basic framework of the model first conceived by Llull and Cusanus more than seven centuries ago, namely, How should a voting system rank all candidates (or determine a winner) given every individual's ranking of them?
The idea that a Condorcet-winner, when she exists, must be the winner has become well-nigh universally accepted dogma. In many situations a Condorcet-winner exists, but sometimes, as in the previous Condorcet example and in the examples of tables 3.3a and 3.3b, there is no such candidate. There are real examples where none exists as well, though identifying them is not a simple matter because of the lack of information concerning the preference-orders of voters (see chapter 2).
Any two candidates may be compared with the majority-rule: when a candidate A is higher than a candidate B in a majority of the voters' preferences, then A should be ranked higher than B, that is, $A \succ_S B$. The Condorcet paradox is simply an instance showing that the majority-rule does not always give a transitive rank-ordering. Table 3.3a is an example: $A \succ_S B \succ_S C \succ_S A$; there is no Condorcet-winner, but each of the candidates A, B, and C defeats D with the majority-rule. Call the majority-rule-ranking the binary relation for which $A \succ_S B$ when a candidate A is ranked higher than a candidate B by a majority of the voters. In the example of tables 3.2a and 3.2b, the majority-rule-ranking happens to be transitive for $1 < \varepsilon \le 2$: $A \succ_S C \succ_S B \succ_S D$, and $X \succ_S Y$ if X is a candidate to the left of Y in the given order. There are preference-profiles that have a Condorcet-winner but for which the majority-rule-ranking is not transitive.
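The majority-rule-ranking is just a set of ordered pairs, so its transitivity can be tested directly. The sketch below (hypothetical names; orders as strings, best first; weights are the percentages of table 3.3a) builds the relation and confirms the cycle among A, B, and C while each of them defeats D.

```python
from itertools import permutations

TABLE_3_3A = [(22, "ABDC"), (11, "ADCB"), (11, "BADC"),
              (23, "BCDA"), (22, "CADB"), (11, "CBDA")]


def majority_relation(profile):
    """Pairs (x, y) such that more voters rank x above y than y above x."""
    def prefer(x, y):
        return sum(w for w, o in profile if o.index(x) < o.index(y))
    return {(x, y) for x, y in permutations(profile[0][1], 2)
            if prefer(x, y) > prefer(y, x)}


def is_transitive(relation, cands):
    """True if x > y and y > z always imply x > z."""
    return all((x, z) in relation
               for x, y, z in permutations(cands, 3)
               if (x, y) in relation and (y, z) in relation)


rel = majority_relation(TABLE_3_3A)
print(sorted(rel))
print(is_transitive(rel, "ABCD"))   # False
```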
Arrow attacked the problem via an entirely new, axiomatic approach but in the terms of the same basic model. There are many alternative, essentially equivalent statements of Arrow's theorem. One of the points that varies concerns strict preferences $A \succ B$ versus weak preferences $A \succeq B$, that is, $A \succ B$ or $A \sim B$, the latter meaning A and B are tied in a voter's preferences. Heretofore, strict preferences of the inputs have been implicit, and the possibility of ties in the outputs has largely been ignored. Several candidates may be tied for the highest or any Borda-score, so that there may be several possible Borda-winners and several possible Borda-rankings; and every other method that has been presented may result in ties. For the sake of simplicity the following account assumes strict preferences of inputs and outputs, and that outputs are unique, so some arbitrary tie-breaking rule breaks ties (e.g., the lexicographic order of the names of candidates decides). But it must be understood that Arrow's theorem and the theorems and proofs that follow are essentially the same if weak preferences are used instead and outputs are multiple.
Arrow assumes every voter's input message is a preference-order that is exactly the expression of the voter's preferences (thus excluding any strategic considerations) and that this message is unaffected by the nature of the decision process itself. A ranking function maps the voters' inputs into an output that is society's rank-ordering for a fixed set of candidates (in the literature, a ranking function is often called a social welfare function). Arrow considers the set of all possible ranking functions and reasons that to be satisfactory such functions must behave properly:
1. There must exist a solution for every possible preference-profile. The function's domain of profiles is unrestricted.3
2. When every voter i prefers A to B, $A \succ_i B$, then so does society, $A \succ_S B$. The function must respect a unanimous decision (in the literature, this property is also called Pareto optimality).
3. Whether society places a candidate A higher or lower than a candidate B can depend only on the relative positions of A and B in the preference-orders of the voters. This is the independence of irrelevant alternatives (IIA) property.
4. No one voter's preference-order can always determine society's rank-ordering whatever the preference-orders of all the others. The function must be nondictatorial.
Arrow's Impossibility Theorem  There is no ranking function that satisfies properties (1)–(4) when there are at least three candidates.

Alternatively, the only system that satisfies properties (1), (2), and (3) is dictatorial: one voter's preference-order determines society's preference-order.
The truth of this result has been repeatedly observed for all the proposals discussed so far: each fails to satisfy at least one of the properties. The majority-rule-ranking fails transitivity when the domain is unrestricted; all the others fail IIA. This is a very clean and sparse result. It asks only that three properties be satisfied for all possible profiles, and that is too much. And yet there are other important properties that good ranking functions should satisfy as well, notably, monotonicity and resistance to strategic manipulation.

3. In the case of $\succ$ it suffices to assume that all profiles of strict preferences are possible.
Proof  The clever proof that follows is surprisingly short and easy (Geanakoplos 2005). First, let B be any candidate, and take a preference-profile in which every voter places B either at the top or at the bottom. Then, it will be seen, a function satisfying properties (2) and (3) must place B either at the top or at the bottom. For, if not, there are different candidates A and C for which $A \succ_S B$ and $B \succ_S C$. Suppose every voter changes by placing C above A while keeping B either first or last. Then IIA implies $A \succ_S B$ and $B \succ_S C$ for the new profile as well (since the voters' orders between A and B, and between B and C, remain the same). But for the new profile unanimous decision implies $C \succ_S A$, contradicting the transitivity of the output, society's preference-order.4
Second, there is a voter who by changing his preference-order can, for some profile, move a candidate B from the bottom to the top of society's preference-order. Take any profile in which every voter's input has B at the bottom. By unanimous decision B must be at the bottom of the output, society's preference-order. Change the voters' inputs one by one, moving B from the bottom to the top of their lists, until some voter $i = i(B)$ causes B to change position in the output (this must happen sometime because when all voters change, unanimous decision would place B first in the output). Call $\Phi$ the profile before i's change and $\Phi'$ the profile after i's change. Since B changed from the bottom of the output with the profile $\Phi$, and B is everywhere at the top or at the bottom in $\Phi'$, the first argument shows that B must be at the top of the output with profile $\Phi'$ and at the bottom of the output with profile $\Phi$.
Third, the voter $i = i(B)$ is a dictator concerning all pairs of candidates A, C other than B. To see that i can cause the output to be $A \succ_S C$, construct the profile $\Phi''$ from $\Phi'$ as follows: i moves A above B, so that i's input satisfies $A \succ_i B \succ_i C$; all other voters determine in any way they wish the order between A and C but leave B at the top or bottom position. IIA implies $A \succ_S B$ because all the voters' orders between A and B are as in $\Phi$ (where i and society had B at the bottom). IIA also implies $B \succ_S C$ because all the voters' orders between B and C are as in $\Phi'$ (where i and society had B at the top). Transitivity of the output implies $A \succ_S C$. But by IIA this implies that $A \succ_S C$ holds whenever $A \succ_i C$.

4. The argument for $\succeq$ instead of $\succ$ in the inputs is the same. Suppose not, i.e., $A \succ_S B$ and $B \succ_S C$; place $C \succ_i A$ for all voters i; so by transitivity $A \succ_S C$, and by unanimous decision $C \succ_S A$. Similar changes may be made in the next several steps of the proof as well. But Pareto optimality (unanimity) must be extended to indifferences, and the concept of dictatorship must be modified as follows. In comparing any two candidates, there is a sequence of dictators D: the first decides; if he is indifferent, the second decides; if he is indifferent, the third decides; . . . ; if all D are indifferent, so is society.
Fourth, the voter $i = i(B)$ is a dictator concerning B and any other candidate A as well. Let C be some third candidate, and let C take on B's role in the second argument. The third argument shows there must be a voter $j = j(C)$ who is a dictator for all pairs of candidates D, E other than C, in particular A, B. But i alone changed the output from $A \succ_S B$ to $B \succ_S A$ in going from $\Phi$ to $\Phi'$, so j must be i. ∎

Thus there is no method that satisfies all four conditions.
Designate by $\mathcal{C}$ the set of candidates and by $\mathcal{C}'$ any subset of the candidates, $\mathcal{C}' \subseteq \mathcal{C}$. A ranking rule F associates to each profile $\Phi^{\mathcal{C}'}$ on candidates $\mathcal{C}'$ society's rank-ordering of the candidates $\mathcal{C}'$. Unlike a ranking function, a ranking rule is defined for any subset of candidates. Arrow's theorem may be expressed in terms of ranking rules.
When $\Phi$ is the preference-profile, $\succ_i$ is the rank-order of voter i. Replace the unanimous decision property by

2′. When every voter i has the same rank-order, $\succ_i = \succ$, over the candidates $\mathcal{C}$, then society has the rank-order $\succ$.

When $\succ_i = \succ_i^{\mathcal{C}}$ is the rank-order of i over all the candidates $\mathcal{C}$ and $\mathcal{C}' \subseteq \mathcal{C}$, then $\succ_i^{\mathcal{C}'}$ is the rank-order obtained by simply dropping the candidates that are not in $\mathcal{C}'$; and $\Phi^{\mathcal{C}'}$ is the corresponding profile.
Reinterpret the IIA condition as a consistency property:

3′. $F(\Phi^{\mathcal{C}}) = \succ_S^{\mathcal{C}}$ implies $F(\Phi^{\mathcal{C}'}) = \succ_S^{\mathcal{C}'}$ (the restriction of $\succ_S^{\mathcal{C}}$ to $\mathcal{C}'$), that is, the order in the output among the candidates $\mathcal{C}'$ alone agrees with the order in the output among all the candidates $\mathcal{C}$. In other words, the ranking between two candidates is not changed by the presence or the absence of another candidate. This is the IIA property that is often violated in practice.
There is no ranking rule that satisfies (1), (2′), (3′), and (4) when there are at least three candidates.

This version is immediate because (3′) implies (3), and (2′) and (3′) imply (2).
A choice rule f has as input a preference-profile $\Phi^{\mathcal{C}'}$ and as output a winner $C \in \mathcal{C}'$. As before, it is defined for any subset of candidates. A choice function is defined on a fixed set of candidates (in the literature, a choice function is usually called a social choice function). Several candidates may, of course, be tied as winner, in theory and in practice. In practice generically unique rules and functions are sought: outputs are unique save for an exceptional, very small set of profiles. In practice some additional rule specifies the winner among the tied candidates (ties are rare when first-past-the-post or Borda is used in a large electorate but matter when the electorate is small). France, for example, in a law of 1999 established by a left-leaning government, broke ties among conseillers régionaux (representatives in regional assemblies) by electing the youngest candidate; a right-leaning government changed this in 2003 to electing the oldest candidate.
In all the examples of Arrow's paradox the withdrawal of a nonwinning candidate changed the winner. The notion of a rule (versus a function) has been introduced precisely to analyze this property. Another version of Arrow's theorem shows that this is true of all mechanisms. Replace the unanimous decision property by

2″. When every voter i ranks a candidate C first, society declares C the winner.

Reinterpret the IIA condition as

Proof  To see this, suppose that there was an f that satisfied the conditions. Then define the ranking rule F recursively, by putting the winner of f first, the winner of f among the remaining candidates second, . . . , or more precisely,

$$F(\Phi) = \{C_1 \succ C_2 \succ \cdots \succ C_n\}, \quad \text{where } C_i = f(\Phi^{\mathcal{C}_i}) \text{ and } \mathcal{C}_i = \mathcal{C} \setminus \{C_1, \ldots, C_{i-1}\}.$$

As Arrow suggested, "Knowing the social choices [that is, the choice function] made in pairwise comparisons in turn determines the entire social ordering and therewith the social choice function [that is, the ranking function] C(S) for all possible environments" (1951, 28).

It is straightforward to verify that F satisfies properties (1) and (2′). To prove that (3′) holds, it is shown that if a candidate $C_k$ is dropped, the order among the remaining candidates stays the same. If $C_k$ is dropped, property (3″) implies
Figure 3.1
Single-peaked preferences. (Utility plotted against candidates arranged on a line; the peaks of voters 1, 2, and 3 are marked.)
median-voter and all voters whose preferred candidates are to the right of C ;
and if C is to the right of C , then C obtains the votes of the median-voter and
all voters to the left of C .
Essentially the same reasoning establishes the transitivity of the majority-rule-ranking, because if any candidate is dropped, the voters' preference-orders among the remaining candidates are still single-peaked, so if the Condorcet-winner is put aside there is a (second) Condorcet-winner among the remaining candidates; continuing, the majority-rule-ranking is found. Notice, however, that this argument fails when there is an even number of voters; for example, if

50% : A ≻ B ≻ C
50% : B ≻ C ≻ A,

then by majority-rule A and B are tied, $A \sim_S B$, and also A and C are tied, $A \sim_S C$, but B defeats C, $B \succ_S C$, so the majority-rule-ranking when ties are possible (with $A \succeq_S B$ meaning either $A \succ_S B$ or $A \sim_S B$) is not transitive because $A \succeq_S B \succeq_S C \succeq_S A$ (in particular $C \succeq_S A \succeq_S B$, yet $B \succ_S C$). In an election with thousands or millions of voters such ties with the majority-rule are so unlikely that they may reasonably be dismissed (as is done in a model that follows), so Black's result may be accepted in general; it is true generically.
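The failure with an even split can be exhibited by listing the weak majority relation (pairs where x gets at least as many votes as y) and searching for triples that break transitivity. A minimal sketch with hypothetical names; orders are strings, best first.

```python
from itertools import permutations

# Two equal halves with single-peaked preferences on the line A-B-C
PROFILE = [(50, "ABC"), (50, "BCA")]


def weak_majority(profile):
    """Pairs (x, y) such that at least as many voters rank x above y as y above x."""
    def prefer(x, y):
        return sum(w for w, o in profile if o.index(x) < o.index(y))
    return {(x, y) for x, y in permutations(profile[0][1], 2)
            if prefer(x, y) >= prefer(y, x)}


rel = weak_majority(PROFILE)
violations = [(x, y, z) for x, y, z in permutations("ABC", 3)
              if (x, y) in rel and (y, z) in rel and (x, z) not in rel]
print(violations)   # [('C', 'A', 'B')]: C ~ A and A ~ B, yet B beats C
```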
Is it reasonable to believe that there are elections among candidates that satisfy Black's condition? The stereotyped classical political cleavage of left versus right is the obvious possibility, but the empirical work on spatial voting models has clearly discarded the idea that one dimension suffices (e.g., Enelow and Hinich 1984). A recent electoral experiment shows that the condition is far from satisfied in practice (see chapter 6), although the data yield a probabilistic sense of left to right. Black himself had doubts, for he advocated another, hybrid method. Black's method: "[The] Condorcet criterion should first be used to pick out the majority candidate if there is one; and if no majority candidate exists, that candidate should be chosen who has the highest Borda-score" (1958, 66).
Black's method does not behave continuously: small changes in the messages of the voters or their numbers can lead to big abrupt changes in the results. Specifically, suppose f is either a choice function or a ranking function and $f(P_i) = \omega$ for every preference-profile $P_i$ in a sequence approaching the profile P, $P_i \to P$. Then $f(P) = \omega$ if f is continuous. In the example of tables 3.2a and 3.2b, the Borda-ranking is $C \succ_S B \succ_S A \succ_S D$ whenever $0 \le \varepsilon \le 2$. When $0 \le \varepsilon \le 1$ there is no Condorcet-winner, and when $1 < \varepsilon \le 2$ the majority-rule-ranking is transitive with $A \succ_S C \succ_S B \succ_S D$. Therefore Black's method suddenly vaults A from third place to first place as $\varepsilon$ increases from below 1 to above 1. Any hybrid method is likely to exhibit this type of behavior. The two methods are almost opposed: the Condorcet-winner is resolutely in the next-to-last position in the Borda-ranking. This is no isolated phenomenon. Suppose there are $m + 1 \ge 4$ candidates and $2m + 1$ voters with the preference-profile

$$\begin{aligned}
m + 1 &: A_1 \succ A_2 \succ \cdots \succ A_{m-1} \succ A_m \succ A_{m+1}\\
m &: A_m \succ A_{m-1} \succ \cdots \succ A_2 \succ A_{m+1} \succ A_1.
\end{aligned}$$
Then $A_1$ is the Condorcet-winner but next-to-last in the Borda-ranking (when $m + 1 = 4$, for instance, the Borda-scores of $A_2$, $A_3$, $A_1$, and $A_4$ are 14, 13, 12, and 3).
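Both phenomena can be checked numerically: the jump of Black's method as $\varepsilon$ crosses 1 in table 3.2a, and the position of $A_1$ in the Borda-ranking of the $(m+1)$-candidate family. The sketch below is only an illustration; `family(m)` is a hypothetical helper that builds the two groups of voters, orders are sequences with the best candidate first, and `fractions.Fraction` keeps $\varepsilon$ exact.

```python
from fractions import Fraction


def table_3_2a(eps):
    """Table 3.2a as (percentage of voters, preference-order), 0 <= eps <= 2."""
    return [(eps, "ABCD"), (4, "ABDC"), (5, "ACBD"), (29, "ACDB"),
            (3, "ADBC"), (3, "ADCB"), (2 - eps, "BACD"), (2, "BADC"),
            (32, "BCDA"), (6, "BDAC"), (5, "CABD"), (4, "CBAD"), (5, "CBDA")]


def prefer(profile, x, y):
    return sum(w for w, o in profile if o.index(x) < o.index(y))


def borda(profile):
    """Borda-scores: k points for k opponents ranked below."""
    scores = {c: 0 for c in profile[0][1]}
    for w, o in profile:
        for pos, c in enumerate(o):
            scores[c] += w * (len(o) - 1 - pos)
    return scores


def condorcet_winner(profile):
    cands = profile[0][1]
    for x in cands:
        if all(prefer(profile, x, y) > prefer(profile, y, x)
               for y in cands if y != x):
            return x
    return None


def black(profile):
    """Black's method: the Condorcet-winner if there is one, else the Borda-winner."""
    winner = condorcet_winner(profile)
    if winner is None:
        scores = borda(profile)
        winner = max(scores, key=scores.get)
    return winner


# The jump around eps = 1 in tables 3.2a and 3.2b
print(black(table_3_2a(Fraction(99, 100))), black(table_3_2a(Fraction(101, 100))))  # C A


def family(m):
    """m + 1 voters: A1 > A2 > ... > A(m+1); m voters: Am > ... > A2 > A(m+1) > A1."""
    names = ["A%d" % k for k in range(1, m + 2)]
    return [(m + 1, names), (m, names[m - 1:0:-1] + [names[m], names[0]])]


for m in range(3, 7):
    p = family(m)
    scores = borda(p)
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(m + 1, condorcet_winner(p), ranking.index("A1") + 1, "of", m + 1)
```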
Black's restriction to single-peaked preference-profiles naturally led to the question, What are not only sufficient but also necessary restrictions on the preference-profiles of a society to guarantee the existence of a Condorcet-winner and/or a transitive majority-rule-ranking? The essence of the answer to the second part was established in 1969 (Inada 1969 for ranking functions, Sen and Pattanaik 1969 for choice functions). It is described in the context of the model of Partha Dasgupta and Eric Maskin (2008), who postulated a continuum of voters rather than a finite number and asked that the majority-rule-ranking be generically transitive, meaning transitive except for ties when exactly half the voters prefer one candidate to another, hardly a likely event when there are many voters.
Consider any three candidates A, B, and C. The preference-orders of voters can contain two different Condorcet-cycles among them:

A ≻ B ≻ C, B ≻ C ≻ A, C ≻ A ≻ B,
or
A ≻ C ≻ B, C ≻ B ≻ A, B ≻ A ≻ C.
Theorem  The majority-rule-ranking is transitive on a domain of preference-profiles if and only if the domain is restricted to profiles that contain no Condorcet-cycle among any triple of candidates. (Dasgupta and Maskin 2008; the same result with a finite, odd number of voters is given in Sen and Pattanaik 1969.)
This restricts the preference-orders of voters to those that omit at least one of the orders of each of the Condorcet-cycles on every three candidates. It is a simple matter to verify that single-peaked preferences satisfy this condition (if the three candidates lie on the line in the order A, B, C, then B is never ranked last, which excludes $C \succ A \succ B$ from the first cycle and $A \succ C \succ B$ from the second). There are, however, profiles for which the condition does not hold and yet there is a transitive majority-rule-ranking and so a Condorcet-winner (see section 4.3). This occurs because a restriction on the domain says nothing about the number of voters involved (e.g., in a large electorate, extremely few violations of the condition may have no influence). As mentioned earlier, there are also real-life, historical examples where there is no Condorcet-winner. The following preference-profile is a real example that violates the restriction on the domain. It comes from the election of a president of the Social Choice and Welfare (SCW) Society (Brams and Fishburn 2001; Saari 2001a):
13 : A ≻ B ≻ C
11 : A ≻ C ≻ B
9 : B ≻ C ≻ A
11 : C ≻ A ≻ B
8 : C ≻ B ≻ A.
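For three candidates the domain condition can be checked by asking whether the orders actually used contain one of the two Condorcet-cycles in full. The sketch below is written only for candidates named A, B, and C (an assumption; larger sets would require checking every triple), and its helper names are hypothetical. It finds that the SCW profile violates the condition while still having a Condorcet-winner, C, which illustrates the earlier remark that such profiles exist. A single-peaked domain passes the test.

```python
# The SCW Society election: (number of voters, preference-order, best first)
SCW = [(13, "ABC"), (11, "ACB"), (9, "BCA"), (11, "CAB"), (8, "CBA")]

# The two Condorcet-cycles on the triple A, B, C
CYCLES = ({"ABC", "BCA", "CAB"}, {"ACB", "CBA", "BAC"})


def contains_cycle(profile):
    """True if the orders actually used include a whole Condorcet-cycle
    (written for three candidates named A, B, C only)."""
    used = {o for w, o in profile if w > 0}
    return any(cycle <= used for cycle in CYCLES)


def condorcet_winner(profile):
    """The candidate beating every other by a majority, or None."""
    cands = profile[0][1]

    def prefer(x, y):
        return sum(w for w, o in profile if o.index(x) < o.index(y))

    for x in cands:
        if all(prefer(x, y) > prefer(y, x) for y in cands if y != x):
            return x
    return None


print(contains_cycle(SCW), condorcet_winner(SCW))   # True C

# Single-peaked on the line A-B-C: B is never ranked last
SINGLE_PEAKED = [(1, "ABC"), (1, "BAC"), (1, "BCA"), (1, "CBA")]
print(contains_cycle(SINGLE_PEAKED))                # False
```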
Theorem
The theorem invokes a very restrictive condition that can hardly be assumed
to occur; for example, the preceding example of the SCW Society violates it.
Dasgupta and Maskin concluded that the simplest way to overcome the possible absence
of a Condorcet-winner is what we call the Dasgupta-Maskin method: If no
one obtains a majority against all opponents, then among those candidates who
Chapter 3
defeat the most opponents in head-to-head comparisons, select as winner the one
with the highest [Borda-score] (2004, 97). This is a hybrid method, a refinement
of Copeland's (or Llull's) method: if several candidates are tied as Copeland-winners, then apply Borda's (or Cusanus's) method to decide among them. It
yields exactly the same outcomes in the example of tables 3.2a and 3.2b as does
Black's method, and so it is not continuous. Any attempt to define a method of
voting that guarantees electing the Condorcet-winner, when she exists, based
on some other rationale is bound to behave in this manner. The method is, of
course, manipulable, as the example of tables 3.2a and 3.2b with ε = 2 shows.
A, the Condorcet-winner, wins. But if A is dropped to third place by the 5%
of the voters with the ranking C ≻ A ≻ B ≻ D, there is no Condorcet-winner;
each of the three candidates A, B, and C beats two others; and C, the Borda-winner, wins, much to the satisfaction of those voters. Surprisingly, the method
was actually used well before it was formally defined (see chapter 7).
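The two stages of the rule are easy to express in code. A hedged Python sketch, assuming Table 4.1 gives the percentage of voters who prefer the row candidate to the column candidate and taking ε = 2 as in the text; `dasgupta_maskin` is our name, not the authors'. Copeland-wins are counted first and the Borda-score (the row sum of percentages) decides among the Copeland-winners. Dropping A from second to third place for the 5% holding C ≻ A ≻ B ≻ D changes only the A-versus-B tally, by 5 points.

```python
EPS = 2
# Percentage of voters preferring the first candidate to the second (Table 4.1).
V = {
    ("A", "B"): 49 + EPS, ("A", "C"): 54, ("A", "D"): 57,
    ("B", "A"): 51 - EPS, ("B", "C"): 49, ("B", "D"): 65,
    ("C", "A"): 46, ("C", "B"): 51, ("C", "D"): 82,
    ("D", "A"): 43, ("D", "B"): 35, ("D", "C"): 18,
}


def dasgupta_maskin(v):
    """Among the Copeland-winners, return the one with the highest Borda-score."""
    cands = sorted({x for x, _ in v})
    wins = {x: sum(v[x, y] > v[y, x] for y in cands if y != x) for x in cands}
    borda = {x: sum(v[x, y] for y in cands if y != x) for x in cands}
    top = max(wins.values())
    finalists = [x for x in cands if wins[x] == top]
    return max(finalists, key=borda.get), wins, borda


winner, wins, borda = dasgupta_maskin(V)
print(winner, wins)  # A {'A': 3, 'B': 1, 'C': 2, 'D': 0}

# The 5% holding C > A > B > D report C > B > A > D instead.
manipulated = dict(V)
manipulated["A", "B"] -= 5
manipulated["B", "A"] += 5
winner2, wins2, borda2 = dasgupta_maskin(manipulated)
print(winner2, wins2)  # C {'A': 2, 'B': 2, 'C': 2, 'D': 0}
```

With the honest percentages A wins outright as the Condorcet-winner; after the manipulation A, B, and C each have two wins, and C's Borda-score of 179 decides.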
What is fascinating in the centuries-old history of the debates that have raged
over how to elect and how to rank is the persistent dominance of the concepts of
the Condorcet-winner and the Borda-ranking. In fact, they are not compatible
ideas.
By my troth, this is the old fashion; you two never meet but you fall to some discord:
you are both, in good troth, as rheumatic as two dry toasts; you cannot one bear with
another's confirmities.
William Shakespeare
Chapter 4
important: Suppose A to be the candidate whom I wish to elect, and that a division is taken between B and C; am I bound in honour to vote for the one whom
I should really prefer, if A were not in the field, or may I vote in whatever
way I think most favourable to A's chances? Some say the former, some the
latter. I proceed to show that, whenever [there is no Condorcet-winner] and
there are among the electors a certain number who hold the latter course to be
allowable, the result must be a case of cyclical majorities (cited in Black 1958,
232, 265).
Black casually dismissed Nanson's method despite its election of the
Condorcet-winner (when he exists), claiming that it would be unintelligible to
the average voter and too laborious. He discussed Condorcet's probabilistic
approach and correctly reported that Condorcet had subsequently abandoned it,
but he failed to see that it defined a method of ranking. For this he should
be forgiven because Condorcet's description was confused, complicated, and
erroneous.
4.1 Condorcet's Method of Ranking
$\sum_{i=1}^{n} \sum_{k>i} v(A_i \succ A_k).$
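The sum above is the Condorcet-score of a ranking A1 ≻ A2 ≻ ⋯ ≻ An, where v(Ai ≻ Ak) counts the voters who place Ai above Ak. A brute-force Python sketch of scoring every ranking and keeping the best, assuming as input the Table 4.1 percentages with ε = 2; `condorcet_score` and `condorcet_ranking` are illustrative names. Enumerating all n! rankings is only practical for a handful of candidates.

```python
from itertools import permutations

EPS = 2
# Percentage of voters preferring the first candidate to the second (Table 4.1).
V = {
    ("A", "B"): 49 + EPS, ("A", "C"): 54, ("A", "D"): 57,
    ("B", "A"): 51 - EPS, ("B", "C"): 49, ("B", "D"): 65,
    ("C", "A"): 46, ("C", "B"): 51, ("C", "D"): 82,
    ("D", "A"): 43, ("D", "B"): 35, ("D", "C"): 18,
}


def condorcet_score(ranking, v):
    """Sum of v(ranking[i] > ranking[k]) over all positions i < k."""
    return sum(v[ranking[i], ranking[k]]
               for i in range(len(ranking))
               for k in range(i + 1, len(ranking)))


def condorcet_ranking(cands, v):
    """Ranking with the largest Condorcet-score (ties go to the first one enumerated)."""
    return max(permutations(cands), key=lambda r: condorcet_score(r, v))


best = condorcet_ranking("ABCD", V)
print(best, condorcet_score(best, V))  # ('A', 'C', 'B', 'D') 360
```

Here the majority relation is itself transitive (A beats everyone, C beats B and D, B beats D), so the Condorcet-ranking coincides with it.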
1. This model is not entirely satisfactory, however, because asking for a voter's ranking of the
candidates implies that the pair-by-pair comparisons are not independent. How to model this, let
alone how to do the ensuing computations, is not at all clear.
Table 4.1
Pairwise Votes for Example of Tables 3.2a and 3.2b
vs.   A          B          C      D      Borda-Score
A                49 + ε%    54%    57%    160 + ε
B     51 − ε%               49%    65%    165 − ε
C     46%        51%               82%    179
D     43%        35%        18%           96
and B ≻S A ≻S C ≻S D.
By implicitly or explicitly studying the problem of choice rather than the problem of ranking, most of the past work would reject such ties. Or, if such a tie
did exist, then it would be interpreted as meaning that there is a three-way tie
for first place (B tied for first surely qualifies C as tied for first, too), so that
society's order should be A ≈S B ≈S C ≻S D. This point of view is too restrictive. What explains the curious nature of the two orders is the fact that the outputs
of the traditional model are rankings of the candidates, whereas the outputs of
Condorcet's method are rankings of rankings of candidates.
Condorcet himself seems to have realized that the maximum likelihood
estimate of who the winner should be among the candidates depends (in
a complicated fashion) on the value of p > 1/2 and is not necessarily the
Condorcet-winner. He then abandoned this approach and the Condorcet-ranking altogether to champion the one simple idea of the Condorcet-winner.
Young (1986) was the first to see the distinction clearly: “Even if we reject
the specific probabilistic model by which this conclusion is reached, there are
still strong a priori grounds for asserting that the [Condorcet-ranking and the
Borda-winner] are the optimal rules for ranking and choice respectively.”
A choice function or a ranking function is anonymous if it treats all voters
equally (that is, depends only on the numbers of the different preference-orders of a profile), neutral if it treats all candidates equally, and impartial if it is both
anonymous and neutral. These are clearly essential properties of choice and
ranking functions. Imagine now that a society is split into two separate parts,
each with its preference-profile, and that a choice or a ranking function f
determines a set of winners W1 and W2 of each part or a set of rankings R1
and R2 of each part. f is a join-consistent choice rule when f determines the
set of winners of the entire society to be those candidates that are winners in
both W1 and W2 (whenever they have a winner in common, W1 ∩ W2 ≠ ∅).
f is a join-consistent² ranking rule when f determines the set of rankings of
the entire society to be those rankings that are in both R1 and R2 (whenever
they have a ranking in common, R1 ∩ R2 ≠ ∅). This idea seems reasonable:
the solutions in common to both groups (if such exist) should be the solutions
of the two groups reunited.
Theorem Suppose that f is a ranking function that selects the ranking
Note that earlier a rule (as opposed to a function) meant that the number of
candidates varies; here it is taken to mean that the number of voters varies.
Elsewhere, the context will determine the sense of rule versus function.
4.2 Borda's and Sum-Scoring Methods
Consider now the problem of choice rather than that of ranking. To begin, notice
that when the profile is a Condorcet-component, equal numbers of voters
2. Young sometimes calls this reinforcement, at other times consistency.
k : A ≻ B ≻ C
k : B ≻ C ≻ A
k : C ≻ A ≻ B,
impartiality requires that all candidates must be in a vast tie as winners. But
there are other profiles for which the same ought to be true, namely, when for
every pair of candidates A and B the number of voters who rank A higher than
B equals the number who rank B higher than A. f has the cancellation property
if in this case it selects all the candidates (there is again a gigantic tie among
them all). There can be many such situations. An example is equal numbers of
voters having each of the two preference-orders
k : A ≻ B ≻ C ≻ D
k : D ≻ C ≻ B ≻ A.
Make the innocuous assumption that f is faithful: if there is only one voter,
then his highest-place candidate is f's choice.
Theorem The unique choice rule that is impartial, join-consistent, faithful,
and satisfies the cancellation property is the Borda-winner. (Young 1975)
$f(Q + P + \cdots + P) = A$,
where + means taken together in one profile.
Theorem The unique choice rules that are impartial, join-consistent, and
respect large electorates are the sum-scoring methods. (Young 1975)
There is a parallel result that addresses rankings. A ranking rule f is pairwise join-consistent if A ≻S B for the preference-profile P and A ⪰S B for the
preference-profile Q imply A ≻S B for P + Q, and if A ≈S B for both profiles implies
that the same holds in the joint profile. f respects the pairwise rankings of large
electorates if A ≻S B in P implies that for any Q there is an integer n large enough to
guarantee that A ≻S B for the profile
$Q + \underbrace{P + \cdots + P}_{n}$.
Theorem The unique ranking rules that are impartial and pairwise join-consistent and that respect the pairwise rankings of large electorates are the
sum-scoring methods. (Smith 1973)
Borda's method may not elect the Condorcet-winner (when he exists). Much
more may be said, for there is a basic opposition between Condorcet-winners
and sum-scoring-winners, of which one manifestation is the following.
Theorem The Condorcet-winner of some profiles is elected by no strictly
monotonic sum-scoring method. (Fishburn 1984)
2 : A ≻ B ≻ C
1 : A ≻ C ≻ B
1 : B ≻ C ≻ A.
Nevertheless, Saari's conclusion is to opt for Borda's method: “What provides hope from these dictionaries is that the Borda Count . . . is the unique
rule (when used with every subset of candidates) that significantly minimizes
the number and kinds of allowed paradoxes. Thus the Borda Count enjoys
the maximum number of positive properties; e.g., only Borda always ranks a
Condorcet-winner over a Condorcet loser” (2009, 4). Borda's method may minimize misbehavior among all sum-scoring methods, but it certainly does not
eliminate it (as is shown by experimental evidence in chapters 6 and 19 as well
as the example of tables 3.2a and 3.2b). Moreover, it fails in other important
dimensions (as will be seen anon): it is highly manipulable and extremely biased
in favor of centrist political candidates or in favor of good but unexceptional
competitors.
4.3 Objections to Condorcet-Consistency
Borda's method is characterized by conditions that concern selecting a candidate, Condorcet's by conditions that concern selecting a ranking of the
candidates. That is why Borda's method is singularly suited to choosing a
winner (but not a ranking) and Condorcet's to choosing a ranking (but not
a winner). This point is strikingly evident in an example with eighty-one voters that Condorcet invented to argue that Borda's method was bad, but that Saari
(2001b) used even more effectively to show the questionable legitimacy of a
Condorcet-winner in the context of the traditional model:
30 : A ≻ B ≻ C
10 : B ≻ C ≻ A
1 : A ≻ C ≻ B
10 : C ≻ A ≻ B
29 : B ≻ A ≻ C
1 : C ≻ B ≻ A.
The Borda-score ranks the candidates B ≻S A ≻S C, yet A is the Condorcet-winner. Intuitively, the reason for the Borda outcome is that A wins over B by
the narrowest of margins (41 to 40) and defeats C comfortably (60 to 21), whereas B trounces C (69 to 12) by
so much that he emerges as the over-all winner. But look closer. Thirty of the
eighty-one voters have preferences that constitute a Condorcet-component
10 : A ≻ B ≻ C
10 : B ≻ C ≻ A
10 : C ≻ A ≻ B.
These voters cancel each other out when it comes to designating a winner
because there is a perfect symmetry among all the candidates; said differently,
these voters together say that the candidates A, B, and C are tied. Indeed,
the same is true for another Condorcet-component that is contained in the
preference-profile:
1 : A ≻ C ≻ B
1 : C ≻ B ≻ A
1 : B ≻ A ≻ C.
Since these thirty-three voters cancel each other out, their preference-orders
may be dropped, and the election outcome should be decided by the remaining
forty-eight voters, whose profile is
20 : A ≻ B ≻ C
28 : B ≻ A ≻ C.
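The cancellation argument can be replayed numerically. A minimal sketch, assuming the eighty-one-voter profile above and the usual Borda points (2, 1, 0 for three candidates); `borda`, `prefer_count`, and `remove` are illustrative helpers, not anything from the source.

```python
PROFILE = [(30, "ABC"), (10, "BCA"), (1, "ACB"), (10, "CAB"), (29, "BAC"), (1, "CBA")]
COMPONENT_1 = [(10, "ABC"), (10, "BCA"), (10, "CAB")]
COMPONENT_2 = [(1, "ACB"), (1, "CBA"), (1, "BAC")]


def borda(profile):
    """Borda-score: a candidate gets one point per candidate ranked below him."""
    m = len(profile[0][1])
    scores = {c: 0 for c in sorted(profile[0][1])}
    for n, order in profile:
        for pos, c in enumerate(order):
            scores[c] += n * (m - 1 - pos)
    return scores


def prefer_count(profile, x, y):
    """Number of voters who rank x above y."""
    return sum(n for n, order in profile if order.index(x) < order.index(y))


def remove(profile, part):
    """Drop the voters of `part` (orders within `profile` are assumed distinct)."""
    counts = {order: n for n, order in profile}
    for n, order in part:
        counts[order] -= n
    return [(n, order) for order, n in counts.items() if n > 0]


print(borda(PROFILE))                   # {'A': 101, 'B': 109, 'C': 33}
print(prefer_count(PROFILE, "A", "B"))  # 41, against 40 for B over A
print(borda(COMPONENT_1))               # {'A': 30, 'B': 30, 'C': 30}
rest = remove(remove(PROFILE, COMPONENT_1), COMPONENT_2)
print(rest)                             # [(20, 'ABC'), (28, 'BAC')]
print(borda(rest))                      # {'A': 68, 'B': 76, 'C': 0}
```

Each Condorcet-component gives all three candidates equal Borda-scores, and in the remaining forty-eight-voter profile B beats A head-to-head 28 to 20.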
“The [Borda-score] outcome for all n-candidates is the unique ranking which
avoids all of the indicated problems” (2000, 57; our emphasis).
Saari claims that Borda's dominates all other scoring methods in avoiding
difficulties because in addition to the Condorcet-components it cancels out
all the symmetries that cause noxious deviations.
“The encouraging news is that the measures developed for what the voters really want isolate a unique procedure, the Borda-score, which meets all
expectations” (2001b, 110).
These conclusions ignore the distinction between ranking and choice; moreover, Borda's method and the more general scoring methods are all highly
manipulable. Saari recommends the following:
“Instant-Borda-runoff should be used to combine the consistency of Borda
outcomes while frustrating manipulative voting” (2001b, 103).
3 : A ≻ C ≻ B (: 3)
6 : C ≻ A ≻ B (: 8)
4 : B ≻ A ≻ C (: 1)
2 : C ≻ B ≻ A (: 0).
properly.
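Proof Take any rule consistent with Condorcet, and suppose there are n ≥ 3
candidates, A1, . . . , An. For n even, define P to be the (n + 1)-voter profile
n/2 : A1 ≻ A2 ≻ A3 ≻ ⋯ ≻ An
n/2 + 1 : A2 ≻ A1 ≻ A3 ≻ ⋯ ≻ An,
and C to be the n-voter Condorcet-component
1 : A1 ≻ A2 ≻ A3 ≻ ⋯ ≻ An
1 : A2 ≻ A3 ≻ ⋯ ≻ An ≻ A1
⋮
1 : An ≻ A1 ≻ A2 ≻ ⋯ ≻ An−1.
A2 is the Condorcet-winner of P and so the rule's winner there. In P + C the head-to-head tallies are
A1 ≻ A2 : n/2 + (n − 1)         A2 ≻ A1 : (n/2 + 1) + 1
A1 ≻ A3 : (n + 1) + (n − 2)     A3 ≻ A1 : 2
⋮                               ⋮
A1 ≻ An : (n + 1) + 1           An ≻ A1 : n − 1.
A1 is preferred more often against each opponent and thus is the Condorcet-winner and therefore the rule's winner (when n is odd, P + C is a (2n + 2)-voter
profile, and it is easy to check that A1 is the rule's winner). Adding the Condorcet-component C has changed the winner from A2 to A1: the rule does not
cancel properly. ∎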
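A small check of this construction for a few even values of n, coding candidate Ai as the integer i (an encoding chosen here for convenience): it confirms that A2 is the Condorcet-winner of P, that A1 is the Condorcet-winner of P + C, and that the tallies against A1 follow the pattern in the proof. The helper names are illustrative.

```python
def cyclic_component(n):
    """One voter for each cyclic shift of A1 > A2 > ... > An (Ai coded as i)."""
    base = list(range(1, n + 1))
    return [(1, tuple(base[i:] + base[:i])) for i in range(n)]


def profile_p(n):
    """The (n + 1)-voter profile P of the proof, for even n."""
    rest = tuple(range(3, n + 1))
    return [(n // 2, (1, 2) + rest), (n // 2 + 1, (2, 1) + rest)]


def prefer_count(profile, x, y):
    """Number of voters who rank x above y."""
    return sum(m for m, order in profile if order.index(x) < order.index(y))


def condorcet_winner(profile):
    """Candidate who beats every other candidate head-to-head, or None."""
    cands = sorted(profile[0][1])
    for x in cands:
        if all(prefer_count(profile, x, y) > prefer_count(profile, y, x)
               for y in cands if y != x):
            return x
    return None


for n in (4, 6, 8):
    P, C = profile_p(n), cyclic_component(n)
    print(n, condorcet_winner(P), condorcet_winner(P + C))
# 4 2 1
# 6 2 1
# 8 2 1
```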
Join-consistency says that if a candidate A is the winner with a rule in each of
two separate electorates, then A must be the winner with that rule in the united
electorate. Another objection to Condorcet-winners is the following.
Theorem There is no choice rule consistent with Condorcet that is join-consistent. (Due to Young; see H. Moulin 1988, 237–238.)
Proof The argument follows that given in H. Moulin (1988). Take any join-consistent rule that is consistent with Condorcet, and suppose there are n ≥ 3
voters. Choose a preference-profile P with n voters that has no Condorcet-winner for which the rule makes A the winner. This implies there must be
another candidate B who is preferred to A by k voters, where k > n/2. Letting
Q be the preference-profile
2k : A ≻ B ≻ C
n : B ≻ A ≻ C,
consider the preference-profile 2P + Q of 3n + 2k voters. Since the rule is join-consistent, A must be elected by 2P. A is the (unique) Condorcet-winner of Q
(2k > n of its voters prefer A to B), so the join-consistent rule uniquely elects A in 2P + Q. However, of the 3n + 2k voters,
2k + n prefer B to A, a majority because 2k > n. Moreover, at least 2k + n prefer
B to any other candidate, so B is the Condorcet-winner, a contradiction. ∎
Againnow with regard to join-consistencythere is an opposition between
sum-scoring methods and rules that are consistent with Condorcet.
For any profile of preferences, a candidate A is preferred to each other candidate B by a certain number of voters, v(A ≻ B) ≥ 0. A's Simpson-score is
the minimum of those numbers, $s(A) = \min_X v(A \succ X)$. The Simpson-winner
is the candidate with the highest Simpson-score. Simpson's method is clearly
consistent with Condorcet. Apply it to the fifteen-voter profile
3 : A ≻ D ≻ C ≻ B
5 : D ≻ B ≻ C ≻ A
3 : A ≻ D ≻ B ≻ C
4 : B ≻ C ≻ A ≻ D.
Then
$s(A) = \min\{v(A \succ B), v(A \succ C), v(A \succ D)\} = \min\{6, 6, 10\} = 6$,
s(B) = 4,
s(C) = 3,
s(D) = 5,
so A is the Simpson-winner, though he is not a Condorcet-winner (B beats him 9 to 6).
If four additional voters with the same preferences
4 : C ≻ A ≻ B ≻ D
participated in the vote, all preferring A to B, then the nineteen-voter Simpson-winner is B. This is the well-known no-show paradox: these voters would have
done better for themselves not to vote.
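The Simpson-scores and the no-show reversal can be reproduced directly. A minimal sketch, writing each ballot as a string from best to worst (an encoding chosen for brevity); `v`, `simpson_scores`, and `simpson_winner` are illustrative names.

```python
CANDS = "ABCD"
P15 = [(3, "ADCB"), (5, "DBCA"), (3, "ADBC"), (4, "BCAD")]
P19 = P15 + [(4, "CABD")]  # four more voters, all preferring A to B


def v(profile, x, y):
    """v(x > y): number of voters ranking x above y."""
    return sum(n for n, order in profile if order.index(x) < order.index(y))


def simpson_scores(profile):
    """s(x) = minimum over opponents y of v(x > y)."""
    return {x: min(v(profile, x, y) for y in CANDS if y != x) for x in CANDS}


def simpson_winner(profile):
    scores = simpson_scores(profile)
    return max(CANDS, key=scores.get)


print(simpson_scores(P15), simpson_winner(P15))  # {'A': 6, 'B': 4, 'C': 3, 'D': 5} A
print(simpson_scores(P19), simpson_winner(P19))  # {'A': 6, 'B': 8, 'C': 7, 'D': 5} B
```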
Suppose a choice rule selects candidate A as a winner. Then the rule is
participant-consistent if when an additional voter casts his ballot, either A is
a winner or a candidate preferred to A by that voter is a winner. Participant-consistency is essentially a special case of join-consistency where one part of
the electorate consists of one voter. Once again, the Condorcet approach fails.
Theorem There is no choice rule consistent with Condorcet that is participant-consistent when there are at least four candidates.³ (H. Moulin 1988, 238–239,
251).
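The proof is slick. It uses the fifteen- and nineteen-voter profiles just
given.
Proof Suppose there is such a rule. Take a profile P of n voters with a pair of candidates A and B for which s(B) ≥ v(A ≻ B) + 1 and n − 2s(B) + 1 > 0 (the
fifteen-voter profile shows such pairs exist). Consider the augmented profile
P + {n − 2s(B) + 1 : A ≻ B ≻ ⋯}.
In it B is the Condorcet-winner: B beats A because n − v(A ≻ B) > v(A ≻ B) + n − 2s(B) + 1 exactly when s(B) ≥ v(A ≻ B) + 1, and B beats every other candidate X because v(B ≻ X) ≥ s(B). Every added voter ranks A first, so if A were a winner of P, participant-consistency would keep A a winner as these voters are added one at a time. Hence:
Lemma If s(B) ≥ v(A ≻ B) + 1 and n − 2s(B) + 1 > 0, then A is not a winner of P.
As a consequence, A is the only possible winner with a rule that satisfies the
two properties in the fifteen-voter example because s(A) is greater than any
other candidate's Simpson-score. Adjoin the four voters with identical preferences, all of whom prefer A to B, to obtain the nineteen-voter profile. In that
profile, s(A) = 6, s(B) = 8, s(C) = 7, and s(D) = 5. The same lemma implies
that the only possible winner is B, a contradiction. ∎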
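The lemma can be applied mechanically to both profiles. A sketch, assuming the lemma reads: if s(B) ≥ v(A ≻ B) + 1 and n − 2s(B) + 1 > 0, then A cannot be a winner of a rule that is consistent with Condorcet and participant-consistent (our reading of the garbled condition in this copy). The hypothetical helper `excluded` returns every candidate ruled out by some opponent; what is left is the set of possible winners.

```python
CANDS = "ABCD"
P15 = [(3, "ADCB"), (5, "DBCA"), (3, "ADBC"), (4, "BCAD")]
P19 = P15 + [(4, "CABD")]


def v(profile, x, y):
    """v(x > y): number of voters ranking x above y."""
    return sum(n for n, order in profile if order.index(x) < order.index(y))


def simpson_scores(profile):
    """s(x) = minimum over opponents y of v(x > y)."""
    return {x: min(v(profile, x, y) for y in CANDS if y != x) for x in CANDS}


def excluded(profile):
    """Candidates a for which some b has s(b) >= v(a > b) + 1 and n - 2 s(b) + 1 > 0."""
    n = sum(m for m, _ in profile)
    s = simpson_scores(profile)
    return {a for a in CANDS for b in CANDS
            if a != b and s[b] >= v(profile, a, b) + 1 and n - 2 * s[b] + 1 > 0}


for name, profile in (("P15", P15), ("P19", P19)):
    print(name, sorted(set(CANDS) - excluded(profile)))
# P15 ['A']
# P19 ['B']
```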
Proper cancellation, join-consistency, and participant-consistency are three
closely related properties that argue against Condorcet-winners. Each in essence
characterizes sum-scoring methods, so it can be concluded that there is a fundamental opposition between Condorcet-winners and sum-scoring-winners in
the traditional model.
4.4 Borda-Winners and Condorcet-Rankings
Putting aside for the moment that the Borda-ranking is wide open to strategic
manipulation, is it nevertheless a good method for designating a winner? In that
3. Simpson's method is participation-consistent with three candidates.
case, declaring the Borda-score winner in first place, the Borda-score winner
among the remaining candidates in second place, and so on, defines another ranking,
instant-winner-Borda-runoff. Is this a good ranking? It is sure to
rank a Condorcet-loser last, and it seems better suited to designating a winner.
But all three orders (Saari's instant-(loser-)Borda-runoff ranking, the instant-winner-Borda-runoff ranking, and the Borda-ranking) may differ.
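The three orders can be compared on Condorcet's eighty-one-voter example from section 4.3 (30: A ≻ B ≻ C, 10: B ≻ C ≻ A, 1: A ≻ C ≻ B, 10: C ≻ A ≻ B, 29: B ≻ A ≻ C, 1: C ≻ B ≻ A). A sketch, assuming each runoff recomputes Borda-scores on the surviving candidates after every elimination; the function names are ours.

```python
PROFILE = [(30, "ABC"), (10, "BCA"), (1, "ACB"), (10, "CAB"), (29, "BAC"), (1, "CBA")]


def borda_scores(profile, cands):
    """Borda-scores restricted to cands (the other candidates are ignored)."""
    scores = {c: 0 for c in cands}
    for n, order in profile:
        kept = [c for c in order if c in cands]
        for pos, c in enumerate(kept):
            scores[c] += n * (len(kept) - 1 - pos)
    return scores


def borda_ranking(profile, cands):
    scores = borda_scores(profile, cands)
    return sorted(cands, key=lambda c: -scores[c])


def instant_winner_runoff(profile, cands):
    """Repeatedly place the current Borda-winner next, then drop him."""
    remaining, ranking = list(cands), []
    while remaining:
        scores = borda_scores(profile, remaining)
        top = max(remaining, key=scores.get)
        ranking.append(top)
        remaining.remove(top)
    return ranking


def instant_loser_runoff(profile, cands):
    """Repeatedly place the current Borda-loser last, then drop him."""
    remaining, ranking = list(cands), []
    while remaining:
        scores = borda_scores(profile, remaining)
        bottom = min(remaining, key=scores.get)
        ranking.insert(0, bottom)
        remaining.remove(bottom)
    return ranking


for rule in (borda_ranking, instant_winner_runoff, instant_loser_runoff):
    print(rule.__name__, rule(PROFILE, "ABC"))
# borda_ranking ['B', 'A', 'C']
# instant_winner_runoff ['B', 'A', 'C']
# instant_loser_runoff ['A', 'B', 'C']
```

Here instant-loser-Borda-runoff eliminates C first and then lets A beat B 41 to 40, so it differs from the other two orders.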
Is the Condorcet-ranking a good method for designating a winner? In that
case, declaring the Condorcet-score winner in first place, the Condorcet-score
winner among the remaining candidates in second place, and so on, defines another
ranking, instant-winner-Condorcet-runoff. Or one could declare
the Condorcet-score loser in last place, the Condorcet-score loser among the
remaining candidates in next-to-last place, and so on, to obtain the instant-loser-Condorcet-runoff ranking. Are these good rankings? Nothing at all changes;
these two rankings both coincide with the Condorcet-ranking (in this sense,
Condorcet's method almost satisfies IIA).
To see why, suppose, more generally, that the first several candidates and the
last several candidates were dropped and that the Condorcet-ranking among
the remaining candidates changed. This implies that the change would have
increased the total count among the remaining candidates; but then the same
change would increase the total count by the same amount in the original order,
contradicting the maximality of the original Condorcet-ranking. This is reassuring. On the other hand, the Condorcet-ranking
always ranks the Condorcet-winner W first (as does instant-loser-Borda-runoff,
when W exists) and always ranks the Condorcet-loser L last (as does instant-winner-Borda-runoff, when L exists), which is unacceptable for selecting either
a winner or a loser, as Young and Saari have argued.
The contrast between Borda's and Condorcet's approaches is stark. The
method of Borda is more appropriate for designating winners and losers, Condorcet's more appropriate for finding a best ranking among the candidates. Very
simple new characterizations of them make this point even clearer.
Given any preference-profile, a candidate-scoring method assigns a nonnegative score to every candidate. A candidate-scoring method should (1) assign
a zero to the worst possible candidate, that is, give a score of 0 to a candidate who
is last on every voter's list; and (2) correctly reward a minimal improvement,
that is, when one voter inverts two successive candidates of his list, the score
of the candidate who moves up increases by 1.
Theorem 4.1 (Borda Characterization) The Borda-score is the unique candidate-scoring method that assigns a zero to the worst possible candidate and
correctly rewards minimal improvements.
Proof This is easy to see. The Borda-score clearly satisfies the two properties.
On the other hand, the argument that follows shows that if the properties hold,
then the score of any candidate must agree with the candidate's Borda-score.
Consider any preference-profile, and choose any one candidate C. Put C at the
bottom of every voter's list. C's score must now be 0. Take one voter and raise
C back to his previous position one step at a time. At each step he gains a point,
so if he rises k times, he gains k points from this voter. But this is precisely the
number of points contributed by this voter to the Borda-score of C, so doing
the same for every voter proves the theorem. ∎
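A quick randomized check (not a proof) that the Borda-score has the two properties; the uniqueness half of the theorem is what the argument above supplies. Candidates are single letters and each voter's list is a tuple from best to worst, an encoding assumed here; the trial count and seed are arbitrary.

```python
import random
from itertools import permutations

CANDS = "ABCD"


def borda_score(orders, c):
    """Each voter gives c one point for every candidate he ranks below c."""
    return sum(len(o) - 1 - o.index(c) for o in orders)


def invert(order, i):
    """The voter swaps the candidates in positions i and i + 1."""
    return order[:i] + (order[i + 1], order[i]) + order[i + 2:]


def check_properties(trials=500, voters=5, seed=0):
    rng = random.Random(seed)
    all_orders = list(permutations(CANDS))
    for _ in range(trials):
        orders = [rng.choice(all_orders) for _ in range(voters)]
        # (1) Move one candidate to the bottom of every list: he must score 0.
        last = orders[0][-1]
        bottomed = [tuple(x for x in o if x != last) + (last,) for o in orders]
        assert borda_score(bottomed, last) == 0
        # (2) A minimal improvement is rewarded by exactly one point.
        j, i = rng.randrange(voters), rng.randrange(len(CANDS) - 1)
        promoted = orders[j][i + 1]
        after = orders[:j] + [invert(orders[j], i)] + orders[j + 1:]
        assert borda_score(after, promoted) == borda_score(orders, promoted) + 1
    return True


print(check_properties())  # True
```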
ranking a real number and selects the ranking(s) having the largest real number.
Sum-consistency may be defined for value ranking functions in the same way
as it is for value choice functions, and an analogous theorem characterizes those
whose values equal sums of values over the rankings of voters.
The arguments of this section all suggest that Borda-rankings and Condorcet-winners should be discarded. The properties imply that Borda's approach makes
sense only for designating a winner and Condorcet's only for designating a
ranking.
4.5 Incompatibility of Electing and Ranking
satisfies conditions (1) and (2) but not (3), whereas Condorcet's satisfies (1)
and (3) but not (2).
Theorem There is no ranking rule that is winner-loser-unanimous, choice-compatible, and rank-compatible for all preference-profiles
(when there are at least three alternatives).
Proof Consider, for p > q, the profile
p : A1 ≻ ⋯ ≻ Ak ≻ A ≻ B ≻ C
p : A1 ≻ ⋯ ≻ Ak ≻ C ≻ A ≻ B
p : A1 ≻ ⋯ ≻ Ak ≻ B ≻ C ≻ A
q : A1 ≻ ⋯ ≻ Ak ≻ A ≻ C ≻ B,
and suppose the method f yields a ranking R. Candidate A1 is placed first by all
voters, so by winner-loser-unanimity, A1 must be the winner and hence the first-place candidate of R. Rank-compatibility implies that when A1 is removed,
f applied to the profile that remains must yield a ranking that is the same
as R (except that A1 is absent). Repeating the same reasoning, successively
removing A2, . . . , down to Ak shows that the first k candidates of the ranking
R are A1 ≻S ⋯ ≻S Ak and its last three candidates are ranked by f applied
to the reduced profile
p : A ≻ B ≻ C
p : B ≻ C ≻ A
p : C ≻ A ≻ B
q : A ≻ C ≻ B.
The reduced profile contains a 3p-voter Condorcet-component, and the remaining voters unanimously place A first and B last. By winner-loser-unanimity and
choice-compatibility this implies that the (3p + q)-voter profile does the same,
so f yields the ranking A ≻S C ≻S B. Therefore R = A1 ≻S ⋯ ≻S Ak ≻S
A ≻S C ≻S B.
Rank-compatibility now implies that when A1 is dropped from the profile,
f yields the remaining part of R; when the winner of that part A2 is dropped
from the profile, f yields the remaining part of R; . . . , down to dropping A
from the profile. But without A1, . . . , Ak, A, the profile is
p − q : B ≻ C
p + q : B ≻ C
p + q : C ≻ B.
The last 2p + 2q voters constitute a Condorcet-component, and the remaining p − q voters unanimously prefer B to C, so by choice-compatibility and unanimity, f yields the ranking B ≻S C, a contradiction
that completes the proof. ∎
A similar argument shows that losers instead of winners could be invoked in
the definition of rank-compatibility.
The three conditions of this theorem do not demand much and are unassailable
in any practical application. The first merely requires that complete agreement
on a winner or loser must be confirmed in the final standings. The second asks
considerably less than Young's and Saari's cancellation for two reasons: (1)
only one Condorcet-component is canceled; and (2) a Condorcet-component is
canceled and a decision is assumed only if the remaining voters unanimously
agree on a winner or loser (which is much less demanding than to assume a decision whatever the remaining voters' orders). The third is weaker than Arrow's
independence of irrelevant alternatives property, for it asks that the ranking
stay the same among the remaining candidates only when the first-place candidate is withdrawn, not when any candidate is withdrawn. The theorem tells
us that runoffs (dropping a winner or a loser and then ranking the remaining candidates) are doomed to failure. As Saari realized, reranking dismisses
information that is essential. More fundamentally, the theorem says that in the
context of the traditional model it is impossible to assert that the first-place
candidate of the ranking of an electorate or a jury is necessarily the winner.
This is damning testimony against the very validity of that model.
More evidence may be given to show that there is a fundamental difference
between the problem of choice and the problem of ranking. To begin, consider
a situation where the profile of a society is k-Cond (A B C):
k : A ≻ B ≻ C
k : B ≻ C ≻ A
k : C ≻ A ≻ B.
Table 4.2
Condorcet's Solution for the Profile k-Cond (A B C)
                     Condorcet-Points         Condorcet-   Ranks of
                     3     2     1     0      Score        Rankings
R1 : A ≻S B ≻S C     k     0     2k    0      5k           1st
R2 : A ≻S C ≻S B     0     2k    0     k      4k           4th
Table 4.3
Condorcet's Solution for the Profile k-Cond (A B C D) of 4k Voters
                          Condorcet-Points                    Condorcet-   Ranks of
                          6    5    4    3    2    1    0     Score        Rankings
R1 : A ≻S B ≻S C ≻S D     k    0    0    2k   k    0    0     14k          1st
R2 : A ≻S B ≻S D ≻S C     0    k    k    0    k    k    0     12k          5th
R3 : A ≻S C ≻S D ≻S B     0    0    2k   k    0    k    0     12k          5th
R4 : A ≻S D ≻S B ≻S C     0    k    k    0    k    k    0     12k          5th
R5 : A ≻S C ≻S B ≻S D     0    k    0    k    2k   0    0     12k          5th
R6 : A ≻S D ≻S C ≻S B     0    0    k    2k   0    0    k     10k          21st
Note: Four are tied for 1st; sixteen for 5th; and four for 21st.
k : A ≻ B ≻ C ≻ D
k : B ≻ C ≻ D ≻ A
k : C ≻ D ≻ A ≻ B
k : D ≻ A ≻ B ≻ C.
It shows even more dramatically how very different ranking is from choice.
Each Ri in table 4.3 represents four preference-orders that must have
identical Condorcet-points (and Condorcet-scores). By impartiality, the three
preference-orders that define a Condorcet-component with R1 have the same
Condorcet-points as R1; the same is true for R6. BCAD (dropping the ≻ for
brevity), CDBA, and DACB have the same Condorcet-points as R2; BDAC,
CABD, and DBCA have the same as R3; BACD, CBDA, and DCAB have
the same as R4; and BDCA, CADB, and DBAC have the same as R5.
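Table 4.3 can be regenerated by brute force. A sketch, assuming k-Cond(A B C D) consists of k voters for each cyclic shift of A ≻ B ≻ C ≻ D and taking k = 1 (every entry scales with k). `agreements` counts the pairs of candidates on which a voter's order agrees with a ranking; the Condorcet-points of a ranking tally voters by that count, and the Condorcet-score is their weighted sum. The helper names are illustrative.

```python
from collections import Counter
from itertools import combinations, permutations


def k_cond(cands, k):
    """k voters for each cyclic shift of cands: the profile k-Cond(cands)."""
    return [(k, cands[i:] + cands[:i]) for i in range(len(cands))]


def agreements(ranking, order):
    """Number of pairs of candidates that ranking and order place the same way."""
    return sum(order.index(x) < order.index(y) for x, y in combinations(ranking, 2))


def condorcet_points(ranking, profile):
    """Maps h to the number of voters agreeing with ranking on exactly h pairs."""
    points = Counter()
    for n, order in profile:
        points[agreements(ranking, order)] += n
    return points


def condorcet_score(ranking, profile):
    return sum(h * n for h, n in condorcet_points(ranking, profile).items())


profile = k_cond("ABCD", 1)
scores = {"".join(r): condorcet_score(r, profile) for r in permutations("ABCD")}
print(Counter(scores.values()))  # scores 14, 12, 10 occur 4, 16, and 4 times
print(dict(condorcet_points("ABCD", profile)))  # {6: 1, 3: 2, 2: 1}, row R1
```

The same functions applied to k_cond("ABC", 1) give the Condorcet-scores 5 and 4 of Table 4.2.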
Some rankings are clearly better for society than others. R1 dominates R5
and R6 because for any h = 0, . . . , 6, the number of voters who agree on
at least h orders between pairs of candidates is always at least as great, and
sometimes greater, for R1 than for R5 or R6 (in mathematical terminology,
this is stochastic dominance). At least as many voters are in at least as great
agreement with R1 (and either some “at least” is “more” or some “as great”
is “greater”) than with either R5 or R6. This is confirmed by their respective
Condorcet-scores (as it must be by any reasonable criterion of comparison).
The comparison between, say, R1 and R2 is not evident and depends on the
criterion invoked. On the other hand, every candidate must be tied for first
by any reasonable method. Moreover, all twenty-four possible rank-orderings
have exactly the same Borda-score, showing how inadequate Borda's method
is for ranking.
The order given by the Condorcet-scores to the various rankings of candidates
is very curious. First, note that the Condorcet-scores of A ≻ B ≻ D ≻ C and
its opposite C ≻ D ≻ B ≻ A are exactly the same. Next, the Condorcet-scores
rank the rankings not in accord with choice-monotonicity. For consider the
ranking A ≻ B ≻ D ≻ C with score 12k, which is ahead of the ranking
D ≻ C ≻ B ≻ A with score 10k. Suppose that the preference-profile k-Cond
(A B C D) changed, with 2k of the voters who place A ahead of B
inverting them in their rankings. Then after the change A ≻ B ≻ D ≻ C has
the score 10k and so is behind D ≻ C ≻ B ≻ A, whose score is 12k. But B
was ahead of D and C in the preferred ranking before and was moved strictly
higher by 2k voters, so choice-monotonicity suggests B should be ahead of C
and D after the change in the preferred ranking, but it is not.
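The reversal can be verified with k = 1 and, as an arbitrary assumption, the two inverting voters taken to be those holding A ≻ B ≻ C ≻ D and C ≻ D ≻ A ≻ B. Any 2k of the 3k voters who place A ahead of B give the same two scores, since only the A-versus-B pair changes. The helper names are illustrative.

```python
from itertools import combinations


def agreements(ranking, order):
    """Pairs of candidates that ranking and order place the same way."""
    return sum(order.index(x) < order.index(y) for x, y in combinations(ranking, 2))


def condorcet_score(ranking, profile):
    return sum(n * agreements(ranking, order) for n, order in profile)


k = 1
before = [(k, "ABCD"), (k, "BCDA"), (k, "CDAB"), (k, "DABC")]
# Assumed choice of the 2k inverting voters: those holding ABCD and CDAB.
after = [(k, "BACD"), (k, "BCDA"), (k, "CDBA"), (k, "DABC")]

for ranking in ("ABDC", "DCBA"):
    print(ranking, condorcet_score(ranking, before), condorcet_score(ranking, after))
# ABDC 12 10
# DCBA 10 12
```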
Now add one voter to k-Cond (A B C D) with preference-order A ≻
D ≻ C ≻ B to obtain the (4k + 1)-voter profile
{A ≻ D ≻ C ≻ B} + k-Cond (A B C D).
R1 and the other three rank-orders with which it defines a Condorcet-cycle
each have Condorcet-score 14k. All other rank-orders have Condorcet-scores
at most 12k. Since at most six Condorcet-points can be added to any ranking,
one of those first four must be the Condorcet-ranking when k > 3. Among the
first four, the rank-order D ≻ A ≻ B ≻ C has the highest Condorcet-score, so
it is the Condorcet-ranking, as may be seen in table 4.4. On the other hand, it
is clear that A should be the winner and B the loser by any reasonable choice
function. Once again the traditional model leads to an incompatibility between
winners and rankings.
Table 4.4
The Four Rankings with Highest Condorcet-Scores for the (4k + 1)-Voter Profile {A ≻ D ≻ C ≻ B} + k-Cond (A B C D) when k > 3
                           Condorcet-Points                         Condorcet-   Ranks of
                           6    5    4    3         2    1    0     Score        Rankings
R41 : D ≻S A ≻S B ≻S C     k    0    1    2k        k    0    0     14k + 4      1st
R31 : C ≻S D ≻S A ≻S B     k    0    0    2k + 1    k    0    0     14k + 3      2d
R11 : A ≻S B ≻S C ≻S D     k    0    0    2k + 1    k    0    0     14k + 3      2d
R21 : B ≻S C ≻S D ≻S A     k    0    0    2k        k    0    1     14k          4th
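The same brute force confirms Table 4.4. A sketch with k = 4, an arbitrary choice satisfying k > 3; the profile is the single voter A ≻ D ≻ C ≻ B added to k voters for each cyclic shift of A ≻ B ≻ C ≻ D.

```python
from itertools import combinations, permutations


def agreements(ranking, order):
    """Pairs of candidates that ranking and order place the same way."""
    return sum(order.index(x) < order.index(y) for x, y in combinations(ranking, 2))


def condorcet_score(ranking, profile):
    return sum(n * agreements(ranking, order) for n, order in profile)


k = 4  # any k > 3 will do
profile = [(1, "ADCB")] + [(k, s) for s in ("ABCD", "BCDA", "CDAB", "DABC")]
scores = {"".join(r): condorcet_score(r, profile) for r in permutations("ABCD")}

best = max(scores, key=scores.get)
print(best, scores[best])                             # DABC 60, i.e. 14k + 4
print([scores[r] for r in ("CDAB", "ABCD", "BCDA")])  # [59, 59, 56]
```

Every other ranking scores at most 12k + 6 = 54, so D ≻ A ≻ B ≻ C is the Condorcet-ranking.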
Proof To see the truth of this statement, assume there is such a ranking function
and consider the profile
p : A ≻ B ≻ C ≻ A1 ≻ ⋯ ≻ Ak
p : C ≻ A ≻ B ≻ A1 ≻ ⋯ ≻ Ak
q : C ≻ B ≻ A ≻ A1 ≻ ⋯ ≻ Ak
p : B ≻ C ≻ A ≻ A1 ≻ ⋯ ≻ Ak
q : A ≻ C ≻ B ≻ A1 ≻ ⋯ ≻ Ak
q : B ≻ A ≻ C ≻ A1 ≻ ⋯ ≻ Ak.
p + 2q : A ≻ C ≻ B ≻ A1 ≻ ⋯ ≻ Ak.
{A ≻ B} ≻ {B ≻ A} and {B ≻ C} ≻ {C ≻ B}
implies
{A ≻ C} ≻ {C ≻ A},
for every three competitors. It will be seen anon that these conditions can be
satisfied.
The input in this model is a profile that gives the voters' preferences over the
rank-orders on every subset of competitors; the output is a rank-order of society.
The natural counterparts to Arrow's conditions (see chapter 3) are as follows:
(1) There must exist a solution for every possible set of inner-consistent preferences over rank-orders (unrestricted domain).
(2) When every voter has the same preferences, the common preferred rank-order on all n competitors is society's rank-order (unanimity).
(3) If a competitor withdraws, society's rank-order among the remaining competitors remains the same (IIA).
(4) No one voter's preferences over rank-orders can always determine society's rank-order, whatever the preferences of all the other voters (non-dictatorial).
There is no method of amalgamating the preferences over rank-orders into a rank-order of society that satisfies the four
conditions (when there are at least three candidates).
Proof It is shown that there is no method for a particular profile of inner-consistent preferences because of Arrow's theorem; hence there cannot be a
method for an arbitrary profile of inner-consistent preferences.
Name the n competitors A1, A2, . . . , An so that voter v's preferred rank-order
over all n competitors is An ≻ An−1 ≻ ⋯ ≻ A1. Voter v's code is defined by
v(Ak) = k for k = 1, 2, . . . , n. An order ≻c is defined recursively on the codes,
which will in turn determine v's preference between any two rank-orders with
a common set of competitors.
First, (j1, j2) ≻c (j2, j1) if j1 > j2. This simply says that the order {Aj1 ≻ Aj2} is preferred
to the order {Aj2 ≻ Aj1} if j1 > j2.
Next, (j1, j2, . . . , js) ≻c (i1, i2, . . . , is), where the ik's are a permutation
of the jk's, if either
$j_k = s$ and $\max_{1 \le l \le k} i_l < s$,
or
$j_k = s = i_k$ and $(j_1, \ldots, j_{k-1}, j_{k+1}, \ldots, j_s) \succ_c (i_1, \ldots, i_{k-1}, i_{k+1}, \ldots, i_s)$.
The idea is that a voter's top priority is for his favorite among any subset
of candidates to be the highest possible in the ranking. Thus, for example,
(2, 1) ≻c (1, 2) and
(3, 2, 1) ≻c (3, 1, 2) ≻c (2, 3, 1) ≻c (1, 3, 2) ≻c (2, 1, 3) ≻c (1, 2, 3).
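The recursive comparison can be written as a comparator. A sketch, assuming our reading of the two conditions (the sequence that places the largest code earlier is preferred; if both place it at the same position, drop it and compare what remains); sorting the six orders of (1, 2, 3) with it reproduces the chain above. The names `code_prefers` and `compare` are hypothetical.

```python
from functools import cmp_to_key
from itertools import permutations


def code_prefers(j, i):
    """True if the code sequence j is preferred to i under the recursive order.

    j and i are permutations of the same codes. The largest code s decides
    first: whichever sequence places s earlier is preferred. If both place s
    at the same position, s is removed and the shorter sequences are compared.
    """
    if len(j) <= 1:
        return False  # identical single codes: neither is preferred
    s = max(j)
    pj, pi = j.index(s), i.index(s)
    if pj != pi:
        return pj < pi
    return code_prefers(j[:pj] + j[pj + 1:], i[:pi] + i[pi + 1:])


def compare(a, b):
    """Comparator for sorting from most to least preferred."""
    if code_prefers(a, b):
        return -1
    if code_prefers(b, a):
        return 1
    return 0


ordered = sorted(permutations((1, 2, 3)), key=cmp_to_key(compare))
print(ordered)
# [(3, 2, 1), (3, 1, 2), (2, 3, 1), (1, 3, 2), (2, 1, 3), (1, 2, 3)]
```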
The code defines v's preference between any two rank-orders with a common
set of competitors by
{Aj1 ≻ Aj2 ≻ ⋯ ≻ Ajs} ≻ {Aσ(j1) ≻ Aσ(j2) ≻ ⋯ ≻ Aσ(js)}
if
(j1, j2, . . . , js) ≻c (σ(j1), σ(j2), . . . , σ(js)),
Chapter 5
of candidates, and thus he gave a justification for the Borda-points (though the
uniformity assumption is doubtful).¹
Laplace went on to conclude that Borda's is the method for finding the candidate of greatest merit: “Such is the method of election indicated by the Theory
of Probability” (xcii). In doing so, however, he simply assumed (as have many
others) that summing the Borda-points is the evident or only reasonable way of
aggregating the evaluations of many voters.
But he realized that Borda's method can only find the candidate of greatest
merit if the voters honestly report their preference-orders: “This method of
election would be without a doubt the best if considerations alien to the merit
[of a candidate] did not influence the choice of the electors, even the most honest
ones, and did not determine them to rank last the most dangerous opponents
to their favorites, which would give a big advantage to candidates of mediocre
merit. Moreover, the experience of institutions which adopted it has led them
to abandon it” (277; the analysis, 275–279).
Laplace seems to be the first person to have clearly seen the importance of
strategic manipulation in voting.² Beginning with a new and promising idea that
builds a probabilistic argument to justify the Borda-points (and, he thought, the
Borda-winner), he ended up supporting a Condorcet-winner, suggesting that
an assembly that continues to vote often enough will eventually give to one
candidate an absolute majority through sheer exhaustion of the participants
(much like the procedure for electing a Pope). However, an election that requires
many rounds is suspicious, for the result depends on the information transmitted
from round to round, and the ultimate winner may have little to do with the
original preferences of the voters (yet everything to do with one or two strong
personalities among the voters).
Laplace saw a difference between electing a candidate and voting on a provision concerning some common end such as a budget. In the latter case, he
assumed, there is a common search for the one correct decision, and each voter
honestly assigns to alternative motions the probabilities he believes are those
with which they should be chosen. But, again, the voters only report their
preferences. Laplace assumed that the probabilities assigned to the alternative
motions by a voter are uniformly distributed. He then used the same analysis as he did in voting for candidates, calculating the average of all the lowest
1. Grades that are very near perfection are scarce, as are, often though not always, grades that are extremely low. The Orsay experiment and recent experiences in grading wines confirm this (see chapters 14 and 21).
2. Farquharson (1969) traces the study of strategic manipulation to Pliny the Younger. He also cites
Dodgson (1876).
1 : A ≻ B ≻ C
7 : A ≻ C ≻ B
7 : B ≻ C ≻ A
6 : C ≻ B ≻ A.
The first-past-the-post system elects A, but if the six whose true preference-order is C ≻ B ≻ A vote for B, then B is elected, which for them is better. The two-past-the-post system elects B, but if the seven whose true preference-order is A ≻ C ≻ B vote for C in the first round, then A is eliminated and C is elected in the second round, which for them is better. The Borda-winner is C, but if the seven whose true preference-order is B ≻ C ≻ A switched C and A, then B is elected, which they prefer. In the French, Anglo-American, and Borda systems it may pay voters not to vote for their favorites or not to report their orders of preference honestly.
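The first two manipulations can be replayed mechanically. The Python sketch below encodes each ranking as a string, elects under first-past-the-post, under a two-round runoff between the two leading first-round candidates, and under Borda's method with points 2, 1, 0, and then applies the two misreports described above. The function names are hypothetical, and ties are not handled because none arise in this profile.

```python
from collections import Counter

# The 21-voter profile above: (number of voters, ranking from best to worst).
HONEST = [(1, "ABC"), (7, "ACB"), (7, "BCA"), (6, "CBA")]


def first_place_tally(profile):
    tally = Counter()
    for n, ranking in profile:
        tally[ranking[0]] += n
    return tally


def first_past_the_post(profile):
    return first_place_tally(profile).most_common(1)[0][0]


def two_past_the_post(profile):
    """The two leading first-round candidates meet in a majority runoff."""
    (a, _), (b, _) = first_place_tally(profile).most_common(2)
    for_a = sum(n for n, r in profile if r.index(a) < r.index(b))
    total = sum(n for n, _ in profile)
    return a if 2 * for_a > total else b


def borda_winner(profile):
    m = len(profile[0][1])
    score = Counter()
    for n, ranking in profile:
        for pos, cand in enumerate(ranking):
            score[cand] += n * (m - 1 - pos)
    return score.most_common(1)[0][0]


print(first_past_the_post(HONEST), two_past_the_post(HONEST), borda_winner(HONEST))
# A B C

# The six C > B > A voters vote for B instead of C.
fptp_lie = [(1, "ABC"), (7, "ACB"), (7, "BCA"), (6, "BCA")]
# The seven A > C > B voters vote for C in the first round.
runoff_lie = [(1, "ABC"), (7, "CAB"), (7, "BCA"), (6, "CBA")]
print(first_past_the_post(fptp_lie), two_past_the_post(runoff_lie))  # B C
```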
In practice, voters can, and do, send messages that misrepresent their real preferences to abet the chances of the outcomes they seek (see chapter 2). If no candidate in France's presidential elections wins an absolute majority of the votes, there is a runoff between the top two candidates. If an elector is sure his favorite has no chance to survive the first round, he may vote for the candidate he prefers among those who do have a chance; or if his favorite is sure to be one of the top two, he may well do best by voting for the weakest realistic opponent to his favorite. Neither vote is in accord with the elector's true preferences. And, as mentioned earlier, Dodgson concluded that the road to foiling Condorcet-winners is to provoke cyclic-majorities.
More subtle misrepresentations may affect the winner of an election when the alternative vote or Nanson's system is used, because placing a losing candidate lower in some electors' rankings can turn him into a winner (as illustrated in a previous example). When there are only two candidates (and an arbitrary number of electors), the simple majority-rule clearly elicits honest responses, and it is the only impartial method for which every elector's optimal strategy is clearly and unambiguously to vote honestly for one's preferred candidate (May 1952).
5.1 Gibbard-Satterthwaite's Impossibility Theorem
the subset of candidates D. $\Phi_D/\bar{D}$ is a profile over all the candidates defined as follows: for every voter the profile in the top $|D|$ places (if K is a set, $|K|$ is its cardinality) coincides with $\Phi_D$, and in the bottom $|\bar{D}|$ places each voter has the candidates of $\bar{D}$ in any order whatsoever. Define
$g(\Phi_D) = f(\Phi_D/\bar{D}) = A$.
This is an unambiguous definition for two reasons. First, $A \notin \bar{D}$, for suppose otherwise and choose some candidate $C \in D$. Lower every candidate in D other than C below A. Strong monotonicity implies A remains the winner; but C is in the first place of every voter, so unanimity implies C is the winner, a contradiction. Second, since no candidate of $\bar{D}$ can be a winner, strong monotonicity implies they can be rearranged in any order without changing the outcome. Thus g is a choice rule.
Three of the properties of the choice rule version of Arrow's theorem (see chapter 3) are clearly satisfied: unrestricted domain (1), unanimity (2′), and nondictatorship (4). It remains to show (3′), namely, that if $g(\Phi_D) = A$ and some nonwinner C is dropped, then A remains the winner. But that is obvious because
Corollary
the Condorcet procedure, or one of its variants. He characterized the anonymous, efficient (Pareto optimal) choice rules that are strategy-proof. In practice they may be described as n different mechanisms, where n is the number of voters.⁵ Identify each voter's preferred candidate on the real line. Some candidates may be the most preferred of many voters, some of no voters. For instance, suppose there are three candidates A, B, and C that go from left to right on the real line, and fifteen voters, where seven most prefer A, three most prefer B, and five most prefer C. The kth mechanism elects the kth of the most preferred candidates in going from left to right on the real line. Thus, in the example, the first through seventh mechanisms elect A; the eighth through tenth elect B; and the last five elect C. The median mechanism opts for the candidate of the median voter (the eighth in the example) and elects B, which is the Condorcet-winner because the preferences are single-peaked. Each of the n mechanisms is clearly strategy-proof: a voter who reports other than his most preferred candidate either changes nothing or changes the outcome to a less preferred candidate. It has also been shown that the same result holds if each voter reports a complete, single-peaked rank-order, though the social choice function will depend only on the peaks of the reported preferences (Barberà and Jackson 1994). But as practical schemes these methods make no sense at all: how are candidates for public office, competing wines, or Olympic skaters to be aligned on the real line so that the preferences of all voters or all judges are single-peaked relative to that alignment?
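The family of n mechanisms is easy to state in code. The Python sketch below is our illustration, not Moulin's construction: it encodes the left-to-right order A, B, C as positions 0, 1, 2, elects the kth reported peak, and checks by brute force that no single voter can pull the outcome strictly closer to his true peak. It assumes, as our own simplification, that each voter prefers outcomes nearer his peak on the line; the function names are hypothetical.

```python
# Candidates in their left-to-right order on the real line.
LINE = ["A", "B", "C"]
POS = {c: i for i, c in enumerate(LINE)}


def kth_mechanism(peaks, k):
    """Elect the k-th (1-based) reported peak, counting from the left."""
    return sorted(peaks, key=POS.__getitem__)[k - 1]


def can_gain_by_lying(peaks, k):
    """Brute force over single-voter misreports. Assumes a voter prefers
    outcomes closer to his true peak (a single-peaked utility)."""
    honest = kth_mechanism(peaks, k)
    for i, peak in enumerate(peaks):
        for lie in LINE:
            outcome = kth_mechanism(peaks[:i] + [lie] + peaks[i + 1:], k)
            if abs(POS[outcome] - POS[peak]) < abs(POS[honest] - POS[peak]):
                return True
    return False


peaks = ["A"] * 7 + ["B"] * 3 + ["C"] * 5
print("".join(kth_mechanism(peaks, k) for k in range(1, 16)))  # AAAAAAABBBCCCCC
print(kth_mechanism(peaks, 8))                                  # B (median voter)
print(any(can_gain_by_lying(peaks, k) for k in range(1, 16)))   # False
```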
The attempts to escape Arrow's theorem by restricting the domain of preference-profiles have their counterparts in attempting to escape Gibbard-Satterthwaite's theorem. For example, if the domain is restricted to single-peaked preferences, any Condorcet-consistent method is strategy-proof (H. Moulin 1988, 263). But how can a voter who wishes to manipulate be restricted in any way? For example, if there were three candidates L (left), C (center), and R (right), a voter would be denied the inputs L ≻ R ≻ C and R ≻ L ≻ C. Even if it were true that preferences are single-peaked in voting, it makes absolutely no sense to imagine that voters could be restricted to single-peaked inputs (indeed, how would the law be formulated?).
The Gibbard-Satterthwaite impossibility theorem adds one more reason for
rejecting the traditional model. And yet, the majority judgment shows that there
is a way of combating manipulation though not of eliminating it entirely. The
essence of the idea must be attributed to Galton.
5. Moulin gives a purely theoretical description that is somewhat more complicated. His description
introduces fictitious alternatives, and his characterization includes more mechanisms.
Some one hundred years after Laplace had raised the problem, Sir Francis Galton, distinguished statistician, pioneer of correlation and regression, inventor of fingerprint identification, geographer and explorer, meteorologist, founder of differential psychology, geneticist (and eugenicist), cousin of Charles Darwin, and best-selling author, proposed a different, considerably more convincing solution to the now well-identified budget problem:
A certain class of problems do not as yet appear to be solved according to scientific
rules, though they are of much importance and of frequent recurrence. Two examples
will suffice. (1) A jury has to assess damages. (2) The council of a society has to fix on a
sum of money, suitable for some purpose. Each voter, whether of the jury or the council,
has equal authority with each of his colleagues. How can the right conclusion be reached,
considering that there may be as many different estimates as there are members? That
conclusion is clearly not the average of all the estimates, which would give a voting
power to cranks in proportion to their crankiness. One absurdly large or small estimate
would leave a greater impress on the result than one of reasonable amount, and the more
an estimate diverges from the bulk of the rest, the more influence would it exert. I wish
to point out that the estimate to which least objection can be raised is the middlemost
estimate, the number of votes that it is too high being exactly balanced by the number
of votes that it is too low. Every other estimate is condemned by a majority of voters
as being either too high or too low, the middlemost alone escaping this condemnation.
(Galton 1907a; our emphasis).
The budget problem has a very distinctive property: when the alternatives are
quantities, they have a natural order.
Not a man to be satisfied by mere theory alone, Sir Francis applied the idea
a week later. In doing so he displayed not only his practical spirit but also his
wisdom and his wit:
In these democratic days, any investigation into the trustworthiness and peculiarities of
popular judgments is of interest. . .
A weight-judging competition was carried on at the annual show of the West of
England Fat Stock and Poultry Exhibition recently held at Plymouth. A fat ox having
been selected, competitors bought stamped and numbered cards, for 6d. each, on which
to inscribe their names, addresses, and estimates of what the ox would weigh after it
had been slaughtered and dressed. Those who guessed most successfully received
prizes . . . The judgments were unbiased by passion and uninfluenced by oratory and the
like. The sixpence fee deterred practical joking, and the hope of a prize and the joy of
competition prompted each competitor to do his best . . . The average competitor was
probably as well fitted for making a just estimate of the dressed weight of the ox, as an
average voter is of judging the merits of most political issues on which he votes, and
the variety among the voters to judge justly was probably much the same in either case.
(Galton 1907b)
The rate which together with all higher rates just represented a majority of all the
ballots cast became the millage rate; in our terminology, the majority-rate was
the choice, that is, the middlemost when the number of ballots was odd and the
lower middlemost when the number of ballots was even. The system was abandoned in 1968 when Florida adopted a new constitution (Holcombe and Kenny
2007). Why and at whose urging the method was chosen remains a mystery.
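The Florida rule amounts to taking the middlemost ballot, or the lower middlemost when the number of ballots is even. A minimal Python sketch, with a hypothetical function name and made-up rates used purely for illustration:

```python
def majority_rate(rates):
    """Scan the ballots from the highest rate down and stop at the first
    rate at which the ballots seen so far form a majority of all ballots."""
    ordered = sorted(rates, reverse=True)
    for seen, rate in enumerate(ordered, start=1):
        if 2 * seen > len(ordered):
            return rate
    raise ValueError("no ballots")


print(majority_rate([8, 10, 9, 12, 11]))  # 10: the middlemost of five ballots
print(majority_rate([8, 10, 9, 12]))      # 9: the lower middlemost of four ballots
```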
It seems a strange quirk in the history of ideas that Galton's idea has lain dormant. This may be due to Duncan Black, who called Galton's "a small contribution that had no doubt been made independently by many other people" (1958, 188), explaining that he mentioned it only because of Galton's stature as a statistician. Yet Galton's idea is far from alien to Black's main result. It is natural to order the different money amounts to be budgeted from lowest to highest; and it is not unreasonable to suppose that each member of the council or the jury has a preference for some one ideal budget and that her liking for the others decreases the more they differ from her ideal. Galton had realized that relative to the natural order every voter's or judge's preference-order in the budgeting problem (or the weight of the dressed ox problem) is single-peaked. And so he argued for the middlemost estimate: "Every other estimate is condemned by a majority of voters as being either too high or too low, the middlemost alone escaping this condemnation."
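Galton's argument can be checked directly: an estimate is condemned when more than half the voters find it too high, or more than half find it too low. The Python sketch below uses a hypothetical function name and five invented estimates, one of them a crank's. Only the middlemost estimate escapes condemnation, while the average, pulled upward by the crank, does not.

```python
from statistics import mean, median


def condemned(estimates, x):
    """Galton's test: x is condemned if a majority finds it too high
    (their estimate is below x) or a majority finds it too low."""
    n = len(estimates)
    too_high = sum(e < x for e in estimates)
    too_low = sum(e > x for e in estimates)
    return 2 * too_high > n or 2 * too_low > n


# Invented estimates, one of them absurdly large.
estimates = [1000, 1100, 1150, 1200, 9999]
print(median(estimates), mean(estimates))                     # 1150 2889.8
print([x for x in estimates if not condemned(estimates, x)])  # [1150]
print(condemned(estimates, mean(estimates)))                  # True
```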
5.3 Majority Judgment Methods
The majority-grade and the majority-ranking (or majority-value or majority-gauge), the principal concepts of the theory developed in this book, may be seen as generalizations of Galton's astute observation (though implicitly he seems to have assumed single-peaked preferences).
Two principal methods for finding solutions in the context of the traditional model survive close analysis: Borda's for winners, Condorcet's for rankings. Their major drawback is their manipulability. That is primarily due to the fact that both methods sum points to obtain scores. The majority judgment, applied respectively to Borda-points and Condorcet-points, aggregates them not by summing but by taking the middlemost points.
Borda-majority judgment method. A voter's input is a rank-order of the candidates. It determines the Borda-points assigned by the voter to each candidate. A candidate's majority-grade and majority-value (or majority-gauge) are computed on the basis of his Borda-points. A candidate with a highest majority-value (or majority-gauge) is elected.⁶
To understand how the method works, consider again the real example of the Social Choice and Welfare (SCW) Society's presidential election (Brams and Fishburn 2001; Saari 2001a):
13 : A ≻ B ≻ C
11 : A ≻ C ≻ B
9 : B ≻ C ≻ A
11 : C ≻ A ≻ B
8 : C ≻ B ≻ A.
Table 5.1a
Borda-Majority Judgment Method, SCW Society Election

       Borda-Points       Majority-   Majority-Gauge   Majority-   Borda-
       2     1     0      Value       (p, α, q)        Ranking     Score
A      24    11    17     1.1…12      (24, 1+, 17)     1st         59
C      19    20    13     1.1…11      (19, 1+, 13)     2d          58
B      9     21    22     1.1…10      (9, 1−, 22)      3d          39
italics, to distinguish them as grades) are given in table 5.1a. A, for example, is ranked first 24 times (so is assigned that number of 2s), ranked second 11 times (so is assigned that number of 1s), and ranked last 17 times (so is assigned that number of 0s). All three candidates have the same majority-grade of 1. The majority-gauges are sufficient to determine the order (here p and q are the numbers of grades higher and lower than the majority-grade rather than their percentages). The majority-values are written only with the precision needed to rank the candidates. The Borda-majority method makes A the winner and ranks the candidates in the order A ≻_S C ≻_S B (in agreement with the Borda-ranking).
The Borda-points are not summed, so the method resists manipulation. For example, if two of the eleven voters with preference-order C ≻ A ≻ B manipulated by reporting C ≻ B ≻ A instead, the Borda-ranking would change to C ≻_S A ≻_S B, but the majority-ranking would remain the same: it would take at least seven of those eleven voters to manipulate in the same way to change the outcome of the majority-ranking.
The method is impartial, unanimous, and nondictatorial. However, independence of irrelevant alternatives (IIA) may be violated (necessarily, by Arrow's theorem), and it is: if B withdraws, the profile becomes 24 : A ≻ C and 28 : C ≻ A, so C is ranked ahead of A (C wins against A), as shown in table 5.1b.
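The computations behind tables 5.1a and 5.1b, and the manipulation just described, can be reproduced with a short Python sketch. The function names are ours. Majority-grades use the lower middlemost grade when the number of grades is even, majority-values are compared lexicographically, and the gauge is printed as (p, majority-grade, q) without its + or − sign.

```python
# SCW Society profile: (number of voters, ranking from best to worst).
SCW = [(13, "ABC"), (11, "ACB"), (9, "BCA"), (11, "CAB"), (8, "CBA")]


def borda_grades(profile):
    """Borda-points as grades: m-1 for a voter's first choice, ..., 0 for last."""
    m = len(profile[0][1])
    grades = {c: [] for c in profile[0][1]}
    for n, ranking in profile:
        for pos, cand in enumerate(ranking):
            grades[cand] += [m - 1 - pos] * n
    return grades


def majority_value(grades):
    """Take the majority-grade (lower middlemost when the count is even),
    remove it, and repeat; the resulting sequence is the majority-value."""
    g = sorted(grades, reverse=True)
    return [g.pop(len(g) // 2) for _ in range(len(g))]


def majority_gauge(grades):
    """(p, majority-grade, q): numbers of grades above and below it."""
    alpha = majority_value(grades)[0]
    return (sum(x > alpha for x in grades), alpha, sum(x < alpha for x in grades))


def majority_ranking(grades):
    """Rank candidates by majority-value, compared lexicographically."""
    return sorted(grades, key=lambda c: majority_value(grades[c]), reverse=True)


def borda_ranking(grades):
    return sorted(grades, key=lambda c: sum(grades[c]), reverse=True)


g = borda_grades(SCW)
for c in majority_ranking(g):
    print(c, majority_gauge(g[c]), sum(g[c]))
# A (24, 1, 17) 59
# C (19, 1, 13) 58
# B (9, 1, 22) 39


def switch(k):
    """k of the eleven C > A > B voters report C > B > A instead."""
    return [(13, "ABC"), (11, "ACB"), (9, "BCA"), (11 - k, "CAB"), (8 + k, "CBA")]


print(borda_ranking(borda_grades(switch(2))))     # ['C', 'A', 'B']
print(majority_ranking(borda_grades(switch(2))))  # ['A', 'C', 'B']
print(majority_ranking(borda_grades(switch(6)))[0], majority_ranking(borda_grades(switch(7)))[0])  # A C

# B withdraws: only A and C remain, so each ballot gives 1 or 0 points.
no_b = [(n, r.replace("B", "")) for n, r in SCW]
print(majority_ranking(borda_grades(no_b)))       # ['C', 'A']
```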
A similar approach defines a method for ranking:
Condorcet-majority judgment method. A voter's input is a rank-order of the candidates. It determines the Condorcet-points assigned by a voter to each ranking. A ranking's majority-grade and majority-value (or majority-gauge) are computed on the basis of its Condorcet-points. A ranking with a highest majority-value (or majority-gauge) is chosen.
Table 5.1b
Borda-Majority Judgment Method, SCW Society Election, Restricted to Candidates A, C

       Borda-Points     Majority-   Majority-
       1     0          Grade       Ranking
C      28    24         1           1st
A      24    28         0           2d
Table 5.1c
Condorcet-Majority Judgment Method, SCW Society Election

                   Condorcet-Points          Majority-   Majority-       Majority-   Condorcet-
Ranking            3     2     1     0       Value       Gauge           Ranking     Score
A ≻_S C ≻_S B      11    24    8     9       2.2…22      (11, 2−, 17)    1st         89
C ≻_S A ≻_S B      11    19    22    0       2.2…21      (11, 2−, 22)    2d          93
C ≻_S B ≻_S A      8     20    11    13      2.2…21      (8, 2−, 24)     3d          75
A ≻_S B ≻_S C      13    11    20    8       1.1…12      (24, 1+, 8)     4th         81
B ≻_S A ≻_S C      0     22    19    11      1.1…12      (22, 1+, 11)    5th         63
B ≻_S C ≻_S A      9     8     24    11      1.1…11      (17, 1+, 11)    6th         67
This is applied to the same example in table 5.1c. Grades are given to each ranking. For example, A ≻ B ≻ C is given the grade of 3 Condorcet-points 13 times (because 13 preference-orders agree in all three comparisons of candidates, namely, A ≻ B ≻ C), 2 Condorcet-points 11 times (because 11 preference-orders agree in two comparisons of candidates, namely, A ≻ C ≻ B), 1 Condorcet-point 20 times (because 20 preference-orders agree in one comparison, namely, B ≻ C ≻ A and C ≻ A ≻ B), and 0 Condorcet-points 8 times (because 8 preference-orders disagree completely, namely, C ≻ B ≻ A).
The Condorcet-majority method's solution to the SCW Society problem is A ≻_S C ≻_S B. It agrees with the Borda-majority method but not with the solution of Condorcet's method, C ≻_S A ≻_S B (compatibility with Borda is not to be expected in general). But the Condorcet-points are not summed, as they are in Condorcet's method, and manipulation is resisted with this method, too.
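The Condorcet-points of table 5.1c can likewise be computed mechanically. In the Python sketch below (function names hypothetical) each of the six rankings receives one grade per voter. The rankings are then ordered by majority-value (lower middlemost convention) and, for comparison, by the summed Condorcet-score.

```python
from itertools import combinations, permutations

SCW = [(13, "ABC"), (11, "ACB"), (9, "BCA"), (11, "CAB"), (8, "CBA")]


def condorcet_points(ranking, voter):
    """Number of pairs of candidates that `ranking` orders as `voter` does."""
    return sum(
        (ranking.index(x) < ranking.index(y)) == (voter.index(x) < voter.index(y))
        for x, y in combinations(voter, 2)
    )


def majority_value(grades):
    """Repeatedly take and remove the (lower) middlemost grade."""
    g = sorted(grades, reverse=True)
    return [g.pop(len(g) // 2) for _ in range(len(g))]


def grade_rankings(profile):
    """Condorcet-points of every possible ranking, one grade per voter."""
    cands = sorted(profile[0][1])
    return {
        "".join(r): [condorcet_points(r, v) for n, v in profile for _ in range(n)]
        for r in permutations(cands)
    }


grades = grade_rankings(SCW)
by_mj = sorted(grades, key=lambda r: majority_value(grades[r]), reverse=True)
by_sum = sorted(grades, key=lambda r: sum(grades[r]), reverse=True)
print(by_mj)   # ['ACB', 'CAB', 'CBA', 'ABC', 'BAC', 'BCA']
print(by_sum)  # ['CAB', 'ACB', 'ABC', 'CBA', 'BCA', 'BAC']
print({r: sum(grades[r]) for r in by_mj})
# {'ACB': 89, 'CAB': 93, 'CBA': 75, 'ABC': 81, 'BAC': 63, 'BCA': 67}
```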
Consider again the 3k-voter profile k-Cond(A ≻ B ≻ C):
k : A ≻ B ≻ C
k : B ≻ C ≻ A
k : C ≻ A ≻ B.

Table 5.2
Condorcet-Majority Judgment Method, 3k Voters with Profile k-Cond(A ≻ B ≻ C)

                      Condorcet-Points         Majority-   Majority-   Condorcet-
                      3     2     1     0      Grade       Ranking     Score
R2 : A ≻_S C ≻_S B    0     2k    0     k      2           1st         4k
R1 : A ≻_S B ≻_S C    k     0     2k    0      1           4th         5k

Note: Three rankings are tied for 1st, three for 4th.
As table 5.2 shows, Condorcet's method selects the three rankings that with R1 form a Condorcet-cycle; the Condorcet-majority method selects the compromise solution, namely, the other three rankings that with R2 form a Condorcet-cycle. But the Condorcet-majority method is not choice-compatible. For suppose there is one voter whose preference-order is A ≻ B ≻ C; the (unanimous) winner is A and the (unanimous) loser is C. When the 3k-voter profile that is a Condorcet-component is adjoined, the method selects A ≻_S C ≻_S B and B ≻_S A ≻_S C; choice-compatibility would have it select A ≻_S B ≻_S C.
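The failure of choice-compatibility can be checked for several values of k. The Python sketch below (hypothetical names) repeats the Condorcet-points and majority-value helpers so that it runs on its own, and lists every ranking with a highest majority-value.

```python
from itertools import combinations, permutations


def condorcet_points(ranking, voter):
    """Number of pairs of candidates ordered the same way by both."""
    return sum(
        (ranking.index(x) < ranking.index(y)) == (voter.index(x) < voter.index(y))
        for x, y in combinations(voter, 2)
    )


def majority_value(grades):
    """Repeatedly take and remove the (lower) middlemost grade."""
    g = sorted(grades, reverse=True)
    return [g.pop(len(g) // 2) for _ in range(len(g))]


def condorcet_majority_winners(profile):
    """All rankings of A, B, C with a highest majority-value."""
    values = {}
    for r in permutations("ABC"):
        grades = [condorcet_points(r, v) for n, v in profile for _ in range(n)]
        values["".join(r)] = majority_value(grades)
    best = max(values.values())
    return sorted(r for r, v in values.items() if v == best)


def k_cond_plus_one(k):
    """k-Cond(A > B > C) adjoined to one extra voter with A > B > C."""
    return [(k + 1, "ABC"), (k, "BCA"), (k, "CAB")]


print(condorcet_majority_winners([(1, "ABC")]))        # ['ABC']
print(condorcet_majority_winners(k_cond_plus_one(5)))  # ['ACB', 'BAC']
```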
Now take the profile to be k-Cond(A ≻ B ≻ C ≻ D) over 4k voters:
k : A ≻ B ≻ C ≻ D
k : C ≻ D ≻ A ≻ B
k : B ≻ C ≻ D ≻ A
k : D ≻ A ≻ B ≻ C.
In this case (see table 5.3) the Condorcet-majority method and the Condorcet-score yield different winning rankings and different rankings of the rankings.

Table 5.3
Condorcet-Majority Judgment Method, 4k Voters with Profile k-Cond(A ≻ B ≻ C ≻ D)

                            Condorcet-Points                     Majority-   Majority-   Condorcet-
                            6    5    4    3    2    1    0      Value       Rank        Score
R3 : A ≻_S C ≻_S D ≻_S B    0    0    2k   k    0    k    0      3.4         1st         12k
R1 : A ≻_S B ≻_S C ≻_S D    k    0    0    2k   k    0    0      3.3…32      5th         14k
R6 : A ≻_S D ≻_S C ≻_S B    0    0    k    2k   0    0    k      3.3…30      9th         10k
R2 : A ≻_S B ≻_S D ≻_S C    0    k    k    0    k    k    0      2.4         13th        12k
R4 : A ≻_S D ≻_S B ≻_S C    0    k    k    0    k    k    0      2.4         13th        12k
R5 : A ≻_S C ≻_S B ≻_S D    0    k    0    k    2k   0    0      2.3         21st        12k

Note: In the majority-values of R1 and R6, the 3 following the decimal point is repeated 2k − 1 times.

The point of departure of the Condorcet-majority method is that each voter's input is a rank-order of the candidates that gives a grade to each possible rank-order, its Condorcet-points. The grades are then used to determine society's rank-order (the Condorcet method adds them, the Condorcet-majority method applies the majority-value). The grade given to a rank-order is the number of times the order between two candidates agrees with the voter's rank-order. Thus, for example, when there are three candidates A, B, and C and a voter's input is A ≻ B ≻ C, the voter gives a grade of 3 to A ≻ B ≻ C; a grade of 2 to A ≻ C ≻ B and B ≻ A ≻ C; a grade of 1 to C ≻ A ≻ B and B ≻ C ≻ A; and a grade of 0 to C ≻ B ≻ A. A higher number grade is better.
The construction given in the proof of theorem 4.5 suggests a more refined idea for assigning grades. If a voter's input is $A_n \succ A_{n-1} \succ \cdots \succ A_1$, then for $(j_1, j_2, \ldots, j_n)$ a permutation of $(1, 2, \ldots, n)$, the rank-order
$\{A_{j_1} \succ A_{j_2} \succ \cdots \succ A_{j_n}\}$
is given the top-preferred-grade $(j_1, j_2, \ldots, j_n)$. For example, when the voter's input is A ≻ B ≻ C:
A ≻ B ≻ C    (3, 2, 1)
A ≻ C ≻ B    (3, 1, 2)
B ≻ A ≻ C    (2, 3, 1)
C ≻ A ≻ B    (1, 3, 2)
B ≻ C ≻ A    (2, 1, 3)
C ≻ B ≻ A    (1, 2, 3)
The order among the top-preferred-grades agrees with, but is more refined than, the Condorcet-points. The Condorcet-points are the same for A ≻ C ≻ B and B ≻ A ≻ C, whereas the first is better than the second according to the top-preferred-grades. This is not true when there are more than three candidates.
Top-preferred-majority judgment method. A voter's input is a rank-order of the candidates. It determines the top-preferred-grades assigned by a voter to each ranking. A ranking's majority-grade and majority-value (or majority-gauge) are computed on the basis of its top-preferred-grades. A ranking with a highest majority-value (or majority-gauge) is chosen.
The top-preferred-majority judgment method is applied to the SCW Society election problem in table 5.4. The ranking A ≻ C ≻ B has 13 grades of (3, 1, 2), for example, because the input of 13 voters is A ≻ B ≻ C. The solution happens to be exactly the same rank-ordering of the rank-orders as the Condorcet-majority method, though this clearly will not always be the case. Notice that not one of the three majority judgment methods places the Condorcet-winner C first. The top-preferred-majority method resists manipulation. The very nature of the grades it uses shows that it is nonsense to assign them numerical values and sum them to determine society's preferences.
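The grades listed for the input A ≻ B ≻ C, and the claim that top-preferred-grades separate A ≻ C ≻ B from B ≻ A ≻ C while Condorcet-points do not, can be reproduced for a single voter. The Python sketch below orders grades by one reading of ≻_c that is consistent with the examples given at the start of this section (compare the position of the largest label first, then the next largest, and so on). That reading is our assumption, and the function names are hypothetical.

```python
from itertools import combinations, permutations


def top_preferred_grade(voter, ranking):
    """Label the voter's favorite n, the next n - 1, ..., the last 1; the
    ranking A_{j1} > A_{j2} > ... then receives the grade (j1, j2, ...)."""
    n = len(voter)
    label = {cand: n - i for i, cand in enumerate(voter)}
    return tuple(label[c] for c in ranking)


def code_key(code):
    """Assumed reading of >_c (smaller key = better grade): compare the
    position of the largest label first, then the next largest, and so on."""
    return tuple(code.index(lab) for lab in sorted(code, reverse=True))


def condorcet_points(voter, ranking):
    """Number of pairs of candidates ordered the same way by both."""
    return sum(
        (ranking.index(x) < ranking.index(y)) == (voter.index(x) < voter.index(y))
        for x, y in combinations(voter, 2)
    )


voter = "ABC"
order = sorted(map("".join, permutations(voter)),
               key=lambda r: code_key(top_preferred_grade(voter, r)))
for r in order:
    print(r, top_preferred_grade(voter, r), condorcet_points(voter, r))
# ABC (3, 2, 1) 3
# ACB (3, 1, 2) 2
# BAC (2, 3, 1) 2
# CAB (1, 3, 2) 1
# BCA (2, 1, 3) 1
# CBA (1, 2, 3) 0
```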
Table 5.4
Top-Preferred-Majority Judgment Method, SCW Society Election

                   Top-Preferred-Grades                           Majority-          Majority-   Condorcet-
Ranking            (321)  (312)  (231)  (132)  (213)  (123)       Gauge              Rank        Score
A ≻_S C ≻_S B      11     13     11     0      8      9           (24, (231)+, 17)   1st         89
C ≻_S A ≻_S B      11     8      11     9      13     0           (19, (231)−, 22)   2d          93
C ≻_S B ≻_S A      8      11     9      11     0      13          (19, (231)−, 24)   3d          75
A ≻_S B ≻_S C      13     11     0      11     9      8           (24, (132)+, 17)   4th         81
B ≻_S A ≻_S C      0      9      13     8      11     11          (22, (132)−, 22)   5th         63
B ≻_S C ≻_S A      9      0      8      13     11     11          (17, (132)−, 22)   6th         67

5.4 The Majority Judgment for the Traditional Model
The two most famous impossibility theorems, Arrow's and Gibbard-Satterthwaite's, together with the incompatibility between choosing and ranking, combine to prove one basic truth: within the age-old model there is no satisfactory scheme for determining a winner or an order of merit among candidates unless there are only two candidates. But suppose a practitioner faces a situation where, owing to lack of time or lack of understanding or some other cause, it is impossible to define a common language with which to evaluate candidates; all that can be obtained from voters or judges are rank-orderings of candidates. What then? In the context of the traditional model, use the Borda-majority judgment method.
Why? Inputs are messages. Ideally, they would be grades, permitting voters total freedom of expression (within the bounds of the language of grades). But why use Borda-points instead of some other strictly monotonic scoring scheme $s_1 > \cdots > s_n$? Laplace justified their use by assuming an underlying uniform distribution of the competitors' merits on an arbitrary interval [0, R] of the real line. But that assumption seems unreasonable: very high grades and very low grades are rare. Laplace derived a completely different set of scores when he assumed voters assign probabilities to alternative motions. The Denmark school scale used the ten marks {0, 3, 5, 6, . . . , 11, 13}, omitting 1, 2, 4, and 12 (see chapter 1), for evaluating students because of the observed distribution of their merit. Different underlying distributions may be used to justify the use of different scoring schemes (see chapter 8). When different scoring schemes are used to grade and to rank by adding or averaging, they yield very different solutions. The results of the Borda-majority method are one and the same for every distribution and thus for every strictly monotonic scoring scheme.
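The invariance claim can be illustrated on the SCW Society election. The Python sketch below (hypothetical names) grades each candidate under two strictly decreasing scoring schemes, the Borda-points (2, 1, 0) and an alternative (10, 9, 0) chosen arbitrarily for illustration, and ranks the candidates both by summing and by majority-value. Summing changes the winner; the majority-value ranking does not.

```python
# SCW Society election: how many voters rank each candidate 1st, 2nd, 3rd.
PLACES = {"A": (24, 11, 17), "C": (19, 20, 13), "B": (9, 21, 22)}


def grades_for(scores):
    """One grade per voter under a scoring scheme s1 > s2 > s3
    (the points given for 1st, 2nd, and 3rd place)."""
    return {c: [s for s, k in zip(scores, counts) for _ in range(k)]
            for c, counts in PLACES.items()}


def majority_value(grades):
    """Repeatedly take and remove the (lower) middlemost grade."""
    g = sorted(grades, reverse=True)
    return [g.pop(len(g) // 2) for _ in range(len(g))]


def rank_by_sum(grades):
    return sorted(grades, key=lambda c: sum(grades[c]), reverse=True)


def rank_by_majority(grades):
    return sorted(grades, key=lambda c: majority_value(grades[c]), reverse=True)


for scores in [(2, 1, 0), (10, 9, 0)]:
    g = grades_for(scores)
    print(scores, rank_by_sum(g), rank_by_majority(g))
# (2, 1, 0) ['A', 'C', 'B'] ['A', 'C', 'B']
# (10, 9, 0) ['C', 'A', 'B'] ['A', 'C', 'B']
```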
Furthermore, the Borda-majority judgment method enjoys many of the good properties of the majority judgment. It is strategy-proof-in-grading. Since a candidate's final grade is the majority-grade, any voter who ranks the candidate above it cannot increase the final grade by placing the candidate higher in his input, and similarly, any voter who ranks the candidate below it cannot decrease the final grade by placing the candidate lower in his input. The Borda-majority method is perhaps even more resistant to manipulation because if a voter places some candidate higher (or lower) in his preference-order, then he necessarily places one or more others lower (or higher), thus complicating the effort to manipulate. This method is also group strategy-proof-in-grading: the same holds for groups of voters. Moreover, a candidate's Borda-majority-grade is sure to increase by 1 only if a majority of the voters increase the candidate's ranking by 1. It is only in unusual circumstances that one or a very few voters can change the grade. These arguments concern grades. Other strategic considerations come into play if the aim is not grades but the rank-ordering induced by grades. The method is partially strategy-proof-in-ranking: if society's rank-order placed one candidate ahead of another, A ≻_S B, and some voter preferred the opposite, that voter could not change her input and both lower A's majority-grade and raise B's majority-grade; at most, the voter could either raise B's or lower A's. The method resists manipulation in still other ways, as illustrated by the example of table 5.1a. Finally, it almost reconciles Borda and Condorcet: the top-preferred-majority-winner always has the highest Borda-majority-grade.
The Borda-majority judgment method is a new and more satisfactory solution
to the problem of choosing a winner and a ranking in the context of the traditional
model. It happens that (almost) this method was used in practice for many years
by the International Skating Union to rank figure skaters. Specifically, an odd
number of judges ranked the skaters, and the majority-grades determined the
jury's ranking together with an ad hoc assortment of rules to break ties (see
chapter 7).
The Borda-majority judgment method is impartial, resists manipulation,
and satisfies winner-loser unanimity, choice-compatibility, and choice-monotonicity, but it (necessarily) fails rank-compatibility and rank-monotonicity (and
so also fails independence of irrelevant alternatives). The reason for this failure
is that the scale of grades changes when the number of candidates changes,
and this can induce changes in the majority-ranking. That is its unavoidable
weakness. When there is a common language, that weakness is overcome.
During the Middle Ages there were all kinds of crazy ideas, such as that a piece of rhinoceros horn would increase potency. Then a method was discovered for separating the ideas, which was to try one to see if it worked, and if it didn't work, to eliminate it. This method became organized, of course, into science.
Richard P. Feynman
Several centuries of work on the theory of social choice have produced very
substantial contributions, notably, in identifying a host of important properties or criteria that should (or should not) be satisfied by a mechanism that
amalgamates the beliefs, desires, or wills of individuals into a decision of
society.
Arrow's paradox must be avoided: a method should satisfy independence of irrelevant alternatives, that is, the presence or absence of some candidate should not cause a change of winner between two others. Condorcet's paradox must be avoided: a method should yield a transitive order of finish among the competitors. A method should be monotonic: a winning candidate who receives more votes or rises in the rank-orders of candidates must remain the winning candidate. A unanimous decision among individual voters must be the decision of society. A mechanism should make each voter's optimal strategy the message that honestly expresses his beliefs; or, if no such mechanism can be found, then one must be found that best resists strategic manipulation and best incites the electorate to express itself honestly.
Regrettably, the theory shows that even when voters eschew strategic voting
and honestly express their convictions, there exists no method that satisfies the
essential criteria, unless it is assumed that voters have very restricted types
of unrealistic views. The impossibility and incompatibility theorems prove
that the traditional model harbors internal inconsistencies. The reason for this
conundrum is the basic paradigm of social choice: voting depends on comparisons between pairs of candidates (one is better than another), so voters have lists of preferences in their minds. Instead of inputs that evaluate the absolute
merits of candidates, the inputs compare the relative standings of candidates.
But even the idea of comparing is questionable: if the decision or output is to
be a rank-order of the candidates, should not the voters be asked to compare the
relative merits of the various possible rank-orders rather than only the relative
merits of candidates?
6.1 Unrealistic Inputs
Every bit as damning as the logical inconsistencies of the theory is the fact that the traditional paradigm of voting theory (that voters, when confronted by a set of candidates, compare them or rank-order them) is simply wrong. Voters do not go to the polls with rank-orders of the candidates in their minds. The French presidential elections of 2002 and 2007 had, respectively, sixteen and twelve candidates. Instead of effecting a rank-ordering, a voter ignored most candidates as unacceptable and looked upon a few with varying intensities of approval or disapproval. The model that underlies the theory simply does not correspond to reality. Experimental evidence proves this conclusively. Information drawn from three electoral experiments refutes the traditional view as well as several other preconceived ideas.
The Orsay experiment (see chapters 1 and 15) tested the majority judgment, so voters' inputs were expressed in a common language of grades (Excellent, Very Good, Good, Acceptable, Poor, and To Reject¹) evaluating the merits of candidates. Of the 2,360 who voted, 1,752 officially participated; 1,733 ballots were valid. Contrary to the predictions of some, the voters had no difficulty in filling out the ballots, usually doing so in about one minute. In fact, every member of the team conducting the experiment had the impression that the participants were very glad to have the means of expressing their opinions concerning all the candidates, and were delighted with the idea that candidates would be assigned final grades.² An effective argument to persuade reluctant voters to participate was that the majority judgment allows a much fuller expression of a voter's opinions. The actual system offered voters only thirteen possible messages: to vote for one of the twelve candidates or to vote for none. Several participants actually stated that the experiment had induced them to vote for the first time: finally, a method that permitted them to express themselves. The majority judgment offered voters more than 2 billion possible messages with which to express themselves (with twelve candidates and six grades, there are $6^{12}$ = 2,176,782,336 possible messages). The voters' relative ease of expression in the face of so vast a choice shows that assigning grades is cognitively simple, certainly much simpler than ranking candidates (as any teacher or professor faced with ranking students will attest). Of the 1,733 valid majority judgment ballots, 1,705 were different. It is surprising that they were not all different. Had all those who voted in France in 2007 (some 36 million) cast different majority judgment ballots, fewer than 1.7% of the possible messages would have been used. Those that were the same among the 1,733 valid messages of the experiment contained only To Rejects or accorded Excellent for one or several candidates and To Reject for all the others.
1. Très Bien, Bien, Assez Bien, Passable, Insuffisant, and À Rejeter.
2. A collection of television interviews of participants prepared by Raphaël Hitier, a journalist of I-Télé, confirms these impressions. Also, a questionnaire used in the ILC experiment (see chapter 17) shows that voters prefer using three number grades rather than two.
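The message counts quoted above are simple arithmetic, checked here in Python. The 36 million figure is the approximate number of 2007 voters given in the text.

```python
grades, candidates = 6, 12
messages = grades ** candidates
print(f"{messages:,}")                    # 2,176,782,336

voters_2007 = 36_000_000                  # "some 36 million", as stated in the text
print(f"{voters_2007 / messages:.2%}")    # 1.65%, i.e., fewer than 1.7%
```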
Voters were particularly happy with the grade To Reject and used it the most.
There was an average of 4.1 To Reject grades per ballot and an average of 0.5
blank grades per ballot (which, in conformity with the stated rules, were counted as To Reject).
Voters were parsimonious with high grades and generous with low ones (see
table 6.1). Only 52% of voters used a grade of Excellent; 37% used Very Good
but no Excellent; 9% used Good but no Excellent and no Very Good; 2% gave
none of the three highest grades. The opinions of voters are richer, more varied,
and more complex, by many orders of magnitude, than any current system allows
them to express.
The highest grades were often multiple (see table 6.2). In all, more than 33%
of the ballots gave the highest grade to at least two candidates. Thus one of
every three voters did not designate a single best candidate. This shows that
many voters either saw nothing (or very little) to prefer among several candidates, or at the least, they were very hesitant to make a choice among two,
three, or more candidates. Moreover, many voters did not distinguish between
the leading candidates: 17.9% gave the same grade to Bayrou and Sarkozy
(10.6% their highest grade to both), 23.3% the same grade to Bayrou and Royal
(11.7% their highest grade to both), and 14.3% the same grade to Sarkozy
and Royal (4.1% their highest grade to both). Indeed, 4.8% gave the same
grade to all three (4.1% their highest grade to all three: all who gave their
highest grade to Sarkozy and Royal also gave it to Bayrou). These are significant percentages: many elections are decided by smaller margins. They are
valid, significant inputs of opinion that are completely ignored by the traditional
model.
This finding is reinforced by a poll conducted on election day (by TNS
Sofres-Unilog, Groupe LogicaCMG, April 22, 2007) that asked voters when
they had decided to vote for a particular candidate. Their hesitancy in making
a choice is reflected in the answers: 33% decided in the last week, one-third of
whom (11%) decided on election day itself. For Bayrou voters 43% decided in
the last week and 12% on election day; for Sarkozy voters the numbers were
20% and 6%; for Royal voters, 28% and 9%; for Le Pen voters, 43% and 18%.
In contrast, the system forced them to choose a single candidate (or to vote for
no one).

Table 6.1
Average Number of Grades per Majority Judgment Ballot, Three Precincts of Orsay, April 22, 2007

Grade          Average no. of grades per ballot
Excellent       0.69
Very Good       1.25
Good            1.50
Acceptable      1.74
Poor            2.27
To Reject       4.55
Total          12.00

Table 6.2
Multiple Highest Grades, Three Precincts of Orsay, April 22, 2007

Two or more Excellent                   11%
Two or more Very Good, none higher      16%
Two or more Good, none higher            6%
Moreover, inputs that are rank-orders (or that simply show preferences
between pairs of candidates) ignore how voters evaluate the respective candidates (just as the 2002 runoff ignored the respective evaluations of Chirac and
Le Pen), except, of course, that one is evaluated higher than the other. Almost
one-half of the highest grades are less than Excellent. Two-thirds of the second
highest grades are merely Good or worse (see table 6.3). Being first, second, or
third in a ranking of at least three candidates carries very different meanings for
different voters, and these differences are completely ignored by the inputs to the
traditional model. This is yet another reason that aggregating rank-orders (as do the
methods of Condorcet and of Borda, and their combinations) is not meaningful.
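Distributions of highest, second highest, and third highest grades can be tallied directly from graded ballots. Below is a minimal Python sketch. It assumes that each ballot is a dict mapping every candidate to one of the six grade names. It also reads "second highest grade" as the second *distinct* grade level used on a ballot, which is our interpretation and is not stated in the text. The function names are hypothetical.

```python
from collections import Counter

# Grade names from best to worst, as used in the 2007 Orsay experiment.
GRADES = ["Excellent", "Very Good", "Good", "Acceptable", "Poor", "To Reject"]
RANK = {grade: i for i, grade in enumerate(GRADES)}  # 0 = best


def distinct_levels(ballot):
    """Distinct grades used on one ballot, best first.

    Assumes every candidate has a grade (a blank already recorded as To Reject).
    """
    return sorted(set(ballot.values()), key=RANK.__getitem__)


def level_distribution(ballots, level):
    """Fraction of ballots whose `level`-th distinct grade (0 = highest) is each grade.

    Ballots that use fewer than level + 1 distinct grades are left out.
    """
    counts = Counter()
    for ballot in ballots:
        levels = distinct_levels(ballot)
        if len(levels) > level:
            counts[levels[level]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {grade: counts[grade] / total for grade in GRADES if counts[grade]}


if __name__ == "__main__":
    example = [  # hypothetical ballots over three candidates
        {"A": "Excellent", "B": "Good", "C": "To Reject"},
        {"A": "Very Good", "B": "Very Good", "C": "Poor"},
        {"A": "Good", "B": "Acceptable", "C": "Poor"},
    ]
    for level in range(3):
        print(level, level_distribution(example, level))
```

Under this reading, a ballot that uses only two grade levels simply has no "third highest" grade, so it is excluded from that column.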
The Faches-Thumesnil experiment tested two versions of the alternative vote,
so voters' inputs were rank-orders of the candidates (Farvaque, Jayet, and Ragot
2007). The experiment was conducted in two of the eleven voting precincts of
Faches-Thumesnil, a small town in France's northernmost department, Nord.
Voters were not obliged to rank-order all candidates (as they are in Australia):
a candidate not on a voter's list was considered off the list and thus could
never be placed first on that voter's list after the elimination of other candidates.
Of those who voted officially, 960 (or 60%) participated in the experiment; 67
ballots were invalid, and 893 were valid. Almost 60% of the ballots did not
rank-order all candidates, and over 50% rank-ordered six or fewer of the twelve
candidates, showing that voters are reluctant to rank-order many candidates (see
table 6.4).

Table 6.3
Distributions: Highest Grade, Second Highest Grade, Third Highest Grade, Three Precincts of
Orsay, April 22, 2007

               Highest   Second highest   Third highest
Excellent        52%
Very Good        37%          35%
Good              9%          41%              26%
Acceptable        2%          16%              40%
Poor              0%           5%              22%
To Reject         1%           3%              13%

Table 6.4
Number of Candidates Rank-Ordered, Faches-Thumesnil Experiment, April 22, 2007

Number of candidates rank-ordered    1–3      4–6      7–11     12
No. of ballots                       260      210      53       370
Percent of ballots                   29.1%    23.5%    5.9%     41.4%
Admittedly, it is a difficult and time-consuming task to rank-order alternatives, and in any case, whether a voter rank-orders many or few candidates, she
is unable to express any sense of how much or how little any of the candidates
are appreciated. Suppose there are n candidates. To rank-order them a voter first
places some one candidate on a list; then places the second in the slot above
or below; then the third in one of the three slots (above, between, below); and
so on. This takes n(n + 1)/2 time units. And if ties are not allowed, ranking
becomes even more difficult. In contrast, it is a much easier task to assign each
candidate a grade than to rank-order candidates. In practice, with a natural, well-understood language of grades, a voter quickly settles on an approximate grade
for each candidate (e.g., Sarkozy is Good or Very Good) and thus takes about
2n time units (and in any case, a maximum of mn when there are m grades).
Cognitively, assigning grades seems to be a much simpler exercise than ranking
candidates. But whatever the reason, ranking a large number of alternatives is
clearly very difficult, as shown by the fact that about 95% of Australian voters
rely on predetermined rankings provided by their parties.
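The rough cost comparison above can be made concrete by evaluating the three estimates for the 2007 setting (n = 12 candidates, m = 6 grades). The "time unit" is the text's rough cognitive heuristic, not a measured quantity. The function names below are ours.

```python
def ranking_time(n):
    """Insertion model from the text: the k-th candidate goes into one of k slots,
    so ranking n candidates costs 1 + 2 + ... + n time units."""
    return sum(range(1, n + 1))  # equals n * (n + 1) // 2


def grading_time_typical(n):
    """About two time units per candidate, as estimated in the text."""
    return 2 * n


def grading_time_max(n, m):
    """Worst case: each of the m grades considered for each of the n candidates."""
    return m * n


if __name__ == "__main__":
    n, m = 12, 6  # twelve candidates, six grades
    print(ranking_time(n), grading_time_typical(n), grading_time_max(n, m))
```

With these values, ranking costs 78 units against about 24 for grading. Even the worst case for grading (72 units) stays below the cost of ranking.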
The two versions of the alternative vote that were tested differ in the choice of the candidate to eliminate when no candidate has a majority of the
(current) first places. The Australian system eliminates the candidate listed first
the fewest times. The other version eliminates the candidate listed
last the most times. The Australian version makes Sarkozy the
winner; the other version makes Bayrou the winner. The Australian version is
less favorable to centrists because major candidates of the right and the left
are usually either high or low on voters' lists. This may explain why it is used
in practice rather than the other version. The other method, sometimes called
the Coombs method, guarantees the election of the Condorcet-winner when the
preferences are single-peaked and the votes are sincere, which is not true of the
first method (see Grofman and Feld 2004; Nagel 2007).
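The two elimination rules can be compared on a small example. Below is a minimal Python sketch. It makes three assumptions of ours:

- Ballots are tuples listing candidates from most to least preferred. Unlisted candidates are off the list, as in the Faches-Thumesnil experiment.
- For the Coombs rule, "listed last" means last among the candidates still running on that ballot.
- Ties in elimination are broken alphabetically. This choice is arbitrary.

```python
from collections import Counter


def alternative_vote(ballots, candidates, rule="hare"):
    """Alternative vote on (possibly partial) rank-order ballots.

    rule="hare":   eliminate the candidate listed first the fewest times (Australian).
    rule="coombs": eliminate the candidate listed last the most times.
    """
    remaining = set(candidates)
    while True:
        # Restrict each ballot to candidates still running; drop exhausted ballots.
        active = [[c for c in b if c in remaining] for b in ballots]
        active = [b for b in active if b]
        firsts = Counter(b[0] for b in active)
        leader = max(remaining, key=lambda c: firsts[c])
        if len(remaining) == 1 or 2 * firsts[leader] > len(active):
            return leader  # a majority of the current first places
        if rule == "hare":
            loser = min(sorted(remaining), key=lambda c: firsts[c])
        elif rule == "coombs":
            lasts = Counter(b[-1] for b in active)
            loser = max(sorted(remaining), key=lambda c: lasts[c])
        else:
            raise ValueError(f"unknown rule: {rule}")
        remaining.remove(loser)


if __name__ == "__main__":
    # Hypothetical single-peaked electorate on the line L - C - R.
    ballots = [("L", "C", "R")] * 40 + [("R", "C", "L")] * 35 + [("C", "R", "L")] * 25
    print(alternative_vote(ballots, "LCR", "hare"), alternative_vote(ballots, "LCR", "coombs"))
```

In this toy electorate, C is the Condorcet-winner: C beats L by 60 to 40 and R by 65 to 35. The Australian rule eliminates C first and elects R. The Coombs rule eliminates L first (L is listed last 60 times) and elects C.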
The official first-round results in the two voting precincts of Faches-Thumesnil were very close to the national percentages (table 6.5). The voters'
rank-orders make it possible to compute the results of the face-to-face confrontations (table 6.6). They yield the same unambiguous order of finish among
the four significant candidates as did the polls on March 28 and April 19 (see
table 2.11). Once again the Condorcet-order agrees with the Borda-ranking:
Bayrou ≻ Sarkozy ≻ Royal ≻ Le Pen.
Table 6.5
Official First-Round Votes, National and Two Precincts of Faches-Thumesnil, April 22, 2007

               National   Faches-Thumesnil
Sarkozy         31.2%         29.7%
Royal           25.9%         25.5%
Bayrou          18.6%         19.7%
Le Pen          10.4%         12.0%
Besancenot       4.1%          3.7%
de Villiers      2.2%          2.4%
Buffet           1.9%          2.4%
Voynet           1.6%          1.4%
Laguiller        1.3%          1.5%
Bové             1.3%          0.9%
Nihous           1.1%          0.5%
Schivardi        0.3%          0.3%
Table 6.6
Projected Second-Round Results, Faches-Thumesnil Experiment, April 22, 2007

            Bayrou   Sarkozy   Royal   Le Pen
Bayrou        -        52%      60%     80%
Sarkozy      48%        -       54%     83%
Royal        40%       46%       -      73%
Le Pen       20%       17%      27%      -

Note: For example, Sarkozy has 48% of the votes against Bayrou.
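The Condorcet and Borda orders can be recomputed from the face-to-face percentages of table 6.6. Below is a short Python sketch. It rests on a simplifying assumption of ours: the percentages are treated as if they came from complete rank-orders. In that case a candidate's Borda score is proportional to the sum of her pairwise percentages. In fact, many Faches-Thumesnil ballots were partial.

```python
# Table 6.6: share of the vote (%) the row candidate obtains against the column candidate.
PAIRWISE = {
    "Bayrou":  {"Sarkozy": 52, "Royal": 60, "Le Pen": 80},
    "Sarkozy": {"Bayrou": 48, "Royal": 54, "Le Pen": 83},
    "Royal":   {"Bayrou": 40, "Sarkozy": 46, "Le Pen": 73},
    "Le Pen":  {"Bayrou": 20, "Sarkozy": 17, "Royal": 27},
}


def condorcet_winner(pairwise):
    """The candidate with a majority against every opponent, or None."""
    for candidate, row in pairwise.items():
        if all(share > 50 for share in row.values()):
            return candidate
    return None


def pairwise_wins(pairwise):
    """Number of face-to-face majorities each candidate obtains."""
    return {c: sum(share > 50 for share in row.values()) for c, row in pairwise.items()}


def borda_scores(pairwise):
    """With complete rank-orders, a Borda score is the sum of pairwise votes."""
    return {c: sum(row.values()) for c, row in pairwise.items()}


def ranking_by(scores):
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(condorcet_winner(PAIRWISE))
    print(ranking_by(pairwise_wins(PAIRWISE)), ranking_by(borda_scores(PAIRWISE)))
```

The Borda sums are 192 (Bayrou), 185 (Sarkozy), 159 (Royal), and 64 (Le Pen). They give the same order as the count of face-to-face majorities.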
6.2 Statistical Left-Right Spectra
The one escape from the inner inconsistencies of the traditional model of social
choice occurs when voters have single-peaked preferences relative to a common
ordering of the candidates. It comes from the idea that in the political realm
candidates may be listed on a line from left to right, voters place themselves
somewhere along it, prefer the candidate closest to their position, and dislike
candidates more, the more distant they are from their position. Were there such
a line, and were it true that voters' preferences for candidates are single-peaked,
inputs of rank-orders would satisfy the aims of the traditional theory: the winner
would be the Condorcet-winner, and the order of finish would be transitive in
conformity with the face-to-face votes. The reality, long recognized, is that
there is no such left-to-right line for which preferences are single-peaked. New
experimental evidence confirms it.
An electoral experiment was conducted in parallel with the French presidential election of 2002 in five of Orsay's twelve voting precincts (under the
same general conditions as the 2007 Orsay experiment). Its aim was to test
approval voting (see chapter 18 for a detailed description of the experiment).
The experimental ballot contained a list of the sixteen candidates together with
instructions saying:
Rules of approval voting: The elector votes by placing crosses [in boxes corresponding
to candidates]. He may place crosses for as many candidates as he wishes, but not more
than one per candidate. The winner is the candidate with the most crosses.
On average there were 3.15 crosses per ballot. The total number of different
possible messages was $2^{16}$ = 65,536. Of the 2,587 valid ballots, 813 were
different. Voters had no incentive to vote other than sincerely, that is, so that
whenever a cross was given to some candidate C, a cross was also given to every
candidate preferred to C. But if there existed a left-to-right line relative to which
the voters' preferences were single-peaked, then the total number of different
possible sincere votes would have been 137. The crosses would have had to
be consecutive with regard to the alignment along the spectrum: there are
16 sincere messages with one cross, 15 with two consecutive crosses, 14 with
three consecutive crosses, . . . , 1 with sixteen consecutive crosses, and 1 with
no crosses, so 137 sincere votes in all. The large discrepancy between 137 and
813 proves that the single-peaked condition was far from satisfied.
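The count of 137 follows from a simple fact. With single-peaked preferences along a line, the candidates a voter likes at least as much as any given candidate form a run of consecutive positions around the voter's peak. So every sincere approval ballot is either empty or an interval. The Python sketch below brute-forces all $2^{16}$ ballots and keeps the intervals, confirming the closed form n(n + 1)/2 + 1. Candidates are represented only by their positions 0 to 15 on the line, a simplification of ours; the count does not depend on which candidate sits where.

```python
def is_interval(positions):
    """True if the sorted positions are empty or consecutive on the line."""
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


def count_sincere_single_peaked(n):
    """Count approval ballots over n candidates on a line that form an interval."""
    count = 0
    for mask in range(2 ** n):  # every subset of candidates, encoded as a bit mask
        positions = [i for i in range(n) if (mask >> i) & 1]
        if is_interval(positions):
            count += 1
    return count


def closed_form(n):
    """n intervals of length 1, n - 1 of length 2, ..., 1 of length n, plus the empty ballot."""
    return n * (n + 1) // 2 + 1


if __name__ == "__main__":
    print(2 ** 16, count_sincere_single_peaked(16), closed_form(16))
```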
To assume single-peaked preferences is certainly not valid in elections. On
the other hand, there is no denying that candidates and their political parties
seeking election are commonly described in terms of a left-right spectrum and
that this makes sense to political scientists, journalists, and the general public in
France, the U.K., the U.S.A., and throughout the world. The Orsay experiments
of 2002 and 2007 both give solid scientific evidence that this is a valid concept.
Ballots from the 2002 experiment with several crosses yield statistical information about how voters favorable to one candidate might transfer their votes
to others. For example, the estimate for Bayrou may be computed as follows:
among the ballots containing a cross for Bayrou and k ≥ 1 other crosses,
attribute 1/k to each of the other candidates with a cross, and find the sum given
each candidate. The estimate of the transfer to a candidate is the percentage that
the candidate's sum represents of the total sum (see table 6.7). Statistically, the
voters' transfers are almost single-peaked among the important candidates. For
instance, among those who gave a cross to Chirac, Bayrou was the most likely
transfer, and the farther a candidate was from Chirac on the left-right line, the
less likely the transfer. This does not hold for the unimportant candidates. The
deviations among the important candidates are strikingly small and easily explained.
Chirac, the incumbent president, often exerted an appeal to voters beyond
the left-right spectrum (e.g., 16% of Chevènement voters go to Chirac, only
14% to Bayrou); crosses were sometimes given to the far right and the far left
as expressions of opposition (e.g., more Gluckstein voters go to Le Pen than to
Bayrou).
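The 2002 transfer estimate is easy to state as a function. Below is a minimal Python sketch. It assumes each approval ballot is represented as a set of crossed candidates. Ballots with no cross other than the source candidate are ignored, since the rule needs at least one other cross. The function name and the toy ballots are hypothetical.

```python
from collections import defaultdict


def transfer_estimates(ballots, source):
    """Estimated transfers (%) from `source` to every other candidate.

    Each ballot is a set of crossed candidates. Among ballots that cross
    `source` and k >= 1 other candidates, each of those k candidates is
    credited 1/k; a candidate's estimate is its share of the grand total.
    """
    sums = defaultdict(float)
    for crosses in ballots:
        if source not in crosses:
            continue
        others = set(crosses) - {source}
        if not others:  # a lone cross for `source` says nothing about transfers
            continue
        for candidate in others:
            sums[candidate] += 1 / len(others)
    total = sum(sums.values())
    return {c: 100 * s / total for c, s in sums.items()} if total else {}


if __name__ == "__main__":
    ballots = [{"Bayrou", "Chirac"}, {"Bayrou", "Chirac", "Jospin"},
               {"Bayrou", "Madelin"}, {"Jospin"}, {"Bayrou"}]
    print(transfer_estimates(ballots, "Bayrou"))
```

On these toy ballots, Chirac receives 1 + 1/2, Jospin 1/2, and Madelin 1, out of a total of 3. That gives estimates of 50%, about 16.7%, and about 33.3%.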
Table 6.7
Estimated Transfers of Votes to Important Candidates, Based on 2002 Orsay Experiment

                  Left                                                Right
                  Mamère  Jospin  Chevènement  Bayrou  Chirac  Le Pen  Ten Others
Gluckstein        15%*     9%      5%           2%      4%      5%     60%
Laguiller         14%     20%*     9%           4%      7%      5%     45%
Hue               13%     33%*    10%           3%      2%      2%     37%
Besancenot        20%     21%*     9%           5%      3%      3%     39%
Mamère             -      38%*     8%           7%      4%      1%     42%
Taubira           15%     28%*    10%           8%      4%      0%     35%
Jospin            26%*     -      15%           8%      5%      1%     45%
Chevènement        8%     20%*     -           14%     16%      6%     36%
Bayrou             6%     10%     13%           -      27%*     4%     40%
Chirac             3%      5%     13%          24%*     -      10%     45%
Madelin            3%      4%      9%          22%     32%*     6%     24%
Lepage             7%     12%     12%          17%*    16%      2%     34%
Boutin             4%      4%      6%          23%*    17%      5%     41%
Le Pen             3%      3%     13%          10%     26%*     -      45%
Saint-Josse        3%      6%     10%          10%     23%*     9%     39%
Mégret             1%      1%      5%          11%     22%     36%*    24%
Average transfer   9.4%   14.3%    9.9%        11.2%   13.7%    6.4%
have elected her. In 2002 the election of Bayrou with 7% of the first-round votes
or of Lepage with 2% is unacceptable: neither was a major candidate.3
The estimates of transfers in 2007 are given in table 6.8 and are computed
for, say, Bayrou as follows. Among the ballots whose highest grade goes to
Bayrou, either k ≥ 1 other candidates are given the same grade, or Bayrou is
the only candidate with that grade and k ≥ 1 candidates are given the next
highest grade. In either case, attribute 1/k to each of these k candidates, and
find the sum accorded to each candidate. The estimate of the transfer
to a candidate is the percentage his sum represents of the total sum. Exactly
the same rules are used to determine candidates of the left and the right and the
order among them. In this case, there is only one center candidate, Bayrou. The
left (from Buffet to Royal) and the right (from Sarkozy to Le Pen) correspond
to the usual media designations. Important candidates are Le Pen and those
whose average transfer is above 100/11 = 9.1%. Once again, statistically, the
voters' transfers are almost single-peaked among the important candidates.
The single peak in the rows (one small exception for the important candidate
Le Pen) is accompanied by a single peak in the columns (two small exceptions,
Besancenot and Le Pen). So in two practical political situations, single-peaked
transfers are real when seen in terms of probabilities.
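The 2007 rule for graded ballots can be sketched the same way. Below is a minimal Python version. It assumes each ballot maps every candidate to one of the six grade names. A ballot counts only if it gives the source candidate its highest grade. Credit goes to the other candidates sharing that grade, or, if there are none, to the candidates at the next highest grade level used on the ballot. The function name and toy ballots are hypothetical.

```python
from collections import defaultdict

GRADES = ["Excellent", "Very Good", "Good", "Acceptable", "Poor", "To Reject"]
RANK = {grade: i for i, grade in enumerate(GRADES)}  # 0 = best


def graded_transfer_estimates(ballots, source):
    """Estimated transfers (%) from `source` on majority judgment ballots.

    Only ballots giving `source` their highest grade count. If k >= 1 other
    candidates share that grade, each receives 1/k; otherwise each of the
    k candidates at the next highest grade receives 1/k.
    """
    sums = defaultdict(float)
    for ballot in ballots:
        levels = sorted({RANK[g] for g in ballot.values()})
        if RANK[ballot[source]] != levels[0]:
            continue  # `source` is not top-graded on this ballot
        receivers = [c for c, g in ballot.items() if c != source and RANK[g] == levels[0]]
        if not receivers and len(levels) > 1:
            receivers = [c for c, g in ballot.items() if RANK[g] == levels[1]]
        for c in receivers:
            sums[c] += 1 / len(receivers)
    total = sum(sums.values())
    return {c: 100 * s / total for c, s in sums.items()} if total else {}
```

For example, suppose one ballot tops Bayrou and Royal together. A second tops Bayrou alone and gives Good to both Royal and Sarkozy. A third does not top Bayrou. Royal then receives 1.5 of the 2 units (75%) and Sarkozy 0.5 (25%).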
More strikingly than in 2002, the bulk of the transfers go to the important
candidates: to Besancenot (far left), to Royal (moderate left), to Bayrou (center), and to Sarkozy (right). In fact, if the grades are used to determine
preferences among the three major candidates, 4.1% expressed the preference
Royal ≻ Sarkozy ≻ Bayrou and 5.8% Sarkozy ≻ Royal ≻ Bayrou. So 90.1%
of the ballots agree with the hypothesis of single-peaked preferences on the left-right line among the three, going from Royal to Bayrou to Sarkozy (though
among more candidates this is not true).
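The 90.1% figure is a ballot-by-ballot test on the axis Royal, Bayrou, Sarkozy. The text counts only the two strict orders that place Bayrou last. The Python sketch below therefore flags a ballot only when the center is graded strictly below both ends and the ends are not tied. How the book treats ballots that tie Royal and Sarkozy above Bayrou is our assumption. Grades are coded as integers, with higher meaning better; this encoding is hypothetical.

```python
def violates_single_peak(ballot, axis):
    """True if the ballot strictly orders the two ends and puts the center last.

    axis = (left, center, right); grades are numbers, higher = better.
    """
    left, center, right = axis
    return ballot[center] < min(ballot[left], ballot[right]) and ballot[left] != ballot[right]


def share_consistent(ballots, axis=("Royal", "Bayrou", "Sarkozy")):
    """Fraction of ballots that do not violate single-peakedness on the axis."""
    bad = sum(violates_single_peak(b, axis) for b in ballots)
    return 1 - bad / len(ballots)
```

On four hypothetical ballots coded (Royal, Bayrou, Sarkozy) = (5, 3, 1), (4, 2, 3), (2, 2, 2), and (0, 4, 1), only the second places Bayrou last with strict preferences, so the share is 0.75.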
Not surprisingly, Bayrou is the choice of the median-voter nationally with
respect to the left-right line of table 6.8: Bayrou's vote plus that of the candidates
to his left was 57.7%; his vote plus that of the candidates to his right was
60.9%. These numbers are close to the available estimates of face-to-face
confrontations with Royal (thus against the left) and with Sarkozy (thus
against the right). A poll taken two days before the election shows the same
qualitative results (though they are national estimates, not Orsay estimates; see
table 6.9).

3. The 2002 Orsay experiment allows estimates to be made of the face-to-face races. To compute
the estimate between two candidates, a vote is given to one whenever he is given a cross and
the other is not. Jospin (19.5%) and Bayrou (9.9%) did better in the Orsay official vote than
nationally, and Le Pen (10.0%) did worse. The estimates show Jospin winning against Chirac
(with 53%), Bayrou (with 56%), and Le Pen (with 75%); Chirac winning against Bayrou (with
54%) and Le Pen (with 80%); and Bayrou winning against Le Pen (with 74%). Jospin is at once the
Condorcet-winner and the Borda-winner, and the Condorcet- and Borda-rankings are the same as
well: Jospin ≻S Chirac ≻S Bayrou ≻S Le Pen.

Table 6.8
Estimated Transfers of Votes to Important Candidates, Based on 2007 Orsay Experiment

                  Left                                     Right
                  Besancenot  Royal   Bayrou  Sarkozy  Le Pen  Seven Others
Buffet              28%*       24%      5%      2%       2%      39%
Laguiller           32%*       17%     14%     11%       3%      23%
Bové                17%*       13%     15%      9%       2%      44%
Schivardi           29%*       11%     17%      5%       8%      30%
Besancenot           -         26%*    18%      3%       2%      51%
Voynet              13%        34%*    24%      9%       1%      19%
Royal               11%         -      44%*    10%       1%      35%
Bayrou               6%        34%      -      36%*      2%      22%
Sarkozy              2%        15%     43%*     -       12%      28%
Nihous              14%        11%     18%     19%*      7%      31%
de Villiers          2%         4%      9%     60%*     11%      14%
Le Pen               4%         8%      6%     38%*      -       44%
Average transfer    14.5%      18.1%   19.5%   18.4%     4.5%

Table 6.9
Transfers of Votes to Important Candidates, Polling Results, April 20, 2007

            Left                                   Right
            Besancenot  Royal  Bayrou  Sarkozy  Le Pen  Eight Others  Not Counted
Royal          12%        -     34%*     5%       4%        27%           18%
Bayrou          7%       28%*    -      25%       3%        15%           22%
Sarkozy         3%       10%    37%*     -        7%        19%           24%
Le Pen          1%       12%     8%     31%*      -         23%           25%
6.3 Borda's and Condorcet's Bias for the Center
It is striking that in the 2007 election (for which there is so much polling and
experimental evidence), the Condorcet-winner and the Borda-winner, those
centuries-old opposing concepts, are consistently one and the same candidate
(Bayrou). Why? The evidence of tables 6.7 and 6.8 suggests that when there is a
statistical left-right spectrum, a voter's second choice is most likely to be a major
candidate (protest voters are an exception). So if there are two major candidates,
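Table 6.10
Left: A    Center: B    Right: C

Against A:  $v_{BA} = (x_B + \gamma x_C)\%$,  $v_{CA} = (x_C + (1-\alpha) x_B)\%$
Against B:  $v_{AB} = (x_A + (1-\gamma) x_C)\%$,  $v_{CB} = (x_C + (1-\beta) x_A)\%$
Against C:  $v_{AC} = (x_A + \alpha x_B)\%$,  $v_{BC} = (x_B + \beta x_A)\%$

Here $v_{XY}$ is the percentage of voters who prefer X to Y; $x_A$, $x_B$, $x_C$ are the candidates' first-place percentages; and $\alpha$, $\beta$, $\gamma$ are the fractions of B's, A's, and C's first-place voters who prefer A to C, B to C, and B to A, respectively.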
Table 6.11
Strong Statistical Left-Right Spectrum, Pairwise Votes, 2002 and 2007 Orsay Experiments

2002         Jospin   Chirac   Le Pen
Jospin         -        56%      75%
Chirac        47%        -       80%
Le Pen        25%       20%       -

2007         Royal    Bayrou   Sarkozy
Royal          -        44%      52%
Bayrou        56%        -       60%
Sarkozy       48%       40%       -
Table 6.12a
Number of Wins among Royal, Bayrou, and Sarkozy Only, 2007 Orsay Experiment

                              Left                       Right
                              Royal     Bayrou     Sarkozy     Tie     Cycle
First-past-the-post winner    4,274     1,772      3,574       380
Two-past-the-post winner      3,410     4,671      1,225       694
Majority judgment-winner      1,462     7,573        956         9
Condorcet-winner                772     8,894         65       246      23
Borda-winner                    369     9,526         67        38

Note: Ten thousand samples of 101 ballots, which were drawn from 1,733 ballots.
Cycle indicates a Condorcet paradox.
Table 6.12b
Number of Wins among All Candidates, Winner Always Royal, Bayrou, or Sarkozy, 2007 Orsay
Experiment

                              Left                       Right
                              Royal     Bayrou     Sarkozy     Tie     Cycle
First-past-the-post winner    2,324     2,260      5,379        37
Two-past-the-post winner      3,175     5,830        801       194
Majority judgment-winner      1,290     7,756        943        11
Condorcet-winner                623     9,152          5       184      36
Borda-winner                    348     9,639          0        13

Note: Ten thousand samples of 101 ballots, which were drawn from 1,733 ballots.
Cycle indicates a Condorcet paradox.
conclusion when comparing B with C. On the other hand, if A is the Condorcet-winner, then the Borda-winner is either A or B. Indeed, with three candidates a
candidate's Borda score is the sum of her two pairwise votes, and
$v_{BC} > v_{AC} > 50\% > v_{CB}$ together with $v_{BA} > v_{CA}$ imply that B's score
exceeds C's, so C cannot be the Borda-winner; symmetrically, if C is the
Condorcet-winner, then the Borda-winner is either C or B. So the Borda-winner
favors the centrist candidate more than the Condorcet-winner does.
However, with only a statistical left-right spectrum, the Condorcet paradox can
occur in theory (by varying the data in table 6.10) and in practice, as the
experimental evidence shows (see tables 6.12a, 6.12b, 6.14a, 6.14b).
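The three-candidate argument can be explored numerically. The Python sketch below assumes the pairwise votes take this form, with A on the left, B in the center, and C on the right:

- $v_{BA} = x_B + \gamma x_C$ and $v_{AB} = x_A + (1-\gamma) x_C$
- $v_{CA} = x_C + (1-\alpha) x_B$ and $v_{AC} = x_A + \alpha x_B$
- $v_{CB} = x_C + (1-\beta) x_A$ and $v_{BC} = x_B + \beta x_A$

The coefficient names alpha, beta, and gamma, and the numbers in the example, are ours.

```python
CANDIDATES = ("A", "B", "C")  # A on the left, B in the center, C on the right


def pairwise_votes(xa, xb, xc, alpha, beta, gamma):
    """Percentage preferring X to Y, keyed (X, Y).

    xa, xb, xc: first-place percentages (summing to 100).
    alpha: fraction of B's voters who prefer A to C.
    beta:  fraction of A's voters who prefer B to C.
    gamma: fraction of C's voters who prefer B to A.
    """
    return {
        ("B", "A"): xb + gamma * xc,
        ("C", "A"): xc + (1 - alpha) * xb,
        ("A", "B"): xa + (1 - gamma) * xc,
        ("C", "B"): xc + (1 - beta) * xa,
        ("A", "C"): xa + alpha * xb,
        ("B", "C"): xb + beta * xa,
    }


def condorcet_winner(v):
    for x in CANDIDATES:
        if all(v[(x, y)] > 50 for y in CANDIDATES if y != x):
            return x
    return None  # no Condorcet-winner (a tie or a cycle)


def borda_scores(v):
    """With three candidates, a Borda score is the sum of the two pairwise votes."""
    return {x: sum(v[(x, y)] for y in CANDIDATES if y != x) for x in CANDIDATES}


if __name__ == "__main__":
    v = pairwise_votes(40, 20, 40, alpha=0.6, beta=1.0, gamma=0.7)
    scores = borda_scores(v)
    print(condorcet_winner(v), max(scores, key=scores.get), scores)
```

In the example, A beats both rivals with 52%, so A is the Condorcet-winner. Yet the centrist B has the higher Borda score: about 108, against about 104 for A and 88 for C. This illustrates why the Borda-winner can only be A or B when A is the Condorcet-winner.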
Evidence from the 2007 Orsay experiment supports these arguments and
observations. Two sets of independent random drawings were made. In one,
10,000 samples of 101 ballots were drawn from the 1,733 valid ballots in
order to compare the behavior of the principal methods applied only to the
three major candidates, Bayrou, Royal, and Sarkozy (table 6.12a). In the other,
conducted separately, 10,000 random samples of 101 ballots were drawn to
compare the behavior of the principal methods applied to all the candidates
(table 6.12b). In every case one of the three major candidates was the winner.
To compute the winners by one or another of the methods, a candidate was
accorded the vote of a ballot if she had the highest grade; when there was a tie
among k candidates for the highest grade on a ballot, each was attributed 1/k.
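The resampling experiment can be sketched for first-past-the-post. The Python version below makes three assumptions of ours:

- Grades are coded as numbers, with higher meaning better.
- Each sample of 101 ballots is drawn without replacement from the pool; the text does not say.
- A first-place tie is reported as "Tie".

The function names are hypothetical.

```python
import random
from collections import Counter


def fptp_winner(ballots):
    """First-past-the-post on graded ballots: each ballot's vote goes to its
    top-graded candidate, split 1/k among k tied candidates. None on a tie."""
    votes = Counter()
    for ballot in ballots:
        top = max(ballot.values())
        best = [c for c, g in ballot.items() if g == top]
        for c in best:
            votes[c] += 1 / len(best)
    ranked = votes.most_common()
    if len(ranked) > 1 and abs(ranked[0][1] - ranked[1][1]) < 1e-9:
        return None
    return ranked[0][0]


def resample_wins(ballots, method, samples=10_000, size=101, seed=0):
    """Count how often each candidate wins over random sub-samples of the ballots."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(samples):
        winner = method(rng.sample(ballots, size))
        wins["Tie" if winner is None else winner] += 1
    return wins
```

Any other method with the same signature, taking a list of ballots and returning a winner or None, can be passed to `resample_wins` in place of `fptp_winner`.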
The results in tables 6.12a and 6.12b clearly show that as one passes from
one method to the next down the list, from first-past-the-post to Borda, the
centrist candidate is more and more favored. Borda's method favors the centrist
candidate Bayrou slightly more than Condorcet's, and Condorcet's much more
than the majority judgment. At the opposite end of the spectrum, the first- and two-past-the-post methods disfavor the centrist candidate in comparison
with the majority judgment. The nine and eleven ties in the majority judgment
are ties in the majority-gauge (not the majority-value): a 0.001 probability of
a tie with only 101 voters is sufficiently small. The twenty-three and thirty-six
occurrences of the Condorcet paradox show that the preferences among the
candidates are not single-peaked and that, though there is a statistical left-right
spectrum, it is not strong. One of the twenty-three Condorcet paradoxes of table
6.12a showed Bayrou with 59% against Sarkozy, Sarkozy with 52.5% against
Royal, and Royal with 52% against Bayrou. The striking contrast between
tables 6.12a and 6.12b is the large increase in Sarkozy's first-past-the-post wins
when there are twelve candidates rather than three: it reflects a large number
of occurrences of Arrow's paradox coming from the dispersion of votes among
candidates of the left. And, of course, the more candidates there are, the more
Borda favors the centrist. The majority judgment is unaffected by the number of
candidates: the small differences are due to the independently drawn samples.
The official first-round votes in the three precincts of the 2007 Orsay experiment were quite different from the official first-round votes nationally (table
6.13). In particular, Royal's 29.9% in Orsay was above her 25.9% nationally,
Bayrou's 25.5% in Orsay was much above his 18.6% nationally, and Le Pen's
5.9% in Orsay was much below his 10.4% nationally. So it is no surprise to see
Bayrou, the choice of the median-voter in the official first-round vote in the
voting precincts of the Orsay experiment and in the nation, so often the winner
(see tables 6.12a and 6.12b). Accordingly, parallel sets of independent random
drawings were made from a subset of 501 ballots (of the 1,733 valid ballots)
whose estimated first-round votes were representative of the national vote. In
one, 10,000 samples of 101 ballots were drawn from the 501 to compare
the methods applied to the three major candidates (table 6.14a); in the other,
conducted separately, 10,000 samples of 101 ballots were drawn from the
501 to compare the methods applied to all the candidates (table 6.14b).

Table 6.13
Official First-Round Votes, National and Three Precincts of Orsay, April 22, 2007

               National   Orsay
Sarkozy         31.2%     29.0%
Royal           25.9%     29.9%
Bayrou          18.6%     25.5%
Le Pen          10.4%      5.9%
Besancenot       4.1%      2.5%
de Villiers      2.2%      1.9%
Buffet           1.9%      1.4%
Voynet           1.6%      1.7%
Laguiller        1.3%      0.8%
Bové             1.3%      0.9%
Nihous           1.1%      0.3%
Schivardi        0.3%      0.2%

Table 6.14a
Number of Wins among Royal, Bayrou, and Sarkozy Only, 2007 Orsay Experiment

                              Left                       Right
                              Royal     Bayrou     Sarkozy     Tie     Cycle
First-past-the-post winner    1,678        42      8,089       191
Two-past-the-post winner      2,145       820      6,470       565
Majority judgment-winner      1,288     4,001      4,701        10
Condorcet-winner                671     6,462      1,993       669     205
Borda-winner                    484     7,109      2,225       182

Note: Ten thousand samples of 101 ballots, which were drawn from a sample of 501 ballots
representative of the national vote. The same approach was used as in estimating first-round results
on the basis of majority judgment ballots; the percentage of votes of each candidate in the sample of
501 ballots came close to that of the candidate's national vote. In this sample, Sarkozy had 30.7%,
Royal 25.5%, and Bayrou 18.7% (Le Pen 9.3%).
Cycle indicates a Condorcet paradox.

Table 6.14b
Number of Wins among All Candidates, Winner Always Royal, Bayrou, or Sarkozy, 2007 Orsay
Experiment

                              Left                       Right
                              Royal     Bayrou     Sarkozy     Tie     Cycle
First-past-the-post winner    2,061        50      7,874        15
Two-past-the-post winner      2,174       716      6,731       379
Majority judgment-winner      1,309     4,034      4,649         8
Condorcet-winner                616     6,538      2,002       630     214
Borda-winner                    354     9,608         26        12

Note: Ten thousand samples of 101 ballots, which were drawn from a sample of 501 ballots
representative of the national vote.
Cycle indicates a Condorcet paradox.
The results show more dramatically how Borda's method and, to a lesser
extent, Condorcet's method favor the centrist candidate and how the first- and two-past-the-post methods penalize him, while the majority
judgment appears to be more evenhanded. Note, in particular, the chaotic
behavior in the centrist's Borda-wins when there are twelve rather than only
three candidates.
This has practical significance: most thoughtful commentators reject election
mechanisms that systematically elect the centrist candidate. As the well-known
popularizer of science William Poundstone wrote, "We want a system that
doesn't automatically exclude [moderate] candidates from winning. We also
want a system that doesn't make it easy for any goof who calls himself a
moderate to win" (2008, 211). On the other hand, the fact that Bayrou had
merely forty-two and fifty wins with first-past-the-post when by all reasonable
estimates Bayrou was the Condorcet- and Borda-winner seems derisory. A good
election mechanism should eliminate extremes and give all major poles (left,
center, and right) a fighting chance to win.
To this day, the Condorcet-winner and the Borda-ranking dominate the thinking in the theory of social choice: they continue to be proposed and reproposed,
alone and in combination. Agreement between them would therefore seem to
be a happy concurrence giving a particularly valid result. But both of these
mechanisms are heavily biased in favor of moderate candidates. Major candidates of the right and the left, such as Sarkozy and Royal, often elicit strong
support and strong opposition, so they are given high or low evaluations. A
moderate candidate, on the other hand, is often placed second or third. Face-to-face confrontations and rank-orders ignore how voters evaluate the respective
candidates (just as the 2002 French presidential runoff merely compared Chirac
and Le Pen but did not evaluate them).
The ballots of the 2007 Orsay experiment show that these evaluations are
significant: two-thirds of the second highest grades are merely Good or worse,
and three-quarters of the third highest grades are Acceptable or worse (see table
6.15). Both Condorcet and Borda ignore evaluations; they rely only on comparisons. When there are twelve candidates, Borda's method gives 11 points to the
first candidate on a voter's list, 10 to the second, 9 to the third, and so on. The difference between
being first, second, or third on the list is marginal, especially in the presence of
many candidates. Perhaps this exaggerated bias in favor of moderate candidates
explains why these mechanisms are hardly ever used in practice.
Table 6.15
Distributions: Highest Grade, Second Highest Grade, Third Highest Grade, Three Precincts of
Orsay, April 22, 2007

               Highest   Second highest   Third highest
Excellent        52%
Very Good        37%          35%
Good              9%          41%              26%
Acceptable        2%          16%              40%
Poor              0%           5%              22%
To Reject         1%           3%              13%
6.4 Conclusion
"The free communication of thoughts and opinions is one of the most precious
rights of man" (Déclaration des droits 1789, article 11). Not one of the electoral
systems used in practice, whether it be the Australian rank-order or the single
vote allowed in England, France, and the United States, gives voters anywhere
near the freedom of expression they wish to have.
The traditional model of social choice has been tried in theory and in practice
and does not work. By Richard Feynman's definition of science, it must be
eliminated.
Judging in Practice
A thing may look specious in theory, and yet be ruinous in practice; a thing may look
evil in theory, and yet be in practice excellent.
Edmund Burke
Mais là où les uns voyaient l'abstraction, d'autres voyaient la vérité. (But where some
saw abstraction, others saw truth.)
Albert Camus
"We're ranking everybody," said the playwright Arthur Miller, "every minute
of the day": economists and peace-makers, mathematicians and physicists,
novelists and journalists, students and professors, divers and skaters, beauty
queens and muscle-men, cities and countries, hotels and restaurants, movies
and theatrical performances, hospitals and universities, wines and cheeses. To
rank these and many other competitors, accomplishments, endowments, performances, goods, or services is fraught with differences of opinion among
the judges (or conflicting appreciations of their characteristic attributes) that
must be reconciled into the verdict of a jury.
Athletes compete for glory (and money) at Olympic games; chess and go
players do, too, though elsewhere. Wines and cheeses compete for prizes and
other accolades of excellence in trade fairs. Flautists, pianists, and violinists
compete for international, national, and regional prizes. Ranking and designating the winner among runners, high-jumpers, chess champions, and go players
is simple enough: time distinguishes among runners, height among jumpers,
and the winners and losers between pairs of chess and go players are obvious
(though how to rank many players is not).
A heated argument between two (wonderful but fictional) late-eighteenth-century zoologists over emotion and its expression in animals was described,
in which one says to the other: "How can [emotion] be measured? It cannot be
measured. It is a notion; a most valuable notion, I am sure; but, my dear sir,
7.1 Students
Students are regularly ranked at all levels. Today, more often than not, their
examinations, essays, and class performances are graded, and the grades determine their ranking. Candidates for positions (in the civil services, as qualified
medical doctors, as lawyers admitted to the bar, as students wishing to
pursue medicine, the law, the sciences, or any discipline) have from time
immemorial been examined, and ranked or graded by groups of individuals.
Important examinations (the baccalauréat in France, college aptitude tests in
the United States, A-levels in Great Britain, bar examinations, doctoral qualifying examinations) occupy an important place in the psyche of those who have
suffered through them. This must explain why systems that are devised to find
rankings so often adopt the language with which their inventors were familiar
in their youth. These are, among many others, important examples of a method
of social choice, familiar to all from their very first memories of classroom
grades, that is neither voting nor a market mechanism.
"In China since medieval times, imperial dynasties, gentry-literati elites, and
classical studies were tightly intertwined in the operation of the civil service
examinations. All three dimensions were perpetuated during the late empire
(1368–1911), and they stabilized for five hundred years because of their interdependence . . . [B]oth local elites and the imperial court continually influenced
the government to reexamine and adjust the classical curriculum and to entertain new ways to improve the institutional system for selecting those candidates
who were eligible to become officials" (Elman 2000, xvii, xxiii–xxiv). So begins
a magisterial study of an extremely elaborate system for examining and ranking
candidates seeking positions in the Chinese civil service and, more generally,
social and economic importance in society.
The system originated in the Sui Dynasty (581–618) and evolved over time.
By 1065 examinations were held every three years, at three levels, on fixed
dates: provincial Autumn examinations, with results given in the Laurel list;
then state Spring examinations, with results given in the Apricot list; and,
a month later, palace examinations, with results given in the Golden
list. Their importance is measured by an incident of 1397. The first emperor of
the Ming Dynasty, angered because no candidates from the north had survived
to the palace examinations, demanded a change in the results. Refused by the
chief examiner on the grounds that the evaluations had been strictly anonymous,
the emperor ordered new evaluations by new officials. They reported the same
results and were promptly put to death.
The system had to be elaborate in view of the numbers of candidates in local,
provincial, and metropolitan examinations. In the metropolitan examinations
of 1742, four chief and eighteen associate examiners had to evaluate 5,993
candidates, of whom they retained 319 (Elman 2000, 680). To begin, associate
examiners accepted or rejected candidates, using such phrases as "deep thoughts,"
"rich in force," "sufficient life," and "perspicacious," or, when there was little time,
"studies that have a base," "lack of subtlety," or "correct but ordinary" (Elman 2000,
426; Zi 1894, 152); they then ranked the surviving candidates. Exactly how the
rankings were reached is unclear, but "grading was linked to examiner comments,
not scores . . . The rankings served as the scores. It seems to be more like a
process of elimination within a fixed quota."1 An earlier scholar hints at the
possibility of a rudimentary system in which each of eight examiners placed a
circle on excellent exam copies, points on less than good ones, or combinations
of them, so that eight circles was the highest accolade (Zi 1894, 198). Another
account tells of up to six circles being given for each answer, the total number
of circles determining the ranking of the candidates.2 The historical records
often contain little precise information about the mechanics of grading and ranking.
Gaspard Monge appears to be one of the first to have developed and used a
system for grading. Monge was a mathematician; the founder of the École
Polytechnique; the inventor of descriptive geometry; a favorite of Napoléon, whom
he accompanied to Egypt; president of General Bonaparte's Institut d'Égypte; and
Minister of the Navy for eight months (1792–1793), in which capacity he witnessed
and signed the official procès-verbal of Louis XVI's decapitation (Balinski 1991).
Elected to the Royal Academy of Sciences in 1772, he was appointed examiner of
the navy in October 1783, a position he continued to occupy until 1790. He had to
examine the students at two academies in the provinces (to which others were later
added in the ports of Brest, Rochefort, and Toulon) "to determine those students who
will have satisfactorily mastered the required parts of the program, who will immediately
be sent to the ports after their examination and admitted as cadets third class
of the navy with a salary of 300 pounds per year" (Julia 1990). In 1789 he
orally examined thirty-eight candidates at the academies; by 1790 the number
had grown to seventy. Monge recorded letter grades in his private notebook,
going from a to g (though no a may be found in it), for each of the courses
of study, sometimes mixing them by writing "cd" and "gh." To these he
added descriptions of the quality of presentations. He also sought to assess the
intelligence and character of the candidates, using words such as "promising,"
"very promising," "fairly intelligent," "very intelligent," "ordinary," and "slow
witted" for the first and, for the second, "reasonable," "thoughtful," "lively,"
"bold," and "light." He was required by the ministry to rank-order all thirty-eight
candidates, but there is no record of how he integrated the letter grades
and the verbal descriptions of intelligence and character into a single list.

1. B. A. Elman, private communication, January 11, 2006.
2. We are indebted to Wanyan Shaoyuan for some of this information, cited from two recent
Chinese publications: Huang Mingguang, Studies of the Imperial Examination System in the Ming
Dynasty, Guangxi Normal University Press, 2006; and Wang Kaixuan, Research and Discourses
on the Imperial Examination System in the Ming Dynasty, Shengyang Press, 2005.
Grades come in a myriad of scales and have changed over time. United States
universities are a case in point. "As for grading, while some system of grading
was implied in the ranking of seniors for commencement parts in the colonial
colleges, the initiative in attempting to formulate a scale for grading students
was not taken until 1813 at Yale, which adopted a numerical scale of four for
evaluating course work. The numerical scale took the place of four terms that
had been used as early as 1783 . . . : optimi, second optimi, inferiores (boni), and
pejores" (Rudolph 1977, 147). Harvard adopted a scale of 100 in 1879, replaced
it with five letter grades from A to E in 1883, and chose a scale of three verbal
scores, "passed with distinction," "passed," and "failed," in 1895. The pioneering
historian of U.S. colleges and universities, Frederick Rudolph, goes on to
remark, "All this thrashing around in search of the perfect grading system was
a response to a changing curriculum and a changing climate of academic life.
Examining and grading systems were barometers of curricular health and style
and purpose. In adopting the numerical scale, Harvard stressed competition
as an inducement to student effort. Letter grades reduced competitive pressures
and deemphasized class rank, which now could not be calculated" (147).
He adds elsewhere that changes in grades also reflected the changing moods of
the country: the election of Andrew Jackson, "that unschooled orphaned soldier"
(Rudolph 1991, 201),3 and the spirit of Jacksonian democracy, hostile
to privilege in all its forms, had discouraged the practice of ranking students
at all.

3. John Quincy Adams, Jackson's immediate predecessor, decried Harvard's award in 1833 of an
honorary degree to the "barbarian" Jackson as a disgrace. This opinion was so widely shared that
Harvard abstained from so honoring another president for almost forty years.
What mechanism of social choice should be used to fuse the grades assigned
to a candidate or a student by different examiners or professors of different
disciplines into one grade? The question has had a well-nigh unanimous answer:
calculate the average of the grades, or the average of the grades weighted
by a factor reflecting their importance. For example, students at France's prestigious
École Polytechnique are ranked at graduation on the basis of the averages of all
their weighted grades, and U.S. universities elect students to Phi Beta Kappa
and bestow the distinctions of magna or summa cum laude on the basis of
the students' grade averages (sometimes with supplementary evidence such as
letters from professors or the difficulty of programs).
Two well-known French psychometricians, Henri Piéron and Henri Laugier,
are generally credited with having founded la docimologie, the science of
examinations (docimo- comes from the Greek dokimē, meaning "test"). The field
enjoyed a certain vogue in the 1920s but did not survive as a focused discipline
(Martin 2002). The docimologists analyzed grades given in examinations and showed
that significant variations in the results came from the varying subjective
coefficients of the different examiners: some are severe and others generous, some
assign widely varying marks and others do not. Thus the grades of a student sitting
an examination together with many others (such as the national baccalauréat
in France) depend on who does the grading. Piéron and Laugier contested the
idea of using average scores. They were right: the clear solution is to have a
jury of several examiners assign grades and to use the majority-grade (though
having more examiners obviously implies greater expense).
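A small numerical sketch makes the docimologists' observation concrete. It assumes a hypothetical jury of five examiners grading one copy on the French 0–20 scale, and takes the majority-grade to be the middlemost grade (the lower of the two middle grades when the number of examiners is even). The grades are invented for illustration. Replacing one examiner by a more generous one shifts the average by a full point but leaves the majority-grade unchanged.

```python
from statistics import mean


def majority_grade(grades):
    """Middlemost grade; with an even number of grades, the lower of the two middle ones."""
    ordered = sorted(grades)
    return ordered[(len(ordered) - 1) // 2]


# Hypothetical jury of five examiners grading the same copy on a 0-20 scale.
typical_jury = [12, 13, 13, 14, 15]
# Same copy, but the last examiner is replaced by an unusually generous one.
generous_jury = [12, 13, 13, 14, 20]

for name, jury in [("typical", typical_jury), ("generous", generous_jury)]:
    print(f"{name}: average = {mean(jury):.1f}, majority-grade = {majority_grade(jury)}")
# typical: average = 13.4, majority-grade = 13
# generous: average = 14.4, majority-grade = 13
```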
Piéron noted that "electors called on to examine candidacies for public office,
committees of clubs charged with designating new members, and university
professors deciding whom to appoint as new faculty face exactly the same
problems as do examiners" (Piéron 1963, 53–54). He clearly recognized that the
problem of grading and ranking students is a problem of social choice.
7.2 Employees
Employees are evaluated, sometimes they are ranked, and many firms distribute
year-end bonuses on the basis of their performance. Forced ranking became
a hotly debated innovation with the publication of General Electric's annual
report in 2000. In it the corporate executive office, led by chief executive
officer Jack Welch, declared, "In every evaluation and reward system, we break
our population down into three categories: the top 20%, the high-performance
middle 70%, and the bottom 10%. The top 20% must be loved, nurtured and
rewarded in the soul and wallet because they are the ones who make magic
happen . . . The top 20% and middle 70% are not permanent labels. People
move between them all the time. However, the bottom 10%, in our experience,
tend to remain there. A Company that bets its future on its people must remove
the lower 10%, and keep removing it every year, always raising the bar of
performance and increasing the quality of its leadership."
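The 20%–70%–10% partition quoted above can be sketched in a few lines. The sketch assumes that each employee already has a single performance score (how the opinions of several managers would be merged into that score is left open by the sources). The names and scores are hypothetical, and rounding the group sizes down to whole people is also an assumption.

```python
def forced_distribution(scores, top_pct=20, bottom_pct=10):
    """Split employees into top / middle / bottom groups by rank.

    `scores` maps a (hypothetical) employee name to one performance score,
    higher is better. Group sizes are rounded down to whole people; equal
    scores keep their input order, so ties are broken arbitrarily.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    n_top = n * top_pct // 100
    n_bottom = n * bottom_pct // 100
    return {
        "top": ranked[:n_top],
        "middle": ranked[n_top:n - n_bottom],
        "bottom": ranked[n - n_bottom:],
    }


# Hypothetical department of ten employees.
staff = {"Ana": 91, "Ben": 78, "Chloe": 85, "Dev": 62, "Eli": 70,
         "Fay": 88, "Gus": 55, "Hana": 74, "Ivo": 80, "Jun": 67}
groups = forced_distribution(staff)
print(groups["top"], groups["bottom"])  # ['Ana', 'Fay'] ['Gus']
```

The percentages are parameters, so the 10%–80%–10% variant mentioned in the text corresponds to `forced_distribution(staff, top_pct=10, bottom_pct=10)`.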
Forced ranking or forced distribution rating systems ("rank and yank")
refer specifically to the idea that the bottom 10% must either be let go with
a severance package or be put on notice to do better within three months and
face severance with no parting gift. Many companies are said to use such systems
(or to have used them), including Lucent, the infamous Enron, Ford, General Motors,
Goodyear, and Microsoft, though instead of a 20%–70%–10% formula some
practice(d) 10%–80%–10%, others 25%–25%–25%–25%.
Class-action suits were filed against each of the last four firms, claiming their
systems discriminated against blacks, women, older workers, or noncitizens.
In the suit against Ford, 500 employees accepted a $10.5 million settlement.
Dick Grote, a strong proponent of the system, claimed (in 2003) that forced
ranking "is probably the most controversial issue in management today." It
is generally acknowledged that about one-quarter of the Fortune 500 companies
use a performance management system based on forced ranking, though
increasingly companies declare they do not, or refuse to speak about it: it does
not have the image of a loving, nurturing system, and it has led to costly suits
and unflattering publicity.
A recent article investigates the extent to which introducing such systems
can be expected to improve the quality of the workforce (Scullen, Bergey, and
Aiman-Smith 2005). It leaves aside how morale, profitability, and productivity
may be affected, concentrating on performance potential as a function of
the percentage fired, the managers' ability to judge performance, the quality of
personnel selection procedures, and the usual levels of turnover. It concludes
that such systems "definitely hold promise" but that, after impressive gains in
the first few years, the expected improvement falls to zero.
The important point is that organizations rank and classify employees in
terms of their past and expected future performances. They may force-rank or
simply wish to determine bonuses and raises. So methods are needed for ranking
employees, deciding on raises, and distributing bonuses among them from a
presumably fixed pool of money. Yet nothing of the sort may be
found in the main book on the subject, issued by the Harvard Business School
Press (Grote 2005). A publication clearly designed to sell the consulting services
of its author ("It's about jump-starting a leadership development process . . .
It's about understanding the depth of your talent pool and seeing where the
true leadership potential lies in your company . . . It's about talent management
. . . it's about grooming great leaders. This book will help you raise the bar
and lift the boat" [xi–xii]), it repeats checklists of the obvious questions that
should be asked and recommends that managers be trained for the task
by an outside facilitator who should also chair the committees that establish
the rankings. Nothing is said about how the opinions of many managers are to
be reconciled to arrive at a company decision. By implication the decisions are
made by some sort of collective consensus (which more often than not must
mean that one or two individuals impose their views).
French jurisprudence upheld the right of companies to rank (but clearly
not, in a country known for its social support system, to yank) in 2002,
stating (in full): "Ranking systems permit fixing salary increases as a function
of the relative performance of employees and positioning them according to
pre-established, known, objective and controllable criteria; the individual
performance of employees is appreciated in comparison with the performances of
employees occupying comparable positions. The classifications, brought to the
knowledge of the employees beforehand, are neither subjective nor discriminatory.
Therefore, including a [ranking system] in the interior regulations [of
a firm] to classify an employee at the lowest performance level, in conformity
with rules established in a manner known to all and that constitutes a licit
individualization of salary increases, is to be allowed" (Répertoire de jurisprudence
2002).4 How the French court decided that any ranking system is objective and
not subjective, known to all, controllable, and not discriminatory is a total
mystery.
7.3 Musicians
lowest score the winner, the one with the highest score last (except that in
the finals a band ranked first by two music judges and one marching judge is
immediately declared the winner, to be certain that if a band is judged best by
a qualified majority then it must win, and the others are ranked as described).
This is nothing but an equivalent, mirror image of Borda's method (and one of
the very rare instances where Borda's method is actually used), with low scores
good and high scores bad, together with a provision to ensure that if a (qualified)
Condorcet-winning band exists, it wins. In addition, the UIL demands a strict
ordering with no ties. If two or more bands have the same score, the procedure
says the same rule is to be applied to them alone to decide how they should be
ranked among themselves (and if ties occur again, they are to be broken
with still another application of the rule). Regrettably, this recipe does not
always work. With an odd number of judges, when an even number of bands are
tied, applying the rule to them alone cannot give the same score to all of them,
so either the bands are strictly ordered or fewer of them are still tied. However,
when an odd number of bands are tied, they can again receive the same score.
For example, suppose that in a competition among twenty bands A to T each of
the five judges Ji placed three of them, A, B, and C, in the first three places,
as follows:

J1 : A ≻ B ≻ C
J2 : C ≻ A ≻ B
J3 : B ≻ C ≻ A
J4 : A ≻ C ≻ B
J5 : B ≻ C ≻ A

Each has a UIL score of 10, the scores of all other bands are higher than 10,
and considering the three alone changes nothing, so the rule fails. It is surprising
that such an outcome has never been confronted in practice. The UIL could
correct this flaw by accepting such situations as unavoidable, true ties. Theirs
would then be a genuinely new method.5
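The recursive tie-breaking recipe, and the way it stalls on the example, can be checked mechanically. The following is a hypothetical reconstruction, not UIL software. It scores each band by the sum of its places over the judges (low is good) and re-applies the same rule to each tied group alone. When re-applying the rule would only reproduce the same tie, it reports that group as an unbreakable tie. The finals provision for a band ranked first by a qualified majority is omitted, and the order in which every judge lists bands D to T is an assumption (the text says only that their scores exceed 10).

```python
def rank_sums(ballots, bands):
    """UIL-style score: sum of each band's place (1 = best) on every judge's
    ballot, counting only the bands in `bands` (low totals are good)."""
    totals = {band: 0 for band in bands}
    for ballot in ballots:
        restricted = [band for band in ballot if band in totals]
        for place, band in enumerate(restricted, start=1):
            totals[band] += place
    return totals


def uil_order(ballots, bands):
    """Order `bands` by rank sum, re-applying the rule to each tied group alone.

    Returns a list of groups from best to worst; a group with more than one
    band is a tie that re-applying the rule cannot break.
    """
    totals = rank_sums(ballots, bands)
    result = []
    for score in sorted(set(totals.values())):
        group = [band for band in bands if totals[band] == score]
        if len(group) == 1 or len(group) == len(bands):
            # Decided, or restricting to this group would reproduce the same tie.
            result.append(group)
        else:
            result.extend(uil_order(ballots, group))
    return result


# The five ballots of the example: A, B, C fill the first three places, and
# (assumption) every judge then lists D, E, ..., T in the same order.
others = [chr(c) for c in range(ord("D"), ord("T") + 1)]
top_three = ["ABC", "CAB", "BCA", "ACB", "BCA"]  # J1 ... J5
ballots = [list(t) + others for t in top_three]
bands = list("ABC") + others

totals = rank_sums(ballots, bands)
print(totals["A"], totals["B"], totals["C"])  # 10 10 10
print(uil_order(ballots, bands)[:2])          # [['A', 'B', 'C'], ['D']]
```

With only two of these bands tied, the recursion does settle the order; for instance `uil_order(ballots, ["A", "B"])` returns `[['A'], ['B']]`, because A is ahead of B on three of the five ballots.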
5. The Texas State Director of Music, Richard Floyd, informed us that the UIL was aware of this
rare circumstance. They would place C last because it was first least often. Between two bands
the procedure is the same as simple majority voting, so there must be a clear outcome with five
judges, giving the result A ≻ B ≻ C. But their rule is arbitrary: they might have said "eliminate
the bands that are last most often," A and B, yielding the outcome C ≻ A ≻ B.

The Frederick Chopin International Piano Competition began in 1927. One of
the oldest and most distinguished piano competitions, it is held in Warsaw every
five years (except for the disruptions caused by World War II). The methods for
ranking candidates have changed over the years and are sketched rather than
defined precisely (Frederick Chopin International Piano Competition 2006).
The system has gradually evolved into one where competitors are successively
eliminated at each of three stages, and (usually) six of the finalists are ranked.
Before the 2000 competition, judges awarded points to competitors, and the
sums (or averages) determined who survived, who did not, and how the finalists
were ranked. The scale of points assigned by judges has always gone from a low
of 0 or 1 up to some upper but varying limit: to 12 in 1927, to 15 in 1932, to 20
in 1937, to 25 between 1955 and 1985, then in 1990 and 1995 to 10 at the first
stage and to 25 at the later stages. New regulations have been in use since 2000.
They state that in principle the first stage of the competition should reduce the
number of competitors to 80, the second to 30, the third to 12, and the final stage
should rank 6 of them. In 2005, beginning with 257 participants, 80 survived
the first stage, 32 the second, and 12 the third. Of the six finalists who were
ranked, one was awarded first place, two were tied for third, two were tied for
fourth, and one was sixth (in 1990 and 1995 no first place was awarded). Two
systems are prescribed. The first, used in each of the three stages, asks that each
judge say "yes," the competitor should be admitted to the next stage, or "no," he
should not. The second, a supporting system also used at each stage, asks that
each judge accompany his yes-no decision with a grade ranging from 1 to 100.
At the end of each stage the average scores of the competitors (without their
names) are displayed in descending order together with the yes-no counts to
enable the judges to admit the correct number of competitors to the next stage
(by consensus or by voting in some way that is not specified). The six finalists,
evaluated on a 1–12 scale, are ranked according to their average scores.
A quick overview of musical competitions reveals that the vast majority first
eliminate in one or more stages, then rank a very limited number of finalists, such
as six. It seems that average scores usually determine the ranking, though often
the regulations of a competition say only that "internal regulations determine how
a jury reaches decisions" (including cutoffs, the scale of points, and the calculation
that merges judges' points into the jury's scores). Sometimes the top and bottom
scores are dropped from the calculations to discard the influence of extremes
on the average score and to try to eliminate the effects of cheating or favoritism
when, for example, a member of a jury evaluates the performance of her student.
In at least one piano competition the final result was a shock to all: the jury
contained two sets of judges representing distinctly differing sensibilities, so that
the piano performances most highly admired by one set of judges were marked
down by the other and ended with mediocre averages.6 If neither of the two sets
constitutes a majority of the jury, the majority-grade and majority-ranking avoid
this problem.
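The three ways of merging a jury's points mentioned here (the plain average, the average after dropping the top and bottom scores, and the majority-grade) can be compared on a small sketch. The nine scores are hypothetical. They are chosen to mimic a jury split into two camps with one judge in between, on an assumed 1–25 scale. Dropping exactly one score at each end is one common variant of trimming, not a rule taken from any particular competition.

```python
from statistics import mean, median


def trimmed_mean(scores):
    """Average after dropping one highest and one lowest score."""
    ordered = sorted(scores)
    return mean(ordered[1:-1])


# Hypothetical jury of nine for one performance on a 1-25 scale:
# four admirers, four detractors, and one judge in between.
jury = [24, 24, 23, 23, 18, 10, 9, 9, 8]

print(f"average        = {mean(jury):.2f}")          # 16.44
print(f"trimmed mean   = {trimmed_mean(jury):.2f}")  # 16.57
print(f"majority-grade = {median(jury)}")            # 18 (odd jury: the middlemost score)
```

Here trimming moves the average only slightly, because each camp still contributes three or four scores after the extremes are dropped. The majority-grade is the score of the judge between the two camps.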
6. We are obliged to Thérèse Dussault, an experienced jury member of international piano
competitions, for some of this information.

A very different procedure has been used to judge flute competitions at the
Conservatoire National Supérieur de Musique in Paris. Annual auditions award
first, second, and third prizes. After all the contestants have been heard, the
judges are asked to vote on the question, "Should a first prize be awarded at
all?" If there is not a majority of yeas, no first prize is awarded. Otherwise, a
new question is put to vote: "Should two first prizes be awarded?" If the vote is
against, only one is awarded. Otherwise, again a new question: "Should three
be awarded?" The votes continue until the number of first prizes is determined;
call it n. Each judge is now asked to write the names of at most n contestants
on a ballot; all those who receive a majority win a first prize, and they are listed
in order of the number of votes received. The identical procedure is used for
second and third prizes.7 This may be thought of as a kind of approval voting
to elect one or more candidates.
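The two phases of the flute-prize procedure can be sketched as follows. The judges' behavior in the first phase is an assumption: each judge is taken to favor some number of prizes and to vote yes to "Should k prizes be awarded?" exactly when k does not exceed that number. "Majority" is read as a strict majority of the judges. The jury, the contestants, and the alphabetical ordering of equal vote counts are all hypothetical choices made for the illustration.

```python
def number_of_prizes(favored_counts):
    """Put 'Should k prizes be awarded?' to a vote for k = 1, 2, ... and stop
    at the first question that fails to win a majority of yeas.

    Assumption: a judge answers yes to k exactly when k <= the number of
    prizes he or she favors (favored_counts holds one such number per judge).
    """
    majority = len(favored_counts) // 2 + 1
    n = 0
    while sum(1 for f in favored_counts if f >= n + 1) >= majority:
        n += 1
    return n


def prize_winners(ballots, n):
    """Each ballot names at most n contestants; everyone named on a strict
    majority of ballots wins, listed by decreasing number of votes."""
    for ballot in ballots:
        if len(set(ballot)) > n:
            raise ValueError("a ballot names more than n contestants")
    votes = {}
    for ballot in ballots:
        for name in set(ballot):
            votes[name] = votes.get(name, 0) + 1
    majority = len(ballots) // 2 + 1
    winners = [name for name, count in votes.items() if count >= majority]
    # Sort by votes (descending), then alphabetically so that ties are deterministic.
    return sorted(winners, key=lambda name: (-votes[name], name))


# Hypothetical jury of five judges.
n = number_of_prizes([2, 1, 2, 0, 3])
ballots = [["Ana", "Lea"], ["Ana"], ["Ana", "Max"], ["Lea", "Max"], ["Lea", "Ana"]]
print(n, prize_winners(ballots, n))  # 2 ['Ana', 'Lea']
```

The sketch does not cap the number of winners at n, since the description above does not say what would happen if more than n contestants reached a majority.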
The Bösendorfer and Van Cliburn International Piano Competitions both use
what they refer to as a "sophisticated computer program that calculates results
based on numerical scores." Little is revealed about it,8 beyond the statement that
"to balance the scores of a consistently high-scoring juror with a consistently
low-scoring juror, the scores of all jurors are processed by computer software to
the same statistical distribution" (Herberger College 2006). It thus recognizes
a difficulty, pointed out by the docimologists, that is inherent when numbers
are used without giving an explicit and absolute sense of what they mean. In
any case, a black box whose insides remain hidden should never be accepted
for electing or ranking: to be fair, and to be considered fair by all (competitors,
judges, and the public), a procedure must be known to all and understood
by all.
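The software used by these competitions is undisclosed, so what follows is emphatically not it. It is a minimal sketch of one standard way of bringing every juror "to the same statistical distribution": each juror's scores are rescaled to mean 0 and standard deviation 1 (z-scores) and then added. The two jurors and their scores are hypothetical. The example shows that such a transformation can change the outcome. Raw totals put Z ahead of X, while standardized totals put X ahead of Z.

```python
from statistics import mean, pstdev


def standardize(scores_by_juror):
    """Rescale each juror's scores to mean 0 and population standard deviation 1.

    `scores_by_juror` maps juror -> {contestant: score}. A juror who gives every
    contestant the same score is mapped to all zeros (nothing to rescale).
    """
    result = {}
    for juror, scores in scores_by_juror.items():
        values = list(scores.values())
        mu, sigma = mean(values), pstdev(values)
        result[juror] = {c: (s - mu) / sigma if sigma else 0.0
                         for c, s in scores.items()}
    return result


# Hypothetical jurors: A is generous but uses a narrow range, B is severe but spread out.
raw = {
    "juror_A": {"X": 9.0, "Y": 8.5, "Z": 8.0},
    "juror_B": {"X": 2.0, "Y": 6.0, "Z": 4.0},
}
z = standardize(raw)
contestants = ["X", "Y", "Z"]
raw_total = {c: sum(raw[j][c] for j in raw) for c in contestants}
z_total = {c: sum(z[j][c] for j in z) for c in contestants}
print(sorted(contestants, key=raw_total.get, reverse=True))  # ['Y', 'Z', 'X']
print(sorted(contestants, key=z_total.get, reverse=True))    # ['Y', 'X', 'Z']
```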
7. We are indebted to Michel Debost, an experienced jury member of international flute
competitions, for this information.
8. J. A. MacBain, who conceived the system, answered the request for a description with the
message, "I have taken the position that my methods are proprietary."

7.4 Skaters and Gymnasts
Judging figure skating has a particularly rich recent history. The big controversy
of the 2002 Winter Olympic Games held in Salt Lake City, Utah, concerned the
first two finishers in the pairs figure skating competition and focused on the
method of scoring that gave the gold medal to a Russian pair and the silver to
a Canadian pair. The vast majority of the public, and many experts as well,
were convinced that the gold should have gone to the Canadians, the silver to
the Russians. Though many admitted that the Russians' final performance was
more challenging, it had flaws, whereas the Canadians' did not. Described by
many as skating's worst scandal, it provoked a heated debate and deep divisions
within the skating world. The outcry was so strident that the International
Skating Union (ISU), recognized by the International Olympic Committee (IOC) as
the international governing body for the sport, ended up changing the verdict
and giving both pairs a gold medal. It then went on to institute a complete and
fundamental change in the system for judging all figure skating competitions,
abandoning the new One-by-One (OBO) method it had adopted just after the
previous Olympic Games. In the furor a dissident camp formed the World Skating
Federation with the avowed intent of keeping the old ordinal system and becoming
the new IOC-recognized governing body for the sport. It brought suit against the
ISU, ultimately lost, then disbanded (World Skating Federation 2005). Judging
athletic competitions is not, it seems, a benign activity.
The old ordinal system had survived for years despite its foibles. Vote trading
was not unknown, one judge exaggerating the score of a skater in return for
another judge's doing the same for another skater. Accumulated evidence shows
that judges had strong national biases. Sonja Henie of Norway won the 1927
World Championships with the votes of three Norwegian judges, the German
and Austrian judges having voted for a German skater. A statistical analysis
concludes, "[Judges] appear to engage in bloc judging or vote trading. A skater
whose country is not represented on the judging panel is at a serious disadvantage.
The data suggests that countries are divided into two blocs, with the United
States, Canada, Germany and Italy on one side and Russia, the Ukraine, France
and Poland on the other" (Zitzewitz 2006). During the 2002 scandal, a French
judge first confessed having favored the Russian over the Canadian pair, saying
she had yielded to pressure from her hierarchy, then denied it. Another factor
abetted national biases: judges at international competitions were appointed by
national skating unions, contrary to the practice in ski jumping, for example,
where judges are appointed by the Fédération Internationale de Ski (FIS). Under
the ISU's new rules, the judges are no longer appointed by the national committees.
The ordinal system also bewildered the public. The rankings among skaters
were naturally updated after each individual performance, and often the order
between two skaters would be reversed solely as the result of a third skater's
performance. The new OBO system (ISU 1998), introduced in 1998, cured
neither problem. Moreover, each exercise performed by a skater would be followed
by a posting of the judges' technical and presentation marks (between
0 and 6 inclusive) and their averages, along with the new relative standings
of all the skaters. Although the language of a score between 0 and 6 seems to
have been understood and accepted by all, including the public, the role of the
scores was not well understood. That one judge gave a skater a technical mark
of 5.3 and a presentation mark of 4.8 (a total of 10.1) for a performance, while
another judge gave him a 5.6 and a 5.9 (a total of 11.5), had no direct effect
whatsoever on the ultimate order among the skaters. The sum of the technical
and presentation marks given by a judge in a particular exercise to each of
the skaters served only to determine how that one judge ranked all the skaters
in that exercise. Thus, in particular, the average total mark earned by a skater
in an exercise (made known to the spectators almost instantaneously) meant
nothing whatsoever. Since the marks given by a judge served only to determine
that judge's ranking of the skaters, why not simply ask judges to rank them
straightaway? Because that is much harder to do. When there are, say, some
fifteen competitors, placing them in order is a very complicated task. It is far
more practical to use an absolute measure than to try to compare relative merits.
But why, then, did they not use the scores themselves? Perhaps, like Laplace
before them, they believed the language was not common at the outset and thus
decided to rely only on the orders determined by the evaluations of individual
judges.

Table 7.1a
1997 European Championships, Ordinals of Men's Free Skating among Top Five before
Performance of Vlascenko

              Candeloro    Kulik        Urmanov      Yagudin      Zagorodniuk
J1            3            2            1            4            5
J2            2            4            1            3            5
J3            5            2            1            3            4
J4            2            3            1            5            4
J5            3            5            1            4            2
J6            2            4            1            5            3
J7            5            3            1            4            2
J8            5            4            1            3            2
J9            5            4            1            2            3
Mark / Maj    3/5          4/8          1/9          4/7          3/5
Place         3d           4th          1st          5th          2d

Note: Mark is the middlemost (or median) of the ordinals; Maj is the number of judges in favor
of at least the mark.
The 1997 men's competition in the European Championships shows how the
old ordinal system worked (here restricted to the top six finishers; Loosemore
1997). The competitors finished the short program in the following order: first
I. Kulik, second V. Zagorodniuk, third A. Vlascenko, fourth P. Candeloro, fifth
A. Yagudin, sixth A. Urmanov. With only Vlascenko yet to perform in free
skating, the marks of the nine judges resulted in the ordinals (meaning the
order of finish according to each of the judges) shown in table 7.1a, where, for
example, judge J3 ranked Candeloro fifth.
A majority principle deduces the jury's decision from the ordinal rankings of
the judges. We give a different, simpler, but equivalent description of the system
than the usual one. A competitor's mark is the middlemost (or median) of the
ranks ascribed to him by the (odd number of) judges. This is, except for how
ties are resolved, the Borda-majority method described in chapter 5, or Galton's
idea applied to ranks rather than scores or grades. A majority of the judges
believe that a competitor's rank should be at least his mark, and a majority of
them believe that his rank should be at most his mark. Thus, in fact, the method
effectively resists manipulation unless a majority of the judges collude, which
is precisely what was happening. Each of the two blocs presumably leaned
on the one other judge to manipulate the final outcome. This is wholesale
cheating.
By this method Urmanov's mark is obviously 1. Candeloro's mark is 3: a
majority of at least five place him third or better, and a majority of at least five
place him third or worse. If there are ties among some marks, a skater with
the greater majority in favor of at least his mark takes precedence: thus Kulik
(4/8) takes precedence over Yagudin (4/7). If there are ties among some marks
that have the same majority in favor (Maj), a skater for whom the sum of the
ordinals at least as good as his mark (called the total ordinals of majority, or TOM)
is smaller takes precedence: the TOMs of Candeloro and Zagorodniuk are both
12 (2 + 2 + 2 + 3 + 3). If, as in this case, ties remain, a skater whose sum of all
ordinals (called TO) is smaller takes precedence: Candeloro's TO is 32, Zagorodniuk's
is 30, so Zagorodniuk is ahead of Candeloro. If the TOs were also equal, then the two
competitors would be considered tied. These complicated rules, advanced with
no justification, show the great importance attached to ending with a complete,
strict order among the competitors. It happens that the Borda-majority method gives
the same results in this case. The majority-values, given to the needed precision,
are (low numbers are better): Urmanov 1, Zagorodniuk 3.4, Candeloro 3.5,
Kulik 4.43434, and Yagudin 4.43435.
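The rules of the old ordinal system and the majority-values quoted above can be checked against table 7.1a with a short sketch. Each skater's key is encoded as (mark, −Maj, TOM, TO), so that sorting in increasing order applies the tie-breaks in the order described. Complete ties are simply left in input order here, whereas the ISU would have declared them tied. The majority-value function repeatedly takes the middlemost rank and removes it. When the number of remaining ranks is even, it takes the worse (numerically larger) of the two middle ranks. That convention is an assumption, chosen because it reproduces the values given in the text.

```python
# Ordinals from table 7.1a: one row per judge (J1..J9), one column per skater; 1 = best.
SKATERS = ["Candeloro", "Kulik", "Urmanov", "Yagudin", "Zagorodniuk"]
ORDINALS = [
    [3, 2, 1, 4, 5], [2, 4, 1, 3, 5], [5, 2, 1, 3, 4],
    [2, 3, 1, 5, 4], [3, 5, 1, 4, 2], [2, 4, 1, 5, 3],
    [5, 3, 1, 4, 2], [5, 4, 1, 3, 2], [5, 4, 1, 2, 3],
]


def places_of(skater):
    """The nine ordinals a skater received."""
    i = SKATERS.index(skater)
    return [row[i] for row in ORDINALS]


def isu_key(places):
    """Old ordinal system: lower mark, then larger Maj, then smaller TOM, then smaller TO."""
    mark = sorted(places)[len(places) // 2]       # median; assumes an odd number of judges
    majority = [p for p in places if p <= mark]   # ordinals at least as good as the mark
    return (mark, -len(majority), sum(majority), sum(places))


def majority_value(places):
    """Successive middlemost ranks (Borda-majority). With an even count, the worse
    (numerically larger) of the two middle ranks is taken. The resulting lists
    all have the same length, so comparing them lexicographically compares the
    decimal expansions quoted in the text."""
    remaining = sorted(places)
    digits = []
    while remaining:
        middle = remaining[len(remaining) // 2]
        digits.append(middle)
        remaining.remove(middle)   # removes one occurrence only
    return digits


isu_order = sorted(SKATERS, key=lambda s: isu_key(places_of(s)))
mv_order = sorted(SKATERS, key=lambda s: majority_value(places_of(s)))
print(isu_order)  # ['Urmanov', 'Zagorodniuk', 'Candeloro', 'Kulik', 'Yagudin']
print(isu_key(places_of("Candeloro")), isu_key(places_of("Zagorodniuk")))
# (3, -5, 12, 32) (3, -5, 12, 30)
print(mv_order == isu_order)  # True
```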
The final standings are determined by adding each skater's place number
in the short program to twice his place number in the free skating program to
obtain his index: a skater with a lower index takes precedence, and any ties
are resolved by the standings in the free skating program. Thus before Vlascenko's
free skating appearance the current standings were, in order (indices in
parentheses): Zagorodniuk (2 + 2 × 2 = 6), Urmanov (6 + 1 × 2 = 8), Kulik
(1 + 4 × 2 = 9), Candeloro (4 + 3 × 2 = 10), Yagudin (5 + 5 × 2 = 15).
Then Vlascenko performed. The final ordinals in free skating are given in
table 7.1b, along with the final free skating outcome. By itself, Vlascenko's
performance caused Yagudin to move ahead of Kulik, and Candeloro to move ahead of
Zagorodniuk. The final standings were then, in order (ties settled by the order
of finish in free skating): first Urmanov (6 + 1 × 2 = 8), second Candeloro
(4 + 2 × 2 = 8), third Zagorodniuk (2 + 3 × 2 = 8), fourth Kulik (1 + 5 × 2 =
11), fifth Yagudin (5 + 4 × 2 = 13), sixth Vlascenko (3 + 6 × 2 = 15).
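The index rule can be sketched directly. The sketch uses the short-program places and the free-skating places before and after Vlascenko's performance as given in the text. The dictionary and function names are illustrative. Running it reproduces the flip-flop: Zagorodniuk leads before Vlascenko skates and drops to third afterward.

```python
# Short-program places of the six skaters, as given in the text.
SHORT = {"Kulik": 1, "Zagorodniuk": 2, "Vlascenko": 3,
         "Candeloro": 4, "Yagudin": 5, "Urmanov": 6}

# Free-skating places before Vlascenko skated (table 7.1a) and after (table 7.1b).
FREE_BEFORE = {"Urmanov": 1, "Zagorodniuk": 2, "Candeloro": 3, "Kulik": 4, "Yagudin": 5}
FREE_AFTER = {"Urmanov": 1, "Candeloro": 2, "Zagorodniuk": 3, "Yagudin": 4,
              "Kulik": 5, "Vlascenko": 6}


def standings(short, free):
    """Index = short-program place + 2 x free-skating place; lower is better,
    and ties are broken by the free-skating place."""
    skaters = [s for s in free if s in short]
    return sorted(skaters, key=lambda s: (short[s] + 2 * free[s], free[s]))


print(standings(SHORT, FREE_BEFORE))
# ['Zagorodniuk', 'Urmanov', 'Kulik', 'Candeloro', 'Yagudin']
print(standings(SHORT, FREE_AFTER))
# ['Urmanov', 'Candeloro', 'Zagorodniuk', 'Kulik', 'Yagudin', 'Vlascenko']
```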
Also by itself, Vlascenko's performance in the free skating program pushed
Urmanov into first place and Candeloro into second place, relegating the prior
provisional leader, Zagorodniuk, to third place. These reversals, or flip-flops,
were ubiquitous in the old ordinal system and should long ago have encouraged
the ISU to hunt for a new system. There happens to be agreement, once again,
with the Borda-majority method (table 7.1c).

Table 7.1b
1997 European Championships, Ordinals of Men's Free Skating among Top Six

              Candeloro    Kulik        Urmanov      Yagudin      Zagorodniuk  Vlascenko
J1            3            2            1            4            5            6
J2            2            4            1            3            5            6
J3            5            2            1            3            4            6
J4            2            3            1            6            4            5
J5            3            6            1            4            2            5
J6            3            5            2            6            4            1
J7            5            3            1            4            2            6
J8            6            4            1            3            2            5
J9            6            5            1            2            3            4
Mark / Maj    3/5          4/6          1/8          4/7          4/7          5/5
Place         2d           5th          1st          4th          3d           6th

Note: Mark is the middlemost of the ordinals; Maj is the number of judges in favor of at least
the mark.

Table 7.1c
1997 European Championships, Ordinals of Men's Free Skating among Top Six, Borda-Majority
Method

              Candeloro    Kulik        Urmanov      Yagudin      Zagorodniuk  Vlascenko
              2            2            1            2            2            1
              2            2            1            3            2            4
              3            3            1            3            2            5
              3            3            1            3            3            5
              3            4            1            4            4            5
              5            4            1            4            4            6
              5            5            1            4            4            6
              6            5            1            6            5            6
              6            6            2            6            5            6
Majority-Value 3...        4.435...     1...         4.4343...    4.4342...    5...
Place         2d           5th          1st          4th          3d           6th

Note: Each column lists the skater's nine ordinals in order from best to worst.
The ISU then opted for the OBO system (used only once in Olympic competition, 2002). In explaining it, the ISU wrote, "Nothing has been changed in the work of the individual Judge, so that he/she judges every event in the same manner as with the previous Result System. The difference is how the opinion of the majority of the Judges is taken into account" (1998, 1). Thus, in particular, the 0 to 6 language was maintained. The innovation was to use the Dasgupta-Maskin method to obtain a ranking, that is, Llull's (or Copeland's) method, together with Cusanus's (or Borda's) in order to break ties, so it is simple to describe. As before, judges rank-order competitors in the short and free skating programs. In each, the number of times a competitor is preferred by a majority of judges to other competitors is counted (her number of wins): a skater with a higher number of wins is ranked ahead of one with a lower number, with ties among them resolved by their Cusanus- or Borda-scores (when computed for all contestants). The final standings are calculated as before, by adding a skater's rank in the short program to twice her rank in the free skating program, and breaking ties with the results of the free skating (ISU 2002).9 All kinds of reversals and discontinuities can occur, as was noted earlier in the context of voting.
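The OBO ranking step can be sketched as follows. For illustration only, it is applied to the 1997 free-skating ordinals of table 7.1b, although OBO was not in use that year. One assumption departs from the ISU rule: ties in the number of wins are broken by the Borda-score computed over these six skaters only (equivalently, the smaller sum of ordinals), not over all contestants. The names are hypothetical.

```python
from itertools import combinations

# Free-skating ordinals of judges J1..J9 (table 7.1b); 1 is best.
ORDINALS = {
    "Candeloro":   [3, 2, 5, 2, 3, 3, 5, 6, 6],
    "Kulik":       [2, 4, 2, 3, 6, 5, 3, 4, 5],
    "Urmanov":     [1, 1, 1, 1, 1, 2, 1, 1, 1],
    "Yagudin":     [4, 3, 3, 6, 4, 6, 4, 3, 2],
    "Zagorodniuk": [5, 5, 4, 4, 2, 4, 2, 2, 3],
    "Vlascenko":   [6, 6, 6, 5, 5, 1, 6, 5, 4],
}


def obo_ranking(ordinals):
    """Rank by pairwise majority wins (Llull/Copeland), ties broken by Borda."""
    skaters = list(ordinals)
    wins = {s: 0 for s in skaters}
    for x, y in combinations(skaters, 2):
        pairs = list(zip(ordinals[x], ordinals[y]))
        prefer_x = sum(ox < oy for ox, oy in pairs)  # judges placing x ahead of y
        prefer_y = sum(oy < ox for ox, oy in pairs)
        if prefer_x > prefer_y:
            wins[x] += 1
        elif prefer_y > prefer_x:
            wins[y] += 1
    # With strict rankings, a smaller sum of ordinals means a larger Borda-score.
    order = sorted(skaters, key=lambda s: (-wins[s], sum(ordinals[s])))
    return order, wins


order, wins = obo_ranking(ORDINALS)
print(order)
print(wins)
```

On these ordinals no two skaters share a number of wins, so the Borda tiebreak is never invoked. The resulting order differs from the old-system places of table 7.1b (Candeloro falls behind Zagorodniuk and Kulik).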
The importance of a method's obeying the independence of irrelevant alternatives (IIA) condition is crystal clear when the method is used in a dynamic environment such as sports or cultural competitions. A method that fails the condition provokes immediate suspicion. How can one competitor's performance be relevant to the jury's decision concerning the order among the others? That is why in practice procedures for judging competitors have turned more and more to scoring or grading as a means for ranking.
The ISU ended up adopting a completely different and very complex scoring system patterned on the one used for judging gymnastic competitions (ISU 2004). To be specific, consider the men's competition. As in the past, there are two performances, the short program and the free skating program. An executed element of both is a part of a program, such as a layback spin level 3, a double axel, a triple-flip, a death spiral, or combinations thereof. Each executed element has a base value of points that is predetermined by a technical committee. A skater's program is formally announced as a collection of executed elements: at the 2006 European Championships in Lyon, every man's short program had eight such elements, and their free skating programs usually had fourteen, though several had thirteen and one fifteen. A judge gives to each executed element a score of 0, ±1, ±2, or ±3 (call them merits or, if negative, demerits) that modifies the base value by that amount, up if he considers the execution good, down if bad. He also grades each of five program components on a scale of 0.25 to 10 in increments of 0.25: skating skills, transition/linking footwork, performance/execution, choreography/composition, and interpretation. Each program component is multiplied by a factor of 1 in the short program and a factor of 2 in the free skating program. For women's and pairs skating, however, the factor is 0.8 for the short and 1.6 for the free skating program. The order of the contestants is exactly the same as when the factors are respectively 1 and 2. Why, then, the distinction? To ensure that the men's totals dominate any that may be earned by women? At Lyon, the two top men had final scores of 245.33 and 228.87; the top two women, 193.24 and 177.81, which would become 241.55 and 222.26 if the factors were the same as the men's.
9. The ISU data is incomplete in one respect. The Russian and Canadian scores in free skating of
the judge accused of deceit are not given.
In any case, for every competitor there are two matrices or tables of numbers:
(1) the element table, one line for each executed element and one column for
each judge; and (2) the program component table, one line for each component
and one column for each judge. The twelve judges are anonymous; it is not
revealed which judge announced what merits, demerits, or grades. The system
selects three judges at random and ignores their evaluations; which judges and
what scores is unknown to all (before the competition has ended). For each
skater, the trimmed average of the nine judges' merits or demerits for each
element modifies the base value to obtain the score of that element, and the
trimmed average of their grades for each program component, multiplied by its
factor, gives the score of that component. The trimmed average is the average
value after the highest and lowest values have been deleted. The sum of the
element score plus the program component score gives the total score in each
of the two programs. The two are added together to give the final complete
ranking of the contestants.
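The per-skater arithmetic can be sketched as follows. It assumes, as the text states, that merits shift the base value point for point and that the trimmed average drops exactly one highest and one lowest value. The function names and all numbers are hypothetical.

```python
def trimmed_average(values):
    """Average after deleting one highest and one lowest value."""
    if len(values) < 3:
        raise ValueError("need at least three values to trim")
    kept = sorted(values)[1:-1]
    return sum(kept) / len(kept)


def element_score(base_value, merits):
    """Base value shifted by the trimmed average of the panel's merits/demerits."""
    return base_value + trimmed_average(merits)


def component_score(grades, factor):
    """Trimmed average of the panel's grades, multiplied by the program factor."""
    return trimmed_average(grades) * factor


# Hypothetical nine-judge panel: one element (base value 3.0) and one component.
merits = [0, 1, 1, 2, 1, 1, 0, 1, 3]
grades = [6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0]
print(element_score(3.0, merits))   # prints 4.0
print(component_score(grades, 2))   # prints 14.0 (free-skating factor for men)
```

A program's total is then the sum of such element scores plus the sum of its component scores.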
Since scores are involved, no flip-flops are possible. Manipulation is supposed to be discouraged, first, by randomly eliminating some judges, next, by
dropping the highest and lowest grades. On the other hand, the anonymity of the
judges gives them added impunity (though their scores are known to the ISU)10
and would seem to damage the credibility of the grades in front of the general
public. Moreover, randomly selecting a panel of nine of twelve judges whose
grades count invites a question: Suppose some other panel had perchance been
chosen?
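The panel count discussed next is the binomial coefficient C(12, 9). The sketch below enumerates the panels and shows, on hypothetical grades for a single program component, that different panels can yield different trimmed averages. The grades and names are illustrative only.

```python
from itertools import combinations
from math import comb

N_JUDGES, PANEL_SIZE = 12, 9


def trimmed_average(values):
    """Average after deleting one highest and one lowest value."""
    kept = sorted(values)[1:-1]
    return sum(kept) / len(kept)


panels = list(combinations(range(N_JUDGES), PANEL_SIZE))
print(len(panels), comb(N_JUDGES, PANEL_SIZE))  # 220 220
print(len(panels) ** 2)  # 48400: one independent draw for each of the two programs

# Hypothetical grades from all twelve judges for one program component.
grades = [7.0, 7.25, 7.5, 7.5, 7.75, 7.75, 8.0, 8.0, 8.25, 8.5, 8.75, 9.5]
outcomes = sorted({round(trimmed_average([grades[j] for j in p]), 4) for p in panels})
print(len(outcomes), outcomes[0], outcomes[-1])
```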
In all there are 220 different possible panels that can be chosen. Were they to
be used to rank the top three women figure skaters in the short program of the
2006 Olympics, 67 panels would agree with the official outcome, 153 panels
would not; 92 would agree on first place, 128 would not. The rankings that
result from using all 220 panels are given in table 7.2. The official point totals
in the short program were very close: Shizuka Arakawa 66.02, Irina Slutskaya
66.70, and Sasha Cohen 66.73. Had the grades of all twelve judges counted,
the outcome in the short program would have been Slutskaya first and Cohen
second (by 0.28 points). The official final result, based on the short and the free
skating programs, was Arakawa first, Cohen second, and Slutskaya third. It was
determined by one of the 48,400 (= 220 × 220) different possible combinations
of panels; 16,295 of those combinations would have placed Slutskaya ahead
of Cohen, and 132 of them would have declared them tied. There is something
10. One judge was barred by the ISU several days before the 2006 Olympics. The ISU had
admonished her four times for errors in judgment.
Table 7.2
Rankings of Top Three Finishers in Women's Figure Skating Short Program, 2006 Olympic Games,
Turin, Italy, According to 220 Different Possible Panels
No. of Panels
92
33
67
25
First
Second
Third
Slutskaya
Cohen
Arakawa
Slutskaya
Arakawa / Cohen
Arakawa / Cohen
Slutskaya
Arakawa
Cohen
Cohen
Slutskaya
Arakawa
Cohen
Arakawa
Slutskaya
gold medal, but if the correct base value (called the start value by the gymnasts) had been given to the Korean Yang Tae Young for his routine on the parallel bars, then ceteris paribus Yang would have won the gold instead of the bronze medal. There is no dispute about the error: a move called a belle, with a start value of 9.9, was recorded instead of a move known as a morisue, with a start value of 10. The FIG suspended three judges for the error, declared Yang to be the true winner, and suggested that Hamm should return his medal as the ultimate demonstration of fair play. Hamm refused. The Korean Olympic Committee demanded a hearing at the Court of Arbitration for Sport. In the words of the court, "It is beyond argument that judges operate under conditions of great pressure when a routine compresses so many elements into so short a time frame . . . Immediately after a gymnast's routine . . . the marks . . . are published on three-sided electronic score boards near the respective apparatus for a period of approximately 15 seconds . . . The arena was noisy as the gymnasts proceeded to their last apparatus – the climax of the event . . . A conversation between a Colombian and a Korean [a Korean judge of the performances had noticed the error and tried to communicate it to a Colombian judge of the start values] was fraught with potential for linguistic misunderstanding" (2004, 69). The complexity of the system was unable to withstand the brouhaha and the press of time. Yet, the court ended up refusing to change the awards, for three reasons: "Courts may interfere only if an official's field of play decision is tainted by fraud or arbitrariness or corruption . . . [Any] protest to be effective . . . had to be made before the end of the competition . . . We have no means of knowing how Yang would have reacted had he concluded the competition in this apparatus as points leader . . . So it needs to be clearly stated that while the error may have cost Yang a gold medal, it did not necessarily do so" (38–42).
7.5 Divers
In diving, there are either five or seven judges. If five, the highest and lowest
scores of a dive are eliminated, leaving three scores; if seven, the two highest
and two lowest scores are eliminated, leaving again three scores. The sum of
the three remaining scores is multiplied by the degree of difficulty to obtain the
score of the dive. The diver's final score is the sum of those of his individual
dives. In the case of synchronized diving, when a pair of divers perform together,
there are four execution judges (two for each diver) and five synchronization
judges. The same procedure is followed except that the top and bottom scores
of each type of judge are eliminated, leaving five scores.
FINA's rules are transparent and easy to understand. Scores determine rankings, so there can be no flip-flops. Top and bottom, or top two and bottom
two, scores are dropped, so excessive cheating is eliminated. The simplicity, in
contrast with the rules for skaters and gymnasts, is striking.
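The diving rules above fit in a few lines. This sketch implements only what the text describes; it makes no claim about further details of FINA's rulebook. The function names and scores are hypothetical.

```python
def dive_score(scores, degree_of_difficulty):
    """Individual dive: drop the extremes, sum the middle three, multiply by DD."""
    ordered = sorted(scores)
    if len(ordered) == 5:
        kept = ordered[1:-1]      # drop highest and lowest
    elif len(ordered) == 7:
        kept = ordered[2:-2]      # drop two highest and two lowest
    else:
        raise ValueError("expected five or seven judges")
    return sum(kept) * degree_of_difficulty


def synchro_dive_score(execution, synchronization, degree_of_difficulty):
    """Synchronized dive: 4 execution and 5 synchronization judges, each trimmed once."""
    if len(execution) != 4 or len(synchronization) != 5:
        raise ValueError("expected 4 execution and 5 synchronization scores")
    kept = sorted(execution)[1:-1] + sorted(synchronization)[1:-1]  # 2 + 3 scores
    return sum(kept) * degree_of_difficulty


print(dive_score([7.0, 7.5, 8.0, 8.0, 9.0], 3.0))  # 7.5 + 8.0 + 8.0 = 23.5, times 3.0
```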
7.6 Countries
The Economist Intelligence Unit (EIU) issues a quality-of-life index that compares countries throughout the world. According to the 2005 index, the first ten countries and several others, each with an indication of its place, are Ireland, Switzerland, Norway, Luxembourg, Sweden, Australia, Iceland, Italy, Denmark, Spain, . . . , United States (13), . . . , France (25), Germany (26), . . . , United Kingdom (29). How is this done? First, subjective life-satisfaction surveys are made that ask how satisfied people are with their lives on a four-point scale. Second, a statistical regression analysis is done to explain the responses as a function of nine measurable indicators of the quality of life: material well-being (GDP per person), health (life expectancy at birth), stability and security (using another EIU index), family life (divorce rate translated into an index on a scale of 1 to 5), community life (1 if high church attendance or elevated union membership, 0 otherwise), climate and geography (latitude), job security (unemployment rate), political freedom (average of indices of political and civil liberties, on a scale of 1 to 7), and gender equality (ratio of average male and female earnings). When the values of the parameters have been estimated, a country's index is found by multiplying its indicators by the corresponding parameters and summing them (EIU 2005). The idea is that the weights, or relative importance, of the various attributes of satisfaction are estimated in one way or another, and then satisfaction in a country is a weighted sum of supposedly measurable quantities. This general approach has also been used to calculate indices for ranking universities and hospitals, though it has absolutely no theoretical justification. It is difficult to pretend that adding weighted
7.7 Wines
Wines have been ranked since the first century. "It is the property of wine, when drunk, to cause a feeling of warmth in the interior of the viscera, and, when poured upon the exterior of the body, to be cool and refreshing. . . . Who can entertain a doubt that some kinds of wine are more agreeable to the palate than others, or that even out of the very same vat there are occasionally produced wines that are by no means of equal goodness, the one being much superior to the other . . . ? Let each person, therefore, constitute himself his own judge as to which kind it is that occupies the pre-eminence" (Pliny the Elder). In his treatise, Pliny the Elder (23–79 c.e.) goes on to catalog the known wines, placing them into four ranks according to their qualities, using such phrases as "there is not a wine that is deemed superior . . . the Cæcubum enjoyed the reputation of being the most generous of wines . . . there is now no wine known that ranks higher."
Philippe le Bel, King of France (1285–1314), established an official society of Agents-Gourmets-Tasters of wine 12 (Peynaud and Blouin 1999) in 1312, to be responsible for tasting, regulating, and classifying wines. A later decree relative to the sale and distribution of wine, adopted on December 14, 1813, clarifies a member's role:
Napoléon, Emperor of the French, King of Italy, Protector of the Confederation of the
Rhine, Mediator of the Swiss Confederation, etc. . . . We have decreed and so decree
what follows:
14. There shall be named agents-gourmets-tasters of wine. Their number may not exceed
fifty.
15. Their functions are: (1) Exclusive of all others, to store, and when necessary to
serve as intermediaries between sellers and buyers of spirits. (2) To taste, to that effect,
the said spirits, and to faithfully indicate the vintage and the quality. (3) To serve also,
exclusive of all others, as experts when there are disputes as to the quality of wines, and
allegations against carriers and boatmen arriving at ports or warehouses claiming wines
have been altered or falsified . . .
17. They shall be named by our minister of commerce . . .
19. They cannot buy or sell on their own account or on commission, under penalty of
destitution. (Décret 1813)
12. Courtiers-Gourmets-Piqueurs de Vins.
The famous and still important Bordeaux classification of 1855 was carried out
by them.
How are wines classified and ranked today? Robert M. Parker, Jr., the most famous and most powerful critic alive (no less a person than Jacques Chirac anointed him with the presidential accolade "Robert Parker is the most followed and influential critic of French wines in the entire world" [Parker 2002]), explains how he does it. He judges alone. Seven attributes should bless the critic, he affirms: independence, courage, experience, individual accountability, an emphasis on pleasure and value, a focus on qualitative issues, and candor. He uses a 50–100 scale, undoubtedly modeled on what he experienced as a student in the United States. The grade 90–100 is equivalent to an A, the very best of wines, and therefore rare; 80–89 is equivalent to a B, very good, especially if 85 or above (many such wines are in Parker's personal cellar).13 The grade 70–79 is a C, average, lacking complexity, though above 75 may be pleasant and, when cheap, ideal for uncritical quaffing. Below 70 is a D or an F, depending on where you went to school: flawed, dull, unbalanced wines. Parker begins by giving a wine 50 points; its general color and appearance add at most 5 points; its aroma and bouquet contribute up to 15 points depending on the intensities and its cleanliness; flavor and finish, depending on balance, depth, and length on the palate, can garner as many as 20 points; and 10 additional points are available to rate the overall impression and the potential for improving with age. The grades, published in his bimonthly newsletter, Wine Advocate, are awaited with anguish, and they hugely influence sales. It is said that one point more or one point less can cause prices to skyrocket or sink. A measure of the importance of Parker's opinions is his starring role in an excellent documentary movie on wines and their production (Mondovino 2004).
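Parker's point budget and letter bands, as described above, can be sketched as follows. The function names are hypothetical, and the bounds check simply encodes the stated maxima (5 + 15 + 20 + 10 on top of a base of 50).

```python
PARKER_MAX = {"color": 5, "aroma": 15, "flavor": 20, "overall": 10}


def parker_score(color, aroma, flavor, overall):
    """50 base points plus the four components, each within its stated maximum."""
    parts = {"color": color, "aroma": aroma, "flavor": flavor, "overall": overall}
    for name, value in parts.items():
        if not 0 <= value <= PARKER_MAX[name]:
            raise ValueError(f"{name} must lie in [0, {PARKER_MAX[name]}]")
    return 50 + sum(parts.values())


def parker_band(score):
    """Letter-grade equivalent of a 50-100 score."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    return "D or F"


print(parker_score(4, 13, 17, 8), parker_band(parker_score(4, 13, 17, 8)))  # 92 A
```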
13. His book of 1,635 pages, just cited, contains very, very few wines with grades under 85.
A great many national and international wine competitions are held every year throughout the world that are judged by juries of several members. Generally speaking, the anonymity of each wine being tasted is assured by strict rules. The International Wine and Spirit Competition (IWSC) has been organized in Great Britain since 1969. At the first step all judging is done by region, variety or type, and vintage. Samples are presented in numbered glasses, and judges record their scores and give them to a panel chairman, who may decide to discuss them. The highest possible score is 100, broken down into clarity up to 20 and taste and bouquet up to 40 each. The differing grades of the judges are amalgamated into a panel's grade by consensus. "Where the judges are unable to reach a majority decision, flights will be referred to another panel" (IWSC 2006). A grade of 90 to 100 earns a gold medal, 80 to less than 90 a silver medal, and 75 to less than 80 a bronze medal.14
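A sketch of the IWSC point breakdown and medal bands. It takes the panel's agreed (consensus) grades as input and does not model how consensus is reached. The function names are hypothetical.

```python
def iwsc_score(clarity, taste, bouquet):
    """Panel grade out of 100: clarity up to 20, taste and bouquet up to 40 each."""
    if not (0 <= clarity <= 20 and 0 <= taste <= 40 and 0 <= bouquet <= 40):
        raise ValueError("component outside its stated range")
    return clarity + taste + bouquet


def iwsc_medal(score):
    """Gold 90-100, silver 80 to <90, bronze 75 to <80, otherwise no medal."""
    if score >= 90:
        return "gold"
    if score >= 80:
        return "silver"
    if score >= 75:
        return "bronze"
    return None


print(iwsc_medal(iwsc_score(18, 36, 35)))  # 89 points
```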
Most Australian competitions judge wines with a 20-point scale: 3 points for appearance, 7 for bouquet, and 10 for palate. Scores are given in multiples of 0.5. An average score of 18.5 or above earns a gold medal, 17 to below 18.5 a silver medal, and 15.5 to below 17 a bronze medal. The Regional Agricultural Societies organize most shows with the objective of improving the quality of the products and the efficiency of their production. These are mainly Australian shows, run by them for their wines, with judges looking for faults that are to be eliminated in future years. Some 60% of the entries receive a medal. The director of the Sydney Wine Competition points out that there is an in-built negativism to this system.15 He explains that when two of three judges give gold medal scores of 18.5 to a wine, but the third believes it deserves no medal at all and gives it a 15, then even if the first two cheat and give it scores of 20, they (the majority) cannot impose their opinion: the average is 17.33 in the first case and 18.33 in the second, a silver medal either way. This is giving power to cranks in proportion to their crankiness! The majority-grade avoids this problem completely: it is already 18.5, a gold medal, when the two judges grade honestly.
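The Sydney director's example can be replayed in a few lines. The majority-grade of an odd jury is its middle score; `statistics.median_low` returns the lower middlemost value, which for three judges is simply the median. The function name `medal` is hypothetical.

```python
from statistics import mean, median_low


def medal(score):
    """Australian 20-point bands: gold >= 18.5, silver >= 17, bronze >= 15.5."""
    if score >= 18.5:
        return "gold"
    if score >= 17:
        return "silver"
    if score >= 15.5:
        return "bronze"
    return None


honest = [18.5, 18.5, 15]   # two gold-medal scores, one crank
inflated = [20, 20, 15]     # the majority exaggerates in response
for scores in (honest, inflated):
    print(scores, "average:", medal(mean(scores)),
          "majority-grade:", medal(median_low(scores)))
```

The average awards silver in both cases; the majority-grade awards gold even when the majority grades honestly, so it never needs to exaggerate.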
Wine competitions can have different aims. The Sydney International Wine Competition's main objective is "to help consumers choose pleasing wines to complement their dining table" (Mason 2006). Its juries are mostly foreigners, and it accords medals to only 20% of the entries. It has thirteen judges, including a chief judge, and upwards of 2,000 wines to rank. In the first phase, there are six panels of two judges, and each judge uses her preferred system of marking to select a certain prescribed percentage (e.g., 20%) of wines out of sets of no more than forty that belong to a same varietal category. A chief judge intervenes when necessary, in particular when there are disagreements. In the second phase, the wines are arrayed in a line of perceived palate weight, from lightest bodied to fullest bodied. The competition director admits, "So far as I am aware, there is no scientific method to accurately predict a wine's mouthfeel, its palate weight; this is a perceptual thing." The chief judge groups the wines on the light-to-heavy-bodied scale for subsequent analysis. In the third phase, the director informs members of the jury, "You will be judging the wines in each of these style categories alongside appropriate food complexing your palate." There are two panels of six judges; judges give scores to wines between 1 and 10 (here a 1 corresponds to the usual 15.5, and a 10 to the usual 20) and write comments (of some fifty to sixty words). A wine's score is the average of the scores given by the judges. Of the total entry, 10% win Blue-Gold awards, and a further 2½% win Highly Commended awards.
14. The details concerning the components of the scores and consensus decisions were provided by Lesley Gray.
15. Warren Mason, in an email received March 16, 2006. We are indebted to him for the information he provided.
Vinitaly, an annual Italian wine competition, classed some 4,500 wines in
2005. They are judged in separate categories defined on the basis of color, age,
still or sparkling, sweet or dry, and so on. Each jury panel has five members: two
Italian members, one foreign technician, and two members of the international
trade press. The system is well defined:
Every wine entered in the Competition is assessed by a jury. The final score for every
sample is calculated from the arithmetical average of the individual numerical assessments after eliminating the highest and the lowest scores. Wines achieving the best
score (for no more than 30% of entries of each group of every category . . .), provided
that they have achieved a minimum score of 80 hundredths in accordance with the
Union Internationale des Œnologues [U.I.Œ.] evaluation method, will be awarded
ex-æquo with a Special Mention Diploma. The 20 wines in every group in each of the
categories . . . which achieve the highest scores, provided that they are above 80 hundredths, are then subjected to further evaluation by 3 different juries. In this stage, the
score for every sample is calculated from the arithmetical average of the individual
numerical assessments after eliminating the highest and lowest scores of each jury. The
top 4 wines in each group achieving the best score, of no less than 80 hundredths, will
be respectively awarded the Grand Gold Medal, Gold Medal, Silver Medal and Bronze
Medal. (Vinitaly 2006)
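The first Vinitaly stage can be sketched as follows, under two assumptions: the 30% cap is applied by rounding down within a group, and ties at the cutoff are ignored. The function names and wine scores are hypothetical.

```python
def jury_score(scores):
    """Arithmetic average after eliminating the highest and the lowest score."""
    if len(scores) < 3:
        raise ValueError("need at least three jurors")
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)


def special_mentions(entries, minimum=80):
    """Best-scoring wines, at most 30% of the group, each scoring at least `minimum`."""
    ranked = sorted(entries, key=entries.get, reverse=True)
    cap = (3 * len(ranked)) // 10          # integer arithmetic avoids float rounding
    return [w for w in ranked[:cap] if entries[w] >= minimum]


print(jury_score([80, 84, 85, 86, 95]))    # prints 85.0 (80 and 95 are dropped)

group = {"w01": 91.0, "w02": 87.5, "w03": 79.0, "w04": 85.0, "w05": 82.0,
         "w06": 76.5, "w07": 80.5, "w08": 84.0, "w09": 78.0, "w10": 88.0}
print(special_mentions(group))
```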
Table 7.3a
Components of U.I.Œ.'s Sensorial Analysis Tasting Sheet for Wine Judging Competitions for Still Wines, 2006

                   Excellent  Very Good  Good  Passable  Inadequate  Mediocre  Bad
Aspect
  Limpidity            6         [5]       4      3          2          1       0
  Nuance              [6]         5        4      3          2          1       0
  Intensity            6          5       [4]     3          2          1       0
Aroma
  Frankness           [6]         5        4      3          2          1       0
  Intensity            8         [7]       6      5          4          2       0
  Finesse              8          7       [6]     5          4          2       0
  Harmony             [8]         7        6      5          4          2       0
Taste, flavor
  Frankness            6         [5]       4      3          2          1       0
  Intensity            8         [7]       6      5          4          2       0
  Body                 8          7       [6]     5          4          2       0
  Harmony             [8]         7        6      5          4          2       0
  Persistence          8         [7]       6      5          4          2       0
  After-taste          8         [7]       6      5          4          2       0
Global opinion        [8]
Table 7.3b
Components of U.I.Œ.'s Score Sheet for Still Wines, July 2009

                                  Excellent                      Inadequate
Visual
  Limpidity                           5     [4]     3      2      1
  Aspect other than limpidity       [10]     8      6      4      2
Nose
  Genuineness                        [6]     5      4      3      2
  Positive intensity                  8     [7]     6      4      2
  Quality                           [16]    14     12     10      8
Taste
  Genuineness                         6     [5]     4      3      2
  Positive intensity                  8     [7]     6      4      2
  Harmonious persistence              8      7     [6]     5      4
  Quality                           [22]    19     16     13     10
Harmony–Overall judgement           [11]    10      9      8      7
Table 7.4
Awards Conferred to Wines

Medal         OIV System       U.I.Œ. System
Grand Gold    score = 0–3      90 ≤ score ≤ 100
Gold          score = 4–8      85 ≤ score < 90
Silver        score = 9–14     80 ≤ score < 85
Bronze        score = 15–21    75 ≤ score < 80
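The 2009 U.I.Œ. sheet can be totaled mechanically. This sketch assumes that the bracketed entries of table 7.3b are one judge's marks and that the U.I.Œ. thresholds of table 7.4 apply to the sheet's total. The dictionary and function names are hypothetical.

```python
# Allowed marks per criterion, Excellent ... Inadequate (table 7.3b).
SCALES = {
    ("Visual", "Limpidity"): (5, 4, 3, 2, 1),
    ("Visual", "Aspect other than limpidity"): (10, 8, 6, 4, 2),
    ("Nose", "Genuineness"): (6, 5, 4, 3, 2),
    ("Nose", "Positive intensity"): (8, 7, 6, 4, 2),
    ("Nose", "Quality"): (16, 14, 12, 10, 8),
    ("Taste", "Genuineness"): (6, 5, 4, 3, 2),
    ("Taste", "Positive intensity"): (8, 7, 6, 4, 2),
    ("Taste", "Harmonious persistence"): (8, 7, 6, 5, 4),
    ("Taste", "Quality"): (22, 19, 16, 13, 10),
    ("Harmony", "Overall judgement"): (11, 10, 9, 8, 7),
}

# Bracketed marks of table 7.3b, in the same order as SCALES.
EXAMPLE = dict(zip(SCALES, (4, 10, 6, 7, 16, 5, 7, 6, 22, 11)))


def sheet_total(marks):
    """Sum of the marks, each checked against its criterion's scale."""
    for key, mark in marks.items():
        if mark not in SCALES[key]:
            raise ValueError(f"{mark} is not on the scale for {key}")
    return sum(marks.values())


def uioe_award(score):
    """U.I.Œ. column of table 7.4."""
    if 90 <= score <= 100:
        return "Grand Gold"
    if 85 <= score < 90:
        return "Gold"
    if 80 <= score < 85:
        return "Silver"
    if 75 <= score < 80:
        return "Bronze"
    return None


print(sheet_total(EXAMPLE), uioe_award(sheet_total(EXAMPLE)))
```

The best mark on every line sums to exactly 100, which is consistent with the U.I.Œ. range of table 7.4.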
The International Organization of Vine and Wine describes itself as "an intergovernmental organisation of a scientific and technical nature of recognized competence for its works concerning vines, wine, wine-based beverages, table grapes, raisins and other vine-based products" (OIV 1994). At its 74th General Assembly in June 1994, it adopted a standard for international wine competitions, arguing that this would guarantee procedural fairness and allow for the comparison of results. The rules stipulate that a jury should be composed of seven jurors (never fewer than five), most of whom should be œnologists (persons who because of their scientific and technical knowledge, and diplomas, are experts on the production and distribution of wine). OIV insists on absolute anonymity in the presentation of the wines to the jurors. The order of presentation is sacrosanct: beginning with whites and going on to rosés and reds, within each first sparkling wines then still wines, and ending up with sweet wines and mistelles.16 Furthermore, within each category the wines are presented to the jurors in increasing order of persistence of aromatic intensity. Judges should work in isolation, in well-ventilated, well-lit rooms, at temperatures between 18° and 22° centigrade, and be supplied with a carafe of water, small pieces of bread to clear the palate, and a receptacle in which to discard wine. Wines are to be tasted individually, not comparatively. Each type of wine must be served at its ideal temperature.
16. A mistelle is a must whose fermentation has been arrested by the addition of alcohol.
Table 7.5
OIV's Calculations to Be Made by the Secretariat: Still Wines, 1994
Excellent
0
Eye
Aspect
Nose
Intensity
Quality
Mouth
Intensity
Quality
Harmony
Very Good
1
Good
2
Inadequate
4
Eliminated
Total
Weight
Result
1
2
1
2
1
3
3
0
3
3
9
"The qualitative evaluations indicated on the jurors' wine tasting rating sheets [table 7.5] are translated into numbers by the secretariat in accordance with the following calculation chart" (OIV 1994). Each of six attributes is rated at one of five levels by a check mark at the appropriate place: the calculations are done by others, with 0 the best possible mark. "Each [wine] receives a rating which is the median rating based on the ratings resulting from the calculation of the evaluation of each of the jurors." A footnote reads, "When there is an odd number of jurors, the median is immediately evident. If the number happens to be even, the median is based on the average of the closest two ratings in the middle of the ratings" (OIV 1994). The reason for taking the median or the middlemost of the jurors' scores accords with precedent (rather than with the incisive insight of Sir Francis Galton). In the days before handheld calculators, computing an average of seven numbers was a chore, but singling out the middlemost was (and is) trivial. In fact, more and more, the jury's grade is taken to be the average of the jurors' grades.17
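The OIV jury rating is the ordinary median of the jurors' ratings, which `statistics.median` computes, averaging the two middle values for an even jury. The award bands follow the OIV column of table 7.4 (0 is best). Assigning a half-point median from an even jury to the band below its upper bound is an assumption of this sketch, as are the function names and juror ratings.

```python
from statistics import median


def oiv_jury_rating(juror_ratings):
    """Median of the jurors' ratings; mean of the two middle ones if even."""
    return median(juror_ratings)


def oiv_award(rating):
    """OIV column of table 7.4, read as upper bounds (assumption for half-points)."""
    if rating <= 3:
        return "Grand Gold"
    if rating <= 8:
        return "Gold"
    if rating <= 14:
        return "Silver"
    if rating <= 21:
        return "Bronze"
    return None


seven = [9, 3, 12, 7, 15, 8, 10]   # hypothetical ratings of a seven-member jury
print(oiv_jury_rating(seven), oiv_award(oiv_jury_rating(seven)))
```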
"How use doth breed a habit in a man!" To change a method is very difficult. Nevertheless, the OIV has been and is continuing to work on defining a new method. This can only reflect dissatisfaction with the present system. Its antecedents go back some thirty years when wines could display serious defects in one or another aspect; accordingly, the rating sheets obliged judges to give marks on all aspects. The advances in the technology of making wines and in avoiding abusive transportation and storage have changed the problem. Today wines that enter competitions do not have serious flaws. They are all more or less good in every aspect, yet the whole may be flawed. As Parker writes, "Although technology allows wine-makers to produce wines of better and better quality, the continuing obsession with technically perfect wines is unfortunately stripping wines of their identifiable and distinctive character. Whether it is excessive filtration of wines or insufficiently critical emulation of winemaking styles, the downside of modern winemaking is that it is now increasingly difficult to tell an Italian Chardonnay from one made in France or California or Australia" (2002, 18). Grading a wine on the basis of the sum of the scores of its individual characteristics "misses the point, for it has difficulty in detecting exceptional wines by overly favoring wines that are taste-wise correct" (Peynaud and Blouin 1999, 109). Indeed, many say that professional judges work backward: they first decide what grade a wine should receive, then they score the individual characteristics so that the scores give the desired outcome. Even the great Parker seems to have hinted he may proceed in this manner.
17. We are indebted to Jacques Blouin, œnologist and an organizer of international wine competitions, for this and other authoritative information given in the discussion of grading wines.
7.8 The Paris Wine Tasting of 1976
Table 7.6
Judges' Grades, Official Rankings, and Borda-, Quandt, and Majority-Rankings, Cabernet-Sauvignon Wine Tasting, Paris, May 22, 1976

                    A      B*     C*     D*     E      F*     G      H      I      J
P. Brejoux          14     16     12     17     13     10     12     14     5      7
A. de Villaine      15     14     16     15     9      10     7      5      12     7
M. Dovaz            10     15     11     12     12     10     11     11     8      14
P. Gallagher        14     15     14     12     16     14     17     13     9      14
O. Kahn             15     12     12     12     7      12     2      2      13     5
C. Dubois-Millot    16     16     17     13.5   7      11     8      9      9.5    9
R. Oliver           14     12     14     10     12     12     10     10     14     8
S. Spurrier         14     14     14     8      14     12     13     11     9      13
P. Tari             13     11     14     14     17     12     15     13     12     14
C. Vanneque         16.5   16     11     17     15.5   8      10     16.5   3      6
J.C. Vrinat         14     14     15     15     11     12     9      7      13     7
Average             14.4   14.3   13.6   11.8   11.6   10.9   10.6   10.5   10.0   5.7
Official rank       1st    2d     3d     4th    5th    6th    7th    8th    9th    10th
Borda-rank          1st    3d     1st    4th    5th    7th    6th    10th   8th    9th
Quandt rank         1st    3d     2d     4th    5th    7th    6th    10th   9th    8th
Majority-rank       2d     1st    3d     4th    5th    6th    8th    7th    9th    10th
judges' rankings they induce. Quandt (2006) writes, for instance, "[If] one judge assigns to three wines the grades 3,4,5, while another judge assigns the grades 18,19,20, and a third judge assigns 3,12,20, they appear to be in complete harmony concerning the ranking of wines, but have serious differences of opinion with respect to the absolute quality. I am somewhat sceptical about the value of the information contained in such differences. But we always have the option of translating grades into ranks and then analyzing the ranks." Quandt advocates adding the ranks (low numbers good, high numbers bad) and translating grades into ranks by giving each of k wines having the same grade the average of the next k places on the list. Call it Quandt's method. It is an ad hoc kind of Borda idea. Translating grades into ranks discards important information, as shown by Quandt's example. This ignores the strategic aspects of giving grades, which clearly have importance. A very confident judge could well exaggerate her grades up or down to try to impose her will on the decision of the jury.
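Quandt's method, as just described, can be sketched as follows: convert each judge's grades into ranks (rank 1 for the highest grade, with k tied wines sharing the average of the k places they occupy), then add the ranks across judges. The function names are hypothetical.

```python
def grades_to_ranks(grades):
    """Rank 1 = highest grade; k tied grades share the average of their k places."""
    order = sorted(range(len(grades)), key=lambda i: -grades[i])
    ranks = [0.0] * len(grades)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and grades[order[end + 1]] == grades[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of places start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def quandt_totals(judges_grades):
    """Sum of each wine's ranks over all judges (a lower total is better)."""
    per_judge = [grades_to_ranks(g) for g in judges_grades]
    return [sum(r[i] for r in per_judge) for i in range(len(judges_grades[0]))]


# Quandt's own example: three judges who disagree on levels but not on order.
example = [[3, 4, 5], [18, 19, 20], [3, 12, 20]]
print([grades_to_ranks(g) for g in example])
print(quandt_totals(example))
```

All three judges yield identical ranks, which is precisely the information about absolute quality that the conversion discards.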
The number of (expected) wins of a wine X against a wine Y, given in table 7.7, is the number of judges that give a higher grade to X than to Y plus 0.5 for each tie in the grades (thus, for example, A has a higher grade than B five times and three are ties, so A's number of wins against B is 6.5 and ipso facto, since there are eleven judges, B's number of wins against A is 4.5).
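The numbers of wins, the Borda-scores, and the majority-ranking can all be recomputed from the grades of table 7.6. In this sketch a wine's majority-value is the sequence of its successive lower-middlemost grades (each removed before the next is taken), and wines are compared lexicographically on that sequence. The function names are hypothetical.

```python
WINES = ["A", "B*", "C*", "D*", "E", "F*", "G", "H", "I", "J"]
# One row per judge (Brejoux ... Vrinat), columns in the order of WINES (table 7.6).
GRADES = [
    [14, 16, 12, 17, 13, 10, 12, 14, 5, 7],
    [15, 14, 16, 15, 9, 10, 7, 5, 12, 7],
    [10, 15, 11, 12, 12, 10, 11, 11, 8, 14],
    [14, 15, 14, 12, 16, 14, 17, 13, 9, 14],
    [15, 12, 12, 12, 7, 12, 2, 2, 13, 5],
    [16, 16, 17, 13.5, 7, 11, 8, 9, 9.5, 9],
    [14, 12, 14, 10, 12, 12, 10, 10, 14, 8],
    [14, 14, 14, 8, 14, 12, 13, 11, 9, 13],
    [13, 11, 14, 14, 17, 12, 15, 13, 12, 14],
    [16.5, 16, 11, 17, 15.5, 8, 10, 16.5, 3, 6],
    [14, 14, 15, 15, 11, 12, 9, 7, 13, 7],
]


def column(wine):
    k = WINES.index(wine)
    return [judge[k] for judge in GRADES]


def wins(x, y):
    """Judges grading x above y, plus 0.5 for each tie."""
    return sum(1.0 if gx > gy else 0.5 if gx == gy else 0.0
               for gx, gy in zip(column(x), column(y)))


def borda_score(x):
    return sum(wins(x, y) for y in WINES if y != x)


def majority_value(grades):
    """Successive lower-middlemost grades, each removed before the next."""
    remaining = sorted(grades)
    value = []
    while remaining:
        value.append(remaining.pop((len(remaining) - 1) // 2))
    return tuple(value)


borda = {w: borda_score(w) for w in WINES}
majority_order = sorted(WINES, key=lambda w: majority_value(column(w)), reverse=True)
print(wins("A", "B*"), wins("B*", "A"))   # 6.5 4.5
print(borda)
print(majority_order)
```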
Table 7.7
Number of Wins between Every Pair of Wines, and Borda-Scores, Cabernet-Sauvignon Wine Tasting, Paris, May 22, 1976

       A     B*    C*    D*    E     F*    G     H     I     J     Borda-Score
A      –     6.5   4.5   5.5   7.5   10    8     8.5   10.5  8.5   69.5
B*     4.5   –     5     5.5   8     9     9     9     8     10    68
C*     6.5   6     –     6.5   5.5   10    8     8.5   9.5   9     69.5
D*     5.5   5.5   4.5   –     6.5   7.5   7.5   8.5   8     7.5   61
E      3.5   3     5.5   4.5   –     6.5   9     8     6     9     55
F*     1     2     1     3.5   4.5   –     5     7     6.5   7.5   38
G      3     2     3     3.5   2     6     –     6.5   6     7     39
H      2.5   2     2.5   2.5   3     4     4.5   –     6     4     31
I      0.5   3     1.5   3     5     4.5   5     5     –     5     32.5
J      2.5   1     2     3.5   2     3.5   4     7     6     –     31.5
In table 7.7 the number of wins 5.5 indicates a tie (represented by ≈ in the following formula) between two wines. A number of wins above 5.5 means that a majority of the judges gave a higher grade (≻) to the wine listed in the row than to the wine listed in the column of the table. No one seems to have noticed that this is a real example of the occurrence of the Condorcet paradox:
E ≈ C ≻ D ≈ A ≻ B ≻ E.
These five wines (A, B, C, D, and E) were preferred by a majority of judges to the remaining five wines, and all four methods agree with this (though they disagree on the rankings among the first three). The last five wines are ranked transitively according to the simple majority-rule
G ≻ F ≻ J ≻ H ≻ I,
but again there is no agreement among the methods as to the order among them.
The Paris wine tasting of 1976, sometimes called the judgment of Paris,
shows how important it is to have a reliable method of amalgamating opinions
founded on sound theoretical grounds.
7.9 Conclusion
Common Language
Language, that wonderful crystallisation of the very flow and spray of thought.
James Martineau
For a large class of cases – though not for all – in which we employ the word "meaning"
it can be defined thus: the meaning of a word is its use in the language.
Ludwig Wittgenstein
Everywhere, in all pursuits, scientific and societal, scales are invented to measure, to understand, to classify, to evaluate, to rank, or to make decisions. This applies to every activity, attribute, candidate, or alternative, be it an immutable concept of the universe (temperature and its degrees) or an ephemeral fancy (the value of a painting and its price). These scales or measures constitute common languages of words that have absolute meanings, clearly understood by those who use them. Many domains of the physical world have natural units of measurement, imposed, as it were, by the physics of the situation: time, mass, distance, speed, pressure, energy. Others of the physical world do not, or do not in the present state of knowledge. Yet they show that words and phrases (indeed, sometimes even colors and faces) can and do define perfectly understandable absolute measures.
8.1 Examples of Common Languages
Three examples are sufficient to show how such languages may be defined.
The Mohs scale of mineral hardness was proposed by Friedrich Mohs in 1812.
Ten specific substances are given, from hardest, rated 10, to softest, rated 1
(American Federation of Mineralogical Societies 2008):
10 diamond (C),
9 corundum (e.g., sapphire and ruby) (Al2O3),
8 topaz (Al2SiO4(OH-,F-)2),
7 quartz (SiO2),
6 orthoclase (KAlSi3O8),
5 apatite (Ca5(PO4)3(OH-,Cl-,F-)),
4 fluorite (CaF2),
3 calcite (CaCO3),
2 gypsum (CaSO4·2H2O),
1 talc (Mg3Si4O10(OH)2).
IX Damage considerable in specially designed structures; well-designed frame structures thrown out of plumb. Damage great in substantial buildings, with partial collapse.
Buildings shifted off foundations.
X Some well-built wooden structures destroyed; most masonry and frame structures
destroyed with foundations. Rails bent.
XI Few, if any (masonry) structures remain standing. Bridges destroyed. Rails bent
greatly.
XII Damage total. Lines of sight and level are distorted. Objects thrown into the air.
The last example is the measurement of pain. Those who have suffered severe pain may recall having been asked where they situated their pain on a 0-10 scale, 0 indicating no pain, 10 the most intense imaginable. With the question formulated in this way, the answer means little: it remains a vague, completely subjective suggestion having little to do with an absolute measure.
The following implicit questions together with descriptive words and colors attached to a 10-0 visual scale give meanings to the levels. The scale goes from unbearable distress to no distress: an intense red is rated 10 and described as "agonizing"; the color gradually transforms into pinks, is rated 8, and called "horrible"; 6 is "dreadful"; the color becomes yellow and is rated 5; then it turns into a very light green, is rated 4, and categorized "uncomfortable"; the green gradually deepens, is rated 2, and is said to be "annoying"; finally, 0, dark green, is "none" (Adams 2008). Here at least some clear distinctions are made as to the meanings of the numbers.
The Mankoski pain scale (Wilderness Emergency Medical Services Institute
2008) presents a much more detailed set of definitions:
0 Pain Free
1 Very minor annoyance: occasional minor twinges. No medication needed.
2 Minor Annoyance: occasional strong twinges. No medication needed.
3 Annoying enough to be distracting. Mild painkillers take care of it. (Aspirin, Ibuprofen.)
4 Can be ignored if you are really involved in your work, but still distracting. Mild painkillers remove pain for 3-4 hours.
5 Can't be ignored for more than 30 minutes. Mild painkillers ameliorate pain for 3-4 hours.
6 Can't be ignored for any length of time, but you can still go to work and participate in social activities. Stronger painkillers (Codeine, narcotics) reduce pain for 3-4 hours.
7 Makes it difficult to concentrate, interferes with sleep. You can still function with effort. Stronger painkillers are only partially effective.
8 Physical activity severely limited. You can read and converse with effort. Nausea and dizziness set in as factors of pain.
These definitions mix criteria expressed by a person suffering from pain with
others that may be used by an objective observer.
An altogether different common language of grades, the Faces Pain Scale-Revised (2010), is used as a measure of pain in pediatric medicine (Hicks et al. 2001). Six faces are aligned on a 0-10 scale. Instructions say,
These faces show how much something can hurt. This face [point to left-most face] shows no pain. The faces show more and more pain [point to each from left to right] up to this one [point to right-most face]: it shows very much pain. Point to the face that shows how much you hurt [right now].
Score the chosen face 0, 2, 4, 6, 8, or 10, counting left to right, so 0 = no pain
and 10 = very much pain. . . . This scale is intended to measure how children feel
inside, not how their face looks.
Each of the three measures of pain uses a 0-10 scale, but their common languages are given different, though related, definitions. Every activity has its own individual scale even when it uses a measure in common with others. The high temperature of a patient is in no way as hot as the lowest temperature of a kiln for baking ceramics. The scale of a thermometer used to assess a patient's temperature is other than that of a thermostat that controls the ambient heat of
a room: tenths of degrees (Celsius or Fahrenheit) are significant for the first,
integral units suffice for the second, but a much broader interval of temperatures
is required to control the warmth of a room than to measure the heat of a human
body.
8.2 Measurement Theory
The focus of the present enquiry is scales and levels of measurement that have no clear-cut physical existence or validity of the kind possessed by length, weight, time, or sound. It deals primarily with common languages of grading that are intellectual constructs and that have no meaning other than what is ascribed to them by their users. Obvious examples are a student's grade, the quality of a wine, the brio of a pianist's interpretation, the level of a skater's or a diver's performance, the excellence of a candidate for political office. No instrument for listening will faithfully measure a pianist's performance; no set of chemical tests (in the present state of knowledge) will faithfully measure the quality of a wine; no set of questions will adequately measure the competence of an aspiring politician. There is no a priori demonstration of the validity of most of the scales that are used and analyzed in this book; for the common languages of measurement in these applications, the proof of the pudding is in the eating. And proof there is, as has been seen in a variety of real, practical examples presented in this and the preceding chapter and as proven for the 2007 voting experiment carried out in Orsay.
The language of price in terms of units of money is a completely natural and unquestioned example: it is a common language expressed in arbitrary units (euros, dollars, pounds, francs) that is clearly understood. Indeed, when a currency is revalued (as when the old one hundred French francs became the new one French franc in 1960) or when there is a change in denomination (as when 6.55957 new French francs became 1 euro), most people translate prices back into the older language to fully appreciate their meaning and significance when important expenditures are in the offing. Today in France there are people (undoubtedly adults in 1960) who still evaluate important expenses in old francs: they are the benchmarks!

1. The quotations and definitions are taken from Narens and Luce (2008).
The real, practical common languages are usually given very careful definitions. The meaning of the U.I.Œ.'s language used to evaluate wines until 2009 is evident; it has seven words: Excellent, Very Good, Good, Passable, Inadequate, Mediocre, Bad (see table 7.3a). The numbers associated with each word are used to determine total grades, though they may also be used as synonyms, and their intermediate values express more nuances. The same is true for the language of the OIV, which contains only five words (Excellent, Very Good, Good, Inadequate, and Eliminated), and its number scheme is different and opposite (see table 7.4). If the same jury used both languages in parallel, the results (in particular, the rankings they imply) could well be different, for two reasons. First, the numbers of words are different; second, although they share certain appellations, the languages are not the same and can therefore elicit different appreciations.
The meaning of the 0-10 scale in increments of 1/2 used to grade divers is clearly explained in short expressions, much as the language used for judging wines (see chapter 7). It is fair to say that in both of these cases a common language has not only been accepted but its words have become better understood over time and in use. In time, the numbers themselves come to have shared, common meanings. The same cannot be said of the new rules used for skaters and gymnasts. The regulations in those instances are so detailed and so complex (deliberately designed, presumably, to counter cheating) that there is (at present) no common language. This is regrettable, for with the old rules common languages had been established.
The scales used to give grades to students are many and vary across nations, as noted earlier. But each constitutes a well-defined common language. One example concerns Belgium, France, Morocco, Portugal, Peru, Venezuela, Senegal, Mali, Iran, and Tunisia, which all use a 0-20 scale, explained in the following terms: 10-11 is "adequate"; 12-13 is "passable"; 14-15 is "good"; 16 is "excellent"; 17 is "outstanding"; 18-19 is "nearing perfection"; 20 is "perfect" (Wikipedia 2007). Until 2006 the Danish scale consisted of 10 grades ranging from 00 to 13, with 00 being the worst. Grade 00: the completely unacceptable performance; 03: the very hesitant, very insufficient, and unsatisfactory performance; 5: the hesitant and not satisfactory performance; 6: the just
How many grades should be used? Mineral hardness uses ten, the Mercalli
intensity scale defines twelve, pain is measured on eleven, divers are judged
on twenty-one. The official Union Internationale des Œnologues applies seven grades to each characteristic; the International Skating Union modifies the base value of each executed element with one of seven grades (0, ±1, ±2, ±3) and assigns one of forty grades to each of the five program components; in France students are given one of twenty-one grades; in the United States they are given one of six letter grades. An interesting but unrelated game-theoretic analysis has shown that to best motivate students who are primarily concerned about their status it is preferable to assign them a grade from a coarse set of few grades rather than one from a fine set of many grades (Dubey and Geanakoplos 2006). A greater number of grades permits a finer distinction but demands a
higher degree of expertise and discernment. When competitors (e.g., wines,
skaters) are judged according to several different criteria or characteristics, a
small number of grades for each may well suffice.
George A. Miller, persecuted by the integer 7 (or so he claimed), was driven
to write the most quoted article (Miller 1956) of the first hundred years of the
journal Psychological Review (Kintsch and Cacioppo 1994). He considered
two problems: a human being's capacity for absolute judgment, or the capacity of people to transmit information, and the span of immediate memory (the latter problem, for him fundamentally different, does not concern us). His is the
clearest explanation:
In the experiments on absolute judgment, the observer is considered to be a communication channel . . . The experimental problem is to increase the amount of input information
and to measure the amount of transmitted information. If the observer's absolute judgments are quite accurate, then nearly all of the input information will be transmitted
and will be recoverable from his responses. If he makes errors, then the transmitted
information may be considerably less than the input. We expect that, as we increase the
amount of input information, the observer will begin to make more and more errors;
we can test the limits of accuracy of his absolute judgments. If the human observer is a
reasonable kind of communication system, then when we increase the amount of input
information the transmitted information will increase at first and will eventually level
off at some asymptotic value. This asymptotic value we take to be the channel capacity
of the observer: it represents the greatest amount of information that he can give us about
the stimulus on the basis of an absolute judgment. The channel capacity is the upper
limit on the extent to which the observer can match his responses to the stimuli we give
him. (Miller 1956, 82)
constitutes a common language, that is, that allows judges or voters to make
absolute judgments.
8.5 Interval Measure Grades
As with the 100-meter dash, the numbers must be related to the percentages of passing students if they are to constitute an interval measure. Imagine that all the real numbers from 2 (the minimum acceptable) up to 12 are possible passing grades in an examination. Underlying the idea of an interval measure is that in the long run, over many students, the percentages of students who obtain a grade in any two subintervals of the closed interval [2, 12] of the same length $\delta > 0$ are the same. Which of the five passing grades should be assigned to a 5.7? The grade whose number in {2, 4, 7, 10, 12} is closest to 5.7, namely, 7 or good; more generally, any number from the interval [5.5, 8.5] should be mapped to good. By the same token, any grade from the interval [2, 3] is mapped to adequate, from [3, 5.5] to fair, from [8.5, 11] to excellent, and from [11, 12] to outstanding. The five numbers (2, 4, 7, 10, 12) were chosen so
that the intervals occupy, respectively, the percentages of the whole equal to the
percentages of passing grades specified in the definition: [2, 3] occupies 10%
of the interval from 2 to 12; [3, 5.5] occupies 25%; [5.5, 8.5] occupies 30%;
[8.5, 11] occupies 25%; and [11, 12] occupies 10%. Thus equal intervals have
the same significance: on average the same percentages of passing students
belong to all such intervals, and on average 10% are outstanding, 25% are
excellent, and so on, down to 10% are adequate. Thus, the Danish system is an
interval measure.
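The mapping just described can be checked with a few lines of Python. This is a minimal sketch, assuming scores are real numbers in [2, 12]; the names `to_danish` and `interval_shares` are illustrative. It assigns each score the nearest of the five passing grades and recomputes the share of [2, 12] that each grade receives.

```python
# The five Danish passing grades on the 2-12 scale, with the names used in the text.
DANISH = [(2, "adequate"), (4, "fair"), (7, "good"), (10, "excellent"), (12, "outstanding")]
LOW, HIGH = 2, 12


def to_danish(score):
    """Map a real score in [2, 12] to the passing grade whose number is closest.

    A score exactly on a boundary (3, 5.5, 8.5, 11) goes to the lower grade here;
    the text leaves that choice open, and it affects no interval of positive length.
    """
    if not LOW <= score <= HIGH:
        raise ValueError("score must lie in [2, 12]")
    return min(DANISH, key=lambda grade: abs(grade[0] - score))


def interval_shares():
    """Percentage of [2, 12] mapped to each grade; boundaries are midpoints of neighbors."""
    numbers = [number for number, _ in DANISH]
    bounds = [LOW] + [(a + b) / 2 for a, b in zip(numbers, numbers[1:])] + [HIGH]
    return [100 * (hi - lo) / (HIGH - LOW) for lo, hi in zip(bounds, bounds[1:])]


print(to_danish(5.7))      # (7, 'good')
print(interval_shares())   # [10.0, 25.0, 30.0, 25.0, 10.0]
```

Because "nearest grade" puts each boundary at the midpoint of two neighboring grade numbers, the shares follow directly from the five numbers, which is the sense in which (2, 4, 7, 10, 12) encode the percentages 10, 25, 30, 25, 10.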
More formally, suppose $k$ number grades, $x_1 < x_2 < \cdots < x_k$, are to be given, and their percentages are to be $(p_1, p_2, \ldots, p_k)$, so $\sum_{j=1}^{k} p_j = 100$. The grades constitute an interval measure when for all $i$, $x_i$ is in the interval $[p_1 + \cdots + p_{i-1},\ p_1 + \cdots + p_i]$, and $\sum_{j=1}^{i} p_j$ is the midpoint of the interval $[x_i, x_{i+1}]$ for $i = 1, \ldots, k-1$. Let $q_i = \sum_{j=1}^{i} (-1)^{j+1} p_j$ for $i = 1, \ldots, k$.
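The definition can be checked mechanically. This is a minimal sketch on the 0-100 scale the definition uses; the function names are illustrative, and strict increase of the grades is deliberately not tested, only the two interval and midpoint conditions. The Danish grades (2, 4, 7, 10, 12), after subtracting 2 and multiplying by 10, become (0, 20, 50, 80, 100).

```python
def alternating_sums(p):
    """q_i = p_1 - p_2 + p_3 - ... + (-1)^(i+1) p_i, for i = 1, ..., k."""
    q, total = [], 0
    for j, pj in enumerate(p, start=1):
        total += pj if j % 2 == 1 else -pj
        q.append(total)
    return q


def is_interval_measure(x, p, tol=1e-9):
    """Test the two conditions of the definition.

    x_i must lie in [P_{i-1}, P_i], where P_i = p_1 + ... + p_i, and P_i must be
    the midpoint of [x_i, x_{i+1}].  Strict increase of x is not checked here.
    """
    if len(x) != len(p) or abs(sum(p) - 100) > tol:
        raise ValueError("need one grade per percentage, percentages summing to 100")
    cumulative = 0
    for i, pi in enumerate(p):
        lower, upper = cumulative, cumulative + pi
        if not lower - tol <= x[i] <= upper + tol:
            return False
        if i + 1 < len(x) and abs((x[i] + x[i + 1]) / 2 - upper) > tol:
            return False
        cumulative = upper
    return True


danish_p = [10, 25, 30, 25, 10]
print(alternating_sums(danish_p))                           # [10, -15, 15, -10, 0]
print(is_interval_measure([0, 20, 50, 80, 100], danish_p))  # True
print(is_interval_measure([0, 25, 50, 75, 100], danish_p))  # False
```

Evenly spaced grades fail here because the first midpoint, 12.5, is not the 10% cumulative mark.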
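Theorem 8.1 There exist number grades $x = (x_1, \ldots, x_k)$ that constitute an interval measure for the percentage distribution $(p_1, \ldots, p_k)$ if and only if there exists a $\lambda \ge 0$ that satisfies
$$\max\{q_2, q_4, \ldots\} \le \lambda \le \min\{q_1, q_3, \ldots\},$$
in which case the grades are (so that $x_1 = \lambda$)
$$x_{2i} = -\lambda + 2\sum_{j=1}^{i} p_{2j-1} \quad\text{and}\quad x_{2i+1} = \lambda + 2\sum_{j=1}^{i} p_{2j}.$$
Proof The result follows from writing the defining conditions
$$2\sum_{j=1}^{i} p_j = x_i + x_{i+1}$$
and
$$0 \le x_1 \le p_1 \le x_2 \le p_1 + p_2 \le x_3 \le \cdots \le x_k \le p_1 + p_2 + \cdots + p_k,$$
then doing a bit of algebraic manipulation.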
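The computation behind Theorem 8.1 can be sketched in Python: form the alternating sums $q_i$, intersect the constraints on $\lambda$ (including $\lambda \ge 0$), and, for a chosen $\lambda$, produce the grades from the closed-form expressions. This is an illustrative sketch on the 0-100 scale; the function names are assumptions, not the book's.

```python
def alternating_sums(p):
    """q_i = p_1 - p_2 + ... + (-1)^(i+1) p_i, for i = 1, ..., k."""
    q, total = [], 0
    for j, pj in enumerate(p, start=1):
        total += pj if j % 2 == 1 else -pj
        q.append(total)
    return q


def lambda_range(p):
    """Interval of admissible lambda >= 0, or None when no interval measure exists."""
    q = alternating_sums(p)
    lowest = max([0] + q[1::2])   # 0 together with q_2, q_4, ... (1-based even indices)
    highest = min(q[0::2])        # q_1, q_3, q_5, ... (1-based odd indices)
    return (lowest, highest) if lowest <= highest else None


def grades(p, lam):
    """x_{2m} = -lam + 2(p_1 + p_3 + ... + p_{2m-1}); x_{2m+1} = lam + 2(p_2 + ... + p_{2m})."""
    x = []
    for i in range(1, len(p) + 1):    # i is the 1-based grade index
        if i % 2 == 1:
            m = (i - 1) // 2
            x.append(lam + 2 * sum(p[2 * j - 1] for j in range(1, m + 1)))
        else:
            m = i // 2
            x.append(-lam + 2 * sum(p[2 * j - 2] for j in range(1, m + 1)))
    return x


for p in ([10, 25, 30, 25, 10], [8, 24, 36, 24, 8], [10, 20, 40, 20, 10], [10, 19, 42, 19, 10]):
    r = lambda_range(p)
    print(p, r, grades(p, r[0]) if r else "no interval measure grades")
```

Run on the four percentage distributions discussed next, this sketch should reproduce their values of $\lambda$ and $x$; a perturbed Danish distribution such as (10, 26, 29, 25, 10) returns `None`, since then $q_5 = -2 < 0$.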
In the Danish case, namely, $p = (10, 25, 30, 25, 10)$, $q = (10, -15, 15, -10, 0)$, so $\max\{-15, -10\} \le \lambda \le \min\{10, 15, 0\} = 0$; together with $\lambda \ge 0$ this forces $\lambda = 0$, which is therefore unique, and $x = (0, 20, 50, 80, 100)$. Rescaling these grades by dividing by 10 and translating up by 2 yields the equivalent Danish grades.

On the other hand, if the Danes had observed, or asked for, $p = (8, 24, 36, 24, 8)$, then $q = (8, -16, 20, -4, 4)$ and $\max\{-16, -4\} \le \lambda \le \min\{8, 20, 4\}$, so any $\lambda \in [0, 4]$ yields interval measure grades. For $\lambda = 0$ they are $x = (0, 16, 48, 88, 96)$: the lowest but not the highest point in the interval $[0, 100]$ is a grade. For $\lambda = 2$ they are $x = (2, 14, 50, 86, 98)$: neither the lowest nor the highest point in the interval $[0, 100]$ is a grade. For $\lambda = 4$ they are $x = (4, 12, 52, 84, 100)$: the highest but not the lowest point in the interval $[0, 100]$ is a grade.

Suppose they had observed the percentages $p = (10, 20, 40, 20, 10)$. Then $q = (10, -10, 30, 10, 20)$ and $\max\{-10, 10\} \le \lambda \le \min\{10, 30, 20\}$, so $\lambda = 10$ is unique. But now $x = (10, 10, 50, 90, 90)$: the percentages of the five grades cannot be achieved, but it is possible to have three grades, with 30% A/Bs, 40% Cs, and 30% D/Es, so the sums of the percentages for A and B and for D and E do meet the requirements.

Finally, if instead the Danes had observed or stipulated the percentages $p = (10, 19, 42, 19, 10)$, then $q = (10, -9, 33, 14, 24)$, so $\max\{-9, 14\} = 14 > 10 = \min\{10, 33, 24\}$ and there is no set of interval measure grades.
Corollary There are percentage distributions $(p_1, \ldots, p_k)$ for which no set of interval measure grades exists.
So sometimes the percentages stipulated or observed admit interval measure grades, sometimes not. When several are possible, however, they are not equivalent: one set cannot be obtained from the other by scaling and translating, because a change in the value of $\lambda$ moves the grades with odd indices in the opposite direction of the grades with even indices. When the value of $\lambda$ is unique, the solution is unstable, for some small perturbation in the percentages always renders interval measure grades impossible. For example, for an $\varepsilon > 0$ perturbation of the Danes' original percentages, $p = (10, 25 + \varepsilon, 30 - \varepsilon, 25, 10)$, there is no set of interval measure grades (now $q_5 = -2\varepsilon < 0$, so no $\lambda \ge 0$ satisfies $\lambda \le q_5$). So, for any given set of percentages, either there is no set of interval measure grades, or it is unique but unstable, or there are several sets that are not equivalent. These are troublesome facts. Together they suggest that mechanisms that depend on adding or averaging should be shunned.
8.6 The Lesson
Common languages exist. They are used to measure many things. They may
seem to be completely arbitrary, each invented only to serve as a common
languageof numbers, alphabets, words, or facesin order to make common
assessments or to arrive at group or collective decisions. But when defined, they
have absolute meanings, even for such subjective experiences as pain. And when they are used repeatedly they acquire more and more precise absolute meanings.
"But 'glory' doesn't mean 'a nice knock-down argument,'" Alice said. "When I use a word," Humpty Dumpty said in a rather scornful tone, "it means just what I choose it to mean, neither more nor less." "The question is," said Alice, "whether you can make words mean so many different things." "The question is," said Humpty Dumpty, "which is to be master, that's all." (Carroll 1871)
Juries of experts who class wines, committees of professors who grade students, Olympic officials of various nationalities who judge divers' performances, and earthquake victims who gauge damages use their languages of measures in exactly the same manner as Humpty Dumpty used words. This is but an echo of Wittgenstein's somewhat more somber-sounding "The meaning of a word is its use in the language."
New Model
Revolution is not the uprising against pre-existing order, but the setting-up of a new
order contradictory to the traditional one.
José Ortega y Gasset
When we mean to build,
We first survey the plot, then draw the model;
And when we see the figure of the house,
Then must we rate the cost of the erection;
Which if we find outweighs ability,
What do we then but draw anew the model
In fewer offices, or at last desist
To build at all?
William Shakespeare
Over seven hundred years of effort and a host of impossibility theorems show
that the Arrovian model, where many individual rankings are to be resolved
into a single collective ranking, cannot be made to work: there is no satisfactory
mechanism for doing what is wanted. Experience shows, on the other hand, that
it is a relatively simple matter to invent grades, scores, levels, or measures to
evaluate the performances of students, figure skaters, divers, and musicians, the
qualities of wines and cheeses, and the intensities of seismic events, and so by
inference to determine the relative merits of competitors in any situation. With
use, the grades take on meaning, so they come to constitute a common language
of evaluation. Experience also shows that what to do with judges' scores (how to resolve them into a single score) is far from evident. Practical people have
devised many different mechanisms.
The first step is to formulate the problem precisely. That is the aim of this
chapter. It presents the basic model, which consists of a common language, a
set of judges, and a set of competitors.
9.1 Inputs
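A language $\Lambda$ is a set of grades (words, levels, or categories) denoted by lowercase letters of the Greek alphabet, $\alpha, \beta, \gamma, \ldots$. It is strictly ordered; specifically, supposing $\alpha, \beta, \gamma \in \Lambda$, (1) any two levels may be compared: $\alpha \ne \beta$ implies either $\alpha \succ \beta$ ($\alpha$ is the higher grade) or $\beta \succ \alpha$ ($\beta$ is the higher grade); and (2) transitivity holds: $\alpha \succ \beta$ and $\beta \succ \gamma$ imply $\alpha \succ \gamma$. Otherwise, there is no restriction: a language may be either finite or a subset of points of an interval of the real line. $\alpha \succeq \beta$ means that either $\alpha$ is a higher grade than $\beta$ or $\alpha = \beta$.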
There is a finite set of m competitors (alternatives, candidates, performances,
competing goods) C = {A, . . . , I, . . . , Z}. Individual competitors are denoted
by uppercase Latin letters.
There is also a finite set of n judges J = {1, . . . , j, . . . , n}. Individual judges
are denoted by lowercase Latin letters, typically i, j, k.
A problem is completely specified by its inputs, or a profile $\Phi = \Phi(C, J)$: it is an $m$ by $n$ matrix of the grades $\Phi(I, j) \in \Lambda$ assigned by each of the judges $j \in J$ to each of the competitors $I \in C$. For example, if $C$ is a collection of wines, $J$ is a jury of five œnologists, and $\Lambda$ is a language of six grades or levels (say, excellent, very good, good, mediocre, poor, bad), then each judge gives to each wine one of the six grades, and the profile is a matrix of grades with five columns and as many rows as there are wines in the collection to be tasted.
9.2 Social Grading Functions
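[Display: an example profile $\Phi$ in which five judges grade three wines A, B, and C in the six-grade language above (excellent, very good, good, mediocre, poor, bad), together with the final grades $F(\Phi)$ assigned by a method of grading $F$: A good, B very good, C mediocre. The matrix layout is not recoverable from the extraction.]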
In Arrow's model the inputs are the judges' rankings of the candidates; there is no language or measure. A ranking function (or what he calls a social welfare function) assigns to any preference-profile one single ranking of society. In terms of wines this would mean that every judge rank-orders all of them, and the ranking function deduces one collective rank-order among them. But the primary aim of the grading model is to classify competitors or alternatives, to give them final grades as students are given final grades. The final grades may be used to rank competitors, but only up to a point, because several competitors appreciated differently by the judges may have the same final grade.
Many different grading methods F may be imagined. When the language is
numerical, say grades range from 0 to 100, the most often encountered example
is an F that assigns to each competitor the average of the grades given her by
the judges. Other possibilities would be to assign each competitor the lowest of
all her grades or the highest of all of them. But F should obey some minimal
requirements to be deemed acceptable. What should they be?
They are directly inspired by the requirements imposed on the traditional
model of social choice theory. There should be no inherent advantage or disadvantage given to any one or more competitors: all should be treated equally. So
if, for example, the three wines A, B, and C were listed in a different order (say, B followed by A, then by C), then F should yield the same answer: B very good, A good, C mediocre. When the rows (or competitors) of a profile are permuted, F should give the identical answer permuted in the same way. Axiom 9.1 states this formally.
Axiom 9.1 $F$ is neutral, $F(\sigma\Phi) = \sigma F(\Phi)$, for any permutation $\sigma$ of the
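anonymity: $f(\ldots, \alpha, \ldots, \beta, \ldots) = f(\ldots, \beta, \ldots, \alpha, \ldots)$;
unanimity: $f(\alpha, \alpha, \ldots, \alpha) = \alpha$;
monotonicity:
$$\alpha_j \preceq \alpha'_j \text{ for all } j \;\Rightarrow\; f(\alpha_1, \ldots, \alpha_j, \ldots, \alpha_n) \preceq f(\alpha'_1, \ldots, \alpha'_j, \ldots, \alpha'_n)$$
and
$$\alpha_j \prec \alpha'_j \text{ for all } j \;\Rightarrow\; f(\alpha_1, \ldots, \alpha_n) \prec f(\alpha'_1, \ldots, \alpha'_n).$$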
When f only satisfies the first of the two monotonicity properties, it will be
said to be weakly monotonic.
A language $\Lambda$ is often parameterized as a bounded interval of the non-negative rational or real numbers. In either case an obvious example of an aggregation function assigns the mean value of its arguments. Other examples are those that assign the geometric, the harmonic, or any other of the well-known means; those that assign the minimum or the maximum value of its arguments; or, more generally, those that assign the $k$th largest of the arguments for $1 \le k \le n$ (called order statistics by probabilists).
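The examples of aggregation functions just listed can be tried out numerically. The sketch below assumes grades are real numbers in a hypothetical interval (0, 20), kept strictly positive so that the geometric and harmonic means are defined, and it requires Python 3.8 or later for `statistics.fmean` and `statistics.geometric_mean`. The checker `passes_checks` is an illustrative name; it samples anonymity, unanimity, and both monotonicity properties at random points, which is evidence, not a proof.

```python
import math
import random
from statistics import fmean, geometric_mean, harmonic_mean


def kth_largest(k):
    """Order statistic: the k-th largest grade (k = 1 is the maximum, k = n the minimum)."""
    return lambda grades: sorted(grades, reverse=True)[k - 1]


CANDIDATES = {
    "mean": fmean,
    "geometric mean": geometric_mean,
    "harmonic mean": harmonic_mean,
    "minimum": min,
    "maximum": max,
    "3rd largest of 5": kth_largest(3),
}


def passes_checks(f, n=5, R=20.0, trials=200, seed=0):
    """Randomized sanity check of anonymity, unanimity, and both monotonicity properties."""
    rng = random.Random(seed)

    def close(a, b):
        return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)

    for _ in range(trials):
        a = [rng.uniform(0.1, R - 1.0) for _ in range(n)]
        shuffled = a[:]
        rng.shuffle(shuffled)
        if not close(f(a), f(shuffled)):            # anonymity: judges are interchangeable
            return False
        if not close(f([a[0]] * n), a[0]):          # unanimity
            return False
        raised_one = a[:]
        raised_one[rng.randrange(n)] += rng.uniform(0.0, 1.0)
        if f(raised_one) < f(a) - 1e-9:             # monotonicity, first (weak) property
            return False
        raised_all = [x + rng.uniform(0.01, 1.0) for x in a]
        if not f(raised_all) > f(a):                # monotonicity, second (strict) property
            return False
    return True


for name, f in CANDIDATES.items():
    print(name, passes_checks(f))
print("first judge decides", passes_checks(lambda grades: grades[0]))
```

A rule that always copies the first judge's grade is unanimous and monotonic but fails anonymity, so the check rejects it; a plain sum fails unanimity.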
Theorem 9.1 A method of grading $F$ is impartial, unanimous, monotonic, and independent of irrelevant alternatives in grading if and only if $F(\Phi)(I) = f(\Phi(I))$ for every $I \in C$, for some one aggregation function $f$.
Proof If there is an aggregation function $f$ that defines $F$ as in the statement of the theorem, then the axioms are obviously met by $F$. On the other hand, suppose $F$ satisfies the axioms. IIAG and neutrality imply that $F$ determines the grade of a competitor $I \in C$ solely on the basis of the grades assigned to $I$; so call the function that does this $f$. The other three axioms (anonymity, unanimity, and monotonicity) immediately establish the corresponding properties of $f$, so it must be an aggregation function.
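Theorem 9.1 says that a method of grading meeting the axioms works row by row through a single aggregation function. A minimal sketch of that structure in Python follows; the median is used only as one example of an aggregation function, and the name `social_grading` and the grades (on a 0-20 scale) are hypothetical.

```python
from statistics import median


def social_grading(profile, f):
    """F(Phi)(I) = f(Phi(I)): apply one aggregation function f to each competitor's row."""
    return [f(row) for row in profile]


# Hypothetical profile: rows are competitors A, B, C; columns are five judges.
profile = [
    [14, 12, 16, 9, 13],   # A
    [17, 15, 11, 18, 16],  # B
    [8, 10, 7, 12, 6],     # C
]
final = social_grading(profile, median)
print(final)  # [13, 16, 8]

# Neutrality: listing the competitors as B, A, C permutes the answer the same way.
sigma = [1, 0, 2]
assert social_grading([profile[i] for i in sigma], median) == [final[i] for i in sigma]

# Anonymity: reversing the order of the judges changes nothing.
assert social_grading([row[::-1] for row in profile], median) == final

# Independence of irrelevant alternatives in grading: adding a competitor D
# leaves the final grades of A, B, and C untouched.
assert social_grading(profile + [[20, 20, 20, 20, 20]], median)[:3] == final
```

Since each final grade is computed from its own row only, neutrality and independence hold by construction; anonymity and the remaining axioms are then properties of the chosen $f$.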
is allowed to give, yet they cannot help but take on meanings of their own.
Were they then to be used in the judges language, a further enrichment of the
final grades would ensue. Why not simply take all the possible numbers as
grades to begin with?
Accordingly, in conformity with most practical applications, the common language is parameterized as a subset of the real numbers, and whatever aggregation is used, small changes in the parameterization or the input grades should naturally imply small changes in the outputs or the final grades. The analysis in the rest of this book could equally well have chosen any open or half-open interval, including $[0, \infty)$. Even if the original language is finite, the possibility of taking an arbitrary aggregation function implies that all possible parameterizations must be considered. Accordingly, the common language will be taken to be $[0, R]$, as Laplace did. Whereas Americans may like $R = 100$, the French $R = 20$, and the Danes $R = 13$, most mathematicians probably prefer $R = 1$. The choice is unimportant. Grades are almost always bounded. The grades used by the International Organization of Vine and Wine (OIV) are an exception: 0 is the best, $\infty$ the worst.
Suppose that $\Lambda$ and $\Lambda'$ are the number grades corresponding to two languages or parameterizations that are $\varepsilon$-close: $r \in \Lambda$ implies there exists an $r' \in \Lambda'$ with $|r - r'| < \varepsilon$, and symmetrically, $r' \in \Lambda'$ implies there exists an $r \in \Lambda$ with $|r - r'| < \varepsilon$. It is then clear that a method of grading $F$ should be defined by an aggregation function $f$ that satisfies $|f(r_1, \ldots, r_n) - f(r'_1, \ldots, r'_n)| < \delta(\varepsilon)$ when $\max_{j \in J} |r_j - r'_j| < \varepsilon$, for some positive function $\delta(\varepsilon)$ that converges to 0 when $\varepsilon$ approaches 0. That is, $f$ should be uniformly continuous; since $[0, R]^n$ is compact, this is the same as requiring $f$ to be continuous.
Axiom 9.6 F (and its aggregation function f ) is continuous.
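A rough numerical illustration of the $\delta(\varepsilon)$ condition: for the mean and for order statistics, the standard fact is that moving every grade by less than $\varepsilon$ moves the output by less than $\varepsilon$, so $\delta(\varepsilon) = \varepsilon$ works. The sketch below only samples this on $[0, R]^n$ with hypothetical parameters ($R = 1$, $n = 5$, $\varepsilon = 0.05$); it does not prove it. A rounded mean, which jumps by 0.25, is included as a discontinuous contrast.

```python
import random
from statistics import fmean


def kth_largest(k):
    """Order statistic: the k-th largest of the grades (k = 1 gives the maximum)."""
    return lambda grades: sorted(grades, reverse=True)[k - 1]


def max_gap_ratio(f, n=5, R=1.0, eps=0.05, trials=1000, seed=1):
    """Largest observed |f(r) - f(r')| / eps over random pairs of grade lists in [0, R]^n
    with max_j |r_j - r'_j| < eps.  A result below 1 is consistent with delta(eps) = eps."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        r = [rng.uniform(0.0, R) for _ in range(n)]
        # Move each grade by strictly less than eps, then clip back into [0, R];
        # clipping can only shrink the move.
        r_close = [min(R, max(0.0, x + 0.999 * rng.uniform(-eps, eps))) for x in r]
        worst = max(worst, abs(f(r) - f(r_close)) / eps)
    return worst


for name, f in [("mean", fmean), ("median of 5", kth_largest(3)), ("maximum", max)]:
    print(name, round(max_gap_ratio(f), 3))

# A rounding rule jumps by 0.25 at some inputs, so no delta(eps) tending to 0 can work.
print("rounded mean", round(max_gap_ratio(lambda g: round(4 * fmean(g)) / 4), 3))
```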
Two lists of grades that are very similar should clearly be assigned final
grades that differ by very little. Enriching a language by embedding it into a
real interval opens the door to vastly more possible methods of grading, but it
will turn out that the aggregation functions that emerge as those that must be used
are directly applicable in the seemingly more restrictive discrete languages as
well. The characterizations in the next chapters sometimes require the language
to be sufficiently rich and the functions to be continuous; but the properties of
the functions that are characterized hold for arbitrary finite languages. Many
theorems do not require all axioms.
A social grading function (SGF) F is a method of grading that satisfies the six
axioms of the basic model.
Thus F defines, and is defined by, a unique continuous aggregation function f .
The number of candidates or of judges may be varied to study certain properties and phenomena.
9.3 Social Ranking Functions
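$$
\Phi = \begin{pmatrix}
\vdots & \vdots & & \vdots & \vdots \\
\alpha_1 & \alpha_2 & \cdots & \alpha_{n-1} & \alpha_n \\
\vdots & \vdots & & \vdots & \vdots \\
\beta_1 & \beta_2 & \cdots & \beta_{n-1} & \beta_n \\
\vdots & \vdots & & \vdots & \vdots
\end{pmatrix}
$$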
Imagine that a competitor A is assigned the list of grades $\alpha = (\alpha_1, \ldots, \alpha_n)$ and a competitor B the list $\beta = (\beta_1, \ldots, \beta_n)$. A method of ranking is a nonsymmetric binary relation $\succeq_S$ that compares any two competitors, A and B, whose grades belong to some profile $\Phi$. By definition, $A \approx_S B$ if $A \succeq_S B$ and $B \succeq_S A$; and $A \succ_S B$ if $A \succeq_S B$ but it is not true that $A \approx_S B$. Because it compares any two competitors, $\succeq_S$ is a complete binary relation.
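The way $\approx_S$ and $\succ_S$ are derived from $\succeq_S$ can be sketched in Python. As a hypothetical choice made only for illustration, $A \succeq_S B$ is taken to mean that the median of A's grades is at least the median of B's; this is one simple method of ranking, not claimed to be the one the book adopts. Because it compares single numbers, it is complete and transitive, and it respects ties, since identical sets of grades have identical medians.

```python
from statistics import median


def weakly_above(a, b, f=median):
    """A >=_S B, here (hypothetically) f(A's grades) >= f(B's grades)."""
    return f(a) >= f(b)


def tied(a, b, f=median):
    """A ~_S B: A >=_S B and B >=_S A."""
    return weakly_above(a, b, f) and weakly_above(b, a, f)


def strictly_above(a, b, f=median):
    """A >_S B: A >=_S B but not A ~_S B."""
    return weakly_above(a, b, f) and not tied(a, b, f)


alpha = [14, 12, 16, 9, 13]   # hypothetical grades of A
beta = [16, 13, 12, 14, 9]    # hypothetical grades of B: the same set, in another order
gamma = [8, 10, 7, 12, 6]     # hypothetical grades of C
print(tied(alpha, beta), strictly_above(alpha, gamma))  # True True
```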
Any reasonable method of ranking should possess certain minimal properties.
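Axiom 9.7 The method of ranking $\succeq_S$ is neutral: $A \succeq_S B$ for the profile $\Phi$ implies $A \succeq_S B$ for the profile obtained from $\Phi$ by any permutation $\sigma$ of the competitors (or rows).
Axiom 9.8 The method of ranking $\succeq_S$ is anonymous: $A \succeq_S B$ for the profile $\Phi$ implies $A \succeq_S B$ for the profile obtained from $\Phi$ by any permutation $\sigma$ of the judges (or columns).¹
Axiom 9.9 The method of ranking $\succeq_S$ is transitive: $A \succeq_S B$ and $B \succeq_S C$ imply $A \succeq_S C$.
1. So, for example, A's grades $(\alpha_1, \alpha_2, \ldots, \alpha_n)$ are permuted to $(\alpha_{\sigma(1)}, \alpha_{\sigma(2)}, \ldots, \alpha_{\sigma(n)})$.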
Axiom 9.9 demands that the Condorcet paradox be avoided. Axiom 9.10 is strong independence of irrelevant alternatives, as defined in chapter 3. It demands that Arrow's paradox be avoided. These are the two important paradoxes that have been observed to occur in real competitive situations.
A method of ranking respects grades if the rank-order between two candidates
A and B depends only on their sets of grades.
Thus, the preference lists induced by the grades must be forgotten: it matters not
which judge gave which grade. In other words, if two judges or voters switch
the grades they give to a candidate, then nothing changes in the jury's or the electorate's ranking of all candidates.
A method of ranking respects ties if, whenever two competitors A and B have an identical set of grades, they are tied, $A \approx_S B$.
Respecting grades together with impartiality implies respecting ties.
Theorem 9.2 A method of ranking is neutral, anonymous, transitive, and inde-
where the grades of A are in the first row and those in the second row are called
those of A . Suppose A S A . Permuting the grades of the two judges 1 and
1 changes nothing by anonymity,
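[Display of the profile $\Phi_1$ obtained by exchanging the grades of the two judges; not recoverable from the extraction.]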
so the first row of 1 ranks at least as high as the second; but by neutrality
A S A, so that A S A . Thus (1 , 2 , . . . , n ) S ( 1 , 2 , . . . , n ) and
the second list agrees with Bs in the first place.
Nothing has been said yet concerning the behavior of the judges or the voters,
their complex and often secret aims, their likes and dislikes. The tradition in the
theory of social choice is to assume that judges and voters have preferences,
invariably expressed as rank-orders. But the word "preference" misleads. A
judge in a court of justice evaluates in conformity with the law, which has
nothing to do with his preferences; a judge may dislike a wine presented in a
competition yet give it a high grade because of its merits, or he may like one
and without qualm give it a low grade because of its demerits; an elector may
cast a vote not in accord with his personal opinions of the candidates but rather
in the hope of making the correct decision by electing the best candidate for the
job (see, e.g., Goodin and Roberts 1975; but Llull, Cusanus, Condorcet, and
all the early thinkers formulated the problem in these terms, as do also some
philosophers today, e.g., Estlund 2008).
Thus whereas the traditional model pretends to aggregate the preferences of judges and voters, in fact it does nothing of the kind. It amalgamates individuals' rankings of the candidates (the input) into society's ranking of the candidates (the output). The possible outputs are rankings, yet the inputs say nothing about how a judge or a voter compares the rankings.
In the real world the deep preferences or utilities of a judge or a voter are
a very complicated function that depends on a host of factors, including the
decision or output, the messages of the other judges (a judge may wish to differ
from the others, or on the contrary resemble the others), the social decision
function that is used (a judge may prefer a decision given by a democratic
function to one rendered by an oligarchic function, or the contrary), and the
message she thinks is the right one (a judge may prefer honest behavior, or not).
We contend that the deep preferences of judges or voters cannot be the inputs
of a practical model of voting. A judge's input is simply a message, no more,
no less. But her input, chosen strategically, depends, of course, on her deep
preferences or utilities.
Amartya Sen's model (1970) and the subsequent work on welfarism
(Blackorby, Donaldson, and Weymark 1984; Bossert and Weymark 2004),
often referred to as "social welfare functionals," postulates real-number utilities
on candidates as the input and a rank-ordering of candidates as the output. The
motivation is the study of social welfare judgments in the context of Arrow's
framework, but with more information in order to avoid the impossibilities. Sen
makes no claim that this approach is valid in the context of voting. As with any
mathematical model, the mathematical symbols may be given very different
interpretations. At a formal level, reinterpreting the symbols of the inputs of
Sen's model as the grades of a language yields a social ranking function. But
this ignores the essential concept of a common language. By contrast, utilities
measured in an absolute scale play no significant role in the social welfare
functional literature, which focuses instead on weaker information invariance
assumptions (although they are often assumed for simplicity, e.g. Blackorby,
Bossert, and Donaldson 2005).
Social welfare functionals are not intended to enable a comparison of
rankings. For, how are two outputs (two rankings of the candidates) to be
compared by a voter or judge on the basis of his utilities for individual candidates? If the answer is "by looking at the first-place candidate of the rankings,"
then all $(m-1)!$ rankings that have the same first-place candidate must be taken
as giving him equal satisfaction. This is too restrictive for a theory (or practice)
that designates winners and orders of finish.
The language of grades has nothing to do with utilities (viewed as measures
of individual satisfaction). Grades are absolute measures of merit. In the context
of voting and judging, utilities are relative measures of satisfaction. The 2002
French presidential elections offer a perfect example of the difference. The
voters of the left would have hated to see Jacques Chirac defeat Lionel Jospin:
their utilities for a Chirac victory would have been the lowest possible. The same
voters were delighted to see Chirac roundly defeat Le Pen in the second round:
their utilities for a Chirac victory were the highest possible. On the other hand,
these same voters would probably have given Chirac a grade of Acceptable or
Poor (on a scale of Excellent, Very Good, Good, Acceptable, Poor, To Reject)
were he standing against Jospin, Le Pen, or anyone else.
Formally, a distinction must be made between two different types of scales
of measurement. An absolute scale measures each entity individually (height,
area, merit). A relative scale measures each entity with respect to a collective of
like entities (velocity, satisfaction). Were voters to be asked their satisfaction as
inputs, adjoining or eliminating candidates would alter their answers, provoking
the possibility of Arrow's paradox. A common language must be an absolute
scale of measurement.
Utility plays another important role in voting and judging. A decision maker
is routinely assumed to behave in such a manner as to try to maximize his utility.
But what is it? In theory the utility of a judge or voter $j$ may be imagined to
be a function $u_j(\alpha^*, \alpha, f, C, \Lambda)$, where $\alpha^* = (\alpha^*_{ij})$, with $\alpha^*_{ij}$ the grade judge
$j$ believes candidate $i$ merits; $\alpha = (\alpha_{ij})$, with $\alpha_{ij}$ the grade judge $j$ actually
gives candidate $i$; $f$ is the aggregation function, $C$ the set of competitors, and
$\Lambda$ the common language that is used. The utility of judge $j$ could include a
term $-|\alpha_{ij} - \alpha^*_{ij}|$ if she wished to grade candidate $i$ honestly; it could include a
term $-\sum_{k \ne j} |\alpha_{ik} - \alpha^*_{ik}|$ if she wished that the other judges graded $i$ honestly;
it could include a term $-|\Lambda - \Lambda_j|$ if she wished that the common language was
$\Lambda_j$ rather than $\Lambda$; and the reader can no doubt invent many other utility functions
that a judge might have, or plausible components of them. One hypothesis is to
imagine that a judge's utility is single-peaked in the grade of each candidate $i$,
$u_j = -\sum_i |\alpha^*_{ij} - f(\alpha_{i1}, \ldots, \alpha_{in})|$: the further the final grade $f(\alpha_{i1}, \ldots, \alpha_{in})$
is from what judge $j$ believes it should be, the less her satisfaction. Another is
that a judge's utility depends solely on the winner, which is usually assumed
in the analysis of voting games. In fact, of course, judges' utilities, judges'
beliefs, their beliefs about the others' beliefs, and their likes and dislikes for the
decision mechanism or the language are all completely unknown and often
hidden, and they change from one competition to another (indeed, perhaps a
voter's utility on a sunny election day differs from that on a rainy election day).
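The single-peaked hypothesis can be made concrete in a few lines. The sketch below is illustrative only: it assumes numeric grades, takes the median as a stand-in for the aggregation function $f$, and the names `single_peaked_utility`, `believed`, and `profile` are hypothetical. The judge's utility is minus the sum, over candidates, of the distance between the grade she believes each candidate merits and the jury's final grade.

```python
from statistics import median
from typing import Callable, Sequence


def single_peaked_utility(
    believed: Sequence[float],
    profile: Sequence[Sequence[float]],
    f: Callable[[Sequence[float]], float] = median,  # assumed aggregation function
) -> float:
    """Utility -sum_i |believed_i - f(grades of candidate i)| of one judge.

    believed[i] is the grade the judge believes candidate i merits;
    profile[i] lists the grades that all the judges actually gave candidate i.
    """
    if len(believed) != len(profile):
        raise ValueError("one believed grade per candidate is required")
    return -sum(abs(b - f(row)) for b, row in zip(believed, profile))


if __name__ == "__main__":
    # Two candidates graded by three judges (made-up numbers); medians are 3 and 2.
    profile = [[4, 3, 2], [1, 2, 5]]
    print(single_peaked_utility([4, 2], profile))  # -1: one grade point off in total
```

Any other aggregation function can be passed as `f`; the utility only depends on how far each final grade lands from the judge's own assessment.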
In the terms of the current technical jargon, we are faced with a problem of mechanism design: "[Individuals'] actual preferences are not publicly
observable. As a result . . . individuals must be relied upon to reveal this
information . . . [The problem is] how this information can be elicited, and the
extent to which the information revelation problem constrains the ways in which
social decisions can respond to individual preferences" (Mas-Colell, Whinston,
and Green 1995, 857). This is often seen as a problem of the theory of games
where the information is incomplete. The standard approach postulates that
every individual is of a certain type and associates to each type a utility function that depends only on the outcome. Typically, the individuals' types are
drawn from a set of types by some known prior probability distribution, and
the utility functions are all of some common analytical form (whose parameters
vary with the different types). The methods are then, of course, dependent on
the utilities that are postulated.
In contrast, the methods we develop make no overall assumptions concerning
utilities. They are similar, in this regard, to Vickrey's second-price mechanism,
which allocates the good up for auction to the highest bidder at a price equal to
the second-highest bid (Vickrey 1961): it depends only on the bidders' bids
(their private values), not on their utilities. Our mechanisms depend only on
what in practice can be known. Knowing the judges' or voters' true utilities is
unnecessary to much of the analysis. The mechanisms that emerge as the only
ones that separately satisfy each of several desirable properties are strategy-proof for large classes of reasonable utility functions, though not all. When
they are not strategy-proof, they are unique in being the least manipulable
methods in several well-defined senses.
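For readers unfamiliar with Vickrey's rule, here is a minimal sketch of a sealed-bid second-price auction as described above. The function name and the alphabetical tie-break are assumptions made only for this illustration.

```python
from typing import Dict, Tuple


def second_price_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Highest bidder wins and pays the second-highest bid.

    Ties are broken alphabetically by bidder name, an arbitrary choice
    made only for this sketch.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # Highest bid first; the bidder's name breaks ties deterministically.
    ranked = sorted(bids.items(), key=lambda item: (-item[1], item[0]))
    winner = ranked[0][0]
    price = ranked[1][1]  # the runner-up's bid sets the price
    return winner, price


if __name__ == "__main__":
    bids = {"ann": 120.0, "bob": 100.0, "cy": 90.0}
    print(second_price_auction(bids))                    # ('ann', 100.0)
    # Raising the winner's own bid does not change what she pays.
    print(second_price_auction({**bids, "ann": 500.0}))  # ('ann', 100.0)
```

Because the winner's payment depends only on the other bids, shading or inflating one's own bid cannot lower the price paid, which is why the rule does not reward dishonest bidding.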
10
Strategy in Grading
It is not true that men can be divided into absolutely honest persons and absolutely
dishonest ones. Our honesty varies with the strain put on it.
George Bernard Shaw
The members of a jury assign grades. A social grading function defines a mechanism for transforming the individual grades of several or many judges into
one final grade of the jury. The issues addressed in this chapter focus on the
question, "What strategies will judges use in the game of assigning grades?" Later
chapters consider other strategic games that judges may play, notably, how they
may act and react to giving grades when these are also used to rank competitors.
Experience clearly establishes the fact that assigning grades is a game,
because the players (the judges) may assign their grades strategically. A judge
may assign a grade that is well above or well below what he believes is the correct grade so that the jury's final grade approaches, as much as possible, what
he believes it should be. Worse, he may assign a grade dishonestly for reasons
that are totally extraneous to the performance or alternative being judged. A
device used in many competitions (e.g., sports and music contests) to counter
such temptations is to eliminate extreme grades, the highest and the lowest or
the two highest and the two lowest. In the 2002 figure skating scandal at the
Olympic games in Salt Lake City, the grade of a judge seems to have been
exaggerated beyond what it should have been for reasons having to do with
nationality rather than performance. As a result, the new International Skating
Union system of grading randomly discards several scores and then eliminates
the highest and lowest. Strategic manipulation in wine competitions is of a
different order because the wines are almost invariably tasted blind: careful
procedures make sure that a judge has no information whatsoever concerning
the identity of the wines he tastes. Nevertheless, the problem remains for, as
was pointed out by an organizer of wine competitions (see chapter 7), the rules
may give one judge the power of forcing a final grade to be well below what
the majority believes it should be.
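The device of eliminating extreme grades can be sketched as a trimmed mean. This is a simplification under stated assumptions: numeric grades, one highest and one lowest grade dropped, made-up data; it does not model the random discarding of scores in the International Skating Union system.

```python
from fractions import Fraction
from typing import Sequence


def plain_mean(grades: Sequence[int]) -> Fraction:
    return Fraction(sum(grades), len(grades))


def trimmed_mean(grades: Sequence[int], drop: int = 1) -> Fraction:
    """Mean after discarding the `drop` highest and `drop` lowest grades."""
    if len(grades) <= 2 * drop:
        raise ValueError("not enough grades left after trimming")
    kept = sorted(grades)[drop:len(grades) - drop]
    return Fraction(sum(kept), len(kept))


if __name__ == "__main__":
    honest = [6, 7, 7, 8, 8]
    sunk = [0, 7, 7, 8, 8]    # one judge drives his 6 down to 0
    nudged = [6, 7, 7, 7, 8]  # one judge shades an 8 down to 7
    print(plain_mean(honest), plain_mean(sunk))      # 36/5 6
    print(trimmed_mean(honest), trimmed_mean(sunk))  # 22/3 22/3
    print(trimmed_mean(nudged))                      # 7
```

Trimming neutralizes the wild exaggeration, but the last line shows that a modest, well-placed change still moves the result, so trimming reduces rather than removes the incentive to manipulate.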
The potential for a judge to manipulate grades in the face of a mechanism that
determines the jury's final grade is, of course, a mirror image of similar behavior
in the traditional model, where a judge has the potential to manipulate his rank-ordering knowing what method of voting will be used, or of the behavior of a
bidder in an auction knowing the rules that determine the winner and the price.
When Borda's method is the rule in the traditional model, a voter may well list
the strongest opponent of her favorite last, though she believes that the opponent
is the second best among all candidates. As was recognized by Laplace, and
indeed by Borda himself, his method does not encourage honest voting. Vickrey
showed that a similar phenomenon occurs in auctions: the usual mechanism
(the highest bidder wins and pays the price he bid) does not encourage honest
bids. Galton realized an analogous property when a jury is to decide on a money
amount or weight: taking the average of their choices as the mechanism would
give a voting power to cranks in proportion to their crankiness.
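Galton's observation is easy to reproduce. The sketch below uses made-up estimates of a money amount and compares the average with a middlemost rule (the median); the data and the helper `with_crank` are assumptions for illustration.

```python
from statistics import mean, median

honest = [100, 105, 110, 115, 120]  # made-up estimates of a money amount


def with_crank(crank_estimate):
    """Replace the last judge's estimate by an exaggerated one."""
    return honest[:-1] + [crank_estimate]


if __name__ == "__main__":
    for crank in (120, 1_000, 10_000):
        estimates = with_crank(crank)
        print(crank, mean(estimates), median(estimates))
    # 120 110 110
    # 1000 286 110
    # 10000 2086 110
```

With five judges the average moves by exactly one-fifth of the crank's exaggeration, so his influence grows with his crankiness, while the median does not move at all.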
A judge assigns a grade to a competitor. A very complex set of wishes,
opinions, expectations, and anticipations (in theory, his utility function, a complicated expression involving many variables) determines the grade he gives.
Note, in particular, that the final grade he wishes a competitor to be awarded,
the final grade he believes the competitor merits, and the grade he gives may
all be different. In many cases it is natural to assume that the further the jury's
grade strays from what a judge wishes the grade to be, the less he likes it.
That is, the preferences of each judge over the grades are single-peaked (as
implicitly assumed by Galton, explicitly by Black and Moulin). In this case
a judge seeks a strategy that will bring the final grade (meaning the jury's
grade) as close as possible to what he wishes it to be. Whereas the assumption
of single-peaked preferences in the traditional model is a very restrictive and
unnatural assumption for most applications (Galton's budget problem stands
out as a notable exception), assuming that a judge has single-peaked preferences over the grades a competitor is to have is, in contrast, a very reasonable
and natural assumption. But it is not necessarily true of all judges. There is
a difference between a judge who either honestly believes or simply wishes
that a competitor receive a final grade close to his personal assessment, and
a judge whose mission is to distort the final grade whatever the value of the
performance of a competitor (the judge may have been bribed, or he may be a
member of a clique having a particular agenda; recall the two blocs of nations
in international figure skating competitions described in chapter 7).
To begin, observe that there is no social grading function F (with aggregation
function f ) that can prevent every one of the judges from raising the final
Experience shows that juries may well include judges who are bribed (recall
the Olympic brouhaha over figure skating). However, juries almost certainly
contain judges who honestly wish grades to be assigned according to merit, and
in certain cases it is perfectly reasonable to assume that all the judges of a jury
share this intent.
Juries in wine competitions when the tasting is blind cannot do otherwise, for
it is in practice impossible for them to identify particular wines. In a competition
to be named the world's best sommelier (or wine master), described in the
October 2004 issue of the Revue du Vin de France, contestants were asked
to identify the country of origin and type of grape of two wines. The first
was a Riesling from New Zealand. Four candidates responded as follows: a
Chardonnay from South Africa, a Sauvignon from South Africa, an Albariño
from Galicia, and an Assyrtico from the Cyclades Islands. The second wine
was a Carmenère from Chile. The same four candidates identified it as a Merlot
from Chile, a Cabernet Franc or Sauvignon from the Loire Valley, a Merlot
from Chile, and a Cabernet Sauvignon/Merlot from Chile. Another piece of
evidence attesting to the difficulty of identifying wines is an ongoing project
devoted to classifying all the world's wines; there are well over 3,000 different
categories. Judges of sports competitions who are obliged to publicly announce
the grades they give are seriously constrained by the opinion of the public and
expert commentators.
Suppose that $r$ is a jury's final grade. A social grading function (SGF) is strategy-proof-in-grading if, when a judge's input grade is $r^+ > r$, any change in his
input can only lead to a lower grade; and if, when a judge's input grade is $r^- < r$,
any change in his input can only lead to a higher grade. (The specification "in-grading" is often dropped when there can be no confusion as to the meaning of
"strategy-proof.")
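The definition can be checked mechanically on a small finite grid of grades. The sketch below is an assumption-laden illustration, not the book's continuous setting: grades are the integers 0 to 4, there are three judges, and "can only lead to a lower grade" is read as "can never push the final grade above $r$" (and symmetrically). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Sequence, Tuple

Profile = Tuple[int, ...]
AggregationFunction = Callable[[Profile], Fraction]


def is_strategy_proof_in_grading(f: AggregationFunction, grades: Sequence[int], n: int) -> bool:
    """Brute-force test of the definition on a finite grid of grades.

    For every profile and every judge j: if r_j is above the final grade r, no
    change of r_j may push the final grade above r; if r_j is below r, no change
    may push it below r.
    """
    for profile in product(grades, repeat=n):
        r = f(profile)
        for j, r_j in enumerate(profile):
            for g in grades:
                new = f(profile[:j] + (g,) + profile[j + 1:])
                if (r_j > r and new > r) or (r_j < r and new < r):
                    return False
    return True


def kth_order(k: int) -> AggregationFunction:
    """The k-th highest grade (k = 1 is the maximum, k = n the minimum)."""
    return lambda profile: Fraction(sorted(profile, reverse=True)[k - 1])


def arithmetic_mean(profile: Profile) -> Fraction:
    return Fraction(sum(profile), len(profile))


if __name__ == "__main__":
    grades, n = range(5), 3
    print([is_strategy_proof_in_grading(kth_order(k), grades, n) for k in (1, 2, 3)])
    print(is_strategy_proof_in_grading(arithmetic_mean, grades, n))
```

Exact `Fraction` arithmetic is used so that comparisons involving the mean are not distorted by floating-point rounding.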
When it is the case that the more a final grade deviates from the grade a judge
wishes it to be, the less he likes it (single-peaked preferences over grades), the
utility function is
$$u_j(r^*, r, f, C, \Lambda) = -|r^*_j - f(r_1, \ldots, r_n)|.$$
Strategy-proofness-in-grading then implies that it is a dominant strategy for a judge to assign the grade he
wishes: that strategy is at least as good as any other, and it is
strictly better than others in some cases. The reverse implication is not true for
some preferences, as will be illustrated in an example below. Moreover, when
the judges have preferences that are single-peaked, but they also lean toward
being honest rather than not, so that the utility function is
$$u_j(r^*, r, f, C, \Lambda) = -|r^*_j - f(r_1, \ldots, r_n)| - \varepsilon\,|r_j - r^*_j|$$
for $\varepsilon > 0$, then strategy-proofness implies that it is a strictly dominant strategy for a judge to
assign the grade he wishes; it is a strictly better strategy in all cases.
The use of a strategy-proof-in-grading SGF permits a judge who seeks to
grade honestly (one whose objective is a final grade as close as possible to the
grade he believes should be assigned) to discard all strategic considerations
and to concentrate on the task of deciding what he believes is the true grade.
Moreover, he has no need to even know what his preference is between two
other grades, when one of them is lower than and the other higher than his true
grade. It is a very desirable property.
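The dominance claim can be illustrated numerically. The sketch assumes three judges, integer grades 0 to 4, and $\varepsilon = 1/100$; the judge's utility is minus the distance between his wished grade and the final grade, minus $\varepsilon$ times the distance between what he reports and what he wishes. `utility`, `best_response`, and `median3` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product


def median3(profile):
    """Middle grade of three, i.e. the 2nd-order function for three judges."""
    return Fraction(sorted(profile)[1])


def mean(profile):
    return Fraction(sum(profile), len(profile))


def utility(wish, report, others, f, eps):
    """-|wish - f(report, others)| - eps * |report - wish| for one judge."""
    final = f((report, *others))
    return -abs(wish - final) - eps * abs(report - wish)


def best_response(wish, others, f, grades, eps=Fraction(1, 100)):
    """Report maximizing the judge's utility (the first one found if tied)."""
    return max(grades, key=lambda g: utility(wish, g, others, f, eps))


if __name__ == "__main__":
    grades = range(5)
    honest_always_best = all(
        best_response(wish, others, median3, grades) == wish
        for others in product(grades, repeat=2)
        for wish in grades
    )
    print(honest_always_best)                      # True
    print(best_response(2, (0, 0), mean, grades))  # 4: exaggerating pays
```

With the median, the honest report is the unique best response against every pair of other grades; with the mean, a judge who wishes 2 while the others give 0 does best by reporting 4.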
10.2 Order Functions
There is a class of SGFs whose functions are easily shown to be strategy-proof-in-grading. We call them the order SGFs (in another context they are known as
the order statistics of the grades given a competitor). Their aggregation functions
$f$ are the first or highest grade, the second or second-highest grade, . . . , the $k$th-highest grade, . . . , the $n$th-highest or worst grade.
The $k$th-highest grade is the $k$th-order function $f^k$.
It is, of course, an aggregation function. To see the truth of the claim of strategy-proofness, suppose the $k$th-order value, or final grade, is $r$. A judge who wishes
the final grade to be higher can only hope to improve it by increasing the grade
he gives, but increasing it changes nothing. Similarly, if he wishes the final grade
functions.
Let $f(r_1, \ldots, r_n) = r$. Unanimity and monotonicity imply that the
value of $r$ must fall between the worst and best grades, $\max_j r_j \ge r \ge \min_j r_j$.
This fact is used repeatedly here and in the chapters to come.
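A minimal sketch of the $k$th-order function $f^k$ and of the bound $\max_j r_j \ge r \ge \min_j r_j$, assuming numeric grades; `kth_order` and the sample profile are hypothetical.

```python
from typing import Sequence


def kth_order(grades: Sequence[float], k: int) -> float:
    """f^k: the k-th highest of the judges' grades (k = 1 is the best grade)."""
    if not 1 <= k <= len(grades):
        raise ValueError("k must lie between 1 and the number of judges")
    return sorted(grades, reverse=True)[k - 1]


if __name__ == "__main__":
    profile = [9, 7, 6, 4, 2]
    r = kth_order(profile, 3)                 # 6, one of the grades actually given
    assert min(profile) <= r <= max(profile)  # the bound stated above
    print(kth_order([9, 10, 6, 4, 2], 3))     # judge 2 raises 7 to 10: still 6
    print(kth_order([9, 3, 6, 4, 2], 3))      # judge 2 lowers 7 to 3: drops to 4
```

The judge whose 7 lies above the final grade cannot push the result up by exaggerating; he can only pull it down, which is the strategy-proofness argument in miniature.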
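Proof  Suppose the judges assigned the grades $r_1 \ge \cdots \ge r_n$. We claim that
$$f(r_1, \ldots, r_n) = r_k \quad \text{for some } k.$$
Indeed, when $f(r_1, \ldots, r_n) = r$,
$$r_j > r \ \text{ implies } \ f(r_1, \ldots, r_{j-1}, r'_j, r_{j+1}, \ldots, r_n) = r \quad \text{for any } r'_j \ge r.$$
This is true for two reasons. First, when $r_j$ is increased to a higher grade $r'_j$,
the value of $f$ cannot increase because $f$ is strategy-proof-in-grading (and it cannot decrease, by monotonicity). Second,
when $r_j$ is decreased to a lower grade $r'_j \ge r$, the value of $f$ can either remain
the same or decrease. But if it decreased, then increasing the grade from that
point would again contradict the strategy-proofness of $f$.
Similarly, and for the same reasons, when $f(r_1, \ldots, r_n) = r$,
$$r_j < r \ \text{ implies } \ f(r_1, \ldots, r_{j-1}, r'_j, r_{j+1}, \ldots, r_n) = r \quad \text{for any } r'_j \le r.$$
Now suppose $r \ne r_k$ for every $k$, so that $r_1 \ge \cdots \ge r_j > r > r_{j+1} \ge \cdots \ge r_n$ for some $j$, and take grades $r^+ > r > r^-$. Moving the judges one at a time and applying the two facts gives
$$f(\underbrace{r^+, \ldots, r^+}_{j}, \underbrace{r, \ldots, r}_{n-j}) = r \quad\text{and}\quad f(\underbrace{r, \ldots, r}_{j}, \underbrace{r^-, \ldots, r^-}_{n-j}) = r.$$
But by monotonicity the value of $f$ on the left is strictly greater than the value
of $f$ on the right, a contradiction, proving $r = r_k$ for some $k \in J$.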
Theorem 10.2  . . . when $r_1 \ge \cdots \ge r_n$ implies $f(s_1, \ldots, s_n) = r_k$.

. . .
$$f(\underbrace{R, \ldots, R}_{k-1}, R, \underbrace{0, \ldots, 0}_{n-k}) = R \quad\text{and}\quad f(\underbrace{R, \ldots, R}_{k-1}, 0, \underbrace{0, \ldots, 0}_{n-k}) = 0.$$
This $k$ is unique. If the language contains only two grades, the proof ends here.
Suppose the language contains at least three grades. Let $R > r_k > 0$ and
$$f(\underbrace{R, \ldots, R}_{k-1}, r_k, \underbrace{0, \ldots, 0}_{n-k}) = r.$$
It will be shown that $r = r_k$. Since $f$'s values are one of its arguments, $r = 0$,
$r_k$, or $R$. If $r = R$, voter $k$, who believes that the final grade should be lower,
can decrease the final grade to the lowest grade (or 0) by lowering his grade to
the lowest grade (or 0), violating strategy-proofness. Similarly, if $r = 0$, voter
$k$ can increase the final grade from the lowest (0) to the highest (or $R$). Thus,
. . . $k$ does not depend on the input $r$ because it is defined uniquely, so this completes
the proof. ∎
As a practical matter it is important to note that even if all the judges who
think that the final grade should be higher manipulate by increasing their input
grades, and all those who think that the final grade should be lower decrease
their input grades, the final grade remains the same with order functions. Consequently, order functions are strategy-proof for groups of judges having the
same interests acting together: they are (in the vocabulary of the literature on
voting) nonmanipulable by any judge or any coalition of judges, or group-strategy-proof. For if a group of judges has given grades above the final grade,
a concerted decision to increase all their grades changes nothing; and similarly
when a group has given grades below the final grade.
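Group-strategy-proofness can be seen on a toy profile: every judge above the final grade jumps to the top of the scale and every judge below drops to the bottom, and the $k$th-order value does not move for any $k$. The 0 to 10 scale, the profile, and the helper names are assumptions for illustration.

```python
from typing import List, Sequence


def kth_order(grades: Sequence[float], k: int) -> float:
    """The k-th highest grade (k = 1 is the maximum)."""
    return sorted(grades, reverse=True)[k - 1]


def coalition_exaggerates(grades: Sequence[float], final: float, top: float, bottom: float) -> List[float]:
    """Judges above the final grade all move to `top`; those below all move to `bottom`."""
    return [top if g > final else bottom if g < final else g for g in grades]


def coalition_leaves_final_unchanged(grades, k, top=10, bottom=0) -> bool:
    final = kth_order(grades, k)
    moved = coalition_exaggerates(grades, final, top, bottom)
    return kth_order(moved, k) == final


if __name__ == "__main__":
    profile = [8, 7, 5, 3, 1]
    print([coalition_leaves_final_unchanged(profile, k) for k in range(1, 6)])
    # [True, True, True, True, True]
```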
As a technical matter, the central result in (Moulin 1980) may be used to prove
theorem 10.1, though the model and spirit are altogether different. Moulin seeks
to single out a candidate; we seek to single out a grade (see chapter 5). He characterizes a wider class of methods. When monotonicity is added and candidates
are interpreted as grades, Moulin's proof provides another characterization of
the order functions, though his proof is considerably more involved.
To see why asking for an SGF that makes honesty a dominant strategy for
every judge is less general than asking for strategy-proofness, consider the
following example. There are three grades, , and two judges, and
the SGF is defined by
f (, ) = ,
f (, ) = ,
f (, ) = ,
f (, ) = ,
Not all judges send messages (vote) according to their beliefs, as has been amply
documented. The problem remains: what social grading functions should be
used to determine the grades of figure skaters, divers, pianists, and others when
judges may manipulate the grades they announce? The ideal is a strategy-proof
SGF, one that encourages every judge, honest or not, to assign the grade he
thinks is the correct one. Regrettably, this ideal is impossible to achieve. So, we
naturally turn to the question, "How can the potential impact of manipulating in
the assignment of grades best be countered?" Since the facts make attaining the
ideal impossible, the demands of the practical world suggest that the ideal be
realized "as near as may be."¹
To manipulate successfully a judge must be able to raise or lower the final
grade by changing the grade he assigns. In some situations a judge can only
change the final grade by increasing his grade, in others only by decreasing his
grade. When grades are sufficiently different, the judge who gives the lowest
grade can always increase the final grade, whereas the judge who gives the
highest grade can always decrease it. For suppose $f(r_1, \ldots, r_n) = r$ and $R > r_1 > \cdots > r_n > 0$. Then,
$$f(r_1, \ldots, r_{n-1}, R) = f(R, r_1, \ldots, r_{n-1}) > f(r_1, r_2, \ldots, r_n) = r,$$
since f is anonymous and monotonic, showing the first claim. The second is
established similarly.
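The argument uses only anonymity and monotonicity, so it applies to the mean and the median alike. The sketch below checks it on one made-up profile with $R = 10$; it is an illustration, not a proof, and `extreme_judges_can_move` is a hypothetical name.

```python
from fractions import Fraction
from statistics import median


def arithmetic_mean(grades):
    return Fraction(sum(grades), len(grades))


def extreme_judges_can_move(f, profile, R):
    """For R > r_1 > ... > r_n > 0 listed high to low: can the lowest judge raise f
    by moving to R, and can the highest judge lower f by moving to 0?"""
    base = f(profile)
    lowest_raises = f(profile[:-1] + [R]) > base
    highest_lowers = f([0] + profile[1:]) < base
    return lowest_raises, highest_lowers


if __name__ == "__main__":
    profile = [8, 6, 5, 3, 1]
    for f in (arithmetic_mean, median):
        print(f.__name__, extreme_judges_can_move(f, profile, R=10))
    # arithmetic_mean (True, True)
    # median (True, True)
```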
Judges who can both lower and raise the final grade have a much greater possibility of manipulating; an outsider seeking to bribe or otherwise influence the
outcome would surely wish to deal with such judges. It is important, therefore, to
identify methods of grading that eliminate as much as possible the existence of
such judges. The order functions clearly have the property that at most one judge
is able to both raise and lower a final grade. Moreover, if two or more judges
happen to assign the final grade, then no judge is able to both raise and lower
the final grade. It happens that they are the only SGFs with this property.
Theorem 10.3  The order functions are the unique SGFs for which, for any $r$,
at most one judge may both increase and decrease a final grade.
1. This was precisely Daniel Webster's point when he addressed the Senate in 1832 on another
problem: "The Constitution, therefore, must be understood, not as enjoining an absolute relative
equality, because that would be demanding an impossibility, but as requiring of Congress to make the
apportionment of Representatives among the several States according to their respective numbers,
as near as may be. That which cannot be done perfectly must be done in a manner as near perfection
as can be" (1832, 107).
Proof  Suppose
$$f(r_1, \ldots, r_n) = r \quad \text{for some } r_1 \ge \cdots \ge r_n,$$
with $\mathbf{r} = (r_1, \ldots, r_n)$.
If judge $j$ is able to decrease the final grade by decreasing his grade $r_j$, then
anonymity and monotonicity together imply that any judge $k$ can do this when
$r_k \ge r_j$. Similarly, if judge $j$ is able to increase the final grade by increasing
his grade, then any judge $k$ can do this when $r_k \le r_j$. Let
$$I^-(\mathbf{r}) = \{j \in J : j \text{ can decrease the final grade}\}$$
and
$$I^+(\mathbf{r}) = \{j \in J : j \text{ can increase the final grade}\}.$$
By hypothesis, $I^-(\mathbf{r}) \cap I^+(\mathbf{r})$ contains at most one judge.
It will be shown that when $j \in I^-(\mathbf{r})$, then $r_j \ge r$, and symmetrically, when
$j \in I^+(\mathbf{r})$, then $r_j \le r$. Thus, $r_j < r$ implies $j \notin I^-(\mathbf{r})$, and $r_j > r$ implies $j \notin I^+(\mathbf{r})$: by definition $f$ is strategy-proof-in-grading. So theorem 10.1 implies
the result.
Suppose $i$ is the judge who gave the lowest grade among those able to
decrease the final grade, that is, $r_i = \min_j \{r_j : j \in I^-(\mathbf{r})\}$. It must be shown that
$r_i \ge r$. By hypothesis,
$$f(r_1, \ldots, r_{i-1}, 0, r_{i+1}, \ldots, r_n) = r^0 < r.$$
If $r_{i-1} > r_i$, let $r'_{i-1} \ge r_i$ be the smallest grade of judge $i-1$ for which
$$f(r_1, \ldots, r'_{i-1}, r_i, \ldots, r_n) = r.$$
Suppose $r'_{i-1} > r_i$. The continuity of $f$ implies that there exists a small $\varepsilon > 0$
so that
$$f(r_1, \ldots, r''_{i-1}, r_i, \ldots, r_n) = r'' > r^0 \quad \text{for } r''_{i-1} = r'_{i-1} - \varepsilon > r_i.$$
Thus for $(r_1, \ldots, r''_{i-1}, r_i, \ldots, r_n)$ judge $i-1$ can
both increase and decrease the final grade, implying that all those giving lower
grades can only increase the final grade. Therefore,
$$f(r_1, \ldots, r''_{i-1}, 0, r_{i+1}, \ldots, r_n) = r'',$$
so that
$$r'' = f(r_1, \ldots, r''_{i-1}, 0, r_{i+1}, \ldots, r_n) \le f(r_1, \ldots, r_{i-1}, 0, r_{i+1}, \ldots, r_n) = r^0,$$
a contradiction.
Corollary
Theorem 10.3 and its corollary are not true when the language is finite. Take
a problem with three grades, , and two judges, and let the SGF be
defined by
f (, ) = ,
f (, ) = ,
f (, ) = ,
f (, ) = .
f is not an order function. Whatever the two grades assigned, no judge can
both increase and decrease the final score. For when the vote is unanimous,
the only possible exception would be for f (, ), but in this case neither judge
can raise the final grade. Otherwise, a judge whose input is can only lower the
final grade; a judge whose input is can only raise it; and when f (, ) =
or f (, ) = , the judge with input can only raise it.
There is another way of looking at the problem of limiting the possibility
of manipulation. Given an aggregation function $f$ and a profile of grades
$\mathbf{r} = (r_1, \ldots, r_n)$, let $\Psi^-_f(\mathbf{r})$ be the number of judges who can decrease the final
grade, $\Psi^+_f(\mathbf{r})$ the number of judges who can increase the final grade, and
$\Psi_f(\mathbf{r}) = \Psi^-_f(\mathbf{r}) + \Psi^+_f(\mathbf{r})$ their sum.
The manipulability of $f$ is $\Psi(f)$, the largest value of $\Psi_f(\mathbf{r})$ over all possible
profiles $\mathbf{r} = (r_1, \ldots, r_n)$,
$$\Psi(f) = \max_{\mathbf{r} = (r_1, \ldots, r_n)} \Psi_f(\mathbf{r}).$$
At worst, every judge can both increase and decrease the final grade, so the manipulability of $f$ can be no more than $2n$, that is, $\Psi(f) \le 2n$. In particular, when
$f$ is taken to be the arithmetic mean of the grades (as with Borda's method)
the manipulability is maximized, $\Psi(f) = 2n$. On the other hand, when $f$ is
the $k$th-order function, $\Psi(f) = n + 1$. When it is impossible to entirely eliminate manipulation, a natural idea is to try to minimize the possibility of
manipulation.
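The manipulability of an aggregation function (the largest number, over all profiles, of judges able to decrease the final grade plus judges able to increase it) can be computed by brute force on a small grid. This is a discretized stand-in for the book's continuous language, assuming integer grades 0 to 4 and three judges; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def kth_order(k):
    """The k-th highest grade (k = 1 is the maximum)."""
    return lambda profile: Fraction(sorted(profile, reverse=True)[k - 1])


def arithmetic_mean(profile):
    return Fraction(sum(profile), len(profile))


def psi(f, profile, grades):
    """Judges able to decrease f plus judges able to increase f, at this profile."""
    base = f(profile)
    count = 0
    for j in range(len(profile)):
        outcomes = [f(profile[:j] + (g,) + profile[j + 1:]) for g in grades]
        count += any(o < base for o in outcomes)  # judge j can decrease
        count += any(o > base for o in outcomes)  # judge j can increase
    return count


def manipulability(f, grades, n):
    """The largest psi over every profile of n grades."""
    return max(psi(f, profile, grades) for profile in product(grades, repeat=n))


if __name__ == "__main__":
    grades, n = range(5), 3
    print(manipulability(arithmetic_mean, grades, n))                    # 6 = 2n
    print([manipulability(kth_order(k), grades, n) for k in (1, 2, 3)])  # [4, 4, 4] = n + 1
```

The mean reaches the worst case $2n$ at any profile of interior grades, while each order function stays at $n + 1$, matching the values stated above.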
Theorem 10.4 The only SGFs that minimize manipulability are the order
functions.
Proof Let f be any aggregation function whose manipulability is at most
n + 1, and let r be any input. As noted, if a judge can reduce (can augment)
the final grade, then so can any judge who assigns a higher (a lower) grade.
Take $I^-(\mathbf{r})$ and $I^+(\mathbf{r})$ as defined in the proof of theorem 10.3, respectively
the sets of judges who can decrease and increase the final grade. If these sets
have more than one judge in common, $|I^-(\mathbf{r}) \cap I^+(\mathbf{r})| \ge 2$, then $\Psi(f) \ge n + 2$
because each judge in the intersection can both increase and decrease the final
grade, and each one of the other judges can at least increase or decrease the final
grade. Therefore, at most one judge can be in both sets, $|I^-(\mathbf{r}) \cap I^+(\mathbf{r})| \le 1$,
and theorem 10.3 implies $f$ must be an order function. ∎
10.4 Implications
The aim of this chapter is to identify methods of grading that are immune
to strategic manipulation of the judges whenever possible; and when not, to limit
strategic manipulation as much as possible. In each case the answer is to choose
the order functions. They reflect the intuition of many practical people who have
felt that the best and worst grades, or the best two and worst two, should be
eliminated to combat manipulation. When a judge seeks to award final grades
that are merited (in some situations, such as tasting wines, the assumption is
reasonable; in others, judges may in fact simply prefer being honest), the order
SGFs are totally immune to manipulation. There exists no SGF that completely
prevents strategic manipulation, but the use of an order function minimizes the
probability that a judge can manipulate. For, a priori, a judge who manipulates
wishes to increase the final grade with probability $1/2$ and wishes to decrease it
with probability $1/2$. And, a priori, any one of the $n$ judges may cheat. Thus, when
the input of grades is $\mathbf{r}$, the probability for a judge to manipulate successfully
when using an SGF with aggregation function $f$ is
$$\frac{1}{2}\,\frac{\Psi^+_f(\mathbf{r})}{n} + \frac{1}{2}\,\frac{\Psi^-_f(\mathbf{r})}{n} = \frac{\Psi_f(\mathbf{r})}{2n}.$$
One would like this probability to be as small as possible in the worst case; this
amounts to finding an $f$ that solves
$$\min_f \max_{\mathbf{r} = (r_1, \ldots, r_n)} \Psi_f(\mathbf{r}) = \min_f \Psi(f),$$
that is, that minimizes the manipulability. Thus the minimum probability is slightly
over one-half, $\frac{n+1}{2n}$, and is achieved only when $f$ is an order function.
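As a worked instance of this bound, take a jury of $n = 7$ judges (a value chosen only for illustration). An order function, with manipulability $n + 1$, and the arithmetic mean, with manipulability $2n$, give worst-case probabilities of successful manipulation of

```latex
\frac{n+1}{2n} = \frac{8}{14} = \frac{4}{7} \approx 0.571
\quad\text{(order function)},
\qquad
\frac{2n}{2n} = 1
\quad\text{(arithmetic mean)}.
```

As $n$ grows, the bound for order functions tends to one-half from above.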
When the judges of a jury are obligated to publicly announce their grades, it
is reasonable to suppose that whatever exaggeration takes place is prudent: the
increase is small when the intent is to raise the grade, the decrease is small when
the intent is to lower the grade. If the first-order function (the maximum of all
grades) is used and all the judges give the same grade, then all n of them can
manipulate to increase the final grade. Similarly, when the nth-order function is
used and all give the same grade, then all n of them can manipulate to decrease
the final grade. However, when all the grades are different, $r_1 > \cdots > r_n$, the
$k$th-order function allows only one judge to manipulate prudently, namely,
the judge with the $k$th-highest grade. In this case, the probability of manipulation
is reduced to $1/n$.
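Prudent manipulation can be modeled as moving one's own grade by a small step $\pm\delta$. The sketch below, with a made-up profile and a hypothetical helper, counts which judges can change the $k$th-order value that way: only the judge holding the $k$th grade when all grades differ, but every judge when all grades coincide and $k = 1$ or $k = n$.

```python
from typing import List, Sequence


def kth_order(grades: Sequence[float], k: int) -> float:
    """The k-th highest grade (k = 1 is the maximum)."""
    return sorted(grades, reverse=True)[k - 1]


def prudent_manipulators(grades: Sequence[float], k: int, delta: float) -> List[int]:
    """Indices of judges who can change f^k by moving their own grade by +/- delta."""
    base = kth_order(grades, k)
    movers = []
    for j, g in enumerate(grades):
        for step in (delta, -delta):
            trial = list(grades)
            trial[j] = g + step
            if kth_order(trial, k) != base:
                movers.append(j)
                break
    return movers


if __name__ == "__main__":
    print(prudent_manipulators([9, 7, 6, 4, 2], k=3, delta=0.5))  # [2]
    print(prudent_manipulators([5, 5, 5, 5], k=1, delta=0.5))     # [0, 1, 2, 3]
```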
The order functions necessarily give answers that are in the original language,
so no enrichment of language is necessary. This is very important for many
applications where using numbers may be unnatural or where the numbers are
confusing to judges of different cultures (recall the profusion of scales used by
different nations in reporting the grades of students). Moreover, the properties
that characterize them in the context of a continuous language obviously hold
in the context of a discrete language.
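To see that an order function needs nothing more than the order of the grades, here is a sketch applied directly to the verbal scale used in chapter 9 (Excellent, Very Good, Good, Acceptable, Poor, To Reject). The jury's grades are made up, and `kth_order_verbal` is a hypothetical name.

```python
from typing import Sequence

# Verbal scale from the chapter 9 example, listed worst to best.
SCALE = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]
RANK = {word: position for position, word in enumerate(SCALE)}


def kth_order_verbal(grades: Sequence[str], k: int) -> str:
    """k-th highest verbal grade: uses only the order of the words, no numbers."""
    return sorted(grades, key=RANK.__getitem__, reverse=True)[k - 1]


if __name__ == "__main__":
    jury = ["Good", "Poor", "Excellent", "Acceptable", "Very Good"]
    print(kth_order_verbal(jury, 3))  # Good
```

The answer is itself a word of the original scale, whereas averaging these grades would first require attaching arbitrary numbers to the words.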
These ideas have an immediate application. They provide a partial justification for the Borda-majority method (see chapter 5), which was actually used
by the International Skating Union (see chapter 7). In the context of this chapter, that method is the SGF determined by the order function that selects
the middlemost grade (the $(k+1)$th-highest grade when there are $2k + 1$ judges), and judges or voters are
forced to grade every competitor differently with a number that depends on
the number of competitors. The middlemost grade is, of course, particularly
appealing because a majority of the judges agree that at least that grade should
be given, and at the same time a majority agree that at most that grade should
be given. It is given special attention in the chapters that follow.
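The majority property of the middlemost grade can be checked directly. The sketch assumes an odd jury of $2k + 1$ judges and numeric grades; the profile is made up and the function names are hypothetical.

```python
from typing import Sequence


def middlemost(grades: Sequence[float]) -> float:
    """The (k+1)-th highest of 2k + 1 grades."""
    if len(grades) % 2 == 0:
        raise ValueError("this sketch assumes an odd number of judges, 2k + 1")
    return sorted(grades, reverse=True)[len(grades) // 2]


def majority_agrees(grades: Sequence[float], g: float) -> bool:
    """True if a strict majority gave at least g and a strict majority gave at most g."""
    n = len(grades)
    at_least = sum(x >= g for x in grades)
    at_most = sum(x <= g for x in grades)
    return 2 * at_least > n and 2 * at_most > n


if __name__ == "__main__":
    jury = [3, 9, 4, 8, 1, 6, 5]
    m = middlemost(jury)
    print(m, majority_agrees(jury, m))  # 5 True
    print(majority_agrees(jury, 6))     # False: only three judges gave 6 or more
```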
11
Meaningfulness
The languages used to grade students vary from nation to nation, the ranges of
numbers used to evaluate flautists, pianists, and wines change from competition
to competition, and the highest and lowest scores earned by Olympic competitors differ from one to another athletic discipline. When numbers are used, the
differing languages are not related by a simple change in scale. For example,
a school grade of 10 on a scale of [0, 20] in France has an entirely different
meaning than a 50 on a scale of [0, 100] in the United States (indeed, it may be
more accurate to say that the scale in the United States is [50, 100], as Parker
the wine critic suggests). When the language is defined by letters or descriptive
phrases, any association with numbers is arbitrary. Social grading functions
transform the measures assigned by each member of the jury into a single measure of the jury. Is this single measure meaningful? Why should a particular
SGF be accepted as faithfully representing the will of the jury? Several different
qualitative properties are advanced that address this question. Each uniquely
characterizes the order functions among all possible aggregation functions.
11.1 Reinforcement and Conformity
Suppose that judges seek to assign the true grade, and imagine that after the
members of a jury have assigned their grades, some judge wishes to revise her
grade by assigning a grade closer to the final grade of the jury. Surely then
the jury's revised grade should be the same because there is a greater consensus
for that grade.
A social grading function (SGF) with aggregation function $f$ is reinforcing
when $f(r_1, \ldots, r_n) = r$ and
$$r \ge r'_k > r_k \quad\text{or}\quad r_k > r'_k \ge r$$
implies
$$f(r_1, \ldots, r_{k-1}, r'_k, r_{k+1}, \ldots, r_n) = r.$$
It is immediately obvious that the order functions are reinforcing.
Theorem 11.1
Proof
implies $f(r_1, \dots, r_n) \in S$.
Theorem 11.2
Proof That f conforms with the grades implies that for all $r_1 > r_2 > \cdots > r_n$,
$f(r_1, \dots, r_n) = r_k$ for some $k$. By the continuity of f, k must be the same for
all profiles, as was seen in the proof of theorem 10.1, so f must be the kth-order
function.
11.2 Language-Consistency
The particular language used in grading should make no difference to the ultimate outcomes. An aggregation function should give equivalent grades when
one language is faithfully translated into another. This is the meaningfulness
problem of measurement theory in the context of a jury decision (Krantz et al.
1971; Pfanzagl 1971; Roberts 1979).
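An SGF with aggregation function f is language-consistent if
$$f(\varphi(r_1), \dots, \varphi(r_n)) = \varphi(f(r_1, \dots, r_n))$$
for all increasing, continuous functions
$$\varphi : [0, R] \to [\underline{R}, \overline{R}], \qquad \varphi(0) = \underline{R}, \qquad \varphi(R) = \overline{R}.$$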
The fact that the order functions are language-consistent is immediate: the kth
largest value remains the kth largest value under increasing, continuous transformations. A particularly simple proof is given here of the reverse implication,
although the result is well known. One of the main points we would like to
emphasize in developing this theory is the ease of its ideas, its theorems and
their proofs. For simplicity, we systematically take a slightly less demanding definition of language-consistency, namely, we assume $\underline{R} = 0$ and $\overline{R} = R$:
the results are the same because the order functions clearly satisfy the more
demanding definition.
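The "immediate" half of the argument can be checked numerically. The sketch below is illustrative only: the square-root rescaling `phi` and the five grades are hypothetical choices, and `order_function(k, grades)` is simply the kth largest grade. Every order function commutes with the change of scale, while the arithmetic mean does not.

```python
import math
from statistics import mean


def order_function(k, grades):
    """k-th order function f^k: the k-th largest grade (k = 1 is the highest)."""
    return sorted(grades, reverse=True)[k - 1]


def phi(x, R=100.0):
    """Hypothetical increasing, continuous change of scale with phi(0) = 0, phi(R) = R."""
    return R * math.sqrt(x / R)


grades = [81.0, 64.0, 49.0, 36.0, 4.0]

# Every order function commutes with phi: f^k(phi(r_1), ..., phi(r_n)) = phi(f^k(r_1, ..., r_n)).
for k in range(1, len(grades) + 1):
    assert math.isclose(order_function(k, [phi(r) for r in grades]),
                        phi(order_function(k, grades)))

# The arithmetic mean does not: translating then averaging differs from averaging then translating.
print(mean(phi(r) for r in grades), phi(mean(grades)))
```

The two printed values are roughly 64 and 68.4, so a "faithful translation" of the language would change a mean-based final grade.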
Theorem 11.3 The unique SGFs that are language-consistent are the order functions.
Proof Suppose f is language-consistent and $R > r_1 > \cdots > r_n > 0$. The
property is used to construct a sequence of continuous, increasing functions
that converge to the step function defined by
$$\varphi(x) = \begin{cases} R & \text{if } x = R,\\ r_1 & \text{if } R > x \ge r_1,\\ \quad\vdots & \\ r_n & \text{if } r_{n-1} > x \ge r_n,\\ 0 & \text{if } r_n > x \ge 0. \end{cases}$$
[Figure 11.1: The function $\varphi_t$. Horizontal axis: initial scale, marked $0, r_4, r_3, r_2, r_1, R$; vertical axis: new scale, marked the same way.]
Define the function $\varphi_t$ on each interval $[r_{i+1}, r_i)$ as follows (see figure 11.1):
$$\varphi_t(x) = r_{i+1} + \lambda^t (r_i - r_{i+1}) \quad\text{when}\quad x = r_{i+1} + \lambda (r_i - r_{i+1}) \text{ for } 0 \le \lambda < 1,$$
where $r_0 = R$ and $r_{n+1} = 0$, and $\varphi_t(R) = R$. Note that $\varphi_t$ is strictly increasing and continuous
for every $t > 0$, and $\varphi_t(r_i) = r_i$. By hypothesis
$$f(r_1, \dots, r_n) = f(\varphi_t(r_1), \dots, \varphi_t(r_n)) = \varphi_t(f(r_1, \dots, r_n)),$$
whereas in the limit, as $t \to \infty$,
$$\lim_{t \to \infty} \varphi_t(f(r_1, \dots, r_n)) \in \{r_1, \dots, r_n\}.$$
Since $f(r_1, \dots, r_n) = \varphi_t(f(r_1, \dots, r_n))$ for every $t$, $f(r_1, \dots, r_n)$ must be one of the values $r_i$. But this means that f
conforms with the assigned grades, so by theorem 11.2 it must be an
order function.
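The construction of $\varphi_t$ can be sketched directly. In this illustration the breakpoints (R = 100 and grades 80 > 50 > 20) are hypothetical, and `phi_t` is a name introduced here. For t = 1 the map is the identity; as t grows, each point of $[r_{i+1}, r_i)$ is pushed down toward $r_{i+1}$, which is the step function of the proof.

```python
import math


def phi_t(x, breakpoints, t):
    """phi_t from the proof of theorem 11.3 (a sketch).

    breakpoints = [R, r_1, ..., r_n, 0], strictly decreasing. On [r_{i+1}, r_i),
    x = r_{i+1} + lam * (r_i - r_{i+1}) is sent to r_{i+1} + lam**t * (r_i - r_{i+1}).
    """
    for hi, lo in zip(breakpoints, breakpoints[1:]):
        if lo <= x < hi:
            lam = (x - lo) / (hi - lo)
            return lo + lam ** t * (hi - lo)
    return x  # x == R is left fixed


bp = [100.0, 80.0, 50.0, 20.0, 0.0]  # hypothetical R = 100 and grades 80 > 50 > 20
for t in (1, 5, 50):
    print(t, [round(phi_t(x, bp, t), 3) for x in (10.0, 65.0, 90.0)])
```

As t increases, the three printed images approach 0, 50 and 80, the values of the limiting step function, while the grades 80, 50 and 20 themselves never move.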
11.3 Order-Consistency
When do two different aggregation functions induce the same order on the final
grades? The answer is simple: when they are the same.
Theorem 11.4 Two aggregation functions f and g that define the same order
on the final grades are identical.
Proof To say that f and g impose the same order means that $f(r_1, \dots, r_n) \ge f(s_1, \dots, s_n)$ for two sets of grades if and only if $g(r_1, \dots, r_n) \ge g(s_1, \dots, s_n)$.
This implies the same statement with the inequality ($\ge$) replaced by
equality ($=$). Suppose $f(r_1, \dots, r_n) = r$; by unanimity, $f(r, \dots, r) = r$ as well. Then, since
f and g define the same order, $f(r_1, \dots, r_n) = f(r, \dots, r)$ if and only if
$g(r_1, \dots, r_n) = g(r, \dots, r)$. But $g(r, \dots, r) = r$, so $f(r_1, \dots, r_n) = r$ if and
only if $g(r_1, \dots, r_n) = r$, that is, f = g.
$\varphi(0) = \underline{R}$, $\varphi(R) = \overline{R}$.
It is obvious that the order functions are order-consistent; again, the property
characterizes them.
Theorem 11.5a The unique SGFs that are order-consistent are the order
functions.
Proof This is really a corollary, but its importance merits calling it a
theorem. It suffices to observe, first, that language-consistency implies order-consistency. Second, if $f(r_1, \dots, r_n) = r$, then $f(r_1, \dots, r_n) = f(r, \dots, r)$,
implying, when f is order-consistent, that
$$f(\varphi(r_1), \dots, \varphi(r_n)) = f(\varphi(r), \dots, \varphi(r)) = \varphi(r) = \varphi(f(r_1, \dots, r_n)),$$
so f is language-consistent, and theorem 11.3 applies.
The idea of measuring performances or alternatives to enable them to be compared depends crucially on the judges using a common language in assigning
grades. When each judge uses his own language, the only meaningful information concerning a judge's evaluation is the order of his preferences. What
happens when there is no common language? For an SGF to be meaningful in
this case, comparisons between two aggregations should be preserved under
different increasing, continuous transformations of each of the judges' grades.
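An SGF with aggregation function f is preference-consistent if
$$f(r_1, \dots, r_n) \ge f(s_1, \dots, s_n)$$
implies
$$f(\varphi_1(r_1), \dots, \varphi_n(r_n)) \ge f(\varphi_1(s_1), \dots, \varphi_n(s_n))$$
for all increasing, continuous functions
$$\varphi_j : [0, R] \to [\underline{R}, \overline{R}], \qquad \varphi_j(0) = \underline{R}, \qquad \varphi_j(R) = \overline{R}.$$
Lemma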
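$$\varphi_i(x) = \begin{cases} 0 & \text{if } x = 0,\\ s_i & \text{if } x = s_i,\\ s_k + \varepsilon & \text{if } x = s_i + \varepsilon,\\ R & \text{if } x = R. \end{cases}$$
Preference-consistency then implies
$$f(s_1, \dots, s_{i-1}, \varphi_i(s_i), s_{i+1}, \dots, s_n) = f(s_1, \dots, s_{i-1}, \varphi_i(s_i + \varepsilon), s_{i+1}, \dots, s_n),$$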
1. Language-, order- and preference-consistency are known under different names in the literature
on measurement theory; also, theorem 11.4 and its corollaries are known in one guise or another
(see, e.g., Orlov 1981). However, the context and spirit of the approach are completely different,
and our proofs seem simpler. The theorems we give are in fact true with less demanding hypotheses.
Thus all methods based on the traditional model are meaningless. Hence,
the only meaningful methods of election must be based on a new
paradigm.
In practice, this implies that the languages of grades used in the majority judgment should be designed to carry, as far as possible, absolute meanings for
each judge and common meanings among all judges.
12
Majority-Grade
The object of . . . an election is to select, if possible, some candidate who shall, in the
opinion of a majority of the electors, be most fit for the post . . . [The] fundamental
condition . . . is that the method adopted must not be capable of bringing about a result
which is contrary to the wishes of the majority.
E. J. Nanson
The previous chapters have presented mounting evidence that argues for a jury to
arrive at a final grade by using one of the order functions. A completely different
set of arguments will single out one function that happens to be an order function.
Sir Francis Galton pointed in the right direction a century ago:
I wish to point out that the estimate to which least objection can be raised is the
middlemost estimate, the number of votes that it is too high being exactly balanced
by the number of votes that it is too low. Every other estimate is condemned by a majority of voters as being either too high or too low, the middlemost alone escaping this
condemnation. The number of voters may be odd or even. If odd, there is one middlemost value . . . If the number of voters be even, there are two middlemost values, the
mean of which must be taken. (Galton 1907a, 414)
12.1 Middlemost Aggregation Functions
There seems to be no particularly good argument for advancing the mean of the
middlemost values in the even case (Galton gave none) rather than any other
value in the interval. Given that $r_1 \ge \cdots \ge r_n$, the middlemost grade(s) and
interval are $r_{(n+1)/2}$ when n is odd and $[r_{(n+2)/2}, r_{n/2}]$ when n is even. Recall
that the kth-order function is $f^k$.
A social grading function (SGF) is middlemost if it is defined by a middlemost
aggregation function f, where for $r_1 \ge \cdots \ge r_n$,
$$f(r_1, \dots, r_n) = r_{(n+1)/2} \quad\text{when } n \text{ is odd},$$
and
$$r_{n/2} \ge f(r_1, \dots, r_n) \ge r_{(n+2)/2} \quad\text{when } n \text{ is even}.$$
When n is odd, there is exactly one such function, $f^{(n+1)/2}$. When n is even,
there are infinitely many; in particular, $f^{n/2}$ is the upper middlemost and
$f^{(n+2)/2}$ is the lower middlemost.
An aggregation function f depends only on the middlemost interval if
f (r1 , . . . , rn ) = f (s1 , . . . , sn ) when the middlemost intervals of the grades
r = (r1 , . . . , rn ) and the grades s = (s1 , . . . , sn ) are the same.
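A small sketch makes the definitions concrete. It sorts the grades in decreasing order and returns the lower and upper middlemost grades; the one-parameter family `middlemost(grades, theta)` is a hypothetical illustration of how infinitely many middlemost functions arise when n is even (it is only one simple family, not all of them), with `theta = 0.5` giving Galton's mean of the two middle values.

```python
def middlemost_interval(grades):
    """(lower, upper) middlemost grades; they coincide when n is odd.

    With r_1 >= ... >= r_n, lower = r_{(n+2)/2} and upper = r_{n/2} for even n,
    both equal r_{(n+1)/2} for odd n (0-based indices n // 2 and (n - 1) // 2).
    """
    r = sorted(grades, reverse=True)
    n = len(r)
    return r[n // 2], r[(n - 1) // 2]


def middlemost(grades, theta):
    """Hypothetical family of middlemost functions: theta = 0 is f^{(n+2)/2},
    theta = 1 is f^{n/2}, theta = 0.5 is Galton's mean of the two middle values."""
    low, high = middlemost_interval(grades)
    return low + theta * (high - low)


print(middlemost_interval([90, 85, 78, 73, 73, 70, 69]))      # (73, 73)
print([middlemost([9, 7, 5, 2], th) for th in (0, 0.5, 1)])  # [5, 6.0, 7]
```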
Four separate and quite different justifications for choosing the middlemost
aggregation functions are given, and then it is shown why there is a single best
choice, the majority-grade, whatever the parity of n.
12.2 Majority Decision
It is at once evident that if a majority of the judges assign a grade r, then the
middlemost order functions assign a final grade of r. This is a very important property. If an absolute majority of the judges agree on a grade, then that
should surely be the final grade. The objection to the Sydney Wine Competition's system raised by its director (see chapter 7) focused on precisely this
point: a majority of the jury could favor bestowing a gold medal, yet the grade
assigned by one judge could deny it. The fact is that the middlemost are the
only aggregation functions that satisfy this criterion.
Theorem 12.1 The unique aggregation functions that assign a final grade of
r when a majority of judges assign r are the middlemost.
Proof The proof only invokes weak monotonicity and anonymity. Suppose
f is an aggregation function that assigns r when a majority assigns r, where
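$r_1 \ge \cdots \ge r_n$. If n is odd, monotonicity and the majority property give
$$r_{(n+1)/2} = f(\underbrace{r_1, \dots, r_1}_{(n-1)/2}, \underbrace{r_{(n+1)/2}, \dots, r_{(n+1)/2}}_{(n+1)/2}) \ge f(r_1, \dots, r_n) \ge f(\underbrace{r_{(n+1)/2}, \dots, r_{(n+1)/2}}_{(n+1)/2}, \underbrace{r_n, \dots, r_n}_{(n-1)/2}) = r_{(n+1)/2}.$$
If n is even,
$$r_{n/2} = f(\underbrace{r_1, \dots, r_1}_{(n-2)/2}, \underbrace{r_{n/2}, \dots, r_{n/2}}_{(n+2)/2}) \ge f(r_1, \dots, r_n) \ge f(\underbrace{r_{(n+2)/2}, \dots, r_{(n+2)/2}}_{(n+2)/2}, \underbrace{r_n, \dots, r_n}_{(n-2)/2}) = r_{(n+2)/2}.$$
In both cases f is middlemost.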
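The easy direction of theorem 12.1 (a strict majority for r forces every middlemost function to return r) can be spot-checked by brute force. This sketch assumes small integer grades from 0 to 5 and random juries of up to nine judges; the five-judge "Sydney-like" profile is hypothetical and shows how the mean can miss the majority's grade.

```python
import random
from statistics import mean


def middlemost_interval(grades):
    r = sorted(grades, reverse=True)
    n = len(r)
    return r[n // 2], r[(n - 1) // 2]  # (lower, upper) middlemost grades


def strict_majority_grade(grades):
    """The grade given by more than half of the judges, or None if there is none."""
    for g in set(grades):
        if 2 * grades.count(g) > len(grades):
            return g
    return None


random.seed(0)
for _ in range(10_000):
    grades = [random.randint(0, 5) for _ in range(random.randint(1, 9))]
    r = strict_majority_grade(grades)
    if r is not None:
        # The whole middlemost interval collapses to r, so every middlemost function returns r.
        assert middlemost_interval(grades) == (r, r)

sydney_like = [90, 90, 90, 60, 40]  # hypothetical: three of five judges give a gold-medal 90
print(middlemost_interval(sydney_like), mean(sydney_like))  # the mean, 74, falls short of 90
```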
function is sought that minimizes the probability that a judge may be found
who can effectively raise or lower the grade in the worst case.
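The probability of cheating Ch(f) with an aggregation function f is
$$Ch(f) = \max_{r = (r_1, \dots, r_n)} \; \max_{0 \le \lambda \le 1} \frac{\lambda\, \varphi^+_f(r) + (1 - \lambda)\, \varphi^-_f(r)}{n},$$
where $\varphi^+_f(r)$ and $\varphi^-_f(r)$ are the numbers of judges who can, respectively, raise and lower the final grade $f(r)$ by changing their own grades.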
Theorem 12.2 The unique aggregation functions that minimize the probability
of cheating are the middlemost that depend only on the middlemost interval.
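Proof Suppose, first, that n is odd. To see that $f^{(n+1)/2}$ minimizes Ch, observe
that for any aggregation function f,
$$\max_r \max_{0 \le \lambda \le 1} \frac{\lambda\, \varphi^+_f(r) + (1 - \lambda)\, \varphi^-_f(r)}{n} \ge \max_r \frac{\tfrac{1}{2}\varphi^+_f(r) + \tfrac{1}{2}\varphi^-_f(r)}{n} \ge \frac{n+1}{2n},$$
so f must be an order function. But $Ch(f^k) = \max\{\frac{k}{n}, \frac{n-k+1}{n}\}$, so $k = \frac{n+1}{2}$.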
Now suppose n is even, and f is any aggregation function for which $Ch(f) \le \frac{n+2}{2n}$, so that
$$\max_r \max\{\varphi^+_f(r), \varphi^-_f(r)\} \le \frac{n+2}{2}.$$
Take $r_1 > \cdots > r_n$, let $f(r_1, \dots, r_n) = r$, and suppose that some judge j with
$r_j < r_{(n+2)/2}$ can change the final grade by increasing his grade not beyond
$r_{(n+2)/2}$: then a fortiori he can somewhere both increase and decrease the final
grade. But that implies that every judge k with $r_k \ge r_j$ can decrease it as well, so
that at least $\frac{n+2}{2} + 1 = \frac{n+4}{2}$ judges can decrease the final grade, a contradiction.
Therefore no judge j with $r_j < r_{(n+2)/2}$ can change the final grade by increasing
his grade to $r_{(n+2)/2}$. Similarly, no judge j with $r_j > r_{n/2}$ can change the final
grade by decreasing his grade to $r_{n/2}$. Therefore,
$$f(r_1, \dots, r_n) = f(\underbrace{r_{n/2}, \dots, r_{n/2}}_{n/2}, \underbrace{r_{(n+2)/2}, \dots, r_{(n+2)/2}}_{n/2}).$$
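The count behind $Ch(f^k) = \max\{k, n-k+1\}/n$ can be reproduced by brute force on one strict profile. This is a sketch under stated assumptions: grades are integers in [0, 100], each judge in turn tries every integer grade while the others keep theirs, and the five-judge profile is hypothetical. It evaluates the inner expression of Ch at that single profile, using the fact that the maximum over $\lambda$ of $\lambda a + (1-\lambda) b$ is $\max(a, b)$.

```python
from fractions import Fraction


def order_function(k, grades):
    return sorted(grades, reverse=True)[k - 1]  # f^k: k-th largest grade


def cheat_counts(k, grades, R=100):
    """Numbers of judges who can raise / lower f^k by changing only their own grade.

    Assumption of this sketch: grades are integers in [0, R], and every integer is tried.
    """
    base = order_function(k, grades)
    can_raise = can_lower = 0
    for j in range(len(grades)):
        outcomes = {order_function(k, grades[:j] + [x] + grades[j + 1:]) for x in range(R + 1)}
        can_raise += any(o > base for o in outcomes)
        can_lower += any(o < base for o in outcomes)
    return can_raise, can_lower


def cheating_at(k, grades, R=100):
    # max over 0 <= lambda <= 1 of (lambda * up + (1 - lambda) * down) / n equals max(up, down) / n
    up, down = cheat_counts(k, grades, R)
    return Fraction(max(up, down), len(grades))


profile = [90, 70, 50, 30, 10]  # hypothetical strict profile, n = 5
print([str(cheating_at(k, profile)) for k in range(1, 6)])  # ['1', '4/5', '3/5', '4/5', '1']
```

Only the median, $f^3$, reaches the lower bound $(n+1)/(2n) = 3/5$.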
Classical utilitarianism calls for the greatest good for the greatest number. In
that view of the world, the utility of individuals may be measured, and society's
utility is merely the sum of the utilities of its members. Can that message or
approach be interpreted in the context of grading?
Focus on a single judge. Suppose there exists a distance function d that
measures the judge's discontent: when the judge assigns the grade r and the
final grade is s, his disutility is d(r, s). Thus d enjoys the following properties: d(r, r) = 0, d(r, s) = d(s, r), and r < s < t implies d(r, s) + d(s, t) =
d(r, t). The last equation simply says that the disutility of a judge who believes
the grade should be r when the final grade is t equals his disutility when the final
grade is s plus his disutility when he believes it should be s when the final grade
is t. This accommodates the possibility that, for example, on a scale of 0 to 100,
d(98, 99) = 5d(75, 76). Another interpretation focuses on the performances
of the competitors: the improvement in going from a grade of r to s plus the
improvement of going from s to t equals that of going from r to t. It is a simple
matter to define a continuous, strictly increasing function $\varphi : [0, R] \to \mathbb{R}$ such
that $d(r, s) = |\varphi(r) - \varphi(s)|$, namely,
$$\varphi(s) = \begin{cases} -d(\tfrac{R}{2}, s) & \text{if } s \le \tfrac{R}{2},\\ d(\tfrac{R}{2}, s) & \text{if } s \ge \tfrac{R}{2}. \end{cases}$$
If one believes intuitively that decreasing low grades to 0 is not that costly, then it
suffices to define $\varphi$ by $\varphi(s) = d(0, s)$. The definition permits situations where
going from a low grade to 0 is worth $-\infty$ and from a high grade to R is
worth $+\infty$. But that would require an unbounded language, which has not been
considered (and is not a practical idea). The function $|\varphi(r) - \varphi(s)|$ (think of r
as the final grade and s as a judge's input) is single-peaked. On the other hand,
other single-peaked functions are not distance functions, for example, $(r - s)^2$,
and so are not justified.
Assume that the distance is the same for all the judges.
An SGF with aggregation function f maximizes the social welfare when the
final grade $f(r_1, \dots, r_n) = r$ minimizes the total disutility of all the judges,
$\sum_{j \in J} d(r, r_j)$.
Theorem 12.3 The unique aggregation functions that maximize the social welfare are the middlemost.
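A brute-force sketch illustrates why the welfare maximizers are middlemost. The increasing function `cube` stands in for $\varphi$ (a hypothetical choice), final grades are restricted to integers in [0, 100], and the juries are invented. Minimizing $\sum_j |\varphi(s) - \varphi(r_j)|$ picks the middle grade for an odd jury and the whole middlemost interval for an even one, whereas the squared loss $(r - s)^2$ that the text rejects lands near the mean instead.

```python
import math


def total_disutility(s, grades, phi):
    """Sum over judges of d(s, r_j) = |phi(s) - phi(r_j)|."""
    return sum(abs(phi(s) - phi(r)) for r in grades)


def welfare_maximizers(grades, phi, R=100):
    """All integer final grades in [0, R] that minimize the total disutility (brute force)."""
    cost = {s: total_disutility(s, grades, phi) for s in range(R + 1)}
    best = min(cost.values())
    return [s for s, c in cost.items() if math.isclose(c, best)]


def cube(x):
    return x ** 3  # hypothetical strictly increasing phi


odd, even = [95, 80, 62, 40, 10], [95, 80, 62, 40]
print(welfare_maximizers(odd, cube))   # [62]
print(welfare_maximizers(even, cube))  # every integer from 62 to 80

# By contrast, the squared loss (r - s)**2 is minimized near the mean (57.4), not at 62.
squared = {s: sum((s - r) ** 2 for r in odd) for s in range(101)}
print(min(squared, key=squared.get))   # 57
```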
judges and their grades, so the two f's are different because their arguments are
different numbers of grades. Iterating,
$$f(r_1, r_2, \dots, r_{n-1}, r_n) = f(r^+, r^-),$$
where $r^+ = r^- = r_{(n+1)/2}$ when n is odd, and $r^+ = r_{n/2}$, $r^- = r_{(n+2)/2}$ when n is even,
Theorem 12.4
12.6 Majority-Grade
The middlemost aggregation functions are repeatedly the only ones that meet
the evident desiderata. But in the case of an even number of judges, they are
infinite in number and thus can yield different results. What should be the unique
choice?
An SGF respects consensus when, whenever all of A's grades belong to the middlemost
interval of B's grades, A's final grade is not below B's
final grade.
The rationale is evident: when a jury is more united on the grade of one alternative than on that of another, the stronger consensus must be respected by the
award of a final grade no lower than the other's; or, to take Galton's perspective, crankiness must not be respected. When there are only two judges, this is
closely related to Hammond's equity principle (Hammond 1976).
The majority-grade $f^{maj}$ is the lower middlemost order function,
$$f^{maj} = \begin{cases} f^{(n+1)/2} & \text{when } n \text{ is odd},\\ f^{(n+2)/2} & \text{when } n \text{ is even}. \end{cases}$$
Proof
If n is even,
$$f(r_1, \dots, r_n) \le f(r_{(n+2)/2}, \dots, r_{(n+2)/2}) = r_{(n+2)/2} = f^{maj}(r_1, \dots, r_n).$$
Assume now that $f \le f^{maj}$, and suppose all the grades of A are in the middlemost interval of B's grades. Then, since $f \le f^{maj}$, the final grade of B
according to f is at most the majority-grade of B. But since f is unanimous
and monotonic, the majority-grade of B is at most the final grade of A according
to f. This shows that f gives A a grade at least equal to the one it gives B,
so f respects consensus.
The only middlemost SGF f for which $f \le f^{maj}$ is the majority-grade $f^{maj}$,
and it is clear that the majority-grade respects consensus. This ends the proof.
The theorem and its proof remain valid when the language is finite.
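Respecting consensus can be spot-checked on random profiles. In this sketch, grades are small integers, A and B are graded by the same jury, and the helper names are introduced here for illustration. The majority-grade never fails the test, while the upper middlemost function fails it on a hypothetical pair in which all of A's grades lie in B's middlemost interval [40, 80].

```python
import random


def majority_grade(grades):
    """f^maj: the lower middlemost grade, r_{(n+1)/2} (n odd) or r_{(n+2)/2} (n even)."""
    r = sorted(grades, reverse=True)
    return r[len(r) // 2]


def upper_middlemost(grades):
    return sorted(grades, reverse=True)[(len(grades) - 1) // 2]


def respects_consensus_on(f, a, b):
    """If every grade of A lies in B's middlemost interval, A's final grade must not be below B's."""
    low, high = majority_grade(b), upper_middlemost(b)
    if all(low <= x <= high for x in a):
        return f(a) >= f(b)
    return True  # premise not met, nothing to check


random.seed(1)
for _ in range(20_000):
    n = random.randint(1, 8)
    a = [random.randint(0, 6) for _ in range(n)]
    b = [random.randint(0, 6) for _ in range(n)]
    assert respects_consensus_on(majority_grade, a, b)

A, B = [70, 60, 50, 45], [90, 80, 40, 20]  # hypothetical: A's grades all lie in [40, 80]
print(respects_consensus_on(majority_grade, A, B))    # True  (50 >= 40)
print(respects_consensus_on(upper_middlemost, A, B))  # False (60 < 80)
```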
There is an obvious, opposed alternative.
An SGF respects dissent when, whenever all of A's grades belong to the middlemost
interval of B's grades, A's final grade is not above B's
final grade.
Though this is not a compelling idea (particularly when the jury is small),
respecting dissent leads to implications that parallel those obtained with respecting consensus. The proofs are similar.
The other-majority-grade $f^{o/maj}$ is the upper middlemost order function,
$$f^{o/maj} = \begin{cases} f^{(n+1)/2} & \text{when } n \text{ is odd},\\ f^{n/2} & \text{when } n \text{ is even}. \end{cases}$$
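The two definitions differ only for even juries, and the difference shows up precisely in the consensus/dissent comparison. In the hypothetical four-judge example below, every grade of A lies in B's middlemost interval [5, 8]. The majority-grade puts A above B, while the other-majority-grade puts A below B. For an odd jury the two functions coincide.

```python
def majority_grade(grades):
    r = sorted(grades, reverse=True)
    return r[len(r) // 2]          # lower middlemost, f^maj


def other_majority_grade(grades):
    r = sorted(grades, reverse=True)
    return r[(len(r) - 1) // 2]    # upper middlemost, f^{o/maj}


A, B = [8, 7, 6, 6], [9, 8, 5, 1]  # hypothetical: all of A's grades lie in B's interval [5, 8]
print(majority_grade(A), majority_grade(B))              # 6 5: A is not below B (consensus)
print(other_majority_grade(A), other_majority_grade(B))  # 7 8: A is not above B (dissent)
print(majority_grade([5, 3, 1]) == other_majority_grade([5, 3, 1]))  # True: equal for odd n
```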
Theorem 12.5b An SGF f respects dissent if and only if $f \ge f^{o/maj}$. The unique
Together the results point to the majority-grade as the best of all aggregation
functions.
First, the middlemost aggregation functions emerge as the best possible
choices to determine the final grade for many theoretical, practical, and intuitive
reasons. They always agree with a majority decision. They best avoid manipulation. They meet the utilitarian test of maximizing the social welfare of the
jury. They counter cheaters and cranks.
Second, when the choice is narrowed to the lower and upper middlemost
aggregation functions, respectively $f^{maj}$ and $f^{o/maj}$, all the good properties
of the order functions are inherited, properties proven for different reasons and
with independent arguments.
Third, several properties single out the majority-grade $f^{maj}$ among the
middlemost functions. The first is respecting consensus, a purely symmetric
condition that gives more credence to agreement than to disagreement. The second is a property of the majority-grade that is perhaps not at once evident: it is a
Rawlsian criterion. Within the class of all the middlemost functions it is the one
that singles out the best grade among many competitors or alternatives by the
max-min criterion; it assigns to each competitor or alternative a final grade that
is the minimum in the middlemost interval, the best among them being the one
which is the highest. The third is its simplicity: the method is intuitive and easily
understood. The fourth is that $f^{maj}$ emphasizes the positive: an absolute majority is for the grade it assigns or better. Contrast this with $f^{o/maj}$: an absolute
majority is for the grade it assigns or worse. But when there is a large number
of voters, the majority-grade and the other-majority-grade are for all intents and
purposes the same, $f^{maj} = f^{o/maj}$ (except for a set of measure zero).
To be able to determine a final grade when the number of judges is even is
of paramount importance for several reasons, though a priori a jury constituted
of an odd number is preferable. To begin, it is not always possible to determine
the size of the jury; for instance, in an election the parity of the number of
voters is unknown. Next, in practice it often happens that one or several judges
may be absent or may be declared ineligible for one or another reason. Last,
as is shown in chapter 13, to be able to establish complete rank-orderings of
candidates or alternatives, it is necessary to be able to determine final grades
for even-numbered as well as odd-numbered juries.
One major general problem remains: How are ties to be resolved? Equivalently, how are alternatives or competitors to be ranked?
13
Majority-Ranking
We may think of the political process as a machine which makes social decisions when
the views of representatives and their constituents are fed into it. A citizen will regard
some ways of designing this machine as more just than others. So a complete conception
of justice is not only able to assess laws and policies but it can also rank procedures for
selecting which political opinion is to be enacted into law.
John Rawls
Strategy raises its head again, but in a slightly altered guise. In a sports competition or an election, for example, how the final grades compare may take on
greater importance than the final grades themselves. On the other hand, when
a language is well established and accepted, the grades themselves inevitably
take on a meaning and significance of their own. For example, a champion
diver whose majority-grade was merely 7 ("good") would undoubtedly be considered a rather modest champion. Usually, both the absolute final grades (the
majority-grades) and their order (the majority-ranking) are significant.
Does this change in point of view (the fact that the order-of-finish may be
the principal focus) alter the conclusions arrived at earlier?
Given judges $j \in J$ and candidates or alternatives $I \in C$, a profile of grades
is a matrix $(r_j^I)$ with $r_j^I \in [0, R]$, and the vector of final grades is $r^I$. Suppose
that the final grades of some two alternatives $A, B \in C$ are $r^A < r^B$, but some
judge j is of the opposite conviction, $r_j^A > r_j^B$. She would like either to increase
A's final grade or decrease B's final grade or, better yet, do both.
A social grading function (SGF) is strategy-proof-in-ranking when, whenever the final
grade of A is lower than that of B, $r^A < r^B$, any judge j of the
opposite conviction, $r_j^A > r_j^B$, can neither decrease B's final grade nor increase
A's final grade. (The specification "in-ranking" is often dropped when there
can be no confusion as to the meaning of strategy-proof.)
Consider an individual judge or voter j whose utility function uj depends
only on the ultimate ranking of the competitors, that is, only on the order of the
final grades. Suppose that judge j's utility for A to rank above B is greater
than her utility for B to rank no lower than A, $u_j\{r^A > r^B\} > u_j\{r^B \ge r^A\}$, exactly
when she is convinced that A merits a higher grade than B, $r_j^A > r_j^B$.
Then if the SGF is strategy-proof-in-ranking, it is a dominant strategy for judge
j to assign grades according to her convictions since it serves no purpose to do
otherwise. But notice that the ability to so distinguish any two competitors A
and B means that the language of grades must be sufficiently rich.
Regrettably, this ideal cannot be met.
Theorem 13.1 There exists no SGF that is strategy-proof-in-ranking.
than the winner is lowered in some input, the winner remains the same; all
reasonable methods of the new model are strongly monotonic.
Proof It suffices to construct a profile of grades of two competitors for which
$r^A < r^B$ and there exists some judge j with $r_j^A > r_j^B$ who can either raise A's
final grade or decrease B's.
Suppose $r_1^A > r_2^A > \cdots > r_n^A$. Either there is a judge j with $r_j^A > r^A$ or not.
If $r_j^A > r^A$, take B's grades to be
$$r_j^A > r_j^B > r_1^B > \cdots > r_{j-1}^B > r_{j+1}^B > \cdots > r_n^B > r^A.$$
Then, since all of B's grades are higher than $r^A$, so is B's final grade, $r^B > r^A$.
Now suppose judge j reduces B's grade to any value $r'^B_j$ in the open interval
$(r^A, r_n^B)$. Then
$$(r_j^B, r_1^B, \dots, r_{j-1}^B, r_{j+1}^B, \dots, r_{n-1}^B, r_n^B) > (r_1^B, \dots, r_{j-1}^B, r_{j+1}^B, \dots, r_n^B, r'^B_j)$$
with a strict inequality holding between every pair of corresponding components. So by monotonicity the final grade for the grades on the left must be
strictly higher than that for the grades on the right: judge n is able to increase
A's final grade. Notice that continuity plays no role in the proof; it is only
necessary that the common language be sufficiently rich.
Absolute perfection is unattainable, as so often in life. But perhaps something
that limits the ability of judges to manipulate rankings can be achieved.
An SGF is partially strategy-proof-in-ranking when, whenever $r^A < r^B$ and a
judge j is of the opposite opinion, $r_j^A > r_j^B$, the following holds: if he can decrease B's final
grade, he cannot increase A's final grade, and if he can increase A's final grade,
he cannot decrease B's final grade.
Happily, this condition can be realized.
Theorem 13.2 The unique SGFs that are partially strategy-proof-in-ranking
are the order functions.
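A three-judge example makes the difference between partial and full manipulability concrete. The grades are hypothetical integers in [0, 100]. Judge 0 prefers A although B is ahead under both rules. With the majority-grade he can lower B's final grade but cannot raise A's. With the mean, which is not an order function, he can do both.

```python
def majority_grade(grades):
    r = sorted(grades, reverse=True)
    return r[len(r) // 2]


def mean_grade(grades):
    return sum(grades) / len(grades)


def reachable(f, grades, j, R=100):
    """Final grades judge j can bring about alone, trying every integer grade in [0, R]."""
    return {f(grades[:j] + [x] + grades[j + 1:]) for x in range(R + 1)}


A = [80, 50, 20]  # hypothetical grades of A from judges 0, 1, 2
B = [70, 60, 30]  # hypothetical grades of B; judge 0 prefers A (80 > 70)

for name, f in (("majority-grade", majority_grade), ("mean", mean_grade)):
    can_raise_A = max(reachable(f, A, 0)) > f(A)
    can_lower_B = min(reachable(f, B, 0)) < f(B)
    print(name, f(A) < f(B), can_raise_A, can_lower_B)
```

The first line printed is `majority-grade True False True` and the second is `mean True True True`.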
A in the ranking, nor lower B. This does not mean, however, that approval
voting is nonmanipulable in ranking. For a judge k could well approve of two
candidates A and B, yet (because of the poverty of the language) not be able
to express his preference for A to finish above B. In this case he could be
tempted to disapprove of B. Thus, whatever the language of grades, rich or
poor, every method is manipulable in ranking.
Approval voting is partially strategy-proof-in-ranking. The judge k just
mentioned can contribute to lowering B in the ranking, but he can do nothing
to raise A. Symmetrically, a judge who disapproves of two candidates can
contribute to raising one but can do nothing to lower the other.
Chapter 19 shows how the majority judgment effectively limits strategic manipulation in a real example (see table 19.3 and the accompanying discussion).
Of course, judges' utilities may not depend only on the final rankings: the final
grades may enter into their utilities as well. Since it is not always a dominant
strategy for a judge to give grades according to his convictions when rankings
are in the offing, what may be expected to happen at equilibrium in this game
becomes an interesting question (see chapter 20).
13.2 Majority-Value
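$$(r_j^I) = \begin{pmatrix} r_1^A & \cdots & r_j^A & \cdots & r_n^A \\ \vdots & & \vdots & & \vdots \\ r_1^I & \cdots & r_j^I & \cdots & r_n^I \\ \vdots & & \vdots & & \vdots \\ r_1^Z & \cdots & r_j^Z & \cdots & r_n^Z \end{pmatrix}.$$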
A social ranking function satisfies all the basic axioms if and only if each
row of the profile matrix may be permuted independently without changing the result. This
simply means that given a jury's grades, there are $(n!)^m$ profiles (the original among them) that
give the identical final grades (though some of these may be the same, since
some of the entries in a row of the matrix may be equal). Each contains
exactly the same information, no more, no less. Among them, the ordered
profile is particularly convenient: it satisfies (with the subscripts in each row
renamed)
A: 90 85 78 73 73 70 69
B: 95 81 77 73 73 70 66
Theorem 13.3
The proof is completely trivial. But remember, the usual practice is to compute the mean or average value of the grades of each contestant and to rank them
accordingly. This theorem is assuredly not true in that case. As a practical matter, the theorem is very important: complete rankings are very definitely wanted
in many applications. Witness, for example, the complicated (and unjustified)
rules used to settle ties in skating.
Define, for every candidate or alternative, the first majority-grade to be the
majority-grade of the entire jury; the second majority-grade to be the majority-grade of the grades that remain after the first majority-grade is dropped; the third
majority-grade to be the majority-grade of the grades that remain after the first
two majority-grades have been dropped; . . . ; and the nth majority-grade to be
the majority-grade of the grades that remain after the first $n - 1$ majority-grades
have been dropped.
A candidate's (or alternative's) majority-value for a jury of n members is a vector
of n components that assigns, in order, his first, second, third, . . . , nth majority-grades. $A \succ_{maj} B$ if and only if A's majority-value is lexicographically higher
than B's.
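The definition translates into a short loop. At each step the sketch takes the lower middlemost grade (0-based index `len // 2` of the grades sorted in decreasing order), records it, and drops it. Comparing the resulting tuples with Python's built-in lexicographic tuple order then gives the majority-ranking. The two profiles are those of the example in the text.

```python
def majority_value(grades):
    """Successive majority-grades: take the lower middlemost grade, drop it, repeat."""
    remaining = sorted(grades, reverse=True)
    value = []
    while remaining:
        value.append(remaining.pop(len(remaining) // 2))
    return tuple(value)


A = [90, 85, 78, 73, 73, 70, 69]
B = [95, 81, 77, 73, 73, 70, 66]
print(majority_value(A))  # (73, 73, 78, 70, 85, 69, 90)
print(majority_value(B))  # (73, 73, 77, 70, 81, 66, 95)
# Python compares tuples lexicographically, which is exactly the majority-ranking test.
print(majority_value(A) > majority_value(B))  # True
```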
In the previous example, A's majority-value is (73, 73, 78, 70, 85, 69, 90) and
B's is (73, 73, 77, 70, 81, 66, 95). A's is higher because it is higher in the
first entry at which they differ. Both candidates have a final grade of 73, but
one of the two is slightly better. Notice, however, that in this example it suffices
to know the first three majority-grades to determine that $A \succ_{maj} B$; the kth
majority-grades for $k \ge 4$ are superfluous.
What, it is fair to ask, are the lowest grades that A could earn and the highest
grades that B could earn from a seven-member jury to end up with the same
first three majority-grades and thus with $A \succ_{maj} B$? The answer is
A: 78 78 78 73 73 0 0
B: 100 100 77 73 73 73 73
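The extreme profiles can be checked mechanically, assuming integer grades in [0, 100]. The sketch recomputes the first three majority-grades. It then confirms that the answer is tight: lowering any positive grade of A by one point, or raising any grade of B below 100 by one point, removes A's advantage.

```python
def majority_value(grades):
    remaining = sorted(grades, reverse=True)
    value = []
    while remaining:
        value.append(remaining.pop(len(remaining) // 2))  # drop the current majority-grade
    return tuple(value)


def beats(a, b):
    return majority_value(a) > majority_value(b)


A_low = [78, 78, 78, 73, 73, 0, 0]
B_high = [100, 100, 77, 73, 73, 73, 73]
print(majority_value(A_low)[:3], majority_value(B_high)[:3])  # (73, 73, 78) (73, 73, 77)
print(beats(A_low, B_high))                                   # True

# Tightness (integer grades assumed): any one-point change in the wrong direction loses A's edge.
for i, g in enumerate(A_low):
    if g > 0:
        assert not beats(A_low[:i] + [g - 1] + A_low[i + 1:], B_high)
for i, g in enumerate(B_high):
    if g < 100:
        assert not beats(A_low, B_high[:i] + [g + 1] + B_high[i + 1:])
```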
The majority-ranking systematically eliminates crankiness as much as possible.
The important grades are the ones in the middle. In the example, B's 100s and
A's 0s are clearly suspect. Indeed, how is one to evaluate the competence of a
judge who is a member of a jury? Since the members of the jury are (or must be
considered) experts, those whose grades are in the middle should be the most
significant. With the majority-ranking this wish is realized: the further a judge's
grades are from the middle, the less will be their impact on the majority-value.
The majority-ranking is different from the ranking procedure that was once
used by the International Skating Union (see tables 7.1b, 7.1c). The ISU's
ordinal system gave the majority-grade but resolved ties differently, using
increasingly ad hoc devices: first, if the majority-grade is the same, by the size
of the majority in favor of at least the majority-grade; second, if the majorities
are of the same size, by the magnitude of the sum of the corresponding places
(considered as points or grades); third, if those magnitudes are the same, by
the magnitude of the sum of all the places (considered as points or grades). The
majority-ranking of the particular example happened to agree with that of the
ISU. However, the majority-ranking is, in general, more precise than the ISU's.
There are cases when the ISU's rule declares a tie but the majority-ranking discerns
an order, whereas a tie in the majority-ranking (rare) implies a tie in the ISU's
ranking.
On the other hand, the majority-ranking for the SCW Society election (table
13.1) does not agree with the ISU's tie-breaking rules (compare with table 5.1a).
13.3 Characterization
Given the input grades of two competitors A and B, how should they be ranked?
Write $A \succ_S B$ to mean A is ranked ahead of B, and $A \succeq_S B$ to mean either
A is ahead of B or they are tied. Recall that a social ranking function (SRF)
assigns a transitive rank-order $\succeq_S$ to any set of competitors that respects ties
and grades (in comparing the lists of the grades of two candidates, who gave
what grade is forgotten). This is a consequence of asking that it avoid the Arrow
and Condorcet paradoxes (see chapter 9).
Table 13.1
Majority-Values and Majority-Ranking with Borda-Majority Method, SCW Society Election
     Points (2 / 1 / 0)   Score   No. for   Majority-Value   Majority-Ranking   ISU Rule
A    24   11   17         1       35        1,11112121...    1st                2d
B     9   21   22         1       30        1,11111110...    3d                 3d
C    19   20   13         1       39        1,11111111...    2d                 1st
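The table's entries can be recomputed from the point counts. This sketch assumes that the three "Points" columns are the Borda points 2, 1 and 0, an assumption that reproduces the "No. for" column exactly. It lists each candidate's first nine majority-grades, counts the voters giving at least the majority-grade (the quantity the ISU rule ranks by), and sorts the candidates by their full majority-values.

```python
def majority_value(grades, length):
    """First `length` majority-grades (drop the lower middlemost grade each time)."""
    remaining = sorted(grades, reverse=True)
    value = []
    while remaining and len(value) < length:
        value.append(remaining.pop(len(remaining) // 2))
    return value


def ballots(twos, ones, zeros):
    # Assumption: the three "Points" columns are 2, 1 and 0 points.
    return [2] * twos + [1] * ones + [0] * zeros


scw = {"A": ballots(24, 11, 17), "B": ballots(9, 21, 22), "C": ballots(19, 20, 13)}
for name, grades in scw.items():
    mv = majority_value(grades, 9)
    no_for = sum(g >= mv[0] for g in grades)  # voters giving at least the majority-grade
    print(name, mv[0], no_for, f"{mv[0]},{''.join(map(str, mv[1:]))}...")

order = sorted(scw, key=lambda c: majority_value(scw[c], len(scw[c])), reverse=True)
print(order)  # ['A', 'C', 'B']
```

Ranking by "No. for" alone (39, 35, 30) would instead put C first, as the ISU column does.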
Theorem 13.4 The majority-ranking is the unique middlemost, choice-monotonic social ranking function that rewards consensus.
$$A \succ_S B \quad\text{when}\quad \begin{cases} r_-^A > r_-^B \text{ and } r_+^A > r_+^B,\\ r_-^A > r_-^B \text{ and } r_+^A = r_+^B,\\ r_-^A = r_-^B \text{ and } r_+^A > r_+^B,\\ r_-^A > r_-^B \text{ and } r_+^A < r_+^B. \end{cases}$$
that all grades outside the first middlemost intervals are minimum or maximum
grades). The last comparison is implied by rewarding consensus and the middlemost property (since it may be assumed that all grades of A are in the first
middlemost interval of B). This is exactly the output of the majority-ranking.
(For the four remaining possibilities, $B \succ_S A$.)
If they first differ in the kth middlemost intervals of their grades for $k > 1$,
the proof is the same.
The majority-ranking satisfies Arrow's unanimity principle: if every judge
gives a strictly higher grade to a candidate A than to B, then A is ranked
higher than B. The same is true for any social ranking function that is choice-monotonic.
This simple characterization of the majority-ranking parallels the characterization of the majority-grade and in this sense shows that it is its natural
generalization. In marked contrast with the traditional model, grades and
ranks agree. Moreover, the majority judgment is rank-compatible and rank-monotonic (impossible in the traditional model; recall theorem 4.4). There is
a fundamental compatibility between grading, ranking, and electing.
Why are the middlemost SRFs so important? Because they best resist
manipulation, as the following argument shows.
A social ranking function must be meaningful, that is, it must be order-consistent. Therefore, by theorem 11.5b (and sequel), an order function must
be used to decide on a final grade; if there are ties, a second order function must
resolve them; if ties remain, a third order function must be invoked; and so on. If
the social ranking function must be choice-monotonic or if the practical situation
demands a resolution of ties as much as possible, then all order functions will
be invoked.
A lexi-order social ranking function is a permutation of the order functions
$f = (f^{(1)}, \dots, f^{(n)})$ that ranks the candidates by
$$A \succeq_S B \quad\text{if}\quad (f^{(1)}(A), \dots, f^{(n)}(A)) \ge_{lex} (f^{(1)}(B), \dots, f^{(n)}(B)).$$
There are n! lexi-order SRFs. Which ones minimize the probability of cheating?
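Recall that the probability of cheating with a grading function f is
$$Ch(f) = \max_{r = (r_1, \dots, r_n)} \; \max_{0 \le \lambda \le 1} \frac{\lambda\, \varphi^+_f(r) + (1 - \lambda)\, \varphi^-_f(r)}{n},$$
where $\varphi^+_f(r)$ and $\varphi^-_f(r)$ count the judges who can raise and lower the final grade.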
Since the decision is taken lexicographically, the probability of cheating must
be measured lexicographically as well. Thus the lexi-probability of cheating is
$$LCh(f) = \big(Ch(f^{(1)}), \dots, Ch(f^{(1)}, \dots, f^{(k)}), \dots, Ch(f^{(1)}, \dots, f^{(n)})\big),$$
and the aim is to find the lexicographic minimum of LCh(f ) over the
functions f .
Theorem 13.5 The only lexi-order SRFs that minimize the lexi-probability of
cheating are those that are middlemost SRFs. In particular, the majority-ranking
is the unique choice-monotonic, meaningful SRF that minimizes cheating and
rewards consensus.
Proof To obtain a lexicographic minimum implies first minimizing the first term, then, given the first, minimizing the first two terms, then, given the first two, minimizing the first three terms, and so on.
First, $Ch(f^{\sigma(1)}, \ldots, f^{\sigma(k)})$ must be calculated. Suppose $r = (r_1, \ldots, r_n)$, where $R > r_1 > \cdots > r_n > 0$. The number of judges who can increase the k-dimensional vector $\big(f^{\sigma(1)}(r), \ldots, f^{\sigma(k)}(r)\big)$ is $n + 1 - \min_{1 \le i \le k} \sigma(i)$, and the number of judges who can decrease it is $\max_{1 \le i \le k} \sigma(i)$. So the probability of cheating is $\max_{1 \le i \le k} Ch(f^{\sigma(i)})$.
Therefore to minimize the lexi-probability of cheating is equivalent to minimizing lexicographically $\big(Ch(f^{\sigma(1)}), \ldots, Ch(f^{\sigma(k)}), \ldots, Ch(f^{\sigma(n)})\big)$. As in the proof of theorem 12.2, the least manipulable order functions are those in the first middlemost interval, the next least manipulable are in the second middlemost interval, and so forth. So the lexi-order SRFs must be middlemost SRFs. ∎
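For small juries the lexicographic argument can be checked by brute force. The following Python sketch is an illustration, not the authors' procedure: it assumes, as the proof does, that all grades are distinct, so that n + 1 − k judges can raise the kth order function f^k and k judges can lower it, giving Ch(f^k) = max(k, n + 1 − k)/n. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def ch_order(k, n):
    """Probability of cheating with the k-th order function f^k (k-th highest of n grades).

    With distinct grades, n + 1 - k judges can raise f^k and k judges can lower it;
    the worst case over the weight alpha in [0, 1] is the larger count divided by n.
    """
    return Fraction(max(k, n + 1 - k), n)


def lexi_cheating(sigma, n):
    """LCh of the lexi-order SRF (f^sigma(1), ..., f^sigma(n)).

    The k-th term is the probability of cheating on the first k order functions,
    which the proof shows is the running maximum of Ch(f^sigma(i)), i <= k.
    """
    terms, running = [], Fraction(0)
    for k in sigma:
        running = max(running, ch_order(k, n))
        terms.append(running)
    return tuple(terms)


def lexi_minimizers(n):
    """All permutations whose LCh is lexicographically minimal (brute force over n!)."""
    perms = list(permutations(range(1, n + 1)))
    best = min(lexi_cheating(p, n) for p in perms)  # tuples compare lexicographically
    return [p for p in perms if lexi_cheating(p, n) == best]


for n in (4, 5):
    for sigma in lexi_minimizers(n):
        # Every minimizer consults the order functions from the middle outward:
        # Ch(f^sigma(i)) never decreases along the permutation.
        chs = [ch_order(k, n) for k in sigma]
        assert chs == sorted(chs)
        print(n, sigma)
```

For n = 5 the minimizers begin with f^3, then take f^2 and f^4 in either order, then f^1 and f^5 in either order, which is exactly a passage outward through the middlemost intervals.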
define almost surely the same ranking. In this sense, the majority-gauge-ranking
is the unique meaningful SRF that minimizes cheating in a large electorate.
13.4 Juries of Different Sizes
In practice, as has already been mentioned, competitions with many contestants field many juries. Les Citadelles du Vin, an annual international wine and
spirits competition sponsored by the OIV and held in the Bordeaux region,
graded 1,247 different wines in June 2006, with a dozen juries chosen from
some sixty experts, each meant to consist of five judges; in fact, several juries
had fewer than five members. How are two alternatives evaluated by juries
of different sizes to be compared? If the majority-grade of one is higher
than that of the other, the answer is known. But what if they have the same
majority-grade?
Postulate one jury of five persons evaluating three alternatives, and another jury of four persons evaluating two alternatives, with grades ranging from a high of 5 to a low of 1:

Γ1 =  A: 4 4 4 3 3        and        Γ2 =  D: 4 4 4 3
      B: 5 4 4 3 3                          E: 5 4 4 2
      C: 5 4 4 4 3
They all have a majority-grade of 4. Clearly $C \succ_{maj} B \succ_{maj} A$, since the order
agrees with weak monotonicity and the sets of grades are different, so there can
be no ties. That $D \succ_{maj} E$ is also clear because the grades of D are more
consensual than those of E. The only question is how to obtain a ranking of all
five alternatives.
Two procedures naturally suggest themselves.
Annex the majority-grade of each alternative considered by the smaller jury
to its grades as many times as necessary to equal the number of grades of the
larger jury.
The rationale for this is that the most reliable collective information concerning
the grade of a contestant given by any jury is its majority-grade, so it is adjoined.
For the example, this yields
Γ =  A: 4 4 4 3 3
     B: 5 4 4 3 3
     C: 5 4 4 4 3
     D: 4 4 4 4 3
     E: 5 4 4 4 2
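Alternatively, remove the majority-grade of each alternative considered by the larger jury from its grades as many times as necessary to equal the number of grades of the smaller jury. For the example, this yields

Γ′ =  A: 4 4 3 3
      B: 5 4 3 3
      C: 5 4 4 3
      D: 4 4 4 3
      E: 5 4 4 2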
Observe that Γ′ is exactly equal to Γ without the column of the majority-grades, so the two procedures result in identical rankings. They must. A moment's reflection shows why: annexing the identical majority-grade to the alternatives of the smaller jury gives all five the same majority-grade in Γ, so they are ranked according to the other grades; but these other grades are precisely the ones obtained when removing the majority-grade from the larger jury found in Γ1. This is clearly true for all problems. This suggests generalizing the majority-ranking to the comparison of pairs of alternatives or candidates evaluated by juries of different sizes.
The general majority-ranking ($\succ_{gmaj}$) between two alternatives A and B
when B's jury is no larger than A's jury is defined by the majority-ranking
($\succ_{maj}$) applied to two sets of grades of equal size: A's grades, and B's grades
supplemented, to the extent necessary, by its majority-grade.
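A minimal Python sketch of the two procedures on the example above, assuming grades are numbers with higher meaning better and computing the majority-value by repeatedly taking and removing the lower middlemost grade. Every list is padded to the size of the largest jury, which, for any pair A, B, amounts to supplementing the smaller jury's grades by its majority-grade. All helper names are illustrative.

```python
def majority_grade(grades):
    """Lower middlemost grade: the (m // 2 + 1)-th highest of m grades."""
    return sorted(grades, reverse=True)[len(grades) // 2]


def majority_value(grades):
    """Sequence of successive majority-grades, removing each one in turn."""
    remaining = sorted(grades, reverse=True)
    value = []
    while remaining:
        value.append(remaining.pop(len(remaining) // 2))
    return tuple(value)


def padded(grades, size):
    """Supplement a jury's grades by its own majority-grade up to `size` grades."""
    return list(grades) + [majority_grade(grades)] * (size - len(grades))


def without_majority_grade(grades):
    """Remove one copy of the majority-grade (one step of the second procedure)."""
    rest = sorted(grades, reverse=True)
    rest.pop(len(rest) // 2)
    return rest


def rank(profile):
    """Alternatives sorted from first to last by majority-value."""
    return sorted(profile, key=lambda name: majority_value(profile[name]), reverse=True)


GAMMA_1 = {"A": [4, 4, 4, 3, 3], "B": [5, 4, 4, 3, 3], "C": [5, 4, 4, 4, 3]}  # five judges
GAMMA_2 = {"D": [4, 4, 4, 3], "E": [5, 4, 4, 2]}                             # four judges

annexed = {name: padded(g, 5) for name, g in {**GAMMA_1, **GAMMA_2}.items()}
removed = {**{name: without_majority_grade(g) for name, g in GAMMA_1.items()}, **GAMMA_2}

print(rank(annexed))  # ['C', 'D', 'E', 'B', 'A']
print(rank(removed))  # ['C', 'D', 'E', 'B', 'A']
```

Both procedures give the order C, D, E, B, A, as the argument in the text predicts.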
Theorem 13.6 The general majority-ranking $\succ_{gmaj}$ is a complete, transitive
order of all alternatives or candidates.
unless A's second majority-grade, which cannot be lower than the majority-grade, equals the common majority-grade. Again, a slight but real advantage is
given to A.
This procedure seems acceptable as a practical matter in a variety of circumstances. But when there are large numbers of judges, say, one hundred or
more, the majority-ranking may be simplified, and this suggests other possible
practical approaches to the comparison of results of juries of different sizes.
14
Large Electorates
To the mind, good and evil, above and below, are not skeptical, relative concepts, but
terms of a function, values that depend on the context they find themselves in.
Robert Musil
The majority-values of competitors or candidates, and thus the majority-ranking, may be simplified when a jury has many judges or an electorate many voters, or when the common language contains few grades in comparison with the number of judges. To see why, look again at the example of candidate A in the fifty-two-voter SCW Society election (see table 13.1): twenty-four 2s, eleven 1s, and seventeen 0s:

$\underbrace{2_{52}\,2_{50}\cdots 2_{8}\,2_{6}}_{24}\;\underbrace{1_{4}\,1_{2}\,1_{1}\,1_{3}\,1_{5}\cdots 1_{17}}_{11}\;\underbrace{0_{19}\,0_{21}\cdots 0_{51}}_{17}$

The kth majority-grades, for k = 1, ..., 52, are indicated by the subscripts. The majority-value is obtained from the ordered set of grades by beginning with the majority-grade, or the lower middlemost grade, and then taking alternating grades emanating from the middle. A's majority-value (of fifty-two digits) begins with two 11s and seven 12s, and ends with a sequence of 02s:

$\underbrace{1\,1\;1\,1\;1\,2\;1\,2\;1\,2\;1\,2\;1\,2\;1\,2\;1\,2\;0\,2\;0\,2\cdots 0\,2}_{52\text{ digits}} = \overset{2}{(11)}\,\overset{7}{(12)}\,\overset{17}{(02)},$

where the numbers above the pairs describe how often each pair is repeated.
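The fifty-two-digit majority-value and its abbreviation can be reproduced with a short Python sketch, assuming integer grades with higher meaning better. The pair grouping below simply run-length encodes consecutive two-digit pairs of the full majority-value; it is an illustration, not the general algorithm described next.

```python
from itertools import groupby


def majority_value(grades):
    """Successive majority-grades: repeatedly take and remove the lower middlemost grade."""
    remaining = sorted(grades, reverse=True)
    value = []
    while remaining:
        value.append(remaining.pop(len(remaining) // 2))
    return value


def abbreviate(value):
    """Cut an even-length majority-value into two-digit pairs and count consecutive repeats."""
    pairs = ["".join(map(str, value[i:i + 2])) for i in range(0, len(value), 2)]
    return [(pair, len(list(run))) for pair, run in groupby(pairs)]


grades_a = [2] * 24 + [1] * 11 + [0] * 17  # candidate A, table 13.1
mv = majority_value(grades_a)
print("".join(map(str, mv)))  # 52 digits: 1111, then seven 12s, then seventeen 02s
print(abbreviate(mv))         # [('11', 2), ('12', 7), ('02', 17)]
```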
A general algorithm is given that shows how these repetitions may be used to
write a more efficient, abbreviated expression of the majority-value. It applies
When the distinction is necessary, $\alpha^{\pm}$ is called the competitor's modified majority grade. Suppose, for example, that $\Lambda = \{A, B, C, D, E, F\}$ is the language
of grades, with A the highest to F the lowest, and a competitor's grades are
distributed as follows:
A
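$\alpha^{\pm} \succ \beta^{\pm}$ if $\alpha \succ \beta$, or $\alpha = \beta$ and $\alpha^{\pm} = \alpha^{+}$, $\beta^{\pm} = \beta^{-}$;

$(p, \alpha^{\pm}, q) \succ_{mg} (r, \beta^{\pm}, s)$ when
    $\alpha^{\pm} \succ \beta^{\pm}$, or
    $\alpha^{\pm} = \beta^{\pm} = \alpha^{+}$ and $p > r$, or
    $\alpha^{\pm} = \beta^{\pm} = \alpha^{-}$ and $q < s$.

$(p, \alpha^{\pm}, q) \succ_{mg} (r, \beta^{\pm}, s)$ implies $X \succ_{maj} Y$.

It may happen that X and Y have equal majority-gauges, $(p, \alpha^{\pm}, q) = (r, \beta^{\pm}, s)$, yet are not tied in the majority-ranking. The experimental evidence shows how rarely the majority-gauges are tied. In four independent experiments, with 10,000 random drawings of 101 ballots, ties occurred 8, 9, 10, and 11 times, a rate of about 0.1% (see tables 6.12a, 6.12b, 6.14a, and 6.14b). In another four independent experiments, with 10,000 random drawings of 201 ballots, ties occurred 3, 3, 9, and 10 times (see tables 14.4a and 14.4b). And as the number of voters increases, ties become still less likely.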
Proof It may be assumed that n, the number of judges or voters, is odd. For, if n is even, adjoining one α to X's grades (his majority-grade) leaves $(p, \alpha^{\pm}, q)$ unchanged; and similarly for Y. Moreover, the majority-order between X and Y remains unchanged as well (see chapter 13).

Take $X \succ_{mg} Y$, that is, $(p, \alpha^{\pm}, q) \succ_{mg} (r, \beta^{\pm}, s)$. Then either $\alpha^{\pm} \succ \beta^{\pm}$ or $\alpha^{\pm} = \beta^{\pm}$.

To begin, suppose $\alpha^{\pm} \succ \beta^{\pm}$. Then either X's majority-grade is above Y's, $\alpha \succ \beta$, or they have the same majority-grade $\alpha = \beta$, but $\alpha^{\pm} = \alpha^{+}$ and $\beta^{\pm} = \beta^{-}$.

In the first case, the majority-grade of X is above Y's, so $X \succ_{mg} Y$ implies $X \succ_{maj} Y$.

In the second case, $p > q$ and $r \le s$. The relation $p > q$ implies that beginning at the center and taking X's alternate emanating grades, a grade higher than α is encountered before a lower grade, whereas $r \le s$ implies that taking Y's alternate emanating grades from the center, a grade lower than α is encountered before a higher grade. This is clearly true when $r < s$. It is also true when $r = s$ because since n is odd, the first majority-grade is squarely in the middle and the second majority-grade is on the side of the lower grades, so the alternating process begins on the lower side. There are two possibilities: (1) a grade higher than α is encountered in X's grades before a grade lower than α is encountered in Y's grades, or (2) a grade lower than α is encountered in Y's grades before a grade higher than α is encountered in X's grades. In either case X's majority-value is lexicographically above Y's majority-value, so $X \succ_{mg} Y$ implies $X \succ_{maj} Y$.

Now suppose $\alpha^{\pm} = \beta^{\pm}$. Either $\alpha^{\pm} = \beta^{\pm} = \alpha^{+}$, or $\alpha^{\pm} = \beta^{\pm} = \alpha^{-}$. Beginning at the center, take the alternate emanating grades of both X and Y. In the first case a grade above α is encountered before a grade below α for X and for Y; and $p > r$ implies it is encountered with X's grades before Y's, so X's majority-value is lexicographically above Y's. In the second case a grade below α is encountered before a grade above α for X and for Y; and $q < s$ implies it is encountered with Y's grades before X's, so again X's majority-value is lexicographically above Y's, completing the proof. ∎
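The theorem can also be spot-checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: grades are integers 0 to 3 (higher is better), p and q are counts rather than percentages (equivalent when n is fixed), α+ is used when p > q and α− otherwise, and n is odd as in the proof. The sort key encodes the majority-gauge-ranking: α first, then α+ before α−, then a larger p under α+ or a smaller q under α−.

```python
import random


def majority_value(grades):
    """Successive majority-grades (lower middlemost grade taken and removed each time)."""
    remaining = sorted(grades, reverse=True)
    value = []
    while remaining:
        value.append(remaining.pop(len(remaining) // 2))
    return value


def majority_gauge(grades):
    """(p, alpha, sign, q): numbers of grades above and below the majority-grade alpha."""
    alpha = sorted(grades, reverse=True)[len(grades) // 2]
    p = sum(g > alpha for g in grades)
    q = sum(g < alpha for g in grades)
    return p, alpha, "+" if p > q else "-", q


def mg_key(gauge):
    """Larger key means higher in the majority-gauge-ranking."""
    p, alpha, sign, q = gauge
    return (alpha, 1, p) if sign == "+" else (alpha, 0, -q)


def check(trials=3000, n=101, seed=2007):
    """Whenever X is above Y by majority-gauge, X's majority-value must be above Y's."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randint(0, 3) for _ in range(n)]
        y = [rng.randint(0, 3) for _ in range(n)]
        if mg_key(majority_gauge(x)) > mg_key(majority_gauge(y)):
            assert majority_value(x) > majority_value(y)  # lists compare lexicographically
    return True


print(check())
```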
For an example of the majority-ranking and the corresponding majority-values and majority-gauges, see table 14.3.
Two important strategic properties of the majority-grade $f^{maj}$ are inherited by the majority-gauge. In comparing majority-gauges here, "higher," "lower," "increase," and "decrease" mean with respect to the majority-gauge-ranking $\succ_{mg}$.
The majority-gauge is strategy-proof-in-grading:
Suppose that a competitor's majority-gauge is $(p, \alpha^{\pm}, q)$. If a voter's input grade $\alpha_j$ satisfies $\alpha_j \succ \alpha$, any change in his input can only lead to a lower majority-gauge;
and if a voter's input grade satisfies $\alpha_j \prec \alpha$, any change in his input can only lead to
a higher majority-gauge.
The proof is obvious: the only change that can occur in $(p, \alpha^{\pm}, q)$ is either for
some voter among the p (who gave a grade higher than α) to give as input α
or a lower grade, resulting in a lower majority-gauge; or for some voter among
the q (who gave a grade lower than α) to give as input α or a higher grade,
resulting in a higher majority-gauge.
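For small juries the property can be checked exhaustively. This Python sketch is an illustration rather than a proof: it assumes integer grades 0 to 3 (higher is better), counts p and q instead of percentages, uses α+ when p > q and α− otherwise, and encodes the majority-gauge-ranking as a sort key. For every profile it changes one voter's grade in every possible way and asserts that a voter above α never raises the majority-gauge and a voter below α never lowers it.

```python
from itertools import product


def mg_key(grades):
    """Majority-gauge as a sort key (larger = higher): alpha, then +/-, then p or -q."""
    alpha = sorted(grades, reverse=True)[len(grades) // 2]  # lower middlemost grade
    p = sum(g > alpha for g in grades)
    q = sum(g < alpha for g in grades)
    return (alpha, 1, p) if p > q else (alpha, 0, -q)


def strategy_proof_in_grading(n, grades=range(4)):
    for profile in product(grades, repeat=n):
        alpha = sorted(profile, reverse=True)[n // 2]
        before = mg_key(profile)
        for j, g_j in enumerate(profile):
            for new in grades:
                after = mg_key(profile[:j] + (new,) + profile[j + 1:])
                if g_j > alpha:
                    assert after <= before, (profile, j, new)  # cannot raise the gauge
                elif g_j < alpha:
                    assert after >= before, (profile, j, new)  # cannot lower the gauge
    return True


print(strategy_proof_in_grading(4), strategy_proof_in_grading(5))
```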
The majority-gauge is also partially strategy-proof-in-ranking:
Suppose that X's majority-gauge is $(p, \alpha^{\pm}, q)$, Y's is $(r, \beta^{\pm}, s)$, and $X \succ_{mg} Y$.
Consider a voter j who is of the opposite opinion: for her, X's grade $\alpha_j$ should
be below Y's grade $\beta_j$, i.e., $\alpha_j \prec \beta_j$. Then if she can decrease X's majority-gauge, she cannot increase Y's majority-gauge, and if she can increase Y's
majority-gauge, she cannot decrease X's majority-gauge.
It is again easy to verify this. If j can decrease X's majority-gauge, then she
must have assigned $\alpha_j \succ \alpha$, implying $\beta_j \succ \alpha_j \succ \alpha \succeq \beta$, so j cannot increase
Y's majority-gauge. And if j can increase Y's majority-gauge, then $\beta_j \prec \beta$,
implying $\alpha_j \prec \beta_j \prec \beta \preceq \alpha$, so j cannot decrease X's majority-gauge. These
properties carry over to other rankings that depend only on the triple $(p, \alpha^{\pm}, q)$,
as will be seen anon. Moreover, when p ≠ q and both are strictly less than
50%, all middlemost lexi-order SRFs yield the majority-gauge-ranking.
14.2 Abbreviated Majority-Value
The majority-gauge can produce a tie between two candidates, though this is
highly unlikely, as was observed. In that case, the full majority-value may be
invoked. But when there are many voters or judges, or the language of grades
is small, it is efficient to express the majority-value in an abbreviated form. The
first several terms give the majority-gauge, though not obviously: exactly how
is explained at the end of this section.
It is assumed that each judge assigns a grade to every competitor. Suppose
the common language consists of the grades $\Lambda = \{\alpha_1 \succ \cdots \succ \alpha_r\}$, and that
$w_i$ is the percentage of $\alpha_i$'s received by a competitor (the numbers of grades
instead of their percentages could, of course, be used). Then the competitor's
majority-grade is $\alpha_t$ if

$w_1 + \cdots + w_{t-1} \le 50\% < w_1 + \cdots + w_{t-1} + w_t.$
Each step of the algorithm for calculating the majority-value is accompanied
by two lists. The first, in parentheses, consists of numbers (coming from the
$w_i$'s) together with a center, denoted by ||:

$(v_1, \ldots, v_s \,\|\, v_{s+1}, \ldots, v_r).$
[],
and
[].
$\big\langle \overset{w(e,d)}{(\alpha_e\alpha_d)} \cdots \overset{w(h,g)}{(\alpha_h\alpha_g)} \big\rangle.$
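Let
    k correspond to the lowest grade left of the center with $v_k > 0$,
    l correspond to the highest grade right of the center with $v_l > 0$,
    $w(l, k) = \min\{v_l, v_k\}$.
Output
    $v' = (v'_1, \ldots, v'_s \,\|\, v'_{s+1}, \ldots, v'_r)$, where $v'_k = v_k - w(l,k)$, $v'_l = v_l - w(l,k)$, and all other entries are unchanged, and
    the second list extended on the right by the pair $\overset{w(l,k)}{(\alpha_l\alpha_k)}$.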
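Table 14.1
Calculation of the Abbreviated Majority-Value: An Example with Percentages of Grades

Vectors v                                    Second list
(13.6  30.7  5.7 || 19.4  14.8  8.4  7.4)    ⟨ ⟩
(13.6  30.7 || 13.7  14.8  8.4  7.4)         ⟨ (CC)^5.7 ⟩
(13.6  17.0 || 14.8  8.4  7.4)               ⟨ (CC)^5.7 (CB)^13.7 ⟩
(13.6  2.2 || 8.4  7.4)                      ⟨ (CC)^5.7 (CB)^13.7 (DB)^14.8 ⟩
(13.6 || 6.2  7.4)                           ⟨ (CC)^5.7 (CB)^13.7 (DB)^14.8 (EB)^2.2 ⟩
(7.4 || 7.4)                                 ⟨ (CC)^5.7 (CB)^13.7 (DB)^14.8 (EB)^2.2 (EA)^6.2 ⟩
(0 || 0)                                     ⟨ (CC)^5.7 (CB)^13.7 (DB)^14.8 (EB)^2.2 (EA)^6.2 (FA)^7.4 ⟩

and ⟨ (CC)^5.7 ⟩.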
1 then to 0 are (24, 11, 17). There is an even number of voters, so the algorithm
Table 14.2
Calculation of the Abbreviated Majority-Value: An Example with Fifty-two Grades

Vectors v            Second list
(24  2 || 9  17)     ⟨ ⟩
(24 || 7  17)        ⟨ (11)^2 ⟩
(17 || 17)           ⟨ (11)^2 (12)^7 ⟩
(0 || 0)             ⟨ (11)^2 (12)^7 (02)^17 ⟩
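The pairing step can be sketched in Python. This is an illustration under stated assumptions, not the book's code: the shares are listed from the best grade to the worst, `half` is 50 for percentages (or n/2 for counts with n even), the first vector is formed by splitting the grade that straddles the center, and a small tolerance absorbs floating-point residue. Each pair is written lower grade first, as in tables 14.1 and 14.2.

```python
def abbreviated_majority_value(shares, labels, half=50.0, eps=1e-9):
    """Return [(pair, weight), ...]; shares and labels run from best to worst grade."""
    # Split the grades at the center: `left` holds the better half, `right` the worse half.
    left, right = [], []
    cumulative = 0.0
    for w in shares:
        take = min(w, max(half - cumulative, 0.0))
        left.append(take)
        right.append(w - take)
        cumulative += w
    pairs = []
    while True:
        ks = [i for i, v in enumerate(left) if v > eps]
        ls = [i for i, v in enumerate(right) if v > eps]
        if not ks or not ls:
            break
        k = ks[-1]  # lowest grade left of the center with v_k > 0
        l = ls[0]   # highest grade right of the center with v_l > 0
        w = min(left[k], right[l])  # w(l, k)
        left[k] -= w
        right[l] -= w
        pairs.append((labels[l] + labels[k], round(w, 1)))
    return pairs


# Table 14.1 (the second set of grades of table 14.3)
print(abbreviated_majority_value([13.6, 30.7, 25.1, 14.8, 8.4, 7.4], "ABCDEF"))
# Table 14.2: fifty-two voters with 24 2s, 11 1s, 17 0s; center at n/2 = 26
print(abbreviated_majority_value([24, 11, 17], "210", half=26))
```

The first call reproduces the pairs (CC)^5.7 (CB)^13.7 (DB)^14.8 (EB)^2.2 (EA)^6.2 (FA)^7.4 of table 14.1, and the second gives (11)^2 (12)^7 (02)^17.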
Table 14.3
Abbreviated Majority-Values and Majority-Gauges of Six Sets of Grades, Listed from Best to Worst according to the Majority-Ranking

       % Grades                              Abbreviated Majority-Value                                 Majority-Gauge
       A     B     C     D     E     F                                                                  (p%, α±, q%)
1st    21.6  26.3  19.1  19.9   6.5   6.6    (CC)^2.1 (CB)^14.9 (DB)^11.4 (DA)^8.5 (EA)^6.5 (FA)^6.6     (47.9, C+, 33.0)
2d     13.6  30.7  25.1  14.8   8.4   7.4    (CC)^5.7 (CB)^13.7 (DB)^14.8 (EB)^2.2 (EA)^6.2 (FA)^7.4     (44.3, C+, 30.6)
3d     18.9  25.4  25.1  16.3   9.6   4.7    (CC)^5.7 (CB)^13.7 (DB)^11.7 (DA)^4.6 (EA)^9.6 (FA)^4.7     (44.3, C+, 30.6)
4th    16.9  22.5  19.1  16.6  12.2  12.7    (CC)^8.5 (DC)^2.1 (DB)^14.5 (EB)^8.0 (EA)^4.2 (FA)^12.7     (39.4, C−, 41.5)
5th    17.0  22.7  18.5  16.8  12.3  12.7    (CC)^8.2 (DC)^2.1 (DB)^14.7 (EB)^8.0 (EA)^4.3 (FA)^12.7     (39.7, C−, 41.8)
6th     5.1  14.9  38.2  12.1  13.5  16.2    (CC)^8.2 (DC)^12.1 (EC)^9.7 (EB)^3.8 (FB)^11.1 (FA)^5.1     (20.0, C−, 41.8)

$\alpha^{\pm} = \alpha^{+}$ implies
    $p = 50 -$ frequency of $(\alpha\alpha)$,
    $q = 50 -$ frequency of all pairs including α.
$\alpha^{\pm} = \alpha^{-}$ implies
    $q = 50 -$ frequency of $(\alpha\alpha)$,
    $p = 50 -$ frequency of all pairs including α.

Take, for example, the sixth set of grades shown in table 14.3. The majority-grade is C, the first grade. $C^{\pm} = C^{-}$ because the first grade encountered that differs from C is D, which is below C. The frequency of (CC) is 8.2, so q = 41.8. The sum of the frequencies of all pairs including C is 30.0, so p = 20.0. The foregoing prescription assumes that a candidate's grades are given in percentages.
If instead they are given in numbers, it is assumed that the number of judges n
is even, and the prescription is the same except that 50 is replaced by n/2.
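The prescription can be written as a small Python function. It is a sketch that assumes the abbreviated majority-value is given as (pair, frequency) items with the lower grade first in each pair, as in table 14.3, and letter grades in which an earlier letter is a better grade; `half` is 50 for percentages or n/2 for counts. It also assumes at least one grade differs from the majority-grade.

```python
def gauge_from_abbreviated(pairs, half=50.0):
    """Majority-gauge (p, alpha+/-, q) read off an abbreviated majority-value."""
    alpha = pairs[0][0][0]  # the first grade is the majority-grade
    # The first grade that differs from alpha decides the sign: a better grade gives +.
    first_other = next(g for pair, _ in pairs for g in pair if g != alpha)
    sign = "+" if first_other < alpha else "-"  # "B" < "C": B is the better letter grade
    same = sum(f for pair, f in pairs if pair == alpha + alpha)  # frequency of (alpha alpha)
    with_alpha = sum(f for pair, f in pairs if alpha in pair)    # all pairs including alpha
    if sign == "+":
        return round(half - same, 1), alpha + sign, round(half - with_alpha, 1)
    return round(half - with_alpha, 1), alpha + sign, round(half - same, 1)


sixth = [("CC", 8.2), ("DC", 12.1), ("EC", 9.7), ("EB", 3.8), ("FB", 11.1), ("FA", 5.1)]
print(gauge_from_abbreviated(sixth))  # (20.0, 'C-', 41.8)
```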
14.3 Other Rules
Tie-Breaking Rules
           Excellent   Very Good   Good    Acceptable   Poor    To Reject   (p, α±, q)
Royal      16.4%       26.0%       17.5%   16.1%        10.1%   13.9%       (42.4%, Good+, 40.1%)
Bayrou     10.8%       30.0%       27.7%   15.5%         8.0%    7.9%       (40.8%, Good+, 31.4%)
The majority-gauge places Royal ahead of Bayrou (so the majority-value and
majority-ranking do, too).
Given the triple $(p, \alpha^{\pm}, q)$, various seemingly reasonable rules are
possible for designating which of two candidates leads the other when their
majority-grades are the same. One naive idea is the upper tie-breaking rule:
When two candidates have the same majority-grade α, the candidate with the
greater percentage of grades above α (namely, p) is first; and in case of a tie,
(p, α, q) and (p, α, s), the first is ranked higher if q < s.
It ranks Royal ahead of Bayrou. But then, why not instead take the lower
tie-breaking rule:
When two candidates have the same majority-grade α, the candidate with the
greater percentage of grades below α (namely, q) is last; and in case of a tie,
(p, α, q) and (r, α, q), the first is ranked higher if p > r.
It ranks Bayrou ahead of Royal.
Imagine the grades of the candidates ordered from best to worst on seesaws
or teeterboards that are balanced exactly at the middlemost grade: that which
weighs more heavily toward the better-than grades than toward the worse-than
grades should correspond to the candidate who is ranked higher. This is the
difference tie-breaking rule:
When two candidates have the same majority-grade α, the candidate with
the greater difference between the percentages of grades above and below α
(namely, p − q) is first; and in case of a tie, (p, α, q) and (r, α, s) with
p − q = r − s, the first is ranked higher if p + q < r + s because it expresses
a greater consensus for α.
It places Bayrou comfortably ahead of Royal. David Gale, believing at first that
this rule agrees with the majority-ranking, suggested it for large electorates.
But it does not agree with the majority-ranking, as the Bayrou-Royal example
shows. However, the difference tie-breaking rule agrees with the majority-gauge-ranking to the extent that both rank a candidate with an α+ above a
candidate with an α−. It may be observed that the majority-gauge-ranking and
the difference-rule-ranking are the same for the Orsay French presidential election experiment and for the INFORMS U.S. presidential election experiment
(see table 1.5), and this will often be the case.
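A short Python sketch compares the four rules on the Royal and Bayrou percentages above. It is an illustration under assumptions: the grades are listed from Excellent to To Reject, the majority-grade is the first grade at which the cumulative share exceeds 50%, and each rule is encoded as a sort key in which a larger key means a higher rank. The names are hypothetical.

```python
GRADES = ["Excellent", "Very Good", "Good", "Acceptable", "Poor", "To Reject"]  # best to worst
ROYAL = [16.4, 26.0, 17.5, 16.1, 10.1, 13.9]
BAYROU = [10.8, 30.0, 27.7, 15.5, 8.0, 7.9]


def gauge(shares):
    """(p, t, q): t indexes the majority-grade; p, q are the shares above and below it."""
    cumulative = 0.0
    for t, w in enumerate(shares):
        if cumulative + w > 50:  # w_1 + ... + w_(t-1) <= 50 < w_1 + ... + w_t
            return round(cumulative, 1), t, round(sum(shares[t + 1:]), 1)
        cumulative += w
    raise ValueError("the shares must add up to more than 50%")


def rule_keys(shares):
    """Sort keys (larger = ranked higher) for the three tie-breaking rules and the gauge."""
    p, t, q = gauge(shares)
    alpha = -t  # a smaller index is a better grade
    return {
        "upper": (alpha, p, -q),
        "lower": (alpha, -q, p),
        "difference": (alpha, p - q, -(p + q)),
        "majority-gauge": (alpha, 1, p) if p > q else (alpha, 0, -q),
    }


def winner(rule):
    return "Royal" if rule_keys(ROYAL)[rule] > rule_keys(BAYROU)[rule] else "Bayrou"


print("common majority-grade:", GRADES[gauge(ROYAL)[1]], GRADES[gauge(BAYROU)[1]])
for rule in ("upper", "lower", "difference", "majority-gauge"):
    print(rule, "->", winner(rule))
```

The upper rule and the majority-gauge put Royal first, while the lower and difference rules put Bayrou first, matching the discussion in the text.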
The majority-gauge-ranking, deduced from the underlying principles of the
majority judgment, is more subtle. If the weight of one candidate leans toward
the better-than side of the seesaw and the weight of the other leans toward the
worse-than side, then the better-than side is, of course, ahead. If both candidates lean toward the better-than side, then the one whose better-than side is
the weightier is ahead of the other (so Royal with 42.4% is ahead of Bayrou with 40.8%) because there are more voters who really care about Royal
than there are who really care about Bayrou. Symmetrically, if both candidates
lean toward the worse-than side, then the one whose worse-than side is the
weightier is behind the other, because there are more voters who really are
opposed.
The three tie-breaking rules (upper, lower, and difference) share important
properties with the majority-gauge-ranking: all are strategy-proof-in-grading
and partially strategy-proof-in-ranking. The proofs are the same as those
given for the majority-gauge, where comparisons of majority-gauges are made
according to each of the three rules. The essential reason is that all are based
on the triple $(p, \alpha^{\pm}, q)$ of the majority-gauge. Each of the three has the serious
defect of disagreeing with the majority-ranking; nevertheless, they sometimes
behave better than other methods, and they may be compared among themselves. Statistical evidence suggests that the majority-gauge-ranking (for all
intents and purposes, the majority judgment, when there are many voters) is
slightly less manipulable than the majority judgment with the difference tie-breaking rule, and both are clearly less manipulable than the majority judgment
with the upper or lower rules. This confirms the theoretical result because the
Table 14.4a
Number of Wins among Royal, Bayrou, and Sarkozy, 2007 Orsay Experiment

                        Left: Royal       Bayrou            Right: Sarkozy    Tie
                        (1)     (2)       (1)     (2)       (1)     (2)       (1)    (2)
First-past-the-post     918     622       0       0         9,068   9,311     14     67
Two-past-the-post       1,124   1,139     92      155       8,244   8,117     540    589
MJ: upper rule          613     662       922     871       8,462   8,464     3      3
MJ: majority-gauge      603     564       4,252   4,397     5,142   5,036     3      3
MJ: difference rule     285     255       6,840   6,906     2,385   2,326     490    513
Condorcet               152     138       8,381   8,416     941     915       403    415
MJ: lower rule          380     373       9,540   9,554     49      42        31     31
Borda                   13      40        9,973   8,731     0       1,086     14     143
Table 14.4b
Number of Wins among Royal, Bayrou, and Sarkozy, 2007 Orsay Experiment

                        Left: Royal       Bayrou            Right: Sarkozy    Tie
                        (1)     (2)       (1)     (2)       (1)     (2)       (1)    (2)
First-past-the-post     1,844   4,275     1,850   1,478     6,271   3,825     35     397
Two-past-the-post       3,425   3,959     5,527   4,242     685     1,098     663    701
MJ: upper rule          832     872       8,324   8,250     835     868       9      10
MJ: lower rule          1,454   1,454     8,369   8,380     144     122       33     44
MJ: majority-gauge      724     765       8,830   8,785     437     440       9      10
Condorcet               173     176       9,497   9,507     2       3         327    309
MJ: difference rule     119     119       9,662   9,631     31      45        188    205
Borda                   69      31        9,930   9,889     0       3         1      77
were generated. In the experiments whose results are given in tables 14.4a and 14.4b,
10,000 random samples of 201 voters were generated independently for each
of the four experiments (two per table). It may be observed that the qualitative
properties are the same (where comparisons are possible). The reader may check
that the number of wins and ties is 10,000 for each of the methods of election;
for instance, for the majority-gauge in table 14.4a, 603 + 4,252 + 5,142 +
3 = 10,000. This fails to be true only for Condorcet's method because cycles
occur (the Condorcet paradox); that is, in table 14.4a, 138 + 8,416 + 915 +
415 = 9,884, and there were 116 cycles.
In chapter 6 it was shown that Bayrou is in fact the centrist candidate. The
methods of election used to designate the winners are listed from least favorable for the centrist candidate to most favorable for the centrist candidate. The
majority judgment with the lower tie-breaking rule tends to favor the centrist
because he is infrequently given low grades; symmetrically, the majority judgment with the upper tie-breaking rule tends to penalize the centrist because
he is infrequently given high grades. The majority-gauge-ranking tends to
be in between, as is the difference tie-breaking rule, though the latter tends
to be more favorable for the centrist than the former, as may be seen in each
of the four experiments of tables 14.4a and 14.4b. Recall, however, that in the
three precincts of the Orsay experiment, voters gave quite different percentages
of their actual votes than the national percentages: much more to Bayrou, more
to Royal, and slightly less to Sarkozy; thus in table 14.4b, Bayrou wins much
more often with any method.
What should be done when candidates or competitors receive different numbers of grades? This can happen with large electorates and juries as well as
small ones. The answer depends very much on the particular situation. Three
separate procedures naturally commend themselves.
The first procedure was developed in chapter 13. It is the choice when juries
are small, but it may also be used when juries are large. Suppose the competitor
with the most grades has n grades. This procedure adjoins to any competitor's
set of grades her majority-grade as many times as necessary to give her a total
of n grades. Why this is reasonable has already been explained; essentially
it is because the majority-grade is the best evaluation of a jury's collective
opinion.
The second procedure was used in the Orsay and other voting experiments.
Every voter or judge must assign a grade to each candidate, so every candidate necessarily has the same number of grades. This was accomplished by
stating very clearly that when a voter gives no grade or has no opinion, it
means the voter chooses To Reject the candidate. If "no opinion" were permitted and the first procedure were used, it would be possible for a relatively
unknown candidate to receive "no opinion" from the overwhelming majority
of the electorate and a very few very high grades from family, friends, and
other acquaintances, giving him a high majority-grade, which, when repeatedly adjoined, would make him the winner or place him high in the ranking.
That is clearly unacceptable. The rationale for insisting that a voter must evaluate every candidate is on the one hand civic (a voter should take the trouble
to know who each candidate is and what he stands for) and on the other hand
realistic (a voter who has no opinion about a candidate has implicitly rejected
him, having not been sufficiently motivated to invest the time necessary to
evaluate him).
The third procedure is an immediately evident option when there are several
large juries evaluating different sets of competitors (for if exactly the same sets
The theory developed in this book has assumed that social grading functions,
which assign a final grade as a function of the judges' or voters' input grades,
are monotonic: (1) if the input grades are replaced by the same or higher grades,
then the final grade can be no lower, and (2) if the input grades are all replaced by
strictly higher grades, then the final grade must be strictly higher. We believe that
this makes practical sense. But it is of interest to investigate the consequences
of dropping the second condition. An aggregation function is weakly monotonic
when only the first of the two conditions is satisfied. Andrew Jennings (2009)
has identified and characterized the family of weakly monotonic aggregation
functions that are strategy-proof-in-grading.
For simplicity, think of percentages. Assume that the language is infinite
and belongs to the closed real interval [0, 100]. The linear median (weakly
monotonic aggregation function), $LM : [0, 100]^n \to [0, 100]$, is defined by

$LM(\alpha_1, \ldots, \alpha_n) = \sup\left\{ y \in [0, 100] : 100\,\frac{\#(\alpha_i \ge y)}{n} \ge y \right\},$

where $\#(\alpha_i \ge y)$ means the number of grades $\alpha_i$ that are greater than or equal to
y, and n is the total number of grades (or voters). If a candidate's linear median
is, say, 87, then at least 87% of voters assigned the candidate an 87 or higher.
This idea makes sense only when there are many voters. The linear medians
determine the ranking of the candidates. Jennings argues that this approach is
of interest for polarized elections (when all voters submit minimal or maximal grades) because an order function returns an extreme grade, whereas LM
returns the fraction of the voters who submit a maximal grade. The mechanism is intriguing, but the majority-gauge and majority-ranking do take into
consideration such fractions.
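For finitely many voters the supremum can be computed directly. The Python sketch below rests on a small derivation that is ours, not the text's: with the grades sorted from highest, a_1 ≥ ⋯ ≥ a_n, any y ≤ a_k has at least k grades ≥ y, so every value min(a_k, 100k/n) is feasible, and every feasible y is bounded by one of them; hence LM is the largest of these values. The function name is hypothetical.

```python
def linear_median(grades):
    """LM = sup{y in [0, 100] : 100 * #(grades >= y) / n >= y}, for grades in [0, 100]."""
    a = sorted(grades, reverse=True)  # a_1 >= a_2 >= ... >= a_n
    n = len(a)
    # For y <= a_k at least k grades are >= y, so y = min(a_k, 100k/n) is feasible;
    # conversely every feasible y is bounded by one of these values.
    return max(min(a_k, 100 * k / n) for k, a_k in enumerate(a, start=1))


polarized = [100] * 30 + [0] * 70  # 30% maximal grades, 70% minimal grades
print(linear_median(polarized))  # 30.0
```

In the polarized case the linear median is exactly the percentage of voters who gave the maximal grade, which is the property Jennings emphasizes.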
Jennings characterizes the weakly monotonic aggregation functions that
are strategy-proof-in-grading: they are the strategic medians. A function
$f : [0, 100]^n \to [0, 100]$ is a strategic median if there exists a positive increasing
function $g : [0, 100] \to [0, 100]$ such that

$f(\alpha) = \sup\left\{ y \in [0, 100] : 100\,\frac{\#(\alpha_i \ge y)}{n} \ge g(y) \right\}.$
15
Common Language
All this is strongly reminiscent of the conditions existant at the beginning of the theory
of heat: that too was based on the intuitively clear concept of one body feeling warmer
than another, yet there was no immediate way to express significantly by how much, or
how many times, or in what sense.
John von Neumann and Oskar Morgenstern
conceptions of how society should be run and organized. The deep meaning of
the grades of a common language for evaluating candidates for political office
is an elusive philosophical concept. There is no known procedure for being
able to assert that the words "Excellent" or "Acceptable" mean exactly the same
thing to different voters (but, of course, that is also true of the word "green," for
instance).
And yet, people talk, they write, they evaluate, they communicate. Words
do carry meanings, meanings that depend on the language (English, French,
Spanish) and also on the culture (British versus American versus Indian,
French versus Canadian versus Swiss, Spanish versus Mexican versus Chilean).
The ballot posed the solemn question, "To be president of France, after having
taken every consideration into account, I judge in conscience that this candidate
would be:"
and invited the voter to evaluate every candidate. The words of the language of
grading that they were to use,
Très Bien (Excellent), Bien (Very Good), Assez Bien (Good), Passable (Acceptable), Insuffisant (Poor), À Rejeter (To Reject),
are intimately linked with the question posed and have clear cultural meanings
independent of voting that are, by and large, shared by French voters (as well
as by English speakers). The choice of those six words was made after considerable discussion and consultation. The first five are known to every French
school child, to all those who sat baccalauréat examinations marking the end
of secondary school, and to university students. There may at times be a slight
hesitation between two successive grades (Poor and Acceptable, or Excellent
and Very Good), but there is surely none between grades that are two or more
apart, such as Good and Poor. Voters faced with such dilemmas would prefer a
still richer language: the possibility of assigning a Poor+ or an Acceptable−,
an Excellent− or a Very Good+. Imagine a population asked to evaluate colors:
reds, yellows, greens, blues are flashed on a screen, and individuals are to identify them. The deep greens of the Amazonian forests and the fragile greens of
the new leaves of springtime are green to most people, but many would prefer
identifying them with additional adjectives. There may be hesitation as greens
gradually fade into blues, for example, but as Wittgenstein said, the meaning
of a word is its use in the language.
One fact is clear. A voter is better able to express his opinion by assigning to
candidates one of the six grades used by the majority judgment than by giving
rank-orders of candidates. This fact emerges from various experiments. Asked
to rank-order the candidates, over 50% of the voters rank-ordered at most six
of the twelve candidates in the 2007 French presidential election (see table
6.4); rank-ordering is at once too difficult and too constraining. In the 2007
Orsay experiment, 48% of the ballots had no Excellent, 11% had no Excellent
and no Very Good, 2% had no Good or above. To be listed first in a rank-order carries very different meanings. To be listed second (or third or in any
place) in a rank-order also carries very different meanings: two-thirds of the
second highest grades are merely Good or worse, one-quarter of the second
highest grades are Acceptable or worse (see table 6.3). The traditional model
and methods aggregate inputs that have considerably less common meaning
than the inputs aggregated by the majority judgment. Moreover, even with
twelve candidates, fully 86% of the majority judgment voters chose to use only
five different grades. This suggests that six is the optimal number of grades that
permit absolute judgments (less than Miller's seven; in keeping with the six
for the distinction of tones). The number was deliberately chosen to be even so
that there would be no middle grade, and there are four positive and only two
negative grades, in keeping with a sense that candidates for public office should
be in any case exceptional persons. The six grades are also meant to represent
very distinctive classifications. Hesitation between neighboring grades is more
likely due to difficulties of approximation in the scale of grades than to a lack
of understanding of their meanings.
The extensive experience with competitions of other types (sports, products, musical performances) validates Wittgenstein's assertion that over time
grades acquire clear, commonly held meanings. There is no reason for this
not to happen with a language of grades to evaluate candidates. On the other
hand, it is impossible to prove that the meanings of the evaluations were in
fact shared by French voters (any more than one can be sure about the shared
value of any word or concept in any language). What can be asked is circumstantial: Are the results consistent with the hypothesis that the meanings of
the grades are shared? Werner Heisenberg, faced with a similar difficulty in
establishing the uncertainty principle, explained, "We believe we have gained
anschaulich¹ understanding of a physical theory, if in all simple cases, we can
grasp the experimental consequences qualitatively and see that the theory does
not lead to contradictions."²
Two distinct answers are given to this question. First, it is shown that the
results make sense. They are consistent with themselves and with the known
facts. Had the grades no common, cultural meanings, there would be no reason
for the results to make sense nor for them to agree with observed facts. Second,
the actual use of the grades is studied (as opposed to their meanings). Like-minded people tend naturally to grade in the same way. Use, however, depends
crucially on the set of candidates. A very weak set of candidates will elicit
different grades than a very strong set of candidates. A set of candidates with
a right rather than a left coloring (or the opposite) will elicit grades that are
very different when they come from the left rather than the right. The French
presidential election of 2007 presented twelve candidates who covered the entire
political spectrum from the far left to the far right, so it became possible for all
the voters to use the grades with the same frequencies. Had the French electorate
been presented only with the seven candidates of the left or only with the four
candidates of the right (see table 6.8), the voters of the left would have used the
language of grades very differently from the voters of the right, so the language
of grades could not be expected to be approximately the same for all voters.
But with an entire spectrum of opinions, the ballots showed that the words were
used very much in the same way, even though the three separate voting precincts
did not elect the same candidate with the majority judgment. The hypothesis of
a common language is reinforced and certainly not contradicted.

1. Anschaulich means "intuitively intelligible, visualizable."
2. "The more precisely the position [of a particle] is determined, the less precisely the momentum
is known, and vice versa." See Hilgevoord and Uffink (2008).
15.1 The 2007 Orsay Experiment: Validation
This was a field experiment: real voters were asked to express themselves in
a real contest almost at the same time as they cast their votes in the actual
election. They came to the experiment with their real opinions or utilities.
Nothing in their incentives or beliefs was controlled, and no treatments were
assigned or analyzed, as is done in laboratory experiments where students are
offered artificial monetary incentives under varying conditions. We wished to
find out whether real, uncontrollable voters of vastly different opinions and
educational backgrounds could easily and intelligently evaluate a wide spectrum of candidates in a common language of grades. This is impossible in a
laboratory setting. The fact is that their natural incentive was to answer honestly
(though this is totally uncontrollable) because their evaluations would have no
real consequence. The analysis of the results suggests that they did respond
honestly. If such is the case, the data are an accurate expression of their true
opinions, so their preferences between pairs of candidates can be deduced. This
allows different methods of voting to be tested, analyzed, and compared on the
basis of real data. To our knowledge this is the first database that allows such
analyses.
Common Language
Table 15.1
Percentages of Voters Using k Grades (k = 1, ..., 6)

Grades used            1 grade   2 grades   3 grades   4 grades   5 grades   6 grades
Percentage of voters     1%        2%         10%        31%        42%        14%
was that the majority judgment allows a much fuller expression of opinion. With twelve candidates, the official vote allowed one of thirteen messages; the majority judgment allowed more than two billion possible messages.⁴ A natural question elicits a quick and natural answer, so despite the vast number of possible answers, assigning grades is an easy task (especially compared with ranking). Of the 1,733 valid ballots, 1,705 were different. Those that were the same typically had only To Reject, or an Excellent for Royal or Sarkozy and To Reject for the others.
With only six grades for twelve candidates, a voter could not express a strict preference between every pair of candidates. The number of different grades voters actually used shows that in any case they did not wish to distinguish between every pair (table 15.1): only 14% used all six grades. This suggests that six grades were sufficient. A scant 3% of the voters used at most two grades and 13% at most three (so 87% used four or more), suggesting that two or three grades would have been far from sufficient.
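The arithmetic behind footnote 4 and the cumulative shares quoted from table 15.1 can be checked in a few lines of Python. The figures are those printed in the text; the variable names are ours.

```python
# Number of distinct majority judgment ballots: each of the 12 candidates
# independently receives one of the 6 grades.
GRADES = 6
CANDIDATES = 12
messages = GRADES ** CANDIDATES
print(f"{messages:,}")  # 2,176,782,336

# Table 15.1: percentage of voters who used exactly k distinct grades.
used_exactly = {1: 1, 2: 2, 3: 10, 4: 31, 5: 42, 6: 14}

# Running total: percentage of voters who used at most k distinct grades.
at_most = {}
running = 0
for k in sorted(used_exactly):
    running += used_exactly[k]
    at_most[k] = running

print(at_most)  # {1: 1, 2: 3, 3: 13, 4: 44, 5: 86, 6: 100}
```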
The distribution of the grades of each of the candidates, their majority-grades,
and their majority-gauges are given in the order of the majority-ranking in table
15.2. The majority-ranking is very different from the rank-ordering obtained in
the three precincts of Orsay with the current first-past-the-post system. Bayrou
finished first with the majority judgment, defeating the candidates of the two
major parties. Most striking, instead of finishing fourth as in the official vote,
the extreme rightist Le Pen finished last: 74% of the electorate assigned him a
To Reject. Significantly, the Green candidate Voynet placed fourth (instead of
her official seventh place): the electorate was able to express the importance it
attaches to problems of the environment while giving higher grades to candidates it judged better able to preside over the nation. Sarkozy later recognized
this importance: his new government created one superministry, the Ministry
of Ecology and Sustainable Development.
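The Python sketch below shows how the majority-grades and majority-gauges of table 15.2 determine the majority-ranking, by recomputing that ranking from the grade distributions. It follows the conventions of the earlier chapters as we read them, and these should be treated as assumptions:

- The majority-grade α is the highest grade that at least half of the voters give or exceed.
- p is the share of grades strictly above α, and q is the share strictly below.
- The gauge is α+ when p > q and α− otherwise.
- Ties are broken by placing α+ above α−, then by larger p among α+, then by smaller q among α−.

The function names are ours.

```python
GRADES = ["Excellent", "Very Good", "Good", "Acceptable", "Poor", "To Reject"]

# Table 15.2: percentages from Excellent (first) down to To Reject (last).
TABLE_15_2 = {
    "Bayrou":      [13.6, 30.7, 25.1, 14.8,  8.4,  7.4],
    "Royal":       [16.7, 22.7, 19.1, 16.8, 12.2, 12.6],
    "Sarkozy":     [19.1, 19.8, 14.3, 11.5,  7.1, 28.2],
    "Voynet":      [ 2.9,  9.3, 17.5, 23.7, 26.1, 20.5],
    "Besancenot":  [ 4.1,  9.9, 16.3, 16.0, 22.6, 31.1],
    "Buffet":      [ 2.5,  7.6, 12.5, 20.6, 26.4, 30.4],
    "Bové":        [ 1.5,  6.0, 11.4, 16.0, 25.7, 39.5],
    "Laguiller":   [ 2.1,  5.3, 10.2, 16.6, 25.9, 40.1],
    "Nihous":      [ 0.3,  1.8,  5.3, 11.0, 26.7, 55.0],
    "de Villiers": [ 2.4,  6.4,  8.7, 11.3, 15.8, 55.5],
    "Schivardi":   [ 0.5,  1.0,  3.9,  9.5, 24.9, 60.4],
    "Le Pen":      [ 3.0,  4.6,  6.2,  6.5,  5.4, 74.4],
}

def majority_gauge(shares):
    """Return (grade index, p, q, sign); grade index 0 is Excellent."""
    total = sum(shares)  # printed rows may sum to 100.1 or 100.2 after rounding
    cumulative = 0.0
    for i, share in enumerate(shares):
        cumulative += share
        if cumulative >= total / 2:   # at least half give this grade or better
            p = sum(shares[:i])       # share strictly above the majority-grade
            q = sum(shares[i + 1:])   # share strictly below it
            return i, p, q, "+" if p > q else "-"
    raise ValueError("empty distribution")

def ranking_key(shares):
    """Sort key: better grade first, then + before -, then larger p or smaller q."""
    i, p, q, sign = majority_gauge(shares)
    return (i, 0 if sign == "+" else 1, -p if sign == "+" else q)

ranking = sorted(TABLE_15_2, key=lambda c: ranking_key(TABLE_15_2[c]))
for name in ranking:
    i, p, q, sign = majority_gauge(TABLE_15_2[name])
    print(f"{name:12} {GRADES[i]}{sign}  p={p:.1f}%  q={q:.1f}%")
```

Run as is, it reproduces the order of table 15.2. Bayrou is at Good+, while Royal and Sarkozy are both at Good−. Royal comes ahead of Sarkozy because fewer of her grades fall below Good (41.6% against 46.8%).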
4. With twelve candidates and six grades there are 6¹² = 2,176,782,336 possible messages.
Several participants confessed they had never voted officially before but had done so this time to
be able to participate in a vote that enabled them to express their opinions.
The distribution of the grades of the majority judgment was frequently shown
in talks and seminars with the names of the four major candidates (Bayrou, Le
Pen, Royal, and Sarkozy) hidden. Invariably, someone in the audience was
able to identify from the percentages who was who. This shows that the numbers make sense, that they contain meaningful information. What happened
is, we believe, what most observers anticipated: Le Pen had an overwhelming percentage of To Reject; among the other three, Sarkozy had at once the
highest percentages of Excellent and of To Reject; Bayrou at once the lowest
percentages of Excellent and of Poor plus To Reject.
The results of the face-to-face confrontations between every pair of candidates may be estimated from the majority judgment ballots by comparing the grades of each. When two candidates are face-to-face, a ballot's vote goes to the one with the higher grade; when their grades are the same, each receives ½. (To obtain these data the ballots themselves must be inspected; the data of table 15.2 do not suffice.) The estimates are given in table 15.3. In particular, Royal defeats Sarkozy with 52.3% of the vote, a prediction of the outcome of the second round within 1%: in the three Orsay precincts Royal actually obtained 51.3% of the official second-round votes to Sarkozy's 48.7%. The participants seem to have expressed themselves in the majority judgment ballots in conformity with the manner in which they actually voted. The 1% difference is easily explained: 26% of the voters did not participate in the experiment, and the last two weeks of the campaign may have changed perceptions (in particular, Sarkozy clearly dominated Royal in a televised debate).
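The face-to-face rule just described amounts to a tally over ballots. The Python sketch below assumes each ballot maps each candidate to a grade coded 6 (Excellent) down to 1 (To Reject), the numerical coding used later in this chapter. The three ballots and the function name are hypothetical, for illustration only.

```python
from itertools import combinations

def face_to_face(ballots, candidates):
    """Percentage of the vote each candidate obtains against each rival.

    A ballot's vote goes to the candidate it grades higher; when the two
    grades are equal, each candidate receives half of that vote.
    """
    n = len(ballots)
    score = {a: {b: 0.0 for b in candidates if b != a} for a in candidates}
    for ballot in ballots:
        for a, b in combinations(candidates, 2):
            if ballot[a] > ballot[b]:
                score[a][b] += 1
            elif ballot[a] < ballot[b]:
                score[b][a] += 1
            else:
                score[a][b] += 0.5
                score[b][a] += 0.5
    return {a: {b: 100 * v / n for b, v in row.items()} for a, row in score.items()}

# Hypothetical ballots (6 = Excellent, ..., 1 = To Reject).
ballots = [
    {"A": 6, "B": 4, "C": 1},
    {"A": 3, "B": 3, "C": 5},
    {"A": 2, "B": 5, "C": 5},
]
result = face_to_face(ballots, ["A", "B", "C"])
print(result["A"]["B"], result["B"]["A"])  # 50.0 50.0
```

Because ties are split, the two percentages of every pair always sum to 100, as in the note to table 15.3.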
The closeness of the estimate to the outcome shows that the majority judgment ballots are consistent with the observed facts. From the face-to-face results
it is possible to obtain the Condorcet- and Borda-rankings (the face-to-face
majority rule gives a transitive ordering; there is no Condorcet paradox). Given
at the bottom of table 15.3, they are identical with the majority-ranking except
for the four last places.
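Given the face-to-face percentages of table 15.3, both rankings can be derived mechanically. The Python sketch below encodes the table, where each entry is the row candidate's percentage against the column candidate. It makes two assumptions:

- Candidates are ranked by number of pairwise victories. This gives the Condorcet-ranking when, as here, the majority relation is transitive, and the code checks that condition.
- A candidate's Borda score is the sum of its face-to-face percentages. This is a standard equivalence with the Borda count when ties count one-half.

The names are ours.

```python
NAMES = ["Bayrou", "Royal", "Sarkozy", "Voynet", "Besancenot", "Buffet",
         "Bové", "Laguiller", "de Villiers", "Nihous", "Schivardi", "Le Pen"]

# Table 15.3: ROWS[i][j] = percentage candidate i obtains against candidate j.
ROWS = [
    [None, 56, 60, 77, 77, 81, 83, 83, 84, 90, 90, 86],
    [44, None, 52, 73, 74, 78, 81, 80, 77, 85, 87, 81],
    [40, 48, None, 59, 61, 64, 66, 66, 77, 75, 75, 80],
    [23, 27, 41, None, 56, 59, 67, 67, 66, 75, 79, 74],
    [23, 26, 39, 44, None, 53, 60, 61, 62, 69, 74, 70],
    [19, 22, 36, 41, 47, None, 57, 59, 61, 68, 73, 69],
    [17, 19, 34, 33, 40, 43, None, 51, 56, 62, 66, 65],
    [17, 20, 34, 33, 39, 41, 49, None, 56, 62, 66, 64],
    [16, 23, 23, 34, 38, 39, 44, 44, None, 54, 56, 59],
    [10, 15, 25, 25, 31, 32, 38, 38, 46, None, 53, 56],
    [10, 13, 25, 21, 26, 27, 34, 34, 44, 47, None, 54],
    [14, 19, 20, 26, 30, 31, 35, 36, 41, 44, 46, None],
]

def wins(i):
    """Number of opponents candidate i beats face-to-face."""
    return sum(1 for v in ROWS[i] if v is not None and v > 50)

def borda(i):
    """Sum of candidate i's face-to-face percentages."""
    return sum(v for v in ROWS[i] if v is not None)

n = len(NAMES)
by_wins = sorted(range(n), key=wins, reverse=True)
# The order is a Condorcet-ranking only if every candidate beats every
# candidate placed below it (no Condorcet paradox).
transitive = all(ROWS[by_wins[a]][by_wins[b]] > 50
                 for a in range(n) for b in range(a + 1, n))
condorcet = [NAMES[i] for i in by_wins]
borda_rank = [NAMES[i] for i in sorted(range(n), key=borda, reverse=True)]
print(transitive)
print(condorcet)
print(borda_rank)
```

On table 15.3 the win counts run from 11 down to 0, so the relation is transitive. The Borda order differs from the Condorcet order only in its last two places, where Le Pen comes ahead of Schivardi.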
That all three methods rank Bayrou first, Royal second, and Sarkozy third is not surprising. Except for the Excellents, whose percentages taken alone give the opposite rank-ordering, the percentages of at least Very Good, at least Good, ..., at least Poor all agree with that order (table 15.4). Almost any reasonable election mechanism should agree with this ranking of the three important candidates; the first- and two-past-the-post mechanisms are an exception, but they are not reasonable.
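The cumulative shares of table 15.4 follow directly from the distributions of table 15.2. This short Python sketch recomputes them for the three leading candidates and prints, for each threshold, the order they induce. The names are ours.

```python
GRADES = ["Excellent", "Very Good", "Good", "Acceptable", "Poor", "To Reject"]

# Grade distributions (in %) of the three leading candidates, from table 15.2.
TOP_THREE = {
    "Bayrou":  [13.6, 30.7, 25.1, 14.8,  8.4,  7.4],
    "Royal":   [16.7, 22.7, 19.1, 16.8, 12.2, 12.6],
    "Sarkozy": [19.1, 19.8, 14.3, 11.5,  7.1, 28.2],
}

def at_least(shares):
    """Percentage of voters giving each grade or better, rounded to 0.1."""
    out, running = [], 0.0
    for share in shares:
        running += share
        out.append(round(running, 1))
    return out

cumulative = {name: at_least(d) for name, d in TOP_THREE.items()}

# Order the three candidates at each threshold "at least <grade>".
orders = {}
for level, grade in enumerate(GRADES[:-1]):  # "at least To Reject" is always 100%
    orders[grade] = sorted(cumulative, key=lambda c: cumulative[c][level], reverse=True)
    print(f"at least {grade:<10}: {' > '.join(orders[grade])}")
```

Only the Excellent threshold yields Sarkozy > Royal > Bayrou. Every other threshold yields Bayrou > Royal > Sarkozy.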
Still another explanation of why Bayrou wins with all three methods comes from comparing how voters who gave their highest grade to one of the top three evaluated the other two (table 15.5). Sarkozy voters, voters who gave Sarkozy
Table 15.2
Majority Judgment Results, Three Precincts of Orsay, April 22, 2007

Rank   Candidate      Excellent   Very Good   Good    Acceptable
1st    Bayrou           13.6%       30.7%     25.1%     14.8%
2d     Royal            16.7%       22.7%     19.1%     16.8%
3d     Sarkozy          19.1%       19.8%     14.3%     11.5%
4th    Voynet            2.9%        9.3%     17.5%     23.7%
5th    Besancenot        4.1%        9.9%     16.3%     16.0%
6th    Buffet            2.5%        7.6%     12.5%     20.6%
7th    Bové              1.5%        6.0%     11.4%     16.0%
8th    Laguiller         2.1%        5.3%     10.2%     16.6%
9th    Nihous            0.3%        1.8%      5.3%     11.0%
10th   de Villiers       2.4%        6.4%      8.7%     11.3%
11th   Schivardi         0.5%        1.0%      3.9%      9.5%
12th   Le Pen            3.0%        4.6%      6.2%      6.5%
Table 15.3
Face-to-Face Elections, Percentages of Votes Estimated from Majority Judgment Ballots, Three Precincts of Orsay, April 22, 2007

                    Bayrou   Royal   Sarkozy   Voynet   Besancenot   Buffet   Bové
Bayrou                ·       56       60        77         77         81      83
Royal                44        ·       52        73         74         78      81
Sarkozy              40       48        ·        59         61         64      66
Voynet               23       27       41         ·         56         59      67
Besancenot           23       26       39        44          ·         53      60
Buffet               19       22       36        41         47          ·      57
Bové                 17       19       34        33         40         43       ·
Laguiller            17       20       34        33         39         41      49
de Villiers          16       23       23        34         38         39      44
Nihous               10       15       25        25         31         32      38
Schivardi            10       13       25        21         26         27      34
Le Pen               14       19       20        26         30         31      35
Condorcet-ranking    1st      2d       3d        4th        5th        6th     7th
Borda-ranking        1st      2d       3d        4th        5th        6th     7th
Majority-ranking     1st      2d       3d        4th        5th        6th     7th

Note: For instance, Royal obtains 52% against Sarkozy; symmetrically, Sarkozy obtains 48% against Royal.
Table 15.2 (cont.)

Candidate      Poor     To Reject   Majority-Gauge   Official Order
Bayrou          8.4%      7.4%           …                3d
Royal          12.2%     12.6%           …                1st
Sarkozy         7.1%     28.2%           …                2d
Voynet         26.1%     20.5%           …                7th
Besancenot     22.6%     31.1%           …                5th
Buffet         26.4%     30.4%           …                8th
Bové           25.7%     39.5%           …                9th
Laguiller      25.9%     40.1%           …                10th
Nihous         26.7%     55.0%           …                11th
de Villiers    15.8%     55.5%           …                6th
Schivardi      24.9%     60.4%           …                12th
Le Pen          5.4%     74.4%           …                4th
Table 15.3 (cont.)

                    Laguiller   de Villiers   Nihous   Schivardi   Le Pen
Bayrou                 83           84           90        90         86
Royal                  80           77           85        87         81
Sarkozy                66           77           75        75         80
Voynet                 67           66           75        79         74
Besancenot             61           62           69        74         70
Buffet                 59           61           68        73         69
Bové                   51           56           62        66         65
Laguiller               ·           56           62        66         64
de Villiers            44            ·           54        56         59
Nihous                 38           46            ·        53         56
Schivardi              34           44           47         ·         54
Le Pen                 36           41           44        46          ·
Condorcet-ranking      8th          9th          10th      11th       12th
Borda-ranking          8th          9th          10th      12th       11th
Majority-ranking       8th          10th         9th       11th       12th
Table 15.4
Cumulative Majority Judgment Grades, Three Precincts of Orsay, April 22, 2007

At Least       Bayrou    Royal    Sarkozy
Excellent      13.6%     16.7%    19.1%
Very Good      44.3%     39.4%    38.9%
Good           69.4%     58.5%    53.2%
Acceptable     84.2%     75.3%    64.7%
Poor           92.6%     87.5%    71.8%
To Reject      100%      100%     100%
Table 15.5
Grades for Top Three Candidates, by Voters Giving Their Highest Grade to One of the Other Two Candidates, Three Precincts of Orsay, April 22, 2007

              Bayrou's Grades       Sarkozy's Grades      Royal's Grades
              Royal     Sarkozy     Royal     Bayrou      Bayrou    Sarkozy
              Voters    Voters      Voters    Voters      Voters    Voters
Excellent       7%        6%          3%        6%          7%        3%
Very Good      33%       28%         10%       22%         26%       13%
Good           29%       30%         16%       24%         26%       22%
Acceptable     16%       19%         15%       17%         20%       24%
Poor            9%        9%         11%        6%         13%       18%
To Reject       6%        8%         45%       25%          9%       21%

Note: TNS Sofres poll, March 14–15, 2007, showed 72% of Royal voters giving their votes to Bayrou in a second round against Sarkozy, and 75% of Sarkozy voters giving their votes to Bayrou in a second round against Royal.
Table 15.6
Second-Round Results, Percentages of Votes Estimated from Majority Judgment Ballots versus
Actual Outcomes, Three Precincts of Orsay, April 22, 2007
Three Precincts
1st Precinct
6th Precinct
12th Precinct
51.3%
48.7%
48.2%
51.8%
47.2%
52.8%
54.4%
45.6%
53.7%
46.3%
54.3%
45.7%
52.6%
47.4%
Table 15.7
First-Round Votes, Percentages of Votes Estimated from Majority Judgment Ballots, Three Precincts of Orsay, April 22, 2007

Left          Buffet   Laguiller   Bové   Schivardi   Besancenot   Voynet
Estimate 1      2.6       1.6       1.6      0.4          4.9         3.5
Actual          1.4       0.8       0.9      0.2          2.5         1.7
Estimate 2      2.5       1.5       1.5      0.3          4.6         3.4

Major         Royal    Bayrou    Sarkozy
Estimate 1     25.6     25.6      28.4
Actual         29.9     25.5      29.0
Estimate 2     25.4     25.3      27.4

Right         Nihous   de Villiers   Le Pen
Estimate 1      0.5       2.3          2.9
Actual          0.3       1.9          5.9
Estimate 2      0.4       1.9          5.8

Note: In estimate 1 the vote of a ballot is split evenly among the candidates with the highest grade. Estimate 2 is the same except when Le Pen is among those who received the highest grade, in which case he is accorded the entire vote.
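The two estimates described in the note to table 15.7 can be written as one short procedure. The Python sketch below assumes ballots are coded as elsewhere in this chapter (6 = Excellent, ..., 1 = To Reject). The four ballots, the function name, and its `favored` parameter are hypothetical.

```python
from collections import defaultdict

def first_round_estimate(ballots, favored=None):
    """Estimate first-round percentages from graded ballots.

    Estimate 1 (favored=None): each ballot's single vote is split evenly
    among the candidates it grades highest.
    Estimate 2 (favored="Le Pen"): the same, except that a ballot whose
    highest grade goes (possibly jointly) to the favored candidate gives
    that candidate the entire vote.
    """
    totals = defaultdict(float)
    for ballot in ballots:
        best = max(ballot.values())
        top = [c for c, g in ballot.items() if g == best]
        if favored in top:
            totals[favored] += 1.0
        else:
            for c in top:
                totals[c] += 1.0 / len(top)
    return {c: 100 * v / len(ballots) for c, v in totals.items()}

# Hypothetical ballots (6 = Excellent, ..., 1 = To Reject).
ballots = [
    {"Royal": 5, "Bayrou": 5, "Le Pen": 1},
    {"Royal": 2, "Bayrou": 4, "Le Pen": 4},
    {"Royal": 6, "Bayrou": 3, "Le Pen": 1},
    {"Royal": 1, "Bayrou": 2, "Le Pen": 6},
]
print(first_round_estimate(ballots))
print(first_round_estimate(ballots, favored="Le Pen"))
```

On these ballots only the second one differs between the two estimates: its shared top grade gives half a vote to Bayrou under estimate 1 and the whole vote to Le Pen under estimate 2.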
Table 15.8
Majority Judgment Results, 1st Precinct of Orsay, April 22, 2007

Rank  Candidate     Excellent  Very Good   Good   Acceptable   Poor    To Reject  Majority-Gauge
1st   Bayrou          13.2%      31.3%    25.4%     14.7%      8.2%      7.2%          …
2d    Sarkozy         22.7%      19.3%    16.3%     10.6%      5.4%     24.8%          …
3d    Royal           14.8%      19.5%    20.8%     19.5%     13.1%     12.3%          …
4th   Voynet           2.5%       8.8%    15.6%     22.0%     30.4%     20.8%          …
5th   Besancenot       3.6%       8.9%    12.3%     16.5%     23.4%     35.2%          …
6th   Buffet           1.8%       6.6%    12.3%     20.4%     25.2%     33.6%          …
7th   Bové             1.3%       5.7%    10.8%     14.0%     25.4%     42.8%          …
8th   Laguiller        2.1%       4.8%     6.4%     15.7%     26.7%     42.2%          …
9th   de Villiers      2.3%       5.2%     8.9%     11.4%     16.5%     55.6%          …
10th  Nihous           0.2%       1.6%     4.5%      8.6%     28.6%     56.6%          …
11th  Schivardi        0.4%       0.5%     3.4%      7.5%     24.7%     63.5%          …
12th  Le Pen           2.1%       2.9%     6.4%      7.5%      4.8%     76.2%          …

Table 15.9
Majority Judgment Results, 6th Precinct of Orsay, April 22, 2007

Rank  Candidate     Excellent  Very Good   Good   Acceptable   Poor    To Reject  Majority-Gauge
1st   Bayrou          16.6%      30.8%    22.3%     14.1%      9.0%      7.2%          …
2d    Royal           18.6%      22.6%    19.1%     15.0%     13.3%     11.3%          …
3d    Sarkozy         16.8%      20.0%    13.3%      7.5%      7.1%     30.9%          …
4th   Voynet           3.8%       9.3%    16.5%     25.6%     24.1%     20.6%          …
5th   Besancenot       3.8%       9.8%    16.0%     15.8%     24.5%     30.1%          …
6th   Buffet           3.3%       7.5%    12.5%     20.0%     27.3%     29.5%          …
7th   Bové             1.3%       5.3%    10.3%     16.1%     28.3%     38.6%          …
8th   Laguiller        2.0%       4.5%    10.1%     16.6%     27.3%     39.4%          …
9th   Nihous           0.3%       1.5%     6.8%     10.5%     25.1%     55.7%          …
10th  de Villiers      2.0%       5.7%     7.0%     10.6%     15.8%     58.9%          …
11th  Schivardi        0.3%       0.8%     2.2%      9.7%     26.5%     60.6%          …
12th  Le Pen           1.5%       4.7%     5.7%      5.8%      4.8%     77.5%          …

Table 15.10
Majority Judgment Results, 12th Precinct of Orsay, April 22, 2007

Rank  Candidate     Excellent  Very Good   Good   Acceptable   Poor    To Reject  Majority-Gauge
1st   Royal           16.4%      26.0%    17.5%     16.1%     10.1%     14.0%          …
2d    Bayrou          10.8%      30.0%    27.7%     15.5%      8.0%      7.9%          …
3d    Sarkozy         18.0%      20.1%    13.3%     12.6%      8.4%     27.7%          …
4th   Voynet           2.4%       9.9%    20.4%     23.2%     23.9%     20.1%          …
5th   Besancenot       4.9%      11.0%    20.4%     15.5%     19.9%     28.3%          …
6th   Buffet           2.3%       8.7%    12.7%     21.5%     26.5%     28.3%          …
7th   Bové             1.9%       7.0%    12.9%     18.0%     23.4%     36.8%          …
8th   Laguiller        2.3%       6.5%    11.9%     17.5%     23.6%     38.4%          …
9th   de Villiers      2.8%       8.4%    10.1%     11.9%     15.0%     51.8%          …
10th  Nihous           0.5%       2.3%     4.4%     13.8%     26.4%     52.7%          …
11th  Schivardi        0.7%       1.7%     6.1%     11.2%     23.4%     56.9%          …
12th  Le Pen           5.4%       6.3%     6.6%      6.1%      6.5%     69.1%          …
for the left went to others: about three-quarters to Royal, one-quarter to Sarkozy, and almost none to Bayrou. That almost none went to Bayrou counters the conventional wisdom.
The results in each of the three precincts (tables 15.8–15.10) were very similar to the results in all three precincts together. The set of the first three candidates is always the same; the fourth-place (Voynet) through eighth-place (Laguiller) candidates are identical; the ninth and tenth places oscillate between two candidates (Nihous and de Villiers); and the eleventh and twelfth places are identical (respectively, Schivardi and Le Pen).
Among the top three, Bayrou is not always first, and Sarkozy is not always third. In every case each of the three candidates has a majority-grade of Good.
The results in the 12th precinct are of particular interest for at least two reasons. The first is that Royal is the majority judgment winner, but the face-to-face estimates show that the Condorcet- and Borda-winner is Bayrou (table 15.11). Here is a clear instance showing that the methods of Condorcet and Borda favor the centrist candidate more than the majority judgment does (see chapter 6).
Could Sarkozy have won in all of France with the majority judgment? The distribution of the official votes in the three precincts of Orsay was very far from the vote in the entire nation (see table 6.13). Sarkozy had 2% less in Orsay than nationally, Royal was the official winner in Orsay with 4% more, Bayrou had 7% more, and Le Pen almost 5% less. But if the scores of Royal and Sarkozy in the 12th precinct are switched, the distribution of the votes is close to that of the national percentages (table 15.12).
Setting aside the obvious (a different method of election induces a different electoral campaign), this suggests that Sarkozy could well have been elected with the majority judgment nationally. The statistical evidence (see tables 6.14a and 6.14b) bears this out: Sarkozy is the most probable winner with the majority judgment when samples are drawn from a database whose distribution of the votes approximates that of all of France. This suggests that the majority judgment is not unduly biased in favor of a centrist candidate. On the other hand,
Table 15.11
Projected Second-Round Results, 12th Precinct of Orsay, April 22, 2007

            Bayrou    Royal    Sarkozy
Bayrou        ·       53.5%     59.0%
Royal       46.5%       ·       54.3%
Sarkozy     41.0%     45.7%       ·
Table 15.12
Official First-Round Votes, National and 12th Precinct of Orsay, April 22, 2007

Candidate      National   12th Precinct, Orsay
Sarkozy         31.2%          32.0%
Royal           25.9%          26.6%
Bayrou          18.6%          20.2%
Le Pen          10.4%          10.0%
Besancenot       4.1%           2.7%
de Villiers      2.2%           2.5%
Buffet           1.9%           2.3%
Voynet           1.6%           1.3%
Laguiller        1.3%           1.2%
Bové             1.3%           0.8%
Nihous           1.1%           0.2%
Schivardi        0.3%           0.0%

Note: Sarkozy's and Royal's 12th precinct scores are switched here, for comparison. The actual percentages were Royal 32.0%, Sarkozy 26.6%.
Did all the voters use the language in the same way? Did subsets of the voters use each of the words on average about the same number of times? The immediate answer is an unqualified yes. In each of the three precincts, the average numbers or percentages of Excellents, Very Goods, down to To Rejects were remarkably close to those of all three precincts taken together (table 15.13). But is this exceptional, or are such results to be expected when the subpopulations contain well over 500 ballots (each of the precincts) or consist of 100 or 50 ballots drawn at random?
This regularity holds in the frequency with which every grade is used in each voting precinct (table 15.14). For example (see the Very Good rows, column 2), in the 1st precinct 19.7% of the ballots had two Very Goods, in the 6th precinct the percentage of two Very Goods was 22.0%, and in the 12th it was 20.4%. The corresponding triplets throughout the table are surprisingly close to each other. But again, is this exceptional, or are these results to be expected in any case with subsets that contain well over 500 ballots?

Table 15.13
Average Number of Grades (and Percentages) per Majority Judgment Ballot, Three Precincts of Orsay, April 22, 2007

Precinct    Excellent    Very Good     Good          Acceptable    Poor          To Reject
1st         0.7 (5.6%)   1.2 (9.6%)    1.5 (12.1%)   1.7 (14.0%)   2.3 (19.3%)   4.7 (39.4%)
6th         0.7 (5.9%)   1.2 (10.2%)   1.4 (11.8%)   1.7 (14.3%)   2.3 (19.5%)   4.6 (38.4%)
12th        0.7 (5.7%)   1.4 (11.5%)   1.6 (13.7%)   1.8 (15.2%)   2.2 (17.9%)   4.3 (36.0%)
All three   0.7 (5.7%)   1.3 (10.4%)   1.5 (12.5%)   1.7 (14.5%)   2.3 (18.9%)   4.5 (37.9%)

Table 15.14
Frequencies (in Percentages) in the Use of Grades, Three Precincts of Orsay, April 22, 2007

                          No. of Times Grade Used in a Ballot (%)
Grade        Precinct     0      1      2      3      4      5      6      7     ≥8
Excellent    1st        47.0   43.1    7.7    1.6    0.2    0.2    0.0    0.0    0.2
             6th        46.6   41.8    8.7    2.0    0.7    0.0    0.2    0.0    0.2
             12th       51.1   37.3    7.9    2.3    0.9    0.2    0.0    0.0    0.3
Very Good    1st        30.2   40.3   19.7    6.8    1.1    1.3    0.5    0.2    0.0
             6th        28.8   37.9   22.0    7.2    2.7    0.8    0.3    0.3    0.0
             12th       26.0   37.9   20.4    8.2    4.4    2.1    0.7    0.3    0.0
Good         1st        24.3   35.1   22.2   11.4    4.7    1.4    0.7    0.2    0.0
             6th        26.3   35.1   20.5   10.1    5.3    2.2    0.3    0.2    0.0
             12th       21.8   30.4   25.5   12.0    7.2    2.3    0.3    0.3    0.2
Acceptable   1st        23.3   29.3   20.0   16.8    6.4    3.6    0.2    0.0    0.4
             6th        22.6   28.8   24.1   13.0    6.5    3.7    0.3    0.5    0.5
             12th       22.5   23.0   24.6   17.1    7.3    3.8    0.5    0.9    0.2
Poor         1st        16.5   20.0   22.9   15.9   14.0    5.5    2.9    1.4    0.9
             6th        16.3   24.0   19.5   17.0    9.5    5.7    5.8    1.0    1.3
             12th       23.2   20.8   18.5   15.2   10.6    6.1    3.1    1.4    1.0
To Reject    1st         3.0    6.1   10.7   12.0   16.3   17.2   10.4    9.3   15.0
             6th         4.7    4.7    9.2   17.0   18.1   14.5   11.0    7.3   13.6
             12th        7.0    7.3   14.5   14.0   14.5   13.8    7.3    7.0   14.7
To determine whether the grades of the language of evaluation are really used in the same way (that is, to see that the raw data are exceptional and not simply the natural consequence of the sample sizes), a finer analysis is needed. This analysis is rather technical, and readers may prefer to skip this material and go directly to the conclusions.
Let X be a discrete random variable, taking values in {1, ..., 6}, whose value represents the grade a voter gives a candidate: Excellent is 6, Very Good 5, Good 4, Acceptable 3, Poor 2, and To Reject 1. The greater their numerical difference, the greater is the difference between two grades. Let $G_X$ be the underlying (cumulative) distribution function and $\widehat{G}_X$ the observed or empirical (cumulative) distribution function for all of Orsay. The vector of percentages (divided by 100)⁵ given in table 15.13 gives the observed density function for all of Orsay,

$$(\text{To Reject},\ \text{Poor},\ \text{Acceptable},\ \text{Good},\ \text{Very Good},\ \text{Excellent}) = (0.379,\ 0.189,\ 0.145,\ 0.125,\ 0.104,\ 0.057).$$

5. The exact numbers sum to 1; rounding them to three decimal places may result in a sum slightly above or below 1, as in this case.
The raw data are comforting because the difference between the distribution of any of the subpopulations and the empirical distribution of the whole is small. In general, fix a subpopulation of some size M, and let d(M) be the distance between the observed distribution of X in the subpopulation and the observed distribution $\widehat{G}_X$ of Orsay. Imagine that the distance is computed for every subpopulation of size M, giving the distribution of d(M); call it $F_{d(M)}$. The difficulty is that it is impossible to generate $F_{d(M)}$ because there are far too many subpopulations of any reasonably sized M. For example, if M = 20, there are $\binom{1733}{20} \approx 10^{46}$ different subpopulations (an amount far less than a googol, $10^{100}$, but far more than the total number of grains of sand on all the world's beaches, which is estimated to be less than $10^{19}$). In consequence, a Monte Carlo approach is used to estimate $F_{d(M)}$: a large number of subpopulations of size M are drawn randomly and independently from the whole population, and the empirical distribution of their distances is taken as an approximation of $F_{d(M)}$.
Two questions at once present themselves: How should d(M) be defined? How many subpopulations must be drawn for the results to be significant?
If $q_X = (q_1, \ldots, q_n)$ and $q'_X = (q'_1, \ldots, q'_n)$ are density functions of grades when the language contains n grades, their respective distribution functions are

$$Q_X = (q_1,\ q_1 + q_2,\ q_1 + q_2 + q_3,\ \ldots,\ 1)$$

and

$$Q'_X = (q'_1,\ q'_1 + q'_2,\ q'_1 + q'_2 + q'_3,\ \ldots,\ 1).$$

Define the distance between them to be

$$d(Q_X, Q'_X) = \frac{1}{n-1} \sum_{i=1}^{n-1} \left| Q^i_X - Q'^i_X \right| = \frac{1}{n-1} \sum_{i=1}^{n-1} \left| \sum_{k=1}^{i} (q_k - q'_k) \right|.$$
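The distance just defined is easy to compute from two density vectors. This Python sketch orders the grades from To Reject (first) to Excellent (last), as in the text, and checks the extreme case discussed next. The function names are ours.

```python
def cumulative(q):
    """Distribution function Q = (q1, q1 + q2, ..., 1) from a density q."""
    out, running = [], 0.0
    for x in q:
        running += x
        out.append(running)
    return out

def distance(q, q_prime):
    """d(Q, Q') = (1 / (n - 1)) * sum over i = 1..n-1 of |Q^i - Q'^i|."""
    if len(q) != len(q_prime):
        raise ValueError("densities must have the same number of grades")
    n = len(q)
    Q, Q_prime = cumulative(q), cumulative(q_prime)
    return sum(abs(Q[i] - Q_prime[i]) for i in range(n - 1)) / (n - 1)

# Grades ordered from To Reject (first) to Excellent (last).
all_excellent = [0, 0, 0, 0, 0, 1]
all_reject = [1, 0, 0, 0, 0, 0]
orsay = [0.379, 0.189, 0.145, 0.125, 0.104, 0.057]  # all of Orsay, table 15.13

print(distance(all_excellent, all_reject))  # 1.0
print(distance(orsay, orsay))               # 0.0
```

The last component of every distribution function is 1, so it is left out of the sum, and 1/(n − 1) scales the distance into [0, 1].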
When n = 6, the two density functions that are furthest apart are q = (0, 0, 0, 0, 0, 1) and q' = (1, 0, 0, 0, 0, 0), which correspond to the two distribution functions Q = (0, 0, 0, 0, 0, 1) and Q' = (1, 1, 1, 1, 1, 1), so d(Q, Q') = 1. If they
Table 15.15
Estimated Distributions of $F_{d(M)}$, 2007 Orsay Experiment

d(M) in Interval   M = 100   M = 50   M = 20   M = 10
[0, .10]            100%      100%     100%      98%
[0, .05]            100%       99%      91%      75%
[0, .03]             98%       90%      64%      38%
[0, .02]             88%       68%      35%      14%
[0, .01]             43%       20%       5%       1%
[0, .005]             7%        2%       0%       0%
by a voter is drawn i.i.d. from $G_X$. This implies that if two voters were to judge a great many candidates, then the distributions of their grades would be the same even if their preferences over the candidates were completely different. It also implies that with a fixed number of candidates but a large number of voters, the total distribution of the grades would be very close to $\widehat{G}_X$.
A nonhomogeneous population (non-H-P) is a population in $P(\widehat{G}_X)$ whose voters are at the opposite pole: every voter assigns the same grade to every candidate, so the voters belong to six different mutually exclusive sets, those who always cast an Excellent, ..., those who always cast a To Reject. This implies that those who always vote Excellent constitute 5.7% of the population, ..., and those who always vote To Reject constitute 37.9% of the population.
The idea of this approach is to situate the observed behavior of the actual population on the line going from homogeneous to nonhomogeneous population. If the actual behavior is relatively close to homogeneous behavior, then the language has been used in about the same way; if it is relatively close to nonhomogeneous behavior, then the language has not been used in the same way. Of course, the use of the grades depends on the candidates. When a population includes a wide spectrum of opinion and so do the candidates (as was the case in the French presidential election of 2007), then, if there is a common language, the observed behavior in the use of grades should be homogeneous. But if the candidates do not match the population's diversity of opinion, then even with a common language the use of the grades should be expected to be less homogeneous. If all the candidates are of the left, the population to the right will assign grades quite differently from the population to the left.
For H-P and non-H-P, $F_{d(M)}$ is estimated as it was for the Orsay population (Orsay-P). However, whereas non-H-P and Orsay-P are fixed, well-defined populations of 1,733 voters from which samples may be drawn, there is no fixed, well-defined population of voters for the H-P model, which is defined probabilistically. Accordingly, the estimates of $F_{d(M)}$ are carried out as follows:

For non-H-P and Orsay-P, draw 10,000 random samples of size M and calculate the d(M)s that determine the approximation.

For H-P, generate ten different base populations of 1,733 ballots, each ballot generated according to the distribution $\widehat{G}_X$; draw 10,000 random samples of size M and calculate the d(M)s for each base population; the average values over the ten base populations determine the approximation. Ten sufficed because they were almost the same.
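The estimation procedure above can be sketched in Python. This is a simplified illustration, not the authors' code. Its assumptions:

- Each ballot grades all twelve candidates.
- Distances are measured against the all-Orsay density of table 15.13.
- A single H-P base population is used rather than ten, and 1,000 draws rather than 10,000.
- The seed and all names are hypothetical.

Its output should only roughly resemble the d(M) ≤ .05 row of table 15.16a.

```python
import random

N_CANDIDATES = 12
GRADE_VALUES = [1, 2, 3, 4, 5, 6]  # 1 = To Reject, ..., 6 = Excellent
# Observed density for all of Orsay (table 15.13), To Reject first.
ORSAY_DENSITY = [0.379, 0.189, 0.145, 0.125, 0.104, 0.057]

def density(ballots):
    """Share of each grade among all grades on the given ballots."""
    counts = [0] * len(GRADE_VALUES)
    for ballot in ballots:
        for grade in ballot:
            counts[grade - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]

def distance(q, q_prime):
    """d(Q, Q'): mean absolute gap between the first n - 1 cumulative values."""
    Q = Q_prime = gap = 0.0
    for a, b in zip(q[:-1], q_prime[:-1]):
        Q += a
        Q_prime += b
        gap += abs(Q - Q_prime)
    return gap / (len(q) - 1)

def homogeneous_population(size, q, rng):
    """H-P: every grade on every ballot is drawn i.i.d. from q."""
    return [rng.choices(GRADE_VALUES, weights=q, k=N_CANDIDATES) for _ in range(size)]

def nonhomogeneous_population(size, q):
    """non-H-P: each voter gives one grade to all candidates, in proportions q."""
    ballots = []
    for grade, share in zip(GRADE_VALUES, q):
        ballots += [[grade] * N_CANDIDATES for _ in range(round(share * size))]
    return ballots

def share_within(population, M, threshold, q, draws, rng):
    """Monte Carlo estimate of the share of size-M samples with d(M) <= threshold."""
    hits = 0
    for _ in range(draws):
        sample = rng.sample(population, M)
        if distance(density(sample), q) <= threshold:
            hits += 1
    return hits / draws

rng = random.Random(2007)  # arbitrary seed
hp = homogeneous_population(1733, ORSAY_DENSITY, rng)
nhp = nonhomogeneous_population(1733, ORSAY_DENSITY)
results = {}
for M in (100, 50, 20, 10):
    results[M] = (share_within(hp, M, 0.05, ORSAY_DENSITY, 1000, rng),
                  share_within(nhp, M, 0.05, ORSAY_DENSITY, 1000, rng))
    print(f"M={M:3}: H-P {results[M][0]:.0%}, non-H-P {results[M][1]:.0%}")
```

Sampling is without replacement (`rng.sample`), as when subpopulations are drawn from a fixed electorate. The non-H-P population is built from rounded counts, so its size may differ from 1,733 by a ballot or two.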
The results are given in table 15.16a. Focus, for example, on the percentage of the subpopulations for which d(M) ≤ .05.
When M = 100, all the samples from the homogeneous population (H-P) and
the Orsay population (Orsay-P) are within that distance, whereas only 88% of
those from the nonhomogeneous population (non-H-P) are within it.
When M = 50, all the samples from H-P are within it, 99% of the samples
from Orsay-P are within it, whereas only 65% of those from non-H-P are.
When M = 20, 98% from H-P are within it, 91% from Orsay-P are within it,
whereas a mere 29% from non-H-P are.
When M = 10, 90% from H-P are within it, 75% from Orsay-P are within it,
whereas only 8% from non-H-P are.
Table 15.16a
Estimated Distributions of $F_{d(M)}$, 2007 Orsay Experiment

                           M = 100                        M = 50
d(M) in Interval    H-P    Orsay-P   non-H-P      H-P    Orsay-P   non-H-P
[0, 1]              100%    100%      100%        100%    100%      100%
[0, .10]            100%    100%      100%        100%    100%       98%
[0, .05]            100%    100%       88%        100%     99%       65%
[0, .03]            100%     98%       54%         98%     90%       26%
[0, .02]             97%     88%       23%         86%     68%        7%
[0, .01]             65%     43%        2%         37%     20%        0%
[0, .005]            16%      7%        0%          5%      2%        0%

                           M = 20                         M = 10
d(M) in Interval    H-P    Orsay-P   non-H-P      H-P    Orsay-P   non-H-P
[0, 1]              100%    100%      100%        100%    100%      100%
[0, .10]            100%    100%       81%        100%     98%       55%
[0, .05]             98%     91%       29%         90%     75%        8%
[0, .03]             82%     64%        6%         58%     38%        1%
[0, .02]             54%     35%        1%         27%     14%        0%
[0, .01]             10%      5%        0%          3%      1%        0%
[0, .005]             1%      0%        0%          0%      0%        0%
Table 15.16b
Estimated Quantiles $r_k$ of $F_{d(M)}$, 2007 Orsay Experiment

                      M = 100                        M = 50
Quantile     H-P     Orsay-P   non-H-P      H-P     Orsay-P   non-H-P
99%         .0233    .0315     .0804       .0335    .0457     .1153
75%         .0116    .0155     .0393       .0165    .0223     .0564
50%         .0081    .0108     .0284       .0117    .0159     .0404

                      M = 20                         M = 10
Quantile     H-P     Orsay-P   non-H-P      H-P     Orsay-P   non-H-P
99%         .0542    .0750     .1884       .0782    .1083     .2532
75%         .0263    .0350     .0913       .0376    .0500     .1313
50%         .0190    .0245     .0658       .0268    .0350     .0916
Table 15.17a
Estimated Distributions of $F_{d(M)}$, Seven Candidates of the Left, 2007 Orsay Experiment

                           M = 100                        M = 50
d(M) in Interval    H-P    Orsay-P   non-H-P      H-P    Orsay-P   non-H-P
[0, 1]              100%    100%      100%        100%    100%      100%
[0, .10]            100%    100%      100%        100%    100%       98%
[0, .05]            100%     99%       91%        100%     93%       69%
[0, .03]             99%     88%       57%         94%     70%       28%
[0, .02]             92%     67%       25%         74%     42%        8%
[0, .01]             47%     22%        2%         20%      8%        0%
[0, .005]             7%      3%        0%          2%      1%        0%

                           M = 20                         M = 10
d(M) in Interval    H-P    Orsay-P   non-H-P      H-P    Orsay-P   non-H-P
[0, 1]              100%    100%      100%        100%    100%      100%
[0, .10]            100%     98%       84%        100%     89%       57%
[0, .05]             95%     71%       31%         81%     48%        8%
[0, .03]             68%     38%        6%         41%     17%        0%
[0, .02]             36%     15%        1%         15%      5%        0%
[0, .01]              4%      1%        0%          1%      0%        0%
[0, .005]             0%      0%        0%          0%      0%        0%
Table 15.17b
Estimated Quantiles $r_k$ of $F_{d(M)}$, Seven Candidates of the Left, 2007 Orsay Experiment

                      M = 100                        M = 50
Quantile     H-P     Orsay-P   non-H-P      H-P     Orsay-P   non-H-P
99%         .0277    .0485     .0745       .0404    .0688     .1055
75%         .0141    .0228     .0375       .0204    .0329     .0545
50%         .0103    .0157     .0274       .0148    .0225     .0392

                      M = 20                         M = 10
Quantile     H-P     Orsay-P   non-H-P      H-P     Orsay-P   non-H-P
99%         .0652    .1108     .1735       .0902    .1608     .2459
75%         .0330    .0523     .0865       .0458    .0749     .1265
50%         .0238    .0351     .0635       .0334    .0510     .0907
Now suppose that the only candidates in the French election were the four of the right (see table 6.8); that is, ignore the grades given to the other eight candidates, and repeat the same analysis. The observed density function of all of Orsay is in this case

(To Reject, Poor, Acceptable, Good, Very Good, Excellent) = (…).
The results are given in tables 15.18a and 15.18b. For the four candidates of the right, the quantiles of the Orsay population are again relatively close to those of the homogeneous population, much as they were for all twelve candidates (table 15.18b). That is a surprise: one would expect them to be less close. Why?
One clear difference among the three cases is that as the number of candidates decreases, the quantiles increase. For example, when M = 50, $r_{99}(\text{H-P}) = .0335$ for all twelve candidates, $r_{99}(\text{H-P}) = .0404$ for the seven candidates, and $r_{99}(\text{H-P}) = .0578$ for the four candidates. The same is true for all values of M.
It has already been observed that with a very large number of candidates, the distributions of the grades of any two voters in a homogeneous population would be (almost) the same, namely $\widehat{G}_X$. By the law of large numbers, as the number of candidates increases, the distance between the homogeneous population and $\widehat{G}_X$ approaches zero. Note in passing that if each of the six values of $12\widehat{G}_X$ were an integer (e.g., $12\widehat{G}_X = (4, 7, 9, 10, 11, 12)$, meaning the average number of Excellent is 1, of Very Good 1, of Good 1, of Acceptable 2, of Poor 3, and of To Reject 4), then it would have been possible to generate a random population with distance from $\widehat{G}_X$ exactly 0.
A perfectly homogeneous population (P-H-P) is one whose distance d(M) from $\widehat{G}_X$ is zero. As was observed, the distance of a homogeneous from a perfectly homogeneous population varies as the number of candidates varies: the fewer the candidates, the greater the distance.
What is very striking in the data is that for each M the distributions $F_{d(M)}$ of the distances of the nonhomogeneous populations from the perfectly homogeneous population are almost identical whether it concerns all candidates, the seven candidates of the left, or the four candidates of the right. For example, compare the distributions for M = 50 (table 15.19).
The same is true for each of the other values of M (as the reader may verify).
It is as if in each case there is an absolute, identical nonhomogeneous wall.
Table 15.18a
Estimated Distributions of $F_{d(M)}$, Four Candidates of the Right, 2007 Orsay Experiment

                           M = 100                        M = 50
d(M) in Interval    H-P    Orsay-P   non-H-P      H-P    Orsay-P   non-H-P
[0, 1]              100%    100%      100%        100%    100%      100%
[0, .10]            100%    100%      100%        100%     99%       97%
[0, .05]             99%     99%       88%         97%     91%       68%
[0, .03]             94%     87%       58%         79%     66%       30%
[0, .02]             77%     65%       28%         52%     40%        9%
[0, .01]             29%     21%        3%         10%      6%        0%
[0, .005]             3%      2%        0%          1%      0%        0%

                           M = 20                         M = 10
d(M) in Interval    H-P    Orsay-P   non-H-P      H-P    Orsay-P   non-H-P
[0, 1]              100%    100%      100%        100%    100%      100%
[0, .10]            100%     97%       82%         95%     87%       57%
[0, .05]             83%     70%       33%         59%     47%        9%
[0, .03]             48%     37%        7%         22%     16%        0%
[0, .02]             20%     15%        1%          6%      4%        0%
[0, .01]              2%      1%        0%          0%      0%        0%
[0, .005]             0%      0%        0%          0%      0%        0%
Table 15.18b
Estimated Quantiles $r_k$ of $F_{d(M)}$, Four Candidates of the Right, 2007 Orsay Experiment

                      M = 100                        M = 50
Quantile     H-P     Orsay-P   non-H-P      H-P     Orsay-P   non-H-P
99%         .0401    .0520     .0814       .0578    .0735     .1147
75%         .0193    .0235     .0386       .0278    .0345     .0557
50%         .0135    .0160     .0272       .0195    .0227     .0387

                      M = 20                         M = 10
Quantile     H-P     Orsay-P   non-H-P      H-P     Orsay-P   non-H-P
99%         .0926    .1108     .1866       .1352    .1608     .2734
75%         .0438    .0523     .0883       .0634    .0749     .1266
50%         .0310    .0351     .0634       .0443    .0510     .0908
Table 15.19
$F_{d(50)}$ for Different Sets of Candidates

d(M) in Interval   All 12 candidates   7 candidates of the left   4 candidates of the right
[0, .005]                 0%                    0%                          0%
[0, .01]                  0%                    0%                          0%
[0, .02]                  7%                    8%                          9%
[0, .03]                 26%                   28%                         30%
[0, .05]                 65%                   69%                         68%
[0, .10]                 98%                   98%                         97%
[0, 1]                  100%                  100%                        100%
On the other hand, the distance of the homogeneous population from the perfectly homogeneous population does vary. Therefore, to situate the observed behavior of the actual populations on comparable lines in each of the three cases, take the two ends of the line to be the perfectly homogeneous population and the nonhomogeneous population, and see where the actual populations lie in comparison with the homogeneous populations. For any k, define the measure of closeness of a population P to a perfectly homogeneous population, relative to the nonhomogeneous population, to be
$$\rho_{k\%}(P) = \frac{100\, r_k(P)}{r_k(\text{non-H-P})}\,\%,$$
so that the perfectly homogeneous population scores 0% and the nonhomogeneous population 100%.
Table 15.20
Measures of Closeness $\rho_{k\%}$ of Orsay-P and H-P to a Perfectly Homogeneous Population, relative to the Non-Homogeneous Population, 2007 Orsay Experiment

All 12 candidates
             99%                 75%                 50%
             H-P      Orsay-P    H-P      Orsay-P    H-P      Orsay-P
M = 100      29.0%    39.2%      29.5%    39.4%      28.5%    38.0%
M = 50       29.1%    39.6%      29.3%    39.5%      29.0%    39.4%
M = 20       29.4%    40.7%      28.8%    38.3%      28.9%    37.2%
M = 10       30.9%    42.8%      28.6%    38.1%      29.3%    38.2%

7 candidates of the left
             99%                 75%                 50%
             H-P      Orsay-P    H-P      Orsay-P    H-P      Orsay-P
M = 100      37.2%    65.1%      37.6%    60.8%      37.6%    57.3%
M = 50       38.3%    65.2%      37.4%    60.4%      37.8%    57.4%
M = 20       37.6%    63.9%      38.2%    60.5%      37.5%    55.3%
M = 10       36.7%    65.4%      36.2%    59.2%      36.8%    56.2%

4 candidates of the right
             99%                 75%                 50%
             H-P      Orsay-P    H-P      Orsay-P    H-P      Orsay-P
M = 100      49.3%    63.9%      50.0%    60.9%      49.6%    58.8%
M = 50       50.4%    64.1%      49.9%    61.9%      50.4%    58.7%
M = 20       49.6%    59.4%      49.6%    59.2%      48.9%    55.4%
M = 10       49.5%    58.8%      50.1%    59.2%      48.8%    56.2%

Table 15.21
Values of $\rho(P)$

             All 12 candidates   7 candidates of the left   4 candidates of the right
P-H-P        0%                  0%                         0%
H-P          29.2%               37.4%                      49.7%
Orsay-P      39.2%               60.6%                      59.7%
non-H-P      100%                100%                       100%
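The closeness measure is a simple ratio of quantiles, so the four-candidates-of-the-right block of Table 15.20 can be recomputed from Table 15.18b. The Python sketch below does this and also averages the twelve values for each population; those averages appear to agree with the corresponding entries of Table 15.21 to within rounding. The dictionary layout and the function name `closeness` are illustrative choices, not notation from the text.

```python
# Quantiles r_k of F_d(M), four candidates of the right (Table 15.18b),
# stored as (99%, 75%, 50%) for each population and each M.
R = {
    100: {"H-P": (.0401, .0193, .0135), "Orsay-P": (.0520, .0235, .0160),
          "non-H-P": (.0814, .0386, .0272)},
    50: {"H-P": (.0578, .0278, .0195), "Orsay-P": (.0735, .0345, .0227),
         "non-H-P": (.1147, .0557, .0387)},
    20: {"H-P": (.0926, .0438, .0310), "Orsay-P": (.1108, .0523, .0351),
         "non-H-P": (.1866, .0883, .0634)},
    10: {"H-P": (.1352, .0634, .0443), "Orsay-P": (.1608, .0749, .0510),
         "non-H-P": (.2734, .1266, .0908)},
}


def closeness(M, population, q):
    """rho_{k%}(P) = 100 * r_k(P) / r_k(non-H-P), in percent; q indexes the quantile."""
    return 100 * R[M][population][q] / R[M]["non-H-P"][q]


# One row per M, columns ordered as in Table 15.20: (99%, 75%, 50%) x (H-P, Orsay-P).
for M in R:
    row = [f"{closeness(M, p, q):5.1f}%" for q in range(3) for p in ("H-P", "Orsay-P")]
    print(f"M = {M:<4}", *row)

# Averages over the twelve (M, quantile) cells.
for p in ("H-P", "Orsay-P"):
    values = [closeness(M, p, q) for M in R for q in range(3)]
    print(p, "average:", round(sum(values) / len(values), 1))  # 49.7, then 59.7
```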
Why is the Orsay population closer to the homogeneous population for the four candidates of the right than for the seven candidates of the left? The answer is found by looking at the particular candidacies. The four candidates of the right are Sarkozy, de Villiers, Nihous, and Le Pen; 55% to 60% of the voters gave a grade of To Reject to each of the last three, and some 70% to 80% of the voters graded them Poor or worse (see table 15.2). The observed distribution function over the four shows 53% To Reject and 67% Poor or worse. The voters of the left must have given these three candidates grades not all that different from those given by the voters of the right. It is accordingly not surprising to find that the language was relatively close to homogeneous for the four candidates of the right. In contrast, there is little doubt that the voters of the right gave the seven candidates of the left grades very different from those given by the voters of the left, as the analysis shows.
15.4 Conclusion
The analysis of the 2007 Orsay experiment demonstrates several key facts concerning the language of grades (the inputs of the majority judgment).
First, the outputs of the majority judgment are reasonable:
• The distributions of the candidates' grades alone suffice for knowledgeable observers to deduce the candidates' identities.
• The distributions of the grades are consistent across voting precincts (though they yield different majority-orders).
16
Here this or that has happened, will happen, must happen; but he invents: Here this or
that might, could, or ought to happen. If he is told that something is the way it is, he will
think: Well, it could probably just as well be otherwise.
Robert Musil
Chapter 16
judgment it may be argued that they are not very paradoxical and indeed perhaps sensible. Moreover, it is shown that the majority judgment is consistent in its evaluations. In any case, it is proven in chapter 17 that the only way to avoid any one of these lumps is to use a point-summing method.
Averages (that is, point-summing methods) seem eminently reasonable because whenever any one voter or judge raises or lowers the grade of a competitor, the competitor's final grade is raised (a little) or lowered (a little). The same, of course, is true of the majority-values that determine the majority-rankings (though the majority-grades may not change). There are four main arguments against sums or averages: first, the number-grades must have well-defined meanings to constitute a common language; second, for sums or averages to make any sense at all, the grades must belong to an interval measure (which is almost never true in voting); third, the use of sums or averages unduly favors centrist political candidates and often eliminates exceptional competitors in favor of competitors who are merely middling in every dimension; and fourth, methods based on sums or averages are by far the most manipulable.
16.1 Majority and Average Objections
The majority judgment is motivated by the need for a method that avoids Arrow's paradox, avoids Condorcet's paradox, combats strategic manipulation, and, of course, satisfies unanimity and impartiality. The inputs to the basic model are grades instead of comparisons, for otherwise there is no escaping dependence on irrelevant alternatives. Indeed, it has been shown that the only way to avoid the Arrow and Condorcet paradoxes is to ignore who gives what grade (theorem 9.2): only a candidate's set of grades counts. It is evident that examples may easily be invented where the outcome of the majority decision of the traditional model differs from that of the majority judgment or of a method that relies on sums or averages of grades. Observe that only two candidates are necessary either to attack or to support the majority judgment because it is independent of irrelevant alternatives: adding a third candidate can change nothing concerning the first two.
Example 16.1: New Model versus Traditional Model The new model uses grades; the intensities of judges and voters count. When the common language is numbers with well-defined meanings, say, from a low of 0 to a high of 20, point-summing methods immediately leap to mind. Point-summing methods and the majority judgment both yield transitive rankings, but when pairs of candidates are compared by a majority of the voters' preferences, the results are not always transitive. Thus the following example, in which 2k + 1 voters or judges give grades to two competitors X and Y, should come as no surprise:
            k judges          1 judge   k judges
X:          20, . . . , 20,   10,       0, . . . , 0
Y:          19, . . . , 19,   9,        19, . . . , 19
If the grades are listed according to the order of the judges, and a higher grade implies a preference for that candidate, the majority candidate in the traditional model is X, with k + 1 votes against Y's k. However, X's majority-grade and average grade are both 10, whereas Y's majority-grade is 19 and Y's average grade is a shade under 19, so Y is the winner over X with both methods of the new model. The two points of view are simply not compatible.
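The arithmetic of example 16.1 is easy to check mechanically. The following Python sketch (an illustration, not the authors' code; k = 5 is an arbitrary choice and the helper names are hypothetical) computes the head-to-head count of the traditional model, the majority-grade (taken as the lower middlemost grade, which is the median for an odd number of judges), and the average.

```python
def majority_grade(grades):
    """Lower middlemost grade; for an odd number of judges this is the median."""
    ordered = sorted(grades, reverse=True)
    return ordered[len(ordered) // 2]


def head_to_head(xs, ys):
    """Traditional model: each judge prefers the candidate she grades higher."""
    x_votes = sum(1 for x, y in zip(xs, ys) if x > y)
    y_votes = sum(1 for x, y in zip(xs, ys) if y > x)
    return x_votes, y_votes


k = 5  # any k >= 1 tells the same story
X = [20] * k + [10] + [0] * k
Y = [19] * k + [9] + [19] * k

print(head_to_head(X, Y))                           # (6, 5): X wins the pairwise vote
print(majority_grade(X), majority_grade(Y))         # 10 19: Y has the higher majority-grade
print(sum(X) / len(X), round(sum(Y) / len(Y), 2))   # 10.0 18.09: Y has the higher average
```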
But do all the voters really see a significant difference between 20 and 19 or
between 10 and 9? In a large electorate the distinction is clearly too fine: 20
and 19 are about the same, say, Excellent, as are 10 and 9, say, Acceptable, and
the 0s are To Reject. This yields
            k judges                        1 judge       k judges
X:          Excellent, . . . , Excellent,   Acceptable,   To Reject, . . . , To Reject
Y:          Excellent, . . . , Excellent,   Acceptable,   Excellent, . . . , Excellent
            k judges          1 judge   k judges
X:          12, . . . , 12,   12,       4, . . . , 4
Y:          16, . . . , 16,   8,        8, . . . , 8
Here, under the same assumptions as in the last example, the majority candidate of the traditional model is Y, with 2k votes against X's 1, and Y is also the average-vote winner, with a score of slightly under 12 to X's slightly over 8. But X's majority-grade is 12 and Y's is 8, so X is the majority judgment winner. The situation, however, is highly artificial; nothing remotely resembling it has been encountered in practice. And as a mathematical possibility, it is very rare.
Under the impartial culture assumption with 2k + 1 voters, the probability that half the voters give to two candidates more than their majority-grades is of the order $2^{1-k}$. Moreover, one judge is able to make the majority-grade of X be any grade from 4 to 12 and the majority-grade of Y any grade from 8 to 16.
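The last claim can be checked by letting the middle judge of the example range over every grade from 0 to 20 while the other 2k judges keep their grades. In the Python sketch below (k = 4 is an arbitrary choice, and the helper is hypothetical), X's majority-grade turns out to be the middle judge's grade clamped to the interval [4, 12], and Y's is clamped to [8, 16].

```python
def majority_grade(grades):
    """Lower middlemost grade; for an odd number of judges this is the median."""
    ordered = sorted(grades, reverse=True)
    return ordered[len(ordered) // 2]


k = 4
# The middle judge tries every grade g; the other 2k judges keep their grades.
outcomes_x = {majority_grade([12] * k + [g] + [4] * k) for g in range(21)}
outcomes_y = {majority_grade([16] * k + [g] + [8] * k) for g in range(21)}

print(sorted(outcomes_x))  # [4, 5, 6, 7, 8, 9, 10, 11, 12]
print(sorted(outcomes_y))  # [8, 9, 10, 11, 12, 13, 14, 15, 16]
```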
In any case, it is perfectly reasonable for X to be the winner: a majority gives X the grade 12 and a majority gives Y the grade 8. Why should this majority decision (a decision reached by looking at the example horizontally) be any less valid than the traditional model's majority decision (a decision reached by looking at the example vertically)? The notion of majority is not an axiom of the traditional model. Unanimity is demanded, but it is satisfied by the majority judgment: when all of X's grades are above Y's grades, then X is ranked above Y.
Example 16.3: Majority Judgment versus Average (or Sum)
A more insightful
            k judges          1 judge   k judges
X:          20, . . . , 20,   20,       0, . . . , 0
Y:          20, . . . , 20,   19,       19, . . . , 19
The majority judgment winner is X, but most observers who look casually at this example opt for Y.¹ The habits of a lifetime immediately suggest that the grades of each candidate should be added, or their averages computed, so Y leads X by a factor of almost 2. Alternatively, if the grades are listed in the order of the judges, the traditional model's point of view says that Y wins with k votes against X's 1, because the first k judges are indifferent.
But unless the 2k + 1 judges are a small number of very discerning experts, almost none see a real difference between grades of 20 and 19 (they are all Excellent), and the 0s are To Reject. Then the objection disappears: Y is evaluated Excellent by all the voters and is the big winner. If, on the other hand, the judges are a few experts who are able to make the fine distinction between 19 and 20, then the 0s are almost surely manifestations of strategic manipulation. In this extreme example it is then perfectly sensible to let the majority-grades decide.
1. Roberto Cominetti proposed the example in January 2007 and reported that everyone he asked said Y should be the winner.
When an election is in the offing with a large number of voters, then, as has been emphasized repeatedly, the majority judgment must offer a language that has only a few grades so that it will be understandable to all participants; otherwise, the language will not be common to all, and the results become meaningless. When the grades used are those of the presidential election experiments (Excellent down to To Reject), an alternative example meant to express an objection similar to the preceding one is the following:
            k judges                        1 judge      k judges
X:          Excellent, . . . , Excellent,   Excellent,   Poor, . . . , Poor
Y:          Excellent, . . . , Excellent,   Good,        Good, . . . , Good
Excellent   Very Good   Good   Acceptable   Poor   To Reject
X: 9%, 40%, 51%
Y: 4%, 47%, 49%
2. Those given in percentages are very slight modifications of examples due to Monzoor Ahmad Zahid, communicated by H. de Swart, January 2009.
X is the more centrist, more consensual candidate; Y is the candidate with more confirmed political views (and thus has more high grades and more low grades than X), and he happens to be given a higher grade by a majority, so he wins with the majority judgment.
A very similar example with a jury of eleven expert judges is the following:
Excellent   Very Good   Good   Acceptable   Poor   To Reject
X:
Y:
Y's majority-grade is again Very Good; X's is only Good. X is the competitor who has comfortably high grades all around but has failed to arouse the enthusiasm of a majority, whereas Y has won the enthusiasm of a majority. However, the multitude of 0s suggests that some of the grades may well have been deliberately exaggerated.
In the following example, both candidates have the majority-grade Very Good+, but Y's majority-gauge (50%, Very Good+, 49%) is higher than X's (1%, Very Good+, 0%) (the same is true of their majority-values). The summing-eye once again prefers X, the centrist. The 1% of voters who assigned Very Good to Y can, by changing their grade, give Y any one of the six possible majority-grades and so can decide who is the winner.
      Excellent   Very Good   Good   Acceptable   Poor   To Reject
X:    1%          99%
Y:    50%         1%                                     49%
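The majority-gauges quoted above, and the way the choice of points decides a sum, can be reproduced in a few lines. In this Python sketch the percentages are treated as counts out of 100 voters and the majority-grade is the lower middlemost grade. Both point scales are illustrative assumptions: the first runs from 5 down to 0, and the second, loosely modeled on the wine example that follows, gives Excellent 10 points and the other grades 4, 3, 2, 1, 0. All helper names are hypothetical.

```python
GRADES = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]


def expand(dist):
    """Turn {grade name: number of voters} into a list of grade indices (0 = To Reject)."""
    return [GRADES.index(g) for g, n in dist.items() for _ in range(n)]


def majority_gauge(dist):
    """(p, majority-grade, q): p voters strictly above it, q voters strictly below it."""
    scores = sorted(expand(dist), reverse=True)
    alpha = scores[len(scores) // 2]  # lower middlemost grade
    p = sum(1 for s in scores if s > alpha)
    q = sum(1 for s in scores if s < alpha)
    return p, GRADES[alpha], q


def point_sum(dist, points):
    """Sum of points; points[i] is the value of GRADES[i]."""
    return sum(points[GRADES.index(g)] * n for g, n in dist.items())


X = {"Excellent": 1, "Very Good": 99}
Y = {"Excellent": 50, "Very Good": 1, "To Reject": 49}

print(majority_gauge(X))  # (1, 'Very Good', 0)
print(majority_gauge(Y))  # (50, 'Very Good', 49): same grade, larger p, so Y ranks higher

linear = [0, 1, 2, 3, 4, 5]   # To Reject ... Excellent
steep = [0, 1, 2, 3, 4, 10]   # hypothetical scale rewarding Excellent
print(point_sum(X, linear), point_sum(Y, linear))  # 401 254: X ahead
print(point_sum(X, steep), point_sum(Y, steep))    # 406 504: Y ahead

# The pivotal 1% decides Y's majority-grade: each of the six grades is reachable.
pivot = [GRADES[sorted([5] * 50 + [g] + [0] * 49, reverse=True)[50]] for g in range(6)]
print(pivot)
```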
A very similar example with a jury of eleven expert judges that leads to the same ranking is the following:
Excellent   Very Good   Good   Acceptable   Poor   To Reject
X: 10
Y:
Suppose X and Y are wines; which should be ranked higher? It is not at all clear that X should be ranked ahead of Y (as the example is meant to suggest). The majority judgment has crowned Y, perhaps the more daring, the more exceptional, controversial wine; most point-summing methods would make X the winner, but not all. If, for example, Excellent awards 10 points, Very Good 4 points, Good 3 points, and so on down to To Reject 0 points, then Y wins. What is wrong with these points? Once again, the presence of many 0s suggests strategic manipulation. And here the one judge who gave Y a Very Good is able to give Y whatever majority-grade she wishes.

Table 16.1
Number of Wins of the Centrist Candidate (Bayrou), 2007 Orsay Experiment

                      Entire Population              Representative Population
                      3 candidates   12 candidates   3 candidates   12 candidates
First-past-the-post   1,848          2,328           47             45
Majority judgment     7,786          7,824           4,073          4,019
Point-summing         9,219          9,231           7,762          7,721
Borda                 9,462          9,612           7,120          9,586
Note: Entire population = ten thousand samples of 101 ballots, which were drawn from all 1,733 ballots of the 2007 Orsay experiment; representative population = ten thousand samples of 101 ballots, which were drawn from a sample of 501 ballots representative of the first-round national vote.
The four examples just discussed point to another property: the tendency of the summing-eye to favor centrist political candidates or middling competitors (competitors that are judged highly by all judges because they have few faults) over candidates that are more confirmed on the right or the left of the political spectrum, or over exceptional competitors (competitors that dare, fail on some points, yet soar so high as to overcome their other defects). In these examples the point-summing view favors the centrists and middling competitors much more than does the majority judgment. Of course, the majority judgment in turn favors centrist candidates much more than first-past-the-post does. These tendencies have been confirmed experimentally (and are discussed again in subsequent chapters).
Table 16.1 extracts the relevant information. The point-summing method assigned 5 points to Excellent down to 0 to To Reject. The point-summing method significantly favors the centrist in comparison with the majority judgment. Observe that elections are almost never conducted using Borda's method. Why? One valid reason may well be that there is an aversion to a method that almost always elects the middling competitor and the centrist candidate, though the standard objection is its ease of manipulation.
16.2 No-Show Objections
The second type of lump is generic to a family of essentially equivalent phenomena that bear different names in the literature of the traditional theory of
20, 17, 15, 15, 12, 11,
18, 17, 16, 14, 13, 10,
The same phenomena occur in this example on the high side. If the eighth judge gives X a 19 and Y an 18, or if he gives them both a 17, then Y wins. However, who knows what the eighth judge's intentions are (or his utility function)? If he gives them both low grades, then he seems to have a relatively low opinion of both candidates, may not care much about which of the two wins, and may be very pleased to see their grades lowered (X's from 15 to 12 and Y's from 14 to 13). If he gives them both high grades, he seems to have a relatively high opinion of both candidates, may not care much about which of the two wins, and may be pleased to see their grades raised. But if he felt strongly about preferring X to Y and gave X a grade between 20 and 15 and Y a grade between 0 and 14, then X remains the winner.
Lemma If X with majority-grade $\alpha$ is the winner against Y with majority-grade $\beta$, and a new judge assigns $\alpha$ or a higher grade to X and a grade strictly lower than $\alpha$ to Y, or, symmetrically, assigns $\beta$ or a lower grade to Y and a grade strictly higher than $\beta$ to X, then X remains the winner.
All of this suggests that the no-show paradox is not of much importance.
The violation of proper cancellation is a positive property, not a negative one. The majority judgment gives every voter the possibility of altering the ranking, whether or not she is indifferent between several or all candidates (though giving two 19s is a very different indifference from giving two 2s, and either of these indifferences may lead to grades the voter prefers). This is a clear inducement to participate. It is not true of point-summing methods, which do cancel properly.
A rendition of example 16.5 when there are only a few grades is the following:
X:  Excellent, Excellent, Very Good, Very Good, Acceptable, Acceptable, Poor
Y:  Excellent, Excellent, Excellent, Good, Good, Acceptable, Poor
15
17
Y:
11
15
11
It elects Y again in the 124-voter electorate 2, though narrowly, for both candidates have the majority-grade Good, but X's grades worse than Good are more numerous than Y's.
Electorate 2 Excellent Very Good
Good
X:
20
36
56
Y:
28
16
24
51
Excellent
Very Good
Good
Acceptable
Poor
X:
29
51
25
65
Y:
39
31
33
62
To Reject
Majority-Gauge
So Y is the winner in each of electorates 1 and 2 but not in the combined electorate.
Why should Y be the winner in the combined electorate? The two electorates may have agreed on the rankings, but they certainly did not agree on the evaluations. Notice that in electorate 2, X has many more grades above Good than does Y. Why should agreement on rankings be more important than the evaluations? The evaluations are much more discerning than mere rankings.
Join-inconsistency is real but rare. Several simple theorems, together with experimental evidence, explain why. A social grading function (SGF) f, defined for any number of voters, is grade-join-consistent if
$$f(\alpha_1, \dots, \alpha_n) \succeq \gamma \quad\text{and}\quad f(\beta_1, \dots, \beta_k) \succeq \gamma \quad\text{implies}\quad f(\alpha_1, \dots, \alpha_n, \beta_1, \dots, \beta_k) \succeq \gamma,$$
and the same holds if the inequalities are reversed, or are strict, or are equations.
Theorem 16.1 The majority-grade and the modified majority-grade are grade-join-consistent.
Proof The hypotheses (with $\succeq$) say that an absolute majority of the $\alpha_i$ and an absolute majority of the $\beta_j$ are $\gamma$ or above, so an absolute majority of the $\alpha_i$ and $\beta_j$ taken together are $\gamma$ or above as well (and, of course, the same observation holds for $\preceq$, $\prec$, $\succ$, or $=$), showing that the majority-grade is grade-join-consistent.
Now, observe that when a candidate X's majority-gauges in two separate electorates are $(p_1, \alpha, q_1)$ and $(p_2, \alpha, q_2)$, so that they have the same majority-grade $\alpha$, then X's majority-gauge in the combined electorate is $(p_1 + p_2, \alpha, q_1 + q_2)$. This fact immediately implies that the modified majority-grade is grade-join-consistent, too. ∎
The theorem says that when two electorates are in agreement on their evaluations of a candidate with the majority judgment, then the combined electorate
evaluates the candidate identically. In the new model the important point is to
assure a consistency in evaluations, not a consistency in orders. The majority
judgment does.
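Theorem 16.1 lends itself to a quick randomized check. The Python sketch below rests on stated assumptions: grades are integers 0 to 5, the majority-grade is the lower middlemost grade, and the gauge counts voters rather than percentages. It draws pairs of electorates of random sizes, verifies grade-join-consistency of the majority-grade for every threshold grade, and verifies that when the two majority-grades coincide the gauges simply add. A passing run is evidence, not a proof.

```python
import random


def majority_grade(grades):
    ordered = sorted(grades, reverse=True)
    return ordered[len(ordered) // 2]  # lower middlemost grade


def gauge(grades):
    """Majority-gauge (p, alpha, q) counted in voters."""
    alpha = majority_grade(grades)
    return (sum(g > alpha for g in grades), alpha, sum(g < alpha for g in grades))


def check(trials=20_000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        r = [rng.randint(0, 5) for _ in range(rng.randint(1, 9))]
        s = [rng.randint(0, 5) for _ in range(rng.randint(1, 9))]
        a, b, joined = majority_grade(r), majority_grade(s), majority_grade(r + s)
        for t in range(6):  # every threshold grade t
            if a >= t and b >= t:
                assert joined >= t
            if a <= t and b <= t:
                assert joined <= t
        if a == b:  # same majority-grade: the combined gauge is the sum
            (p1, _, q1), (p2, _, q2) = gauge(r), gauge(s)
            assert gauge(r + s) == (p1 + p2, a, q1 + q2)
    return True


print(check())  # True: no counterexample found
```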
Two observations are now immediately evident.
Corollary 1 When a candidate X wins in two electorates with a majority-grade of at least $\alpha$ and all other candidates have majority-grades that are strictly lower than $\alpha$, then X wins in the combined electorate.
Corollary 2 When a candidate X wins against Y in two electorates with a
Table 16.2
Number of Wins with Modified Majority-Grades, 2007 Orsay Experiment

           Entire Population                Representative Population
           Good−   Good+   Very Good−       Good−   Good+   Very Good−
Royal      1       734     6                25      581     0
Bayrou     32      8,160   638              789     3,537   0
Sarkozy    0       413     6                52      4,901   112
Total      33      9,307   650              866     9,019   112
Note: Entire population = ten thousand samples of 201 ballots, which were drawn from all 1,733 ballots of the 2007 Orsay experiment (ten cases ended in ties); representative population = ten thousand samples of 201 ballots, which were drawn from a sample of 501 ballots representative of the first-round national vote (three cases ended in ties).
16.3 Conclusion
in the new model (see chapter 17). Thus those who ardently insist on any one of them must accept all the very bad properties of point-summing methods.
Those who defend the traditional model and criticize the new one have studiously avoided any discussion of the various monotonicity properties. And yet, they are crucial. It is inconceivable to accept a method that penalizes a candidate when voters change their minds and think more highly of her. If a candidate moves up in the estimation of the voters, then she should not lose in the final standings (choice-monotonicity). If voters' estimations remain the same except that the winner moves up, then not only should she still be the winner but the final ranking among all the others should remain the same (rank-monotonicity). Some methods of the traditional model satisfy one or the other of these two properties, but none satisfies both (theorem 4.4). And no method in the traditional model guarantees that when a nonwinner falls in the estimation of the voters, the winning candidate remains the winner (strong monotonicity, chapter 5). The majority judgment is at once choice-monotonic, rank-monotonic, and strongly monotonic.
The majority judgment claims to be a practical method for electing and
ranking. The rebuttals to the supposed lumps of the majority judgment were
given without invoking a fact of life: judges and voters may behave strategically.
When the possibility of strategic behavior is invoked, the lumps become even
less significant (see chapter 20).
17
Point-Summing Methods
Chapter 17
or alternatively, for single-digit voting, which uses ten points, 0 to 9, again not defined (RangeVoting.org 2007). Some French bloggers suggested that the six-grade language used in the Orsay experiment was fine but that it would be simpler to associate a 6 with Excellent down to a 1 with To Reject and add the numbers to determine the winner and the order of finish. An electoral experiment of 2007 used the points 0, 1, and 2.²
The principal attraction of point-summing methods over the traditional methods or approval voting is that they permit voters and judges finer scales by which to distinguish candidates and competitors. This chapter characterizes these methods in several different ways to show that if certain conditions are imposed, they are the only possible methods. The results should not be seen as arguments in favor of point-summing methods. Quite the contrary. They are ill-conceived for several reasons, some apparent, some more subtle. They are all highly manipulable; the wider the gap between the minimum and maximum of the points, the more manipulable they are. When the numbers have no definition, the language of grading may well not be common, so Arrow's impossibility theorem applies. When the numbers have definitions, the definitions may be so formulated that they nonetheless induce voters to make relative comparisons rather than absolute evaluations, so the method may suffer from Arrow's paradox. More fundamentally, once again, summing numbers or, equivalently, taking their average, is meaningless (unless they are drawn from a bona fide interval scale, as shown by the theory of measurement) and so has no justification whatsoever. There is no doubt that point-summing methods are to be shunned.
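The claim about manipulability can be illustrated with one voter acting alone. In the Python sketch below, the eleven grades are made up for the illustration and the helper `reachable` is hypothetical. Voter 0 tries every grade on the scale. On the average she can move the outcome by (max − min)/n, so widening the scale from 0 to 5 to 0 to 20 quadruples her leverage, while the majority-grade she can reach stays between the two middle grades of the other ten voters.

```python
def majority_grade(grades):
    ordered = sorted(grades, reverse=True)
    return ordered[len(ordered) // 2]  # lower middlemost grade


def reachable(grades, i, lo, hi):
    """Spread of the average, and the set of majority-grades, voter i can produce
    alone by reporting any grade in [lo, hi] while the others keep theirs."""
    averages, middles = [], set()
    for g in range(lo, hi + 1):
        trial = grades[:i] + [g] + grades[i + 1:]
        averages.append(sum(trial) / len(trial))
        middles.add(majority_grade(trial))
    return max(averages) - min(averages), sorted(middles)


honest = [5, 4, 4, 3, 3, 3, 2, 2, 1, 1, 0]  # hypothetical grades of 11 voters
print(reachable(honest, 0, 0, 5))   # spread 5/11 (about 0.45); majority-grades [2, 3]
print(reachable(honest, 0, 0, 20))  # spread 20/11 (about 1.82); majority-grades [2, 3]
```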
17.1 Point-Summing Methods: Theory
General Point-Summing Methods
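where $>_{\mathrm{lex}}$ means lexicographically higher and the sums are taken over the judges. For such methods to be choice-monotonic it is necessary and sufficient that, when $\alpha_0 \succ \beta_0$ are any two single grades,
$$\left(\varphi^1(\alpha_0), \dots, \varphi^k(\alpha_0)\right) >_{\mathrm{lex}} \left(\varphi^1(\beta_0), \dots, \varphi^k(\beta_0)\right).$$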
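A lexicographic point-summing method is a point-summing method if there exists a single function $\varphi : \Lambda \to \mathbb{R}$ for which
$$\alpha \succ_S \beta \quad\text{if and only if}\quad \sum_i \varphi(\alpha_i) > \sum_i \varphi(\beta_i).$$
A social ranking function S is join-consistent if
$$\alpha \succ_S \beta \ \text{and}\ \gamma \succeq_S \delta \quad\text{implies}\quad (\alpha, \gamma) \succ_S (\beta, \delta),$$
and
$$\alpha \succeq_S \beta \ \text{and}\ \gamma \succeq_S \delta \quad\text{implies}\quad (\alpha, \gamma) \succeq_S (\beta, \delta).$$
This notation assumes that α and β have the same number of grades (say, n), and that γ and δ also do (say, k); and when n = 1, $\succ_S$ means the individual's input. This implies that if a set of voters is indifferent between two candidates, then an additional individual's input is society's output.
Theorem 17.1 A social ranking function is join-consistent if and only if it is a choice-monotonic lexicographic point-summing method.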
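Lexicographic point-summing methods are easy to express in code because Python compares tuples lexicographically. The sketch below is a hypothetical instance, not one proposed in the text: grades are integers 0 to 5, φ¹ is the grade itself, and φ² counts top grades, so the choice-monotonicity condition above holds. A randomized test then checks the two join-consistency implications on lists of matching lengths.

```python
import random


def lex_score(grades, phis):
    """(sum_i phi^1(a_i), ..., sum_i phi^m(a_i)); tuples compare lexicographically."""
    return tuple(sum(phi(a) for a in grades) for phi in phis)


PHIS = (lambda a: a, lambda a: 1 if a == 5 else 0)  # hypothetical phi^1, phi^2


def above(alpha, beta):       # alpha >_S beta
    return lex_score(alpha, PHIS) > lex_score(beta, PHIS)


def at_least(alpha, beta):    # alpha >=_S beta
    return lex_score(alpha, PHIS) >= lex_score(beta, PHIS)


def check_join_consistency(trials=20_000, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        n, k = rng.randint(1, 6), rng.randint(1, 6)
        alpha, beta = ([rng.randint(0, 5) for _ in range(n)] for _ in range(2))
        gamma, delta = ([rng.randint(0, 5) for _ in range(k)] for _ in range(2))
        if above(alpha, beta) and at_least(gamma, delta):
            assert above(alpha + gamma, beta + delta)
        if at_least(alpha, beta) and at_least(gamma, delta):
            assert at_least(alpha + gamma, beta + delta)
    return True


print(check_join_consistency())  # True
```

Because each score is a sum over judges, the score of a joined list is the componentwise sum of the two scores, and lexicographic order is preserved under such sums; that is why the implications hold for any choice of the φ functions.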
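$$\alpha = (\underbrace{\lambda_1, \dots, \lambda_1}_{n_1}, \dots, \underbrace{\lambda_k, \dots, \lambda_k}_{n_k}) \quad\text{and}\quad \beta = (\underbrace{\lambda_1, \dots, \lambda_1}_{m_1}, \dots, \underbrace{\lambda_k, \dots, \lambda_k}_{m_k}),$$
where there are $n = \sum_i n_i = \sum_i m_i$ judges. Take $\Delta_k$ to be the simplex of dimension $k - 1$, and $\Delta_k^{\mathbb{Q}} = \Delta_k \cap \mathbb{Q}^k$ to be the simplex of rational points. For the α and β just given, define the binary relation ≻ on $\Delta_k^{\mathbb{Q}}$ by
$$\left(\tfrac{n_1}{n}, \dots, \tfrac{n_k}{n}\right) \succ \left(\tfrac{m_1}{n}, \dots, \tfrac{m_k}{n}\right) \quad\text{if and only if}\quad \alpha \succ_S \beta,$$
and ≽ on $\Delta_k^{\mathbb{Q}}$ by
$$\left(\tfrac{n_1}{n}, \dots, \tfrac{n_k}{n}\right) \succeq \left(\tfrac{m_1}{n}, \dots, \tfrac{m_k}{n}\right) \quad\text{if and only if}\quad \alpha \succeq_S \beta.$$
$$D_2 = \{(x, y) \in D : y \succeq x\}.$$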
contain an open set, for then it would contain rational points (x, y). In fact, as
will be shown, they belong to both D1 and D2 , which is a contradiction.
To see that (x, y) belongs to D1 , note first that ri(cvxD1 ) = ri(cvxD1 ) =
ri(D1 ), where ri is the relative interior and cvx the convex hull operators. It has
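been shown that $D_1$ Q-convex implies $D_1 = \mathbb{Q}^{2k} \cap \mathrm{cvx}\, D_1$ (see Young 1975, lemma 1). Therefore,
$$(x, y) \in \mathbb{Q}^{2k} \cap \mathrm{ri}(D_1) = \mathbb{Q}^{2k} \cap \mathrm{ri}(\mathrm{cvx}\, D_1) \subseteq D_1,$$
so $(x, y) \in D_1$. The same argument applies to $D_2$.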
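As a consequence, $\mathrm{ri}(D_1) \cap \mathrm{ri}(D_2) = \emptyset$, and so $D_1$ and $D_2$ may be properly separated by a hyperplane: there exist $\varphi = (\varphi_1, \dots, \varphi_k)$ and $\psi = (\psi_1, \dots, \psi_k)$, both in $\mathbb{R}^k$, and $c \in \mathbb{R}$ such that $(x, y) \in D_1$ if and only if $\sum_i \varphi_i x_i + c \ge \sum_i \psi_i y_i$, and symmetrically, $(x, y) \in D_2$ if and only if $\sum_i \varphi_i x_i + c \le \sum_i \psi_i y_i$. Since $\sum_i y_i = 1$, the c may be absorbed into the $\psi_i$, which comes to the same thing as taking $c = 0$.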
What does this say about D? Take any (necessarily rational) $(x, y) \in D$ for which $\sum_i \varphi_i x_i > \sum_i \psi_i y_i$. The point (x, y) is in the relative interior $\mathrm{ri}(D_1)$, implying, as above, that it belongs to $D_1$, so $x \succeq y$. Symmetrically, $\sum_i \varphi_i x_i < \sum_i \psi_i y_i$ for (x, y) rational implies $y \succeq x$.
Now note that whenever (x, y) is rational and x = y, then since $x \sim y$, $\sum_i \varphi_i x_i = \sum_i \psi_i y_i$. Taking different $x = y = (0, \dots, 1, \dots, 0)$, with one 1 in the ith position, implies $\varphi_i = \psi_i$ for $i = 1, \dots, k$, so $\varphi = \psi$.
If $D^2 = \{(x, y) \in D : \sum_i \varphi_i x_i = \sum_i \varphi_i y_i\}$ contains no point such that $x \succ y$, the proof ends here: the method is a point-summing method. Otherwise, rename φ to be $\varphi^1$. The dimension of $D^2$ is strictly smaller than that of D. The same reasoning implies the existence of $\varphi^2$ such that when $(x, y) \in D^2$, $\sum_i \varphi^2_i x_i > \sum_i \varphi^2_i y_i$ implies $x \succ y$ and $\sum_i \varphi^2_i x_i < \sum_i \varphi^2_i y_i$ implies $y \succ x$. If $D^3 = \{(x, y) \in D^2 : \sum_i \varphi^2_i x_i = \sum_i \varphi^2_i y_i\}$ contains no point such that $x \succ y$, the proof ends; otherwise, it ends in at most 2k steps because each successive $D^j$ is of smaller dimension. Thus the social ranking function must be a lexicographic point-summing method.
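To see that it must be choice-monotonic, consider any list of grades $\alpha = (\alpha_1, \dots, \alpha_n)$ and two single grades $\alpha_0 \succ \beta_0$. Join-consistency implies $(\alpha, \alpha_0) \succ_S (\alpha, \beta_0)$. Therefore,
$$\left(\varphi^1(\alpha_0), \dots, \varphi^k(\alpha_0)\right) + \sum_i \left(\varphi^1(\alpha_i), \dots, \varphi^k(\alpha_i)\right) >_{\mathrm{lex}} \left(\varphi^1(\beta_0), \dots, \varphi^k(\beta_0)\right) + \sum_i \left(\varphi^1(\alpha_i), \dots, \varphi^k(\alpha_i)\right),$$
or
$$\left(\varphi^1(\alpha_0), \dots, \varphi^k(\alpha_0)\right) >_{\mathrm{lex}} \left(\varphi^1(\beta_0), \dots, \varphi^k(\beta_0)\right),$$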
L
(, , . . . , ) S (, , . . . , ).
Q
Theorem 17.2
Proof The proof of the previous theorem ends with one φ (thus with a point-summing method) when
$$(x', y') \in D^2 = \{(x, y) \in D : \textstyle\sum_i \varphi_i x_i = \sum_i \varphi_i y_i\} \quad\text{implies}\quad x' \sim y'.$$
So suppose, first, that $(x', y') \in D^2$ but $x' \prec y'$. Take any rational $(x, y) \in \mathrm{ri}(D_1)$, so that $x \succ y$. Then for any n > 0,
$$\frac{1}{n}(x, y) + \frac{n-1}{n}(x', y') \in \mathrm{ri}(D_1),$$
meaning
$$\frac{x + (n-1)x'}{n} \succeq \frac{y + (n-1)y'}{n}.$$
But this does not respect large electorates, for $x' \prec y'$ with n large enough implies the preference must go in the opposite direction.
So suppose that $(x', y') \in D^2$ but $x' \succ y'$. Then take any $(x, y) \in \mathrm{ri}(D_2)$, and repeat the same argument to find the symmetric contradiction, and so conclude that $(x', y') \in D^2$ implies $x' \sim y'$.
$\varphi$ is unique up to a positive linear (e.g., affine) transformation, for suppose there were two point-summing methods, φ and ψ. Take any $(x, y) \in \Delta_k \times \Delta_k$. Then
$$\sum_i \varphi_i x_i \ge \sum_i \varphi_i y_i \quad\text{if and only if}\quad \sum_i \psi_i x_i \ge \sum_i \psi_i y_i,$$
and a standard algebraic argument (such as that used in von Neumann and Morgenstern 1944 in establishing the expected utility theorem) proves the claim.
That any choice-monotonic point-summing method is join-consistent, transitive, and respects large electorates, ties, and grades is obvious, completing the proof. ∎
Since the sum is always taken over a finite number of judges or voters, a
finite number of grades is used in every comparison. This permits the previous
theorem to be generalized for any language, finite or infinite, denumerable or
not, measurable or not.
Theorem 17.3 Whatever the language
, a join-consistent social ranking
implies
(, o ) S (, o ),
and
implies
(, o ) S (, o ).
Thus join-consistency immediately implies proper cancellation. In fact, they
are equivalent.
Theorem 17.4
Proof That a lexicographic point-summing method cancels properly is obvious. The proof shows that the hypotheses imply join-consistency, so the result follows from the last theorems.
Suppose α and β have the same number of grades (say, n), and γ and δ also have the same number of grades (say, k). If $\alpha \succeq_S \beta$ and $\gamma \succeq_S \delta$, then, applying the cancellation property one $\gamma_i$ at a time,
$$(\alpha, \gamma) \succeq_S (\beta, \gamma),$$
and one $\beta_j$ at a time,
$$(\beta, \gamma) \succeq_S (\beta, \delta),$$
conclude that
$$(\alpha, \gamma) \succeq_S (\beta, \delta).$$
Suppose then that $\alpha \succ_S \beta$ and $\gamma \succeq_S \delta$. The same argument shows that $(\alpha, \gamma) \succ_S (\beta, \delta)$, so join-consistency holds. ∎
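A social ranking function S is participant-consistent if, when $\alpha_o$ and $\beta_o$ are single grades and α and β are sets of the same number of grades,
$$\alpha \succ_S \beta \ \text{and}\ \alpha_o \succeq \beta_o \quad\text{implies}\quad (\alpha, \alpha_o) \succ_S (\beta, \beta_o),$$
and
$$\alpha \succeq_S \beta \ \text{and}\ \alpha_o \succeq \beta_o \quad\text{implies}\quad (\alpha, \alpha_o) \succeq_S (\beta, \beta_o).$$
Theorem 17.5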
It was observed (in chapter 16) that the majority-ranking is not participant-consistent. Is any order function participant-consistent? The max order function is $f^1$, the highest of the set of grades; the min order function (when there are n grades) is $f^n$, the lowest of the set of grades.
Theorem 17.6 The max and min are the only order functions that are participant-consistent social ranking functions (when the language has at least three grades).
k(n)
= (1, . . . , 1,
1,
k(n)
= (1, . . . , 1, 0,
0, . . . , 0)
so f k(n) () f k(n) ().
0, . . . , 0)
k(n)
= (2, . . . , 2, 2, 1, . . . , 1)
so f k(n) () f k(n) ().
k(n)
= (2, . . . , 2, 1, 1, . . . , 1)
Then f k(n+1) (, 1) = 1 and f k(n+1) (, 0) = 1, so (, 1) (, 0), contradicting participant-consistency.
Participant-consistency must fail if max and min are used for successive sizes of the electorate. To see this, consider the following sets of n grades:
$$\alpha = (2, 0, \dots, 0), \qquad \beta = (1, 1, \dots, 1).$$
If max is used for n grades and min for n + 1 grades, then $\alpha \succ_S \beta$, but adjoining a 2 to α and a 1 to β reverses the order, contradicting participant-consistency. If min is used for n grades and max for n + 1 grades, then $\beta \succ_S \alpha$, but adjoining a 0 to α and a 1 to β reverses the order, too, completing the proof. ∎
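The counterexample can be replayed numerically for several values of n. In this Python sketch (the helper name `violates` is hypothetical), `rule_n` ranks lists of n grades and `rule_n1` ranks lists of n + 1 grades; a violation means that the list ranked higher with n grades, after receiving a grade at least as high as the one added to the other list, ends up ranked lower.

```python
def violates(rule_n, rule_n1, better, worse, add_better, add_worse):
    """True if `better` beats `worse` under rule_n and add_better >= add_worse,
    yet the extended lists are ranked the other way round under rule_n1."""
    return (rule_n(better) > rule_n(worse)
            and add_better >= add_worse
            and rule_n1(better + [add_better]) < rule_n1(worse + [add_worse]))


for n in range(2, 8):
    alpha = [2] + [0] * (n - 1)
    beta = [1] * n
    # max for n grades, min for n + 1: alpha wins, then adjoin 2 to alpha and 1 to beta
    assert violates(max, min, alpha, beta, 2, 1)
    # min for n grades, max for n + 1: beta wins, then adjoin 1 to beta and 0 to alpha
    assert violates(min, max, beta, alpha, 1, 0)
print("participant-consistency fails for n = 2, ..., 7")
```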
The max and min are the most manipulable of the order functions because
every judge can raise the final grade in the first instance and every judge can
lower the final grade in the second instance. It is the middlemost order functions
that best resist manipulation.
Observe that max and min are step-continuous but neither choice-monotonic nor join-consistent. They fail to be choice-monotonic because some grade other than the max or min of a candidate may be raised without changing the candidate's final grade at all. The order function max fails join-consistency because when $\max(\alpha) \succ \max(\beta)$ and $\max(\gamma) = \max(\delta) = \gamma_o$, but $\gamma_o \succ \max(\alpha)$, it is not true that $\max(\alpha, \gamma) \succ \max(\beta, \delta)$, because both terms are equal to $\gamma_o$. For a similar reason, min fails as well. This explains why theorem 17.5 requires more conditions than the earlier theorems.
There may, of course, be many candidates with the same max or the same min
final grade. Can order functions be catenated into a lexicographic scheme for
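if $(\alpha_1, \ldots, \alpha_n) \succ_{\mathrm{lex}} (\beta_1, \ldots, \beta_n)$.
An alternative description is to specify the levels of the hierarchy in the lexicography: first use $f^n$, then $f^{n-1}$, …, finally $f^1$ (when $n$ grades are assigned).
Recall that a lexi-order social ranking function associates to each size of the electorate $n$ a permutation $\sigma_n$ of the order functions, $f^{\sigma_n(1)}, \ldots, f^{\sigma_n(n)}$, and ranks the candidates by
$$\alpha \succ_S \beta \quad\text{if}\quad \bigl(f^{\sigma_n(1)}(\alpha), \ldots, f^{\sigma_n(n)}(\alpha)\bigr) \succ_{\mathrm{lex}} \bigl(f^{\sigma_n(1)}(\beta), \ldots, f^{\sigma_n(n)}(\beta)\bigr).$$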
Theorem 17.7 Leximax and leximin are the unique meaningful social ranking functions that cancel properly (when the language has at least three grades). They are therefore join-consistent and participant-consistent.

Proof. To begin, recall that proper cancellation implies choice-monotonicity, which, together with meaningfulness (or order-consistency; see theorem 11.5b), implies that the SRF must be a lexi-order SRF.
If $n = 1$, there is nothing to prove. It is first shown that $\sigma_n(1) = 1$ for all $n$ or $\sigma_n(1) = n$ for all $n$.
When $n = 2$, either $\sigma_2(1) = 1$ or $\sigma_2(1) = 2$.
If $\sigma_2(1) = 1$, assume inductively that $\sigma_k(1) = 1$ for $k \le n$, and suppose $\sigma_{n+1}(1) \ne 1$. Consider the two sets of $n$ grades
$$\alpha = (2, 0, \ldots, 0), \qquad \beta = (1, 1, \ldots, 1),$$
for which $\alpha \succ_S \beta$. But $\sigma_{n+1}(1) \ne 1$ implies $(\beta, 0) \succ_S (\alpha, 0)$, contradicting the cancellation property. So $\sigma_n(1) = 1$ for all $n$.
If $\sigma_2(1) = 2$, assume inductively that $\sigma_k(1) = k$ for $k \le n$, and suppose $\sigma_{n+1}(1) \ne n + 1$. Consider the two sets of $n$ grades
$$\alpha = (2, \ldots, 2, 0), \qquad \beta = (1, \ldots, 1, 1),$$
for which $\beta \succ_S \alpha$. But $\sigma_{n+1}(1) \ne n + 1$ implies $(\alpha, 2) \succ_S (\beta, 2)$, again contradicting the cancellation property. So $\sigma_n(1) = n$ for all $n$.
If $\sigma_n(1) = 1$ for all $n$, assume inductively that $\sigma_n(j) = j$ when $j \le k$, for all $n$. Necessarily $\sigma_{k+1}(k+1) = k+1$, since the first $k$ order functions have already been assigned. So suppose $n + 1$ is the smallest integer for which $\sigma_{n+1}(k+1) \ne k+1$. This means that $\sigma_{n+1}(k+1) > k+1$. Consider the two sets of $n$ grades
$$\alpha = (\underbrace{2, \ldots, 2, 2}_{k+1}, 0, \ldots, 0), \qquad \beta = (\underbrace{2, \ldots, 2, 1}_{k+1}, 1, \ldots, 1),$$
for which $\alpha \succ_S \beta$. But $\sigma_{n+1}(k+1) > k+1$ implies $(\beta, 0) \succ_S (\alpha, 0)$, so cancellation is once again violated. Therefore $\sigma_n(j) = j$ when $j \le k+1$, for all $n$. So induction implies $\sigma_n(j) = j$ for all $j$ and all $n$; this is the leximax social ranking function. A similar argument shows that the leximin social ranking function must be taken when $\sigma_n(1) = n$ for all $n$, which completes the proof.

4. Leximax and especially leximin are well-known functions in the literature on welfarism (see Hammond 1976 and H. Moulin 1988).
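In fact, leximax and leximin are choice-monotonic lexicographic point-summing methods. The language of grades is finite; let it be $\lambda_1 \succ \lambda_2 \succ \cdots \succ \lambda_k$.
Leximax is the lexicographic point-summing method defined by
$$\Bigl(\sum_i \varphi^1(\alpha_i), \ldots, \sum_i \varphi^k(\alpha_i)\Bigr), \quad\text{where}\quad \varphi^j(\alpha_i) = \begin{cases} 1 & \text{if } \alpha_i = \lambda_j,\\ 0 & \text{otherwise.} \end{cases}$$
Letting $n_j(\alpha)$ be the number of $\lambda_j$'s in $\alpha$, so that $n = \sum_{j=1}^{k} n_j(\alpha)$, this may be expressed as
$$\Bigl(\sum_i \varphi^1(\alpha_i), \ldots, \sum_i \varphi^k(\alpha_i)\Bigr) = \bigl(n_1(\alpha), \ldots, n_k(\alpha)\bigr).$$
It has in its first place the number of highest grades $\lambda_1$, in its second place the number of second-highest grades $\lambda_2$, …, down to the number of lowest grades $\lambda_k$, so it gives exactly the same ordering as leximax.
An equivalent description is
$$\Bigl(\sum_i \psi^1(\alpha_i), \ldots, \sum_i \psi^k(\alpha_i)\Bigr), \quad\text{where}\quad \psi^j(\alpha_i) = \begin{cases} 1 & \text{if } \alpha_i \succeq \lambda_j,\\ 0 & \text{otherwise,} \end{cases}$$
or
$$\Bigl(\sum_i \psi^1(\alpha_i), \ldots, \sum_i \psi^k(\alpha_i)\Bigr) = \Bigl(n_1(\alpha),\ n_1(\alpha) + n_2(\alpha),\ \ldots,\ \sum_{j=1}^{k} n_j(\alpha)\Bigr).$$
It has in its first place the number of highest grades $\lambda_1$, in its second place the number of the two highest grades $\lambda_1$ and $\lambda_2$, …, and in its last place the number of all grades, $n$. Seen in this guise, leximax is a specific type of lexicographic approval voting: an approval is the highest grade, and the candidate with the most approvals wins; if there is a tie, then an approval is either of the two highest grades; if there is still a tie, an approval is any of the three highest grades; and so on.
Leximin is the lexicographic point-summing method defined by
$$\Bigl(-\sum_i \varphi^k(\alpha_i), \ldots, -\sum_i \varphi^1(\alpha_i)\Bigr) = \bigl(-n_k(\alpha), -n_{k-1}(\alpha), \ldots, -n_1(\alpha)\bigr),$$
or
$$\Bigl(\sum_i \chi^1(\alpha_i), \ldots, \sum_i \chi^{k-1}(\alpha_i), 0\Bigr), \quad\text{where}\quad \chi^j(\alpha_i) = \begin{cases} 1 & \text{if } \alpha_i \succeq \lambda_{k-j},\\ 0 & \text{otherwise.} \end{cases}$$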
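The two ways of describing leximax and leximin (ranking by sorted grades, or ranking by lexicographically compared counts of each grade) can be checked against each other by brute force on small profiles. In this sketch grades are integers 0, …, k−1 with k−1 the highest, so the j-th highest grade corresponds to the integer k − j; this encoding and all function names are illustrative choices, and the comparison assumes every candidate receives the same number n of grades.

```python
from collections import Counter
from itertools import product


def counts_best_first(grades, k):
    """(n_1, ..., n_k): how many grades sit at each level, highest level first."""
    c = Counter(grades)
    return tuple(c[k - j] for j in range(1, k + 1))


def leximax_points(grades, k):
    """Point-summing form of leximax: (n_1, ..., n_k), compared lexicographically."""
    return counts_best_first(grades, k)


def leximin_points(grades, k):
    """Point-summing form of leximin: (-n_k, ..., -n_1), compared lexicographically."""
    return tuple(-x for x in reversed(counts_best_first(grades, k)))


def leximax_direct(grades):
    # Compare grades sorted from the highest down: the max first, then the next, ...
    return tuple(sorted(grades, reverse=True))


def leximin_direct(grades):
    # Compare grades sorted from the lowest up: the min first, then the next, ...
    return tuple(sorted(grades))


def forms_agree(k, n):
    """True if both descriptions rank every pair of n-grade profiles identically."""
    profiles = list(product(range(k), repeat=n))
    return all(
        (leximax_points(a, k) > leximax_points(b, k)) == (leximax_direct(a) > leximax_direct(b))
        and (leximin_points(a, k) > leximin_points(b, k)) == (leximin_direct(a) > leximin_direct(b))
        for a in profiles
        for b in profiles
    )


alpha, beta = (2, 0, 0), (1, 1, 1)
print(leximax_points(alpha, 3), leximax_points(beta, 3))  # (1, 0, 2) (0, 3, 0)
print(leximin_points(alpha, 3), leximin_points(beta, 3))  # (-2, 0, -1) (0, -3, 0)
print(forms_agree(3, 3))                                  # True
```

With these counts, leximax puts (2, 0, 0) ahead of (1, 1, 1) because it has more top grades, while leximin puts (1, 1, 1) ahead because it has fewer bottom grades.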
The choice of the language of grades to use in the first real test of the majority judgment, the 2007 Orsay experiment, was the subject of long debates. The first, obvious possibility was the usual 0–20 scale of the French educational system. Its meanings are very different from those of its linear interpolation in the 0–100 scale often used in the United States (see chapters 7 and 8): a 10/20 is a passing grade in France, but a 50/100 is a very clear failing grade in the United States. In France an 18/20 is excellent, a 15/20 is very good, a 12/20 is good. However, for the population at large, the concern was that numbers could well be understood in very different ways; moreover, how would a voter understand the difference between a 16 and a 17, or a 7 and an 8? Twenty different clear and distinct definitions would have to be given, well beyond Miller's "magic seven plus or minus two." A scale of words is much more meaningful to voters than a scale of numbers. An additional difficulty with undefined numbers is that they are abstract: voters usually assume they will be summed or averaged, and in any case undefined numbers constitute a clear invitation to manipulation. That summing points is a bad idea in theory is clear. That it is a bad idea in practice emerges from the following four experiments.
One experiment was conducted in December 2007, when the École Polytechnique student government (Kès) elections took place.5 The École Polytechnique is one of France's three most elite undergraduate institutions.6 Admission crucially depends on mathematical ability. In these elections, teams or parties are formed and presented as entire lists, the students vote for one of the lists, and the list with the most votes is elected. There were two serious lists (the others were organized as larks), called Jukesbox and Kesdelweiss (the names must include the syllable "kes"). In parallel with the official vote, the students were asked to (1) give one of the six grades used in the presidential election experiments,7 and (2) assign numerical grades between 0 and 100 to each of these two lists on one and the same ballot. They were informed that the highest median grade would determine the winner in both cases.
5. The origin of the Kès dates to a decree of the revolutionary year XII (1804) that made the financial situation of students of modest backgrounds precarious. Their better-off fellows established a cashier's office (caissier, whence the shortened name) that gathered funds and distributed them. Across the years, the Kès came to represent students in dealing with Polytechnique's administration.
6. The others are the École Normale Supérieure and the Institut d'Études Politiques (IEP), often called Sciences Po.
7. Très Bien (Excellent), Bien (Very Good), Assez Bien (Good), Passable (Acceptable), Insuffisant (Poor), Rejeter (To Reject).
The Jukesbox list was the official winner with 244 (54%) votes to Kesdelweiss's 206 (46%) (ignoring the scattered votes for other lists). Of those who voted officially, 228 (roughly half) participated in the experiment, and 221 votes were valid. The Jukesbox list was also the winner with the majority judgment on both scales of grades, obtaining a majority-gauge of (31%, Very Good+, 23%) to Kesdelweiss's (20%, Very Good−, 35%) in the first case, and (32%, 80−, 48%) to its rival's (47%, 70+, 42%) in the second case.
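The triples quoted above are majority-gauges. As a reading aid, the sketch below computes one from a single list's grades. It restates, as an assumption rather than a quotation, the conventions of the majority-gauge (section 14.1 in the table of contents): the majority-grade is the lower middlemost grade, p and q are the shares of grades strictly above and strictly below it, and the sign is + when p > q and − otherwise. Grades are plain numbers with higher meaning better, and the example grades are invented.

```python
from fractions import Fraction


def majority_gauge(grades):
    """Return (p, majority-grade, sign, q) for one candidate's grades.

    Assumed conventions: the majority-grade is the lower middlemost grade;
    p and q are the shares of grades strictly above and strictly below it;
    the sign is '+' when p > q and '-' otherwise.
    """
    ordered = sorted(grades)
    n = len(ordered)
    grade = ordered[(n - 1) // 2]  # lower middlemost grade (lower median if n is even)
    p = Fraction(sum(g > grade for g in ordered), n)
    q = Fraction(sum(g < grade for g in ordered), n)
    return p, grade, "+" if p > q else "-", q


print(majority_gauge([5, 5, 5, 4, 4, 3, 1]))  # (Fraction(3, 7), 4, '+', Fraction(2, 7))
```

Read this way, Kesdelweiss's (20%, Very Good−, 35%) says that its majority-grade is Very Good and that more of its grades fall below Very Good than above it.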
Grades on ballots permit deducing the face-to-face vote, assuming that a higher grade for one list implies a vote for that list and that equal grades imply ½ vote for each list. Using the six-grade language makes Jukesbox the winner with 57% of the votes; the [0, 100] scale makes it the winner with 54% of the votes. This, together with the outcomes, suggests that the grades were assigned in a manner consistent with the actual votes.
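The face-to-face deduction just described is a simple count. The sketch below applies it to ballots given as pairs of grades, one per list; the four ballots are invented for illustration, and exact fractions avoid rounding.

```python
from fractions import Fraction


def face_to_face_share(ballots):
    """Share of the head-to-head vote won by the first list.

    Each ballot is a pair (grade_first, grade_second); a higher grade counts
    as a vote for that list and equal grades count as half a vote each.
    """
    score = Fraction(0)
    for first, second in ballots:
        if first > second:
            score += 1
        elif first == second:
            score += Fraction(1, 2)
    return score / len(ballots)


ballots = [(4, 3), (3, 3), (1, 4), (5, 2)]  # hypothetical ballots on a 0..5 scale
print(face_to_face_share(ballots))  # 5/8
```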
What is most interesting about this experiment is the use of the number-grades
in the [0, 100] scale: they constitute much too rich a set, and they are not used
in the same way by the voters. Table 17.1a gives the numbers of word-grades
that were used. To begin, 87% of the 442 number-grades were multiples of 5,
showing that a scale of twenty-one levels is already quite sufficient (moreover,
15 and 20 were never used). Since one ballot contained both majority judgment
votes, it is possible to give the distributions of the number-grades corresponding
to each word-grade along with other relevant information (for this purpose, the
fifty-six grades that were not multiples of 5 were rounded to the closest multiple
of 5; see table 17.1b). As may be observed, voters who assigned an Excellent gave number-grades as low as 40 and as high as 100; those who assigned a Very Good gave number-grades as low as 0 and as high as 100; similarly, to Good correspond a low of 0 and a high of 90, to Acceptable a low of 5 and a high of 62, and to Poor a low of 0 and a high of 50.
Although these are the extremes, the wide distributions corresponding to the
three highest word-grades show that the voters ascribed practically no common
meaning to the number-grades in the range [0,100]. On the other hand, the
medians of the number-grades corresponding to each of the word-grades clearly
show that statistically voters had in mind the French [0, 20] scale: dividing by
5 the median attached to Excellent yields 18, that attached to Very Good yields
15, and that attached to Good yields 12.
Table 17.1a
Number of Word-Grades Used, École Polytechnique, December 2007

Excellent   Very Good   Good   Acceptable   Poor   To Reject   Total
113         201         78     29           8      13          442

Table 17.1b
Distribution of Number-Grades Attached to Excellent, Very Good, and Good, École Polytechnique, December 2007

                Excellent     Very Good     Good
Range           [40, 100]     [0, 100]      [0, 90]
Average         89.2          73.4          56.0
Median          90            75            60
S.D.            9.2           14.3          15.9
Distribution    100: 23.0%    90: 11.5%     75: 6.5%
                95: 10.6%     85: 6.5%      70: 14.1%
                90: 30.1%     80: 27.4%     65: 3.8%
                85: 15.9%     75: 15.4%     60: 39.7%
                80: 15.9%     70: 18.4%     55: 6.4%
                75: 0.9%      65: 7.0%      50: 10.3%
                70: 3.6%      60: 13.8%     45: 19.2%

Another experiment that concerned a point-summing method (in part) was conducted on January 23, 2002, in the main entrance hall (la péniche) of the Institut d'Études Politiques of Paris (IEP, popularly referred to as Sciences Po) from 9:00 a.m. to 5:30 p.m. (Balinski, Laslier and van der Straeten 2002). It was a dress rehearsal for the 2002 Orsay experiment on approval voting that was conducted in parallel with the French presidential election of that year (see chapters 6 and 18). Participation was open to everyone (students, staff, and faculty), but the vast majority of those who voted were students. The context was the 2002 French presidential election, but the official candidates were not yet known, so the ballot presented fifteen likely candidates (in fact there were sixteen official candidates, three of whom were not on the Sciences Po ballot, whereas two who were did not run officially).
Participants were asked to complete one ballot. It listed the fifteen candidates in alphabetical order. Each candidate's line had slots in two columns: in the first, the voter was to enter a cross meaning approval (assentiment), or otherwise leave a blank; in the second, the voter was to enter a number of points between 0 and 10, a blank being interpreted as a 0. The total number of crosses given to the candidates determined their order with approval voting (the approval voting part of the experiment is discussed in chapter 18); the sum of the points given to the candidates determined their order with the point-summing method. A total of 429 persons participated. Tables 17.2a–17.2c give several statistics concerning number-grades.
Table 17.2a
Average Number of Number-Grades per Ballot, Sciences Po, January 2002

Number-grade   Average no. per ballot   Word-grade    Average no. per ballot
0              7.26                     To Reject     7.26
1              1.33                     Poor          2.43
2              1.10
3              1.00                     Acceptable    1.86
4              0.86
5              1.18                     Good          1.90
6              0.72
7              0.56                     Very Good     1.02
8              0.46
9              0.15                     Excellent     0.52
10             0.37

Table 17.2b
Distribution of Ballots by Maximum Number-Grade, Sciences Po, January 2002

Number-grade   Frequency in ballots
0              0.9%
1              0.5%
2              0.0%
3              1.2%
4              1.9%
5              3.0%
6              11.2%
7              16.3%
8              25.9%
9              9.1%
10             30.1%
Their distribution (table 17.2a) makes little sense; for example, fewer than half as many 9s are used as 10s. They were probably used to mean the same thing. Reinterpreting the number-grades with 0 meaning To Reject, 1 and 2 Poor, 3 and 4 Acceptable, 5 and 6 Good, 7 and 8 Very Good, and 9 and 10 Excellent makes a lot more sense. The majority-ranking of the fifteen candidates with the eleven grades is identical to the majority-ranking with the six grades, except that the third- and fourth-place candidates are inverted. Examples can obviously be invented in which regrouping grades yields very different outcomes, but in practice such reasonable regroupings lead to very similar outcomes.
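The regrouping can be checked against table 17.2a: adding the average numbers per ballot of the number-grades that are merged gives the averages of the six word-grades. The sketch below does that arithmetic; the variable names are illustrative, and the results are rounded to two decimals as in the table.

```python
# Average number of each number-grade per ballot (Table 17.2a, grades 0 to 10).
AVERAGE_PER_BALLOT = [7.26, 1.33, 1.10, 1.00, 0.86, 1.18, 0.72, 0.56, 0.46, 0.15, 0.37]

# The regrouping proposed in the text: number-grades -> word-grades.
REGROUPING = {
    "To Reject": [0],
    "Poor": [1, 2],
    "Acceptable": [3, 4],
    "Good": [5, 6],
    "Very Good": [7, 8],
    "Excellent": [9, 10],
}


def regroup(averages, groups):
    """Add up the per-ballot averages of the number-grades merged into each word-grade."""
    return {word: round(sum(averages[g] for g in grades), 2) for word, grades in groups.items()}


print(regroup(AVERAGE_PER_BALLOT, REGROUPING))
# {'To Reject': 7.26, 'Poor': 2.43, 'Acceptable': 1.86, 'Good': 1.9, 'Very Good': 1.02, 'Excellent': 0.52}
```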
Table 17.2c
Distribution of Ballots by Number of Maximum Number-Grades, Sciences Po, January 2002

No. of maximum   Percent      No. of maximum   Percent
number-grades    of ballots   number-grades    of ballots
1                76.0%        9                0.0%
2                18.9%        10               0.0%
3                2.8%         11               0.0%
4                0.9%         12               0.0%
5                0.2%         13               0.0%
6                0.0%         14               0.0%
7                0.0%         15               0.9%
8                0.2%

Table 17.2d
Distribution of Ballots by Number of Different Number-Grades, Sciences Po, January 2002

No. of different number-grades   Percent of ballots
1                                0.9%
2                                5.8%
3                                12.1%
4                                17.0%
5                                22.1%
6                                22.8%
7                                12.1%
8                                6.3%
9                                0.5%
10                               0.2%
approval voting (see chapter 18). It was realized in six different voting precincts of three towns: three in Illkirch (Alsace), two in Louvigny (Basse-Normandie), and one in Cigné (Mayenne). There were 2,836 participants (62% of those who
voted officially). The ballot stated,
Instructions: You give a grade to each of the 12 candidates: either 0, or 1, or 2 (2
the best grade, 0 the worst). To do so, place a cross in the corresponding box . . . The
candidate elected with [this] method is the one who receives the highest number of
points.
                     Cigné   Louvigny   Illkirch   All
No. of ballots       227     1,022      1,489      2,738
Average no. of 2s    1.71    1.72       1.65       1.68
Average no. of 1s    2.70    2.80       2.61       2.69
Average no. of 0s    7.59    7.48       7.75       7.63
Table 17.4a
Number of Wins of the Centrist Candidate (Bayrou) with a Point-Summing Method, 2007 Orsay Experiment

Method                Candidates   Royal   Bayrou   Sarkozy   Tie
First-past-the-post   (3)          4,420    1,455     3,922   203
Majority judgment     (3)            718    8,880       401     1
Point-summing         (3)            113    9,878         0     9
Borda                 (3)             25    9,973         1     1
First-past-the-post   (12)         1,855    1,924     6,204    17
Majority judgment     (12)           735    8,836       429     0
Point-summing         (12)           116    9,871         0    13
Borda                 (12)            56    9,943         0     1

Note: Ten thousand samples of 201 ballots, drawn from all 1,733 ballots of the 2007 Orsay experiment. (3) indicates the experiment with three candidates; (12) indicates the experiment with all twelve candidates. One of the three candidates, Royal, Bayrou, or Sarkozy, always wins among the twelve candidates.
This suggests that the 2s are purely relative: they are awarded to the voters' favorites. This seems quite natural, for when number-grades are not associated with a common language, their only real meaning is found in their strategic impact, which induces comparisons rather than evaluations and immediately leads to a serious defect: Arrow's paradox.
In the traditional model, Arrow's paradox arises when a candidate enters the ring or drops out, which may change the order of finish among the other candidates. With point-summing methods it may arise because one more or one less candidacy may alter the strategies of the voters, thus provoking a change in the order of finish among the others. When points 0, 1, 2 are used, a voter who gave a 2 to a candidate who dropped out, for example, may well decide to change a 1 to a 2 for another candidate (a favorite among those who are still candidates). In short, points induce comparisons, not evaluations, and comparisons open the door to Arrow's paradox.
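A toy example shows how this relative use of points can produce the paradox. It is hypothetical throughout: five voters grade candidates A, B, and D with 0, 1, or 2 points, and the only behavioral assumption (the one described above) is that a voter whose 2 went to a candidate who withdraws promotes their best remaining candidate to a 2, leaving their other points unchanged.

```python
def totals(ballots):
    """Sum the 0/1/2 points each candidate receives."""
    result = {}
    for ballot in ballots:
        for candidate, points in ballot.items():
            result[candidate] = result.get(candidate, 0) + points
    return result


def after_withdrawal(ballot, dropped):
    """Hypothetical behavior: a voter whose 2 went to `dropped` gives a 2 to
    their best remaining candidate (by current points); other points stay."""
    remaining = {c: p for c, p in ballot.items() if c != dropped}
    if ballot.get(dropped) == 2 and 2 not in remaining.values():
        favorite = max(remaining, key=remaining.get)
        remaining[favorite] = 2
    return remaining


ballots = [{"A": 1, "B": 2, "D": 0}] * 3 + [{"A": 1, "B": 0, "D": 2}] * 2
print(totals(ballots))                                      # {'A': 5, 'B': 6, 'D': 4}
print(totals([after_withdrawal(b, "D") for b in ballots]))  # {'A': 7, 'B': 6}
```

Before D withdraws, B leads A by 6 points to 5; afterward the two voters who had given their 2 to D give it to A, and A leads B by 7 to 6, although no voter changed their comparison of A and B.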
The fourth experiment is the 2007 Orsay experiment. The statistical evidence shows that point-summing methods very strongly favor the centrist candidate (about as much as Borda's method does).8 Four independent sets of 10,000 random drawings of 201 ballots define four different problems: three candidates or all twelve, drawn either from all 1,733 Orsay ballots or from 501 ballots representative of the first-round national vote. The point-summing winner was determined by giving 5 points for Excellent, 4 for Very Good, 3 for Good, 2 for Acceptable, 1 for Poor, and 0 for To Reject. The numbers of wins in each of the four cases are given in tables 17.4a and 17.4b.
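The resampling behind tables 17.4a and 17.4b can be sketched as follows. Only the point scale (5 for Excellent down to 0 for To Reject), the sample size of 201, and the 10,000 repetitions come from the text; drawing without replacement, the fixed seed, reporting ties as "Tie", and the ballot format (one dictionary per ballot mapping each candidate to a word-grade) are assumptions, and the Orsay ballots themselves are not reproduced here.

```python
import random
from collections import Counter

# Points used for the point-summing winner in the Orsay comparison.
POINTS = {"Excellent": 5, "Very Good": 4, "Good": 3,
          "Acceptable": 2, "Poor": 1, "To Reject": 0}


def point_summing_winner(ballots):
    """Candidate with the largest point total, or 'Tie' if several share it."""
    totals = Counter()
    for ballot in ballots:
        for candidate, grade in ballot.items():
            totals[candidate] += POINTS[grade]
    best = max(totals.values())
    leaders = [c for c, t in totals.items() if t == best]
    return leaders[0] if len(leaders) == 1 else "Tie"


def count_wins(ballots, draws=10_000, sample_size=201, seed=0):
    """Tally winners over repeated random samples drawn without replacement."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(draws):
        wins[point_summing_winner(rng.sample(ballots, sample_size))] += 1
    return wins


toy_ballots = [{"X": "Good", "Y": "Acceptable"}] * 4 + [{"X": "Poor", "Y": "Very Good"}]
print(count_wins(toy_ballots, draws=1_000, sample_size=3))
```

Keeping only Royal, Bayrou, and Sarkozy in each ballot before calling `count_wins` would correspond to the three-candidate case; with all twelve candidates the same function applies unchanged.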
Recall that Bayrou had 25.5% of the first-round votes in the three Orsay precincts and that the 1,733 ballots were so favorable to him that almost any reasonable method would elect him (see table 15.4), whereas nationally (and in the representative ballots) Bayrou had only 18.6% of the first-round votes. This explains the dramatic difference between tables 17.4a and 17.4b: for the database of table 17.4a, any reasonable method will elect Bayrou most of the time.

8. See chapter 6, where the identity of the centrist candidate is deduced from the ballots.

Table 17.4b
Number of Wins of the Centrist Candidate (Bayrou) with a Point-Summing Method, 2007 Orsay Experiment

Method                Candidates   Royal   Bayrou   Sarkozy   Tie
First-past-the-post   (3)            641        0     9,298    61
Majority judgment     (3)            631    4,326     5,043     0
Point-summing         (3)            155    9,546       251    48
Borda                 (3)             51    8,750     1,096   103
First-past-the-post   (12)           957        0     9,040     3
Majority judgment     (12)           640    4,301     5,059     0
Point-summing         (12)           132    9,547       271    50
Borda                 (12)            10    9,989         0     1

Note: Ten thousand samples of 201 ballots, drawn from a sample of 501 ballots representative of the first-round national vote. (3) indicates the experiment with three candidates; (12) indicates the experiment with all twelve candidates. One of the three candidates, Royal, Bayrou, or Sarkozy, always wins among the twelve candidates.
17.3 Conclusion
Point-summing methods are not acceptable methods for electing and ranking. The reasons are easily summarized.
Points mean nothing except that they will be summed. They invite comparisons and not evaluations. This may lead to Arrow's paradox.
When points are in abundance, for example, 0–100 or 0–10, in a large electorate (as opposed to a small jury of experts), the evidence shows that their meanings and uses differ widely among the voters.
Table 17.5
Average Number of Grades per Majority Judgment Ballot, All Candidates and Four Important Candidates, 2007 Orsay Experiment

              All candidates   Four important candidates
Excellent     0.69             1.57
Very Good     1.25             2.34
Good          1.50             1.94
Acceptable    1.74             1.49
Poor          2.27             0.99
To Reject     4.55             3.68
Total         12               12

Note: The four important candidates are Bayrou, Royal, Sarkozy, and Le Pen. Totals are normalized to 12.
18
Approval Voting
If names be not correct, language is not in accordance with the truth of things. If language
be not in accordance with the truth of things, affairs cannot be carried on to success.
Confucius
A relatively recent novelty in voting mechanisms, approval voting, is championed by Steven J. Brams, Peter C. Fishburn (Brams and Fishburn 1983), and
many others. It was first proposed formally by Robert J. Weber (1977).1 It
has been used to elect the officers of several important scientific societies, to
elect national representatives in Russia, and in a referendum held in the state of
Oregon where one proposition among five was to be chosen. Approval voting,
which was discussed in the context of left-right spectra in chapter 6, allows
each voter to cast as many votes as he wishes, but at most one per candidate, so
each voter either approves of a candidate by giving him one vote or disapproves
by giving none, and the winner is the candidate with the most votes. The voter
is offered the possibility of giving any number of candidates 1 point and the
others 0 points, and the order among the candidates is determined by the sums
of their points. In its traditional practice and presentation, approval voting is the
most restrictive of a large family of point-summing methods (see chapter 17);
if presented and practiced completely differently, it may be seen as the most
restrictive case of the majority judgment.
18.1 Traditional Arguments
The traditional commonsense arguments for approval voting date back to the
1970s and seduced many (including the authors). First, voters are better able
to express their preferences than when they can name only a single candidate, which makes it more attractive for voters to participate in elections. Second, voters who are very positive about a candidate who for some reason is a certain loser (e.g., a one-issue candidate) are able to express this by voting for that candidate yet also voting for other candidates who have a better chance of winning. Third, when there are at least three candidates, a plurality winner often fails to obtain a majority of the voters' approvals, whereas an approval vote winner may well do so and is conferred more legitimacy in any case. Fourth, approval voting helps elect the better candidate: "in general [approval voting] helps elect the Condorcet candidate" (Brams and Fishburn 1983). Fifth, the method is easy to understand and to use in practice, and most voting machines can be adjusted to accommodate it. None of these arguments except the fifth has stood the test of time. The first two pale in comparison with a majority judgment endowed with a rich language of grades (as in the 2007 Orsay experiment); the next two are often false in practice.

1. The idea of dichotomous voting schemes was, however, considered earlier by Bartoszynski (1972).
Many of the theoretical arguments for and against approval voting, cast in terms of the traditional model,2 are not convincing either.
Approval voting is not strategy-proof (see chapter 13). At first glance, it would seem that an elector should vote sincerely: approval of a candidate implies approval of all candidates higher in the elector's ranking (how can this be harmful?). This is true and trivial to prove when there are only three candidates: an elector should always vote for his top and never for his bottom candidate, so in any case the vote must be sincere whether an approval is cast for the other candidate or not. But in the presence of at least four candidates, sincerity is not necessarily the best strategy.3
In general, many different profiles of preference-orders will lead to the same approval votes, and two voters with the same preference-orders may cast their approval votes very differently. This opens the door to what seems to be a severely damaging drawback of approval voting: the results (the winner and the order among candidates) are completely indeterminate in the following sense. By varying the voters' individual choices of sincere strategies, one and the same profile of individual preference-orders can produce every possible collective-order and thus every possible winner (Saari and van Newenhizen 1988). An actual vote shows that this can occur: the Social Choice and Welfare (SCW) Society's 1999 presidential election (Brams and Fishburn 2001; Saari 2001a). In this approval voting election, seventy-one members voted, and the result ranked the candidates $A \succ_S C \succ_S B$ (with 32, 30, and 14 votes, respectively); two voters approved of all three candidates, three approved of two, sixty-four approved of one, and two of none. Voters were asked to indicate their preference-orders to enable the results to be analyzed; fifty-two complied. The approval voting result among the fifty-two was the same, $A \succ_S C \succ_S B$ (with 22, 20, and 9 votes, respectively); of these, one voter approved of two candidates, forty-nine of one, and two of none. The profile of the fifty-two individual preference-orders was
$$\begin{array}{llll}
(1) & 13: A \succ B \succ C & (4) & 11: C \succ A \succ B\\
(2) & 11: A \succ C \succ B & (5) & 8: C \succ B \succ A.\\
(3) & 9: B \succ C \succ A & &
\end{array}$$

2. Except when the preferences themselves are dichotomous, meaning voters consider candidates as only good or bad (Barberà, Sonnenschein, and Zhou 1991; Bogomolnaia, Moulin, and Strong 2005).
3. For a detailed analysis of sincerity in approval voting, see Merrill and Nagel (1987); for a fascinating account of the strategies provoked by a type of approval voting in the U.S. presidential election of 1800, leading to thirty-four tied ballots in six days of voting in the House of Representatives before the election of Thomas Jefferson instead of Aaron Burr, see Nagel (2007).
If we imagine that these fifty-two voters vary their behavior, voting sincerely
either for their top candidate or their top two candidates, approval voting
produces every possible collective-order:
$A \succ_S B \succ_S C$: type 1 votes top two, others top one;
$A \succ_S C \succ_S B$: all vote top one;
$B \succ_S A \succ_S C$: types 1 and 5 vote top two, others top one;
$B \succ_S C \succ_S A$: types 1, 3, and 5 vote top two, others top one;
$C \succ_S A \succ_S B$: type 2 votes top two, others top one;
$C \succ_S B \succ_S A$: types 1, 2, 3, and 5 vote top two, others top one.
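These collective-orders can be verified by tallying. The sketch encodes the fifty-two-voter profile (types 1 to 5 with 13, 11, 9, 11, and 8 voters and their preference-orders from best to worst) and computes the approval totals when a chosen set of types approves its top two candidates and every other type only its top one. The names are illustrative.

```python
# SCW Society profile: voter type -> (number of voters, preference-order best to worst).
PROFILE = {
    1: (13, "ABC"),
    2: (11, "ACB"),
    3: (9, "BCA"),
    4: (11, "CAB"),
    5: (8, "CBA"),
}


def approval_totals(top_two_types):
    """Approvals when the listed types approve their top two and all others their top one."""
    totals = {"A": 0, "B": 0, "C": 0}
    for voter_type, (count, order) in PROFILE.items():
        approved = order[:2] if voter_type in top_two_types else order[:1]
        for candidate in approved:
            totals[candidate] += count
    return totals


def collective_order(totals):
    """Candidates from most to fewest approvals, or None if any two are tied."""
    if len(set(totals.values())) < len(totals):
        return None
    return "".join(sorted(totals, key=totals.get, reverse=True))


for types in [{1}, set(), {1, 5}, {1, 3, 5}, {2}, {1, 2, 5}, {1, 2, 3, 5}]:
    totals = approval_totals(types)
    print(sorted(types), totals, collective_order(totals))
```

The first five sets reproduce the first five orders listed above. Letting types 1, 2, and 5 approve their top two gives B and C 30 approvals each (A has 24), a tie; adding type 3 gives C 39, B 30, and A 24, that is, $C \succ_S B \succ_S A$.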
This is an amusing and surprising mathematical result. But it makes little
sense as a practical matter to imagine that sincere approval voters place the
boundary between approvals and disapprovals arbitrarily; on the contrary, they
draw that dividing line with care, strategically or in accord with their evaluations
of the candidates. Analyzing approval voting when voters are believed to have
rank-orders in their minds suggests irrelevant questions, and answers that have
no practical significance.
Complete indeterminacy is a property of any point-summing method. The plurality system and Borda's method are not completely indeterminate, but they are not point-summing methods. With plurality voting, the only possibility is to assign at most one candidate 1 point and the others 0, and with Borda's method the assigned points are necessarily all different: 0, 1, 2, up to the number of candidates less 1.
Approval voting does not and cannot guarantee the election of a Condorcet-winner (not necessarily a bad property). This is proven by the fifty-two-voter profile of the SCW Society's presidential election. For this profile the society's
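$$\begin{array}{lccc}
 & k \text{ voters} & 1 \text{ voter} & k \text{ voters}\\
X: & 12, \ldots, 12 & 12 & 4, \ldots, 4\\
Y: & 16, \ldots, 16 & 8 & 8, \ldots, 8
\end{array}$$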
Imagine that the numbers represent the merit of the candidates in the eyes of the voters (on a scale from a low of 0 to a high of 20): the higher the number, the higher the merit, and their comparisons determine the voters' preferences. Suppose that voters give an approval to any candidate whose merit number is 10 or above and otherwise disapprove. Thus X wins with k + 1 approvals to Y's k, and yet only one voter prefers X to Y.
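The arithmetic of this example can be checked for any k. The sketch lays out the merit numbers as three groups (k voters rating X 12 and Y 16, one voter rating X 12 and Y 8, and k voters rating X 4 and Y 8) and applies the approval threshold of 10; the function name is hypothetical.

```python
def approvals_and_preferences(k, threshold=10):
    """Approvals for X and Y, and the number of voters preferring X, in the example."""
    # Merit numbers on the 0-20 scale: k voters, then 1 voter, then k voters.
    x = [12] * k + [12] + [4] * k
    y = [16] * k + [8] + [8] * k
    approvals_x = sum(merit >= threshold for merit in x)
    approvals_y = sum(merit >= threshold for merit in y)
    prefer_x = sum(mx > my for mx, my in zip(x, y))
    return approvals_x, approvals_y, prefer_x


print(approvals_and_preferences(5))  # (6, 5, 1)
```

X collects one more approval than Y, while 2k of the 2k + 1 voters prefer Y.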
These are three good reasons to study what happens when approval voting
is analyzed as a game.
18.2 The Game of Approval Voting
not strategy-proof, the equilibria of the game of voting are of interest. This problem is first discussed in the context of approval voting in this section. Chapter 20
takes up the problem in a more general context, including majority judgment.
Chapter 19 addresses one aspect of the strategic behavior of candidates under
majority judgment.
In the game of approval voting, a voter has a utility function that is a function only of who wins (which defines the voter's weak preference-order over the candidates). She chooses a strategy that consists of casting a 0 or 1 for each candidate, knowing that every other voter is faced with the same choice and that the winner is the candidate with the most 1s. It has been shown (Brams and Fishburn 1978) that with approval voting, if there exists a Condorcet-winner C, then there exists a Nash equilibrium (a strategy for every voter with the property that no voter can change her vote to obtain what for her is a better outcome) that elects C in sincere, undominated strategies. A strategy of a voter is undominated if there is no other strategy that gives at least as good an outcome for every profile and a strictly better outcome for some profiles.
The significance of this result, however, is doubtful. The fact is that Nash
equilibria are plentiful (see chapter 20); in particular, approval voting can elect
any candidate in sincere, undominated strategies at a Nash equilibrium (Brams
and Sanver 2006). (This is related to complete indeterminacy.) Many different refinements introduced by game theorists have been proposed, but most fail. For example, Selten's perfect equilibria, Myerson's proper equilibria, or Mertens's stable equilibria applied to approval voting all require that voters have
unrealistic amounts of information concerning the behavior of the other voters, ask them to do complex probability computations, and in any case admit
many equilibria that sometimes include insincere strategies, sometimes elect a
Condorcet-loser, and sometimes give the Condorcet-winner no approval votes
whatsoever (De Sinopoli, Dutta, and Laslier 2006).
One of the elections that spurred Weber to propose approval voting in the
first place seems to have been the 1970 Senate contest in the state of New
York. There were three candidates: James Buckley, a conservative Republican
running as the candidate of the Conservative Party, Charles Goodell, a moderate
Republican, the candidate of the Republican Party and the Liberal Party, and
Richard Ottinger, the candidate of the Democratic Party. Goodell and Ottinger
were both viewed as liberal (in the U.S. sense) or center left. The surprising result
was
Buckley (B): 39%, Ottinger (O): 37%, Goodell (G): 24%.
Buckley was elected but the voters of New York had clearly voted against the
conservative candidate. It has been widely claimed that if approval voting had
been used instead of first-past-the-post, Buckley would have been defeated and
Goodell elected. What scientific analysis supports this claim?
For the sake of the argument and in the context of the traditional model, hypothesize that New York's electors had the preference-profile
$$\begin{array}{llll}
(1) & 39\%: B \succ G \succ O & (3) & 37\%: O \succ G \succ B.\\
(2) & 24\%: G \succ O \succ B & &
\end{array}$$
Had the right (type 1) voted for its top choice and the left (types 2 and 3) for
their top two choices, the argument goes, Goodell and Ottinger would have been
tied for first, so one of the two moderate candidates would have been elected.
But why expect this? An elector of type 2, sensing this result, would reason
that it is better to vote only for Goodell, and for the same reason an elector of
type 3 would vote only for Ottinger. If most electors of the left followed this
reasoning, Buckley would again emerge the winner.
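Both scenarios can be tallied from the hypothesized profile. In the sketch each type's ballot is written as the string of candidates it approves and the shares are the percentages above; it is only an arithmetic check of the argument, not a model of how voters actually decide.

```python
# Hypothesized 1970 New York profile: (percent of voters, preference-order best to worst).
PROFILE = [(39, "BGO"), (24, "GOB"), (37, "OGB")]


def approval_tally(ballots_per_type):
    """Approval totals, in percentage points, given each type's approved candidates."""
    tally = {"B": 0, "G": 0, "O": 0}
    for (share, _order), approved in zip(PROFILE, ballots_per_type):
        for candidate in approved:
            tally[candidate] += share
    return tally


# The right votes for its top choice, the left for its top two: Goodell and Ottinger tie.
print(approval_tally(["B", "GO", "OG"]))  # {'B': 39, 'G': 61, 'O': 61}
# Left electors each vote only for their own favorite: Buckley wins again.
print(approval_tally(["B", "G", "O"]))    # {'B': 39, 'G': 24, 'O': 37}
```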
In practice, electors have some idea of what the results will be from repeated
public opinion polls (or other evolving information), so their votes may be
imagined as optimal responses in view of the latest information. Given any
voting system, imagine a succession of polls, each a virtual election in that
system, in which every voter uses the information of the last available poll to
cast an optimal vote in the next election or poll. After several rounds do the
results converge to a stable winner, or better yet, to a stable order of finish among
the candidates, meaning that the winner or the order finally repeats itself?
This problem may be analyzed in two different albeit related manners. One
is to consider a dynamic process where every voter sends his optimal strategic
response to the last poll. The question is then to find whether there are reasonable
conditions under which the result converges to a stable winner and stable order
of finish, together with best response strategies that are simple and easy to use
by voters. The other is to consider the more abstract static question: Does there
exist a strategy-profile of the voters whose outcome is such that no voter (or
more realistically, no coalition of voters) would wish to change his strategy
(or their strategies) were he (or they) permitted to do so.
Robert B. Myerson and Robert J. Weber (1993) were the first to obtain significant results in the study of the second question. They imagined a sophisticated
model similar to those used to study market equilibria in economics. Voters have
numerical utilities for the election of candidates, and in addition probabilities $p = (p_1, \ldots, p_n)$ on the distribution of the total approvals of each candidate that permit them to determine their expected utility gains as a function of their
votes. Thus, given any p that is perceived by all, the outcome may be computed
when voters cast their ballots to maximize their expected utility gains, whatever
the mechanism of voting that is used.
40% : (10, 0, 0)
(2)
(3)
Thus, for example, voters with type 2 utilities have no use for Buckley,
immensely prefer the two others, and have a very slight preference for Goodell
over Ottinger.
First-past-the-post admits three voting equilibria. In the first, B is the likely
winner: every elector votes for his top choice. In the second, G is the likely winner: voters of type 1 utilities vote for B, the others for G. In the third, O is the
likely winner: voters of type 1 utilities vote for B, the others for O.
Approval voting also admits three equilibria. In the first, O is the likely
winner: voters of type 1 utilities give a 1 to B; voters of type 2 utilities give
a 1 to G and to O; and voters of type 3 utilities give a 1 to O. In the second,
G is the likely winner: voters of type 1 utilities give a 1 to B; voters of type 2
utilities give a 1 to G; and voters of type 3 utilities give a 1 to G and to O. In
the third equilibrium, all three are likely winners: voters of type 1 utilities give
a 1 to B; of the 30% of voters with type 2 utilities, 20% give a 1 to G and 10%
give a 1 to G and to O; of the 30% of voters with type 3 utilities, 20% give a 1
to O and 10% a 1 to G and to O.
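The arithmetic behind these equilibria is easy to check. The sketch below tallies the approval totals implied by each strategy profile just described; the encoding of strategies as (share in percent, set of approved candidates) pairs is our own illustrative choice, not Myerson and Weber's notation, and the sketch checks only the totals, not the equilibrium conditions themselves.

```python
# Each equilibrium: list of (share of the electorate in %, candidates approved).
# B = Buckley, G = Goodell, O = Ottinger; shares are those stated in the text.
EQUILIBRIA = {
    "first": [(40, {"B"}), (30, {"G", "O"}), (30, {"O"})],
    "second": [(40, {"B"}), (30, {"G"}), (30, {"G", "O"})],
    "third": [
        (40, {"B"}),
        (20, {"G"}), (10, {"G", "O"}),  # type 2 voters split 20% / 10%
        (20, {"O"}), (10, {"G", "O"}),  # type 3 voters split 20% / 10%
    ],
}


def approval_totals(strategies):
    """Sum, for each candidate, the shares of voters approving him."""
    totals = {"B": 0, "G": 0, "O": 0}
    for share, approved in strategies:
        for candidate in approved:
            totals[candidate] += share
    return totals


for name, strategies in EQUILIBRIA.items():
    totals = approval_totals(strategies)
    top = max(totals.values())
    leaders = sorted(c for c, s in totals.items() if s == top)
    print(name, totals, "top:", leaders)
```

The first profile puts O alone on top (60%), the second G (60%), and the third produces a three-way tie at 40%.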
Borda's method admits an infinity of equilibria, with all three candidates the
likely winners in each. Each is determined by a parameter $\alpha$, where $10\% \le \alpha \le 20\%$, as
follows: of the 40% of voters of type 1 utilities, $\alpha$ give a 2 to B and a 1 to G
and $40\% - \alpha$ give a 2 to B and a 1 to O; of the 30% of voters of type 2 utilities,
$\alpha$ give a 2 to G and a 1 to O and $30\% - \alpha$ give a 2 to G and a 1 to B; of the
30% of voters of type 3 utilities, $\alpha - 10\%$ give a 2 to O and a 1 to B and
$40\% - \alpha$ give a 2 to O and a 1 to G.
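A short sketch can confirm that every member of this family produces a three-way tie in Borda scores (2 points for a first place, 1 for a second, 0 for a third). As above, it checks only the totals, not the conditions that make these profiles equilibria; the function name `borda_totals` is our own.

```python
from fractions import Fraction


def borda_totals(alpha):
    """Borda totals (2, 1, 0 points) of B, G, O for the equilibrium indexed by alpha.

    alpha is a percentage between 10 and 20; shares are in percent of the electorate.
    """
    ballots = [
        # type 1 voters (40% in all)
        (alpha, ("B", "G", "O")),
        (40 - alpha, ("B", "O", "G")),
        # type 2 voters (30% in all)
        (alpha, ("G", "O", "B")),
        (30 - alpha, ("G", "B", "O")),
        # type 3 voters (30% in all)
        (alpha - 10, ("O", "B", "G")),
        (40 - alpha, ("O", "G", "B")),
    ]
    totals = {"B": 0, "G": 0, "O": 0}
    for share, ranking in ballots:
        for points, candidate in zip((2, 1, 0), ranking):
            totals[candidate] += points * share
    return totals


for alpha in (10, Fraction(25, 2), 15, 20):
    print(alpha, borda_totals(alpha))  # every candidate totals 100
```

The terms in $\alpha$ cancel for each candidate, which is why all three remain likely winners throughout the family.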
Myerson and Weber conclude, "In summary, for the electoral situation studied here we
find that under the plurality rule, there is an equilibrium in which the minority (right)
candidate is the only likely winner. Under Borda's rule, the minority candidate is always
a likely winner at equilibrium. Only under approval voting do we see simultaneously the
existence of equilibria in which the minority candidate is not a likely winner, and the
nonexistence of equilibria in which the minority candidate is the only likely winner."
Poundstone (2008) reports that Weber believes this is the analysis that makes the
strongest theoretical case for approval voting, but he goes on to say, "No one knows
with certainty how masses of real-world, twenty-first-century approval voters will act.
Myerson and Weber have shown that approval voting works well when voters are the
perfect Machiavellians of game theory . . . However, no one expects all voters to be so
calculating, or for all elections to be Nash equilibria" (214, 218). The facts support
Poundstone's conclusion: in the French presidential contest of 2007, an election fraught
with temptations to vote strategically (voter utile), polls estimated that at most 30%
of the votes were deliberately strategic. Myerson and Weber did not study the dynamics.
Myerson investigates extensions of this work in several later papers (1998;
2000; 2002). The mathematical theory of the refinement of Nash equilibria
introduces probabilistic perturbations into the model. As Myerson (2002) states,
"The analysis of voting games in this paper is based on the assumption that . . .
each voter chooses his ballot to maximize the utility that he gets from the election,
which is assumed to depend only on which candidate wins the election.
This assumption seems natural and realistic, but it implies that each voter cares
about his choice of the ballot only in the event that his ballot could pivotally
change the outcome of the election. So this theory of rational voting necessarily
implies that voters' decisions may depend on the relative probabilities of various
ways that one vote may be pivotal in the election, even though these pivotal
probabilities may be very small in a large election." These are very small
probabilities indeed, and the assumptions are not, we believe, natural or realistic.
But the merit of Myerson's approach is that it enables several different methods
of voting to be compared in one model. For example, he is able to show that
the number of approval voting equilibria is smaller than the number of plurality
voting equilibria, so approval voting has greater predictive value. He introduces
uncertainty by varying the number of voters (or of ballots counted) according to a
Poisson distribution. Thus, there is a minute probability that any one voter is the
only one to cast a ballot, which clearly makes him pivotal. This model affords two
main advantages that simplify the computations: voters share common public
information, and the actions of voters are independent. Myerson is then able to
show, in an admittedly simplified situation in which there are three candidates
and two types of voters (so necessarily at least half of the voters have the same
preferred candidate), that a Nash equilibrium winner assures the majoritarian
outcome, which is not true of plurality voting, among other seemingly reasonable
methods. The dynamics, however, are not studied. Moreover, the voters'
optimal strategies may not be sincere, and there are situations where there is a
Condorcet-winner that is the outcome of no Nash equilibrium.
Three examples put in perspective the qualitative interpretation of these
results (Núñez 2008). In the first, there are three candidates and three types
of voters (instead of two). One type constitutes a majority so that its preferred
candidate is the Condorcet-winner C, but C is not the winner in every equilibrium. In the second, there are four candidates and a Condorcet-winner C who
is second in every voters preference, but at equilibrium the strategies are insincere and C receives zero approval votes. In the last example, there are exactly
two equilibria, but neither elects the Condorcet-winner.
Laslier (2009) presents more positive results. Uncertainty is introduced via
technical errors: ballots may be miscounted. Miscounts occur independently of
the identities of voters and of candidates, and are interpreted as a probability
of error. Laslier states, "Rationality implies that a voter can decide her vote by
limiting her conjecture to those events in which her vote is pivotal . . . [T]his
is a very rare event, and it may seem unrealistic that actual voters deduce their
choices from implausible premises. One can wish that a positive theory of the
voter be more behavioral and less rational."
In this model the strategy that maximizes a voter's expected utility when
there are sufficiently many voters (her best response, introduced as the "poll
assumption" by Brams and Fishburn (1983, 115) and later called the "leader
rule") is simple to describe. Suppose the two leading contenders of the predicted
order are $X \succ_S Y$ (where in case of ties X is taken to be the elector's
preferred among the candidates in first place and Y is taken to be the elector's
preferred among the candidates in second place). Then what we call the
poll-leader rule is this:
If the voter prefers X to Y , then she votes for X and the candidates she prefers
to X.
If the voter does not prefer X to Y , then she votes for the candidates she
prefers to X.
This is at once a simple and reasonable rule that yields several important results.
Result 1 The poll-leader rule is sincere and undominated, and if there is
an equilibrium outcome with no tie, the winner is a Condorcet-winner. This
is easy to see. Suppose there are a winner and a runner-up satisfying $X \succ_S Y$.
If a majority preferred Y or any other candidate Z to X, then in the next round Y
or Z would have more approvals than X, a contradiction. So X is the Condorcet-winner.
Result 2 The score of the winner X is his majoritarian score against Y , and
the score of any other candidate is his majoritarian score against the winner X.
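The poll-leader rule and Result 2 can be illustrated with a minimal sketch. It assumes strict preferences and a known pair of leaders, uses exact fractions for vote shares, and runs on a small hypothetical three-candidate profile; the function names are our own.

```python
from fractions import Fraction


def poll_leader_ballot(ranking, x, y):
    """Approval ballot of one voter under the poll-leader rule.

    ranking lists the voter's candidates from most to least preferred;
    x and y are the predicted leader and runner-up.
    """
    above_x = set(ranking[: ranking.index(x)])  # candidates the voter prefers to x
    if ranking.index(x) < ranking.index(y):     # the voter prefers x to y
        return above_x | {x}
    return above_x


def approval_scores(profile, x, y):
    """Approval share of each candidate when every voter follows the rule."""
    scores = {c: Fraction(0) for c in profile[0][1]}
    for share, ranking in profile:
        for c in poll_leader_ballot(ranking, x, y):
            scores[c] += share
    return scores


def head_to_head(profile, a, b):
    """Share of the electorate preferring a to b (a's majoritarian score against b)."""
    return sum((share for share, r in profile if r.index(a) < r.index(b)), Fraction(0))


# Hypothetical profile: (share of the electorate, ranking from best to worst).
profile = [
    (Fraction(2, 5), ["A", "B", "C"]),
    (Fraction(7, 20), ["B", "C", "A"]),
    (Fraction(1, 4), ["C", "A", "B"]),
]
x, y = "A", "B"
scores = approval_scores(profile, x, y)
# Result 2: x scores his head-to-head share against y; everyone else against x.
print(scores[x] == head_to_head(profile, x, y))                                # True
print(all(scores[z] == head_to_head(profile, z, x) for z in scores if z != x))  # True
```

The check holds for any profile, because a candidate other than x is approved exactly when the voter ranks him above x, while x himself is approved exactly when the voter ranks x above y.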
The poll-leader rule defines a dynamic process (which is not studied in Laslier
2009). At each step voters cast their strategic approval ballots, which determine
the leaders of the next step. Unfortunately, the process does not necessarily
converge, as the following preference-profile shows (where the relative shares
of the voter types are given):
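$1/3 - \varepsilon : A \succ C \succ B \succ D$
$1/3 - \varepsilon : D \succ C \succ A \succ B$
$1/3 - \varepsilon : B \succ C \succ D \succ A$
$\varepsilon : A \succ B \succ D \succ C$
$\varepsilon : D \succ A \succ B \succ C$
$\varepsilon : B \succ D \succ A \succ C$,
where $\varepsilon > 0$ is small.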
If a poll predicts that at time t the leaders are $A \succ_S C$, then the scores at
time t + 1 are $s(A) = 1/3 + 2\varepsilon$, $s(B) = 1/3$, $s(C) = 2/3 - 2\varepsilon$, and
$s(D) = 2/3$, so the new leaders are $D \succ_S C$. Continuing, at time t + 2 the leaders
become $B \succ_S C$ and at time t + 3 they are once again $A \succ_S C$, and so on. So
the process does not converge, and the Condorcet-winner C is the perennial
runner-up.
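The cycle can be reproduced numerically. The sketch below reads the shares as 1/3 − ε for the first three voter types and ε for the last three (which reproduces the scores given in the text), takes ε = 1/100 as an arbitrary small value, applies the poll-leader rule, and reads off the two leaders at each step. No ties arise for this choice of ε, so no tie-breaking rule is needed.

```python
from fractions import Fraction

EPS = Fraction(1, 100)  # illustrative small epsilon
THIRD = Fraction(1, 3)

# (share of the electorate, ranking from best to worst)
PROFILE = [
    (THIRD - EPS, "ACBD"),
    (THIRD - EPS, "DCAB"),
    (THIRD - EPS, "BCDA"),
    (EPS, "ABDC"),
    (EPS, "DABC"),
    (EPS, "BDAC"),
]


def poll_leader_scores(profile, x, y):
    """Approval scores when every voter applies the poll-leader rule to leaders x, y."""
    scores = {c: Fraction(0) for c in "ABCD"}
    for share, ranking in profile:
        approved = set(ranking[: ranking.index(x)])  # candidates preferred to x
        if ranking.index(x) < ranking.index(y):      # voter prefers x to y
            approved.add(x)
        for c in approved:
            scores[c] += share
    return scores


def next_leaders(scores):
    """The two highest-scoring candidates (no ties occur in this example)."""
    first, second = sorted(scores, key=scores.get, reverse=True)[:2]
    return first, second


leaders = ("A", "C")
history = [leaders]
for _ in range(3):
    leaders = next_leaders(poll_leader_scores(PROFILE, *leaders))
    history.append(leaders)
print(history)  # [('A', 'C'), ('D', 'C'), ('B', 'C'), ('A', 'C')]
```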
The poll-leader rule is based on the two leading candidates. Simpler sincere
response strategies may be envisaged. For example, if the elector's preferred
candidate is the expected winner X, then he votes for X; otherwise, he votes
for the candidates he prefers to X. But this process need not converge either. For
consider the SCW Society's presidential election (see section 18.1), and suppose
the Condorcet-winner C is the predicted winner. Then the approval voting outcome
at the next stage is $A \succ_S B \succ_S C$ (with respective scores 24, 22, 19),
so A wins. But if A wins and the individuals again use the same strategy, the
next outcome is $C \succ_S A \succ_S B$, so C wins, and so on. Indeed, exactly the same
phenomenon arises with the preference-profile postulated for the 1970 New
York Senate election. The simple majority rule outcome is $G \succ_S O \succ_S B$, so
G is the Condorcet-winner. But when G is the expected winner, the strategy
yields $B \succ_S O \succ_S G$, whereas when B is the expected winner, the strategy
yields $G \succ_S O \succ_S B$.
There is, however, a best response dynamic that does converge to the
Condorcet-winner (when he exists) that is based on the poll-leader rule.⁵ Given
the outcome of a poll at time t, if at least two candidates have more than half of
the electorate's approvals, two of them are selected at random and arbitrarily
designated winner and runner-up. Otherwise, the leader and runner-up are the
candidates with the highest numbers of votes (chosen at random in case of ties).
The voters then use the poll-leader rule, and the next poll is taken.
5. This dynamic is studied in the context of a more general model in chapter 20.
After the first stage, the Condorcet-winner C necessarily has a majority of
approvals, but other candidates may also have majorities (as in the last example).
C will necessarily conserve a majority of approvals at every subsequent
stage (as in the last example). Therefore, at some stage C must eventually be
selected as the leader. At that stage, every other candidate's approval score will
equal his majoritarian (head-to-head) score against C, and C's approval score will
be his majoritarian score against the runner-up. Since C is the Condorcet-winner,
all the other scores are then below one-half, so C is the only candidate with a
majority and remains the leader in all subsequent stages. If there is no
Condorcet-winner, the leaders in subsequent stages will cycle among the top cycle:
the smallest set of candidates each of whom defeats every candidate outside it in
face-to-face encounters.
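The following sketch simulates this convergent dynamic on the cyclic profile above (shares 1/3 − ε and ε, with ε = 1/100), under our reading of the selection step: two majority-approved candidates drawn at random when at least two exist, otherwise the top two scores with random tie-breaking. The seed and the number of steps are arbitrary illustrative choices.

```python
import random
from fractions import Fraction

EPS = Fraction(1, 100)  # illustrative small epsilon
THIRD = Fraction(1, 3)
HALF = Fraction(1, 2)

# (share of the electorate, ranking from best to worst) for the cyclic profile.
PROFILE = [
    (THIRD - EPS, "ACBD"),
    (THIRD - EPS, "DCAB"),
    (THIRD - EPS, "BCDA"),
    (EPS, "ABDC"),
    (EPS, "DABC"),
    (EPS, "BDAC"),
]


def poll_leader_scores(profile, x, y):
    """Approval scores when every voter applies the poll-leader rule to leaders x, y."""
    scores = {c: Fraction(0) for c in "ABCD"}
    for share, ranking in profile:
        approved = set(ranking[: ranking.index(x)])  # candidates preferred to x
        if ranking.index(x) < ranking.index(y):      # voter prefers x to y
            approved.add(x)
        for c in approved:
            scores[c] += share
    return scores


def select_leaders(scores, rng):
    """Leader and runner-up for the next poll, under our reading of the text."""
    majority = [c for c, s in scores.items() if s > HALF]
    if len(majority) >= 2:
        # Two majority-approved candidates at random; the draw order
        # arbitrarily designates winner and runner-up.
        x, y = rng.sample(majority, 2)
        return x, y
    # Otherwise the two highest scores; shuffling first, then a stable
    # sort, breaks any ties at random.
    candidates = list(scores)
    rng.shuffle(candidates)
    candidates.sort(key=scores.get, reverse=True)
    return candidates[0], candidates[1]


rng = random.Random(7)  # fixed seed for a reproducible run
leaders = ("A", "C")    # the starting poll from which the plain dynamic cycles
history = []
for _ in range(60):
    leaders = select_leaders(poll_leader_scores(PROFILE, *leaders), rng)
    history.append(leaders[0])
print(history[-5:])     # C, once selected as leader, stays the leader
```

Because C keeps a majority of approvals at every step, each draw selects him as leader with probability at least one-half in this example, so the chance of his never being selected within sixty steps is negligible.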
If one accepts as a working hypothesis that these analyses of the game of
approval voting show approval voting works well, what makes it work at all
in this model must be underlined: the purely strategic behavior of individual
electors. However, the very essence of an election is to arrive at a collective
decision based on the electors' true opinions; that it should depend only
on an elector's strategic considerations concerning winners and nothing else,
sometimes implying votes at variance with the voter's true opinions, is not
commendable. Moreover, voters are most likely influenced or coordinated by
a small number of leaders or parties. The game of majority judgment voting
leads to more felicitous conclusions (see chapter 20, where games of voting are
studied in greater detail): both behavioral and rational voters are included in
the context of one model.
18.3 Approval Judgment
Table 18.1
Polling Results, March 20 and 22, 2007, French Presidential Election
Bayrou
Sarkozy
Royal
Le Pen
Question: Do you
personally wish each
of the following
candidates to win the
presidential election?
Yes
No
Yes
No
60%
59%
49%
12%
36%
38%
48%
84%
33%
29%
36%
48%
56%
49%
On the other hand, the words "approval" and "disapproval" carry meanings,
and an elector may be influenced by those meanings. A poll conducted several
weeks before the first round of the French presidential election of 2007 shows
the crucial importance of the meanings attached to the 1s and 0s (table 18.1). The
electors gave completely different answers to the two questions. The question on
the left in the table poses an absolute question, that on the right a relative one. The
first invites an evaluation, the second proffers a contrast. Significantly, the first
question elicited a "yes" for the four major candidates, an answer considerably
more in keeping with their Good or better grades in the 2007 majority judgment
experiment than the answer to the second question. But which of these questions
(or others) does an approval voter have in mind? Undoubtedly, different voters
have different conceptions, and so answer different questions. Their votes carry
very different meanings: adding them makes no sense.
However, approval voting may be presented, practiced, and analyzed as a
special case of the majority judgment when the common language of grades
consists of exactly two grades. In this case, the majority-ranking is precisely
the approval voting ranking. Suppose (for the sake of the explanation) that the
two grades are 1 and 0. A candidate's majority-grade is 1 if she has a majority
of 1s and 0 otherwise. A candidate with a majority-grade of 1 is ranked ahead
of a candidate with a majority-grade of 0 by both the majority judgment and
approval voting. If two candidates have the same majority-grade, the majority-ranking
puts the first ahead of the second if the first has more 1s, as does approval
voting.
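The equivalence with two grades can be checked directly. In this sketch the majority-grade of a candidate is the lower middlemost of her grades, which with grades 0 and 1 is 1 exactly when strictly more than half of the grades are 1s; ties in majority-grade are broken by the number of 1s, as stated above. The tallies are hypothetical.

```python
def majority_grade(ones, n):
    """Majority-grade with grades {0, 1}: the lower middlemost of n grades
    is 1 exactly when strictly more than half of them are 1s."""
    return 1 if 2 * ones > n else 0


def majority_ranking(ones_by_candidate, n):
    """Order by majority-grade, then by number of 1s."""
    return sorted(
        ones_by_candidate,
        key=lambda c: (majority_grade(ones_by_candidate[c], n), ones_by_candidate[c]),
        reverse=True,
    )


def approval_ranking(ones_by_candidate):
    """Order by number of 1s (approvals) alone."""
    return sorted(ones_by_candidate, key=ones_by_candidate.get, reverse=True)


# Hypothetical tallies: number of 1s ("approvals") among n = 100 ballots.
ones = {"P": 62, "Q": 48, "R": 71, "S": 50}
print(majority_ranking(ones, 100))  # ['R', 'P', 'S', 'Q']
print(approval_ranking(ones))       # ['R', 'P', 'S', 'Q']
```

The two orders always agree, because the majority-grade never decreases as the number of 1s grows; the first component of the sort key can only confirm what the second already says.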
But to be an instance of the majority judgment, a clear and absolute question
must be posed, and voters must understand that they are assigning grades to
candidates: they are making absolute evaluations of the merits of candidates, not
comparing them. This has not been the case in any of the theoretical discussions
or applications of approval voting, where the instructions (giving 1s and 0s,
adding them, ranking candidates according to their sums) and the analyses of
results all suggest the point of view that what is important is comparisons. Had
anyone imagined that crosses and no crosses, ticks and no ticks, or 1s and 0s
were grades (absolute evaluations), they would (or should) have immediately
pointed out that approval voting is a mechanism that excludes Arrow's paradox
and thus satisfies IIA (and, of course, satisfies all the other good properties of
the majority judgment). This is not true when the implicit or explicit question
is relative (like the question at the right in table 18.1, or when crosses or ticks
are to be added), for the responses depend on the lists of candidates, leading to
Arrow's paradox.
When approval voting is practiced as a majority judgment, a language of two
words must be formulated that makes clear the evaluations are absolute grades.
To distinguish it from its traditional practice we call it approval judgment. The
question at the left in table 18.1 ("Would each of the following candidates be a
good President of France?") is one reasonable possibility of approval judgment.
Had it been used in the 2007 Orsay experiment, a candidate might have received
a "yes" from every voter who assigned him a grade of at least Good. Among
the four major candidates this would have given the following result: Bayrou
69.4%, Royal 58.5%, Sarkozy 53.1%, and Le Pen 13.8%. However, had Very
Good been substituted for Good, the result would have been Bayrou 44.3%,
Royal 39.4%, Sarkozy 38.9%, and Le Pen 7.6%. It happens that the order
among the candidates is the same, but the numbers are very different. And if
Excellent had been substituted instead, the result would have been a different
order: Sarkozy 19.1%, Royal 16.7%, Bayrou 13.6%, and Le Pen 3.0%. To elect
a candidate whom only 44% of the electorate affirm to be Very Good, the
others denying it, or whom 80.9% deny is Excellent, is not
a particularly salutary result. The SCW Society election experienced the same
difficulty, and other reported experiments do as well.
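How strongly the threshold matters can be seen even on a toy example. This sketch computes approval-judgment shares at three thresholds from four hypothetical six-grade ballots. The names of the three grades below Good (Acceptable, Poor, To Reject) are our assumption about the Orsay scale, and the ballots are invented for illustration only.

```python
# Six-grade scale, listed from lowest to highest (lower three names assumed).
GRADES = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]


def approval_shares(ballots, threshold):
    """Share of ballots giving each candidate a grade at or above `threshold`."""
    level = GRADES.index(threshold)
    candidates = ballots[0].keys()
    return {
        c: sum(GRADES.index(b[c]) >= level for b in ballots) / len(ballots)
        for c in candidates
    }


# Four hypothetical majority judgment ballots over hypothetical candidates X and Y.
ballots = [
    {"X": "Good", "Y": "Excellent"},
    {"X": "Very Good", "Y": "Poor"},
    {"X": "Very Good", "Y": "Acceptable"},
    {"X": "Good", "Y": "Excellent"},
]
for threshold in ("Good", "Very Good", "Excellent"):
    print(threshold, approval_shares(ballots, threshold))
```

With "Good or better" X is approved on every ballot and Y on half; with "Very Good or better" they tie; with "Excellent" only Y is approved at all. The same ballots thus yield three different approval outcomes depending on the question asked.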
More probing are analyses that use the 1,733 valid majority judgment ballots
of the 2007 Orsay experiment. In the first analysis, 10,000 random samples of
101 ballots were taken to compare how many times each of the candidates was
elected by each of four majority judgment methods: (1) the majority judgment
as it was presented with a language of six grades, (2) approval judgment where
approval means a candidate received a grade of Good or better, (3) approval
judgment where approval means Very Good or better, and (4) approval judgment where approval means Excellent. In every case one of the three major
candidates, Bayrou, Royal, and Sarkozy, is the winner. The results are given in
table 18.2a.
Table 18.2a
Number of Wins among All Candidates, Winner Always Royal, Bayrou, or Sarkozy, 2007 Orsay
Experiment
Royal
Bayrou
Sarkozy
Tie
2,692
1,387
1,271
215
602
6,489
7,798
9,632
5,905
1,446
919
32
801
678
12
121
Note: Four majority judgment methods. Ten thousand samples of 101 ballots, drawn
from the 1,733 ballots.
Table 18.2b
Number of Wins among All Candidates, Winner Always Royal, Bayrou, or Sarkozy, 2007 Orsay
Experiment
Royal
Bayrou
Sarkozy
Tie
561
1,329
1,364
401
8
1,339
4,048
8,743
9,186
6,686
4,579
487
245
646
9
369
Note: Four majority judgment methods. Ten thousand samples of 101 ballots, drawn
from a sample of 501 ballots representative of the national vote.
The second analysis is different only in that each of the random samples
is drawn from a set of 501 ballots, drawn randomly from the 1,733, that are
representative of the first-round national vote (see chapter 6). The results are
given in table 18.2b.
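The resampling procedure can be sketched as follows for the approval-judgment methods (2) to (4). The sketch assumes each sample of 101 ballots is drawn without replacement (the text does not say), treats any shared first place as a tie, and, for lack of the real data, runs on synthetic ballots with integer grades 0 to 5. Majority judgment proper, method (1), would need its full tie-breaking rule and is not reproduced here.

```python
import random
from collections import Counter


def approval_winner(sample, threshold):
    """Approval-judgment winner of one sample of ballots, or "Tie".

    Each ballot maps candidate -> integer grade (higher is better); a
    candidate is approved on a ballot when his grade is >= threshold.
    """
    counts = Counter()
    for ballot in sample:
        for candidate, grade in ballot.items():
            counts[candidate] += grade >= threshold
    (first, top), (_, second) = counts.most_common(2)
    return "Tie" if top == second else first


def count_wins(ballots, threshold, n_samples=10_000, sample_size=101, seed=0):
    """Tally winners over repeated random samples (drawn without replacement)."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_samples):
        wins[approval_winner(rng.sample(ballots, sample_size), threshold)] += 1
    return wins


# Synthetic stand-in for the 1,733 Orsay ballots: three hypothetical
# candidates P, Q, R graded 0 (lowest) to 5 (highest) at random.
gen = random.Random(1)
ballots = [{c: gen.randint(0, 5) for c in "PQR"} for _ in range(1733)]
for threshold in (3, 4, 5):  # Good or better, Very Good or better, Excellent
    print(threshold, dict(count_wins(ballots, threshold, n_samples=2_000)))
```

The demonstration uses 2,000 samples to run quickly; the default of 10,000 matches the number of samples reported in the text.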
The results are concordant. First, they show the dramatic impact of the question that is posed in conducting an approval voting election. Second, when
approval means Good or better, the candidate of the center, Bayrou, is favored;
when approval means Very Good or better, and still more so when it means
Excellent, the candidate of the center is penalized. The problem of exactly
which question should be posed and which answer solicited is at once important politically and difficult scientifically. Almost all the literature on approval
voting to date has completely ignored the problem of how the question and
answer should be formulated, with one notable exception (Koc 1988). Koc
shows that the wording of the instructions influences the behavior of the voters.
The wording actually does more: it determines winners.
The majority judgment has been criticized because it can elect a candidate
preferred by only one voter. As noted, such examples are easily found because
only the grades of a candidate count, not who gave them (see example 16.2).
But, of course, the same is true of approval voting (or judgment), as that same
example shows when approval means a grade of at least 10. The response of
approval voting enthusiasts: strategic voting will avoid it. When one argument
does not work, try another. But, of course, strategic voting will avoid the
phenomenon with the majority judgment as well. A comparison is valid only when
made in the same context.
The practical instances where evaluations are made on the basis of a scale of
measurement composed of two levels are rare. Sometimes, in some university
courses, students may elect to be graded on a Pass or Fail scale. There are also
situations where sets of objects or competitors are to be classified as acceptable
or not. One example is candidates for appointment to a university faculty in
France, who must first be judged qualified in order to be allowed to apply for a
position. Another example is election to the National Baseball Hall of Fame.
A panel of journalists judges candidates either suitable or not for election. Each
elector may approve multiple candidates, but to be successful a candidate must
be listed on 75% of the ballots (for an analysis of such methods, see Barberà,
Sonnenschein, and Zhou 1991).
But when it comes to electing and ranking candidates, a two-level scale of
measurement seems so unnecessarily restrictive as to be unnatural. In political
elections, or indeed competitions among performers or products, the evaluations
of voters or judges are invariably more complex than that allowed by a two-level
scale. The 2007 Orsay experiment showed the ease with which voters used
a six-level scale, and experience in judging skaters, divers, pianists, students,
wines, and other competitors or products shows that considerably finer scales
may sensibly be defined and used. So why confine a language of voting to only
two grades?
18.4 Practice
On April 21, in the first round of the French presidential election of 2002 (still
fascinated by the idea of approval voting and before we had any inkling of
working on the general problem of electing and ranking), one of the authors
initiated an approval voting experiment,⁶ conducted under the same general
conditions as the Orsay experiment of 2007, in five of Orsay's twelve precincts⁷
and the one precinct of Gy-les-Nonains, a small country town in Loiret.
6. The idea of an experiment on approval voting on a large scale in parallel with a French presidential
election actually goes back to 1995, when Balinski and Laurent Mann prepared a basic plan but
were too late to realize it. For a more detailed account of the 2002 experiment, see Balinski et al.
(2003).
7. 1st, 5th, 6th, 7th, and 12th precincts.
Table 18.3a
Number of Ballots with k Crosses (k = 0, 1, . . . , 16), Approval Voting Experiment, Five Precincts
of Orsay and Gy-les-Nonains, First Round, French Presidential Election, April 21, 2002

No. of Crosses   0     1     2     3     4     5     6     7     8     9–16
No. of Ballots   36    287   569   783   492   258   94    40    16    12
% of Ballots     1.4   11.1  22.0  30.3  19.0  10.0  3.6   1.5   0.6   0.5
Table 18.3b
Approval Voting Results, Five Precincts of Orsay and Gy-les-Nonains, First Round, French
Presidential Election, April 21, 2002

               Percent of Ballots   Percent of     Official Vote,
               with Crosses         All Crosses    First Round
Jospin          40.5%               12.9%          19.5%
Chirac          36.5%               11.6%          18.9%
Bayrou          33.5%               10.7%           9.9%
Chevènement     30.3%                9.6%           8.1%
Mamère          28.9%                9.2%           7.9%
Madelin         21.3%                6.8%           5.0%
Taubira         18.9%                6.0%           3.2%
Lepage          17.9%                5.7%           2.8%
Besancenot      17.6%                5.6%           3.1%
Laguiller       15.4%                4.9%           3.7%
Le Pen          14.6%                4.6%          10.0%
Hue             11.5%                3.6%           2.7%
Saint-Josse      7.8%                2.5%           1.7%
Boutin           7.8%                2.5%           1.3%
Mégret           7.7%                2.4%           1.3%
Gluckstein       4.3%                1.4%           0.8%
Total          314.6%              100%           100%
Of the 3,346 voters who voted officially, 2,597 (78%) participated in the experiment and 2,587 ballots were valid. Some of the results of this experiment have
already been presented in chapter 6. As explained there, officially voters were
confronted with having to give their one vote to one of sixteen candidates.
The ballot of the experiment consisted of a list of the candidates together with
completely neutral instructions:
Rules of approval voting: The elector votes by placing crosses [in boxes corresponding
to candidates]. He may place crosses for as many candidates as he wishes, but not more
than one per candidate. The winner is the candidate with the most crosses.
On average, the voters cast 3.15 crosses per ballot (the distribution is given
in table 18.3a). The official system offered voters seventeen possible messages;
approval voting offered more than 65,000.⁸ Of the 2,587 valid ballots, 813 were
different. Voters expressed relief at having the possibility of casting crosses for
as many candidates as they wished.
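The message counts quoted here, and the consistency of the reported average with table 18.3a, can be checked in a few lines. We assume the seventeen official messages are one per candidate plus a blank ballot. Because the table groups 9 to 16 crosses together, the average can only be bracketed, not recomputed exactly.

```python
N_CANDIDATES = 16

official = N_CANDIDATES + 1   # assumption: one name, or a blank ballot
approval = 2 ** N_CANDIDATES  # each candidate crossed or not
mj = 6 ** N_CANDIDATES        # one of six grades per candidate
print(official, approval, mj)  # 17 65536 2821109907456

# Table 18.3a: ballots by number of crosses; 12 further ballots have 9 to 16 crosses.
ballots_by_k = {0: 36, 1: 287, 2: 569, 3: 783, 4: 492, 5: 258, 6: 94, 7: 40, 8: 16}
grouped = 12
total = sum(ballots_by_k.values()) + grouped
crosses = sum(k * n for k, n in ballots_by_k.items())
low = (crosses + 9 * grouped) / total    # if every grouped ballot had 9 crosses
high = (crosses + 16 * grouped) / total  # if every grouped ballot had 16 crosses
print(total, round(low, 3), round(high, 3))  # 2587 3.136 3.168
```

The reported 2,587 valid ballots and the reported average of 3.15 crosses both fit the table.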
The outcomes in the six voting precincts with approval voting and with
the official voting are given in table 18.3b. The one significant difference
between them is that Le Pen is third in the official vote and eleventh in the
approval vote (other differences are that in the official voting, Laguiller moves
up three places to follow Madelin, and Besancenot moves up one place to follow
Taubira). The four most important candidates, Chirac, Le Pen, Jospin, and Bayrou,
all lost relative support in approval voting, whereas every one of the minor
candidates gained relative support. If Orsay and Gy-les-Nonains were at all
representative of France, the results of the experiment showed that the indecision
of the country (the lack of enthusiasm for any one candidate or party) was
even more extreme than the usual method of voting indicated. No candidate
received anywhere near a majority of the ballots. No legitimacy is added to the
first-place candidate, contrary to the claims made for approval voting. Whereas
we entered into this experiment persuaded by the usual commonsense arguments
that approval voting was a good idea, the results left us with a distinct
feeling that it is not a reasonable mechanism. We did not know exactly why. Now
we do.⁹
The result of the second round on May 5, 2002, in the five precincts of Orsay
and the one of Gy-les-Nonains was Chirac 89.3%, Le Pen 10.77%. In contrast
with the majority judgment, the electorate's will as expressed by approval voting
is not sufficient to predict this outcome (nor, therefore, to estimate the possible
result of any other face-to-face confrontation). Crosses and no crosses do not
communicate enough information. The problem is the frequency with which
voters assigned crosses to two candidates or no crosses to two candidates (table
18.3c). Do crosses or no crosses for two candidates imply indifference between
them, or not?
8. With sixteen candidates there are $2^{16} = 65{,}536$ possible messages. With the majority judgment
there are $6^{16}$, or some 2.8 trillion, possible messages.
9. For a different analysis of this experiment, see Laslier and van der Straeten (2004).

Table 18.3c
Percentages of Same Votes to Two Candidates, Approval Voting Results, Five Precincts of Orsay
and Gy-les-Nonains, First Round, French Presidential Election, April 21, 2002

               Jospin   Chirac   Bayrou   Mamère   Chevènement   Le Pen
Jospin           –       34%      44%      75%        56%         48%
Chirac          34%       –       66%      51%        54%         64%
Bayrou          44%      66%       –       55%        60%         61%
Mamère          75%      51%      55%       –         52%         61%
Chevènement     56%      54%      60%      52%         –          54%
Le Pen          48%      64%      61%      61%        54%          –

Note: Only the more important candidates are shown. Same votes means both crosses or both no
crosses.

Three estimates of a face-to-face vote between Chirac and Le Pen were
calculated. In each, if a candidate has a cross and the other does not, the first
is given 1 vote, the second 0. The first estimate gives 1/2 vote to each candidate
if both have crosses or neither does; this interprets crosses and no crosses as
expressing a voter's indifference between them. This yields the estimate Chirac
61%, Le Pen 39%.
The second estimate gives 1/2 vote to each if both have crosses, otherwise 0;
this interprets crosses as expressing a voter's indifference between
them and no crosses as saying nothing. This yields the estimate Chirac 79%,
Le Pen 21%.
The last estimate gives no vote to each if both have crosses or both do not;
no indifference is expressed. This yields the estimate Chirac 80%, Le Pen 20%.
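The three estimation rules can be written as one function of four ballot counts. The counts in the example are hypothetical, since the text does not give the Chirac and Le Pen ballots broken down this way. The sketch also assumes each estimate is normalized to the votes credited to the two candidates, so that the two shares sum to 100%.

```python
from fractions import Fraction


def face_to_face(only_a, only_b, both, neither, rule):
    """Share of the two-candidate vote credited to candidate a.

    only_a / only_b: ballots crossing one candidate but not the other;
    both / neither: ballots crossing both or neither. rule is 1, 2 or 3,
    following the three estimates described in the text.
    """
    if rule == 1:    # half a vote each for both crosses and for no crosses
        half = Fraction(both + neither, 2)
        a, b = only_a + half, only_b + half
    elif rule == 2:  # half a vote each for both crosses; no crosses say nothing
        half = Fraction(both, 2)
        a, b = only_a + half, only_b + half
    elif rule == 3:  # same votes express no indifference and count for nothing
        a, b = Fraction(only_a), Fraction(only_b)
    else:
        raise ValueError("rule must be 1, 2 or 3")
    return a / (a + b)


# Hypothetical counts out of 100 ballots.
for rule in (1, 2, 3):
    print(rule, float(face_to_face(30, 10, 20, 40, rule)))  # 0.6, then 0.666..., then 0.75
```

The more the "same votes" are counted as indifference, the more the estimate is pulled toward an even split, which is why the three rules can differ so widely.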
None of these estimates comes close to the actual result in the six voting
precincts. Several crosses on a voter's approval ballot (and even more so, several
no crosses) do not mean the voter is indifferent among the corresponding
candidates. This shows that the approval voting mechanism does not induce the
voters to correctly express their preferences or their indifferences.
Again, the problem is that crosses mean different things to different people.
In this experiment, and more generally wherever approval voting has been used,
it appears to be a mechanism that simply adds crosses: implicitly the vote is
relative; it asks voters to make pair-by-pair comparisons. The evidence confirms
that this invites strategic voting.
The crosses, it turns out, were used in the same way by the voters. Not only
are the average numbers of crosses per ballot about the same across all six
precincts but so also are the distributions of the number of crosses per ballot
(table 18.3d).
This does not, however, imply that the two options constituted a common
language of absolute grades because use includes strategic behavior, and perhaps what is in common is the strategic behavior. The point is, If voters assign
Table 18.3d
Percentages of Ballots with k Crosses (k = 0, 1, . . . , 16), Approval Voting Experiment, Five
Precincts of Orsay and Gy-les-Nonains, First Round, French Presidential Election, April 21, 2002

No. of     All         Gy-les-   Orsay   Orsay   Orsay   Orsay   Orsay
Crosses    precincts   Nonains   1st     5th     6th     7th     12th
0           1.4         3.3       0.5     1.7     1.1     0.6     1.5
1          11.1        13.2      10.6     9.9    11.4    10.7    11.1
2          22.0        25.3      21.5    22.5    20.4    20.9    22.0
3          30.3        28.0      31.1    30.7    29.8    30.1    31.7
4          19.0        17.6      19.1    20.4    17.9    21.8    16.7
5          10.0         8.0       9.3     8.8    12.3     9.8    11.4
6           3.6         1.9       4.9     4.6     4.4     2.4     3.4
7           1.5         1.6       1.5     0.8     1.8     2.4     1.2
8           0.6         0.8       0.7     0.6     0.7     0.4     0.5
9–16        0.5         0.3       0.7     0.0     0.4     0.9     0.5
Average     3.15        2.90      3.25    3.11    3.22    3.22    3.13
The question posed was again neutral. The outcomes over the six precincts
are given in table 18.4a. Again no candidate had circles in a majority of the
ballots; again the four major candidates all lost relative support in approval
voting, whereas every one of the others gained; again the mechanism failed in
that the winner's majority-grade was not "support" (fewer than half of the
ballots gave him a circle).
However, the experience shows that populations make a remarkably homogeneous use of the means they are offered to express themselves. On average,
voters cast 2.33 circles per ballot, and close to the same was true in each of the
three towns. Moreover, the distributions of the number of circles per ballot in
each of the three towns were very similar as well, so the circles were used in
about the same way by all voters (table 18.4b).
Table 18.4a
Approval Voting Experiment, Illkirch–Louvigny–Cigné, French Presidential Election, April 22,
2007

               Percent of Ballots   Percent of     Official Vote,
               with Circles         All Circles    First Round
Bayrou          49.7%               21.4%          23.0%
Sarkozy         45.2%               19.4%          34.1%
Royal           43.7%               18.8%          23.6%
Besancenot      23.7%               10.2%           4.1%
Voynet          16.9%                7.3%           2.1%
Le Pen          11.6%                5.0%           7.6%
Bové            11.5%                4.9%           1.1%
Laguiller        9.3%                4.0%           1.0%
de Villiers      9.0%                3.9%           1.7%
Buffet           7.4%                3.2%           0.8%
Nihous           3.4%                1.5%           0.6%
Schivardi        1.4%                0.6%           0.3%
Table 18.4b
Percentage of Ballots with k Crosses (k = 1, 2, . . . , 12), Approval Voting Experiment,
Illkirch–Louvigny–Cigné, French Presidential Election, April 22, 2007

No. of Crosses   All    Cigné   Louvigny   Illkirch
1                27.3   30.7    23.6       29.3
2                33.6   29.3    35.1       33.2
3                25.1   22.3    26.3       24.5
4                 9.8   13.0    10.9        8.6
5                 2.8    2.8     2.7        2.8
6                 0.9    2.3     0.7        0.9
7                 0.5    0.5     0.4        0.5
8–12              0.1    0.0     0.2        0.1
Average           2.33   2.34    2.39       2.28
The analysis of the absolute versus relative vote issue is based on the considerable
information found in the majority judgment ballots of the 2007 Orsay
experiment. Since the language is common to random samples of one hundred,
fifty, and even twenty voters from the three precincts in Orsay, it is reasonable
to hypothesize that the distribution of grades in the 2007 French presidential
election is common to all voters anywhere in France (note that the language is
common, not the evaluations of the candidates). In the approval voting experiment
there were 2.33 circles per ballot. If voting behavior were based on an
absolute scale only, then voters would cast circles either for the candidates
deemed Excellent, or for those deemed Very Good or better, or Good or better,
and so on. But (see table 6.1) there are on average 0.69 Excellents, 1.94 Very
Goods or better, and 3.44 Goods or better per ballot: none of these comes close
to 2.33, suggesting that the behavior is not purely absolute.
Table 18.5
Average Number of Highest, Second Highest, and Third Highest Grades, Three Precincts of Orsay,
April 22, 2007

                  Average No. of Grades
                  Three       1st        6th        12th
                  Precincts   Precinct   Precinct   Precinct
Highest           1.64        1.51       1.62       1.80
Second highest    2.19        2.08       2.16       2.34
Third highest     2.76        2.73       2.78       2.76
Each majority judgment ballot assigns a grade to every candidate. The highest grade is given to one or more candidates, the second highest to one or more candidates, and so on down the list. The average numbers of candidates receiving each of these grades may be computed (table 18.5); they, too, are common to all three precincts. If voting behavior were based on a relative scale (assuming these averages are roughly common to all of France), then 2.33 should be about equal to 1.64, or to 3.83 = 1.64 + 2.19, or greater. It is not, suggesting that the behavior is not purely relative.
Behavior in the 2007 approval voting experiment is better explained as a
mixture of absolute and relative behavior:
If the voter deems no candidate above a Good, he casts circles for every
candidate receiving his highest grade.
This behavior implies an average of 2.26 circles per approval ballot in the three Orsay precincts: 2.09 in the 1st, 2.27 in the 6th, and 2.43 in the 12th. This is in substantial agreement with the 2.33 observed in the 2007 approval voting experiment. Applying this behavior to the majority judgment ballots of the Orsay experiment to simulate an approval vote gives the following percentages of ballots with circles: Bayrou 51.1%, Royal 44.8%, Sarkozy 44.1%, Besancenot 16.8%, Voynet 14.5%, Buffet 11.6%, de Villiers 9.9%, Bové 9.0%, Laguiller 9.0%, Le Pen 8.7%, Nihous 3.2%, Schivardi 2.6%. These percentages are close to those reported in table 18.4a. Of course, the results could also be due to a complex mélange of varied strategic and nonstrategic behavior.
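The simulation just described turns each majority judgment ballot into an approval ballot. Only the rule for voters who grade no candidate above Good is stated in the text; for the other voters, the sketch below assumes (purely for illustration) that they circle every candidate they grade at least Very Good, and it exposes that threshold as a parameter. The function names and the three example ballots are hypothetical.

```python
GRADES = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]
RANK = {grade: i for i, grade in enumerate(GRADES)}


def simulated_approvals(ballot, absolute_threshold="Very Good"):
    """Candidates a voter would circle under the mixed rule.

    ballot maps candidate -> grade (a majority judgment ballot).
    Stated rule: if no candidate is graded above Good, circle every
    candidate that receives the voter's highest grade.
    ASSUMPTION (not spelled out in the text): otherwise, circle every
    candidate graded at least `absolute_threshold`.
    """
    top = max(RANK[g] for g in ballot.values())
    if top <= RANK["Good"]:
        return {c for c, g in ballot.items() if RANK[g] == top}
    return {c for c, g in ballot.items() if RANK[g] >= RANK[absolute_threshold]}


def approval_percentages(ballots, **kwargs):
    """Percent of simulated approval ballots that circle each candidate."""
    counts = {}
    for ballot in ballots:
        for cand in simulated_approvals(ballot, **kwargs):
            counts[cand] = counts.get(cand, 0) + 1
    return {cand: 100.0 * n / len(ballots) for cand, n in counts.items()}


def mean_circles(ballots, **kwargs):
    """Average number of circles per simulated approval ballot."""
    return sum(len(simulated_approvals(b, **kwargs)) for b in ballots) / len(ballots)


# Three hypothetical ballots, for illustration only.
EXAMPLE_BALLOTS = [
    {"A": "Good", "B": "Good", "C": "Poor"},             # top grade Good -> A, B
    {"A": "Excellent", "B": "Very Good", "C": "Good"},   # above Good -> A, B
    {"A": "Acceptable", "B": "To Reject", "C": "Poor"},  # top grade Acceptable -> A
]

if __name__ == "__main__":
    print(mean_circles(EXAMPLE_BALLOTS))
    print(approval_percentages(EXAMPLE_BALLOTS))
```

Applied to real ballots, `mean_circles` plays the role of the 2.26 average and `approval_percentages` of the simulated percentages quoted above; whether the assumed threshold matches the authors' computation is not confirmed here.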
The 2002 Sciences Po experiment (see chapter 17) adds to the evidence that approval votes are not absolute evaluations. Recall that there were fifteen candidates, and one ballot combined a point-summing method, with points between 0 and 10, and approval voting. The average number of approvals per ballot was 2.76. The ballots make it possible to see when approvals and disapprovals of candidates (crosses and no crosses) correspond to each point
Table 18.6
Distributions of Crosses and No Crosses, and Probability of a Cross, for Each Number of Points k = 0, 1, . . . , 10 Assigned, Sciences Po, January 2002

Points k   Percent of Crosses   Percent of No Crosses   Prob_k{Cross}
0                 3.1                  58.6                 .05
1                 1.2                  10.6                 .10
2                 2.7                   8.4                 .24
3                 5.1                   7.1                 .42
4                 5.4                   5.8                 .48
5                16.2                   6.0                 .73
6                16.4                   2.2                 .88
7                17.4                   0.6                 .97
8                14.9                   0.4                 .97
9                 5.0                   0.1                 .98
10               12.7                   0.2                 .98
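The sentence explaining the last column of Table 18.6 is cut off in this copy, but the column can be checked against the other two. For every k, Prob_k{Cross} coincides, to two decimals, with c_k / (c_k + n_k), where c_k and n_k are the percents of crosses and of no crosses at k. In other words, the ratio compares the two distributions directly rather than weighting them by the overall numbers of crosses and no crosses. The sketch below (illustrative names) verifies this.

```python
# Table 18.6 (Sciences Po, January 2002), indexed by points k = 0, ..., 10.
PCT_CROSSES = [3.1, 1.2, 2.7, 5.1, 5.4, 16.2, 16.4, 17.4, 14.9, 5.0, 12.7]
PCT_NO_CROSSES = [58.6, 10.6, 8.4, 7.1, 5.8, 6.0, 2.2, 0.6, 0.4, 0.1, 0.2]
PRINTED_PROB = [0.05, 0.10, 0.24, 0.42, 0.48, 0.73, 0.88, 0.97, 0.97, 0.98, 0.98]


def prob_cross(pct_crosses, pct_no_crosses):
    """Ratio c_k / (c_k + n_k) for each point value k."""
    return [c / (c + n) for c, n in zip(pct_crosses, pct_no_crosses)]


if __name__ == "__main__":
    computed = prob_cross(PCT_CROSSES, PCT_NO_CROSSES)
    for k, (p, printed) in enumerate(zip(computed, PRINTED_PROB)):
        print(f"k={k:2d}  computed {p:.2f}  printed {printed:.2f}")
```

The probability of a cross rises steadily with the points assigned, but it is far from a sharp cutoff: even at 5 points out of 10 only about three-quarters of the mentions carry a cross.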
19
That each party endeavors to get into the administration of the government, and exclude
the other from power, is true, and may be stated as a motive of action: but this is only
secondary; the primary motive being a real and radical difference of political principle.
Thomas Jefferson
The competition for votes between the Republican and Democratic parties does not lead
to a clear drawing of issues, an adoption of two strongly contrasted positions between
which the voter may choose. Instead, each party strives to make its platform as much
like the other's as possible.
Harold Hotelling
Previous chapters have compared various methods of voting with the majority judgment on the basis of the theoretical properties the methods satisfy or fail to satisfy. The experimental evidence given in this chapter depends entirely on the 2007 Orsay experiment. We believe that the participants by and large expressed their true opinions: they had no incentive not to do so (participation itself in a nonbinding vote was an indication of a will to cooperate), and their input messages were consistent with the expressions of their official votes (as well as with other ancillary evidence). Thus the fact that strategic considerations played no role makes it possible to compare the various methods on the basis of practical, realistic, and honest expressions of opinion. This could not be done with the results of a real election, because in that case some voters send strategic messages rather than honest ones.
Beginning with a back-of-the-envelope model that contrasts the appeal to the center of first-past-the-post and of the majority judgment, this chapter studies experimentally the manipulability of methods and their bias in favor of or against centrist candidates. The results confirm what common sense suggests: first-past-the-post, point-summing, and Borda are the most manipulable; first- and two-past-the-post tend systematically to deny the election of a centrist candidate, whereas Borda and point-summing tend systematically to elect a centrist candidate. A centrist is a candidate situated at the center of the political spectrum, between the leading left and right parties (and is not necessarily a Condorcet-winner, as the experimental evidence shows). We believe that these results are robust, but more experimentation should be done.
19.1 Bias for the Center
It has often been observed that with the electoral systems commonly practiced,
and in particular, first-past-the-post, candidates or parties vie for the center:
they wish to express the opinions of the median-voter. The primitive idea is that
there is a left-to-right political spectrum, a line on which voters and candidates
are situated. A Democratic candidate wishes to occupy a position or point on the
line as far as feasible to the right in order to obtain the votes of all those whose
opinions are at that point or to the left, and a Republican candidate wishes to
occupy a point on the line as far as feasible to the left in order to obtain the
votes of all those whose opinions are at that point or to the right. In terms
of the French election of 2007, Bayrou was the Condorcet-winner because he occupied a position on the line close to the median-voter's: being to the right of the socialist Royal, he would, against her, obtain all the votes on the right; being to the left of the U.M.P. candidate Sarkozy, he would, against him, obtain all the votes on the left. And in the runoff between the leftist Royal and the rightist Sarkozy, Royal tried to inch toward the right to obtain some moderate rightist votes and all those to the left of them, whereas Sarkozy made overtures to the left to obtain some moderate leftist votes and all those to the right of them.
Beginning with an issue of economic equilibrium when two firms compete and the costs of transportation from their production facilities to the buying public are significant, Hotelling (1929) developed a simple but ingenious back-of-the-envelope model to analyze the situation, which was subsequently developed in detail by Downs (1957) in his famous book. In the language of elections the model's essentials may be described in the following terms. The unit interval [0, 1] is the line that represents the political spectrum; a point is a political position. Voters are uniformly distributed on it. The election game is played by two candidates. Each chooses some point on the line. A voter casts one vote for the candidate nearest to her point, and the candidate with the most votes wins. More generally, voters may be distributed according to some density function f or the corresponding cumulative distribution function F. Downs analyzed a normal distribution.
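The two-candidate game can be sketched in a few lines. Every voter to the left of the midpoint between the two positions votes for the left candidate, so that candidate's share is F evaluated at the midpoint. The function names, and the mean 0.5 and standard deviation 0.15 of the normal electorate, are illustrative assumptions, not values from the text.

```python
import math


def uniform_cdf(t):
    """Cumulative distribution of voters uniform on [0, 1]."""
    return min(max(t, 0.0), 1.0)


def normal_cdf(mu=0.5, sigma=0.15):
    """Cumulative distribution of a normal electorate (parameters illustrative)."""
    return lambda t: 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def vote_shares(x, y, F=uniform_cdf):
    """Shares of the candidates at positions x and y when each voter votes
    for the nearer one; F is the voters' cumulative distribution."""
    if x == y:
        return 0.5, 0.5              # identical platforms split the vote
    cut = F((x + y) / 2.0)           # share of voters left of the midpoint
    return (cut, 1.0 - cut) if x < y else (1.0 - cut, cut)


if __name__ == "__main__":
    print(vote_shares(0.25, 0.5))                 # (0.375, 0.625): the median wins
    print(vote_shares(0.45, 0.5, normal_cdf()))   # the off-median candidate still loses
```

Whatever point x other than the median a candidate picks, an opponent at the median (or just beside x, on the median's side) gets more than half the votes, which is the argument made next in the text.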
If a candidate announces a point x that is not that of the median-voter (in the uniform case the point $\tfrac{1}{2}$, so that one-half of the voters are to the right, the other half to the left; more generally, the point $F^{-1}(\tfrac{1}{2})$), she is sure to lose the election. For if $x < \tfrac{1}{2}$ and the opponent chooses any point y in the
$$\int_0^1 (1 - |x - y|)\,dx = 1 - \int_0^y (y - x)\,dx - \int_y^1 (x - y)\,dx = 1 - \frac{y^2}{2} - \frac{(1 - y)^2}{2},$$
candidate $F^{-1}(\tfrac{1}{2})$. However, it may be difficult to compute the majority-grade: for some F's the solution is unique, for others not.
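The surrounding derivation is incomplete in this copy. If the integrand is read as the grade 1 − |x − y| that a voter at x gives a candidate at y (an interpretation assumed here), then the equation is that candidate's average grade when voters are uniform on [0, 1], and it is largest, 3/4, at y = 1/2. The sketch below checks the closed form numerically and also computes the median (majority) grade under the same assumption. Under that assumption the median grade equals 3/4 for every y between 1/4 and 3/4, one concrete case in which the best position is not unique. The grid size and function names are illustrative.

```python
N = 20_001                                  # odd, so the median is one grid point
XS = [(i + 0.5) / N for i in range(N)]      # midpoint grid: voters uniform on [0, 1]


def grade(x, y):
    """ASSUMED grade a voter at x gives a candidate at y (the integrand above)."""
    return 1.0 - abs(x - y)


def closed_form_average(y):
    """Right-hand side of the equation: 1 - y^2/2 - (1 - y)^2/2."""
    return 1.0 - y ** 2 / 2.0 - (1.0 - y) ** 2 / 2.0


def numeric_average(y):
    """Midpoint-rule approximation of the integral of grade(x, y) over x in [0, 1]."""
    return sum(grade(x, y) for x in XS) / N


def median_grade(y):
    """Median (majority) grade of a candidate at y over the voter grid."""
    return sorted(grade(x, y) for x in XS)[N // 2]


if __name__ == "__main__":
    for y in (0.1, 0.25, 0.5, 0.7):
        print(y, round(closed_form_average(y), 4), round(numeric_average(y), 4),
              round(median_grade(y), 4))
```

For y below 1/4 the median grade is 1/2 + y (e.g., 0.6 at y = 0.1), so under this reading the majority-grade is maximal on a whole interval rather than at a single point.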
This suggestion is borne out by the empirical evidence. Table 19.1 gives the distributions of the winners in 10,000 random samples of 101 ballots drawn from the 501 representative ballots of the 2007 Orsay experiment, according to each of nine different methods. The same qualitative results are obtained when samples are of 201 ballots or when they are drawn from all 1,733 ballots of the Orsay experiment. However, when all the ballots of the experiment are used, the results are very much more favorable to the centrist candidate Bayrou (since he did much better in Orsay than nationally), so although the ordering of the methods is the same, the numbers of Bayrou wins are considerably higher and thus give less insight into the differences between methods. Approval judgment when approval means Excellent (AJ Excellent), first- and two-past-the-post, and approval judgment when approval means at least Very Good (AJ Very Good) are biased against the centrist candidate (in decreasing order of bias). Condorcet, point-summing, approval judgment when approval means at least Good (AJ Good), and Borda are biased in favor of the centrist candidate (in increasing order of bias). The majority judgment stands somewhere in the middle: although Sarkozy wins most often, each of the three main political forces has a reasonable chance of winning.
One method's bias in favor of the centrist candidate is highly dependent on the number of candidates, namely, Borda's. It is already biased with three candidates, but with twelve it is overwhelmingly so. Observe also that the only two methods that are chaotic, in that they are sensitive to the number of candidates, are the two sum-scoring methods, first-past-the-post and Borda's. Moreover, Saari's claim that Borda's method is the least chaotic of the sum-scoring methods is put into doubt by the experimental evidence. The explanation is that Saari's analysis assumes impartial culture, meaning that all preference-orders are equally likely, which is simply not true in practice. What is observed in practice is a statistical left-right spectrum. All the other methods are stable with regard to the number of candidates. Approval judgment when approval means at least Acceptable or at least Poor is not recommended: imagine a candidate elected because he has the largest number of grades of at least Acceptable, or of at least Poor! The Australian alternative vote would have given the same results as two-past-the-post with three candidates (Bayrou, Royal, Sarkozy); it is biased against the center but is less chaotic than first-past-the-post or Borda.
Table 19.1
Number of Wins among Royal, Bayrou, and Sarkozy, 2007 Orsay Experiment

3 Candidates           Royal (Left)   Bayrou   Sarkozy (Right)    Tie
AJ Excellent                557            7         9,190         246
First-past-the-post       1,713           46         8,092         149
Two-past-the-post         2,137          883         6,555         425
AJ Very Good              1,303        1,304         6,703         690
Majority judgment         1,317        3,982         4,690          11
Condorcet                   679        6,519         1,975         473
Point-summing               854        7,859         1,093         194
AJ Good                     390        8,828           454         328
Borda                       483        7,201         2,154         162

12 Candidates          Royal (Left)   Bayrou   Sarkozy (Right)    Tie
AJ Excellent                508            3         9,238         251
First-past-the-post       2,112           48         7,824          16
Two-past-the-post         2,174          764         6,675         387
AJ Very Good              1,277        1,316         6,753         654
Majority judgment         1,321        4,037         4,631          11
Condorcet                   663        6,552         1,972         436
Point-summing               859        7,875         1,090         176
AJ Good                     380        8,801           463         356
Borda                       377        9,592            21          10

Note: Ten thousand samples of 101 ballots, drawn from a sample of 501 ballots representative of the national vote. The percentage estimates of the first-round results on the basis of these 501 representative ballots come close to those of the official national vote. In these 501 ballots Sarkozy had 30.7%, Royal had 25.5%, Bayrou had 18.7%, and Le Pen had 9.3%.
Recall that in every sample drawn and when all twelve candidates are present, one of Royal, Bayrou, or Sarkozy is always the winner.
The Condorcet paradox occurred in 354 cases among three candidates and in 377 cases among twelve candidates.

Table 19.2 only concerns the number of times the centrist candidate (Bayrou) wins. It confirms that the methods are ordered from least to most in favor of the centrist candidate. Take any row, for example, the majority judgment row. Bayrou wins 4,037 times; of those wins, the numbers he also wins with the preceding methods are all substantially lower, for those methods are less favorable to the centrist; the numbers he also wins with the succeeding methods are all quite close to 4,037 (they cannot, of course, be higher), for those methods are more favorable to the centrist. This is true of every method, every row of the table.
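The note to Table 19.2 defines each entry as the number of samples in which Bayrou wins under both the row method and the column method. The sketch below builds such a matrix from per-sample outcomes. The sample format, method abbreviations, and toy outcomes are hypothetical. By construction an off-diagonal entry can never exceed either of the two corresponding diagonal entries, which is why the entries to the right of the diagonal "cannot, of course, be higher." The sketch also records Bayrou's twelve-candidate totals from Table 19.1 to show that they increase along the table's ordering of methods.

```python
from itertools import product

# Bayrou's wins with twelve candidates (Table 19.1), in the table's order.
BAYROU_WINS_12 = {
    "AJ Excellent": 3, "First-past-the-post": 48, "Two-past-the-post": 764,
    "AJ Very Good": 1316, "Majority judgment": 4037, "Condorcet": 6552,
    "Point-summing": 7875, "AJ Good": 8801, "Borda": 9592,
}


def cowin_matrix(samples, candidate, methods):
    """counts[m1][m2] = number of samples in which `candidate` wins under both
    m1 and m2; the diagonal counts[m][m] is simply the number of wins under m.

    samples: iterable of dicts mapping method name -> winner (None for no winner).
    """
    counts = {m1: {m2: 0 for m2 in methods} for m1 in methods}
    for outcome in samples:
        wins = [m for m in methods if outcome.get(m) == candidate]
        for m1, m2 in product(wins, repeat=2):
            counts[m1][m2] += 1
    return counts


# Hypothetical outcomes of three samples, for illustration only.
TOY_SAMPLES = [
    {"FPP": "Sarkozy", "MJ": "Bayrou", "Borda": "Bayrou"},
    {"FPP": "Bayrou", "MJ": "Bayrou", "Borda": "Bayrou"},
    {"FPP": "Royal", "MJ": "Royal", "Borda": "Bayrou"},
]

if __name__ == "__main__":
    wins = list(BAYROU_WINS_12.values())
    print("increasing:", all(a < b for a, b in zip(wins, wins[1:])))
    print(cowin_matrix(TOY_SAMPLES, "Bayrou", ["FPP", "MJ", "Borda"]))
```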
Table 19.2
Number of Wins of the Centrist Candidate (Bayrou), 2007 Orsay Experiment

                       FPP     TPP     AJ VG     MJ      Condorcet    PS      AJ G     Borda
First-past-the-post    [48]     48      48       45         48        48       48       48
Two-past-the-post       48    [764]    741      429        764       741      741      762
AJ Very Good            41     276   [1,316]   1,313      1,270     1,300    1,287    1,305
Majority judgment       45     429    1,313   [4,037]     3,655     3,879    4,001    3,994
Condorcet               48     764    1,270    3,655     [6,552]    6,223    6,310    6,523
Point-summing           48     741    1,300    3,879      6,223    [7,875]   7,604    7,862
AJ Good                 48     741    1,287    4,001      6,310     7,604   [8,801]   8,460
Borda                   46     762    1,305    3,994      6,523     7,862    8,640   [9,592]

Note: FPP = first-past-the-post; TPP = two-past-the-post; AJ = approval judgment; VG = Very Good; MJ = majority judgment; PS = point-summing; G = Good. AJ Excellent is not included because Bayrou won only three times (see table 19.1).
Ten thousand samples of 101 ballots, drawn from a sample of 501 ballots representative of the national vote.
A diagonal entry (in brackets) in the row of a method M is the number of Bayrou wins with method M; an off-diagonal entry is the number among those wins that he also wins with another method (e.g., with Condorcet's method Bayrou won 6,552 times; in 764 of those, he also won with two-past-the-post).

19.2 Manipulability
Every method is manipulable; there is no escape from the possibility that some voters (or judges) will try to game the vote. The operational question is, What method or methods best resist manipulation? It has already been proven that, in theory, the majority judgment dominates all other known methods on this ground. So, what may happen in practice?
In the 2007 Orsay experiment, Bayrou was the winner and Royal the
runner-up, with respective majority-gauges
Bayrou: (44.3%, Good+, 30.6%)
Could those voters who gave a higher grade to Royal than to Bayrou have manipulated the outcome to make Royal the winner by raising Royal's grades or lowering Bayrou's (the majority judgment is partially strategy-proof-in-ranking, so a voter cannot contribute both to lowering Bayrou's majority-gauge and to raising Royal's)? The answer is yes, they could have, if all or a great many of them had exaggerated the grades they assigned. The evidence shows, however, that this is unrealistic. The polls suggest that in the official election at most some 30% of the voters voted strategically, that is, not in accord with their true beliefs.
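To make the gauge notation concrete, here is a minimal sketch of a majority-gauge computation. It assumes the following definitions, which are not restated in this chapter. The majority-grade is the lower middlemost grade. p and q are the percentages of grades strictly above and strictly below it. The grade carries a + when p > q and a − otherwise. Between two gauges, a higher grade wins; with equal grades an α+ beats an α−, two α+ are ordered by larger p, and two α− by smaller q. The function names and example grades are illustrative, and the output follows the (p, grade, q) format used for Bayrou above.

```python
GRADES = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]
RANK = {grade: i for i, grade in enumerate(GRADES)}


def majority_gauge(grades):
    """Return (p, majority-grade with sign, q) for one candidate.

    grades: the list of grades the candidate received, one per voter.
    """
    ranks = sorted(RANK[g] for g in grades)
    n = len(ranks)
    alpha = ranks[(n - 1) // 2]                 # lower middlemost grade
    p = 100.0 * sum(r > alpha for r in ranks) / n
    q = 100.0 * sum(r < alpha for r in ranks) / n
    return p, GRADES[alpha] + ("+" if p > q else "-"), q


def gauge_key(gauge):
    """Sort key (larger is better) implementing the comparison described above."""
    p, grade, q = gauge
    rank = RANK[grade[:-1]]
    return (rank, 1, p) if grade.endswith("+") else (rank, 0, -q)


if __name__ == "__main__":
    x = majority_gauge(["Excellent", "Good", "Good", "Poor", "Very Good"])
    y = majority_gauge(["Good", "Acceptable", "Acceptable", "Very Good"])
    print(x, y)                        # (40.0, 'Good+', 20.0) (50.0, 'Acceptable+', 0.0)
    print(gauge_key(x) > gauge_key(y))  # True
```

Because the gauge depends only on how many grades lie above and below the middlemost one, a voter already on one side of a candidate's majority-grade can move that gauge only by crossing it; this is the mechanism behind the ballot types discussed next.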
Consider the data given in table 19.3, where B stands for Bayrou and R for Royal. The table shows the percentages of all the ballots that graded Royal above Bayrou, by type of grades.

Table 19.3
Types of Ballots That Grade Royal above Bayrou, 2007 Orsay Experiment

Type of Ballot   Percent of All Ballots   Strategy
I                        2.8%                 0%
II                       6.3%                33%
III                      6.9%                33%
IV                       2.4%                33%
V                        3.2%                67%
VI                       2.1%                67%
VII                      9.2%               Cannot

Grade columns (Excellent, Very Good, Good, Acceptable, Poor, To Reject) locate each type's grades for B and R; a vertical line | marks the limit beyond which changing a grade no longer affects the majority-gauge.

Type VII: 9.2% of all ballots gave to Royal an Excellent or a Very Good and to Bayrou an Acceptable or worse; they are unable to change either candidate's majority-gauge. However, with a point-summing method, those who gave Royal a Very Good could have increased her point total, and those who gave Bayrou an Acceptable or a Poor could have decreased his point total.
Type III: 6.9% of all ballots gave to Royal a Very Good and to Bayrou a Good; they are unable to increase Royal's majority-grade but can decrease Bayrou's. Yet those who are able to decrease Bayrou's have no incentive to decrease it below Acceptable (indicated by the vertical line) because doing so will not further decrease his majority-gauge. However, with a point-summing method, they could increase Royal's point total, and they would have the incentive to give Bayrou the lowest possible grade.
Type IV: 2.4% of all ballots gave to Royal a Good and to Bayrou an Acceptable; they are able to increase Royal's majority-gauge but are unable to decrease Bayrou's. Yet those who are able to increase Royal's have no incentive to increase her grade above Very Good (indicated by the vertical line) because doing so will not further increase her majority-gauge. However, with a point-summing method, they would have the incentive to give Royal the highest possible grade, and they could decrease Bayrou's point total.
and the manipulation fails to change the outcome: Bayrou remains ahead of Royal. Assigning 5 points to Excellent down to 0 points for To Reject and applying the same scenario, the point-summing method makes Royal the winner (note that type VII voters are then able to manipulate).
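A toy example (hypothetical grades, not the Orsay data) shows why the two methods reward exaggeration differently. One candidate receives five grades, and the voter who gave him a Good prefers his rival. Lowering that Good to Acceptable changes the candidate's majority-gauge from Good+ to Good−. Lowering it all the way to To Reject changes nothing further, whereas the candidate's point total (5 points for Excellent down to 0 for To Reject, as in the text) keeps falling. Variable and function names are illustrative.

```python
GRADES = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]
POINTS = {grade: i for i, grade in enumerate(GRADES)}   # To Reject = 0, ..., Excellent = 5


def point_total(grades):
    """Point-summing score of one candidate."""
    return sum(POINTS[g] for g in grades)


def majority_gauge(grades):
    """(p, lower middlemost grade with sign, q); p, q in percent of voters."""
    ranks = sorted(POINTS[g] for g in grades)
    alpha = ranks[(len(ranks) - 1) // 2]
    p = 100.0 * sum(r > alpha for r in ranks) / len(ranks)
    q = 100.0 * sum(r < alpha for r in ranks) / len(ranks)
    return p, GRADES[alpha] + ("+" if p > q else "-"), q


# Hypothetical grades of one candidate from five voters; the last "Good"
# comes from a voter who prefers the rival and is tempted to exaggerate.
honest = ["Excellent", "Very Good", "Good", "Acceptable", "Good"]
lowered_to_acceptable = honest[:-1] + ["Acceptable"]
lowered_to_reject = honest[:-1] + ["To Reject"]

if __name__ == "__main__":
    for label, grades in [("honest", honest), ("Acceptable", lowered_to_acceptable),
                          ("To Reject", lowered_to_reject)]:
        print(f"{label:<10} gauge={majority_gauge(grades)} points={point_total(grades)}")
```

The gauges are (40.0, Good+, 20.0), then (40.0, Good−, 40.0) twice, while the point totals fall from 17 to 16 to 14: under point-summing the incentive is always to go to the extreme grade.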
More evidence concerning the manipulability of methods may be found using the database of the 1,733 ballots of the 2007 Orsay experiment. Sample problems of 201 ballots were drawn at random, and two different scenarios of manipulation were tested (samples of 101, 201, and 301 ballots were taken in this and the other simulations; they consistently give concordant qualitative results). To test any one method, restrict the sample problems to those that have an unambiguous winner A and runner-up B. Then change the messages of voters in accordance with two scenarios:
Scenario 1: 30% of those voters who gave a higher grade to B than to A (chosen at random) change to give B the highest and A the lowest possible grades.
Scenario 2: All those voters who gave B a grade two levels above the grade they gave A change to give B the highest and A the lowest possible grades.
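Both scenarios are simple transformations of a list of ballots, sketched below. The winner A and runner-up B are assumed to have been computed beforehand by whichever method is under test. The function names and the fixed random seed are illustrative, and reading "two levels above" as "at least two levels above" is an assumption of this sketch.

```python
import random

GRADES = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]
RANK = {grade: i for i, grade in enumerate(GRADES)}
TOP, BOTTOM = GRADES[-1], GRADES[0]


def exaggerate(ballot, a, b):
    """Copy of `ballot` giving b the highest and a the lowest possible grade."""
    return {**ballot, b: TOP, a: BOTTOM}


def scenario_1(ballots, a, b, share=0.30, rng=None):
    """A random `share` of the voters who grade b above a exaggerate."""
    rng = rng or random.Random(0)
    eligible = [i for i, bal in enumerate(ballots) if RANK[bal[b]] > RANK[bal[a]]]
    chosen = set(rng.sample(eligible, round(share * len(eligible))))
    return [exaggerate(bal, a, b) if i in chosen else dict(bal)
            for i, bal in enumerate(ballots)]


def scenario_2(ballots, a, b, gap=2):
    """Every voter grading b at least `gap` levels above a exaggerates
    (ASSUMPTION: "two levels above" read as "at least two levels above")."""
    return [exaggerate(bal, a, b) if RANK[bal[b]] - RANK[bal[a]] >= gap else dict(bal)
            for bal in ballots]


if __name__ == "__main__":
    toy = [{"A": "Good", "B": "Very Good"}, {"A": "Acceptable", "B": "Very Good"}]
    print(scenario_2(toy, "A", "B"))
```

A manipulation counts as successful when the method, rerun on the transformed ballots, no longer elects A.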
Table 19.4
Number of Successful Manipulations in Ten Thousand samples of 201 Ballots, 2007 Orsay
Experiment
Grades
AJ E
PS
Borda
FPP
AJ VG
Condorcet
AJ G
MJ
8,763
8,778
6,820
6,958
4,489
4,358
7,072
7,112
6,166
6,167
3,857
3,878
1,538
1,538
2,591
2,591
9,154
9,166
9,962
9,999
9,522
9,949
6,000
7,941
8,601
8,656
7,075
7,957
9,779
9,750
9,282
9,963
7,162
9,355
6,607
8,769
7,373
7,309
5,501
7,200
9,935
9,940
9,463
9,972
8,699
9,721
7,411
8,748
7,826
7,860
4,392
7,669
9,924
9,924
6,504
9,368
5,389
8,074
7,236
8,579
4,050
4,050
2,048
5,426
finishers (they are always among Bayrou, Royal, and Sarkozy).1 A necessarily retains a majority against B after the change, and B necessarily retains a majority against any other candidate C after the change. However, it may be that lowering A's grades implies that some other candidate C has a majority over A after the change. In this case a Condorcet-cycle (namely, $A \succ_S B \succ_S C \succ_S A$) has been produced by the change, and there is no winner. This is a successful manipulation.
The 2007 Orsay ballots have six grades, but there are twelve candidates, so candidates often receive identical grades. This artificially limits the possible manipulation of the methods that depend on comparisons of candidates (Borda, first-past-the-post, and Condorcet). To lift this constraint, a second experiment was conducted in which extra highest and lowest grades, super-Excellent and worse-than-To-Reject, are added to the usual set of six grades (referred to as Six in the tables) for the purposes of unequivocal manipulation (the eight grades are referred to as Eight in the tables).
Table 19.4 shows that majority judgment and approval judgment when
approval means at least Good (AJ G) consistently best combat manipulation.
1. Condorcet-cycles among the nine other candidates are ignored because they are not germane to
this analysis.
Table 19.5
Successful Manipulations, Majority Judgment Compared with Other Methods, 2007 Orsay
Experiment
Scenario 1
501 Ballots
Six
Grades
Six
Grades
Majority judgment
AJ > Excellent
77%
89%
77%
90%
39%
99%
55%
99%
Majority judgment
Point-summing
Majority judgment
Borda
Majority judgment
First-past-the-post
Majority judgment
AJ > Very Good
Majority judgment
Condorcet
Majority judgment
AJ > Good
51%
100%
48%
96%
78%
51%
75%
88%
56%
47%
53%
83%
50%
100%
50%
100%
77%
77%
77%
89%
59%
59%
55%
84%
33%
96%
31%
89%
40%
58%
33%
76%
34%
33%
54%
59%
33%
100%
40%
98%
40%
77%
40%
76%
35%
72%
33%
59%
Six
Grades
Six
Grades
Table 19.5
(cont.)
Scenario 2
501 Ballots
Six
Grades
Six
Grades
Six
Grades
Majority judgment
AJ > Excellent
82%
95%
81%
96%
45%
95%
45%
95%
Majority judgment
Point-summing
Majority judgment
Borda
Majority judgment
First-past-the-post
Majority judgment
AJ > Very Good
Majority judgment
Condorcet
Majority judgment
AJ > Good
50%
92%
41%
73%
85%
57%
48%
76%
62%
29%
85%
38%
49%
100%
42%
95%
84%
87%
48%
74%
61%
48%
85%
38%
18%
63%
17%
56%
32%
47%
25%
32%
19%
11%
18%
13%
18%
93%
17%
82%
32%
71%
25%
32%
19%
46%
18%
13%
The alternative vote with twelve candidates elects in this experiment the
same candidate as two-past-the-post with the three major candidates; as a
consequence, it is extremely biased against centrist candidates.
The least manipulable methods are majoritarian: Condorcet, approval judgment AJ Good, and the majority judgment.
AJ Good is hard to manipulate in this election because the winning candidates won with a majority-grade of Good. When the highest majority-grade in an election is, say, Very Good, AJ Good or AJ Excellent would be manipulable, whereas AJ Very Good would be less manipulable.
Condorcet's method is far more biased in favor of the centrist than the majority judgment; when it is successfully manipulated, there is no winner, so some other ancillary rule must be invoked; and, as discussed in chapter 20, when voters are strategic, the final result does not reflect the true opinions of voters.
The majority judgment is one of the least manipulable methods and seems to be the most balanced with regard to the left-right spectrum.
When all or many of the voters are strategic and their utilities depend only on the winner, all methods are manipulable. Chapter 20 addresses this hypothesis.
20
Rationality is only one of several factors affecting human behavior; no theory based on
this one factor alone can be expected to yield reliable predictions.
Robert J. Aumann
The analysis and comparison of methods, traditional and new, and thus the evaluation of which are best, depend on context. Different contexts are encountered in theory and in practice. Often, in debates or arguments, when a method satisfies the criteria that are sought in one context, it is attacked willy-nilly in terms of another context. To be coherent, arguments should compare methods in each of the several possible contexts separately. In the various contexts discussed so far, the majority judgment has been shown, we believe, to dominate the other methods in that it comes closest to meeting the important desirable properties, and avoiding the undesirable ones, that have been identified across the years.
Nothing has been said, however, concerning the context raised in chapter 18, which began with the pioneering article of Myerson and Weber (1993), in which voting is viewed as a game and outcomes as equilibria, preferably reached via a sequence of best responses by the voters in view of their utilities and the information at hand. In the words of Myerson and Weber, "Voting equilibria based on polls are somewhat analogous to competitive equilibria based on prices in economic models."
The aim of this chapter is to show that the majority judgment does at least as well as, if not better than, the other methods in this context too. It must be recognized at the outset, however, that the standard assumptions in the literature on this context leave much to be desired. As the economists George Akerlof and Robert Shiller stated (recalling John Maynard Keynes's famous expression), "Insofar as animal spirits exist in the everyday economy, a description of how the economy really works must consider those animal spirits" (2009, 5). The animal spirits of voters are often ignored; voters are viewed as purely rational agents who seek to maximize their expected utilities, and rationality is assumed to mean that their utilities depend only on the identity of the elected candidate. In this case the Condorcet-winner looms large in the equilibrium analysis. But this transposition of economic theorizing into elections is patently false. Voters have much more complicated (unknown) utilities that for many undoubtedly include an honest expression of their opinions. We know firsthand of many sophisticated voters (economists and social choice theorists) who, in the 2007 French presidential election, preferred Bayrou to Sarkozy, were sure Sarkozy would be in the second round and that in the second round he would defeat Royal and be defeated by Bayrou, and yet voted for Royal in the first round.
Fundamentally, a method should not be considered good just because purely strategic behavior (perhaps in total violation of some voters' most cherished hopes and wishes) yields equilibria that satisfy certain properties. It is the contrary that should be sought. A method should elicit the honest expression of voters' opinions as inputs, for the aim of an election is to produce outputs that represent as well as possible the true wishes of societies and juries.
20.1 Equilibria
Thus, the concept of a Nash-equilibrium winner is, with no further refinement, of little use. This result has given rise to a large literature that seeks refined concepts of equilibria. For the most part the results are negative: many equilibria remain, and the different concepts make it difficult if not impossible to compare the relative merits of the different results. A focus of recent interest is the question, "When is there a strong-equilibrium winner?" (Aumann 1959), meaning that no coalition of voters can deviate from their strategies and thereby elect a preferred candidate (e.g., Sertel and Sanver 2004). This concept is particularly germane to elections because voters talk among themselves (orally and via the Internet), belong to political parties (which sometimes give careful instructions for exactly how to vote, as in Australia), and have access to large amounts of information (much of it common), including repeated opinion polls. Thus sets of voters may often adopt the same strategies.
The central fact is that when any reasonable method of election of the traditional or the new model is formulated as a game (with the inputs as the strategies, and voters' utilities devoid of animal spirits), only the Condorcet-winner can be a strong-equilibrium outcome.
A method of election is majoritarian if for any candidate X and any strict majority of voters, there exists a common strategy t for each of them such that, whatever the strategies of the others, X is elected. A candidate X is a strong-equilibrium winner for a given method of election if there exists a strategy-profile for which X is the winner and, for any other candidate Y and any coalition S of voters with utilities satisfying $u_i(Y) > u_i(X)$ for all $i \in S$, the voters of S cannot make Y the winner by together changing their strategies.
A candidate C is a Condorcet-winner if there is no candidate X strictly preferred to C by a majority of the voters (i.e., whose utilities satisfy $u_i(X) > u_i(C)$).
Theorem 20.2 For any majoritarian method, a candidate is a strong-equilibrium winner if and only if the candidate is a Condorcet-winner.
Proof For simplicity, consider only three candidates, and s = (1, a, 0), where 0 < a < 1. Consider the profile
(1) n + n : A B C
(2) n : B C A,
(1 )(n + n ) : A C B,
1
2
n : B C A,
As score is n + n and Bs is at least a2 (n + n ) + n, so Bs is higher. If, on the
other hand, < 12 and type 2 voters counter with
n : C B A,
Thus any reasonable method can elect only the Condorcet-winner as a strong-equilibrium winner. But for the Condorcet-winner to be elected, the method must be weakly majoritarian. This excludes Borda's method and all monotonic sum-scoring methods.
20.2 Honest Equilibria
Inherent in the proofs of the last section is that for most methods, when the utilities of voters depend only on the identity of the winner, there is a huge number of strong-equilibrium strategy-profiles that elect the Condorcet-winner. In practice, the most likely of these equilibria are those in which the voters express themselves as honestly as possible. Are there equilibria in which all or most of the voters express themselves honestly? This is important because the outcomes of elections should come as close as possible to reflecting the true opinions of voters. It will be seen that there is no method, in either the traditional or the new model, that can guarantee the election of the Condorcet-winner with the honest opinions of the voters. On the other hand, the middlemost mechanisms, notably the majority judgment, admit equilibria in which at least a majority of the grades given each candidate are honest.
1. See Sertel and Sanver (2004), who give a similar definition and note that Borda is not majoritarian
but is best response majoritarian.
Theorem 20.6 No single-valued mechanism in the traditional model guarantees the election of the Condorcet-winner as a strong equilibrium when voters' rank-orders are honest.
Proof This theorem and its proof are due to Brams and Sanver (2006). Take any Condorcet-consistent method in the traditional model and consider the five-voter strategy-profile
(1) 2: $A \succ D \succ B \succ C$
(2) 2: $B \succ D \succ C \succ A$
(3) 1: $C \succ A \succ B \succ D$.
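Reading each group above as a number of voters followed by a ranking from best to worst (a reconstruction of this copy's layout), the profile's pairwise majorities can be tallied mechanically. A beats B and D, B beats C and D, C beats A, and D beats C, so the profile contains a majority cycle and no Condorcet-winner. The sketch below (illustrative names) performs the tally.

```python
# The five-voter profile above: (number of voters, ranking from best to worst).
PROFILE = [(2, "ADBC"), (2, "BDCA"), (1, "CABD")]


def pairwise_support(profile):
    """support[(x, y)] = number of voters who rank x above y."""
    cands = sorted(profile[0][1])
    support = {(x, y): 0 for x in cands for y in cands if x != y}
    for count, order in profile:
        for i, x in enumerate(order):
            for y in order[i + 1:]:
                support[(x, y)] += count
    return support


def condorcet_winner(profile):
    """The candidate beating every other by a strict majority, or None."""
    support = pairwise_support(profile)
    cands = sorted(profile[0][1])
    for x in cands:
        if all(support[(x, y)] > support[(y, x)] for y in cands if y != x):
            return x
    return None


if __name__ == "__main__":
    s = pairwise_support(PROFILE)
    for (x, y), n in sorted(s.items()):
        if n > s[(y, x)]:
            print(f"{x} beats {y} {n}-{s[(y, x)]}")
    print("Condorcet-winner:", condorcet_winner(PROFILE))
```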
Theorem 20.7
Proof
         k judges        1 judge    k judges
X:     1, . . . , 1         0      3, . . . , 3
Y:     2, . . . , 2         3      0, . . . , 0
and
         k judges        1 judge    k judges
X:     1, . . . , 1         0      3, . . . , 3
Y:     0, . . . , 0         3      2, . . . , 2
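Read as two profiles of 2k + 1 judges grading X and Y on a 0–3 scale (our reading of the layout above), each candidate receives exactly the same multiset of grades in both profiles. Any rule that depends only on each candidate's own grades, the majority-grade included, must therefore treat the two profiles alike. Yet a majority of judges grades Y above X in the first profile and X above Y in the second. The sketch below checks this for a few values of k; the function names are illustrative.

```python
from statistics import median_low


def theorem_profiles(k):
    """The two displayed profiles for 2k + 1 judges, as (grade of X, grade of Y) per judge."""
    first = [(1, 2)] * k + [(0, 3)] + [(3, 0)] * k
    second = [(1, 0)] * k + [(0, 3)] + [(3, 2)] * k
    return first, second


def middlemost_grades(profile):
    """Lower middlemost grade of X and of Y."""
    return (median_low([gx for gx, _ in profile]), median_low([gy for _, gy in profile]))


def majority_preference(profile):
    """'X' or 'Y' if more judges grade it higher than the other, else None."""
    x_higher = sum(gx > gy for gx, gy in profile)
    y_higher = sum(gy > gx for gx, gy in profile)
    if x_higher > y_higher:
        return "X"
    if y_higher > x_higher:
        return "Y"
    return None


if __name__ == "__main__":
    for k in (1, 2, 5):
        first, second = theorem_profiles(k)
        print(k, middlemost_grades(first), middlemost_grades(second),
              majority_preference(first), majority_preference(second))
```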
All voters who assign C at least give their honest grades; all others give C
the grade ;
Any voter who assigns a grade or above to another candidate B and who
prefers C to B, gives B a lower grade than (barely below will do); otherwise
voters give candidates other than C their honest grades.
Table 20.1
Hypothetical Honest Grades, 1970 New York Senate Election

Type         B             G             O
1a (10%)     Excellent     Poor          Acceptable
1b (29%)     Very Good     Good          To Reject
2a (16%)     Acceptable    Very Good     Good
2b (8%)      Poor          Very Good     Acceptable
3a (22%)     Poor          Acceptable    Very Good
3b (15%)     Acceptable    Very Good     Excellent

Note: Total percentages of types 1, 2, and 3 correspond to the actual 39% for Buckley, 24% for Goodell, and 37% for Ottinger.

Table 20.2
Strategy-Profile That Elects the Condorcet-Winner as a Strong Equilibrium, with True Middlemost Grade, 1970 New York Senate Election

Type         B             G             O
1a (10%)     Excellent     Good          Acceptable
1b (29%)     Very Good     Good          To Reject
2a (16%)     Acceptable    Very Good     Good
2b (8%)      Poor          Very Good     Acceptable
3a (22%)     Poor          Good          Acceptable
3b (15%)     Acceptable    Very Good     Acceptable
Notice also that if, instead of using the strategy just described, some voters deviate by giving their honest grades, the Condorcet-winner's majority-gauge decreases and all the other candidates' majority-gauges increase. This implies that if the winner changes, then the new winner must be ranked higher than the Condorcet-winner according to the honest grades of voters. This is a positive property.
To illustrate the idea, consider an example modeled on the 1970 New York Senate election among James Buckley (B), Charles Goodell (G), and Richard Ottinger (O) (see section 18.2), and suppose there are six types of voters and the honest profile (with the usual six grades) given in table 20.1. Goodell is the Condorcet-winner, and the majority judgment order of finish is Goodell $\succ_S$ Ottinger $\succ_S$ Buckley. The strategy-profile of the proof is given in table 20.2 together with the corresponding majority-gauges (the majority judgment being a middlemost aggregation mechanism). All of B's strategic grades are honest, and only 32% of G's and 37% of O's are not honest.
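The example can be checked mechanically. The sketch below transcribes Tables 20.1 and 20.2 as weighted profiles (percent of voters per type). It takes each candidate's majority-grade to be the lower middlemost grade and assumes that a voter prefers one candidate to another exactly when grading it higher; these choices and all function names are assumptions of the sketch. Under them it reproduces the statements in the text. Goodell is the Condorcet-winner. The honest order of finish is Goodell, Ottinger, Buckley. Goodell still wins under the strategy-profile. The grades that change are held by 0%, 32%, and 37% of the voters for B, G, and O. The honest gauges it computes are G (39%, Good+, 32%), O (37%, Good−, 47%), and B (39%, Acceptable+, 30%).

```python
GRADES = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]
RANK = {grade: i for i, grade in enumerate(GRADES)}
CANDIDATES = ["B", "G", "O"]

# (percent of voters, grades) per voter type: Table 20.1 (honest), Table 20.2 (strategic).
HONEST = [
    (10, {"B": "Excellent", "G": "Poor", "O": "Acceptable"}),
    (29, {"B": "Very Good", "G": "Good", "O": "To Reject"}),
    (16, {"B": "Acceptable", "G": "Very Good", "O": "Good"}),
    (8, {"B": "Poor", "G": "Very Good", "O": "Acceptable"}),
    (22, {"B": "Poor", "G": "Acceptable", "O": "Very Good"}),
    (15, {"B": "Acceptable", "G": "Very Good", "O": "Excellent"}),
]
STRATEGIC = [
    (10, {"B": "Excellent", "G": "Good", "O": "Acceptable"}),
    (29, {"B": "Very Good", "G": "Good", "O": "To Reject"}),
    (16, {"B": "Acceptable", "G": "Very Good", "O": "Good"}),
    (8, {"B": "Poor", "G": "Very Good", "O": "Acceptable"}),
    (22, {"B": "Poor", "G": "Good", "O": "Acceptable"}),
    (15, {"B": "Acceptable", "G": "Very Good", "O": "Acceptable"}),
]


def majority_gauge(profile, cand):
    """(p, lower middlemost grade with sign, q), p and q in percent of voters."""
    total = sum(w for w, _ in profile)
    cumulative = 0
    for r, w in sorted((RANK[g[cand]], w) for w, g in profile):
        cumulative += w
        if 2 * cumulative >= total:      # first grade reached by half the voters
            alpha = r
            break
    p = sum(w for w, g in profile if RANK[g[cand]] > alpha)
    q = sum(w for w, g in profile if RANK[g[cand]] < alpha)
    return p, GRADES[alpha] + ("+" if p > q else "-"), q


def gauge_key(gauge):
    """Larger is better: grade first, then alpha+ over alpha-, then p (or -q)."""
    p, grade, q = gauge
    rank = RANK[grade[:-1]]
    return (rank, 1, p) if grade.endswith("+") else (rank, 0, -q)


def mj_order(profile):
    return sorted(CANDIDATES, key=lambda c: gauge_key(majority_gauge(profile, c)),
                  reverse=True)


def condorcet_winner(profile):
    """ASSUMES a voter prefers x to y exactly when grading x higher."""
    def support(x, y):
        return sum(w for w, g in profile if RANK[g[x]] > RANK[g[y]])
    for x in CANDIDATES:
        if all(support(x, y) > support(y, x) for y in CANDIDATES if y != x):
            return x
    return None


def dishonest_share(cand):
    """Percent of voters whose strategic grade for cand differs from the honest one."""
    return sum(w for (w, h), (_, s) in zip(HONEST, STRATEGIC) if h[cand] != s[cand])


if __name__ == "__main__":
    for name, prof in (("honest", HONEST), ("strategic", STRATEGIC)):
        print(name, {c: majority_gauge(prof, c) for c in CANDIDATES}, mj_order(prof))
    print("Condorcet-winner:", condorcet_winner(HONEST))
    print({c: dishonest_share(c) for c in CANDIDATES})
```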
The strategy-profiles of the proof of the last theorem (theorem 20.8) may be interpreted as follows. First, voters are motivated by the wish to elect a certain candidate; second, they are motivated by the wish to express themselves as honestly as possible, or to give candidates final grades as close as possible to their assessments. It will be seen in the last section of this chapter that when utilities depend not only on who is the winner but also on the winner's final grade, the winner in all strong-equilibrium strategy-profiles is elected with his true majority-grade.
Some theorists do not believe in using the concept of a strong equilibrium because it implies considerable coordination among the voters. In any case, if the vote is secret, a voter is never certain about the strategies of others, so he may respond to some probabilistic belief about others' behavior. The way such probabilities are formed induces a best response correspondence. The next two sections, which explore the fixed points of such correspondences, are fairly technical. It turns out that the Condorcet-winner emerges as the unique possible equilibrium outcome; the number of possible equilibria is very small, sometimes just one; and the output of the election is determined by many honest votes. Thus a quite different set of arguments leads to results very similar to those developed in previous sections.
20.3 Best Response Equilibria
[The] voting behavior of at least some persons is a function of their expectations of the election outcome; published poll data are assumed to influence these expectations, hence to affect the voting behavior of these persons (Simon 1954, 245–246).
Imagine a large electorate, approaching elections, a plethora of polls and other public information, and a method of voting where voters assign grades to candidates. An individual voter has, in any case, a very small chance of changing the winner by changing the grades he assigns, and he knows it. But it is reasonable for a voter to believe that if the grade he assigns to one candidate changes the outcome of the election, then none of the other grades he assigns can change the outcome. For example, if the final grade is the median, it is less likely by orders of magnitude for a voter to change the medians of two candidates than to change the median of one candidate (a rare event to begin with). In the jargon of game theory, if the probability of being pivotal for one candidate is some ε(n) (where n is the number of voters and ε(n) → 0), being twice pivotal has probability of, say, ε(n)².
Assume either that a voter is sophisticated or that he behaves with limited rationality:
- A sophisticated voter reasons that the probability of being able to affect the result with more than one of the grades he assigns is so minuscule as to be negligible.
- For a voter with limited rationality, the calculation needed to decide how to correlate the grades he assigns is too complex a task to perform.

In either case he assigns each grade independently.
Suppose that X is the likely winner with final grade α, and Y is the runner-up with final grade β (where α ≥ β). Then when voters are sophisticated or behave with limited rationality,³ they respond as follows:
If the voter prefers a candidate Z to X, he assigns Z a grade higher than α (if α is the highest grade, he assigns it).
In each of the four cases the voter maximizes the chance that the grade he assigns elects the candidate he prefers. He does so given the information at his disposal, forgetting about possible interactions among other candidates. "Higher" means strictly higher and "lower" means strictly lower. However, among the many possible best responses it is reasonable to suppose that the voters choose to be as honest as possible and thus give grades as close as possible to the honest grades. This may be justified by ascribing to voters lexicographic utilities where the winner is what counts first and honesty or the final grade second (see section 20.6).
To clarify the idea, consider again the 1970 New York Senate election example (see table 20.1), where Goodell is the Condorcet-winner. Suppose Goodell is the likely winner with majority-grade Good, and Ottinger is the runner-up with majority-grade Acceptable. Then the best response strategies are as shown in table 20.3, assuming voters are sophisticated or behave with limited rationality and respond as honestly as possible. The outcome is Goodell the winner with majority-grade Good and Ottinger the runner-up with majority-grade Acceptable. So this is an equilibrium (or fixed point) with Goodell the winner and Ottinger the runner-up. The reader may check that for the most part voters assign honest grades.

Table 20.3
Best Response Strategies of Sophisticated or Limited-Rationality Voters of Table 20.1
Type  Share   B                       G                     O
1a    10%     Excellent               Poor                  Very Good
1b    29%     Very Good               Good                  To Reject
2a    16%     Acceptable              Very Good             Acceptable
2b     8%     Poor                    Very Good             Acceptable
Note: These voters are sophisticated or behave with limited rationality. They respond to G, the winner, with majority-grade Good, and to O, the runner-up, with majority-grade Acceptable.
Total percentages of types 1, 2, and 3 correspond to the actual vote shares: 39% for Buckley, 24% for Goodell, and 37% for Ottinger.

3. When there are only two grades (approval judgment), these rules specialize to the poll-leader rule (see chapter 18). This strategy may be deduced with arguments similar to those given by Laslier (2009).
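The best response rule can be sketched in code. The extracted text states only the first of the four cases explicitly. The sketch therefore assumes the other three are its mirror images, always choosing the grade closest to the honest one:
- A candidate whom the voter ranks below X gets a grade below α.
- The leader X gets a grade above β from voters who prefer X to Y.
- The leader X gets a grade below β from voters who prefer Y to X.

Under that assumption the sketch reproduces Table 20.3 and also replays the cycle described next. Each type's preferences are read off its honest grades in Table 20.1. All function names (`above`, `below`, `respond`, `step`) are hypothetical.

```python
GRADES = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]
RANK = {g: i for i, g in enumerate(GRADES)}
CANDIDATES = ("B", "G", "O")
SHARES = {"1a": 10, "1b": 29, "2a": 16, "2b": 8, "3a": 22, "3b": 15}
# Honest grades of Table 20.1; each type's preferences are read off these grades.
HONEST = {
    "1a": {"B": "Excellent", "G": "Poor", "O": "Acceptable"},
    "1b": {"B": "Very Good", "G": "Good", "O": "To Reject"},
    "2a": {"B": "Acceptable", "G": "Very Good", "O": "Good"},
    "2b": {"B": "Poor", "G": "Very Good", "O": "Acceptable"},
    "3a": {"B": "Poor", "G": "Acceptable", "O": "Very Good"},
    "3b": {"B": "Acceptable", "G": "Very Good", "O": "Excellent"},
}


def above(honest, ref):
    """Grade closest to `honest` strictly above `ref` (`ref` itself if it is the top grade)."""
    return honest if RANK[honest] > RANK[ref] else GRADES[min(RANK[ref] + 1, len(GRADES) - 1)]


def below(honest, ref):
    """Grade closest to `honest` strictly below `ref` (`ref` itself if it is the bottom grade)."""
    return honest if RANK[honest] < RANK[ref] else GRADES[max(RANK[ref] - 1, 0)]


def respond(grades, x, alpha, y, beta):
    """One voter type's best response to the announcement (x, alpha; y, beta)."""
    out = {}
    for z, g in grades.items():
        if z == x:
            # Assumed cases for the leader: judged against beta, by comparison with y.
            out[z] = above(g, beta) if RANK[g] > RANK[grades[y]] else below(g, beta)
        else:
            # Stated case (z preferred to x: above alpha) and its assumed mirror image.
            out[z] = above(g, alpha) if RANK[g] > RANK[grades[x]] else below(g, alpha)
    return out


def majority_gauge(weighted):
    """(p, alpha, q) for (share, grade) pairs; alpha is the lower middlemost grade."""
    total = sum(w for w, _ in weighted)
    at_least = 0
    for alpha in reversed(GRADES):
        at_least += sum(w for w, g in weighted if g == alpha)
        if 2 * at_least > total:
            break
    p = sum(w for w, g in weighted if RANK[g] > RANK[alpha])
    q = sum(w for w, g in weighted if RANK[g] < RANK[alpha])
    return p, alpha, q


def gauge_key(gauge):
    p, alpha, q = gauge
    return (RANK[alpha], 1, p) if p > q else (RANK[alpha], 0, -q)


def outcome(profile):
    """Leader and runner-up, with their majority-grades, under the majority-gauge."""
    gauges = {c: majority_gauge([(SHARES[t], profile[t][c]) for t in profile]) for c in CANDIDATES}
    first, second = sorted(CANDIDATES, key=lambda c: gauge_key(gauges[c]), reverse=True)[:2]
    return first, gauges[first][1], second, gauges[second][1]


def step(announcement):
    """All voter types respond simultaneously to one announced outcome."""
    profile = {t: respond(HONEST[t], *announcement) for t in HONEST}
    return profile, outcome(profile)


table_20_3, result = step(("G", "Good", "O", "Acceptable"))
print(result)  # the announcement comes back unchanged: a fixed point
```

Here every voter type responds to all candidates at once, so `step` models the simultaneous response of this section. It does not model the one-candidate-at-a-time dynamic introduced later. Calling `step(("O", "Good", "G", "Good"))` and feeding the result back reproduces the two-step cycle of the next paragraph.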
On the other hand, repeated best responses do not always converge from
any starting point. For example, if Ottinger is the winner and Goodell the
runner-up, both with majority-grade Good, the response is Goodell the winner
with majority-grade Very Good and Ottinger the runner-up with majority-grade
Acceptable. Then the next response is Ottinger winner and Goodell runner-up,
both with majority-grade Good, so the process cycles (just as it did in the
approval voting example).
A unique leading candidate X with majority-grade α and unique runner-up Y with majority-grade β (where α ≥ β) is a fixed-point equilibrium, written (X, α; Y, β), if the voters' response to its announcement produces the same outcome. Note that given (X, α; Y, β), the best response rule determines the voters' strategies uniquely. This section and the next both concern best responses. In both, some statements or proofs hold only generically, e.g., when in every face-to-face confrontation there is an absolute majority in favor of one candidate; otherwise disagreeable details must be spelled out.
Theorem 20.9 Suppose a best response strategy of voters is used when the vot-

Table 20.3 (cont.)
Type  Share   B                       G                     O
3a    22%     Poor                    Poor                  Very Good
3b    15%     Acceptable              Poor                  Excellent
Majority-Gauge (39%, Acceptable, 30%)  (24%, Good, 47%)      (47%, Acceptable, 29%)

equilibria are close to being strong equilibria: if X is the winner and a coalition of voters prefers Z to X, they cannot increase the majority-gauge of Z; and those who prefer Y to X cannot decrease the majority-gauge of X.

Proof of First Statement Suppose that (X, α; Y, β) is a fixed-point equilibrium.
Note, first, that the best response strategy implies that X's majority-grade is strictly above Y's majority-grade: α > β.
Assume that X is not a Condorcet-winner. Then there is some candidate Z (perhaps Y) who is preferred to X by a majority of voters. If α is not the highest grade, Z must have a higher majority-grade than X at the next round, contradicting the fact that (X, α; Y, β) is an equilibrium. So suppose α is the highest grade. If Z = Y, then Y has a higher majority-grade than X at the next round, a contradiction. So suppose Z ≠ Y. Then Z's majority-grade at the next round is the highest grade, and Z must replace either X or Y as the leader or runner-up, again a contradiction. ∎
runner-up is β). This implies that the true grades given to Z (and Y) by a majority of voters (all of whom prefer C to Z) are at most β. Therefore, the strategic majority-grades of the best responses of any Z against (C, γ) where γ ≥ β + 1 will be identical to Z's strategic majority-grade against (C, β + 1). In particular, Y's strategic majority-grade will be exactly β and any other candidate's will be at most β, though the runner-up may change. However, the strategic majority-gauge of the best responses of any Z against (C, γ) where γ ≥ β + 2 will be identical to Z's strategic majority-gauge against C with grade β + 2; so, in particular, Y will remain the runner-up.
In response to runner-up Y with strategic majority-grade β, C's strategic majority-grade is some γ1 > β. If γ1 > β + 1, then (C, γ1; Y, β) is a fixed-point equilibrium. If γ1 = β + 1 and it happens that the runner-up is again Y, then (C, γ1; Y, β) is again a fixed-point equilibrium.
The last remaining possibility is that γ1 = β + 1 and the runner-up changes to some candidate Z with strategic majority-grade β. Let γ2 be C's best-response grade to the runner-up (Z, β). If γ2 = β + 1, then (C, γ2; Z, β) is a fixed-point equilibrium. If not, the best responses yield the sequence
(C, γ2; Z, β) → (C, γ2; Y, β) → (C, γ1; Y, β) → (C, γ1; Z, β) → (C, γ2; Z, β),
so C is the winner in every succeeding round.
A = C, α is not the minimum grade, and the true runner-up B has a true majority-grade of at least α − 1.
The only case when there may be none is when A = C and the true runner-up B has a true majority-grade of less than α − 1. But then, as was seen, either there exists an equilibrium or there is a sequence of best responses where the Condorcet-winner always remains the winner.

Proof Only the first of these assertions is proved here: that (C, α + 1; Y, α) is a fixed-point equilibrium when A ≠ C and α is not the maximum grade. The other proofs use similar arguments. Set γ = α + 1 and β = α, and let Z be any candidate other than C. Z's true majority-grade is at most α. A majority of voters prefer C to Z. They will give Z their true grade if it is less than γ, and β = γ − 1 if their true grade is at least γ. This cannot change Z's majority-grade, since the voters whose grades decreased did not give a grade below Z's true majority-grade. The remaining voters (a minority) prefer Z to C, and some of them may have increased Z's grade. So the strategic majority-grade of Z is at least Z's true majority-grade but below γ; it can be at most β = γ − 1, since a majority gave at most that grade.
Therefore every candidate Z other than C has, in response to the Condorcet-winner C with grade γ, a strategic majority-grade equal to or above his true majority-grade but at most β; this includes, of course, A. It follows that there must be a runner-up candidate Y with strategic majority-grade β = γ − 1. He is almost surely unique because he is determined by the majority-gauge, and he may well be A.
The true majority-grade of C is at most α. In response to the runner-up candidate Y with majority-grade β, a majority of the voters will assign at least β + 1 to C and a minority will assign a grade less than β, so C's strategic majority-grade must be at least β + 1. But the only grades that are increased over the true grades are increased to β + 1. Since the true majority-grade is at most β, the strategic majority-grade cannot be above β + 1 and thus must be exactly β + 1. Since the best responses yield these outcomes, (C, α + 1; Y, α) is a fixed-point equilibrium. ∎
20.4 Best Response Dynamics
A question remains: Does there exist a best response dynamic that converges to the Condorcet-winner (when he exists), as there was for approval voting? When all voters simultaneously respond with the grades of all candidates, the best response strategies for mechanisms that depend on the majority-gauge do not necessarily converge.
Imagine that a poll announces that the leading candidate is X0 with grade α0 and the runner-up is Y0 with grade β0 (implying α0 ≥ β0). This determines the first outcome (X0, α0; Y0, β0) in a sequence of polls ending in the actual election. In a real situation, the outcome of the next round does not come about because every candidate's majority-gauge is computed simultaneously. It comes about because one or several of the candidates are in the news and the voters give their best responses concerning only them. To model this idea, the dynamic of successive polls studied here assumes that at each round voters give their best response grades to some one candidate Z (who may be the leader, the runner-up, or some other candidate) in view of the current outcome. This may or may not change the outcome.

Table 20.4
Best Response Strategies of Sophisticated or Limited-Rationality Voters of Table 20.1
Type  Share   B                       G                     O
1a    10%     Excellent               Poor                  Excellent
1b    29%     Excellent               Very Good             To Reject
2a    16%     Acceptable              Very Good             Good
2b     8%     Poor                    Very Good             Acceptable
3a    22%     Poor                    Acceptable            Excellent
3b    15%     Acceptable              Acceptable            Excellent
Majority-Gauge (39%, Acceptable, 30%)  (0%, Very Good, 47%)  (47%, Good, 37%)
Note: These voters are sophisticated or behave with limited rationality. They respond to (G, Very Good; O, Good), a fixed-point equilibrium.

To understand the idea, consider the 1970 New York Senate example of table 20.1. Suppose it happens that the candidates' respective majority-gauges are
O: (47%, Good, 24%)
B: (39%, Acceptable, 30%)
G: (24%, Acceptable, 10%),
so that the outcome (O, Good; B, Acceptable) is announced. The best responses to candidate B yield
Type    1a          1b          2a          2b          3a          3b          Majority-Gauge
Share   10%         29%         16%         8%          22%         15%
B:      Excellent   Very Good   Acceptable  Poor        Poor        Acceptable  (39%, Acceptable, 30%)
so the outcome is the same. The best responses to candidate O now yield
O:      Poor        To Reject   Good        Good        Very Good   Excellent   (37%, Good, 39%)
and the outcome is still (O, Good; B, Acceptable). The best responses to candidate G then yield
G:      Poor        Very Good   Very Good   Very Good   Acceptable  Acceptable  (0%, Very Good, 47%)
and the two leading candidates are now (G, Very Good; O, Good). Taking the three candidates in any order elicits the best responses given in table 20.4.
When, instead, the initial announcement is (G, Good; O, Good) and the voters again begin by responding with B's grades, nothing changes. If they then give O's grades, with outcome (G, Good; O, Acceptable), followed by G's grades, the result is as shown in table 20.5. If the voters now respond by giving grades to B, O, or G, the outcome remains (G, Good; O, Acceptable): there is convergence, and the winner is the Condorcet-winner.
In this example there are exactly two fixed-point equilibria: (G, Very Good; O, Good) and (G, Good; O, Acceptable). There can be no equilibrium with G the winner and O the runner-up where their majority-grades differ by more than one level. It was noted in section 20.3 that if there exists an equilibrium where the runner-up Y's majority-grade is at least two levels below the winner's, then it is the unique equilibrium in which Y is runner-up. It is also simple to check that no other equilibria exist where G's majority-grade is one above O's. Note that in the first case G's grade is above his true majority-grade and O's is his true majority-grade, whereas in the second case G's grade is his true majority-grade and O's is one below. B's majority-grade and majority-gauge are in both cases the true ones. The strategic majority-grades are close to the true ones.

Table 20.5
Best Response Strategies of Sophisticated or Limited-Rationality Voters of Table 20.1
Type  Share   B                       G                     O
1a    10%     Excellent               Poor                  Very Good
1b    29%     Very Good               Good                  To Reject
2a    16%     Acceptable              Very Good             Acceptable
2b     8%     Poor                    Very Good             Acceptable
Note: These voters are sophisticated or behave with limited rationality. They respond to (G, Good; O, Acceptable), a fixed-point equilibrium.
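The one-candidate-at-a-time dynamic can be replayed from the announced gauges. This sketch reuses the best response rule assumed in the earlier sketch for table 20.3. It restates that rule's helpers so the block runs alone: the other three cases are taken to mirror the stated one, and the grade closest to the honest one is always chosen.

Only the gauges quoted in the text are needed to start, since the profile behind them is not given. The names `respond_to` and `leaders` are hypothetical.

```python
GRADES = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]
RANK = {g: i for i, g in enumerate(GRADES)}
SHARES = {"1a": 10, "1b": 29, "2a": 16, "2b": 8, "3a": 22, "3b": 15}
HONEST = {  # honest grades of Table 20.1
    "1a": {"B": "Excellent", "G": "Poor", "O": "Acceptable"},
    "1b": {"B": "Very Good", "G": "Good", "O": "To Reject"},
    "2a": {"B": "Acceptable", "G": "Very Good", "O": "Good"},
    "2b": {"B": "Poor", "G": "Very Good", "O": "Acceptable"},
    "3a": {"B": "Poor", "G": "Acceptable", "O": "Very Good"},
    "3b": {"B": "Acceptable", "G": "Very Good", "O": "Excellent"},
}


def above(honest, ref):
    return honest if RANK[honest] > RANK[ref] else GRADES[min(RANK[ref] + 1, len(GRADES) - 1)]


def below(honest, ref):
    return honest if RANK[honest] < RANK[ref] else GRADES[max(RANK[ref] - 1, 0)]


def respond(grades, x, alpha, y, beta):
    """Assumed best response rule (only its first case is stated in the text)."""
    out = {}
    for z, g in grades.items():
        if z == x:
            out[z] = above(g, beta) if RANK[g] > RANK[grades[y]] else below(g, beta)
        else:
            out[z] = above(g, alpha) if RANK[g] > RANK[grades[x]] else below(g, alpha)
    return out


def majority_gauge(weighted):
    total = sum(w for w, _ in weighted)
    at_least = 0
    for alpha in reversed(GRADES):
        at_least += sum(w for w, g in weighted if g == alpha)
        if 2 * at_least > total:
            break
    p = sum(w for w, g in weighted if RANK[g] > RANK[alpha])
    q = sum(w for w, g in weighted if RANK[g] < RANK[alpha])
    return p, alpha, q


def gauge_key(gauge):
    p, alpha, q = gauge
    return (RANK[alpha], 1, p) if p > q else (RANK[alpha], 0, -q)


def leaders(gauges):
    """Current outcome (x, alpha, y, beta) read off the candidates' gauges."""
    first, second = sorted(gauges, key=lambda c: gauge_key(gauges[c]), reverse=True)[:2]
    return first, gauges[first][1], second, gauges[second][1]


def respond_to(z, gauges):
    """Voters give best response grades to candidate z only; other gauges are kept."""
    x, alpha, y, beta = leaders(gauges)
    weighted = [(SHARES[t], respond(HONEST[t], x, alpha, y, beta)[z]) for t in HONEST]
    return {**gauges, z: majority_gauge(weighted)}


# Gauges announced in the text; the profile behind them is not given.
gauges = {"O": (47, "Good", 24), "B": (39, "Acceptable", 30), "G": (24, "Acceptable", 10)}
for z in ["B", "O", "G"]:
    gauges = respond_to(z, gauges)
    print(z, gauges[z], leaders(gauges))
```

After responses to B, O, and G the sketch arrives at (G, Very Good; O, Good). A second pass over the three candidates leaves that outcome unchanged, with the gauges shown under table 20.4.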
Specialized to approval voting (or approval judgment), the same dynamic uses only the leader and the runner-up (the grades add nothing). Suppose, for example, that the voters initially give approvals only to those candidates they evaluate to be Excellent. The respective approval scores of the candidates are then B, 10; G, 0; and O, 15; and the initial outcome is (O, B). The successive columns of table 20.6 give the respective approval scores of each of the candidates and the associated outcomes. For example, start from the scores (39, 0, 61) and outcome (O, B) of the third column. The best responses of the voters to candidate G are the approvals and disapprovals (0, 1, 1, 1, 0, 0) for the six types. They change the approval score of G to 53 and the outcome to (O, G) in the fourth column. At equilibrium the approval score of the Condorcet-winner (here G, score 53) is the number of voters who prefer him to the runner-up (here O). The approval score of every other candidate X is the number of voters who prefer X to the Condorcet-winner. These scores depend only on the relative preferences. They could result from a host of different evaluations of the candidates. For example, in the true preferences of table 20.1, types 2a and 2b, and likewise types 3a and 3b, are indistinguishable to approval voting strategies.
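Table 20.6 can be regenerated with the poll-leader rule. A voter approves a candidate Z other than the leader exactly when he prefers Z to the leader, and approves the leader exactly when he prefers the leader to the runner-up. The sketch assumes the voters respond to B, O, G, B, O, G in that order, which is one order consistent with the columns of the table. Preferences are read off the honest grades of Table 20.1, and the function names are illustrative.

```python
SHARES = {"1a": 10, "1b": 29, "2a": 16, "2b": 8, "3a": 22, "3b": 15}
# Strict preference orders (best first), read off the honest grades of Table 20.1.
PREFS = {
    "1a": ("B", "O", "G"),
    "1b": ("B", "G", "O"),
    "2a": ("G", "O", "B"),
    "2b": ("G", "O", "B"),
    "3a": ("O", "G", "B"),
    "3b": ("O", "G", "B"),
}


def prefers(voter_type, x, y):
    order = PREFS[voter_type]
    return order.index(x) < order.index(y)


def outcome(scores):
    """(leader, runner-up) by approval score; no ties occur in this example."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[0], ranked[1]


def respond(scores, z):
    """Poll-leader rule applied to candidate z only."""
    leader, runner_up = outcome(scores)
    rival = runner_up if z == leader else leader
    new = dict(scores)
    new[z] = sum(s for t, s in SHARES.items() if prefers(t, z, rival))
    return new


scores = {"B": 10, "G": 0, "O": 15}  # approvals given only to Excellent
history = [(scores, outcome(scores))]
for z in ["B", "O", "G", "B", "O", "G"]:
    scores = respond(scores, z)
    history.append((scores, outcome(scores)))

for s, o in history:
    print(s["B"], s["G"], s["O"], o)
```

The last state has G at 53 (voters preferring G to O), O at 47, and B at 39 (voters preferring B to G), as the paragraph above describes.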
The best response dynamic procedure (BRDP) is as follows.
Time t. A strategic profile of majority voting grades and two leading candidates, the leader and the runner-up, are given.
the strategic profile at time t + 1. A new leader and runner-up are designated. If there are two or more candidates with the highest majority-grade, the leader and the runner-up are chosen at random among them. Otherwise the candidate with the highest majority-grade is the leader, and the runner-up is chosen at random among candidates with the next highest grade.
The random choice of a leader is justified because candidates with the highest majority-grade are necessarily important candidates, even though their majority-gauges may not be the highest, so polls and the media may well at some time take one of them to be the leader.

Table 20.5 (cont.)
Type  Share   B                       G                     O
3a    22%     Poor                    Acceptable            Very Good
3b    15%     Acceptable              Acceptable            Excellent
Majority-Gauge (39%, Acceptable, 30%)  (24%, Good, 47%)      (47%, Acceptable, 29%)

Table 20.6
Approval Scores of Best Response Strategies of Sophisticated or Limited-Rationality Voters of Table 20.1
B           10      39      39      39      39      39      39
G           0       0       0       53      53      53      53
O           15      15      61      61      61      47      47
Outcomes    (O, B)  (B, O)  (O, B)  (O, G)  (O, G)  (G, O)  (G, O)
Note: These voters are sophisticated or behave with limited rationality, using approval voting. The last column is the fixed-point equilibrium (G, O).
Theorem 20.10 Beginning with any initial profile of grades of a finite language, the BRDP converges (almost surely) to an outcome whose winner is the Condorcet-winner (if he exists); otherwise the outcomes cycle, the leader always being one of the majority top cycle of candidates.
Proof The BRDP defines a Markov process. A state of the process is the set of candidates with the highest majority-grade.
Suppose first that there exists a Condorcet-winner C. The state in which C is the unique candidate with the highest majority-grade is clearly absorbing, i.e., once reached, never left.
There is a positive probability that from any initial state, the BRDP generates a sequence of states that leads to C the unique leader. Take any state. Either C belongs to it or not. If not, the BRDP selects C with a positive probability, and C's majority-grade necessarily increases. Either C becomes the unique candidate with the highest majority-grade, in which case the absorbing state has been reached. Or C is among those with the highest majority-grade, which must then be the highest possible grade α_max. There is a positive probability that C is selected as leader and voters are asked to respond to another candidate Z having the maximum majority-grade. In that case Z is eliminated from the set of candidates having the maximum grade. Continuing, candidates are ejected (with positive probability) one by one from the set of those with the majority-grade α_max.⁴
Suppose next that there is no Condorcet-winner. Then there is a majority top cycle of candidates C1 ≻_maj C2 ≻_maj ⋯ ≻_maj Ck ≻_maj C1: a majority prefers C1 to C2, C2 to C3, . . . , and each of these candidates is preferred by a majority to any candidate who is not in the top cycle. If a state is reached that includes only candidates of the top cycle, then succeeding states will only include candidates of the top cycle (because only candidates of the top cycle can defeat a candidate of the top cycle).
There is a positive probability that from any initial state, the BRDP generates a sequence of states that leads to a state that contains only top cycle candidates. Take any state. If it includes no candidate of the top cycle, then with positive probability the new state will include such a candidate. If it does include a candidate of the top cycle, then candidates not of the top cycle are eliminated (with positive probability) one by one.⁵ ∎
The preceding theorems described situations where a Condorcet-winner exists. The proof of the last theorem shows that when there is no Condorcet-winner, the winner will necessarily belong to the majority top cycle of candidates.
20.5 Strategic Majority Judgment Winner
What can be said about the equilibria of games of voting when animal spirits are part of the analysis? This may, of course, be modeled with utility functions that are not restricted by the clearly unrealistic assumption that a voter cares only about who is elected. Hypothesize instead that in the new model there are three types of voters with different utilities:
Type I voters i have utilities u_i that depend only on the winner of the election (as in the usual analyses). In the new model these u_i are assumed to be compatible with the true grades of the voters. When the language of grades is sufficiently rich, u_i(X) > u_i(Y) if and only if i's true grade for X is above i's true grade for Y. Otherwise, u_i(X) > u_i(Y) implies that i's true grade for X is at least as high as i's true grade for Y. Type I voters are winner optimizers.
Type II voters i have utilities u_i that are single-peaked-in-grading and thus depend only on the final grades of candidates: the closer the final grade of a candidate X is to i's true grade for X, the greater is i's utility. Type II voters are final-grade optimizers.
Type III voters i have utilities that depend only on honesty: the further the grades such voters i assign to a candidate X deviate from their true grade for X, the lower their utilities. These voters will simply always assign their true grades. Type III voters are honesty optimizers.
Type I voters constitute the class of voters that has traditionally been studied.
In fact, it seems that only a relatively small percentage of voters in national
elections are of this type. (Even after the debacle of the 2002 French presidential election, at most 30% of French electors were of this type in the next
presidential election, 2007.) Type II voters were analyzed previously, and it
was shown that when the mechanism is an order function, and in particular, the
majority judgment, their optimal strategy is to assign to candidates the grades
they honestly believe are warranted. Type II and III voters are those who really
wish to send messages declaring to the public at large, as well as to the candidates themselves, the esteem with which they regard politicians. They are,
of course, frustrated by the traditional model and the methods actually used to
conduct elections.
To begin, assume there are only voters of types I and III, and consider the
game of voting with an arbitrary mechanism F . A candidate X is the strategic
winner of F against Y if X wins when all voters of type I who prefer X to Y
give to X the highest possible grade and to Y the lowest possible grade, those
who prefer Y to X do the opposite, and all other voters assign both X and Y the
grades they honestly believe are merited. A candidate X is the strategic winner
of F if no other candidate is the strategic winner against him.
Theorem 20.11 Suppose all voters are of types I or III. C is a strong-equilibrium winner of a mechanism F if and only if C is a strategic winner of F.
Suppose, then, that C is a strategic winner of F. Consider the strategy-profile where every voter of type I gives to C the highest grade and to every other candidate the lowest grade, and all other voters give honest grades. Any coalition that prefers another candidate B to C is unable to elect B by changing its strategy because C is a strategic winner, so C is a strong-equilibrium winner. ∎
A candidate X is the strategic majority winner against Y if X is the winner
against Y with the majority judgment when all voters of type I who prefer X
to Y give to X the highest possible grade and to Y the lowest possible grade,
those who prefer Y to X do the opposite, and all other voters (of types II and III)
assign both X and Y the grades they honestly believe are merited. A candidate
X is the strategic majority winner if X is the strategic majority winner against
every other candidate. A candidate X is a coalitional-equilibrium winner if
no coalition of voters of the same type can change their strategies and elect a
candidate Y preferred by all of them (Laraki 2009).
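The definition of a strategic majority winner is easy to compute once each voter type's utility type is fixed. The assignment of utility types in `KIND` below is purely hypothetical, since the chapter gives none for the 1970 example. Honest grades and shares come from Table 20.1, and the majority-gauge comparison is the one used throughout the chapter. The helper names are illustrative.

```python
GRADES = ["To Reject", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]
RANK = {g: i for i, g in enumerate(GRADES)}
CANDIDATES = ("B", "G", "O")
SHARES = {"1a": 10, "1b": 29, "2a": 16, "2b": 8, "3a": 22, "3b": 15}
HONEST = {  # honest grades of Table 20.1
    "1a": {"B": "Excellent", "G": "Poor", "O": "Acceptable"},
    "1b": {"B": "Very Good", "G": "Good", "O": "To Reject"},
    "2a": {"B": "Acceptable", "G": "Very Good", "O": "Good"},
    "2b": {"B": "Poor", "G": "Very Good", "O": "Acceptable"},
    "3a": {"B": "Poor", "G": "Acceptable", "O": "Very Good"},
    "3b": {"B": "Acceptable", "G": "Very Good", "O": "Excellent"},
}
# Hypothetical utility types of the six voter types (not given in the chapter).
KIND = {"1a": "I", "1b": "III", "2a": "I", "2b": "I", "3a": "I", "3b": "III"}


def majority_gauge(weighted):
    total = sum(w for w, _ in weighted)
    at_least = 0
    for alpha in reversed(GRADES):
        at_least += sum(w for w, g in weighted if g == alpha)
        if 2 * at_least > total:
            break
    p = sum(w for w, g in weighted if RANK[g] > RANK[alpha])
    q = sum(w for w, g in weighted if RANK[g] < RANK[alpha])
    return p, alpha, q


def gauge_key(gauge):
    p, alpha, q = gauge
    return (RANK[alpha], 1, p) if p > q else (RANK[alpha], 0, -q)


def strategic_grades(x, y, kind):
    """Grades for x and y when type I voters polarize and all others grade honestly."""
    gx, gy = [], []
    top, bottom = GRADES[-1], GRADES[0]
    for t, grades in HONEST.items():
        w = SHARES[t]
        if kind[t] == "I":
            # Every type here ranks x and y strictly, so "not x over y" means y over x.
            x_first = RANK[grades[x]] > RANK[grades[y]]
            gx.append((w, top if x_first else bottom))
            gy.append((w, bottom if x_first else top))
        else:  # types II and III assign their honest grades
            gx.append((w, grades[x]))
            gy.append((w, grades[y]))
    return gx, gy


def wins_against(x, y, kind):
    gx, gy = strategic_grades(x, y, kind)
    return gauge_key(majority_gauge(gx)) > gauge_key(majority_gauge(gy))


def strategic_majority_winner(kind):
    winners = [x for x in CANDIDATES
               if all(wins_against(x, y, kind) for y in CANDIDATES if y != x)]
    return winners[0] if winners else None


print(strategic_majority_winner({t: "I" for t in HONEST}))
print(strategic_majority_winner(KIND))
```

When every voter is of type I, each pairwise contest reduces to a majority vote, so the strategic majority winner coincides with the Condorcet-winner G. With the hypothetical mixture, G still wins here, but other assignments of utility types could change that.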
Theorem 20.12
winner.
Proof All type I voters give to C the highest possible grade and to any other candidate the lowest possible grade. All type II and III voters assign grades honestly. By definition, no set of type I voters who prefer a candidate B to C can manipulate. Moreover, no set of type II voters who would like to increase or decrease the majority-gauge of any candidate can do so, because the majority-gauge is strategy-proof-in-grading. ∎

20.6 Condorcet-Judgment-Winner
More realistically, assume now that the preferences of voters are over the pairs (A, α), where A is the winner and α is the winner's final grade. A voter's preferences over these pairs are assumed to be complete and transitive. When the winner A is the same, the preferences over the pairs (A, α) are assumed to be single-peaked in the majority-grades α; that is, a voter prefers the winner's final grade to be as close as possible to his ideal grade. The ideal grade given a winning candidate maximizes the voter's utility. In the honest profile of the electorate a voter assigns his ideal grade to each candidate.
A is called a Condorcet-judgment-winner if there is no candidate B such that a majority of voters strictly prefer (B, β) to (A, α), where β is any grade and α is the honest majority-grade of A.
A voter's preference over pairs is lexicographic if, for A ≠ B, (A, α) ≻ (B, β) implies (A, α′) ≻ (B, β′) for all α′, β′. Consequently, when the preference is lexicographic, a Condorcet-winner coincides with a Condorcet-judgment-winner. In general the concepts may differ.
Theorem 20.13 With the majority-gauge, (A, α) is a strong-equilibrium outcome if and only if A is a Condorcet-judgment-winner and α is his honest majority-grade.
Proof Assume there are 2n or 2n + 1 voters and that (A, α) is a strong-equilibrium outcome. If α is not the true majority-grade of A, then a majority of voters will prefer A to win with his honest majority-grade (say, γ) rather than with the majority-grade α.
If γ is not the minimal grade, a majority can assign the grade γ to A and the minimal grade to all other candidates, implying that the new output is (A, γ). If, on the other hand, γ is the minimal grade, the majority proceeds as follows:
- n voters of the majority assign the highest possible grade to A;
- the others assign the minimal grade to A;
- all of them assign the minimal grade to every other candidate.

In that case A has the highest majority-gauge and is elected with his honest majority-grade. Consequently, if (A, α) is a strong-equilibrium outcome, α is his honest majority-grade.
Assume now that some pair (B, β) is preferred to (A, α) by a majority of voters. This majority can elect (B, β) as described. Thus A is a Condorcet-judgment-winner.

20.7 Conclusion
The utilities that judges and voters maximize are unknown. They are undoubtedly more complex than utilities over pairs of rank-orders and associated majority-grades. How, then, should utilities be modeled or approximated? We believe they should be made to depend as much as possible on the results, or outputs, of the system that is studied:
- When first-past-the-post is analyzed, the output is the number of votes received by each candidate, together with the winner and the rank-order that is induced.
- When Borda's method is used, the output is the Borda-score of each candidate, together with the winner and the rank-order that is induced.
- When majority judgment is used, the output is each candidate's distribution of grades, together with each candidate's resulting majority-grade and majority-gauge, the winner, and the majority-ranking.

The difficulty is that realistic utilities resist qualitative analyses. This is why utilities have been given relatively simple formulations, such as depending only on the identity of the winner or on the pair of a winner and a final grade.
One thing is clear. Extending the concept of utilities to all the outputs of an election (those that are announced publicly) helps to explain many obscure phenomena that have received considerable attention. Examples abound. Take first-past-the-post:
- Abstention: perhaps a voter does not wish a winner to win by much.
- Participation: perhaps the contrary (e.g., Chirac versus Le Pen, which brought out a record number of voters).
- Votes given to small-party candidates: voters know these candidates cannot win, and may even wish them not to win, and yet they vote for them.
It would be more realistic with the majority judgment to assume that voters
have utilities that depend on the entire distribution of the grades assigned by
the electorate and the attendant majority-gauges and majority-ranking, but this
could lead to much more complex mathematics.
The simple analysis given here suffices to give additional insight as to why
the majority judgment is a more honesty-inducing mechanism than others
in the context of the game of voting.
21
Multicriteria Ranking
Each voter distinguishes one candidate for political office from another by
some ill-defined mix of criteria that may touch upon party affiliation and party
platform; honesty and moral outlook; voice, appearance, and charisma; foreign,
economic, and social policies; and a host of other considerations. But there is no
agreement among voters on which of these aspects are more or less important:
each voter is left to integrate all the criteria he believes of importance to reach
a final judgment on the merit of each candidate.
Skaters, gymnasts, countries, pianists, wines, . . . , however, are routinely
evaluated on the basis of separate criteria, attributes, or characteristics, and
the evaluations of the parts are aggregated into an evaluation of the whole. For
skaters and gymnasts, the merits of separate parts of performances are measured,
then aggregated into a measure of the whole. For countries, the quality-of-life
index is a weighted sum of indicators that concern health, political stability,
security, community life, political freedom, and so on. For pianists, often a
sequence of increasingly demanding performances weeds out competitors to
end up with a handful who are ranked on the basis of their final recitals. For
wines, each of a set of well-defined criteria depending on odor, taste, aspect,
and total impact is evaluated in terms of a common language; the evaluations
are transformed into numbers; and the numbers are added to determine final
number-grades. The goal of this chapter is to extend the majority judgment to
multicriteria problems.
[Display: criteria 1, 2, . . . , l − 1, l and their languages (not recoverable from the source).]
Transitivity A R B and B R C implies A R C.
Moreover, if no pair of languages Λj are common, then the ranking rule should also be
The desirable properties are strangely reminiscent: they ask what Arrow's theorem asked when the role of voters or judges is played by the criteria (see chapter 11). As a consequence, the only rule R that satisfies the properties is the dictatorial rule. To decide on the relative standing of two competitors, there is a well-defined sequence of criteria. The first criterion decides; if it declares the competitors tied, the second criterion decides; if the second criterion declares a tie, the third criterion decides; . . . ; if all the criteria of the sequence declare a tie, the result is a tie. A very similar result is given by Plott, Little, and Parks (1975).
More generally, if some of the languages Λj are common, a set of criteria with a common language decides; if there are ties among competitors, a second set of criteria with a common language decides; and so on.
What is to be done? Practice, once again, gives ideas. Several methods have been used, some of which have already been described. Two are recalled here.
Lexicographic Multicriteria One criterion or a set of criteria with a common language decides; if there are ties among competitors, a second criterion or set of criteria decides; if ties remain, a third criterion or set of criteria decides; and so on. This may be said to be (in part) the procedure used in the Chopin International Piano Competition. There, successive stages (or criteria) eliminate competitors, with a final stage (or criterion) to determine the order of finish among six competitors.
Multicriteria Weighted Point-Summing The language of each criterion is translated into points. The different criteria are assigned weights that correspond to their relative importance, or the points used in each criterion already reflect their relative importance (as they do in wine competitions; see chapter 7 and section 21.2). This is the procedure used in figure skating, gymnastics, and wine competitions, among others. In such instances, the points (adjusted, when appropriate, by the weights) are routinely added.
Take A to be any competitor. There are two ways to find A's total score when addition is the underlying approach:
Find each judge's aggregate total score for competitor A, then calculate A's sum (or trimmed sum), or average (or trimmed average), over all judges. This is the usual way.
Find the sum (or trimmed sum), or average (or trimmed average), of A's kth-criterion points over all judges, then aggregate them.
When the procedure takes sums or averages, the two calculations give the same results. When the procedure takes trimmed sums or averages (most often meaning that one or two of the highest and lowest grades are eliminated), they may not give the same results.
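The claim about trimming can be checked on actual inputs. The following minimal Python sketch uses wine A's criterion points from table 21.2; the helper names are ours (hypothetical), and trimming is assumed to drop the single highest and single lowest value (k = 1).

```python
# Wine A's points from table 21.2: one row per judge (1-5), one column per
# criterion in the order VL, VA, NG, NI, NQ, TG, TI, TP, TQ, HO.
WINE_A = [
    [5, 8, 7, 5, 14, 7, 5, 7, 19, 10],
    [5, 10, 6, 5, 12, 6, 5, 7, 16, 9],
    [5, 10, 8, 5, 14, 7, 5, 8, 16, 10],
    [5, 10, 8, 6, 16, 8, 6, 8, 22, 11],
    [5, 8, 7, 5, 14, 6, 4, 6, 16, 9],
]


def trimmed_sum(values, k=0):
    """Sum after dropping the k highest and k lowest values (k=0: plain sum)."""
    ordered = sorted(values)
    return sum(ordered[k:len(ordered) - k])


def judge_first(points, k=0):
    """Total each judge's points over the criteria, then (trimmed-)sum over judges."""
    return trimmed_sum([sum(row) for row in points], k)


def criterion_first(points, k=0):
    """(Trimmed-)sum each criterion's points over judges, then add the criteria."""
    return sum(trimmed_sum(column, k) for column in zip(*points))


print(judge_first(WINE_A), criterion_first(WINE_A))        # 436 436
print(judge_first(WINE_A, 1), criterion_first(WINE_A, 1))  # 256 259
```

With k = 0 both orders return 436 (an average of 87.2 per judge). With k = 1 they return 256 and 259: the judge-first calculation drops whole judges' totals, while the criterion-first calculation drops different judges' points in different criteria.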
These traditional weighted point-summing procedures have their analogues
in the majority judgment. Better, when possible, theory and practice suggest that
a common language should be used by all criteria, which leads to what seems
to be the most reasonable application of the majority judgment to multicriteria
evaluation. Before describing these methods, the results of a wine competition
are examined and analyzed to develop intuition and insight.
21.2 Common Language: Wine Competitions
Table 21.1a
Form to Record the Inputs of One Judge for One Wine, Weighted Point-Summing Method, Still Wines, Les Citadelles du Vin, 2006
                                    Excellent   Very Good   Good   Fair   Insufficient
View      Limpidity          [VL]       5           4         3      2         1
          Aspect             [VA]      10           8         6      4         2
Nose      Genuineness        [NG]       8           7         6      4         2
          Intensity          [NI]       6           5         4      3         2
          Quality            [NQ]      16          14        12     10         8
Taste     Genuineness        [TG]       8           7         6      4         2
          Intensity          [TI]       6           5         4      3         2
          Persistence        [TP]       8           7         6      5         4
          Quality            [TQ]      22          19        16     13        10
Harmony   Overall judgment   [HO]      11          10
Note: On the actual forms, there were boxes to check instead of points. The points that translate the word-grades in each criterion were absent, though known to the judges. The abbreviations for the criteria were added by the authors.
Table 21.1b
Form to Record the Inputs of One Judge for One Wine, Majority Judgment Experiment, Les Citadelles du Vin, 2006
For you, this wine is:   Excellent   Very Good   Good   Average   Mediocre
The OIV determines the final grade of a wine as the average of the point totals given it by the judges. This, in turn, determines the medal a wine is awarded (or not). In this example wine A's final grade is 87.2, B's 82, and C's 83.6, so that the order of finish is A ≻ C ≻ B. A's majority-gauge is (1, Very Good−, 2), B's is (2, Good+, 1), and C's is (2, Good+, 0), so that the order of finish is A ≻ C ≻ B. The orders happen to agree.
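Both orders of finish can be recomputed from table 21.2. The following is a minimal Python sketch. It assumes the grade order Mediocre < Average < Good < Very Good < Excellent and takes the lower middlemost grade as the majority-grade. An ASCII "+" or "-" is appended according to whether more grades lie above (p) or below (q) the majority-grade.

```python
# Grades of the majority judgment experiment, from worst to best.
SCALE = ["Mediocre", "Average", "Good", "Very Good", "Excellent"]

# Per wine: the five judges' majority judgment grades and point totals (table 21.2).
WINES = {
    "A": (["Very Good", "Good", "Very Good", "Excellent", "Average"], [87, 81, 88, 100, 80]),
    "B": (["Very Good", "Good", "Good", "Very Good", "Average"], [83, 81, 84, 86, 76]),
    "C": (["Very Good", "Good", "Good", "Very Good", "Good"], [85, 82, 83, 84, 84]),
}


def point_average(totals):
    """Final grade as used by the OIV: the average of the judges' point totals."""
    return sum(totals) / len(totals)


def majority_gauge(grades):
    """Return (p, majority-grade with +/- sign, q).

    p counts grades strictly above the majority-grade, q strictly below it.
    """
    ranks = sorted(SCALE.index(g) for g in grades)
    middle = ranks[(len(ranks) - 1) // 2]  # lower middlemost grade
    p = sum(1 for r in ranks if r > middle)
    q = sum(1 for r in ranks if r < middle)
    sign = "+" if p > q else "-" if p < q else ""
    return p, SCALE[middle] + sign, q


for name, (grades, totals) in WINES.items():
    print(name, round(point_average(totals), 1), majority_gauge(grades))
# A 87.2 (1, 'Very Good-', 2)
# B 82.0 (2, 'Good+', 1)
# C 83.6 (2, 'Good+', 0)
```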
The grades given these three wines (among well over a thousand) are roughly representative of the competition: they are actual inputs that by and large resemble those of other wines and the other eleven juries. There are some disagreements between the inputs to the two methods; for instance, judge 3's evaluation of wine B is above judge 1's according to the weighted point-summing method, but below judge 1's according to the majority-grade.
These disagreements are rather rare and appear to have no important impact on the medals that are awarded. In analyzing the results of the 2006 competition (and later competitions, when the majority judgment used six grades), Blouin (2008) remarked that the two methods gave homogeneous results, and that the differences were essentially due to the presence of one extreme grade, usually a very low one, which could be considered abnormal.
Table 21.2
A Jury's Grades for Three White Wines, Les Citadelles du Vin, 2006
        Weighted Point-Summing
Judge   VL   VA   NG   NI   NQ   TG   TI   TP   TQ   HO   Sum   Majority-Grade
White Wine A
1        5    8    7    5   14    7    5    7   19   10    87   Very Good
2        5   10    6    5   12    6    5    7   16    9    81   Good
3        5   10    8    5   14    7    5    8   16   10    88   Very Good
4        5   10    8    6   16    8    6    8   22   11   100   Excellent
5        5    8    7    5   14    6    4    6   16    9    80   Average
White Wine B
1        5    8    7    5   14    7    5    7   16    9    83   Very Good
2        5   10    6    5   12    6    5    7   16    9    81   Good
3        5   10    7    5   14    7    4    7   16    9    84   Good
4        5   10    7    5   12    7    5    6   19   10    86   Very Good
5        5   10    7    5   12    6    4    6   13    8    76   Average
White Wine C
1        5   10    7    5   14    7    4    7   16   10    85   Very Good
2        5    8    7    5   14    7    5    6   16    9    82   Good
3        5   10    7    4   12    7    5    7   16   10    83   Good
4        5   10    7    4   12    7    4    6   19   10    84   Very Good
5        5    8    7    5   14    7    5    7   16   10    84   Good
Individual judges' grades will naturally differ, for some judges may be relatively generous, others relatively harsh. This is why, of course, using the majority-grade is important: it effectively avoids the impact of grades that are too high or too low. The less expert the judges, the more judicious it is to obtain grades for each characteristic. Top-notch experts may render better judgments by integrating the various characteristics for themselves.
Blouin has argued that wines are not for experts but for consumers who drink them, and for them a single global criterion makes the most sense, for it corresponds to how they appreciate wines. In that case, many consumers, not just five, should evaluate wines, and it is reasonable to expect that, as with many voters, many consumers would use the language of grades in much the same manner, making single global evaluations via the majority judgment a good method to use. For, inevitably, several extreme grades in a jury as small as five will have a greater impact than in a larger jury, even with the majority judgment. One of the reasons that practical people have developed systems to evaluate competitors by asking for grades (usually numbers) on many different attributes and characteristics of the competitors, or on many separate parts of performances, must surely be to ensure that there are many grades, thereby dampening the impact of extreme or cranky grades. This is particularly important when points are summed. Two cranky grades in a jury of five (both high or both low) will have an enormous impact when points are summed, but they cannot impose their will on the jury with the majority judgment.
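A toy jury makes the last point concrete. In this hypothetical Python sketch, three judges agree on a total of 84 points and two cranky judges give the same extreme total. The average follows the cranky judges; the majority-grade (the middlemost total) does not. The numbers are invented for illustration.

```python
def average(totals):
    """Point-summing style aggregate: the mean of the judges' totals."""
    return sum(totals) / len(totals)


def majority_grade(totals):
    """Majority judgment aggregate: the lower middlemost of the judges' totals."""
    ordered = sorted(totals)
    return ordered[(len(ordered) - 1) // 2]


# Hypothetical jury of five: three agreeing judges and two cranky ones.
AGREEING = [84, 84, 84]
juries = {"two cranky highs": AGREEING + [100, 100], "two cranky lows": AGREEING + [60, 60]}

for label, totals in juries.items():
    print(label, round(average(totals), 1), majority_grade(totals))
# two cranky highs 90.4 84
# two cranky lows 74.4 84
```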
Did the judges of the 2007 Citadelles du Vin competition use the grades (the grades for each of the ten characteristics in the weighted point-summing method and the grades of the majority judgment) in the same manner; that is, did all judges use each of the various grades with about the same frequencies?
Individual judges vary in the pattern or distribution of the grades they use, and they have different tastes and different biases, just as voters of the left, the center, and the right have different tastes for candidates who span a wide political spectrum from left to right. Thus comparing two judges or two juries of five judges may be misleading, for they will naturally differ from judge to judge and jury to jury. More fundamentally, the question is to show that statistically there is a distribution of the grades used by all the judges (a common language of grades) that is a reasonable representation of those used by each judge in the following sense: when sets of several judges' grades are aggregated over several differing wines, to eliminate the individuals' biases in one direction or another, the distributions obtained are close to one another.
The elaborate and detailed analysis of the grades used by voters could be applied to the grades used by the judges in this competition, but only raw data similar to those in table 15.14 are given here. They are, we believe, convincing in showing that, yes, the judges used the grades in the same way, for the distributions obtained are very close to one another.
Table 21.3a gives the frequencies with which each of the grades was used by the first six and the last six juries, which evaluated 351 and 295 wines, respectively. Six juries means thirty judges. The usage rates are very similar, although no wine tasted by one set of juries was tasted by the other set of juries. The average grades, based on the numbers corresponding to the grades given in table 21.1a, are almost identical. Table 21.3b gives the frequencies with which each of the five majority judgment grades was used by the two sets of juries. The usage rates are again very similar, and the average grades (attributing 5 to Excellent, 4 to Very Good, . . . , and 1 to Mediocre) are also almost identical.
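The "average grade" rows of table 21.3a can be reproduced from the frequencies and the points of table 21.1a. The minimal Python sketch below covers juries 1–6 for the criteria VL and NQ. It divides by the sum of the printed percentages rather than by 100; that is our own choice, meant to absorb rounding in other columns.

```python
def average_grade(percentages, points):
    """Frequency-weighted mean of the points attached to each grade.

    percentages and points are listed from Excellent down to Insufficient.
    """
    weighted = sum(f * x for f, x in zip(percentages, points))
    return weighted / sum(percentages)


# Juries 1-6, table 21.3a, with the point scales of table 21.1a.
VL = average_grade([52, 39, 8, 1, 0], [5, 4, 3, 2, 1])
NQ = average_grade([2, 35, 48, 11, 4], [16, 14, 12, 10, 8])
print(round(VL, 1), round(NQ, 1))  # 4.4 12.4
```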
Table 21.3a
Frequencies of Grades Given to Characteristics of Wines, Les Citadelles du Vin, 2006
Criterion             VL            VA            NG            NI            NQ
Juries          1–6   7–12    1–6   7–12    1–6   7–12    1–6   7–12    1–6   7–12
Excellent       52%   53%     40%   46%      3%    4%      3%    2%      2%    3%
Very Good       39%   32%     42%   35%     51%   38%     44%   35%     35%   30%
Good             8%   13%     16%   15%     33%   41%     44%   47%     48%   47%
Fair             1%    2%      2%    4%      9%   13%      9%   14%     11%   16%
Insufficient     0%    0%      0%    1%      3%    4%      1%    2%      4%    5%
Average grade   4.4   4.4     8.4   8.4     6.3   6.0     4.4   4.2    12.4  12.2
Note: Juries 1–6 tasted 351 wines, juries 7–12 tasted 295 wines.
Average grade means the average of corresponding numbers (see Table 21.1a).
If (as has been the practice in evaluating wines, ranking figure skaters and gymnasts, or evaluating the quality of life in nations) several criteria must be accounted for, what method should be used?
21.3 Multicriteria Majority Judgment
Both of the traditional procedures for multicriteria inputs have analogues in the spirit of the majority judgment when the aggregation of grades across criteria is a well-defined function (such as a weighted sum or average, or a majority-grade). Which procedure to use will depend upon the particular application.
Judge-based majority judgment Find each judge's aggregate grade across the criteria for competitor A, then calculate A's majority-value (or majority-gauge) over all judges to determine the majority-ranking.
Criterion-based majority judgment Find the majority-grade of A's kth criterion over all judges, then aggregate the majority-grades. If ties need to be resolved to determine the majority-ranking, then the second majority-grade of A's kth criterion over all judges is found and then aggregated; if ties persist, then the third majority-grades are calculated; and so on.
Using the same function to aggregate grades across criteria in both procedures may well lead to different results. Thus, to each such function there correspond two procedures.
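That the two procedures may disagree even with the same aggregating function can be seen on a tiny hypothetical example. The Python sketch below uses the sum as the function across criteria and compares only first majority-grades, with no tie-breaking. The jury and its 0–10 points are invented for illustration.

```python
def majority_grade(values):
    """Lower middlemost value (the median when the number of judges is odd)."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def judge_based(points):
    """Aggregate each judge's points across criteria (here: sum), then take the majority-grade."""
    return majority_grade([sum(row) for row in points])


def criterion_based(points):
    """Take each criterion's majority-grade over judges, then aggregate them (here: sum)."""
    return sum(majority_grade(column) for column in zip(*points))


# Hypothetical jury: three judges (rows) grade one competitor on two criteria (columns).
JURY = [
    [9, 9],
    [1, 5],
    [5, 1],
]

print(judge_based(JURY), criterion_based(JURY))  # 6 10
```

The middlemost grade of each criterion comes from a different judge, so the criterion-based sum (5 + 5) combines grades that no single judge gave together, whereas the judge-based procedure sees the totals 18, 6, 6.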
Table 21.3a
(cont.)
Criterion             TG            TI            TP            TQ            HO
Juries          1–6   7–12    1–6   7–12    1–6   7–12    1–6   7–12    1–6   7–12
Excellent        3%    2%      3%    3%      3%    2%      3%    2%      2%    1%
Very Good       49%   38%     38%   32%     33%   26%     27%   21%     28%   23%
Good            37%   43%     46%   46%     48%   51%     53%   48%     54%   52%
Fair             9%   13%     11%   17%     14%   18%     14%   23%     13%   18%
Insufficient     2%    3%      1%    3%      2%    3%      3%    6%      3%    5%
Average grade   6.3   6.0     4.3   4.1     6.2   6.0    16.3  15.7     9.1   9.0
Table 21.3b
Frequencies of Grades Given to Characteristics of Wines, Majority Judgment Method, Les Citadelles du Vin, 2006
Juries   Excellent   Very Good   Good   Average   Mediocre   Average Grade
1–6         1%          29%       49%      21%        0%           2.9
7–12        2%          23%       50%      25%        0%           2.8
Note: Juries 1–6 tasted 351 wines, and juries 7–12 tasted 295 wines.
To see how each of these methods works, they are applied to finding the three white wines' majority-grades and ranking for the judge-based procedure and the sequence of majority-grades for the criterion-based procedure, assuming that the function that aggregates the grades is the sum of their corresponding numbers.
The judge-based majority judgment calculates the majority-grades of the sums of the points: A's majority-grade is 87, B's is 83, and C's is 84, which is sufficient to determine their order, A ≻ C ≻ B. The procedure is simple and direct, but it reveals nothing about the jury's verdict on each of the characteristics.
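The judge-based calculation can be sketched in a few lines of Python. This assumes the judges' point totals of table 21.2 and represents a majority-value as the sequence of successive lower middlemost totals, each removed before the next is taken, compared lexicographically. The function name is ours.

```python
def majority_value(values):
    """Successive majority-grades: take the lower middlemost value, remove it, repeat."""
    remaining = sorted(values)
    sequence = []
    while remaining:
        sequence.append(remaining.pop((len(remaining) - 1) // 2))
    return tuple(sequence)


# Judges' point totals (sum over the ten criteria), table 21.2.
TOTALS = {"A": [87, 81, 88, 100, 80], "B": [83, 81, 84, 86, 76], "C": [85, 82, 83, 84, 84]}

values = {wine: majority_value(t) for wine, t in TOTALS.items()}
ranking = sorted(values, key=values.get, reverse=True)
print(ranking)  # ['A', 'C', 'B']
```

Here the first terms (87, 83, 84) already decide the order, so the later terms of the majority-values are never needed.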
The criterion-based majority judgment first calculates the first majority-grade of each criterion's grades (and the second, third, . . . if necessary). Then it aggregates them; in this case it adds them. The results are shown in table 21.4. A, with first and second majority-grades (86, 82), is first; C, with (86, 79), is second; and B, with 83, is last: A ≻ C ≻ B. The two procedures happen to give the same result. The procedure is somewhat more involved, but it renders
Table 21.4
Criterion-Based Majority Judgment Calculation for Inputs of Table 21.2, Les Citadelles du Vin, 2006
Criterion               VL   VA   NG   NI   NQ   TG   TI   TP   TQ   HO   Sum
White Wine A
  1st majority-grade     5   10    7    5   14    7    5    7   16   10    86
  2d majority-grade      5    8    7    5   14    6    5    7   16    9    82
White Wine B
  1st majority-grade     5   10    7    5   12    7    5    7   16    9    83
White Wine C
  1st majority-grade     5   10    7    5   14    7    5    7   16   10    86
  2d majority-grade      5    8    7    4   12    7    4    6   16   10    79
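The entries of table 21.4 can be recomputed from the points of table 21.2. In this Python sketch (names hypothetical), the kth majority-grade of a criterion is taken to be the kth term of its majority-value. A wine's criterion-based score is the sequence of the sums of these terms over the ten criteria, compared lexicographically.

```python
def majority_value(values):
    """Successive lower middlemost values, each removed before the next is taken."""
    remaining = sorted(values)
    sequence = []
    while remaining:
        sequence.append(remaining.pop((len(remaining) - 1) // 2))
    return sequence


# Points from table 21.2: rows are judges 1-5, columns VL, VA, NG, NI, NQ, TG, TI, TP, TQ, HO.
POINTS = {
    "A": [[5, 8, 7, 5, 14, 7, 5, 7, 19, 10], [5, 10, 6, 5, 12, 6, 5, 7, 16, 9],
          [5, 10, 8, 5, 14, 7, 5, 8, 16, 10], [5, 10, 8, 6, 16, 8, 6, 8, 22, 11],
          [5, 8, 7, 5, 14, 6, 4, 6, 16, 9]],
    "B": [[5, 8, 7, 5, 14, 7, 5, 7, 16, 9], [5, 10, 6, 5, 12, 6, 5, 7, 16, 9],
          [5, 10, 7, 5, 14, 7, 4, 7, 16, 9], [5, 10, 7, 5, 12, 7, 5, 6, 19, 10],
          [5, 10, 7, 5, 12, 6, 4, 6, 13, 8]],
    "C": [[5, 10, 7, 5, 14, 7, 4, 7, 16, 10], [5, 8, 7, 5, 14, 7, 5, 6, 16, 9],
          [5, 10, 7, 4, 12, 7, 5, 7, 16, 10], [5, 10, 7, 4, 12, 7, 4, 6, 19, 10],
          [5, 8, 7, 5, 14, 7, 5, 7, 16, 10]],
}


def criterion_based(points):
    """Sum, over criteria, of the 1st, 2nd, ... majority-grades of each criterion."""
    per_criterion = [majority_value(column) for column in zip(*points)]
    return tuple(sum(terms) for terms in zip(*per_criterion))


scores = {wine: criterion_based(p) for wine, p in POINTS.items()}
print({wine: s[:2] for wine, s in scores.items()})  # {'A': (86, 82), 'B': (83, 80), 'C': (86, 79)}
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'C', 'B']
```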
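Table 21.5
Multicriteria Majority Judgment, Les Citadelles du Vin, 2006
                  VL    VA    NG    NI    NQ    TG    TI    TP    TQ    HO
Weights           15    30    27    20    60    27    20    30    80    45
White Wine A
Judge 1           Exc   VG    VG    VG    VG    VG    VG    VG    VG    VG
Judge 2           Exc   Exc   G     VG    G     G     VG    VG    G     G
Judge 3           Exc   Exc   Exc   VG    VG    VG    VG    Exc   G     VG
Judge 4           Exc   Exc   Exc   Exc   Exc   Exc   Exc   Exc   Exc   Exc
Judge 5           Exc   VG    VG    VG    VG    G     G     G     G     G
Majority-grade    Exc   Exc   VG    VG    VG    VG    VG    VG    G     VG
White Wine B
Judge 1           Exc   VG    VG    VG    VG    VG    VG    VG    G     G
Judge 2           Exc   Exc   G     VG    G     G     VG    VG    G     G
Judge 3           Exc   Exc   VG    VG    VG    VG    G     VG    G     G
Judge 4           Exc   Exc   VG    VG    G     VG    VG    G     VG    VG
Judge 5           Exc   Exc   VG    VG    G     G     G     G     Fair  Fair
Majority-grade    Exc   Exc   VG    VG    G     VG    VG    VG    G     G
White Wine C
Judge 1           Exc   Exc   VG    VG    VG    VG    G     VG    G     VG
Judge 2           Exc   VG    VG    VG    VG    VG    VG    G     G     G
Judge 3           Exc   Exc   VG    G     G     VG    VG    VG    G     VG
Judge 4           Exc   Exc   VG    G     G     VG    G     G     VG    VG
Judge 5           Exc   VG    VG    VG    VG    VG    VG    VG    G     VG
Majority-grade    Exc   Exc   VG    VG    VG    VG    VG    VG    G     VG
NQ (60,180,60), TG (27,54,54), TI (20,60,20), TP (60,60,30), TQ (80,80,240), HO (45,90,90)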
Note that the weighted majority-grade of a criterion is equal to its majority-grade because grades assigned to a criterion are only replicated. The total count of grades for A is (531, 718, 521), the sum of the separate counts attached to the criteria. So A's multicriteria majority-gauge is (531, Very Good+, 521). Wine B is also assigned a grade of Fair, so its count of grades contains four numbers, (195, 714, 736, 125) (the 125 is the number of Fairs it received), giving a multicriteria majority-gauge of (195, Very Good−, 861); the result for wine C is a total count of (165, 980, 625), so a multicriteria majority-gauge of (165, Very Good−, 625). Thus the majority-ranking is A ≻ C ≻ B.
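The weighted counts can be recomputed by replicating each grade as many times as its criterion's weight. This Python sketch converts the points of table 21.2 into grades with the scales of table 21.1a. The HO points for Good (9) and Fair (8) are not printed in table 21.1a and are our inference; Insufficient never occurs for these wines. The sketch reports counts from the best grade down, followed by the multicriteria majority-gauge (p, majority-grade, q).

```python
from collections import Counter

GRADES = ["Insufficient", "Fair", "Good", "Very Good", "Excellent"]  # worst to best
CRITERIA = ["VL", "VA", "NG", "NI", "NQ", "TG", "TI", "TP", "TQ", "HO"]
WEIGHTS = dict(zip(CRITERIA, [15, 30, 27, 20, 60, 27, 20, 30, 80, 45]))

# Points for Excellent, Very Good, Good, Fair in each criterion (table 21.1a).
# ASSUMPTION: HO's 9 (Good) and 8 (Fair) are inferred, not read from the table.
SCALES = {
    "VL": [5, 4, 3, 2], "VA": [10, 8, 6, 4], "NG": [8, 7, 6, 4], "NI": [6, 5, 4, 3],
    "NQ": [16, 14, 12, 10], "TG": [8, 7, 6, 4], "TI": [6, 5, 4, 3], "TP": [8, 7, 6, 5],
    "TQ": [22, 19, 16, 13], "HO": [11, 10, 9, 8],
}

# Points from table 21.2: one row per judge, criteria in the order of CRITERIA.
POINTS = {
    "A": [[5, 8, 7, 5, 14, 7, 5, 7, 19, 10], [5, 10, 6, 5, 12, 6, 5, 7, 16, 9],
          [5, 10, 8, 5, 14, 7, 5, 8, 16, 10], [5, 10, 8, 6, 16, 8, 6, 8, 22, 11],
          [5, 8, 7, 5, 14, 6, 4, 6, 16, 9]],
    "B": [[5, 8, 7, 5, 14, 7, 5, 7, 16, 9], [5, 10, 6, 5, 12, 6, 5, 7, 16, 9],
          [5, 10, 7, 5, 14, 7, 4, 7, 16, 9], [5, 10, 7, 5, 12, 7, 5, 6, 19, 10],
          [5, 10, 7, 5, 12, 6, 4, 6, 13, 8]],
    "C": [[5, 10, 7, 5, 14, 7, 4, 7, 16, 10], [5, 8, 7, 5, 14, 7, 5, 6, 16, 9],
          [5, 10, 7, 4, 12, 7, 5, 7, 16, 10], [5, 10, 7, 4, 12, 7, 4, 6, 19, 10],
          [5, 8, 7, 5, 14, 7, 5, 7, 16, 10]],
}


def weighted_counts(rows):
    """Total weight received by each grade (index into GRADES) over all judges and criteria."""
    counts = Counter()
    for row in rows:
        for criterion, pts in zip(CRITERIA, row):
            grade = len(GRADES) - 1 - SCALES[criterion].index(pts)
            counts[grade] += WEIGHTS[criterion]
    return counts


def majority_gauge(counts):
    """(p, majority-grade with +/- sign, q), using the lower middlemost convention."""
    total = sum(counts.values())
    above = 0
    for grade in sorted(counts, reverse=True):  # best grade first
        if 2 * (above + counts[grade]) > total:
            below = total - above - counts[grade]
            sign = "+" if above > below else "-" if above < below else ""
            return above, GRADES[grade] + sign, below
        above += counts[grade]
    raise ValueError("no grades to aggregate")


for wine, rows in POINTS.items():
    counts = weighted_counts(rows)
    print(wine, [counts[g] for g in sorted(counts, reverse=True)], majority_gauge(counts))
# A [531, 718, 521] (531, 'Very Good+', 521)
# B [195, 714, 736, 125] (195, 'Very Good-', 861)
# C [165, 980, 625] (165, 'Very Good-', 625)
```

The sketch's count of Very Good grades for A is 718. That is the value the other two totals require: 531 + 718 + 521 = 1,770, the same total weight (5 judges × 354) as for B and C.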
Given weights that determine the relative importance of the criteria, judge-based and criterion-based majority judgment procedures are immediately defined.
Judge-based procedure Find each judge's weighted majority-grade for competitor A, then calculate A's majority-value over all judges (and in case of ties, find each judge's weighted second majority-grade for competitor A, then calculate A's second majority-value over all judges, . . .).
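The first step of the judge-based procedure, finding each judge's weighted majority-grade, is sketched below in Python for wine A only. The sketch assumes that a judge's weighted majority-grade is the lower middlemost grade when each of the judge's grades is replicated as many times as its criterion's weight. It uses the grades and weights of table 21.5. Tie-breaking by second majority-values is not shown.

```python
GRADES = ["Fair", "Good", "Very Good", "Excellent"]  # worst to best; Insufficient omitted
WEIGHTS = [15, 30, 27, 20, 60, 27, 20, 30, 80, 45]  # VL, VA, NG, NI, NQ, TG, TI, TP, TQ, HO

E, V, G = "Excellent", "Very Good", "Good"
# Wine A (table 21.5): one row per judge, grades in the criterion order of WEIGHTS.
WINE_A = [
    [E, V, V, V, V, V, V, V, V, V],
    [E, E, G, V, G, G, V, V, G, G],
    [E, E, E, V, V, V, V, E, G, V],
    [E, E, E, E, E, E, E, E, E, E],
    [E, V, V, V, V, G, G, G, G, G],
]


def weighted_majority_grade(grades, weights):
    """Lower middlemost grade when each grade is replicated `weight` times."""
    total = sum(weights)
    ranked = sorted(zip(grades, weights), key=lambda gw: GRADES.index(gw[0]), reverse=True)
    cumulative = 0
    for grade, weight in ranked:  # best grade first
        cumulative += weight
        if 2 * cumulative > total:
            return grade
    raise ValueError("no grades to aggregate")


def majority_grade(grades):
    """Unweighted lower middlemost grade over the judges."""
    ranks = sorted(GRADES.index(g) for g in grades)
    return GRADES[ranks[(len(ranks) - 1) // 2]]


judge_grades = [weighted_majority_grade(row, WEIGHTS) for row in WINE_A]
print(judge_grades)                  # ['Very Good', 'Good', 'Very Good', 'Excellent', 'Good']
print(majority_grade(judge_grades))  # Very Good
```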
22
A Summing Up
It is hard necessity and not speculation or a desire for novelty which forces us to change
the old classical view . . . Changes of view are continually forced upon us by our attempts
to understand reality. But it always remains for the future to decide whether or not a
better solution of our difficulties could have been found.
Albert Einstein and Leopold Infeld
Paradoxes and impossibility theorems have dominated the theory and analysis of social choice and voting from Condorcet's to Arrow's, and on to all the many others that continue to be found down to the present day. Paradoxes and anomalies (most notably and most importantly, Condorcet's and Arrow's) have plagued the reality and practice of voting and judging across the years. Today the world shows signs of a growing awareness that perhaps the mechanisms used to elect and to rank (pure inventions of the human mind) are not electing the candidates the voters want nor designating the order of finish the juries want (e.g., Poundstone 2008).
To model problems of the real world in the social sciences is no different from doing so in any other science. Einstein and Infeld's description of the endeavor is wonderful and cannot be bettered. It is at once deep in its perspective, light in its spirit, and full of delightfully telling analogies, so we hazard to use their words to bring this book to an end.
They liken the role of the research scientist to that of the detective who, after gathering the scientific facts, finds the right solution by pure thinking. They then retract in one particular: "The detective must look for letters, fingerprints, bullets, guns, but at least he knows that a murder has been committed . . . For the detective the crime is given, the problem formulated: who killed Cock Robin? The scientist must, at least in part, commit his own crime, as well as carry out the investigation. Moreover, his task is not to explain just one case, but all phenomena which have happened or may still happen" (1938, 78).
Their retraction may be (or may have been) correct for physics, but in the theory of voting or of social choice, not only has the murder been committed, it was committed centuries ago. Yet, despite more and more clues, its solution has remained an enigma. For, if Condorcet's and Arrow's paradoxes are to be avoided, then which voter or judge gave what grade must be forgotten; only the grades assigned to the competitors are relevant; and the scale of grades must be absolute for each individual judge or voter. And if the results are to be meaningful, then the scale of grades must be common to all judges or voters. These statements are proven (theorem 9.2 and Arrow's theorems 11.6a and 11.6b) in the context of the new model, but the new model encompasses the old when a voter's or judge's input messages to the traditional model (rank-orders) are determined by their input grades, namely, "a higher grade means ranks higher, equal grades means ranks equally." In fact, all methods based on the traditional model are meaningless. Hence, the only possible meaningful methods of election must be based on a new model. Who committed the murder is thus perfectly clear: the traditional model's basic paradigm that judges or voters have in mind and give as inputs rank-orders of the competitors. And yet, the detectives in the matter have steadfastly accepted that paradigm.
That concerns, however, only half of the problem.
What is, after all, the job of the theory and practice of social choice, of aggregating the opinions of voters in elections and aggregating the evaluations of judges in competitions? It is to gather, as precisely as possible, the true opinions and evaluations of individuals, and to determine, as precisely as possible, the true aggregate wills of electorates and juries. Mere rank-order inputs falsify the true opinions of judges and voters. This has been recognized implicitly by the practitioners in skating, piano, wine, and other competitions (including ranking students in schools and universities), who have increasingly used grades. It has largely been ignored in elections. Two voters who place a candidate first (or second, or anywhere in their lists) may in fact evaluate the candidates completely differently (as ample evidence given in this book has shown). But with the exception of Australia and Ireland, very few nations even ask voters to input as much information as a rank-order. The inputs to the most used systems (notably, first-past-the-post and two-past-the-post) are single candidates, which convey very little information concerning the opinions of voters. When the inputs are approvals and disapprovals, or 0s and 1s, only a pinch of extra information is elicited.
Common languages of grades are commonplace: they exist in myriad applications, though not heretofore in voting. Wines are a perfect example. "While some would suggest that scoring [grading] is not well suited to a beverage that has been romantically extolled for centuries, wine is no different from any other consumer product. There are specific standards of quality that full-time wine professionals recognize, and there are benchmark wines against which all others can be judged" (Parker 2002, 3, our emphasis). This affirms the existence of evaluations that have developed over time and have become absolute evaluations. This ideal is true in many other competitions as well, and we believe it can also be realized in voting.
The evidence of the Orsay experiment shows that a common language exists (or may be created) for voting. Ideally, voters know exactly what each grade of the language means: two voters who assign a Good to a candidate have identical meanings in mind. But that, of course, cannot be true in practice. It is not true of any language: when one person announces to another that her dress is blue, the other has only an approximate idea of what that means, for it may be a pale blue or a deep blue or a blue-green (a "grue"), and so on. In fact, the words used to describe colors in different cultures have been the subject of intense study and debate. From the World Color Survey's study of 110 unwritten languages, Berlin and Kay (1969) formulated these hypotheses: "(1) the existence of universal constraints on cross-language color naming, and (2) the existence of a partially fixed evolutionary progression according to which languages gain color terms over time" (Kuehni 2007, 151). The paths of the development of new words for finer distinctions among colors differ, but the six colors identified as the fundamental colors by Ewald Hering in the nineteenth century (black, white, yellow, blue, red, green) occur more frequently than others, though not consistently. There is no question that within a culture, the words for colors carry the same meanings. These studies suggest that words for colors carry universal meanings. There is a linear spectrum of an infinity of colors. Yet people have used a scale of the same six words having the same meanings to differentiate among them. Why should the existence of a scale of grades to evaluate merit having the same meaning in one or another competition be rejected?
It is incontestable that inputs given in the six-word language of grades Excellent, Very Good, Good, Acceptable, Poor, and To Reject convey much more precise common meanings within a culture than the input of a rank-order, the name
of one candidate, or the names of several candidates. Common sense, the observation of practice (in skating, diving, wines), and the fact that human beings
can and do communicate even very sophisticated ideas with the words of a language, together suggest that the most reasonable hypothesis is that a population
that shares a common culture and language does understand words in essentially
the same way. In any case their definitions are clearly given in dictionaries.
Indeed, why, a priori, should such understandings be denied across cultures
rather than be universal? Reasonably accurate translations from one language
to another may be found in dictionaries. Miller's analysis concludes that the capacity of men and women to make absolute judgments on a unidimensional scale is limited to some 7 ± 2 levels. At its end he reminds readers, "What about the seven wonders of the world, the seven seas, the seven deadly sins, the seven daughters of Atlas in the Pleiades, the seven ages of man, the seven levels of hell, the seven primary colors, the seven notes of the musical scale, and the seven days of the week?" (1956, 97). We believe that there is an optimal number for each application: there is a right way of defining a common language of grades for every application. The choice is not arbitrary. Six or seven seems to be a particularly apt number for human cognition in a scale of merit.
The majority judgment was used officially by the Nieman Foundation of Harvard University to choose the recipient of the Louis Lyons Award for Conscience and Integrity in Journalism in November 2009. The jury was composed of nineteen Nieman Fellows. Five journalists (or groups of journalists) were the nominees. All of them were very highly considered. As a consequence, the Nieman Fellows chose the following common language of seven grades: Absolutely Outstanding, Outstanding, Excellent, Very Strong, Strong, Commendable, and Neutral. The winner's majority-grade was Absolutely Outstanding, two nominees' majority-grades were Outstanding, and two were Excellent. Five of the nineteen judges gave their highest grade to more than one nominee; three gave no Absolutely Outstanding; Outstanding was the lowest grade assigned by five; and exactly one judge gave different grades to all candidates (so only one rank-ordered them). This confirms the qualitative behavior of the voters in the Orsay experiment. Once again, the traditional inputs are inadequate; they do not model reality.
Common languages of grades used by professionals to judge competitions
among sportsmen or products can be much richer, so long as they are well
defined and well understood. It is important to realize that while the mathematical model developed in this book sometimes assumes an infinite language,
all of the properties carry over to finite languages. Technically, an infinite language is sometimes, but not always, necessary to be able to give complete
characterizations.
Imagining an ideal common language of grades with which every voter can perfectly express her opinions is like imagining a frictionless world in physics. "We have seen that this law of inertia¹ cannot be derived directly from experiment, but only by speculative thinking consistent with observation. The idealized experiment can never be actually performed, although it leads to a profound understanding of real experiments" (Einstein and Infeld 1938, 89).
Common languages of grades are observed in practice. In voting, the 2007 Orsay experiment (as well as other experiments) shows that, given options with which to express themselves, voters with a shared cultural background use those options in the same way. When sets of several grades are aggregated over many candidates to eliminate individual voters' political biases, the distributions of grades obtained are close to one another. This is true of the grades in the Orsay voting experiment, as it is of the grades used in the Citadelles du Vin competition. Mathematically, that seems to be the best one can do to show that the language is common. In practice, every audience to whom the distributions of the grades of the candidates in the Orsay experiment were presented anonymously (that is, with no names attached to the distributions) was able to identify the four major ones (Sarkozy, Royal, Bayrou, and Le Pen), suggesting that the grades are meaningful: they make sense, they do constitute a common language.
1. "Every body perseveres in its state of rest, or of uniform motion in a right line, unless it is compelled to change that state by forces impressed thereon" (Einstein and Infeld 1938, 8).
It has been contended by some that the majority judgment ballot is too complex, that it asks too much of voters. We do not believe so. On the contrary, the inputs are arguably the simplest for voters. First, because the ballot has greater cognitive simplicity: it is very natural to evaluate the merits of candidates in a scale defined by words. Second, because every electoral experiment to date has shown that voters were able to fill out the ballots quickly, and moreover appreciated being able to better express their opinions. It is clear that for voters it is easier to evaluate candidates in a natural scale of six grades than to rank-order candidates, and that it permits them to express their opinions more accurately. Some contend that the first- and two-past-the-post methods are by far the simplest for voters to understand. We do not believe so. Although it is easy enough to name one candidate, the subtleties of Arrow's and Condorcet's paradoxes engendered by these methods are not well understood by voters. In addition, voters are frustrated because they are unable to express their opinions concerning all the candidates. Instead they are faced with a stark strategic choice that is more difficult to resolve than it is to give one's honest evaluations of the several candidates.
Given a common language of grades for voters or judges to declare inputs, how are the inputs to be aggregated? "Physical concepts are free creations of the human mind, and are not, however it may seem, uniquely determined by the external world. In our endeavor to understand reality we are somewhat like a man trying to understand the mechanism of a closed watch. He sees the face and the moving hands, even hears its ticking, but he has no way of opening the case. If he is ingenious he may form some picture of a mechanism which could be responsible for all the things he observes, but he may never be quite sure his picture is the only one which could explain his observations. He will never be able to compare his picture with the real mechanism and he cannot even imagine the possibility or the meaning of such a comparison. But he certainly believes that, as his knowledge increases, his picture of reality will become simpler and simpler and will explain a wider and wider range of his sensuous impressions. He may also believe in the existence of the ideal limit of knowledge and that it is approached by the human mind. He may call this limit the objective truth" (Einstein and Infeld 1938, 33).
Concepts to determine the properties of good mechanisms for aggregating opinions are also free creations of the human mind. They are determined by common sense, ethics, realities of human behavior, the limits of meaningful measurement, and the qualitative properties of actual results. But, as in physics, "to draw quantitative conclusions we must use the language of mathematics. Most of the fundamental ideas of science are essentially simple, and may, as a rule, be expressed in language comprehensible to everyone. To follow up these ideas demands the knowledge of a highly refined technique of investigation. Mathematics as a tool of reasoning is necessary if we wish to draw conclusions which may be compared with experiment" (Einstein and Infeld 1938, 29).
Ethics imposes that a mechanism must be neutral and anonymous (or impartial) and obey the will of majorities. Common sense imposes that a mechanism
must satisfy unanimity. These are the essential principles, the rock-bottom
necessities. Four major arguments single out the majority judgment as the one
method to be used:
Voters and judges may manipulate by sending input messages not in keeping with their true opinions to try to tilt the outcomes to their advantage. This imposes the need for mechanisms that doom manipulation to failure, or, if that ideal is impossible, that best resist the attempt to manipulate. This idea has different formulations: strategy-proof-in-grading, strategy-proof-in-ranking (unattainable by any mechanism), partially strategy-proof-in-ranking, and the manipulability of mechanisms. Yet the mathematics shows that each, together with the essential principles, singles out the majority judgment.
Human Behavior
only by who wins), not their true evaluations; indeed, their inputs may be quite different from the true expression of their opinions, clearly a bad consequence. On the other hand, the majority judgment elects a Condorcet-winner in a strong equilibrium as well, but with his true majority-grade; moreover, every candidate receives a majority of true grades. But no method elects a Condorcet-winner with the voters' true opinions unless it heeds who gave what grade (theorems 20.6 and 20.7), in which case Arrow's and Condorcet's paradoxes rear their ugly heads.²
In the context of a game closer to reality, voters' utilities depend on pairs consisting of the winner and his final grade. The majority judgment elects the Condorcet-judgment-winner with his honest majority-grade.
Meaningfulness
A claim measured by some representation of a scale of
evaluation is meaningful if the same claim remains true when measured by any other representation of that scale. A common language of grades is an ordinal scale, so
language-consistency and order-consistency (together with the basic ethical and
commonsense properties) single out the majority judgment as the one possible
method; and when there is no common language, Arrow's theorem shows there
can be no preference-consistent method, no meaningful method.
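Meaningfulness can also be checked numerically. This is a minimal sketch, and its assumptions are the editor's rather than the text's:

- Grades are integer-coded, with higher meaning better.
- `relabel` is a hypothetical strictly increasing re-coding, `2 ** g`, that represents the same ordinal scale.

Under the re-coding, the winner by majority-grade stays the same, while the winner by average grade flips. A claim such as "B has the higher average" is therefore not meaningful on an ordinal scale.

```python
from statistics import mean


def majority_grade(grades):
    """Lower median of the grades."""
    ordered = sorted(grades)
    return ordered[(len(ordered) - 1) // 2]


def identity(g):
    return g


def relabel(g):
    """Hypothetical strictly increasing re-coding of the same ordinal grades."""
    return 2 ** g


def winner(profile, aggregate, coding):
    """Candidate with the highest aggregate score under a given numeric coding."""
    scores = {c: aggregate([coding(g) for g in gs]) for c, gs in profile.items()}
    return max(scores, key=scores.get)


# Hypothetical grades from three judges.
profile = {"A": [4, 4, 0], "B": [3, 3, 3]}

if __name__ == "__main__":
    for name, aggregate in (("majority-grade", majority_grade), ("mean", mean)):
        print(name, winner(profile, aggregate, identity), winner(profile, aggregate, relabel))
```

The majority-grade does not change under the re-coding because the lower median of re-coded grades is the re-coded lower median. No such property holds for the average.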
Practice
The 2007 Orsay field experiment constitutes an exceptionally interesting database for the study of different voting methods. It shows the existence
of a statistical left-right spectrum. It shows that when a centrist candidate may
be clearly identified, the well-known methods of voting may be ordered according to their biases against or for the centrist candidate. The majority judgment
and Condorcet's method are least biased for or against the centrist; Borda's method, point-summing, and approval judgment (approval meaning Good or better) are very
biased for the centrist; first- and two-past-the-post, and approval judgment (approval
meaning either Very Good or better, or Excellent) are very biased against the
centrist. The data also permit the findings concerning manipulability to be confirmed experimentally. We believe that these conclusions are robust (though
more experimentation should be pursued) and that they could not have been
obtained with laboratory experiments, in which it is impossible to capture the rich
complexity of the motivations of real voters.
2. Thus when Condorcet-consistency is claimed for a method, beware! The statement may not
mean what you think it means.
"Fundamental ideas play the most essential role in forming a physical theory.
Books on physics are full of complicated mathematical formulae. But thought
and ideas, not formulae, are the beginning of every physical theory. The ideas
must later take the mathematical form of a quantitative theory, to make possible
the comparison with experiment" (Einstein and Infeld 1938, 291). So, too,
in the social sciences: substitute the words "social choice" for the words
"physics" and "physical" and the statements maintain all their meanings.
There is, in essence, one new idea, one change in view: a simple model of how
voters and judges express their opinions by evaluating the merits of candidates
or competitors in a common language of grades rather than comparing them.
Once the model is in hand, the rest follows from the blueprint provided by the
classical theory of social choice.
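The grading model turns into a ranking in a few lines. The sketch below makes these assumptions:

- Grades are integer-coded, with higher meaning better.
- Every candidate is graded by the same number of judges.
- A tie between equal majority-grades is broken lexicographically on a sequence: take the lower median, remove one copy of it, and repeat. This is a reading of the majority-value discussed in chapter 13.

It is an illustrative sketch, not the authors' code. `majority_value` and `majority_ranking` are hypothetical names.

```python
def majority_value(grades):
    """Successive majority-grades: take the lower median, remove one copy
    of it, and repeat until no grades remain."""
    remaining = sorted(grades)
    value = []
    while remaining:
        value.append(remaining.pop((len(remaining) - 1) // 2))
    return tuple(value)


def majority_ranking(profile):
    """Order candidates best to worst by comparing majority-values
    lexicographically (Python tuples compare element by element)."""
    return sorted(profile, key=lambda c: majority_value(profile[c]), reverse=True)


# Hypothetical grades from five judges (0 = lowest, 5 = highest).
profile = {
    "A": [5, 4, 3, 2, 1],
    "B": [3, 3, 3, 0, 0],
    "C": [4, 4, 2, 2, 2],
}

if __name__ == "__main__":
    for c in majority_ranking(profile):
        print(c, majority_value(profile[c]))
```

A and B share the majority-grade 3, and the next element of their majority-values (2 against 0) puts A first. C, with majority-grade 2, comes last. Averaging the same numbers would instead put C (2.8) ahead of B (1.8).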
Einstein and Infeld recall a fundamental truth:
Science is not and will never be a closed book. Every important advance brings new
questions. Every development reveals, in the long run, new and deeper difficulties.
(1938, 308)
There is no doubt in our minds, however, that the change in view immensely
improves the representation of reality, permits deeper understanding and analysis, and so leads to a vastly better mechanism for juries and electorates to rank
and to elect.
References
Balinski, M. L., and F. Pukelsheim. 2006. Matrices and politics. In Festschrift for Tarmo Pukkila,
ed. E. P. Liski, J. Isotalo, J. Niemelä, S. Puntanen, and G. P. H. Styan, 233–242. Tampere, Finland:
University of Tampere.
Balinski, M. L., and S. T. Rachev. 1997. Rounding proportions: Methods of rounding. The
Mathematical Scientist 22: 1–26.
Balinski, M. L., and V. Ramirez. 1999. Mexico's 1997 apportionment defies its electoral law.
Electoral Studies 18: 117–124.
Balinski, M. L., and H. P. Young. 1982. Fair representation: Meeting the ideal of one-man, one-vote. New Haven, Conn.: Yale University Press. 2d ed. Washington, D.C.: Brookings Institution
Press, 2001.
Barberà, S., and M. Jackson. 1994. A characterization of strategy-proof social choice functions for
economies with pure public goods. Social Choice and Welfare 11: 241–252.
Barberà, S., H. Sonnenschein, and L. Zhou. 1991. Voting by committees. Econometrica 59:
595–609.
Bartoszynski, R. 1972. Power structure in dichotomous voting. Econometrica 40: 1003–1019.
Basset, G. W., and J. Persky. 1999. Robust voting. Public Choice 99: 299–310.
Baujard, A., and H. Igersheim. 2007. Expérimentation du vote par note et du vote par approbation
lors de l'élection présidentielle française du 22 avril 2007 (rapport final). Paris: Centre d'analyse
stratégique.
Berlin, B., and P. Kay. 1969. Basic color terms: Their universality and evolution. Berkeley:
University of California Press.
Black, D. 1958. The theory of committees and elections. Cambridge: Cambridge University Press.
Blackorby, C., W. Bossert, and D. Donaldson. 2005. Population issues in social choice theory,
welfare economics, and ethics. Cambridge: Cambridge University Press.
Blackorby, C., D. Donaldson, and J. A. Weymark. 1984. Social choice with interpersonal utility
comparisons: A diagrammatic introduction. International Economic Review 25: 327–356.
Black's Law Dictionary. 2001. 2d pocket ed. St. Paul, Minn.: West Group.
Blouin, J. 2008. Citadelles du vin: une dégustation plus près du consommateur. Commentary on
2008 competition. Private communication.
Bogomolnaia, A., H. Moulin, and R. Strong. 2005. Collective choice under dichotomous preferences. Journal of Economic Theory 122: 165–184.
Borda, J.-C., Le Chevalier de. 1784. Mémoire sur les élections au scrutin. In Histoire de l'Académie
royale des sciences: Année 1781, 657–665. A footnote states that the ideas were presented before
the Academy on June 16, 1770.
Bossert, W., and J. A. Weymark. 2004. Utility in social choice. In Handbook of utility theory, vol.
2: Extensions, ed. S. Barberà, P. J. Hammond, and C. Seidl, 1099–1177. Boston: Kluwer.
Brams, S. J., and P. C. Fishburn. 1978. Approval voting. American Political Science Review 72:
831–847.
Brams, S. J., and P. C. Fishburn. 1983. Approval voting. Boston: Birkhäuser.
Brams, S. J., and P. C. Fishburn. 2001. A nail-biting election. Social Choice and Welfare 18: 409–414.
Brams, S. J., and M. R. Sanver. 2006. Critical strategies under approval voting: Who gets ruled in
and ruled out. Electoral Studies 25: 287–305.
Carroll, L. 1871. Through the looking glass. London: Macmillan.
Carroll, L. 1916. Alice's adventures in wonderland. New York: Sam'l Gabriel Sons. Originally
published London: Macmillan, 1865.
Chernoff, H. 1954. Rational selection of decision functions. Econometrica 22: 422–443.
Colegrove v. Green. 1946. 328 U.S. 549.
Condorcet, Le Marquis de. 1785. Essai sur l'application de l'analyse à la probabilité des décisions
rendues à la pluralité des voix. Paris: l'Imprimerie royale.
Condorcet, Le Marquis de. 1789. Sur la forme des élections. Pamphlet, sec. 12, 25–26. Also in Oeuvres de Condorcet
(1847), ed. A. Condorcet O'Connor and M. F. Arago. Paris: Firmin Didot frères.
Conseil constitutionnel. 1986. Decision no. 86-208 DC. July 2.
Copeland, A. H. 1951. A reasonable social welfare function. Seminar on Applications of
Mathematics to the Social Sciences. University of Michigan, Ann Arbor.
Court of Arbitration for Sport. 2004. Arbitral award. Lausanne. October 21.
Dantzig, G. B. 1963. Linear programming and extensions. Princeton, N.J.: Princeton University
Press.
Darwin, C. 1958. The autobiography of Charles Darwin, 1809–1882, with the original omissions
restored. Edited and with appendix and notes by his granddaughter Nora Barlow. London: Collins.
Paperback ed. New York: W. W. Norton, 1969.
Dasgupta, P., and E. Maskin. 2004. The fairest vote of all. Scientific American 290 (March): 92–97.
Dasgupta, P., and E. Maskin. 2008. On the robustness of majority rule. Journal of the European Economic Association
6: 949–973.
d'Aspremont, C., and L. Gevers. 1977. Equity and the informational basis of collective choice.
Review of Economic Studies 44: 199–209.
De Sinopoli, F., B. Dutta, and J.-F. Laslier. 2006. Approval voting: three examples. International
Journal of Game Theory 35: 27–38.
Debuigne, G. 1970. Larousse des vins. Paris: Librairie Larousse.
Déclaration des droits de l'homme et du citoyen. 1789.
Décret. 1813. Projet de décret relatif à l'organisation particulière du commerce des vins à Paris.
December 14. <http://www.napoleonica.org/gerando/GER03194.html>.
Dodgson, C. L. 1873. A discussion of the various methods of procedure in conducting elections.
In The theory of committees and elections, by D. Black. Cambridge: Cambridge University Press,
1958.
Dodgson, C. L. 1874. Suggestions as to the best method of taking votes, where more than two issues are
to be voted on. In The theory of committees and elections, by D. Black. Cambridge: Cambridge
University Press, 1958.
Dodgson, C. L. 1876. A method of taking votes on more than two issues. In The theory of committees and
elections, by D. Black. Cambridge: Cambridge University Press, 1958.
Dodgson, C. L. 1884. The principles of parliamentary representation. London: Harrison and Sons.
Supplement. Oxford: E. Baxter, 1885.
Downs, A. 1957. An economic theory of democracy. New York: Harper.
Dubey, P., and J. Geanakoplos. 2006. Grading in games of status: Marking exams and setting wages.
Cowles Foundation Discussion Paper No. 1544R, January.
Dvoretzky, A., J. Kiefer, and J. Wolfowitz. 1956. Asymptotic minimax character of the sample
distribution function and of the classical multinomial estimator. Annals of Mathematical Statistics
27: 642–669.
Economist Intelligence Unit (EIU). 2005. The Economist Intelligence Unit's quality-of-life index.
London: EIU.
Einstein, A., and L. Infeld. 1938. The evolution of physics. Cambridge: Cambridge University Press.
Elman, B. A. 2000. A cultural history of civil examinations in Late Imperial China. Berkeley:
University of California Press.
Emerson, J. 2006. Analyses. <http://www.stat.yale.edu/~jay/>.
Enelow, J. M., and M. J. Hinich. 1984. The spatial theory of voting: An introduction. Cambridge:
Cambridge University Press.
Holcombe, R. G., and L. W. Kenny. 2007. Evidence on voter preferences from unrestricted choice
referendums. Public Choice 131: 197–215.
Hotelling, H. 1929. Stability in competition. Economic Journal 39: 41–57.
Inada, K. 1969. The simple majority decision rule. Econometrica 37: 490–506.
International Skating Union (ISU). 1998. New ISU figure skating results system. Communication
No. 997. October 30.
International Skating Union (ISU). 2002. Calculations for Olympic winter games pairs skating.
<http://www.icecalc.com/events/owg2002/results/>.
International Skating Union (ISU). 2004. Special regulations: Single and pair skating, as accepted by the 50th ordinary
congress, June. <http://www.isu.org/>.
International Wine and Spirit Competition (IWSC). 2006. IWSC promotional leaflet.
<http://www.iwsc.net/>.
Jennings, A. 2009. Weakly monotonic aggregation functions. Working paper. Mathematics
Department, Arizona State University, Tempe. September 4.
Julia, D. 1990. Gaspard Monge, examinateur. Histoire de l'éducation, no. 46, 111–133.
Kemeny, J. 1959. Mathematics without numbers. Daedalus 88: 571–591.
Kemeny, J. 1962. Preference rankings: An axiomatic approach. In Mathematical models in the social
sciences, ed. J. G. Kemeny and J. L. Snell, 9–23. Boston: Ginn and Co.
Kim, S.-R. 1990. On the possible scientific laws. Mathematical Social Sciences 20: 19–36.
Kintsch, W., and J. T. Cacioppo. 1994. Introduction to the 100th anniversary issue of the Psychological Review. Psychological Review 101: 195–199.
Kirkpatrick v. Preisler. 1969. 394 U.S. 526.
Koc, E. W. 1988. An experimental examination of approval voting under alternative ballot
conditions. Polity 20: 688–704.
Krantz, D. H., R. D. Luce, P. Suppes, and A. Tversky. 1971. Foundations of measurement. Vol. 1.
New York: Academic Press.
Kuehni, R. G. 2007. Nature and culture: An analysis of individual focal color choices in World
Color Survey languages. Journal of Cognition and Culture 7: 151–172.
Kuhn, T. S. 1961. The function of measurement in modern physical science. Isis 52: 161–190.
Reprinted in The Essential Tension, 178–224. Chicago: University of Chicago Press, 1977.
Kuhn, T. S. 1970. The structure of scientific revolutions. 2d ed. enlarged. Chicago: University of
Chicago Press. Originally published in 1962.
Kurrild-Klitgaard, P. 1999. An empirical example of the Condorcet paradox of voting in a large
electorate. Public Choice 107: 1231–1244.
Laplace, P.-S., Marquis de. 1820. Théorie analytique des probabilités. 3d ed. Paris: Courcier.
Originally published 1812.
Laraki, R. 2009. Coalitional equilibria of strategic games. Cahier 2009-42. Laboratoire
d'économétrie, École Polytechnique. October.
Laslier, J.-F. 2009. The leader rule: A model of strategic approval voting in a large electorate.
Journal of Theoretical Politics 21: 113–136.
Laslier, J.-F., and K. van der Straeten. 2004. Vote par assentiment pendant la présidentielle 2002:
Analyse d'une expérience. Revue Française de Science Politique 54: 99–130.
Le vote de valeur: Pour renforcer l'acte démocratique. 2007. <http://www.votedevaleur.info/co/pres.html>.
London, J., and I. McLean. 1990. The Borda and Condorcet principles: Three medieval applications.
Social Choice and Welfare 7: 99–108.
Loosemore, S. 1997. If it ain't broke, don't fix it: An analysis of the figure skating scoring system.
<http://www.frogsonice.com/skateweb/obo/score-tech.shtml>.
Markham, D. Jr. 1997. 1855, Histoire d'un classement des vins de Bordeaux. Bordeaux: Editions
Féret.
Martin, J. 2002. Aux origines de la science des examens (1920–1940). In L'examen: évaluer, sélectionner, certifier XVIe–XXe siècles, ed. B. Belhoste, 177–199. Paris: Institut national de
recherche pédagogique.
Mas-Colell, A., M. D. Whinston, and J. R. Green. 1995. Microeconomic theory. Oxford: Oxford
University Press.
Maskin, E. 1999. Nash implementation and strong Nash equilibria. Review of Economic Studies
66: 23–38.
Mason, W. 2006. Judging brief for 2006 competitions jury. <http://www.top100wines.com/main/default.asp>.
May, K. O. 1952. A set of independent necessary and sufficient conditions for simple majority
decision. Econometrica 20: 680–684.
Merrill, S. III. 1988. Making multicandidate elections more democratic. Princeton, N.J.: Princeton
University Press.
Merrill, S. III, and J. Nagel. 1987. The effect of approval balloting on strategic voting under
alternative decision rules. American Political Science Review 8: 509–524.
Miller, G. A. 1956. The magical number seven, plus or minus two: Some limits on our capacity for
processing information. Psychological Review 63: 81–97.
Mondovino. 2004. Documentary film. Dir. J. Nossiter.
Mosteller, F., and J. W. Tukey. 1977. Data analysis and regression. Reading, Mass.: Addison-Wesley.
Moulin, H. 1980. On strategy-proofness and single peakedness. Public Choice 35: 437–455.
Moulin, H. 1988. Axioms of cooperative decision making. Cambridge: Cambridge University Press.
Moulin, L. 1953. Les origines religieuses des techniques électorales et délibératives modernes.
Revue internationale d'histoire politique et constitutionnelle 3 (N.S.): 106–148.
Muller, E., and M. Satterthwaite. 1977. The equivalence of strong positive association and strategy-proofness. Journal of Economic Theory 14: 412–418.
Myerson, R. B. 1998. Population uncertainty and Poisson games. International Journal of Game
Theory 27: 375–392.
Myerson, R. B. 2000. Large Poisson games. Journal of Economic Theory 94: 7–45.
Myerson, R. B. 2002. Comparison of scoring rules in Poisson voting games. Journal of Economic Theory
103: 219–251.
Myerson, R. B., and R. J. Weber. 1993. A theory of voting equilibria. American Political Science
Review 87: 102–114.
Nagel, J. H. 2006. A strategic problem in approval voting. In Mathematics and democracy: Recent
advances in voting systems and collective choice, ed. F. Pukelsheim and B. Simeone, 133–150.
New York: Springer.
Nagel, J. H. 2007. The Burr dilemma in approval voting. Journal of Politics 69: 43–58.
Nanson, E. J. 1882. Methods of election. Transactions and Proceedings of the Royal Society of
Victoria 18: 197–240.
Narens, L., and R. D. Luce. 2008. Meaningfulness and invariance. In New Palgrave Dictionary of
Economics, 2d ed., ed. S. N. Durlauf and L. Blume. Basingstoke, U.K.: Palgrave Macmillan.
Nash, J. F. 1950. The bargaining problem. Econometrica 18: 155–162.
Neumann, J. von, and O. Morgenstern. 1944. Theory of games and economic behavior. Princeton,
N.J.: Princeton University Press.
Núñez Rodríguez, M. 2008. Questions stratégiques en théorie du vote (Strategic questions in voting
theory). Doctoral thesis, École Polytechnique, November.
Nurmi, H. 2004. Monotonicity and its cognates in the theory of social choice. Public Choice 121:
25–49.
O'Brian, Patrick. 1969. Master and commander. Philadelphia: Lippincott. Paperback ed. New York:
W. W. Norton, 1990.
Organisation Internationale de la Vigne et du Vin (OIV). 1994. Standard for international wine
competitions. <http://www.oiv.int/>.
Organisation Internationale de la Vigne et du Vin (OIV). 2009. OIV standard for international wine and spirituous beverages of vitivinicular origin
competitions. Resolution OIV/Concours 332A/2009. <http://www.oiv.int/>.
Orlov, A. 1981. The connection between mean quantities and admissible transformations.
Mathematical Notes 30: 774–778.
Parker, R. M. Jr., with P.-A. Rovani. 2002. Parker's wine buyer's guide. 6th ed. New York: Simon
and Schuster.
Pennisi, A. 2006. The Italian bug: A flawed procedure for bi-proportional seat allocation. In Mathematics and democracy: Recent advances in voting systems and collective choice, ed. F. Pukelsheim
and B. Simeone, 133–150. New York: Springer.
Pennisi, A., F. Ricca, P. Serafini, and B. Simeone. 2007. Amending and enhancing electoral laws
through mixed integer programming: The case of Italy. In Proceedings of the VIII International
Conference on Economic Modernization and Public Development, 1–10. Moscow: Higher School
of Economics. <http://conf.hse.ru/lingua/en/2007/>.
Peynaud, É., and J. Blouin. 1999. Découvrir le goût du vin. Paris: Dunod.
Peynaud, É., and J. Blouin. 2006. Le goût du vin: le grand livre de la dégustation. 4th ed. Paris: Dunod.
Pfanzagl, J. 1971. Theory of measurement. Vienna: Physica-Verlag.
Piéron, H. 1963. Examens et docimologie. Paris: PUF.
Pliny the Elder. 1855. The natural history. Trans. J. Bostock and H. T. Riley. Book 14, chs. 7 and
8. <http://www.perseus.tufts.edu/hopper/>.
Plott, C. R., J. T. Little, and R. P. Parks. 1975. Individual choice when objects have ordinal
properties. Review of Economic Studies 42: 403–413.
Poundstone, W. 2008. Gaming the vote: Why elections aren't fair. New York: Hill and Wang.
Pukelsheim, F. 2006. BAZI. <http://www.math.uni-augsburg.de/stochastik/bazi/welcome.html>.
Quandt, R. E. 2006. Measurement and inference in wine tasting. Journal of Wine Economics 1:
7–30.
RangeVoting.org. 2007. <http://rangevoting.org/>.
Répertoire de jurisprudence. I. Égalité des salaires et ranking. 2002. No. 02-687. Grenoble: C.A.,
November 13.
Richter, C. F. 1935. An instrumental earthquake magnitude scale. Bulletin of the Seismological
Society of America 25: 1–32.
Roberts, F. S. 1979. Measurement theory, with applications to decision making, utility and the
social sciences. Vol. 7 of Encyclopedia of Mathematics and Its Applications. Reading, Mass.:
Addison-Wesley.
Rudolph, F. 1977. Curriculum: A history of the American undergraduate course of study since
1636. San Francisco: Jossey-Bass.
Rudolph, F. 1991. The American college and university: A history. 2d ed. Athens, Ga.: University of
Georgia Press.
Saari, D. G. 1989. A dictionary for voting paradoxes. Journal of Economic Theory 48: 443–454.
Saari, D. G. 1992. Millions of election rankings from a single profile. Social Choice and Welfare 9:
277–306.
Saari, D. G. 2000. Mathematical structure of voting paradoxes. I and II. Economic Theory 15: 1–53,
55–102.
Saari, D. G. 2001a. Analyzing a nail-biting election. Social Choice and Welfare 18: 415–430.
Saari, D. G. 2001b. Chaotic elections! A mathematician looks at voting. Providence, R.I.: American
Mathematical Society.
Saari, D. G. 2009. Voting. <http://cema.cufe.edu.cn/admin/data/uploadfile/200907/2009071318372272.pdf>.
Saari, D. G., and J. van Newenhizen. 1988. Is approval voting an unmitigated evil? Public Choice
59: 133–147.
Satterthwaite, M. A. 1973. Strategy-proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions. Journal of Economic Theory
10: 187–217.
Scullen, S., P. Bergey, and L. Aiman-Smith. 2005. Forced distribution rating systems and the
improvement of workforce potential: A baseline simulation. Personnel Psychology 58: 1–32.
Sen, A. 1970. Collective choice and social welfare. San Francisco: Holden-Day.
Sen, A., and P. Pattanaik. 1969. Necessary and sufficient conditions for rational choice under
majority decision. Journal of Economic Theory 1: 178–202.
Sertel, M. R., and M. R. Sanver. 2004. Strong equilibrium outcomes of voting games are the
generalized Condorcet winners. Social Choice and Welfare 22: 331–347.
Simon, H. 1954. Bandwagon and underdog effects and the possibility of election predictions. Public
Opinion Quarterly 18: 245–253.
Smith, J. 1973. Aggregation of preferences with variable electorate. Econometrica 41: 1027–1041.
Stevens, S. S. 1946. On the theory of scales of measurement. Science 103: 677–680.
Tocqueville, A. de. 1967. Letter dated January 5, 1851. In Correspondance d'Alexis de Tocqueville
et de Gustave de Beaumont. Vol. 2, 355. Introduced, edited, and annotated by A. Jardin. 3 vols.
Paris: Gallimard.
United States Geological Survey. 1989. The severity of an earthquake. <http://earthquake.usgs.gov/learning/topics/mercalli.php>.
University Interscholastic League. 2006. <http://www.uil.utexas.edu/>.
Vickrey, W. 1961. Counterspeculation, auctions, and competitive sealed tenders. Journal of Finance
16: 8–37.
Vieth v. Jubelirer. 2004. 541 U.S. 267.
Vinitaly. 2006. Regolamento. <http://www.vinitaly.it/concorsoenologico/home.asp>.
Weber, R. J. 1977. Comparison of public choice systems. Cowles Foundation discussion paper 498.
Yale University, New Haven, Conn.
Webster, D. 1832. The writings and speeches of Daniel Webster. Boston: Little, Brown, 1903.
Wells v. Rockefeller. 1969. 394 U.S. 542.
Weymark, J. A. 2005. Measurement theory and the foundations of utilitarianism. Social Choice
and Welfare 25: 527–555.
Wikipedia. 2007. Grade (education). <http://en.wikipedia.org/wiki/Grade_(education)>.
Wilderness Emergency Medical Services Institute. 2008. Mankoski pain scale. <http://wemsi.org/painscale.html>.
Wood, H. O., and F. Neumann. 1931. Modified Mercalli intensity scale of 1931. Bulletin of the
Seismological Society of America 21: 277–283.
World Skating Federation, Plaintiff, v. International Skating Union and Ottavio Cinquanta,
Defendants. 2005. 03 Civ. 9800 (JES), U.S. District Court, Southern District of New York,
February 15.
Young, H. P. 1974. A note on preference aggregation. Econometrica 42: 1129–1131.
Young, H. P. 1975. Social choice scoring functions. SIAM Journal of Applied Mathematics 28: 824–838.
Young, H. P. 1986. Optimal ranking and choice from pairwise comparisons. In Information pooling and
group decision making, ed. B. Grofman and G. Owen, 113–122. Greenwich, Conn.: JAI Press.
Young, H. P. 1988. Condorcet's theory of voting. American Political Science Review 82: 1231–1244.
Young, H. P., and A. Levenglick. 1978. A consistent extension of Condorcet's election principle.
SIAM Journal of Applied Mathematics 35: 285–300.
Zahid, M. A. 2009. Majority judgement theory and paradoxical results. Working paper. Tilburg
University, Netherlands.
Zi, É. 1894. Pratique des examens littéraires en Chine. Shanghai: Imprimerie de la mission
catholique.
Zitzewitz, E. 2006. Nationalism in winter sports judging and its lessons for organizational decision
making. Journal of Economics and Management Strategy 15: 67–100.
Name Index
Chernoff, H., 60
Chirac, Jacques, 14–16, 41–42, 185
Chopin, Frederick, 137
Clausewitz, Carl von, 93
Cock, Robin, 387
Cominetti, Roberto, 282n1
Condorcet, Marie Jean Antoine Nicolas Caritat,
Marquis de, 47–48, 67, 95, 184, 387–388
Confucius, 315
Copeland, A. H., 48, 66
Crane, Hart, 375
Cusanus, Nicolaus, 47, 49–50, 184
Dantzig, George B., xiv
Darwin, Charles, 291
Dasgupta, Partha, 64–66, 291
d'Aspremont, Claude, 204
Debost, Michel, 139n7, 146n11
Debuigne, Gérard, x
de Gaulle, Charles, 36–37, 41n11
Demange, Gabrielle, 27
de Sinopoli, Francesco, 319
de Swart, H., 283n2
Dodgson, Charles L., 47, 67, 94n2, 96
Donaldson, David, 184
Downs, Anthony, 101, 340
Dubey, Pradeep, 169
Dussault, Thérèse, 138n6
Dutta, Bhaskar, 319
Dvoretzky, A., 268
Einstein, Albert, 387–394
Elman, Benjamin A., 130–131, 131n1
Emerson, John W., 146
Enelow, James M., 63
Estlund, David M., 184
Farquharson, Robin, 94n2
Farvaque, Etienne, 114
Feld, S. L., 116
Felsenthal, Dan, 286n3
Wang, Chen, xv
Wang, Kaixuan, 131n2
Wanyan, Shaoyuan, 131n2
Weber, Robert J., 315, 319–322, 351
Webster, Daniel, 194n1
Welch, Jack, 134
Weymark, John A., 184
Whinston, Michael D., 186
Wittgenstein, Ludwig, 161, 174, 252
Wolfowitz, J., 268
Wood, H. O., 162
Young, H. Peyton, xii, 21, 67–68, 70–72, 77,
296n3, 297
Zahid, Monzoor A., 283n2, 286n3
Zhou, Lin, 316n2, 329
Zi, Étienne, 131
Zitzewitz, Eric, 140
Subject Index
Strategy-proofness (cont.)
strategy-proof-in-grading, 5, 14, 189–193,
238–239
strategy-proof-in-ranking, 220–221
in traditional model, 96–97
Strictly monotonic SGF, 178
Strongly monotonic choice function
(= Maskin monotonicity), 97, 97n3
Sum-scoring methods, 72–76, 354–355.
See also Borda method
Tie breaking rules, 60, 142, 224–227, 244–248
Top-preferred majority judgment method,
106–107
Top-preferred order, 91–92, 105–107
Transitivity, 56, 62–64, 181, 231, 291, 376
Trimmed average method, 138, 145, 148, 187,
377–378
Two-past-the-post method, 37, 40–45. See also
Alternative vote
bias of, 123–127, 342–344
equilibria, 352–355, 374
manipulability of, 42–43, 55, 96
meaninglessness of (see Social choice
[traditional model])
Unanimity (= Pareto optimality), 57, 83, 178,
200, 228
Utility, xiii, 183–186, 220, 299, 374. See also
Merit; Single-peaked; Welfarism
final grade optimizers, 185, 371
honesty optimizers, 185, 371, 373
winner and grade optimizers, 359–361,
373–374
winner optimizers, 185, 319, 352, 370
Voting behavior, 13–14, 112–116, 138,
146–147, 254–266, 293, 306–312, 329–337.
See also Strategic manipulation
Voting in practice
Australia, 33–35
France, 36–45 (see also Experiments: FT,
ILC, IEP, Orsay 2002, Orsay 2007)
Italy, 31–32
Mexico, 29–32
Switzerland, 27–29
United Kingdom, 32–33, 34n7
United States, 16–18, 23–27, 290
Venice 1268 electoral system, 22–23
Weakly monotonic SGF, 178, 249–250
Welfarism, 184–186, 213–215, 294n2, 303n4
World's best sommeliers, 189