1 Nash Demand Game
Assignment
Nguyen Linh Chi
There are many situations in society that involve bargaining. If the Prisoner's Dilemma is about the inferior outcome that both parties slide into through their attempts to exploit the other's cooperation, the bargaining game is one in which society as a whole can always reach the Pareto frontier. The problem then shifts to individual welfare, because at the Pareto frontier an increase in one player's benefit necessarily comes at the expense of the other's. Once players have learned how to cooperate, the question of how to divide the surplus arises. The search for a solution reveals other interesting aspects of this game: the power relations among bargainers, the way they perceive time and money, and the way individual strategies and punishment-reward mechanisms happen to synchronise without any deliberate imposition from a higher conscious mind. These results support the general belief of the micro fields that problem solving at the individual level can have a substantial impact at the macro level by creating spontaneous harmony and rhythm. The complexity, however, ultimately remains unpredictable.
This assignment thus sketches the literature with an overview and general information, then provides some results from a simulated evolutionary model of automata playing the one-shot game.
b. The matrix
The simple version of the game is represented in the following matrix, where for simplicity the choices are restricted to three demands: a high share (say 8), a fair share (5) or a low share (necessarily 2).
         High   Medium   Low
High     0, 0   0, 0     8, 2
Medium   0, 0   5, 5     5, 2
Low      2, 8   2, 5     2, 2
Corresponding to the numerical matrix, these payoffs can also be plotted on coordinate axes as a geometrical visualisation. The figure shows the payoff space equivalent to the matrix above. All dots are feasible outcomes, and the red dots are the furthest points to the northeast that both players can reach.
The unit of the graph is monetary. These red dots are the Pareto points if the monetary graph is equivalent to the graph of utility, meaning that after transforming monetary outcomes into utility numbers, the utility graph preserves the ratios (the relative distances) among the points. One assumption sufficient for this is that both players are risk neutral.
One more remark before ending this part: these red dots are the Nash equilibria (NEs) of the one-shot game. One-shot (or stage) means that the game is played only once.
e. The minimax
The set of scattered points above can be considered the feasible payoff space of this Nash Demand game. According to that picture, both players can clearly get 0 in the worst case. But in the simultaneous framework, under best-response logic, that point will not occur: whatever the other plays (High, Medium or Low), one can always get at least 2 by playing Low. Thus 2 is called the minimax payoff. The minimax is a guaranteed payoff, and it puts the point (0, 0) out of the question.
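The minimax reasoning above can be checked mechanically. The sketch below (names and data layout are my own choices, not from the text) computes the security payoff as the max over one's own actions of the worst case over the opponent's actions:

```python
# Row player's payoffs in the simplified Nash Demand Game.
# Actions: High (demand 8), Medium (demand 5), Low (demand 2).
PAYOFFS = {
    ("High", "High"): 0, ("High", "Medium"): 0, ("High", "Low"): 8,
    ("Medium", "High"): 0, ("Medium", "Medium"): 5, ("Medium", "Low"): 5,
    ("Low", "High"): 2, ("Low", "Medium"): 2, ("Low", "Low"): 2,
}
ACTIONS = ["High", "Medium", "Low"]

def security_payoff():
    """Max over own actions of the worst case over the opponent's actions."""
    return max(min(PAYOFFS[(a, b)] for b in ACTIONS) for a in ACTIONS)

print(security_payoff())  # 2: playing Low guarantees at least 2
```

High and Medium both have a worst case of 0, while Low guarantees 2, so the security payoff is 2.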
2 Repeated Game
a. The convex hull
Things change when the one-shot game is repeated. Each player receives a stream of payoffs over a time horizon, and each payoff in that stream belongs to the feasible set of the one-shot game. As a convenient traditional practice, this stream needs to be boiled down to a single number. There is more than one way to do that, but to be sensible and simple, it is better to let the number lie within the range [0, 8] of the one-shot game.
As a consequence of repeating the game and then compressing the result, the feasible outcome set expands. Any point in the convex hull of the initial point set can be reached, provided both players agree to make that happen. Each of these points corresponds to an agreed strategy profile specifying who does what over the course of the upcoming periods.
The following picture illustrates the repeated game when the calculation keeps the relative ratios among points and the absolute values of the payoff space the same as in the stage game:
A convex set is a set satisfying the following condition: if any two points belong to the set, then the line segment connecting them also belongs to the set. The new feasible payoff set of the repeated game is convex in this sense: if two extreme points (corresponding to two pure strategies) can be reached, then any convex combination of that pair can also be reached, by letting the two players alternate the two pure strategies over time. A convex combination is a special linear combination in which the combination factors sum to 1.
+ Method 1:
The stage-game payoffs are lined up along the time axis. Each future value is discounted back to the present value of today with discount factor δ. These numbers are then compressed into a single number by summing. Depending on whether the series is finite or infinite, some calculus helps.
Example of a sequence of payoffs when the payoff in each stage game is 1:

Round          1    2    3    4    5    ...
Payoff         1    1    1    1    1    ...
Present value  δ⁰   δ¹   δ²   δ³   δ⁴   ...

=> Sum of the sequence (present value of the payoff stream) = 1 + δ + δ² + ... = 1/(1 − δ)
=> Normalisation: [1/(1 − δ)] · (1 − δ) = 1
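Method 1 can be sketched numerically (a minimal illustration; the long finite stream stands in for the infinite one, and the function names are mine):

```python
def present_value(payoffs, delta):
    """Discounted sum: p_1 + p_2*delta + p_3*delta**2 + ..."""
    return sum(p * delta**t for t, p in enumerate(payoffs))

def normalised(payoffs, delta):
    """Multiply by (1 - delta) so a constant stream of c normalises to c."""
    return (1 - delta) * present_value(payoffs, delta)

delta = 0.9
stream = [1] * 1000  # long constant stream approximating the infinite one
print(round(present_value(stream, delta), 3))  # close to 1/(1 - delta) = 10.0
print(round(normalised(stream, delta), 3))     # close to 1.0
```

The normalisation puts the compressed number back on the [0, 8] scale of the stage game, as the text requires.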
Time serves here as an idiosyncratic feature of the human mind. The discount factor makes the value of a future that happens tomorrow matter more than the value of a future too far away to comprehend. The notion of time hence becomes functional, an instrumental and handy way to perceive what happens; time itself does not have an intrinsic meaning. This method will be used again in the coming part on Rubinstein's sequential game.
+ Method 2:
If method 1 puts more weight on the value of today than on tomorrow, method 2 treats all values as equal by taking the simple arithmetic average of the sequence: the weight for each stage is 1/n, where n is the number of rounds. The symbol δ inherently changes its meaning in this method: it is now interpreted as the probability of continuing the game after the first round, which is played for sure. (This is an abuse of notation for convenience, I think.) The finiteness of the repeated game is captured by letting δ < 1. This is equivalent to method 1 in the sense that it represents the uncertainty about whether tomorrow will happen; the player thus still needs to weigh the long-term gain against the short-term benefit.
Example of a sequence of payoffs when the payoff in each stage game is 1:

Round  Payoff  Continue? (with probability δ)
1      1       yes
2      1       yes
3      1       yes
4      1       yes
5      1       no

Sum = 5
=> Normalisation: 5 / 5 = 1
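Method 2 can be sketched the same way (a minimal illustration; the coin-flip continuation and the function names are my own framing of the text's description):

```python
import random

def average_payoff(payoffs):
    """Method 2: simple arithmetic mean, weight 1/n per round."""
    return sum(payoffs) / len(payoffs)

def play_with_continuation(stage_payoff, delta, rng):
    """Play one round for sure, then continue with probability delta."""
    payoffs = [stage_payoff]
    while rng.random() < delta:
        payoffs.append(stage_payoff)
    return payoffs

rng = random.Random(0)
stream = play_with_continuation(1, 0.8, rng)
print(average_payoff(stream))  # 1.0: a constant stream averages to itself
```

Whatever the realised length, a constant stream of 1s normalises to 1, matching the table above.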
To treat the infinite series, this method lets the number n of stages run to infinity, hence δ → 1. Under either method, this is as if the player has infinite patience: either he treats a benefit in the far future just as if it happened today, or he is sure that tomorrow will always happen; the two are equivalent.
The Folk Theorem is therefore about the minimax space. It says that after the feasible payoff space is cut off by individual rationality (based on best-response logic), whatever is left can be sustained by repeated-game agreements.
A general comment (regarding the introduction of this minimax space) needs to be made. The minimax space is the space bounded by the pure NEs and the minimax point. In this game, the pure NEs constitute the Pareto-optimal frontier, and these efficient outcomes can be reached naturally. Moreover, the minimax point is not among the pure NEs, so the efficient points can always be reached. This explains the fundamental difference between the bargaining game (Nash Demand Game, NDG) and the cooperation game (Prisoner's Dilemma Game, PDG). In a PD, the minimax is the NE, and the Pareto point lies outside best-response logic in a simultaneous framework; hence the problem is to set up a cooperative mechanism to harvest the surplus for society as a whole. Here, instead, the problem is that there is more than one NE in the one-shot game, and many more NEs in the repeated game, considering the Pareto frontier alone. Each NE represents a different share ratio for the individuals, and the solution to this problem must answer the question of which point to choose and how to arrive at it.
Note that any point like the green point in quadrant III will be mapped to a point on the Pareto frontier in quadrant I: dividing the money without waste is always a Pareto solution. The blue point, however, is clearly not a Pareto solution, because from that point both players can get more without harming anyone else.
One consequence of making utilities comparable is that the solution does not change if one utility is subjected to an affine transformation (the ratios are preserved). Another consequence is that if we go further and equate the absolute values of the two players' utilities (symmetric utility functions), then the payoff space is symmetric and the solution lies on the bisector (this will be illustrated in the solution part).
c. The solution
In general, the solution is the point u that maximizes the product of the projections onto the two axes of the distance from u to the disagreement point d: max (u₁ − d₁)(u₂ − d₂).
One consequence of describing the solution this way is Pareto optimality: whenever both utilities can be increased without harming anyone's utility, the product increases. Another consequence is that it implies individual rationality: if one distance is negative, the product is negative.
After solving the Lagrangian (maximizing u₁u₂ subject to the frontier g(u₁, u₂) = 0, with the disagreement point at the origin), the solution is the point satisfying

u₂ / u₁ = g′(u₁) / g′(u₂),

that is, the ratio of the utilities equals the inverse ratio of their rates of change along the frontier. The ratio of the rates of change inherently measures how much utility one player has to give up for an increase in the other's. Hence it represents the need of the individual: whoever needs more gets a larger share of the manna from heaven.
+ Now if the frontier curve of the payoff space P is a quarter circle (reference point at the origin), we have

g(u₁, u₂) = u₁² + u₂² − k², where k is the radius of the circle.

This means the solution lies on the bisector (as in the following picture), because the Nash product solution is defined for an infinitely divisible prize/pie.
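To make the bisector claim concrete, the Lagrangian can be worked out explicitly for the quarter-circle frontier:

```latex
\mathcal{L} = u_1 u_2 - \lambda\,(u_1^2 + u_2^2 - k^2)
\]
First-order conditions:
\[
u_2 = 2\lambda u_1, \qquad u_1 = 2\lambda u_2
\;\Rightarrow\; \frac{u_2}{u_1} = \frac{u_1}{u_2}
\;\Rightarrow\; u_1 = u_2 = \frac{k}{\sqrt{2}},
```

which is exactly the point where the 45-degree bisector crosses the circle.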
b. Backward induction
The general line of reasoning in this game is that when it is one's turn to be the proposer, he offers a share that makes the follower indifferent between accepting today and rejecting in order to wait for tomorrow. In fact he could offer slightly more to break the tie, but for a simple calculation it can be set at indifference.
This is a game of perfect information, so if we suppose a finite number of rounds, A can put himself at the end of the game and work backward. On the last day, A offers B essentially 0:
The following table lists the value each player gets, in time order. For example, on the last day T, A gets 1 and B gets 0, so on the day before, T−1, B offers A exactly δ, for the following reason: if A rejects B's offer and waits for day T, A gets the whole pie on day T, but the whole pie on day T is only worth δ today (psychologically). So A is indifferent between getting δ on day T−1 and the whole pie 1 on day T. The reasoning continues backward.
proposer →   A    B        A                B
A            1    δ        1 − δ + δ²       δ − δ² + δ³        ...
B            0    1 − δ    δ(1 − δ)         1 − δ + δ² − δ³    ...

It can stop after a finite number of rounds, but if the number of days goes to infinity, we use the formula for an infinite geometric series:

uA = 1 − δ + δ² − ... = (−δ)⁰ + (−δ)¹ + (−δ)² + ... = 1 / (1 − (−δ)) = 1 / (1 + δ)
uB = 1 − uA = 1 − 1/(1 + δ) = δ / (1 + δ)
In between, when 0 < δ < 1, we have uB < uA: the first proposer has the advantage, and this bargaining power lets him get more than the follower.
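The backward induction can be checked numerically. This is a minimal sketch (the function name and the 60-round horizon are illustrative choices): folding the offers back round by round converges to the 1/(1 + δ) split.

```python
def rubinstein_shares(delta, rounds):
    """Backward induction over a finite horizon: the last proposer takes
    the whole pie, then each earlier proposer offers the responder the
    discounted value of what the responder would get as next proposer."""
    proposer_share = 1.0  # last proposer keeps everything
    for _ in range(rounds - 1):
        proposer_share = 1.0 - delta * proposer_share
    return proposer_share, 1.0 - proposer_share

a, b = rubinstein_shares(0.5, 60)
print(a, b)  # converges to 1/(1+delta) = 2/3 and delta/(1+delta) = 1/3
```

The iteration s ← 1 − δs contracts by a factor δ each step, so the finite-horizon shares approach the infinite-horizon values very quickly.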
A strategy (demand) x is a MESS if, at any point in the game, whenever a player is called upon to propose, he offers x and the offer is accepted immediately (no one wants to wait for his own next turn to offer). The psychological cost (discounting) is also kept.
b. The process
As in the following picture, suppose the blue point is the division that p1 offers. What p1 gets is x, and p2 gets 1 − x if he accepts. But what if p2 rejects and waits for tomorrow, when it is his turn to offer? Say p2 rejects in order to offer the green point:
p2 still offers x, because x is the optimal offer at any round. But waiting costs him, because tomorrow the pie is smaller by the discount factor δ. So p2 gets δx and spares p1 a proportion δ(1 − x). The first condition for x to be a MESS ends here.
For the second condition, that the offer is accepted immediately, it is necessary that what p1 offers p2 today is at least as good as what p2 can ask for tomorrow, and vice versa. Hence we have this system of requirements:

x ≥ δ(1 − x)
1 − x ≥ δx

which ends up as: δ/(1 + δ) ≤ x ≤ 1/(1 + δ).
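The two inequalities and the resulting interval can be verified with a small check (a sketch; the function name is mine):

```python
def is_mess_demand(x, delta):
    """Both conditions from the text: each side's share today must be at
    least as good as the discounted share it could demand tomorrow."""
    return x >= delta * (1 - x) and (1 - x) >= delta * x

delta = 0.5
lo, hi = delta / (1 + delta), 1 / (1 + delta)  # interval [1/3, 2/3]
print(is_mess_demand(0.5, delta))  # True: 0.5 lies inside [1/3, 2/3]
print(is_mess_demand(0.9, delta))  # False: 0.9 lies above 2/3
```

For δ = 0.5 the sustainable demands are exactly those between 1/3 and 2/3, matching the derived interval.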
b. The process
For example, suppose the norm for population A is now to claim x and for B to claim 1 − x. But there is an A who wants to claim x + δ, and a B who wants to claim 1 − x + δ, where δ denotes a small deviation. These deviants enter the game, change the others' beliefs about the current situation, and so change their behavior next time.
As a consequence, the game has a chance to visit every equilibrium on the Pareto-optimal frontier (which is also the set of NEs) over time. But within these dynamics of norms, some norms prevail longer than others.
c. The solution
Say α is the threshold force that drives the norm down from x to x − δ, and β the threshold force that drives it up to x + δ. The x for which α(x) = β(x) is the norm that persists.
Consider picking an A out of population A. He has observed that in the last period some B claimed 1 − x + δ. Hence A weighs two options for this period:
- If A claims x − δ, he gets x − δ for sure, no matter whom he meets.
- If A claims x:
  + with probability ε he meets the deviant and gets 0;
  + with probability 1 − ε he meets a normal B claiming 1 − x, and gets x.
Using von Neumann expected utility, A keeps claiming x if: (1 − ε) · u(x) ≥ u(x − δ).
From this we derive the threshold at equality: α = [u(x) − u(x − δ)] / u(x).
Applying the same reasoning for B: the norm is to claim 1 − x, but B has observed an A claiming x + δ in the previous period, so he keeps claiming 1 − x this period if (1 − ε) · v(1 − x) ≥ v(1 − x − δ).
Hence B's threshold is: β = [v(1 − x) − v(1 − x − δ)] / v(1 − x).
The x that persists is found by equating α and β:

[v(1 − x) − v(1 − x − δ)] / v(1 − x) = [u(x) − u(x − δ)] / u(x)
In approximation, this says that x persists when the proportional rate of change of utility in population B equals that in population A. Given that these functions are defined over the same domain x in [0, 1], the rate of change at 1 − x equates the rate of change at x. With symmetry imposed on the two populations (the same aggressiveness, the same adventurous attitude, and so on), these best-response-based functions cross at the middle point of 50:50.
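A minimal numeric sketch of this crossing, assuming linear symmetric utilities u(x) = x and v(y) = y (so each threshold reduces to d/claim; the function names and grid are my own choices):

```python
def alpha(x, d):
    """A's threshold for u(x) = x: [u(x) - u(x - d)] / u(x) = d / x."""
    return d / x

def beta(x, d):
    """B's threshold for v(y) = y, where B claims 1 - x: d / (1 - x)."""
    return d / (1 - x)

d = 0.01
# The persistent norm equates the two thresholds: d/x = d/(1-x) => x = 1/2.
xs = [i / 100 for i in range(1, 100)]
persistent = min(xs, key=lambda x: abs(alpha(x, d) - beta(x, d)))
print(persistent)  # 0.5
```

With any asymmetry between u and v, the crossing moves away from 0.5, which is exactly the text's point about symmetric populations.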
And the 50:50 institution is the one that persists.
Given x, y, z as the probabilities that player 2 plays High, Medium and Low, player 1 can calculate his expected utility from playing High, Medium and Low as in the following table:

          x       y        z
          High    Medium   Low    EU of player 1
High      0       0        8      EU(h) = 8z
Medium    0       5        5      EU(m) = 5y + 5z
Low       2       2        2      EU(l) = 2x + 2y + 2z
The expected payoff of playing High exceeds that of playing Medium when:
EU(h) > EU(m) => 8z > 5y + 5z => z > (5/3)y
The plane z = (5/3)y is drawn in pink in the following picture, and the corresponding region where EU(h) > EU(m) is pointed out in the right picture:
EU(h) > EU(m) is the region above the plane. It means that when there are enough Low players in the population, z > (5/3)y, the best response is to play High to take advantage of the people playing Low, so High can easily invade this population.
Accordingly, the expected payoff of playing High is higher than that of playing Low when:
EU(h) > EU(l) => 8z > 2(x + y + z) => 3z > x + y. The plane 3z = x + y is plotted in green in the following picture:
The two regions separated by the green plane have opposite signs when comparing EU(h) and EU(l): EU(h) > EU(l) when the share of Low players rises high enough relative to x and y (the shares playing High and Medium). Combining both conditions, the force to play High is strongest in area H (the pink area); it is where the strategy High invades most easily:
Now we add the last inequality: EU(m) > EU(l) => 5y + 5z > 2x + 2y + 2z => 3y + 3z > 2x. The plane 2x = 3y + 3z is plotted in blue in the following picture; the two regions it separates have opposite signs when comparing EU(m) and EU(l).
With these parallel and mixed forces, area M (the green area) is where the urge to play Medium dominates, and area L (the blue area) is where the urge to play Low dominates.
Putting these areas together, the dynamic map looks like this:
So we see that each point in this dynamic corresponds to a distribution over the 3 pure strategies. If the population starts with everyone playing Low (point z = 1), it is dragged toward playing High (point x = 1) rather fast. If we start at point x = 1, it is dragged to point z = 1 at a slower rate. In between there is a mixed-strategy equilibrium. However, with some mutation the population enters the green region and is dragged to playing Fair (point y = 1) pretty fast. Playing Fair is another equilibrium. In the evolution of this game, the Fair equilibrium is more stable than the mixed equilibrium, because it takes a lot of mutation to get out of the green region but only very little mutation from the mixed-strategy point to enter the basin of attraction of the Fair equilibrium.
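The trajectories described above can be sketched with a discrete replicator step (a minimal sketch; the Euler step size, the step count and the starting point are arbitrary choices of mine):

```python
def replicator_step(x, y, z, dt=0.01):
    """One Euler step of the replicator dynamic for the shares playing
    High, Medium, Low: each share grows at rate (own EU - average EU)."""
    eu_h, eu_m, eu_l = 8 * z, 5 * y + 5 * z, 2.0
    avg = x * eu_h + y * eu_m + z * eu_l
    x += dt * x * (eu_h - avg)
    y += dt * y * (eu_m - avg)
    z += dt * z * (eu_l - avg)
    s = x + y + z  # renormalise against floating-point drift
    return x / s, y / s, z / s

# Start near all-Low with a little mutation; High grows first, as in the text.
x, y, z = 0.01, 0.01, 0.98
for _ in range(2000):
    x, y, z = replicator_step(x, y, z)
print(round(x + y + z, 6))  # 1.0: the state stays on the simplex
```

Since EU(h) is largest when z is near 1, the High share rises immediately from an almost-all-Low start, reproducing the first leg of the dynamic map.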
8 Simulation
a. Automata
The simulation is set up as one population of 100 automata, randomly matched. Each automaton can carry states of different strategies, and they are prepared for a repeated game of any length. For example, one automaton may play High forever no matter what the other plays, while other automata switch states quickly based on what the other played in the previous round.
An example of an automaton that starts playing High and keeps playing High whatever the other plays is in the following picture:
Accordingly, here are the ones that play Medium and Low all the time:
Using the colors red, green and blue for the strategies High, Medium and Low, here is an automaton that starts by playing Medium and then best-responds to the other's previous strategy:
Note that all green arrows point to the green state: if the other played Medium in the previous round, it jumps to playing Medium. But if the other plays High, it plays Low, and vice versa.
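The automata just described can be sketched as a tiny finite-state machine (a minimal sketch; the class name, transition encoding and move labels are my own choices, not the simulation's actual code):

```python
class Automaton:
    """A finite-state machine for the demand game. Each state names the
    move it plays; `transitions` maps (state, opponent_move) -> next state,
    defaulting to staying in the current state."""
    def __init__(self, start, transitions):
        self.state = start
        self.transitions = transitions

    def play(self):
        return self.state

    def observe(self, opponent_move):
        self.state = self.transitions.get((self.state, opponent_move), self.state)

# All-High: no transitions, so it stays in High forever.
all_high = Automaton("High", {})

# Starts with Medium, then best-responds to the opponent's previous move.
transitions = {}
for s in ("High", "Medium", "Low"):
    transitions[(s, "High")] = "Low"       # vs High, playing Low earns 2 (not 0)
    transitions[(s, "Medium")] = "Medium"  # vs Medium, playing Medium earns 5
    transitions[(s, "Low")] = "High"       # vs Low, playing High earns 8
best_responder = Automaton("Medium", transitions)

best_responder.observe("Low")
print(best_responder.play())  # High: the best response to an opponent's Low
```

The All-High picture in the text corresponds to a one-state machine whose every arrow loops back to itself; the best-responder corresponds to the three-state machine whose arrows all point at the best reply to the observed move.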
b. Repeated game
This is an example of the All-High automaton (playing High all the time) versus the All-Medium automaton over infinitely many rounds:
The red dots are the strategies that All-High plays across rounds: this automaton always plays the red (High) strategy. Accordingly, the green dots show that All-Medium keeps playing Medium no matter what happened in the previous round. The payoff stream associated with this match is in the following table:

Round       1  2  3  4  5  ...
All-High    0  0  0  0  0  ...
All-Medium  0  0  0  0  0  ...

No matter which method is used to boil down this stream of payoffs, the result is that both get 0.
This is another example match, Accommodator versus All-Low:
In the first round, Accommodator starts with Medium and All-Low starts with Low. But from the second round on, Accommodator best-responds to All-Low by playing High, and the situation continues. The payoff stream associated with this match is in the following table:

Round         1  2  3  4  5  ...
Accommodator  5  8  8  8  8  ...
All-Low       2  2  2  2  2  ...
Using method 1 with δ = 0.5 for instance, we can calculate their final payoffs as follows:

Accommodator: PV = 5 + 8δ + 8δ² + 8δ³ + ... = 5 + 8δ/(1 − δ) = 5 + 8 · (0.5/0.5) = 13
All-Low:      PV = 2 + 2δ + 2δ² + 2δ³ + ... = 2/(1 − δ) = 2/0.5 = 4

Hence in this infinitely repeated game, Accommodator gets 13 and All-Low gets 4.
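These two present values can be checked with a small helper (a sketch; the function splits the stream into a finite head and a constant geometric tail, and its name is mine):

```python
def match_present_value(stream_head, repeated_tail, delta):
    """PV of a stream whose first payoffs are `stream_head` and which then
    repeats the constant `repeated_tail` forever, discounted per round."""
    pv = sum(p * delta**t for t, p in enumerate(stream_head))
    t0 = len(stream_head)
    # geometric tail: repeated_tail * (delta**t0 + delta**(t0+1) + ...)
    pv += repeated_tail * delta**t0 / (1 - delta)
    return pv

delta = 0.5
print(match_present_value([5], 8, delta))  # 13.0: Accommodator's stream 5, 8, 8, ...
print(match_present_value([], 2, delta))   # 4.0: All-Low's constant stream of 2s
```

The closed-form tail replaces the infinite sum, so the helper reproduces the hand calculation exactly.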
c. Regeneration process
After playing the repeated game with discount factor δ in [0, 1], the fitness of each type of automaton is measured and used for the regeneration process at the end of each day (or cycle).
For example, one population of 10 comprises 7 Accommodators and 3 All-Lows. They are randomly matched as follows:
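The regeneration step can be sketched as fitness-proportional resampling (a minimal sketch under assumptions of mine: the function name, the seed, and the fitness values for one day of matches are illustrative, not the simulation's actual parameters):

```python
import random

def regenerate(population, fitness, rng):
    """Draw a new generation of the same size, each slot filled by copying
    an automaton type with probability proportional to its fitness."""
    weights = [fitness[a] for a in population]
    return rng.choices(population, weights=weights, k=len(population))

rng = random.Random(42)
fitness = {"Accommodator": 13.0, "All-Low": 4.0}  # hypothetical daily fitness
population = ["Accommodator"] * 7 + ["All-Low"] * 3

new_pop = regenerate(population, fitness, rng)
print(len(new_pop))  # 10: the population size is preserved
```

Fitter types occupy more slots in expectation, so over many days the composition drifts toward the automata that earn more in their matches.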
d. Mutation
Mutation is for now modeled as follows. There are four types of mutation: nature can change the strategy in one state of the automaton, add one more state, delete one state, or change the trajectories among the states. At each cycle, one automaton is allowed to undergo one type of mutation with a small probability.
It can go through 2 mutations of adding states as follows:
Then it can change one trajectory, change the strategy in one state, or delete a state:
e. The cycle
In general, each simulation run can have many cycles. One cycle will go as follows:
f. Result
Setting δ = 0 (which means all pairs play the one-shot game in each cycle): if we start the population with automata all playing Low, their initial payoff is 2, but some mutations happen and the average payoff of the population rises rather quickly. After 200 days, automata that play High start to appear in the population. After 400 days, they reach 4.5 on average (quite close to the fair-fair point). At this point, the number of automata playing the Fair strategy has risen a lot, and they also have efficient punishment mechanisms against playing High. For example, there are automata that start with Fair but jump to High if the other plays High, and only jump back to Fair if the other plays Fair; they keep playing High if the other plays High or Low.
Another automaton starts with High and only jumps to Fair if the other plays High; then it only jumps back to High if the other plays Low. In another example, an automaton starts with Fair and only jumps to High if the other plays High; it does not even jump to High if the other plays Low. But once it jumps to High, it stays at High and only jumps back to Fair if the other plays Low. Note that these automata do not play Low very much. At round 600, they reach 5 on average, and playing Fair becomes dominant in the population. Then the average fluctuates around the Fair point, slightly below it; this is the effect of mutation.
An example of this simulation is as follows:
If we start with a population all playing High, the average payoff is initially 0 and rises to 1.8, much more slowly than in the previous case. At this point the population holds a mixture of Medium and Low players. At round 700 the average rises pretty fast to 5, and by round 900 most of the automata play Fair. A population starting all High stays at a low payoff for a very long time, because there are too many High automata, and when the High automata pair up they pull the payoff mean down quite hard.
An example of this simulation is as follows:
There can be simulation runs in which the phase of the population staying at 2 is very long.
If we start with a population of all Medium, the average payoff stays around 4.2 to 4.9, up to 5. There are Fair and Low citizens in the population, and sometimes they pave the way for the High strategy, but the High and Low strategies are both kept at very low levels. These small deviations are just the mutation effect.
Here is an example simulation:
In general, the evolutionary simulation aligns with the replicator dynamic of the one-shot game (developed in the previous part).
Things get much more complicated when the automata play the repeated game (with a positive discount factor). They can develop quite complex strategy plans, and the payoff space of a match changes shape accordingly. The evolution of the repeated game hence becomes complicated at an exponential rate, because matches may have different payoff spaces due to different values of the discount factor and different lengths of the repeated game.