Bidding in Interrelated Day-Ahead Electricity Markets: Insights From An Agent-Based Simulation Model
+ - +
Here, $s_j(t+1)$ is the propensity to choose action j in the next round, f is the recency parameter
and e the experimentation parameter. The recency parameter reflects the agents' tendency to
forget past experience over time, and the experimentation parameter defines the extent to which
agents explore by trying strategies that do not have the highest propensity. Experimentation
prevents agents from locking into one choice too early. The third parameter specifying the
presented reinforcement learning algorithm is the scaling parameter s(0), which defines the initial
propensities for all actions. Agents choose an action k according to the following probability:

\[ p_k(t+1) = \frac{s_k(t+1)}{\sum_{j=1}^{M} s_j(t+1)} \]

They thus reinforce actions which have resulted in a high payoff and choose these again with a
higher probability in the future. Accordingly, less successful actions are weakened and chosen again
with a lower probability.
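
The proportional choice rule above can be illustrated with a short Python sketch. The class and
method names are hypothetical. The update step is an assumption: it follows the commonly cited
form of the modified Erev-Roth rule, in which the chosen action receives the reinforcement scaled
by (1 - e) and every other action receives a share e/(M - 1) of its own propensity, and it may
differ in detail from the implementation used in this paper.

```python
import random


class ErevRothLearner:
    """Illustrative learner; names and the update rule are assumptions."""

    def __init__(self, n_actions, recency=0.1, experimentation=0.2, s0=1.0, seed=None):
        self.f = recency                       # recency parameter f
        self.e = experimentation               # experimentation parameter e
        self.propensities = [s0] * n_actions   # initial propensities s(0)
        self.rng = random.Random(seed)

    def probabilities(self):
        # p_k = s_k / sum_j s_j
        total = sum(self.propensities)
        return [s / total for s in self.propensities]

    def choose(self):
        # Draw one action index with probability proportional to its propensity.
        actions = range(len(self.propensities))
        return self.rng.choices(actions, weights=self.propensities, k=1)[0]

    def update(self, chosen, reinforcement):
        # Assumed modified Erev-Roth update (see lead-in).
        m = len(self.propensities)
        updated = []
        for j, s in enumerate(self.propensities):
            if j == chosen:
                experience = reinforcement * (1.0 - self.e)
            else:
                experience = s * self.e / (m - 1)
            updated.append((1.0 - self.f) * s + experience)
        self.propensities = updated
```

With three actions and s(0) = 1.0, all actions start with probability 1/3. After one update with
a reinforcement of 1.0 for the first action, its propensity rises to 1.7 while the others stay at
1.0, so it is chosen with a higher probability in the next round.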
As probabilities must not be negative, it must be ensured that the propensities always have a
positive value. If agents bid a price below their marginal generation costs, they risk being
called into operation at a price at which they make losses, i.e. at a negative payoff. To ensure
positive propensities, we follow the proposition of Erev and Roth [1998], who subtract the smallest
possible payoff $x_{min}$ from all payoffs:

\[ R(x) = x - x_{min} \]

In order for the algorithm to represent agent learning realistically, the parameters
specifying the algorithm have to be set carefully. Erev and Roth [1998] state that a combination of
f = 0.1 and e = 0.2 leads to the best prediction of the empirical outcome for the matrix games
they studied. The order of magnitude of the scaling parameter s(0) depends on the magnitude of the
possible reinforcements. In an effort to standardise this parameter, we restrict the reinforcement
to the interval [0,1] by dividing each reinforcement by the highest possible reinforcement. In other
simulations that we have carried out, we find that the value of the scaling parameter does not
influence the results significantly. We also found that the parameter combination stated by Erev
and Roth as the best for predicting the results of their human experiments leads to a low
variability of simulation outcomes over a series of 100 simulations, which means that the choice of
random number seed influences the results less than under most other tested combinations. These
findings led us to the following choice of parameter combination for every Modified Erev-Roth
reinforcement learning algorithm: f = 0.1; e = 0.2; s(0) = 1.0.
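
The shift by the smallest possible payoff and the scaling to [0,1] can be combined in one step. The
following Python sketch is illustrative only; the function name and the example payoff bounds are
assumptions and are not taken from the simulation.

```python
def normalised_reinforcement(payoff, min_payoff, max_payoff):
    """Shift a payoff by the smallest possible payoff and scale it to [0, 1].

    min_payoff and max_payoff are the smallest and largest payoffs an agent
    can obtain; how these bounds are determined is an assumption left open here.
    """
    if max_payoff <= min_payoff:
        raise ValueError("max_payoff must exceed min_payoff")
    shifted = payoff - min_payoff        # R(x) = x - x_min, never negative
    highest = max_payoff - min_payoff    # highest possible reinforcement
    return shifted / highest
```

For example, with hypothetical payoff bounds of -50 and 150, a payoff of 50 yields a reinforcement
of 0.5, the worst payoff yields 0.0 and the best payoff yields 1.0.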
2.4. The agents' action domains
On the day-ahead market, we model an agent as trying to optimise both its bid prices and quantities
in order to maximise its individual profit. Thus, an agent can engage in withholding strategies if it
finds these profitable. The possible bid prices for an agent on the day-ahead market range from the
minimum to the maximum admissible price, where $p^{min}$ = 0 and $p^{max}$ = 100. A generator
chooses the bid quantity as a fraction of its available capacity. The intervals [a,b] of possible
prices and capacity fractions are stratified into 21 discrete values for the bidding price and six
values for the bidding quantity. The action domain for an agent bidding on the day-ahead market thus
comprises M = 126 possible actions and takes the following form:

\[ M^{dayAhead} = \{p^{dayAhead}, q^{dayAhead}\} = \{\{0, 0\}, \{0, 0.2\}, \ldots, \{0, 1.0\}, \{5, 0\}, \{5, 0.2\}, \ldots, \{100, 1.0\}\} \]

On the balancing power market, an agent always chooses its bid quantity as equal to the net
installed capacity, reduced, if applicable, by the amount of capacity it has already committed on the
day-ahead market. Thus, an agent only learns to choose its bid prices, and employs a fixed
strategy for the bid volume. As a bid on the balancing power market comprises a capacity price and
a work price, this again leads to a two-dimensional action domain. Admissible prices range from
$p^{balance,cap}$ = 0 to $p^{balance,cap}$ = 500 and from $p^{balance,work}$ = 0 to
$p^{balance,work}$ = 100, stratified into 21 possible capacity prices and five work prices, which
results in the following action domain:

\[ M^{balance} = \{p^{balance,cap}, p^{balance,work}\} = \{\{0, 0\}, \{0, 25\}, \ldots, \{0, 100\}, \{25, 0\}, \{25, 25\}, \ldots, \{500, 100\}\} \]

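
Both action domains are Cartesian products of two evenly spaced grids. As a minimal sketch (the
helper name `evenly_spaced` and the variable names are hypothetical), they can be enumerated as
follows:

```python
from itertools import product


def evenly_spaced(lowest, highest, n_values):
    """Return n_values evenly spaced grid points from lowest to highest."""
    step = (highest - lowest) / (n_values - 1)
    # Rounding removes floating-point noise such as 0.6000000000000001.
    return [round(lowest + i * step, 10) for i in range(n_values)]


# Day-ahead market: 21 bid prices (0, 5, ..., 100) x 6 capacity fractions (0, 0.2, ..., 1.0).
day_ahead_actions = list(product(evenly_spaced(0, 100, 21), evenly_spaced(0.0, 1.0, 6)))

# Balancing power market: 21 capacity prices (0, 25, ..., 500) x 5 work prices (0, 25, ..., 100).
balancing_actions = list(product(evenly_spaced(0, 500, 21), evenly_spaced(0, 100, 5)))
```

The day-ahead list contains 21 x 6 = 126 actions, matching M = 126 above, and the balancing list
contains 21 x 5 = 105 actions.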
2.5. The agents' strategy
The problem facing the agent can be divided into the strategy to choose for the day-ahead market
and the strategy for the balancing power market. The learning task for each agent is consequently
separated into two individual learning problems. On the level of implementation, this results in each
agent employing two instances of the learning algorithm, with the same parameter values but
different action domains for both instances.
In our simulation scenario, each agent i owns one generation unit u which is characterised by a
linear variable generation cost function, i.e. constant marginal generation costs $MC_{i,u}$. Each
power plant has a net installed capacity $Q^{inst}_{i,u}$ and no-load costs $NLC_{i,u}$. Ramping
costs and other commitment constraints are neglected in this simulation for the sake of simplicity.
As some capacity $q^{comm}_{i,u,t}$ of a generation unit may have been committed in one electricity
market for a period of time t, the available capacity $q^{avail}_{i,u,t}$ of this unit for this
specific period is defined as

\[ q^{avail}_{i,u,t} = Q^{inst}_{i,u} - q^{comm}_{i,u,t} \]

Other reasons for reduced power plant availability, such as maintenance
periods or unplanned outages, are not included in the current simulation implementation. We
assume that an agent bids one price-volume pair for every generation unit it owns; agents cannot bid
more than their available capacity into the market:

\[ q^{dayAhead}_{i,u,h} \le q^{avail}_{i,u,h}, \qquad q^{balance}_{i,u,b} \le q^{avail}_{i,u,b} \]

Footnotes (learning algorithm parameters):
The tested scaling parameters range from s(0) = 0.5 to s(0) = 1.0.
The tested parameter values range from f = 0.0 to f = 0.5 and e = 0.1 to e = 0.5.

An agent tries to maximise profits on both markets. It does so through reinforcement learning,
favouring actions that have yielded higher profits in past trading rounds. The reinforcement
of each chosen action comprises the profit achieved on the market, and also includes opportunity
costs to a certain extent. On the day-ahead market, the agent's profit is defined as follows:

\[ \pi^{dayAhead}_{i,u}(q^{dayAhead}_{i,u}) = (p^{dayAhead} - MC_{i,u}) \cdot q^{dayAhead}_{i,u} \]

Here, $\pi^{dayAhead}_{i,u}$ is the profit that agent i achieves for its generating unit u,
$p^{dayAhead}$ is the resulting uniform price on the day-ahead market and $q^{dayAhead}_{i,u}$ is
the quantity of unit u that agent i was able to sell on the day-ahead market. The opportunity cost
$oc^{dayAhead}_{i,u}$ for agent i on the day-ahead market is the profit it could have achieved if
it had bid its capacity on the balancing power market. It is defined as:

\[ oc^{dayAhead}_{i,u} = q^{dayAhead}_{i,u} \cdot (p^{balance} - NLC_{i,u}) \]

The profit an agent achieves on the balancing power market is defined as follows:

\[ \pi^{balance}_{i,u} = (p^{balance,cap}_{(i,u)} - NLC_{i,u}) \cdot q^{balance,cap}_{i,u} + (p^{balance,work}_{(i,u)} - MC_{i,u}) \cdot q^{balance,work}_{i,u} \]

Here, the subscript (i,u) indicates that the capacity and work price can either be a single uniform
price or an individual bid price for each agent/unit, depending on the pricing rule employed. As the
produced electric work is remunerated separately, the cost that an agent i faces for providing minute
reserve capacity is limited to the no-load costs of its unit. However, an agent faces high
opportunity costs, because it could also use its unit for producing electricity that it can sell on the
day-ahead market. This opportunity cost is calculated as:

\[ oc^{balance}_{i,u}(q^{balance,cap}_{i,u}) = (p^{dayAhead} - MC_{i,u}) \cdot q^{balance,cap}_{i,u} \]

In both markets, the opportunity costs are subtracted from the profit that an agent receives. In order
to prevent unintended effects, we restrict the opportunity costs to be positive through applying a
min() function. In addition, we ensure that the inclusion of opportunity costs does not lead to
negative reinforcements.
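
As a rough illustration of how profit and opportunity cost enter the reinforcement, the Python
sketch below computes the day-ahead profit and the balancing-market opportunity cost, both in the
form (price - marginal cost) x quantity, and nets one against the other. All function names are
hypothetical. The clamping with max(0, .) is an assumption about how the two non-negativity
conditions could be enforced; it is not the paper's exact implementation, which the text describes
in terms of a min() function.

```python
def day_ahead_profit(price, marginal_cost, quantity):
    """Profit on the day-ahead market: (p - MC) * q."""
    return (price - marginal_cost) * quantity


def balancing_opportunity_cost(day_ahead_price, marginal_cost, capacity_quantity):
    """Foregone day-ahead profit for capacity committed as minute reserve.

    Assumption: clamped at zero so that the opportunity cost is never negative.
    """
    return max(0.0, (day_ahead_price - marginal_cost) * capacity_quantity)


def net_reinforcement(profit, opportunity_cost):
    """Profit minus opportunity cost.

    Assumption: clamped at zero so that the reinforcement is never negative.
    """
    return max(0.0, profit - opportunity_cost)
```

For instance, a unit with marginal costs of 5 that sells 300 MW at a day-ahead price of 40 earns
(40 - 5) x 300 = 10500. Committing 100 MW of the same unit as reserve instead carries an
opportunity cost of (40 - 5) x 100 = 3500, which is subtracted from the balancing-market profit
before the action is reinforced.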
3. Simulation scenarios
Because of the probabilistic nature of the applied reinforcement learning algorithm, the outcome
from our agent-based simulations partly depends on the random number seed with which learning
instances are initialised. We derive individual random number seeds for each instance of the
learning module, so as to avoid unintended similarity among agents which would occur if all
learners had the same seed. Every simulation setting is run 100 times with different random number
seeds at each run. All seeds employed in one set of 100 runs are stored and used again for different
simulation scenarios. By doing so, we can best exploit the advantages of agent-based simulation:
different settings, e.g. different market mechanisms, can be tested under exactly the same conditions.
By ensuring that all agents in principle learn in exactly the same way in one setting as in another,
we can derive qualitative conclusions about market efficiency by comparing the resulting
outcomes of both settings.
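
The seed handling described above can be sketched in Python as follows. The function names and the
base seed are hypothetical; the sketch only illustrates the idea of drawing one stored seed per
run and deriving a distinct seed for every learning instance from it.

```python
import random

MAX_SEED = 2 ** 32


def seeds_for_runs(base_seed, n_runs=100):
    """Draw one distinct seed per run; the list is stored and reused for every scenario."""
    master = random.Random(base_seed)
    return master.sample(range(MAX_SEED), n_runs)


def seeds_for_learners(run_seed, n_agents, learners_per_agent=2):
    """Derive a distinct seed for every learning instance within one run.

    Each agent uses two learners here: one for the day-ahead market and one
    for the balancing power market.
    """
    master = random.Random(run_seed)
    flat = master.sample(range(MAX_SEED), n_agents * learners_per_agent)
    return [flat[i * learners_per_agent:(i + 1) * learners_per_agent]
            for i in range(n_agents)]
```

Because sampling is done without replacement, no two learners within a run share a seed, and
calling `seeds_for_runs` with the same base seed for every scenario reproduces the same 100 run
seeds.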
In this paper, we explore four different scenarios which differ in the order of market execution and
the pricing mechanism on the balancing power market: DayAheadBalance_uniform,
DayAheadBalance_payAsBid, BalanceDayAhead_uniform and BalanceDayAhead_payAsBid.
DayAheadBalance refers to a scenario where the day-ahead market is cleared first, and the results
from this market are published before the balancing power market clears. BalanceDayAhead
switches the order of market execution. The endings _uniform and _payAsBid correspond to a
uniform price and a pay-as-bid pricing rule on the balancing power market. On the day-ahead
market, uniform pricing is employed in all cases.
The participating agents and their generation unit characteristics are summarised in Table 1.

Agent name     Capacity [MW]   Marginal cost [EUR/MWh]   No-load cost [EUR/h]
Generator1     300              5.00                      75.00
Generator2     300             10.00                      75.00
Generator3     300             15.00                      75.00
Generator4     300             20.00                      75.00
Generator5     300             30.00                      75.00
Generator6     300              5.00                     150.00
Generator7     300             10.00                     150.00
Generator8     300             15.00                     150.00
Generator9     300             20.00                     150.00
Generator10    300             30.00                     150.00
Table 1: Characteristics of the agents' generating units
The static demand throughout the simulation is $D^{dayAhead}$ = 1500 MW on the day-ahead market
and $D^{balance,cap}$ = 800 MW for balancing capacity, both constant for every hour. The demand
for electric work from minute reserve is $D^{balance,work}$ = 160 MW, which occurs in one hour
per day.
4. Simulation outcome
Each run within one simulation scenario was simulated over at most 2,000 trading days. In
many runs, however, all agents locked into some preferred action earlier, because the propensities
of all other actions were so low that they were hardly ever chosen. A simulation run was stopped if
the resulting market price did not change over a period of 200 trading days. In all cases, the mean
bid/market prices and bid/resulting volumes over the last 200 trading days were recorded as the
result of each run. In the following, the simulation results for each scenario are reported. They
comprise the average, minimum and maximum resulting values over 100 runs with different random
number seeds, as well as the standard deviations (SD).
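
The stopping rule and the reported statistics can be sketched in Python as follows. The function
names and the callable `daily_price`, which stands in for one simulated trading day, are
hypothetical. Using the sample standard deviation is an assumption, as the text does not state
which estimator was used.

```python
from statistics import mean, stdev


def simulate_run(daily_price, max_days=2000, window=200):
    """Run until the price is unchanged for `window` days or `max_days` is reached.

    Returns the mean price over the last `window` days and the number of days simulated.
    """
    prices = []
    for day in range(max_days):
        prices.append(daily_price(day))
        recent = prices[-window:]
        if len(recent) == window and len(set(recent)) == 1:
            break
    return mean(prices[-window:]), len(prices)


def summarise(run_results):
    """Average, minimum, maximum and (sample) standard deviation over all runs."""
    return {
        "average": mean(run_results),
        "min": min(run_results),
        "max": max(run_results),
        "sd": stdev(run_results),
    }
```

A run whose price settles at a constant value from day 10 onwards stops after 210 simulated days,
because only then do the last 200 recorded prices coincide.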
Resulting prices on the day-ahead market are depicted in Table 2. Prices attain a higher level if the
day-ahead market is cleared after the balancing power market. The intuition behind this result is that
competition is lower on the supply side in these cases, as some generators have already committed a
part of their capacity on the balancing power market. When fewer agents compete on the day-ahead
market, they can more successfully bid above marginal cost and thus achieve higher prices.

Scenario                     Average   Min      Max       SD
DayAheadBalance_uniform      40.65     30.00     60.00     6.45
DayAheadBalance_payAsBid     37.01     20.00     65.00     7.32
BalanceDayAhead_uniform      66.14     35.00    100.00    15.56
BalanceDayAhead_payAsBid     64.03     35.00    100.00    16.28
Table 2: Simulated market clearing prices on the day-ahead market

                             Average             Min                 Max                 SD
Scenario                     capacity  work      capacity  work      capacity  work      capacity  work
DayAheadBalance_uniform      358.82    19.10     158.00    0.00      500.00    75.00     65.70     21.52
DayAheadBalance_payAsBid     273.80    17.54      60.23    0.00      431.25    75.00     76.42     20.26
BalanceDayAhead_uniform      393.79    17.51     225.00    0.00      475.00    99.75     45.83     20.53
BalanceDayAhead_payAsBid     354.29    17.21     206.25    0.00      457.88    75.00     55.04     21.24
Table 3: Simulated capacity/work prices on the balancing power market
Table 3 presents the resulting capacity and work prices on the balancing power market. Here, we
observe the opposite pattern to that on the day-ahead market, i.e. prices tend to be lower when the
market is cleared second. This outcome cannot be explained by supply concentration; another
aspect influences the outcome more strongly in this case. Whereas price information on the
second market plays only a minor role for bidding strategies on the day-ahead market, on the
balancing power market agents mainly base their bid decision on their opportunity costs. High
prices on the day-ahead market mean high foregone profits for an agent that commits its capacity
for minute reserve purposes. So, when prices are high on the day-ahead market, agents tend to bid
higher on the balancing power market as well.

               Average             Min                 Max                 SD
Agent name     price    volume     price    volume     price    volume     price    volume
Generator1     22.26    262.25     0.00     0.00        60.00   300.00     14.56    51.35
Generator2     20.53    244.34     0.00     0.00        80.00   300.00     16.40    62.95
Generator3     28.64    237.32     0.00     0.00        80.00   300.00     19.88    75.16
Generator4     40.66    214.74     0.00     0.00        90.00   300.00     24.59    81.40
Generator5     49.06    187.08     0.00     0.00       100.00   300.00     27.25    98.87
Generator6     22.67    252.60     0.00     0.00        65.00   300.00     14.97    64.23
Generator7     23.78    247.79     0.00     0.00        84.48   300.00     17.59    60.26
Generator8     26.92    234.00     0.00     0.00       100.00   300.00     20.27    74.70
Generator9     39.49    209.56     0.00     0.00       100.00   300.00     25.32    81.95
Generator10    46.33    177.95     0.00     0.00       100.00   300.00     69.61    93.08
Table 4: Individual agents' bid prices and volumes on the day-ahead market (simulation scenario:
SD = standard deviation

The individual agents' bidding decisions on the day-ahead market are summarised in Table 4. It can
be seen that agents with low marginal costs realise their strategic advantage by bidding at lower
prices on average. This ensures that their bids are accepted with a higher probability, so they are
able to sell more electricity. It can also be seen that agents with high marginal costs tend to bid
less capacity into the market. This can be interpreted as a withholding strategy, which is applied in
order to raise the market price. According to the observed bidding strategies, no-load costs do not
play a significant role in the bidding decision on the day-ahead market; agents
whose generation unit characteristics only differ in no-load costs tend to apply very similar bidding
strategies.
Table 5 shows the individual bidding decisions on the balancing power market. Only capacity bid
prices are displayed. It can be seen that on average, most agents bid higher prices in a market with
pay-as-bid pricing as compared to a uniform-price market. However, this rise in bid prices does not
lead to higher market clearing prices in a pay-as-bid setting (see results for the balancing power
market in Table 3). The average gain agents attain from receiving their higher bid prices under
pay-as-bid does not outweigh the gain that infra-marginal bidders have from receiving the uniform
price, which corresponds to the highest accepted bid. It is also interesting to note that generators
with lower marginal cost tend to bid higher prices in the balancing power market; this, too, is due to
the strong influence of opportunity costs on the bidding decision.

               $p^{balance,cap}_{i,u}$, BalanceDayAhead_uniform     $p^{balance,cap}_{i,u}$, BalanceDayAhead_payAsBid
Agent name     Average   Min      Max      SD                       Average   Min      Max      SD
Generator1     432.51    275.00   500.00   49.75                    431.67    250.00   500.00   54.55
Generator2     434.21    275.00   500.00   49.33                    432.65    275.00   500.00   50.28
Generator3     431.21    275.00   500.00   56.14                    433.75    300.00   500.00   54.81
Generator4     424.47    300.00   500.00   50.71                    427.13    275.00   500.00   51.53
Generator5     408.51    200.00   500.00   73.87                    419.79    200.00   500.00   66.82
Generator6     402.97     75.00   500.00   87.31                    407.71    150.00   500.00   77.24
Generator7     414.54    100.00   500.00   72.26                    417.59    200.00   500.00   63.20
Generator8     412.22     25.00   500.00   90.86                    416.26    100.00   500.00   79.56
Generator9     397.46    100.00   500.00   85.92                    400.72     50.00   500.00   79.33
Generator10    388.59     75.00   500.00   99.77                    397.47    100.00   500.00   86.54
Table 5: Individual agents' capacity bid prices on the balancing power market (simulation scenarios:
BalanceDayAhead_uniform and BalanceDayAhead_payAsBid)
Whether pay-as-bid or uniform pricing leads to lower market prices is debated
in the literature (e.g. [Kahn et al. 2001], [Rassenti, Smith, Wilson 2003], [Bower, Bunn
2001], [Xiong, Okuma, Fujita 2004]). Some researchers argue that pay-as-bid leads agents to bid at
higher prices, resulting in higher average market prices. Others, in contrast, argue that it leads to
lower overall prices, because infra-marginal agents receive lower prices than they would under
uniform pricing. Our simulation results suggest that pay-as-bid does in fact result in higher bid
prices, but the bid price increase is not high enough to result in higher overall prices.
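
The difference between the two pricing rules can be made concrete with a small Python sketch. The
merit-order clearing, the tie-breaking by input order and all names are assumptions made for
illustration; the bids are invented and kept identical under both rules, whereas in the simulation
agents learn different bids under each rule.

```python
def clear_market(bids, demand):
    """Accept the cheapest bids until demand is covered.

    bids: list of (name, price, quantity). Returns (name, price, accepted quantity).
    Assumption: ties are broken by input order (sorted() is stable).
    """
    accepted = []
    remaining = demand
    for name, price, quantity in sorted(bids, key=lambda bid: bid[1]):
        if remaining <= 0:
            break
        take = min(quantity, remaining)
        if take > 0:
            accepted.append((name, price, take))
            remaining -= take
    return accepted


def revenues(accepted, rule):
    """Revenue per bidder under 'uniform' or 'payAsBid' pricing."""
    if not accepted:
        return {}
    if rule == "uniform":
        # Every accepted bidder receives the highest accepted bid price.
        marginal_price = max(price for _, price, _ in accepted)
        return {name: marginal_price * qty for name, _, qty in accepted}
    if rule == "payAsBid":
        # Every accepted bidder receives its own bid price.
        return {name: price * qty for name, price, qty in accepted}
    raise ValueError("rule must be 'uniform' or 'payAsBid'")
```

With bids of 300 MW each at prices 10, 20 and 30 and a demand of 500 MW, the first bid is accepted
in full and the second for 200 MW. Under uniform pricing the infra-marginal first bidder earns
20 x 300 = 6000, whereas under pay-as-bid it earns only 10 x 300 = 3000; the marginal bidder earns
4000 in both cases. This is the infra-marginal gain that pay-as-bid bidders have to compensate
through higher bids.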
5. Conclusions and outlook
In this paper we modelled different scenarios with sequentially cleared electricity markets and two
different pricing mechanisms. We applied agent-based simulation to evaluate market outcomes for
the tested scenarios and to analyse market interrelationships. Generators are modelled as
adaptive agents that apply a reinforcement learning algorithm which allows them to iteratively
approximate their profit-maximising strategy. Within the tested scenarios, we differentiate between
bidding strategies in a day-ahead market with physical settlement, and strategies in a day-ahead
balancing power market. We shift the order of market execution and vary the pricing mechanism
from pay-as-bid to uniform price. Simulations for each setting are repeated over 100 runs in order to
level out the influence of the random number seeds taken for each learning instance.
We find that prices on the day-ahead market are higher if this market is cleared after the balancing
power market. We argue that this is because competition is weaker in this case, as some
agents have already committed (part of) their capacity on the balancing power market. The weaker
competition enables agents to successfully bid higher mark-ups on their marginal costs.
Results on the balancing power market give a different picture: prices are lower if this market is
cleared second. Here, the effect of agents integrating their opportunity costs into their result
evaluation leads to higher prices when day-ahead prices are high, and lower prices when day-ahead
prices are low. As for the pricing rule on the balancing power market, we find the following result:
average prices are higher under uniform pricing than under pay-as-bid, although agents bid at higher
prices under pay-as-bid. The increase in bid prices is outweighed by the effect of all infra-marginal
bidders receiving the marginal, i.e. the highest accepted, bid price under uniform pricing.
The findings presented in this paper leave us confident that agent-based simulation can reproduce
realistic market outcomes. It is therefore a suitable tool for market design questions, because
different market mechanisms can be tested under exactly the same conditions of agent learning. In
order to make agent-based simulation a useful market design tool, we will include more realistic
data characterising the electricity system in our model. Our simulation model is implemented in a
flexible way, so that it is also easily extensible to other forms of generation or demand
representation. For example, marginal costs can also take a linear form instead of being constant,
or agents can own several generation units and engage in combined trading strategies for all their
capacity. Hourly contracts and smaller bid blocks for the minute reserve can also be modelled with
the current implementation. Using these features to analyse more settings is the subject of our
future research.
