Developer Productivity-SPACE Framework

The SPACE of Developer Productivity
NICOLE FORSGREN, GitHub
MARGARET-ANNE STOREY, University of Victoria
CHANDRA MADDILA, THOMAS ZIMMERMANN, BRIAN HOUCK, AND JENNA BUTLER, Microsoft Research

Developer productivity is complex and nuanced, with important implications for software development teams. A clear understanding of defining, measuring, and predicting developer productivity could provide organizations, managers, and developers with the ability to make higher-quality software—and make it more efficiently.
Developer productivity has been studied extensively.
Unfortunately, after decades of research and practical
development experience, knowing how to measure
productivity or even define developer productivity has
remained elusive, while myths about the topic are common.
Far too often, teams or managers attempt to measure
developer productivity with simple metrics, trying to
capture it all with “one metric that matters.”
One important measure of productivity is personal
perception;1 this may resonate with those who claim to be
in “a flow” on productive days.
There is also agreement that developer productivity
is necessary not just to improve engineering outcomes,
but also to ensure the well-being and satisfaction of
developers, as productivity and satisfaction are intricately
connected.12,20
statisticians can capitalize upon to study, compare, and understand developer productivity across many different contexts. This forced disruption and the future transition to hybrid remote/colocated work expedites the need to understand developer productivity and well-being, with wide agreement that doing so in an efficient and fair way is critical.
This article explicates several common myths and misconceptions about developer productivity. The most
important takeaway from exposing these myths is that
productivity cannot be reduced to a single dimension (or
metric!). The prevalence of these myths and the need
to bust them motivated our work to develop a practical
multidimensional framework, because only by examining
a constellation of metrics in tension can we understand
and influence developer productivity. This framework,
called SPACE, captures the most important dimensions
of developer productivity: satisfaction and well-being;
performance; activity; communication and collaboration;
and efficiency and flow. By recognizing and measuring
productivity with more than just a single dimension, teams
and organizations can better understand how people and
teams work, and they can make better decisions.
metric or activity data alone; and it isn’t something that only managers care about. The SPACE framework was developed to capture different dimensions of productivity because without it, the myths just presented will persist. The framework provides a way to think rationally about productivity in a much bigger space and to choose metrics carefully in a way that reveals not only what those metrics mean, but also what their limitations are if used alone or in the wrong context.
Performance
Performance is the outcome of a system or process. The
performance of software developers is hard to quantify,
because it can be difficult to tie individual contributions
directly to product outcomes. A developer who produces
a large amount of code may not be producing high-
quality code. High-quality code may not deliver customer value.
evaluating the performance of any individual developer. In many companies and organizations, software is written by teams, not individuals.
For these reasons, performance is often best evaluated as outcomes instead of output. The most simplified view of software developer performance could be, Did the code written by the developer reliably do what it was supposed to do? Example metrics to capture the performance dimension include:
• Quality. Reliability, absence of bugs, ongoing service health.
• Impact. Customer satisfaction, customer adoption and retention, feature usage, cost reduction.
Activity
Activity is a count of actions or outputs completed in the
course of performing work. Developer activity, if measured
correctly, can provide valuable but limited insights about
developer productivity, engineering systems, and team
efficiency. Because of the complex and diverse activities
that developers perform, their activity is not easy to
measure or quantify. In fact, it is almost impossible to
comprehensively measure and quantify all the facets of developer activity across engineering systems and environments.
Efficiency and flow
Finally, efficiency and flow capture the ability to complete work or make progress on it with minimal interruptions or delays, whether individually or through a system. This can include how well activities within and across teams are orchestrated and whether continuous progress is being made.
Some research associates productivity with the ability to get complex tasks done with minimal distractions or interruptions.2 This conceptualization of productivity is echoed by many developers when they talk about “getting into the flow” when doing their work—or the difficulty
in finding and optimizing for it, with many books and
discussions addressing how this positive state can be
achieved in a controlled way.4 For individual efficiency
(flow), it’s important to set boundaries to get productive
and stay productive—for example, by blocking off time for
a focus period. Individual efficiency is often measured by
uninterrupted focus time or the time within value-creating
apps (e.g., the time a developer spends in the integrated
development environment is likely to be considered
“productive” time).
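To make the focus-time proxy above concrete, here is a minimal sketch; the event log, the 25-minute threshold, and the `focus_minutes` helper are all hypothetical illustrations, not anything prescribed by the article. It merges back-to-back activity intervals into blocks and counts only the uninterrupted ones:

```python
from datetime import datetime, timedelta

def focus_minutes(events, min_block=timedelta(minutes=25)):
    """Sum minutes spent in uninterrupted focus blocks.

    `events` holds (start, end) pairs for time spent in a value-creating
    app (e.g., the IDE). Overlapping or back-to-back intervals are merged;
    only merged blocks at least `min_block` long count as focus time.
    The 25-minute threshold is illustrative, not prescriptive.
    """
    if not events:
        return 0.0
    merged = []
    for start, end in sorted(events):
        if merged and start <= merged[-1][1]:   # touches/overlaps previous block
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    total = sum((e - s for s, e in merged if e - s >= min_block), timedelta())
    return total.total_seconds() / 60

# Hypothetical day: two long IDE sessions and one short check-in.
day = datetime(2021, 3, 1)
sessions = [
    (day.replace(hour=9), day.replace(hour=10, minute=30)),
    (day.replace(hour=10, minute=30), day.replace(hour=11)),  # contiguous
    (day.replace(hour=13), day.replace(hour=13, minute=10)),  # too short
    (day.replace(hour=14), day.replace(hour=16)),
]
```

Here `focus_minutes(sessions)` counts the merged 9:00-11:00 and 14:00-16:00 blocks (240 minutes) and skips the 10-minute check-in, which is the distinction between raw app time and uninterrupted focus time.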
At the team and system level, efficiency is related to
value-stream mapping, which captures the steps needed
to take software from idea and creation to delivering it to the customer.
In addition to the flow of changes through the system, the flow of knowledge and information is important. Certain aspects of efficiency and flow may be hard to measure, but it is often possible to spot and remove inefficiencies in the value stream. Activities that produce
no value for the customer or user are often referred to as
software development waste19—for example, duplicated
work, rework because the work was not done correctly, or
time-consuming rote activities.
Some example metrics to capture the efficiency and
flow dimension are:
• Number of handoffs in a process; number of handoffs across different teams in a process.
• Perceived ability to stay in flow and complete work.
• Interruptions: quantity, timing, how spaced, impact on development work and flow.
• Time measures through a system: total time, value-added time, wait time.
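The time measures in that last bullet can be sketched as follows. The stage names and timestamps are hypothetical, and the sketch assumes that gaps between stages are pure waiting, a simplification:

```python
from datetime import datetime

def flow_times(stages):
    """Return (total, value_added, wait) hours for one item's value stream.

    `stages` is an ordered list of (name, work_started, work_ended); time
    between stages is treated as waiting, a simplifying assumption.
    """
    total = (stages[-1][2] - stages[0][1]).total_seconds() / 3600
    value_added = sum((end - start).total_seconds() / 3600
                      for _, start, end in stages)
    return total, value_added, total - value_added

# Hypothetical record: a change spends far longer waiting than being worked on.
stages = [
    ("develop", datetime(2021, 3, 1, 9),  datetime(2021, 3, 1, 17)),
    ("review",  datetime(2021, 3, 3, 10), datetime(2021, 3, 3, 12)),
    ("deploy",  datetime(2021, 3, 4, 15), datetime(2021, 3, 4, 16)),
]
```

For this record, total time is 79 hours but only 11 are value-added; that gap between elapsed time and working time is exactly what value-stream mapping is meant to expose.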
Efficiency is related to all the SPACE dimensions.
Efficiency at the individual, team, and system levels has
been found to be positively associated with increased
satisfaction. Higher efficiency, however, may also
negatively affect other factors. For example, maximizing
FRAMEWORK IN ACTION
To illustrate the SPACE framework, figure 1 lists concrete
metrics that fall into each of the five dimensions. The
figure provides examples of individual-, team- or group-,
and system-level measures. Three brief discussions about
these metrics follow: First, an example set of metrics
concerning code review is shown to cover all dimensions of
the SPACE framework, depending on how they are defined
and proxied. Next, additional examples are provided for
two select dimensions of the framework: activity, and
efficiency and flow. The section closes with a discussion
of how to use the framework: combining metrics for a
holistic understanding of developer productivity, as well
as cautions. The accompanying sidebar shows how the
framework can be used for understanding productivity in
incident management.
Let’s begin with code review as an example scenario that
presents a set of metrics that can cover all five dimensions
of the SPACE framework, depending on how it is framed
and which metric is used:
• Satisfaction. Perceptual measures about code reviews can reveal whether developers view the work in a good or bad light—for example, if they present learning, mentorship, or opportunities to shape the codebase. This is important,
[FIGURE 1: Example metrics. The figure lists example metrics for each of the five SPACE dimensions (satisfaction and well-being; performance; activity; communication and collaboration; efficiency and flow) at the individual, team- or group-, and system levels.]
† Use these metrics with (even more) caution — they can proxy more things.
both the team and system levels. In this example, as the teams continue to improve and iterate, they could exchange the activity metric lines of code for something like number of commits.

WHAT TO WATCH FOR
Having too many metrics may also lead to confusion and lower motivation; not all dimensions need to be included for the framework to be helpful. For example, if developers and teams are presented with an extensive list of metrics and improvement targets, meeting them may feel like an unattainable goal. With this in mind, note that a good measure of productivity consists of a handful of metrics across at least three dimensions; these can prompt a holistic view, and they can be sufficient to evoke improvement.

should be cognizant of developer privacy and report only anonymized, aggregate results at the team or group level. (In some countries, reporting on individual productivity isn’t legal.) Individual-level productivity analysis, however, may be insightful for developers. For example, previous research shows that typical developer work shifts depend on the phase of development, and developers may have more productive times of day.14 Developers can opt in to these types of analyses, gaining valuable insights to optimize their days and manage their energy.
Finally, any measurement paradigm should check for biases and norms. These are external influences that may shift or influence the measures. Some examples are included here, but they aren’t exhaustive, so all teams are encouraged to look for and think about external influences

SIDEBAR: The SPACE Framework in Action in Incident Management
The SPACE framework is relevant for SREs (site reliability engineers) and their work in IM (incident management). An incident occurs when a service is not available or is not performing as defined in the SLA (service-level agreement). An incident can be caused by network issues, infrastructure problems, hardware failures, code bugs, or configuration issues, to name a few.
Based on the magnitude of the impact caused by an incident, it is typically assigned a severity level (sev-1 being the highest). An outage to the entire organization’s customer-facing systems is treated differently than a small subset of internal users experiencing a delay in their email delivery.
Here are some of the common myths associated with IM:
• Myth: Number of incidents resolved by an individual is all that matters. Like a lot of other activities in the SDLC (software development life cycle), IM is a team activity. A service that causes a lot of outages and takes more hours to restore reflects badly on the entire team that develops and maintains the service. More team-focused activities such as knowledge sharing, preparing troubleshooting guides to aid other

factors of the environment and work culture have substantial impact too. Mentoring new members of the team and morale building are important. If developers are constantly being paged in the night for sev-1 incidents while working from home during Covid-19, these “invisible” factors are especially helpful to make them more productive.
Incident management is a complex process that involves various stakeholders performing several individual and team activities, and it requires support from different tools and systems, so it is critical to identify metrics that can capture various dimensions of productivity:
• Satisfaction: how satisfied SREs are with the IM process, escalation and routing, and on-call rotations are key metrics to capture, especially since burnout is a significant issue among SREs.
• Performance: these measures focus on system reliability; monitoring systems’ ability to detect and flag issues faster, before they hit the customer and become an incident. MTTR (mean time to repair) overall, and by severity.
• Activity: number of issues caught by the monitoring systems, number of incidents created, number of incidents resolved—and their severity distribution.
• Communication and collaboration: people included in resolving the incident, how many teams those people came from, and how they
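The MTTR metric from the performance dimension above can be sketched in a few lines; the incident records, severity labels, and the `mttr_hours` helper are hypothetical illustrations rather than anything the article prescribes:

```python
from collections import defaultdict
from datetime import datetime

def mttr_hours(incidents):
    """Mean time to repair in hours, overall and broken down by severity."""
    by_sev = defaultdict(list)
    for sev, opened, resolved in incidents:
        by_sev[sev].append((resolved - opened).total_seconds() / 3600)
    repair_times = [t for times in by_sev.values() for t in times]
    overall = sum(repair_times) / len(repair_times)
    return overall, {sev: sum(ts) / len(ts) for sev, ts in by_sev.items()}

# Hypothetical incident log: (severity, opened, resolved).
incidents = [
    ("sev-1", datetime(2021, 3, 1, 2),  datetime(2021, 3, 1, 3)),
    ("sev-2", datetime(2021, 3, 2, 9),  datetime(2021, 3, 2, 13)),
    ("sev-1", datetime(2021, 3, 5, 22), datetime(2021, 3, 6, 1)),
]
```

Reporting MTTR by severity as well as overall keeps one slow, low-severity investigation from masking fast sev-1 response, which is why the sidebar calls for both views.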
Acknowledgments
We are grateful for the thoughtful review and insightful
comments from our reviewers and are confident that
incorporating their notes and responses has strengthened
the paper. We are excited to have it published in acmqueue.
References
1. Beller, M., Orgovan, V., Buja, S., Zimmermann, T.
2020. Mind the gap: on the relationship between
automatically measured and self-reported productivity.
IEEE Software; https://arxiv.org/abs/2012.07428.
2. Brumby, D. P., Janssen, C. P., Mark, G. 2019.
How do interruptions affect productivity? In
Rethinking Productivity in Software Engineering,
ed. C. Sadowski and T. Zimmermann, 85-107.
Berkeley, CA: Apress; https://link.springer.com/
chapter/10.1007/978-1-4842-4221-6_9.
3. Butler, J. L., Jaffe, S. 2020. Challenges and gratitude: a