Coding On Copilot 2024 Developer Research
GitHub has published several pieces of research on the growth and impact
of AI on software development. Among their findings is that developers
write code “55% faster” when using Copilot. This profusion of
LLM-generated code raises the question: how does the code quality and
maintainability compare to what would have been written by a human? Is
it more similar to the careful, refined contributions of a Senior Developer, or
more akin to the disjointed work of a short-term contractor?
From Dohmke's 2023 blog post, "The economic impact of the AI-powered developer lifecycle and
lessons from GitHub Copilot"
In the same post, Dohmke asserts that more than 20,000 organizations are already
using GitHub Copilot for Business. This follows GitHub’s announcement from
February 2023 that "more than one million people" were already using Copilot on a
Personal license when Copilot for Business was released. GitHub has been making
commendable progress both on advancing AI quality and on being transparent about
the results of those efforts.
GitHub claims that code is written "55% faster" with Copilot. But what about code that
shouldn't be written in the first place? The problem here is that code spends 10x more
time being read than being written, according to Robert Martin, author of Clean Code:
A Handbook of Agile Software Craftsmanship. Writing bad code faster means
considerable pain for its subsequent readers.
That is the first of many challenges facing developers who use an AI assistant. Others
include:
1. Being inundated with suggestions to add code, but never suggestions to
update, move, or delete code. This is a user interface limitation of
the text-based environments where code authoring occurs.
These drawbacks may explain why Junior Developers accept code suggestions more
readily than their more experienced counterparts. According to GitHub’s research:
GitHub's own data suggests that Junior Developers use Copilot around 20% more than experienced
developers
Experienced developers have the most informed understanding of how costly code
will be to maintain over time. If they are more averse to accepting AI suggestions, that
raises questions about the extra code that Junior Developers are now contributing
faster than ever.
Code Change Definitions
To analyze how code quality is changing, we will review the differences in types of
code changes observed in 2023 vs. the years prior, when AI was much less prevalent.
GitClear classifies code changes (operations) into seven categories. The first six
operations are analyzed in this research:
1. Added code. Newly committed lines of code that are distinct, excluding lines
that incrementally change an existing line (labeled "Updates"). "Added code"
also does not include lines that are added, removed, and then re-added (these
lines are labeled as "Updated" and "Churned")
2. Deleted code. Lines of code that are removed, committed, and not
subsequently re-added for at least the next two weeks.
3. Moved code. A line of code that is cut and pasted to a new file, or a new
function within the same file. By definition, the content of a "Moved" operation
doesn't change within a commit, except for (potentially) the white space that
precedes the content.
4. Updated code. A committed line of code, based on an existing line, that
modifies that existing line by approximately three words or fewer.
5. Find/Replaced code. A pattern of code change where the same string is
removed from 3+ locations and substituted with consistent replacement content.
6. Copy/Pasted code. Identical line contents, excluding programming language
keywords (e.g., end, }, ]), that are committed to multiple files or functions
within a commit.
7. No-op code. Trivial code changes, such as changes to white space, or changes
in line number within the same code block. No-op code is excluded from this
research.
Specific examples of GitClear's code operations can be found in the Diff Delta
documentation. GitClear has been classifying git repos by these operations since
2020. As of January 2024, GitClear has analyzed and classified around a billion lines
of code over four years, from a mixture of commercial customers (e.g., NextGen
Health, Verizon) and popular open source repos (e.g., Facebook React, Google
Chrome). Of these, 153 million lines were meaningful (non-No-op) changes and form
the basis of this research.
Along with the evolution of code change operations, we are also exploring the change
in "Churned code." This is not treated as a code operation, because a churned line
can be paired with many operations, including "Added," "Deleted," or "Updated" code.
For a line to qualify as "churned," it must have been authored, pushed to the git repo,
and then reverted or substantially revised within the subsequent two weeks. Churn is
best understood as "changes that were either incomplete or erroneous when the
author initially wrote, committed, and pushed them to the company’s git repo."
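In pseudocode terms, the churn test can be sketched as follows. This is an illustrative
simplification in Ruby, not GitClear's implementation; the revisions_of, reverts?, and
substantially_revises? helpers are hypothetical stand-ins:

# Sketch of the two-week churn test described above.
# A pushed line counts as churned if some later revision reverts or
# substantially revises it within 14 days of when it was authored.
CHURN_WINDOW = 14 * 24 * 60 * 60 # two weeks, in seconds

def churned?(line)
  revisions_of(line).any? do |revision|
    (revision.authored_at - line.authored_at) <= CHURN_WINDOW &&
      (revision.reverts?(line) || revision.substantially_revises?(line))
  end
end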
Here is how these look in graph form, where the left axis shows the prevalence of each
code change operation (percentages that sum to 100%). The right axis and light
blue line track the corresponding change in "Churned" code:
The projections for 2024 utilize OpenAI's gpt-4-1106-preview Assistant to run a
quadratic regression on existing data. The full method used to interrogate the OpenAI
Assistant is provided in the Appendix. Given the exponential growth of Copilot
reported by GitHub, and AI Assistants in general, it seems likely that 2024's numbers
will continue the trends that began to take form in 2022 and accelerated in 2023.
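For readers who would rather reproduce the projection without an LLM in the loop, an
ordinary least-squares quadratic fit gives a similar result. The sketch below uses Ruby's
standard matrix library and churn rates approximated from the appendix CSV (churned
lines divided by total lines changed), so the exact projected value may differ slightly
from the chart:

require "matrix"

# Approximate yearly churn rates derived from the appendix CSV
# (churned lines / lines changed); exact inputs may differ slightly.
years = [2020, 2021, 2022, 2023]
churn = [0.033, 0.036, 0.040, 0.055]

x = years.map { |y| y - 2020 }                  # center years to keep the numbers small
design = Matrix[*x.map { |v| [1.0, v, v**2] }]  # columns: 1, x, x^2
y_vec  = Vector[*churn]

# Ordinary least squares via the normal equations: beta = (X'X)^-1 X'y
a, b, c = ((design.transpose * design).inverse * design.transpose * y_vec).to_a

projected_2024 = a + b * 4 + c * 4**2
puts format("Projected 2024 churn: %.1f%%", projected_2024 * 100) # ~7% with these inputs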
Looking only at the differences in operation frequency between 2022 and 2023, we
find three red flags for code quality:
Added: +3.1%
Deleted: +4.8%
Updated: +5.2%
Moved: -17.3%
Copy/Pasted: +11.3%
Find/Replaced: -1.3%
Churn: +39.2%
Interpreting Code Operation Changes
The most significant changes observed in 2023 are to "Churn," "Moved," and
"Copy/Pasted" code. The implications for each change are reviewed in this section.
Burgeoning Churn
Recall that "Churn" is the percentage of code that was pushed to the repo, then
subsequently reverted, removed or updated within 2 weeks. This was a relatively
infrequent outcome when developers authored all their own code -- only 3-4% of code
was churned prior to 2023, although there is also a hint of the coming uptick in 2022,
when Churn jumped 9%. 2022 was the first year Copilot was available in beta, and the
year that ChatGPT became available.
Over 2022-2023, the rise of AI Assistants correlates strongly with "mistake code"
being pushed to the repo. If we assume that Copilot prevalence was 0% in 2021,
5-10% in 2022, and 30% in 2023 (corresponding to citations [1] and [8]), the Pearson
correlation coefficient between these variables is 0.98 (see “Correlation between
Churn & Copilot” in the Appendix for details on the calculation). Which is to say, the
two have grown in tandem.
The more commonplace Churn becomes, the greater the risk of mistakes being
deployed to production. If the current pattern continues into 2024, more than 7% of all
code changes will be churned within two weeks, double the rate of 2021. Based on
this data, we expect to see an increase in Google DORA's "Change Failure Rate"
when the “2024 State of DevOps” report is released later in the year, contingent on
that research using data from AI-assisted developers in 2023.
Turning to the rise in Copy/Pasted code: since most developers derive greater
satisfaction from "implementing new features" than from "interpreting potentially
reusable code," copy/pasted code often persists
long past its expiration date. Especially on less experienced teams, there may be no
code maintainer with the “moral authority” to remove the duplicative code. Even when
there are Senior Developers possessing such authority, the willpower cost of
understanding code well enough to delete it is hard to overstate.
If there isn't a CTO or VP of Engineering who actively schedules time to reduce tech
debt, you can add "executive-driven time pressures" to the list of reasons that newly
added copy/paste code will never be consolidated into the component libraries that
underpin long-term development velocity.
Since GitClear's Copy/Pasted operation only counts code that is duplicated within a
single commit, the 11% copy/paste measured in 2023 is likely only a fraction of the
total pasting being quietly seeded into repos during 2024.
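For concreteness, a simplified version of the within-commit check described in
operation 6 might look like the sketch below. The commit.added_lines, file_path, and
function_name accessors are hypothetical stand-ins, and the keyword filter is far less
complete than a production implementation would require:

# Group a commit's added lines by content and keep only lines whose exact
# content appears in more than one file or function within the same commit,
# ignoring keyword-only lines such as "end", "}", or "]".
KEYWORD_ONLY = /\A\s*(end|\}|\)|\])\s*\z/

def copy_pasted_lines(commit)
  commit.added_lines
        .reject { |line| line.content.strip.empty? || line.content.match?(KEYWORD_ONLY) }
        .group_by { |line| line.content.strip }
        .select { |_content, lines| lines.map { |l| [l.file_path, l.function_name] }.uniq.size > 1 }
        .values
        .flatten
end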
Trends in Revised Code Age
A second, independent means of assessing how code quality changed in 2023 versus
prior years is to analyze data from GitClear's Code Provenance derivation. A “Code
Provenance” assessment evaluates how much time passes between when code is
authored and when it is subsequently updated or deleted.
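As an illustration of the concept (again, a sketch rather than GitClear's implementation,
with hypothetical authored_at and replaced_at fields), replaced lines can be bucketed by
the age they reached before being touched again:

# Bucket boundaries in days, matching the age ranges compared in this section.
AGE_BUCKETS = {
  "Less than 2 weeks" => 14,
  "Less than 1 month" => 30,
  "Less than 1 year"  => 365,
  "1-2 years"         => 730
}

def provenance_bucket(authored_at, replaced_at)
  age_in_days = (replaced_at - authored_at) / 86_400.0
  label, _limit = AGE_BUCKETS.find { |_label, max_days| age_in_days < max_days }
  label || "Older than 2 years"
end

# Share of replaced lines falling into each age bucket
def provenance_distribution(replaced_lines)
  counts = replaced_lines.group_by { |line| provenance_bucket(line.authored_at, line.replaced_at) }
  total  = replaced_lines.size.to_f
  counts.transform_values { |lines| lines.size / total }
end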
The age buckets compared are: less than 2 weeks, less than one month, less than one
year, and 1-2 years.
The corresponding raw numbers are in the Appendix. Graphing the data, we find:
Interpreting Code Age Trends
Code Provenance data corroborates the patterns observed in the Code Operation
analysis. The age of code at the time it is replaced shifted markedly younger from 2022
to 2023. Specifically, code replaced less than two weeks after being written jumped by
10%. Meanwhile, code older than one month was changed 24% less frequently in 2023
vs. 2022 (19.4% of all changes vs. 25.5% previously).
The trend implies that, prior to AI Assistants, developers may have been more likely to
find recently authored code in their repo to target for refinement and reuse. Around
70% of products built in the early 2020s use the Agile Methodology, per a Techreport
survey [5]. In Agile, features are typically planned and executed per-Sprint. A typical
Sprint lasts 2-3 weeks. It aligns with the data to surmise that teams circa 2020 were
more likely to convene post-Sprint to discuss what was recently implemented and
how to reuse it in an upcoming Sprint.
Another salient question in light of this data: at what rate does development progress
become inhibited by additional code? Especially when it comes to copy/pasted code,
there is almost certainly an inverse correlation between "the number of lines of code in
a repo" and "the velocity at which developers can modify those lines." The current
uncertainty is "when is the accumulated copy/paste tech debt too great to ignore?"
Knowing the rate at which slowdown takes hold would allow future tools to highlight
when a manager should consider cutting back time on new features.
A final question worthy of exploration: what is the total percentage of copy/pasted
code that is now occurring, compared to 2020-2022? Since GitClear currently
measures only copy/paste code within the context of an individual commit, it seems
likely that the total copy/paste volume (all non-keyword, non-comment lines of code
repeated within a file) might be double what GitClear currently measures. Could
copy/paste really represent 20-25% of all code operations in 2024?
GitClear will look to address these questions in future research, and we encourage
other researchers in the field to contribute their data. If you would like to partner with
GitClear to undertake further research, our contact information is in the Appendix.
Conclusion: Devs Wary for a Reason?
By both measures we evaluated, negative pressures on code quality were present in
2023. This correlates with the proliferation of LLMs in general, and AI Code Assistants
in particular.
Developer assessments, like GitHub's 2023 survey with Wakefield Research, hint that
developers already perceive the decrease in code quality. When asked "What metrics
should you be evaluated on, absent AI," their top response was "Collaboration and
Communication," followed by "Code Quality" in second place.
When the question switched to "What metrics should you be evaluated on, when
actively using AI?" their responses shifted, with "Code Quality" now the top concern,
and "Number of Production Incidents" rising to the #3 concern:
While individual developers lack the data to substantiate why "Code Quality" and
"Production Incidents" become more pressing concerns with AI, our data suggests a
possible backstory: When developers are inundated with quick and easy suggestions
that will work in the short term, it becomes a constant temptation to add more lines of
code without really checking whether an existing system could be refined for reuse.
How will AI Assistants and Copilot transform what it means to be a developer? There's
no question that, as AI has proliferated, we have entered an era where code lines are
added faster than ever. The better question for 2024: who's on the hook to clean up
what’s left afterward?
Citations
1. The economic impact of the AI-powered developer lifecycle
and lessons from GitHub Copilot [GitHub]
Here are some secondary characteristics of the data set analyzed, to aid in evaluating its
validity and applicability relative to existing data sets the reader may possess: commit
count, committer count, repos analyzed, and code files changed, broken out by year.
In CSV pasteable form, for your reanalysis convenience (2024 omitted since it is a projection you can
replace with your own):
Year,Added,Deleted,Updated,Moved,Copy/pasted,Find/replaced,Lines changed,Churn
2020,9071731,4508098,1202480,5786718,1911855,676000,23156882,769493
2021,14464864,6969778,1826579,9043649,3087530,1234213,36626613,1331278
2022,16868378,8280031,2146768,8407677,3873240,1512708,41088802,1630703
2023,22626714,11288962,2938800,9040659,5607373,1942194,53444702,2952912
Queries used to produce data
The data was stored in a Postgres database and was queried via Ruby on Rails' ActiveRecord.
# Operation by year
2020.upto(2023).map do |year|
  CodeLine.where(
    authored_at: Time.new(year, 1, 1)..Time.new(year + 1, 1, 1),
    commit_impacting: true
  ).group(:operation_em).count
end

# Commits by year
Commit.impacting.where(
  authored_at: Time.local(2020, 1, 1)..Time.local(2024, 1, 1)
).group("EXTRACT (year from authored_at)").count

# Committers committing by year
2020.upto(2023).map do |year|
  Committer.joins(:commits).merge(Commit.impacting).where(
    commits: { authored_at: Time.local(year, 1, 1)..Time.local(year + 1, 1, 1) }
  ).group(:id).count.size
end

# Repos changed by year
2020.upto(2023).map do |year|
  Repo.joins(:commits).merge(Commit.impacting).where(
    commits: { authored_at: Time.local(year, 1, 1)..Time.local(year + 1, 1, 1) }
  ).group(:id).count.size
end

# Files by year
2020.upto(2023).map do |year|
  CommitCodeFile.impacting.joins(:commit).where(
    commits: { authored_at: Time.local(year, 1, 1)..Time.local(year + 1, 1, 1) }
  ).group(:id).count.size
end
1. Added code
2. Updated code
3. Deleted code
4. Copy/pasted code
5. Find/replaced code
6. Moved code
7. No-op code
GitClear’s data is split about two-thirds private corporations that have opted in to anonymized data
sharing, and one-third open source projects (mostly those run by Google, Facebook, and Microsoft).
In addition to the code operation data, GitClear’s data set also segments and excludes lines if they
exist within auto-generated files, subrepo commits, and other exclusionary criteria enumerated in this
documentation. As of January 2024, that documentation suggests that a little less than half of the
“lines changed” by a conventional git stats aggregator (e.g., GitHub) would qualify for analysis among
the 150m lines in this study. The study does include commented lines – future research could
compare comment vs. non-comment lines. It could also compare “test code” vs “other types of code,”
which probably influences the levels of copy/paste.
If you know of other companies that report code operations of comparable granularity, please contact
hello@gitclear.com and this section will be updated, and a new PDF document will be uploaded with
credit given to the contributor (if desired).
GitClear offers reports that address all three of these areas (operations recognized, the
operation report, and the provenance report). It also offers a Tech Debt browser, and the
ability to enable email notifications when quality trends downward.
The first year is 2020, the last year is 2023. The estimate of 30% is taken from the independent
O’Reilly Publishing survey in citation [8], stating that less than a third of developers claim to currently
use Copilot or ChatGPT for coding as of August 2023. The 10% estimate is based on GitHub’s blog
post in [1] stating that Copilot launched a year earlier, and had grown to include “millions of
developers.” If we guess that the prevalence of Copilot was only 5% in 2022, there is no significant
change to the Pearson coefficient.
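The coefficient can be reproduced with a short script like the one below. The prevalence
values are the assumptions described above, and the churn shares are approximations
derived from the CSV earlier in the Appendix, so a precise rerun may land a point or two
away from 0.98:

# Pearson correlation between assumed Copilot prevalence and churn share, 2020-2023.
def pearson(xs, ys)
  n  = xs.size.to_f
  mx = xs.sum / n
  my = ys.sum / n
  covariance = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  variance_x = xs.sum { |x| (x - mx)**2 }
  variance_y = ys.sum { |y| (y - my)**2 }
  covariance / Math.sqrt(variance_x * variance_y)
end

copilot_prevalence = [0.0, 0.0, 0.075, 0.30]       # 2020-2023, assumed per citations [1] and [8]
churn_share        = [0.033, 0.036, 0.040, 0.055]  # 2020-2023, approximated from the CSV above

puts pearson(copilot_prevalence, churn_share).round(2) # => roughly 0.98-0.99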
Updates
● Jan 26, 2024: Improved clarity and consonance between the data and the language
describing it. Added contact information.
Contact information
If you would like to discuss this research, or have ideas on how to improve it, please contact
hello@gitclear.com or bill@gitclear.com directly. We are happy to consider improvements to the clarity
of this writing, or to explain how GitClear can help teams measure the metrics explored by this
research.