METR Study Finds AI Coding Tools Reduce Developer Productivity by 19 Percent


A randomized controlled trial by METR found that experienced open-source developers completed tasks 19% slower when using AI coding assistants, contradicting both developer expectations and industry claims about AI productivity gains.

Executive Brief

Research organization METR published results on July 10, 2025, from a randomized controlled trial measuring the impact of AI coding assistants on developer productivity. The study found that experienced open-source developers completed tasks 19% slower when using AI tools compared to working without them.

The research involved 16 developers with an average of five years of experience on their respective open-source projects. Participants completed 246 real issues from their own repositories, with random assignment determining whether AI assistance was available for each task. Developers used Cursor Pro with Claude 3.5 and 3.7 Sonnet models.

The findings contradict widespread assumptions about AI coding tool benefits. Before the study, participating developers estimated they would be 20% faster with AI assistance. Industry experts surveyed by METR predicted even larger gains of 38-39%. The actual result showed a 19% slowdown, representing a gap of nearly 40 percentage points between expectations and measured outcomes.

METR, which focuses on AI safety and capability evaluation, designed the study to address limitations in previous productivity research. Prior studies often relied on self-reported metrics, artificial tasks, or developers unfamiliar with the codebases they worked on. By using real issues on projects where developers had years of experience, the study aimed to measure productivity in conditions closer to professional software development.

The research has generated significant discussion in the software development community, with the Hacker News thread accumulating 775 points and 485 comments within hours of publication.

What Happened

METR released the study results on July 10, 2025, through both a blog post and a preprint paper on ArXiv. The research was conducted in early 2025 with results analyzed over subsequent months.

The study recruited 16 developers who were active contributors to open-source projects. Each developer averaged five years of experience working on their specific project, giving them deep familiarity with the codebase, its architecture, and its development patterns.

Researchers identified 246 real issues from the participants' repositories. For each issue, a randomization process determined whether the developer would work with or without AI assistance. Developers in the AI-assisted condition used Cursor Pro, a commercial AI coding environment, with access to Claude 3.5 and Claude 3.7 Sonnet language models.

The primary metric was task completion time. Researchers measured how long developers took to resolve each issue, comparing times between AI-assisted and non-assisted conditions.

Results showed that developers using AI tools took 19% longer on average to complete tasks. The confidence interval for this estimate ranged from 4% to 38% slower; because the entire interval sits above zero, the slowdown is statistically significant despite the relatively small number of participants.
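
As an illustration of how an interval like this can be derived, the sketch below bootstraps a slowdown estimate over per-task completion times. Everything in it is synthetic: the timing data is invented and the ratio-of-means estimator is a generic stand-in, not METR's actual analysis.

```python
# Illustrative percentile bootstrap for a slowdown ratio on synthetic
# per-task times. All numbers are invented; METR's actual estimator and
# data are described in the paper.
import random

random.seed(0)

# Hypothetical completion times in minutes (n roughly matching 246 tasks).
ai_times = [random.lognormvariate(4.1, 0.6) for _ in range(120)]
no_ai_times = [random.lognormvariate(3.9, 0.6) for _ in range(126)]

def slowdown(ai: list[float], base: list[float]) -> float:
    """Ratio of mean AI-assisted time to mean unassisted time, minus one."""
    return (sum(ai) / len(ai)) / (sum(base) / len(base)) - 1

reps = []
for _ in range(5000):  # resample tasks with replacement
    reps.append(slowdown(random.choices(ai_times, k=len(ai_times)),
                         random.choices(no_ai_times, k=len(no_ai_times))))
reps.sort()
lo, hi = reps[int(0.025 * len(reps))], reps[int(0.975 * len(reps))]
print(f"slowdown: {slowdown(ai_times, no_ai_times):+.0%} "
      f"(95% bootstrap CI {lo:+.0%} to {hi:+.0%})")
```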

Prior to beginning the study, researchers surveyed participants about their expectations. Developers predicted they would complete tasks 20% faster with AI assistance. METR also surveyed AI industry experts, who predicted productivity gains of 38-39%.


Key Claims and Evidence

The study's central finding is a 19% increase in task completion time when using AI coding assistants. According to the METR paper, this result held across different types of tasks and developer experience levels within the sample.

METR researchers attributed the slowdown to several factors observed during the study. Developers spent significant time reviewing, editing, and debugging AI-generated code. The time saved by not writing code manually was offset by the time required to verify and correct AI outputs.

The perception gap between expected and actual productivity represents a key secondary finding. Developers believed AI tools would make them 20% faster, while the measured result showed 19% slower performance. The researchers noted that this perception gap persisted even after developers completed the study, with many participants expressing surprise at the results.
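
The two numbers sit on different scales: the forecast is a speed claim and the result a time claim. Expressing both as completion-time multipliers makes the gap concrete; the conversion below treats "20% faster" as a 20% reduction in completion time, which is our illustrative convention rather than METR's definition.

```python
# Hedged arithmetic: put forecast and outcome on one scale (relative
# completion time). Reading "20% faster" as a 20% time reduction is our
# convention for illustration, not METR's definition.
baseline = 1.0                    # unassisted completion time (normalized)
expected = baseline * (1 - 0.20)  # forecast: 0.80x the unassisted time
measured = baseline * (1 + 0.19)  # result:   1.19x the unassisted time

gap = measured - expected         # 0.39 of the baseline, the ~40-point gap
print(f"expected {expected:.2f}x, measured {measured:.2f}x, gap {gap:.2f}x")
```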

The study design addressed common criticisms of prior AI productivity research. By using real issues rather than artificial benchmarks, developers worked on tasks representative of actual software development. By requiring five years of project experience, the study avoided measuring the learning curve that might affect developers new to a codebase.

METR documented that developers in the AI-assisted condition frequently accepted AI suggestions that required subsequent correction. The iterative process of generating, reviewing, and fixing AI code consumed more time than manual coding for experienced developers familiar with their projects.

Pros and Opportunities

The study provides empirical data for organizations evaluating AI coding tool investments. Rather than relying on vendor claims or anecdotal reports, decision-makers can reference controlled research when assessing potential productivity impacts.

METR's methodology offers a template for future productivity research. The combination of randomized assignment, real tasks, and experienced developers addresses limitations that have affected prior studies in this area.

The findings may help developers calibrate their expectations when using AI tools. Understanding that AI assistance does not automatically translate to faster completion times could inform how developers choose to integrate these tools into their workflows.

For AI tool developers, the research identifies specific areas where current tools fall short. The time spent reviewing and correcting AI-generated code suggests opportunities for improvement in code quality, context understanding, and error reduction.

The study's focus on experienced developers working on familiar codebases highlights a specific use case where AI tools may be less beneficial. Different results might emerge for developers learning new codebases or working on unfamiliar technologies.


Cons, Risks, and Limitations

The sample size of 16 developers limits the generalizability of the findings. While the 246 tasks provide statistical power, the small number of participants means individual variation could influence results.

The study used specific AI tools (Cursor Pro with Claude models) that may not represent the full range of available AI coding assistants. Different tools or models might produce different results.

Developers in the study had extensive experience with their projects, averaging five years. The findings may not apply to developers working on unfamiliar codebases, where AI tools might provide more relative benefit.

The study measured task completion time but did not assess code quality, maintainability, or long-term project health. Faster completion does not necessarily indicate better outcomes if the resulting code has quality issues.

Open-source development may differ from commercial software development in ways that affect AI tool utility. Corporate codebases, team structures, and development practices could produce different results.

The study was conducted in early 2025, and AI coding tools continue to evolve. Results from this period may not reflect the capabilities of tools released subsequently.

How the Technology Works

AI coding assistants like Cursor Pro integrate large language models into development environments. When developers write code, the AI analyzes context from the current file, related files, and project structure to generate suggestions.

The tools can complete partial code, generate entire functions based on comments or descriptions, and suggest fixes for errors. Developers interact with AI suggestions through acceptance, rejection, or modification of proposed code.
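
A minimal sketch of that interaction loop, with a stub standing in for the model call. None of this reflects Cursor's actual internals or APIs; every name here is invented for illustration.

```python
# Hypothetical sketch of the accept/reject/modify loop an AI coding
# assistant mediates. The model call is a stub, and all names are invented.
from dataclasses import dataclass

@dataclass
class Suggestion:
    file: str
    code: str

def gather_context(current_file: str, related_files: list[str]) -> str:
    """Assemble what the model sees; real tools select context more carefully."""
    return "\n\n".join([current_file, *related_files])

def generate_suggestion(context: str) -> Suggestion:
    """Stub standing in for a language-model call (e.g. a Sonnet completion)."""
    return Suggestion(file="utils.py",
                      code="def slugify(s):\n    return s.lower().replace(' ', '-')")

def developer_review(s: Suggestion) -> str:
    """The decision point where the study's review overhead accrues."""
    return "modify"  # one of: "accept", "reject", "modify"

context = gather_context("# utils.py (partial)", ["# tests/test_utils.py (partial)"])
suggestion = generate_suggestion(context)
print(developer_review(suggestion), "->", suggestion.file)
```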

Claude 3.5 and 3.7 Sonnet, the models used in the study, are large language models trained on code and natural language. They process the developer's current context and generate code that attempts to match the project's patterns and requirements.

The productivity impact depends on the quality of AI suggestions relative to the time required to evaluate them. When suggestions are accurate and require minimal modification, developers save time. When suggestions contain errors or misunderstand requirements, developers spend time identifying and correcting problems.
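
That trade-off can be framed as a simple expected-time model. The sketch below uses hypothetical numbers, not measurements from the study, to show how modest review and correction costs can erase the savings from not typing the code:

```python
# Back-of-the-envelope expected-time model using only quantities the text
# names: review time, probability a suggestion needs fixing, and fix time.
# All numbers are hypothetical placeholders, not results from the study.
def ai_path_minutes(t_review: float, p_needs_fix: float, t_fix: float) -> float:
    """Expected minutes per task via AI: review plus expected correction."""
    return t_review + p_needs_fix * t_fix

t_manual = 10.0  # minutes to write the change by hand (placeholder)
t_ai = ai_path_minutes(t_review=3.0, p_needs_fix=0.5, t_fix=12.0)  # 9.0 min

print(f"AI path: {t_ai:.1f} min vs manual: {t_manual:.1f} min")
# With these placeholders the AI path barely wins; nudge p_needs_fix or
# t_fix upward and it becomes a net loss, the dynamic the study observed
# for experienced developers on familiar code.
```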

Technical context: the study randomized at the issue level rather than the developer level. Each developer worked on both AI-assisted and non-assisted tasks, which controls for individual skill differences. Time measurement began when a developer started working on an issue and ended when they submitted a solution, capturing the full development cycle including any AI interaction overhead.
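
A toy version of that assignment scheme, with invented developers and issue IDs, might look like this:

```python
# Toy version of issue-level randomization: each developer's own issues are
# split at random between conditions, so every developer serves as their own
# control. Developers and issue IDs below are invented.
import random

random.seed(42)

issues_by_dev = {
    "dev_a": ["issue-101", "issue-102", "issue-103", "issue-104"],
    "dev_b": ["issue-201", "issue-202", "issue-203"],
}

assignments = {
    dev: {issue: random.choice(["ai_allowed", "ai_disallowed"]) for issue in issues}
    for dev, issues in issues_by_dev.items()
}

for dev, issue_map in assignments.items():
    print(dev, issue_map)
```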

Broader Industry Implications

The study challenges narratives about AI coding tools that have driven significant investment and adoption. Vendor claims of productivity improvements, often based on less rigorous research, face scrutiny in light of these findings.

Software development organizations may reconsider AI tool procurement and deployment strategies. The assumption that AI tools automatically improve productivity appears less certain for experienced developers working on established codebases.

The perception gap documented in the study raises questions about how developers and organizations evaluate tool effectiveness. Self-reported productivity gains may not reflect actual performance improvements.

AI tool developers face pressure to demonstrate measurable benefits through rigorous research. Marketing claims based on user surveys or artificial benchmarks may carry less weight as controlled studies become available.

The findings contribute to broader discussions about AI capabilities and limitations. While AI tools have demonstrated impressive abilities in generating code, translating those abilities into real-world productivity gains proves more complex than initial enthusiasm suggested.

What Remains Unclear

Whether the findings generalize to developers with less project experience remains untested. The study specifically selected developers with deep familiarity with their codebases, leaving open questions about AI tool utility for onboarding or cross-project work.

The mechanisms behind the productivity slowdown require further investigation. While METR identified time spent reviewing and correcting AI code as a factor, the relative contributions of different overhead sources are not fully quantified.

How results would differ with other AI coding tools or models is unknown. The study used specific products available in early 2025, and the rapidly evolving AI tool landscape means current offerings may perform differently.

Long-term effects of AI tool usage on developer skills and productivity are not addressed. The study measured immediate task completion but did not examine whether extended AI tool use affects developer capabilities over time.

The applicability to different programming languages, project types, and development methodologies remains unexplored. The study's open-source focus and specific project selection may not represent the full diversity of software development contexts.

What to Watch Next

Replication studies with larger sample sizes and different AI tools would strengthen or challenge these findings. The research community's response to METR's methodology will indicate whether similar studies follow.

AI tool vendors may release their own research in response to these findings. The quality and independence of such research will determine its credibility.

Corporate adoption patterns for AI coding tools may shift as this research circulates. Procurement decisions and usage policies could reflect more cautious evaluation of productivity claims.

Updates to AI coding tools may specifically target the issues identified in the study. Improvements in suggestion accuracy and context understanding could change the productivity equation.

Developer community discussions about AI tool usage may evolve based on this research. Individual developers and teams may adjust their practices based on empirical evidence rather than assumptions.

Sources

  1. METR Blog - "Measuring the Impact of Early 2025 AI on Experienced Open Source Developer Productivity" - July 10, 2025 - https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

  2. ArXiv - "AI Coding Assistants and Developer Productivity: A Randomized Controlled Trial" - July 10, 2025 - https://arxiv.org/abs/2507.09089

  3. Hacker News Discussion - Thread on METR AI productivity study - July 10, 2025 - https://news.ycombinator.com/item?id=44522772


Related Topics

artificial-intelligence, developer-tools, productivity, research, software-engineering