What Are DORA Metrics?
DORA Metrics are four key indicators developed by the DevOps Research and Assessment (DORA) program, which has been part of Google since 2018, to objectively measure the performance of software delivery processes. The research program began in 2014 under the leadership of Dr. Nicole Forsgren, Jez Humble, and Gene Kim, who surveyed more than 30,000 technology professionals over six years. The findings were published in 2018 in the book "Accelerate: The Science of Lean Software and DevOps" and have since transformed how organizations evaluate their engineering performance.
The critical breakthrough of the DORA research was the empirical proof that speed and stability are not trade-offs. Traditionally, many IT organizations assumed that faster releases inevitably lead to more failures. The data shows the opposite: elite performers deploy more frequently AND simultaneously have lower failure rates. This insight has fundamentally changed how we think about software development processes.
The four DORA Metrics fall into two categories. Throughput metrics (Deployment Frequency and Lead Time for Changes) measure how quickly a team can deliver changes to production. Stability metrics (Mean Time to Restore and Change Failure Rate) measure how reliable those changes are. Only the combination of all four provides a complete picture of software delivery performance.
Why are these metrics so relevant? Because they correlate directly with business success. The DORA research shows that organizations with elite performance are twice as likely to exceed their business objectives. They have more satisfied developers, lower attrition, and can respond faster to market demands. DORA Metrics are therefore not just a technical instrument — they are a strategic compass for the entire organization.
DORA Metrics empirically prove: Faster deployments and higher stability are not mutually exclusive — elite teams achieve both simultaneously.
Deployment Frequency
Deployment Frequency measures how often an organization successfully deploys code to the production environment. It is the most intuitive of the four DORA Metrics and provides immediate insight into how agile a team actually operates. A high Deployment Frequency indicates small, incremental changes — a core principle of Continuous Delivery and modern software engineering.
Measurement seems straightforward at first: count the number of successful production deployments per time unit. In practice, however, there are nuances. Do you count only deployments of the main product or also microservices? Do feature flag activations count? Do pure infrastructure changes count? The recommendation: measure everything that brings a code change to production and could potentially affect user behavior.
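As a minimal sketch of the counting step, assume the timestamps of successful production deployments have already been exported from your CI/CD system. The function name and the sample data below are hypothetical. The deployments can then be grouped by ISO calendar week:

```python
from collections import Counter
from datetime import datetime


def deployments_per_week(timestamps):
    """Count deployments per (ISO year, ISO week)."""
    counts = Counter()
    for ts in timestamps:
        iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


# Hypothetical sample: timestamps of successful production deployments.
deploys = [
    datetime(2024, 3, 4, 10, 15),
    datetime(2024, 3, 4, 16, 40),
    datetime(2024, 3, 6, 9, 5),
    datetime(2024, 3, 12, 14, 30),
]

for (year, week), count in sorted(deployments_per_week(deploys).items()):
    print(f"{year}-W{week:02d}: {count} deployment(s)")
```

Whether feature flag activations or infrastructure changes appear in the input list is a policy decision made before this step; the counting itself stays the same.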
Many teams underestimate the importance of Deployment Frequency because they believe deploying more often is riskier. The opposite is true: smaller, more frequent deployments are easier to debug, faster to roll back, and significantly reduce the risk per individual release. A team that deploys once per quarter packs hundreds of changes into a single release — an enormous risk cluster.
Deployment Frequency Benchmarks
The DORA research defines four performance clusters. Elite performers deploy multiple times per day — often dozens of times daily through automated pipelines. High performers deploy between once per day and once per week. Medium performers manage one deployment per week to once per month. Low performers deploy less than once per month, sometimes only quarterly.
The jump from Low to Medium is usually an organizational problem: lack of automation, manual approval processes, or a culture of fear around releases. The jump from Medium to High requires investment in CI/CD infrastructure and test automation. The jump to Elite requires additional cultural change: Trunk-Based Development, feature flags, and a philosophy of continuous deployment.
Improvement Strategies
Automate your entire build and deployment process. Every manual step is a bottleneck that reduces frequency. Implement feature flags so that unfinished code can already be deployed to production without being visible to end users. Reduce the batch size of your releases — smaller changesets can be reviewed, tested, and deployed faster.
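The feature flag idea can be illustrated with a deliberately simple sketch. It assumes an in-memory dictionary as the flag store; the flag name `new_checkout` and both functions are hypothetical, and real systems typically load flags from configuration or a flag service:

```python
# Hypothetical flag store: the unfinished feature ships disabled.
FLAGS = {"new_checkout": False}


def is_enabled(flag, flags=FLAGS):
    """Unknown flags default to off, so missing configuration is safe."""
    return flags.get(flag, False)


def checkout(cart_total, flags=FLAGS):
    """Route to the new code path only when its flag is switched on."""
    if is_enabled("new_checkout", flags):
        return f"new checkout: {cart_total:.2f}"
    return f"legacy checkout: {cart_total:.2f}"


print(checkout(10))                          # flag off: legacy path
print(checkout(10, {"new_checkout": True}))  # flag on: new path
```

The new code is deployed to production in both cases; only the flag value decides whether users see it.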
Establish Trunk-Based Development instead of long-lived feature branches. The longer a branch lives, the more painful the merge and the longer the path to deployment. Invest in a fast, reliable CI pipeline: if the build takes 45 minutes, nobody will deploy multiple times per day. The goal should be a green build in under 10 minutes.
| Tier | Deployment Frequency |
|---|---|
| Elite | Multiple times per day (On-Demand) |
| High | Once per day to once per week |
| Medium | Once per week to once per month |
| Low | Less than once per month |
Lead Time for Changes
Lead Time for Changes measures the time span from the first commit of a code change to its successful execution in the production environment. It captures the entire journey through the delivery pipeline: code review, automated tests, staging, approval processes, and the final deployment. This metric is particularly revealing because it exposes systemic bottlenecks that individual teams often fail to notice.
A long lead time means that value — finished code that solves problems or delivers features — sits unnecessarily long in the system before reaching users. In a world where time-to-market is a decisive competitive advantage, a lead time of several weeks can make the difference between market leadership and irrelevance.
Correct measurement starts at the commit (or merge into the main branch) and ends at successful deployment to production. Some teams additionally measure "Coding Time" (from ticket start to first commit), but this phase is not included in the classic DORA definition. The focus is deliberately on the delivery process, not development time, because this is where the greatest optimization potential lies.
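A minimal sketch of this measurement, assuming each change is available as a pair of commit and deployment timestamps (the function names and sample values are hypothetical), could look like this. The median is used here because a few slow changes can distort an average:

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(committed_at, deployed_at):
    """Time from commit to successful production deployment."""
    if deployed_at < committed_at:
        raise ValueError("deployment cannot precede its commit")
    return deployed_at - committed_at


def median_lead_time(changes):
    """Median lead time over (committed_at, deployed_at) pairs."""
    seconds = [lead_time(c, d).total_seconds() for c, d in changes]
    return timedelta(seconds=median(seconds))


# Hypothetical sample data.
changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 11, 0)),   # 2 hours
    (datetime(2024, 3, 4, 13, 0), datetime(2024, 3, 5, 15, 0)),  # 26 hours
    (datetime(2024, 3, 5, 10, 0), datetime(2024, 3, 5, 15, 0)),  # 5 hours
]

print(median_lead_time(changes))
```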
Lead Time Benchmarks
Elite performers achieve a lead time of under one hour — from commit to production. This sounds utopian but is achievable with a fully automated pipeline, Trunk-Based Development, and a strong testing foundation. High performers land at under one week, medium performers at under one month, and low performers need more than one month.
The distribution is revealing: the median for most organizations falls in the one to two week range. This means that even "average" teams have significant improvement potential. Particularly problematic are organizations whose lead time varies widely — sometimes one day, sometimes three weeks. This variance indicates unstable processes.
Bottleneck Analysis and Optimization
The most common bottlenecks are: code review wait times (the code is ready, but nobody reviews it), manual QA phases (testers are overloaded or lack capacity), Change Advisory Boards (weekly meetings that must approve deployments), and environment bottlenecks (only one staging environment for ten teams).
To shorten lead time, first identify the wait times in your pipeline. Use Value Stream Mapping to visualize every step from commit to deployment. Often it is not the active work steps that cost time, but the handoffs and wait times between them. Automate what can be automated, and parallelize what can run in parallel.
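One way to make the share of wait time visible is the ratio of active work to total elapsed time, often called flow efficiency. The sketch below is an illustration rather than part of the DORA definition; the step names and durations are hypothetical:

```python
def flow_efficiency(steps):
    """Share of total elapsed time spent on active work (0.0 to 1.0)."""
    total = sum(hours for _, hours, _ in steps)
    if total == 0:
        raise ValueError("steps must cover a non-zero duration")
    active = sum(hours for _, hours, is_active in steps if is_active)
    return active / total


def longest_wait(steps):
    """Name of the longest waiting step, or None if there is no wait."""
    waits = [(hours, name) for name, hours, is_active in steps if not is_active]
    return max(waits)[1] if waits else None


# Hypothetical value stream: (step name, hours, is active work?)
steps = [
    ("wait for review", 20.0, False),
    ("code review", 1.0, True),
    ("CI build and tests", 0.5, True),
    ("wait for approval", 48.0, False),
    ("deploy", 0.5, True),
]

print(f"flow efficiency: {flow_efficiency(steps):.1%}")
print(f"longest wait: {longest_wait(steps)}")
```

In this made-up example only 2 of 70 hours are active work, which points at the approval wait rather than the pipeline itself.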
An often overlooked lever is review culture. If pull requests sit unreviewed for days, no CI/CD pipeline in the world will help. Establish rules like "review within 4 hours" and keep PRs small enough that a review is possible in 15 minutes. Pair programming can partially replace code reviews and drastically reduce lead time.
| Tier | Lead Time for Changes |
|---|---|
| Elite | Less than 1 hour |
| High | Less than 1 week |
| Medium | Less than 1 month |
| Low | More than 1 month |
Mean Time to Restore (MTTR)
Mean Time to Restore (MTTR) measures the average time a team needs to restore service after an outage or degradation. It is arguably the most critical of the four DORA Metrics because it directly reflects downtime experienced by end users. In a world where minutes of downtime can cost millions, a low MTTR is a massive competitive advantage.
An important distinction: MTTR does not measure the time to root cause analysis, but the time to service restoration. This could be a rollback, a hotfix, a configuration change, or switching to a fallback system. The complete root cause analysis can (and should) happen afterward. The focus is on minimizing user impact.
Measurement begins when an incident is detected (ideally through automatic monitoring, not user complaints) and ends when the service is meeting its defined service level again. The resulting durations are averaged across all incidents. Teams should also track the median, as individual severe incidents can heavily skew the average.
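The effect of a single severe incident on the average can be sketched as follows. The function names and incident durations are hypothetical:

```python
from datetime import timedelta
from statistics import mean, median


def mean_time_to_restore(durations):
    """Arithmetic mean of restore durations."""
    return timedelta(seconds=mean([d.total_seconds() for d in durations]))


def median_time_to_restore(durations):
    """Median of restore durations, less sensitive to outliers."""
    return timedelta(seconds=median([d.total_seconds() for d in durations]))


# Hypothetical incidents: two short ones and one severe outage.
durations = [
    timedelta(minutes=30),
    timedelta(minutes=45),
    timedelta(hours=10),
]

print("mean:  ", mean_time_to_restore(durations))
print("median:", median_time_to_restore(durations))
```

With these values the mean is 3 hours 45 minutes while the median is 45 minutes, which is why reporting both is useful.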
MTTR Benchmarks
Elite performers restore their service in under one hour. This requires excellent monitoring, prepared runbooks, automated rollback mechanisms, and a well-practiced on-call team. High performers land at under one day, medium performers at under one week, and low performers need more than one week for restoration.
An MTTR of over one week is a serious warning signal. It points to fundamental problems: lack of observability, no rollback capability, single points of failure in the architecture, or an incident management process that hinders more than it helps. In regulated industries, a high MTTR can also create compliance risks.
Incident Management as a Lever
MTTR can be decomposed into sub-phases: Time to Detect (TTD), Time to Engage (TTE), Time to Fix (TTF), and Time to Verify (TTV). Each phase offers its own optimization potential. Automatic alerting reduces TTD, clear escalation paths reduce TTE, prepared runbooks reduce TTF, and automated smoke tests reduce TTV.
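The decomposition can be sketched with one timestamp per phase boundary. The field names are hypothetical, and the sketch assumes that TTD is measured from the start of user impact, which is an assumption about where the clock starts rather than something the phase names prescribe:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    started: datetime   # assumed: start of user impact
    detected: datetime  # alert fired or problem reported
    engaged: datetime   # responder began working on it
    fixed: datetime     # mitigation applied
    verified: datetime  # service confirmed healthy


def phase_durations(incident):
    """Split one incident into the four sub-phases."""
    return {
        "TTD": incident.detected - incident.started,
        "TTE": incident.engaged - incident.detected,
        "TTF": incident.fixed - incident.engaged,
        "TTV": incident.verified - incident.fixed,
    }


def slowest_phase(incident):
    """Name of the phase that consumed the most time."""
    durations = phase_durations(incident)
    return max(durations, key=durations.get)


# Hypothetical incident timeline.
incident = Incident(
    started=datetime(2024, 3, 4, 10, 0),
    detected=datetime(2024, 3, 4, 10, 20),
    engaged=datetime(2024, 3, 4, 10, 25),
    fixed=datetime(2024, 3, 4, 10, 50),
    verified=datetime(2024, 3, 4, 10, 55),
)

print(slowest_phase(incident))
```

Aggregating the slowest phase across many incidents shows where an investment (alerting, escalation, runbooks, smoke tests) is likely to pay off first.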
Invest in observability — not just monitoring. The difference: monitoring tells you THAT something is broken. Observability tells you WHY it is broken. Distributed tracing, structured logging, and meaningful dashboards enable your on-call team to diagnose problems in minutes instead of hours.
Conduct regular Chaos Engineering exercises or Game Days. Simulate outages to test and improve your recovery processes. Netflix pioneered this approach with the "Chaos Monkey," demonstrating how controlled disruptions massively increase resilience. Every incident you practice in a Game Day is an incident you resolve faster in production.
| Tier | Mean Time to Restore |
|---|---|
| Elite | Less than 1 hour |
| High | Less than 1 day |
| Medium | Less than 1 week |
| Low | More than 1 week |
Change Failure Rate
The Change Failure Rate (CFR) measures the percentage of deployments that lead to a service degradation and require intervention — whether a rollback, a hotfix, or an emergency patch. It is the central quality metric among the DORA Metrics and is directly related to the maturity of a team's testing processes and deployment practices.
The definition of "failure" is critical and must be consistent within the organization. A failure is any change that negatively impacts the service and requires an unplanned correction. This includes: production incidents triggered by a deployment, performance degradations requiring a rollback, and feature releases that must be immediately deactivated due to critical bugs.
Measurement is a simple quotient: number of failed deployments divided by the total number of deployments in a period. If you perform 100 deployments in a month and 8 of them lead to incidents, your Change Failure Rate is 8%. Ensure you measure consistently — if you only count "major" releases, you distort the picture.
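The quotient can be written as a small helper. This is a sketch with a hypothetical function name; it reproduces the example of 8 failed deployments out of 100:

```python
def change_failure_rate(total_deployments, failed_deployments):
    """Percentage of deployments that required an unplanned correction."""
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    if not 0 <= failed_deployments <= total_deployments:
        raise ValueError("failed_deployments must be between 0 and the total")
    return 100 * failed_deployments / total_deployments


print(f"{change_failure_rate(100, 8):.1f}%")
```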
Change Failure Rate Benchmarks
Elite performers have a Change Failure Rate of 0–15%. This does not mean they never make mistakes — it means their quality assurance processes catch the vast majority of problems before deployment. High performers land at 16–30%, medium performers at 31–45%, and low performers at 46–60%.
A CFR above 50% is alarming: more than one in two deployments causes problems. In such organizations, a vicious cycle emerges: the fear of failures leads to less frequent deployments, which leads to larger changesets, which in turn increases the failure rate. Paradoxically, the way out begins with deploying more frequently, using smaller, better-tested changes.
Testing Strategies to Reduce CFR
The test pyramid remains the foundation: many fast unit tests, a solid layer of integration tests, and few targeted end-to-end tests. Supplement this with contract testing for microservice architectures and performance tests for critical paths. Automation is mandatory — manual tests do not scale and are error-prone.
Progressive delivery strategies like canary releases and blue-green deployments significantly reduce the blast radius of a faulty deployment. Instead of routing 100% of traffic to the new version immediately, start with 1–5% and observe the metrics. In case of anomalies, automatic rollback occurs before the majority of users are affected.
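The control loop behind a canary release can be sketched as follows. The traffic steps, the error-rate tolerance, and the `observe_error_rate` callback are hypothetical stand-ins for what a real deployment tool and monitoring system would provide:

```python
def run_canary(steps, observe_error_rate, baseline, tolerance=0.01):
    """Increase traffic step by step; roll back on the first anomaly.

    steps: traffic percentages to try, in ascending order.
    observe_error_rate: callable returning the canary error rate at a step.
    baseline: error rate of the current stable version.
    """
    if not steps:
        raise ValueError("at least one traffic step is required")
    for percent in steps:
        if observe_error_rate(percent) > baseline + tolerance:
            return ("rollback", percent)
    return ("promote", steps[-1])


# Hypothetical observations: the faulty version degrades under more load.
def healthy(percent):
    return 0.002


def faulty(percent):
    return 0.002 if percent < 25 else 0.08


steps = [1, 5, 25, 100]
print(run_canary(steps, healthy, baseline=0.002))
print(run_canary(steps, faulty, baseline=0.002))
```

In the faulty case the rollout stops at the 25% step, so three quarters of users never see the defective version.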
Code reviews are another critical factor. Studies show that reviews can reduce error rates by 30–60% — but only when taken seriously. An "LGTM" after 30 seconds is not a review. Establish clear review checklists, limit PR size to a maximum of 400 lines, and use automated linting and security checks as the first line of defense.
| Tier | Change Failure Rate |
|---|---|
| Elite | 0–15% |
| High | 16–30% |
| Medium | 31–45% |
| Low | 46–60% |
DORA Metrics in Practice: How to Get Started
Introducing DORA Metrics is not a technology project — it is a culture change. Many organizations make the mistake of buying an expensive tool and expecting the metrics to improve on their own. In truth, the tools are secondary. What matters is that teams understand WHY these metrics are being measured and HOW they influence their daily work.
Do not start with all four metrics simultaneously. Begin with Deployment Frequency: it is the easiest to measure, and improving it has the most immediate effect on the other three metrics. When you deploy more frequently, lead time tends to decrease (smaller batches), change failure rate tends to decrease (smaller changesets), and MTTR often improves as well (simpler rollback).
DORA Metrics are not a performance evaluation tool for developers. They are a compass for the organization to identify systemic bottlenecks and improve deliberately.
Common Pitfalls
The biggest mistake is using DORA Metrics as a performance evaluation tool for individual developers or teams. As soon as metrics are tied to bonuses or appraisals, people start optimizing the metrics instead of the underlying processes. A team forced to deploy more frequently will simply make more empty or trivial deployments.
Another mistake is ignoring context. A Deployment Frequency of once per month can be perfectly adequate for an embedded systems team, while it indicates problems for a SaaS product. Do not blindly compare teams with different contexts. Instead, compare a team against its own historical trend.
Do not measure too granularly in the early phase. Daily fluctuations in lead time are noise, not signal. Look at 4-week trends and rolling averages to distinguish real improvements from statistical outliers.
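A rolling 4-week average takes only a few lines. The sketch below assumes one aggregated value per week; the weekly lead times are hypothetical:

```python
def rolling_average(values, window=4):
    """Average of each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical weekly median lead times in hours.
weekly_lead_time_hours = [40, 60, 20, 40, 20, 20]

print(rolling_average(weekly_lead_time_hours))
```

The individual weeks jump between 20 and 60 hours, while the rolling values (40, 35, 25) show a steady downward trend.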
1. Establish a baseline: Measure your current state over at least 4 weeks. Without a baseline, you cannot demonstrate improvement. Use your existing tools: Git history, CI/CD logs, and incident tickets already provide valuable data.
2. Create transparency: Make the metrics visible to everyone, not as a control instrument, but as shared orientation. A dashboard in the team room or a weekly Slack bot report ensures continuous attention.
3. Set one improvement goal: Choose ONE metric that needs the most attention and define a realistic improvement target for the next 3 months. "Reduce lead time from 2 weeks to 3 days" is more concrete and motivating than "improve everything."
4. Leverage retrospectives: Integrate DORA Metrics into your sprint retrospectives. Ask: "What influenced our lead time this week?" or "Which deployment failed and why?" This makes the metrics part of the daily improvement culture.
5. Iterate and expand: After 3 months, celebrate successes, document learnings, and focus on the next metric. DORA Metrics are a marathon, not a sprint.
DORA Metrics and Software Delivery Performance
The four DORA Metrics together form a complete picture of an organization's software delivery performance. No single metric is informative enough on its own — only the combination enables sound decisions. A team with high Deployment Frequency but equally high Change Failure Rate has a quality problem. A team with low Change Failure Rate but extremely long Lead Time has a throughput problem.
The following table shows the benchmark values of all four metrics across the four performance tiers. Use this table as a reference to classify your organization — but remember that these are guideline values, not absolute targets. The most important comparison is always with your own previous month.
The DORA research has shown a strong correlation between software delivery performance and organizational success. Elite performers do not just have better technical metrics — they also have more satisfied employees, less burnout, and a higher rate of innovation. Investing in DORA Metrics therefore pays off far beyond the IT department.
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | Multiple times per day | Daily to weekly | Weekly to monthly | Less than monthly |
| Lead Time for Changes | < 1 hour | < 1 week | < 1 month | > 1 month |
| Mean Time to Restore | < 1 hour | < 1 day | < 1 week | > 1 week |
| Change Failure Rate | 0–15% | 16–30% | 31–45% | 46–60% |
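The thresholds in the table can be turned into a small classifier. This is a sketch under stated assumptions: a month is approximated as 30 days, values exactly on a boundary fall into the lower tier, and a Change Failure Rate above 60% (which the table does not cover) is treated as Low. All function names are hypothetical:

```python
from datetime import timedelta


def classify_duration(value, elite, high, medium):
    """Map a duration to a tier using exclusive upper bounds."""
    if value < elite:
        return "Elite"
    if value < high:
        return "High"
    if value < medium:
        return "Medium"
    return "Low"


def classify_lead_time(lead_time):
    # Assumption: one month is approximated as 30 days.
    return classify_duration(
        lead_time, timedelta(hours=1), timedelta(weeks=1), timedelta(days=30)
    )


def classify_restore_time(restore_time):
    return classify_duration(
        restore_time, timedelta(hours=1), timedelta(days=1), timedelta(weeks=1)
    )


def classify_change_failure_rate(percent):
    # Assumption: rates above 60% are also reported as Low.
    if percent <= 15:
        return "Elite"
    if percent <= 30:
        return "High"
    if percent <= 45:
        return "Medium"
    return "Low"


print(classify_lead_time(timedelta(days=3)))
print(classify_restore_time(timedelta(hours=5)))
print(classify_change_failure_rate(8))
```

Because the tiers are guideline values, the result is best read as a rough orientation next to the team's own trend, not as a grade.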
Conclusion
DORA Metrics are the gold standard for evaluating software delivery performance — not because they are perfect, but because they are based on the most extensive empirical research available in this field. The four metrics Deployment Frequency, Lead Time for Changes, Mean Time to Restore, and Change Failure Rate together provide a balanced picture of speed and stability.
The greatest value of DORA Metrics lies not in the numbers themselves, but in the conversations they spark. When a team first visualizes its lead time and discovers that code waits in review for an average of 12 days, it has a concrete starting point for improvement. When the change failure rate drops from 35% to 12% after a test automation initiative, the value of the investment becomes visible to management.
Start small, measure consistently, and use the metrics as a foundation for continuous improvement — not as a whip. Organizations that properly employ DORA Metrics transform not just their delivery pipeline, but their entire engineering culture. The journey from Low to Elite is not a sprint but a marathon — yet every step in the right direction pays dividends.
DORA Metrics are the starting point, not the destination. Use the data as a basis for conversations, experiments, and continuous improvement in your organization.
Sources & References
The research findings and benchmarks referenced in this article are based on the following sources:
- DORA — DevOps Research and Assessment (Google): https://dora.dev/
- Accelerate: The Science of Lean Software and DevOps — Forsgren, Humble, Kim (2018)
- State of DevOps Report — DORA / Google Cloud: https://cloud.google.com/devops
- Four Keys — Open-Source DORA Metrics Project: https://github.com/dora-team/fourkeys
Key Takeaways
- DORA Metrics consist of four key indicators: Deployment Frequency, Lead Time for Changes, MTTR, and Change Failure Rate — measuring both speed and stability.
- Elite performers deploy multiple times daily with lead time under one hour, restore services in under one hour, and keep failure rates below 15%.
- Speed and stability are not trade-offs — DORA research proves that the best teams lead in BOTH dimensions.
- Start with a baseline measurement over 4 weeks, then focus on improving ONE metric — ideally Deployment Frequency.
- DORA Metrics must never be used as an individual performance evaluation tool — they are an organizational compass for systemic improvements.
- Investing in DORA Metrics pays off beyond IT: higher employee satisfaction, less burnout, and better business outcomes are empirically proven.
Frequently Asked Questions
DORA Metrics differ from other DevOps metrics through their scientific foundation and focus on outcomes rather than outputs. While metrics like "Lines of Code," "Velocity," or "Story Points" measure activity, DORA Metrics measure the actual result: how quickly and reliably does code reach production? The four DORA Metrics were validated over six years of research with over 30,000 respondents and demonstrably correlate with organizational success. Other frameworks such as SPACE complement DORA but do not replace it. Google's open-source "Four Keys" project is a way to collect the DORA Metrics rather than a separate framework. DORA Metrics form the foundation on which further metrics can be built.
Raw data collection should be continuous and automated, so that every deployment and every incident is captured. Analysis and reflection are best done in two cycles: weekly for the engineering team (as part of standups or retrospectives) and monthly for management (as part of engineering reports). Avoid daily evaluations, as natural variance leads to misinterpretation. Use rolling 4-week averages for trend analysis and compare quarter over quarter to account for seasonal effects (vacation periods, feature freezes). The key is consistency: regular rough data is better than sporadic perfect data.
There are dedicated DORA platforms like Sleuth, LinearB, Jellyfish, and Faros AI that integrate with your CI/CD pipeline, Git provider, and incident management tools. Major platforms have built-in native support: GitLab offers a DORA dashboard out-of-the-box, GitHub Actions can deliver DORA data via custom metrics, and Azure DevOps has integrated DORA reports into its analytics suite. For self-hosted solutions, the open-source "Four Keys" project from Google Cloud is well-suited. However, many teams start pragmatically with a simple dashboard in Grafana or Datadog that aggregates data from Jenkins/GitHub Actions, PagerDuty, and the Git log.
Measuring DORA Metrics without a CI/CD pipeline is technically possible, but the value is limited. Without a pipeline, you must track deployments and lead times manually, which is error-prone, time-consuming, and rarely consistent. Moreover, the absence of a CI/CD pipeline is itself already a signal: teams without automation almost always fall into the "Low Performer" category. The DORA research clearly shows that Continuous Integration and Continuous Delivery are the most important technical practices for improving all four metrics. Our recommendation: use the introduction of DORA Metrics as a catalyst to simultaneously build a CI/CD pipeline. Start with a minimal pipeline (automated build + deploy to staging) and expand iteratively.
Elite Performance is the highest tier in the DORA classification system and describes organizations that achieve top values across all four metrics: multiple deployments per day, lead time under one hour, service restoration under one hour, and a change failure rate of 15% maximum. According to the "State of DevOps" study, only about 15–20% of surveyed organizations belong to the Elite category. These teams are characterized by fully automated pipelines, Trunk-Based Development, extensive test coverage, an observability-first culture, and blameless post-mortems. Importantly, Elite Performance is not a one-time goal but a continuous state that requires active maintenance and investment.
A low Deployment Frequency usually has multiple causes that need to be addressed simultaneously. First: fully automate the deployment process — every manual step (SSH to the server, manual testing, approval emails) is a roadblock. Second: reduce the batch size of your releases. Instead of collecting features for months, deploy each completed change individually. Feature flags help hide unfinished code. Third: eliminate organizational blockers such as Change Advisory Boards that only meet weekly. Fourth: invest in test automation so the team has confidence in the quality of each deployment. In our experience, Deployment Frequency doubles within 3 months when these four levers are consistently implemented.