What is Deployment Rework Rate?
Deployment Rework Rate measures the share of all deployments that require an additional deployment of the same service or code within a short time window after the original release in order to fix defects, regressions, or unfinished logic. It is therefore neither a throughput metric like Deployment Frequency nor a stability metric like Change Failure Rate, but a metric for the quality of the delivery decision: it shows how often teams declare "done" while the build still has further correction loops ahead of it.
Released by the Google DORA team under Nathen Harvey in April 2026, Rework Rate is the first official extension of the canonical four-metric set since the 2018 publication of "Accelerate". The background is the observation that the classical four metrics leave a fundamental gap: a team can deploy at high frequency, report low MTTR and low Change Failure Rate — and still chronically push code to production that has to be touched again immediately. Rework Rate makes this hidden instability visible.
The official DORA Quick Check specification (see https://dora.dev/quickcheck/) defines rework as any deployment that touches the same application, service, or repository again within 72 hours of a previous deployment. The threshold is deliberately short: what has to be patched again after three days was not production-ready at the original release. Longer time windows blur normal iteration cycles with genuine correction work.
Important: Rework Rate is explicitly not a punishment metric. High values indicate systemic weaknesses in test coverage, code review, or specification practices — not "bad" individuals. As with the original four metrics, the rule remains: never use it for individual performance evaluation, always aggregate at the team or service level.
Rework Rate answers the question Change Failure Rate does not ask: how often did we have to touch the same code immediately after release — regardless of whether it was a classic incident or just a "well, that is still a bug" patch?
Why Now? The AI Amplifier on the Rework Spiral
The timing of the DORA extension, close to the broad rollout of AI-assisted code generation, is no coincidence. Several industry sources, including Gartner's 2026 Hype Cycle for Agentic AI and the Thoughtworks Technology Radar v34 (April 2026, see https://www.thoughtworks.com/insights/blog/devex/cognitive-debt-defaults), observe that AI tools increase upfront velocity while simultaneously driving rework. What appears to be a productivity gain can turn into a hidden maintenance burden in the second sprint week.
The mechanism is intuitive: AI assistants generate plausible-looking code that passes local tests but makes silent assumptions about interface behaviour, state management, or failure modes that surface only in real operation. Classical stability metrics (Change Failure Rate, MTTR) only capture the share with a clear incident — quiet corrections never reach the pager. Rework Rate closes this gap by counting "silent patches" as well.
Initial empirical data from the DORA Quick Check shows that only 7.3% of participating teams achieve a Rework Rate below 2%. The median sits around 11–14%, higher in AI-intensive teams. The Google DORA Report 2025 (see https://dora.dev/research/2025/dora-report/) had already shown that AI acts as an amplifier of existing practices — teams with strong test and review discipline get better, weak teams get worse. Rework Rate is the metric that makes this divergence explicit.
For CIOs and Engineering Directors, the fifth metric is therefore not a technical detail but a governance-relevant control variable: rolling out AI-assisted development broadly without watching Rework Rate in parallel risks a productivity number that points up while actual delivery quality silently erodes underneath. A gap of this kind between reported productivity and actual quality is a familiar precursor of software quality crises.
How to Measure Rework Rate Without New Tooling
The good news: Rework Rate can be derived from data that nearly every CI/CD pipeline already produces. The only prerequisite is that each deployment is consistently tagged with a stable service or repository identifier and a timestamp. From those two fields you build a time-window function that, for each deployment, checks whether another deployment with the same identifier took place within the previous 72 hours.
Concretely: if service X is deployed at 10:00 today and the last deploy of the same service was 28 hours ago, today's deployment counts as rework. If the previous deploy is more than 72 hours old, it is a regular release. The Rework Rate for a period is then the share of rework deployments in all deployments for the same period.
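The window check described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not an official DORA reference implementation. The function name `rework_rate`, the tuple layout, and the sample data are assumptions made for this example, as is the choice that a gap of exactly 72 hours still counts as rework.

```python
from datetime import datetime, timedelta

REWORK_WINDOW = timedelta(hours=72)


def rework_rate(deployments, window=REWORK_WINDOW):
    """Share of deployments that follow a deployment of the same service
    within `window`.

    `deployments` is an iterable of (service_id, deployed_at) tuples
    (hypothetical layout for this sketch).
    """
    last_seen = {}  # service_id -> timestamp of the most recent deployment
    total = 0
    rework = 0
    for service, deployed_at in sorted(deployments, key=lambda d: d[1]):
        previous = last_seen.get(service)
        if previous is not None and deployed_at - previous <= window:
            rework += 1
        last_seen[service] = deployed_at
        total += 1
    return rework / total if total else 0.0


# Illustrative sample data, not real measurements.
deployments = [
    ("service-x", datetime(2026, 5, 4, 6, 0)),
    ("service-x", datetime(2026, 5, 5, 10, 0)),  # 28 hours later: rework
    ("service-x", datetime(2026, 5, 9, 10, 0)),  # 96 hours later: regular
    ("service-y", datetime(2026, 5, 5, 12, 0)),  # first deploy: regular
]
rate = rework_rate(deployments)  # 1 rework out of 4 deployments
```

The first deployment of a service in the data set has no known predecessor and is treated as a regular release here, so it helps to load some history from before the start of the reporting period.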
In practice this works with GitHub Actions, GitLab CI, Argo CD, Jenkins, Azure DevOps, or Spinnaker — all tools write deployment events to logs or a database. A simple SQL statement against the deployment table, a Loki query, or a Python script with Pandas is enough to produce an initial baseline. We recommend tracking the classical four DORA metrics alongside, so that Rework Rate is always interpreted in the context of throughput and stability values.
The definition of "same code" matters: pure configuration changes, pure infrastructure changes, or scheduled hotfix windows (e.g. weekly security patches) should not count as rework. Tag such deployments in CI as "infra" or "scheduled" and exclude them from the numerator; otherwise the ratio skews upwards.
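One way to apply this exclusion is sketched below. It follows the text literally: deployments tagged "infra" or "scheduled" stay in the denominator but never count as rework. As an additional assumption of this sketch, they also do not start a new 72-hour window for the deployments that follow. The function name and tuple layout are hypothetical.

```python
from datetime import datetime, timedelta

EXCLUDED_TYPES = {"infra", "scheduled"}


def rework_rate_excluding(deployments, window=timedelta(hours=72)):
    """`deployments` is an iterable of (service_id, deployed_at, deploy_type)
    tuples (hypothetical layout for this sketch)."""
    last_code_deploy = {}  # service_id -> last non-excluded deployment time
    total = 0
    rework = 0
    for service, deployed_at, deploy_type in sorted(deployments, key=lambda d: d[1]):
        total += 1  # every deployment counts in the denominator
        if deploy_type in EXCLUDED_TYPES:
            continue  # never rework, and does not open a new window
        previous = last_code_deploy.get(service)
        if previous is not None and deployed_at - previous <= window:
            rework += 1
        last_code_deploy[service] = deployed_at
    return rework / total if total else 0.0


# Illustrative sample data, not real measurements.
start = datetime(2026, 5, 4, 8, 0)
sample = [
    ("service-x", start, "feature"),
    ("service-x", start + timedelta(hours=10), "infra"),
    ("service-x", start + timedelta(hours=20), "hotfix"),  # rework
    ("service-x", start + timedelta(hours=30), "scheduled"),
]
filtered_rate = rework_rate_excluding(sample)  # 1 rework out of 4 deployments
```

Without the exclusion, three of these four deployments would have been counted as rework.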
Minimal Data Structure per Deployment
- Service or repository identifier (string, stable across renames).
- Deployment timestamp (UTC).
- Deployment type (feature, hotfix, infra, scheduled).
- Commit or pull-request reference for traceability.
In microservice architectures we additionally recommend an optional "affected_modules" field so that rework becomes measurable at module level.
From these fields, Rework Rate can be derived per service as well as aggregated per team or domain. In practice, aggregated values suit management reports, while service-level values suit engineering retros.
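The record and the two aggregation levels could look roughly like this in Python. The field names `service_id`, `deployed_at`, `deploy_type`, and `commit_ref`, the `TEAM_OF` mapping, and all sample values are hypothetical; only "affected_modules" is taken from the text. To keep the sketch short, it does not filter by deployment type.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Deployment:
    service_id: str          # stable across renames
    deployed_at: datetime    # UTC
    deploy_type: str         # "feature", "hotfix", "infra" or "scheduled"
    commit_ref: str          # commit or pull-request reference
    affected_modules: tuple = ()  # optional, for module-level analysis


def flag_rework(deployments, window=timedelta(hours=72)):
    """Return (deployment, is_rework) pairs in chronological order."""
    last = {}
    flagged = []
    for d in sorted(deployments, key=lambda d: d.deployed_at):
        previous = last.get(d.service_id)
        flagged.append((d, previous is not None and d.deployed_at - previous <= window))
        last[d.service_id] = d.deployed_at
    return flagged


def rate_by(flagged, key):
    """Aggregate flagged deployments into a rework rate per key(deployment)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [rework, total]
    for d, is_rework in flagged:
        counts[key(d)][0] += is_rework
        counts[key(d)][1] += 1
    return {k: rework / total for k, (rework, total) in counts.items()}


def utc(day, hour):
    return datetime(2026, 5, day, hour, tzinfo=timezone.utc)


# Hypothetical service-to-team mapping and sample data.
TEAM_OF = {"checkout": "payments", "billing": "payments", "search": "discovery"}
flagged = flag_rework([
    Deployment("checkout", utc(4, 9), "feature", "abc123"),
    Deployment("checkout", utc(5, 9), "hotfix", "def456"),  # 24 h later: rework
    Deployment("billing", utc(4, 10), "feature", "0a1b2c"),
    Deployment("search", utc(4, 11), "feature", "9f8e7d"),
    Deployment("search", utc(9, 11), "feature", "4c5d6e"),  # 5 days later
])
per_service = rate_by(flagged, key=lambda d: d.service_id)
per_team = rate_by(flagged, key=lambda d: TEAM_OF[d.service_id])
```

Flagging once and aggregating with different keys keeps the service-level and team-level numbers consistent with each other.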
Benchmarks: Where Do You Actually Stand?
Based on the first DORA Quick Check data (see https://dora.dev/quickcheck/), preliminary performance clusters can be derived. Elite teams achieve a Rework Rate below 2%, meaning fewer than one in 50 deployments is rework. High performers land at 2–5%, medium at 5–15%, low above 15%. These thresholds will likely be recalibrated after 12 months of real data; until then, treat them as directional values rather than precise cut-offs.
A striking correlation appears with two other practices: teams running Trunk-Based Development with automated pre-merge coverage above 70% typically achieve a Rework Rate below 4%. Teams with feature branches that live longer than five days and rely on manual QA stages mostly land in the double-digit range. The fifth metric thus empirically confirms what DORA research had already said about test automation and small batch sizes.
For German SMB organisations with a classical monthly release cadence, Rework Rate is paradoxically often lower than at more agile competitors — simply because fewer absolute deployments happen. In those cases Rework Rate is not a reliable indicator on its own and must be interpreted alongside Deployment Frequency. A team with two releases per quarter and 0% rework does not live in elite territory; it lives in a world with too few releases to make the metric meaningful.
| Tier | Rework Rate | Typical accompanying practices |
|---|---|---|
| Elite | < 2% | Trunk-Based Dev, > 70% pre-merge coverage, feature flags |
| High | 2–5% | Daily releases, fully automated pipeline |
| Medium | 5–15% | Weekly releases, manual QA stages |
| Low | > 15% | Monthly releases, high manual review share |
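The tiers in the table can be turned into a small lookup, sketched here in Python. The table does not say which tier the boundary values belong to, so this sketch assumes that exactly 2% counts as High, exactly 5% as Medium, and exactly 15% as Medium. The `min_deployments` guard reflects the caveat about teams with very few releases; its default of 20 is a hypothetical value, not a DORA threshold.

```python
def rework_tier(rate, deployments, min_deployments=20):
    """Map a Rework Rate (0.0 to 1.0) to a preliminary tier.

    `deployments` is the number of deployments the rate is based on.
    `min_deployments` is a hypothetical cut-off below which the rate
    is treated as not meaningful.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    if deployments < min_deployments:
        return "Insufficient data"
    if rate < 0.02:
        return "Elite"
    if rate < 0.05:
        return "High"
    if rate <= 0.15:
        return "Medium"
    return "Low"


tier_busy_team = rework_tier(0.03, deployments=240)
tier_quarterly_team = rework_tier(0.0, deployments=2)
```

With these inputs the first team lands in the High tier, while the team with two releases is reported as having insufficient data rather than as Elite.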
AI-Specific Traps and Countermeasures
When you roll out AI coding assistants broadly, three recurring patterns push Rework Rate upwards:
- Optimistic edge-case handling: AI generates defensive wrappers that feel solid in tests but take dramatically different paths under production load profiles. Counter: property-based testing or fuzzing in CI as a mandatory step for AI-touched files.
- Silent interface assumptions: the assistant guesses where API documentation is ambiguous. Counter: enforced spec-first practice (see our article on spec-driven development at /blog/spec-driven-development-governance).
- Configuration drift: AI code changes often reference feature flags or environment variables that exist in one environment but not in another. Counter: a configuration linter in the pre-merge check, complemented by a "promotion diff" stage between staging and production.
In our DACH SMB consulting practice, these three levers together typically reduce Rework Rate by 30–50% within 90 days, without reducing AI usage.
For a deeper dive into the AI code quality topic, the Alev-B article on cognitive debt (/blog/cognitive-debt-ki-code) offers the larger systemic framing — Rework Rate is, in a sense, the numerical leading indicator where cognitive debt shows up first, before it wanders into the backlog as maintenance load.
90-Day Implementation
Day 1–14 (baseline): collect 8 weeks of historical deployment data retrospectively from your CI logs. Calculate a first baseline per service and aggregated per team. Deliberately avoid targets at this stage — first understand what "normal" looks like for your architecture and release cadence.
Day 15–45 (visibility): build a simple dashboard showing Rework Rate alongside the classical four DORA metrics. Establish a weekly 15-minute standup item: "Which deployments this week were rework, and what root cause stood behind them?" Important: blameless, focused on systemic learning. The first findings will usually be test coverage gaps or missing spec review gates.
Day 46–90 (intervention): choose at most two levers, for example enforced pre-merge test coverage of 65% and a property-based test for AI-touched files. Measure the trend after 30 days, not the absolute value. A 25% reduction in 30 days is a realistic yet non-trivial target.
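To make "trend, not absolute value" concrete, the following Python sketch computes the relative change against the baseline. It assumes that the 25% target is meant as a relative reduction (for example from 12% to 9%) rather than 25 percentage points. The function name and the sample rates are illustrative.

```python
def relative_change(baseline_rate, current_rate):
    """Relative change of the Rework Rate against the baseline.

    Negative values mean the rate went down.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (current_rate - baseline_rate) / baseline_rate


# Illustrative numbers, not real measurements.
baseline = 0.12        # Rework Rate at the end of the baseline phase
after_30_days = 0.08   # Rework Rate 30 days into the intervention
change = relative_change(baseline, after_30_days)
target_met = change <= -0.25  # assumed reading of the 25% target
```

In this example the rate falls by four percentage points, which is a relative reduction of about one third and therefore meets the assumed target.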
If you keep this rhythm for three quarters, the probability is high that you will slide into the high-performer band without further specialised tooling. Those who want the full maturity check of engineering practices automated will find a fitting template in the Alev-B catalog at /templates/devops-maturity and /templates/dora-metrics.
Key Takeaways
- Deployment Rework Rate has been the official 5th DORA metric since April 2026, closing a previously invisible quality gap alongside Deployment Frequency, Lead Time, MTTR, and Change Failure Rate.
- It measures the share of all deployments that require another deployment of the same code within 72 hours — i.e. "silent patches" that are not classical incidents.
- AI coding assistants raise upfront velocity and rework in parallel; only Rework Rate makes this trade-off numerically visible.
- Only 7.3% of teams in the DORA Quick Check reach elite level (< 2%); the median sits in the double-digit range.
- The metric can be calculated from existing CI/CD logs — no new tooling, no vendor lock-in required.
- Rolling out AI-assisted development without observing Rework Rate risks a productivity illusion that manifests as maintenance burden 2–3 quarters later.
Frequently Asked Questions
Is Rework Rate the same as Change Failure Rate?
No, both metrics measure different things. Change Failure Rate captures deployments that triggered a clearly identifiable incident or rollback — i.e. events with high organisational visibility. Rework Rate additionally captures the quiet correction work that triggers no pager event but still requires another deployment within 72 hours. Elite teams watch both metrics in parallel because high Change Failure Rate points to operational stability problems, while high Rework Rate points to quality decisions before release. Measuring only one of the two systematically hides half of reality.
Why is the threshold 72 hours?
The official DORA specification justifies the window as being long enough to distinguish genuine correction work from a regular follow-up release, and short enough not to catch normal iteration cycles. In industries with very fast release cadence (e.g. consumer-facing SaaS with dozens of deployments per day) a shorter threshold may make sense; in regulated industries with weekly releases (finance, public sector), 72 hours remains the pragmatic standard. More important than the exact threshold is consistency: once a threshold is chosen, it should be kept for at least four quarters to allow trend comparisons.
How does Rework Rate differ from Code Churn?
Code Churn measures volatility at the code level — how often a line of code changes before it stabilises. Rework Rate measures at the deployment level — how often the same service is released again. The two metrics can complement each other but are not interchangeable. High Code Churn without high Rework Rate usually means intensive iteration before release; high Rework Rate without high Code Churn indicates large changes that go into production insufficiently tested. For governance purposes, Rework Rate is more telling because it measures the organisational delivery decision, not just developer behaviour.
Does Rework Rate work for monolithic systems?
Yes, particularly well. In monolithic systems the service identifier is trivial — there is only one — and every deployment by definition affects the same codebase. The typical trap in monoliths is rather that Rework Rate looks artificially low because many changes are bundled into rarer, large releases. A breakdown by code module or domain helps here: even when only one deployment artefact exists, it is possible to track which modules were touched repeatedly within the 72-hour window. This module level becomes the meaningful granularity for monoliths.
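A module-level breakdown for a single deployment artefact could be sketched as follows in Python. The function name, the tuple layout, and the module names are hypothetical. The sketch counts a module touch as rework when the same module was already touched by a deployment in the preceding 72 hours.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def module_rework_rates(deployments, window=timedelta(hours=72)):
    """`deployments` is an iterable of (deployed_at, affected_modules)
    tuples for ONE monolith (hypothetical layout for this sketch)."""
    last_touch = {}  # module -> time of the last deployment touching it
    counts = defaultdict(lambda: [0, 0])  # module -> [rework, touches]
    for deployed_at, modules in sorted(deployments, key=lambda d: d[0]):
        for module in set(modules):
            previous = last_touch.get(module)
            if previous is not None and deployed_at - previous <= window:
                counts[module][0] += 1
            counts[module][1] += 1
            last_touch[module] = deployed_at
    return {m: rework / touches for m, (rework, touches) in counts.items()}


# Illustrative sample data, not real measurements.
module_rates = module_rework_rates([
    (datetime(2026, 5, 4, 9, 0), ["billing", "auth"]),
    (datetime(2026, 5, 5, 9, 0), ["billing"]),          # billing again after 24 h
    (datetime(2026, 5, 12, 9, 0), ["auth", "billing"]),  # a week later
])
```

Here "billing" shows one reworked touch out of three, while "auth" shows none, even though every deployment ships the same artefact.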
How do I introduce Rework Rate to management?
Avoid the framing "new metric" and use "blind spot of current reporting" instead. Show it concretely with an example from the last two quarters: here MTTR and Change Failure Rate were green, but this team redeployed the same service 17 times in four weeks. Management intuitively understands that this is a problem. Rework Rate is the number that turns this intuitive insight into a repeatable indicator. In practice it takes one concrete case to extend the reporting set permanently; no strategy paper substitutes for this practical evidence.
Which tools already support Rework Rate?
As of May 2026, Faros AI, GetDX, and Plandek have published initial Rework Rate dashboards; Sleuth and LinearB have announced implementation for Q3 2026. For self-hosted setups there are already pull requests in Google's "Four Keys" project on GitHub (https://github.com/dora-team/fourkeys) that add a Rework Rate calculation. Those who do not want to use a ready-made platform can reach the goal with a 50-line Python script or a SQL view against CI logs; the fifth metric is deliberately specified so that no special infrastructure is required.