DevOps | May 5, 2026 | 13 min

The AI-Delivery Capability Check: The 7 DORA 2025 Multipliers

AI amplifies what is already there. This self-assessment operationalizes the seven capabilities from the DORA Report 2025 — so you know which foundations make your AI gains real.

R&D Team, Alev-B Research & Development

Why This Capability Check Matters Now

Nearly every engineering team now uses AI assistance in software delivery. The DORA Report 2025 delivers the sobering key insight: artificial intelligence is an amplifier, not a repair tool. It noticeably increases throughput — but without mature delivery foundations, it simultaneously degrades stability. AI makes a well-positioned organization faster and a poorly positioned organization faster at breaking.

This is precisely where this self-assessment comes in. We designed it so you can run it immediately with your team: no theory, but seven concrete capabilities, each with maturity levels, diagnostic questions, common gaps, and immediate actions. The goal is not an overall grade but a differentiated picture of which foundations carry your AI investment — and which currently undermine it.

This article builds on the analysis in the DORA Report 2025 article, which explains why AI amplifies rather than repairs. While that article covers the research evidence and the strategic implications, this one is the operational companion: a repeatable instrument with which you measure your position across the seven capabilities. It complements the AI Readiness Assessment, which examines organizational AI preparedness at the enterprise level, and the DevOps Maturity Assessment, which evaluates general delivery maturity independent of the AI context.

Our recommendation: read all seven capabilities once in full, rate each on the five-level scale, and conduct the check with at least three people from different roles — development, operations, and product. A capability check filled out by a single person has blind spots.

The central thesis of the DORA Report 2025: AI amplifies the existing system. Teams with mature foundations gain throughput and keep stability. Teams without those foundations also gain throughput — while losing reliability at the same time. Maturity is the variable that determines the sign of the AI effect.

The Maturity Model Behind the Check

Each of the seven capabilities is rated on a consistent five-level scale. The levels are cumulative — a higher level presupposes that the one below is consistently achieved. Skipping levels is neither possible nor sensible.

The levels are: Level 1 (Ad-hoc): no deliberate practice, success depends on individuals. Level 2 (Initial): first deliberate attempts, inconsistently applied. Level 3 (Defined): standardized and practiced organization-wide. Level 4 (Managed): measured and optimized based on data. Level 5 (Optimizing): continuous, proactive improvement with short feedback loops.
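The cumulative rule can be made concrete with a small sketch. The `Level` enum and the `cumulative_level` helper are illustrative names, not part of the DORA report; the sketch only assumes that a team records, per level, whether that level is the normal case.

```python
from enum import IntEnum


class Level(IntEnum):
    AD_HOC = 1
    INITIAL = 2
    DEFINED = 3
    MANAGED = 4
    OPTIMIZING = 5


def cumulative_level(achieved: dict[Level, bool]) -> Level:
    """Return the highest level reached without skipping any level below it."""
    result = Level.AD_HOC  # Level 1 is the floor: it requires no deliberate practice
    for level in list(Level)[1:]:
        if not achieved.get(level, False):
            break  # a missing level caps the rating, whatever is met above it
        result = level
    return result


# Hypothetical team: Levels 2 and 4 are met, Level 3 is not.
rating = cumulative_level(
    {Level.INITIAL: True, Level.DEFINED: False, Level.MANAGED: True}
)
print(rating.name)  # INITIAL
```

The Level 4 practice in the example does not raise the rating, because the gap at Level 3 caps it at Level 2.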

Rate conservatively. A capability is only at a level if that level is the normal case — not the exception an engaged team reaches in good weather. When in doubt, choose the lower level. An honest Level 2 profile is more valuable than an inflated Level 4 profile, because only the honest profile leads to effective actions.

Rule of thumb for level placement: Would this practice still work if the most experienced person on the team were on vacation for two weeks? If not, you are at Level 2 at most — regardless of how good it looks on good days.

Capability 1 & 2: Clear Direction and a Healthy Data Ecosystem

The first two capabilities address the preconditions without which AI assistance remains aimless code output. They look unspectacular but are, according to the DORA Report 2025, the most common unnoticed bottlenecks.

Capability 1 — Clear and Communicated Direction

Definition: Teams know which problem they are solving and why. Priorities are unambiguous, trade-offs are explicitly decided, and this clarity reaches the people who make daily decisions, not just the slide deck.

Maturity levels: Level 1: priorities change weekly, nobody can name the current top priority. Level 2: goals exist, but only leaders know them. Level 3: goals are documented, cascaded, and teams can restate them. Level 4: progress against goals is measured and used in reviews. Level 5: goals are adjusted based on data, teams steer autonomously toward clear outcomes.

Common gap: AI assistance is introduced to "be faster" without defining what the gained speed should be used for. The result is more output without more value. Immediate action: have every team formulate in one sentence which user problem it solves this quarter — and check whether three randomly asked team members name the same sentence.

Capability 2 — Healthy Data Ecosystem

Definition: Data is discoverable, trustworthy, current, and accessible. Teams make decisions based on data, not opinions — and AI tools work on a reliable data foundation rather than on noise.

Maturity levels: Level 1 — data sits in isolated silos, quality is unknown. Level 2 — central data sources exist, but currency and quality are not assured. Level 3 — data quality is monitored, access is governed and documented. Level 4 — data quality metrics are actively managed, lineage is traceable. Level 5 — self-healing data pipelines, quality is embedded in the workflow.

Common gap: AI-supported analysis and code generation produce plausible-looking but factually wrong results because the underlying data is stale or inconsistent. Immediate action: identify the three data sources AI tools or dashboards rely on most, and for each check: who is responsible, when was it last validated, how is quality measured?
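The three audit questions (who is responsible, when was it last validated, how is quality measured) can be recorded in a small structure and checked mechanically. This is a minimal sketch: the `DataSource` fields, the source names, and the 90-day staleness threshold are assumptions made for illustration, not values from the DORA report.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DataSource:
    name: str
    owner: Optional[str]            # who is responsible
    last_validated: Optional[date]  # when it was last validated
    quality_metric: Optional[str]   # how quality is measured


def audit(source: DataSource, today: date, max_age_days: int = 90) -> list[str]:
    """Return the open findings for one data source (empty list = no finding)."""
    findings = []
    if not source.owner:
        findings.append("no owner")
    if source.last_validated is None:
        findings.append("never validated")
    elif today - source.last_validated > timedelta(days=max_age_days):
        findings.append("validation stale")  # threshold is an assumed default
    if not source.quality_metric:
        findings.append("no quality metric")
    return findings


# Hypothetical sources, for illustration only.
today = date(2026, 5, 5)
sources = [
    DataSource("orders_db", "team-payments", date(2026, 4, 1), "null rate"),
    DataSource("crm_export", None, date(2025, 9, 1), None),
]
for source in sources:
    print(source.name, audit(source, today))
# orders_db []
# crm_export ['no owner', 'validation stale', 'no quality metric']
```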

Capability 3 & 4: Quality Focus and User-Centric Focus

The next two capabilities determine whether AI-accelerated output is actually valuable and sustainable — or whether speed is traded for substance.

Capability 3 — Quality Focus Over Pure Speed

Definition: Quality is a non-negotiable outcome of the delivery process, not a downstream check. Automated tests, code review, and a definition of done act as reliable control points — especially for AI-generated code.

Maturity levels: Level 1 — tests are rare, quality emerges through manual follow-up. Level 2 — unit tests exist sporadically, coverage is unknown. Level 3 — automated tests are a mandatory pipeline component, review is standard. Level 4 — quality metrics (coverage, escaped defects, flaky rate) are managed. Level 5 — quality signals automatically gate deployment, mutation testing verifies the tests themselves.

Common gap: AI generates large volumes of code faster than review capacity grows. Reviewers wave through suggestions they have not fully understood. Immediate action: define an upper limit for the size of an AI-assisted pull request and introduce a mandatory comprehension checkpoint — the author verbally explains what the code does before it is merged.
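As a sketch of how the two controls could be combined into one merge gate: the 400-line limit, the `PullRequest` fields, and the `merge_blockers` function are hypothetical, since the article deliberately leaves the concrete limit to each team.

```python
from dataclasses import dataclass

MAX_CHANGED_LINES = 400  # assumed limit; the article does not prescribe a number


@dataclass
class PullRequest:
    changed_lines: int
    ai_assisted: bool
    author_explained: bool  # True once the comprehension checkpoint is passed


def merge_blockers(pr: PullRequest, limit: int = MAX_CHANGED_LINES) -> list[str]:
    """List the reasons an AI-assisted pull request may not be merged yet."""
    if not pr.ai_assisted:
        return []  # this sketch only covers the rule for AI-assisted changes
    blockers = []
    if pr.changed_lines > limit:
        blockers.append(f"too large: {pr.changed_lines} > {limit} changed lines")
    if not pr.author_explained:
        blockers.append("comprehension checkpoint missing")
    return blockers


print(merge_blockers(PullRequest(950, ai_assisted=True, author_explained=False)))
# ['too large: 950 > 400 changed lines', 'comprehension checkpoint missing']
```

In practice such a check would run in the pipeline, with the checkpoint recorded by the reviewer rather than self-declared by the author.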

Capability 4 — User-Centric Focus

Definition: Decisions are demonstrably oriented toward user needs. Teams have direct access to user feedback and close the loop between delivery and observed behavior.

Maturity levels: Level 1 — user need is assumed, not gathered. Level 2 — occasional user interviews without systematic evaluation. Level 3 — user feedback is a fixed part of prioritization. Level 4 — behavioral data is correlated with deployment events. Level 5 — continuous experiments (A/B, feature flags) steer product decisions.

Common gap: AI accelerates building features whose value was never validated, so the organization ships faster but past what its users actually need. Immediate action: for the last five delivered features, check whether a measurable user signal was defined for each and whether that signal is being tracked.

Capability 5, 6 & 7: Loops, AI Application, and Platform

The last three capabilities determine how quickly the organization learns from AI-accelerated work and whether the technical base can absorb the additional rate of change.

Capability 5 — Fast and Mature Feedback Loops

Definition: The organization quickly learns whether a change worked. Version control, continuous integration, and observability deliver reliable, short loops — which, according to the DORA Report 2025, is the decisive precondition for AI to increase throughput without sacrificing stability.

Maturity levels: Level 1 — problems surface only in production, often via user complaints. Level 2 — basic monitoring exists, detection takes hours to days. Level 3 — CI plus observability detect regressions within minutes. Level 4 — feedback signals are quantified (mean time to detect) and managed. Level 5 — automatic rollback on anomalies, feedback is embedded in the delivery flow.

Common gap: AI raises the rate of change, but the detection loop stays slow — defects accumulate unnoticed until a larger incident makes them visible. Immediate action: measure the time between delivery and detection for the last ten incidents. If the median exceeds one hour, this capability is your bottleneck for safe AI speed.
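The measurement itself is simple arithmetic. The sketch below assumes each incident is available as a pair of timestamps (delivered, detected); the function names and the sample delays are made up for illustration.

```python
from datetime import datetime, timedelta
from statistics import median


def median_time_to_detect(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Median of (detected - delivered) over the given incidents."""
    delays = [(detected - delivered).total_seconds() for delivered, detected in incidents]
    return timedelta(seconds=median(delays))


def is_bottleneck(
    incidents: list[tuple[datetime, datetime]],
    threshold: timedelta = timedelta(hours=1),
) -> bool:
    """True if the median detection delay exceeds the one-hour threshold."""
    return median_time_to_detect(incidents) > threshold


# Ten hypothetical incidents with detection delays in minutes.
base = datetime(2026, 1, 1, 9, 0)
delays_min = [5, 12, 20, 35, 50, 75, 90, 120, 240, 600]
incidents = [(base, base + timedelta(minutes=m)) for m in delays_min]

print(median_time_to_detect(incidents))  # 1:02:30
print(is_bottleneck(incidents))          # True
```

The median is used rather than the mean so that one very slow detection (600 minutes here) does not dominate the result.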

Capability 6 — Thoughtful AI Application and Governance

Definition: AI tools are introduced deliberately, with clear guardrails, defined use cases, and traceable accountability — not as uncontrolled sprawl.

Maturity levels: Level 1 — everyone uses arbitrary AI tools without rules. Level 2 — first informal recommendations, no enforcement. Level 3 — approved tools, documented usage policy, training. Level 4 — AI usage is measured, effects on quality and throughput are evaluated. Level 5 — AI use is continuously adjusted based on outcome data, governance is integrated into the workflow.

Common gap: AI assistance is used across the board, but nobody can say whether it improves or degrades quality and stability — the measurement basis is missing. Immediate action: establish a lean usage policy (approved tools, handling of sensitive data, mandatory review of AI code) and define two metrics by which you read the AI effect monthly.

Capability 7 — Solid Internal Platform and Developer Experience

Definition: An internal platform reduces cognitive load and delivers reliable standard paths (golden paths). It absorbs the additional rate of change and deployment that AI assistance generates.

Maturity levels: Level 1: every team builds its own infrastructure, high friction. Level 2: shared scripts and templates, but no self-service. Level 3: internal platform with self-service for the most common paths. Level 4: platform adoption and developer experience are measured and managed. Level 5: the platform evolves based on usage signals, friction is systematically removed.

Common gap: AI generates more deployments, but each deployment runs through a high-friction, manual path — the bottleneck shifts from writing code to delivering it. Immediate action: measure the time a new team member needs until the first productive deployment. More than one day indicates a weak platform capability.

The 7-Capabilities Maturity Matrix at a Glance

The following matrix summarizes the seven capabilities and their characteristic expressions per maturity level. Use it as a rating grid: for each row, mark the level that describes the normal case for your organization — not the best day.

How to read the matrix: the AI effect does not tip at a single capability but at the weakest one. An organization with six capabilities at Level 4 and one at Level 1 experiences the AI damage precisely through that one gap — typically through weak feedback loops or a missing quality focus.

| Capability | Level 1-2 (Ad-hoc/Initial) | Level 3 (Defined) | Level 4-5 (Managed/Optimizing) |
| --- | --- | --- | --- |
| 1 Clear Direction | Priorities unclear or known only to leaders | Goals documented, cascaded, teams can restate them | Goals adjusted based on data, autonomous outcome steering |
| 2 Data Ecosystem | Silos, quality and currency unknown | Data quality monitored, access governed | Lineage traceable, quality embedded in workflow |
| 3 Quality Focus | Tests rare, quality via manual follow-up | Automated tests mandatory, review standard | Quality signals gate deployment, mutation testing |
| 4 User Focus | User need assumed, not gathered | User feedback a fixed part of prioritization | Continuous experiments steer product decisions |
| 5 Feedback Loops | Problems visible only in production | Regressions detected within minutes | Automatic rollback, feedback in the delivery flow |
| 6 AI Application | Sprawl without rules or measurement | Approved tools, policy, training | AI effect measured, use adjusted based on data |
| 7 Platform & DevEx | Each team builds its own infrastructure, high friction | Self-service for the most common paths | Platform evolves based on usage signals, friction minimized |
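
The reading rule for the matrix can be checked with a few lines of arithmetic. The profile below mirrors the example of six capabilities at Level 4 and one at Level 1; which capability is the weak one is an arbitrary choice for illustration.

```python
from statistics import mean

# Hypothetical profile: six capabilities at Level 4, one at Level 1.
profile = {
    "Clear Direction": 4,
    "Data Ecosystem": 4,
    "Quality Focus": 4,
    "User Focus": 4,
    "Feedback Loops": 1,
    "AI Application": 4,
    "Platform & DevEx": 4,
}

average = mean(list(profile.values()))
weakest = min(profile, key=profile.get)

print(f"average: {average:.2f}")                         # average: 3.57
print(f"weakest: {weakest} (Level {profile[weakest]})")  # weakest: Feedback Loops (Level 1)
```

An average of 3.57 looks like a solid "Defined" organization, while the minimum shows the single Level 1 gap that the average hides.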

The Self-Check: 12 Diagnostic Questions

Answer the following twelve questions as a team, honestly and conservatively. Each question is phrased so that a hesitant or evasive answer is already the answer. Note the associated capability number for each question — this yields your gap profile.

Evaluation rule: every question you cannot answer with a clear, evidenced "yes" counts as a gap in the associated capability. No capability has more than two questions, so a capability whose questions are all gaps is at Level 2 at most, regardless of how it feels.

  1. Can three randomly asked team members name the same current top priority? (Capability 1)
  2. For the three most important data sources, do you know who is responsible and when they were last validated? (Capability 2)
  3. Is there an enforced upper limit and a comprehension checkpoint for AI-assisted pull requests? (Capability 3)
  4. For the last five features, was a measurable user signal defined and tracked for each? (Capability 4)
  5. Is the median time between delivery and detection for the last ten incidents under one hour? (Capability 5)
  6. For the last ten incidents, can you fully trace commit, author, and approver? (Capability 5)
  7. Is there an enforced, documented usage policy for AI tools including handling of sensitive data? (Capability 6)
  8. Can you name two metrics by which you read the effect of AI use on quality and throughput monthly? (Capability 6)
  9. Does a new team member need less than one day to the first productive deployment? (Capability 7)
  10. Does the standard deployment path work reliably when the most experienced person is away for two weeks? (Capability 7)
  11. Are AI suggestions consistently fully understood before they are merged, even under deadline pressure? (Capability 3)
  12. Do you run this capability check on a planned, repeated basis, or is this the first time? (Cross-section of all capabilities)
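
The gap profile can be tallied directly from the twelve questions above. This is a sketch under two assumptions: the team records the set of questions it can answer with a clear, evidenced "yes", and the cross-cutting question 12 is filed under a placeholder key `0`, a convention invented here for illustration.

```python
from collections import Counter

# Question number -> capability number, as listed in the self-check.
# Question 12 is cross-cutting; key 0 is a placeholder chosen for this sketch.
QUESTION_TO_CAPABILITY = {
    1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 5,
    7: 6, 8: 6, 9: 7, 10: 7, 11: 3, 12: 0,
}


def gap_profile(clear_yes: set[int]) -> Counter:
    """Count gaps per capability; any question not in clear_yes is a gap."""
    gaps = Counter()
    for question, capability in QUESTION_TO_CAPABILITY.items():
        if question not in clear_yes:
            gaps[capability] += 1
    return gaps


# Hypothetical team: a clear "yes" only on questions 1, 2, 4, 7, 9 and 10.
gaps = gap_profile({1, 2, 4, 7, 9, 10})
for capability, count in sorted(gaps.items()):
    print(f"capability {capability}: {count} gap(s)")
# capability 0: 1 gap(s)
# capability 3: 2 gap(s)
# capability 5: 2 gap(s)
# capability 6: 1 gap(s)
```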

Scoring Approach: From Profile to Priority

A single overall score is deliberately not the goal of this check. An average across seven capabilities obscures exactly the one weak capability through which AI causes damage. We instead recommend a four-step approach.

The most effective single action after this check is almost never "more AI"; it is closing the lowest capability gap. Only when no capability remains at Level 1 does additional AI acceleration reliably raise throughput without costing stability.

  1. Build the profile: for each of the seven capabilities, enter the determined level (1-5). The result is a profile of seven values, ideally visualized as a radar chart. What becomes visible is not an average but the shape of your maturity, and in particular the deepest dip.
  2. Identify the bottleneck: mark the lowest capability. It is your AI risk driver, because the negative stability effect of AI materializes along the weakest capability, not along the average. If the lowest capability is at Level 1-2, aggressive AI speed is dangerous until this gap is closed.
  3. Derive the sequence: first raise every capability at Level 1 to at least Level 2, which removes the acute damage vectors. Then invest specifically in the capability with the greatest business relevance instead of optimizing broadly. Address a maximum of three capabilities at a time.
  4. Set the repetition cadence: fix the next assessment date in advance (recommendation: every six months). Only repetition turns the check from a snapshot into a steering mechanism and makes progress demonstrable.
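
The sequencing rule in step 3 can be sketched as a small function. The numeric `relevance` scores and the tie-breaking by relevance among Level 1 capabilities are assumptions of this sketch; the check itself only says to clear Level 1 first, then invest where business relevance is greatest, with at most three capabilities in parallel.

```python
MAX_PARALLEL = 3  # address a maximum of three capabilities at a time


def next_focus(profile: dict[str, int], relevance: dict[str, int]) -> list[str]:
    """Return the capabilities to work on next, most urgent first."""
    level_one = [name for name, level in profile.items() if level == 1]
    if level_one:
        # First remove the acute damage vectors: everything still at Level 1.
        ordered = sorted(level_one, key=lambda n: relevance.get(n, 0), reverse=True)
        return ordered[:MAX_PARALLEL]
    # Otherwise make one targeted investment instead of optimizing broadly.
    candidates = [name for name, level in profile.items() if level < 5]
    if not candidates:
        return []
    return [max(candidates, key=lambda n: relevance.get(n, 0))]


# Hypothetical profile and business-relevance scores (higher = more relevant).
profile = {
    "Clear Direction": 3, "Data Ecosystem": 2, "Quality Focus": 1,
    "User Focus": 3, "Feedback Loops": 1, "AI Application": 2,
    "Platform & DevEx": 3,
}
relevance = {"Quality Focus": 2, "Feedback Loops": 5, "Data Ecosystem": 4}

print(next_focus(profile, relevance))  # ['Feedback Loops', 'Quality Focus']
```

Once both Level 1 capabilities have been raised, the same call would return only the single most relevant remaining capability, here the data ecosystem.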

Positioning Within Your Assessment Portfolio

This AI-delivery capability check is deliberately scoped tightly to the question of whether your foundations carry AI gains. It is most effective in combination with two further instruments.

The AI Readiness Assessment examines organizational AI preparedness at the enterprise level — strategy, data, competencies, governance — and provides the strategic frame into which this delivery-specific check is embedded. The DevOps Maturity Assessment evaluates general delivery maturity across culture, automation, processes, measurement, and security, independent of the AI context, and serves as a broader baseline. The DORA Report 2025 article provides the research foundation and the strategic rationale for why exactly these seven capabilities make the difference.

Our recommendation for practice: start with this check as a fast, focused self-assessment for a pilot team. Use the identified weakest capability as a concrete entry point. Then expand with the DevOps Maturity Assessment for the complete delivery picture and the AI Readiness Assessment for the enterprise perspective.

Sources & References

The capabilities, maturity models, and findings referenced in this article are based on the following sources:

  • Google Cloud — Announcing the 2025 DORA Report: https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report
  • Google Cloud — 2025 DORA AI-Assisted Software Development Report: https://cloud.google.com/resources/content/2025-dora-ai-assisted-software-development-report
  • InfoQ — AI Is Amplifying Software Engineering Performance (DORA 2025): https://www.infoq.com/news/2026/03/ai-dora-report/
  • RedMonk — DORA 2025: Measuring Software Delivery After AI: https://redmonk.com/rstephens/2025/12/18/dora2025/
  • DORA — DevOps Research and Assessment: https://dora.dev/

Key Takeaways

  • According to the DORA Report 2025, AI is an amplifier, not a repair tool — existing maturity determines whether AI increases throughput or destroys stability.
  • Seven capabilities carry the AI effect: clear direction, a healthy data ecosystem, quality focus, user focus, fast feedback loops, thoughtful AI application, and a solid internal platform.
  • Rate each capability conservatively on a five-level scale — a practice only counts if it is the normal case, not the good day.
  • AI damage materializes along the weakest capability, not along the average — which is why a profile is more meaningful than an overall score.
  • The most effective action after the check is almost never "more AI" but closing the lowest capability gap.
  • Repeat the check on a planned six-month cadence — only repetition turns it into a steering mechanism rather than a snapshot.

Frequently Asked Questions

What is the AI-delivery capability check?

The AI-delivery capability check is a structured self-assessment that operationalizes the seven capabilities from the DORA Report 2025. Each capability (clear direction, a healthy data ecosystem, quality focus, user focus, fast feedback loops, thoughtful AI application, and a solid internal platform) is rated on a five-level scale. The result is not an overall score but a profile that shows which foundations carry your AI investment and which currently undermine it. The check provides a definition, maturity levels, diagnostic questions, common gaps, and immediate actions per capability and is designed as a repeatable steering instrument, not a one-time exam.

Why does AI amplify existing maturity instead of repairing weaknesses?

The DORA Report 2025 shows that AI assistance increases throughput almost universally, but the impact on delivery stability depends on the maturity of the organization. Teams with mature foundations (reliable tests, fast feedback loops, clean version control) gain speed and keep stability. Teams without those foundations also gain speed but simultaneously lose reliability, because AI scales the existing rate of change and therefore existing weaknesses too. AI is thus a lever that amplifies the sign of existing maturity, positive as well as negative. The detailed analysis of this thesis can be found in the DORA Report 2025 article.

How does this check differ from the DevOps Maturity Assessment?

The DevOps Maturity Assessment evaluates general delivery maturity across five dimensions (culture, automation, processes, measurement, and security), independent of the AI context. It is the broader baseline for the entire software delivery organization. The AI-delivery capability check is narrower and more specific: it asks only whether your foundations carry the gains from AI assistance, and operationalizes the seven capabilities from the DORA Report 2025 for this purpose. The two complement each other: the DevOps Maturity Assessment provides the complete delivery picture, this check focuses on the AI-specific risk question. We recommend starting with this check as a fast entry point and using the DevOps Maturity Assessment for the comprehensive baseline.

How does this check relate to the AI Readiness Assessment?

The AI Readiness Assessment examines AI preparedness at the enterprise level: strategy, data foundation, competencies, governance, and organizational preconditions. It answers the question of whether the organization as a whole is ready for AI. The AI-delivery capability check is the delivery-specific deep dive within that frame: it concretely tests whether the software delivery foundations carry the AI gains. In practice, the AI Readiness Assessment provides the strategic frame, this check the operational depth for the engineering organization. Ideally, you use both in combination.

How often should the check be repeated?

We recommend a six-month cadence. Only planned repetition turns the check from a snapshot into a steering mechanism that makes progress demonstrable. A semi-annual rhythm fits typical planning cycles and at the same time gives improvement measures, particularly on cultural and process capabilities, enough time to take effect. Repeating more often than every six months is rarely useful; repeating less often than annually risks losing the improvement momentum. Our tip: commit to the next date in advance, otherwise day-to-day operations will push it back.

Which gap should we address first after the check?

Identify the lowest capability in the profile and close that gap first; do not raise the average, do not optimize broadly. The negative stability effect of AI materializes along the weakest capability, not along the mean. As long as even one capability is at Level 1, aggressive AI speed is risky. Therefore raise every Level 1 capability to at least Level 2 as a priority, then invest specifically in the capability with the greatest business relevance. Never address more than three capabilities at a time to avoid overload and half-hearted execution.

Can we run the check ourselves, without external support?

Yes. The check is deliberately designed as a self-assessment and can be run in a few hours with your own team. It is important not to have a single person fill it out: involve at least three people from development, operations, and product to reduce blind spots, and rate conservatively. The typical self-bias is to rate yourself too positively in areas perceived as unimportant and too critically in areas that are currently frustrating. For the first organization-wide assessment, neutral external facilitation can help correct bias and bring in industry benchmarks; for the ongoing semi-annual repetition, the self-assessment mode is usually sufficient.
