IT Governance · April 26, 2026 · 7 min read

Three AI Breaches in Five Weeks — and Why It's Not a Security Problem

McKinsey, BCG, Bain — all three MBB firms had their internal AI platforms exposed within five weeks. At Bain, 18 minutes and a right-click on "View Source" was enough. This isn't hack drama. This is a case study in how AI rollouts in enterprises actually fail.

Ali Levin

IT Delivery Management Consultant, Alev-B

18 Minutes

That's how long it took CodeWall to compromise Bain & Company's internal AI platform Pyxis. Method: right-click on a publicly served JavaScript file, "View Source," copy credentials, log in. What followed read like the worst-case scenario from a risk workshop: 9,889 AI conversations between Bain consultants and Fortune 500 clients, 18,621 characters of proprietary system prompt containing Pyxis methodology and SQL schemas, plus an API endpoint that accepted raw SQL queries and reflected results through error messages. Direct access to the production database.

Bain was number three. McKinsey in March, BCG in April, Bain in late April. Three top consultancies, five weeks, the same modus operandi: unsecured AI tools in production, credentials or bypass paths that an experienced engineer finds in under an hour.

I've been observing AI rollouts in enterprises since 2023, and after these three incidents, my diagnosis is uncomfortable: This was not a security problem. It was a delivery failure with security consequences. Anyone who doesn't understand the difference will get their own Pyxis moment — not because their security is weak, but because their AI delivery pipeline is broken.

What Actually Happened

Before we get to the diagnosis, a sober summary of the three incidents. All sources are public, all numbers documented.

The Bain Case in Detail

Pyxis was acquired by Bain in 2018 and is the firm's in-house competitive intelligence platform — the system Fortune 500 clients use to analyze their competitors. CodeWall (Paul Price, the person behind the pseudonym) downloaded the JavaScript file served by the Pyxis web app, found a username and password in plain text, handed both to an AI agent that automated the login request — and had a fully authenticated production session 18 minutes later.

What he found there is the actual punchline: The platform's system prompt — 18,621 characters of Bain's proprietary Pyxis methodology, SQL schema definitions, and analytical frameworks — was readable by any authenticated session via conversation metadata. That's Bain's intellectual property, freely readable the moment anyone gets past the login.

Additionally, an API endpoint accepted raw SQL queries and returned results through error messages. Classic SQL injection, except here it wasn't a bug but a designed feature for internal analysts. After 27 days of responsible disclosure, CodeWall published the report. Bain patched within 24 hours.
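To make the two anti-patterns concrete, here is a deliberately hypothetical sketch in TypeScript. This is not Bain's actual code; all names and values are invented. It only shows the shape of what "View Source" reveals when credentials ship in a frontend bundle and an API executes raw SQL from the client, followed by the corrected shape:

```typescript
// Hypothetical reconstruction of the anti-patterns; not Bain's actual code.
// Rule of thumb: anything a browser can download, anyone can read.

// Anti-pattern 1: credentials shipped in the public frontend bundle.
// "View Source" on the served JS file is all it takes to read these.
const API_USER = "pyxis-service";     // invented example value
const API_PASS = "plaintext-secret";  // invented example value

// Anti-pattern 2: the client composes raw SQL and the server executes it.
async function runRawQuery(sql: string): Promise<unknown> {
  const res = await fetch("/api/query", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Basic " + btoa(`${API_USER}:${API_PASS}`),
    },
    body: JSON.stringify({ sql }), // arbitrary SQL reaches the database
  });
  return res.json(); // error messages reflect schema and data back out
}

// The corrected shape: the client sends parameters, never SQL, and holds
// no credentials at all. Authentication and query construction happen
// server-side, with secrets drawn from a vault or environment variables,
// and the query bound as a prepared statement.
async function getCompetitorProfile(companyId: string): Promise<unknown> {
  const res = await fetch(`/api/competitors/${encodeURIComponent(companyId)}`, {
    method: "GET", // session cookie or server-issued token, no embedded secrets
  });
  return res.json();
}
```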

| Firm | Tool / System | Date | Entry Vector | Data Exposed |
| --- | --- | --- | --- | --- |
| McKinsey | Internal GenAI chatbot | March 2026 | Read/write access via AI agent exploit | Read-write on client workspaces |
| BCG | Internal AI platform | April 2026 | No password set (publicly accessible) | Internal tools, client data |
| Bain & Company | Pyxis (competitive intelligence) | April 24, 2026 | Credentials hardcoded in frontend JS | 9,889 conversations, 18,621-char system prompt, SQL injection on production DB |

Why This Isn't a Security Problem

Hardcoded secrets in the frontend have been in the OWASP Top 10 since 2003. SQL injection has been textbook material for the same two decades. Both are covered in any junior developer course in the first week. We're not talking about a zero-day exploit leveraging an unpatched CVE in some obscure Linux kernel module. We're talking about two anti-patterns that any functioning software development process catches at four independent layers.

Four independent protective layers failed at Bain. Statistically, that's not a random gap but a systematic pattern of omission.

  1. Code review would have caught both findings — hardcoded credentials are the first thing a reviewer looks for.
  2. Static secret scanning in CI (Trufflehog, Gitleaks, GitHub Advanced Security) would have blocked the build before deploy ever happened; a minimal sketch follows this list.
  3. Dynamic application security testing (DAST) would have found the SQL injection endpoint in test runs.
  4. A threat model during the AI rollout would have asked: "What happens if someone reads our system prompts?" — and the answer would have been: "Then we lose our most important IP."
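What layer 2 does is mechanically simple. The sketch below is a deliberately naive Node/TypeScript stand-in for a real scanner such as Trufflehog or Gitleaks: it greps the build output for credential-shaped strings and fails the build on a hit. Real scanners add entropy analysis and hundreds of provider-specific rules; the patterns and the `dist` directory here are invented for illustration.

```typescript
// Naive CI secret scan: fail the build if anything credential-shaped
// reaches the bundle. Illustrative only; use a real scanner in practice.
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Demonstration patterns; real tools ship hundreds of provider rules.
const PATTERNS: RegExp[] = [
  /password\s*[:=]\s*["'][^"']+["']/i,
  /api[_-]?key\s*[:=]\s*["'][^"']+["']/i,
  /Authorization:\s*Basic\s+[A-Za-z0-9+/=]+/,
];

function scanDir(dir: string): string[] {
  const findings: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) {
      findings.push(...scanDir(path));
    } else if (/\.(js|ts|json)$/.test(entry.name)) {
      const text = readFileSync(path, "utf8");
      for (const p of PATTERNS) {
        if (p.test(text)) findings.push(`${path}: matches ${p}`);
      }
    }
  }
  return findings;
}

const hits = scanDir("dist"); // scan the built bundle, not just the source
if (hits.length > 0) {
  console.error("Secret scan failed:\n" + hits.join("\n"));
  process.exit(1); // block the deploy
}
```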

The Real Diagnosis: Delivery Governance Is Missing

When four independent security controls all fail at the same tool, in the same firm, at the same time, that's not a random lapse. That's a delivery mode. And this delivery mode has a name: AI tools are treated as "innovation lab," not as production software.

The pattern is the same in every large organization I've worked with on AI rollouts since 2024. A skunkworks team — often three to five people, often with direct partner or C-level sponsorship — builds an AI pilot in six to eight weeks. The pilot works impressively. There's an internal demo, a few enthusiastic pilot users. Then comes the decisive moment: Instead of handing the tool off to platform engineering or product IT, it stays with the skunkworks team. The sponsor doesn't want to slow the momentum. "Production readiness review" sounds like bureaucracy.

Six months later, the "pilot" tool has 200 active users, processes client data, is deeply integrated into workflows — and still sits outside normal IT governance. There's no mandatory code review process, no CI secret scanning, no pen test, no DLP classification of stored data, no disaster recovery plan. It officially never went into production, so production rules don't apply.

This isn't an MBB peculiarity. I see this exact pattern in DAX corporations, mid-sized companies, and software firms. At MBB, it's only particularly pronounced because the culture prioritizes time-to-pitch over time-to-hardening. If a system like Pyxis can be demonstrated to a client in a workshop, it has fulfilled its primary purpose — the rest is "engineering detail."

What Bain Did Right

A crucial point here, because otherwise this article falls too easily into the "MBB bashing" category. Bain patched within 24 hours of CodeWall's report going public, after 27 days of responsible disclosure. McKinsey and BCG responded similarly fast. That is exemplary, and exactly what a mature incident response organization should do: fast, no defensive PR, with concrete fixes.

The response of the three firms is not the problem. The problem is that all three made the same avoidable mistake beforehand. This is not an MBB phenomenon, it's an industry pattern — and that pattern hits any organization that says "AI-first" without meaning "production-first."

I'm not defending the MBB firms here out of sympathy, but because the lesson otherwise lands in the wrong place. Anyone who concludes from these three incidents that "the MBBs are technically incompetent" learns the wrong thing. The right lesson is: If three organizations with the best talent and biggest budgets in consulting can't close these gaps, your organization probably can't either.

Three Uncomfortable Truths

I'm not packaging this as "5 Tips for Better AI Security," because that would trivialize the diagnosis. Instead, here are three statements that most AI rollouts over the next twelve months will confirm, no matter how many listicles argue otherwise.

Truth 1: AI tools stop being pilots the moment they touch productive data.

The moment an "internal pilot tool" processes client data, customer data, or personal data is the moment it becomes production software. It doesn't matter what's in the Confluence wiki. It doesn't matter if the sponsorship memo says "MVP." When real data flows in, real rules apply.

The only working countermeasure is hard separation: Pilot environments get synthetic data or anonymized samples. The moment productive data is requested, there's a formal production readiness review — with mandatory code review, CI secret scanning, pen test, and DLP classification. No review, no productive data. Period.
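The "no review, no productive data" rule is simple enough to encode as a literal gate in the provisioning path. A minimal sketch, with invented field names, of what that check could look like:

```typescript
// Production readiness gate as data. Field names are illustrative;
// map them to whatever your review process actually records.
interface ReadinessReview {
  codeReviewMandatory: boolean; // every change goes through review
  ciSecretScanning: boolean;    // build fails on credential-shaped strings
  penTestCompleted: boolean;    // external test against the deployed app
  dlpClassified: boolean;       // stored data has an explicit DLP tier
}

// No review, no productive data. Period.
function mayReceiveProductiveData(r: ReadinessReview): boolean {
  return (
    r.codeReviewMandatory &&
    r.ciSecretScanning &&
    r.penTestCompleted &&
    r.dlpClassified
  );
}
```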

Truth 2: System prompts are your most valuable IP. Treat them accordingly.

The Pyxis system prompt was 18,621 characters. That's years of methodology iteration, empirical lessons from thousands of client engagements, refined analytical framework language. Bain probably didn't write it over a weekend. In many cases, those 18,621 characters are more valuable than the code that runs them.

Yet system prompts are treated in practice like configuration: as a string in the database, readable by anyone with read access to the right table, no versioning, no access logs, no IP classification. That's a category error. Anyone developing a system prompt with competitive advantage must protect it like source code — with repository, code review, access control, and audit log.
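What "protect it like source code" can mean in practice, as a minimal sketch: the prompt lives in a versioned store, is loaded server-side only, and every read lands in an audit log. The store, sink, and field names below are invented for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical versioned prompt record, pinned like a dependency and
// changed only through reviewed commits to its repository.
interface PromptRecord {
  id: string;
  version: string;
  body: string;
}

// Hypothetical audit sink for prompt reads.
interface AuditSink {
  write(event: {
    actor: string;
    promptId: string;
    at: Date;
    bodyHash: string;
  }): void;
}

// The prompt is resolved server-side only. The client never receives it:
// not in responses, not in conversation metadata, not in error messages.
function loadSystemPrompt(
  store: Map<string, PromptRecord>,
  auditSink: AuditSink,
  actor: string,
  promptId: string,
): string {
  const record = store.get(promptId);
  if (!record) throw new Error(`unknown prompt: ${promptId}`);
  auditSink.write({
    actor,
    promptId,
    at: new Date(),
    // Log a hash, not the body, so the audit trail itself leaks nothing.
    bodyHash: createHash("sha256").update(record.body).digest("hex"),
  });
  return record.body;
}
```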

Truth 3: AI conversation logs are crown-jewel data.

At Bain, 9,889 conversations between consultants and clients were exposed. Each one contains client strategies, competitive analyses, possibly non-public business numbers, and internal Bain methodology. That's the kind of data that belongs in the highest tier of a DLP classification — at the level of M&A materials and board submissions.

Instead, they were treated like normal application logs: in a database, no column-level encryption at rest, no row-level security, no automatic deletion. Anyone rolling out AI tools without explicit DLP classification of conversation logs has a blind spot any serious auditor finds in the first hour.
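By contrast, column-level encryption at rest is a few dozen lines of plumbing. Here is a minimal sketch using Node's built-in crypto module (AES-256-GCM); key management, rotation, and per-tenant keys via a KMS are deliberately out of scope, and the function names are invented.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const ALGO = "aes-256-gcm"; // authenticated encryption; key is 32 bytes

function encryptConversation(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // unique per record, never reused with a key
  const cipher = createCipheriv(ALGO, key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(plaintext, "utf8"),
    cipher.final(),
  ]);
  const tag = cipher.getAuthTag();
  // Store iv + auth tag + ciphertext together in the encrypted column.
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptConversation(stored: string, key: Buffer): string {
  const raw = Buffer.from(stored, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28); // GCM auth tag is 16 bytes
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv(ALGO, key, iv);
  decipher.setAuthTag(tag); // decryption fails if the record was tampered with
  return Buffer.concat([
    decipher.update(ciphertext),
    decipher.final(),
  ]).toString("utf8");
}
```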

What I Expect in the Next Twelve Months

I don't believe the MBB incidents will be the last of their kind — neither in consulting nor in other industries. The diagnosis is not "MBB has a problem," but "the industry is structurally ungoverned in AI rollouts." As long as that remains true, we'll see a comparable incident every three to four weeks, presumably alternating between consulting, banking, healthcare, and insurance — wherever the mix of highly sensitive data, high speed pressure, and low production readiness standards is most toxic.

What will help: first, the NIS2 directive and the EU AI Act, fully effective from mid-2026, which enforce minimum governance requirements. Second, increasing insurance pressure: cyber insurers are starting to require AI rollout reviews as a condition for policies. Third, and this is the more uncomfortable truth, it will take several more public incidents before boards take the topic seriously.

What You Can Do Now

If you work in an organization currently rolling out Copilot, ChatGPT Enterprise, Claude Enterprise, or your own AI agents: Sit down with the team on Monday and answer three questions. Where do our AI tools already process productive data today? Which of these tools have undergone a formal production readiness review? If someone opens our AI frontends' source code tomorrow — what do they find?

If you can answer all three from a standing start with "we know, it's documented, it's secured": Congratulations, you're ahead of most. If not, you have your first three tasks.

This exercise takes four to six hours. It costs nothing but the time of the right three to five people in the room. It needs no external consultant. Do it. With or without me.

Key Takeaways

  • Three MBB firms — McKinsey, BCG, Bain — had their internal AI tools compromised in five weeks. At Bain, 18 minutes and credentials from a frontend JavaScript file were enough.
  • Hardcoded secrets and SQL injection have been in the OWASP Top 10 since 2003. Four independent protective layers (code review, CI secret scanning, DAST, threat modeling) all failed at Bain.
  • Diagnosis: This is not a security problem, but a delivery governance failure. AI tools are treated as "innovation lab," not as production software — until they touch productive data.
  • Bain patched in 24 hours. The response was exemplary. The problem lies in the delivery mode beforehand: Speed-to-pilot eats speed-to-production.
  • Three uncomfortable truths: (1) Once AI tools touch productive data, they are production software. (2) System prompts are IP — protect them like source code. (3) AI conversation logs are crown-jewel data — DLP classification is mandatory.

Frequently Asked Questions

Was the Bain breach a sophisticated zero-day attack?

No. It was hardcoded credentials in a publicly served JavaScript file, combined with a SQL-injection-vulnerable API endpoint. Both anti-patterns have been in the OWASP Top 10 since 2003 and are covered in any junior developer course in the first week.

How long did the Pyxis compromise take, and what was exposed?

18 minutes from initial inspection of the frontend JavaScript file to a fully authenticated session in the production environment. After that, 9,889 AI conversations, the 18,621-character system prompt, and, through a SQL endpoint, the production database lay exposed.

Is this an MBB-specific problem?

No. The delivery pattern — an AI pilot moving into productive use without a production readiness review — has been visible since 2024 in DAX corporations, mid-sized companies, and software firms. At MBB, it's only particularly pronounced because the culture prioritizes time-to-pitch over time-to-hardening.

What's the difference between a security problem and a delivery problem?

A security problem arises when a new, unknown attack vector is discovered. A delivery problem arises when known best practices are not applied in delivery. Hardcoded credentials in a frontend JS file are not a security problem — the solution has been known for 20 years. It's a delivery problem: code review, CI scanning, and DAST were either absent or bypassed.

Which frameworks help with AI delivery governance?

NIST CSF 2.0 (especially the GOVERN function), the EU AI Act (mandatory from August 2026 for high-risk systems), ISO/IEC 42001 (AI management systems), and DORA metrics applied internally to AI pipelines. Most organizations don't need a new framework, but consistent application of the existing ones.

How should system prompts be protected?

Treat them like source code: versioned in a repository with mandatory code review, access control at read level, an audit log of invocations, and strictly not viewable in the tool frontend — not even via conversation metadata, as was the case with Pyxis. Anyone storing system prompts in the database and exposing them to authenticated sessions loses their IP advantage at the first login bypass.

What's the first concrete step for my organization?

Sit down on Monday for four hours with three to five people — engineering, security, product, and a C-level sponsor — and answer: (1) Where do our AI tools process productive data today? (2) Which have undergone a formal production readiness review? (3) What does someone find who opens the frontend source code tomorrow? The answers form your backlog priorities for the next 90 days.

AI Governance · Enterprise AI Security · NIST CSF 2.0 · AI Delivery · DORA Metrics · Pyxis · Bain · McKinsey · BCG
