Scenario Planning
for Generative AI
Four futures. One framework. Your strategy.
The Question
The AI industry has committed over $600 billion in capital expenditure for 2026 alone. That money is already flowing into data centers, GPU clusters, and training runs. It will produce outcomes.
The question isn’t whether AI will change — it’s which version of change to prepare for.
This booklet presents four credible scenarios for the next 2–3 years. They are not predictions. They are planning tools — structured what-ifs designed to stress-test your AI strategy and prepare your team for multiple futures simultaneously.
The four scenarios map onto two fundamental uncertainties. First: does scaling continue to deliver capability jumps, or do we hit diminishing returns at current-ish levels? Second: does the investment math work out, or is the infrastructure being built far beyond what revenue can justify? These two axes generate four distinct futures — continued scaling, efficiency-driven commoditization, a financial correction, and a capability plateau compounded by regulation.
Each scenario is anchored by an interactive visualization, followed by a written chapter that unpacks the data and arguments. You can use this booklet in two ways: as a presentation tool (lead with the visual, trigger discussion), or as a standalone reading experience (read the chapters for the full picture). Both work — design your session around your audience.
At the end, a 2×2 matrix maps all four scenarios into a single mental model, and a workshop exercise helps you translate insight into action.
Continued Scaling
“Where Does the Money Go?”
The money is already spent
When people debate whether AI will “live up to the hype,” they often miss a crucial fact: the investment decisions have already been made. The Big Five hyperscalers — Alphabet, Amazon, Apple, Meta, and Microsoft — spent a combined $228 billion on capital expenditure in 2024, up 62% from $140 billion in 2023. For 2025, guided spending reaches $416 billion. For 2026, the trajectory points to $700 billion or more, with Oracle adding another $50 billion. Summed across 2025–2027, Goldman Sachs projects total hyperscaler capex of $1.15 trillion. This money is flowing into GPU clusters, power infrastructure, and data centers — and the vast majority is earmarked specifically for AI.
The reason this matters for planning is that capex doesn’t translate into capability instantly. Building a data center takes 12–24 months. Procuring the chips takes 6–12 months. Training a frontier model takes another 6–12 months. Post-training, safety testing, and deployment add 3–6 more. Each year’s spending splits into three bets running at different timescales: inference capacity arriving in roughly 2 years, training runs for models 3 years out, and research compute powering breakthroughs 4+ years away.
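The lag can be sketched with back-of-envelope arithmetic, using the stage durations quoted above. The serial sum is an upper bound, since in practice the stages overlap:

```python
# Capex-to-capability lag, using the stage durations quoted above
# (in months). Stages overlap in practice, so the serial sum is an
# upper bound on the true pipeline length.
stages = {
    "data center build":            (12, 24),
    "chip procurement":             (6, 12),
    "frontier training run":        (6, 12),
    "post-training and deployment": (3, 6),
}

min_lag = sum(lo for lo, _ in stages.values())
max_lag = sum(hi for _, hi in stages.values())
print(f"serial lag: {min_lag}-{max_lag} months (~{min_lag // 12}-{max_lag // 12} years)")
# -> serial lag: 27-54 months (~2-4 years)
```

The result lines up with the 2–4 year horizons of the three bets: money committed today surfaces as capability two to four years out.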
The staircase pattern
Looking backward, AI capability has advanced in a staircase pattern: a major jump every roughly two years, followed by refinement within that generation. GPT-3 (2020) was dramatically surpassed by GPT-4 (2023), which was then refined through GPT-4 Turbo, GPT-4o, and eventually GPT-4o-mini — each iteration better and cheaper, but not a fundamentally new capability tier. Anthropic’s lineup shows the same shape: Claude 3 Opus refined into Claude 3.5 Sonnet, then a generational jump to Claude Opus 4, now refined as Claude Opus 4.6.
If this pattern holds, the $228 billion spent in 2024 is currently producing the training infrastructure for models that will ship in 2026–2027. The $416 billion committed for 2025 funds models arriving in 2027–2028. And the $700 billion planned for 2026 is investing in capabilities whose architectures may not even be designed yet — research compute for ideas that haven’t been conceived.
What is being built
The scale of individual projects is staggering. Elon Musk’s xAI deployed Colossus — a cluster of 100,000 GPUs — in just 122 days in 2024. Amazon’s Project Rainier, built to serve Anthropic, targets roughly 500,000 AI chips. The Stargate project (a joint venture between OpenAI, Oracle, and SoftBank) is anchored by a campus in Abilene, Texas, and plans clusters exceeding one gigawatt of power — the equivalent of a nuclear power plant dedicated to AI — while Meta is building gigawatt-scale campuses of its own. At these scales, the limiting factor shifts from chip availability to raw electrical power: the Three Mile Island nuclear facility is being restarted specifically to supply AI data centers.
The inference bet
A common misunderstanding is that all this money is about training bigger models. In reality, the labs are increasingly betting that inference — running models at scale for millions of users — will dominate AI compute, accounting for roughly 70% by 2030. Training a frontier model is a one-time cost; serving it to every enterprise customer, every developer, every consumer product is a continuous cost that scales with adoption. The capex surge is as much about building the serving infrastructure for AI-powered products as it is about training the next generation of models.
This is the core planning insight of Scenario 1: even if you believe the current generation of AI is “good enough,” the investment already committed will produce outcomes over the next 2–4 years. Those outcomes — faster models, cheaper inference, new capabilities — will change what’s possible and what’s expected. Your plans need to account for a moving target, not a snapshot.
Trigger signals — what to watch for
- Next-generation frontier models show a large, undeniable capability jump over their predecessors
- Enterprise AI revenue growth accelerates — the $500B+ revenue gap begins to close
- Hyperscaler capex guidance continues rising >30% year-over-year through 2027
- New model architectures emerge that can efficiently use the massive clusters being built
Data: company earnings reports & guidance • Big 5 = Alphabet, Amazon, Apple, Meta, Microsoft
Efficiency Revolution
“How Much Does GPT-4 Cost?”
Mistral CEO Arthur Mensch: “Generic intelligence is a commodity, but contextual intelligence is a scarcity.” If models become free, value migrates to the layers around them.
Discussion: If the model is free, where does your team invest?
The training cost freefall
In March 2023, OpenAI trained GPT-4 for an estimated $63–100 million — Sam Altman confirmed publicly that the cost exceeded $100 million including research and development. By July 2024, Meta had trained Llama 3.1 405B, an open-weight model matching GPT-4 on most benchmarks, for roughly $60 million in compute (30.84 million H100 GPU-hours). Then in December 2024, DeepSeek released V3 — a model that matched or exceeded GPT-4o on key benchmarks for just $5.6 million in GPU time.
That is a roughly 95% cost reduction in 21 months. A month later, DeepSeek released R1, which matched OpenAI’s o1 on reasoning tasks for an incremental $294,000 in training cost.
The caveats matter: DeepSeek’s $5.6 million figure covers only the final pre-training run. Their parent company High-Flyer invested over $500 million in Nvidia GPUs total, and the full cost from base model to R1 is estimated at $6–7 million by Epoch AI. But even the generous estimate represents a 90%+ reduction from GPT-4. The key innovations enabling this — FP8 mixed-precision training, mixture-of-experts architectures with load balancing, and custom CUDA kernels achieving 85%+ GPU utilization versus the industry average of 55–65% — are algorithmic, not hardware-dependent. They can be replicated.
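The headline reductions can be checked directly from the figures above. A sketch — every input is an estimate quoted in this chapter, not an independently verified cost:

```python
# Training-cost estimates quoted above, in millions of USD.
costs = {
    "GPT-4 (Mar 2023)": 100.0,                 # upper-bound public estimate
    "Llama 3.1 405B (Jul 2024)": 60.0,         # compute only
    "DeepSeek V3 (Dec 2024)": 5.6,             # final pre-training run only
    "DeepSeek base->R1, full (Epoch AI)": 7.0, # generous end-to-end estimate
}

baseline = costs["GPT-4 (Mar 2023)"]
for name, cost in costs.items():
    reduction = 100 * (1 - cost / baseline)
    print(f"{name}: ${cost}M  ({reduction:.0f}% below GPT-4)")
```

Even the generous $7 million end-to-end figure comes out more than 90% below the GPT-4 baseline, which is the point the prose makes.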
The open-source convergence
The Stanford HAI 2025 AI Index Report documented the most important shift in the AI landscape: the performance gap between the best open-weight and proprietary models, measured by Chatbot Arena Elo ratings, shrank from 8.04% in January 2024 to 1.7% by February 2025 — a 79% reduction in a single year. On MMLU specifically, the gap between US and Chinese models collapsed from 17.5 percentage points to just 0.3 between the end of 2023 and the end of 2024.
Llama 3.1 405B was the first open model to match or exceed GPT-4 across multiple benchmarks in July 2024, roughly 16 months after GPT-4’s release. By early 2025, that lag had compressed further. Open-source models now represent 62.8% of all models by count, and the best open LLMs lag closed ones by 5–22 months on benchmarks — with the gap narrowing rapidly. One analysis projected open-closed parity by Q2 2026.
Inference pricing in freefall
The pricing evolution of OpenAI’s own API tells the commoditization story in dollar terms. GPT-4 launched at $60 per million output tokens in March 2023. GPT-4 Turbo brought that down to $30 in November 2023. GPT-4o launched at $15 in May 2024, then was cut to $10 in October. Meanwhile, GPT-4o-mini offered GPT-4-class performance at $0.60 per million tokens — a 99% reduction from GPT-4’s launch price in under two years.
Open-source alternatives are even cheaper. Llama 3.3 70B via Groq costs $0.71 per million output tokens. DeepSeek V3 is available at $0.42. Self-hosted 70B models on H100 hardware can reach approximately $0.07 per million tokens at full utilization. On average, open-source models cost roughly one-seventh as much as their proprietary equivalents.
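The price curve above, restated in code (output-token prices per million tokens, exactly as quoted in this chapter):

```python
# OpenAI output-token prices per million tokens, as quoted above.
openai_prices = [
    ("GPT-4, Mar 2023", 60.00),
    ("GPT-4 Turbo, Nov 2023", 30.00),
    ("GPT-4o, May 2024", 15.00),
    ("GPT-4o after price cut", 10.00),
    ("GPT-4o-mini", 0.60),
]

launch = openai_prices[0][1]
for name, price in openai_prices:
    drop = 100 * (1 - price / launch)
    print(f"{name}: ${price:.2f}/M  ({drop:.0f}% below GPT-4 launch price)")

# Open-weight comparators from the text:
print(f"Llama 3.3 70B via Groq: $0.71/M ({launch / 0.71:.0f}x cheaper than launch GPT-4)")
print(f"Self-hosted 70B on H100: ~$0.07/M ({launch / 0.07:.0f}x cheaper)")
```

The last line of the first loop is the commoditization story in one number: GPT-4-class output at 1% of GPT-4’s launch price.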
Where does value go when the model is free?
Mistral CEO Arthur Mensch has been the most articulate voice on this shift. In a March 2026 piece, he argued that many companies can move their workloads away from closed-source APIs to open-source models, and that the massive capability jumps between model generations have flattened into incremental gains. His strongest formulation: “the most valuable AI won’t be the one that knows everything about the world; it will be the one that knows everything about you.”
If Mensch is right — and the cost data supports the argument — then the model itself becomes a commodity layer, and value migrates to the layers around it: fine-tuning and domain adaptation, data pipelines and retrieval-augmented generation, tooling and orchestration (agent frameworks, MCP servers, evaluation pipelines), and ultimately domain expertise. The organizations that win in this scenario are not those with the best model, but those with the best understanding of their own problems.
Trigger signals — what to watch for
- Open-source model matches frontier proprietary within weeks of release, not months
- Major enterprise shifts production workloads from proprietary APIs to open-source alternatives
- Inference costs drop below $0.10 per million tokens for GPT-4-class output
- Hyperscaler capex growth decelerates because efficiency gains reduce hardware requirements
Data: OpenAI API pricing history • DeepSeek technical reports • Stanford HAI 2025 AI Index • Epoch AI
Financial Correction
“Have We Seen This Before?”
Survivors vs. Casualties — then and now
Click any card to flip it and see what happened.
The dot-com precedent
On March 10, 2000, the Nasdaq Composite reached an all-time high of 5,048.62. By October 9, 2002, it had fallen to 1,114 — a 78% decline that destroyed over $5 trillion in market value. The Nasdaq didn’t close above 5,000 again until April 23, 2015 — a recovery that took fifteen years. At the peak, venture capital investment had surged from roughly $7 billion in 1995 to nearly $100 billion in 2000, with internet companies absorbing 80% of all venture capital. Telecom companies invested more than $500 billion in infrastructure in the five years following the 1996 Telecommunications Act.
The lesson that most people take from this period is: “it was a bubble and it burst.” The more useful lesson is that the survivors and casualties were distinguished by one thing — not the quality of their technology, but whether they had real revenue, real customers, and cash to survive a funding drought.
Amazon vs. Pets.com
Amazon’s stock fell 94% from roughly $106 in December 1999 to about $5.51 in late 2001. Yet its revenue grew every single year through the crash: $2.76 billion in 2000, $3.12 billion in 2001, $5.26 billion in 2003, $8.49 billion in 2005. It posted its first profitable quarter in Q4 2001 and its first full profitable year in 2003, with $35 million net income on $5.26 billion revenue. The key decision was a well-timed $1.25 billion bond offering that gave Amazon $1 billion in cash to survive the drought. Today it is worth roughly $2.5 trillion — over 800 times its trough market cap.
Pets.com raised $300 million total, spent over $70 million on advertising while generating only $619,000 in revenue, and shut down 268 days after its IPO. Webvan burned through $1.5 billion building automated warehouses before filing bankruptcy. Boo.com raised $135 million, burned it in 18 months, and sold its assets for under $2 million. The common thread: negative unit economics, no path to profitability, and complete dependence on the next funding round.
AI investment has entered unprecedented territory
The combined Big Five capex (Alphabet, Amazon, Apple, Meta, Microsoft) grew from roughly $140 billion in 2023 to $228 billion in 2024 (+62% year-over-year), with projections of $388–443 billion for 2025 and $600–640 billion for 2026. Capital intensity has reached 45–57% of revenue — historically unprecedented for these companies. Venture funding has concentrated similarly: global AI VC funding grew from roughly $45–50 billion in 2022 to $211 billion in 2025, the first year AI startups captured more than half (52.7%) of all global venture deal value.
OpenAI reached an $852 billion post-money valuation after its $122 billion funding round in March 2026. Annualized revenue hit $25 billion by February 2026, up from roughly $2 billion in 2023. But the company projects a $14–17 billion loss in 2026, is not expected to be profitable until 2029 at the earliest, and has committed $600 billion in compute spending through 2030. Anthropic reached $380 billion valuation with its $30 billion Series G in February 2026, with revenue growing from $1 billion ARR in December 2024 to an estimated $19–30 billion ARR by early 2026.
The revenue gap
Sequoia partner David Cahn published “AI’s $600B Question” in June 2024, calculating that the AI infrastructure buildout requires roughly $600 billion in annual end-user revenue to justify itself. At the time, actual AI product revenue was roughly $100 billion — a $500 billion annual gap. Since then, both spending and revenue have grown, but spending has grown far faster: capex roughly tripled while the revenue gap has likely widened, not narrowed. Barclays estimated that current capex levels would require the equivalent of 12,000 ChatGPT-sized products to break even.
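Cahn’s arithmetic is simple enough to restate. A sketch using the mid-2024 figures quoted above; the second calculation uses a hypothetical revenue-growth assumption purely to illustrate the widening dynamic:

```python
# David Cahn's mid-2024 framing, in billions of USD per year.
required_revenue = 600   # annual end-user revenue needed to justify the buildout
actual_revenue = 100     # estimated AI product revenue at the time
gap = required_revenue - actual_revenue
print(f"mid-2024 annual revenue gap: ${gap}B")

# Why the gap widens: if capex (and with it the required-revenue bar)
# roughly triples while revenue merely doubles (HYPOTHETICAL growth
# rates, not figures from the text), the shortfall grows even though
# revenue itself is rising.
new_gap = required_revenue * 3 - actual_revenue * 2
print(f"illustrative later gap: ${new_gap}B")
```

The direction of the result, not the exact numbers, is the planning point: absolute revenue growth does not close a gap whose denominator grows faster.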
The enterprise adoption data is sobering: an MIT study found 95% of organizations getting zero return from generative AI investments, and a Deloitte 2026 survey found only 20% of enterprises reporting AI driving revenue, with two-thirds still stuck in pilot phase.
Why the parallel breaks — and why it might not matter
There are important differences from the dot-com era. Today’s leading AI investors are massively profitable companies spending from earnings, not startups burning venture capital. Nasdaq forward price-to-earnings ratios are approximately 26 times versus 60 times at the dot-com peak. Enterprise adoption is far more advanced: 87% of large enterprises have implemented AI in some form. But the core structural risk — investment dramatically outpacing revenue realization — is identical. And new risks have emerged: AI-related corporate debt has ballooned to $1.2 trillion (JPMorgan), GPU rental prices have already fallen roughly 70% from peak, and the real useful life of GPU infrastructure may be 2–3 years rather than the 5–6 years used for accounting depreciation.
The question for your planning is not whether AI is valuable. It is. The question is whether your specific vendors, tools, and providers are the Amazon or the Pets.com of this cycle.
Trigger signals — what to watch for
- OpenAI or Anthropic IPO valuations correct significantly (>30%) within 6 months of listing
- Hyperscaler capex guidance flattens or declines for the first time since 2022
- Multiple AI-native startups fail or get acqui-hired in a single quarter (Inflection, Character.AI pattern)
- GPU rental prices continue falling — H100 rates already down ~70% from peak
- Major AI-related debt defaults or CoreWeave-style stranded asset writedowns
Data: Nasdaq historical data • Sequoia “AI’s $600B Question” • Barclays Research • MIT/Deloitte enterprise surveys
Plateau + Regulation
“The Shrinking Gain”
Lab benchmarks dramatically overstate real-world capability:
- 92% diagnostic accuracy in the lab → 52% in a real-world meta-analysis (83 studies)
- 97% on HumanEval → 26% on real freelance coding tasks (SWE-Lancer)
- 95% of organizations report zero return from GenAI investments (MIT)
Discussion: The ceiling is flattening, the floor is rising, and the current ceiling overstates what you can deploy. What does your AI roadmap look like?
The flattening curve
MMLU (Massive Multitask Language Understanding) has been the most widely cited AI benchmark for three years. The trajectory of MMLU gains tells a stark story: GPT-3 scored 43.9% in 2020. GPT-3.5 jumped to 70.0% — a gain of 26.1 points. GPT-4 reached 86.4% — a gain of 16.4 points. Then the curve flattened dramatically: GPT-4o added just 2.3 points, GPT-4.5 added 2.1, and GPT-5 approximately 1.7. All frontier models now cluster in the 88–93% range. Since approximately 6.5% of MMLU questions contain errors, the practical ceiling is around 93% — meaning frontier models are essentially at the top.
This flattening isn’t limited to MMLU. GSM8K (grade-school math) is completely saturated — frontier models score 95–99%. HumanEval (coding) has been pushed to 93–97%. The industry has responded by creating harder benchmarks: FrontierMath (research-grade mathematics) where AI solves only about 2% of problems, Humanity’s Last Exam where the top score is 45.8%, ARC-AGI-2 (genuine generalization) where the best AI score is 54% at $30 per task while humans solve 100%, and BigCodeBench where AI succeeds 35.5% of the time versus a 97% human standard. These harder benchmarks reveal that near-human performance on traditional tests masks fundamental limitations in reasoning and generalization.
Peak data and the end of scaling
At NeurIPS 2024, OpenAI co-founder Ilya Sutskever declared what many researchers had been sensing: the age of scaling as we knew it was over. He compared training data to fossil fuels — a finite resource being rapidly depleted. Epoch AI’s peer-reviewed research quantifies the constraint: the total stock of high-quality public text data is estimated at roughly 9 trillion tokens, and models may exhaust this supply between 2026 and 2028. Data movement bottlenecks impose a further hard limit at approximately 2×10³¹ FLOP per training run — a scale frontier training could reach roughly three years from 2024.
Pre-training as we know it will unquestionably end… because while compute is growing through better hardware, the data is not growing because we have but one internet.
— Ilya Sutskever, NeurIPS 2024
He described 2020–2025 as “the age of scaling” and declared a return to “the age of research,” where breakthroughs require new conceptual ingredients rather than larger clusters. This doesn’t mean AI stops improving — it means the path to improvement changes from “throw more compute at it” to “invent something fundamentally new.”
The benchmark-to-deployment gap
For corporate audiences, the most underappreciated data point is this: performance on benchmarks dramatically overstates real-world capability. ChatGPT-4 achieved 92% diagnostic accuracy in controlled medical studies, but a meta-analysis across 83 studies found only 52.1% overall AI diagnostic accuracy in real-world settings — a nearly 40-point gap. On the SWE-Lancer benchmark of real freelance coding tasks, even top models succeed only 26.2% of the time despite near-perfect HumanEval scores. On RE-Bench long-horizon tasks, AI systems score 4 times higher than humans at 2 hours but humans outperform AI 2:1 at 32 hours — suggesting current AI excels at pattern-matching but struggles with sustained complex reasoning.
The regulatory floor is rising
While the capability ceiling flattens, the regulatory floor is steadily rising. The EU AI Act is the most consequential framework, with enforcement phased in over three years. Prohibited practices (social scoring, manipulative AI, predictive policing) were banned in February 2025. General-purpose AI model obligations took effect in August 2025, activating a penalty regime with fines up to €35 million or 7% of global turnover. High-risk AI system obligations — requiring conformity assessment, human oversight, and technical documentation before deployment — apply from August 2026. Full enforcement arrives in August 2027.
The compliance costs are substantial: large enterprises face an estimated $8–15 million initial investment for high-risk systems. In parallel, there are now 56+ copyright lawsuits against AI companies. The Bartz v. Anthropic case produced a $1.5 billion class-wide settlement. The New York Times v. OpenAI case has been consolidated into a multi-district litigation with summary judgment due in April 2026. The combined effect — diminishing capability gains on top, rising regulatory and legal requirements on the bottom — is a narrowing “deployable innovation space” that shapes what organizations can actually ship.
Boring but useful
Scenario 4 is not a disaster scenario. It is arguably the most likely near-term outcome for enterprise practitioners. AI becomes reliable, well-understood infrastructure — similar to cloud computing a decade ago. Not revolutionary, but genuinely useful. The opportunity shifts from “what can AI do that was previously impossible?” to “how can we deploy what already works, reliably and compliantly?” Organizations that treat this as a compliance challenge rather than an innovation challenge may actually find the strongest strategic position.
Trigger signals — what to watch for
- The next frontier model generation shows only incremental benchmark improvement (<3 pts MMLU)
- EU AI Act enforcement actions begin — first fines or compliance orders issued against AI providers
- Major copyright ruling goes against AI training (NYT v. OpenAI summary judgment, due April 2026)
- Enterprises begin deferring AI projects citing compliance uncertainty rather than budget constraints
- Ilya Sutskever’s thesis is validated: new research paradigms (not scaling) drive the next capability jump
Data: MMLU/benchmark scores from model papers • EU AI Act timelines • Ilya Sutskever NeurIPS 2024 • Epoch AI data projections
The 2×2 Matrix
All four scenarios map onto two fundamental uncertainties: do capability jumps continue (or do gains flatten), and is the investment justified (or is the infrastructure overbuilt)? Click any cell to navigate to that scenario.
Continued Scaling
The staircase holds. Capex translates to capability. Enterprise revenue catches up. Full steam ahead.
Financial Correction
The tech works, but the investment timeline doesn’t. Correction kills companies, not capability. Amazon vs. Pets.com.
Plateau + Regulation
Diminishing returns meet rising compliance burden. AI becomes boring infrastructure. Useful but not revolutionary.
Efficiency Revolution
Smaller models close the gap. Massive clusters weren’t needed. The moat shifts from the model to everything around it.
These scenarios are not mutually exclusive. Elements of several can unfold simultaneously. Efficiency gains (Scenario 2) accelerate a financial correction (Scenario 3) by commoditizing the very technology hyperscalers are spending $600B+ to build. The capability ceiling (Scenario 4) undermines the revenue projections needed to justify that investment.
The strongest strategic position is one that performs adequately under all four scenarios — not one that bets everything on the one you think is most likely.
Scenario Planning Worksheet
For each scenario, answer the first three questions; two closing questions then cut across all four scenarios. There are no right answers — the value is in the thinking.
Scenario 1: Continued Scaling
If this materializes in 12 months…
Scenario 2: Efficiency Revolution
If this materializes in 12 months…
Scenario 3: Financial Correction
If this materializes in 12 months…
Scenario 4: Plateau + Regulation
If this materializes in 12 months…
1. Biggest risk to your current project
Under each scenario, what is the single biggest risk to your current AI initiative? Which scenario threatens it most?
2. What would you change today?
For each scenario, what is one decision you would make differently right now — before you know which scenario materializes?
3. Earliest signal to watch
What’s the one observable event that would tell you this scenario is unfolding? Where would you see it first?
4. Which scenario are you betting on?
Look at your current AI strategy. Which scenario is it implicitly assuming? What happens if you’re wrong?
5. One action that works under all scenarios
What is one thing you can do this month that improves your position regardless of which future materializes?
Interactive versions of all visualizations: demos.barcik.training
Full research and data: publications.barcik.training
© 2026 Robert Barcik · LearningDoe s.r.o. · barcik.training