Scenario Planning
for Generative AI
Four futures. One framework. Your strategy.
The Question
The AI industry has committed over $600 billion in capital expenditure for 2026 alone. That money is already flowing into data centers, GPU clusters, and training runs. It will produce outcomes.
The question isn’t whether AI will change — it’s which version of change to prepare for.
This booklet presents four credible scenarios for the next 2–3 years. They are not predictions. They are planning tools — structured what-ifs designed to stress-test your AI strategy and prepare your team for multiple futures simultaneously.
The four scenarios map onto two fundamental uncertainties. First: does scaling continue to deliver capability jumps, or do we hit diminishing returns at current-ish levels? Second: does the investment math work out, or is the infrastructure being built far beyond what revenue can justify? These two axes generate four distinct futures — continued scaling, efficiency-driven commoditization, a financial correction, and a capability plateau compounded by regulation.
Each scenario is anchored by an interactive visualization, followed by a written chapter that unpacks the data and arguments. You can use this booklet in two ways: as a presentation tool (lead with the visual, trigger discussion), or as a standalone reading experience (read the chapters for the full picture). Both work — design your session around your audience.
At the end, a 2×2 matrix maps all four scenarios into a single mental model, and a workshop exercise helps you translate insight into action.
Continued Scaling
“Where Does the Money Go?”
The money is already spent
When people debate whether AI will “live up to the hype,” they often miss a crucial fact: the investment decisions have already been made. The Big Five hyperscalers — Alphabet, Amazon, Apple, Meta, and Microsoft — spent a combined $228 billion on capital expenditure in 2024, up 62% from $140 billion in 2023. For 2025, guided spending reaches $416 billion. For 2026, the trajectory points to $700 billion or more, with Oracle adding another $50 billion. Summed across 2025–2027, Goldman Sachs projects total hyperscaler capex of $1.15 trillion. This money is flowing into GPU clusters, power infrastructure, and data centers — and the vast majority is earmarked specifically for AI.
The reason this matters for planning is that capex doesn’t translate into capability instantly. Building a data center takes 12–24 months. Procuring the chips takes 6–12 months. Training a frontier model takes another 6–12 months. Post-training, safety testing, and deployment add 3–6 more. Each year’s spending splits into three bets running at different timescales: inference capacity arriving in roughly 2 years, training runs for models 3 years out, and research compute powering breakthroughs 4+ years away.
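The lag can be sketched with back-of-envelope arithmetic, using the stage durations quoted above. The serial sum is an upper bound, since in practice the stages overlap:

```python
# Capex-to-capability lag, using the stage durations quoted above
# (in months). Stages overlap in practice, so the serial sum is an
# upper bound on the true pipeline length.
stages = {
    "data center build":            (12, 24),
    "chip procurement":             (6, 12),
    "frontier training run":        (6, 12),
    "post-training and deployment": (3, 6),
}

min_lag = sum(lo for lo, _ in stages.values())
max_lag = sum(hi for _, hi in stages.values())
print(f"serial lag: {min_lag}-{max_lag} months (~{min_lag // 12}-{max_lag // 12} years)")
# -> serial lag: 27-54 months (~2-4 years)
```

The result lines up with the 2–4 year horizons of the three bets: money committed today surfaces as capability two to four years out.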
The staircase pattern
Looking backward, AI capability has advanced in a staircase pattern: a major jump every roughly two years, followed by refinement within that generation. GPT-3 (2020) was dramatically surpassed by GPT-4 (2023), which was then refined through GPT-4 Turbo, GPT-4o, and eventually GPT-4o-mini — each iteration better and cheaper, but not a fundamentally new capability tier. Anthropic’s lineup shows the same shape: Claude 3 Opus refined into Claude 3.5 Sonnet, then a generational jump to Claude Opus 4, now refined as Claude Opus 4.6.
If this pattern holds, the $228 billion spent in 2024 is currently producing the training infrastructure for models that will ship in 2026–2027. The $416 billion committed for 2025 funds models arriving in 2027–2028. And the $700 billion planned for 2026 is investing in capabilities whose architectures may not even be designed yet — research compute for ideas that haven’t been conceived.
What is being built
The scale of individual projects is staggering. Elon Musk’s xAI deployed Colossus — a cluster of 100,000 GPUs — in just 122 days in 2024. Amazon’s Project Rainier, built to serve Anthropic, targets roughly 500,000 AI chips. The Stargate project (a joint venture between OpenAI, Oracle, and SoftBank) is anchored by a campus in Abilene, Texas, and plans clusters exceeding one gigawatt of power — the equivalent of a nuclear power plant dedicated to AI — while Meta is building gigawatt-scale campuses of its own. At these scales, the limiting factor shifts from chip availability to raw electrical power: the Three Mile Island nuclear facility is being restarted specifically to supply AI data centers.
The inference bet
A common misunderstanding is that all this money is about training bigger models. In reality, the labs are increasingly betting that inference — running models at scale for millions of users — will dominate AI compute, accounting for roughly 70% by 2030. Training a frontier model is a one-time cost; serving it to every enterprise customer, every developer, every consumer product is a continuous cost that scales with adoption. The capex surge is as much about building the serving infrastructure for AI-powered products as it is about training the next generation of models.
This is the core planning insight of Scenario 1: even if you believe the current generation of AI is “good enough,” the investment already committed will produce outcomes over the next 2–4 years. Those outcomes — faster models, cheaper inference, new capabilities — will change what’s possible and what’s expected. Your plans need to account for a moving target, not a snapshot.
Trigger signals — what to watch for
- Next-generation frontier models show a large, undeniable capability jump over their predecessors
- Enterprise AI revenue growth accelerates — the $500B+ revenue gap begins to close
- Hyperscaler capex guidance continues rising >30% year-over-year through 2027
- New model architectures emerge that can efficiently use the massive clusters being built
Data: company earnings reports & guidance • Big 5 = Alphabet, Amazon, Apple, Meta, Microsoft
Efficiency Revolution
“How Much Does GPT-4 Cost?”
Mistral CEO Arthur Mensch: “Generic intelligence is a commodity, but contextual intelligence is a scarcity.” If models become free, value migrates to the layers around them.
Discussion: If the model is free, where does your team invest?
The training cost freefall
In March 2023, OpenAI trained GPT-4 for an estimated $63–100 million — Sam Altman confirmed publicly that the cost exceeded $100 million including research and development. By July 2024, Meta had trained Llama 3.1 405B, an open-weight model matching GPT-4 on most benchmarks, for roughly $60 million in compute (30.84 million H100 GPU-hours). Then in December 2024, DeepSeek released V3 — a model that matched or exceeded GPT-4o on key benchmarks for just $5.6 million in GPU time.
That is a roughly 95% cost reduction in 21 months. A month later, DeepSeek released R1, which matched OpenAI’s o1 on reasoning tasks for an incremental $294,000 in training cost.
The caveats matter: DeepSeek’s $5.6 million figure covers only the final pre-training run. Their parent company High-Flyer invested over $500 million in Nvidia GPUs total, and the full cost from base model to R1 is estimated at $6–7 million by Epoch AI. But even the generous estimate represents a 90%+ reduction from GPT-4. The key innovations enabling this — FP8 mixed-precision training, mixture-of-experts architectures with load balancing, and custom CUDA kernels achieving 85%+ GPU utilization versus the industry average of 55–65% — are algorithmic, not hardware-dependent. They can be replicated.
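The headline reductions can be checked directly from the figures above. A sketch — every input is an estimate quoted in this chapter, not an independently verified cost:

```python
# Training-cost estimates quoted above, in millions of USD.
costs = {
    "GPT-4 (Mar 2023)": 100.0,                 # upper-bound public estimate
    "Llama 3.1 405B (Jul 2024)": 60.0,         # compute only
    "DeepSeek V3 (Dec 2024)": 5.6,             # final pre-training run only
    "DeepSeek base->R1, full (Epoch AI)": 7.0, # generous end-to-end estimate
}

baseline = costs["GPT-4 (Mar 2023)"]
for name, cost in costs.items():
    reduction = 100 * (1 - cost / baseline)
    print(f"{name}: ${cost}M  ({reduction:.0f}% below GPT-4)")
```

Even the generous $7 million end-to-end figure comes out more than 90% below the GPT-4 baseline, which is the point the prose makes.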
The open-source convergence
The Stanford HAI 2025 AI Index Report documented the most important shift in the AI landscape: the performance gap between the best open-weight and proprietary models, measured by Chatbot Arena Elo ratings, shrank from 8.04% in January 2024 to 1.7% by February 2025 — a 79% reduction in a single year. On MMLU specifically, the gap between US and Chinese models collapsed from 17.5 percentage points to just 0.3 between the end of 2023 and the end of 2024.
Llama 3.1 405B was the first open model to match or exceed GPT-4 across multiple benchmarks in July 2024, roughly 16 months after GPT-4’s release. By early 2025, that lag had compressed further. Open-source models now represent 62.8% of all models by count, and the best open LLMs lag closed ones by 5–22 months on benchmarks — with the gap narrowing rapidly. One analysis projected open-closed parity by Q2 2026.
Inference pricing in freefall
The pricing evolution of OpenAI’s own API tells the commoditization story in dollar terms. GPT-4 launched at $60 per million output tokens in March 2023. GPT-4 Turbo brought that down to $30 in November 2023. GPT-4o launched at $15 in May 2024, then was cut to $10 in October. Meanwhile, GPT-4o-mini offered GPT-4-class performance at $0.60 per million tokens — a 99% reduction from GPT-4’s launch price in under two years.
Open-source alternatives are even cheaper. Llama 3.3 70B via Groq costs $0.71 per million output tokens. DeepSeek V3 is available at $0.42. Self-hosted 70B models on H100 hardware can reach approximately $0.07 per million tokens at full utilization. On average, open-source models cost roughly one-seventh as much as their proprietary equivalents.
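The price curve above, restated in code (output-token prices per million tokens, exactly as quoted in this chapter):

```python
# OpenAI output-token prices per million tokens, as quoted above.
openai_prices = [
    ("GPT-4, Mar 2023", 60.00),
    ("GPT-4 Turbo, Nov 2023", 30.00),
    ("GPT-4o, May 2024", 15.00),
    ("GPT-4o after price cut", 10.00),
    ("GPT-4o-mini", 0.60),
]

launch = openai_prices[0][1]
for name, price in openai_prices:
    drop = 100 * (1 - price / launch)
    print(f"{name}: ${price:.2f}/M  ({drop:.0f}% below GPT-4 launch price)")

# Open-weight comparators from the text:
print(f"Llama 3.3 70B via Groq: $0.71/M ({launch / 0.71:.0f}x cheaper than launch GPT-4)")
print(f"Self-hosted 70B on H100: ~$0.07/M ({launch / 0.07:.0f}x cheaper)")
```

The last line of the first loop is the commoditization story in one number: GPT-4-class output at 1% of GPT-4’s launch price.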
Where does value go when the model is free?
Mistral CEO Arthur Mensch has been the most articulate voice on this shift. In a March 2026 piece, he argued that many companies can move their workloads away from closed-source APIs to open-source models, and that the massive capability jumps between model generations have flattened into incremental gains. His strongest formulation: “the most valuable AI won’t be the one that knows everything about the world; it will be the one that knows everything about you.”
If Mensch is right — and the cost data supports the argument — then the model itself becomes a commodity layer, and value migrates to the layers around it: fine-tuning and domain adaptation, data pipelines and retrieval-augmented generation, tooling and orchestration (agent frameworks, MCP servers, evaluation pipelines), and ultimately domain expertise. The organizations that win in this scenario are not those with the best model, but those with the best understanding of their own problems.
Trigger signals — what to watch for
- Open-source model matches frontier proprietary within weeks of release, not months
- Major enterprise shifts production workloads from proprietary APIs to open-source alternatives
- Inference costs drop below $0.10 per million tokens for GPT-4-class output
- Hyperscaler capex growth decelerates because efficiency gains reduce hardware requirements
Data: OpenAI API pricing history • DeepSeek technical reports • Stanford HAI 2025 AI Index • Epoch AI
Financial Correction
“Have We Seen This Before?”
Survivors vs. Casualties — then and now
Click any card to flip it and see what happened.
The dot-com precedent
On March 10, 2000, the Nasdaq Composite reached an all-time high of 5,048.62. By October 9, 2002, it had fallen to 1,114 — a 78% decline that destroyed over $5 trillion in market value. The Nasdaq didn’t close above 5,000 again until April 23, 2015 — a recovery that took fifteen years. At the peak, venture capital investment had surged from roughly $7 billion in 1995 to nearly $100 billion in 2000, with internet companies absorbing 80% of all venture capital. Telecom companies invested more than $500 billion in infrastructure in the five years following the 1996 Telecommunications Act.
The lesson that most people take from this period is: “it was a bubble and it burst.” The more useful lesson is that the survivors and casualties were distinguished by one thing — not the quality of their technology, but whether they had real revenue, real customers, and cash to survive a funding drought.
Amazon vs. Pets.com
Amazon’s stock fell 94% from roughly $106 in December 1999 to about $5.51 in late 2001. Yet its revenue grew every single year through the crash: $2.76 billion in 2000, $3.12 billion in 2001, $5.26 billion in 2003, $8.49 billion in 2005. It posted its first profitable quarter in Q4 2001 and its first full profitable year in 2003, with $35 million net income on $5.26 billion revenue. The key decision was a well-timed $1.25 billion bond offering that gave Amazon $1 billion in cash to survive the drought. Today it is worth roughly $2.5 trillion — over 800 times its trough market cap.
Pets.com raised $300 million total, spent over $70 million on advertising while generating only $619,000 in revenue, and shut down 268 days after its IPO. Webvan burned through $1.5 billion building automated warehouses before filing bankruptcy. Boo.com raised $135 million, burned it in 18 months, and sold its assets for under $2 million. The common thread: negative unit economics, no path to profitability, and complete dependence on the next funding round.
AI investment has entered unprecedented territory
The combined Big Five capex (Alphabet, Amazon, Apple, Meta, Microsoft) grew from roughly $140 billion in 2023 to $228 billion in 2024 (+62% year-over-year), with projections of $388–443 billion for 2025 and $600–640 billion for 2026. Capital intensity has reached 45–57% of revenue — historically unprecedented for these companies. Venture funding has concentrated similarly: global AI VC funding grew from roughly $45–50 billion in 2022 to $211 billion in 2025, the first year AI startups captured more than half (52.7%) of all global venture deal value.
OpenAI reached an $852 billion post-money valuation after its $122 billion funding round in March 2026. Annualized revenue hit $25 billion by February 2026, up from roughly $2 billion in 2023. But the company projects a $14–17 billion loss in 2026, is not expected to be profitable until 2029 at the earliest, and has committed $600 billion in compute spending through 2030. Anthropic reached $380 billion valuation with its $30 billion Series G in February 2026, with revenue growing from $1 billion ARR in December 2024 to an estimated $19–30 billion ARR by early 2026.
The revenue gap
Sequoia partner David Cahn published “AI’s $600B Question” in June 2024, calculating that the AI infrastructure buildout requires roughly $600 billion in annual end-user revenue to justify itself. At the time, actual AI product revenue was roughly $100 billion — a $500 billion annual gap. Since then, both spending and revenue have grown, but spending has grown far faster: capex roughly tripled while the revenue gap has likely widened, not narrowed. Barclays estimated that current capex levels would require the equivalent of 12,000 ChatGPT-sized products to break even.
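Cahn’s arithmetic is simple enough to restate. A sketch using the mid-2024 figures quoted above; the second calculation uses a hypothetical revenue-growth assumption purely to illustrate the widening dynamic:

```python
# David Cahn's mid-2024 framing, in billions of USD per year.
required_revenue = 600   # annual end-user revenue needed to justify the buildout
actual_revenue = 100     # estimated AI product revenue at the time
gap = required_revenue - actual_revenue
print(f"mid-2024 annual revenue gap: ${gap}B")

# Why the gap widens: if capex (and with it the required-revenue bar)
# roughly triples while revenue merely doubles (HYPOTHETICAL growth
# rates, not figures from the text), the shortfall grows even though
# revenue itself is rising.
new_gap = required_revenue * 3 - actual_revenue * 2
print(f"illustrative later gap: ${new_gap}B")
```

The direction of the result, not the exact numbers, is the planning point: absolute revenue growth does not close a gap whose denominator grows faster.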
The enterprise adoption data is sobering: an MIT study found 95% of organizations getting zero return from generative AI investments, and a Deloitte 2026 survey found only 20% of enterprises reporting AI driving revenue, with two-thirds still stuck in pilot phase.
Why the parallel breaks — and why it might not matter
There are important differences from the dot-com era. Today’s leading AI investors are massively profitable companies spending from earnings, not startups burning venture capital. Nasdaq forward price-to-earnings ratios are approximately 26 times versus 60 times at the dot-com peak. Enterprise adoption is far more advanced: 87% of large enterprises have implemented AI in some form. But the core structural risk — investment dramatically outpacing revenue realization — is identical. And new risks have emerged: AI-related corporate debt has ballooned to $1.2 trillion (JPMorgan), GPU rental prices have already fallen roughly 70% from peak, and the real useful life of GPU infrastructure may be 2–3 years rather than the 5–6 years used for accounting depreciation.
The question for your planning is not whether AI is valuable. It is. The question is whether your specific vendors, tools, and providers are the Amazon or the Pets.com of this cycle.
Trigger signals — what to watch for
- OpenAI or Anthropic IPO valuations correct significantly (>30%) within 6 months of listing
- Hyperscaler capex guidance flattens or declines for the first time since 2022
- Multiple AI-native startups fail or get acqui-hired in a single quarter (Inflection, Character.AI pattern)
- GPU rental prices continue falling — H100 rates already down ~70% from peak
- Major AI-related debt defaults or CoreWeave-style stranded asset writedowns
Data: Nasdaq historical data • Sequoia “AI’s $600B Question” • Barclays Research • MIT/Deloitte enterprise surveys
Plateau + Regulation
“The Shrinking Gain”
Lab benchmarks dramatically overstate real-world capability:
- 92% diagnostic accuracy in the lab → 52% in a real-world meta-analysis (83 studies)
- 97% on HumanEval → 26% on real freelance coding tasks (SWE-Lancer)
- 95% of organizations report zero return from GenAI investments (MIT)
Discussion: The ceiling is flattening, the floor is rising, and the current ceiling overstates what you can deploy. What does your AI roadmap look like?
The flattening curve
MMLU (Massive Multitask Language Understanding) has been the most widely cited AI benchmark for three years. The trajectory of MMLU gains tells a stark story: GPT-3 scored 43.9% in 2020. GPT-3.5 jumped to 70.0% — a gain of 26.1 points. GPT-4 reached 86.4% — a gain of 16.4 points. Then the curve flattened dramatically: GPT-4o added just 2.3 points, GPT-4.5 added 2.1, and GPT-5 approximately 1.7. All frontier models now cluster in the 88–93% range. Since approximately 6.5% of MMLU questions contain errors, the practical ceiling is around 93% — meaning frontier models are essentially at the top.
This flattening isn’t limited to MMLU. GSM8K (grade-school math) is completely saturated — frontier models score 95–99%. HumanEval (coding) has been pushed to 93–97%. The industry has responded by creating harder benchmarks: FrontierMath (research-grade mathematics) where AI solves only about 2% of problems, Humanity’s Last Exam where the top score is 45.8%, ARC-AGI-2 (genuine generalization) where the best AI score is 54% at $30 per task while humans solve 100%, and BigCodeBench where AI succeeds 35.5% of the time versus a 97% human standard. These harder benchmarks reveal that near-human performance on traditional tests masks fundamental limitations in reasoning and generalization.
Peak data and the end of scaling
At NeurIPS 2024, OpenAI co-founder Ilya Sutskever declared what many researchers had been sensing: the age of scaling as we knew it was over. He compared training data to fossil fuels — a finite resource being rapidly depleted. Epoch AI’s peer-reviewed research quantifies the constraint: the total stock of high-quality public text data is estimated at roughly 9 trillion tokens, and models may exhaust this supply between 2026 and 2028. Data movement bottlenecks impose a further hard limit at approximately 2×10³¹ FLOP per training run — a scale frontier training could reach roughly three years from 2024.
Pre-training as we know it will unquestionably end… because while compute is growing through better hardware, the data is not growing because we have but one internet.
— Ilya Sutskever, NeurIPS 2024
He described 2020–2025 as “the age of scaling” and declared a return to “the age of research,” where breakthroughs require new conceptual ingredients rather than larger clusters. This doesn’t mean AI stops improving — it means the path to improvement changes from “throw more compute at it” to “invent something fundamentally new.”
The benchmark-to-deployment gap
For corporate audiences, the most underappreciated data point is this: performance on benchmarks dramatically overstates real-world capability. ChatGPT-4 achieved 92% diagnostic accuracy in controlled medical studies, but a meta-analysis across 83 studies found only 52.1% overall AI diagnostic accuracy in real-world settings — a nearly 40-point gap. On the SWE-Lancer benchmark of real freelance coding tasks, even top models succeed only 26.2% of the time despite near-perfect HumanEval scores. On RE-Bench long-horizon tasks, AI systems score 4 times higher than humans at 2 hours but humans outperform AI 2:1 at 32 hours — suggesting current AI excels at pattern-matching but struggles with sustained complex reasoning.
The regulatory floor is rising
While the capability ceiling flattens, the regulatory floor is steadily rising. The EU AI Act is the most consequential framework, with enforcement phased in over three years. Prohibited practices (social scoring, manipulative AI, predictive policing) were banned in February 2025. General-purpose AI model obligations took effect in August 2025, activating a penalty regime with fines up to €35 million or 7% of global turnover. High-risk AI system obligations — requiring conformity assessment, human oversight, and technical documentation before deployment — apply from August 2026. Full enforcement arrives in August 2027.
The compliance costs are substantial: large enterprises face an estimated $8–15 million initial investment for high-risk systems. In parallel, there are now 56+ copyright lawsuits against AI companies. The Bartz v. Anthropic case produced a $1.5 billion class-wide settlement. The New York Times v. OpenAI case has been consolidated into a multi-district litigation with summary judgment due in April 2026. The combined effect — diminishing capability gains on top, rising regulatory and legal requirements on the bottom — is a narrowing “deployable innovation space” that shapes what organizations can actually ship.
Boring but useful
Scenario 4 is not a disaster scenario. It is arguably the most likely near-term outcome for enterprise practitioners. AI becomes reliable, well-understood infrastructure — similar to cloud computing a decade ago. Not revolutionary, but genuinely useful. The opportunity shifts from “what can AI do that was previously impossible?” to “how can we deploy what already works, reliably and compliantly?” Organizations that treat this as a compliance challenge rather than an innovation challenge may actually find the strongest strategic position.
Trigger signals — what to watch for
- The next frontier model generation shows only incremental benchmark improvement (<3 pts MMLU)
- EU AI Act enforcement actions begin — first fines or compliance orders issued against AI providers
- Major copyright ruling goes against AI training (NYT v. OpenAI summary judgment, due April 2026)
- Enterprises begin deferring AI projects citing compliance uncertainty rather than budget constraints
- Ilya Sutskever’s thesis is validated: new research paradigms (not scaling) drive the next capability jump
Data: MMLU/benchmark scores from model papers • EU AI Act timelines • Ilya Sutskever NeurIPS 2024 • Epoch AI data projections
The 2×2 Matrix
All four scenarios map onto two fundamental uncertainties: do capability jumps continue (or do gains flatten), and is the investment justified (or is the infrastructure overbuilt)? Click any cell to navigate to that scenario.
Continued Scaling
The staircase holds. Capex translates to capability. Enterprise revenue catches up. Full steam ahead.
Financial Correction
The tech works, but the investment timeline doesn’t. Correction kills companies, not capability. Amazon vs. Pets.com.
Plateau + Regulation
Diminishing returns meet rising compliance burden. AI becomes boring infrastructure. Useful but not revolutionary.
Efficiency Revolution
Smaller models close the gap. Massive clusters weren’t needed. The moat shifts from the model to everything around it.
These scenarios are not mutually exclusive. Elements of several can unfold simultaneously. Efficiency gains (Scenario 2) accelerate a financial correction (Scenario 3) by commoditizing the very technology hyperscalers are spending $600B+ to build. The capability ceiling (Scenario 4) undermines the revenue projections needed to justify that investment.
The strongest strategic position is one that performs adequately under all four scenarios — not one that bets everything on the one you think is most likely.
Scenario Planning Worksheet
For each scenario, answer the first three questions; two closing questions then cut across all four scenarios. There are no right answers — the value is in the thinking.
Scenario 1: Continued Scaling
If this materializes in 12 months…
Scenario 2: Efficiency Revolution
If this materializes in 12 months…
Scenario 3: Financial Correction
If this materializes in 12 months…
Scenario 4: Plateau + Regulation
If this materializes in 12 months…
1. Biggest risk to your current project
Under each scenario, what is the single biggest risk to your current AI initiative? Which scenario threatens it most?
2. What would you change today?
For each scenario, what is one decision you would make differently right now — before you know which scenario materializes?
3. Earliest signal to watch
What’s the one observable event that would tell you this scenario is unfolding? Where would you see it first?
4. Which scenario are you betting on?
Look at your current AI strategy. Which scenario is it implicitly assuming? What happens if you’re wrong?
5. One action that works under all scenarios
What is one thing you can do this month that improves your position regardless of which future materializes?
Interactive versions of all visualizations: demos.barcik.training
Full research and data: publications.barcik.training
© 2026 Robert Barcik · LearningDoe s.r.o. · barcik.training