barcik.training

Scenario Planning
for Generative AI

Six currents. One habit. Your strategy.

Robert Barcik • May 2026 • robert@barcik.training
Introduction

The Question

The AI industry has committed over $600 billion in capital expenditure for 2026 alone. That money is already flowing into data centers, GPU clusters, and training runs. It will produce outcomes.

The question isn’t whether AI will change — it’s which forces to plan against.

A word on the title. Classic scenario planning — the discipline this booklet borrows its method from — builds a few distinct, internally-consistent future worlds and asks how you’d fare in each. This booklet does something deliberately different: instead of whole future worlds, it tracks six currents — the forces those worlds would be built from. In a field moving this fast, the forces are more stable, and more observable, than any single future. The method is still scenario planning — foresee, watch triggers, adjust — but the units you apply it to are currents, not scenarios.

This booklet sketches six underlying currents — forces actively moving the field over the next 2–3 years. They are not predictions and they are not a matrix. They are planning lenses, each with its own data, its own thesis, and its own trigger signals to watch.

The six currents are: Continued Scaling (the $700B capex bet on the next capability staircase), Efficiency Revolution (frontier capability becoming a commodity), Financial Correction (whether the investment timeline matches the revenue timeline), Sovereignty (what happens when a vendor’s access collides with a sovereign’s authority), From Lab to Production (the deployment gap that is now the binding constraint), and Hours and Dollars (the two units that will decide displacement — how long an agent can work, and at what cost). They reinforce and contradict each other in specific ways — that’s the point of holding several open at once.

Each current is anchored by either a visualization or a presenter card with key figures, followed by a written chapter that unpacks the data and arguments. You can use this booklet in two ways: as a presentation tool (lead with the visual or card, trigger discussion), or as a standalone reading experience (read the chapters for the full picture). Both work — design your session around your audience.

At the end, a short synthesis describes how the currents interact, a note explains how to use this in practice, and a workshop exercise helps you translate insight into action.

A note on what this booklet is — and isn’t

I’m not going to tell you which future will play out; I don’t think that’s the right question. The right question is which signals you’d watch and how you’d respond. The currents in this booklet are scaffolding for a habit: foresee, watch triggers, adjust. If a current goes stale, retire it. If a trigger fires, act. That stance is behind every page that follows.

Current 1

Continued Scaling
“Where Does the Money Go?”

The thesis
The $228B spent in 2024 wasn’t buying GPT-4 inference — that model already ran on 2023 hardware. It funded training clusters for models shipping 2–3 years later, and inference infrastructure for the next generation. The $700B guided for 2026 funds models whose architectures may not be designed yet.
The risk
DeepSeek V3 achieved frontier performance at 10× less compute. If algorithmic efficiency leaps continue, today’s massive clusters could be overbuilt for training. But labs are betting inference will dominate — ~70% of AI compute by 2030. The bet is on deployment scale, not just model size.
[Interactive chart: where each year’s capex goes. Legend: inference capacity (~2 yr), training runs (~3 yr), research frontier (~4+ yr).]

The money is already spent

When people debate whether AI will “live up to the hype,” they often miss a crucial fact: the investment decisions have already been made. The Big Five hyperscalers — Alphabet, Amazon, Apple, Meta, and Microsoft — spent a combined $228 billion on capital expenditure in 2024, up 62% from $140 billion in 2023. For 2025, guided spending reaches $416 billion. For 2026, the trajectory points to $700 billion or more, with Oracle adding another $50 billion. Adding up 2025–2027, Goldman Sachs projects total hyperscaler capex of $1.15 trillion. This money is flowing into GPU clusters, power infrastructure, and data centers — and the vast majority is earmarked specifically for AI.

The reason this matters for planning is that capex doesn’t translate into capability instantly. Building a data center takes 12–24 months. Procuring the chips takes 6–12 months. Training a frontier model takes another 6–12 months. Post-training, safety testing, and deployment add 3–6 more. Each year’s spending splits into three bets running at different timescales: inference capacity arriving in roughly 2 years, training runs for models 3 years out, and research compute powering breakthroughs 4+ years away.
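
The lead times above can be added up in a few lines. This is a minimal sketch that assumes the four stages run strictly one after another, which is my simplification: in practice chip procurement usually overlaps construction, so the `overlap_months` parameter (a hypothetical knob, not a figure from any source) lets you subtract whatever overlap you believe in.

```python
# Stage durations in months as (low, high), copied from the paragraph above.
STAGES = {
    "build data center": (12, 24),
    "procure chips": (6, 12),
    "train frontier model": (6, 12),
    "post-training, safety testing, deployment": (3, 6),
}


def total_lag_months(stages, overlap_months=0):
    """Return (low, high) total lag, minus an assumed overlap between stages."""
    low = sum(lo for lo, _ in stages.values()) - overlap_months
    high = sum(hi for _, hi in stages.values()) - overlap_months
    return low, high


low, high = total_lag_months(STAGES)
print(f"Strictly sequential: {low} to {high} months")

# Hypothetical: chip procurement overlaps construction by 6 months.
print("With 6 months of overlap:", total_lag_months(STAGES, overlap_months=6))
```

Even the low end of the sequential range is longer than two years, which is why the three-year horizon for training runs is a reasonable planning default.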

The staircase pattern

Looking backward, AI capability has advanced in a staircase pattern: a major jump roughly every two years, followed by refinement within that generation. GPT-3 (2020) was dramatically surpassed by GPT-4 (2023), which was then refined through GPT-4 Turbo, GPT-4o, and eventually GPT-4o-mini. Each iteration was better and cheaper, but not a fundamentally new capability tier. The same pattern appears with Claude 3 Opus giving way to Claude 3.5 Sonnet, then Claude Opus 4, and now Claude Opus 4.6.

If this pattern holds, the $228 billion spent in 2024 is currently producing the training infrastructure for models that will ship in 2026–2027. The $416 billion committed for 2025 funds models arriving in 2027–2028. And the $700 billion planned for 2026 is investing in capabilities whose architectures may not even be designed yet — research compute for ideas that haven’t been conceived. The clearest demonstration of this lag came on March 24, 2026, when OpenAI completed pre-training of GPT-6 (“Spud”) at the Abilene Stargate facility — a model whose existence was funded by 2024’s capex, roughly three years before its expected public launch. Anthropic’s next-generation “Mythos” sits in a similar lane, in limited testing with cybersecurity defenders as of Q1 2026.

Inside the capex decision

It helps to picture how a $700B capex line actually gets defended. Imagine you are the hyperscaler executive who owns the AI capex bet. You walk into the lab and three things are put in front of you:

  1. Capabilities — the candidate next-generation models that can enter the training pipeline this cycle: what they can do today, where the curves are still moving.
  2. Research roadmap — which experiments are actually working, why the team believes the next architecture will pay off, what specific bottlenecks the additional compute removes.
  3. Demand by horizon — expected adoption broken down by industry, segment, and use-case at 1-year, 2-year, and 3-year horizons. Inference, training, and enterprise integration each have their own curves.

A $700B line gets defended in a board meeting because all three of those slides look credible together — not because anyone has faith in scaling as an article of belief. When the slides stop looking credible, the line gets cut. Watching the slides is the planning skill.

What is being built

The scale of individual projects is staggering. Elon Musk’s xAI deployed Colossus, a cluster of 100,000 GPUs, in just 122 days in late 2024. Amazon’s Project Rainier targets roughly 500,000 chips. The Stargate site in Abilene aims for 450,000 GPUs. Stargate as a whole (a joint venture between OpenAI, Oracle, and SoftBank) plans clusters exceeding one gigawatt of power, the equivalent of a nuclear power plant dedicated to AI. At these scales, the limiting factor shifts from chip availability to raw electrical power: the Three Mile Island nuclear facility is being restarted specifically to supply AI data centers.

The inference bet

A common misunderstanding is that all this money is about training bigger models. In reality, the labs are increasingly betting that inference — running models at scale for millions of users — will dominate AI compute, accounting for roughly 70% by 2030. Training a frontier model is a one-time cost; serving it to every enterprise customer, every developer, every consumer product is a continuous cost that scales with adoption. The capex surge is as much about building the serving infrastructure for AI-powered products as it is about training the next generation of models.

This is the core planning insight of Current 1: even if you believe the current generation of AI is “good enough,” the investment already committed will produce outcomes over the next 2–4 years. Those outcomes (faster models, cheaper inference, new capabilities) will change what’s possible and what’s expected. Your plans need to account for a moving target, not a snapshot.

$416B
2025 hyperscaler capex
$700B
2026 capex (guided)
~3.5T
Current frontier model params
~2 yr
Capability staircase cycle
70%
AI compute for inference by 2030

Trigger signals — what to watch for

  • Next-generation models (GPT-6, Anthropic’s “Mythos”) show a large, undeniable capability jump over predecessors
  • Enterprise AI revenue growth accelerates — the $500B+ revenue gap begins to close
  • Hyperscaler capex guidance continues rising >30% year-over-year through 2027
  • New model architectures emerge that can efficiently use the massive clusters being built

Implications by role

Developer
New capability jumps every ~2 years. Build with model-agnostic abstractions. What’s impossible today may be trivial in 18 months.
Team Lead
Plan for continuous retraining of team skills. Each model generation changes what tasks can be automated and how.
CTO / VP Eng
Justify continued AI investment with the staircase argument — current spend funds future capabilities, not today’s.
Procurement
Expect higher API costs initially for each new generation, dropping quickly as competition arrives. Avoid multi-year lock-in.

Data: company earnings reports & guidance • Big 5 = Alphabet, Amazon, Apple, Meta, Microsoft

Current 2

Efficiency Revolution
“How Much Does GPT-4 Cost?”

The thesis
The cost to train a GPT-4-class model fell ~95% in under two years. The cost to run one fell 99%. When capability becomes a commodity, the moat isn’t the model — it’s everything around it.
The risk
Cost compression assumes algorithmic efficiency continues compounding. If frontier capability requires genuinely new architectures (not just MoE optimizations), the floor may be higher than extrapolation suggests. Mistral’s bet only works if the ceiling is low.
Training cost to reach a stated capability tier

| Model | Org | Released | Training cost | Capability claim |
|---|---|---|---|---|
| GPT-4 | OpenAI | Mar 2023 | >$100M | Frontier; set the “GPT-4 class” bar |
| Llama 3.1 405B | Meta | Jul 2024 | ~$60M (compute) | Matches GPT-4 on most public benchmarks |
| DeepSeek V3 | DeepSeek | Dec 2024 | $5.6M (final pre-training run) | Matches/beats GPT-4o on key benchmarks |
| DeepSeek R1 | DeepSeek | Jan 2025 | +$294K (RL on V3 base) | Matches OpenAI o1 on reasoning |
| GLM-5.1 | Zhipu | Apr 2026 | undisclosed | 744B MoE / 40B active, MIT license; 58.4% SWE-Bench Pro (beats GPT-5.4 and Opus 4.6) |
| Mistral Medium 3.5 | Mistral | Apr 2026 | undisclosed | 128B dense, self-hostable on Hugging Face; 77.6% SWE-Bench Verified |
| DeepSeek V4-Pro | DeepSeek | Apr 2026 | undisclosed | 1.6T total / 49B active, hybrid attention; 80.6% SWE-Bench Verified |

DeepSeek’s $5.6M figure is the final pre-training run only; Epoch AI estimates the full base-to-R1 cost at $6–7M, and parent company High-Flyer invested $500M+ in GPUs total. R1 reuses V3’s pre-training, so it isn’t a from-scratch GPT-4-class run. The cleanest like-for-like comparison is GPT-4 → DeepSeek V3: roughly 95% reduction in 20 months. The April 2026 wave (GLM-5.1, Mistral Medium 3.5, DeepSeek V4-Pro, plus Qwen 3.6 and Kimi K2.6) extends the curve: training costs are not publicly disclosed, but the resulting capability sits within striking distance of, or above, paid frontier on coding and reasoning.
Inference price — per million output tokens

| Date | Frontier tier (closed) | Sub-frontier closed | Open-source equivalent |
|---|---|---|---|
| Mar 2023 | GPT-4: $60 | | |
| Nov 2023 | GPT-4 Turbo: $30 | | |
| May 2024 | GPT-4o: $15 | | |
| Jul 2024 | | GPT-4o-mini: $0.60 | Llama 3 70B (Groq): ~$0.79 |
| Oct 2024 | GPT-4o (cut): $10 | | |
| Jan 2025 | | | DeepSeek V3: $0.42 |
| Apr 2026 | Opus 4.7: ~$75 | | DeepSeek V4-Pro: $2.48 (~10× cheaper at frontier-equivalent) |
| Apr 2026 | | | Qwen 3.6 35B-A3B: self-host (single RTX 4090, 73.4% SWE-Bench) |

Like-for-like at the frontier tier: GPT-4 ($60) → GPT-4o ($10) is an 83% drop in 19 months. The often-cited 99% drop compares GPT-4 launch ($60) to GPT-4o-mini ($0.60) — a different (cheaper) capability tier. That comparison is still useful as a “what does GPT-4-class output cost today?” question, but it isn’t a frontier-to-frontier comparison. By April 2026 the comparison that matters most for buyers is closed frontier vs open-weight frontier at the same capability tier: DeepSeek V4-Pro at $0.28 input / $2.48 output per million tokens is roughly an order of magnitude cheaper than Opus 4.6/4.7 on coding-and-reasoning workloads it can actually serve.
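
The two headline percentages come from the same formula applied to different pairs of prices. A short sketch makes the distinction explicit; the constant names are mine and the prices are the per-million-output-token figures from the table above.

```python
def pct_drop(old_price, new_price):
    """Percentage decline from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


# USD per million output tokens, from the table above.
GPT4_LAUNCH = 60.00
GPT4O_AFTER_CUT = 10.00
GPT4O_MINI = 0.60

print(f"Frontier to frontier: {pct_drop(GPT4_LAUNCH, GPT4O_AFTER_CUT):.0f}%")
print(f"Frontier to cheaper tier: {pct_drop(GPT4_LAUNCH, GPT4O_MINI):.0f}%")
```

When someone quotes a price drop, ask which two rows of the table they are comparing before you plan around the number.
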
Hypothesis — not measured data

Mistral CEO Arthur Mensch’s thesis (paraphrased from early-2026 interviews): generic intelligence will commoditize, so competitive advantage moves to specialized systems built around your specific data and domain. Below are the layers he points to. Segment widths are equal — this is a stake-in-the-ground for discussion, not a measured value distribution:

Model
Commodity layer — open-source matches proprietary
Fine-tune
Domain adaptation, RLHF, evaluation pipelines
Data & RAG
Context pipelines, vector search, knowledge management
Tooling
MCP servers, agent frameworks, orchestration, evaluation
Domain expertise
Industry knowledge, workflow design, change management

Discussion: If the model is free, which of these layers is your team actually investing in — and which would Mensch say you should be?

The training cost freefall

In March 2023, OpenAI trained GPT-4 for an estimated $63–100 million — Sam Altman confirmed publicly that the cost exceeded $100 million including research and development. By July 2024, Meta had trained Llama 3.1 405B, an open-weight model matching GPT-4 on most benchmarks, for roughly $60 million in compute (30.84 million H100 GPU-hours). Then in December 2024, DeepSeek released V3 — a model that matched or exceeded GPT-4o on key benchmarks for just $5.6 million in GPU time.

That is a roughly 95% cost reduction in 20 months. A month later, DeepSeek released R1, which matched OpenAI’s o1 on reasoning tasks for an incremental $294,000 in training cost.

The caveats matter: DeepSeek’s $5.6 million figure covers only the final pre-training run. Their parent company High-Flyer invested over $500 million in Nvidia GPUs total, and the full cost from base model to R1 is estimated at $6–7 million by Epoch AI. But even the generous estimate represents a 90%+ reduction from GPT-4. The key innovations enabling this — FP8 mixed-precision training, mixture-of-experts architectures with load balancing, and custom CUDA kernels achieving 85%+ GPU utilization versus the industry average of 55–65% — are algorithmic, not hardware-dependent. They can be replicated.
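
To see how the caveats move the headline, it helps to run the same arithmetic on both the narrow and the generous cost figures. The sketch below assumes a $100M baseline for GPT-4, which is a lower bound (the text says the real figure exceeded it), so the reductions shown are if anything understated. The monthly rate is my own derived illustration of constant compounding over 20 months, not a measured quantity.

```python
def pct_drop(old, new):
    """Percentage decline from old to new."""
    return (old - new) / old * 100


def monthly_rate(old, new, months):
    """Constant monthly change that turns old into new over the given months."""
    return (new / old) ** (1 / months) - 1


# All costs in millions of USD.
GPT4_COST = 100.0                # assumed baseline; the source says ">$100M"
V3_FINAL_RUN = 5.6               # DeepSeek V3, final pre-training run only
V3_TO_R1_FULL = (6.0, 7.0)       # Epoch AI range for the full base-to-R1 cost
MONTHS = 20

print(f"Narrow figure: {pct_drop(GPT4_COST, V3_FINAL_RUN):.1f}% lower")
for cost in V3_TO_R1_FULL:
    print(f"Generous figure (${cost}M): {pct_drop(GPT4_COST, cost):.1f}% lower")
print(f"Implied monthly change: {monthly_rate(GPT4_COST, V3_FINAL_RUN, MONTHS):.1%}")
```

Under these assumptions the conclusion does not depend on which DeepSeek figure you accept: every variant stays above a 90% reduction.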

The open-source convergence

The Stanford HAI 2025 AI Index Report documented the most important shift in the AI landscape: the performance gap between the best open-weight and proprietary models, measured by Chatbot Arena Elo ratings, shrank from 8.04% in January 2024 to 1.7% by February 2025 — a 79% reduction in a single year. On MMLU specifically, the gap between US and Chinese models collapsed from 17.5 percentage points to just 0.3 between the end of 2023 and the end of 2024.

Llama 3.1 405B was the first open model to match or exceed GPT-4 across multiple benchmarks in July 2024, roughly 16 months after GPT-4’s release. By early 2025, that lag had compressed further. Open-source models now represent 62.8% of all models by count, and the best open LLMs lag closed ones by 5–22 months on benchmarks — with the gap narrowing rapidly. One analysis projected open-closed parity by Q2 2026.

Inference pricing in freefall

The pricing evolution of OpenAI’s own API tells the commoditization story in dollar terms. GPT-4 launched at $60 per million output tokens in March 2023. GPT-4 Turbo brought that down to $30 in November 2023. GPT-4o launched at $15 in May 2024, then was cut to $10 in October. Meanwhile, GPT-4o-mini offered GPT-4-class performance at $0.60 per million tokens — a 99% reduction from GPT-4’s launch price in under two years.

Open-source alternatives are even cheaper. Llama 3.3 70B via Groq costs $0.71 per million output tokens. DeepSeek V3 is available at $0.42. Self-hosted 70B models on H100 hardware can reach approximately $0.07 per million tokens at full utilization. On average, open-source models cost 7.3 times less than their proprietary equivalents.
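
The phrase “at full utilization” is carrying a lot of weight in the self-hosting figure. Here is a rough sketch of why. The hourly cost and throughput below are hypothetical inputs that I picked only so the full-utilization result lands near the quoted $0.07; they are not measurements of any real deployment, and the function name is mine.

```python
def cost_per_million_tokens(hardware_hourly_usd, tokens_per_second, utilization=1.0):
    """USD per million generated tokens for self-hosted serving hardware."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hardware_hourly_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs, for illustration only.
HOURLY_USD = 2.50          # assumed all-in hourly cost of the serving hardware
TOKENS_PER_SECOND = 10_000 # assumed aggregate throughput across batched requests

for utilization in (1.0, 0.5, 0.25):
    cost = cost_per_million_tokens(HOURLY_USD, TOKENS_PER_SECOND, utilization)
    print(f"utilization {utilization:.0%}: ${cost:.3f} per million tokens")
```

Cost scales inversely with utilization, so a cluster that sits idle three quarters of the time costs four times as much per token. Measure your real traffic shape before comparing self-hosting against an API price.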

The April 2026 wave

Over roughly three weeks in April 2026, five frontier-class open-weight coding models shipped:

  • GLM-5.1 (Zhipu, April 7): 744B MoE / 40B active under MIT license. Posts 58.4% on SWE-Bench Pro, ahead of GPT-5.4 and Opus 4.6 on that benchmark.
  • Qwen 3.6 (Alibaba, April 16): split into variants. The 35B-A3B open variant runs on a single RTX 4090 with quantization and posts 73.4% on SWE-Bench Verified, the throughput sweet spot for solo developers and on-prem deployments.
  • Kimi K2.6 (Moonshot, April 20): 1T total / 32B active. Introduces a 300-agent swarm primitive for parallel exploration on hard tickets, an agentic precursor that connects to Hours and Dollars in Section 6.
  • DeepSeek V4-Pro (April 24): posts 80.6% on SWE-Bench Verified at $0.28 input / $2.48 output per million tokens with a 1M-token context window. That is roughly an order of magnitude cheaper than Opus 4.6 at comparable coding capability, using hybrid attention (Compressed Sparse + Heavily Compressed) at about 27% of V3.2’s per-token FLOPs.
  • Mistral Medium 3.5 (April 29): a 128B dense model, self-hostable from Hugging Face, with 77.6% on SWE-Bench Verified and configurable reasoning effort per request. It offers the cleanest EU-data-residency narrative on the market, paired with Mistral’s $400M ARR (January 2026) at a $13.8B valuation.

By May 2026, the buyer question is no longer “is open-weight good enough.” It is which open-weight per workload, which hosting stack, and which sovereign deployment shape. For EU enterprises in particular, these intersect directly with the sovereignty story in Section 4 — for the first time, “self-hostable frontier-equivalent” isn’t a euphemism.

Where does value go when the model is free?

Mistral CEO Arthur Mensch has been the most articulate voice on this shift. Across early-2026 interviews — the Big Technology Podcast in January, Davos and Bloomberg the same month, the Economic Times in February — he framed AI as becoming infrastructure, “a utility” measured by efficiency, capital discipline, and reliable delivery rather than novelty. His most quoted line: “My generation of engineers has more or less succeeded in commoditizing its own profession.” The corollary, which he argues consistently, is that competitive advantage will increasingly accrue not to whoever has the largest model, but to whoever builds the most specialized system around their specific data and domain.

If Mistral is right — and the cost data supports the argument — then the model itself becomes a commodity layer, and value migrates to the layers around it: fine-tuning and domain adaptation, data pipelines and retrieval-augmented generation, tooling and orchestration (agent frameworks, MCP servers, evaluation pipelines), and ultimately domain expertise. The organizations that win in this scenario are not those with the best model, but those with the best understanding of their own problems.

95%
Training cost drop in 20 months
$0.28
DeepSeek V4-Pro input / MTok
128B
Mistral Medium 3.5 (self-hostable)
7.3×
Open-source cost advantage

Trigger signals — what to watch for

  • Open-source model matches frontier proprietary within weeks of release, not months
  • Major enterprise shifts production workloads from proprietary APIs to open-source alternatives
  • Inference costs drop below $0.10 per million tokens for GPT-4-class output
  • Hyperscaler capex growth decelerates because efficiency gains reduce hardware requirements

Implications by role

Developer
Open-source becomes the default stack. Invest in fine-tuning, RAG, and tooling skills — not prompt engineering for a single vendor.
Team Lead
Build team expertise in the “static layer” — data pipelines, evaluation, deployment. Model expertise commoditizes fast.
CTO / VP Eng
Reduce vendor lock-in. The value shifts from which model you use to how you integrate it. Invest in infrastructure and domain specialization.
Procurement
Shift budgets from API costs to infrastructure and talent. Self-hosting becomes economically viable for high-volume workloads.

Data: OpenAI API pricing history • DeepSeek technical reports • Stanford HAI 2025 AI Index • Epoch AI

Current 3

Financial Correction
“Have We Seen This Before?”

The thesis
A financial correction kills companies, not technology. Amazon survived a 94% stock drop. The internet didn’t disappear — the funding did. The question isn’t whether AI works. It’s whether the investment timeline matches the revenue timeline.
The risk
The parallel breaks in important ways. Dot-com investors were startups burning VC cash. Today’s AI investors are the most profitable companies in history spending from earnings. Nasdaq P/E today is ~26× vs 60× at the dot-com peak. Calling it a bubble may be correct on timing but wrong on magnitude.

Survivors vs. Casualties — then and now


Survivor
Amazon
Stock: −94%
Peak $106 → trough $5.51
Revenue never stopped growing.
$2.76B (2000) → $8.49B (2005)
Had $1B cash from well-timed bond offering
Today: ~$2.5 trillion market cap
Casualty
Pets.com
Raised $300M
Revenue: $619K
Dead 268 days after IPO.
Spent $70M+ on ads
Negative unit economics from day one
Today: a cautionary tale
Today
OpenAI
Valuation: $852B (reported)
Revenue: $25B ARR (reported)
Projects −$14B loss in 2026.
900M+ weekly users
Profitable by 2029 (earliest)
IPO expected Q4 2026 at $1T+ (speculation)
Casualty
Stability AI
CEO resigned Mar 2024
Revenue: <$5M/quarter
Burn rate: $8M/month
~$100M in debt
Couldn’t compete on frontier model training costs
The AI-era Pets.com?
12,000
ChatGPT-sized products needed to justify current AI infrastructure capex (Barclays estimate)

The dot-com precedent

On March 10, 2000, the Nasdaq Composite reached an all-time high of 5,048.62. By October 9, 2002, it had fallen to 1,114 — a 78% decline that destroyed over $5 trillion in market value. The Nasdaq didn’t close above 5,000 again until April 23, 2015 — a recovery that took fifteen years. At the peak, venture capital investment had surged from roughly $7 billion in 1995 to nearly $100 billion in 2000, with internet companies absorbing 80% of all venture capital. Telecom companies invested more than $500 billion in infrastructure in the five years following the 1996 Telecommunications Act.

The lesson that most people take from this period is: “it was a bubble and it burst.” The more useful lesson is that the survivors and casualties were distinguished by one thing — not the quality of their technology, but whether they had real revenue, real customers, and cash to survive a funding drought.

The bubble argument has matured

The most visible bear voice of the past two years — Ed Zitron — has been notable not for being right but for how the argument has had to shift. His original case, sustained across blog posts and his “Better Offline” podcast, was economic: AI was a value-destruction machine, hyperscaler capex was insane, and the unit economics simply did not work. Some of that case has aged well (the ROI gap is real). Most of it has not. Inference costs have fallen 99% over two years. Anthropic crossed $30B+ ARR by spring 2026 and reportedly overtook OpenAI on business adoption in May. Cost decline plus revenue growth made the original economic argument harder to sustain in its strongest form. Kelsey Piper, writing in The Argument, documented the shift: Zitron’s case has migrated from “the economics don’t work” toward fraud and accounting allegations against OpenAI and the hyperscalers.

The bear case is still alive, and parts of it remain sharp. But the goalposts moved — and that itself is a signal worth weighing. A bubble argument that survives a 99% cost decline by switching from economics to fraud is a weaker argument than one that didn’t have to switch. Hold the correction scenario open; don’t hold this particular version of it as the bear case.

Amazon vs. Pets.com

Amazon’s stock fell 94% from roughly $106 in December 1999 to about $5.51 in late 2001. Yet its revenue grew every single year through the crash: $2.76 billion in 2000, $3.12 billion in 2001, $5.26 billion in 2003, $8.49 billion in 2005. It posted its first profitable quarter in Q4 2001 and its first full profitable year in 2003, with $35 million net income on $5.26 billion revenue. The key decision was a well-timed $1.25 billion bond offering that gave Amazon $1 billion in cash to survive the drought. Today it is worth roughly $2.5 trillion — over 800 times its trough market cap.
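
A drawdown figure hides how steep the climb back is. The sketch below applies the same two formulas to the Nasdaq levels from the dot-com precedent and to Amazon’s peak and trough share prices as quoted in this chapter. The function names are mine.

```python
def drawdown_pct(peak, trough):
    """Percentage decline from peak to trough."""
    return (peak - trough) / peak * 100


def recovery_multiple(peak, trough):
    """How many times the trough value must multiply to regain the peak."""
    return peak / trough


CASES = {
    "Nasdaq Composite (Mar 2000 to Oct 2002)": (5048.62, 1114.0),
    "Amazon share price (Dec 1999 to late 2001)": (106.0, 5.51),
}

for name, (peak, trough) in CASES.items():
    print(f"{name}: down {drawdown_pct(peak, trough):.1f}%, "
          f"needs {recovery_multiple(peak, trough):.1f}x to recover")
```

The asymmetry is the planning point: the deeper the fall, the more the recovery depends on cash and revenue that outlast the drought, which is exactly what separated Amazon from the casualties.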

Pets.com raised $300 million total, spent over $70 million on advertising while generating only $619,000 in revenue, and shut down 268 days after its IPO. Webvan burned through $1.5 billion building automated warehouses before filing bankruptcy. Boo.com raised $135 million, burned it in 18 months, and sold its assets for under $2 million. The common thread: negative unit economics, no path to profitability, and complete dependence on the next funding round.

AI investment has entered unprecedented territory

The combined Big Five capex (Alphabet, Amazon, Meta, Microsoft, Apple) grew from roughly $140 billion in 2022 to $251 billion in 2024 (+62% year-over-year) to a projected $388–443 billion in 2025 and $600–640 billion in 2026. Capital intensity has reached 45–57% of revenue — historically unprecedented for these companies. Venture funding has concentrated similarly: global AI VC funding grew from roughly $45–50 billion in 2022 to $211 billion in 2025, the first year AI startups captured more than half (52.7%) of all global venture deal value.

OpenAI reached an $852 billion post-money valuation (reported) after its $122 billion funding round in March 2026. Annualized revenue hit $25 billion by February 2026 (reported), up from roughly $2 billion in 2023. But the company projects a $14–17 billion loss in 2026, is not expected to be profitable until 2029 at the earliest, and has committed $600 billion in compute spending through 2030. Anthropic reached a $380 billion valuation with its $30 billion Series G in February 2026 (reported), with revenue growing from $1 billion ARR in December 2024 to an estimated $19–30 billion ARR by early 2026.

The revenue gap

Sequoia partner David Cahn published “AI’s $600B Question” in June 2024, calculating that the AI infrastructure buildout requires roughly $600 billion in annual end-user revenue to justify itself. At the time, actual AI product revenue was roughly $100 billion — a $500 billion annual gap. Since then, both spending and revenue have grown, but spending has grown far faster: capex roughly tripled while the revenue gap has likely widened, not narrowed. Barclays estimated that current capex levels would require the equivalent of 12,000 ChatGPT-sized products to break even.
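
The size of the gap matters less than how long it takes to close. The following sketch starts from Cahn’s June 2024 figures and asks how many years of compounding revenue growth would be needed. The growth rates are hypothetical, and holding the required revenue fixed is a deliberately generous assumption, since the text above notes that spending has kept rising.

```python
import math

# Billions of USD per year, from the June 2024 figures quoted above.
REQUIRED_REVENUE = 600
ACTUAL_REVENUE = 100


def revenue_gap(required, actual):
    """Annual shortfall between required and actual revenue."""
    return required - actual


def years_to_close(actual, required, annual_growth):
    """Whole years until revenue compounding at annual_growth reaches required."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    if actual >= required:
        return 0
    return math.ceil(math.log(required / actual) / math.log(1 + annual_growth))


print("Gap:", revenue_gap(REQUIRED_REVENUE, ACTUAL_REVENUE), "billion USD per year")
for growth in (0.5, 1.0):  # hypothetical growth rates: +50% and +100% per year
    years = years_to_close(ACTUAL_REVENUE, REQUIRED_REVENUE, growth)
    print(f"At {growth:.0%} annual growth: {years} years")
```

If the required figure keeps growing alongside capex, these year counts are floors, not forecasts.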

Personal value is clear. Enterprise value is the open question.

Two ROI stories sit on top of each other and are routinely conflated. Individual subscribers buying Claude or ChatGPT at $20–$200 a month report value clearly and stickily: paid consumer plans for the two leaders together cross tens of millions of seats by mid-2026, churn is unremarkable, and surveys consistently show personal users describing meaningful time savings. That part of the market has answered. The enterprise market has not.

Omdia’s October 2025 survey of 350 mid-to-large enterprises reported “very good” to “extraordinary” ROI from most respondents — a genuinely positive signal. Accenture, in parallel, found 61% of enterprise AI subscriptions underutilized due to poor integration. The MIT NANDA study reported 95% of organizations seeing zero return, with the measurement caveats already noted (no baselines, six-month cutoff, parallel-pilot designs). Reconcile these and the picture is: enterprises that have integrated AI into workflows are extracting real value; the majority that are still trying are not. The reason for the gap isn’t capability. The model can do the task. The bottleneck is how the model gets wired into the workflow. That is the subject of Section 5.

Vendor concentration is the under-discussed risk

Q1 2026 saw AI venture funding concentrate to a degree that has no recent precedent in software. OpenAI, Anthropic, and xAI accounted for roughly 67.3% of all AI venture funding across more than 1,500 deals. OpenAI’s $122 billion round at an $852 billion valuation consumed a non-trivial share of global venture capacity. Microsoft, Meta, Amazon, and Alphabet collectively guided investors toward ~$700 billion of capex in 2026. Three foundation-model labs sit on top of a stack the rest of the industry rents from.

Concentration this extreme is usually argued as safety — the giants won’t fail. The Anthropic-Pentagon situation (covered in Section 4) is the case to study before agreeing. A single sovereign decision in February 2026 severed access to a major AI vendor for the entire US federal government, mid-contract, with little notice. The technology kept working. The vendor didn’t fail. The buyer simply couldn’t buy. That is a vendor-concentration failure mode the dot-com analogy didn’t have. Stress-testing your AI strategy against vendor severance is now first-class planning work, not paranoia.
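
A severance stress test can start as a simple inventory check. The sketch below is one possible shape for it, not a prescribed method: the `Workload` structure, the workload names, and the vendor names are all hypothetical placeholders. It asks which workloads would be stranded if one vendor became unavailable tomorrow.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    primary_vendor: str
    # Vendors or self-hosted stacks already tested as substitutes.
    fallbacks: list = field(default_factory=list)


def severance_impact(workloads, severed_vendor):
    """Split the severed vendor's workloads into failover and stranded."""
    stranded, failover = [], []
    for workload in workloads:
        if workload.primary_vendor != severed_vendor:
            continue
        usable = [v for v in workload.fallbacks if v != severed_vendor]
        (failover if usable else stranded).append(workload.name)
    return {"stranded": stranded, "failover": failover}


# Hypothetical inventory; every name here is a placeholder.
INVENTORY = [
    Workload("support-chat", "vendor_a", ["self_hosted_open_weight"]),
    Workload("code-review", "vendor_a"),
    Workload("doc-search", "vendor_b", ["vendor_a"]),
]

print(severance_impact(INVENTORY, "vendor_a"))
```

Anything that lands in the stranded list is where a tested fallback buys the most resilience.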

Why the parallel breaks — and why it might not matter

There are important differences from the dot-com era. Today’s leading AI investors are massively profitable companies spending from earnings, not startups burning venture capital. Nasdaq forward price-to-earnings ratios are approximately 26 times versus 60 times at the dot-com peak. Enterprise adoption is far more advanced: 87% of large enterprises have implemented AI in some form. But the core structural risk — investment dramatically outpacing revenue realization — is identical. And new risks have emerged: AI-related corporate debt has ballooned to $1.2 trillion (JPMorgan), GPU rental prices have already fallen roughly 70% from peak, and the real useful life of GPU infrastructure may be 2–3 years rather than the 5–6 years used for accounting depreciation.
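
The depreciation point is easier to weigh with numbers. This sketch uses plain straight-line depreciation and a hypothetical $10 billion GPU fleet; the fleet cost is an illustration, not a figure from any company, and the 3-year and 6-year lives are taken from the ranges in the paragraph above.

```python
def straight_line_annual(cost, life_years):
    """Annual depreciation expense under the straight-line method."""
    return cost / life_years


def book_value(cost, life_years, age_years):
    """Remaining book value after age_years, floored at zero."""
    remaining = cost - straight_line_annual(cost, life_years) * age_years
    return max(remaining, 0.0)


FLEET_COST = 10.0       # hypothetical fleet cost, billions of USD
ACCOUNTING_LIFE = 6     # years used for accounting depreciation
ECONOMIC_LIFE = 3       # years of real useful life

annual_gap = (straight_line_annual(FLEET_COST, ECONOMIC_LIFE)
              - straight_line_annual(FLEET_COST, ACCOUNTING_LIFE))
print(f"Expense understated by ${annual_gap:.2f}B per year")
print(f"Book value at year {ECONOMIC_LIFE}: "
      f"${book_value(FLEET_COST, ACCOUNTING_LIFE, ECONOMIC_LIFE):.2f}B")
```

If the shorter life is the true one, the book value still carried at year three is the size of the potential writedown, which is what the stranded-asset trigger in this chapter is watching for.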

The question for your planning is not whether AI is valuable. It is. The question is whether your specific vendors, tools, and providers are the Amazon or the Pets.com of this cycle.

$500–600B
Annual AI revenue gap
94%
Amazon’s stock drop (survived)
268 days
Pets.com IPO to shutdown
20%
Enterprises reporting AI-driven revenue (Deloitte)
$1.2T
AI-related corporate debt

Trigger signals — what to watch for

  • OpenAI or Anthropic IPO valuations correct significantly (>30%) within 6 months of listing
  • Hyperscaler capex guidance flattens or declines for the first time since 2022
  • Multiple AI-native startups fail or get acqui-hired in a single quarter (Inflection, Character.AI pattern)
  • GPU rental prices continue falling — H100 rates already down ~70% from peak
  • Major AI-related debt defaults or CoreWeave-style stranded asset writedowns

Implications by role

Developer
Diversify beyond AI-only skills. The developers who survived the dot-com bust were the ones who could build real products, not just demos.
Team Lead
Every AI project must have measurable ROI. “We’re exploring AI” won’t survive a budget cut. Show business value now.
CTO / VP Eng
Stress-test vendor viability. Which of your AI vendors is Amazon (real revenue, cash reserves) and which is Pets.com?
Procurement
Negotiate shorter contracts. Avoid lock-in with providers who may not exist in 18 months. Prefer pay-as-you-go over committed spend.

Data: Nasdaq historical data • Sequoia “AI’s $600B Question” • Barclays Research • MIT NANDA, Deloitte, Omdia, Accenture enterprise surveys • Kelsey Piper / The Argument

Current 4

Sovereignty
“What if your vendor isn’t allowed to sell to you?”

The thesis
Vendor access can collapse for political or jurisdictional reasons faster than for technical ones. The on-prem option is now real — Chinese open-weight is at frontier parity, Mistral ships dense and self-hostable, and Meta’s exit from open-weight frontier has shifted the “Linux of AI” mantle to a stack EU enterprises can actually deploy.
The risk
Self-hosting trades vendor risk for operational risk. Frontier-equivalent open-weight isn’t free to run, and the eval/safety burden moves onto you. Some open-weight options carry dataset-provenance questions that surface only after a regulator looks closely.

Two things changed in the last four months: a major US AI vendor was severed from its largest federal customer by executive action, and the Chinese open-weight stack closed the gap on coding and reasoning at roughly 10× lower cost. Sovereignty stopped being a paranoid’s concern.

Feb–Apr 2026
Anthropic–Pentagon timeline (designation → injunction → appeal → workaround)
$0.28 / $2.48
DeepSeek V4-Pro, input / output price per MTok
744B
GLM-5.1, MIT licensed
128B dense
Mistral Medium 3.5 (self-hostable)
1 GPU
Qwen 3.6 35B-A3B on RTX 4090

Five anchors. The first is the failure mode; the next four are the alternatives that now exist.

Two collisions, one pattern

Two events in spring 2026 turned sovereignty from hypothetical to operational. The first: on February 27, 2026, the US Defense Secretary designated Anthropic a “supply chain risk,” and the Trump administration ordered federal agencies to stop using Claude. Anthropic and the Pentagon had signed a $200 million contract in July 2025 under Anthropic’s acceptable-use policy; the Pentagon wanted “all lawful purposes” access without limitation, and Anthropic refused to remove restrictions on autonomous weapons and domestic mass surveillance. Anthropic sued, won a preliminary injunction in late March (Judge Rita Lin called the designation “Orwellian” and First Amendment retaliation), then lost an appeals court bid in early April. As of late April, the White House was reportedly developing a workaround to let federal agencies use new Anthropic models, sidestepping the supply-chain designation. The technology never stopped working. The buyer simply could not buy.

The second: on April 8, Meta launched Muse Spark, its first proprietary closed-weight model, from Meta Superintelligence Labs under Alexandr Wang. After nearly a decade of public commitment to open frontier AI, Meta’s frontier development is now closed. Existing Llama models remain available but no longer evolve. The combination of $115–135B in 2026 capex, competitive pressure from Chinese labs building commercial products on top of Llama, and the strategic goal of a deeply integrated “personal superintelligence” tied to Meta’s user data drove the shift. Yann LeCun, Meta’s most visible open-source advocate, departed in November 2025. The “Linux of AI” thesis did not survive contact with $100B+ compute economics — at the Western frontier.

The Chinese open-weight wave fills the gap

Within 18 days in April 2026, three frontier-class open-weight coding models shipped from Chinese labs: GLM-5.1 (Zhipu, MIT-licensed 744B MoE / 40B active, 58.4% SWE-Bench Pro), Kimi K2.6 (Moonshot, 1T total / 32B active with a 300-agent swarm primitive), and DeepSeek V4-Pro (1.6T total / 49B active, 80.6% SWE-Bench Verified at roughly an order of magnitude lower output cost than Opus 4.6). Qwen 3.6 (Alibaba, April 16) split into variants; the 35B-A3B open variant runs on a single RTX 4090 with quantization. By workload as of May 2026: DeepSeek V4-Pro for cheap large-context coding agents, Kimi K2.6 for hard multi-step tickets with its swarm primitive, GLM-5.1 for self-hosted production where MIT licensing matters, Qwen 3.6-35B-A3B for local laptop or single-GPU deployment. Llama 4 remains integration-default but is no longer evolving.

What this means for EU enterprises

The buyer question has shifted from “is open-weight good enough” to a multi-part procurement question: which open-weight model per workload, which hosting stack, which sovereign deployment shape. Mistral Medium 3.5 (128B dense, self-hostable) is the cleanest EU-data-residency narrative on the market: dense rather than MoE, easier to deploy than the Chinese stacks, sovereign-aligned, and backed by a vendor at $400M ARR and a $13.8B valuation. Self-hosted Chinese open-weight is the other major option, with two caveats worth flagging: geopolitical exposure if procurement frameworks tighten, and dataset-provenance questions that some EU regulators are starting to ask.

The deeper point: for the first time in this booklet’s lifetime, “self-hostable frontier-equivalent” is not a euphemism. Procurement teams that previously assumed one or two US vendors had no realistic alternative now have several. The work shifts from negotiating with one vendor to architecting around the choice. That choice depends on which sovereign failure modes you weight highest.

Trigger signals — what to watch for

  • Additional supply-chain designations or AUP collisions between frontier labs and sovereign buyers
  • New sovereign-AI regulations requiring on-shore inference, training data, or model weights
  • Additional Chinese open-weight releases matching closed-frontier capability within weeks of release
  • EU enterprises shifting production workloads from US-hosted APIs to EU-sovereign or self-hosted stacks at material volume
  • Counter-trigger: a US/EU/CN diplomatic settlement that re-normalizes cross-jurisdictional vendor access

Implications by role

Developer
Hostable open-weight as default for sensitive workloads. Build retrieval and tool layers model-agnostic; the model is now portable.
Team Lead
Maintain a portable inference layer behind your eval and retrieval pipelines. Treat vendor swap as a planned exercise, not a fire drill.
CTO / VP Eng
Multi-vendor strategy is no longer paranoia. Quantify single-vendor severance risk against revenue and regulatory exposure.
Procurement
Contracts should anticipate vendor severance. Insist on data-residency guarantees and explicit exit clauses; price in a second supplier.
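The "portable inference layer" and "planned vendor swap" advice above can be sketched as a thin router that sits between application code and any model backend. Everything here is hypothetical: the `ChatBackend` interface, the `EchoBackend` stand-in, and the backend and workload names are illustrative assumptions, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatBackend(Protocol):
    """Hypothetical minimal interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend so the sketch runs without network access."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class InferenceRouter:
    """Routes each workload to its first available backend, in priority order."""

    def __init__(self, backends: dict[str, ChatBackend], routes: dict[str, list[str]]):
        self.backends = backends
        self.routes = routes
        self.severed: set[str] = set()

    def sever(self, name: str) -> None:
        """Simulate losing access to a vendor (the planned swap exercise)."""
        self.severed.add(name)

    def complete(self, workload: str, prompt: str) -> str:
        for name in self.routes[workload]:
            if name not in self.severed:
                return self.backends[name].complete(prompt)
        raise RuntimeError(f"no available backend for workload {workload!r}")


# Illustrative setup: sensitive work stays self-hosted; general work has a fallback.
backends = {"hosted-api": EchoBackend("hosted-api"), "self-hosted": EchoBackend("self-hosted")}
routes = {"sensitive": ["self-hosted"], "general": ["hosted-api", "self-hosted"]}
router = InferenceRouter(backends, routes)

before = router.complete("general", "summarize ticket")
router.sever("hosted-api")  # rehearse vendor severance
after = router.complete("general", "summarize ticket")
print(before)
print(after)
```

The design choice is that application code names a workload, never a vendor, so a severance becomes a routing change rather than a rewrite. A real layer would also need to carry evals across the swap, which this sketch leaves out.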

Data: court filings (Anthropic v. DoD) • vendor releases (Meta Muse Spark, Mistral Medium 3.5) • open-weight model cards (DeepSeek V4-Pro, GLM-5.1, Qwen 3.6, Kimi K2.6) • Kelsey Piper / The Argument

Current 5

From Lab to Production
“What we learned from 2015 — and what’s different now”

The thesis
The bottleneck has moved from model capability to deployment. Same shape as the 2015 ML/DS production gap — statisticians could build models but not ship them. Now it’s LLMs, and the talent & tooling layer hasn’t caught up to the capability.
The risk
This time is different. Less data-pipeline work (LLMs generate, they don’t process). Much more testing and validation work (the damage potential is qualitatively higher). The MLOps cost curves don’t transfer cleanly — the new shape is eval-heavy, not ETL-heavy.

The capability ceiling overstates what you can deploy. The deployment floor is where the gap actually sits — and that floor is what enterprise AI roadmaps now hit first.

40 pts
Medical: 92% lab → 52% real-world (83-study meta-analysis)
71 pts
Coding: 97% HumanEval → 26% SWE-Lancer
20%
Enterprises reporting AI revenue impact (Deloitte 2026)
61%
Underutilized AI subscriptions (Accenture)
95%
MIT NANDA zero-ROI rate (measurement caveats)

Five measures of the lab-to-real-world gap. The pattern is consistent across modalities: capability is not the binding constraint.

The 2015 parallel

Anyone who lived through machine learning’s enterprise adoption between 2014 and 2018 has seen this shape before. Statisticians and data scientists arrived from math and statistics backgrounds, fluent in modeling but uneven in software engineering. They built models in notebooks; the models worked in the notebook; the models did not ship. The gap was real and load-bearing — not a fashionable complaint. The eventual resolution was a decade of work on MLOps, feature stores, model registries, and cross-functional teams pairing data scientists with software engineers and platform people. The capability was always there. The path from capability to production took the better part of ten years to build out.

The LLM gap has the same shape and is hitting enterprises hard right now. Capability has run ahead of the operational maturity to deploy it. Pilots multiply; production deployments lag. The MIT NANDA “95% zero ROI” figure has measurement issues, but even with conservative reframings the underlying message is correct: most enterprises haven’t finished the deployment side. The 61% of AI subscriptions Accenture identified as “underutilized due to poor integration” is the same story stated more carefully. The same talent and tooling gap. Same response: build the bridge.

What’s different this time

The 2015 parallel is the right scaffold but it isn’t a copy-paste. Two things are materially different, and they should reshape how teams budget the bridge.

First, far less data-pipeline work. The 2015 ML era spent enormous effort on data engineering — ETL, feature pipelines, training-serving skew, feature stores. LLMs invert most of that. They generate outputs from unstructured inputs rather than processing structured data; the data layer is retrieval and context assembly, not feature transformation; high-volume ETL is largely not the bottleneck. Teams that assume their LLM deployment needs an MLOps-shaped data team will mis-budget. The work is real but it sits elsewhere.

Second, far more testing and validation work. This is the part most enterprises systematically underestimate. An LLM can confabulate plausibly, an agent can take actions, output reaches end-users directly, and the damage potential of an undetected failure is qualitatively higher than “our regression test set drifted.” The work that was once 10–15% of MLOps spend — evaluation, monitoring, output review — becomes first-class infrastructure. Eval pipelines, red-teaming, calibration of human review, behavior change-management when a model upgrades: these are not afterthoughts. They are the deployment work itself. Teams that staff the bridge with the MLOps shape will discover the bridge is built wrong.

The benchmark-to-deployment gap, quantified

The same pattern shows up in every domain that has been measured carefully. ChatGPT-4 achieved 92% diagnostic accuracy in controlled medical studies, but a meta-analysis across 83 studies found only 52.1% overall AI diagnostic accuracy in real-world settings — nearly a 40-point gap. On the SWE-Lancer benchmark of real freelance coding tasks, even top models succeed only 26.2% of the time despite near-perfect HumanEval scores. On RE-Bench long-horizon tasks, AI systems score 4× higher than humans at 2 hours but humans outperform AI 2:1 at 32 hours. Deloitte’s 2026 enterprise survey found 20% of enterprises reporting AI-driven revenue, with two-thirds still stuck in pilot. None of these numbers describe a capability problem. They describe a deployment problem.

Regulation as a secondary force raising the floor

Regulation isn’t the headline of this current, but it is a real second-order force, and one piece of news clarifies how to weight it. On May 7, 2026, EU negotiators reached provisional political agreement on the Digital Omnibus on AI: Annex III high-risk obligations are postponed from August 2, 2026 to December 2, 2027 (a 16-month deferral), and Annex I product-regulated high-risk obligations are deferred from August 2, 2027 to August 2, 2028. Watermarking and AI-content transparency shift by only three months, to December 2, 2026.

The delay should not reduce urgency for buyers. General-purpose AI model obligations under Articles 51–55 are unchanged and continue on the original schedule. Article 5 prohibitions are already in force. The Article 4 AI literacy obligation is already binding. Standards and guidance will still be published close to the new deadlines. The Code of Practice on synthetic content is expected to be finalized in May or June 2026. What the omnibus moved was the most expensive, most operationally heavy obligations, precisely the ones tied to deployment of high-risk systems. The deployment gap is the headline; regulation is the floor underneath it, which the omnibus moved but did not remove.

Copyright runs in parallel. The Bartz v. Anthropic case produced a $1.5 billion class-wide settlement. The New York Times v. OpenAI multi-district litigation had summary judgment due in April 2026. There are 56+ ongoing copyright lawsuits against AI companies. Every settlement raises the floor of data-provenance and due-diligence work required to deploy an LLM in production. Treat this as friction on the deployment side, not as a separate force.

What teams that bridge the gap actually look like

The 2015 resolution was cross-functional teams — data scientist plus software engineer plus platform engineer. The 2026 resolution looks similar in shape but reweighted: evaluation engineers, red-team specialists, and workflow designers become first-class roles. Less feature-store work; more behavioral testing. Less ETL; more output review. Less concept drift; more model upgrades that change personality. The teams that ship LLM features into production at scale in 2026 are the ones that have already staffed this shape. Most haven’t.

40 / 71 pts
Medical / coding lab-to-real-world gaps
20%
Enterprises reporting AI revenue (Deloitte)
61%
Underutilized AI subscriptions (Accenture)
+16 mo
EU Annex III delay (Digital Omnibus, May 2026)
$1.5B
Bartz v. Anthropic settlement

Trigger signals — what to watch for

  • First EU AI Act enforcement actions under the obligations that remain on schedule (GPAI / Article 5)
  • Major copyright ruling against AI training (NYT v. OpenAI summary judgment, expected H1 2026)
  • Your own internal pilot-to-production conversion rate stays below 20% across two quarters
  • Evaluation / red-team roles become standard line items in enterprise AI procurement
  • Counter-trigger: a turnkey LLM-deployment toolchain (eval + monitoring + retrieval) becomes commoditized in the way MLOps did between 2018 and 2022

Implications by role

Developer
Reliability and observability beat capability chasing. Invest in eval harnesses, behavior tests, and output review tooling earlier than feels necessary.
Team Lead
Staff evaluation and red-team roles as first-class, not as “someone’s side responsibility.” The bridge to production is mostly testing work.
CTO / VP Eng
Don’t copy your MLOps org chart. Reweight from data engineering toward evaluation engineering. The deployment cost curve is differently shaped.
Procurement
Require vendors to ship eval and monitoring tooling alongside the model. Factor the May 2026 omnibus delay into compliance budgets but don’t draw down readiness.

Data: SWE-Lancer / SWE-Bench leaderboards • medical AI meta-analysis (83 studies) • Deloitte 2026, Accenture, MIT NANDA enterprise surveys • EU AI Act + Digital Omnibus (May 2026) • Bartz v. Anthropic settlement

Current 6

Hours and Dollars
“The two units that will decide displacement”

The thesis
Capability is becoming a function of two things employers can actually measure: how many hours an agent can work undisturbed, and what it costs per hour compared to the human it would replace. Both are improving fast. Parameter counts and MMLU don’t enter the conversation.
The risk
Theatrical demos overstate the first unit. Sub-agent swarms and 12-hour OS-build runs are illustrative, not yet operational at scale. Error compounding still kills long workflows in production. SWE-bench 94% / SWE-Lancer 26% remains the cautionary split.

Stop arguing about IQ benchmarks. The displacement curve to watch is hours of undisturbed work on the X axis, and cost per autonomous task-hour on the Y axis. That is how an employer will price an agent against a person.

8 h+
Target threshold: one undisturbed knowledge-worker day
4 mo
2024–25 METR doubling cadence (Claude 3.7 Sonnet at ~50 min)
~$8 / hr
Opus 4.7 autonomous coding (moderate load, ~1M tok/hr)
~$2 / hr
Sonnet 4.6 same workflow
$130–245 / hr
Loaded US senior dev (EU: €100–150/hr)

Today, a frontier-Opus autonomous coding hour costs roughly an order of magnitude less than the human hour it would replace, before review overhead. After review overhead and retries, the net gap is narrower — but still wide enough to matter.

Two units

The conversation about AI capability is changing because the people who buy capability are not benchmark researchers. Employers do not care whether a model added two points on MMLU. They care about two numbers. First: how many hours an agent can work on something autonomously before a human needs to step in. Second: what an hour of that autonomous work costs, in API tokens or compute, compared to the loaded hourly rate of the person who would otherwise do it. Those two numbers, multiplied, are the displacement math. The first is improving observably on a roughly four-month doubling cadence. The second is collapsing on the curve Section 2 covers. The product of the two is what will decide which work moves and which doesn’t.

Where the autonomy is today

The first unit is no longer hypothetical. The observable artifacts as of May 2026: Claude Code routinely runs multi-hour autonomous coding sessions; Anthropic’s Computer Use lets an agent drive a desktop directly; Cursor (with Composer 2.5) and Windsurf (bundling Devin Cloud) sell agentic coding by the hour, not by the demo. Google demonstrated Antigravity 2.0 by having 93 sub-agents generate 2.6 billion tokens to build the core framework of an operating system in roughly 12 hours — theatre, yes, but the artifact existed. Gemini Spark, announced at I/O on May 19, is a personal agent that runs cloud-side 24/7 even when the user’s device is off. Anthropic’s “Dreaming” feature gives models memory consolidation across long-running work. None of these tools clear the 8-hour undisturbed-work threshold reliably yet, but several can sustain hours of useful autonomous work in narrow domains. That was not true a year ago.

The cost comparison

The displacement framing only works if the second unit lines up with the first. Take a representative knowledge-worker task that takes a senior practitioner about eight hours: a mid-complexity coding ticket with tests, or a structured analysis with document synthesis. Today’s math, using May 2026 prices and observed Claude Code token telemetry:

  • Claude Opus 4.7 at $5 input / $25 output per million tokens. A moderately loaded autonomous coding agent burns on the order of 1M tokens per hour (typically ~700K input, ~300K output, with most input cached). At cached-input pricing, that comes to roughly $8 per agent-hour, and an 8-hour autonomous run lands near $65–$80. A heavy multi-tool autonomous workload consuming 3–5M tokens per hour pushes per-hour cost into the $25–$45 range.
  • Claude Sonnet 4.6 at $3 input / $15 output per million tokens. Same workload at moderate load: roughly $2 per agent-hour; an 8-hour run lands near $15–$25. Heavy load reaches $8–$15 per hour.
  • Senior developer comparator: in the US, loaded hourly cost typically lands at $130–$245 per hour (base $110–$175 plus 20–40% loaded for benefits, taxes, overhead). In Western Europe the equivalent runs €100–€150 fully loaded; in CEE roughly half that. Per 8-hour day, that’s $1,000–$2,000 US / €800–€1,200 Western EU.
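The per-hour figures in the list above follow from a simple token-mix formula, sketched here so you can substitute your own telemetry. The list prices and the 700K/300K mix come from the text. The cache-read discount (10% of the list input price) and treating all input as cached are assumptions of this sketch.

```python
def agent_cost_per_hour(
    input_price: float,             # USD per million input tokens (list price)
    output_price: float,            # USD per million output tokens (list price)
    input_mtok: float,              # million input tokens consumed per hour
    output_mtok: float,             # million output tokens produced per hour
    cached_fraction: float = 1.0,   # ASSUMPTION: share of input billed at the cache-read rate
    cache_discount: float = 0.10,   # ASSUMPTION: cache reads cost 10% of list input price
) -> float:
    """Return USD per autonomous agent-hour for a given token mix."""
    cached_cost = input_mtok * cached_fraction * input_price * cache_discount
    uncached_cost = input_mtok * (1.0 - cached_fraction) * input_price
    output_cost = output_mtok * output_price
    return cached_cost + uncached_cost + output_cost


# Moderate load from the text: ~700K input and ~300K output tokens per hour.
opus_hour = agent_cost_per_hour(5.0, 25.0, input_mtok=0.7, output_mtok=0.3)
sonnet_hour = agent_cost_per_hour(3.0, 15.0, input_mtok=0.7, output_mtok=0.3)

print(f"Opus 4.7:   ${opus_hour:.2f}/hr, ${opus_hour * 8:.0f} per 8-hour run")
print(f"Sonnet 4.6: ${sonnet_hour:.2f}/hr, ${sonnet_hour * 8:.0f} per 8-hour run")
```

With these inputs the Opus figure comes out at about $7.85 per hour, in line with the ~$8 quoted. The same mix at Sonnet list prices gives about $4.70, so a ~$2 Sonnet hour implies a leaner token mix or deeper discounts than this sketch assumes. Output tokens dominate either way, which makes output volume the number to check first against your own telemetry.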

The raw ratio, before any overhead, is striking: an autonomous Opus 4.7 hour costs roughly 15–30× less than the senior US developer hour it would replace; an autonomous Sonnet 4.6 hour costs roughly 60–100× less. Most observers stop here, get excited, and reach the wrong conclusion. The honest number adds review and retry overhead: every autonomous hour today realistically needs roughly 20–40 minutes of human supervision, evaluation runs, and retry cycles before the work ships. That overhead compresses the effective gap into something more like 3–10× cheaper for the right workflow, and to break-even or worse for workflows where the model still gets stuck.

Section 2 explains why the underlying token gap closes faster than people expect: frontier-equivalent inference costs fell ~99% in two years and the April 2026 open-weight wave (DeepSeek V4-Pro at $0.28/$2.48 per MTok) is another factor of 10 below paid frontier. The crossover for any specific workflow depends mostly on two things: how much human supervision overhead it still needs, and what the loaded-cost comparator actually is in your geography. Pick one workflow your team actually does. Estimate both numbers. Track the ratio quarterly. That ratio, more than any benchmark, will tell you when displacement becomes economic.
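The ratio worth tracking quarterly can be sketched as a small function. The parameter names are hypothetical, and `human_hours_replaced` is an assumption added here, because the text does not state how much human work one agent-hour delivers.

```python
def net_cost_ratio(
    human_rate: float,                  # loaded human cost, USD per hour
    agent_rate: float,                  # agent cost, USD per autonomous hour
    supervision_minutes: float,         # human review and retry time per agent-hour
    human_hours_replaced: float = 1.0,  # ASSUMPTION: human-hours of work per agent-hour
) -> float:
    """How many times cheaper the supervised agent is than a human doing the same work."""
    if supervision_minutes < 0 or human_hours_replaced <= 0:
        raise ValueError("supervision must be >= 0 and replaced hours > 0")
    supervised_agent_cost = agent_rate + human_rate * supervision_minutes / 60.0
    return human_rate * human_hours_replaced / supervised_agent_cost


# Illustrative inputs: low end of the US comparator, ~$8 agent-hour, 20 min supervision.
one_for_one = net_cost_ratio(130.0, 8.0, 20.0)
faster_agent = net_cost_ratio(130.0, 8.0, 20.0, human_hours_replaced=3.0)

print(f"one agent-hour replaces one human-hour:    {one_for_one:.1f}x cheaper")
print(f"one agent-hour replaces three human-hours: {faster_agent:.1f}x cheaper")
```

One property of this model is worth noting. With one-for-one replacement the ratio stays below 60 divided by the supervision minutes, so 20 minutes of supervision caps it at 3×. Larger multiples require an agent-hour to deliver more than one human-hour of work, or supervision to fall well below 20 minutes. That makes supervision overhead, not token price, the number to drive down.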

The METR data point

One supporting data point is worth keeping in view. METR (Model Evaluation & Threat Research) publishes a measured benchmark called Time Horizon: the duration of human work a model can complete with 50% reliability. Their 2019–2025 dataset showed the frontier doubling roughly every 7 months. Their TH 1.1 update (January 2026) used more tasks and far more 8-hour-plus measurements. The 2024–25 trajectory looks closer to a 4-month doubling. Claude 3.7 Sonnet’s 50% time horizon sits around 50 minutes. If the 4-month doubling holds, frontier models cross the 8-hour threshold by late 2026 or early 2027. If the 7-month doubling reasserts, that slips to 2028. Either way, the trajectory is the input to the displacement curve, not the headline number.
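The doubling arithmetic can be sketched as follows. The 50-minute horizon and the two doubling cadences come from the text. The sketch reports elapsed months only, because the calendar date of the crossing depends on which measurement date and reliability level the baseline is anchored to, and that anchoring is an assumption you have to supply.

```python
import math


def months_to_threshold(current_minutes: float, target_minutes: float,
                        doubling_months: float) -> float:
    """Months until the time horizon reaches the target under steady doubling."""
    if current_minutes <= 0 or target_minutes <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    doublings_needed = math.log2(target_minutes / current_minutes)
    return doublings_needed * doubling_months


BASELINE_MINUTES = 50.0      # Claude 3.7 Sonnet 50% time horizon, per the text
THRESHOLD_MINUTES = 8 * 60   # one undisturbed knowledge-worker day

fast = months_to_threshold(BASELINE_MINUTES, THRESHOLD_MINUTES, doubling_months=4)
slow = months_to_threshold(BASELINE_MINUTES, THRESHOLD_MINUTES, doubling_months=7)

print(f"4-month doubling: {fast:.1f} months after the baseline measurement")
print(f"7-month doubling: {slow:.1f} months after the baseline measurement")
```

Going from 50 minutes to 8 hours takes a little over three doublings, so the two cadences differ by roughly ten months of elapsed time. Stricter reliability bars than 50% would start from a shorter baseline horizon and push both dates later.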

What this implies for capex

If “hours of autonomous work” is the right capability axis, the capex picture from Section 1 changes shape. The training portion of hyperscaler spend — the largest pre-training runs — becomes harder to justify on its own; smaller models with strong post-training, RL, and tool use can match the capabilities of larger ones for many workflows. The inference portion, already projected at roughly 70% of AI compute by 2030, becomes more justified, because every long-running agent is a multi-token, multi-call, often multi-hour inference workload. Reasoning models with extended thinking can use 10–100× more tokens per task than chat-style models. Capex is not wrong. Its allocation between training and serving is what shifts.

~$8 / hr
Opus 4.7 autonomous (moderate, ~1M tok/hr)
~$2 / hr
Sonnet 4.6 same workflow
$130–245
US senior dev loaded hourly comparator
3–10×
Net displacement gap after review & retry overhead
94% ↔ 26%
SWE-bench Verified vs SWE-Lancer (production gap)

Trigger signals — what to watch for

  • 8h+ undisturbed autonomy on routine knowledge work at net agent-cost below 25% of the loaded human comparator after review and retry overhead (today the gross gap is large; the net gap closes when supervision overhead drops)
  • Multi-day autonomous agents enter production at major enterprises with measurable, audited ROI
  • Hyperscaler capex shifts visibly from training to inference and tool-ecosystem infrastructure
  • Vendors begin publishing agent cost-per-hour benchmarks alongside accuracy benchmarks
  • Counter-trigger: net agent cost-per-hour (after supervision overhead) stays above human comparator on representative tasks for two consecutive frontier releases

Implications by role

Developer
Reliability and observability matter more than prompt cleverness. Retry logic, intermediate checkpoints, tool-call audit trails. The hard part is the long tail, not the happy path.
Team Lead
Pick one representative team workflow. Estimate (a) autonomous hours, (b) agent $/hr (today ~$8 Opus / ~$2 Sonnet, plus your supervision overhead), (c) loaded human hourly rate. Track that ratio quarterly — it’s your displacement curve.
CTO / VP Eng
Plan for inference-heavy workloads. Budget and architecture should assume agentic workflows with 10–100× more tokens per task, not chat.
Procurement
Vendor evaluation should include long-horizon task benchmarks and cost-per-hour, not just single-prompt scores. Ask for production reliability data, not just leaderboard positions.

Data: METR Time Horizons (TH 1.0 Mar 2025, TH 1.1 Jan 2026) • SWE-bench Verified / SWE-Lancer leaderboards • Antigravity 2.0 demo writeups • Anthropic API pricing (May 2026) • Claude Code usage telemetry • Index.dev / MarsDevs developer hourly-rate surveys 2026

Synthesis

How the Currents Interact

The six currents are not independent. They reinforce and contradict each other in specific ways — and the value of holding them all open at once is mostly in seeing those interactions clearly. Three patterns are worth naming.

Reinforcement. Efficiency Revolution (Current 2) accelerates Sovereignty (Current 4): cheap, frontier-equivalent open-weight inference is what makes EU-onshore deployment a real procurement choice rather than a slogan. Hours and Dollars (6) reinforces From Lab to Production (5): longer autonomy widens the gap between what an agent can do in a demo and what teams can actually deploy reliably, because eval and red-team work scales with task duration. Continued Scaling (1) reinforces Hours and Dollars (6) on the inference side: the capex shift from training to serving makes longer agentic runs cheaper.

Contradiction, sometimes only apparent. Continued Scaling (1) and Efficiency Revolution (2) pull capex in opposite directions but resolve via the training-vs-inference split: train-cluster spend gets harder to defend, serving-cluster spend gets easier. Financial Correction (3) and Continued Scaling (1) look opposed but are not mutually exclusive: the technology can work and individual products can generate real revenue while the investment timeline still fails to match the revenue timeline. Sovereignty (4) cuts diagonally across (2) and (3): commoditization makes sovereign options viable, while vendor concentration risk is the failure mode that makes them necessary.

The stance. No current is “the answer.” The strongest planning position is the one that performs adequately under all six, not the one that bets everything on whichever current seems most live this quarter. Lock to none; watch triggers in all; let currents go stale when their triggers haven’t fired in 18 months and replace them with what better describes the world you’re actually living in. That discipline is what this booklet exists to make routine.

Practice

How to Use This in Practice

If you take one thing from this booklet, take this: currents are useful only if you commit to a habit. The habit is foresee, watch triggers, adjust. The currents are scaffolding for that habit, not a forecast.

Pre-commit to triggers, not predictions

The most useful artifact in this booklet is the trigger list under each current. Far more than the prose or the synthesis, the triggers are what tell you something has shifted. Decide now — before the headlines — what would update you. “If frontier agent cost-per-hour drops below $40, I will pilot the displacement workflow.” “If GPU rental prices fall another 30%, I will renegotiate our vendor contract.” The point is to short-circuit the response time between observing a signal and acting on it.
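One way to make pre-commitment concrete is to write each trigger down as data: the signal, the test that decides whether it fired, and the action you promised yourself. The sketch below encodes the two example commitments from the paragraph above. The `Trigger` structure, the observation keys, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trigger:
    current: str                     # which current the trigger belongs to
    signal: str                      # human-readable description of the signal
    fires: Callable[[dict], bool]    # test applied to this quarter's observations
    action: str                      # the decision committed to in advance


TRIGGERS = [
    Trigger(
        current="Hours and Dollars",
        signal="frontier agent cost-per-hour drops below $40",
        fires=lambda obs: obs["agent_cost_per_hour"] < 40,
        action="pilot the displacement workflow",
    ),
    Trigger(
        current="Financial Correction",
        signal="GPU rental prices fall another 30%",
        fires=lambda obs: obs["gpu_rental_change"] <= -0.30,
        action="renegotiate our vendor contract",
    ),
]


def review(observations: dict) -> list[str]:
    """Return the pre-committed actions whose triggers fired this review."""
    return [f"{t.current}: {t.action}" for t in TRIGGERS if t.fires(observations)]


# Hypothetical quarterly observations.
fired = review({"agent_cost_per_hour": 35.0, "gpu_rental_change": -0.10})
print(fired)
```

Because the action is recorded next to the threshold, the quarterly review becomes a lookup rather than a debate, which is the point of deciding before the headlines.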

Review on a cadence

Quarterly is probably right for most teams. Faster than that and you’re reading noise; slower and you miss real shifts. Each review, walk through the trigger list and ask: has any trigger fired? Has any disconfirming signal landed? What changed in our environment? Update your stance accordingly. The output of the review is rarely “we were wrong” — more often it’s “we should weight this current heavier than we did last quarter.”

Let currents go stale

If a current’s triggers haven’t fired in 18 months and its disconfirming evidence has been steadily accumulating, the current probably isn’t live anymore. Retire it. Replace it with one that better describes the world you’re actually living in. This booklet is a snapshot of mid-2026; by mid-2027 at least one of these currents will likely need replacing. That’s the system working, not failing.

What I think

I want to be precise about my view: I think this approach — currents plus triggers plus periodic adjustment — is the healthy way to navigate a technology that changes this fast. I do not think any one of the six currents in this booklet is the right one to bet on. People are eager to jump to conclusions, and prediction makes good copy, but conclusions in a domain moving at this rate go stale before they’re acted on. The discipline of holding several currents open simultaneously, watching what fires, and updating without ego is, I believe, the actual skill worth building.

Workshop Exercise

Trigger Drill

This worksheet is built around triggers, not predictions. For each of the six currents, answer three questions. The point isn’t to be right about which current dominates — it’s to know what you’d notice and what you’d do.

Current 1 — Continued Scaling
First trigger to fire
Of this current’s trigger signals, which would fire first? Where would you see it?
First decision
If that trigger fires, what’s the FIRST decision you make — and within how many days?
Disconfirming evidence
What would update you AWAY from this current? How would you notice?
Current 2 — Efficiency Revolution
First trigger to fire
Which trigger signal would you see first — an open-source release matching frontier within weeks, an enterprise migration off proprietary APIs, or sub-$0.10 inference?
First decision
If that trigger fires, what’s the FIRST decision — and within how many days?
Disconfirming evidence
What would update you AWAY from the commoditization thesis?
Current 3 — Financial Correction
First trigger to fire
IPO valuation correction, hyperscaler capex flattening, AI-startup failures in a single quarter, vendor-concentration severance, or GPU debt defaults — which fires first?
First decision
If your AI vendor turns out to be Pets.com rather than Amazon, what’s your first move?
Disconfirming evidence
What would tell you enterprise ROI is materially closing, not just personal-user value?
Current 4 — Sovereignty
First trigger to fire
Anthropic-Pentagon-style designation against your vendor, accelerated EU data-residency requirement, or a Chinese open-weight release matching your current paid frontier — which would you see first?
First decision
If your vendor becomes a sovereignty risk, what’s the first contract or data move you make — and within how many days?
Disconfirming evidence
What would tell you global vendor access is structurally stable rather than fragile?
Current 5 — From Lab to Production
First trigger to fire
First EU enforcement action under the obligations that remain on schedule, a copyright ruling against AI training, or your own internal pilot-to-production conversion rate dropping below 20%: which is most observable from where you sit?
First decision
If deployment becomes the binding constraint (not capability), what’s the first thing you stop doing? What’s the first hire?
Disconfirming evidence
What would tell you the deployment gap is structurally closing, not just locally?
Current 6 — Hours and Dollars
First trigger to fire
8h+ undisturbed autonomy at sub-human-hourly cost on a real workflow, multi-day production agents with audited ROI, or capex visibly shifting from training to inference — which fires first?
First decision
If reliable multi-hour agents arrive at sub-human cost, what part of your team’s work changes first — and what’s your first investment?
Disconfirming evidence
What would tell you agent cost-per-hour is stuck above the human comparator on representative tasks?

One action that works under all six currents

What is one thing you can do this month that improves your position regardless of which current dominates?

Interactive versions of all visualizations: demos.barcik.training
Full research and data: publications.barcik.training

© 2026 Robert Barcik · LearningDoe s.r.o. · barcik.training