Publications — barcik.training

The Economics of the Frontier (New)

May 2026 · ~13,000 words · 7 ledgers · 4 figures

How the frontier AI labs — Anthropic, OpenAI, Mistral, the Chinese labs — actually make and lose money. Told from the seller's side: not what AI costs you, but how sturdy the businesses selling it really are. Seven "ledgers" each decode one mechanism — run-rate vs GAAP accounting, per-model "vintage" economics and the inference margin beneath them, the circular hyperscaler financing, and the labs outside the US duopoly. Every load-bearing figure carries a provenance tag and a numbered source.

Companion to → The Token Economics

AI Lab Economics · Anthropic vs OpenAI · Hyperscaler Financing · Inference Margins · Reading the Numbers

Read the booklet →

The Agent Horizon

April 2026 · ~16,000 words · 11 chapters · Revised V2

A strategic guide to the enterprise agent development stack as of 2026. Maps the landscape — MCP and A2A as protocols, vendor SDKs (Google ADK, OpenAI Agents SDK, Claude Agent SDK, AWS Strands, Azure AI Agent Service) as PaaS-style frameworks, LangGraph and CrewAI as agnostic alternatives — through cloud-era analogies. Covers lock-in trade-offs per vendor, the EU AI Act's forcing function, and a forecast with six falsifiable 2027 indicators. Closes with a worked case study of a regulated European bank resolving the 5-question decision framework into a specific stack — including the moment where the framework's answer was wrong and we overrode it.

Agent Frameworks · MCP · Enterprise AI · LangGraph vs ADK · EU AI Strategy

Read the booklet →

Open-Weight Model Families & Model Selection

April 2026 · Interactive booklet · 3 parts · Workshop exercise

A decision framework for on-prem inference with open-weight models. Covers the five major model families (Llama, Gemma, Qwen, Mistral, Phi), practical hardware-to-model mapping for H100/H200/DGX Spark, quantization trade-offs, inference framework selection, and a reusable decision checklist. Includes four interactive scenarios where participants select and justify model choices.

Open-Weight Models · Model Selection · On-Prem Inference · DGX Spark · Workshop Tool

Read the booklet →

Building Agentic AI — Design Patterns from Production

April 2026 · ~28,000 words · 10 chapters

Actionable architectural patterns for building AI coding agents and agentic systems, extracted from production-grade architecture. Covers persistent memory, background consolidation, tool constraints, prompt economics, output calibration, security, multi-agent orchestration, and capability gating. Each chapter teaches one pattern with practitioner guidance.

Companion to → LLM-Human Interaction Design Patterns

Agentic AI · Design Patterns · Architecture · AI Agents · Practitioner Guide

Read the booklet →

LLM-Human Interaction Design Patterns for Operations

April 2026 · ~30,000 words · 10 chapters

How to design the seam between AI agents and human operators. Covers five structural interaction patterns, cognitive biases that undermine handoffs, SBAR-based context presentation, trust calibration, failure mode design with kill switches and circuit breakers, and organizational governance. Includes prompt templates, architecture patterns, and a self-assessment worksheet.

Companion to → Building Agentic AI

Human-AI Interaction · Design Patterns · Operations · Trust Calibration · Practitioner Guide

Read the booklet →

Scenario Planning for Generative AI

May 2026 · Interactive booklet · 6 currents · Workshop exercise

Six currents shaping the next 2–3 years of generative AI — continued scaling, the efficiency revolution, a financial correction, sovereignty, the move from lab to production, and the new economics of hours and dollars. Not rival forecasts but forces that run at once: each current carries trigger signals to watch and role-specific implications, with two interactive visualizations. Closes with how the currents interact and a trigger-drill worksheet for team exercises.

Scenario Planning · GenAI Strategy · AI Investment · Interactive · Workshop Tool

Read the booklet →

The Token Economics

April 2026 · ~40,000 words · 14 chapters

A strategic guide for EU IT services providers navigating GenAI. Covers the economics of self-hosting LLMs vs APIs, viable business model pivots, the vendor ecosystem play, how AI transforms your own delivery model, EU AI Act compliance opportunities, and a practical 18-month roadmap. Grounded in real April 2026 pricing data.

Companion to → The Economics of the Frontier

GenAI Economics · IT Services · EU AI Act · Business Strategy · Self-Hosting vs API

Read the booklet →

Warden — Testing LLM-as-Judge Defenses Against Public Jailbreaks

May 2026 · Research report · 1,680 trials · 3 targets · 4 judge designs

Empirical study of LLM-as-judge defenses against the public jailbreak corpus from ZetaLib. Three open-weight target models (DeepSeek Chat v3.1, DeepSeek v3.2, GLM-4.6) tested against 20 attacks × 4 deployment-rule shapes × 7 defense conditions. The hypothesis — that a competent LLM-as-judge defeats most public attacks — holds, but the popular implementation (input-side filtering) over-blocks legitimate inputs to a degree that would force the defense to be turned off. Includes a deployment playbook for engineering teams and hands-on exercises for students.

Warden · LLM Security · Prompt Injection · LLM-as-Judge · Defense Evaluation

Read the report →
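The Warden trial count can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes a full factorial design with one trial per combination, which the summary implies but does not state, and the attack, rule-shape, and condition labels are hypothetical placeholders rather than names from the report.

```python
from itertools import product

# Target models as named in the summary.
targets = ["DeepSeek Chat v3.1", "DeepSeek v3.2", "GLM-4.6"]

# Counts come from the summary; the labels are hypothetical placeholders.
attacks = [f"attack_{i}" for i in range(1, 21)]            # 20 attacks
rule_shapes = [f"rule_shape_{i}" for i in range(1, 5)]     # 4 deployment-rule shapes
conditions = [f"condition_{i}" for i in range(1, 8)]       # 7 defense conditions

# Assumption: every combination is run exactly once (full factorial grid).
trials = list(product(targets, attacks, rule_shapes, conditions))
trials_per_target = len(trials) // len(targets)

print(len(trials), trials_per_target)
```

Under that assumption the grid yields 560 trials per target and 1,680 in total, matching the figure in the report's metadata line.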

GeoBias — 7B Model Evaluation Report

March 2026 · Research report · 5 models · 3 evaluators

Systematic evaluation of geopolitical biases in 7B-parameter language models from three origins (US, CN, EU). Tests 88 prompts across 7 categories using a multi-evaluator panel. Reveals asymmetric performance on sensitive topics and scripted deflection patterns.

GeoBias · LLM Evaluation · Geopolitical Bias · Research

Read the report →

SelfJudge — Can Small LLMs Judge Their Own Outputs?

March 2026 · Research report · 5 models (1B–27B)

Evaluates whether small language models can reliably assess the quality of their own outputs. Tests self-judgment accuracy across factual grounding, instruction following, safety boundaries, consistency, and tone — with accuracy ranging from 50% (1B) to 83% (27B).

SelfJudge · Self-Evaluation · Small LLMs · Research

Read the report →

Bloom — AI Behavioral Safety Evaluation

March 2026 · Research report · 11 behaviors tested

Behavioral safety evaluation using Anthropic’s Bloom framework. Tests 11 risk behaviors including emotional bonding, social engineering assistance, self-preservation, corrigibility resistance, and covert goal pursuit. Scores range from 2.1 to 6.8 on a 10-point scale.

Bloom · AI Safety · Behavioral Evaluation · Red-Teaming

Read the report →

Claude Code Setup — How It All Works

2026 · Reference guide · 9 sections

A comprehensive guide to configuring Claude Code across a multi-repo hub. Covers the three-layer persistent context system (CLAUDE.md, memory, permissions), CLI integrations with GitHub and AWS, cross-machine portability, and a detailed security analysis including defense-in-depth strategies for AI coding assistants.

Claude Code · AI Coding Assistant · Developer Setup · Security · Reference Guide

Read the guide →

The Mirror of Artificial Intelligence

2023 · 38 stories + 9 essays · 42 AI-generated illustrations · Available in English & Slovak

A collection of engaging short stories, each exploring a different cognitive bias — all written with generative AI. Interspersed with essays examining the nature of AI tools: copyright, creativity, job displacement, and the question of authorship. The AI holds up a mirror to human thinking, reflecting our own imperfections.

Cognitive Biases · AI-Generated Stories · Generative AI · AI & Society · Illustrated

Read in English → · Read in Slovak (Čítať po slovensky) →