Long-form strategic guides, research reports, and reference material on generative AI — written for European enterprises and IT services teams.
How the frontier AI labs — Anthropic, OpenAI, Mistral, the Chinese labs — actually make and lose money. Told from the seller's side: not what AI costs you, but how sturdy the businesses selling it really are. Seven "ledgers" each decode one mechanism — run-rate vs GAAP accounting, per-model "vintage" economics and the inference margin beneath them, the circular hyperscaler financing, and the labs outside the US duopoly. Every load-bearing figure carries a provenance tag and a numbered source.
Companion to → The Token Economics
Read the booklet →
A strategic guide to the enterprise agent development stack as of 2026. Maps the landscape — MCP and A2A as protocols, vendor SDKs (Google ADK, OpenAI Agents SDK, Claude Agent SDK, AWS Strands, Azure AI Agent Service) as PaaS-style frameworks, LangGraph and CrewAI as agnostic alternatives — through cloud-era analogies. Covers lock-in trade-offs per vendor, the EU AI Act's forcing function, and a forecast with six falsifiable 2027 indicators. Closes with a worked case study of a regulated European bank resolving the 5-question decision framework into a specific stack — including the moment where the framework's answer was wrong and we overrode it.
Read the booklet →
A decision framework for on-prem inference with open-weight models. Covers the five major model families (Llama, Gemma, Qwen, Mistral, Phi), practical hardware-to-model mapping for H100/H200/DGX Spark, quantization trade-offs, inference framework selection, and a reusable decision checklist. Includes four interactive scenarios where participants select and justify model choices.
Read the booklet →
Actionable architectural patterns for building AI coding agents and agentic systems, extracted from production-grade architecture. Covers persistent memory, background consolidation, tool constraints, prompt economics, output calibration, security, multi-agent orchestration, and capability gating. Each chapter teaches one pattern with practitioner guidance.
Companion to → LLM-Human Interaction Design Patterns
Read the booklet →
How to design the seam between AI agents and human operators. Covers five structural interaction patterns, cognitive biases that undermine handoffs, SBAR-based context presentation, trust calibration, failure mode design with kill switches and circuit breakers, and organizational governance. Includes prompt templates, architecture patterns, and a self-assessment worksheet.
Companion to → Building Agentic AI
Read the booklet →
Six currents shaping the next 2–3 years of generative AI — continued scaling, the efficiency revolution, a financial correction, sovereignty, the move from lab to production, and the new economics of hours and dollars. Not rival forecasts but forces that run at once: each current carries trigger signals to watch and role-specific implications, with two interactive visualizations. Closes with how the currents interact and a trigger-drill worksheet for team exercises.
Read the booklet →
A strategic guide for EU IT services providers navigating GenAI. Covers the economics of self-hosting LLMs vs APIs, viable business model pivots, the vendor ecosystem play, how AI transforms your own delivery model, EU AI Act compliance opportunities, and a practical 18-month roadmap. Grounded in real April 2026 pricing data.
Companion to → The Economics of the Frontier
Read the booklet →
Empirical study of LLM-as-judge defenses against the public jailbreak corpus from ZetaLib. Three open-weight target models (DeepSeek Chat v3.1, DeepSeek v3.2, GLM-4.6) tested against 20 attacks × 4 deployment-rule shapes × 7 defense conditions. The hypothesis — that a competent LLM-as-judge defeats most public attacks — holds, but the popular implementation (input-side filtering) over-blocks legitimate inputs to a degree that would force the defense to be turned off. Includes a deployment playbook for engineering teams and hands-on exercises for students.
Read the report →
Systematic evaluation of geopolitical biases in 7B-parameter language models from three origins (US, CN, EU). Tests 88 prompts across 7 categories using a multi-evaluator panel. Reveals asymmetric performance on sensitive topics and scripted deflection patterns.
Read the report →
Evaluates whether small language models can reliably assess the quality of their own outputs. Tests self-judgment accuracy across factual grounding, instruction following, safety boundaries, consistency, and tone — with accuracy ranging from 50% (1B) to 83% (27B).
Read the report →
Behavioral safety evaluation using Anthropic’s Bloom framework. Tests 11 risk behaviors including emotional bonding, social engineering assistance, self-preservation, corrigibility resistance, and covert goal pursuit. Scores range from 2.1 to 6.8 on a 10-point scale.
Read the report →
A comprehensive guide to configuring Claude Code across a multi-repo hub. Covers the three-layer persistent context system (CLAUDE.md, memory, permissions), CLI integrations with GitHub and AWS, cross-machine portability, and a detailed security analysis including defense-in-depth strategies for AI coding assistants.
Read the guide →
A collection of engaging short stories, each exploring a different cognitive bias — all written with generative AI. Interspersed with essays examining the nature of AI tools: copyright, creativity, job displacement, and the question of authorship. The AI holds up a mirror to human thinking, reflecting our own imperfections.
Read in English → Čítať po slovensky →