We publish when we have something worth saying, not on a schedule. Everything here comes from real work and genuine curiosity: tools we've tested, experiments we've run, and findings we think are worth sharing. No sponsored content, no AI-generated filler. If it's here, one of us wrote it and stands behind it.

The Blind Spot in LLM Observability

Standard APM shows latency and token counts. Purpose-built LLM observability adds prompt and response content. Neither showed that one run was truncated. What it took to find that out.

Inference Layer Cost: Routing by Cost Alone Isn't Enough

Four models, two experiments, and a cost spread that didn't predict what mattered. What the data showed about cheap models, omitted warnings, and why the most interesting divergence had nothing to do with price.

Inference Layer Architecture: What the Experiments Actually Showed

Three incidents, four models, and a scoring rubric that missed what mattered. What routing experiments revealed about cheap models, silent failures, and what to optimize for at 2am.

How and Why I Built a Simple PDF Signer with Claude Code

I used Claude Code to build a free, private, browser-based PDF signing tool. Two hours to a working prototype, six more on the design. Here's how it went.

When Does Your Inference Layer Become a DevOps Problem?

I ran 1,000 identical API calls across five time windows and timed a provider migration twice: with an abstraction layer and without. The latency finding was expected. The behaviour change wasn't.

Evaluating Claude's Terraform Skills: Token Cost, Output Quality, and When to Use None of Them

I tested four configurations (three MCP skills plus bare Claude) against five real infrastructure tasks. The token math is real, but it's not the most interesting finding.

Want to talk through something you're working on?

We're always happy to have a straightforward conversation about infrastructure, tooling, or where your DevOps practice is headed.
