← mabl.com|fab.mabl.com : ~/posts/2026-04-30-james-asking-for-human-reviews-should-be $


James Baldassari · verification · this week

Asking for human reviews should be the exception, not the rule.

At our current PR throughput, I'm going to burn out on reviews alone, never mind actual work. We can't move at this pace while also having a human read every change. The model I want us to move toward: ask for human review when you're truly unsure about something, and let the agent reviewers handle the rest. The piece I shipped to make this real was getting /codex-review wired up across every repo as an on-demand second opinion. Different model family from Claude, uncorrelated blind spots, one comment away when you want it. The work was sixteen PRs across sixteen repos to enable the GitHub Actions trigger, plus a change to our shared workflows repo to make it a first-class slash command. The conceptual work was harder: deciding that "two agent reviews and a human glance" is now an acceptable pre-merge state for routine changes, and reserving real human attention for the changes that genuinely need it. We're not all the way there. But the trajectory is clear, and I'd rather build the routing now than burn out catching up to it later.
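The wiring itself is small. A minimal sketch of an on-demand review trigger like this in GitHub Actions — the workflow name, step body, and condition here are illustrative, not our actual shared workflow:

```yaml
# Hypothetical sketch: fire a second-opinion review when someone
# comments "/codex-review" on a pull request.
name: codex-review
on:
  issue_comment:
    types: [created]
jobs:
  review:
    # Only run for PR comments that start with the slash command.
    if: github.event.issue.pull_request && startsWith(github.event.comment.body, '/codex-review')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run the agent reviewer
        run: echo "invoke the review agent here"  # placeholder for the real step
```

The `issue_comment` event fires for comments on both issues and PRs, so the `github.event.issue.pull_request` guard is what scopes it to pull requests.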

More from the floor

Joey Imbasciano · agents + harness · this week

Treat agentic capacity as a portfolio, not a per-seat allocation.

Our default Anthropic seats include $150 of overage per month per premium seat. We're way past that for a lot of people, and we're going to be further past it next quarter. The real cost is the extra usage, not the seat — you should think of the seats as a promotional teaser and overage as the true cost. So we're treating it like a portfolio. We pre-purchase 1,000 credits at a 30% discount. We're looking at moving high-volume workloads to direct API consumption for better visibility. By the end of the year I doubt any of us will sit within seat allocation for full-time work — at least not without giving up the productivity gains we just earned. The companies that figure out the cost structure of agentic work as a separate discipline from "give engineers more tools" are going to have a real edge. The ones that don't are going to be surprised by their bill.
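The back-of-envelope math, with the $150 overage and 30% discount from above; the per-credit list price and the $650/month usage figure are illustrative assumptions:

```python
# Rough cost model: seat overage vs. pre-purchased credits.
# The 30% discount and $150 included overage are from the post;
# the list price and usage numbers are illustrative assumptions.

LIST_PRICE_PER_CREDIT = 1.00      # assumed $/credit at list
DISCOUNT = 0.30                   # pre-purchase discount (from the post)
INCLUDED_OVERAGE = 150.00         # $ included per premium seat per month

def monthly_cost(usage_dollars: float, prepurchased: bool) -> float:
    """Cost beyond the seat itself for one engineer's monthly usage."""
    overage = max(0.0, usage_dollars - INCLUDED_OVERAGE)
    rate = (1 - DISCOUNT) if prepurchased else 1.0
    return overage * rate

# An engineer using $650/month of capacity:
print(monthly_cost(650, prepurchased=False))  # 500.0: full list price on the overage
print(monthly_cost(650, prepurchased=True))   # ~350: same overage on pre-purchased credits
```

At any real usage level the seat price is noise next to the overage, which is the whole argument for managing it as a portfolio.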

Geoff Cooney · platform · this week

Validation should be a sub-agent, not a checklist.

I've been thinking about where verification belongs in an agentic pipeline. The shape I keep coming back to is a quality validation sub-agent that runs *before* PR submission — its job is to draft and execute a validation plan for the change, including running the relevant mabl tests, capturing evidence, and attaching that evidence to the PR. Then the PR review agent enforces the existence of the validation plan, not the rules underneath it. Then the full mabl suite runs on merge, and when something breaks, a failure-analysis skill identifies which PR introduced it and suggests fixes. The reason for that structure: at our throughput, "did the engineer remember to validate this" is the wrong question. The right question is "does the PR carry evidence that validation happened, and does the evidence hold up?" Sub-agents do the validation; the review agent checks the evidence; the merge gate trusts the chain. None of the layers is doing checklist work — each one has a specific decision to make. Most teams that try to add AI to their existing CI/CD end up with checklist agents because that's the shape of CI/CD. I don't think that's where this lands.
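The evidence chain can be sketched in a few types. Everything here is hypothetical naming — the point is that the review layer checks that evidence exists and holds up, not the rules that produced it:

```python
from dataclasses import dataclass, field

# Hypothetical types sketching the evidence chain described above.

@dataclass
class Evidence:
    check: str       # what was validated, e.g. "mabl smoke suite"
    passed: bool
    artifact: str    # link or path to the captured proof

@dataclass
class PullRequest:
    title: str
    evidence: list[Evidence] = field(default_factory=list)

def validation_subagent(pr: PullRequest) -> None:
    """Runs before submission: builds a validation plan, executes it,
    and attaches the results to the PR. (Stubbed here.)"""
    pr.evidence.append(Evidence("mabl smoke suite", passed=True, artifact="run/123"))

def review_agent(pr: PullRequest) -> bool:
    """Enforces that a validation plan exists and its evidence holds up,
    without re-deciding the rules underneath it."""
    return bool(pr.evidence) and all(e.passed for e in pr.evidence)

pr = PullRequest("fix connector retry")
validation_subagent(pr)
print(review_agent(pr))  # True: evidence exists and every check passed
```

A PR with no attached evidence fails review outright — which is exactly the "did validation happen" question made mechanical.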

Dario Kondratiuk · sre + infra · this week

Deterministic tools for deterministic tasks.

I keep watching Claude reinvent the same shell pipeline three different ways across sessions. Routine cross-repo operations like dependency bumps are the canonical example: an engineer asks Claude to do it, and Claude figures out a slightly different approach each time — usually right, sometimes wrong, always slow. What I've been pushing for is a shared scripts directory for the things that are deterministic. Bump a CLI version into a downstream repo? Script. Generate a new connector skeleton? Script. Snapshot a runner config? Script. When the work has a known shape, the agent shouldn't be reasoning it out — it should be calling the script. We pay for the agent's reasoning when we need reasoning. We shouldn't pay for it when we just need the right command in the right order. The cleaner the line we draw between "this is a deterministic operation" and "this needs the model," the better the system gets at both.
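One way to draw that line in practice is a thin dispatcher in front of the shared scripts directory. The task names and script paths below are illustrative, not our actual layout:

```python
import subprocess

# Hypothetical registry: known-shape operations map to deterministic
# scripts; anything else falls through to the agent's reasoning.
SCRIPTS = {
    "bump-cli-version": "scripts/bump_cli_version.sh",
    "new-connector": "scripts/new_connector.sh",
    "snapshot-runner-config": "scripts/snapshot_runner_config.sh",
}

def dispatch(task: str, args: list[str]) -> str:
    """Route a task: deterministic shape -> script, unknown shape -> agent."""
    if task in SCRIPTS:
        # Known shape: run the script, no model in the loop.
        subprocess.run([SCRIPTS[task], *args], check=True)
        return "script"
    # Unknown shape: this is where the agent's reasoning actually earns its cost.
    return "agent"
```

The registry is the line itself: adding an entry is a deliberate decision that a task's shape is now known, and the agent stops re-deriving it every session.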

Dan Belcher · support · this week

The skill is the CMS.

I wanted to stand up an engineering log at fab.mabl.com — short posts from people on the delivery pipeline, public, in our voice. The first instinct was the obvious one: a CMS, or markdown files in a repo with a PR per post. Both are wrong for us. A CMS adds a vendor for ten posts a year. A PR per post means Joey doesn't write the post, because forking and branching to publish a paragraph isn't how anyone gets a paragraph out of their head. So we did the third thing: posts are markdown files in the repo, and the authoring interface is a Claude skill called /fab-note that drafts a post in the established voice, confirms it with the author, and commits the file directly to main. The author never sees a PR. Git history is preserved. The skill is the CMS. The point that surprised me writing this: the right interface for a publishing system isn't a form or a PR, it's the tool the authors are already inside. Our engineers spend their day in Claude Code. Meeting them there cost less than building anything else.
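The commit-to-main step is small enough to sketch. File layout, function names, and commit-message format here are assumptions, not the skill's actual internals:

```python
import subprocess
from datetime import date
from pathlib import Path

def publish_post(slug: str, body: str, author: str, repo: Path) -> Path:
    """Write a markdown post and commit it straight to the current branch --
    no fork, no feature branch, no PR, full git history preserved."""
    path = repo / "posts" / f"{date.today().isoformat()}-{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body)
    subprocess.run(["git", "-C", str(repo), "add", str(path)], check=True)
    subprocess.run(
        ["git", "-C", str(repo), "commit", "-m", f"fab-note: {slug} ({author})"],
        check=True,
    )
    return path
```

The skill wraps a step like this in drafting and confirmation; the author only ever sees the conversation, never the plumbing.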

Joe Lust · release ops · this week

Reliable builds are the actual unlock.

Our productivity is up roughly 4x quarter over quarter. The thing I keep working on is making sure the build infrastructure can actually keep up. CLI builds intermittently failing on a datastore emulator issue. Self-hosted runners missing Docker. JDK downloads from Adoptium failing at random. The worst was a pernicious interplay between simultaneous matrix builds, GitHub's pre-registered runner names, GCP's 5-VM-per-call rate limit, and Pub/Sub retries — when all four collide, the merge queue stops moving. Our merge queue alone has cost us a day at a time when it breaks. None of this work is glamorous. Cache config, runner audits, log diving, hunting down Pub/Sub retry semantics. But the math is straightforward: every minute the merge queue is broken is a minute the rest of the agentic pipeline is sitting idle. AI-native throughput needs reliable builds underneath it for the upstream gains to compound. Agents can write code as fast as they want — if main can't merge, none of it ships. So I keep investing here. It's the least visible work and the highest-leverage.