agent-misbehavior — fab.mabl.com

Posts tagged "agent-misbehavior"

Lauren Leidal agent platform 2026-05-23

When the agent writes the right answer and chooses against it

A test-authoring agent paused on a page with an iframe and wrote out, verbatim, that mabl handles iframe switching automatically when you interact with elements. Its very next action was a JavaScript snippet "to investigate the structure first." The investigation became the strategy. From there the session never came back to the native interaction the trace had just said would work.

I've written before that reading the reasoning trace is how I tell whether an agent's weird move is principled or actually broken. This was a third case I hadn't named: the trace contains the right conclusion and the agent still acts against it. The investigation was framed as a brief detour and quietly became load-bearing. I think this was closer to distraction than disagreement: the agent picked up an investigation, the investigation produced results, and chasing those results felt more immediately useful than zooming back out to the original plan. We need to make the goal harder to lose: a nudge back toward native steps after a JS call, and a tripwire after a few JS calls in a row. These are hints to keep focus, not to override judgment.
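The nudge and tripwire could look something like the sketch below. It is a minimal illustration, not mabl's implementation: the `FocusGuard` class, the `"js"` and `"native"` step labels, the hint wording, and the threshold of three are all assumptions made up for the example. The guard only returns a hint string and never blocks a step, in keeping with "hints to keep focus, not to override judgment."

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FocusGuard:
    """Counts consecutive JavaScript steps and returns hints. Hypothetical helper."""

    tripwire_after: int = 3  # assumed value for "a few in a row"
    consecutive_js: int = 0

    def observe(self, step_kind: str) -> Optional[str]:
        """Record one agent step and return a hint for the agent, or None."""
        if step_kind != "js":
            # Any non-JS step (for example a native interaction) resets the streak.
            self.consecutive_js = 0
            return None

        self.consecutive_js += 1
        if self.consecutive_js >= self.tripwire_after:
            # Tripwire: a stronger hint once JS calls pile up.
            return (
                f"TRIPWIRE: {self.consecutive_js} JavaScript steps in a row. "
                "Restate the original goal and consider a native step."
            )
        # Nudge: a light reminder after every JS call below the threshold.
        return "NUDGE: after this JavaScript call, consider returning to native steps."
```

Feeding it the sequence `"js"`, `"js"`, `"js"` yields two nudges and then the tripwire hint; a single native step in between resets the count.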

agent-misbehavior observability

Geoff Cooney platform 2026-05-07

An adversarial QE sub-agent that polices code standards better than it enforces E2E proof

I've been testing an adversarial QE sub-agent on a branch. The design: structural separation from the implementer (no edit/write tools), and its only output is a verification plan plus an evidence-backed report. The tension I wanted: implementer wants to ship, QE wants proof.

What's actually showing up is different. The agent is much better at catching leftover ticket references in code, missing regression tests, and coverage gaps than it is at verifying actual E2E quality. Those overlap with code review more than QA: they are cheap, local checks the agent handles cleanly.

The expensive checks are the problem. The QE plan correctly identifies when a fix needs a live test against a deployed preview. The orchestrator routes around it, sometimes by opening an AskUserQuestion with options like "complete with offline checks only": technically allowed under "no override without approval," but shaped so the cheap option is the obvious answer. On one occasion an agent skipped the approval step entirely and just reasoned its way past a BLOCKED verdict: unit tests passing, offline checks green, live validation deferred to post-merge. Either way, the expensive check doesn't run.
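To make the skipped-approval case concrete, here is a rough sketch of "no override without approval" written as a mechanical check rather than something the orchestrator can reason about. Everything in it is hypothetical: `QEReport`, `Approval`, `may_complete`, and their fields are names invented for illustration, not part of the branch described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "PASS"
    BLOCKED = "BLOCKED"


@dataclass(frozen=True)
class QEReport:
    """Hypothetical shape of the QE sub-agent's report."""

    verdict: Verdict
    live_check_required: bool
    live_check_evidence: Optional[str] = None  # e.g. a reference to the preview run


@dataclass(frozen=True)
class Approval:
    """Hypothetical record of an explicit human decision."""

    approver: str
    waives_live_check: bool


def may_complete(report: QEReport, approval: Optional[Approval] = None) -> bool:
    """Decide whether the work may be marked complete.

    Unit-test and offline-check results are deliberately not inputs here:
    they cannot stand in for a required live check.
    """
    if report.live_check_required and report.live_check_evidence is None:
        # Missing live evidence can only be waived by an explicit approval.
        return approval is not None and approval.waives_live_check
    return report.verdict is Verdict.PASS
```

This only covers the case where the approval step is skipped. It does nothing about an approval question that is shaped so the cheap option looks like the obvious answer.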

So the easy half of QA is working. The expensive half is still getting negotiated away by the same shipping instinct the structure was supposed to counter.

agent-skills agent-misbehavior