Interface Built Right
Visual regression testing for Claude Code with memory for UI/UX preferences.
The Problem
Claude Code makes UI changes without knowing if they break existing designs. You ask it to add a filter dropdown and it accidentally shifts your hero section 40px down. Or it changes button padding across the entire app when you only wanted it changed in the modal. No visual feedback loop means iterating blindly, discovering regressions only when you manually check every page.
Why It Matters
Visual regressions erode user trust faster than functional bugs. A misaligned navbar looks unprofessional even if all features work. Manual visual QA doesn’t scale. You can’t check every page, every viewport, every state combination after each change. AI assistants need the same kind of test suite humans rely on, but optimized for their workflow.
What I Built

Interface Built Right captures visual baselines using Playwright, compares current state against baselines with Pixelmatch, and assigns verdicts: MATCH (identical), EXPECTED_CHANGE (intentional redesign), UNEXPECTED_CHANGE (regression), or LAYOUT_BROKEN (structural failure). The tool maintains a three-tier memory system: summary.json (hot cache, <2KB, always loaded), preferences/ (full UI preference details), and archive/ (evicted old data). The system keeps max 50 active preferences to minimize memory overhead.
Landmark detection extracts the accessibility tree during capture, identifying page intent: auth forms, data entry forms, list views, or dashboards. This semantic understanding helps distinguish “login page changed” from “dashboard changed” without hardcoded selectors.
Interactive browser sessions support multi-step flows. You can click elements, type into fields, scroll to positions, and capture screenshots at each step. A checkout flow might capture: cart page, shipping form, payment form, confirmation screen. Each step becomes a separate baseline with its own comparison logic.
Semantic Understanding
The .understand() method returns verdict, pageIntent, page state, availableActions, and detected issues. This gives Claude Code context beyond pixels. If a button moved 10px but is still clickable and accessible, verdict might be MATCH with a note about minor positioning change. If a form lost focus indicators, verdict becomes LAYOUT_BROKEN with accessibility violation flagged.
Memory and Eviction
Prompt hooks replaced command hooks in v0.4.5 for silent operation. Instead of injecting bash commands into Claude’s workflow (fragile), the system hooks into Claude’s prompt context. Before UI changes, it injects baseline screenshots and preference summaries directly into the conversation. Claude sees “current navbar is 64px tall, buttons use 12px padding” without manual retrieval.
The preference eviction policy prioritizes recently used items. If you have 50 active preferences and add a 51st, the least recently accessed preference moves to archive/. Archived preferences stay searchable but don’t load into hot memory. This keeps summary.json under 2KB even for large projects with hundreds of historical baselines.
Testing and Verification
The tool includes a verification workflow. After Claude makes a change, Interface Built Right runs comparison, shows pixel diff heatmap, and asks for verdict confirmation. If you mark a change as EXPECTED_CHANGE, the new screenshot becomes the baseline and preference memory updates with the reason for change. If you mark it UNEXPECTED_CHANGE, it logs a regression incident that feeds into claude-code-debugger.
Technical Decisions
Pixelmatch provides pixel-level comparison with configurable threshold (default 0.1). I tuned this threshold through testing. Too low (0.01) creates false positives from anti-aliasing differences. Too high (0.5) misses real regressions like 1px border removal. 0.1 catches visible changes while tolerating rendering engine variance.
Playwright runs in headed mode during interactive sessions for visibility, headless mode for automated comparisons. Screenshots use fullPage: true to capture scrollable content, but with a max height limit (8000px) to avoid memory issues on infinite scroll pages.
Zod schemas validate preference files and enforce schema versioning. When the preference format changes (adding a new field), the validator auto-migrates old preferences to new schema. This prevents corruption when the tool updates between Claude Code sessions.
The plugin interface exposes hooks for beforeChange, afterChange, and onRegression. External tools can subscribe to these events. A CI pipeline might fail the build on UNEXPECTED_CHANGE, while a Slack integration might post screenshots to a design review channel on EXPECTED_CHANGE.
Publishing and Integration
Published on npm v0.4.5 as a Claude Code plugin. The plugin registers itself in Claude’s tool registry, adding visual verification as a first-class capability. When Claude proposes UI changes, it automatically checks baselines first, shows potential impact, and asks for approval before proceeding. This turns visual regression testing from a post-hoc check into a live guard rail.