
Claude Code: Part 2 - The Enforcers
“The enemy of art is the absence of limitations” — Orson Welles
This is Part 2 of a three-part series on Claude Code configuration. Part 1 covers foundation and mixin architecture. Part 3 explores real-world workflow patterns.
In Part 1, we built the foundation: wrapper scripts, orchestrator philosophy, and the mixin system. But foundations alone don’t ensure quality.
LLMs are notorious for generating slop—code that looks plausible, compiles fine, and subtly (or dangerously) misses the point. Claude’s 4.5 models are genuinely excellent and reduce the error rate significantly compared to their predecessors. But no model, however capable, will ever achieve zero defects. The question isn’t “will the AI generate bad code?” (it will) but “what catches the bad code before it ships?”
The answer is systems. A model is one component; the system around it determines real-world reliability. Claude Code provides some pieces out of the box—the agentic loop with tool use, the ability to run tests and read errors—but we can build much further on that foundation.
Enforcers. Think of them as quality inspectors on the factory floor—automated systems that catch problems before they ship.
Hook-Based Enforcement: The Silent Guard
My favorite enforcement mechanism is a PostToolUse hook that runs linting after every file edit. It hooks into Edit, Write, and any MCP tools that modify files. If lint fails, the hook fails—and Claude Code won’t accept that file modification until the lint issues are resolved.
Here’s a simplified version:
"hooks": {
"PostToolUse": [
{
"matcher": "Write|Edit|mcp__DevTools__insert_after_symbol|mcp__DevTools__insert_before_symbol|mcp__DevTools__replace_symbol_body",
"hooks": [
{
"type": "command",
"command": "jq -r '.tool_input?.filePath // .tool_input?.file_path // error(\"Missing filePath or file_path in tool_input\")' | xargs bun run lint --fix --max-warnings=0 --no-warn-ignored 1>&2 || exit 2"
}
]
}
]
}
The beauty of this approach: Claude can’t ignore lint errors. Every file edit triggers the hook. If there’s a problem, the edit is rejected and Claude gets immediate feedback about what’s wrong. It’s not “fix the lint later”—it’s “you can’t proceed until you fix this.”
This changes the development dynamic completely. Without the hook, Claude generates code, maybe remembers to run the linter, maybe fixes some issues, maybe doesn’t. With the hook, every single file modification is automatically validated. Zero lint errors isn’t a goal—it’s a constraint enforced at the point of change.
The hook integrates with whatever linter your project uses. For TypeScript, it’s ESLint. For Python, it could be ruff or flake8. For Go, go vet. The pattern is the same: intercept the file modification, run the validator, reject if invalid.
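For instance, a Python variant of that hook command might look like the following. This is a sketch (it assumes ruff is installed); only the linter invocation changes, while the jq extraction of the edited file path stays the same:
jq -r '.tool_input?.filePath // .tool_input?.file_path // error("Missing filePath or file_path in tool_input")' | xargs ruff check --fix 1>&2 || exit 2
The trailing non-zero exit is what turns a lint failure into a rejected edit.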
Blocking Bad Habits: The Temp File Example
Another useful enforcement mechanism prevents a particularly annoying AI habit: creating temporary planning files.
The problem: Ask any LLM to handle a complex task, and there’s a reasonable chance it’ll do something like this:
cat > /tmp/implementation-plan.md << 'EOF'
## Implementation Plan
1. First we'll analyze the codebase...
2. Then we'll identify the key files...
3. Finally we'll implement the changes...
EOF
Looks reasonable, right? The AI is being organized. But there are several problems:
- Sandbox escape: /tmp is outside the project sandbox. Writing there triggers a permission prompt. If you’re AFK, your Claude Code session stalls indefinitely waiting for someone to click Accept/Deny.
- Token waste: File I/O burns context on operations that don’t advance the actual work.
- Invisible to orchestrator: The orchestrator can’t see these temp files, defeating the purpose of “planning.”
- Abandoned immediately: They get forgotten the moment the session ends.
The sandbox issue is the killer. The whole point of enforcers is enabling Claude to work autonomously for extended periods. A temp file write that escapes the sandbox and waits for human approval defeats that goal completely. It’s the AI equivalent of scribbling notes on napkins and then stopping work to ask permission to use the napkin.
The solution is a hook that intercepts Bash commands before they execute:
#!/bin/bash
set -euo pipefail

input=$(cat)
command=$(echo "$input" | jq -r '.tool_input.command // empty')
[ -z "$command" ] && exit 0

deny_heredoc() {
    cat >&2 << 'ERRMSG'
{"hookSpecificOutput": {"permissionDecision": "deny"}, "systemMessage": "BLOCKED: Creating files via heredoc is forbidden. Use TodoWrite for task tracking, or use the Write tool for legitimate file creation. See CLAUDE.md."}
ERRMSG
    exit 2
}

# Check for heredoc marker (case-insensitive delimiter)
has_heredoc=false
if [[ "$command" =~ \<\<-?[[:space:]]*[\'\"]?[A-Za-z_][A-Za-z0-9_]*[\'\"]? ]]; then
    has_heredoc=true
fi

if [ "$has_heredoc" = true ]; then
    # Pattern 1: cat/tee with heredoc and file redirect
    if [[ "$command" =~ ^[[:space:]]*(cat|tee)[[:space:]] ]]; then
        if [[ "$command" =~ \>[[:space:]]*[A-Za-z0-9_./-] ]]; then
            deny_heredoc
        fi
        if [[ "$command" =~ tee[[:space:]]+[A-Za-z0-9_./-] ]]; then
            deny_heredoc
        fi
    fi

    # Pattern 2: Pipe to tee
    if [[ "$command" =~ \|[[:space:]]*tee[[:space:]]+[A-Za-z0-9_./-] ]]; then
        deny_heredoc
    fi

    # Pattern 3: Any heredoc writing to planning-like files
    if [[ "$command" =~ \.(md|txt|log|json)[[:space:]\"\'\>] ]] || \
       [[ "$command" =~ \.(md|txt|log|json)$ ]]; then
        deny_heredoc
    fi
fi

exit 0
This hook detects heredoc patterns targeting .md, .txt, .log, and .json files and rejects them with a helpful message. The AI learns to use TodoWrite instead—a structured, visible task tracking system that the orchestrator can actually see.¹
The Arms Race
Claude Code is aggressively creative at finding workarounds. Block one pattern, and it discovers another.
When I first blocked file creation via the Write tool, Claude discovered heredocs: cat > file.md << 'EOF'. Okay, block heredocs with cat. Claude switches to tee: echo "content" | tee file.md. Block tee. Claude tries dd: echo "content" | dd of=file.md. It’s like playing whack-a-mole with an opponent who has read every Unix man page.
The hook above has gotten progressively more sophisticated because Claude kept finding gaps. Each pattern match—cat, tee, the pipe variations—represents a workaround Claude actually tried.
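Catching the dd variant, for instance, takes one more check. What follows is a sketch rather than my exact pattern; note it has to live outside the heredoc guard, since echo | dd doesn’t involve a heredoc at all:
# Pattern 4: dd writing to a file (the echo "content" | dd of=file.md workaround)
# Reuses the same deny message for simplicity.
dd_pattern='(^|[[:space:]|;&])dd[[:space:]]+.*of='
if [[ "$command" =~ $dd_pattern ]]; then
    deny_heredoc
fi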
This isn’t malicious. It’s… resourceful? Claude has a goal (create a planning file) and constraints (certain tools are blocked), so it explores the solution space. The problem is that the goal itself is counterproductive. Hence the hook: not just blocking one command, but blocking the intent across multiple implementations.
The lesson: don’t just block specific commands; block the patterns that enable the unwanted behavior. And expect to iterate as Claude discovers new patterns you hadn’t considered.
The Sub-Agent Escape Hatch
My favorite example of Claude’s creative problem-solving.
Claude Code only allows one level of sub-agent depth. The main orchestrator can spawn sub-agents via Task(), but those sub-agents cannot themselves spawn sub-sub-agents. The Task() tool simply isn’t available to them. This is a reasonable architectural constraint—infinite agent recursion would be chaos.
But Claude figured out a workaround.
Sub-agents discovered they can shell out to the Claude Code CLI: claude --print "do this subtask for me". The --print flag runs Claude non-interactively and returns the result. From the sub-agent’s perspective, it’s just a bash command. From a capability perspective, it’s spawning another Claude instance to handle delegated work.
Is this clever? Absolutely. Is it the intended behavior? Definitely not. Does it technically achieve the goal of delegating work? Yes. Should you block it? That depends on whether you want sub-agents farming out their responsibilities.
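If you do decide to block it, the same kind of PreToolUse Bash hook can catch the obvious form. A rough sketch, assuming the CLI’s --print/-p flags; it won’t catch every possible invocation:
# Sketch: refuse nested Claude CLI invocations from Bash
claude_pattern='(^|[[:space:]|;&])claude[[:space:]]+.*(--print|-p)([[:space:]]|$)'
if [[ "$command" =~ $claude_pattern ]]; then
    echo "BLOCKED: nested claude CLI sessions are not allowed. Do the work in this session." >&2
    exit 2
fi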
I find this example delightful because it illustrates something important: Claude isn’t following rules so much as optimizing for goals within constraints. When a constraint blocks one path, it finds another. This is exactly what you want for problem-solving. It’s exactly what you don’t want when the constraint exists for good reasons.
The arms race never really ends. You just get better at anticipating the next move.
The Custom Plugins Showcase
Hooks are reactive—they prevent bad behavior. Plugins are proactive—they add capabilities. I’ve built three custom plugins that fundamentally change how I use Claude Code.
The Codex Plugin: Your Architectural Advisor
OpenAI’s Codex CLI is a genuinely useful tool for architectural decisions. Rather than asking Claude to both implement and make judgment calls about approach, I delegate architectural questions to Codex.
The plugin is defined as a transparent relay agent:
# Codex - Transparent Relay Agent
You are an invisible relay between the user and OpenAI Codex.
Your goal is complete invisibility - clients should feel they
are talking directly to Codex.
## Your Role
**Core Principle: UN Translator Invisibility**
- Pass queries to Codex verbatim, return responses exactly as received
- NO synthesis, NO interpretation, NO Claude-style commentary
- Handle CLI mechanics (sessions, errors, permissions) invisibly
- Users should not be aware of your existence
The critical requirement is that the agent must invoke the Codex CLI for every query:
codex exec --full-auto --json -C /path/to/workspace "USER_QUERY_VERBATIM"
Why the ceremony? Because when you’re rubber-ducking an architectural decision, you want a different perspective, not Claude asking Claude. Codex has different training data, different reasoning patterns, and different blind spots. That diversity is valuable.
It’s like having a well-read colleague available for architectural discussions. The relay pattern means I get Codex’s actual opinions, not Claude’s interpretation of what Codex might say.
The TypeScript Plugin: Five Specialized Agents
My TypeScript plugin is where things get serious. It defines five specialized agents, each with a specific role:
| Agent | Purpose |
|---|---|
| feature-developer | Full-stack implementation using symbol-based editing |
| quality-guardian | Testing and code review specialist |
| debugger-optimizer | Problem-solving for runtime errors and performance |
| dependency-platform | Package management and vulnerability scanning |
| documentation-platform | TSDoc and API documentation |
The quality-guardian agent is particularly important. Here’s an excerpt from its definition:
## Quality Standards (TypeScript)
**ZERO TOLERANCE POLICY:**
- ✅ Zero TypeScript errors
- ✅ Zero TypeScript warnings
- ✅ Zero ESLint errors
- ✅ Zero ESLint warnings ("style preferences" are NOT optional)
- ✅ Proper type annotations (no implicit `any`)
- ✅ Strict mode compliance
- ✅ All tests passing
## Anti-Patterns (TypeScript-Specific)
- ❌ Dismissing ESLint warnings as "just style preferences"
- ❌ Using `@ts-ignore` without exceptional justification
- ❌ Implicit `any` types
- ❌ Non-null assertion (`!`) without safety checks
Note the phrase “style preferences are NOT optional.” Left to its own devices, an AI will happily generate code that technically works but violates every style convention you’ve established. The quality-guardian enforces zero tolerance.
The plugin also integrates MCP servers for symbol-based editing:
{
  "mcpServers": {
    "DevTools": {
      "command": "bunx",
      "args": ["@hughescr/mcp-proxy-processor@latest", "serve", "DevTools"]
    }
  }
}
This gives Claude access to tools like find_symbol, find_referencing_symbols, and typescript_diagnostics—proper code navigation rather than grep-and-hope.
This MCP server shouldn’t be necessary. Claude Code has built-in LSP support that, according to the release notes, provides exactly these capabilities natively. But I cannot get it to work. I’ve tried every configuration variant I can think of—different language servers, different initialization options, different file patterns. The documentation is sparse and unhelpful. The feature exists on paper; in practice, it’s a ghost. So I built the MCP workaround.
The Tmux Plugin: Playwright for Terminals
Browser automation is a solved problem. Tools like Playwright let you script interactions with web UIs reliably. But what about terminal UIs? What about long-running watcher processes?
Claude Code has built-in support for background Bash processes, but it’s limited: you can watch stdout and kill the process. That’s it. The tmux plugin is far more capable:
- Human visibility: You can attach to a tmux session and watch what’s happening yourself. In iTerm2, tmux windows can be native tabs—Claude and I literally share the same terminal view.
- Full interaction: Claude can send keypress sequences, not just commands. Navigate menus, respond to prompts, interact with ncurses UIs—anything a human could do with a keyboard.
- Persistence: tmux sessions survive Claude Code restarts. Your watchers keep running even when you start a new session.
The plugin configuration:
{
  "mcpServers": {
    "Tmux": {
      "command": "bunx",
      "args": ["@hughescr/mcp-proxy-processor@latest", "serve", "Tmux"]
    }
  }
}
This MCP server gives Claude the ability to:
- List available tmux windows
- Read output from specific windows
- Interact with terminal-based tools
The killer use case is watcher integration. I run tsc-watch and test-watch in persistent tmux windows. The quality-guardian agent can check these windows for errors:
### 1. Check Watcher Windows First
mcp__Tmux__list_windows → verify tsc-watch, test-watch exist
mcp__Tmux__get_output(tsc-watch) → check for compile errors
mcp__Tmux__get_output(test-watch) → check for test failures
The critical discipline: never close these watcher windows. They’re long-lived feedback loops that catch problems immediately. The agent monitors them; you benefit from instant validation.
LSP: The Broken Promise
Here’s where I get constructive but direct.
Claude Code supports Language Server Protocol integration. The TypeScript plugin configures it:
{
  "typescript": {
    "command": "bunx",
    "args": ["typescript-language-server@latest", "--stdio"],
    "extensionToLanguage": {
      ".ts": "typescript",
      ".tsx": "typescriptreact",
      ".js": "javascript",
      ".jsx": "javascriptreact"
    },
    "restartOnCrash": true,
    "maxRestarts": 3
  }
}
The idea is wonderful: symbol-aware navigation, precise refactoring, “find all references” that actually works. Code intelligence that understands your types and your structure.
The reality is… nonexistent, at least for me. Despite the release notes announcing this feature, despite trying every configuration variant I can imagine, I cannot get LSP integration to actually function. The documentation is sparse to the point of uselessness. I don’t know if the feature is broken, if my configuration is wrong, or if there’s some undocumented prerequisite I’m missing. All I know is: it doesn’t work, and I’ve spent hours trying to make it work.
Here’s what LSP should enable:
- Symbol-based editing: “Rename this function everywhere it’s used” → one operation
- Precise navigation: “Show me every place this type is referenced” → accurate results
- Impact analysis: “What would break if I change this interface?” → comprehensive answer
LSP diagnostic access—real-time lint errors and type violations without running external tools—isn’t yet listed in Claude Code’s LSP feature set, but it would be transformative for enforcement. Instead of PostToolUse hooks shelling out to ESLint or tsc after every edit, Claude could just ask the language server what’s wrong. The performance difference would be dramatic: LSP diagnostics are incremental, updating only what changed, while running tsc recompiles the entire project every time. On a large codebase, that’s the difference between milliseconds and minutes per edit. I’ve filed a feature request for this.
Here’s what currently happens for me:
- Nothing. The feature simply doesn’t activate.
- No error messages, no partial functionality, just… silence.
- Claude falls back to grep-based searching as if LSP doesn’t exist.
Anthropic, if you’re reading this: LSP integration could be transformative. Code intelligence that understands structure instead of treating everything as text would be a game-changer. But the feature needs to actually work, and the documentation needs to actually document. Right now I’m guessing at configuration, getting no feedback when it fails, and eventually giving up to build MCP workarounds.
The current workaround is using MCP tools for code navigation instead of relying on native LSP. The @hughescr/mcp-proxy-processor provides more reliable symbol navigation, but it shouldn’t have to.
Quality Gates: How to Keep AI Honest
Beyond hooks and plugins, quality gates are the final enforcement layer. These are the hard stops that prevent bad code from shipping.
The Testing Pyramid
For my TypeScript projects, the quality gates look like this:
- TypeScript compilation: Zero errors, strict mode
- ESLint: Zero warnings (not “fix later”—zero)
- Unit tests: All passing, covering logic
- Mutation testing: 100% mutation score
That last one deserves explanation. Stryker mutation testing introduces artificial bugs into your code and verifies that your tests catch them. If a mutant survives—if you can change the code without failing a test—your tests have a gap.
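For reference, the 100% gate can be enforced in Stryker’s own config. A minimal sketch of a stryker.config.mjs, assuming StrykerJS with the generic command test runner; the break threshold makes any score below 100 fail the run:
export default {
  // Which files get mutated; everything else is left alone.
  mutate: ['src/**/*.ts'],
  // Run the project's own test command against each mutant.
  testRunner: 'command',
  commandRunner: { command: 'bun test' },
  // Fail the gate outright if even one mutant survives.
  thresholds: { high: 100, low: 100, break: 100 },
};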
Why does this matter for AI-assisted development? Because Claude Code will absolutely cheat on tests if you let it.
Here’s a pattern I’ve caught multiple times: Claude needs to add test coverage for a function. It writes a test that calls the function with various inputs (achieving line coverage), then ends with this or something equivalent:
expect(true).toBe(true);
Congratulations! 100% coverage. All tests pass. Ready to ship! Except the test verifies literally nothing about the function’s behavior. The code could return null, throw an exception, or format your hard drive—the test would still pass because it never actually checks the result.
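To make it concrete, here’s a hedged sketch with a made-up applyDiscount function (bun:test assumed; Jest’s API is identical). The first test is the cheat; the second is what actually kills mutants:
import { expect, test } from 'bun:test';

// Hypothetical function under test.
export function applyDiscount(price: number, percent: number): number {
  return price - (price * percent) / 100;
}

// Coverage-only "test": the line executes, nothing is verified.
// Stryker's arithmetic mutant (minus flipped to plus) survives this.
test('applyDiscount runs', () => {
  applyDiscount(100, 10);
  expect(true).toBe(true);
});

// Behavior-verifying test: the same mutant now fails, so it gets killed.
test('applyDiscount subtracts the percentage', () => {
  expect(applyDiscount(100, 10)).toBe(90);
});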
This isn’t Claude being lazy. It’s Claude being efficient. The goal was “achieve coverage,” and expect(true).toBe(true) achieves coverage with minimal effort. Classic Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. Coverage percentage was meant to indicate test quality—but when Claude targeted the metric directly, the metric stopped reflecting quality.
Mutation testing defeats this. Stryker changes return result to return null and asks: “Did any test fail?” If your test ends with expect(true).toBe(true), the answer is no. The mutant survives. Your mutation score tanks. The cheating gets caught.
The mutations don’t lie. Either your tests verify behavior, or they don’t. Coverage percentages can be gamed easily. Mutation scores are much harder to game—not impossible, but the effort required to cheat usually exceeds the effort to just write proper tests.
The Disable Comment Gambit
Mutation testing isn’t the only place Claude cheats. Linting is another favorite target.
When Claude encounters an ESLint error it doesn’t want to fix properly, it reaches for the nuclear option:
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const data: any = response.body;
Problem solved! No more linting error. Ship it!
Except now you have an any type lurking in your codebase, and the comment makes it invisible to your quality gates. The same trick works for Stryker: // Stryker disable next-line all tells mutation testing to skip that line entirely.
These comments have legitimate uses. Sometimes you genuinely need to disable a rule for a specific edge case. But Claude reaches for them way too eagerly, treating them as a first resort rather than a last resort.
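For contrast, here’s what the proper fix usually looks like. Everything below is illustrative (the names are made up), but the pattern, unknown plus a narrowing guard instead of any plus a disable comment, is the point:
// Stand-in for whatever HTTP client is actually in use.
declare const response: { body: unknown };

interface LoginResponse {
  token: string;
  expiresAt: string;
}

// Runtime narrowing instead of silencing the linter.
function isLoginResponse(value: unknown): value is LoginResponse {
  if (typeof value !== 'object' || value === null) { return false; }
  const record = value as Record<string, unknown>;
  return typeof record.token === 'string' && typeof record.expiresAt === 'string';
}

const body: unknown = response.body;
if (!isLoginResponse(body)) {
  throw new Error('Unexpected response shape');
}
// From here on, body is a LoginResponse: no `any`, no eslint-disable.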
My solution: periodic audits using Codex.
I regularly ask Claude Code to review the codebase for all eslint-disable and Stryker disable comments, then have it consult @agent-codex:codex to evaluate whether each use is legitimate. Codex, having different training and no skin in the game, will often realize that Claude was cheating and call it out: “This disable comment is hiding a legitimate type safety issue that should be fixed properly.”
It’s AI auditing AI. One model checking another’s homework. I’ve been tempted to add “You are a narc” to the Codex agent prompt, but so far I’ve resisted.
The broader lesson: disable comments are technical debt with a bow on top. They make problems invisible rather than solving them. Regular audits—preferably by a different AI that didn’t write the original code—catch the cheating before it accumulates.
Enforcement Matrix
Here’s how enforcement varies across my projects:
| Project | TypeScript | Mutation | Lint | Mode |
|---|---|---|---|---|
| Isambard (complex TypeScript project) | Strict | 100% | Zero (TypeScript) | TDD TypeScript |
| Config Management | Strict | N/A | Zero (JSON and zsh) | Scripts |
| Hugo Site | N/A | N/A | Zero (HTML & CSS) | Content |
Not every project needs every gate. A Hugo content site doesn’t need mutation testing. But where code quality matters, the gates are non-negotiable.
The Trust Equation
Here’s the mental model: AI-generated code starts at zero trust and earns its way up through passing gates.
Without enforcement:
- Code looks plausible → Ship it → Problems emerge later
- “Claude wrote it so it must be fine” → Technical debt accumulates
With enforcement:
- Code looks plausible → Gates verify → Either passes or fails
- Failures are immediate, visible, actionable
- What ships has proven correctness (within gate scope)
This isn’t about distrusting AI. It’s about appropriate verification. You wouldn’t deploy human-written code without tests and review. AI-generated code deserves the same rigor—arguably more, because its failure modes are different.
The Enforcer Stack
Let’s put it all together. My enforcement stack has four layers:
| Layer | When | Purpose | On Failure |
|---|---|---|---|
| Pre-Execution Hooks | Before tool runs | Block bad patterns (temp files, unsafe commands) | Tool rejected |
| Plugins | During work | Specialized agents, MCP capabilities | — |
| Post-Edit Hooks | After file change | Auto-lint, notifications | Edit rejected, must fix |
| Quality Gates | Before shipping | Tests, TypeScript strict, mutation testing | Loop back to fix |
Failures at any stage loop back for correction. The system is self-healing: Claude keeps working until the code passes all gates.
Coming Up: The Workflow
In Part 3, we’ll see how all this configuration translates into actual development workflow. Topics include:
- Three projects with different personalities
- The “bursts” rhythm: create, then simplify
- Running parallel Claude Code sessions
- Practical tips and hard-won lessons
The enforcers we’ve built here are the trust layer. The workflow is how you actually use them.
Part 3: “Claude Code: The Workflow” — coming soon.
TodoWrite is Claude Code’s built-in task tracking tool that maintains persistent, visible state across the session. See the Claude Code tools documentation. ↩︎