How I Use AI To Review AI Code
How to write better code when using AI agents
We're offloading more and more of our coding to AI agents. But AI-generated code has more bugs, security issues, and logic errors than human-written code — and we're generating it faster than any team can review it.
The answer isn’t to skip review. It’s to automate parts of it so humans only spend time on the things that actually require human judgement.
Here’s the four-layer setup I use. Each layer filters out a category of problems so the next layer sees less noise.
1. Automated checks: run your linter, tests, and security scanner before the agent can finish.
2. Local AI review: get a second agent to review the code before you push.
3. CI review: run AI code review automatically on every PR. This is the safety net for when you skip step two (it happens).
4. Human review: whatever remains, such as architecture, business logic, and “should we even build this?”
By the time a human looks at the code, the only things remaining are the things only a human can judge.
Here’s the setup.
Layer 1: Automate The Obvious
Claude Code has a feature called hooks. A hook is a shell script that runs automatically at certain points in the agent lifecycle (like when the agent finishes a task). If the script fails, the agent is blocked from completing and has to fix the issues first.
I use a Stop hook that runs my linter and scanner every time Claude finishes work.
The config goes in your Claude Code settings (project-level .claude/settings.json, or ~/.claude/settings.json for all projects):
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": ".claude/hooks/stop-checks.sh"
}
]
}
]
}
}
The script itself is just whatever checks you already run:
#!/bin/bash
set -e
rubocop .
brakeman -q
bundle exec rspec --fail-fast

Swap those for whatever your project uses. Ruff and pytest for Python. ESLint for JavaScript. The point is the same: the agent can’t say “done” until these pass.
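One refinement worth considering: with `set -e`, the agent only ever sees the first failure. A variant that runs every check, collects the failures, and reports them all on stderr gives the agent more to work with in one pass. This is a sketch, not my exact script: the `run_checks` helper name is mine, and it relies on Claude Code’s documented hook convention that exit code 2 blocks the agent and feeds stderr back to it (check the hooks docs for your version).

```shell
#!/bin/bash
# Run every check, collect the failures, and report them all at once
# instead of stopping at the first one.
run_checks() {
  local failed=()
  for cmd in "$@"; do
    # Run each check quietly; remember the ones that fail.
    if ! $cmd > /dev/null 2>&1; then
      failed+=("$cmd")
    fi
  done
  if [ ${#failed[@]} -gt 0 ]; then
    # Stderr is what the agent sees when the hook blocks it.
    echo "Blocked: failing checks: ${failed[*]}" >&2
    return 2
  fi
}

# In the real hook, exit with the helper's status so any failure blocks:
# run_checks "rubocop ." "brakeman -q" "bundle exec rspec --fail-fast"
# exit $?
```

The trade-off is speed: running everything takes longer than failing fast, but the agent fixes more per round trip.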
This alone catches a surprising amount. Formatting issues, unused imports, type errors, broken tests. None of that makes it into a review.
Layer 2: Agent Review
After automated checks pass, review the code yourself and get an AI second opinion before you push.
Two things matter here. First, actually run the code. This sounds obvious but it catches the most embarrassing bugs in two minutes. Second, read the diff. You don’t need to understand every line; understand the shape of the change. What files were touched? Does the scope match what you asked for? Did the agent silently change something you didn’t ask it to?
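For reading the shape of a diff, two git commands do most of the work. The snippet below is illustrative: it builds a throwaway repo (with a hypothetical handler.rb) so it runs self-contained, but on a real branch you would only run the two diff commands against your base branch.

```shell
# Build a throwaway repo so the diff commands have something to show.
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "base"
git checkout -q -b feature
echo "puts 1" > handler.rb
git add handler.rb
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add handler"

# The two commands worth running on any agent-written branch:
git diff --stat main...        # shape of the change: files and line counts
git diff --name-only main...   # just the files touched
```

If the file list is longer than the task you asked for, that is the first question for the agent.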
For the AI review, the key is a fresh context window. Don’t ask the same agent that wrote the code to review it. It has sunk-cost bias and is less likely to challenge its own decisions.
There are a few ways to do this:
Custom Claude Code command. A review prompt in .claude/commands/review.md, paired with a REVIEW file at the project root that encodes your project-specific rules. Portable across tools, fully customisable. Claude Code also ships with some built-in plugins.
Codex /review. Four presets covering every scenario (base branch, uncommitted changes, specific commit, custom instructions). Priority-ranked findings. The best local review UX I’ve seen. Bonus: writing with Claude and reviewing with Codex means cross-model review built into your workflow. Different models have different blind spots.
CodeRabbit. /coderabbit:review locally. 40+ linters and scanners running behind the scenes, purpose-built for code review. There are many other great code review tools, like Greptile, worth exploring too.
I use a custom review command that reads a REVIEW file at the project root. This file has project-specific rules, things I always want checked.
# REVIEW.md
## Project Patterns
- Repository pattern for data access. Direct DB queries in handlers are a flag.
- New API routes need an integration test. Flag if missing.

The general review catches general problems. The project-specific rules catch the things that are unique to your codebase.
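The command file itself can stay short and point at the REVIEW file. This is a minimal sketch of what .claude/commands/review.md might contain, not my exact prompt:

```markdown
Review the uncommitted changes in this repository.

1. Read REVIEW.md at the project root and apply every rule in it.
2. Check the diff for bugs, security issues, and missing tests.
3. Rank findings by severity and suggest a concrete fix for each.

Do not rubber-stamp: if you find nothing, say what you checked.
```

Keeping the generic instructions in the command and the codebase rules in REVIEW.md means the command travels between projects unchanged.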
Layer 3: External Review
Sometimes I forget to run the local review. Sometimes I’m in a rush. So I have an automated check on GitHub that reviews every PR before a human sees it.
There are a few options for this. Codex has a GitHub integration that reviews PRs automatically. CodeRabbit has a GitHub App that does the same thing. Anthropic has an open source GitHub Action for security-focused review.
I like having this as a separate layer because it catches things even when I skip the local step. Set it up once and it runs on every PR with no further effort.
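As a sketch of what this layer can look like, here is a minimal GitHub Actions workflow wired to Anthropic’s open source security-review action. The action path and input names here are from memory and may have changed, so treat this as a starting point and check the action’s README before copying it.

```yaml
name: AI code review
on:
  pull_request:

jobs:
  security-review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write   # so the action can comment on the PR
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-security-review@main
        with:
          claude-api-key: ${{ secrets.CLAUDE_API_KEY }}
```

The Codex and CodeRabbit GitHub Apps need no workflow file at all; you install them from the marketplace and configure them per repo.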
Layer 4: Human Review
By the time a teammate opens the PR, the linter has passed, tests are green, and an AI has already flagged obvious issues. The human reviewer doesn’t need to catch formatting problems or unused variables.
What’s left is the stuff only a human can judge. Is this the right approach? Does it solve the actual business problem? Will this cause issues in three months? Five minutes of focused review on those questions is more valuable than thirty minutes of line-by-line reading.
TL;DR
I spent a long time trying to find the perfect code review setup. The experience was frustrating. There are hundreds of tools, plugins, and approaches, many of them doing the same thing in slightly different ways.
Don’t get lost looking for the perfect solution or perfect prompt. Start with Layer 1. Set up your linter and your hooks; that alone eliminates an entire category of review noise. Then find one way to get an AI review locally that you trust. Add CI when you’re ready.
Start simple and never accept the first output from an agent.
If you’re interested in this topic: I also made a video walking through the full setup with demos if you prefer to watch.

