
How we use AI to write and review code without losing ownership

The workflow. The guardrails. The moments it goes wrong.

The conversation about AI and code has two camps: the optimists who think humans will soon be reviewing code rather than writing it, and the sceptics who've seen too many confident wrong answers to trust the output.

After two years of systematic AI-assisted development, my position is: both camps are partially right, which means neither is useful as a guide.

Here's what we actually do.

How we write code

In-editor, Copilot handles completion. It's fast, it's context-aware within the file, and it handles the mechanical parts of implementation well: type-consistent returns, standard library usage, boilerplate for common patterns. I've stopped thinking of this as AI and started thinking of it as a better autocomplete.

For larger generation tasks - implement this module given this interface, write tests for this service, convert this pseudocode - we use Claude directly. The workflow is: write the spec (sometimes AI-assisted, see below), give the model the spec plus the relevant context from the codebase, review and edit the output.
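The "spec plus relevant context" step can be sketched as a small prompt-assembly helper. This is a hypothetical illustration, not our actual tooling: the function name, section headings, and the `context_files` shape are all made up for the example.

```python
def build_generation_prompt(spec: str, context_files: dict[str, str]) -> str:
    """Assemble a spec and relevant codebase excerpts into one prompt.

    Hypothetical sketch: the real workflow is manual-plus-editor, but the
    shape is the same - spec first, then the context the model needs to
    match existing conventions.
    """
    parts = ["## Spec", spec, "## Relevant context from the codebase"]
    for path, source in context_files.items():
        parts.append(f"### {path}\n{source}")
    parts.append("Implement the spec. Match the conventions shown in the context files.")
    return "\n\n".join(parts)
```

The point of keeping the spec and the context as separate labelled sections is that the reviewer can later check the output against the same spec the model saw.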

The key discipline: we treat the model as a junior engineer who writes quickly and needs thorough review. We don't treat it as a senior engineer whose output we can skim.

The review side

Automated review runs on every PR before human review. The agent checks for a fixed set of mechanical issues. We've tuned this list over eighteen months. The current list catches things that come up repeatedly and are fast to fix: missing error handling on external calls, inconsistent return types, input validation gaps.
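To make "mechanical issue" concrete, here is what one check from that list looks like if you express it as a plain static lint rather than an agent prompt. This is purely illustrative: our check is a prompt to the review agent, not an AST pass, and the external-client prefixes below are invented for the example.

```python
import ast

# Invented for illustration: prefixes we treat as "external calls".
EXTERNAL_PREFIXES = ("requests.", "http_client.")


def unguarded_external_calls(source: str) -> list[int]:
    """Return line numbers of external calls not wrapped in a try block.

    A toy version of the "missing error handling on external calls"
    check - the real version is a model prompt, not a lint.
    """
    tree = ast.parse(source)

    # Collect line numbers of everything inside a try body.
    guarded: set[int] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Try):
            for stmt in node.body:
                for child in ast.walk(stmt):
                    if hasattr(child, "lineno"):
                        guarded.add(child.lineno)

    # Flag external calls whose line is not guarded.
    flagged: list[int] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = ast.unparse(node.func)
            if name.startswith(EXTERNAL_PREFIXES) and node.lineno not in guarded:
                flagged.append(node.lineno)
    return flagged
```

Checks like this earn their place on the list precisely because they are cheap to state, cheap to fix, and come up on PR after PR.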

Human review still happens on every PR. The agent doesn't replace it. What it does is let the human reviewer focus on design, intent, and domain-specific correctness rather than catching the same class of mechanical issues repeatedly.

Where it goes wrong

Confident incorrectness on domain-specific logic. The model doesn't know our system the way we do. When we give it a task that requires understanding how two distant parts of the system interact, it produces code that looks correct and is subtly wrong. The review catches this, but only if the reviewer knows the system well enough to spot it.

This is the loss-of-ownership failure mode: if the human reviewer doesn't understand the code they're reviewing, the guardrail fails. We've invested in code ownership - every module has a clear owner who reviews changes to it - partly because of this risk.

Prompt drift. The prompts we use for automated review were written twelve months ago. The codebase has changed. Some of the checks are now less relevant. Some gaps have opened. We've done one pass of prompt maintenance; we should do it more often.

The guardrails that matter

No automated commits. No automated PR creation. The engineer writes the commit message, reviews the diff, and decides what ships. This is non-negotiable.

Module ownership. Every significant module has a named owner. That person reviews all changes to their module, AI-generated changes included.

Regular review of automated suggestions. We track what percentage of the automated review comments engineers mark as useful. If it drops below 50%, the prompt needs revision. Currently it's around 65%.
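The threshold logic is trivial, but writing it down makes the contract explicit. A minimal sketch, with made-up names - the real tracking lives in our PR tooling:

```python
# Threshold from the guardrail above: below 50% useful, revise the prompt.
REVISION_THRESHOLD = 0.50


def usefulness_rate(marks: list[bool]) -> float:
    """Fraction of automated review comments engineers marked useful."""
    return sum(marks) / len(marks) if marks else 0.0


def needs_prompt_revision(marks: list[bool]) -> bool:
    """True when the usefulness rate has fallen below the threshold."""
    return usefulness_rate(marks) < REVISION_THRESHOLD
```

The value of the explicit threshold is that prompt maintenance stops being a judgment call and becomes a tripwire: the number drops, someone is on the hook to revise.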

On maintaining understanding

The fear I hear most from engineering leaders: developers will start shipping code they don't understand because the AI wrote it.

This is a real risk. The mitigation is not to avoid AI-assisted development. It's to require understanding as a condition of merging. In practice: if you can't explain what a piece of code does and why it's correct, don't merge it. This standard applies to AI-generated code and human-written code.

The teams that lose ownership are the ones that outsource the understanding, not just the typing. The teams that maintain ownership treat the AI as a tool that produces output they own.

With gusto, Fatih.