What scales without headcount

The architecture decisions, culture choices, and AI tooling that let a small team carry a large product.

We've grown the product significantly over the past year without growing the team. This comes up in every conversation I have with other engineering leaders, so here's the actual accounting.

Architecture that scales itself

Cloud Run scales to zero and scales to N. We don't manage instances. We don't pre-provision capacity. We write stateless services, deploy them, and the infrastructure handles the rest.

The cost of this decision is real: cold starts add latency on the first request after idle. For our use cases - CV inference and document processing - we've tuned minimum instance counts for the latency-sensitive services. The non-latency-sensitive services scale to zero.
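As a sketch of that tuning (the service names and region here are placeholders, not our actual setup; the `--min-instances` flag is a real gcloud option):

```shell
# Keep one warm instance for a latency-sensitive inference service,
# trading a small fixed cost for no cold start on the first request.
gcloud run services update cv-inference \
  --region=europe-west1 \
  --min-instances=1

# Let a batch document-processing service scale all the way to zero.
gcloud run services update doc-processing \
  --region=europe-west1 \
  --min-instances=0
```

The same split drives the cost model: you pay continuously only for the services where the first-request latency actually matters.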

The alternative is a Kubernetes cluster that someone manages. We don't have the headcount for that. We don't need it for our traffic patterns.

Automation that replaces process

Every repetitive operational task that a human was doing manually is now automated. Not "we should automate this" - actually automated.

Specifically: deployment notifications (Slack message when a new revision goes live), weekly infrastructure cost reports (Cloud Billing export into a BigQuery view, weekly Slack digest), alert enrichment (automated context gathering attached to incident alerts). None of these is technically impressive. All of them are things an engineer was doing manually before.
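To give a flavour of how small these automations are, here is a minimal sketch of the first two. Everything named here is a placeholder (webhook URL, service name, billing dataset); the Slack incoming-webhook payload shape and the billing export columns are the standard ones.

```shell
# Deployment notification: look up the latest live revision and post it
# to a Slack incoming webhook.
REVISION=$(gcloud run services describe api \
  --region=europe-west1 \
  --format='value(status.latestReadyRevisionName)')
curl -s -X POST "$SLACK_WEBHOOK_URL" \
  -H 'Content-Type: application/json' \
  -d "{\"text\": \"New revision live: ${REVISION}\"}"

# Weekly cost report: sum the last seven days of spend per service
# from the Cloud Billing export in BigQuery.
bq query --use_legacy_sql=false '
  SELECT service.description AS service, ROUND(SUM(cost), 2) AS cost_usd
  FROM `my_dataset.gcp_billing_export_v1_XXXXXX`
  WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  GROUP BY service
  ORDER BY cost_usd DESC'
```

Wire either one into a scheduled job and a Slack message and you've replaced a recurring human task with a dozen lines of glue.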

The discipline required is treating a manual operational task the same way you'd treat a production bug: something to be fixed, not something to be routinized.

AI tooling that multiplies individual output

The honest number: each engineer on this team is producing roughly 30-40% more output than they would have two years ago, on similar work, using AI tooling. This isn't a guess - it comes from tracking velocity metrics before and after we systematized our AI usage.

The majority of that gain comes from two things: faster first drafts on any written artifact (specs, docs, postmortems, PR descriptions), and in-editor code completion that handles the mechanical parts of implementation.

The ceiling on this gain is judgment, not throughput. Writing faster doesn't help if you're writing the wrong thing. The teams that get the most out of AI tooling are the ones where the engineers already have good judgment about what to build. The tools amplify the signal, not just the volume.

Culture that doesn't require management overhead

We do weekly async written status updates. Every engineer writes one. They're not performance reviews. They're operational instruments: what shipped, what's blocked, what needs a decision.

The value is that I don't spend management time extracting status. I spend it on the decisions that the status updates surface. The two are different activities and they shouldn't be conflated.

This only works if people write honest status updates. Creating the conditions for that is a culture problem, not a process problem.

What doesn't scale without headcount

Customer relationships. We have enterprise clients with real expectations and escalation paths. Managing those relationships takes human time, and that time scales with the number of clients. AI doesn't change this.

Complex debugging. When something breaks in a non-obvious way, an experienced engineer needs to trace through the system, form hypotheses, test them. This is not a task I've found reliable automation for. Good engineers debug faster than bad engineers, and there's no shortcut.

Strategic decisions. What to build next, which clients to prioritise, how to position the product. These require judgment about things that aren't in the data.

The honest summary

A small team can carry a large product if the architecture is simple enough to operate without headcount, the operational work is automated aggressively, and the AI tooling is used systematically rather than sporadically.

What you can't substitute for is having people who are good enough to make the judgment calls correctly. That's the actual constraint, and it doesn't scale with tooling.

With gusto, Fatih.