Shahzad Bhatti Welcome to my ramblings and rants!

May 26, 2026

The Complexity Trap: Why Simple, Bug-Free Systems Can Hurt Your Career

Filed under: Computing — admin @ 10:06 pm

I have worked for both large tech companies and startups. Two patterns kept showing up at every company I worked at, startup and large company alike, and both punish the engineers doing the right thing.

At startups, the pressure is entirely on shipping features. Engineers who move fast and ship constantly get rewarded. Security, observability, and scalability become “future problems.” The engineers who slow down to build things properly, who push back on cutting corners, get treated as obstacles. The corners get cut anyway. When the system eventually breaks under load or gets breached, nobody connects it back to the decisions made two years earlier. The engineers who raised concerns are long gone or drowned out.

At large companies, the trap is different. Ship something clean, with a simple design, a solid implementation, and few follow-up bugs, and people move on. Nobody notices the problems that didn’t happen. Nobody gets promoted for the outages that never occurred. But ship something overengineered, watch it fall apart in production, spend months firefighting, and suddenly you’re a hero. The tech lead who pushed patches at 2am gets noticed. Management reads the complexity as evidence of a hard problem solved. The tech lead gets promoted and moves to the next team. The engineers left behind inherit the mess.

Same outcome, different path. In both cases, the engineers who built things well are invisible. The ones who created the problems or thrived on them get ahead.


Essential vs. Accidental Complexity

In “No Silver Bullet,” an essay later reprinted in the anniversary edition of The Mythical Man-Month, Fred Brooks distinguished two kinds of complexity. Essential complexity is the irreducible difficulty built into the problem domain itself. Accidental complexity is the difficulty we add through poor abstractions, unnecessary coupling, and artificial layers. Larry Tesler’s Law of Conservation of Complexity says essential complexity can’t be eliminated, only moved. Push it out of the user interface and it lands in your middleware.

What most companies reward is the accidental kind. Many moving parts, multiple failure modes, a fleet of services with their own deployment pipelines: these look like a hard problem solved by smart engineers. A system that just works, simply and reliably, signals nothing. The people who built it must have been working on something easy. I saw this repeatedly at larger companies. Senior engineers with years of incremental, principled improvements couldn’t get promoted because their work wasn’t considered “complex enough.” The implicit rule was clear: elegance doesn’t get you promoted.


War Stories

The database migration that became a platform. At a large tech company, we needed a simple migration from one database to another, but it turned into a real-time data synchronization system. Suddenly there were shadow testing components, reconciliation pipelines, anti-entropy jobs for fixing discrepancies, and runbooks for each failure mode. The project stretched from months into years. The original problem, moving data from A to B, never required any of it. But the complexity generated headcount, resources, and career advancement that a clean migration would never have produced.

The microservices migration that never finished. A monolith-to-microservices transition ran so long the team ended up maintaining both systems simultaneously. The migration date kept slipping. Nobody could tell you which services were fully cut over. The codebase became a graveyard of abandoned halfway points. Years of engineering time consumed, several promotions justified. The engineers who eventually inherited it had no idea what was intentional and what was just never cleaned up.

The Erlang rewrite. At a FinTech company, a senior executive decided to rewrite an order management system from Java to Erlang, not for a specific technical reason, but because Erlang was interesting. Brooks called this the second-system effect: when engineers rewrite something they think they now understand, they pile in everything they held back the first time. The effort was far larger than anyone expected. Management abandoned it partway through. The team was left with two halves of the same system in two different languages, with domain knowledge split across both.

The Go rewrite. Years later, the same executive decided to rewrite a Java financial system in Go because Go was what the industry was talking about. Years passed and the migration stalled, with some parts in Go and most still in Java. The team gave up. Meanwhile, the actual urgent problems, such as data consistency, observability, and performance at scale, went unaddressed because everyone’s attention was on the rewrite. Nobody owned the full picture of dependencies or understood the consistency guarantees. Sales kept selling the system as low-latency with four-nines availability, but those claims rested on an illusion: observability was too poor for anyone to verify them.
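For a sense of scale, here is a minimal sketch of what a four-nines claim commits you to. It is my own illustration, not tooling from that company; the function names and the probe data are hypothetical, and it assumes availability is estimated from periodic health probes. The point is that without measurement, the number is unknown rather than good.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1.0 - availability)


def measured_availability(probe_results: list[bool]) -> float:
    """Fraction of health probes that succeeded (hypothetical probe data)."""
    if not probe_results:
        # No observability means the figure is unknown, not 100%.
        raise ValueError("no probe data: availability is unknown")
    return sum(probe_results) / len(probe_results)


budget = downtime_budget_minutes(0.9999)
print(f"four nines allows about {budget:.1f} minutes of downtime per year")
```

Four nines leaves roughly 52.6 minutes of downtime a year. A single bad deployment can spend that budget, and a system with no probes cannot say whether it did.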

The postscript at that second company: when AI became the new shiny thing, the pattern played out again. Engineers who built flashy demos got promoted. The people fixing real infrastructure problems had nothing visible to show.


Conceptual Integrity Breaks Down as Organizations Grow

In the original Mythical Man-Month, Brooks argued that the most important property a system can have is conceptual integrity: one coherent design philosophy, with someone who holds the whole system in mind and says no to things that don’t fit. His prescription was a chief architect with real authority over what goes in and what stays out. That works when one person can still comprehend the system. As organizations grow and systems get divided among teams, nobody has that view anymore. Each team makes locally reasonable decisions. Accidental complexity accumulates not from individual mistakes but from the disconnect between groups who can’t see each other’s work.

Cross-cutting concerns like security, authentication, and observability are where this gets dangerous fastest. I saw one system where authentication behaved differently depending on whether you were on-premises or in the cloud, and whether you were hitting the control plane or the data plane. Secrets in some places, JWTs in others, config files in some environments, environment variables in others, and a wall of conditional logic tying it together. No single person understood the whole thing. That mess led to a significant security breach and customer churn. Nobody designed it. It grew, one locally reasonable decision at a time.


Two Different Failure Modes

Startups and large companies both get this wrong, but for opposite reasons.

Startups are under pressure to ship customer-facing features. Security, observability, performance, and operational burden become “future problems.” Sometimes that’s the right call. A startup that dies building the perfect architecture ships nothing. But the technical debt from ignored non-functional requirements doesn’t disappear. It accumulates, and it usually comes due all at once, right when the company is trying to scale. That’s the worst possible time to deal with it.

Large companies have the opposite problem. The incentive structure rewards visible complexity. Tech leads propose ambitious architectures, staff up around them, ship something complicated, and move to the next team before the consequences mature. The engineers who inherit the system didn’t choose the design, can’t fully explain it, and can’t safely simplify it because they don’t understand what each piece is actually doing.

In both cases, the people who make the architectural decisions aren’t around to live with them. That gap between decision and consequence is the core of the problem.


The Goldilocks Principle

The approach that actually works is simpler than it sounds: start with the least complex architecture that handles the real requirements. Add complexity only when something forces you to.

This is not simplicity for its own sake: if the domain genuinely requires distributed coordination, the design should say so. But the default should be to prove the complexity is necessary before building it. “This is how I’ve seen it done at bigger companies” and “this technology is interesting” are not justifications. Neither is designing for scale you don’t have. I’ve watched teams build for ten million users when they had ten thousand, then spend two years maintaining infrastructure that served no real requirement.

Vertical slices enforce this discipline. When you ship thin, end-to-end cuts of real functionality that a user can actually touch, you find out fast whether your design is right. The feedback loop is short. A wrong assumption costs a week, not six months. You can correct course before the mistake becomes load-bearing.


AI Accelerates This Problem

With tools like Claude Code and Cursor, the implementation bottleneck is largely gone. A team using AI assistants can build a distributed system with five services in the time it used to take to build one. That’s progress if the design is right. If the incentive structure still rewards accidental complexity, AI just produces it faster.

In When Copying Kills Innovation: My Journey Through Software’s Cargo Cult Problem, I described cargo-cult behavior such as adding components because they look sophisticated. That behavior happens at higher velocity now. An AI agent given a vague prompt and no design constraints defaults to patterns common in its training data. That means microservices when a monolith would do, event buses when a direct call would do, five abstractions where two would do.

As I wrote in AI Writes Code. You Own the Design., the thinking parts, the what and the why, can’t be delegated to an agent. AI handles the how. Engineers who can identify essential complexity, strip the accidental kind, and hold a design together are more valuable now than before. But only if the organization’s reward structure reflects that.


How Do You Fix the Reward Structure?

I don’t have a clean answer. But here’s where the levers are.

  • Reward outcomes, not artifacts. Most promotion processes credit visible artifacts: the design doc for a complex system, the heroic incident response, the fleet of services owned. The outcomes that actually matter (a system that stayed up for two years, a migration that finished in six weeks, a design that five new engineers understood on day one) are harder to see and usually go uncredited. Engineering leaders have to explicitly define what good engineering looks like and measure it over time horizons long enough to see consequences.
  • Make accountability follow decisions. Connect tech leads to the consequences of their architectural choices twelve to eighteen months later. Not as punishment, since designs sometimes fail for unforeseeable reasons. But an engineer who never sees what their decisions cost never updates their model. Right now the feedback loop doesn’t exist for most people who make these calls.
  • Credit the “no.” The engineers who prevent bad architectures from being built are the hardest to recognize. The bad system was never built, so there’s nothing to point to. If you want more of this behavior, name it explicitly and credit it explicitly. Otherwise the rational move for any ambitious engineer is to propose the complex thing and let someone else clean it up.
  • Add a simplicity lens to design reviews. Most design reviews ask: will this work? Fewer ask: is this more complex than it needs to be? Formally asking “what could we remove without losing essential functionality?” changes the conversation. The burden of proof falls on adding a component, not on removing one.

The Conversation Worth Having

Brooks wrote that conceptual integrity is the most important consideration in system design. What the book doesn’t address is that most organizations are structured to undermine it: they reward the engineers who add complexity and move them on before they face the consequences. The engineers who hold the line against unnecessary moving parts, who ship systems that work quietly for years, who say “we don’t need this” and mean it, are doing some of the hardest work in software. In most companies, they’re not the ones getting promoted.

With AI accelerating the implementation layer, the judgment required to distinguish essential from accidental complexity matters more than it ever has. If the reward structure doesn’t change to reflect that, we’ll just build the wrong things faster.

