Modernizing Outlook Mail Services: A Story of Scale

When people hear “mail service,” they usually picture a boring backend. Outlook Mail Services killed that mental model for me. The system I work on sits in the path of roughly 6 billion Outlook REST requests per day, serves 100+ clients, and exposes 139 endpoints. At that scale, every boundary becomes an operational contract and every “small” change has a real blast radius.

A lot of my work has been tied to the shift from Model A to Model B2. That label sounds harmless, almost like a version bump. It was not. It was a long, careful migration away from a co-located monolith toward a more deliberate service topology, and it taught me more about production engineering than any greenfield project could have.

What Model A actually felt like

Model A was battle-tested engineering. Compute and behavior lived together inside a large, tightly coupled unit, which meant predictable call paths, fewer hops, and fewer ownership boundaries when something went wrong.

But the tradeoff was hidden dependency debt. Teams could move quickly inside the monolith because a lot of assumptions were implicit. Serialization details, MIME handling edge cases, mailbox semantics, retry behavior, and internal contracts all evolved together. That works until you want to move one piece without moving the whole organism.

In rough terms, Model A looked like this:

Client -> Front Door -> OWS / co-located service stack -> mailbox logic, MIME, body transforms, attachments

That path was efficient, but it was also sticky. If one subsystem wanted to evolve independently, it had to drag a lot of context with it.

What Model B2 changed

Model B2 was much more intentional about service isolation. Instead of one giant co-located world, we moved toward clique-based microservices where related capabilities could be grouped, owned, scaled, and inverted independently. The detail that really matters is that B2 did not try to boil the ocean. The new model was designed to take on roughly 20% of the fleet footprint first, prove itself safely, and then earn more traffic over time.

The architecture was closer to this:

Client -> Front Door -> traffic split layer -> OWS endpoint -> B2 clique service -> specialized dependency chain

That extra structure bought us leverage: isolated changes, measurable behavior, and capability-by-capability migration. But B2 also introduced the distributed-systems tax: more contracts, more observability needs, and more ways for partial migration to fail in strange ways.

Owning the traffic splitter changed how I think

The part of this journey that became very personal for me was traffic splitting. I became the sole owner of the OWA traffic splitter for all 213 OWS endpoints, a system handling roughly 350 million requests per hour. That number still resets my sense of scale a little bit. At 350 million requests an hour, even a tiny percentage regression is not a tiny regression. It is a customer incident waiting to happen.
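
To make that concrete, here is a quick back-of-the-envelope calculation. The 350 million requests per hour figure is the one quoted above; the regression rates are illustrative assumptions, not measured values.

```typescript
// Traffic volume quoted in the text above.
const REQUESTS_PER_HOUR = 350_000_000;

// Average requests per second, assuming traffic were spread evenly across the hour.
const requestsPerSecond = REQUESTS_PER_HOUR / 3600;

// Requests affected per hour by a regression that hits the given fraction of traffic.
function affectedPerHour(regressionRate: number): number {
  return Math.round(REQUESTS_PER_HOUR * regressionRate);
}

console.log(Math.round(requestsPerSecond)); // 97222
console.log(affectedPerHour(0.0001));       // 0.01% of traffic -> 35000
console.log(affectedPerHour(0.001));        // 0.1% of traffic  -> 350000
```

Even a hypothetical 0.01% regression would touch tens of thousands of requests every hour.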

Traffic splitting sounds simple until you implement it at this scale. Correctness mattered more than elegance. The splitter had to understand endpoint shape, client behavior, safe fallbacks, routing policy, and telemetry. It also had to be boring in the best possible way.

A simplified version of the decision flow looked something like this:

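// Simplified sketch: the helper predicates, hashToBucket, and rolloutPercent are defined elsewhere.
function chooseRoute(request: Request): "modelA" | "modelB2" {
  // Guardrails first: anything not explicitly eligible stays on Model A.
  if (!isEligibleEndpoint(request.endpoint)) return "modelA";
  if (!isSupportedClient(request.client)) return "modelA";
  if (isRampDisabled(request.tenant)) return "modelA";
  // Deterministic bucketing: at a fixed rollout percent, a given mailbox always lands on the same side.
  return hashToBucket(request.mailboxId) < rolloutPercent ? "modelB2" : "modelA";
}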

The real implementation was obviously more nuanced, but the philosophy stayed the same: deterministic routing, sharp guardrails, and instant rollback paths.
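
For readers who want something they can run, here is a self-contained sketch of the same decision flow. Everything beyond the shape of chooseRoute is an assumption made for illustration: the SplitRequest and SplitPolicy types, the allowlist sets, the sample values, and the FNV-1a style hash are all hypothetical and are not the production implementation.

```typescript
// Hypothetical request shape; the real splitter's inputs are not described in this post.
interface SplitRequest {
  endpoint: string;
  client: string;
  tenant: string;
  mailboxId: string;
}

type Route = "modelA" | "modelB2";

// Hypothetical policy: allowlists, a per-tenant opt-out, and a rollout percentage (0 to 100).
interface SplitPolicy {
  eligibleEndpoints: ReadonlySet<string>;
  supportedClients: ReadonlySet<string>;
  rampDisabledTenants: ReadonlySet<string>;
  rolloutPercent: number;
}

// FNV-1a style 32-bit hash over UTF-16 code units, reduced to a bucket in [0, 100).
// Picked only because it is short and deterministic; it is not the production hash.
function hashToBucket(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 100;
}

function chooseRoute(request: SplitRequest, policy: SplitPolicy): Route {
  // Every guardrail fails closed to Model A.
  if (!policy.eligibleEndpoints.has(request.endpoint)) return "modelA";
  if (!policy.supportedClients.has(request.client)) return "modelA";
  if (policy.rampDisabledTenants.has(request.tenant)) return "modelA";
  // Same mailbox, same bucket: raising rolloutPercent only ever adds mailboxes to Model B2.
  return hashToBucket(request.mailboxId) < policy.rolloutPercent ? "modelB2" : "modelA";
}

// Example usage with made-up values.
const samplePolicy: SplitPolicy = {
  eligibleEndpoints: new Set(["GetItem"]),
  supportedClients: new Set(["web"]),
  rampDisabledTenants: new Set(["tenant-optout"]),
  rolloutPercent: 20,
};
const sampleRequest: SplitRequest = {
  endpoint: "GetItem",
  client: "web",
  tenant: "tenant-1",
  mailboxId: "mailbox-42",
};
console.log(chooseRoute(sampleRequest, samplePolicy));
```

Two properties fall out of this shape. Setting rolloutPercent to 0 sends everything back to Model A without touching any other state, and because a bucket depends only on the mailbox ID, a mailbox that has moved to Model B2 stays there as the percentage grows.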

The inversion work was where the real seams showed up

Two of the most meaningful modernization threads were CC.Body, which we inverted to 100%, and CC.MIME, which we drove to roughly 70% inversion. Those numbers matter because inversion is not just a refactor metric. It is evidence that a formerly embedded behavior can now stand on a cleaner interface.

This is where the migration became intellectually interesting. In a monolith, body transforms and MIME flows often accumulate assumptions because they have the luxury of local knowledge. When you pull them behind a boundary, you discover exactly how many invisible dependencies they had all along.

I spent a lot of time asking uncomfortable questions:

What data shape is truly required here?
Which behavior is contract, and which behavior is accidental coupling?
If we move this across a service boundary, what latency and retry semantics become user-visible?
Which scenarios are relied on by one weird but important legacy client?
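
One way to make the first two questions concrete is to write the boundary down as an explicit interface and let callers depend only on that. The sketch below is hypothetical: BodyProvider, its field names, and both implementations are invented for illustration and do not describe the real CC.Body contract. It is kept synchronous for brevity; a real cross-service call would be asynchronous, which is exactly where the latency and retry questions come in.

```typescript
// Hypothetical contract for a body capability. Field names are illustrative only.
interface BodyRequest {
  itemId: string;
  format: "html" | "text";
}

interface BodyResult {
  content: string;
  servedBy: "legacy" | "inverted";
}

// Callers depend on this interface, not on where the behavior runs.
interface BodyProvider {
  getBody(request: BodyRequest): BodyResult;
}

// Stand-in for the co-located code path.
class LegacyBodyProvider implements BodyProvider {
  getBody(request: BodyRequest): BodyResult {
    return { content: `${request.format} body for ${request.itemId}`, servedBy: "legacy" };
  }
}

// Stand-in for the path behind the new boundary, with the legacy path as a fallback.
class InvertedBodyProvider implements BodyProvider {
  private readonly callService: (request: BodyRequest) => BodyResult;
  private readonly fallback: BodyProvider;

  constructor(callService: (request: BodyRequest) => BodyResult, fallback: BodyProvider) {
    this.callService = callService;
    this.fallback = fallback;
  }

  getBody(request: BodyRequest): BodyResult {
    try {
      return this.callService(request);
    } catch {
      // Any failure across the boundary degrades to the legacy behavior.
      return this.fallback.getBody(request);
    }
  }
}

// A caller written purely against the contract.
function describeBody(provider: BodyProvider, request: BodyRequest): string {
  const result = provider.getBody(request);
  return `${result.servedBy}: ${result.content}`;
}

// Example usage with a healthy and a failing stand-in service.
const legacyProvider = new LegacyBodyProvider();
const healthyProvider = new InvertedBodyProvider(
  (request) => ({ content: `${request.format} body for ${request.itemId}`, servedBy: "inverted" }),
  legacyProvider,
);
const failingProvider = new InvertedBodyProvider(() => {
  throw new Error("service unavailable");
}, legacyProvider);

const sampleBodyRequest: BodyRequest = { itemId: "item-1", format: "html" };
console.log(describeBody(healthyProvider, sampleBodyRequest)); // inverted: html body for item-1
console.log(describeBody(failingProvider, sampleBodyRequest)); // legacy: html body for item-1
```

Anything a caller needs that is not expressible through the interface is, by definition, accidental coupling that has to be either promoted into the contract or removed.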

That work paid off. Across the migration, we drove end-to-end overhead down by 70% in the paths we modernized. That is the kind of number that changes how people feel about architecture work. It stops sounding like cleanup and starts looking like product-quality improvement.

Zero downtime was the real non-negotiable

The most stressful part was not the coding. It was the requirement that the migration had to feel invisible from the outside. Outlook does not get to tell users, “Sorry, we are modernizing this week.” The bar was zero downtime.

That forced a certain discipline:

Incremental exposure

We never treated migration as a one-way door. Every increment had to be measurable, reversible, and isolated.

Deep telemetry before confidence

I learned to trust graphs over vibes. Error rate, p95 latency, endpoint-specific failures, partner regressions, fallback rates, and client skew told the truth faster than intuition did.
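
A minimal sketch of what "graphs over vibes" can look like in code: compare a candidate window of traffic against a baseline window and refuse to proceed if error rate or p95 latency drifts too far. The types, threshold values, and sample data here are assumptions for illustration, not the real gating logic, and the percentile uses the simple nearest-rank method.

```typescript
// Hypothetical per-window telemetry summary.
interface WindowStats {
  requests: number;
  errors: number;
  latenciesMs: readonly number[];
}

// Hypothetical thresholds: allowed absolute error-rate increase and allowed p95 ratio.
interface GateThresholds {
  maxErrorRateDelta: number;
  maxP95Ratio: number;
}

interface GateResult {
  healthy: boolean;
  reasons: string[];
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) throw new Error("percentile requires at least one sample");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function errorRate(stats: WindowStats): number {
  return stats.requests === 0 ? 0 : stats.errors / stats.requests;
}

function evaluateGate(baseline: WindowStats, candidate: WindowStats, thresholds: GateThresholds): GateResult {
  const reasons: string[] = [];
  const errorDelta = errorRate(candidate) - errorRate(baseline);
  if (errorDelta > thresholds.maxErrorRateDelta) {
    reasons.push(`error rate up by ${errorDelta.toFixed(4)}`);
  }
  const p95Ratio = percentile(candidate.latenciesMs, 95) / percentile(baseline.latenciesMs, 95);
  if (p95Ratio > thresholds.maxP95Ratio) {
    reasons.push(`p95 latency ratio ${p95Ratio.toFixed(2)}`);
  }
  return { healthy: reasons.length === 0, reasons };
}

// Example usage with made-up numbers.
const baseLatencies = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100 ms
const baselineWindow: WindowStats = { requests: 10_000, errors: 10, latenciesMs: baseLatencies };
const goodCandidate: WindowStats = { requests: 10_000, errors: 12, latenciesMs: baseLatencies.map((ms) => ms * 1.05) };
const badCandidate: WindowStats = { requests: 10_000, errors: 30, latenciesMs: baseLatencies.map((ms) => ms * 1.5) };
const gateThresholds: GateThresholds = { maxErrorRateDelta: 0.0005, maxP95Ratio: 1.1 };

console.log(evaluateGate(baselineWindow, goodCandidate, gateThresholds).healthy); // true
console.log(evaluateGate(baselineWindow, badCandidate, gateThresholds).reasons.length); // 2
```

A real gate would also slice by endpoint, client, and partner, since an aggregate can look healthy while one slice is badly broken.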

Rollback as a first-class feature

Rollback is not failure. At this scale, rollback is professionalism. If you cannot get traffic back to safety in minutes, you are not really ready to ship.
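
Treating rollback as a feature can be sketched as a small ramp controller in which going back to safety is a single step, not a walk back through every stage. The stage values and the RampController class are hypothetical and chosen only for illustration.

```typescript
// Hypothetical ramp stages, in percent of eligible traffic. The values are illustrative.
const RAMP_STAGES: readonly number[] = [0, 1, 5, 20, 50, 100];

class RampController {
  private stageIndex = 0;

  get rolloutPercent(): number {
    return RAMP_STAGES[this.stageIndex];
  }

  // Move one stage forward, but only when the caller reports healthy telemetry.
  // An unhealthy signal holds the current stage; the rollback decision stays explicit.
  advance(healthy: boolean): number {
    if (healthy && this.stageIndex < RAMP_STAGES.length - 1) {
      this.stageIndex += 1;
    }
    return this.rolloutPercent;
  }

  // Rollback is one step straight to 0%, regardless of the current stage.
  rollback(): number {
    this.stageIndex = 0;
    return this.rolloutPercent;
  }
}

// Example usage.
const ramp = new RampController();
console.log(ramp.advance(true));  // 1
console.log(ramp.advance(false)); // 1 (held: telemetry was not healthy)
console.log(ramp.advance(true));  // 5
console.log(ramp.rollback());     // 0
```

Paired with deterministic bucketing, dropping the percentage to 0 returns every mailbox to the old path at once.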

Partner empathy

One thing large-system migrations taught me is that partner teams do not care how elegant your architecture is if their scenario breaks, and they should not have to. Trust is earned through predictable behavior, not through great design docs.

What this work changed for me

Before this experience, I thought modernization was mostly about better technology choices: newer boundaries, cleaner interfaces, and more scalable patterns. I still care about those things, but now I think the real job is subtler. It is about reducing hidden dependency debt without discarding the reliability the old system earned through years of production scars.

I also came away with much more respect for “boring” engineering: a safe traffic splitter, a careful inversion interface, a rollout plan with good fallbacks, and dashboards that answer the right question fast.

Looking back, this migration made me more patient and more serious about reliability. It taught me that the glamorous version of architecture is greenfield design, but the real version is slower, messier, and far more human. You are not just moving code. You are moving trust, ownership, and production behavior at scale. Honestly, that is what made it worth doing.