Fighting Latency

As I talked about in "Blackbox" Systems Management, bandwidth and latency are the two most important metrics in managing a system (in our case, usually a business or team). While bandwidth is critical to scaling the output of the business, the expense of coordinating larger team sizes drives higher bandwidth businesses to suffer higher latency.

Since latency more accurately measures the lived experience of the customer, increasing bandwidth at the expense of latency erodes your competitive position. Fighting latency is critical to success.

However, it's not just in large businesses that this is critical. Low latency increases your general responsiveness. In a changing environment, a system that is more responsive will win. So even if you're a solo trader developing new systems, the latency of your process is as critical to your success as it is for Intel developing the next generation of CPUs.

Fighting latency is actually really simple: just over-provision. If you have more capacity than you need, the system clears out jobs quickly. Voilà! Low latency!
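
To put rough numbers on that, here's a minimal sketch. It assumes the simplest textbook queue (a single worker, random arrivals, random job sizes, usually called M/M/1), which is my simplification and not a model of any real team. The function name and the 9-jobs-per-day figure are made up for illustration.

```python
def mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Average time a job spends waiting plus being worked on (M/M/1 queue).

    Both rates are in jobs per day, so the result is in days.
    """
    if arrival_rate >= service_rate:
        # Demand meets or exceeds capacity: the queue grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical workload: 9 jobs arrive per day, capacity varies.
for capacity in (10, 12, 18):
    latency = mean_time_in_system(9, capacity)
    print(f"capacity={capacity}/day  utilization={9 / capacity:.0%}  latency={latency:.2f} days")
```

Under these assumptions, a team running at 90% utilization takes a full day on average to clear a job, while the same workload at 50% utilization clears in about a ninth of a day.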

When you achieve this, one single step will become the "bottleneck". The bottleneck is the rate-limiting step, and you'll know it because it's the only step with a queue of any size. From there, if you increase the output of your bottleneck, the entire system output gets increased. Eventually, you'll over-provision the bottleneck so much that it's no longer the bottleneck--and you'll see the queue form somewhere else.
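
The "look for the queue" rule can be sketched as a toy simulation. Everything here is hypothetical: three sequential steps, made-up capacities per tick, and perfectly steady arrivals.

```python
def simulate(capacities, arrivals_per_tick, ticks):
    """Push work through sequential steps and return each step's final queue length."""
    queues = [0] * len(capacities)
    for _ in range(ticks):
        inflow = arrivals_per_tick
        for i, capacity in enumerate(capacities):
            queues[i] += inflow                # work arrives at this step
            done = min(queues[i], capacity)    # the step clears what it can
            queues[i] -= done
            inflow = done                      # finished work flows to the next step
    return queues


# Step 2 (capacity 8) cannot keep up with 11 arrivals per tick.
print(simulate([12, 8, 10], arrivals_per_tick=11, ticks=5))

# Over-provision step 2 and the queue shows up at step 3 instead.
print(simulate([12, 12, 10], arrivals_per_tick=11, ticks=5))
```

In the first run only the middle step accumulates a queue. Once its capacity is raised past demand, the queue moves to the last step, which is now the slowest one.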

In practice, however, we also have to worry about the two other factors that determine system performance: quality and cost. Simply over-provisioning everything is often far too expensive to be practical. The competitive environment just won't allow you to collect the kinds of margins needed to pursue such a naive strategy.

Instead, we have to efficiently create low latency systems. There are a few usual techniques:

  1. Dedicated, nearby resources
  2. Flex capacity
  3. Role hybridization
  4. Burst capacity
  5. Improvement of bottleneck processes

Dedicated, Nearby Resources

The opposite of shared, distant resources is dedicated, nearby resources. While this does often lead to over-provisioning, you also get the advantage of much reduced communications overhead--reducing the amount of over-provisioning needed to eliminate the shared resource as the bottleneck.

First, by having a single person (or small group of people) dedicated to the function, fewer people need to be kept in the loop. That straight up reduces the amount of communication that has to happen. As things change, this saves a ton of time and reduces the likelihood that different groups fall out of step.

Second, because these people aren't shared across multiple teams, they focus all their time and energy on the topic at hand. In fast-moving environments, the threat of communications overload is real. People need to spend more and more of their time just keeping up with the latest events. By having resources focused on a single team, they can communicate more frequently, in smaller higher-fidelity chunks--and this form of communication utilizes channels "natural" to human beings, so it actually happens.

Imagine you had to make a data request, but before you could make it, you had to explain the most important things that have happened in your team for the past 6 months. What would make the cut? You'd probably summarize a few bullet points worth of highlights and lowlights and move on. The resulting "context" delivered is minimal at best.

Now, what if you instead had to do the same thing, but only for the past week? A lot of things that wouldn't make the 6-month list would make the 1-week list. And if you did this every week for 6 months, you'd generate 26 data-dumps--each at the higher fidelity. In some cases, you might even need an hourly report frequency--if you're working with a team of traders or soldiers, even that might not be enough.

Of course, the data team could try to mandate that the once-per-six-months report had the same level of detail as 26 weekly reports. But that's just a big F-you to the rest of the company. Who's going to actually do that? The overhead of writing the report would just kill the request right there.

An extreme version of this "report often" dynamic is an embedded resource that works alongside the client team. They go to every staff meeting, have close working relationships with the client team members, and generally are available for knowledge transfer as needed. (The most extreme version is to force a high degree of role hybridization, which I'll cover later.)

This kind of availability does mean you get less time dedicated to actual data work from them, but the time lost is more than offset by the high-fidelity context they can bring to bear on every task. Compared to a centralized resource, embedded resources:

  • Suffer less communications overhead
  • Make better decisions about "what" to do
  • Throw less work away
  • Turn around tasks for the client team faster

It's just a better way of doing things when it's feasible.

Flex Capacity

Flex capacity is the extra capacity you can draw on by repurposing existing resources during times of high demand. The essence of flex capacity is that you over-provision a function (in the example below, tech support), but you don't just let the capacity go underutilized. Instead, you find other important-but-not-urgent work for them. This gives you the best of both worlds: a well-utilized resource that is able to meet demand as it comes in.

At Extra, we have a team of Tech Support Engineers (TSE) who handle the hard customer support tickets after they've been escalated by our front line Concierge team. People on the TSE team are highly technical, though they may never have held a development position before. Some of them aspire to do so. We've created our Academy program where individuals can spend up to 20% of their time on development-related tasks (that hopefully make the TSE team more productive).

For example, TSE might use their 20% time on:

  • Improving the tooling needed to increase their own productivity
  • Creating tools that enable the Concierge team to resolve the issue without the need for escalation to TSE
  • Fixing bugs

This is great for us, as it leads to better customer experiences and lower costs--and it's great for them as it lets them skill up.

However, if the volume of escalated tickets threatens to overwhelm our current capacity, they are still TSE team members, fully trained on handling tickets. They can "flex" our TSE capacity by taking that 20% of their time and using it to clear tickets. In this way, we can handle periods of high volume without burning anyone out and without just paying people to play video games until we need them.
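
As a back-of-the-envelope sketch of how much headroom that buys, assume a hypothetical team of 10 where everyone uses the full 20% Academy allowance (the real split is "up to" 20%, so this is the best case). The function and its parameters are illustrative only.

```python
def ticket_capacity(headcount: int, academy_share: float, flexing: bool) -> float:
    """Full-time equivalents (FTE) available for tickets."""
    share_on_tickets = 1.0 if flexing else 1.0 - academy_share
    return headcount * share_on_tickets


normal = ticket_capacity(10, 0.20, flexing=False)  # everyone keeps their 20% time
peak = ticket_capacity(10, 0.20, flexing=True)     # 20% time is redirected to tickets
print(f"normal={normal:.1f} FTE  peak={peak:.1f} FTE  headroom={peak / normal - 1:.0%}")
```

Redirecting 20% of everyone's time gives 25% more ticket capacity than the normal baseline, because the baseline is only 80% of headcount.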

Role Hybridization

Role hybridization is the flip side of flex capacity. In the example above, when a technical support engineer joins Extra's Academy program, they take on a hybrid role: 80% TSE / 20% software development. Other examples of hybridized roles:

  • Product managers that do their own data analysis
  • Specialist customer support agents that can still answer general questions
  • Business development people who write their own contracts
  • Finance people who can write their own software

Besides enabling flex capacity, role hybridization allows you to both reduce the load on some other (likely overburdened) team and invest in your people.

The TSE Academy program is a perfect example:

  • TSE can now handle development and debugging requests that the core software team didn't have the bandwidth to address
  • TSE personnel who aspire to become software developers get training, mentorship, and experience

Another benefit of role hybridization (as it pertains to latency) is that the extreme form of a "dedicated, nearby" resource is a single person who can perform both roles. No team structure can serve a product manager's data analysis needs as quickly as the product manager just doing it themselves. (Yes, they have to have the skills, and this impacts their bandwidth--but latency is optimized.)

At Zynga, all the product managers were expected to be able to do 90% of their own analysis for this exact reason. Gaming was a fast-moving environment, and more than anything, Zynga optimized their product management function for low latency development of data driven insights. A lot of other things suffered for it, but while I was there, they were by far the best at optimizing game performance after it had launched.

These "hybrids" can seem almost nonsensical to people who are used to working alone or in small teams. In that environment, every role is a hybrid role--but when you're managing a larger organization, role definitions matter. People both expect and are expected to stay roughly within the lines--so explicit hybridization of a role gives them more room to contribute.

Burst Capacity

Burst capacity is different from flex capacity in that you only pay for the "burst" when you need it. During times you don't need it, there's no expense--but those resources are also not available to you for other purposes.

Some examples:

  • Uber's surge pricing to draw in more drivers
  • Accounting firms often use temp workers during tax season
  • Spinning up new servers on Amazon AWS
  • Overtime

If burst capacity is cheap enough, then people can get used to always just paying the burst price in order to maintain flexibility they don't use. For example, Amazon AWS offers instances up to 67% off if you're willing to commit to a 3-year term (and prepay). That makes the burst price 3x the base price--but that's the price that most people pay because servers are just not that large a part of their cost structure (until they scale).

If burst capacity is expensive enough compared to the base price, then it makes sense to over-provision and underutilize the resource. This is a hard pill to swallow for most managers--so instead they rely on overly expensive burst capacity (to look efficient) when they could have instead used some form of role hybridization to achieve a flex capacity model.
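
The trade-off between paying the burst price and committing up front can be sketched with the 67% discount mentioned above. This is a deliberately crude comparison: the function names are mine, prices are normalized so the burst price is 1, and it ignores prepayment, the cost of capital, and the risk that burst capacity isn't available at all.

```python
def burst_multiple(discount: float) -> float:
    """How many times the committed (base) price you pay at the burst price."""
    return 1.0 / (1.0 - discount)


def cheaper_option(utilization: float, discount: float) -> str:
    """Compare paying the burst price only when needed against committing up front.

    utilization: fraction of the time the capacity is actually needed (0 to 1).
    """
    burst_cost = utilization * 1.0    # full price, but only while in use
    committed_cost = 1.0 - discount   # discounted price, paid all the time
    return "burst" if burst_cost < committed_cost else "commit"


print(f"burst price is {burst_multiple(0.67):.2f}x the base price")
print(cheaper_option(utilization=0.20, discount=0.67))  # needed 20% of the time
print(cheaper_option(utilization=0.50, discount=0.67))  # needed 50% of the time
```

With a 67% discount the break-even sits at 33% utilization: below that, paying the burst price is cheaper, and above it you are better off committing and accepting that the resource will sometimes sit idle.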

The biggest risk with burst capacity is that it won't be there when you need it (or that the burst price will have dramatically changed). What if you need 5 new accountants to help with your year-end close, and there are none to be found? Or if you need to spend a whole year's salary on each one for 1 month of work? That risk needs to be factored into the decision-making process.

Improvement of Bottleneck Processes

Finally, we can just stop looking for tricks, put our heads down, and improve the bottleneck process. Much of the inefficiency faced by teams comes from misalignment around bandwidth and latency targets or mismanaging communication.

For example, if you want a fulfillment team of an e-commerce store to achieve a median order-fulfillment latency of 3 hours, then you'd better staff 24/7. If that team attempts to achieve that kind of latency without a staffing plan aimed at that target, something is going to break.
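
Here's a rough sketch of why the staffing plan matters. It assumes orders arrive evenly around the clock and, generously, that an order is fulfilled the instant someone is on shift. The 9-to-5 shift and the function names are hypothetical.

```python
from statistics import median


def wait_hours(arrival_hour: float, open_hour: float, close_hour: float) -> float:
    """Hours an order sits untouched before the team is next on shift."""
    if open_hour <= arrival_hour < close_hour:
        return 0.0                               # someone is already working
    if arrival_hour < open_hour:
        return open_hour - arrival_hour          # wait for today's shift
    return 24.0 - arrival_hour + open_hour       # wait for tomorrow's shift


def median_wait(open_hour: float, close_hour: float) -> float:
    """Median wait for one order arriving at every minute of the day."""
    arrivals = [minute / 60 for minute in range(24 * 60)]
    return median(wait_hours(t, open_hour, close_hour) for t in arrivals)


print(f"9-to-5 staffing: median wait {median_wait(9, 17):.1f} hours")
print(f"24/7 staffing:   median wait {median_wait(0, 24):.1f} hours")
```

Under these assumptions the median order waits about four hours before anyone on a 9-to-5 team can even touch it, so a 3-hour median target is out of reach no matter how fast the team works once they're in.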

Or, let's say your creative team needs to ship variants of marketing campaigns based on feedback from last week's performance. Maybe it gets too much communication in the form of course corrections every 12 hours. Or maybe too little communication because feedback/approval only occurs once every 3 days (while the average creative requires three passes). In either case, the communication itself will kill any chance of your creative team's success.

I'll dig into some of this in more detail in the future, but the primary ways to improve individual team processes are:

  • Evaluate if the team has the proper staffing
    • Do this first--you don't know much of anything about a team that's understaffed or lacking proper leadership
  • Evaluate if there are underperforming individuals
    • Underperforming individuals usually drive more than the average amount of communication (sometimes a lot more!) while delivering low output
  • Find internal drivers of latency and eliminate them. This is especially effective if:
    • there are significant queues inside the "team", such that it's acting like a bunch of separate teams
    • there's too much "note-passing" communication required to keep people on the same page
    • decision-making authority isn't being pushed down far enough, leading to too much "review/feedback/iterate" communication
  • Change the work stream to reduce the scope and/or complexity of the bottleneck task
    • i.e. shift some of the work to other teams that have (and can scale) the needed capacity better
    • This may require creating new, more specialized teams that can be managed more effectively than a generalist team that is hard to hire for
  • Reduce the burden of communication received from the rest of the organization
    • This can mean better or just less communication from the outside
    • This may make the team less responsive to changes from the outside, but the increased bandwidth could be a win anyway
  • Just pay to increase capacity

Each of the above could be its own post, and the list isn't exhaustive (just what I could get out in the time I had to write this post). There's a lot more to come on individual team productivity.