Autoscale: How Facebook Makes Its Software Infrastructure More Energy-Efficient


Facebook’s energy-conservation efforts aren’t limited to the hardware at its data centers: The social network aims to make its software infrastructure more energy-efficient, as well, and one of the ways it is doing so is via Autoscale, a system for power-efficient load balancing.

Facebook described the concept behind Autoscale in a post on its engineering blog by Infrastructure Software Engineer Qiang Wu:

Every day, Facebook Web clusters handle billions of page requests that increase server utilization, especially during peak hours.

The default load-balancing policy at Facebook is based on a modified round-robin algorithm. This means every server receives roughly the same number of page requests and utilizes roughly the same amount of CPU. As a result, during low-workload hours, especially around midnight, overall CPU utilization is not as efficient as we’d like. For example, a particular type of Web server at Facebook consumes about 60 watts of power when it’s idle (0 RPS, or requests-per-second). The power consumption jumps to 130 watts when it runs at low-level CPU utilization (small RPS). But when it runs at medium-level CPU utilization, power consumption increases only slightly to 150 watts. Therefore, from a power-efficiency perspective, we should try to avoid running a server at low RPS and instead try to run at medium RPS.
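As a quick illustration of that arithmetic (ours, not part of Wu's post), the sketch below compares two servers both running at low utilization with one server at medium utilization plus one left idle. The wattages come from the passage above; the function name and the premise that one medium-utilization server can absorb the traffic of two low-utilization servers are assumptions, since the post gives no RPS figures.

```python
# Power figures quoted in the post, in watts, for one type of Web server.
POWER_WATTS = {"idle": 60, "low": 130, "medium": 150}


def cluster_power(levels):
    """Return the total power draw for a list of per-server utilization levels."""
    return sum(POWER_WATTS[level] for level in levels)


# Assumption: one server at medium utilization can take over the traffic
# of two servers at low utilization. The post does not state RPS numbers.
spread = cluster_power(["low", "low"])            # load spread evenly
concentrated = cluster_power(["medium", "idle"])  # load concentrated

print(spread, concentrated, spread - concentrated)  # prints: 260 210 50
```

Under that assumption, concentrating the load saves 50 watts per pair of servers, and more if the idle server is put to use for other work.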

To tackle this problem and utilize power more efficiently, we changed the way that load is distributed to the different Web servers in a cluster. The basic idea of Autoscale is that instead of a purely round-robin approach, the load balancer will concentrate workload to a server until it has at least a medium-level workload. If the overall workload is low (like at around midnight), the load balancer will use only a subset of servers. Other servers can be left running idle or be used for batch-processing workloads.

Though the idea sounds simple, it is a challenging task to implement effectively and robustly for a large-scale system.
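The core idea can be sketched in a few lines, although this toy version ignores exactly the robustness concerns the post alludes to. It is not Facebook's implementation: `assign_load` and the `target_rps` parameter (standing in for "a medium-level workload") are hypothetical names, and the post excerpt does not describe how the real load balancer makes its decisions.

```python
import math


def assign_load(total_rps, num_servers, target_rps):
    """Concentrate load on as few servers as possible.

    Returns a list of per-server RPS values. Servers are activated only
    as needed so that each active server runs near target_rps; the rest
    receive no requests and can idle or run batch work.
    """
    if total_rps <= 0:
        return [0.0] * num_servers
    # Number of servers needed so that none exceeds target_rps,
    # capped at the cluster size when overall load is high.
    active = min(num_servers, math.ceil(total_rps / target_rps))
    per_server = total_rps / active
    return [per_server] * active + [0.0] * (num_servers - active)


# Low overall load: only a subset of the 10 servers is used.
night = assign_load(total_rps=300, num_servers=10, target_rps=100)
# High overall load: every server is used, as with an even split.
peak = assign_load(total_rps=2000, num_servers=10, target_rps=100)
```

With these hypothetical numbers, the night-time call activates three servers and leaves seven free, while the peak call spreads requests across all ten.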

For much more on Autoscale, please see Wu’s post on the engineering blog.