How Facebook Returned A/B Testing To Mobile Apps


When Facebook rewrote its mobile applications and converted them from custom Web stacks to native development stacks, it lost the ability to perform A/B testing, or to simultaneously test multiple versions of its apps. The social network described how it regained the ability to A/B test in a post on its engineering blog.

The blog post detailed the development of what came to be known as Airlock, first explaining the need for A/B testing:

Shipping our apps on iOS and Android requires developers from many different teams to coordinate and produce a new binary packed with new features and bug fixes every four weeks. After we ship a new update, it’s important for us to understand how:

  • New features perform.
  • The fixes improved performance and reliability.
  • Improvements to the user interface change how people use the app and where they spend their time.

In order to analyze these objectives, we needed a mobile A/B testing infrastructure that would let us expose our users to multiple versions of our apps (version A and version B), which are the same in all aspects except for some specific tests. So we created Airlock, a testing framework that lets us compare metric data from each version of the app and the various tests, and then decide which version to ship or how to iterate further.
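The Facebook post does not say how Airlock decides which version each person sees. A common technique, shown here as a minimal Python sketch under that assumption, is to hash a stable user ID together with the experiment name. This keeps each user in the same variant on every launch and keeps assignments in different experiments independent of one another. The function name assign_variant and the experiment names are hypothetical and are not Airlock's API.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a user to one variant of an experiment.

    Hypothetical helper: hashing "experiment:user" gives a stable bucket per
    user, and including the experiment name decorrelates assignments across
    different experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an unsigned integer bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

Because the mapping depends only on the inputs, the same call, for example assign_variant("user-42", "new_navigation", ["A", "B"]), returns the same answer on every launch without any server round trip. Passing a longer list extends the same scheme to experiments with many variations.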

The blog post went on to detail the origins of Airlock and the issues the team had to resolve, concluding:

The creation of Airlock helped us ship a navigation model that feels slicker, is easier to use one-handed, and keeps better track of your state in the app. This tool has allowed us to now scale the framework to support 10 or 15 different variations of a single experiment and put it in the hands of millions of people using our apps. We had to relearn the rules of not letting one experiment pollute another, keeping some experiments dependent and others exclusive, and how to ensure the logging was correct in the control group. The last bit was tricky because sometimes a control group means that some piece of user interface doesn’t exist. How does one log that someone did not go to a place that doesn’t exist? Here we learned to log both the decision on which UI to construct and then separately to log the interaction with it.
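The lesson about logging the control group can be illustrated with a small sketch. It assumes a hypothetical in-memory event sink (EventLog) and a hypothetical experiment named new_nav, neither of which comes from the Facebook post. The app records an exposure event at the moment it decides which UI to build. This happens even in the control group, where the new element is never constructed. Taps are recorded separately as interaction events.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventLog:
    """Hypothetical in-memory event sink standing in for a real logging pipeline."""
    events: list[dict] = field(default_factory=list)

    def log(self, **event) -> None:
        self.events.append(event)


def build_navigation(user_id: str, variant: str, log: EventLog) -> Optional[str]:
    # Log the decision first, even in control, where no new UI is constructed.
    log.log(kind="exposure", user=user_id, experiment="new_nav", variant=variant)
    if variant == "test":
        return "new_nav_bar"
    return None  # control: the new UI element does not exist


def on_tap(user_id: str, element: str, log: EventLog) -> None:
    # Interactions are logged separately from the exposure decision.
    log.log(kind="interaction", user=user_id, element=element)


def interaction_rate(log: EventLog, variant: str) -> float:
    # Denominator: every user who was exposed to this variant.
    exposed = {e["user"] for e in log.events
               if e["kind"] == "exposure" and e["variant"] == variant}
    # Numerator: exposed users who produced at least one interaction event.
    interacted = {e["user"] for e in log.events
                  if e["kind"] == "interaction" and e["user"] in exposed}
    return len(interacted) / len(exposed) if exposed else 0.0
```

Every exposed user appears in the denominator. As a result, a control group that never saw the new element reports a measured rate of zero rather than having no data at all.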

As the framework scaled to support more experiments, the amount of parameter requests, data logging, and client-side computation began to rise very quickly. The framework needed to be fast on the client in order to have experiments ready without blocking any of the startup path, so we optimized the cold-start performance on our apps so that basic, critical configurations could be loaded when the app started and all heavy work was deferred until after the app’s UI was displayed. Likewise, we had to tune the interaction with the device and the server, minimizing the data flow and simplifying the amount of data processing on both ends.
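The cold-start strategy described above can be sketched as follows. Critical configuration is available at launch, and heavy work is deferred until the UI is displayed. ExperimentConfig and its methods are hypothetical names, not Airlock's API. A production app would typically run deferred tasks off the main thread rather than inline, as this single-threaded sketch does.

```python
from typing import Callable


class ExperimentConfig:
    """Hypothetical client-side experiment store; not Airlock's actual API."""

    def __init__(self, critical_params: dict[str, str]) -> None:
        # Small, locally cached parameters needed to draw the first screen
        # are available immediately at startup.
        self._params = dict(critical_params)
        self._deferred: list[Callable[[], None]] = []
        self._ui_shown = False

    def get(self, name: str, default: str) -> str:
        return self._params.get(name, default)

    def defer(self, task: Callable[[], None]) -> None:
        # Heavy work (e.g. fetching the full parameter set) waits until the
        # first frame is on screen; after that, tasks run immediately.
        if self._ui_shown:
            task()
        else:
            self._deferred.append(task)

    def on_first_frame_displayed(self) -> None:
        # Called once the UI is visible: drain the queued heavy work in order.
        self._ui_shown = True
        tasks, self._deferred = self._deferred, []
        for task in tasks:
            task()
```

In this sketch the first screen renders using only the cached critical parameters. Any work queued through defer runs only after on_first_frame_displayed is called, so it cannot block startup.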

Airlock has made it possible to test on native and improve our apps faster than ever. With the freedom to test, retest, and evaluate the results, we’re looking forward to building better and better tests and user experiences.
