Running Calabash Integration Tests with Amazon Device Farm

In part one of this series, Nick wrote about how we used Calabash scripts for our UI-level Android testing. In this entry, I’ll go over how we are able to run these tests as part of our nightly build cycle. To do so, we used a service called Amazon Device Farm (ADF) and the Jenkins automation server.

As discussed in the previous post, our development and QA teams put considerable effort into implementing almost our entire Android test plan in Calabash. This put us in the (enviable) position of being able to run our release QA automatically and at-will from any machine with a connected Android device. Awesome! However, operationally, we still had a long way to go to achieve truly continuous, automated integration. Our goal was a system that:

  1. Did not require us to manage devices
  2. Ran nightly or based on other triggers
  3. Reported failures immediately, storing actionable information for later access

Since we’ve had this system in place, we’ve been able to identify and diagnose bugs as soon as they enter our Android development code (and even some released by the server team). This reduces code churn, drastically cuts down on QA turnaround time, and — most importantly — prevents bugs from being released.

We’re huge believers in this framework and think that many projects could benefit from it as well. By the end of this post, you should be able to continuously build your tests, run them on real Android devices, and publish the test results to your team.


Amazon Device Farm

Amazon Device Farm[1] (ADF) is a service that hosts thousands of physical Android and iOS devices and provides an API to interact with these devices, including an endpoint for running Calabash tests.

I can’t overstate how glad we were to find this service. Device management is a nightmare. Keeping a device usable requires coddling its battery and Internet connection. Frequently, Android OS updates and other pop-ups will block test runs until a human intervenes. And that’s without even getting into the machine and service running the tests, or maintaining the USB connection and `adb` bridge…

All this to say, we were pumped to find a service that would handle this for us. With ADF we can instantly run our Calabash tests on a device of our choosing. Device Farm returns not only a pass or failure but also device logs, Calabash output, and optionally even video of the run and screenshots of the failures. This was another huge win; instead of having to rerun a test locally, we could immediately watch the video footage at the point of failure. Engineering time savings are everywhere.

ADF provides a web UI for running tests and you can also call the [API directly][2]. However, the whole point of this exercise was to remove human fallibility from the QA process. To achieve truly continuous testing we wanted to run these builds regularly so that we could detect failures immediately. Further, we wanted the run results to be easily accessible. And finally, we wanted the entire team to know – via Slack or email – when we’d broken something. For this, we turned to Jenkins.
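For reference, a direct API call can be sketched with the AWS CLI (the ARNs below are placeholders; the app APK and features ZIP must first be registered with `aws devicefarm create-upload` before they can be referenced in a run):

```shell
# Sketch: schedule a Calabash run on Device Farm from the command line.
# The Device Farm API is served out of the us-west-2 region.
aws devicefarm schedule-run \
    --project-arn "arn:aws:devicefarm:us-west-2:ACCOUNT:project:PROJECT_ID" \
    --app-arn "arn:aws:devicefarm:us-west-2:ACCOUNT:upload:APP_UPLOAD_ID" \
    --device-pool-arn "arn:aws:devicefarm:us-west-2:ACCOUNT:devicepool:POOL_ID" \
    --test type=CALABASH,testPackageArn="arn:aws:devicefarm:us-west-2:ACCOUNT:upload:FEATURES_UPLOAD_ID"
```

This works for one-off runs, but scripting it by hand still leaves scheduling, result storage, and notifications to you — which is exactly what Jenkins handles for us.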


Jenkins Continuous Integration Server

We use the Jenkins CI server[3] (henceforth Jenkins) like a Swiss Army knife at Dimagi. It serves a wide array of purposes, including:

  • Building and hosting our Android APK files
  • Running unit tests and linters on our GitHub pull requests
  • Building our web application [Formplayer][4]

In short, any process with well-defined inputs and outputs (which should be all of your processes!) that you want to run regularly could be run on Jenkins. Further, Jenkins provides a **ton** of steps and hooks for interacting with builds, and there are thousands of plugins for integrating with just about any tool. I could write an entire post about Jenkins (and started to do just that), but for our purposes, we just need to know that Jenkins can manage our build triggers, pull source code, run scripts, and host files (called Artifacts in Jenkins parlance).


Defining our process

The process that Jenkins runs goes something like this:

  1. Pull our Calabash tests’ source code from GitHub
  2. Copy our under-test application APK from another Jenkins job
  3. Invoke a script to package our Calabash files into the ZIP format required by ADF
  4. Call the Device Farm API with these features, our APK, and the run configuration
  5. When the test completes, store and publish the test results (Calabash output, screenshots, videos)


Device Farm Setup

The Device Farm UI (and AWS generally) can be overwhelming given the number of tools built into it; however, since we’ll be interacting with ADF primarily through the API, we need only configure a few things:

  1. Register an AWS account and ADF device slot
  2. Create a project space
  3. Set up a device pool

Step 1 should be fairly straightforward, except that [ADF pricing][6] can be a bit tricky to understand. At Dimagi, we pay for one device slot at $250 a month. This lets us run our tests on any one device at a time; running on two devices in parallel would require a second slot at another $250. To start, you can use the 1,000 free trial device minutes.

For Step 2, create a new project, taking note of the name. (We’ll need this later.)

Step 3 is more interesting. Create a new pool, again noting the name. Once this is done, you can select the set of devices you’d like your tests to run on.

At Dimagi, we use an Asus Nexus 7, which we found to be the most robust stock Android device. In the future, we’d like to expand this pool to include different screen sizes and OS versions.

You’re now set up to start your test run! You can use the Web UI to upload your application APK and test suite for a trial run (this is sometimes useful when debugging) or start wiring ADF up with Jenkins below.


Jenkins Pipeline

Jenkins Pipeline (henceforth Pipeline) is a domain-specific scripting language based on Groovy that allows you to configure your Jenkins builds in code. Previously, all Jenkins builds needed to be configured using the Web UI. Pipeline allows you to write the configuration as code so that you can check it into source control, track the changes, and revert changes when needed. With Pipeline all you need to do within Jenkins is create the job and direct it to the GitHub repository; from there, your script takes over.


The Code


```groovy
node {
    checkout scm

    sh 'chmod a+x scripts/make_features'
    sh 'bash scripts/make_features'

    step([$class: 'CopyArtifact',
          projectName: params.cc_android_job,
          filter: '**/app-commcare-debug.apk',
          fingerprintArtifacts: true,
          flatten: true])

    step([$class: 'AWSDeviceFarmRecorder',
          projectName: 'commcare-odk',
          devicePoolName: 'asus',
          runName: params.stageString,
          appArtifact: 'app-commcare-debug.apk',
          testToRun: 'CALABASH',
          calabashFeatures: 'aws/features.zip',
          calabashTags: params.tag,
          isRunUnmetered: true,
          storeResults: true,
          ignoreRunError: false])
}
```

and `scripts/make_features`:

```bash
#!/bin/bash
# Rebuild the aws/ staging directory and zip up the Calabash features for ADF
rm -rf aws
mkdir aws
zip -r aws/features.zip features
```
Let’s go through this line by line.

First, we check out the code from source control.

Second, we run a short bash script that builds a new ZIP file containing our Calabash features, step definitions, and resources.

Next, we copy the most recent APK file artifact from the Jenkins job that builds our Android application.

Finally, we invoke the [AWS Device Farm Plugin][5] for Jenkins – this allows us to make our call to the Device Farm API as a step in Pipeline. We pass in a few configuration parameters here, in particular:

  • `projectName` – the name of the ADF project, configured above
  • `devicePoolName` – the name of the device pool, configured above
  • `appArtifact` – the APK of your Android application under-test
  • `calabashFeatures` – the ZIP file of your Calabash test suite
  • `storeResults` – whether we should save the artifacts in Jenkins
  • `ignoreRunError` – whether to ignore Device Farm run errors rather than failing the build
  • `calabashTags` – a set of tags telling ADF to run only the matching subset of your complete test set. This is a powerful feature that I hope to cover more completely in a future post.
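The tags themselves come straight from Cucumber: you annotate scenarios in your `.feature` files and pass the matching expression through `calabashTags`. A sketch (the `@smoke` tag and feature text here are hypothetical):

```gherkin
# features/login.feature — illustrative tagged scenario
@smoke
Scenario: User can log in
  Given I am on the login screen
  When I enter valid credentials
  Then I see the home screen
```

Passing `calabashTags: '@smoke'` would then restrict the Device Farm run to the tagged scenarios only.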

Now we’re set! Once the run completes, we can view the results within Jenkins or click through to view them directly on ADF.

Now we have a basic UI test runner setup using a combination of Jenkins, Pipeline, Calabash, and Amazon Device Farm. However, the full set of features provided by these tools goes well beyond what I’ve discussed here. As a few examples, we can also:

  • Retry failed tests using Pipeline’s `retry`
  • Run our tests in smaller batches using Pipeline’s `step`
  • Set up build triggers and email reports
  • Add wildcards for easier release testing

The list goes on, and I hope to cover some of these topics in future posts.   
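As a quick illustration of the first item, Pipeline’s built-in `retry` step can wrap the Device Farm call so a flaky run gets another attempt before the build is marked failed (a sketch; the recorder configuration is abbreviated):

```groovy
// Attempt the Device Farm run up to three times before failing the build
retry(3) {
    step([$class: 'AWSDeviceFarmRecorder',
          projectName: 'commcare-odk',
          // ...remaining configuration as above...
          ignoreRunError: false])
}
```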


Words of Caution

You now have all the information needed to set up your own continuous, integrated testing server on real devices. However, in the spirit of full disclosure, getting all of this running was difficult for us, and we still have occasional issues with stability. Devices sometimes have issues with rotation or slow loading, and Calabash can get tripped up by these UI blips, causing a missed step and a spurious failure. And while Calabash and Jenkins are established tools, Pipeline and Amazon Device Farm are cutting-edge and still have weaknesses in documentation, support, and churn. We frequently submit support tickets, open pull requests, or simply use workarounds in order to get full value out of these tools.



With that said, at Dimagi we feel that the value we derive from this setup is well worth the upfront development cost and continued maintenance chores. As Nick elaborated on in the previous post, we were able to massively cut down on our QA resource consumption and release turnaround time. We often catch bugs in our mobile codebase as soon as they’re merged (instead of during QA) because UI tests run nightly against our master .apk. We catch many *server* bugs almost as soon as they are deployed because mobile integration tests run on our live production server after every deploy. This setup is not for everyone, particularly those without the capacity to make the upfront investment and headspace for periodic minor bug fixes. However, if you find yourself bogged down by QA cycles, afraid to merge large Android refactors, or ensnared by server-mobile API failures, then I highly recommend this stack.









