Migrating a Legacy Codebase to RequireJS, Part 2

April 16, 2018

This is the second in a three-post series about migrating our large legacy codebase to use modern JavaScript dependency management:

  1. Decoupling JavaScript and Django
  2. Migrating pages to RequireJS
  3. r.js optimization and build process changes

We selected RequireJS as our dependency management tool after exploring a few different options. All of the tools we looked at – Browserify, JSPM/systemjs, Lasso, RequireJS, and webpack – could have worked. For us, the driving need was to start using a tool at all. RequireJS had a long history and a large enough community, and our team had some experience with it.

This post walks through the progress we’d made towards modular JavaScript before introducing RequireJS, the code changes necessary to migrate one of our old-style modules to use RequireJS – including infrastructure changes necessary to support an incremental migration – and our process approach to getting the whole codebase migrated.

This post assumes familiarity with RequireJS. Some familiarity with Django is also helpful.

Pre-RequireJS: the Crockford module pattern

The first post in this series painted a borderline grim picture of Dimagi’s JavaScript: 100K lines, up to nine years old, plagued by global variables. Yet previous initiatives had substantially improved it: establishing stronger conventions, integrating a linter, standardizing package management, and so forth.

An additional initiative made headway on the global namespace issue, introducing Douglas Crockford’s module pattern to our code. This pattern encapsulates each logical “module” using an immediately-invoked function expression that returns an object containing the publicly-accessible properties of the module. These modules are then stored as properties of a global object.

The original incarnation of Dimagi’s library for handling this pattern was succinct:

var COMMCAREHQ_MODULES = {};

function hqDefine(path, moduleAccessor) {
    if (typeof COMMCAREHQ_MODULES[path] !== 'undefined') {
        throw new Error("The module '" + path + "' has already been defined elsewhere.");
    }

    COMMCAREHQ_MODULES[path] = moduleAccessor();
}

function hqImport(path) {
    if (typeof COMMCAREHQ_MODULES[path] === 'undefined') {
        throw new Error("The module '" + path + "' has not yet been defined.\n\n" +
            'Did you include <script src="' + path + '"></script> on your html page?');
    }

    return COMMCAREHQ_MODULES[path];
}

Individual modules could then be defined with hqDefine:

hqDefine("myApp/js/myModule", function() {
   var privateStuff = function() {
      ...
   };

   ...

   var publicStuff = function() {
      ...
   };

   ...

   return {
      stuff: publicStuff,
      ...
   };
});

…and imported with hqImport. Global third-party libraries like jQuery were still accessible within modules via global variables.

This approach gained us the scoping benefits of a module system but not the dependency management that tools often provide. Attempting to import a module that hadn’t yet been defined would throw an error, leaving us largely dependent on the ordering of script tags. Modules could also import dependencies inside of document ready handlers or other function callbacks, at which point they could be relatively confident that all of the page’s scripts had loaded, but this scattered import statements through the code and made it difficult to reason about what was loaded when.
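To make the ordering problem concrete, here is a minimal sketch that reuses the two functions above with a pair of hypothetical modules (`app/js/a` and `app/js/b`; the names and contents are illustrative, not taken from our codebase). It assumes the script for B runs before the script for A. An eager import at that point fails, while an import deferred into a function works but hides the dependency.

```javascript
// Registry and helpers as shown above (error messages shortened).
var COMMCAREHQ_MODULES = {};

function hqDefine(path, moduleAccessor) {
    if (typeof COMMCAREHQ_MODULES[path] !== 'undefined') {
        throw new Error("The module '" + path + "' has already been defined elsewhere.");
    }
    COMMCAREHQ_MODULES[path] = moduleAccessor();
}

function hqImport(path) {
    if (typeof COMMCAREHQ_MODULES[path] === 'undefined') {
        throw new Error("The module '" + path + "' has not yet been defined.");
    }
    return COMMCAREHQ_MODULES[path];
}

// Hypothetical module B, whose script tag comes first. It defers its import
// of A into a function, standing in for a document ready handler.
hqDefine("app/js/b", function () {
    return {
        greet: function () {
            return hqImport("app/js/a").name + " says hi";
        },
    };
});

// Importing A eagerly at this point throws: its script has not run yet.
var eagerImportFailed = false;
try {
    hqImport("app/js/a");
} catch (e) {
    eagerImportFailed = true;
}

// Hypothetical module A, whose script tag comes second.
hqDefine("app/js/a", function () {
    return { name: "A" };
});

// By the time the deferred import runs, both modules exist.
var greeting = hqImport("app/js/b").greet();
```

Nothing in module B's definition says that it needs A, which is exactly what makes this style hard to reason about.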

Integrating RequireJS: code changes

The first RequireJS integration, a proof of concept, was small but not trivial: it included a few different pages, dependencies on both internal and external libraries, and two different Django apps (since some of the RequireJS configuration was based on the Django app organization).

This proof of concept had four major points of interest:

  • Drawing the boundary between the RequireJS and non-RequireJS worlds
  • Migrating an hqDefine module to use RequireJS
  • Changes to the hqDefine infrastructure
  • Optimization and the build process

Boundary between RequireJS and non-RequireJS worlds

This section assumes familiarity with Django, which is where we distinguish between RequireJS and non-RequireJS pages.

With a large codebase, the ability to migrate incrementally has been an absolute requirement of our transition to modern dependency management. The main unit of migration is an HTML page. Any given page can be migrated independently to RequireJS, while the overall application supports both types of pages indefinitely.

Pages that use RequireJS identify themselves by including the custom requirejs_main template tag. The tag defines a JavaScript module that acts as the controlling module for the page:

{% requirejs_main "data_dictionary/js/data_dictionary" %}

This tag is included near the top of the template and acts in part like a variable declaration, making the controlling module’s name accessible to Django throughout the template as requirejs_main.

Our convention is to have one controlling module, and one entry point, per page. This is analogous to RequireJS’s data-main entry point, although we do not use data-main. Instead, in the master base template (which RequireJS pages must descend from), an inline script requires a handful of common modules (layout controls, analytics, etc.) and then requires the page’s main module:

{% if requirejs_main %}
   <script src="{% static 'requirejs/require.js' %}"></script>
   ...
   <script>
      requirejs([
         'hqwebapp/js/common',
         ...
      ], function () {
         requirejs(['{{ requirejs_main }}']);
      });
   </script>
{% endif %}

The master base template traditionally includes a number of tags for common scripts. Pages that use RequireJS don’t need these script tags, since RequireJS will load the necessary modules, so the base template now checks the value of requirejs_main, in addition to any previously-used flags, before including them:

{% if request.use_maps and not requirejs_main %}
   {% compress js %}
      <script src="{% static 'reports/js/maps.js' %}"></script>
   {% endcompress %}
{% endif %}

Lastly, the master base template sets a global JavaScript value so that JavaScript can detect whether or not a page uses RequireJS:

<script>
   window.USE_REQUIREJS = {{ requirejs_main|BOOL }};
</script>

This global is only referenced by the hqDefine infrastructure, described later on.

Migrating an hqDefine module

The original hqDefine function, written to support the Crockford module pattern, expected a name and a function:

hqDefine("myApp/js/myModule", function() {
   ...
});

RequireJS uses the define function to create modules, which expects a list of dependencies, each of which gets mapped to a parameter of the main function:

define("myApp/js/myModule", [
   "otherApp/js/otherModule",
   ...
], function(
   otherModule,
   ...
) {
   ...
});

The essence of migrating a module from hqDefine to RequireJS is figuring out its dependencies and making them an explicit part of the module definition. Our modules typically depend on a few other internal modules and also a few third-party modules.

Internal module dependencies are established using hqImport, which is easy to convert: add the dependency to the module’s array of dependencies, add a name for it to the parameter list, and within the module, replace hqImport calls with that parameter name.
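As a rough before-and-after illustration of that conversion, the sketch below defines the same hypothetical module twice. The module names (`otherApp/js/formatting`, `myApp/js/before`, `myApp/js/after`) and the `shout` helper are made up for the example, and the `hqDefine` and `hqImport` shown here are simplified stand-ins that only model a non-RequireJS page, so that the example runs on its own.

```javascript
// Simplified stand-ins for hqDefine/hqImport so the example runs without
// RequireJS. Only the legacy (global registry) behavior is modeled.
var COMMCAREHQ_MODULES = {};

function hqImport(path) {
    return COMMCAREHQ_MODULES[path];
}

function hqDefine(path, dependencies, moduleAccessor) {
    if (arguments.length === 2) {
        // Old-style call: no dependency array was given.
        moduleAccessor = dependencies;
        dependencies = [];
    }
    var args = dependencies.map(function (dependency) {
        return hqImport(dependency);
    });
    COMMCAREHQ_MODULES[path] = moduleAccessor.apply(undefined, args);
}

// Hypothetical internal dependency.
hqDefine("otherApp/js/formatting", function () {
    return {
        shout: function (text) { return text.toUpperCase() + "!"; },
    };
});

// Before: the dependency is looked up with hqImport inside the module body.
hqDefine("myApp/js/before", function () {
    var announce = function (text) {
        return hqImport("otherApp/js/formatting").shout(text);
    };
    return { announce: announce };
});

// After: the dependency is listed in the array, named in the parameter
// list, and the hqImport call is replaced with that parameter.
hqDefine("myApp/js/after", [
    "otherApp/js/formatting",
], function (
    formatting
) {
    var announce = function (text) {
        return formatting.shout(text);
    };
    return { announce: announce };
});
```

Both versions behave the same; the difference is that the second one states its dependency where a loader can see it.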

Dealing with third-party modules can vary depending on how that module is packaged. Most often, it’s similar to internal modules, just a matter of adding the dependency’s name to the module’s list. Some modules involve more configuration, using RequireJS’s map or shim options.
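For readers who haven’t used those two options, here is a hedged sketch of what such configuration can look like. The library names and paths (`legacyChart`, `datepicker`) are hypothetical and are not our actual configuration; `shim`, `map`, `deps`, and `exports` are standard RequireJS configuration keys.

```javascript
// Hypothetical configuration; library names and paths are illustrative.
var requireConfig = {
    // shim: describes a library that does not call define() itself and
    // instead attaches a global when its script runs.
    shim: {
        "legacyChart/legacyChart": {
            deps: ["jquery"],        // modules that must load first
            exports: "LegacyChart",  // global to use as the module value
        },
    },
    // map: when a module asks for the ID on the left, load the ID on the
    // right instead. The "*" key applies the mapping to all modules.
    map: {
        "*": {
            "datepicker": "datepicker/v2/datepicker",
        },
    },
};

// Only apply the configuration when RequireJS is actually on the page.
if (typeof requirejs === "function" && typeof requirejs.config === "function") {
    requirejs.config(requireConfig);
}
```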

Changes to the hqDefine infrastructure

The description above of migrating a single module explained how to convert a non-RequireJS module (hqDefine, which does not include dependencies and adds the module definition to a global object) to a RequireJS module (define, which requires a dependency list and makes the module available to other define calls).

When a page is migrated to RequireJS, all of the modules it depends on must be converted to use RequireJS. However, there may be other pages that are not yet migrated but that depend on some of those same modules. To support an incremental migration, a converted module must still be usable by non-RequireJS pages. Ideally, the code to support this dual usage is confined to common code, specifically to hqDefine or related utilities. This allows for a migration where once all individual modules are converted to RequireJS, the remaining cleanup is minimal: just delete that utility code.

There are three differences between non-RequireJS and RequireJS modules that resulted in changes to hqDefine:

  • Defining the module: hqDefine versus define
  • Internal dependencies: hqImport versus parameters to the module function
  • Global external dependencies: globals versus parameters to the module function

Defining the module

We updated the basic flow of hqDefine to support both RequireJS and non-RequireJS modules, accepting an optional list of dependencies (which will be provided by migrated modules, not by unmigrated ones), so that all modules can be defined using hqDefine. On RequireJS pages, the given parameters are just passed through to define. On legacy pages, the old hqDefine logic adds the module to the internal global object of module definitions.

function hqDefine(path, dependencies, moduleAccessor) {
   if (arguments.length === 2) {
      // module has not yet been migrated
      return hqDefine(path, [], dependencies);
   }

   ...

   (function(factory) {
      if (typeof define === 'function' && define.amd && window.USE_REQUIREJS) {
         define(path, dependencies, factory);
      } else {
         ...
         COMMCAREHQ_MODULES[path] = factory.apply(...);
      }
   }(moduleAccessor));
}

For most pages, a two-part conditional is enough to determine whether or not a page uses RequireJS:

typeof define === 'function' && define.amd

The additional window.USE_REQUIREJS check is necessary because our mostly-Django application includes a pure JavaScript form builder app, which itself uses RequireJS. The page that hosts this form builder has additional JavaScript that does not yet use RequireJS, so it needs to be recognized as a non-RequireJS page. But because the form builder uses RequireJS, typeof define === 'function' && define.amd will evaluate to true on that page. The USE_REQUIREJS global provides a workaround for this and can be removed once that host page is migrated.
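The three situations can be laid out side by side in a small sketch. The helper name `isRequireJSPage` is hypothetical (in our code the check is inline in hqDefine), and the plain objects stand in for the browser’s window on each kind of page.

```javascript
// Hypothetical helper mirroring the inline check in hqDefine.
// `root` stands in for the browser's window object.
function isRequireJSPage(root) {
    var hasAmdLoader = typeof root.define === "function" && !!root.define.amd;
    return hasAmdLoader && !!root.USE_REQUIREJS;
}

// Stand-in for the define function that an AMD loader installs.
function fakeDefine() {}
fakeDefine.amd = {};

// A migrated page: loader present and the template set the flag to true.
var migratedPage = { define: fakeDefine, USE_REQUIREJS: true };

// The form builder's host page: a loader is present because the form
// builder brings its own, but the page itself is not migrated.
var formBuilderHostPage = { define: fakeDefine, USE_REQUIREJS: false };

// An ordinary legacy page: no loader at all.
var legacyPage = { USE_REQUIREJS: false };

var results = [
    isRequireJSPage(migratedPage),
    isRequireJSPage(formBuilderHostPage),
    isRequireJSPage(legacyPage),
];
```

Only the first page is treated as a RequireJS page; the two-part check alone would have misclassified the second.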

Once the entire RequireJS migration is complete, hqDefine will be deleted altogether and all calls to it will be replaced with define, without needing to change any parameters. This isn’t perfect – it’d be better to not have to do cleanup on the migrated modules – but it can at least be done automatically.
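As a sketch of what that automated cleanup might look like, the function below rewrites hqDefine calls in a string of source code. The function name `replaceHqDefine` is hypothetical, and a plain regular expression like this would also rewrite matches inside comments or strings, so a real cleanup would need a review pass or a proper parser.

```javascript
// Rewrite calls to hqDefine as calls to define, leaving the parameters
// untouched. The word boundary keeps longer identifiers that merely end
// in "hqDefine" from being rewritten.
function replaceHqDefine(source) {
    return source.replace(/\bhqDefine(\s*\()/g, "define$1");
}

var before = 'hqDefine("myApp/js/myModule", ["jquery"], function ($) {});';
var after = replaceHqDefine(before);
```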

Internal dependencies

Non-RequireJS modules reference other internal modules with hqImport, which reads from the global COMMCAREHQ_MODULES. Modules that have been migrated to RequireJS don’t use hqImport, but hqDefine needs to accommodate them when they’re used in a non-RequireJS context. Since a migrated module declares its dependencies, in a non-RequireJS context hqDefine gets those dependencies from COMMCAREHQ_MODULES and provides them to the module function:

(function(factory) {
    if (typeof define === 'function' && define.amd && window.USE_REQUIREJS) {
        define(path, dependencies, factory);
    } else {
        var args = [];
        for (var index = 0; index < dependencies.length; index++) {
            var dependency = dependencies[index];
            if (thirdParty.hasOwnProperty(dependency)) {
                args[index] = thirdParty[dependency];
            } else if (COMMCAREHQ_MODULES.hasOwnProperty(dependency)) {
                args[index] = hqImport(dependency);
            }
        }
        if (!COMMCAREHQ_MODULES.hasOwnProperty(path)) {
            if (path.match(/\.js$/)) {
                throw new Error("Error in '" + path + "': module names should not end in .js.");
            }
            COMMCAREHQ_MODULES[path] = factory.apply(undefined, args);
        }
        else {
            throw new Error("The module '" + path + "' has already been defined elsewhere.");
        }
    }
}(moduleAccessor));

If a module is defined before all of its dependencies have been defined, the unmet dependencies will be passed along as undefined. Because hqImport is not used by migrated modules, it is no longer possible for a migrated module on an unmigrated page to access dependencies after it has been defined. This makes our not-yet-migrated pages even more dependent on script tag ordering than they previously were.
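The failure mode is easy to reproduce in isolation. The sketch below uses a simplified hqDefine that only models the non-RequireJS branch, without third-party handling or error checks, and two hypothetical modules whose script tags are assumed to be in the wrong order.

```javascript
var COMMCAREHQ_MODULES = {};

// Simplified: only the non-RequireJS branch of hqDefine.
function hqDefine(path, dependencies, moduleAccessor) {
    var args = [];
    for (var index = 0; index < dependencies.length; index++) {
        var dependency = dependencies[index];
        if (COMMCAREHQ_MODULES.hasOwnProperty(dependency)) {
            args[index] = COMMCAREHQ_MODULES[dependency];
        }
        // Otherwise args[index] is never set and the parameter is undefined.
    }
    COMMCAREHQ_MODULES[path] = moduleAccessor.apply(undefined, args);
}

// Script tags in the wrong order: the page module runs before its helper.
hqDefine("app/js/page", ["app/js/helper"], function (helper) {
    return { helperWasAvailable: typeof helper !== "undefined" };
});
hqDefine("app/js/helper", [], function () {
    return { help: function () { return "helping"; } };
});

// The page module captured `undefined` and has no way to recover later.
var helperWasAvailable = COMMCAREHQ_MODULES["app/js/page"].helperWasAvailable;
```

No error is thrown when the page module is defined; the problem only surfaces later, when the module first tries to use the missing dependency.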

We’re accepting this additional brittleness as part of the migration. When we introduced hqDefine and hqImport, we did not decide on a convention about when to import modules. As a result, when hqImport is used inside of a callback or document ready handler, it’s not clear whether that was a deliberate decision.

As part of the migration, we’re introducing a convention where dependencies are declared when a module is defined, and additional modules are required ad hoc only when necessary, typically to avoid a circular dependency. This is the same convention that we use with Python imports. At this point, we don’t even know if our JavaScript has circular dependencies. If it does, we’ll need to reintroduce a version of hqImport that delegates to require on RequireJS pages and to the old hqImport behavior on non-RequireJS pages.
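We haven’t written that version of hqImport, but one possible shape is sketched below. Everything here is an assumption: the factory `makeHqImport` and its parameters are hypothetical, `amdRequire` stands in for RequireJS’s require function (whose single-string form returns a module that has already been loaded), and `registry` stands in for COMMCAREHQ_MODULES. The fake loader exists only so the sketch runs without a browser.

```javascript
// Hypothetical factory for a dual-mode hqImport.
function makeHqImport(useRequireJS, amdRequire, registry) {
    return function hqImport(path) {
        if (useRequireJS) {
            // RequireJS page: delegate to the loader.
            return amdRequire(path);
        }
        // Non-RequireJS page: the old hqImport behavior.
        if (typeof registry[path] === "undefined") {
            throw new Error("The module '" + path + "' has not yet been defined.");
        }
        return registry[path];
    };
}

// Stand-ins so the sketch runs without a browser or RequireJS.
var loadedAmdModules = { "app/js/a": { source: "amd" } };
var fakeAmdRequire = function (path) {
    if (!loadedAmdModules.hasOwnProperty(path)) {
        throw new Error("Module '" + path + "' has not been loaded yet.");
    }
    return loadedAmdModules[path];
};
var legacyRegistry = { "app/js/a": { source: "legacy" } };

var importOnRequireJSPage = makeHqImport(true, fakeAmdRequire, legacyRegistry);
var importOnLegacyPage = makeHqImport(false, fakeAmdRequire, legacyRegistry);
```

A module breaking a circular dependency would call this hqImport inside a function rather than at definition time, on either kind of page.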

Global external dependencies

Legacy code depends on global variables for a few core third-party libraries: jQuery, knockout, and underscore. All of these are compatible with RequireJS, so migrated modules treat these previously “special” libraries like any other dependency. This creates a problem for migrated modules being used on a non-RequireJS page:

hqDefine("myApp/js/myModule", [
   'jquery'
], function(
   $
) {
   $("#thing").doStuff();
});

On a non-RequireJS page, $ is the global jQuery object. But the module function declares $ as a parameter, so inside the module, $ will be whatever the function’s caller passes as the first argument. That call happens in hqDefine, so, based on the previous code, $ would be undefined, since 'jquery' isn’t a property of the COMMCAREHQ_MODULES object, which only tracks internal modules. hqDefine needs to handle these special third-party modules:

function hqDefine(path, dependencies, moduleAccessor) {
   if (arguments.length === 2) {
      return hqDefine(path, [], dependencies);
   }

   var thirdParty = {
      'jquery': typeof $ === 'undefined' ? undefined : $,
      'knockout': typeof ko === 'undefined' ? undefined : ko,
      'underscore': typeof _ === 'undefined' ? undefined : _,
   };

   (function(factory) {
      if (typeof define === 'function' && define.amd && window.USE_REQUIREJS) {
         define(path, dependencies, factory);
      } else {
         var args = [];
         for (var index = 0; index < dependencies.length; index++) {
            var dependency = dependencies[index];
            if (thirdParty.hasOwnProperty(dependency)) {
               args[index] = thirdParty[dependency];
            } else if (COMMCAREHQ_MODULES.hasOwnProperty(dependency)) {
               args[index] = hqImport(dependency);
            }
         }
         COMMCAREHQ_MODULES[path] = factory.apply(undefined, args);
      }
   }(moduleAccessor));
}

Optimization and the build process

As part of the migration, we are shifting from Django-based JavaScript optimization – minification and concatenation – to the RequireJS optimizer. This is a large enough topic that it will be the subject of the next post.

Our migration process

Every individual page has to go through several changes to be fully migrated:

  1. Inline JavaScript moved to js files, as described in the last post.
  2. Logic encapsulated in one or more hqDefine modules, using hqImport to reference dependencies. This often goes smoothly, but areas that rely on global variables can get hairy.
  3. All dependent modules updated to explicitly declare their dependencies. This part is difficult to scope, since it isn’t obvious from looking at an old-style module how many dependencies there are to migrate.
  4. Single controlling module set using a requirejs_main tag in the Django template, and script tags removed from the Django template.

This isn’t a rigid series of steps. Steps 1 and 2 can happen in either order. It’s sometimes efficient to combine steps 2 and 3, or 3 and 4, or 2, 3, and 4. Because of this, and because this migration has happened in fits and starts, different areas of the codebase are in different stages without an obvious pattern.

However, it’s fairly easy to automatically determine how much work is left for any given step – grepping for inline script tags, running a script to find JavaScript files that don’t yet use hqDefine, or grepping for hqDefine calls that don’t pass an array of dependencies – and generate both a list of tasks and a metric for how close we are to being done.
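As an illustration of the kind of check involved, here is a rough sketch that classifies a file by its source text. The function name `migrationStage` and the stage labels are hypothetical, and the regular expressions assume the path is a plain string literal, so this is a heuristic rather than a description of our actual scripts.

```javascript
// Classify a JavaScript file's migration stage from its source text.
function migrationStage(source) {
    if (!/\bhqDefine\s*\(/.test(source)) {
        return "needs hqDefine";
    }
    // Migrated modules pass an array literal as the second argument.
    var withDependencies = /\bhqDefine\s*\(\s*(["'])[^"']+\1\s*,\s*\[/;
    return withDependencies.test(source) ? "declares dependencies" : "needs dependencies";
}

// Hypothetical file contents covering each stage.
var samples = {
    "old_script.js": "var x = 1;",
    "unmigrated.js": 'hqDefine("a/js/unmigrated", function () {});',
    "migrated.js": 'hqDefine("a/js/migrated", [\n    "jquery"\n], function ($) {});',
};

// Tally files per stage to get a simple progress metric.
var counts = {};
Object.keys(samples).forEach(function (filename) {
    var stage = migrationStage(samples[filename]);
    counts[stage] = (counts[stage] || 0) + 1;
});
```

Running something like this over the whole repository gives both a to-do list (the files in each stage) and a number to track over time.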

This flexibility has allowed us to make progress on this large migration in the face of perpetually changing resource constraints. Making one page one step better is usually a small task, so it’s fairly easy for an individual to pick up and put down this work without leaving something half-finished and needing to coordinate with others. Because some parts of the migration are typically simpler than others, easier tasks can be handled by developers with less experience or less comfort with JavaScript.

Once all pages are migrated, the final cleanup will follow:

  • Remove any boolean checks for requirejs_main
  • Replace all hqDefine calls with define calls
  • Delete hqDefine and hqImport

What’s next?

Now that we’ve looked at RequireJS-enabled modules and pages, and the process for converting legacy code, we’ll get into the system changes needed. Next up, we’ll talk about organizing modules into concatenated and minified bundles, integrating with a CDN, and fitting these new steps into the existing build process.

Written by
Jenny Schweers

Senior Engineer II
