Monday, June 25, 2012

ES Modules: suggestions for improvement

There has been a recent bout of comments about ECMAScript (ES) harmony modules on twitter and elsewhere. Here is my attempt to explain parts of it, some of the design tradeoffs, and perhaps a middle ground that would open up some options that may bridge some gaps.

Modules are one of those things that seem very simple, but involve quite a lot of decisions and tradeoffs. This post is mostly just about module linking and module ID resolution, and even with that, it is quite long.

If ES Modules do not come up with different ways to work (or maybe explain where I have it wrong), they are not competing well with what can be done with a combination of CommonJS/Node and AMD.

My background: I work on RequireJS and AMD.

What is it

First, some links to the specs. The "harmony" moniker means it is in process for the next version of the ECMAScript (JavaScript) language:


The module examples page is suggested if you want to get a feel for it, but it is good to read the other docs too. It can be a bit daunting though, unless you speak the spec language.

Points of reference

One way to evaluate how ES Modules works is to compare it to something you may already know:

  • CommonJS / Node. Node implements a version of the CommonJS module API.
  • AMD / RequireJS. RequireJS implements the AMD module API.

Run time vs compile time

ES is a "compile time" approach where the formats mentioned above are "run time" approaches. Maybe not precise terms, but here is a definition of what is meant by those terms for the purposes of this post:

"Compile time" means:

  • JS text is parsed, and the "module", "import", and "export" syntax is found.
  • Any dependencies are fetched and parsed.
  • Once the whole dependency tree has been fetched, the ES module loader will wire up the exports from a dependency to a module's "module" or "import" use, and do type checking on that export type and how it is referenced in the module.
  • The module code is then evaluated/executed.


"Run time" means: there is usually no pre-parse stage. The JS text is evaluated, and any module API that is encountered is run as it is encountered.

AMD will actually do a small parse step if the module looks like:

define(function (require) {
    var a = require('a');
});

In that case, it will parse the function to look for require() dependencies, and then load and execute the dependencies first before running the function above.
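
As a rough illustration of that small parse step, here is a minimal sketch that scans a factory function's source text for literal require('id') calls. The findRequireDeps name and the regular expression are assumptions made for this example; a real AMD loader is more careful (for instance, about comments in the source).

```javascript
// Hypothetical helper: collect the IDs of literal require('id') calls
// found in the source text of a factory function.
function findRequireDeps(factory) {
    var source = factory.toString();
    var pattern = /\brequire\s*\(\s*(["'])([^"']+)\1\s*\)/g;
    var deps = [];
    var match;
    while ((match = pattern.exec(source)) !== null) {
        if (deps.indexOf(match[2]) === -1) {
            deps.push(match[2]);
        }
    }
    return deps;
}

var deps = findRequireDeps(function (require) {
    var a = require('a');
    var b = require("lib/b");
    var name = 'c';
    var c = require(name); // not a string literal, so it is not detected
});

console.log(deps);
```

Only string-literal calls are found, which is why this style of scanning cannot support dependencies whose names are computed at run time.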

"Compile time" was chosen for ES because:

  • it is familiar from other scripting languages
  • paves the way for other possible static features, like macros
  • ensures that "import *" bindings are static, *not* dynamic
  • allows some type checking on the values that are explicitly "export"ed.
  • generally seen as safer and easier to reason about than run time.

The CommonJS/Node style of pure runtime, no parse, was hard to get to work with some edge cases as I understand it, but I heard that second-hand, I did not see that discussion.

Impact of compile time

For compile time to work well, it should use new keywords in the language, to have clear markers on what is participating in the module system.

Although, it could work with a module API instead of new syntax, by only recognizing literal use of that API and not supporting assignment of the API or dependencies to other names. This is what AMD does for the "sugared CommonJS form" mentioned above.

For "import *", static binding is critical because anything that is a runtime scope lookup gets into "with" territory, and "with" has been seen as a mistake by the committee. ES5's 'use strict' bars its use.

Since new syntax is involved, ES Modules cannot be "shimmed" into existing JS libraries. There is a Module Loader "runtime" registration call that can be used for a module to register itself, but it means those libraries cannot participate in the compile time linking stage, so they need to be pre-loaded by a script loader before an ES Module can effectively reference them with module syntax.


Module ID resolution

One other consideration, one that is usually overlooked when talking about modules, is how a module ID like "jquery" is resolved to a file path and loaded.

Both CommonJS/Node and AMD/RequireJS support "short, logical names" for dependencies. So, you can say require('jquery') and that jquery gets converted to a path using some algorithm. Node uses multiple paths to find jquery.js, and AMD in the browser relies on a declarative configuration to do so.

ES modules do not really have support for this, unless you also implement an imperative resolver. They support full URLs, like:

module foo at 'http://example.com/scripts/foo.js'

but we have found in AMD that it is useful to be able to say require('jquery'), but then declaratively map that to zepto.js. 

So, an individual module specifies a dependency on an API provider, but how that provider is satisfied is resolved using the declarative configuration.
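
A minimal sketch of what declarative resolution means here, assuming a simplified config with only a baseUrl and an exact-match paths table. The resolveId function and the file names are hypothetical; real loaders do more (prefix matching, for example).

```javascript
// Hypothetical resolver: map a short, logical module ID to a URL
// using only declarative data, no imperative hooks.
function resolveId(id, config) {
    var paths = config.paths || {};
    var target = Object.prototype.hasOwnProperty.call(paths, id) ? paths[id] : id;
    return config.baseUrl + target + '.js';
}

var resolverConfig = {
    baseUrl: 'scripts/',
    paths: {
        // Modules ask for 'jquery'; the app decides zepto satisfies it.
        jquery: 'lib/zepto'
    }
};

var jqueryUrl = resolveId('jquery', resolverConfig);
var mainUrl = resolveId('app/main', resolverConfig);
console.log(jqueryUrl, mainUrl);
```

The module that asks for 'jquery' never changes; only the configuration data does.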

If there is only an imperative resolve API, and no simple declarative API to resolve short names, it will mean shipping a userland "loader library" to effectively use modules. This opens the door to balkanization in module ID resolution since there is no built-in support.


Special factors in JavaScript

There are a few special factors with JavaScript that are not usually in other programming languages, and they have an impact on the design:

  • The largest deployed use case of JavaScript, the browser, is async, network IO. File size and number of requests are very important to performance. So combining modules together into one file, and minifying/transforming the source for smaller delivery is common.
  • There is a large legacy codebase of browser-based JavaScript that just uses browser globals, and no real module format. Some small uses of JavaScript do not need modules, and browsers will support those use cases indefinitely.

My goals

I want AMD and RequireJS to go away.

They solve a real problem, but ideally the language and runtime should have similar capabilities built in.

Native support should be able to cover the 80% case of RequireJS usage, to the point that no userland "module loader" library should be needed for those use cases, at least in the browser.

If the ES module format requires a web developer to use a script loader to use existing, non-AMD/CommonJS, non-ES JS in a project for those 80% use cases, it is a failure.

Example: If I cannot use jquery and backbone in an ES 6 module without needing another library to preload or prep those libraries for ES 6 module use, then existing JS users will not see much advantage over using AMD.

If the web developer needs to code any imperative logic to wire up the ES Module Loader, that will result in a loader library. That is a failure condition.

As compared to AMD: if the ES approach cannot do the above without a helper loader library and the ES approach does not allow something like loader plugins, then there is no contest -- AMD will still be more useful to a developer than the built in system. Small savings in the amount to type and a thin layer of type checking is not enough.

This may very well not be the goal of ES modules, and it would be great if the specs or some background material acknowledge that, and list out the mitigation strategies developers are expected to use.

Shortcomings of ES modules

Right now, ES harmony modules do not improve an AMD user's workflow because of the following:

New syntax makes it very hard to optionally upgrade

If I am the author of something like jQuery or Backbone, I cannot optionally add in a way to register as an ES module because ES modules use new syntax. However, there are many uses of those libraries which will not be in ES module-capable browsers.

The Node and AMD communities have found optional opt-in via a runtime API very useful for adoption: the code works with their module systems, but still works with the older "use plain script tags with browser globals" approach.

There is a runtime API in the ES module loader proposal that would allow a legacy script to register something as a module, but that requires the end developer to use another script loader library to load that library so it can do that runtime call, then start loading ES module code.

The developer may as well just stick with AMD. Complexity has not been reduced.

Register module and a global


Backbone originally had trouble adopting AMD because if it called define() to register a module, they found other libraries, like Backbone plugins, would break. The plugins were expecting to find a Backbone global variable but when Backbone called define() it was not also exporting a global.

This same problem will exist in ES-mixed code. Any dynamic registration also needs to allow an export of a global so that downstream libraries will work until they are also converted to optional module registration.

There should be a migration path, one that allows gradual rollout of modules without requiring a project to go whole hog on module syntax.
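
To make the "register a module and also export a global" idea concrete, here is a minimal sketch of the optional registration pattern. The MyLib and 'mylib' names are made up for illustration; the only API assumed is the AMD define() function with its define.amd marker.

```javascript
// Sketch: a library that optionally registers with an AMD loader
// and always exports a global, so plugins expecting the global keep working.
(function (root, factory) {
    var lib = factory();

    // Optional module registration, only if an AMD loader is present.
    if (typeof define === 'function' && define.amd) {
        define('mylib', [], function () {
            return lib;
        });
    }

    // Always export the global too (hypothetical name).
    root.MyLib = lib;
}(typeof globalThis !== 'undefined' ? globalThis : this, function () {
    return { version: '0.0.1' };
}));

console.log(typeof MyLib, MyLib.version);
```

Downstream scripts that look for the global continue to work whether or not a module loader is on the page.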


Declarative module ID resolution

While I have made tools to allow a developer to "convert" an existing library to AMD, there are many developers that did not want to touch existing libraries. It makes it difficult to compare against new versions and there is a concern that the conversion introduces breaking scope changes (rightly so).

So the "shim" configuration for requirejs was introduced to allow specifying dependencies and an export value for JS code that does not call a module API. This has been well received in the community. More background on shim here.

"shim" with "paths" and "map" make it possible to declaratively set up a configuration that allows for one file IO lookup per module ID, an easy way to "shim" old libraries, and to load more than one version of a module for use by different modules. That covers the 80% case for using old and new code with a module loader.

Using a declarative configuration that is supported by the "default" module ID resolution mechanism in the language avoids having to ship a userland loader library for browser use. This is a big win because it will help kill AMD.
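
For readers who have not seen it, a configuration using "paths", "shim" and "map" might look roughly like the sketch below. The file names and the 'legacy/widget' module are hypothetical, chosen only to show the shape of the data.

```javascript
// Sketch of a declarative RequireJS-style configuration.
var loaderConfig = {
    baseUrl: 'scripts',

    // One lookup per ID: 'jquery' maps to exactly one file.
    paths: {
        jquery: 'lib/jquery-1.7.2'
    },

    // Describe scripts that do not call a module API:
    // what they depend on and which global they export.
    shim: {
        underscore: { exports: '_' },
        backbone: { deps: ['underscore', 'jquery'], exports: 'Backbone' }
    },

    // Let one module see a different version of 'jquery' than everyone else.
    map: {
        'legacy/widget': { jquery: 'lib/jquery-1.4.4' }
    }
};

// Only hand the config to the loader if RequireJS is actually on the page.
if (typeof requirejs === 'function') {
    requirejs.config(loaderConfig);
}
```

Everything here is data, so a build tool running outside the browser can read the same configuration.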

It is fine if the Module Loaders API still has an imperative API to set up different module ID resolution logic. That would allow Node to maintain its current multiple IO, nested directory lookup logic. However, the default should favor the harsher browser environment in such a way that an extra loader library is not needed.

Loader plugins

I can appreciate that supporting Loader plugins may seem out of scope for the default module loader, but they have been incredibly useful for AMD. Node has seen a use for them, even though they are done in a different way via require.extensions. They effectively allow use of transpiled languages.

I find the AMD loader plugins better than Node's approach because:

  • load behavior vs. file format: since a prefix is used on the resource ID instead of just using a file extension suffix, it allows multiple plugins to deal with the same type of file extension. For example, "text!index.html" and "template!index.html" can be used in the same app, the first one just giving the raw text, the second one "compiling" some text for use as a template. The developer, not the plugin provider, chooses the right use. It still allows "single extension" plugins too, and for those, they can omit the file extension in the ID, so no increase in ID length.
  • one IO lookup: For a resource ID "foo", node may do a lookup for "foo.js", "foo.coffee" and "foo.node". By specifying the loading mechanism via the prefix, it avoids multiple IO lookups, which are important for browser use. It also makes it clear what handles the loading.
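
A minimal sketch of the prefix idea, assuming a toy synchronous loader with an in-memory "file" table. The splitResourceId and loadResource helpers and the two plugin functions are hypothetical stand-ins, not the actual AMD plugin API, which is asynchronous.

```javascript
// Toy "file system" so the example is self-contained.
var files = { 'index.html': '<p>Hello {name}</p>' };

// Split "plugin!resource" into its two parts.
function splitResourceId(id) {
    var index = id.indexOf('!');
    if (index === -1) {
        return { plugin: null, resource: id };
    }
    return { plugin: id.slice(0, index), resource: id.slice(index + 1) };
}

// Two plugins that handle the same file extension in different ways.
var plugins = {
    text: function (resource) {
        return files[resource];
    },
    template: function (resource) {
        var source = files[resource];
        return function (data) {
            return source.replace(/\{(\w+)\}/g, function (whole, key) {
                return data[key];
            });
        };
    }
};

function loadResource(id) {
    var parts = splitResourceId(id);
    if (parts.plugin === null) {
        throw new Error('No plugin prefix in: ' + id);
    }
    return plugins[parts.plugin](parts.resource);
}

var rawText = loadResource('text!index.html');
var render = loadResource('template!index.html');
var rendered = render({ name: 'world' });
console.log(rawText, rendered);
```

The prefix names the loading behavior, so there is exactly one lookup for index.html and no guessing based on the file extension.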

AMD loader plugins can participate in build steps, so the "text!" plugin can inline the text as a module in a built file:

define('text!index.txt', function() {
   return 'hello world';
});

That is incredibly useful for getting good network performance in the browser.

Even for local file environments like Node, being able to combine all the assets for a program into one file is really great for distribution. It is conceptually simpler to reason about tracking one file vs. "nested directory of directory" installs. Think of it as a way to easily share shell scripts.

The middle way

For developer workflow, right now AMD is a better alternative than ES harmony modules, given the choices around compile time linking, new syntax, and the use of imperative ID resolution.

Here are some suggestions on how to allow some of the benefits of the compile time approach with the run time ones used by AMD. The goals are to reuse non-module code in modular systems, allow for a way to get some static version of import *, and perhaps even macros.

Fetch dependencies, execute, modify, execute

The core of the middle way for compile time vs run time:

  • do not force compile time operations to be all up front, before any evaluation.
  • evaluate dependencies before executing the current module.
  • provide an API for modules, not just new syntax

These are effectively what AMD does today, except it does not have a way to alter the AST before final execution. Well, an AMD loader could do that, but AMD loaders have traditionally avoided it. However, the harmony loader plugin I made effectively does this to support "import *". More below:

The ES module loader would operate like so:

  • Load JS text. Parse out dependency references.
  • Load dependencies, parse out their dependencies, load them, etc...
  • Before executing a given module, execute its dependencies, and wait for the dependencies to finish exporting their module values.
  • Take that exported value and if there is an "import *" in the current module, modify the AST of the current module such that it gets a locally bound variable for each of the dependency's own properties (those passing a hasOwnProperty check) that are known at that time. So, any properties added to the module after this point are not visible. This should avoid concerns about dynamic scope.
  • Once the module AST has been fixed up for any import *, then evaluate it.

When parsing out dependency references, look for any new keywords, but also any API that corresponds to that keyword. So, look for at('moduleID') for dependency references in addition to at 'moduleID'.
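
A naive sketch of that scan, treating the module source as plain text and matching both the keyword form and the API form. The findAtDeps name and the regular expression are assumptions for this example; a real loader would work from a proper parse rather than a regular expression.

```javascript
// Hypothetical scanner: find dependency IDs written either as
//   at 'moduleID'      (syntax form)
//   at('moduleID')     (API form)
function findAtDeps(source) {
    var pattern = /\bat\s*\(?\s*(["'])([^"']+)\1\s*\)?/g;
    var ids = [];
    var match;
    while ((match = pattern.exec(source)) !== null) {
        if (ids.indexOf(match[2]) === -1) {
            ids.push(match[2]);
        }
    }
    return ids;
}

var moduleSource = [
    "module foo at 'foo';",
    "var bar = at('bar');",
    "var label = 'format';"
].join('\n');

var atDeps = findAtDeps(moduleSource);
console.log(atDeps);
```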

The runtime API for the module should be something like at('moduleID') for dependencies and exports.propertyName for specifying export properties. I am not arguing for that specific API, just mentioning that there would be an API alternative to the new syntax. The API alternative does not need an import alternative though.

Since a module is executed before giving the exports to a module that depends on it, and since there is a runtime API for modules, then that allows existing JS code to opt-in to ES modules without getting bitten by new syntax.

Since all dependencies are executed before executing the current module, an "import *" can be supported, and I believe that would allow for macros later.

There are some limitations around circular dependencies, but they are still possible, and the restrictions are minor in comparison to allowing existing code to opt in to ES modules and still work in non-ES module environments.

Declarative configuration

Support something like the "paths", "map" and "shim" config as used in RequireJS. This allows easier use of old code, and scales up to very large code without requiring a developer to ship a library that sets up an imperative resolution API.

Support loader plugins

Since all dependencies are executed before executing the current module, it is easier to support loader plugins, as the loader will have the exported value for that plugin resource before running the current module.

These enable environment-based loading, like an "env!" plugin that can load a module for Node and a different API-compatible one for the browser. See also a "has!" plugin for feature detection-based loading, and plugins to enable transpilers.

Yes, it is more to sort out, but they provide a lot of benefit. AMD has already primed this pump. It even works with a build/optimization step for inlining resources.

Use string IDs for module identifiers

This allows the module references in dependencies to be the same as the ID that is inlined when modules are combined and named in built files. Right now it is weird to use a JS identifier, like module Foo {} to name a module, but then see module Foo at "Foo". It is hard to match up at "Foo" with module Foo.

The extreme positions

The following is based on my limited experience. I am not a language designer. I am but a simple plumber that uses the pipes that are available to build things. I may not have the right long term thinking involved, but I think the following would make the ES module proposal simpler.

To be clear though, I think the middle way above is enough to bridge the gap. Please, do not read the following and then discount the middle way. The middle way is separate from these more extreme measures.

No new syntax

If there is a runtime API available to allow existing code to opt in, just shed the new syntax. Just have one way to do it via API that can also then be shimmed.

no import *

import * makes it difficult to determine where code comes from. If this type of construct is allowed:


module foo {
   var sin = function () {};


   module bar {
       import * from "Math";
       sin();
   }
}

for a minifier, it now needs access to Math to do its work correctly. This has not been the case in the past. It would suck to need all of the code for all the modules used in a system just to complete a minifier pass.

For developers, if you have two modules that do import * it can be difficult to know where something comes from.

Destructuring provides enough benefit for these use cases, just do the comma separated list for things you really use:

   import {sin, cos} from "Math";

import * is a bad pattern and it does not save much.

If you get rid of import * and use the "middle way" of evaluating modules, then regular var/let-based destructuring is enough, and there is no need for an import keyword.
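
A small sketch of what that could look like, with a hypothetical at() function standing in for the runtime module API and a hand-built "Math" module so the example runs on its own:

```javascript
// Hypothetical registry and at() API, only for illustration.
var moduleTable = {
    Math: { sin: Math.sin, cos: Math.cos, tan: Math.tan }
};

function at(id) {
    if (!Object.prototype.hasOwnProperty.call(moduleTable, id)) {
        throw new Error('Unknown module: ' + id);
    }
    return moduleTable[id];
}

// Plain destructuring: the names in scope are listed right here,
// so a reader (or a minifier) does not need the "Math" source.
var { sin, cos } = at('Math');

console.log(sin(0), cos(0));
```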

No macros

Similarly, rethink the need for macros long term. They suffer from the same "where did this come from" problem as import * does. The function capabilities in JavaScript are good enough to get the job done for the "don't repeat yourself" task.

A way forward for today's code

The nice thing is that we can prototype this new world by combining what CommonJS/Node does today with AMD. So we can just use the require() and define() as used today to get there. The ES committee does not have to ratify it, and we get the benefit of real world implementation and use before committing to default language support.

Cajon is my attempt from the AMD side to bridge the gap with plain Node code and a runtime browser loader. LinkedIn's Inject is another AMD loader that uses a similar approach. So, just use CommonJS/Node modules in the browser in dev, use the r.js optimizer to compile down to AMD for final deployment.

The cjsTranslate capability in the r.js optimizer lets a developer who always likes to do builds, even in dev, code in Node syntax but output AMD, and load it in the browser either with the small Almond AMD shim or with the full dynamic loader, RequireJS. Or choose Dojo or curl.js.
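
The general shape of that kind of translation can be sketched as a simple text wrap. This is not the r.js implementation, only an illustration of the idea; wrapAsAmd and stubDefine are hypothetical names, and stubDefine is a toy stand-in for a real AMD loader.

```javascript
// Wrap Node-style module text in an AMD define() call.
function wrapAsAmd(source) {
    return 'define(function (require, exports, module) {\n' + source + '\n});\n';
}

var nodeStyleSource = [
    "var greet = require('greet');",
    "module.exports = greet('world');"
].join('\n');

var wrapped = wrapAsAmd(nodeStyleSource);

// Toy loader: runs the factory with a require() that knows one module.
var translatedResult;
function stubDefine(factory) {
    var module = { exports: {} };
    var require = function (id) {
        if (id === 'greet') {
            return function (name) { return 'hello ' + name; };
        }
        throw new Error('Unknown module: ' + id);
    };
    factory(require, module.exports, module);
    translatedResult = module.exports;
}

// Evaluate the wrapped text with our stub bound to the name "define".
new Function('define', wrapped)(stubDefine);
console.log(translatedResult);
```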


browserify can be updated to use AMD as its transport format instead of its home-grown require.define() API, and then not have to ship a loader, but use one of the AMD loaders/API shims. browserify is nice in that, unlike the r.js optimizer+cjsTranslate, it provides browser module shims for the native node modules. It would be great to break those out as a separate project that could be consumed by a project just using the r.js optimizer.


If Node adds define() support, callback-require for use within a module for dynamically calculated dependencies, and supports at least a limited form of loader plugins, then we're done. The amdefine project is an implementation proof of that support. There are details to sort out, but it is doable. If any Node committers are interested, give me a holler. We can work out the details.

Summary

For developer workflow, the current ES module spec is not competing well with a combination of CommonJS/Node and AMD with loader plugins. Or even just AMD with loader plugins.

Using the middle way for module execution and getting a good declarative module ID to path configuration in the ES spec will level the playing field. Add loader plugins to get language transpiler support and environment/feature detection loading that is efficient for the browser.

I have given some of this feedback to the es-discuss list, but I think some of it, in particular the "middle way" module evaluation flow, got lost in my poor communication where it seemed like I was proposing a dynamically scoped import *. Hopefully this post clarifies what I was trying to achieve with that earlier feedback.

Finally, I appreciate working on the ES committee is very difficult work. I do not envy them. I do not mean for this feedback to come across harshly, but the committee is running out of time, and I do not feel like it has made the case very well for how what is being proposed is better than what we have cobbled together with existing technology. To be clear, I want an ES Modules proposal to succeed because I do not want to do AMD or RequireJS forever. Hopefully this feedback can be viewed as loyal opposition, and as a challenge to do better, or at least to do it in a way that is explained more clearly.

7 comments:

Patrick Mueller said...

re: loader plugins

I can see loader plugins have some value. It's nice to be able to load a non-code resource file as data. I really just disagree that it be tied into the module system, as this is just an additional complication.

Create a new sub-system for loading resources. Separation of concerns. Maybe it ties into a module loading story at a lower level; maybe the module loading story USES a resource loading story.

But separate the APIs that the user has to use.

Plugins that are merely event notifiers (domReady!, or whatever) are also bad in that now there are events I have to deal with. Again, separate that out, there are already plenty of decent event handling stories available. And again, this may impact module loading in that you want events thrown out of the module loader - no problem (I want that too).

But separate the APIs that the user has to use.

James Burke said...

Patrick Mueller: I can appreciate wanting to make it separate, but when looking at supporting transpiled languages, and environment detections for loading, it seems to make more sense as part of module loading.

In addition, it reduces the pyramid of callback doom, since many modules need these resources to be complete. By breaking it out as a separate API, it leads to more callback-based APIs that are harder to use.

I'm open to seeing more specifics on how that might be broken out or at least limited to provide some of the above benefits without being too broad.

Some good test cases to solve:

* transpiler plugins, how would a coffeescript one work
* how to allow environment-based loading, "load this dependency in node, this one in the browser".

For that second one, dynamic require calls:

if (inNode) {
require('nodeModule');
} else {
require('browserModule');
}

cannot be supported in the browser environment for dynamic loading, given the execution model above.

Patrick Mueller said...

transpiler plugins:

compile ahead-of-time (a build). Why force the browser to compile your CoffeeScript/whatever?

how to allow environment-based loading, "load this dependency in node, this one in the browser":

compile ahead-of-time (a build). One for node, one for the browser, one for Rhino/JVM, etc.

James Burke said...

Patrick Mueller: I'm sure you know I think mandating a build/processing step before running is a bad design goal. The adoption of AMD is hopefully an existence proof of that.

Node does not require that for its coffeescript/transpiler support. It is done "on the fly" too.

Patrick Mueller said...

If the build/processing step is nearly instantaneous, then who cares if there's a build/processing step.

I think we need better/easier builders, but I'm happy with the ones I build by hand w/grunt/make/node-supervisor; 1-second turn-around time from source change to server restarted. Running CoffeeScript compiler, copying files, generating new files with some in-project tooling, running browserify.

Now, I only ever have one thing to test - "the build". Not, the "dev version" and then - hopefully, sometime later - "the build". Sadly, lots of people don't even bother with the build step anymore because AMD is "fast enough". :-)

Unknown said...

I'll leave it to the Harmony module champions to respond in full. But there was one point I wanted to bring up.

From your post, I think you understand that in the Harmony proposal, a module id such as "jquery" could be used in an import statement such as

import $ from "jquery";

And that it would be up to the module loader to resolve it. From my reading of your post, I think one of the reasons you find this unsatisfactory is because it requires prior loading of a module loader that knows how to register and resolve such module id's.

However, there is another dimension that you don't talk about in the post. The Harmony proposal is defining module behavior that must be present in all implementations of the language in all environments. It doesn't say anything specific about the browser environment or the Node environment or any other. Those are all outside of scope of Ecma-262.

It's my understanding that the harmony module proposal allows for a host environment to provide its own default module loader. Given that, there is nothing that precludes the supplemental standardization of a default module loader for browsers or for Node or for anything else. Such a standard could address issues of concise module ids, registration, etc. and could be supported without the need for explicitly loading the module loader.

I know this doesn't address all your issues, but I think it is important for everyone to realize that there is some environmental flexibility in the Harmony module proposal and that hosting details for the browser and other common environments still need to be worked out.

James Burke said...

Allen Wirfs-Brock: thanks for the clarification.

While I recognize the current spec has that flexibility, I am not sure it is good to offload deciding on a default resolution implementation.

Since I want named modules to have the same name as what is in their dependency string, this implies an optimization tool that can treat the names the same. It is unclear to me how an optimization tool that runs in Node for instance could opt to choose the "browser" resolution rules.

Ideally there is a default resolution policy that is tailored for the special constraints of browser loading and the use of optimization tools that normally do not run in the browser.

I'm fine allowing an API that allows other resolution overrides, but I believe it will be an overall simpler system if a default is defined as part of the language.

Maybe there is another way to do it though. But at first thought, pushing the responsibility onto other, disparate standards bodies/implementations does not seem to make the system easier overall to use and to share built code between systems.