Tuesday, March 30, 2010

CommonJS Module Trade-offs

First of all: why should you care about module formats?

If you use JavaScript, particularly in the browser, more is being expected of you each day. Every site or webapp that you build will want to do more things over time, and browser engines are getting faster, making more complex, web-native experiences possible. Having modular code makes it much easier to build these experiences.

One wrinkle, though: there is no standard module format for the browser. There is the very useful Module Pattern, which helps encapsulate code to define a module, but there is no standard way to indicate your module's dependencies.

I have been following some of the threads in the CommonJS mailing list about trying to come up with a require.async/ensure spec and a Transport spec. The reason those two specs are needed in addition to the basic module spec is because the CommonJS module spec decided to make some tradeoffs that were not browser-friendly.

This is my attempt to explain the trade-offs the CommonJS module spec has made, and why I believe they are not the right trade-offs. The trade-offs end up creating a bunch of extra work and gear that is needed in the browser case -- to me, the most important case to get right.

I do not expect this to influence or change the CommonJS spec -- the developers that make up most of the list seem to generally like the module format as written. At least they agreed on something. It is incredibly hard to get a group of people to code in a certain direction, and I believe they are doing it because they love coding and want to make it easier.

I want to point out the trade-offs made though, and suggest my own set of trade-offs. Hopefully by explicitly listing them out, other developers can make informed choices on what they want to use for their project.

Most importantly, just because "CommonJS" is used for the module spec, it should not be assumed that it is an optimal module spec for the browser, or that it should be the default choice for a module spec.

Disclosure: I have a horse in this race, RequireJS, and much of its design comes from a different set of trade-offs that I will list further down. I am sure someone who prefers the CommonJS spec might have a different take on the trade-offs.

To the trade-offs:

1) No function for encapsulating a module.

A function around a module can seem like more boilerplate. Instead each module in the CommonJS spec is just a file. This means only one module per file. This is fine on the server or local disk, but not great in the browser if you want performance.

2) Referencing and loading dependencies synchronously is easier than asynchronously

In general, sync programming is easier to do. That does not work so well in the browser though.
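To make the first two trade-offs concrete, here is a rough sketch of a CommonJS module file, with an illustrative "logger" dependency:

// foo.js -- in the CommonJS spec, this entire file is the "foo"
// module: one module per file.

// require() is synchronous: execution blocks here until "logger" has
// been loaded, which maps poorly onto the browser's asynchronous
// script loading.
var logger = require("logger");

logger.debug("defining foo");

// the module's public value is built up on "exports" (see point 3)
exports.name = "foo";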

3) exports

How do you define the module value that other modules can use? If a function were used around the module, its return value could serve as the module definition. However, avoiding a function wrapper complicates setting up a return value. The CommonJS spec instead uses a free variable called "exports".

The value of exports is different for each module file, and it means that you can only attach properties to the exports object. Your module cannot assign a new value to exports.

It means you cannot use a function as the module value. Some frameworks use constructor functions as their module values -- these will not be possible in CommonJS modules. Instead you will need to define a property on the exports object that holds the function. More typing for users of your module.
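A minimal sketch of the limitation:

// attaching a property to exports works:
exports.Widget = function (name) {
    this.name = name;
};

// but assigning a new value to exports does not -- it only rebinds
// the local variable, so other modules never see the function:
exports = function (name) {};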

Using an exports object has an advantage: you can pass it to circular dependencies, and it reduces the probability of an error in a circular dependency case. However, it does not completely avoid circular dependency problems.

Instead, I favor these trade-offs:

1) Use a function to encapsulate the module.

This is basically the core of the previously-mentioned Module Pattern. It is in use today, it is an understood practice, and functions are at the core of JavaScript's built-in modularity.
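For reference, a minimal sketch of the Module Pattern:

var logger = (function () {
    // private state, hidden inside the closure
    var prefix = "LOG: ";

    // the returned object is the module's public value
    return {
        debug: function (message) {
            console.log(prefix + message);
        }
    };
}());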

While it is an extra function(){} to type, it is fairly standard to do this in JavaScript. It also means you can put more than one module in a file.

While you should avoid multiple modules in a file while developing, being able to concatenate a bunch of modules together for better performance in the browser is very desirable.

2) Assume async dependencies

Async performs better overall. While it may not help performance much in the server case, making sure a format performs well out of the box in the browser is very important.

This means module dependencies must be listed outside the function that defines the module, so they can be loaded before the module function is called.

3) Use return to define modules

Once a function is used to encapsulate the module, the function can return a value to define the module. No need for exports.

This fits more naturally with basic JavaScript syntax, and it allows returning functions as the module definition. Hooray!

There is a slightly higher chance of problems in circular dependency cases, but circular dependencies are rare, and usually a sign of bad design. There are valid cases for having circular dependencies, but the cases where a return value causes a problem in a circular dependency are very rare, and they can be worked around.

If getting function return values means a slightly higher probability of a circular dependency error (which has a mitigation), then that is a good trade-off.

This avoids the need for the "exports" variable. That is fairly important to me, because exports has always looked odd, like it did not belong. It requires extra discovery to learn its purpose.

Return values are more understandable, and allowing your module to return a function value, like a constructor function, seems like a basic requirement. It fits better with basic JavaScript.

4) Pass in dependencies to the module's function wrapper

This is done to decrease the amount of boilerplate needed with function-wrapped modules. If this is not done, you end up typing the dependency name twice (an opportunity for error), and the code does not minify as well.

An example: let's define a module called "foo", which needs the "logger" module to work:

require.def("foo", ["logger"], function () {

//require("logger") can be a synchronous call here, since
//logger was specified in the dependency array outside
//the module function
require("logger").debug("starting foo's definition");

//Define the foo object
return {
name: "foo"
};
});
Compare with a version that passes in "logger" to the function:

require.def("foo", ["logger"], function (logger) {

//Once "logger" module is loaded it is passed
//to this function as the logger function arg
logger.debug("starting foo's definition");

//Define the foo object
return {
name: "foo"
};
});

Passing in the module has a circular dependency hazard -- logger may not be defined yet if it is part of a circular dependency. So the first style, using require() inside the function wrapper, should still be allowed. For instance, calling require("logger") inside a method that is created on the foo object could be used to avoid the circular dependency problem.
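A sketch of that mitigation:

require.def("foo", ["logger"], function () {
    return {
        name: "foo",
        report: function () {
            // fetched at call time, after both modules have had a
            // chance to register, so a circular dependency between
            // foo and logger is not a problem here
            require("logger").debug("foo is fine");
        }
    };
});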

So again, I am making a trade-off where the more common, useful case is easier to code, versus increasing the probability of circular dependency issues. Circular dependencies are rare, and the above has a mitigation via the use of require("modulename").

There is another hazard with naming an arg in the function for each dependency: you can get an off-by-one problem:

require.def("foo", ["one", "two", "three"], function (one, three) {
    //In here, three is actually pointing to the "two" module
});

However, this is a standard coding hazard: failing to match input args to a function. And there is a mitigation: you could use require("three") inside the module if you wanted.

The convenience and reduced typing of having the argument be the module is useful. It also fits well with JSLint -- using the argument name inside the function lets JSLint help catch spelling errors.

5) Code the module name inside the module

To define the foo module, the name "foo" needs to be part of the module definition:

require.def("foo", ["logger"], function () {});
This is needed because we want the ability to combine multiple module definitions into one file for optimization. In addition, there is no good way to match a module definition to its name in the browser without it.
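For example, a sketch of an optimized file with two modules concatenated together (module contents are illustrative):

// without the "logger" and "foo" names, the loader could not tell
// which definition belongs to which module
require.def("logger", [], function () {
    return {
        debug: function (message) {
            console.log(message);
        }
    };
});

require.def("foo", ["logger"], function (logger) {
    logger.debug("defining foo");
    return { name: "foo" };
});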

If script.onload fired exactly after the script is executed, not having the module name in the module definition might work, but that is not the case across browsers. And we still need to allow the name to be there for the optimization case, where more than one module is in a file.

There is a legitimate concern that encoding the module name in the module definition makes it hard to move around code -- if you want to change the directory where the module is stored, it means touching the module source to change the names.

While that can be an issue, in Dojo we have found it is not a problem. I have not heard complaints about that specific issue. I am sure it happens, but the cost to fix it is not that onerous. This is not Java. And YUI 3 does something similar to Dojo, encoding a name with the module definition.

I think the rarity of this issue, and the one-time cost of fixing it when it happens, are preferable to forcing every browser developer to take on the extra, ongoing costs of using the CommonJS module format in the browser.

Conclusion

Those are the CommonJS trade-offs and my trade-offs. Some of them are not "more right" but simply preferences, as with any language design. However, the lack of browser support in the basic module spec is very concerning to me.

In my eyes, the trade-offs CommonJS has made put more work on browser developers, who must navigate more specs and use more gear to get it all to work. Adding more specs that allow modules to be expressed in more than one way is not a good solution for me.

I see it as the CommonJS module spec making a specific bet: treating the browser as a second class module citizen will pay off in the long run and allow it to get a foothold in other environments where Ruby or Python might live.

Historically, and more importantly for the future, treating the browser as second class is a bad bet to make.

All that said, I wish the CommonJS group success, and there are lots of smart people on the list. I will try to support what I can of their specs in RequireJS, but I do feel the trade-offs in the basic module spec are not so great for browser developers.

14 comments:

Kris Zyp said...

Good review, but I am curious, you said: "circular dependencies are ... usually a sign of bad design". What does this belief originate from?

James Burke said...

Kris Zyp: hopefully I indicated that there are valid use cases for circular dependencies.

However, I have come across some, and made some myself, that were just bad design. I think it is always good when hitting a circular dependency to do a critical review of it to make sure it makes sense.

The larger issue I was trying to point out was that circular dependency cases are a small subset of module cases, and good circular dependencies being an even smaller subset.

Circular dependencies need to be allowed in a module system, but a trade-off that makes the whole set of module cases more awkward, just to decrease the probability of an error in a circular dependency case, is not a trade-off I would make.

Chris Barber said...

I for one think the exports thing is absolutely stupid. I hate the syntax. It's mandatory that I be able to define more than one module in a specific file.

I guess I've been spoiled by Dojo and its module system. I can define as many objects as I want in a single file. I also have been able to access resources globally.

But async changes the game. You pretty much need to wrap your object definition in a function that can be called after dependencies have been loaded.

I think doing a require.def("foo", ["logger"], function (logger) { return { name: "foo" }; }); is the best solution for the web I've seen so far.

To accommodate defining multiple modules per file, I suppose you could just have a bunch of require.def() calls in a single file, but then you'll have to make sure they're in the right order. If foo required bar, bar would have to be defined first so that require doesn't try to load a non-existent bar module.

I bet that use case is pretty rare. In dijit, I see instances where a class to instantiate is referenced by a string name. For example, dijit.layout.BorderContainer has an attribute "_splitterClass" that defaults to dijit.layout._Splitter. The _Splitter object is defined after the BorderContainer, but isn't instantiated until a BorderContainer instance is created.

I'm alright with that, and it also makes it so you can override the _Splitter object. The BorderContainer would then just need to do a simple require(this._splitterClass).

After playing around with node.js, I'm afraid that JavaScript on the server, which I wanted so bad, is going to suck in the end.

In conclusion, requirejs kicks ass and I'm unfortunately less than enthusiastic about the future CommonJS conventions.

James Burke said...

Chris Barber: For build bundles that include multiple require.def calls, I use a require.pause() at the start of the require.def calls, then a require.resume() at the end, to pause and then resume dependency tracing until all the require.def modules have registered.
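A rough sketch of such a built file (module contents are illustrative):

// concatenated require.def calls bracketed by pause/resume, so
// dependency tracing waits until every module here has registered
require.pause();

require.def("foo", ["bar"], function (bar) {
    return { name: "foo" };
});

require.def("bar", [], function () {
    return { name: "bar" };
});

require.resume();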

There are other ways to do it, but I find this the simplest: just concat the files together and bracket them with pause and resume. It also allows JS files that do not participate in require.def semantics to be included without changing their scope.

It is a nice way to mix and match more traditional files with a modular system.

I was hoping to put RequireJS into Node, but unfortunately it would be a fairly invasive patch, from what I can tell so far. So not sure it makes sense yet.

Julian said...

Thank you for the interesting article.

I too prefer to use functions to hold my modules, as this also makes the module easier to load using a script tag without clogging up the global namespace.

I am open minded as to whether one uses "exports" or returns an object from the function, or even uses the module function as a constructor. In the end it seems all the same to me, just a matter of where and when the object is created. In fact I am playing around with using a Function as the "exports" Object, so that I can add additional processing, in the form:

function myExportedFunction()
{
}
exports(myExportedFunction);
exports(myExportedFunction, "myAlias");

// elsewhere
function exports(f, name)
{
    if (!name)
    {
        // derive the name from f's identifier to save extra typing
        name = f.name;
    }

    exports[name] = f;
    f.owner = exports;
}


I can't say I am a fan of the CommonJS module naming system, as it seems to link file paths and namespaces/module names in a complex way (IMHO). I prefer to maintain separation between (a) where my module is stored, (b) how it is identified (function identifier), and (c) how it is namespaced. My "require" function therefore accepts both a global identifier for a pre-loaded module function, and a URL which is asynchronously loaded.

In terms of asynchronous loading (rarely needed), I run a simple stack, whereby for each module, before loading, I inspect the source code of the module function, extract the requires, push the module on the stack, and then start loading the requires, which in turn may be pushed on the stack if they have requires. As each module comes in, it is popped off the stack and loaded.

James Burke said...

Julian: if you are using a function wrapper already for the module, it seems like more boilerplate to also carry around an exports function vs just using a return.

Either exports needs to be passed in to the function wrapper or it is a global. A global would mandate passing in the module's name as an argument to the global export.

Given that you need to name the module, then I prefer that just to be part of the function wrapper call, as I use in require.def.

As for module naming, using full URLs along with a name makes the module less portable. Most of the time the module's URL can be derived simply from the module name, and for the rarer cases where you need to map that name to a different path, I prefer to do those mappings at the top of the application, where that knowledge needs to live anyway.

On your async loading comment: it sounds like you use XMLHttpRequest then to fetch your modules and inspect the string source for the require calls? For me, that is not the most scalable/best performing/easy to debug solution available.

I appreciate some of this is personal preference though. Thanks for sharing!

Eric Leads said...

Every module loading method is going to have trade-offs. I agree with you that sacrificing async is absolutely not the right trade-off to make in a browser environment.

I thought about using RequireJS for a new client side framework, but I decided against it because modules can't self-describe without first loading the RequireJS library.

I know that may not seem like an important requirement to some, but IMO, dependency resolution should not lock you into any particular vendor.

For a good example of why not, take a look at the kiwi package manager for Node. For a time, it was the preferred package management system - however, in order to use it, you had to call a kiwi function from within your module.

Then a less obtrusive package manager came along (npm), and kiwi faded out of popularity. Now there are a ton of node modules with useless kiwi module loading calls.

A similar problem could crop up for those who depend on RequireJS.

For now I'm shuttling the dependency list out on a namespace similar to exports (in fact, it will use exports if it is available).

I don't just return a value, because my module wrapper function is anonymous with no assignment, so there's nothing to catch the return value.

The advantage to that is that the user can then assign your exported module to whatever namespace they like.

In other words, our module system meets these requirements:

* Self-contained - no need to load any vendor library to function.

* User-defined module namespace, similar to CommonJS require.

* Uses the standard module-pattern anonymous self-executing function that many JS coders should be familiar with (shared by libraries like jQuery).

* Loading our client-side architecture library is optional for the functionality of a module that doesn't have any dependencies.

* If our library is loaded, the page can take advantage of any common modules loaded or defined by the core library. It doesn't even have to be aware that they exist, or know whether or not the core lib actually loaded. Useful for common cross browser bug fixes, unobtrusive UI enhancements, analytics, multivariate UI testing, etc...

* Modules can define a callback to run after the core lib and all dependencies have been completely evaluated.

James Burke said...

Eric Leads: I agree that depending on one vendor can be hazardous. I am trying to work with the CommonJS group to work out a spec so that others can implement. Right now Nodules and the new dojo-sie loader both implement the core of the API that RequireJS uses, so hopefully multi-vendor support can grow that way too.

Since this post was written, Kris Zyp found a way to get anonymous modules to work, so now the module ID is not encoded in the define() call (used to be require.def()).

So I hope the API is moving forward.

Seems to me that you need a function entry point to a loader if you want generic, nested module dependencies to be traced dynamically vs. doing a build or server transform to make sure all dependencies are loaded.

Eric Leads said...

James - RequireJS modules will fail completely if define() does not exist. That is an unacceptable trade-off for me at least for several years going forward, because define() is not in common use today, and define() will not exist unless RequireJS or a compatible library has already been loaded.

In other words, modules that depend on define() cannot function in a stand-alone capacity. That's not a trade-off I could afford to make, for a variety of reasons. What if we decide to change module definition standards in the future? (Very likely as the specs for modules are still very young...) What if the CDN hosting the module library fails?

That single point of failure (either way) can break a whole lot of code.

My modules still attempt to make a function call to the core library that powers shared module loading and the async dependency loader -- but they have the ability to fail gracefully, and still do what they can when those dependencies are not available, and they do all that mostly by convention, with very little code.

They can also take advantage of a server-side build process to load their dependencies. The modules don't know the difference between modules loaded by the lib, and modules made available by a server-side build process.

You can do all that with an anonymous, self-executing function. You can't with define().
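A rough sketch of such a wrapper, with illustrative names (myCoreLib is hypothetical):

(function (global) {
    // the module works stand-alone, with no loader present...
    var widget = {
        render: function () { /* ... */ }
    };

    // ...but registers with the core lib when it happens to be loaded
    if (global.myCoreLib && typeof global.myCoreLib.register === "function") {
        global.myCoreLib.register("widget", widget);
    } else {
        // graceful fallback: expose a plain global
        global.widget = widget;
    }
}(this));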

James Burke said...

Eric Leads: OK, sounds like you want full control of the code so you implemented your own conventions and loader hook (if available). Glad to hear that works for you.

Eric Leads said...

James,

If RequireJS makes it into jQuery core, I'm on board. jQuery is always loaded on every page here. If we have JavaScript, we have jQuery.

My primary concern right now is that we're working on a huge codebase currently in production that is used by millions of visitors per month. If we adopt an intrusive module loader like RequireJS, and then jQuery adopts a different module loader, we are going to have a mandate to move to jQuery's standard module loader.

What are the chances this is going to land in jQuery officially?

That is the million-dollar question.

- Eric

James Burke said...

Eric Leads: I understand your concern.

I expect RequireJS will not get into jQuery, because ideally it should be used to load jQuery. jQuery is a great DOM and Ajax library, but it is just one part of building an application. If it tries to bundle too many things, particularly under the jQuery namespace, I think it will lose its effectiveness and easy appeal.

I also try to stay in contact with the jQuery project. One of the changes in jQuery 1.4.3 was the readyWait feature, which was designed so that jQuery plays well with script loaders like RequireJS. RequireJS 0.14.5+ now uses readyWait -- it has a special affordance for jQuery in it.

I submitted a ticket to the jQuery team that asked for that type of feature, and fortunately John Resig was already thinking of adding something like it, so it worked out.

I also take feedback from jQuery users seriously, and I am really trying to make it easy for them to upgrade to well-scoped modules.

When I talked to John Resig earlier this year, and from what he has said before, the kind of script loading he would consider for jQuery is a more constrained loading strategy. That may have changed since earlier this year, but I believe RequireJS gives more packaging options and allows for strong code encapsulation. I also introduced the priority config option based on the scenario that John described wanting to hit for script loading.

Of course if the jQuery team decided to bundle RequireJS with jQuery, that would be great. However, I am not expecting it. I am focused on trying to build a great script loader that works great in the browser and can leverage CommonJS code when running on the server.

The current Dojo codebase has switched to supporting the module API that RequireJS uses, so I believe it is gaining some traction.

So while I hope that RequireJS will be a "safe" choice to make, I can appreciate that it still may be early for some folks to make that choice.

I hope that by having the code open source and liberally licensed that no one would be stuck with a bad choice, they could always decide to modify to suit their needs.

I also believe the way it builds on the JavaScript Module Pattern and encourages *not* using global variables, that if something else were to come along it would be easy to migrate away since the code is well-encapsulated.

It sounds like you worked out a pattern that may be similar, and I definitely appreciate the control that gives you.

If you just have questions on code loading strategies for your own implementation, I am happy to discuss them, and if you ever decide to give RequireJS a try, I can help answer questions for that too.

Mike Koss said...

I actually think that returning the exports from the module definition closure has some problems. It mandates that the object returned by a require() call must be created by the defining module. That implies that load order is now important, and that you cannot have circular references between modules.

I prefer looser coupling with module load order, achieved by allowing forward references to modules that have yet to be defined.

I've open sourced a sample implementation of a CommonJS-compatible module definition library (namespace.js):

https://github.com/mckoss/namespace

It's truly tiny enough to be included in any other library, and simple enough to be broadly compatible between implementations.

James Burke said...

Mike Koss: I agree that returning from the definition factory function makes it hard to do some kinds of circular dependencies.

The latest version of the AMD proposal and RequireJS allow you to use "exports" by specifying it as a dependency, or by using the simplified CommonJS wrapper, define(function(require, exports, module){});
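For example, with illustrative module names:

// asking for "exports" explicitly means a partially-built exports
// object exists before this factory returns, so it can be handed
// to a circular dependency
define(["exports", "logger"], function (exports, logger) {
    exports.name = "foo";
});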

However, I have found exports to be only useful if specifying a circular dependency or using multiple modules to build up/enhance another module.

Those are not majority use cases for modules, and I do not believe that minority use cases should make the majority use cases more awkward. But I agree those patterns need to be allowed. Having those minority cases explicitly ask for exports seems like a decent compromise.