Tuesday, July 13, 2010

Simple Modules Feedback

Executive summary, with apologies to Jay-Z: "I got 99 problems but lack of lexical scoping ain't one".

While at the Mozilla Summit, I saw Dave Herman's presentation on a Simple Modules proposal for JavaScript/ECMAScript. Dave posted the slides, and be sure to read his follow-up post. I suggest you read his slides and blog posts first for some background.

[Sidenote: I'm going to use JavaScript instead of ECMAScript in this post -- JavaScript and I go way back, before it got its colonial, skin disease-inspired name.]

Simple Modules is a strawman proposal at the moment, it is still a work in progress. Some of the more interesting parts for me, the dynamic loading, are still very rough and in a separate proposal. It sounded like Dave wants to focus on prototyping the lexical scoping and static loading bits first before proceeding further on the dynamic bits. Great idea on actual prototyping, sounds like they will leverage Narcissus for doing the prototyping.

So some of my feedback may be a bit premature, but some of it gets to why there are modules and what should be allowed as a module, so hopefully that might be useful even at this early stage.

First, some perspective on where my feedback comes from: I am a front-end developer, I do web apps in the browser. I love JavaScript, I want to use it everywhere, and I believe that it is the only language that has the potential to be used effectively anywhere.

However, that is only because JavaScript is available and works well in the browser. Any new solutions for modules should *work well* in the browser to be considered a solution. The browser environment should be treated as a first class citizen, keeping in mind the browser performance implications on any approach. This is one of my main criticisms of CommonJS modules, and the reason I write RequireJS, a module loader that works well in the browser. I also maintain Dojo's module loader and build system.

Why

Why have modules? What are they? Modules are smaller units of code that help build up larger code structures. They make programming in the large easier. They usually have specific scope, and avoid dumping properties into the global scope. Otherwise the likelihood of a name collision between two modules is very high and errors occur. So a module system has syntax to avoid polluting the global space.

There also needs to be a way for modules to reference other modules.

Simple Modules

The Simple Modules proposal outlines a Module {} block to define what looks like a JavaScript object as far as inspection (for .. in notation, dot property referencing), but is something more nuanced underneath.

Anything inside the Module {} block is not allowed to use a global object, and you cannot add/change a Module after its definition. Here is a sample module, called M, that demonstrates some of the syntax and scoped variable implications:
module M {
//In normal code this would define a global,
//but not inside the module declaration. This
//is likely to be an error(?) in Simple Modules.
foo = "bar";

//color is only visible within module M's block
var color = "blue";

//Creates a publicly visible property called
//"name" on the module.
export name = "Module M";

//setColor is only visible within module M's block
function setColor() {}

//Creates a publicly visible property called
//"reverseName" on the module whose value
//is a function
export function reverseName() {}
}
You can reference/statically load other modules via load (syntax is just a placeholder, not set in stone):
module jQuery = load “jquery.js”;
The goals with this approach:
  1. Stronger lexical scoping: no eval or with allowed in the modules, and a loaded module shares some lexical scope with the module that loaded it.
  2. No access to a global object by default.
  3. Hopefully better syntax for declaring modules over the existing function-based module pattern.
To the extent that modules help programming in the large, the simple modules approach do not give me anything more than I have now, and the proposal has some specific weaknesses. I can appreciate there are some juicy things in the proposal for JavaScript engine developers, but as a user of modules, I do not see it as a net advantage. Here is why:

Lexical scoping/Global access

In addition to using the function-based module pattern to avoid leaking globals, I use JSLint to avoid accessing globals and the use of eval/with. For programming in the large, JSLint helps even more because it enforces a code style that produces much more uniform code. It is built into many editors and easy to run as part of build processes.

JSLint is not perfect (I would like to see JavaScript 1.7/1.8 idioms supported, like let and for each), and you may not like some of the style choices. However, reproducible, consistent style that can be checked automatically is more important than bikeshed-based style choices. It warns of global usage, eval and with, and even helps you find unused local variables.

What is even nicer is that you can opt out of some of the JSLint choices, you can use some globals if you need to. There is some flexibility in the choices.

Functions as Modules

For #3, better syntax for module definitions, I do not see it as a net win over the function(){} module pattern, particularly how it is used for modules in RequireJS where it encourages not defining global objects.

The Simple Modules syntax does not allow exporting a function as the module definition. This is a big wart to me. Functions are first class entities in JavaScript, one of its strongest features. It is really ugly to me that I have to create a property on a module object to export a constructor function, or some module that lends itself to being a function:
module jQuery {
export jQuery = function () {};
}

/*** In some other file ***/
module jQuery = load “jquery.js”;
//ugly
var selection = jQuery.jQuery();

//less ugly, but more typing, so still ugly
var $ = jQuery.jQuery;
var selection = $();
Again, ugly. It should be possible to set the module value to be a function. I know this makes some circular dependency cases harder to deal with, but as I outlined in the CommonJS trade-offs post, it is possible to still have circular dependencies. Even in CommonJS environments now, it is seen as useful. Node supports setting the exported value to a function via module.exports, and there is a more general CommonJS proposal for a module.setExports.

It means the developer that codes a circular dependency case needs to take some care, but it works. Coding a circular dependencies is a much rarer event than wanting to use a module that exports just a constructor function or function. The majority use case should not be punished to make a minority use case a little easier, particularly since you can still trigger errors in the minority use case. Coding a circular dependency will always require special care.

This particular point makes it hard for me to get on board with Simple Modules even in its basic lexical scoping/static loading form. I strongly urge any module proposal to make sure functions can be treated as the exported value. We have that capability today with existing module implementations, and it fits with JavaScript and the importance it places on functions.

Given the extra typing that would be needed to access functions that are exported as modules, I do not see the Simple Modules syntax a net win over the function-based module pattern, particularly as used in RequireJS.

Beyond Lexical Scoping

For programming in the large, what is really needed are more capabilities than what has been outlined so far for Simple Modules. However, making modules useful needs attention in these areas. This is the "99 problems" part:

Dynamic loading

Dynamic loading is harder to work out than static loading. If there is dynamic loading, it is unclear I would need static loading . The goal of modules is to allow programming in the large, and even for a smaller project, why do I need to learn two ways to load modules (static vs. dynamic), when one (dynamic) will do? Dynamic loading is also necessary to enable all the performance options we have today to load scripts in the browser.

There is a module loader strawman proposal that would tie into Simple Modules, but I understand it will not be nailed down more until the basic Simple Modules with static loading is worked out/prototyped.

Referring to other modules

It is unclear how a Module Resource Locator (MRL) is translated to a path to find a module. In CommonJS/RequireJS, an MRL looks like "some/module", and that MRL is used in require() calls to refer to other modules. require("some/module") translates the MRL string "some/module" to some path, "a/directory/that/has/some/module.js". That path is used to find and load the referenced module.

Looking at the Simple Modules examples, it looks like just plain URLs are used as the MRL, and those do not scale well for programming in the large. You will want to use a symbolic name for the MRL, and allow some environment config to map those symbolic names to paths. Otherwise it places too many constraints on how the code is stored. It may not even be a disk -- apparently CouchDB uses design docs to store modules.

I have seen some comments about using more symbolic names for MRLs in some of the notes around the proposals, so maybe it is planned.

In RequireJS, the symbolic name is also used in the module definition. However, since symbolic names can be mapped, they do not have to be the reverse DNS symbolic names, like "org/mozilla/foo". In fact it is encouraged to not use long names.

Distributing and sharing modules/module groups (packages)

This issue can be treated separately from a module spec, but it could affect how MRLs are mapped via a module loader. And this issue really is important for programming in the large. The solution may just be "use packages as outlined by CommonJS". While there are still some gray areas in the package-related specs for CommonJS, that could be a fine answer to the problem.

Performance in the browser

This is getting even further away from the basic Simple Modules spec, but a solution to this issue should be considered for any module solution. The browser needs to be able to deliver many modules at once to the browser in an efficient way. I have heard that Alexander Limi's Resource Packages proposal may be a way to solve this that may work with the Simple Modules approach.

A common loading pattern for web apps will be to load some base scripts from a Content Delivery Network (CDN), then have some domain-specific scripts to load. As long as this still works well with the bundling solution that is great. We already have tools today to help bundling, minifying and gzipping scripts. Any solution will have to be better than what we can do today. Resource Packages could be since it allows other things like images to be effectively bundled.

Summary

I do not feel like Simple Modules are an improvement over what can be done today. In particular, I feel RequireJS when used alongside JSLint is a compelling existing solution, and it works well, and fast, in the browser.

For the more immediate goals of Simple Modules:
  • the expanded, stricter lexical scoping is nice, but for a web developer, it is a slight incremental benefit if JSLint is already in use.
  • Not being able to set a function as the module value means the syntax is not a net win over the function-based module pattern.
The larger issues of module addressing and bundling/distribution are understandably hazy in this early stage of the strawman proposals, but they will need to be addressed as well as or better than existing solutions to gain traction.

I do not want to contribute stop energy around the proposals, I am just hoping to provide feedback to indicate what problems need to be solved better from my web developer viewpoint. I appreciate I could be wrong on some things too. I may be missing something grander or larger, but hopefully if that is the case, this feedback can indicate how to explain the proposals better.

3 comments:

Kris Zyp said...

Great insights, James. One comment, how do URLs not scale well for programming in the large? Isn't the world wide web pretty large scale? Are you aware of something bigger than the web that URLs are insufficient for?

James Burke said...

Kris Zyp: I only meant for referencing modules. If a module does:

load "http://my.domain.com/jquery.js"

then that module is distributed to others, it places extra constraints on that module, where the user of that module must have access to that URL. In particular, for server-side JS projects that does not scale as well as using a symbolic name like load "jquery", then allow mapping the symbolic name to the actual URL.

I expect that URL mappings would be in a package descriptor for the distributed module(s), but I expect the JS environment would allow overriding those mappings via some config options. This is is better than expecting people to change the source of a module to change a module reference.

That said, plain URLs should be allowed for MRLs, but symbolic names make the most generic large scale sharing of code easier. If for some reason that is not possible, and only one can be allowed, then symbolic names should be chosen.

However, I strongly prefer allowing both symbolic names and URLs. It was not clear to me that symbolic names would be allowed in the proposals. If symbolic names are not allowed, I believe it places the proposals at a significant disadvantage vs. the symbolic names used in CommonJS and RequireJS.

Kris Zyp said...

That makes sense, thanks for the clarification.