Friday, December 25, 2009

RunJS: GitHub, build options, features, file sizes

A few updates on RunJS, a JavaScript file/module loader (see the README for more documentation):
  • RunJS is now on GitHub
  • Plugins for RunJS are supported. i18n bundles have been pulled out as a plugin, and a new text plugin allows you to set text files (think HTML/XML/SVG files) as dependencies for a module. The plugin uses async XMLHttpRequest (XHR) to fetch those files and passes their text as arguments to the module's definition function. The RunJS build system can then *inline* those text files with the module, so that the XHR calls are removed in deployed code and the text files can be used cross-domain.
  • Any return type is allowed from the module definition function. Previously, only objects and functions were allowed, and functions had to be called out in a special way. Now that special callout is removed and any return type is allowed. The cost is an extra call, run.get(), that needs to be used in circular dependency cases. See the Circular Dependencies section in the README.
  • The build system that comes with RunJS now supports build pragmas.
The build pragma support was used to build RunJS in a couple of different configurations. I am trying to get a handle on where the bulk of the implementation lies, and which features add to its file size. Here is the breakdown (warning, Google Doc iframe inclusion, but the interesting numbers are inlined in this post after the iframe):



Let's look at the non-license sizes, since they give a better indication of code density. Google's Closure Compiler did the minification for this evaluation.

The normal config, with no plugins included (but with plugin support) is 7,970 bytes minified, 3,167 gzipped. Including both the i18n and text plugins with run.js bumps it up to 11,759 minified, 4,655 gzipped.

The interesting number for me is the version of run.js without plugin support, no run.modify, no multiversion support and no page load (run.ready/DOMContentLoaded callbacks). This version of run has just the following features:
  • support for the run() module format
  • nested dependency resolution
  • configure paths to modules
  • load just plain .js files that do not define run.js modules (scripts that do not call run(), for example jQuery, or plugins for jQuery).
That bare bones loader comes in at 5,086 minified and 2,204 gzipped. The one you should use, the one with the license, is 5,245 minified and 2,317 bytes gzipped. I need to work on the size of that license block!

That size could probably be brought down a tiny bit (probably to around 2,000 bytes gzipped) if I were to be really aggressive and remove all context references, but that would be a mess to maintain and there would be no easy upgrade path to multiversion support.

I believe that is the lower limit for a functional loader that does nested dependencies via run() module calls. I view run.ready/DOMContentLoaded support as more of a necessity for a loader, so unless you already have an implementation for that, I suggest the version that has run.ready() support, which comes in (with license) at 5,867 minified, 2,522 gzipped.

The nice thing about the build pragma setup for RunJS is that you can upgrade run without having to change your code if you find you want more features, like plugin support, or i18n/text dependency support via plugins.
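
To make the idea of build pragmas concrete, here is a minimal sketch of a pragma processor that drops marked regions from a source file. The `//>>excludeStart("name")` and `//>>excludeEnd("name")` marker syntax, the `applyPragmas` function, and the `pageLoad` feature name are assumptions made for illustration; check the README for the syntax the RunJS build system actually uses.

```javascript
// Hypothetical pragma syntax, assumed for illustration only:
//   //>>excludeStart("featureName")
//   ...code to drop when the feature is excluded...
//   //>>excludeEnd("featureName")
function applyPragmas(source, excluded) {
  const output = [];
  let skipping = null; // name of the region currently being dropped

  for (const line of source.split("\n")) {
    const start = /^\s*\/\/>>excludeStart\("([^"]+)"\)/.exec(line);
    const end = /^\s*\/\/>>excludeEnd\("([^"]+)"\)/.exec(line);

    if (start) {
      // Begin dropping lines only if this feature is excluded
      // and we are not already inside a dropped region.
      if (skipping === null && excluded.includes(start[1])) {
        skipping = start[1];
      }
      continue; // marker lines never reach the output
    }
    if (end) {
      if (skipping === end[1]) {
        skipping = null;
      }
      continue;
    }
    if (skipping === null) {
      output.push(line);
    }
  }
  return output.join("\n");
}

// A toy source file with one optional feature.
const source = [
  "var run = {};",
  '//>>excludeStart("pageLoad")',
  "run.ready = function () {};",
  '//>>excludeEnd("pageLoad")',
  "run.load = function () {};"
].join("\n");

const bare = applyPragmas(source, ["pageLoad"]); // build without page load support
const full = applyPragmas(source, []);           // build with everything
```

Because both builds come from the same source and expose the same remaining API, code written against the smaller build keeps working when it is swapped for the larger one.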

I am interested in trying to sell more front-end JavaScript toolkits on this loader. For some, I can see the bare-bones 2.3K gzipped loader as a nice way to step into it, and then their users have the option to swap in a more powerful version via a different RunJS build output.

I have put up the different build outputs for 0.0.6 if you want to grab one of the minified versions and play with it. Here is the minimum set of compliance tests which use the smallest loader (no modify/plugins/page load/context support) mentioned above. See the README for documentation.

Right now I believe around 2KB gzipped is close to the lower bound for a stand-alone code loader in the browser. At least for a loader I would consider using: anything that uses XHR and eval is dead to me. Using plain script src="" tags helps the xdomain case, and just fits better with debugging. While Dojo has used an XHR-based loader for quite a while (and it will continue to be supported), it just does not work as well with the browser as a script-tag based loader. Any loader should also do nested dependency loading: if a module in a script has dependencies in other modules, be sure to evaluate the dependencies in the right order.
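
As a rough illustration of "evaluate the dependencies in the right order", here is a minimal sketch of a depth-first ordering over a dependency map. The `evaluationOrder` function and the `deps` map are hypothetical and are not taken from the RunJS source. This sketch simply rejects cycles, whereas RunJS handles circular dependencies through run.get().

```javascript
// Hypothetical dependency map: module name -> names it depends on.
function evaluationOrder(deps, root) {
  const order = [];
  const state = new Map(); // name -> "visiting" | "done"

  function visit(name) {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      throw new Error("Circular dependency at " + name);
    }
    state.set(name, "visiting");
    for (const dep of deps[name] || []) {
      visit(dep);
    }
    state.set(name, "done");
    order.push(name); // every dependency of `name` is already in the list
  }

  visit(root);
  return order;
}

const deps = {
  app: ["myplugin", "plugin"],
  myplugin: ["framework"],
  plugin: ["framework"],
  framework: []
};

// Dependencies always appear before the modules that need them.
const order = evaluationOrder(deps, "app");
```

Each module is appended only after all of its dependencies, so evaluating the list from front to back never runs a module before something it depends on.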

As a point of comparison, consider LABjs. I feel a kinship with the author of LABjs, Kyle Simpson, even though we have never talked. We are both focusing on efficient code loading in the browser. I recommend LABjs if it fits your style.

While LABjs does not quite do nested dependency resolution, it does something related where you can tell it to wait to load a script before continuing to load other scripts. LABjs is not trying to push a module format like run is, but is targeted more at existing code that does not have the concept of a module format.

By the way, RunJS can also handle loading these types of files. Where LABjs has a wait() call for holding off loading scripts that depend on another script being loaded (like a framework), RunJS uses nested run calls.

Example from the LABjs page:

$LAB
    .script("framework.js").wait()
    .script("plugin.framework.js")
    .script("myplugin.framework.js")
    .wait(function(){
        myplugin.init();
        framework.init();
        framework.doSomething();
    });

Equivalent example with RunJS:

run(["run", "framework.js"],
    function(run) {
        run(["plugin.framework.js", "myplugin.framework.js"],
            function() {
                myplugin.init();
                framework.init();
                framework.doSomething();
            }
        );
    }
);

Taking the 1.0.2rc1 version of LABjs and using Closure Compiler on it (without the license) gives LABjs a size of 4,360 bytes minified and 2,170 gzipped. As a reminder, the equivalent RunJS file is 5,086 minified and 2,204 gzipped. I may be able to do better with making the structure of the RunJS code more amenable to minification, but the gzip sizes come up fairly close. I do not believe the code tricks I would do to help minification will help the gzip size any.

Both LABjs and RunJS end up around 2KB gzipped. So, about 2KB gzipped seems close to the lower limit on a standalone loader, one that uses script tags/plays nice with the browser and can do nested dependencies. I would like to be proven wrong though, and ideally by modifying RunJS to fit that lower limit. :) I am sure the code can be improved.

But remember the guidelines: no goofy XHR stuff, and the loader has to work well with the browser and handle nested dependencies. No script tags with inlined source and no eval tricks. Even though Firefox and WebKit make eval debugging easier, it is still not as nice as regular script src tags.

Irakli Gozalishvili believes web workers might help, but I do not see it. Workers are restricted to message passing, and anything interesting in a web browser will likely need to touch the DOM. A web worker solution would just be another async-XHR-like approach, where you need to eval the scripts or inject them as inline scripts so that they can see each other and the DOM.

Irakli does have an async-XHR based loader for CommonJS modules. As of today, it comes in at 1,527 minified, 838 gzipped (license not included). But it uses XHR, so it is limited to the same domain as the page, and debugging support is just not as nice across browsers. It also uses CommonJS module syntax; I have decided CommonJS modules do not play well out of the box in the browser, and I believe the format's "module", "exports", and "require.main" parts are unnecessary.

5 comments:

getify said...

Thanks for the mention of LABjs and the comparison comments.

Just FYI, a clarification on how LABjs works with loading multiple scripts compared to how your code snippet works:

LABjs by default loads *all* scripts in parallel (at least as much as the browser allows). The way it deals with dependencies (what you mention as modules requiring other modules, etc) is to control execution order. So, if you have 3 scripts (A,B,C), and C requires A and B first, then all 3 will load in parallel, but A and B will execute before C.

This "preloading" technique is an important differentiator compared to how most other script loaders handle dependencies. Most others will simply *load* in order, rather than load in parallel and *execute* in order.

So, in the above example, A and B would load and execute first, before C loaded. This means that the overall time will be longer since C didn't load in parallel with A and B.

I believe as I interpret your code and snippet, this is how you suggest your code would handle things as well... that you'd first load and execute "framework.js", then load "plugin" and "myplugin" and execute them.

Whereas LABjs would load all 3 and then execute in order, so I believe LABjs would be faster in that scenario.

getify said...

Further clarification: the .wait() function is used to delay *execution*, it has no bearing on loading. As I said, all scripts in a LABjs chain will, by default (unless you turn off those options), load in parallel. The .wait() calls are inserted into a chain to make sure that LABjs knows if there are execution order dependencies that need to be maintained.
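
A minimal sketch of the "load in parallel, execute in order" idea described in these comments. This is not LABjs's implementation; `createOrderedExecutor` and the simulated arrivals are hypothetical, and the network is replaced by direct calls so the ordering logic can be seen in isolation.

```javascript
// Scripts may finish downloading in any order; execution follows `names`.
function createOrderedExecutor(names, execute) {
  const arrived = new Map(); // name -> source text, waiting for its turn
  let next = 0;              // index of the next script allowed to execute

  return function onLoaded(name, source) {
    arrived.set(name, source);
    // Run every script whose turn has come and whose source is already here.
    while (next < names.length && arrived.has(names[next])) {
      const current = names[next];
      execute(current, arrived.get(current));
      arrived.delete(current);
      next += 1;
    }
  };
}

const executed = [];
const onLoaded = createOrderedExecutor(["A", "B", "C"], function (name) {
  executed.push(name);
});

// Simulated network: C finishes first, then A, then B.
onLoaded("C", "/* C */"); // held back, A has not run yet
onLoaded("A", "/* A */"); // A runs immediately
onLoaded("B", "/* B */"); // B runs, then the buffered C
```

All three downloads overlap, but C is buffered until A and B have executed, which is the behavior getify contrasts with loaders that wait to start loading C.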

James Burke said...

getify: Thanks for the clarification. Looking more at LABjs, it can load a script without executing it by loading the script with type="script/cache"; then, I suppose, it later writes out another script tag with type="text/javascript" when it wants to evaluate it.

I can see that as useful for existing scripts that do not follow a module pattern. For RunJS, scripts that follow its module pattern can be loaded and evaluated before they are used. Since the evaluation just defines a module (more like a declaration, although code does run), modules can be preloaded and evaluated up front, with their methods executed only later in response to some action.

I can see where loading but not evaluating a script may be useful in some scenarios, but I find that grouping scripts together into one script, so there is only one HTTP request or a small set of network calls, is useful more often, particularly for larger web apps. RunJS (and by extension Dojo) has a module system with a build process that can aggregate dependent modules together.

Neat, both of us have different optimization strategies we employ based on the kinds of scripts the developer wants to load.

Why do you offer the XHR preloading as an option? The doc for LABjs says it does not really help with caching and is limited to the same domain. In what scenarios do you see this being useful?

Neat stuff, thanks for the discussion!

getify said...

The XHR preloading option was added because, for same-domain scripts, it's a little more efficient time-wise than the "script/cache" trick. The reason is, it avoids the double-cache hit by simply loading the script once and keeping it in memory.

Also, there are times when people load dynamically generated scripts which don't have any cache/expiration headers (obviously), and if you were to use the cache trick with those, you'd actually load the whole script twice, which would be bad.

Of course, XHR has the down side you mention, which is that caching doesn't happen. For some people, that's a concern, for some, not.

So, depending on what types of scripts you are loading, and from where, you can choose whether the cache or xhr tricks are more appropriate. By default, it's configured to use both, xhr for local, cache trick for remote. I think this is the most common configuration that makes sense for general web dev situations. Thus, that's the default.

Also, keep in mind, neither of these tricks is necessary in FF/Opera, because those two browsers will automatically preserve execution order of script tag elements added to the DOM, so LABjs just adds the scripts in order and relies on the browser to delay execution if a script arrives too early.

------------

As for combining scripts, I'm a fan of this, but only to a certain point. For instance, loading 10 scripts on a page is almost never optimal. You should combine.

But, loading 1 script is also not optimal, in my opinion. The reason is that it completely eliminates the ability to parallel load bytes. So a single 100k file will load byte-by-byte serially. But even breaking that into 2 50k files will, in general, cut the loading time roughly in half, as each can load in parallel. And LABjs will make sure the two halves execute in order so there are no dependency issues.

Also, if you have 2 (or 3) files that you load, and you make a change in 1 file, all your users don't have to reload all the unnecessary script code that didn't change in the other 2 files.

It's a balance game between browser caching, build-time processes, parallel downloading, and http-request overhead reduction.

Keep pushing to make things as optimal as possible! Good job so far!

getify said...

Correction: after much testing, it appears that the XHR preloading *does* in fact usually cache the script. There were some red herrings that threw me onto the wrong path with the prior conclusion, and those have been disconfirmed.

I've updated the documentation accordingly.