I'm back from a much appreciated vacation, and just wanted to jot down some thoughts that I want to expand on later. This is a continuation of how I want to build web applications on the RIA-end of the web scale, but an approach I could use for just about any data-driven web site.
So, the app server implements the data API as a REST API. This gives a good boundary of what should be done in the app server. The data should use XML (preferrably ATOM) or JSON as the data markup. The only presentation part might be to apply something like an XSLT transform of the data if the GET request matches a browser or searchbot user agent. Alternatively, use the domain name of the GET request to know whether or not to apply the XSLT transform (sometimes it is nice to host the pure data API on a different domain for security and load reasons).
Now it is time to jump the shark:
Restrict the data API to its own domain, separate from the domain used for presenting the UI for the browser/search bots. For example, use api.domain.com for the data API and ui.domain.com for the browser/search bot interface.
The HTML files on ui.domain.com use Ajax-like techniques access the resources on api.domain.com to show the UI. The Ajax-based data loading would have to be a cross-domain ajax. If the API and UI domains share a common sub-domain, document.domain tricks could be used. Otherwise (and more interesting to me personally), JSONP or XMLHttpRequest IFrame Proxy could be used.
So what about search bots? They cannot do the Ajax techniques, at least not yet. For them, in the web server config, route their user agents to a web server module that uses Gecko to fetch the page, and do the ajax loading. The HTML can send an event to Gecko (this custom, headless gecko can export a function, something like window.onAjaxFinished()) to tell it when it is done rendering. Then the Gecko module serializes the current state of the DOM and sends that HTML string back to the search bot. It could even keep a local cache if it liked, if the resources did not change that often.
This approach would allow the search bot to get a good representation of the page for their indexing, and since Progressive Enhancement was used to bind to resource hyperlinks in the document, the search bot can still navigate to other resources on the domain (and those requests would be funneled through the web server's headless Gecko module).
In this model, REST via HTTP becomes the new JDBC. And more importantly to someone like me who enjoys the front-end -- I don't need to learn about any of the app server flavors of the month to implement data access (the other app server-hugging developers on my team can do that).
So now I just need to do a custom build of a headless Gecko that I can use in a web server module. Any pointers are appreciated. Ideally someone should do this as an open source project so everyone can benefit.