Maximally consumable data
Getting data into more accessible formats is a big part of what we do at SnapLogic, so it’s always good to see someone else promoting simple over complicated. (Zen for us Python folks.)
A couple of weeks ago, Roger Costello posted a short article on Maximally Consumable Data, which triggered a discussion on the xml-dev mailing list. Roger did a good job of distilling this down to a very digestible summary. Bill de hÓra summarized the same topic in a nice one-liner back in February.
I feel we’re at the point where the simple data format discussion has converged on a handful of very usable alternatives – Simple XML / POX, Atom, JSON/JSONP, with microformats mixed in – and it’s time to start discussing higher-level aspects like metadata. That discussion is already beginning at the OpenAjax Alliance.
I had originally intended the Mashup Camp session to get beyond the basics of just the formats, and to initiate a discussion around security, access control, and descriptions of feeds, but in typical un-conference fashion – “Whatever happens is the only thing that could have.” – we ended up primarily discussing the formats and the tradeoffs between them. (The Mashup Camp link above has plenty of reference links on the various formats.)
As part of the SnapLogic 2.0 release, we added an RP layer to the core server, to allow a simple choice of representation on a read or write, based on Content-Type. The beginnings of this were in 1.0, but 2.0 formalized the APIs so we could add additional representations when needed, and made the representations uniformly available to all resources. We also support explicit formatting through components like the XMLWriter for cases where the default representations are not enough.
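To make the idea concrete, here is a minimal Python sketch of choosing a representation by Content-Type. It is not the SnapLogic API: the `REPRESENTATIONS` registry, the `render` function, and the two writer functions are hypothetical names invented for illustration, and the XML layout is just one plausible "simple XML" shape.

```python
import json
from xml.sax.saxutils import escape


def to_json(records):
    """Serialize a list of dicts as a JSON array."""
    return json.dumps(records)


def to_xml(records):
    """Serialize a list of dicts as very plain XML (one element per field)."""
    rows = []
    for record in records:
        fields = "".join(
            "<{0}>{1}</{0}>".format(name, escape(str(value)))
            for name, value in record.items()
        )
        rows.append("<record>" + fields + "</record>")
    return "<records>" + "".join(rows) + "</records>"


# Hypothetical registry: media type -> writer function.
# Adding a new representation means adding one entry here.
REPRESENTATIONS = {
    "application/json": to_json,
    "application/xml": to_xml,
}


def render(records, content_type, default="application/json"):
    """Pick a writer from the media type, ignoring parameters like charset."""
    media_type = content_type.split(";")[0].strip().lower()
    writer = REPRESENTATIONS.get(media_type, REPRESENTATIONS[default])
    return writer(records)
```

Because every resource goes through the same lookup, a representation registered once becomes available everywhere, which is the property described above.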
Of course, nothing is ever set in stone when it comes to formats, and this week at the MySQL conference I learned that we had overlooked one important representation.
We had a booth at the MySQL conference, next to the dot Org pavilion. (It was nice to see lots of familiar community faces again!) There were two interesting new products at the conference as well. Tod Landis was showing Entrance, a GPL-licensed data browser, and Nate Williams of Kirix was showing Strata. Both of these tools are ‘data browsers’, but I view them as the beginning of a new generation of data access tools. They go beyond the basic query tool, and are moving into the realm of data display, visualization and analytics. At the same time, they are not traditional ‘report writers.’
I’ve been aware of Entrance for a while, and had been talking to Tod about reading SnapLogic as a data source. Strata was new to me. In the course of talking to both Nate and Tod at the conference, we realized that since both tools were ‘web’ savvy, and could read data not only from MySQL but also over HTTP, we should be able to get things working. With that capability, SnapLogic can become a data source for both, and open up a whole new range of data sources beyond MySQL tables. The gotcha? In the context of the conference, the best lowest common denominator format for both tools was good old tab delimited.
I managed to avoid ‘booth duty’ for a couple of hours Wednesday, and I prototyped a tsv representation module for SnapLogic. I added it to our demo server, and by the end of the day, both Entrance and Strata were able to read and display our demo data feeds as tsv.
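For a sense of how little code a tab-delimited representation needs, here is a minimal Python sketch. It is not the prototype module itself: the `to_tsv` function and its `fields` parameter are hypothetical, and it assumes records arrive as a list of dicts. Since plain tab-separated values have no quoting mechanism, the sketch replaces tabs and line breaks inside values with spaces.

```python
def to_tsv(records, fields):
    """Render a list of dicts as tab-separated text with a header row.

    `fields` fixes the column order; missing or None values become
    empty cells.
    """

    def clean(value):
        # Tabs and line breaks would corrupt the row/column structure,
        # so replace them with spaces (an assumption of this sketch).
        text = "" if value is None else str(value)
        for ch in ("\t", "\r", "\n"):
            text = text.replace(ch, " ")
        return text

    lines = ["\t".join(fields)]
    for record in records:
        lines.append("\t".join(clean(record.get(name)) for name in fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [{"id": 1, "name": "alpha"}, {"id": 2, "name": None}]
    print(to_tsv(demo, ["id", "name"]), end="")
```

Passing the column order explicitly keeps the header and rows aligned even when individual records are missing a field.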
Short term, we will add ‘text/tab-separated-values’ to the standard SnapLogic representation suite (and probably csv as well). Longer term, we have some new data consumers for SnapLogic.