Merging Apache log-files with SnapLogic
In his last post, Mike mentioned the various examples we have been working on to demonstrate some of the things that are possible with SnapLogic’s server. One of those examples is a ready-made package for the aggregation of Apache log files. You can download the package from our dedicated download site: packages.snaplogic.org.
With this package, you can aggregate multiple log files even if they are located on separate servers. SnapLogic’s data integration works across the network and also between different SnapLogic servers, of course. So, let’s say a web-hoster operates a server farm. Hits are distributed across a bank of hosts and thus we now need to aggregate the logs produced by each host to get the whole picture. This is a pretty common problem.
All that’s needed now is an instance of the SnapLogic server on each host, with our Apache log-file package installed. Please make sure to follow the installation instructions in the README, since the package includes two new components that need to be installed with each server instance. Once that is done, a script that comes with the package performs all the necessary setup and resource creation. The script only needs to be run once, and it communicates with all the servers as needed to perform the setup. The script is driven by command line parameters, so it is easy to integrate into an existing management infrastructure. This comes in handy if you have to operate on a large number of log file sets and need to create merged representations of them automatically.
What happens when you run the script? ‘Resources’ are created for each of the log files you are interested in on each of the hosts. A ‘merge’ pipeline is also created, linking up the resources for each log file with a resource that performs the actual merging. This pipeline resource then represents a merged view of the various log files, sorted by time just as the original individual log files are. If you want to, you can now use this merged resource in the construction of further SnapLogic pipelines, for example to filter on certain fields. But to give you easy access to the merged resource right away, the script also creates an HTML version of that resource. All you need to do is point your browser at that resource, and you can see the merged output in simple table form. To include this in your existing web pages, simple HTML templating is supported, allowing you to specify exactly how the data should be presented and which fields in the log file should be visible.
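To make the idea of a time-sorted merge concrete, here is a minimal Python sketch of merging several log files that are each already sorted by time. It is not how the SnapLogic merge resource is implemented; the function names `log_time` and `merge_logs` and the sample lines are assumptions made up for illustration, and the sketch works on in-memory lists rather than on resources spread across servers.

```python
import heapq
import re
from datetime import datetime

# Matches the bracketed timestamp of an Apache access-log line,
# e.g. [10/Oct/2000:13:55:36 -0700]
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")


def log_time(line):
    """Return the timezone-aware timestamp of one log line (hypothetical helper)."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    # %b expects English month abbreviations, which holds in the default C locale.
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*sources):
    """Lazily merge iterables of log lines that are each sorted by time."""
    return heapq.merge(*sources, key=log_time)


# Made-up sample lines standing in for the logs of two hosts.
host_a = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 100',
    '10.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /c HTTP/1.0" 200 300',
]
host_b = [
    '10.0.0.2 - - [10/Oct/2000:13:55:38 -0700] "GET /b HTTP/1.0" 200 200',
]

merged = list(merge_logs(host_a, host_b))
for line in merged:
    print(line)
```

Because `heapq.merge` only looks at the next unread line of each input, the same approach works on open file objects without loading whole log files into memory.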
Some of the most common Apache log file formats are supported. You can find out more about Apache log file formats here. In short, the common and the combined formats can currently be handled. In addition, two variations of those, vcommon and vcombined, are also supported. The only difference with the latter two is that a virtual-host field is prepended to each log line.
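As a rough illustration of how the four formats relate, the following Python sketch parses one line in each of them with regular expressions. It is not the parser used by the package's components; the field names, the `PATTERNS` table and the `parse_line` function are assumptions chosen for this example, based on Apache's standard common and combined layouts with an optional leading virtual-host field.

```python
import re

# Fields of the common format: host, ident, user, [time], "request", status, size.
COMMON = (
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)
# The combined format appends the quoted referer and user agent.
COMBINED = COMMON + r' "(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
# The v-variants prepend a virtual-host field to each line.
VHOST = r'(?P<vhost>\S+) '

PATTERNS = {
    "common": re.compile("^" + COMMON + "$"),
    "combined": re.compile("^" + COMBINED + "$"),
    "vcommon": re.compile("^" + VHOST + COMMON + "$"),
    "vcombined": re.compile("^" + VHOST + COMBINED + "$"),
}


def parse_line(line, fmt):
    """Parse one log line in the named format into a dict of string fields."""
    match = PATTERNS[fmt].match(line.strip())
    if match is None:
        raise ValueError(f"line does not match format {fmt!r}: {line!r}")
    return match.groupdict()


# Made-up sample line in the vcombined format.
sample = (
    'www.example.com 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
fields = parse_line(sample, "vcombined")
print(fields["vhost"], fields["host"], fields["status"])
```

Keeping the virtual-host prefix as a separate fragment mirrors the description above: vcommon and vcombined differ from their base formats only by that one leading field.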
The merging takes place ‘live’. Caching aside, the merging will be redone whenever the URL for the merged resource is hit again. If you wish to store the merged log file, simply combine the merged resource with a log-file writer component in a new pipeline.