The goal of business intelligence is to make decisions, grounded in operational data, that improve efficiency or customer satisfaction. It is simply a requirement for running a successful organization. On the surface, it may seem like all you need is your data, perhaps in a conventional database, along with a way to query it and visualize the answers to business questions. In practice, sophisticated business intelligence builds on years of computer science research and industry development. In a recent article in the Communications of the ACM, “An Overview of Business Intelligence Technology”, Surajit Chaudhuri and Vivek Narasayya from Microsoft Research and Umeshwar Dayal from HP Labs give a detailed survey of both the underlying technology currently used for business intelligence and areas for future research.
Notably, it’s not just a matter of accessing your existing database. Medium to large organizations have many data sources, which can span multiple relational databases as well as non-relational data such as system and event logs. To summarize or query across multiple data sources, the data must be procured, perhaps cleansed, and stored in a way that lets it be accessed and processed uniformly. This preparation relies on algorithms and heuristics that discover patterns and find inconsistencies. The structure and distribution of the data are extremely important, both for supporting the required query types and for answering them in a timely fashion. The types of queries and analytics you expect to run will therefore influence the ultimate storage structure. Different storage strategies have emerged over the years, including column-oriented databases, parallel databases, and more recently reliable distributed file systems such as HDFS (Hadoop Distributed File System). For some organizations, compression is a requirement, both to cope with the huge amount of machine-generated data and to maximize disk throughput. There is also a difference between batch analytics and real-time intelligence, which necessitate different processing architectures and entirely different analytic algorithms.
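To make the storage point a little more concrete, here is a minimal Python sketch of why a column-oriented layout and compression go well together for machine-generated data. It is only an illustration and is not taken from the article: the sample rows and the helper names (`to_columns`, `run_length_encode`, `count_where`) are my own assumptions, and run-length encoding is just one simple compression scheme chosen for the example.

```python
from itertools import groupby

# Hypothetical machine-generated event rows: (timestamp, host, status).
rows = [
    (1, "web-1", 200),
    (2, "web-1", 200),
    (3, "web-1", 500),
    (4, "web-2", 200),
    (5, "web-2", 200),
    (6, "web-2", 200),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def count_where(encoded, target):
    """Count matching values directly on the run-length-encoded column."""
    return sum(count for value, count in encoded if value == target)


columns = to_columns(rows, ["timestamp", "host", "status"])

# Repetitive columns shrink well: six host values become two runs.
host_rle = run_length_encode(columns["host"])
status_rle = run_length_encode(columns["status"])

# An aggregate over one column never touches the other columns.
ok_count = count_where(status_rle, 200)
```

An aggregate such as "how many requests returned 200" reads only the `status` column, and it can be answered from the compressed runs without expanding them. That is the intuition behind pairing columnar storage with compression to get more out of limited disk throughput.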
Organizations are beginning to adopt Hadoop and MapReduce, but for the foreseeable future most will continue to use relational databases for transactions. Procuring and integrating data, and reconciling the inconsistencies in it, will therefore remain an integral part of any business intelligence architecture.
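As a rough illustration of what reconciling two sources can look like, here is a small Python sketch. Everything in it is hypothetical: the `crm` and `billing` records, the field names, and the `cleanse` and `reconcile` helpers are assumptions made up for the example, and real integration pipelines use far more elaborate matching than exact comparison after normalization.

```python
# Hypothetical customer records from two sources, keyed by customer id.
crm = {
    101: {"email": "Ann@Example.com ", "country": "US"},
    102: {"email": "bob@example.com", "country": "DE"},
}
billing = {
    101: {"email": "ann@example.com", "country": "US"},
    102: {"email": "bob@example.com", "country": "FR"},
    103: {"email": "cy@example.com", "country": "JP"},
}


def cleanse(record):
    """Normalize values so cosmetic differences do not count as conflicts."""
    return {
        "email": record["email"].strip().lower(),
        "country": record["country"].strip().upper(),
    }


def reconcile(left, right):
    """Return (conflicts, missing).

    conflicts: (key, field, left_value, right_value) for fields that still
               disagree after cleansing.
    missing:   keys that appear in only one of the two sources.
    """
    conflicts = []
    missing = sorted(set(left) ^ set(right))
    for key in sorted(set(left) & set(right)):
        a, b = cleanse(left[key]), cleanse(right[key])
        for field in a:
            if a[field] != b[field]:
                conflicts.append((key, field, a[field], b[field]))
    return conflicts, missing


conflicts, missing = reconcile(crm, billing)
```

Customer 101 differs only in capitalization and trailing whitespace, so cleansing removes that difference. Customer 102 has a genuine country conflict that someone has to resolve, and customer 103 exists in only one system.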
Check out the article and let me know what you think. It’s a good read and gives many references to dig deeper.