An ugly reality of Business Intelligence development is the immense effort required to build and maintain the data warehouses that lie behind such an endeavor, particularly when data comes in from multiple sources. In a typical situation, this task requires ETL software to load data into the data warehouse while transforming it into the desired structure. The procedure is inherently inflexible, since any change in design requires altering the ETL process and reloading the data.
The software discussed in Rick F. van der Lans’ book, Data Virtualization for Business Intelligence Systems, tackles this problem head on. Mr. van der Lans, a leading expert on data virtualization systems, provides a lucid explanation of how these systems work, without losing sight of their ultimate purpose, which is to aid in the delivery of Business Intelligence data. Although the book is written in a vendor-neutral manner, it gives a sense of the look and feel of this class of software and provides a sufficient amount of technical detail. The software packages most frequently discussed in the book are Composite, Denodo, and Informatica.
At its heart, a data virtualization server creates virtual tables from any number of data sources, including relational databases, web services, spreadsheets, and more. The virtual tables defined on the server are SQL views that translate the data into any desired format. These virtual tables can then be exposed to the outside world via ODBC/SQL, JDBC/SQL, SOAP/XML, and other interfaces. Thus, data from multiple sources can be combined at will and appear as a single set of data to the end user. Since everything is virtual, the designer has great flexibility to change the data design quickly.
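The mechanics can be sketched with plain SQL. The snippet below is a schematic analogy in SQLite, not any vendor's actual API: two tables stand in for separate back-end systems (a CRM and an ERP, both names invented here), and a single view plays the role of the virtual table that consumers query without ever seeing the underlying sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two "sources", standing in for separate systems behind the server.
    CREATE TABLE crm_customers (id INTEGER, name TEXT);
    CREATE TABLE erp_orders (customer_id INTEGER, amount REAL);

    INSERT INTO crm_customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO erp_orders VALUES (1, 100.0), (1, 250.0), (2, 75.0);

    -- The "virtual table": a view that joins and reshapes the sources.
    -- Consumers query this name and never see the back-end layout.
    CREATE VIEW customer_revenue AS
    SELECT c.name, SUM(o.amount) AS revenue
    FROM crm_customers c
    JOIN erp_orders o ON o.customer_id = c.id
    GROUP BY c.name;
""")

rows = conn.execute(
    "SELECT name, revenue FROM customer_revenue ORDER BY name"
).fetchall()
print(rows)  # [('Acme', 350.0), ('Globex', 75.0)]
```

In a real data virtualization server the two tables would live in different systems entirely, and the view definition is the only thing the designer has to change when the design evolves.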
What’s even more impressive is the ability of data virtualization servers to add caching to any virtual table. This turns the virtual definition of a table view into physical data; in SQL terms, it is the equivalent of creating a materialized view. A second benefit of this class of software is the added query optimization intelligence it provides. For views that cross data sources, data virtualization servers can optimize queries in ways that a single database cannot.
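The caching idea can also be sketched in miniature. SQLite has no MATERIALIZED VIEW statement, so in this hypothetical example CREATE TABLE ... AS SELECT stands in for it: the view's result set is snapshotted into physical storage, and the snapshot goes stale until refreshed, which is exactly the trade-off a materialized view makes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 100.0), (2, 75.0), (1, 25.0);

    -- The virtual table: recomputed from source rows on every query.
    CREATE VIEW totals AS
    SELECT customer_id, SUM(amount) AS total
    FROM orders GROUP BY customer_id;

    -- "Enable caching": snapshot the virtual table into physical data.
    CREATE TABLE totals_cache AS SELECT * FROM totals;
""")

# New source rows change the live view immediately, but the cache
# keeps serving the old snapshot until it is refreshed.
conn.execute("INSERT INTO orders VALUES (2, 500.0)")
live = conn.execute("SELECT total FROM totals WHERE customer_id = 2").fetchone()[0]
cached = conn.execute("SELECT total FROM totals_cache WHERE customer_id = 2").fetchone()[0]
print(live, cached)  # 575.0 75.0
```

A data virtualization server manages this refresh cycle for the designer; the point of the sketch is only that caching converts a view definition into stored rows.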
Data virtualization software offers interesting opportunities for larger organizations. In essence, data virtualization places data connections and SQL views into a single application. For organizations with entrenched DBA groups, this allows analysts outside the traditional DBA function to create database views. Furthermore, the caching feature of data virtualization software permits analysts to create physical data at will, without the involvement of a DBA. Once the appropriate views are created, the software can then publish them to the outside world.
Aside from its coverage of data virtualization, another highlight of the book is a chapter devoted to an overview of the current Business Intelligence and Data Warehousing landscape. After clearly explaining the key elements of BI development, from data marts and data warehouses to star and snowflake schemas, ETL procedures, and the architectures championed by Ralph Kimball and Bill Inmon, the author concludes the chapter with a discussion of the disadvantages of traditional Business Intelligence systems and how data virtualization addresses those limitations.
Data Virtualization for Business Intelligence Systems: Revolutionizing Data Integration for Data Warehouses
by Rick van der Lans
Morgan Kaufmann, Aug 2012
275 pages, $59.95