Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology

Review of Data Virtualization

From MIKE2.0 Methodology


Review of “Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility”

Judith R. Davis and Robert Eve have come together to put a definitive word out there on the emerging field of data virtualization. This is the first book ever on data virtualization. Data virtualization brings value to the seams of our enterprise – those gaps between the data warehouses, data marts, operational databases, master data hubs, big data hubs and query tools. It’s an empowering approach that is defined as “a data integration technique that provides complete, high-quality and actionable information through virtual integration of data across multiple, disparate internal and external data sources.”

Disadvantages cited for traditional approaches (i.e., data warehousing) include the extended time it takes to develop solutions and the need to design and develop in three distinct technologies – BI, data warehousing (which I take to mean data modeling) and ETL. Data virtualization is contrasted with this traditional style of data integration, and that contrast becomes the theme of the book.

Ten case studies are showcased. All could have used traditional approaches, but the virtualization approach is what worked and what clearly would work best in each situation. Robert Eve is the Vice President of Marketing at Composite Software and the case studies are all Composite case studies. However, the book points out that virtualization is delivered both by extensions to other technology platforms (BI and ETL tools, enterprise service buses) and by standalone platforms like Composite, Oracle/BEA Aqualogic, Radiant Logic and MetaData.

The business value shown by the case studies comes from how virtualization helps organizations “deliver complete, high-quality and actionable information more quickly and with fewer resources.”

As middleware, data virtualization utilizes two primary objects – views and data services. The virtualization platform consists of components that perform development, run time and management functions. The first component is the integrated development environment. The second is a server environment and the third is the management environment. These combine to transform data into consistent forms for use.
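To make the two object types concrete, here is a minimal sketch of a view and a data service, assuming two in-memory SQLite databases stand in for separate source systems. The names `crm`, `billing`, `customer_totals` and `customer_total_service` are hypothetical illustrations, not taken from the book or from any vendor's product.

```python
import sqlite3


def build_federated_view():
    """Create two stand-in source systems and one view that joins them."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate in-memory database (hypothetical sources).
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )
    # The view stores no data; it is evaluated against the sources on each query.
    # SQLite requires a TEMP view when the view spans attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW customer_totals AS
        SELECT c.name AS name, SUM(i.amount) AS total
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return conn


def customer_total_service(conn, name):
    """A toy 'data service': one consistent answer built on top of the view."""
    row = conn.execute(
        "SELECT name, total FROM customer_totals WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else {"name": row[0], "total": row[1]}
```

Because the view holds no data of its own, a new invoice written to the hypothetical `billing` source shows up in the next call to `customer_total_service` without any load step.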

The usage patterns for virtualization, all covered by the case studies, are:

• BI data federation

• Data warehouse extensions

• Enterprise data virtualization layer

• Big data integration

• Cloud data integration

There is some obvious overlap between these. For example, most data warehouses are built for business intelligence, so extending the warehouse virtually actually provides federation for BI. This form of virtualization is helpful in “augmenting” warehouse data with data that doesn’t make it to the warehouse in the traditional sense, but nonetheless is made available as part of the warehouse platform. Big data integration, as the book uses the term, refers to Hadoop and to data warehouse appliances like Netezza. Finally, the cloud is presented as a large integration challenge that is met by data virtualization.

The chapter on “How Data Virtualization Delivers Business Agility” lists many ways it does so. The most interesting are the iterative development process and the ease with which change takes place. Case studies are from Comcast, Compassion International, Northern Trust, NYSE Euronext, Pfizer, Qualcomm and four unnamed companies. The different ways of delivering data virtualization are certainly highlighted across these cases. For example, Pfizer gained the ability to manage and monitor “everything” in one centralized location.

The cases also bring out various best practices for implementation. These include centralizing implementation responsibility, educating the business, paying attention to performance tuning and scalability, taking a phased approach and using an experienced vendor partner for data virtualization technology. Speaking of technology, this is decidedly not a technical book. This book is for the manager or lead architect and conveys high-level concepts and architectural considerations for data virtualization. Performance numbers are not cited and the technical details of implementation would need to come from the selected technology’s manuals.

My favorite of the case studies was the one on Comcast. With this study, the need to go “cross-platform” is evident. Yet, companies are not going to integrate the systems. I have called data integration the “permanent temporary solution.” And that it is at Comcast.

Pfizer had an “information sharing challenge” with applications that “don’t talk to each other.” They implemented data virtualization without sacrificing the architectural concepts of an information factory. Source data was left in place, yet all PharmSci data was “sourced” into a single reporting schema accessible by all front-end tools and users. The fact that Composite has the ability to cache data from a virtual view as a file or insert the data into a database table (via a trigger) adds significant value to the solution.

Just as I was thinking it, the authors called this a “light ETL”. The Pfizer head of the Business Information Systems team has a nice quote in the book which sums up much of the data virtualization benefits: “With data virtualization, we have the flexibility to decide which approach is optimal for us: to allow direct access to published views, … to use caching, … or to use stored procedures to write to a separate database to further improve performance or ensure around the clock availability of the data.” Sounds like flexibility worth adding to any shop.
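The choice between direct view access and a cached copy can be sketched as follows. This is a minimal illustration using SQLite, not Composite's actual caching mechanism; the table `assays`, the view `assay_summary`, the cache table `assay_summary_cache` and both function names are hypothetical.

```python
import sqlite3


def open_sources():
    """Set up a stand-in source table and a virtual view over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assays (compound TEXT, result REAL)")
    conn.executemany(
        "INSERT INTO assays VALUES (?, ?)",
        [("A", 1.0), ("A", 3.0), ("B", 2.0)],
    )
    conn.execute(
        """
        CREATE VIEW assay_summary AS
        SELECT compound, AVG(result) AS mean_result
        FROM assays
        GROUP BY compound
        """
    )
    return conn


def refresh_cache(conn):
    """'Light ETL': materialize the view's current result into a plain table."""
    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS assay_summary_cache")
        conn.execute(
            "CREATE TABLE assay_summary_cache AS SELECT * FROM assay_summary"
        )


def read_summary(conn, use_cache):
    """Read from the cached table or directly from the live view."""
    source = "assay_summary_cache" if use_cache else "assay_summary"
    query = f"SELECT compound, mean_result FROM {source} ORDER BY compound"
    return conn.execute(query).fetchall()
```

The trade-off is the usual one: the live view is always current but depends on the sources being available, while the cached table stays readable on its own and is only as fresh as the last `refresh_cache` call.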

Each case has a “return on investment” section. This section talks about either the benefits that the application using data virtualization yielded or the cost saved relative to an alternative approach. Both may be needed in many situations. Applications need positive cash flow ROI justification and the approach needs to be the least costly one (without sacrificing the long-term integrity of the environment).

Colin White’s introduction reminds us that the “virtual data warehouse” was once highly assailed. Ironically, that resistance has now eroded. With warehouses having hit some stride in terms of how centralized they can become, virtualizing the rest is inevitable.

More vendors should do these books. Data Virtualization is a fair look at data virtualization. Despite the many examples, it’s not for every situation. Many situations can achieve long-lasting results with a commitment to a physical solution. However, the idea of data virtualization belongs in the toolbag and can remedy a situation that would otherwise sit idle, fester and dilute a company’s opportunities.

Data infrastructure is going to get more complex before it gets simpler. It is stridently multi-disciplined. Data Virtualization, the book, belongs on every information manager’s bookshelf alongside their favorite book on data warehousing, big data, cloud computing, data integration, data quality, data governance and master data management. Data virtualization is, after all, a discipline that belongs alongside these others.
