As we saw in the first post of this series, ‘Agility’ is the speed with which the data platform is made available for analysis, together with its ability to respond to changes in the analysis requirements from Business. We will look at some of the technologies that can help in this space.
In this post we will look at ‘Data Virtualization’ as one of the methods to handle agility. Data Virtualization is a method by which data can be pulled in from multiple data sources and presented to the end user, without the user having to know the exact location, structure or platform of each data source. For example, say we have a customer relational table sitting on DB2, the customer's credit card transactions in a foreign country are shared as a file on a Linux server, and the conversion rates between the local and the foreign currency come from a provider as an XML file placed on a Windows box. All of this can neatly be presented as three views to the user, who can then join them and work out whether the customer has exceeded the daily limit or whether any transaction looks fraudulent.
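To make the three-views idea concrete, here is a minimal Python sketch. It is only an illustration, not how any particular data virtualization product works. Everything in it is an assumption made for the example: an in-memory SQLite table stands in for the DB2 customer table, inline strings stand in for the transaction file and the XML rates file, and the column names, sample values and function names are invented.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the remote file and XML feed.
TRANSACTIONS_CSV = """customer_id,amount,currency
1,300,EUR
1,250,EUR
2,400,EUR
"""
RATES_XML = "<rates><rate from='EUR' to='USD'>1.25</rate></rates>"


def customers_view():
    """View 1: customer table (in-memory SQLite standing in for DB2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, daily_limit REAL)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Asha", 500.0), (2, "Ben", 1000.0)],
    )
    rows = conn.execute("SELECT id, name, daily_limit FROM customer").fetchall()
    conn.close()
    return [{"customer_id": r[0], "name": r[1], "daily_limit": r[2]} for r in rows]


def transactions_view():
    """View 2: foreign card transactions, read from a delimited file."""
    reader = csv.DictReader(io.StringIO(TRANSACTIONS_CSV))
    return [
        {
            "customer_id": int(r["customer_id"]),
            "amount": float(r["amount"]),
            "currency": r["currency"],
        }
        for r in reader
    ]


def rates_view():
    """View 3: conversion rates into local currency, read from XML."""
    root = ET.fromstring(RATES_XML)
    return {node.get("from"): float(node.text) for node in root.findall("rate")}


def customers_over_limit():
    """Join the three views: convert each transaction to local currency,
    total the spend per customer and compare it with the daily limit."""
    rates = rates_view()
    spent = {}
    for t in transactions_view():
        local_amount = t["amount"] * rates[t["currency"]]
        spent[t["customer_id"]] = spent.get(t["customer_id"], 0.0) + local_amount
    return [
        c["name"]
        for c in customers_view()
        if spent.get(c["customer_id"], 0.0) > c["daily_limit"]
    ]


if __name__ == "__main__":
    print(customers_over_limit())
```

The point of the sketch is that the caller of customers_over_limit() never sees where each piece of data lives or what format it is in. A real data virtualization layer would do the same thing with live connections and let the user express the join in SQL.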
Now how does this relate to agility? Being able to pull in data from multiple sources without having to deal with differences in platform, type, structure etc., and most importantly without having to move it from the source to a common target, is a big saving on the overall project time. When Business users want to do a quick mid-day fraud check like the one I mentioned above, they can run it directly against the virtual views, whereas the same requirement given to an ETL team might take 2-3 weeks given the IT processes. That is where we offer agility.
The concept has become very popular for creating sandpits for analysis/data discovery in a matter of days. Eventually such a sandpit can become a DWH/Data Mart or even an MDM, if the integration rules are well managed.
A quote from a Forrester blog on data virtualization, where the CTO of a major financial firm put it this way in his interview: “The only question is how long you can delay going with data virtualization. If you look to the future, you simply have to do it this way.”
And from a Gartner blog: ‘DV speeds development, increases reuse and reduces data sprawl. But its pervasive myths have resulted in suboptimal implementations.’
Key vendors in this space: Composite Software (recently acquired by Cisco), Denodo Technologies and Red Hat (an open source alternative).
A good book I came across on this: Data Virtualization for Business Intelligence Systems: Revolutionizing Data Integration for Data Warehouses, by Rick van der Lans.