Data Governance has become one of the key areas of focus for organizations as they tackle the challenge of growing data volumes to derive meaningful business outcomes.
What does it mean when we say “Govern the Data”? As Collibra, one of the leading Data Governance platforms, puts it, it is the ability of an organization to help data consumers find the right data they need, understand what that data is made up of and, finally, trust the data for consumption based on its quality.
SAP HANA, with its in-memory computing capability, is increasingly becoming the platform of choice for both analytical and transactional applications. The three capabilities highlighted above (find, understand, trust) are directly relevant to HANA because they improve the “trust” factor, which matters for any analytical platform alongside its technical capabilities.
Enabling these capabilities is not going to come easy. Organizations' data architectures keep growing more complex to support the volume, variety and velocity of data, and agile analytic processes pay limited attention to documenting the stored data with the right level of information for its use (a simple example being a column description in a typical relational table).
Then how do we tackle this challenge? One of the proven ways is to harvest the “metadata” from the data platforms.
Going beyond the cliched “data about data” definition, what is meant by the “metadata” of the data platforms here? It is information about what data structures they hold, which of these data structures are used by processes and applications in other platforms, what relations exist between these data structures and so on. This is often called the “technical metadata” of the data platforms. When combined with the “business metadata”, which typically covers business definitions, processes, KPIs, business rules and policies, it gives the organization the right ammunition to take on the Data Governance challenge.
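To make the combination of technical and business metadata concrete, here is a minimal Python sketch. Everything in it is assumed for illustration: the `ColumnMetadata` record, the sample `SALES.ORDERS` columns and the glossary entries are hypothetical, and the technical metadata is hard-coded rather than harvested from a real HANA system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnMetadata:
    """One piece of technical metadata: a column and its basic properties."""
    schema: str
    table: str
    column: str
    data_type: str
    description: Optional[str] = None  # often left empty in practice


# Hypothetical technical metadata, as it might look after harvesting.
technical = [
    ColumnMetadata("SALES", "ORDERS", "ORDER_ID", "INTEGER", "Unique order identifier"),
    ColumnMetadata("SALES", "ORDERS", "NET_AMT", "DECIMAL"),
    ColumnMetadata("SALES", "ORDERS", "X_FLAG", "NVARCHAR"),
]

# Hypothetical business glossary keyed by (schema, table, column).
glossary = {
    ("SALES", "ORDERS", "NET_AMT"): "Net Order Value: order value after discounts, before tax",
}


def enrich(columns, business_terms):
    """Attach a business definition to each column and list the columns
    that have neither a technical description nor a business definition."""
    enriched, undocumented = [], []
    for col in columns:
        key = (col.schema, col.table, col.column)
        definition = business_terms.get(key)
        enriched.append({
            "key": key,
            "data_type": col.data_type,
            "description": col.description,
            "business_definition": definition,
        })
        if col.description is None and definition is None:
            undocumented.append(key)
    return enriched, undocumented


enriched, undocumented = enrich(technical, glossary)
```

Here `undocumented` ends up holding only the `X_FLAG` column, which is exactly the kind of gap a governance process would want to surface.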
Key use cases that support and increase trust in HANA as an analytical platform:
- Ability to draw the end-to-end lineage, or trace the provenance, of a data element from its system of use to its system of record or creation. This applies both to platforms that consume HANA (e.g. BI/analytical tools) and to source platforms that HANA consumes (e.g. relational databases)
- Ability to trigger governance processes and notify relevant stakeholders of changes to a data element in the HANA platform
- Ability to push indicators back to the HANA platform showing that a data element is more traceable and trustworthy as an outcome of the governance processes applied to it
Let us review each of these use cases:
First the end-to-end data lineage use case. For example, users are keen to know the lineage of data attributes from BI tools right up to their system of record. This means the data attribute needs to be traced across several data platforms such as analytical tools, data integration tools and data stores.
This often helps in two ways:
(a) From a business perspective, tracing the origin of a data attribute, or its provenance, helps users understand how the attribute's value is derived and, when there are quality issues, where to go to address them. This is especially important from a regulatory standpoint, as well as for trusting the reports and dashboards the users see.
(b) From a technical team's perspective, it enables impact analysis in both directions: the impact on the consumers of HANA objects when those objects change, and the impact on HANA objects when changes are made to the objects they consume.
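The two directions of impact analysis can be sketched as a traversal over a dependency graph. This is only an illustration under stated assumptions: the object names in `EDGES` are invented, and the dependencies are hard-coded here, whereas a DG platform would derive them from harvested metadata.

```python
from collections import deque

# Hypothetical dependencies: (source, consumer) means the consumer reads from the source.
EDGES = [
    ("SALES.ORDERS", "AT_ORDERS"),
    ("SALES.CUSTOMERS", "AT_CUSTOMERS"),
    ("AT_ORDERS", "AN_SALES"),
    ("AT_CUSTOMERS", "AN_SALES"),
    ("AN_SALES", "CV_SALES_SUMMARY"),
    ("CV_SALES_SUMMARY", "Tableau: Sales Worksheet"),
]


def build_index(edges):
    """Index the edges in both directions for fast lookups."""
    consumers_of, sources_of = {}, {}
    for source, consumer in edges:
        consumers_of.setdefault(source, set()).add(consumer)
        sources_of.setdefault(consumer, set()).add(source)
    return consumers_of, sources_of


def reachable(start, neighbours):
    """Breadth-first search: every object reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


consumers_of, sources_of = build_index(EDGES)

# Which objects are affected if the SALES.ORDERS table changes?
impacted = reachable("SALES.ORDERS", consumers_of)

# Which objects could affect the CV_SALES_SUMMARY calculation view?
origins = reachable("CV_SALES_SUMMARY", sources_of)
```

The same traversal serves both questions; only the direction of the index changes.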
Several Data Governance (DG) platforms provide these lineage views, including SAP’s Information Steward, which does this natively. However, in order to generate these lineage views, the metadata from all relevant data platforms has to be loaded into the DG platform.
Let us assume there is a Tableau worksheet built on top of a set of HANA Calculation views. These Calculation views in turn use several levels of analytic and attribute views, which are built on top of several catalog views and tables (present in multiple schemas).
In order to get an end-to-end lineage view of a Tableau report attribute right back to its system of record, which could be a few columns in the catalog tables, we load the metadata from both Tableau and SAP HANA (that is, the information about the various Tableau and SAP HANA objects) into the DG platform. The DG platform then stitches this metadata together to produce the lineage views.
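A rough Python sketch of what this stitching could look like follows. It is not how any particular DG platform implements it. The shape of both extracts, the object and column names, and the simple name-normalization rule used to match the Tableau side to the HANA side are all assumptions made for illustration.

```python
# Hypothetical Tableau extract: each worksheet field and the HANA column it reads.
tableau_extract = [
    {
        "worksheet": "Sales Worksheet",
        "field": "Net Sales",
        "source": {"object": "cv_sales_summary", "column": "net_amt"},
    },
]

# Hypothetical HANA extract: (object, column) -> the columns it is derived from.
hana_extract = {
    ("CV_SALES_SUMMARY", "NET_AMT"): [("AN_SALES", "NET_AMT")],
    ("AN_SALES", "NET_AMT"): [("SALES.ORDERS", "GROSS_AMT"), ("SALES.ORDERS", "DISCOUNT")],
}


def normalize(obj, col):
    """Assumed matching rule: compare names case-insensitively, ignoring padding."""
    return (obj.strip().upper(), col.strip().upper())


def trace_to_records(start, mappings):
    """Walk the mappings until reaching columns with no further source,
    which we treat as the system of record."""
    records, seen, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against cyclic mappings
        seen.add(node)
        sources = mappings.get(node, [])
        if sources:
            stack.extend(sources)
        else:
            records.add(node)
    return records


def stitch(tableau_rows, hana_mappings):
    """Join the two extracts on the normalized HANA object and column name."""
    lineage = {}
    for row in tableau_rows:
        entry = normalize(row["source"]["object"], row["source"]["column"])
        lineage[(row["worksheet"], row["field"])] = trace_to_records(entry, hana_mappings)
    return lineage


lineage = stitch(tableau_extract, hana_extract)
```

In this sketch the “Net Sales” field resolves to the `GROSS_AMT` and `DISCOUNT` columns of `SALES.ORDERS`. The join on normalized names is the fragile part: in practice the two platforms may refer to the same object in different ways, so matching usually needs more care than an upper-case comparison.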
In the next part of this blog, we will look at use cases 2 and 3 and also review a few high-level approaches to implementing these use cases.