Lucid Technologies & Solutions Pvt. Ltd.


Leveraging SAP HANA metadata to make it a “Trusted” Analytics platform – Part 2 of 2

In Part 1, we listed the key use cases that support and increase trust in HANA as an analytical platform by leveraging HANA’s metadata. I list them here again for our reference:

  1. Ability to draw the end-to-end lineage, or trace the provenance, of a data element from its system of use to its system of record or creation – this applies both to platforms consuming HANA (e.g., BI/analytical tools) and to the platforms that HANA itself consumes (e.g., relational databases)
  2. Ability to trigger governance processes and notify relevant stakeholders when a data element changes in the HANA platform
  3. Ability to push back to the HANA platform indicators that a data element is more traceable and trustworthy, as an outcome of the governance processes applied to it

We also looked at the first use case of drawing up the end-to-end lineage from the upstream systems/platforms to HANA, or from HANA to its downstream systems, and the benefits it brings.

Let us review the remaining two use cases as well.

One of the factors that improves ‘Trust’ in a data platform like HANA, for its data consumers, is the ability to audit changes to the underlying data structures. Business requirements are dynamic, and hence so are the data needs; changes to the data structures are therefore inevitable. Leaving aside the flexibility a NoSQL database offers vis-à-vis a SQL/relational database to implement such changes, auditing them is key to (a) keeping the data consumer informed of the changes, (b) ensuring security is not compromised, (c) addressing regulatory needs for an audit log of changes, and (d) ensuring certain enterprise standards are adhered to.

The ability to capture changes on the HANA platform and trigger relevant governance processes on change then becomes essential. This is the second use case: HANA metadata can be leveraged to compare the version in the HANA platform against the version in the Data Governance (DG) platform and, on change, trigger workflows that support notification to stakeholders, validation, etc.
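As a rough sketch of that comparison step, consider diffing two metadata snapshots by fingerprinting each object's definition. All names and snapshot shapes here are hypothetical; a real integration would read the HANA side from its system views and the DG side from the governance platform's API, then raise workflows for each reported change.

```python
import hashlib
import json

def fingerprint(obj_metadata):
    """Stable hash of one object's metadata (columns, types, etc.)."""
    payload = json.dumps(obj_metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def detect_changes(hana_snapshot, dg_snapshot):
    """Compare a fresh HANA snapshot against the version held in the
    DG platform and report new, dropped, and altered objects."""
    hana_keys, dg_keys = set(hana_snapshot), set(dg_snapshot)
    return {
        "added":   sorted(hana_keys - dg_keys),
        "dropped": sorted(dg_keys - hana_keys),
        "altered": sorted(
            k for k in hana_keys & dg_keys
            if fingerprint(hana_snapshot[k]) != fingerprint(dg_snapshot[k])
        ),
    }

# Illustration: a column was added to SALES.ORDERS since the last DG load,
# and a new table appeared.
dg_version   = {"SALES.ORDERS": {"columns": ["ORDER_ID", "AMOUNT"]}}
hana_version = {"SALES.ORDERS": {"columns": ["ORDER_ID", "AMOUNT", "CURRENCY"]},
                "SALES.REGIONS": {"columns": ["REGION_ID"]}}

changes = detect_changes(hana_version, dg_version)
# A real integration would trigger a governance workflow per entry here.
```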

It is definitely worthwhile to govern the HANA platform from within the DG platform, looking at the lineage, doing impact analysis or triggering workflows. This is best suited to the DG teams or stakeholders with access to the DG platform. From an end user’s perspective (the user of the HANA platform), however, it makes more sense to leverage the outcome of these DG processes right there in the HANA platform, without having to log into the DG platform at all.

This is where the third use case comes in: the ability to write back, or feed, the outcome of the governance processes into the source HANA platform. For example, what if we could add a comment to a catalog table or catalog column based on the governed glossary definition for that object? Or use these comments to tag how ‘trustable’ the object is based on the assessment done within the DG platform, say ‘Certified’, ‘Certified – Bronze level’, etc.? I am not sure whether HANA yet provides a means to update comments for Calculation views, but if it does, tagging these analytical objects similarly would be a huge boost to their reliability in the eyes of data consumers.
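HANA’s SQL does support COMMENT ON for catalog tables and columns, so the write-back can be as simple as generating those statements from the governed glossary content. A minimal sketch follows; the schema, table, and tag format are made up for illustration, and a real job would execute the statements through a database connection rather than just build the strings.

```python
def comment_statement(schema, table, definition, column=None, tag=None):
    """Build a HANA COMMENT ON statement carrying a governed glossary
    definition (and an optional certification tag) back to the catalog."""
    text = definition if tag is None else f"[{tag}] {definition}"
    text = text.replace("'", "''")  # escape single quotes for SQL literals
    if column:
        target = f'COLUMN "{schema}"."{table}"."{column}"'
    else:
        target = f'TABLE "{schema}"."{table}"'
    return f"COMMENT ON {target} IS '{text}'"

stmt = comment_statement("SALES", "ORDERS", "Confirmed customer orders",
                         tag="Certified")
# → COMMENT ON TABLE "SALES"."ORDERS" IS '[Certified] Confirmed customer orders'
```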

One bonus use case that also gets handled is the cataloging of the HANA objects! Data catalogs have become an essential component of a modern enterprise’s data architecture. A data catalog is simply an inventory of an enterprise’s data sets that makes them searchable and their contents understandable, along with any additional metadata available (typically likened to the product catalog of an e-retailer like Amazon).

My experience has been to try out these use cases in Collibra, one of the leading DG platforms. The first step was to integrate the HANA platform with Collibra and load the metadata of HANA objects (catalog tables/views/columns/stored procedures/functions, as well as the calculation/analytic/attribute views, including their dependencies).
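A sketch of what that metadata harvest can look like. SYS.TABLES and SYS.OBJECT_DEPENDENCIES are documented HANA system views, but verify the exact column names against your HANA revision; the cursor here is any DB-API cursor, for example one opened with SAP’s hdbcli Python driver.

```python
# Queries over HANA's SYS schema (verify column names for your revision).
OBJECT_QUERY = (
    "SELECT SCHEMA_NAME, TABLE_NAME FROM SYS.TABLES "
    "WHERE SCHEMA_NAME NOT LIKE 'SYS%'"
)
DEPENDENCY_QUERY = (
    "SELECT BASE_SCHEMA_NAME, BASE_OBJECT_NAME, "
    "DEPENDENT_SCHEMA_NAME, DEPENDENT_OBJECT_NAME "
    "FROM SYS.OBJECT_DEPENDENCIES"
)

def harvest(cursor):
    """Pull object names and dependency edges through a DB-API cursor."""
    cursor.execute(OBJECT_QUERY)
    objects = [f"{schema}.{name}" for schema, name in cursor.fetchall()]
    cursor.execute(DEPENDENCY_QUERY)
    edges = [(f"{bs}.{bo}", f"{ds}.{do}")
             for bs, bo, ds, do in cursor.fetchall()]
    return objects, edges

# Connecting for real would look roughly like this (credentials omitted):
#   from hdbcli import dbapi
#   conn = dbapi.connect(address=host, port=port, user=user, password=pwd)
#   objects, edges = harvest(conn.cursor())
```

The loaded objects and edges then become the assets and relations registered in the DG platform.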

The load was then set up for periodic refresh, which is how changes to the HANA platform could be monitored. Governance processes were then set up as Activiti-based workflows in the Collibra platform.

This surely is a big step towards governing the HANA platform and thus increasing its ‘Trust’ factor.


Leveraging SAP HANA metadata to make it a “Trusted” Analytics platform – Part 1 of 2

Data Governance has become one of the key areas of focus for organizations as they tackle the challenge of growing data volumes to derive meaningful business outcomes.

What does it mean when we say “Govern the Data”? As Collibra, one of the leading Data Governance platforms, puts it, it is the ability of an organization to enable its data consumers to find the right data they need, understand what that data is made up of and, finally, trust the data for consumption based on its quality.

SAP HANA, with its in-memory computing capability, is increasingly becoming the platform of choice for both analytical and transactional applications. The three capabilities highlighted above are very relevant to HANA in order to improve the “trust” factor, which, beyond technical capabilities, is key for any analytical platform.

Enabling these capabilities is not going to come easy, given the ever-increasing complexity of an organization’s data architecture supporting the volume/variety/velocity of data, and given agile analytic processes that place limited focus on truly supporting the stored data with the right level of information for its use (a simple example being adding a column description to a typical relational table).

Then how do we tackle this challenge? One of the proven ways is to harvest the “metadata” from the data platforms.

Going beyond the clichéd “data about data” definition, what is meant by “metadata” of the data platforms here? Information about what data structures they hold, which of these data structures are used by processes and applications in other platforms, what relations exist between these data structures, and so on. This is often called the “technical metadata” of the data platforms. Combined with the “business metadata”, which typically comprises business definitions, processes, KPIs, business rules and policies, it gives the organization the right ammunition to take on the Data Governance challenge.
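To make the combination concrete, here is a toy sketch of the two kinds of metadata and the governed link between them. Every name, field, and shape here is invented for illustration; DG platforms model this with far richer asset types and relations.

```python
# Hypothetical technical metadata, as harvested from a database catalog.
technical = {
    "SALES.ORDERS.AMOUNT": {"type": "DECIMAL(15,2)", "nullable": False},
}

# Hypothetical business metadata, as curated in a governed glossary.
business = {
    "Net Order Amount": {
        "definition": "Order value after discounts, before tax",
        "steward": "Finance Data Office",
    },
}

# The governed link: which physical column implements which business term.
links = {"SALES.ORDERS.AMOUNT": "Net Order Amount"}

def describe(column):
    """Resolve a physical column to its business context, if governed."""
    term = links.get(column)
    tech = technical.get(column, {})
    if term is None:
        return {"column": column, "technical": tech, "governed": False}
    return {"column": column, "technical": tech, "governed": True,
            "term": term, **business[term]}
```

Answering “what is this column, and can I trust it?” is then a single lookup that joins both worlds.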

Key use cases that support and increase trust in HANA as an analytical platform:

  1. Ability to draw the end-to-end lineage, or trace the provenance, of a data element from its system of use to its system of record or creation – this applies both to platforms consuming HANA (e.g., BI/analytical tools) and to the platforms that HANA itself consumes (e.g., relational databases)
  2. Ability to trigger governance processes and notify relevant stakeholders when a data element changes in the HANA platform
  3. Ability to push back to the HANA platform indicators that a data element is more traceable and trustworthy, as an outcome of the governance processes applied to it

Let us review each of these use cases:

First, the end-to-end data lineage use case. For example, users are keen to know the lineage of data attributes from BI tools right up to their system of record. This means the data attribute needs to be traced across several data platforms, such as analytical tools, data integration tools and data stores.

This often helps in two ways:

(a) from a business perspective, tracing the origin of the data attribute, or its provenance, helps users understand how the value of the attribute is derived and, when there are quality issues, where to go to address them. This is very important from a regulatory standpoint as well as for trusting the reports/dashboards the users see;

(b) more from a technical team’s perspective, the ability to do impact analysis on the consumers of HANA objects when those objects change, as well as on the HANA objects themselves when changes are made to the objects that HANA consumes.

Several Data Governance (DG) platforms provide these lineage views, including SAP’s Information Steward, which does this natively. However, in order to generate the lineage views, the metadata from all relevant data platforms has to be loaded into the DG platform.

Let us assume there is a Tableau worksheet built on top of a set of HANA Calculation views. These Calculation views in turn use several levels of analytic and attribute views, which are built on top of several catalog views and tables (present in multiple schemas).

In order to get an end-to-end lineage view of a Tableau report attribute right down to its system of record, which could be a few columns in the catalog tables, we load the metadata from both Tableau and SAP HANA (that is, the information about the various Tableau and SAP HANA objects) into the DG platform. The DG platform then stitches this metadata together to enable the lineage views.
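Once the metadata is stitched together, the lineage question itself reduces to a graph walk over “consumes” edges, from a reporting attribute down to objects with no further upstream dependencies. A minimal sketch, with invented object names standing in for the Tableau field, the Calculation view, the analytic view, and the catalog tables:

```python
from collections import deque

def trace_to_sources(start, consumes):
    """Walk the 'consumes' edges from a reporting attribute down to the
    objects with no upstream dependencies (the system of record)."""
    sources, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        upstream = consumes.get(node, [])
        if not upstream:
            sources.append(node)  # nothing feeds this object
        for parent in upstream:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(sources)

# Hypothetical chain: Tableau field -> calc view -> analytic view -> tables.
consumes = {
    "tableau.Revenue": ["CV_REVENUE"],
    "CV_REVENUE":      ["AN_SALES"],
    "AN_SALES":        ["SALES.ORDERS", "SALES.REGIONS"],
}
trace_to_sources("tableau.Revenue", consumes)
# → ['SALES.ORDERS', 'SALES.REGIONS']
```

Reversing the edges and starting from a catalog table gives the impact-analysis view described in (b) above.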

In the next part of this blog, we will look at use cases 2 & 3 and also review a few high-level approaches to implementing them.