Lucid Technologies & Solutions Pvt. Ltd.

Pre-requisites: Collibra DGC 5.6+ with API v2, Elasticsearch 6.7

Packaging: Containerized application exposing the Search API; custom workflow to set up the relevancy and display rules; Collibra asset model extension



Licensing: Annual subscription

Why would I need this Solution / What problem are we looking to solve?

Collibra Data Governance Center consolidates rich information about enterprise assets, and users can pick and use the right information only if this repository is easy to search. The out-of-the-box Search offers limited control over how search results are ordered and displayed, which makes it difficult for end users to identify the right content.

Lucid’s Solution

To help users find the appropriate data assets in the Enterprise catalog with the fewest clicks, we have created a custom search service that can be embedded in any application, including a dashboard in Collibra Data Governance Center (DGC). Relevancy rules that reflect the Enterprise Data Standards can be configured in DGC, so the assets that best meet those standards rise to the top of the search results, which rewards adherence to the standards.

Key Features

  • Configurable rules to calculate ‘Relevancy scores’ for specific asset types; the scores control the order of the search results
  • Search results contain both the matched asset and the related assets configured for display (first-level relations only)
  • The Search Service is exposed as a REST API for use in other applications
  • Automated refresh of assets from multiple Collibra DGC instances into a single custom Elasticsearch repository for Search
  • Ability to store and manage the configurations required for this solution in Collibra DGC; custom workflows are provided to manage the configurations
  • Search API parameter to choose between the custom relevancy score calculation and the native Elasticsearch algorithms
  • Support for wildcard searches
  • Ability to turn matches on or off for any of the 5 attributes: name, label, alias, community and domain
  • Ability to exclude fields that are not needed from the output; apart from excluding specific asset attributes, disabling Elasticsearch fields such as ‘highlight’ improves performance
  • Ability to turn aggregations on or off as part of the search result
  • Relations (Characteristics) are shown as part of the Search output
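
The API-level options in the list above can be pictured as parameters on a single search request. The following is a minimal sketch only: the listing does not document the actual endpoint or parameter names, so the `/search` path, the names `q`, `scoring`, `matchOn`, `exclude` and `aggregations`, and the host used in the usage note are all hypothetical placeholders.

```python
from urllib.parse import urlencode

# The five attributes the listing says can be switched on or off for matching.
MATCHABLE_ATTRIBUTES = ("name", "label", "alias", "community", "domain")


def build_search_url(base_url, keyword, use_custom_relevancy=True,
                     match_on=MATCHABLE_ATTRIBUTES, exclude_fields=(),
                     aggregations=False):
    """Build a search URL. Path and parameter names are hypothetical."""
    unknown = set(match_on) - set(MATCHABLE_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unsupported match attributes: {sorted(unknown)}")

    params = {
        "q": keyword,  # may contain wildcards, e.g. "cust*"
        # Custom relevancy scoring versus the native Elasticsearch ranking.
        "scoring": "custom" if use_custom_relevancy else "native",
        "matchOn": ",".join(match_on),
        "aggregations": "true" if aggregations else "false",
    }
    if exclude_fields:
        # Fields to drop from the output, e.g. "highlight".
        params["exclude"] = ",".join(exclude_fields)

    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"
```

A caller might, for example, use `build_search_url("https://search.example.com", "cust*", exclude_fields=("highlight",))` to run a wildcard search with custom scoring and without highlight data; the host name is a placeholder.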

Note: No UI is provided as part of the solution. The REST APIs provided can be used in any application.

Key Differences from Collibra Search

Solution Architecture

Refer    Purpose

1        Scoring Rules: Rules that specify what assets are relevant in the context of the search, attributes to be matched against the search keyword(s) on either the asset or a related asset and relative relevancy weight (used to order the search result).

2        Display Rules: Rules that specify what assets and attributes should be displayed in the search result. When a search keyword is matched against an attribute of a related asset, the matched asset is also made available in the search result.

3        Extract: Assets are periodically extracted from Collibra DGC using Collibra REST APIs. The extraction (refresh) frequency can be configured. Initial and incremental refresh options are supported.

4        Search Repository: Custom Elasticsearch repository used for persisting extracted content.

5        Scoring Engine: Calculates the relevancy score of assets using the scoring rules. The relevancy score will be used to order the search results.

6        Search Service: Delivered as a REST API that can be used in any application.
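
To illustrate how Scoring Rules and the Scoring Engine might fit together, here is a minimal sketch. It assumes that each rule names an asset type, an attribute to match and a weight, and that an asset's relevancy score is the sum of the weights of its matching rules. The listing does not publish the actual formula, so this additive scheme, the `ScoringRule` class and the asset dictionaries are illustrative assumptions rather than the product's implementation.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class ScoringRule:
    """Hypothetical rule shape: match `attribute` on assets of `asset_type`."""
    asset_type: str
    attribute: str
    weight: float


def relevancy_score(asset, keyword, rules):
    """Sum the weights of all rules whose attribute contains the keyword."""
    pattern = f"*{keyword.lower()}*"  # keyword itself may contain wildcards
    score = 0.0
    for rule in rules:
        if asset.get("type") != rule.asset_type:
            continue
        value = str(asset.get(rule.attribute, "")).lower()
        if fnmatch(value, pattern):
            score += rule.weight
    return score


def rank(assets, keyword, rules):
    """Return matching assets, highest relevancy score first."""
    scored = [(relevancy_score(asset, keyword, rules), asset) for asset in assets]
    matched = [pair for pair in scored if pair[0] > 0]
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return [asset for _, asset in matched]
```

With a higher weight on one asset type than on another, a match on the higher-weighted type outranks a match on the lower-weighted one. This is the effect the solution describes as rewarding adherence to standards.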