Creating Reports in Power BI with Reltio Analytics

Create a blank report and connect it to a data source and also choose the right visualizations to display your data.

Connecting the Report to Reltio Analytics

If you have already created a new blank report, then you must connect that report to a data source. Perform the following steps to connect the report to a data source:
  1. Click Home > Get Data > More and then select Database from the menu. You are asked for the credentials if you are connecting to Reltio Analytics for the first time.
  2. A navigator window appears listing all the projects that you can access in Reltio Analytics. You can then navigate to the desired table/view by clicking Project > Dataset > Table/View.
  3. After this, you can verify if you are indeed connected to Reltio Analytics by navigating to the Data tab appearing on the left side of Power BI. If the steps are accurately performed, then you can display your data using Power BI’s visualizations.
    Note: You may be asked if you wish to use direct query against your data in Reltio Analytics or import it. We recommend that you choose the direct query approach, since the other approach requires you to import the data.

Choosing the Right Visualizations

The most crucial point when creating a report is to choose the right visualizations to display your data. You can explore Power BI visualizations from the official docs. You can find the visualizations on the left side of the Power BI desktop tool as shown in the image below.

To get started, you can simply drag and drop a visualization to the canvas.

New sheets (pages) are created in a way that is very similar to the way the sheets are created in Excel sheets or Google spreadsheets. The report in the above image needs four pages with the names corresponding to what you can see on the image.

For each visualization, you can choose the fields to be displayed and also apply filters to that visual.

As shown in the image above, Filters, Visualizations, and Fields are the three main panels you work with in Power BI.
  • On the Fields panel, you can choose the data to be displayed on a given visual.
  • On the Visualizations panel, you can format and beautify your visuals. This is also applicable where you define drill-downs if you want any visualization to exist.
  • On the Filters panel, you can define visual-specific filters or filters that apply on a specific page or all the pages of the report.

The Slicer is a very specific type of visual and it corresponds to the controls in Data Studio.

The Matrix visual is used instead of the normal table in order to achieve the drill-down effect for the Reporting 2020.2 report that we have in the Data Studio version. All the other type of visuals used in the Data Studio have a straight-forward alternative in Power BI.

Reltio Analytics Views Behind the Report

The following views are used in the report:

  • bi_view_entities_metadata_source
    SELECT
    	createdTime as createdDate,
    	updatedTime as updatedDate,
    	startDate as startDate,
    	endDate as endDate,
    	Type as EntityType,
    	Id as EntityId,
    	IF(softDeleted = FALSE AND hardDeleted = FALSE AND createdTime != updatedTime, 0, 1) as wasCreated,
    	IF(softDeleted = FALSE AND hardDeleted = FALSE AND createdTime = updatedTime, 0, 1) as wasUpdated,
    	IF(startDate IS NULL, 0, 1) as wasStartDated,
    	IF(endDate IS NULL, 0, 1) as wasEndDated,    
    	"" AS project_id,
    	"" AS dataset_id,
    	REGEXP_REPLACE(cw.Source, 'configuration/sources/', '') AS Source
    FROM `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.entities_merged`,
    UNNEST (crosswalks) AS cw
    Select createdTime, updatedTime, startDate, endDate, Type, and Id directly from the entities_merged table. The following points explain how these columns are set:
    • The wasCreated column is set to 0, if an entity is neither soft-deleted nor hard-deleted, and, the entity creation time is different from the updated time. Otherwise, it is set to 1.
    • The wasUpdated column is analogically set to 0 if the entity is neither soft-deleted nor hard-deleted, and, the entity creation time is same as the updated time. Otherwise, it is set to 1.
    • The wasStartDated column is set to 0 if the start date is null or missing and to 1 if the start date is neither null nor missing.
    • The wasEndDate column is analogically set by checking for the end date. The Source is the selected source for each crosswalk with the configuration/sources/ string removed. The table’s crosswalks are un-nested so that we can get each source for each entity.
  • bi_view_relations_metadata_source
    SELECT
                        Id as RelationId,
                        rel.Type as RelationType,
                        REGEXP_REPLACE(cw.Source, 'configuration/sources/','') as Source,
                        rel.createdTime AS relationCreatedDate,
                        rel.updatedTime AS relationUpdatedDate,
                        rel.startDate AS relationStartDate,
                        rel.endDate AS relationEndDate,
                        IF(rel.createdTime = rel.updatedTime, 0, 1) as wasUpdated,
                        IF(rel.createdTime != rel.updatedTime, 0, 1) as wasCreated,
                        IF(rel.startDate IS NULL, 0, 1) as wasStartDated,
                        IF(rel.endDate IS NULL, 0, 1) as wasEndDated,
                        "" AS project_id,
                        "" AS dataset_id
                        FROM `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.relations_merged` as rel,
                        UNNEST(rel.crosswalks) as cw
                    
    The bi_view_relations_metadata_source view selects Id, Type, createdTime, updatedTime, startDate, and endDate from the relations_merged table, where the relation crosswalks are un-nested so that we can get each source for each relation. The Source is selected by removing configuration/sources/ from the string. The following points explain how the columns are set:
    • The wasUpdated column is set to 0 if the created time of the relation is same as the updated time of the relation. Otherwise, it is set to 1.
    • The wasCreated column is analogically set to 0 if the created time is different from the updated time and to 1 if it is the same. The wasStartDated and wasEndDated columns follow the same logic from the bi_view_entities_metadata_source view.
  • bi_view_merges_source_country
    WITH
    entities AS (
      SELECT entityType, Id
      FROM (
    	SELECT
      	Type AS entityType,
      	Id,
      	ROW_NUMBER() OVER (PARTITION BY Id, Type ORDER BY versionTime DESC ) AS row_num
    	FROM `customer-facing.riq_dw_pilot_M5p3PF6lEm007rS.entity_*`
      )
      WHERE row_num = 1
    ),
    merges_source AS (
      SELECT *, FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP_MILLIS(timestamp)) AS date
      FROM `customer-facing.riq_dw_pilot_M5p3PF6lEm007rS.merges_*`
    ),
    tbl AS (
      SELECT *
      FROM merges_source
      INNER JOIN entities
      ON merges_source.winnerId = entities.Id
    ),
    mrgTbl AS (SELECT
      Id AS entityId,
      entityType,
      date,
      matchRules,
      type,  
      COUNT(DISTINCT mergeKey) AS merges,
      "" AS project_id,
      "" AS dataset_id
    FROM tbl, UNNEST(matchRules) AS matchRules
    GROUP BY 1, 2, 3, 4, 5)
    SELECT
      entityId,
      entityType,
      PARSE_DATE("%Y%m%d",date) as date,
      matchRules,
      type,
      merges,
      project_id,
      dataset_id,
      Country,
      ARRAY_AGG(REGEXP_REPLACE(Source, 'configuration/sources/', '')) as Source
    FROM mrgTbl
    LEFT JOIN (SELECT
            	Id,
            	IF(mrg.attributes.Address IS NULL, null, mrg.attributes.Address[SAFE_OFFSET(0)].Country[SAFE_OFFSET(0)]) as Country,
            	cw.source AS Source
          	FROM `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.entities_merged` AS mrg,
          	UNNEST (crosswalks) AS cw) as enm
    ON entityId = enm.Id
    GROUP BY 1,2,3,4,5,6,7,8,9

    Select all the unique occurrences with the combination of Type and Id in all the entities tables. Then, through the merges_source table you can find all the winners. Select the entityId, entityType, the date of the merge, the match rules applied, the type of merge, and the count of merges for each winner, where match rule is an un-nested column. All this is then joined back to the entities_merged table to find out the Country and Source for each entity, which are added to the selected columns.

  • view_no_matches_source_count_country
    WITH
    entities AS (
      SELECT entityType, Id
      FROM (
    	SELECT
      	Type AS entityType,
      	Id,
      	ROW_NUMBER() OVER (PARTITION BY Id, Type ORDER BY versionTime DESC ) AS row_num
    	FROM `customer-facing.riq_dw_pilot_M5p3PF6lEm007rS.entity_*`
      )
      WHERE row_num = 1
    ),
    links AS (
      SELECT DISTINCT winnerId AS entityId
      FROM `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.links`
      UNION ALL
      SELECT DISTINCT loserId AS entityId
      FROM `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.links`
    ),
    tbl AS (
      SELECT *
      FROM entities
      WHERE Id NOT IN (SELECT entityId FROM links)
    ),
    tblMrg AS(
    SELECT
      entityType,
      Id as entityId,
      COUNT(DISTINCT Id) AS entities,
      "" AS project_id,
      "" AS dataset_id
    FROM tbl
    GROUP BY 1,2)
    SELECT
    entityType,
    entityId,
    entities,
    project_id,
    dataset_id,
    Country,
    ARRAY_AGG(REGEXP_REPLACE(Source, 'configuration/sources/', '')) as Source
    FROM tblMrg
    LEFT JOIN (SELECT
            	Id,
            	IF(mrg.attributes.Address IS NULL, null, mrg.attributes.Address[SAFE_OFFSET(0)].Country[SAFE_OFFSET(0)]) as Country,
            	cw.source AS Source
            	FROM `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.entities_merged` as mrg,  	 
            	UNNEST (crosswalks) as cw) as enm
    ON entityId = enm.Id
    GROUP BY 1,2,3,4,5,6

    Select all the unique occurrences with the combination of Type and Id in all the entities tables. The entities are filtered by their ID, so that entities (rows) present in the links table as winnerId or loserId are not present in the result of the query. For each of the entries that are not present in the links - Id and entityType (grouped by the Id, entityType, and the count of distinct IDs as entities (count)) are selected. The result is then joined with the entities_merged view so that the source and country are selected as well. The join is on entityId = crosswalk id.

  • bi_view_matches_source_country
    WITH
    entities_source AS (
      SELECT
    	Id,
    	entityType,
    	ARRAY_AGG(DISTINCT REGEXP_REPLACE(crosswalks.source, 'configuration/sources/', '')) AS crosswalkSource
      FROM (
    	SELECT
      	Id,
      	Type AS entityType,
      	crosswalks,
      	ROW_NUMBER() OVER (PARTITION BY Id, Type ORDER BY versionTime DESC ) AS row_num
    	FROM
      	`customer-facing.riq_dw_pilot_M5p3PF6lEm007rS.entity_*`
      ), UNNEST(crosswalks) AS crosswalks
      WHERE row_num = 1
      GROUP BY 1, 2
    ),
    matches_source AS (
      SELECT *, FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP_MILLIS(timestamp)) AS date
      FROM `customer-facing.riq_dw_pilot_M5p3PF6lEm007rS.matches_*`
    ),
    tbl AS (
      SELECT source_entities_source.Id, matches_source.*, source_entities_source.entityType, source_entities_source.crosswalkSource AS sourcecrosswalkSource, target_entities_source.crosswalkSource AS targetcrosswalkSource
      FROM matches_source
      INNER JOIN entities_source AS source_entities_source
      ON matches_source.sourceId = source_entities_source.Id
      INNER JOIN entities_source AS target_entities_source
      ON matches_source.targetId = target_entities_source.Id
    ),
    mrgTbl AS (SELECT
      Id as entityId,
      date,
      type,
      matchRules,
      COUNT(DISTINCT matchKey) AS matches,
      AVG(matchScore) AS matchScore,
      "" AS project_id,
      "" AS dataset_id
    FROM tbl, UNNEST(matchRules) AS matchRules
    GROUP BY 1, 2, 3, 4
    )
    SELECT
      entityId,
      PARSE_DATE("%Y%m%d" , date),
      mrgTbl.type,
      enm.Type as entityType,
      matchRules,
      matches,
      matchScore,
      project_id,
      dataset_id,
      Country,
      ARRAY_AGG(REGEXP_REPLACE(Source, 'configuration/sources/', '')) as Source
    FROM mrgTbl
    LEFT JOIN (SELECT
            	Id,
            	Type,
            	IF(mrg.attributes.Address IS NULL, null, mrg.attributes.Address[SAFE_OFFSET(0)].Country[SAFE_OFFSET(0)]) as Country,
            	cw.source AS Source
          	FROM `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.entities_merged` AS mrg,
          	UNNEST (crosswalks) AS cw) as enm
    ON entityId = enm.Id
    GROUP BY 1,2,3,4,5,6,7,8,9,10

    Analogically when compared to the previous views ( bi_view_entities_metadata_source, bi_view_relations_metadata_source, bi_view_merges_source_country, and view_no_matches_source_count_country) the bi_view_matches_source_country view queries matches and selects the entityId, date, type of match, entityType, match rules determining the match, count of matches for that specific entity (matches column), the match score, the country of the entity if applicable, and all the sources that contributed to that entity.

  • view_merge_report
    WITH links as (SELECT
      * EXCEPT (row_num)
    FROM (
      SELECT
    	*,
    	ROW_NUMBER() OVER (PARTITION BY mergeKey ORDER BY timestamp DESC ) AS row_num
      FROM
    	`customer-facing.riq_dw_pilot_M5p3PF6lEm007rS.merges_*`) AS t
    WHERE
      row_num = 1
      AND type != "SPLIT" AND type != 'POTENTIAL_MATCH' AND type != 'NOT_MATCH'),
     
    temp as (
    SELECT
    winnerId as golder_Uri,
    winnerId,
    loserId,
    matchRules as MergeRule,
    (CASE
      WHEN type = 'AUTO' THEN 'Automatically merged'
      WHEN type is null then 'Merge on the fly'
      WHEN type = 'MANUAL' then 'Merge by hand'
    END) as MergeReason,
    timestamp as MergeTime
    from links)
    SELECT
    golder_Uri,
    CONCAT('entities/', winnerId) as win_Uri,
    CONCAT('entities/', loserId) as lose_Uri,
    MergeRule,
    MergeReason,
    MergeTime,
    emWin.createdTime as CreateTime,
    REGEXP_REPLACE(emWin.crosswalks[offset(0)].source, 'configuration/sources/', '') as WinSource,
    '' as LoseSource
    FROM temp
    LEFT JOIN `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.entities_merged` as emWin
    ON winnerId = emWin.Id

    Selects all the rows from the merges tables, where the mergeKey has a unique occurrence and filters out only rows that do not have the SPLIT, POTENTIAL_MATCH or NOT_MATCH type of split. MergeReason is calculated based on the type of the merge and then the WinSource and LoseSource are added to the selection as they come from a joined table (entities_merged). The join happens on winnerId = entities_merged.Id. As result the golden URI, winner URI, loser URI, merge rules applied, merge reason, the time of the merge, the create time of the record, the winning source, and the losing source are returned.

  • bi_view_workflow_entityType_country
    WITH temp AS(
    	WITH workflow AS(
        	SELECT
        	task_id,
        	entityId,
        	tenant_id,
        	process_instance_id,
        	FORMAT_TIMESTAMP('%Y%m%d', process_instance_create_time)AS process_instance_create_date,
        	decision,
        	IFNULL(decision, "In Progress")AS newDecision,
        	FORMAT_TIMESTAMP('%Y%m%d', process_instance_completion_time)AS process_instance_completion_date,
        	process_instance_complete,
        	task_type,
        	CASE
        	WHEN process_type = "dataChangeRequestReview" THEN "Data Change Request Review"
            	WHEN process_type = "dataChangeRequestReviewInternal" THEN "Data Change Request Review Internal"
            	WHEN process_type = "potentialMatchReview" THEN "potential Match Review"
            	WHEN process_type = "recommendForDelete" THEN "Recommend For Delete"
            	END AS process_type,
        	username,
        	assignee,
        	TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), process_instance_create_time, MINUTE)AS duration_curr_min,
        	TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), process_instance_create_time, DAY)AS duration_curr_day,
        	TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), process_instance_create_time, HOUR)AS duration_curr_hour,
        	TIMESTAMP_DIFF(process_instance_completion_time, process_instance_create_time, MINUTE)AS duration_min,
        	TIMESTAMP_DIFF(process_instance_completion_time, process_instance_create_time, DAY)AS duration_day,
        	TIMESTAMP_DIFF(process_instance_completion_time, process_instance_create_time, HOUR)AS duration_hour
        	FROM(
            	SELECT
            	task_id,
            	MAX(tenant_id)as tenant_id,
            	MAX(SUBSTR(objectURIs, -7))AS entityId,
            	MAX(process_instance_id)AS process_instance_id,
            	MAX(process_type)AS process_type,
            	MIN(process_instance_create_time)AS process_instance_create_time,
            	MAX(decision)AS decision,
            	MAX(process_instance_completion_time)AS process_instance_completion_time,
            	MAX(process_instance_complete)AS process_instance_complete,
            	MAX(task_type)AS task_type,
            	ARRAY_AGG(username ORDER BY timestamp DESC)[OFFSET(0)]AS username,
            	ARRAY_AGG(assignee ORDER BY timestamp DESC)[OFFSET(0)]AS assignee
            	FROM `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.workflow`
            	GROUP BY task_id)),
    	entities AS(
        	SELECT DISTINCT REGEXP_REPLACE(Type, '_source', '')AS entityType, Id
        	FROM `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.entities_merged`),
    	tbl AS(
        	SELECT
        	entityType,
        	workflow. * ,
        	CASE
       		 WHEN duration_curr_day > 10 THEN 'Above 10'
       		 WHEN duration_curr_day BETWEEN 0 AND 3 THEN '1-3'
       		 WHEN duration_curr_day BETWEEN 3 AND 5 THEN '3-5'
       		 WHEN duration_curr_day BETWEEN 5 AND 10 THEN '5-10'
        	END AS duration_curr_group,
        	CASE
       		 WHEN duration_day > 10 THEN 'Above 10'
       		 WHEN duration_day BETWEEN 0 AND 3 THEN '1-3'
       		 WHEN duration_day BETWEEN 3 AND 5 THEN '3-5'
       		 WHEN duration_day BETWEEN 5 AND 10 THEN '5-10'
        	END AS duration_group
        	FROM entities
        	INNER JOIN workflow
        	ON workflow.entityId = entities.Id)
    	SELECT
    	entityType,
    	entityId,
    	tenant_id,
    	process_type,
    	assignee,
    	process_instance_create_date,
    	process_instance_completion_date,
    	newDecision,
    	duration_day,
    	duration_curr_group,
    	username,
    	COUNT( * )AS tasks,
    	"" AS project_id,
    	"" AS dataset_id
    	FROM tbl
    	GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
    SELECT
    entityType,
    entityId,
    tenant_id,
    process_type,
    assignee,
    PARSE_DATE("%Y%m%d", process_instance_create_date)as process_instance_create_date,
    PARSE_DATE("%Y%m%d", process_instance_completion_date)as process_instance_completion_date,
    newDecision,
    duration_day,
    duration_curr_group,
    username,
    tasks,
    project_id,
    dataset_id,
    Country,
    ARRAY_AGG(REGEXP_REPLACE(Source, 'configuration/sources/', ''))as Source
    FROM temp
    LEFT JOIN(SELECT
    	Id,
    	IF(mrg.attributes.Address IS NULL, null, mrg.attributes.Address[SAFE_OFFSET(0)].Country[SAFE_OFFSET(0)])as Country,
    	cw.source AS Source
    	FROM `customer-facing.views_riq_dw_pilot_M5p3PF6lEm007rS.entities_merged` as mrg,
    	UNNEST(crosswalks)as cw)as enm
    ON entityId = enm.Id
    GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

    From the workflow view, select the task_id, entityId, max process_instance_id, max process_type, max process_instance_create_time, max decision, max process_instance_completetion_time, max process_instance_complete, max task_type, username, and assignee, where the data is grouped by the task_id - meaning that the described columns are selected for each unique task ID. A series of calculations is run against the selected data in which a new decision is introduced based on the current decision value. If the value is null the new decision column has the value of In Progress.

    The following six time difference columns are introduced, which correspond to the difference between now and the process_instance_create_time in minutes, days, hours:
    • duration_curr_min
    • duration_curr_day
    • duration_curr_hour
    • duration_min
    • duration_day
    • duration_hour .

    Also, the difference between process_instance_completion_time and process_instance_create_time in minutes, days, and hours is introduced. The result of this operation is joined with entityType from the entities_merged view. The join happens on entityId = entities_merged.Id. Then, the duration_curr_group and duration_group columns are added to the result. The first one is calculated based on the duration_curr_day column and has one of the values from Above 10, 1-3, 3-5, or 5-10.

    Analogically, the duration_group is calculated based on the duration_day column and has the same values. Out of this result, the entityType, entityId, tenant_id, process_type, assignee, process_instance_create_date, process_instance_completion_date, newDecision, duration_day, duration_curr_group, username, and the count of tasks are selected.

    All the columns but the tasks column form the group, where tasks is the count of * or the count of records for that group. This result is then finally joined again to the entities_merged view on entityId = entities_merged.Id and the Country and Source are also selected for each entity (row).

Limitations

The following are some limitations associated with the Power BI reports:
  • Integer timestamps are not recognized as dates. You must make a separate view specific for your Power BI report which parses the column as date.
  • Date hierarchies are not recognized in the direct query approach. The feature of drilling-down dates is not implemented everywhere as in Data Studio.