Defining Datasets with Merges

This topic describes how to define a Dataset with merges by using the Data Access API.

The Dataset fields include the following information about merges:
  • Type of merge
  • Match rules that caused the merge
  • Timestamp of the merge event

Using .merges

To get information about merges, use the .merges method with the mergeType parameter. The resulting Dataset contains the following fields:

Table 1. Dataset fields
Field Description
mergeKey Indicates the sorted concatenation of the winnerId and loserId fields. Format: winnerId:loserId
mergeRulesUris Indicates the collection of match rule URIs that triggered the match. This field is multi-valued: it contains a single URI if only one rule was involved in the merge, and the URIs of all the rules if multiple rules were involved.

Data type: array of strings

Note: mergeRulesUris is not displayed in the following cases:
  • Entities that were not potential matches, but were merged manually
  • Entities that were merged by a crosswalk
  • Entities that were merged on the fly
  • Entities that were merged on the fly, when the merge-on-the-fly feature is disabled in your tenant’s Reltio Data Science configuration
winnerId Indicates the ID of the “winner” entity.
loserId Indicates the ID of the “loser” entity.
matchRules Indicates the collection of match rule labels that triggered the match.
Note: matchRules is not displayed in the following cases:
  • Entities that were not potential matches, but were merged manually
  • Entities that were merged by a crosswalk
  • Entities that were merged on the fly, when the merge-on-the-fly feature is disabled in your tenant’s Reltio Data Science configuration
timestamp Indicates the timestamp of the merge.
Note: In a series of merges, for each merged record, the timestamp refers to the time when the merge key was created. For more information, see Timestamp Calculation.
type Indicates whether the merge type is AUTO, MANUAL, or PHANTOM (merged on the fly).
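
The mergeKey and timestamp fields described in Table 1 can be unpacked with a few lines of plain Scala. The following is a minimal sketch, not part of the Data Access API: the names MergeFields, splitMergeKey, and mergeInstant are hypothetical, the split assumes that entity IDs never contain a colon, and the conversion assumes that the timestamp is expressed in milliseconds since the Unix epoch (inferred from the 13-digit values in the dataset example, not stated by the documentation).

```scala
import java.time.Instant

object MergeFields {
  // Splits a mergeKey of the form "<id>:<id>" into its two IDs.
  // Assumption: entity IDs never contain ':' themselves.
  def splitMergeKey(mergeKey: String): Option[(String, String)] =
    mergeKey.split(":", -1) match {
      case Array(first, second) if first.nonEmpty && second.nonEmpty =>
        Some((first, second))
      case _ => None // malformed key: wrong number of parts or an empty ID
    }

  // Converts the value of the timestamp field to an Instant.
  // Assumption: the long value is milliseconds since the Unix epoch.
  def mergeInstant(timestamp: Long): Instant = Instant.ofEpochMilli(timestamp)
}
```

Returning an Option keeps malformed keys from raising an exception, which can be convenient when the helper is applied to many rows.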

Example: The following code defines a Dataset with merges by using the Data Access API:

// Merges
import org.apache.spark.sql.DataFrame
import com.reltio.analytics.objects.Merge.MergeType._

// Possible mergeType values: AUTO, MANUAL, ANY, PHANTOM
// Assumes that framework has already been initialized
val merges: DataFrame = framework.merges(mergeType = ANY)

Schema:

root
|-- mergeKey: string (nullable = false)
|-- winnerId: string (nullable = false)
|-- loserId: string (nullable = false)
|-- matchRules: array (nullable = false)
|    |-- element: string (containsNull = false)
|-- mergeRulesUris: array (nullable = false)
|    |-- element: string (containsNull = false)
|-- timestamp: long (nullable = false)
|-- type: string (nullable = false)
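
For typed code, one row of this schema can be mirrored by a Scala case class. The following is an illustrative sketch only: MergeSchema, MergeRecord, and hasRuleInfo are hypothetical names and are not part of the Data Access API. Note that type is a reserved word in Scala, so the field name must be written in backticks.

```scala
object MergeSchema {
  // One row of the merges Dataset; field names and types follow the schema above.
  // `type` is a reserved word in Scala, so it is escaped with backticks.
  final case class MergeRecord(
      mergeKey: String,
      winnerId: String,
      loserId: String,
      matchRules: Seq[String],
      mergeRulesUris: Seq[String],
      timestamp: Long,
      `type`: String
  ) {
    // True when the row carries any match rule information.
    def hasRuleInfo: Boolean = matchRules.nonEmpty || mergeRulesUris.nonEmpty
  }
}
```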

Dataset example:

+-----------------+--------+--------+---------------------------+-------------+------+------------------------------------------------------------------+
|mergeKey         |winnerId|loserId |matchRules                 |timestamp    |type  |mergeRulesUris                                                    |
+-----------------+--------+--------+---------------------------+-------------+------+------------------------------------------------------------------+
|19g0mdC3:19g0mYvn|19g0mdC3|19g0mYvn|[Person by exact LastName] |1588855753599|AUTO  |[configuration/entityTypes/HCP/matchGroups/PersonByExactLastName] |
|19g0muF5:19g0mpyp|19g0muF5|19g0mpyp|[]                         |1588856497217|MANUAL|[]                                                                |
|1mA7pJ0t:1mA7pEkd|1mA7pJ0t|1mA7pEkd|[Person by exact FirstName,|1588855604416|AUTO  |[configuration/entityTypes/HCP/matchGroups/PersonByExactLastName, |
|                 |        |        | Person by exact LastName] |             |      | configuration/entityTypes/HCP/matchGroups/PersonByExactFirstName]|
+-----------------+--------+--------+---------------------------+-------------+------+------------------------------------------------------------------+
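
A common follow-up is to count how many merges each match rule triggered and to list the merges that carry no rule information. The sketch below shows that logic on the three rows of the dataset example, modeled as plain Scala values so that it runs without a Spark session. MergeRuleCounts, SampleMerge, mergesPerRule, and withoutRules are hypothetical names used for illustration; on a real DataFrame, the same idea would typically be expressed by flattening the matchRules array and grouping by rule.

```scala
object MergeRuleCounts {
  // Minimal row model: only the columns this sketch needs.
  final case class SampleMerge(mergeKey: String, matchRules: Seq[String], mergeType: String)

  // The three rows shown in the dataset example.
  val sampleRows: Seq[SampleMerge] = Seq(
    SampleMerge("19g0mdC3:19g0mYvn", Seq("Person by exact LastName"), "AUTO"),
    SampleMerge("19g0muF5:19g0mpyp", Seq.empty, "MANUAL"),
    SampleMerge(
      "1mA7pJ0t:1mA7pEkd",
      Seq("Person by exact FirstName", "Person by exact LastName"),
      "AUTO"
    )
  )

  // Flattens the matchRules arrays and counts how often each rule label occurs.
  def mergesPerRule(rows: Seq[SampleMerge]): Map[String, Int] =
    rows
      .flatMap(_.matchRules)
      .groupBy(identity)
      .map { case (rule, hits) => rule -> hits.size }

  // Returns the merge keys of rows that have no match rule labels,
  // for example manual merges.
  def withoutRules(rows: Seq[SampleMerge]): Seq[String] =
    rows.filter(_.matchRules.isEmpty).map(_.mergeKey)
}
```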