Configuring Surrogate Keys

For new tenants, you can configure surrogate keys for source systems that do not provide keys for their data entities.

Note: If you configure surrogate keys for an existing tenant that already has data entities, it is recommended that you run the Recalculate Surrogate Crosswalks task. You must also run the task whenever you change the configuration of a surrogate key.

Every unique crosswalk must have a unique key. When a source system provides a primary key with the entity in a record, the primary key must be used as the crosswalk key in the Reltio platform.

Sometimes a source system does not provide a key for its data. For example, consider two customer records in which each customer has a unique ID, but the address has no ID of its own:
  • 1500764, John Smith, 123 Main Street, Canton OH 87552
  • 2786453, Jane Smith, 123 Main Street, Canton OH 87552

Because the Reltio platform might use the Contact entity type for the customer and the Location entity type for the address, the crosswalk for each entity type requires its own key. In this example, 1500764 was intended to be the primary key of John Smith, whereas his address is additional information stored in the source record for John Smith. His wife, Jane, has the same address replicated in her source record.

Reltio provides the ability to have a unique key calculated for a crosswalk when a source system does not provide one. For example, when John's record and Jane's record are posted to the Reltio platform, John and Jane will each be assigned to separate contact entities with their own crosswalk keys of 1500764 and 2786453 respectively. John's address will form a Location entity that is linked to him. The address contains a crosswalk with a surrogate key calculated from the components of 123 Main Street, Canton OH. Jane’s address will also form a Location entity from the components of 123 Main Street, Canton OH.

Because both Location entities contain identically calculated surrogate keys, they automatically merge into a single Location entity. The merged entity has two relationships: one to John's Contact entity and another to Jane's Contact entity.
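
The merge behavior described above can be sketched in a few lines of Python. This is an illustration only, not Reltio's implementation: the surrogate_key function, the record layout, and the pipe-joined key format are all assumptions made for the example.

```python
from collections import defaultdict

# Source records: each customer has an ID, but the address has no key of its own.
records = [
    {"id": "1500764", "name": "John Smith", "address": ("123 Main Street", "Canton", "OH")},
    {"id": "2786453", "name": "Jane Smith", "address": ("123 Main Street", "Canton", "OH")},
]

def surrogate_key(address):
    """Hypothetical key format: the address components joined with '|'."""
    return "|".join(address)

contacts = {}                      # crosswalk key -> contact name
locations = {}                     # surrogate key -> address
relationships = defaultdict(set)   # surrogate key -> keys of linked contacts

for record in records:
    contacts[record["id"]] = record["name"]
    key = surrogate_key(record["address"])
    locations[key] = record["address"]     # same key, so the second load lands on the first
    relationships[key].add(record["id"])

shared_key = surrogate_key(records[0]["address"])
print(len(contacts), len(locations))       # 2 1
print(sorted(relationships[shared_key]))   # ['1500764', '2786453']
```

Two contacts remain separate because their source keys differ, while the two addresses collapse into one location that keeps a relationship to each contact.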

Constructing Surrogate Keys

A surrogate key is constructed by concatenating the cleansed values of selected Location attributes, for example, AddrLine1, AddrLine2, and City. Whenever a Location whose cleansed values match those of a previously loaded Location is loaded into Reltio, its surrogate key is identical to the key of the previously loaded Location entity.
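
A minimal sketch of this construction follows. It assumes a toy cleansing step (collapse whitespace, upper-case) in place of Reltio's real address cleansing, and a pipe-joined string as a stand-in for the actual key format, which this section does not specify. The function names are hypothetical.

```python
def cleanse(value):
    """Toy stand-in for address cleansing: collapse whitespace and upper-case."""
    return " ".join(value.split()).upper()

def build_surrogate_key(location, attributes=("AddrLine1", "AddrLine2", "City")):
    """Concatenate the cleansed attribute values in a fixed order."""
    return "|".join(cleanse(location.get(name, "")) for name in attributes)

# Two loads of the same address with different raw formatting.
first = {"AddrLine1": "123 Main Street", "AddrLine2": "", "City": "Canton"}
second = {"AddrLine1": " 123  main street ", "City": "canton"}

key_first = build_surrogate_key(first)
key_second = build_surrogate_key(second)
print(key_first)                 # 123 MAIN STREET||CANTON
print(key_first == key_second)   # True
```

The attribute order is fixed so that the same cleansed values always produce the same string, whatever order they arrive in.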

Assigning Surrogate Keys

Reltio automatically cleanses and standardizes any location or address entity that is posted to the Reltio platform. You can assign a key to the Location entity explicitly, or Reltio can generate one automatically using the surrogate key function.

To have Reltio assign a key to the Location entity automatically, enable the option for each source that provides address entities by setting the refEntity crosswalk value to Surrogate in the incoming JSON. In this example, the first address uses Surrogate as its refEntity crosswalk value for the FB source, while the second address supplies an explicit key, loc_B:
[
  {
    "type": "configuration/entityTypes/HCP",
    "attributes": {
      "FirstName": [
        {
          "value": "FirstName001"
        }
      ],
      "Address": [
        {
          "value": {
            "AddressType": [
              {
                "value": "Home"
              }
            ],
            "AddressRank": [
              {
                "value": "0011"
              },
              {
                "value": "0012"
              }
            ],
            "AddressLine1": [
              {
                "value": "AddressA"
              }
            ],
            "City": [
              {
                "value": "CityA"
              }
            ],
            "Street": [
              {
                "value": "StreetA"
              }
            ]
          },
          "refEntity": {
            "crosswalks": [
              {
                "type": "configuration/sources/FB",
                "value": "Surrogate"
              }
            ]
          },
          "refRelation": {
            "crosswalks": [
              {
                "type": "configuration/sources/FB",
                "value": "rel_001"
              }
            ]
          }
        },
        {
          "value": {
            "AddressType": [
              {
                "value": "Home"
              }
            ],
            "AddressRank": [
              {
                "value": "002"
              }
            ],
            "AddressLine1": [
              {
                "value": "AddressB"
              }
            ],
            "City": [
              {
                "value": "CityB"
              }
            ]
          },
          "refEntity": {
            "crosswalks": [
              {
                "type": "configuration/sources/FB",
                "value": "loc_B"
              }
            ]
          },
          "refRelation": {
            "crosswalks": [
              {
                "type": "configuration/sources/FB",
                "value": "rel_002"
              }
            ]
          }
        }
      ]
    },
    "crosswalks": [
      {
        "type": "configuration/sources/FB",
        "value": "hcp_001"
      }
    ]
  }
]
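
If you build such payloads programmatically, a small helper can keep the crosswalk structure consistent. The sketch below is a hedged illustration: address_value and its parameters are hypothetical names, it handles single-valued attributes only, and it merely assembles the JSON shape shown above without calling any Reltio API.

```python
import json

SURROGATE = "Surrogate"  # refEntity crosswalk value that requests a generated key

def address_value(source, relation_key, attributes, location_key=SURROGATE):
    """Build one nested Address value with refEntity and refRelation crosswalks."""
    source_uri = f"configuration/sources/{source}"
    return {
        # Each attribute becomes a one-element list of {"value": ...} objects.
        "value": {name: [{"value": v}] for name, v in attributes.items()},
        "refEntity": {"crosswalks": [{"type": source_uri, "value": location_key}]},
        "refRelation": {"crosswalks": [{"type": source_uri, "value": relation_key}]},
    }

# First address: let the platform calculate the Location key.
auto_keyed = address_value("FB", "rel_001", {"AddressLine1": "AddressA", "City": "CityA"})
# Second address: the source supplies its own Location key.
explicit = address_value("FB", "rel_002", {"AddressLine1": "AddressB", "City": "CityB"},
                         location_key="loc_B")

print(json.dumps(auto_keyed["refEntity"], indent=2))
```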

Example of Surrogate Crosswalks using the HCP and HCO Entities

Reltio provides support for address cleansing, de-duplication, and normalization. This functionality is implemented for addresses associated with the Party object and extends to objects that inherit from the Party object, such as Organization, Individual, Healthcare Professional (HCP), and Healthcare Organization (HCO). When an address is specified for a Party object, Reltio models it as a Party object in the L1 layer and a Location object, linked by the hasAddress relationship. The Location is an address attribute of any object that inherits from the Party object in L1. The HCP entity in L2 inherits from Individual in L1, the HCO entity in L2 inherits from Organization in L1, and Individual and Organization in L1 both inherit from the Party object.
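
The inheritance chain can be written down as a simple parent lookup. This is only a way to visualize the relationships stated above; the PARENT table and the inherits_from_party function are hypothetical and do not correspond to any Reltio API.

```python
# Parent of each object type, as described in the paragraph above.
PARENT = {
    "HCP": "Individual",          # L2 -> L1
    "HCO": "Organization",        # L2 -> L1
    "Individual": "Party",
    "Organization": "Party",
    "Party": None,
}

def inherits_from_party(type_name):
    """Walk up the parent chain and report whether it reaches Party."""
    current = type_name
    while current is not None:
        if current == "Party":
            return True
        current = PARENT.get(current)
    return False

print(inherits_from_party("HCP"))        # True
print(inherits_from_party("Location"))   # False
```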

If a source system models the address as a many-to-many relationship, then each address will have a unique key that can be used as a Location crosswalk. If a source system has been de-normalized and models addresses as a one-to-one relationship, it will contain repeated data. For example, the address for several HCPs working at the same HCO will be repeated for each HCP. This creates redundant data because the match and merge feature will accumulate the keys for each duplicated location. Some source systems may not contain any keys for the address at all, for example, when the data comes from a flat file.

In these cases, it is best to let the Reltio platform create a key for the Location automatically using a surrogate crosswalk. The fields listed in the surrogate crosswalk definition produce a unique key for each truly unique location, and all locations that cleanse to the same set of field values produce the same key. Define a surrogate crosswalk for each source system that does not provide a unique key for a location. If primary entities such as HCPs or HCOs have addresses associated with them, then two crosswalk keys must be considered when mapping to the entity’s address attribute:
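  1. refEntity: This refers to the Location object being loaded.
  2. refRelation: This refers to the hasAddress relationship that links the primary entity to the Location.

The following configuration defines a surrogate crosswalk on the Location entity type for the HCPMaster source: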
{
	"uri": "configuration",
	"label": "Layer3Configuration",
	"description": "simple-l3.v1",
	"schemaVersion": "1",
	"referenceConfigurationURI": "configuration/_vertical/life-sciences",
	"abstract": "false",
	"entityTypes": [{
		"uri": "configuration/entityTypes/Location",
		"surrogateCrosswalks": [{
			"source": "configuration/sources/HCPMaster",
			"enforce": "true",
			"attributes": [
				"configuration/entityTypes/Location/attributes/AddressLine1",
				"configuration/entityTypes/Location/attributes/AddressLine2",
				"configuration/entityTypes/Location/attributes/City",
				"configuration/entityTypes/Location/attributes/StateProvince",
				"configuration/entityTypes/Location/attributes/Country",
				"configuration/entityTypes/Location/attributes/Street",
				"configuration/entityTypes/Location/attributes/SubBuilding",
				"configuration/entityTypes/Location/attributes/Zip/attributes/Zip5",
				"configuration/entityTypes/Location/attributes/Zip/attributes/Zip4"
			]
		}]
	}]
}
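
To make the role of the attributes list concrete, here is a hedged sketch that reads a shortened copy of such a definition and derives a key from a cleansed location. The attribute_path, lookup, and key_for functions are hypothetical, and the pipe-joined output is an assumed format, not the key Reltio actually generates.

```python
import json

# Shortened copy of the Location definition, with three of the listed attributes.
config = json.loads("""
{
  "uri": "configuration/entityTypes/Location",
  "surrogateCrosswalks": [{
    "source": "configuration/sources/HCPMaster",
    "enforce": "true",
    "attributes": [
      "configuration/entityTypes/Location/attributes/AddressLine1",
      "configuration/entityTypes/Location/attributes/City",
      "configuration/entityTypes/Location/attributes/Zip/attributes/Zip5"
    ]
  }]
}
""")

def attribute_path(uri):
    """Turn '.../attributes/Zip/attributes/Zip5' into ['Zip', 'Zip5']."""
    return uri.split("/attributes/")[1:]

def lookup(location, path):
    """Follow a path into a nested dict; a missing value becomes ''."""
    value = location
    for name in path:
        value = value.get(name, "") if isinstance(value, dict) else ""
    return value

def key_for(location, source):
    """Join the listed attribute values with '|'; None if the source has no definition."""
    for definition in config["surrogateCrosswalks"]:
        if definition["source"] == source:
            paths = [attribute_path(uri) for uri in definition["attributes"]]
            return "|".join(str(lookup(location, path)) for path in paths)
    return None

location = {"AddressLine1": "123 MAIN ST", "City": "CANTON", "Zip": {"Zip5": "87552"}}
print(key_for(location, "configuration/sources/HCPMaster"))   # 123 MAIN ST|CANTON|87552
```

A source without a definition returns None here, which mirrors the point that a surrogate crosswalk must be defined per source.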
There are several common patterns for source data. For each pattern, Reltio recommends a specific methodology for the key. See Table 1: Patterns for Source Data to understand the methodology used for a specific pattern using HCP:
Table 1. Patterns for Source Data

Pattern 1: Each HCP is accompanied by a single address. The HCP occupies a row in the source file along with an address in the same row.
  Methodology: Use the unique key of the HCP as the key for refRelation.

Pattern 2: Each HCP has multiple addresses, included in a single flat file. Each HCP is listed on multiple rows, and each row provides a different address for the same HCP.
  Methodology: Construct a key based on the concatenation of the HCP key and the address, for example, 101876|123MainStreet|Anytown|91301|USA.

Pattern 3: Each HCP has multiple addresses, where the HCP is listed in one file and the addresses are listed in a separate address file, with the HCP’s key as the foreign key. In the address file, the AddrKey value is unique.
  Methodology: Use the address key (AddrKey) from the source as the key for refRelation.

Pattern 4: The source uses a many-to-many arrangement, where each address has a unique key within the source and is linked to the HCP using an intersection table, which also has unique keys.
  Methodology: Use the unique key from the intersection table as the key for refRelation.
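
The four recommendations can be summarized as a single selection function. This is a sketch under stated assumptions: ref_relation_key and its parameter names are hypothetical, and for Pattern 2 it removes whitespace and joins with '|' only because that reproduces the sample key given for that pattern.

```python
def ref_relation_key(pattern, hcp_key=None, address_parts=(), address_key=None,
                     intersection_key=None):
    """Pick the refRelation crosswalk key recommended for each source pattern."""
    if pattern == 1:
        # One address per HCP row: the HCP key is enough.
        return hcp_key
    if pattern == 2:
        # Several rows per HCP: concatenate the HCP key and the address parts.
        parts = ["".join(part.split()) for part in address_parts]
        return "|".join([hcp_key, *parts])
    if pattern == 3:
        # Separate address file: its unique address key identifies the link.
        return address_key
    if pattern == 4:
        # Many-to-many: the intersection table row has its own unique key.
        return intersection_key
    raise ValueError(f"unknown pattern: {pattern}")

print(ref_relation_key(2, hcp_key="101876",
                       address_parts=["123 Main Street", "Anytown", "91301", "USA"]))
# 101876|123MainStreet|Anytown|91301|USA
```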