Data Access API 2.0

The Data Access API v2.0 was introduced in Reltio Connected Cloud 2016.3.

Compared with the previous version, the Data Access API v2.0:

  • Requires less code to access data.
  • Removes redundant features such as builder.asTable. The same result is achieved with the native Spark Dataset API method registerTempTable.
  • Clearly separates initial data access from subsequent operations on the loaded data. The builder.explode method is removed from the initial Data Access API call and is available instead as the lateralView method on Dataset (via implicit conversion).
  • Exposes the schema without invoking any RDDs or DataFrames, through the methods entitiesSchema, relationsSchema, and interactionsSchema.
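
As an illustration of the temp-table approach mentioned in the list above, the following is a minimal sketch that uses plain Spark only. It assumes Spark 2.0 or later, where registerTempTable is deprecated in favor of createOrReplaceTempView. The toy DataFrame, its column names, the view name hcp, and the helper registerAndQuery are hypothetical stand-ins for a DataFrame returned by framework.entities, not part of the Reltio API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TempViewSketch {
  // Hypothetical helper: registers `df` under `viewName`, then runs `sql` against it.
  def registerAndQuery(spark: SparkSession, df: DataFrame, viewName: String, sql: String): DataFrame = {
    // On Spark 1.x the equivalent call is df.registerTempTable(viewName).
    df.createOrReplaceTempView(viewName)
    spark.sql(sql)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("temp-view-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for the result of framework.entities(...).
    val entities: DataFrame = Seq(
      ("entities/1", "Ann"),
      ("entities/2", "Bob")
    ).toDF("uri", "FirstName")

    registerAndQuery(spark, entities, "hcp",
      "SELECT FirstName FROM hcp WHERE uri = 'entities/2'").show()

    spark.stop()
  }
}
```

Because the view is registered on the DataFrame itself, the same pattern should apply to any DataFrame, whichever Data Access API call produced it.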

DataAccess and Dataset API

The following example shows the Data Access API shortcut methods alongside the equivalent builder-based calls.

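import com.reltio.analytics.framework._
import com.reltio.analytics.utils.MetadataUtils
import org.apache.spark.sql.DataFrame

// Obtaining the AnalyticsFramework instance is omitted here.
val framework: AnalyticsFramework = ...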

val entities: DataFrame = framework.entities("configuration/entityTypes/HCP", activeOnly = true)

// Equivalent to:
val sameEntities: DataFrame = framework
  .dataAccess
  .dataset(
    EntityDatasetBuilder
      .withAllFields(MetadataUtils.entityTypeUri("configuration/entityTypes/HCP"))
      .activeOnly(true))
  .build()

// relationType is assumed to be defined elsewhere.
val relations: DataFrame = framework.relations(relationType, activeOnly = true)

// Equivalent to:
val sameRelations: DataFrame = framework
  .dataAccess
  .dataset(
    RelationDatasetBuilder
      .withAllFields(MetadataUtils.relationTypeUri(relationType))
      .activeOnly(true))
  .build()

// interactionType is assumed to be defined elsewhere.
val interactions: DataFrame = framework.interactions(interactionType, activeOnly = true)

// Equivalent to:
val sameInteractions: DataFrame = framework
  .dataAccess
  .dataset(
    InteractionDatasetBuilder
      .withAllFields(MetadataUtils.interactionTypeUri(interactionType))
      .activeOnly(true))
  .build()

// 'select', previously a DatasetBuilder method, can now be called directly on the DataFrame.
// 'explode' is now available on DataFrame as 'lateralView' (via implicit conversion).
val explodedDataFrame = framework.entities("configuration/entityTypes/HCP")
  .lateralView(
    "attributes.FirstName" -> "FirstName", //equivalent to new EntityDatasetBuilder().explode("attributes.FirstName" -> "FirstName")
    "attributes.LastName", //equivalent to new EntityDatasetBuilder().explode("attributes.LastName")
    "attributes.Identifiers.*" -> "Identifiers"
  )
  .select("Identifiers#Type", "Identifiers#ID", "Identifiers#Status", "FirstName", "LastName") //this is DataFrame "native" select implementation
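
The name lateralView suggests the LATERAL VIEW explode construct of Hive and Spark SQL, which produces one output row per element of a multi-valued column. The sketch below shows that general technique with the native Spark explode function only. It is not the Reltio implementation: the helper explodeColumn, the flat toy schema, and the column names are assumptions made for illustration, and the real attribute structure is more deeply nested.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object LateralViewSketch {
  // Hypothetical helper: emits one row per element of the top-level array column
  // `arrayCol`, exposing each element under `alias`. `alias` must differ from `arrayCol`.
  // Rows whose array is empty or null are dropped (explode_outer would keep them).
  def explodeColumn(df: DataFrame, arrayCol: String, alias: String): DataFrame =
    df.withColumn(alias, explode(col(arrayCol))).drop(arrayCol)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("lateral-view-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data: each entity carries a multi-valued FirstNames attribute.
    val entities: DataFrame = Seq(
      ("entities/1", Seq("Ann", "Annie")),
      ("entities/2", Seq("Bob"))
    ).toDF("uri", "FirstNames")

    // After exploding, the result is an ordinary DataFrame, so the native select applies.
    explodeColumn(entities, "FirstNames", "FirstName")
      .select("uri", "FirstName")
      .show()

    spark.stop()
  }
}
```

In this toy case the two input rows become three output rows, one per first name, which is the row-multiplying behavior to keep in mind when exploding several multi-valued attributes at once.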

Analytics by Source

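The entities method accepts additional named parameters, including ovOnly, attributesFromSource, and attributesFromTable:

framework.entities(
  "configuration/entityTypes/HCP",
  activeOnly = true,
  ovOnly = true,
  attributesFromSource = Seq("HCOS", "HCP"),
  attributesFromTable = Seq(...))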