Defining Datasets for Entities

Learn how to define datasets for entities, such as HCP and HCO, and how to view the data in them.

Dataset for HCP entity

// Build the DataFrame
val df: DataFrame = framework.dataAccess
  .dataset(
    new EntityDatasetBuilder() // create the builder
      .activeOnly(true)
      .ofType("configuration/entityTypes/HCP")
      .select("Id")
      .select("attributes") // select the attributes field; it is a complex attribute, so all nested attributes are included as well
      .explode("attributes.Specialities.Specialty", "Specialty") // produces a simple DataFrame field for Specialty (no more array values)
      .asTable("hcps") // register the DataFrame as the hcps temp table
  )
  .build() // trigger construction. This builds only the schema; the actual data is loaded only when the table is queried.

This code results in a dataset with the following schema:

root
|-- Id: long (nullable = true)
|-- attributes: struct (nullable = true)
| |-- FirstName: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- LastName: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- Gender: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- Specialities: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- SpecialtyType: array (nullable = true)
| | | | |-- element: string (containsNull = true)
| | | |-- Specialty: array (nullable = true)
| | | | |-- element: string (containsNull = true)
|-- Specialty: string (nullable = true)

Run a SQL query against the dataset:

%sql select * from hcps
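
The effect of explode in the HCP definition above can be pictured with ordinary Scala collections. The following is a minimal sketch, not the framework's or Spark's API: the Hcp and HcpRow case classes and the ExplodeSketch object are hypothetical names used only for illustration. It shows the general idea of turning one array-valued field into one row per element. Note that this sketch drops entities whose array is empty; whether the framework does the same is an assumption worth checking against your own data.

```scala
// Plain-Scala model of "explode": one output row per array element.
// Hcp, HcpRow and ExplodeSketch are hypothetical names, not part of the framework.
final case class Hcp(id: Long, specialties: Seq[String])
final case class HcpRow(id: Long, specialty: String)

object ExplodeSketch {
  // Flatten the array-valued field so that each element gets its own row.
  // Entities with an empty array produce no rows in this model.
  def explode(hcps: Seq[Hcp]): Seq[HcpRow] =
    hcps.flatMap(h => h.specialties.map(s => HcpRow(h.id, s)))

  def main(args: Array[String]): Unit = {
    val input = Seq(
      Hcp(1L, Seq("Cardiology", "Oncology")),
      Hcp(2L, Seq("Pediatrics"))
    )
    // Three rows: two for entity 1, one for entity 2.
    explode(input).foreach(println)
  }
}
```

In the dataset itself the exploded column sits next to the original attributes struct, so the same Id can appear on several rows, once per Specialty value.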

Dataset for HCO entity

The following definition builds a dataset for the HCO entity and registers it as the hco table:

val hco: DataFrame = framework.dataAccess
  .entity(
    new EntityDatasetBuilder()
      .ofType("configuration/entityTypes/HCO")
      .select("Id")
      .explode("attributes.Name", "Name")
      .explode("attributes.JDENumber", "JDENumber")
      .explode("attributes.CompanyType", "CompanyType")
      .explode("attributes.Status", "Status")
      .explode("attributes.IDNAffiliation.Name", "IDN")
      .explode("attributes.Address.City", "City")
      .explode("attributes.Address.StateProvince", "State")
      .explode("attributes.Address.GeoLocation.Latitude", "Lat")
      .explode("attributes.Address.GeoLocation.Longitude", "Long")
      .asTable("hco")
  )
  .build()

Dataset for HCP entity with crosswalks

Dataset definition for the HCP entity with crosswalks:

val hcpData: DataFrame = framework.dataAccess
  .dataset(
    new EntityDatasetBuilder()
      .crosswalks()
      .ofType("configuration/entityTypes/HCP")
      .select("Uri")
      .asTable("hcp_crosswalks")
  )
  .build()

Viewing Data in the Dataset

Use the following steps to view and inspect the data in the hcp_crosswalks dataset:

  1. To view the data in the dataset, run the following query:

    %sql select * from hcp_crosswalks

  2. To get the number of crosswalks in the newly created dataset, call count() on the DataFrame:

    hcpData.count()

  3. To request crosswalks with the default attributes, call crosswalks() without arguments in the dataset definition:

    .crosswalks()

    A SELECT query on the resulting table then returns rows such as the following:

    sourceTable: null
    source: configuration/sources/DEA
    value: hcp.t1111994534600.e1

  4. To request crosswalks with audit attributes, list the attributes explicitly in the crosswalks() call in the dataset definition:

    .crosswalks("Uri","source","value","createDate","reltioLoadDate","sourceTable","updateDate","deleteDate")

    A SELECT query on the resulting table then returns rows such as the following:

    createDate: 20160519T12:32:37.432+0000
    updateDate: 20160519T13:32:37.432+0000
    deleteDate: 20160520T15:32:37.432+0000
    reltioLoadDate: 20160519T12:32:37.432+0000
    Uri: entities/0Htszvb/crosswalks/2G7VRJfL
    sourceTable: null
    source: configuration/sources/DEA
    value: hcp.t1111994534600.e1
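
The audit attributes in the sample output use a compact timestamp form. If these values reach your code as strings in exactly that form (an assumption based only on the sample above, not on a documented format), they could be parsed with java.time along the following lines. The CrosswalkDates object and its parse helper are hypothetical names introduced for this sketch.

```scala
import java.time.{Duration, OffsetDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical helper; the pattern is inferred from the sample output
// (for example 20160519T12:32:37.432+0000) and is an assumption.
object CrosswalkDates {
  private val auditFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("uuuuMMdd'T'HH:mm:ss.SSSZ")

  // Parse one audit timestamp string into an OffsetDateTime.
  def parse(value: String): OffsetDateTime =
    OffsetDateTime.parse(value, auditFormat)

  def main(args: Array[String]): Unit = {
    val created = parse("20160519T12:32:37.432+0000")
    val updated = parse("20160519T13:32:37.432+0000")
    // Minutes between creation and update; 60 for the sample values.
    println(Duration.between(created, updated).toMinutes)
  }
}
```

Once parsed, the values can be compared or subtracted directly, for example to find crosswalks that were updated or deleted after a given load date.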