Using the Data Persist API

You can save analytics results to Reltio using the Data Persist API.

The Data Persist API picks up analytics results and persists them into the main Reltio storage, for example, to make reports instantly available in the Reltio UI. The results can be persisted as new Analytics attributes.
Note: The following are not supported:
  • Partial override for sub-nested analytical attributes.
  • Null values.

Data Persist API v1.0

Data Persist API usage (for example, persisting new attribute values into Reltio) includes the following high-level steps:

  • Prepare a data source with the results of an analytics job. It can be any Dataset or an RDD of Scala tuples. Data from this source is persisted to the tenant's primary data storage. The Dataset can be defined using the Data Access API; a sketch of preparing such input appears after this list.
  • Define the mapping of the data source to the Reltio model, according to the target tenant's metadata.
  • Make the Data Persist API call with the mapping and the Dataset. Using the provided Java API, you can run a distributed Spark job that transforms the input data to a Reltio-compatible format and stores it in Reltio primary storage. In .analyticsOutput:
    • Use AnalyticsAttributesOutputBuilder() to persist into Analytics attributes.
  • Call .saveDataAndGetJobId() or .saveData() to persist the data.
    • Method .saveDataAndGetJobId() starts the data persist job and returns the jobId. This is the recommended method.
    • Method .saveData() just starts the data persist job.
  • Search and UI: the Data Persist API automatically triggers a re-indexing job; after the job completes, all changes are available in the Reltio UI.
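
As a minimal sketch of the first step, the analytics results can be prepared as a Spark DataFrame built from Scala tuples; the SparkSession, column names, and sample values below are illustrative assumptions, not part of the Data Persist API.

// A sketch of preparing input data, assuming a SparkSession is available.
// The column names and sample values are purely illustrative.
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark: SparkSession = SparkSession.builder().getOrCreate()
import spark.implicits._

// Analytics job results as Scala tuples: (entity URI, computed score).
val analyticsResults: DataFrame = Seq(
  ("entities/e1", 0.87),
  ("entities/e2", 0.42)
).toDF("entityUri", "score")
// This DataFrame can then be passed to .fromDataFrame(...) in the builder.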

Data Persist API v2.0

Data Persist API v2.0 allows you to:

  • Use DataPersist directly with a Dataset.
    • A set of new methods appears on Dataset (through implicit conversion), as sketched below.
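
A minimal sketch of this style, assuming the framework instance and the input DataFrame are already defined and using a placeholder mapping file path:

// `framework` and `entitiesDF` are assumed to be defined already; the
// mapping file path is a placeholder.
import com.reltio.analytics.data.persist._

// persistAnalyticsAttributes is added to the DataFrame by implicit conversion.
val jobId = entitiesDF
  .persistAnalyticsAttributes(framework, mappingFile = "path/to/file")
  .saveDataAndGetJobId()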

Builder methods

In AnalyticsAttributesOutputBuilder(), you can do the following (a combined sketch follows the list):

  • Skip launching re-index: use method .withSkipReindex().
    • Example: new AnalyticsAttributesOutputBuilder().withSkipReindex().
    • As a result, reindexing will be skipped and the job payload will contain "reindex":false.
  • Set custom cluster size: use method .withClusterSize(int clusterSize).
    • Example: new AnalyticsAttributesOutputBuilder().withClusterSize(2).
    • As a result, the "cluster" parameter of the job will contain "size":2.
  • Specify output file location: use method .withOutputFile(String path).
    • Example: new AnalyticsAttributesOutputBuilder().withOutputFile("s3n://$KEY:$SECRET@reltio.tenant.tst-01.$tenant/persistoutput/").
    • As a result, the job will contain "outputDirectory":"s3n://KEY:SECRET@reltio.tenant.tst-01.mytenant/persistoutput/". DataPersist generates a single text file with the IDs of the entities that were persisted and places this file in the specified location.
    • Output folder example: dataaccess_ids_output_DAY_MONTH_YEAR_HH_MM_SEC.
    • Output file example: part-00000.
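
The builder options can be chained in a single persist call. The sketch below combines them; the framework instance, input DataFrame, and mapping string are assumptions, and the S3 path follows the placeholder form used above.

// A sketch chaining the builder options above; `framework`, `resultsDF`,
// and `mappingJson` are illustrative assumptions.
val jobId = framework.dataPersist()
  .analyticsOutput(
    new AnalyticsAttributesOutputBuilder()
      .fromDataFrame(resultsDF)
      .withMapping(ParserMapping.fromString(mappingJson))
      .withSkipReindex()      // job payload will contain "reindex":false
      .withClusterSize(2)     // "cluster" parameter will contain "size":2
      .withOutputFile("s3n://$KEY:$SECRET@reltio.tenant.tst-01.$tenant/persistoutput/")
  ).build
  .saveDataAndGetJobId()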

Using Data Persist API v1.0

A simple example of the Data Persist API is presented below.

// Persist analytics results from the DataFrame `PersistsIn` using the
// mapping defined in the string `mapping_analytics`.
val persistId = framework_af.dataPersist()
  .analyticsOutput(
    new AnalyticsAttributesOutputBuilder()
      .fromDataFrame(PersistsIn)
      .withMapping(ParserMapping.fromString(mapping_analytics))
  ).build
  .saveDataAndGetJobId()

In this example, .saveDataAndGetJobId() starts the job and returns the jobId; persistId holds the ID of the job on EMR.

Using Data Persist API v2.0

Another example of the Data Persist API, using the v2.0 Dataset methods:

import com.reltio.analytics.data.persist._
import com.reltio.analytics.data.persist.objects._
import com.reltio.analytics.data.persist.attributes._
import com.reltio.analytics.objects.transformation._
import com.reltio.analytics.framework._

val framework: AnalyticsFramework = ...
val entitiesDF: DataFrame = ...

// Persist analytics attributes; the mapping can be supplied either as a
// file path or as an already parsed ParserMapping.
entitiesDF.persistAnalyticsAttributes(framework, mappingFile = "path/to/file").saveDataAndGetJobId()
entitiesDF.persistAnalyticsAttributes(framework, mapping = ParserMapping.fromString(mapping)).saveDataAndGetJobId()
