Best Practices for Reltio Connected Data for Snowflake

These recommendations enable you to accomplish Snowflake related tasks in the most efficient way.

The following are the recommendations on how to use Reltio Connected Data for Snowflake effectively:

Creating a Snowflake User

To create a Snowflake user, the Public key must be in a single line string format. For more information, see Creating a Snowflake User.

Creating a Normal Account

To create a Normal Account using the Create Account API, the Private key parameter passed in the request body must be in a single line string.

Creating a Default Account

To create a Default Account using the Create Account API, the Private key parameter passed in the request body must be in a single line string. It is also recommended that:
  • The defaultForTenant parameter passed in the request body must be set to true
  • The defaultDatabase, defaultWarehouse, defaultSchema parameter values must be passed in the request body

For more information, see Creating an Account.

Configuring the Pipeline API

The pipelines must be configured for the specific entity types, relation types, or interaction types.

Note: To prevent the creation of a number of empty tables or views in Snowflake, avoid using default pipelines.

To avoid overriding data, use different database schemas for different tenants and pipelines.

Finally, creating multiple pipelines for the same entity types, relation types, and interaction types causes the same data to get exported via multiple pipelines to the Snowflake database. This is not recommended.

For more information, see Overview of Creating Pipelines.

Creating a Job

No simultaneous job executions must be done for the same Pipeline. It is also recommended that you create separate pipelines for entities, relations, and interactions if the tenant has a large amount of data. You can start the job for each pipeline in parallel.

For more information, see Create a Job.

Configuring a Scheduler Job

The ScheduleConfig type must be configured only for a SCHEDULER job config type. The values can be BATCH or INCREMENTAL_BATCH. It is recommended that:
  • The startTime parameter is mandatory and cannot be a null value. If the value is a negative number, 0, or a positive number, then the current system time is considered as the start time.
  • The interval parameter contains the number of days and the hours. It must be in the following format:

    "any number from 0 to 999""case-insensitive : D"" any number from 0 to 23 "" case-insensitive : H".

    For example, if the interval parameter is 0d1h, it means that the incremental job runs every hour.

  • The value of the endTime parameter must be later than the value of the startTime parameter.
  • Incremental jobs must be run every 24 hours. Running the jobs earlier can cause data inconsistency in the Snowflake database, due to incremental jobs running concurrently for the same pipeline.
For more information, see Create a Job.

Updating the Job Status

The following valid Job state transitions are updated through the API:
For more information, see Update a Job.

Updating the Data Warehouse Size

As the connector needs a pre-configured Data Warehouse from Snowflake to run queries, ingestion, and transformation jobs, it is recommended that you use the following warehouse sizes for different work loads so as to avoid long-running jobs and job failures. Based on our performance tests, these recommendations are applicable for 80-90% of the use cases:
Table 1. Data Warehouse Sizes
Job Type Avg. Record Count Avg. Attribute Size Avg. Run Time (Ingestion + Transformation) Recommended Warehouse size
Batch <1M <20 3 minutes X-Small
  20-50 5 minutes X-Small
  >50 10 minutes X-Small
  1M-10M <20 10 minutes Small
  20-50 12 minutes Small
  >50 15 minutes Medium
  10M-50M <20 20 minutes Medium
  20-50 20 minutes Large
  >50 25 minutes X-Large
  >50M <20 30 minutes Large
  20-50   X-Large
  >50   X-Large
For Incremental Jobs, these recommendations are applicable for ingestion and transformation jobs. If the incremental update volume is significantly less than the initial batch jobs, downgrade to a lower warehouse than the warehouse used for batch ingestion.

For example: If a tenant has 10M profiles and 1M incremental updates are ingested every week, then you can use a Medium warehouse for batch jobs and use a Small warehouse for incremental jobs.