Timestamp Calculation

Timestamp calculation is the logic used to derive the timestamp field in datasets that contain match/merge events.

In a dataset with merges, the timestamp field records when the merge key (an edge, in graph terminology) was created, not when an entity first joined the merged entity over a series of merge events.

Timestamp calculation logic in brief:

  • If the winner changes as a result of a merge, set the timestamp of every new entry in DataAccess to the timestamp of the event that caused the winner (master) to change.
  • If the winner remains the same, preserve the previous timestamp for edges that existed before the merge.

Suppose we have entities e1 and e2, and the merge event results in one merged pair "e1:e2", with timestamp "t1":


| mergeKey | winnerId | loserId | timestamp |
| e1:e2    | e1       | e2      | t1        |
    		

Case 1: Merge with e3, which becomes the winner over both e1 and e2, with timestamp "t2".

In this case we have:


| mergeKey | winnerId | loserId | timestamp |
| e3:e1    | e3       | e1      | t2        |
| e3:e2    | e3       | e2      | t2        |
    		

The original "e1:e2" entry with timestamp "t1" no longer exists. An entirely new merged entity has been created from two edges, "e3:e1" and "e3:e2", both with timestamp "t2".

Case 2: Merge with e4, which is a loser to e1, with timestamp "t3".

In this case we have:


| mergeKey | winnerId | loserId | timestamp |
| e1:e2    | e1       | e2      | t1        |
| e1:e4    | e1       | e4      | t3        |
    		

The original "e1:e2" entry keeps timestamp "t1" because the winner entity e1 did not change as a result of this merge event. Only the new "e1:e4" edge receives timestamp "t3".
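
The two rules and both cases can be reproduced with a small Python sketch. It is illustrative only and is not the product's implementation: the `Edge` class and the `apply_merge` function, along with its parameters, are hypothetical names introduced here. It also assumes that a single entity joins an existing merged group in each event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One row of the merge table (hypothetical model, not a product API)."""
    winner_id: str
    loser_id: str
    timestamp: str

    @property
    def merge_key(self) -> str:
        return f"{self.winner_id}:{self.loser_id}"


def apply_merge(edges, winner_before, incoming, winner_after, event_ts):
    """Return the edges after `incoming` merges into the group led by `winner_before`.

    Assumption: `winner_after` is either `winner_before` or `incoming`.
    """
    if winner_after != winner_before:
        # Winner changed: every edge is rebuilt around the new winner and
        # stamped with the timestamp of the event that caused the change.
        members = {winner_before, incoming} | {e.loser_id for e in edges}
        losers = sorted(members - {winner_after})
        return [Edge(winner_after, loser, event_ts) for loser in losers]
    # Winner unchanged: existing edges keep their timestamps, and only the
    # newly created edge gets the event timestamp.
    return list(edges) + [Edge(winner_before, incoming, event_ts)]


initial = [Edge("e1", "e2", "t1")]

# Case 1: e3 joins and becomes the winner at t2.
case1 = apply_merge(initial, "e1", "e3", "e3", "t2")

# Case 2: e4 joins as a loser to e1 at t3.
case2 = apply_merge(initial, "e1", "e4", "e1", "t3")

for label, result in (("Case 1", case1), ("Case 2", case2)):
    print(label, [(e.merge_key, e.timestamp) for e in result])
```

Under these assumptions, Case 1 produces only the edges "e3:e1" and "e3:e2", both at "t2", while Case 2 keeps "e1:e2" at "t1" and adds "e1:e4" at "t3", matching the tables above.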