Data loading processing in the data warehouse to handle deletes

When you are populating your data vault, you might need to delete you stage-tables in an asynchronous way;

load -> staging -> integration layer

Only – and only – when you have populated the integration layer, you can delete the entries from your load table. One way to achieve this is to implement a delete-tracking-table that will track your deletes. The process is like this:

  1. Set up a metadata-table that contains: your target table and its source table
  2. After populating a table in the integration layer, you store this information in the delete-tracking-table. Concretely you track: the source table, the table in the integration layer, and the highest load date in your table in the integration layer
  3. Initiate the delete-process: for each source table defined in your metadata-table get the lowest load date from the delete-tracking-table. If there is no entry in your delete-tracking-table, use 1753 as a default. Delete every entry from your source table that is lower than this load date.

Leave a Reply

Your email address will not be published. Required fields are marked *