Syncing data sources with your cloud data warehouse is essential for data congruity and data analysis. But how can you accomplish this without all the hassle?
In this Rivery Tip, we’ll outline how to keep your cloud data warehouse synced with your data sources, all within a matter of clicks.
There are two parts to this solution. First, you must conduct a historical backfill to sync your cloud data warehouse with all the existing data from a source. And second, you have to configure incremental loading to ensure that your cloud data warehouse remains up to date.
So, let’s dive in!
Perform a Historical Backfill – Using an API Connection
A historical backfill allows you to pull all the existing data from a source into a cloud data warehouse. This “backfills” the cloud DWH with all the data, both past and present, from the source.
To perform a historical backfill, you must first establish the entity or report that you want to pull into your cloud data warehouse. You can do this from the Source tab for any Data Source to Target River that uses an API.
The Please choose time period option allows you to select a predefined or custom date range to demarcate the historical backfill. These ranges are sometimes dictated by the API itself, since certain APIs do not hold data indefinitely.
Let’s say that you want to load Facebook Ads data, starting at the beginning of 2019:
- Select Custom Date Range in the time period selection dropdown.
- Choose January 1, 2019 as the Start Date.
- Specify an End Date, or leave it empty. When the End Date is left empty, Rivery will assign the current date.
Configure Incremental Loading – Using an API Connection
Incremental loads update your cloud data warehouse with the current source data. In Rivery, once the historical backfill is complete, no action is necessary to configure incremental loads. The Start Date will automatically change to the date of the last pull, and the End Date will default to the current day. The incremental load will resume where it left off, and pull data through the present date.
You will, however, need to set the right Loading Mode to conform with the loading requirements of your cloud data warehouse. For incremental loading, we recommend Append Only or Upsert-Merge.
Append Only simply adds all the pulled data into the target tables, without overwriting any past data.
Upsert-Merge replaces matching rows, keeps unmatched rows, and adds new rows. This mode requires you to define a primary key in the Column Mapping tab.
In Column Mapping, click the key symbol next to the fields you want to use as a key. By setting a field as your primary key, Rivery will check the source data values of this field (data being pulled in each run) against the target values of this field (data already stored in your target table).
If a source key value does not match any of the values in the target, the corresponding record will be upserted, or appended to the target table.
If a source key matches a value that already exists in the target table, then the source record will merge with or replace the existing target record.
That’s how Upsert-Merge only adds new or changed data into your cloud data warehouse.
Historical Backfilling and Incremental Loading – Using an Database Connection
The previous examples dealt with API data sources. Here’s how to perform historical backfilling and incremental loading with a database.
When connecting a database as your source, use Multi-Tables as your River mode to ingest data from multiple tables at once. This is the best option for historically backfilling an entire database into your cloud data warehouse.
In the Target tab, set your default loading mode to ‘Upsert-Merge’. If the tables do not already exist in your target, Rivery will create them automatically.
In the Mapping step, there is a list of schemas in your source database. Select them to see the metadata of the underlying tables.
Click on the Edit button on the far right of the table name to see the metadata for an individual table.
Here you can define the primary key for Upsert-Merge, just like in the previous section. Load tables incrementally by clicking the Table Settings tab and selecting ‘Incremental’ as your extract mode.
Then define the field that will create your increments. This can be either a date field or an integer used as a running number.
Once everything is set up:
- The first execution will pull the entire history and create the tables in the target data warehouse.
- Future loads will abide by the conditions set in the ‘Loading Mode’ and ‘Extract Method’ configurations.
That’s how you can configure a database source for both historical backfill and subsequent incremental loads all at once.
Maintain Congruity with Your Data Sources – In a Few Easy Clicks
Data congruity is essential for accurate and effective data analysis. With Rivery, teams can perform historical backfills and incremental loading in a few simple steps, whether the data source is an API or a database.
But we’re just getting started here. Keep your eye out for more Rivery Tips in the near future!