How To Automate Database Syncs with Change Data Capture

Taylor McGrath

JUN 4, 2020

5 min read

Rivery change data capture (CDC) performs continuous real-time syncs between relational databases and cloud data warehouses.

In a few clicks, data teams can automate streaming database syncs that are faster, more efficient, and more cost-effective. Here’s a step-by-step guide on how to automate continuous, real-time database syncs using change data capture in Rivery.

Briefly: What Is Rivery CDC?

Rivery change data capture instantly and automatically syncs a database with a cloud data warehouse.

The feature tracks changes by continuously reading data and metadata from the database's binary logs (binlogs), so it adds no extra SQL load to the source system.
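To make the log-based idea concrete, here is a minimal sketch in Python of how a stream of row-level change events could be replayed onto a copy of a table. It is an illustration only, not Rivery's implementation: the `ChangeEvent` shape and the `apply_events` helper are hypothetical names invented for this example, and a real binlog reader emits a much richer format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """One row-level change, in a simplified, hypothetical shape."""
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_events(replica: dict, events: list) -> dict:
    """Replay change events, in log order, onto a copy of the replica."""
    result = dict(replica)
    for event in events:
        if event.op in ("insert", "update"):
            result[event.key] = event.row
        elif event.op == "delete":
            result.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return result


# Assumed sample events, as a log reader might produce them.
events = [
    ChangeEvent("insert", 1, {"id": 1, "status": "new"}),
    ChangeEvent("insert", 2, {"id": 2, "status": "new"}),
    ChangeEvent("update", 1, {"id": 1, "status": "paid"}),
    ChangeEvent("delete", 2),
]
replica = apply_events({}, events)
print(replica)  # {1: {'id': 1, 'status': 'paid'}}
```

Because the events come from the log rather than from queries against the tables, the source database does no extra query work to produce them.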

Rivery CDC streams continuously into the client’s staging area rather than directly into database tables, as most other solutions do. Data teams can use CDC to migrate operational databases, perform high-frequency syncs, combine marketing and internal data in a data warehouse, and much more.

Read our recent blog on Rivery change data capture to learn more about the benefits.

How To Set Up Change Data Capture In Rivery

1. Navigate to the top right-hand corner and select Create New River. Choose Data Source to Target.

2. Select Step 1: Source. Choose the relational database you want to pull data from.

3. Pick Multi-Tables for River Mode. This will allow you to pull data from every database table all at once.

4. Set Default Extraction Mode to Log Based. This will continuously read from the database logs to capture all data and metadata changes in the source database.

All changes are immediately reflected in the target cloud data warehouse.

5. Navigate to the bottom of the page and choose Enable Log to access the logs of the database.

For more guidance on configuring specific databases, see the corresponding docs for MySQL, MS SQL Server, or PostgreSQL.

6. Select Step 2: Target. Choose your target data warehouse. Supported targets include Google BigQuery, Amazon Redshift, Azure DataLake, Snowflake, and Azure SQL Data Warehouse.

7. Once a target is selected, establish a Target Connection via the dropdown or create a new one.

8. Next, set the destination Database and Schema Name. Optionally, add a table prefix.

9. Choose the Loading Mode. Upsert-Merge replaces matching rows, keeps unmatched rows, and adds new rows.

Append Only adds incoming data to the table without modifying existing rows. Overwrite replaces entire tables with the new versions.
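The difference between the three loading modes is easiest to see on a small example. The sketch below models a table as a list of Python dicts and assumes an `id` column as the matching key. The function names are hypothetical and only mirror the behavior described above. They are not Rivery APIs.

```python
def upsert_merge(target, incoming, key="id"):
    """Replace matching rows, keep unmatched rows, add new rows."""
    merged = {row[key]: row for row in target}
    for row in incoming:
        merged[row[key]] = row
    return list(merged.values())


def append_only(target, incoming):
    """Add incoming rows; existing rows are left untouched."""
    return target + incoming


def overwrite(target, incoming):
    """Discard the existing table and keep only the incoming rows."""
    return list(incoming)


# Assumed sample data: id 2 changes, id 3 is new.
target = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "free"}]
incoming = [{"id": 2, "plan": "pro"}, {"id": 3, "plan": "free"}]

print(upsert_merge(target, incoming))
# [{'id': 1, 'plan': 'free'}, {'id': 2, 'plan': 'pro'}, {'id': 3, 'plan': 'free'}]
print(len(append_only(target, incoming)))  # 4
print(overwrite(target, incoming))
# [{'id': 2, 'plan': 'pro'}, {'id': 3, 'plan': 'free'}]
```

Note that in this sketch Append Only leaves two rows with `id` 2 in the table, which is why Upsert-Merge is usually the better fit when the target should hold one current row per key.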

10. Choose whether to include Log Snapshot Tables in the target.

Each of these tables accompanies a target table and holds a history of all changes made to that table in the source database.
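One reason a change history is useful is that it allows a table's state to be rebuilt as of an earlier point. The Python sketch below shows the idea under stated assumptions: the entry layout (`seq`, `op`, `key`, `row`) and the `state_as_of` helper are hypothetical and do not reflect the actual schema of the tables Rivery writes.

```python
def state_as_of(history, max_seq):
    """Rebuild table state from history entries with seq <= max_seq."""
    state = {}
    for entry in sorted(history, key=lambda e: e["seq"]):
        if entry["seq"] > max_seq:
            break
        if entry["op"] == "delete":
            state.pop(entry["key"], None)
        else:  # "insert" or "update" both carry the new row image
            state[entry["key"]] = entry["row"]
    return state


# Assumed sample history for a single row.
history = [
    {"seq": 1, "op": "insert", "key": 10, "row": {"id": 10, "qty": 1}},
    {"seq": 2, "op": "update", "key": 10, "row": {"id": 10, "qty": 5}},
    {"seq": 3, "op": "delete", "key": 10, "row": None},
]

print(state_as_of(history, 2))  # {10: {'id': 10, 'qty': 5}}
print(state_as_of(history, 3))  # {}
```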

11. Select Step 3: Mapping. Choose which Schemas you would like to migrate.

12. Mark the Tables you want to load into the target database.

Regardless of the default extraction mode, you can set individual Tables to use either standard extraction or log-based extraction (i.e., change data capture).

13. Click the Edit button to change the configuration of a specific Table, including setting cluster keys and field modes.

Choose Table Settings to edit the Table’s Loading Mode and Extraction Method. If Incremental Extraction is selected, you must define your incremental field.
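For context on why an incremental field is required, the following Python sketch shows the general pattern behind incremental extraction: each run pulls only rows whose incremental field is greater than the last value seen, then remembers the new high-water mark. The `extract_incremental` helper and the `updated_at` column are assumptions made for this example, not Rivery internals.

```python
def extract_incremental(rows, incremental_field, last_value):
    """Return rows newer than last_value, plus the updated high-water mark."""
    new_rows = [
        r for r in rows
        if last_value is None or r[incremental_field] > last_value
    ]
    new_watermark = max(
        (r[incremental_field] for r in new_rows), default=last_value
    )
    return new_rows, new_watermark


# Assumed sample table; ISO dates compare correctly as plain strings.
source = [
    {"id": 1, "updated_at": "2020-06-01"},
    {"id": 2, "updated_at": "2020-06-02"},
    {"id": 3, "updated_at": "2020-06-03"},
]

batch, watermark = extract_incremental(source, "updated_at", "2020-06-01")
print([r["id"] for r in batch], watermark)  # [2, 3] 2020-06-03

# A second run with no new data returns nothing and keeps the watermark.
batch2, watermark2 = extract_incremental(source, "updated_at", watermark)
print(batch2, watermark2)  # [] 2020-06-03
```

A pattern like this only sees rows that still exist, so deleted rows go unnoticed. Log-based extraction avoids that gap because deletes appear in the database logs.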

14. Once you’ve finished your configurations, save and then run your River. You can monitor the progress of the River in the Activities tab.

A Few Clicks Later, and You Never Have to Worry About Database Syncs Ever Again

With Rivery change data capture, teams can automate real-time database syncs in a few clicks. Achieve new efficiencies and create cost-savings without complicated syncing configurations.

Combined with the rest of Rivery’s platform, change data capture is the latest feature to give our customers a competitive edge and help them grow rapidly in fast-evolving markets.

Taylor McGrath

VP Solutions Engineering

Taylor leads Rivery's solutions engineering team. She is passionate about the data landscape and using internal use cases to enable go-to-market teams. A seasoned data expert, Taylor has extensive experience in end-to-end data projects covering data ingestion, cloud migration, and enabling self-service data analytics across organizations.