Zach Cie
FEB 28, 2024
5 min read

Real-time data availability has become increasingly critical for organizations. Over the years, data utilization has evolved from feeding static dashboards to fueling products, machine learning models, and near-real-time dashboards.

Transactional databases like MongoDB often host modern applications’ operational data. Their storage and access patterns are highly optimized for fast writes and retrieval, along with other factors like data availability and schema flexibility. To enable that functionality, MongoDB has its own recommended data model and access patterns, which are not always as flexible as standard SQL for analytics purposes.

As a result, organizations are shifting analytics off transactional databases like MongoDB and onto storage platforms like Snowflake, where data can not only be rapidly analyzed via SQL but also easily combined and modeled with other data sources.

Migrating Data from MongoDB to Snowflake

There are two well-known methods for migrating data from MongoDB to Snowflake.

  1. Write custom scripts to manually move data in CSV files from MongoDB to Snowflake.
  2. Use a tool like Rivery to rapidly build a data pipeline to continuously migrate data from MongoDB to Snowflake.

The first option involves manually moving data from MongoDB to Snowflake, and it isn’t straightforward. Even the smallest change in MongoDB requires implementing and maintaining incremental fields and retrieving data via select queries to keep the data in Snowflake up to date, as the sketch below suggests.
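To make the first option concrete, here is a minimal sketch of such a script in Python. The database, collection, field names, and ORDERS table are hypothetical, and the connection parameters are placeholders:

```python
import csv

import snowflake.connector
from pymongo import MongoClient

# Export a MongoDB collection to CSV (database, collection, and fields are hypothetical).
client = MongoClient("mongodb://localhost:27017")
fields = ["_id", "customer_id", "total", "updated_at"]
with open("/tmp/orders.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    for doc in client["shop"]["orders"].find({}, {field: 1 for field in fields}):
        writer.writerow({field: doc.get(field) for field in fields})

# Load the CSV into Snowflake through the table's stage (credentials are placeholders).
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
conn.cursor().execute("PUT file:///tmp/orders.csv @%ORDERS")
conn.cursor().execute("COPY INTO ORDERS FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
```

Even this happy path only covers a one-time full copy; keeping Snowflake current would mean layering watermark tracking, scheduling, retries, and schema-change handling on top.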

While there are use cases in which this might be the best option (i.e., a one-time data copy that doesn’t need to be automated), the reality is that for most data teams the manual process of moving data between sources doesn’t scale, is error-prone, and requires technical resources to maintain over time.

The latter is the preferred method for efficiently migrating data from MongoDB to Snowflake, thanks to Rivery’s Change Data Capture (CDC) functionality. CDC enables near-real-time data replication, ensuring that analytics run on the freshest data available while minimizing the impact on the source database. As a design pattern, CDC identifies, monitors, captures, and delivers changes made to transactional databases such as MongoDB.
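In MongoDB’s case, change events are surfaced through Change Streams. As a minimal sketch of the capture side, assuming a replica set and a hypothetical orders collection:

```python
from pymongo import MongoClient

# Change Streams require a replica set or sharded cluster; the URI is a placeholder.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
orders = client["shop"]["orders"]

# full_document="updateLookup" attaches the document's current state to update
# events instead of only the changed fields.
with orders.watch(full_document="updateLookup") as stream:
    for change in stream:
        op = change["operationType"]        # "insert", "update", "delete", ...
        key = change["documentKey"]["_id"]  # identifies the affected document
        doc = change.get("fullDocument")    # absent for deletes
        print(op, key, doc)
```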

Enter Rivery

We firmly believe that the migration of data from MongoDB to Snowflake should not rely on manual intervention. Instead, it should be a seamless, automated process designed to effortlessly establish data pipelines tailored to your specific business requirements.

Rivery’s SaaS ELT Platform leverages MongoDB Change Streams, the native MongoDB capability that powers CDC. Under the hood, Rivery uses the Overwrite loading mode to take a full snapshot (or migration) of the chosen table(s), aligning the data and metadata as they were on the first run. Once the historical migration is complete, the platform performs an Upsert-Merge of the accumulated Change Stream records into the target table(s), while continuing to fetch new records from the log as they are created.
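To illustrate what an Upsert-Merge boils down to on the Snowflake side, here is a rough sketch. It assumes captured change events have already been landed in a hypothetical ORDERS_STAGE table with an OP column; this shows the general pattern, not Rivery’s actual implementation:

```python
import snowflake.connector

# Connection parameters are placeholders.
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)

# Apply staged change events to the target table: delete rows whose latest
# event is a delete, update rows that already exist, and insert new ones.
conn.cursor().execute("""
    MERGE INTO ORDERS t
    USING ORDERS_STAGE s
      ON t._ID = s._ID
    WHEN MATCHED AND s.OP = 'delete' THEN DELETE
    WHEN MATCHED THEN UPDATE SET
      t.CUSTOMER_ID = s.CUSTOMER_ID, t.TOTAL = s.TOTAL, t.UPDATED_AT = s.UPDATED_AT
    WHEN NOT MATCHED AND s.OP <> 'delete' THEN
      INSERT (_ID, CUSTOMER_ID, TOTAL, UPDATED_AT)
      VALUES (s._ID, s.CUSTOMER_ID, s.TOTAL, s.UPDATED_AT)
""")
```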

Our strategic partnership with Snowflake streamlines setting up a data pipeline to migrate MongoDB data to Snowflake using Rivery’s CDC Data Replication. We put together a guide with our friends over at Snowflake to walk you through the process.

Before getting started, you will need the following:

  1. A MongoDB instance running as a replica set or sharded cluster (Change Streams require one)
  2. A Snowflake account with a target database and warehouse
  3. A Rivery account

Rivery is a fully managed SaaS application, removing the complexity of managing data pipelines over time. With Rivery, data teams can securely and effortlessly migrate high-volume database data to cloud-based data warehouses (like Snowflake) with ultra-low latency.

