Ariel Pohoryles
SEP 24, 2024
4 min read
Ingest data using Rivery

Introduction

A common question we hear from prospects is how to transition from Fivetran to Rivery. This shift is often driven by cost concerns or platform limitations. While Rivery’s flexible pricing and ingestion capabilities are compelling reasons to switch, users often discover even more value through additional features that streamline their data management workflows.

This guide outlines how Fivetran’s processes align with Rivery’s capabilities and highlights other beneficial functionality you can take advantage of after making the switch. Our goal is to make your migration to Rivery as smooth as possible.

Setting up your Rivery Account

When you get started with Rivery, you’ll notice your account contains two environments by default: Production and Development.

Rivery’s Environments give you fine-grained control and flexibility when organizing user permissions and assets (connections, pipelines, etc.) within your Rivery account.

If you have a Fivetran Enterprise or Business Critical account, this may sound similar to Fivetran’s Teams object, but Rivery’s Environments do more than ensure that only the relevant users or groups have access to the relevant assets. Beyond simplifying the management of users and assets across business units or even external customers, Environments give you an easy way to manage your data development life cycle: you can develop pipelines in one environment and then deploy them, with all of their dependencies, into another. These deployments are done seamlessly using variables that map to the environments you may have created in your destination data warehouse/lake. For example, if you replicate data into Snowflake and maintain both a Snowflake development environment and a production environment, you can create a variable in Rivery and automate your deployments from one environment to the other.
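
To make the Snowflake example concrete, here is a minimal sketch of the warehouse side, with hypothetical database names; a Rivery environment variable (whatever you choose to call it) would resolve to the matching database when a pipeline is deployed from one environment to the other:

```sql
-- Hypothetical parallel Snowflake environments mirrored by Rivery's
-- Development and Production environments; a Rivery variable would
-- point each deployment at the right database.
CREATE DATABASE IF NOT EXISTS ANALYTICS_DEV;
CREATE DATABASE IF NOT EXISTS ANALYTICS_PROD;
```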

No more manual switching or management of your Fivetran connectors when moving from development to production.

If you want to get started quickly, we recommend configuring your pipelines in the Rivery Development environment, which keeps your options open later on. If you already know how you want to organize your data development processes, you can create additional environments or rename the default ones to match your desired structure. Here is a short video demonstrating how to create environments and deploy assets between them.

Moving your first Connector

Terminology

Establishing a data pipeline in Fivetran and in Rivery involves similar components and processes, but the two platforms name those components differently, and there are a few structural differences between them. The following table maps them out:

| Fivetran Names | Rivery Names |
| --- | --- |
| Connectors | Source Connections + Source to Target Rivers |
| Destinations | Target Connections |
| Transformations | Logic Rivers |

As the table shows, a data pipeline in Fivetran is called a connector, and that single object contains both the data source connection configuration and the data pipeline configuration. In Rivery, data pipelines (called Rivers) are decoupled from the data source connection configuration.

This decoupling comes in handy when you need to set up multiple data pipelines against the same data source (e.g., because different datasets need to be replicated on different schedules, or because a different configuration applies to the replication or a downstream process).

No more having to maintain your data source connection in multiple places.

The starting point

After you choose the first connector to move to Rivery, start by creating a connection for that data source. This step is usually very similar to the configuration you would do in Fivetran.

Then configure your Target connection (a Destination, in Fivetran terms), again following steps similar to Fivetran’s. Unlike Fivetran, however, Rivery also offers the option to move your data to your target warehouse/lakehouse via your own cloud storage, which is easily enabled through the custom file zone configuration.

No more storing a copy of your data on the Fivetran side/working extra to create snapshots.

Configuring your extract and load pipeline

With your source and target connections in place, you can start building your data pipelines using Source to Target Rivers. While the flow is somewhat similar to the Fivetran connector flow, you will notice additional configurations that give you better control over your data pipelines and reduce extra work in downstream processes. If you want to move your connectors as fast as possible, the default settings will in most cases be the closest match to your existing Fivetran setup. However, if you want to take this opportunity to optimize some of your pipeline settings, consider the following options:

  • Replicated data structure: On top of Fivetran’s basic ability to choose your destination schema name, Rivery offers fine-grained control over your replicated data structure. In most cases the default settings Rivery detects are all you need, but when you do want that control, these settings can save you many hours of building and maintaining workarounds. They include control over table prefix, table name, table keys, table cluster keys, column names, column data types (if you want to change the default mapping), replication mode (upsert-merge/append/overwrite) at the table level, and more. With these settings, data modelers (Analytics Engineers, Data Engineers, Data Analysts) can build their ideal data structure, in an ELT fashion, right from the ingestion step, without having to compromise or maintain additional costly post-replication processes.
  • Incremental load mode: Rivery offers three modes for loading your data. The default, Upsert-Merge, matches Fivetran’s default sync mode, and Append Only is roughly similar to Fivetran’s History mode. Rivery also provides an Overwrite mode that can be useful for implementing certain business logic. The first sketch after this list illustrates what an upsert-merge amounts to in the warehouse.
  • Calculated columns: Rivery lets you add calculated columns as part of the replication process, using SQL expressions in your target warehouse/lakehouse dialect. This can eliminate downstream transformation complexity by creating those columns during replication; the second sketch after this list shows the kind of expression you might use.

  • Custom scheduling: Instead of controlling just the sync frequency of your pipeline, Rivery lets you control the exact schedule, including when within the hour the pipeline runs, or define a custom schedule using a cron expression (for example, 0 */6 * * * runs at the top of every sixth hour). Note: there are even more advanced orchestration options when using Logic Rivers; those are detailed later in this guide.

  • Enforce Snowflake masking policy: If your destination is Snowflake, Rivery can respect any masking policy you have configured in Snowflake while loading data into it. That means you can define your data governance masking rules once in Snowflake and be sure your data pipelines don’t override them; the third sketch after this list shows the Snowflake side of such a setup.
  • Custom query for database replication: Like Fivetran, Rivery offers a CDC replication mode for common databases. Where CDC isn’t possible or desired, Fivetran directs users to its Teleport mode, which isn’t very efficient. In Rivery, you can instead replicate your data using a standard SQL extract, where the queries are either generated for you or written by you to define exactly which data to extract. This is very useful when you want to filter data before replication or build a custom dataset with your own SQL logic; the fourth sketch after this list shows a hypothetical custom query.
  • Predefined and Custom Reports for application (API) replication: For most applications, Fivetran predefines a normalized output schema on the destination. Rivery, on the other hand, lets you either replicate application data from a set of Predefined Reports or create your own Custom Report with your ideal selection of data to extract from the source application. These reports tend to resemble the data structure expected from the source and typically require less transformation before they can be used for analytics. This output structure also makes it easier to validate the extracted data with business users, shortening acceptance test cycles.
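
First, the load modes. Here is roughly what an Upsert-Merge load amounts to in the warehouse, written as illustrative Snowflake SQL with hypothetical table and column names (Rivery generates and runs the actual statements for you):

```sql
-- Upsert-Merge: update rows that already exist, insert rows that don't,
-- matching on the table key defined in the river.
MERGE INTO analytics.orders AS tgt
USING analytics.orders_increment AS src
  ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET
  tgt.status     = src.status,
  tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (order_id, status, updated_at)
  VALUES (src.order_id, src.status, src.updated_at);
```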
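
Second, calculated columns. The expressions below are hypothetical examples in Snowflake dialect, shown inside a SELECT for readability; in Rivery you would enter just the expressions in the river’s column mapping:

```sql
-- Two hypothetical calculated columns defined at ingestion time.
SELECT
  CONCAT(first_name, ' ', last_name) AS full_name,    -- derived display name
  DATE_TRUNC('month', signup_date)   AS signup_month  -- monthly bucket for reporting
FROM raw.customers;
```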
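
Third, the masking policy option. The rule itself lives in Snowflake rather than in Rivery, so you define it once and Rivery’s loads respect it. A minimal sketch with hypothetical policy, role, table, and column names:

```sql
-- Define the governance rule once in Snowflake.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val
       ELSE '*** MASKED ***' END;

-- Attach it to the column Rivery loads into; the pipeline won't override it.
ALTER TABLE analytics.customers
  MODIFY COLUMN email SET MASKING POLICY email_mask;
```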
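
Fourth, a hypothetical custom extract query. All names are illustrative and the date function depends on your source database’s dialect; the point is that the filtering happens at the source, before replication:

```sql
-- Extract only recent, non-test orders instead of the full table.
SELECT order_id, customer_id, status, updated_at
FROM sales.orders
WHERE updated_at >= DATEADD(day, -7, CURRENT_TIMESTAMP)  -- last 7 days only
  AND status <> 'test';                                  -- drop test records
```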


No more losing control over your ingestion pipeline definitions and working harder to fix them downstream.

Backfilling historical data

Reusing the data already replicated with Fivetran to avoid a full extraction of historical data is possible, but not always easy. While setting this up could save some time and money, in most cases you will save more by simply replicating the history again with Rivery and letting it manage the incremental loads from there.

If you still want to avoid that initial full sync, you can adapt the Rivery output to a step in your downstream processes (ideally the staging step in your warehouse/lakehouse) where data is already picked up today to serve the rest of your transformations. The process is as follows:

  • Set your Rivery Source to Target River to start replicating raw data from a certain point in time.
  • Build transformations that incrementally load the replicated raw data into a staging model. To avoid mixing with any running Fivetran connectors, it is recommended to load into a new, tool-agnostic materialized view. You can build those transformations with Rivery’s Logic Rivers running SQL transformations, with dbt, or with other solutions; a sketch follows below.
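
Below is a minimal sketch of that incremental staging step, assuming hypothetical table and column names and a Snowflake-style MERGE into a table (for simplicity); your actual keys, watermark column, and materialization will differ:

```sql
-- Incrementally fold newly replicated Rivery rows into the staging object
-- the rest of your transformations already read from.
MERGE INTO staging.stg_orders AS tgt
USING (
  SELECT order_id, status, loaded_at
  FROM rivery_raw.orders
  WHERE loaded_at > (SELECT COALESCE(MAX(loaded_at), '1900-01-01'::TIMESTAMP)
                     FROM staging.stg_orders)
) AS src
  ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET
  tgt.status    = src.status,
  tgt.loaded_at = src.loaded_at
WHEN NOT MATCHED THEN INSERT (order_id, status, loaded_at)
  VALUES (src.order_id, src.status, src.loaded_at);
```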

Adjusting downstream processes

For the most part, moving a connector is a relatively straightforward step. The part that requires more planning is usually not a Fivetran step at all, but still part of your end-to-end workflow: your downstream data transformation processes.

Rivery gives you the option to orchestrate SQL/Python transformations in a Logic River, or to trigger a dbt job, a Databricks transformation job, or another downstream process once ingestion is done.

Whether you plan to keep your current transformation tool or use Rivery for it, in most cases you will want to generate a data model that aligns with the existing model your analytics tools (e.g., Tableau, Sigma) and other tools depend on, so you don’t have to adjust those as well.

We briefly touched on this above, but essentially it means mapping the output of Rivery’s ingestion pipelines to the data structures you previously created.

For databases and files, this process is typically straightforward, as the output structure is usually very similar to the one you have used to date (give or take a few metadata columns generated by Fivetran or Rivery).

For applications (APIs), the Fivetran normalized schema, and potentially any dbt quickstart models built on it, will likely differ from Rivery’s Predefined/Custom Report output. For common applications (e.g., Salesforce, HubSpot), you may find a Rivery kit that generates a similar data model; using such a kit can greatly simplify alignment with any downstream process. For other applications, you will need to map the Rivery output to a step in the downstream process and adjust the SQL logic accordingly (similar to the process described under the “Backfilling historical data” section above).
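
That mapping step is often just a thin view layer. A minimal sketch with hypothetical schema and column names, exposing Rivery’s output under the Fivetran-era names your downstream models were built against:

```sql
-- Remap Rivery's output to the column names existing models expect.
CREATE OR REPLACE VIEW staging.salesforce_account AS
SELECT
  id           AS account_id,    -- Fivetran-era name
  name         AS account_name,
  created_date AS created_at
FROM rivery_raw.salesforce_account;
```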

Moving forward

Once you have moved your first connector, simply repeat the same steps for all remaining connectors until the move is complete. You will notice that pipelines can easily be monitored from within your Rivery dashboard and activities report, so you no longer have to depend on raw tables generated in your warehouse. If you do have downstream processes that depend on those tables, you can still populate them using the Rivery Activities kits or by pushing specific data via Logic Rivers.

At this point, you can start thinking about what else Rivery can help you optimize. Here are a few possibilities to consider:

  • Automate your deployments: With Rivery’s environment variables and deployments, moving pipelines from dev to prod can be a very simple process.
  • Connect to niche sources: Using Rivery’s custom connections, you can easily integrate data sources that lack an out-of-the-box connector.
  • Activate your analytics layer: Rivery can trigger downstream processes to make sure everyone’s data is as fresh as the data in your warehouse. For example, you can trigger a Tableau extract refresh or a Sigma workbook materialization to boost dashboard performance with the freshest data while reducing warehouse compute costs.
  • Operationalize your data with Reverse ETL: Rivery’s orchestration capabilities, together with its Action Rivers, enable you to build Reverse ETL workflows that push enriched data from your warehouse back into business applications.

Switching from Fivetran to Rivery offers not only the opportunity to cut costs and move past platform limitations, but also to optimize and enhance your data operations. We hope this guide highlighted some of the ways to achieve those optimizations.

Lastly, don’t forget, the Rivery team is always happy to support you along the way to make your data flow 🙂

