Kevin Bartley

In today’s data-first economy, ETL is not just a process for centralizing data in a data warehouse. Increasingly, teams are using ETL to send data from a data warehouse into third party systems. “Reverse ETL,” as this method is known, is quickly becoming a core part of tech data stacks. Here’s everything your data team needs to know about reverse ETL.

ETL vs. Reverse ETL – What’s the Difference?

To understand reverse ETL, consider this quick refresher on traditional ETL. Extract, transform, and load (ETL) is a data integration methodology that extracts raw data from sources, transforms the data on a secondary processing server, and then loads the data into a target database.

More recently, with the rise of cloud data warehouses, extract, load, transform (ELT) is beginning to supplant ETL. Unlike the ETL method, ELT does not require data transformation before the loading process. ELT loads raw data directly into a cloud data warehouse. Data transformations are executed inside the data warehouse via SQL pushdowns, Python scripts, and other code.

ETL and ELT both transfer data from third party systems, such as business applications (Hubspot, Salesforce) and databases (Oracle, MySQL), into target data warehouses. But with reverse ETL, the data warehouse is the source, rather than the target. The target is a third party system. In reverse ETL, data is extracted from the data warehouse, transformed inside the warehouse to meet the data formatting requirements of the third party system, and then loaded into the third party system for action taking.

The method is referred to as reverse ETL, rather than reverse ELT, because data warehouses cannot load data directly into a third party system. The data must first be transformed to meet the formatting requirements of the third party system. However, this process is not traditional ETL, because data transformation is performed inside the data warehouse. There is no “in-between” processing server that transforms the data.

Here’s an example: If a Tableau report contains a customer lifetime value (LTV) score, this Tableau-formatted data is not processable in Salesforce. So a data engineer applies an SQL-based transformation to this report data within Snowflake to isolate the LTV score, format it for Salesforce, and push it into a Salesforce field so sales representatives can use the information.

What’s the Impact of Reverse ETL?

By pushing data back into third party systems such as business applications, reverse ETL operationalizes data throughout an organization. Reverse ETL enables any team, from sales, to marketing, to product, to access the data they need, within the systems they use. The applications of reverse ETL are numerous, but some examples include:

  • Syncing internal support channels with Zendesk to prioritize customer service
  • Pushing customer data to Salesforce to enhance the sales process
  • Adding product metrics to Pendo to improve the customer experience
  • Combining support, sales, and product data in Hubspot to personalize marketing campaigns for customers.

Even in companies with a cloud data warehouse, data does not always end up in the right hands. Reverse ETL solves this problem by pushing data directly into the applications leveraged by line-of-business (LOB) users. In some companies, teams already have access to the data they need via BI reports. But, to the dismay of BI developers, these reports are often underutilized.

What teams really want is to access data within the systems and processes that they are familiar with. This is exactly what reverse ETL enables. With reverse ETL, business users can actually harness data in an operational capacity. Teams can act on the data in real-time, and use it to make key decisions.

Reverse ETL can also streamline data automation throughout a company. Reverse ETL helps eliminate manual data processes, such as CSV pulls and imports, involved in data tasks. Sometimes, reverse ETL is another way to complete a step in a broader data workflow. For instance, if you’re building an AI/ML workflow on top of your Databricks stack, reverse ETL can push formatted data into the sequence. Companies are also increasingly incorporating reverse ETL into in-app processes, such as syncing production databases.

Reverse ETL: What’s the Right Solution?

A team can either build or buy an ETL solution. Those that build an ETL solution must create data connectors — the connection between the data source and the data warehouse — from scratch. This building process can take weeks or months, depending on the development team, and hampers scalability, drains dev resources, and saddles the system with long-term maintenance requirements. For these reasons and others, data teams often consider SaaS ETL platforms.

SaaS ETL platforms come with “pre-built” data connectors. The number of connectors vary by provider, but platforms often offer “plug-and-play” ETL connectors for the most popular data sources. However, these data connectors extract data from third party systems and load them into data warehouses. Reverse ETL data connectors are the opposite. They must extract data from data warehouses and load them into third party systems.

In other words: an ETL data connector is not a reverse ETL data connector. So when you see ETL platforms advertising ETL data connectors, that does not necessarily mean that a reverse ETL data connector is also available. In fact, it’s not uncommon for teams to adopt an ETL platform and still have to build reverse ETL connectors on the backend.

Building a reverse ETL connector can be just as intensive as building a traditional ETL connector. So, if reverse ETL is important to you, research the feature for each platform before you buy. At Rivery, we offer reverse ETL for all of our data sources, including via Action Rivers. Send data from your cloud data warehouse to business applications, marketing clouds, CPDs, REST APIs, and more.

Reverse ETL Example

Send Data from Snowflake to Hubspot to Track Free Trial Account Usage

Here’s a real life reverse ETL example we’ve implemented at our very own company. Rivery uses reverse ETL to update custom Hubspot deal properties when certain events transpire in a DWH.

This is how we do it.

To perform Reverse ETL in Rivery, you will need:

  • Target data warehouse
  • Data destination (REST API endpoint)

A general knowledge of the destination API structure and behavior is recommended. Rivery’s Action Rivers will pass data from a DWH table through the corresponding inputs of a REST API endpoint.

Paint Point: What Problem is Reverse ETL Solving?

First, our pain point. At Rivery, we want our salespeople to track free trial account usage, so they can better respond to prospects in the sales cycle. This process revolves around three product usage properties in particular:

  • Number of rivers
  • Number of runs
  • Number of distinct sources

In order to power the workflow, we need to send these three custom properties about product usage to Hubspot, our CRM platform of choice, via reverse ETL.

Let’s assume we already have this data loaded in a table in Snowflake called RIVER_EXECUTIONS.

1. Define Data to Push to Hubspot

To start, we must create a logic step that returns only the data we need — number of rivers, runs, and distinct sources — to Hubspot. We want to identify these data points, but only for active trial accounts in our system.

When a new trial account is created, a Hubspot deal is auto-generated. The Hubspot deal includes the Rivery account ID as a custom property. We will use this account ID to connect Rivery usage data with our sales pipeline.

To define the data we need, we will apply a SQL query to the dataset:

select
deals.deal_id,
runs.rivery_account_id,
runs.rivery_num_of_rivers,
runs.rivery_executions,
runs.rivery_data_sources
from "INTERNAL"."ODS"."HUBSPOT_DEALS" deals
inner join (
select
rivery_account_id,
count(distinct river_id) as rivery_num_of_rivers,
count(distinct run_id) as rivery_executions,
count(distinct datasource_type) as rivery_data_sources
from INTERNAL.ODS.RIVER_EXECUTIONS
Where account_type = ‘trial’
group by rivery_account_id
) runs
on deals.rivery_account_id = runs.rivery_account_id
where deals.isdeleted = FALSE;

With this SQL query, we are:

  • Creating three unique metrics grouped at the account level for trial accounts
  • Joining these metrics to our existing deal pipeline.

Now, create a new Logic River in Rivery and use the ‘SQL/Script’ type of Logic Step. Choose your target type, your connection, and input your source query (i.e. what we have above).

Define a table in your target data warehouse to store these results. In our example, we’ll call this table ‘TRIAL_DEALS_PROPERTIES’.

2. Transform Data to Meet Hubspot CRM API Requirements

In the next step, we will utilize the Hubspot CRM API to update Hubspot deals with the data from Step 1. Each call to the Hubspot CRM API requires a dealid parameter, and a request body populated with a properties object to update the deals.

Here’s an example request body:

{
“rivery_data_sources”: 3,
“rivery_executions”: 591,
“rivery_num_of_rivers”: 39
}

By leveraging Snowflake’s JSON-building functions, such as OBJECT_CONSTRUCT(), we can produce this object for each deal_id. This enables us to perform each update call by passing both a deal_id and the corresponding properties into an Rivery Action River.

This example query uses the OBJECT_CONSTRUCT() function to produce the desired results:

select
deal_id,
object_construct('rivery_num_of_rivers', rivery_num_of_rivers,
'rivery_executions', rivery_executions,
'rivery_data_sources',rivery_data_sources
) as property_values
from "INTERNAL"."AUTOMATION"."TRIAL_DEALS_PROPERTIES"
group by
deal_id,
rivery_num_of_rivers,
rivery_executions,
rivery_data_sources
;

Sample Results

Now we’ll add another Logic Step and use the code above as the source query. However, instead of setting a table as the target value, we’ll store our results in a variable, one of Rivery’s Logic functions.

This will store the results of the query in a variable, which we can leverage in future Logic Steps.

3. Build Custom Connection for Hubspot CRM API

Using the Hubspot CRM API documentation, we can define the corresponding REST template within an Action River. We will leverage the Action River in our existing Logic River.

In the request body, we’ll define a variable called properties. Once we add the Action River as a step in the Logic River, this body will contain the properties that will be updated for each call.

4. Finalize Logic Steps

Lastly, we need to add the Action River from Step 3 to the Logic River from Steps 1 & 2. This will unify the Reverse ETL process in one Logic River.In the Logic River, we’ll add a third step, and select an ‘Action’ step. Choose the Action River from Step 3.

Next, click ‘Container Me’ on the Action River to wrap this step in a container. Change the container type to ‘Loop Over’ from the dropdown. We need to send one request per deal_id, so to handle multiple deals, we must loop over our Action Step if necessary.

The step will look something like this:

In the ‘for each value in’ dropdown, select the variable created by the second Logic Step (called ‘deal_properties’ here). In the Variables menu in the Logic River – make sure this variable is set to accept multiple values:

In the second window, create two iterators, one for the deal_id parameter and one for the properties.

In addition, set the Input Variables in the Action Step to match the iterators created above. This is the key that will connect the data values stored in the variable to the variables defined in the API request.

5. Drive Sales Process with Actionable, Relevant Data

And voila – our salespeople have access to the free trial information they need to optimize the sales process. With reverse ETL, we sent product data from our data warehouse into Hubspot, so our sales team can associate free trials with product usage and engage in more targeted sales practices.

Reverse ETL: Put Data to Work, Make Results Happen

Data warehouses were introduced to eliminate data silos. But for too many companies, data warehouses have become data silos. Reverse ETL solves this conundrum by removing the barrier between a data warehouse and the rest of the company. Now teams can operationalize data, in the systems and processes they feel comfortable with, and act on data to drive results. In today’s breakneck data economy, reverse ETL puts data in the hands and workflows that need it, when they need it.