Rivery is designed to let every team member seamlessly generate insights from data without backend hassles or grunt work. Now, with the Rivery API, any team member can build data pipelines in Rivery for use in broader data management workflows.


By combining the Rivery API and Apache Airflow, data analysts and other personnel can harness the data pipelines they need in a workflow, regardless of technical background. Here’s what Rivery’s Airflow integration can do for your team.


What is the Rivery API?


The Rivery API integrates the functionality of Rivery’s platform into other applications and schedulers. The API can activate data pipelines or check their status from third-party platforms, allowing customers to trigger actions externally and to automate pipeline executions programmatically.
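As a non-authoritative sketch, an external trigger could be assembled along these lines. The base URL, the `/run` path, and the `river_id` field are hypothetical placeholders for illustration, not Rivery’s documented contract; consult the Rivery API reference for the actual endpoints.

```python
# Hypothetical sketch: triggering a Rivery data pipeline ("river") via REST.
# The base URL, endpoint path, and payload field below are assumptions for
# illustration only -- check Rivery's API documentation for the real contract.
import json
import urllib.request

API_BASE = "https://console.rivery.io/api"  # assumed base URL


def build_run_request(token: str, river_id: str) -> urllib.request.Request:
    """Prepare an authenticated POST asking Rivery to run a pipeline."""
    body = json.dumps({"river_id": river_id}).encode()
    return urllib.request.Request(
        url=f"{API_BASE}/run",  # assumed endpoint
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_run_request("MY_API_TOKEN", "my-river-id")
    # urllib.request.urlopen(req) would actually send the request; omitted here.
    print(req.get_method(), req.full_url)
```

The same pattern works from any scheduler or script that can issue HTTP requests, which is what lets third-party platforms drive Rivery pipelines.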


What is Apache Airflow?


Apache Airflow is an open-source, Python-based platform used to create, monitor, and schedule workflows. Airflow’s scalability, extensibility, community, and integration with DevOps tools have made it the go-to platform for data engineers building data ingestion and transformation workflows.


Rivery API + Apache Airflow: Combine ELT with ETL Architectures


With Rivery’s Airflow integration, teams do not have to choose between ETL and ELT. The Rivery API incorporates ELT capabilities alongside ETL processes built on Airflow. For companies with deeply ingrained ETL paradigms, Rivery’s Airflow integration offers the benefits of ELT while keeping an existing data architecture intact.


It’s the best of both worlds. Data engineers do not have to build data pipelines, and companies do not have to reorient their data stacks. The integration enables any team member to create, maintain, schedule, and optimize data pipelines in Airflow, regardless of technical expertise.


Non-technical team members, such as data analysts, can use preexisting data connectors and workflow templates to build pipelines for BI tools, analysis, reporting, and more. At the same time, data engineers can still use Python, JavaScript, Hive, and other existing frameworks to build workflows for backend server optimization or other complex use cases.


Airflow’s task dependency management, combined with Rivery’s integration with cloud data warehouses, also allows for production-level code deployment orchestrated via logical steps in Airflow and Rivery.


Rivery’s Airflow integration streamlines data ingestion. Now any team member can harness a data pipeline without waiting on a data engineer. Unlock a new level of speed and efficiency. Complete data projects faster. Capitalize on opportunities sooner. And free data engineering teams to focus on higher-value tasks.


Rivery Airflow Integration: How it Works


Teams can set up the Rivery Airflow integration in a few simple steps. First, the Rivery API is added as a connection in Apache Airflow.

Next, teams build a DAG (Directed Acyclic Graph) in Airflow that calls the Rivery API within a workflow via a BashOperator. Once the BashOperator runs, the Rivery API executes a specified data pipeline:

A DAG with a Bash Operator (Image Credit: Chandu Kavar)
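As a minimal sketch, such a DAG might look like the following. The Airflow pieces follow the standard Airflow 2.x API; the curl endpoint, the token environment variable, and the river id are illustrative assumptions, not Rivery’s documented API.

```python
# Minimal sketch of an Airflow DAG that triggers a Rivery pipeline ("river").
# Airflow usage is standard 2.x; the curl endpoint, RIVERY_API_TOKEN variable,
# and river id are placeholder assumptions, not Rivery's documented contract.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="trigger_rivery_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # The BashOperator shells out to curl; Airflow supplies the scheduling,
    # retries, and logging around the API call.
    run_river = BashOperator(
        task_id="run_river",
        bash_command=(
            "curl -X POST https://console.rivery.io/api/run "  # assumed endpoint
            '-H "Authorization: Bearer $RIVERY_API_TOKEN" '
            '-H "Content-Type: application/json" '
            "-d '{\"river_id\": \"my-river-id\"}'"
        ),
    )

    # A downstream task runs only after the trigger succeeds, illustrating
    # Airflow's task dependency management around a Rivery step.
    notify = BashOperator(
        task_id="notify",
        bash_command="echo 'Rivery pipeline triggered'",
    )

    run_river >> notify
```

Because the Rivery call is just one task in the graph, it can be chained before or after any other Airflow task, which is how logical steps across Airflow and Rivery are orchestrated.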


The DAG workflow appears in the Airflow UI, allowing non-technical users to view the overall workflow, while giving technical users the ability to configure, schedule, or manipulate individual processes if they so desire.

Read our Community article for a more in-depth, technical walkthrough.


Rivery Airflow Integration: Anybody Can Use It


With Rivery’s API, teams can schedule and control complex workflows using an open-source, mainstream data engineering tool like Airflow. This enables anyone in an organization to take part in the data management process. For companies that don’t want to give up ETL-based platforms such as Airflow, the Rivery API offers the advantages of ELT without uprooting data operations. It’s a win-win, and we’re looking forward to seeing how our customers use this combination!