Itamar Ben Hemo
SEP 27, 2024
6 min read

Earlier this year, I wrote about why it is time for the modern data stack to evolve.

“Modern data stack technologies are linked together in a linear chain. Naturally, there are pressure points in terms of integration and manpower. A lot of resources are required to serve insights to the entire business. Tech-wise, upstream processes enable downstream processes. So if one link in the chain fails nothing downstream can function or get updated properly. It demands a lot of workarounds. This process does not scale up well.”

Now, six months later, it’s clearer than ever that AI is driving the consolidation of data stacks. As AI increasingly permeates the data landscape, the focus is shifting toward unified platforms that streamline data processes, reduce overhead, and open up new opportunities for innovation.

Shifting from the Modern Data Stack to Modern Data Platforms

I’m a firm believer in embracing change. It’s something I emphasize to my team every week: the landscape is evolving, and if we don’t adapt, we’ll be left behind. Since my last piece on the need for change in the modern data stack, I’ve spoken with dozens of customers and prospects about their plans for building data pipelines for AI use cases.

One theme stood out: “We still haven’t figured out how to do ELT effectively.”

This raised an important question for me: how can data teams be expected to build pipelines for AI (e.g., retrieval-augmented generation (RAG) workflows) when they're still grappling with ELT pipelines for analytics?

It makes sense, though. Data teams are traditionally burdened with managing a fragmented stack of tools for Extract, Load, Transform (ELT), data lakes, warehouses, reverse ETL, visualization, and orchestration, and with making all of those pieces play together. This disjointed approach requires specialized expertise in each tool and often wastes both time and resources.

A modern data platform can address this by consolidating all the functions a common data team needs to build end-to-end pipelines. This platform simplifies processes, yet is flexible enough to handle specific use cases. It eliminates the need to manage multiple tools, providing centralized visibility over pipelines and making it easier to diagnose issues when things go wrong.

As companies are forced to prove the ROI of their data teams with fewer people and tighter budgets, the pressure to deliver faster keeps increasing, especially in the post-ZIRP era. The fragmentation of the data stack is a major bottleneck for organizations struggling to scale or adopt new technologies like AI.

AI is Reshaping Data Engineering

If big data was an exciting era for the data industry, and the modern data stack created unicorn companies almost overnight, what will AI mean for the data world?

Every conference, blog (including this one), customer conversation, and product release is now centered around AI. From using GenAI to enhance customer support and building AI bots to speed up troubleshooting, to leveraging AI to instantly connect to any REST API endpoint, every company is, in some way, becoming an AI company.

Sure, having AI in your product is great for checking a box for investors or prospects, but I think many companies overlook the core value AI brings.

As the CEO and Co-Founder of Rivery, my third venture, I’ve learned the importance of being relentlessly customer-focused. We build products that make a difference for engineers every day, and when integrating AI into features or workflows, it’s critical to stick to that foundational question: how can AI create real impact for your customers?

In the data industry, AI is reshaping not only how vendors position themselves but also how customers make buying decisions (AI FOMO is real). According to a recent McKinsey survey, 72% of organizations have adopted AI in at least one business unit.

So, how are data engineers and data stacks evolving to keep up with these changes?

In my conversations with data teams and engineers worldwide over the past few months, it’s become clear that the role of the data engineer is rapidly transforming. AI is pushing data engineers to ingest new data sources (mostly unstructured data), transform data for novel uses (beyond analytics modeling), and load data into new destinations (like vector databases).

Additionally, companies are adopting an AI-first mindset, aiming to automate processes end-to-end and minimize the need for human intervention. Keep in mind that AI isn’t here to replace you; it’s here to help you work faster and more efficiently. This is driving the need for companies to adopt tooling that makes complex tasks as simple as possible.

But as the role of the data engineer shifts, how is the modern data stack evolving to support them?

As I mentioned earlier, the consolidation of the data stack is accelerating to accommodate GenAI use cases. One of the biggest challenges data engineers face when building AI applications is integrating the right tools to bring in accurate, reliable data to power these applications. Organizations are also keen to leverage their own data to fuel AI products. For any AI initiative to succeed, you need control over your data and the ability to deliver it to the AI application at exactly the right time.

The Consolidation of the Data Stack with AI

What often gets overlooked in the data industry is why we’re here in the first place. Data teams—engineers, analysts, scientists, analytic engineers, and BI developers—exist for one reason: to help organizations leverage data as a strategic advantage.

Today, a company is only as strong as the data it works with and has access to.

The AI renaissance we’re experiencing is further lowering the barrier to entry for working with data. Just as the modern data stack in the mid-to-late 2010s made data more accessible and affordable, today’s AI advancements are doing the same, making it easier for organizations to harness the power of their data.

Once seen as an unglamorous role, data engineering has become one of the most sought-after jobs in the market. AI is reshaping data stacks, which are evolving into platform-based approaches. These platforms reduce the complexity of data operations by minimizing manual processes and automating workflows, leading to faster time-to-insight. Consolidated data platforms now offer end-to-end solutions that unify the entire data lifecycle, lightening the load on data engineers who must build pipelines not only for analytics use cases but also for AI.

A common scenario I’ve encountered involves small data teams—typically a couple of engineers, an analyst, and a scientist—supporting an organization of hundreds. These teams not only leverage marketing data to optimize campaigns and personalize messaging but also rely on product analytics to enhance user experiences. Now, they’re also tasked with building pipelines for unstructured data, such as free-form text collected in different systems, to power AI applications.

Traditional approaches that stitch together separate tools for data integration, transformation, orchestration, storage, quality control, and BI are no longer sufficient. Can they work? Yes, but they often require days or even weeks to deliver value, and that’s simply not fast enough. When working with AI, the old ways won’t cut it. A new, more efficient approach is essential.

As mentioned earlier, consolidating the data stack into an AI-powered platform streamlines all the functions a small data team needs to build end-to-end pipelines. Instead of piecing together multiple tools for AI and data architecture—like using data integration tools or open-source libraries, writing Python code, and loading data into vector databases like Pinecone or storing vector data in PostgreSQL—teams can simplify the process.
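To make that fragmentation concrete, here is a rough sketch of what the do-it-yourself path often looks like: chunk free-form text, generate embeddings, and upsert them into Pinecone. This is only an illustration, assuming the OpenAI embeddings API and the Pinecone Python client; the index name, model, and chunk size are placeholders, not recommendations.

```python
from openai import OpenAI          # pip install openai
from pinecone import Pinecone      # pip install pinecone

openai_client = OpenAI()                       # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_KEY")     # placeholder credentials
index = pc.Index("support-tickets")            # hypothetical, pre-created index

def chunk(text: str, size: int = 800) -> list[str]:
    """Naive fixed-size chunking; real pipelines need smarter splitting."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def load_document(doc_id: str, text: str) -> None:
    chunks = chunk(text)
    # Embed every chunk in a single API call.
    resp = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=chunks,
    )
    # Upsert the vectors, keeping the original text as metadata for retrieval.
    index.upsert(vectors=[
        {
            "id": f"{doc_id}-{i}",
            "values": item.embedding,
            "metadata": {"text": chunks[i]},
        }
        for i, item in enumerate(resp.data)
    ])

load_document("ticket-1042", "Customer reports the nightly sync failed twice this week...")
```

Every piece of that sketch—chunking decisions, embedding model choice, index management, retries, scheduling—becomes code the team owns and maintains, which is exactly the overhead a unified platform is meant to absorb.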

With a unified platform, data teams can load data directly into cloud data warehouses or lakes, leveraging existing infrastructure alongside the LLM or vector capabilities of these platforms. For example, teams can use Rivery to extract and load data into Snowflake, transform it for large language models (LLMs), and then run analysis using Snowflake’s Cortex functions through simple SQL queries on up-to-date, relevant data.
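As a minimal sketch of that last step once Rivery has landed the data in Snowflake, Cortex functions can be called through plain SQL from Python. The support_tickets table, its columns, and the connection details below are hypothetical; SNOWFLAKE.CORTEX.SUMMARIZE and SNOWFLAKE.CORTEX.SENTIMENT are invoked like any other SQL function.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder connection details.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="ANALYTICS_WH",
    database="RAW",
    schema="SUPPORT",
)

# Summarize and score freshly loaded tickets with Cortex, straight from SQL.
query = """
    SELECT ticket_id,
           SNOWFLAKE.CORTEX.SUMMARIZE(ticket_text)  AS summary,
           SNOWFLAKE.CORTEX.SENTIMENT(ticket_text)  AS sentiment
    FROM support_tickets
    WHERE created_at >= DATEADD(day, -7, CURRENT_DATE())
"""

cur = conn.cursor()
try:
    cur.execute(query)
    for ticket_id, summary, sentiment in cur:
        print(ticket_id, round(sentiment, 2), summary)
finally:
    cur.close()
    conn.close()
```

The point is less the specific functions than the workflow: the same warehouse that holds the analytics models serves the LLM workload, on data that is already fresh and governed.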

Alternatively, solutions like Amazon Q provide fully abstracted services that handle everything, from RAG workflow management and infrastructure to user interfaces such as web chatbots, without needing to choose a vector database. All you need to do is load data into the platform. This is where Rivery comes in, efficiently loading data, in the right structure, into an S3 bucket dedicated to Amazon Q.
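Under that model, the pipeline's job reduces to landing files in the bucket Amazon Q reads from, whether through a Rivery target or, as a bare-bones illustration, a few lines of boto3. The bucket name and keys below are placeholders.

```python
import boto3  # pip install boto3

s3 = boto3.client("s3")

# Hypothetical bucket registered as an S3 data source for Amazon Q.
BUCKET = "acme-amazon-q-knowledge-base"

# Upload exported documents; Amazon Q's S3 connector indexes whatever lands here.
for local_path, key in [
    ("exports/support_faq.txt", "docs/support_faq.txt"),
    ("exports/runbook.pdf", "docs/runbook.pdf"),
]:
    s3.upload_file(Filename=local_path, Bucket=BUCKET, Key=key)
    print(f"uploaded s3://{BUCKET}/{key}")
```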

AI is the Future of Data Engineering

The consolidation of the modern data stack into an AI-powered data platform enables organizations to spend less time managing complex infrastructure and more time unlocking value from their data. While we are still in the early days of generative AI, it’s becoming clear that the future will be one where data-driven decision-making is faster, more accessible, and seamlessly powered by AI throughout the entire data lifecycle. Data teams that embrace GenAI will lead this exciting new era, while those that don’t risk falling behind.

AI’s rise might finally allow data teams to perform ELT more efficiently, which is why I believe those who adopt a platform-based approach will excel in both ELT for analytics and AI initiatives.

At Rivery, we are fully committed to this transformation, integrating generative AI into both our workflows and our product. We’ve reimagined the next generation of data pipelines to be AI-powered. With our Copilot, you can easily connect to any data source that has a REST API endpoint. Even if we don’t have a native connector for a specific source, we’ll still help you access all your data effortlessly, regardless of your data engineering expertise. Stay tuned for our plans as we continue to integrate AI, enabling users to do more with their data.

