Itamar Ben Hemo
MAY 28, 2024

During the latter part of the 2010s, there was a pivotal shift in decision-making for organizations that recognized the importance of utilizing data.

This awakening spurred a rapid hiring spree across organizations looking to build up their data teams. Data engineers, analysts, scientists, and chief data officers were hired at a record pace.

Organizations found themselves in a frenzy to amass data at breakneck speed, with data teams expanding overnight and CFOs readily approving expenses to bolster the organization’s quest to become truly “data-driven.”

During this peak period, data teams invested in various tools without thoroughly assessing their impact on the organization’s bottom line. This era, which I now refer to as the “data-at-all-costs” era, marked a significant milestone in the world of data.

Reflecting on the inception of Rivery in 2019, I vividly recall a conversation with a data leader from a prominent e-commerce entity. They expressed how their executive team admired the data-driven approaches of industry giants like Google, Netflix, and Amazon, aspiring to replicate their success.

Revisiting this conversation recently revealed a stark contrast. The same leader said their budget had been slashed by half for the current fiscal year. They now faced the daunting task of justifying the value of their team’s efforts amidst dwindling resources and mounting demands.

This narrative is not uncommon. In recent months, I’ve engaged with numerous data leaders grappling with the end of the Zero Interest Rate Policy (ZIRP) era. The days of deploying expensive technology stacks without CFO scrutiny are behind us.

So, where do we proceed from here? How can your data team thrive in this new landscape with fewer resources?

Are Lean Data Teams A Bad Thing?

I strongly dislike the term “small data team.” It undermines the immense value that data practitioners contribute daily. Regardless of team size, data teams consistently make significant impacts on stakeholders’ lives.

This leads me to a question I’ve been pondering for the past month: Does the size of your data team matter? Seriously, does it make a difference whether your data team consists of eight engineers and analysts or just one engineer or analyst?

After much consideration, I’ve concluded that team size is not the determining factor. What truly matters is the value you provide and the speed at which you deliver that value to your stakeholders.

Currently, data teams are under immense pressure. Organizations have invested heavily in becoming data-driven, and now it’s time to demonstrate the return on investment (ROI) of these efforts.

Ask yourself: Is the work you do every day making a tangible impact on the business through data?

The harsh truth is that business stakeholders are not concerned about the intricacies of data acquisition. They don’t care whether data comes from a team of one or ten-plus people. They’re not interested in schema changes, API disruptions, or the sophistication of your data pipelines. 

What they care about is gaining access to high-quality data as quickly as possible.

Time is money, but money can’t buy time

I’m fortunate enough to have spent the past two decades of my career in data. From co-founding a data engineering consultancy to launching a company that empowers data analysts to build end-to-end data pipelines without relying on data engineers, I’ve always been intrigued by smaller teams being able to do more with less.

There’s a philosophy I’ve held to throughout the years when building and operating data teams: keep them as lean as possible.

This is not to say that large data teams can’t achieve significant results. They can, and the data teams at Facebook, Netflix, Airbnb, and Google are great examples of this being true.

But the company you work for is unlikely to resemble these organizations, nor does it have the same data requirements. The fact of the matter is that larger data teams have more resources available to them. And if you are on a lean data team, you are likely wearing multiple hats.

Over the years I have seen more and more data teams hold responsibility for spinning up infrastructure in production without a DevOps team. For those of you unfamiliar with building, deploying, and maintaining data infrastructure in production (consider yourself lucky), it’s not an easy task.

For starters, every piece of your data infrastructure needs to be secure, efficient, and most importantly interoperable with the rest of your stack. 

Apache Airflow, one of the most popular data engineering tools, is a prime example of this. It’s quite common for data teams to run Airflow on Kubernetes. Kubernetes is not simple, and the process is extremely time-consuming and challenging, especially when you are already short on engineering resources or have never deployed Airflow on Kubernetes.

Airflow itself is a complicated stateful application. A typical deployment relies on a SQL metadata database and, when run with the Celery executor, a message broker such as Redis. Each of these components is technically difficult to operate, and scaling Airflow while ensuring it is always up and running is not simple.

To add to the fire, modern data stack technologies are linked together in a linear chain. Naturally, there are pressure points in terms of integration and manpower. A lot of resources are required to serve insights to the entire business. Tech-wise, upstream processes enable downstream processes. So if one link in the chain fails nothing downstream can function or get updated properly. It demands a lot of workarounds.
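The chain effect described above can be illustrated with a minimal sketch. The step names (`ingest`, `transform`, `dashboard_refresh`, `reverse_etl`) and the `run_chain` helper are hypothetical, chosen only for illustration; real orchestrators handle dependencies in far richer ways.

```python
from typing import Callable, Dict, List, Tuple

# A step is a (name, task) pair; both are hypothetical placeholders.
Step = Tuple[str, Callable[[], None]]


def run_chain(steps: List[Step]) -> Dict[str, str]:
    """Run steps in order; once one fails, every later step is skipped."""
    status: Dict[str, str] = {}
    upstream_failed = False
    for name, task in steps:
        if upstream_failed:
            status[name] = "skipped"
            continue
        try:
            task()
            status[name] = "ok"
        except Exception:
            status[name] = "failed"
            upstream_failed = True
    return status


def ok() -> None:
    """Stand-in for a step that succeeds."""


def broken() -> None:
    """Stand-in for a step that breaks, e.g. after a source schema change."""
    raise RuntimeError("source schema changed")


pipeline: List[Step] = [
    ("ingest", ok),
    ("transform", broken),
    ("dashboard_refresh", ok),
    ("reverse_etl", ok),
]
print(run_chain(pipeline))
```

In this toy run, the failure in `transform` leaves both downstream steps untouched, which is the situation a lean team then has to work around by hand.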

This process does not scale well for lean data teams, as they lack the spare resources or the financial means to hire more employees to solve the problem.

Ask yourself: If you are a data team member at a startup and you spend 2-3 months of your time spinning up infrastructure, what percentage of your organization’s runway is that?
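To put a rough number on that question, here is a small back-of-the-envelope sketch. The 18-month runway is purely an illustrative assumption, not a figure from any real company, and `runway_share` is a hypothetical helper name.

```python
def runway_share(months_spent: float, runway_months: float) -> float:
    """Return the percentage of total runway consumed by a piece of work."""
    if runway_months <= 0:
        raise ValueError("runway_months must be positive")
    return 100.0 * months_spent / runway_months


# Assumed runway of 18 months, for illustration only.
ASSUMED_RUNWAY_MONTHS = 18

for months in (2, 3):
    share = runway_share(months, ASSUMED_RUNWAY_MONTHS)
    print(f"{months} months of infrastructure work = {share:.1f}% of runway")
```

Under that assumption, two to three months of infrastructure work consumes roughly 11% to 17% of the runway before a single insight reaches a stakeholder.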

As I’ve emphasized before and will reiterate now, members of lean data teams must be mindful of their time. Time is a precious commodity, and deploying infrastructure or performing backend work does not represent the most efficient use of it.

Instead, the focus should be on delivering value to the business swiftly and efficiently, leveraging available resources judiciously to achieve that goal.

A Better Way Forward For Data Teams

So, how do lean data teams overcome their resource limitations and make meaningful contributions to their organizations’ success?

I’ve been asked this question a few times this year, and my answer is broken down into two parts.

Consolidate and Automate

First, I recommend consolidating your data stack and looking at which parts of your stack you can automate. Remember, you are focusing on delivering ROI faster.

Buying modern tooling has the potential to shorten the data pipeline development time from weeks to just a few hours or days.

I like to break tooling down into two approaches.

  1. A data stack approach that involves piecing together data ingestion, transformation, orchestration, and activation tooling to form an end-to-end data pipeline. This approach still requires engineering time to stitch all the tools together and ultimately slows down the creation of data pipelines.
  2. A data platform approach that involves utilizing a modern data platform to handle all the functionalities of building end-to-end data pipelines. This approach dramatically improves the development experience and removes the need to switch back and forth between tooling to create end-to-end data pipelines.

A platform approach to data pipelines consolidates the various functions that a small data team would need to build end-to-end data pipelines and blends simplicity and flexibility to adapt to specific use cases. Beyond consolidation, the platform’s wide capabilities mean it can offer accelerators and templates that provide complete solutions, cutting even more wasted time.

Incorporate AI into Data Engineering

Yes, Generative AI has made considerable strides, surpassing its previous iterations. Does this mean it will replace data professional roles within the next two years? Not likely. However, projecting a decade ahead, the landscape might indeed witness AI assuming the responsibilities of data engineers, analysts, and scientists.

Rather than shying away from Generative AI, I advocate embracing its potential within the realm of data engineering. Here are a few notable use cases that I’ve observed and experimented with:

  • Utilizing tools like ChatGPT to accelerate SQL query generation.
  • Leveraging AI for seamless translation between diverse SQL dialects.
  • Harnessing AI capabilities to automate dataset documentation, alleviating the burdensome task of manual documentation.
  • Employing AI to automate schema generation from JSON files, streamlining data modeling processes.
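To make the last use case concrete, here is a minimal sketch of what "schema generation from JSON" produces. It is deliberately not AI: it is a plain rule-based inference that could serve as a sanity check on an AI-generated schema. The sample records, the `infer_schema` helper, and the generic type names (`BIGINT`, `DOUBLE`, and so on) are all assumptions for illustration; actual type names vary by warehouse.

```python
import json
from typing import Any, Dict, List, Optional


def sql_type(value: Any) -> str:
    """Map a Python value parsed from JSON to a generic SQL-like type name."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    return "JSON"  # nested objects and arrays are kept as raw JSON


def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Infer one column type per key across all records."""
    schema: Dict[str, Optional[str]] = {}
    for record in records:
        for key, value in record.items():
            if value is None:
                schema.setdefault(key, None)  # type still unknown
                continue
            new, old = sql_type(value), schema.get(key)
            if old is None or old == new:
                schema[key] = new
            elif {old, new} == {"BIGINT", "DOUBLE"}:
                schema[key] = "DOUBLE"  # widen mixed numerics
            else:
                schema[key] = "VARCHAR"  # fall back on conflicting types
    # Columns that were only ever null default to VARCHAR.
    return {key: col_type or "VARCHAR" for key, col_type in schema.items()}


# Hypothetical sample payload.
raw = """[
  {"order_id": 1, "total": 19.99, "paid": true, "coupon": null},
  {"order_id": 2, "total": 5, "paid": false, "coupon": "SPRING"}
]"""

for column, col_type in infer_schema(json.loads(raw)).items():
    print(f"{column} {col_type}")
```

A rule-based pass like this handles flat records; the appeal of an AI assistant is in the messier cases, such as deeply nested payloads or naming and documenting the resulting columns.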

In all honesty, I’m deeply impressed by the advancements AI has brought to the data industry. I eagerly anticipate the future innovations that will continue to redefine the landscape of AI for data engineering.

Lean Data Teams Will Thrive With The Right Approach

As your organization expands, the demands on your data team to deliver value to stakeholders multiply. While engineers often relish the opportunity to construct solutions from scratch, operating within leaner teams necessitates a reevaluation of time allocation.

In essence, every minute spent on developing or maintaining solutions detracts from the organization’s ability to harness data effectively and prove the ROI of the data team’s endeavors. Managing a plethora of data sources, each with its own schema, compounds the challenge, requiring constant upkeep as APIs and sources evolve. Moreover, the relentless stream of ad hoc requests for new data sets can quickly become overwhelming.

It’s 2024; you shouldn’t be telling stakeholders you need a few weeks to fulfill requests. When you do, others will start to question how you are making their jobs easier. After all, your role as a member of the data team is to make the lives of those who rely on data easier.

With end-to-end ELT solutions like Rivery, and the rise of AI, you have all the resources you need to deliver value fast to the rest of your organization.

Minimize the firefighting.
Maximize ROI on pipelines.
