Ophir Prusak
AUG 24, 2022
5 min read

In the beginning there was data.
And the data was bountiful, and the people rejoiced.
And then Adam ate from the modern data stack.
And all hell broke loose.

But seriously, we’re currently going through a truly transformational period in the world of data.
The plethora of tools and technologies released in the past few years is insane. It feels like every single day something new comes out that will finally solve your data problems.

To be fair, many of these tools are indeed game changing and do make data teams’ lives better. But there’s something missing from a lot of the “modern data stack” tools that’s been bothering me for a while.

What about the Day 2 problem?

Let me explain…
Everything you build, whether it be a house or a data management solution, has three stages:

Day 0 – Figuring out what you need to build and how to build it.
Also known as planning, needs analysis, etc. On a side note, you should always start with why, but I’ll leave explaining that concept to Simon Sinek.

Day 1 – Building it.
Also known as construction, software development, etc.

Day 2 – Monitoring, maintaining, managing, and updating it.
Also known as upkeep, fixing things, upgrades, updates, patches … you get the picture.

Software engineers already figured this out

I happen to come from the world of software engineering, where “Day 2” features and functionality are a given. If you’re not thinking about Day 2 problems from the start, there’s a decent chance that whatever you build will fall apart shortly after you finish building it.

While there are a lot of Day 2 requirements for Data Stacks, for this article I’m focusing on one specific need – environments.

Software engineers already know that when creating an even remotely complex software solution, you’re gonna need multiple environments.

At least two environments – one for development and one for production, and ideally one for staging as well. The QA team also wants their own environment, as does Ashley who wants to run the code locally on her laptop.

Environments are considered table stakes for software developers, and are part of the standard software development lifecycle, not to mention every decent software development tool and process.
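
To make that concrete, here’s a rough sketch in Python (all names invented for illustration) of what environments boil down to for a data stack: isolated sets of resources that the same pipeline logic can point at.

```python
import os

# Hypothetical example: each environment is an isolated set of resources.
# The database names and warehouse sizes here are made up for illustration.
ENVIRONMENTS = {
    "dev":     {"db_name": "analytics_dev",     "warehouse_size": "xsmall"},
    "staging": {"db_name": "analytics_staging", "warehouse_size": "small"},
    "prod":    {"db_name": "analytics_prod",    "warehouse_size": "large"},
}

# Pick the target environment at runtime instead of hard-coding it.
env = ENVIRONMENTS[os.environ.get("DATA_ENV", "dev")]
print(f"Running against {env['db_name']} ({env['warehouse_size']})")
```

The pipeline logic itself never changes; only the environment it points at does.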

And you’re also gonna need:

  1. An easy way (read: automated) to move things from one environment to another, also known as a deployment script (see the sketch after this list).
  2. Environment variables.
    Imagine having the exact same pipelines running in dev and production, where the only differences are the {{db_connection}} or {{db_name}} environment variables. This makes working on data stacks sooooooo much easier.
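
Here’s a minimal sketch of that idea, assuming DB_CONNECTION and DB_NAME are set per environment (the pipeline body is a placeholder; a real one would use an actual database driver or connector):

```python
import os

# Identical code runs in every environment; only the variables differ.
DB_CONNECTION = os.environ["DB_CONNECTION"]  # e.g. a postgres:// URI
DB_NAME = os.environ["DB_NAME"]              # e.g. analytics_dev vs analytics_prod

def run_pipeline() -> None:
    # Placeholder extract/load step -- swap in a real driver here.
    print(f"Loading data into {DB_NAME} via {DB_CONNECTION}")

if __name__ == "__main__":
    run_pipeline()
```

Deployment (item 1 above) then shrinks to running the exact same artifact with a different set of variables – e.g. DB_NAME=analytics_prod python pipeline.py – instead of hand-editing pipelines as you promote them.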

So why don’t we have environments, deployment scripts and environment variables in most data stack tools?

Good question.
Seriously.
They should.

Ours does (see the demo below)

Interestingly enough, some of the older legacy data stack tools, which are delivered as software (not SaaS), do have some support for environments, though it’s definitely not as seamless as you’d want. You’ll need to provision and maintain the servers yourself, and potentially go through a procurement process for each new data environment. What should be a 10-second process (as it is with SaaS solutions) can take hours, or even days 😱

And forget about environment scalability with pure software solutions. If your dev environment runs on a low-powered server and you now want to run a production-scale test that requires 10x the server resources – good luck with that. 🍀

Check out this interactive tour to see Rivery’s environments in action.

Speak to a Rivery data expert today and we’ll show you how to solve all your data stack challenges – not just the Day 2 stuff, but also the really hard, complex, no-other-tool-can-do-this challenges.


Credits
I originally heard of the term “Day 2 problems” reading these most excellent articles:
Why is everyone ignoring the Day 2 Kubernetes problem
The Structure of Day 2 Problems

