Daniel Buchuk

Buzzword or Breakthrough?

Data teams thrive on innovation. The quality of their processes is directly tied to the quality of the business insights they produce – and, ultimately, to business success. Every year, dozens of new words and terms are coined to articulate a new product, idea, or concept. Here, we break down some key terms everyone in the industry should be familiar with – and try to figure out whether they are real breakthroughs, or simply new buzzwords for something we’re already familiar with.

 

1. Data Mesh & Data Fabric

As first defined by Zhamak Dehghani, data mesh is a modern, distributed architectural approach to analytical data management. It enables end users to easily access and query data where it lives, without first transporting it to a data lake or data warehouse. This decentralized strategy distributes data ownership to domain-specific teams that manage, own, and serve the data as a product.

There is also growing momentum around the concept of Data Fabric, popularized by Forrester analyst Noel Yuhanna. While the two are not exactly the same, both rely on the idea of decentralizing data access – creating a data infrastructure in which data is treated as a product, and teams across an organization can autonomously manage their data and analytics processes.

According to Forrester’s Yuhanna, the key difference between the data mesh and data fabric approaches is in how APIs are accessed. “[Data fabric] is the opposite of data mesh, where you’re writing code for the APIs to interface. On the other hand, data fabric is low-code, no-code, which means that the API integration is happening inside of the fabric without actually leveraging it directly, as opposed to data mesh.”

It can be argued that the concept of data mesh isn’t necessarily new. Large data-oriented enterprises have long had to figure out how to decentralize and manage access to data across their organizations. Thanks to the cloud, however, thousands of smaller companies and startups can now access and benefit from enterprise-grade data tools, systems and platforms – and many have quickly realized that a central BI or data team becomes a bottleneck if analysts and engineers across the business can’t access the data they need, when they need it.
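The domain-ownership idea can be sketched in a few lines of code. The `DataProduct` and `MeshRegistry` names below are hypothetical, not part of any real data mesh platform; the sketch only illustrates domain teams publishing data as a product that others can discover and query without going through a central team.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch: each domain team owns and serves its data as a product.
@dataclass
class DataProduct:
    name: str            # e.g. "orders.daily_revenue"
    owner_domain: str    # the team accountable for quality and availability
    query: Callable[[], List[dict]]  # serves the data where it lives

class MeshRegistry:
    """A lightweight catalog: consumers discover products themselves,
    so the central BI team is no longer a bottleneck."""
    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def get(self, name: str) -> DataProduct:
        return self._products[name]

# The "orders" domain publishes its own product...
registry = MeshRegistry()
registry.publish(DataProduct(
    name="orders.daily_revenue",
    owner_domain="orders",
    query=lambda: [{"day": "2024-01-01", "revenue": 1200}],
))

# ...and an analyst in another domain queries it directly.
product = registry.get("orders.daily_revenue")
print(product.owner_domain, product.query())
```

The point of the sketch is the ownership metadata: each product carries an accountable domain, and consumers interact with a stable interface rather than with the domain team itself.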

 

2. Data Lakehouse 

Data Lakes and Data Warehouses are both used to store Big Data, but they operate differently. While a traditional Data Warehouse is a repository for structured, filtered data that has already been processed for a specific purpose, a Data Lake is a large pool of raw data for which no use has yet been determined.
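The schema-on-write versus schema-on-read distinction behind this contrast can be sketched as follows – a toy illustration, with invented record shapes and names:

```python
import json

# Hypothetical sketch: a "lake" holds raw, untyped records (schema-on-read),
# while a "warehouse" holds rows already shaped for one purpose (schema-on-write).
raw_lake = [
    '{"event": "click", "ts": "2024-01-01T10:00:00", "user": "a"}',
    '{"event": "purchase", "ts": "2024-01-01T10:05:00", "user": "a", "amount": 42.0}',
]

# Warehouse-style: data is filtered and transformed *before* loading,
# for a predetermined use (here, purchase reporting).
warehouse_purchases = [
    {"user": "a", "amount": 42.0},
]

# Lake-style: keep everything raw, and apply a schema only at query time.
def query_lake(predicate):
    return [rec for rec in map(json.loads, raw_lake) if predicate(rec)]

# A use nobody anticipated at load time still works against the raw lake:
clicks = query_lake(lambda r: r["event"] == "click")
print(clicks)
```

The warehouse rows answer their one intended question quickly; the lake keeps every field so that new questions can still be asked later, at the cost of doing the parsing and shaping at query time.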

The Data Lakehouse aims to combine the advantages of Data Lakes and Data Warehouses into a hybrid concept. The two systems are not operated side by side, but merged into a single system. It is not just about integrating a Data Lake with a Data Warehouse, but about integrating a Data Lake, a Data Warehouse, and purpose-built storage to enable unified governance and easy data movement. Once all data is available in one place, Data Warehouses can still be built on top of it as a hybrid solution.

The concept of a “Data Lakehouse” is not new per se. As Christian Lauer explained in his article for Geek Culture, this hybrid approach has existed for some time in the area of cloud Data Warehousing and Data Lakes; the term Data Lakehouse simply describes this established approach.

 

3. AutoML

AutoML (Automated Machine Learning), at its most basic, involves automating tedious tasks such as data cleansing and preparation, which take up too much of data scientists’ and engineers’ precious time. However, AutoML also brings the potential to build complex models and create algorithms and neural networks.
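The core loop behind AutoML-style model selection can be sketched in plain Python – a toy illustration, not any real AutoML library: fit several candidate models automatically and keep whichever scores best on held-out data.

```python
import statistics

# Toy data: roughly linear, split into training and validation sets.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
valid = [(5, 10.1), (6, 11.8)]

def fit_mean(data):
    """Baseline model: always predict the training mean."""
    m = statistics.mean(y for _, y in data)
    return lambda x: m

def fit_linear(data):
    """Closed-form simple linear regression (slope and intercept)."""
    xs, ys = zip(*data)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    return lambda x, a=slope, b=my - slope * mx: a * x + b

def mse(model, data):
    """Mean squared error of a model on a dataset."""
    return statistics.mean((model(x) - y) ** 2 for x, y in data)

# The "automated" part: search the candidate models, select by validation error.
candidates = {"mean": fit_mean, "linear": fit_linear}
best_name, best_model = min(
    ((name, fit(train)) for name, fit in candidates.items()),
    key=lambda pair: mse(pair[1], valid),
)
print(best_name)  # the linear model wins on this roughly linear data
```

Real AutoML systems search far larger spaces (algorithms, hyperparameters, feature pipelines, even neural architectures), but the select-by-held-out-score loop is the same.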

The ultimate goal is that, in a not-so-distant future, anyone with a problem, hypothesis, or idea to test will be able to apply machine learning through simple, user-friendly interfaces that keep the inner workings of ML out of sight, leaving them free to concentrate on their solutions.

In fact, new research suggests that the AutoML market is expected to surpass $14bn by 2030. The report highlights three key factors driving this growth: the soaring demand for fraud detection solutions, personalized product recommendations, and predictive lead scoring.

 

4. Augmented Analytics

Gartner defines Augmented Analytics as the use of enabling technologies such as machine learning and AI to assist with data preparation, insight generation and insight explanation, augmenting how people explore and analyze data in analytics and BI platforms. It also augments expert and citizen data scientists by automating many aspects of data science, machine learning, and AI model development, management and deployment.
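The “insight generation” part of this definition can be given a minimal sketch: automatically scanning a metric across segments and surfacing the one that deviates most from the overall average. The data and names below are invented for illustration; real augmented analytics platforms apply far more sophisticated statistical and ML techniques.

```python
import statistics

# Invented example data: revenue by region.
rows = [
    {"region": "north", "revenue": 100},
    {"region": "north", "revenue": 110},
    {"region": "south", "revenue": 95},
    {"region": "west",  "revenue": 40},   # the anomaly a human might miss
]

overall = statistics.mean(r["revenue"] for r in rows)

def segment_means(rows, dim, metric):
    """Group rows by a dimension and compute the mean of a metric per group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[dim], []).append(r[metric])
    return {k: statistics.mean(v) for k, v in groups.items()}

# Automated "insight": find the segment furthest from the overall average.
means = segment_means(rows, "region", "revenue")
outlier = max(means, key=lambda k: abs(means[k] - overall))
print(f"Insight: '{outlier}' deviates most from the average "
      f"({means[outlier]:.0f} vs {overall:.0f})")
```

The value proposition is exactly this kind of surfacing: the system scans every dimension and metric so that the analyst starts from the interesting finding rather than hunting for it.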

AutoML will be a key component of Augmented Analytics. Augmented analytics isn’t necessarily a solution but an approach that focuses on automating and improving as much of the analytics cycle as possible, while AutoML specifically aims to automate the development of machine learning models. It could be argued that every analytics user can benefit from augmented analytics, which some have hailed as the future of BI. But is its remit too broad to become a catalyst for change? Or is it a useful ‘umbrella term’ to catapult the application of AI and ML throughout the BI cycle?