The Business Case for Change Data Capture (CDC)
Move at the Speed of Your Data
Data is the core of the modern economy. Businesses in every sector succeed or fail based on the data they collect, and what they do with that data. Today, companies in crowded markets gain a competitive edge not only from product differentiation, but also from efficient data processes.
Key among these efficiencies is speed. In order to make the best decisions, and target the proper customers, businesses need to act on up-to-date data. According to Exasol’s 2019 Data Decisions Report, 57% of companies are negatively impacted by data access that is too slow or too poor in quality.
Companies must have the right data at the right time to compete in a 24/7 global economy. But many teams still rely on delayed batch processing, which does not sync databases in real time. The batch method remains broadly popular: a recent study found that 75% of businesses still rely on batch processing.
But right now, across industries, a big shift is underway. Many businesses are starting to use change data capture (CDC) to sync databases more efficiently. Change data capture empowers businesses to move at the speed of their data. CDC instantly and automatically syncs databases as soon as the source data changes.
Change data capture enables faster, more accurate business decisions, while minimizing resource expenditure. The technology’s instantaneous data updates, cost-effective incremental changes, and light IT footprint offer a win-win-win to businesses.
With the right CDC technology, companies can leave the inefficiencies of bulk processing behind, forever.
This eBook is a go-to resource for change data capture. Here we’ll explain what change data capture is, what the business benefits are, and provide an overview of top CDC solutions on the market today.
What is Change Data Capture?
Change data capture tracks changes in a source dataset and automatically transfers those changes to a target dataset. Essentially, CDC breaks down data silos. Changes are synced instantly or near-instantly. In practice, CDC is often used to replicate data between databases in real time.
Despite the introduction of CDC, most teams still use batch processing to sync data. With batch processing:
- data is not synced right away
- databases slow production to allocate resources for syncing
- data replication only occurs during specified “batch windows”
On the other hand, change data capture offers a new path forward. On a core level, change data capture:
- constantly tracks changes in a source database
- immediately updates the target database
- uses stream processing to ensure instant changes
With CDC, data sources include operational databases, applications, ERP mainframes, and other systems that record transactions or business occurrences. Targets include data lakes and data warehouses, including cloud-based platforms such as Google BigQuery, Snowflake, Amazon Redshift, and Microsoft Azure.
Once the data is replicated on the target database, teams can perform data analysis without taxing the production database. In today’s 24/7 marketplace, this kind of setup is becoming closer to mandatory, as businesses cannot afford to slow production for any amount of time.
Different technologies power change data capture offerings in today’s marketplace. These technologies include:
Timestamps
Tracks “LAST_UPDATED” and “DATE_MODIFIED” columns. This method retrieves only changed rows, but requires significant CPU resources to scan all the tables.
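The timestamp approach can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `customers` table, its `last_updated` column, and the `fetch_changes_since` helper are all hypothetical, and an in-memory SQLite database stands in for the source system.

```python
import sqlite3

def fetch_changes_since(conn, watermark):
    """Return rows modified after the watermark, oldest change first."""
    cur = conn.execute(
        "SELECT id, name, last_updated FROM customers "
        "WHERE last_updated > ? ORDER BY last_updated",
        (watermark,),
    )
    return cur.fetchall()

# Hypothetical source table; last_updated is a simple integer clock here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, last_updated INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Grace", 105), (3, "Linus", 110)],
)

watermark = 100  # timestamp of the last change already synced
changes = fetch_changes_since(conn, watermark)

# Advance the watermark so the next poll skips rows already captured.
if changes:
    watermark = changes[-1][2]
```

One design consequence worth noting: a query like this only sees rows that still exist, so deleted rows are invisible to it unless the source marks deletions in some other way.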
Table Differencing
Executes a diff to compare source and target tables. This will only load the data that differs. This method is more comprehensive than timestamps, but still places a big burden on the CPU.
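As a rough sketch of the diff approach, assume each table has been read into a dictionary keyed by primary key. The `diff_tables` function and the sample rows are hypothetical, chosen only to show the three kinds of difference a comparison can surface.

```python
def diff_tables(source, target):
    """Compare two tables given as {primary_key: row} dictionaries."""
    # Rows present in the source but missing from the target.
    inserts = {k: v for k, v in source.items() if k not in target}
    # Rows present in both whose contents differ.
    updates = {k: v for k, v in source.items() if k in target and target[k] != v}
    # Rows that remain in the target but no longer exist in the source.
    deletes = [k for k in target if k not in source]
    return inserts, updates, deletes

source = {1: ("Ada", "London"), 2: ("Grace", "New York"), 4: ("Alan", "Manchester")}
target = {1: ("Ada", "London"), 2: ("Grace", "Arlington"), 3: ("Linus", "Helsinki")}

inserts, updates, deletes = diff_tables(source, target)
```

Every row of both tables is visited on each run, which is where the CPU burden mentioned above comes from.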
Triggers
Triggers are set off before or after commands that indicate a change. This produces a change log. With this method, each table in the source database requires a trigger, straining the system.
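A trigger-based change log might look like the following sketch, using SQLite through Python's standard library. The `orders` table, the `orders_changelog` table, and the trigger names are hypothetical; production databases have their own trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);

-- Change log populated by the triggers below.
CREATE TABLE orders_changelog (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT,
    order_id INTEGER,
    status TEXT
);

CREATE TRIGGER orders_after_insert AFTER INSERT ON orders BEGIN
    INSERT INTO orders_changelog (op, order_id, status)
    VALUES ('INSERT', NEW.id, NEW.status);
END;

CREATE TRIGGER orders_after_update AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_changelog (op, order_id, status)
    VALUES ('UPDATE', NEW.id, NEW.status);
END;

CREATE TRIGGER orders_after_delete AFTER DELETE ON orders BEGIN
    INSERT INTO orders_changelog (op, order_id, status)
    VALUES ('DELETE', OLD.id, OLD.status);
END;
""")

# Ordinary writes to the table; the triggers record each one.
conn.execute("INSERT INTO orders VALUES (1, 'new')")
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 1")

log = conn.execute(
    "SELECT op, order_id, status FROM orders_changelog ORDER BY seq"
).fetchall()
```

Each write to `orders` now causes a second write to the change log, and every tracked table needs its own set of triggers, which illustrates the strain described above.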
Log-Based
Database logs are constantly scanned to detect changes. The changes are captured without adding additional SQL loads to the system. This removes significant stress on the CPU.
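Real database logs are binary and vendor-specific, so the sketch below uses a deliberately simplified stand-in: a Python list acting as an append-only log. The `LogEntry` and `LogReader` names and the `lsn` (log sequence number) field are assumptions made for illustration. The point is only that the reader keeps a position in the log and never queries the source tables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEntry:
    lsn: int       # log sequence number, strictly increasing
    op: str        # "INSERT", "UPDATE", or "DELETE"
    key: int       # primary key of the affected row
    value: object  # new row value, or None for a delete

class LogReader:
    """Reads entries it has not seen yet from an append-only log."""

    def __init__(self):
        self.last_lsn = 0  # position of the last entry already captured

    def poll(self, log):
        # Assumes the log is ordered by lsn, as an append-only log would be.
        new_entries = [e for e in log if e.lsn > self.last_lsn]
        if new_entries:
            self.last_lsn = new_entries[-1].lsn
        return new_entries

log = [LogEntry(1, "INSERT", 1, "new"), LogEntry(2, "UPDATE", 1, "shipped")]
reader = LogReader()

first = reader.poll(log)                    # captures entries 1 and 2
log.append(LogEntry(3, "DELETE", 1, None))  # the database keeps writing
second = reader.poll(log)                   # captures only entry 3
```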
Change data capture enables teams to replicate data instantly and incrementally. CDC records data changes piece by piece, instead of relying on massive, all-at-once transfers. This allows teams to stop treating data migrations as big “projects” and instead treat them as a byproduct of change data capture.
With CDC, data is always up to date. The source database and target database are continuously synced. Bulk selecting is a thing of the past. Only the modified data is synced with the cloud DWH. All other data remains static. This saves a tremendous amount of time, resources, and funding.
Business Benefits of CDC
How Change Data Capture Boosts the Bottom Line
Change data capture is a nifty innovation, but the biggest impact is on a business’s bottom line. From unlocking the monetary potential of data, to significant cost-savings, change data capture generates value where it matters the most. Here are some of the ways CDC makes businesses more profitable.
CDC Generates More Revenue
Data is only as valuable as its relevance.
A data point that records a customer entering a brick-and-mortar store is not very valuable 12 hours later. By then, the customer could find dozens of other places to buy a product. This is just one example, among countless others, of how out-of-date data can botch revenue opportunities.
But businesses that use out-of-date data don’t just risk losing individual deals. These companies also open themselves up to the long-term consequences of flawed decision making. Perhaps the data leads to a few missed opportunities. But it could also cause systemic issues that have a consistent negative impact on the business. These risks are hard to measure up front, and they’re even harder to reverse once a data infrastructure is built.
With change data capture, the risks associated with out-of-date data are entirely eliminated. Change data capture provides teams with instant access to the most up-to-date data. This allows businesses to make decisions and take actions with the best data available. CDC necessarily improves the speed and accuracy of the data. Not only is data updated faster, it is also always 100% accurate.
Change data capture enables businesses to act on opportunities quicker. Companies can beat competitors to deals, all while cycling through a higher volume of opportunities. CDC also provides higher data quality for decision making. All of this empowers businesses to make faster, smarter decisions that generate more revenue.
CDC Creates Savings
90% of the world’s data was created in the last two years.
The infrastructure of the internet, built in some cases decades ago, does not have the bandwidth to transfer massive volumes of data instantly. This can become a serious problem for businesses that want to undertake projects with high data volumes, such as database migrations. These all-at-once data transfers severely congest network traffic, leading to migrations that are slow and costly.
Change data capture, however, loads data incrementally as opposed to all at once. Each time a data point changes in the source system, it is updated in the target, requiring minuscule bandwidth. With CDC, businesses are never subjected to large data transfers that crush network bandwidth. This reduces the cost of data transfers and saves weeks, months, and sometimes years of time.
CDC Eliminates Opportunity Costs
One of the core issues with batch processing is that the method inherently creates opportunity costs.
During data transfers, batch loads slow down production databases and degrade performance. This can create opportunity costs in the form of lost deals.
Consider an e-commerce site with higher customer churn because the overtaxed production database slows down the site an hour each day. This is why batch processing requires specified “windows” when the production database is less taxed. But in a 24/7 global economy, there’s never an acceptable time to degrade the performance of a production database.
Change data capture, particularly the log-based type, never burdens a production database’s CPU. Log-based CDC captures changes directly from database logs, and does not add any additional SQL loads to the system. Incremental loading ensures that data transfers have negligible impact on database performance.
What this means, in business terms, is that CDC eliminates the opportunity costs that arise when a business is forced to slow down vital tech infrastructure.
CDC Protects Business Assets
Data is not just something a company collects.
In today’s environment, data is the lifeblood of a business. Data is a business asset just as much as equipment or property are. However, mishaps that damage or delete data are common. For most businesses, such an event is not a possibility, but a probability. And for many companies, luck is the only thing keeping the incident from turning into a data catastrophe.
Change data capture protects data, a prime business asset, from deletion and destruction. By tracking changes not just to data, but to metadata as well, CDC offers companies that experience data loss a chance to repopulate impacted datasets. Once data is gone, it can’t be regenerated. But with the protection of change data capture, businesses can recover their essential data to fuel further business growth.
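To make the recovery idea concrete, here is a minimal sketch that replays a recorded change history up to a chosen point. The event format `(seq, op, key, row)`, the `rebuild_state` function, and the sample data are hypothetical and do not reflect any particular product's storage format.

```python
def rebuild_state(history, up_to_seq):
    """Replay change events in order, stopping after sequence number up_to_seq."""
    state = {}
    for seq, op, key, row in sorted(history, key=lambda event: event[0]):
        if seq > up_to_seq:
            break
        if op == "DELETE":
            state.pop(key, None)
        else:  # INSERT and UPDATE both set the latest row value
            state[key] = row
    return state

history = [
    (1, "INSERT", 1, {"sku": "A"}),
    (2, "INSERT", 2, {"sku": "B"}),
    (3, "UPDATE", 1, {"sku": "A2"}),
    (4, "DELETE", 2, None),  # for example, an accidental deletion
]

before_delete = rebuild_state(history, up_to_seq=3)
current = rebuild_state(history, up_to_seq=4)

# Rows that existed before the deletion but are missing now.
recovered = {k: v for k, v in before_delete.items() if k not in current}
```

Because the history keeps the deleted row's earlier values, the dataset as it stood before the mishap can be reconstructed rather than lost.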
Rivery Change Data Capture
Instant, Automatic Data Integration Comes Full Circle
Although change data capture maintains impressive individual benefits, the true utility of the feature is not experienced in isolation. To make the biggest impact, CDC must operate as one piece in a broader solution. Rivery Change Data Capture accentuates the Rivery platform in just such a way.
Rivery Change Data Capture instantaneously syncs database updates with a cloud data warehouse. The feature not only enables faster data projects, but also minimizes resource expenditure. Rivery CDC is more than just an individual feature. The capability is another component of Rivery’s core mission to automate and instantize the entire data integration process, from start to finish.
Data teams can set up Rivery CDC in a few clicks. The feature uses real-time streaming to sync data changes between source and target as they happen, including schema alterations. This enables teams to generate more revenue by making faster, smarter decisions with high quality data. Rivery CDC also loads data incrementally, creating cost and time savings by avoiding colossal all-at-once data transfers.
Rivery CDC’s adherence to cloud data warehouse best practices also generates significant savings. Rivery CDC continuously streams into a client’s staging area, not into the database tables themselves. This produces cost-savings when customers are charged for updating database tables with each new record.
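The general idea behind staging changes before touching warehouse tables can be sketched as follows. This is not Rivery's implementation, which the text does not describe in detail. The `compact` and `merge` functions and the event format `(seq, op, key, row)` are assumptions for illustration, and the staged events are assumed to arrive in sequence order.

```python
def compact(staged):
    """Keep only the latest staged change for each key."""
    latest = {}
    for seq, op, key, row in staged:  # assumes staged is in seq order
        latest[key] = (seq, op, key, row)
    return sorted(latest.values(), key=lambda change: change[0])

def merge(table, staged):
    """Apply compacted changes to the table; return how many writes were made."""
    compacted = compact(staged)
    for _, op, key, row in compacted:
        if op == "DELETE":
            table.pop(key, None)
        else:
            table[key] = row
    return len(compacted)

# Four changes accumulate in staging while the warehouse table is untouched.
staging = [
    (1, "INSERT", 10, {"qty": 1}),
    (2, "UPDATE", 10, {"qty": 2}),
    (3, "UPDATE", 10, {"qty": 3}),
    (4, "INSERT", 11, {"qty": 5}),
]
warehouse = {}

writes = merge(warehouse, staging)
```

Here four staged changes collapse into two table writes, which is the kind of saving that matters when each table update is billed.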
Rivery CDC constantly scans database log files for changes. This adds no new SQL loads to the system. Production databases can run at full capacity 24/7, eliminating opportunity costs that arise from slowing down operations. Rivery CDC’s log-based engine also records all changes to data and metadata, including deleted rows. This historical archive protects business assets from unforeseen circumstances, such as human error or destruction.
As a standalone feature, Rivery change data capture ensures that data teams never have to worry about database syncs ever again. But when combined with the rest of the platform, Rivery CDC is another key piece of Rivery’s automation of the data integration process, from beginning to end.
With Rivery, automation starts at the very first step: connecting data sources. Many teams spend countless hours and precious development time building data connectors. But with Rivery, data connectors come with the platform, prebuilt and ready to use. Rivery offers 90+ native data connectors right out of the box. With plug-and-play functionality, each data connector is set up in a matter of clicks. Rivery also develops connectors for custom data sources on-demand.
Logic Rivers are another core feature. Logic Rivers automate both the ingestion of data and the completion of SQL queries inside a cloud data warehouse. With Logic Rivers, customers can automatically orchestrate and transform their entire data workflow. Teams can preprogram data integration workflows that execute on their own.
By combining these three capabilities – pre-built data connectors, change data capture, Logic Rivers – teams can fully automate the data integration process.
Pre-built connectors eliminate the hassle of linking data sources, change data capture instantly updates data changes, and Logic Rivers automatically prepare the data for analysis. With these features working in tandem, data teams can sit back, relax, and let Rivery do all the work.
For Change Data Capture
As the demand for data replication grows, many solutions are emerging to meet this need. In today’s market, change data capture is offered either as a product feature or as a standalone product. Today, CDC solutions are built for small, medium, and enterprise businesses, with technologies that run the gamut from log-based to timestamp-based.
Below, we’ve compiled a representative overview of the market landscape to help you navigate the space.
Qlik Replicate
Qlik develops business intelligence and data visualization software, in addition to offerings such as Qlik Replicate, a change data capture technology. Qlik Replicate uses database logs to scan and track data changes, taking the burden off production databases, so business operations can continue uninterrupted. The solution enhances data delivery performance to boost strategic initiatives such as Big Data analytics.
PowerExchange Change Data Capture
Informatica offers enterprise cloud data management and data integration software, including PowerExchange Change Data Capture, a long-standing ETL solution. PowerExchange Change Data Capture records changes in a number of instances, such as customer creation or package location, as they happen. The stream of data updates is then synced in real-time with multiple targets, without intermediate steps.
Rivery Change Data Capture
Rivery is an augmented data management platform, built on an ELT paradigm, that automates the entire data integration process. Rivery Change Data Capture is a log-based technology that offers automatic and instantaneous data syncs. Rivery CDC continuously streams into a client’s staging area, instead of database tables, generating significant savings. Along with other features such as Logic Rivers, Rivery CDC empowers customers to automate all the data processes associated with integration.
Veeam Backup & Replication
Veeam produces backup, disaster recovery, and intelligent data management software, including Veeam Backup & Replication. Veeam’s Backup & Replication solution offers data replication, restore, and backup for cloud-based workloads, virtual machines, servers, and workstations. Veeam works a bit differently than standard CDC – Veeam Backup & Replication creates an image-based clone of a production VM to replicate data. The solution is best used for heavy-duty server backup.
Change Data Capture in Matillion
Matillion is an ETL tool for cloud data warehouses. Change Data Capture in Matillion enables customers to sync their Amazon Redshift data with their own database. Data changes are recorded on an S3 bucket. When the S3 bucket triggers an event, the data from the bucket is pulled and updated within Redshift. This option is feasible for Redshift users, but not for Google BigQuery, Snowflake, or Microsoft Azure users.
The Business Benefits of CDC Are Clear
And There Are Plenty of Options to Choose From
In today’s 24/7 economy, change data capture is more of a necessity than a luxury. In order to remain competitive, companies must find ways to quickly act on new data and insights. Opportunities are fleeting, especially in hyper-competitive spaces like eCommerce. Only companies with superior data operations can capitalize on the split-second timeframe in which customers make decisions.
CDC empowers businesses to move at the speed of their data. But change data capture can also generate substantial cost-savings. Companies incur no penalty, monetary or otherwise, for upgrading to CDC. It’s actually more expensive and less efficient to remain with batch processing. Businesses incur penalties for not adopting CDC.
But change data capture is more than just an individual feature. Companies should approach the technology as one component in an effort to optimize and automate data integration. CDC is not the goal. It is a means to a goal. Whether you adopt change data capture using Rivery, or some other solution, make sure your choice is well suited for the broader objectives of your tech stack.