
2020-01-07

Streaming Now: Debezium 1.0 Final Is Out

Debezium is an open-source distributed platform that turns your existing databases into event streams, so that applications can see and respond almost instantly to each committed row-level change in those databases. Debezium is built on top of Kafka and provides Kafka Connect-compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time. After a restart, the application can consume all the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

The official Debezium site is https://debezium.io/

Why Debezium?

The variety of potential use cases makes working on Debezium as a tool for change data capture enjoyable. When presenting the project at conferences, it’s great to see how people quickly get excited when they realize all the possibilities enabled by Debezium and CDC.

In a nutshell, Debezium is one significant enabler for letting you react to changes in your data with low latency. Or, as one conference attendee recently put it, it’s “like the observer pattern, but for your database.”

Here are a few things we’ve seen Debezium being used for as an ingestion component in data streaming pipelines:

  • Replicating data from production databases to other databases and data warehouses
  • Feeding data to search services like Elasticsearch or Apache Solr
  • Updating or invalidating caches
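As an illustration of the cache use case, here is a minimal sketch in Python of applying change events to an in-memory cache. It assumes a simplified form of the Debezium change event envelope, using only the "op", "before", and "after" fields; the function name apply_change, the key column "id", and the sample rows are hypothetical and chosen for this example only.

```python
from typing import Any, Dict


def apply_change(cache: Dict[Any, dict], event: dict, key_field: str = "id") -> None:
    """Apply one simplified change event to a cache keyed by primary key.

    Assumed operation codes: "c" (create), "u" (update), "d" (delete)
    and "r" (row read during an initial snapshot).
    """
    op = event["op"]
    if op in ("c", "u", "r"):
        # Inserts, updates and snapshot reads carry the new row state in "after".
        row = event["after"]
        cache[row[key_field]] = row
    elif op == "d":
        # Deletes carry the last known row state in "before".
        row = event["before"]
        cache.pop(row[key_field], None)
    # Any other operation code is ignored in this sketch.


# Hypothetical sample events for a "customers" table.
events = [
    {"op": "r", "before": None, "after": {"id": 1, "name": "Alice"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "Bob"}},
    {"op": "u", "before": {"id": 1, "name": "Alice"}, "after": {"id": 1, "name": "Alicia"}},
    {"op": "d", "before": {"id": 2, "name": "Bob"}, "after": None},
]

cache: Dict[Any, dict] = {}
for event in events:
    apply_change(cache, event)
```

In a real deployment the events would be read from a Kafka topic rather than from a list, and the cache would typically be an external store.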

When using Debezium with Apache Kafka and its rich ecosystem of sink connectors, such integrations can be set up without any coding, just by deploying and configuring connectors in Kafka Connect:

[Figure: Debezium with Apache Kafka]
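To make "deploying and configuring connectors" more concrete, the following Python sketch builds the HTTP request that would register a Debezium MySQL connector through the Kafka Connect REST interface. Host names, credentials, the connector name, and the topic names are placeholders, and the helper build_connector_request is hypothetical. The configuration keys are the ones I would expect for the 1.0 MySQL connector and should be checked against the connector documentation for your version.

```python
import json
import urllib.request


def build_connector_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Placeholder values throughout; adjust to your own environment.
mysql_config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.internal",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.whitelist": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka.example.internal:9092",
    "database.history.kafka.topic": "schema-changes.inventory",
}

request = build_connector_request("http://localhost:8083", "inventory-connector", mysql_config)

# Sending is left out on purpose so the sketch runs without a Connect cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The same JSON document could equally be sent with any HTTP client; no application code is involved beyond this one-off registration.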

But many other use cases of CDC go beyond just moving data from A to B. When adding stream processing into the picture, e.g., via Kafka Streams or Apache Flink, CDC enables you to run time-windowed streaming queries that are continuously updated as your operational data changes (“What’s the aggregated order revenue per category within the last hour?”). You can use CDC to build audit logs of your data, recording who changed which data items at what time. You can also use it to update denormalized views of your data for efficient data retrieval, adhering to the CQRS pattern (Command Query Responsibility Segregation).
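The revenue question above can be sketched in plain Python to show the windowing logic. This is not Kafka Streams or Flink code; it is a simplified stand-in that groups insert events into fixed one-hour (tumbling) windows rather than a sliding "last hour". The event shape, the "category" and "amount_cents" columns, and the function name are assumptions made for this example, and updates and deletes are ignored for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_MS = 60 * 60 * 1000  # one hour in milliseconds


def revenue_per_category(
    events: Iterable[dict], window_ms: int = WINDOW_MS
) -> Dict[Tuple[int, str], int]:
    """Sum order amounts per (window start, category) for insert events."""
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for event in events:
        if event["op"] != "c":
            continue  # this sketch only counts newly created orders
        row = event["after"]
        # Align the event timestamp to the start of its tumbling window.
        window_start = (event["ts_ms"] // window_ms) * window_ms
        totals[(window_start, row["category"])] += row["amount_cents"]
    return dict(totals)


# Hypothetical change events from an "orders" table.
order_events = [
    {"op": "c", "ts_ms": 600_000, "after": {"category": "books", "amount_cents": 1200}},
    {"op": "c", "ts_ms": 1_800_000, "after": {"category": "books", "amount_cents": 800}},
    {"op": "c", "ts_ms": 2_400_000, "after": {"category": "toys", "amount_cents": 500}},
    {"op": "c", "ts_ms": 3_900_000, "after": {"category": "books", "amount_cents": 300}},
    {"op": "u", "ts_ms": 4_000_000, "after": {"category": "toys", "amount_cents": 700}},
]

totals = revenue_per_category(order_events)
```

A stream processor would maintain these totals incrementally and emit updated results as new events arrive, instead of recomputing them over a finished list.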

Finally, CDC can also play a vital role in microservices architectures: by exchanging data between services and keeping local views of data owned by other services, each service gains more independence and does not have to rely on synchronous API calls. One exciting approach in this context is the outbox pattern, which Debezium supports well. If you are not starting on a green field (who ever does?), CDC can be used to implement the strangler pattern for moving from a monolithic design to microservices.
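The core idea of the outbox pattern is that a service writes its business data and an event describing the change in one local transaction; a CDC connector then captures the outbox table and publishes the events. The sketch below shows only the write side, using SQLite from the Python standard library as a stand-in database (SQLite is used here purely so the example runs anywhere, not because it is a Debezium source). The table layout and the names place_order and OrderCreated are assumptions for illustration.

```python
import json
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE orders ("
        "id TEXT PRIMARY KEY, customer TEXT NOT NULL, total_cents INTEGER NOT NULL)"
    )
    # Assumed outbox layout: event id, aggregate type and id, event type, payload.
    conn.execute(
        "CREATE TABLE outbox ("
        "id TEXT PRIMARY KEY, aggregatetype TEXT NOT NULL, "
        "aggregateid TEXT NOT NULL, type TEXT NOT NULL, payload TEXT NOT NULL)"
    )


def place_order(conn: sqlite3.Connection, customer: str, total_cents: int) -> str:
    """Insert an order and its outbox event atomically; return the order id."""
    order_id = str(uuid.uuid4())
    payload = json.dumps({"id": order_id, "customer": customer, "total_cents": total_cents})
    # The connection context manager commits on success and rolls back on error,
    # so either both rows are written or neither is.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, customer, total_cents) VALUES (?, ?, ?)",
            (order_id, customer, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox (id, aggregatetype, aggregateid, type, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), "order", order_id, "OrderCreated", payload),
        )
    return order_id


conn = sqlite3.connect(":memory:")
create_schema(conn)
order_id = place_order(conn, "alice", 4200)
```

Because the event is stored in the same transaction as the order, other services never see an event for an order that was rolled back, and no order is committed without its event.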

A presentation from QCon San Francisco covers change data capture use cases with Debezium and Apache Kafka.

But you don’t have to take our word for it: you can find lots of blog posts, conference talks, and examples by folks using Debezium in production in our compilation of resources. If you’d like to see who else is already using Debezium, see our rapidly growing list of reference users (or send us a pull request to get your name added if your organization is already running Debezium in production).

Debezium 1.0

Since the initial commit in November 2015, the Debezium community has worked tirelessly to realize the vision of building a comprehensive open-source low-latency platform for change data capture (CDC) for various databases.

Within those four years, Debezium’s feature set has grown tremendously: stable, highly configurable CDC connectors for MySQL, Postgres, MongoDB, and SQL Server, incubating connectors for Apache Cassandra and Oracle, facilities for transforming and routing change data events, support for design patterns such as the outbox pattern and much more. A very active and welcoming community of users, contributors, and committers has formed around the project. Debezium is deployed to production at many organizations from all industries, some with massive installations, using hundreds of connectors to stream data changes out of thousands of databases.

The 1.0 release marks a significant milestone for the project: based on all the production feedback we got from the users of the 0.x versions, we figured it’s about time to express the maturity of the four stable connectors in the version number, too.

Original article by Gunnar Morling, published at https://debezium.io/blog/2019/12/18/debezium-1-0-0-final-released/

Gunnar Morling

Gunnar is a software engineer at Red Hat and an open-source enthusiast at heart. A long-time Hibernate core team member, he’s now the project lead of Debezium. Gunnar is the spec lead for Bean Validation 2.0 (JSR 380). He’s based in Hamburg, Germany.