Change Data Capture (CDC) Design Pattern

From GM-RKB

A Change Data Capture (CDC) Design Pattern is a data integration design pattern that detects and captures data changes, and then triggers data processing.

  • Context:
    • It can (typically) describe record insert, update, and delete events.
    • It can (typically) involve monitoring the database's log files or using triggers to detect when data is inserted, updated, or deleted.
    • It can (typically) include extracting the changed data and loading it into a different system.
    • It can (typically) employ streaming data platforms or messaging systems to relay the changes to target systems in near-real time.
    • It can (typically) serve as the foundation for event-driven architectures by providing the events that trigger subsequent workflows or processes.
    • It can (typically) be designed to handle large volumes of changes and be resilient to interruptions, ensuring that no data changes are missed.
  • Example(s):
    • An AWS DynamoDB Change Data Capture (CDC) with Lambda Processing Pattern, where changes to a DynamoDB table trigger Lambda functions for further processing.
    • A Debezium CDC Pattern used with Kafka, where database changes are captured by Debezium and streamed through Kafka topics.
    • A data replication setup where the CDC pattern is used to replicate changes from a primary database to a secondary one for high availability and disaster recovery.
    • A data synchronization setup where the CDC pattern keeps systems in sync, ensuring that downstream applications have access to up-to-date information.
  • Counter-Example(s):
    • A Pub/Sub Messaging Pattern, which is primarily used for decoupling message producers from consumers and does not inherently involve capturing changes to data sources.
    • A Batch Processing Pattern, where data is collected over a period and processed at a scheduled time, rather than being processed in near-real time as changes occur.
  • See: Data Source, Database, Data Ingestion, Software Design Pattern.
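
The trigger-based variant described in the Context items above can be sketched as follows. This is a minimal illustration, not a production design: it assumes an in-memory SQLite source, and the table names (customers, change_log), the helper functions (read_changes, apply_changes, demo), and the dict used as the "target system" are all hypothetical names chosen for this example. Triggers record each insert, update, and delete in a change-log table; a reader then pulls only the events after its last checkpoint and applies them to the target, so an interrupted reader can resume without missing changes.

```python
import sqlite3

# Hypothetical schema: a source table plus a change log filled by triggers.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE change_log (
    seq    INTEGER PRIMARY KEY AUTOINCREMENT,  -- monotonically increasing position
    op     TEXT NOT NULL,                      -- 'insert', 'update', or 'delete'
    row_id INTEGER NOT NULL,
    name   TEXT                                -- new value; NULL for deletes
);
CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO change_log (op, row_id, name) VALUES ('insert', NEW.id, NEW.name);
END;
CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO change_log (op, row_id, name) VALUES ('update', NEW.id, NEW.name);
END;
CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
    INSERT INTO change_log (op, row_id, name) VALUES ('delete', OLD.id, NULL);
END;
"""


def read_changes(source, last_seq):
    """Return change events recorded after last_seq, in commit order."""
    rows = source.execute(
        "SELECT seq, op, row_id, name FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    return [{"seq": s, "op": op, "id": rid, "name": name} for s, op, rid, name in rows]


def apply_changes(target, events, last_seq):
    """Apply events to the target dict and return the new checkpoint."""
    for ev in events:
        if ev["op"] == "delete":
            target.pop(ev["id"], None)
        else:  # insert and update both set the latest value
            target[ev["id"]] = ev["name"]
        last_seq = ev["seq"]
    return last_seq


def demo():
    source = sqlite3.connect(":memory:")
    source.executescript(SCHEMA)
    target, checkpoint = {}, 0  # the dict stands in for a downstream system

    source.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    source.execute("INSERT INTO customers (id, name) VALUES (2, 'Grace')")
    checkpoint = apply_changes(target, read_changes(source, checkpoint), checkpoint)

    source.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
    source.execute("DELETE FROM customers WHERE id = 2")
    checkpoint = apply_changes(target, read_changes(source, checkpoint), checkpoint)
    return target, checkpoint


if __name__ == "__main__":
    print(demo())
```

In a log-based setup the change_log table would be replaced by the database's own transaction log, and the dict by a message broker or stream consumer; the checkpoint idea (resume from the last processed position) carries over.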


References

2023

  • Bing Chat
    • A change data capture (CDC) design pattern is a data integration design pattern that allows systems to detect, track, and deliver the changes that occur in a source database to a target system or service. This pattern enables real-time data synchronization, replication, and analysis, as well as event-driven and reactive architectures. CDC works by capturing the insert, update, and delete operations that happen in the source database tables and publishing them as events to a message broker or a stream processor. The target system or service can then consume the events and perform the appropriate actions, such as updating its own data, triggering business processes, or generating reports. CDC helps to decouple the source and target systems, reduce the complexity and dependency of the system, improve the scalability and performance of the system, and support dynamic and flexible system configuration. CDC also faces some challenges, such as latency, reliability, security, and testing of the event delivery.

2021

  • (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Change_data_capture Retrieved:2021-2-12.
    • In databases, change data capture (CDC) is a set of software design patterns used to determine and track the data that has changed so that action can be taken using the changed data.

      CDC is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources.

      CDC occurs often in data-warehouse environments since capturing and preserving the state of data across time is one of the core functions of a data warehouse, but CDC can be utilized in any database or data repository system.

2008

  • (Jörg et al., 2008) ⇒ Thomas Jörg, and Stefan Deßloch. (2008). “Towards Generating ETL Processes for Incremental Loading.” In: Proceedings of the 2008 international symposium on Database engineering & applications, pp. 101-110.
    • QUOTE: … In this paper we review existing Change Data Capture (CDC) techniques and discuss limitations of different approaches. We further review existing techniques for refreshing …

2005