Data Loading Patterns (data integration)

Full Snapshot Load

Full Snapshot Load is like taking a complete photograph: it copies everything from the source on every run. Simple, but resource-intensive.

Implementation:

  • Database dumps and restores

  • ETL tools like Apache NiFi

  • Key challenge: Managing system resources during large copies
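
As a rough illustration, here is a minimal sketch of a full snapshot load in Python. SQLite stands in for the real source and target systems, and the customers table and its columns are hypothetical.

  import sqlite3

  # SQLite stands in for the real source and target databases; assumes a
  # customers(id, name, updated_at) table already exists in both.
  source = sqlite3.connect("source.db")
  target = sqlite3.connect("warehouse.db")

  def full_snapshot_load():
      # Read every row from the source on every run.
      rows = source.execute("SELECT id, name, updated_at FROM customers").fetchall()
      # Replace the target table wholesale: simple, but expensive for large tables.
      target.execute("DELETE FROM customers")
      target.executemany(
          "INSERT INTO customers (id, name, updated_at) VALUES (?, ?, ?)", rows
      )
      target.commit()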

Incremental Load

Incremental Load only processes what has changed since the last sync. It uses timestamps or version numbers to track changes: change IDs prevent processing the same change twice, and version numbers ensure changes are applied in the right order. For example, if we expect version 4 but receive the version 7 change, we put it in a hold queue, also known as the parking lot pattern. We check the hold queue periodically and every time we process an in-order event.
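
A minimal sketch of this ordering logic in plain Python is shown below; the change structure, version counter, and parking lot names are illustrative, not taken from any particular tool.

  # Version-ordered incremental processing with a hold queue ("parking lot").
  expected_version = 1      # next version we expect to apply
  parking_lot = {}          # out-of-order changes, keyed by version
  applied_ids = set()       # change IDs already processed (idempotency)

  def apply_change(change):
      print("applying", change["id"], "at version", change["version"])

  def process(change):
      global expected_version
      if change["id"] in applied_ids:
          return                                   # duplicate delivery: skip it
      if change["version"] != expected_version:
          parking_lot[change["version"]] = change  # park the out-of-order change
          return
      apply_change(change)
      applied_ids.add(change["id"])
      expected_version += 1
      # Check the hold queue every time we process an in-order change.
      while expected_version in parking_lot:
          parked = parking_lot.pop(expected_version)
          apply_change(parked)
          applied_ids.add(parked["id"])
          expected_version += 1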

Implementation:

  • Change tracking in source database

  • CDC tools like Debezium

  • Store processed change IDs in:

    • Database tracking table (sketched below)

    • Redis for speed

    • Stream processor state store

  • Key challenge: Handling failures and ensuring idempotency
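
For the database tracking table option, a minimal sketch follows; SQLite stands in for the real database, and the table and column names are hypothetical.

  import sqlite3

  db = sqlite3.connect("tracking.db")
  db.execute("CREATE TABLE IF NOT EXISTS processed_changes (change_id TEXT PRIMARY KEY)")

  def apply_change(change):
      print("applying change", change["id"])

  def handle(change):
      # The INSERT succeeds only the first time a change_id is seen, so a
      # redelivered change becomes a no-op (idempotency across failures).
      cur = db.execute(
          "INSERT OR IGNORE INTO processed_changes (change_id) VALUES (?)",
          (change["id"],),
      )
      if cur.rowcount == 0:
          return               # already processed, skip
      apply_change(change)
      db.commit()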

Delta Load

Delta Load maintains the story of changes: not just what changed, but how and why. Think of it as keeping a complete history.

Implementation approaches:

  1. Event Sourcing:

    • Business events flow through a message bus (like Kafka)

    • Events stored in event store

    • State aggregator (another component in the architecture) maintains current state for quick access

    • Example: an order service publishing "OrderPlaced" events (see the sketch after this list)

  2. Temporal Databases:

    • Databases that automatically track all versions

    • Examples: Datomic, PostgreSQL (via the temporal_tables extension)

    • Gives you "time travel" queries

  3. Custom Delta Tracking:

    • Build your own versioning system

    • Track changes with version numbers

    • Store full history of changes
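
As a rough sketch of the event sourcing approach, the plain-Python example below replays an append-only event history to derive current state. The event shapes and the in-memory list standing in for Kafka and the event store are assumptions for illustration.

  event_store = []            # append-only history of business events

  def publish(event):
      event_store.append(event)    # events are only ever appended, never updated

  def current_order_state(order_id):
      # State aggregator: replay the history to derive the current state.
      state = {}
      for event in event_store:
          if event["order_id"] != order_id:
              continue
          if event["type"] == "OrderPlaced":
              state = {"order_id": order_id, "status": "placed", "items": event["items"]}
          elif event["type"] == "OrderShipped":
              state["status"] = "shipped"
      return state

  publish({"type": "OrderPlaced", "order_id": "o-1", "items": ["book"]})
  publish({"type": "OrderShipped", "order_id": "o-1"})
  print(current_order_state("o-1"))   # {'order_id': 'o-1', 'status': 'shipped', 'items': ['book']}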

Real-time Updates

Real-time Updates process changes as they happen, using streaming architecture.

Implementation:

  • Kafka for reliable event streaming

  • Stream processors (Kafka Streams, Flink) handle the transformation logic

  • Consumers read either directly from Kafka or from the processed stream, depending on what each consumer needs

  • Key feature: Events can be replayed when needed, and Kafka stores them durably
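
A minimal consumer sketch using the kafka-python client is shown below; the topic name, group id, and broker address are assumptions, and the apply step is left as a placeholder.

  import json
  from kafka import KafkaConsumer   # kafka-python client

  # Topic name, group id, and broker address are assumptions for illustration.
  consumer = KafkaConsumer(
      "customer-changes",
      bootstrap_servers="localhost:9092",
      group_id="warehouse-loader",
      auto_offset_reset="earliest",    # replay from the beginning if no committed offset
      value_deserializer=lambda b: json.loads(b.decode("utf-8")),
  )

  for message in consumer:
      change = message.value
      print("applying change", change)   # apply the change to the target system here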

Summary

The fundamental insight is that these patterns solve different problems:

  • Full Snapshot when simplicity matters more than efficiency

  • Incremental when you need efficiency but don't need history

  • Delta when you need complete history and change context

  • Real-time when immediate updates are crucial