Full Snapshot Load
Full Snapshot Load is like taking a complete photograph: it copies everything on every run. Simple, but resource-intensive.
Implementation:
Database dumps and restores
ETL tools like Apache NiFi
Key challenge: Managing system resources during large copies
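The copy-everything approach can be sketched in a few lines. This is a minimal illustration using SQLite as a stand-in for a real source and target database; the `products` table and its columns are hypothetical.

```python
import sqlite3

def full_snapshot_load(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Replace the target table with a complete copy of the source table."""
    # read everything from the source -- the "photograph"
    rows = source.execute("SELECT id, name, price FROM products").fetchall()
    target.execute("DELETE FROM products")  # discard the previous snapshot
    target.executemany(
        "INSERT INTO products (id, name, price) VALUES (?, ?, ?)", rows
    )
    target.commit()
    return len(rows)
```

Every run pays the full cost of reading and writing the entire table, which is exactly the resource-management challenge noted above.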
Incremental Load
Incremental Load only processes what has changed since the last sync. It uses timestamps or version numbers to track changes, and change IDs to avoid processing the same change twice. Version numbers also ensure changes are applied in order: if we expect version 4 but suddenly receive version 7, that change goes into a hold queue, a technique known as the parking lot pattern. The hold queue is re-checked periodically and after every successfully processed event.
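The ordering rule and the parking lot pattern described above can be sketched as follows. This is a minimal in-memory illustration; the change shape (`version`, `payload`) and the class name are my own.

```python
class VersionedProcessor:
    """Applies changes strictly in version order; out-of-order changes
    wait in a hold queue (the "parking lot")."""

    def __init__(self, start_version: int = 1):
        self.expected = start_version
        self.parking_lot: dict[int, dict] = {}  # version -> parked change
        self.applied: list = []                 # stands in for the target system

    def handle(self, change: dict) -> None:
        if change["version"] < self.expected:
            return  # already applied: drop the duplicate
        self.parking_lot[change["version"]] = change
        # drain the parking lot: apply every change that is now next in line
        while self.expected in self.parking_lot:
            ready = self.parking_lot.pop(self.expected)
            self.applied.append(ready["payload"])
            self.expected += 1
```

Draining the queue after each successful apply is what lets a late-arriving version 4 unblock the parked versions 5, 6, and 7 in one pass.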
Implementation:
Change tracking in source database
CDC tools like Debezium
Store processed change IDs in:
Database tracking table
Redis for speed
Stream processor state store
Key challenge: Handling failures and ensuring idempotency
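Idempotency with stored change IDs reduces to one check-before-apply step. The sketch below uses an in-memory set as a stand-in for any of the three stores listed above (tracking table, Redis, state store); names are illustrative.

```python
class IdempotentConsumer:
    """Skips any change whose ID has already been processed, so a
    redelivered change is harmless."""

    def __init__(self):
        self.processed_ids: set[str] = set()  # stand-in for a tracking table or Redis set
        self.applied: list = []

    def handle(self, change: dict) -> bool:
        if change["id"] in self.processed_ids:
            return False  # duplicate delivery: safe to ignore
        self.applied.append(change["payload"])
        self.processed_ids.add(change["id"])  # record only after a successful apply
        return True
```

Recording the ID only after the apply succeeds means a crash mid-change leads to a retry, not a lost change; combined with the duplicate check, this gives effectively-once processing.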
Delta Load
Delta Load maintains the story of changes, not just what changed, but how and why. Think of it as keeping a complete history.
Implementation approaches:
Event Sourcing:
Business events flow through message bus (like Kafka)
Events stored in event store
State aggregator (a separate component in the architecture) maintains the current state for quick access
Example: Order service publishing "OrderPlaced" events
Temporal Databases:
Databases that automatically track all versions
Examples: Datomic, PostgreSQL temporal tables
Gives you "time travel" queries
Custom Delta Tracking:
Build your own versioning system
Track changes with version numbers
Store full history of changes
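The event-sourcing approach above can be sketched with an append-only event list plus a state aggregator. This is an in-memory illustration only (a real system would use Kafka and an event store); the "OrderPlaced" example comes from the text, while "OrderShipped" and the class are hypothetical.

```python
class OrderEventStore:
    """Append-only event history plus an aggregator that folds events
    into each order's current status."""

    def __init__(self):
        self.events: list[dict] = []       # complete history of changes
        self.current: dict[str, str] = {}  # aggregated state for quick reads

    def publish(self, event: dict) -> None:
        self.events.append(event)          # history is never rewritten
        self._apply(event, self.current)

    @staticmethod
    def _apply(event: dict, state: dict) -> None:
        if event["type"] == "OrderPlaced":
            state[event["order_id"]] = "placed"
        elif event["type"] == "OrderShipped":
            state[event["order_id"]] = "shipped"

    def state_after(self, n: int) -> dict[str, str]:
        """'Time travel': rebuild state by replaying only the first n events."""
        state: dict[str, str] = {}
        for event in self.events[:n]:
            self._apply(event, state)
        return state
```

Because the history is kept in full, `state_after` gives the same kind of "time travel" query a temporal database provides automatically.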
Real-time Updates
Real-time Updates process changes as they happen, using streaming architecture.
Implementation:
Kafka for reliable event streaming
Stream processors (Kafka Streams, Flink) perform the transformations
Consumers read either directly from Kafka or from the processed stream, depending on what each consumer needs
Key feature: Events can be replayed when needed, and Kafka's durable, log-based storage retains them reliably
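The replay property comes from Kafka's log model: consumers track their own read position, so replay is just resetting an offset. A minimal in-memory stand-in (not real Kafka; all names hypothetical):

```python
class MiniLog:
    """Append-only record log, standing in for a Kafka topic."""

    def __init__(self):
        self.records: list = []

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

class Consumer:
    """Reads from its own offset; the log itself never changes."""

    def __init__(self, log: MiniLog):
        self.log = log
        self.offset = 0   # position of the next record to read
        self.seen: list = []

    def poll(self) -> None:
        for record in self.log.records[self.offset:]:
            self.seen.append(record)
        self.offset = len(self.log.records)

    def replay(self) -> None:
        """Re-read the whole topic from the beginning."""
        self.offset = 0
        self.seen = []
        self.poll()
```

Because each consumer owns its offset, two consumers can read the same topic independently, one directly and one from a processed stream, as described above.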
Summary
The fundamental insight is that these patterns solve different problems:
Full Snapshot when simplicity matters more than efficiency
Incremental when you need efficiency but don't need history
Delta when you need complete history and change context
Real-time when immediate updates are crucial