CDC vs Full Load: When Each Strategy Actually Hurts You

Q: When should I use CDC instead of full table loads?

Use CDC when consumers need minutes-fresh data, the table is too large to re-scan economically, or you need every intermediate row state for audit or event-driven consumers. If none of those hold, a scheduled full or incremental batch load is usually cheaper to build and far cheaper to operate.

Q: What is the biggest production risk of Postgres CDC?

Replication-slot WAL retention: if the CDC connector stops, Postgres retains write-ahead log for the slot indefinitely and can fill the primary's disk, taking down the source database. Set max_slot_wal_keep_size, alert on WAL volume, and treat a stopped connector as an incident.

Q: Is CDC more expensive than batch loading?

Per-byte moved, CDC is efficient; per-month operated, it carries standing costs — connector infrastructure, offset management, schema-registry governance, and on-call expertise. For small or slowly-consumed tables, those standing costs exceed what daily full loads would ever cost.

Q: Can I mix CDC and full loads in one platform?

Yes — that is the pattern mature platforms converge on: CDC for the handful of large, hot, freshness-critical tables; scheduled batch for the long tail. One orchestrator, two ingestion patterns, each where it is cheapest.

TL;DR — Direct Answer

Use CDC when downstream freshness must be minutes, source tables are too large to re-scan, or you need every intermediate state (audit, event sourcing). Use full load when tables are under a few million rows, freshness of hours is fine, or the source changes schema often. The dirty secret: a CDC pipeline is a distributed system you now operate forever — connectors, offsets, schema registry, snapshot recovery. For perhaps half the tables we see running CDC in the wild, a partitioned daily full load would be cheaper and more reliable.

What CDC actually costs you

Change Data Capture reads the database's transaction log (the write-ahead log in Postgres, the binlog in MySQL) and streams row-level changes. The demos are magical. Production adds four standing costs nobody itemizes: an always-on connector fleet that must be monitored, restarted, and upgraded; offset and snapshot state that must survive every failure mode; schema-evolution handling for every DDL change a source team ships without telling you; and operational expertise that walks out the door when your one Kafka person resigns.

The Postgres failure modes that bite at scale

Replication-slot WAL retention

A Debezium connector holds a logical replication slot. If the connector stops (crash, redeploy, stuck schema-registry conflict), Postgres retains WAL for that slot indefinitely. We have seen a paused connector quietly grow WAL until the primary's disk filled and took the production OLTP database down with it. Mitigations: set max_slot_wal_keep_size (available since PostgreSQL 13; its default of -1 means no limit), put disk alerts on the WAL volume specifically, and keep a runbook that treats a stopped connector as a page, not a ticket. Note the trade-off: once a slot exceeds the cap, Postgres invalidates it and the connector needs a fresh snapshot, which is still better than losing the primary.
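One way to make "alert on WAL volume" concrete is to poll pg_replication_slots and classify each slot by how much WAL it is holding back. The sketch below is a minimal version under stated assumptions: the thresholds, the name classify_slots, and the row shape are illustrative, and the query relies on pg_wal_lsn_diff and pg_current_wal_lsn, which exist in PostgreSQL 10 and later.

```python
# Query to run on the primary; returns one row per replication slot.
SLOT_LAG_SQL = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
"""

GIB = 1024 ** 3


def classify_slots(rows, warn_bytes=10 * GIB, page_bytes=50 * GIB):
    """Return (slot_name, severity) for each slot that needs attention.

    rows: iterable of (slot_name, active, retained_bytes) tuples, for example
    the result of running SLOT_LAG_SQL through any Postgres driver.
    The thresholds are hypothetical; size them against the free space on the
    WAL volume.
    """
    alerts = []
    for slot_name, active, retained_bytes in rows:
        retained = int(retained_bytes or 0)  # restart_lsn can be NULL
        if retained >= page_bytes or (not active and retained >= warn_bytes):
            alerts.append((slot_name, "page"))
        elif retained >= warn_bytes or not active:
            alerts.append((slot_name, "warn"))
    return alerts
```

An inactive slot is flagged even when it retains little WAL, on the assumption that a stopped connector is the leading indicator and disk growth is the lagging one.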

Initial snapshots on hot tables

Debezium's initial snapshot reads the whole table. On a high-write multi-hundred-GB table, the default snapshot can run long enough to stall writers and replicas. MySQL is the worst case because the snapshot may take a global read lock; Postgres is gentler, but the long-running snapshot transaction still holds back vacuum, so dead rows accumulate and tables bloat. Use incremental snapshots (watermark-based), and schedule the first sync like the risky migration it is, not a checkbox.
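The watermark idea behind incremental snapshots fits in a few lines. This is a simplified illustration of the general technique (read a small key-ordered chunk between a low and a high watermark, then drop chunk rows that the log already superseded inside that window), not Debezium's actual code; the function name and data shapes are hypothetical.

```python
def reconcile_chunk(chunk_rows, window_events):
    """Merge one snapshot chunk with the log events seen while it was read.

    chunk_rows: dict of primary key -> row, read by a bounded SELECT that ran
        between the low and high watermark.
    window_events: list of (primary_key, row_or_None) change events the log
        reader saw between the two watermarks (None stands for a delete).

    A chunk row whose key also appears in the window is redundant or stale:
    the log event is newer, so the snapshot copy is dropped.
    """
    changed_keys = {key for key, _ in window_events}
    surviving = {k: v for k, v in chunk_rows.items() if k not in changed_keys}
    # Log events keep their order; surviving snapshot rows are emitted after.
    out = [("log", key, row) for key, row in window_events]
    out += [("snapshot", key, row) for key, row in sorted(surviving.items())]
    return out
```

Because each chunk is a short bounded read, no single transaction stays open for the length of the whole table scan, which is what makes this approach gentler on a hot source.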

High-churn tables flood the topic

A table updated 50 times per row per day emits 50 events per row. Downstream you pay Kafka throughput, storage, and merge compute to reconstruct ... the same end-of-day state a full load would have given you for one scan. Status-flag tables, queue tables, and session tables are routinely worse under CDC.
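The arithmetic is worth doing per table before choosing. A back-of-envelope sketch with made-up table numbers and a hypothetical helper name; it counts records only, so it ignores that change events and scanned rows differ in size:

```python
def daily_volume(row_count, updates_per_row_per_day):
    """Compare records moved per day: CDC change events vs. one full scan."""
    cdc_events = row_count * updates_per_row_per_day
    full_load_rows = row_count
    return {
        "cdc_events": cdc_events,
        "full_load_rows": full_load_rows,
        "cdc_to_full_ratio": cdc_events / full_load_rows,
    }


# Illustrative numbers only: a 2M-row status table, 50 updates per row per day.
print(daily_volume(2_000_000, 50))
# {'cdc_events': 100000000, 'full_load_rows': 2000000, 'cdc_to_full_ratio': 50.0}
```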

When full load is genuinely cheaper

Small-to-medium tables (< ~5M rows): a parallel SELECT with partitioned overwrite finishes in minutes. Zero standing infrastructure.
Hourly/daily freshness is acceptable: most finance and reporting marts. Ask the consumer, not the architect.
Schema changes weekly: a full load just picks up the new shape; CDC needs registry compatibility decisions every time.
High update-to-row ratio: see churn above — you only want final state.
The team is small: two data engineers should not operate a Kafka Connect cluster to sync 40 lookup tables.
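The "parallel SELECT" in the first item usually means splitting the primary-key range into slices and reading them concurrently. A minimal sketch, assuming a dense integer key column; the table, column, and function names are hypothetical, and real code should pass bounds as driver parameters instead of formatting them into SQL strings:

```python
def key_range_queries(table, key_col, min_key, max_key, slices):
    """Build one SELECT per contiguous key slice covering [min_key, max_key]."""
    if slices < 1 or max_key < min_key:
        raise ValueError("need slices >= 1 and max_key >= min_key")
    span = max_key - min_key + 1
    step = -(-span // slices)  # ceiling division
    queries = []
    lo = min_key
    while lo <= max_key:
        hi = min(lo + step - 1, max_key)
        queries.append(
            f"SELECT * FROM {table} WHERE {key_col} BETWEEN {lo} AND {hi}"
        )
        lo = hi + 1
    return queries
```

Each query can go to its own worker, and the results overwrite the target partition wholesale. If the keys are heavily skewed, equal-width slices will be unevenly sized and a different split (for example by row-count percentiles) may serve better.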

The hybrid most mature platforms land on: CDC for the 5–15 large, hot, freshness-critical tables; scheduled full or watermark-incremental loads for everything else. Our EdTech streaming engagement (sub-3-minute end-to-end latency) runs exactly this split — CDC where minutes matter, boring batch where they don't.

A decision checklist you can defend

Question	Points toward CDC	Points toward full load
Required freshness	Minutes	Hours+
Table size	Too big to re-scan	< ~5M rows
Update pattern	Mostly inserts	Heavy updates per row
Need intermediate states?	Yes (audit/events)	No, final state only
Source DDL stability	Stable	Changes often
Team ops capacity	Owns Kafka already	Two-person team
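The checklist can be applied mechanically as a first pass. The sketch below simply counts how many of the six rows point each way; the field names are hypothetical, the rows are weighted equally by assumption, and a tie or near tie is a prompt for discussion, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class TableProfile:
    needs_minutes_freshness: bool
    too_big_to_rescan: bool
    mostly_inserts: bool
    needs_intermediate_states: bool
    stable_ddl: bool
    team_runs_kafka: bool


def checklist_votes(profile):
    """Return (votes_for_cdc, votes_for_full_load) from the six checklist rows."""
    answers = [
        profile.needs_minutes_freshness,
        profile.too_big_to_rescan,
        profile.mostly_inserts,
        profile.needs_intermediate_states,
        profile.stable_ddl,
        profile.team_runs_kafka,
    ]
    cdc = sum(answers)  # True counts as 1
    return cdc, len(answers) - cdc


# A small lookup table owned by a two-person team: every row says full load.
lookup = TableProfile(False, False, False, False, False, False)
print(checklist_votes(lookup))  # (0, 6)
```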

Frequently Asked Questions

When should I use CDC instead of full table loads?