TL;DR — Direct Answer
Use CDC when downstream freshness must be measured in minutes, source tables are too large to re-scan, or you need every intermediate state (audit, event sourcing). Use full load when tables are under a few million rows, freshness measured in hours is fine, or the source schema changes often. The dirty secret: a CDC pipeline is a distributed system you now operate forever, with connectors, offsets, a schema registry, and snapshot recovery to look after. For perhaps half the tables we see running CDC in the wild, a partitioned daily full load would be cheaper and more reliable.
What CDC actually costs you
Change Data Capture reads the database's transaction log (the write-ahead log in Postgres, the binlog in MySQL) and streams row-level changes. The demos are magical. Production adds four standing costs nobody itemizes: an always-on connector fleet that must be monitored, restarted, and upgraded; offset and snapshot state that must survive every failure mode; schema-evolution handling for every DDL change a source team ships without telling you; and operational expertise that walks out the door when your one Kafka person resigns.
The Postgres failure modes that bite at scale
Replication-slot WAL retention
A Debezium connector holds a logical replication slot. If the connector stops (crash, redeploy, stuck schema-registry conflict), Postgres keeps retaining WAL for that slot, by default without limit. We have seen a paused connector quietly grow WAL until the primary's disk filled and took the production OLTP database down with it. Mitigations: set max_slot_wal_keep_size (available from PostgreSQL 13; the default of -1 means unlimited retention), bearing in mind that a slot which exceeds the cap is invalidated and the connector then needs a fresh snapshot; alert on WAL volume disk usage specifically; and keep a runbook that treats a stopped connector as a page, not a ticket.
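The alerting rule above can be sketched in a few lines. The query uses the standard pg_replication_slots view and the pg_wal_lsn_diff() function; the thresholds, the SlotStatus type, and the classify_slot helper are hypothetical names chosen for illustration, and running the query is left to whatever driver you already use.

```python
from dataclasses import dataclass

# Run this against the primary. The view and functions exist in PostgreSQL 10+.
SLOT_QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical'
"""

GIB = 1024 ** 3


@dataclass
class SlotStatus:
    """One row of SLOT_QUERY (hypothetical container type)."""
    slot_name: str
    active: bool
    retained_bytes: int


def classify_slot(slot, warn_bytes=10 * GIB, page_bytes=50 * GIB):
    """Return 'ok', 'warn' or 'page'. Threshold defaults are assumptions."""
    # A stopped connector leaves the slot inactive while WAL keeps piling up,
    # so an inactive slot pages regardless of how much WAL it holds so far.
    if not slot.active:
        return "page"
    if slot.retained_bytes >= page_bytes:
        return "page"
    if slot.retained_bytes >= warn_bytes:
        return "warn"
    return "ok"
```

Size the thresholds against the free space on the WAL volume rather than copying these numbers.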
Initial snapshots on hot tables
Debezium's initial snapshot reads the whole table. On a high-write multi-hundred-GB table, the default snapshot can hold locks long enough to stall writers and replicas (especially on MySQL with global read locks; Postgres is gentler but long transactions still block vacuum and bloat tables). Use incremental snapshots (watermark-based) — and schedule the first sync like the risky migration it is, not a checkbox.
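As a rough sketch of how an incremental snapshot is usually requested: Debezium accepts an execute-snapshot signal written as a row into a signaling table. The example below only builds that row. It assumes the connector already has a signaling table configured through signal.data.collection; the table name debezium_signal and the helper name are hypothetical.

```python
import json
import uuid


def build_incremental_snapshot_signal(tables):
    """Build the (id, type, data) row for a Debezium signaling table.

    `tables` holds fully qualified names such as "public.orders".
    """
    if not tables:
        raise ValueError("at least one table is required")
    data = {"data-collections": list(tables), "type": "incremental"}
    # The id is an arbitrary unique string; a UUID is one convenient choice.
    return (str(uuid.uuid4()), "execute-snapshot", json.dumps(data))


# Parameterised INSERT for a DB-API driver that uses %s placeholders.
# `debezium_signal` is an assumed name and must match the connector config.
INSERT_SIGNAL_SQL = (
    "INSERT INTO debezium_signal (id, type, data) VALUES (%s, %s, %s)"
)
```

Passing the tuple from build_incremental_snapshot_signal(["public.orders"]) as the parameters of INSERT_SIGNAL_SQL would ask the connector to snapshot that table in chunks while streaming continues.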
High-churn tables flood the topic
A table whose rows are each updated 50 times a day emits 50 change events per row per day. Downstream you pay Kafka throughput, storage, and merge compute to reconstruct ... the same end-of-day state a full load would have given you for one scan. Status-flag tables, queue tables, and session tables are routinely worse under CDC.
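The arithmetic is worth running for your own tables. A minimal sketch, using a hypothetical 2-million-row table and the 50-updates-per-row figure from above:

```python
def daily_rows_processed(row_count, updates_per_row_per_day):
    """Compare daily CDC event volume with the rows one full load scans."""
    cdc_events = row_count * updates_per_row_per_day
    full_load_rows = row_count
    return cdc_events, full_load_rows


# Hypothetical table: 2M rows, each updated 50 times a day.
cdc_events, full_load_rows = daily_rows_processed(2_000_000, 50)
print(cdc_events, full_load_rows, cdc_events // full_load_rows)
# prints: 100000000 2000000 50
```

This counts rows only. It ignores event size, Kafka retention, and merge cost, all of which tend to widen the gap further.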
When full load is genuinely cheaper
- Small-to-medium tables (< ~5M rows): a parallel SELECT with partitioned overwrite finishes in minutes. Zero standing infrastructure.
- Hourly/daily freshness is acceptable: most finance and reporting marts. Ask the consumer, not the architect.
- Schema changes weekly: a full load just picks up the new shape; CDC needs registry compatibility decisions every time.
- High update-to-row ratio: see churn above — you only want final state.
- The team is small: two data engineers should not operate a Kafka Connect cluster to sync 40 lookup tables.
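The "parallel SELECT" in the first bullet can be as simple as splitting the primary-key range into chunks and reading each chunk in its own worker. A minimal sketch, assuming an integer and reasonably dense key; the function names are hypothetical and the SQL is built by string formatting for illustration only:

```python
def key_ranges(min_id, max_id, parts):
    """Split [min_id, max_id] into at most `parts` contiguous inclusive ranges."""
    if parts < 1 or max_id < min_id:
        raise ValueError("need parts >= 1 and max_id >= min_id")
    total = max_id - min_id + 1
    step = -(-total // parts)  # ceiling division
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + step - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def range_query(table, key, lo, hi):
    """Build one chunk's SELECT. Validate identifiers before doing this for real."""
    return f"SELECT * FROM {table} WHERE {key} BETWEEN {lo} AND {hi}"


# Each tuple becomes one worker's query, e.g. for ids 1..10 in 4 chunks:
# [(1, 3), (4, 6), (7, 9), (10, 10)]
chunks = key_ranges(1, 10, 4)
```

Each worker writes its result to the same target partition (for example the load date), which is then swapped in as a whole, so a failed run leaves the previous partition untouched.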
The hybrid most mature platforms land on: CDC for the 5–15 large, hot, freshness-critical tables; scheduled full or watermark-incremental loads for everything else. Our EdTech streaming engagement (sub-3-minute end-to-end latency) runs exactly this split — CDC where minutes matter, boring batch where they don't.
A decision checklist you can defend
| Question | Points toward CDC | Points toward full load |
|---|---|---|
| Required freshness | Minutes | Hours+ |
| Table size | Too big to re-scan | < ~5M rows |
| Update pattern | Mostly inserts | Heavy updates per row |
| Need intermediate states? | Yes (audit/events) | No, final state only |
| Source DDL stability | Stable | Changes often |
| Team ops capacity | Owns Kafka already | Two-person team |
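
One way to make the checklist repeatable across a table inventory is to tally the six answers per table. The sketch below is an assumption about how to use the table, not a rule from it: every question gets one equal vote, and the type and function names are hypothetical.

```python
from dataclasses import astuple, dataclass


@dataclass
class TableProfile:
    """One True/False answer per checklist row; True points toward CDC."""
    freshness_in_minutes: bool
    too_big_to_rescan: bool
    mostly_inserts: bool
    needs_intermediate_states: bool
    source_ddl_stable: bool
    team_owns_kafka: bool


def recommend(profile):
    """Return (recommendation, cdc_votes, full_load_votes) from an equal-weight tally."""
    answers = astuple(profile)
    cdc_votes = sum(answers)
    full_load_votes = len(answers) - cdc_votes
    if cdc_votes > full_load_votes:
        return ("cdc", cdc_votes, full_load_votes)
    if full_load_votes > cdc_votes:
        return ("full_load", cdc_votes, full_load_votes)
    return ("tie", cdc_votes, full_load_votes)
```

In practice a single hard requirement, such as minute-level freshness or an audit trail, can outweigh the rest, so treat the tally as a conversation starter rather than a verdict.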