Legacy Ingestion Bottlenecks
Summary
What it is
Older ETL systems designed for HDFS or traditional databases that cannot efficiently write to modern S3-based lakehouse architectures.
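A concrete symptom: batch jobs tuned for HDFS block sizes tend to spray thousands of small files at S3, where per-object listing and request overhead dominate. Below is a minimal sketch of the compaction step such pipelines typically need, assuming PySpark with S3A credentials configured; the bucket paths and target file count are hypothetical placeholders.

```python
# Minimal sketch: compact a legacy job's small-file output into a few
# well-sized Parquet files before it lands in the lakehouse.
# Paths and the target file count are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-legacy-output").getOrCreate()

# Legacy output: often thousands of tiny files per batch.
df = spark.read.parquet("s3a://legacy-landing/events/")  # hypothetical path

# Coalesce to an explicit, small partition count so the write produces
# a handful of large Parquet objects instead of a small-file swarm.
df.coalesce(16).write.mode("overwrite").parquet(
    "s3a://lakehouse-bronze/events/"  # hypothetical path
)
```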
Where it fits
This pain point is the migration friction between the old world (Hadoop, RDBMS, batch ETL) and the new world (S3 lakehouse). It slows adoption and forces dual-system operation during transitions.
Misconceptions / Traps
- "Lift and shift" rarely works. Legacy ETL tools produce formats, file sizes, and write patterns incompatible with lakehouse best practices.
- CDC (Change Data Capture) is the modern replacement for batch ETL, but it introduces its own complexity (Debezium, Kafka, schema registries).
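To make that complexity concrete, here is a minimal sketch of registering a Debezium Postgres source connector through the Kafka Connect REST API. Host names, credentials, and the table list are assumptions for illustration, not values from this article.

```python
# Sketch: create a Debezium Postgres source connector via Kafka Connect's
# REST API (POST /connectors). All connection values are hypothetical.
import json
import requests

connector = {
    "name": "orders-cdc",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "legacy-db.internal",  # assumed host
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "********",
        "database.dbname": "orders",
        "table.include.list": "public.orders",
        "topic.prefix": "legacy",  # Debezium 2.x topic naming
    },
}

resp = requests.post(
    "http://connect:8083/connectors",  # assumed Kafka Connect endpoint
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print(resp.json())
```

From there, a sink (for example, a Kafka Connect S3 sink or a Spark streaming job) lands the change events in the lake, which is the pattern the Resources below describe.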
Key Connections
- Apache Ozone solves Legacy Ingestion Bottlenecks (HDFS migration path)
- Apache Hudi solves Legacy Ingestion Bottlenecks (incremental ingestion primitives)
- Medallion Architecture is constrained_by Legacy Ingestion Bottlenecks (Bronze layer receives legacy data)
- scoped_to: Data Lake, S3
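To make the Hudi connection concrete: Hudi's upsert write is the incremental ingestion primitive that lets change batches land in the Bronze layer without full-table rewrites. A minimal sketch, assuming PySpark with the Hudi Spark bundle on the classpath; the table, key, and path names are hypothetical.

```python
# Sketch: upsert a batch of change records into a Hudi table on S3.
# Table name, record key, and paths are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-upsert")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

changes = spark.read.parquet("s3a://cdc-staging/orders/")  # hypothetical batch

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

(changes.write
    .format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3a://lakehouse-bronze/orders/"))  # hypothetical table path
```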
Resources
- Official AWS DMS documentation for using S3 as a migration target, covering CDC replication modes, Parquet output, and the architecture that replaces legacy batch ETL.
- AWS Big Data Blog reference architecture for streaming CDC into an S3 data lake in Parquet format, the canonical AWS solution for modernizing legacy ingestion.
- Confluent's authoritative blog on implementing CDC with Debezium and Kafka to replace legacy batch ETL, including architecture patterns for S3/lakehouse targets.