ETL Pipeline Studio
A full-stack ETL/ELT tool for defining data sources and streams, modeling them two ways (dimensional and Data Vault), and loading the results into ClickHouse from S3.
What it is
A pipeline-management tool that takes data from messy API sources to query-ready tables. You register sources (with various auth methods), define streams with pagination and schema inference, assemble streams into data packages, and materialize those packages into models, either dimensional or Data Vault.
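To make the schema-inference step concrete, here is a minimal sketch of how column types could be inferred from sampled stream records. It is not the tool's actual implementation: the function names (`infer_type`, `merge_types`, `infer_schema`), the type labels, and the widening rules are all assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Any


def infer_type(value: Any) -> str:
    """Map one sampled value to a coarse logical type (labels are illustrative)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)
            return "timestamp"
        except ValueError:
            return "string"
    return "json"  # nested objects and arrays


def merge_types(a: str, b: str) -> str:
    """Widen two observed types into one that can hold both."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"integer", "float"}:
        return "float"
    return "string"  # assumed fallback: anything can be stored as text


def infer_schema(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Infer a type and nullability for every key seen in the sample."""
    types: dict[str, str] = {}
    seen_in: dict[str, int] = {}
    nullable: set[str] = set()
    for record in records:
        for key, value in record.items():
            seen_in[key] = seen_in.get(key, 0) + 1
            if value is None:
                nullable.add(key)
            types[key] = merge_types(types.get(key, "null"), infer_type(value))
    total = len(records)
    return {
        key: {
            "type": types[key],
            # a column is nullable if it was None or missing in any record
            "nullable": key in nullable or seen_in[key] < total,
        }
        for key in types
    }


sample = [
    {"id": 1, "amount": 10, "created_at": "2024-01-05T10:00:00", "name": "alice"},
    {"id": 2, "amount": 12.5, "created_at": "2024-01-06T11:30:00", "name": None},
    {"id": 3, "amount": 7, "created_at": "2024-01-07T09:15:00"},
]
schema = infer_schema(sample)
```

In this sketch a column that mixes integers and floats widens to `float`, and a key that is missing or null in any sampled record is marked nullable. A real stream would also need a sampling strategy across pages, which is left out here.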
What it does
- Schema inference on streams, so you don’t hand-write every column.
- Two modeling styles. Star-schema dimensional models for query ergonomics, and Data Vault for auditable loads that tolerate late-arriving data, including hash transformations that turn business keys into Data Vault hash keys.
- ClickHouse integration. Automatic table creation from model definitions, and loading data from S3 into ClickHouse with S3 virtualization for efficient loads.
- JWT auth, a React front end for managing it all, and release CI.
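The Data Vault hashing and ClickHouse table-creation features above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `hash_key`, `Column`, `create_table_sql`, and `load_from_s3_sql` are hypothetical names, the MD5-over-normalized-keys convention is one common Data Vault practice rather than a confirmed detail of this tool, and the table, columns, and S3 URL are made up. The load statement uses ClickHouse's `s3` table function, which may or may not be what the tool means by S3 virtualization.

```python
import hashlib
from dataclasses import dataclass


def hash_key(*business_key_parts: object, delimiter: str = "||") -> str:
    """Derive a deterministic hash key from one or more business key parts.

    Parts are trimmed and uppercased so that formatting differences between
    sources do not produce different keys (an assumed normalization rule).
    """
    normalized = [
        ("" if part is None else str(part)).strip().upper()
        for part in business_key_parts
    ]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()


@dataclass
class Column:
    name: str
    type: str  # a ClickHouse type name, e.g. "String" or "DateTime"


def create_table_sql(table: str, columns: list[Column], order_by: list[str]) -> str:
    """Render a CREATE TABLE statement from a simple model definition."""
    cols = ",\n".join(f"    `{c.name}` {c.type}" for c in columns)
    keys = ", ".join(f"`{k}`" for k in order_by)
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}`\n(\n{cols}\n)\n"
        f"ENGINE = MergeTree\nORDER BY ({keys})"
    )


def load_from_s3_sql(table: str, columns: list[Column], url: str, fmt: str) -> str:
    """Render an INSERT ... SELECT that reads files through the s3 table function."""
    names = ", ".join(f"`{c.name}`" for c in columns)
    return f"INSERT INTO `{table}` ({names})\nSELECT {names}\nFROM s3('{url}', '{fmt}')"


# Hypothetical hub definition for a customer business key.
hub_customer = [
    Column("customer_hk", "String"),
    Column("customer_id", "String"),
    Column("load_ts", "DateTime"),
    Column("record_source", "String"),
]

ddl = create_table_sql("hub_customer", hub_customer, order_by=["customer_hk"])
load = load_from_s3_sql(
    "hub_customer",
    hub_customer,
    "https://example-bucket.s3.amazonaws.com/hubs/customer/*.parquet",  # placeholder URL
    "Parquet",
)
print(ddl)
print(load)
```

The delimiter between key parts matters for composite keys: without it, `("a", "bc")` and `("ab", "c")` would hash to the same value. The sketch builds SQL by string formatting for readability; values that come from users would need proper escaping or parameterization.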
Why it matters
It’s the data-modeling and pipeline muscle in concentrated form: sources, streams, schema inference, dimensional vs. Data Vault, and a columnar analytical target. The same shape shows up anywhere messy upstream data has to become trustworthy reporting.