ETL Pipeline Studio — Alex Fricker

What it is

A pipeline-management tool that moves data from messy API sources into query-ready tables. You register sources (with various auth methods), define streams with pagination and schema inference, assemble streams into data packages, and materialize those packages into models, with support for both dimensional and Data Vault modeling.
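The source → stream → package → model hierarchy can be pictured as a small object model. The sketch below is hypothetical: the class names, fields, and the cursor-based `fetch_page` callback are assumptions for illustration, not the project's actual API. It shows how a stream can hide pagination behind a single record iterator.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional

# Hypothetical object model: class and field names are illustrative only.
@dataclass
class Source:
    name: str
    base_url: str
    auth_method: str  # e.g. "api_key", "oauth2", "basic" (assumed values)

# Assumed page fetcher: takes a cursor (None for the first page) and returns
# (records on this page, cursor for the next page or None when done).
PageFetcher = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]

@dataclass
class Stream:
    source: Source
    endpoint: str
    fetch_page: PageFetcher

    def records(self) -> Iterator[dict]:
        """Yield every record, following cursor pagination to the last page."""
        cursor: Optional[str] = None
        while True:
            page, cursor = self.fetch_page(cursor)
            yield from page
            if cursor is None:
                return

@dataclass
class DataPackage:
    name: str
    streams: list[Stream] = field(default_factory=list)

@dataclass
class Model:
    name: str
    package: DataPackage
    style: str  # "dimensional" or "data_vault"

# Usage with a fake two-page API standing in for real HTTP calls.
pages = {None: ([{"id": 1}, {"id": 2}], "p2"), "p2": ([{"id": 3}], None)}
crm = Source("crm", "https://api.example.com", "api_key")
contacts = Stream(crm, "/contacts", lambda cursor: pages[cursor])
package = DataPackage("crm_core", [contacts])
dim_contact = Model("dim_contact", package, "dimensional")

print([r["id"] for r in contacts.records()])  # [1, 2, 3]
```

Keeping pagination inside `Stream.records()` means packages and models only ever see a flat sequence of records, whatever paging scheme the upstream API uses.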

What it does

Schema inference on streams, so you don’t hand-write every column.
Two modeling styles. Star-schema dimensional models for query ergonomics, and Data Vault for auditable, late-arriving loads, with hash transformations applied to Data Vault business keys.
ClickHouse integration. Automatic table creation from model definitions, plus S3-to-ClickHouse loading that uses S3 virtualization to keep loads efficient.
JWT auth, a React front end for managing it all, and release CI.
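For the schema inference on streams, a minimal sketch might sample records and widen each column's type when records disagree. This assumes streams yield flat JSON records and uses a simple linear widening order (Bool, Int64, Float64, String) borrowed from ClickHouse type names; the project's real inference rules are not described here.

```python
from typing import Any

# Assumed widening order: when two records disagree, the column takes the wider type.
WIDENING_ORDER = ["Bool", "Int64", "Float64", "String"]

def type_of(value: Any) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Bool"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    return "String"  # strings, plus (as a simplification) nested objects/lists

def widen(a: str, b: str) -> str:
    return WIDENING_ORDER[max(WIDENING_ORDER.index(a), WIDENING_ORDER.index(b))]

def infer_schema(records: list[dict]) -> dict[str, tuple[str, bool]]:
    """Map each column to (type, nullable) from a sample of flat JSON records."""
    types: dict[str, str] = {}
    seen_null: set[str] = set()
    for record in records:
        for key, value in record.items():
            if value is None:
                seen_null.add(key)
            else:
                t = type_of(value)
                types[key] = widen(types[key], t) if key in types else t
    schema = {}
    for key in sorted(set(types) | seen_null):
        # Nullable if any record held None for the key or omitted it entirely.
        missing = any(key not in record for record in records)
        # Columns that were only ever None fall back to String.
        schema[key] = (types.get(key, "String"), key in seen_null or missing)
    return schema

sample = [
    {"id": 1, "price": 9, "name": "a"},
    {"id": 2, "price": 9.5, "active": True},
    {"id": 3, "price": None, "name": None},
]
for column, (col_type, nullable) in infer_schema(sample).items():
    print(column, col_type, "NULL" if nullable else "NOT NULL")
```

Here `price` widens from Int64 to Float64 and becomes nullable, while `id` stays a non-null Int64. A production version would also need to handle dates, nested objects, and sample size.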
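The hash transformations for Data Vault business keys commonly follow a convention like the one below: normalize each key part, join the parts with a delimiter, and hash the result. This is a sketch of that general convention; the normalization rules, the `||` delimiter, and the choice of MD5 are assumptions, and `hash_key` is a hypothetical helper, not the project's code.

```python
import hashlib

def hash_key(*business_keys: object, delimiter: str = "||") -> str:
    """Data Vault-style hash key: normalize each business-key part, join, MD5."""
    # Normalization (trim + upper-case) makes " cust-001 " and "CUST-001" hash alike;
    # None becomes an empty string here, though some teams use a fixed placeholder.
    parts = ["" if k is None else str(k).strip().upper() for k in business_keys]
    # The delimiter keeps ("AB", "C") and ("A", "BC") from colliding.
    return hashlib.md5(delimiter.join(parts).encode("utf-8")).hexdigest()

hub_customer_hk = hash_key(" cust-001 ")
link_customer_order_hk = hash_key("CUST-001", "ORD-42")
print(hub_customer_hk == hash_key("CUST-001"))  # True
print(len(link_customer_order_hk))              # 32
```

Because the key is derived only from business-key values, any source can compute it independently, which is what lets hubs and links load in parallel and accept late-arriving data without lookups.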
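For the ClickHouse integration, table creation from a model definition can be sketched as DDL generation. One plausible reading of "S3 virtualization" is querying S3 files in place through ClickHouse's `s3` table function, so the load below is an `INSERT ... SELECT` from it. `ModelDef`, `Column`, and both SQL-building functions are hypothetical; the bucket URL is a placeholder, and this only builds SQL strings rather than calling any client library.

```python
import re
from dataclasses import dataclass

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def ident(name: str) -> str:
    """Allow only plain identifiers so names can be interpolated into SQL safely."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name

@dataclass
class Column:
    name: str
    ch_type: str  # ClickHouse type, e.g. "String", "DateTime", "FixedString(32)"
    nullable: bool = False

@dataclass
class ModelDef:  # hypothetical model definition
    database: str
    table: str
    columns: list[Column]
    order_by: list[str]

def create_table_sql(m: ModelDef) -> str:
    nullable = {c.name for c in m.columns if c.nullable}
    if nullable & set(m.order_by):
        # MergeTree rejects Nullable sorting-key columns unless allow_nullable_key is set.
        raise ValueError("ORDER BY columns must not be nullable")
    cols = ",\n".join(
        f"    {ident(c.name)} " + (f"Nullable({c.ch_type})" if c.nullable else c.ch_type)
        for c in m.columns
    )
    order = ("(" + ", ".join(ident(k) for k in m.order_by) + ")") if m.order_by else "tuple()"
    return (
        f"CREATE TABLE IF NOT EXISTS {ident(m.database)}.{ident(m.table)}\n"
        f"(\n{cols}\n)\nENGINE = MergeTree\nORDER BY {order}"
    )

def load_from_s3_sql(m: ModelDef, s3_url: str, fmt: str = "Parquet") -> str:
    # The s3() table function reads the files in place; no staging table is needed.
    if "'" in s3_url or "\\" in s3_url:
        raise ValueError("quote or backslash in S3 URL")
    cols = ", ".join(ident(c.name) for c in m.columns)
    return (
        f"INSERT INTO {ident(m.database)}.{ident(m.table)} ({cols})\n"
        f"SELECT {cols}\nFROM s3('{s3_url}', '{ident(fmt)}')"
    )

hub = ModelDef(
    "vault", "hub_customer",
    [Column("customer_hk", "FixedString(32)"), Column("customer_id", "String"),
     Column("load_ts", "DateTime"), Column("record_source", "String")],
    order_by=["customer_hk"],
)
print(create_table_sql(hub))
print(load_from_s3_sql(hub, "https://example-bucket.s3.amazonaws.com/hub_customer/*.parquet"))
```

Listing columns explicitly in both the `INSERT` and the `SELECT` keeps the load tied to the model definition rather than to whatever columns happen to be in the files.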

Why it matters

It’s the data-modeling and pipeline muscle in concentrated form: sources, streams, schema inference, dimensional vs. Data Vault, and a columnar analytical target. The same shape shows up anywhere messy upstream data has to become trustworthy reporting.