Column level lineage for dbt

You can easily use SQLMesh's open source IDE to explore column level lineage in your existing dbt project. SQLMesh is a powerful alternative to dbt, but is compatible with existing dbt projects and can be used to expose column-level lineage.

This guide will use jaffle shop DuckDB, but you can use this on your own dbt project instead.

Install jaffle shop DuckDB

You can skip to the next step if you already have a dbt project.

Clone the repository and enter it.

git clone https://github.com/TobikoData/jaffle_shop_duckdb.git
cd jaffle_shop_duckdb

Setup a virtual env and install dependencies.

python3 -m venv .env
source .env/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt

Setup SQLMesh

Setup your dbt project to work with SQLMesh. This will not affect your original dbt project. You can learn more about our dbt integration in our documentation.

pip3 install "sqlmesh[web]"
sqlmesh init -t dbt

Add start to your dbt_project.yml. Start is a required variable in SQLMesh that indicates the earliest date that your data warehouse needs to consider.

diff --git a/dbt_project.yml b/dbt_project.yml
index 8bfaa9c..8bd285d 100644
--- a/dbt_project.yml
+++ b/dbt_project.yml
@@ -19,11 +19,12 @@ clean-targets:

 require-dbt-version: [">=1.0.0", "<2.0.0"]

seeds:
   +docs:
     node_color: '#cd7f32'

 models:
+  start: "2020-01-01"
   jaffle_shop:
     materialized: table
     staging:

Start the SQLMesh UI

sqlmesh ui

And click this link to launch the UI.

Click on orders.sql in the left hand file selector pane.

Figure 1: file selector panel

Click on bank_transfer_amount in the main.orders box.

Figure 2: model schema panel

You can now see column level lineage for bank_transfer_amount, including all CTEs.

Figure 3: column level lineage view

There's a lot of stuff to unravel here, but you can see bank_transfer_amount is derived from payments.payment_method and payments.amount.

We're actively working on improving column level lineage and have a big update coming soon which should make this an even better experience. If you run into any issues or have any suggestions, please join us on Slack. We are committed towards collaborating with the community to make working with and understanding data better.

If you're interested in learning more about SQLMesh, here are some additional links to check out!