Metadata everywhere!

It feels amazing to surf a web of tables and give a crisp answer to an important question.

It feels terrible to stumble around a web of tables, unable to figure out what’s in each one. Or worse, to misunderstand and give the wrong answer to an important question.

How do we surf instead of stumble? By understanding the context around tables and their columns:

  • Why is this table called “confabutable_5000”?

  • What is even in this “calculated” column (and who named it that)?

  • Why would you possibly do the calculation this way?

Our data’s context is provided by metadata.

Metadata often lives in someone’s brain, and your job is to figure out who that person is and pry it out of them.

Figure 1: The metadata is in here somewhere

A better scenario is metadata stored in documentation you can both locate and read.

The best scenario is metadata located conveniently so you don’t have to switch contexts to look things up.

Easy access

SQL engine makers recognize the value of convenient metadata, and most provide “comment” (aka “label,” “description”) fields so metadata can literally live alongside the data.

Engines usually provide both a field for describing a table and fields for describing each column within the table. The comments are registered as part of CREATE statements or with commands like ALTER.

Some tools, like dbt, allow you to specify comments in your project then register them with the SQL engine. Unfortunately, this requires specifying every comment in the relevant YAML configuration file.

This separation of model contents (in the model SQL file) and comments (in the YAML configuration file) leads to a chronic documentation challenge: updating a model but forgetting to update the corresponding comments.

This quickly creates a dangerous scenario: metadata that actively misleads users!

Figure 2: Out of date metadata is a lie

The Tobiko Data team has encountered this problem before, and we knew there had to be a better way. There is: an alternative to dbt that makes it easy to keep metadata updated as models change.

All together now

How can we make it easy to keep metadata current? Specify comments directly in the model file.

SQLMesh’s ability to understand a model’s SQL query also allows it to identify and extract comments for registration with the engine.

Starting with version 0.69.0, SQLMesh automatically registers comments in the model definition with your SQL engine 🎉

Table comments can be placed before the MODEL DDL statement, and column comments placed on the same line where the column is defined in the query.

For example, the table created for the following model definition would have:

  1. The comment on the first line, “Total gross revenue from each customer", registered as a table comment in the SQL engine

  2. The comment on the same line as the customer_id column definition in the SQL query, "Customer's unique ID", registered as a column comment for the table's customer_id column

-- Total gross revenue from each customer
  name sushitime.customer_total_revenue,

  customer_id, -- Customer's unique ID
  SUM(amount) AS revenue,
FROM sushi.orders
GROUP BY customer_id;

If we run this model in BigQuery, we can see the registered comments:

Table comment

Figure 3: Table comment registered in BigQuery

Column comment

Figure 4: Column comment registered in BigQuery

The model speaks for itself

Creating, updating, and locating project metadata is a hassle.

SQLMesh makes it easier by letting models speak for themselves: inline comments automatically registered with your SQL engine.

If you’re interested in data, SQL, or just want to chat we’d love to meet you! Please join us in our growing Slack community!