Tobiko

Efficient Development with the SQLMesh Browser UI

Manually tracing DAGs and jumping between transformation models is a drag. If you know that pain, check out the free, open source SQLMesh browser UI - it makes understanding and developing data transformation models easy.

A tale as old as time

Executives are breathing down my neck to get new product metrics into the C-suite dashboards. My glass-eyed stare is fixed on the 431 SQL models in the pipeline’s project folder.

“We really need to figure out which of these can be deleted,” I think for the 432nd time. But, alas, this is a moment for creation.

The first task is picking through the calculations for existing metrics - the new models will use many of the same columns.

I just need to figure out where in the project DAG the old and new metric calculations diverge. With that, I can develop and test code for each step in the new metric’s calculation sequence. So I begin walking the DAG, searching in earnest.

Content warning: cigarette and DAG-induced 1000 yard stare. Content warning: cigarette and DAG-induced 1000 yard stare.

Hours later I notice the acrid residue of CaffeineCorp’s CoffeeMaxx 7000 coating my tongue. If only there were a better way to understand my DAG - if only the delicate flavors of CoffeeMaxx 6000 could suffice.

Threads in the web

Data pipeline DAGs, whether simple or complex, are a hassle to understand. I can never quite hold the whole thing in my mind, and walking through them step by step in my normal IDE is tedious.

My first step when working with pipelines is tracing a thread of calculations so I have context for the changes or additions I need to make. Irritatingly, tracing this thread requires dissecting the code in each model.

Visualizing the DAG makes this easier, but tools like dbt only show me which models select from other models. I still have to unpack each model’s query to find the specific columns I’m working with.

Fortunately, the SQLMesh browser UI provides what I’m really looking for - an easy way to hone in on the specific columns relevant to my calculation. The philosopher’s stone of data development: column-level lineage.

SQLMesh Browser UI

The SQLMesh UI provides an interface for understanding projects via column-level lineage and interactively developing SQL models in a file editor.

It is free, open source, and installed with SQLMesh via pip. It runs in a standard web browser, but its functionality is local to your machine - nothing goes over the internet (unless you're using a remote database connection).

The UI provides two primary views of your project: an editor for examining and altering model code, and a documentation view for exploring the DAG and tracing column lineage.

Figure 1: SQLMesh Browser UI Editor and Documentation Panes Figure 1: SQLMesh Browser UI Editor and Documentation Panes

The documentation view provides detailed information about each project model, including a search interface to easily locate models in large projects.

Selecting a model opens an information pane including details about the model, such as its model kind, column data types, and rendered query.

Figure 2 shows selected UI information from a model in the SQLMesh sushi restaurant example project:

Figure 2: SQLMesh UI model documentation information Figure 2: SQLMesh UI model documentation information

Exploring column-level lineage

Selecting a model in the Docs interface opens an interactive display of its location in the DAG:

Figure 3: SQLMesh UI interactive DAG explorer Figure 3: SQLMesh UI interactive DAG explorer

Each rectangle in the DAG represents a model and includes a list of the model’s columns. Selecting a specific column reveals all the contributing columns in ancestor models.

In Figure 4 I selected the revenue column in the rightmost top_waiters model in orange (ancestor columns are annotated with pink boxes for emphasis).

We see that the leftmost item price and quantity ordered columns contribute to the revenue columns as we move rightward:

Figure 4: Column-level lineage in the interactive UI explorer Figure 4: Column-level lineage in the interactive UI explorer

Understanding the column flow in this simple DAG may be easy, but the major efficiency gains from interactive column lineage come with large projects.

Data transformation projects are always a moving target, and more collaborators bring additional opportunities for error due to miscommunication or oversight.

Errors are often as simple as forgetting to update a column name, but digging through an enormous DAG to find that single misnaming can waste hours of time (and liters of tears).

With the SQLMesh column lineage explorer you can cut that down to minutes and milliliters!

Rapid iteration with the SQLMesh editor

Updating or creating new models is an iterative process that requires examining data, writing code, running the code, and examining the new data. Efficient development requires keeping the iteration loop tight.

One challenge during development is accessing or creating data to work with. Production data systems are serving business needs, so ad hoc querying may impact important operations. Additionally, creating new tables in production systems may not be allowed due to security or other reasons.

SQLMesh enables easy creation of development environments that allow each developer to use their own data workspace without re-running models. The SQLMesh UI makes this particularly easy with integrated tools for creating new environments.

Once the right data is in hand, it’s time to examine it via SQL queries. You can run any query you like directly from the SQLMesh UI, with the results displayed in an interactive data preview that allows filtering and sorting.

Figure 6: Execute arbitrary queries and explore the results in the interactive data viewer Figure 6: Execute arbitrary queries and explore the results in the interactive data viewer

Writing and modifying queries is the core of data development. One challenge with the model-based approach used by SQLMesh and other transformation tools is that queries often have “placeholder” values that are only rendered at runtime. For example, the values for SQLMesh macro variables such as @start_ds and Jinja templating are only substituted into the query at runtime.

The SQLMesh editor provides immediate access to the fully rendered query, so you can see exactly what code will be submitted to the database when you run the model:

Figure 7: View rendered model queries in the SQLMesh UI Figure 7: View rendered model queries in the SQLMesh UI

You can also evaluate the results of a single model, with the results displayed in the interactive data preview:

Figure 8: Evaluate a model and explore the results in the interactive data viewer Figure 8: Evaluate a model and explore the results in the interactive data viewer

Finally, SQLMesh’s table diff tool allows you to make a direct comparison of new and old model results. It notes added/removed/changed columns, along with a sample of data rows and the specific differences in their values:

Figure 9: Directly compare table columns and rows with an interactive table diff Figure 9: Directly compare table columns and rows with an interactive table diff

Conclusion

Data development is an iterative process, and efficiency comes from keeping the iteration loop tight.

SQLMesh UI features like column-level lineage, dev environments, rendered queries, model evaluation, and table diffing allow you to rapidly navigate DAGs and modify models.

Check out the SQLMesh UI with our quickstart guide, and join our Slack to learn more about efficient development with SQLMesh.