Pipe’s Journey to Data Contracts using SQLMesh
Tim Chan, a seasoned data engineer, stumbled upon SQLMesh through a LinkedIn post shared by a respected data scientist friend. He was between jobs and, intrigued by the project, decided to explore SQLMesh to understand the tool and its potential. Fate intervened when he noticed a job posting from Pipe in the Tobiko community Slack, leading him to his current role.
- Warehouse: BigQuery
- Number of Models: 250-500
- Previous Transform Tool: dbt
- North America
- pipe.com
Pipe is a fintech company that provides embedded capital solutions to vertical SaaS companies. Pipe partners with companies in specific industries (e.g. salons and coffee shops) integrating its capital platform directly into the SaaS tools the businesses already use. Rather than relying on credit scores, Pipe uses machine learning to evaluate risk based on a company’s historical transaction and revenue data. As a new data engineer at Pipe, Tim was hired to solve challenges around data contracts and data pipeline stability.
The issues with dbt in Pipe's pipelines
When Tim joined Pipe, he was immediately assigned the task of fixing some pain points that the team was experiencing when using dbt in their data pipelines:
- Solving data contracts
- Streamlining their data pipelines so that changes are easier to remember, had fewer steps, and had a more intuitive process
- Understanding the impact of changes to their data pipeline and how that might affect their data and tables downstream at the column level
-
Reducing the waiting time between making a change and finding out if
something in their pipeline broke as a result of that change
- In order for an engineer to make a schema change, the old CI system would copy test data over to a PR test environment and do a brute force run: a full dbt run. The engineer would then wait forever for all the dbt models to rebuild, only to find out, in the end, that something broke.
Pipe had been considering alternatives to dbt, and Tim’s manager asked him to look into other solutions. Although his manager had heard of SQLMesh, Tim diligently explored the ecosystem of available tools. He spent some time investigating Grai.io but ultimately concluded that the additional infrastructure requirements and effort needed to get column-level lineage, compared to SQLMesh’s more straightforward implementation, made SQLMesh a clear choice to address the company’s specific issues.
Using SQLMesh to fix the issues with dbt
Streamlining pipelines
When Tim was part of a team at one of his previous companies that was using dbt’s Slim CI, he observed that dbt’s efforts to make CI checks more efficient were not always successful or accurate. Depending on the frequency of main branch runs, models built for a PR might include those added to the main branch but not yet picked up by a full run. This led to numerous questions from users, necessitating detailed explanations of how Slim CI operated. The experience was far from seamless.
Incumbent tools at organizations certainly have inertia as there is always some resistance to introduce a new tool or process. While Tim understands the switching cost of introducing a new tool, he felt the trade-offs of the learning curve would be quickly outweighed by the benefits of reducing pain and suffering in his team’s workflow. This was especially true given how much time his team was spending in dbt:
“It’s uncomfortable for people to switch tools. While I totally get that the team had been using dbt for two years, we plan to be building data pipelines for much longer into the future and the discomfort of trying something new is temporary and short relative to the years of productivity we’ll get if we make the switch.”
He prefers using tools that spark joy to keep himself and the team
motivated and moving quickly. In fact, he always asks,
Does this tool spark joy? If it doesn't, then don't use it! The status
quo should never include pain and suffering.
Tim believes that complacency with a tool is not the best way to manage a team; tech leads should be open and willing to challenge their daily workflows. Although there is an aversion to discomfort, a switch can easily produce long term benefits that outweigh that short term discomfort resulting in massive gains in productivity. Furthermore, when he saw what was previously built at Pipe using dbt, he noticed that it was highly inefficient because of the brute force approach of building the entire warehouse with every proposed change. This meant that whenever someone wanted to test their changes, they had to wait for a complete warehouse refresh, significantly slowing down the process.
When using SQLMesh, Tim found the joy that he was looking for. SQLMesh had several key open-source features that significantly helped him streamline the company’s data pipelines: column-level lineage, the free UI, change summaries, and the built-in data contracts.
Impact of changes
Using SQLMesh, his team could now also understand the impact of their changes within the SQLMesh UI and they could see all of the related downstream data assets. In fact, he recalls onboarding two finance team members and walking them through the UI:
“I could discern the reactions on their faces, and that is precisely the validation I strive for when developing platforms. As a platform engineer, one is perpetually in pursuit of user approval. It's akin to marketing a product; when you observe a customer's facial expression transform, it unequivocally signifies that you have chosen an effective tool.”
SQLMesh was like a breath of fresh air for Tim’s team, because he says,
“Using dbt relies on running your SQL on a real database before it tells you something is wrong. SQLMesh can understand what you wrote and tell you that something is broken before you even run it on a database.”
Cutting the wait time
For many engineers, the time between making a change and seeing if it works in their code can be frustrating. This was no different for Tim and his team at Pipe.
The long lag in feedback creates a mental tax that engineers have to pay. With long wait times, many engineers get distracted by one of the many other tasks they have on their plate, dividing their time and energy away from the task at hand. Depending on how long it takes for their runs to complete, they could forget the context in what they were trying to fix, or even that they started a run altogether. This limits productivity, increases the time necessary to get to a result, and reduces the output achieved within a given sprint.
In addition to the slow feedback loop due to full warehouse rebuilds, Tim also notes that dbt’s reliance on Jinja further hinder’s his team’s productivity.
“dbt relies on Jinja, which is just a template engine. Ansible also uses Jinja. The downside of Jinja is that it doesn't understand the content it is templating, which can be problematic. This lack of understanding means Jinja can handle a SQL query, a configuration file, or even a Chinese poem without recognizing the context, potentially leading to issues if the content requires specific handling or validation.”
With dbt and its use of Jinja, engineers might write invalid SQL that
fails to execute. Without a way to test and preview the impact of their
changes beforehand, this can lead to wasted effort. That really irked Tim:
…what I find super important about platforms is the feedback
loop between writing some code and seeing what your code does.
Tim's philosophy is to iterate often and test as much as possible. In fact, he wrote a script that facilitates this approach, allowing the user to push a single button to run their changes: the Pipe team calls it “just go.” It installs/updates Python dependencies for the project, runs `sqlmesh create_external_models`, and then runs `sqlmesh plan` to apply changes to the user’s virtual warehouse. He encourages his data community to frequently run 'just go' because SQLMesh is so efficient that it allows immediate detection of any issues or downstream effects following minor changes.
Tim comments:
It’s harder if you write a whole bunch of stuff and then try to debug
at the very end versus if you just keep checking all along the way. And
having a simple-to-use command where engineers don't have to memorize
anything really accelerates productivity.
With the new command they developed, any data person can easily view changes made during their development process. This aligns with the teams’ goal of streamlining their pipelines. Now, anyone can make changes by running this simple “just go” command and understand how their changes will affect Pipe’s overall data infrastructure.
Data contracts
All these issues contributed to the efforts to solve data contracts at Pipe. SQLMesh became the clear tool that the Pipe team wanted to use moving forward to future-proof their pipelines. It was such a clear advantage that there was no pushback from the team or stakeholders on making the migration. Because of the benefits that Tim's team saw in their POC, he recalls that the team felt aligned on the change and the understanding of the value proposition of making the switch.
“…that’s usually the hard part: convincing stakeholders or users of your platform that we're going to change things up drastically… with SQLMesh, there was no friction.”
When Tim demoed the SQLMesh UI to his other engineers, they were thoroughly impressed with the column-level lineage. The fact that his team could trace a change all the way to the source was mind-blowing. This was the pain point that the team was really trying to solve within data contracts.
While SQLMesh has a dbt integration, Tim’s team did not consider using it and instead took it as a good opportunity to revamp their pipelines.
“It was a literal option to just copy paste our dbt project into SQLMesh. We could've thought of it as ‘let's just try to leverage the dbt integration’ but we also thought, ‘why do we want to see this Jinja garbage?’ When you break up with someone, you don't want to be reminded of the painful experience by seeing their leftover belongings.”
A bonus in Tobiko's team
What took SQLMesh to the next level for Pipe’s data team was actually the
human aspect. Tim and his team were shocked at how quickly the Tobiko team
replied to any issues that came up. He also noted that the diversity of
people replying from the Tobiko team was really great
; that level
of commitment and attention to detail was not something they would get
from a larger company.
Tim feels that
it feels like I'm part of the team when working with Tobiko on SQLMesh.
We all share the same goal of achieving success with SQLMesh. If SQLMesh
were to disappear, I would be devastated to return to using dbt. I will
always advocate for SQLMesh. If I were to leave Pipe, I would recommend
SQLMesh to my next company, and if they weren't interested in
entertaining the idea, I'm not sure I’d want to work there.
Conclusions
Tim has experienced the inefficiencies of dbt firsthand, which required dedicated infrastructure and came with convoluted processes. As he dived deeper into SQLMesh, he recognized its elegance and superiority over dbt. Tim prioritized joy and efficiency over complacency and recognized that the long term benefits of SQLMesh far outweighed the short term cost of switching. He optimized his and his team's time by adopting tools that alleviate pain and ensure Pipe's workflows are future-proof.. Pipe’s data team now uses SQLMesh, which sparks joy, streamlines processes, and empowers diverse stakeholders.