Tobiko

Introducing SQLMesh’s Open Source CI/CD Bot

In the ever-evolving landscape of data management, Continuous Integration/Continuous Deployment (CI/CD) is a critical requirement for establishing reliable and trustworthy data systems.

Traditional manual deployments, susceptible to errors and inconsistencies, can have a ripple effect on downstream data users and products. Sometimes these issues can go undetected for some time, resulting in a loss of confidence and trust in the important data products and teams that many users rely on. CI/CD’s importance made our mission clear: offer an open-source CI/CD bot for SQLMesh, simplifying the automation process and unlocking its full potential for teams.

Join us in this blog post as we unveil the objectives behind this initial release and take a deep dive into how our CI/CD bot elevates verification, transparency, and reliability, empowering data teams to confidently deploy changes and optimize their overall productivity.

Verification

At the heart of SQLMesh lies a powerful capability: the dynamic transpilation of queries across dialects at runtime.

This provides teams with the ability to unit test your pipelines by quickly executing models (pipelines) in CI with DuckDB by providing specific inputs and comparing those results to the expected outputs. By integrating our CI/CD bot seamlessly into the workflow, this process can run with each commit, ensuring consistency across all code modifications.

Figure 1: Automatic Unit Tests Figure 1: Unit Tests were automatically run on commit, and tests that failed were highlighted with the error message displayed

Moreover, SQLMesh generates plans, akin to Terraform’s plans, for each potential change. The CI/CD bot can be set up to automatically publish a plan summarizing the changes with every commit, ensuring that changes align perfectly with the intent before being deployed to production.

Figure 2: Prod plan preview Figure 2: Prod plan with model changes, their change type, and missing dates all highlighted

Transparency

When it comes to reviewing pull requests (PRs), focusing solely on code changes offers only a glimpse of the overall impact. To overcome this limitation, the CI/CD bot creates a self-contained environment that replicates the precise code alterations featured in the PR.

If a developer has already applied these changes in their own development environment, SQLMesh's Virtual Data Environments seamlessly reuse the tables instead of building new ones, ensuring that tables are built only once.

Moreover, the PR itself presents a comprehensive overview of each altered model, complemented by the loaded date/time intervals for enhanced context and clarity.

Figure 3: PR environment Figure 3: An environment was created specifically for this PR and what is included in the environment with their loaded dates is highlighted

Soon, the CI/CD bot will also offer a data diff between the changes and production, allowing for an even deeper understanding of the data change from within the PR.

Reliability

SQLMesh aims to relieve developers and reviewers of the burden of accurately deploying changes to production.

To achieve this, we've implemented a range of smart features, including the automatic promotion of development-created tables to production, courtesy of SQLMesh's innovative Virtual Updates functionality. This seamless promotion process ensures that tables built in development are instantly deployed to production, all while maintaining data consistency through SQLMesh's reliable fingerprinting process and dependency graph.

But that's not all – SQLMesh goes the extra mile to guarantee that any tables promoted to production match the existing production data precisely. If there are any gaps, the CI/CD bot will automate backfills, leaving no room for discrepancies. With these robust safety measures in place, developers and reviewers can breathe easy, knowing that their changes will be accurately represented in the production environment.

Figure 4: PR comment Figure 4: A comment on the PR tells you where to find the data matching the PR, and the bot updates the comment with details of a plan when it applies the plan to production

Conclusion

The SQLMesh open source CI/CD bot empowers data teams to more easily meet the growing use cases and demands of their business while increasing the reliability and quality of their pipelines.

Check out the CI/CD bot documentation to get started!

If you want to get a taste for what it's like to use SQLMesh then check out our guide for using the integrated UI, or if you want to tryout SQLMesh on your existing dbt or new project then checkout the quick start guide.

If you have any questions or feedback please join our Slack community and share your thoughts!