Starting from version 20.3.0, SQLGlot users can now install the optional sqlglotrs dependency to significantly speed up the parsing of their SQL.
When installing SQLGlot from scratch, you can add this dependency as part of the installation command:
pip install "sqlglot[rs]"
That’s right, parts of SQLGlot are now implemented in Rust!
Specifically, the tokenization step has been completely migrated to Rust, resulting in a 30-40% improvement in overall parsing speed (depending on the input query). Long queries or inputs with a lot of formatting will benefit even more from the upgrade.
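To confirm that the optional dependency is present in a given environment, one simple option is to check whether the `sqlglotrs` module can be found. This is a minimal sketch: the helper name `has_rust_tokenizer` is hypothetical, and it only reports whether the module is installed, on the assumption that SQLGlot picks up the Rust tokenizer on its own once it is.

```python
import importlib.util


def has_rust_tokenizer() -> bool:
    """Return True if the optional sqlglotrs module is installed.

    Hypothetical helper: it only checks that the module can be located,
    not that SQLGlot is actually using it.
    """
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec("sqlglotrs") is not None


if __name__ == "__main__":
    print("sqlglotrs available:", has_rust_tokenizer())
```

If this prints `False` after installation, the extra was most likely installed into a different environment than the one running the script.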
Below is a comparison of vanilla SQLGlot vs. SQLGlot with the Rust tokenizer on the existing benchmarks. The value in parentheses is the time relative to vanilla SQLGlot, so lower is faster:
Query | sqlglot | sqlglotrs |
---|---|---|
tpch | 0.00944 (1.0) | 0.00590 (0.625) |
short | 0.00065 (1.0) | 0.00044 (0.687) |
long | 0.00889 (1.0) | 0.00572 (0.643) |
crazy | 0.02918 (1.0) | 0.01991 (0.682) |
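The relative figures in parentheses can be reproduced by dividing each sqlglotrs timing by the vanilla timing. The sketch below shows that arithmetic for the tpch row, plus one way to collect a comparable timing locally. The helpers `best_time` and `relative` are hypothetical, and the `timeit` approach and sample query are assumptions, not the benchmark script used for the table.

```python
import timeit
from typing import Callable


def best_time(fn: Callable[[], object], number: int = 100, repeat: int = 5) -> float:
    """Best average time per call, in seconds, over `repeat` runs of `number` calls."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


def relative(candidate: float, baseline: float) -> float:
    """Candidate timing expressed as a fraction of the baseline timing."""
    return candidate / baseline


# Relative figure for the tpch row, using the timings from the table.
tpch_ratio = relative(0.00590, 0.00944)
print(f"tpch: {tpch_ratio:.3f}")

# Optional local measurement; skipped when sqlglot is not installed.
try:
    import sqlglot
except ImportError:
    sqlglot = None

if sqlglot is not None:
    # Sample query chosen for illustration only.
    elapsed = best_time(lambda: sqlglot.parse_one("SELECT a, b FROM t WHERE a > 1"))
    print(f"parse_one: {elapsed:.6f} s per call")
```

Running the measurement once in an environment without sqlglotrs and once with it gives the two timings to pass to `relative`.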
Impact on SQLMesh
The new dependency has already been included in SQLMesh, and users have been enjoying significantly faster project load times starting from version 0.63.0.
Future Work
We're only scratching the surface of the performance improvements we can achieve with Rust in SQLGlot, and users can expect more components to be migrated over time.
It remains to be seen how much of the existing Python implementation is feasible to migrate. We’ll be sharing our thoughts with the community on this subject in future posts.
Looking for Feedback
Despite being successfully used by SQLMesh users, the improvement is still relatively new and should be adopted more widely before we make it a non-optional dependency for SQLGlot.
If you run into any issues or have any feedback, we’d love to hear from you in our Slack community!