The Unique Language Sigma Uses To Supercharge SQL Testing At Scale
Table of Contents

Sigma’s product revolves around bridging our spreadsheet interface with our customers’ SQL warehouses. Part of that process involves generating the SQL queries needed to fetch data for tables, charts, and other UI elements. We call the service responsible for generating those queries the compiler. Getting it right is critically important to our product.
Why? Because a bad query can produce incorrect results, and even when the data is accurate, poor query structure can slow performance significantly. That’s why we put so much emphasis on testing the compiler. And since we support many different SQL warehouses, each with their own variation on syntax, that means that end-to-end tests that actually talk to the warehouse are critical.
The core problem with UI tests
Sigma’s main interface for data analysis is a workbook in the frontend. One option might be to test the compiler by scripting the web UI, and we do have some tests that do exactly that. However, UI tests like these can be slow, rely on specific parts of the UI, and require domain knowledge to write and run properly. We don’t want compiler developers to have to learn the intricacies of React or frontend testing frameworks just to confirm that a Sigma function correctly translates into SQL!
The compiler itself is written in Rust, and although compiler engineers typically have to work with other services written in TypeScript, directly working on UI code is relatively rare.
The compiler itself is written in Rust, and although compiler engineers typically have to work with other services written in TypeScript, directly working on UI code is relatively rare.
So, we built a third approach.
Our own language for testing workbooks
We created an internal domain-specific language that lets us define workbook behavior and test the compiler directly, without touching the UI. We call it wb, since each file written in this language represents a single workbook.
The cool thing about it is that it abstracts away all the complexity of manually constructing a workbook’s JSON structure. It’s now our primary source of compiler integration tests, with hundreds of .wb files running thousands of individual cases.
We created an internal domain-specific language that lets us define workbook behavior and test the compiler directly, without touching the UI.
Here’s an example of what one of those files looks like:
warehouse-table Earthquakes {
path = {
postgres = "public.earthquakes"
databricks = "hive_metastore.public.earthquakes"
snowflake = "examples.earthquakes.earthquakes"
// other dialects would go here
}
}
#[test(mode(csv, sql), dialect(postgres, databricks, snowflake)]
table Recent < Earthquakes {
collapsed grouping {
key [Year]
column [Earthquake Count] = Count()
}
column [Year] = Year([Earthquakes/date])
column [Is Recent] = [Earthquakes/date] > Now()
sort [Year] asc
filter [Is Recent] include true
@session {
now "1984-04-01T13:00:13Z"
}
}
And here’s what the corresponding table would look like in the UI:

What this test is actually doing
This particular .wb file might look simple, but it’s doing a lot. First, the current time isn’t pulled from the warehouse using a NOW() function. Instead, it’s sent in as part of the request itself, meaning we can override it in tests to make sure the results are repeatable. The compiler never actually sees the current time directly, which is by design.
Another interesting element appears within the “Recent” table, where each column is defined using the full Sigma formula language, exactly the way a user would in the product. That’s part of what makes these tests powerful. They feel familiar to anyone working in Sigma and let us validate the translation from formula to SQL in a straightforward, readable way.
It’s a simple mechanism, but it reflects a very real-world challenge: writing tests that can validate behavior across warehouse dialects.
The #[test] line above the definition of the “Recent” table indicates that we should save both the generated SQL and the CSV output to disk, and we should run this test against both the Databricks and Postgres warehouses. Though obviously, our real tests run against more than just these two warehouses! And because different data warehouses structure their schemas differently, the “Earthquakes” table includes a separate path for each system. It’s a simple mechanism, but it reflects a very real-world challenge: writing tests that can validate behavior across warehouse dialects.
And finally, although the point of this test is to check how we handle Now(), we still include a sort step on Year. That’s because output needs to be deterministic for the test to pass reliably. Every .wb test that outputs a CSV is required to either sort the data or explicitly declare that sorting isn’t needed, such as if there’s only one row, like in an aggregate.
How it runs & other tests we use
In our continuous integration setup, we spin up a miniature version of Sigma’s production infrastructure, convert each .wb file into a real workbook, and run the test through the same backend paths we use to generate SQL for customers. That means that in addition to testing the compiler service, we’re also exercising other services involved in SQL generation. We’ve even caught bugs in adjacent systems this way.
Since we bypass the Sigma UI entirely, we can run each test faster and with less compute than it would take to run the equivalent tests in a headless browser: running at full parallelism, we can run 2-3 tests per second, compared to UI tests that might take 2-3 tests per minute on a comparable machine.
Our collection of SQL snapshots also allow us to get a broad overview of how a change to the compiler will affect our generated SQL, since our tests cover a broad variety of the features Sigma supports. Even if a PR doesn’t affect the result of evaluating a table, if the generated SQL blows up in size, that’s a good sign that it’ll cause problems and performance issues for our customers.
Our collection of SQL snapshots allow us to get a broad overview of how a change to the compiler will affect our generated SQL.
The compiler codebase includes other forms of testing as well, such as unit tests for specific functions and integration tests that run one or more phases of compilation and verify that our intermediate representations match what we expect. But .wb tests are a valuable tool in our testing arsenal: they’re fast to write, easy to reason about, and give me a lot of confidence when I’m making changes.