ENGINEERING

Deep Dive on Parameters

Max Seiden

SQL Compiler Lead, Sigma

Variables in a SQL query are a powerful tool, but spotty data warehouse support and poor integration with analytics products makes them inaccessible. This is why the team at Sigma designed and built Parameters. It enables our users to leverage variables in their models, reports, and dashboards without needing to engage with SQL or work around shortcomings in their data warehouse. And since their release, it has been exciting to see all the cool reports and dashboards folks have created with this powerful feature.

In the sections that follow, we’ll dive deep on Sigma’s Parameters. I’ll touch on the design thinking behind this feature, its inner mechanics, and some of the ways folks are using them today.

Using variables in SQL

Although the SQL specification details a few notions of variables and parameters, the related functionality in open source and commercial databases is far from standardized. This is also the case for the databases Sigma supports. In this post, we’ll discuss SQL variables and parameters in terms of Snowflake’s implementation, as we’ve found that it has a fairly complete and robust model. This makes for a positive user experience, and lends clarity to the examples below!

In Snowflake, you can write a query that contains $-prefixed variable names and rely on account, user, or session level configurations to resolve the variable’s value when the query runs. Here’s an example query from their docs.

SELECT AVG(SALARY) FROM EMP WHERE AGE BETWEEN $MIN AND $MAX

Snowflake also supports user-defined functions (UDFs) and user-defined table functions (UDTFs), both of which can specify arguments. These arm folks with another way to employ variables in their SQL queries. Here’s an example UDTF from their docs.

CREATE OR REPLACE FUNCTION get_countries_for_user(id number)
  	RETURNS TABLE (country_code char, country_name varchar) AS  
          'select distinct c.country_code, c.country_name
           from user_addresses a, countries c
           where a.user_id = id /* id is the function’s argument */
           and c.country_code = a.country_code'

Designing Sigma’s Parameters

Sigma’s Parameters came from a combination of customer requests and a desire to add the power of variables and UDTFs to our query model. After a number of design iterations and name changes (everyone is glad Portals didn’t stick…) we settled on a model that involved adding Parameters to our Worksheet in such a way that they were interchangeable with our existing Columns.

Our decision to make Parameters and Columns interchangeable stemmed from our goal of minimizing the number of special rules and behaviors in our Worksheet interface. Success meant that folks could add Parameters to any existing Worksheet and include them in formulas, without needing to relearn the Sigma skills they’d already built up. With the high-level semantics sketched out, we turned our focus to the Parameters themselves.

A Parameter is defined by a data type, a name that other formulas can reference, and constraints on the data values a user can specify. The constraint serves two purposes: the first allows its author to define acceptable values, such as a list of product names or a date range; the second gives Sigma’s UI the context to render a user-friendly control for the Parameter. In this way, folks can create Worksheets that incorporate Parameters into their filters and calculations, and consumers of those reports are able to interface with interactive UI controls, rather than session variables or free form text inputs.

We also designed Parameters to extend the semantics of our Dataset reference model. Since a Dataset is effectively a VIEW in a SQL database, the addition of Parameters makes these documents like UDTFs. The Parameters are like the arguments to a table function, and therefore influence the calculations in the containing document.

Exploring app traces with Parameters

Now that we’ve covered the how and why of Sigma’s Parameters, I’d like to walk through one of my favorite internal uses of this feature: debugging production issues by analyzing application trace spans. Since the Dashboard that enables this analysis was built, it has become an essential part of our diagnostics toolbox, enabling folks across the company to understand the details of actions taken in a particular request, with nothing more than a trace identifier. While it isn’t the most complex or innovative use of Parameters, it has certainly been one of the most impactful for the team here at Sigma.

The primary visualization for our request tracing Dashboard is a data table with a modified bar-chart view of the spans within a request trace. If you aren’t familiar with request tracing, these folks provide a great overview of the type of system we use here at Sigma. The only Parameter used for this Dashboard is the identifier of the request we’d like to debug; all of the other data and visualizations in the dashboard base their calculations on the spans contained in this request trace.

This Dashboard’s Parameter functionally behaves like a filter, selecting the rows that feed into these calculations. However, this parameterized filtering does more than just limit the rows in the data table. It’s also necessary in calculating the bar-chart display used to indicate the points in the request timeline at which a specific span was active. These calculations compute the beginning and end of the request trace, map each span to a relative timeline (ie milliseconds since beginning of trace), and finally generate the ASCII representation for the modified bar-chart in the Dashboard’s data table.

The Worksheet that contains the calculations described above is, in a sense, really just a UDTF. It accepts a single argument (the request id) and returns the rows that are used as the input to the data table visualization on the Dashboard. This analogy also underscores why a “dashboard filter” that you’d commonly find in most analytics tools isn’t quite powerful enough to enable this specific use-case; this filter must influence the calculations in the underlying query, not just which rows are displayed in the data table.

In summary

While the example above has been incredibly helpful to our internal team, our customers have many interesting use-cases as well. These include partition-eliminating filters, aggregate granularity selectors, comparative analyses controls, and many more. And what’s especially exciting is the generality of this functionality – given the functional similarities between Parameters and UDFs / UDTFs, there are many creative ways in which they can be used.

I hope you’ve found this blog post educational with respect to both SQL Parameters and Sigma’s product functionality. I’d like to give a shout out to everyone on the team who has worked to design, develop, and support this feature. And if you find these sorts of product features and technical challenges exciting, you can check out our open positions here.

ABOUT 

Max Seiden

Max is a software engineer focused on the full life-cycle of queries including SQL generation, workload management, and performance.

HOBBIES

Cooking, Music Composition and Performance