We hope you are staying healthy. Click here to see how Sigma is ensuring business continuity and helping our customers through the coronavirus pandemic.


SQL and Spreadsheets: Bringing Them Together in the Cloud

Sigma = A Better Analytics Cloud

Nearly everyone who works in an office environment works with a spreadsheet. The vast majority of those users are not programmers.

Sigma makes Cloud Data Warehouse Analytics as accessible and approachable as a spreadsheet. Built for the cloud, Sigma is optimized for the modern cloud data warehouse. In Sigma the data is always live, never an extract or download, while still allowing line by line visibility of the data. Rather than forcing data to be pre-digested into aggregates and visualizations, we expose all of the data and allow users to perform their own analysis without forcing them to learn an extra language. And It’s is much more than a chart building or SQL query generation tool. By providing live, interactive access to your data warehouse through a familiar visual interface, Sigma makes business analysts as effective as any SQL Expert; their queries are correct, efficient, and maintainable, all without the expensive overhead of learning the ins-and-outs of SQL.

The Ubiquitous Spreadsheet

Spreadsheets are interpreted programming environments, similar to Python or R. However, they are missing some key capabilities typically found in general-purpose programming languages: support for complex data structures and types; rich APIs for accessing data on disk or over the network; and a rich ecosystem of open-source, community-supported libraries and extensions.

Also, the most popular spreadsheet products (i.e. Google Docs, Excel) have limits on the volume of data they can efficiently process, whereas programming languages are typically bounded by a combination of code quality, compiler optimizations, and physical hardware.

Best of Both Worlds

Sigma provides the best of both worlds:

  • Users interact with a variant of the Spreadsheet model
  • Sigma generates virtually any SQL query and expresses the results either in the Spreadsheet-like view or as a visualization.

Sigma’s dynamic translation compiler, called Qwill, is designed like a modern database compiler, rather than a query generator for simple reports or charts. This means queries generated are limited by the inputs, rather than the compiler itself. This is important because it is our mission to build a product that can fully leverage the expressive power of SQL, and thus amplify an analyst’s capabilities without concerning them with any details beyond Sigma’s visual interface.

At a basic level, spreadsheets and databases serve the same purpose – both are tools that ease the storage, retrieval, analysis, and sharing of datasets. Diving deeper, spreadsheets are more forgiving, and optimize for flexibility and usability, whereas databases are more rigid, and optimize for performance and normalization.

A Forgiving System

At Sigma, we believe that the flexibility and usability of spreadsheets can be applied to databases. We achieve this with Sigma’s familiar interface and underlying Qwill technology, which is able to navigate the rigidity imposed by databases and give the illusion of a more forgiving system.

This flexibility is achieved at three granularities: Columns and Formulas; Level Hierarchy; and Reference Worksheets. The visual expression of these three things translates the necessary structure of the database into an interface that is familiar.

As for generating consistent and performant SQL with almost no limitations in expressiveness – Qwill is built to be deterministic and tested for correctness. If two users independently build the same worksheet, Sigma will generate the same SQL. When two people write SQL by hand, anything except the most trivial queries will be at least slightly different. This increased the burden of reviewing code, verifying that everyone’s slightly different choices are still correct. With Sigma, queries to generate the same results are always the same, and always verified to be correct. Sigma engineers do all of the testing and alignment for you.

Sigma also considers best practices in order to build queries in the most efficient way. As databases change and optimization techniques evolve, all Sigma queries are updated. Even the most dedicated data team is unlikely to go back and rewrite every SQL query to incorporate new optimization techniques.

Furthermore, Sigma queries are optimized for each individual data warehouse. Sigma’s Qwill technology detects which database you are running the query on, and optimizes the SQL for the specific strengths of that database. Even expert SQL programmers would be hard pressed to optimized for different databases on the fly.

To maintain the flexibility of spreadsheets along with our promised consistency, we  follow the spreadsheet convention of formulas being confined to an individual column, and typically to a single “line” (as opposed to general purpose languages, which are multi-line files). Users can add columns to their Worksheet and utilize formulas familiar from spreadsheets to calculate values for the new column.

When writing a SQL query that performs an aggregation, it is necessary to consider and utilize a GROUP BY clause. Failure to do so results in errors from the database. Similarly, when writing a SQL query that uses a window function, it’s necessary to understand the ins-and-outs of the OVER clause.

In Sigma, neither of these SQL-specific considerations are necessary within the formulas. Our level hierarchies are enough information to determine the appropriate SQL to generate in these cases. So using aggregates like SUM or COUNT, or analytic functions like LEAD or LAG are as simple as calling the function – Sigma does the rest.

There’s a common customer analytics pre-processing step called sessionization – the act of turning event-based data into sessions, the ordered list of a user’s actions in completing a task. In Sigma, this can be expressed in a one-line formula, and can be easily referenced by other formulas.

In a SQL database, a view can be used to expose a virtual table that “expands” to a subquery when referenced in SQL. This allows data engineers to curate a common set of dataset definitions.

In Sigma, Reference Worksheets serve the same purpose, however they do significantly more than simply expand to a subquery. In Sigma you easily create a Reference to a co-worker’s worksheet, and use that as a starting point to answer any new questions that are sparked. Programmers have long used code libraries to make programming faster and more collaborative. Sigma allows for a similar library of analysis, where everyone can build on what has come before.

Conclusion: A Single Tool for Business Experts and the Data Team

Using a familiar spreadsheet interface, Sigma makes Cloud Data Warehouse Analytics accessible and approachable to business experts that don’t have and don’t want SQL skills. Further, it’s a tool that makes even SQL experts more effective and efficient. Sigma is optimized for today’s businesses, built from the ground up for use on modern cloud data warehouse and optimized for the needs of a data oriented business culture. With Sigma, the data is always live and secure, without giving up accessibility and line by line visibility. We allow all users to perform their own analysis, rather than forcing business experts to make do with pre-packaged, pre-aggregated summaries. With Sigma, Business Experts are as effective and efficient as any SQL expert.

Sigma also simplifies governance and collaboration for both administrators and end-users. With reference worksheets, administrators can curate the canonical sources of data that power their businesses, while controlling data access through team membership and organizational hierarchies. Further, team members can build on each other’s analysis, creating a reference library of company analysis.

Sigma is an analytics solution that doesn’t get in your way. Instead it is an invaluable tool that you and your colleagues can employ as you go about mining, analyzing and sharing the treasure troves of data now being generated in your field.