Closing the Gap Between the Data Teams and Business Users

The Great Divide

In today’s data-centric enterprises there are three primary groups of users working with data and analytics:

  • Data Teams: Own and maintain the enterprise’s data warehouse and make that data accessible to select users in the company. Conversant with SQL and a broad array of programming languages.
  • Analysts: Analyze and interpret canonical stores of data to answer recurring and ad hoc questions. Some members of this group have SQL or programming skills. Conversant with BI tools, spreadsheets, and pivot tables.
  • Business Experts: Use the exports, reports, and dashboards that analysts and data teams generate. Typically originate most of the business questions and use data to guide decisions. Often need additional reports or data, and make frequent ad hoc requests to analysts and data teams.

Over the years, the gap between data engineers, analysts, and business teams has continued to widen.

Traditionally data teams curate tables within a data warehouse, which are queried via report/chart building tools or downloaded via sanctioned access points. This curated model has its advantages when it comes to standardization, but it isn’t free of downsides. When an analyst conceives of a valuable change, the process for incorporating it into the model (even as an experiment) may involve weeks of waiting on data engineers. More often than not, analysts manually and repetitively apply these changes in isolation. This means improvements are confined to laptops and emails, rather than readily available to the entire organization.

And even if analysts push for changes to be incorporated back into the standardized model, their request is likely one of many; this influx of requests can cause data engineers to become a major bottleneck. When this happens, it pulls data engineers away from their own responsibilities and may even delay analyst requests to the point that they become irrelevant and out of date.

More Challenges for Data Engineers

Here are a few more challenges facing today’s data engineers:

  • Data disappears. An analyst downloads the data onto a PC, and the data engineer promptly loses sight of it. The engineer has spent long stretches of time curating that data, ensuring it is up to date and accurately reported. Once it is downloaded, the engineer has no idea who downloaded it, when it happened, or how the data is being used.
  • Constant context switching. Data engineers set aside in-flight engineering work to field requests from downstream data consumers, then try to pick up where they left off. Although supporting users is a legitimate service, it disrupts the engineers’ focus on the engineering tasks they are uniquely qualified to perform.
  • With the advent of “Big Data,” these tasks are becoming even more demanding: working with billions of rows and terabyte-scale computing, with semi-structured formats like JSON (JavaScript Object Notation), and with the sprawling workflows of rapidly evolving disciplines such as machine learning.
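The JSON wrangling mentioned above is representative of the work that otherwise lands on data engineers. As a rough illustration (a toy sketch, not any product’s implementation), turning nested JSON records into flat tabular columns can look like this:

```python
import json

def flatten(record, prefix=""):
    """Recursively flatten a nested JSON object into dot-separated columns."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# Hypothetical event record
event = json.loads('{"user": {"id": 42, "plan": "pro"}, "event": "login"}')
print(flatten(event))
# {'user.id': 42, 'user.plan': 'pro', 'event': 'login'}
```

Multiply this by thousands of schema variations and billions of rows, and the engineering burden of semi-structured data becomes clear.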

Challenges Facing Business Analysts

  • Business analysts find themselves constrained by engineering ticket throughput, not “curiosity throughput.” Imagine this scenario: for every web page you want to examine, you have to file a ticket, have someone else navigate to the page, and wait for an email with a screenshot. Because this is cumbersome and time-consuming, you would only ask for material you were absolutely certain was worth reading. Even with this filter, your requests pile up and can take weeks or months to fulfill. Eventually you give up and stop asking for most of the information that is available. This scenario describes a company culture that kills curiosity and dramatically devalues your data.
  • Without IT intervention, analysts need to learn to code; in other words, they need technical skills beyond Excel to work with canonical datasets or to generate reports and analyses in a timely manner.
  • Extracted data is no longer live or secure, and the tools that process these extracts, like Excel and Tableau, cannot work at the scale of the raw data. Thus, pre-filtering and aggregation are required. Chasing down discrepancies between these extracts and the live, connected, canonical datasets is the bane of many data engineers’ existence.
  • The gap between business analysts and technologists becomes even more critical as growing volumes of actionable data flow in and out of the data warehouse. The volume, velocity, and variety of data flowing into a warehouse have no correlation to the size or age of a company. As a result, datasets too large to analyze with existing spreadsheet or report-building tools are the new normal in many companies.
  • Most legacy products simply aren’t built for the world of cloud data warehouses while newer products tend to serve a specific group of users while neglecting others. This results in a split-brain situation: separate groups of users are speaking different languages, despite talking about the same data and questions.

Closing the Gap

The Sigma solution closes the gap between analysts and technologists. Everyone in the enterprise uses the same interface, talks about the same data, and works directly against the cloud data warehouse. All SQL and post-processing execute in the enterprise’s cloud data warehouse, which simplifies data governance since tables never need to be moved or stored at rest in any other system.

Sigma builds on decades of best practices regarding data warehouse modeling and access patterns, making many of the same choices an experienced data engineer would make. And unlike any other product, Sigma can express the vast majority of ANSI SQL visually – and the coverage increases in every new release. Using the solution’s familiar visual interface, both analysts and technologists are able to produce any SQL query desired, a capability other products are unable to match.

One of the fundamental elements of the Sigma solution is the Sigma Worksheet – a basic, reusable unit of analysis, collaboration, and access control. The worksheets function like virtual spreadsheets that live and run in the cloud. Organization members create worksheets to analyze, curate, and share. Admins can endorse worksheets that canonically define the data. Anyone can leverage endorsed worksheets and co-workers’ worksheets as a starting point for new reports and analyses.

Sigma ensures that anyone can use worksheets (instead of SQL) to wrangle JSON, join tables and sheets, and materialize optimized cubes and summary tables. Business analysts can use simple formulas to compute part-to-whole ratios, apply window functions, fill down data, and build month-over-month comparisons, without needing to learn a thing about SQL. Because this process is expressed concretely in tables, charts, and spreadsheet formulas rather than in abstract code or backend GUIs, the full lineage of transformation is on display for anyone with a question.
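A month-over-month comparison is a good example of what a window function (SQL’s LAG) expresses. Leaving any particular tool’s formulas aside, the underlying computation can be sketched in a few lines; the revenue figures here are hypothetical:

```python
# Hypothetical monthly revenue figures
revenue = {"2024-01": 100.0, "2024-02": 120.0, "2024-03": 90.0}

def month_over_month(series):
    """Pair each month with its % change vs. the prior month (LAG-style)."""
    out = {}
    prev = None
    for month in sorted(series):
        if prev is None:
            out[month] = None  # no prior month to compare against
        else:
            out[month] = round((series[month] - series[prev]) / series[prev] * 100, 1)
        prev = month
    return out

print(month_over_month(revenue))
# {'2024-01': None, '2024-02': 20.0, '2024-03': -25.0}
```

In SQL this would require an OVER clause and careful ordering; in a spreadsheet-style formula it reduces to “this month minus last month, divided by last month.”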

Even for a veteran SQL programmer, Sigma Worksheets solve some of the language’s greatest shortcomings: the lack of abstraction and composability, the brittle nature of query debugging, and the high bar to collaboration inherent in any programming environment.

Benefits of the Sigma Approach

  • Sigma bridges the gap between the analysts and technologists by removing the language barrier and lowering the technical barrier. It also ensures that enterprises are fully using their cloud data warehouses, rather than restricting access to a privileged few.
  • Most other BI tools generate simple aggregation queries and require special functionality to express complex transformations. In a Sigma Worksheet, users work with simple formulas, visual filters, and joins to interactively explore and build their analyses.

Sigma does the heavy lifting of transforming a web of worksheets into the subqueries, aggregations, joins, and windows necessary to generate efficient queries and correct results every time.
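To make the idea of compiling layered worksheets into nested subqueries concrete, here is a deliberately naive sketch. The `compile_worksheets` function, the `orders` table, and the layer schema are all invented for illustration; a real compiler (such as the one described here) must also handle joins, windows, and query optimization:

```python
def compile_worksheets(base_table, layers):
    """Toy sketch: compile a chain of worksheet-style transformation
    layers into a single SQL statement built from nested subqueries."""
    sql = f"SELECT * FROM {base_table}"
    for i, layer in enumerate(layers):
        # Each layer wraps the previous result as a named subquery.
        sql = f"SELECT {layer['select']} FROM ({sql}) AS w{i}"
        if "where" in layer:
            sql += f" WHERE {layer['where']}"
        if "group_by" in layer:
            sql += f" GROUP BY {layer['group_by']}"
    return sql

query = compile_worksheets("orders", [
    {"select": "region, amount", "where": "amount > 0"},
    {"select": "region, SUM(amount) AS total", "group_by": "region"},
])
print(query)
# SELECT region, SUM(amount) AS total FROM (SELECT region, amount
#   FROM (SELECT * FROM orders) AS w0 WHERE amount > 0) AS w1 GROUP BY region
```

Even this toy version shows why pushing the compilation down to the database matters: each worksheet stays a simple, readable layer, while the warehouse receives one efficient composed query.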

Because Sigma generates and optimizes queries on the fly, the readability and performance of the generated SQL can be improved without any customer intervention. Sigma’s deployment model means that these improvements take effect as soon as product updates are deployed.

  • Sigma is fully aware of constraints declared in the data warehouse, such as primary keys and foreign keys. Sigma also automatically derives similar constraints from the workbooks themselves. All of this information is fed into the Translation Engine, which uses it to make intelligent recommendations, or to optimize query generation.
  • Sigma gives users the controls to access and govern their database using various levels of permissions.

Optimizing the Data Warehouse

Sigma is not trying to impose a new language or an alternative to SQL.  The solution brings together best practices that allow technologists and analysts to work together in a collaborative fashion. Anyone in your enterprise can make full use of the data warehouse via Sigma Worksheets and the Qwil compiler that powers them.

Sigma not only removes bottlenecks by empowering business analysts to pursue their own answers, it frees up the time of the highly experienced (and expensive) technical people on your team so they can direct their expert skills at challenges only they can address. At the same time, it extends the value of your curated data to all users across the enterprise.

Simply stated, Sigma doesn’t get in your way. Instead it is an invaluable tool that you and your colleagues can employ as you go about mining, analyzing, and sharing the treasure troves of data now being generated in your field.

Ready to Speak Sigma?