
Data Observability for BI with Sigma and Metaplane

Nazim Foufa

Marketing Content Specialist at Sigma Computing

It’s important for organizations to have visibility into their data before it reaches dashboards and the C-suite. Bad data can be disastrous and expensive for any modern organization: on average, it costs organizations $15 million a year. Being able to resolve data issues quickly and effectively has never been more important. With data observability, you can see the state of your data at every stage of the analytics workflow.

We sat down with Metaplane’s CEO, Kevin Hu, to hear how Metaplane uses Sigma’s API to help Teachable be the first to know about data quality issues and broken dashboards.

What is Data Observability?

Data observability refers to an organization’s ability to understand the health and reliability of the data within its system.

With observability in place, teams can explore data assets, review schema changes, and identify the root causes of new or unknown problems. Companies rely on data engineers to provide accurate and timely data, yet it is often hard to tell when something breaks, let alone diagnose the problem properly.

Before observability tools, data teams had to build their own checks by hand to verify that data loaded correctly. The problem with this approach is that a separate job had to be created for each data source to confirm everything was working properly, which is outdated, inefficient, and hard to scale.

Metaplane’s Solution to Observability

Metaplane is a data observability tool that integrates across the data stack, from source systems to warehouses to business intelligence applications like Sigma. Metaplane continuously monitors data in motion and at rest, uses machine learning to identify anomalous behavior, and warns the appropriate people when things break.
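Metaplane’s actual models are not described in this post. As a much simpler stand-in for the idea of “identifying anomalous behavior,” the sketch below flags a table’s latest row count when it sits more than three standard deviations from its recent history (a plain z-score check). The metric, the threshold, the function name, and the sample counts are all illustrative assumptions, not Metaplane’s method.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Return True if `latest` lies more than `threshold` standard deviations
    from the mean of `history` (a simple z-score check, used here only as an
    illustrative stand-in for a learned anomaly model)."""
    if len(history) < 2:
        # Not enough history to estimate spread; don't flag anything yet.
        return False
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # Perfectly flat history: any change at all counts as anomalous.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for one warehouse table.
daily_row_counts = [10_120, 10_340, 9_980, 10_210, 10_050, 10_400, 10_180]
print(is_anomalous(daily_row_counts, 10_300))  # False: within the normal range
print(is_anomalous(daily_row_counts, 2_150))   # True: looks like a broken load
```

A production system would track many metrics per table (freshness, null rates, schema changes) and account for seasonality, but the shape is the same: learn what “normal” looks like, then alert on deviations.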

By using Metaplane to accomplish their data observability goals, data teams are able to:

  1. Build trust by making sure they hear about data issues before stakeholders do.
  2. Increase awareness by giving the team a view of the current state of their data and how to improve it.
  3. Save engineering resources by reducing the time needed to identify and resolve data issues.

Case study:

How Metaplane Used Metadata to Help Teachable Prevent Their Dashboards From Breaking

The problem:

Teachable is a platform for creating online courses and coaching services used by over 100,000 creators. Peter Jaffe, the head of data at Teachable, believes strongly in data observability and democratizing data so that business units can actively make data-driven decisions.

To support this philosophy, Peter brought on Sigma as a “super user-friendly, easy-to-use platform” to empower all business units to create their own visualizations and dashboards. Since adopting Sigma, Teachable has reduced ad hoc requests by 70% and increased analytics adoption fivefold, freeing the data team to tackle more impactful work.

The expanded visibility of data within Sigma gave rise to an inevitable new problem: data quality (DQ) issues. DQ issues would quietly occur several times a month and sit in the data warehouse for days or even weeks before a business user eventually discovered them. Once found, data quality problems cost the data team valuable time and resources and, most importantly, degraded the trust the rest of the organization had in its data.


The solution:

Having used an event monitoring tool at a previous company, Peter knew the value of a tool that continuously keeps an eye on your data; this time, he wanted one watching the warehouse. When Peter learned about Metaplane, he knew it was the right solution to his problem.

After setting up Metaplane in less than an hour, the Teachable data team began receiving alerts in Slack almost instantly: “Right away, a day after we went live, we got an alert to a problem that we didn’t see otherwise. It was almost immediate.”

In the months that Teachable has used Metaplane, the time to identify and the time to resolve data quality issues have both decreased dramatically.

“Anomalies could regularly be sitting there in the system for as long as weeks or even months prior to this, depending on the frequency with which people use a particular table. Identifying issues is pretty immediate since we started getting [alerts] from Metaplane, then we can choose whether to respond to it immediately or decide that it’s a lower priority. It’s undoubtedly a bit quicker and easier to debug with the information that you get through Metaplane.”

Alongside Airflow and Datadog, Metaplane now forms a critical part of the Teachable data team’s workflow: “We look at the channel constantly. It’s just become part of our regular workflow.”

How Metadata is Used to Build Trust

Sigma recently released an API to increase visibility into the data stack and integrate with data observability partners such as Metaplane. Using this API, Metaplane automatically identifies which Sigma queries and workbooks are affected by upstream data issues in Redshift tables and columns.

While Metaplane helped the Teachable team be the first to know about data quality issues, the Sigma-Metaplane integration helped with the follow-up questions of impact analysis and prioritization. Peter says that “knowing when we have an anomaly in the data and what’s being affected in the BI platform allows us to be more effective about how we decide to respond to the outage. Is this a big deal or not a big deal? Now we can see not only how many but which workbooks are impacted.”

In addition to helping Peter and his team understand the impact of data issues, the Sigma-Metaplane integration also makes his team more confident in their work. “It’s psychologically reassuring to have all this information in one place and not be guessing,” Peter says.

What is Sigma’s API?

Sigma’s new API allows developers to programmatically interact with objects in Sigma, including workbooks, datasets, materializations, queries, and more. As part of the API, Sigma makes it easy to retrieve metadata about these objects, such as information about a workbook, as well as the relationships between those objects, like the queries within a workbook. Among other use cases, this metadata makes it possible to determine which data sources each query uses, giving a clear map of lineage between Sigma objects and their data sources.
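To make the lineage idea concrete, here is a minimal sketch of the impact analysis described above. It assumes query metadata has already been fetched and flattened into simple records (query ID, parent workbook, source tables). The field names, table names, and workbook names are hypothetical and do not reflect the actual response format of Sigma’s API. Given a table flagged as anomalous, it lists the affected workbooks, ranked by how many of their queries read from that table.

```python
from collections import defaultdict

# Hypothetical, pre-fetched metadata: which workbook each query belongs to and
# which warehouse tables it reads. Field names are illustrative only.
queries = [
    {"query_id": "q1", "workbook": "Revenue Dashboard", "sources": ["analytics.orders", "analytics.customers"]},
    {"query_id": "q2", "workbook": "Revenue Dashboard", "sources": ["analytics.orders"]},
    {"query_id": "q3", "workbook": "Creator Growth", "sources": ["analytics.creators"]},
    {"query_id": "q4", "workbook": "Churn Report", "sources": ["analytics.customers"]},
]


def build_lineage(records):
    """Map each source table to the (workbook, query_id) pairs that read from it."""
    lineage = defaultdict(list)
    for rec in records:
        for table in rec["sources"]:
            lineage[table].append((rec["workbook"], rec["query_id"]))
    return lineage


def impacted_workbooks(lineage, table):
    """Return (workbook, affected_query_count) pairs for `table`,
    most affected first, ties broken alphabetically."""
    counts = defaultdict(int)
    for workbook, _query_id in lineage.get(table, []):
        counts[workbook] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


lineage = build_lineage(queries)
print(impacted_workbooks(lineage, "analytics.orders"))
# [('Revenue Dashboard', 2)]
print(impacted_workbooks(lineage, "analytics.customers"))
# [('Churn Report', 1), ('Revenue Dashboard', 1)]
```

Ranking by affected query count is one simple way to answer “Is this a big deal or not?”; a real prioritization might also weigh how often each workbook is viewed.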


Visibility Brings Trust

As the data ecosystem continues to grow exponentially, the ability to explore, analyze, and visualize data is increasingly critical to a company’s ability to differentiate itself through data-driven decision making.

Trust in your data, from the moment it hits the warehouse all the way until it is used in a critical business decision, is a core pillar of a data-driven organization.

Best-of-breed platforms like Sigma and Metaplane work together to eliminate the difficult, resource-intensive, and time-consuming job of monitoring your data quality, so you can focus on using your data to make strategic decisions.

Want full visibility of your data ecosystem?

Check out our build-a-stack guide to get started.