Sigma + dbt = A Collaborative Approach to Data Modeling
Content Lead, Sigma Computing
Today’s teams have access to more data than ever before. But because this data is generated in huge volumes in a variety of formats, including semi-structured and unstructured, it often comes with quality issues, and inconsistent definitions lead to confusion.
For this reason, data must be prepared and modeled before it can be analyzed. dbt and Sigma combine to create a robust data modeling layer that fully supports centralized, curated datasets and empowers teams to better understand their data and glean actionable insights from it.
dbt (data build tool) is a data transformation and modeling framework that enables data professionals to easily transform, test, and document data in their cloud data warehouse (CDW). With dbt, data teams can work directly within their CDW to produce trusted datasets.
Sigma seamlessly extends what dbt offers to promote a collaborative, interactive data modeling ecosystem. With a combined Sigma and dbt solution, teams can quickly and easily prepare their data, explore it, find answers to business questions, and share discoveries quickly and easily.
The Importance of Data Modeling
The right context, the right data
Data modeling is the process of organizing and presenting raw data using business logic. Data models give users information in the right context so they can access the right data for queries, generate accurate insights, make decisions, and take action. For example, teams may know your organization has collected data on customer location, purchase history, and geo-targeted advertising campaigns. They may want to delve into this data to determine whether differences in location or ad distribution affected the products that were ultimately purchased. This requires understanding how data is organized, where it “lives,” and whether it’s up to date.
Making meaning for more actionable insights
Data modeling allows analysts and domain experts to make meaning out of the data. Because data gets provided to business experts with better context, they can use it to create more actionable insights.
dbt is one of the most popular frameworks for transforming and modeling data today. It combines modular SQL with software engineering best practices to allow teams to quickly and easily integrate, clean, de-duplicate, restructure, filter, aggregate, and join data and build data pipelines.
One of the reasons that dbt is so attractive is that it makes data engineering activities accessible to a much broader range of people, including analysts. With simple select statements, users can transform and model their data directly in their CDW using SQL. dbt empowers users to write custom business logic, automate data quality testing, and deliver curated, trusted datasets along with documentation. Essentially, dbt allows any user who knows how to write SQL SELECT statements to build production-grade data pipelines.
Another benefit of dbt is that it integrates seamlessly with the data warehouse, sitting as a layer on top of the CDW and pushing code down to the database level. For this reason, it’s faster and much more secure since it doesn’t require data extracts.
Sigma is a dynamic data exploration tool at the end of the modern data stack. Sigma unlocks the value of data with the simplicity of a spreadsheet — and it’s faster than using SQL alone. It automatically translates spreadsheet functions and formulas into optimized SQL. Like dbt, Sigma directly connects with CDWs, allowing users to work with live datasets, so no data extracts are needed. Users can build pivot tables, drill down to answer ad hoc questions, and slice and dice without the constraints of data cubes, all within a spreadsheet-based interface.
Sigma is designed with collaboration in mind. Teams can work with data in groups and across departments while keeping data secure and governed using permissions. Users can leave comments in threaded discussions to facilitate the collaboration process.
Additionally, Sigma has robust features for integrating into modern data stacks — a metadata layer, pre-defined joins, ad-hoc joins, lineage, and materialization. For this reason, it’s a plug-and-play solution that allows teams to get the most out of their data, stress-free.
How Sigma and dbt Empower a Collaborative Approach to Data Modeling
Sigma and dbt work together to create an ideal solution for collaborative data modeling. Here’s how.
Data teams use dbt to centralize and standardize business logic, build models, generate tests, and deploy them across environments.
Data experts use Sigma to understand their data and build and share exploratory data models through Sigma’s powerful, iterative spreadsheet UI. This enables business and data teams to get to a shared understanding of data and models quickly and collaboratively.
Data teams use Sigma to identify which models should be centralized in dbt.
With Sigma and dbt, you can:
ENABLE YOUR USERS
Define your business-critical KPIs with your users, instead of on their behalf – and promote them to your version-controlled dbt source-of-truth.
UNDERSTAND YOUR DATA
Explore, iterate, and understand your data faster than SQL alone, using Sigma’s spreadsheet UI to refine new models before they enter production.
DRIVE YOUR MODELS
Centralize the models you know are valuable & critical to the business. Then orchestrate new business logic back into dbt, completing the feedback loop.
4 Reasons to Pair Sigma and dbt
Why are Sigma and dbt perfect partners for data modeling and business intelligence? Here are four key reasons.
1. Take advantage of cloud capabilities and performance
Sigma and dbt both sit atop the cloud data warehouse. For this reason, these tools can work together to take advantage of the capabilities and performance of the cloud. The amount of data involved in today’s advanced analytics use cases requires processing massive amounts of data from a variety of sources. The cloud’s ability to support multi-cluster, shared data architecture allows teams to scale compute resources separately from storage resources. Today’s CDWs can automatically scale to support any amount of data, any number of workloads, and concurrent users and applications, providing the power needed to work with billion-row datasets.
2. Follow security and governance best practices for the modern tech stack
In traditional data transformation and analytics workflows, data modeling typically happens in business intelligence tools. Downstream BI users then consume the models for insights. But this approach opens up security vulnerabilities. Instead, the combination of Sigma and dbt allows users to follow the best practice principles of a modern tech stack — both Sigma and dbt follow the principle of push down execution to the CDW.
Performing all transformations directly in the CDW using dbt and exploring the data in the CDW using Sigma keeps all data and critical business logic within the secure CDW. The CDW serves as the single place for managing and maintaining business logic, governance and security.
3. Make it easier to involve all stakeholders
dbt brings software engineering principles to the transformation layer, giving modeling ability to analysts, while Sigma extends dbt’s capabilities to allow exploration, experimentation, and iteration by your business users. Clearly-defined and agreed-upon KPIs are critical to actionable business intelligence. Building data models without a shared understanding and agreement from all your stakeholders is a recipe for disaster. Using Sigma and dbt together allows teams to eliminate misaligned expectations and iterate KPI definitions as needed by including stakeholders early, often, and throughout the data pipeline development process.
4. Create a BI feedback loop
When paired together, Sigma and dbt form an ideal feedback loop for business intelligence. Exploratory modeling, profiling, and analysis of data in Sigma lead to better-informed production dbt models, while production dbt models lead to better business analysis of data in Sigma. This feedback loop enables teams to get more out of their data while continually improving the quality and relevance of insights.
To make these concepts more concrete, let’s look at a data workflow that a team might implement using Sigma and dbt.
- Ingest data into the cloud data in its raw form.
- Conduct a rapid analysis in Sigma to better understand the raw data. Check nulls, check cardinality of joins, identify “messy” data, and extract JSON. Include business users in this analysis to ensure the context of the data is understood.
- Productionalize the data flow. First, Sigma analysis to write your models, create data cleansing rules, etc. Then use dbt to standardize the data, write the dependencies, create the tests, and deploy across environments.
- Expose final dbt models and materialized tables in user-friendly models that users can access via Sigma. Standardize the core, business-critical data across golden schemas. Allow users to explore new data in Sigma to surface and create new business-critical KPIs.
- Promote new business-critical KPIs from Sigma into production using version-controlled dbt packages.
As you can see, this workflow creates the iterative feedback loop we described above, ensuring data quality while allowing teams to define KPIs and identify new ones as needed.
dbt + Sigma: Better Modeling, Better BI
Combining Sigma and dbt allows your team to easily model data using SQL and build production-grade data pipelines while ensuring that analysis benefits from stakeholder input. The collaborative approach enabled by a Sigma + dbt solution will ensure that expectations are aligned and KPIs are clearly defined so that business intelligence is more relevant and actionable.