Yifeng Wu
Software Engineer
Madison Chamberlain
Software Engineer
Catherine Chu
Technical Support Engineer
Jake Hannan
Head of Data
June 20, 2024

Boosting Productivity: Leveraging Cloud Data Warehouse AI Functions in Sigma for Enhanced Insights


Cloud data warehouses such as Snowflake and Databricks have recently added support for large language models (LLMs). Why do LLMs matter? They enhance data processing and analysis by enabling natural-language interactions, automating complex tasks, and surfacing deeper insights through advanced language understanding.

In Sigma, organizations can harness the full potential of LLMs to enhance data analysis, visualization, and interaction, making it more intuitive and insightful for users.

Calling an LLM function in Sigma is straightforward

Snowflake Cortex LLM

Snowflake Cortex is a fully managed service that provides access to industry-leading LLMs, such as those from Mistral AI and the Llama family, to securely bring generative AI to governed data. Teams can focus on building AI applications while Snowflake handles model optimization and GPU infrastructure to deliver cost-effective performance. Snowflake Cortex features are exposed as SQL functions such as complete(), translate(), summarize(), sentiment(), and many more.
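As a quick sketch, the Cortex functions can be called like any other SQL function; the table and column names below are assumptions for illustration:

```sql
-- Hypothetical table and column names; the SNOWFLAKE.CORTEX functions are
-- Snowflake's built-in LLM SQL functions.
SELECT
    review_text,
    SNOWFLAKE.CORTEX.SENTIMENT(review_text)  AS sentiment_score,  -- score in [-1, 1]
    SNOWFLAKE.CORTEX.SUMMARIZE(review_text)  AS summary,
    SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large',
        CONCAT('Classify this review as bug, feature request, or praise: ', review_text)
    )                                        AS category
FROM product_reviews
LIMIT 10;
```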

AI Functions on Databricks

Databricks offers built-in AI SQL functions that invoke a state-of-the-art generative AI model from the Databricks Foundation Model APIs to perform tasks like sentiment analysis, classification, and translation via ai_analyze_sentiment(), ai_classify(), and ai_translate(). Databricks also provides the built-in SQL function ai_query(), which lets users query custom models hosted on a model serving endpoint, as well as external models such as those from OpenAI and Anthropic.
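A minimal sketch of these calls, assuming a hypothetical support-ticket table (and, for ai_query(), an endpoint name that is purely an assumption):

```sql
-- Hypothetical table/column names; the ai_* functions are built into Databricks SQL.
SELECT
    ticket_text,
    ai_analyze_sentiment(ticket_text)                          AS sentiment,
    ai_classify(ticket_text, ARRAY('billing', 'bug', 'other')) AS topic,
    ai_translate(ticket_text, 'en')                            AS ticket_text_en,
    -- ai_query routes to a model serving endpoint; the endpoint name is an assumption
    ai_query('my-openai-endpoint',
             CONCAT('Suggest a one-line reply to: ', ticket_text)) AS suggested_reply
FROM support_tickets
LIMIT 5;
```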

Integration of AI Functions using Sigma's Custom Functions

Utilizing cloud warehouses’ AI functions within Sigma is straightforward and impactful, as it integrates into Sigma's custom functions. For example, a user can create a reusable custom function with just a few clicks and employ it with syntax like complete("mistral-large", [prompt column]), opening pathways to valuable insights.
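As a rough sketch of what such a custom function wraps (the exact argument-placeholder syntax in Sigma's admin UI may differ; treat the placeholders here as hypothetical), the body is little more than a one-line pass-through to the warehouse function:

```sql
-- Hypothetical custom-function body wrapping Snowflake Cortex;
-- {{model}} and {{prompt}} stand in for the function's declared arguments.
SNOWFLAKE.CORTEX.COMPLETE({{model}}, {{prompt}})
```

Once defined, analysts call it in a workbook formula like any built-in function, e.g. complete("mistral-large", [prompt column]).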

Using the Databricks ai_summarize() function

Using AI functions in Sigma feels like second nature, as their usage mirrors that of any other standard function: no added complexity, just efficiency and ease.
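For reference, this is what the underlying call looks like in Databricks SQL; the table and column names are assumptions:

```sql
-- Hypothetical table/column names; ai_summarize() is a built-in Databricks SQL function.
SELECT
    article_text,
    ai_summarize(article_text)      AS summary,
    ai_summarize(article_text, 30)  AS short_summary  -- optional max word count
FROM news_articles
LIMIT 5;
```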

Best Practices for Using AI Functions

Data Preparation

Before feeding data into LLM functions, it's crucial to clean the data by removing irrelevant tokens, such as HTML tags and special characters, and trimming the data as needed. For example, a user might use Substring() to trim the data to within the limit or RegexpReplace() to replace characters. This ensures inputs comply with the model's token limit while maintaining data cleanliness and relevance, laying a solid foundation for analysis.
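In warehouse SQL terms (Sigma's Substring() and RegexpReplace() compile down to similar calls), a minimal cleanup might look like the following; the table and column names, and the 4000-character cap, are assumptions:

```sql
-- Strip HTML tags, collapse whitespace, and cap length before calling an LLM.
SELECT
    SUBSTR(
        REGEXP_REPLACE(
            REGEXP_REPLACE(raw_text, '<[^>]+>', ' '),  -- drop HTML tags
            '\\s+', ' '                                -- collapse runs of whitespace
        ),
        1, 4000  -- rough character proxy for staying under the model's token limit
    ) AS clean_text
FROM raw_feedback;
```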

Applying Filters to Limit Rows and Preview Data

To expedite query runtime and minimize costs, filters can be applied to preview a limited number of rows. This lets users fine-tune prompts and gain insight into the results more efficiently. Note that the order of execution can vary depending on the filter setup and the data warehouse used; in some cases, LLM calculations may still process the entire dataset.
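The idea, sketched in Snowflake SQL with hypothetical table and column names: narrow the slice with a filter, then cap the row count while iterating on the prompt.

```sql
-- Preview LLM output on a small, filtered slice before running on everything.
SELECT
    review_text,
    SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large',
        CONCAT('Summarize in one sentence: ', review_text)
    ) AS preview_summary
FROM product_reviews
WHERE created_at >= DATEADD('day', -1, CURRENT_DATE)  -- narrow the slice first
LIMIT 20;                                             -- cap rows (and cost) while iterating
```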

Workbook Materialization

Following the result preview, the next step is to set up workbook materializations to precompute and persist LLM results as base tables for later use, ensuring accessibility and readiness for subsequent analyses. A user can then build pivot tables and groupings without needing to compute those LLM columns again.

Incremental Materialization Using dbt Models

In conjunction with materialization, incremental models can be built within dbt, ensuring that only new or modified data is processed and optimizing efficiency and cost throughout the analysis pipeline. For more information on how to use AI inside a dbt project, see this documentation in the dbt Developer Hub.
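A minimal incremental model might look like this, assuming a staging model and column names that are purely illustrative (Snowflake Cortex syntax):

```sql
-- models/review_sentiment.sql (hypothetical model name and columns)
{{ config(materialized='incremental', unique_key='review_id') }}

SELECT
    review_id,
    created_at,
    review_text,
    SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment_score
FROM {{ ref('stg_reviews') }}

{% if is_incremental() %}
  -- Only score rows added since the last run, so LLM calls are not repeated.
  WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
```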

Use Input Tables to Persist LLM Results

Creating an empty input table and then copying and pasting the LLM results into it ensures that the results are stored, eliminating the need for recomputation.

Cost Monitoring

As with any new or existing feature, cost is always top of mind for teams. Snowflake reports daily spend associated with AI services out of the box in its organization usage Metering Daily History view. With this readily available, building a cost-monitoring workbook in Sigma is only a few clicks away.
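A sketch of the underlying query against that view (per Snowflake's ORGANIZATION_USAGE schema; verify column availability for your edition):

```sql
-- Daily AI-services spend from Snowflake's organization usage views.
SELECT
    usage_date,
    SUM(credits_used) AS ai_credits
FROM snowflake.organization_usage.metering_daily_history
WHERE service_type = 'AI_SERVICES'
GROUP BY usage_date
ORDER BY usage_date DESC;
```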

For Databricks, see View Lakehouse Monitoring expenses | Databricks on AWS for details. Note that data warehouse AI functions may be subject to resource limitations and throttling during periods of high utilization; refer to your data warehouse documentation for more details.

Conclusion

In today's data-driven world, the integration of cloud data warehouses' built-in AI functions with Sigma's custom functions provides a promising path toward full scalability, enhanced security, and easy-to-use LLMs. Your entire organization can harness the power of AI to navigate complexity, innovate with agility, and drive continuous growth in the ever-evolving landscape of data analytics.

Interested in seeing this in action? Reach out to our team to get a custom demo of Sigma. And keep up to date with what’s new by following Sigma on LinkedIn or joining our team at a live event.
