Buyer’s Guide: Building a Cloud Analytics Stack

“With data-driven goals paramount, BI and analytics tools, applications, and practices—as well as the data management and integration systems that support them—are in the spotlight. This ‘stack’ of layered technologies, cloud-based services, and practices is critical to providing an increasingly varied spectrum of decision makers with a steady stream of trusted, quality data and analytics insights.”
— David Stodder, TDWI Research

Introduction: An Evolving Cloud Analytics Landscape

The data landscape is quickly changing as companies say goodbye to legacy solutions in favor of a more modern, cloud-based approach to business intelligence (BI) and analytics. A staggering 41% of companies are considering a move to cloud-based analytics in the next year as they look to cut costs, improve efficiency, and enable better decision-making at every level of business.*
This should not come as a surprise. Companies are actively pursuing a cloud strategy because they are tired of the old ways of managing data. The cloud frees up resources and allows data teams to focus on the application of data insights instead of management of on-prem hardware, legacy software, and all the costs that come with it. It’s a new era where any company can rapidly deploy analytics across the organization. Change can happen quickly, enabling the biggest companies to act more like startups, instead of slow-moving enterprises.
With this movement to the cloud, some trends have emerged along the way, particularly a renewed focus on citizen data scientists (and accessibility), true self-service analytics, data privacy, and security. Despite forward movement in 2018, most organizations have yet to unlock the cloud’s full potential. The good news: the cloud data stack has finally reached maturity, and the latest advancements address the new trends and demands of data-first organizations. These trends largely drive the suggestions laid out in this guide.

The Buyer’s Dilemma

For years companies have poured money into BI and analytics. But adoption never took off. Why? Traditional BI is expensive to maintain, and on-prem deployments are simply too complicated to succeed in the ways companies had hoped. Pair that with legacy analytics software that provides poor user interfaces and centralized systems that only data teams can access and make sense of, and you have a recipe for failure.

“Pervasive business intelligence remains elusive, with BI and analytics adoption at about 35% of all employees.”
— Gartner

Thankfully, this is all changing with the rise of SaaS tools, cloud data warehouses, and new cloud-native BI solutions that simplify self-service analytics for entire organizations. But with so many options on the market, it can be challenging to discover solutions that balance data access, security, and flexibility—and determining the right solution for your company can be even harder. That’s why we’ve put together this buyer’s guide: to help you build a framework for turning raw data into insights with a modern cloud analytics stack, and to discover potential providers to partner with in the process.
Read on and learn how to build a complete cloud analytics stack, determine your analytics needs, and explore comparisons of leading ETL and data warehouse providers to consider during the buying process.

“Every company needs a clear set of goals and objectives to achieve maximum benefits from its business intelligence strategy.”
— Robert Miller, Tech Republic

Setting the Goals of Your Cloud Analytics Stack

From the get-go it’s important to tackle the buying process with some clear goals in mind. By establishing goals and knowing what your company needs, you’ll shorten the process and make it easier to cut through the noise in the marketplace. If you’re moving to the cloud for the first time, your experience will surely be different from someone replacing an existing cloud solution. There is no one-size-fits-all solution.
Your stack should be practical, scalable, and able to provide insights to all who need them. This includes the ability to source, store, and analyze data, build reports and dashboards, and provide access to key stakeholders so they can make further discoveries.
To kick things off, we’ve provided some key questions to ask yourself before starting the exploration process with vendors.

Data Wrangling—Where does your data come from?

Before you can do anything, you have to get a clear understanding of your data sources. Are you collecting data from internal sources such as a database or data lake? Third-party services, APIs, and tools? All of the above? By answering these questions you can determine whether you can settle on an off-the-shelf ETL or ELT solution (more on this later), or whether you need to build out a custom tool to pull together internal company data. Either way, it pays to know which data sources are important to your company.

Data Storage—What types of data will you store, and how?

What kind of data are you working with? Is your data structured or semi-structured? And where do you plan to store it? If you want to generate insights to make informed decisions, you’ll need a data warehouse that allows teammates to dig in and ask questions of the data using a query engine—as opposed to a traditional database.

You may also need to consider adding a data lake to your architecture to meet additional demands (more on this later).

Data Analysis—How will you enable data discovery, generate data insights, and provide data access?

When it comes to data discovery, what types of analysis are important? And who will need to access and explore your data? Is it constrained to the data team, or do you expect business experts to query data and generate reports? The answers will help guide you to the right analytics tools to purchase.

With a better understanding of your data needs, you’re ready to move on and learn more about the components that make up a modern cloud analytics stack.

Collecting Data—ETL and ELT

What is ETL?

ETL (Extract, Transform, Load) is the traditional process of moving data from original sources to a data lake or database for storage, or a data warehouse where it can be analyzed. In the past, this usually involved moving data from one database to the warehouse, but this has quickly changed over the last decade as companies began to rely more on third-party SaaS tools that produce multiple streams of data. Examples include Salesforce, Marketo, and Zendesk. These platforms communicate via APIs with an ETL tool to constantly transfer live data from point A to point B, commonly referred to as the “data pipeline”.

  1. Extract: retrieve data from sources
  2. Transform: compute and format data for integration
  3. Load: transfer data to a data warehouse, database, or data store
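
To make these three steps concrete, here’s a minimal, hypothetical sketch of the pattern in Python. The API endpoint, field names, and the use of sqlite3 as a stand-in for a warehouse are illustrative assumptions, not any vendor’s actual tooling.

```python
import sqlite3

import requests  # third-party HTTP client


def extract(url):
    """Extract: pull raw records from a source API."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def transform(records):
    """Transform: reshape and filter records for the target schema."""
    return [
        (r["id"], r["email"].lower(), r["plan"])
        for r in records
        if r.get("email")  # drop records missing a key field
    ]


def load(rows):
    """Load: write the cleaned rows to the destination store."""
    con = sqlite3.connect("warehouse.db")  # stand-in for a cloud warehouse
    con.execute("CREATE TABLE IF NOT EXISTS customers (id, email, plan)")
    con.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()


load(transform(extract("https://api.example.com/customers")))
```

Commercial ETL tools run this same loop continuously against dozens of SaaS APIs, so the pattern is worth understanding even if you never write it yourself.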

While ETL platforms are no stranger to the data stack, the rise of SaaS tools requires modern enterprises to manage huge amounts of structured and unstructured data in real time, from multiple sources, at massive scale—something legacy ETL vendors and on-prem data warehouses simply cannot manage.
When selecting an ETL vendor, be sure it supports your SaaS toolset and transfer needs. It’s also not uncommon to use multiple ETL vendors to transfer data from a range of data sources.

When to Consider ELT

You may also want to consider a vendor that supports not only ETL but also ELT (Extract, Load, Transform). This modified pipeline loads the extracted data into the cloud data warehouse, where the transformation occurs. This approach can be advantageous because it allows you to leverage the power of the cloud to perform complex joins and calculations. It also means you can perform transformation wherever it makes the most sense for your workflow: in the pipeline or in the warehouse. This method is becoming more common as companies seek to avoid the risks of poorly performing hardware and the costs associated with upgrading an existing ETL infrastructure.
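
For contrast, here’s a hedged sketch of the ELT pattern: raw payloads land first, and SQL inside the warehouse performs the transformation. The table names are hypothetical, and sqlite3 (with its built-in JSON functions) again stands in for a cloud warehouse that would run the same pattern at scale.

```python
import json
import sqlite3

con = sqlite3.connect("warehouse.db")

# Load first: land the raw, untransformed payloads in a staging table.
con.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
events = [{"user": "ana", "action": "signup"}, {"user": "bo", "action": "login"}]
con.executemany("INSERT INTO raw_events VALUES (?)",
                [(json.dumps(e),) for e in events])

# Transform second, in SQL, inside the "warehouse" itself. Cloud
# warehouses expose equivalent JSON functions at much larger scale.
con.execute("""
    CREATE TABLE IF NOT EXISTS signups AS
    SELECT json_extract(payload, '$.user') AS signup_user
    FROM raw_events
    WHERE json_extract(payload, '$.action') = 'signup'
""")
con.commit()
con.close()
```

Because the warehouse does the heavy lifting, the same raw table can feed many different transformations later.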

When to Build a Custom Solution

There may not be a single tool that supports every data source you work with, and building a custom ETL tool may be necessary in these situations. While it’s not as common as it used to be (mostly because companies tend to use similar tech tools across industries), it can happen. If you run into this problem, there are several commercial and open source tools you may want to consider.

Other things to consider when choosing an ETL/ELT vendor:

Flexibility: You want to select a vendor that can manage multiple data sources, including support for structured and unstructured data in real time. Also, consider a provider that can support an ELT workflow so that you can benefit from your cloud data warehouse’s scalable processing to transform once the load completes.

Cloud Data Warehouse Support: Not all ETL/ELT vendors are created equal. As mentioned, modern ETL tools are designed to integrate with cloud data warehouses, whereas legacy ETL tools only integrate with on-prem data warehouses. Be sure your ETL vendor optimizes for your data warehouse of choice.

Pricing: While each ETL vendor prices its solution slightly differently, it’s important to have a basic understanding of the elements that often determine pricing frameworks. You’ll want to take the pricing mechanisms into consideration to ensure the price works for your company now and in the future, should you integrate additional data sources or scale significantly. The last thing you want is to invest early in an ETL provider and find out a year later that it’s no longer a feasible option. Switching providers can add significant time and cost downstream, so planning upfront can save you headaches in the long run.

With today’s modern ETL vendors, pricing tends to be charged on a monthly recurring basis, though you may encounter annual pricing as well. Here’s a breakdown of some of the most common pricing mechanisms for ETL providers:

Integration-based Pricing:

Some vendors have various levels of pricing based on the number of data source integrations you work with. It’s not uncommon to see multiple tiers that allow 2, 5, 10, or unlimited integrations with data sources.

Row-based Pricing:

Another common pricing mechanism you may come across is a row-based approach. For example, one pricing tier will cover up to 25M rows of data, another 50M or 100M, and an enterprise level may be custom priced but support unlimited rows of data.

Volume-based Pricing:

While not as common as the other pricing mechanisms covered above, some vendors may charge based on the volume of data pushed through the pipeline per month. This is sometimes measured in raw data volume (such as gigabytes transferred) and sometimes in monthly tracked users, or MTUs.
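
To see why these mechanics matter before you sign, here’s a back-of-the-envelope sketch that projects a row-based plan against growth. The tier limits, prices, and growth rate are hypothetical placeholders; substitute the numbers from the vendors you’re evaluating.

```python
# Hypothetical row-based tiers: (max rows per month, price per month).
TIERS = [(25_000_000, 500), (100_000_000, 1_000), (float("inf"), 2_500)]


def monthly_cost(rows):
    """Return the price of the cheapest tier that covers `rows`."""
    return next(price for limit, price in TIERS if rows <= limit)


rows_now = 10_000_000
growth = 1.08  # assumed 8% month-over-month growth
for month in (0, 12, 24):
    projected = int(rows_now * growth ** month)
    print(f"month {month}: ~{projected:,} rows -> ${monthly_cost(projected):,}/mo")
```

A plan that looks affordable today can cross a tier boundary within a year, which is exactly the switching-cost trap described above.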

More information on ETL:

ETL Process: Traditional vs. Modern from Alooma
ETL Process from Stitch Data
What is ETL from Talend

Storing Data—Cloud Data Warehouses and Data Lakes

Every company produces data on a daily basis. Whether it’s transaction data from a purchase, customer interactions inside a product or app, or even a simple account debit at the bank, all this data must be stored inside of a database or data lake in real time. Choosing how to set up your data architecture largely depends on the types of data you store and how you choose to use it.

The Modern Data Warehouse Lives in the Cloud

Capturing this data is just the beginning. To understand that data, it must be stored in a relational data warehouse that provides a query engine for asking questions of the data itself. Think of the data warehouse as the hub at the center of your analytics stack. Cloud data warehouses allow business experts to explore data and generate insights from trends using cloud analytics tools.

Many data warehouses in use today were built to service the on-premises data centers of the past. But these solutions are a dying breed, being replaced by the next generation of cloud warehouses designed to provide greater flexibility and manage real-time data demands. On-premises data warehouses generally require a large upfront investment in hardware, license fees, and ongoing maintenance costs. They also cannot elastically scale up or down to meet data demands, meaning companies have to pay to provision a warehouse for peak use even though workloads vary over time as analytics needs change. Together, these factors leave companies overpaying for data management and wasting IT resources that could be spent on higher-value projects.

Modern cloud data warehouses eliminate upfront infrastructure costs and don’t require ongoing investment to partition, optimize, or vacuum data. They can also collect data from many sources and scale elastically to support nearly unlimited users and analytic workloads for faster insights. This includes structured and semi-structured data, such as JSON. These solutions allow enterprises to add any number of users, implement familiar, easy-to-use analytics tools, and benefit from lower costs—all without sacrificing security, governance, or data compliance.
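
As an illustration of querying semi-structured data in place, here’s a hedged sketch using the snowflake-connector-python package. The account, credentials, table, and column names are placeholders; BigQuery and Redshift offer comparable JSON functions.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder credentials; substitute your own account details.
con = snowflake.connector.connect(
    user="ANALYST", password="...", account="my_account"
)
cur = con.cursor()

# A VARIANT column holding raw JSON can be traversed directly in SQL,
# so event payloads are queryable without a flattening step.
cur.execute("""
    SELECT payload:device::string AS device, COUNT(*) AS event_count
    FROM raw_events
    GROUP BY device
    ORDER BY event_count DESC
""")
for device, event_count in cur.fetchall():
    print(device, event_count)
con.close()
```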

Learn more about the benefits of cloud data warehouses here:

Delivering Data Warehousing as a Service, from Snowflake
How Modern is Your Data Warehouse, from Google BigQuery
Modernize Your Cloud Data Warehouse, from Amazon Redshift

When Using a Data Lake Makes Sense

Data lakes are a flexible option that allows you to store data outside of rigid schemas. Analytics stacks built entirely on a data warehouse can make it harder to analyze data outside the schema without constant curation and cleaning from the data team. In cases where you collect and store large amounts of data outside your schemas, this approach can make sense.
It’s not uncommon for companies today to have both a cloud data warehouse and data lake. Data lakes make it possible to store non-relational data from mobile apps, IoT devices and other non-traditional data sources. Data captured outside your pre-defined data schema is better stored in a data lake because you may not know what types of questions you want to ask of this data upfront.
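
As a sketch of that pattern, here’s how raw, schema-less events might land in an S3-based lake using the boto3 package. The bucket name, key layout, and event shape are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")


def land_event(event):
    """Write one raw event to the lake as-is, partitioned by date."""
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    key = f"raw/iot/dt={day}/{uuid.uuid4()}.json"  # hypothetical layout
    s3.put_object(
        Bucket="my-data-lake",  # placeholder bucket name
        Key=key,
        Body=json.dumps(event).encode("utf-8"),
    )


land_event({"sensor": "thermostat-7", "temp_c": 21.4})
```

Because the payload is stored untouched, you can impose a schema later, once you know what questions you want to ask.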

Learn more about data lakes here:

“What is a data lake?” from Amazon

Choosing a Cloud Data Warehouse

Choosing the right cloud data warehouse isn’t an easy decision. It pays to do your homework in advance. Making the wrong decision will cost you down the line and disrupt operations. When making your selection, here are some key things to keep in mind.

Scalability: Growing companies need to consider investing in a warehouse that can grow with them. This is one of the greatest benefits of the modern cloud data warehouse, but keep in mind that each warehouse scales a bit differently. So when choosing a provider consider how easy it is to scale, the cost of scaling, and what IT resources you need to grow.

Speed: Accessing and processing data in the warehouse takes time. But the cloud makes this faster. Each provider stores and processes queries a little bit differently. For example, some process data in parallel, while others will spin up as many clusters as needed to deliver results in seconds. You’ll want to learn what limitations exist and whether they will impact the time it takes to generate insights for users.

Security: Unlike on-prem solutions, the leading cloud data warehouse providers are incredibly quick to release the most up-to-date security features and protocols via patches—meaning internal IT teams won’t have to maintain security themselves.

Cost: With cloud warehouse providers, you’re generally only paying for what you use, but there are some stipulations to consider. Depending on the provider, you may be charged a flat rate, billed per hour for storage and compute, or billed per use of compute and storage. Consider the cost today and in the future; choosing a provider whose pricing model fits your needs makes it easier to predict costs down the road as you scale (a rough break-even sketch follows this list of considerations).

Connectivity: As you grow, you may need to provide access to additional users or encounter new data sources, and integrations become important. Assessing what changes to data sources and user growth may occur downstream can save you headaches later. Be sure your data warehouse provider integrates with your ETL and BI tools of choice.

Reliability: Generally speaking, cloud data warehouses are much more reliable than on-premises data warehouses of the past. Choosing a provider such as Snowflake, Amazon, Microsoft, or Google means you get world-class engineering teams on your side. That said, it still pays to check out how they have managed past issues and how long they take to reach a resolution. A big part of this is the reputation of customer service teams and the way they communicate. Make sure to choose a provider that can support your team if and when something goes wrong.

Use: How you use a data warehouse may ultimately determine the provider you choose. Make sure you consider what your company needs and the use cases of your teams. If you’re mostly using your data warehouse for machine learning and data science, your needs will be much different than if you want to provide ongoing, ad-hoc analysis or self-service analytics to your entire company. Consider whether you need real-time data access, built-in statistical functions, data preparation, or support for multiple data types.
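
To make the flat-rate versus pay-per-use trade-off above concrete, here’s a rough break-even sketch. All prices are hypothetical placeholders, not any vendor’s actual rates.

```python
# Two hypothetical billing models for the same warehouse workload.
flat_rate_per_month = 2_000  # fixed subscription, usage included
on_demand_per_hour = 4.00    # pay-per-use compute rate

for hours in (300, 600, 900):  # projected monthly compute hours
    on_demand_cost = hours * on_demand_per_hour
    better = "flat rate" if flat_rate_per_month < on_demand_cost else "on-demand"
    print(f"{hours} hrs: on-demand ${on_demand_cost:,.0f} "
          f"vs flat ${flat_rate_per_month:,} -> {better}")
```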

Data Exploration, Discovery, and Collaboration—Cloud Analytics

What to Look for in an Analytics Tool

Before you make any purchasing decisions, it’s important to get a good idea of what you need from your analytics solution. Every company is unique, and needs will vary. You’ll need to evaluate your internal requirements and align any purchasing decisions with those goals. With that in mind, here are some often-overlooked guidelines to consider when setting organizational goals.

Take a User-Driven Approach

Remember that you’re building a solution to meet the needs of the business and analytics experts at your company. That’s why we suggest taking a user-driven approach. Is this solution for a specific department, or will people across the entire company be expected to explore data and generate insights? In the past, data analysis sat in the hands of the data team or the C-suite, but that’s all changing as more companies look to make data a bigger part of their culture to improve productivity and aid better decision-making. The rise of self-service analytics tools now makes it easier to get data into the hands of people at all levels of your organization. These tools tend to have shorter learning curves and can be set up to meet the needs of people without a background in data science and analytics.

Determine How Users Will Engage with the Data

Once you’ve determined who will use your analytics solution, consider how they will engage with it. Do you expect users to simply absorb insights through dashboards and reports, or is data exploration important? Reports and dashboards can highlight trends, but they often raise more questions than they answer or become outdated quickly. Many analytics tools require an understanding of SQL to ask any complex questions, a skill set that rarely exists outside the data team. If you expect domain experts in marketing, sales, and other departments to use your analytics solution, you’re going to need a self-service tool that allows business people to explore and query data without a background in SQL. Tools like Sigma empower business experts to ask more of their data without writing a single line of SQL, opening up big data queries to non-technical users outside the data team.

Encourage Collaboration

If you discover insights but have no way to share them, do they make an impact? When it comes to BI, communication and collaboration are key. But both are difficult with legacy analytics tools. When looking at a modern cloud solution, keep collaboration in mind. Not only should it be easy to share reports and embed dashboards; also consider how easy it is for teams or coworkers to build upon each other’s analysis in the analytics tool itself. Being able to see the analysis your coworkers create and the data they are referencing lets you build on the base that someone else has created. This generates analytic compound interest, where one insight can quickly lead to others. It can also ensure multiple teams remain on the same page and share common metrics. Ultimately, collaboration helps surface the most useful insights without locking people into predefined questions—which can increase productivity and unlock insights faster.

Keep Adoption in Mind

It’s hard to create a data-driven culture if analytics applications aren’t easy to use or if pricing models impede adoption by creating burdensome upfront costs. More and more enterprise software tools are moving away from buttoned-up interfaces in favor of simple, modern design that is intuitive and easy to use. Cloud pricing models are also becoming more common. And it makes sense: companies are tired of paying for what they don’t use, and with the cloud they don’t have to. These features encourage organizational adoption in ways legacy tools have failed to deliver. Look for them when considering an analytics application; they will make it easier to drive adoption and ensure data becomes a central tenet of your company’s culture.

Critical Components of a Cloud Analytics Tool

Built for the Cloud

If your data lives in the cloud, shouldn’t your analytics? Most analytics tools available today have some form of cloud offering, but few are built for the cloud from the ground up. A move towards a fully-managed cloud solution makes sense at this stage. Why? The cloud-first data stack is finally reaching maturity, and the constraints of on-prem or hybrid solutions hold back growing companies as they look to data to make smarter business decisions. You want to take advantage of the cloud benefits such as elasticity, real-time data access, easy collaboration and sharing, and usage-based pricing.

Flexible Data Modeling

Sigma’s data modeling provides a flexible way to guide people’s data exploration and build centralized data definitions without handcuffing your business experts. We’ve balanced control—as much as you need—with the freedom to let business users find, add, and trust new data approved by data teams.

Pre-build Sigma Data Blocks using the visual interface in minutes—without writing code. There is no need to learn new modeling languages or write SQL, unless you (or your SQL gurus) want to use our SQL runner. And since business experts no longer have to wait for model changes, they are less likely to grab a data extract and go, so your data remains in the warehouse where it’s safe and secure.

Data Support and Accessibility

Data velocity is only increasing, and data diversity along with it. Semi-structured data—like JSON—is now the norm. And analytics tools aren’t keeping up: many can only deal with this data once it is flattened. The result? Most people get cut out of the data conversation because they have to wait for data teams to clean and curate it. Be sure to consider whether the tool you choose can support your data types, and doesn’t limit data access for those who need it.
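
To illustrate the flattening bottleneck, here’s a small sketch using the pandas library: a nested JSON record has to be reshaped before a flat-table tool can touch it. The record structure is hypothetical.

```python
import pandas as pd

# Nested, semi-structured records as they might arrive from a product API.
records = [
    {"user": {"id": 1, "plan": "pro"},
     "events": [{"type": "login"}, {"type": "export"}]},
    {"user": {"id": 2, "plan": "free"},
     "events": [{"type": "login"}]},
]

# json_normalize explodes the nesting into flat columns, one row per event.
flat = pd.json_normalize(
    records, record_path="events", meta=[["user", "id"], ["user", "plan"]]
)
print(flat)  # columns: type, user.id, user.plan
```

Every step like this that sits between raw data and a business user adds delay; tools that query semi-structured data in place remove it.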

Reporting and Dashboards

With any analytics tool, the ability to generate and share reports and dashboards is a must. Just make sure that’s not where analysis stops. Too many tools rely on visualizations and reports and don’t allow users to ask more questions by exploring the data further and drilling down into trends. Also keep in mind that you want a tool that can keep these reports and dashboards up to date in real time, with embedding capabilities that allow you to share insights outside of the tool itself, say in an app or on a website. This gets more people involved in the data conversation and helps drive data adoption.

Sharing and Collaboration

Collaboration tools have undergone a renaissance. Slack and Google Docs changed how people work together and set expectations for how easy things should be. Now analytics tools like Sigma have brought this approach to BI. Collaborative BI means being able to work seamlessly with internal and external partners, easily finding and building on the most relevant analysis. Fully cloud-based systems and improved AI-driven algorithms enable these new collaborative platforms. This approach is shrinking the data access gap and will help you drive greater adoption by getting data into the hands of business experts who can put it to good use.

Security and Governance

These days, it seems like every time you turn around, another company announces a data breach. Historically, many of these breaches have stemmed from insecure practices such as extracting data to spreadsheets, which are hard for a data team to track and keep compliant. The most secure place for data is in the data warehouse. Cloud computing can offer better physical security than on-premises deployments: cloud providers have governance oversight to ensure compliance with security standards, as well as dedicated personnel to keep data secure at scale. When handing over physical control of your data, you are handing it to a company that specializes in keeping that data online and secure.

Meet Sigma: A New Approach to Cloud Analytics

Sigma is a modern BI and analytics application built for the cloud. Trusted by data-first companies, Sigma provides live access to cloud data warehouses using an intuitive spreadsheet-like interface. This unique approach empowers business experts to ask more of their data in a familiar environment, without writing a single line of code.

With the full power of SQL, the cloud, and a familiar interface, business users have the freedom to analyze data in real time without limits—meaning they aren’t constantly relying on the data team to generate reports and dashboards, or dig deeper into trends to answer questions when they arise. Sigma is self-service analytics as it was meant to be: a single source of truth that eliminates data extracts, simplifies analysis, and drives BI adoption at every level of business.

Why Sigma is Different

Say Goodbye to Data Extract Chaos

Does this sound familiar? A business expert at your company needs data before they can make an informed decision. So they ask for a data extract (usually a CSV) that they can manipulate in a spreadsheet program on their own PC, most likely Microsoft Excel or Google Sheets.

While giving a business expert the data in a program they are comfortable with can jumpstart the analytics process by putting them in the driver’s seat, it opens the door to complete data chaos. How so? What happens to that data once it’s extracted? Who has access? And how long before it’s outdated? The answer is usually, “I don’t know,” and that’s a scary thought when we’re talking about confidential corporate data.

If this sounds like a recipe for hair loss on your data team, that’s because it is. How can they maintain data security and governance principles when data is floating freely in emails and on multiple devices? Even more concerning, how can anyone be sure they’re aligned on data insights if everyone is looking at data extracts instead of a unified source?

Sigma solves these issues by eliminating the need to extract data from the warehouse, providing anyone with access to live data in a familiar spreadsheet-like interface. With Sigma, you know data is up to date, secured in the warehouse, and available to anyone who needs it. And data admins can manage individual users or departments with ease, and even take advantage of partitioned data warehouses or data lakes to ensure data governance and compliance requirements are met.

Leave Report Factory Hell Behind

As long as BI has existed, data and analytics capabilities have sat under the control of the data team. This has created a scenario where business experts must constantly go to the data team with questions to be answered through ongoing reports. What’s the best-performing sales region? How well did that marketing asset convert site visitors? Is the new feature being used? You get the point. The answers usually come in the form of a dashboard or report based on complex SQL queries run against the data warehouse.

Usually, that report raises more questions than it answers. Why did the Midwest region outperform in Q2? Why was the asset conversion rate so high? Why aren’t people using the new feature as much as we thought? And so on. Moreover, the report often takes so long to get into the hands of the business expert that they can’t act on the insights in a timely manner.

Here lies the dilemma. Ad-hoc reporting and limited data definitions lead to a never-ending cycle of reports—which slows down your data team and leaves business leaders reliant on a lengthy reporting cycle. But what if you could leave this report factory hell behind and empower business experts to ask their own questions in real time?

Enter Sigma. Our intuitive interface allows anyone to ask questions without writing SQL. This means they can generate their own insights, build reports, and ask follow-up questions or iterate on their initial queries. And because Sigma is collaborative, these reports and worksheets can be leveraged by anyone with access, so business experts can build off each other’s work where it makes sense.

See Cloud-Native Analytics in Action