Choosing the Right Cloud Data Warehouse
Head of Communications, Sigma
Having a centralized cloud data warehouse is crucial for any data-driven company. A cloud data warehouse serves as a central repository for all of your organization’s data, giving teams the access they need to analyze and report on business performance.
If you’re considering a move to a cloud-based data infrastructure, you’re not alone. It was estimated that more than 80% of enterprise workloads would run in the cloud by 2020.
But selecting the right cloud data warehouse isn’t an easy decision, so it pays to do your homework in advance. The wrong choice can cost you down the line and disrupt operations.
As you explore different vendors, here are the key things to keep in mind when comparing cloud data warehouses.
Key factors to consider
Scalability
Growing companies should consider investing in a warehouse that can grow with them. Scalability is one of the greatest benefits of the modern cloud data warehouse, but keep in mind that each warehouse scales differently.
When choosing a provider, consider how easy it is to scale, the cost of scaling, and what IT resources you need to grow along the way. Ideally, you would find a solution that scales automatically as concurrency and query volume grow.
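To make the scaling trade-off concrete, here is a minimal Python sketch of how a multi-cluster autoscaler might size compute for concurrent queries. The function `clusters_needed` and all of its numbers (8 queries per cluster, a floor of 1 and a ceiling of 10 clusters) are hypothetical assumptions for illustration, not any vendor’s actual scaling policy.

```python
import math


def clusters_needed(concurrent_queries: int,
                    queries_per_cluster: int = 8,
                    min_clusters: int = 1,
                    max_clusters: int = 10) -> int:
    """Return how many compute clusters a hypothetical autoscaler would run.

    All defaults are illustrative assumptions, not any vendor's settings.
    """
    if queries_per_cluster <= 0:
        raise ValueError("queries_per_cluster must be positive")
    # Enough clusters to serve every concurrent query at full per-cluster capacity...
    wanted = math.ceil(concurrent_queries / queries_per_cluster)
    # ...but never below the floor or above the ceiling you are willing to pay for.
    return max(min_clusters, min(wanted, max_clusters))


for load in (0, 5, 20, 100):
    print(load, clusters_needed(load))
```

The `max_clusters` ceiling is where scaling meets cost: in this sketch, demand beyond it is simply capped, which in a real system would typically mean extra queries wait in a queue rather than triggering more spend.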
Performance
Querying data in a warehouse takes time, but cloud warehouses can shorten it by allocating more compute on demand.
Each provider stores data and processes queries in slightly different ways. For example, some split the work so data is processed in parallel, while others spin up as many clusters as needed to deliver results in seconds. Learn what limitations exist and whether they will affect how long it takes to generate insights for users.
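Processing data in parallel can be sketched as split, scan, combine: partition the rows, let each worker aggregate its own partition, then merge the partial results. This is a toy Python illustration under stated assumptions: threads from `concurrent.futures` stand in for warehouse nodes, and because of Python’s global interpreter lock it shows the pattern, not a real speedup. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition: list[float]) -> float:
    # Each worker (a stand-in for a warehouse node) scans only its own slice.
    return sum(partition)


def parallel_total(rows: list[float], workers: int = 4) -> float:
    # Split the rows into roughly equal partitions, one per worker.
    size = max(1, -(-len(rows) // workers))  # ceiling division
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    # Scan partitions concurrently, then combine the partial results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, partitions))


sales = [float(x) for x in range(1, 1001)]
print(parallel_total(sales))  # 500500.0
```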
Security
Unlike on-premises solutions, the leading cloud data warehouse providers are quick to release the latest security features and protocols via patches, so internal IT teams don’t have to patch and maintain the underlying infrastructure themselves. Depending on your industry, your data warehouse may need to meet specific security requirements out of the box. Make sure whichever vendor you choose meets those requirements before you set up a proof of concept (PoC) or sign a contract.
Pricing
With cloud data warehouse providers, you generally pay only for what you use, but there are some stipulations to consider.
Depending on the provider, you may be charged a flat rate, an hourly rate for storage and compute, or a pay-per-use rate for compute and storage. Consider the cost today and in the future, and choose a pricing structure that fits your needs so costs stay easy to predict as you scale.
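Because the cheapest model today is not always the cheapest at scale, it helps to run the numbers for both current and projected usage. The sketch below compares the three pricing structures; the function `monthly_costs` is hypothetical and every rate in it is invented for illustration, not any provider’s actual price.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_costs(compute_hours: float, storage_tb: float) -> dict[str, float]:
    # Hypothetical rates in USD, invented for illustration only.
    flat_rate = 2000.0      # fixed monthly fee, independent of usage
    hourly_rate = 2.5       # always-on cluster billed per hour, storage included
    per_compute_hour = 4.0  # pay-per-use compute
    per_tb_month = 23.0     # pay-per-use storage
    return {
        "flat": flat_rate,
        "hourly": hourly_rate * HOURS_PER_MONTH,
        "pay_per_use": per_compute_hour * compute_hours + per_tb_month * storage_tb,
    }


# Compare today's usage with a projected workload after a year of growth.
for label, hours, tb in (("today", 200, 5), ("next year", 600, 20)):
    costs = monthly_costs(hours, tb)
    cheapest = min(costs, key=costs.get)
    print(label, costs, "cheapest:", cheapest)
```

With these made-up rates, pay-per-use is cheapest today, but the always-on hourly cluster wins at next year’s projected volume. Swapping in real quotes from each vendor turns this into a quick sanity check before signing a contract.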
Integrations
As your company and data needs grow, you’ll need to onboard additional users and support new data sources. That’s where integrations become essential, and planning for downstream changes now will save you headaches later. Be sure your data warehouse provider integrates with your ETL/ELT and BI tools of choice.
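Integrations matter most in an ELT workflow, where an ingestion tool loads raw data into the warehouse and transformations then run as SQL inside it, ready for a BI tool to query. The sketch below uses Python’s built-in `sqlite3` module purely as a local stand-in for a cloud warehouse; the table names, columns, and data are hypothetical.

```python
import sqlite3

# sqlite3 stands in for a cloud warehouse; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# Extract + Load: land raw records as-is, the way an ELT ingestion tool would.
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "east", 1250), (2, "west", 800), (3, "east", 450)],
)

# Transform: build a reporting table inside the warehouse with SQL,
# which a BI tool could then query directly.
conn.execute("""
    CREATE TABLE revenue_by_region AS
    SELECT region, SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    GROUP BY region
""")

rows = conn.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(rows)  # [('east', 17.0), ('west', 8.0)]
```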
Reliability
Generally speaking, cloud data warehouses are much more reliable than the on-premises data warehouses of the past.
Choosing a provider such as Snowflake, Amazon, Microsoft, or Google means you get world-class engineering teams on your side. That said, it still pays to investigate how each vendor has handled past incidents and how long they took to resolve. A big part of this is the reputation of the vendor’s customer service team and how it communicates. Choose a vendor that can support your organization if and when something goes wrong.
Use case
How you use a data warehouse may ultimately determine the provider you choose, so consider what your company needs and how your teams will use it.
If you’re mostly using your data warehouse for machine learning and data science, your needs will be very different from those of a company that wants to provide ongoing, ad hoc analysis or self-service analytics to everyone. Consider whether you need real-time data access, built-in statistical functions, data preparation, or support for multiple data types.
Common cloud data warehouse vendors
Azure SQL Data Warehouse (now part of Azure Synapse Analytics) is Microsoft’s fully managed SQL cloud data warehouse for enterprises, combining fast query performance with strong data security. You can optimize workloads by elastically scaling resources in minutes and integrate with the Azure suite of tools and BI providers to build a single modern data warehouse solution for all your analytical workloads.
BigQuery is Google’s serverless, highly scalable enterprise data warehouse, designed to make data analysts more productive. Because there is no infrastructure to manage, you can focus on uncovering meaningful insights using familiar SQL, without the need for a database administrator.
Amazon Redshift is a fast, scalable data warehouse that makes it simple to analyze all your data across your data warehouse and data lake. Redshift delivers fast performance by using machine learning, massively parallel query execution, and columnar storage on high-performance disk.
Snowflake’s cloud-built data warehouse relies on its patented multi-cluster architecture to deliver instant elasticity, secure data sharing, and per-second pricing across multiple clouds. Snowflake combines the power of data warehousing, the flexibility of big data platforms, and the elasticity of the cloud at a fraction of the cost of traditional solutions.
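Redshift’s description above mentions columnar storage, which is a large part of why analytical queries on these warehouses are fast. The toy Python sketch below, using a hypothetical three-row orders table, counts how many stored values a `SUM(amount)` query would have to read under a row-oriented versus a column-oriented layout. Real warehouses add compression and block-level metadata on top of this, so treat it as an illustration of the idea only.

```python
# Hypothetical orders table with three columns, stored two ways.
orders = [
    {"order_id": 1, "region": "east", "amount": 12.5},
    {"order_id": 2, "region": "west", "amount": 8.0},
    {"order_id": 3, "region": "east", "amount": 4.5},
]

# Row-oriented layout: each record's values are stored together.
row_store = [(o["order_id"], o["region"], o["amount"]) for o in orders]

# Column-oriented layout: each column's values are stored together.
column_store = {
    "order_id": [o["order_id"] for o in orders],
    "region": [o["region"] for o in orders],
    "amount": [o["amount"] for o in orders],
}

# A row store loads whole records, so SUM(amount) reads every stored value...
values_read_row = sum(len(record) for record in row_store)
total_row = sum(record[2] for record in row_store)

# ...while a column store reads only the one column the query needs.
values_read_col = len(column_store["amount"])
total_col = sum(column_store["amount"])

print(total_row, values_read_row)  # 25.0 9
print(total_col, values_read_col)  # 25.0 3
```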
Where to learn more
Fivetran recently released a Cloud Data Warehouse Benchmark that compares the most common warehouses side by side.
You can learn more about building a cloud-native data infrastructure in our free eBook, Building a Cloud Analytics Stack.
Not sure whether you want to build a cloud data warehouse or a cloud data lake? Read our online guide to learn the key differences and benefits of each.