Kent Graziano on the Evolution of the Data Warehouse and the Future of Data Governance
Content Marketing Manager, Sigma
It’s an exciting time in the world of big data and analytics. Technology is finally moving at the real-time speed of business thanks to the cloud — and we’re witnessing the gap between the IT and business teams disappear as a result.
Those working in data know that closing this gap has always been the goal of data architects and engineers. But the advent of the cloud has turned data dreams into reality for some of today’s most innovative companies.
Perhaps nobody knows the impact of these changes better than Snowflake’s Chief Technical Evangelist, Kent Graziano (AKA the “Data Warrior”). As a long-time industry veteran, Kent has not only seen it all, he’s also written the playbook on harnessing big data to achieve business goals.
I recently had the opportunity to speak with Kent about his 30-year data journey and hear his thoughts on the future of analytics, data governance, and the evolving role of data warehouses in the enterprise.
You’ve been working in the data space for a long time. How did you get your start?
Before I worked in data, I was a programmer. I did scientific programming and things of that nature, and eventually jumped into data from there. I got involved with my first small Oracle database in the late ’80s.
That first Oracle database turned me on to data warehousing in the mid-’90s. That was when I met Bill Inmon. I coauthored The Data Model Resource Book with Bill and Len Silverston in ’97, which was a real turning point in my career. It was all really exciting. I got to work with Bill, I got to meet Claudia Imhoff, and they’ve been friends and mentors ever since. The rest, as they say, is history.
What’s kept you inspired to stay in the industry all these years?
It’s the interest in how people can use data. The whole moniker of ‘data for good’ is certainly always in the back of my mind: how nonprofit organizations, and even corporations, use data for the betterment of society.
The fact that technology kept changing and evolving has also kept me engaged. After working in the on-prem database world for 25+ years, Snowflake came along. I saw what Snowflake was able to do, and how easy they made it to access, process, and analyze data with the new cloud architecture that our founders invented. That was a real turning point. It re-energized me. I remember thinking, “Okay, this is it. This technology solves the problems that my clients face every day.”
Can you share a little bit more about your role at Snowflake?
I am the Chief Technical Evangelist. My tribe is data architects, data engineers, and senior technical people that, like me, have been in the industry in some cases for a couple of decades. I help them stay up to date and understand the challenges and opportunities that are in front of us with the cloud: the incredible increase in the amount of data coming from sources like IoT and mobile, and the opportunity to harness that data to drive business value. I help these people answer the questions:
- How do we get from the on-prem world to the cloud?
- What’s a rational architecture and approach to take?
- How do we think about this?
- What do we need to think about differently?
You frequently write and speak about the changes happening in data governance. What are the biggest trends you’ve uncovered?
There is a growing interest in data governance, specifically because of things like CCPA and GDPR that require data lineage and the “right to be forgotten.” People are looking for answers on how to manage their data in compliance with these new regulations.
What might be surprising to many people is that data governance best practices have existed for a couple of decades. The questions have already been answered. We now need to force ourselves into the discipline of following those best practices.
The cloud hasn’t changed data governance. You still need to be engaged with the business. You still need to have a data catalog. You must build these things; they don’t just magically appear.
Organizations must bring together all the silos of data into one central data platform so people can access it and analyze it. That’s something that both Snowflake and Sigma facilitate.
But it doesn’t help to have a central platform and a great analytics tool to access data if nobody understands the data or its source. These are part of a broader discussion that is resurfacing because of the need for compliance—and the data lake versus data swamp conversation.
We need governed data lakes so that the data is useful. If we want to do machine learning, and we want to do AI, and we want to do business intelligence, we need to embrace more governance. People must understand the source of the data, follow the lineage of the data, and, most importantly, understand the meaning of the data from a business perspective.
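As a rough illustration of the three things named above (source, lineage, and business meaning), here is a minimal sketch in Python. Everything in it is hypothetical: the CatalogEntry class, the lineage and undocumented helpers, and the dataset names are invented for this example and do not represent any Snowflake or Sigma feature.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """One dataset in a hypothetical data catalog."""
    name: str
    source: str        # system the data originally came from
    description: str   # business meaning, written for business users
    upstream: List[str] = field(default_factory=list)  # datasets this one is derived from


def lineage(catalog: Dict[str, CatalogEntry], name: str) -> List[str]:
    """Return every upstream dataset of `name`, nearest first, without duplicates."""
    seen: List[str] = []
    queue = deque(catalog[name].upstream)
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        if current in catalog:
            queue.extend(catalog[current].upstream)
    return seen


def undocumented(catalog: Dict[str, CatalogEntry]) -> List[str]:
    """Return the names of datasets that have no business description yet."""
    return sorted(n for n, e in catalog.items() if not e.description.strip())


# Illustrative entries only; names and sources are made up.
CATALOG = {
    e.name: e
    for e in [
        CatalogEntry("raw_orders", "ERP", "One row per order line exported from the ERP"),
        CatalogEntry("raw_customers", "CRM", ""),
        CatalogEntry("customer_revenue", "warehouse", "Total order revenue per customer",
                     upstream=["raw_orders", "raw_customers"]),
        CatalogEntry("revenue_dashboard", "BI tool", "Revenue by customer for finance",
                     upstream=["customer_revenue"]),
    ]
}

print(lineage(CATALOG, "revenue_dashboard"))
print(undocumented(CATALOG))
```

In this toy catalog, tracing the dashboard walks back through customer_revenue to the two raw tables, and the undocumented check flags raw_customers, the kind of gap that has to be filled by someone who knows the business.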
How has the cloud changed the way companies approach data infrastructure?
One of the things I’m seeing is the evolution of the data lake and data warehouse. It’s becoming more about a data platform and a place for doing your analytics and getting all the data consolidated. The cloud presents a massive opportunity here because of the flexibility and nearly unlimited scale that it provides. It has removed the on-prem constraints from the conversation.
I’ve been telling people that the data lake is not a technology; it’s a concept. And we need to get people to understand that. Big data is the same thing. It’s not a technology, it’s a concept. Data warehouse, it’s not a technology, it’s a concept. The cloud provides us with the technological ability to bring all of this together. In my mind, many of these boundaries are entirely artificial, and they were a result of the technology limitations that we had at the time. You no longer have to think about these concepts as being different things.
How have these changes affected the way people think about data modeling?
With the data platform approach, I see a resurgence in the interest in data modeling. We went through close to a decade of people saying, “We don’t need to do data modeling.” In the data lake world, I frequently heard, “We’re just going to throw all that out there, and the data scientists are going to go at it, and they’re going to pull data, they’re going to derive information out of it.”
Over the past decade, the number and variety of data sources have grown exponentially with the rise of mobile devices, IoT, and enterprise systems—think ERP, CRM, and weblogs. This increase has us all asking, “what are we going to do with the data?” and “how are we going to understand it, extract value from it, and take advantage of the insights that lie within it?”
Effective data modeling is key to the successful use of these massive new data streams and formats. Far from being an outmoded skill, supplanted by the rise of automation tools, data modeling is more important than ever, especially for those tasked with business intelligence or analytics. Data modeling helps define the structure and semantics of the data and makes it understandable, so business users and data scientists can properly query, manipulate, and analyze it.
We can now onboard and access data exceptionally quickly. But we need to organize it so that more people can take advantage of it and understand it.
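One way to picture a model that carries both structure and semantics is the small Python sketch below. It is only an illustration under stated assumptions: the Column class, the ORDER_MODEL definition, and the validate helper are hypothetical names invented here, not part of any tool mentioned in this interview.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    """Structure (name, type, required) plus semantics (meaning) for one field."""
    name: str
    dtype: type
    meaning: str
    required: bool = True


# A hypothetical model for an "order" record.
ORDER_MODEL = [
    Column("order_id", int, "Unique identifier of the order"),
    Column("customer_id", int, "Customer who placed the order"),
    Column("amount_usd", float, "Order total in US dollars, tax included"),
    Column("coupon_code", str, "Promotion applied, if any", required=False),
]


def validate(record: Dict[str, Any], model: List[Column]) -> List[str]:
    """Return a list of problems found when checking `record` against `model`."""
    problems = []
    for col in model:
        value = record.get(col.name)
        if value is None:
            if col.required:
                problems.append(f"{col.name}: missing required value")
            continue
        if not isinstance(value, col.dtype):
            problems.append(
                f"{col.name}: expected {col.dtype.__name__}, got {type(value).__name__}"
            )
    return problems


good = {"order_id": 1, "customer_id": 7, "amount_usd": 19.99}
bad = {"order_id": "1", "amount_usd": 19.99}

print(validate(good, ORDER_MODEL))
print(validate(bad, ORDER_MODEL))
```

The meaning field is what lets a business user read amount_usd correctly (dollars, tax included), while the type and required flags give data engineers something concrete to check incoming records against.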
An effective model makes data approachable and consumable and ensures people use the right information in the right context.
The data space is rapidly changing. How do you stay up to date? Any tips for the data community?
I think about this question often. And I’ve found that, amazingly, the best way is through social media. I get much of my information from Twitter and LinkedIn, and by following industry thought leaders like Claudia Imhoff.
I think it comes down to finding your community, your “tribe.” In the data space, you’ve got all of these topics and concepts: data science, machine learning, AI, traditional business intelligence, data warehousing, and the data lake. Go find your tribe and seek out that information. Join the community conversation. That is the way I’ve been able to keep up for the last ten years.
Last question: You’re known in the industry as the “Data Warrior.” I have to ask, how’d you get that title?
Well, outside of data, I am a seventh-degree black belt and certified Senior Master Instructor in TaeKwonDo. I’ve been doing TaeKwonDo for 39 years now, and have an organization called Rocky Mountain TaeKwonDo that I started in Denver many years ago. I’ve been doing multiple forms of martial arts longer than I’ve been involved with data.
The Data Warrior moniker comes from my combined passions of martial arts and data, and it’s the genesis of my blog. There I write about topics ranging from data modeling and governance to agile data warehousing, and sometimes martial arts.
Kent is an internationally recognized industry expert in Cloud and Agile Data Warehousing as well as an award-winning author in the areas of data modeling, data warehousing, and data architecture. He is the Chief Technical Evangelist at Snowflake.