The Great Divide
In today’s data-centric enterprises there are three primary groups of users working with data and analytics:
- Data Teams: Responsible for owning and maintaining the enterprise’s data warehouse and making that data accessible to select users in the company. Conversant with SQL and a broad array of programming languages.
- Analysts: Responsible for analyzing and interpreting canonical stores of data to answer recurring and ad-hoc questions. Some members of this cohort have SQL and programming skills. Conversant with BI tools, spreadsheets, and pivot tables.
- Business Experts: Use exports, reports, and dashboards generated by analysts and data teams. Typically originate most of the business questions and use data to guide decisions. Often need additional reports or data and make frequent ad hoc requests to analysts and data teams.
Over the years, the gap between data engineers, analysts, and business teams has continued to widen.
Traditionally data teams curate tables within a data warehouse, which are queried via report/chart building tools or downloaded via sanctioned access points. This curated model has its advantages when it comes to standardization, but it isn’t free of downsides. When an analyst conceives of a valuable change, the process for incorporating it into the model (even as an experiment) may involve weeks of waiting on data engineers. More often than not, analysts manually and repetitively apply these changes in isolation. This means improvements are confined to laptops and emails, rather than readily available to the entire organization.
And even if analysts push for changes to be incorporated back into the standardized model, their request is likely one of many; this influx of requests can cause data engineers to become a major bottleneck. When this happens, it pulls data engineers away from their own responsibilities and may even delay analyst requests to the point that they become irrelevant and out of date.
More Challenges for Data Engineers
Here are a few more challenges facing today’s data engineers:
- Data disappears. An analyst downloads the data onto a PC, and the data engineer promptly loses sight of what happens to it. The data engineer has spent great blocks of time curating that data, ensuring it’s up to date and accurately reported. Once it’s downloaded, the engineer has no idea who downloaded it, when it happened, or how the data is being used.
- Constant context switching to help downstream data consumers. Like a CPU saving and restoring the state of a process or thread, data engineers must repeatedly set aside their work, assist a consumer, and later reconstruct where they left off so they can resume from the same point. Although helping consumers is a valid service, it disrupts the data engineers’ focus on the engineering tasks they are uniquely qualified to perform.
Challenges Facing Business Analysts
- Business analysts find themselves constrained by engineering ticket throughput, not “curiosity throughput.” Imagine this scenario: for every web page you want to examine, you have to file a ticket, have someone else navigate to the page, and wait for them to email you a screenshot. Because this is cumbersome and time-consuming, you would only ask for materials you were absolutely certain were worth reading. Even with this filter, your requests pile up and can take weeks or months to fulfill. You’ll probably just give up and stop asking for most of the information that is available. This scenario describes a company culture that kills curiosity and dramatically devalues your data.
- Without IT intervention, analysts need to learn to code; in other words, they need to gain technical skills beyond Excel to work with canonical datasets and to generate reports or analyses in a timely manner.
- Extracted data is no longer live or secure, and the tools that process these extracts, like Excel and Tableau, cannot work at the scale of the raw data. Thus, pre-filtering and aggregation are required. Chasing down discrepancies between these datasets and live, connected, canonical datasets is the bane of many data engineers’ existence.
- The gap between business analysts and technologists becomes even more critical as ever-greater volumes of actionable data flow in and out of the data warehouse. The volume, velocity, and variety of that data have no correlation to the size or age of a company. The result is that datasets too large to analyze with existing spreadsheet or report-building tools are the new normal in many companies.
- Most legacy products simply aren’t built for the world of cloud data warehouses while newer products tend to serve a specific group of users while neglecting others. This results in a split-brain situation: separate groups of users are speaking different languages, despite talking about the same data and questions.