Risky Business: Why Spreadsheet Extracts Are Your Biggest Data Security & Compliance Threat

Julian Alvarado

Sr. Content Marketing Manager, Sigma

Every year, organizations spend millions on infrastructure, security solutions, data management systems, and governance initiatives to protect themselves from data breaches. Yet, despite all of the money, time, and resources that go into safeguarding data, one of the most basic, common, and ultimately harmful risks often goes overlooked: data extracted to spreadsheets by business users.

Of course, business experts aren’t intentionally looking to circumvent data governance practices. They’re merely looking for a shortcut around the typical business intelligence workflow where it can take days, if not weeks, for data teams to pull data and provide analyses from BI tools that require coding expertise. Unfortunately, these dashboards and reports typically lead to more questions than answers, and without the ability to dig into the data behind them, business experts must go back to the data team for more information, adding more time to the process.

Desperate to get the answers they need to make better business decisions in a timely fashion, business experts turn to static spreadsheet extracts. This scenario creates a nightmare for IT and data teams because:

  • Once data is extracted from a BI tool to a spreadsheet, the data team loses visibility into how employees use or share data.
  • The loss of visibility into the data means it’s out of reach for security and compliance oversight, making it vulnerable to misuse or hacking.
  • The downloaded data is instantly out of date once exported from the data warehouse.
  • The extracted data almost certainly represents a subset of the complete dataset, meaning decisions are being made using only a slice of the available data.

Here we will examine the threats posed by extracting data to a spreadsheet and discuss how IT and data teams can mitigate these security and compliance risks by embracing a modern approach to data governance.

Million Dollar Spreadsheets: The Risks of Data Extracts Are Real

Even the most innocent mistakes can cost an organization millions. According to IBM’s 2020 Cost of a Data Breach Study, the average cost of a data breach is $3.86 million. And while the financial loss of a breach is high, the resulting loss of consumer confidence and damage to a brand’s reputation can be greater still. In the case of spreadsheet downloads, the victims are usually an organization’s own employees.

In 2014, an employee at Willis North America accidentally sent a spreadsheet containing private information to 4,830 employees enrolled in its medical rewards plan. The attachment contained confidential, sensitive data such as employees’ names, birthdates, Social Security numbers, and employee ID numbers. As a result, Willis North America offered two free years of identity theft protection through TrustedID’s IDEssentials service. The costs are unknown.

Any time you move your data it creates vulnerabilities, whether that’s moving data to an extract, having people download data to their PC, or using emails to share reports.

Rob Woolen

CTO and Co-founder, Sigma Computing

Similarly, in 2016, a Boeing employee mistakenly emailed his spouse a spreadsheet filled with personal data, including Social Security numbers and birth dates, on some 36,000 other Boeing employees. As a result, Boeing had to offer each employee a two-year subscription to Experian’s identity theft protection services. Based on Experian’s service costs, this one spreadsheet error likely cost the company somewhere in the neighborhood of $15 million.
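To put the $15 million estimate above in perspective, here is a quick back-of-the-envelope check in Python. It uses only the figures quoted in this article (36,000 employees, a two-year subscription, and a rough $15 million total) and works backward to the per-employee price that estimate implies. The constant and function names are ours, chosen for illustration, and the result is not Experian’s actual pricing.

```python
# Figures quoted in the article; the $15 million total is a rough estimate.
EMPLOYEES_AFFECTED = 36_000
ESTIMATED_TOTAL_COST = 15_000_000  # USD
SUBSCRIPTION_MONTHS = 24           # two-year subscription


def implied_cost_per_employee(total_cost: float, employees: int) -> float:
    """Return the per-employee cost implied by a total cost estimate."""
    return total_cost / employees


per_employee = implied_cost_per_employee(ESTIMATED_TOTAL_COST, EMPLOYEES_AFFECTED)
per_month = per_employee / SUBSCRIPTION_MONTHS

print(f"Implied cost per employee (two years): ${per_employee:,.2f}")
print(f"Implied cost per employee per month:   ${per_month:,.2f}")
```

In other words, the estimate works out to a little over $400 per affected employee across the two years, all triggered by a single misdirected email.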

If the risks of extracting data to a spreadsheet are so clear, why do so many business users continue to do it?

Traditional Data Governance and The Paradox of “Self-service” BI Tools

Traditional approaches to data governance leave data and BI teams with two options: either allow open access to everyone in their organization or lock data away from everyone but the data team. Each option has its own unique challenges, but both lead to workflow or governance issues that stall or prevent effective data-driven decision making.

The Wild West

Open data access means business teams use desktop tools like spreadsheets to extract data and conduct independent analyses. This level of ungoverned access causes a number of problems for data and BI teams, including data silos, conflicting insights, inaccurate analyses, security risks, and noncompliance.

The Ivory Tower

Data experts attempt to keep data safe and centralized by locking it down behind code and complicated tools. Non-coding business teams can’t directly access or explore data without asking BI teams for assistance, causing mass frustration that ultimately leads back to risky data extracts.

In an attempt to provide governed access to data and break away from the ivory tower, IT and data teams turn to traditional BI solutions that claim to be self-service, only to find that these tools fall short for three reasons:

  • BI analysis tools offer ‘self-service’ interfaces for business users, but these interfaces are minimal. They do not allow business experts to ask novel questions or to follow their curiosity and explore problems in creative ways.
  • These tools require the use of proprietary coding languages or SQL to extract, parse, and combine data for consumption. As a result, non-coding business users have to wait on busy BI and data teams for help or to get their questions answered.
  • Because BI teams don’t have business-level domain expertise, and because business users don’t always know the right questions to ask when they make the initial request, the reports don’t contain the novel insights needed to drive business decisions. This results in a cycle of back-and-forth between BI and business teams, which ultimately leads business users to turn to an extract to get the answers they need on their own.

Consequently, 70% of domain experts turn to Excel for answers rather than wait on their BI teams, creating a slew of siloed, stale, risky extracts along the way. In addition to the compliance and security risks, data extracts have a major effect on decision-making processes, forcing business users to make decisions based on a snapshot of data that is out of date the second it’s downloaded to a spreadsheet.

We used BigQuery at my previous company. The only way for me to get my hands on any customer data was to have an analyst send me a data dump in Excel. It was painfully slow, and the data was outdated the second it was extracted.

Alex Harvey

Marketing Lead, Migo

Business users need real-time data in one governed place. Fortunately, there are new tools and technologies that strike a balance between data access and control in your organization, giving your team peace of mind while ensuring that business users can make agile, data-driven decisions.

The Solution: A Spreadsheet Experience In a Fully Governed, Secure Environment

You invested in your cloud data warehouse because it makes it easy to aggregate data across hundreds of sources and house it all in a centralized, secure, and fully-governed repository. It’s a scalable, single source of truth that supports concurrent workloads at speed — so why remove data for analysis?

Sigma empowers teams to securely explore data at scale with a UI they know and love: the spreadsheet. Because Sigma operates on top of the cloud data warehouse (CDW) as a cloud-native BI solution, it allows anyone to explore and query live data directly from the CDW in real-time down to row-level detail — without writing SQL or extracting data to a traditional spreadsheet.


As a secure and compliant BI solution, Sigma offers the most recent governance-related features and capabilities including:

  • Role-based access permissions and sharing controls
  • OAuth support to inherit data access permissions directly from the data warehouse
  • Audit logs and usage dashboards to record all user actions and queries
  • Fine-grained row-level security (RLS)
  • And more


This approach enables business experts to go beyond the dashboard to explore data on-demand, without the limitations of traditional spreadsheets or the need to know how to code. IT and data teams can set security features like authentication and other permissions controls that allow for complete and consolidated control of who can access data and what they can do with it. With Sigma, teams can take full advantage of the cloud’s speed, scale, and compute power while ensuring that data is safe, current, and complete.
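For readers who want a concrete picture of how role-based permissions, row-level security, and audit logging fit together, here is a minimal conceptual sketch in Python. It is not Sigma’s implementation or API. The `User`, `GovernedTable`, and `ROLE_COLUMNS` names, the sample rows, and the region-based filter are all hypothetical, made up purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class User:
    name: str
    role: str    # hypothetical roles: "admin" or "analyst"
    region: str  # drives the row-level filter for non-admins


# Hypothetical role-based policy: which columns each role may read.
ROLE_COLUMNS = {
    "admin": {"customer", "region", "revenue", "ssn"},
    "analyst": {"customer", "region", "revenue"},
}


@dataclass
class GovernedTable:
    rows: list[dict]
    audit_log: list[dict] = field(default_factory=list)

    def query(self, user: User, columns: list[str]) -> list[dict]:
        """Return only the rows and columns this user is allowed to see."""
        allowed = ROLE_COLUMNS.get(user.role, set())
        denied = [c for c in columns if c not in allowed]

        # Audit logging: every request is recorded, whether or not it succeeds.
        self.audit_log.append({
            "user": user.name,
            "columns": list(columns),
            "allowed": not denied,
            "at": datetime.now(timezone.utc).isoformat(),
        })

        # Role-based access: reject requests for columns outside the role.
        if denied:
            raise PermissionError(f"{user.name} may not read: {denied}")

        # Row-level security: non-admins only see rows for their own region.
        visible = [
            r for r in self.rows
            if user.role == "admin" or r["region"] == user.region
        ]
        return [{c: r[c] for c in columns} for r in visible]


# Made-up sample data for the illustration.
table = GovernedTable(rows=[
    {"customer": "Acme", "region": "west", "revenue": 120, "ssn": "000-00-0001"},
    {"customer": "Bolt", "region": "east", "revenue": 95, "ssn": "000-00-0002"},
])
analyst = User(name="dana", role="analyst", region="west")

print(table.query(analyst, ["customer", "revenue"]))
```

Because the data never leaves the governed table, the analyst in this sketch sees only the rows and columns her role allows, and every request, including any denied one, leaves a record in the audit log. That is exactly the visibility a spreadsheet extract gives up.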

Because Sigma feels like a spreadsheet, users haven’t hesitated to dive right into Snowflake data for faster insights.

Alex Mora

Data Engineer, Clover Networks

Are you ready to take the first step toward this modern approach to data governance? Download our playbook to learn how your team can modernize your governance strategy in 5 steps.
