Team Sigma
June 4, 2025

You Don’t Need More Data—You Need the Right Data Sources


Your dashboard loads just fine. No errors, all the filters work, but something still feels off. The sales number looks different from what was shared earlier in Slack. It’s the same metric with a different total. You dig into the data and realize it’s pulling from a newer table that wasn’t flagged or documented. Nobody mentioned the change, but the logic changed. Now you’re not sure which version is correct.

That moment doesn’t always lead to a disaster, but it plants doubt. Quietly, people start running side calculations in spreadsheets “just to check.” A few dashboards get duplicated so teams can add their own logic. Eventually, they stop using the report because they no longer trust it. The problem is that too much data is coming from the wrong places, or worse, from too many places at once.  

The truth is, data volume stopped being the challenge years ago. Between cloud storage, event streams, APIs, and enrichment tools, most companies already have more data than they can reasonably use. What’s missing is structure and clarity. A shared sense of which sources reflect how the business runs and which are still hanging around out of habit.

This blog post is about choosing better data. We’ll look at how to recognize sources that do more harm than good, what to consider before connecting anything new, and how to build a sourcing strategy that keeps your analytics focused on answers instead of noise.

The hidden cost of bad data sources

The trouble with misaligned data sources isn’t that they break all at once. It’s that they quietly erode the foundation your decisions stand on.

Let’s say the revenue numbers in your dashboard come from two places: one from your billing platform and another from a monthly CSV uploaded by finance. Both look similar and seem accurate, but neither fully matches the other. So what do you do? You try to reconcile them: duplicating the dashboard, rerunning the query, asking a colleague which one they trust. All of that is time no one spends analyzing patterns or moving a decision forward.

These are symptoms of source confusion, and they come with a cost:

  • Decision drag: When people second-guess the numbers, they delay action. The meeting becomes a debate over whose data is “right” instead of what to do next.
  • Team fatigue: Analysts spend hours retracing filters, field definitions, and join logic. Eventually, they stop pushing forward and start maintaining dashboards like fragile artifacts.
  • Tool sprawl: When trust breaks, teams work around the system. They build shadow reports in spreadsheets, paste screenshots into decks, and create their own “cleaned” versions of the data in isolation.

Beneath all of this is a governance gap. When no one is responsible for asking, “Is this still the right source?”, the noise only grows. The longer a redundant or outdated source stays active, the more dashboards it touches and the harder it becomes to remove. Most BI teams aren’t struggling with a lack of skill or access. They’re running into problems created by source volume without structure.

Before we discuss how to evaluate sources, it’s worth pausing here. If these challenges sound familiar, they’re a sign that it’s time to reassess what you’re already using.

What actually counts as a data source?

When most people think about data sources, they picture systems like a Snowflake warehouse, Salesforce instance, or Google Analytics property. However, in practice, a “data source” isn’t just a tool; it’s the origin of the values feeding your dashboards and the logic behind every number you report.

Some sources are obvious, like a production database that captures customer orders or a marketing platform that tracks campaign engagement. Others are less visible, like a static CSV from last quarter that still powers a revenue widget or an internal Google Sheet that someone manually updates before the monthly report goes out.

It’s easy to forget that anything shaping your metrics counts as a source. That includes:

  • Tables stored in your warehouse
  • Spreadsheets uploaded manually or scheduled via cloud storage
  • Extracts pulled from third-party tools
  • Data passed through APIs
  • Enrichment datasets purchased from external vendors
  • Custom calculations baked into Sigma workbooks that mimic source behavior

When you start combining these sources, it’s not always clear which one should act as the “truth,” especially when different teams define success using slightly different logic.

This is why understanding what qualifies as a source matters. If you don’t name it, you can’t govern it, and if you can’t govern it, you can’t guarantee consistency. Sources aren’t limited to structured tables, either. BI teams often work with semi-structured JSON outputs, flattened event streams, or transformed files from reverse ETL pipelines. These are still sources, even if they don’t start as clean relational tables. 
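One lightweight way to start naming sources is a simple registry, even just a plain data structure, that records each source’s origin, owner, and refresh cadence. The sketch below is illustrative only; the field names and example entries are made up, not a standard:

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    """Minimal registry entry; all field names are illustrative."""
    name: str     # human-readable identifier, e.g. "billing.revenue"
    origin: str   # warehouse table, spreadsheet, API, vendor file...
    owner: str    # person or team accountable for the source
    refresh: str  # expected cadence: "hourly", "daily", "manual"...

# A registry is just a named collection of these entries.
registry = {
    s.name: s
    for s in [
        DataSource("billing.revenue", "warehouse table", "finance", "daily"),
        DataSource("q3_targets.csv", "manual upload", "unknown", "manual"),
    ]
}

def unowned(reg):
    """Sources nobody is accountable for: prime candidates for review."""
    return [name for name, s in reg.items() if s.owner in ("", "unknown")]

print(unowned(registry))  # the ownerless manual upload surfaces immediately
```

Even something this small forces the question “who owns this?” to be answered the moment a source is connected, rather than months later when it breaks.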

There’s also the human layer. A product analyst might point to a dashboard and say, “This is our retention view.” Unless you know what data built that chart, when it was pulled, how it was joined, and what it excludes, you’re relying on the visualization, not the underlying source. We all know visuals without clarity can mislead just as easily as they inform.

The realization here is that most confusion starts upstream: when you don’t know exactly where your data came from, what it represents, or why it was chosen.

Smart sourcing: 4 things to consider before you connect

Just because a dataset is available doesn’t mean it belongs in your analytics workflow. Before pulling in another table, syncing a new spreadsheet, or wiring up an external API, it’s worth asking: What does this source bring to the table, and what does it require in return? Choosing a data source is about knowing how that source will perform under pressure, how easily your team can access it, whether it fits into your broader goals, and how much effort it takes to keep it clean. 

Here are four things to examine before a source becomes part of your analytics stack:

1. Data quality: Will this hold up under scrutiny?

Even the most advanced dashboards fall apart if the underlying data is incomplete, stale, or inconsistent. A source that updates irregularly or contains mismatched formats across fields (think: customer name in one place, ID in another) will likely raise more questions than answers.

Instead of asking whether the data exists, ask whether it can support confident decisions. That means:

  • Is it complete enough to answer the questions you’re actually asking?
  • Is it refreshed at a frequency that matches the pace of your decisions?
  • Is it consistently formatted across rows and timeframes?
  • Can discrepancies be resolved, or do they linger across dashboards?
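Some of these questions can be turned into automated checks. A minimal sketch in Python, assuming records arrive as dicts with an `updated_at` timestamp and a `customer_id` field (both hypothetical names for illustration):

```python
from datetime import datetime, timedelta

def quality_report(rows, required_fields, max_age):
    """Flag incompleteness and staleness in a batch of records.

    rows: list of dicts; required_fields: fields every row must carry;
    max_age: how stale the newest record may be before we call it out.
    """
    incomplete = [
        r for r in rows
        if any(r.get(f) in (None, "") for f in required_fields)
    ]
    newest = max((r["updated_at"] for r in rows), default=None)
    stale = newest is None or datetime.now() - newest > max_age
    return {"incomplete_rows": len(incomplete), "stale": stale}

rows = [
    {"customer_id": "c1", "amount": 120, "updated_at": datetime.now()},
    {"customer_id": "", "amount": 80, "updated_at": datetime.now()},
]
print(quality_report(rows, ["customer_id", "amount"], timedelta(days=1)))
# one incomplete row; the data itself is fresh
```

Checks like these won’t replace judgment, but they turn “is this source trustworthy?” from a gut feeling into something you can run on a schedule.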

When quality is questionable, teams start adding workarounds that multiply.

2. Accessibility: Can the team get to it without fighting the setup?

If only one person knows how to access a data source, or if connecting it requires multiple tools, workarounds, or permissions, it’s already a liability. Difficult-to-reach sources often end up copied into spreadsheets, flattened into CSVs, or skipped altogether.

Instead, look for:

  • Formats that are compatible with your analytics toolset (like Snowflake or BigQuery)
  • Stable APIs or direct connections, not brittle exports
  • Reasonable latency (if it takes too long to load, people will move on)
  • Clarity about who owns access and who supports it when something goes wrong

Accessibility isn’t about being open to everyone; it’s about being reliable to the right people.

3. Scalability: Will it keep up as your business grows?

Some sources feel light and simple at first. A spreadsheet works fine for 200 rows, and a manual export works when it’s once a month. However, as usage grows and data volume increases, small cracks turn into roadblocks.

Ask:

  • Can the source support higher volume without degrading performance?
  • Will schema changes break downstream reports?
  • Is the source part of a system that can grow with your team?
  • Does it rely on one-off manual steps, or is it built for repeatability?

Scalability is about more than infrastructure. It’s about whether a source can evolve with the questions your team will ask next quarter, not just today.

4. Strategic relevance: Does it align with what you’re trying to learn?

A third-party dataset may appear impressive, but if it doesn’t support a real use case or overlaps with existing fields, it adds noise rather than value. This is where business context matters. Ask:

  • Does this source help measure or influence a core KPI?
  • Are the people using this data the ones requesting it?
  • Has the source been validated with actual decision-makers?
  • If it disappeared tomorrow, what would break, and who would notice?

Strategic sources support active conversations. Everything else is a distraction.

How different departments prioritize

Different teams care about different things, which is why the idea of a “best” data source can be misleading. The right source depends on the question being asked, and those questions rarely come from the same place. What’s helpful to one group might feel irrelevant or even disruptive to another. Sourcing isn’t just a technical choice; it’s a reflection of priorities.

Finance teams tend to look for stability above all else. Their reports power board meetings, investor decks, and quarterly close, so they need consistent and dependable data. Freshness matters, but not if it comes at the expense of auditability. A single version of revenue is more valuable than three options that shift from one week to the next. Their ideal sources are structured, slow to change, and tied to definitions that won’t move without notice. 

Marketing, on the other hand, often works across fragmented tools and shifting campaigns. Data flows in from ad platforms, email systems, web analytics, and attribution models that evolve every time a strategy does. For them, context and timeliness matter more than rigid consistency. They need to see what’s working in near real-time, even if it means dealing with less-than-perfect joins or occasional gaps. That tradeoff makes sense if you optimize mid-campaign; perfect isn’t always the goal. Direction is.

Product teams think in terms of behavior, focusing on how users interact. Retention curves, usage frequency, and drop-off points rely on event streams, session data, and raw logs. Those sources are rarely clean; their challenge isn’t volume but alignment. What counts as an “active user” in Product might not match what Finance sees. Unless those definitions are coordinated, two teams looking at the same chart might walk away with very different stories. 

Operations takes a more transactional view. The stakes are often immediate: support volumes, fulfillment times, vendor performance. Their systems update frequently, and delays, even by minutes, can lead to problems. They need reliable inputs that refresh on schedule and don’t require a manual fix every time something changes upstream. For them, clarity and speed matter more than elegant models.

What’s important to remember here is that no source is universally “good” or “bad.” A low-latency API might be perfect for Ops, but it would be entirely unhelpful for Finance. A fixed warehouse snapshot might serve Finance well, but frustrate Product with its lag. The point isn’t to standardize every source across teams. It’s to build awareness of what each team values, and why. When you understand the context behind the ask, it’s easier to choose sources that support the decision at hand.

Signs it’s time to sunset a data source

It’s not always obvious when a data source has outlived its usefulness. By the time issues surface, like conflicting metrics, slow reports, or teams asking for rebuilds, the symptoms often look like a dashboard problem rather than a sourcing one. But many of these frustrations trace back to the same cause: a source that no longer belongs. Poor sources stay hooked into dashboards no one checks anymore, keep feeding metrics that no longer match how the business operates, and do more harm than good over time.

Here are some signs a data source may need to be retired:

It lacks a clear owner. 

If no one can confidently say where the source came from or who maintains it, it’s a risk. Orphaned sources are hard to trust. They tend to break quietly, without anyone noticing until something downstream stops working. Ask: If this source had an issue tomorrow, who would know how to fix it?

It doesn’t match how the business works anymore. 

A dataset built around last year’s go-to-market plan might still be technically accurate, but if your team has redefined customer segments, updated your sales process, or changed core KPIs, the logic may no longer hold. When source definitions no longer reflect how teams operate, they quietly introduce friction into every analysis built on top of them.

It creates conflicting versions of the same metric. 

If your team has multiple dashboards reporting “revenue,” and each pulls from a different source or uses different joins, filters, or update cadences, you likely have a redundancy problem. In these cases, the best thing to do may be to deprecate the less reliable source and standardize around a single version.
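Before deprecating one of two overlapping sources, it helps to quantify how far apart they actually are. A hedged sketch comparing two monthly revenue series (the source names, figures, and tolerance are invented for illustration):

```python
def reconcile(series_a, series_b, tolerance=0.01):
    """Return periods where two versions of a metric diverge by more
    than `tolerance` (relative), plus periods only one source covers."""
    diverging, missing = [], []
    for period in sorted(set(series_a) | set(series_b)):
        a, b = series_a.get(period), series_b.get(period)
        if a is None or b is None:
            missing.append(period)
        elif abs(a - b) > tolerance * max(abs(a), abs(b)):
            diverging.append(period)
    return diverging, missing

billing = {"2025-01": 100_000, "2025-02": 105_000, "2025-03": 98_000}
finance_csv = {"2025-01": 100_000, "2025-02": 112_000}

print(reconcile(billing, finance_csv))
# February diverges; March exists in only one source
```

A report like this gives the deprecation conversation something concrete to anchor on: which periods disagree, and which source has gaps.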

It adds noise instead of value. 

Sometimes sources remain active simply because they’ve always been there. But when you audit usage and find that no one is referencing the data, or that it’s only being copied elsewhere “just in case,” it’s worth asking whether it needs to stay connected.
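A usage audit can start as something very simple: a map of dashboards to the sources they reference, inverted to find sources nothing points at. The structures below are illustrative; in practice this mapping would come from your BI tool’s lineage metadata:

```python
def unused_sources(all_sources, dashboard_sources):
    """Sources no dashboard references: candidates for disconnection."""
    referenced = (
        set().union(*dashboard_sources.values()) if dashboard_sources else set()
    )
    return sorted(set(all_sources) - referenced)

all_sources = ["billing", "web_events", "legacy_targets", "crm"]
dashboards = {
    "exec_overview": {"billing", "crm"},
    "funnel": {"web_events", "crm"},
}
print(unused_sources(all_sources, dashboards))  # ["legacy_targets"]
```

An empty result doesn’t prove every source earns its keep, but a non-empty one is a ready-made shortlist for the next sourcing review.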

It consistently requires manual fixes. 

A source that fails quietly or requires frequent intervention to run properly is a liability. Even if the data is good when cleaned, the ongoing cost to maintain it may outweigh its benefit. If a source needs hands-on help every week, it might be time to rethink whether it belongs in the stack.

Retiring a data source isn’t about hitting delete and hoping for the best. It’s a process that deserves as much thought as adding a new source in the first place. That means taking time to map out what depends on it, surfacing the dashboards that pull from it, and giving teams a heads-up before anything changes. Sigma’s lineage features make it easier to spot where a source is being used, but the call to phase something out usually starts with noticing a pattern.

When in doubt, ask this: If we removed this source tomorrow, would anyone notice? If the answer is no, or if the only person who would notice is no longer on the team, you have your answer.

Building a better sourcing habit

More data isn’t always the solution. If anything, it's often the reason things start to feel more complicated than they need to be. When dashboards feel bloated, definitions shift between teams, and meetings start with “Which number is right?”, the problem is usually sourcing. Choosing the right data sources reflects what your team values, how your business operates, and whether your insights are built to keep up with change or stuck in cleanup mode. The goal isn’t to simplify for simplicity’s sake; it’s to make space for clarity.

Start with the questions that matter, connect only the datasets that still support those questions, and let go of the ones that don’t. There’s no perfect list of sources; what matters is whether each one still fits. Sourcing should be treated as an evolving habit, not a one-time setup. Make room to revisit your sources quarterly, during roadmap planning, or whenever reporting friction creeps in.

Removing a source is a sign your analytics are maturing.
