August 6, 2025

How Query Federation Helps You Analyze Data Without Moving It

You’re working across half a dozen tools and know the data exists, but you just can’t reach all of it when you need it. By the time that report finally comes together, you're left wondering whether it's already out of date. You rerun a query, refresh a dashboard, pull another export. It starts to feel like the tools are working against each other, and somehow you're stuck in the middle. That’s not a workflow; it’s patchwork.

This isn’t just about wasted time. It’s about missed context. When different sources can’t be analyzed side by side, you make decisions in fragments. You rely on gut feel instead of complete insight. And even if you have access to a data team, the wait time to get everything stitched together can stretch for days.

The reality for most teams is that the structure, the systems, and the sync schedules just aren’t built for the kind of cross-source exploration modern work demands. This blog post explores an approach designed to reduce that friction. It’s not a silver bullet, and it won’t replace all your pipelines, but it offers a way to analyze data where it already lives, without waiting for it to be moved, copied, or transformed.

What life looks like without federation

It usually starts with a simple question like: “Can we compare product usage trends with churn for the last quarter?” The answer depends on three systems, two exports, one person on vacation, and a dashboard that hasn’t refreshed since Monday. By the time you piece it all together, the moment for action has passed. Most modern data stacks are still built on a model that assumes data needs to be copied and centralized before it’s usable. So every time someone asks a cross-functional question, like blending marketing campaigns with CRM activity or comparing finance data with supply chain updates, the default response is to move more data.

That means more ETL pipelines, scheduled syncs, and storage. When things don’t line up, people fall back to spreadsheets. The result is more places to look for answers, but no single place to trust. You might have one warehouse for structured analytics, another system for operational data, and SaaS tools that generate valuable context. Still, none of them talk to each other unless someone intervenes, usually by hand or by submitting a ticket that gets deprioritized behind month-end reporting.

For individual contributors, this creates a constant tradeoff between waiting for the right data and making do with whatever is already accessible. Either way, analysis takes longer, and the insights arrive after the decision has already been made. No one designed it this way. It’s just what happens when data is scattered, and access relies on movement instead of connection.

What is query federation?

Query federation is a method of analysis that lets you ask a single question across multiple data sources without relocating the data. That means you can join customer details in your CRM with transaction records in a warehouse, or compare forecasts in a spreadsheet with inventory in your ERP system, all without duplicating anything.

Rather than building another integration or waiting for the next batch job, query federation works by referencing data where it already resides. It uses metadata and virtual connections to bring the pieces together temporarily, only while the query runs. Nothing gets pulled into a central repository, and nothing gets stored twice. Think of it like reading from books held in several different libraries. You don’t need to move all the books into one building to read a passage from each. You just need the right access and the ability to understand what’s in each one. Federation makes that possible by letting you read across your systems without restructuring them.
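To make the idea concrete, here is a minimal sketch in Python. It is not how any particular federation engine works: it uses SQLite's `ATTACH DATABASE` as a stand-in, with two separate in-memory databases playing the roles of a warehouse and a CRM. The table names, columns, and the `revenue_by_customer` helper are all hypothetical. The point is only that a single query can join tables that live in different databases without copying one into the other.

```python
import sqlite3

# Two separate in-memory SQLite databases stand in for two independent systems.
# "main" plays the warehouse; the attached "crm" database plays the CRM.
# (Assumption: a real federation engine would connect to remote systems instead.)
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")

# Hypothetical tables, one per "system".
conn.execute("CREATE TABLE main.transactions (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")

conn.executemany(
    "INSERT INTO main.transactions VALUES (?, ?)",
    [(1, 120.0), (1, 30.0), (2, 75.5)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)


def revenue_by_customer(connection):
    """Join CRM names with warehouse transactions in one query."""
    query = """
        SELECT c.name, SUM(t.amount) AS total
        FROM crm.customers AS c
        JOIN main.transactions AS t ON t.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return connection.execute(query).fetchall()


rows = revenue_by_customer(conn)
print(rows)
```

Neither table is copied into the other database; the join exists only for the duration of the query, which is the behavior the paragraph above describes.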

This approach stands in contrast to traditional ETL pipelines, where data is extracted from its source, transformed to fit a unified model, and then loaded into a centralized platform. While ETL still plays a role for long-term storage and reporting, it tends to slow things down when decisions need to happen in hours, not weeks. Query federation doesn’t eliminate ETL, but it gives analysts and domain experts an alternative when timeliness matters more than architectural perfection.

For many teams, federation is the first step toward working across systems without asking for engineering help. It shortens the path between a question and an answer, and opens the door to analysis that isn’t limited by what’s already been copied into the warehouse.

Why this approach matters now

Most analysts don’t need to be told that data is scattered. Every project seems to involve another tool, another login, another export. Even with a modern stack, it’s common to juggle three or more systems just to answer a single question.

Part of the challenge is scale. The number of SaaS apps in use across organizations has exploded. The average organization uses over 60 different SaaS tools. That’s not including spreadsheets, shared drives, APIs, or internal databases that also hold relevant data. Teams can’t realistically funnel all of that into a single source fast enough to meet business needs.

Storage isn't the bottleneck anymore; access is. The effort to centralize everything through pipelines and integrations only works up to a point. Copying data adds cost, increases security exposure, and slows things down. Once data is moved from its source, it also becomes harder to verify. What started as a “single source of truth” turns into multiple versions of the same number.

For teams working under regulatory obligations, like HIPAA or GDPR, the stakes are even higher. Moving sensitive data can trigger legal reviews, access audits, or compliance work that delays projects. In some cases, teams end up avoiding specific sources altogether just to sidestep the overhead.

Query federation offers a different model. Rather than forcing every dataset into the same location, it connects to them where they are. Instead of asking, "How do we move this data?" federation asks, "Do we even need to?" That shift recognizes that the way people work has changed. Teams need context faster, analysts need answers without delay, and no one has time to wait for a pipeline to catch up.

How modern platforms support federation

The promise of query federation sounds simple, but not every platform makes that promise usable. For federation to feel natural inside a business intelligence tool, the platform has to do more than link sources. It has to help people work across them without stepping into the complexity they didn’t sign up for. In most cases, that means integrating with a federated query engine or supporting a virtualization layer. These are the technical pieces that allow systems to “see” each other, even if the data stays put. The connection is just the beginning. What matters more is whether analysts and domain experts can do something with it.

In Sigma, that experience is built directly into the workbook interface. Users can explore tables from multiple sources, like Snowflake and CSV, side by side, and join them visually without writing SQL. Once the connection is made, it behaves just like any other dataset in the workbook. Filters, inputs, and aggregations apply across both sources in real time, without needing to move the data or wait for a refresh.

What makes this practical is the ability to work with federated data the same way you would work with anything else: calculate metrics, build dashboards, and experiment with logic, all from the same canvas. That parity matters. It means users don’t have to switch mental models or tools depending on where the data lives. It also reduces mismatches between sources and allows for flexible modeling without rewriting everything from scratch.

The takeaway here is that federation is possible, and with the right interface, it’s usable. When platforms take care of the technical overhead, contributors can focus on asking better questions.

Everyday wins: How teams use query federation

Most conversations about query federation lean technical, but its impact shows up in daily work. The real advantage is in how much less manual the process becomes as you stop chasing exports, asking for refreshes, or maintaining side spreadsheets just to answer basic questions.

Consider a finance analyst reviewing headcount projections. The payroll system lives outside the data warehouse, the ERP tool holds cost centers, and the budget plan lives in a shared spreadsheet. Without federation, those numbers don’t come together unless someone copies them manually or builds a custom pipeline. With federation, they stay where they are and still connect, so the analyst can compare forecasted spend against actual payroll and drill into discrepancies without breaking context.

A marketing team working across platforms often encounters similar challenges. Campaign performance data may be in Google Ads, with lead behavior in HubSpot and spend tracking in a finance dashboard. Stitching those together used to mean waiting for someone to build a merged view or doing it by hand. Federation lets them analyze performance and attribution across these tools directly, with no middle layer required.

Product teams benefit too. Usage logs in Snowflake can be joined with survey results sitting in a Google Sheet. This makes it possible to analyze how users behave alongside what they say about that experience. There’s no need to load the feedback into the warehouse first; it stays in its source and remains connected to the broader analysis.

Operations teams also rely on dozens of tools, from scheduling platforms to logistics databases to vendor portals. Federation gives them a way to pull a signal from that noise. They don’t have to rebuild the stack; they just need a path to see across it.

These are typical questions that happen to span systems, and federation makes them routine.

Clearing up common misconceptions

The concept of analyzing data without moving it raises eyebrows. For some, it sounds too flexible to be reliable. For others, it feels like a shortcut that must come with trade-offs. These hesitations are especially valid for analysts who’ve spent years cleaning up after broken joins, mismatched schemas, or refresh failures. Let’s look at what federation does and doesn’t do.

Performance

People often assume querying across sources must be slower than querying a single warehouse. And in some systems, that’s true. If a platform treats each connection as a separate job, with no optimization or caching, query time suffers. But modern BI platforms and query engines have addressed this with intelligent planning. When done right, a federated query pulls only the data it needs from each source, and it does so in a way that’s designed for responsiveness.
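One common form of that planning is pushing filters down to the source, so that rows are discarded before they travel anywhere. The sketch below is a toy illustration under stated assumptions, not a description of any specific engine: the `Source` class, the `rows_sent` counter, and the sample `orders` data are all hypothetical, and a plain list stands in for a remote system.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A hypothetical remote source that counts how many rows it ships out."""

    rows: list
    rows_sent: int = 0

    def scan(self, predicate=None):
        # With a predicate, filtering happens at the source (pushdown),
        # so only matching rows cross the "network".
        selected = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_sent += len(selected)
        return selected


def is_last_quarter(row):
    return row["quarter"] == "Q2"


# 100 sample orders; every fourth one falls in the quarter of interest.
orders = [{"id": i, "quarter": "Q2" if i % 4 == 0 else "Q1"} for i in range(100)]

# Naive plan: fetch everything, then filter locally.
naive = Source(rows=list(orders))
naive_result = [r for r in naive.scan() if is_last_quarter(r)]

# Pushdown plan: hand the filter to the source.
pushed = Source(rows=list(orders))
pushed_result = pushed.scan(is_last_quarter)

print(naive.rows_sent, pushed.rows_sent)  # prints "100 25"
```

Both plans return the same rows, but the pushdown plan moves a quarter of the data. Real engines apply the same idea to column selection and aggregation where the source supports it.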

Accuracy

There’s also concern about accuracy. When data stays in separate systems, who defines the metrics? How do we know the columns line up? These are legitimate questions. Good federation relies on consistent metadata and clear logic layers. That’s why tools like Sigma put modeling controls in the hands of analysts, not buried inside a warehouse ETL script. The logic becomes visible, testable, and adjustable within the same workbook that runs the analysis.
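As a rough illustration of what checking that "the columns line up" can mean in practice, the sketch below compares the join keys of two sources before a query runs. Everything in it is an assumption made for the example: the `join_key_mismatches` helper, the schema dictionaries, and the type names are hypothetical and do not reflect how Sigma or any other tool implements its modeling controls.

```python
def join_key_mismatches(left_schema, right_schema, keys):
    """Return a list of human-readable problems with the given join keys.

    Each schema is a plain dict mapping column name to a type name.
    """
    problems = []
    for key in keys:
        left_type = left_schema.get(key)
        right_type = right_schema.get(key)
        if left_type is None or right_type is None:
            problems.append(f"{key}: missing from one source")
        elif left_type != right_type:
            problems.append(f"{key}: {left_type} vs {right_type}")
    return problems


# Hypothetical schemas for a warehouse table and a spreadsheet.
warehouse_usage = {"customer_id": "integer", "event_date": "date", "events": "integer"}
survey_sheet = {"customer_id": "text", "score": "integer"}

issues = join_key_mismatches(warehouse_usage, survey_sheet, ["customer_id", "event_date"])
print(issues)
```

Here the check would flag that `customer_id` is an integer in one source and text in the other, and that `event_date` exists in only one of them, which are exactly the kinds of mismatches that should be visible before a join is trusted.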

Security

Security is another area of pushback. It may seem like connecting directly to external sources introduces more risk. In reality, federation can reduce exposure. Since the data isn’t copied, there are fewer duplicates to monitor, fewer storage buckets to protect, and fewer places where permissions need to be synced. Analysts query live data, but governance stays with the source. That means source systems can enforce policies without relying on secondhand controls.

Replacement

Finally, there’s the myth that federation replaces everything. It doesn’t. It won’t solve every modeling issue or take the place of a well-managed warehouse. But it can handle the kinds of questions that pop up every day, the ones that don’t justify building a whole new pipeline just to get an answer. For that tier of work, it offers an option that’s faster, safer, and more direct.

What this means for data access

When people talk about “self-service analytics,” they’re often describing a goal that feels just out of reach. Business users want answers, analysts want flexibility, and engineers want to protect systems without becoming gatekeepers. These needs aren’t at odds, but the tooling has made it feel that way.

Query federation shifts that balance. It doesn’t require users to learn a new platform or manage complex backend logic; it lowers the friction. Analysts can explore questions that span multiple sources without waiting for someone to copy the data first, and business teams don’t need to submit tickets to get a blended view of marketing spend and revenue impact. If the platform supports it, they can build that view themselves.

That’s where federation makes a difference: it changes how those blended views are built. When data is analyzed where it lives, the permissions travel with it. Role-level controls stay intact. Definitions remain consistent. And users can interact with the source without needing to bypass governance to get clarity.

In Sigma, this shows up in how workbooks are structured. People can prototype new metrics across systems using the same workbook logic they already understand. The data stack may be complex, but the experience doesn’t have to be. When everyone has access to a broader context, it changes how questions are asked, and decision-making starts from a wider, more complete view of what’s happening across the business.

Reframing data federation’s role in analytics

Query federation starts from three ideas: data doesn’t need to live in one place to be useful, logistics shouldn’t delay answers, and context should be accessible rather than stitched together after the fact.

For a long time, analytics tools have treated centralization as a requirement. They’ve been built around the notion that data must be extracted, standardized, and stored before it can be trusted. That approach works well for systems of record and long-term reporting, but it’s a poor fit for the kind of questions people ask on the fly.

Federation gives teams a way to keep working, even when the data hasn’t all landed in the same bucket. It reduces the need for back-and-forth and respects the source of the data, without slowing down the people trying to analyze it. This shift doesn’t mean pipelines go away or warehouses become less important; it means they don’t have to do everything. Federation fills the space in between where curiosity lives, where urgency builds, and where fast decisions often need the full picture.

For individual contributors and analysts, momentum is often missing from analytics work. When access isn’t delayed, exploration continues without interruption. With complete context, comparing sources becomes straightforward. Instead of working around limitations, teams can start with the data already in front of them. That’s a better way to work.
