Easy JSON Unravelling

Data Challenges in Semi-Structured Data

In the old days of data warehouses, enterprises had to be careful about what went into their warehouses, knowing they’d have to pay dearly for every byte of data they stored. As cloud technologies drove the storage costs dramatically down, these enterprises could loosen the restrictions on what data (and how much of it) they stored in a data warehouse, opting to do more of the cleaning and transformation using SQL, inside the warehouse, rather than using procedural code to transform the data on its way into the warehouse.

Amazon’s Redshift data warehouse presaged this evolution, but even though Redshift handles all manner of relational data operations, it still treats JSON (JavaScript Object Notation) and other semi-structured data as text, diminishing its power and flexibility. Event-based online data, such as website interactions, mobile app activity, banking and retail transactions, and Internet of Things (IoT) readings, is commonly emitted as semi-structured files. Redshift’s primary competitors – Snowflake and Google BigQuery – both offer full support for such semi-structured data.

JSON contains readable tags and an implicit organizational structure, but it’s a far cry from the rows, columns, and rules of structured data. The structure is not fixed, and the files can contain an arbitrary depth of nesting. Flattening this data on its way into the data warehouse can result in lower-resolution data and the loss of any detail that doesn’t conform to a predefined relational schema. Conversely, having semi-structured data like JSON in the warehouse creates a semantics challenge for anyone trying to curate the data for downstream consumers.
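
To make the flattening trade-off concrete, here is a minimal Python sketch. The event payload, the field names, and the two-column schema are all hypothetical, made up for illustration. It shows how forcing a nested JSON event into a predefined relational schema silently drops whatever the schema did not anticipate.

```python
import json


def flatten(value, prefix=""):
    """Recursively flatten nested dicts and lists into one dict keyed by path.

    Note: empty dicts and empty lists produce no keys in this sketch.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat


# Hypothetical event payload with two levels of nesting and a repeated field.
raw_event = """
{"event": "purchase",
 "user": {"id": 42, "device": {"os": "iOS"}},
 "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}
"""
event = json.loads(raw_event)
flat_event = flatten(event)

# Hypothetical predefined relational schema: only these columns are loaded.
SCHEMA = ["event", "user.id"]
row = {column: flat_event.get(column) for column in SCHEMA}

# Everything the schema did not anticipate is lost on the way in.
dropped = sorted(set(flat_event) - set(SCHEMA))

print(row)      # the row that reaches the warehouse
print(dropped)  # the detail that does not
```

In this sketch the device and line-item details never reach the warehouse, which is the loss of resolution described above. Storing the raw JSON instead keeps every field, but leaves the work of interpreting paths like `items[1].sku` to whoever reads the data later.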

To complicate matters further, the challenges of working with semi-structured data only grow as the volume, velocity, variety, and volatility of the data increase.

How Sigma Helps With Semi-Structured Data

Sigma understands that the point of analytics is to improve business processes and outcomes—and that the best way to move that needle is to make it fast and easy for business domain experts to ask and answer questions, leveraging their full powers of creativity, curiosity, and intuition.

The logic of these explorations and analyses is always expressible in SQL and in spreadsheet syntax, which facilitates far more effective communication between data experts and business experts. Instead of requiring business experts to also become JSON and SQL experts, or requiring data technicians to become experts in Salesforce, NetSuite, Marketo, and other business paradigms, Sigma provides an interface that connects directly to an enterprise’s cloud data warehouse, one that is as easy and comfortable to use as a spreadsheet. Domain experts can find answers to any questions they come up with, and they can explore the data to follow up on new lines of inquiry.

This shift enables the data experts to spend more of their attention and creative energy on the technical work only they can perform, like systems architecture, analytical deep dives, performance tuning, governance, and data compliance.

Sigma unlocks the benefits of semi-structured data – expedience, transparency of data lineage, flexibility to add new fields, depth, breadth – without the classic trade-offs of cost, confusion, and technical training. By empowering the business experts to explore and comprehend raw data, Sigma helps businesses ask more of their data.