Select Page


Easy JSON Unravelling

Data Challenges in Semi-Structured Data

In the old days of data warehouses, enterprises had to be careful about what went into their warehouses, knowing they’d have to pay dearly for every byte of data they stored. As cloud technologies drove the storage costs dramatically down, these enterprises could loosen the restrictions on what data (and how much of it) they stored in a data warehouse, opting to do more of the cleaning and transformation using SQL, inside the warehouse, rather than using procedural code to transform the data on its way into the warehouse.

Amazon’s Redshift data warehouse presaged this evolution, but even though Redshift handles all manner of relational data operations, it still treats JSON (JavaScript Object Notation) and other semi-structured data as text, diminishing its power and flexibility.  Event-based online data such as website interactions, mobile apps, banking and retail transactions, Internet of Things (IoT) are commonly fired as semi-structured files.  Redshift’s primary competitors – Snowflake and Google BigQuery – both offer full support for such semi-structured data.

JSON contains readable tags and an implicit organizational structure, but it’s a far cry from the rows and columns and rules of structured data. The structure is not fixed, and the files can contain an arbitrary depth of nesting.  Flattening this data on its way into the data warehouse can result in lower resolution data and the loss of any detail that doesn’t conform to a predefined relational schema.  Conversely, having semi-structured data like JSON in the warehouse creates a semantics challenge for anyone trying to curate the data for the downstream consumers.  

To complicate matters further, the challenges of working with semi-structured data only grow as the volume, velocity, variety, and volatility of the data increase.

We are rewriting the rules of analytics. Sigma empowers domain experts to join the data conversation, answer the toughest questions, and drive insights.