Making Sense of the Different Types of Data Structures
There is a seemingly endless amount of data collected in our modern world. But what types of data do organizations collect? And how is it different? Depending on your industry and analytics use case, your data and analytics infrastructure can vary dramatically.
At a high level, there are two main types of data used in big data analytics: quantitative data and qualitative data. From there, data can take different shapes dependent upon the nature of the information, and how it’s collected, stored, and analyzed.
Let’s dig in and break down the key differences.
Quantitative vs. qualitative data
Quantitative data consists of hard numbers. Think of it as things that you can count or measure. Quantitative analysis techniques include:
- Regression – Predicts the relationship between a dependent variable and one or more independent variables.
- Classification (probability estimation) – Predicts which class an individual belongs to, or calculates a score for how likely that membership is. In a content marketing example, you could define two classes, “Will consume” and “Will not consume,” and predict the likelihood of success for an individual piece of marketing collateral.
- Clustering – Groups individuals in a population based on their similarities.
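To make the first technique in the list concrete, here is a minimal sketch of a simple linear regression using ordinary least squares. The ad spend and sales figures are made up for illustration, and the `fit_line` helper is a hypothetical name, not part of any particular analytics tool.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: monthly ad spend (in $1,000s) and units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]      # independent variable
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]  # dependent variable

slope, intercept = fit_line(ad_spend, units_sold)

# Use the fitted relationship to estimate sales at a new spend level.
predicted = slope * 6.0 + intercept
print(f"units ~= {slope:.1f} * spend + {intercept:.1f}; at 6.0 -> {predicted:.1f}")
```

Real data is rarely this tidy, so in practice you would also check how well the line fits before relying on its predictions.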
Qualitative data is more subjective and less structured than quantitative data. In the realm of business, you encounter qualitative data from customer surveys and interviews. Common analysis methods include:
- Content analysis – Used to classify and categorize different types of text and media.
- Narrative analysis – Analyzes content from various sources, including interviews and field observations.
As you conduct your analysis, make sure your metrics are in the format that your company is already using. For example, if your company budgets quarterly, your metrics should reflect the same.
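As a rough illustration of content analysis, the sketch below sorts free-text survey comments into categories by keyword matching. The categories, keywords, and comments are all hypothetical, and real content analysis usually relies on a richer coding scheme or a language model rather than simple keyword lookups.

```python
from collections import Counter

# Hypothetical coding scheme: category name -> keywords that signal it.
CATEGORY_KEYWORDS = {
    "pricing": {"price", "expensive", "cost"},
    "support": {"support", "help", "response"},
}


def categorize(comment):
    """Return the sorted categories whose keywords appear in the comment."""
    words = {w.strip(".,!?").lower() for w in comment.split()}
    matches = sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if words & keywords
    )
    return matches or ["other"]


# Hypothetical survey responses.
comments = [
    "The price is too expensive.",
    "Support was quick to help!",
    "Love the new dashboard",
]

# Tally how often each category appears across all comments.
counts = Counter(cat for c in comments for cat in categorize(c))
print(counts)
```

Turning comments into category counts like this is one way to put qualitative feedback alongside quantitative metrics.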
Read our Definitive Guide to Big Data Analytics for a deeper dive.
Structured data vs. unstructured data
Data, whether quantitative or qualitative, can take multiple shapes depending on the nature of the information, how it’s collected, where it’s stored, and whether humans or machines created it. There are two primary categories to take into account, structured data and unstructured data, with semi-structured data falling in between.
Structured data is information that is rigidly formatted so that it’s easily searchable in a relational database. It’s usually quantitative information. Examples include names, dates, emails, prices, and other information we’re used to seeing stored in spreadsheets under column titles organized by category. Think of a company’s CRM, ERP, or email database information.
Structured data is organized and machine-readable, making it easy to add, search, or manipulate within a relational database using SQL. Consider an e-commerce purchase, for example. The information collected at the point of sale may include the product name, date of purchase, price, UPC, payment method, and customer information, all of which is easy to search or analyze later to spot a trend or answer a question.
At first glance, it can be hard to extract insights from raw structured data alone. But using an analytics tool, you might uncover interesting trends, such as customers in Boston buying a specific product at a higher rate in February and March. This insight could lead you to increase your retail store’s stock of that item during those months to meet regional demand.
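The kind of question described above maps naturally onto a SQL query. Here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `purchases` table, its columns, and the rows are hypothetical stand-ins for real point-of-sale data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A rigid, predefined schema is what makes this data "structured".
conn.execute(
    """
    CREATE TABLE purchases (
        product TEXT,
        purchase_date TEXT,  -- ISO 8601 date string
        price REAL,
        city TEXT
    )
    """
)

# Hypothetical point-of-sale records.
rows = [
    ("snow shovel", "2023-02-03", 24.99, "Boston"),
    ("snow shovel", "2023-02-17", 24.99, "Boston"),
    ("snow shovel", "2023-03-05", 24.99, "Boston"),
    ("snow shovel", "2023-07-11", 24.99, "Boston"),
    ("snow shovel", "2023-02-09", 24.99, "Austin"),
]
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)

# Count Boston purchases of one product per calendar month.
query = """
    SELECT strftime('%m', purchase_date) AS month, COUNT(*) AS units
    FROM purchases
    WHERE city = 'Boston' AND product = 'snow shovel'
    GROUP BY month
    ORDER BY month
"""
boston_by_month = conn.execute(query).fetchall()
print(boston_by_month)
```

Because every row follows the same schema, the filter and grouping need no special parsing, which is the main practical advantage of structured data.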
Unstructured data is quite the opposite of structured data. It’s usually qualitative data, and it’s challenging to search, manipulate, and analyze using a traditional database or spreadsheet. Common examples include images, audio files, document formats, or someone’s social media activity.
Unstructured data isn’t easily read or analyzed in a relational database because it lacks a predefined data model, so it typically requires a non-relational (NoSQL) database or data lake to store and search. Extracting insights from this type of data requires the use of advanced analytics techniques such as data mining, data stacking, and statistics.
Semi-structured data falls somewhere between structured and unstructured data formats. This data has clearly defined characteristics but lacks a rigid, relational structure. It includes semantic tags or metadata that create hierarchies of classification—making it more machine-readable during analysis.
Semi-structured data makes up a large share of the data generated in the world today. Just think about all the photos taken daily. Semi-structured data is most often associated with mobile applications, devices, and the Internet of Things (IoT).
A familiar everyday example is a smartphone photo. While the image content of a typical smartphone photo is unstructured, the file is timestamped, geotagged, and carries identifiable information about the device itself. Common semi-structured data formats include JSON, CSV, and XML.
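To show what that looks like in practice, here is a small sketch that parses a JSON record describing a photo. The field names and values are invented for illustration and do not follow any specific camera or metadata standard.

```python
import json
from datetime import datetime

# Hypothetical metadata record for a smartphone photo. The nested keys act
# as semantic tags, but there is no fixed relational schema behind them.
photo_record = """
{
  "file": "IMG_0001.jpg",
  "taken_at": "2023-02-14T09:30:00",
  "location": {"lat": 42.3601, "lon": -71.0589},
  "device": {"make": "ExamplePhone", "model": "X1"}
}
"""

metadata = json.loads(photo_record)

# Tags make individual fields easy to pull out and convert to typed values.
taken_at = datetime.fromisoformat(metadata["taken_at"])
latitude = metadata["location"]["lat"]

# Optional fields can simply be absent, so read them defensively.
caption = metadata.get("caption", "(no caption)")

print(taken_at.year, latitude, metadata["device"]["model"], caption)
```

The image pixels themselves stay unstructured. Only the metadata around them is easy for a machine to navigate.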
Insights from unstructured and semi-structured data help companies understand things like customer sentiment and preferences, buying habits, and more. These types of data are more challenging to analyze, but with the right resources, you can uncover intelligence that gives you a competitive advantage.
Analyzing all types of data
When it comes to analyzing various types of structured and unstructured data, you need the right data infrastructure, analytics tools, and process in place to be successful. Depending on the types of data you’re working with, and the end goal of your analysis, you will need to consider building a data lake or data warehouse for your organization.
Without the right processes and analytics stack in place, generating insights from your organization’s data will be difficult. How you collect, process, and analyze data is just as important as the raw data itself, if not more so.
Here’s a quick breakdown of the analytics process. For a more comprehensive review, read our Definitive Guide to Data Analytics.
Understanding data goals and requirements
A clear understanding of company goals and needs will inform how you approach big data analytics from the start. What type of data will you collect? How will you store it? Who will analyze it?
Collecting and centralizing data for analysis
Once you’ve got a clear understanding of your goals, you need to extract data from your systems and applications and transfer it to the data warehouse or data lake. That’s where an ETL (extract, transform, load) or ELT (extract, load, transform) solution comes into play.
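As a rough sketch of the extract, transform, and load steps, the example below reads a CSV export, cleans it, and loads it into an in-memory SQLite database standing in for a warehouse. The `raw_export` data, the `orders` table, and the function names are all hypothetical. A production pipeline would normally use a dedicated ETL or ELT tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source application (note the messy values).
raw_export = "order_id,amount,city\n1, 19.99 ,boston\n2,5.00,Austin\n"


def extract(text):
    """Extract: read rows out of the source system's CSV export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and normalize city names."""
    return [
        (int(r["order_id"]), float(r["amount"]), r["city"].strip().title())
        for r in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the central store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, city TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(warehouse, transform(extract(raw_export)))

loaded = warehouse.execute(
    "SELECT order_id, amount, city FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

In an ELT flow, the order of the last two steps is swapped: the raw rows are loaded first and the cleanup runs inside the warehouse.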
Modeling data for analysis
Data can technically be analyzed once it’s in a central data store. But before you open the doors to the warehouse, you may want to consider a data model first. Data modeling defines how data is related, what it means, and how it flows together.
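One simple way to picture a data model is as a set of tables with explicit relationships between them. The sketch below uses Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The model states what each table means and how the tables relate:
# every order belongs to exactly one customer via customer_id.
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'Boston'), (2, 'Grace', 'Austin');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);
    """
)

# Because the relationship is defined, analysts can join the tables
# the same way every time and get consistent answers.
revenue_by_city = conn.execute(
    """
    SELECT c.city, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.city
    ORDER BY c.city
    """
).fetchall()
print(revenue_by_city)
```

Agreeing on relationships like this up front helps keep different teams from calculating the same metric in conflicting ways.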
With data collected, processed, stored and modeled in a queryable data warehouse, you will then need an analytics tool that is up to the task of searching through all that data and returning actionable insights to steer business decisions.
Interpreting insights and informing decisions
Using various types of analytics methods, you can uncover all kinds of insights from company data. You can analyze the past, track operations in real time, and even predict what might happen in the future.