Data Visualization: The Definitive Guide
Content Marketing Manager, Sigma
How many times have you heard the expression “A picture is worth a thousand words”? Nowhere is this truer than with data, especially big data. Research from the World Economic Forum reveals that the world produces a whopping 2.5 quintillion bytes of data every single day. With so much information at our fingertips, how can we possibly manage and make sense of it all?
The task of interpreting your data and presenting the findings to others can be incredibly daunting. If you present the data ineffectively, you risk weakening your message and damaging your reputation. The good news is you don’t need to be a statistician or even a hardcore data nerd to crack the data code.
Enter data visualization.
This guide will walk you through the fundamentals and best practices of data visualization, along with the most common types of data visualization examples and some guidance on how to choose a best-in-class data visualization tool. We’ve outlined everything you need to know about data visualization and how to use it to tell a compelling story for your organization.
What is data visualization?
Data visualization describes any effort to help people understand the significance of data by placing it in a visual context. We use visualization to quickly make sense of data from which it would otherwise be difficult to draw conclusions. The main objective is to distill large datasets into visual graphics that make complex relationships within the data easy to understand.
Given its complexity, leveraging data to provide meaningful solutions requires insights from several different fields, including statistics, data mining, graphic design, and information visualization.
Data visualization is one of the critical steps of the data science process, a framework for approaching data science tasks developed by Harvard statistics professor Joe Blitzstein. After collecting, processing, and modeling the data, we can visualize the relationships and draw our conclusions.
Thanks to rapid developments in data visualization software, patterns, trends, and correlations that often go undetected in text-based data can now be exposed and analyzed with ease. Data visualization tools empower the user to choose the best way to present the data and typically offer a dashboard element, enabling users to pull multiple visualizations of analyses into a single interface for maximum impact.
Visualizations act as a campfire around which we gather to tell stories.
CEO, Net Objectives
Data visualization fundamentals
The most critical part of understanding data is identifying the question you want it to answer. All data problems start with a question and end with a narrative frame that offers a definitive answer.
Before beginning a data visualization project, ask yourself, “Why was the data collected, what’s interesting about it, and what stories can it tell?” Follow these four crucial steps to prepare your organization for a data visualization project.
1. Understand the data you want to visualize. Since every project has different requirements, your visualization should convey the unique qualities of the data set it represents.
2. Establish what you want to visualize and what type of information you want your visualization to communicate to your audience.
3. Get to know your audience and uncover insights about how it processes visual information. Who is your audience? What are their goals when viewing a visualization? What will they learn? How will your audience use the piece?
4. Decide on the visual that will convey the data in the best and simplest form for your audience.
After creating a visualization, ask yourself these three questions:
- Is it approachable? Ensure your design is straightforward and easy for the intended audience to understand.
- Does it tell a story? You want your data visualizations to translate data into knowledge.
- Is it actionable? Your data visualization should offer guidance via visual cues that provide the audience with actionable insights.
The 7 stages of data visualization
In his book, Visualizing Data, Ben Fry lays out seven distinct stages of data visualization.
1. Acquire — The first stage of data visualization deals with obtaining the data. Data may be retrieved from your on-premise servers or a cloud-based storage service like AWS, Microsoft Azure, or Google Cloud. When acquiring data, also consider how it can change, whether occasionally (monthly, quarterly, annually, etc.) or perpetually.
2. Parse — Next, you’ll want to provide some structure to give the data more meaning. Tag and categorize each part of the data by its intended use. With the completion of this step, the data is successfully tagged and consequently more useful to a program that will manipulate or represent it in some way.
3. Filter — Not all aspects of a data set are relevant. Often, less detail will convey more information because the inclusion of overly specific details causes the viewer to miss what’s most important or disregard the image entirely because it’s too complicated. Filter the data to remove portions of little to no interest.
4. Mine — Use statistics and data mining methodologies to spot patterns or put the data in a mathematical context.
5. Represent — This stage is the linchpin that informs the single most crucial decision in a visualization project. Decide what form the data will take. Examples include a line graph, pie chart, or tree diagram.
6. Refine — In this step, graphic design methods are used to further clarify the representation by making it easier to understand and more visually engaging. Methods may include establishing a hierarchy to highlight a particular portion of data or changing attributes (such as color) to improve readability.
7. Interact — Empower the user with the ability to control or explore the data. Manipulations could include controlling what features are visible or changing the viewpoint of the data.
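As a rough illustration, the first five stages can be sketched as a small pipeline using only the Python standard library. The sample data, the function names, and the text-based bar chart are all assumptions made for this example, not part of Fry's book; the refine and interact stages depend on the charting tool and are left out.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data standing in for a real source (file, database, or API).
RAW_CSV = """region,month,revenue
North,2024-01,1200
North,2024-02,1350
South,2024-01,800
South,2024-02,
West,2024-01,950
West,2024-02,1100
"""


def acquire(text):
    """Acquire: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def parse(rows):
    """Parse: give each field a type; a blank revenue becomes None."""
    return [
        {
            "region": row["region"],
            "month": row["month"],
            "revenue": float(row["revenue"]) if row["revenue"] else None,
        }
        for row in rows
    ]


def filter_rows(records):
    """Filter: drop records that cannot contribute to the chart."""
    return [r for r in records if r["revenue"] is not None]


def mine(records):
    """Mine: summarize the data, here as mean revenue per region."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["revenue"])
    return {region: mean(values) for region, values in by_region.items()}


def represent(summary, width=30):
    """Represent: render a plain-text horizontal bar chart, largest value first."""
    largest = max(summary.values())
    lines = []
    for region, value in sorted(summary.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * value / largest)
        lines.append(f"{region:<6} {bar} {value:.0f}")
    return "\n".join(lines)


summary = mine(filter_rows(parse(acquire(RAW_CSV))))
print(represent(summary))
```

Keeping each stage as its own function makes it easy to swap one out, for example replacing the text chart with a call to whichever visualization tool you choose.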
Data visualization techniques and examples
Modern data visualization tools extend beyond the limitations of the rudimentary charts and graphs created in Microsoft Excel, displaying data in increasingly sophisticated ways. There are five distinct categories of data visualizations:
Data visualizations fall under the temporal category if they satisfy two conditions — they’re linear and one-dimensional. Temporal visualizations usually feature lines that either stand alone or overlap with each other, with a start and finish time. Temporal visualizations include scatter plots, polar area diagrams, time-series sequences, timelines, and line graphs.
Data visualizations that belong in the hierarchical category are those that order groups within larger groups. Hierarchical visualizations are appropriate for displaying clusters of information, especially if they flow from a single origin point. Hierarchical visualizations include tree diagrams, ring charts, and sunburst diagrams.
Network data visualizations demonstrate the relationship between datasets within a network. Matrix charts, node-link diagrams, word clouds, and alluvial diagrams are all forms of network visualizations.
Multidimensional data visualizations include two or more variables to create a 3D data visualization made up of several concurrent layers and datasets. Multidimensional visualizations include scatter plots, pie charts, Venn diagrams, stacked bar graphs, and histograms.
Geospatial or spatial data visualizations deal with physical locations, overlaying maps with various data points. Flow maps, density maps, cartograms, and heat maps are types of geospatial data visualizations.
Common Types of Data Visualizations
Perhaps the simplest way to visualize data, a column chart shows a comparison among different sets of data. You can also use column charts to track data sets over time. They are only suitable for small and medium-sized data sets.
Bar charts organize data into rectangular bars, making comparing related data sets incredibly easy. Best used to highlight change over time, compare different categories, or examine parts of a whole, bar charts resemble column charts, except that the latter has limited label and comparison space.
Line charts excel at showing resulting data relative to a continuous variable – most often time or money. They also help show trends, acceleration, deceleration, and volatility. Use them to understand trends, patterns, and fluctuations in your data, compare different but related data sets with multiple series, or make projections.
Scatterplots display the connection between items based on two sets of variables. Best used to exhibit correlation in a large quantity of data, scatterplots are useful when looking for outliers or understanding data distribution.
Sparklines are small line charts, usually designed without axes or coordinates. A sparkline demonstrates the general shape of the variation in some measurement — most commonly over time. Use a sparkline when you can pair it with a metric that has a current status value tracked over a specific period, or you want to show a particular trend behind a metric.
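To make the idea concrete, here is a minimal text-based sparkline, assuming a terminal that can display Unicode block characters. The function name and the sample metric are hypothetical; real tools draw sparklines as small line graphics rather than characters.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block characters by relative height."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A flat series has no shape to show; use a middle-height block throughout.
        return BLOCKS[3] * len(values)
    last = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - low) / span * last)] for v in values)


weekly_signups = [12, 15, 14, 18, 25, 22, 30]  # hypothetical metric
# Pair the shape of the trend with the current value of the metric.
print(sparkline(weekly_signups), weekly_signups[-1])
```

Printing the latest value next to the sparkline mirrors the pairing described above: the number gives the current status, and the shape gives the trend behind it.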
Best suited for making part-to-whole comparisons with discrete or continuous data, pie charts are most impactful with a smaller data set. Use a pie chart to compare relative values, compare parts of a whole, or quickly scan metrics.
Gauge charts can help you quickly determine how a given field is performing versus how it is expected to perform. A gauge chart displays your chosen metric along a scale, with color indicating where the metric falls within the expected range. The arrow below the value ranges shows where your current metric falls on the scale.
A waterfall chart helps you comprehend the cumulative effect of sequentially introduced positive or negative values. These intermediate values can be either time-based or category-based. Use a waterfall chart to reveal the composition or makeup of a number.
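The arithmetic behind a waterfall chart is a running total: each bar starts where the previous one ended. The sketch below computes those start and end points; the labels and amounts are made up for illustration.

```python
from itertools import accumulate

# Hypothetical changes to a cash balance (labels and numbers are invented).
steps = [
    ("Starting balance", 1000),
    ("Product sales", 450),
    ("Refunds", -120),
    ("Operating costs", -380),
    ("Services", 200),
]


def waterfall(steps):
    """Return (label, start, end) for each floating bar in a waterfall chart."""
    ends = list(accumulate(delta for _, delta in steps))
    starts = [0] + ends[:-1]
    return [(label, start, end) for (label, _), start, end in zip(steps, starts, ends)]


for label, start, end in waterfall(steps):
    print(f"{label:<18} {start:>6} -> {end:>6}")
```

The end value of the last bar is the number whose composition the chart reveals.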
Funnel charts help you visualize a linear process with sequential connected stages, with each funnel stage representing a percentage of the total. Choose a funnel chart to display a series of steps along with the completion rate for each.
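The percentages a funnel chart displays can be computed directly from stage counts. This sketch assumes the stages are listed in order and that every count is greater than zero; the stage names and numbers are hypothetical.

```python
# Hypothetical counts for a sign-up funnel, ordered from first stage to last.
stages = [
    ("Visited site", 5000),
    ("Started trial", 1000),
    ("Booked demo", 400),
    ("Purchased", 100),
]


def funnel_rates(stages):
    """For each stage, compute its share of the first stage and of the previous stage."""
    total = stages[0][1]
    previous = total
    rows = []
    for name, count in stages:
        rows.append({
            "stage": name,
            "count": count,
            "pct_of_total": 100 * count / total,        # size of the funnel segment
            "pct_of_previous": 100 * count / previous,  # completion rate of this step
        })
        previous = count
    return rows


for row in funnel_rates(stages):
    print(f"{row['stage']:<14} {row['pct_of_total']:>5.1f}% of total, "
          f"{row['pct_of_previous']:>5.1f}% of previous stage")
```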
Heat maps present categorical data, using color intensity as a representation of values of geographic areas or data tables. They show the relationship between two measures and provide rating information.
A histogram resembles a vertical bar chart, but it relates only one variable as opposed to two: the values are grouped into intervals (bins), and the height of each bar shows how many values fall in that interval. Use a histogram to show the distribution of a data set or to compare how often values occur across intervals, such as intervals of time.
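A histogram's bars come from counting how many values of a single variable fall into each interval. The following is a minimal sketch of that counting step with fixed-width bins; the function name and the sample order values are assumptions, and intervals with no values are simply omitted from the result.

```python
def histogram(values, bin_width):
    """Count how many values fall into each interval [start, start + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical order values in dollars.
orders = [12, 18, 25, 27, 29, 31, 34, 38, 41, 55]

for start, count in histogram(orders, 10).items():
    print(f"{start:>3}+ {'#' * count}")
```

Changing `bin_width` changes the shape of the chart, so it is worth trying a few widths before settling on one.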
A box plot (also known as a box-and-whisker diagram) graphically depicts groups of numerical data through their quartiles, based on the minimum, first quartile, median (second quartile), third quartile, and maximum. Outliers may be plotted as individual points. Use a box plot to display or compare the distribution of data and to identify the minimum, maximum, and median of data.
Map visualization is the tool of choice when you need to analyze data related to geography and present it on a map. Maps let us see the distribution or proportion of data in each region. Use a map when geography is a critical part of your data story.
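The five values a box plot is built from can be computed with Python's `statistics` module. This is a sketch under two assumptions: quartiles use the "inclusive" interpolation method (tools differ on this), and outliers are flagged with the common 1.5 × IQR rule, which is a convention rather than a requirement. The sample data is hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum that define a box plot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def outliers(values, k=1.5):
    """Flag points more than k * IQR beyond the box (a common convention)."""
    s = five_number_summary(values)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - k * iqr, s["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


response_times_ms = [110, 120, 125, 130, 135, 140, 150, 160, 420]  # hypothetical
print(five_number_summary(response_times_ms))
print(outliers(response_times_ms))
```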
The data table is an efficient format for comparative data analysis on categorical objects. Tables allow you to display both data points and graphics, such as bullet charts, icons, and sparklines. Use a table to display two-dimensional data sets that can be organized categorically or to display large amounts of data.
Indicators (also called angular gauges) offer a way to present changes you’re tracking in your data and allow you to display one or two numeric values. Typically, you’ll use something like a gauge or a ticker to show which direction the numbers are heading in.
Area charts depict a time-series relationship. But unlike line charts, they can also visually represent volume. Most often, area charts compare two or more categories. To visualize how various items stack up or contribute to the whole, opt for an area chart.
A radar chart (or spider chart) is a two-dimensional chart designed to plot one or more series of values over multiple quantitative variables. Radar charts are useful for understanding the relative differences between items in your data.
Treemaps visualize hierarchical data as a series of nested rectangles, each sized in proportion to its data value and typically broken down into two or three layers to show the hierarchical relationship between items.
Great for displaying simple comparisons or ranking relationships, a bubble plot is a scatter plot whose markers are bubbles sized to display an additional variable, while a bubble map is best used for visualizing values for specific geographic regions.
A word cloud chart is used to display a large amount of text data and can quickly help users to perceive the most prominent text. Word clouds are ideal for keyword research.
A pivot table compiles, clarifies, and sums up information stored in other tables and spreadsheets, uncovering the most pertinent insights. Pivot tables are also used to create unweighted cross-tabulations quickly.
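At its core, a pivot table groups records by a row field and a column field and aggregates a value for each pair. The sketch below does this with a sum; the field names, the sample records, and the `pivot_sum` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records; in practice these would come from another table or spreadsheet.
sales = [
    {"region": "North", "product": "Basic", "amount": 100},
    {"region": "North", "product": "Pro", "amount": 250},
    {"region": "South", "product": "Basic", "amount": 80},
    {"region": "North", "product": "Basic", "amount": 120},
    {"region": "South", "product": "Pro", "amount": 300},
]


def pivot_sum(records, row_key, col_key, value_key):
    """Sum value_key for every (row, column) pair, like a pivot table with a SUM aggregate."""
    table = defaultdict(lambda: defaultdict(int))
    for r in records:
        table[r[row_key]][r[col_key]] += r[value_key]
    return {row: dict(cols) for row, cols in table.items()}


pivot = pivot_sum(sales, "region", "product", "amount")
for region, cols in pivot.items():
    print(region, cols)
```

Replacing the sum with a count of records per pair would give the unweighted cross-tabulation mentioned above.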
Data visualization benefits
Many departments within a business implement data visualization tools to track departmental initiatives. For example, a sales team might leverage data visualization software to monitor the performance of an outreach campaign, tracking metrics like qualified leads, demos, free trials, and purchases.
Data visualization tools also help data engineers and scientists track data sources and perform basic exploratory analysis of data sets prior to or after more detailed advanced analyses. As a result of data visualization, organizations achieve the following benefits.
The purpose of visualization is insight, not making pretty pictures.
Computer Scientist, University of Maryland
Faster decision making
Visual information is much easier to process than written information. By using a chart or graph to summarize complex data, the audience absorbs it quickly, allowing business leaders across the enterprise to evaluate and interpret the data.
Companies that gather and quickly act on data enjoy a competitive advantage in the marketplace because they can make informed decisions faster than the competition.
Identification of areas for improvement
With the help of data visualization, organizations can see where performance is high, as well as where there’s room for improvement. For example, if your marketing team knows that every X campaign emails result in Y website visits, a visual report showing clicks per email and progress toward the traffic goal becomes a visual motivator to meet the traffic quota.
Visualizing data has been shown to decrease the length of meetings, reduce the time it takes to find information, and provide a boost to overall productivity. According to McKinsey & Company, one major metal manufacturer increased production rates in one of its lines by 50 percent by using real-time performance visualization in its operators’ stations.
Organizations that leverage data visualization increase their bottom line. In a study of global businesses, the organizations that use data visualization are leaders in revenue growth and plan to invest even more in data visualization in the next year.
The ability to visualize trends gives leaders greater awareness of the performance of the company. It empowers them with the insight necessary to build upon favorable trends and reverse negative ones.
Data visualization best practices
There are several best practices to keep in mind when visualizing data. It all starts with defining the purpose of your data visualization project and ends with finding the story that lies within your data set.
DEFINE A CLEAR PURPOSE – The initial and most crucial step in creating a great visualization is knowing what you want to achieve. Decide what you want to measure and why. What’s the fundamental question you want your data to answer?
KNOW YOUR AUDIENCE – When developing a visualization, consider your customer personas, then pick one that is the highest priority. Determine who will see the data, what key challenges they face, and what hurdles they must overcome to achieve a specific goal.
USE THE RIGHT TYPE OF VISUALIZATION TO REPRESENT THE DATA – Consider the guidelines detailed above to determine which type of visual will work best for your data. This decision can make or break your project. So, choose your format wisely.
PROVIDE CONTEXT – Ultimately, you want your visualization to drive an action. To help your audience better interpret the data, compare performance to something tangible, like a goal or a benchmark from a previous period. The more context they receive, the easier it is for them to know where the action is required.
KEEP VISUALIZATIONS AND DASHBOARDS SIMPLE AND DIGESTIBLE – Make visualizations snackable. Viewers should be able to quickly understand the high-level overview and help themselves to more information as needed.
DESIGN TO KEEP USERS ENGAGED – Apply user-centric design principles to your data visualizations. Think about what problem the user is trying to solve, what roadblocks are keeping them from solving that problem, what information and functionality they need to solve that problem, and how to visualize that for them in an interactive way.
DESIGN ITERATIVELY – Obtain a large portion of requirements and begin designing concept proofs and prototypes immediately. Then, solicit feedback and revise accordingly. Resist the temptation to wait until all of your requirements are fulfilled, or you risk never getting your project off the ground.
FIND THE STORY IN YOUR DATA – Without a compelling narrative to accompany it, even the most aesthetically pleasing data visualization will fail to make an impact. Determine what the data visualization reveals. Look closely to discover any emerging patterns and trends that might tell an interesting story. By making it relevant to your audience, you go from just showing the data off to telling a compelling story with the data.
What to look for in a data visualization tool
Your data is only as good as your ability to understand and communicate it, which is why choosing the right visualization tool is essential.
There are dozens of tools on the market that offer data visualization and data analysis. These range from simple to complex, from intuitive to obtuse. Not every tool is right for every organization. Gartner’s Magic Quadrant for Analytics and Business Intelligence Platforms listed these five things to look for.
- Agile, centralized BI provisioning – When choosing a data visualization tool, make sure it supports an agile workflow backed by the IT department and features self-contained data management functionality that enables managers to make faster and more informed business decisions.
- Decentralized analytics – Individual users and business units should also be able to quickly and easily access data without involving IT.
- Governed data discovery – Your data visualization tool should address users’ requirements for easy data delivery while also satisfying the IT department’s requirements for managing and securing the data.
- OEM or embedded analytics capabilities – Users should have the ability to explore their data to better understand where they can increase efficiency and quickly gain actionable insights.
- Extranet deployment – A robust data visualization tool should at the very least include an SDK and API used to provide data to external businesses, customers, suppliers, distributors, or other business partners. However, more modern tools (like Sigma) are cloud-based and can be shared and accessed right from a browser via URL.
13 critical capabilities of best-in-class data visualization tools
- BI platform administration, security and architecture
- Cloud BI
- Data source connectivity and ingestion
- Metadata management
- Data storage and loading options
- Data preparation
- Scalability and data model complexity
- Advanced analytics for citizen data scientists
- Analytic dashboards
- Interactive visual exploration
- Embedding of analytic content
- Publish, share and collaborate on analytic content
- Ease of use, visual appeal and workflow integration
Wrapping up: closing thoughts on data visualization
An effective visualization does a lot more than just display a data set. It creates a narrative, providing a clear answer to a specific question, minus the minutiae. The end goal is to educate and engage your audience with your insights.
We’re all up to our eyeballs in information, and the ability to leverage data to tell a story is an increasingly important skill. Whether you’re communicating the findings of your customer survey, making a presentation to the board, or simply engaging your target audience, your success or failure all comes down to your mastery of data storytelling.
Data visualization serves as one of the most critical tools in your storytelling arsenal. It helps to reimagine business intelligence and introduces organizations to new ways of understanding and utilizing their data. Emerging as a critical foundation for democratizing data and making intelligent insights available to everyone within an organization, modern data visualization tools empower users, reduce the dependence on overburdened IT departments, and help to drive the business forward.