Sunday, August 25, 2013

Data Visualization: When To Use Which Graph

I started writing this blog one day when I was bored, back when I was living in California. Eventually, I moved back to Washington, DC and stopped updating it. Now that I have moved out of DC (slightly south, to Norfolk, Virginia), I feel like picking it up again. One gripe I have: my previous posts were written on my trusty Acer laptop running Windows XP, and I never had a problem with it. My new Lenovo Yoga 13 running Windows 8, which cost twice as much, is turning out to be quite the piece of crap. Anyway, today's topic is data visualization: depicting data in schematic or graphical form so that the main points of an analysis can be communicated to the viewer easily and succinctly. We have all heard that 'a picture is worth a thousand words.' That is even more true when numbers are involved.

Which graph to use depends on a number of factors. Please note that these are all suggestions; only you can decide which graph best represents your data.

Graphs based on data types

Use line graphs to show continuous data. The graph below shows temperatures for New York City over a period of six days.

Use bar graphs to show categorical or discrete data. In the example below, the number of hours spent watching television has been grouped into distinct categories.
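To make the grouping step concrete, here is a minimal Python sketch that turns raw hours of TV per day into counts per category, which become the bar heights. The category boundaries, labels, and sample responses are hypothetical, not taken from the chart described above.

```python
from collections import Counter

# Hypothetical category boundaries (hours per day); not from the original chart.
CATEGORIES = [
    (0, 1, "Less than 1 hour"),
    (1, 3, "1 to 3 hours"),
    (3, 5, "3 to 5 hours"),
    (5, float("inf"), "5 hours or more"),
]


def categorize(hours: float) -> str:
    """Return the label of the category whose [low, high) range contains hours."""
    for low, high, label in CATEGORIES:
        if low <= hours < high:
            return label
    raise ValueError(f"hours must be non-negative, got {hours}")


def count_by_category(values: list[float]) -> dict[str, int]:
    """Count observations per category, keeping category order for the bars."""
    counts = Counter(categorize(v) for v in values)
    return {label: counts.get(label, 0) for _, _, label in CATEGORIES}


# Hypothetical survey responses (hours of TV per day).
responses = [0.5, 2.0, 2.5, 4.0, 6.0, 1.0, 0.0, 3.5]
print(count_by_category(responses))
```

Each key becomes one bar, in a fixed order, so empty categories still show up as zero-height bars rather than disappearing.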

Use both line and bar graphs to show the intersection of continuous and categorical data. In the example below, the number of TB cases per year in California (discrete data) is shown as a bar graph, while the infection rate (continuous data) is shown as a line graph. The graph also makes clever use of blue for the right axis to tie it to the line graph. Such clever uses of color and technique enable good data visualization.


Graphs based on number of items

If your data can be grouped into a few categories, use vertical bar graphs to show those categories. In the example below, the data has been grouped into only four music categories: hip hop, classical, rock and jazz.

On the other hand, if your data can be grouped into many categories, use horizontal bar graphs instead. The example below shows the populations of 10 countries.


Graphs showing proportion

Use pie charts to show share of the total (i.e., all the parts add up to 100 percent). Use this chart when the actual size or number is not important, but you are trying to convey that some parts or portions are significantly greater than others.
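One practical detail: because a pie chart's parts add up to 100 percent, rounding each share on its own can leave the labels summing to 99 or 101. The sketch below assumes whole-number percentage labels and uses the largest remainder method to keep the total at exactly 100; the function name `percent_shares` and the sample data are hypothetical.

```python
import math


def percent_shares(values: dict[str, float]) -> dict[str, int]:
    """Convert raw values to whole-number percentages that sum to exactly 100.

    Largest remainder method: floor every share, then hand out the leftover
    points to the items with the largest fractional parts.
    """
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must have a positive total")
    raw = {k: v * 100 / total for k, v in values.items()}
    floored = {k: math.floor(r) for k, r in raw.items()}
    leftover = 100 - sum(floored.values())
    # Items with the largest fractional parts receive the remaining points first.
    by_remainder = sorted(raw, key=lambda k: raw[k] - floored[k], reverse=True)
    for k in by_remainder[:leftover]:
        floored[k] += 1
    return floored


# Three equal parts would naively round to 33 + 33 + 33 = 99.
print(percent_shares({"A": 1, "B": 1, "C": 1}))  # {'A': 34, 'B': 33, 'C': 33}
```

Ties go to whichever item comes first, because Python's sort keeps the original order for equal keys even with `reverse=True`.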

If you want to show proportion in addition to another variable (for example, year), use stacked bar graphs.
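A stacked bar graph that shows proportion is often drawn as a 100 percent stacked bar: each year's segments are rescaled to sum to 100, and each segment starts where the previous one ends. Here is a minimal sketch of that computation, assuming the data arrive as a dictionary mapping each year to its category totals; the function name and the numbers are hypothetical.

```python
def stacked_segments(
    totals_by_year: dict[int, dict[str, float]],
) -> dict[int, list[tuple[str, float, float]]]:
    """For each year, return (category, bottom, height) in percent for a 100% stacked bar."""
    result = {}
    for year, parts in totals_by_year.items():
        year_total = sum(parts.values())
        bottom = 0.0
        segments = []
        for category, value in parts.items():
            height = value * 100 / year_total  # this category's share of the year
            segments.append((category, bottom, height))
            bottom += height  # the next segment starts where this one ends
        result[year] = segments
    return result


# Hypothetical data: energy sources per year (arbitrary units).
data = {
    2011: {"coal": 50, "gas": 30, "wind": 20},
    2012: {"coal": 40, "gas": 35, "wind": 25},
}
for year, segments in stacked_segments(data).items():
    print(year, segments)
```

Every bar ends at 100, so the reader compares shares across years rather than absolute totals.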

On the other hand, if you want to show proportion in addition to two or more variables, use a heat map. The cell size represents proportion in relation to the total, while the impact of the other variables is captured in the cell color. In the example below, cell color is based on retail sales.
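The layout described here, where cell size encodes share of the total and color encodes another variable, is often called a treemap. Below is a minimal sketch of its simplest form, a one-level slice layout that splits a rectangle into strips whose areas are proportional to each item's size and maps a second value to a 0-to-1 color intensity. The function name, the tuple format, and the sample items are assumptions for illustration; charting libraries often use layouts such as squarified treemaps that keep cells closer to square.

```python
def slice_layout(items, width=1.0, height=1.0):
    """One-level slice layout: split a width x height rectangle into vertical strips.

    items: list of (label, size, color_value). Each strip's area is proportional
    to size; color_value is min-max normalized to an intensity between 0 and 1.
    Returns a list of (label, x, y, w, h, intensity) tuples.
    """
    total = sum(size for _, size, _ in items)
    colors = [c for _, _, c in items]
    lo, hi = min(colors), max(colors)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all color values are equal
    cells = []
    x = 0.0
    for label, size, color_value in items:
        w = width * size / total  # strip width, so area = share of total * full area
        cells.append((label, x, 0.0, w, height, (color_value - lo) / span))
        x += w
    return cells


# Hypothetical items: (label, size, value driving the color).
items = [("Product A", 50, 10.0), ("Product B", 25, 20.0), ("Product C", 25, 30.0)]
for cell in slice_layout(items):
    print(cell)
```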


Graphs based on time

If you're showing data for only a few periods involving one or several items, use bar charts. The chart below shows one item (i.e., funding) for five years.

If you're showing data for a few periods but many items, it's best to use line graphs. The graph below shows wage increases for five countries for a period of 10 years.

Use a line graph if the data involves a few items but a long period of time. The graph below shows the U.S. annual GDP and national debt for 220 years.

Use a circular area chart to show cyclical, periodic data. The graph below succinctly shows and compares the annual average temperature for three cities: Bermuda, Sydney and Memphis.
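Under the hood, a circular area chart places each period at an angle around a circle and uses the distance from the center for the value. Here is a minimal sketch for monthly data, assuming 12 non-negative values with January at 12 o'clock and months running clockwise (a convention chosen here, not taken from the chart above); the function name is hypothetical.

```python
import math


def radial_points(monthly_values: list[float]) -> list[tuple[float, float]]:
    """Place 12 monthly values around a circle, January at 12 o'clock, moving clockwise.

    Each point's distance from the center equals that month's value, so joining
    the points in order (and back to January) outlines the circular area.
    """
    if len(monthly_values) != 12:
        raise ValueError("expected exactly 12 monthly values")
    if any(v < 0 for v in monthly_values):
        # A negative radius would flip the point to the opposite side of the circle.
        raise ValueError("values must be non-negative; shift the scale first")
    points = []
    for month, value in enumerate(monthly_values):
        angle = 2 * math.pi * month / 12  # 30 degrees per month
        # Angle measured clockwise from 12 o'clock: x uses sin, y uses cos.
        points.append((value * math.sin(angle), value * math.cos(angle)))
    return points
```

With this convention, April lands at 3 o'clock and July at 6 o'clock, so a city with hot summers bulges toward the bottom of the chart, and overlaying several cities makes their seasonal shapes easy to compare.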


Graphs showing relations between variables

Use a scatter plot to show the relationship between two variables.
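A scatter plot lets the eye judge a relationship; when that relationship is roughly linear, the Pearson correlation coefficient summarizes it in one number between -1 and 1. A minimal hand-written sketch (the function name `pearson_r` is hypothetical):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: +1 or -1 for a perfect linear relationship, near 0 for none."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

The number only captures linear trends, so it complements the scatter plot rather than replacing it: a curved pattern can be obvious on the chart while the coefficient stays near 0.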

Use a bubble chart to show the relationship between three variables.
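When drawing a bubble chart, the third variable should usually control bubble area, not radius; scaling the radius directly makes large values look far bigger than they are, because area grows with the square of the radius. Here is a minimal sketch that scales radii by the square root, assuming non-negative sizes and a hypothetical `max_radius` of 30 drawing units; the sample points are also hypothetical.

```python
import math


def bubble_radii(values, max_radius=30.0):
    """Scale radii so that bubble AREA, not radius, is proportional to each value.

    The largest value gets max_radius; a value one quarter as large gets half the
    radius, because area grows with the square of the radius.
    """
    if any(v < 0 for v in values):
        raise ValueError("sizes must be non-negative")
    largest = max(values)
    if largest <= 0:
        raise ValueError("need at least one positive value")
    return [max_radius * math.sqrt(v / largest) for v in values]


# Hypothetical (x, y, size) observations for a three-variable bubble chart.
points = [(1.0, 2.0, 100.0), (2.0, 3.5, 25.0), (3.0, 1.5, 4.0)]
radii = bubble_radii([size for _, _, size in points])
print(radii)
```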


Graphs based on location

Use a map to show data if it involves an area or a location. The map below shows what carbonated beverages are called in different parts of the country.

Data Visualization Cheat Sheet

Here's a map which summarizes all the information above into one neat page that can act as a cheat sheet:
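The rules of thumb in this post can also be sketched as a small decision function. The purpose labels and the cut-off of five between "few" and "many" are hypothetical, since the post gives no exact numbers; with that cut-off, the function reproduces the examples above (four music genres, ten countries, five years of funding, 220 years of GDP).

```python
FEW = 5  # hypothetical cut-off between "few" and "many"; the post gives no number


def suggest_chart(purpose: str, n_items: int = 1, n_periods: int = 0,
                  n_variables: int = 2, other_variables: int = 0) -> str:
    """Suggest a chart type using this post's rules of thumb.

    purpose: one of "categories", "time", "proportion", "relationship",
             "cycle", "location" (hypothetical labels chosen for this sketch).
    """
    if purpose == "categories":
        # Few categories fit side by side; many read better as horizontal bars.
        return "vertical bar graph" if n_items <= FEW else "horizontal bar graph"
    if purpose == "time":
        # Bars only when both the periods and the items are few; lines otherwise.
        if n_periods <= FEW and n_items <= FEW:
            return "bar chart"
        return "line graph"
    if purpose == "proportion":
        # Share of total alone, plus one variable, or plus two or more variables.
        if other_variables == 0:
            return "pie chart"
        return "stacked bar graph" if other_variables == 1 else "heat map"
    if purpose == "relationship":
        return "bubble chart" if n_variables >= 3 else "scatter plot"
    if purpose == "cycle":
        return "circular area chart"
    if purpose == "location":
        return "map"
    raise ValueError(f"unknown purpose: {purpose!r}")


print(suggest_chart("time", n_items=1, n_periods=5))
```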

Finally, here are a few pointers for creating data visualizations.
  • Make it easy to understand. The purpose of the visualization should be apparent to the viewer.
  • Use only the elements needed to make your point, no more and no less. Do not crowd the graph.
  • Be aware of aesthetics (e.g., avoid loud color schemes if possible).
  • Make sure that the graph accurately represents the underlying data.
  • Properly title the graph and label its elements.
  • Consider using a bright color to highlight a break or unusual activity in the data, and pale colors for the rest of the graph elements.
  • Be cognizant of the space that can be allocated to the graph.