Tuesday, December 1, 2015

In defence of data visualization rules

A PowerPoint slide from my Data Visualization class.
There are a couple of rules that pretty much everyone involved in the data visualization field has heard of:
  1. Pie charts are a bad way to visualize data.
  2. Bar charts should have a y-axis that starts at zero (sometimes also referred to as "baseline zero"). 
Lately, though, it seems a lot of people, including many whose views I respect, are pushing back against these (and other) data viz rules.

Vox recently published a slick video arguing, in their words: "Shut up about the y-axis. It shouldn’t always start at zero."

Tableau's Ben Jones picked up the theme with a blog post entitled, "The Backlash Against Data Dogmatism", in which he referenced the Vox video as well as a tweet by Randy Olson arguing that pie charts can sometimes be useful.

Then Matt Francis and Emily Kund over at the Tableau Wannabe podcast weighed in, arguing that people shouldn't be so darn strict about "the rules" of data viz.

What all these arguments seem to have in common is the idea that we should replace our strict "data viz rules" with something more nuanced: "It depends". As Ben Jones puts it:
I’m hopeful that the next phase of data visualization is one that embraces the gray of “it depends” and encourages open dialogue and constructive criticism. In order to get there, we’ll definitely have to shed dogma. Let’s absolutely do so, but let’s also carry forward the principles and rules of thumb that just make good sense, while being open to the possibility that breaking those rules might be a great idea in specific situations. Wouldn’t this be a more mature approach? Wouldn’t it also be more welcoming, and more enjoyable?
On its face, this is a hard sentiment to argue with. Who wants to be in favour of being "dogmatic"? Who can argue that there aren't exceptions to every rule?

My concern, however, is that in our rush to be open-minded, to approach every question about best practices with a vague "it depends", we lose sight of just how valuable some of these rules are, especially for newcomers to the field.

So here's my defence of data visualization rules.

Point 1: Bad and misleading charts are everywhere.

While there no doubt are situations where a pie chart makes sense, and where a baseline zero doesn't, the vast, vast majority of pie charts out in the real world are just bad and most charts with a baseline other than zero are misleading.

I was over on Edmunds the other day trying to figure out the cost of owning different types of vehicles and stumbled across this jumbled mess:

My local realty site breaks down neighbourhood demographics like this:

And a few years back, British Columbia's premier used a bar chart with a baseline of $1,000 to exaggerate how low income taxes were in the province:

A chart, incidentally, that led me to write what I think remains the only front page story in B.C. history about a bar chart:

In contrast, I come across very few examples of bar charts that would have worked better as pie charts, or charts that would be clearer if a baseline other than zero were used.

It's simple math: If people applied the "data viz rules" consistently, it would do a lot more good than harm. We live in a world where we need people to follow the "rules" more, not less.

Point 2: Exceptions to the rules are rare.

This is related to Point 1, but worth emphasizing. While there are no doubt exceptions to the "no pie charts" and "baseline zero" rules, they are pretty darn rare.

* Completely unscientific estimate

Of the 100+ datasets I've visualized over the years, I can probably count the number where a pie chart would have been the best chart type, or a baseline other than zero was a good idea, on one hand.

Yes, there are exceptions to the rules. And I tell my students that. But the exceptions are so rare that to muddy the waters with "it depends" seems counter-productive.

Point 3: Newcomers to data visualization need clear rules.

I spend a lot of my time teaching data visualization to complete beginners: Students who, in most cases, have never even opened a spreadsheet before. And, in my experience, without clear rules those beginners will make bad charts. Really bad charts.

Because pie charts are so ubiquitous (see Point 1), if you give a beginner some data, chances are better than even that the first thing they'll make is a pie chart.

And because fiddling with the y-axis can make a chart look "better" (because it exaggerates what's really going on), beginners often won't see the problem in using a baseline other than zero.

In other words, beginners have a strong built-in bias towards making bad charts. Tell a beginner that, when choosing a chart type, "it just depends", or to just think critically about the y-axis, and the results will be pretty ugly. They simply don't have the experience yet to make good decisions.

This is true of most fields, which is why we usually begin by teaching people "rules" and "shortcuts" and then, only later, discuss the nuance and complexity.

In journalism, for example, we teach first-year students about the "inverted pyramid": The idea that the most important facts in your story should come first. There are tons of exceptions to this rule, like pretty much every magazine feature you've ever read. But the inverted pyramid is a handy shortcut for beginners and gets them out of writing their news stories like academic essays.

You need to know the rules before you can break them.

And, indeed, in some cases the rules are all we ever learn. Millions of us have learned the basics of CPR or how to identify a stroke victim with the four-stage FAST test. I'm sure cardiologists could point to situations where CPR is counterproductive, and neurologists to nuances of diagnosis that FAST misses. But the world is a better place for having a lot of people who know the basics.

The same is true with data visualization (even if the stakes, thankfully, aren't quite as high). The data analyst who has to make an occasional chart to share with her superiors will do a better job if she steers clear of pie charts and uses a zero baseline, even if she never bothers to learn another thing about data visualization.

Bottom line: Nuance is OK but rules are important too.

In the past I have found myself on the other side of this debate, arguing that people should be free to bend the rules and that sometimes the "most effective" chart isn't the best way to grab your reader's attention.

And, when I teach my students the "rules" of data visualization, I always note that, like any rules, there are, of course, exceptions. But I also stress that those exceptions are rare so, if in doubt, they should steer clear of pie charts and start their charts at zero.

I also teach my students the "self-sufficiency test" as a way of getting them to think more deeply about the reason the rules exist. It works like this: Imagine your chart with all the numbers -- all the labels, all the numbers on the axis -- removed. What would someone looking at your chart think it was trying to tell them? Is that message clear? Is it accurate? (This simple chart from Wikipedia's entry on pie charts is helpful in getting the message across.)
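For bar charts, the self-sufficiency test can be roughed out with a bit of arithmetic. The sketch below is only an illustration: the function names and the dollar values are hypothetical and are not taken from any chart in this post. It assumes a viewer with no labels to go on judges two bars purely by their drawn heights, and it compares that impression with the true ratio of the values.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the y-axis starts at `baseline` instead of zero."""
    if min(a, b) <= baseline:
        raise ValueError("both values must be greater than the baseline")
    return (a - baseline) / (b - baseline)


def exaggeration(a, b, baseline=0.0):
    """How many times larger the drawn ratio is than the true ratio a / b."""
    return apparent_ratio(a, b, baseline) / (a / b)


# Hypothetical values chosen for illustration only.
low, high = 1200.0, 1500.0

print(apparent_ratio(high, low))          # true ratio with a zero baseline: 1.25
print(apparent_ratio(high, low, 1000.0))  # drawn ratio with a 1,000 baseline: 2.5
print(exaggeration(high, low, 1000.0))    # the difference looks 2.0 times bigger
```

With a zero baseline the taller bar is 1.25 times the shorter one, which matches the data. Start the axis at 1,000 and the same two bars appear 2.5 times apart, so a reader who sees only the bars walks away with the wrong message.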

But I'm also a realist. I don't expect all my students to remember my deeper points about what makes a chart misleading or accurate. I feel more confident that they'll at least remember the two simple rules I taught them:
  1. Pie charts are a bad way to visualize data.
  2. Bar charts should have a y-axis that starts at zero (sometimes also referred to as "baseline zero").
And if that's all they remember, they'll make better charts.

[ UPDATE: Based on feedback I received on Twitter, I changed Rule #2 in this post to specify that "Bar charts" should have a y-axis that starts at zero rather than simply "Charts". As Alberto Cairo rightly pointed out to me, the rule doesn't apply equally to all chart types. There are some, like scatterplots, for which baseline zero is nearly irrelevant. And while I think line charts with a non-zero baseline can often be misleading (think stock or crime charts), Ben Jones pointed me to enough examples that I concede such exceptions are not "rare" when it comes to line charts. ]


  1. Good stuff, Chad, thanks for your response.

    I definitely agree that we should teach the rules of thumb to beginners. I'd only suggest adding Rule #3: There are exceptions to every rule, except Rule #3. This adds a dose of intellectual humility to the equation that many data viz "gurus" have failed to add in the past, resulting in a whole host of people out there who think certain visualization types are never warranted, and criticize them in knee-jerk fashion whenever they see them, regardless of whether or not they actually work well enough.

    I'd say that you and I mostly agree, but that our comments are directed toward two totally different problems: mine that a spirit of dogmatism does harm and creates an attitude of fear and mindless adherence and yours that there are a lot of bad charts out there. I think we'd agree that both of these problems exist.

    I also think we'd agree that the solution to both of these problems is to teach data visualization rules of thumb along with a dose of humility first, and then to educate beginners about the exceptions to the rule.

    I'd only humbly recommend not calling any visualization type "evil", as in my observation that tends to eliminate the element of intellectual humility that is important to reducing the problem I'm trying to address. People who are taught things are evil tend to be dogmatic about those things, so I'd love it if we could ditch that rhetoric entirely. Plus, sometimes pie charts just aren't evil at all, like the example you posted above.

    On my part, I'll be careful that my "it depends" mantra doesn't give people the impression that they can do whatever they want, regardless of whether or not it's effective. Ultimately, I believe it does depend, but that certainly doesn't free data visualizers of the onus of communicating well.

    Thanks again for weighing in. I hope these comments make sense.


    1. Hey Ben,

      Thanks for your thoughtful comment and I agree that we agree more than we disagree. :)

      I think pretty much everyone in this field agrees there is some value to these rules and that there are exceptions. It's just a question of emphasis. How much do we talk about the rules and how much do we talk about the exceptions?

      I think audience matters a lot here. If you're talking to complete beginners (which is mainly my audience), the rules are really important because they need shortcuts and their bias is towards making bad charts.

      With a more experienced/expert audience (frankly yours and those who listen to the Tableau Wannabe podcast), emphasizing the nuance is probably more appropriate, as those people already know the rules and are sophisticated enough to bend/break them.

      I defend my right to use the word "evil" though. :) I obviously don't think pie charts are literally a force for evil. But I think hyperbole and humour can help make an idea stick for students.


    2. Then call them evil if you must. They, too, are legal in the state of Washington though, and most people will try them sooner or later. And when they do, DataRemixed will be there to show them how to do it right, man. ;)

    3. You make some good points. But would the existence of rules have practically made any difference to the bad examples you highlight? My guess is no. If there were strict rules, the people who made those charts are not the most likely people to have read them. And anyone who does care enough to learn the rules probably does not need them in the first place.

  2. Surely in the few examples where pie charts aren't terrible, some kind of stacked bar chart would still be better than a pie? e.g. http://static1.squarespace.com/static/55b6a6dce4b089e11621d3ed/55b6cddee4b0f4fbddc2f557/55b6d093e4b0d8b921b030c1/1438044569313/ If the point of the pie chart tweet is that the pie more clearly separates "exactly 50%" from "around 50%" then I don't see that as much of an advantage, and if it's the percentage that's important for that data then why not plot percentage on the y-axis of the bar?