Tuesday, December 3, 2019

Tableau Training in Vancouver this February

My next two-day public Tableau training workshop will be held on Tuesday, Feb. 18th and Wednesday, Feb. 19th at SFU Harbour Centre. You can buy tickets here or by clicking the button below:

Here are some testimonials from people who've attended my earlier training sessions.

If you can't make this workshop but would like to be alerted when the next one is scheduled, just add your name here.

If you have several people at your organization who need training in Tableau, I'm also available for onsite training.

Friday, November 29, 2019

Video: "How to Teach Data Viz to Skeptics" by Chad Skelton

I had the pleasure of speaking at VisInPractice, part of the IEEE VIS 2019 conference in Vancouver this fall, on the topic of how to teach data visualization to students who aren't that interested in data visualization.

The video of my talk is below. You can check out all the great VisInPractice talks here.

VIS in Practice 2019: How to Teach Data Viz to Skeptics from VGTCommunity on Vimeo.

Thursday, March 14, 2019

Online Tableau Training this April

My next online Tableau training workshop will be held this April over three Thursdays: April 11, 18 and 25. You can buy tickets here or by clicking the button below:

Here are some testimonials from people who've attended my earlier training sessions.

If you can't make this workshop but would like to be alerted when the next one is scheduled, just add your name here.

If you have several people at your organization who need training in Tableau, I'm also available for onsite training.

Thursday, January 10, 2019

Five ways to get your students to participate more in class

U.S. Department of Agriculture / Flickr

Over the years I've stumbled across a number of techniques for improving student participation in my classes that have worked really well. Here are the five key ones (you can click on each one to be taken to a more detailed description):
  1. Explain why participation is important.
  2. Make sure students are prepared for weekly discussions.
  3. Have students self-report their participation marks.
  4. Get students to discuss a question in small groups first.
  5. Call on students at random.

Full post:

About five years ago, I was asked to take over a new course at KPU, the university where I teach: Introduction to Journalism.

I was worried.

Not because I didn't know the subject matter. I'd been a working journalist for more than 15 years.

And not because I didn't know how to teach. At that point, I'd already been teaching for several years.

The problem was that the Intro course had been designed by its previous instructor as a discussion course. Each week, students came to class and — facilitated by the instructor — discussed issues like journalism ethics and the business model for news.

I had no idea how to teach such a course.

In my many years of teaching, both at the university and my own private workshops, I always taught people how to do things. Whether it was how to build an interactive chart, or how to do a court search, my classes were always very hands-on and practical.

Class participation was always a part of my other classes, but it wasn't the point of the class.

I worried about how I would get my students engaged enough in discussions about journalism to fill up a three-hour class each week.

Unfortunately, the first couple of times I taught the class, many of my worst fears were realized.

I'd throw a discussion question out to the class — "So, when is it OK to use anonymous sources in a news story?" — and be met with stony silence. Sometimes, the one or two keeners in class would share their thoughts, but getting the rest of the students to take part was like pulling teeth.

I'd occasionally pull the classic instructor trick of calling on a student who wasn't participating to share their thoughts. But doing that always seemed slightly mean — putting a student on the spot who wasn't prepared — and, regardless, it rarely elicited more than a shrug and a poorly thought out answer.

I started to dread the days that I taught the Intro course. I suspect my students did too.

But then things started to change.

Almost by accident — an article here, a podcast there — I picked up a few ideas for how to improve student participation and gave them a test-run in my class. After some initial success, I got brave enough to experiment with some ideas of my own.

Over time, the participation levels in my Intro class started to increase, gradually at first and then quite dramatically.

What had once been three hours of painful, awkward silence became a spirited weekly discussion with students who were engaged and interested in the topic.

Intro to Journalism is now one of my favourite classes to teach, one I look forward to every week.

In the hopes it might be of some assistance to other instructors out there, below I share the five things I think made the biggest difference in improving participation in my class.

While my experience is at the university level, I think most of these techniques could be easily applied to high-school classes and (with some modification) even lower grades.

One big caveat: I'm a data guy so I feel it's important to note that I haven't subjected any of these techniques to rigorous analysis like a randomized controlled trial. My evidence in support of all of them is purely anecdotal and based entirely on a single course. Your results may vary.

But getting students to participate more is such a common challenge in teaching that I thought these ideas were worth sharing.

So, without further ado, here are my five tips for getting your students to participate more in class.

1. Explain why participation is important.

If participation is a key part of your course (and especially if it's a component in a student's final grade), I think it helps to explain to your students why.

Part of that explanation, of course, is personal: Participation is an important part of their own learning, to help them understand the course material better.

But I also impress on students that we're all in this together: We're going to be together in this room for three hours every week and a lot of that time is going to be taken up by class discussion. If people don't participate, those three hours are going to go by really slowly. In contrast, if everyone participates and does their part, the hours will fly by and we'll all have fun. I find students really respond to that sense of common purpose.

2. Make sure students are prepared for weekly discussions.

A common problem in teaching is the "curse of knowledge": Teachers are such experts in their field that they have trouble remembering how daunting a topic can be to complete beginners.

I was often guilty of this when it came to class discussions. Some topics are so commonly discussed among working journalists — the use of anonymous sources, newspaper paywalls — that I expected students to already have opinions about them, or to be able to come up with an opinion on the spot.

But, of course, a first-year student taking an introductory journalism course has, in most cases, never thought about these topics at all.

If you want to have a meaningful class discussion about a topic, you need to make sure students have had some time to learn about the topic and reflect on it before class begins.

The typical way to deal with this challenge is with weekly readings: Have students read an article or two on the topic before class so they're ready to discuss it.

The problem, of course, is that many students won't do the assigned readings or, even if they do, will skim them in a way that doesn't prepare them to think deeply about the discussion topic.

I use a couple of strategies to address that.

First, each week, along with the assigned readings, I give students a single question about the readings. For example, I'll have them read an article or two about paying sources for stories and then pose the question: "Under what conditions is it OK for a news organization to pay money to a source for a story?"

Each week, students have to email me a very brief "weekly report" in which they answer — in at least two sentences — that week's question. I don't make the reports worth a lot of marks, but it's worth enough that students won't blow them off.

The other technique I use is quizzes. Each week, I give students a brief, three-question multiple-choice quiz on that week's readings. The quizzes are designed to be super easy for students who've done the readings (i.e. "What is this reading about?") and super hard for those who haven't.

Together, the report and the quiz make it hard to do well in the class without doing the readings. And it ensures students are adequately prepared to participate fully in class discussions.

I often still spring some new questions on students in class — or get them to respond to a video or audio clip that they're seeing in class for the first time. But at least students are well prepared for that one, main question every week.

3. Have students self-report their participation marks.

This idea I stole from the instructor who taught the Intro class before me, and it's a great one.

Instead of the instructor being responsible for keeping track of each student's participation marks, students report their own participation each week on a sheet of paper (here's an example).

I tell students they're expected to speak at least twice each class. They then give themselves a checkmark for each time they speak (up to a maximum of two).

This idea was meant to solve one problem: In a large class, it's really hard to know every student's name at the start of the course.

But I think it also solves another problem: It makes participation marks simple and transparent.

Participation marks that are assigned by the instructor can often be a bit ambiguous. Are students being marked on how much they participated in class? On the quality of their in-class contributions?

Worst of all, I think that very ambiguity can discourage some students from participating. Many students don't participate because they don't think they have anything valuable to say. If an instructor is going to ultimately decide whether your participation is "good" or "bad", it's too easy to convince yourself that you don't have anything "good" to say and not participate at all.

And, from the instructor's perspective, even if some students have "better" things to say in class than others, you want all students to participate, not just the keeners.

A self-reported checkmark system removes that ambiguity. Attend every class and speak twice each class? You're going to get 100% for participation. Simple.

To avoid abuse, I put some basic parameters on what qualifies for a checkmark: It needs to be a contribution of at least a couple of sentences (i.e. "I agree with what she said" doesn't count). I also remind students that there is nothing more obvious in class than someone who says nothing, so if they cheat and give themselves checkmarks when they don't deserve them, I'll notice.

Finally, I recommend putting out the checkmark sheet at the end of class. If you bring it out at the break, some students will give themselves two checkmarks and then skip the second half of class.

4. Get students to discuss a question in small groups first. 

It took a while before I tried this one, but I've found it makes a big difference.

Before we discuss a topic, or question, as a whole class, I have students discuss the question in smaller groups first.

Using this simple "team maker" tool, I break the class into four or five groups and then give them 5 to 10 minutes to discuss the topic in their small groups. Then I open it up to a whole-class discussion.
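I don't know how the team-maker tool works under the hood, but the basic behaviour — randomly splitting a roster into a few roughly equal groups — can be sketched in a few lines of Python (the roster here is made up for illustration):

```python
import random

def make_groups(students, n_groups, seed=None):
    """Randomly split a class list into n_groups groups whose
    sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = students[:]      # copy so the original roster is untouched
    rng.shuffle(shuffled)
    # deal students out round-robin, like dealing cards
    groups = [[] for _ in range(n_groups)]
    for i, student in enumerate(shuffled):
        groups[i % n_groups].append(student)
    return groups

# Hypothetical roster of 22 students split into 5 discussion groups
roster = [f"Student {i}" for i in range(1, 23)]
for group in make_groups(roster, 5, seed=42):
    print(group)
```

Dealing the shuffled names round-robin (rather than slicing the list into chunks) guarantees no group ends up more than one student larger than any other.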

This isn't typical "group work". Students aren't asked to present on behalf of their group or anything like that. The small-group discussions are simply meant as a warm-up for the main event. But it works wonders.

Shy students may think what they have to say isn't very interesting and so are reluctant to say it in front of 30 or more of their classmates. But put them in a group of just five or six fellow students, and it's not nearly so intimidating to share their thoughts.

And when that small group finds what they have to say interesting, it gives them the confidence to share their thoughts with the whole class later.

The key, I think, is not to overdo it with the small group discussions. Five minutes is often plenty to get the ball rolling. It's also important to stress to students that just saying something in their small group doesn't count for a participation checkmark. They need to share it with the whole class for it to count.

5. Call on students at random.

This idea came from an interview with education expert Doug Lemov on the EconTalk podcast. At about the 18:00 mark, Lemov talks about using "cold calling" in an elementary-school class. Instead of asking a question and waiting for students to raise their hands, Lemov encourages teachers to just call on any student at random.

The genius of cold-calling, according to Lemov, is that it forces all students, even those not called upon, to think about their answer. It's also a lot faster, because you don't have to wait for students to raise their hands before calling on them or — even worse — have no students raise their hands and then basically plead with your class for someone to answer the question.

What's interesting about cold-calling is that it's a technique that's already used by most instructors, but poorly: Either out of desperation, when no one raises their hand. Or (somewhat) cruelly, to put a student who never participates on the spot.

The key to making it work, I think, is consistency: To use it all the time, for every question.

I jokingly tell my students that, instead of picking on the one or two students who never raise their hand, I instead pick on everybody.

I'm also a bit more systematic in my approach.

Partly because it takes me a while to learn my students' names and partly because I don't trust myself to be completely fair in calling on students, each class I randomize the class list. (I randomly sort an Excel spreadsheet, but this online list randomizer works just as well.)

Then — for every question — I start by calling on the three students whose names are at the top of the randomized list. Then, for the next question, I call on the next three. And so on. The first time I do this, I show students how I randomize the list on screen. Every time after that, it's secret, so students never know when they might be called on.
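The shuffle-once, call-in-batches-of-three routine described above amounts to a simple procedure; here's a minimal Python sketch of it (the names are hypothetical — in practice I just sort a spreadsheet):

```python
import random

def cold_call_order(roster, per_question=3, seed=None):
    """Shuffle the class list once, then yield the next batch of
    students to call on for each successive question."""
    rng = random.Random(seed)
    order = roster[:]           # copy so the original roster is untouched
    rng.shuffle(order)
    for i in range(0, len(order), per_question):
        yield order[i:i + per_question]

# Hypothetical roster
roster = ["Aisha", "Ben", "Carla", "Dev", "Emma", "Farid", "Grace", "Hana"]
for q, students in enumerate(cold_call_order(roster, seed=7), start=1):
    print(f"Question {q}: call on {', '.join(students)}")
```

Because the list is shuffled once per class rather than re-drawn per question, every student gets called on exactly once before anyone is called twice.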

While I start each class discussion by calling on students at random, after the first three students have been called on, any student can raise their hand and talk. And, in my experience, many do.

I think that's because, fearing they might be called on, most students have prepared a response in their head. So, then, when they're not called on, they figure they might as well share their thought anyways and get their participation checkmark.

Those are my five suggestions for how to get students to participate more.

If you've got a trick to share, please add it in the comments. Note: To avoid spam comments, all comments on this site are moderated, so it may take a while for your comment to show up.

Tuesday, July 24, 2018

My next Online Tableau Training workshop is this September

My next online Tableau training workshop will be held this September over three Thursdays: Sept. 13, 20 and 27. You can buy tickets here or by clicking the button below:

Here are some testimonials from people who've attended my earlier training sessions.

If you can't make this workshop but would like to be alerted when the next one is scheduled, just add your name here.

If you have several people at your organization who need training in Tableau, I'm also available for onsite training.

Thursday, June 7, 2018

How much evidence do we need for a data visualization "rule"?

In a separate post, I laid out some of my arguments for why I think most line charts should start at zero. I posted some of my initial thoughts on that topic on Twitter, which generated some really thoughtful replies.

One of them, from Steve Haroz, noted that he knew of no evidence that people read non-zero-baseline bar charts any differently than non-zero-baseline line charts. And, furthermore, that we should be careful in talking about data visualization "rules" when our evidence for them is weak or nonexistent.

This led to a quite spirited discussion about whether data-visualization "guidelines" or "rules of thumb" that don't have any empirical research to back them up can still be valuable, or if we should stick primarily to those things that we have solid evidence for.

Speaking personally, I didn't fully appreciate the gaps in data visualization research until I watched Robert Kosara's excellent talk at the University of Washington, "How Do We Know That?"

The talk is based on Kosara's paper, An Empire Built on Sand, which I now assign to my students at the University of Florida.

As Kosara points out, many of the things we think we know about data visualization have little empirical evidence to back them up. And other well-accepted "rules" may actually be wrong (for example, "chartjunk" may not be so bad after all).

Some rules are based on nothing more than the strong opinions of influential early writers in the field (like Edward Tufte and Jacques Bertin) and have not actually been subject to peer-reviewed research.

So where does that leave us as data visualization practitioners and teachers?

It would seem obvious that we shouldn't teach "rules" that we know to be wrong. But what about the many areas for which there is little or no empirical evidence at all? Can theory replace research in some cases? Is a common practice worth teaching our students even if we don't know it to be true?

Below, I've tried to collect some of my own thoughts on the matter as well as those of others who took part in the Twitter discussion.

First, though, a big caveat about my own tweets: While I teach at a university and have (strong) opinions on how to teach data visualization, I'm an "instructor" not a "professor". I don't have a PhD and I'm not engaged in academic research myself.

Let's get to the tweets!

I was curious about the project Enrico mentioned but Chen didn't appear to be on Twitter, so I sent him an email.

Chen sent me a very nice email back directing me to the Visualization Guidelines Repository.

The repository is still a work in progress, but an example on "chartjunk" suggests it could eventually be similar to what Ben Jones was suggesting: Links to where guidelines come from and studies that support or refute them.

There is also a related project, VisGuides, which is a platform to discuss visualization guidelines. (VisGuides was presented at Eurovis this week.)

Chen told me the two projects were set up by four visualization scientists: Alexandra Diehl, Alfie Abdul-Rahman, Menna El-Assady and Benjamin Bach.

It will be interesting to see how the Repository and VisGuides develop.

But I wonder if there isn't also a space for something more like the University of Chicago's survey of economists, but for data visualization: A place where people can see at a glance what leading practitioners in the field think about different guidelines.

I think this would provide useful information about which guidelines are universally accepted (i.e. "95% of practitioners think bar charts should start at zero") and which are more contested (i.e. "30% of practitioners think line charts should usually start at zero").

With sufficient buy-in, it could also provide a one-stop shop for people to check in with their favourite thinkers in the field when struggling with a chart decision. ("I want to make a pie chart with eight slices. What would Alberto Cairo think about that?" "Would Cole Nussbaumer Knaflic approve of me truncating this axis?")

If you've got thoughts on this topic, please post a comment below or hit me up on Twitter. Because of spam comments, my comments are moderated so don't be alarmed if yours doesn't show up right away. It will within a few hours.

Bar charts should always start at zero. But what about line charts?

If there's one thing almost everyone agrees on in data visualization, it's that bar charts should start at zero.

Starting them anywhere else — truncating the y-axis — risks misleading your audience by making a small difference look like a big one.

Yet many experts agree that while the baseline zero rule is pretty much ironclad for bar charts, it doesn't necessarily apply to other chart types. And, in particular, it doesn't always apply to line charts.

The argument is that because bar charts encode data by length, truncating the axis naturally misleads your audience. In contrast, line charts encode by slope or position, so baseline zero isn't as important.

But I'm not so sure about that.

Here's an example.

When people talk about how truncating the y-axis can make a bar chart misleading, it usually doesn't take too long before this infamous chart from Fox News comes up.

But let's imagine that, instead of a bar chart, Fox had used a line chart instead.

Isn't that chart misleading, too? I would say yes. And I think it's because — while bar charts and line charts are clearly different — I'm not sure that the average reader interprets them that differently.

In my personal experience, and what I've observed in others, people "decode" a line chart in much the same way they decode a bar chart: By the distance of the mark from the baseline. Which means a line chart with a non-zero baseline poses a similar risk of misleading people as a non-zero bar chart.

This isn't an original idea. In a 2013 blog post on baselines, Robert Kosara said he thinks baselines can be important on both bar charts and line charts:

Some people suggest that in contrast to bar charts, line charts are not sensitive to the baseline problem. However, I disagree. Look at the same data as before, this time shown as a line chart.

Is the change not much more dramatic in the right-hand part of this image? The line chart maps the value to vertical position rather than length, which is less obviously connected to the axis [than bar charts]. But when the points are connected, we tend to think in terms of the distance from the axis, not in terms of a few points floating in space. 
Line charts with a non-zero baseline are very common. They are still problematic, however, because the apparent change can be deceiving. Having to look at the numbers on the axis to figure out the amount of change requires a lot more mental work and partly defeats the point of the chart.
And, indeed, there's some preliminary empirical evidence to back up the idea that truncating the axis is a problem on line charts, too.

A 2015 research paper looked at how various "deceptive" charts affected the way people perceived the message in a data visualization.

For example, people were shown two bar charts and asked how much bigger one bar was than the other on a 5-item Likert scale from "slightly better" to "substantially better". (The charts shown here are examples from the paper; the actual ones tested were somewhat different.)

Not surprisingly, people were more likely to say the difference was substantial when the y-axis was truncated.

The study didn't specifically look at truncated y-axes for line charts. But it did look at line charts with a distorted aspect ratio, which has a very similar effect (as changing the aspect ratio, like truncating the axis, can make a line look more or less steep).

Interestingly, the study found readers were also misled by the distorted line chart. And, in fact, the gap between the control and the deceptive line chart was greater than it was for the bar charts.

As Enrico Bertini, one of the paper's authors, notes, the values used in the bar and line charts were not the same, and so we can't really compare them directly to each other.

But this provides at least some evidence that the concerns we have about bar charts — that truncating the y-axis can mislead people — could also apply to line charts.

It's important to note that, for all the charts used in the "deceptive" charts study, the actual numbers were visible on the charts (as in the examples above). So participants were misled even though the axes were properly labelled. This is an important point, I think, as people often dismiss concerns about truncated axes (on bar charts or line charts) by arguing a chart is honest as long as the axes are labelled. As David Yanofsky wrote in Quartz:
Blaming a chart’s creator for a reader who doesn’t look at clearly labeled axes is like blaming a supermarket for selling someone food he’s allergic to.
It's an interesting analogy as, when it comes to food allergies, schools, restaurants and stores now go out of their way to alert people to possible allergens, believing their moral duty to prevent harm is greater than just listing "peanuts" in tiny type on the ingredients list.

While the stakes are (thankfully) not nearly as high when it comes to charts, I think chart creators should also go out of their way to avoid harm. We don't want our charts to mislead people, including those who don't look carefully at the axis.

Visualization researcher Steve Haroz also notes he's aware of no research to back up the claim that non-zero baselines are more problematic with bar charts than line charts.

So does that mean line charts, like bar charts, should always start at zero?

I don't think that's right, either.

Because it's not hard to find examples where a rigid baseline-zero rule for line charts leads to data visualizations that are totally useless.

You can make global warming look like no big deal if you stick to baseline zero (as the National Review did).

Intraday stock charts are another good example. A very small change in a stock price (i.e. up or down a few percent) may be very meaningful if it's meant to show how the market reacted to news about a company. Like this chart showing what happened to Apple stock after the celebrity photo hacking scandal:

Or let's say you have an expensive diagnostic machine in a hospital that will break down if a certain fluid goes above or below a given level. A "control chart" that shows if operations are within a narrow acceptable range is clearly the right answer. Should we risk letting the machine break down just so the baseline-zero crowd are happy?

The bottom line is that sometimes small changes are really important. And if baseline zero makes those small changes invisible, or really hard to see, that's not ideal.

But in many other cases, the important changes are of large or medium size and are easy enough to see using baseline zero. All truncating the axis does in those cases is make those changes look (misleadingly) much bigger than they really are.

For example, this line chart of "Breaking Bad" star Aaron Paul's Twitter followers clearly shows there was a spike in followers during the final season.

Using baseline zero doesn't make the trend hard to see and, I would argue, makes the chart more informative as it gives both a sense of when his followers started to spike and an accurate picture of how big that recent growth in followers really is (~40% increase).

All truncating the y-axis does is make that growth look much, much more dramatic than it really is. It provides the reader with less information, not more.

It strikes me that line charts are communicating (at least) two things.

One is the rate of increase/decrease relative to earlier points on the chart. For example, a big shift in a stock's price immediately following a major news event. Or how crime went up faster between November and December than between July and August. For these types of comparisons, baseline zero is irrelevant.

But a line chart is also often communicating the actual rate of increase/decrease (i.e. up 25%, down 50%). And for this, baseline zero can be very important (and its absence potentially misleading).

While more research in this area would be helpful, I'm inclined to think that both these things are probably true:
  1. Truncating the y-axis on a line chart, like on a bar chart, risks misleading your audience into thinking a change is bigger than it really is.
  2. Sometimes that risk is worth it to make sure your audience is able to see small, but meaningful, changes in the data.
Which leads me to think this may be a good rough guideline for whether line charts should use baseline zero:
Most line charts should start at zero.
BUT not using baseline zero is OK if:
a) Zero on your scale is completely arbitrary (i.e. temperature) OR
b) A small, but important, change is difficult or impossible to see using baseline zero.
When I floated this idea on Twitter, Alberto Cairo came up with a slightly different rule of thumb:
Here's how I approach this:
1. If you can include 0 and there's a natural 0, include 0.
2. If by including 0 your line becomes so flat that you barely see differences, then it's wrong and misleading
3. The main purpose of a line chart is to see differences, not to tell how far it is from 0 as a whole (that can be a purpose, too, but a secondary one, and subject to fulfilling the former.)
4. All these depend on the nature of the data
These are good guidelines. And certainly better, in my view, than "the baseline doesn't matter at all on a line chart". But I take issue with a couple of Alberto's points.

First, to Point 3, I'm not sure the fundamental purpose of a line chart and a bar chart are necessarily that different.

Indeed, when I encounter charts in the wild (in news stories or business reports), the main thing that sets the two apart is just the type of data being represented: categorical data is usually shown on a bar chart, time-series data is usually shown on a line chart.

For example, homicide rates between major cities would typically be shown on a bar chart, while the change over time in the homicide rate for a particular city would be shown on a line chart. But what's being compared — the homicide rate — is the same in both cases. And what defines a meaningful difference (between a violent city and a safe one, or how much safer a city has become over time) is the same, too.

On Point 1, I'm also not completely sold on the idea of a "natural 0". I've asked Alberto what he would consider an "unnatural zero" — meaning baseline zero isn't required — and he has said he defines it as situations where the data being visualized is unlikely to ever hit zero. Unemployment will never be at 0%. A nation's life expectancy will never be 0 years.

Yanofsky made a similar point in his Quartz piece, arguing baseline zero is "worth omitting when the implication that [the data] might reach zero is preposterous".

And Stephanie Evergreen made a similar argument in a post last year.
Other than for bar charts, I advocate for a y-axis that is based on something reasonable for your data. Maybe the minimum of the axis is your historically lowest point. Maybe the minimum should be the point at which you’d have to alert your superiors. Maybe the minimum is the trigger point where your team has decided a different course of action is needed. Whatever you pick, just pick. Make it meaningful and intentional. Not something the software automatically decides for you (though that’s a place to start your thought process).
And, indeed, Alberto used the "natural 0" argument to make a case for why my hypothetical Fox News line chart above is misleading even though many line charts with a non-zero baseline aren't:

The idea of a meaningful, natural baseline for a line chart is appealing. But I'm not sure it makes sense in practice.

If most of your audience doesn't know that the U.S. had no income tax prior to 1913 — and I would hazard a guess that most of them don't — then how can that fact be relevant to whether a chart is misleading or not?

Or, to put it another way, if this were a chart of income taxes in a country that had always had high income-tax rates (Norway, maybe?), would that suddenly make it OK to have the y-axis start at 34%? Would the line chart no longer be misleading? I don't think so.

I think Alberto and Stephanie's argument for a "natural" baseline other than zero — whether a historical minimum or a "trigger point" where action needs to be taken — only makes sense if the baseline is annotated with that contextual information. Then, the "natural" baseline is providing useful context.

But a line chart that uses a "natural" baseline with which the audience is unfamiliar won't be any less misleading, in my view, than one that uses an arbitrary cut-off.

Also, even in situations where zero is never reached, zero remains a useful benchmark: it lets readers see the real rate of change.

An increase in unemployment from 4% to 8% is a doubling of unemployment, even if unemployment will never reach zero. And a drop in mortgage rates from 8.5% to 5% is not quite as dramatic as this chart makes it look, even if banks will never loan out money for nothing.
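That arithmetic is simple enough to sketch in a few lines of Python (the figures are the ones from the examples above):

```python
def relative_change(old, new):
    """Relative change from old to new, as a fraction of the old value."""
    return (new - old) / old

# Unemployment rising from 4% to 8% is a 100% increase -- a doubling --
# even though the unemployment rate itself will never reach zero.
print(relative_change(4, 8))    # 1.0, i.e. +100%

# Mortgage rates falling from 8.5% to 5% is roughly a 41% drop:
# substantial, but not the near-collapse a truncated axis can suggest.
print(relative_change(8.5, 5))  # roughly -0.41
```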

All of which, in my view, supports the argument that most line charts should probably start at zero unless doing so makes small, but important, changes hard to see.

It's worth noting there are also workarounds, like showing percent change from a 0% starting point or, as Ben Jones points out, inset charts that show both the big picture and then zoom into the area of interest.
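As a minimal sketch of the first workaround, here's how a series can be re-expressed as percent change from its first value, so the transformed line has a natural 0% starting point (the mortgage-rate numbers below are made up for illustration):

```python
def percent_change_from_start(values):
    """Re-express a series as percent change from its first value,
    giving a line chart a natural 0% starting point."""
    baseline = values[0]
    return [100 * (v - baseline) / baseline for v in values]

# Hypothetical mortgage rates over five years (illustrative numbers only)
rates = [8.5, 8.0, 7.2, 6.0, 5.0]
print(percent_change_from_start(rates))
# The first value is always 0.0; the last shows the total change from the start
```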

As part of her excellent series on what to consider when using different chart types, Datawrapper's Lisa Rost wrote a post on line charts that argues that, while baseline-zero isn't a rule for line charts, it's worth considering when your data is close to zero (a view shared by Dona Wong):
Consider extending your y-axis to zero. Line charts have the big advantage that they don’t need to start from zero. If your data comes close to zero, however, consider adding the zero baseline. Readers then will be able to compare not just the vertical distance between two values with each other, but also the distance between these values and the zero baseline.

While I think this is good advice, it strikes me as insufficient. The argument here is essentially the same as the one I made above: using baseline zero on a line chart is better, as it allows you to see both the relative and the actual rate of change. But it seems odd that this advice should be limited to situations where the baseline is already close to zero (and so the amount of distortion is relatively small) and not extended to those where the baseline is far from zero (and the distortion is potentially much greater).

The more I look at this issue, the more convinced I become that most line charts should start at zero. But if it's true that line charts have as much potential to mislead as bar charts, that raises another intriguing question: Why shouldn't there be exceptions to the zero-baseline rule for bar charts, too?

After all, small changes can exist for categorical data as much as for time-series data.

If it's OK to truncate the y-axis to show small (but important) changes in a country's life expectancy over time, why must we stick with a zero baseline to show small (but important) differences between countries?

If Iceland is doing something right that gives its people two more years of life than those in Denmark, does this chart really let us see that clearly? I realize there are alternatives to bar charts (like dot plots). But is there actual evidence to suggest the zero-baseline rule should be hard-and-fast with bar charts or is it just a convention?

(For what it's worth, I'll continue to advise my students to make all bar charts zero baseline, if only because it's such a convention in the field that doing otherwise would make them look like they don't know what they're doing.)

While I teach at a university, I don't have a PhD and I'm not an academic researcher. But if any researchers are looking for ideas, I think a study that directly compared truncated bar charts and line charts would be great.

Because while the study on "deceptive" data visualizations provides some initial evidence, it's limited by the fact that it studied aspect ratio on line charts, not specifically truncated axes. And the bar charts and line charts weren't directly comparable.

I think it would be useful to compare the exact same data using bar charts and line charts — both with and without truncated axes. We'd then be able to see how truncating the y-axis affects people's perception of the data and, crucially, whether the impact on people's perception is any different for line charts and bar charts.

I'll leave it to the experts, but I think a study like this would also require some careful thought about how to measure perception.

I think one of the strengths of the "deceptive" data visualizations study is that the axes were labelled, as that more closely approximates the way such charts exist in the wild.

But that means it's probably not useful to ask people to estimate the specific values in the charts, as many will just look at the axes and rely on the labels rather than the visual.

The earlier study tried to get around this by asking participants whether they thought the differences in the chart were "substantial" or not. But as Enrico Bertini, one of that study's authors, notes, it's hard to separate the semantic meaning of the data from the visualization.

For example, a 1% increase in the unemployment rate is substantial. So is a truncated line chart that makes the reader see that increase as a "big deal" more misleading than one with baseline zero — or less?

It strikes me there might be a couple of ways around this problem. One would be to ask imprecise questions of magnitude: "Looking at this chart, do you think the number of incidents has a) gone up by about a third, b) gone up by about 50%, c) doubled, d) more than doubled?" Some participants might look at the axes and try to do the math in their head. But I suspect many wouldn't. And if study participants get the magnitudes wrong even with the axes labelled, that would be strong evidence that truncation can be seriously misleading.

The other possible solution I see would be mixing up the datasets: Visualize mortgage rates on some charts, immigration numbers on others. That would perhaps provide some insight about whether the context of the dataset affects how people interpret the charts.

I think a study like this might provide some guidance on when and how we should truncate line charts and also whether baseline zero is any more important for bar charts than line charts (something we all assume but which it appears we have little empirical evidence for).

Two final points.

First, some will say any "rules" about data visualization are counterproductive, as every situation is different. While I think there's some truth to that, I think rules of thumb are useful, especially for beginners to the field. (I have a whole argument about that if you want to read it.)

Second, I think in all of these debates audience is really important.

If you're building an internal dashboard for your organization measuring, say, whether sales are up or down from week to week, whatever axis scale you use will likely become familiar to your users over time. That means the risk of misleading your audience is probably low, and so truncating the y-axis may make sense to make small differences easier to see.

In contrast, if you're producing charts for the general public (like in data journalism or for a public report), I think the risk of misleading people with a truncated y-axis is much higher.

UPDATE: In Spring 2020, a really interesting paper came out that found that — contrary to conventional wisdom — truncating line charts appeared to have the same effect on a reader's subjective perception as truncating bar charts. I encourage you to check it out. There's also a blog post that accompanies the paper.

This post is an expansion of a Twitter thread on the topic and the many thoughtful replies I received in response.

That discussion also went off into a separate, but quite interesting, tangent on the limits of our knowledge in data visualization and what that means about how we should teach "rules" and "guidelines" in the field. I've collected some of the best tweets from that discussion in a separate post.

If you've got thoughts on this topic, please post a comment below or hit me up on Twitter. Because of spam comments, my comments are moderated so don't be alarmed if yours doesn't show up right away. It will within a few hours.