Thursday, March 30, 2017

A different way to think about "Data Storytelling"

John Henderson

John Schwabish wrote an interesting series of blog posts last week on data storytelling.

In the posts, John argues that we in the data visualization community are overusing the term "story" — applying it to all sorts of situations where we aren't really telling stories at all.
People who work with and communicate data tend to throw around the words “story” and “data” a lot these days. We say all too regularly, “Let’s tell a story with these data” or “What story do these data tell us?” While it sounds good to say that we’re telling stories with our data, I think far too often, far too many of us are not applying the word story to data correctly.
In making his case, John defines a "story" the way many people do: as a literary story, one that follows a familiar story arc, is emotional, and has a meaningful climax.

I agree with John that — defined this way — a "story" is, indeed, a poor way of thinking about how we visualize and present data.

But there is another type of story that I think does do a good job of describing the process of data analysis and visualization, and whose conventions and terminology can actually be quite helpful in getting people to think more deeply about how to make their data engaging and interesting.

The news story.

A news story is what I mean when I talk about telling stories with data — something I do quite a lot, from my blog tagline to my Tableau workshops to the "Data Storytelling" course I'm teaching at the University of Florida this summer.

Why literary stories are a bad model for data visualization (or where John and I agree)

John has already made an excellent argument for why — most of the time — a literary story is a poor model for the process of data analysis and visualization.

Most data visualizations aren't emotional. They don't follow a typical literary story structure. They rarely have a "character" that we follow on a journey.

In fact, I'd go one step further than John and argue that, not only is a literary story a poor model for thinking about data visualization, but that trying to make one's data fit a literary story arc can actually be quite dangerous.

That's because the conventions of literary stories — cause and effect, climax, resolution, emotion — are often working at cross purposes to good data analysis.

Let's take just one of the most basic rules given to all first-year stats students: Correlation does not imply causation.

It's hard enough to remember this rule when looking at a scatterplot of two variables. It's a heck of a lot harder if that relationship is part of a "story" about the journey those two variables are on and the creator of the chart has worked really hard to make sure their visualization has some kind of dramatic "climax" that wows their audience.

The goals of literary storytelling are also quite different from the goals of data analysis, and chasing them can distract us from our primary purpose of making sense of data.

Fundamentally, the purpose of a literary story is to entertain, and the teller of a fictional story has the luxury of making things up to ensure their story is as entertaining as possible.

In contrast, the purpose of data analysis is usually to inform an audience, and those visualizing data are limited to plain old facts.

Those facts may not stir emotions, or fit into a satisfying story arc. And trying to make them fit that pattern — because someone has told you your data should "tell a story" — is often a distraction from figuring out what's most important in your data and communicating that message to others.

A better model: The news story

While a literary story is a bad model for most forms of data visualization, thinking about data like a news story can actually be quite helpful, in my view.

Like with data analysis, the purpose of a news story is primarily to inform, not to entertain. And, like with data analysis, the authors of news stories are limited to facts in constructing their story. They can't just make something up to make their story more exciting (or at least they can't without the risk of getting fired; see Stephen Glass and Jayson Blair).

Also, in my experience, the conventions and terminology of news stories can be a helpful way for those who work with data to think through how best to present their findings.

I explored some of those conventions in a talk I gave last June at the Information Plus conference in Vancouver on "How to think like a data journalist".

But, briefly, here are some news story conventions that I think those who work with data could learn from:

Headlines: People have an annoying habit of giving their charts titles that describe their data rather than communicate the key takeaway message they want their audience to have. I tell my students to think of their chart titles like a headline: Don't hope your audience figures out what your message is on their own. Just tell them!



Lead: The lead is the first sentence of any news story. It's similar to a headline but serves a dual role. A lead should both communicate the most important information in your data and make the reader want to know more. I think leads are so important that I make all my students, whether journalism students or otherwise, come up with a lead at the earliest stages of their data visualization projects. Condensing your analysis down to a single sentence forces you to make a choice about what really matters in your data. Once my students have a written lead, I get them to think through how they would translate that sentence into a chart that gets their key message across.

Inverted Pyramid: The inverted pyramid is the way that almost all journalists first learn how to write a news story. You start with the most important information at the top, and then move on to the next most important, and so on until you end your story with the least important facts. Writing stories in this way makes it easy for editors to chop a story for space at the last minute without needing to dramatically rewrite the whole thing. Inverted pyramid writing can be a bit boring — and most journalists eventually move away from it, at least in part — but it forces journalists to have a clear sense of the relative priority and importance of almost every fact in their story. Data analysts could learn something from this technique: Prioritizing their findings from most important to least important, even if they don't necessarily present their findings in that exact order.

Credit: Wikipedia
Making it personal: I did a whole Tapestry talk on this one, but in short, journalists are very good at framing a news story so it's directly relevant to the reader. In some cases this is as simple as putting the word "you" in a headline, but it can also involve charts or maps that allow readers to pinpoint the data that is of specific interest to them. My friend Steve Wexler has explored how this principle can be applied in a business context: How does my salary compare to others in my organization? How do my store's sales measure up to others in my area?

Real People: John actually addresses this point quite well in the last post in his series. For journalists, it's second nature to find the "real people" who help illustrate a data point. When I did a series on parking tickets in Vancouver, I profiled a gung-ho parking ticket officer who hands out 60-70 tickets a day. For a series on bike thefts, I told the story of a bike that was stolen not once, but twice. And for a series on low organ donation rates in immigrant communities, I profiled a South Asian woman who waited a decade for a kidney. Those human stories bring the data to life — helping readers understand that the data is not an abstraction but a reflection of real things going on in the real world to real people. Like John, I think in some cases data analysts can make their data more engaging by finding the human stories that help to illustrate the figures. But in other cases, talking to people is important simply to better understand what's really going on with your data. If your data is showing sales are way down at one store, call up the manager and ask why. Data analysts need to step away from their spreadsheets every now and then and engage with the real world.

These conventions aside, one other advantage of thinking of data storytelling like a news story rather than a literary story is that there are different types of news stories.

Feature stories can often be quite similar to a literary story, with a clear narrative told from beginning to end, often featuring a key character.

But there are also breaking news stories, where the important thing is to communicate what's going on as quickly and concisely as possible. What would the visualization equivalent of a breaking news story be? Maybe a dashboard showing up-to-the-minute sales data.

There are also explainers, which pose a question that the journalist then tries to answer (e.g. "Why did Rural America vote for Trump?"). Explainers suck the audience in not through a traditional story arc but by posing a question that sparks curiosity. Explainers are perhaps one of the easiest fits for the work of data analysis and visualization, which is often motivated by seeking answers to specific questions ("What are our most profitable products?" "What's the connection between vaccination rates and measles outbreaks?").

There are other types of news stories. I won't list them all.

The point is that news stories cover a broader range of story structures than literary stories do, which makes them a better analogy for data analysis, a field with a wide variety of purposes.

Why bother talking about data storytelling at all?

I hope I've made the case that, when we want to tell stories with data, thinking of those stories as news stories is more useful than thinking of them as literary stories. But why bother talking about data storytelling at all?

As John argues:
What I’m primarily focusing on here are the line charts, bar charts, area charts, and other charts that we all make every day to better understand our data, conduct our analysis, and share with the world. Even though we often say we’re telling data stories, with those kinds of charts we are not telling stories, but instead making a point or elucidating an argument.
I think there are at least two reasons why the focus on data storytelling can be helpful.

First, and most simply, the term story naturally makes one think about the audience: about who that story is being told to.

When it comes to data analysis and visualization, I think that's a good thing. Data analysts spend a lot of time with their data, and it can be easy to get lost in the weeds and forget how foreign your figures will seem to someone coming to them fresh. Thinking about telling stories with data reminds you that you need to simplify your message so that it's easier for your audience to digest.

Even the terms data analysis and data visualization keep the focus on the process: Of analyzing the data or turning the data into charts, rather than explaining your data to others. Or as Jewel Loree said in her Tapestry talk this year:

Using the term story is a good reminder that, at the end of the day, you have to communicate your findings to someone else and that will require you to think about who your audience is and what they need.

Second, I think there is a tendency in data visualization to put way too much data into your chart.

This is partly due to our own insecurities. Who hasn't had a boss ask them why this or that wasn't in their chart or presentation? So, just to be safe, we lean towards leaving stuff in so that no one can get mad at us for leaving it out.

It's also because the tools that we use, with all their fancy interactivity and filters, make it easier than ever to show more data rather than less.

Excel, for all its many flaws, at least forced you to decide which static chart to build. With Tableau, you can create a Dashboard showing a half dozen views and then load it up with a half dozen filters. What I like to call: Show Everything. Filter Everything By Everything.

News stories don't include every possible fact about what happened. The journalist makes a judgment call about which facts are most important and should be emphasized, how much background information is necessary for proper context, and which facts can safely be left out altogether.

Thinking about data as a story is an important counterweight to the dangerous tendency to include too much information in our visualizations and presentations.

Final thoughts

I've tried to avoid dictionary definitions so far in this blog post but I don't think I can any longer.

John argues people are using the word story too broadly:
I think most of us are using the word story as it applies to anything we are trying to communicate with data; we are using that word too flippantly and too carelessly. One could argue that we in the data visualization field can come up with our own definition of story, but that’s simply changing the definition to meet our needs. Plus, I don’t think that’s how many people view it—they see visualizing data as a way to tell a story, but it rarely is a story.
I frankly think John's definition is too narrow. And it's not one shared by most dictionary definitions of the word I could find (Merriam-Webster, Oxford). Indeed, most dictionaries define story in a pretty broad way, encompassing everything from news articles to gossip to novels.

If people always meant literary stories when they talked about data storytelling, I'd be worried. As I've already explained, in most cases, I think trying to make your data fit a traditional story arc will be distracting at best and dangerous at worst.

But, as John rightly points out, people aren't using the term data storytelling to mean telling a traditional literary story with data. Rather, they're using the term in a much looser, vaguer way. For me, telling a story with data means telling something like a news story. For someone else, it may mean recounting their personal experience with a dataset.

John seems to think this vagueness is a bad thing. But I disagree.

I think when people talk about data storytelling they're really being aspirational.

They know data can be dry and boring and they want to find a way to present their findings in a way that grabs their audience's attention.

They use the term story because they know people get excited and engaged by stories and — in a perfect world — that's how they want people to respond to their data, too.

I get that, for some in the data visualization community, "data storytelling" has become a bit of a cliche: a meaningless phrase that people like to throw around without really thinking it through.

But I think when most people say they want to tell a story with data, what they really mean is that they want to find a way for their data to have more meaning and impact.

And that's something we should all want to encourage.


  1. Hi Chad,

    that's a great post! And I fully agree that if you define story as news story, then all the characteristics from a news story you apply to data story telling is a rather good fit. So, I do see how that works very nicely. But there is a precondition here, that is, that you define 'story telling' as a 'news story'-like. But if that's the case, it would personally resonate more with me if it was then called 'data-news-story-telling' or something like that. I can't help it but to think of literary stories primarily when you say 'story telling'. So, maybe it's really a matter of definition and semantics?

    I have 2 books on story telling that are written from a point of view where the writers try to connect brain science with story telling: how does our brain process stories, what makes them stick, captivating or engaging, etc. The first book is called Wired for Story by Lisa Cron, and the second one is Story Proof, the science behind the startling power of story by Kendall Haven.

    In the second book the writer especially takes time to define a story, including taking into account dictionary definitions, similarity or difference with 'narrative', single sentences etc. What this writer at least means when he talks about story is: "A detailed, character based narration of a character's struggles to overcome obstacles and reach an important goal".

    To quote the writer: "Compare this definition with the dictionary's: 'a narrative account of real or imagined events.' The dictionary's definition focuses on event or events and is thus plot-based. In plot-based narratives, this happens, then that happens, and then that happens. Plot-based narratives do not spark your interest or create meaning. Stories are character-based and are driven by the details that describe that character's goals, motives, obstacles and struggles. Through the addition of character, goal, motive and obstacles to the definition lies a world of difference that creates story's unique power and effectiveness. Events happen not for their own sake, but to explain the struggle of a character. The general term, narratives, may be plot-based event descriptions, stories (character-based), or information-based articles, reports, data sets and other similar documents. Information-based narratives provide just the new essential information and assume the reader has adequate banks of relevant topical prior knowledge to create context and meaning and sufficient related personal experience to create relevance. Science writing tends to be in the form of information-based narratives. It's like Sergeant Friday on the 1960s TV show Dragnet: 'Just give me the facts, ma'am. Just the facts.' All three types of writing are narratives. Only stories are structured around the character-based informational elements receivers need in order to trigger and successfully drive the mental processes that lead to understanding to the creation of meaning, context and relevance. And to activate memory."

  2. And I think that's a good description and distinction of a story. When I tell my kids a bedtime 'story', it is a character based story with struggles, goals and motives. It's not a news-story.

    So, with the disclaimer in mind that I haven't given it as much thought as Jon has, for instance, I would say that story-telling with data should rather be called a data narrative. You can use structures and elements from a news story very well to shape and design your data narrative / visualization, as they have quite a few things in common, as you describe. You may even try to incorporate some character-based story telling elements into your data narrative to make it even better. I don't have hard facts for it, but one of the conclusions of this research, to use illustrative elements to make your visualization more memorable, kind of suggests to me that incorporating character-based, real-life, physical elements could make your 'story' stronger.

    I also still agree with myself :), as I mentioned in Jon's podcast, that story telling consists of 2 parts: 'story' and 'telling'. And with data story telling, the story, or narrative, is something you as a designer, journalist, or analyst have to find yourself, in the data. The data does not explicitly tell you by itself what the story is; you make a choice here. And your decision to include something or not, or decide if something is relevant, has to do with the prior knowledge and goals you have for a story. Then the telling part actually has a lot to do with what you describe as a news story: highlighting things, structuring the information, providing context. And with a visualization you primarily do this in a visual, graphical way.