John Schwabish wrote an interesting series of blog posts last week on data storytelling.
In the posts, John argues that we in the data visualization community are overusing the term "story" — applying it to all sorts of situations where we aren't really telling stories at all.
People who work with and communicate data tend to throw around the words “story” and “data” a lot these days. We say all too regularly, “Let’s tell a story with these data” or “What story do these data tell us?” While it sounds good to say that we’re telling stories with our data, I think far too often, far too many of us are not applying the word story to data correctly.
In making his case, John defines a "story" the way many people do: as a literary story, one that follows a familiar story arc, is emotional, and has a meaningful climax.
I agree with John that — defined this way — a "story" is, indeed, a poor way of thinking about how we visualize and present data.
But there is another type of story that I think does do a good job of describing the process of data analysis and visualization, and whose conventions and terminology can actually be quite helpful in getting people to think more deeply about how to make their data engaging and interesting.
A news story.
A news story is what I mean when I talk about telling stories with data — something I do quite a lot, from my blog tagline to my Tableau workshops to the "Data Storytelling" course I'm teaching at the University of Florida this summer.
Why literary stories are a bad model for data visualization (or where John and I agree)
John has already made an excellent argument for why, most of the time, a literary story is a poor model for the process of data analysis and visualization.
Most data visualizations aren't emotional. They don't follow a typical literary story structure. They rarely have a "character" that we follow on a journey.
In fact, I'd go one step further than John and argue that, not only is a literary story a poor model for thinking about data visualization, but that trying to make one's data fit a literary story arc can actually be quite dangerous.
That's because the conventions of literary stories — cause and effect, climax, resolution, emotion — are often working at cross purposes to good data analysis.
Let's take just one of the most basic rules given to all first-year stats students: Correlation does not imply causation.
It's hard enough to remember this rule when looking at a scatterplot of two variables. It's a heck of a lot harder if that relationship is part of a "story" about the journey those two variables are on and the creator of the chart has worked really hard to make sure their visualization has some kind of dramatic "climax" that wows their audience.
The goals of literary storytelling are also quite different from the goals of data analysis, and chasing them can distract us from our primary purpose of making sense of data.
Fundamentally, the purpose of literary stories is to entertain, and the teller of a fictional story has the luxury of making things up to ensure their story is as entertaining as possible.
In contrast, the purpose of data analysis is usually to inform an audience, and those visualizing data are limited to plain old facts.
Those facts may not stir emotions, or fit into a satisfying story arc. And trying to make them fit that pattern — because someone has told you your data should "tell a story" — is often a distraction from figuring out what's most important in your data and communicating that message to others.
A better model: The news story
While a literary story is a bad model for most forms of data visualization, thinking about data like a news story can actually be quite helpful, in my view.
As with data analysis, the purpose of a news story is primarily to inform, not to entertain. And, as with data analysis, the authors of news stories are limited to facts in constructing their story. They can't just make something up to make their story more exciting (or at least they can't without the risk of getting fired; see Stephen Glass and Jayson Blair).
Also, in my experience, the conventions and terminology of news stories can be a helpful way for those who work with data to think through how best to present their findings.
I explored some of those conventions in a talk I gave last June at the Information Plus conference in Vancouver on "How to think like a data journalist".
But, briefly, here are some news story conventions that I think those who work with data could learn from:
Headlines: People have an annoying habit of giving their charts titles that describe their data rather than communicate the key takeaway message they want their audience to have. I tell my students to think of their chart titles like a headline: Don't hope your audience figures out what your message is on their own. Just tell them!
Lead: The lead is the first sentence of any news story. It's similar to a headline but serves a dual role. A lead should both communicate the most important information in your data and make the reader want to know more. I think leads are so important that I make all my students, whether journalism students or otherwise, come up with a lead at the earliest stages of their data visualization projects. Condensing one's analysis down to a single sentence forces you to make a choice about what really matters in your data. Once my students have a written lead, I get them to think through how they would translate that sentence into a chart that gets their key message across.
Inverted Pyramid: The inverted pyramid is the way that almost all journalists first learn how to write a news story. You start with the most important information at the top, and then move on to the next most important, and so on until you end your story with the least important facts. Writing stories in this way makes it easy for editors to chop a story for space at the last minute without needing to dramatically rewrite the whole thing. Inverted pyramid writing can be a bit boring — and most journalists eventually move away from it, at least in part — but it forces journalists to have a clear sense of the relative priority and importance of almost every fact in their story. Data analysts could learn something from this technique: Prioritizing their findings from most important to least important, even if they don't necessarily present their findings in that exact order.
Real People: John actually addresses this point quite well in the last post in his series. For journalists, it's second nature to find the "real people" who help illustrate a data point. When I did a series on parking tickets in Vancouver, I profiled a gung-ho parking ticket officer who hands out 60-70 tickets a day. For a series on bike thefts, I told the story of a bike that was stolen not once, but twice. And for a series on low organ donation rates in immigrant communities, I profiled a South Asian woman who waited a decade for a kidney. Those human stories bring the data to life — helping readers understand that the data is not an abstraction but a reflection of real things going on in the real world to real people. Like John, I think in some cases data analysts can make their data more engaging by finding the human stories that help to illustrate the figures. But in other cases, talking to people is important simply to better understand what's really going on with your data. If your data is showing sales are way down at one store, call up the manager and ask why. Data analysts need to step away from their spreadsheets every now and then and engage with the real world.
These conventions aside, one other advantage of thinking of data storytelling like a news story rather than a literary story is that there are different types of news stories.
Feature stories can often be quite similar to a literary story, with a clear narrative told from beginning to end, often featuring a key character.
But there are also breaking news stories, where the important thing is to communicate what's going on as quickly and concisely as possible. What would the visualization equivalent of a breaking news story be? Maybe a dashboard showing up-to-the-minute sales data.
There are also explainers, which pose a question that the journalist then tries to answer (e.g. "Why did rural America vote for Trump?"). Explainers suck the audience in not through a traditional story arc but by posing a question that sparks curiosity. Explainers are perhaps one of the easiest fits for the work of data analysis and visualization, which is often motivated by seeking answers to specific questions ("What are our most profitable products?" "What's the connection between vaccination rates and measles outbreaks?").
There are other types of news stories. I won't list them all.
The point is that news stories cover a broader range of story structures than literary stories, which makes them a better analogy for data analysis, a field that serves a variety of different purposes.
Why bother talking about data storytelling at all?
I hope I've made the case that, when we want to tell stories with data, thinking of those stories as news stories is more useful than thinking of them as literary stories. But why bother talking about data storytelling at all?
As John argues:
What I’m primarily focusing on here are the line charts, bar charts, area charts, and other charts that we all make every day to better understand our data, conduct our analysis, and share with the world. Even though we often say we’re telling data stories, with those kinds of charts we are not telling stories, but instead making a point or elucidating an argument.
I think there are at least two reasons why the focus on data storytelling can be helpful.
First, and most simply, the term story naturally makes one think about the audience: about who that story is being told to.
When it comes to data analysis and visualization, I think that's a good thing. Data analysts spend a lot of time with their data, and it can be easy to get lost in the weeds and forget how foreign your figures will seem to someone coming to them fresh. Thinking about telling stories with data reminds you that you need to simplify your message so that it's easier for your audience to digest.
Even the terms data analysis and data visualization keep the focus on the process (analyzing the data or turning the data into charts) rather than on explaining your data to others. Or as Jewel Loree said in her Tapestry talk this year:
Using the term story is a good reminder that, at the end of the day, you have to communicate your findings to someone else and that will require you to think about who your audience is and what they need.
Second, I think there is a tendency in data visualization to put way too much data into our charts.
This is partly due to our own insecurities. Who hasn't had a boss ask them why this or that wasn't in their chart or presentation? So, just to be safe, we lean towards leaving stuff in so that no one can get mad at us for leaving it out.
It's also due to the tools that we use, which, with all their fancy interactivity and filters, make it easier than ever to show more data rather than less.
Excel, for all its many flaws, at least forced you to decide which static chart to build. With Tableau, you can create a Dashboard showing a half dozen views and then load it up with a half dozen filters. What I like to call: Show Everything. Filter Everything By Everything.
Thinking about data as a story is an important counterweight to the dangerous tendency to include too much information in our visualizations and presentations.
Final thoughts
I've tried to avoid dictionary definitions so far in this blog post, but I don't think I can any longer.
John argues people are using the word story too broadly:
I think most of us are using the word story as it applies to anything we are trying to communicate with data; we are using that word too flippantly and too carelessly. One could argue that we in the data visualization field can come up with our own definition of story, but that’s simply changing the definition to meet our needs. Plus, I don’t think that’s how many people view it—they see visualizing data as a way to tell a story, but it rarely is a story.
I frankly think John's definition is too narrow. And it's not one shared by most dictionary definitions of the word I could find (Merriam-Webster, Oxford). Indeed, most dictionaries define story in a pretty broad way, encompassing everything from news articles to gossip to novels.
But, as John rightly points out, people aren't using the term data storytelling to mean telling a traditional literary story with data. Rather, they're using the term in a much looser, vaguer way. For me, telling a story with data means telling something like a news story. For someone else, it may mean recounting their personal experience with a dataset.
John seems to think this vagueness is a bad thing. But I disagree.
I think when people talk about data storytelling they're really being aspirational.
They know data can be dry and boring and they want to find a way to present their findings in a way that grabs their audience's attention.
They use the term story because they know people get excited and engaged by stories and — in a perfect world — that's how they want people to respond to their data, too.
I get that, for some in the data visualization community, "data storytelling" has become a bit of a cliche: a meaningless phrase that people like to throw around without really thinking it through.
But I think when most people say they want to tell a story with data, what they really mean is that they want to find a way for their data to have more meaning and impact.
And that's something we should all want to encourage.