Wednesday, April 26, 2017

I'm on the PolicyViz podcast this week!


The PolicyViz podcast, hosted by Jon Schwabish, is one of my favourite podcasts: illuminating 30-minute conversations with various people in the data visualization field.

So it was a particular thrill when I loaded it up in Overcast this morning and saw my own name in the episode list.

Jon and I had a great chat about teaching data visualization and data storytelling. You can find the episode in your favourite podcast app or right here.

Also, at the risk of logrolling, I highly recommend you make the PolicyViz podcast part of your regular podcast lineup. Jon's a great interviewer and the episodes are always concise and focused. If this data visualization thing doesn't work out, Jon could switch careers and go into radio.


Monday, April 10, 2017

Finally revealed! The numbers BCLC spent six years trying to keep secret



OK. So I need to make an admission right out of the gate here: The climax of this story is a bit underwhelming.

The story I'm about to tell you is interesting, and my sensationalist headline is 100% true: BCLC did try to keep something secret for six years and I'm about to make it public right here for the first time.

But the actual "reveal" at the end of it all is, well, kind of a letdown. The upside is that the most interesting part of this story may still be to come. And that's where you come in.

But I'm getting ahead of myself.

The story begins seven years ago on April 9, 2010. That's when I sent BCLC a Freedom of Information request asking for a breakdown of where the users of its PlayNow online gambling site lived. Specifically, I asked for the total value of all PlayNow sales in each "Forward Sortation Area", or FSA. An FSA is the first three digits of your postal code and it corresponds to different areas of the province.

BCLC FOI by Chad Skelton on Scribd


The motivation behind my request was several data journalism stories I'd seen from the U.S. that showed that poorer neighbourhoods were more likely to play the lottery than richer ones. (This story is from 2016 but similar stories have been done dozens of times over the years by U.S. newspapers.) With PlayNow sales by neighbourhood, and income data from Statistics Canada, I figured I could see if there was a similar pattern in B.C.

I'd actually tried doing this story once before, by filing a Freedom of Information request to BCLC for a breakdown of paper lottery ticket sales at retail outlets by FSA. BCLC coughed up that data without a fight. But I found no correlation between an FSA's median income and its lottery sales. The problem with that data, I realized, was that people buy their lottery tickets all over the place — on their way to work, while doing their grocery shopping — and so the areas with the highest "per capita" lottery sales tended to be those with low populations but a big mall.

PlayNow data would be different, I figured, as the postal code associated with each sale would be that of the gambler themselves. With that data in hand, I could actually figure out if poorer neighbourhoods were more likely to gamble — and the topic seemed timely, as BCLC was just starting to expand beyond selling lottery tickets online to offering more addictive online casino games, too.

So off the request went.

On May 18, BCLC wrote back saying it had a four-page document responsive to my request but that it wasn't going to give it to me. It argued that releasing the information could harm BCLC's finances because its online gambling competitors could use it to their advantage. I asked them to reconsider but they refused. The full correspondence is below:



My next step was filing a complaint with the Office of the Information and Privacy Commissioner, the independent agency that oversees the FOI process in B.C. The OIPC does good work but it doesn't do it very fast. So it wasn't until a year later, in the spring of 2011, that the case went to a formal hearing where both sides submitted their written arguments for why the sales data should or should not be made public.

And then, on August 25, 2011, the OIPC released its decision, finding in my favour and ordering BCLC to release the records.


(An aside: I notice now the ruling said to provide it within 30 days, which doesn't seem to match up with an Oct. 5 deadline. I can't recall why.)

Now, one of the great things about B.C.'s Information Commissioner — unlike her federal counterpart — is that she has what's called order power. That means that the decisions of her office are legal orders that need to be complied with immediately (unlike the federal Commissioner's orders, which are more like recommendations). So that meant that, with this ruling, BCLC was legally required to provide me with the sales data.

Every other time I won a case before the OIPC, that was the end of the story: The documents would arrive a few weeks later and that would be that.

Except that agencies actually do have one other option available to them: Take the Information Commissioner to court. Which is what BCLC did, seeking a judicial review of the Commissioner's decision in front of the B.C. Supreme Court. Specifically, BCLC argued, among other things, that the Commissioner didn't properly treat one of its "expert witnesses" as an expert.

A bunch of court proceedings followed over the next couple of years (The Vancouver Sun could have taken part but we decided to let the OIPC handle it). Then, on January 8, 2013 — almost three years after my original request — the B.C. Supreme Court ruled in BCLC's favour.

BCLC had asked the B.C. Supreme Court to just overturn the OIPC's ruling and let it keep the information secret. But, instead, the judge sent the case back to the OIPC for another ruling.

Which meant doing the whole hearing thing all over again. So fast forward another couple of years and we're at October 13, 2015 — five and a half years after my original request and after I've taken a buyout from The Sun — and the OIPC releases its second ruling in the matter, finding once again that I was entitled to the records.


If you think really hard, you can probably see what comes next.

Yep: BCLC took the Information Commissioner to court again.

Once again, I didn't have much to do with the court case, except for getting occasional emails from BCLC's lawyers making sure I was properly served with all the documents in the case.

The following fall, on Sept. 14, 2016, the B.C. Supreme Court made its decision — this time upholding the OIPC's second ruling. According to one of BCLC's lawyers who emailed me, the decision was made orally from the bench and I haven't been able to find a transcript published anywhere online.

Was that the end of the story? Not quite.

On October 13, 2016, BCLC sent me a notice that they intended to appeal the court's decision to the B.C. Court of Appeal.

But then, mysteriously, a month later on November 29, they sent me another notice that they were abandoning their appeal.


The next day, November 30, 2016, BCLC finally mailed me the requested records — 2,427 days after I had originally asked for them.



But they still had one more trick up their sleeves. While my original request clearly asked for the records in spreadsheet format, so I could more easily analyze the figures, BCLC instead sent me four badly photocopied, barely legible pages.


And this, dear reader, is where I need to confess that while the vast majority of the delay in making these records public is BCLC's fault, the last 130 days or so are on me.

It's hard enough getting motivated to analyze six-year-old data. It's even harder when you know it's going to start with a good hour or two of manual data entry. I also had a lot of other stuff on my plate this winter, like developing my new course at the University of Florida.

So the BCLC envelope stayed on my desk for a few months. Then, finally, I found some time a couple of weeks ago to type in the numbers by hand and start doing some basic analysis on the figures.

And what I found, as I warned you at the start, was pretty underwhelming.

I could find no evidence that poorer neighbourhoods are more likely to gamble online than richer ones. Indeed, what weak correlation exists actually runs in the opposite direction (the richer your neighbourhood, the more it gambles online).

I tried comparing a few other demographic characteristics from the 2011 National Household Survey but came up empty.

Mapping the data, the best I can come up with is that rural areas appear to be more likely to gamble online than urban areas, which kind of makes sense: Those in rural areas may not have easy access to a casino.



If you'd like to look at the data yourself, you can find it here (to download an Excel file, just click on File/Download in the top left corner). The first sheet is the data provided by BCLC itself, manually entered by yours truly. The second includes the data I added in for analysis (population, per capita spending and median income).
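If you want to replicate (or improve on) my quick analysis, the calculation itself is simple. Here's a minimal sketch in Python with pandas; the file name and column names are placeholders I've made up for illustration, so adjust them to match the spreadsheet you download.

import pandas as pd

# Placeholder file and column names -- adjust to match the actual spreadsheet
df = pd.read_excel("bclc_playnow_by_fsa.xlsx", sheet_name=1)

# Per capita spending: total PlayNow sales in each FSA divided by its population
df["per_capita_spend"] = df["playnow_sales"] / df["population"]

# Pearson correlation between neighbourhood median income and per capita spend.
# A weak positive number would match what I found: richer areas gamble slightly more online.
print(df["median_income"].corr(df["per_capita_spend"]))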

So is that the end of the story?

Well, not quite.

There's a bit of a mystery here.

If this data was so innocuous, why did BCLC fight so hard to keep it secret? It's possible I'm missing something in the data (which is why I'm making it public). But I suspect what BCLC was really worried about was not this data, per se, but the precedent it would set if it was forced to release it.

And that's because, since this request was filed, PlayNow has become a much bigger business for BCLC. Based on a review of a couple of BCLC's annual reports, "eGaming" brought in $135 million in revenue last year, more than five times the $23.5 million in revenues for 2008/09, the year my request was for.



Furthermore, looking closely at the PlayNow numbers I was provided with, there are some odd figures for some areas.

For instance, while most postal code areas had totals in the tens or even hundreds of thousands of dollars, V2C, a postal code area in Kamloops with more than 20,000 residents, had a total spend on PlayNow for 2008-09 of just $157.

At the other extreme, V0S, a remote area of Vancouver Island with just 125 residents, had a total spend of $48,412. That gives V0S by far the highest per capita PlayNow spending in the province ($387; the second highest is V6C, at $16). It's hard to know for sure, but I suspect that may just be one guy with a really bad gambling habit.

The point is that with just one year of data, from a time when PlayNow was still in its infancy, the figures are too noisy to support any meaningful conclusions about where B.C.'s online gamblers live and whether there's any correlation between gambling and other factors like income.

To do that, we'd need to know what the regional patterns in PlayNow gambling have been since 2008/09. Which is where you (maybe) come in.

As I'm sure you can imagine, I'm not eager to take another kick at the can here. Especially because I no longer work in a newsroom and so don't have an outlet to publish the results of whatever I find.

But I do think there's a story here, and I'd like to make it as easy as possible for someone else to find it — whether that's another journalist out there or an advocacy group with an interest in gambling.

As it happens, BCLC has an online form you can use to file a Freedom of Information request without having to draft a letter or buy a stamp (you can also fax or mail in your request).

You can fill it out with your own contact information. But I'd suggest copying and pasting the following wording into the section that asks for "Details of Requested Information":
In electronic database format, the total value of products purchased through BCLC's PlayNow website in each fiscal year from 2009-10 to 2016-17 in each Forward Sortation Area (FSA) in British Columbia. Please provide me with a list of total sales by FSA for the entire period and a breakdown by year. I draw your attention to OIPC order F15-58, upheld by the B.C. Supreme Court in September 2016, which found BCLC was legally required to provide such records for an earlier time period. I am asking for these records in spreadsheet format (Excel or CSV) NOT on paper or as a PDF. I draw your attention to OIPC order F10-16 which found that government agencies are required to provide records in spreadsheet format when they are technically able to do so.
[ NOTE: My original suggested request wording, rather stupidly, left out the part about breaking down the sales figures by Forward Sortation Area. So it was only asking for the total sales figures, which is data that is already available. The new wording, corrected on May 25, 2017, should be more successful. Apologies. ]

Now, if history is any guide, I doubt BCLC will just release these records without a fight. But given the legal precedent that now exists, I don't think BCLC will have much of a legal leg to stand on and hopefully it should take fewer than six years to get the records.

Also, if anyone takes this up, I'd suggest — while BCLC fights you on your original request — filing a new request with BCLC each year for the following year's records. That way you've already got those requests in the pipeline. In retrospect, I wish I'd done that.

I realize this post probably isn't the greatest advertisement for filing an FOI request with BCLC. And I appreciate the hypocrisy of asking someone else to do what I no longer have the patience for.

But I firmly believe that if government agencies can get away with these kinds of ridiculous delays, transparency suffers. And, frankly, I feel like six years fulfills my duties on this file and it's time to pass it on to someone else.

That said, if you do file a request and end up in an OIPC hearing with BCLC, drop me a line and I'd be happy to share my written submissions with you so you can copy from them.

It will literally only take you a minute to go over to BCLC's online form right now and get the ball rolling on your own request.

And the more of you who do it, the more BCLC will learn they can't get away with this kind of secrecy.

Wednesday, April 5, 2017

A data visualization reading (and watching) list

Century Tower at the University of Florida // by Kate Haskell

Starting this summer, I'm teaching a course in Data Storytelling and Visualization at the University of Florida as part of its new online Master's program in Audience Analytics. After years of teaching data visualization — both at my home university of Kwantlen and through my public Tableau workshops — I'm excited to be branching out into online learning.

In preparing for the course, I asked my Twitter followers for suggestions of what I should add to my reading list.

I received a lot of great suggestions and promised that, once my reading list was complete, I'd share it with others. So here it is!

First, though, a bit of context. My UF course, like my other data visualization training, has a dual focus: Teaching Principles and Teaching Skills.

I like my students to come away with an understanding of data visualization best practices and how to tell effective data stories. But I also want them to have enough software skills to apply those principles to their own work.

For my UF class, the software tool I teach is Tableau, both because it's the tool I'm most comfortable with and because I genuinely believe it offers the best combination of flexibility and ease of use. A point illustrated well by Lisa Charlotte Rost in a chart from one of the readings (emphasis mine):



My course is built around a series of recorded lectures — about an hour's worth each week — in which I teach my students the technical skills of using Tableau while also getting them to think about the fundamentals of data visualization.

Wherever possible, I try to teach them principles at the same time as I'm teaching them practical skills.

To use one example, I teach students the technical steps of how to make a stacked bar chart in Tableau. But then I change the order of the segments to illustrate how stacked bar charts can be hard to read. And then I use Tableau to make a grouped column chart, area chart and line chart out of the same data and then point out the pros and cons of each.
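Tableau is point-and-click, so there's no code behind that exercise, but for anyone who'd rather experiment outside Tableau, here's a rough sketch of the same stacked-versus-grouped comparison in Python with pandas and matplotlib. The sales figures are made up purely for illustration.

import pandas as pd
import matplotlib.pyplot as plt

# Made-up quarterly sales by product category, purely for illustration
data = pd.DataFrame(
    {"Furniture": [120, 135, 150, 160],
     "Technology": [200, 180, 210, 250],
     "Office Supplies": [90, 95, 100, 110]},
    index=["Q1", "Q2", "Q3", "Q4"],
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Stacked bars: fine for showing totals, but the middle segments are hard to compare
data.plot(kind="bar", stacked=True, ax=axes[0], title="Stacked bars")

# Grouped bars: every bar starts at zero, so comparing categories is easier
data.plot(kind="bar", stacked=False, ax=axes[1], title="Grouped bars")

plt.tight_layout()
plt.show()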

To reflect that dual focus, my UF course has two core textbooks:

Despite disagreeing with her focus on literary storytelling, I really like Cole's book and think it does a great job of providing a lot of clear advice along with solid examples. And Dan Murray's "Tableau Your Data!" provides one of the most comprehensive guides to Tableau that I've come across.

In addition to those two textbooks, my UF course includes select chapters from some of my other favourite books on data visualization:
As you'll see below, I also included several chapters from the ebook Data + Design.

Below are links to the rest of my course readings, as well as videos that I recommend my students watch in addition to my lectures. Just to provide a bit of structure to the list, I've broken it down by topic week. Those topics primarily reflect the content of my recorded lectures, which aren't public, so sometimes the readings will match the topic and sometimes they won't.

Also, full disclosure: I've included a couple of my own pieces in the list below. This is mainly because they covered key topics I wanted to include in the course and having them in the readings saved me from needing to address them in my lectures.

Finally, if you've come across a great reading or video on data visualization or Tableau that's not listed here, please add it to the comments so others can find it. And if you've got a data visualization reading list of your own, please provide the link.

So, without further ado, here's the list:


Week 1: Finding Data

Read:




Watch:

“Making data mean more through storytelling” by Ben Wellington [14m]

Andy Cotgreave


Week 2: Basic Data Analysis in Tableau

Read:






Watch:

“The Visual Design Tricks Behind Great Dashboards” by Andy Cotgreave [56m; free login required; Related chart]

Week 3: Creating Static Charts in Tableau

Read:



Week 4: Finding the Most Important Thing

Read:


Watch:


Week 5: Choosing the Right Chart

Read:

“Chart Suggestions – A Thought-Starter” by Extreme Presentations

“Data Visualization Checklist” by Stephanie Evergreen and Ann Emery

“Real Chart Rules to Follow” by Nathan Yau

“The self-sufficiency test” by Kaiser Fung

Watch:

First, load this chart, press play at the bottom left and watch the data change from 1962 to 2015. Then watch this TED Talk by Hans Rosling [20m]:

“The Competent Critic” by Alan Smith [21m]

“The Power of Drawing in Storytelling” by Catherine Madden [18m]

TED Talks

Week 6: The Power of Annotation

Read:

“Putting Data Into Context” by Robert Kosara

Watch:

“Embracing Simplicity in Data Visualization” by Chris Love [45m; free login required]


Week 7: More Chart Types

Read:

“Visual Analysis Best Practices” (Tableau Whitepaper)

“Slopegraphs for comparing gradients: Slopegraph theory and practice” by Edward Tufte (don’t need to read comments)

Watch:



Week 8: Calculations

Watch:

“Tableau Tip Tuesday: Table Calculations Overview” by Andy Kriebel (blog post and video)

Opening Keynote at OpenVis 2013 by Amanda Cox [43m]

Week 9: Maps

Read:

“When Maps Shouldn’t Be Maps” by Matthew Ericson

“All Those Misleading Election Maps” by Robert Kosara


Watch:

“Mapping Tips from a Cartographer” by Sarah Battersby [53m; free login required]

Week 10: Interactive Dashboards and Data Stories

Read:

“Interactive Data Visualization” by Peter Krensky (Tableau Whitepaper)

“Data Storytelling” by Robert Kosara (Tableau Whitepaper)


Watch:

“Storytelling and Data: Why? How? When?” by Robert Kosara [31m]



Week 11: Data Visualization Research

Read:

Watch:



Week 12: Next Steps and Tips

Read:


Watch:

“50 Tips in 50 Minutes” by Andy Kriebel and Jeff Shaffer [52m]

“Rapid Fire Tips & Tricks (and Bad Data Jokes)” by Daniel Hom and Dustin Smith [60m; free login required]


Some more helpful resources going forward


Tableau bloggers worth following:

Data Visualization bloggers worth following:

Podcasts worth listening to:

A Twitter list of people who provide Tableau and data visualization tips (featuring Ben Jones, Sophie Sparkes and Emily Kund):

Thursday, March 30, 2017

A different way to think about "Data Storytelling"


John Henderson

Jon Schwabish wrote an interesting series of blog posts last week on data storytelling.

In the posts, Jon argues that we in the data visualization community are overusing the term "story" — applying it to all sorts of situations where we aren't really telling stories at all.
People who work with and communicate data tend to throw around the words “story” and “data” a lot these days. We say all too regularly, “Let’s tell a story with these data” or “What story do these data tell us?” While it sounds good to say that we’re telling stories with our data, I think far too often, far too many of us are not applying the word story to data correctly.
In making his case, Jon defines a "story" the way many people do: as a literary story, one that follows a familiar story arc, is emotional and has a meaningful climax.

I agree with Jon that — defined this way — a "story" is, indeed, a poor way of thinking about how we visualize and present data.

But there is another type of story that I think does do a good job of describing the process of data analysis and visualization, and whose conventions and terminology can actually be quite helpful in getting people to think more deeply about how to make their data engaging and interesting.

A news story.

A news story is what I mean when I talk about telling stories with data — something I do quite a lot, from my blog tagline to my Tableau workshops to the "Data Storytelling" course I'm teaching at the University of Florida this summer.

Why literary stories are a bad model for data visualization (or where Jon and I agree)

Jon has already made an excellent argument for why — most of the time — a literary story is a poor model for the process of data analysis and visualization.

Most data visualizations aren't emotional. They don't follow a typical literary story structure. They rarely have a "character" that we follow on a journey.

In fact, I'd go one step further than Jon and argue that not only is a literary story a poor model for thinking about data visualization, but trying to make one's data fit a literary story arc can actually be quite dangerous.

That's because the conventions of literary stories — cause and effect, climax, resolution, emotion — are often working at cross purposes to good data analysis.

Let's take just one of the most basic rules given to all first-year stats students: Correlation does not imply causation.

It's hard enough to remember this rule when looking at a scatterplot of two variables. It's a heck of a lot harder if that relationship is part of a "story" about the journey those two variables are on and the creator of the chart has worked really hard to make sure their visualization has some kind of dramatic "climax" that wows their audience.

The goals of literary storytelling are also quite different from the goals of data analysis, which can distract us from our primary purpose of making sense of data.

Fundamentally, the purpose of literary stories is to entertain and the teller of a fictional story has the luxury of making things up to ensure their story is as entertaining as possible.

In contrast, the purpose of data analysis is usually to inform an audience and those visualizing data are limited to plain old facts.

Those facts may not stir emotions, or fit into a satisfying story arc. And trying to make them fit that pattern — because someone has told you your data should "tell a story" — is often a distraction from figuring out what's most important in your data and communicating that message to others.

A better model: The news story

While a literary story is a bad model for most forms of data visualization, thinking about data like a news story can actually be quite helpful, in my view.

Like with data analysis, the purpose of a news story is primarily to inform, not to entertain. And, like with data analysis, the authors of news stories are limited to facts in constructing their story. They can't just make something up to make their story more exciting (or at least they can't without the risk of getting fired; see Stephen Glass and Jayson Blair).

Also, in my experience, the conventions and terminology of news stories can be a helpful way for those who work with data to think through how best to present their findings.

I explored some of those conventions in a talk I gave last June at the Information Plus conference in Vancouver on "How to think like a data journalist".

But, briefly, here are some news story conventions that I think those who work with data could learn from:

Headlines: People have an annoying habit of giving their charts titles that describe their data rather than communicate the key takeaway message they want their audience to have. I tell my students to think of their chart titles like a headline: Don't hope your audience figures out what your message is on their own. Just tell them!

Less:



More:


Lead: The lead is the first sentence of any news story. It's similar to a headline but serves a dual role. A lead should both communicate the most important information in your data and make the reader want to know more. I think leads are so important that I make all my students, whether journalism students or otherwise, come up with a lead at the earliest stages of their data visualization projects. Condensing one's analysis down to a single sentence forces you to make a choice about what really matters in your data. Once my students have a written lead, I get them to think through how they would translate that sentence into a chart that gets their key message across.

Inverted Pyramid: The inverted pyramid is the way that almost all journalists first learn how to write a news story. You start with the most important information at the top, and then move on to the next most important, and so on until you end your story with the least important facts. Writing stories in this way makes it easy for editors to chop a story for space at the last minute without needing to dramatically rewrite the whole thing. Inverted pyramid writing can be a bit boring — and most journalists eventually move away from it, at least in part — but it forces journalists to have a clear sense of the relative priority and importance of almost every fact in their story. Data analysts could learn something from this technique: Prioritizing their findings from most important to least important, even if they don't necessarily present their findings in that exact order.

Credit: Wikipedia
Making it personal: I did a whole Tapestry talk on this one, but in short, journalists are very good at framing a news story so it's directly relevant to the reader. In some cases this is as simple as putting the word "you" in a headline, but it can also involve charts or maps that allow readers to pinpoint the data that is of specific interest to them. My friend Steve Wexler has explored how this principle can be applied in a business context: How does my salary compare to others in my organization? How do my store's sales measure up to others in my area?


Real People: Jon actually addresses this point quite well in the last post in his series. For journalists, it's second nature to find the "real people" who help illustrate a data point. When I did a series on parking tickets in Vancouver, I profiled a gung-ho parking ticket officer who hands out 60-70 tickets a day. For a series on bike thefts, I told the story of a bike that was stolen not once, but twice. And for a series on low organ donation rates in immigrant communities, I profiled a South Asian woman who waited a decade for a kidney. Those human stories bring the data to life — helping readers understand that the data is not an abstraction but a reflection of real things going on in the real world to real people. Like Jon, I think in some cases data analysts can make their data more engaging by finding the human stories that help to illustrate the figures. But in other cases, talking to people is important simply to better understand what's really going on with your data. If your data is showing sales are way down at one store, call up the manager and ask why. Data analysts need to step away from their spreadsheets every now and then and engage with the real world.

These conventions aside, one other advantage of thinking of data storytelling like a news story rather than a literary story is that there are different types of news stories.

Feature stories can often be quite similar to a literary story, with a clear narrative told from beginning to end, often featuring a key character.

But there are also breaking news stories, where the important thing is to communicate what's going on as quickly and concisely as possible. What would the visualization equivalent of a breaking news story be? Maybe a Dashboard showing up-to-the-minute sales data.

There are also explainers, which pose a question that the journalist then tries to answer (i.e. "Why did Rural America vote for Trump?"). Explainers suck the audience in not through a traditional story arc but by posing a question that sparks curiosity. Explainers are perhaps one of the easiest fits for the work of data analysis and visualization, which is often motivated by seeking answers to specific questions ("What are our most profitable products?" "What's the connection between vaccination rates and measles outbreaks?").

There are other types of news stories. I won't list them all.

The point is that news stories cover a broader range of story structures than literary stories, which makes them a better analogy for the work of data analysis, which itself serves a variety of purposes.

Why bother talking about data storytelling at all?

I hope I've made the case that, when we want to tell stories with data, thinking of those stories as news stories is more useful than thinking of them as literary stories. But why bother talking about data storytelling at all?

As Jon argues:
What I’m primarily focusing on here are the line charts, bar charts, area charts, and other charts that we all make every day to better understand our data, conduct our analysis, and share with the world. Even though we often say we’re telling data stories, with those kinds of charts we are not telling stories, but instead making a point or elucidating an argument.
I think there are at least two reasons why the focus on data storytelling can be helpful.

First, and most simply, the term story naturally makes one think about the audience: about who that story is being told to.

When it comes to data analysis and visualization, I think that's a good thing. Data analysts spend a lot of time with their data, and it can be easy to get lost in the weeds and forget how foreign your figures will seem to someone coming to them fresh. Thinking about telling stories with data reminds you that you need to simplify your message so that it's easier for your audience to digest.

Even the terms data analysis and data visualization keep the focus on the process of analyzing the data or turning it into charts, rather than on explaining your data to others. Or as Jewel Loree said in her Tapestry talk this year:


Using the term story is a good reminder that, at the end of the day, you have to communicate your findings to someone else and that will require you to think about who your audience is and what they need.

Second, I think there is a tendency in data visualization to put way too much data into our charts.

This is partly due to our own insecurities. Who hasn't had a boss ask why this or that wasn't in their chart or presentation? So, just to be safe, we lean towards leaving stuff in so that no one can get mad at us for leaving it out.

It's also because the tools we use, with all their fancy interactivity and filters, make it easier than ever to show more data rather than less.

Excel, for all its many flaws, at least forced you to decide which static chart to build. With Tableau, you can create a Dashboard showing a half dozen views and then load it up with a half dozen filters. What I like to call: Show Everything. Filter Everything By Everything.

News stories don't include every possible fact about what happened. The journalist makes a judgment call about which facts are most important and should be emphasized, how much background information is necessary for proper context, and which facts can safely be left out altogether.

Thinking about data as a story is an important counterweight to the dangerous tendency to include too much information in our visualizations and presentations.

Final thoughts

I've tried to avoid dictionary definitions so far in this blog post but I don't think I can any longer.

John argues people are using the word story too broadly:
I think most of us are using the word story as it applies to anything we are trying to communicate with data; we are using that word too flippantly and too carelessly. One could argue that we in the data visualization field can come up with our own definition of story, but that’s simply changing the definition to meet our needs. Plus, I don’t think that’s how many people view it—they see visualizing data as a way to tell a story, but it rarely is a story.
I frankly think Jon's definition is too narrow. And it's not one shared by most dictionary definitions of the word I could find (Merriam-Webster, Oxford). Indeed, most dictionaries define story in a pretty broad way, encompassing everything from news articles to gossip to novels.


If people always meant literary stories when they talked about data storytelling, I'd be worried. As I've already explained, in most cases, I think trying to make your data fit a traditional story arc will be distracting at best and dangerous at worst.

But, as Jon rightly points out, people aren't using the term data storytelling to mean telling a traditional literary story with data. Rather, they're using the term in a much looser, vaguer way. For me, telling a story with data means telling something like a news story. For someone else, it may mean recounting their personal experience with a dataset.

Jon seems to think this vagueness is a bad thing. But I disagree.

I think when people talk about data storytelling they're really being aspirational.

They know data can be dry and boring and they want to find a way to present their findings in a way that grabs their audience's attention.

They use the term story because they know people get excited and engaged by stories and — in a perfect world — that's how they want people to respond to their data, too.

I get that, for some in the data visualization community, "data storytelling" has become a bit of a cliche: a meaningless phrase that people like to throw around without really thinking it through.

But I think when most people say they want to tell a story with data, what they really mean is that they want to find a way for their data to have more meaning and impact.

And that's something we should all want to encourage.

Wednesday, December 7, 2016

My next Tableau Training workshop is Feb. 22-23, 2017



My next two-day public Tableau training workshop will be held on Wednesday, Feb. 22nd and Thursday, Feb. 23rd at SFU Harbour Centre. You can buy tickets here or by clicking the button below:

Here are some testimonials from people who've attended my earlier training sessions.

If you can't make this workshop but would like to be alerted when the next one is scheduled, just add your name here.

If you have several people at your organization who need training in Tableau, I'm also available for onsite training.

Thursday, November 10, 2016

TC16: My Favourite Tableau Conference 2016 sessions



Well, TC16 -- my very first Tableau conference -- is now officially over.

As a bit of an introvert, I frankly found the sheer size of the conference -- and the crazy Data Night Out party -- a little bit intimidating. Before now, the biggest conference I'd ever attended was NICAR at 1,000 attendees. This conference had 13,000 people.

But I also learned a lot and got to meet a lot of people who, before now, I only knew through Twitter.

I also attended some great sessions, which I thought I'd note here since Tableau is going to make recordings of all the sessions available in the coming days.

I'm also hoping others might share what their favourite sessions were -- either on Twitter or in the comments -- so I have a cheat sheet when I start making my way through the hundreds of recordings.

So, in no particular order, here's a list of my favourite sessions, with a couple of notes on each:

50 Tips in 50 Minutes with Andy Kriebel and Jeffrey Shaffer

Exactly what it says on the tin: a load of great Tableau tips in rapid-fire succession. I love sessions like this as, even if one tip isn't helpful to you, the next one will be. This one will require re-watching to catch some of the specifics of how to implement each tip. But I easily came away from this session with dozens of time-saving tricks and ways to make my work in Tableau better.


I was hoping to make the similarly titled Rapid Fire Tips & Tricks with Daniel Hom and Dustin Smith but it was the only session of the conference I was turned away from because of lack of space. Will definitely be watching that one on video.


The Visual Design Tricks Behind Great Dashboards with Andy Cotgreave

I was told Andy Cotgreave's sessions were not to be missed and that was good advice. This was a great conceptual talk about how to think about ways to make your Tableau Dashboards more engaging and easier to read.

A lot of Andy's advice in the talk is similar to what I tell my students (like making sure your title actually says something interesting). But there was also a lot of advice that hadn't occurred to me that I can put into use. And he had a fun Few-McCandless data viz continuum with Alberto Cairo right in the sweet spot.



Visualizing Survey Data 2.0 with Steve Wexler

Steve Wexler has published a heap of great resources on how to visualize survey data in Tableau and this talk had some really useful updates on some new tricks he's developed -- including using a "Dual Pivot" to allow you to visualize demographic data more quickly and how to deal with situations where you have too few respondents to a given question. Great stuff.


Sealed with a KISS: Embracing Simplicity in Data Visualization with Chris Love

At a conference where a lot of people were showing off all sorts of intricate, complicated graphics, Chris Love's talk was a helpful reminder that the simplest charts can sometimes be the most effective. In one of the talk's more powerful moments, Chris took a beautiful, but hard to read, Sankey diagram and remade it live as a series of simple bar charts that actually told the story of the data much more clearly.


I've asked Chris to please do more of these "Simple Makeovers", starting with a complicated Guardian chart he showed during his talk. He seems game, which I think would help more people see the value of keeping things simple. UPDATE (Nov. 13): True to his word, Chris has already remade the Guardian chart as a small multiple hex map! And then as a second version, too!

Advanced Mark Types: Going Beyond Bars and Lines with Ben Neville and Kevin Taylor

This is kind of the anti-talk to Chris Love's presentation. Ben and Kevin went through several cool chart types -- like lollipop charts and hex tile maps -- that aren't in Tableau's built-in "Show Me" menu but can be created with a bit of fiddling in Tableau. Most of the chart types were actually useful, rather than just being show-offy -- and, in some cases, they looked pretty easy to implement. I know I'm planning to use lollipop charts a lot more in my work now.


Cross Database Joins: The Unexpected Solution to Tough Analytic Problems with Alex Ross and Bethany Lyons

This was the one and only "Jedi" session I attended and, I'll confess, I only went because I got turned away from the Rapid Fire Tips session I mentioned earlier.

The material in this session went by really quickly and a lot of it was over my head. But it's a testament to Bethany Lyons' infectious enthusiasm that this session made me want to learn more about how I could "create more data" and use cross-database joins to solve gnarly data problems. And she did such a good job of explaining what she was doing that I feel I actually got the conceptual gist of the talk, even if I'll have to re-watch it in slo-mo to get all the steps.

In the TC16 preview podcast I mentioned in an earlier post, pretty much everyone was raving about Bethany as their favourite speaker and now I can see why. (Alex Ross also did a great job summarizing the key concepts.)

Unfortunately, there were a lot of sessions I was hoping to make but didn't, because I was tied up in hands-on training, like Busting the DataViz Myths with Matt Francis, Data Journalism: Creating Awesome News Graphics in Tableau with Robert Kosara and New Ways to Visualize Time with Andy Cotgreave.

Speaking of hands-on training, I was really impressed with the calibre of the instructors in all the hands-on training sessions I attended and the quality of the materials (including, for each session, a "web workbook" that you can refer back to at your own speed with all the problems and solutions in it).

So those are my favourite sessions from TC16. What are yours? Please let me know in the comments below or by sending me a note on Twitter.

Note: Tableau is paying for some of my conference-related expenses.

Wednesday, November 9, 2016

My 5 favourite new Tableau features

I was a bit distracted by a certain political event last night, so I didn't get a chance to post this yesterday as I'd originally planned.

But I wanted to write up a quick post on some of the cool new features coming to Tableau.

There were two big keynotes yesterday: Tableau Vision, which laid out some of Tableau's big plans over the next three years, and Developers on Stage, in which Tableau's developers introduced some cool new features coming in the next few months.

Some of the features announced during the Tableau Vision talk were pretty cool, but it's hard to know how long it will take before they show up in a release. So I'll spend most of this post talking about the features that are coming soon to Tableau (in some cases, it sounds like, as early as Version 10.2).

Here are my Top 5 favourite new features.

#1: Support for Shape Files

I've been wanting Tableau to support spatial files, like SHP and KML files, since I started using the product six years ago and I'm so glad it's finally here. There are some hacky workarounds now for getting spatial data into Tableau. But they're pretty clunky, complicated to use and prone to error. Being able to just connect to a SHP file, like you can now to an Excel or CSV file, will be a huge improvement and make Tableau a much more powerful, and popular, mapping tool.

#2: PDF Connector

Getting data out of PDFs can be a huge headache and, until recently, was almost impossible. I remember not too long ago spending hours fiddling with command-line tools like pdftotext to try to get data into a spreadsheet.

Luckily, the tools available to extract data from PDFs have gotten a lot better in just the past few years, chief among them Tabula. But while Tabula is well known among data journalists, most people I come across haven't heard of it (or the nearly as good Cometdocs).

So building PDF extraction right into Tableau will, I think, make that data a lot more accessible to a lot more people and -- if it works as well as Tabula -- it will be quicker to do it right in Tableau than having to fire up another tool.
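For the curious, here's roughly what that workflow looks like today with tabula-py, the Python wrapper for Tabula (Tabula itself runs on Java). The file names are placeholders, and this only works on PDFs that contain real text, not scanned images.

# pip install tabula-py
import tabula

# Pull every table Tabula can find in the PDF into a list of pandas DataFrames
tables = tabula.read_pdf("some_report.pdf", pages="all")

# Save the first extracted table as a CSV for further analysis
tables[0].to_csv("extracted_table.csv", index=False)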



#3: Expressive Text Editor

Since I started using Tableau, I've wanted the ability to simply highlight a word in a text box and hyperlink it to a web page. Like with Shape files, there are hacky workarounds now: you can create a little Sheet and then use a URL Action to make clicking on it open a web page. But it's not very natural. And it doesn't allow you to, say, have a normal title and description on your Dashboard and hyperlink a single word. That's now coming to Tableau, along with a bunch of other features that will provide more flexibility to any text on a page (including captions and tooltips), such as dropping images into a text box and kerning text.


#4: Better Dashboard Formatting

A number of little improvements to Dashboard creation were announced yesterday, including the ability to add margins and padding to a Dashboard and to evenly distribute Sheets with a single click. It doesn't sound like a big deal, but as the features were demoed it became clear that, along with the Expressive Text Editor, these features should make it a lot easier to make cleaner, nicer looking Dashboards with a lot less finicky formatting.



#5: Tooltip Selection

Another neat feature that could be quite powerful: you'll be able to put links in a Tableau tooltip that, when a user clicks on them, will highlight related elements on your viz. This could allow for some neat discoverability without having to always put in separate filters or highlighters.

So those are my five favourite features coming to Tableau soon.

But, as mentioned, in the morning Tableau Vision keynote, the company also unveiled some of the things they're working on over the next two or three years. I'm not sure how excited to get about these features, as they could still be a ways out. And while I wasn't at last year's conference, it sounds like some features that were previewed then still haven't arrived (like charts in tooltips), so these things can take time.

Still, some of what was shown at Tableau Vision was pretty cool and I think it's worth a mention.

Project Maestro is probably the thing I'm most excited about. Tableau has already taken a lot of data prep you had to do outside Tableau with OpenRefine or Alteryx and baked it right into the product with things like Pivots, Unions and Cross-Database Joins.

Maestro seems to be an attempt to go even further in that direction, with really powerful tools for reshaping your data. Maestro will also include the ability to use visual cues to join datasets together and be alerted to mismatches in your data, such as highlighting join errors in red so you can quickly correct them.


Also interesting: Selection Summaries, which will give you little pop-up visualizations based on the marks you've hovered over or selected. This seems like a pretty cool way to get a drill-down view of your data quickly without having to build out an entirely separate Sheet on your Dashboard.

The Tableau Vision keynote also showed off some of the work they're doing on Natural Language Processing. The idea here is that you could ask a question of a Dashboard much like you'd ask a question of Siri on your iPhone. You could type "Show me the most expensive houses in Vancouver" in a text box and the map would interpret that query and change the view of the data.

If/when this works, it would be pretty neat. I'm a bit skeptical NLP could be smart enough to be reliable. But, already in the demo, Tableau has built in little sliders below the query box that show the user how their text query has been translated into data filters. That seems to me to be a pretty good way of being transparent about how the feature is working.


My only complaint about some of these new features is that they didn't come sooner (especially support for Shape Files and hyperlinking text). But I'm glad they're here now and they will make my personal experience using Tableau a lot more enjoyable and powerful.

Note: Tableau is paying for some of my conference-related expenses.