Kirill Eremenko: This is episode number 199 with associate professor at
NYU Stern's School of Business, Kristen Sosulski.
Welcome to the Super Data Science Podcast. My name
is Kirill Eremenko, data science coach and lifestyle
entrepreneur. Each week we bring you inspiring people
and ideas to help you build your successful career in
data science. Thanks for being here today, and now
let's make the complex simple.
Kirill Eremenko: Welcome back to the Super Data Science Podcast,
ladies and gentlemen. Very excited to have you on the
show, and today I've got a super interesting guest for
you. Joining us all the way from New York is Kristen
Sosulski, who is the associate professor at NYU's
Stern's School of Business. What you need to know
about Kristen is that she teaches people how to
visualize data for a living. That is her job to teach
people how to visualize data, how to get insights, how
to present the findings, and not just to just anybody.
Kristen actually teaches managers and leaders and
people who go to the NYU Stern's School of Business.
As you can imagine, she has tons and tons of
experience, not only in the aspect of visualizing data,
but also communicating findings and presenting the
insights and helping people better understand how to
read data and how to understand charts and graphs
and all of these amazing things that we can create in
the space of data visualization.
Kirill Eremenko: This has been an amazing podcast. I'm very excited for
you to hear. Some of the things that we discussed on
today's show were Kristen's third book, which is
coming out now. It's actually available on preorder. At
the moment when you hear this podcast, it's actually
coming out on Amazon, so make sure to check it out.
It's called Data Visualization Made Simple. We also
talk about visualization for managers and leaders and
why that's important. On the flip side, we talk about
using visualization as an entry pathway into data
science. At whatever stage in your career you are now,
this is going to be helpful for you. Whether you're a
manager or leader or you're just starting out into data
science, you will see how you can use the power of
visualization to your advantage.
Kirill Eremenko: We'll go through Kristen's top tips for visualization.
This is something you don't want to miss because
Kristen has been doing this for a very long time and
she knows exactly what people need in visualizations.
In fact, we'll actually look at some examples of
visualizations in this podcast. Kristen will walk us
through how she thinks about visualization in two
specific case studies that I will just randomly throw at
her, which is quite a fun experience. Plus, of course,
lots and lots more things you'll learn about Kristen's
personal journey into data science and the space of
data visualization. We got a jam packed podcast, lots
of exciting and interesting topics. Can't wait for you to
check it out, so let's dive straight into it. Without
further ado, I bring to you Kristen Sosulski, associate
professor at the NYU Stern's School of Business.
Kirill Eremenko: Welcome to the Super Data Science Podcast, ladies
and gentlemen. Today I've got a very exciting guest on
the show, Kristen Sosulski. Kristen, welcome. How are
you doing today?
Kristen Sosulski: I'm doing great. Thanks so much for having me.
Kirill Eremenko: It's so great to have you. Tell us where you're calling in
from.
Kristen Sosulski: I'm calling in from New York City.
Kirill Eremenko: And the weather there is not the best right now?
Kristen Sosulski: Not the best. Well, it's close to 70 degrees, but it looks
like it might rain.
Kirill Eremenko: I just made the mistake of just before we started
making the comment that you guys are moving into
winter. What was your reaction?
Kristen Sosulski: No, we're barely in fall.
Kirill Eremenko: Yeah. I heard New York is a beautiful time to visit in
the fall. Is that true, like when the leaves are coming
off?
Kristen Sosulski: I think it's the absolute best time. Definitely need to
visit New York before Thanksgiving, before the holidays
pick up. It's really a great ... Right now is the best time
to visit New York.
Kirill Eremenko: Okay, awesome. That's really cool. Very jealous of you
and I would love to, like in a good way obviously, I
would love to see New York in the fall. Okay, well
thank you again for coming on the show. We've got
some very exciting topics to cover. Kristen, you are
into the space of data science and visualization, and
you have been teaching this topic for quite a long time
in different universities based on what I can tell from
your LinkedIn, so I'm very excited about diving into
this space and learning about your background,
learning about your journey. But to get us started,
could you, for the sake of our listeners and everybody
who's tuning in to this podcast, tell us how you would
introduce yourself to somebody off the street. Who are
you and what do you do?
Kristen Sosulski: Okay. My name is Kristen Sosulski. I'm a professor at
NYU Stern's School of Business. My area of, my
passion, my area of scholarship lies in data
visualization technology and this new field called
learning science. It's using technology for education
and to help learning, and visualization actually plays a
role in that, so I'm kind of really in a lucky spot right
now in my role as a professor.
Kirill Eremenko: Gotcha. So on one hand you're passionate about
creating visualizations and explaining data and
information through visualization. On the other hand,
there's this whole new field of learning sciences, as I
understand it, where you use visualization to aid and
facilitate the learning process. Is that correct?
Kristen Sosulski: Absolutely. Absolutely. I just released my third book
on data visualization. It's called Data Visualization
Made Simple.
Kirill Eremenko: Oh, congratulations.
Kristen Sosulski: Thank you. It's really intended to help anyone who is
looking to get into the field of visualization or just do
more with the data that they have.
Kirill Eremenko: Oh, that's so cool. That's so cool. Let me check it out.
Oh, I can see it on Amazon. Oh, that's so awesome.
Kristen Sosulski, Data Visualization Made Simple:
Insights Into Becoming Visual. Wonderful. There's
another whole topic we're going to dive into. And third
book, that is crazy. I'm going to ask you this. I released
my book at the start of this year. It took me one and a
half years of writing. I was very excited about it, still
am, but it's such a complex process, way more
complex than I thought, and so much more involved
than I thought that this is one thing that I'm not even
sure if I'm going to write a second book. Maybe,
possibly, but I wouldn't jump into it. How about you?
This is your third book. Where do you find the
inspiration to write them?
Kristen Sosulski: When I finished my dissertation, I was like, "Okay, I'm
never writing a book." And then when I co-authored
my first book, which wasn't even [inaudible 00:07:42] I
was like, "I'm never doing this again." For some
reason, it just kind of struck me. I was like, "I need to
write this book on data visualization." Because all the
books out there are fantastic, but there was something
that was missing that didn't really go with my teaching
style and meet the needs of people in the world of
analytics and business and data science. There just
needed to be a little bit of a different take, and so I saw
an opportunity to try to fill that gap.
Kirill Eremenko: Okay, gotcha. So it's more like your need and desire to
contribute to the world, it overpowered other aspects
that are involved in writing a book and the fear, I
guess, that comes along with looking ahead at this
huge project that you're about to undertake.
Kristen Sosulski: Yeah, yeah. And then not having to be social for about
a year and a half, you know? [inaudible 00:08:48]
every weekend and every work night.
Kirill Eremenko: I know. I know, yeah, yeah. Well, very excited about
that. I'm going to pick a copy up myself and highly
recommend to all listeners Data Visualization Made
Simple by Kristen Sosulski. You can get on Amazon.
Very, very interesting. I love data visualization. If you
don't mind me asking, do you have pictures in your
book?
Kristen Sosulski: Do I have pictures in my book? Yup, of course.
Kirill Eremenko: I love books with pictures. They're the best. It's very
easy to read. Yeah, I was joking. Of course a
visualization book's going to have pictures, and yeah. I
always like to browse through books to pick up some
... One I really liked reading, or even just looking
through, was it called like A Year of Visualization
where two ladies, one in New York, one in London,
they were sending each other postcards and they were
doing these hand drawn visualizations, and then for a
whole year, like once a week. There was 52 times two,
104 visualizations in there about what they did in that
week. It was really cool. You can get some great
inspiration for your own visualization from books like
that.
Kristen Sosulski: Oh my gosh, yeah. That's amazing.
Kirill Eremenko: Yeah, I'll find the title and share it with you. All right,
so we've got a book that you just published and you
work at the NYU Stern's School of Business, so tell us
a bit about that. Are you teaching visualization or are
you teaching other topics through visualization?
Kristen Sosulski: Great question. I teach a course called Data
Visualization to executives and MBA students. I'm also
teaching a new online certificate called Visualizing
Data that's open to anybody in the world, so you don't
have to be matriculated at Stern, and I'm launching
that in the spring. Yes, I'm very lucky. I get to teach
visualization at school. Part of that is really making
the business case for why visualization is so important
for managers, and it's really a leadership skill. Being
able to communicate, right, your data insights, your
results, visually to any audience is critical.
Kirill Eremenko: Mm-hmm (affirmative), yeah. Definitely. That's a very
interesting space for, as you say, leaders, executives,
managers to see the power of visualization. Do you
find that it's usually when students attend your class
for the first time, do you find that this skill is
underrated in their eyes and then you have to turn it
around, or they're already quite proficient and you just
need to add some extra powerful skills into their
arsenal?
Kristen Sosulski: That's a great observation. It's underrated. When
students take my class and when they've completed it,
they can't look at a chart or graph the same way ever
again. I think it's something that is not so clear from
the beginning that, "Oh, I'm going to be the person
creating these visualizations." It's more like, "Oh, I'm
going to have an intern do this when I'm a manager."
As the class progresses, it becomes clear that we have
to find ways to communicate these complex analyses
that we derive to our stakeholders, whether it's
predicting customer conversion or identifying new
markets, well designed data graphics can really reveal
and translate important information.
Kirill Eremenko: Mm-hmm (affirmative), mm-hmm (affirmative),
[crosstalk 00:12:38]
Kristen Sosulski: So I really, yeah, and it shows ... If you could make a
great data graphic of your insight or result, it shows
that you understand your data, and now we can talk
about taking actions or making decisions with that
data.
Kirill Eremenko: Mm-hmm (affirmative), yeah. Yeah, that's definitely,
definitely true. Do you find people who attend your
class, they're like receptive to the idea of learning data
science? For instance, I can imagine there could be
executives or managers who just have the mindset
that okay, data visualization is powerful, however just
as you mentioned just before, that like I'll have
somebody else do it. I don't need to be able to do these.
And rightly so. A lot of managers, they don't have time
to sit down and create a visualization. What are the
benefits for managers who will never actually be
creating these visualizations themselves, what are the
benefits of them actually having these skills or
understanding how visualizations work?
Kristen Sosulski: Oh, that's a great question. First off, I would say to
your first question about-
Kirill Eremenko: [crosstalk 00:13:47]
Kristen Sosulski: ... how the students kind of, yeah, are they receptive to
this? I'd say absolutely. It's actually amazing how
receptive students are. From my class, several
students have even created their own data viz
consulting firms, which I'm like, "Whoa."
Kirill Eremenko: Wow.
Kristen Sosulski: It's amazing.
Kirill Eremenko: That's awesome.
Kristen Sosulski: It's really an often overlooked area, and the way I sell it
is it's really the extra 20% that you need to put in.
Whether you're writing a business report or creating a
website or dashboard for executives, it's the extra 20%
that really helps reveal those important insights so
someone can take action. If you're not building them
yourself, that's totally fine, right? There are people that
are really expert in not just visualization, but in data
modeling and data mining, really understanding the
ways in which data can help with prediction and other
aspects. For managers, it's having that knowledge to
be able to lead and critique and offer advice to their
colleagues that are doing this work. Not just accepting
things at face value, but really to know how to ask the
right questions.
Kirill Eremenko: Yeah, yeah, that's what I was thinking as well. For me,
for instance, the skills I learned back when I was in
consulting and doing visualization there really helped
me understand visualizations more. Even taking it
further, like when I was a kid and I attended art school
and you'd learn how to paint, I never thought like
maybe I would become a painter, but I didn't. But still,
those skills, they help me understand better how
colors work together, what elements are standing out
and what elements should be standing out and why
they're not and so on. The whole concept of aesthetics,
I think is important for people to develop that as well.
Kristen Sosulski: Oh, absolutely, absolutely. Everything from
recognizing that certain hues together can't really be
perceived by people who are colorblind or have acute
colorblindness, and so that's really important. And
just basic readability. Can I see that chart from the
back of the room? Can I read the Y axis? And then just
having the consideration of the audience, right? Like
just because you put a chart up there doesn't mean
that everybody understands the key takeaway. And so
[inaudible 00:16:34] those explanations and really
walking folks through that chart or graph.
Kirill Eremenko: Yeah, definitely, definitely agree with that, and tell us
a bit about how did you get into this space? What
made you get started into the area of visualization?
Was it a conscious choice or did you end up here by
accident?
Kristen Sosulski: Well, I've always been involved in technology, and the
way that I got into visualization was really a unique
story. But in a nutshell, I was working for this
education center at Columbia University, and I started
working with this film professor. We were creating
digital educational technology projects to help
students learn. The idea was to look at a film, a
particular scene in a film, and be able to deconstruct
that and do it all over the web. This is in 1999, to
actually look at a film and to be able to cut it into
little, small pieces and to be able to analyze each
particular shot in a film. Through that analysis
process, we were stripping out all the narrative and
dramatic content and just focusing on the structural
elements of film. We used quantitative values to
describe what was happening shot by shot in the film.
At the end, we actually visualized that through a data
visualization.
Kirill Eremenko: Wow.
Kristen Sosulski: To be able to visualize art was such a ... It just, it
totally made my head spin at first. It was such an
amazing project that from there on out visualization
became part of my practice, together with teaching,
like I said, and my work with technology.
Kirill Eremenko: Wow, that is so cool. I would have loved to see. Do you
have the results of that project available somewhere
still? I know it's [inaudible 00:18:23]
Kristen Sosulski: I do. I do. Yeah, it's called The Deconstructor. I can
definitely send you a PDF. We did a little research
report on it.
Kirill Eremenko: Nice. Is it okay to include it in the show notes for our
listeners?
Kristen Sosulski: Absolutely. Absolutely.
Kirill Eremenko: Okay, yeah, please send it through. I will definitely do
that. It sounds like an interesting project. I can totally
see now how you fell in love with this space and
visualization is such a great area. For the benefit of
our listeners, visualization, like a lot of time, like not a
lot of time, but many people have asked me if you can
get into the space of data science without getting
heavily involved in programming. Some people just
don't like programming or aren't really passionate
about learning how to program, but they are
passionate about data science. They see the power
there. What would you say in that case, Kristen? Is it
possible to get into the space of data science analytics
without having to learn programming too heavily?
Kristen Sosulski: Oh, absolutely. Absolutely. I'm a coder myself and I
think that there are tools that are available, like
Tableau or you could even use Excel, that allow you to
create dozens of visualizations without knowing so
much about coding. The key is to really understand
your data and what your data represents in the real
world. Without an experience in coding, you still have
an opportunity to use these tools to visualize data, so
absolutely. Again, the key is knowing what your data
represents in the real world and knowing if the
visualization you create is accurate.
Kirill Eremenko: Yeah, yeah. Totally agree with that. Visualization is
your pathway into data science. It's like a quick way to
get into the space of data science. Whether you want
to later on learn machine learning programming skills
or not, visualization skills are going to be very
beneficial in either case. Well, on that note, let's shift
gears a bit. I wanted to pick your brain on some tips
and hacks in visualization. How does that sound?
Kristen Sosulski: Sounds great.
Kirill Eremenko: Okay. First question would be, what goes into a good
visualization? What is the difference between, or let's
say great visualization. What's the difference between
an average or a good visualization and a great
visualization that actually delivers where it's supposed
to? I understand there will be lots of different
elements, lots of different details, but what are the key
cornerstones of a great visualization in your opinion?
Kristen Sosulski: Okay, so I would describe the most important thing of
a visualization is that there is a clear takeaway. I call
this the party favor. You know when you go to a
wedding, at the end of the wedding you usually get a
little trinket or something to remember this day.
Kirill Eremenko: Oh, yeah.
Kristen Sosulski: You have to make sure your audience walks away with
that little trinket or that party favor. So important,
otherwise why did you create it in the first place? It's
so important that your message resonates with your
audience. There's a lot of tricks and hacks to make
sure this happens. One, show it to other people in
advance. Don't be afraid to show your work and see
the reactions. You're almost doing a test of how well
one can perceive and interpret this graphic.
Kirill Eremenko: Gotcha. Okay, yeah, totally agree with that. The way
some people see visualization and probably the way I
saw it before, is you have something in mind, like you
put it together and depending on your experience in
the space, and I was not experienced at all when I was
starting out, you might already have something close
to the [inaudible 00:22:16] or further away, but in
essence, anything you come up with in your head at
the start is probably not going to be the final product.
It's going to have adjustments and different elements
that you weren't expecting or something might not fit
in. You might have to cut something out.
Kirill Eremenko: It is an iterative process. Visualization, inevitably
you're going to have iterations of what you're creating.
Starting out and trying to go for the perfect solution
right away I think is a mistake. I think you need to
start out, you put a prototype together, and as you
said, Kristen, show it to other people. Get their opinion
and see how they react to it, and then adjust it based
on that. Then, go through another iteration, another
iteration. Would you agree with that, that it's an
iterative process?
Kristen Sosulski: It's an iterative process. First you start with, I would
almost say it's first an exploratory process. As you
understand and develop a data understanding or
understanding of your data and you start asking
better questions of your data, as you query it, as you
choose to select different display types, as you choose
to either aggregate or disaggregate your data, right?
Are you going to show every point on a map or are you
going to fill in just more geographic regions? Does that
tell your story better? Dealing with the amount of data
or density of your data is also very important. What
level of grain are you going to show?
Kirill Eremenko: Yeah, yeah. Exactly. Exactly that. Yeah, that's
probably a great starting point. A good visualization or
a great visualization has to have a clear picture, a
clear takeaway that a user's going to get, and you got
to show it to other people in advance to iterate that
process correctly. Anything else?
Kristen Sosulski: Oh, absolutely. The second thing is consider the final
format of your visualization. Are you going to be
presenting it in a room of 1000 people like on a
PowerPoint or keynote, or something that your
audience is going to interact with online or on their
phones? Or, is it a report that you're giving
stakeholders that's printed out? That format really
does make a difference on how you design it. You
design for interaction if it's going to be online. You
design for clear readability and you probably add a lot
more text if it's going to be printed. If you're going to
show it, you're probably going to not show as many
details and think about your role in narrating and
walking someone through that chart.
Kirill Eremenko: Mm-hmm (affirmative), mm-hmm (affirmative), yeah.
That's a very good point. Tell me, I'd like to get your
professional opinion on this. My thinking around
visualizations, especially in the case or specifically in
the case when you're presenting it, is indeed, it's very
different to if you just hand it over as an interactive
online tool or report. Because in the case when you're
presenting it, I feel that the audience's attention
should be on you rather than on the visualization. The
visualization should be assisting you and therefore
should be minimal text, minimal confusing things.
Should be one picture at a time, and then you tell the
story, and people are focusing on you rather than
reading the slides. Do you agree or do you have a
different opinion on that?
Kristen Sosulski: I absolutely agree. One of my favorite visualization
designers is Dona Wong, and she says precisely that,
that she is the presentation.
Kirill Eremenko: Mm-hmm (affirmative), yeah, yeah. But it's different
though if you send it, like you say, as an interactive
report or a PDF report. You're not there, so it becomes
a whole art. How do you incorporate yourself and your
story into the visualization as like through footnotes or
as call outs and other ways like that. That's very
interesting, isn't it?
Kristen Sosulski: It really is. In a report, how would you guide somebody
to look at the particular aspects of the chart that you
want to draw their attention to? You might use
different colors or shading. You might use call outs.
You might show the graph in different stages, almost
like a progressive reveal, frame by frame with some
text in between that explains what's happening so you
can pace the reader as they go through.
Kirill Eremenko: Mm-hmm (affirmative), mm-hmm (affirmative), got it.
Okay, that's a very interesting idea about doing it
gradually frame by frame. That's very cool. By the way,
while we're on this, is that what your book is about?
Do you give tips on how to visualize things better and
dissect visualizations in your book, or has it got a bit of
a different angle?
Kristen Sosulski: No, absolutely, absolutely. There's a huge chapter all
on the design and the aesthetics of visualization.
There's a whole chapter on picking the right chart.
That's all based and driven on your data. There's a
whole chapter on data and different data formats and
how those are really important to consider the format
of our data to get the type of visualization that we want
so we don't make errors. For those non-data-science
folks, that chapter is really important. And then, I
have a chapter on audience, which is how to relate and
resonate with your audience with that key takeaway,
and also a whole chapter on presentation, like different
tips and tricks for presenting with data graphics
specifically.
Kirill Eremenko: Wow, that's very cool. I'm so glad you included that,
because a lot of the time that's a place where the
dropout happens. People create a, do the analysis, do
the insights, and even create a beautiful visualization,
and then they don't follow through to really act as the
bridge between the insights and the business decision
makers. That's where the real value is, right? The
visualization can be amazing and the insights can be
really life changing or business changing, but unless
you can communicate it to the people who are going to
act on them, what's the point?
Kristen Sosulski: You said it perfectly. Yeah, totally agree.
Kirill Eremenko: Okay, well, if you don't mind, without disclosing or
giving away the whole book, let's go through a couple
of these chapters and maybe you can give us one tip
from each one of them. How does that sound?
Kristen Sosulski: That sounds great. That sounds great.
Kirill Eremenko: Okay. Well, let's start with the one, with the pick a
chart chapter. How do you pick charts? Favorite
question everybody has about pie charts, when do you
use and when do you not use a pie chart?
Kristen Sosulski: Okay, so I'll answer the second question first. Okay, so
every pie chart can be converted to a bar chart. If
you're ever in question, "Oh, should I use a pie or a
bar?" well, you can always use a bar, but you can't
always use a pie. Why? Because you can only have a
certain number of slices in a pie. You know that as
soon as you put more than six or seven slices of a pie,
it's really hard to distinguish between those different
areas, right? Especially if they're kind of close in size.
We're just better as humans at perceiving length over
area. Picking a bar is generally a better choice, but I'm
not one of these people that's like, "Oh, you can't have
a pie chart." If you want to have a pie chart for some
variety, I think that's perfectly fine, as long as it's
saying something.
Kirill Eremenko: Thank you.
Kristen Sosulski: If you're showing a pie chart that's split in half with
50/50, that's not really saying much.
Kirill Eremenko: Yeah, yeah, thank you. I understand the whole hassle
about pie charts. I agree, if it's got ... I would even go
as far as saying more than three parts of a pie, like a
bit too many. But sometimes people are so adamant
about don't use pie charts. I agree with you. If it fits, if
it looks good, if it tells a story, use a pie chart. Other
than that, try to stay away from them, I guess.
Kristen Sosulski: Yeah, definitely. If you're just saying you want to show
proportion of a whole of people who use different types
of devices, like their laptop versus mobile versus
tablet, okay, well you can show that in a pie chart, and
it might actually show you very clearly the proportion of
people that convert, buy your products on their iPhone
versus their tablet. That's, I think, a fine use.
Kirill Eremenko: Okay, gotcha. Now, moving on to that other question,
how do we go about picking a chart? There's so many
different types of charts to choose from. How do you
think about this?
Kristen Sosulski: Besides thinking about the question that we want to
ask of our data, we really have to have an
understanding of our data. If we have time series data,
this means that now we can choose time series
displays. This means line charts, area charts, for
instance. But if we don't have time series data, we
can't pick a line chart. Same for if we want to map
locations, we need geospatial data. I'm not going to
map locations, it's probably not going to be a great use
of a bar chart to map 30 locations. It can be very hard
to see those differences. But perhaps if I want to show
location, I could do that on a map. I would need
latitude and longitude or I would need a zip code or
area code or a country code, some type of geospatial
data. The data does really limit your choice.
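[Editor's note: The rule of thumb described here, that the kind of data you have limits which displays you can choose, can be sketched as a small lookup. This is only an illustration, not code from the book. The function name `candidate_charts` and the data-kind labels are hypothetical, and the mapping covers only the chart families mentioned in this conversation.]

```python
# Hypothetical lookup: which chart families each kind of data supports.
# The labels and the mapping are illustrative assumptions, not a standard.
CHARTS_BY_KIND = {
    "time_series": ["line chart", "area chart"],
    "geospatial": ["map"],
    "categorical": ["bar chart"],
}


def candidate_charts(kinds):
    """Return the chart families supported by the given data kinds, in order."""
    charts = []
    for kind in kinds:
        for chart in CHARTS_BY_KIND.get(kind, []):
            if chart not in charts:
                charts.append(chart)
    return charts


# Without time series data, the line chart never shows up as an option.
print(candidate_charts(["categorical"]))
print(candidate_charts(["time_series", "categorical"]))
```

[A lookup like this only narrows the options. The question you want to ask of the data still decides which of the remaining charts to use.]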
Kirill Eremenko: Mm-hmm (affirmative), mm-hmm (affirmative). That's
a good point. Okay, so let's say we've narrowed it
down. Let's say it's time series data and, for instance, I
have a specific example. How do I choose between ...
Let's say I'm plotting the unemployment rates for the
US in the past 10 years month by month, and I can
either plot it as a line chart, and it's split by age groups,
like 18 to 25, 25 to 35, 35 and so on. I can plot it as a
line chart and I'll have five lines on one chart, or I can
plot it as an area chart, for example, where you have
the first 18 to 25, and then there's all shaded in, and
then after that you go to the next line above it to stack
on top of each other and they're shaded in.
Kristen Sosulski: Yes.
Kirill Eremenko: Which one would you choose? I've encountered that
dilemma before, and both are valid. Both represent the
data quite well. But how do you make the decision
which one is the best one?
Kristen Sosulski: Great. That's a really great question. If you want to see
the proportional change in unemployment amongst the
different groups, this is where you would choose your
stacked area. Visually, your stacked area also looks
very compelling. If you were to show the stacked area
in a presentation, those colors or different shades
would be very vivid, right, and you could label directly
in those areas, so it could tell a very compelling story.
Kristen Sosulski: Another great thing about the stacked area is that you
can make it a 100% stacked area, or you could just
actually use the absolute values. Then you can see the
percentage change, which is also a nice telling metric.
You have a few more options with the stacked area.
Also for interactivity. If viewers are going to be seeing
this chart online, being able to mouseover and just
click on a particular stacked area could reveal
additional information, so you have the opportunity to
add additional variables, for instance, for each data
point.
Kristen Sosulski: The line chart is great and it will tell you literally those
values and each value for each month for each category or
demographic, which is fine. I think for showing just
the unemployment rate in a general trend is best with
a line, but in the circumstance that you pointed out, I
would like to see that as a stacked area.
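[Editor's note: The difference between a stacked area of absolute values and a 100% stacked area comes down to one normalization step: each group's value is divided by the total for that month. The sketch below shows that step only. It assumes we are stacking counts of unemployed people per age group rather than rates, the function name `to_percent_shares` is hypothetical, and the numbers are made up for illustration, not real unemployment data.]

```python
def to_percent_shares(series_by_group):
    """Convert absolute values per group into each period's share of the total (%)."""
    groups = list(series_by_group)
    n_periods = len(series_by_group[groups[0]])
    shares = {group: [] for group in groups}
    for i in range(n_periods):
        # Total across all groups for this period (one "column" of the stack).
        total = sum(series_by_group[group][i] for group in groups)
        for group in groups:
            value = series_by_group[group][i]
            shares[group].append(100.0 * value / total if total else 0.0)
    return shares


# Made-up monthly counts (in thousands) for three age groups; not real data.
counts = {
    "18-25": [300, 320, 310],
    "25-35": [500, 480, 490],
    "35+": [200, 200, 200],
}

shares = to_percent_shares(counts)
for group, values in shares.items():
    print(group, [round(v, 1) for v in values])
```

[Plotting `counts` gives the absolute stacked area; plotting `shares` gives the 100% version, where every month's stack adds up to 100.]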
Kirill Eremenko: Okay, gotcha. Thank you for explaining it so
succinctly. Yeah, I can see now that if you want to
compare them one against the other, or you have like,
as you said, it's more vivid if it's an area chart. That's
very cool. Well, let's do another one.
Kristen Sosulski: Okay.
Kirill Eremenko: This is fun. This is fun. Because I know in your
LinkedIn profile you said that you do consulting in the
space of data visualization-
Kristen Sosulski: I do.
Kirill Eremenko: ... so we're getting a free consultation right now, so
might as well make the most of it. Okay, let's say I
have categorical data. Let's say I have sales by
different product. Let's say we sell chairs, tables, and
all these different type of furniture. I want to compare
them and see which ones are selling better, which
ones are selling worse, and what's going on, maybe
sort them from highest sales volume to lowest.
Would I use a bar chart or would I use a tree map?
And just for the sake of our listeners, if you're just
starting into visualization, tree map has nothing to do
with trees. It's just like a big box that is ... You've
probably seen this chart where there's like the biggest
part, and then there's boxes inside of a box. There's
boxes split into lots of little boxes. I'm probably not
doing well explaining it. How would you describe a tree
map, Kristen?
Kristen Sosulski: A tree map, a famous scholar by the name of Ben
Shneiderman came up with [inaudible 00:36:24]
algorithms for something called a tree map. It's the
arrangement of categorical data by proportion, so it
might be by proportion of profit, proportion of sales by
product. The larger the rectangle with ... Picture one
large rectangle and dissecting that into 10 pieces, and
each of the 10 pieces would represent a product. The
size of those 10 pieces would all be different based on
some numerical value like sales or profit.
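The idea of dissecting one large rectangle into pieces sized by a numerical value can be sketched as follows. This is a deliberately simple one-level "slice" layout assumed for illustration, not the specific algorithms mentioned above; real tree map tools use more elaborate layouts. The sales figures and the `slice_layout` name are hypothetical.

```python
def slice_layout(values, width, height):
    """Split a width x height rectangle into vertical strips whose areas
    are proportional to the values. Returns name -> (x, y, w, h)."""
    total = sum(values.values())
    rects = {}
    x = 0.0
    # Largest value first, so the biggest rectangle sits on the left.
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        w = width * value / total
        rects[name] = (x, 0.0, w, height)
        x += w
    return rects


# Hypothetical sales by product.
sales = {"chairs": 500, "tables": 300, "desks": 200}
rects = slice_layout(sales, width=100, height=50)
for name, (x, y, w, h) in rects.items():
    print(f"{name}: x={x}, width={w}, area share={w * h / (100 * 50):.0%}")
```

Each rectangle's share of the total area equals that product's share of total sales, which is the property the description relies on.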
Kirill Eremenko: Wow, described by a professional. That was such a
great explanation. I think everybody can totally
understand that, even if they've never seen a tree map
before. Yeah, going back to the question. Tree map or
bar chart to describe volumes of sales?
Kristen Sosulski: Okay, volume of sales. For instance, if you were going
to show the most popular products by, say, sales, I
would love to see that as a horizontal bar chart to
show rank, okay? Clearly the longest bar would be on
top of the range horizontally and I would know that
like, wow, those beautiful Cherner chairs are selling
and they're very profitable and they're our biggest
seller at $1000 a pop, right? That would be for
popularity, we like to use horizontal for any kind of
rank. If you want to see ... Let's make the example a
little bit more-
Kirill Eremenko: Sophisticated, yeah.
Kristen Sosulski: ... [crosstalk 00:38:01] We have a tree map that
represents every furniture product by a large category.
We would have something like chairs as one big
rectangle. We'd have something as tables as another
rectangle. Let's say that there's 10 different types of
furniture product, end tables, coffee tables, desks, and
showing which area is more profitable. We have this
view of our business, furniture business, and the
largest rectangle would show us which product area is
most profitable. Then I can drill down and click on,
let's say they are chairs, click on that large rectangle
that says chairs, and then I can zoom in and see
which chair is now most profitable.
Kristen Sosulski: Tree maps are great for interactivity when you want to
drill down. You get the big picture. You get the big
picture at first. Out of all the furniture in my furniture
store, which category is the most profitable? Oh,
chairs. Now I can click on chairs, drill down, and I can
see which type of chair is most profitable.
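The drill-down described here is, underneath, an aggregation at two levels: first by category, then by product within the chosen category. A minimal sketch, assuming a flat list of made-up records and a hypothetical `profit_by` helper:

```python
from collections import defaultdict

# Hypothetical (category, product, profit) records for a furniture store.
records = [
    ("chairs", "lounge chair", 4000),
    ("chairs", "dining chair", 2500),
    ("tables", "coffee table", 3000),
    ("tables", "end table", 1000),
    ("desks", "standing desk", 2000),
]


def profit_by(records, key_index, only_category=None):
    """Sum profit grouped by the field at key_index (0 = category, 1 = product),
    optionally restricted to a single category."""
    totals = defaultdict(int)
    for record in records:
        if only_category is not None and record[0] != only_category:
            continue
        totals[record[key_index]] += record[2]
    return dict(totals)


# Big picture: which category is most profitable?
top_level = profit_by(records, key_index=0)
best_category = max(top_level, key=top_level.get)

# Drill down: which product within that category is most profitable?
drill_down = profit_by(records, key_index=1, only_category=best_category)
print(top_level)
print(best_category, drill_down)
```

An interactive tree map would draw `top_level` first and swap in `drill_down` when the viewer clicks the largest rectangle.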
Kirill Eremenko: It's kind of like a tree map inside a tree map.
Kristen Sosulski: Exactly. They're best used when you can use them on
a dashboard display or web-based display where you
can drill down and interact. Less useful if you're
presenting it to an executive. They're pretty hard to
interpret.
Kirill Eremenko: Mm-hmm (affirmative), mm-hmm (affirmative). Okay,
interesting. That even means that if you have the same
insights and if you were presenting them to executives,
you might use a bar chart. But then once you deliver
that presentation and now they want that interactive
tool, you might change it up and create a
tree map inside a tree map type of scenario and send
that, because it's better for interactivity.
Kristen Sosulski: Absolutely. Absolutely. And with maybe a sentence or
two or a minute or two of training just to describe
what this display is actually doing, just so they
understand the use. Because, like I said, it's not
something that all of a sudden we see a tree map and
we immediately understand what it means.
Kirill Eremenko: Gotcha, okay. All right, well thank you for those quick
insights. I think those are two great examples of picking
a chart, even though simple. [inaudible 00:40:31]
sense, right? Like right now we got a few listeners,
maybe a few hundred listeners, listening to this who
are like, "Well, I'm in machine learning. I want to go
into that space. Visualization's not for me." Just for
the benefit of people in that mindset, I want to say that
visualization is for everybody. That is where ... That is
the language. Machine learning is great and
programming is fantastic, but that's the language of
computers.
Kirill Eremenko: At the end of the day, the value that data science
brings is how much does it add to the bottom line of a
business, or how does it change lives? How does it
help a non-profit? What is the actual change? That is
translated through business decisions. As we already
mentioned in this podcast, you need to be able to
communicate your insights to people who are making
these decisions, and therefore visualization is
important. In this case, we looked at two relatively
simple examples of visualization, but even I for myself
already learned something about area charts and tree
maps and so on. Think of it as a great start.
Kirill Eremenko: We're not going to go digging deeper into that. I'm sure
you describe that quite well in your book, or
awesomely in your book in that chapter on picking a
chart. Let's move on to the next one. Let's talk about
the formats of data. You said for people who are
starting out into data science, this would be a valuable
chapter. Tell us a bit more about that.
Kristen Sosulski: Oh, absolutely. A lot of times when we get time series
data, for instance, it's not organized or structured in a
way that we can visualize it. For instance, there might
be a year for every column, okay? If you think about
plotting something on X and Y axis and you want to
plot all years on the X axis, all values on the Y axis,
you would think that you would have a column for
year and a column for value. A lot of times the data
structure, especially if you get it from like the World
Bank or something where you actually have a year for
each column. Now you're thinking, "Well, what am I
supposed to put on the X axis? I have to drag every
year?" A lot of software programs won't allow you to do
that. What you need to do is to take this wide format
and actually convert it, or pivot it, as it's called. A lot of
you might have heard of pivot tables, of course, to
pivot your data. Now you have one column for year
and one column for value.
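The wide-to-long conversion can be sketched in plain Python. The country names and numbers are invented, and `wide_to_long` is a hypothetical helper written for this illustration; it assumes every column other than the identifier is a year.

```python
# Wide format: one column per year (hypothetical values).
wide_rows = [
    {"country": "A", "2015": 1.0, "2016": 1.5, "2017": 2.0},
    {"country": "B", "2015": 3.0, "2016": 2.5, "2017": 2.0},
]


def wide_to_long(rows, id_column):
    """Turn one-column-per-year rows into one row per (id, year, value)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], "year": int(column), "value": value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, id_column="country")
for row in long_rows:
    print(row)
```

The result has a single `year` column to put on the X axis and a single `value` column for the Y axis. Data libraries offer the same reshaping built in; in pandas, for example, it is done with `melt`.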
Kirill Eremenko: Gotcha, gotcha. Yup, that's one of the examples,
pivoting. It's kind of like translating data format from
what humans are better used to reading and
understanding where every year has its own column,
to something that machines are better at reading.
That's where all the years are in one column. All of the
categories or all of the types of one category ... What is
it called, by the way, when you have a category and
then you have subelements in a category? The
different years, what would they be in the year
category?
Kristen Sosulski: Do you mean like there would be a different time
dimension, or?
Kirill Eremenko: No, I mean like okay, we have a category of year, and
then like each individual year. What is that called?
Kristen Sosulski: Oh, each individual year would just be like a value. It
would almost be like observation.
Kirill Eremenko: Yeah, there we go. Such a silly question. All right, so
each value in your category is contained within that
one column. Okay, gotcha.
Kristen Sosulski: Yeah, and this is something that Hadley Wickham,
who's at RStudio and has written a lot about this, but
it's called tidy data. You have every observation in a
row and every variable in a column. That's the
foundation. Just taking a look at your data and
making sure that it's in that tidy form is going to save
you hours. That's like one of the biggest takeaways
from the book. I have many others where we talk
about how you aggregate your data or how you can
present different metrics besides the values that you
just have alone, so how you can calculate new metrics
based on your data [crosstalk 00:44:34] those.
Kirill Eremenko: Yeah, that's also an important kind of like feature
engineering type of thing.
Kristen Sosulski: Yeah, yeah. Or even something like the five day moving
average or year over year or percentage change. Those
types of things require a small calculation, and usually
most of these software programs will have a function
to do it so it doesn't take any coding. But just knowing
that that exists is really important.
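For readers curious what those small calculations look like, here is a rough sketch of a five-day moving average and a period-over-period percentage change. The daily sales numbers are made up, and the function names are hypothetical; most tools provide equivalents built in.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values.
    The result is shorter than the input by window - 1."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def percent_change(values):
    """Percentage change from each value to the next one."""
    return [100 * (curr - prev) / prev for prev, curr in zip(values, values[1:])]


# Hypothetical daily sales for one week.
daily_sales = [10, 12, 11, 13, 14, 16, 15]

print(moving_average(daily_sales, window=5))
print([round(p, 1) for p in percent_change(daily_sales)])
```

The same `percent_change` function gives year over year change if the input list holds one value per year.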
Kristen Sosulski: I'll give you one more example. Let's say I'm studying
my customer base and I have their age. Now, in a bar
graph I can plot every age of my customer, and that's
going to be pretty boring, right? It's going to be maybe
from 18 to like 82, and I'm going to have a bar for each
age. What you can do instead is reduce the level of
detail that you provide and actually group age into
different bins. I might have 18 to 25 in one bin, 26 to
32 in another bin. This makes the data much more
interpretable. I can look at these more logical
groupings.
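Grouping ages into bins can be sketched like this. Only the first two bins (18 to 25 and 26 to 32) come from the conversation; the remaining bin boundaries and the customer ages are assumptions made for the example, as is the `bin_ages` helper.

```python
from collections import Counter

# Inclusive (low, high) age ranges. The first two follow the example above;
# the rest are assumed for illustration.
bins = [(18, 25), (26, 32), (33, 45), (46, 64), (65, 82)]


def bin_ages(ages, bins):
    """Count how many ages fall into each inclusive (low, high) bin."""
    counts = Counter()
    for age in ages:
        for low, high in bins:
            if low <= age <= high:
                counts[f"{low}-{high}"] += 1
                break
    # Return the bins in their original order, including empty ones.
    return {f"{low}-{high}": counts[f"{low}-{high}"] for low, high in bins}


# Hypothetical customer ages.
ages = [18, 22, 25, 26, 31, 40, 47, 63, 70, 82]
binned = bin_ages(ages, bins)
print(binned)
```

Instead of up to 65 separate bars, one per age, the chart now needs only five.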
Kirill Eremenko: Yeah, and to your point, what I once discovered for
myself was that when you're doing data visualization,
you are always inevitably reducing the amount of
information that you have. You have some data ... It's
kind of like when sculptors are working with marble or
something. They have this big block of marble, and
then they carve out of it. They're always going to
reduce the amount of material, the amount of
whatever they started with to create the final result.
Kirill Eremenko: In visualization, it's okay to think about it in those
terms. Like you said in this case, how about reduce
the level of granularity and go from instead of one bar
for every year, have a bar for 18 to 25, a bar for 25 to
35. There's nothing wrong with it because in
visualization, if you think about it, there's no way for
you to add data to your, add more information to your
initial data. If you're doing that, then you are
manipulating the data, then you are doing something,
like you're making something up. Kind of like that
mindset of yes, I'm going to, I'm just going to see how
I'm going to reduce that information that I'm providing
to my user in order to still maintain the insights that I
want to convey to them.
Kristen Sosulski: Absolutely. Absolutely. Very well said.
Kirill Eremenko: Thank you. Okay, awesome. That was on formatting
data. Let's move on to the next one. The next one is
how to relate ... Sorry, I forgot. What was the name of
the chapter, the next chapter?
Kristen Sosulski: Oh, it's just called, oh the audience chapter?
Kirill Eremenko: Yeah, the audience chapter.
Kristen Sosulski: Yeah. Great. I think you started by saying how to kind
of relate to your audience.
Kirill Eremenko: Yeah. That's the one, yeah.
Kristen Sosulski: Well, first of all, it's great to know a little bit about
them. I know this sounds pretty obvious, but you
might not think about this when you're creating a
visualization, like really thinking about understanding
your audience, but it's so important. What do they
already know? What don't they know? We could take
this, for instance, we talked about the tree map before.
Are they familiar with more complex type of
visualizations? If not, this might not be the time to
introduce them to one, unless you're going for some
type of wow factor or you're planning on taking quite a
bit of time to explain it. This is just one example of ...
or what they already know.
Kristen Sosulski: Another way to look at prior knowledge is to really
think about how you could build upon it. Can you, in
your narrative, can you build on something they
already know, an experience you know that they
already had? Even if it's like taking the subway to
work or something like that, but something you know
that there's some kind of common baseline that you
can start from. It's a great way just to engage and get
people paying attention and along with you for that
narrative that you're describing.
Kirill Eremenko: Mm-hmm (affirmative), mm-hmm (affirmative), yeah.
That's a great way of putting it. I've definitely been in a
situation where I picked to explain something to my
audience, like a certain type of distribution, and I
knew consciously that they're not ready for this. I'm
going to have to spend time on that. That's a great tip:
know your audience, know what they know
and what they don't know, and how you are going
to use that to your advantage. Fantastic.
Kirill Eremenko: Well, we're not going to go into the one on how to
present because we talked about that a little bit
already before. But what I wanted to do now is I
wanted to go for a rapid fire list of questions, and so
like get your opinion on some different topics. Are you
ready for that?
Kristen Sosulski: I'm ready.
Kirill Eremenko: Okay. First one will be, we've talked about some good
tips and hacks already on visualization and how
people can enhance their skills. What are some of the
common mistakes people make when they're creating
data visualization? Some things that you've seen that
really stand out and our listeners want to avoid at all
costs.
Kristen Sosulski: Okay, so a common mistake as a professor is they
forget to cite their data source, so they don't tell the
audience where the data came from.
Kirill Eremenko: Yeah, okay. Yeah, that's a big one, especially if you're
using ... Like even if you're using internal company
data, right? You still, it can come from so many
different sources. It's important for even an audit trail
to know that, right?
Kristen Sosulski: Absolutely, and make sure you put the year down. It's
also important to cite yourself as the author of that
data graphic.
Kirill Eremenko: Mm-hmm (affirmative), okay. Gotcha. Anything else?
Kristen Sosulski: Oh, absolutely. Another one is what I call data
integrity or lying with data. Really easy to do this with
a bar chart, not setting your Y axis to zero tends to
over-exaggerate the change in the data that's not really
there by an over-exaggeration of the change in the
length of bars of the graphic itself.
Kirill Eremenko: If you don't set the Y axis to, like bottom to zero, is
that correct?
Kristen Sosulski: Yes.
Kirill Eremenko: Yeah, oh, yeah, yeah. I know that one. Andy Kriebel
from The Data School, The Information Lab, he talks a
lot about that. I've heard him talk about it that, yeah.
If your X axis crosses the Y axis at somewhere above
zero or below zero, and you got bar charts, vertical bar
charts, then you're in for a lot of trouble. It's going to
be-
Kristen Sosulski: Absolutely.
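The exaggeration from a non-zero baseline is easy to put a number on. In this sketch the two values and the truncated baseline of 90 are made up, and `drawn_ratio` is a hypothetical helper that compares the lengths of the bars as drawn.

```python
def drawn_ratio(a, b, baseline=0.0):
    """How many times longer bar b is drawn than bar a
    when the value axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)


# Two hypothetical values that differ by 10%.
last_year, this_year = 100, 110

honest = drawn_ratio(last_year, this_year)                  # axis starts at 0
truncated = drawn_ratio(last_year, this_year, baseline=90)  # axis starts at 90

print(f"Axis from 0:  second bar is {honest:.1f}x the first")
print(f"Axis from 90: second bar is {truncated:.1f}x the first")
```

With the axis starting at zero the bars differ by 10%, matching the data. With the axis starting at 90 the second bar is drawn twice as long as the first, even though the underlying values have not changed.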
Kirill Eremenko: Yeah, okay. Gotcha. All right, and maybe another one?
Kristen Sosulski: Okay, so color, so using color sparingly. We tend to
like to use color to highlight. Sometimes I see that
people end up highlighting everything so nothing
stands out. If you want something to stand out, you
could use a contrasting color. I always say the most
underused color in the data viz world is gray. I'm
boring. I really like gray and I like to use color, like a
bright green or any other color that would contrast
with that if I want something to stand out, like my
most important data point.
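The gray-plus-one-accent approach can be expressed as a small rule: everything is gray except the one data point that matters. The hex color codes, product names, and `highlight_colors` helper below are assumptions for illustration; any gray and any contrasting color would do.

```python
def highlight_colors(values, base="#9e9e9e", accent="#2e7d32"):
    """Assign the accent color to the largest value and gray to everything else.
    The default hex codes (a mid gray and a green) are illustrative choices."""
    top = max(values, key=values.get)
    return {name: (accent if name == top else base) for name in values}


# Hypothetical sales by product.
sales = {"chairs": 500, "tables": 300, "desks": 200, "lamps": 120}
colors = highlight_colors(sales)
print(colors)
```

The resulting mapping can be passed to whatever charting tool is in use, so that only the most important bar stands out.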
Kirill Eremenko: Gotcha, gotcha. How great is color? Like just by
changing the colors in one visualization, you can take
it from something that is average to a really great
visualization by picking the right color combination.
Kristen Sosulski: Absolutely. Absolutely.
Kirill Eremenko: All right, cool. Next rapid fire question is what are
some of the favorite data visualizations you've seen
others create?
Kristen Sosulski: Oh my gosh, there are so many. I guess I'll just list
them off. I love, basically there's one by Lee Byron and
David McCandless which is peak break-up times on
Facebook where they [crosstalk 00:52:53]
Kirill Eremenko: I've seen that one.
Kristen Sosulski: Yeah, that one is like so fun. I always use that in my
class because how they go through that visualization,
they have this progressive disclosure. First they show
you the chart. They don't even tell you what the data
is, so you have to think about it. The second thing is,
then he puts the title of the chart, Peak Facebook
Break-Up Times, you know? Then you start laughing.
And then he annotates the chart for you. He says,
"Okay, it looks like a low point might be around the
holidays, and a high point for breakups is around
spring break." And so, just the way he guides you
through it is why I love it so much. But all it is is an
area chart. It's nothing fancy. It's the way that it's
delivered is why I love it so much.
Kirill Eremenko: Yeah, and humor is important, isn't it? You can deliver
the same chart in a very dry, monotonous voice, or
with a bit of humor. A bit of audience engagement
makes the world of difference.
Kristen Sosulski: Yeah, yeah. And then, just like on a more serious note,
Vox did this amazing video called The State of Gun
Violence in the US explained in 18 charts. A lot of
those charts are bar charts, and they use annotations,
so somebody with a red marker actually marking off
the different bar charts and annotating it as the
narrative is going. That one is fantastic. I would
definitely share that with your viewers.
Kirill Eremenko: Okay, okay. That's a good one. Anything else?
Kristen Sosulski: I love anything that Amanda Cox does from the New
York Times graphic team. There's a famous chart
about how people spend their time from the American
Time Use Survey of the US Census. One of the things
is that you can compare how employed people versus
unemployed people spend their time. There's a little bit
of humor there because there's a category for leisure,
like movies, and you'll see over the course of a day the
viewership of movies and television for unemployed
versus employed people, and the answer's obvious.
Kirill Eremenko: Well, that's awesome. Yeah, there's quite a few gems
online and some places to find them. Before the
podcast, I mentioned Nadieh Bremer's
visualcinnamon.com. That's a great source of fantastic
visualizations really well made about professional
topics and just some of her hobbies. Another one I
know is, well obviously the Tableau Public Repository
where you can look at the featured items that are quite
cool. Michael Bostock has a website for D3. I think it's
called blocks dot, bl.ocks.org or something like that.
Yeah, and he has some really [inaudible 00:55:47]
visualizations there. Anything else that comes to mind,
like where people can actually find lots of different
visualizations in one place?
Kristen Sosulski: Oh, I love the FlowingData website by [inaudible
00:55:59], yeah. I'm a big fan of Nathan Yau. He does
a lot of visualization in R, and a lot of it is around
topics that everybody can resonate with. Being
someone coming from business, it's always fun to see
visualizations that I'll never be able to create because
they're much ...
Kirill Eremenko: Yeah.
Kristen Sosulski: Much beautiful and ...
Kirill Eremenko: Yeah. Yeah, FlowingData. FlowingData is a good one
as well, so that's something to check out. We'll include
all of these links in the show notes as well for our
listeners. Okay, cool. That was that question. Let me
see what else we got here. All right, this one. What
fascinates you about data visualizations? What's the
thing that makes you get up in the morning and be so
excited about your job?
Kristen Sosulski: Oh. Oh my gosh, so much. I mean, just that, gosh, it's
such a tool for like just to investigate your data. It's
such a pleasant way to approach a data problem by
coming up with a question and being able to dig down
and explore and struggle and wrangle with it for a
while, and at the end come up with something that
actually can help humans better understand a
phenomenon, I think, is amazing. And being able to have
this medium at our disposal, that's what makes me
wake up every day, besides my family and my son.
Kirill Eremenko: Yeah, yeah. Totally, totally. It's very ... I'm actually
quite glad we're not just machines that just look at
numbers. Sometimes you look at an Excel spreadsheet
with a thousand columns and a million rows. Imagine if
you could just look at it and understand it. How
boring would that be, if you didn't need
visualizations? With visualization there's so much creativity
involved, color, just feeling even. I think it makes
things much brighter and this profession of data
science and analytics much more pleasant, I guess,
and exciting to be in.
Kristen Sosulski: Absolutely. Absolutely. I think that we expect it these
days too. We expect to have a visualization to help and
guide with that interpretation.
Kirill Eremenko: Yeah, yeah, totally. Okay, next question. Interesting
question on technology and the rate of change. We
know that data science is growing exponentially.
Technology is evolving exponentially. What do you
find, in terms of data visualization? How is data
visualization evolving as technology improves?
Kristen Sosulski: Oh my goodness. The tools are just getting so much
better, and we have lots of categories for visualization
tools. You have your basic productivity applications
that we've been using forever, like Excel or something.
But there's also now these other free applications like
Google Charts that allow us to create interactive online
visualizations in a snap when we have a small amount
of data. When we get much more sophisticated in our
practice and working with larger data sets, we have
programs like Tableau, ArcGIS, and robust business
intelligence tools that allow us to create dashboards to
have data that's streaming and dynamic like right at
our fingertips, so that's really, really amazing. For the
coders out there, there's been so much development
on visualization packages in R and Python, JavaScript,
you mentioned D3 before. A lot of new chart types are
really evolving. We're emphasizing the audience a lot
more with having interactive elements as well.
Kirill Eremenko: Yeah, wow. That's a very good overview of all of that.
Are you familiar with the Gartner Magic Quadrant?
Kristen Sosulski: Yes, yes.
Kirill Eremenko: I've been observing it for the past couple years, and it's
been very interesting to see how all these different
players, IBM, Microsoft, Tableau, Qlik, and some
others, how they are all participating and how like
before it was just Tableau. That was the top company
in the space. But now everybody's catching up and all
of their features, all of their missions and ways they
present, they allow users to create visualizations are
becoming more and more, on one hand, sophisticated
in what they can produce, on the other hand easier in
terms of actual usage. It's just been really cool to see
how all of these companies have shifted into that very
lucrative space of the Magic Quadrant. Yeah, it's just a
very exciting space to be in.
Kirill Eremenko: Actually, on that note, I wanted to ask you, do you
think that technology will ever take over visualization
from humans? Will the skill at some point become
obsolete and just machines will be visualizing things
for us, like, automatically?
Kristen Sosulski: That's a really, really interesting question. I mean, I
think there are certainly tons of instances now where
our data is visualized for us dynamically. Someone
had to start and create those archetypes, right, for you
to show your progress on your running app or how
many calories that you lost, or on your dashboard
displays for a stock price, et cetera. Those things are
already happening in a dynamic way. I think that
there's always going to be a need for inquiry and
inquiry driven by humans. Any time we have a
question and we're looking for data to answer that
question, we might have to actually mine that data for
insights and to see if we can find those answers. If we
decide we want to visualize those answers, we'll
probably still have, there'll still be some, I think, labor
involved in that.
Kirill Eremenko: Mm-hmm (affirmative). Well, that's good news, isn't it?
Kristen Sosulski: I guess so. I think so. I think so.
Kirill Eremenko: Don't want machines to take everything. Okay, well
that's fantastic. All right, well we're slowly, slowly
coming to the end, or quickly approaching the end
[inaudible 01:02:45] Can you imagine that it's already
been close to an hour that we've been chatting?
Kristen Sosulski: Oh my God. This has been really fun.
Kirill Eremenko: Yeah, yeah, for sure. What I wanted to also ask you is,
before I let you go, what is ... Oh, no. Even before we
do that, I have an important, exciting announcement
that we discussed. Just before the start of the podcast,
I spoke to Kristen and, first of all, for all of those of
you who are coming to DataScienceGO 2018, which is
at the time when this is released. It's going to happen
in the coming weekend. Kristen might be there and
you might get to meet her in person. But the most
exciting part is Kristen will be joining us as a speaker
at DataScienceGO 2019, next year. Super excited
about that. Kristen, how do you feel?
Kristen Sosulski: Oh my God, it's such an honor. Thank you so much,
Kirill.
Kirill Eremenko: Oh, it's such an honor for us. I forgot. I should have
introduced you as Professor Kristen at the start. You're
a professor at NYU. It's going to be so exciting for us
and our audience to have you there and for you to
share all of your amazing insights with everybody, so
very, very much looking forward to it. It's going to be
fun.
Kristen Sosulski: Likewise, likewise.
Kirill Eremenko: Awesome. Okay, so on that note, what is the best way
for our listeners to contact you? After listening to this
podcast, maybe somebody might want to take one of
your classes at NYU, or maybe engage you for
consulting job, or maybe they just want to follow your
career and see where it goes from here and what kind
of amazing visualizations you're going to create in the
future.
Kristen Sosulski: The easiest way is Twitter. It's just my last name @-S-
O-S-U-L-S-K-I. @sosulski on Twitter is really the best
way to contact me, but you could also feel free to email
me at [email protected].
Kirill Eremenko: Okay, gotcha. Twitter, email. Is it okay for our listeners
to connect on LinkedIn as well?
Kristen Sosulski: Absolutely. Absolutely.
Kirill Eremenko: Awesome. Awesome. Very cool. And, everybody, I'll
remind you once again, the book. Don't forget to pick it
up. It's called Data Visualization Made Simple:
Insights Into Becoming Visual. On that note, thanks so
much, Kristen, for being on the show today. Very, very
exciting, and I can't wait to meet you in person,
whether it's at this DataScienceGO or at the next one.
Kristen Sosulski: Same here. Thank you so much, Kirill. This was a
blast.
Kirill Eremenko: There you have it. That was Professor Kristen Sosulski
all the way from NYU Stern's School of
Business. I hope you enjoyed this podcast as much as
I did and got lots of valuable takeaways. For me
personally, one of the most valuable ones was
something I already use in my career, but it was very
nice to hear it reiterated by a person who
professionally teaches data visualization, and that is
the fact that you need to think of the
formats differently when you present in person versus
when you create an online interactive tool, and when
you create a PDF, a downloadable PDF report. It might
be the same findings, but because they're presented
in different mediums, you need to think of how you'll
present them differently. I'm sure you had your own
takeaways. Jam packed, this episode was jam packed
with lots of interesting knowledge and fun insights.
Kirill Eremenko: Make sure to check out Kristen's book. It's called Data
Visualization Made Simple. And, as you heard from
Kristen herself, she will be joining us for
DataScienceGO 2019 as a speaker. If you enjoyed this
podcast, you're definitely going to enjoy her speaking
there. You'll be able to buy the DSGO 2019 tickets on
presale very soon, so check them out next week. At the
same time, Kristen might be joining us for
DataScienceGO 2018, which is happening this
weekend. I can't wait for this to happen. I'm actually,
I'm recording this while I'm on my way to San Diego,
so I'm already going to be there when you're listening
to this. Can't wait to see you in person if you're coming
to DataScienceGO 2018. If you haven't picked up your
tickets yet, you can get them at
www.datasciencego.com. Make sure to head on over
there. Last chance to get your ticket and have fun with
400 other data scientists who are going to skyrocket
their careers this weekend. Once again, tickets are at
www.datasciencego.com, and I can't wait to personally
meet you this weekend in San Diego. Until then, happy
analyzing.