SDS PODCAST EPISODE 354: DJ PATIL ON HARNESSING THE POWER OF DATA SCIENCE COMMUNITY

Kirill Eremenko: This is episode number 355 with Ex-Chief Data Scientist of the United States, DJ Patil.

Kirill Eremenko: Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. And each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let's make the complex simple.

Kirill Eremenko: Welcome back to the SuperDataScience podcast, everybody. Super excited to have you back here on the show, and I am very inspired today. Why's that? Well, because 15 minutes ago I got off the phone with DJ Patil, and we recorded the episode you're about to hear. It's a super exciting episode with one of the most famous, if not the most well-known, people in the space of data science. So if you haven't heard of DJ Patil, he is the person who co-authored the Harvard Business Review article "Data Scientist: The Sexiest Job of the 21st Century." If you haven't read it, make sure to check it out. That article gave rise to the popularity of data science. He also coined the term data scientist. That originally came from when he was working at LinkedIn: he didn't know what to call himself. He and Jeff Hammerbacher didn't know what to call themselves, and they came up with data scientist. That's why we have data science right now. And he is also the ex-chief data scientist of the United States. How amazing is that?

Kirill Eremenko: And on top of that, in this episode we covered some very interesting and important topics. So here are a couple of examples of what you will hear about: data privacy and ethics, data in healthcare and biotech, DJ's work at the White House and some of his most memorable moments while he was there, his current mission at Devoted Health, what they're doing and how much progress they're making, the future of data science, data science for good versus data science for bad or evil, and data science communities. Those are just a couple of the topics we're going to cover. I'm sure you're going to love this conversation, and by the end of it, you're going to be super-inspired about data science and your career in the field. So without further ado, I bring to you the ex-chief data scientist of the United States, DJ Patil.

Kirill Eremenko: Welcome back to the SuperDataScience podcast, everybody. Super excited to have you back here on board, and today's guest is none other than DJ Patil. DJ, how are you going? Welcome to the show.

DJ Patil: Thanks for having me.

Kirill Eremenko: Super excited to have you here. Everybody's heard about you as the person who started this whole data science movement. It's a huge honor to have you, so probably the first question would be: how does it feel to be the person who created data science as we know it right now?

DJ Patil: Well, the way I really think about data science and the movement is that this is a community. A lot of times when people look to a particular person and say that person started it, or that person really is the most seminal person behind it, I think a better way to think about it is that what I got to do was help steer the community toward a set of problems. But the thing that is probably more interesting than anything else is that this community has been going on a long time. If we go back far enough, you get the Mayans and the Indian and Chinese astrologers and astronomers. You move to Kepler and Copernicus doing really amazing things with data and very difficult calculations. You get to people like George Washington, who was doing cartography and could arguably be called the first real chief data scientist of the United States.

DJ Patil: And we've had a movement, and that movement has really manifested in the next wave of people who have unbridled enthusiasm to use data and incredible technical skills. We have computational power, the likes of which we've never had, that is easily accessible through the cloud; we have storage; and we have the ability to collaborate even when we are miles apart, just like we are right now. And so that has manifested in ways that let us apply technology with approaches we had not thought of before. And so I really think of it as having had the opportunity to be more of a community organizer than anyone saying, "This is how data science should be."

DJ Patil: I think what we can say, if anything, is that there are certain things that data science, I would hope, isn't, and we can talk about some of those and about what I think some of the challenges are, and the impact, if we do data science the wrong way. But I also think that we should not get to a place where we are so regimented that we say data science is this one narrow thing. We should really think about data science as a team sport where we all have different roles to play in using data to make really fascinating, good things happen.

Kirill Eremenko: Yeah. I love it. I was reading an interview you gave to The Observer, and you mentioned there that you are generally opposed to trying to define data science too rigorously. But it would be interesting to hear your thoughts on what you just mentioned: what data science is not. What are your comments there?

DJ Patil: Yeah, so I think the first thing, when we ask what data science is not, is the question, in its most extreme form, of when we should not be using data, or using data in ways that could cause harm. And there are a number of ways to look at this, and this is why people like myself, Hilary Mason, and Mike Loukides have been so active. We actually published a book on data science and ethics. It's a small ebook. We made it free for everybody because we want everyone to take away ideas about how we actually start having this conversation about the ethical use of data. When we think about it historically and we ask where some of the most egregious human atrocities have taken place... Take the Nazis. One of the most egregious cases is the phone book. The phone book is a database.

DJ Patil: And so as we head into this next wave of technology being able to do things, what does it look like where we might possibly do harm to people? And it's very easy for us to say, "Oh, no, that's not going to happen again." But remember, we've also had a history in biomedical research, particularly in the United States as well as the Western World, where we've had issues like Henrietta Lacks and the Tuskegee syphilis experiments, where there were breaches of the way we do things in an ethical manner. Right now we're faced with the question of how we use technology and ensure that it is implemented with the values we would like. There are conversations right now about people using data scraped from social media websites like Instagram and Facebook to create the basis for facial recognition technology for police departments and maybe parts of the government. Is that acceptable? Should we allow people to do that?

DJ Patil: When we think about voting and using data to disenfranchise voters, that, in my mind, is a bad problem for what data scientists should be. We haven't figured out how to self-police. Other communities have figured out how to self-police. If somebody does genomic research that isn't considered acceptable, the community knows how to address that situation, and then there are legal ramifications on top of that. We have to get to the place, as a community, of asking ourselves what is acceptable. The specific way that was implemented for the US chief data scientist is in the mission statement of that role, which is to responsibly unleash the power of data to benefit all Americans.

DJ Patil: And I think data scientists should take this on as their own statement: how do we responsibly unleash the power of data to benefit everyone? Just because we can doesn't mean we should. That's part of the responsibility. I think we should extend that to everybody, to make sure we are using data to empower every single person.

Kirill Eremenko: That's a very interesting and valid point of view, and here I'd like to refer to what I mentioned before the podcast. I was listening to your open hearing on developing and deploying next-generation technologies, which I think was before Congress? I don't know enough about politics to understand the dynamics there. One thing I noticed was that a lot of the questions were around China, around how the US is competing with China for dominance in the space of artificial intelligence and other exponential technologies.

Kirill Eremenko: While these ethical considerations are extremely important, even crucial, one of the issues that I can see, and that I also heard in this open hearing, is that they are usually limited to a single jurisdiction or a certain set of countries, like the Western World, America, Europe, or China. So what are your comments on the idea that imposing these ethical and other restrictions on the development of data science, while absolutely important, could inevitably slow down or inhibit the rate of progress in the US or the Western World compared to China, where they have their own ethical considerations, which might be very different, and where they could get much further ahead? What kind of consequences could that carry?

DJ Patil: Great question. One of the reasons I think everyone is fixated on China is largely how aggressively they are investing. And it gets to a place where we can easily point the finger and say, "China is doing all this stuff, and so we should slow them down." I think the better way to look at it is: why aren't we investing as aggressively in our own societies to keep up our pace and our competitive edge? Year after year, we have cut the funding we give to our basic sciences. We continue to have questions, even right now, around funding for the Centers for Disease Control, the CDC. This current administration wants to cut the funding to those groups, and yet we're seeing the ramifications when we don't fund research and these groups. And that's not just the US; that's the Western World. China is increasing its funding. We are entering a period where, within the next 30 years, we will no longer have the singular dominance that we've seen.

DJ Patil: As that develops, one of the inherent questions is about values and what things look like with western values. Part of the reason western values are important is that they are about democratic process. But when we think about science and areas like cloning humans, we have a framework that has been developed through a lot of hardship. Much of that came out of Europe, through the Nuremberg trials, which led to the Nuremberg Code, which in turn shaped bioethics after WWII. And we've realized that certain things, like experimenting on humans, not only have negative repercussions for society and take away human dignity, but are a road down which you get into all sorts of thorny issues that we have realized are just not acceptable when people haven't given consent.

DJ Patil: In China, and it doesn't have to be China, it can be other countries with totalitarian regimes, you run into the same issues. So when we think about the power that is about to be unleashed through technology and data, we have to ensure that that technology works for us rather than against us. And when we look at some of the technology deployments where groups are being persecuted through the use of technology, facial recognition or other things, that's a problem, and we have to figure out how, as a society, we are going to make sure that the technology, and the focus of how we implement it, is really on the side of democratic values.

Kirill Eremenko: Mm-hmm (affirmative) Gotcha.

Kirill Eremenko: Hey everybody, I hope you are enjoying this amazing episode with DJ Patil. This is a quick announcement, and we'll get right back to it. We are hiring at SuperDataScience. With the recent pandemic and the coronavirus, we all know that a lot of people have lost their jobs and their source of income, so hopefully this will be a breath of fresh air for some people out there. We are a 100% remote team, we all work online, we continue to grow, and I've literally just published 10 new positions at SuperDataScience, which might be suitable for you.

Kirill Eremenko: And even if they are not suitable for you, check them out at superdatascience.com/careers and send them to somebody you know who may have been displaced by this pandemic and all the lockdowns, who may have lost their job and source of income. You could change their life. We are creating opportunities for people to do their best work, to contribute, to create amazing products, and to create amazing experiences for people studying data science.

Kirill Eremenko: So here are some of the positions that have just been released: VP of Marketing, Product Designer, General Manager, VP of Sales, Junior Media Creator, Sales Representative, B2B Events Sales Representative, Event Marketer, B2B Sales Representative, and Marketing Strategist. And those are just some of the initial positions that we have available right now. More will come soon, so keep an eye out at superdatascience.com/careers. Maybe we'll even post a data scientist position in the near future.

Kirill Eremenko: But even if none of these are relevant to you specifically, if you know somebody who's in marketing or in sales, who's a great general manager, who's great at creating amazing products in education and learning experiences, who's great at running events, or who is amazing at creating animated videos, if you know any people with the right talents and skills, please send them this link: superdatascience.com/careers. This could change their life or career, especially in these difficult times. Thank you very much for your help, and let's get back to the episode with DJ Patil.

Kirill Eremenko: One of your co-panelists on this open hearing, Mr. Chris Darby from In-Q-Tel, had an interesting comment. He said, "All roads lead to two places..." in technology, I'm assuming, "...microelectronics and biotechnology." And data science is at the core of all technologies right now, from my perspective, because it's all data, right? Then he quoted a scientist from China. According to him, the quote was, "The Europeans won the industrial revolution, the Americans won the IT revolution, and in China, we're going to win the bio-revolution." What are your thoughts on that, and how can America and the Western World compete with China in the space of the bio-revolution?

DJ Patil: So I think it's very easy to just cast China as the bad guy in this kind of situation. It's more useful, I think, to ask who we are really competing against. To me, we're competing against cancer. We're competing against the pandemic that is already here. Far too many people are going to be killed by this disease because we weren't able to use data efficiently to know where it is, to test appropriately, and to develop strategies to get ahead of the typical infection curve, which is the exponential rate of infections. So when I look at that, I look at what's holding back a cure. Well, one, we have the best data sets right now in the United States and across Europe, because we have not only genetic diversity but also great electronic medical records. The problem is the data is fragmented over thousands of databases, and there's no way to easily pull that data together.

DJ Patil: Earlier this week, the administration issued new rules to make sure that the data remains the patient's data and you can take your data and move it, including to researchers. The reason that's so powerful is that if we're able to bring that data together and have fantastic data scientists working on it, maybe there are cures already out there that we just haven't realized are cures. And when we partner with epidemiologists, researchers, and the traditional drug discovery units, maybe we'll find something that could be used off-label: it's already used for one thing, but if we use it for something else, it's going to have a fantastic impact. Maybe it's going to help us identify new forms of disease vectors that we hadn't thought about, and then when we look at them we'll go, "Oh, wow. How amazing is it that we now have this targeted population that, if we find a cure for them, we're going to give disproportionate value added for life."

DJ Patil: We look at something like ALS, Lou Gehrig's disease. We look at Alzheimer's. We look at all these things. These diseases don't care what race you are. They don't care where you live. These are problems of a species. What I look at as a country, and this is why it was so important when President Obama launched the Precision Medicine Initiative and put Joe Biden in charge of the Cancer Moonshot, is that we have to put data together with all sorts of other things, microelectronics, biotech, new sensor designs, all of these together, to find new ways to think about these diseases. We cannot be thinking about them in the ways of the previous few decades. Central to that thesis are going to be the data scientists. The data scientists are going to be the ones who unlock this. Whether you call them a data scientist or an epidemiologist, the person who is looking at data right now is going to be key to helping us get ahead of this pandemic that is here now, called COVID-19.

Kirill Eremenko: Yeah, that's definitely a big problem. I saw recently that Johns Hopkins University released data about COVID-19 to the public that you can go and analyze. As you say, maybe somebody will come up with a solution along the way.

DJ Patil: Well, this is why transparency of data is so critical. Right now, we don't have great transparency of data between countries. China has been far too slow in releasing the data. That was true during SARS. We also saw during MERS that there wasn't enough data sharing. And during the Ebola outbreak, one of the most powerful tools used to help get ahead of it was Google Docs, because people had been sharing their data as spreadsheets, and you didn't know when a spreadsheet was last updated or by whom. So having it in real time, with somebody filling in the data they saw in their town and updating it daily, gave everyone a clear indicator of where the disease was moving and propagating, and allowed us to get infrastructure in place so that you could start helping people.

DJ Patil: That transparency is not happening fast enough right now in the United States. For example, what is the total number of tests? How many have been administered? How many are positive? If there were very aggressive data sharing across the federal system, the states, the cities, and the towns, we'd have a much more realistic picture. And then we could start developing strategies very quickly. We could learn from the Chinese, because they dealt with this first. We could learn from the Italians. And then we could share with countries that are going to be impacted and that don't have the quality of healthcare system that we do, so the number of deaths in those societies is going to be substantially higher. We could save a lot more lives if we had people doing something very simple: just data sharing.

DJ Patil: One of the things I have found to be really important in my experience is that we often look to the AI solution right away. A lot of times, we could just go with the tiny, bare-bones approach, just share some data, and you'll find a huge amount of lift on the problem. That's not to say we shouldn't do the AI solution. I'm not saying that at all. I'm saying let's focus on some of the basics first. Can I give you a concrete example?

Kirill Eremenko: Sure. Sure.

DJ Patil: In Miami-Dade, Florida, they realized that, like many places in the United States, they had a problem of too many people in jail. And one of the root causes of that is mental health issues. People with mental health issues get taken to jail rather than getting to the treatment centers that they should. Same with drug addiction. So if you see a person who's constantly getting picked up for mental health issues, why do we keep taking them to jail? That's kind of crazy.

Kirill Eremenko: Mm-hmm (affirmative)

DJ Patil: So, instead, they decided to share data between their public health system and their criminal justice system, but in a super-secure way that respects privacy. The data can only flow from criminal justice to the health system, not the other way around. When somebody gets picked up, they check with the public health system, and if the health system knows that person, they don't take them to jail. It cost something like a million to a million and a half dollars to get this going. In the first year alone, it saved 10 million dollars.

Kirill Eremenko: Wow.

DJ Patil: But the real value is that it closed an entire jail. Then a little later on, they closed a second jail. All that was done there was sharing data. It's a spreadsheet. It's literally a spreadsheet, now with a lot of safeguards in place, but a spreadsheet.

Kirill Eremenko: Very interesting. Wow. Yeah, that really shows the importance of having this role in the government. It was very exciting to hear when you got the chief data scientist position, which was created for the first time by Obama, making you the first chief data scientist of the US. I think it's very important. Is there a chief data scientist in the US at the moment?

DJ Patil: They are looking for one, is what they tell me. So maybe somebody here will apply. As much as I might be harsh on this administration, there have also been a number of really good things this administration has done around data. For example, President Trump recently signed an executive order that basically asked for a reevaluation of how we look at organ donations. The fact, right now, is that too many people in this country go without an organ when they could easily receive a kidney or a liver or a heart or something that would give them an incredible number of days left in their lives. They would be able to take that.

DJ Patil: But the reason that happens is there aren't any quality measures that actually assess whether people are doing a good job of making sure those organs get to the right person. And so, as a result, many times the groups that have the responsibility to do this let the organs expire. They're left in the body for too long. They're not picked up in time. They're mishandled. And so a person who's waiting on the operating table to receive their kidney doesn't get it. And that's just a tragedy when it could be so easy to do.

DJ Patil: We're not talking, again, about any sophisticated AI. We're talking about just measuring something and having a dashboard that allows us to ask ourselves whether we are doing a good job or not, and to continuously improve it.

Kirill Eremenko: Gotcha. No. Totally agree. Totally agree. In the interest of time, let's proceed to our little experiment that we did on LinkedIn, asking people for questions. So, as you saw, people posted dozens of questions for you. Very excited to hear from you. Maybe let's take a few. What is your favorite question out of the ones that you saw on LinkedIn?

DJ Patil: Oh, boy. I can't pull it up here simultaneously.

Kirill Eremenko: Oh, okay. No worries. One of my favorite questions was from Akshey, who asked, "What do you think makes a good data scientist, and how do you approach any data science problem?"

DJ Patil: The thing that I have found, Akshey, time and time again, is that the best data scientists have curiosity. They're the people who just have this ability to go, "What about this? What about this? What about this?" And the question I used to literally give back in the days of LinkedIn was, "Pretend you had all of LinkedIn's data. What would you be interested in knowing? What would be the first thing you would want to know?" And you'd be surprised how many people would just stare at me blankly. The best data scientists would just start, and they'd have idea after idea after idea. And they would just keep going until I was like, "Okay. Okay. We're good." The best ones, oftentimes, would be like, "Have you thought about this or this?" and I'd be like, "Oh my gosh, no. I haven't." Or they'd say, "What if you combined data from LinkedIn with this other data set? Have you thought about that? And what about this? Have you tried this? Or could we turn this into a product that would have value this way?"

DJ Patil: That curiosity plus passion is something that you develop especially at the intersection of multidisciplinary sciences. Myself, I was working in non-linear dynamics. I was doing math but was also working with a tremendous amount of weather data. And so you kind of have to sit at these intersections, and you're just trying to find data sets. You're trying to figure things out. What I tell a lot of data scientists is that you need to play with a lot of data sets just to develop intuition, to develop curiosity. Be very fast at plotting something, trying something, getting a sense of what's going on in the data. For me, sometimes when I get a data set, the first thing I love to do is just kind of tab through the data and get a sense of it. It's like when you use Unix or Linux and you're using the more command and you're just seeing what's in this file. Are there characters? Are there just numbers? Are the numbers decimals? You just let it blur and get a sense of what's in there. It just starts to expose itself.

DJ Patil: And then I'm trying to find lots of ways to just visualize it. Visualization, for me, oftentimes, is just histograms to get a sense of what's in this, and then trying to go, "What if? What about this? What about that?" The more you can develop that, the better, I think, you're going to be at being really fast at helping find solutions for another person.
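DJ's habit of paging through a raw file, checking whether it holds characters, whole numbers, or decimals, and then reaching for histograms can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not his actual workflow or tooling: the SAMPLE weather records and the helper names peek, classify, column_types, and text_histogram are hypothetical, and the histogram uses simple equal-width bins drawn as text bars so the example needs only the standard library.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real file.
SAMPLE = """station,date,temp_c
A,2020-03-01,12.5
A,2020-03-02,13.1
B,2020-03-01,9.8
B,2020-03-02,
C,2020-03-01,15.0
"""


def peek(text, n=5):
    """Return the first n raw lines, roughly what paging with `more` shows."""
    return text.splitlines()[:n]


def classify(token):
    """Label a raw token as 'empty', 'int', 'decimal', or 'text'."""
    token = token.strip()
    if not token:
        return "empty"
    try:
        int(token)
        return "int"
    except ValueError:
        pass
    try:
        float(token)
        return "decimal"
    except ValueError:
        return "text"


def column_types(text):
    """Count token kinds per CSV column: are there characters, numbers, decimals?"""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    return {
        name: Counter(classify(row[i]) for row in body if i < len(row))
        for i, name in enumerate(header)
    }


def text_histogram(values, bins=5, width=30):
    """Bucket numbers into equal-width bins and draw each bin as a row of '#'."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    peak = max(counts)
    return [
        f"{lo + i * step:8.2f} | {'#' * round(width * c / peak)} {c}"
        for i, c in enumerate(counts)
    ]


if __name__ == "__main__":
    for line in peek(SAMPLE, 3):
        print(line)
    for name, kinds in column_types(SAMPLE).items():
        print(name, dict(kinds))
    temps = [float(r["temp_c"]) for r in csv.DictReader(io.StringIO(SAMPLE)) if r["temp_c"]]
    for line in text_histogram(temps, bins=3):
        print(line)
```

Running it prints the first few raw lines, a count of token kinds per column (here the temp_c column holds four decimals and one empty value), and a three-bin text histogram of the temperatures. With real data you would read the file instead of SAMPLE and would likely swap the text bars for a plotting library.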

Kirill Eremenko: Gotcha. Curiosity. Wonderful answer. I love it. Suman asks, "What are the new challenges that data science is heading towards? What is your vision for data science in the next five years?"

DJ Patil: Wow, Suman, great question. So the first is I think

there are so many areas I'm so excited about data

science impacting. I think data scientists are one of

Page 19: SDS PODCAST EPISODE 354: DJ PATIL ON ......HARNESSING THE POWER OF DATA SCIENCE COMMUNITY Kirill Eremenko: This is episode number 355 with Ex-Chief Data Scientist of the United States,

the new form of first responders. You know, when

there's an earthquake in a remote area of the world,

before people can even get in to help, first responders

now have the ability to look at satellite imagery, drone

footage, being able to tell which roads are washed out

or bridges have been wiped out. If it's a hurricane we

could use drones plus just a little bit of computer

vision to actually tell which houses people are on.

Could we then route boats to quickly get to all those

people, just like Uber or Lyft or UPS use

routing algorithms?
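The boat-routing idea can be illustrated with Dijkstra's shortest-path algorithm, one standard routing technique. DJ does not name a specific algorithm, and this is not a claim about what Uber, Lyft, or UPS actually run. A minimal sketch, assuming the waterways form a small weighted graph of travel minutes and that washed-out links have already been flagged (for example, from drone imagery); the graph, node names, and `shortest_route` helper are hypothetical.

```python
import heapq
import math


def shortest_route(graph, start, goal, blocked=frozenset()):
    """Dijkstra's algorithm over a dict-of-dicts graph, skipping blocked links.

    graph:   {node: {neighbor: travel_minutes}} with non-negative weights
    blocked: (u, v) pairs treated as impassable in both directions
    Returns (total_minutes, path), or (math.inf, []) if goal is unreachable.
    """
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            path = [node]
            while node in prev:  # walk back along predecessors
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        if d > dist.get(node, math.inf):
            continue  # stale entry: a shorter route was already found
        for nbr, cost in graph.get(node, {}).items():
            if (node, nbr) in blocked or (nbr, node) in blocked:
                continue  # e.g. a washed-out road or bridge
            nd = d + cost
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return math.inf, []


if __name__ == "__main__":
    # Hypothetical network: minutes by boat between a dock, two junctions, a house.
    waterways = {
        "dock": {"A": 4, "B": 2},
        "A": {"dock": 4, "house": 3},
        "B": {"dock": 2, "A": 1, "house": 7},
        "house": {"A": 3, "B": 7},
    }
    print(shortest_route(waterways, "dock", "house"))
    print(shortest_route(waterways, "dock", "house", blocked={("B", "A")}))
```

On this toy network the first call returns (6, ['dock', 'B', 'A', 'house']); once the link between B and A is marked impassable, the route falls back to (7, ['dock', 'A', 'house']). Reporting a blocked link changes the answer without rebuilding the graph, which is the property a disaster-response setting needs.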

DJ Patil: In terms of the biological fields, there's trying to understand

how disease manifests, using large data sets to find

that basis, like the Precision Medicine Initiative. I think

about the world of understanding new chemicals and

particularly about materials science and using data

science to help understand how to get better

manufacturing. That's a fantastic area.

DJ Patil: I look at the world of how do we create tailored

education and help people learn faster? Myself, I was

such a bad student. I think tailored education

would've really helped someone like me. I could go on

and on. If there's one thing that I'm most

excited about for the data science field over the next

five years, it's that this is central to the success of every

institution and every organization, from nonprofit to

for-profit to the government. Everybody will have to have

some notion of data. And everybody that's being

trained in undergraduate curriculum will have some

element of data literacy.

Kirill Eremenko: Mm-hmm (affirmative). Absolutely. Absolutely. Like

Andrew Ng says, "AI is the new electricity." Right? I

can't even think of a single business that doesn't use

electricity right now, whereas 100 years ago I think the

residential electrification of the US was around 50%.

So it's massive.

Kirill Eremenko: Okay, next one is a fun one from Abhishek. "Which

was your most memorable work memory when you

were at the White House?"

DJ Patil: Oh, boy. What's my most memorable? The White

House is phenomenal in the way that there are

moments where things are incredibly, astonishingly

positive and astonishingly sad. That's just the

reflection of how complex the world is.

Kirill Eremenko: For example, what do you mean?

DJ Patil: On the positive side, I remember so many. One

that always stands out in my mind was the day the

President was flying back from being in the South

where he was doing a memorial for a number of people

who were shot in a church. And he was flying back but

that was the same day the Supreme Court ruled that

anybody can get married to whoever they like because

love is love. We put the colors on the White House as a

rainbow. And I remember the President's helicopter

coming in from such a tragedy, circling around, and

we were thinking about the juxtaposition of such

vicious hatred at one moment that the President is

having to console people over and the next moment

having these amazing crowds there to celebrate such a

phenomenal activity.

DJ Patil: So many times meeting with people who have rare

diseases and are looking for hope, and realizing that

they cannot wait. They can't wait for bureaucracy to

figure out how this is going to work. They need the data in

the hands of people who are going to figure out how to find

a cure for something that their loved one has or

they have. Time is so essential. What data science is,

is it is an accelerant to solutions. If we're not careful, it

is an accelerant to entropy. It can cause incredible

harm. But when used and wielded correctly, it is an

accelerant to help to deliver solutions very effectively.

Kirill Eremenko: Wonderful. Thank you. Siddharth asks a question.

Something we touched on already, in this podcast, I

think. Maybe we could elaborate. Quite a long-winded

question but I think it's an important one. "Data

science seems to enforce centralized power rather than

decentralized power in multiple contexts. The best

consumer companies driven by data science are

monopolistic, like Facebook and Amazon. The best

enterprise data science companies, like Palantir

and Databricks, primarily serve the largest

companies in the world as their customers. Data

seems to help the Chinese

surveillance state much more than it helps democratize and

improve the way we vote. How can we use data science

as an equalizing force for society rather than a

centralizing force? Is that even possible?"

DJ Patil: Yeah, so one of the most important things that just

happened this last week is the belief that a patient's

data is theirs. It doesn't belong to the hospital. It

doesn't belong to the doctor. It belongs to the actual

human. For quite some time, the hospitals and

physicians have believed that they own the data. They

should not. Now, it is codified that it's your data and

you have access to it. If you want to move it? By all

means, you should be able to get access to it and you

should be able to take it to where you want. You want

to donate it? Great. Good for you. Donate it. It's giving

you control. That's part one.

DJ Patil: Part two is that we have to ensure that there's

transparency of data. You have to be able to access it.

We still don't have enough reporting requirements for

people to know what data is being collected? Who's got

my data? Who sold my data? We're starting to see

elements of that in different policies, some of which are

in Europe under what's called GDPR and in California

under the California Consumer Privacy Act, CCPA. But

we need more of that. Right now there are many data

brokers who can suck up data and use it without you

knowing it. Some of those data sets have real

implications for the population. For example, data sets

that are collected and used in lending have been shown to

actually impact the Black population negatively. How

do we ensure that safeguard? We need that form of

watchdog, somebody who's actually looking over

people's shoulders to make sure that people

are using data in an acceptable way for society.

DJ Patil: The other part here is how do we train data scientists?

As we go forward and we think about the companies

and we think about who is there, what's fascinating is

we always talk about data interviews but we never

actually talk about giving people an ethics interview

around data. So one of the things is that anybody who

interviews with me will go through an ethics

interview with me because I view ethics as part of

asking a question around cultural fit. If we can't see

eye to eye on how we think about the importance of

ethical issues, then how do we deal with it?

DJ Patil: I'll give everyone an example of one because it's not

hard. Suppose we're working on a problem and we

know we're not supposed to use race but we find a

proxy for race. We also find that if we use this proxy

for race, we're going to help a lot of people. What's

your next step?

Kirill Eremenko: Oh. That's a tough... What answers do you normally

get to that?

DJ Patil: Well, I think the real answer that's interesting is, first,

as an organization, what safeguards do you have to

make sure I have the resources to be able to address

this problem correctly? Is the organization prepared

because what if I don't know the answer? How do we

adjudicate this? Who do I ask? Do we culturally have

this? Everybody that is interviewing at a company

should ask their company how do you handle ethical

issues around data and technology? If everyone asked

that question when they interviewed at Facebook or

Google or any of the other companies that were called

out, you would start to see a material change in their

approach.

Kirill Eremenko: Yeah, wow. It's so tempting not to ask, right? You just

want the job. You just want the high salary. You have

to put the global best interest, the greater good, ahead

of yourself in order to ask that question.

DJ Patil: This is a thing that we have to grapple with as a

community. We want the salaries. We want the power.

We want the prestige. Where is responsibility in that

conversation? To be empowered by society to do things

with data and technology means that we have to lead

the way, also, on responsibility. We should be leading

from the front. We shouldn't have to have civic groups

push us and say, "Have you thought about this or

this? What about these issues?" We shouldn't have

regulators saying, "Hey, how are you doing this?" We

should be going to them and saying, "Hey, we have the

following concerns. We're not sure we have all the right

answers. What should the answers be? Can we work

together to figure it out?"

DJ Patil: We need to push society to understand the

implications of what we are developing, the positive,

the negative. Otherwise, if we do not, what will happen

is that data sets will be harder to access. There will be

more restrictions on it. Progress will slow. That also

means that people won't have as many jobs. But more

important than all of that is that somebody who

needs a cure, somebody who needs help in a disaster,

somebody who is relying on a technological

breakthrough to happen to improve their quality of life

or a loved one's life, will not get the solution in the

time they need.

Kirill Eremenko: Gotcha. Well, I can feel how you're passionate about

this. Now it makes sense to me why, from

working at the White House and doing public service,

you moved into the healthcare space and into doing data

there.

DJ Patil: Yeah, well, the reason I moved into healthcare is that a big

part of my portfolio that President Obama had set up

intentionally was healthcare. And I think rightly so

because he realized that people who are typically in

technology don't work on national security problems or

something else. We don't often gravitate to healthcare.

Or that people have been working in healthcare for a

while but they haven't had access to some of the newer

techniques that we really pioneered in the consumer

and enterprise companies. So what happens if we get

people together to do that? That genesis, and looking at

that, left us with a question that we had a chance to ask

when we left the administration: well, what are

we going to spend our time doing?

DJ Patil: Well, if you look at that, one of the greatest challenges

that we have is how do we ensure that people have

access to the care they need, they want, they deserve.

And so we said the only way that this is going to

happen is if we actually show the way forward in what

we believe is true. And so we said we were going to do

this, and the only way to actually make it work, in

our model, is through a corporate enterprise. And so

we started Devoted Health and the mission is to build

a healthcare system that takes care of everyone like

their own family.

DJ Patil: Literally, we have something that we call the prime

directive which is if you're not sure what decision to

make, close your eyes, visualize literally in front of you

the person that you think of the most, your loved one.

What's the decision that you would want to make for

them? And when you have that, run it by other people

to make sure it's legal, it's safe, it doesn't have

downsides. Then take the action. In healthcare, time is

of the essence and so we have to build those solutions.

We have to build those technologies.

DJ Patil: And parts of it are already proving out. We find every day

somebody who is in a situation where our job is to

figure out how to unstick something in the healthcare

system for them. And it's not rocket science. A lot of

times it's just finding out something very obvious and

trying to figure out how do they actually get an answer

from somebody? Why do they have a drug interaction?

Why have they been prescribed drugs that are going to

cause some kind of interaction? Has anybody looked?

Has anybody double checked with them? Those simple

things.

Kirill Eremenko: Wow. Sounds like you're making massive progress

with Devoted Health and I wish that it goes really well

and we all see results, especially-

DJ Patil: We hope so. It's not a winner-take-all market. We're

excited that more people are coming to work on these

problems. We need more people in this country to

work on these things. If we have more people working

on these problems together, the "we" wins. It's what is

behind when we say we, the people. We, the people,

isn't just a whole bunch of individuals. It's we as

collective people, as citizens, as community, as

companies, as nonprofits, as religious groups. When

we all come together against a problem and we decide

people should have not only access to healthcare, they

should have access to good quality healthcare. And it

should be affordable. Then we're going to see the

change happen.

Kirill Eremenko: Yeah. Gotcha. Amazing to hear this trajectory and the

progress that's being made. I know you have to go, DJ.

DJ Patil: Can I give one more thing?

Kirill Eremenko: Of course.

DJ Patil: Yep. What I would tell people to think about a lot of

times when we're thinking about data science and

we're thinking about the problems that we pick. As

data scientists, we get to pick our problems that we

want to work on these days. Ask yourself, what is

going to move the needle the most for your children

and your children's children? Because we're in that

inflection point as a society that if we pick the

problems that move the needle for our children and

our children's children, we will select a set of problems

that will deliver outsized value for decades to come.

DJ Patil: When that impact manifests and we look back at our

careers and we look back at what we've done and how

many people we've helped along the way, then we can

rest easy. If we look back and we only say, "Gee, that

only benefited me," what good is that at the end of the

day? It doesn't matter if you wrote the fastest

algorithm in the world, you're traveling alone. And

that's a sad lonely place you could be and it's a wasted

set of skills, in my opinion, because everybody that is

working in the data science field has such phenomenal

opportunity to have an impact now. And society

cannot wait for the impact that every one of you can

provide.

Kirill Eremenko: So much leverage. Data science provides so much

leverage.

DJ Patil: It's leverage and that's why we have to do it as a team.

It is a team sport and all of us have to be on that team

together collaboratively to make this happen. This is

why the community that you're putting together is so

important. Without that community, where are we

supposed to talk about these hard things? Where are

we supposed to have dialogue? Where are we supposed

to push each other? Where are we supposed to learn

from each other? We have to create those

communities. And it's not just one community, it's

going to be different kinds for different types. Who

knows where it's going to evolve? But without us as a

community, we're going to be struggling to actually be

on the right side of this equation over the long arc of

history.

Kirill Eremenko: Gotcha. Wow. Thank you very much, DJ. I think we

can wrap on that. I know you have to go but that was

very inspiring. I feel so inspired just listening to you

right now.

DJ Patil: Yeah. Well, thank you for everything you're doing for

the community. It's very much appreciated.

Kirill Eremenko: Thank you very much.

Kirill Eremenko: So there you have it, everybody. Thank you so much

for being here today and being part of the

SuperDataScience community. As you heard from DJ

Patil himself, communities in data science are ultra-important

because where are we going to discuss these

critical issues: ethics, privacy, the future of technology,

issues that are on everybody's mind and that are dictating

where this field and the world are going? Because

data is underpinning all technologies that are

revolutionizing the world and data science is the way

to deal with data. And on that, I hope you enjoyed this

episode. My personal favorite part was when DJ was

talking about the importance of doing data science not

just for yourself. Being in the field not just for the

purpose of benefitting yourself, but instead, thinking

about others. How are you impacting the world, the

communities around you, people around you?

Because, as data scientists, we have so much leverage

to create impact, in DJ's words, "it would be such a

waste of our skills to just think about ourselves and

not think about others." I found that very inspiring. I

hope you did, too.

Kirill Eremenko: And if you enjoyed this episode, I highly encourage you

to follow DJ on LinkedIn where he has over 700,000

followers as well as other social media. We're going to

include all of the relevant links in the show notes, as

always, and you can find them at

superdatascience.com/355. That's

superdatascience.com/355. And one thing I would like

to ask of you, if you did enjoy this episode, please

share it with your friends and colleagues. Let's spread

the word about data science and what missions we

have as data scientists across the community. If you

know a data scientist, if you know a data science

manager, data science leader, data science

practitioner, somebody who is getting into the field of

data science, send them this episode. It's very easy to

share: just send them the link

superdatascience.com/355. And on that note, my

friends, I really appreciate you being here today. Can't

wait to see you back here next time. And until then,

happy analyzing.