SDS PODCAST EPISODE 43 WITH DEBLINA BHATTACHARJEE

Kirill: This is episode number 43 with AI Researcher Deblina

Bhattacharjee.

(background music plays)

Welcome to the SuperDataScience podcast. My name is Kirill

Eremenko, data science coach and lifestyle entrepreneur.

And each week we bring you inspiring people and ideas to

help you build your successful career in data science.

Thanks for being here today and now let’s make the complex

simple.

(background music plays)

Welcome, welcome, welcome to the SuperDataScience

podcast. Hope you're having a great week, and today we've

got a very interesting guest, Deblina Bhattacharjee. She is

calling in from Seoul, South Korea, and she is an AI

researcher working at one of the universities there, or doing

her degree at one of the universities there. And a very, very

interesting conversation that we had. It's all about AI, all

about artificial intelligence, the different types of algorithms,

different types of tools, different types of problems. So in this

podcast, you will learn what an optimization problem is and

the different approaches to the optimization problem. You

will also learn about the important tools for a data scientist

to learn now to prepare for the future of the field of data

science and artificial intelligence. You'll also learn about the

important techniques which are going to be valued in the

near future.

And of course, Deblina will tell us about the research project

that she is doing. Very interesting, it's about artificial

intelligence, but it's different to neural networks. It's a

different approach. It's not inspired by the human brain, it's

inspired by something else. And what exactly, you'll find out

inside this podcast.

And of course, we'll talk about many, many other things. So

we'll talk about Hadoop, we'll talk about Strata, Scala,

Spark, all the different tools, all the different applications,

and you will even see how Deblina's algorithm can be and is

used in health to actually help people have better and

healthier lives and even sometimes save people's lives.

So there we go. That's what this podcast is all about. And

can't wait for you to check out all the interesting and

insightful and even cool concepts that we're going to be

discussing. And without further ado, I bring to you Deblina

Bhattacharjee.

(background music plays)

Welcome everybody to the SuperDataScience podcast. Today

I've got a very interesting and exciting guest with us, Deblina

Bhattacharjee. Deblina, how are you going, and where are

you calling from today?

Deblina: Hello Kirill. Thanks for inviting me to be a part of this

podcast today. I'm doing great and I'm calling from Seoul

right now, in South Korea.

Kirill: Yeah, in South Korea. Wow, we've never had anybody from

South Korea on this podcast. What brings you to South

Korea?

Deblina: What happened was I was working on this automated

intelligence project for healthcare during my Bachelors. So I

just sent out the project proposal to a couple of universities

for pursuing my higher education. So at that time, one of the

universities -- I got offers, but then there was this particular

university which was exactly working on what I wanted to

work on in the future. And also, the kind of opportunities

that Seoul was giving me was really enticing. So I chose

Seoul and ended up as an AI Researcher over here.

Kirill: Ok, that's very cool. And we'll get to that part in a second.

But just out of curiosity, do you speak Korean? How do you

get by in Seoul?

Deblina: Yeah, you require Korean.

Kirill: Yeah?

Deblina: Yeah, you do require. But then my Korean is really bad. I am

just getting my grip. It has been a year since I started

learning Korean, so yeah.

Kirill: Do you know like kamsahamneeda?

Deblina: Oh yeah, kamsahamneeda, that's like thank you.

Kirill: Kamsahamneeda! I also know how to count, I think. [Counts

in Korean.]

Deblina: Oh yeah, exactly! Wow!

Kirill: That's pretty much all. How long did it take you to learn

Korean?

Deblina: So I told you I started like a year back. I know how to read

and write, but then my vocabulary is like really bad. So I

need to like pick up words whenever I come across people,

and there is this blank look that they give me and I give

them when we really can't get across what we want to say to

each other. So yeah, things happen. But then I pick up a bit

from here, and then I listen to people talking. I manage.

Kirill: Ok, alright, that's pretty cool. That's awesome to hear, and

it's a big jump to learn a new language, to move to a country

for your dream work. That's awesome. But give us a bit of

background. Where did you start? You obviously didn't start

in Korea. Where did you start and how did your life take you

here? What events happened in your life, what did you study

in high school, and just walk us through how you ended up

in Korea.

Deblina: Ok, so what happened was when I was like 8, my granddad,

he left me a treasure of close to 40 books on mathematics

puzzles and those things on pattern analysis that we used to

solve as kids. So that really drew my interest towards the

field of math and science. And those books were by the

famous Indian mathematician Shakuntala Devi, I don't

know whether you have heard of her or not, but those books

were really something and it drew me towards that field, and

I used to relentlessly solve patterns and used to look for

patterns around, and basically do anything that's related to

numbers, which is all data. And lots of finding patterns out.

So yeah, machine learning happened.

Also, the second thing that happened was at around 2003,

my dad gifted me a computer. So I was taken aback by the

amazing stuff and awesome, cool stuff that I can build using

the computer. So I started doing my pet projects at around

14, I guess. After that, I used to take part in the National

Olympiads. So one of the national cyber-olympiad in my

country --

Kirill: Sorry, this is in India, right?

Deblina: In India, yeah. So I topped it.

Kirill: You topped it. Congratulations. That's awesome. 14.

Deblina: Yeah. Thank you so much. Yeah, it's been a great journey

since then, and at 14, after that, I believed that maybe I

could code and take this up as a career. And the Bachelors

happened, and thereafter my Masters.

Kirill: Nice.

Deblina: In Machine Learning and AI. Yeah.

Kirill: That's awesome. And so what languages did you start to

code in when you were 14?

Deblina: Ok so the first language I started to code in was C, which is --

Kirill: Yeah, me too! That was my favourite!

Deblina: Yeah, exactly! After C, Java, and I used to do C and Java

with the advent of the standard template library. And C++, I

started coding with C++.

Kirill: Okay, beautiful. But your Bachelors, did you also do that in

C, C++ or did you move on to other languages?

Deblina: During my Bachelors, it was really diverse, because

depending on what I’m building, what class I’m taking, I

used to switch between languages because again it was like

a course requirement. So it ranged from everything —

sometimes I was doing C, C++, C#, sometime just using the

platform of Visual Studio exploring everything to F#. And

then I got into Python and R, I think in my junior year.

That’s the third year in my Bachelors. After that, all these

database related languages, too, SQL-related languages and

Hadoop. Yeah, I used to do all of them.

Kirill: Okay. And in your Bachelors, you said you studied machine

learning. Is that correct?

Deblina: Okay, so for Bachelors I didn’t have a specialty because in

India you need to study approximately 42 courses. You have

to do all of them, but at the end, in my senior year, you have

these electives. So, during that time, I went through

whatever can be the possible choices which is related to data

crunching and applying algorithms or models to solve them.

Machine learning was the best thing which was coming close

to it. And not only machine learning, I was always interested

in building intelligent systems. So I wanted to do

something really cool in artificial intelligence, so I took that

up and thought of doing my Masters.

Kirill: Okay. And just before the podcast, you were telling me about

how you scored this opportunity in Seoul and I think that

can be very useful to some of our students, or some of our

listeners who are still learning and maybe want to pursue a

Masters. Tell us a bit more. How did you go about finding

this great opportunity for yourself in Seoul? Did it just fall

on you?

Deblina: No, what happened was I used to always look up — I was

always a part of the communities online which are related to

machine learning and data science. I’m strictly speaking

about communities like “Data Science Central” and different

kind of opportunities on academic fronts, all those postings.

So I came across every possible lab, because for a Masters,

what you need to do is you need to not only file an

application to the university, but also send a separate

application to your supervisor, under whom you will be

working, and also to a different lab with respect to your

specialty.

Because of that, I screened across 200-300 opportunities

and finally, I struck gold at the 250th one, I don’t really

remember. (Laughs) I just saw — this lab, the work ranges

from designing intelligent traffic systems to modelling smart

cities, building intelligent health care solutions, everything

which is related to wearable sensors, machine-to-machine

communication, and Internet of Things basically.

This was what I wanted to do because this is a very high

level overview of the names that I just said. But internally

what we do is real-time big data analytics and along with

that, building algorithms or even tuning our models to build

these systems up. This was all I wanted to do and this lab

was a perfect fit and there are so many opportunities in

Seoul and the best part is that this country has amazing

technological advancement. Coming from India, I didn’t

really know, and I don’t even think most of the people

around the world know, what this country has to offer.

Everything is really well-organized. Only language is a bit of

a barrier, but everything else is super fine.

Kirill: Okay. Wow, that’s really cool. Why were you always so

interested in artificial intelligence?

Deblina: As I said, I loved pattern analysis. And after that what

happened was, when I was building my project which I

started off at around the time I was doing my Bachelors, it

was something like an automated health care solution. And I

used to see this everywhere. Something like fever, or any

kind of first-time diagnosis that a person wants, and he is

not having a doctor around or doesn’t have the luxury of

visiting a doctor. So for such people I wanted to create a

really affordable or, if possible, free automated health care

solution which can be like your on-call doctor and you can

use that platform and type in whatever is your problem and

get the first-hand diagnosis.

There have been such projects by Microsoft and the likes out

in the world. But I also wanted to form a recommendation

engine for nearby doctors, just in case of emergency. So I

made that and then I thought, “Okay, now that I can do

that, why not AI and contribute further to this field?” So

that’s how AI happened.

Kirill: Wow, that’s really cool. Tell us then about what are you

doing now in your research. You mentioned that you’re

about to graduate very soon, right?

Deblina: Yeah, just two months away.

Kirill: Oh, wow! Congratulations. It must have been a very long

journey.

Deblina: Yeah.

Kirill: All right. So, what are you doing in your research?

Deblina: As I said, in my lab we work from modelling all these

intelligent systems and basically designing this smart city

concept which is right now happening around the world. My

work specifically, if you ask us to, is to evaluate the different

models and techniques in machine learning and apply them

to solve these problems. These problems range from

sometimes just automating a traffic system or the health

care, but the thing is we need to integrate this all together

and make it an end-to-end system. The vision is a city of the

future that is totally smart.

So my work is to make sense out of the data that we receive

in real time, which is huge, and also design solutions,

sometimes mathematical models for solving these problems.

So first, what I do is I evaluate whatever existing techniques

are and sometimes I come up with my own models. Recently

I developed an entire algorithm from scratch and its

inspiration is quite interesting. If you might ask, I would tell

more about it.

Kirill: Yeah. Tell us more about it. So you basically built a whole

library, is that right?

Deblina: Yeah, I built a whole library. First of all, the entire design

because it wasn’t there. So I designed the model because

before building a library, I need to make sense of it

mathematically so that they completely understand.

Kirill: Yeah. What language was this library in?

Deblina: This was basically in C because the kernel of any language

is always C, the lowest kernel that it’s built on. So in order

to understand properly, I always start with C. So the thing

that happened was — if you look around and you look at the

way how trees branch themselves out, you would see that

there’s a pattern in how they branch themselves out.

Kirill: Yeah. I guess it depends on the type of tree, but yeah.

Deblina: Actually, not even the type of tree. You can even look at a

cactus, any species of tree in that entire kingdom, the plant

kingdom. You can see there’s a pattern of branching. That

pattern is basically the Fibonacci series, which is 1-1-2-3-5-8-13.

What happens is, the ratio of two consecutive numbers, if you

would divide them, approaches the golden ratio, which is prevalent

everywhere, like even how our galaxy spirals out. So from

there, I thought “Wait a minute. These trees do not have a

brain, so to speak, they do not look intelligent. So how do

they know exactly in which direction, angle to grow in in

such dynamic environments, something that they don’t

know?” The environment can be really non-stationary. I’m

using really technical terms—

Kirill: Yeah. Basically, how do the trees know what the Fibonacci

ratio is?

Deblina: Yeah, exactly. And somehow they just gain that overall

stability. Even if they’re slanted above the ground, they still

have the stability. So I decided to dig in about their

mechanism and what works, and then something blew my

mind. I didn’t know this, but they can communicate, see,

hear, and even have a memory of 40 days. It’s all there, the

biologists have researched it. The best thing is that they can

learn and they have 13 different discovered sensors and we

humans just have 5. So, the kind of sophistication that

these trees have is just mind-blowing. I just decided to

model their intelligence and design an algorithm based on

this. So it’s just strictly nature-based, like any other nature-

based algorithms of soft computing. So I built this algorithm

and I made it to solve optimization problems in the

applications that we work in our lab.

Kirill: Okay. So how well is it doing? Is it beating other algorithms?

Deblina: Yes, yes, perfectly. With respect to accuracy, it’s definitely

beating other algorithms but not so much with respect to

time. We already have better algorithms because the speed

of any such nature-inspired algorithms is hindered a bit

because it has an enormous number of parameters on which

such algorithms are based. So, the parameter tuning is

required, and that takes a bit of time.

Kirill: Okay. All right, just for everybody out there, I just wanted to

say, in terms of Fibonacci numbers, I have already heard

about them, but if you haven’t, what Deblina is describing is

very interesting, they are all over the world indeed. And the

golden ratio, if you divide those numbers, it’s basically 1, 1,

and then you just keep adding. So 1+1 is 2, 2+1 is 3, 3+2 is

5, 5+3 is 8, 8+5 is 13 and so on. And if you divide one

number by the other and you take the limit of that, it will be

1.618 something. Basically, that number 1.618 is called the

golden ratio. You can see it all over the world.
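
A minimal Python sketch of the arithmetic Kirill describes: build the sequence by repeated addition, then divide each number by the one before it. The function names are illustrative and not from the episode. The ratios settle at about 1.618, which is the golden ratio.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1."""
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])  # each term is the sum of the previous two
    return seq[:n]


def consecutive_ratios(seq):
    """Divide each term by the one before it."""
    return [b / a for a, b in zip(seq, seq[1:])]


# Closed form of the golden ratio, about 1.618.
GOLDEN_RATIO = (1 + 5 ** 0.5) / 2

fib = fibonacci(20)
ratios = consecutive_ratios(fib)
for a, b, r in zip(fib, fib[1:], ratios):
    print(f"{b}/{a} = {r:.6f}")
print(f"golden ratio = {GOLDEN_RATIO:.6f}")
```

The early ratios jump around (1, 2, 1.5, ...), and the later ones agree with the closed form to many decimal places.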

Basically, right now, if you pause this podcast and you take

a ruler and you measure the distance between the tip of

your middle finger to your wrist, so your hand, and then you

measure the distance between your wrist and your elbow, I

think it is, you will find that the ratio between them is

exactly 1.618. How crazy is that? Even us humans, we are

designed by that ratio. And the fact that trees grow based on

that ratio is no coincidence. It’s just anything that is

natural, like starfish grow in 1.618, galaxies spiral in

1.618.

There’s lots of debate about which galaxies spiral like that

and which don’t but nevertheless, you can see it all over the

world. It’s a real mystery, but I’m not surprised when

Deblina says that an algorithm that is based on the golden

ratio can outperform others just because it takes into

account something that is so fundamental and all around

us. Yeah, that’s pretty interesting. But when you developed

this algorithm, and you’re saying you’ve come up with some

applications, can you talk us a bit more through the

applications or possible applications of this library that

you’ve written?

Deblina: Okay, so what I’ve done, it’s basically an optimization

algorithm so you can solve optimization problems using this

algorithm. Whenever I have presented my papers based on

this algorithm, there have been a lot of curious eyes around

and equally questionable minds. Some of them couldn’t

really get a grip on it. I totally understand that. They just

said it might be for Law School. So after the results that I

presented, and there were successful demonstrations where

I just selected an application like medical imaging and I

processed numerous CT scan images to find the location and

area of growth of tumours.

That was one application that I did and the results were

phenomenal because I just got it presented at one of the top

AI conferences in the United States this year and it was well

received. I’ve also applied it to other applications because we

do a lot of sensor data processing, so to find the optimal

features from that sensor data I have used this algorithm of

mine.

Kirill: Okay, that’s very interesting. And I want to slowly start

getting into the more applied area of artificial intelligence

and data science. To start off, can you please describe for

our listeners what is an optimization problem?

Deblina: So an optimization problem is — there are two types of

optimization. One is local optimization and the other is

global optimization. So when you’re looking for something

and you know what will be the result, the final result

becomes your global optimal solution. But when you move

towards that trajectory to get that final result, you get across

some local best results—

Kirill: Local maximums, yeah?

Deblina: Yeah, local maximums, exactly. I’m trying to just break it

down in a non-technical manner, which is a bit difficult for

me.

Kirill: Thank you for that. So, you have a global maximum which

you’re trying to find, but you have local maximums that are

possibly going to look like the global maximum and you

might think that that is the best option.
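
The local-versus-global distinction can be illustrated with a small Python sketch. It is a toy under stated assumptions, not Deblina's algorithm: the objective `f`, the step size, and the random-restart strategy are all made up for illustration. A greedy climber that only accepts improvements stops on whichever peak is nearest to its starting point, and trying several starting points is one simple way to reduce that risk.

```python
import math
import random


def f(x):
    """Toy objective: a small peak near x = -1 and a taller peak near x = 2."""
    return math.exp(-(x + 1) ** 2) + 2 * math.exp(-(x - 2) ** 2)


def hill_climb(start, step=0.01, max_iters=10_000):
    """Greedy local search: move to a neighbour only if it is better."""
    x = start
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=f)
        if best == x:
            break  # no neighbour improves: a maximum, possibly only a local one
        x = best
    return x


def random_restarts(n_starts=20, low=-4.0, high=5.0, seed=0):
    """Run hill climbing from several random starts and keep the best end point."""
    rng = random.Random(seed)
    ends = [hill_climb(rng.uniform(low, high)) for _ in range(n_starts)]
    return max(ends, key=f)


local_x = hill_climb(-2.0)     # starts left of the small peak and stops on it
global_x = random_restarts()   # best end point over 20 different starts
print(f"single climb from -2.0: x = {local_x:.2f}, f(x) = {f(local_x):.3f}")
print(f"best of 20 restarts:    x = {global_x:.2f}, f(x) = {f(global_x):.3f}")
```

The single climb ends on the smaller peak because every step away from it looks worse, even though a much better solution exists elsewhere.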

Deblina: Yeah. That’s what basically any optimization algorithm does.

It finds those solutions. Again, there are different kinds;

single optimization where you just have one objective like,

“Okay, I need to go to the grocery and get some stuff and I

need to get this product.” So that’s like one objective. Multi-

objective optimization becomes like, how many objectives

you are addressing. So it’s like, “I will go to the grocery store,

get that product, but then it has to be of the minimum

possible price.” I have two kinds of things to look onto, so

that becomes multi-objective. These algorithms have

sometimes conflicting objectives, like something increases

and something decreases, sometimes both are increasing so

you have a maximization problem, many different objectives

and then based on that, the algorithms are built.

Kirill: Okay. All right. That makes sense. And then the more

objectives you have — for instance, when you have one

objective like “Get to the store,” then you have lots of

different ways to get there. You have lots of different paths

that you can take. That’s an optimization problem.

Deblina: Exactly, yeah.

Kirill: But then when you have multiple objectives, for instance,

“Get to a store and buy the cheapest butter,” you have so

many more. You have so many different types of butter to

choose from, so many different paths to take, and then you

can also go to different stores. That’s even three optimization

problems, but basically, is it correct that you have to

multiply all of the options? It’s not just a simple addition, it’s

a multiplication of all of the options.

Deblina: It’s a multiplication of the options and then you have to form

a single function of all those options together.
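
One common way to form a single function from several objectives is a weighted sum, often called scalarization. The Python sketch below assumes that approach, which the episode does not name. The stores, butter prices, travel times, and weights are all hypothetical.

```python
# Hypothetical data: travel time to each store (minutes) and butter prices there.
stores = {
    "corner shop": {"travel_min": 5, "butter": {"brand A": 4.50, "brand B": 3.90}},
    "supermarket": {"travel_min": 15, "butter": {"brand A": 3.80, "brand B": 3.10}},
    "wholesale": {"travel_min": 40, "butter": {"brand B": 2.60}},
}


def options():
    """Enumerate every (store, brand, travel time, price) combination."""
    for store, info in stores.items():
        for brand, price in info["butter"].items():
            yield store, brand, info["travel_min"], price


def scalarized_cost(travel_min, price, w_time=0.1, w_price=1.0):
    """Fold two objectives (time and price) into one number to minimise."""
    return w_time * travel_min + w_price * price


def best_option(w_time=0.1, w_price=1.0):
    """Pick the option with the lowest combined cost for the given weights."""
    return min(
        options(),
        key=lambda o: scalarized_cost(o[2], o[3], w_time, w_price),
    )


print(best_option())             # time matters: a nearby store wins
print(best_option(w_time=0.0))   # only price matters: the cheapest butter wins
```

The weights encode how much a minute of travel is worth relative to price, so changing them changes which option counts as best. That is why conflicting objectives need an explicit trade-off.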

Kirill: Hence something called “the curse of dimensionality,” right?

Like, if it takes you 0.01 seconds to solve one optimization

problem, then when you have a thousand of them, it doesn’t

mean it’s going to take you like 10 seconds to solve those

problems. You have to multiply that. It’s going to take you

like a million years to solve them altogether. That’s why it’s

such a big deal in artificial intelligence that you cannot just

brute force through these problems. Even given the

computational power that we have now, you just cannot

simply brute force through optimization problems. You have

to come up with smart ways of solving them.

Deblina: Yeah, because brute force for such kind of a problem would

definitely take two million years for sure. You have so many

parameters to take care of.
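
A rough Python sketch of the multiplication Kirill describes. It assumes 10 options per decision, which is a made-up figure, and uses the 0.01 seconds per candidate from his example. The number of candidates is the product of the options, not the sum, so brute-force time grows explosively with the number of decisions.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def search_space_size(options_per_decision):
    """Number of candidate solutions: the product of the options per decision."""
    return math.prod(options_per_decision)


def brute_force_seconds(options_per_decision, seconds_per_candidate=0.01):
    """Time to evaluate every candidate once at a fixed cost per candidate."""
    return search_space_size(options_per_decision) * seconds_per_candidate


for n_decisions in (1, 3, 10, 20):
    sizes = [10] * n_decisions  # assumption: 10 options for every decision
    secs = brute_force_seconds(sizes)
    print(
        f"{n_decisions:>2} decisions: {search_space_size(sizes):.0e} candidates, "
        f"{secs:.3g} s = {secs / SECONDS_PER_YEAR:.3g} years"
    )
```

With 20 such decisions there are 10^20 candidates, and checking each one at 0.01 seconds would take tens of billions of years. Practical optimization algorithms therefore search selectively instead of enumerating everything.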

Kirill: Yeah, exactly. And the funny thing is that, our whole lives,

all of our lives, whatever we are doing in life, is an

optimization problem. You have to get to school, what time

do you wake up in the morning, do you pick up your kids

and then you go to work or do you go to work and then pick

up your kids. In what order and how do you do certain

things, what routes and paths you take, that’s all an

optimization problem.

And funnily enough, if we want to build artificial intelligence

that can rival us in terms of intellect, it has to be able to

solve optimization problems as well as we do. As humans,

somehow we can solve these optimization problems. Natural

selection has given us this ability and this amazing tool

called the brain which allows us to solve these optimization

problems. I think that’s why we’re trying to build these

neural networks, these deep learning techniques, because

they can mimic the human brain in the hopes that then

robots will be able to do the same. Is your algorithm based

on neural nets or are you taking a different approach?

Deblina: My algorithm is not based on neural nets because clearly

trees do not have a brain and a central nervous system.

Kirill: That makes sense.

Deblina: Yeah. But then, I have totally worked on something that you

just mentioned – natural selection. So, guided by natural

selection, there’s a continuous reinforcement loop of penalty

and reward and the system builds on that. You know how

natural selection works, right? I do something, it’s a good

thing for me, and I will continue doing that. If it’s a bad

thing, I won’t do that. That’s the thing. My algorithm, the

library that’s built is just that. There’s a loop, an underlying

reinforcement algorithm, which guides this for natural

selection. That’s how it functions. But I do understand how

neural nets and all possible deep learning techniques work,

because again, they are inspired by the human brain.

Their inspirations are different but the basic natural

selection theory is the same for both of them.
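
The reward-and-penalty loop can be sketched in its most generic form as a tiny evolutionary search in Python. To be clear, this is not Deblina's tree-inspired algorithm, whose details are not given in the episode. The objective, population size, and mutation scale are hypothetical choices made only to show selection at work.

```python
import random


def fitness(x):
    """Toy objective to maximise; the best possible value is 0, at x = 3."""
    return -(x - 3.0) ** 2


def evolve(generations=200, pop_size=20, sigma=0.5, seed=0):
    """Repeatedly keep the fitter half and refill with mutated copies."""
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Reward: the fitter half survives. Penalty: the rest is dropped.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Survivors produce slightly mutated offspring to refill the population.
        offspring = [x + rng.gauss(0, sigma) for x in survivors]
        population = survivors + offspring
    return max(population, key=fitness)


best = evolve()
print(f"best x = {best:.3f}, fitness = {fitness(best):.5f}")
```

Because the best candidates always survive, the best fitness never gets worse from one generation to the next. The same keep-what-works feedback is what the loop described above relies on.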

Kirill: Okay. Very, very interesting. And have you ever compared

your algorithm solving an optimization problem against a

neural network solving the same optimization problem?

Deblina: Yes, definitely. I have to do that because I get questioned at

conferences. I have compared it to other existing algorithms.

I would name a few, if you may.

Kirill: Yeah, sure.

Deblina: Okay. The particle swarm optimization, the artificial bee

colony optimization, and the ant colony optimization are

some of the really great optimization algorithms which are

out there from soft computing field. And in the deep learning

field, I have compared it to — because obviously there are

training and testing involved, that’s a different approach for

the deep learning field. Again, I need to subdivide and show

why I’m applying it to deep learning. That time, I compared

it with recurrent neural networks, and I think the last time I

compared it in one of my applications it was with restricted

Boltzmann machines.

One of the things that I saw, and it was quite controversial

in one of my presentations, it was like recurrent neural

networks and it was also having a fuzzy logic base with that

neural network scheme. With increasing number of

generations of run, I mean, when it was running for more

than 400 generations – I’m talking about image processing –

it did not cover the exact regions of interest on that image.

But rather the contour that we wanted to select on an

image, it got scattered all over. And this was deep learning

doing it. So somehow there was some problem, but then

when I did it with my algorithm, maybe because it was

having a continuous feedback loop—I mean, I know deep

learning has that, but this was more based on experiences.

So this library took more time but it gave better results, I

mean exactly the regions of interest on the image. So that’s

how I compared it.

Kirill: Interesting. You’re saying that your contour was contiguous?

Deblina: Yeah.

Kirill: Okay. Very interesting. I thought the deep learning area for

image recognition is convolutional neural nets?

Deblina: Yeah, it is, but then what I did was I used this particular

fuzzy neural net system. Exactly, it was fuzzy convolutional

neural nets.

Kirill: Fuzzy convolutional neural nets. Okay.

Deblina: Yeah, exactly. So that was the one which I compared it with.

Maybe because I was working with R last night and I got

confused -- anyway, that was how I did it. So, the contours of

interest were scattered for FCNN and not for the algorithm

which we developed.

Kirill: Cool. So your algorithm is pretty up at the top there.

Interesting. We might be studying that very soon. If you

write a Python library, maybe.

Deblina: Yeah, sure. Maybe. (Laughs) I will do that.

Kirill: All right, cool. And once you finish your degree in two

months, where do you think that will take you?

Deblina: Right now, I really don’t know because I’m totally focusing

on this graduation stuff. I’ve been writing my thesis. And

after that I’m headed to Intel for two months in their R&D

section for some work on Internet of Everything. That’s IOx,

a new thing. And after that I will just be open to

opportunities. I don’t really know, but I would definitely love

to learn and just keep doing what I’m doing.

Kirill: Okay. And do you have some kind of a dream, some problem

that you want to solve in the world using artificial

intelligence?

Deblina: As I told you, for me, health care is one of the things that I

really want to get out there and totally automate it. I’m not

saying like perform surgeries and stuff, but for the first-time

diagnosis, or even helping the doctors, making their work

easy. So that would be great, so that the time for diagnosis

is saved considerably and the accuracy of your prognosis,

when you’re doing it, that will be much better. I want to do

that.

As of now, I haven’t really thought long-term what I’m going

to do, but it’s going to be everything related to AI. Another

problem is that right now, the field of AI has a lot of

capabilities like NLP, text and speech, knowledge recovery,

image processing, separately, but none of the work that we

have done with respect to all the work around the world

happening right now. We need to integrate it and make it as

one single system. That hasn’t been done as of even 2017.

The day that we build an end-to-end AI system, that would

be great, with all these functionalities. That would be the

ultimate aim of any AI researcher.

Kirill: Okay. That’s a big undertaking, definitely. Let’s talk a bit

about where the field of AI and data science is going in

general. From what you’ve seen around the world and from

the research you’ve done, what do you think the future is of

artificial intelligence?

Deblina: As far as I know, from the opinions of the scientists and the

researchers with whom I’ve met in conferences around the

world, what all of us are thinking is that the field of data

science is headed towards a fusion with intelligence systems

to create smart cities of the future. That’s the main vision. I

also strongly believe that with the on-going research with

real-time big data and Internet of Everything right now, data

science is going to explode in the future with a lot of stuff

happening. As of today – I just read this this morning when I

got up – Strata and Scala have been replaced by Hadoop

already—I’m sorry, Strata and Scala have replaced Hadoop.

Kirill: (Laughs) Just a bit of a different direction.

Deblina: Yeah. And there are these DataOps tools which are being

developed to help data engineers, like DevOps tools which

used to be previously for all the developers. Now there are

DataOps. They have been built by companies like Nexla and

DataKitchen. It’s really great, how the data field is

progressing. And also the automated predictive analytics,

which is the thing which is happening right now. This

predictive analytics had been automated last year and

DataRobot was created and people were like, “Okay, by 2025

everyone is going to be out of jobs.” But then it was a bit

soon to say, because DataRobot as of now just speeds up

model development for any model that you’re building, it’s

like the one-stop solution to speed up whatever you’re

implementing in the industry. It has a long way to go,

definitely, it’s a budding field. Both AI and data science

together is going to be really powerful in the future.

Kirill: Yeah, I agree. I don’t think data scientists will be out of a

job. I think that’s going to be the last industry to go.

Deblina: I really don’t feel so because as of yesterday there were more

than 4 million jobs out there for data scientists.

Kirill: Wow! Everybody listening, do you hear that? 4 million!

Deblina: Yeah, anyone with skills like Python, R, SQL, Hadoop, you’re

good to go. You are looking at good jobs straight down the

line for 25 years, a stable career.

Kirill: Stable and explosive career. What would you say are the

most important tools for data scientists?

Deblina: As I said, definitely Python, R, SQL. Spark right now

because obviously it has taken over Hadoop. Basically the

entire scikit-learn/numpy/TensorFlow of Python. If you can

do that, that would be great. So these are some tools, and

even I use that on quite a regular basis. Among the

techniques, if you might ask, there are clustering, regression,

neural nets and decision trees. Most importantly, there are

two things, which is support vector machines and ensemble

learning, that you need to learn if you really want to get into

data science because all the companies out there work with

ensemble learning. Everything is an ensemble.

Kirill: Okay. That’s important. And why would you say SVMs are

an important tool?

Deblina: Yeah, SVMs are really powerful and they work very

differently than the existing clustering or regression

techniques. The way how they work is really beautiful, the

accuracy of the results that they get because of that

mechanism. And from that accuracy, it has been applied to

a lot of product designing, modelling in most of the corporate

sectors that I’ve come across. So that’s why it’s viewed as an

important tool in your career.

Kirill: All right, give us a five sentence breakdown of how SVMs

work. What is their main advantage?

Deblina: Okay, say a set of data is there and you need to classify it

into two classes. So, for example—Kirill, if you could give me

two classes?

Kirill: Apples and oranges.

Deblina: Okay, so apples and oranges. Great! Your machine needs to

know—

Kirill: I like to participate in your examples. I can see how you’re a

great researcher.

Deblina: Okay. (Laughs) So, you have a bunch of apples/oranges

combinations and your machine should classify which one is

apple and which one is orange. That’s the objective. So

thereafter, the next step is how will the machine know that.

The model of SVM, what it does is it builds something called

a hyperplane. To be non-technical, I would say that’s the

margins between the two classes. So those margins, what

happens is any other algorithm would find a similarity, like

which is an orange for class ‘orange’ and which is an apple

for class ‘apple’. But what SVM does is, among the apples, it

will select which has the most similarity with an orange, just

the opposite. And with the orange, it will select which has

the most striking resemblance with an apple. It finds out the

outliers or the mistakes in a very non-technical manner and

puts that as your margins. And based on that, the remaining

data is classified. That’s how SVM works.

Kirill: Yeah. It’s very counterintuitive if you’re thinking about it in

terms of the other algorithms, where they look for the most

apple-y apple or the most orange-y orange, and then they

build their classes based on that.

Whereas here, you’re looking for the really cool orange which

actually looks like an apple and really a rebel apple, which

looks like an orange, and based on that you’re like, “Oh, so

those are my boundaries.” And then you’re like – bam!

Hyperplane in-between them and that’s it. That’s a

completely different approach to classifying.

Deblina: Yeah.
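
The support-vector idea discussed above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not code from the episode: each fruit is reduced to a single made-up "redness" score (0 to 100), apples are assumed to score higher than oranges, and the two classes are assumed not to overlap. In one dimension, the maximum-margin boundary of a linear SVM sits halfway between the least red apple and the most red orange; those two borderline fruits are what the SVM literature calls support vectors. The function names and the data are hypothetical.

```python
# Toy 1-D illustration of the maximum-margin idea behind a linear SVM.
# The "redness" scores are made-up values, not data from the episode.


def max_margin_boundary_1d(apples, oranges):
    """Return (threshold, margin, support_vectors) for two separable 1-D classes.

    Assumes every apple scores higher than every orange.
    """
    closest_apple = min(apples)    # the apple that looks most like an orange
    closest_orange = max(oranges)  # the orange that looks most like an apple
    if closest_apple <= closest_orange:
        raise ValueError("classes overlap; a hard margin does not exist")
    # The boundary sits halfway between the two borderline fruits.
    threshold = (closest_apple + closest_orange) / 2
    # Distance from the boundary to either borderline fruit.
    margin = (closest_apple - closest_orange) / 2
    return threshold, margin, (closest_orange, closest_apple)


def classify(score, threshold):
    """Label a new fruit by which side of the boundary its score falls on."""
    return "apple" if score > threshold else "orange"


apples = [90, 80, 60]
oranges = [10, 20, 30]
threshold, margin, support_vectors = max_margin_boundary_1d(apples, oranges)
```

With these scores the boundary lands at 45 with a margin of 15 on each side. Only the two borderline fruits (scores 30 and 60) determine it; moving any other fruit does not change the boundary as long as that fruit stays outside the margin.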

Kirill: Okay. That’s cool. Another interesting thing you

mentioned—just to summarize for the guys listening, tools of

the future are Python, R, SQL, Spark, scikit-learn and

TensorFlow, and techniques of the future are clustering,

regression analysis, neural networks, support vector

machines and ensemble learning among others.

And other interesting things you mentioned, and these are

just from before on this show, Strata and Scala are replacing

Hadoop and Spark has taken over Hadoop. Can you go into

a bit more detail on that? Like, Hadoop is such a trendy

buzzword, everybody wants to learn Hadoop. Does that

mean that listeners on this show shouldn’t be learning

Hadoop and they should be learning Strata, Scala and Spark

instead?

Deblina: I wouldn’t say that, but right now—again, it depends on

what the listeners want to do and what they’re looking for

with their model to solve. But why I’m saying that Strata,

Scala and Spark have replaced Hadoop is because right now

what the researchers are doing, in all these conferences that

I travelled to, I saw that Hadoop has been there for quite

some time. And right around from 2007 to today, it has

almost been replaced by these technologies and the

companies are also looking towards these technologies,

obviously for the real-time analysis of these technologies.

Right now, I don’t think that listeners should just stop

learning Hadoop because even I use Hadoop on a regular

basis. But I find Spark much easier, and I find it has more

parts to it rather than Hadoop strictly because of the real-

time processing that it can do with big data. I don’t know so

much about how companies or even academic organizations

are using Strata and Scala because I don’t have full

knowledge of that, but I can speak for Spark for sure.

Kirill: Okay. That’s very interesting. And what would you say to

somebody out there who runs a business who is using

Hadoop right now? Do you think they should start

considering switching to Spark, or are they fine for the next

couple of years?

Deblina: That depends on what business that person is running, but

definitely you should be starting to make a transition to

Spark. I strongly feel so. Again, it’s not a personal opinion;

it’s like speaking the minds of everyone who I’ve come across

in the past year.

Kirill: Let’s say they’re running an online store, so they have a lot

of OLTP/OLAP type of things. What’s the main advantage of

Spark over Hadoop?

Deblina: Okay, if they’re running an online store, basically it’s more

neater, it’s nicer in the way how it works with respect to

handling and processing the data and also the kind of

intuition it has towards modelling the data into different—if

you’re looking towards classification and stuff like that, put

into clusters, Spark is better. Those are certain advantages. I

don’t know so much about speed and stuff because right

now even I am in a jiffy, like “What should I be using?

Hadoop or Spark?” Right now I’m trying my hands on both.

So the moment I get to a proper thing, I will put that up on

my LinkedIn profile.

Kirill: Okay, sounds good. We’ll be looking forward to that. I’ve got

a few quick questions for you, rapid-fire type of questions.

Are you ready?

Deblina: Okay.

Kirill: All right. What’s the biggest challenge you’ve ever had as a

data scientist or machine learning expert or AI researcher?

Deblina: That would be handling unstructured data from all possible

sources and giving it a proper structure. That’s very

important.

Kirill: Okay. That’s a very deep challenge. I can totally appreciate

that. What’s a recent win that you can share with us that

you’ve had in your role, something that you’re proud of?

Deblina: It would be the completion of my recent project, the

intelligent health care system of those CT scans detection

that I presented in that conference of artificial intelligence in

the United States.

Kirill: Do you think that can have a real world application and

soon we will be using those?

Deblina: Yeah, because the project got acquired by a hospital.

Kirill: Oh, nice. Very cool.

Deblina: Yeah, it’s giving results.

Kirill: Congratulations. That’s awesome.

Deblina: Thank you.

Kirill: It reminds me of the podcast with Damian Mingle, I think it

was number 13, where he came up with a machine learning

algorithm to predict sepsis. It’s always very cool to see

people using artificial intelligence for good.

Deblina: Yeah, I remember that. I heard that.

Kirill: Yeah, that’s awesome. Thank you. Now I have two people

who are saving lives. That’s awesome. You never mentioned,

what’s the name of your library that you’ve developed, if we

can look it up or something later on?

Deblina: I told you it’s not on Python yet, so—

Kirill: Yeah, but even in C, did you give it a name like a codename?

Cobra or something like that?

Deblina: No, because right now I haven’t put up the name. It’s still in

the beta version. So once I do that, I will keep you posted.

Kirill: All right, sounds good. Next one is, what’s your one most

favourite thing about being in the field of data or being a

data scientist, being an AI researcher?

Deblina: I really like the power that we have to build awesome and

cool stuff with data, making machines to think more like us,

and it’s just the beginning. We can create a huge impact on

creating a smarter tomorrow. Take, for example, Alexa and

Google Home – it’s just the beginning. I really like that about

our field, and also how in demand it is among the various

sectors around the world.

Kirill: Yeah, totally. But on that, this is kind of my question that I

really wanted to get your opinion on after we warmed up

with everything else. What do you think about a lot of people

saying that AI is a threat, that not only are we going to

develop smart homes or smart cities and help people in

health care, but actually we’re going to create super

intelligence or artificial general intelligence which will have a

prerogative of its own which will eventually decide that

humans are not meant to be on this planet? What are your

thoughts about that?

Deblina: I totally understand that because even we keep thinking that

and that’s one of the issues that I discussed whenever there

are meet-ups and business things. I second that. I feel,

however, it can be solved by going through—okay, we need a

stricter security. You know how it’s coming because just in

2017 every person’s personality can be assessed from his

online data. So imagine from all the data sensor services and

commodities that a single person uses, how easy it will be to

know everything about a person. And I’m not talking about

an end-to-end integrated AI bot. That’s too much into the

future. I’m just talking about simple intelligence machines.

That is equally powerful and dangerous. So what we need is

a data architect or scientist with a solid information security

background. He will be really indispensable. If we can build

a security mechanism around it, it’s good to go, yeah.

Kirill: All right. That’s important. It’s good that you have the

confidence that we’ll be safe. (Laughs)

Deblina: Yes. (Laughs) You need to stay positive. Whatever you’re

doing, you need to just enjoy it and think that it’s going to

work. That’s how I look at it.

Kirill: That’s true. Okay, it’s been a real pleasure having you on the

show. Thank you so much for coming on.

Deblina: Thank you so much, Kirill.

Kirill: So how can our listeners follow you or maybe connect with

you, maybe even ask you some questions if they’d like to

learn more about your career?

Deblina: Okay, I’m on LinkedIn. I go by my name – Deblina

Bhattacharjee. And you can also connect with me via Gmail.

I go by [email protected], so I would be definitely

open to sharing ideas, discussions, basically learn. Yeah.

Kirill: Beautiful. Thank you. And one last question I have for you

today is, what is a book that you can recommend to our

listeners to become better in the space of data science or

artificial intelligence?

Deblina: Okay. This book would be the one with which I started off:

“An Introduction to Statistical Learning” by James, Witten,

Hastie. In my opinion, it was a great read. The book is free

and it’s just so good. Also, if you might allow me, I would

just recommend another book which is “Applied Predictive

Modelling” which has thorough examples and explanations.

So, these two books, yeah.

Kirill: Okay, beautiful. So, “An Intro to Statistical Learning” and

“Applied Predictive Modelling.” Actually, I also wanted to

mention or reiterate that author that impacted you at the

very beginning of your journey. If anybody is interested to

see how Deblina started out, her name was Shakuntala

Devi, right?

Deblina: Yeah.

Kirill: Can you pronounce that for us? How do you pronounce it

correctly?

Deblina: Shakuntala.

Kirill: Shakuntala Devi. Do you think they’re available in English,

like those pattern recognition books?

Deblina: Yeah, all of them are in English.

Kirill: Okay. I’m really curious to check that out. You know, it’s

always interesting to go back to the source, where everything

started out.

Deblina: Yeah, sure.

Kirill: Yeah. Once again, thank you so much for coming on the

show and sharing all this wealth of knowledge with all of our

listeners.

Deblina: Thank you so much for inviting me, Kirill. It has been a

pleasure.

Kirill: So there you have it. I hope you enjoyed today’s

presentation. It was quite an overwhelming discussion,

actually. There was lots of interesting things. You can tell

that Deblina is very well-versed and very knowledgeable

about all of these subjects and has a lot of experience in all

these different tools and techniques. And it was just a great

pleasure that she was able to share these things with us.

Perhaps the biggest takeaway for me from this

episode was what Deblina said about her algorithm and how

it’s structured. I always thought that neural networks are

the most powerful thing and they’re the endgame for

humanity in terms of artificial intelligence, but in reality it

actually turns out that it’s not. It’s very interesting that such

a forward-looking researcher like Deblina has chosen a

different approach, an approach inspired—other than by

human consciousness and the human mind, Deblina chose

an approach inspired by the kingdom of plants and the

natural selection that has been happening there.

Based on some of the tests that she’s done, her algorithm is

performing at least as good as the existing ones out there.

Basically, it shows that there are lots of avenues for artificial

intelligence, not just neural networks, and also kind of

underlines how broad this field is and how many

opportunities there are and how interesting it can be. So

pretty much, as long as you have the passion and have the

drive to learn the programming skills that you need, the

world is your oyster. You can come up with any type of

inspiration and code that and see how that goes.

So there we go. That was our podcast on artificial

intelligence. I hope you got some very valuable takeaways. If

anything, now you know which tools to focus on and which

techniques to study to prepare for the world of the future.

You can find all of the resources mentioned on this podcast

including ways to connect with Deblina at

www.superdatascience.com/43. Also there, you can get the

transcript for this episode. And definitely make sure to

connect with Deblina on LinkedIn and follow her career.

And by the way, I just had a look at Shakuntala Devi, and

she’s considered a human computer. This is a person who

can multiply 13-digit numbers by each other within like several

seconds. We’re going to include that in the show notes as

well. I think that could be a very interesting thing to have a

look at as well. And on that note, thank you so much for

being here. I really appreciate you and I can’t wait to see you

next time. Until then, happy analyzing.