SDS PODCAST

EPISODE 9

WITH

MUHSIN KARIM

http://www.superdatascience.com/9

Kirill: This is episode number 9, with neuroscience PhD turned

data analyst Muhsin Karim.

(background music plays)

Welcome to the SuperDataScience podcast. My name is Kirill

Eremenko, data science coach and lifestyle entrepreneur.

And each week, we bring you inspiring people and ideas to

help you build your successful career in data science.

Thanks for being here today and now let’s make the complex

simple.


Welcome to this episode of the SuperDataScience podcast.

Today I've got a special guest, a student of mine, Muhsin

Karim. And Muhsin actually reached out to me on Udemy

just to say thank you for the courses and I asked him for his

LinkedIn. Then when I looked at it, I was so fascinated.

Muhsin actually has a PhD in neuroscience and he has quite

some experience in academia. And then he went down that

path that we have witnessed so many times on this podcast

when people from academia turn into data analysts and data

scientists.

So in this podcast you'll find out what drove Muhsin to

change his profession to move away from academia and

move into data analytics, what skills he leverages from his

neuroscience background and his data processing that he

was doing there in his current day to day role. Also, it was

really interesting how we had a conversation about where

the world's going in terms of data, what machines are

coming up to, how the human brain is going to be modelled

by machines in the upcoming years. We talked about

Moore's Law in quite some depth, so if you're not familiar


with Moore's Law, this is a good episode to pick up on

Moore's Law that currently governs how computers are

developing, and that exponential curve in technology that is

all dictated by Moore's Law at the moment.

Also in this episode, we go into quite a lot of depth about

how to get into the space of machine learning. So Muhsin

has quite a lot of experience with data from his PhD, so he

uses R programming pretty much on a day to day basis and

he has expertise around some statistical methodologies, and

his next venture in data science and in analytics is to get

into machine learning. And he's taking actually one of our

SuperDataScience courses on machine learning. And he'll

share some tips and advice on how you can get into the

space of machine learning. How you can tackle this huge

broad challenging field which is machine learning. How you

can dip your toes in the water, how you can get into there

slowly. He'll give you some of his experiences of how he's

gone about it and how he's gotten into this field slowly

making his way and expanding his skills. And also we

discuss why it's necessary, why machine learning is such an

important field to get into now. This is the time to get into

machine learning.

And also in this podcast, we talked about quite a lot of

books. So in total, there are 7 books that we mentioned in

this podcast. I brought up 3 and Muhsin recommended 4.

So if you're interested to fill up your library with data

science and related books, then this episode is definitely for

you.

Can't wait to get started, can't wait for you to hear all the

insights. And without any further ado, I bring to you Muhsin

Karim.



Welcome everybody to the SuperDataScience podcast. Today

I have an exciting guest. Muhsin, who is one of my students

on Udemy! Welcome to the show, Muhsin. Thank you for

taking the time.

Muhsin: Thank you for having me.

Kirill: I had a look through your LinkedIn. Very impressive, and

we'll get to that in a second. But first of all, could you share

with me and with the listeners, how did you find out about

my courses on machine learning and data science?

Muhsin: It was actually through a colleague of mine. So I work at a

company called Harvey Norman, and I was chatting to this

colleague. We both share an interest in data science, and he

happened to mention your course, “Machine Learning A-Z”.

And I signed up, and in a weekend, I went a third of the way

through, and I was very impressed with the course.

Kirill: Fantastic. So you're enjoying it so far?

Muhsin: I am indeed, yeah, absolutely. I like that as an R

programmer, I can code up in R, but then you have a direct

comparison to the Python code, which I'm currently learning

at the moment as well. So it's good to have those two side by

side comparisons between the code I'm familiar with and one

that I'm learning.

Kirill: Oh that's really cool. And are you enjoying the Python one as

well? Do you find it much different to the R code?

Muhsin: Not terribly different. I mean, the programming concepts are

similar. A few things I just need to get my head around, but


otherwise I can see the similarities. So it's good to have them

both there.

Kirill: Ok, fantastic. And I remember that we had a quick chat on

Udemy. Do you remember why you reached out? Why did we

get in touch?

Muhsin: Oh, I just expressed my appreciation that you were a very

good communicator in demystifying very complex --

Kirill: Oh that's right, that's right. Thank you.

Muhsin: I always appreciate when people can do that, when they can

take something that's quite difficult to convey and describe it

in a palatable way. So I just said thank you.

Kirill: Oh, right. I remember that. And thank you so much for that,

and it's interesting how that transferred to you coming onto

the show! Because I have a sense sometimes when people

are extraordinary, I have a sense that I have to take a few

more seconds, a few more minutes, to actually find out

where this person's coming from, what their background is.

And so I asked you for your LinkedIn, and I didn't make a

mistake! When I saw your LinkedIn, you have a PhD in

neuroscience! And as soon as I saw that, I was like, I have to

bring Muhsin onto the show! So yeah, thank you for

reaching out, thank you for coming onto the show. You

ready to talk a bit about your background?

Muhsin: Yeah, absolutely!

Kirill: Alright. So PhD in neuroscience. I was just going to quickly

recap on this. So you have a publication which is called

"Behavioural and neural correlates of vibrotactile

discrimination and uncertainty". And you have some other


publications as well. So to start us off, how did you get into

neuroscience, and why did you choose this field?

Muhsin: Yeah, so I kind of stumbled into it. But I guess it begins with

my degree. So I actually have a degree in Molecular Biology

Genetics. So that's the first science I picked up. And that

degree involved a lot of wet lab based work. So it's

centrifuging test tubes and adding chemicals together and

wearing white lab coats. It was very -- looking like someone

from a CSI crime lab!

But despite that excitement, I really didn't like the degree!

And I hated it all the way throughout. But what I found

during the degree was I attended a talk. And it was about

neuroscience and biochemistry. And I realised I was more

fascinated with brains and behaviour rather than

biochemistry and genetics.

From there, I pivoted into a PhD research project that was to

examine neural cells from an animal model that had been

fed different concentrations of an important early brain

development nutrient. And my task was to lay neural cells in

a petri dish and watch their synaptic growth. So when you

put two neural cells together in a dish, if they're healthy,

they'll start communicating with each other, and they'll form

connections, these synapses. Unfortunately, the cells kept

dying. So it was a terribly demoralising experiment and I

can't emphasise enough that I really hated what I was doing!

I was terribly frustrated, nothing was working, and I actually

quit that PhD. And it was the best decision I made in my life,

and I wish I had done it sooner.

But then I eventually found a neuroscience lab at the

University of New South Wales that had received funding to


take a more multi-disciplinary approach to studying human

cognition. And there, my project was to examine how the

human brain made optimal decisions when faced with

uncertainty. So I think going to the particular tasks I was

doing for my PhD --

Kirill: Actually, I read a little bit about your overview of the PhD,

and I would really like to learn more. Like, without going

into too much detail about the technicalities, but what were

you examining and what were the differences between the

experiments and what were your conclusions? I found it

quite interesting, and would like for you to explain what

exactly the outcome was.

Muhsin: Sure. So generally what I was examining was, we wanted to

see how the human brain can make optimal decisions when

faced with uncertainty. So our brains are these amazing

devices, these statistical machines that, when they're faced

with uncertainty, they tend to actually make

optimal choices. And we wanted to explore that further. And

the way that scientists can do that is they can present

people with very simplistic tasks and then use neuro-

imaging techniques to start to pick apart the areas of the

brain that are implicated in decision-making.

So the way I did that was with a very mind-numbingly dull

task, which is I would get participants to come in a

laboratory and they would place their index finger on this

vibrating probe. And the task is called a vibrotactile

discrimination task. And they had to discriminate between

pairs of vibrations. So you get a vibration, pause, a second

vibration, and you have to answer a question, was the

second vibration faster? Yes/no response. And they had to

do that 400 times.


Kirill: Oh wow!

Muhsin: Yeah, yeah. So the participants were paid. But it was a very

boring task, but scientists have to do this, introduce very

carefully controlled conditions, so we can study bigger

things further up the neural pathways. So I took that task

into an MRI scanner. It's called functional MRI imaging,

which allows us to infer what brain regions are involved in

the task under study. So the way it works is that when you

are engaged in a particular task, the neural cells in your

brain that are implicated in that task will start to have local

blood flow to the brain, and the fMRI can actually image the

regions of the brain where there's local blood flow. That

blood flow is just an indicator of neural activity. So we were

able to find out what brain regions were involved in decision-

making when people were uncertain. So you can increase

the uncertainty in our task by making the vibrations very

similar to each other. So if you had pairs of vibrations where

they were only separated by a few hertz, then participants

really struggled to make that choice. And we were able to

highlight brain regions that were uncertain.

Kirill: Ok, interesting. So you found the brain regions that are

responsible for uncertain decisions.

Muhsin: We found ones that the literature commonly cited. So

they typically are these prefrontal cortical regions,

particularly one that was called the left dorsolateral

prefrontal cortex, which it's been years since I've actually

said that out aloud! But yeah, it's a region that's involved in

executive functioning and working memory. The neural

imaging, although it's very exciting, actually wasn't the most

interesting part of the PhD. It was more about the

behavioural sets that I was able to discern from the task.


Kirill: So what were they?

Muhsin: It's a little challenging to describe without drawing it, but I'll

do my best. The vibrations that our poor subjects had

to sit through tended, on average, to be about 34 Hz in

frequency. What I was able to show was that our

participants, they weren't really comparing two vibrations

when they made a decision. They compared the second

vibration with a memory of what the first vibration was. And

our memories, they degrade over time. And even over a short

period of time. So in my experiment, let's say that your first

vibration was 40 Hz. And that memory of 40 Hz, your brain

is just trying to remember 40 Hz. But soon it starts to

degrade and actually drifts down to the average. Feels more

like 34 Hz. So the brain is very good at representing the

environment as averages. And these kind of stereotypical

pictures. In the end, it didn't really matter what the second

vibration was, people were making comparisons to this

average that the brain was representing.
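
A toy illustration of the drift Muhsin describes here. This is not the model from his thesis: it simply assumes that the memory of the first vibration relaxes exponentially toward the session average of 34 Hz. The function names and the decay rate are hypothetical, chosen only to make the effect visible.

```python
import math

SESSION_MEAN_HZ = 34.0  # average frequency mentioned in the episode


def remembered_frequency(first_hz, delay_s, decay_rate=0.5, mean_hz=SESSION_MEAN_HZ):
    """Memory of the first vibration after delay_s seconds.

    Assumption: the memory trace relaxes exponentially toward the mean.
    decay_rate (per second) is an illustrative value, not a measured one.
    """
    weight = math.exp(-decay_rate * delay_s)  # 1.0 = perfect memory, 0.0 = pure average
    return mean_hz + (first_hz - mean_hz) * weight


def judged_faster(first_hz, second_hz, delay_s):
    """Answer 'was the second vibration faster?' using the drifted memory."""
    return second_hz > remembered_frequency(first_hz, delay_s)


print(judged_faster(40.0, 38.0, delay_s=0.0))  # compared against an intact 40 Hz memory
print(judged_faster(40.0, 38.0, delay_s=2.0))  # compared against a memory that has drifted
```

With no delay, 38 Hz after 40 Hz is judged slower. After a two-second delay under these assumed parameters, the memory of 40 Hz has drifted to about 36.2 Hz, so the same 38 Hz is judged faster, which is the kind of bias the conversation goes on to call the time order effect.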

Kirill: That's very interesting. That's a whole discovery on its own,

right?

Muhsin: Yeah, well, this was already a known effect.

It's been known for a while, and it's called the time order

effect. But what researchers weren't doing was that when

they set up this vibrotactile discrimination task, they weren't

accounting for it. So it was like you would set up a model

and they had their predictors of interest, so you know,

changing the frequency, or introducing noise as predictors.

But they didn't include the time order effect as a predictor.

And the time order effect, it was a significant influence. It's

something they should be accounting for in these tasks.


Kirill: Ok, yeah. Could you explain time order?

Muhsin: So the reason it's called the time order effect is because

people will bias their decisions based on the actual order of

the way that the vibrations are presented. So people have a

different response if they received 40 Hz first and then, let's say, 20

Hz, compared to 20 Hz and then 40 Hz. Because it's based on that

drift.

Kirill: Ok, got it.

Muhsin: So when you swap the orders, you actually come up with a

different type of way of decision making. Your decisions are

biased.

Kirill: That's really interesting. And we might be getting a bit

carried away here. The reason I'm so fascinated by this is

actually I recently read two interesting books. One is "The

Future of the Mind" by Michio Kaku, and he actually

explains all these things, like how the MRI works, and how

the left brain versus the right brain operates. So I'm really

interested in this. And the one I'm currently reading is

"Emotional Intelligence" by Daniel Goleman, and it also talks

about how IQ is not the only factor that decides whether a

person's going to be successful in life. There's also emotional

intelligence.

But moving on to more of the data stuff. So all of this data

that you were getting in through these experiments, I'm

assuming 400 tests that you were doing, how were you

storing this data, and what tools were you using to process

it?

Muhsin: I was using a program called Matlab, which is very popular

for engineers and it's a statistical program. And it's very


similar to R, except it's a commercial program. And

universities use it quite a lot. So we used Matlab to

essentially collect the data and also to store it. And then to

process it, there were other external contributors that had

created their own package called SPM, which is Statistical

Parametric Mapping. And what they did was they created

this package that took the neural imaging data and was able

to perform the statistical analysis that would highlight the

significant neural regions that were implicated in the task at

hand.

Kirill: Ok. I've used Matlab as well before. But also in research,

when I was doing my physics degree. So how do you find the

differences between Matlab and R? What would you say are

the advantages of either of the tools?

Muhsin: The reason I'm using R is because, for one, it's free. And also

I found that transition quite -- almost -- I wouldn't say

seamless, but it was quite intuitive, in that they're very

similar languages. Like these vectorised operations. So for

me to go from Matlab to R, for the most part it was pretty

intuitive. So although now I've been using R for so long, I

think I certainly have a preference. Also because the

community for R is so huge. The variety of packages is

impressive. So if you need something like a package that will

beautifully plot your data, that's available. There's packages

that will nicely wrangle and manipulate your data. It's been

years since I've used Matlab, but now I'm definitely in the R

camp.

Kirill: Very, very good. I agree. R has so much support around the

world, it's definitely a great tool to have in your arsenal. And

our listeners might be a bit confused thinking that you are

still doing your research and working in the neuroscience


space. You moved on from that, and you moved on to some

data-driven roles in different companies. Can you tell us a

bit more about why you decided to make that move, and

where exactly did you go?

Muhsin: I actually kind of moved on whilst I was doing my PhD. So

when I was writing up my PhD, I was actively looking for

work in industry in different sectors for data analyst roles.

The reason being is that I think if you wish to stay in

academia, it helps to be very passionate about a research

topic, and I didn't really find my niche in science. So I knew

that the academic career path wasn't for me. So I looked

elsewhere, I looked across different sectors, government,

non-profit, industry. Ultimately for me, it's always a case of,

I'm interested in human behaviour, I'd like to be exposed to

different data sets that capture that behaviour, and I look for

organisations that have that data and are in need of

someone of my background.

Kirill: So that's great. And then you moved on to more retail-

focused roles, so now you're a customer experience analyst

at Harvey Norman, and if I'm not mistaken, Harvey Norman

is a massive chain. Is it like warehouses, or what does

Harvey Norman do again?

Muhsin: Harvey Norman is a large Australian retailer. So we're a

retailer of furniture, bedding, computers, communications,

and electrical products.

Kirill: That's right. I was confusing Harvey Norman with Bunnings.

Bunnings is a different store.

Muhsin: They're one of our competitors.


Kirill: Yeah, there we go. So at Harvey Norman, how do you apply

those skills that you learned and you developed through

your PhD, through that academic research, those data skills,

how do you apply them, and how do they benefit you now in

your current role?

Muhsin: Being a large retailer, we have data. We have lots of data.

What wasn't happening prior to my role was that a lot of

that data was being left unanalysed. It was being used in a

way that you could build a one-off report, but my manager

wanted to do something more with that data. So the skills I

acquired from the PhD, there's the obvious -- the coding

skills, where I was able to wrangle data and tidy it, and then

produce something useful. That's something that I use every

day. But the more generic scientific skills that I picked up

are actually quite useful. My scientific background, it kind of

made me a bit of a careful planner. So when you're setting

up an experiment, an MRI session costs $600 an hour. You

do learn to be quite methodical and prepared. It's almost like

project management, really, in that for a given project at

Harvey Norman, I'm very good at mapping out what's

required, what resources do I need, what's the best

approach, and do I need to speak to anyone to get the job

done, and get past any hurdles. So doing a science degree, it

does teach you a lot of skills that I think a lot of people in

science don't realise have a lot of applications outside of

academia.

Kirill: That's very interesting. And resonates very well with what

Wilson Pok in episode 3 mentioned, the same thing, that

academic background. And Wilson has a PhD in

nanophysics. That academic background helps you set up

the problem, identify the problem, the challenge and set up


for solving it. So like you say, that project management, or

pre-project management, skills are very powerful. So would

you say that for our listeners who are in academia and are

kind of contemplating on maybe making the move into data

science and more industry or retail focused, would you say

that this is one of their top skills that they should sell

themselves or advertise themselves as, focus on that skill

that they can actually identify these problems and set up

projects to minimise expenses, maximise how quickly they’re

processed and things like that? Would you say that that is

one of the key skills that academic-minded people are able

to develop?

Muhsin: Yeah, absolutely. I think that if you do a PhD, it’s really

project management. Not in the traditional sense, but you

really are given a problem to solve. And obviously, with a

PhD, you have a lot more time to really focus on it, but it is a

different way of thinking in that you go from science, but

you should really pitch yourself as someone that can find

solutions in a very ambiguous scenario. And in business,

what I found is you’re often faced with a lot of ambiguity.

Not just in the project itself, but also around the particular

resources that are available. And I think another advantage

going from academia or science to industry is that having

access to things like being very familiar with open source

tools. So when I started, I had the option to request various

commercial tools to do my job for data analysis. And

because it was exploratory, I was actually more keen on just

trying things out with R and other open-source platforms,

and then we found that it completely suited our purposes

and now I’m actually introducing it to more people across

the organisation. So you can bring a lot of what you learned


in academia and be very surprised about what the business

finds useful.

Kirill: Yeah, wonderful. And thank you for that support. I’m sure a

lot of our listeners will find that very encouraging. So if you

are in a situation like Muhsin was, where you’re in academia

and you’re maybe not enjoying as much what you’re doing,

or you’re not as passionate about it as you thought you’d be,

then don’t be afraid of this move into data science because

those skills—like, we’ve had great examples. We’ve had

chemical engineers, we’ve had nanophysics, and now

Muhsin is more neuroscience. So people from all different

sides of academia are moving into data science. You will

always find skills and ways of thinking that you have

developed that will highly benefit you in the data science

role. And that’s not to say that people that are not from

academia shouldn’t move into data science, it’s just good

encouragement for those who might find themselves a bit

stuck in academia, that there is this whole other world that

you can explore.

So we’ve discussed a little bit how you moved from academia

to being a data analyst at Harvey Norman. And before we

started this podcast, you mentioned to me that slowly, as

you’re picking up new skills through these courses and your

own education or self-education, your role is slowly

transitioning from a data analyst to a data scientist or that

there is an option, a pathway that the company is happy for

you to undertake to slowly transition to a data scientist.

What does this entail in your view, and what kind of new

responsibilities or new methodologies and new ways of

working does this transition from a data analyst to a data

scientist entail?


Muhsin: So I’m fortunate to be a part of the team where it’s

encouraged to research and explore and come up with new

techniques to find insights around data. So with traditional

data analysis skills there’s reporting and there’s

dashboarding, where you can showcase the information at

hand. But what I’m looking forward to with the data science

skills that I’m picking up is, can we gain insights from the

data that we didn’t necessarily consider before and, quite

frankly, can I make my job easier. So an example is, we

collect very simple customer experience data which tends to

be post-purchase service where we ask customers for ratings

based on their shopping experience. And that can be from

very good down to poor. We also ask them for feedback and

that’s where customers can leave us open-ended responses

based on how they viewed our services. And one of the first

tasks I’ll perform with data science techniques is text

clustering. I’d like to take all that text, of which there is a

large volume, and apply some machine learning to cluster

that information into categories and really highlight to the

organisation that this is what our customers are saying en

masse.

Kirill: Very interesting. And what tools do you think you're

going to be using for text clustering?

Muhsin: To begin, I’m actually looking into Python. So even though I

use R predominantly, I’m now picking up Python and I’ve

been told to look into something called topic modelling

specifically, which I’ve been told is very good for comments

of widely ranging length, whether they are short or

long. Our customers might say very little and then they may

say a lot. So my hope is that that modelling approach will be

able to factor that in.
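
A minimal sketch of what grouping free-text feedback can look like. It is deliberately not topic modelling (such as LDA), which is what Muhsin says he plans to explore. It swaps in a much simpler stand-in: word-count vectors compared by cosine similarity and grouped in a single greedy pass. The sample comments, the stopword list, the threshold, and all function names are made up for illustration and are not Harvey Norman data or code.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "was", "and", "to", "very", "my", "is", "for", "of", "i", "it"}


def tokenize(comment):
    """Lowercase a comment and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", comment.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_comments(comments, threshold=0.3):
    """Greedy single-pass clustering: join the most similar existing cluster
    if its similarity reaches the threshold, otherwise start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [str]}
    for comment in comments:
        vec = Counter(tokenize(comment))
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # summed counts act as an unnormalised centroid
            best["members"].append(comment)
        else:
            clusters.append({"centroid": vec, "members": [comment]})
    return [c["members"] for c in clusters]


# Invented example feedback, short and long, as in the conversation.
comments = [
    "Delivery was late",
    "Late delivery and no phone call",
    "Staff were friendly and helpful",
    "Very helpful staff in store",
]
groups = cluster_comments(comments)
for members in groups:
    print(members)
```

On these four invented comments the two delivery complaints end up in one group and the two remarks about staff in another. Cosine similarity normalises by vector length, which is one simple way of coping with comments that range from a few words to a paragraph.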


Kirill: Okay. And so from this text clustering, walk us through your

thinking. How are you going to take these comments? Are

you going to then put that unstructured data into structured

data and then analyse it and what insights do you hope to

extract from it, or do you have some different approach?

Muhsin: Actually, I’m not entirely sure yet. This is all very new. In

fact, it’s so new that my first real attempt will be this

weekend.

Kirill: Oh, wow! Okay. That’s really exciting and best of luck with

that. Sounds like an interesting undertaking.

Muhsin: Yeah, I’m looking forward to it. Ultimately, what we want to

do is have these techniques to organise the data in a more

intuitive way for our end users to digest, really.

Kirill: Yeah, it’s always good to put data in the right format for end

users to be able to take action or make decisions on that

data. And what you mentioned about finding this new data

that hasn’t been used before kind of resonates with the

notion that was brought up in one of the previous podcasts

of a data landscape, of identifying or mapping out the data

that you have. I would assume that an organisation such as

Harvey Norman would already have all this transactional or

customer data mapped out pretty well. But what you are

actually doing is you’re finding this new data that the

company didn’t even think about, and you are adding it to

the data pool so that you can combine it with other data

sources in order to extrapolate certain decisions, or come up

with certain insights. So is that something that you have in

mind in combining this data with existing data sets that you

have in the company already?


Muhsin: It’s something that we certainly do, where we’ve got our

customer experience data, and of course there's

transactional data, and we can combine the two. Of course,

given the breadth of data we have, it doesn’t mean that you

should combine everything. There are various reasons why

you couldn’t or shouldn’t. You know, for privacy issues, for

instance. Is it possible for us to do that? Yeah, you can gain

a lot of insight by talking to somebody in a different

department and learning more about the practices that they

go through and coming up with different ways of shedding

insight into our customers.

Kirill: Yeah. Wonderful! And I would like to move on a little bit to—

a topic I’d like to explore and I’m interested to hear your

opinion on this, is you studied both neuroscience and now

you’re getting into machine learning. And we know that

machine learning is kind of -- the premise of machine

learning, or some of the algorithms and branches of machine

learning, is to model the human brain or take certain

aspects of the human logic and model them in a machine

way so that decisions can be made faster. What is your take

on the similarities or discrepancies between neuroscience

and machine learning?

Muhsin: Big question! I’ll be honest with you, I’m not entirely sure.

Kirill: Through learning, through undertaking these studies in

machine learning, do you see any resemblances with what

you used to pick up in your PhD and about how the human

brain is structured?

Muhsin: Yeah, so I guess an example is -- it's a modelling approach I

haven’t had an opportunity to explore yet, but there’s

Bayesian inference. So I think with Bayesian statistics,


that’s when you need to update a probability, you can

actually look back to past instances to inform your current

decision. And we actually used that analogy quite a lot with

my PhD findings. So when I was talking before about the

time order effect, what essentially the brain is doing is that

when given some stimuli and it needs to make some

decisions, it will take those stimuli, but it will compare them to

past instances, like the average of a frequency. We’ve always

viewed that as being very Bayesian.
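
The analogy can be made concrete with a one-step Bayesian update. The sketch below assumes a Gaussian prior centred on the 34 Hz session average and a single noisy Gaussian observation of a 40 Hz vibration. The standard deviations and the function name are illustrative assumptions, not values from the PhD.

```python
def gaussian_posterior(prior_mean, prior_sd, observation, noise_sd):
    """Combine a Gaussian prior with one noisy Gaussian observation.

    Returns (posterior_mean, posterior_sd) using precision weighting:
    the less reliable the observation, the closer the result stays to the prior.
    """
    prior_precision = 1.0 / prior_sd ** 2
    obs_precision = 1.0 / noise_sd ** 2
    post_precision = prior_precision + obs_precision
    post_mean = (prior_mean * prior_precision + observation * obs_precision) / post_precision
    return post_mean, post_precision ** -0.5


# Prior: frequencies in the session average about 34 Hz (from the episode).
# The standard deviations below are illustrative guesses.
sharp_memory = gaussian_posterior(34.0, 5.0, 40.0, noise_sd=1.0)
faded_memory = gaussian_posterior(34.0, 5.0, 40.0, noise_sd=5.0)
print(sharp_memory)
print(faded_memory)
```

When the memory of the 40 Hz vibration is precise, the estimate stays close to 40 Hz. When it is as uncertain as the prior, the estimate lands halfway, at 37 Hz, pulled toward the average in the way the time order effect suggests.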

So yeah, there are – at least in a very analogous way, a lot of

machine learning approaches have modelled what the

human brain does. And I heard that some other PhD

colleagues of mine at the time, they were getting very heavily

involved into neural networking, of which I’m not entirely

sure how much that reflects the human brain. I know

anecdotally, I believe IBM has an initiative called "Big Blue",

where their plan was to, and I may be paraphrasing

incorrectly here, was to essentially build a brain from

scratch but with a machine. So it’s almost like you’re

building a brain cell by cell, but it’s using code. Certainly, I

think the inspiration of using machine learning techniques

inspired by the brain, I think it’s a very good way of seeing

what evolution has constructed in trying to replicate that in

a way that we can reproduce.

Kirill: It’s very interesting you mention that, because I was reading

a book—I think it was last year I finished reading a book

called “Bold” by Peter Diamandis. I’ve mentioned it a couple

of times already, but there he discusses in detail Moore’s

Law and how—obviously, Moore’s Law states that—Gordon

Moore started Intel back in 1965 and he then came up with

this concept which was later called Moore’s Law in 1968. He


noticed that the price of an average computer is dropping in

half every 18 months, but the processing capacity of the

average computer you can buy somewhere in the stores is

doubling. So basically that’s the main concept, that the

processing capacity, the processing power, is doubling every

18 months, and that law has still held constant. And based

on Moore’s Law, right now we already have computers that

think as fast as a rat’s brain, or the brain of a mouse.
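
The doubling arithmetic can be checked in a few lines. This sketch takes the 18-month doubling period exactly as quoted in the conversation and only shows how quickly repeated doubling compounds. It is arithmetic, not a forecast, and the function name is hypothetical.

```python
def growth_factor(years, doubling_months=18.0):
    """How many times capacity multiplies if it doubles every doubling_months."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


print(growth_factor(1.5))   # one doubling period
print(growth_factor(15.0))  # ten doubling periods
```

Fifteen years at this rate is ten doublings, a factor of 1,024, which is why extrapolations over several decades produce such large numbers.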

If you extrapolate Moore’s Law, which has held constant

since 1968, which is like crazy, the human brain—we’ll have

computers that think as fast as a human brain by 2025.

And that’s why that’s a big number in machine learning and

why everybody is looking forward to the year 2025 because

that’s, according to Moore’s Law, when we’ll get those

computers. And by the year 2050, if Moore’s Law still

holds constant, we’ll have an average computer which you

can buy for like $1,000 in Harvey Norman which will think

as quickly as the whole human race. And that is like insane.

Right now, we don’t even have the infrastructure or the

computers to reconstruct the human brain, but by the year

2025, we’ll be able to do it with an average computer which

people will just have in their home. It’s very interesting

where the world is going and what implications that will

have. What are your thoughts on the implications? What do

you think will happen if all of a sudden we’re going to be

able to reconstruct human brains cell by cell?

Muhsin: That’s fascinating! I’m a little worried now. I didn’t realise

that we had advanced to that point where by 2025, we would

have a machine that could think at that capacity because at

the time when I was doing my PhD, I knew that the

advantage that the human brain has over the machines is


the capacity for parallel processing in that each cell can have

many multiple connections to various cells and from cells

located great distances away from each other. I mean, I

guess it’s slightly concerning in that I’m quite convinced

that—we already know that machines will be taking over a

lot of our jobs. Automation is a reality, but that’s fine. I

think it will actually open up new endeavours. But it’s kind

of hard to think about what the future will look like because

it wasn’t that long ago that there was no Google, there was

no Facebook or Uber or Airbnb. I don’t know what’s around

the corner. Even now I’m still impressed when you get an

alert from Google asking you to review something, even

though you’ve never really explicitly told anyone that you’d

been to a restaurant and yet it’s sending you a notification

saying, "You should review this restaurant." It knows that

you’ve visited there with friends before. It’s interesting to see

how that large volume of data and our daily interactions,

and how that will actually start to shape our physical

world and our presence in our day-to-day. And then

combining it with machines that can think very quickly.

Who knows what the future will hold?

Kirill: Yeah, it’s getting freaky. Just a couple of days ago, I had this

weird experience where I clearly remember—I don’t

remember exactly what it was about, let’s say it was about a

gym membership. And I clearly remember I wasn’t Googling

that topic, and I wasn’t even searching it or anything. I was

just speaking to somebody about it, and then the next day I

get a Google pop-up. I get this impression that maybe our

phones are actually taking in everything that we say and

then Google or some other online project is analysing all of

the textual information that comes from there and that’s


how we got the ads. Because that’s happened like once or

twice to me just over the past month. Have you ever had an

experience like that?

Muhsin: Yeah, on occasion. Basically everyone has experienced

targeted advertising, but it’s getting to the point now where

people are almost expecting that level of service. I’ll admit

I’ve actually clicked on targeted ads because they’re spot on.

So it’s quite a shift in—whereas before it was quite intrusive,

but I think increasingly we’ll start to almost demand and

certainly expect this level of—I use the word "service". I’m

quite paranoid. I think it’s a given that a lot of these devices

are recording us or watching us. I think Mark Zuckerberg

has his little piece of tape over his phone. You know, I think

that says quite a lot. It’s the world we live in now, and I

think a healthy dose of paranoia is not a bad thing. But

yeah, I do find it interesting that we have certainly shifted

into a world where we’re becoming increasingly comfortable

with it, which is fine. In some respects, maybe it will

improve something like the shopping experience, for

instance. And it could have other beneficial effects. Let’s say

you’re—this is just a silly example—accused of a crime, but

based on your digital footprint, you’re able to clearly prove

that you were not involved. We leave so much data behind in

our day-to-day that we’ll be able to defend ourselves. So it’s

not entirely a negative thing, but who knows how data could

be utilised.

Kirill: Totally. It can go both ways. And the thing is that you can’t

really stop it, right? It’s gonna go where it’s gonna go. It’s

already in motion. The machine is working, there’s no

stopping it and I get surprised at these movies about

Terminator and Skynet, and then people are like, "We


should be worried about Skynet. Maybe that can happen in

real life. We should not create Skynet." But if you think

about it, Skynet is already there. It’s like this big massive

thing called the internet which you cannot switch off and it’s

always gonna be working and collecting knowledge and

information. It’s just waiting for artificial intelligence to get

onto this internet and quickly learn all of the things that

we’ve been learning and then we’re kind of at the mercy of

that artificial intelligence. Don’t you think?

Muhsin: Absolutely. I’m concerned about how all this data will be

used to the point that I actually find myself disengaging from

some platforms because I don’t want to feed the beast

necessarily, but sometimes you can’t avoid it and you have

to. I remember early on I used to be quite hesitant about

making purchases online but I certainly do it now. There’s

no way to avoid it. And I think you can’t be an engaged

citizen unless you’re using the internet. You can’t go

completely off the grid. I mean, some people try and do it,

but it’s not really a pathway for many people.

Kirill: Yeah. So just like on that—most of this, you know, these

recommender systems that we spoke about, these AI and

mimicking the human brain in 2025 and so on, all of that is

governed or most of that is governed by machine learning

algorithms. And now you’re getting into this field of machine

learning, which I really admire, and I’m sure a lot of our

listeners want to get into it. It’s the new data science. Like, if

you want to be a data scientist in the next 10 years, you

need to know about machine learning. So my question to

you is, machine learning is so broad. Off the top of my head,

I can probably name at least seven different branches of

machine learning including clustering, classification,


association rule learning, deep learning and so on. How do

you go about getting into—you know, taking your first steps,

or like mastering the first algorithms in machine learning?

How do you decide where to start? How do you actually go

about conquering this field? What is your plan, and what

can you recommend to those who are just looking at

machine learning and want to get into it as well?

Muhsin: Yeah, that’s a good question. I guess there are various

approaches. But the one that I took was—I mean, being a

data analyst I knew that I needed more power and more

techniques and abilities. So at some point you kind of reach

a bit of a limit with what you’re currently doing. And at the

time when I was transitioning into industry, I think data

science became a thing. And my approach was—I was

fortunate to be in an age where online platforms, you know,

the massive online open courses were starting to become

available. So, one big introduction for me was that I learned

R programming by the Johns Hopkins Data Science Coursera

course. That’s a series of 8 or 9 courses where they take you

through cleaning data and presenting it and they teach you

R. That was my avenue into this world. And then

increasingly from there, you realise that there’s a wealth of

information available, whether it’s from online courses,

various providers.

Oftentimes, I’m on Stack Overflow every day, you just

Google for answers. So it’s a case of when you’re in a

position when you need answers. I mean, the Internet is a

great resource to find those answers. So for people just kind

of getting into it, yeah, the more time you spend doing data

analysis, and the more complex questions you have, I found

that I start to reach a limit. And that’s why I feel machine


learning is a way to broaden the horizons in terms of what’s

possible. So that’s why before I said I’m really interested in

what can machine learning do for me in terms of

highlighting things that I didn’t realise before. And I think

that’s a very powerful concept, in that traditional reporting

or traditional dashboarding is presenting the data you kind

of already know something about. Like, people have an

intuition about the data and you’re showcasing that data.

But with machine learning, because of the sheer power that

machines have, they’re able to potentially highlight

something that a person couldn’t have considered.

Kirill: Yeah, definitely. And especially with the huge volume of the

data, or maybe the complexity of the data, they help you

break those barriers and get those insights which otherwise

would have taken you ages or you wouldn’t even have

thought of. That’s some really great advice. You’re totally

right that with the new world that we have all these online

courses, massive online open courses and just any kind of

online education. Anybody can get into this field and

anybody can slowly start exploring step by step what

machine learning can do for them and how to get into this

field. I was actually at a conference last week. It was related

to data – it was on digital marketing, so there was a lot of

things related to data. It wasn’t specifically related to

machine learning, but somebody at the conference, one of

the speakers, said that the world we live in, you can learn

anything. You just have to know how to Google really, really

well. And that's resonated with me now when you mentioned

Google. It’s exactly right. So if you want to find out anything,

if you want to learn pretty much anything on this planet,

you just open up Google and it’s about how well you can


Google that topic, and how quickly you can find your way

to that information and access it.

Muhsin: Absolutely. You said it best there. It’s about how well you

can actually structure the question. So when I’m teaching

people programming, one of the first skills I say they should

develop as quickly as possible is finding the right words to

use to find those answers. Provided you search correctly, the

more you read, the better the language becomes in terms of

how you hunt down those answers.

Kirill: Yeah, totally, and when you say teaching programming, does

that mean in your workplace? You’re like helping out your

co-workers or is that something else?

Muhsin: Yeah, yeah, helping out co-workers. So at the time when I’m

learning I’m also—I find the best way to learn is to try to

teach someone else.

Kirill: Exactly! I feel the same way.

Muhsin: Yeah. I think that if you can convey something in a way that

you yourself understand, and that someone else appreciates,

I think that’s beneficial for both parties. And that’s why I

really like it when people can break down very

complex, often uninviting information into something that’s

demystifying and palatable. So, yeah, I’ve often in past jobs

sat down with co-workers who were kind of stuck in their

Excel world. I love Excel, but if you only know Excel, then

you have an upper bound in terms of what type of analysis

you can do. And they have approached me to learn a bit

more programming and I’m always pleased when they

actually progress. And it’s kind of nice feedback to know

what you’re teaching is getting through. Yeah, it’s not just

collaborative. Recently I’ve had an intern who worked with


us for the past three months, and it was great to see her

flourish and pick up a few new skills in data wrangling and

some data science techniques.

Kirill: That’s really cool. I admire that a lot because I do the same

thing through my courses online. What would you say, for

somebody who is as passionate about data science and

bringing the culture of data science as you are, into their

organisation, what would you say your best tip would be on

when they want to spread this information, when they want

to maybe teach somebody programming or they want to just

increase awareness of data science and get somebody

excited about data science? What would your best tip be for

somebody in that situation?

Muhsin: Keep it simple, but be mindful of that person’s end goals.

Often someone will come to you with a problem that they

want solved. And that could be anyone. It could be senior

manager that says, "I need these numbers," or it could be a

colleague that wants to learn more programming that says,

"I’d like to process my data in this Excel sheet more

efficiently." I think once you identify the goal, then you kind

of need to break down that complexity into really nice bite-

sized chunks and then kind of step through that process,

but keep coming back to the end goal of where, you know,

we’re working towards something in order to achieve what

you want. So I think that it's often the case that particularly

for large projects, it can be overwhelming. And you don’t

quite see what’s over the horizon. But if you keep in

mind the purpose, then it can be motivating. Particularly

when you’re working with someone, it’s always nice to

bounce ideas off each other. It’s good to get feedback, and


keep people in the loop. Yeah, I’m always a big advocate of

keeping things simple and being mindful at all times.

Kirill: Yeah, I totally love it. I have the same approach. I break

down my courses into sections, and in every section, there’s

like a challenge that needs to be solved. And I’ll always

describe the challenge, and sometimes I’ll even show—when

it’s like the super complex challenge -- I’ll show what the end

result is, and what we want to achieve, what that end

visualisation looks like, or end insight looks like, and I’m

like, "Okay, that’s where we’re gonna get to, and we’re gonna

break down the process into steps," like you said, into these

simple steps, and it really helps. So keeping that end goal in

mind keeps people focused, I guess, on why they’re doing

this. Rather than just learning the mechanical steps, they’re

actually learning the reason behind why they’re learning

what they’re learning.

Muhsin: Yeah, absolutely. And that’s a great technique to actually

show people, "Look, this is where you’re heading." I

remember the intern I mentioned before, when she started

with us, I told her it wasn’t that long ago that I was where

she is now. And with a number of years learning the

processes of data wrangling and picking up some data

science techniques, she can get to where I’m currently at. I

think that she appreciated that, knowing that even though

something may take some time, it’s an achievable goal.

Yeah, it’s good to find someone, like the people that have

reached that point, and they can actually tell you, "Look, it

is possible. Here’s what you can do about it and here are

some tips to get there."


Kirill: Fantastic! Thank you for sharing that. And I’ve got a couple

of interesting questions for you. First one will be: What has

been the biggest challenge for you as a data analyst?

Muhsin: So the biggest challenge as a data analyst is often kind of at

the two ends of the data analysis pipeline, and that’s

getting the data and then distributing the data. So often

in an organisation it can be challenging to acquire different

data sets for various reasons. There are multiple business

constraints, whether it’s potentially political or ethical, there

are privacy concerns, it may be practical reasons, so you

might need the resources of someone from IT, but they’re

just too busy. And because of those reasons, often getting

the data and then sharing it, such as pushing your data out

into a dashboard could become a bit of a hurdle. So that

tends to be a bit of a challenge, at least what I’ve seen in big

organisations. In those circumstances it means that you can

kind of wait until a solution occurs, or you find a

workaround, and then focus on another part of the pipeline.

Another challenge when it comes to data analysis and data

science for me is not having other analysts close by. So I

mean, I am the analyst of my team, and it’s great to bounce

ideas off people and just come up with different ways of

analysing data and talking through a problem. And when

you don’t have those people immediately with you, you can

approach other people so I’m often talking to people from IT

and our solution architects and they’re great at getting you

past blockages. But then also reaching out to analysts

across different departments, even though they’re not within

your immediate team. So you can kind of build a bit of a

community depending on how large your organisation is, or

even go outside your organisation and join various meet-ups


and just share a lot of your experiences and learn from

people as much as possible.

Kirill: Yeah, I find that that’s very valuable advice. There are

always these organisations and always these altruistic

groups and meet-ups where people want to share their

knowledge. You know, it’s like learning a language. You want

to converse with the people that already know the language,

or are also already learning it so it just makes it more fun

and interesting. And there are so many different groups that

you can join. Pretty much in any medium to large-sized city

there’s gonna be at least one group of aspiring data

scientists or analysts that you can join and share

experiences. That’s some great advice. And what would you

say is your recent win that you had using data science or

data analytics that you can share with us?

Muhsin: What I’ve been spending the majority of my time doing is

building an analytics dashboard platform. This is where I’m

grabbing all our various customer experience data and other

data sets, and I’ve got a bit of a pipeline at the moment

where I’ve got my R scripts and some Bash scripts where I’m

automating a lot of the processing, the data wrangling, and

then the tidying of data, and then piping that into

dashboards. And we’re going down the path of an automated

solution, because I want this system to be as independent

from me as possible. I want it to be something that lasts long

after I’ve left the organisation. I mean, that would be quite

flattering if this system is still chugging along even when I’m

not there. Recently we’ve been showcasing these

dashboards, this analytics platform’s potential and a lot of

people across the company are asking about it and we’re

about to launch it. So yeah, I’m looking forward to using


these dashboards and this platform as a means to showcase

that you can make decisions through data analysis, data-

driven approaches. And these dashboards would also serve

as a means to present a lot of those insights gained from

data science, of course in a very simple, palatable way, so

that the end user, whether it’s a staff member in a store or a

senior manager, can take that information and

action it and do something meaningful with it.
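
A pipeline like the one Muhsin describes (wrangle, tidy, then feed a dashboard) might be sketched as follows. This is a hypothetical illustration in Python rather than the R and Bash scripts he mentions; the sample data, the column names `store` and `score`, and every function name are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw export: inconsistent store names and one missing score.
RAW = """store,score
 Sydney ,4
sydney,5
Melbourne,
melbourne,3
"""


def wrangle(text):
    """Parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def tidy(rows):
    """Normalise store names and drop rows with a missing score."""
    cleaned = []
    for row in rows:
        store = row["store"].strip().title()
        score = (row["score"] or "").strip()
        if store and score:
            cleaned.append({"store": store, "score": float(score)})
    return cleaned


def summarise(rows):
    """Compute the mean score per store, sorted by store name."""
    by_store = defaultdict(list)
    for row in rows:
        by_store[row["store"]].append(row["score"])
    return {store: mean(scores) for store, scores in sorted(by_store.items())}


def export(summary):
    """Render the summary as CSV text that a dashboard tool could load."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["store", "mean_score"])
    for store, value in summary.items():
        writer.writerow([store, value])
    return out.getvalue()


def run_pipeline(text):
    """Chain the steps so the whole flow can run unattended."""
    return export(summarise(tidy(wrangle(text))))


if __name__ == "__main__":
    print(run_pipeline(RAW))
```

Keeping each stage as a small function with plain inputs and outputs is one way to make such a system independent of the person who wrote it, since a scheduler can call `run_pipeline` without anyone stepping through the stages by hand.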

Kirill: That’s totally a very admirable thing to do. And you

mentioned that it’d be flattering if those dashboards were

still there even if you move on from that role and you move

on from maintaining those dashboards. This brings me to

the next question, which is kind of in line with that. What is

your favourite thing about being a data analyst or

being in data science? What inspires you the most to keep

doing what you’re doing?

Muhsin: Definitely the creative aspect of the work. I think there’s a lot

of creativity in data analysis and data science. That’s why

I’m not entirely sure the machines will completely take us

over unless machines can master creativity. If they do then

we’re in a little trouble.

Kirill: In a lot of trouble.

Muhsin: Yeah, yeah. But yeah, I really enjoy being able to combine

and transform data, and then with my neuroscience

background think about different ways of highlighting

behaviour, whether it’s customer behaviour or staff

behaviour or even just organisational behaviour. And when

I've thought about a different way to do that, then taking a

step back and thinking, "Okay, given the data that we have,

given the resources that we have, is it possible to actually


code this up and then present it?" So I really enjoy that

creative aspect of work.

Kirill: I totally agree with you. It’s interesting, you know. It’s such a

technical field. It’s such a—you know, the numbers,

mathematics, algorithms, statistics, programming. It’s so

technical and yet there’s so much room for creativity. You

don’t expect that. You think creative subjects are arts, and

English, and literature, but here, in data science, amidst all

of that technicality, there is something that is so inspiring

about this creative aspect of the job.

Muhsin: Absolutely. Yeah. I mean, the thing I love is that in this field,

we get to build something that didn’t exist before. You know,

we’re building useful products. So I think it’s a very creative

pursuit.

Kirill: Yeah. Thank you very much for sharing all that knowledge

and coming on the show and sharing these insights. I just

have a couple of quick questions to finish off this episode.

What is your main career aspiration that pushes you

forward to better yourself and become a better data scientist

every single day?

Muhsin: For me it’s overcoming a challenge. So a lot of the times

when you have your end goal in mind and you’ve broken

your task up into something that’s small and achievable at

each stage, it’s really focusing on those little tasks and

investigating the ways that you can overcome and conquer

that challenge. That’s using a variety of tools and the

resources you have and speaking to people to help you

achieve that. So I really like that aspect of the field where

you can keep pushing yourself to the next stage. Yeah, it’s

almost like an inevitability where you can—even though at


the time you’re getting over a hurdle, once the goal is

achieved you can kind of look back and realise along the

way you’ve picked up all these skills, had great

conversations with people, and you learned quite a lot. At

the time, the motivation is, "I just really want to solve this

task at hand."

Kirill: Fantastic. And kind of the way I think about it is that on one

hand, you’re learning something on the very edge of what

technology is capable of, of the algorithms that humans have

come up with so far. You know, it’s not creating something

brand new in terms of machine learning. These are things

people have already created and used before. But at the

same time when you apply it in your specific challenge, it

makes it unique. Even though you’re learning something

that already exists, when you’re applying it in the

circumstances, and to the business problem at hand, it

makes it unique. And that gives you that fulfilment that

you’ve actually done something new, you’ve created

something brand new for the world. And I completely agree

with you. That’s such a great feeling, that even just that is

worth waking up in the morning and going and doing it

again.

Muhsin: Absolutely. You said it very well in that I could go online and

find someone that has outlined how to perform a particular

method of clustering text and then apply it to my own data

set. And because I’m so familiar with that data set, it’s

motivating to see how the insights start to take form. And

you’re right, the end result is that you end up creating

something new and for your organisation that’s ideally

something that will be useful.


Kirill: Yeah, I totally agree. Thank you so much again, Muhsin, for

coming on the show. If any of our listeners would like to get

in touch with you, maybe follow your career, how can they

best find you?

Muhsin: I’m on LinkedIn. They can just search for my name and find

me there. I also have a blog that I don’t update frequently

enough but now that I’m mentioning it perhaps I’ll get back

to writing more because I do enjoy writing. So on that blog,

it’s called probablyabetterway at blogspot.com.au and I can

send you a link.

Kirill: Interesting. All right, definitely. We will include that in the

show notes. So if you’re listening to this podcast, definitely

check out Muhsin’s blog, probablyabetterway at

blogspot.com.au. And one final question for you today: What

is your one favourite book that can help our listeners

become better data scientists and analysts?

Muhsin: I’ll mention the book that had a huge influence on me. It’s a

very popular one that I’m sure many people have read. It’s

called “Freakonomics” by Steven Levitt and Stephen Dubner,

and for me it was the first of its kind to showcase how our

assumptions can be challenged by study design and data

analysis. And it really did invite me into the methodologies of

how real world data can uncover really surprising

behavioural insights of people. Since then there have been a

lot of similar books. I know you’ve asked for one but I’m

gonna mention a couple more.

Kirill: Sounds good. Go for it. Go for gold.

Muhsin: Another popular science book I recommend people look

at, particularly people with a science background like mine,

is called “The Invisible Gorilla” by Christopher Chabris and


Daniel Simons, and that’s probably the best popular science

book I’ve read that shows how our intuitions can deceive us.

And a number of your guests have mentioned Nate Silver’s

"The Signal and the Noise," which I highly recommend. And

a fun and probably also depressing one is called "Dataclysm"

by Christian Rudder. He is a person who’s worked for the

online dating site OkCupid. He’s crunched the numbers for

online dating to reveal some pretty sobering insights of how

people try to find The One online.

Kirill: Very interesting. Thank you very much. There you go, guys.

We’ve got a whole library of books: "Freakonomics", "Invisible

Gorilla", "Signal and the Noise", and "Dataclysm". Once

again, thank you so much, Muhsin, for coming on the show

and sharing your insights. I’m sure so many people will find

so much value in this conversation that we had just now.

Muhsin: Thank you for having me.

Kirill: All right. Take care and good luck with your machine

learning aspirations.

Muhsin: Thank you. All right, all the best!

Kirill: So there you have it. I hope you enjoyed this podcast. We did

go into quite a lot of interesting conversations. I really

enjoyed the conversation about machine learning, about how

somebody who’s completely new to this field, somebody

who has experience in data science but

wants to tackle the challenging field of

machine learning, went about it, and what kind of

tactics and approach Muhsin has been using to get

into the field of machine learning. I think that can help

everybody get started in that field.


Also, I always enjoy talking about Moore’s Law, so hopefully

you picked up some valuable stuff from there. Once again,

check out the book called "Bold" by Peter Diamandis if you

want to learn more about Moore’s law. It’s described in great

depth there and it is the governing law of how computers are

developing in that exponential curve. So all in all, I hope you

enjoyed this episode and of course, we’ll include all the links

to the mentioned resources, so you’ll be able to follow Muhsin

on LinkedIn and check out his blog. So don’t forget to go

to superdatascience.com/9 and while you’re there join the

SuperDataScience community and hang out with more

people like Muhsin and other students of SuperDataScience.

I can’t wait to see you next time. Until then, happy

analysing.

