Kirill: This is episode number 9, with neuroscience PhD turned
data analyst Muhsin Karim.
(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill
Eremenko, data science coach and lifestyle entrepreneur.
And each week, we bring you inspiring people and ideas to
help you build your successful career in data science.
Thanks for being here today and now let’s make the complex
simple.
(background music plays)
Welcome to this episode of the SuperDataScience podcast.
Today I've got a special guest, a student of mine, Muhsin
Karim. And Muhsin actually reached out to me on Udemy
just to say thank you for the courses and I asked him for his
LinkedIn. Then when I looked at it, I was so fascinated.
Muhsin actually has a PhD in neuroscience and quite
some experience in academia. And then he went down the
path that we have witnessed so many times on this podcast,
where people from academia become data analysts and data
scientists.
So in this podcast you'll find out what drove Muhsin to
change profession and move away from academia into
data analytics, and which skills from his
neuroscience background and the data processing he
did there he leverages in his current day-to-day role. It was also
really interesting to discuss where
the world's going in terms of data, what machines are
becoming capable of, and how the human brain is going to be modelled
by machines in the upcoming years. We talked about
Moore's Law in quite some depth, so if you're not familiar
with it, this is a good episode to pick up on
Moore's Law, which currently governs how computers are
developing and dictates that exponential curve in
technology.
Also in this episode, we go into quite a lot of depth about
how to get into the space of machine learning. Muhsin
has quite a lot of experience with data from his PhD. He
uses R pretty much on a day-to-day basis and
has expertise in some statistical methodologies, and
his next venture in data science and analytics is to get
into machine learning. He's actually taking one of our
SuperDataScience courses on machine learning, and he'll
share some tips and advice on how you can get into the
space of machine learning: how you can tackle this huge,
broad, challenging field, how you can
dip your toes in the water and get into it
slowly. He'll share his experiences of how he's
gone about it, slowly making his way into this field
and expanding his skills. We also
discuss why machine learning is such an
important field to get into now. This is the time to get into
machine learning.
And also in this podcast, we talked about quite a lot of
books. So in total, there are 7 books that we mentioned in
this podcast. I brought up 3 and Muhsin recommended 4.
So if you're interested in filling up your library with data
science and related books, then this episode is definitely for
you.
Can't wait to get started, can't wait for you to hear all the
insights. And without any further ado, I bring to you Muhsin
Karim.
(background music plays)
Welcome everybody to the SuperDataScience podcast. Today
I have an exciting guest. Muhsin, who is one of my students
on Udemy! Welcome to the show, Muhsin. Thank you for
taking the time.
Muhsin: Thank you for having me.
Kirill: I had a look through your LinkedIn. Very impressive, and
we'll get to that in a second. But first of all, could you share
with me and with the listeners, how did you find out about
my courses on machine learning and data science?
Muhsin: It was actually through a colleague of mine. So I work at a
company called Harvey Norman, and I was chatting to this
colleague. We both share an interest in data science, and he
happened to mention your course, “Machine Learning A-Z”.
And I signed up, and in a weekend, I went a third of the way
through, and I was very impressed with the course.
Kirill: Fantastic. So you're enjoying it so far?
Muhsin: I am indeed, yeah, absolutely. I like that as an R
programmer, I can code in R, but then you have a direct
comparison to the Python code, which I'm also learning
at the moment. So it's good to have the code I'm familiar with
and the code I'm learning side by side.
Kirill: Oh that's really cool. And are you enjoying the Python one as
well? Do you find it much different to the R code?
Muhsin: Not terribly different. I mean, the programming concepts are
similar. There are a few things I just need to get my head around, but
otherwise I can see the similarities. So it's good to have them
both there.
Kirill: Ok, fantastic. And I remember that we had a quick chat on
Udemy. Do you remember why you reached out? Why did we
get in touch?
Muhsin: Oh, I just expressed my appreciation that you were a very
good communicator in demystifying very complex --
Kirill: Oh that's right, that's right. Thank you.
Muhsin: I always appreciate when people can do that, when they can
take something that's quite difficult to convey and describe it
in a palatable way. So I just said thank you.
Kirill: Oh, right. I remember that. And thank you so much for that,
and it's interesting how that transferred to you coming onto
the show! Because sometimes, when people
are extraordinary, I sense that I have to take a few
more seconds, a few more minutes, to actually find out
where this person's coming from, what their background is.
And so I asked you for your LinkedIn, and I didn't make a
mistake! When I saw your LinkedIn, you have a PhD in
neuroscience! And as soon as I saw that, I was like, I have to
bring Muhsin onto the show! So yeah, thank you for
reaching out, thank you for coming onto the show. You
ready to talk a bit about your background?
Muhsin: Yeah, absolutely!
Kirill: Alright. So PhD in neuroscience. I was just going to quickly
recap on this. So you have a publication which is called
"Behavioural and neural correlates of vibrotactile
discrimination and uncertainty". And you have some other
publications as well. So to start us off, how did you get into
neuroscience, and why did you choose this field?
Muhsin: Yeah, so I kind of stumbled into it. But I guess it begins with
my degree. So I actually have a degree in Molecular Biology
and Genetics. That's the first science I picked up. And that
degree involved a lot of wet-lab work. So it's
centrifuging test tubes and adding chemicals together and
wearing white lab coats. It was very -- looking like someone
from a CSI crime lab!
But despite that excitement, I really didn't like the degree!
And I hated it all the way through. But during the degree,
I attended a talk. And it was about
neuroscience and biochemistry. And I realised I was more
fascinated with brains and behaviour rather than
biochemistry and genetics.
From there, I pivoted into a PhD research project that was to
examine neural cells from an animal model that had been
fed different concentrations of an important early brain
development nutrient. And my task was to lay neural cells in
a petri dish and watch their synaptic growth. So when you
put two neural cells together in a dish, if they're healthy,
they'll start communicating with each other, and they'll form
connections, these synapses. Unfortunately, the cells kept
dying. So it was a terribly demoralising experiment and I
can't emphasise enough that I really hated what I was doing!
I was terribly frustrated, nothing was working, and I actually
quit that PhD. And it was the best decision I made in my life,
and I wish I had done it sooner.
But then I eventually found a neuroscience lab at the
University of New South Wales that had received funding to
take a more multi-disciplinary approach to studying human
cognition. And there, my project was to examine how the
human brain made optimal decisions when faced with
uncertainty. So I think going to the particular tasks I was
doing for my PhD --
Kirill: Actually, I read a little bit about your overview of the PhD,
and I would really like to learn more. Like, without going
into too much detail about the technicalities, but what were
you examining and what were the differences between the
experiments and what were your conclusions? I found it
quite interesting, and would like for you to explain what
exactly the outcome was.
Muhsin: Sure. So generally what I was examining was, we wanted to
see how the human brain can make optimal decisions when
faced with uncertainty. So our brains are these amazing
devices, these statistical machines that, when they're faced
with uncertainty, actually tend to make
optimal choices. And we wanted to explore that further. And
the way that scientists can do that is they can present
people with very simplistic tasks and then use neuro-
imaging techniques to start to pick apart the areas of the
brain that are implicated in decision-making.
So the way I did that was with a very mind-numbingly dull
task: I would get participants to come into a
laboratory and they would place their index finger on this
vibrating probe. And the task is called a vibrotactile
discrimination task. And they had to discriminate between
pairs of vibrations. So you get a vibration, pause, a second
vibration, and you have to answer a question, was the
second vibration faster? Yes/no response. And they had to
do that 400 times.
Kirill: Oh wow!
Muhsin: Yeah, yeah. The participants were paid. It was a very
boring task, but scientists have to do this: introduce very
carefully controlled conditions so we can study bigger
things further up the neural pathways. So I took that task
into an MRI scanner. It's called functional MRI, or fMRI,
which allows us to infer which brain regions are involved in
the task under study. The way it works is that when you
are engaged in a particular task, the neural cells in your
brain that are implicated in that task draw increased local
blood flow, and fMRI can image the
regions of the brain where there's that local blood flow. That
blood flow is an indicator of neural activity. So we were
able to find out which brain regions were involved in decision-making
when people were uncertain. You can increase
the uncertainty in our task by making the vibrations very
similar to each other. So if you had pairs of vibrations that
were only separated by a few hertz, then participants
really struggled to make that choice. And we were able to
highlight the brain regions that were active under that uncertainty.
Kirill: Ok, interesting. So you found the brain regions that are
responsible for uncertain decisions.
Muhsin: We found the ones that the literature commonly cites. They
typically are prefrontal cortical regions,
particularly one called the left dorsolateral
prefrontal cortex, which, it's been years since I've actually
said that out loud! But yeah, it's a region that's involved in
executive functioning and working memory. The neural
imaging, although it's very exciting, actually wasn't the most
interesting part of the PhD. It was more about the
behavioural effects that I was able to discern from the task.
Kirill: So what were they?
Muhsin: It's a little challenging to describe without drawing it, but I'll
do my best. The vibrations that our poor subjects had
to sit through tended, on average, to be about 34 Hz in
frequency. What I was able to show was that our
participants weren't really comparing two vibrations
when they made a decision. They compared the second
vibration with a memory of what the first vibration was. And
our memories degrade over time, even over a short
period of time. So in my experiment, let's say that your first
vibration was 40 Hz. Your brain
is trying to remember 40 Hz, but soon that memory starts to
degrade and actually drifts down to the average, so it feels more
like 34 Hz. So the brain is very good at representing the
environment as averages, these kind of stereotypical
pictures. In the end, it didn't really matter what the first
vibration was; people were making comparisons to this
average that the brain was representing.
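The drift Muhsin describes can be pictured as a memory trace that relaxes toward the 34 Hz task average. The sketch below is a minimal illustration under assumed parameters: the exponential form and the time constant `tau_s` are hypothetical choices for illustration, not the model from his thesis.

```python
import math

# Average vibration frequency across the task, as quoted in the interview.
MEAN_HZ = 34.0


def remembered_frequency(first_hz: float, delay_s: float, tau_s: float = 2.0) -> float:
    """Drifted memory of the first vibration after `delay_s` seconds.

    Assumes (hypothetically) exponential relaxation toward MEAN_HZ with
    time constant `tau_s`; a larger `tau_s` means the memory drifts more slowly.
    """
    weight = math.exp(-delay_s / tau_s)  # 1.0 at zero delay, tends to 0 as delay grows
    return MEAN_HZ + (first_hz - MEAN_HZ) * weight


if __name__ == "__main__":
    for delay in (0.0, 1.0, 2.0, 4.0):
        print(f"40 Hz remembered after {delay:.0f} s: {remembered_frequency(40.0, delay):.1f} Hz")
```

With these made-up settings, the remembered 40 Hz falls to roughly 36 Hz after two seconds and becomes almost indistinguishable from the average a few seconds later; a 30 Hz first vibration drifts up toward 34 Hz in the same way.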
Kirill: That's very interesting. That's a whole discovery on its own,
right?
Muhsin: Yeah, well, this was a known effect.
It's been known for a while, and it's called the time order
effect. But when researchers
set up this vibrotactile discrimination task, they weren't
accounting for it. So you would set up a model
with your predictors of interest, you know,
changing the frequency, or introducing noise as predictors.
But they didn't include the time order effect as a predictor.
And the time order effect was a significant influence. It's
something they should be accounting for in these tasks.
Kirill: Ok, yeah. Could you explain time order?
Muhsin: So the reason it's called the time order effect is that
people bias their decisions based on the actual order in which
the vibrations are presented. So people respond differently
if they receive 40 Hz first and then, let's say, 20
Hz, compared to 20 Hz and then 40 Hz, because of that
drift.
Kirill: Ok, got it.
Muhsin: So when you swap the order, you actually get a
different way of making decisions. Your decisions are
biased.
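Combining that drift with a simple decision rule shows why presentation order matters. In this sketch, the participant answers "was the second vibration faster?" by comparing the second frequency with a drifted memory of the first. The `retention` weight (how much of the first vibration's deviation from 34 Hz survives in memory) is a hypothetical parameter, and the rule itself is an illustrative assumption rather than the analysis from the study.

```python
MEAN_HZ = 34.0  # task-average frequency quoted in the interview


def drifted(first_hz: float, retention: float = 0.5) -> float:
    """Memory of the first vibration; retention=1.0 is perfect memory,
    retention=0.0 means the memory has collapsed to the average."""
    return MEAN_HZ + (first_hz - MEAN_HZ) * retention


def says_second_faster(first_hz: float, second_hz: float, retention: float = 0.5) -> bool:
    """Hypothetical decision rule: compare the second vibration with the drifted memory."""
    return second_hz > drifted(first_hz, retention)


def is_correct(first_hz: float, second_hz: float, retention: float = 0.5) -> bool:
    """True when the biased answer matches the true answer."""
    return says_second_faster(first_hz, second_hz, retention) == (second_hz > first_hz)


if __name__ == "__main__":
    for pair in [(40.0, 38.0), (38.0, 40.0), (28.0, 30.0), (30.0, 28.0)]:
        print(pair, "correct" if is_correct(*pair) else "error")
```

Under these assumptions, pairs above the 34 Hz average produce errors when the higher frequency comes first, and pairs below it produce errors when the lower frequency comes first. That asymmetry is the reason a model of the responses benefits from including presentation order as a predictor alongside the frequency difference.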
Kirill: That's really interesting. And we might be getting a bit
carried away here. The reason I'm so fascinated by this is
actually I recently read two interesting books. One is "The
Future of the Mind" by Michio Kaku, and he actually
explains all these things, like how the MRI works, and how
the left brain versus the right brain operates. So I'm really
interested in this. And the one I'm currently reading is
"Emotional Intelligence" by Daniel Goleman, and it also talks
about how IQ is not the only factor that decides whether a
person's going to be successful in life. There's also emotional
intelligence.
But moving on to more of the data stuff. So all of this data
that you were getting from these experiments, I'm
assuming from the 400 trials each participant did, how were you
storing this data, and what tools were you using to process
it?
Muhsin: I was using a program called Matlab, which is very popular
with engineers; it's a numerical computing program. And it's very
similar to R, except it's a commercial program. And
universities use it quite a lot. So we used Matlab to
essentially collect the data and also to store it. And then to
process it, there were other external contributors that had
created their own package called SPM, which is Statistical
Parametric Mapping. And what they did was they created
this package that took the neural imaging data and was able
to perform the statistical analysis that would highlight the
significant neural regions that were implicated in the task at
hand.
Kirill: Ok. I've used Matlab as well before. But also in research,
when I was doing my physics degree. So how do you find the
differences between Matlab and R? What would you say are
the advantages of either of the tools?
Muhsin: The reason I'm using R is, for one, it's free. And also
I found that transition quite, I wouldn't say
seamless, but quite intuitive, in that they're very
similar languages, with these vectorised operations. So for
me, going from Matlab to R was, for the most part, pretty
intuitive. And now that I've been using R for so long, I
certainly have a preference, also because the
community for R is so huge. The variety of packages is
impressive. If you need a package that will
beautifully plot your data, that's available. There are packages
that will nicely wrangle and manipulate your data. It's been
years since I've used Matlab, and now I'm definitely in the R
camp.
Kirill: Very, very good. I agree. R has so much support around the
world, it's definitely a great tool to have in your arsenal. And
our listeners might be a bit confused, thinking that you are
still doing research in the neuroscience
space. But you moved on from that to some
data-driven roles in different companies. Can you tell us a
bit more about why you decided to make that move, and
where exactly did you go?
Muhsin: I actually kind of moved on whilst I was doing my PhD. So
when I was writing up my PhD, I was actively looking for
work in industry in different sectors for data analyst roles.
The reason is that I think if you wish to stay in
academia, it helps to be very passionate about a research
topic, and I didn't really find my niche in science. So I knew
that the academic career path wasn't for me. So I looked
elsewhere, I looked across different sectors, government,
non-profit, industry. Ultimately for me, it's always a case of,
I'm interested in human behaviour, I'd like to be exposed to
different data sets that capture that behaviour, and I look for
organisations that have that data and are in need of
someone of my background.
Kirill: So that's great. And then you moved on to more retail-
focused roles, so now you're a customer experience analyst
at Harvey Norman, and if I'm not mistaken, Harvey Norman
is a massive chain. Is it like warehouses, or what does
Harvey Norman do again?
Muhsin: Harvey Norman is a large Australian retailer. So we're a
retailer of furniture, bedding, computers, communications,
and electrical products.
Kirill: That's right. I was confusing Harvey Norman with Bunnings.
Bunnings is a different store.
Muhsin: They're one of our competitors.
Kirill: Yeah, there we go. So at Harvey Norman, how do you apply
those skills that you learned and you developed through
your PhD, through that academic research, those data skills,
how do you apply them, and how do they benefit you now in
your current role?
Muhsin: Being a large retailer, we have data. We have lots of data.
What wasn't happening prior to my role was that a lot of
that data was being left unanalysed. It was being used to
build one-off reports, but my manager
wanted to do something more with that data. So of the skills I
acquired from the PhD, there's the obvious one, the coding
skills, where I was able to wrangle data and tidy it, and then
produce something useful. That's something that I use every
day. But the more generic scientific skills that I picked up
are actually quite useful too. My scientific background kind of
made me a careful planner. When you're setting
up an experiment where an MRI session costs $600 an hour, you
do learn to be quite methodical and prepared. It's almost like
project management, really, in that for a given project at
Harvey Norman, I'm very good at mapping out what's
required, what resources I need, what's the best
approach, and whether I need to speak to anyone to get the job
done and get past any hurdles. So a science degree
teaches you a lot of skills that I think many people in
science don't realise have applications outside of
academia.
Kirill: That's very interesting. And it resonates very well with what
Wilson Pok mentioned in episode 3. Wilson has a PhD in
nanophysics, and he said the same thing: that
academic background helps you set up
the problem, identify the challenge, and set up
for solving it. So, like you say, those project management, or
pre-project management, skills are very powerful. So for
our listeners who are in academia and are
contemplating maybe making the move into data
science and more industry- or retail-focused roles, would you say
that this is one of the top skills they should
advertise themselves on: that they can actually identify these problems and set up
projects to minimise expenses, maximise how quickly they’re
processed and things like that? Would you say that that is
one of the key skills that academic-minded people are able
to develop?
Muhsin: Yeah, absolutely. I think that if you do a PhD, it’s really
project management. Not in the traditional sense, but you
really are given a problem to solve. Obviously, with a
PhD, you have a lot more time to really focus on it, and it is a
different way of thinking, but coming from science,
you should really pitch yourself as someone who can find
solutions in a very ambiguous scenario. And in business,
what I found is you’re often faced with a lot of ambiguity,
not just in the project itself, but also around the
resources that are available. And I think another advantage of
going from academia or science to industry is
being very familiar with open-source
tools. When I started, I had the option to request various
commercial tools to do my job for data analysis. And
because it was exploratory, I was actually more keen on just
trying things out with R and other open-source platforms,
and then we found that it completely suited our purposes,
and now I’m actually introducing it to more people across
the organisation. So you can bring a lot of what you learned
in academia and be very surprised at what the business
finds useful.
Kirill: Yeah, wonderful. And thank you for that. I’m sure a
lot of our listeners will find that very encouraging. So if you
are in a situation like Muhsin was, where you’re in academia
and you’re maybe not enjoying what you’re doing as much,
or you’re not as passionate about it as you thought you’d be,
then don’t be afraid of this move into data science, because
those skills transfer. We’ve had great examples: we’ve had
chemical engineers, we’ve had nanophysicists, and now
Muhsin from neuroscience. So people from all different
sides of academia are moving into data science. You will
always find skills and ways of thinking that you have
developed that will highly benefit you in a data science
role. And that’s not to say that people who are not from
academia shouldn’t move into data science; it’s just good
encouragement for those who might find themselves a bit
stuck in academia that there is this whole other world that
you can explore.
So we’ve discussed a little bit how you moved from academia
to being a data analyst at Harvey Norman. And before we
started this podcast, you mentioned to me that slowly, as
you’re picking up new skills through these courses and your
own education or self-education, your role is slowly
transitioning from a data analyst to a data scientist or that
there is an option, a pathway that the company is happy for
you to undertake to slowly transition to a data scientist.
What does this transition from a data analyst to a data
scientist entail in your view, and what new responsibilities,
methodologies and ways of working does it involve?
Muhsin: So I’m fortunate to be part of a team where we’re
encouraged to research and explore and come up with new
techniques to find insights around data. So with traditional
data analysis skills there’s reporting and there’s
dashboarding, where you can showcase the information at
hand. But what I’m looking forward to with the data science
skills that I’m picking up is, can we gain insights from the
data that we didn’t necessarily consider before and, quite
frankly, can I make my job easier. So an example is, we
collect very simple customer experience data, which tends to
be a post-purchase survey where we ask customers for ratings
based on their shopping experience. And that can be from
very good down to poor. We also ask them for feedback and
that’s where customers can leave us open-ended responses
based on how they viewed our services. And one of the first
tasks I’ll perform with data science techniques is text
clustering. I’d like to take all that text, of which there is a
large volume, and apply some machine learning to cluster
that information into categories and really highlight to the
organisation that this is what our customers are saying en
masse.
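As a rough illustration of grouping open-ended feedback into categories, here is a deliberately simple stand-in: bag-of-words vectors compared by cosine similarity and grouped with single-pass "leader" clustering. It is not topic modelling and not the approach Muhsin's team uses; the stop-word list, the `threshold` value and the example comments are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "is", "my", "to", "of", "it", "very"}


def vectorise(comment):
    """Bag-of-words term counts for one comment, ignoring stop words."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(comments, threshold=0.3):
    """Single-pass 'leader' clustering.

    Each comment joins the first cluster whose leader (its first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (leader_vector, members)
    for text in comments:
        vec = vectorise(text)
        for leader_vec, members in clusters:
            if cosine(vec, leader_vec) >= threshold:
                members.append(text)
                break
        else:  # no existing cluster was similar enough
            clusters.append((vec, [text]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    feedback = [  # made-up comments for illustration
        "Delivery was late and the driver was rude",
        "My delivery arrived late",
        "Staff were friendly and helpful",
        "Helpful staff, great advice",
        "The delivery driver was late again",
    ]
    for group in leader_cluster(feedback):
        print(group)
```

The threshold controls how readily comments merge, and results depend on input order. A production pipeline would more likely use TF-IDF weighting with an algorithm such as k-means, or a topic model.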
Kirill: Very interesting. And what tools do you think you'll
be using for text clustering?
Muhsin: To begin, I’m actually looking into Python. So even though I
use R predominantly, I’m now picking up Python and I’ve
been told to look into something called topic modelling
specifically, which is supposedly very good for comments of
varying length, whether they’re short or
long. Some customers might say very little and others may
say a lot. So my hope is that that modelling approach will be
able to factor that in.
Kirill: Okay. And so from this text clustering, walk us through your
thinking. How are you going to take these comments? Are
you going to then put that unstructured data into structured
data and then analyse it and what insights do you hope to
extract from it, or do you have some different approach?
Muhsin: Actually, I’m not entirely sure yet. This is all very new. In
fact, it’s so new that my first real attempt will be this
weekend.
Kirill: Oh, wow! Okay. That’s really exciting and best of luck with
that. Sounds like an interesting undertaking.
Muhsin: Yeah, I’m looking forward to it. Ultimately, what we want to
do is have these techniques to organise the data in a more
intuitive way for our end users to digest, really.
Kirill: Yeah, it’s always good to put data in the right format for end
users to be able to take action or make decisions on that
data. And what you mentioned about finding this new data
that hasn’t been used before resonates with the
notion of a data landscape that was brought up in one of the previous podcasts:
identifying or mapping out the data
that you have. I would assume that an organisation such as
Harvey Norman would already have all its transactional or
customer data mapped out pretty well. But what you are
actually doing is finding this new data that the
company didn’t even think about and adding it to
the data pool, so that you can combine it with other data
sources in order to inform certain decisions or come up
with certain insights. So is that something that you have in
mind, combining this data with existing data sets that you
already have in the company?
Muhsin: It’s something that we certainly do. We’ve got our
customer experience data, and of course there's
transactional data, and we can combine the two. Of course,
given the breadth of data we have, it doesn’t mean that you
should combine everything. There are various reasons why
you couldn’t or shouldn’t, for privacy issues, for
instance. But where it’s possible, yes. You can also gain
a lot of insight by talking to somebody in a different
department, learning more about the practices that they
go through, and coming up with different ways of shedding
light on our customers.
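Combining survey data with transactional data usually comes down to a join on a shared key. The sketch below assumes hypothetical `order_id`, `rating`, `category` and `customer_email` fields (not Harvey Norman's actual schema) and drops the identifying field before analysis, in keeping with the privacy point Muhsin raises.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
surveys = [
    {"order_id": 101, "rating": 5, "comment": "Great service"},
    {"order_id": 102, "rating": 2, "comment": "Delivery was late"},
    {"order_id": 104, "rating": 4, "comment": "Helpful staff"},
]
transactions = [
    {"order_id": 101, "customer_email": "a@example.com", "category": "computers", "amount": 1200.0},
    {"order_id": 102, "customer_email": "b@example.com", "category": "bedding", "amount": 800.0},
    {"order_id": 103, "customer_email": "c@example.com", "category": "computers", "amount": 300.0},
    {"order_id": 104, "customer_email": "d@example.com", "category": "computers", "amount": 950.0},
]


def join_surveys(surveys, transactions, drop_fields=("customer_email",)):
    """Inner join of surveys to transactions on order_id, dropping identifying fields."""
    by_order = {t["order_id"]: t for t in transactions}
    joined = []
    for s in surveys:
        t = by_order.get(s["order_id"])
        if t is None:
            continue  # a survey with no matching transaction is skipped
        row = {k: v for k, v in t.items() if k not in drop_fields}
        row.update(s)
        joined.append(row)
    return joined


def mean_rating_by(rows, key):
    """Average survey rating grouped by a transactional field such as 'category'."""
    totals = defaultdict(lambda: [0, 0])  # key value -> [sum of ratings, count]
    for r in rows:
        totals[r[key]][0] += r["rating"]
        totals[r[key]][1] += 1
    return {k: s / n for k, (s, n) in totals.items()}


if __name__ == "__main__":
    print(mean_rating_by(join_surveys(surveys, transactions), "category"))
```

Dropping identifiers inside the join, rather than afterwards, keeps personal fields out of every downstream table by default.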
Kirill: Yeah. Wonderful! And I would like to move on a little bit to
a topic I’d like to explore, and I’m interested to hear your
opinion on this. You studied neuroscience, and now
you’re getting into machine learning. And we know that
the premise of machine learning, or of some of the algorithms and branches of machine
learning, is to model the human brain or take certain
aspects of human logic and model them in a machine
so that decisions can be made faster. What is your take
on the similarities or discrepancies between neuroscience
and machine learning?
Muhsin: Big question! I’ll be honest with you, I’m not entirely sure.
Kirill: Through learning, through undertaking these studies in
machine learning, do you see any resemblances with what
you used to pick up in your PhD and about how the human
brain is structured?
Muhsin: Yeah, so I guess an example is Bayesian inference. It's a modelling approach I
haven’t had an opportunity to explore yet, but I think with Bayesian statistics,
when you need to update a probability, you can
look back at past instances to inform your current
decision. And we actually used that analogy quite a lot with
my PhD findings. When I was talking before about the
time order effect, what the brain is essentially doing is that
when it's given a stimulus and needs to make a
decision, it will take that stimulus, but it will compare it to
past instances, like the average of a frequency. We’ve always
viewed that as being very Bayesian.
So yeah, at least in a very analogous way, a lot of
machine learning approaches have modelled what the
human brain does. And I heard that some other PhD
colleagues of mine at the time were getting very heavily
involved in neural networks, though I’m not entirely
sure how much those reflect the human brain. I know
anecdotally, I believe IBM has been involved in an initiative called "Blue Brain",
where the plan was, and I may be paraphrasing
incorrectly here, to essentially build a brain from
scratch, but with a machine. So it’s almost like
building a brain cell by cell, but using code. Certainly, I
think using machine learning techniques
inspired by the brain is a very good way of seeing
what evolution has constructed and trying to replicate that in
a way that we can reproduce.
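Muhsin describes the time order effect as "very Bayesian". The textbook form of that idea combines a Gaussian prior (here, past experience centred on the 34 Hz average) with a noisy Gaussian observation (the memory of the first vibration): precisions add, and the posterior mean is a precision-weighted average, so a noisier memory is pulled further toward the prior. This is a generic conjugate-Gaussian update with assumed variances, not the model from Muhsin's thesis.

```python
def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Conjugate update of a Gaussian prior with one Gaussian-noise observation.

    Returns (posterior_mean, posterior_var). Precisions (inverse variances)
    add, and the posterior mean is the precision-weighted average of the
    prior mean and the observation.
    """
    precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var


if __name__ == "__main__":
    # Hypothetical numbers: prior centred on the 34 Hz task average,
    # a 40 Hz first vibration held in memory at two noise levels.
    for memory_var in (16.0, 48.0):
        mean, var = gaussian_posterior(34.0, 16.0, 40.0, memory_var)
        print(f"memory variance {memory_var}: estimate {mean:.1f} Hz (variance {var:.1f})")
```

With equal prior and memory variances, the estimate of a 40 Hz vibration lands halfway to the prior, at 37 Hz; tripling the memory variance pulls it to 35.5 Hz.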
Kirill: It’s very interesting you mention that, because last year
I finished reading a book
called “Bold” by Peter Diamandis. I’ve mentioned it a couple
of times already, but in it he discusses Moore’s
Law in detail. Gordon Moore, who went on to co-found Intel
in 1968, came up back in 1965 with the concept that was
later called Moore’s Law. He
noticed that the price of an average computer halves
every 18 months, while the processing capacity of the
average computer you can buy in stores
doubles. So that’s the main concept: the
processing capacity, the processing power, doubles every
18 months, and that law has held constant. And based
on Moore’s Law, right now we already have computers that
think as fast as a rat’s brain, or the brain of a mouse.
If you extrapolate Moore’s Law, which has held constant
since the 1960s, which is crazy, we’ll have
computers that think as fast as a human brain by 2025.
That’s why 2025 is a big number in machine learning, and
why everybody is looking forward to it, because
that’s when, according to Moore’s Law, we’ll get those
computers. And by the year 2050, if Moore’s Law still
holds, we’ll have an average computer that you
can buy for around $1,000 at Harvey Norman that will think
as quickly as the whole human race. And that is insane.
Right now, we don’t even have the infrastructure or the
computers to reconstruct the human brain, but by the year
2025, we’ll be able to do it with an average computer that
people will just have in their home. It’s very interesting
where the world is going and what implications that will
have. What are your thoughts on the implications? What do
you think will happen if all of a sudden we’re
able to reconstruct human brains cell by cell?
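Kirill's extrapolation rests on compounding: if capacity doubles every 18 months, then after t years it has multiplied by 2^(t × 12 / 18). A small helper makes that arithmetic explicit. The doubling period is a parameter because the figure varies between tellings: Moore's 1965 observation concerned transistor counts doubling roughly every year, which he revised to about every two years in 1975, while 18 months is the figure quoted in the episode and in many popular accounts.

```python
def growth_factor(years, doubling_months=18.0):
    """Multiplicative growth after `years` if capacity doubles every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    for period in (12.0, 18.0, 24.0):
        print(f"doubling every {period:.0f} months -> x{growth_factor(15, period):,.0f} over 15 years")
```

Over 15 years, an 18-month doubling period gives a factor of 1,024, while a 24-month period gives only about 181, so the assumed period dominates any long-range forecast of this kind.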
Muhsin: That’s fascinating! I’m a little worried now. I didn’t realise
that we had advanced to that point where by 2025, we would
have a machine that could think at that capacity because at
the time when I was doing my PhD, I knew that the
advantage that the human brain has over the machines is
the capacity for parallel processing, in that each cell can have
many connections to other cells, including cells
located great distances away. I mean, I
guess it’s slightly concerning. We
already know that machines will be taking over a
lot of our jobs. Automation is a reality, but that’s fine. I
think it will actually open up new endeavours. But it’s kind
of hard to think about what the future will look like because
it wasn’t that long ago that there was no Google, there was
no Facebook or Uber or Airbnb. I don’t know what’s around
the corner. Even now I’m still impressed when you get an
alert from Google asking you to review something, even
though you’ve never really explicitly told anyone that you’d
been to a restaurant and yet it’s sending you a notification
saying, "You should review this restaurant." It knows that
you’ve been there with friends before. It’s interesting to see
how that large volume of data from our daily interactions
will actually start to shape our physical world and our
day-to-day presence. And then
combining it with machines that can think very quickly.
Who knows what the future will hold?
Kirill: Yeah, it’s getting freaky. Just a couple of days ago, I had this
weird experience. I don’t remember exactly what it was
about; let’s say it was about a gym membership. And I
clearly remember I wasn’t Googling that topic or searching
for it anywhere. I was just speaking to somebody about it,
and then the next day I got a Google pop-up about it. I get
the impression that maybe our phones are actually taking in
everything we say, and then Google or some other online
service is analysing all of that textual information, and
that’s how we get the ads. Because that’s happened once or
twice to me just over the past month. Have you ever had an
experience like that?
Muhsin: Yeah, on occasion. Basically everyone has experienced
targeted advertising, but it’s getting to the point now where
people are almost expecting that level of service. I’ll admit
I’ve actually clicked on targeted ads because they’re spot on.
So it’s quite a shift. Whereas before it felt quite intrusive,
I think increasingly we’ll start to almost demand, and
certainly expect, this level of, I’ll use the word, "service". I’m
quite paranoid. I think it’s a given that a lot of these devices
are recording us or watching us. I think Mark Zuckerberg
has his little piece of tape over his phone. You know, I think
that says quite a lot. It’s the world we live in now, and I
think a healthy dose of paranoia is not a bad thing. But
yeah, I do find it interesting that we have certainly shifted
into a world where we’re becoming increasingly comfortable
with it, which is fine. In some respects, maybe it will
improve something like the shopping experience, for
instance. And it could have other beneficial effects. Let’s say,
and this is just a silly example, you’re accused of a crime, but
based on your digital footprint, you’re able to clearly prove
that you were not involved. We leave so much data behind in
our day-to-day that we’ll be able to defend ourselves. So it’s
not entirely a negative thing, but who knows how data could
be utilised.
Kirill: Totally. It can go both ways. And the thing is that you can’t
really stop it, right? It’s gonna go where it’s gonna go. It’s
already in motion. The machine is working, there’s no
stopping it. And I’m always surprised by these movies about
Terminator and Skynet, and then people are like, "We
should be worried about Skynet. Maybe that can happen in
real life. We should not create Skynet." But if you think
about it, Skynet is already there. It’s like this big massive
thing called the internet which you cannot switch off and it’s
always gonna be working and collecting knowledge and
information. It’s just waiting for artificial intelligence to get
onto this internet and quickly learn all of the things that
we’ve been learning and then we’re kind of at the mercy of
that artificial intelligence. Don’t you think?
Muhsin: Absolutely. I’m concerned about how all this data will be
used to the point that I actually find myself disengaging from
some platforms because I don’t want to feed the beast
necessarily. But sometimes you can’t avoid it; you have
to. I remember early on I used to be quite hesitant about
making purchases online but I certainly do it now. There’s
no way to avoid it. And I think you can’t be an engaged
citizen unless you’re using the internet. You can’t go
completely off the grid. I mean, some people try and do it,
but it’s not really a pathway for many people.
Kirill: Yeah. So on that note, most of this, you know, the
recommender systems that we spoke about, AI,
mimicking the human brain in 2025 and so on, all of that, or
most of it, is governed by machine learning
algorithms. And now you’re getting into this field of machine
learning, which I really admire, and I’m sure a lot of our
listeners want to get into it too. It’s the new data science. Like, if
you want to be a data scientist in the next 10 years, you
need to know about machine learning. So my question to
you is, machine learning is so broad. Off the top of my head,
I can probably name at least seven different branches of
machine learning including clustering, classification,
association rule learning, deep learning and so on. How do
you go about getting into it, you know, taking your first steps,
or mastering the first algorithms in machine learning?
How do you decide where to start? How do you actually go
about conquering this field? What is your plan, and what
can you recommend to those who are just looking at
machine learning and want to get into it as well?
Muhsin: Yeah, that’s a good question. I guess there are various
approaches. The one that I took was, I mean, being a
data analyst, I knew that I needed more power and more
techniques and abilities. At some point you kind of reach
a bit of a limit with what you’re currently doing. And around the
time I was transitioning into industry, data science was
becoming a thing. I was fortunate to be in an age where
online platforms, you know, the massive open online courses,
were starting to become available. So one big introduction
for me was learning R programming through the Johns
Hopkins Data Science course on Coursera.
That’s a series of 8 or 9 courses where they take you
through cleaning data and presenting it, and they teach you
R. That was my avenue into this world. And then
increasingly from there, you realise that there’s a wealth of
information available, whether it’s from online courses,
various providers.
I’m on Stack Overflow every day, and often you just
Google for answers. So when you’re in a
position where you need answers, the Internet is a
great resource to find those answers. So for people just kind
of getting into it, yeah, the more time you spend doing data
analysis, and the more complex your questions get, the more
you start to reach a limit. That’s why I feel machine
learning is a way to broaden the horizons in terms of what’s
possible. That’s why I said earlier that I’m really interested in
what can machine learning do for me in terms of
highlighting things that I didn’t realise before. And I think
that’s a very powerful concept, in that traditional reporting
or traditional dashboarding is presenting the data you kind
of already know something about. Like, people have an
intuition about the data and you’re showcasing that data.
But with machine learning, because of the sheer power that
machines have, they’re able to potentially highlight
something that a person couldn’t have considered.
Kirill: Yeah, definitely. And especially with the huge volume of the
data, or maybe the complexity of the data, they help you
break those barriers and get those insights which otherwise
would have taken you ages or you wouldn’t even have
thought of. That’s some really great advice. You’re totally
right that in this new world we have all these online
courses, massive open online courses and just any kind of
online education, so anybody can get into this field and
anybody can slowly start exploring step by step what
machine learning can do for them and how to get into this
field. I was actually at a conference last week. It was related
to data; it was on digital marketing, so there were a lot of
things related to data. It wasn’t specifically related to
machine learning, but one of the speakers at the conference
said that in the world we live in, you can learn
anything. You just have to know how to Google really, really
well. And that resonated with me just now when you mentioned
Google. It’s exactly right. If you want to find out anything,
if you want to learn pretty much anything on this planet,
you just open up Google, and it’s about how well you can
Google that topic and how quickly you can find your way
to that information.
Muhsin: Absolutely. You said it best there. It’s about how well you
can actually structure the question. So when I’m teaching
people programming, one of the first skills I say they should
develop as quickly as possible is finding the right words to
use to find those answers. Provided you search correctly, the
more you read, the better the language becomes in terms of
how you hunt down those answers.
Kirill: Yeah, totally, and when you say teaching programming, does
that mean in your workplace? You’re like helping out your
co-workers or is that something else?
Muhsin: Yeah, yeah, helping out co-workers. So while I’m
learning, I’m also teaching. I find the best way to learn is to try to
teach someone else.
Kirill: Exactly! I feel the same way.
Muhsin: Yeah. I think that if you can convey something in a way that
you yourself understand, and that someone else appreciates,
I think that’s beneficial for both parties. And that’s why I
really like it when people can break down very
complex, often uninviting information into something that’s
demystified and palatable. So, yeah, I’ve often in past jobs
sat down with co-workers who were kind of stuck in their
Excel world. I love Excel, but if you only know Excel, then
you have an upper bound in terms of what type of analysis
you can do. And they have approached me to learn a bit
more programming and I’m always pleased when they
actually progress. And it’s kind of nice feedback to know
what you’re teaching is getting through. Yeah, it’s not just
collaborative. Recently I’ve had an intern who worked with
us for the past three months, and it was great to see her
flourish and pick up a few new skills in data wrangling and
some data science techniques.
Kirill: That’s really cool. I admire that a lot because I do the same
thing through my courses online. For somebody who is as
passionate as you are about data science and about
bringing a data science culture into their organisation,
what would your best tip be when they want to spread
this knowledge, when they want to maybe teach somebody
programming, or just increase awareness of data science
and get somebody excited about it?
Muhsin: Keep it simple, but be mindful of that person’s end goals.
Often someone will come to you with a problem that they
want solved. And that could be anyone. It could be a senior
manager who says, "I need these numbers," or it could be a
colleague that wants to learn more programming that says,
"I’d like to process my data in this Excel sheet more
efficiently." I think once you identify the goal, then you kind
of need to break down that complexity into really nice bite-
sized chunks and then kind of step through that process,
but keep coming back to the end goal of where, you know,
we’re working towards something in order to achieve what
you want. I think it's often the case, particularly
for large projects, that it can be overwhelming, and you don’t
quite see what’s over the horizon. But if you keep the
purpose in mind, then it can be motivating. Particularly
when you’re working with someone, it’s always nice to
bounce ideas off each other. It’s good to get feedback, and
keep people in the loop. Yeah, I’m always a big advocate of
keeping things simple and being mindful at all times.
Kirill: Yeah, I totally love it. I have the same approach. I break
down my courses into sections, and in every section, there’s
like a challenge that needs to be solved. And I’ll always
describe the challenge, and sometimes, when
it’s a super complex challenge, I’ll show what the end
result is, what we want to achieve, what that end
visualisation or end insight looks like, and I’m
like, "Okay, that’s where we’re gonna get to, and we’re gonna
break down the process into steps," like you said, into these
simple steps, and it really helps. So keeping that end goal in
mind keeps people focused, I guess, on why they’re doing
this. Rather than just learning the mechanical steps, they’re
actually learning the reason behind why they’re learning
what they’re learning.
Muhsin: Yeah, absolutely. And that’s a great technique to actually
show people, "Look, this is where you’re heading." I
remember the intern I mentioned before, when she started
with us, I told her it wasn’t that long ago that I was where
she is now. And with a number of years learning the
processes of data wrangling and picking up some data
science techniques, she can get to where I’m currently at. I
think that she appreciated that, knowing that even though
something may take some time, it’s an achievable goal.
Yeah, it’s good to find someone who has already
reached that point and can actually tell you, "Look, it
is possible. Here’s what you can do about it and here are
some tips to get there."
Kirill: Fantastic! Thank you for sharing that. And I’ve got a couple
of interesting questions for you. First one will be: What has
been the biggest challenge for you as a data analyst?
Muhsin: So the biggest challenge as a data analyst is often kind of at
the two ends of the data analysis pipeline: getting
the data and then distributing it. So often
in an organisation it can be challenging to acquire different
data sets for various reasons. There are multiple business
constraints, whether it’s potentially political or ethical, there
are privacy concerns, it may be practical reasons, so you
might need the resources of someone from IT, but they’re
just too busy. And because of those reasons, often getting
the data and then sharing it, such as pushing your data out
into a dashboard could become a bit of a hurdle. So that
tends to be a bit of a challenge, at least what I’ve seen in big
organisations. In those circumstances it means that you can
kind of wait until a solution occurs, or you find a
workaround, and then focus on another part of the pipeline.
Another challenge when it comes to data analysis and data
science for me is not having other analysts close by. I
mean, I am the only analyst on my team, and it’s great to bounce
ideas off people and just come up with different ways of
analysing data and talking through a problem. And when
you don’t have those people immediately with you, you can
approach other people so I’m often talking to people from IT
and our solution architects and they’re great at getting you
past blockages. But then also reaching out to analysts
across different departments, even though they’re not within
your immediate team. So you can kind of build a bit of a
community depending on how large your organisation is, or
even go outside your organisation and join various meet-ups
and just share a lot of your experiences and learn from
people as much as possible.
Kirill: Yeah, I find that that’s very valuable advice. There are
always these organisations and always these altruistic
groups and meet-ups where people want to share their
knowledge. You know, it’s like learning a language. You want
to converse with the people that already know the language,
or are also already learning it so it just makes it more fun
and interesting. And there are so many different groups that
you can join. Pretty much in any medium to large-sized city
there’s gonna be at least one group of aspiring data
scientists or analysts that you can join and share
experiences. That’s some great advice. And what would you
say is a recent win that you had using data science or
data analytics that you can share with us?
Muhsin: What I’ve been spending the majority of my time doing is
building an analytics dashboard platform. This is where I’m
grabbing all our various customer experience data and other
data sets, and I’ve got a bit of a pipeline at the moment
where I’ve got my R scripts and some Bash scripts where I’m
automating a lot of the processing, the data wrangling, and
then the tidying of data, and then piping that into
dashboards. And we’re going down the path of an automated
solution, because I want this system to be as independent
from me as possible. I want it to be something that lasts long
after I’ve left the organisation. I mean, that would be quite
flattering if this system is still chugging along even when I’m
not there. Recently we’ve been showcasing these
dashboards, this analytic platform’s potential and a lot of
people across the company are asking about it and we’re
about to launch it. So yeah, I’m looking forward to using
these dashboards and this platform as a means to showcase
that you can make decisions through data analysis and data-driven
approaches. These dashboards would also serve
as a means to present a lot of those insights gained from
data science, of course in a very simple, palatable way,
so that the end user, whether it’s a staff member in a store or a
senior manager, can take that information, act on it,
and do something meaningful with it.
Kirill: That’s totally a very admirable thing to do. And you
mentioned that it’d be flattering if those dashboards were
still there even if you move on from that role and you move
on from maintaining those dashboards. This brings me to
the next question, which is kind of in line with that. What is
your favourite thing about being a data analyst or
being in data science? What inspires you the most to keep
doing what you’re doing?
Muhsin: Definitely the creative aspect of the work. I think there’s a lot
of creativity in data analysis and data science. That’s why
I’m not entirely sure the machines will completely take us
over unless machines can master creativity. If they do then
we’re in a little trouble.
Kirill: In a lot of trouble.
Muhsin: Yeah, yeah. But yeah, I really enjoy being able to combine
and transform data, and then with my neuroscience
background think about different ways of highlighting
behaviour, whether it’s customer behaviour or staff
behaviour or even just organisational behaviour. And when
I've thought about a different way to do that, then taking a
step back and thinking, "Okay, given the data that we have,
given the resources that we have, is it possible to actually
code this up and then present it?" So I really enjoy that
creative aspect of work.
Kirill: I totally agree with you. It’s interesting, you know. It’s such a
technical field, you know: the numbers,
mathematics, algorithms, statistics, programming. It’s so
technical, and yet there’s so much room for creativity. You
don’t expect that. You think creative subjects are arts, and
English, and literature, but here, in data science, amidst all
of that technicality, there is something that is so inspiring
about this creative aspect of the job.
Muhsin: Absolutely. Yeah. I mean, the thing I love is that in this field,
we get to build something that didn’t exist before. You know,
we’re building useful products. So I think it’s a very creative
pursuit.
Kirill: Yeah. Thank you very much for sharing all that knowledge
and coming on the show and sharing these insights. I just
have a couple of quick questions to finish off this episode.
What is your main career aspiration that pushes you
forward to better yourself and become a better data scientist
every single day?
Muhsin: For me it’s overcoming a challenge. So a lot of the times
when you have your end goal in mind and you’ve broken
your task up into something that’s small and achievable at
each stage, it’s really focusing on those little tasks and
investigating the ways that you can overcome and conquer
that challenge. That’s using a variety of tools and the
resources you have and speaking to people to help you
achieve that. So I really like that aspect of the field where
you can keep pushing yourself to the next stage. Yeah, it’s
almost an inevitability: even though at
the time you’re just getting over a hurdle, once the goal is
achieved you can look back and realise that along the
way you’ve picked up all these skills, had great
conversations with people, and you learned quite a lot. At
the time, the motivation is, "I just really want to solve this
task at hand."
Kirill: Fantastic. And kind of the way I think about it is that on one
hand, you’re learning something on the very edge of what
technology is capable of, of the algorithms that humans have
come up with so far. You know, it’s not creating something
brand new in terms of machine learning. These are things
people have already created and used before. But at the
same time when you apply it in your specific challenge, it
makes it unique. Even though you’re learning something
that already exists, when you’re applying it in the
circumstances, and to the business problem at hand, it
makes it unique. And that gives you that fulfilment that
you’ve actually done something new, you’ve created
something brand new for the world. And I completely agree
with you. That’s such a great feeling, that even just that is
worth waking up in the morning and going and doing it
again.
Muhsin: Absolutely. You said it very well in that I could go online and
find someone that has outlined how to perform a particular
method of clustering text and then apply it to my own data
set. And because I’m so familiar with that data set, it’s
motivating to see how the insights start to take form. And
you’re right, at the end of the process you end up creating
something new, and for your organisation that’s ideally
something that will be useful.
Kirill: Yeah, I totally agree. Thank you so much again, Muhsin, for
coming on the show. If any of our listeners would like to get
in touch with you, maybe follow your career, how can they
best find you?
Muhsin: I’m on LinkedIn. They can just search for my name and find
me there. I also have a blog that I don’t update frequently
enough but now that I’m mentioning it perhaps I’ll get back
to writing more because I do enjoy writing. The blog
is called probablyabetterway, at blogspot.com.au, and I can
send you a link.
Kirill: Interesting. All right, definitely. We will include that in the
show notes. So if you’re listening to this podcast, definitely
check out Muhsin’s blog, probablyabetterway at
blogspot.com.au. And one final question for you today: What
is your one favourite book that can help our listeners
become better data scientists and analysts?
Muhsin: I’ll mention the book that had a huge influence on me. It’s a
very popular one that I’m sure many people have read. It’s
called “Freakonomics” by Steven Levitt and Stephen Dubner,
and for me it was the first of its kind to showcase how our
assumptions can be challenged by study design and data
analysis. And it really did invite me into the methodologies of
how real world data can uncover really surprising
behavioural insights of people. Since then there have been a
lot of similar books. I know you’ve asked for one but I’m
gonna mention a couple more.
Kirill: Sounds good. Go for it. Go for gold.
Muhsin: Another popular science book I recommend people look
at, particularly people like me with a science background,
is called “The Invisible Gorilla” by Christopher Chabris and
Daniel Simons, and that’s probably the best popular science
book I’ve read that shows how our intuitions can deceive us.
And a number of your guests have mentioned Nate Silver’s
"The Signal and the Noise," which I highly recommend. And
a fun and probably also depressing one is called "Dataclysm"
by Christian Rudder. He is a person who’s worked for the
online dating site OkCupid. He’s crunched the numbers for
online dating to reveal some pretty sobering insights into how
people try to find The One online.
Kirill: Very interesting. Thank you very much. There you go, guys.
We’ve got a whole library of books: "Freakonomics", "The Invisible
Gorilla", "The Signal and the Noise", and "Dataclysm". Once
again, thank you so much, Muhsin, for coming on the show
and sharing your insights. I’m sure so many people will find
so much value in this conversation that we had just now.
Muhsin: Thank you for having me.
Kirill: All right. Take care and good luck with your machine
learning aspirations.
Muhsin: Thank you. All right, all the best!
Kirill: So there you have it. I hope you enjoyed this podcast. We did
go into quite a lot of interesting topics. I really
enjoyed the conversation about machine learning: how
somebody who is completely new to this field, somebody
who has experience in data science but wants to tackle
the challenging field of machine learning, can go about it,
and the tactics and general approach Muhsin has been using to get
into machine learning. I think that can help
everybody get started in that field.
Also, I always enjoy talking about Moore’s Law, so hopefully
you picked up some valuable stuff from there. Once again,
check out the book called "Bold" by Peter Diamandis if you
want to learn more about Moore’s law. It’s described in great
depth there and it is the governing law of how computers are
developing along that exponential curve. So all in all, I hope you
enjoyed this episode. Of course, we’ll include all the links
to the resources mentioned, so you’ll be able to follow Muhsin
on LinkedIn and check out his blog. So don’t forget to go
to superdatascience.com/9 and while you’re there join the
SuperDataScience community and hang out with more
people like Muhsin and other students of SuperDataScience.
I can’t wait to see you next time. Until then, happy
analysing.