Show Notes: http://www.superdatascience.com/427
SDS PODCAST
EPISODE 427:
IMPACTING THROUGH TECHNOLOGY
Kirill Eremenko: 00:00:00 This is episode number 427 with VP of Data Science at
Gojek, Syafri Bahar.
Kirill Eremenko: 00:00:12 Welcome to the SuperDataScience Podcast. My name is
Kirill Eremenko, Data Science Coach and Lifestyle
Entrepreneur, and each week we bring you inspiring
people and ideas to help you build your successful career
in Data Science. Thanks for being here today, and now
let's make the complex simple.
Kirill Eremenko: 00:00:44 Welcome back to the SuperDataScience Podcast
everybody, super excited to have you back here on the
show. This episode is incredibly fun and cool. Today we
had the VP of data science from Gojek join us on the
episode. If you're from Southeast Asia, you have probably
heard of Gojek and actually very likely used it.
Kirill Eremenko: 00:01:10 But in case you are not from Southeast Asia or you
haven't heard about Gojek, this is a huge company. It is
valued at $10 billion as of today. It's had extremely rapid
growth, and it is a super app: one app inside which
you can get 20 different services, from ride-sharing, to
shopping, to food delivery, to insurance, to cleaning, to
even hair styling. How cool is that?
Kirill Eremenko: 00:01:42 The app serves millions of people across Indonesia,
Vietnam, Singapore, and Thailand. And they're growing
extremely fast. They have been growing extremely fast,
they continue to grow extremely fast.
Kirill Eremenko: 00:01:57 And today we had the pleasure of speaking with the VP of
Data Science there, Syafri Bahar. Before I continue with
what this episode is all about and what we talked about,
I want to explain why I keep saying "we" spoke with
Syafri: today we have a second host. Jon Krohn joined me
as co-host on this episode. You may remember Jon from
episode 365 in May this year.
Kirill Eremenko: 00:02:25 The reason Jon is joining is that there's a super exciting
change coming up in 2021. I'll give you a heads-up now
without going into too much detail; I'll announce more in
the coming episodes, but Jon will be taking over as host of
this show.
Kirill Eremenko: 00:02:47 I know that might come as a surprise. It's the first time
I've mentioned this publicly, but it's going to be super
fun, it's going to be an amazing time. We won't talk
about it too much right now so as not to detract from the
episode; we'll get into that in a future episode. But for this
episode we decided to co-host and talk with Syafri
together, and it turned out really fun. We had a lot of
laughs, and I'm sure you'll be laughing along with us.
Kirill Eremenko: 00:03:16 So what did we speak about today with Syafri? Well,
we talked about Gojek and the impact it's having. We
talked about decision science versus data science. Gojek
actually has three divisions, decision science, data
science, and business intelligence, and we specifically
discussed the difference between decision science and
data science.
Kirill Eremenko: 00:03:34 We talked about CartoBERT and Turing, which are some of
the more technical topics, along with some very interesting
use cases. We talked about what it's like to be a VP, or vice
president, of data science, and what that role entails at a
rapidly growing company like Gojek.
Kirill Eremenko: 00:03:55 We talked about what it takes for a data science team to be a
high-performance data science team. We talked about
mathematics in data science quite extensively; both Jon
and Syafri are experts in mathematics and data science,
so it was very interesting to have that conversation. And
finally, we talked about what it takes to thrive as a data
scientist in a company like Gojek.
Kirill Eremenko: 00:04:18 So lots of very cool insights coming up. Can't wait for you
to check out this episode. Without further ado, I bring to
you Syafri Bahar, VP of data science at Gojek.
Kirill Eremenko: 00:04:34 Welcome back to the SuperDataScience Podcast,
everybody. Super excited to have you back here on the
show. Today we've got a very exciting episode. We've got
two hosts and one guest. Our guest for today is Syafri
Bahar calling in from Indonesia, from Bali. And we've also
got Jon Krohn as our co-host calling in from New York.
Hi, guys. How are you doing?
Syafri Bahar: 00:04:55 Hi, Kirill. Doing good, thanks.
Jon Krohn: 00:04:58 Hey, very well, Kirill. Yeah. Delighted to be here.
Kirill Eremenko: 00:05:01 Awesome. What's the time for you, Syafri?
Syafri Bahar: 00:05:05 Now it's 9:00 actually, so I'm calling from Bali.
Kirill Eremenko: 00:05:08 9:00 AM, right?
Syafri Bahar: 00:05:10 9:00 AM, yes.
Kirill Eremenko: 00:05:11 Awesome, awesome. And Jon, you?
Jon Krohn: 00:05:14 Yeah. 8:00 PM. Getting there.
Kirill Eremenko: 00:05:17 Yeah. Crazy. Across all the time zones.
Jon Krohn: 00:05:20 And how about you, Kirill?
Kirill Eremenko: 00:05:22 For me? It's about 6:30 AM. About 6:00 AM.
Jon Krohn: 00:05:28 Oh, man.
Syafri Bahar: 00:05:29 Oh, wow. That's very early.
Kirill Eremenko: 00:05:31 Yeah. That's okay.
Jon Krohn: 00:05:32 Do you always get up that early?
Kirill Eremenko: 00:05:34 I do, my girlfriend doesn't. She was so dazed, I had to go
to another room to go and sleep there because this is the
only room where I can record. Took her blanket and
pillow and just went away.
Jon Krohn: 00:05:52 Our apologies.
Kirill Eremenko: 00:05:54 No, it's okay. It's okay. I'm glad we're all here. I
think our listeners have met Jon before on other
episodes, but just quickly, Jon, could you give us
a quick intro to your background?
Jon Krohn: 00:06:09 Sure. I'm the chief data scientist at a machine learning
startup in New York. That's my day job, but on the side I
do lots of data science education. I have a book, Deep
Learning Illustrated, that was a number one bestseller.
It hasn't been translated into Indonesian yet, but we do
have a lot of translations around the world.
Jon Krohn: 00:06:34 And I've also been doing some work with
SuperDataScience. Together we've just launched a Machine
Learning Foundations course on the Udemy platform. And
Kirill and I met through the SuperDataScience podcast. I
was a guest on the podcast early in 2019, and I asked
Kirill if he would like to be a guest on my podcast, which
I had just launched. At that point I'd only had two
episodes, and we hit it off. We had a really great
conversation, and, if you don't mind me breaking it to
your audience right now, Kirill...
Kirill Eremenko: 00:07:11 Yeah, sure.
Jon Krohn: 00:07:15 A couple of months ago, Kirill approached me to begin
hosting the SuperDataScience podcast, so I'm absolutely
blown away. I couldn't believe that he asked me to do
that. Now we're getting me warmed up by co-hosting
today, and I couldn't be more excited.
Kirill Eremenko: 00:07:34 Me too. Super fun, super fun. It's going to be an exciting
time I think. I feel you're the right person to carry the
SDS podcast forward. Thanks for being here today, Jon.
Jon Krohn: 00:07:45 Yeah. An honor.
Kirill Eremenko: 00:07:46 Awesome. All right. Oh, and by the way, congrats on the
Machine Learning Fundamentals or Foundations. 90,000
students, right? Last I checked.
Syafri Bahar: 00:07:56 Wow.
Jon Krohn: 00:07:57 Yeah. I think it's 80,000, but that's about the same in
terms of the impact. And yeah, 80,000 students, and it's
only been live for five or six weeks. That's the kind of
thing I couldn't possibly have dreamed of. It's by
association with you guys, with the SuperDataScience
podcast, so thank you very much for that.
Jon Krohn: 00:08:22 And we're only just getting started. There are three and a
half hours live in the course right now, and I expect that
when the podcast is released it'll still be at about the
three-and-a-half-hour mark. But by the end of 2020 it'll be
about six hours, and we'll have finished the first quarter
or so of all the content. In 2021, there'll be 25 hours of
content in there, covering linear algebra, calculus,
probability, statistics, and computer science: everything
you need to know to be a great machine learning
practitioner or data scientist.
Kirill Eremenko: 00:08:58 Fantastic. That's very cool. And that's a very good segue
to Syafri, because Syafri, you love mathematics, right?
Syafri Bahar: 00:09:04 Oh, yeah.
Kirill Eremenko: 00:09:04 Your whole story is mathematics.
Syafri Bahar: 00:09:08 Sure, yeah. Exactly.
Kirill Eremenko: 00:09:09 Please tell us a bit about that.
Syafri Bahar: 00:09:13 Yeah. I've been into mathematics since I was a
child. My father is actually a math teacher, so
when I was a child-
Jon Krohn: 00:09:20 There you go.
Syafri Bahar: 00:09:20 Yeah. I remember a day, I think in elementary
school, when I started asking my teacher about a sequence
problem. I had made up a sequence problem with three
layers of differences on top of an arithmetic sequence.
Then I asked my father about the problem, but he just
tossed me a book.
Syafri Bahar: 00:09:45 Later I found out that it was actually from a university
textbook. I spent a lot of time crunching to find the
solution, and since then my interest in math has only
grown. I was also lucky enough to represent Indonesia
at a couple of math Olympiad competitions, which was a
very nice experience.
Jon Krohn: 00:10:06 That's huge because Indonesia is the fourth most
populous country on the planet, so you're representing a
big population there.
Syafri Bahar: 00:10:15 Yeah. It was quite surreal for me back then, because I
was from, how do you call it, one of the underdog regions
of Indonesia, so to say. A lot of the representatives
[inaudible 00:10:28] always come from the Jakarta area,
and I was probably the first representative from that
region, from that province actually, in, let's say, eight to
10 years. It was quite a euphoric moment for me as well.
Kirill Eremenko: 00:10:42 What province?
Syafri Bahar: 00:10:42 Sorry?
Kirill Eremenko: 00:10:42 What province is that?
Syafri Bahar: 00:10:44 Sulawesi province. Sulawesi.
Kirill Eremenko: 00:10:45 Sulawesi.
Syafri Bahar: 00:10:46 Yeah.
Kirill Eremenko: 00:10:49 I know there are a few active volcanoes there. I was once
doing a data science analysis of active volcanoes over the
past, I don't know, few centuries, and there are quite a
few in Sulawesi. I think four or five [inaudible 00:11:02]
hundreds.
Syafri Bahar: 00:11:10 Yeah. It looks like a K on the map, so it's easily
recognizable. Since then my interest has kept growing, and
I'm still actively reading and learning from math books. I
consider it a hobby, because I find it beautiful as a
discipline. So yeah, you're right about it: I'm a big fan
of math.
Kirill Eremenko: 00:11:26 That's awesome. And when you're not doing math, what is it
that you do? You're so into mathematics that it sounds
like your full-time job, but you have a different full-time
job. Tell us a bit about that.
Syafri Bahar: 00:11:39 Oh, yes. Yes. That's my day job: I am VP of data
science at Gojek. Gojek is an on-demand super app
platform with around 20 products: ride hailing, food
delivery, entertainment such as Netflix-like streaming
services, and insurance, for example. It's a super app.
Syafri Bahar: 00:12:09 We even used to have a service where you could order
a masseuse to come to your house within 15 minutes, with
just a tap of your thumb. Unfortunately, we couldn't keep
that service going. But yes, it has been a hyper-growth
product. We became the first unicorn of Indonesia, and
then two or three years later we became the first
decacorn of Indonesia, which is surreal in terms of
growth, I would say.
Kirill Eremenko: 00:12:41 What's a decacorn?
Syafri Bahar: 00:12:43 A decacorn is a startup with a valuation of $10 billion, basically.
Kirill Eremenko: 00:12:45 10 billion valuation. Oh, my gosh. In 10 years, right you
said?
Syafri Bahar: 00:12:49 Well, it's eight years actually to be precise.
Kirill Eremenko: 00:12:52 Yeah. Wow. Wow. Very cool. Are you subscribed to the
Data Science Insider? Personally, I love the Data Science
Insider. It is something that we created, so I'm biased,
but I do get a lot of value out of it. If you don't know,
the Data Science Insider is an absolutely free newsletter
that we send to your inbox every Friday. It's very easy to
subscribe to: go to SuperDataScience.com/DSI.
Kirill Eremenko: 00:13:20 And what do we put together there? Well, our team goes
through the most important updates over the past week
or maybe several weeks and finds the news related to
data science and artificial intelligence. You can get
swamped with all the news, even if you filter it down to
just AI and data science, and that's why our team does
this work for you.
Kirill Eremenko: 00:13:39 Our team goes through all this news and finds the top
five, simply five articles that you will find interesting for
your personal and professional growth. They are then
summarized and put into one email, and at the click of a
button you can access them and look through the
summaries. You don't even have to go and read the whole
article; you can just read the summary and be up to
speed on what's going on in the world.
Kirill Eremenko: 00:14:01 And if you're interested in what exactly is happening in
detail, then you can click the link and read the original
article itself. I do that almost every week myself. I go
through the articles and sometimes I find something
interesting, I dig into it. So if you'd like to get the updates
of the week in your inbox, subscribe to the Data Science
Insider absolutely free at SuperDataScience.com/DSI.
That's SuperDataScience.com/DSI. And now, let's get
back to this amazing episode.
Jon Krohn: 00:14:32 And you're financed by some of the biggest financiers
around: Sequoia Capital, Tencent, Google, Facebook. It's
interesting, because you would think that a lot of those
companies would actually be competitors. I guess they see
a lot of potential in Indonesia.
Jon Krohn: 00:14:50 Something that really interests me and may interest a lot
of our listeners is: what is a super app? In the West, I
don't think we have anything like that. It almost seems
like in the West they deliberately fragment apps. Facebook,
for example, split off Messenger, and fragmented into as
many different pieces as possible.
Jon Krohn: 00:15:11 When you have a super app, when you look on your
phone it's just one app that you click on, and then when
you're inside you navigate to all these? You get your
massage and your insurance once you're inside?
Syafri Bahar: 00:15:24 Exactly, Jon. Yeah. It is very interesting indeed,
because if you think about it, there isn't really a
comparable platform out there, I would say. The idea is
that we built the whole ecosystem within one app. And
I think [inaudible 00:15:39] actually managed to create
this network, and then you start to reap the benefits of
having it, because anything that you put into that
ecosystem scales very fast.
Syafri Bahar: 00:15:48 For example, in food delivery we became the biggest in
Asia excluding China. Logistics also became the biggest
in Indonesia just from leveraging this network effect that
we have within the app. But you're right: if you think
about it, the opportunities to apply data science and
machine learning in terms of personalization are just
immense.
Syafri Bahar: 00:16:17 It's just amazing. For example, being able to know
[inaudible 00:16:21] of food orders or massage
appointments allows us to recommend the best service
for that. You might think of music, actually.
[crosstalk 00:16:34]. There's so much information
within the network that can be leveraged to build very
powerful personalization. It's quite an exciting
environment. It's like having 20 companies under one
umbrella, pretty much.
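Syafri describes cross-service personalization only at a high level. As a rough illustration of the idea, and not Gojek's actual system, the sketch below uses simple item co-occurrence: services that the same users tend to use together get recommended to users who have tried only some of them. The service names, user histories, and the `recommend` helper are all made-up, hypothetical toy examples.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(user_histories):
    """Count, for every pair of services, how many users have used both."""
    co = defaultdict(Counter)
    for services in user_histories.values():
        for a, b in combinations(sorted(set(services)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(user, user_histories, co, k=2):
    """Rank services the user hasn't used by co-occurrence with services they have used."""
    used = set(user_histories[user])
    scores = Counter()
    for service in used:
        for other, count in co[service].items():
            if other not in used:
                scores[other] += count
    # Highest score first; ties broken alphabetically so the output is deterministic
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Made-up toy histories: which services each user has used
histories = {
    "u1": ["ride", "food", "massage"],
    "u2": ["ride", "food", "music"],
    "u3": ["food", "massage", "insurance"],
    "u4": ["ride", "music"],
    "u5": ["food"],
}
co = build_cooccurrence(histories)
print(recommend("u5", histories, co))  # ['massage', 'ride']
```

Here a user who has only ordered food is shown massage and ride first, because those co-occur with food most often in the toy data. Raw counts favour popular services; a production recommender would likely normalize for popularity or use learned embeddings, and the count-based version is used only because it is easy to read.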
Jon Krohn: 00:16:52 Yeah. The data science perspective of it sounds absolutely
amazing. And I guess we'll spend most of today's program
talking about that, so it's great. I love this idea of how you
can be like, "Oh, yeah. If you like a deep tissue massage,
then you'll probably be interested in our athlete
insurance."
Syafri Bahar: 00:17:08 Exactly.
Kirill Eremenko: 00:17:12 It's like recommender systems on Netflix or Amazon but
on steroids. You get the network effect of the
recommender systems. It's exponential on exponential. No
wonder it grows so fast.
Syafri Bahar: 00:17:25 Exactly, exactly.
Kirill Eremenko: 00:17:25 That's so cool. As I understand, you're operating in
Thailand, Vietnam, Singapore, and Indonesia. Is that
correct?
Syafri Bahar: 00:17:32 Correct. Yes.
Kirill Eremenko: 00:17:34 And how many people, just for those... Of course, for
those people who are from those countries will know you
well, but for those from the West who maybe haven't
heard of Gojek, how many users do you have on your
platform? How many people do you work with on your
platform?
Syafri Bahar: 00:17:51 Sure, sure. Maybe just to give you an idea of the scale:
the app itself has been downloaded 170 million times, and
I think one in every four Indonesians has the app
installed. They have actually [inaudible 00:18:08].
And then we have already around-
Kirill Eremenko: 00:18:10 I have the app installed, too.
Syafri Bahar: 00:18:11 Oh, really?
Kirill Eremenko: 00:18:13 At least, I've had it.
Syafri Bahar: 00:18:14 [crosstalk 00:18:14].
Kirill Eremenko: 00:18:15 Yeah. When I was in Bali I asked for a ride. You get on the
scooter behind this driver and you hold on for your life.
It's a really cool experience.
Syafri Bahar: 00:18:26 Yeah. And just to give you the scale, because it's very
interesting: in total, counting drivers and other service
providers, we have around two to two and a half million.
That's almost 1% of the population of Indonesia, which is
quite crazy. Basically, it means that a lot of people's
livelihoods actually depend on us.
Syafri Bahar: 00:18:49 So it's also quite a privilege, I feel, because we need to
do our jobs really well in order to survive and really
provide these people with their day-to-day living as well.
Maybe a couple of other things [inaudible 00:19:05].
In terms of the economy, we've also contributed
immensely in Indonesia. I think if we total all the income
coming from the platform itself, it actually contributes
1% of Indonesia's GDP. So it's pretty big.
Jon Krohn: 00:19:22 That's incredible. Yeah.
Syafri Bahar: 00:19:24 Yeah. And actually, we also hit our two billion orders
milestone last year, if I'm not mistaken. It's actually quite
a milestone also for us.
Kirill Eremenko: 00:19:33 Congrats. That's really cool. I'm sure data science played
a huge role in that.
Syafri Bahar: 00:19:40 Yes, yes.
Jon Krohn: 00:19:42 I had a question in a similar vein, and I've been teed up
for it perfectly: how big is the core company at Gojek?
For example, how many data science people are there?
Syafri Bahar: 00:19:57 Yeah. In data science, there are around 60 to 80 people
now, I think. In total within data [inaudible 00:20:05] we
have around 150 to 180 people. We actually have three
different analytics professions, I would call them, within
the company: data scientists, BI (business intelligence),
and decision scientists.
Syafri Bahar: 00:20:22 We introduced decision science recently to help us make
the right calls on the million-dollar decisions that we
need to take. Those need really specialized knowledge to,
how do you call it, clear up all the ambiguities in the
questions being asked, and to take decisions
systematically and more rigorously. That's about the size
of the data team.
Jon Krohn: 00:20:49 Nice. The decision sciences team sounds like the holy
grail in business. That's what everybody wants to be
doing, and maybe because you guys do it that's why
you're having this incredible hyper growth, and you've
become a decacorn. It could be a big part of it.
Syafri Bahar: 00:21:03 Yeah. We actually only started recently, so we'll see.
But I think we're probably one of the first companies in
Indonesia to introduce this job ladder, this job family.
I am really looking forward to seeing what kind of impact
it can have. And if you look at the use cases, there are
already quite a lot where we need to take, for example,
decisions about expansion, decisions about releasing
certain features, or decisions about distributing
[inaudible 00:21:30]. Those are the typical questions that
people in this role will focus on.
Kirill Eremenko: 00:21:40 What's the difference in skill sets for a decision scientist
versus a data scientist?
Syafri Bahar: 00:21:48 Yeah. It's our own definition, because within the market,
especially in Indonesia, every company has its own way to
define data science. In our definition of data science
versus decision science, if you look at the core skill
set, data scientists at our company are also very strong
in software engineering, so they're trained to build
scalable machine learning systems.
Syafri Bahar: 00:22:14 They're a little bit more like applied machine learning
engineers, very close to that. Our decision scientists, on
the other hand, need to be very strong in statistical
analysis, like causal inference, for example. They need to
be able to do hypothesis testing, and they need to be good
at experimentation. The focus is a little bit different.
Syafri Bahar: 00:22:34 Data scientists really build data products, scalable data
products. Our decision scientists really help with decision
making, by running certain analyses: statistical analyses
that can help us make better decisions.
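To make the kind of hypothesis test a decision scientist might run more concrete, here is a minimal two-proportion z-test, the sort of check one might use to decide whether a released feature changed a conversion rate. Syafri does not describe Gojek's statistical tooling, so the function name and the experiment numbers below are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates (A = control, B = treatment)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert at the same rate
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 10,000 users per arm, 10% vs. 11% conversion
z, p_value = two_proportion_z_test(1000, 10000, 1100, 10000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these made-up counts, z comes out at roughly 2.3 and the two-sided p-value at roughly 0.02, so the lift would clear a conventional 5% significance threshold. In practice a decision scientist would also fix the sample size in advance and think about causal confounders, which this sketch ignores.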
Kirill Eremenko: 00:22:48 Wow. That's very cool. I guess in smaller companies, or
companies that are not as advanced in terms of data
science, all of that, the hypothesis testing and so on, is
combined in the analyst or data scientist role. But as you
scale, I guess you made the call to separate those two and
really specialize people: "All right. You are in hypothesis
testing and can run all these experiments, whereas
you're in machine learning and feature engineering,"
and things like that. So people can actually focus and get
really good at, not one thing, but the group of things that
are relevant to that profession.
Syafri Bahar: 00:23:31 Yeah. Indeed, indeed.
Kirill Eremenko: 00:23:32 That's very cool.
Jon Krohn: 00:23:34 I've been reading about Gojek's machine learning platform
in a series of articles on Medium, and it seems like those
data scientists build some very cool specialized tools,
like CartoBERT, which uses BERT, the transformer model
from natural language processing. So you're leveraging
particular deep learning techniques in the ride-hailing
product to create names for pickup points, right?
Syafri Bahar: 00:24:04 Correct. Indeed.
Jon Krohn: 00:24:06 And then I read about Turing, which is named after the
great British computer scientist Alan Turing. It's a
tool for evaluating machine learning models, I guess,
before they go into production, or maybe also after
they're in production, to make sure that they're still
performing as you'd expect?
Syafri Bahar: 00:24:24 Yeah. I'm very, very happy that you've spent some time
visiting our Medium blog; there are great articles over
there. As for CartoBERT, I think the idea is one of the
things that we would like to [inaudible 00:24:39]. It's
also a very interesting example of how we really use data
end to end. If you don't mind, I'll tell a little story
about CartoBERT.
Jon Krohn: 00:24:48 Please.
Syafri Bahar: 00:24:50 Yeah. It used to be that, from the data, we learned
[inaudible 00:24:54] people who get picked up from very
crowded locations, like shopping malls and so on. We
looked at the percentage of people who call their
drivers, and it's actually 2x compared to other places.
Syafri Bahar: 00:25:09 Basically, we concluded that people [inaudible
00:25:12] around these areas. So we ran some clustering,
we [inaudible 00:25:18], and we found that among these
pickup points we could actually find the centers of the
clusters where people ask to be picked up.
Syafri Bahar: 00:25:30 We also have the chat history between drivers and our
customers. So with the clustering system we picked the
center point, and then we needed to attach a label to it.
That's where CartoBERT comes into play, because it allows
us to crunch millions of chat logs and summarize them
into a named pickup point.
Syafri Bahar: 00:25:54 Especially given the size of Indonesia, it's just not
possible to do it manually. So we rolled out 100,000
pickup points across Indonesia for shopping malls, and
that translated into product features that people love.
We also saw quite a significant reduction in the number
of calls between drivers and customers. That's just to
illustrate how we really use data to improve the
experience of our users.
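The pipeline Syafri outlines has two steps: cluster historical pickup locations to find center points, then attach a name to each center using driver-customer chats. The sketch below is a toy illustration of that flow only, under two loudly stated assumptions: plain k-means stands in for whatever clustering Gojek actually used (the transcript doesn't say), and simple keyword counting over a hypothetical `candidate_names` list stands in for CartoBERT, which is a BERT-based language model. All coordinates and chat messages are invented, and Euclidean distance on latitude/longitude is a rough approximation that is only tolerable over very small areas such as a single mall.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain k-means on (lat, lon) pairs with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point whose nearest existing centroid is farthest away
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


def label_centroids(centroids, chats, candidate_names):
    """Hypothetical stand-in for CartoBERT: name each centroid by the candidate most mentioned nearby."""
    counts = [Counter() for _ in centroids]
    for location, text in chats:
        i = min(range(len(centroids)), key=lambda j: math.dist(location, centroids[j]))
        lowered = text.lower()
        for name in candidate_names:
            if name.lower() in lowered:
                counts[i][name] += 1
    return [c.most_common(1)[0][0] if c else None for c in counts]


# Invented pickup coordinates around two spots at one location
pickups = [(-6.2251, 106.8001), (-6.2249, 106.7999), (-6.2250, 106.8002),
           (-6.2300, 106.8100), (-6.2302, 106.8098), (-6.2299, 106.8101)]
# Invented chat messages, each tagged with where it was sent from
chats = [((-6.2250, 106.8000), "I'm at the Lobby North entrance"),
         ((-6.2251, 106.8002), "waiting at lobby north"),
         ((-6.2249, 106.8001), "near Gate 3, no wait, Lobby North"),
         ((-6.2301, 106.8099), "I'm at Gate 3"),
         ((-6.2300, 106.8100), "gate 3 please")]
candidate_names = ["Lobby North", "Gate 3"]  # hypothetical candidate labels

centroids, clusters = kmeans(pickups, k=2)
print(label_centroids(centroids, chats, candidate_names))  # ['Lobby North', 'Gate 3']
```

The keyword matching is exactly the part a language model would replace: real chats rarely mention a place name verbatim and consistently, which is presumably why a model like CartoBERT is needed to summarize millions of them.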
Kirill Eremenko: 00:26:20 That's a cool one.
Syafri Bahar: 00:26:25 In addition to that, a couple of months ago we worked
together with the Hong Kong University of Science and
Technology and released probably one of the biggest
pre-trained BERT NLP models for the Indonesian language,
and we have open sourced it. If you happen to be
interested in NLP for the Indonesian language, you can go
to www.IndoNLU.com and download the pre-trained model for
the Indonesian language.
Kirill Eremenko: 00:27:07 Beautiful. That's awesome.
Jon Krohn: 00:27:07 Yeah. It's great to be sharing your expertise with the
world. Really wonderful. Seems like you guys are doing
great things on your team. Maybe Kirill already knows
this, I don't know how much you know about each other's
backgrounds, but in your role are you... Who reports to
you? How big are the teams? What does being a VP of
data science mean at Gojek?
Kirill Eremenko: 00:27:38 Yeah. I'm also very curious. That's a great question.
Syafri Bahar: 00:27:41 Yeah. Thanks a lot. Especially in a hyper-growth startup,
my role has probably changed three to four times in scope
within two years. I was originally hired to develop the
data science capabilities [inaudible 00:27:56], and then
became head of data science, basically.
Syafri Bahar: 00:28:00 At that time the team was still around 40 to 50 people, I
think, including the machine learning engineers. [inaudible
00:28:07] platform. Recently, the portfolio has grown a
little. I also have two other peers within Gojek, and we
all report to the chief data officer of Gojek.
Syafri Bahar: 00:28:23 Together with my peers we split the portfolio, and I
currently oversee around nine verticals: entertainment,
third-party platform, groceries, and marketing, for
example. It's not all logistics. For those verticals we
oversee both the analytics and the science parts of the
portfolio.
Syafri Bahar: 00:28:46 By analytics I mean BI and data analysts, and the science
part is decision scientists and data scientists. In
total, I think around 60 people currently ultimately
report to me.
Syafri Bahar: 00:29:07 In terms of scope of work, it encompasses almost the whole
spectrum. If I were to cluster [inaudible 00:29:15]
activities, it starts with the people themselves. We also
take care of technology: which technology we need to
invest in next year, for example.
Syafri Bahar: 00:29:25 We also deal with building the organization: how we
organize ourselves to tackle the company's strategic
themes and position ourselves. Basically all of these
aspects, from hiring onward, and even the dirty work, like
the financing, cleaning up the systems, and stuff like
that.
Syafri Bahar: 00:29:47 It encompasses almost everything, basically. I see myself
as a problem solver, in the sense that I try to fill
whatever gaps there are. For example, if I think, "Hey, I
don't think there is a clear career path for some of our
people," I will immediately jump in and talk to HR to
make sure that we create a good system that allows people
to follow their aspirations.
Syafri Bahar: 00:30:15 But sometimes I'm also put on very project-specific work,
for example really understanding our customers: creating a
framework to actively manage our customer portfolio by
properly [inaudible 00:30:31] customer lifetime, for
example. Those are the different ends of the spectrum,
just to give you some flavor of what I do on a day-to-day
basis. I hope that answers your question.
Jon Krohn: 00:30:38 Yeah. That was an amazing answer, and it sounds like a
really interesting role. Wow.
Kirill Eremenko: 00:30:46 Do you still do much technical work?
Syafri Bahar: 00:30:50 Yes. I try to, because especially in this field things
evolve very rapidly. So I still try to spend a couple of
hours coding and actually pushing code to the repository,
and to be involved in the technical discussions about
modeling.
Kirill Eremenko: 00:31:10 Yeah. That's impressive.
Syafri Bahar: 00:31:12 [crosstalk 00:31:12]. Yeah, exactly.
Kirill Eremenko: 00:31:15 Absolutely. That's very, very good to hear. I like what you
said in one of your interviews about how a high-performing
data science team requires three main components. Do you
mind telling us a bit about that? As a VP of data science,
you're in a unique position: not only do you need to
deliver the work, you also need to evaluate the
performance of your team.
Kirill Eremenko: 00:31:41 And report to higher-up executives that, "We are delivering
value. This is a very useful team for the company." You
have accountability, and you have a responsibility to your
team to do that; otherwise, there are stories of whole
teams getting disbanded because executives didn't see
value. And I found your philosophy about what makes a
high-performing data science team very structured.
Kirill Eremenko: 00:32:10 And I think not only managers listening to this podcast
will find it valuable; individual contributor data
scientists will too. It helps them evaluate for themselves
whether they're part of such a team, and what they can do
to become part of one. If you could jump into that, that'd
be great.
Syafri Bahar: 00:32:30 Sure. Yeah. Thanks a lot for that. Indeed, what I found
to be very challenging is really establishing value.
Especially data science itself as a discipline is a
valued thing. It takes a lot of faith, I would say, from
executives to even invest in the team, because the
investment typically takes time: our largest machine
learning systems, the ones that really move the needle,
probably took a year in the making.
Syafri Bahar: 00:33:01 It involves a lot of iterations, trial and error. So what I
also highlighted in my interview is that what we need to
show the company is, first of all, that we measure
everything. That's also the reason why, in all of our
machine learning systems, we also integrate the
measurement system inside, just to ensure that we are
able to quantify the impact even at the team level.
Syafri Bahar: 00:33:26 For example, I'm able to know what dollar impact a
team of three people delivers for a particular project.
That's how rigorous we are in terms of measuring impact.
And really, the case that we try to make is that we're not
[inaudible 00:33:43], because there are very tangible
dollar savings, or dollars generated, that we deliver for
the company.
Syafri Bahar: 00:33:50 And we are able to achieve that by putting very strong
measurement in place, even before we engage in any
machine learning project. The first thing we ask of our
product engineering counterparts is to have a
measurement system in place. We have an
experimentation system in place, just to understand
whether the things we build for them will actually lead
to any impact.
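Syafri doesn't describe Gojek's experimentation system in detail. As a minimal sketch, assuming a simple randomized A/B test with per-user outcomes, the kind of dollar-impact figure he mentions could be estimated as a difference in means with a normal-approximation confidence interval, then scaled by a user count. The function names, the normal approximation, and the toy numbers are illustrative assumptions, not Gojek's method.

```python
import math
import statistics


def uplift_with_ci(control, treatment, z=1.96):
    """Difference in mean outcome (treatment minus control) with a normal-approximation CI.

    `control` and `treatment` are per-user outcomes (e.g. revenue per user)
    from a randomized experiment. Assumes independent samples large enough
    for the normal approximation to be reasonable.
    """
    diff = statistics.fmean(treatment) - statistics.fmean(control)
    # Standard error of the difference of two independent sample means.
    se = math.sqrt(
        statistics.variance(control) / len(control)
        + statistics.variance(treatment) / len(treatment)
    )
    return diff, (diff - z * se, diff + z * se)


def projected_dollar_impact(uplift_per_user, users_per_year):
    """Hypothetical helper: scale a per-user uplift to an annual dollar figure."""
    return uplift_per_user * users_per_year


if __name__ == "__main__":
    # Made-up outcomes, purely for illustration.
    control = [10.0, 12.0, 9.5, 11.0, 10.5, 9.0, 12.5, 10.0]
    treatment = [11.0, 13.0, 10.5, 12.0, 11.5, 10.0, 13.5, 11.0]
    diff, (lo, hi) = uplift_with_ci(control, treatment)
    print(f"uplift per user: {diff:.2f} (95% CI {lo:.2f} to {hi:.2f})")
    print(f"projected annual impact: {projected_dollar_impact(diff, 1_000_000):,.0f}")
```

A real system would also need randomization checks, a pre-registered metric, and far larger samples than this toy example.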
Syafri Bahar: 00:34:15 And the thing about being in a hyper-growth startup is
that there are hundreds of things we could do next year.
So we need to have a very ruthless [inaudible
00:34:24]. Having a proper way to measure the impact, or
potential impact, is essential to establish a case for the
company.
Syafri Bahar: 00:34:35 I think one of the characteristics of a high-performing
team is that they deliver impact, and how do they know
whether they deliver any impact? It's by putting this
measurement in place. And then by educating as well.
What I've learned during my time at Gojek is that a lot
of these end-to-end projects have actually managed to
deliver big impact.
Syafri Bahar: 00:35:01 Of course, a lot of the challenges are [inaudible
00:35:03] challenges as well, but not to be underestimated
is the non-technical challenge of ensuring that we have
created a good structure for our data scientists and
product engineers, so that they can work at different
paces but still integrate their solutions.
Syafri Bahar: 00:35:27 That is one, and the second thing is constant education
of stakeholders, to convince them why it is okay for
millions of dollars of their money to be managed by a
black box. That also requires a lot of convincing, I would
say. So establishing a good operating model is essential
for a high-performing team, because by having a good
operating model...
Syafri Bahar: 00:35:53 Just to give a little bit more flavor to that: we recently
declared that all of our solutions need to communicate
with product engineering systems via APIs. That allows
people to move at different paces, and then meet up again
a couple of weeks later to integrate their solutions.
Syafri Bahar: 00:36:14 But that only works as long as, before the start of a
project, the teams are very clear about what they will
deliver, and we can only achieve that by having a proper
API contract. It allows the teams to really iterate. And
what I find very interesting about data science is that its
iterations, its sprint cycles, are very different from those
of product engineering teams.
Syafri Bahar: 00:36:38 There are so many data dependencies when we look at it
from the data science perspective, so we can't really treat
it as an engineering sprint. Data scientists need the
flexibility to move at a different pace, but eventually the
solutions they build need to match the API.
Syafri Bahar: 00:36:56 What I also find very important is to ensure that the
team is empowered to make decisions, by having a proper
experimentation system and a robust methodology to
decide whether the team needs to go left or right, and by
empowering them to make decisions in a decentralized
way. I found that to be very important to ensure that the
team can move very fast.
Syafri Bahar: 00:37:20 So we need to trust them with decision making in a
decentralized manner, as long as the methodology and
the systems they've created to make those decisions are
robust. I hope that answers your question.
Kirill Eremenko: 00:37:31 Yeah. And empower them to fail as well, right? You said
that in one of your other interviews.
Syafri Bahar: 00:37:36 Yeah.
Kirill Eremenko: 00:37:37 Decisions sometimes will be wrong, and they should know
it's okay.
Syafri Bahar: 00:37:42 Exactly. And I'm very glad that we [inaudible
00:37:44], our CEOs or co-CEOs [inaudible 00:37:49], are
very supportive of that, with a culture where it's okay to
fail. It's better to fail fast and learn from it than to move
very slowly, especially because the competition is very
fierce and the market also moves very fast. So agility is
definitely something we value very highly within our
company.
Kirill Eremenko: 00:38:14 Fantastic. Thank you. Thank you for that answer.
Jon Krohn: 00:38:17 It sounds like you guys are doing everything right. Yeah.
If I were in Indonesia and listening, I'd be like, "Man, how
can I get involved with this company?" Really amazing.
Everything you're saying is spot on from a quantitative
data management perspective: how you are treating your
data scientists, relating their work to the broader
operations of the organization, and evaluating it.
Brilliant.
Syafri Bahar: 00:38:52 Yeah. I also feel very privileged to work with these
amazing people, and I learn a lot from my team. And the
ability to work with amazing people who are distributed:
our teams are well distributed, and even our CEO is
working from the US. We have 31 nationalities working
for the company, so we're really chasing talent globally.
We have people across different continents working for
us. Just as additional information.
Jon Krohn: 00:39:27 There you go. How did you find yourself here? What was
your journey? I know that you've worked across the
world: you studied in the Netherlands and then worked
there for a while at banks and an asset management
company. So what was your journey from that world,
from a different continent?
Jon Krohn: 00:39:49 When I was talking to you, I expected that you would have
been involved in a lot of the finance applications at Gojek.
I thought that would be what you were working on. But it
sounds like it's much broader than that, so how did you
end up making that journey from financial companies,
really traditional financial companies, big banks in the
Netherlands, to a hyper-growth decacorn in Indonesia?
Syafri Bahar: 00:40:18 Yeah. Thanks a lot for asking the question, especially
because the journey has been very personal to me. The
reason is that I've always had doubts about whether a
pure mathematician like me could make an impact on
society. I always saw it as very remote.
Syafri Bahar: 00:40:40 I remember there was a certain time in my life when I
said [inaudible 00:40:44]. My background is in pure
mathematics; my thesis back then was about topological
structures, so I hadn't really seen data. I worked a lot on
writing formulas [inaudible 00:40:58] formulas during my
bachelor's education.
Syafri Bahar: 00:41:04 It just felt very remote and not [inaudible 00:41:07], so I
switched a little bit to applied mathematics. I took an
education to become a quant, and there I got into a lot
of high-performance computing, some stochastics
courses, and was actually able to see data. How do you
say? It's quite a spotty journey to, how do you call it,
come from pure mathematics-
Jon Krohn: 00:41:42 [inaudible 00:41:42]. Yeah.
Syafri Bahar: 00:41:42 ... to applied. Exactly, right.
Jon Krohn: 00:41:47 You might not even have had numbers for many years.
Syafri Bahar: 00:41:50 No. No, no, no.
Jon Krohn: 00:41:50 It was just variables, right?
Syafri Bahar: 00:41:54 It's just variables. Indeed, indeed.
Jon Krohn: 00:41:54 That's so interesting.
Syafri Bahar: 00:41:55 Yeah, exactly. And then when I came to the bank and
started my training, I was trained, I would say, in a very
classical environment. I remember one of my mentors
back then: I was asked to do analysis with only five basic
statistics: mean, median, percentile, max, and min. And I
really needed to kind of-
Jon Krohn: 00:42:14 Oh, no.
Syafri Bahar: 00:42:15 Yeah, exactly. But I had to rely a lot on my problem-
solving skills, and on getting to know what these
measurements are actually doing. Because I was
surprised that a lot of things can be done with these very
basic statistics. A lot of insight can be uncovered just by
playing around with the weighted average, for example,
and then comparing these different statistics.
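To make the "five basic statistics" exercise concrete, here is a small Python sketch using only the standard library. The data are made up and the helper names are hypothetical; it simply shows how comparing the mean with the median, and a simple average with a weighted one, can already hint at skew or an anomaly.

```python
import statistics


def basic_stats(values, pct=90):
    """Mean, median, one percentile, max and min of a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        f"p{pct}": cuts[pct - 1],
        "max": ordered[-1],
        "min": ordered[0],
    }


def weighted_average(values, weights):
    """Average of `values`, each counted in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Made-up numbers: one extreme value drags the mean far above the median,
    # which is often the first hint of a skewed distribution or an anomaly.
    print(basic_stats([1, 2, 2, 3, 100]))

    # Made-up default rates for two segments with very different exposures:
    # the simple average (5.5%) and the exposure-weighted average (1.9%)
    # tell different stories.
    rates, exposure = [0.01, 0.10], [900, 100]
    print(statistics.fmean(rates), weighted_average(rates, exposure))
```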
Syafri Bahar: 00:42:38 And really make an educated guess about what the
[inaudible 00:42:41] distribution is, and whether there is
an anomaly or not. With some basics, as long as one
knows very well [inaudible 00:42:49], a lot of things can
be done. So I was trained in that environment, and I was
also lucky enough to work with different types of risk.
Syafri Bahar: 00:42:59 And for the audience who's not very familiar with risk
management, there are different types of risk. What's
unique is that each type of risk deploys a different type of
mathematical tool. Just as an example, for credit risk I
used a lot of predictive models, with [inaudible 00:43:19]
risk.
Syafri Bahar: 00:43:19 For example, in the last domain I worked in before I
moved to Indonesia, I needed to do a lot of simulation
work. I maintained a Monte Carlo engine for the bank.
Basically, we had a couple of hundred thousand trades,
and we needed to run simulations of thousands of risk
factors.
Syafri Bahar: 00:43:44 And we needed to simulate them not just one or two days
ahead, but 30 years ahead. So I used a lot of parametric
simulation techniques to be able to do that.
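To make the Monte Carlo description concrete, here is a toy sketch, not the bank's engine. It simulates one risk factor as geometric Brownian motion (a common parametric choice, assumed here) with monthly steps over 30 years, then reads off percentiles of the simulated distribution at the horizon. A production engine would model thousands of correlated factors and revalue each trade along every path; the parameter values below are arbitrary.

```python
import math
import random


def simulate_gbm_paths(s0, mu, sigma, years, steps_per_year, n_paths, seed=0):
    """Simulate paths of one risk factor under geometric Brownian motion.

    Uses the exact GBM step S(t+dt) = S(t) * exp((mu - sigma^2 / 2) dt + sigma sqrt(dt) Z),
    with Z standard normal. Returns n_paths lists of years * steps_per_year + 1 values.
    """
    rng = random.Random(seed)
    dt = 1.0 / steps_per_year
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    paths = []
    for _ in range(n_paths):
        s = s0
        path = [s]
        for _ in range(years * steps_per_year):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(s)
        paths.append(path)
    return paths


def quantile_at(paths, step, q):
    """Empirical q-quantile (0 <= q <= 1) of the simulated factor at a time step."""
    values = sorted(path[step] for path in paths)
    return values[min(int(q * len(values)), len(values) - 1)]


if __name__ == "__main__":
    # Arbitrary illustrative parameters: 30 years ahead, monthly steps.
    paths = simulate_gbm_paths(s0=100.0, mu=0.02, sigma=0.15,
                               years=30, steps_per_year=12, n_paths=500)
    horizon = 30 * 12
    for q in (0.01, 0.5, 0.99):
        print(f"{q:.0%} quantile at 30y: {quantile_at(paths, horizon, q):.1f}")
```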
Syafri Bahar: 00:43:59 But basically, what I want to say is that I built the
required skill set in a very classical environment, really
bit by bit. And what I find very beautiful about
mathematics is that it's very, how do you call it,
transferable to other domains, because the language is
the same. The language of linear algebra especially has
been very useful for me in grasping new concepts as well.
Syafri Bahar: 00:44:29 When I came back to Indonesia, I started at a fintech
company. Then, by coincidence, I gave a talk at Gojek, I
got approached by the person who is now one of the
co-CEOs of Gojek, and I got hired over a coffee.
Kirill Eremenko: 00:44:46 Nice.
Syafri Bahar: 00:44:47 I'm very glad that he took a bet on me, and that's how I
got to where I am now.
Kirill Eremenko: 00:44:56 That's interesting.
Syafri Bahar: 00:44:56 It was quite a series of coincidences actually.
Kirill Eremenko: 00:45:00 That's very interesting. I've got a question, I guess a
question that will challenge me more, and I'd like to get
your opinion on this. The way I teach data science in my
courses is very different from the way Jon teaches, and, I
guess, from the way you apply data science. I also studied
mathematics; I studied mathematics and physics in my
bachelor's, but it was a long time ago, and I liked it a lot.
Kirill Eremenko: 00:45:30 But the way I applied data science when I was at Deloitte,
in industry, required very little mathematics. And that's
how I teach it as well. I teach it more as a plug-and-play
type of instrument: "All right, machine learning. Here's
an algorithm. I don't know, Naïve Bayes clustering. This is
intuitively how it works, this is what's in the background.
This is what's going on and this is how you apply it."
Kirill Eremenko: 00:45:57 And I avoid teaching the mathematics. For instance, the
analogy I give is driving a car. To drive a car, you need to
know where to put the petrol, how to steer, where to press
the gas, where to press the brakes. And you need a lot of
practice. That's how you pass your driving test. You never
need to know what a crankshaft is, how it's different from
a camshaft, what's under the hood.
Kirill Eremenko: 00:46:20 I sometimes don't even know how to put the oil in the
car, for crying out loud. So my question to you is, is there
a right or wrong? Or if you think it's important for people
to learn mathematics in order to be data scientists, then
why?
Syafri Bahar: 00:46:42 Understood. Okay. Yeah. I think it all depends on the
type of domain they will work in, and what they're
interested in. If we look at the spectrum of applications
of mathematics within data science, I think we can
define it in a couple of clusters.
Syafri Bahar: 00:47:01 And particularly at Gojek, it is important for people to
understand the basics because we deal a lot with what I
call greenfield projects. These are the types of projects
where we can't just Google the answer. We really need to
exercise first principles to understand what kind of
mathematical apparatus we need to deploy to solve the
problem.
Syafri Bahar: 00:47:22 [inaudible 00:47:22] can just come to us and say, "Hey,
we have this amount of budget. I want you to distribute it
in an optimal way." Very, very vague and ambiguous, so
one really needs to ask more, to ask direct questions first
of all, to understand the real problem.
Syafri Bahar: 00:47:38 How do you define optimal, what are the different levers
we can use to distribute the budget, and how can we use
the right apparatus to model the problem itself? What I
want to say is that that's also the reason why we
emphasize the context a lot.
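Syafri's budget example is left deliberately open. One way to make "optimal" precise, assumed here purely for illustration, is to give each lever a concave response curve (diminishing returns) and maximize total expected value for a fixed budget. For that particular formulation, handing out budget one unit at a time to the lever with the largest marginal gain is optimal. The response curves below are invented.

```python
import heapq
import math


def allocate_budget(total_units, response_fns):
    """Greedily split `total_units` of budget across levers.

    `response_fns[i](k)` is the expected value of spending k units on lever i.
    Each function is assumed concave (diminishing returns); under that
    assumption, giving each next unit to the lever with the largest marginal
    gain yields an optimal integer allocation for this separable problem.
    """
    alloc = [0] * len(response_fns)
    # Max-heap of (marginal gain of the next unit, lever index) via negation.
    heap = [(-(f(1) - f(0)), i) for i, f in enumerate(response_fns)]
    heapq.heapify(heap)
    for _ in range(total_units):
        if not heap:
            break
        neg_gain, i = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no lever adds value any more; leave the rest unspent
        alloc[i] += 1
        f, k = response_fns[i], alloc[i]
        heapq.heappush(heap, (-(f(k + 1) - f(k)), i))
    return alloc


if __name__ == "__main__":
    # Hypothetical saturating response curves for three levers.
    levers = [
        lambda k: 100 * (1 - math.exp(-k / 10)),
        lambda k: 60 * (1 - math.exp(-k / 5)),
        lambda k: 30 * math.log1p(k),
    ]
    print(allocate_budget(50, levers))
```

If the response curves are not concave (for example, a lever only pays off above some threshold), the greedy rule is no longer guaranteed to be optimal, and a different formulation such as integer programming would be needed.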
Syafri Bahar: 00:47:56 Even for example [inaudible 00:47:58] linear regression
during an interview, what we sometimes do is tweak the
problem a little bit: "Hey, what if we add an L1 penalty,
what if we add an L2 penalty? What if, for example, we
shift the distribution of the target variable so that it
becomes highly imbalanced?"
Syafri Bahar: 00:48:15 Just to test the ability of the candidate to adapt to
different realities they might encounter while working on
problems at Gojek. And the reason we do it is that we
think that's a relevant skill set to have.
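The interview tweaks Syafri mentions can be illustrated with a small, dependency-free sketch. This is an assumed example, not Gojek's interview material: coordinate descent for weighted least squares with an elastic-net penalty, where `l1_ratio=1.0` gives the L1 (lasso) case and `l1_ratio=0.0` the L2 (ridge) case. The `balanced_weights` helper shows one common way to reweight a highly imbalanced target so each class contributes equally.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become exactly zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def penalized_regression(X, y, alpha, l1_ratio, sample_weight=None, n_iter=200):
    """Coordinate descent for weighted least squares with an elastic-net penalty.

    Minimises  sum_i s_i (y_i - x_i . w)^2 / (2 sum_i s_i)
             + alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2).
    No intercept is fitted, so centre X and y beforehand.
    """
    n, p = len(X), len(X[0])
    s = [1.0] * n if sample_weight is None else list(sample_weight)
    total = sum(s)
    w = [0.0] * p
    resid = list(y)  # y - Xw, with w starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(si * xij * xij for si, xij in zip(s, col)) / total
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(si * xij * (ri + xij * w[j])
                      for si, xij, ri in zip(s, col, resid)) / total
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_wj - w[j]
            if delta:
                resid = [ri - xij * delta for ri, xij in zip(resid, col)]
                w[j] = new_wj
    return w


def balanced_weights(labels):
    """Sample weights n / (k * count(class)), so every class carries equal total weight."""
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    n, k = len(labels), len(counts)
    return [n / (k * counts[c]) for c in labels]


if __name__ == "__main__":
    # Toy data: y = 2*x1 + 0.3*x2, with orthogonal, centred columns.
    X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
    y = [2.3, 1.7, -1.7, -2.3]
    print("L1:", penalized_regression(X, y, alpha=0.5, l1_ratio=1.0))  # weak feature dropped
    print("L2:", penalized_regression(X, y, alpha=0.5, l1_ratio=0.0))  # both shrunk, none zero
```

On this toy data the L1 fit sets the weak coefficient to exactly zero, while the L2 fit shrinks both coefficients without zeroing either, which is the kind of difference a candidate might be asked to reason about.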
Syafri Bahar: 00:48:30 I can imagine, for example, that when someone focuses a
lot on building the data science platform or engineering
platform, [inaudible 00:48:41] to know the kind of two,
three layers [inaudible 00:48:44]. Like what you said, it's
more like plug and play, but I think the emphasis will be
on how to design the right architecture that can be very
scalable, and how to use mathematical concepts to cut
some of the computational resources that go into that.
Syafri Bahar: 00:49:00 And maybe in that case, it will be less necessary to go
two or three layers deep. So I apologize, because there is
no straight answer; it all depends. For the Gojek context,
it's very important to understand the basics, because the
choice of apparatus to deal with a problem is just
immense.
Syafri Bahar: 00:49:22 We employ economic techniques, and we also employ
operations research techniques in our problems, for
example when we play around with logistics. Sometimes
also predictive models, supervised and unsupervised. And
even, to a certain extent, some [inaudible 00:49:38] type of
algorithms. There are just a lot of possibilities there, so
it's really important to know at least two or three layers
deep from the mathematical perspective.
Syafri Bahar: 00:49:51 But in my personal opinion, it is also very important to
be, how do you call it, like a painter. Sometimes we need
to be able to bring people to, how do you call it,
appreciate the painting itself. And I think sometimes the
best way to do that is by not starting with [inaudible
00:50:16] differential calculus.
Syafri Bahar: 00:50:17 But really starting with the stories: "Hey, why this is
important. Why [inaudible 00:50:22] is important.
Because hey, we can actually translate that fraction of
this problem by bringing it to the [inaudible 00:50:28], for
example." Then they're able to imagine the solutions to
the problem.
Syafri Bahar: 00:50:33 I think it depends. And my personal preference is always
to start very simple and then peel the layers one by one,
bringing people to, how do you call it, a bit more depth,
the depth that is actually necessary. That's just [inaudible
00:50:49] a lot of it in learning as well. [inaudible
00:50:52] language or the way you present your teaching.
Jon Krohn: 00:50:59 I love that answer, and I don't think I have too much
extra to add. To summarize the value of understanding
the underlying mathematics: I love the car-driving
analogy, but the beautiful thing about machine learning is
that what's going on under the hood isn't necessarily that
complicated.
Jon Krohn: 00:51:29 And so I actually started teaching in exactly the way you
described, Kirill. It's only relatively recently that I
thought, "Maybe it is worth getting into the partial
derivative calculus, the linear algebra that's happening
under here." And I was inspired to think that by
colleagues of mine, people who work for me.
Jon Krohn: 00:51:54 I would see them doing matrix algebra, or I would see
them thinking about, "What's the right data structure for
this particular type of data in this model, given how
we're going to be scaling it, so that we can minimize
computational resources?" I was seeing people use these
underlying understandings to make huge intuitive
breakthroughs on the science side; by only understanding
the [inaudible 00:52:29] API, there's no way you could
have had that breakthrough.
Jon Krohn: 00:52:33 And then on the engineering side, being able to think
about, "Okay. What is the time complexity or the memory
complexity of what I'm doing here? And how can I make
adjustments there, trade-offs between computational
complexity and memory complexity, so that I can use
fewer resources, or maybe have a faster experience, a
real-time experience for my users?"
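Jon's point about trading computation against memory can be illustrated with a classic, deliberately simple example, chosen here for illustration rather than taken from the conversation: answering many range-sum queries over a series, either by rescanning each time or by precomputing prefix sums once. The function and class names are hypothetical.

```python
from itertools import accumulate


def range_sum_scan(values, i, j):
    """Sum of values[i:j] by scanning: O(j - i) time per query, O(1) extra memory."""
    total = 0
    for k in range(i, j):
        total += values[k]
    return total


class PrefixSums:
    """Spend O(n) memory once so every range-sum query takes O(1) time."""

    def __init__(self, values):
        # prefix[k] holds sum(values[:k]), with prefix[0] = 0.
        self.prefix = [0, *accumulate(values)]

    def range_sum(self, i, j):
        return self.prefix[j] - self.prefix[i]


if __name__ == "__main__":
    trips_per_hour = [3, 1, 4, 1, 5, 9, 2, 6]  # made-up numbers
    ps = PrefixSums(trips_per_hour)
    print(range_sum_scan(trips_per_hour, 2, 5), ps.range_sum(2, 5))  # both 10
```

The scan is preferable when queries are rare or memory is tight; the prefix array wins when the same series is queried many times, for example to serve a real-time feature.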
Jon Krohn: 00:53:02 Seeing how valuable this is, is a really recent thing for
me, but the more I dig into it, the more I appreciate that,
"Wow. There are so many possibilities." And there's still
absolutely a time and a place for using the high-level
APIs.
Jon Krohn: 00:53:19 I mean, maybe more often than not. But to make really
cutting-edge algorithms, or even to understand and try
to deploy some of the latest things you read about that
might only appear in papers or graduate-level textbooks,
there might not be a high-level API for you to use yet. So
if you wanted to make CartoBERT, it isn't enough to be
able to use BERT. You have to understand what's
happening in BERT.
Syafri Bahar: 00:53:51 Yeah. Fascinating. If I can add a couple more sentences
to that: I'm on a personal mission to really spark interest
in people, especially in Indonesia, so that they find this
discipline fascinating. I really want people in Indonesia,
when they are asked, for example, "What do you want to
do in the future?"
Syafri Bahar: 00:54:17 Instead of saying astronaut or doctor, for example, to
say, "I want to be a data scientist." What I want to
emphasize is that there are so many beautiful things you
can put more intuition into, in order to build the first
bridge for people to cross and find it interesting.
Syafri Bahar: 00:54:34 And I've found that communicating by wrapping things up
in terms of intuition, like what Kirill just mentioned, is
very helpful to spark their interest, so that people get
interested and motivated, and then they will have the
energy to go even deeper, to a deeper level.
Syafri Bahar: 00:54:53 But I'm still learning. I'm also trying to learn how to
present all these complex concepts in a way that is very
simple, intuitive, and exciting. I think that's kind of my
personal mission. I'm still learning, of course, but it's just
such a beautiful discipline. And I think a lot more people
can benefit from it, and especially society.
Kirill Eremenko: 00:55:22 Thanks. Thanks, guys. I asked to be challenged and I feel
challenged. Yeah. I think it's a good perspective that
there's room for both: to get started, go down the
intuition path, but always keep in mind that you can go
deeper, and the mathematics will give you more
superpowers.
Kirill Eremenko: 00:55:42 Syafri, you mentioned that Gojek is hiring, so where can
people apply? And then I wanted to ask you a second
question. What does it take to thrive as a data scientist at
Gojek?
Syafri Bahar: 00:56:00 Sure. Thanks a lot for asking this question specifically. It
helps us a lot. You can always find the open positions on
our recruitment page. If people just search for "Gojek
recruitment," the website will pop up, where they can see
the available positions at Gojek.
Syafri Bahar: 00:56:26 And I think the second question is very interesting. The
company itself is, how do you call it, going into the next
phase now. We used to be in a very high-growth phase, so
to say, where things were sometimes a little bit
ambiguous. So people who can navigate an ambiguous
environment will thrive at Gojek. People who can
approach problems systematically will thrive. And it also
takes a lot of determination and grit to push things as a
data scientist.
Syafri Bahar: 00:57:03 And I think this is also the type of data scientist who
does not just stick with conventional approaches; data
scientists are also required to be able to exercise first
principles. Those are the types of data scientists who can
thrive in the Gojek environment.
Syafri Bahar: 00:57:23 So they need to be diverse enough to know what
different apparatus is available to solve a problem, and
skillful enough to... And I think it also means being
intellectually humble, really acknowledging that we don't
know what we don't know. Because sometimes it's really
just about asking.
Syafri Bahar: 00:57:43 Sometimes there are a lot of things hidden behind all of
the numbers and digits that we see on our screens as
data scientists. So I often ask my data scientists to go
into the field: really talk to our drivers and really
understand their pain points. That allows them to
understand and rationalize what they see there.
Syafri Bahar: 00:58:11 How do you say, [inaudible 00:58:11] in terms of all the
figures and numbers. And then, from that intellectual
curiosity, they are able to frame the problems correctly,
and then frame them as data science problems. And then,
again, the next level will be to find the right apparatus.
Syafri Bahar: 00:58:28 And another quality that I think helps a lot is being very
practical. If you look at the company overall, there are a
lot of problems that require simple solutions. There are a
lot of low-hanging fruits in the company; these are the
types of problems where a standard solution gets us about
80% of the way.
Syafri Bahar: 00:58:51 But there are also rather mature problems where, to go
from 95% to 97%, we will need fundamental research.
What I always tell my team is that we should be fine
using a hammer to hammer the problem, but we should
not shy away from using a scalpel as well to really
formalize the solutions. I think this type of mindset will
help people thrive in the Gojek environment.
Kirill Eremenko: 00:59:22 Wow. Fantastic. Thank you. Thank you for sharing that.
Jon Krohn: 00:59:25 Yeah. If people get a chance to check out the video
version of this podcast, they can see that Syafri is so
happy this whole time talking about modeling. Maybe that
even comes through in the sound of his voice, but there
are so many points where he throws his head back with a
big smile, because you're so enlivened by these questions
and these ideas. It's wonderful to see.
Syafri Bahar: 01:00:01 Yeah. Thanks a lot, Jon and Kirill.
Kirill Eremenko: 01:00:01 Yeah. You mentioned in one of your interviews that when
you were working as a quant back in Europe, I believe,
you realized that the impact you make cannot extend
much further beyond the company you work for. And that
you just want to do more. In your words: "I just want to
do more. I want to do work to benefit a lot of other
people." Do you feel that you're doing that at Gojek now?
Syafri Bahar: 01:00:32 Yeah. As a matter of fact, I do. And I feel lucky, because
when I wake up in the morning I still feel like it's my first
day, to be honest. I'm still very motivated to solve
different problems. And the thing about Indonesia is that
there are just so many structural inefficiencies in the
country that I believe people like me and others...
Syafri Bahar: 01:00:58 I think there's also an interview where I specifically call
on all the expats out there, Indonesian people who live
abroad, to just come and really contribute to the country.
Because there are just so many structural issues we need
to fix. Exploitation of natural resources is one way to
extract value.
Syafri Bahar: 01:01:15 But solving structural inefficiencies is also a way to
create value for the system. And I feel blessed and
privileged to have the opportunity to really serve the
community, because these are products I can really
relate to. My family will say, "Hey, I feel that this app has
helped me remove the daily frictions."
Syafri Bahar: 01:01:43 Even when, for example, something bad happens, I get
immediate feedback. And sometimes, before the pandemic
of course, I take a ride to the office. I talk to the drivers
as well, and one of them mentioned how his life has
changed since he became one of our partners. He was
able, for example, to adopt a couple of children because
he works as a driver partner on our platform.
Syafri Bahar: 01:02:14 Those are all the stories that really keep me going
through the day, and I feel blessed, to be honest, to have
the opportunity to do that. Especially with my discipline,
which is considered to be very remote: mathematics,
computer science, and social impact.
Kirill Eremenko: 01:02:36 Wow. Thank you. That's very inspiring to hear. I wish for
as many people listening as possible to feel the same way
at work. It's clearly a very fulfilling place to be in.
Syafri Bahar: 01:02:57 Thanks.
Kirill Eremenko: 01:02:59 That's awesome. Jon, do you have any questions to finish
off?
Jon Krohn: 01:03:04 No. We've covered all of my questions, and I loved the ones
that you asked as well, Kirill. I've learned so much today.
I can't help but notice that Gojek's mission seems to be
impact at scale through technology. And it sounds like
you're really living that as a data scientist at the firm,
Syafri.
Jon Krohn: 01:03:28 I don't have any other questions. I just felt like saying
that one more time, to reinforce this idea, since probably
the vast majority of people listening to this podcast are
data professionals or aspiring data professionals. Hearing
a story like this today made me feel inspired, and I hope
you feel inspired, too, to identify places where you can
make a big positive socioeconomic impact with your
skills. Even if you started with a pure math topology
background, you too can make a difference.
Syafri Bahar: 01:04:08 Oh, that's a nice [inaudible 01:04:09] there, Jon. And
thanks a lot. I really enjoyed the conversation. You have
done a fantastic job of controlling the flow, and of really
participating as well, genuinely asking questions. And to
a lot of data professionals out there: I still fundamentally
believe in the future of our profession.
Syafri Bahar: 01:04:32 I think we can do a lot of things for the community, even
for the world in general. We've just scratched the surface
of what data can do and bring to the lives of millions of
people out there. I would really encourage people who
are on their learning journey to keep going, to find their
energy and their motivation to keep going. Because it's a
beautiful thing, and it's worth really investing in
enhancing the profession and the knowledge of the
industry itself.
Syafri Bahar: 01:05:07 Thanks a lot. And I think both of you have also inspired
people with the podcast, especially aspiring data
scientists and data professionals out there. Thanks a lot
for contributing back to the community.
Kirill Eremenko: 01:05:25 Thank you, Syafri. It's been a really cool podcast. And for
those of our listeners who would like to connect with you,
or maybe just follow how your career progresses, where
are some of the best places to get in touch?
Syafri Bahar: 01:05:37 Yeah. I think the best way to get in touch with me is on
LinkedIn. I intentionally don't have other social media,
like Instagram or Twitter. So the best place to connect
with me is LinkedIn.
Kirill Eremenko: 01:05:53 Thank you. We'll share.
Syafri Bahar: 01:05:54 And actually, I do [crosstalk 01:05:55]- yeah. Sorry.
Kirill Eremenko: 01:05:56 Sorry. You go ahead.
Jon Krohn: 01:05:57 Go ahead. Sorry. I didn't realize who Kirill was speaking
to. Syafri, you go ahead. You go ahead.
Syafri Bahar: 01:06:15 Thank you. Thank you. Thank you. I was actually
thinking of also sharing more materials and more of my
thoughts as well. I felt that I could have done it a bit
better, especially for a lot of aspiring data professionals
in Indonesia. It's one of the things that I personally
commit to, at least for 2021, so hopefully there will be
more content that I can share with the community in
the future.
Kirill Eremenko: 01:06:29 Nice. Jon?
Jon Krohn: 01:06:33 I was just going to say, on the LinkedIn point, Syafri,
you shouldn't feel ashamed that LinkedIn is your go-to
social medium, because I think Kirill and I feel exactly
the same way.
Kirill Eremenko: 01:06:43 Yeah. Absolutely.
Syafri Bahar: 01:06:43 Okay. It makes me feel better at least that I'm not the
only one there.
Kirill Eremenko: 01:06:50 Yeah. That's the only one that I really use. I don't think I
use any other ones.
Jon Krohn: 01:06:57 Same.
Kirill Eremenko: 01:06:58 Yeah. Syafri, one final question for you. What's a book
that you would like to recommend to our listeners?
Syafri Bahar: 01:07:07 Yeah. In terms of books, there are actually quite a lot that I
have in mind. But just to select a few of them,
Elements of Statistical Learning is definitely a good start.
I've also recently gotten into causal learning,
basically because we happen to be in a space
where we will need it a lot.
Jon Krohn: 01:07:26 Judea Pearl?
Syafri Bahar: 01:07:27 Yes, yes. That is for the mathematics. There's also one
titled What If? I forget the author again, but I think it
is also a good combination of theory and practice.
And Judea Pearl, definitely: if you're into the math
itself, I think you will enjoy reading Judea Pearl's book on
that. And I also like-
Kirill Eremenko: 01:07:56 What's it called? What is it called? The book.
Syafri Bahar: 01:08:00 The book of-
Kirill Eremenko: 01:08:01 Judea Pearl.
Jon Krohn: 01:08:01 Causality.
Kirill Eremenko: 01:08:04 Causality, okay.
Syafri Bahar: 01:08:05 Causality, yeah. Exactly. Yeah. And Elements of
Statistical Learning is also a good book, as I mentioned
earlier. And there's also this 100-page machine learning
book that I like to read from time to time as a
refresher, because it condenses everything into one
book.
Jon Krohn: 01:08:25 Nice.
Syafri Bahar: 01:08:25 Do you happen to recall again the name of the author,
Jon? The 100.
Jon Krohn: 01:08:30 It's Andriy. It's so embarrassing, I can't remember.
Syafri Bahar: 01:08:37 Burkov?
Jon Krohn: 01:08:38 Yeah, that's right. Andriy Burkov. Exactly.
Kirill Eremenko: 01:08:42 Oh, yeah. Andriy Burkov. Jon, you might have him on the
podcast sometime. We've been talking with him.
Jon Krohn: 01:08:47 Well, he's been making quite a splash. I would love to
have him on the podcast.
Kirill Eremenko: 01:08:52 He's from Canada, right?
Jon Krohn: 01:08:53 I think he's in Montreal.
Kirill Eremenko: 01:08:55 He's Russian, ex-Russian but in Canada.
Jon Krohn: 01:08:59 Yeah.
Kirill Eremenko: 01:09:00 Awesome. Okay. Well Syafri, thank you so much. Jon,
thank you a ton. It's been a huge pleasure being part of
this podcast. Been great.
Jon Krohn: 01:09:11 Same.
Syafri Bahar: 01:09:12 Sure. Yeah. Thanks a lot. Thanks, Kirill. Thanks, Jon.
Kirill Eremenko: 01:09:20 There you have it, everybody. Hope you enjoyed this
episode and enjoyed the conversation we had with Syafri
and Jon. I definitely had some great laughs. There were
lots of really cool insights shared in this episode.
Kirill Eremenko: 01:09:36 My favorite part was the use case that Syafri shared
around CartoBERT: how they modified BERT and used it
to analyze all those interactions between customers and
drivers to figure out how best to optimize their logistics
for pickups, and how, as a result, it helped reduce the
number of calls and basically improve efficiency.
Kirill Eremenko: 01:10:06 Also, I really enjoyed hearing what Syafri said about
meaning and purpose: that he is very excited to be
contributing to improving people's lives, as in the
example he shared of a driver who was able to adopt
children. I think that's very noble, and I wish for all
data science to ultimately
result in great things for communities and people across
the world.
Kirill Eremenko: 01:10:37 That would be very good, and if we all look out for that
and try and strive to find jobs, and make our jobs about
impact, I think that will help serve the world and also
create more happiness around the world.
Kirill Eremenko: 01:10:55 As usual, you can find the show notes at
SuperDataScience.com/427. That's
SuperDataScience.com/427. There you'll find any
materials that are mentioned on the show, all of the
books that Syafri mentioned and Jon mentioned as well.
Plus the URL to Syafri's LinkedIn. We'll also include the
URL to where you can apply for a job at Gojek as a data
scientist, if you would like to explore that further.
Kirill Eremenko: 01:11:24 Make sure to connect with Syafri, make sure to connect
with Jon. They're both open to connecting on LinkedIn.
And yeah, you'll hear more from Jon in the coming weeks
as mentioned in the beginning. There'll be this transition.
I'll talk more about that in the coming episodes.
Kirill Eremenko: 01:11:40 And yeah, on that note if you enjoyed today's episode,
make sure to share it with somebody. It's very easy to
share. Send them the link, SuperDataScience.com/427.
And I look forward to seeing you back here next time.
Until then, happy analyzing.