Kirill: This is episode number 119: Data Science Trends for 2018.
(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill
Eremenko, data science coach and lifestyle entrepreneur.
And each week we bring you inspiring people and ideas to
help you build your successful career in data science.
Thanks for being here today and now let’s make the complex
simple.
(background music plays)
Happy new year everybody, and welcome to the very first
episode of the SuperDataScience podcast for 2018. Very,
very excited to have you on board. Super pumped about the
beginning of a new year, a new adventure. And we're kicking
off the year strong and on an exciting note. So Hadelin and I
got together to record a webinar a week or so ago, which
some of you actually attended. We were discussing trends.
The trends that are coming up in 2018, what to expect, what
to look into in the space of data science.
So on this webinar, we discussed topics such as AI,
blockchain, data security, self service analytics, digital twin,
and many, many more. So those are just some of the
highlights of what we talked about. Prepare yourself for this
exciting adventure, and also note that this webinar is
available in video version. You can find the video at
www.superdatascience.com/119. And if you have the
opportunity, then it's probably best to watch it that way. But
also you will get exactly the same insights from listening to
the podcast.
And on that note, without further ado, I bring to you the all-
time favourite, the incredible Hadelin de Ponteves and the
trends of 2018.
(background music plays)
So today we're talking about trends, data science trends. It's
sometimes a bit complex to separate tech trends from data
science trends because they come hand in hand. You can't
really imagine, a lot of times, technology without the power
of data science behind it, and also data science usually is
used to empower certain applications that have some
technological aspect to them. We can highlight the ones that
are more data science specific.
Hadelin: That's right. And anyway, today when I talk about data
science, I mostly talk about AI because that is the most
exciting part, first. And second, data science has been kind
of automated by AI actually, and so that's why the demand
for machine learning AI is growing as the demand for
analysts is slightly decreasing its acceleration. So I mostly
talk about AI, and we're going to see that the big trends
coming in 2018 will all be around AI.
Kirill: I'll try to talk less about AI to mix it up, because Hadelin is
super passionate about AI, and he will be doing most of that.
Ok, so thank you very much to Leonid, our resident data
scientist, who has helped us put together a list. We have two
lists, actually. We have a list of trends that are predicted to
be popular in 2018 and to pick up, and we have a list of
trends that were predicted to be popular in 2017. So the way
I think we'll structure it is we'll first go over the 2018 ones,
see what's coming, and so we make sure we cover them. And
if we have time at the end, we'll review what happened in the
previous year and see how correct those predictions were
with what we found there. Sounds good?
Hadelin: Sounds very good.
Kirill: Alright, let's get started. So first trend in 2018: An AI
Foundation. Funnily enough. We're talking about how
businesses are going to start using AI more and more and I'll
probably steal the quote here from Andrew Ng that AI is the
new electricity. So Hadelin, what are your thoughts on that?
How are businesses going to be using AI more and more in
2018, and what are going to be the major developments
there?
Hadelin: That's right. So lots of companies have integrated AI into
their business process. And actually I have some figures. I
think there's around 60% of companies that have already
made the move to integrate AI into their system, into their
company, into their business processes, whether to automate
processes, reduce costs, or become an AI-based company.
And only around 30-40% are starting to think about this but
have not made the real step yet. So that's definitely a big
trend not just of 2018, but already of 2017. Companies are
adopting AI, trying to build AI teams and become
AI-optimised. So yes, this is really happening.
Kirill: And we talk about the difference between general artificial
intelligence and narrow artificial intelligence, right? And in
the sense that when we say AI, we don't mean that there's a
robot controlling the whole business or anything. It's like
very narrow applications, in say one specific area, in
marketing, in operations, or in another part of the business,
there is AI.
Hadelin: And also in the decision-making process. They use AI a lot for
decision making. It's sometimes the AI that makes the
decision, because they apply the machine learning models
on the data and they get some great [inaudible] tools to help
the decision process, sometimes at the highest level. The
executives use AI to help make their decision.
Kirill: This statistic is very interesting, because I was surprised at
the 40%. I think it sounds really high, that 40% of
businesses would have already in some shape or form
started adopting AI. But I guess what that’s saying is that
it’s not across the board, not across the whole business.
What are your thoughts on that? Do you think more and
more businesses will be adopting AI, not just in one specific
application, because that’s what I think it’s talking about,
that in one area there is some sort of AI that they’ve
introduced and that’s how they get the 40% number?
By the way, guys, if anyone is interested, some of these stats
come from the Gartner reports. We can mention where else
we get them as we go along, but for example, that came from
one of Gartner’s recent reports. So do you think, Hadelin,
that businesses are going to limit themselves to having one
application or two applications of AI across the whole
business? Or do you think they will start doing narrow AI in
many different spaces?
Hadelin: I think they will just leverage AI. Most of them leverage AI to
improve their business, to optimize their business, to reduce
their cost. There are a lot of applications that you can
leverage AI from to improve your business process. So I
think that’s mostly what’s going to happen. Then you have
on the side the real AI-based companies that do general
artificial intelligence, that do artificial intelligence at the core
of their business. But if we’re talking about most companies,
well, they’re definitely starting to leverage AI. And mostly
thinking of the consulting companies, each one of them is
building a team of data scientists and using AI, as I said, for
their decision making process.
Kirill: Okay. All right. Any other comments on AI and how it’s
going to lay foundations in business in the coming year?
Hadelin: Well, it can go really far. AI can be applied in some field that
is not covered yet today. You know, when we talk about AI,
there are actually a lot of branches of AI. You have computer
vision, so computer vision can be used in many businesses
to improve them. You also have deep natural language
processing, like what we’re doing with chat bots, and that can
significantly help some companies by bringing in chat bot
systems that can help people in the company and that can
help them navigate or whatever. You also have some other
branches like robotics. Robotics can definitely automate the
processes and everything.
And you also have those data robots that can leverage the
data automatically and provide some outputs that will be
insightful for decisions you have to make. So, there are tons
of applications and there are even some applications that we
haven’t thought about. So that’s definitely going to develop
in the coming years.
Kirill: What’s your favourite application? Is there something that
pops to mind that you’ve recently heard of that you’re like,
“Wow, that’s a really cool application of AI?”
Hadelin: Yeah, I have two in mind.
Kirill: Okay, let’s go.
Hadelin: I have augmented reality and blockchain. Blockchain and AI
is going to be the big trend in 2018, the combination of both.
I actually don’t know which one exactly is going to be the
new electricity. Remember, you said AI is going to be the
new electricity, but when I see blockchain developing and
what it’s capable of doing and all the things that are
happening right now, I have some doubt what is going to be
the real new electricity.
Kirill: Nice. And that’s a good segue for us to get into blockchain.
Blockchain is a shared distributed decentralized ledger that
basically takes out the middleman, takes out the bank or
the polling system or something and helps people have
trustworthy transactions with each other, secure
transactions with each other even when they don’t know
each other. I know you’re also very excited about blockchain
and there actually was a TED Talk recently where they were
saying that blockchain, as you just mentioned, is going to be
the technology that’s going to change and shape the world in
coming years, but especially in 2018, we’re going to see
some major shifts because of blockchain, and those shifts
are actually going to be even bigger than the ones that we’ve
seen with AI. So what are your thoughts on blockchain and
why are you so excited about this technology?
Hadelin: Oh, my goodness. It’s so disruptive. For example, it could
build a new Internet because the fact that it is totally
decentralized all over the world makes it extremely powerful
at, for example, compressing data. AI and data science is all
about compressing data so that we can have faster and
faster transfer of data or faster and faster transactions and
even more secure. And blockchain will play a significant role
in that because since everything is decentralized and since
everything is encrypted and since everything is well-organized
into flows that are in such a way that you cannot go back in
the flow and modify anything, well that makes it a super
solid, safe, and fast system.
And why did I say that it could build a new Internet? Since
everything is decentralized, we could have the data divided
into very small parts all over the world and that would make
some kind of peer-to-peer compression that would make
everything super powerful, like fast compression, you know,
everything decentralized so that you have some extremely
fast connections around the globe and that would be thanks
to blockchain. So that could go really far. I think maybe two
years or three years from now that that could go really,
really far. And AI, of course, has a part to play in that
because AI automates everything, it automates the processes
so it will optimize the process inside the blockchain and
that’s why the combination of both these technologies will
make something super powerful.
Kirill: And also I wanted to mention that a lot of people, and I used
to do the same, when we hear blockchain we think Bitcoin
and when we hear Bitcoin we think blockchain, but those
are not synonymous. Bitcoin is one of the things that
leverages blockchain, that is built upon blockchain, and at
the same time blockchain can be used for many other
things. I really like the example somebody suggested of
using blockchain to do voting. You know how in the U.S.
there were elections and people were voting and then you go
and you submit your vote? There is always an organization
in the middle, like a big organization that counts the votes,
that makes sure everything is safe, that there’s no cheating
and that everything is accounted for, everything is
trustworthy.
The organization that’s in the middle to ensure trust, that’s
where blockchain comes in. Blockchain can remove that
organization and basically you could do voting from your
computer and the way blockchain works—we are not going
to go into detail for me personally because I’m not an expert
in blockchain, not yet, but I really want to get deep into the
stuff and understand it better because it’s so interesting and
disruptive.
But at the same time, you take out the organization, you put
a blockchain, and what that enables, through the
cryptography it has and through this decentralized and
hyperconnected system, what happens is that now all of a
sudden it’s completely trustworthy. There’s a ledger of
everything that happens, this ledger is decentralized and it’s
distributed to lots of people. You would have to hack
hundreds of thousands of computers at the same time with
the highest level of encryption in order to break into that,
and that’s much harder than to hack into an organization or
something like that – I guess, I’m not a hacker. So, that’s the
power of blockchain. For instance, voting during elections,
that’s a big deal. Like, imagine you wouldn’t have to get out
of your house; you’d just vote on your laptop and do it like
that. There are a couple of other examples. Anything else
pops to mind?
Hadelin: Yeah, that’s right. You actually raised a very important
point, which is actually there will be a massive trend in the
coming years. And that trend is security. Blockchain will
have a huge part to play in security. As you said,
blockchains are safe because you need to hack thousands of
computers to hack the system. And for this reason we’re
going to definitely leverage blockchain to make some more
secure and safe systems like the example that you gave
about voting or any other ones.
Well, the main application of blockchain today is the new
money, Bitcoin, which I actually doubt is going to
last because it shows signs of a bubble. But with this
blockchain we can make a totally decentralized and safe
financial system, so that indeed we improve the security.
That’s just one example; the security that blockchain
brings can be applied to many other fields and that’s
definitely going to be a big trend in the coming years.
Not only will blockchain play a part in security, but AI
itself will have to play a part in security, because AI is
developing pretty fast and at some point we will reach some
powerful artificial intelligence that will go beyond human
capacities and therefore we will have to control AI and that
will be another part of this big security trend.
Kirill: Yeah. I had another thing I heard about blockchain, that it
can be used for distributing music. There are a couple of
people who distribute their music through blockchain and
what that does, if you’re just a user, you can download it for
free and listen to their song and so on, but if you want to
use it for a project, like in a movie or in a trailer or inside
your own YouTube video or something like that, then you
just get it through blockchain and that way the transaction
happens automatically. So, again, it’s this whole trust thing
that nobody is going to get your music on its own.
Hadelin: Yeah. And what’s crazy is that you don’t even have the
music on your phone. That’s because it’s totally
decentralized. You have one part of the music somewhere,
another part of the music somewhere else, sometimes 1,000
kilometres away from each other, and that’s a peer-to-peer
system, which makes for fast compression that allows you to
have the music very quickly on your phone without literally
storing it on your phone.
And that’s the same for movie compression or streaming.
You can leverage blockchain by having some parts all
around the decentralized system to get your streams, movies
on your computer, and this will come from all this
decentralized system brought by this blockchain technology.
Again, there’s the safety component that is really improved
by blockchain, but also the speed of the transfer, and also
the data compression.
Kirill: Nice. So, the question I actually have that I was thinking
about was, it’s really cool to know about blockchain and
understand that, “Oh, cool, it goes into the foundation of
Bitcoin or this can go into foundation of the voting system or
content distribution and so on.” But the question is, is it
something that’s completely out of reach, or is it something
that we can create ourselves? Can we just sit down and
program a blockchain in Python or something like that? Is it
possible?
Hadelin: Yes, it is. And we will do it.
Kirill: We will do it?
Hadelin: Yes, we will do it very quickly.
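To make the question concrete, here is a minimal sketch of the core idea in Python: a ledger in which every block stores the hash of the previous block, so that changing an earlier record is detectable. It is only an illustration, not the implementation the hosts have in mind. The names `block_hash`, `add_block`, `is_valid` and the sample voting records are hypothetical, and the sketch leaves out everything that makes a real blockchain decentralized (peer-to-peer networking, consensus, proof of work).

```python
import hashlib
import json


def block_hash(index, data, previous_hash):
    """Return the SHA-256 hex digest of a block's contents."""
    payload = json.dumps(
        {"index": index, "data": data, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Append a new block that points at the hash of the previous block."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "previous_hash": previous_hash,
            "hash": block_hash(index, data, previous_hash),
        }
    )


def is_valid(chain):
    """Check that each stored hash matches its contents and its predecessor."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["previous_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["data"], block["previous_hash"])
        if block["hash"] != recomputed:
            return False
    return True


# Hypothetical records, loosely following the voting example in the conversation.
ledger = []
add_block(ledger, {"voter": "A", "choice": "X"})
add_block(ledger, {"voter": "B", "choice": "Y"})
add_block(ledger, {"voter": "C", "choice": "X"})
print(is_valid(ledger))  # True

# Tampering with an earlier record no longer matches its stored hash.
ledger[1]["data"]["choice"] = "X"
print(is_valid(ledger))  # False
```

In this sketch a single copy of the ledger is checked locally. The resistance to tampering described in the conversation comes from many independent participants each holding and verifying their own copy, which is outside the scope of this example.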
Kirill: (Laughs) Awesome. Okay, so that’s our two cents on the
blockchain. Okay, so the other one is not on our list right
now, it’s on our list of stuff for 2017, but since we mentioned
it I think it’s an important trend to point out: security, so the
whole concept of security of data on the Internet and how
that is important and how that is progressing. I have a few
interesting examples. In 2017 we had some major, major
security breaches in the world and that is a huge indication
that in 2018 security is going to start growing again. Two
major breaches. One was the WannaCry cyber attack in May 2017;
people have probably heard about that. Microsoft computers
in many countries around the world were locked down
[indecipherable 20:07] companies from FedEx to the
Ministry of Foreign Affairs of Romania were impacted by
that. That was a major thing, there’s a Wikipedia article
about it and so on and that was all over the news. That was
a big one. That was in May 2017.
And just when you think it can’t get any worse, one of the
biggest attacks in world history also happened last year.
You might know about a company called Equifax, it’s a
credit rating company that has about 800 million customers
around the world, it’s like one of the top three biggest credit
agencies in the world. And 143 million customers were
affected by that attack. That happened last year and it was
announced on the 7th of September that they had a data
breach, but that happened actually ages before that or
months before that. People’s first names, last names,
addresses, Social Security numbers, dates of birth and more
information were stolen.
And if you think about it, 143 million people – there’s like
324 million people in the U.S. on its own. So that’s 143
million in the U.S. alone that were affected. There were
people in Canada, in the U.K. that were affected. So, 143
million out of 324 million, which is the population of the
U.S., that’s almost 50% of the U.S. population was affected
by this attack. How crazy is that? And that’s just in the U.S.
alone. That just stands to show.
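As a quick check of the proportion, the two figures quoted in the conversation can be divided directly. This is a back-of-the-envelope sketch that takes both numbers as stated (they are approximate), not an independent estimate.

```python
# Figures as quoted in the conversation (both approximate).
affected = 143_000_000       # people affected by the Equifax breach
us_population = 324_000_000  # U.S. population

share = affected / us_population
print(f"Share of U.S. population affected: {share:.1%}")
```

On these figures the share comes out at roughly 44%, a little under the "almost 50%" mentioned above.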
And there were tons of other smaller examples, like the Uber
hacking where they paid the hackers not to say anything
and then some executives were fired for that. And other
companies as well that we’ve heard in the news that have
fallen victim to attacks. It’s on the verge of different trends.
Data is becoming more and more popular, more and more all
over the place. And with the proliferation of data, what’s
happening is it’s harder and harder to keep it safe, it’s
harder and harder to keep it secure and then also hackers
have access to much more sophisticated tools, not even
talking about AI and machine learning. Even the algorithms
and ways they can infiltrate systems are much more
sophisticated. There’s a huge threat and I think that
security is going to be a major trend in 2018 and onwards
because companies understand the importance of protecting
their data and their customers’ data. What are your
thoughts, Hadelin?
Hadelin: That’s absolutely right. Customers’ data actually already had
issues in the past. People were complaining that the data
was actually used in some non-ethical ways and that’s
because there were some data breaches, security breaches.
So, yeah, this is another essential part of the security trend
that is coming: we not only need to protect ourselves
against powerful AI or we not only need to protect ourselves
against hacking, but we also need to protect the data. For
this, again, blockchain will definitely play a part in that,
because since everything is encrypted, that can be a way of
protecting the data in such a way that it would be very
difficult to hack it thanks to the multiple systems
decentralized all around the world.
So, I think we will always have to work on that because it’s
the war of technology. Once you find the technology that can
protect your data, what comes after that is that somebody
who has a better technology and can break your technology
that protects your data. So, we’re going to have some leap
over leap, so the more we will get into some improving
technologies, the more we will have to pay attention to the
fact that it’s going to be difficult to keep up. And we’re going
to have more and more in-depth experts in all the
technologies that are protecting the data, and therefore fewer
and fewer people, because the level of expertise required is
going to be very high.
And that’s another trend I want to get to, a trend
and possibly a danger that has to do with security: the
research on AI and everything is becoming so high-level
that fewer and fewer people manage to get into it at the
state-of-the-art level. That’s a danger because imagine the
state-of-the-art research falls into the wrong guys’ hands. You know, the
black box will only be understood by the wrong guys. That
would lead the world into some kind of danger. So, that’s
why we not only need to protect the data and develop those
technologies, but we need – and that’s the most important
thing – to educate the world, to teach people how this works
and to explain to them how the state-of-the-art models work,
so that the black box doesn’t become that black for too many
people.
Kirill: I think that’s a great answer, a great comment on that. I’m
just checking the questions occasionally. We just had Halper
say that “Sorry, guys, this is cyber-security. Can we please
get to things more related to data science?” So, I want to
respond to that because I have seen the world of cyber
security. I’ve been working at Deloitte and I’ve seen what a
market this is. It’s a huge, massive area which is so
underrated by data science practitioners for the reasons that
we mentioned.
One is that any kind of data science work that you do can
fall victim to cyber-crime. Also, because it intersects with
data science. There are so many different data science
applications. Like, the machine learning algorithms that
we’re using, that we’re learning, they can all be applied in
the space of cyber-security and in the space of data security.
There are definitely inherent algorithms that are specific.
Like, what’s that best one called? I forgot, but basically
there’s a mathematical equation that is specifically used in
the space of cyber-security, at least one that I know of.
But all the other algorithms with the increasing rise of
machine learning, AI, deep learning, that is slowly shifting
into the space of cyber security and that’s what we had
witnessed in 2017. We saw the infiltration of deep learning
and machine learning in the space of security to help find
these anomalies, find these possible areas where things can
be breached, and to help mitigate those risks. So, I
personally think that data science trends, we will cover off as
many as we can right now, but one of the biggest trends
overall is the one we just talked about. You can see how it’s
on the intersection of different things like data science, AI
and blockchain.
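To illustrate what "finding anomalies" can mean in practice, here is a deliberately simple sketch in Python. It flags values that sit far from the mean of a series, using a z-score. This is a stand-in chosen for brevity, not the deep learning approach mentioned in the conversation, and real security tooling is considerably more sophisticated. The function name `find_anomalies`, the threshold of 2.0 and the failed-login counts are all made up for the example.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Made-up hourly counts of failed logins; the spike at index 7 is planted.
failed_logins = [3, 4, 2, 5, 3, 4, 3, 60, 4, 2]
print(find_anomalies(failed_logins))  # [7]
```

A machine learning or deep learning model plays the same role as the z-score here: it learns what "normal" looks like and reports what deviates from it.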
Okay, moving on. So, next one we have is deep learning
technology is becoming mainstream. We’ve seen a lot of
things happen just recently from image classification,
machine translation, facial recognition, chat bots and other
things that use deep learning insights. They’re starting to
rise now. Hadelin, what do you think is happening now?
Because a lot of these algorithms, tools and technologies
have been around, some since the 80s, but some have been
around since 2012. Why is it all happening now in 2017/18?
Hadelin: Because of the applications that we realize we can do with it.
You know, we can do some crazy applications with deep
learning now and we don’t have to wait for one or two years
to do them. So, they have become extremely popular. We can
see, for example, the GANs that managed to create some
fake images of a real-looking human person, or those
computer vision applications that can detect any object in
videos. You know, these are some very cool applications.
And the other reason why they have become so popular is
that it is being democratized. Thanks to all the open-source
platforms like GitHub, you can get the code, and as we show
in our course, apply them very easily on your videos or on
your applications to do these very exciting applications.
That’s why, it’s mainly thanks to everything becoming open-
source and more and more easy to apply, because as you
know, 4 or 5 years ago only experts could use that, only an
expert had a good understanding of how it works and
everything. You know, it’s like people had trouble in the
beginning getting from old phones to smartphones. Four
years ago people had trouble getting on all these codes and
all these models, but today more and more people, even if
they don’t have any notion of coding, they manage to code
and use the deep learning applications.
However, that being said, I read some statistics about the
models used in data science, in companies, or in general,
and those deep learning models are still at the bottom of the
ranking. The most used models in companies today are still
logistic regression, Random forest, XGBoost, decision trees,
and the deep learning models like CNNs, RNNs or GANs
actually are still at the bottom of the list. It is growing, it is
definitely growing. However, Geoffrey Hinton has just issued
a new paper on capsule networks. If that works, if it can be
implemented easily, and if it works fast enough, that could
mark the end of the current deep learning models because this
would be revolutionary. But we’re not there yet. The
implementation takes a long time to write and to run, so we
still have some way to go before that is realized.
Kirill: Okay. I agree with those points. Capsule networks are
definitely some interesting disruption that’s coming our way.
And you can tell about the importance of deep learning just
by the number of students that sign up to our course. When
did we release the deep learning course?
Hadelin: We released the deep learning course last March, actually.
Kirill: March, yeah?
Hadelin: Yeah.
Kirill: So that would be like nine months. And how many students
signed up so far?
Hadelin: We have in total 65,000 students.
Kirill: 65,000 students just in that one course signed up in nine
months. That’s quite insane. That’s like 7,000 people per
month signing up to that course alone, is that right?
Hadelin: Yes, absolutely.
Kirill: That’s hard to believe. That’s showing where the world is
going. We can kind of get a sense for those things. You
know, before, machine learning—and still machine learning
is very powerful, but now we’re slowly going into the space of
deep learning because deep learning can solve any problem
machine learning can solve, but better and more accurately.
Hadelin: Yes. However, for simple applications, models like logistic
regression and XGBoost will still stand above deep learning
because deep learning is not the fastest to execute, you have
to iterate over many epochs, applying forward
propagation and backward propagation many times to train
your model. Whereas with XGBoost, you just put the data in as your
input, and in a flash it returns the output you want. So, for
simple applications, for simple problems, which can
definitely give you some insights for your company, I think
the logistic regression models will still be among the first. So
it’s not like deep learning is going to erase the other one, it’s
not that it’s going to make the other ones disappear. It’s just
that for the powerful applications that are extremely
demanding, deep learning will become the best models.
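The loop of epochs, forward propagation and backward propagation can be sketched in a few lines of Python. The example trains a single sigmoid neuron, the smallest network for which the loop can be shown (mathematically it is the same model as logistic regression, here fitted iteratively). It is a sketch under stated assumptions: the toy data, the learning rate of 0.5 and the 1,000 epochs are arbitrary choices for illustration, and a real deep network repeats the same steps across many layers and far more parameters.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, epochs=1000, lr=0.5):
    """Fit y ~ sigmoid(w * x + b) on one feature by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: predictions with the current parameters.
        preds = [sigmoid(w * x + b) for x in xs]
        # Backward pass: gradients of the mean log loss with respect to w and b.
        grad_w = sum((p - y) * x for p, x, y in zip(preds, xs, ys)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        # Update step: move the parameters against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data: the label is 1 when x is positive.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(predictions == ys)
```

Every epoch touches every training example once, which is why training cost grows with both the number of epochs and the size of the data.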
Kirill: Gotcha. Very, very good points on that. Okay, moving
further: Persistent growth of the Hadoop market. So, what
we are seeing is that Hadoop and any Big Data systems like
Spark, for example, which is kind of the new thing out
there—remember that we were at the ODSC, I think, and
Spark 2.0 came out, that was in May this year. That was like
a new big thing. So why are these technologies, why is
Hadoop and Spark, why are they becoming more and more
prominent and why are more and more companies going to
them? What do you think?
Hadelin: Well, that’s because the amount of data is constantly
increasing. You need systems like Hadoop and Spark and
Pig and Hive to handle all this data, to handle all the Big
Data systems, because otherwise it would be really slow to
handle them. Those systems are faster and faster to manage
your data and to organize it and to leverage insights from
them. You definitely need those systems. And actually I
heard that—well, actually there’s an important point to say
about data science: Python and R are still by far the
most used tools in companies to do data science
or machine learning or deep learning.
But then there is something growing up, it’s called Scala and
it’s based on those Hadoop systems that handle Big Data.
That is growing because you need more and more powerful
systems to handle bigger and bigger data. That’s why it’s
something to definitely consider. Actually, on LinkedIn I see
a lot of recruiters’ posts and in these posts I see the skills
that are needed, and I see now almost all the time, besides
Python or R, I see Hadoop, Spark, Hive, Pig and Scala.
Among all of them, if I had to choose one, if I had to
recommend one, I would say Scala, because it’s extremely
powerful at handling Big Data.
Kirill: And also I’d like to add that what we discussed before, deep
learning and AI are contributing a lot towards the rise of
Hadoop and Big Data systems. Because to train deep
learning models and AI algorithms, you need a lot of data.
You need that data to be stored somewhere, you need to be
able to access it quickly, so it’s just natural that those two
come hand-in-hand. The more the world turns to AI and
deep learning, the more we’ll see Big Data systems such as
Hadoop, Scala, Spark and so on.
And also a lot of it is going into the cloud. It’s like a trend
that we’ve seen in 2017, that it’s not just Big Data, but it’s
also Big Data in the cloud. And the reason for that is the
cutting of costs, right? If you have servers on your premises
for a large organization, that’s one thing. Then you have
servers and you need to scale them, you need to broaden
them, you need to update them even as new technology
comes out and new hardware comes out. That’s millions and
millions of dollars, tens of millions of dollars depending on
the size of the organization; hundreds of millions of dollars
for some organizations.
Whereas if you have things on the cloud, it’s less likely or it’s
not as trusted yet by executives, especially old school
executives who don’t want to let go of their data, they don’t
want it to be somewhere else, they’re worried about security
and so on. But, it can actually be even more secure, it can
be very easily accessible, and it’s very easily scalable, and all
the things can be updated very quickly so you don’t have to
worry about updating your hardware. You can just click a
button and your hardware gets updated, or the team that is
managing it, because now they have economies of scale, the
company that is managing it, they’re doing it for many
businesses, so it’s easier for them to upgrade their hardware
and also you just click a button and it’s scaled. That’s a
huge thing and that’s why a lot of start-ups that are starting
out, they don’t even consider having their servers on
premises, but straight away in the cloud. It’s harder for
companies that have been around for a while to make that
move, but the ones that are doing it, those are the ones that
are going to be ahead of the curve. My question to you,
Hadelin, is, are we going to make a course on Big Data one
day?
Hadelin: Big Data? Well, we are going to make a course on Big Data
once Scala or any other system stands out. Because right
now it’s not standing out that much. We still have Python
and R, but as soon as one of these Big Data systems becomes
the most used system in companies and starts having a
tremendous, powerful impact on them, we’ll
definitely make a course on that. And actually, I did a lot of
that when working at Google; I worked on a lot of the Big
Data systems, so I could share with you my experience and
how this works. Yeah, we can definitely do it. What do you
think?
Kirill: I agree with that. Because we’ve been asked by students
quite a lot about this. I think the reason we haven’t yet is
because this industry is still in early stages, it’s very much
forming, it’s very much shaping up. Like, Spark 2.0 came
out this year, at the start of 2017. You constantly have new
technologies, new versions come out and so on. Like, if we
record a course today, two months from now we will need to
re-record it, we will need to update things and so on.
It’s a good point, as soon as there’s a prominent market
leader in that space and we know where this whole thing is
going, then we can give you a course and also understand
for ourselves and help you guys understand where this
whole thing is going and how to keep updated with these
things. It’s not in the pipeline yet, but it’s in our vision to
create this course some time soon.
By the way, Leonid, our resident data scientist, asked me to
make this little plug, so a little bit of advertising here. We
don’t have a course on Big Data, but we do have a series of
tutorials which apparently are amazing on YouTube, which
are about PySpark. So, if you want to learn about PySpark,
no cost involved and you don’t have to purchase anything,
just subscribe to our YouTube channel, make Leonid happy,
give him a Christmas present, because he is in charge of our
YouTube channel, and you will also get updates about the
PySpark series that we are releasing and maybe that will be
something that you’re looking for. And also, of course, there
are other things apart from Big Data that we talk about on
the YouTube channel. Check it out. Okay, any other
comments on Big Data?
Hadelin: Yeah. Well, PySpark is amazing. I really encourage students
to subscribe to that channel. You know, it’s not standing
out. If it was standing out, we would make a course on it,
but it’s definitely useful. So that’s a great thing. That’s
amazing.
Kirill: Awesome. Okay, we’ve already talked about AI. There’s
another side of things which is applied AI, applying it in
different spaces, different areas. I think we’ll skip that for
now in the interest of time. Let’s talk about digital twins –
interesting concept.
Hadelin: Digital twins! Yeah, I hear that more and more. That’s the
Internet of Things, right? It’s like you’re connected to your
objects and you can transfer some information between the
digital twins and yourself, so that you can use them at a
better and better rate. Is that correct?
Kirill: Yeah, yeah. And it’s not just for people. Like, an airplane will
have a digital twin, or an airplane engine will have a digital
twin and there’s like a data connection between them, or like
a whole city could have a digital twin and you can basically
model different scenarios that can happen in the city or in
the turbine of an airplane by analysing the way that the
digital twin is behaving and having the inputs from the
actual object to the digital twin and also adding your own
inputs. Or you could take the inputs from your own object to
your digital twin and then also take inputs from the other
hundred airplane engines that you have in your company
and you compare it to the average, tweak things, and
model scenarios.
It’s very useful, for instance, in things like airplanes for
preventative maintenance, so you know when the issues
might come up even way before they are going to come up.
And, of course, cities, to understand the behaviour, social
and demographical things, transportation and things like
that. So, you can model traffic jams. For example, a city is
growing. You constantly feed inputs from all the sensors that
you have in the city into that digital twin and then you’re
like, “Oh, I wonder what will happen during Thanksgiving
when we block off these three roads.”
Because you have a digital twin, which is pretty much the
identical copy of the city, you can actually block off the
roads. This is my understanding of things. And then you see
what happens to the traffic all over the city simply because
you have been inputting those data points. It’s not just like a
model that stores data points, it’s actually a model that
learns how they behave, how they interact and what
dependencies are in there.
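(As a rough illustration of the fleet comparison described above, and not something shown in the webinar: a minimal Python sketch, assuming hypothetical engine IDs and made-up temperature readings, that flags any engine whose average reading drifts far from the fleet average. A real digital twin would learn far richer behaviour than this simple check.)

```python
from statistics import mean, pstdev


def flag_outlier_engines(readings, threshold=2.0):
    """Return the IDs of engines whose average sensor reading deviates
    from the fleet average by more than `threshold` standard deviations.

    `readings` maps a (hypothetical) engine ID to a list of sensor values,
    standing in for the data each physical engine feeds to its twin.
    """
    # Average reading per engine.
    per_engine = {eid: mean(values) for eid, values in readings.items()}

    # Fleet-wide average and spread of those per-engine averages.
    fleet_values = list(per_engine.values())
    fleet_avg = mean(fleet_values)
    fleet_sd = pstdev(fleet_values)

    # If every engine behaves identically, nothing stands out.
    if fleet_sd == 0:
        return []

    return sorted(
        eid
        for eid, avg in per_engine.items()
        if abs(avg - fleet_avg) / fleet_sd > threshold
    )


# Illustrative data only: nine engines running at 600 degrees, one at 700.
fleet = {f"engine-{i}": [600, 600] for i in range(1, 10)}
fleet["engine-10"] = [700, 700]

flagged = flag_outlier_engines(fleet)
print(flagged)
```

(An engine flagged this way would be a candidate for inspection before a fault actually shows up, which is the preventative-maintenance use mentioned above.)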
Hadelin: That’s right. And there is another term we haven’t spoken
about. At the beginning of this webinar I said, like, “The
biggest trends that I’m most curious about or that I see coming
in the most tremendous way,” so we talked about blockchain.
And then I don’t remember if I said it, but the other one was
augmented reality. We didn’t speak about this one, right?
Kirill: No.
Hadelin: Yeah, that’s right. I heard that this could be the first AI
trend that could reach a trillion-dollar market, augmented
reality, because it has tons of applications.
Kirill: Like Pokémon Go.
Hadelin: Yeah, but not that kind of application. Like, long-lasting
applications. And I heard this has huge potential, so that’s
definitely something to follow. What do you think? Do you
think it could be reaching such a huge market?
Kirill: Yeah, definitely. I heard that there’s VR, virtual reality, but
augmented reality actually has the potential to be bigger. We
saw initial attempts at that with the Google Glass, and that
was like ages ago. What was it, 2015 or ’11?
Hadelin: Funny, when I worked at Google, they actually introduced
them and I was there when it happened so it was 2015.
What did you say, 20—?
Kirill: I don’t remember, 20-something. (Laughs)
Hadelin: Quite recent. Yeah, it was definitely 21st century.
Kirill: Yeah. And then you left Google and the whole project fell
apart.
Hadelin: Yeah. (Laughs) Well, that happens.
Kirill: Yeah. Okay, augmented reality is an interesting one. It was
funny to see how Pokémon Go just boomed and then was
gone. I don’t hear about it.
Hadelin: I’m not sure what the reason is exactly. I don’t know, but it’s
crazy how this was like what we call ‘bubble trends.’
You think it’s a trend, you think it’s growing and at some
point it bursts and nobody hears about it anymore. Like
Bitcoin, for example. I hear debates on Bitcoin all the time
right now, like every day. The debate is whether it’s a
bubble. “Do you think it’s a bubble? Do you think it’s not a
bubble?” It’s based on blockchain technology, but still it has
all the signs of a bubble, so everybody is talking about this.
What do you think? Do you think it’s a bubble?
Kirill: That’s a good question. It really reminds me of how the first time
Bitcoin really spiked, I think it was 2014 or something like
that, and it just went up and then people were like, “No, it’s
going to keep going up forever,” and then – Bam! So, I don’t
think it’s a fad, I don’t think it’s something that will go away.
I think we will be using more and more cryptocurrency,
Bitcoin or others, but I have doubts that it will keep growing
forever like that. A lot of it is fuelled by hype, by media and
stuff like that, and as soon as something else, the next big
thing comes along, I think there will be a correction. This is
not financial advice, by the way, guys watching this webinar.
It’s just our opinions.
Okay, so we talked about digital twins, we talked about
augmented reality. What do you think about self-serve
analytics? Data science is growing. Let’s get down to the
basics. Forget about AI and stuff like that for now. So,
business intelligence, and we’ve got lots of different tools,
lots of different approaches, and the amount of data, the
volume of data, the velocity, variety, veracity, etc. of data is
growing all the time and very quickly. So, with that amount
of data, organizations are slowly starting to realize that it’s
unsustainable to have only data scientists look into that
data and pull the insights out.
It’s still very important to have data scientists, and more and
more organizations are getting into that, but at the same
time, what if everybody in your organization can look into
data and get insights from it to some extent? And that is
self-serve analytics. What are your thoughts on the trends of
self-serve analytics in 2018?
Hadelin: That’s a very good question. Indeed, it’s like what I said
about the black box. You’re absolutely right. Right now, only
data scientists can leverage the data to gain some insights
and help with decision making and everything. At the same
time, we have those automated systems like this company
DataRobot that basically makes what they call ‘data robots’
that take your data as input and will return the output
without needing the work of a data scientist doing all the
process of data analysis.
That’s what I said at the beginning of this webinar. I think
that it has the potential to be automated, like self-managed
data systems, and it’s actually going to come pretty quickly,
but it will not replace data scientists. We will always need
data scientists to improve these systems, check these
systems, control that they give the right insights, check that
that makes sense because sometimes the decisions can only
be good decisions if you include the human factor. So, we’ll
always need some people to have a complementary job on
that, because the machines cannot do everything. So, I think
self-analysing systems, as you call them—
Kirill: Self-serve analytics.
Hadelin: Self-serve analytics will grow, but will never grow to the
point that it will replace the data science jobs.
Kirill: That’s a good point. A little bit of reassurance there. Yeah, I
agree with that. And I also think it’s important for those of
you out there who are data scientists or who are aspiring
data scientists, it’s an important trend to keep in mind. It’s
been around for a while now, but it’s going to be picking up
more and more that people in organizations, regardless of
their level, they are going to need to have some sort of data
literacy. And it’s your job, or you can make it your job as a
data scientist, to spread that, to create data advocates and
to create people who are excited and inspired by data.
It’s going to make your job easier because that way, the
people you talk to in the organization, they know about what
you’re doing, they know the value and the importance of
data and data science, and that’s cool. But also, it’s going to
help the organization to grow in the right direction. If you
really care about the organization, and I really hope you do
in the sense that you’re working in a company that you
love and whose mission you believe in. If you do, then
that will be your contribution to putting them onto the
right pathway where not only you are doing the data science
work, but everybody in the organization is contributing,
some people can do a simple regression, some people are
better at understanding the different types of data, or some
people have access to BI dashboards that you’ve created and
now instead of you redoing it every time, you’ve created them
in an interactive way so that everybody can get their own
insights.
It’s an important trend for data scientists to consider,
because one thing is just doing data science on your own
and being the rock star, which is cool; another whole big thing
is about educating others in the space of data science. At the
end of the day, you will help them out not just in their roles,
but also in personal growth because that’s where the world
is going. You have to be data literate to be up-to-date with
everything that’s happening and have other opportunities,
you know, have a broad spectrum of ways that you can
develop your career.
And speaking of not doing data by yourself, we had an
interesting trend that we haven’t talked about yet, and that
other trend is that companies are going to look more into not
just hiring data science geniuses or wizards standalone, but
actually building out data science teams. So, a slight
difference there, but a very important one at that. What do
you think, Hadelin? Why do you think companies are going
to be steered away a little bit just from one super genius
data scientist? That’s cool, but how about we build a team of
five or ten that work together very well?
Hadelin: Because the goal in the end is to get as many people as
possible working with data and gaining the skills to manage it.
It’s what you said a couple of minutes ago. Most people
should be able to leverage the data to gain some insights, just as
everybody is using a smartphone today. Everybody knows
how to use a smartphone. We need everybody to know how
to leverage data to gain some insights. That’s why I think
they are making the teams. They don’t want to leave that to
the experts because this is not democratization. If we leave
that to the experts, we will miss out a lot on other
capabilities because data science is not that hard.
Everybody can do it. Everybody can apply the models. You
just need to understand the intuition. And sometimes you
don’t even need to understand the intuition, you just need to
understand how you have to get your inputs into the system
and apply the models and gain your insight.
The data is becoming so abundant. Data is everywhere.
There is so much more data that of course we need more
and more people, and the only way to get more and more
people is, instead of leaving that to the experts, building
teams of many data scientists or many people that can at
least do the basic stuff in data science to gain some powerful
insights.
Kirill: I agree with that. Like, when you have a team of people, you
have one expert, that’s awesome, but you’re dependent on
them. Like if they leave, or if they decide to do certain things
in a certain way rather than exploring other possibilities,
other tools, you will be very dependent on that kind of stuff.
I think your opinion here—people watching this or listening
to this, you guys really should listen to Hadelin on this
because you’ve worked in data science teams, right? You’ve
been in Google and your other jobs that you’ve been in—I
have been in that situation where I was the one data
scientist and I was doing all the things.
From that I can totally speak to, yes, I tried to do my best in
good faith and do really amazing work as much as I could,
but at the same time it was very highly dependent on my
subjective opinions, on my subjective ways I think the
company should go and do things. You know, that might be
wrong, it might be right, but you don’t want a large
organization depending entirely on the opinion of one
person. So I think what you said is valid here.
And the other thing that pops to mind is executives, so let’s
talk about executives for a bit. There’s two sub-trends in the
trend for executives that I see. We have more and more
organizations hiring CDOs, Chief Data Officers, and the
other one is that more and more executives, like Chief
Executive Officers, the guys that are directors and heads of
the companies, they are looking to get educated in the space
of data science. Like, it’s not their jobs to be data scientists,
but they want to find out more about algorithms, about
applications, about AI, about deep learning, about all these
different things data science-related to not become
technological or data science dinosaurs so that they can see
what this is all about. What are your thoughts on that,
Hadelin? Why do you think more and more executives are
jumping on board with this trend, and do you think it’s
necessary?
Hadelin: Of course that’s 100% necessary, and a simple reason for
this is that executives are the ones who make the decisions.
They are the ones who decide the next move in the company.
And since data science is so powerful at leveraging the data
to get the right insights that will help in a significant way to
take the right decision, well, executives definitely need to be
connected to data science; not necessarily be experts, but be
connected to data science to understand and be convinced
how data science can help them make the right decisions.
And I say that not only from a logical point of view, I also say
that based on experience. I’ve had a lot of executives on the
phone who asked me for some advice on how they could
leverage data science to take decisions. They said mostly
that the problem was that there’s a huge pyramid between
them and the data scientists, so they are far from the data
science teams and therefore they need some better data
visualization tools to understand how the data is leveraged
and the insights are extracted to help them take the right
decisions. So the executives want to get more and more into
data science and they actually need it for the simple reason
that they’re the ones making decisions and data science is
so powerful at helping them to take the right decisions.
Kirill: Interesting. So, let’s talk about strategy because decisions,
they link up into strategy. What are your thoughts on data
strategy for large organizations? Is that a thing? Is it
important for an organization not just to think through their
marketing strategy or let’s say operation strategy, growth,
expansion and so on? Do you think that executives should
be thinking about data strategy? And what does that mean,
what does it mean to think about data strategy?
Hadelin: If you talk about strategy, I think strategy has a lot to do
with intuition as well. It has a lot to do with intuition,
experience and not only data. Data can help in the strategy
because in the strategy you have to take some decisions and
data helps in taking the right decision, but there is so much
more than decisions in strategy. It’s a combination of things.
It’s pretty complex, by the way, but you also need intuition a
lot, and I think the intuition is the opposite of data. That’s
why data will never replace everything because you always
need intuition, and you mostly need intuition in strategy.
So that’s a very interesting question, actually, to which I think
the answer is that data is not everything for strategy.
Kirill: I agree, but what I’m referring to is—data is not everything
for strategy, I totally agree, but in the sense that let’s say we
have strategy overall, but inside strategy we have everything
to do with data, like the tools that we’re going to use. Are we
going to install Hadoop or are we not going to install it? Are
we going to go to the cloud or are we not going to go to the
cloud? Do we add more data points? Do we have enough
data points about our customers? Do we need more inputs?
Do we need more unstructured data? Do we need to handle
unstructured data? What insights can we gather from our
data, or what is our current data saying about where our
organization is going and how can we leverage that more,
how can we implement deep learning or AI algorithms and
so on? That’s the stuff I mean for the strategy around data.
I think it’s quite important for organizations to start keeping
that in mind. I don’t know if it’s just going to happen on its
own, the way it happens, and that might be a bit more
reactive than proactive. Data strategy helps you be proactive
in the sense—it’s really hard to be proactive in the first place
because there are so many technologies that are coming out
that you don’t even know about and that are going to come out
next year or a few months down the track, but at least you
put in effort to be on top of your organization. You know
your pitfalls and you know where you need to patch things
up, you know where you’re not keeping up to speed with
everything that’s going on in your organization in the sense
of data. But if you don’t even think through data strategy,
that leaves you way behind everybody else and I think that
takes away a huge competitive advantage for companies.
Hadelin: Yes. And I’m reading something interesting here, so I’m
going to read that to you. According to Gartner, 59% of
organizations are still building their enterprise AI strategies
while the remaining 41% of the organizations have already
made the plunge. So, yeah, there is definitely something
happening with the AI strategies for companies right now.
59% is a lot.
Kirill: Yeah, so they’re at least thinking through how they’re going
to—
Hadelin: Yes, leverage AI for strategy.
Kirill: Gotcha. Okay. Yeah, very cool stuff. What else? Do you have
anything else that we have missed?
Hadelin: No, I have mentioned all the trends I wanted to speak about.
The ones that I’m very curious about and I will be following
very closely for the next year, in 2018, will be blockchain
and maybe augmented reality.
Kirill: Nice. And for me probably blockchain, I definitely want to get
deep into that topic and understand a bit more about
blockchain, what’s going on there, and how we can apply it
in the world, how it’s going to be transformative. And I think
AI, I will be interested to see how that goes. I’d say more
deep learning, less AI for me. It’s kind of more basic than AI,
but I like the concept of narrow applications. So something
like, “Okay, there is a problem. Let’s apply deep learning and
solve it.” That’s pretty cool.
Okay, we’re kind of running out of time, so I think that’s all
the trends that we’ve covered. I think that was pretty cool.
Thanks a lot, guys, for coming on the webinar.
Hadelin: Thanks so much, guys. That was my first webinar and I
really enjoyed it.
Kirill: Yeah. All right, take care guys, and hopefully we’ll have more
of these, we’ll see these coming up more. And good luck in
2018. Let’s stay in touch.
Hadelin: Yes, keep up the good work.
Kirill: All right. See you, man.
Hadelin: See you.
Kirill: There we go. Those were the trends for 2018 that we were
able to identify. Of course, some of them will happen, some
of them will happen less, but overall those are the most
exciting things to look out for in this coming year. Which
was your favourite trend? Which is the one that you’re most
excited about, the one that you’re looking into the most?
Personally for me, I like the concept of anything to do with AI
and digital twins and security as well, but the one I’m most
curious about is blockchain. I have this new project of my
own going on, in which I’m learning about blockchain and
I want to learn more and more about blockchain, I want to
find out how it works, what exactly goes into it, what the
security, encryption and other implications are, and what
are the use cases and so on. So definitely that’s the one for
me. But again, yours might be a bit different.
In any case, I hope you enjoyed these trends and now you
know what to look out for in 2018. If you know somebody in
the space of data science that could benefit from this
episode, then forward it to them and help them also get
prepared and maybe you’ll have something to discuss and
debate after they listen or watch, because this episode is
available in video mode. Plus you can get all the links from this
episode and the show notes at
www.superdatascience.com/119. There you can also find
the video recording. And on that note, thank you so much
for being here. I can’t wait to see you back here again soon.
Until then, happy analysing.