The Ever-Evolving
Artificial Intelligence
and Machine
Learning Ecosystem
K N O W L E D G E N T W H I T E P A P E R
It’s hard to turn on the news or visit a news site without seeing at least one article or segment
about how Artificial Intelligence and Machine Learning (AI/ML) are changing the world around
us. Yet even as AI/ML becomes pervasive, we are only in the initial stages of its full-fledged
adoption. AI/ML is not new: the seeds were planted by Alan Turing in the 1950s, and a vast
array of practitioners and computer scientists have contributed to the field since then. It has
not always been smooth sailing. Ever since the term was coined, the field has hit rough patches
and gone into periods of hibernation. The main cause has been the classic overreach of tool
vendors and IT consultants: over-promising and under-delivering. The name of the field alone
makes a bold claim. Implicitly, “Artificial Intelligence” suggests that we can mimic human
intelligence. Even the current iteration of AI is far from actually replicating humans, but it can
certainly solve a much bigger swath of problems now than it could even a few years ago.
There are four major reasons why AI/ML can solve much harder problems today.
More Sophisticated Algorithms
Many of the current amazing breakthroughs in AI (DeepMind, image recognition that
outperforms humans, self-driving cars, highly accurate machine translation, to name a few) can
be credited to a specific area of AI known as Deep Learning, and more specifically to Recurrent
Neural Networks and Convolutional Neural Networks. These algorithms have been heavily
optimized, and they are particularly well suited to run in parallel on GPU machines and
clusters. Other areas of AI may produce additional advances, but currently neural networks are
having their day in the sun and attract the most buzz.
Powerful and Cheap Computing Power
In the past, even as AI algorithms improved, hardware remained a constraining factor. Recent
advances in hardware and new computational models, particularly around GPUs, have
accelerated the adoption of AI. Hard as it is to believe, the computing power provided by
today’s GPU processors is the equivalent of what we called supercomputers in the 1990s and
at the beginning of the century. GPUs gained popularity in the AI community for their ability
to handle a high degree of parallelism and to perform matrix multiplications efficiently –
both are necessary for the iterative nature of deep learning algorithms. Consequently, in any
serious AI project, GPUs are the processors of choice for creating and training models.
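To make the GPU connection concrete, here is a minimal NumPy sketch of a single dense neural-network layer; the shapes and values are illustrative (not from the paper). The forward pass reduces to one large matrix multiplication, and every dot product in it is independent – exactly the kind of workload a GPU parallelizes.

```python
import numpy as np

# Illustrative sizes: a batch of 64 inputs with 784 features each,
# feeding a hidden layer of 128 units.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 784))   # input batch
W = rng.standard_normal((784, 128))  # layer weights
b = np.zeros(128)                    # layer biases

# Forward pass: one matrix multiplication plus a ReLU nonlinearity.
# That is 64 x 128 = 8,192 independent dot products.
hidden = np.maximum(X @ W + b, 0)
print(hidden.shape)  # (64, 128)
```

Training repeats this multiplication (and its gradient counterpart) millions of times, which is why GPU throughput dominates deep learning performance.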
Elasticity
In cloud computing, elasticity is the ability of a computer system to adapt to different workloads
by spinning up and shutting down resources automatically. If a computer system is elastic, it can
at any given time match its available resources to current demand as closely as possible.
Elasticity is one of the defining characteristics that differentiate cloud computing from
other computing paradigms. Very few on-premise facilities have the ability to turn computing
power on and off on demand. Even if they could bring systems down, an on-premise system is
by definition owned by the corporation, so the company would still be paying for that resource
even when it is turned off. Cloud providers such as AWS, Azure, and Google seamlessly provide
elastic computing with their offerings. An AI experiment that would have cost hundreds of
thousands of dollars just a few years ago – the price of buying a supercomputer – can be
performed for hundreds of dollars on the cloud today: spin up the instances for a few hours,
run your experiment, and shut down all the resources once the experiment is completed.
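The scaling logic behind elasticity can be sketched in a few lines of Python. The capacity figures and function name here are hypothetical, but they show how an elastic system continuously maps demand onto the resources it keeps running, scaling both up and back down.

```python
import math

def workers_needed(requests_per_sec: float,
                   capacity_per_worker: float = 100.0,
                   min_workers: int = 1) -> int:
    """Return how many workers an elastic system would keep running
    to serve the current load (capacity figures are hypothetical)."""
    return max(min_workers, math.ceil(requests_per_sec / capacity_per_worker))

# Demand rises and falls; the resource count follows it in both directions,
# so you only pay for what the current load actually requires.
for load in [50, 250, 1000, 0]:
    print(load, "->", workers_needed(load))
```

On-premise hardware, by contrast, must be provisioned for the peak (1,000 requests/sec here) and then sits idle, still costing money, whenever the load drops.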
Data
AI, and more specifically Deep Learning, currently requires hundreds of thousands, if not
millions, of data points to learn. Fortunately, this data is becoming more plentiful. Storage is
becoming cheaper every day, so corporations are more inclined to keep their logs; social media
provides a trove of data; and more public data sources, such as data.gov, become available
every day. Another big data source is Internet of Things (IoT) sensors. The health care, retail,
and finance industries (to name a few) are creating vast stores of patient and customer data
that can then be used to train AI models. Not surprisingly, the companies investing most in AI
(Amazon, Apple, Baidu, Google, Microsoft, and Facebook) are the ones with the most data.
Differences Between AI, Machine Learning, and Deep Learning
In many contexts, artificial intelligence, machine learning, and deep learning are used
synonymously, but in reality deep learning is a subset of machine learning, and machine
learning is a subset of artificial intelligence. Artificial Intelligence is the branch of computer
science that focuses on building machines capable of mimicking or simulating intelligent
behavior, while machine learning is the practice of using algorithms to sift through data,
“learn” from the data, and make predictions or take autonomous actions. Therefore, instead of
programming specific rules and conditions for an algorithm to follow, the algorithm is trained
on large amounts of data so that it can independently learn from the data and perform a
specific task.
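A toy example makes the distinction concrete. Below, the rule-based version hard-codes a human-chosen threshold, while the machine learning version derives its threshold from labeled examples; the feature, data, and function names are invented purely for illustration.

```python
# Rule-based approach: a human hard-codes the decision logic.
def flag_by_rule(exclamation_count: int) -> bool:
    return exclamation_count > 3  # a person picked 3 by hand

# Machine learning approach: derive the threshold from labeled data.
def learn_threshold(examples):
    """examples: list of (feature_value, is_positive) pairs.
    Try each observed value as a threshold and keep the one that
    misclassifies the fewest training examples."""
    best_t, best_err = None, float("inf")
    for t, _ in examples:
        err = sum((x > t) != label for x, label in examples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

data = [(1, False), (2, False), (5, True), (6, True)]
t = learn_threshold(data)
print(t)  # 2 -- learned from the data, not hand-coded
```

Real machine learning models learn millions of such parameters at once, but the principle is the same: the data, not the programmer, determines the decision boundary.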
AI/ML Tools
When Artificial Intelligence first came into the picture, many practitioners coded everything
from scratch in a wide variety of languages. Initially, languages such as Prolog and Lisp were
popular. Later on, Java and C++ became relevant. Lately, the de facto AI languages have
become R and Python.
Fortunately, some great companies have emerged that greatly simplify the model selection,
model versioning, data gathering, data cleansing, model training and other functions critical to
the data science process. We will now review a sample of those companies.
Data Collection
CrowdFlower – CrowdFlower is a data mining and crowdsourcing company based in San
Francisco, United States. The company offers software as a service that allows users to access
an online workforce to clean, label, and enrich data. Requesters post small tasks on the
platform, and workers are paid a small amount for completing them.
Amazon Mechanical Turk – Amazon Mechanical Turk (MTurk) is a crowdsourcing Internet
marketplace that enables individuals and businesses (known as Requesters) to coordinate the
use of human intelligence to perform tasks that computers are currently unable to do. It is
part of Amazon Web Services and is owned by Amazon. Requesters are able to post jobs
known as Human Intelligence Tasks (HITs), such as choosing the best among several
photographs of a storefront, writing product descriptions, or identifying performers on music
CDs. Workers (called Providers in Mechanical Turk’s Terms of Service, or, more colloquially,
Turkers) can then browse existing jobs and complete them in exchange for a monetary
payment set by the Requester. To place jobs, requesting programs use an open application
programming interface (API) or the more limited MTurk Requester site.
Data Cleansing
Trifacta – Trifacta is a platform for exploring and preparing data for analysis, and it works
with both cloud and on-premises data platforms. Trifacta is designed to let analysts explore,
transform, and enrich raw, diverse data into clean, structured formats through self-service
data preparation. Trifacta’s approach draws on the latest techniques in machine learning,
data visualization, human-computer interaction, and parallel processing, allowing
non-technical users who have the most context for the data to quickly make the data ready
for a variety of business processes such as analytics.
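As a taste of what self-service data preparation involves, here is a generic pandas sketch (this is not Trifacta’s product or API, and the column names and records are invented): a typical cleansing step normalizes text, coerces types, and drops unusable rows.

```python
import pandas as pd

# Raw, messy customer records (hypothetical data).
raw = pd.DataFrame({
    "name": ["  alice ", "BOB", None],
    "age":  ["34", "not provided", "29"],
})

clean = raw.copy()
clean["name"] = clean["name"].str.strip().str.title()        # normalize text
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")  # coerce types
clean = clean.dropna(subset=["name"])                        # drop unusable rows

print(clean["name"].tolist())  # ['Alice', 'Bob']
```

Tools in this category wrap transformations like these in a visual interface, so analysts can apply them without writing the code themselves.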
Model Training and Management
Databricks – Databricks, a company founded by the creators of Apache Spark, aims to help
clients with cloud-based big data processing using Spark. The Databricks Unified Analytics
Platform enables data scientists, data engineers, and analysts to easily collaborate to
accelerate innovation. Databricks grew out of the AMPLab project at the University of
California, Berkeley, which created Apache Spark, a distributed computing framework built
atop Scala. Databricks develops a web-based platform for working with Spark that provides
automated cluster management and IPython-style notebooks. In addition to building the
Databricks platform, the company co-organizes massive open online courses about Spark and
runs the largest conference about Spark, Spark Summit.
DataRobot – DataRobot automates the legwork of running machine learning models on
your data. With the service, available for private or public cloud, you upload your data, do
some light preparation, and indicate which parameter you want to predict. The tool then
takes a brute-force approach, running dozens of algorithms on the data and comparing the
results on a leaderboard similar to the one Kaggle uses to display the results of its online
competitions (the company employs over a dozen data scientists who have made Kaggle’s
top 100 ratings).
Domino Data Lab – Domino accelerates the development and delivery of models with key
capabilities of infrastructure automation, seamless collaboration, and automated reproducibility.
This greatly increases the productivity of data scientists and removes bottlenecks in the data
science lifecycle. Domino empowers data scientists to build, run, and deploy models faster
in a central place using the most popular tools and languages. Data scientists are able to run
more experiments faster with scalable compute, avoid IT headaches with one-click model
deployment, and quickly integrate data science into business processes with stakeholder-
friendly reports and apps.
Putting It All Together
Artificial Intelligence and Machine Learning are well past the point of science fiction and have
moved into the middle of our homes, businesses, and lives, often without us realizing that we
are using them. Better access to cheap cloud computing, recent breakthroughs in algorithms,
and elastic computing are bringing amazing new possibilities unimaginable just a few years
ago. But the most important pillar of the AI renaissance is the availability of vast, new, rich
data sources, which is making deep learning a reality. To further advance the state of the art
and create new applications, management, business analysts, data scientists, and data
engineers need to carefully select which problems they want to tackle next with AI/ML and,
more importantly, decide how to conceptualize and implement these applications.
New York, New York • Warren, New Jersey • Boston, Massachusetts • Toronto, Canada
www.knowledgent.com
For more information contact [email protected].
© 2018 Knowledgent Group Inc. All rights reserved.
ABOUT KNOWLEDGENT
Knowledgent is a data intelligence company that innovates IN and THROUGH data. We eat,
sleep, and breathe data to enable advanced and agile analytics, digital enterprise, and robotics.
We combine our data and analytics expertise with business-specific domain knowledge. We are
Informationists who are passionate about data.