The Ever-Evolving Artificial Intelligence and Machine Learning Ecosystem KNOWLEDGENT WHITE PAPER



It’s hard to turn on the news or visit a news site and not see at least one article or segment about how Artificial Intelligence and Machine Learning (AI/ML) are changing the world around us. Even though AI/ML is starting to become pervasive, we are only in the initial stages of its full-fledged adoption. AI/ML is not new: the seeds were planted by Alan Turing in the 1950s, and many contributions have been made by a vast array of practitioners and computer scientists since then. It has not always been smooth sailing. Ever since the term was coined, there have been periods when the field hit rough patches and AI went into hibernation. The main reason is the classic overreach by tool vendors and IT consultants who over-promised and under-delivered. The name of the field alone makes a bold claim: implicitly, “Artificial Intelligence” suggests that we can mimic human intelligence. Even the current iteration of AI is far from actually replicating humans, but it can certainly solve a much bigger swath of problems now than it could even just a few years ago.

There are four major reasons why AI/ML can solve much harder problems today.

More Sophisticated Algorithms

Many of the current amazing breakthroughs in AI (DeepMind, image recognition that outperforms humans, self-driving cars, highly accurate machine translation, to name a few) can be credited to a specific area of AI known as Deep Learning, and more specifically to Recurrent Neural Networks and Convolutional Neural Networks. These algorithms have been optimized and are particularly well suited to running in parallel on GPU machines and clusters. Other areas of AI might produce additional enhancements, but currently neural networks are having their day in the sun and have the most buzz around them.

Powerful and Cheap Computing Power

In the past, even as AI algorithms improved, hardware remained a constraining factor. Recent advances in hardware and new computational models, particularly around GPUs, have accelerated the adoption of AI. It is hard to believe, but the computing power provided by today’s GPUs is the equivalent of what we called supercomputers in the 1990s and at the beginning of the century. GPUs gained popularity in the AI community for their ability to handle a high degree of parallelism and to perform matrix multiplications efficiently; both are necessary for the iterative nature of deep learning algorithms. Consequently, in any serious AI project, GPUs are the processors of choice when creating and training models.
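To make the link between matrix multiplication and parallelism concrete, here is a minimal sketch in plain Python of one neural network layer written as a matrix multiplication. The layer sizes, weights and function names are hypothetical and chosen only for illustration; real projects would rely on a GPU-backed library rather than nested lists.

```python
# Illustrative only: a tiny fully connected layer expressed as a matrix
# multiplication. All names and numbers are hypothetical.

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    # Every output cell is an independent dot product, so all n * m cells
    # could be computed at the same time. This independence is what
    # parallel hardware such as a GPU takes advantage of.
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def relu(matrix):
    """Element-wise max(0, x), a common activation function."""
    return [[max(0.0, x) for x in row] for row in matrix]

# A batch of 2 inputs with 3 features each.
inputs = [[1.0, 2.0, 3.0],
          [0.0, -1.0, 1.0]]
# Weights mapping 3 features to 2 hidden units.
weights = [[0.5, -1.0],
           [0.0, 1.0],
           [1.0, 0.5]]

hidden = relu(matmul(inputs, weights))
print(hidden)  # [[3.5, 2.5], [1.0, 0.0]]
```

Training repeats this kind of multiplication many times over much larger matrices, which is why hardware that computes many cells at once matters so much.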


Elasticity

In cloud computing, elasticity is the ability of a computer system to adapt to different workloads by spinning up and shutting down resources automatically. If a computer system is elastic, at any given time its available resources match the current demand as closely as possible. Elasticity is one of the defining characteristics that differentiate cloud computing from other computing paradigms. Very few on-premises facilities have the ability to turn computing power on and off on demand. Even if they could bring systems down, an on-prem system is by definition owned by the corporation, so the corporation would still be paying for that resource even when it is turned off. Cloud providers such as AWS, Azure and Google seamlessly provide elastic computing with their offerings. As a result, an AI experiment that would have cost hundreds of thousands of dollars just a few years ago, because it required buying a supercomputer, can be performed for hundreds of dollars on the cloud today: spin up the instance for a few hours, run your experiment and shut down all the resources once the experiment is completed.
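The cost argument can be sketched with a few lines of arithmetic. The workload, instance capacity and hourly price below are invented for illustration and do not reflect any provider’s actual pricing; the point is only the contrast between paying for what each hour needs and paying for peak capacity around the clock.

```python
import math

def instances_needed(demand, capacity_per_instance):
    """Smallest number of instances that covers the current demand."""
    return math.ceil(demand / capacity_per_instance)

def elastic_cost(hourly_demand, capacity_per_instance, price_per_instance_hour):
    """Pay only for the instances each hour actually requires."""
    return sum(instances_needed(d, capacity_per_instance) * price_per_instance_hour
               for d in hourly_demand)

def fixed_cost(hourly_demand, capacity_per_instance, price_per_instance_hour):
    """Provision for the peak hour and keep that capacity running all day."""
    peak = instances_needed(max(hourly_demand), capacity_per_instance)
    return peak * price_per_instance_hour * len(hourly_demand)

# Hypothetical workload: idle most of the day, with a four-hour training burst.
demand = [0] * 10 + [800] * 4 + [0] * 10   # work units per hour
capacity = 100                              # work units one instance handles per hour
price = 3.0                                 # assumed dollars per instance-hour

print(elastic_cost(demand, capacity, price))  # 96.0  (4 hours * 8 instances * $3)
print(fixed_cost(demand, capacity, price))    # 576.0 (24 hours * 8 instances * $3)
```

With these assumed numbers the elastic setup costs one sixth of the always-on one, because the resources exist only during the four hours in which the experiment runs.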

Data

AI, and more specifically Deep Learning, currently requires hundreds of thousands, if not millions, of data points to learn. Fortunately, this data is becoming more plentiful. Because storage gets cheaper every day, corporations are more inclined to keep their logs; social media provides a trove of data; and every day more public data sources, such as data.gov, become available. Another big data source is Internet of Things (IoT) sensors. The health care, retail and finance industries (to name a few) are creating vast stores of patient and customer data that can then be used to train AI models. Not surprisingly, the companies investing most in AI – Amazon, Apple, Baidu, Google, Microsoft, and Facebook – are the ones with the most data.

Differences Between AI, Machine Learning, and Deep Learning

In many contexts, artificial intelligence, machine learning and deep learning are used synonymously, but in reality deep learning is a subset of machine learning, and machine learning is a subset of artificial intelligence. Artificial intelligence is the branch of computer science that focuses on building machines capable of mimicking or simulating intelligent behavior, while machine learning is the practice of using algorithms to sift through data, “learn” from the data, and make predictions or take autonomous actions. In other words, instead of being programmed with specific rules and conditions to follow, the algorithm is trained on large amounts of data, which gives it the ability to learn independently from the data and perform a specific task.
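The contrast between programming a rule and learning one from data can be illustrated with a deliberately tiny sketch. The temperature readings, labels and function names are hypothetical, and the “learning” here is just a search for the best cut-off value, which is far simpler than a real machine learning algorithm.

```python
# Hand-written rule: a programmer decides the threshold.
def rule_based_is_hot(temperature_c):
    return temperature_c >= 30  # cut-off chosen by a person

# Learned rule: the threshold is derived from labeled examples.
def learn_threshold(examples):
    """Return the cut-off that misclassifies the fewest labeled examples.

    examples: list of (value, label) pairs, where label is True or False.
    """
    values = sorted({value for value, _ in examples})
    if len(values) < 2:
        raise ValueError("need at least two distinct values to learn from")
    # Candidate cut-offs are the midpoints between neighboring values.
    candidates = [(low + high) / 2 for low, high in zip(values, values[1:])]

    def errors(threshold):
        return sum((value >= threshold) != label for value, label in examples)

    return min(candidates, key=errors)

# Hypothetical readings that people labeled as "hot" (True) or not (False).
labeled_readings = [(12, False), (18, False), (21, False),
                    (24, True), (27, True), (33, True)]

threshold = learn_threshold(labeled_readings)
print(threshold)        # 22.5
print(26 >= threshold)  # True: the learned rule calls 26 degrees "hot"
print(rule_based_is_hot(26))  # False: the hand-written rule disagrees
```

Changing the labeled examples changes the learned threshold without anyone editing the code, which is the essential difference from the hand-written rule.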


AI/ML Tools

When Artificial Intelligence first came into the picture, many practitioners coded everything from scratch in a wide variety of languages. Initially, languages such as Prolog and Lisp were popular. Later on, Java and C++ became relevant. Lately, R and Python have become the de facto AI languages.

Fortunately, some great companies have emerged that greatly simplify model selection, model versioning, data gathering, data cleansing, model training and other functions critical to the data science process. We will now review a sample of those companies.

Data Collection

CrowdFlower – CrowdFlower is a data mining and crowdsourcing company based in San Francisco, United States. The company offers software as a service that allows users to access an online workforce to clean, label and enrich data. Work is broken into small tasks, and contributors are paid a small amount for completing each one. These tasks are available on various partner platforms, such as Clixsense and Neobux, and on CrowdFlower’s own platform, CrowdFlower Elite.

Amazon Mechanical Turk – Amazon Mechanical Turk (MTurk) is a crowdsourcing Internet marketplace that enables individuals and businesses (known as Requesters) to coordinate the use of human intelligence to perform tasks that computers are currently unable to do. It is part of Amazon Web Services and is owned by Amazon. Requesters post jobs known as Human Intelligence Tasks (HITs), such as choosing the best among several photographs of a storefront, writing product descriptions, or identifying performers on music CDs. Workers (called Providers in Mechanical Turk’s Terms of Service or, more colloquially, Turkers) can then browse the existing jobs and complete them in exchange for a monetary payment set by the Requester. To place jobs, Requesters use either an open application programming interface (API) or the more limited MTurk Requester site.

Data Cleansing

Trifacta – Trifacta is a platform for exploring and preparing data for analysis. It works with cloud and on-premises data platforms and is designed to let analysts explore, transform and enrich raw, diverse data into clean, structured formats through self-service data preparation. Trifacta’s approach draws on techniques from machine learning, data visualization, human-computer interaction and parallel processing, so that non-technical users, who have the most context for the data, can quickly make it ready for a variety of business processes such as analytics.


Model Training and Management

Databricks – Databricks is a company founded by the creators of Apache Spark that aims to help clients with cloud-based big data processing using Spark. Its Unified Analytics Platform enables data scientists, data engineers, and analysts to collaborate easily and accelerate innovation. Databricks grew out of the AMPLab project at the University of California, Berkeley, which was involved in creating Apache Spark, a distributed computing framework written in Scala. Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks. In addition to building the platform, the company co-organizes massive open online courses about Spark and runs the largest conference about Spark, Spark Summit.

DataRobot – DataRobot automates the legwork around running machine learning models on your data. With the service, available for private or public cloud, you upload the data, do some “lite” data preparation, and indicate which variable you want to predict. The tool then takes a brute-force approach: it runs dozens of algorithms on the data and compares the results on a leaderboard similar to the one Kaggle uses to display the results of its online competitions (the company employs over a dozen data scientists who have ranked in Kaggle’s top 100).
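The general idea of trying several algorithms and ranking them can be sketched in a few lines. This is not DataRobot’s implementation or API; the three candidate models, the data and every function name below are assumptions made purely for illustration.

```python
from statistics import mean, median

def fit_mean(xs, ys):
    """Baseline: always predict the average target seen in training."""
    m = mean(ys)
    return lambda x: m

def fit_median(xs, ys):
    """Baseline: always predict the median target seen in training."""
    m = median(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return lambda x: a * x + b

def mean_abs_error(model, xs, ys):
    return mean([abs(model(x) - y) for x, y in zip(xs, ys)])

def leaderboard(candidates, train, holdout):
    """Fit every candidate on the training data and rank by holdout error."""
    rows = [(name, mean_abs_error(fit(*train), *holdout))
            for name, fit in candidates.items()]
    return sorted(rows, key=lambda row: row[1])  # lowest error first

# Hypothetical data: the target grows roughly linearly with the feature.
train = ([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
holdout = ([7, 8], [14.0, 16.1])

candidates = {
    "mean baseline": fit_mean,
    "median baseline": fit_median,
    "linear regression": fit_line,
}

board = leaderboard(candidates, train, holdout)
for rank, (name, error) in enumerate(board, start=1):
    print(rank, name, round(error, 3))
```

Scoring every candidate on held-out data rather than on the training data keeps the ranking honest: a model that merely memorizes the training set would not rise to the top.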

Domino Data Lab – Domino accelerates the development and delivery of models with key

capabilities of infrastructure automation, seamless collaboration, and automated reproducibility.

This greatly increases the productivity of data scientists and removes bottlenecks in the data

science lifecycle. Domino empowers data scientists to build, run, and deploy models faster

in a central place using the most popular tools and languages. Data scientists are able to run

more experiments faster with scalable compute, avoid IT headaches with one-click model

deployment, and quickly integrate data science into business processes with stakeholder-

friendly reports and apps.


Putting It All Together

Artificial Intelligence and Machine Learning are well past the point of science fiction and have moved into the middle of our homes, businesses and lives, often without our realizing that we are using them. Better access to cheap cloud computing, recent breakthroughs in algorithms, and elastic computing are bringing amazing new possibilities that were unimaginable just a few years ago. But the most important pillar of the AI renaissance is the availability of vast, new, rich data sources that are making deep learning a reality. To further advance the state of the art and create new applications, management, business analysts, data scientists and data engineers need to carefully select which problems they want to tackle next with AI/ML and, more importantly, decide how to conceptualize and implement these applications.


New York, New York • Warren, New Jersey • Boston, Massachusetts • Toronto, Canada

www.knowledgent.com

For more information contact [email protected].

© 2018 Knowledgent Group Inc. All rights reserved.

ABOUT KNOWLEDGENT

Knowledgent is a data intelligence company that innovates IN and THROUGH data. We eat,

sleep, and breathe data to enable advanced and agile analytics, digital enterprise, and robotics.

We combine our data and analytics expertise with business specific domain knowledge. We are

Informationists that are passionate about data.