
Watson and Analytics


Page 1: Watson and Analytics

© 2015 INTERNATIONAL BUSINESS MACHINES CORPORATION

IBM Watson and Analytics: Welcome to the Cognitive Era

A new era in technology, a new era in business.

Jerry Carroll, Executive Architect – Big Data & Analytics
[email protected]

Page 2: Watson and Analytics

Where code goes, where data flows, cognition will follow.

Page 3: Watson and Analytics


Here’s what the world is saying about the impact of Cognitive systems on the future of how we work:

“IBM Crafts a Role for Artificial Intelligence in Medicine.”

“IBM Watson represents a bold technological and visionary step toward a future in which every aspect of our lives will be enhanced by the utility of cognitive computing as it is harnessed into myriad new applications.”

“What is distinctive about IBM is the breadth of its effort to create Watson tools and services as plug-in offerings for a wide range of developers.”

“At Wayin, former Sun CEO Scott McNealy is using Watson's image recognition capabilities to trawl photos on social media and make them searchable, even when they don't have tags describing their content. ‘You can't do this without Watson,’ he said.”

“IDC predicts the worldwide cognitive software platforms market will grow to $3.7 billion in 2019, at a CAGR of 35% over 5 years.”
IDC: Worldwide Cognitive Software Platforms Forecast, 2015-2019: The Emergence of a New Market (#258781, September 2015, David Schubmehl)

“IBM is the only company marketing a cognitive computing platform that’s specifically designed to support the development of a broad range of enterprise solutions.”

“No doubt, Watson has the means to radically change the industry. In fact, its potential as an ‘innovation lake/incubator’ should be highly valued.”
IDC: IBM’s Go-to-Market Transformation – Deeper, Wider, Newer (#AP257527, April 2015, Chris Zhang, Sabharinath Balasubramanian, Mayur Sahni)

“IBM’s famous cognitive computer can help banks with complex financial operations and attack important health care problems. Now you can add seeing to its skill set.”

“These days, it’s not just AI algorithms themselves that have improved, but the ability to deliver them…that has made so many new applications possible.”
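As a worked reading of the IDC forecast quoted above: a compound annual growth rate (CAGR) r over n years links the end value to the start value by end = start × (1 + r)^n. The sketch below inverts that relation to estimate the market size the forecast implies at the start of the five-year period. The function names are illustrative, and the result is a back-of-the-envelope figure, not a number published by IDC.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Invert end = start * (1 + cagr) ** years to recover the start value."""
    return end_value / (1.0 + cagr) ** years


def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted on the slide: $3.7 billion in 2019 at a 35% CAGR over 5 years.
forecast_2019_busd = 3.7
start_busd = implied_start_value(forecast_2019_busd, cagr=0.35, years=5)

print(f"Implied starting market size: ${start_busd:.2f}B")
```

Under these assumptions the forecast implies a starting market of roughly $0.8 billion, that is, a bit more than a fourfold increase over the five years.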

Page 4: Watson and Analytics

There are three capabilities that differentiate cognitive systems from traditional programmed computing systems.

Reasoning

They reason. They can understand information but also the underlying ideas and concepts. This reasoning ability can become more advanced over time. It’s the difference between the reasoning strategies we used as children to solve mathematical problems, and then the strategies we developed when we got into advanced math like geometry, algebra and calculus.

Learning

They never stop learning. As a technology, this means the system actually gets more valuable with time. They develop “expertise”. Think about what it means to be an expert: it’s not about executing a mathematical model. We don’t consider our doctors to be experts in their fields because they answer every question correctly. We expect them to be able to reason and be transparent about their reasoning, and expose the rationale for why they came to a conclusion.

Understanding

Cognitive systems understand like humans do, whether that’s through natural language or the written word; vocal or visual.

Page 5: Watson and Analytics

Data is transforming industries and professions.

HOW, AND WHY NOW?

Page 6: Watson and Analytics


CONSIDER:

Data flows from every device, replacing guessing and approximations with precise information. Yet 80% of this data is unstructured; therefore, invisible to computers and of limited use to business.

HEALTHCARE DATA
99% growth by 2017; 88% unstructured
Healthcare data comes from sources such as: Patient Sensors, Electronic Medical Records, Test Results

GOVERNMENT & EDUCATION DATA
94% growth by 2017; 84% unstructured
Government & education data comes from sources such as: Vehicle Fleet Sensors, Traffic Sensors, Student Evaluations

UTILITIES DATA
93% growth by 2017; 84% unstructured
Utilities data comes from sources such as: Utility Sensors, Employee Sensors, Location Data

MEDIA DATA
97% growth by 2017; 82% unstructured
Media data comes from sources such as: Video and Film, Images, Audio

By 2020, 1.7 MB of new information will be created every minute for every human being on the planet.

Page 7: Watson and Analytics

The world is being reinvented in code.

HOW, AND WHY NOW?

Page 8: Watson and Analytics


CONSIDER:

The world is being rewritten in software code, and cloud is the platform on which the new digital builders—from developers to business professionals—are reimagining everything from banking to retail to healthcare.

Smart TVs represented 27% of all TV sales in 2012; by 2018, they will represent 82%.

Smart LED lighting will grow from 6M units in 2015 to 570M units in 2020, used for safety communication, health, pollution and personalized services.

By 2017, there will be 1B connected things in smart homes, including appliances, smoke detectors and cameras.

100,000,000 lines of code in a new car

5,000,000 lines of code in smart appliances

1,200,000 lines of code in a smartphone

80,000 lines of code in a pacemaker

50% of B2B collaboration will take place through web APIs next year.

Sensors for industrial asset monitoring and management will grow from just over 15M units in 2014 to over 40M units in 2018

Smart traffic sensors and other devices installed in smart cities will grow from 237M units in 2015 to 371M in 2017.

Revenues for smart grid sensors will grow ten-fold from 2014 to 2021.

By 2020, there will be 925M smart meters installed worldwide, more than double the 400M in 2014.

Code, Tools, Analytics, Data, APIs
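The device forecasts on this slide each give a start and an end figure, so the overall growth multiple and the implied average annual growth can be worked out directly. The sketch below does that for the four forecasts that quote both endpoints. The dictionary keys and function names are illustrative, and the industrial sensor figures are rounded ("just over 15M", "over 40M"), so those results are approximate.

```python
# (start_year, start_units_in_millions, end_year, end_units_in_millions), as quoted on the slide
FORECASTS = {
    "Smart LED lighting": (2015, 6, 2020, 570),
    "Industrial asset sensors": (2014, 15, 2018, 40),
    "Smart city traffic sensors": (2015, 237, 2017, 371),
    "Smart meters": (2014, 400, 2020, 925),
}


def growth_multiple(start_units: float, end_units: float) -> float:
    """How many times larger the end figure is than the start figure."""
    return end_units / start_units


def implied_annual_growth(start_year: int, start_units: float,
                          end_year: int, end_units: float) -> float:
    """Average compound annual growth rate between the two quoted years."""
    years = end_year - start_year
    return (end_units / start_units) ** (1.0 / years) - 1.0


for name, (y0, u0, y1, u1) in FORECASTS.items():
    multiple = growth_multiple(u0, u1)
    rate = implied_annual_growth(y0, u0, y1, u1)
    print(f"{name}: {multiple:.2f}x over {y1 - y0} years (about {rate:.0%} per year)")
```

For smart meters the multiple is 925 / 400 ≈ 2.31, which matches the slide's "more than double".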

Page 9: Watson and Analytics

Computing is entering a new cognitive era.

HOW, AND WHY NOW?

Page 10: Watson and Analytics


CONSIDER:

Cognitive systems can understand the world through sensing and interaction, reason using hypotheses and arguments and learn from experts and through data. Watson is the most advanced such system.

Today, businesses in 36 countries across 17 industries are applying cognitive technologies.

There are 350+ Watson ecosystem partner companies, with 100 of those having taken their product to market.

On average there are 1.3B Watson API calls a month and growing.

78% of business and IT executives believe that successful businesses will manage employees alongside intelligent machines.

Among C-Suite executives familiar with cognitive computing:

• 96% in insurance intend to invest in cognitive capabilities.
• 84% in healthcare believe it will play a disruptive role in the industry, and 60% believe they lack the skilled professionals and technical experience to achieve it.
• 94% in retail intend to invest in cognitive capabilities.
• 89% in telecommunications believe it will have a critical impact on the future of their business.

Page 11: Watson and Analytics

ADVANTAGES OF COGNITIVE BUSINESS:

• Deeper human engagement
• Elevated expertise
• Cognitive products and services
• Cognitive processes and operations
• Intelligent exploration and discovery

Page 12: Watson and Analytics

IBM WATSON

The Watson that competed on Jeopardy! in 2011 comprised what is now a single API, Q&A, built on five underlying technologies:

• Natural Language Processing
• Machine Learning
• Question Analysis
• Feature Engineering
• Ontology Analysis

Since then, Watson has grown to a family of 28 APIs.

By the end of 2016, there will be nearly 50 Watson APIs, with more added every year.

APIs shown on the slide: Relationship Extraction; Questions & Answers; Language Detection; Personality Insights; Keyword Extraction; Image Link Extraction; Feed Detection; Visual Recognition; Concept Expansion; Concept Insights; Dialog; Sentiment Analysis; Text to Speech; Tradeoff Analytics; Natural Language Classifier; Author Extraction; Speech to Text; Retrieve & Rank; Watson News; Language Translation; Entity Extraction; Tone Analyzer; Concept Tagging; Taxonomy; Text Extraction; Message Resonance; Image Tagging; Face Detection; Answer Generation; Usage Insights; Fusion Q&A; Video Augmentation; Decision Optimization; Knowledge Graph; Risk Stratification; Policy Identification; Emotion Analysis; Decision Support; Criteria Classification; Knowledge Canvas; Easy Adaptation; Knowledge Studio Service; Statistical Dialog; Q&A Qualification; Factoid Pipeline; Case Evaluation

Page 13: Watson and Analytics

IBM WATSON

Personality Insights

Page 14: Watson and Analytics


IBM WATSON

These APIs are underpinned by 50 technologies:

Anaphoric Co-referencing; Colloquialism Processing; Content Management -- Versioning; Convolutional Neural Networks; Curation; Deep Learning; Dialog Framing; Ellipses; Embedded Table Processing; Ensembles and Fusion; Entity Resolution; Factoid Answering; Feature Engineering

Feature Normalization; Focus and Spurious Phrase Resolution; HTML Page Analysis; Image Management; Information Retrieval; Knowledge (Property) Graphs; Knowledge Answering; Knowledge Extraction Annotators; Knowledge Validation and Extrapolation; Language Modeling; Latent Semantic Analysis

Learn To Rank; Linguistic Analysis; Logical Reasoning Analysis; Logistic Regression; Machine Learning; Multi-Dimensional Clustering; Multilingual training; n-Gram Analysis (word combinations and distance); Ontology Analysis; Pareto Analysis; Passage Answering; PDF Conversion; Phoneme Aggregation

Question Analysis; Question-answering Reasoning Strategies; Recursive Neural Networks; Rules Processing; Scalable Search; Similarity Analytics; Statistical Language Parsing; Support Vector Machines; Syllable Analysis; Table Answering; Visual Analysis; Visual Rendering; Voice Synthesis

Page 15: Watson and Analytics


BECOMING A COGNITIVE BUSINESS

1. A cognitive strategy

Determine what data you need, which experts will train the system; where you must build more human engagement; which products, services, processes and operations should be infused with cognition, and which parts of the unstructured 80% of data you most need to focus on to make discoveries for the future.

2. A foundation of data and analytics

Collect and curate the right data: data you own, data from others, data available to all; both structured and unstructured. Apply cognitive technologies to this data in order to sense, learn and adapt, thereby creating competitive advantage.

3. Cloud services optimized for industry, data and cognitive APIs

The building blocks for products and services are code, APIs and diverse data sets. The platform you choose to develop on, and the agile development culture and methods you embrace, will be critical to your success.

4. IT infrastructure tuned for cognitive workloads

Architect a new kind of IT core: a heterogeneous infrastructure that serves as the backbone of your enterprise. Do this rapidly and affordably by harmonizing technologies from public, private and hybrid cloud with distributed devices, IoT instrumentation and your existing systems.

5. Security for a Cognitive Era

As cognition makes its way into cars, buildings, roadways, business processes, fleets and supply chains, securing every transaction, piece of data and interaction becomes essential to ensure trust in the entire system, and in your brand and reputation.

Page 16: Watson and Analytics

Power Systems – Designed for Big Data and Analytics solutions


Analytics 1.0

Page 17: Watson and Analytics

Power Systems – Designed for Big Data and Analytics solutions


Analytics 1.5

Page 18: Watson and Analytics

Power Systems – Designed for Big Data and Analytics solutions


Analytics 2.0 - support a variety of data and a range of analytics

Page 19: Watson and Analytics

Classic Hadoop infrastructures can be inefficient and inflexible, leading to server and cluster sprawl, unnecessary software licenses, and infrastructure management challenges

IBM Data Engine for Hadoop and Spark

IBM Data Engine for Analytics

Page 20: Watson and Analytics

Avoid Server Sprawl as Big Data and Analytics environments grow

Intel server growth, estimated as a sample configuration with 500 TB of user space grows 5x and 10x to 2.5 PB and 5 PB.

POWER8 server growth using IBM Data Engine for Analytics, estimated as a sample configuration grows 5x and 10x to 2.5 PB and 5 PB of user space.

[Chart: server counts at 5x and 10x growth. Labels recovered from the chart: 17 servers, 28 servers, 75 servers, 143 servers.]

Sizing is based on assumptions regarding general configurations and use cases for BigInsights and BigSQL with actual client. The comparison reflects the number of servers required to deliver relative performance and equivalent user space on Intel reference architecture using Hadoop triple replication with 4 TB drives versus POWER8 IDEA reference architecture with 4 TB drives using the Elastic Storage Server.

Page 21: Watson and Analytics

Many new solution workloads in addition to existing apps

Leads to costly, complex, siloed, under-utilized infrastructure and replicated data

Development Test Distributed ETL, Sensitivity Analysis

Hadoop based Sentiment Analysis

Low Utilization = Higher cost

Infrastructure Silos aka Cluster Sprawl is Inefficient

Page 22: Watson and Analytics

IBM Data Engine for Analytics (IDEA) – Client Example

Client: Multinational Telecomm. Company

A multinational telecommunication company with over 6M subscribers.

Challenges

• Expectations of a Real Time Marketing (RTM) based solution to run event-based campaigns
• Enable event-based marketing, analyzing various sources of input data containing information regarding subscribers' actions
• Dispatch the triggered events to downstream applications such as campaign management, for associated campaign execution

Architecture - Solution Components

• BigInsights, Streams, SPSS Modeler, SPSS Analytics Server
• IBM Data Engine for Analytics: 20 X S822L, 2 X ESS GL4, Spectrum Scale, PCM

Solution Approach

• Solution provided a Hadoop-based Big Data platform, integrated to the RTM decision engine, to enable data monetization opportunities, including location based analytics
• Customer was not comfortable with the huge number of x86 Data Nodes approach of typical Hadoop Architecture
• The IBM team designed the Power solution and conducted a technical workshop on newly redefined Hadoop architecture based on IDEA

Key Client Benefits

• Optimized Big Data deployment architecture with IDEA Architecture with Linux on Power, Elastic Storage Server and Spectrum Scale
• Lower TCO with 4 Racks on Power vs 12 racks on x86
• More IO bandwidth with 40GbE Power network against 10GbE on x86 based solution

3x fewer racks for 2 PB Big Data solution: 4 vs. 12

Page 23: Watson and Analytics

IBM Data Engine for Analytics – Solution Highlights

Actionable Insights with IBM BigInsights preloaded
+ Increase business value by consolidating multiple analytic capabilities and data as needed
Up to 2.5x* faster insights

Smart Infrastructure Services with IBM Platform Computing
+ Designed to handle multiple analytic workloads in a multi-tenant environment with dynamic resources

Designed for Data with IBM POWER8 Systems (S822L)
+ Outstanding memory and IO bandwidth design for the demands of Big Data
2x* better performance

Scalable Networking with IBM and partner networks
+ High bandwidth, low latency networking: Ethernet RoCE (10 Gbit or 40 Gbit), InfiniBand RDMA (FDR)
10x to 100x network performance growth since Hadoop inception

Flexible Storage with IBM Elastic Storage Server
+ Combines Servers, Storage Enclosures, Disks and Elastic Storage Software
Over 2x** reduction in storage disk count

* Based on internal testing and cost analysis
** Based on client example vs a triple replica Hadoop configuration.
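The "over 2x reduction in storage disk count" claim against a triple-replica Hadoop configuration can be illustrated with simple capacity arithmetic. Triple replication stores three full copies, so it needs 3.0x raw capacity per usable byte, while an erasure-coded layout with k data strips and m parity strips needs (k + m) / k. The sketch below compares the two for 2.5 PB of user space on 4 TB drives, the figures used on the sizing slide. The 8+2 layout is an assumption chosen for illustration (the slides do not state which code the Elastic Storage Server uses), the function names are hypothetical, and spares, metadata and filesystem overhead are ignored.

```python
import math


def raw_tb_replication(usable_tb: float, copies: int = 3) -> float:
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_tb_erasure_coding(usable_tb: float, data_strips: int, parity_strips: int) -> float:
    """Raw capacity needed for a (data + parity) erasure-coded layout."""
    return usable_tb * (data_strips + parity_strips) / data_strips


def disks_needed(raw_tb: float, drive_tb: float) -> int:
    """Whole drives required to hold raw_tb."""
    return math.ceil(raw_tb / drive_tb)


USABLE_TB = 2500  # 2.5 PB of user space (taking 1 PB = 1000 TB)
DRIVE_TB = 4      # 4 TB drives

replica_disks = disks_needed(raw_tb_replication(USABLE_TB), DRIVE_TB)
# 8 data + 2 parity strips is an assumed layout, not a figure from the slides.
ec_disks = disks_needed(raw_tb_erasure_coding(USABLE_TB, 8, 2), DRIVE_TB)

print(f"Triple replication: {replica_disks} drives")
print(f"Erasure coding 8+2: {ec_disks} drives")
print(f"Reduction: {replica_disks / ec_disks:.2f}x")
```

Under these assumptions the replicated layout needs 1875 drives and the erasure-coded one 782, a reduction of about 2.4x, which is consistent in direction with the slide's claim but is not the source of it.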

Big Data & Analytics Software

Infrastructure Services

POWER8 Servers

Scalable Networking

Scale Out Cluster File System

Elastic Storage Server

Appliance-Like but much more Versatile!

Page 24: Watson and Analytics

Compute Plane = Power8 Systems, Designed for Big Data

4X threads per core*

4X memory bandwidth*

5X more cache*

SMT=Simultaneous Multi-Threading OLTP = On-Line Transaction Processing HPC=High Performance Computing

These design decisions result in best performance for all types of workloads such as: Java, OLTP, Analytics, Big Data, HPC

* POWER8 compared to Intel Haswell EX
Sources:
Haswell EX: http://ark.intel.com/products/84685/Intel-Xeon-Processor-E7-8890-v3-45M-Cache-2_50-GHz
POWER8: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=BR&infotype=PM&appname=STGE_PO_PO_USEN&htmlfid=POB03046USEN

[Diagram labels: POWER8 SMT8 vs x86 SMT2; POWER8 pipe vs x86 pipe (data flow); POWER8 vs x86]

1.4 – 2.3X Clock Frequency

Components of IDEA

Page 25: Watson and Analytics

Data Plane = Elastic Storage Server, A Data Lake for many applications

77.5 percent of organizations are already investing in a Data Lake*

Elastic Storage Server: Single Name Space

• Linear capacity & performance scale out
• Enterprise storage on standard hardware

[Diagram labels: Technical Computing, Big Data & Analytics, Cloud; File, Block, Object; POSIX, GPFS, NFS, Hadoop, Cinder, Swift; Data Lake (Structured Data, Unstructured Data); Traditional Analytics, BigSQL]

Components of IDEA

Page 26: Watson and Analytics

Production - IBM Data Engine for Analytics

                                Without HMC/TFT/SMN/MN    With HMC/TFT/SMN
Available Filesystem Capacity   0PB                       1PB
Number of Data Nodes            14                        4
Hadoop Mgmt LPARs               0                         6
System Mgmt Node                0                         1

Data Node – S822L
• 24x 3.02GHz cores
• 256GB DRAM
• 12x 1.8TB 10K SAS HDD
• 2x 40GbE (2 port) (data+mgmt)
• 2x 4-port 1GbE NIC (mgmt)

Hadoop Management Node – S822L
• 2 LPARs with split backplane
• 24x 3.02GHz cores
• 256GB DRAM
• 8x 1.8TB 10K SAS HDD (OS + data)
• 2x 40GbE (2 port) (data+mgmt)
• 2x 4-port 1GbE NIC (mgmt)

System Management Node – S812L
• 10x 3.425GHz cores
• 32GB DRAM
• 2x 300GB 10K SAS HDD (OS)
• 1x 40GbE (2 port) (data+mgmt)
• 1x 4-port 1GbE NIC (mgmt)

[Diagram labels: Initial Rack; Scale out Racks]
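The node specifications above lend themselves to a small data structure from which per-rack totals can be derived. The sketch below models the S822L data node as listed on the slide and sums cores, memory and raw local disk for the two data-node counts given in the table (14 and 4). The class and function names are hypothetical. The raw local disk figure is not the usable filesystem capacity, which the table lists separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hardware summary for one node type, using the figures on the slide."""
    model: str
    cores: int
    dram_gb: int
    hdd_count: int
    hdd_tb: float

    @property
    def raw_disk_tb(self) -> float:
        """Raw local disk per node, before any RAID or filesystem overhead."""
        return self.hdd_count * self.hdd_tb


# Data Node - S822L: 24 cores, 256GB DRAM, 12x 1.8TB HDD
DATA_NODE = NodeSpec("S822L", cores=24, dram_gb=256, hdd_count=12, hdd_tb=1.8)


def rack_totals(node: NodeSpec, node_count: int) -> dict:
    """Aggregate resources for `node_count` identical nodes."""
    return {
        "cores": node.cores * node_count,
        "dram_gb": node.dram_gb * node_count,
        "raw_disk_tb": round(node.raw_disk_tb * node_count, 1),
    }


# 14 and 4 are the data-node counts in the two table columns.
totals_14_nodes = rack_totals(DATA_NODE, 14)
totals_4_nodes = rack_totals(DATA_NODE, 4)

print("14 data nodes:", totals_14_nodes)
print(" 4 data nodes:", totals_4_nodes)
```

Keeping the specification in one frozen dataclass makes it easy to add the management node types in the same way and to compare configurations side by side.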

Page 27: Watson and Analytics

Classic Hadoop infrastructures can be inefficient and inflexible, leading to server and cluster sprawl, unnecessary software licenses, and infrastructure management challenges

IBM Data Engine for Hadoop and Spark

IBM Data Engine for Analytics

Page 28: Watson and Analytics

OpenPOWER Foundation

Page 29: Watson and Analytics

Single vendor support

Up to 2x better price performance for Spark workloads*

Delivered as a fully integrated cluster ready to run

OpenPOWER innovation with IBM S812LC servers

Optimized configurations for Hadoop or Spark workloads

Based on S812LC servers with up to 14*6TB disk drives per server

Optionally preloaded with IBM BigInsights and IBM Open Platform

Simplify operations – easy to deploy and manage

Adapt and scale to your changing analytics needs

IBM Data Engine for Hadoop and Spark
OpenPOWER innovation with IBM Open Platform with Apache Hadoop for a high performance, storage dense and fully integrated cluster offering.

* All results are based on IBM Internal Testing of 3 SparkBench benchmarks consisting of SQL RDD Relation, Logistic Regression, SVM

Announce: Feb 9, 2016
GA: Mar 18, 2016

Page 30: Watson and Analytics

IBM Data Engine for Spark and Hadoop (IDE-HS) Cluster Performance

Designed for the Cognitive Era to Make Better Decisions even Faster

IBM Data Engine for Hadoop and Spark infrastructure delivers Spark workload scaling to minimize execution times and reduce batch windows

• 2.1X more performance per dollar spent for Spark Logistic Regression based Machine Learning, used in model training by a wide variety of lines of business

• 1.4X more performance per dollar spent for Support Vector Machine (SVM), a Machine Learning algorithm used in product Recommender Systems

• 1.7X more performance per dollar spent for Spark SQL query processing, used widely in Big Data clusters

• All results are based on IBM Internal Testing of 3 SparkBench benchmarks consisting of SQL RDD Relation, Logistic Regression, SVM
• 6 Data Nodes and 1 Management Node. Each node is IBM Power System S812LC 10 cores / 80 threads, POWER8; 2.92GHz, 256 GB memory, RedHat 7.2, Spark 1.5.1, OpenJDK 1.8
• 6 Data Nodes and 1 Management Node. Each node is x86 E5-2620V3 12 cores / 24 threads, E5-2620 V3; 2.4GHz, 256 GB memory, RedHat 7.1, Spark 1.5.1, OpenJDK 1.8
• Pricing is based on web prices of HP DL380 and list prices of IBM Power S812LC

[Chart categories: SVM, Logistic Regression (Logres), SQL]
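"Performance per dollar" figures such as those on this slide are typically formed by dividing a throughput measure by the cluster price and then taking the ratio between two clusters. The sketch below shows that calculation, using the inverse of benchmark runtime as the throughput measure. The class name, function name, runtimes and prices are all hypothetical placeholders; they are not the measurements or prices behind the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterResult:
    """One benchmark run on one cluster (all values here are placeholders)."""
    name: str
    runtime_seconds: float
    cluster_price_usd: float

    @property
    def perf_per_dollar(self) -> float:
        """Throughput (1 / runtime) divided by the cluster price."""
        return (1.0 / self.runtime_seconds) / self.cluster_price_usd


def relative_price_performance(a: ClusterResult, b: ClusterResult) -> float:
    """How many times more performance per dollar cluster `a` delivers than `b`."""
    return a.perf_per_dollar / b.perf_per_dollar


# Hypothetical inputs for illustration only.
cluster_a = ClusterResult("cluster A", runtime_seconds=100.0, cluster_price_usd=120_000.0)
cluster_b = ClusterResult("cluster B", runtime_seconds=150.0, cluster_price_usd=100_000.0)

ratio = relative_price_performance(cluster_a, cluster_b)
print(f"{cluster_a.name} delivers {ratio:.2f}x the performance per dollar of {cluster_b.name}")
```

With these placeholder inputs, cluster A is 1.5x faster but 1.2x more expensive, so its price-performance advantage is 1.25x. The same ratio applies to any throughput measure as long as both clusters run the same workload.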

Page 31: Watson and Analytics

• Apache Spark is an open-source in-memory distributed compute engine
– It speeds iterative analysis on large-scale data by up to 100x compared with current technologies
– Enables more people to collaborate to access data, apply analytics and deploy deep intelligence into every application, including IoT, web, mobile, social, business process and more
– IBM/Spark commitment: 3500 employees working on Spark

• Included in the IBM Open Platform (IOP) that runs on Linux on Power

• Power Systems - key contributor to Spark

• Offering over 2x the performance per core for Spark workloads compared to x86 Haswell * (SQL, ML, Graph, Streaming)

Open Platform with Apache Hadoop

Open innovation to put data to work across the enterprise

* Based on Sparkbench on POWER8 P822L vs x86 E5-2690 V3; each 24 core and 256 GB RAM

Spark on Power