© 2015 INTERNATIONAL BUSINESS MACHINES CORPORATION
IBM Watson and Analytics: Welcome to the Cognitive Era
A new era in technology, a new era in business.
Jerry Carroll, Executive Architect – Big Data & [email protected]
Where code goes, where data flows, cognition will follow.
Here’s what the world is saying about the impact of Cognitive systems on the future of how we work:
“IBM Crafts a Role for Artificial Intelligence in Medicine.”
“IBM Watson represents a bold technological and visionary step [toward] a future in which every aspect of our lives will be enhanced by the utility of cognitive computing as it is harnessed into myriad new applications.”
“What is distinctive about IBM is the breadth of its effort to create Watson tools and services as plug-in offerings for a wide range of developers.”
“At Wayin, former Sun CEO Scott McNealy is using Watson's image recognition capabilities to trawl photos on social media and make them searchable, even when they don't have tags describing their content. ‘You can't do this without Watson,’ he said.”
“IDC predicts the worldwide cognitive software platforms market will grow to $3.7 billion in 2019, at a CAGR of 35% over 5 years.” IDC: Worldwide Cognitive Software Platforms Forecast, 2015-2019: The Emergence of a New Market (#258781, September 2015, David Schubmehl)
“IBM is the only company marketing a cognitive computing platform that’s specifically designed to support the development of a broad range of enterprise solutions.”
“No doubt, Watson has the means to radically change the industry. In fact, its potential as an ‘innovation lake/incubator’ should be highly valued.” IDC: IBM’s Go-to-Market Transformation – Deeper, Wider, Newer (#AP257527, April 2015, Chris Zhang, Sabharinath Balasubramanian, Mayur Sahni)
“IBM’s famous cognitive computer can help banks with complex financial operations and attack important health care problems. Now you can add seeing to its skill set.”
“These days, it’s not just AI algorithms themselves that have improved, but the ability to deliver them…that has made so many new applications possible.”
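As a rough check on the IDC forecast quoted above, a compound annual growth rate (CAGR) ties an ending market size to a starting one. The sketch below back-computes the starting size implied by $3.7 billion in 2019 at 35% over 5 years. The function name is illustrative, and the result is an inference from the quoted figures, not a number taken from the IDC report.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by an end value and a CAGR."""
    return end_value / (1.0 + cagr) ** years


# Figures quoted from the IDC forecast: $3.7B in 2019, 35% CAGR over 5 years.
start = implied_start_value(end_value=3.7, cagr=0.35, years=5)
print(f"Implied starting market size: ${start:.2f}B")
```

Under these assumptions the market would have started at somewhat under $1 billion, which is consistent with the report's framing of an emerging market.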
There are three capabilities that differentiate cognitive systems from traditional programmed computing systems.
Reasoning
They reason. They can understand not only information but also the underlying ideas and concepts. This reasoning ability can become more advanced over time. It’s the difference between the reasoning strategies we used as children to solve mathematical problems, and the strategies we developed when we got into advanced math like geometry, algebra and calculus.
Learning
They never stop learning. As a technology, this means the system actually gets more valuable with time. They develop “expertise”. Think about what it means to be an expert: it’s not about executing a mathematical model. We don’t consider our doctors to be experts in their fields because they answer every question correctly. We expect them to be able to reason and be transparent about their reasoning, and expose the rationale for why they came to a conclusion.
Understanding
Cognitive systems understand like humans do, whether that’s through natural language or the written word; vocal or visual.
Data is transforming industries and professions.
HOW, AND WHY NOW?
CONSIDER:
Data flows from every device, replacing guessing and approximations with precise information. Yet 80% of this data is unstructured; therefore, invisible to computers and of limited use to business.
HEALTHCARE DATA: 99% growth by 2017; 88% unstructured.
Healthcare data comes from sources such as patient sensors, electronic medical records and test results.
GOVERNMENT & EDUCATION DATA: 94% growth by 2017; 84% unstructured.
Government & education data comes from sources such as vehicle fleet sensors, traffic sensors and student evaluations.
UTILITIES DATA: 93% growth by 2017; 84% unstructured.
Utilities data comes from sources such as utility sensors, employee sensors and location data.
MEDIA DATA: 97% growth by 2017; 82% unstructured.
Media data comes from sources such as video and film, images and audio.
By 2020, 1.7 MB of new information will be created every minute for every human being on the planet.
The world is being reinvented in code.
CONSIDER:
The world is being rewritten in software code, and cloud is the platform on which the new digital builders—from developers to business professionals—are reimagining everything from banking to retail to healthcare.
Smart TVs represented 27% of all TV sales in 2012; by 2018, they will represent 82%.
Smart LED lighting will grow from 6M units in 2015 to 570M units in 2020, used for safety communication, health, pollution and personalized services.
By 2017, there will be 1B connected things in smart homes, including appliances, smoke detectors and cameras.
100,000,000 lines of code in a new car
5,000,000 lines of code in smart appliances
1,200,000 lines of code in a smartphone
80,000 lines of code in a pacemaker
50% of B2B collaboration will take place through web APIs next year.
Sensors for industrial asset monitoring and management will grow from just over 15M units in 2014 to over 40M units in 2018
Smart traffic sensors and other devices installed in smart cities will grow from 237M units in 2015 to 371M in 2017.
Revenues for smart grid sensors will grow ten-fold from 2014 to 2021.
By 2020, there will be 925M smart meters installed worldwide, more than double the 400M in 2014.
Code, Tools, Analytics, Data, APIs
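To compare the device forecasts above on a common footing, each pair of endpoints can be turned into an implied compound annual growth rate. This is a minimal sketch: the `cagr` helper and the labels are illustrative, the endpoint values are the ones quoted on the slide, and the industrial-sensor figures are approximate because the slide gives them as "just over" and "over".

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


# (label, start units in millions, end units in millions, years elapsed)
forecasts = [
    ("Smart LED lighting, 2015-2020", 6, 570, 5),
    ("Smart meters, 2014-2020", 400, 925, 6),
    ("Smart city devices, 2015-2017", 237, 371, 2),
    ("Industrial sensors, 2014-2018", 15, 40, 4),  # approximate endpoints
]

rates = {label: cagr(start, end, years) for label, start, end, years in forecasts}
for label, rate in rates.items():
    print(f"{label}: about {rate:.0%} per year")
```

The implied rates differ widely, so the headline multiples are easier to compare once they are annualized.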
Computing is entering a new cognitive era.
CONSIDER:
Cognitive systems can understand the world through sensing and interaction, reason using hypotheses and arguments and learn from experts and through data. Watson is the most advanced such system.
Today, businesses in 36 countries across 17 industries are applying cognitive technologies.
There are 350+ Watson ecosystem partner companies, and 100 of those have taken their product to market.
On average there are 1.3B Watson API calls a month and growing.
78% of business and IT executives believe that successful businesses will manage employees alongside intelligent machines.
Among C-Suite executives familiar with cognitive computing:
96% in insurance intend to invest in cognitive capabilities.
84% in healthcare believe it will play a disruptive role in the industry, and 60% believe they lack the skilled professionals and technical experience to achieve it.
94% in retail intend to invest in cognitive capabilities.
89% in telecommunications believe it will have a critical impact on the future of their business.
ADVANTAGES OF COGNITIVE BUSINESS:
• Deeper human engagement
• Elevated expertise
• Cognitive products and services
• Cognitive processes and operations
• Intelligent exploration and discovery
Relationship Extraction
Questions & Answers
Language Detection
Personality Insights
Keyword Extraction
Image Link Extraction
Feed Detection
Visual Recognition
Concept Expansion
Concept Insights
Dialog
Sentiment Analysis
Text to Speech
Tradeoff Analytics
Natural Language Classifier
Author Extraction
Speech to Text
Retrieve & Rank
Watson News
Language Translation
Entity Extraction
Tone Analyzer
Concept Tagging
Taxonomy
Text Extraction
Message Resonance
Image Tagging
Face Detection
Answer Generation
Usage Insights
Fusion Q&A
Video Augmentation
Decision Optimization
Knowledge Graph
Risk Stratification
Policy Identification
Emotion Analysis
Decision Support
Criteria Classification
Knowledge Canvas
Easy Adaptation
Knowledge Studio Service
Statistical Dialog
Q&A Qualification
Factoid Pipeline
Case Evaluation
IBM WATSON
The Watson that competed on Jeopardy! in 2011 comprised what is now a single API, Q&A, built on five underlying technologies.
Since then, Watson has grown to a family of 28 APIs.
By the end of 2016, there will be nearly 50 Watson APIs—with more added every year.
Natural Language Processing
Machine Learning
Question Analysis
Feature Engineering
Ontology Analysis
IBM WATSON: Personality Insights
IBM WATSON
These APIs are underpinned by 50 technologies:
Anaphoric Co-referencing; Colloquialism Processing; Content Management -- Versioning; Convolutional Neural Networks; Curation; Deep Learning; Dialog Framing; Ellipses; Embedded Table Processing; Ensembles and Fusion; Entity Resolution; Factoid Answering; Feature Engineering;
Feature Normalization; Focus and Spurious Phrase Resolution; HTML Page Analysis; Image Management; Information Retrieval; Knowledge (Property) Graphs; Knowledge Answering; Knowledge Extraction Annotators; Knowledge Validation and Extrapolation; Language Modeling; Latent Semantic Analysis;
Learn To Rank; Linguistic Analysis; Logical Reasoning Analysis; Logistic Regression; Machine Learning; Multi-Dimensional Clustering; Multilingual Training; n-Gram Analysis (word combinations and distance); Ontology Analysis; Pareto Analysis; Passage Answering; PDF Conversion; Phoneme Aggregation;
Question Analysis; Question-answering Reasoning Strategies; Recursive Neural Networks; Rules Processing; Scalable Search; Similarity Analytics; Statistical Language Parsing; Support Vector Machines; Syllable Analysis; Table Answering; Visual Analysis; Visual Rendering; Voice Synthesis
BECOMING A COGNITIVE BUSINESS
1. A cognitive strategy
Determine what data you need; which experts will train the system; where you must build more human engagement; which products, services, processes and operations should be infused with cognition; and which parts of the unstructured 80% of data you most need to focus on to make discoveries for the future.
2. A foundation of data and analytics
Collect and curate the right data: data you own, data from others and data available to all, both structured and unstructured. Apply cognitive technologies to this data in order to sense, learn and adapt, thereby creating competitive advantage.
3. Cloud services optimized for industry, data and cognitive APIs
The building blocks for products and services are code, APIs and diverse data sets. The platform you choose to develop on, and the agile development culture and methods you embrace, will be critical to your success.
4. IT infrastructure tuned for cognitive workloads
Architect a new kind of IT core: a heterogeneous infrastructure that serves as the backbone of your enterprise. Do this rapidly and affordably by harmonizing technologies from public, private and hybrid cloud with distributed devices, IoT instrumentation and your existing systems.
5. Security for a Cognitive Era
As cognition makes its way into cars, buildings, roadways, business processes, fleets and supply chains, securing every transaction, piece of data and interaction becomes essential to ensure trust in the entire system, and in your brand and reputation.
Power Systems – Designed for Big Data and Analytics solutions
Analytics 1.0
Analytics 1.5
Analytics 2.0 - support a variety of data and a range of analytics
Classic Hadoop infrastructures can be inefficient and inflexible, leading to server and cluster sprawl, unnecessary software licenses, and infrastructure management challenges
IBM Data Engine for Hadoop and Spark
IBM Data Engine for Analytics
Avoid Server Sprawl as Big Data and Analytics environments grow
Intel server growth, estimated as a sample configuration with 500 TB of user space grows 5x and 10x to 2.5 PB and 5 PB.
POWER8 server growth using IBM Data Engine for Analytics, estimated as a sample configuration grows 5x and 10x to 2.5 PB and 5 PB of user space.
[Chart: server counts at 5x and 10x growth for each platform; the values shown are 17, 28, 75 and 143 servers.]
Sizing is based on assumptions regarding general configurations and use cases for BigInsights and BigSQL with actual client. The comparison reflects the number of servers required to deliver relative performance and equivalent user space on Intel reference architecture using Hadoop triple replication with 4 TB drives versus POWER8 IDEA reference architecture with 4 TB drives using the Elastic Storage Server.
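The disk overhead of the triple replication mentioned in the sizing note can be estimated with simple arithmetic. The sketch below computes raw capacity and 4 TB drive counts for the three user-space sizes on the slide, assuming a replication factor of 3 and ignoring filesystem overhead, temporary space and spares. The helper name is illustrative, and the sketch does not reproduce IBM's sizing method or the server counts in the chart.

```python
import math


def drives_needed(user_space_tb: float, replication: int = 3, drive_tb: float = 4.0) -> int:
    """Drives required to hold `user_space_tb` of data stored `replication` times."""
    raw_tb = user_space_tb * replication
    return math.ceil(raw_tb / drive_tb)


# User-space sizes from the slide: 500 TB, then 5x (2.5 PB) and 10x (5 PB).
sizes_tb = [500, 2500, 5000]
drive_counts = {size: drives_needed(size) for size in sizes_tb}
for size, count in drive_counts.items():
    print(f"{size} TB user space -> {size * 3} TB raw -> {count} x 4 TB drives")
```

Because every terabyte of user data is stored three times, drive count (and with it server count) scales at three times the rate of user space, which is the sprawl the comparison is pointing at.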
Many new solution workloads in addition to existing apps
Leads to costly, complex, siloed, under-utilized infrastructure and replicated data
Development Test Distributed ETL, Sensitivity Analysis
Hadoop based Sentiment Analysis
Low Utilization = Higher Cost
Infrastructure Silos aka Cluster Sprawl is Inefficient
IBM Data Engine for Analytics (IDEA) – Client Example
Client: Multinational Telecomm. Company
A multinational telecommunication company with over 6M subscribers.
Challenges
• Expectations of a Real Time Marketing (RTM) based solution to run event-based campaigns
• Enable event-based marketing, analyzing various sources of input data containing information regarding subscribers' actions
• Dispatch the triggered events to downstream applications, such as campaign management, for associated campaign execution
Architecture - Solution Components
• BigInsights, Streams, SPSS Modeler, SPSS Analytics Server
• IBM Data Engine for Analytics: 20 x S822L, 2 x ESS GL4, Spectrum Scale, PCM
Solution Approach
• Solution provided a Hadoop-based Big Data platform, integrated with the RTM decision engine, to enable data monetization opportunities, including location-based analytics
• Customer was not comfortable with the large number of x86 Data Nodes required by a typical Hadoop architecture
• The IBM team designed the Power solution and conducted a technical workshop on the newly redefined Hadoop architecture based on IDEA
Key Client Benefits
• Optimized Big Data deployment architecture with IDEA: Linux on Power, Elastic Storage Server and Spectrum Scale
• Lower TCO with 4 racks on Power vs. 12 racks on x86
• More IO bandwidth with a 40GbE Power network against 10GbE on the x86-based solution
• 3x fewer racks for a 2 PB Big Data solution: 4 vs. 12
IBM Data Engine for Analytics – Solution Highlights
Actionable Insights with IBM BigInsights preloaded
+ Increase business value by consolidating multiple analytic capabilities and data as needed
Up to 2.5x* faster insights
Smart Infrastructure Services with IBM Platform Computing
+ Designed to handle multiple analytic workloads in a multi-tenant environment with dynamic resources
Designed for Data with IBM POWER8 Systems (S822L)
+ Outstanding memory and IO bandwidth design for the demands of Big Data
2x* better performance
Scalable Networking with IBM and partner networks
+ High bandwidth, low latency networking: Ethernet RoCE (10 Gbit or 40 Gbit), InfiniBand RDMA (FDR)
10x to 100x network performance growth since Hadoop inception
Flexible Storage with IBM Elastic Storage Server
+ Combines Servers, Storage Enclosures, Disks and Elastic Storage Software
Over 2x** reduction in storage disk count
*Based on internal testing and cost analysis
**Based on client example vs a triple replica Hadoop configuration.
Big Data & Analytics Software
Infrastructure Services
POWER8 Servers
Scalable Networking
Scale Out Cluster File System
Elastic Storage Server
Appliance-Like but much more Versatile!
Compute Plane = Power8 Systems, Designed for Big Data
4X threads per core*
4X memory bandwidth*
5X more cache*
SMT = Simultaneous Multi-Threading; OLTP = On-Line Transaction Processing; HPC = High Performance Computing
These design decisions result in best performance for all types of workloads such as: Java, OLTP, Analytics, Big Data, HPC
* POWER8 compared to Intel Haswell EX
Sources: Haswell EX: http://ark.intel.com/products/84685/Intel-Xeon-Processor-E7-8890-v3-45M-Cache-2_50-GHz POWER8: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=BR&infotype=PM&appname=STGE_PO_PO_USEN&htmlfid=POB03046USEN
[Figure: POWER8 vs. x86 comparison of threads per core (POWER8 SMT8 vs. x86 SMT2), data-flow pipe and cache; 1.4 – 2.3X clock frequency]
Components of IDEA
Data Plane = Elastic Storage Server, A Data Lake for many applications
Linear capacity & performance scale out
Enterprise storage on standard hardware
77.5 percent of organizations are already investing in a Data Lake*
[Diagram: Elastic Storage Server presents a single name space as a Data Lake for structured and unstructured data, serving Technical Computing, Big Data & Analytics (traditional analytics, BigSQL) and Cloud workloads through File (POSIX, GPFS, NFS), Hadoop, Block (Cinder) and Object (Swift) interfaces.]
Production - IBM Data Engine for Analytics
                               Without HMC/TFT/SMN/MN   With HMC/TFT/SMN
Available Filesystem Capacity  0PB                      1PB
Number of Data Nodes           14                       4
Hadoop Mgmt LPARs              0                        6
System Mgmt Node               0                        1
Data Node – S822L
24x 3.02GHz cores; 256GB DRAM; 12x 1.8TB 10K SAS HDD; 2x 40GbE (2 port) (data+mgmt); 2x 4-port 1GbE NIC (mgmt)
Hadoop Management Node – S822L
2 LPARs with split backplane; 24x 3.02GHz cores; 256GB DRAM; 8x 1.8TB 10K SAS HDD (OS + data); 2x 40GbE (2 port) (data+mgmt); 2x 4-port 1GbE NIC (mgmt)
System Management Node – S812L
10x 3.425GHz cores; 32GB DRAM; 2x 300GB 10K SAS HDD (OS); 1x 40GbE (2 port) (data+mgmt); 1x 4-port 1GbE NIC (mgmt)
Initial Rack / Scale out Racks
OpenPOWER Foundation
Single vendor support
Up to 2x better price performance for Spark workloads*
Delivered as a fully integrated cluster ready to run
OpenPOWER innovation with IBM S812LC servers
Optimized configurations for Hadoop or Spark workloads
Based on S812LC servers with up to 14 x 6 TB disk drives per server
Optionally preloaded with IBM BigInsights and IBM Open Platform
Simplify operations – easy to deploy and manage
Adapt and scale to your changing analytics needs
IBM Data Engine for Hadoop and Spark
OpenPOWER innovation with IBM Open Platform with Apache Hadoop for a high performance, storage dense and fully integrated cluster offering.
* All results are based on IBM Internal Testing of 3 SparkBench benchmarks consisting of SQL RDD Relation, Logistic Regression, SVM
Announce: Feb 9, 2016
GA: Mar 18, 2016
IBM Data Engine for Spark and Hadoop (IDE-HS) Cluster Performance
Designed for the Cognitive Era to Make Better Decisions even Faster
IBM Data Engine for Hadoop and Spark infrastructure delivers Spark workload scaling to minimize execution times and reduce batch windows
- 2.1X more performance per dollar spent for Spark Logistic Regression based Machine Learning, used in model training by a wide variety of lines of business
- 1.4X more performance per dollar spent for Support Vector Machine (SVM), a Machine Learning algorithm used in product Recommender Systems
- 1.7X more performance per dollar spent for Spark SQL query processing, used widely in Big Data clusters
• All results are based on IBM Internal Testing of 3 SparkBench benchmarks consisting of SQL RDD Relation, Logistic Regression, SVM
• 6 Data Nodes and 1 Management Node. Each node is IBM Power System S812LC 10 cores / 80 threads, POWER8; 2.92GHz, 256 GB memory, RedHat 7.2, Spark 1.5.1, OpenJDK 1.8
• 6 Data Nodes and 1 Management Node. Each node is x86 E5-2620V3 12 cores / 24 threads, E5-2620 V3; 2.4GHz, 256 GB memory, RedHat 7.1, Spark 1.5.1, OpenJDK 1.8
• Pricing is based on web prices of HP DL380 and list prices of IBM Power S812LC
[Chart: results for SVM, Logistic Regression and SQL]
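The "performance per dollar" figures above are ratios of two ratios: each system's measured performance divided by its price, then one system's value divided by the other's. The sketch below shows that calculation. The function name and every input are hypothetical, chosen only to make the arithmetic visible; they are not the SparkBench measurements or the prices behind the published figures.

```python
def perf_per_dollar_ratio(perf_a: float, price_a: float,
                          perf_b: float, price_b: float) -> float:
    """How many times more performance per dollar system A delivers than system B."""
    return (perf_a / price_a) / (perf_b / price_b)


# Hypothetical inputs for illustration only (NOT measured results or real prices).
ratio = perf_per_dollar_ratio(
    perf_a=1.5, price_a=100_000.0,  # system A: 1.5x the throughput at the same cost
    perf_b=1.0, price_b=100_000.0,  # system B: baseline
)
print(f"System A delivers {ratio:.1f}x the performance per dollar")
```

Because the metric depends on price as much as on throughput, the footnoted pricing basis (web prices versus list prices) matters when reading the comparison.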
• Apache Spark is an open-source in-memory distributed compute engine
– It runs iterative analysis on large-scale data up to 100x faster than current technologies
– Enables more people to collaborate to access data, apply analytics and deploy deep intelligence into every application, including IoT, web, mobile, social, business process and more
– IBM/Spark commitment: 3500 employees working on Spark
• Included in the IBM Open Platform (IOP) that runs on Linux on Power
• Power Systems - key contributor to Spark
• Offering over 2x the performance per core for Spark workloads compared to x86 Haswell * (SQL, ML, Graph, Streaming)
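The "iterative analysis" mentioned above refers to algorithms that pass over the same dataset many times, which is why keeping that data in memory between passes can pay off. The sketch below is plain Python, not Spark code: it trains a one-feature logistic regression with batch gradient descent on made-up toy data, purely to show the repeated full-data passes that such workloads involve. All names and values are illustrative.

```python
import math


def train_logistic_regression(points, labels, iterations=200, learning_rate=0.5):
    """Fit (bias, weight) for 1-D inputs with batch gradient descent."""
    bias, weight = 0.0, 0.0
    for _ in range(iterations):  # every iteration is a full pass over the data
        grad_b = grad_w = 0.0
        for x, y in zip(points, labels):
            pred = 1.0 / (1.0 + math.exp(-(bias + weight * x)))
            grad_b += pred - y
            grad_w += (pred - y) * x
        bias -= learning_rate * grad_b / len(points)
        weight -= learning_rate * grad_w / len(points)
    return bias, weight


# Toy data (made up for illustration): the label is 1 when x is positive.
points = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
bias, weight = train_logistic_regression(points, labels)


def predict(x: float) -> bool:
    """Classify x as positive when the modelled probability is at least 0.5."""
    return 1.0 / (1.0 + math.exp(-(bias + weight * x))) >= 0.5


print([predict(x) for x in points])
```

With 200 iterations the inner loop reads the dataset 200 times; on a cluster, re-reading from disk on each pass is the cost that an in-memory engine is designed to avoid.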
Open Platform with Apache Hadoop
Open innovation to put data to work across the enterprise
* Based on Sparkbench on POWER8 P822L vs x86 E5-2690 V3; each 24 core and 256 GB RAM
Spark on Power