Upload
ruari-telford
View
660
Download
5
Embed Size (px)
Citation preview
© 2016 IBM Corporation
IBM’s BigInsights: Smart Analytics for Big Data
Created by C. M. Saracco, IBM Silicon Valley Lab January 2016
© 2016 IBM Corporation 2
IBM Disclaimer Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
© 2016 IBM Corporation 3
Agenda
§ The Big Data challenge
§ IBM’s approach - Portfolio overview - BigInsights
• Open source core platform with Apache Hadoop • IBM technologies for data analysts, data scientists, and administrators • How BigInsights fits within a broader IT infrastructure
§ How IBM can help you get off to a quick start
© 2016 IBM Corporation
The Big Data Challenge
© 2016 IBM Corporation 5
Business leaders frequently make decisions based on information they don’t trust, or don’t have 1 in 3
83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
Business leaders say they don’t have access to the information they need to do their jobs 1 in 2
of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
60%
… and organizations need deeper insights
Information is at the center of a new wave of opportunity…
2.5 million items per minute
300,000 tweets per minute
200 million emails per minute 220,000 photos
per minute
5 TB per flight
> 1 PB per day gas turbines
© 2016 IBM Corporation 6
Extract insight from a high volume, variety and velocity of data in a timely and cost-effective manner
Big Data presents big opportunities
Manage and benefit from diverse data types and data structures Analyze streaming data and large volumes of persistent data Scale from terabytes to zettabytes
Variety: Velocity: Volume:
© 2016 IBM Corporation 7
What we hear from customers . . . .
§ Lots of potentially valuable data is dormant or discarded due to size/performance considerations
§ Large volume of unstructured or semi-structured data is not worth integrating fully (e.g. Tweets, logs, . . .)
§ Not clear what should be analyzed (exploratory, iterative)
§ Information distributed across multiple
systems and/or Internet § Some data has a short useful lifespan
§ Volumes can be extremely high
§ Analysis needed in the context of existing information (not stand alone)
© 2016 IBM Corporation 8
Merging the traditional and Big Data approaches
IT Structures the data to answer that question
IT Delivers a platform to enable creative discovery
Business Explores what questions could be asked
Business Users Determine what question to ask
Monthly sales reports Profitability analysis Customer surveys
Brand sentiment Product strategy Maximum asset utilization
Big Data Approach Iterative & Exploratory
Traditional Approach Structured & Repeatable
© 2016 IBM Corporation 9
Big Data scenarios span many industries
Identify criminals and threats from disparate video, audio, and data feeds
Make risk decisions based on real-time transactional data
Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
Detect life-threatening conditions at hospitals in time to intervene
Multi-channel customer sentiment and experience a analysis
© 2016 IBM Corporation 10
Information Ingestion and Operational Information
Landing and Archive Zone
Real-time Analytics
Zone
Enterprise Warehouse
and Mart Zone
Information Governance, Security and Business Continuity
Analytic Appliances
Big Data Platform Capabilities
Streaming Data
Text Data
Applications Data
Time Series
Geo Spatial
Relational
• Information Ingest • Real Time Analytics • Warehouse & Data Marts • Analytic Appliances
Social Network
Video & Image
All Data Sources
Advanced Analytics /
New Insights New / Enhanced
Applications
Automated Process
Case Management
Analytic Applications
Cognitive Learn Dynamically?
Prescriptive Best Outcomes?
Predictive What Could Happen?
Descriptive What Has Happened?
Exploration and Discovery
What Do You Have?
Watson
Cloud Services
ISV Solutions
Alerts
IBM Big Data and analytics sample architecture
© 2016 IBM Corporation 11
Big Data in practice
© 2016 IBM Corporation 12
Big Data adoption (Hadoop emphasis)
© 2016 IBM Corporation
IBM’s approach
© 2016 IBM Corporation 14
IBM analytics platform strategy for Big Data
• Integrate and manage the full variety, velocity and volume of Big Data
• Apply advanced analytics
• Visualize all available data for ad-hoc analysis
• Support workload optimization and scheduling
• Provide for security and governance
• Integrate with enterprise software
Discovery& Exploration
Prescriptive Analytics
Predictive Analytics
Content Analytics
Business Intelligence
DataMgmt
Hadoop & NoSQL
Content Mgmt
DataWarehouse
Information Integration & Governance
IBM ANALYTICS PLATFORM Built on Spark. Hybrid. Trusted.
Spark Analytics Operating SystemMachine Learning On premises On cloud
Data at rest & In-motion. Inside & outside the firewall. Structured & unstructured.
© 2016 IBM Corporation 15
IBM BigInsights for Apache Hadoop
Discovery& Exploration
Prescriptive Analytics
Predictive Analytics
Content Analytics
Business Intelligence
DataMgmt
Hadoop & NoSQL
Content Mgmt
DataWarehouse
Information Integration & Governance
IBM ANALYTICS PLATFORM Built on Spark. Hybrid. Trusted.
Spark Analytics Operating SystemMachine Learning On premises On cloud
Data at rest & In-motion. Inside & outside the firewall. Structured & unstructured.
§ Analytical platform for persistent Big Data
– 100% open source core with IBM add-ons for analysts, data scientists, and admins
– On premise or cloud § Distinguishing
characteristics – Built-in analytics . . . .
Enhances business knowledge
– Enterprise software integration . . . . Complements and extends existing capabilities
– Production-ready . . . . Speeds time-to-value
§ IBM advantage – Combination of
software, hardware, services and research
© 2016 IBM Corporation 16
Text Analytics
POSIX Distributed Filesystem
Multi-workload, multi-tenant scheduling
IBM BigInsights Enterprise Management
Machine Learning on Big R
Big R (R support)
IBM Open Platform with Apache Hadoop (HDFS,YARN,MapReduce,Ambari,Flume,HBase,Hive,Ka;a,Knox,Oozie,Pig,
Slider,Solr,Spark,Sqoop,Zookeeper)
IBM BigInsights Data Scientist
IBM BigInsights Analyst
Big SQL
BigSheets
Industry standard SQL (Big SQL)
Spreadsheet-style tool (BigSheets)
Overview of BigInsights
Free Quick Start (non production): • IBM Open Platform • BigInsights Analyst, Data Scientist features • Community support
. . .
© 2016 IBM Corporation 17
Warehousing Zone
Enterprise Warehouse
Data Marts
Ingestion and Real-time Analytic Zone Streams
Connectors
BI & Reporting
Predictive Analytics
Analytics and Reporting Zone
Visualization & Discovery
Landing and Analytics Sandbox Zone
Hive/HBaseColStores
Documents in variety of formats
MapReduce
Hadoop
Metadata and Governance Zone
ETL, MDM, Data Governance
Hadoop and the enterprise
© 2016 IBM Corporation 18
BigInsights ISV Partner Ecosystem
lHelium SW
© 2016 IBM Corporation
A Closer Look at IBM BigInsights . . . .
© 2016 IBM Corporation 20
Text Analytics
POSIX Distributed Filesystem
Multi-workload, multi-tenant scheduling
IBM BigInsights Enterprise Management
Machine Learning on Big R
Big R (R support)
IBM Open Platform with Apache Hadoop (HDFS,YARN,MapReduce,Ambari,Flume,HBase,Hive,Ka;a,Knox,Oozie,Pig,
Slider,Solr,Spark,Sqoop,Zookeeper)
IBM BigInsights Data Scientist
IBM BigInsights Analyst
Big SQL
BigSheets
Industry standard SQL (Big SQL)
Spreadsheet-style tool (BigSheets)
Overview of BigInsights
Free Quick Start (non production): • IBM Open Platform • BigInsights Analyst, Data Scientist features • Community support
. . .
IBM BigInsights for Apache Hadoop
© 2016 IBM Corporation 21
About the IBM Open Platform for Apache Hadoop
§ Flexible platform for processing large volumes of data - Includes Apache Hadoop and many popular open source projects in the Hadoop
ecosystem - Supports wide variety of data - Supports variety of popular APIs
§ Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost effective manner - CPU + disks = “node” - Nodes can be combined into clusters - New nodes can be added as needed without changing
• Data formats • How data is loaded • How jobs are written
© 2016 IBM Corporation 22
About Spark § Part of the IBM Open Platform for Apache Hadoop § Fast, general purpose, cluster computing system for large-scale
data processing § Speed (100x faster than MapReduce) - In-memory processing - Distributed computing
§ Flexibility - Wide range of workloads - Built-in libraries - Scala, Java, Python APIs - Works with Hadoop or stand alone
Logistic regression in Hadoop and Spark
from http://spark.apache.org
© 2016 IBM Corporation 23
Open source currency § Timely updates as new open source versions released § Install only those components you want / need § Component levels as of December 2015 (Apache licenses):
Ambari 2.1 Ambari Metrics 0.1.0 Hadoop 2.7.1 Flume 1.5.2 HDFS 2.7.1 HBase 1.1.1 Hive 1.2.1 Kafka 0.8.2.1 Knox 0.6.0 Oozie 4.2.0 Pig 0.15.0 Slider 0.80.0 Solr 5.1.0 Spark 1.5.1 Sqoop 1.4.6 YARN + MapReduce 2.7.1 ZooKeeper 3.4.6
© 2016 IBM Corporation 24
Initiative for an open big data platform (ODPi)
© 2016 IBM Corporation 25
Understanding ODPi . . . . Why is IBM involved? § Strong history of leadership in open source & standards § Supports our commitment to open source currency in all
future releases § Accelerates our innovation within Hadoop &
surrounding applications
ODPi and Apache Software Foundation (ASF) § ODPi supports the ASF mission § ASF provides a governance model around individual
projects without looking at ecosystem § ODPi aims to provide a vendor-led consistent
packaging model for core Apache components as an ecosystem
All Standard Apache Open Source Components
HDFS
YARN
MapReduce
Ambari HBase
Spark
Flume
Hive Pig
Sqoop
HCatalog
Solr/Lucene
ODPi initial spec
© 2016 IBM Corporation 26
Text Analytics
POSIX Distributed Filesystem
Multi-workload, multi-tenant scheduling
IBM BigInsights Enterprise Management
Machine Learning on Big R
Big R (R support)
IBM Open Platform with Apache Hadoop (HDFS,YARN,MapReduce,Ambari,Flume,HBase,Hive,Ka;a,Knox,Oozie,Pig,
Slider,Solr,Spark,Sqoop,Zookeeper)
IBM BigInsights Data Scientist
IBM BigInsights Analyst
Big SQL
BigSheets
Industry standard SQL (Big SQL)
Spreadsheet-style tool (BigSheets)
Overview of BigInsights
Free Quick Start (non production): • IBM Open Platform • BigInsights Analyst, Data Scientist features • Community support
. . .
IBM BigInsights for Apache Hadoop
© 2016 IBM Corporation 27
Spreadsheet-style analysis (BigSheets) § Web-based analysis and
visualization
§ Spreadsheet-like interface
- Explore, manipulate data
without writing code - Invoke pre-built functions
- Generate charts
- Export results of analysis
- Create custom plug-ins
- . . .
© 2016 IBM Corporation 28
SQL for Hadoop (Big SQL)
SQL-based Application
Big SQL Engine
Data Storage
IBM data server client
SQL MPP Run-time
DFS
28
§ Comprehensive, standard SQL – SELECT: joins, unions, aggregates, subqueries . . . – GRANT/REVOKE, INSERT … INTO – Procedural logic in SQL – Stored procs, user-defined functions – IBM data server JDBC and ODBC drivers
§ Optimization and performance
– IBM MPP engine (C++) replaces Java MapReduce layer – Continuous running daemons (no start up latency) – Message passing allow data to flow between nodes
without persisting intermediate results – In-memory operations with ability to spill to disk (useful
for aggregations, sorts that exceed available RAM) – Cost-based query optimization with 140+ rewrite rules
§ Various storage formats supported – Data persisted in DFS, Hive, HBase – No IBM proprietary format required
§ Integration with RDBMSs via LOAD, query federation
BigInsights
© 2016 IBM Corporation 29
About Big SQL performance: Hadoop-DS benchmark
§ TPC (Transaction Processing Performance Council) - Formed August 1988 - Widely recognized as most credible, vendor-independent SQL benchmarks - TPC-H and TPC-DS are the most relevant to SQL over Hadoop
• R/W nature of workload not suitable for HDFS
§ Hadoop-DS benchmark: BigInsights, Hive, Cloudera - Run by IBM & reviewed by TPC certified auditor - Based on TPC-DS. Key deviations
• No data maintenance or persistence phases (not supported across all vendors) - Common set of queries across all solutions
• Subset that all vendors can successfully execute at scale factor • Queries are not cherry picked
- Most complete TPC-DS like benchmark executed so far - Analogous to porting a relational workload to SQL on Hadoop
Visit Hadoop Dev (https://developer.ibm.com/hadoop) and search on “hadoop-ds”
© 2016 IBM Corporation 30
IBM Big SQL runs 100% of the queries
Key points § With Impala and Hive, many
queries needed to be re-written, some significantly
§ Owing to various restrictions, some queries could not be re-written or failed at run-time
§ Re-writing queries in a benchmark scenario where results are known is one thing – doing this against real databases in production is another
Other environments require significant effort at scale
Results for 10TB scale shown here
© 2016 IBM Corporation 31
Hadoop-DS benchmark – single user @ 10TB
Big SQL 3.6x faster than Impala; 5.4x faster than Hive 0.13 for single query stream using 46 common queries
Based on IBM internal tests comparing BigInsights Big SQL, Cloudera Impala and Hortonworks Hive (current versions available as of 9/01/2014) running on identical hardware. The test workload was based on the latest revision of the TPC-DS benchmark specification at 10TB data size. Successful executions measure the ability to execute queries a) directly from the specification without modification, b) after simple modifications, c) after extensive query rewrites. All minor modifications are either permitted by the TPC-DS benchmark specification or are of a similar nature. All queries were reviewed and attested by a TPC certified auditor. Development effort measured time required by a skilled SQL developer familiar with each system to modify queries so they will execute correctly. Performance test measured scaled query throughput per hour of 4 concurrent users executing a common subset of 46 queries across all 3 systems at 10TB data size. Results may not be typical and will vary based on actual workload, configuration, applications, queries and other variables in a production environment. Cloudera, the Cloudera logo, Cloudera Impala are trademarks of Cloudera. Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries.
© 2016 IBM Corporation 32
Hadoop-DS benchmark – multi-user @ 10TB
Big SQL 2.1x faster than Impala; 8.5x faster than Hive 0.13 for 4 query streams using 46 common queries
Based on IBM internal tests comparing BigInsights Big SQL, Cloudera Impala and Hortonworks Hive (current versions available as of 9/01/2014) running on identical hardware. The test workload was based on the latest revision of the TPC-DS benchmark specification at 10TB data size. Successful executions measure the ability to execute queries a) directly from the specification without modification, b) after simple modifications, c) after extensive query rewrites. All minor modifications are either permitted by the TPC-DS benchmark specification or are of a similar nature. All queries were reviewed and attested by a TPC certified auditor. Development effort measured time required by a skilled SQL developer familiar with each system to modify queries so they will execute correctly. Performance test measured scaled query throughput per hour of 4 concurrent users executing a common subset of 46 queries across all 3 systems at 10TB data size. Results may not be typical and will vary based on actual workload, configuration, applications, queries and other variables in a production environment. Cloudera, the Cloudera logo, Cloudera Impala are trademarks of Cloudera. Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries.
© 2016 IBM Corporation 33
§ Define business terms
§ Manage data lineage
§ Create information policies § Search and browse assets from across data stores
§ Enhance existing meta data with descriptions and associations
§ Provide a solid foundation for information integration and governance projects
Limited use license: InfoSphere Governance Catalog
Demo: https://www.youtube.com/watch?v=w1xFdhHyEJg
© 2016 IBM Corporation 34
§ Eclipse-based tool
§ SQL development, testing § Database exploration
§ Supports Big SQL, select RDBMSs
License: Data Studio
© 2016 IBM Corporation 35
Text Analytics
POSIX Distributed Filesystem
Multi-workload, multi-tenant scheduling
IBM BigInsights Enterprise Management
Machine Learning on Big R
Big R (R support)
IBM Open Platform with Apache Hadoop* (HDFS,YARN,MapReduce,Ambari,Flume,HBase,Hive,Ka;a,Knox,Oozie,Pig,
Slider,Solr,Spark,Sqoop,Zookeeper)
IBM BigInsights Data Scientist
IBM BigInsights Analyst
Big SQL
BigSheets
Industry standard SQL (Big SQL)
Spreadsheet-style tool (BigSheets)
Overview of BigInsights
Free Quick Start (non production): • IBM Open Platform • BigInsights Analyst, Data Scientist features • Community support
…
IBM BigInsights for Apache Hadoop
© 2016 IBM Corporation 36
What is Big R?
R Clients
Scalable Statistics Engine
Data Sources
Embedded R Execution
R Packages
R Packages
1
2
3
1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm (no MapReduce code)
2. Scale out R • Partitioning of large data (“divide”) • Parallel cluster execution of
pushed down R code (“conquer”) • All of this from within the R
environment (Jaql, Map/Reduce are hidden from you
• Almost any R package can run in this environment
3. Scalable machine learning • A scalable statistics engine that
provides canned algorithms, and an ability to author new ones, all via R
“End-to-end integration of R-Project with BigInsights”
Pull data (summaries) to
R client
Or, push R functions
right on the data
© 2016 IBM Corporation 37
Text analytics • Distills structured info from unstructured text - Sentiment analysis - Consumer behavior - Illegal or suspicious activities - …
• Parses text and detects meaning with annotators
• Understands the context in which the text is analyzed
• Features pre-built extractors for names, addresses, phone numbers, etc.
I had an iphone, but it's dead @JoaoVianaa.
(I've no idea where it's) !Want a Galaxy now !!!
@rakonturmiami im moving to miami in 3 months.
i look foward to the new lifestyle
I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Des Moines) w/ 2 others http://4sq.com/gbsaYR
© 2016 IBM Corporation 38
Extracting information from text
Entity Analytics
Preventative Maintenance
Customer Segmentation
Sentiment Affinity
…
Analyze
Text Single column or
document
• sentencesegmentaJon
• tokenizaJon• part-of-speech
tagging
• languagedetecJon
Recognize
EnJtyRecogniJon
MachineDataPrimiJves
SenJment
…
Describe via extractors
Information Extraction (IE)
Tagged syntax
Classified words /
attributes
Classified words /
attributes Text
preparation
• extracJonoperaJons
via lexical analysis via deep linguistic analysis
• spanoperaJons• joinoperaJons• consolidaJons• ……
• verb-centricabstracJon• noun-centricabstracJon• shallowparsing• …
© 2016 IBM Corporation 39
Web-based tool to define rules to extract data and derive information from unstructured text Graphical interface to describe structure of various textual formats – from log file data to natural language
Text analytics tooling
© 2016 IBM Corporation 40
Pre-built text extractors
§ The extractor library contains a rich set of pre-built extractors - Finance actions - Named Entities - Generic - Machine Data - Sentiment Analysis
§ You can control output properties - Output columns and names - Row filters
§ Some pre-built extractors can be customized - Add / remove dictionary terms
© 2016 IBM Corporation 41
Text Analytics
POSIX Distributed Filesystem
Multi-workload, multi-tenant scheduling
IBM BigInsights Enterprise Management
Machine Learning on Big R
Big R (R support)
IBM Open Platform with Apache Hadoop (HDFS,YARN,MapReduce,Ambari,Flume,HBase,Hive,Ka;a,Knox,Oozie,Pig,
Slider,Solr,Spark,Sqoop,Zookeeper)
IBM BigInsights Data Scientist
IBM BigInsights Analyst
Big SQL
BigSheets
Industry standard SQL (Big SQL)
Spreadsheet-style tool (BigSheets)
Overview of BigInsights Free Quick Start (non production): • IBM Open Platform • BigInsights Analyst, Data Scientist features • Community support
. . .
IBM BigInsights for Apache Hadoop
© 2016 IBM Corporation 42
IBM GPFS: HDFS alternative § Drop-in replacement for HDFS § No need for dedicated analytics
infrastructure - Cost savings - No need to move data in and out of an
analytics dedicated silo • Software defined infrastructure for multi-
tenancy § Fast parallel file system § POSIX & HDFS semantics § Hadoop & non-Hadoop apps § Built-in high availability
Compute Cluster
GPFS-FPO
HDFS
© 2016 IBM Corporation 43
Platform Symphony
Resource Orchestration
Workload Manager
C C C C C C
C C C C C C
D
D
D
D
D
D
D
D
D
D
D
D
C C C C C C
A A A A
A A A A
A A A A
A A A A
B
B
B
B
B
B
B
B
B
B
B
BB B B B B B
Third Party Trading & Risk Platforms In-house Applications BigInsights
A B C D
Multiple users, applications and lines of business on a shared, heterogeneous, multi-tenant grid
© 2016 IBM Corporation 44
Text Analytics
POSIX Distributed Filesystem
Multi-workload, multi-tenant scheduling
IBM BigInsights Enterprise Management
Machine Learning on Big R
Big R (R support)
IBM Open Platform with Apache Hadoop* (HDFS,YARN,MapReduce,Ambari,Flume,HBase,Hive,Ka;a,Knox,Oozie,Pig,
Slider,Solr,Spark,Sqoop,Zookeeper)
IBM BigInsights Data Scientist
IBM BigInsights Analyst
Big SQL
BigSheets
Industry standard SQL (Big SQL)
Spreadsheet-style tool (BigSheets)
Overview of BigInsights
Free Quick Start (non production): • IBM Open Platform • BigInsights Analyst, Data Scientist features • Community support
. . .
IBM BigInsights for Apache Hadoop
© 2016 IBM Corporation 45
Text Analytics
POSIX Distributed Filesystem
Multi-workload, multi-tenant scheduling
IBM BigInsights Enterprise Management
Machine Learning on Big R
Big R (R support)
IBM Open Platform with Apache Hadoop* (HDFS,YARN,MapReduce,Ambari,Flume,HBase,Hive,Ka;a,Knox,Oozie,Pig,
Slider,Solr,Spark,Sqoop,Zookeeper)
IBM BigInsights Data Scientist
IBM BigInsights Analyst
Big SQL
BigSheets
Industry standard SQL (Big SQL)
Spreadsheet-style tool (BigSheets)
Overview of BigInsights
Free Quick Start (non production): • IBM Open Platform • BigInsights Analyst, Data Scientist features • Community support
. . .
IBM BigInsights for Apache Hadoop
© 2016 IBM Corporation 46
IBMBigInsightsBigInsightsQuickStartEdiJon
IBMOpenPlaUormwith
ApacheHadoop
EliteSupportforIBMOpenPlaUormwith
ApacheHadoop
BigInsightsAnalystModule
BigInsightsData
ScienJstModule
BigInsightsEnterprise
ManagementModule
BigInsightsforApacheHadoop
ApacheHadoopStack:HDFS,YARN,MapReduce,Ambari,Hbase,Hive,Oozie,Parquet,ParquetFormat,Pig,Snappy,Solr,Spark,Sqoop,Zookeeper,OpenJDK,Knox,Slider
✔ ✔ ✔ * * * ✔
BigSQL–100%ANSIcompliant,highperformant,secureSQLengine
✔ ✔ ✔ ✔
BigSheets–spreadsheet-likeinterfacefordiscovery&visualizaJon
✔ ✔ ✔ ✔
BigR–advancedstaJsJcal&datamining
✔ ✔ ✔
MachineLearningwithBigR–machinelearningalgorithmsapplytoHadoopdataset
✔ ✔ ✔
AdvancedTextAnaly?cs–visualtoolingtoannotateautomatedtextextracJon
✔ ✔ ✔
EnterpriseMgmt–Enhancedcluster&resourcemgmt&POSIX-compliantfilesystems(GPFS,PlaUormSymphony)
✔ ✔
GovernanceCatalog,DataStudio ✔ ✔ ✔
CognosBI,InfoSphereStreams,WatsonExplorer,DataClick
✔
IBM BigInsights for Apache Hadoop offerings
* Paid support for IBM Open Platform with Apache Hadoop required for BigInsights modules
© 2016 IBM Corporation 47
Pricing & Licensing
Products BigInsightsQuickStart
IBMOpenPlaUormwith
ApacheHadoop
EliteSupportforIBMOpen
PlaUormwithApacheHadoop
BigInsightsAnalystModule
BigInsightsData
ScienJstModule
BigInsightsEnterprise
ManagementModule
BIgInsightsforApacheHadoop
PricingTerms Free Free Yearly
Subscription Only
Perpetual or Monthly License
Supportprovided Community Community IBM 24x7 support
UsageLicenseNon-
production, five node cap
Production Usage
PricingModel Free Free Node based pricing
Accessvia ibm.com/hadoop Passport Advantage
• Community support via Hadoop Dev • Over 100,000 visitors since inception • Modeled after StackOverflow, most popular developer Q&A site on web
© 2016 IBM Corporation 48
http://bluemix.net
§ Prototype, demo, trial in the cloud
§ Empowers developers to rapidly drive insight from all data
§ Adds Hadoop-based analytics to your application
§ Enterprise features - BigSheets, Big SQL, Text analytics, HiveQL, HttpFS
§ Delivered via IBM BlueMix
§ For Production, deployments at scale in the cloud
§ Delivers flexibility and efficiency with subscription pricing
§ Scales to meet spikes in demand without on-premise infrastructure
§ Drives enterprise-class, complex analytics on Big Data sets
§ Available via the IBM Cloud Marketplace and Bluemix
Cloud deployment options
Developer sandbox
Analytics for Hadoop
http://www.ibm.com/cloud http://www.bluemix.net
Enterprise Hadoop as a Service
Production environment
© 2016 IBM Corporation
Summary and Fast Start
© 2016 IBM Corporation 50
IBM Analytic Platform Capabilities
IBM integrates and extends Hadoop and Spark
Data Warehousing PureData for Analytics, Operational Analytics
Entity Extraction and Matching Big Match
Security and Compliance Optim, Guardium Audit and Encryption
Data Integration and Governance Information Server
Enterprise Search Watson Explorer
Real-time Analytics Streams
Predictive Modeling and Descriptive Statistics SPSS, Big R and Scalable Algorithms
Analysis, Reporting, and Exploration Watson Analytics, Cognos, BigSheets
Fast, ANSI SQL 2011, and Secure SQL Big SQL
Enterprise File System GPFS-FPO
Cluster Resource and Workload Management Platform Symphony
Large Scale Text Extraction Big Text
IBM Open Platform with Apache Hadoop
© 2016 IBM Corporation 51
IBM investing heavily in Big Data and analytics
$24B Investment
in both organic development
and 30+ acquisitions
$100M Announced investmentin IBM Interactive Experience, creating 10 new labs worldwide
9 Analytics Solution Centers
1,000 universities
Developing curriculumand training for analytics with
$1B To bring
cognitive services and applications
to market
© 2016 IBM Corporation 52
Industry-leading Strategy and Analytics practice - 15,000 staff
Diverse portfolio of analytical and cognitive computing capabilities
More than 100,000 people trained on analytics
20+ Big Data & Analytics solutions on Cloud Marketplace
Leading infrastructure platforms
Comprehensive enterprise-grade Hadoop
Expertise and technology set IBM apart
© 2016 IBM Corporation 53
Spark investments: community, core, and consumption
Core Accelerating Spark
capabilities
Community Growing Spark
knowledge & expertise
Consumption Using Spark within IBM
& partner products
Spark Technology Center Big Data University
SystemML open source contribution
Spark stand-alone Hadoop distribution IBM portfolio 30+ research initiatives
3500+ IBM developers and researchers
© 2016 IBM Corporation 54
The bottom line about IBM and Big Data
§ Big Data is a strategic initiative for IBM - Significant investments across software, hardware and services.
§ BigInsights - Enables firms to exploit growing variety, velocity, and volume of data - Delivers diverse range of analytics - Leverages and extends open source - Provides enterprise-class features and supporting services - Complement existing software investments and commercial offerings
§ IBM advantage - Full solution spanning software, hardware & services - Rapid technology advances through partnerships with IBM Research - Global reach
© 2016 IBM Corporation 55
Want to learn more? § Download Quick Start offering § Follow tutorials, videos, and more § Links all available from HadoopDev
– https://developer.ibm.com/hadoop/
© 2016 IBM Corporation
IBM big data • IBM big data • IBM big data
IBM big data • IBM big data • IBM big data
IBM
big
dat
a
• IB
M b
ig d
ata
IBM
big data • IBM
big data
THINK