atsushi-tsuchiya
Data Discovery Tool for BigInsights (on top of Hadoop) - MapReduce with no coding.
Data Discovery Tool: BigSheets
MapReduce with No Coding?
Atsushi Tsuchiya ([email protected]), Big Data Tiger Team
IBM Software
Looking at Data
• What would you do with Big data?
• How to make use of it?
• It is difficult! – too vague.
• No specific problem that needs to be solved.
• No specific question that needs to be answered.
• All you know is that you want to improve the business.
• But you have *data*.
• So, what would you do first?
Looking at Data!
IBM with Hadoop
• IBM has been working with the open source community for a long time.
– Eclipse, Hadoop, and so on …
• BigInsights includes Hadoop.
BigInsights
• BigInsights is IBM's Hadoop product for Big data analytics.
– Basic Edition (up to 10TB) – Free to use!
– Enterprise Edition
• Next version of BigInsights coming soon.
– v1.2 available.
• And many more
BigInsights Components
• BigInsights includes:
– IBM Java
– JAQL - a language developed by IBM (open source)
– IBM Distribution of Hadoop
– BigSheets - a data exploration tool
– FLEX scheduler for Adaptive MapReduce
– Orchestrator (Workflow Engine)
– SystemT (Text Analytics), SystemML (Machine Learning)
– LDAP
– Web Console / Developer Studio
BigInsights – Basic Edition
(Will be updated in the Nov release.)
Function | Version | Basic Edition | Enterprise Edition
Integrated Install | | Inc | Inc
Open Source components: | | |
Hadoop (including common utilities, HDFS, MapReduce framework) | 0.20.2 | Inc | Inc
Jaql (programming / query language) | 0.5.2 | Inc | Inc
Pig (programming / query language) | 0.7 | Inc | Inc
Flume (data collection/aggregation) | 0.9.1 | Inc | Inc
Hive (data summarization/querying) | 0.5 | Inc | Inc
Lucene (text search) | 3.0.2 | Inc | Inc
Zookeeper (process coordination) | 3.2.2 | Inc | Inc
Avro (data serialization) | 1.3.0 | Inc | Inc
HBase (real time read/write) | 0.20.6 | Inc | Inc
Oozie (workflow / job orchestration) | 2.2.2 | Inc | Inc
Online documentation | | Inc | Inc
Capability to integrate with DB2, InfoSphere Warehouse: two DB2 UDFs to submit jobs and read results from BigInsights | | Inc | Inc
BigInsights – Enterprise Edition
Function | Basic Edition | Enterprise Edition
R Connector: Jaql module to invoke R statistical capabilities from BigInsights | n/a | Inc
Netezza Connector: Jaql modules to read/write data from/to Netezza | n/a | Inc
LDAP | n/a | Inc
Web Console | n/a | Inc
Workflow Engine | n/a | Inc
Scheduler (Orchestrator) | n/a | Inc
Text Analytics Module (System T) | n/a | Inc
Eclipse support (for System T)* | n/a | Inc
BigSheets – Data Discovery Tool | n/a | Inc
IBM Optim Development Studio V2.2.1.0 | n/a | Inc
Support by IBM | n/a | Inc
BigSheets
• A data exploration tool for Hadoop
• Only comes with BigInsights Enterprise Edition
BigSheets Concept Model
No Coding is Required!
[Diagram labels: sources (Internet, Intranet, Logs, Other) → Gather → BigSheets → Explore, Inspect, Enrich, Manipulate, Explore & Analyze → Publish / Get massive results in BigInsights]
It's like a spreadsheet.
Looks very familiar?!
Visualizations
• Predefined visualizations
• Custom plug-ins
The number of coffee shops in each state in North America.
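For comparison, here is a rough idea of the kind of map/reduce logic that a per-state count would need if it were written by hand. This is only a minimal sketch in plain Python, not BigSheets or Hadoop code; the function names (`map_phase`, `reduce_phase`) and the sample rows are hypothetical and are not taken from the demo.

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit a (state, 1) pair for every shop record."""
    for record in records:
        yield record["state"], 1

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each state."""
    totals = defaultdict(int)
    for state, count in pairs:
        totals[state] += count
    return dict(totals)

# Hypothetical sample rows; the real demo data is not in the slides.
shops = [
    {"name": "Shop A", "state": "CA"},
    {"name": "Shop B", "state": "NY"},
    {"name": "Shop C", "state": "CA"},
]

shops_per_state = reduce_phase(map_phase(shops))
print(shops_per_state)
```

A tool like BigSheets aims to let the user express this grouping through a spreadsheet-style interface instead of writing the two functions.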
DEMO
Gather
[Diagram labels: Internet, Intranet, Logs, Other → Gather → BigSheets / BigInsights]
• BigInsights can gather data from
– Predefined formats:
  • BigSheets data reader
  • Basic crawler data reader
  • Basic crawler data reader (binary support)
  • Character-delimited data reader
  • Tab Separated Value (TSV) data reader
  • JavaScript Object Notation (JSON) array reader
  • Comma Separated Value (CSV) data reader
– Custom BigSheets reader
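To illustrate what the delimited and JSON array formats listed above look like as rows, here is a minimal sketch using only the Python standard library. It is not the BigSheets reader implementation; the helper names (`read_delimited`, `read_json_array`) and the sample data are assumptions made for illustration.

```python
import csv
import io
import json

def read_delimited(text, delimiter=","):
    """Parse character-delimited text (CSV, TSV, ...) with a header row into dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def read_json_array(text):
    """Parse a JSON document whose top level is an array of objects."""
    rows = json.loads(text)
    if not isinstance(rows, list):
        raise ValueError("expected a top-level JSON array")
    return rows

# Hypothetical sample data: the same two rows in three formats.
csv_text = "name,state\nShop A,CA\nShop B,NY\n"
tsv_text = "name\tstate\nShop A\tCA\nShop B\tNY\n"
json_text = '[{"name": "Shop A", "state": "CA"}, {"name": "Shop B", "state": "NY"}]'

csv_rows = read_delimited(csv_text)
tsv_rows = read_delimited(tsv_text, delimiter="\t")
json_rows = read_json_array(json_text)
```

All three calls yield the same list of rows, which is the point of having a reader per format: once gathered, the data can be treated as one sheet regardless of how it was stored.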
Gather
• BigInsights can import structured and unstructured data
– CSV
– Files
– Network
  • http
  • hdfs
  • AWS (S3n/S3)
– Other
  • Custom importer
Collection
A complete list of McDonald's in North America.
[Figure labels: Import, Reformat, Calculate]
A complete list of McDonald's in North America.
[Figure labels: Column chart, Heat map]
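BigSheets in Action
• Blockbuster movie sales forecasting
– From ABC News
Blockbuster – movie sales forecasting with IBM BigInsights/BigSheets
① Take in the feed of Tweets posted over the weekend (about 200,000),
② and within a few hours (until now, this had to wait until Monday morning):
– create a sales forecast chart
– run sentiment analysis
For example, this summer X-Men was more popular (more tweeted about) than any other movie.
→ Advertising, screening strategy, and so on can be adjusted frequently.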
Conclusion
• We all need to improve the business.
• So, where would you start with Big data?
Data Discovery is a key to start improving YOUR Business!
Thank you!