Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Big Data Preparation Cloud Service
The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
Big Data Has a Secret…
“Big Data’s dirty little secret is that 90% of time spent on a project is devoted to preparing data… After all the preparation work, there isn’t enough time left to do sophisticated analytics on it…” Source: Thomas Davenport, Wall Street Journal, 2014
In the past year, data preparation has become indispensable due to its overwhelming contribution to analyses and decision support. Source: Gartner (http://blogs.gartner.com/lakshmi-randall/2015/05/11/whats-next-data-preparation/)
Many Different Formats, Unclean, Missing Values, …
Companies are struggling to derive value from big data…
Data Discovery & Visualization
Enterprise Reporting
Internet
Logs
Structured, Semi-Structured & Unstructured Data
90% of time is spent on DATA WRANGLING
MONTHS of effort spent on each new dataset
PROGRAMMERS writing scripts or complex ETL
Enterprise ETL & Data Integration
Traditional methods
Solution: Oracle Big Data Preparation Cloud Service
Internet
Logs
Structured, Semi-Structured & Unstructured Data
Data Discovery & Visualization
Enterprise Reporting
Enterprise ETL & Data Integration
Core Data Preparation Lifecycle Features
Prepare
• Import / Ingest Data
• Cleanse and Normalize
• Schema Detection
• Duplicate Identification
• Sensitive Data Detection
Enrich
• Data Profiling
• Data Classification
• Data Enrichment
• Attribute Extraction
• Entity Extraction
Publish
• RESTful API
• Source / Target Definition
• On Demand
• Scheduled Events
• Export Formatting
Govern and Monitor
• Interactive Dashboards
• Automated Alerts
• User policies & system controls
• Reusable data policies
• Security controls
• Job Detail Views
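As a rough illustration of what schema detection and data profiling might involve, the sketch below computes a few profile metrics for a single column of raw string values. The function names `infer_type` and `profile_column`, the type labels, and the metric names are all hypothetical and chosen for this example; the deck does not describe how the service implements these steps.

```python
from collections import Counter


def infer_type(value: str) -> str:
    """Guess a coarse type for one raw string value (illustrative labels)."""
    text = value.strip()
    if text == "":
        return "null"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "text"


def profile_column(values):
    """Return simple profile metrics for a list of raw string values."""
    types = Counter(infer_type(v) for v in values)
    non_null = [v.strip() for v in values if v.strip() != ""]
    # The most frequent non-null type stands in for the detected schema type.
    typed = {name: count for name, count in types.items() if name != "null"}
    detected = max(typed, key=typed.get) if typed else "null"
    return {
        "count": len(values),
        "nulls": types.get("null", 0),
        "distinct": len(set(non_null)),
        "detected_type": detected,
    }


if __name__ == "__main__":
    print(profile_column(["1", "2", "", "2", "x"]))
```

A majority vote over per-value types is only one possible heuristic; a real profiler would likely sample large inputs and recognize many more types.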
Intuitive User Interface
Integrated Data Verification, Transformation, and Visualization
Knowledge Driven Recommendations
Interactive Transform Script
Metadata and Data Views
Profile Metrics and Visualizations
Automates Many Data Preparation Tasks
• General transformations
– delete, replace, extract, obfuscate, …
• Duplicate analysis
– exact, fuzzy, multiple analysis, …
• NULLs
• Knowledge-based classification
– country, capital, population, currency, …
– custom knowledge
• Pattern recognition
– ID, date, time, credit card, email, IP address, …
• Recommendations
– based on data type, knowledge, patterns, …
• Data blending
– join, cross-source relationship discovery
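Three of the tasks listed above (pattern recognition, obfuscation, and exact or fuzzy duplicate analysis) can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the service's implementation: the regular expressions are deliberately simplified, and the names `PATTERNS`, `classify`, `obfuscate`, and `find_duplicates`, as well as the 0.85 similarity threshold, are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Simplified, illustrative patterns; real detectors would validate more
# strictly (e.g. IP octet ranges, credit card checksums).
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "ipv4": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
    "credit_card": re.compile(r"^\d{4}([ -]?)\d{4}\1\d{4}\1\d{4}$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def classify(value):
    """Return the name of the first pattern the value matches, else 'unknown'."""
    text = value.strip()
    for name, pattern in PATTERNS.items():
        if pattern.match(text):
            return name
    return "unknown"


def obfuscate(value, keep=4, mask="*"):
    """Mask all alphanumeric characters except the last `keep` of them."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - keep, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask)
            to_mask -= 1
        else:
            out.append(ch)  # separators and the trailing characters are kept
    return "".join(out)


def find_duplicates(values, threshold=0.85):
    """Return (i, j, kind) for pairs that are exact or fuzzy duplicates."""
    # Normalize case and whitespace before comparing.
    normalized = [" ".join(v.lower().split()) for v in values]
    pairs = []
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if normalized[i] == normalized[j]:
                pairs.append((i, j, "exact"))
            elif SequenceMatcher(None, normalized[i], normalized[j]).ratio() >= threshold:
                pairs.append((i, j, "fuzzy"))
    return pairs


if __name__ == "__main__":
    print(classify("user@example.com"))
    print(obfuscate("4111-1111-1111-1234"))
    print(find_duplicates(["Acme Corp", "acme  corp", "Acme Corp.", "Globex"]))
```

The pairwise comparison is quadratic in the number of values, which is acceptable for a sketch; at larger scale some form of blocking or indexing would be needed.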
Big Data Preparation and Enrichment Examples
Parse Click Stream Logs
Repair App Data
Classify Social Data
Structured Unreliable
Unstructured High Velocity
Unstructured High Volume
Embedded information No reliable patterns
Embedded information in unstructured text
Invalid emails
NLP
SSN Credit Card Info
Entities
Invalid and missing data Sensitive data
Supported Formats (not complete list)
Big Data and Semantics Powered Back-end Architecture
…it’s fast, it learns, and works with all data
YARN on Big Data
Oracle Public Cloud
Semantics-based Knowledge Graph
• Leveraged by semantic pipeline
• YAGO2-derived real-world knowledge
• Enhanced with customer-specific reference data
• Continuously expanding knowledge with each release
Natural Language Processing
• Proven semantic technology
• Based upon decades of know-how in standardizing complex product data
Spark Machine Learning
• Engine built on a massively scalable foundation for iterative machine learning in a clustered compute environment
Big Data Preparation From Browsers or Mobile Devices
Thin client environment for browser or mobile monitoring and enrichment
Automate the Big Data Preparation Pipeline
• Continuous Execution of Recognized Files – No Human Intervention
• Security and Metadata – Sensitive Data Discovery & Provenance
• Governance Dashboards – Runtime Metrics, Health Reports, Alerts
Ingest/Publish APIs
Runtime Metrics APIs
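The idea of continuously executing recognized files without human intervention, while collecting runtime metrics and alerts, might look roughly like the sketch below. It is purely illustrative: the `Pipeline` class, its `register` and `ingest` methods, the glob-based file recognition, and the metric names are assumptions made for this example and do not reflect the service's actual ingest, publish, or metrics APIs.

```python
from fnmatch import fnmatch


class Pipeline:
    """Hypothetical dispatcher: recognized files run a registered transform."""

    def __init__(self):
        self.rules = []  # list of (glob pattern, row transform) pairs
        self.metrics = {"processed": 0, "skipped": 0, "failed": 0}
        self.alerts = []

    def register(self, pattern, transform):
        """Associate a file name pattern with a per-row transform function."""
        self.rules.append((pattern, transform))

    def ingest(self, filename, rows):
        """Apply the first matching transform; return None if none applies."""
        for pattern, transform in self.rules:
            if fnmatch(filename, pattern):
                try:
                    result = [transform(row) for row in rows]
                except Exception as exc:
                    # A failure is counted and raised as an alert, not re-thrown.
                    self.metrics["failed"] += 1
                    self.alerts.append(f"{filename}: {exc}")
                    return None
                self.metrics["processed"] += 1
                return result
        # Unrecognized files are left alone and only counted.
        self.metrics["skipped"] += 1
        return None


if __name__ == "__main__":
    pipeline = Pipeline()
    pipeline.register("clicks_*.log", str.upper)
    print(pipeline.ingest("clicks_2014.log", ["a", "b"]))
    print(pipeline.ingest("other.csv", ["x"]))
    print(pipeline.metrics)
```

Keeping counters and alerts on the dispatcher itself is one simple way to feed a dashboard; a production system would more likely emit these to a separate monitoring store.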