Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Big Data Preparation Cloud Service

Big Data Preparation Cloud Service - Eventworld.cz

Big Data Preparation Cloud Service

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Big Data has a Big Problem

Big Data Has a Secret…

“Big Data’s dirty little secret is that 90% of time spent on a project is devoted to preparing data… After all the preparation work, there isn’t enough time left to do sophisticated analytics on it…” Source: Thomas Davenport – Wall Street Journal, 2014

In the past year, data preparation has become indispensable due to its overwhelming contribution to analyses and decision support. Source: Gartner (http://blogs.gartner.com/lakshmi-randall/2015/05/11/whats-next-data-preparation/)

Many Different Formats, Unclean, Missing Values, …

Companies are struggling to derive value from big data…

Data Discovery & Visualization

Enterprise Reporting

Internet

Logs

Structured, Semi-Structured & Unstructured Data

90% of time is spent on DATA WRANGLING

MONTHS of effort spent on each new dataset

PROGRAMMERS writing scripts or complex ETL

Enterprise ETL & Data Integration

Traditional methods

Solution: Oracle Big Data Preparation Cloud Service

Internet

Logs

Structured, Semi-Structured & Unstructured Data

Data Discovery & Visualization

Enterprise Reporting

Enterprise ETL & Data Integration

Core Data Preparation Lifecycle Features

Prepare

• Import / Ingest Data
• Cleanse and Normalize
• Schema Detection
• Duplicate Identification
• Sensitive Data Detection
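Duplicate identification, listed under Prepare, can be illustrated with a small sketch. This is not the service's implementation: the function names, the whitespace/case normalization, and the 0.85 similarity threshold are assumptions chosen for illustration, and the fuzzy comparison uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(value.lower().split())


def find_duplicates(records: list[str], threshold: float = 0.85):
    """Return (exact, fuzzy) lists of index pairs (i, j) with i < j.

    exact: records that are identical after normalization.
    fuzzy: records whose similarity ratio is at or above the threshold.
    The threshold is a hypothetical default, not a documented setting.
    """
    exact, fuzzy = [], []
    normalized = [normalize(r) for r in records]
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            if normalized[i] == normalized[j]:
                exact.append((i, j))
            elif SequenceMatcher(None, normalized[i], normalized[j]).ratio() >= threshold:
                fuzzy.append((i, j))
    return exact, fuzzy


records = ["Acme Corp", "acme  corp", "Acme Corp.", "Globex Inc"]
exact, fuzzy = find_duplicates(records)
print("exact:", exact)
print("fuzzy:", fuzzy)
```

The pairwise loop is quadratic in the number of records, which is acceptable for a sample but would need blocking or a distributed engine at the data volumes the slides describe.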

Enrich

• Data Profiling
• Data Classification
• Data Enrichment
• Attribute Extraction
• Entity Extraction

Publish

• RESTful API
• Source / Target Definition
• On Demand
• Scheduled Events
• Export Formatting

Govern and Monitor

• Interactive Dashboards
• Automated Alerts
• User policies & system controls
• Reusable data policies
• Security controls
• Job Detail Views

Intuitive User Interface

Integrated Data Verification, Transformation, and Visualization

Knowledge Driven Recommendations

Interactive Transform Script

Metadata and Data Views

Profile Metrics and Visualizations

Automates Many Data Preparation Tasks

• General transformations – delete, replace, extract, obfuscate, …
• Duplicate analysis – exact, fuzzy, multiple analysis, …
• NULLs
• Knowledge-based classification – country, capital, population, currency, …
  – custom knowledge
• Pattern recognition – ID, date, time, credit card, email, IP address, …
• Recommendations – based on data type, knowledge, patterns, …
• Data blending – join, cross-source relationship discovery
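As a rough illustration of pattern recognition and pattern-based recommendations, the sketch below labels each value in a column and reports the dominant label with its share of non-null values. Everything here is an assumption for illustration: the label names, the simplified email expression, and the single ISO date format are hypothetical and not taken from the product.

```python
import ipaddress
import re
from collections import Counter
from datetime import datetime

# Deliberately loose email shape: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify_value(value: str) -> str:
    """Return a pattern label for one value (labels are hypothetical)."""
    value = value.strip()
    if not value:
        return "null"
    if EMAIL_RE.match(value):
        return "email"
    try:
        ipaddress.ip_address(value)  # accepts IPv4 and IPv6 literals
        return "ip_address"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")  # only ISO dates in this sketch
        return "date"
    except ValueError:
        pass
    return "unknown"


def classify_column(values):
    """Return (dominant label, share of non-null values carrying that label)."""
    counts = Counter(classify_value(v) for v in values)
    non_null = {label: n for label, n in counts.items() if label != "null"}
    if not non_null:
        return "null", 0.0
    label, hits = max(non_null.items(), key=lambda kv: kv[1])
    return label, hits / sum(non_null.values())


column = ["ann@example.com", "bob@example.org", "", "not-an-email"]
label, share = classify_column(column)
print(label, round(share, 2))
```

A share below 1.0 on a column that is mostly one pattern is the kind of signal that could drive a recommendation, for example flagging the non-matching values as invalid emails.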

Big Data Preparation and Enrichment Examples

• Parse Click Stream Logs – Unstructured, High Velocity: embedded information, no reliable patterns
• Repair App Data – Structured, Unreliable: invalid and missing data (invalid emails), sensitive data (SSN, credit card info)
• Classify Social Data – Unstructured, High Volume: embedded information in unstructured text (NLP, entities)

Supported Formats (not complete list)
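The sensitive-data case on this slide (SSN, credit card info) can be sketched as a detect-and-obfuscate step. This is a minimal illustration under stated assumptions, not the service's method: the regular expressions, the masking format, and the function names are hypothetical, and the card check uses the standard Luhn checksum so that arbitrary digit runs are not masked.

```python
import re

# US Social Security number written as 3-2-4 digits (assumed format).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        d = int(char)
        if position % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_card(match: re.Match) -> str:
    """Mask all but the last four digits when the candidate passes Luhn."""
    digits = re.sub(r"\D", "", match.group())
    if not luhn_valid(digits):
        return match.group()  # leave non-card digit runs untouched
    return "*" * (len(digits) - 4) + digits[-4:]


def obfuscate(text: str) -> str:
    """Replace SSNs with a fixed mask, then mask Luhn-valid card numbers."""
    text = SSN_RE.sub("***-**-****", text)
    return CARD_RE.sub(mask_card, text)


masked = obfuscate("SSN 123-45-6789, card 4111 1111 1111 1111")
print(masked)
```

Masking SSNs before scanning for card numbers keeps the two patterns from interfering, and the Luhn check trades some recall for fewer false positives on order numbers or phone-like strings.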

Big Data and Semantics Powered Back-end Architecture

…it’s fast, it learns, and works with all data

YARN on Big Data

Oracle Public Cloud

Semantics based Knowledge Graph

• Leveraged by semantic pipeline
• YAGO2 derived real world knowledge
• Enhanced with customer specific reference data
• Continuously expanding knowledge with each release

Natural Language Processing

• Proven semantic technology
• Based upon decades of know-how in standardizing complex product data

Spark Machine Learning

• Engine built on a massively scalable foundation for iterative machine learning in a clustered compute environment

Big Data Preparation From Browsers or Mobile Devices

Thin client environment for browser or mobile monitoring and enrichment

Automate the Big Data Preparation Pipeline

• Continuous Execution of Recognized Files – No Human Intervention
• Security and Metadata – Sensitive Data Discovery & Provenance
• Governance Dashboards – Runtime Metrics, Health Reports, Alerts

Ingest/Publish APIs

Runtime Metrics APIs

Questions and Answers
