Informatica Big Data Management - Meetup · 2016-04-15


Informatica Big Data Management

Joel LaPlount, Informatica Product Management

Data Powers Businesses

Big Data = Big Opportunity

Sources: Informatica Big Data Survey, March 2012; Cisco, The Zettabyte Era - Trends and Analysis, May 2013

67% of respondents see big data as an opportunity for their organization.

By 2020, data is predicted to grow at least 75 times and more than 1/3 will pass through the Cloud.

Example Use Cases

Advanced Analytics

Fraud / Risk Management

Process / Asset Optimization

DATA LAKE

The Reality

By 2015, 85% of Fortune 500 organizations will fail to effectively exploit big data for competitive advantage.

Companies Taking on the Big Data Challenge

Their Early Journey

All this new data – let’s just spin up a Hadoop cluster.

Now all we have to do is ingest, blend and prep data…

STOP! How do we operationalize the results? Reuse?

The “sandbox” is up – experiments are so much fun!!!

No real business value – no ROI – we are STUCK!

Oops! So many issues with data – just hand-code!

Biz’ wants more insights – let’s put it in the data lake!

We need more Hadoop developers!!!

Why Do Big Data Projects Fail?

“rapid intake of new data sources” - Vishal, VP Data Architecture

“too many data silos making it impossible to know what data can be trusted” - Pete, Chief Data Officer

“simplify the work of ingesting and mapping data...so that we need fewer specialized development resources” - Ron, VP Global Information Systems

“need to ensure confidence in data integrity, accuracy, and timeliness” - Ron, VP Global Information Systems

“need code re-usability and code maintainability” - Ben, Director of Platform Architecture

“regulations have become very strict and very precise – lots of gaps in the quality of the data” - Christine, Manager Data Management

“prepping and cleaning the data used to take us 2-3 weeks” - Vishal, VP Data Architecture

“transforming data management from a labor intensive, qualitative approach to a systematic approach…to classify data and understand lineage” - Ned, Senior Vice President

What’s Required for Successful Big Data Projects?

Big data does not mean NO data integration.

Big data does not mean BAD quality information.

Big data does not mean PROLIFERATION of sensitive data.

How do you certify and govern big data?

How do you quickly integrate big data?

How do you secure big data?

Introducing Informatica Big Data Management

Industry’s Only Single Integrated Platform for Big Data Management

Informatica Big Data Management

Analytical Applications

Data Warehouses, Data Lakes, NoSQL

PILLAR 1 – Big Data Integration

PILLAR 2 – Big Data Governance & Quality

PILLAR 3 – Big Data Security

PILLAR 1 – Big Data Integration

Big Data Cannot Be Tackled Manually

The Race to Business Value Will Not Be Won By Hand

More Volume

More Variety

More Velocity

More Data Consumers

More Data Silos

More Data Platforms

So Big Data Goes Unused or Is Delivered Late

Data Developer: Overwhelming Manual Efforts

IT Data Management: Complex Processes

Business Analyst: Analysis Too Late

Big Data Integration For Maximum Performance

Ingest Instantly
• 200+ Pre-Built Connectors
• Cloud Connectivity
• Real-Time Streaming

Process Everything
• 100+ Pre-Built Parsers and Transformations
• Graphical Development
• Dynamic Process and Mappings

Deploy Optimally
• Multiple Engines Supported (MapReduce, Spark, etc.)
• High-Speed Processing For Complex Workloads
• Access With Brokering & Federation

PILLAR 2 – Big Data Governance & Quality

Big Data Is Difficult To Trust

Changing Needs for Quality: Same data used for multiple purposes

Hidden Relationships: Everything and everyone is interconnected

Magnified Trust Issues: New sources of external data

And Regulations And Controls Are Harder To Meet

SOX, PCI, HIPAA, FISMA, ISO, GLBA, NIST

Big Data Governance for Agility and Trust

Collaborative Stewardship
• Business Context Provisioning: Role-specific interfaces, business glossary and rules
• Policy Driven Processes: Workflow, approvals, voting

360 Degree Insight
• Relationship Discovery and View: Big data matching and linking
• Catalog Of All Metadata: Smart knowledge graph

Complete Confidence
• Certification with Data Quality: Validation, enrichment, standardization
• Transparency In and Out of the Enterprise: Full data and metadata lineage

PILLAR 3 – Big Data Security

Perimeter Security Is Insufficient

Perimeter security: Outside in security

• Not if, but when
• Network focused
• Attacks will only grow

Big Data: Bigger Risk

Sensitive Data

Security Exposure

• An exponential attack surface
• With exponential risks

Big Data Security Foundation: The ‘Data Perimeter’

Risk Analytics
• Risk Identification: Proliferation, Cost, Protection, Use, Location
• Detection: Risky Users

360 Degree Visibility
• Discovery of Sensitive Data, with Context
• Visualizations: Who, Where, When, What

Policy-Based Protection
• Centralized Management of Rules
• De-Identification For Test, Reporting, Analytics

The 3 Pillars of Informatica Big Data Management

Big Data Integration
• Simple Visual Environment & Templates
• Optimized Execution & Flexible Deployment
• 100’s of Pre-built Transforms, Connectors & Parsers
• Broker-based Data Ingestion

Big Data Governance & Quality
• Collaboration Capabilities
• Business Glossary
• Profiling and Data Quality
• 360° Relationship Views
• End-to-end Data Lineage

Big Data Security
• Sensitive Data Discovery & Classification
• Proliferation Analysis
• Risk Assessment
• Persistent & Dynamic Data Masking

Big Data Management: Key New Features

A Big Data Fabric Enables Productivity, Repeatability, Collaboration

Automate For Maximum Productivity

100+ PRE-BUILT PARSERS AND TRANSFORMATIONS

200+ PRE-BUILT CONNECTORS

Dynamic PROCESSES AND MAPPINGS

Graphical DEVELOPMENT

Develop More Quickly And Staff More Quickly


Hadoop Developers

Informatica Developers

100,000+ TRAINED DEVELOPERS WORLDWIDE

500% MORE PRODUCTIVE THAN HAND-CODING

0% RISK OF REWRITING OUTDATED CODE

Develop Fit-for-Purpose Assets & Drive Collaborative Governance

[Figure: data governance cycle with the stages Discover, Define, Apply, and Measure and Monitor, shared between IT and Business]

Curation of Fit-for-Purpose Data Assets

Raw Prepared Cleansed/ Matched

Hadoop Data Lake

Efficiency & Flexibility with Dynamic Mappings

• Mass Ingestion: Build a template once, then automatically execute the mapping for 1000’s of sources with different schemas
• Mapping self-adjusts dynamically to external schema changes and column characteristics

Design time

Run time

Available in PC V10.0!
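To make the dynamic-mapping idea concrete, here is a minimal Python sketch. It is not Informatica code and does not use any Informatica API; `TEMPLATE`, `run_mapping`, and the sample sources are hypothetical names chosen for illustration. The assumption is that a template is a list of rules keyed on column characteristics (name pattern, value type) rather than on a fixed column list, so the same template can run against sources whose schemas differ.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# A rule pairs a column selector with a transformation (hypothetical structure).
Rule = Tuple[Callable[[str, object], bool], Callable[[object], object]]

# Hypothetical template: rules are written against column characteristics,
# never against a fixed list of column names.
TEMPLATE: List[Rule] = [
    # Trim whitespace on every string-valued column.
    (lambda name, value: isinstance(value, str), lambda v: v.strip()),
    # Cast every column whose name ends in "_id" to an integer.
    (lambda name, value: name.lower().endswith("_id"), lambda v: int(v)),
]


def run_mapping(rows: List[Row], template: List[Rule] = TEMPLATE) -> List[Row]:
    """Apply the template to rows whose schema is only known at run time."""
    out: List[Row] = []
    for row in rows:
        new_row: Row = {}
        for name, value in row.items():
            # Rules are applied in order; each sees the result of the previous one.
            for matches, transform in template:
                if matches(name, value):
                    value = transform(value)
            new_row[name] = value
        out.append(new_row)
    return out


# Two sources with different schemas, handled by the same template.
customers = [{"cust_id": " 17 ", "name": " Ada "}]
orders = [{"order_id": "42", "amount": 9.5, "note": "rush "}]

print(run_mapping(customers))
print(run_mapping(orders))
```

Because the rules select columns at run time, adding or renaming a column in a source does not require editing the template, which is the property the slide describes as self-adjusting.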

Choice of Execution Engines

• For Hadoop execution: engines run as native YARN apps
• Choice of execution on Map-Reduce, Blaze, or INFA engines outside of Hadoop
• Future: Spark-based execution, as well as a smart optimizer that decides based on workload

[Architecture diagram. Labels shown: HADOOP Cluster, HDFS, Map-Reduce, Hive Runtime, DIS, INFA Hive Executor, Data Engine Compiler, Blaze Executor, Blaze Runtime, DIS CAL, Hive Driver, Hive MetaStore, YARN, Blaze, Hadoop CAL]

Smart Optimizers

• In-built mapping optimizer automatically tunes and re-arranges the mapping for high performance
• Early selection, Early projection, Mapping pruning, Semi-join, Join re-ordering
• Automatic partitioning support based on statistics and other heuristics
• Advanced full pushdown optimization support


[Figure: the combined condition "Orderkey = L_ORDERKEY and L_EXTENDEDPRICE < 1000 and id1 + id2 > 47" is split into three separate conditions: "Orderkey = L_ORDERKEY", "L_EXTENDEDPRICE < 1000", and "id1 + id2 > 47"]
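The split shown on this slide is the core of early selection: a conjunctive filter is broken into its conjuncts, and each conjunct that reads columns from only one source is moved ahead of the join. The Python sketch below illustrates that general technique only; it is not the Informatica optimizer. The function `push_down` and the assignment of columns to the `orders` and `lineitem` sources are assumptions, since the slide does not say which source owns which column.

```python
from typing import Dict, List, Set, Tuple

# One conjunct of the filter: its text and the set of columns it reads.
Conjunct = Tuple[str, Set[str]]


def push_down(
    conjuncts: List[Conjunct], source_columns: Dict[str, Set[str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign each conjunct to the single source that owns all of its columns.

    Conjuncts that span several sources cannot be pushed and stay at the join.
    """
    pushed: Dict[str, List[str]] = {name: [] for name in source_columns}
    at_join: List[str] = []
    for text, cols in conjuncts:
        owners = [s for s, owned in source_columns.items() if cols <= owned]
        if owners:
            pushed[owners[0]].append(text)
        else:
            at_join.append(text)
    return pushed, at_join


# Assumed column ownership (not stated on the slide).
sources = {
    "orders": {"Orderkey", "id1", "id2"},
    "lineitem": {"L_ORDERKEY", "L_EXTENDEDPRICE"},
}

# The three conjuncts of the original combined condition.
conjuncts = [
    ("Orderkey = L_ORDERKEY", {"Orderkey", "L_ORDERKEY"}),
    ("L_EXTENDEDPRICE < 1000", {"L_EXTENDEDPRICE"}),
    ("id1 + id2 > 47", {"id1", "id2"}),
]

pushed, at_join = push_down(conjuncts, sources)
print(pushed)
print(at_join)
```

Under these assumptions the two single-source filters run before the join, so fewer rows reach it, while the equality that references both sources remains as the join condition.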

Enterprise Information Catalog : Basis for Data Intelligence

EIC: Relationships, Catalog, Statistics

Live Data Map: Rules, Glossary, Ratings

All Informatica Repositories

Applications, Business glossary & context

3rd party – BI, Modeling, Big Data, RDBMS

User Ratings, Feedback, Operational Stats

• Exploration
• Semantic Search
• Relationship Discovery

Data Discovery

Sensitive Data Tracking

Stewardship & Governance

Smart Suggestions

Live Data Map

Knowledge Graph of all enterprise data assets

• Recommendations
• 360 degree views
• User Ratings

Project Sonoma : Intelligent Data Lake

Enterprise Information Catalog

BI & Analytics

Self-Service Data Discovery

IT Monitoring & Tracking

Prepare (Rev)

Raw Data

Published Data Sets

DATA

METADATA

Self-Service for Analysts

• Search & Discover

• Prepare & Publish

Visibility for IT

• Usage tracking & monitoring

• Lineage & Security

• Operate at scale

Project Sonoma: Intelligent Data Lake

Data Analysts

• Enterprise data assets search and discovery

• Data acquisition from on-premise and cloud sources, batch and real-time

• Data set recommendations

• Excel-like Data preparation, enrichment for large data sets

• Data publishing and sharing

Why Informatica?

Data Is ALL We Do

Innovation and Leadership

Magic Quadrant for Data Integration Tools

Magic Quadrant for Enterprise Integration Platform as a Service

Magic Quadrant for Data Quality Tools

Magic Quadrant for Data Masking Technology

Magic Quadrant for Structured Data Archiving and Application Retirement

Magic Quadrant for Master Data Management of Customer Data Solutions

These graphics were published by Gartner, Inc. as part of larger research documents and should be evaluated in the context of the entire documents. The Gartner documents are available upon request from Informatica. Gartner does not endorse any vendor, product, or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Informatica Big Data Customers (Sample)

Informatica Big Data Ecosystem Partners

Thank You!

Big Data Management V10.1 Launch: May 12 Webinar – Check Informatica.com
