Manage traceability with Apache Atlas, a flexible metadata repository

Charly Clairmont, Synaltic
@egwada
[email protected]
http://synaltic.fr

Copyright Synaltic 2015



Charly Clairmont

● More than ten years of experience in IT, mainly in BI
● Cofounder of Altic, now Synaltic
● Cofounder of the Hadoop User Group France
● Believes in open source as a way to help enterprises create value
● Helps open source projects become known through meetups and conferences


Synaltic

● An integrator focused mainly on data management
● Founded in 2004, Synaltic is the result of the merger of two companies, Synotis and Altic
● 25 specialists in data management
● A Swiss subsidiary based in Lausanne

Our values
● Commitment
● Expertise
● Loyalty

[Diagram: SYNALTIC. Labels: R&D, Training, Support, Project, Expertise, Data Intelligence, Data Platform, Data Governance, Data Exchange]


What about your data?

Do you know where your data is?

Do you know who is responsible for a specific dataset?

Do you know which application or task modified this entity last Friday?


Enterprise Data Governance

Provide a common approach to data governance across all systems and data within the organization

– Transparent

– Reproducible

– Auditable

– Consistent


Enterprise Data Governance, in Hadoop

No specific way to address this requirement

– Each project proposes its own way to resolve data governance

– No integration with existing enterprise data governance frameworks


Apache Atlas

Data classification

Metadata Exchange

Centralized Auditing

Search & Lineage

Security & Policy engine


Apache Atlas, Overview

Data Classification
● Business-oriented taxonomy annotations
● Relationships between data sets and underlying elements, including source, target, and derivation processes
● Export metadata to third-party systems

Centralized Auditing
● Security access information for every application and process
● Operational information for execution, steps, and activities

Search & Lineage (Browse)
● Navigation paths to explore the data classification and audit information
● Text-based search to locate what is relevant
● Visualization of data set lineage

Security & Policy Engine
● Compliance policy at runtime based on data classification schemes
● Advanced definition of policies for preventing data derivation


Apache Atlas, Knowledge Store

Knowledge store categorized with an appropriate business-oriented taxonomy
● Data sets & objects
● Tables / Columns
● Logical context
● Source, destination

Supports exchange of metadata between foundation components and third-party applications/governance tools

Tech: Titan with Apache HBase
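To make the knowledge store idea concrete, here is a minimal in-memory sketch of entities categorized with taxonomy terms. It is not the Atlas type system or its storage layer: the `Entity` and `KnowledgeStore` classes, their fields, and the sample tags are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A catalogued asset such as a table or a column (hypothetical model)."""
    name: str
    kind: str                                # "table", "column", ...
    tags: set = field(default_factory=set)   # business-oriented taxonomy terms
    source: str = ""                         # where the data comes from
    destination: str = ""                    # where the data goes


class KnowledgeStore:
    """In-memory stand-in for a metadata repository (illustration only)."""

    def __init__(self):
        self._entities = {}

    def register(self, entity):
        self._entities[entity.name] = entity

    def by_tag(self, tag):
        """Return the names of all entities carrying the given taxonomy term."""
        return sorted(e.name for e in self._entities.values() if tag in e.tags)

    def search(self, text):
        """Naive case-insensitive text search on entity names."""
        needle = text.lower()
        return sorted(n for n in self._entities if needle in n.lower())


# Sample assets; names and tags are assumptions made for this example.
store = KnowledgeStore()
store.register(Entity("tweets", "table", {"social"}, source="hdfs:raw", destination="hive"))
store.register(Entity("users", "table", {"social", "personal-data"}, source="hdfs:raw", destination="hive"))
store.register(Entity("users.screen_name", "column", {"personal-data"}))

print(store.by_tag("personal-data"))
print(store.search("USERS"))
```

A graph database such as Titan would store the same information as vertices and edges; the dictionary here only shows which attributes a business taxonomy attaches to each asset.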


Apache Atlas, Data Lifecycle Management

Provenance

Multi-cluster replication

Data set retention/eviction

Late data handling

Automation

Tech:

● Apache Falcon


Apache Atlas, Audit Store

Historical repository for all governance events
● Security: Access Grant & Deny
● Operational: Data Provenance & Metrics
● Indexed and Searchable

Tech:
● YARN ATS, Apache HBase, Apache Hive, Solr, ElasticSearch (Pluggable)
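The audit store described above can be pictured as an append-only event history with an index on the audited asset. The sketch below is a plain Python illustration under that assumption; `AuditEvent`, `AuditStore`, and the sample events are hypothetical and do not reflect how Atlas, HBase, or Solr actually persist audit data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    category: str   # "security" or "operational"
    actor: str      # user or job that triggered the event
    asset: str      # data set the event refers to
    outcome: str    # e.g. "grant", "deny", "success"


class AuditStore:
    """Append-only history with a simple per-asset index (illustration only)."""

    def __init__(self):
        self._events = []     # full history, in arrival order
        self._by_asset = {}   # asset name -> positions in self._events

    def record(self, event):
        self._by_asset.setdefault(event.asset, []).append(len(self._events))
        self._events.append(event)

    def history(self, asset, category=None):
        """Return the events for one asset, optionally filtered by category."""
        events = [self._events[i] for i in self._by_asset.get(asset, [])]
        if category is not None:
            events = [e for e in events if e.category == category]
        return events


# Sample events; actors and assets are assumptions made for this example.
audit = AuditStore()
audit.record(AuditEvent(datetime(2015, 6, 1, 9, 0), "security", "alice", "hive:users", "grant"))
audit.record(AuditEvent(datetime(2015, 6, 1, 9, 5), "security", "bob", "hive:users", "deny"))
audit.record(AuditEvent(datetime(2015, 6, 1, 9, 10), "operational", "import-job", "hive:tweets", "success"))

denied = [e.actor for e in audit.history("hive:users", category="security") if e.outcome == "deny"]
print(denied)
```

Keeping security and operational events in one history, distinguished by a category field, is what lets a single search answer both "who was denied access" and "which job touched this data set".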


Apache Atlas, Security

Establish global security policies based on data classification.


Apache Atlas, Policy Engine

Runtime rationalization of policy rules with respect to data asset combinations and time. Fully extensible.
● Metadata-based rules
● Geo-based rules
● Time-based rules
● Column / Attribute Prohibitions
● Preview: Hive Row and Column Masking

Tech:
● Ranger
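The rule families listed above (metadata-based, geo-based, time-based, column prohibitions) can be combined as independent deny rules that are all evaluated at request time. The following is a minimal sketch of that idea only; it does not use or imitate the Ranger API, and `Request`, the rule factories, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Request:
    user_country: str        # where the request originates (geo-based rules)
    at: time                 # time of day of the request (time-based rules)
    column_tags: frozenset   # classification tags of the requested columns


def deny_tagged_columns(tag):
    """Metadata-based prohibition: deny when any requested column carries the tag."""
    return lambda req: tag in req.column_tags


def deny_outside_countries(allowed):
    """Geo-based rule: deny requests coming from outside the allowed countries."""
    return lambda req: req.user_country not in allowed


def deny_outside_hours(start, end):
    """Time-based rule: deny requests outside the [start, end] window."""
    return lambda req: not (start <= req.at <= end)


def is_allowed(request, deny_rules):
    """A request passes only if no deny rule matches it."""
    return not any(rule(request) for rule in deny_rules)


# Sample policy; tag names, countries, and hours are assumptions for this example.
rules = [
    deny_tagged_columns("restricted"),
    deny_outside_countries({"FR", "CH"}),
    deny_outside_hours(time(8), time(19)),
]

print(is_allowed(Request("FR", time(10, 30), frozenset({"social"})), rules))
print(is_allowed(Request("FR", time(10, 30), frozenset({"restricted"})), rules))
```

Because each rule looks only at classification tags and request context, the same policy keeps applying when new tables receive an existing tag, which is the point of basing policies on data classification.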


Apache Atlas, RESTful interface

Extensible enterprise classification of data assets, relationships and policies, organized in a meaningful way and aligned to the business organization.

Supports exploration via user interface

Supports extensibility via API and CLI exposure
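As an illustration of API-based access, the sketch below builds the URL of a basic search request. It assumes the v2 REST API found in later Atlas releases (`/api/atlas/v2/search/basic` with `typeName`, `query`, and `limit` parameters), which may differ from the version presented in these slides; the host, port, and the helper name `basic_search_url` are assumptions, so check them against the documentation of the deployed release.

```python
from urllib.parse import urlencode


def basic_search_url(base_url, type_name, query=None, limit=10):
    """Build a basic-search URL (assumes the Atlas v2 REST API layout)."""
    params = {"typeName": type_name, "limit": limit}
    if query:
        params["query"] = query
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{urlencode(params)}"


# Host and port are assumptions; 21000 is a commonly used Atlas default.
url = basic_search_url("http://localhost:21000", "hive_table", query="tweets")
print(url)
```

The function only builds the URL so that it runs without a server. Sending the request would additionally require credentials for the Atlas instance and an HTTP client such as `urllib.request`.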


A use case

Our process

[Diagram: data flow from the data source to the data platform]
● Data source: Twitter
● Import: HDFS: Raw data
● Collect from Twitter (reference data): Hive:url, Hive:Hash tags, Hive:users, Hive:tweets
● Analyze (build social network): Hive:Social network
● Data Platform


A use case

Search based on tables


A use case

Search based on services


A use case

Table Metadata


A use case

Lineage
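Lineage can be read as a graph walk: each process links its input data sets to its output data sets, and the lineage of a table is everything reachable by following those links backwards. The sketch below applies this to the Twitter use case. The exact edges are an assumption, since the original diagram is not recoverable in full, and `PROCESSES` and `upstream` are hypothetical names rather than Atlas functions.

```python
from collections import deque

# process name -> (input data sets, output data sets); edges are assumed.
PROCESSES = {
    "import": (["twitter"], ["hdfs:raw_data"]),
    "collect_from_twitter": (
        ["hdfs:raw_data"],
        ["hive:url", "hive:hash_tags", "hive:users", "hive:tweets"],
    ),
    "build_social_network": (
        ["hive:users", "hive:tweets"],
        ["hive:social_network"],
    ),
}


def upstream(dataset, processes=PROCESSES):
    """Return every data set that the given one is derived from."""
    # Map each output to the inputs of the process that produced it.
    parents = {}
    for inputs, outputs in processes.values():
        for out in outputs:
            parents.setdefault(out, set()).update(inputs)

    # Breadth-first walk from the data set back to its original sources.
    seen, queue = set(), deque([dataset])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream("hive:social_network")))
```

Reversing the mapping (inputs to outputs) gives the downstream view, which answers the impact question: which tables are affected if the raw data changes.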


Thank you!