Copyright Synaltic 2015
Manage traceability with Apache Atlas, a flexible metadata repository
Charly Clairmont, Synaltic, synaltic.fr
Charly Clairmont
More than ten years of experience in IT, mainly in BI
Co-founder of Altic, now Synaltic
Co-founder of the Hadoop User Group France
Believes in open source as a way to help enterprises create value
Helps open source projects gain visibility through meetups and conferences
Synaltic
An integrator mainly focused on data management
Founded in 2004, Synaltic is the result of the merger of two companies, Synotis and Altic
25 data management specialists
A Swiss subsidiary based in Lausanne
Our values
● Commitment
● Expertise
● Loyalty
[Diagram: Synaltic services (R&D, Training, Support, Project, Expertise) across four practices: Data Intelligence, Data Platform, Data Governance, Data Exchange]
What about your data?
Do you know where your data is?
Do you know who is responsible for this specific dataset?
Do you know which application or task modified this entity last Friday?
Enterprise Data Governance
Provide a common approach to data governance across all systems and data within the organization
– Transparent
– Reproducible
– Auditable
– Consistent
Enterprise Data Governance, in Hadoop
No standard way to address this requirement
– Each project proposes its own approach to data governance
– No integration with existing enterprise data governance frameworks
Apache Atlas
Data classification
Metadata Exchange
Centralized Auditing
Search & Lineage
Security & Policy engine
Apache Atlas, Overview
Data Classification
● Business-oriented taxonomy annotations
● Relationships between data sets and underlying elements, including source, target, and derivation processes
● Export of metadata to third-party systems
Centralized Auditing
● Security access information for every application and process
● Operational information for executions, steps, and activities
Search & Lineage (Browse)
● Navigation paths to explore the data classification and audit information
● Text-based search to locate relevant data
● Visualization of data set lineage
Security & Policy Engine
● Compliance policies enforced at runtime, based on data classification schemes
● Advanced policy definitions to prevent data derivation
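As an illustration of data classification, the sketch below builds the JSON bodies that the Atlas v2 REST API accepts for declaring a classification type (`POST /api/atlas/v2/types/typedefs`) and for attaching it to an entity (`POST /api/atlas/v2/entity/guid/{guid}/classifications`). These v2 endpoints belong to Atlas releases later than the one shown in these slides, and the `PII` tag and its `sensitivity` attribute are hypothetical examples; the code only builds payloads and sends nothing.

```python
import json


def classification_typedef(name, description, attributes=()):
    """Build a typedefs body that declares one classification (a business tag).

    `attributes` is a sequence of (attribute_name, atlas_type_name) pairs.
    """
    return {
        "enumDefs": [],
        "structDefs": [],
        "classificationDefs": [
            {
                "name": name,
                "description": description,
                "superTypes": [],
                "attributeDefs": [
                    {"name": attr, "typeName": type_name,
                     "isOptional": True, "cardinality": "SINGLE"}
                    for attr, type_name in attributes
                ],
            }
        ],
        "entityDefs": [],
    }


def classification_assignment(name, **values):
    """Build the list body that attaches one classification to an entity."""
    return [{"typeName": name, "attributes": dict(values)}]


# Hypothetical "PII" tag carrying a free-text sensitivity level.
typedef_body = classification_typedef(
    "PII", "Personally identifiable information", [("sensitivity", "string")])
assign_body = classification_assignment("PII", sensitivity="high")
print(json.dumps(assign_body))
```

Keeping the tag definition separate from its assignment mirrors the split between the taxonomy (defined once) and the annotations (applied to many data sets).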
Apache Atlas, Knowledge Store
Knowledge store categorized with an appropriate business-oriented taxonomy
● Data sets & objects
● Tables / Columns
● Logical context
● Source, destination
Supports the exchange of metadata between foundation components and third-party applications or governance tools
Tech: Titan with Apache HBase
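Tables and columns in the knowledge store are identified by a unique `qualifiedName`. The Atlas Hive hook in recent releases uses the pattern `db.table@cluster` for tables and `db.table.column@cluster` for columns; treat that convention as an assumption to check against your version. The helpers below are a hypothetical sketch that builds and parses such names (lowercasing the Hive identifiers, as Hive stores them), with `demo` as a made-up cluster name.

```python
from typing import NamedTuple, Optional


class HiveName(NamedTuple):
    database: str
    table: Optional[str]
    column: Optional[str]
    cluster: str


def qualified_name(cluster, database, table=None, column=None):
    """Build db[.table[.column]]@cluster, lowercasing the Hive identifiers."""
    if column is not None and table is None:
        raise ValueError("a column name requires a table name")
    parts = [p for p in (database, table, column) if p is not None]
    return ".".join(p.lower() for p in parts) + "@" + cluster


def parse_qualified_name(name):
    """Split a qualifiedName back into database, table, column and cluster."""
    path, sep, cluster = name.rpartition("@")
    if not sep or not path:
        raise ValueError(f"expected <path>@<cluster>, got {name!r}")
    pieces = path.split(".")
    if len(pieces) > 3:
        raise ValueError(f"too many dotted parts in {path!r}")
    pieces += [None] * (3 - len(pieces))  # pad missing table/column with None
    return HiveName(pieces[0], pieces[1], pieces[2], cluster)


print(qualified_name("demo", "Social", "Tweets", "user_id"))  # social.tweets.user_id@demo
print(parse_qualified_name("social.tweets@demo"))
```

A stable, human-readable key like this is what lets third-party tools exchange metadata about the same table without sharing internal GUIDs.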
Apache Atlas, Data Lifecycle Management
Provenance
Multi-cluster replication
Data set retention/eviction
Late data handling
Automation
Tech: Apache Falcon
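Apache Falcon declares retention on a feed (for example a limit such as `days(30)` with a delete action) and runs eviction itself. The following is not Falcon code: it is a self-contained, hypothetical illustration of the eviction rule, assuming time-partitioned data and a limit written as `minutes(N)`, `hours(N)` or `days(N)`.

```python
import re
from datetime import datetime, timedelta


def parse_limit(limit):
    """Parse a retention limit written like 'days(30)' into a timedelta."""
    match = re.fullmatch(r"(minutes|hours|days)\((\d+)\)", limit.strip())
    if match is None:
        raise ValueError(f"unsupported retention limit: {limit!r}")
    unit, amount = match.groups()
    return timedelta(**{unit: int(amount)})


def partitions_to_evict(partition_times, limit, now):
    """Return, oldest first, the partitions that fall outside the retention window."""
    cutoff = now - parse_limit(limit)
    return sorted(t for t in partition_times if t < cutoff)


# Ten daily partitions, evaluated on 2015-06-10 with a 7-day retention.
daily = [datetime(2015, 6, day) for day in range(1, 11)]
print(partitions_to_evict(daily, "days(7)", now=datetime(2015, 6, 10)))
```

Computing the eviction list separately from deleting the data makes it easy to record each eviction as a provenance event before it happens.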
Apache Atlas, Audit Store
Historical repository for all governance events
● Security: access grant & deny
● Operational: data provenance & metrics
● Indexed and searchable
Tech: YARN ATS, Apache HBase, Apache Hive, Solr, Elasticsearch (pluggable)
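To show what "indexed and searchable" means for governance events, here is a minimal in-memory sketch: a hypothetical `AuditEvent` record covering access grants and denials, plus an inverted index from field values to events. A real audit store would delegate indexing to Solr or Elasticsearch, as listed above; none of these names come from Atlas.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str  # ISO-8601, e.g. "2015-06-12T10:15:00"
    user: str
    resource: str   # e.g. a Hive table qualifiedName
    action: str     # e.g. "select", "insert"
    result: str     # "grant" or "deny"


class AuditIndex:
    """Tiny inverted index: (field, value) -> positions of matching events."""

    FIELDS = ("user", "resource", "action", "result")

    def __init__(self):
        self._events = []
        self._postings = defaultdict(set)

    def add(self, event):
        position = len(self._events)
        self._events.append(event)
        for field in self.FIELDS:
            self._postings[(field, getattr(event, field))].add(position)

    def search(self, **criteria):
        """Return events matching every field=value criterion, in insertion order."""
        unknown = set(criteria) - set(self.FIELDS)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        if not criteria:
            return list(self._events)
        matches = set.intersection(
            *(self._postings.get((f, v), set()) for f, v in criteria.items()))
        return [self._events[i] for i in sorted(matches)]
```

For example, `index.search(result="deny", resource="social.tweets@demo")` would list every refused access to that (hypothetical) table.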
Apache Atlas, Security
Establish global security policies based on data classification.
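A minimal sketch of classification-based security, assuming a hypothetical model: resources are dotted names (`db.table.column`), a column inherits the tags of its table and database, and each tag lists the groups allowed to read tagged data. Access is granted only if the user belongs to an allowed group for every tag; untagged resources are open by default. This illustrates the idea only and is not how Atlas or Ranger evaluate policies.

```python
def effective_tags(resource, tags_by_resource):
    """Classifications that apply to a resource, including inherited ones.

    Hypothetical propagation rule: a column inherits the tags of its table,
    and a table inherits the tags of its database.
    """
    parts = resource.split(".")
    tags = set()
    for i in range(1, len(parts) + 1):
        tags |= tags_by_resource.get(".".join(parts[:i]), set())
    return tags


def is_allowed(user_groups, resource, tags_by_resource, allowed_groups_by_tag):
    """Grant access only if, for every tag, the user is in one of its allowed groups."""
    for tag in effective_tags(resource, tags_by_resource):
        if not user_groups & allowed_groups_by_tag.get(tag, set()):
            return False
    return True


# Hypothetical tags and policy for the Twitter use case.
tags = {"social.users": {"PII"}, "social.tweets.geo": {"LOCATION"}}
policy = {"PII": {"privacy_officers"}, "LOCATION": {"analysts", "privacy_officers"}}
print(is_allowed({"analysts"}, "social.users.screen_name", tags, policy))
```

Because the policy is written against tags rather than table names, tagging a new column as `PII` protects it without touching the policy itself.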
Apache Atlas, Policy Engine
Runtime rationalization of policy rules with respect to data asset combinations and time. Fully extensible.
● Metadata-based rules
● Geo-based rules
● Time-based rules
● Column / attribute prohibitions
● Preview: Hive row and column masking
Tech: Apache Ranger
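Column masking and time-based rules can be illustrated together. The sketch below is hypothetical: each rule masks one column unless the user belongs to an exempt group, and a rule can carry an expiry (`valid_until`) after which it no longer applies. This is not how Ranger implements masking; it only shows the decision logic.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Optional


def redact(value: str) -> str:
    """Replace every character with '*'."""
    return "*" * len(value)


def hash_value(value: str) -> str:
    """Replace a value with a short, stable SHA-256 prefix."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


@dataclass
class MaskingRule:
    column: str
    mask: Callable[[str], str]
    exempt_groups: frozenset = frozenset()
    valid_until: Optional[datetime] = None  # time-based rule; None means no expiry

    def applies(self, groups, now):
        if self.valid_until is not None and now >= self.valid_until:
            return False
        return not (groups & self.exempt_groups)


def apply_masking(row: Dict[str, str], rules, groups, now) -> Dict[str, str]:
    """Return a copy of `row` with every applicable column mask applied."""
    masked = dict(row)
    for rule in rules:
        if rule.column in masked and rule.applies(groups, now):
            masked[rule.column] = rule.mask(masked[rule.column])
    return masked


rules = [
    MaskingRule("screen_name", redact, exempt_groups=frozenset({"privacy_officers"})),
    MaskingRule("email", hash_value),
    MaskingRule("geo", redact, valid_until=datetime(2015, 7, 1)),
]
row = {"screen_name": "synaltic", "email": "someone@example.org", "geo": "46.52,6.63"}
print(apply_masking(row, rules, groups={"analysts"}, now=datetime(2015, 6, 15)))
```

Hashing rather than redacting keeps a column joinable (equal inputs give equal outputs) while hiding the raw value.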
Apache Atlas, RESTful interface
Extensible enterprise classification of data assets, relationships, and policies, organized in a meaningful way and aligned with the business organization.
Supports exploration via a user interface
Supports extensibility via API and CLI exposure
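In current Atlas releases the REST API is versioned under `/api/atlas/v2` (the release presented here used an earlier API). The sketch below builds, but does not send, authenticated GET requests for DSL search, basic search, lineage and entity lookup using only the Python standard library. The host, port 21000 (Atlas's usual default) and the `admin`/`admin` credentials are assumptions for a local test instance; pass a request to `urllib.request.urlopen` to actually call the server.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request

ATLAS = "http://localhost:21000/api/atlas/v2"  # assumed local instance


def _get(path, params=None, user="admin", password="admin"):
    """Build (but do not send) an authenticated GET request to the Atlas v2 API."""
    url = ATLAS + path
    if params:
        url += "?" + urlencode(params)
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return Request(url, headers={"Authorization": "Basic " + token,
                                 "Accept": "application/json"})


def dsl_search(query, limit=25):
    return _get("/search/dsl", {"query": query, "limit": limit})


def basic_search(type_name, text):
    return _get("/search/basic", {"typeName": type_name, "query": text})


def lineage(guid, direction="BOTH", depth=3):
    return _get("/lineage/" + quote(guid, safe=""), {"direction": direction, "depth": depth})


def entity(guid):
    return _get("/entity/guid/" + quote(guid, safe=""))


req = dsl_search('hive_table where name = "tweets"')
print(req.get_full_url())
```

Building requests in one place keeps authentication and URL encoding consistent across the search, lineage and metadata calls used in the use case below.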
A use case
Our process
[Diagram, Data Platform: Data source: Twitter; Import: collect from Twitter into HDFS (raw data); Reference data (Référentiel): Hive tables url, hash tags, users, tweets; Analysis: build the social network into the Hive table social network]
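Each step of this process can be registered in Atlas as a `Process` entity whose `inputs` and `outputs` reference data sets; lineage is built from these links. The sketch below builds such a body for `POST /api/atlas/v2/entity`, referencing tables by unique `qualifiedName` instead of GUID. The database and table names, the `demo` cluster, and the assumption that the social network is derived from the users and tweets tables are all hypothetical, since the diagram does not spell them out.

```python
import json

CLUSTER = "demo"  # hypothetical cluster name used in qualifiedNames


def table_ref(db, table):
    """Reference an existing hive_table by its unique qualifiedName instead of its GUID."""
    return {"typeName": "hive_table",
            "uniqueAttributes": {"qualifiedName": f"{db}.{table}@{CLUSTER}"}}


def process_entity(name, inputs, outputs):
    """Build a body for POST /api/atlas/v2/entity describing one pipeline step."""
    return {"entity": {
        "typeName": "Process",
        "attributes": {
            "qualifiedName": f"{name}@{CLUSTER}",
            "name": name,
            "inputs": [table_ref(*t) for t in inputs],
            "outputs": [table_ref(*t) for t in outputs],
        },
    }}


body = process_entity("build_social_network",
                      inputs=[("social", "users"), ("social", "tweets")],
                      outputs=[("social", "social_network")])
print(json.dumps(body, indent=2))
```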
A use case
Search based on tables
A use case
Search based on services
A use case
Table Metadata
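The table metadata shown in the UI is also available from `GET /api/atlas/v2/entity/guid/{guid}`, whose response carries the entity plus a `referredEntities` map keyed by GUID. The helper below extracts column names and types from such a response. The sample is hypothetical and heavily trimmed, and the `columns`, `name` and `type` attributes follow the Hive model shipped with Atlas (worth confirming against your version).

```python
def table_columns(response):
    """List (name, type) pairs for the columns of a hive_table entity response."""
    table = response["entity"]
    referred = response.get("referredEntities", {})
    columns = []
    for ref in table["attributes"].get("columns", []):
        column = referred.get(ref["guid"])  # full column entity, looked up by GUID
        if column is not None:
            columns.append((column["attributes"]["name"], column["attributes"]["type"]))
    return columns


sample = {  # hypothetical, heavily trimmed response
    "entity": {
        "typeName": "hive_table",
        "guid": "t-1",
        "attributes": {
            "name": "tweets",
            "qualifiedName": "social.tweets@demo",
            "columns": [{"guid": "c-1", "typeName": "hive_column"},
                        {"guid": "c-2", "typeName": "hive_column"}],
        },
    },
    "referredEntities": {
        "c-1": {"typeName": "hive_column", "attributes": {"name": "id", "type": "bigint"}},
        "c-2": {"typeName": "hive_column", "attributes": {"name": "text", "type": "string"}},
    },
}
print(table_columns(sample))
```

With a live server, the same function can be fed `json.load(urllib.request.urlopen(request))` for an entity request.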
A use case
Lineage
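Lineage is a graph walk. The Atlas v2 lineage endpoint returns a `relations` list of `fromEntityId`/`toEntityId` edges pointing downstream; the sketch below follows those edges backwards to find everything a data set depends on. Real responses use GUIDs; the readable node names and edges here are a hypothetical rendering of the use case.

```python
from collections import defaultdict, deque


def upstream(relations, start):
    """Return every node that `start` depends on, directly or indirectly.

    `relations` mimics the `relations` list of an Atlas v2 lineage response:
    dicts with "fromEntityId" and "toEntityId", each edge pointing downstream.
    """
    parents = defaultdict(list)
    for rel in relations:
        parents[rel["toEntityId"]].append(rel["fromEntityId"])
    seen, queue = set(), deque([start])
    while queue:  # breadth-first walk against the edge direction
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical edges for the Twitter use case (data sets and processes alike).
edges = [
    ("raw_tweets", "collect_from_twitter"),
    ("collect_from_twitter", "tweets"),
    ("collect_from_twitter", "users"),
    ("collect_from_twitter", "hash_tags"),
    ("collect_from_twitter", "urls"),
    ("tweets", "build_social_network"),
    ("users", "build_social_network"),
    ("build_social_network", "social_network"),
]
relations = [{"fromEntityId": a, "toEntityId": b} for a, b in edges]
print(sorted(upstream(relations, "social_network")))
```

Walking downstream instead (swapping the two keys) answers the impact question: which tables are affected if the raw tweets are wrong.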
Thank you!