1. Modern Data Governance: Data Management Realities in an Any
Data Architecture
2. What is Modern Data? Clickstream, web and social,
geo-location, IoT, server logs, etc. are considered modern (think
schema-on-read). ERP, CRM, SCM, and LOB-specific OLTP data are
considered traditional (think schema-on-write). Mainframe data is
considered legacy (think mission-critical).
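To make the schema-on-read versus schema-on-write distinction concrete, here is a minimal Python sketch under stated assumptions: the `Customer` record, both store classes, and the sample clickstream events are hypothetical and stand in for no particular product. The traditional store rejects records that do not fit its schema at load time, while the modern store accepts any JSON payload and lets each reader project the fields it needs.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List


@dataclass(frozen=True)
class Customer:
    """Hypothetical fixed schema for a traditional, CRM-style record."""
    customer_id: int
    name: str


class SchemaOnWriteStore:
    """Schema-on-write: every record is validated and typed before storage."""

    def __init__(self) -> None:
        self.rows: List[Customer] = []

    def write(self, record: Dict[str, Any]) -> None:
        # A missing field raises KeyError and a non-numeric id raises
        # ValueError, so nonconforming data never reaches storage.
        row = Customer(customer_id=int(record["customer_id"]),
                       name=str(record["name"]))
        self.rows.append(row)


class SchemaOnReadStore:
    """Schema-on-read: raw payloads are kept as-is; structure is imposed
    only when the data is queried."""

    def __init__(self) -> None:
        self.raw: List[str] = []

    def write(self, payload: str) -> None:
        self.raw.append(payload)  # no validation at ingest

    def read(self, fields: List[str]) -> Iterator[Dict[str, Any]]:
        # Each reader chooses the fields it needs; absent fields come back
        # as None instead of failing the load.
        for payload in self.raw:
            event = json.loads(payload)
            yield {f: event.get(f) for f in fields}


if __name__ == "__main__":
    crm = SchemaOnWriteStore()
    crm.write({"customer_id": "42", "name": "Acme"})
    clicks = SchemaOnReadStore()
    clicks.write('{"user": "u1", "url": "/home"}')
    clicks.write('{"user": "u2", "url": "/cart", "geo": "US"}')
    print(crm.rows)
    print(list(clicks.read(["user", "geo"])))
    # [{'user': 'u1', 'geo': None}, {'user': 'u2', 'geo': 'US'}]
```

The trade-off shows up in where errors surface: the traditional store fails fast at ingest, while the modern store defers interpretation, and therefore data-quality problems, to every reader.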
3. What is Modern Data? Modern Data refers to stream processing.
In a streaming data model, you store queries and then continuously
run data through the queries (think event-driven model). Both
modern and traditional data refer to batch processing. In a
traditional query model, you store data and then run queries on the
data as needed (think query-driven model).
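The difference between the two models can be sketched in a few lines of Python. This is an illustrative toy, not how any particular stream processor works: `StreamEngine`, `BatchStore`, and `ErrorCounter` are hypothetical names. In the streaming engine the query is registered first and its result is updated as each event arrives; in the batch store the events are stored first and the query runs only when asked.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]


class StreamEngine:
    """Streaming model: queries are stored up front and every arriving
    event is pushed through them (event-driven)."""

    def __init__(self) -> None:
        self.queries: List[Callable[[Event], None]] = []

    def register(self, query: Callable[[Event], None]) -> None:
        self.queries.append(query)

    def push(self, event: Event) -> None:
        for query in self.queries:
            query(event)


class BatchStore:
    """Query model: data is stored first and queries run later, on demand."""

    def __init__(self) -> None:
        self.data: List[Event] = []

    def load(self, event: Event) -> None:
        self.data.append(event)

    def query(self, predicate: Callable[[Event], bool]) -> List[Event]:
        return [e for e in self.data if predicate(e)]


class ErrorCounter:
    """A standing query: counts ERROR-level log events as they arrive."""

    def __init__(self) -> None:
        self.count = 0

    def __call__(self, event: Event) -> None:
        if event.get("level") == "ERROR":
            self.count += 1


if __name__ == "__main__":
    logs = [{"level": "INFO"}, {"level": "ERROR"}, {"level": "ERROR"}]
    counter = ErrorCounter()
    stream = StreamEngine()
    stream.register(counter)
    batch = BatchStore()
    for event in logs:
        stream.push(event)  # the answer is updated as each event arrives
        batch.load(event)   # nothing is computed yet
    print(counter.count)
    print(len(batch.query(lambda e: e.get("level") == "ERROR")))
```

Both paths reach the same answer; what differs is whether the answer exists continuously (streaming) or only after someone asks (batch).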
4. What is Modern Data? Modern Data refers to data, not to
technologies. It is the responsibility of those of us who
architect, develop, and implement data technologies to appreciate
this difference. There have been many hard-won lessons learned in
enterprise data management. The criticality of Data Governance may
well top this list.
5. What is Data Governance? "The process by which an
organization formalizes the fiduciary duty for the management of
data assets critical to its success." (Forrester) "Data governance
is a system of decision rights and accountabilities for
information-related processes, executed according to agreed-upon
models which describe who can take what actions with what
information, and when, under what circumstances, using what
methods." (Data Governance Institute)
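One way to read the Data Governance Institute definition is as a table of decision rights with one column per question it asks. The Python sketch below assumes that reading; `DecisionRight`, `Request`, `is_permitted`, and the sample role and dataset names are hypothetical, and real governance programs also cover accountabilities and processes that a single check cannot capture.

```python
from dataclasses import dataclass
from datetime import time
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class DecisionRight:
    """One rule in a hypothetical decision-rights table; each field maps to
    one question in the DGI definition."""
    role: str                   # who
    action: str                 # can take what action
    dataset: str                # with what information
    start: time                 # when (daily window start)
    end: time                   # when (daily window end)
    conditions: FrozenSet[str]  # under what circumstances
    methods: FrozenSet[str]     # using what methods


@dataclass(frozen=True)
class Request:
    role: str
    action: str
    dataset: str
    at: time
    context: FrozenSet[str]
    method: str


def is_permitted(request: Request, rights: Sequence[DecisionRight]) -> bool:
    """Default deny: allow only if some rule matches every dimension."""
    return any(
        r.role == request.role
        and r.action == request.action
        and r.dataset == request.dataset
        and r.start <= request.at <= r.end
        and r.conditions <= request.context  # every required condition holds
        and request.method in r.methods
        for r in rights
    )


if __name__ == "__main__":
    rights = [DecisionRight("analyst", "read", "sales.orders",
                            time(8), time(18),
                            frozenset({"pii_masked"}), frozenset({"sql"}))]
    day = Request("analyst", "read", "sales.orders", time(9, 30),
                  frozenset({"pii_masked"}), "sql")
    night = Request("analyst", "read", "sales.orders", time(22),
                    frozenset({"pii_masked"}), "sql")
    print(is_permitted(day, rights), is_permitted(night, rights))
```

Keeping each dimension as an explicit field makes it easy to see which question a denied request failed to satisfy, which is the kind of transparency the definition asks for.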
6. 3V + 1 = Enterprise-Ready @ Scale
7. Atlas Proposal Background: Hadoop is one of many platforms in
the modern enterprise data ecosystem and requires governance
controls commensurate with this reality. Currently, there is no
easy or complete way to provide comprehensive visibility into, and
control over, Hadoop audit, lineage, and security for workflows that
require both Hadoop and non-Hadoop processing. Many existing
solutions are point-based and require a monolithic application
workflow. Multi-tenancy and concurrency are problematic because
these offerings are not aware of activity outside their narrow
focus. As Hadoop gains popularity, governance will become
increasingly vital to its maturity and further adoption. The lack
of governance is a particular barrier to expanding the amount of
enterprise data under management.
8. Atlas Proposal: Apache Atlas allows agnostic governance
visibility into Hadoop. These abilities are enabled through a set
of core foundational services powered by a flexible metadata
repository. These services include:
- Search and lineage for datasets
- Metadata-driven data access control
- Indexed and searchable centralized auditing of operational events
- Data lifecycle management, from ingestion to disposition
- Metadata interchange with other metadata tools
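As a rough illustration of how search-and-lineage and metadata-driven access control can both fall out of a metadata repository, the following Python sketch keeps process entities that link input datasets to output datasets, walks those links upstream, and propagates classification tags such as `PII` to derived datasets. It is a hypothetical, in-memory model and does not use or imitate the Apache Atlas API; `MetadataRepository`, `Process`, and the dataset names are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Process:
    """Hypothetical process entity: reads input datasets, writes outputs."""
    name: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class MetadataRepository:
    processes: List[Process] = field(default_factory=list)
    # dataset name -> classification tags (e.g. "PII")
    tags: Dict[str, Set[str]] = field(default_factory=dict)

    def add_process(self, process: Process) -> None:
        self.processes.append(process)

    def upstream(self, dataset: str) -> Set[str]:
        """Every dataset `dataset` was derived from, directly or
        transitively (breadth-first search over process edges)."""
        seen: Set[str] = set()
        queue = deque([dataset])
        while queue:
            current = queue.popleft()
            for p in self.processes:
                if current in p.outputs:
                    for src in p.inputs:
                        if src not in seen:  # also guards against cycles
                            seen.add(src)
                            queue.append(src)
        return seen

    def inherited_tags(self, dataset: str) -> Set[str]:
        """Tags on the dataset itself plus tags on anything upstream."""
        result = set(self.tags.get(dataset, set()))
        for src in self.upstream(dataset):
            result |= self.tags.get(src, set())
        return result


if __name__ == "__main__":
    repo = MetadataRepository()
    repo.add_process(Process("ingest", ["clickstream_raw"], ["clicks_clean"]))
    repo.add_process(Process("join", ["clicks_clean", "crm_customers"],
                             ["customer_journeys"]))
    repo.tags["crm_customers"] = {"PII"}
    print(sorted(repo.upstream("customer_journeys")))
    print(repo.inherited_tags("customer_journeys"))
```

Because a derived dataset inherits the tags of everything upstream of it, an access policy written against the `PII` tag would cover `customer_journeys` without anyone tagging it by hand.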