AtlasCHUG

  1. Modern Data Governance: Data Management Realities in an Any-Data Architecture
  2. What is Modern Data? Clickstream, web and social, geo-location, IoT, server logs, etc. are considered modern (think schema-on-read). ERP, CRM, SCM and LOB-specific OLTP are considered traditional (think schema-on-write). Mainframe is considered legacy (think mission-critical).
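The schema-on-write vs. schema-on-read distinction can be sketched as follows. This is a minimal illustration under assumed names: `SCHEMA`, `load_schema_on_write` and `query_schema_on_read` are hypothetical and not part of any product. The traditional path rejects a record at load time if it does not fit the schema; the modern path stores raw events untouched and projects a schema onto them only when a query runs.

```python
import json

# Hypothetical clickstream schema, used only for this illustration.
SCHEMA = {"user_id": int, "url": str}


def load_schema_on_write(table, record):
    """Schema-on-write: validate and coerce at load time; bad records never land."""
    missing = [field for field in SCHEMA if field not in record]
    if missing:
        raise ValueError(f"rejected at load time; missing {missing}")
    table.append({field: cast(record[field]) for field, cast in SCHEMA.items()})


def query_schema_on_read(raw_lines):
    """Schema-on-read: raw events are stored untouched and the schema is
    projected onto them only when a query runs."""
    for line in raw_lines:
        event = json.loads(line)
        # Fields a given event lacks come back as None instead of failing the load.
        yield {field: event.get(field) for field in SCHEMA}


if __name__ == "__main__":
    table = []
    load_schema_on_write(table, {"user_id": "42", "url": "/home"})
    try:
        load_schema_on_write(table, {"url": "/cart"})
    except ValueError as err:
        print(err)  # rejected at load time; missing ['user_id']
    raw_log = ['{"user_id": 7, "url": "/cart", "geo": "US"}', '{"url": "/help"}']
    print(table)
    print(list(query_schema_on_read(raw_log)))
```

Note that the raw log still carries the `geo` field, so a later query with a different schema could use it; the schema-on-write table has already discarded anything outside its schema.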
  3. What is Modern Data? Modern data involves stream processing: in a streaming data model, you store queries and then continuously run data through them (think event-driven model). Both modern and traditional data involve batch processing: in a traditional query model, you store data and then run queries on it as needed (think query-driven model).
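The two models differ in what is stored first. The sketch below is a toy illustration (the classes `BatchStore` and `StreamEngine` are hypothetical, not a real streaming framework): the batch store keeps data and answers queries on demand, while the stream engine keeps standing queries and pushes each arriving event through them.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Predicate = Callable[[Event], bool]


class BatchStore:
    """Query-driven model: store the data first, then run queries as needed."""

    def __init__(self) -> None:
        self.rows: List[Event] = []

    def ingest(self, event: Event) -> None:
        self.rows.append(event)

    def query(self, predicate: Predicate) -> List[Event]:
        # The query scans whatever has been stored so far.
        return [row for row in self.rows if predicate(row)]


class StreamEngine:
    """Event-driven model: store the queries first, then push each event through them."""

    def __init__(self) -> None:
        self.queries: List[Tuple[Predicate, Callable[[Event], None]]] = []

    def register(self, predicate: Predicate, on_match: Callable[[Event], None]) -> None:
        self.queries.append((predicate, on_match))

    def ingest(self, event: Event) -> None:
        # Each standing query sees the event once, as it arrives; nothing is retained.
        for predicate, on_match in self.queries:
            if predicate(event):
                on_match(event)


if __name__ == "__main__":
    events = [{"sensor": "t1", "temp": 71}, {"sensor": "t2", "temp": 104}, {"sensor": "t1", "temp": 99}]
    too_hot = lambda e: e["temp"] > 95

    batch = BatchStore()
    for e in events:
        batch.ingest(e)
    print(batch.query(too_hot))  # answered after the fact, on demand

    alerts = []
    stream = StreamEngine()
    stream.register(too_hot, alerts.append)
    for e in events:
        stream.ingest(e)  # alerts fill in as events arrive
    print(alerts)
```

Both paths find the same two readings, but the stream engine reports each one at the moment it arrives, while the batch store can only answer once someone asks.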
  4. What is Modern Data? Modern data refers to data, not to technologies. It is the responsibility of those of us who architect, develop and implement data technologies to appreciate this difference. There have been many hard-won lessons in enterprise data management, and the criticality of data governance may well top the list.
  5. What is Data Governance? "The process by which an organization formalizes the fiduciary duty for the management of data assets critical to its success." (Forrester) "Data governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods." (Data Governance Institute)
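To make the Data Governance Institute definition concrete, here is a minimal sketch of a decision-rights check, assuming a simple in-memory table. The `DecisionRight` record, `is_permitted` function and the sample role, data class and method names are all hypothetical; each field maps to one clause of the definition (who, what action, what information, when, under what circumstances, using what methods).

```python
from dataclasses import dataclass
from datetime import time
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class DecisionRight:
    """One entry in a hypothetical decision-rights table (illustrative only)."""
    role: str                      # who
    action: str                    # can take what action
    data_class: str                # with what information
    window: Tuple[time, time]      # when (inclusive daily window)
    conditions: FrozenSet[str]     # under what circumstances (all must hold)
    methods: FrozenSet[str]        # using what methods


def is_permitted(rights: Iterable[DecisionRight], role: str, action: str,
                 data_class: str, at: time, context: FrozenSet[str], method: str) -> bool:
    """Grant only if some agreed-upon right matches every dimension of the request."""
    return any(
        r.role == role
        and r.action == action
        and r.data_class == data_class
        and r.window[0] <= at <= r.window[1]
        and r.conditions <= context  # every required condition is present
        and method in r.methods
        for r in rights
    )


# Hypothetical example: analysts may read customer PII during business hours,
# only with an approved ticket, and only through a masked view.
RIGHTS = [
    DecisionRight("analyst", "read", "customer_pii", (time(8, 0), time(18, 0)),
                  frozenset({"approved_ticket"}), frozenset({"masked_view"})),
]
```

With this table, an analyst reading through `masked_view` at 09:30 with an approved ticket is permitted; the same request via a raw export, at 22:00, or without the ticket is refused. Denying by default when no right matches is a design choice of this sketch.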
  6. 3V + 1 = Enterprise-Ready @ Scale
  7. Atlas Proposal Background: Hadoop is one of many platforms in the modern enterprise data ecosystem and requires governance controls commensurate with this reality. Currently, there is no easy or complete way to provide comprehensive visibility into, and control over, Hadoop audit, lineage, and security for workflows that require both Hadoop and non-Hadoop processing. Most solutions are point-based and require a monolithic application workflow. Multi-tenancy and concurrency are problematic because these offerings are not aware of activity outside their narrow focus. As Hadoop gains popularity, governance will become increasingly vital to its maturity and further adoption; governance gaps are a particular barrier to bringing more enterprise data under management.
  8. Atlas Proposal: Apache Atlas provides platform-agnostic governance visibility into Hadoop. These capabilities are enabled through a set of core foundational services powered by a flexible metadata repository. These services include:
     - Search and lineage for datasets
     - Metadata-driven data access control
     - Indexed and searchable centralized auditing of operational events
     - Data lifecycle management, from ingestion to disposition
     - Metadata interchange with other metadata tools
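The "search and lineage for datasets" service can be pictured as a graph of processes linking input datasets to output datasets. The sketch below is a toy, in-memory model for illustration only; `LineageGraph` and its methods are hypothetical and are not the Apache Atlas API or repository. It answers the two usual lineage questions: where did this dataset come from, and what is affected if it changes.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


class LineageGraph:
    """Toy dataset-lineage store: each recorded process links input datasets to
    output datasets. Illustrative only; this is not the Apache Atlas API."""

    def __init__(self) -> None:
        self.upstream: Dict[str, Set[str]] = defaultdict(set)
        self.downstream: Dict[str, Set[str]] = defaultdict(set)

    def add_process(self, inputs: Iterable[str], outputs: Iterable[str]) -> None:
        out_list = list(outputs)
        for src in inputs:
            for dst in out_list:
                self.upstream[dst].add(src)
                self.downstream[src].add(dst)

    def _walk(self, start: str, edges: Dict[str, Set[str]]) -> Set[str]:
        # Breadth-first traversal collecting every dataset reachable from start.
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def sources_of(self, dataset: str) -> Set[str]:
        """Everything the dataset was derived from (upstream lineage)."""
        return self._walk(dataset, self.upstream)

    def impact_of(self, dataset: str) -> Set[str]:
        """Everything derived from the dataset (downstream impact)."""
        return self._walk(dataset, self.downstream)
```

For example, after `add_process(["oracle.crm_customers", "hdfs.web_clicks"], ["hive.sessions"])` and `add_process(["hive.sessions"], ["hive.churn_report"])` (hypothetical dataset names mixing a non-Hadoop source with Hadoop ones), `sources_of("hive.churn_report")` returns all three upstream datasets and `impact_of("hdfs.web_clicks")` returns `hive.sessions` and `hive.churn_report`.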
  9. Atlas Proposal
  10. Architecture
  11. Class Diagram
  12. Types Instance