Master Meta Data

Page 1: Master Meta Data

Organize & manage master meta data centrally, built upon kong, cassandra, neo4j & elasticsearch.

Page 2: Master Meta Data

Hello! I am Akhil Agrawal

Managing master & meta data is a very common problem with no good open source alternative that I know of, so I am initiating this project: MasterMetaData

Started BIZense in 2008 & Digikrit in 2015

Page 3: Master Meta Data

1. Problem

Let’s start with the problem we are addressing: why mastermetadata?

Page 4: Master Meta Data

Less Frequently Changing

Master data and meta data share one behavior: both change infrequently, although they serve different purposes.

Infrequently changing data, whether it describes real world entities (master data) or other data (meta data), can be stored, accessed and managed in very similar ways.

Why MasterMetaData?

Page 5: Master Meta Data

No Open Source Option

There are MDM solutions (mostly from ERP vendors like SAP and Oracle, and analytics companies like Informatica and SAS), but the intersection of master data and meta data has been explored only recently.

There are no open source alternatives for smaller companies, nor anything that can be embedded in SaaS products.

Why MasterMetaData?

Page 6: Master Meta Data

2. Definitions

Let’s start with some definitions around data categories

Page 7: Master Meta Data

Definition of Data Categories

Meta Data: meta information about other forms of data (can describe master, transaction or lower level meta data)

Master Data: real world entities like customer, partner etc. (only the stable attributes are considered part of master data)

Transaction Data: real world interactions which have a very short lifespan and whose occurrence is linked with time/space (attribute values are unstable and changing; the definition/description is stable, but each new data point is unique)

Master Meta Data: combination of master and meta data defined at application, enterprise or global level (although the volume and variety of master & meta data are very different, they have a lot of common access patterns)
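A minimal Python sketch of the categories defined above. The class names, field names and sample values are illustrative assumptions, not part of the project:

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    META = "meta"
    MASTER = "master"
    TRANSACTION = "transaction"


@dataclass
class Record:
    category: Category
    name: str
    attributes: dict = field(default_factory=dict)


# Meta data: describes the shape of the "customer" master entity.
customer_schema = Record(
    Category.META, "customer_schema",
    {"describes": "customer", "fields": ["id", "name", "country"]},
)

# Master data: a real world entity, stable attributes only.
customer = Record(
    Category.MASTER, "customer",
    {"id": 42, "name": "Example Ltd", "country": "IN"},
)

# Transaction data: a short-lived interaction tied to a point in time.
order = Record(
    Category.TRANSACTION, "order",
    {"customer_id": 42, "amount": 99.5, "at": "2016-01-01T10:00:00Z"},
)


def is_master_meta(record: Record) -> bool:
    """Master meta data covers both the master and the meta categories."""
    return record.category in (Category.MASTER, Category.META)
```

Here `is_master_meta` is true for `customer_schema` and `customer` and false for `order`, which mirrors the split between slowly changing data and transaction data.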

Page 8: Master Meta Data
Page 9: Master Meta Data

3. Implementation

Let’s discuss the implementation – technologies & concepts involved

Page 10: Master Meta Data

Background

◎ Faced difficulty with managing master and meta data in previous projects

◎ Implemented custom solution while building mobile ad platform

◎ Currently implementing same features required for the communication platform

◎ Have worked with elasticsearch + kibana; kong + cassandra seem useful

Page 11: Master Meta Data

Built With the Following Technologies

neo4j: highly scalable native graph database that leverages data relationships as first-class entities, handles evolving data challenges

elasticsearch: search and analyze data in real time, de facto standard for making data accessible through search and aggregations

cassandra: right choice when you need linear scalability and high availability without compromising performance & durability

kong: the open-source management layer for APIs and microservices, delivering security, high performance and reliability

lua: a powerful, fast, lightweight, embeddable scripting language, used for writing kong plugins that provide access to the various master and meta data

kibana: explore and visualize data in elasticsearch, open source project from the elasticsearch team, intuitive interface, visualization & dashboards

Page 12: Master Meta Data

Opensource, Scalable, Searchable, Ready to Use

Project mastermetadata needs to be ready to use for at least a few use cases like location, device, movie, tour etc.

Page 13: Master Meta Data

Challenges

Complex & hierarchical data sets

Real-time query performance

Dynamic structure

Evolving relationships

Why neo4j for mastermetadata?

Why neo4j?

Native graph store

Flexible schema

Performance and scalability

High availability

Referenced from http://neo4j.com/use-cases/master-data-management
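To illustrate why a graph model suits hierarchical data with evolving relationships, here is a small sketch using a toy in-memory graph in plain Python. It is a stand-in for a graph database, not neo4j's API; the class, the relationship name `LOCATED_IN` and the sample nodes are assumptions made for illustration:

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory property graph; a stand-in for a real graph database."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.edges = defaultdict(list)  # node id -> [(relationship, target id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def ancestors(self, node_id, relationship):
        """Follow one relationship type upward until no further edge exists."""
        path = []
        current = node_id
        seen = {node_id}
        while True:
            nxt = next(
                (t for r, t in self.edges[current] if r == relationship), None
            )
            if nxt is None or nxt in seen:  # stop at the root or on a cycle
                return path
            path.append(nxt)
            seen.add(nxt)
            current = nxt


g = TinyGraph()
g.add_node("pune", kind="City")
g.add_node("maharashtra", kind="State")
g.add_node("india", kind="Country")
g.relate("pune", "LOCATED_IN", "maharashtra")
g.relate("maharashtra", "LOCATED_IN", "india")

print(g.ancestors("pune", "LOCATED_IN"))  # ['maharashtra', 'india']
```

A new relationship type can be added with another `relate` call and no schema change, which is the flexibility the slide refers to.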

Page 14: Master Meta Data

Why elasticsearch for mastermetadata?

Scale

◎ Real-Time Data

◎ Massively Distributed

◎ High Availability

◎ Multitenancy

◎ Per-Operation Persistence

Search

◎ Full-Text Search

◎ Document-Oriented

◎ Schema-Free

◎ Developer-Friendly, RESTful API

◎ Built on top of Apache Lucene™

Analytics

◎ Real-Time Advanced Analytics

◎ Very flexible Query DSL

◎ Flexible analytics & visualization platform - Kibana

◎ Real-time summary and charting of streaming data

Referenced from https://www.elastic.co/products/elasticsearch
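As a rough illustration of the Query DSL mentioned above, the sketch below builds a search request body that combines a full-text `match` query with a `terms` aggregation. The field names (`name`, `country`) are assumptions about how location master data might be indexed; the code only constructs the JSON body and does not contact a cluster:

```python
import json


def build_search_body(text, field="name", facet_field="country", size=10):
    """Return a search body: full-text match plus a terms aggregation."""
    return {
        "size": size,
        "query": {"match": {field: text}},
        "aggs": {
            "by_" + facet_field: {"terms": {"field": facet_field}},
        },
    }


body = build_search_body("mumbai")
print(json.dumps(body, indent=2))
```

The resulting body would be sent to an index's `_search` endpoint; search and aggregation in one request is what makes the same store usable for both lookup and summary views.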

Page 15: Master Meta Data

Why kong for mastermetadata?

Secure, Manage & Extend your APIs and Microservices

RESTful Interface

Plugin Oriented

Platform Agnostic

Referenced from https://getkong.org/

[Diagram: architecture without Kong vs. with Kong]

Page 16: Master Meta Data

4. Interesting

What interesting things are happening around this?

Page 17: Master Meta Data

Master & Metadata Management Intersection

Maximized Metadata Model

◎ data model describing the metadata needs to be “maximized” to cover as many use cases as possible

◎ meta data model needs to be inclusive of all metadata in the organization as well as cover the master data

◎ governance of the metadata model requires the ability to describe as much of the metadata in the system as possible, so that data describing other data can be governed

Minimalistic Master Data Model

◎ master data model describing master data needs to be “minimalist”

◎ master data model is neither inclusive of all data in the organization, nor specific to applications using it for a specific purpose

◎ central governance of master data requires that the data model backing it is minimalistic, to be able to govern without application specific details

◎ master data model is basically metadata describing the master data

Referenced from http://blogs.gartner.com/andrew_white/2011/04/26/more-on-metadata-and-master-data-management-intersection/

Page 18: Master Meta Data

From Big Data To Smart Data

Zero Latency Organization

Types Of Latency

data
◎ latency linked to the data (capturing)
◎ latency linked to analytical processes (processing)

structural
◎ latency linked to decision making processes
◎ time needed to implement actions linked with decisions

action
◎ data latency added with structural latency
◎ time needed from capturing of data till the action takes place

Smart Data

value: data is considered smart based on the value it brings in decision making and action taking (rather than anything else like size, source, etc.)

master: data which represents real world entities and also remains stable over time is smart data, as it helps with common data reference

meta: data which describes other data, whether master, transactional or lower level meta data, is also smart data, as it helps in understanding
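The latency types on this slide add up: action latency is data latency plus structural latency. A small worked sketch, where the field names and the sample durations are made-up values for illustration:

```python
from dataclasses import dataclass


@dataclass
class Latency:
    capture_s: float     # data latency: capturing the data
    processing_s: float  # data latency: analytical processing
    decision_s: float    # structural latency: decision making
    implement_s: float   # structural latency: implementing the action

    @property
    def data(self) -> float:
        return self.capture_s + self.processing_s

    @property
    def structural(self) -> float:
        return self.decision_s + self.implement_s

    @property
    def action(self) -> float:
        # Time from capturing the data until the action takes place.
        return self.data + self.structural


sample = Latency(capture_s=2.0, processing_s=8.0, decision_s=30.0, implement_s=20.0)
print(sample.data, sample.structural, sample.action)  # 10.0 50.0 60.0
```

In this made-up sample the structural part dominates, so faster data capture alone would barely move the action latency.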

Page 19: Master Meta Data
Page 20: Master Meta Data

5. Get Involved

Let’s discuss ways to get involved in this project

Page 21: Master Meta Data

Areas where you can get involved

DEMO
Functional Tests, Integration Tests, Run Demo

CODE
Implement Ideas, Fix Bugs, Enhance Features

DOCUMENT
User Documentation, Developer Documentation

Page 22: Master Meta Data

Current Focus

Devices

Storage: Device, Browser, OS

Access: User Agent

Locations

Storage: Country, State, City

Access: IP Address

Tours

Storage: People, Interest, Culture, Destination, City, Activity, Duration

Access: What, Where, For
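As a sketch of the "Locations" focus area (stored as Country, State, City and accessed by IP address), the lookup below uses Python's standard `ipaddress` module. The table and the `locate` function are hypothetical; the network ranges are reserved documentation ranges (RFC 5737), not real allocations, and a real data set would be far larger and need an indexed lookup:

```python
import ipaddress

# Hypothetical master data: network ranges mapped to (country, state, city).
# The ranges are reserved documentation networks, not real allocations.
LOCATIONS = {
    ipaddress.ip_network("192.0.2.0/24"): ("India", "Maharashtra", "Pune"),
    ipaddress.ip_network("198.51.100.0/24"): ("India", "Karnataka", "Bengaluru"),
}


def locate(ip):
    """Return (country, state, city) for an IP address, or None if unknown."""
    address = ipaddress.ip_address(ip)
    for network, location in LOCATIONS.items():
        if address in network:
            return location
    return None


print(locate("192.0.2.17"))   # ('India', 'Maharashtra', 'Pune')
print(locate("203.0.113.5"))  # None
```

The same split applies to the other focus areas: the stored entities are the master data, and the access key (IP address, user agent) is what a client actually has in hand.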

Page 23: Master Meta Data

Storage & Access

Master Data Storage: storage which is highly efficient for reads while remaining efficient for writes. It must also support searching the stored data and offer a flexible, efficient query interface for fast access

Meta Data Storage: storage which is highly flexible in defining relationships like inheritance, composition or other relationships. Graph modeled relationships are the most flexible to change as the model evolves

Diagram featured by poweredtemplate.com

Meta Data Access

CRUD, Fill in the blanks, Semantic Query, Search

Master Data Access

CRUD, Query (Structured / Unstructured) & Search

Page 25: Master Meta Data

Thanks! Any questions?

You can find me at:@[email protected]

Special thanks to all the people who made and released these awesome resources for free:

◎ Presentation template by SlidesCarnival
◎ Presentation models by SlideModel & PoweredTemplate
◎ To the companies behind kong, cassandra, neo4j & elasticsearch