Sai Paravastu, Principal, BAR360. Open Data Platform: Why open source has taken precedence in making a common data platform for enterprises? DAMA Sydney Chapter, 11th August 2015

BAR360 open data platform presentation at DAMA, Sydney


Sai Paravastu

Principal, BAR360

Open Data Platform

Why open source has taken precedence in making a common data platform for enterprises?

DAMA Sydney Chapter

11th August 2015

Sai Paravastu Principal

360° view of business insights

Data Modeling

Business Analytics

Portal Development

Data Services

Business Intelligence

Data Analysis

Confidential

Who is BAR360

BAR360 has been an Australian DATA services business since 2005. We understand the information management needs of Australian businesses and deliver tailored business DATA solutions by integrating and improving their data capabilities. BIRT is a well-known open source Business Intelligence engine, and we deliver development services, integration, maintenance and training offerings tailored to business needs in the Australia and New Zealand region. We also work with major BIRT OEM vendors such as IBM, Micro Focus and Schneider Electric, to name a few. We have started our practice in Hadoop and NoSQL for data ingestion and processing in the world of BIG Data.

Our Value Proposition

• We provide data cleansing, quality, loading, processing and reporting services.

• We source and manage DATA solutions focused on achieving ROI.

• We bring reliability and trustworthiness through simplicity in our engagements with clients.

• We develop, train and support in implementations.

• We evangelize open source technologies.

• We source local IT resources for projects and stand by our team.


Extended Partners

We have provided training and professional services to Open source BIRT adopters and BIRT OEM vendors across Australia.

Clients who used our Training and Professional Services

We are planning to grow and develop the open source community as well as build integrated solutions using open source technologies.

We have extended our training and professional services through signed partnership agreements with other BIRT OEM vendors and open source adopters.


Business Challenges Achieved

Business Intelligence

• Need to empower users with easy-to-use, self-service BI

– Users need actionable insight

– when they need it

– wherever they are

– in the form required

– Delivered based on role and security levels

– Straightforward enough for non-IT staff to use

• Reducing delivery time

– Single platform for all systems

– Easy to use for a range of skills

Business Analytics

Improve Member Loyalty

Reduce Churn

• Who are our loyal members?

• Does a member's cover type have an impact on their likelihood of leaving the fund?

• Does member claiming have an impact on their leaving the fund?

• What role do demographics play in churn?

• Is time with the fund a factor?

– Case study developed on “Member Retention Analysis”

– Case study developed on “Customer Segmentation and Next Best Offer”
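The churn questions above can be explored with a simple grouped churn rate. The following is only a minimal sketch in Python: the member records, field layout and the three-year tenure band are hypothetical illustrations, not data or methods from the case studies.

```python
from collections import defaultdict

# Hypothetical member records: (cover_type, years_with_fund, has_left_fund).
members = [
    ("hospital", 1, True),
    ("hospital", 6, False),
    ("extras", 2, True),
    ("extras", 1, True),
    ("combined", 8, False),
    ("combined", 3, False),
]


def churn_rate_by(records, key):
    """Return {group: share of members in that group who left the fund}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for cover_type, years, has_left in records:
        group = key(cover_type, years)
        totals[group] += 1
        leavers[group] += int(has_left)
    return {group: leavers[group] / totals[group] for group in totals}


# Does cover type relate to leaving the fund?
by_cover = churn_rate_by(members, lambda cover, years: cover)

# Is time with the fund a factor? (The 3-year band is an arbitrary assumption.)
by_tenure = churn_rate_by(
    members,
    lambda cover, years: "under 3 years" if years < 3 else "3+ years",
)

print(by_cover)
print(by_tenure)
```

Grouping by other attributes (claims history, demographics) would only need a different `key` function, provided the records carry those fields.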


Profile of the Principal


Sai is an experienced technology consultant with a strong track record of being instrumental in delivering projects. He is an experienced business- and architecture-focused Solution Integration Architect, a strategic thinker and a change agent, with experience in formulating IT architecture, driving solutions in line with EA and presenting to senior management. He manages engagements through to successful completion of projects within the timelines, meeting all requirements including ROI, business benefits and customer satisfaction.

With over two decades of experience in technology, Sai is a strategic planner around enterprise data strategy and system development life cycle improvement. He has a very good understanding of the challenges faced by IT as well as business stakeholders in the information management space. He is experienced across the Manufacturing, Education, Public Sector, Banking and Insurance verticals, and has been associated with many consulting service companies in solution-based sales for the last 7 years. He is a strategic implementation partner with OpenText and Hortonworks in the ANZ region.

Big Data Definition

Big data is a collection of data sets so large and complex that it becomes difficult to process using currently on-hand database management tools or traditional data processing applications

Source: Wikipedia

Web Logs, RFID, Sensors, Social Networks, Internet Text, Searches, Call Detail Records, Astronomy, Atmospheric Info, Genomics, Biogeochemical, Biological, Military Surveillance, Medical Records, E-Commerce, Video

Traditional Data vs. Big Data

Traditional Data | Big Data
Gigabytes to Terabytes | Petabytes to Exabytes
Centralized | Distributed
Structured | Semi-structured to Unstructured
Stable Data Model | Flat Schemas
Known Complex Interrelationships | Few Complex Interrelationships

Source: Wikibon Community

When to Use Big Data vs. Relational

Criterion | Big Data | Relational
Analysis Type | Exploratory analysis to uncover value in the data | Operational analysis of what was discovered
Data Granularity | Store HUGE amounts of highly granular data | Store transformed, (sometimes) aggregated data
Timeframe | Data flowing in, “real-time” monitoring | Long term trending analysis

Is Big Data a replacement for Relational Data?

Why BIG Data

In a nutshell, the quest for Big Data is directly attributable to analytics, which has evolved from being a business initiative to a business imperative.

Many vendors are talking about Big Data, but we’re not seeing much more than the ability to store large volumes of data, leaving the organization to “roll their own” applications without much help to make sense of it all. Real value can only emerge from a consumable analytics platform that saves you from having to build applications from scratch, one that effectively flattens the time-to-insight curve.

In my opinion, BIG Data is truly all about analytics.


New Approaches To Big Data Processing & Analytics

Traditional tools and technologies are straining

• New approaches to data processing

– Commodity hardware to scale

– Parallel processing techniques

– Non-relational data storage capabilities

– Unstructured, semi-structured data

• Better analytics

– Advanced visualization

– Data mining

Source: Wikibon Community


New Approaches to Big Data Processing & Analytics

– Hadoop Approach

• Data broken into “parts”

• Loaded into file system

• Multiple nodes

• MapReduce

• Batch-style historical analysis

– NoSQL

• Cassandra, MongoDB, CouchDB, HBase*

• Discrete data stored among large volumes

• Higher performance than relational data sources

– Massively Parallel Analytic Databases

• Quickly ingest mostly structured data

• Minimal data modeling

• Scale to petabytes of data

• Near real-time results to complex SQL

Source: Wikibon Community
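The Hadoop approach listed above (data broken into parts, spread over multiple nodes, processed with MapReduce) can be illustrated with a small single-process sketch. This is plain Python, not the Hadoop API: the function names and the sample log lines are assumptions made for illustration, and a real cluster would run the map tasks in parallel on separate nodes.

```python
from collections import defaultdict
from itertools import chain


def split_into_parts(lines, num_parts):
    """Break the input into parts, one per simulated node."""
    return [lines[i::num_parts] for i in range(num_parts)]


def map_phase(part):
    """Map: emit a (word, 1) pair for every word in this part."""
    return [(word.lower(), 1) for line in part for word in line.split()]


def shuffle(mapped_parts):
    """Shuffle: group all emitted values by key across every part."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(mapped_parts):
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical log lines standing in for a large historical data set.
log_lines = [
    "error disk full",
    "warning disk slow",
    "error network down",
]

parts = split_into_parts(log_lines, num_parts=2)
counts = reduce_phase(shuffle(map_phase(part) for part in parts))
print(counts)
```

Because each part is mapped independently, the same logic scales by adding nodes, which is why the approach suits batch-style historical analysis rather than low-latency queries.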


Big Data Growth Drivers

• Increased awareness of the Big Data benefits

– Not just web, financial services, pharmaceuticals, retail

• Increased maturity of Big Data software

– Data stores, analytical engines

• Increased availability of professional services

– Supporting business use cases

• Increased investment in infrastructure

– Google, Facebook, Amazon

Source: Wikibon Community


Top Big Data Challenges

• Data integration

– Top challenge

– Integrating disparate data from different sources in different formats is difficult

• Getting started with the right project

– Building the right team

– Determine the top business problem

• Architecting a big data system

– High volume, high frequency data

– Build unified information architecture

• Lack of skills or staff

– Some hire externally / university hires

– Others try to re-train from within

– Cross-pollinate skills from another part of the organization

– Build centers of excellence that help with the training

• Data privacy, governance and compliance issues

• How it can help business

• Integrating legacy systems

• The cost of implementation

Source: TDWI
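To make the data integration challenge concrete, here is a minimal sketch of bringing two differently formatted sources into one common record layout. The sources, field names and target schema are hypothetical, and only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export from a legacy system.
csv_source = "member_id,full_name,joined\n101,Ann Lee,2012-03-01\n"

# Hypothetical source 2: a JSON feed that names the same fields differently.
json_source = '[{"id": 202, "name": "Bob Ray", "join_date": "2014-07-15"}]'


def from_csv(text):
    """Map CSV rows onto the common schema: member_id, name, joined."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "member_id": int(row["member_id"]),
            "name": row["full_name"],
            "joined": row["joined"],
        }


def from_json(text):
    """Map JSON objects onto the same common schema."""
    for item in json.loads(text):
        yield {
            "member_id": int(item["id"]),
            "name": item["name"],
            "joined": item["join_date"],
        }


# One unified, consistently typed list regardless of the original format.
unified = sorted(
    [*from_csv(csv_source), *from_json(json_source)],
    key=lambda record: record["member_id"],
)
print(unified)
```

Each new source needs its own small adapter, which is one reason integration effort grows with the number of disparate formats.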


Apache Software Foundation – ASF

There are currently 300+ open source initiatives at the ASF:

• 163 committees managing 273 projects

• 5 special committees

• 43 incubating podlings

Source: ASF


Open Data Platform

Enabling BIG Data solutions to flourish atop a common core platform

• The Open Data Platform Initiative (ODP) is an enterprise-focused, shared industry effort to simplify adoption, promote the use and advance the state of Apache Hadoop® and Big Data technologies for the enterprise. It is a non-profit organization being created by people who helped to create Apache, Eclipse, Linux, OpenStack, OpenDaylight, Open Networking Foundation, OSGi, WS-I, UDDI, OASIS, Cloud Foundry Foundation and many others.

• Innovation takes place under the governance of the Apache Software Foundation community to deliver a common data platform for enterprises. This brings the largest number of developers together to commit far faster than any single vendor could achieve, in a way that is free of friction for the enterprise, while vendors build extensions on the core of the ODP.

Source: ASF

HIVE Query

PIG Scripting

MAHOUT Machine Learning

MAP REDUCE Distributed processing

YARN Resource scheduling and negotiation

HDFS Distributed Storage

HCATALOG Metadata mgmt

HBASE NoSQL database

SQOOP Import / Export

FLUME/STORM stream

KAFKA Sub / Pub

ZOOKEEPER Coordination

OOZIE WF automation

AMBARI

DRILL Interactive

SPARK / FLINK

FALCON

KNOX

TEZ Interactive

AVRO data serialization


Benefits of ODP

Enabling BIG Data solutions to flourish atop a common core platform

The ODP core is a set of open source Hadoop technologies designed to provide a standardized core that big data solution providers (software and hardware developers) can use to deliver compatible solutions rooted in open source that unlock customer choice.

Source: ODP

How do we benefit:

ASF

– 100% focus on enabling collaboration between developers

– Does not recognize corporations

– Projects are on completely asynchronous development cycles

ODP

– Enables collaboration between vendors

– Focused on developing a platform, but does not supersede governance

– Creates complementary brand value for an integrated platform

– Focused on enterprise use cases for Hadoop


Benefits of ODP

ODP Core will initially focus on Apache Hadoop (inclusive of HDFS, YARN, and MapReduce) and Apache Ambari. Once the ODP members and processes are well established, the scope of the ODP Core may expand to include other open source projects. The ODP Core will deliver the following benefits:

• For Apache Hadoop technology vendors, reduced R&D costs that come from a shared qualification effort

• For Big Data application solution providers, reduced R&D costs that come from more predictable and better qualified releases

• Improved interoperability within the platform and simplified integration with existing systems in support of a broad set of use cases

• Less friction and confusion for Enterprise customers and vendors

• Ability to redirect resources towards higher value efforts

Source: ODP

ODP Delivers

1. Provide a stable base against which Big Data solutions providers can qualify solutions.

2. Support community development and outreach activities that accelerate the rollout of modern data architectures that leverage Apache Hadoop.

3. Contribute to ASF projects in accordance with ASF processes and Intellectual Property guidelines.

4. Accelerate the delivery of Big Data solutions by providing a well-defined core platform to target.

5. Define, integrate, test, and certify a standard "ODP Core" of compatible versions of select Big Data open source projects.

6. Produce a set of tools and methods that enable members to create and test differentiated offerings based on the ODP Core.

7. Reinforce the role of the Apache Software Foundation (ASF) in the development and governance of upstream projects.

8. Help minimize the fragmentation and duplication of effort within the industry.

The ODP Core will take the guesswork out of the process and accelerate many use cases by running on a common platform, freeing up enterprises and ecosystem vendors to focus on building business-driven applications.

Source: ODP
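The idea of a certified core of compatible versions can be pictured as a manifest that each vendor distribution is checked against. The sketch below is purely illustrative: the manifest format, the version numbers and the `check_against_core` helper are assumptions, not an ODP tool or the actual ODP Core specification.

```python
# Hypothetical core manifest: component -> required major.minor line.
# The version numbers are placeholders, not the real ODP Core definition.
ODP_CORE = {"hadoop": "2.6", "ambari": "2.0"}


def major_minor(version):
    """Reduce a version such as '2.6.0' to its major.minor line, '2.6'."""
    return ".".join(version.split(".")[:2])


def check_against_core(distribution, core=ODP_CORE):
    """Return a list of mismatches between a distribution and the core."""
    problems = []
    for component, required in core.items():
        shipped = distribution.get(component)
        if shipped is None:
            problems.append(f"{component}: missing (core requires {required}.x)")
        elif major_minor(shipped) != required:
            problems.append(
                f"{component}: ships {shipped}, core requires {required}.x"
            )
    return problems


# Hypothetical vendor distributions; extra components beyond the core are allowed.
vendor_a = {"hadoop": "2.6.0", "ambari": "2.0.1", "hive": "1.2.0"}
vendor_b = {"hadoop": "2.4.1"}

print(check_against_core(vendor_a))
print(check_against_core(vendor_b))
```

A solution provider that qualifies against the manifest once could then expect the same core behaviour from any distribution that passes the check, which is the reduction in fragmentation the list above describes.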

Thank You Partners


BAR360 Sai Paravastu

Principal +61 402 449 524

[email protected] www.bar360.com.au