Evolution of Big Data ICT Business Breakfast Durban, 17 September 2014 Willy Govender

Big Data Evolution

Page 1: Big Data Evolution

Evolution of Big Data

ICT Business Breakfast Durban, 17 September 2014 Willy Govender

Page 2: Big Data Evolution

What is Big Data?

“Large volumes of a wide variety of data collected from various sources across the enterprise including transactional data from enterprise applications/databases, social media data, mobile device data, unstructured data/documents, machine-generated data and more.”

Source: IDG: Big Data – Growing Trends and Emerging Opportunities

Page 3: Big Data Evolution

Data Sources

Structured

• Spreadsheets

• Relational Databases

• ERP

• CRM

• Legacy systems

• File share

Unstructured

• Documents

• Machine Data

• Messaging

• Photographs

• Video

• Social Media

• Web traffic logs

"90% of all data ever created was created in the past two years. From now on, the amount of data in the world will double every two years."

Enterprise Cloud
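As a rough illustration of the quoted claim that data volume doubles every two years, the arithmetic can be sketched in Python. The function names are illustrative, and the ten-year horizon is an assumption chosen to match the 2010 to 2020 span used on later slides.

```python
# Convert "doubles every two years" into an annual growth rate
# and a cumulative growth factor. Only the doubling period comes
# from the quoted claim; everything else is plain arithmetic.
DOUBLING_PERIOD_YEARS = 2


def annual_growth_rate(doubling_period_years: float) -> float:
    """Return the compound annual growth rate implied by a doubling period."""
    return 2 ** (1 / doubling_period_years) - 1


def growth_factor(years: float, doubling_period_years: float) -> float:
    """Return how many times larger the volume is after `years`."""
    return 2 ** (years / doubling_period_years)


rate = annual_growth_rate(DOUBLING_PERIOD_YEARS)
print(f"Implied annual growth: {rate:.1%}")
print(f"Growth over 10 years: {growth_factor(10, DOUBLING_PERIOD_YEARS):.0f}x")
```

Doubling every two years works out to roughly 41% growth per year, or a 32-fold increase over a decade.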

Page 4: Big Data Evolution

The Evolution of Big Data

Big data is traditionally characterized by the 3 Vs (now extended to 5 or 7 Vs)

Volume (amount of data collected – terabytes/exabytes)

Velocity (speed/frequency at which data is collected)

Variety (different types of data collected)

Now experts are adding “veracity, variability, visualization, and value”

Big data is not new

Supercomputers have been collecting scientific/research data for decades

However, it is now being used to gain commercial competitive advantage

And now we are able to collect a variety of data from multiple devices and sources

Big data is the evolution of the BI ecosystem from data warehousing (DW)

It does not make DW obsolete

Big Data approaches are reducing the costs of data management

Data still needs to be standardized, data quality maintained, and access provided to constituent communities.

Data management will continue to be an evolutionary process.

Big data is simply a new data challenge that requires leveraging existing systems in a different way

Page 5: Big Data Evolution

So, what does Big Data do?

Focuses on finding hidden threads, trends, or patterns which may be invisible to the naked eye

Uses a data store spread across clusters of servers (e.g. Apache Hadoop, used for Amazon Cloud)

Runs a set of tasks that process the data in different segments of the cluster, then break down the results into more manageable chunks, which are summarized

Requires mathematical and statistical expertise as well as creative, communicative, problem-solving, and business skills

Obviates the need for data alignment or data migration, or the requirement to move data into one place for cross-referencing. This is achieved through indexes and crawlers (like Google), which constantly mine data and update the indexes.
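The idea of processing data in separate segments of a cluster and then summarizing the partial results can be sketched in a few lines of Python. This is a single-process illustration of the map and reduce pattern, not Hadoop itself; the sample log lines and all function names are assumptions made for the example.

```python
from collections import Counter
from functools import reduce


def split_into_segments(records, n_segments):
    """Partition records round-robin, standing in for data spread over cluster nodes."""
    return [records[i::n_segments] for i in range(n_segments)]


def map_segment(segment):
    """Map step: count words within one segment, independently of the others."""
    counts = Counter()
    for line in segment:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-segment counts into one summary."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical log lines; on a real cluster each segment would live on a different node.
logs = ["login ok", "login failed", "payment ok", "login ok"]
partials = [map_segment(seg) for seg in split_into_segments(logs, 2)]
summary = reduce_counts(partials)
print(summary.most_common(2))
```

Because each segment is counted on its own, the map step could run on many machines at once; only the small partial summaries need to be brought together.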

Page 6: Big Data Evolution

Framework and Data Flows

Data Models, Structures, Types

• Data formats, non/relational, file systems, etc.

• Big Data Management

Big Data Lifecycle (Management)

• Big Data transformation/staging

• Recording, Storage, Archiving

Big Data Analytics and Tools

• Big Data Applications

• Target use, presentation, visualisation

Big Data Infrastructure (BDI)

• Storage, Compute, (High Performance Computing,) Network

• Sensor network, target/actionable devices

• Big Data Operational support

Big Data Security

• Data security in-rest, in-move, trusted processing environments

Collection and Registration

Filtering, Classification and Enrichment

Analytics, Modelling and Prediction

Presentation and Visualization
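The four data-flow stages named on this slide (Collection and Registration; Filtering, Classification and Enrichment; Analytics, Modelling and Prediction; Presentation and Visualization) can be sketched as a chain of functions. This is a toy illustration only: the sensor readings, the threshold of 30, and the function names are all assumptions, and a real pipeline would do far more at each stage.

```python
from statistics import mean


def collect_and_register(raw_readings):
    """Collection and Registration: tag each raw reading with a sequence id."""
    return [{"id": i, **r} for i, r in enumerate(raw_readings)]


def filter_classify_enrich(records):
    """Filtering, Classification and Enrichment: drop empty rows, add a label."""
    kept = [r for r in records if r.get("value") is not None]
    return [{**r, "level": "high" if r["value"] >= 30 else "normal"} for r in kept]


def analyze(records):
    """Analytics, Modelling and Prediction: here, just a per-source average."""
    by_source = {}
    for r in records:
        by_source.setdefault(r["source"], []).append(r["value"])
    return {source: mean(values) for source, values in by_source.items()}


def present(summary):
    """Presentation and Visualization: render a plain-text report."""
    return "\n".join(f"{source}: {avg:.1f}" for source, avg in sorted(summary.items()))


# Hypothetical sensor data with one missing value.
readings = [
    {"source": "sensor-a", "value": 20},
    {"source": "sensor-a", "value": 40},
    {"source": "sensor-b", "value": None},
    {"source": "sensor-b", "value": 10},
]
print(present(analyze(filter_classify_enrich(collect_and_register(readings)))))
```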

Page 7: Big Data Evolution

What challenges can you expect

Platforms

• High end data warehousing tools

• Open source technologies challenging with accessing data from multiple servers rapidly in native form

• Selection of Enterprise Search Tools

Skills

• Managing Data Volumes

• Ability to really understand what can be achieved

• Open source platforms not easy to use

• Data scientists now required

Leadership

• New territory for IT professionals, so planning, marketing, ROI, etc. are issues

• Getting Data on the Board's agenda

Walmart analyses real-time social media data for trends to guide online ad purchases

Page 8: Big Data Evolution

Enterprise Search: Vendors

Figure: vendor quadrant with axes TCO and Feature Set, each running from Low to High; quadrant labels include Niche Progressive and Niche Traditional.

Page 9: Big Data Evolution

Challenges in Big Data

— Increasing Amount of Disorganized Data and Data Sources (structured & unstructured)

Provides greater opportunity for failure: lack of information can lead to wrong decisions

Limits productivity: more time and effort needed to find information

Frustrates search users: information is expected to be readily available and complete

— Not tackling Big Data in enterprises …

Figure: chart with axis labels DIGITAL DATA VOLUME and 2010 to 2020, showing data sources: Marketing Data, Data Warehouse, Social Media, Research Databases, Office Files, Transactional Data, Acquisition Data, etc.

Page 10: Big Data Evolution

Opportunity in Big Data

Source: IDC

Figure: chart with axis labels DIGITAL DATA VOLUME and 2010 to 2020; labelled value: 35 zettabytes

STATUS QUO

— Accessible Data Has Value

48% CAGR

No Specific Solutions: too hard and expensive

Homegrown: hard to maintain and insufficient

Traditional Solutions: waste countless months on inflexible solutions

— Solution Types

Page 11: Big Data Evolution

Q-Sensei Product – Aimed at bringing Big Data approach to all Enterprises

— Traditional Approaches

• Complex products

• Rigid delivery model

• Pre-defined usage

• Expensive

• Limited audience

• Exhausting implementation

• Disparate solutions

• Poor interaction design

— Q-Sensei Revolution

• Simple

• Powerful

• Fast

• Flexible

• Broad application

• Interactive

• Easy delivery model

• For everyone

Page 12: Big Data Evolution

Case study mentioned in the Wall Street Journal in 2012

Netflix was able to analyze traffic details for various devices, spot problem areas and add network throughput to help prepare for future demand. It was also able to get more insight into the type of content customers preferred, which enabled more accurate suggestions as to what subscribers might like.

Page 13: Big Data Evolution

Case Study

— Overview

• Premier Internet subscription service for streaming media and DVD-by-mail services

• Over 50 million subscribers in 40+ countries; Revenue 2013: $4.37 billion

• Contract Management: Permission/licensing agreements with content creators

• Leader in interactive, contextual search changing the way companies search and analyze data

• Patented powerful multidimensional search and index capability

• Gives developers full access to award-winning technology and empowers them to build robust search and analytics applications for all data needs

World's Leading Internet television network (ITN)

Page 14: Big Data Evolution

Case Study – Search in Contracts

— Goals and Key Challenges

1. Make searching their copious contract documentation more manageable and easier for end users

2. Integrate and unify their highly structured metadata with their unstructured content data

3. Incorporate Optical Character Recognition (OCR) of scanned documents during data ingestion process

4. Integrate with in-house, Drupal-based content management system

5. Flexibility to consume the data from their custom system

6. Data model that meets various needs of personnel

7. Timeline of only 3 months

Page 15: Big Data Evolution

Case Study – Search in Contracts

— Solution and Successes

1. In 3 months, Q-Sensei conceptualized and deployed a solution for contract search needs using Fuse (including usability testing)

2. Addition of further capabilities based on end user feedback:

• n-gram phrase search

• date range search

• multi-sort of facets

• grid view of results

3. The flexibility and modular architecture of Fuse enable the customer to implement the platform for further use cases (knowledge base search, log analysis, usage analysis, etc.)
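To make the search features listed above more concrete, here is a minimal sketch of a date range search combined with facet filters and facet counts over contract metadata. It is not the Fuse API, which the slides do not show; the sample contracts, field names, and functions are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical contract metadata, invented for this example.
CONTRACTS = [
    {"title": "Streaming licence A", "studio": "Studio X", "region": "EU", "signed": date(2012, 3, 1)},
    {"title": "Streaming licence B", "studio": "Studio X", "region": "US", "signed": date(2013, 6, 15)},
    {"title": "DVD distribution C", "studio": "Studio Y", "region": "US", "signed": date(2013, 11, 2)},
]


def search(docs, text="", start=None, end=None, **facets):
    """Return docs whose title contains `text`, signed within [start, end],
    and whose metadata matches every given facet value."""
    hits = []
    for d in docs:
        if text.lower() not in d["title"].lower():
            continue
        if start and d["signed"] < start:
            continue
        if end and d["signed"] > end:
            continue
        if any(d.get(field) != value for field, value in facets.items()):
            continue
        hits.append(d)
    return hits


def facet_counts(hits, fields):
    """Count values per facet field over the current hits."""
    return {field: Counter(d[field] for d in hits) for field in fields}


hits = search(CONTRACTS, text="licence", start=date(2013, 1, 1))
print([d["title"] for d in hits])
print(facet_counts(search(CONTRACTS, region="US"), ["studio"]))
```

Recomputing the facet counts over the current hits is what lets a user narrow a result set step by step, since each count shows how many results remain under a given value.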

Page 16: Big Data Evolution

Demo

— Q-Sensei Medical Demo

• Unified Access to Publications, Grants, Patents, Office Files, Person

• Content-Based Faceted Auto Complete

• Dynamic Faceting

• Search-within-a-search capability

• Data Interaction and deep Data Correlations

• 360-degree view of information

• Multi-Dimensional Visualization

• Customizable Search Interface

• Integrated Data Sources (21m Publications, 1.8m Grants, 1.5m Patents, Office Files (DOC, XLS, PPT, PDF,…), Person DB)

Set-up (Harvesting, Importing, Data Transformation, Indexing) in 5 days

Page 17: Big Data Evolution

Performance Metrics

Sample System Configuration

• Intel Ivy Bridge Quadcore 3.4GHz

• 32GB RAM

• 1TB HD

• 64-bit Linux

Performance Based on Sample System

• Up to 80 million documents can be indexed

• Up to 20 million records can be uploaded per hour (more than 5,000/sec)

• 100,000 search queries can be processed per minute per million documents; a query includes:

• processing of search expression (including full text)

• computation of eight (8) standard facets

(Latest test: September 2013)
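The stated throughput figures can be cross-checked with simple unit conversions. The sketch below uses only the numbers on this slide; the assumption that uploads run at the maximum stated rate for the whole load is ours, so the resulting load time is a lower bound.

```python
# Figures taken from the slide.
RECORDS_PER_HOUR = 20_000_000
QUERIES_PER_MINUTE = 100_000   # stated per million documents
MAX_INDEXED_DOCS = 80_000_000


def per_second(count: float, seconds: float) -> float:
    """Convert a count over a period into a per-second rate."""
    return count / seconds


upload_rate = per_second(RECORDS_PER_HOUR, 3600)
query_rate = per_second(QUERIES_PER_MINUTE, 60)
# Assumption: sustained upload at the maximum stated rate.
hours_to_fill = MAX_INDEXED_DOCS / RECORDS_PER_HOUR

print(f"Upload: {upload_rate:,.0f} records/sec")
print(f"Queries: {query_rate:,.0f} queries/sec per million documents")
print(f"Time to load {MAX_INDEXED_DOCS:,} documents: {hours_to_fill:.1f} hours")
```

The upload rate comes to about 5,556 records per second, consistent with the slide's "more than 5,000/sec".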

Page 18: Big Data Evolution

Contract Management Search

• Create a more accurate and efficient contract search by exposing all metadata and using facets

• Search scanned documents with advanced OCR capabilities

Knowledge Base / Support Center Search

• Increase the efficiency of finding answers by utilizing more metadata in your knowledge base

• Embrace tags and faceted search over hierarchy to find answers more quickly

Enterprise Search

• Unify your company’s information by searching all sources simultaneously

• Increase the productivity of everyone with better data accessibility

Usage Analysis

• Increase speed and agility of customer activity analysis by embracing a multidimensional view of your data

• Drive dynamic visualizations and build complex queries

Structured Data Analysis

• Understand the composition of data, find relationships, and identify trends

• View data more accurately by analyzing all attributes simultaneously

E-Commerce Faceted Navigation

• More accurately represent your products with dynamically updating facets that perform at scale

• Power more meaningful recommendations with the capability to use more metadata

Further Use Cases

— A Single Platform for Everything

Page 19: Big Data Evolution

Other Examples

East London Rural Mapping