© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

New Trends in Data Management in the Information Industries

Presented by: Matt Turner, CTO Media and Publishing

February, 2015

SLIDE: 2

Agenda

Introduction

Information Industries Trends

Top 5 Challenges in the Industry

New Approaches and Solutions

SLIDE: 3

Data Drives the Need for a New Generation Database

Hierarchical Era: “For your application data!”
• Application- and hardware-specific

Relational Era: “For all your structured data!”
• Normalized, tabular model
• Application-independent query
• User control

Any Structure Era: “For all your data!”
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Heterogeneous data
• Faster time-to-results

SLIDE: 4

Harnessing Data & Reimagining Applications

Reduce Risk

Manage Compliance

Create New Value from Data

Optimize Operations

Lower TCO / Better IT Economics

Better Decision-making

SLIDE: 5

MarkLogic: Enterprise NoSQL Database Platform

Best Operational Data Warehouse (Aug 2014)

• Flexible Data Model: Store and manage JSON, XML, RDF, and Geospatial data with a document-centric, schema-agnostic database
• Scalability and Elasticity: Scale to petabytes of data without over-provisioning or over-spending
• ACID Transactions: Avoid data loss, data corruption, and stale reads, even at speed and scale
• Search and Query: Lightning fast, sophisticated, sub-second search and query across all of your data
• Semantics: Store and query linked data as RDF and SPARQL
• Hadoop Integration: Make your Hadoop better by connecting it to MarkLogic
• Certified Security: Government-grade, granular, role-based security

DECADE+ OF INNOVATION

Working Together To Reimagine Applications

PUBLISHING: CHANGE IS THE ONLY CONSTANT

FROM PUBLISHERS TO INFORMATION PROVIDERS

TRADITIONAL PUBLISHING

• FORM BASED PRODUCTS
• DEDICATED PRODUCT INFRASTRUCTURE

Product A: Dedicated Infrastructure (database + search engine)
Product B
Product C

Company Data, Industry Data, Filings, Reports

INFORMATION DELIVERY PLATFORM

• FORMAT INDEPENDENT
• INFORMATION CENTRIC
• DYNAMIC DELIVERY

Company Data, Industry Data, Filings, Reports

Deliver the right content, to the right user, in the right format, in real time

SLIDE: 13

Top 5 Requirements for Information Providers

1. Getting data IN fast isn’t the problem – it’s getting insights OUT faster!
2. Data is complex – but users want complexity hidden!
3. Not everyone has permission to access all the data…
4. Repurpose, repurpose, repurpose. Repeat.
5. Once you attract them – you must be reliable

SLIDE: 14

Traditional Technology

• Rows and columns for content strip information

Title | Publication Date | Category | Abstract | Section | Section 2?
Science Article 1 | 3/1/14 | Biology | Abstract text . . . | Section text | Section text
Research Book | 6/4/13 | Surgery | Abstract text . . . | Section text | Section text
Science Article 2 | 6/4/05 | Chemistry | Abstract text . . . | Section text | Section text

?

SLIDE: 15

Traditional Technology

• Rows and columns for content strip information
• Hierarchical taxonomies overlap and don’t capture the complexity

Title | Publication Date | Category | Abstract | Section | Section 2?
Science Article 1 | 3/1/14 | Biology | Abstract text . . . | Section text | Section text
Research Book | 6/4/13 | Surgery | Abstract text . . . | Section text | Section text
Science Article 2 | 6/4/05 | Chemistry | Abstract text . . . | Section text | Section text

?

Taxonomy terms shown in the diagram (hierarchy not recoverable from the text):
• Research, Medicine, Science, Surgery, Orthopedics, Cell Biology, Biochemistry, …
• Life Sciences, Biomedical Sciences, Cell Biology, Biology, Biochemistry, …
• Chemistry, Microbiology, Biochemistry

?
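
The overlap problem on this slide can be made concrete with a small sketch. The edges below are hypothetical: they reuse terms from the slide but are not the slide's actual hierarchies. The sketch shows that a term such as Biochemistry can have more than one parent, which a single strict tree cannot express.

```python
from collections import defaultdict

# Hypothetical (child, parent) pairs; illustrative only, not a real taxonomy.
EDGES = [
    ("Orthopedics", "Surgery"),
    ("Surgery", "Medicine"),
    ("Cell Biology", "Biology"),
    ("Cell Biology", "Biomedical Sciences"),
    ("Biochemistry", "Biology"),
    ("Biochemistry", "Chemistry"),
    ("Biology", "Life Sciences"),
    ("Biomedical Sciences", "Life Sciences"),
]


def build_parents(edges):
    """Map each term to the set of its direct parents."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return parents


def multi_parent_terms(parents):
    """Terms that a single-parent tree could not place unambiguously."""
    return sorted(term for term, ps in parents.items() if len(ps) > 1)


def ancestors(parents, term):
    """All broader terms reachable from `term`, following every parent."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)
```

With these assumed edges, `multi_parent_terms(PARENTS)` lists the terms that would have to be duplicated or arbitrarily placed in a strict hierarchy, and `ancestors(PARENTS, "Biochemistry")` reaches both the Chemistry and the Biology branches.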

SLIDE: 16

The Functional Solution Silos & Treadmill

Articles, Books, Industry Data, Reports

1. Design infrastructure, services & applications
2. Analyze Data Formats
3. Define queries & Service APIs
4. Define schemas, indexes and services
5. Build databases, middleware and services infrastructure
6. Define & implement ETL processes
7. Load and normalize data
8. Develop, integrate and test infrastructure & applications

?

?

SLIDE: 17

Data Drives the Need for a New Generation Database

Hierarchical Era: “For your application data!”
• Application- and hardware-specific

Relational Era: “For all your structured data!”
• Normalized, tabular model
• Application-independent query
• User control

Any Structure Era: “For all your data!”
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Heterogeneous data
• Faster time-to-results

SLIDE: 18

Flexible Data Model
Schema-agnostic, structure-aware

• No need to define a schema up front
• Matched to complex content and metadata modeling
• Data is managed in its most accessible, natural form
• XML, JSON, RDF, geospatial

Result: Product content and data from multiple sources available to be tailored to any purpose and product
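
As a rough illustration of "schema-agnostic, structure-aware", the sketch below keeps two differently shaped documents side by side and queries them by walking their structure. The document contents and the function names `walk` and `find_by_leaf` are assumptions made for this example; this is plain Python, not MarkLogic's API.

```python
# Hypothetical documents with different shapes: an article with flat
# sections and a book with chapters that contain sections.
DOCS = {
    "article-1": {
        "title": "Science Article 1",
        "category": "Biology",
        "sections": [{"text": "cell growth"}, {"text": "methods"}],
    },
    "book-1": {
        "title": "Research Book",
        "category": "Surgery",
        "chapters": [
            {"title": "Basics", "sections": [{"text": "knee repair methods"}]},
        ],
    },
}


def walk(node, path=()):
    """Yield (path, value) for every leaf of a nested dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, path)
    else:
        yield path, node


def find_by_leaf(docs, key, value):
    """Ids of documents with `key` == `value` at any depth, no schema needed."""
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if any(path and path[-1] == key and leaf == value
               for path, leaf in walk(doc))
    )
```

For example, `find_by_leaf(DOCS, "title", "Basics")` finds the book through a chapter title that the article shape does not have at all, without either shape having been declared in advance.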

SLIDE: 19

Search and Query
Search to find answers in documents, relationships, and metadata

• Automatic indexing of every data value, text and data structure
• Specialized indexes for data values (analytics, facets, sorting), geospatial and triples
• All updated in the context of ACID transactions to ensure data integrity and real-time access
• Accessible via fully programmable search API with full-text search, type-ahead suggestions, facets, snippeting, highlighted search terms, proximity boosting, relevance ranking, and language support

JavaScript, XQuery, SPARQL

Rich Query Capability
• In-database MapReduce
• Full-text Search
• Semantic Search
• Geospatial Search

Result: simplified architecture with a single component for search and database
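
The idea of one component serving both search and database queries can be sketched with a toy index that records every word and every field value as documents arrive. `TinyIndex` and its methods are hypothetical names for this illustration; it says nothing about how MarkLogic implements its indexes.

```python
from collections import defaultdict


class TinyIndex:
    """Toy index over flat documents (dicts of field -> scalar value)."""

    def __init__(self):
        self.words = defaultdict(set)   # lowercased word -> document ids
        self.values = defaultdict(set)  # (field, value) -> document ids

    def add(self, doc_id, doc):
        """Index every field value and every word, with no per-field setup."""
        for field, value in doc.items():
            self.values[(field, value)].add(doc_id)
            for word in str(value).lower().split():
                self.words[word].add(doc_id)

    def search(self, text="", **fields):
        """Return ids matching all words in `text` and all field=value pairs."""
        hits = [self.words.get(w, set()) for w in text.lower().split()]
        hits += [self.values.get(pair, set()) for pair in fields.items()]
        return sorted(set.intersection(*hits)) if hits else []

    def facet(self, field, doc_ids):
        """Count the values of `field` among the given documents."""
        ids = set(doc_ids)
        counts = {}
        for (name, value), members in self.values.items():
            overlap = len(members & ids)
            if name == field and overlap:
                counts[value] = overlap
        return counts


# Sample rows borrowed from the table on the "Traditional Technology" slides.
index = TinyIndex()
index.add("article-1", {"title": "Science Article 1", "category": "Biology"})
index.add("book-1", {"title": "Research Book", "category": "Surgery"})
index.add("article-2", {"title": "Science Article 2", "category": "Chemistry"})
```

A call such as `index.search("science", category="Chemistry")` combines a full-text term with a value constraint in one lookup, and `index.facet("category", index.search("science"))` derives facet counts from the same structures.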

SLIDE: 20

Semantics
Enterprise triple store, document store, and database combined

• Store and query billions of facts and relationships
• Leverage ontologies for domain- and role-specific contextual access to data and documents
• Efficient metadata management with relationships to ontologies
• Standards-based for ease of use and integration: RDF, SPARQL, and standard REST interfaces
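
To show what storing and querying facts and relationships looks like in miniature, here is a toy triple store in plain Python. The triples, the predicate names `hasTopic` and `broader`, and the helper functions are all assumptions for illustration; a real deployment would use RDF and SPARQL rather than this pattern matcher.

```python
# Hypothetical (subject, predicate, object) facts linking documents to topics
# and topics to broader topics.
TRIPLES = {
    ("article-1", "hasTopic", "Biochemistry"),
    ("article-2", "hasTopic", "Chemistry"),
    ("book-1", "hasTopic", "Surgery"),
    ("Biochemistry", "broader", "Chemistry"),
    ("Biochemistry", "broader", "Biology"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    pattern = (s, p, o)
    return sorted(
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    )


def narrower_closure(triples, topic):
    """The topic plus every topic that is transitively narrower than it."""
    topics = {topic}
    frontier = [topic]
    while frontier:
        current = frontier.pop()
        for child, _, _ in match(triples, p="broader", o=current):
            if child not in topics:
                topics.add(child)
                frontier.append(child)
    return topics


def docs_about(triples, topic):
    """Documents tagged with the topic or with any narrower topic."""
    topics = narrower_closure(triples, topic)
    return sorted({s for s, _, o in match(triples, p="hasTopic") if o in topics})
```

Because Biochemistry is linked to both Chemistry and Biology, `docs_about(TRIPLES, "Chemistry")` and `docs_about(TRIPLES, "Biology")` both return `article-1` even though the article is tagged only once.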

SLIDE: 21

Semantics
Documents, data and triples provide complete picture of content

Result: context to tailor information to your user’s role, activity and location

SLIDE: 22

Scalability, Elasticity and Cloud
Massive enterprise scalability and elasticity

• Scale horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents
• Process thousands of multi-document multi-statement transactions per second
• Start small and scale up or down to meet capacity and performance demands without over-provisioning or over-spending
• Fully cloud enabled for automated deployment and management on EC2
• Leverage dynamic configurations with Tiered Storage

Cluster diagram: E-NODE, E-NODE; D-NODE, D-NODE, D-NODE

Result: Enterprise-ready to power mission critical products

SLIDE: 23

With MarkLogic…

1. Design infrastructure, services & applications
3. Define queries & Service APIs
8. Develop, integrate and test infrastructure & applications

?

?

When something changes... it’s no big deal

INFORMATION DELIVERY PLATFORM EXTENDED

• Content and Customers
• Complete Picture of Business
• Metrics Driving Product Development and Sales

Company Data, Industry Data, Filings, Reports, Catalogs, Lists, Authors, Institutions, Social Media + Usage

SLIDE: 25

Use Case: Master Data

• Foundational data for digital products
• Industry topology and trends to drive innovation
• User and content metrics to drive product development

SLIDE: 26

Use Case: Enhance Digital Products

• Present information based on relationships
• Go beyond traditional technology with depth of content
• Drive efficiency using semantic approach to tagging

SLIDE: 27

Use Case: Go Beyond Search

• Concept instead of keyword search
• Related content and information drive content discovery and new interactions
  – SNL40 continuous viewing
• Dynamically tailored to the user’s specific attributes or activity
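
The difference between concept search and keyword search can be shown in a few lines. Everything in this sketch is hypothetical: the phrases, the concept identifiers `C1` and `C2`, and the assumption that documents were tagged with concepts when they were loaded.

```python
# Hypothetical vocabulary: different phrases that name the same concept.
PHRASE_TO_CONCEPT = {
    "heart attack": "C1",
    "myocardial infarction": "C1",
    "knee replacement": "C2",
    "knee arthroplasty": "C2",
}

# Hypothetical documents and the concepts assumed to be tagged on each one.
DOC_TEXT = {
    "doc-a": "Outcomes after myocardial infarction",
    "doc-b": "Knee arthroplasty recovery",
    "doc-c": "Heart attack risk following knee replacement",
}
DOC_CONCEPTS = {
    "doc-a": {"C1"},
    "doc-b": {"C2"},
    "doc-c": {"C1", "C2"},
}


def keyword_search(query):
    """Match only documents whose text literally contains the query."""
    needle = query.lower()
    return sorted(d for d, text in DOC_TEXT.items() if needle in text.lower())


def concept_search(query):
    """Resolve the query to a concept, then match documents by concept tag."""
    concept = PHRASE_TO_CONCEPT.get(query.lower())
    if concept is None:
        return []
    return sorted(d for d, tags in DOC_CONCEPTS.items() if concept in tags)
```

With this data, a keyword search for "heart attack" finds only the document that uses those exact words, while the concept search also returns the document that says "myocardial infarction".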

SLIDE: 28

Use Case: ‘Everything Else’

• Tailor views and access to information with multiple ontologies
• Example: follow scientist from research to the workbench to conferences to publishing
• Content delivery tailored to the user’s role, activity and location
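
A minimal sketch of tailoring delivery by role and activity follows. The item titles, role names and activity names are invented for the example, and the filter is ordinary Python rather than MarkLogic's security model; it only illustrates that both "who may see this" and "what is relevant right now" can be applied at delivery time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    read_roles: frozenset  # roles allowed to see the item
    activities: frozenset  # activities the item is relevant to


# Hypothetical content following a scientist from the workbench to publishing.
ITEMS = [
    Item("Lab protocol", frozenset({"researcher"}), frozenset({"workbench"})),
    Item("Conference abstract", frozenset({"researcher", "editor"}),
         frozenset({"conference"})),
    Item("Submission guidelines", frozenset({"researcher", "editor"}),
         frozenset({"publishing"})),
    Item("Peer review notes", frozenset({"editor"}), frozenset({"publishing"})),
]


def tailor(items, role, activity):
    """Titles the given role may read that are relevant to the activity."""
    return [
        item.title
        for item in items
        if role in item.read_roles and activity in item.activities
    ]
```

Here `tailor(ITEMS, "researcher", "publishing")` and `tailor(ITEMS, "editor", "publishing")` return different lists from the same content, which is the behaviour the slide describes.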

SLIDE: 29

Top 5 Requirements for Information Providers

1. Getting data IN fast isn’t the problem – it’s getting insights OUT faster!
2. Data is complex – but users want complexity hidden!
3. Not everyone has permission to access all the data…
4. Repurpose, repurpose, repurpose. Repeat.
5. Once you attract them – you must be reliable

Any Questions?