Transcript
Page 1: From Content Storage to Scaling Smart Data

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Lily: Smart data, at scale, made easy

from content storage to scaling smart data

Monday 6 June 2011

Page 2: From Content Storage to Scaling Smart Data


Page 3: From Content Storage to Scaling Smart Data

the pain

moore

data

need for distributed processing

Page 4: From Content Storage to Scaling Smart Data

the pain

» growth of data sets
» smart businesses need to apply analytics to activities
» doing business online means real-time
» talent shortage

Smart data, at scale, made easy

Page 5: From Content Storage to Scaling Smart Data

LILY

The Real-time Platform built for the Age of Data.

We manage, track and measure your data and users, and do the mat(c)hmaking in-between:
» provide you with business intelligence and analytics
» harvest user profiles and learn their interests
» dynamically engage your users using quality recommendations

Page 6: From Content Storage to Scaling Smart Data

where would you use lily?

» large collections of data
  » content repositories
  » library catalogs
  » (media) asset management
  » product catalogs
  » ‘live’ archives
» large groups of users
  » e-commerce / retail
  » news / media
» ... if you want to use big data, but you need easy.

Page 7: From Content Storage to Scaling Smart Data

+ this is where the magic happens

Page 8: From Content Storage to Scaling Smart Data

beyond content management

broadcast

revenue

product / service

marketing

Page 9: From Content Storage to Scaling Smart Data

beyond content management: data + analytics

call to action

revenue

product / service

personalised recommendations

audience data

Page 10: From Content Storage to Scaling Smart Data

LILY 2.0: smart data

SMARTER DATA

data processing

domain knowledge

patterns, rules, keywords, lists, ...

relations

recommendations, semantic augmentation, analytics

usage metrics

Page 11: From Content Storage to Scaling Smart Data

roadmap

» now: highly-scalable data repository: store, index and search
» next: with real-time usage stats gathering and analytics
» later: and built-in context- and user-sensitive recommendations

» built on top of Google BigTable / HBase / Solr
» identical, robust technology in use at Facebook, Twitter, StumbleUpon, Yahoo!
» scales widely over distributed (cloud) infrastructure

Page 12: From Content Storage to Scaling Smart Data

Lily Repository Model

Page 13: From Content Storage to Scaling Smart Data

Sample Lily Schema (excerpt)

namespaces: {
  /* Declaration of namespace prefixes. */
  "org.lilyproject.bookssample": "b",
  "org.lilyproject.vtag": "vtag"
},
fieldTypes: [
  { name: "b$title", valueType: { primitive: "STRING" }, scope: "versioned" },
  { name: "b$pages", valueType: { primitive: "INTEGER" }, scope: "versioned" },
  { name: "b$language", valueType: { primitive: "STRING" }, scope: "versioned" },
  { name: "b$authors", valueType: { primitive: "LINK", multiValue: true }, scope: "versioned" },
  { name: "b$name", valueType: { primitive: "STRING" }, scope: "versioned" },
  { name: "b$bio", valueType: { primitive: "STRING" }, scope: "versioned" },
  { name: "vtag$last", valueType: { primitive: "LONG" }, scope: "non_versioned" }
],
recordTypes: [
  {
    name: "b$Book",
    fields: [
      { name: "b$title", mandatory: true },
      { name: "b$pages", mandatory: false },
      { name: "b$language", mandatory: false },
      { name: "b$authors", mandatory: false },
      { name: "vtag$last", mandatory: false }
    ]
  },
  ...
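To make the `mandatory` flags in the excerpt concrete, here is a minimal Python sketch of how a record could be checked against the `b$Book` record type. It is not Lily's API: the dictionary layout only mirrors the fields visible on the slide, and `missing_mandatory_fields` and the sample records are hypothetical names chosen for illustration.

```python
# Hypothetical sketch: the b$Book record type from the slide as plain Python data.
# Nothing here is Lily's real API; it only illustrates what "mandatory" implies.
BOOK_RECORD_TYPE = {
    "name": "b$Book",
    "fields": [
        {"name": "b$title", "mandatory": True},
        {"name": "b$pages", "mandatory": False},
        {"name": "b$language", "mandatory": False},
        {"name": "b$authors", "mandatory": False},
        {"name": "vtag$last", "mandatory": False},
    ],
}


def missing_mandatory_fields(record_type, record):
    """Return the names of mandatory fields that the record does not supply."""
    return [
        field["name"]
        for field in record_type["fields"]
        if field["mandatory"] and field["name"] not in record
    ]


# Sample records (made-up values).
complete = {"b$title": "Example title", "b$pages": 120}
incomplete = {"b$pages": 120}

print(missing_mandatory_fields(BOOK_RECORD_TYPE, complete))    # []
print(missing_mandatory_fields(BOOK_RECORD_TYPE, incomplete))  # ['b$title']
```

Only `b$title` is mandatory in the excerpt, so a record without a title is the only one flagged.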

Page 14: From Content Storage to Scaling Smart Data

Lily Architecture (deployment)

Page 15: From Content Storage to Scaling Smart Data

Lily Architecture (components)

Page 16: From Content Storage to Scaling Smart Data

HBase indexing & RowLog Library

» building and querying indexes, GAE-style
» need for sync/async operations
  » updating of secondary indexes (e.g. link tables)
  » feeding of Indexer (= indexes Lily content into Solr)
» not: transactions
» need for distribution and durability

content table:
  rowkey  col   col
  A       val3  foo6
  B       val2  foo7

index table A (rows in rowkey order):
  rowkey  col
  val2-B
  val3-A
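The content table / index table pair on this slide can be sketched in a few lines of Python. This is an in-memory illustration of the idea only, not the Lily or HBase API: the class `IndexedTable`, the column names `col1` and `col2`, and the `-` separator are assumptions, and the sketch assumes indexed values never contain the separator. Each index rowkey is the indexed value followed by the content rowkey, and the index is kept sorted so that a lookup becomes a prefix scan.

```python
import bisect


class IndexedTable:
    """Toy content table plus one secondary index table, both held in memory."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.content = {}     # content rowkey -> {column: value}
        self.index_keys = []  # sorted index rowkeys: "<value>-<content rowkey>"

    def put(self, rowkey, columns):
        """Write a content row and keep the index table consistent with it."""
        old = self.content.get(rowkey)
        if old is not None and self.indexed_column in old:
            # Drop the index entry that pointed at the previous value.
            self.index_keys.remove(f"{old[self.indexed_column]}-{rowkey}")
        self.content[rowkey] = dict(columns)
        if self.indexed_column in columns:
            new_key = f"{columns[self.indexed_column]}-{rowkey}"
            bisect.insort(self.index_keys, new_key)

    def find(self, value):
        """Return content rowkeys whose indexed column equals value (prefix scan)."""
        prefix = f"{value}-"
        start = bisect.bisect_left(self.index_keys, prefix)
        rowkeys = []
        for key in self.index_keys[start:]:
            if not key.startswith(prefix):
                break
            rowkeys.append(key[len(prefix):])
        return rowkeys


table = IndexedTable("col1")
table.put("A", {"col1": "val3", "col2": "foo6"})
table.put("B", {"col1": "val2", "col2": "foo7"})

print(table.index_keys)    # ['val2-B', 'val3-A']
print(table.find("val3"))  # ['A']
```

In this sketch the index is updated inside `put`, i.e. synchronously. The slide's point about sync/async operations is that such follow-up work may also need to run after the write has returned, which is outside what this toy shows.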

Page 17: From Content Storage to Scaling Smart Data

The Lily Indexer

» denormalization
» indexing of multiple versions of a record
» incremental index updating
» batch index building
» blob content extraction
» sharding towards multiple Solr instances
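The slide does not say how records are assigned to Solr instances, so the following is only one common approach, sketched under stated assumptions: hash the record id and take it modulo the number of shards. The function names and the `solr-N.example` URLs are hypothetical.

```python
import hashlib


def shard_for(record_id, shard_urls):
    """Pick a shard deterministically from a stable hash of the record id."""
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    return shard_urls[int.from_bytes(digest[:8], "big") % len(shard_urls)]


def assign(record_ids, shard_urls):
    """Group record ids by the shard they would be indexed on."""
    assignment = {url: [] for url in shard_urls}
    for record_id in record_ids:
        assignment[shard_for(record_id, shard_urls)].append(record_id)
    return assignment


# Placeholder shard URLs, not real endpoints.
SHARDS = [
    "http://solr-1.example/solr",
    "http://solr-2.example/solr",
    "http://solr-3.example/solr",
]

RECORD_IDS = [f"record-{n}" for n in range(10)]
for url, ids in assign(RECORD_IDS, SHARDS).items():
    print(url, ids)
```

A stable hash such as MD5 is used rather than Python's built-in `hash`, which is randomised per process for strings. Note that plain modulo hashing moves most records to a different shard when the shard count changes.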

Page 18: From Content Storage to Scaling Smart Data

status june 2011

» Lily 1.0.1 released - in development since Q4/09
» some customers - DIY retail / media / news
» e-commerce platform project
» Lily as the data (integration) tier
» first contrib: FrogPond (annotated Java <> Lily mapper)
  https://bitbucket.org/calmera/frogpond

Page 19: From Content Storage to Scaling Smart Data

Next up: usage stats

» sits in CRUD-path
» tracks user operations against records
» from both perspectives
» arbitrary K/V properties: time, location, ...
» automatically builds user profiles (as records)
» tied to record operations
» indexed access
» time dimension: trending

[diagram labels: record, user, interactions, indexes, time, recommendations]
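As a rough illustration of the bullets on this slide, the sketch below records each user operation once and makes it reachable from both the user and the record perspective, with free-form key/value properties and a simple time-window count for trending. It is an in-memory toy with hypothetical names (`UsageTracker`, `track`, `trending`); the slides do not describe how Lily stores or indexes these events.

```python
from collections import Counter, defaultdict


class UsageTracker:
    """Toy usage-stats store: every event is reachable by user and by record."""

    def __init__(self):
        self.by_user = defaultdict(list)    # user id -> events
        self.by_record = defaultdict(list)  # record id -> events

    def track(self, user_id, record_id, operation, timestamp, **properties):
        """Record one operation; extra keyword arguments are arbitrary K/V properties."""
        event = {
            "user": user_id,
            "record": record_id,
            "op": operation,
            "time": timestamp,
            "properties": dict(properties),
        }
        self.by_user[user_id].append(event)
        self.by_record[record_id].append(event)
        return event

    def trending(self, since, top=3):
        """Return (record id, count) pairs for the most-used records since a timestamp."""
        counts = Counter()
        for record_id, events in self.by_record.items():
            recent = sum(1 for event in events if event["time"] >= since)
            if recent:
                counts[record_id] = recent
        return counts.most_common(top)


tracker = UsageTracker()
tracker.track("alice", "book-1", "read", 100, location="Gent")
tracker.track("bob", "book-1", "read", 160)
tracker.track("alice", "book-2", "update", 170)

print(tracker.trending(since=0))               # [('book-1', 2), ('book-2', 1)]
print(len(tracker.by_user["alice"]))           # 2
```

The list under `by_user["alice"]` is the raw material for a user profile; the timestamps are plain integers here purely to keep the example short.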

Page 20: From Content Storage to Scaling Smart Data

from usage stats to recommendations ‘light’

» grouping of users based on
  » shared properties
  » shared record access
» grouping of records based on
  » shared properties
  » shared user operations

[diagram labels: record, user, connections, recommendations]
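One simple way to group users by shared record access is set overlap. The sketch below uses Jaccard similarity, which is a swapped-in technique chosen for illustration and not something the slides name; `ACCESS`, `jaccard`, `recommend` and the `min_similarity` threshold are all hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(access, user, min_similarity=0.3):
    """Suggest records that sufficiently similar users accessed and this user did not."""
    seen = access[user]
    scores = {}
    for other, records in access.items():
        if other == user:
            continue
        similarity = jaccard(seen, records)
        if similarity < min_similarity:
            continue
        for record in records - seen:
            scores[record] = scores.get(record, 0.0) + similarity
    # Highest score first; ties broken by record id for a stable order.
    return sorted(scores, key=lambda record: (-scores[record], record))


# Made-up access data: user id -> set of record ids the user touched.
ACCESS = {
    "alice": {"r1", "r2", "r3"},
    "bob": {"r2", "r3", "r4"},
    "carol": {"r9"},
}

print(jaccard(ACCESS["alice"], ACCESS["bob"]))  # 0.5
print(recommend(ACCESS, "alice"))               # ['r4']
```

Alice and Bob share two of four distinct records, so Bob's remaining record is suggested to Alice, while Carol shares nothing and contributes no suggestions. Grouping records by shared user operations works the same way with the mapping inverted.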

Page 21: From Content Storage to Scaling Smart Data

full-on recommendations

» look at real-time-capable Mahout algorithms
» pre-index or pre-calculate as much as possible
» save as secondary indexes
» present recommendations as part of record API
» allow user to contribute ‘domain knowledge’ to record processing pipeline
» pattern detection, keywords, ontologies, ...
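The "pre-calculate and save as secondary indexes" idea can be sketched as follows: do the expensive counting up front, store a small top-N list per record, and serve recommendations as a plain lookup. This uses a basic co-occurrence count rather than any Mahout algorithm, and `ACCESS`, `build_cooccurrence_index` and `INDEX` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence_index(access, top=2):
    """Precompute, per record, the records most often accessed by the same users."""
    counts = defaultdict(Counter)
    for records in access.values():
        # Every ordered pair of records seen by one user counts as one co-occurrence.
        for a, b in permutations(sorted(records), 2):
            counts[a][b] += 1
    return {
        record: [
            other
            for other, _ in sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
        ]
        for record, related in counts.items()
    }


# Made-up access data: user id -> set of record ids the user touched.
ACCESS = {
    "alice": {"r1", "r2", "r3"},
    "bob": {"r2", "r3"},
    "carol": {"r3", "r4"},
}

# Built ahead of time (batch or incremental); reads are then a dictionary lookup.
INDEX = build_cooccurrence_index(ACCESS)

print(INDEX["r3"])  # ['r2', 'r1']
print(INDEX["r4"])  # ['r3']
```

Because the lookup result is just a list stored next to the record, it could be returned together with the record itself, which is one reading of "present recommendations as part of record API".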

Page 22: From Content Storage to Scaling Smart Data

timeline

» 10/2011: Lily + usage stats
» 12/2011: Lily + usage stats + light-weight analytics
» 3/2012: Lily + recommendations ‘light’
» 6/2012: Lily 2.0: full-on recommendations

Page 23: From Content Storage to Scaling Smart Data

lily enterprise

» adds tools:
  » yum/deb package repo
  » cluster deploy scripts (also EC2)
  » Admin UI
» + enterprise support

Page 24: From Content Storage to Scaling Smart Data

demo (if time permits)

message
‣ to
‣ from
‣ parts
‣ listId
‣ subject
‣ sender

part
‣ content
‣ mediaType
‣ message
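The two record types in the demo model can be pictured as linked data structures. The sketch below is a plain Python rendering, not a Lily schema: the field names follow the slide, but every field type is an assumption, `from_` is renamed because `from` is a Python keyword, and `add_part` is a hypothetical helper that keeps the two link directions in step.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    content: bytes
    media_type: str
    # Back-link to the owning message; excluded from repr/eq to avoid cycles.
    message: Optional["Message"] = field(default=None, repr=False, compare=False)


@dataclass
class Message:
    to: str
    from_: str
    list_id: str
    subject: str
    sender: str
    parts: List[Part] = field(default_factory=list)

    def add_part(self, part):
        """Attach a part and set its back-link, so both directions stay consistent."""
        part.message = self
        self.parts.append(part)
        return part


# Made-up sample values.
msg = Message(
    to="list@example.org",
    from_="someone@example.org",
    list_id="example-list",
    subject="hello",
    sender="someone@example.org",
)
msg.add_part(Part(content=b"hello world", media_type="text/plain"))

print(len(msg.parts))               # 1
print(msg.parts[0].message is msg)  # True
```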

Page 25: From Content Storage to Scaling Smart Data

WHERE?

www.lilyproject.org

Page 26: From Content Storage to Scaling Smart Data

Thank you!
for your attention
for your questions

» [email protected]
» @stevenn

