IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
Lily: Smart data, at scale, made easy
From content storage to scaling smart data
Monday, 6 June 2011
the pain
[Figure labels: Moore, data; need for distributed processing]
the pain
» growth of data sets
» smart businesses need to apply analytics to activities
» doing business online means real-time
» talent shortage
Smart data, at scale, made easy
LILY
The Real-time Platform built for the Age of Data.
We manage, track and measure your data and users, and do the mat(c)hmaking in between:
» provide you with business intelligence and analytics
» harvest user profiles and learn their interests
» dynamically engage your users using quality recommendations
where would you use lily?
» large collections of data
  » content repositories
  » library catalogs
  » (media) asset management
  » product catalogs
  » ‘live’ archives
» large groups of users
  » e-commerce / retail
  » news / media
» ... if you want to use big data, but you need easy.
+ this is where the magic happens
beyond content management
[Diagram labels: broadcast, revenue, product / service, marketing]
beyond content management: data + analytics
[Diagram labels: call to action, revenue, product / service, personalised recommendations, audience data]
LILY 2.0: smart data
[Diagram: SMARTER DATA through data processing. Labels: domain knowledge (patterns, rules, keywords, lists, ...), relations, recommendations, semantic augmentation, analytics, usage metrics]
roadmap
» now: highly scalable data repository: store, index and search
» next: with real-time usage stats gathering and analytics
» later: and built-in context- and user-sensitive recommendations
» built on top of HBase (an open-source implementation of the Google BigTable design) and Solr
» the same robust technology is in use at Facebook, Twitter, StumbleUpon, Yahoo!
» scales widely over distributed (cloud) infrastructure
Lily Repository Model
Sample Lily Schema (excerpt)
namespaces: {
  /* Declaration of namespace prefixes. */
  "org.lilyproject.bookssample": "b",
  "org.lilyproject.vtag": "vtag"
},
fieldTypes: [
  { name: "b$title", valueType: { primitive: "STRING" }, scope: "versioned" },
  { name: "b$pages", valueType: { primitive: "INTEGER" }, scope: "versioned" },
  { name: "b$language", valueType: { primitive: "STRING" }, scope: "versioned" },
  { name: "b$authors", valueType: { primitive: "LINK", multiValue: true }, scope: "versioned" },
  { name: "b$name", valueType: { primitive: "STRING" }, scope: "versioned" },
  { name: "b$bio", valueType: { primitive: "STRING" }, scope: "versioned" },
  { name: "vtag$last", valueType: { primitive: "LONG" }, scope: "non_versioned" }
],
recordTypes: [
  {
    name: "b$Book",
    fields: [
      { name: "b$title", mandatory: true },
      { name: "b$pages", mandatory: false },
      { name: "b$language", mandatory: false },
      { name: "b$authors", mandatory: false },
      { name: "vtag$last", mandatory: false }
    ]
  },
...
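To make the excerpt above more concrete, here is a minimal Python sketch of what the b$Book record type expresses: one mandatory field, several optional ones, and a multi-valued link field. It is not Lily's API. `FIELD_TYPES`, `BOOK_TYPE`, `PYTHON_TYPES` and `validate_record` are hypothetical names, LINK values are assumed to be plain strings, and the sample records in the usage note are made up.

```python
# Hypothetical, simplified mirror of the schema excerpt; not Lily's API.
FIELD_TYPES = {
    "b$title": {"primitive": "STRING", "multi_value": False},
    "b$pages": {"primitive": "INTEGER", "multi_value": False},
    "b$language": {"primitive": "STRING", "multi_value": False},
    "b$authors": {"primitive": "LINK", "multi_value": True},
    "vtag$last": {"primitive": "LONG", "multi_value": False},
}

# Field name -> mandatory flag, as listed for the b$Book record type.
BOOK_TYPE = {
    "b$title": True,
    "b$pages": False,
    "b$language": False,
    "b$authors": False,
    "vtag$last": False,
}

# Assumption: LINK values are modelled as strings (record ids).
PYTHON_TYPES = {"STRING": str, "INTEGER": int, "LONG": int, "LINK": str}


def validate_record(record, record_type=BOOK_TYPE, field_types=FIELD_TYPES):
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    # First pass: every mandatory field must be present.
    for name, mandatory in record_type.items():
        if mandatory and name not in record:
            problems.append(f"missing mandatory field {name}")
    # Second pass: every supplied field must be known and correctly typed.
    for name, value in record.items():
        if name not in record_type:
            problems.append(f"field {name} is not part of the record type")
            continue
        ftype = field_types[name]
        if ftype["multi_value"] and not isinstance(value, list):
            problems.append(f"field {name} must be a list")
            continue
        values = value if ftype["multi_value"] else [value]
        expected = PYTHON_TYPES[ftype["primitive"]]
        if not all(isinstance(v, expected) for v in values):
            problems.append(f"field {name} expects {ftype['primitive']} values")
    return problems
```

Calling `validate_record({"b$title": "Dune", "b$authors": ["author-1"]})` returns an empty list, while a record without `b$title` is reported as missing a mandatory field.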
Lily Architecture (deployment)
Lily Architecture (components)
HBase indexing & RowLog Library
» building and querying indexes, GAE-style
» need for sync/async operations
  » updating of secondary indexes (e.g. link tables)
  » feeding of Indexer (= indexes Lily content into Solr)
» not: transactions
» need for distribution and durability
content table
rowkey   col    col
A        val3   foo6
B        val2   foo7

index table A (col order)
rowkey
val2-B
val3-A
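The two tables above suggest index row keys that combine the indexed value with the content row key, so that a sorted scan returns rows in value order. The following is a minimal sketch of that idea in plain Python, with no HBase client involved. The `-` separator, the column name `other`, and the functions `build_index` and `lookup` are assumptions made for illustration, not the library's actual key encoding or API.

```python
import bisect

# Toy stand-in for the content table in the figure.
# "other" is a hypothetical name for the second, unindexed column.
content_table = {
    "A": {"col": "val3", "other": "foo6"},
    "B": {"col": "val2", "other": "foo7"},
}


def build_index(table, column):
    """Return sorted index row keys of the form '<value>-<content rowkey>'."""
    return sorted(f"{row[column]}-{rowkey}" for rowkey, row in table.items())


def lookup(index_keys, value):
    """Prefix scan over sorted keys: content row keys whose column equals value."""
    prefix = f"{value}-"
    start = bisect.bisect_left(index_keys, prefix)  # first key >= prefix
    matches = []
    for key in index_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key[len(prefix):])
    return matches


index_a = build_index(content_table, "col")
```

Here `index_a` holds the same two row keys as the index table in the figure, and `lookup(index_a, "val2")` finds content row `B` without scanning the content table. A real encoding would need to handle values that contain the separator.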
The Lily Indexer
» denormalization
» indexing of multiple versions of a record
» incremental index updating
» batch index building
» blob content extraction
» sharding towards multiple Solr instances
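The slides do not say how records are assigned to Solr instances. As one possible illustration of sharding, the sketch below routes each record to a shard using a stable hash of its id. The shard names and the functions `shard_for` and `route` are hypothetical, and hash-modulo routing is an assumption, not necessarily the strategy Lily uses.

```python
import hashlib

# Hypothetical shard names; a real deployment would list Solr instance URLs.
SHARDS = ["solr-shard-0", "solr-shard-1", "solr-shard-2"]


def shard_for(record_id, shards=SHARDS):
    """Pick a shard from a stable hash of the record id."""
    # hashlib is used instead of hash(), which varies between Python processes.
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def route(record_ids, shards=SHARDS):
    """Group record ids by the shard that would index them."""
    routing = {shard: [] for shard in shards}
    for record_id in record_ids:
        routing[shard_for(record_id, shards)].append(record_id)
    return routing
```

Because the hash is stable, the same record id always lands on the same shard, which lets an incremental update reach the instance that holds the earlier version. The trade-off is that changing the number of shards remaps most records.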
status june 2011
» Lily 1.0.1 released - in development since Q4/09
» some customers - DIY retail / media / news
» e-commerce platform project
» Lily as the data (integration) tier
» first contrib: FrogPond (annotated Java <> Lily mapper): https://bitbucket.org/calmera/frogpond
Next up: usage stats
» sits in CRUD-path
» tracks user operations against records
» from both perspectives
» arbitrary K/V properties: time, location, ...
» automatically builds user profiles (as records)
» tied to record operations
» indexed access
» time dimension: trending
[Diagram labels: interactions, indexes, time, recommendations, record, user]
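The bullets above describe a planned feature, so no API exists to quote. As a rough sketch of the idea, the class below records each user operation with arbitrary key/value properties, indexes it from both the user and the record perspective, and uses the time dimension to compute trending records. `UsageStats`, `track` and `trending` are hypothetical names, and the in-memory dictionaries stand in for whatever storage the real feature would use.

```python
from collections import Counter, defaultdict


class UsageStats:
    """Toy in-memory tracker; all names are hypothetical, not Lily's API."""

    def __init__(self):
        self.by_user = defaultdict(list)    # user id -> interactions
        self.by_record = defaultdict(list)  # record id -> interactions

    def track(self, user, record, op, timestamp, **properties):
        """Store one user operation, indexed from both perspectives."""
        event = {"user": user, "record": record, "op": op,
                 "time": timestamp, **properties}
        self.by_user[user].append(event)
        self.by_record[record].append(event)

    def trending(self, since, top=3):
        """Records with the most operations at or after `since`."""
        counts = Counter()
        for record, events in self.by_record.items():
            counts[record] = sum(1 for e in events if e["time"] >= since)
        return [r for r, n in counts.most_common(top) if n > 0]
```

A call such as `stats.track("u1", "r1", "read", 10, location="Gent")` stores one interaction under both the user and the record, so `stats.by_user["u1"]` can act as the raw material for a user profile.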
from usage stats to recommendations ‘light’
» grouping of users based on
  » shared properties
  » shared record access
» grouping of records based on
  » shared properties
  » shared user operations
[Diagram labels: record, user, connections, recommendations]
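Grouping users by shared record access can be sketched as a simple co-access score: records touched by users who overlap with you, weighted by the size of the overlap. The data and the `recommend` function below are made up for illustration. This is one plausible reading of recommendations ‘light’, not the algorithm the slides commit to.

```python
from collections import Counter

# Made-up access data: user id -> set of record ids the user touched.
ACCESS = {
    "u1": {"r1", "r2"},
    "u2": {"r1", "r2", "r3"},
    "u3": {"r2", "r4"},
    "u4": {"r5"},
}


def recommend(user, access=ACCESS, top=3):
    """Rank unseen records by overlap-weighted votes from similar users."""
    seen = access[user]
    scores = Counter()
    for other, records in access.items():
        if other == user:
            continue
        overlap = len(seen & records)  # shared record access
        if overlap == 0:
            continue  # no connection between the two users
        for record in records - seen:
            scores[record] += overlap
    return [record for record, _ in scores.most_common(top)]
```

For `u1`, record `r3` ranks ahead of `r4` because `u2` shares two records with `u1` while `u3` shares only one. A user with no overlap, such as `u4`, gets no recommendations.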
full-on recommendations
» look at real-time-capable Mahout algorithms
» pre-index or -calculate as much as possible
» save as secondary indexes
» present recommendations as part of record API
» allow user to contribute ‘domain knowledge’ to record processing pipeline
  » pattern detection, keywords, ontologies, ...
timeline
» Lily + usage stats
» Lily + usage stats + light-weight analytics
» Lily + recommendations ‘light’
» Lily 2.0: full-on recommendations
10/2011, 12/2011, 3/2012, 6/2012
lily enterprise
» adds tools:
  » yum/deb package repo
  » cluster deploy scripts (also EC2)
  » Admin UI
» + enterprise support
demo (if time permits)
message: to, from, parts, listId, subject, sender
part: content, mediaType, message
WHERE?
www.lilyproject.org
Thank you for your attention, and for your questions!
» @stevenn