From Content Storage to Scaling Smart Data

  • Published on
    21-Jan-2015

Transcript

<ul><li> 1. Lily: smart data, at scale, made easy. From content storage to scaling smart data. IIC TECHNOLOGIEPARK 3, B-9052 ZWIJNAARDE (GENT), www.outerthought.org, Monday 6 June 2011</li></ul> <p> 2. [no text]
3. the pain [chart labels: data; need for distributed processing; Moore]
4. the pain: growth of data sets; smart businesses need to apply analytics to activities; doing business online means real-time; talent shortage
5. LILY: The Real-time Platform built for the Age of Data. We manage, track and measure your data and users, and do the mat(c)hmaking in-between: provide you with business intelligence and analytics; harvest user profiles and learn their interests dynamically; engage your users using quality recommendations
6. where would you use lily? large collections of data; large groups of users; content repositories; e-commerce / retail; library catalogs; news / media; (media) asset management; product catalogs; live archives; ... if you want to use big data, but you need easy.
7. this is where the magic happens
8. beyond content management [diagram labels: marketing; broadcast; revenue; product / service]
9. beyond content management: data + analytics [diagram labels: recommendations; call to action; personalised; revenue; product / service; audience data]
10. LILY 2.0: smart data [diagram labels: smarter data; data processing; relations; recommendations; semantic augmentation; analytics; usage metrics; domain knowledge; patterns; rules; keywords; lists; ...]
11. roadmap: now: highly-scalable data repository: store, index and search; next: with real-time usage stats gathering and analytics; later: and built-in context- and user-sensitive recommendations. built on top of Google BigTable / HBase / Solr; identical, robust technology in use at Facebook, Twitter, StumbleUpon, Yahoo!; scales widely over distributed (cloud) infrastructure
12. Lily Repository Model
13. Sample Lily Schema (excerpt)
{
  namespaces: {
    /* Declaration of namespace prefixes. */
    "org.lilyproject.bookssample": "b",
    "org.lilyproject.vtag": "vtag"
  },
  fieldTypes: [
    {
      name: "b$title",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "b$pages",
      valueType: { primitive: "INTEGER" },
      scope: "versioned"
    },
    {
      name: "b$language",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "b$authors",
      valueType: { primitive: "LINK", multiValue: true },
      scope: "versioned"
    },
    {
      name: "b$name",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "b$bio",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "vtag$last",
      valueType: { primitive: "LONG" },
      scope: "non_versioned"
    }
  ],
  recordTypes: [
    {
      name: "b$Book",
      fields: [
        { name: "b$title", mandatory: true },
        { name: "b$pages", mandatory: false },
        { name: "b$language", mandatory: false },
        { name: "b$authors", mandatory: false },
        { name: "vtag$last", mandatory: false }
      ]
    },
    ...
14. Lily Architecture (deployment)
15. Lily Architecture (components)
16. HBase indexing &amp; RowLog Library: building and querying indexes, GAE-style; need for sync/async operations: updating of secondary indexes (e.g. link tables), feeding of Indexer (= indexes Lily-content into Solr); not: transactions; need for distribution and durability [diagram: content table with row A (val3, foo6) and row B (val2, foo7); index table, in order, with row keys val2-B and val3-A]
17. The Lily Indexer: sharding towards multiple SOLR instances; indexing of multiple versions of a record; incremental index updating; blob content extraction; denormalization; batch index building
18. status june 2011: Lily 1.0.1 released - developing since Q4/09; some customers - DIY retail / media / news; e-commerce platform project: Lily as the data (integration) tier; first contrib: FrogPond (annotated Java Lily mapper) https://bitbucket.org/calmera/frogpond
19. Next up: usage stats: sits in CRUD-path; tracks users' ops against records; from both perspectives; arbitrary K/V properties: time, location, ...; automatically builds user profiles (as records); tied to records ops; indexed access; time dimension: trending [diagram labels: record; user; interactions; recommendations; indexes; time]
20. from usage stats to recommendations light: grouping of users based on shared properties, shared record access; grouping of records based on shared properties, shared user operations [diagram labels: record; user; connections; recommendations]
21. full-on recommendations: look at real-time-capable Mahout algorithms; pre-index or -calculate as much as possible; save as secondary indexes; present recommendations as part of record API; allow user to contribute domain knowledge to record processing pipeline: pattern detection, keywords, ontologies, ...
22. timeline: Lily + usage stats: 10/2011; Lily + usage stats + light-weight analytics: 12/2011; Lily + recommendations light: 3/2012; Lily 2.0: full-on recommendations: 6/2012
23. lily enterprise adds tools: yum/deb package repo; cluster deploy scripts (also EC2); Admin UI; + enterprise support
24. demo (if time permits) [diagram labels: message; part; to; content; from; mediaType; parts; message; listId; subject; sender]
25. WHERE? www.lilyproject.org
26. Thank you! for your attention, for your questions. stevenn@outerthought.org @stevenn </p>
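The index-table picture on slide 16 (content rows A and B, index row keys val2-B and val3-A) suggests that each index row key is the indexed value joined to the content row key, so that a scan over the sorted index returns content rows in value order. Below is a minimal Python sketch of that reading. It assumes an in-memory sorted list stands in for an HBase table; the column names `col1`/`col2` and the functions `build_index` and `lookup` are hypothetical and are not taken from Lily's or HBase's API.

```python
from bisect import bisect_left

# Content table from slide 16: row key -> column values.
# "col1"/"col2" are placeholder names; the slide only labels the columns "col".
content = {
    "A": {"col1": "val3", "col2": "foo6"},
    "B": {"col1": "val2", "col2": "foo7"},
}

def build_index(table, column):
    """Return sorted index row keys of the form '<value>-<content row key>'."""
    return sorted(f"{row[column]}-{key}" for key, row in table.items())

def lookup(index, value):
    """Return content row keys whose indexed column equals `value`."""
    prefix = f"{value}-"
    # Jump to the first index entry that could start with the prefix,
    # then scan forward while entries still match (a prefix range scan).
    start = bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if not entry.startswith(prefix):
            break
        matches.append(entry[len(prefix):])
    return matches

index = build_index(content, "col1")
print(index)                  # ['val2-B', 'val3-A']
print(lookup(index, "val3"))  # ['A']
```

The sketch rebuilds the index from scratch; keeping such an index in step with the content table as records change is the kind of sync/async operation the slide lists under RowLog.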
