Peter Brantley, Internet Archive, The Presidio. Dallas, Texas, 01.2012

Breaking the catalog

DESCRIPTION

Discusses the necessity of linked data to inform discovery; the benefits of aggregation and closed data; the issues with rights.

Page 1: Breaking the catalog

Peter Brantley, Internet Archive, The Presidio. Dallas, Texas, 01.2012

Page 2: Breaking the catalog

I  have  a  book.  

Page 3: Breaking the catalog
Page 4: Breaking the catalog
Page 5: Breaking the catalog

It’s  really  a  database  in  a  book.  

Page 6: Breaking the catalog

Doesn’t  exist  on  the  web.  

Page 7: Breaking the catalog

The  catalog  entry  is  not  useful.  

Page 8: Breaking the catalog
Page 9: Breaking the catalog

 It  does  not  even  give  you  a  hint    of  the  awesomeness  of  it.  

Page 10: Breaking the catalog

 All  bibliographic  data  underperforms  in  this  way,  no  matter  how  we  describe  it.      

Page 11: Breaking the catalog

 And  it  can’t  do  much  for  discovery.  

Page 12: Breaking the catalog

 Discovery  is  a  lot  more  than  a    better  index  of  metadata.  

Page 13: Breaking the catalog
Page 14: Breaking the catalog
Page 15: Breaking the catalog

 Discovery  is  metadata,      contextualized  by  user      desire.    

Page 16: Breaking the catalog

Which  means:    

what’s  relevant  to  me,    right  now,    right  here.  

Page 17: Breaking the catalog

 One  of  linked  data’s  challenges  is  contributing  to  discovery.  

Page 18: Breaking the catalog

Consider:    Small  Demons    

(smalldemons.com)  

Page 19: Breaking the catalog

Literature through Freebase, with Zemanta entity extraction and matching

Page 20: Breaking the catalog

 Very  nice  enhanced  browse  capacity  for  ebook  discovery.  

Page 21: Breaking the catalog

APIs could engender a range of new services.

Page 22: Breaking the catalog

But … data for recommendations is limited to known attributes and user-generated content (UGC).

Page 23: Breaking the catalog
Page 24: Breaking the catalog

 Cool  when  it  works;  will  work  better  with  more  aggregation.    

Page 25: Breaking the catalog

 Lesson:  Information  format  is  often  divorced      from  its  utility  

Page 26: Breaking the catalog

and  even  more  importantly  …    

Page 27: Breaking the catalog

 most  open  culture  search  is  absolutely  ignorant  of  the  context  of  my  desires.  

Page 28: Breaking the catalog

remember  “Lincoln”  

Page 29: Breaking the catalog

at the time of writing, there’s one newly released and top-selling book.

Page 30: Breaking the catalog

 chances  are,  at  time  of  writing,      it’s  that  book  that  I  want.    

Page 31: Breaking the catalog

 Amazon  can  figure  that  out.  

Page 32: Breaking the catalog

 Because  they  are  selling  a  shitload  of  them.  

Page 33: Breaking the catalog

 Simple:  increase  relevancy  by  incorporating  bias  toward  most  recent  retrievals.    
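
A minimal sketch of that bias, assuming each record carries a base relevance score from the metadata index plus a list of retrieval timestamps (sales, loans, downloads). The `Record` type, the half-life, and the weighting formula are illustrative assumptions, not a description of how Amazon or any library system actually ranks.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    title: str
    text_score: float             # hypothetical base relevance from the metadata index
    retrieval_times: List[float]  # when the item was retrieved, in days on a shared clock


def recency_boost(retrieval_times: List[float], now: float,
                  half_life_days: float = 30.0) -> float:
    """Sum of exponentially decayed retrieval events; recent events count most."""
    return sum(0.5 ** ((now - t) / half_life_days)
               for t in retrieval_times if t <= now)


def rank(records: List[Record], now: float, weight: float = 1.0) -> List[Record]:
    """Order records by base relevance scaled up by recent retrieval activity."""
    def score(r: Record) -> float:
        boost = recency_boost(r.retrieval_times, now)
        return r.text_score * (1.0 + weight * math.log1p(boost))
    return sorted(records, key=score, reverse=True)


# Two made-up records matching a search for "Lincoln".
older = Record("Lincoln (older biography)", 1.0, [0.0, 10.0, 20.0])
newer = Record("Lincoln (new release)", 0.9, [float(d) for d in range(990, 1000)])
for r in rank([older, newer], now=1000.0):
    print(r.title)
```

With `weight=0.0` the ranking falls back to the metadata score alone, which is the position a catalog is in when it has no retrieval data to draw on.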

Page 34: Breaking the catalog

 Easy  for  Amazon:  they  have  sales  data.  

Page 35: Breaking the catalog

 Library  (ebook)  circulation  is  increasingly  meaningless,  or  more  accurately,  unavailable.  

Page 36: Breaking the catalog

The book is online. But digitally off-site.

Page 37: Breaking the catalog

 Optimizing  discovery  is  hard.      

Page 38: Breaking the catalog

 Segue:  Consider  relationship  modeling.  

Page 39: Breaking the catalog

 Mozart’s  Don  Giovanni  and      José  Zorrilla’s  Don  Juan  Tenorio  

 via  Tirso  de  Molina’s  El  burlador  de  Sevilla  
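
A small sketch of what modeling that relationship could look like as subject, predicate, object triples. The `derivedFrom` predicate is a made-up name, and reading "via" as "derived from" is an assumption for illustration; a real linked data vocabulary would supply its own terms and identifiers.

```python
# Hypothetical predicate name; not taken from any published vocabulary.
DERIVED_FROM = "derivedFrom"

# Each triple is (subject, predicate, object).
triples = [
    ("Don Giovanni", DERIVED_FROM, "El burlador de Sevilla"),
    ("Don Juan Tenorio", DERIVED_FROM, "El burlador de Sevilla"),
]


def related_via_common_source(work, triples):
    """Return other works that share at least one source with `work`."""
    sources = {o for s, p, o in triples if s == work and p == DERIVED_FROM}
    return sorted({s for s, p, o in triples
                   if p == DERIVED_FROM and o in sources and s != work})


print(related_via_common_source("Don Giovanni", triples))
```

Once the two `derivedFrom` statements exist, any catalog that can read them can connect the opera to the play without redoing the analysis.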

Page 40: Breaking the catalog

 per  the  Library  Loon  ...    

 “relationship  modeling      only  need  be  done  once”  

Page 41: Breaking the catalog

 which  in  real  world  terms  means      centralizing  this  modeling    

Page 42: Breaking the catalog

 duplicating  the  best  of  Flickr  etc.  –      for  a  LOD  repository  

Page 43: Breaking the catalog

   crowd  sourced  resource  modeling  

Page 44: Breaking the catalog

Enables interesting approaches to book recommendation and browsing algorithms.

Page 45: Breaking the catalog

 Linked  data  makes  for  nice  CS  experiments  and  gets  digital  librarians  excited.    

Page 46: Breaking the catalog

 No  one  thinks  linked  data  is  a  panacea.    

Page 47: Breaking the catalog

It’s  a  tool  that  can  help  in  some  contexts.  

Page 48: Breaking the catalog

 Yet  not  so  much  in  others.  

Page 49: Breaking the catalog

 I  will  argue  …    

The most compelling uses of LD in repositories may be intra-catalog.

Page 50: Breaking the catalog

Thinking of the catalog as a database, like Amazon’s.

Page 51: Breaking the catalog

 If  I  just  want  bib  info  (metadata),  go  yonder  to  OCLC  or  Open  Library.  

Page 52: Breaking the catalog

If I want to find out what to watch or read, I want to go to the largest possible aggregation of user data and metadata.

Page 53: Breaking the catalog

 Might  be  Amazon.    (Or  could  be  DPLA  …  ).    

Page 54: Breaking the catalog

Library LOD has to be network scale, on a single platform, to be end-user attractive (like Amazon).

Page 55: Breaking the catalog

I think that’s kind of a funny conundrum.

Page 56: Breaking the catalog

 Because  in  a  way,  linked  open  data      is  about  a  web  of  open  data.    

Page 57: Breaking the catalog

 However,  unless  you  are  in  the  business  of  providing  open  data  there’s  more  utility  in      …  

Page 58: Breaking the catalog

 structured  data  on  a  restricted  platform  –    linked  closed  data  (so  to  speak)  

Page 59: Breaking the catalog

 From  a  business  perspective,  I’d  be  a  real  fan  of  linked  closed  data.    

Page 60: Breaking the catalog

 If  I  offered  cloud  data  services,  I’d  be  happy  to  host  any  useful  linked  open  data.      

Page 61: Breaking the catalog

 (Because  being  too  open  to  ingest,  too  polygamous,  can  poison  data  stores)  

Page 62: Breaking the catalog

 As  long  as  I  (a  platform)  could  retain  an      unrestricted  copy  of  your  data.      

Page 63: Breaking the catalog

There’s a (copyleft) rights issue here too … (e.g. CC-SA and derivatives)

Page 64: Breaking the catalog

 LOD  domains  assume  unbounded  sharing    

Page 65: Breaking the catalog

 But  rights  might  be  quite  granular  or  restricted  downstream      

Page 66: Breaking the catalog

 Europeana  requires  downstream  commercial  rights  to  encourage  new  enterprise    

Page 67: Breaking the catalog

But LAMs (libraries, archives, museums) might not possess those rights, restricting the size of the data market.

Page 68: Breaking the catalog

 If  we  want  linked  open  data  to  work  well      

Page 69: Breaking the catalog

 We  need  to  aggregate  and  hold  data  on  a  single  network  platform  to  the  greatest  possible  extent.  

Page 70: Breaking the catalog

 Because  that  will  drive  use,  and  obtain  intentionality  information.  

Page 71: Breaking the catalog

 And  that  data  will  help  ultimately  to  contextualize  metadata  with  desire.  

Page 72: Breaking the catalog

 Therefore  from  the  user  perspective  …  

Page 73: Breaking the catalog

 I’d  like  to  see  us  build  out  a  common  open  platform  for  LOD.      

Page 74: Breaking the catalog

 The  most  powerful  opportunity  for  LOD    may  be  in  building  central  repositories.  

Page 75: Breaking the catalog

       peter  brantley  

     director,  bookserver  project          internet  archive          san  francisco  ca    

       @naypinya            (twitter)