Knowledge out there on the Web Serge Abiteboul 2014 Abiteboul - EDBT keynote, Athenes 1

Knowledge out there on the Web

Serge Abiteboul

2014

• Knowledge out there on the Web:

Video of the talk at the Royal Society

Knowledge out there on the Web

Personal knowledge out there

Organization

1. The context
2. The personal information management system
   1. The concept of Pims
   2. Pims are coming
   3. Advantages
3. From information to knowledge
4. The Webdamlog language
   1. The language in brief
   2. Probabilities
   3. Access control
5. Conclusion: some research issues

1. The context

Data explosion

• data: pictures, music, movies, reports, email, tweets, contacts, schedules…
• social interactions: opinions, annotations, recommendation…
• metadata: on photos, documents, music…
• ontologies: Alice’s ontology and mapping with other ontologies
• web localizations: friends account on FB, twitter, lists of blogs…
• security: credentials on various systems
• data in various organizations
  – jobs, schools, insurances, banks, taxes, medical, retirement…
• data in various vendors
  – amazon, retailers, netflix, applestore…
• data that software or hardware sensors capture
  – with or without our knowledge
  – web navigation, phone use, geolocation, "quantified self" measurements, contactless card readings, surveillance camera pictures
• …

Data dispersion

• Laptop, desktop, smartphone, tablet, car computer
• Residential boxes (tvbox), NAS, electronic vaults…
• Mail, address book, agenda, todo-lists
• Facebook, LinkedIn, Picasa, YouTube, Twitter
• Svn, Google docs, Dropbox
• Government services
• Business services
• Also machines and systems from
  – family, friends, associations, work
• Systems even unknown to the user
  – third party cookies

Data heterogeneity

• Type: text, relational, HTML, XML, pdf…
• Terminology/structure/ontology
• Systems: MS, Linux, iOS, Android
• Distribution
• Security protocols
• Quality: incomplete / inconsistent information

Bad news

• Limited functionalities because of the silos
  – Difficult to do global search, synchronization, task sequencing over distinct systems…
• Loss of control over the data
  – Difficult to control privacy
  – Leaks of private information
• Loss of freedom
  – Vendor lock-in

Growing resentment

• Against companies
  – Intrusive marketing, cryptic personalization and business decisions (e.g., on pricing), and automated customer service with no real channel for customers' voices
  – Creepy "big data" inferences
• Against governments
  – NSA and its European counterparts
• Dissymmetry between what these systems know about a person, and what the person actually knows

Future alternatives (for normal people)

1. Continue with this increasing mess
   – Use a shrink to overcome frustration
2. Regroup all your data on the same platform
   – Google, Apple, Facebook, …, a newcomer
   – Use a shrink to overcome resentment
3. Study 2 years to become a geek
   – Geeks know how to manage their information
   – Use a shrink to survive the experience
4. And, of course, there is the Pims’ way

2. The personal information management system

2.1 Introduction

The Pims

• Personal information management system
• What is a successful Web service today
  – Some great software
  – Some machines on which it runs
  (and a business model)
• Separate the two facets
  – Some company provides the software
  – It runs on your machine
  with another business model

The Pims (1)

• The Pims runs software
  – The user chooses the code to deploy on the server.
  – The software is open source, a requirement for security.
• With the user's data
  – All the user’s personal information
• On the user’s server(s)
  – The user owns it or pays for a hosted server
  – The server may be a physical or a virtual machine
  – It may be physically located at the user’s home (e.g., a tvbox) or not
  – It may run on a single machine or be distributed among several machines
  – The server is in the cloud, i.e., it can be reached from everywhere - personal cloud

The Pims: the 2 main issues

• Security
  – Enforced by the Pims: guaranteed by the contract the user has with the Pims
    • Reasonably small piece of code; possible to verify it
  – Enforced by the services running on it: open source so that we don’t need to trust the providers of these systems
  – A higher level of security than now
• The management
  – Should be epsilon-work
  – Should require little competence
  – A company can be paid to do it (in the cloud)

2. The personal information management system

2.2 This is arriving

It is becoming possible

• System administration is easier
  – Abstraction technologies for servers
  – Virtualization and configuration management tools.
• Open source is very active
  – Open source technology more and more available
• Price of machines is going down
  – A low-cost hosted server is as cheap as 5€/month
  – Paying is no longer a barrier for a majority of people

Indeed I am sure you have friends already doing it

Many people are working on it

• Many systems & projects
  – Lifestreams, Stuff-I’ve-Seen, Haystack, MyLifeBits, Connections, Seetrieve, Personal Dataspaces, or deskWeb.
  – YunoHost, Amahi, ArkOS, OwnCloud or Cozy Cloud
• Some on particular aspects
  – Mailpile for mail
  – Lima for a Dropbox-like service, but at home.
  – Personal NAS (network-attached storage), e.g., Synology
  – Personal data store SAMI of Samsung...
• Many more

Data disclosure movement

• Smart Disclosure in the US
• MiData in the UK
• MesInfos in France

Several large companies (network operators, banks, retailers, insurers…) have agreed to share with a panel of customers the personal data that they have about them

Big companies are interested (1)
Pre-digital companies

• E.g., hotels or banks
• Disintermediated from their customers by pure Internet players such as Google, Amazon, Booking.com, Mint.
• In Pims, they can rebuild direct interaction
• The playing field is neutral
  – Unlike on the Internet where they have less data
• They can offer new services without compromising privacy

Big companies are interested (2)
Home appliances companies

• Many boxes deployed at home or in datacenters
  – Internet access provider "boxes", NAS servers, "smart" meters provided by energy vendors, home automation systems, "digital lockers"…
• Personal data spaces dedicated to specific usage
• Could evolve to become more generic
• Control of private Internet of objects

2. The personal information management system

2.3 Advantages

Advantages

• User control over their data
  – Who has access to what, under what rules, to do what
• User empowerment
  – They freely choose services & they can leave a service
• Participation in a more “neutral” Web
  – With the "network effects", the main platforms are accumulating data/customers and distorting competition
  – The Pims bring back fairness on the Web
  – Good practices are encouraged, e.g., interoperability, portability

Advantages – New functionalities

• Single identity/login
• Semantic global search with (personal) ontology
• Synchronization/backups across services
• Access control management across services
• Task sequencing across services
• Exchange of information between “friends”
• Connected objects control, a hub for the IoT
• Personal big data analysis

3. From information to knowledge

(aka let’s move a tad more technical)

Machines prefer knowledge

• Integration of data & information sources
  – It is easier to integrate knowledge than information
• Collaboration between services & devices
  – It is easier for services to collaborate using knowledge than with information
• Problem solving based on knowledge inference

Humans as well

• The users of the system are human beings
  – They want support for managing information
  – But they are not geeks
  – They don’t want to program
• To facilitate the interactions between humans and machines, we should use declarative languages!

It all started with datalog

• Popular in the 90’s
• Some followers in 00’s
  – A., Afrati, Atzeni, Cali, Greco, Gottlob, Milo, Sacca, Ullman…
• Recent revival
  – 2010 Oege de Moor’s workshop @oxford
    • Datalog 2.0
  – 2010 Joe Hellerstein’s keynote @pods
    • Datalog Redux: Experience and Conjecture
  – 2014 Frank Neven’s keynote @icdt
    • Remaining CALM in declarative networking
• Now featuring: Webdamlog

Requirement 1: Distribution

• Different machines
• Different users
• We use the notion of principal here
  – family@alice(Bob)
  – agenda@Alice-iPhone(…)
  – friends@Alice-FaceBook(…)

• A principal comes with identity and privileges

Requirement 2: Privacy

• Control of who sees what in a distributed environment
• Access control
• Should be clear from the first part of the talk: this is a most important issue

Tutorial on privacy by Nicolas Anciaux, Benjamin Nguyen, Iulian Sandu Popa – Today at 2:00

Requirement 3: Probabilities

• We have to deal with negation
  – Elvis was not French
• With negations, come contradictions
  – Elvis Presley died in 1977; The King is alive
• There are different points of view
  – Elvis’s music is the best; it stinks
• Measure uncertainty with probabilities

The more I see, the less I know for sure. John Lennon

So, what is the goal

• A datalog-style language with
  – distribution
  – access control
  – probabilities

We are lucky, there is such a language: Webdamlog

4. The Webdamlog language

(aka let’s be serious)

4.1 Webdamlog in brief

Facts and rules

Facts are of the form R@p(a1,…,an)
– p is a principal, e.g., Serge, Serge’s-iPhone, Facebook/Serge, [email protected]

Rules are of the form

$R@$P($U) :- $R1@$P1($U1), ..., $Rn@$Pn($Un)
– $R, $Ri are relation terms
– $P, $Pi are peer terms
– $U, $Ui are tuples of terms
– Safety condition
– Also negations: ignored here
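The shape of facts and rules described above can be sketched as a small data model. This is only an illustration, not the Webdamlog implementation: the class names, the regular expression for the textual syntax, and the safety check (taken here to be the classical Datalog condition that every head variable occurs in the body) are assumptions introduced for the example.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    relation: str   # R (or a variable such as $R)
    peer: str       # p, the principal (or a variable such as $P)
    args: tuple     # (a1, ..., an); variables start with "$"

@dataclass(frozen=True)
class Rule:
    head: Atom
    body: tuple     # tuple of Atom

# Hypothetical textual syntax: relation@peer(arg1, arg2, ...)
_ATOM_RE = re.compile(r'^\s*([\w$-]+)@([\w$./-]+)\((.*)\)\s*$')

def parse_atom(text: str) -> Atom:
    """Parse one atom; arguments containing commas are not supported."""
    m = _ATOM_RE.match(text)
    if m is None:
        raise ValueError(f"not an atom: {text!r}")
    relation, peer, raw = m.groups()
    args = tuple(a.strip().strip('"') for a in raw.split(",")) if raw.strip() else ()
    return Atom(relation, peer, args)

def variables(atom: Atom) -> set:
    """All $-variables in relation, peer and argument positions."""
    return {t for t in (atom.relation, atom.peer, *atom.args) if t.startswith("$")}

def is_safe(rule: Rule) -> bool:
    """Assumed safety condition: head variables are bound by the body."""
    bound = set().union(*(variables(a) for a in rule.body)) if rule.body else set()
    return variables(rule.head) <= bound

fact = parse_atom('friend@my-iphone("Alice", "Bob")')
rule = Rule(parse_atom("fof@my-iphone($x, $y)"),
            (parse_atom("friend@my-iphone($x, $y)"),))
```

Keeping relation and peer as ordinary terms is what lets a rule head such as $message@$peer(...) be instantiated at run time.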

The semantics of rules

Classification based on locality and nature of head predicates (intensional or extensional)

• Local rule at my-laptop: all predicates in the body of the rules are from my-laptop

Local with local intensional head: datalog
Local with local extensional head: database update
Local with non-local extensional head: messaging between peers
Local with non-local intensional head: view definition
Non-local: general delegation
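The five cases above amount to a small decision procedure. The sketch below is a hypothetical helper (the function name and its arguments are assumptions made for this example); it takes the peer where the rule lives, the peer of the head, whether the head relation is intensional, and the peers of the body atoms.

```python
def classify(site, head_peer, head_is_intensional, body_peers):
    """Return the kind of rule, following the five cases of the classification."""
    # A rule is local when every body atom is at the peer holding the rule.
    if any(peer != site for peer in body_peers):
        return "general delegation"
    if head_peer == site:
        return "datalog" if head_is_intensional else "database update"
    return "view definition" if head_is_intensional else "messaging between peers"

kinds = [
    classify("my-iphone", "my-iphone", True, ["my-iphone", "my-iphone"]),
    classify("my-iphone", "my-iphone", False, ["my-iphone"]),
    classify("my-iphone", "gmail.com", False, ["my-iphone", "my-iphone"]),
    classify("my-iphone", "gossip-site", True, ["my-iphone", "my-iphone"]),
    classify("my-iphone", "gossip-site", True, ["my-iphone", "alice-iphone"]),
]
```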

Local rules with local head

Intensional local head – datalog
[at my-iphone]
fof@my-iphone($x,$y) :- friend@my-iphone($x,$y)
fof@my-iphone($x,$y) :- friend@my-iphone($x,$z), fof@my-iphone($z,$y)

Extensional local head – database updates
[at my-iphone]
believe@my-iphone("Alice", $loc) :- tell@my-iphone($p, "Alice", $loc), friend@my-iphone($p)
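The two fof rules define the transitive closure of friend. A minimal sketch of how such a local, intensional relation could be computed is a naive fixpoint over a set of pairs; the function name and the sample data are assumptions, and a real engine would use a more efficient (for instance semi-naive) evaluation.

```python
def friends_of_friends(friend):
    """Naive fixpoint for:
       fof(x, y) :- friend(x, y)
       fof(x, y) :- friend(x, z), fof(z, y)
    """
    fof = set(friend)                      # first rule
    while True:
        derived = {(x, y)
                   for (x, z) in friend
                   for (z2, y) in fof
                   if z == z2}             # second rule
        new = derived - fof
        if not new:                        # nothing new: fixpoint reached
            return fof
        fof |= new

friend = {("a", "b"), ("b", "c"), ("c", "d")}
fof = friends_of_friends(friend)
```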

Local rules & non-local extensional head

Messaging between peers

$message@$peer($name, "Happy birthday!") :-
    today@my-iphone($date),
    birthday@my-iphone($name, $message, $peer, $date)

Example
– today@my-iphone(3/25)
– birthday@my-iphone("Manon", "sendmail", "gmail.com", 3/25)
– [email protected]("Manon", "Happy birthday")
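What makes this rule unusual is that the relation and the peer of the head are themselves variables, bound by the birthday tuple. A hedged sketch of that instantiation step follows; the function name and the triple used to represent an outgoing fact are assumptions made for the example.

```python
def birthday_messages(today, birthday):
    """Instantiate $message@$peer($name, "Happy birthday!") for matching dates.

    today:    set of dates
    birthday: set of (name, message_relation, peer, date) tuples
    Returns a list of (relation, peer, args) facts to send to other peers.
    """
    outgoing = []
    for name, message, peer, date in sorted(birthday):
        if date in today:                  # join on $date
            outgoing.append((message, peer, (name, "Happy birthday!")))
    return outgoing

sent = birthday_messages(
    today={"3/25"},
    birthday={("Manon", "sendmail", "gmail.com", "3/25")},
)
```

With the example data, the single derived fact has relation sendmail at peer gmail.com, which is why it leaves my-iphone as a message.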

Local rules & non-local intensional head

View definition

boyMeetsGirl@gossip-site($girl, $boy) :-
    girls@my-iphone($girl, $event),
    boys@my-iphone($boy, $event)

• Semantics of boyMeetsGirl@gossip-site is a join of relations girls and boys from my-iphone
• Defines a view at some other peer
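The join semantics stated above can be illustrated directly. In this sketch the relations are plain sets of pairs and the sample names are made up for the example; how the view is actually shipped to and maintained at gossip-site is not modelled.

```python
def boy_meets_girl(girls, boys):
    """Join girls(girl, event) with boys(boy, event) on the shared event."""
    return {(girl, boy)
            for (girl, girl_event) in girls
            for (boy, boy_event) in boys
            if girl_event == boy_event}

girls = {("Alice", "Julia's birthday"), ("Zoe", "concert")}
boys = {("Bob", "Julia's birthday"), ("Tom", "picnic")}
view = boy_meets_girl(girls, boys)
```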

Non-local rules

General delegation (at my-iphone):

boyMeetsGirl@gossip-site($girl, $boy) :-
    girls@my-iphone($girl, $event),
    boys@alice-iphone($boy, $event)

Example: girls@my-iphone("Alice", "Julia's birthday")
– my-iphone installs the following rule at alice-iphone

boyMeetsGirl@gossip-site("Alice", $boy) :-
    boys@alice-iphone($boy, "Julia's birthday")

Useful to distribute work and exchange knowledge
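Delegation can be read as partial evaluation: the local peer evaluates the atoms it owns, substitutes the bindings, and ships the residual rule to the peer that owns the next atom. The sketch below hard-codes the rule from the example and returns the residual rules as text; the function name is an assumption and the real system's wire format is not shown in the talk.

```python
def delegate(girls_facts, remote_peer):
    """For each local girls(girl, event) tuple, build the residual rule
    that would be installed at remote_peer."""
    residual = []
    for girl, event in sorted(girls_facts):
        residual.append(
            f'boyMeetsGirl@gossip-site("{girl}", $boy) :- '
            f'boys@{remote_peer}($boy, "{event}")'
        )
    return residual

rules = delegate({("Alice", "Julia's birthday")}, "alice-iphone")
```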

The thesis

The Web should turn into a distributed knowledge base where peers share facts and rules, and collaborate
The language Webdamlog is a first step towards that goal
Missing
– Probabilities
– Access control

4. The Webdamlog language

4.2 Probabilities

Advertisement

Deduction with Contradictions in Datalog
S.A., Daniel Deutch and Victor Vianu
– Tomorrow, 11:00

4. The Webdamlog language

4.3 Access control

Requirements

Data access: Users would like to control who can read and modify their information

Data dissemination: Users would like to control how their data are transferred from one participant to another

Application control: Users would like to control which applications can run on their behalf, and what information these applications can access.

The general picture

• Coarse grain for extensional relations
  – read access to the relation
• Fine grain for intensional relations
  – read access to tuple t requires read access to the tuples that lead to deriving t
• Delegation controlled in a sandbox
• Focus on read privilege here

Read: default

• Extensional relations
  – if you have read privilege to the relation
• Intensional relations
  – if you have read privilege to the relation &
  – if you can read all the tuples that have been used to create this fact – provenance of the fact

Fine grain access control

[at Bob] album@Alice($p,$f) :- photo@Bob($p,$f)
[at Sue] album@Alice($p,$f) :- photo@Sue($p,$f)

– album@Alice is intensional
– Both Bob and Sue contribute to it
– Peter, who has read privilege to album@Alice and photo@Bob only, does not see the photos of Sue
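The default read policy for intensional relations can be sketched as a provenance check. Because access to extensional relations is coarse grain, the sketch records provenance as the set of source relations behind each derived tuple; this simplification, the function name and the photo identifiers are assumptions made for the example.

```python
def can_read(rights, relation, provenance):
    """Readable iff the reader may read the relation itself and every
    source relation the tuple was derived from."""
    return relation in rights and provenance <= rights

# album@Alice tuples, each tagged with the relations used to derive it
album = {
    ("beach.jpg", "f1"): {"photo@Bob"},
    ("party.jpg", "f2"): {"photo@Sue"},
}

peter = {"album@Alice", "photo@Bob"}      # Peter's read privileges
visible_to_peter = sorted(
    t for t, prov in album.items() if can_read(peter, "album@Alice", prov)
)
```

Peter's view contains only the tuple contributed by Bob, which matches the behaviour described on the slide.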

Paranoiac access control

[at Bob] album@Alice($p,$f) :- photo@Bob($p,$f), friends@Bob($f)

– Issue: you can read Bob’s photos only if you have read privilege on friends@Bob that Bob wants to keep private

Declassification

[at Bob] photo@Alice($p,$f) :- photo@Bob($p,$f), [ hide friends@Bob($f) ]

– Hide: blocks the provenance from friends@Bob
– Bob declassifies this data just for the evaluation of this rule
– You can declassify only tuples you own grant ↦ privilege
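The effect of hide can be illustrated by dropping the hidden atoms from the provenance before the read check. This is a sketch under assumptions: provenance is tracked per relation, the function names are invented for the example, and the reader's privileges are made up; the ownership condition on declassification is not modelled.

```python
def provenance(body, hidden=frozenset()):
    """Relations a reader must be able to read; hidden atoms are excluded."""
    return set(body) - set(hidden)

def can_read(rights, head, body, hidden=frozenset()):
    """Default policy: read privilege on the head and on the whole provenance."""
    return head in rights and provenance(body, hidden) <= set(rights)

reader = {"photo@Alice", "photo@Bob"}     # no privilege on friends@Bob
body = ["photo@Bob", "friends@Bob"]

paranoid = can_read(reader, "photo@Alice", body)
declassified = can_read(reader, "photo@Alice", body, hidden={"friends@Bob"})
```

Without hide the reader is blocked by friends@Bob; with hide the same reader can see the derived photos while friends@Bob itself stays private.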

Issues with non local rules

[at Bob]
message@Sue("I hate you") :- date@Alice(d)
aliceSecret@Bob(x) :- date@Alice(d), secret@Alice(x)

Ignoring access rights, by delegation, this results in running
[at Alice]
message@Sue("I hate you") :- date@Alice(d)
aliceSecret@Bob(x) :- date@Alice(d), secret@Alice(x)

Default solution: sand box

We run the rule at Alice in a sandbox
• We use the access rights of Bob
  So the second rule does not succeed in sending secrets
• The message specifies that this is done at Bob’s request
  So requires authentication/signatures

Alternative: delegation without sandbox
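The sandbox idea can be sketched as evaluating each delegated rule with the requester's privileges rather than the host's. In this illustration a rule is reduced to its head and the list of relations in its body, and Bob is assumed to have read privilege on date@Alice but not on secret@Alice; these simplifications and the function name are assumptions, and authentication of the request is not modelled.

```python
def run_in_sandbox(delegated_rules, requester_rights):
    """Split delegated rules into those the requester is allowed to run
    at the host and those blocked by the requester's read privileges."""
    allowed, blocked = [], []
    for head, body in delegated_rules:
        target = allowed if set(body) <= requester_rights else blocked
        target.append(head)
    return allowed, blocked

rules_from_bob = [
    ("message@Sue", ["date@Alice"]),
    ("aliceSecret@Bob", ["date@Alice", "secret@Alice"]),
]
bob_rights = {"date@Alice"}               # assumed privileges of Bob at Alice
allowed, blocked = run_in_sandbox(rules_from_bob, bob_rights)
```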

5. Conclusion: some research issues

Explaining

• Users want to understand the information they see, the answers they are given
  – In their professional/social life
• Difficulties
  – Reasoning with large number of facts
  – Information is often probabilistic and not public
  – Requires knowing how the information was obtained (its provenance)

Serendipity

• You may hear by chance a song that is going to totally obsess you
• A librarian may suggest that you read an article that will transform your research
This is serendipity

• A perfect search engine
• A perfect recommendation system
• A perfect computer assistant
Such systems are boring
They lack serendipity

Design programs that would introduce serendipity in our lives

Hypermnesia

Exceptionally exact or vivid memory, especially as associated with certain mental illnesses

For a user: We cannot live knowing that any word, any move will leave a trace?

For the ecosystem: We cannot store all the data we produce – lack of storage resources

Forgetting is Key to a Healthy Mind, Scientific American (Image: Aaron Goodman)

A main issue is to select the information we choose to keep

Babel of human-machine-interaction

• Each time a user interacts with a data source, does he have to use the ontology of that source?
• No!
• Instead of a user adapting to the ontologies of the N systems he uses each day
• We want the N systems to adapt to the user’s ontology

Religion…science…machines

• Knowledge used to be determined by religion
• Knowledge used to be determined scientifically
• Knowledge will now be determined by machines?
• Decisions are increasingly made by machines
  – Stock market (automatic trading)
  – Fully automated factory
  – Fully automated metros
  – Death penalty (killer drones)…

to the digital world!

• We will soon be living in a world surrounded by machines that
  – acquire knowledge and decide for us
• What will we do with that technology?
• Will we become smarter?
• Will we become master or slave of the new technology?

σας ευχαριστώ (Thank you)