Search Computing Engineering SeCo: Liquid Queries Marco Brambilla, Stefano Ceri SeCo workshop, Como, June 17th-19th, 2009


Search Computing

Engineering SeCo: Liquid Queries

Marco Brambilla, Stefano Ceri

SeCo workshop, Como, June 17th-19th, 2009


Brambilla, Ceri. Search Computing: LIQUID QUERIES

Agenda

1. Overview of the SeCo architecture
– Development and experimentation roadmap

2. Application development approach: LIQUID QUERIES
– Configurability of the interface, strong parameter typing, static mapping to services
– Continuous query processes
– Exploitation of user intelligence (interactive query process, user feedback), BPM
– Automatic code generation of user interface and interaction steps
– Adaptivity and customization of the query interaction

3. Support to the developer in the various design phases
– Service marts specification
– Query specification
– User interface specification

4. SeCo extensions
– High-level queries: general almost-NL queries
   • NLP, WordNet, query splitting, and mapping to services


1. Overview and roadmap of the SeCo architecture


Search Computing architecture: overall view

[Figure: SeCo architecture and main query flow. Components: Front End, Query Analysis, Query To Domain Mapper, Query Planner, Query Engine (operators OP 1 ... OP N), WS-Framework, Result Transformation, and Domain Framework, each with its own cache; Domain Repository; Service Repository; the external WS World. Data exchanged along the flow: High-Level Query, Sub-queries, Concrete Query Plan, Low-level queries, Merged Results, Final User Results. Arrows distinguish the main query flow from the <Uses> relation.]

Example:
– High-level query: "Where can I attend a DB scientific conference close to a beautiful beach reachable with cheap flights?"
– Sub-query 1: "Where can I attend a DB scientific conference?"
– Sub-query 2: "Place close to a beautiful beach?"
– Sub-query 3: "Place reachable with cheap flight?"
– Low-level query 1: ConfSearch("DB", PlaceX, DateY)
– Low-level query 2: TourSearch("Beach", PlaceX)
– Low-level query 3: Flight("cost<200", PlaceX, DateY)
– Query plan: service invocations and operator execution
– Presented results: MSVVEIS'08, Barcelona, Iberia; LID'08, Rome, Alitalia; RCIS'08, Marrakech, AirFrance
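To make the decomposition concrete, here is a toy sketch that evaluates the three low-level queries as a nested-loop join on the shared variables PlaceX and DateY. `conf_search`, `tour_search`, and `flight` are hypothetical in-memory stand-ins for ConfSearch, TourSearch, and Flight. Their data is invented for illustration (only the conference, place, and airline names come from the slide, and the dates are placeholders), and the condition "cost<200" becomes the `max_cost` argument. Real SeCo execution invokes remote services and deals with ranked, chunked results, which this sketch ignores.

```python
# Hypothetical stand-ins for the three low-level services (illustrative data only).
def conf_search(topic):
    # Returns (conference, place, date) tuples; `topic` is ignored in this toy version.
    return [("MSVVEIS'08", "Barcelona", "d1"),
            ("LID'08", "Rome", "d2"),
            ("RCIS'08", "Marrakech", "d3"),
            ("XYZ'08", "Oslo", "d4")]

def tour_search(kind):
    # Returns the set of places offering the requested kind of tourism.
    return {"Barcelona", "Rome", "Marrakech"}

def flight(place, date):
    # Returns (airline, cost) offers for reaching `place` on `date`.
    offers = {("Barcelona", "d1"): [("Iberia", 150)],
              ("Rome", "d2"): [("Alitalia", 180)],
              ("Marrakech", "d3"): [("AirFrance", 190), ("OtherAir", 320)],
              ("Oslo", "d4"): [("SomeAir", 120)]}
    return offers.get((place, date), [])

def answer(topic, max_cost):
    """Nested-loop join of the three sub-queries on PlaceX and DateY."""
    beach_places = tour_search("Beach")
    results = []
    for conf, place, date in conf_search(topic):      # sub-query 1 binds PlaceX, DateY
        if place not in beach_places:                  # sub-query 2 joins on PlaceX
            continue
        for airline, cost in flight(place, date):      # sub-query 3 joins on PlaceX, DateY
            if cost < max_cost:                        # the "cost<200" condition
                results.append((conf, place, airline))
    return results

print(answer("DB", 200))
```

The Oslo conference is dropped by the beach sub-query even though its flight is cheap, and the expensive Marrakech offer is dropped by the cost condition.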


Search Computing architecture: configurability of the implementation

[Figure: the same architecture diagram as in the previous slide, additionally labelled with the Admin Interface and with the intermediate artifacts Sub-queries, Concrete Query Plan, and Low-level queries.]


Search Computing architecture: development roadmap

Prototype 1: Core behaviour of the system
• Engine-based execution of queries
• Domain repository
• Service repository
• Coarse result presentation

[Figure: the architecture diagram, as in the previous slides.]

Prototype 2: Planning
• Automatic optimized query planning

Prototype 3: Mapping and presentation
• Mapping to domains
• Presentation of results

Prototype 4: High-level queries


2. Application development approach:

LIQUID QUERIES


LIQUID QUERY

A level above the optimization
– Forcing the query flow

LIQUID QUERY: A query with flexible boundaries

Control is
– on the user
– at query time
– on the evolution

Contextual or recommended directions could be proposed

In line with current trends in search (and others!)


Forward-looking and (a little bit) far-fetched ideas

Open to discussion


Microsoft Bing

Contextual step-by-step evolution of the query


Google Squared

Multi-content, resizable, reshapeable query


Not a search: Hunch

Just a big decision tree

Perceived as of great value by today's users


Yahoo! Research

Web of pages vs. Web of objects

Understand the need behind the user query

Exploiting user intelligence
– Tags
– Folksonomies

Multi-step queries

Multi-technology queries
– Annotations
– Content-based


LIQUID QUERY

Moving from "one time query" to a process-based approach

Continuation of queries based on exploitation of relations between service marts

A query with flexible boundaries, which can be changed by the user at runtime:
– Reshaped/refined: asking for different information on the results
– Expanded: asking for additional information on the results, adding new domains
– Extended: asking for more results

Contextual or recommended directions could be proposed

Relies on the SeCo query machine
– Every user interaction could trigger recalculations


Liquid query navigation

Liquid... what?
– Liquid data (BEA & Co., @ San Diego)
– Liquid publications and docs (Fabio & Co., @ Trento)
– (old-style) Liquid queries (Heer & Co., @ Berkeley)

Somewhat similar to Google Squared, but:
– Multi-domain
– Multi-purpose
– More flexible


After the first query cycle, the user has various options:
– Refinement of the query
– Extension of the query results (give me more)
– Expansion of the query (add more domains)
– Choosing a different connection between services (i.e., changing the adopted access pattern)
– Clustering, re-ranking, ...

What does each option imply at the query machine level?


Liquid query navigation

[Screenshot: result table with Conference columns (Photo, Description, Date) and Hotel columns (Photo, Description, Address, Services).]


Liquid query: clustering/unclustering

At the query machine level? Probably nothing: just a presentation issue, if...


Liquid query: ranking/reranking

For unclustered data or cluster representatives


At the query machine level? If a multi-ranking service is available, recompute the query. If not, just re-sort the query results at the presentation level.
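When no multi-ranking service is available, re-ranking amounts to re-sorting results that have already been retrieved. A minimal sketch, assuming each result carries one score per service and the user chooses new per-service weights; the dictionary layout and the weighted-sum scoring are illustrative assumptions, not the SeCo ranking model.

```python
def rerank(results, weights):
    """Re-sort already retrieved results by a weighted sum of per-service scores.

    results: list of dicts, each with a "scores" dict (service name -> score).
    weights: dict (service name -> user-chosen weight); missing services weigh 0.
    No service is invoked: this only reorders data already at the presentation level.
    """
    def combined(item):
        return sum(weights.get(svc, 0.0) * score for svc, score in item["scores"].items())
    # sorted() stays stable with reverse=True, so ties keep their previous order.
    return sorted(results, key=combined, reverse=True)

results = [
    {"id": "a", "scores": {"conf": 0.9, "hotel": 0.2}},
    {"id": "b", "scores": {"conf": 0.5, "hotel": 0.9}},
]
by_conference = rerank(results, {"conf": 1.0})           # a before b
by_hotel = rerank(results, {"conf": 0.2, "hotel": 0.8})  # b before a
```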


Liquid query: refinement

Adding further constraints
– E.g., on this timeframe

... More or less results...


Search again | Refined search...

At the query machine level? Possibly rebuild the plan, and re-execute the query. Pieces of the previous execution may be reused (caching).
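Reuse across re-executions can be sketched as memoizing service calls on (service, arguments). This assumes results do not change during a session and arguments are hashable; `ServiceCallCache` and `fake_invoke` are hypothetical names, and the refinement shown (a new timeframe `d2` replacing `d1`) is illustrative.

```python
class ServiceCallCache:
    """Memoizes service invocations keyed by (service name, arguments).

    Assumes results do not change during a session and arguments are hashable.
    """

    def __init__(self, invoke):
        self._invoke = invoke   # hypothetical function (service, args) -> result
        self._store = {}
        self.misses = 0         # number of calls that actually reached a service

    def call(self, service, *args):
        key = (service, args)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._invoke(service, args)
        return self._store[key]

def fake_invoke(service, args):
    # Stand-in for a remote web-service call.
    return f"{service}{args}"

cache = ServiceCallCache(fake_invoke)
places = ["Barcelona", "Rome"]
for place in places:                       # first execution, timeframe d1
    cache.call("TourSearch", "Beach", place)
    cache.call("Flight", place, "d1")
for place in places:                       # refined query, timeframe d2
    cache.call("TourSearch", "Beach", place)   # served from the cache
    cache.call("Flight", place, "d2")          # new arguments: real call
print(cache.misses)  # 6: only the Flight calls of the refined run are new
```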


Liquid query: extend the query

At the query machine level? Run the machine again on further data

Gimme more

Asking for more results

... More results...
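"Gimme more" can be served by continuing where the previous cycle stopped instead of restarting the query. A sketch assuming a hypothetical paged service `fetch(offset, size)` that returns a ranked slice; the class name and chunk size are illustrative choices.

```python
class ChunkedResults:
    """Remembers the position reached in a paged service so 'gimme more' resumes there."""

    def __init__(self, fetch, chunk_size=3):
        self._fetch = fetch            # hypothetical: fetch(offset, size) -> list
        self._chunk_size = chunk_size
        self._offset = 0
        self.shown = []
        self.exhausted = False

    def more(self):
        """Fetch the next chunk and append it to what is already shown."""
        if self.exhausted:
            return []
        chunk = self._fetch(self._offset, self._chunk_size)
        self._offset += len(chunk)
        if len(chunk) < self._chunk_size:
            self.exhausted = True      # a short chunk means the service has no more data
        self.shown.extend(chunk)
        return chunk

# Usage with an in-memory stand-in for a ranked service.
ranked = [f"r{i}" for i in range(7)]
results = ChunkedResults(lambda off, size: ranked[off:off + size])
results.more()   # first query cycle
results.more()   # gimme more
```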


Asking for more results on a specific service

... More results...

Liquid query: zooming in (service-wise)

At the query machine level? Run the machine again on that service. Or: change the throughput of the machine
– Clock branches

Gimme more
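Changing the throughput through the clock can be pictured as giving each parallel branch a quota of calls per cycle. This sketch assumes exactly that simple quota model; `schedule`, the branch names, and the quota values are illustrative and do not reproduce the actual SeCo engine.

```python
def schedule(clock, cycles):
    """Expand a clock (branch -> calls per cycle) into an ordered call sequence.

    Within each cycle, branches are interleaved round-robin until each has
    used its quota, so a branch with a larger quota receives more calls.
    """
    sequence = []
    for _ in range(cycles):
        remaining = dict(clock)
        while any(remaining.values()):
            for branch in clock:
                if remaining[branch] > 0:
                    sequence.append(branch)
                    remaining[branch] -= 1
    return sequence

balanced = {"Conference": 1, "Hotel": 1}
zoomed = {"Conference": 1, "Hotel": 3}   # the user zooms in on the Hotel service
print(schedule(balanced, 2))
print(schedule(zoomed, 1))
```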


Asking for more results on a specific item

... More results...

Liquid query: zooming in

At the query machine level? Run the machine again on the services joined to that item. Or: change the throughput of the machine
– Clock branches

Gimme more


Liquid query: expand (shrink) the query

At the query machine level? Change the plan, reusing whatever can be reused (caching).

Additional subquery

Asking for more columns (or remove existing ones)

... Results... ?


Changing the used access paths

Liquid query: change join conditions

At the query machine level? Change the plan, reusing whatever can be reused (caching).
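The per-interaction answers given in the liquid query slides can be collected in a lookup table that a front end might consult before calling the query machine. The interaction names and `Action` labels are hypothetical paraphrases of the slides, not an existing SeCo API.

```python
from enum import Enum

class Action(Enum):
    PRESENTATION_ONLY = "re-cluster or re-sort what is already shown; no service calls"
    RERUN = "run the machine again on further data or on selected services"
    RECOMPUTE = "recompute the query with the new ranking"
    REPLAN = "change the plan and re-execute, reusing cached pieces where possible"

# Hypothetical interaction names, one per liquid-query slide.
_ACTIONS = {
    "cluster": Action.PRESENTATION_ONLY,   # "probably nothing" at machine level
    "extend": Action.RERUN,                # gimme more
    "zoom_service": Action.RERUN,          # or change the clock of the branches
    "zoom_item": Action.RERUN,
    "refine": Action.REPLAN,
    "expand": Action.REPLAN,
    "change_join": Action.REPLAN,
}

def machine_action(interaction, multi_ranking_available=False):
    """Map a user interaction to the work required from the query machine.

    Raises KeyError for interactions not listed in the table.
    """
    if interaction == "rerank":
        # Re-ranking needs the machine only if a multi-ranking service exists.
        return Action.RECOMPUTE if multi_ranking_available else Action.PRESENTATION_ONLY
    return _ACTIONS[interaction]
```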


Horizontal and vertical multi-domain search

The structure of the interface is automatically generated from the structure of the access plan

Additional feature: save the resulting interface (for query and results) for canned vertical applications

Apply a stylesheet to turn the result into a real application
– Set of default stylesheets that can be painted upon the interface
– Possibility of defining custom stylesheets
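Deriving the interface from the access plan can be sketched as follows, assuming a plan is an ordered list of service invocations and that an input already produced by an earlier step (same attribute name) is bound by the join. Free inputs become form fields and all outputs become result columns. The plan encoding and the service signatures are hypothetical.

```python
def generate_interface(plan):
    """Derive form fields and result columns from an access plan.

    plan: list of (service, inputs, outputs). An input already produced as an
    output by an earlier step is bound by the join and not asked to the user.
    """
    produced = set()
    form_fields, columns = [], []
    for service, inputs, outputs in plan:
        for name in inputs:
            if name not in produced:
                form_fields.append(f"{service}.{name}")
        for name in outputs:
            columns.append(f"{service}.{name}")
            produced.add(name)
    return form_fields, columns

plan = [
    ("ConfSearch", ["Topic"], ["Conference", "Place", "Date"]),
    ("Flight", ["Place", "Date", "MaxCost"], ["Airline", "Cost"]),
]
fields, columns = generate_interface(plan)
print(fields)   # only Topic and MaxCost are left for the user to fill in
```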


3. Support to the designer.

SETTING UP THE LIQUID QUERY ENVIRONMENT


Registration Time

The role of the designer is at registration time!

Low development cost

Higher cost of registration
– Description of services
– Description of default interfaces for service inputs and results


The tools

Strong parameter typing
– UI fields are typed

Static mapping to services
– UI fields are directly mapped to search services

BPM-like modeling of the user interaction and query processing steps

Automatic generation of UI

Adaptivity and customization of the query interaction
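Strong typing plus static mapping can be sketched as a field specification that declares both a type and the service parameter it feeds. `FieldSpec`, `bind`, and the parameter names are hypothetical; only `str`, `int`, and `datetime.date` types are handled here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class FieldSpec:
    label: str       # what the UI shows
    type_: type      # declared type used to parse and validate user input
    service: str     # service the field is statically mapped to
    parameter: str   # parameter of that service that receives the value

def bind(specs, raw_inputs):
    """Parse raw form strings by declared type and group them by target service.

    Raises ValueError when an input does not match its declared type.
    """
    calls = {}
    for spec in specs:
        raw = raw_inputs[spec.label]
        value = date.fromisoformat(raw) if spec.type_ is date else spec.type_(raw)
        calls.setdefault(spec.service, {})[spec.parameter] = value
    return calls

specs = [
    FieldSpec("Topic", str, "ConfSearch", "topic"),
    FieldSpec("Max price", int, "Flight", "maxCost"),
    FieldSpec("Departure", date, "Flight", "departureDate"),
]
calls = bind(specs, {"Topic": "DB", "Max price": "200", "Departure": "2009-06-17"})
```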


The hard task: Registration time

Building access patterns

Building bindings

Defining the (lightweight) semantics
– Domains
– Keywords

Defining the (default) presentation
– Forms
– Results
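The registration-time artifacts listed above (access patterns, bindings, lightweight semantics, default presentation) can be pictured as one descriptor per service. A sketch assuming a simple in-memory registry; all class names, field names, and domain values are hypothetical and do not reproduce the SeCo service mart model. Requires Python 3.9+ for the `list[str]` annotations.

```python
from dataclasses import dataclass, field

@dataclass
class AccessPattern:
    name: str
    inputs: list[str]    # attributes that must be bound to invoke the service
    outputs: list[str]   # attributes the service returns

@dataclass
class ServiceRegistration:
    service: str
    access_patterns: list[AccessPattern]
    domains: list[str] = field(default_factory=list)         # lightweight semantics
    keywords: list[str] = field(default_factory=list)
    default_form: list[str] = field(default_factory=list)    # inputs shown by default
    default_result: list[str] = field(default_factory=list)  # outputs shown by default

    def inputs(self):
        return {a for p in self.access_patterns for a in p.inputs}

    def outputs(self):
        return {a for p in self.access_patterns for a in p.outputs}

    def check(self):
        """The default presentation may only use declared attributes."""
        return set(self.default_form) <= self.inputs() and set(self.default_result) <= self.outputs()

@dataclass
class Binding:
    source: str        # service whose output value is passed on
    source_attr: str
    target: str        # service whose input receives the value
    target_attr: str

def binding_is_valid(b, registry):
    """A binding must connect an output of the source to an input of the target."""
    return (b.source_attr in registry[b.source].outputs()
            and b.target_attr in registry[b.target].inputs())

registry = {
    "ConfSearch": ServiceRegistration(
        "ConfSearch",
        [AccessPattern("byTopic", ["Topic"], ["Conference", "Place", "Date"])],
        domains=["science"], keywords=["conference"],
        default_form=["Topic"], default_result=["Conference", "Place", "Date"]),
    "Flight": ServiceRegistration(
        "Flight",
        [AccessPattern("byPlaceDate", ["Place", "Date", "MaxCost"], ["Airline", "Cost"])],
        domains=["transport"], keywords=["flight"],
        default_form=["MaxCost"], default_result=["Airline", "Cost"]),
}
binding = Binding("ConfSearch", "Place", "Flight", "Place")
```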


4. SeCo Extensions:

High level queries


High level queries

Almost-NL-specified queries
– Conjunctive noun phrases

Need to be decomposed and mapped to semantic domains: this requires a domain repository

Require NLP and "semantization" of phrase contents: this requires NL analysis


Domain repository

Storage of
– the domain definitions taxonomy (e.g., Dewey classification)
– mappings of NL words to domains
– mappings of services to domains

Shallow approach based on
– WordNet (synsets Sx)
– WordNet Domains (domains Dx)

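[Figure: domains D1, D2, D3 linked to WordNet synsets S1, S2, ..., S6 and to services in the Service Repository (e.g., ss1); links are annotated with weight pairs such as (.2, .8) for S6 and (.4, .6) for ss1.]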


Domain Repository: API

Three main interfaces:

Domain query: used to extract a domain (or a list of domains), and their corresponding properties, that relate to a specific string

Service extraction: used to extract the list of services associated with the domain

Domain hierarchy update: used to update the domain hierarchy
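A toy in-memory version of the three interfaces, assuming domains form a tree, that each word maps to domains with a single weight, and that services are attached to domains. The method names and the max-weight scoring are hypothetical; a real repository would be populated from WordNet and WordNet Domains rather than by hand.

```python
class DomainRepository:
    def __init__(self):
        self.parent = {}         # domain -> parent domain (None for roots)
        self.word_domains = {}   # word -> {domain: weight}
        self.services = {}       # domain -> set of service names

    # Domain hierarchy update
    def add_domain(self, domain, parent=None):
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent domain {parent!r}")
        self.parent[domain] = parent
        self.services.setdefault(domain, set())

    def map_word(self, word, domain, weight):
        self.word_domains.setdefault(word.lower(), {})[domain] = weight

    def register_service(self, service, domain):
        self.services[domain].add(service)

    # Domain query: domains related to a string, highest weight first
    def domains_for(self, text):
        scores = {}
        for word in text.lower().split():
            for domain, w in self.word_domains.get(word, {}).items():
                scores[domain] = max(scores.get(domain, 0.0), w)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    # Service extraction: services of a domain and of its subdomains
    def services_for(self, domain):
        found = set(self.services.get(domain, set()))
        for child, parent in self.parent.items():
            if parent == domain:
                found |= self.services_for(child)
        return found

repo = DomainRepository()
repo.add_domain("travel")
repo.add_domain("transport", parent="travel")
repo.add_domain("science")
repo.map_word("flight", "transport", 0.8)
repo.map_word("conference", "science", 0.9)
repo.register_service("TourSearch", "travel")
repo.register_service("Flight", "transport")
repo.register_service("ConfSearch", "science")
```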


Query Analyser

Starts from an almost-natural-language specification of the user request

Tries to decompose it into subqueries, each of which can be mapped onto a domain

E.g.: scientific conference reachable with a cheap flight, with a beautiful beach nearby

Target splitting:
– q1 = "scientific conference",
– q2 = "reachable with a cheap flight", and
– q3 = "with a beautiful beach nearby".


Query Analyser

For NLP, we exploit an open source tool developed by the Stanford Natural Language Processing Group

The outcome is a tree representation of the query

Definition of heuristics for query splitting

To optimize the recognition of query-subquery relations:
– iterative invocation of the NLP tool based on various arguments (feedback from the user, feed-forward/feedback from other components, ...);
– exploitation of knowledge/services available in other components, e.g., knowledge
   • about the available services, domains, and so on;
   • about syntax/logic analysis results on the sentence.


Back to the example - 2

scientific conference reachable with a cheap flight, with a beautiful beach nearby

Very coarse heuristic:
– Subqueries = first-level subtrees

Obtained splitting:
– q1 = "scientific conference reachable",
– q2 = "with a cheap flight", and
– q3 = "with a beautiful beach nearby".
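The "first-level subtrees" heuristic can be sketched on a hand-built parse tree. The tree below is a hypothetical bracketing chosen to reproduce the splitting reported above; it is not actual output of the Stanford parser, whose API is not used here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                     # phrase or POS tag, e.g. "NP", "PP"
    children: list = field(default_factory=list)   # Node objects or word strings

    def leaves(self):
        words = []
        for child in self.children:
            words.extend(child.leaves() if isinstance(child, Node) else [child])
        return words

def split_first_level(root):
    """Very coarse heuristic: one subquery per first-level subtree of the root.

    Bare word leaves directly under the root (e.g. punctuation) are skipped.
    """
    return [" ".join(child.leaves()) for child in root.children if isinstance(child, Node)]

tree = Node("NP", [
    Node("NP", [Node("JJ", ["scientific"]), Node("NN", ["conference"]), Node("JJ", ["reachable"])]),
    Node("PP", [Node("IN", ["with"]), Node("NP", ["a", "cheap", "flight"])]),
    ",",
    Node("PP", [Node("IN", ["with"]), Node("NP", ["a", "beautiful", "beach"]), Node("RB", ["nearby"])]),
])
print(split_first_level(tree))
```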


Back to the example - 3

Still not exact, but rather close for a first shot :)

Further information can be extracted
– Association between words: e.g., cheap_flight
– Meaning of phrase connectives

And ...
– What about negation?
– What about join attributes between phrases?
– ...
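One simple way to recover word associations such as cheap_flight is a greedy longest-match against a lexicon of multiword expressions (WordNet writes such entries with underscores). The lexicon here is a hypothetical stand-in; whether these particular phrases are WordNet entries is not assumed.

```python
def join_multiwords(tokens, lexicon, max_len=3):
    """Greedily merge adjacent tokens that form a known multiword expression."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-word expressions.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = "_".join(tokens[i:i + n])
            if candidate in lexicon:
                out.append(candidate)
                i += n
                break
        else:
            # No multiword match starting here: keep the single token.
            out.append(tokens[i])
            i += 1
    return out

lexicon = {"cheap_flight", "scientific_conference"}
tokens = "scientific conference reachable with a cheap flight".split()
print(join_multiwords(tokens, lexicon))
```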


Query Analyser

Task list
– Extraction of a corpus of queries from Yahoo! Answers
– Definition of concrete options for optimizing the extractor
– Training?
– Validation of the approach on the corpus
– Mapping: currently could be done trivially on domain keywords
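A trivial keyword-based mapping could look like the sketch below, assuming each domain is described by a small keyword set and a subquery goes to the domain sharing the most keywords with it. The keyword sets are hypothetical; ties go to the domain listed first, and a subquery with no overlap maps to `None`.

```python
def map_to_domain(subquery, domain_keywords):
    """Return the domain whose keywords overlap most with the subquery, or None."""
    words = set(subquery.lower().split())
    best, best_overlap = None, 0
    for domain, keywords in domain_keywords.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:      # strict: ties keep the earlier domain
            best, best_overlap = domain, overlap
    return best

domain_keywords = {
    "conferences": {"conference", "scientific", "workshop"},
    "flights": {"flight", "cheap", "airline"},
    "tourism": {"beach", "beautiful", "resort"},
}
subqueries = ["scientific conference", "reachable with a cheap flight", "with a beautiful beach nearby"]
print([map_to_domain(q, domain_keywords) for q in subqueries])
```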


Questions?

Search Computing

www.search-computing.org
