OPAL – Real Understanding of Real Estate Forms Xiaonan Guo Oxford University Computing Laboratory, DIADEM group


Page 1: OPAL Presentation

OPAL – Real Understanding of Real Estate Forms

Xiaonan Guo

Oxford University Computing Laboratory, DIADEM group

Page 2: OPAL Presentation

OPAL - ontology-based web pattern analysis with logic

Diversity in Web Form Design

May 27, 2011

Page 7: OPAL Presentation

Outline

OPAL – Ontology-based web Pattern Analysis with Logic

Overview

Data models and Mappings

Browser, Segmentation, Annotation, and Domain Model

Segmentation and Phenomenological Mapping

Analysis and Evaluation

Future Work

Page 8: OPAL Presentation

Overview

[Diagram: domain-independent hierarchical modeling + domain knowledge; declarative definition; domain-aware form analysis; OPAL]

Page 13: OPAL Presentation

Browser Model

html_element(e_320_input,320,321,319,input,d1).

html_attr(e_320_input_name,e_320_input,name,"location",d1).
html_attr(e_320_input_type,e_320_input,type,"radio",d1).
...

box(e_320_input,106,253,120,267,14,14).

css_attr(e_320_input,bottom,"auto").
css_attr(e_320_input,clear,"none").
css_attr(e_320_input,color,"rgb(0,0,0)").
...
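As a small illustration of how such facts can be consumed, the sketch below loads the `box` fact into Python and checks it for internal consistency. The argument order (left, top, right, bottom, width, height) is an assumption; the slide does not name the arguments, although the numbers are consistent with that reading.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One box/7 fact; the field names are assumed, not given on the slide."""
    element: str
    left: int
    top: int
    right: int
    bottom: int
    width: int
    height: int


def is_consistent(box: Box) -> bool:
    """Check that width and height agree with the corner coordinates."""
    return (box.right - box.left == box.width
            and box.bottom - box.top == box.height)


# The fact from the slide: box(e_320_input,106,253,120,267,14,14).
b = Box("e_320_input", 106, 253, 120, 267, 14, 14)
print(is_consistent(b))
```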

Page 14: OPAL Presentation

Segmentation Model

Grouping of related form elements, e.g. fields and labels

Achieved via Segmentation Mapping, which

groups form fields

assigns labels to fields and groups

Page 15: OPAL Presentation

Segmentation Model – groups

Form elements are grouped if

they occur in sequence

they have similarities in attribute values or appearances

their least common ancestor contains no other elements

group(Es) :- similarFieldSequence(Es), leastCommonAncestor(A,Es), not hasAdditionalField(A,Es).

leastCommonAncestor(A,Es) :- commonAncestor(A,Es), not ( child(C,A), commonAncestor(C,Es) ).

partOf(E,A) :- group(Es), member(E,Es), leastCommonAncestor(A,Es).
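The grouping rules can be read procedurally. The following is a minimal Python sketch, not OPAL's implementation: it assumes the DOM is given as a hypothetical `parent` map from node id to parent id, and it leaves out the `similarFieldSequence` test. The ids `e_300_tr` and `e_350_td` are invented for the example.

```python
def ancestors(node, parent):
    """Return the proper ancestors of node, nearest first."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def least_common_ancestor(fields, parent):
    """Deepest node that is an ancestor of every field, or None."""
    common = set(ancestors(fields[0], parent))
    for f in fields[1:]:
        common &= set(ancestors(f, parent))
    # Nearest-first order: the first shared ancestor is the deepest one.
    for a in ancestors(fields[0], parent):
        if a in common:
            return a
    return None


def has_additional_field(anc, fields, parent, all_fields):
    """True if a form field outside `fields` also lies below `anc`."""
    return any(f not in fields and anc in ancestors(f, parent)
               for f in all_fields)


def group(fields, parent, all_fields):
    """Return partOf pairs (field, ancestor) if `fields` form a group, else []."""
    anc = least_common_ancestor(fields, parent)
    if anc is None or has_additional_field(anc, fields, parent, all_fields):
        return []
    return [(f, anc) for f in fields]


# Hypothetical DOM fragment using some element ids from the slides.
parent = {
    "e_319_td": "e_300_tr",
    "e_320_input": "e_319_td",
    "e_326_input": "e_319_td",
    "e_332_input": "e_319_td",
    "e_350_td": "e_300_tr",
    "e_358_select": "e_350_td",
}
all_fields = ["e_320_input", "e_326_input", "e_332_input", "e_358_select"]
radios = ["e_320_input", "e_326_input", "e_332_input"]
print(group(radios, parent, all_fields))
```

In this fragment the three radio buttons share `e_319_td`, which contains no other field, so they form a group. A radio button paired with the select box does not, because their common ancestor also contains the remaining radio buttons.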

Page 16: OPAL Presentation

Segmentation Model – groups

group([e_320_input,e_326_input,e_332_input,e_338_input,e_344_input]).

leastCommonAncestor(e_319_td,[e_320_input,e_326_input,...,e_344_input]).

partOf(e_320_input,e_319_td).
partOf(e_326_input,e_319_td).
partOf(e_332_input,e_319_td).
partOf(e_338_input,e_319_td).
partOf(e_344_input,e_319_td).

Page 17: OPAL Presentation

Segmentation Model – labels

Texts are assigned to fields and groups using

Field: HTML <label>, Greatest unique ancestor

Segment: Text-field alignment in groups

Page: Visual alignment

Page 18: OPAL Presentation

Segmentation Model – labels

Field: HTML <label>

hasBasicLabel(E,L,T) :- html_element(E,_,_,_,input,_),
    html_attr(_,E,id,ID,_), html_element(N,_,_,_,label,_),
    html_attr(_,N,for,ID,_), child(L,N), html_text(L,T,_).
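The same association can be sketched directly on HTML. The Python below uses the standard-library `html.parser` and is only an illustration of the `for`/`id` join, not part of OPAL. The sample markup and its ids (`buy`, `rent`, `keywords`) are hypothetical, and the label text is taken as the concatenated text inside the `<label>` rather than a single child text node.

```python
from html.parser import HTMLParser


class LabelForParser(HTMLParser):
    """Collect <label for=ID> texts and the ids of <input> elements."""

    def __init__(self):
        super().__init__()
        self.input_ids = []        # ids of <input> elements, in document order
        self.label_text = {}       # target id -> accumulated label text
        self._current_for = None   # 'for' value of the label being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and "id" in attrs:
            self.input_ids.append(attrs["id"])
        elif tag == "label" and "for" in attrs:
            self._current_for = attrs["for"]
            self.label_text[self._current_for] = ""

    def handle_endtag(self, tag):
        if tag == "label":
            self._current_for = None

    def handle_data(self, data):
        if self._current_for is not None:
            self.label_text[self._current_for] += data


def basic_labels(html):
    """Return {input id: label text} for inputs named by a label's 'for'."""
    parser = LabelForParser()
    parser.feed(html)
    return {i: parser.label_text[i].strip()
            for i in parser.input_ids if i in parser.label_text}


sample = """
<form>
  <label for="buy">To Buy:</label><input type="radio" id="buy" name="purpose">
  <label for="rent">To Rent:</label><input type="radio" id="rent" name="purpose">
  <input type="text" id="keywords">
</form>
"""
print(basic_labels(sample))
```

The unlabeled `keywords` input is left out of the result, which is the case the later labeling strategies are meant to cover.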

Page 19: OPAL Presentation

Segmentation Model – labels

Field: Greatest unique ancestor

greatestUniqueAncestor(A,E) :- uniqueDescendant(E,A), not ( parent(P,A), uniqueDescendant(E,P) ).

hasBasicLabel(E,L,T) :- group(E), greatestUniqueAncestor(A,E),
    descendant(L,A), html_text(L,T,_).
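A procedural reading of the greatest unique ancestor: climb from the field while the next ancestor still contains no other field, then collect the text nodes below the last such ancestor. The Python sketch below makes two assumptions: it applies the idea to a single field rather than a group, and it uses a hypothetical `parent` map in which the ids `e_353_td`, `e_352_tr`, and `e_318_tr` are invented.

```python
def is_descendant(node, ancestor, parent):
    """True if `ancestor` lies on the parent chain above `node`."""
    while node in parent:
        node = parent[node]
        if node == ancestor:
            return True
    return False


def greatest_unique_ancestor(field, parent, all_fields):
    """Highest ancestor of `field` whose subtree contains no other field."""
    best = None
    node = field
    while node in parent:
        node = parent[node]
        others = [f for f in all_fields
                  if f != field and is_descendant(f, node, parent)]
        if others:
            break
        best = node
    return best


def labels_by_unique_ancestor(field, parent, all_fields, texts):
    """Text nodes (id, text) below the greatest unique ancestor of `field`."""
    anc = greatest_unique_ancestor(field, parent, all_fields)
    if anc is None:
        return []
    return [(t, s) for t, s in texts.items() if is_descendant(t, anc, parent)]


# Hypothetical DOM fragment; only some ids are taken from the slides.
parent = {
    "t_354": "e_353_td",
    "e_358_select": "e_353_td",
    "e_353_td": "e_352_tr",
    "e_352_tr": "e_297_tbody",
    "e_320_input": "e_319_td",
    "e_319_td": "e_318_tr",
    "e_318_tr": "e_297_tbody",
}
texts = {"t_354": "Min Beds"}
all_fields = ["e_358_select", "e_320_input"]
print(labels_by_unique_ancestor("e_358_select", parent, all_fields, texts))
```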

Page 20: OPAL Presentation

Segmentation Model – labels

Segment: Text-field alignment in groups

hasLabel(E,L,T) :- partOf(E,G), leastCommonAncestor(G,Es), group(Es), hasNoLabel(Es),
    textLists(LLs,G), sameLength(Es,LLs), labelOneToOne(E,Ls,Es,LLs),
    member(L,Ls), html_text(L,T,_).
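The core of this rule is a positional pairing: when a group has no labels yet and there are exactly as many texts as fields, the i-th text labels the i-th field. A minimal Python sketch follows, assuming one text node per field (the rule itself works on lists of text nodes) and assuming fields and texts are listed in document order. The function name and the `already_labeled` parameter are hypothetical.

```python
def label_one_to_one(fields, texts, already_labeled=()):
    """Pair the i-th field with the i-th text node.

    Returns {} unless no field is labeled yet and both lists have the same
    length, mirroring hasNoLabel and sameLength in the rule above.
    """
    if any(f in already_labeled for f in fields):
        return {}
    if len(fields) != len(texts):
        return {}
    return dict(zip(fields, texts))


# Element and text ids as they appear in the slides' example form.
fields = ["e_320_input", "e_326_input", "e_332_input",
          "e_338_input", "e_344_input"]
texts = [
    ("t_322", "Nailsea / Backwell"),
    ("t_328", "Portishead / Pill"),
    ("t_334", "Clevedon"),
    ("t_340", "Yatton / Congresbury"),
    ("t_346", "Bristol / Weston-super-mare"),
]
labels = label_one_to_one(fields, texts)
print(labels["e_332_input"])
```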

Page 21: OPAL Presentation

Segmentation Model – labels

Page: Visual alignment

hasLabel(E,L,LText) :- hasTextBox(E,B), descendantTextOf(L,B), html_text(L,_,_,_,_), node_text(L,LText,true).

Page 23: OPAL Presentation

Segmentation Model – labels

hasLabel(e_320_input,t_322,"Nailsea / Backwell").

hasLabel(e_358_select,t_354,"Min. beds").

Page 24: OPAL Presentation

Segmentation Model – labels

hasLabel(e_319_td,t_316,"Area: ").

Page 25: OPAL Presentation

Segmentation Model – labels

hasLabel(e_304_input,t_302,"To Buy:").
hasLabel(e_308_input,t_306,"To Rent:").

hasLabel(e_320_input,t_322,"Nailsea / Backwell").
hasLabel(e_326_input,t_328,"Portishead / Pill").
hasLabel(e_338_input,t_340,"Yatton / Congresbury").
hasLabel(e_332_input,t_334,"Clevedon").
hasLabel(e_344_input,t_346,"Bristol / Weston-super-mare").

hasBasicLabel(e_358_select,t_354,"Min Beds").
hasBasicLabel(e_515_select,t_400,"Min Price").
hasBasicLabel(e_705_select,t_594,"Max Price").
hasBasicLabel(e_788_select,t_784,"View Order").

hasLabel(e_319_td,t_316,"Area").
hasLabel(e_297_tbody,t_290,"Find a property to buy or rent…").

Page 27: OPAL Presentation

Annotation Model

Obtained from Browser model

Relying on domain-specific knowledge, represents

linguistic annotations

machine learning based classifications

Page 28: OPAL Presentation

Annotation Model

annotation(attrclob_d1_5560,attrclob_d1,80,87,"Nailsea").
annotationFeature(attrclob_d1_5560,"majorType","location").
annotationFeature(attrclob_d1_5560,"minorType","district_county_etc").

annotation(elclob_d1_1707,elclob_d1,2028,2038,"Min. price").
annotationFeature(elclob_d1_1707,"modifier","min").
annotationFeature(elclob_d1_1707,"minorType","price").
annotationFeature(elclob_d1_1707,"majorType","reform.label").

Page 29: OPAL Presentation

Domain Model

Describes conceptual entities on forms as defined in the domain ontology

Achieved via phenomenological mapping, which

correlates labels with annotations

classifies form elements

Page 30: OPAL Presentation

Domain Model – Ontology

[Figure (A): ontology of a Real-Estate Web Form. Segments shown: Price Segment, Property-Contract Segment, Geographic Segment, Property-Feature Segment, Search-Option Segment, Form-Buttons Segment, linked by AND/OR and cardinalities (1..1, 0..1, 1..*). Attribute purpose = {buy, rent, combined}; a Combined Form has purpose.{combined}.]

Page 31: OPAL Presentation

Domain Model – Ontology

[Figure (B): ontology of the Price Segment. Nodes shown: Price Element (with priceType.{range}, priceType.{apx}, priceType.{min}, priceType.{max}), Price Label, Price Input-Field, Currency Element, Currency Label, Currency Input-Field, linked by AND/OR/XOR and cardinalities (1..1, 0..1). Attribute priceType = {min, max, approximate, range}.]

Page 32: OPAL Presentation

Domain Model – Classification

Form elements are annotated as follows

hasAnnotation(E,A) :- hasLabel(E,_,T), annotation(Aid,_,_,_,T), annotationFeature(Aid,_,A).

Form elements are classified with Concept C and Facets C_F

C(X) :- leafSegment(X), hasAnnotation(X,A), Clabel(A).

C_F(X,F) :- C(X), hasAnnotation(X,F), C_FLabel(F).

C_F(X,F) :- C(X), hasValueAnnotation(X,F), C_FValue(F).
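Read as a join, the annotation and classification rules match label text against annotation text and then test the resulting feature values against the label sets of a concept and its facets. The Python below is a hedged sketch of that reading, not OPAL's code: the concept label set `{"price"}` and the facet label set `{"min", "max"}` are assumptions, the `hasValueAnnotation` variant is omitted, and the label text is set to "Min. price" so that it matches the annotation text exactly.

```python
def has_annotation(labels, annotations, features):
    """Join label text with annotation text; return (element, feature value) pairs."""
    pairs = set()
    for element, _label_node, text in labels:
        for ann_id, ann_text in annotations:
            if ann_text != text:
                continue
            for feat_ann_id, _name, value in features:
                if feat_ann_id == ann_id:
                    pairs.add((element, value))
    return pairs


def classify(leaf_segments, annotated, concept_labels, facet_labels):
    """Return (concept members, facet assignments), in the spirit of C and C_F."""
    members = {x for x in leaf_segments
               if any((x, a) in annotated for a in concept_labels)}
    facets = {(x, f) for (x, f) in annotated
              if x in members and f in facet_labels}
    return members, facets


# Facts modeled on the slides; the label text is adapted to match exactly.
labels = [("e_515_select", "t_400", "Min. price")]
annotations = [("elclob_d1_1707", "Min. price")]
features = [
    ("elclob_d1_1707", "modifier", "min"),
    ("elclob_d1_1707", "minorType", "price"),
    ("elclob_d1_1707", "majorType", "reform.label"),
]

annotated = has_annotation(labels, annotations, features)
members, facets = classify(["e_515_select"], annotated,
                           concept_labels={"price"},
                           facet_labels={"min", "max"})
print(members, facets)
```

Under these assumptions `e_515_select` is classified as a price element with facet `min`.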

Page 33: OPAL Presentation

Domain Model – Classification

priceElement(e_515_select,e_286_form).
priceType(e_515_select,"min").

Page 40: OPAL Presentation

Domain Model

[Figure: the example form classified step by step. Elements shown: five AreaBranch Elements, Price Element (min), Price Element (max), and further elements (...). Segments added in turn: Area-Branch Segment, Geographic Segment, Price Segment, and finally Real-Estate Form.]

Page 42: OPAL Presentation

Analysis and Evaluation

UK Real Estate Domain

50 domain web forms (sampled from over 2800)

Tested both domain-independent and domain-aware analysis

Page 44: OPAL Presentation

Analysis and Evaluation – Real-Estate

                      Form Fields            Form Segments
                      found      labeled     correct segmentation
Domain independent    97.61%     96.68%      93.33%
Domain aware          100.00%    97.28%      95.31%

Page 45: OPAL Presentation

Conclusion and Future Work

Improve structural segmentation

Visual segmentation and labeling

Ontology-guided segmentation

Accelerate domain adaptation

Calling for machine learning for ontology creation

Enhance ambiguity resolution

Necessitating probabilistic logic in the future

Interactive form filling / probing

Page 46: OPAL Presentation

Thank you very much!