Presented by Edgar Cornejo 03.03.14 LAMI Spring 2014 Search Engine and Services


Outline

Mobile information search for location-based information

Web-a-Where: Geotagging Web Content

The design and implementation of SPIRIT: a spatially-aware search engine for information retrieval on the Internet

Mobile information search for location-based information

Department of Industrial Engineering, Tsinghua University

Beijing, China, April 2010

Chengyi Liu · Pei-Luen Patrick Rau · Fei Gao

Mobile search for location-based information


The study investigated the effects of location and information type in mobile searching for location-based information by carrying out two experiments in an airport.

Mobile search scenario

High time pressure

Many environmental disturbances

Device limitations (screen size, input method)

Restricted user operations


Mobile searching context

Information queries + location → more suitable results

Since most of the information users search for is location-based [1,2], search engines can improve results by analyzing both the information query and the user's location.

Features of mobile interaction [3]


Users' hands are often used to manipulate physical objects

Users may be involved in tasks that demand a high level of visual attention

Users may be highly mobile during the task and interact at high speed

Search queries


Query type | Purpose | Share*
Navigational query | To reach a particular site | 29.4%
Informational query | To find information | 10.2%
Transactional query | To visit a site and perform some web-mediated activity | 60.4%

*According to a large-scale study of European mobile search behavior conducted in 2008 [4]

Proposed factors that may influence mobile information search

Experiment 1 - Hypotheses


Hypothesis 1

For information searches in mobile versus non-mobile contexts:

The average number of clicks is lower in mobile search

The first search result is more important

Free recall is worse



Hypothesis 2

For searches for location-based versus non-location-based information:

The number of clicks is lower

The first search result is more important

Free recall is better

Experiment 1 - Tasks

Experiment 1 - Results


Hypothesis 1

The intention was to find how the user’s context (mobile vs. non-mobile) might affect the user’s information searching performance

The average number of clicks is lower in mobile search: False

The first search result is more important: False

Free recall is worse: False



Hypothesis 2

The intention was to examine how the information type (location-based vs. non-location-based) might affect the user’s information searching performance

The number of clicks is lower: True

The first search result is more important: True

Free recall is better: True



Experiment 2 - Hypotheses

Hypothesis 3

For mobile information searching under high versus low information requirement pressure:

The average number of clicks is lower

The first search result is more important



Hypothesis 4

For mobile searches with informational or navigational queries versus transactional queries:

The number of clicks is greater

The first search result is less important


Experiment 2 - Tasks

Experiment 2 - Results


Hypothesis 3

The intention was to examine how the information requirement pressure (high vs. low) might affect a user’s mobile search performance

The average number of clicks is lower: True

The first search result is more important: False

Free recall is worse: False



Hypothesis 4

The intention was to examine how the location-based information type (informational, navigational vs. transactional) might affect a user’s mobile search performance

The average number of clicks is greater: True

The first search results are less important: True

Free recall is worse: True

Summary


Information type (location-based vs. non-location-based) was found to affect user performance during the information search process

Information requirement pressure and location-based information type (navigational, informational and transactional) affect the mobile search process

The first two search results were found to be very important to good search efficiency and good user satisfaction


Web-a-Where: Geotagging Web Content

Einat Amitay · Nadav Har’El · Ron Sivan · Aya Soffer

IBM Haifa Research Lab, Haifa 31905, Israel

July 2004



Web-a-Where is a system for associating geography with Web pages

Locates mentions of places and determines the place each name refers to

Assigns to each page a geographic focus: a locality that the page discusses as a whole

Implemented within the framework of the IBM WebFountain data mining system



Pages may have two types of geography associated with them: a source and a target.

Source geography has to do with the origin of the page: its physical location, the address of its author, etc.

Target geography is determined by the contents of the page and relates to the topic the page is discussing.

Ambiguities


Geo/non-geo ambiguity is the case of a place name having another, non-geographic meaning, e.g. Mobile (Alabama) or Reading (England)

Geo/geo ambiguity arises when two or more distinct places have the same name

System Components


Geotagger (main component)

Finds and disambiguates geographic names

Assigns a taxonomy node to each phrase in the text that refers to a place, e.g., Paris/France/Europe

The gazetteer

Database that keeps the list of geographic names, their canonical taxonomy nodes and other information

Tagging individual place names


The processing of a page is done in three phases:

Spotting

Disambiguation

Focus determination

1. Spotting place name candidates


Finding all the possible geographic names in each page

Short abbreviations, e.g. IN (for Indiana) or AT (for Austria), are not spotted but are used to help disambiguate other spots, e.g. Gary, IN
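The spotting phase can be sketched as a lookup of every gazetteer name in the page text. This is a minimal sketch, not Web-a-Where's code. The tiny `GAZETTEER` and `ABBREVIATIONS` tables are hypothetical. An abbreviation is never reported as a spot on its own: it is only recorded as the qualifier of the name it follows.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical toy gazetteer: place name -> candidate taxonomy nodes.
GAZETTEER = {
    "Gary": ["Gary/Indiana/United States", "Gary/South Dakota/United States"],
    "Paris": ["Paris/France", "Paris/Texas/United States"],
    "Reading": ["Reading/England/United Kingdom", "Reading/Pennsylvania/United States"],
}

# Hypothetical abbreviation table: never spotted alone, only used as qualifiers.
ABBREVIATIONS = {"IN": "Indiana", "AT": "Austria", "TX": "Texas"}


class Spot(NamedTuple):
    name: str                 # place name found in the text
    offset: int               # character offset of the name
    qualifier: Optional[str]  # expanded abbreviation following the name, if any


def spot(text: str) -> list[Spot]:
    """Return every gazetteer name found in `text`, in order of appearance.

    Geo/non-geo ambiguity (e.g. the verb "Reading") is left to disambiguation.
    """
    spots = []
    for name in GAZETTEER:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            # A following ", XX" abbreviation qualifies the spot (e.g. "Gary, IN").
            tail = re.match(r",\s*([A-Z]{2})\b", text[match.end():])
            qualifier = ABBREVIATIONS.get(tail.group(1)) if tail else None
            spots.append(Spot(name, match.start(), qualifier))
    return sorted(spots, key=lambda s: s.offset)
```

For the text "Reading the news in Gary, IN before flying to Paris.", the sketch returns three spots: Reading (unqualified, a geo/non-geo candidate), Gary (qualified as Indiana) and Paris. Neither the lowercase "in" nor the abbreviation "IN" becomes a spot of its own.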

2. Disambiguating spots (Algorithm)


The geotagger assigns a unique meaning to spots that can be uniquely qualified. Confidence 95%

Combinations that are not unique are left unassigned

If a page contains several spots with the same name and only one of them is qualified, its meaning is assigned to the others. Confidence 80%

Disambiguation contexts are also used to assign meanings to the remaining unassigned spots, with confidence below 70%
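The three rules above can be sketched as successive passes over the spots of one page. This is a hedged sketch, not Web-a-Where's implementation. The confidences 0.95 and 0.8 come from the slide. Several parts are assumptions: the toy `GAZETTEER`, the use of the first candidate as the default sense, the use of already-assigned countries as the "disambiguation context", and the values 0.6 and 0.5. The slide only says the last rule assigns less than 70%.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical toy gazetteer: name -> candidate paths (city, ..., country).
# The first candidate is treated as the default sense (an assumption).
GAZETTEER = {
    "Gary": [("Gary", "Indiana", "United States"), ("Gary", "South Dakota", "United States")],
    "Paris": [("Paris", "France"), ("Paris", "Texas", "United States")],
}

QUALIFIED_CONF = 0.95   # from the slide
PROPAGATED_CONF = 0.8   # from the slide
CONTEXT_CONF = 0.6      # hypothetical value below 0.7
DEFAULT_CONF = 0.5      # hypothetical value below 0.7


def disambiguate(spots: list[tuple[str, Optional[str]]]) -> list[tuple[tuple[str, ...], float]]:
    """spots: (name, qualifier or None) pairs from one page.
    Returns one (taxonomy path, confidence) pair per spot."""
    result: list = [None] * len(spots)

    # Rule 1: a qualifier that selects exactly one candidate gets 0.95.
    # Qualified spots matching zero or several candidates stay unassigned.
    for i, (name, qualifier) in enumerate(spots):
        if qualifier:
            matches = [c for c in GAZETTEER[name] if qualifier in c[1:]]
            if len(matches) == 1:
                result[i] = (matches[0], QUALIFIED_CONF)

    # Rule 2: unassigned spots inherit the meaning of a same-name qualified
    # spot, provided that meaning is unique on the page; confidence 0.8.
    meanings = defaultdict(set)
    for (name, _), assigned in zip(spots, result):
        if assigned is not None:
            meanings[name].add(assigned[0])
    for i, (name, _) in enumerate(spots):
        if result[i] is None and len(meanings[name]) == 1:
            result[i] = (next(iter(meanings[name])), PROPAGATED_CONF)

    # Rule 3 (assumed form): prefer a candidate in a country already on the
    # page, otherwise fall back to the default sense; both below 0.7.
    context = {assigned[0][-1] for assigned in result if assigned is not None}
    for i, (name, _) in enumerate(spots):
        if result[i] is None:
            in_context = [c for c in GAZETTEER[name] if c[-1] in context]
            if in_context:
                result[i] = (in_context[0], CONTEXT_CONF)
            else:
                result[i] = (GAZETTEER[name][0], DEFAULT_CONF)
    return result
```

Consider a page with the spots `[("Gary", "Indiana"), ("Gary", None), ("Paris", None)]`. The qualified Gary gets 0.95 and the bare Gary inherits Indiana at 0.8. Paris resolves to Paris, Texas at 0.6, because the page context is already the United States. A lone unqualified "Paris" falls back to the default sense at 0.5.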

2. Disambiguating spots (Data sources)


The Geographic Names Information System (GNIS) for U.S. locations

world-gazetteer.com for non-U.S. locations

United Nations Statistics Division (UNSD) for countries and continents

ISO 3166-1 for country and other abbreviations

3. Focus determination


The basic idea is that if several cities from the same region are mentioned, that region is probably the focus

A page cannot always be said to have only one focus

The confidence score should be taken into account when finding the focus, giving higher weight to information coming from locations with higher confidence

Example


A certain page contained four mentions of Orlando/Florida (assigned confidence 0.5), three of Texas (0.75), eight of Fort Worth/Texas (0.75), three of Dallas/Texas (0.75), one of Garland/Texas (0.75), and one of Iraq (0.5)

A human was asked to judge the geographical focus of this page and responded with “It’s about Texas and perhaps also Orlando”

Indeed, that page comes from the “Orlando Weekly” site, in a forum titled “Just a look at The Texas Local Music Scene...”
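The focus idea can be sketched as confidence-weighted score propagation up the taxonomy, applied to the example page. This is a minimal sketch under stated assumptions and not Web-a-Where's published scoring. Each mention adds its confidence to its own node and a decayed share to every ancestor, using a hypothetical decay factor of 0.5. The top-scoring nodes are then taken as foci, skipping any ancestor or descendant of a focus already chosen (also an assumption). The levels above state (United States, North America, Asia) are added for illustration.

```python
from collections import defaultdict

# Taxonomy paths run from the most specific node to the most general one.
USA_PATH = ("United States", "North America")

# The example page: (taxonomy path, confidence, number of mentions).
EXAMPLE_SPOTS = [
    (("Orlando", "Florida") + USA_PATH, 0.5, 4),
    (("Texas",) + USA_PATH, 0.75, 3),
    (("Fort Worth", "Texas") + USA_PATH, 0.75, 8),
    (("Dallas", "Texas") + USA_PATH, 0.75, 3),
    (("Garland", "Texas") + USA_PATH, 0.75, 1),
    (("Iraq", "Asia"), 0.5, 1),
]


def node_scores(spots, decay=0.5):
    """Add confidence to each node, decayed once per level up (decay is hypothetical)."""
    scores = defaultdict(float)
    for path, confidence, count in spots:
        for level in range(len(path)):
            node = path[level:]  # e.g. ("Texas", "United States", "North America")
            scores[node] += count * confidence * decay ** level
    return scores


def is_related(a, b):
    """True if the nodes are equal or one is an ancestor of the other."""
    shorter, longer = sorted((a, b), key=len)
    return longer[len(longer) - len(shorter):] == shorter


def pick_foci(scores, k=2):
    """Pick up to k top-scoring nodes, skipping nodes related to a chosen focus."""
    foci = []
    for node, _ in sorted(scores.items(), key=lambda item: item[1], reverse=True):
        if not any(is_related(node, focus) for focus in foci):
            foci.append(node)
        if len(foci) == k:
            break
    return foci


if __name__ == "__main__":
    for focus in pick_foci(node_scores(EXAMPLE_SPOTS)):
        print("/".join(focus))
```

On this page the sketch scores Texas highest (6.75), ahead of Fort Worth (6.0) and the United States (3.875). It returns Texas and then Orlando as the two foci, which happens to match the human judgement. A different decay factor or selection rule could give a different answer.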

Evaluating geotagging precision


Collection | Number of pages | Accuracy
Arbitrary collection | 200 | 81.7%
.GOV collection | 200 | 73.3%
Open Directory Project (ODP) | 200 | 63.1%

Geotags assigned automatically versus defined manually

Evaluating focus


92% correct up to country level:
  38% precise match
  30% correct state or city
  24% correct country

8% incorrect country:
  4% correct continent
  4% wrong continent

Comparison of Web-a-Where-determined focus to human-determined focus (ODP) for ~1 million pages

Summary


The system is able to correctly tag individual place name occurrences 80% of the time and to determine the correct focus of a page, up to the country level, 92% of the time

Accuracy can be further improved

The main source of errors is geo/non-geo ambiguity

The design and implementation of SPIRIT

Ross Purves, Paul Clough, Christopher Jones, Avi Arampatzis, Benedicte Bucher, David Finch, Gaihua Fu, Hideo Joho, Awase Khirni Syed, Subodh Vaid and Bisheng Yang

Department of Geography, University of Zurich, Switzerland

Department of Information Studies, University of Sheffield, UK

School of Computer Science, Cardiff University, UK

Institute of Information and Computing Sciences, Utrecht University, Netherlands

Laboratoire COGIT - Institut Geographique National, France

August 2007



This paper describes the design and implementation of a complete solution to geographic information retrieval

Requirements


Exhaustive retrieval of relevant documents in a specified area

Place names should be automatically identified and interactively disambiguated

Ability to query for geographical areas whose boundaries are imprecise

Spatial concepts relating different geographic entities (e.g., in, outside) should be represented

It should be possible for users to specify the area of interest on a map

Ability to view query results on a map linked to relevant web documents

Document ranking should combine both spatial and thematic aspects of document relevance
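One simple way to meet the last requirement is a weighted linear combination of a thematic score and a spatial score. This is a hedged sketch, not SPIRIT's ranking function, which these slides do not give. The weight `alpha`, the overlap-based spatial score and the assumption that text scores are already normalized to [0, 1] are all hypothetical.

```python
def box_area(box):
    """Area of an axis-aligned box (min_x, min_y, max_x, max_y); 0 if empty."""
    min_x, min_y, max_x, max_y = box
    return max(0.0, max_x - min_x) * max(0.0, max_y - min_y)


def overlap_score(doc_box, query_box):
    """Hypothetical spatial relevance: fraction of the query area covered by the footprint."""
    inter = (max(doc_box[0], query_box[0]), max(doc_box[1], query_box[1]),
             min(doc_box[2], query_box[2]), min(doc_box[3], query_box[3]))
    area = box_area(query_box)
    return box_area(inter) / area if area > 0 else 0.0


def combined_score(text_score, doc_box, query_box, alpha=0.5):
    """Blend thematic and spatial relevance; alpha is a hypothetical weight."""
    return alpha * text_score + (1 - alpha) * overlap_score(doc_box, query_box)


def rank(docs, query_box, alpha=0.5):
    """docs: doc_id -> (normalized text score in [0, 1], footprint box)."""
    return sorted(docs,
                  key=lambda d: combined_score(docs[d][0], docs[d][1], query_box, alpha),
                  reverse=True)
```

With `alpha=1.0` the ranking is text-only, so a thematically strong document far from the query area can come first. Lowering `alpha` lets spatial relevance move in-area documents up.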

Architecture Overview


Figure: SPIRIT architecture. Run-time components: user interface, broker, search engine (query disambiguation, query expansion, access to the textual and spatial indexes, relevance ranking) and the geographical ontology. Pre-processing of the web data collection: geo-parsing and geo-coding of the documents, producing the textual index, the spatial index and metadata mapping each document to its footprint.

Functionality of the components


Pre-processing the document collection

Assigning spatial footprints to web documents:

Identify geographical references (geoparsing)

Map them to spatial coordinates (geocoding)

The result is the document's spatial footprint
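The two pre-processing steps can be sketched as a gazetteer lookup (geoparsing) followed by a coordinate lookup (geocoding). This is a minimal sketch. It assumes a tiny hypothetical gazetteer of approximate point coordinates (longitude, latitude) and represents a footprint as the geocoded points plus their bounding box. SPIRIT's own geoparser and its geographical ontology are far richer and are not reproduced here.

```python
import re

# Hypothetical toy gazetteer: name -> approximate (longitude, latitude).
GAZETTEER = {
    "Cardiff": (-3.18, 51.48),
    "Sheffield": (-1.47, 53.38),
    "Zurich": (8.54, 47.37),
}


def geoparse(text):
    """Geoparsing: place names from the gazetteer, in order of appearance."""
    hits = []
    for name in GAZETTEER:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]


def geocode(names):
    """Geocoding: map each place name to its coordinates."""
    return [GAZETTEER[name] for name in names]


def footprint(text):
    """Footprint: geocoded points plus their bounding box (None if no places)."""
    points = geocode(geoparse(text))
    if not points:
        return points, None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return points, (min(xs), min(ys), max(xs), max(ys))
```

For "Workshops in Sheffield and Cardiff.", the sketch finds both cities and returns the bounding box (-3.18, 51.48, -1.47, 53.38). A text with no known places gets an empty footprint.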



Building document indexes

Grid-based spatial indexing

For each cell of the grid, a list of document IDs was constructed, using the document footprints that resulted from the geotagging process
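A grid-based spatial index of this kind can be sketched as a map from grid cell to the sorted IDs of documents whose footprint overlaps that cell. This is a minimal sketch assuming footprints are axis-aligned boxes (min_x, min_y, max_x, max_y) in degrees. `CELL_SIZE` is a hypothetical 1-degree cell, not a value from the paper.

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0  # hypothetical grid cell size, in degrees


def cells_for_box(box, cell_size=CELL_SIZE):
    """Yield every (column, row) grid cell that a footprint box overlaps."""
    min_x, min_y, max_x, max_y = box
    for cx in range(math.floor(min_x / cell_size), math.floor(max_x / cell_size) + 1):
        for cy in range(math.floor(min_y / cell_size), math.floor(max_y / cell_size) + 1):
            yield (cx, cy)


def build_grid_index(footprints):
    """footprints: doc_id -> list of boxes. Returns cell -> sorted list of doc IDs."""
    index = defaultdict(set)
    for doc_id, boxes in footprints.items():
        for box in boxes:
            for cell in cells_for_box(box):
                index[cell].add(doc_id)
    return {cell: sorted(ids) for cell, ids in index.items()}
```

A small footprint lands in a single cell, while a large one (for example, a whole region) is listed under every cell it covers. The cell size is therefore a trade-off between index size and spatial precision.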



Retrieving the results: “T” (Text) scheme

Simplest approach

Retrieve all the documents that match the concept terms of the query, then filter them to return only those that intersect the geographical scope of the place in the query (footprint)
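The T scheme can be sketched as text retrieval followed by a spatial filter. This is a minimal sketch assuming an in-memory inverted index (`text_index`: term -> set of doc IDs) and box footprints (`footprints`: doc ID -> list of boxes). Both structures are hypothetical stand-ins for SPIRIT's indexes.

```python
def intersects(a, b):
    """True if axis-aligned boxes (min_x, min_y, max_x, max_y) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_t(text_index, footprints, terms, query_box):
    """T scheme: documents containing all terms, filtered by footprint intersection."""
    postings = [text_index.get(term, set()) for term in terms]
    matches = set(postings[0]).intersection(*postings[1:]) if postings else set()
    return sorted(doc for doc in matches
                  if any(intersects(box, query_box) for box in footprints.get(doc, [])))
```

The design is simple, but every text match is fetched first, however far from the query area it lies. Only afterwards does the spatial filter discard it.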



Retrieving the results: “ST” (Space-Text) scheme

More integrated approach

Regarded as a space-primary method

At search time, the cells that intersect the query footprint are determined, and then only the corresponding text indexes are searched
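The ST scheme can be sketched as one small inverted index per grid cell, where only the cells under the query footprint are searched. This is a minimal sketch under assumptions: box footprints, a hypothetical 1-degree `CELL_SIZE`, and documents given as doc ID -> (set of terms, list of boxes). It is not SPIRIT's actual index layout.

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0  # hypothetical grid cell size, in degrees


def cells_for_box(box, cell_size=CELL_SIZE):
    """All (column, row) grid cells overlapped by a box (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = box
    return [(cx, cy)
            for cx in range(math.floor(min_x / cell_size), math.floor(max_x / cell_size) + 1)
            for cy in range(math.floor(min_y / cell_size), math.floor(max_y / cell_size) + 1)]


def build_st_index(docs):
    """Space-primary: cell -> term -> set of doc IDs.
    docs: doc_id -> (set of terms, list of footprint boxes)."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, (terms, boxes) in docs.items():
        for box in boxes:
            for cell in cells_for_box(box):
                for term in terms:
                    index[cell][term].add(doc_id)
    return index


def search_st(index, terms, query_box):
    """Search only the text indexes of cells that intersect the query footprint."""
    results = set()
    for cell in cells_for_box(query_box):
        if cell not in index:
            continue  # no documents in this cell
        postings = [index[cell].get(term, set()) for term in terms]
        if postings:
            results |= set(postings[0]).intersection(*postings[1:])
    return sorted(results)
```

Documents outside the query cells are never touched. Because cells are coarser than footprints, however, a document in an overlapping cell may lie just outside the query area, so a final exact footprint check could be added.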



Retrieving the results: “TS” (Text-Space) scheme

Better query response time

Regarded as a text-primary method

At search time, for each term, the associated documents are grouped according to the spatial index cell they relate to
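The TS scheme can be sketched as an inverted index whose postings for each term are grouped by grid cell. This is a minimal sketch under the same kind of assumptions: box footprints, a hypothetical 1-degree `CELL_SIZE`, and documents given as doc ID -> (set of terms, list of boxes). It only illustrates the text-primary layout and does not demonstrate the response-time result reported on the slide.

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0  # hypothetical grid cell size, in degrees


def cells_for_box(box, cell_size=CELL_SIZE):
    """All (column, row) grid cells overlapped by a box (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = box
    return [(cx, cy)
            for cx in range(math.floor(min_x / cell_size), math.floor(max_x / cell_size) + 1)
            for cy in range(math.floor(min_y / cell_size), math.floor(max_y / cell_size) + 1)]


def build_ts_index(docs):
    """Text-primary: term -> cell -> set of doc IDs.
    docs: doc_id -> (set of terms, list of footprint boxes)."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, (terms, boxes) in docs.items():
        for term in terms:
            for box in boxes:
                for cell in cells_for_box(box):
                    index[term][cell].add(doc_id)
    return index


def search_ts(index, terms, query_box):
    """For each term, collect its documents in the query cells; then intersect terms."""
    query_cells = cells_for_box(query_box)
    per_term = []
    for term in terms:
        groups = index.get(term, {})
        found = set()
        for cell in query_cells:
            found |= groups.get(cell, set())
        per_term.append(found)
    return sorted(set(per_term[0]).intersection(*per_term[1:])) if per_term else []
```

The term lookup comes first, and within a term's postings only the groups for the query cells are read. Documents elsewhere in the world are skipped without being fetched.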

Query interfaces


Results display


Evaluation


Performance analysis

A document was considered relevant to a query only if it was both thematically and spatially relevant.

In this sense, the key result of the work is that spatially aware search outperformed text-only search.

Usability analysis

Figure: questionnaire responses (vertical axis 0 to 30) for two statements: “It was easy to get started with the system and make my query” (strongly disagree, disagree, neutral, agree, strongly agree) and “It was easy to find on the map the locations of documents listed to the right of the map” (no, not at all; a little; yes, very much)

Conclusions


The paper describes a unified approach, as well as an architecture, for introducing spatial awareness into search-engine technology

A prototype system demonstrated the effectiveness of the strategy

Personal Conclusions


The first study can lead to changes in search engines and devices that improve the mobile experience

The Web-a-Where system provides good insight for further improvements in location search, though it is not very precise

SPIRIT is a completely new paradigm in spatially aware searching, but its interaction methods can be improved

Thank you

References

General References

[1] M. Sanderson, J. Kohler, Analyzing geographic queries, in: Proceedings of the SIGIR 2004 Workshop on Geographic Information Retrieval, Sheffield, UK, 2004.

[2] S. Asadi, Searching the World Wide Web for local services and facilities: a review on the patterns of location-based queries, in: WAIM’05, Hangzhou, China, 2005.

[3] S. Kristoffersen, F. Ljungberg, “Making Place” to make IT work: empirical explorations of HCI for mobile CSCW, in: Paper Presented at the International ACM SIGGROUP Conference on Supporting Group Work, 1999.


[4] K. Church, B. Smyth, K. Bradley, P. Cotter, A large scale study of European mobile search behavior, in: Proceedings of MobileHCI’08, 2008, pp. 13–22.

[5] M.A. Neerincx, J.W. Streefkerk, Interacting in desktop and mobile context: emotion, trust and task performance, in: Paper Presented at the Proceedings of the First European Symposium on Ambient Intelligence (EUSAI), Eindhoven, The Netherlands, 2003.

