2013 Tips and Tricks Track, Make it work like Google: Creating Search Indexes by David Haines

Preview:

DESCRIPTION

Users want to have one search box for everything and it needs to had autocomplete functionality. This presentation will go over how to create an attribute index table for searching across multiple tables and features in different databases. Then it will cover techniques applications can use the index table to implement autocomplete functionality.

Citation preview

Make it work like Google: Creating a search index

David Haines Land Use GIS Manager

Land Use

“Make it Work like Google”

#1. Auto-complete #2. One search field #3. Without at least one of the above people will use something else. #4. Good results

What Boulder County Land Use

we’re doing now…

There was this flood a few weeks ago…

Boulder County Land Use Dept. 9/17/2013

Pictometry 4/27/2011 Land Use Dept. 9/17/2013

Wasn’t Always this Way

Multiple Entry Fields

Different Types of Search

Too many databases

Hay for the Winter by Trey Ratcliff

http://www.flickr.com/photos/stuckincustoms/8672061349/

Attribution License

Hay for the Winter by Trey Ratcliff

http://www.flickr.com/photos/stuckincustoms/8672061349/

Attribution License

Here?

Here?

Here?

Here?

Here?

Here?

Here?

Here?

Here?

Address & Owner Database

Building Permit Database

Assessment Database

Place Name Database

Subdivision Database

Roll of Hay by Klearchos Kapoutsis

http://www.flickr.com/photos/klearchos/3824322183/

Attribution-NonCommercial License

Address & Owner Database

Building Permit Database

Assessment Database

Place Name Database

Subdivision Database

Roll of Hay by Klearchos Kapoutsis

http://www.flickr.com/photos/klearchos/3824322183/

Attribution-NonCommercial License

A Table to Search

What needs to be in the table?

What to search for

Where to go once you find it

Search Index Table fields Field*(ESRI) Description

SearchText The search term. The “correct” answer. A perfect hit.

Search Index Table fields Field*(ESRI) Description

SearchText The search term. The “correct” answer. A perfect hit.

Workspace The database the search term is in

Search Index Table fields Field*(ESRI) Description

SearchText The search term. The “correct” answer. A perfect hit.

Workspace The database the search term is in

FeatureClass The table in the database the search term is in

Search Index Table fields Field*(ESRI) Description

SearchText The search term. The “correct” answer. A perfect hit.

Workspace The database the search term is in

FeatureClass The table in the database the search term is in

IdField The field in the table the search term is in

Search Index Table fields Field*(ESRI) Description

SearchText The search term. The “correct” answer. A perfect hit.

Workspace The database the search term is in

FeatureClass The table in the database the search term is in

IdField The field in the table the search term is in

Id The record (or row) the the search term is in. This is what the search should go to.

Search: 1234 Main Street

Field Description

SearchText 1234 S Main Street Boulder

Workspace G:\gis.sde

FeatureClass dbo.parcel

IdField PIN

Id MKE11171971

Match

Search: 1234 Main Street

Field Description

SearchText 1234 S Main Street Boulder

Workspace \\gis\gisdata.sde

FeatureClass dbo.parcel

IdField PIN

Id MKE11171971

Zoom!

Search: 1234 Main Street

Blue skies and silos by Matthew Rutledge

http://www.flickr.com/photos/rutlo/3872475221/

Attribution-NonCommercial License

Different Data Silos?

Boulder County Land Use Search Index

Theme Example Database

Parcel Number 157416001234 Assessor - CAMA

Address 1234 S Main Street Assessor – CAMA

Owner John Doe Assessor – CAMA

Tax Account R01234567 Assessor – CAMA

Building Permit BP-13-0001 Land Use - Accela

Docket Number SPR-13-0010 Land Use – Accela

Docket Name Smith Residence Land Use – Accela

Subdivision Big Oak Meadows GIS - ArcSDE

Mining Claim Name Blue Bird Mine #2 GIS – ArcSDE

Geographic Names Longs Peak GIS - ArcSDE

Do the hard work to

make it simple

“Making something look simple is easy; making something simple to use is much harder — especially when the underlying systems are complex — but that’s what we should be doing.” https://www.gov.uk/designprinciples#fourth

Steps to update the index table

Load existing index

Steps to update the index table

Load existing index

Load reference data (i.e. data your searching)

Steps to update the index table

Load existing index

Load reference data (i.e. data your searching)

Find reference data not in index

Steps to update the index table

Load existing index

Load reference data (i.e. data your searching)

Find reference data not in index

Add that data

Steps to update the index table

Load existing index

Load reference data (i.e. data your searching)

Find reference data not in index

Add that data

Search for index data not in the index (deleted)

Steps to update the index table

Load existing index

Load reference data (i.e. data your searching)

Find reference data not in index

Add that data

Search for index data not in the index (deleted)

Remove it

Steps to update the index table

Load existing index

Load reference data (i.e. data your searching)

Find reference data not in index

Add that data

Search for index data not in the index (deleted)

Remove it

Repeat for each dataset

Swiss Army Knife by AJ Cann

http://www.flickr.com/photos/ajc1/4663140532/

Attribution License

One Search Index Many Applications

Now that you built your table…

… optimize your applications

Optimizing wildcard searching Try First: “OnlyWord%”

Try Second: “%OnlyWord%”

Search: “tree”

Results:

“Tree View”

“Treetop”

“Green Tree”

“North Woodtree”

Optimizing wildcard searching

Word1 Word2%

Word1% %Word2%

Word1% %Word2% %Word3% …

%Word1% %Word2%

%Word1% %Word2% %Word3% …

F

a

s

t

e

r

Score Your Results

The Levenshtein distance between two words is the minimum number of single-character edits (insertion, deletion, substitution) [including spaces] required to change one word into the other.

Source: en.wikipedia.org/wiki/Levenshtein_distance

1234 Ced

1234 N Cedar Brook Lane

Statistics

Address Searches 85%

Owner Searches 5%

Parcel Number Searches 5%

Address Search Average 6.5 Characters

Address Search 1.6 Words

49% one word, 43% two words, 8% 3+ words

Land Use

Pictometry 4/27/2011

Land Use Dept. 9/17/2013

Recommended