© FINDWISE 2012 Implementing and designing search solutions Gothenburg University – Gothenburg – 2012-03-08

Designing and Implementing Search Solutions

Agenda

• Introduction to Findwise

• Technical approach

• DIY UX design

• Research

About Findwise

• Founded in 2005

• Offices in Sweden, Denmark, Norway and Poland

• 72 employees (February 2012)

• Our objective is to be a leading provider of Findability solutions utilising the full potential of search technology to create customer business value

Technology independent

Creating search-driven Findability solutions based on market-leading commercial and open source search technology platforms:

• Autonomy IDOL
• Microsoft (SharePoint and FAST Search products)
• Google GSA
• IBM ICA/OmniFind
• LucidWorks
• Apache Lucene/Solr (Open source)
• and more…

Findability Challenges

Employee productivity (DN article, March 2011):

“The effort to find the right information costs an average company 80,000 SEK per employee and year”

Customer Service quality and efficiency (Accenture report, March 2011):

“69% of agents don't have answers to help service customers”

E-commerce conversion rate (Google survey, December 2010):

“77% of those surveyed used search within an e-commerce website to find products”

Information overload?

A search engine alone is not enough

Technical approach

RE-USE

STANDARD

Standard architecture

Search core

Search core - overview

Documents

Document 1
Title: Brown fox
Content: The quick brown fox jumps over the lazy dog
Author: Tobias Berg

Document 2
Title: My dog
Content: My old dog cannot jump anymore
Author: Svetoslav Marinov

Inverted index

Term      Documents
…         …
fox       1
jump      1, 2
lazy      1
dog       1, 2
tobias    1
berg      1
…         …

• Tokenization
• Stemming
• Stop-word…
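How the two example documents turn into the inverted index can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `InvertedIndexSketch`, the stop-word list and the one-entry lookup table that stands in for a real stemmer are all made up for this example and do not come from any search product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: every name here is illustrative.
class InvertedIndexSketch {

    // Assumed stop-word list, just large enough for the two example documents.
    static final Set<String> STOP_WORDS = Set.of("the", "over", "my", "cannot", "anymore");

    // Assumed lookup table standing in for a real stemmer or lemmatizer.
    static final Map<String, String> STEMS = Map.of("jumps", "jump");

    // Tokenization: lower-case the text and split on anything that is not a letter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stemming: map a token to its base form if the table knows it.
    static String stem(String token) {
        return STEMS.getOrDefault(token, token);
    }

    // Maps each normalized term to the sorted set of ids of the documents containing it.
    static Map<String, Set<Integer>> build(Map<Integer, String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String token : tokenize(doc.getValue())) {
                if (STOP_WORDS.contains(token)) {
                    continue; // stop-word removal
                }
                index.computeIfAbsent(stem(token), k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new TreeMap<>();
        docs.put(1, "Brown fox. The quick brown fox jumps over the lazy dog. Tobias Berg");
        docs.put(2, "My dog. My old dog cannot jump anymore. Svetoslav Marinov");
        build(docs).forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

Because "jumps" and "jump" are normalized to the same term, a query for either form reaches both documents, which is the point of stemming at index time.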

Relevancy

Retrieved documents

Relevant documents

•Precision – how many of the retrieved documents are relevant?

•Recall – how many of the relevant documents were retrieved?
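Both measures are simple ratios over the set of retrieved documents and the set of relevant documents. A minimal sketch in Java, where the class name `PrecisionRecallSketch` and the document ids are invented for the example:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: names and sample ids are illustrative.
class PrecisionRecallSketch {

    // Number of documents that are both retrieved and relevant.
    static int overlap(Set<Integer> retrieved, Set<Integer> relevant) {
        Set<Integer> hits = new HashSet<>(retrieved);
        hits.retainAll(relevant);
        return hits.size();
    }

    // Precision = |retrieved ∩ relevant| / |retrieved| (0 when nothing was retrieved).
    static double precision(Set<Integer> retrieved, Set<Integer> relevant) {
        return retrieved.isEmpty() ? 0.0 : (double) overlap(retrieved, relevant) / retrieved.size();
    }

    // Recall = |retrieved ∩ relevant| / |relevant| (0 when nothing is relevant).
    static double recall(Set<Integer> retrieved, Set<Integer> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) overlap(retrieved, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        Set<Integer> retrieved = Set.of(1, 2, 3, 4); // what the engine returned
        Set<Integer> relevant = Set.of(2, 4, 5);     // what the user actually needed
        System.out.println("precision = " + precision(retrieved, relevant));
        System.out.println("recall    = " + recall(retrieved, relevant));
    }
}
```

In this sample two of the four retrieved documents are relevant, and two of the three relevant documents were retrieved.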

Relevancy

Recall: find everything related to the query
- lemmatization
- synonyms
- wildcards
- anti-phrasing
- or-operator

Precision: find only entities related to the query
- exact word matching
- exact phrase matching
- and-operator

Goal: Improve precision without sacrificing recall
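The trade-off between the or-operator and the and-operator can be illustrated on the posting lists of the example inverted index. The sketch below is an assumption-laden toy: the class `BooleanQuerySketch` and its methods are hypothetical, and the index is hard-coded to four of the terms shown earlier.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a tiny hard-coded index, not a real engine.
class BooleanQuerySketch {

    // Posting lists copied from the example inverted index.
    static final Map<String, Set<Integer>> INDEX = Map.of(
            "fox", Set.of(1),
            "jump", Set.of(1, 2),
            "lazy", Set.of(1),
            "dog", Set.of(1, 2));

    // or-operator: union of the posting lists, which favours recall.
    static Set<Integer> or(String... terms) {
        Set<Integer> result = new TreeSet<>();
        for (String term : terms) {
            result.addAll(INDEX.getOrDefault(term, Set.of()));
        }
        return result;
    }

    // and-operator: intersection of the posting lists, which favours precision.
    static Set<Integer> and(String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> postings = INDEX.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        System.out.println("lazy OR dog  -> " + or("lazy", "dog"));
        System.out.println("lazy AND dog -> " + and("lazy", "dog"));
    }
}
```

For the query "lazy dog", the or-operator returns both documents while the and-operator returns only the first one, which is the only document that mentions a lazy dog.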

Search core – relevance score

• TF/IDF

• Field length

• Field weight
  • Title *2
  • Author *4
  • Content *1

• Freshness

• …
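One way these factors might combine into a single score is sketched below. The formula is an assumption chosen for illustration (term frequency times a smoothed IDF, normalized by the square root of the field length and multiplied by the field weight); real engines use their own variants. Only the three field weights come from the list above. Freshness is left out, and the class name `RelevanceScoreSketch` is hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the scoring formula is an assumed simplification.
class RelevanceScoreSketch {

    // Field weights taken from the list above.
    static final Map<String, Double> FIELD_WEIGHTS =
            Map.of("title", 2.0, "author", 4.0, "content", 1.0);

    // IDF: terms that occur in fewer documents score higher. The smoothing is an assumption.
    static double idf(int totalDocs, int docsWithTerm) {
        return Math.log((double) totalDocs / (docsWithTerm + 1)) + 1.0;
    }

    // Scores one term against one document given as field name -> list of tokens.
    static double score(String term, Map<String, List<String>> doc, int totalDocs, int docsWithTerm) {
        double total = 0.0;
        for (Map.Entry<String, List<String>> field : doc.entrySet()) {
            List<String> tokens = field.getValue();
            if (tokens.isEmpty()) {
                continue;
            }
            long tf = tokens.stream().filter(term::equals).count(); // term frequency
            double lengthNorm = 1.0 / Math.sqrt(tokens.size());     // a hit in a short field counts more
            double weight = FIELD_WEIGHTS.getOrDefault(field.getKey(), 1.0);
            total += weight * tf * lengthNorm * idf(totalDocs, docsWithTerm);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = Map.of(
                "title", List.of("brown", "fox"),
                "content", List.of("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"),
                "author", List.of("tobias", "berg"));
        System.out.println("score(fox)  = " + score("fox", doc, 2, 1));
        System.out.println("score(berg) = " + score("berg", doc, 2, 1));
    }
}
```

With these weights, a hit in the author field outweighs the same hit in the title, and a title hit outweighs a hit buried in a long content field.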

Search Core

•Optimized for full-text search

•Sub-second responses

•Tunable relevance

•Scalable

•Configurable & Extendable

{query}

Find matching documents

Score documents

{result}

Standard architecture

Connectors

Connectors – fetch data

Database connector

Id   Product name   Description                          Price
1    Wheel          Makes the bus go round round round   45
2    Window         A shield of glass                    12

Id   Book name              Abstract        Author
1    Ulysses                Irish novel     James Joyce
2    Crime and Punishment   Russian novel   Dostoevsky, Fyodor

Database connector

Connector framework – code example

public void execute() {
    // Insert code to fetch content
}

public void interrupt() {
    // Insert code to handle interrupt signal
}

public void init() {
    // Insert code to initialize connector
}
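To make the three hooks concrete, here is a minimal sketch of what a filled-in connector could look like. It is not the API of any particular connector framework: the `Connector` interface and the `InMemoryTableConnector` class are hypothetical, and the "database" is an in-memory copy of the product table so that the example runs without a real data source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface mirroring the three methods of the code example.
interface Connector {
    void init();
    void execute();
    void interrupt();
}

// Stand-in for a database connector: the rows live in memory instead of a real database.
class InMemoryTableConnector implements Connector {
    private final List<String[]> rows = new ArrayList<>();
    private final List<Map<String, String>> fetched = new ArrayList<>();
    private volatile boolean interrupted;

    @Override
    public void init() {
        // A real connector would open its database connection here.
        rows.clear();
        fetched.clear();
        interrupted = false;
        rows.add(new String[] {"1", "Wheel", "Makes the bus go round round round", "45"});
        rows.add(new String[] {"2", "Window", "A shield of glass", "12"});
    }

    @Override
    public void execute() {
        for (String[] row : rows) {
            if (interrupted) {
                break; // stop between rows once an interrupt was requested
            }
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", row[0]);
            doc.put("name", row[1]);
            doc.put("description", row[2]);
            doc.put("price", row[3]);
            fetched.add(doc); // a real connector would hand the document to the pipeline
        }
    }

    @Override
    public void interrupt() {
        interrupted = true;
    }

    List<Map<String, String>> fetchedDocuments() {
        return fetched;
    }

    public static void main(String[] args) {
        InMemoryTableConnector connector = new InMemoryTableConnector();
        connector.init();
        connector.execute();
        connector.fetchedDocuments().forEach(System.out::println);
    }
}
```

Each table row becomes one document with named fields, which is the shape the later stages expect.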

Connector Frameworks

http://incubator.apache.org/connectors/

http://code.google.com/p/google-enterprise-connector-manager/

• Existing connectors
• Re-usable
• Configuration interfaces
• Standardized implementation

Standard architecture

Pipeline

Pipeline - overview

• PDF/Office -> Text
• Lemmatization
• Language identification
• NER
• Phonetic search
• Keyword extraction
• External calls
• …

Pipeline framework – code example

protected void addAction(Document doc) throws PipelineException {
    // Insert code
    doc.addField("Title", "Hello world!");
}

protected void updateAction(Document doc) throws PipelineException {
    // Insert code
    addAction(doc);
}

protected void deleteAction(Document doc) throws PipelineException {
    // Insert code
}
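The idea of chaining small, re-usable stages can be sketched as follows. Everything in this example is hypothetical: `PipelineDocument`, `Stage` and `PipelineSketch` are made-up names rather than the classes of any pipeline framework, and the two stages are deliberately naive stand-ins for real language identification and keyword extraction.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal document: a bag of named fields.
class PipelineDocument {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void addField(String name, String value) {
        fields.put(name, value);
    }

    String getField(String name) {
        return fields.get(name);
    }
}

// Hypothetical stage contract: each stage enriches the document in place.
interface Stage {
    void process(PipelineDocument doc);
}

class PipelineSketch {

    // Toy language identification: "sv" if the content contains å, ä or ö, otherwise "en".
    static final Stage LANGUAGE_ID = doc -> {
        String text = doc.getField("content");
        if (text == null) {
            return;
        }
        boolean swedish = text.matches("(?s).*[\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6].*");
        doc.addField("language", swedish ? "sv" : "en");
    };

    // Toy keyword extraction: picks the first longest word of the content.
    static final Stage KEYWORD = doc -> {
        String text = doc.getField("content");
        if (text == null) {
            return;
        }
        String longest = "";
        for (String word : text.split("[^\\p{L}]+")) {
            if (word.length() > longest.length()) {
                longest = word;
            }
        }
        doc.addField("keyword", longest);
    };

    // Runs the stages in order on one document.
    static void run(List<Stage> stages, PipelineDocument doc) {
        for (Stage stage : stages) {
            stage.process(doc);
        }
    }

    public static void main(String[] args) {
        PipelineDocument doc = new PipelineDocument();
        doc.addField("content", "My old dog cannot jump anymore");
        run(List.of(LANGUAGE_ID, KEYWORD), doc);
        System.out.println(doc.getField("language") + " / " + doc.getField("keyword"));
    }
}
```

Because each stage only reads and writes fields, stages can be reordered, removed or replaced through configuration without touching the others.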

NLP tools and approaches

• Open source: GATE, OpenNLP, UIMA, StanfordNLP, Mallet, Apache Mahout

• Proprietary: IBM LanguageWare

• Own components: e.g. KeywordExtraction Service; LanguageIdentify

• POS taggers – Hunpos, OpenNLP, Mallet

• Dependency Parsers – MaltParser, StanfordParser

• NER – rule-based + statistical models

• Document summarization

• Document clustering

Pipeline – configuration example

Pipeline frameworks

Findwise Hydra

http://www.pypes.org/

http://www.openpipeline.com/

• Re-usable stages
• Configuration interface
• Focus on task

Putting it all together

What the frell is UX design?

• Interaction design

•Usability Engineering

• Information Architecture

•Visual Design

Findwise UX design principles

Users want results

Dialogue not monologue

Participation builds trust

Answer frequent questions

Simple but powerful

DIY UX design

Design research

Analytics

Usability tests

Iterate!

Design research

• Be easy to reach – keep contact

• Let users' requests guide you when prioritizing new features

• Listen & try to discover the underlying problem

• Try to find out what the user needs, not what they say they want

Analytics

•Web analytics

•Search analytics

•A/B testing
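A simple starting point for search analytics is a list of the most frequent queries in the query log. The sketch below assumes the log is already available as a list of raw query strings. The class name `TopQueriesSketch` and the sample queries are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: counts queries from an in-memory log.
class TopQueriesSketch {

    // Returns the n most frequent queries, most frequent first; ties are sorted alphabetically.
    static List<String> topQueries(List<String> log, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String query : log) {
            // Normalize so that "Vacation " and "vacation" count as the same query.
            String key = query.trim().toLowerCase(Locale.ROOT);
            if (!key.isEmpty()) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        List<String> queries = new ArrayList<>(counts.keySet());
        queries.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return queries.subList(0, Math.min(n, queries.size()));
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "vacation", "Vacation ", "expenses", "vacation", "expenses", "lunch menu");
        System.out.println(topQueries(log, 2));
    }
}
```

The top of this list is a cheap way to decide which queries to test and tune first.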

Usability tests

•Test early - test often

•Use sketches, paper prototypes, static prototypes and working prototypes!

•Create real tasks or problems

•Don’t ask them how they would want it

•Test on friends and family or colleagues

Iterate!

Why UX design?

•Improved requirements

•Better feedback

•Eliminate bias

•Less development time

Summary

•Listen & try to discover the underlying problem

•Search analytics – Top queries

•Do usability tests early & often

• Iterate!

Research

•Collaboration with Universities

GU, Borås, KTH, Copenhagen U.

•EU projects

RUSHES

• Master’s Thesis supervision

Chalmers, KTH, Lund

Master’s Thesis projects

•A way to test ideas

•A way to recruit people

•A way to cooperate with Universities

•Keyword Extraction

•Document Clustering

•NER

•Document summarization

•Extracting structural information from text

•Query log analysis

Resources - books

• The Design of Everyday Things

• Don’t Make Me Think

• Search Analytics for Your Site

• ManifoldCF in Action

• Taming Text

Tobias Berg

Björn Klockljung Johansson

Svetoslav Marinov

Thanks!