How Text Analytics Increases Search Relevance


Users care about findability, not search

Findability is the ease with which someone can locate the information they want. It is often confused with search, but search is just one method of achieving findability. Search lets people enter words that they hope are contained in the content they want to retrieve. Findability covers any method of locating this content, including but not limited to searching. Pingar DiscoveryOne improves findability.

Significantly, findability includes facetted search. Facetted search lets people filter a search by categories and topics to remove irrelevant results and spot the content they are looking for more quickly. Facets can also filter lists and views, not only search results. Studies¹ show that users rated facetted search as the most desirable feature for improving findability.

Example of a facetted search by Category

1 Divoli, A. and Medelyan, A. Search interface feature evaluation in biosciences. HCIR 2011 Workshop, Google, Mountain View, CA, USA.

By removing irrelevant content, facetted search improves search relevance, measured here as the number of useful documents on the first page of search results. Without facetted search, your investment in enterprise search cannot deliver its full potential. Facetted search, however, relies on documents being categorized and tagged with associated keywords and phrases. This information is called metadata. Without metadata, there can be no facetted search. Unfortunately, staff rarely enter metadata, and some systems, such as email, may not even allow users to enter it. This is why we created DiscoveryOne: it is an automated way to add metadata.
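
The dependence of facets on metadata can be illustrated with a minimal sketch. It assumes each search result carries a metadata dictionary; the field names (`category`, `keyphrases`) and the sample documents are hypothetical and do not reflect DiscoveryOne's actual data model.

```python
from collections import Counter

# Hypothetical search results; each document carries metadata used as facets.
results = [
    {"title": "Q3 budget review", "category": "Finance", "keyphrases": ["budget", "forecast"]},
    {"title": "Hiring plan", "category": "HR", "keyphrases": ["recruitment", "budget"]},
    {"title": "Annual report", "category": "Finance", "keyphrases": ["revenue", "audit"]},
]

def _as_list(value):
    """Treat single-valued and multi-valued metadata fields uniformly."""
    return value if isinstance(value, list) else [value]

def facet_counts(docs, field):
    """Count how many documents fall under each value of a facet field."""
    counts = Counter()
    for doc in docs:
        counts.update(_as_list(doc[field]))
    return counts

def filter_by_facet(docs, field, wanted):
    """Keep only documents whose facet field equals or contains the wanted value."""
    return [doc for doc in docs if wanted in _as_list(doc[field])]

# Selecting the "Finance" facet removes the unrelated HR document.
finance_docs = filter_by_facet(results, "category", "Finance")
```

A document with no metadata would never appear in any facet count, which is why the facets are only as good as the tagging behind them.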

Users get most benefit from facetted search with key phrases

Keywords and phrases are the most beneficial metadata for facetted search. When searching to gather specific information or to find facts, people prefer a few relevant facets of Pingar-generated keyphrases. If your Enterprise Content Management System (ECMS) or enterprise search engine offers facetted search, then facetted search on keywords is critical.

As employees are unlikely to record keyphrases, these must be identified automatically by a system such as Pingar DiscoveryOne. DiscoveryOne reads a document and identifies the words and phrases that best describe its topics.
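
To give a feel for automatic keyphrase identification, here is a deliberately simple sketch that ranks one- and two-word phrases by frequency after dropping common stopwords. This is an illustrative baseline only, not the method DiscoveryOne uses; the stopword list and the function names are assumptions made for this example.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on",
             "with", "by", "that", "this", "it", "as", "are", "be"}

def candidate_phrases(text, max_words=2):
    """Yield runs of up to max_words consecutive non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    for start in range(len(tokens)):
        for length in range(1, max_words + 1):
            window = tokens[start:start + length]
            if len(window) == length and not any(t in STOPWORDS for t in window):
                yield " ".join(window)

def top_keyphrases(text, k=3):
    """Rank candidates by frequency, preferring longer phrases on ties."""
    counts = Counter(candidate_phrases(text))
    ranked = sorted(
        counts.items(),
        key=lambda item: (-item[1], -len(item[0].split()), item[0]),
    )
    return [phrase for phrase, _ in ranked[:k]]
```

Production systems typically go well beyond raw frequency, but even this baseline shows how keyphrases can be produced without anyone typing them in.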

Document categories can also be useful

In addition to keyphrases, organizations define their own metadata, such as which project, client, or product line a document belongs to. Matching these values to a document allows this metadata to be used with facetted search as well.

Unlike traditional technology, DiscoveryOne offers two advanced ways of categorizing content automatically:

• By topic (e.g. products, projects, or known issues)

• By content-type (e.g. employment contract or financial statement)

Categorizing by topic with taxonomies

Categorizing content by topic uses taxonomies. A taxonomy is a predefined set of categories.

• Taxonomies can be flat lists

• Taxonomies can have hierarchy
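
The two shapes a taxonomy can take can be sketched as plain data structures. The category names below are hypothetical examples, and the nested-dictionary representation is only one of several reasonable choices.

```python
# A flat taxonomy: a simple list of category names (hypothetical examples).
flat_taxonomy = ["Contracts", "Invoices", "Reports"]

# A hierarchical taxonomy: each category maps to its child categories.
hierarchical_taxonomy = {
    "Products": {
        "Hardware": {},
        "Software": {"Desktop": {}, "Mobile": {}},
    },
    "Projects": {},
}

def category_paths(taxonomy, prefix=()):
    """Return every category as a tuple path from the root, parents first."""
    paths = []
    for name, children in taxonomy.items():
        path = prefix + (name,)
        paths.append(path)
        paths.extend(category_paths(children, path))
    return paths
```

A flat list is just a hierarchy with one level, so `category_paths({name: {} for name in flat_taxonomy})` handles both shapes with the same code.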

Taxonomy categorization works well when you:

• Have a clear idea of the categories you want

• Can determine the words and phrases in a document that indicate which category it matches

This is where Pingar's text analytics expertise becomes useful. Pingar does more than match the names of the categories when it categorizes by topic.

It also:

Traditional systems relied on arcane rules that your employees had to learn and enter. The modern text analytics developed by Pingar does not require this, so it is faster and less expensive to implement.
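
As a rough illustration of categorizing by topic, the sketch below assigns categories by counting indicator phrases in the text rather than looking only for the category name. It assumes each category comes with a hand-picked phrase list; the categories, phrases, and function name are hypothetical, and this is not a description of Pingar's algorithm.

```python
import re

# Hypothetical taxonomy: each category lists phrases that indicate it,
# going beyond the category name itself.
TAXONOMY_TERMS = {
    "Known Issues": ["known issue", "bug", "defect", "workaround"],
    "Pricing": ["pricing", "price list", "discount", "quote"],
}

def categorize(text, taxonomy_terms, min_hits=1):
    """Return categories ranked by how many indicator phrases occur in the text."""
    lowered = text.lower()
    scores = {}
    for category, phrases in taxonomy_terms.items():
        hits = 0
        for phrase in phrases:
            # Whole-word match so "quote" does not match inside "quoted".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            hits += len(re.findall(pattern, lowered))
        if hits >= min_hits:
            scores[category] = hits
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda c: (-scores[c], c))
```

Note that a document mentioning only "bug" and "workaround" is still filed under "Known Issues" even though the category name never appears in it.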

Categorizing by content type with statistical models

Taxonomy categorization does not work well for determining the nature of the content: is it a letter, a brochure, a contract, or a financial statement? Statistical models are far superior for categorizing documents by content type, and traditional technologies do not allow for this. Statistical models are also useful when you do not know in advance which words will occur.

Text categorization uses a statistical model built specifically for your categories, and Pingar tools enable this. To generate the model, example documents from each category are fed into the tool, which learns what the documents in each category have in common.
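
The idea of learning categories from example documents can be sketched with a small multinomial naive Bayes classifier. Naive Bayes is used here only because it is a well-known statistical text classifier; the source does not say which model Pingar uses, and the class name, training texts, and labels below are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of examples
        self.vocabulary = set()

    def train(self, examples):
        """Learn word statistics from (text, category) pairs."""
        for text, category in examples:
            tokens = tokenize(text)
            self.word_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocabulary.update(tokens)

    def predict(self, text):
        """Return the category with the highest log-probability for the text."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category in sorted(self.doc_counts):
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(self.word_counts[category].values())
            for token in tokens:
                count = self.word_counts[category][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category

# Hypothetical example documents for two content types.
examples = [
    ("This employment agreement is between the employer and the employee", "employment contract"),
    ("The employee shall receive a salary and annual leave", "employment contract"),
    ("Balance sheet showing total assets and liabilities", "financial statement"),
    ("Statement of revenue, expenses and net income for the year", "financial statement"),
]
classifier = NaiveBayesClassifier()
classifier.train(examples)
```

Nobody writes rules in this approach: the model picks up the distinguishing vocabulary from the examples, which is why it also copes with words that were not anticipated in advance.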

www.pingar.com

North America: 440 N. Wolfe Rd, Sunnyvale, CA 94085, USA, +1 408 663 2328

Asia Pacific: 55 Anzac Ave, Auckland 1010, New Zealand, +64 9 950 3299

Thank you
