Older White Paper Example


Knowledge Base Development and RME Processing

A Rapport Technical White Paper

DRAFT COPY

January 16, 2003

© 1999 Banter Technology Inc. All rights reserved.



Rapport Version 3.0

© 1997-1999 Banter Technology Inc. All rights reserved.

The contents of this documentation are strictly confidential and are proprietary to Banter Technology

Inc. No part of this documentation may be reproduced, transmitted, or stored, in any form, in whole or

in part, or by any means for any purpose without the prior written consent of Banter Technology Inc.

The software described in this document is furnished under a license agreement and may be used or copied only in accordance with the terms stipulated therein.

Banter Technology Inc. reserves the right to modify the information contained in this document

without prior notification.

Rapport is a trademark of Banter Technology Inc.

Microsoft, Outlook, and Windows are registered trademarks of Microsoft Corporation. Other product

and company names mentioned in this document may be trademarks of their respective owners.

Banter Technology Inc.

60 Federal Street

Suite 550

San Francisco, CA

94107

Tel: 1-415-247-2600

Fax: 1-415-247-2626

E-mail: [email protected]



Introduction

The Rapport Knowledge Base is a unique, adaptive repository of linguistic and statistical information that enables Rapport to accurately manage and classify high-volume customer e-communications. Rapport is a learning system: the Knowledge Base continuously evolves to model an organization's current communication environment.

Working in conjunction with the Rapport Knowledge Base, Rapport's Relationship Modeling Engine (RME) analyzes customer communications and takes the most appropriate action on-the-fly, based on various user specifications, including Rapport's broad spectrum of configuration settings.

This white paper examines Rapport's unique adaptive Knowledge Base, its development process, and how it enables the RME to accurately process and classify messages.

Background: Rapport's RME Architecture

In Rapport, each customer message is processed according to user-defined categories.

Categories determine which automatic or semi-automatic action is taken with each

message.

Each category represents the content of a message, or indicates some other attribute of a message such as its source. For example, a financial institution may define categories like Checking Balance, Transfer Request, and Mortgage Info; these categories represent the types of customer communications it commonly receives.

In the Rapport Knowledge Base, categories are associated with linguistic concept

models (discussed below) that are used by the RME for message classification. These

concept models determine the relevance of incoming messages to the categories in the

system.

Optionally, categories in the Knowledge Base may be associated with logical expressions: formulas or statements used to refine or override the RME's concept-based message classification. For example, a category may be associated with the following expression: $R_secured(s) == YES. Rapport analyzes an incoming message and, using this expression, assigns the associated category 100% relevancy if the message originated from a secure source.
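
The override behavior described above can be sketched as follows. This is only an illustration under stated assumptions: a message is modeled as a dictionary of parameters, a matching expression forces the score to 100%, and the names `Category`, `score_with_override`, the `secured` field, and the "Secure Mail" category are hypothetical rather than part of Rapport.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Message = dict  # hypothetical message representation: parameter name -> value


@dataclass
class Category:
    name: str
    # Optional logical expression. When it evaluates to True for a message,
    # the category is assigned 100% relevancy (assumed override semantics).
    expression: Optional[Callable[[Message], bool]] = None


def score_with_override(category: Category, message: Message, concept_score: float) -> float:
    """Return the category score in percent, letting the expression override it."""
    if category.expression is not None and category.expression(message):
        return 100.0
    return concept_score


# Stand-in for the expression $R_secured(s) == YES quoted in the text.
secure_mail = Category("Secure Mail", expression=lambda m: m.get("secured") == "YES")

print(score_with_override(secure_mail, {"secured": "YES"}, 42.0))
print(score_with_override(secure_mail, {"secured": "NO"}, 42.0))
```

When the expression does not match, the concept-based score passes through unchanged, which mirrors the "refine or override" role the text gives to logical expressions.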

During the Rapport configuration process, each category is associated with properties

that determine which actions are taken for each message. For example, a message

received by a financial institution is matched to the Mortgage Info category. This

category may have properties that instruct Rapport to compose and send an

appropriate automatic reply using standard pre-written text containing mortgage

information. Alternatively, the Mortgage Info category's properties may be set to

route the message to an appropriate queue for manual handling.
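
One way to picture category properties is as a small configuration record that selects an action. The sketch below is an assumption made for illustration: the `Action` values, the `CategoryProperties` fields, and the template and queue names are invented here and do not reflect Rapport's actual configuration format.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO_REPLY = "auto_reply"          # send pre-written text automatically
    ROUTE_TO_QUEUE = "route_to_queue"  # hand off for manual handling


@dataclass(frozen=True)
class CategoryProperties:
    action: Action
    target: str  # template name for AUTO_REPLY, queue name for ROUTE_TO_QUEUE


def handle(category: str, properties: dict) -> str:
    """Describe the action configured for the matched category."""
    props = properties[category]
    if props.action is Action.AUTO_REPLY:
        return f"reply using template '{props.target}'"
    return f"route to queue '{props.target}'"


# Hypothetical configuration for the Mortgage Info example.
config = {"Mortgage Info": CategoryProperties(Action.AUTO_REPLY, "mortgage_rates")}
print(handle("Mortgage Info", config))
```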

This document contains confidential and proprietary information.



RME Analysis and Message Processing

The RME uses linguistic data and complex statistical algorithms to accurately analyze

and classify customer messages. Each message entering Rapport is analyzed by the

RME's two primary components: the Natural Language Processing (NLP) engine and

the Rapport statistic engine.

The NLP engine identifies concepts: basic units of linguistic or quantitative data

contained within each message. Linguistic data may be based on semantic, contextual,

and morphological information. Quantitative data may include various indicators

derived from the message, such as the number of sentences in a message.

For example, a message may contain the word "depositing." Rapport's NLP engine uses morphological analysis to derive the base form of this word, "deposit," an identifiable concept used to classify the message.
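
The idea of reducing a word to its base form can be illustrated with a toy suffix stripper. This is a deliberately simplified stand-in, not Rapport's morphological analysis: the rule list and the `base_form` function are assumptions made for this sketch, and real morphological rules handle far more cases.

```python
# Toy suffix rules: (suffix, replacement). These rules are illustrative
# assumptions only and cover just a few regular English endings.
SUFFIX_RULES = [("ing", ""), ("ed", ""), ("s", "")]


def base_form(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    lowered = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: len(lowered) - len(suffix)] + replacement
    return lowered


print(base_form("depositing"))  # deposit
print(base_form("deposited"))   # deposit
print(base_form("deposits"))    # deposit
```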

After a message's concepts are identified by the NLP engine, they are exported to the statistic engine as concept models, the format used for Rapport's statistical analysis. Rapport's statistic engine compares a message's concept models to each category's pre-existing concept models in the Knowledge Base, which were gathered during the Learning and (optional) Training processes described below.

The following example illustrates Rapport's concept-based analysis: A financial

institution receives a message requesting information about mortgages. The RME

analyzes this message and identifies the linguistic concepts it contains. Then the RME

compares these concepts to all concept models associated with categories in the

system, and determines that the Mortgage Info category best matches this message.

Using unique proprietary algorithms and formulas to derive category relevancy, the statistic engine calculates category scores: percentage values reflecting the likelihood that a message belongs to a category. The statistic engine may also use logical expressions to extract and evaluate message parameters that may influence or determine category relevancy. Depending on a broad spectrum of system configuration settings, the message is routed for appropriate automatic or semi-automatic actions.
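
Rapport's scoring algorithms are proprietary and are not described here. As a rough illustration of what a percentage-style category score could look like, the sketch below substitutes a well-known technique, cosine similarity between concept-count vectors. The function names, the sample concept models, and the choice of cosine similarity are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine_percent(message: Counter, category: Counter) -> float:
    """Cosine similarity between two concept-count vectors, as a percentage."""
    dot = sum(count * category[concept] for concept, count in message.items())
    norm = math.sqrt(sum(c * c for c in message.values())) * math.sqrt(
        sum(c * c for c in category.values())
    )
    return 100.0 * dot / norm if norm else 0.0


def category_scores(message: Counter, models: dict) -> dict:
    """Score the message against every category's concept model."""
    return {name: cosine_percent(message, model) for name, model in models.items()}


# Hypothetical concept models for two categories.
models = {
    "Mortgage Info": Counter({"mortgage": 3, "rate": 2, "loan": 1}),
    "Checking Balance": Counter({"balance": 3, "checking": 2, "account": 1}),
}
message = Counter({"mortgage": 1, "rate": 1})
scores = category_scores(message, models)
best = max(scores, key=scores.get)
print(best)
```

A message that shares no concepts with a category scores 0%, and a message whose concept counts are proportional to a category's model scores 100%.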

The simplified diagram below illustrates the RME's message processing flow:

1. An incoming message enters the Rapport system.

2. The NLP engine identifies concepts within the message using linguistic data stored in the Knowledge Base.

3. Concepts are exported as concept models to Rapport's statistic engine.

4. The statistic engine compares the message's concept models with each category's existing concept models to determine category relevancy. Optionally, logical expressions are used to refine or override concept-based message categorization.

5. The message is routed for an automatic or semi-automatic action, based on category properties and other configuration settings.

Simplified diagram illustrating the RME message processing flow
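
The numbered flow above can be sketched as a small pipeline in which each engine is passed in as a function. This is a structural illustration only: `process_message` and the trivial stand-in functions are hypothetical, and a real NLP engine and statistic engine would replace them.

```python
from collections import Counter
from typing import Callable


def process_message(
    text: str,
    identify_concepts: Callable[[str], list],     # step 2: NLP engine
    score_categories: Callable[[Counter], dict],  # step 4: statistic engine
    route: Callable[[str], str],                  # step 5: category properties
) -> str:
    concepts = identify_concepts(text)            # step 2: find concepts
    concept_model = Counter(concepts)             # step 3: export as a concept model
    scores = score_categories(concept_model)      # step 4: category relevancy
    best_category = max(scores, key=scores.get)
    return route(best_category)                   # step 5: choose the action


# Trivial stand-ins so the sketch runs end to end.
result = process_message(
    "mortgage rate question",
    identify_concepts=str.split,
    score_categories=lambda model: {
        "Mortgage Info": model["mortgage"],
        "Checking Balance": model["balance"],
    },
    route=lambda category: f"auto-reply for {category}",
)
print(result)  # auto-reply for Mortgage Info
```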

Rapport's Adaptive Knowledge Base

The knowledge required for accurately classifying each customer message is stored in Rapport's adaptive Knowledge Base. The Rapport Knowledge Base is a repository of linguistic and statistical information used during RME processing. The Knowledge Base includes a framework of user-defined categories, built according to the specific requirements of each organization using Rapport. It is fully adaptive: Learning (discussed below) automatically updates linguistic and statistical information to improve future message classification.

The Rapport Knowledge Base consists of two components: the Linguistic Knowledge Base (LKB) and the Statistic Knowledge Base (SKB). Rapport's LKB contains a glossary of standard English usage, semantically significant words, linguistically identical words, grammar, rules for morphological analysis, and optional domain-specific terminology. Rapport's SKB contains hierarchical or flat decision trees: frameworks of categories. Each category in a decision tree is associated with the concept models and optional logical expressions that enable Rapport to accurately classify messages.


[Figure: Rapport Knowledge Base. LKB (linguistic data): glossary of standard English usage, semantically significant words, linguistically identical words, grammar, rules for morphological analysis, and optional domain-specific terminology. SKB (statistic data): hierarchical or flat decision trees; categories populated with concept models and optional logical expressions.]


Building the Statistic Knowledge Base

Rapport's Knowledge Base Editor application is used to create decision tree structures

stored in the SKB.

Decision trees may have either a flat or hierarchical structure, determined by an organization's message classification requirements. Flat decision trees are category

lists, employed when a hierarchical organization of categories is not warranted.

Hierarchical decision trees are well-suited for organizing categories that break down

logically into successively greater levels of detail.

The following simplified diagram represents a section of a financial institution's hierarchical decision tree. Branches of the hierarchical tree are designated by ovals; sub-branches associated with categories appear in gray. Note that the categories beneath each branch are logically related; for example, Address Change, Telephone Change, and E-mail Change are all related to the Customer Info branch.

Representation of a hierarchical decision tree
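
A hierarchical decision tree of this kind can be modeled as a simple tree of named nodes. The sketch below is an assumption made for illustration: the `Node` class and `category_paths` function are hypothetical, and it simplifies the figure by treating only leaf nodes as categories. Only the Customer Info branch named in the text is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

    def is_category(self) -> bool:
        # In this sketch, leaves are categories and inner nodes are branches.
        return not self.children


def category_paths(node: Node, prefix: tuple = ()) -> list:
    """List every category as a path of names from the root."""
    path = prefix + (node.name,)
    if node.is_category():
        return [path]
    paths = []
    for child in node.children:
        paths.extend(category_paths(child, path))
    return paths


tree = Node("Root", [
    Node("Customer Info", [
        Node("Address Change"),
        Node("Telephone Change"),
        Node("E-mail Change"),
    ]),
])

for path in category_paths(tree):
    print(" > ".join(path))
```

Under the same model, a flat decision tree is simply a root whose children are all categories.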

Note: Using the Knowledge Base Editor, logical expressions may also be associated

with specified categories at this stage to further refine the classification

process.

Once a skeletal decision tree structure (either hierarchical or flat) has been created, the concept models used for message classification are gathered for each category through the Learning or Training processes.

Learning

Learning is an ongoing automatic process, invisible to the user, that gathers concept

models for each category in the SKB over time. Concept models are gathered by

collecting feedback from normal message processing activity, bootstrapping the

system for accurate message classification in the future.


[Figure labels, hierarchical decision tree: Requests, Orders, Customer Info; Statement Copy, Check Copy, Check Order, Travelers Checks, Foreign Currency, Address Change, Telephone Change, E-mail Change.]


For example, when a customer service agent uses the Rapport Message Center

application to compose a reply to a message, the agent may choose from a database of

pre-written responses linked to categories in the Knowledge Base. The act of choosing

a response provides feedback to the system; concept models contained in the message

form the basis of concept models associated with categories linked to the response.

In addition to bootstrapping the system, learning continuously updates and enriches

existing concept models in the SKB during normal Rapport usage. Learning is an

organic process, enabling the Knowledge Base to grow and adapt over time. Concept

models are refined by introducing new information derived from changes that have

occurred in the composition of messages, and from agent activity. As Rapport learns,

it broadens the base of concept models, making the system more precise over time.
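
The feedback loop behind Learning can be sketched as an incremental update: each time an agent picks a response, the concepts of the handled message are folded into the categories linked to that response. The dictionary-of-counters representation of the SKB and the `learn_from_reply` function are assumptions made for this sketch, not Rapport's internal design.

```python
from collections import Counter, defaultdict

# SKB stand-in: category name -> accumulated concept counts (an assumption).
skb = defaultdict(Counter)


def learn_from_reply(message_concepts: list, response_categories: list) -> None:
    """Fold a handled message's concepts into each category linked to the chosen response."""
    for category in response_categories:
        skb[category].update(message_concepts)


# An agent answers two messages with a response linked to Mortgage Info.
learn_from_reply(["mortgage", "rate"], ["Mortgage Info"])
learn_from_reply(["mortgage", "refinance"], ["Mortgage Info"])
print(skb["Mortgage Info"])
```

Because each update only adds to the existing counts, the model for a category broadens as more messages are handled, which is the behavior the text describes.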

Training

The Training process is an optional but recommended method for gathering models for categories in the SKB decision tree. Training is implemented offline, and involves analyzing a corpus of sample messages classified into pre-defined categories. These messages are first processed by Rapport's Lexical Editor to enrich the LKB with user-specific linguistic data. Then each message in the corpus is processed individually by the NLP and statistic engines to populate the SKB with models used to classify incoming messages.

The sections that follow discuss the Training process in greater detail.

Stages of Knowledge Base Development

Rapport Knowledge Base development based on Training is an optional but

recommended process implemented offline, consisting of the following stages:

Creating a Corpus

A corpus of sample messages, pre-classified according to categories, provides

source material for NLP and Statistic Training processes that build the Rapport

Knowledge Base.

The Pre-Training Process

An optional process that enriches the LKB by extracting and identifying

significant words and linguistic information that are unique to the corpus of

messages.

Knowledge Base Building

An optional process consisting of two stages: NLP and Statistic Training. NLP Training generates concept models: units of linguistic information used by the statistic engine to build the SKB. Statistic Training gathers concept models from each message in the corpus and updates each category's models in the SKB decision tree.





Creating a Corpus

A corpus is a collection of sample messages gathered by an organization (prior to

using Rapport) that have been pre-classified according to their subject matter. The

corpus provides source data used during Pre-training, NLP Training, and Statistic

Training (described below).

The corpus may be organized by grouping similar messages in directories or folders

according to category names that represent the messages' content. Alternatively, each

message may have a field or data identifier that indicates its category (or categories).

For the subsequent Pre-Training and Training processes to be most effective, the corpus should only contain messages that are accurately classified and free of extraneous text (unrelated to the message's category). An ideal corpus consists of messages that are classified according to well-defined categories (avoiding redundancies between categories), with textual content that is consistently representative of the category's subject. As many messages as possible with similar message content should be grouped together for each category; more messages per category improve the quality of concept models created during the Statistic Training process (described below).
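
The two corpus layouts described in this section can both be reduced to a list of (category, text) pairs. The sketch below assumes plain-text files with a `.txt` extension for the folder layout and dictionaries with `text` and `categories` keys for the labeled layout; these formats and the function names are hypothetical choices made for illustration.

```python
import tempfile
from pathlib import Path


def load_folder_corpus(root: Path) -> list:
    """Return (category, text) pairs; each subfolder name is a category."""
    samples = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        for message_file in sorted(folder.glob("*.txt")):
            samples.append((folder.name, message_file.read_text(encoding="utf-8")))
    return samples


def load_labeled_corpus(records: list) -> list:
    """Expand records that carry a list of categories into (category, text) pairs."""
    return [(cat, rec["text"]) for rec in records for cat in rec["categories"]]


# Layout 1: one folder per category (built in a temporary directory here).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "Mortgage Info"
    folder.mkdir()
    (folder / "msg1.txt").write_text("What are your mortgage rates?", encoding="utf-8")
    folder_samples = load_folder_corpus(Path(tmp))

# Layout 2: each message names one or more categories.
labeled_samples = load_labeled_corpus([
    {"text": "Please update my address and phone.",
     "categories": ["Address Change", "Telephone Change"]},
])
print(folder_samples)
print(labeled_samples)
```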


[Figure: two corpus layouts. First layout: a corpus with similar messages grouped together in separate folders or directories representing categories (Category 1, Category 2, ..., Category n). Second layout: a corpus where each message is associated with one or more categories (for example, Category 4; Category 2; Category 1,5; Category 3).]



The Pre-Training Process

Pre-Training is an optional process that extracts and identifies significant linguistic information unique to the corpus being analyzed. This data enriches the LKB, improving NLP Training and the RME's ability to accurately classify messages online.

Each business or organization has its own vocabulary of words that are unique and significant. For example, an Internet Service Provider (ISP) may consider the words "Internet Connection" to be significant, while an airline passenger service might decide that these words are insignificant. At the same time, both companies would probably consider the word "connection" to be significant, but they would define "connection" in two entirely different ways. To the ISP, a connection is an Internet hookup; to an airline, it's an air flight. In contrast, an insurance company may define "connection" as insignificant.

Rapport's Lexical Editor application is used to implement the Pre-Training process.

The Lexical Editor analyzes the corpus of messages, filters the text, and generates lists

of simple linguistic units called tokens and token pairs, organized according to

frequency.

A token is a string of characters identified by the Lexical Editor within a body of text.

When the system analyzes the text of a corpus, it searches for delimiter characters

such as spaces and typographical marks (periods, colons, etc.). Any string of

characters found between these delimiters is recorded as a token. Significant tokens,

non-significant tokens, and word associations are identified using the Lexical Editor,

and stored in the LKB.
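
Delimiter-based tokenization and frequency counting can be sketched in a few lines. The delimiter set below and the treatment of a token pair as two adjacent tokens are assumptions made for this example; the Lexical Editor's actual rules are not specified in this paper.

```python
import re
from collections import Counter

# Delimiters: whitespace plus common typographical marks (an assumed set).
DELIMITERS = re.compile(r'[\s.,:;!?()"]+')


def tokenize(text: str) -> list:
    """Split text on delimiters and drop empty strings."""
    return [token for token in DELIMITERS.split(text.lower()) if token]


def token_frequencies(messages: list):
    """Count single tokens and adjacent token pairs across a corpus."""
    singles, pairs = Counter(), Counter()
    for text in messages:
        tokens = tokenize(text)
        singles.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    return singles, pairs


singles, pairs = token_frequencies([
    "My Internet connection is down.",
    "Internet connection: slow again!",
])
print(singles.most_common(2))
print(pairs.most_common(1))
```

Sorting by frequency, as `most_common` does here, surfaces candidates such as the pair ("internet", "connection") that a user could then mark as significant.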

Note: The Pre-Training process is particularly useful for preparing the RME to accurately classify and process messages from international sources,

especially messages including frequent misspellings and non-standard

English usage.


[Figure: Pre-Training. The Lexical Editor analyzes the corpus, filters the word base, generates lists of single tokens and token pairs, calculates token frequency, enables the user to identify significant and non-significant tokens and define word associations, and stores information in the Linguistic Knowledge Base. The LKB receives corpus-specific linguistic data from the Lexical Editor and also contains additional domain knowledge (optional), standard English word lists, grammar, and rules for morphological analysis.]


Knowledge Base Building

Knowledge Base building based on a corpus is implemented in two phases: NLP

Training and Statistic Training.

The NLP Training Phase

During the NLP Training phase, the NLP engine analyzes and processes each message

in the corpus individually in two stages: Pre-Processing and Processing.

During Pre-Processing, the NLP engine analyzes each message's text, identifies the

portion of text to be processed, and generates an intermediate representation of the

concepts contained in the message. In the Processing stage, the NLP engine uses

morphological rules, word associations, and other linguistic techniques to accurately

determine the concepts contained in each message, and the associations between them.

These concepts are exported to the statistic engine for Statistic Training via the Concept Modeler. The Concept Modeler converts the message's concepts into concept models, a format used by the statistic engine to build the Statistic Knowledge Base.


[Figure: NLP Training. The NLP engine receives the corpus and data from the Linguistic Knowledge Base. Pre-Processing analyzes and processes each message individually, identifies the portion of text to be processed, and generates an intermediate representation of concepts. Processing uses morphological rules, word associations, and complex algorithms for generating concepts, and concepts based on other concepts, then exports concepts to the Concept Modeler. The Concept Modeler converts concepts into concept models for the statistic engine, which implements Statistic Training.]


The Statistic Training Phase

Statistic Training is implemented using the Rapport Knowledge Base Editor

application. A skeletal decision tree structure is built based on the same categories

used to classify messages in the corpus. During statistic Training, the statistic engine

receives concept models from each corpus message individually. The statistic engine

builds the SKB by performing operations on these concept models, and creating models for the categories of each message in the SKB decision tree. The result is an SKB populated with models that accurately classify incoming messages during online RME processing.
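
The batch nature of Statistic Training can be illustrated as follows. The paper does not specify which operations the statistic engine performs on concept models, so this sketch substitutes the simplest possible one, summing concept counts per category. The `train_skb` function and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def train_skb(corpus_models: list) -> dict:
    """Build category models from (category, concept_model) pairs.

    Each category model is the sum of the concept counts of its messages,
    a deliberately simple stand-in for the operations the text mentions.
    """
    skb = defaultdict(Counter)
    for category, concept_model in corpus_models:
        skb[category] += concept_model
    return dict(skb)


trained = train_skb([
    ("Mortgage Info", Counter({"mortgage": 1, "rate": 1})),
    ("Mortgage Info", Counter({"mortgage": 1, "loan": 1})),
    ("Address Change", Counter({"address": 1, "move": 1})),
])
print(trained["Mortgage Info"]["mortgage"])  # 2
```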

Note: Statistic Training may also provide feedback (manually) to the NLP

Training process, improving NLP analysis and the determination of

concepts.

Updating the Knowledge Base

Rapport readily adapts to almost any change in your incoming message environment.

In some situations, however, the Learning process may take time. A more immediate

solution is running an accelerated version of the Pre-Training and Training processes.

Repeating the Pre-Training and Training (as required) ensures optimal message

classification.

It is recommended to repeat these processes when:

• Major changes have been made to categories

• Demographic or geographic changes have occurred affecting the origin of your incoming messages (e.g., an organization begins to receive large numbers of messages from a location outside its normal area of operation)

• Adding new categories to the SKB

• Adding or changing products or services

• Responding to special events

[Figure: Statistic Training. The statistic engine receives concept models per individual message. The Knowledge Base Editor populates the decision tree with new concept models based on each message's concept models and updates existing models in the Statistic Knowledge Base. The SKB stores concept models for each category in decision trees.]

Summary of Knowledge Base Development

Linguistic and statistical data stored in the Rapport Knowledge Base is used by the RME to perform accurate message classification, enabling the system to take the most appropriate action for each customer message.

Learning

To gather this data, the system can be bootstrapped by an automatic process called

Learning. Learning is ongoing, invisible to the user, and populates the SKB decision

tree with concept models over time during normal Rapport operation. In addition to

bootstrapping the system, learning continuously updates models in the SKB,

improving message classification.

Training

Alternatively, the Rapport Knowledge Base may be built based on a corpus of sample

messages classified according to categories. During Pre-Training, the Lexical Editor is

used to analyze the corpus, identify significant, corpus-specific linguistic data, and

refine the LKB. NLP Training analyzes each message in the corpus individually, and

exports concepts via the Concept Modeler to the statistic engine. The Knowledge Base

Editor application is used to create a skeletal decision tree structure based on corpus

categories. For each message in the corpus, concept models are gathered for

categories in the decision tree, and are stored in the SKB.

The following simplified diagrams illustrate the chronological development of the

Rapport Knowledge Base using the Training process.



[Figure: Knowledge Base Development (Based on Training). Creating a Corpus: sample messages classified according to message content. Next stage: the corpus is analyzed by the Lexical Editor application, which feeds the Linguistic Knowledge Base. NLP Training Process: the corpus passes through the NLP engine (Pre-Processing and Processing), using the Linguistic Knowledge Base, and concepts are exported via the Concept Modeler. Statistic Training Process: the statistic engine receives concept models from NLP Training and builds the Statistic Knowledge Base.]


Online RME Processing

The linguistic and statistical data gathered through Learning, and optionally Training,

enables the RME to accurately classify customer messages on-the-fly. In a process

similar to NLP Training, message concepts are identified by the NLP engine using

data in the LKB, and are exported to the Concept Modeler. Concept models are

received by the statistic engine and compared to existing models in the SKB's

decision tree, generating category scores. Based on category relevancy, optional

logical expressions and other message parameters, and category configuration

properties, the message is routed for an appropriate automatic or semi-automatic

action. The learning process enables the system to evolve and adapt over time,

constantly improving Rapport's ability to accurately classify messages in the future.


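[Figure: Online RME Message Processing. A customer message passes through the NLP engine (Pre-Processing and Processing), then the Concept Modeler, then the statistic engine, and is routed for an automatic or semi-automatic action. The NLP engine draws on the LKB and the statistic engine on the SKB, which together form the Knowledge Base.]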
