8/14/2019 Older White Paper Example
Knowledge Base Development
and RME Processing
A Rapport Technical White Paper
DRAFT COPY
January 16, 2003
1999 Banter Technology Inc. All rights reserved.
Rapport Version 3.0
1997-1999 Banter Technology Inc. All rights reserved.
The contents of this documentation are strictly confidential and are proprietary to Banter Technology
Inc. No part of this documentation may be reproduced, transmitted, or stored, in any form, in whole or
in part, or by any means for any purpose without the prior written consent of Banter Technology Inc.
The software described in this document is furnished under a license agreement and may be used or
copied only in accordance with the terms stipulated therein.
Banter Technology Inc. reserves the right to modify the information contained in this document
without prior notification.
Rapport is a trademark of Banter Technology Inc.
Microsoft, Outlook, and Windows are registered trademarks of Microsoft Corporation. Other product
and company names mentioned in this document may be trademarks of their respective owners.
Banter Technology Inc.
60 Federal Street
Suite 550
San Francisco, CA
94107
Tel: 1-415-247-2600
Fax: 1-415-247-2626
E-mail: [email protected]
Introduction
The Rapport Knowledge Base is a unique, adaptive repository of linguistic and
statistical information that enables Rapport to accurately manage and classify
high-volume customer e-communications. Rapport is a learning system: the Knowledge
Base continuously evolves to model an organization's current communication
environment.
Working in conjunction with the Rapport Knowledge Base, Rapport's Relationship
Modeling Engine (RME) analyzes customer communications and takes the most
appropriate action on-the-fly, based on various user specifications, including
Rapport's broad spectrum of configuration settings.
This white paper examines Rapport's unique adaptive Knowledge Base, its
development process, and how it enables the RME to accurately process and classify
messages.
Background: Rapport's RME Architecture
In Rapport, each customer message is processed according to user-defined categories.
Categories determine which automatic or semi-automatic action is taken with each
message.
Each category represents the content of a message, or indicates some other attribute of
a message such as its source. For example, a financial institution may define
categories like Checking Balance, Transfer Request, and Mortgage Info; these
categories represent the types of customer communications it commonly receives.
In the Rapport Knowledge Base, categories are associated with linguistic concept
models (discussed below) that are used by the RME for message classification. These
concept models determine the relevance of incoming messages to the categories in the
system.
Optionally, categories in the Knowledge Base may be associated with logical
expressions: formulas or statements used to refine or override the RME's
concept-based message classification. For example, a category may be associated with the
following expression: $R_secured(s) == YES. Rapport analyzes an incoming
message and, using this expression, assigns the associated category 100% relevancy if
the message originated from a secure source.
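The override behavior described above can be sketched in a few lines. This is a minimal illustration, not Rapport's expression language: the function name `apply_expression` and the `secured` message parameter are hypothetical stand-ins for the expression $R_secured(s) == YES.

```python
def apply_expression(concept_score, message_params):
    """Return a category's relevancy, letting a logical expression override
    the concept-based score."""
    # Hypothetical stand-in for the expression $R_secured(s) == YES.
    if message_params.get("secured") == "YES":
        return 100.0
    return concept_score


print(apply_expression(42.0, {"secured": "YES"}))  # 100.0
print(apply_expression(42.0, {"secured": "NO"}))   # 42.0
```

When the expression does not match, the concept-based score passes through unchanged, which is one plausible reading of "refine or override".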
During the Rapport configuration process, each category is associated with properties
that determine which actions are taken for each message. For example, a message
received by a financial institution is matched to the Mortgage Info category. This
category may have properties that instruct Rapport to compose and send an
appropriate automatic reply using standard pre-written text containing mortgage
information. Alternatively, the Mortgage Info category's properties may be set to
route the message to an appropriate queue for manual handling.
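As a rough sketch of how category properties might select an action, the example below maps each category to either an automatic reply or a routing queue. The `CategoryProperties` fields, the action names, and the queue name are assumptions made for illustration; the paper does not describe Rapport's actual property schema.

```python
from dataclasses import dataclass


@dataclass
class CategoryProperties:
    action: str           # "auto_reply" or "route" (hypothetical values)
    reply_text: str = ""  # pre-written text used for automatic replies
    queue: str = ""       # queue used for manual handling


def act_on(category, properties):
    """Describe the action taken for a message matched to `category`."""
    props = properties[category]
    if props.action == "auto_reply":
        return f"send reply: {props.reply_text}"
    return f"route to queue: {props.queue}"


properties = {
    "Mortgage Info": CategoryProperties("auto_reply", reply_text="Standard mortgage information"),
    "Transfer Request": CategoryProperties("route", queue="transfers"),
}
print(act_on("Mortgage Info", properties))
print(act_on("Transfer Request", properties))
```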
This document contains confidential and proprietary information.
RME Analysis and Message Processing
The RME uses linguistic data and complex statistical algorithms to accurately analyze
and classify customer messages. Each message entering Rapport is analyzed by the
RME's two primary components: the Natural Language Processing (NLP) engine and
the Rapport statistic engine.
The NLP engine identifies concepts: basic units of linguistic or quantitative data
contained within each message. Linguistic data may be based on semantic, contextual,
and morphological information. Quantitative data may include various indicators
derived from the message, such as the number of sentences in a message.
For example, a message may contain the word "depositing". Rapport's NLP engine
uses morphological analysis to derive the base form of this word as "deposit", an
identifiable concept used to classify the message.
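The two kinds of concept data mentioned above can be illustrated with a toy example. The suffix list below is a deliberately crude stand-in for real morphological rules, and both function names are hypothetical; Rapport's actual analysis is not described at this level of detail.

```python
import re

SUFFIXES = ("ing", "ed", "s")  # toy rule set, assumed for illustration only


def base_form(word):
    """Strip one known suffix to approximate a word's base form."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def sentence_count(text):
    """A simple quantitative indicator: the number of sentences in a message."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


print(base_form("depositing"))                                  # deposit
print(sentence_count("I am depositing a check. Is that ok?"))   # 2
```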
After a message's concepts are identified by the NLP engine, they are exported to the
statistic engine as concept models, the format used for Rapport's statistical analysis.
Rapport's statistic engine compares a message's concept models to each category's
pre-existing concept models in the Knowledge Base, which were gathered during the
Learning and (optional) Training processes described below.
The following example illustrates Rapport's concept-based analysis: A financial
institution receives a message requesting information about mortgages. The RME
analyzes this message and identifies the linguistic concepts it contains. Then the RME
compares these concepts to all concept models associated with categories in the
system, and determines that the Mortgage Info category best matches this message.
Using unique proprietary algorithms and formulas to derive category relevancy, the
statistic engine calculates category scores: percentage values reflecting the
likelihood that a message belongs to a category. The statistic engine may also use
logical expressions to extract and evaluate message parameters that may influence or
determine category relevancy. Depending on a broad spectrum of system
configuration settings, the message is routed for appropriate automatic or
semi-automatic actions.
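Rapport's scoring algorithms are proprietary and are not given in this paper. Purely to make the idea of a percentage category score concrete, the sketch below substitutes a well-known technique, cosine similarity over concept counts. The category models, concept names, and function names are all invented for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two concept-count mappings."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def category_scores(message_concepts, category_models):
    """Score a message against every category, as a percentage."""
    message = Counter(message_concepts)
    return {name: round(100 * cosine(message, Counter(model)), 1)
            for name, model in category_models.items()}


# Hypothetical category models: concept -> weight.
models = {
    "Mortgage Info": {"mortgage": 5, "rate": 3, "loan": 2},
    "Checking Balance": {"balance": 4, "checking": 4, "account": 2},
}
scores = category_scores(["mortgage", "rate", "information"], models)
best = max(scores, key=scores.get)
print(best)  # Mortgage Info
```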
The simplified diagram below illustrates the RME's message processing flow:
1. An incoming message enters the Rapport system.
2. The NLP engine identifies concepts within the message using linguistic data
stored in the Knowledge Base.
3. Concepts are exported as concept models to Rapport's statistic engine.
4. The statistic engine compares the message's concept models with each
category's existing concept models to determine category relevancy.
Optionally, logical expressions are used to refine or override concept-based
message categorization.
5. The message is routed for an automatic or semi-automatic action, based on
category properties and other configuration settings.
Simplified diagram illustrating the RME message processing flow
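The five steps above can be read as a simple pipeline. The sketch below wires the stages together with placeholder functions; every stage implementation here (whitespace splitting, keyword overlap scoring, the action names) is a hypothetical stand-in chosen only to make the flow runnable.

```python
def process_message(text, identify_concepts, to_concept_model, score, expressions, properties):
    """Run one message through the five steps and return (category, action)."""
    concepts = identify_concepts(text)            # step 2: NLP engine
    model = to_concept_model(concepts)            # step 3: export as a concept model
    scores = score(model)                         # step 4: compare with category models
    for category, expression in expressions.items():
        if expression(text):                      # optional override by logical expression
            scores[category] = 100.0
    best = max(scores, key=scores.get)
    return best, properties[best]                 # step 5: action from category properties


# Toy stand-ins for each stage (all hypothetical).
KEYWORDS = {"Mortgage Info": {"mortgage", "rate"}, "Address Change": {"address", "change"}}


def keyword_score(model):
    return {c: 100.0 * len(model & kw) / len(kw) for c, kw in KEYWORDS.items()}


result = process_message(
    "What is your mortgage rate",
    identify_concepts=lambda t: t.lower().split(),
    to_concept_model=set,
    score=keyword_score,
    expressions={},
    properties={"Mortgage Info": "auto_reply", "Address Change": "route"},
)
print(result)  # ('Mortgage Info', 'auto_reply')
```

Passing the stages in as functions keeps the flow itself separate from any particular linguistic or statistical method.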
Rapport's Adaptive Knowledge Base
The knowledge required for accurately classifying each customer message is stored in
Rapport's adaptive Knowledge Base. The Rapport Knowledge Base is a repository of
linguistic and statistical information used during RME processing. The Knowledge
Base includes a framework of user-defined categories, built according to the specific
requirements of each organization using Rapport. It is fully adaptive: Learning
(discussed below) automatically updates linguistic and statistical information to
improve future message classification.
The Rapport Knowledge Base consists of two components: the Linguistic Knowledge
Base (LKB) and Statistic Knowledge Base (SKB). Rapport's LKB contains a glossary
of standard English usage, semantically significant words, linguistically identical
words, grammar, rules for morphological analysis, and optional domain-specific
terminology. Rapport's SKB contains hierarchical or flat decision trees: frameworks
of categories. Each category in a decision tree is associated with the concept models
and optional logical expressions that enable Rapport to accurately classify messages.
[Figure: Rapport Knowledge Base. LKB (linguistic data): glossary of standard English usage, semantically significant words, linguistically identical words, grammar, rules for morphological analysis, and optional domain-specific terminology. SKB (statistic data): hierarchical or flat decision trees, with categories populated with concept models and optional logical expressions.]
Building the Statistic Knowledge Base
Rapport's Knowledge Base Editor application is used to create decision tree structures
stored in the SKB.
Decision trees may have either a flat or hierarchical structure, determined by an
organization's message classification requirements. Flat decision trees are category
lists, employed when a hierarchical organization of categories is not warranted.
Hierarchical decision trees are well-suited for organizing categories that break down
logically into successively greater levels of detail.
The following simplified diagram represents a section of a financial institution's
hierarchical decision tree. Branches of the hierarchical tree are designated by ovals;
sub-branches associated with categories appear in gray. Note that the categories
beneath each branch are logically related; for example, Address Change, Telephone
Change, and E-mail Change are all related to the Customer Info branch.
Representation of a hierarchical decision tree
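One simple way to picture the two structures is a nested mapping for a hierarchical tree and a plain list for a flat one. The representation below is an assumption for illustration, not the SKB's storage format; only the Customer Info branch and its three categories come from the text above.

```python
# Hierarchical tree: branch name -> sub-branches (dict) or categories (list).
tree = {
    "Customer Info": ["Address Change", "Telephone Change", "E-mail Change"],
}

# Flat tree: just a list of categories.
flat = ["Checking Balance", "Transfer Request", "Mortgage Info"]


def categories(node):
    """Yield every category from a flat list or a nested branch mapping."""
    if isinstance(node, dict):
        for child in node.values():
            yield from categories(child)
    else:
        yield from node


print(list(categories(tree)))
print(list(categories(flat)))
```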
Note: Using the Knowledge Base Editor, logical expressions may also be associated
with specified categories at this stage to further refine the classification
process.
Once a skeletal decision tree structure (either hierarchical or flat) has been created,
concept models for each category (used for message classification) are gathered
through the Learning or Training processes.
Learning
Learning is an ongoing automatic process, invisible to the user, that gathers concept
models for each category in the SKB over time. Concept models are gathered by
collecting feedback from normal message processing activity, bootstrapping the
system for accurate message classification in the future.
[Figure labels, hierarchical decision tree: Requests, Orders, Customer Info, Statement Copy, Check Copy, Check Order, Travelers Checks, Foreign Currency, Address Change, Telephone Change, E-mail Change.]
For example, when a customer service agent uses the Rapport Message Center
application to compose a reply to a message, the agent may choose from a database of
pre-written responses linked to categories in the Knowledge Base. The act of choosing
a response provides feedback to the system; concept models contained in the message
form the basis of concept models associated with categories linked to the response.
In addition to bootstrapping the system, Learning continuously updates and enriches
existing concept models in the SKB during normal Rapport usage. Learning is an
organic process, enabling the Knowledge Base to grow and adapt over time. Concept
models are refined by introducing new information derived from changes that have
occurred in the composition of messages, and from agent activity. As Rapport learns,
it broadens the base of concept models, making the system more precise over time.
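The feedback loop described in this section can be sketched as a running update of per-category concept counts. The `skb` layout (category name mapped to a counter of concepts) and the `learn` function are hypothetical; the paper does not specify how Rapport stores or updates its models.

```python
from collections import Counter, defaultdict

# Hypothetical SKB layout: category -> accumulated concept counts.
skb = defaultdict(Counter)


def learn(skb, message_concepts, chosen_categories):
    """Fold a message's concepts into every category linked to the agent's chosen response."""
    for category in chosen_categories:
        skb[category].update(message_concepts)


# Two agent replies linked to the same category enrich its model over time.
learn(skb, ["mortgage", "rate"], ["Mortgage Info"])
learn(skb, ["mortgage", "refinance"], ["Mortgage Info"])
print(skb["Mortgage Info"]["mortgage"])  # 2
```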
Training
The Training process is an optional but recommended method for gathering models
for categories in the SKB decision tree. Training is implemented offline, and involves
analyzing a corpus of sample messages classified into pre-defined categories. These
messages are first processed by Rapport's Lexical Editor to enrich the LKB with
user-specific linguistic data. Then each message in the corpus is processed individually by
the NLP and Statistic engines to populate the SKB with models used to classify
incoming messages.
The sections that follow discuss the Training process in greater detail.
Stages of Knowledge Base Development
Rapport Knowledge Base development based on Training is an optional, but
recommended process implemented offline, consisting of the following stages:
Creating a Corpus
A corpus of sample messages, pre-classified according to categories, provides
source material for NLP and Statistic Training processes that build the Rapport
Knowledge Base.
The Pre-Training Process
An optional process that enriches the LKB by extracting and identifying
significant words and linguistic information that are unique to the corpus of
messages.
Knowledge Base Building
An optional process consisting of two stages: NLP and Statistic Training. NLP
Training generates concept models: units of linguistic information used by the
statistic engine to build the SKB. Statistic Training gathers concept models from
each message in the corpus and updates each category's models in the SKB
decision tree.
Creating a Corpus
A corpus is a collection of sample messages gathered by an organization (prior to
using Rapport) that have been pre-classified according to their subject matter. The
corpus provides source data used during Pre-training, NLP Training, and Statistic
Training (described below).
The corpus may be organized by grouping similar messages in directories or folders
according to category names that represent the messages' content. Alternatively, each
message may have a field or data identifier that indicates its category (or categories).
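Both corpus layouts can be read into the same shape: pairs of message text and category list. The sketch below assumes one `.txt` file per message for the folder layout and a `categories` field for the labeled layout; these file and field names are illustrative choices, not a format defined by Rapport.

```python
from pathlib import Path


def load_from_folders(root):
    """Yield (text, [category]) pairs; each subfolder name is treated as a category."""
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            for path in sorted(folder.glob("*.txt")):
                yield path.read_text(encoding="utf-8"), [folder.name]


def load_from_records(records):
    """Yield (text, categories) pairs from records carrying a 'categories' field."""
    for record in records:
        yield record["text"], list(record["categories"])


labeled = [{"text": "Please update my address.", "categories": ["Category 1", "Category 5"]}]
print(list(load_from_records(labeled)))
```

Normalizing both layouts to one shape means later stages do not need to know how the corpus was organized.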
For the subsequent Pre-Training and Training processes to be most effective, the
corpus should only contain messages that are accurately classified and free of
extraneous text (unrelated to the message's category). An ideal corpus consists of
messages that are classified according to well-defined categories (avoiding
redundancies between categories), with textual content that is consistently
representative of the category's subject. As many messages as possible with similar
message content should be grouped together for each category: more messages per
category improve the quality of concept models created during the statistic Training
process (described below).
[Figure: two ways to organize a corpus. (1) A corpus with similar messages grouped together in separate folders or directories representing categories (Category 1, Category 2, Category 4, ... Category n). (2) A corpus where each message is associated with one or more categories (for example, one message labeled Category 1,5).]
The Pre-Training Process
Pre-Training is an optional process that extracts and identifies significant linguistic
information unique to the corpus being analyzed. This data enriches the LKB,
improving NLP Training and the RME's ability to accurately classify messages
online.
Each business or organization has its own vocabulary of words that are unique and
significant. For example, an Internet Service Provider (ISP) may consider the words
"Internet Connection" to be significant, while an airline passenger service might
decide that these words are insignificant. At the same time, both companies would
probably consider the word "connection" to be significant, but they would define
"connection" in two entirely different ways. To the ISP, a connection is an Internet
hookup; to an airline, it's an air flight. In contrast, an insurance company may define
"connection" as insignificant.
Rapport's Lexical Editor application is used to implement the Pre-Training process.
The Lexical Editor analyzes the corpus of messages, filters the text, and generates lists
of simple linguistic units called tokens and token pairs, organized according to
frequency.
A token is a string of characters identified by the Lexical Editor within a body of text.
When the system analyzes the text of a corpus, it searches for delimiter characters
such as spaces and typographical marks (periods, colons, etc.). Any string of
characters found between these delimiters is recorded as a token. Significant tokens,
non-significant tokens, and word associations are identified using the Lexical Editor,
and stored in the LKB.
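The token and token-pair lists described above can be approximated as follows. The delimiter set and the lowercasing step are assumptions made for this sketch; the Lexical Editor's actual filtering rules are not documented here.

```python
import re
from collections import Counter

DELIMITERS = r"[\s.,:;!?]+"  # spaces and typographical marks (assumed set)


def tokens(text):
    """Split text on delimiters; lowercasing is an added assumption."""
    return [t for t in re.split(DELIMITERS, text.lower()) if t]


def frequency_lists(corpus):
    """Return single tokens and adjacent token pairs, most frequent first."""
    singles, pairs = Counter(), Counter()
    for message in corpus:
        toks = tokens(message)
        singles.update(toks)
        pairs.update(zip(toks, toks[1:]))
    return singles.most_common(), pairs.most_common()


corpus = ["My Internet connection is down.", "Internet connection: slow again"]
singles, pairs = frequency_lists(corpus)
print(singles[0])  # ('internet', 2)
print(pairs[0])    # (('internet', 'connection'), 2)
```

A user could then review the top of each list to mark tokens such as "connection" as significant or non-significant for their domain.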
Note: The Pre-Training process is particularly useful for preparing the RME to
accurately classify and process messages from international sources,
especially messages including frequent misspellings and non-standard
English usage.
[Figure: Pre-Training components. Corpus. Lexical Editor: analyzes the corpus; filters the word base; generates lists of single tokens and token pairs; calculates token frequency; enables the user to identify significant and non-significant tokens, and define word associations; stores information in the Linguistic Knowledge Base. LKB (Linguistic Knowledge Base): receives corpus-specific linguistic data from the Lexical Editor; also contains additional domain knowledge (optional), standard English word lists, grammar, and rules for morphological analysis.]
Knowledge Base Building
Knowledge Base building based on a corpus is implemented in two phases: NLP
Training and Statistic Training.
The NLP Training Phase
During the NLP Training phase, the NLP engine analyzes and processes each message
in the corpus individually in two stages: Pre-Processing and Processing.
During Pre-Processing, the NLP engine analyzes each message text, identifies the
portion of text to be processed, and generates an intermediate representation of the
concepts contained in the message. In the Processing stage, the NLP engine uses
morphological rules, word associations, and other linguistic techniques to accurately
determine the concepts contained in each message, and the associations between them.
These concepts are exported to the statistic engine for statistic Training via the
Concept Modeler. The Concept Modeler converts the message's concepts into
concept models, a format used by the statistic engine to build the Statistic
Knowledge Base.
[Figure: NLP Training components. Corpus and LKB (Linguistic Knowledge Base) feed the NLP engine. Pre-Processing: analyzes and processes each message individually; identifies the portion of text to be processed; receives data from the Linguistic Knowledge Base; generates an intermediate representation of concepts. Processing: uses morphological rules, word associations, and complex algorithms for generating concepts, and concepts based on other concepts; exports concepts to the Concept Modeler. Concept Modeler: converts concepts into concept models. Statistic engine: implements Statistic Training.]
The Statistic Training Phase
Statistic Training is implemented using the Rapport Knowledge Base Editor
application. A skeletal decision tree structure is built based on the same categories
used to classify messages in the corpus. During statistic Training, the statistic engine
receives concept models from each corpus message individually. The statistic engine
builds the SKB by performing operations on these concept models, and creating
models for the categories of each message in the SKB decision tree. The result is an
SKB populated with models that accurately classify incoming messages during
online RME processing.
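The paper does not say which operations the statistic engine performs on concept models. As one illustrative possibility only, the sketch below builds a per-category model of relative concept frequencies from a pre-classified corpus. The `train` function and its input shape are hypothetical.

```python
from collections import Counter


def train(corpus):
    """Build category -> {concept: relative frequency} from (concepts, categories) pairs."""
    counts = {}
    for concepts, message_categories in corpus:
        for category in message_categories:
            counts.setdefault(category, Counter()).update(concepts)
    return {
        category: {concept: n / sum(counter.values()) for concept, n in counter.items()}
        for category, counter in counts.items()
    }


corpus = [
    (["mortgage", "rate"], ["Mortgage Info"]),
    (["mortgage", "loan"], ["Mortgage Info"]),
]
skb_models = train(corpus)
print(skb_models["Mortgage Info"]["mortgage"])  # 0.5
```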
Note: Statistic Training may also provide feedback (manually) to the NLP
Training process, improving NLP analysis and the determination of
concepts.
Updating the Knowledge Base
Rapport readily adapts to almost any change in your incoming message environment.
In some situations, however, the Learning process may take time. A more immediate
solution is running an accelerated version of the Pre-Training and Training processes.
Repeating the Pre-Training and Training (as required) ensures optimal message
classification.
It is recommended to repeat these processes when:
Major changes have been made to categories
Demographic or geographic changes have occurred affecting the origin of your
incoming messages (e.g., an organization begins to receive large numbers of
messages from a location outside its normal area of operation)
Adding new categories to the SKB
Adding or changing products or services
Responding to special events
[Figure: Statistic Training components. Concept models (per individual message) are passed to the statistic engine. Knowledge Base Editor: populates the decision tree with new concept models based on each message's concept models; updates existing models in the Statistic Knowledge Base. SKB (Statistic Knowledge Base): stores concept models for each category in decision trees.]
Summary of Knowledge Base Development
Linguistic and statistical data stored in the Rapport Knowledge Base is used by the
RME to perform accurate message classification, enabling the system to take the most
appropriate action for each customer message.
Learning
To gather this data, the system can be bootstrapped by an automatic process called
Learning. Learning is ongoing, invisible to the user, and populates the SKB decision
tree with concept models over time during normal Rapport operation. In addition to
bootstrapping the system, Learning continuously updates models in the SKB,
improving message classification.
Training
Alternatively, the Rapport Knowledge Base may be built based on a corpus of sample
messages classified according to categories. During Pre-Training, the Lexical Editor is
used to analyze the corpus, identify significant, corpus-specific linguistic data, and
refine the LKB. NLP Training analyzes each message in the corpus individually, and
exports concepts via the Concept Modeler to the statistic engine. The Knowledge Base
Editor application is used to create a skeletal decision tree structure based on corpus
categories. For each message in the corpus, concept models are gathered for
categories in the decision tree, and are stored in the SKB.
The following simplified diagrams illustrate the chronological development of the
Rapport Knowledge Base using the Training process.
[Figure: Knowledge Base Development (Based on Training). Creating a Corpus: sample messages classified according to message content. The Pre-Training Process: corpus, Lexical Editor application, Linguistic Knowledge Base. NLP Training Process: corpus, Linguistic Knowledge Base, NLP engine (Pre-Processing and Processing), Concept Modeler, concept export. Statistic Training Process: concept models from NLP Training, statistic engine, Statistic Knowledge Base.]
Online RME Processing
The linguistic and statistical data gathered through Learning, and optionally Training,
enables the RME to accurately classify customer messages on-the-fly. In a process
similar to NLP Training, message concepts are identified by the NLP engine using
data in the LKB, and are exported to the Concept Modeler. Concept models are
received by the statistic engine and compared to existing models in the SKB's
decision tree, generating category scores. Based on category relevancy, optional
logical expressions and other message parameters, and category configuration
properties, the message is routed for an appropriate automatic or semi-automatic
action. The Learning process enables the system to evolve and adapt over time,
constantly improving Rapport's ability to accurately classify messages in the future.
[Figure: Online RME Message Processing. Customer message; NLP engine (Pre-Processing, Processing); Concept Modeler; statistic engine; Knowledge Base (LKB, SKB); message routed for automatic or semi-automatic action.]