Older White Paper Example


    Knowledge Base Development and RME Processing

    A Rapport Technical White Paper

    DRAFT COPY

    January 16, 2003

    © 1999 Banter Technology Inc. All rights reserved.


    Rapport Version 3.0

    © 1997-1999 Banter Technology Inc. All rights reserved.

    The contents of this documentation are strictly confidential and are proprietary to Banter Technology

    Inc. No part of this documentation may be reproduced, transmitted, or stored, in any form, in whole or

    in part, or by any means for any purpose without the prior written consent of Banter Technology Inc.

    The software described in this document is furnished under a license agreement and may be used or copied only in accordance with the terms stipulated therein.

    Banter Technology Inc. reserves the right to modify the information contained in this document

    without prior notification.

    Rapport is a trademark of Banter Technology Inc.

    Microsoft, Outlook, and Windows are registered trademarks of Microsoft Corporation. Other product

    and company names mentioned in this document may be trademarks of their respective owners.

    Banter Technology Inc.

    60 Federal Street

    Suite 550

    San Francisco, CA

    94107

    Tel: 1-415-247-2600

    Fax: 1-415-247-2626

    E-mail: [email protected]


    Introduction

    The Rapport Knowledge Base is a unique, adaptive repository of linguistic and
    statistical information that enables Rapport to accurately manage and classify
    high-volume customer e-communications. Rapport is a learning system: the Knowledge
    Base continuously evolves to model an organization's current communication
    environment.

    Working in conjunction with the Rapport Knowledge Base, Rapport's Relationship
    Modeling Engine (RME) analyzes customer communications and takes the most
    appropriate action on the fly, based on various user specifications, including
    Rapport's broad spectrum of configuration settings.

    This white paper examines Rapport's unique adaptive Knowledge Base, its
    development process, and how it enables the RME to accurately process and classify
    messages.

    Background: Rapport's RME Architecture

    In Rapport, each customer message is processed according to user-defined categories.

    Categories determine which automatic or semi-automatic action is taken with each

    message.

    Each category represents the content of a message, or indicates some other attribute of
    a message such as its source. For example, a financial institution may define
    categories like Checking Balance, Transfer Request, and Mortgage Info; these
    categories represent the types of customer communications it commonly receives.

    In the Rapport Knowledge Base, categories are associated with linguistic concept

    models (discussed below) that are used by the RME for message classification. These

    concept models determine the relevance of incoming messages to the categories in the

    system.

    Optionally, categories in the Knowledge Base may be associated with logical
    expressions: formulas or statements used to refine or override the RME's
    concept-based message classification. For example, a category may be associated with the
    following expression: $R_secured(s) == YES. Rapport analyzes an incoming
    message and, using this expression, assigns the associated category 100% relevancy if
    the message originated from a secure source.
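The override behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message is modeled as a plain dictionary, and the names `secured_expression` and `category_relevancy` are hypothetical stand-ins, not Rapport's expression syntax or API.

```python
# Hypothetical sketch: a logical expression overriding a concept-based score.
# The "secured" field and both function names are assumptions for illustration.

def secured_expression(message: dict) -> bool:
    """Stand-in for the expression $R_secured(s) == YES."""
    return message.get("secured") == "YES"


def category_relevancy(message: dict, concept_score: float, expression=None) -> float:
    """Return relevancy in percent; an expression that holds forces 100%."""
    if expression is not None and expression(message):
        return 100.0
    return concept_score


# A message from a secure source is assigned full relevancy;
# any other message keeps its concept-based score.
score_secure = category_relevancy({"secured": "YES"}, 42.0, secured_expression)
score_plain = category_relevancy({"secured": "NO"}, 42.0, secured_expression)
```

Here the expression only overrides the score; an expression that refines the score instead (for example, by capping or boosting it) would follow the same shape.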

    During the Rapport configuration process, each category is associated with properties

    that determine which actions are taken for each message. For example, a message

    received by a financial institution is matched to the Mortgage Info category. This

    category may have properties that instruct Rapport to compose and send an

    appropriate automatic reply using standard pre-written text containing mortgage

    information. Alternatively, the Mortgage Info category's properties may be set to

    route the message to an appropriate queue for manual handling.
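As a rough illustration of how category properties could drive the action taken, the sketch below maps each category to an action and a target. The property names, action values, template names, and queue names are all hypothetical; the paper does not describe how Rapport stores these settings.

```python
from dataclasses import dataclass


@dataclass
class CategoryProperties:
    action: str  # "auto_reply" or "route" (hypothetical values)
    target: str  # reply template name or queue name (hypothetical)


# Assumed configuration for two of the example categories.
PROPERTIES = {
    "Mortgage Info": CategoryProperties("auto_reply", "mortgage_info_template"),
    "Transfer Request": CategoryProperties("route", "transfers_queue"),
}


def choose_action(category: str) -> str:
    """Look up the configured action; unknown categories go to a default queue."""
    props = PROPERTIES.get(category)
    if props is None:
        return "route:default_queue"
    return f"{props.action}:{props.target}"
```

Changing the Mortgage Info entry to `CategoryProperties("route", ...)` corresponds to the alternative described above, where the message goes to a queue for manual handling.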

    This document contains confidential and proprietary information.


    RME Analysis and Message Processing

    The RME uses linguistic data and complex statistical algorithms to accurately analyze

    and classify customer messages. Each message entering Rapport is analyzed by the

    RME's two primary components: the Natural Language Processing (NLP) engine and

    the Rapport statistic engine.

    The NLP engine identifies concepts: basic units of linguistic or quantitative data

    contained within each message. Linguistic data may be based on semantic, contextual,

    and morphological information. Quantitative data may include various indicators

    derived from the message, such as the number of sentences in a message.

    For example, a message may contain the word "depositing." Rapport's NLP engine
    uses morphological analysis to derive the base form of this word as "deposit," an
    identifiable concept used to classify the message.
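To make the idea of deriving a base form concrete, here is a deliberately naive suffix-stripping sketch. It is not Rapport's morphological analysis, which relies on rules stored in the Knowledge Base; the suffix list and the minimum stem length are assumptions chosen only so that the example word works.

```python
# Naive illustration of base-form derivation by suffix stripping.
# SUFFIXES and the minimum stem length of 3 are illustrative assumptions.
SUFFIXES = ("ing", "ed", "s")


def base_form(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


concept = base_form("depositing")  # "deposit"
```

A real morphological analyzer must also handle irregular forms and spelling changes (for example, "running" or "paid"), which simple suffix stripping gets wrong.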

    After a message's concepts are identified by the NLP engine, they are exported to the
    statistic engine as concept models, the format used for Rapport's statistical analysis.
    Rapport's statistic engine compares a message's concept models to each category's
    pre-existing concept models in the Knowledge Base, which were gathered during the
    Learning and (optional) Training processes described below.

    The following example illustrates Rapport's concept-based analysis: A financial

    institution receives a message requesting information about mortgages. The RME

    analyzes this message and identifies the linguistic concepts it contains. Then the RME

    compares these concepts to all concept models associated with categories in the

    system, and determines that the Mortgage Info category best matches this message.

    Using unique proprietary algorithms and formulas to derive category relevancy, the
    statistic engine calculates category scores: percentage values reflecting the
    likelihood that a message belongs to a category. The statistic engine may also use
    logical expressions to extract and evaluate message parameters that may influence or
    determine category relevancy. Depending on a broad spectrum of system
    configuration settings, the message is routed for appropriate automatic or
    semi-automatic actions.

    The simplified diagram below illustrates the RME's message processing flow:

    1. An incoming message enters the Rapport system.
    2. The NLP engine identifies concepts within the message using linguistic data
       stored in the Knowledge Base.
    3. Concepts are exported as concept models to Rapport's statistic engine.
    4. The statistic engine compares the message's concept models with each
       category's existing concept models to determine category relevancy.
       Optionally, logical expressions are used to refine or override concept-based
       message categorization.
    5. The message is routed for an automatic or semi-automatic action, based on
       category properties and other configuration settings.

    Simplified diagram illustrating the RME message processing flow
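The processing flow above can be approximated with a toy pipeline. Rapport's scoring algorithms are proprietary and are not described in this paper, so the sketch substitutes a simple overlap percentage; the function names, the word-level "concepts," and the sample category models are all assumptions made for illustration.

```python
from collections import Counter


def identify_concepts(text):
    """Stand-in for the NLP engine: lowercase words stripped of punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


def to_concept_model(concepts):
    """Stand-in for the concept model format: concept occurrence counts."""
    return Counter(concepts)


def score(message_model, category_model):
    """Percent of message concept occurrences that the category model knows."""
    total = sum(message_model.values())
    if total == 0:
        return 0.0
    shared = sum(n for c, n in message_model.items() if c in category_model)
    return 100.0 * shared / total


def classify(text, category_models):
    """Score the message against every category and return the best match."""
    model = to_concept_model(identify_concepts(text))
    scores = {name: score(model, cm) for name, cm in category_models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical category models, as if gathered by Learning or Training.
category_models = {
    "Mortgage Info": Counter({"mortgage": 3, "rate": 2}),
    "Checking Balance": Counter({"balance": 4, "checking": 2}),
}
best, scores = classify("What is the mortgage rate?", category_models)
```

In this toy run, two of the five message words are known to the Mortgage Info model, so that category scores 40% and is selected. Routing (step 5) would then consult the winning category's properties.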

    Rapport's Adaptive Knowledge Base

    The knowledge required for accurately classifying each customer message is stored in
    Rapport's adaptive Knowledge Base. The Rapport Knowledge Base is a repository of
    linguistic and statistical information used during RME processing. The Knowledge
    Base includes a framework of user-defined categories, built according to the specific
    requirements of each organization using Rapport. It is fully adaptive: Learning
    (discussed below) automatically updates linguistic and statistical information to
    improve future message classification.

    The Rapport Knowledge Base consists of two components: the Linguistic Knowledge
    Base (LKB) and the Statistic Knowledge Base (SKB). Rapport's LKB contains a glossary
    of standard English usage, semantically significant words, linguistically identical
    words, grammar, rules for morphological analysis, and optional domain-specific
    terminology. Rapport's SKB contains hierarchical or flat decision trees: frameworks
    of categories. Each category in a decision tree is associated with the concept models
    and optional logical expressions that enable Rapport to accurately classify messages.

    Rapport Knowledge Base (diagram)

    LKB, linguistic data: glossary of standard English usage; semantically significant words; linguistically identical words; grammar; rules for morphological analysis; optional domain-specific terminology.

    SKB, statistic data: hierarchical or flat decision trees; categories populated with concept models and optional logical expressions.

    Building the Statistic Knowledge Base

    Rapport's Knowledge Base Editor application is used to create decision tree structures

    stored in the SKB.

    Decision trees may have either a flat or hierarchical structure, determined by an organization's message classification requirements. Flat decision trees are category

    lists, employed when a hierarchical organization of categories is not warranted.

    Hierarchical decision trees are well-suited for organizing categories that break down

    logically into successively greater levels of detail.

    The following simplified diagram represents a section of a financial institution's
    hierarchical decision tree. Branches of the hierarchical tree are designated by ovals;
    sub-branches associated with categories appear in gray. Note that the categories
    beneath each branch are logically related; for example, Address Change, Telephone
    Change, and E-mail Change are all related to the Customer Info branch.

    Representation of a hierarchical decision tree
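One simple way to picture the difference between hierarchical and flat decision trees is as nested versus single-level mappings. The sketch below is an assumption about representation only; it does not reflect how the Knowledge Base Editor stores trees. The Customer Info branch and its three categories come from the example above, while the flat list reuses the category names from the Background section.

```python
# Hierarchical tree: a branch maps to its categories (empty dict = category).
hierarchical_tree = {
    "Customer Info": {
        "Address Change": {},
        "Telephone Change": {},
        "E-mail Change": {},
    },
}

# Flat tree: a plain category list with no branches.
flat_tree = {
    "Checking Balance": {},
    "Transfer Request": {},
    "Mortgage Info": {},
}


def category_paths(node, prefix=()):
    """Return the path from the root to every category in the tree."""
    paths = []
    for name, children in node.items():
        path = prefix + (name,)
        if children:
            paths.extend(category_paths(children, path))
        else:
            paths.append(path)
    return paths


paths = category_paths(hierarchical_tree)
```

For the hierarchical tree each path has two elements (branch, category); for the flat tree each path is a single category name.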

    Note: Using the Knowledge Base Editor, logical expressions may also be associated

    with specified categories at this stage to further refine the classification

    process.

    Once a skeletal decision tree structure (either hierarchical or flat) has been created,
    the concept models used for message classification are gathered for
    each category through the Learning or Training processes.

    Learning

    Learning is an ongoing automatic process, invisible to the user, that gathers concept

    models for each category in the SKB over time. Concept models are gathered by

    collecting feedback from normal message processing activity, bootstrapping the

    system for accurate message classification in the future.

    Decision tree diagram labels: the branches Orders, Requests, and Customer Info, and the categories Statement Copy, Check Copy, Check Order, Travelers Checks, Foreign Currency, Address Change, Telephone Change, and E-mail Change.

    For example, when a customer service agent uses the Rapport Message Center

    application to compose a reply to a message, the agent may choose from a database of

    pre-written responses linked to categories in the Knowledge Base. The act of choosing

    a response provides feedback to the system; concept models contained in the message

    form the basis of concept models associated with categories linked to the response.

    In addition to bootstrapping the system, learning continuously updates and enriches

    existing concept models in the SKB during normal Rapport usage. Learning is an

    organic process, enabling the Knowledge Base to grow and adapt over time. Concept

    models are refined by introducing new information derived from changes that have

    occurred in the composition of messages, and from agent activity. As Rapport learns,

    it broadens the base of concept models, making the system more precise over time.
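The feedback loop described in this section can be sketched as an incremental update: each time an agent's choice links a message to a category, the message's concepts are folded into that category's model. Representing a concept model as a table of counts is an assumption made for illustration, as are the names `skb` and `learn`.

```python
from collections import Counter, defaultdict

# Hypothetical SKB stand-in: one concept-count table per category.
skb = defaultdict(Counter)


def learn(skb, message_concepts, chosen_categories):
    """Fold a message's concepts into every category the agent's reply implies."""
    for category in chosen_categories:
        skb[category].update(message_concepts)


# Two agent replies linked to the Mortgage Info category.
learn(skb, ["mortgage", "rate"], ["Mortgage Info"])
learn(skb, ["mortgage", "refinance"], ["Mortgage Info"])
```

After these two updates the Mortgage Info model has seen "mortgage" twice and has picked up "refinance" as a new concept, which mirrors how Learning both reinforces and broadens existing models.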

    Training

    The Training process is an optional but recommended method for gathering models
    for categories in the SKB decision tree. Training is implemented offline, and involves
    analyzing a corpus of sample messages classified into pre-defined categories. These
    messages are first processed by Rapport's Lexical Editor to enrich the LKB with
    user-specific linguistic data. Then each message in the corpus is processed individually by
    the NLP and Statistic engines to populate the SKB with models used to classify
    incoming messages.

    The sections that follow discuss the Training process in greater detail.

    Stages of Knowledge Base Development

    Rapport Knowledge Base development based on Training is an optional, but

    recommended process implemented offline, consisting of the following stages:

    Creating a Corpus

    A corpus of sample messages, pre-classified according to categories, provides

    source material for NLP and Statistic Training processes that build the Rapport

    Knowledge Base.

    The Pre-Training Process

    An optional process that enriches the LKB by extracting and identifying

    significant words and linguistic information that are unique to the corpus of

    messages.

    Knowledge Base Building

    An optional process consisting of two stages: NLP and Statistic Training. NLP
    Training generates concept models: units of linguistic information used by the
    statistic engine to build the SKB. Statistic Training gathers concept models from
    each message in the corpus and updates each category's models in the SKB
    decision tree.


    Creating a Corpus

    A corpus is a collection of sample messages gathered by an organization (prior to

    using Rapport) that have been pre-classified according to their subject matter. The

    corpus provides source data used during Pre-training, NLP Training, and Statistic

    Training (described below).

    The corpus may be organized by grouping similar messages in directories or folders

    according to category names that represent the messages' content. Alternatively, each

    message may have a field or data identifier that indicates its category (or categories).
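The folder-based layout can be read with a few lines of Python. The sketch assumes one directory per category and one `.txt` file per message; the file extension, the function name `load_corpus`, and the sample message are illustrative assumptions rather than a format Rapport requires.

```python
from pathlib import Path
import tempfile


def load_corpus(root: Path):
    """Return (category, text) pairs from root/<category>/<message>.txt."""
    pairs = []
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for message_file in sorted(category_dir.glob("*.txt")):
            text = message_file.read_text(encoding="utf-8")
            pairs.append((category_dir.name, text))
    return pairs


# Build a tiny throwaway corpus to show the expected layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Mortgage Info").mkdir()
    (root / "Mortgage Info" / "msg1.txt").write_text(
        "What are your mortgage rates?", encoding="utf-8"
    )
    corpus = load_corpus(root)
```

For the alternative layout, where each message carries its own category field, the loader would read the field from each message instead of using the directory name, and a message could yield several pairs.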

    For the subsequent Pre-Training and Training processes to be most effective, the
    corpus should only contain messages that are accurately classified and free of
    extraneous text (unrelated to the message's category). An ideal corpus consists of
    messages that are classified according to well-defined categories (avoiding
    redundancies between categories), with textual content that is consistently
    representative of the category's subject. As many messages as possible with similar
    message content should be grouped together for each category: more messages per
    category improve the quality of concept models created during the statistic Training
    process (described below).

    Corpus organization diagrams:

    A corpus with similar messages grouped together in separate folders or directories representing categories (Category 1, Category 2, ... Category n).

    A corpus where each message is associated with one or more categories (for example, Category 2; Category 1,5; Category 3).

    The Pre-Training Process

    Pre-Training is an optional process that extracts and identifies significant linguistic

    information unique to the corpus being analyzed. This data enriches the LKB,

    improving NLP Training and the RME's ability to accurately classify messages

    online.

    Each business or organization has its own vocabulary of words that are unique and
    significant. For example, an Internet Service Provider (ISP) may consider the words
    "Internet Connection" to be significant, while an airline passenger service might
    decide that these words are insignificant. At the same time, both companies would
    probably consider the word "connection" to be significant, but they would define
    "connection" in two entirely different ways. To the ISP, a connection is an Internet
    hookup; to an airline, it is a connecting flight. In contrast, an insurance company may
    treat "connection" as insignificant.

    Rapport's Lexical Editor application is used to implement the Pre-Training process.

    The Lexical Editor analyzes the corpus of messages, filters the text, and generates lists

    of simple linguistic units called tokens and token pairs, organized according to

    frequency.

    A token is a string of characters identified by the Lexical Editor within a body of text.

    When the system analyzes the text of a corpus, it searches for delimiter characters

    such as spaces and typographical marks (periods, colons, etc.). Any string of

    characters found between these delimiters is recorded as a token. Significant tokens,

    non-significant tokens, and word associations are identified using the Lexical Editor,

    and stored in the LKB.
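The token extraction described above can be illustrated with a short sketch that splits text on delimiter characters and counts single tokens and adjacent token pairs. The delimiter set, the lowercasing step, and the function names are assumptions; the Lexical Editor's actual filtering rules are not specified in this paper.

```python
import re
from collections import Counter

# Assumed delimiter set: whitespace plus common typographical marks.
DELIMITERS = r"[\s.,:;!?]+"


def tokenize(text):
    """Split on delimiters and drop the empty strings left at the edges."""
    return [t.lower() for t in re.split(DELIMITERS, text) if t]


def token_frequencies(messages):
    """Count single tokens and adjacent token pairs across a corpus."""
    tokens, pairs = Counter(), Counter()
    for text in messages:
        toks = tokenize(text)
        tokens.update(toks)
        pairs.update(zip(toks, toks[1:]))
    return tokens, pairs


tokens, pairs = token_frequencies(
    ["Internet connection is down.", "My Internet connection drops"]
)
```

In this two-message sample, the pair ("internet", "connection") occurs twice, which is the kind of frequency signal a user could review when marking tokens as significant.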

    Note: The Pre-Training process is particularly useful for preparing the RME to accurately classify and process messages from international sources,

    especially messages including frequent misspellings and non-standard

    English usage.

    Pre-Training diagram: Corpus, Lexical Editor, and Linguistic Knowledge Base (LKB)

    Lexical Editor: analyzes the corpus; filters the word base; generates lists of single tokens and token pairs; calculates token frequency; enables the user to identify significant and non-significant tokens, and define word associations; stores information in the Linguistic Knowledge Base.

    LKB: receives corpus-specific linguistic data from the Lexical Editor; also contains additional domain knowledge (optional), standard English word lists, grammar, and rules for morphological analysis.

    Knowledge Base Building

    Knowledge Base building based on a corpus is implemented in two phases: NLP

    Training and Statistic Training.

    The NLP Training Phase

    During the NLP Training phase, the NLP engine analyzes and processes each message

    in the corpus individually in two stages: Pre-Processing and Processing.

    During Pre-Processing, the NLP engine analyzes each message text, identifies the

    portion of text to be processed, and generates an intermediate representation of the

    concepts contained in the message. In the Processing stage, the NLP engine uses

    morphological rules, word associations, and other linguistic techniques to accurately

    determine the concepts contained in each message, and the associations between them.

    These concepts are exported to the statistic engine for statistic Training via the
    Concept Modeler. The Concept Modeler converts the message's concepts into
    concept models, a format used by the statistic engine to build the Statistic
    Knowledge Base.

    NLP Training diagram: Corpus, LKB, NLP Engine, Concept Modeler, and Statistic Engine

    Pre-Processing: analyzes and processes each message individually; identifies the portion of text to be processed; receives data from the Linguistic Knowledge Base; generates an intermediate representation of concepts.

    Processing: uses morphological rules, word associations, and complex algorithms for generating concepts, and concepts based on other concepts; exports concepts to the Concept Modeler.

    Concept Modeler: converts concepts into concept models.

    Statistic Engine: implements Statistic Training.

    The Statistic Training Phase

    Statistic Training is implemented using the Rapport Knowledge Base Editor

    application. A skeletal decision tree structure is built based on the same categories

    used to classify messages in the corpus. During statistic Training, the statistic engine

    receives concept models from each corpus message individually. The statistic engine

    builds the SKB by performing operations on these concept models and creating
    models for the categories of each message in the SKB decision tree. The result is an
    SKB populated with models that accurately classify incoming messages during
    online RME processing.

    Note: Statistic Training may also provide feedback (manually) to the NLP

    Training process, improving NLP analysis and the determination of

    concepts.

    Updating the Knowledge Base

    Rapport readily adapts to almost any change in your incoming message environment.

    In some situations, however, the Learning process may take time. A more immediate

    solution is running an accelerated version of the Pre-Training and Training processes.

    Repeating the Pre-Training and Training (as required) ensures optimal message

    classification.

    It is recommended to repeat these processes when:

    Major changes have been made to categories

    Demographic or geographic changes have affected the origin of your incoming messages (e.g., an organization begins to receive large numbers of messages from a location outside its normal area of operation)

    New categories have been added to the SKB

    Products or services have been added or changed

    Special events require a response

    Statistic Training diagram: Concept Models (per individual message), Statistic Engine, Knowledge Base Editor, and SKB

    Knowledge Base Editor: populates the decision tree with new concept models based on each message's concept models; updates existing models in the Statistic Knowledge Base.

    SKB (Statistic Knowledge Base): stores concept models for each category in decision trees.

    Summary of Knowledge Base Development

    Linguistic and statistical data stored in the Rapport Knowledge Base is used by the

    RME to perform accurate message classification, enabling the system to take the most appropriate action for each customer message.

    Learning

    To gather this data, the system can be bootstrapped by an automatic process called

    Learning. Learning is ongoing, invisible to the user, and populates the SKB decision

    tree with concept models over time during normal Rapport operation. In addition to

    bootstrapping the system, learning continuously updates models in the SKB,

    improving message classification.

    Training

    Alternatively, the Rapport Knowledge Base may be built based on a corpus of sample

    messages classified according to categories. During Pre-Training, the Lexical Editor is

    used to analyze the corpus, identify significant, corpus-specific linguistic data, and

    refine the LKB. NLP Training analyzes each message in the corpus individually, and

    exports concepts via the Concept Modeler to the statistic engine. The Knowledge Base

    Editor application is used to create a skeletal decision tree structure based on corpus

    categories. For each message in the corpus, concept models are gathered for

    categories in the decision tree, and are stored in the SKB.

    The following simplified diagrams illustrate the chronological development of the

    Rapport Knowledge Base using the Training process.

    Knowledge Base Development (Based on Training)

    Creating a Corpus: sample messages classified according to message content.

    The Pre-Training Process: Corpus, Lexical Editor application, Linguistic Knowledge Base.

    NLP Training Process: Corpus, Linguistic Knowledge Base, NLP Engine (Pre-Processing and Processing), concept export, Concept Modeler.

    Statistic Training Process: concept models from NLP Training, Statistic Engine, Statistic Knowledge Base.

    Online RME Processing

    The linguistic and statistical data gathered through Learning, and optionally Training,
    enables the RME to accurately classify customer messages on the fly. In a process
    similar to NLP Training, message concepts are identified by the NLP engine using
    data in the LKB, and are exported to the Concept Modeler. Concept models are
    received by the statistic engine and compared to existing models in the SKB's
    decision tree, generating category scores. Based on category relevancy, optional
    logical expressions and other message parameters, and category configuration
    properties, the message is routed for an appropriate automatic or semi-automatic
    action. The Learning process enables the system to evolve and adapt over time,
    constantly improving Rapport's ability to accurately classify messages in the future.

    Online RME Message Processing (diagram): Customer Message; NLP Engine (Pre-Processing and Processing); Concept Modeler; Statistic Engine; Knowledge Base (LKB and SKB); message routed for automatic or semi-automatic action.