Information Management Software Catching the bad guys (and seeing the good guys) Entity/Relationship Analytics and how to Understand/Recognize Global Names


Information Management Software

Catching the bad guys (and seeing the good guys)

Entity/Relationship Analytics and how to Understand/Recognize Global Names

Entity Analytic Challenges: Ability to Overcome Multiple Levels of Identity Ambiguation

Naturally Occurring Phenomena Such As Data Quality And Cultural Variants, As Well As Deliberate Acts Of Identity Misrepresentation, Compounded by the Need to Protect Privacy

LEVEL 1) Dirty Disparate Data
LEVEL 2) Cultural Obstacles
LEVEL 3) Identity Ambiguation
LEVEL 4) Network Ambiguation
LEVEL 5) Privacy & Security

What is Needed To Address The Challenge?
Step One – Address Naturally Occurring Data Quality Issues

The first step is to gather the information assets necessary to accomplish the mission, and perform consistent quality, standardization, formatting, and enhancement.

Perform Data Quality (Naturally Occurring)
• Transposition Errors
• Multiple Formats
• Data Drift
• Dirty Data
Complexity Level: HIGH
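As a rough illustration of the standardization step, the sketch below brings phone numbers, license numbers and dates into one canonical form before any matching is attempted. The function names and the list of accepted date formats are assumptions made for this example; they are not part of any IBM product.

```python
import re
from datetime import datetime


def standardize_phone(raw):
    """Keep digits only, e.g. '(713) 730 5769' -> '7137305769'."""
    return re.sub(r"\D", "", raw)


def standardize_license(raw):
    """Keep digits only and drop leading zeros, so padded and unpadded forms agree."""
    return re.sub(r"\D", "", raw).lstrip("0")


def standardize_date(raw, formats=("%m/%d/%Y", "%Y-%m-%d")):
    """Return an ISO 8601 date for the first format that parses, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

With this in place, a zero-padded license such as 0001133107 and its unpadded form 1133107 compare as equal, which is the kind of "multiple formats" problem the slide refers to.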


Architecturally Thinking

• We can handle dirty data…

• But WHAT is dirty data?

• Do we always want to cleanse data?

• What is the value of dirty data?

• Consider the sources of data

• Consider the flow of data

The Problem: Ambiguous, Misrepresented, Blurry Identity

For a variety of reasons, companies don't have a clear picture of the individuals and organizations with whom they do business.

Depending on the nature of the organization's mission, the consequences can include missed threats to public safety, duplicate benefit payments, and accepting business from known criminals.

IBM InfoSphere Relationship Resolution

Who is who?
• Establish Unique Identity
• Integrates data silos
• Full attribution of entities

Who knows who?
• Obvious & non-obvious
• Links people & groups
• Role alerts

Who does what?
• Events & Transactions
• Business Rule Monitoring
• Criteria based alerting

What is Needed To Address The Challenge?
Step Two – Manage Cultural Identity Ambiguation

The second step is managing the cultural variations of identity data, such as name variants, spellings, cultures and genders, and applying culture-specific analytics to the recognition process.

Perform Data Quality (Naturally Occurring)
• Transposition Errors
• Multiple Formats
• Data Drift
• Dirty Data
Complexity Level: HIGH

Cultural Ambiguities (Naturally Occurring)
• Name Cultures
• Name Genders
• Name Order
Complexity Level: HIGH


We’ll be back with Global Names…

What is Needed To Address The Challenge?
Step Three – Address Intentional Identity

Resolve “Who Is Who” – Resolve and identify persons/organizations deliberately trying to hide or misrepresent who they actually are

Perform Data Quality (Naturally Occurring)
• Transposition Errors
• Multiple Formats
• Data Drift
• Dirty Data
Complexity Level: HIGH

Cultural Ambiguities (Naturally Occurring)
• Name Cultures
• Name Genders
• Name Order
Complexity Level: HIGH

Resolve Identity (Intentional Act)
• Identity Masking
• False Identifiers
• Stolen Identifiers
Complexity Level: VERY HIGH

Entity Analytic Solutions – “Who is who?”

Sources: A – Credit Card, B – Mortgage, C – DDA

Observations
Record #70001: Marc R Smith, 123 Main St, (713) 730 5769, 537-27-6402, DL: 0001133107
Record #9103: Randal M Smith, DOB: 06/17/1934, (713) 731 5577
Record #6251: Mark Randy Smith, 456 First Street, (713) 731 5577, DL: 1133107

EAS Entity #9451
Attribute   Value               Source
Name        Marc R Smith        A-70001
Address     123 Main St         A-70001
Phone       (713) 730 5769      A-70001
Tax ID      537-27-6402         A-70001
License     0001133107          A-70001

EAS Entity #9452
Attribute   Value               Source
Name        Randal M Smith      B-9103
DOB         06/17/1934          B-9103
Phone       (713) 731 5577      B-9103

EAS Entity #9453 (records A-70001 and C-6251)
Attribute   Value               Source
Name        Marc R Smith        A-70001
Name        Mark Randy Smith    C-6251
Address     123 Main St         A-70001
Address     456 First St        C-6251
Phone       (713) 730 5769      A-70001
Phone       (713) 731 5577      C-6251
Tax ID      537-27-6402         A-70001
License     0001133107          A-70001
License     1133107             C-6251

EAS Entity #9453 (records A-70001, B-9103 and C-6251)
Attribute   Value               Source
Name        Marc R Smith        A-70001
Name        Randal M Smith      B-9103
Name        Mark Randy Smith    C-6251
Address     123 Main St         A-70001
Address     456 First St        C-6251
Phone       (713) 730 5769      A-70001
Phone       (713) 731 5577      B-9103
Phone       (713) 731 5577      C-6251
Tax ID      537-27-6402         A-70001
License     0001133107          A-70001
License     1133107             C-6251
DOB         06/17/1934          B-9103

• Sequence Neutral Identity Resolution
• Self-Correcting
• 20 Attributes Out of the Box
• Predefined Rules/Sensitivities

[Diagram labels: Interactions; Entity Context]
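The three records on this slide can be used to sketch what "sequence neutral" means: whichever order the records arrive in, they end up in the same entity. The code below is a toy illustration only. The choice of matching attributes, the rule that a single shared identifier is enough to merge, and the union-find bookkeeping are all assumptions made for this example, not a description of how EAS scores candidates.

```python
from itertools import permutations

# Records as shown on the slide (source letter plus record number).
RECORDS = {
    "A-70001": {"name": "Marc R Smith", "phone": "(713) 730 5769",
                "tax_id": "537-27-6402", "license": "0001133107"},
    "B-9103": {"name": "Randal M Smith", "dob": "06/17/1934",
               "phone": "(713) 731 5577"},
    "C-6251": {"name": "Mark Randy Smith", "phone": "(713) 731 5577",
               "license": "1133107"},
}
MATCH_KEYS = ("phone", "tax_id", "license")  # hypothetical matching attributes


def normalize(attr, value):
    """Digits only; licenses also lose leading zeros."""
    digits = "".join(ch for ch in value if ch.isdigit())
    return digits.lstrip("0") if attr == "license" else digits


def resolve(record_ids):
    """Group records into entities; sharing any normalized key value merges them."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    seen = {}  # (attribute, normalized value) -> first record that carried it
    for rid in record_ids:
        parent[rid] = rid
        for attr in MATCH_KEYS:
            if attr in RECORDS[rid]:
                key = (attr, normalize(attr, RECORDS[rid][attr]))
                if key in seen:
                    parent[find(rid)] = find(seen[key])
                else:
                    seen[key] = rid
    groups = {}
    for rid in record_ids:
        groups.setdefault(find(rid), set()).add(rid)
    return {frozenset(group) for group in groups.values()}


def is_sequence_neutral():
    """True if every arrival order yields the same single entity."""
    expected = {frozenset(RECORDS)}
    return all(resolve(list(order)) == expected for order in permutations(RECORDS))
```

Records A and B alone share nothing and stay separate. Once C arrives, its license links it to A and its phone links it to B, so all three collapse into one entity, mirroring the slide's progression from entities #9451 and #9452 to #9453.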

EAS Entity Resolution – The Basis for Assessment

Mr. Joseph Carbella
55 Church Street
New York, NY 10007
Tel#: 212-693-5312
DOB: 07/08/66
SID#: 068588345
DL#: 544 210 836
ACCT # 2310322

COSIGNER

Mr. Joe Jones
APT 4909
Bethesda, MD 20814
Tel#: 978-365-6631
DOB: 09/07/66
AUTO LOAN

Mr. Joe Carbello
1 Bourne St
Clinton MA 01510
TEL#: 978-365-6631
DL#: 544 210 836
DOB: 07/09/66
ACCT #3292322
HOME LOAN

Mr. Joey Carbello
555 Church Ave
New York, NY 10070
Tel#: 212-693-5312
DL#: 544 210 836
PPN#: 086588345
ACCT #494202
OBLIGOR

Legend: Close match; Exact match

Allows Investigators to Establish True Identity When Suspects Attempt To Hide Or Blur Who They Are and Their Characteristics

EAS Entity Resolution – Risk View

Entity #144465

Attribute   Value               Source       Date
Names       Marc R Smith        A-#70001     05/01/05
            Randal Smith        B-#009102    05/10/06
            Mark Randy Smith    C-#6251      07/12/05
Address     123 Main St.        A-#70001     05/01/05
            456 First Street    C-#6251      07/12/05
Phones      (713) 730-5769      A-#70001     05/01/05
            (713) 731-5577      B-#009102    05/10/06
SSN         537-27-6402         A-#70001     05/01/05
DL          1133107             A-#70001     05/01/05
            1133107             C-#6251      07/12/05
DOB         06/17/1934          B-#009103    05/10/06
COUNTRY     Pakistan            B-#009103    05/10/06
ACCTYPE     Wire                A-#70001     05/01/05
OFAC        Match               C-#6251      07/12/06
PEP         No Match            A-#70001     05/01/06
NEGNEWS     Criminal            B-#009103    05/10/06
SICCODE     1023113             A-#70001     05/01/06
HIFCA       Los Angeles         C-#6251      07/12/06
HIDTA       San Diego           B-#009103    05/10/06

[Diagram: entity #144465, labeled Marc R Smith, Randal Smith, Mark Randy Smith, 123 Main St]

EAS Identity Repository – Identity Folder

Identity folder #144465
• Marc R Smith
• Bob Smith
• Mark Robert Smith
• Mark Smith


Architecturally Thinking

• How does this compare to federating data?

• IS this MDM?

• If yes, why?

• If no, why?

• Remember the Filing Cabinet

• Remember the dirty data?

• Consider your understanding of Single View

• Remember that you now have a database footprint

What is Needed To Address The Challenge?
Step Four – Address Network Ambiguation

Uncover “Who Knows Who” – Spot linkages or Non-Obvious Relationships between identities to reveal criminal networks, syndicates, and terrorist cells

Perform Data Quality (Naturally Occurring)
• Transposition Errors
• Multiple Formats
• Data Drift
• Dirty Data
Complexity Level: HIGH

Cultural Ambiguities (Naturally Occurring)
• Name Cultures
• Name Genders
• Name Order
Complexity Level: HIGH

Resolve Identity (Intentional Act)
• Identity Masking
• False Identifiers
• Stolen Identifiers
Complexity Level: VERY HIGH

Relate Identity (Intentional Act)
• Nominees
• Non Obvious Relations
• Hidden networks
Complexity Level: EXTREME

Entity Analytic Solutions – “Who knows who?”

Sources: A – Credit Card, B – Mortgage, C – DDA, D – Wires, E – Addl internal/External

What relationships does Marc Smith hold with entities across the enterprise?

• Marc is related to Bob from B by a disclosed relationship.
• Marc is related to Joan from B by home address.
• Related to Sue from C by a Tax ID at one degree of separation.
• Related to Alice (through Sue) from D by a phone number at two degrees of separation.
• Related to John (through Alice) from B by a business address at three degrees of separation.
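The degrees of separation listed for Marc can be reproduced with a breadth-first search over a graph whose edges are the shared attributes. This is a minimal sketch under stated assumptions: the edge list is transcribed from the slide, while the function names and the idea of capping the search depth with a `max_degrees` parameter are illustrative choices, not the product's API.

```python
from collections import deque

# (entity, entity, linking attribute) as listed on the slide.
LINKS = [
    ("Marc", "Bob", "disclosed relationship"),
    ("Marc", "Joan", "home address"),
    ("Marc", "Sue", "Tax ID"),
    ("Sue", "Alice", "phone number"),
    ("Alice", "John", "business address"),
]


def build_graph(links):
    """Undirected adjacency sets built from the link list."""
    graph = {}
    for a, b, _attribute in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees_of_separation(graph, start, max_degrees=30):
    """Breadth-first search returning {entity: degree} for everyone reachable."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_degrees:
            continue  # do not expand beyond the configured depth
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return distance
```

Calling `degrees_of_separation(build_graph(LINKS), "Marc")` places Bob, Joan and Sue at one degree, Alice at two and John at three, matching the relationships stated above.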


EAS Relationship Resolution/Social Networks

Entity #1230431

EAS Relationship Resolution – Degrees of Separation (across any attribute(s))

A: Mark Smith
Phone: (713) 730 5769

B: Kate Green
Phone: (713) 730 5796
Addr: 123 Main St

C: Tom Sinclair
Addr: 123 Main St
*** OFAC LIST ***

(Transitive Property: If A = B and B = C, then A = C)

Mark is related to Tom by Two Degrees of Separation.

EAS Supports 30 Degrees of Separation!

Identity Folders – Complete Relationship Resolution

[Network diagram: identity folders labeled by entity ID (for example #144465) and linked to one another. Identities shown:]
• Mark Smith, 123 High St, Telford
• Kate Green, 431 Rebus Avenue, Harlow
• Tom Sinclair, 23 Lansbury Ave, Stratford
• Raj Jones, 65 Kenyan Way
• Jim Roberts, 30130 Elm, Boston, MA USA
• Ming Chan, 495 Randal St, Liverpool
• Harold Burr, 402 West St, Bristol
• Luci Tamoia, 13 Galliard House, Leeds
• Gwen Roberts, 95 Arvale Road, London
• Juergen Lit, 921 Rue de Lyon, Paris


Architecturally Thinking

• Degrees of separation…from?

• Other people (identities)

• Other “things” (entities)

• EAS is a fraud detection system

• If yes, why?

• If no, why?

• Remember the Filing Cabinet

• Remember the dirty data?

• The power of understanding a network of identities

• EAS is rarely (never) “rip and replace”

• If not, then where does it fit?

What is Needed To Address The Challenge?
Accommodate Privacy & Security Considerations If Required

Anonymization – For situations where privacy or security concerns make recognition high risk, or sensitive data needs to be de-identified to facilitate cross-agency/country sharing and analytics

Perform Data Quality (Naturally Occurring)
• Transposition Errors
• Multiple Formats
• Data Drift
• Dirty Data
Complexity Level: HIGH

Cultural Ambiguities (Naturally Occurring)
• Name Cultures
• Name Genders
• Name Order
Complexity Level: HIGH

Resolve Identity (Intentional Act)
• Identity Masking
• False Identifiers
• Stolen Identifiers
Complexity Level: VERY HIGH

Relate Identity (Intentional Act)
• Nominees
• Non Obvious Relations
• Hidden networks
Complexity Level: EXTREME

Privacy & Security (Reactive Action)
• Privacy Compliance
• Security Drivers
Complexity Level: EXTREME
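One common way to de-identify data while still allowing exact matching is a keyed one-way hash: parties that share a secret key turn each identifier into a token, and only tokens are exchanged. The sketch below uses HMAC-SHA256 from the Python standard library. It is a generic technique chosen for illustration; the slides do not say which anonymization method the product uses, and the `anonymize` name and key handling are assumptions.

```python
import hashlib
import hmac


def anonymize(value, secret_key):
    """Return a keyed, one-way token for an identifier.

    The value is normalized first (lower case, letters and digits only) so
    that formatting differences do not change the token.
    """
    normalized = "".join(ch for ch in value.lower() if ch.isalnum())
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

Two agencies holding the same key produce the same token for 537-27-6402 and 537 27 6402 and can therefore match on it without revealing the number. The trade-off is that only exact matches survive hashing, so any fuzzy or cultural name matching has to happen before the data is anonymized.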


Danger Will Robinson!!!


Relationship Resolution

Entity Analytic Solutions – “Who does what?”

Identity-based Aggregate

Observed Activities

TRANSACTIONS
Cust #C-6251    Mark Randy Smith    Acct #120-555    Withdraw $9,900
Cust #A-70001   Marc R Smith        Acct #456-983    Withdraw $9,800
Cust #B-9103    Randal M Smith      Acct #942-525    Withdraw $9,800

EVENTS
01/25/08 10:39    Account Applicant
01/25/08 10:55    Account Applicant
01/25/08 11:05    Account Applicant

Business Rules & Thresholds

Sample Rules
• Transaction Amt > $
• Average Transaction Amt
• Number of Transactions > X
• Between Date A and Date B
• Within Geospatial Range
• Combinations of the Above
• User Defined

Streaming Real-Time Monitoring & Alerting

(User) Define New Rules via GUI
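A small sketch of the identity-based aggregate: the three withdrawals on this slide each stay below a threshold when viewed per customer record, but add up once the records are resolved to one entity. The $10,000 threshold, the function name, and the reuse of entity #9453 from the earlier "Who is who?" slide as the resolved identity are assumptions for illustration.

```python
from collections import defaultdict

# (customer record, account, withdrawal amount in dollars) from the slide.
TRANSACTIONS = [
    ("C-6251", "120-555", 9900),
    ("A-70001", "456-983", 9800),
    ("B-9103", "942-525", 9800),
]
# Assumed resolution result: all three customer records belong to one entity.
ENTITY_OF = {"A-70001": 9453, "B-9103": 9453, "C-6251": 9453}


def identity_alerts(transactions, entity_of, threshold=10_000):
    """Sum amounts per resolved entity and return the totals above the threshold.

    Records with no resolved entity are aggregated under their own record ID,
    which corresponds to a purely per-customer view.
    """
    totals = defaultdict(int)
    for customer, _account, amount in transactions:
        totals[entity_of.get(customer, customer)] += amount
    return {entity: total for entity, total in totals.items() if total > threshold}
```

With the resolution map, `identity_alerts(TRANSACTIONS, ENTITY_OF)` reports $29,500 for entity 9453. With an empty map, nothing exceeds the threshold and no alert is raised.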

Traditional Anti Money Laundering – Account Orientation
Fraudsters know how to defeat account based detection systems

To defeat SAR systems, criminals will “structure” activity across multiple accounts, each attached to an identity packet, and across multiple geographies, so the suspicious pattern is watered down and overlooked.

• $8,000 Cash Deposit, ACCT# 987-442-004
• $8,000 Wire Transfer, ACCT# 321-462-567
• $9,500 Cash Deposit, ACCT# 675-466-099
• $9,900 Wire Transfer, ACCT# 990-432-000

Account-Number-Based Analysis Solutions Possess Blind-Spots

Entity Analytic Solutions – Identity Orientation
EAS (identity based) Catches The Fraudsters at THEIR Game!

$35,000 ALERT!

“Structuring remains one of the most commonly reported suspected crimes on Suspicious Activity Reports (SARs).” – BSA AML Examination Manual


Architecturally Thinking

• Real time “action resolution” through business rules

• What are the business rules?

• And who knows them?

• Always, always, always consider data overload

• A new term: False positive/False negative

• They can be:

• A great ROI tool when you reduce them

• A REAL PROBLEM when you increase them

• Think: Feedback loops

• Think: Synergy

Entity Analytic Solutions – “How do I find what I should know”

The “Enterprise Amnesia” Model

[Diagram: DATA sources feed a DW, which feeds several MARTs. Latency labels: Days or Weeks; Days, Weeks or Months]

• Merge/Purge introduces data loss when picking the “best version”
• Data segregated to “support” dept initiatives

“Have we seen this applicant before?”
• Are all the details still present?
• New data? Must ask again every day
• What is the right question to ask?
• Will I remember how these facts relate?

Entity Analytic Solutions – “How do I find what I should know”

The “Enterprise Awareness” Model

[Diagram: DATA sources, DW and MARTs, with latency label: Seconds – Streaming Real-Time]

“Have we seen this applicant before?”

• Each New Key Data Value Introduced is Evaluated Against All Prior Key Data Values
• Process New Key Info First Like a Query
• Queries & Data Flow Through The Same Channel
• Alerts pushed to analyst upon suspicious activity

Catch The Bad Guys! Nominal Latency & Real-Time Context to PRE-EMPT and PREVENT

Example alerts:
• “The address and phone for account 59412 has changed 5 times in just 3 weeks. Alert! potential Identity Theft.”
• “A bank employee changed their payroll address to the address of an ex-employee jailed for embezzlement three years ago.”
• “This person has applied 10 times before and shares an address and SSN with a bank employee/teller in the same city.”
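The idea that each new key data value is evaluated against all prior values, with data and queries flowing through the same channel, can be sketched as an index that treats every incoming record as a query before storing it. The class name, record IDs and attribute values below are hypothetical; a real deployment would add normalization, scoring and persistence.

```python
from collections import defaultdict


class AwarenessIndex:
    """Toy 'data finds data' index: every new record is first used as a query."""

    def __init__(self):
        self._index = defaultdict(set)  # (attribute, value) -> record IDs

    def observe(self, record_id, attributes):
        """Return earlier records sharing any attribute value, then store the record."""
        hits = defaultdict(set)
        for attribute, value in attributes.items():
            for other in self._index[(attribute, value)]:
                hits[other].add(attribute)
        for attribute, value in attributes.items():
            self._index[(attribute, value)].add(record_id)
        return {other: sorted(shared) for other, shared in hits.items()}
```

For example, after observing an employee record with a given address and SSN, observing an applicant record with the same two values immediately returns the employee's record ID together with the shared attributes. No batch reload or separate query is needed, which is the contrast the slide draws with the "Enterprise Amnesia" model.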

Entity Analytic Solutions Architecture

Engine
• Recognize, Resolve, Relate
• Persistent Search & Alerts

Database (Entity Repository)
• Full Attribution
• Fully Auditable

Application Server (Service)
• Core: Client * Alert * Entity
• Process: Search * Load * Score

Client
• Visualizer (User): Search, Graph, Research
• Console (Admin): Configure, Secure, Manage

EAS Architecture in the Enterprise

[Architecture diagram; recoverable labels are listed below]

• Data Source: Internal (Employees, Vendors, Online Customers, M&A Data, Best Customers, Watch Lists); External (Data Service(s), Additional Data On An Identity, External Lists Of Bad Guys)
• Extraction, Transformation, Validation; Federated Data Access
• ETL; Transformation & Calculation
• REAL-TIME (or batched) IDENTITY RESOLUTION
• EAS Identity and Relationship Repository: Always Up-to-date (No Reload)
• Review & Act (manually/auto) on Conflicts in Identities and Relationships per Business Rules
• Enterprise Data Store: Data Warehouse, Data Mart, Metadata, Warehouse Administration
• Population in the context of the resolved identities
• Ability to aggregate by consolidated identities and their full attributes
• Relationships available as another dimension
• Access: Query/Reporting, Analysis / Mining, Visualization
• Apps, e.g.: Fraud Detection, CRM, Financial
• Intranet Portal (Internal Audience); Extranet Portal (External Audience)


Name Processing Magic!

Name Matching & Name Management Challenges: Significant Business Issues

Basic Question Number 1: How do you handle the management of your name data?

• Difficult to accurately search for and match customers' family or cultural variants of first and last names
• Can validate addresses & telephone numbers, but how do we know if a name is accurate?
• We have invested millions in cleaning up our customer data file, yet problems remain
• Current solutions are based on very old technology, generate too many false positives & negatives, and are too time-consuming
• This is a pervasive problem in many industries

Callouts:
• OFAC List Check Was Missed!!
• There’s how many variants of that name?
• How do you parse “Maria Luz Rodriguez v. de Luna”?

Who cares about names?

[Chart: Criticality of Name Data Management. One axis: Risk posed by false negatives / Requirement to handle names precisely (Low to High). Other axis: Size of name data set (Small to Large).]


What’s In a Name?

• Names remain the single most important means for identifying persona non grata

• Biometrics are only useful the second time you meet someone

• People everywhere in the world are learning how easily our name search systems can be confounded and circumvented

Why Are Multi-Cultural Names So Hard? How Do You Verify A Name?

• Nicknames: Drew, Manny, Cat
• Shortened names: Andy, Eman
• Prefixes: Abdul, Fitz, O', De La
• Name Order: Hussein, Mohammed Abu Ali
• Titles: Dr., Rev, Haj, Sri., Col
• Phonetics: Worchester, Wooster, “Worcester”

Examples of variants:
• Andras, André, Andre, Drue, Ohndrae, Ohndre
• Eman, Emanual, Imanuel, Immanuele, Manny, Manual
• Mohaammad, Mohammed, Imhemmed, Mohammd, Mohamod, Mohamud
• Cait, Caitey, Katalin, Katchen, Kate, Katerinka

Typically CIFs are focused on storing customer information, demographics, account information, etc. They aren’t equipped to deal with the unique demands of classifying, matching and processing global & cultural name variations.

What’s Needed to Address The Challenge?

The sole purpose of a Search Engine is to mediate between a User and a Data Base

[Diagram: Successful Name Searching, with components User, Algorithm and Database]

What We Found
The First Problem: Database Problems

• MARIA ELENA LOPEZ GARCIA
• MARIA ELENA LOPEZ GARCI
• MARIA ELENA LOPEZGARCIA

What We Found
The Second Problem: Ineffective Search Technologies

(The first problem was Database Problems.)

• Exact Match
• Soundex (1918)
• NYSIIS (1963)
• “Home-Grown”
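To make the limits of Soundex concrete, here is a compact implementation of the standard American Soundex rules (first letter kept, consonants mapped to digits 1 to 6, vowels dropped, H and W ignored between equal codes, result padded to four characters). The implementation is an illustrative sketch written for this deck, not code from any product.

```python
# Consonant groups of American Soundex.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the previous code
    return (result + "000")[:4]
```

Because the first letter is always kept, transliteration variants that start with different letters never meet: `soundex("Tsai")` is T200 but `soundex("Cai")` is C000, and `soundex("Chang")` is C520 while `soundex("Zhang")` is Z520. At the same time, unrelated names such as Robert and Rupert share the code R163, which shows how the same algorithm produces both false negatives and false positives.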


The Original Hollerith Machine for which Soundex was Designed

What We Found
The Third Problem: Limited User Support

Database and Search problems carried over:
• Exact Match
• Soundex (1918)
• NYSIIS (1963)
• “Home-Grown”

Name cultures users must handle:
• Chinese
• Arabic
• Thai
• Hispanic
• Russian
• Korean
• Yoruban
• Indonesian

Simple Name Recognition is Particularly Hard

• Cheung Yau So
• Chiusu Sae Chang
• Zhang Qiusu
• Chang Ch’iu-Su

There are hundreds of name variants, and there are multiple ways that these names can be spelled.

You can verify an address or a telephone number, but how do you verify a name?

[Map labels: China, Taiwan, Hong Kong, Macau, Philippines, Indonesia, Thailand, Cambodia, Myanmar (Burma), Laos, Vietnam, Malaysia, Singapore]

Global Name Recognition

[Diagram labels: Solution; Name-centric data warehouses; Anti-Money Laundering Systems; Customer Information Systems; ERP Systems (HR, Contracts, etc.); Search Lists; Watch lists (OFAC, PEP, Interpol); 3rd Party data sets; Name data files]

• Global Name Recognition consists of a set of tools that complement existing IT investments for organizations looking to analyze, search and process names

• Domain expertise in multi-cultural names in the areas of:

• Name Analysis

• Name Enrichment

• Name Matching

• Adds value to name matching and analysis based on statistical and linguistic analysis of almost a billion names and 18 cultural families

• Reduces false positive results so that the information returned is reliable and relevant

What is GNR?

• A series of Services Oriented Architecture enabled libraries and interfaces that address the linguistic and cultural complexities of names (personal, organizational,…) from around the world

• Used to enhance name-processing (analysis, matching, understanding) in a wide variety of systems and applications

• Based on 20+ years of intensive research and data collection on names, drawing on a repository of approximately 1 billion names


Andreas, Andrei, Andrej, Andres, Andresj, Andrewes, Andrews, Andrey, Andrezj, Andrian, Andriel, Andries, Andrij, Andrija, Andrius, Andro, Andros, Andru, Andruw, Andrzej, Andy, Antero, Dandie, Dandy, Drew, Dru, Drud, Drue, Drugi, Mandrew, Ohndrae, Ohndre, Ondre, Ondrei, Ondrej, Ohnrey Ondrey, Eric, Erich, Erick, Erico, Erik, Eryk, Federico, Federigo, Fred, Fredd, Freddie, Freddy, Fredek, Frederic, Frederich, Frederico, Frederik, Fredi, Fredric, Fredrick, Fredrik, Frido, Friedel, Friedrich, Friedrick, Fridrich, Fridrick, Fritz, Fritzchen, Fritzi, Fritzl, Fryderky, Ric, Fredro, Rich, Rick, Ricky, Rik. Rikki. Cait, Caitie, Cate, Catee, Catey, Catie, Kaethe, Kait, Kaite, Kaitlin, Katee, Katey, Kathe, Kati, Katie, Bel, Belia, Belicia, Belita, Bell, Bella, Belle, Bellita, Ib, Ibbie, Isa, Isabeau, Isabela, Isabele, Isabelita, Isabell, Isabella, Isabelle, Ishbel, Isobel, Isobell, Isobella, Isobelle, Issie, Issy, Izabel, Izabella, Izabelle, Izzie, Izzy, Sabella, Sabelle, Ysabeau, Ysabel, Ysabella, Ysobel, Ainslaeigh, Ashalee, Ashalei, Ashelei, Asheleigh, Asheley, Ashely, Ashla, Ashlan, Ashlay, Ashle, Ashlea, Ashleah, Ashlee, Ashlei, Ashleigh, Ashlen, Ashli, Ashlie, Ashly, Boutros, Par, Peder, Pedro, Pekka, Per, Petar, Pete, Peterson, Petr, Petre, Petros, Petrov, Pierce, Piero, Pierre, Piet, Pieter, Pietro, Piotr, Pyotr, Hamid, Hammad, Mahmood, Mahmoud, Mahmud, Mahomet, Mehmet, Mehmood, Mehmoud, Mehmud, Mihammad, Mohamad, Mohamed, Mohamet, Mohammad, Mohammed, Muhamet, Muhammed. Achmad, Achmed, Ahmaad, Ahmad, Ahmet, Ahmod, Amad, Amadi, Amahd, Amed. Amad, Amed, Amahd, Amadi, Ahmad, Amado, Amid, Umed. Iman, Imre, Imani, Imri, Imray, Ismat, Itai, Mead. Ad, Adamo, Adams, Adan, Adao, Addam, Addams, Addem, Addie, Addis, Addison, Addy, Ade, Adem, Adham, Adhamh, Adim, Adnet, Adnon, Adnot, Adom, Atim, Atkins Edom, Adem, Aindrea, Aindreas, Analu,

GNR has implemented a knowledge based approach for coping with the wide array of multi-cultural name forms found in databases.

The Knowledge Base Process

Example input names:
• Maria del Carmen Bustamante de la Fuente
• Hisham Abu Ali Quereshi
• Noor Eldin
• Chang Wen Ying
• Nadezhda Ivanovna Ovtsyuk
• William Martin Smith-Bagby Jr.
• Ohndre Van Der Merve

GNR Knowledge Base
• Over 20 years in development
• Information based, not rule based
• Over 200 countries studied and growing
• Close to a billion names & linguistics, and growing
• We have it, no one else does

[Diagram labels for the name “Ohndre”: Male 90%; German; 50 Variants; Parsing; On-Dray; Search]

1. Names are first submitted to an automatic analysis process, which determines the most likely cultural/linguistic origin of the name.

2. Based on this determination, an appropriate algorithm or set of rules is applied to the matching process.

Classification: Ohndre 89% German Van Der Merve 90% Dutch

Parsing: “Van Der Merve, Ohndre”

Gender: Ohndre – 90% Male

Variants: 65 Variations

Phonetics: “On-Dray - Ohndre”

Noise: Ohndre, Ondre, Omdre,

Nicknames: Andy, Drew, Drus

Salutations: Mr, Mrs, Doctor, Haj,
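The parsing result shown above ("Van Der Merve, Ohndre") can be approximated with a very small rule: surname particles stay attached to the surname. The sketch below is a toy stand-in with a hand-picked particle list. It is not how GNR works, since the slides describe a statistical knowledge base rather than fixed rules, and it deliberately ignores cases such as Hispanic double surnames or suffixes like "Jr.".

```python
# Hypothetical, hand-picked list of surname particles for illustration only.
PARTICLES = {"van", "der", "von", "de", "del", "la", "abu", "al", "bin"}


def parse_name(full_name):
    """Return 'Surname, Given' keeping leading particles with the surname."""
    tokens = full_name.split()
    if not tokens:
        return ""
    start = len(tokens) - 1
    # Walk backwards while the token before the surname is a particle.
    while start > 0 and tokens[start - 1].lower() in PARTICLES:
        start -= 1
    given = " ".join(tokens[:start])
    surname = " ".join(tokens[start:])
    return f"{surname}, {given}" if given else surname
```

`parse_name("Ohndre Van Der Merve")` returns "Van Der Merve, Ohndre", matching the slide. The value of culture classification is that it tells a real system which parsing and matching rules to apply in the first place, which this fixed list cannot do.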

Global Name Recognition

IBM Global Name Management (complete portfolio package, minus the Encyclopedia) is made up of 2 products:
• IBM Global Name Analytics
• IBM Global Name Scoring

Fully automated, high-performance multi-cultural name recognition and analysis

Transliteration
• Cyrillic
• Latin-2
• Greek
• Arabic

Global Name Reference Encyclopedia

Global Name Analytics

IBM Global Name Analytics
• Identifies and classifies the cultural background of a name
• Determines country of association for a given name
• Recognizes whether a name is predominantly male or female and provides relevant frequency statistics
• Returns name variants and scores in order of their frequency of occurrence
• Determines which name, of a given name combination, is likely to be the given name or surname

Global Name Scoring (aka Name Matching)

IBM Global Name Scoring

Input: Li-Hsiang Tsai

Search Results:
Tsai, Li Hsiang    1.0     116313
Tsai, Lishiang     0.99    102059
Cai, Li-Xiang      0.98    131620
Tasi, Li Hsiang    0.83    158987

• Key capability: perform name matching against lists or other data sources
• Improved accuracy of name searching, transliteration, and the quality of identity verification initiatives
• Tuning capability for more than 40 parameters, allowing for highly tuned and application-specific results
• Provides ranked search results based on similarity of pronunciation
• Capability to accept names in native script for Arabic, Cyrillic and Greek languages and return results

KEY POINTS: Fast, accurate, scalable name matching
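The shape of a ranked name search (a query in, a list of candidates with scores out) can be sketched with the standard library alone. Note the substitution: this example scores by character similarity using `difflib.SequenceMatcher`, not by similarity of pronunciation as Global Name Scoring is described as doing, so its scores will differ from the ones on the slide. The function names and the token-sorting normalization are assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, replace punctuation with spaces, and sort tokens.

    Sorting makes 'Tsai, Li Hsiang' and 'Li-Hsiang Tsai' compare as equal.
    """
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def score(a, b):
    """Character-level similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def search(query, candidates, threshold=0.0):
    """Return (score, name) pairs at or above the threshold, best first."""
    scored = [(round(score(query, name), 2), name) for name in candidates]
    return sorted((item for item in scored if item[0] >= threshold), reverse=True)
```

Searching for "Li-Hsiang Tsai" against the four names on the slide ranks "Tsai, Li Hsiang" first with a score of 1.0. A purely character-based score has no notion that Tsai and Cai are romanizations of the same surname, so it rates "Cai, Li-Xiang" on spelling alone. That gap is what pronunciation-based and culture-aware scoring are meant to close.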

Global Name Reference Encyclopedia

• Standalone user-based web application
• Lightweight, no need for integration with systems
• Comprehensive, interactive reference tool for understanding names, their origins and history
• Includes culture-specific information about names, their use, their meanings, and their patterns of spelling variations