YALE LAW SCHOOL POLICY SCIENCES CENTER ANNUAL INSTITUTE Using a New Method of Natural Language Intelligence for Performing Wiretap Analysis Amy Neustein, Ph.D. Linguistic Technology Systems [email protected]

YALE LAW SCHOOL POLICY SCIENCES CENTER ANNUAL INSTITUTE

Using a New Method of Natural Language Intelligence for Performing Wiretap Analysis

Amy Neustein, Ph.D.

Linguistic Technology Systems

[email protected]

WHY DO WE NEED A NEW NATURAL LANGUAGE INTELLIGENCE METHOD FOR MINING WIRETAP RECORDINGS?

1) The volume of terrorism-related government wiretap recordings far exceeds the capacity of human intelligence analysts to mine those recordings; and

2) Most automated audio data mining programs have a low rate of return when searching for “key words” in wiretap recordings, because terror suspects deliberately avoid key words that can identify names, places, dates, etc.

Sequence Package Analysis (SPA): A New Method of Natural Language Intelligence

HOW DOES SPA WORK?

1) Adds rather than replaces

SPA adds a layer of intelligence to standard dialog systems.

2) Mines audio data

SPA goes beyond a conventional search for words and word strings.

It identifies a series of related speaking turns and turn construction units (parts of turns) that are discretely packaged as a sequence of conversational interaction.

WHAT IS THE METHODOLOGICAL BASIS OF SPA?

SPA is a new natural language understanding method that draws mainly from the field of conversation analysis: the study of the orderly properties of interactive dialog that revolve around the turn-taking process and other sequentially based features that are part of that process. SPA has been peer reviewed and cited by other researchers as an important data mining method for captioning text.

Conversation analysis has been called by some a subfield of A.I. because it can detect the detailed structural organization of dialog, which is a necessary precondition for the design of dialog systems that simulate and understand human dialog.

WHAT DOES SPA DO?

1) SPA permits the discovery of “key” words (e.g., the name of a location where a crucial meeting among terrorists will take place) that are not contained in the speech application’s vocabulary.

2) SPA permits rapid and efficient data mining of large volumes of audio text by spotting sequence packages in the dialog.

MINING THE DATA FOR SEQUENCE PACKAGES

• A sudden increase in the speakers’ use of pronouns in place of noun referents may indicate that the speakers are going over familiar or well-rehearsed subject matter.

• An unexpected increase in the use of adjectival descriptors in place of nouns, serving as a kind of privately shared “shorthand” label for a person or enemy target, can flag terrorist plans and activities.

• SPA, by looking for sequence patterns, can locate these descriptors even when they are outside the speech application’s vocabulary.
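The first cue above, a sudden rise in pronoun use, can be illustrated with a minimal Python sketch. It is not the SPA implementation: the pronoun list, the window size, and the `jump` threshold are all assumptions chosen for illustration, and the input is assumed to be already-transcribed utterances.

```python
import re

# Hypothetical, deliberately small pronoun list (an assumption for this sketch);
# a production system would more likely rely on a part-of-speech tagger.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "we", "us"}


def pronoun_rate(utterance: str) -> float:
    """Fraction of word tokens in one utterance that are pronouns."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    if not tokens:
        return 0.0
    return sum(t in PRONOUNS for t in tokens) / len(tokens)


def flag_pronoun_spikes(utterances, window=3, jump=0.15):
    """Return indices of utterances that end a window whose mean pronoun
    rate exceeds the mean of the preceding window by at least `jump`."""
    rates = [pronoun_rate(u) for u in utterances]
    flagged = []
    for end in range(2 * window, len(rates) + 1):
        previous = sum(rates[end - 2 * window:end - window]) / window
        current = sum(rates[end - window:end]) / window
        if current - previous >= jump:
            flagged.append(end - 1)
    return flagged
```

A flagged index marks only a candidate stretch of dialog for an analyst to review; the thresholds would need tuning against real transcripts.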

ADVANTAGES OF SPA

• SPA captures the predictable patterns of human dialog, whereas keyword-based methods depend on spotting isolated key words or phrases, which can vary from speaker to speaker;

• Can be applied to different languages because it works by identifying conversational sequence patterns, which cut to the heart of the social architecture of language, rather than by matching a preset glossary of words; and

• Has the potential to perform data mining in real time, allowing a human analyst to act on the spot when hearing high-alarm content.

DEMONSTRATION

The following example shows how applying an SPA approach to wiretapped dialog can flag important security information that is cleverly disguised by the suspects:

Speaker “A” is trying to educate Speaker “B” about a new meeting place right at the tip of the Brooklyn Bridge. Any confusion or misunderstanding about this meeting place could spoil the plans.

But Speaker “A” is very clever:

First, he stays away from buzz words (such as naming a bridge, a tunnel or a street).

Second, he refrains from making any prefatory remarks or comments to the other speaker about how vital it is to get these instructions right.

Dialog Example

Speaker “A”: Come to the intersection near Juniors? (the question mark shows an upward intonation) 0.2 - 0.5 second pause (speaker then pauses briefly)

Speaker “B”: 1.2 second pause

Speaker “A”: You know the thoroughfare with the big traffic light?

Speaker “B”: Juniors, yeah.

THE SEQUENCE PACKAGE

Speaker “A”: Come to the intersection near Juniors? 0.2 - 0.5 second pause

Speaker “B”: 1.2 seconds of silence

• A noun referent (“Juniors”) with an upward intonation

• A brief pause, giving the listener the chance to show recognition or ask for clarification.

• Silence by the listener which indicates lack of understanding or confusion.
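The distinction between a speaker's brief pause and a listener's silence can be sketched as a simple duration check. The 0.2 to 0.5 second range comes from the dialog example; the 1.0 second cutoff for silence is an assumption made for this sketch (the example only shows a 1.2 second silence), and the function name `classify_gap` is hypothetical.

```python
def classify_gap(duration_s: float, brief=(0.2, 0.5), silence_min=1.0) -> str:
    """Label a gap in the talk by its duration in seconds.

    `brief` follows the 0.2-0.5 second pause in the dialog example;
    `silence_min` is an assumed cutoff, not a value given in the slides.
    """
    low, high = brief
    if duration_s >= silence_min:
        return "silence"
    if low <= duration_s <= high:
        return "brief_pause"
    return "other"
```

In this sketch the 1.2 second gap by Speaker “B” would be labeled `silence`, and a 0.3 second gap by Speaker “A” would be labeled `brief_pause`.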

Speaker “A”: You know the thoroughfare with the big traffic light?

Speaker “B”: Juniors, yeah.

• Speaker “A” produces a clarification of the noun referent (“Juniors”): “You know the thoroughfare with...”

• Speaker “B” produces a repeat of the noun referent (“Juniors”), the source of the recognition trouble, followed by a recognitional marker (“yeah”), which demonstrates to Speaker “A” that the misunderstanding has been corrected.

• Had Speaker “B” simply produced a recognitional marker (“yeah”) without mentioning the source of the trouble (“Juniors”), there would be no indication to Speaker “A” that he now recognizes the importance of the meeting place.

Finding the Sequence Package in the Dialog Example

Look for a concatenation of these utterance components:

• noun referent with upward intonation

• brief pause

• silence

• clarification of noun referent

• repeat of noun referent that was initial source of the recognition trouble

• recognitional marker
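Looking for this concatenation can be sketched as a search for a fixed run of labels in an annotated dialog. The sketch below assumes that an upstream step has already tagged each utterance component with one of the six labels; the label strings, the `Component` class, and `find_sequence_packages` are hypothetical names introduced for illustration, not part of SPA as presented.

```python
from dataclasses import dataclass

# The six utterance components, in order. Label names are assumptions.
PATTERN = (
    "noun_referent_rising",   # noun referent with upward intonation
    "brief_pause",
    "silence",
    "clarification",          # clarification of noun referent
    "repeat_of_referent",     # repeat of the trouble-source referent
    "recognitional_marker",
)


@dataclass
class Component:
    speaker: str
    label: str
    text: str = ""


def find_sequence_packages(components, pattern=PATTERN):
    """Return start indices where the labels match `pattern` contiguously."""
    labels = [c.label for c in components]
    n = len(pattern)
    return [
        i for i in range(len(labels) - n + 1)
        if tuple(labels[i:i + n]) == tuple(pattern)
    ]


# The dialog example, hand-annotated for this sketch.
dialog = [
    Component("A", "noun_referent_rising", "Come to the intersection near Juniors?"),
    Component("A", "brief_pause", "0.2-0.5 s"),
    Component("B", "silence", "1.2 s"),
    Component("A", "clarification", "You know the thoroughfare with the big traffic light?"),
    Component("B", "repeat_of_referent", "Juniors"),
    Component("B", "recognitional_marker", "yeah"),
]

matches = find_sequence_packages(dialog)
```

Here `matches` holds the start position of each sequence package found. Exact contiguous matching is the simplest possible design choice; real dialog would likely need tolerance for intervening material.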

CODA

The next step is the validation of SPA as a necessary tool for performing wiretap analysis.

Research Question:

Do data mining programs achieve a higher rate of accuracy in spotting terrorists when Sequence Package Analysis is added as a new method of natural language intelligence for performing wiretap analysis?