Advanced Decision Support for Archival Processing of Presidential E-Records: Results and Demonstration

William Underwood, P.I.
Georgia Tech Research Institute
Atlanta, Georgia

This research was sponsored by the Army Research Laboratory and NARA under Army Research Office Cooperative Agreement W911NF-06-2-0050 (Sept 22, 2006 - Sept 21, 2009).

Page 2: Overview

Document Type Recognition
Metadata Extraction
Item Description
Speech Act Recognition
Decision Support for Archival Review
File Format Identification
Demonstrations

Page 3: Document Types, Metadata and Archival Description

In responding to FOIA requests, archivists need to be able to search collections of records with high precision and recall.

◦ But at the time of responding to FOIA requests, archivists have not read all of the records, so they cannot index the records and search on such attributes as person, organization and location names, topics, dates, author’s and addressee’s names and document types.

Archivists cannot describe a collection until the collection has been manually read and reviewed.

◦ With increasing volumes of electronic records, it may be decades or even centuries before new acquisitions are described.

◦ Item Descriptions are needed in the results of FOIA Search

Page 4: Method for Recognizing Document Types

1. Document Reader
2. English Tokenizer
3. Wordlist Lookup + enhanced wordlists
4. Sentence Splitter
5. Hepple POS Tagger + lexicon
6. Semantic Tagger + Named Entity Rules
7. Intellectual Element Annotator + Intellectual Element Rules (DER)
8. SUPPLE Parser/Interpreter + Document Type Grammars augmented with Semantics
9. Extract Metadata
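The annotate-then-parse idea behind steps 7 and 8 can be illustrated with a small sketch. It is not the pipeline named on the slide: regular expressions stand in for the intellectual element rules, and a fixed ordered list of element names stands in for a document type grammar. The patterns, the `MEMO_FORM` list, the function names and the sample memo layout are all assumptions made for illustration.

```python
import re

# Hypothetical intellectual-element rules: one regex per element, each with a
# single capturing group holding the element's value.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
ELEMENT_RULES = [
    ("DATE", re.compile(rf"^((?:{MONTHS}) \d{{1,2}}, \d{{4}})$")),
    ("ADDRESSEE", re.compile(r"^MEMORANDUM FOR\s+(.+)$")),
    ("AUTHOR", re.compile(r"^FROM:\s+(.+)$")),
    ("TOPIC", re.compile(r"^SUBJECT:\s+(.+)$")),
]

# Hypothetical stand-in for a document type grammar: a memorandum is exactly
# these elements in this order.
MEMO_FORM = ["DATE", "ADDRESSEE", "AUTHOR", "TOPIC"]


def annotate(text):
    """Return (element, value) pairs for lines that match a rule, in order."""
    found = []
    for line in text.splitlines():
        line = line.strip()
        for name, pattern in ELEMENT_RULES:
            match = pattern.match(line)
            if match:
                found.append((name, match.group(1)))
                break
    return found


def recognize(text):
    """Return extracted metadata if the text has the memo form, else None."""
    annotations = annotate(text)
    if [name for name, _ in annotations] != MEMO_FORM:
        return None
    metadata = {"DOCTYPE": "Memorandum"}
    metadata.update(annotations)
    return metadata


# Made-up memo header, assembled from the metadata values shown on Page 8.
SAMPLE_MEMO = """April 27, 1992

MEMORANDUM FOR SAM SKINNER

FROM: EDE HOLIDAY

SUBJECT: California Earthquake

(body text)
"""

print(recognize(SAMPLE_MEMO))
```

A real grammar would allow optional and reordered elements; the strict list comparison here only shows how recognized elements, rather than raw text, drive the document type decision.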

Page 5: Documentary Form: Intellectual Element Recognition

Page 6: Grammar for Documentary Form of a Memorandum

Page 7: Parse Tree and Semantics of the Document

Page 8: Extracted Metadata and Item Description in Manifest

DOCTYPE = ‘White House Memorandum’
DATE = ‘April 27, 1992’
AUTHOR = ‘EDE HOLIDAY’
ADDRESSEE = ‘SAM SKINNER’
TOPIC = ‘California Earthquake’
DESCRIPTION = ‘Memorandum dated April 27, 1992 from EDE HOLIDAY to SAM SKINNER regarding California Earthquake’
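The DESCRIPTION field above reads like a template filled in from the other fields. A minimal sketch of that step, assuming the manifest entry is available as a plain dictionary; the function name `describe` and the rule for shortening the document type are assumptions, not the project's code:

```python
def describe(meta):
    """Build an item description sentence from extracted metadata."""
    # Assumption: the description uses the last word of the document type
    # ("White House Memorandum" -> "Memorandum"), as in the slide's example.
    kind = meta["DOCTYPE"].split()[-1]
    return (f"{kind} dated {meta['DATE']} from {meta['AUTHOR']} "
            f"to {meta['ADDRESSEE']} regarding {meta['TOPIC']}")


manifest_entry = {
    "DOCTYPE": "White House Memorandum",
    "DATE": "April 27, 1992",
    "AUTHOR": "EDE HOLIDAY",
    "ADDRESSEE": "SAM SKINNER",
    "TOPIC": "California Earthquake",
}
print(describe(manifest_entry))
```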

Page 9: Speech Acts and Record Description

Actions are a part of item descriptions

Signature Memorandum from Boyden Gray to the President recommending the nomination of Ronald B. Leighton to be a US District Judge.

Letter from President Bush to President Mikhail Gorbachev suggesting an informal meeting.

Memorandum from President Bush to Boyden Gray requesting an analysis of the War Powers Resolution.

Letter from Susan Black to President Bush expressing appreciation for nomination and commitment to serve.

Page 10: Speech Acts and Archival Review

Archival review in response to FOIA requests requires recognition of the actions expressed in records.

Presidential Records Act restriction on disclosure a(5), “Confidential Advice”:

“confidential communications requesting or submitting advice, between the President and his advisors, or between his advisors”

Example of action expressing confidential advice:

“I further recommend that the President look for opportunities to speak at an appropriate event indicating his knowledge of and interest in this issue, …”

Page 11: Explicit & Implicit Speech Acts

Every complete sentence carries out a speech act.

Performative sentences express explicit speech acts.

A performative verb is a verb whose action is accomplished merely by saying it or writing it.

Example: I recommend that you attend the conference.

Declarative, imperative and interrogative sentences express implicit speech acts.

◦ Declarative (state)
  You completed the report.
◦ Imperative (request)
  Please, complete the report.
◦ Interrogative (ask)
  Did you complete the report?
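The explicit/implicit distinction can be sketched as a toy classifier. This is a surface heuristic only, not the parser-based method described in the presentation: the verb list is a small illustrative sample, and treating a leading "please" as the mark of an imperative is an assumption that misses most real imperatives.

```python
import re

# Small illustrative sample; a real lexicon of performative verbs is far larger.
PERFORMATIVE_VERBS = {"recommend", "request", "suggest", "promise", "advise"}


def classify(sentence):
    """Return (kind, act) where kind is 'explicit' or 'implicit'."""
    words = re.findall(r"[a-z']+", sentence.lower())
    # Explicit: first-person subject followed (optionally after one adverb)
    # by a performative verb, e.g. "I further recommend ...".
    if words and words[0] == "i":
        for word in words[1:3]:
            if word in PERFORMATIVE_VERBS:
                return ("explicit", word)
    # Implicit: fall back on sentence mood, guessed from surface cues.
    if sentence.rstrip().endswith("?"):
        return ("implicit", "ask")
    if words and words[0] == "please":
        return ("implicit", "request")
    return ("implicit", "state")


for example in [
    "I recommend that you attend the conference.",
    "You completed the report.",
    "Please, complete the report.",
    "Did you complete the report?",
]:
    print(example, "->", classify(example))
```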

Page 12: A Method for Recognizing Speech Acts in E-Records

Input: Textual document and metadata from the manifest

1. Read author and addressee metadata from the manifest
2. Information extraction
3. Parse sentences in the document
4. Speech Act Transducer
◦ Annotate Explicit Speech Acts
◦ Annotate Implicit Speech Acts
◦ Annotate Speech Acts Indicated by Text Structure
◦ Annotate Indirect Speech Acts
◦ Annotate the Primary Speech Acts

Output: [document(e1), author(e1, S), addressee(e1, H), act(e1, F(P))]
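The output line is a list of logical terms about one document. Following the usual speech act notation, S is taken here to be the speaker (author), H the hearer (addressee), F the illocutionary force and P the propositional content; that reading is an assumption, since the slide does not define the symbols. A minimal sketch of assembling such a record, with a hypothetical function name and made-up argument values:

```python
def speech_act_record(doc_id, author, addressee, force, content):
    """Return the terms describing one recognized speech act as strings."""
    return [
        f"document({doc_id})",
        f"author({doc_id}, {author})",
        f"addressee({doc_id}, {addressee})",
        f"act({doc_id}, {force}({content}))",
    ]


# Illustrative values only; the identifiers are invented for this example.
record = speech_act_record(
    "e1", "boyden_gray", "president", "recommend", "nominate(leighton)"
)
print("[" + ", ".join(record) + "]")
```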

Page 13: Decision Support for Archival Review

FOIA (and systematic) review of Presidential records for PRA and FOIA restrictions on disclosure requires page-by-page review of the records.

Due to the increasing volume of records in all branches of Government, and especially the EOP, decision support is needed to assist archivists in review.

Page 14: Potential Benefits of Archival Review Assistant

Reducing the risk of opening a document or passage of a record whose access should be restricted.

A tutoring tool during training of review archivists.

A tool that novice reviewers could use to check their work.

Provision of additional evidence in case a reviewer's judgment was uncertain, or pointing out uncertainties where the reviewer thought the decision was certain.

Support estimation of FOIA review workload in terms of the number of restrictions and types of restrictions likely to apply.

Support reviews of Federal Records for FOIA exemptions.

Extension of the technology to support declassification of security classified records.

Page 15: Components of an Archival Review Assistant

Page 16: File Format Identification

A capability to identify file formats is needed by ERA for
◦ Ensuring compliance with Record Transmittal Agreement
◦ Viewing/playing files
◦ Conversion to current or standard file formats
◦ Archive extraction
◦ Password recovery and decryption
◦ Repair of damaged files

Page 17: Linux File Command & Magic File

Page 18: Extensions of File Command and Magic File

Magic for individual file formats
Output of file command/magic file is File Format ID
Rewriting file command code for identifying characteristics of text files and document types
Defined approx. 800 file format signatures
Collected examples of approx. 500 of the file format types
Created File Signature Database
Verified that File Format Identifier with magic file correctly identifies approx. 500 file types
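Signature-based identification of the kind the file command performs can be sketched in a few lines. This is an illustration of the general technique, not the project's File Format Identifier or its signature database: the table holds a handful of widely published signatures, and the labels returned are placeholders rather than the File Format IDs mentioned above.

```python
# (byte offset, magic bytes, label). A tiny sample of well-known signatures.
SIGNATURES = [
    (0, b"%PDF-", "PDF document"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"PK\x03\x04", "ZIP archive"),
    (0, b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound document"),
    (257, b"ustar", "tar archive"),
]


def identify(data):
    """Return the label of the first signature found in the leading bytes."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


def identify_file(path):
    """Read enough of the file to cover every signature offset, then identify."""
    with open(path, "rb") as handle:
        return identify(handle.read(512))


print(identify(b"%PDF-1.4\n"))
print(identify(b"plain text with no signature"))
```

Container formats share signatures (many office formats are ZIP or OLE2 files underneath), so a byte-prefix table alone cannot separate them; this is one reason a larger set of per-format magic entries is needed.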

Page 19: Demonstrations

1. Document Type Recognition, Metadata Extraction & Item Description

2. Automatic Recognition and Interpretation of Performative Sentences

3. Decision Support for Archival Review

4. File Format Library & File Format Identifier

Page 20: Additional Information

1. W. Underwood et al. Advanced Decision Support for Archival Processing of Presidential E-records, Technical Report ITTL/CSITD 09-01, GTRI, Sept 2009

2. W. Underwood & S. Laib. Automatic Recognition of Documentary Forms, Technical Report ITTL/CSITD 08-02, GTRI, May 2008

3. W. Underwood. Recognizing Speech Acts in Presidential E-records, Technical Report ITTL/CSITD 08-03, GTRI, Oct 2008