
Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Page 1: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Applying Existing Technology to Exploitation of Multiple Sources of Information

Mike Brenton

Sterling Software

Memex Technology Limited

Page 2: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Problem Statement

• First there was data overload.

• Now there is an overabundance of tool power.

Page 3: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Information Types

The Migration Defense Intelligence Threat Data System (MDITDS) is a Department of Defense Intelligence Information Systems (DODIIS) designated migration system tasked to provide the automated production system for the DODIIS Indications and Warnings (I&W), Counterintelligence (CI), Anti-terrorism (AT), Counterterrorism (CT), Information Warfare (IW), Arms Proliferation (AP), and Defense Industry (DI) communities.

First Name   Last Name   Project   Phone #   Office #
Michael      Brenton     MDITDS    x7060     249
Kornegay     Harold      MDITDS    x7049     221

Sterling Software Announces 2-For-1 Stock Split

DALLAS, Texas (March 11, 1998) - Sterling Software, Inc. (SSW-NYSE) today announced that its Board of Directors has approved a 2-for-1 split of the company’s common stock. Stockholders will receive one additional common share for every share held on the record date of March 20, 1998. The additional shares will be issued on April 3, 1998.

Sterling Software currently has approximately 38.8 million shares of common stock outstanding. This number will double to approximately 77.6 million shares by reason of the stock split.

Sterling L. Williams, president and chief executive officer of Sterling Software commented, "Sterling Software’s stock price increased 30% during 1997 and 28% so far this year, based on consistently excellent performance by the company. We decided to split our stock to improve its trading liquidity and to help ensure that it trades in a price range that is accessible to a broad base of investors."

Sterling Software is a leading provider of software and services for the applications management, systems management and federal systems markets. Sterling Software, with its headquarters in Dallas, has a worldwide installed base of more than 20,000 customer sites and has 3,100 employees in 85 offices worldwide. For more information on Sterling Software, visit the company’s Web site at http://www.sterling.com.

Contact: Julie Kupp Vice President, Investor Relations Sterling Software, Inc. (214) 981-1000 [email protected]

©Copyright Sterling Software, 1998 All rights reserved

Page 4: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Open Source Materials

• Electronic Information
  – Library Services
  – On-line Newspapers
  – On-line Reports
  – Information Brokers
  – CD-ROM Products
  – Wire Services

• Agents
  – Services - People
  – Services - Push and Punch
  – Spiders, Crawlers, and Profilers


Page 5: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Tools

• Data Warehouses
• Concept Analysis and Summarization
• Vectors, Clustering, Histograms
• Data Mining
• OLAP
• Statistical Analysis
• Visualization
• Information Extraction
• Temporal Analysis
• Link Analysis

Page 6: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Data Warehouses

• Data warehousing is an emerging technology that supports non-operational application areas like management information systems, decision support, and data mining.

• A data warehouse is a database that provides efficient and integrated access to relevant analytical data.

Department of Information Science - The Aarhus School of Business


Page 7: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Memex Information Engine and Client Applications

[Figure: network of sites. Labels: DIA, EUCOM, JICPAC, SOUTHCOM, CENTCOM, STRATCOM, SPACECOM, TRANSCOM]

Products
• Country Profiles
• Group (Unit) Profiles
• Individual Profiles
• Incidents (Events)
• Misc. Assessments
• All

Domains
• Counter Intelligence
• Counter Terrorism
• Force Protection
• Arms Proliferation
• Defense Industries
• Indications & Warning
• All

[Figure: Memex Network Query Tool screens]

Field Search:
– Name
– Incident Type
– Organization Type
– Equipment
– Start Date / Stop Date

Text Search:

Result list (first screen):
1 -- 90% -- Air Power Over Bosnia
2 -- 85% -- UK Air Power and NATO
3 -- 70% -- Air Power Assessment
4 -- 65% -- Munitions on Tactical Fighters
5 -- 50% -- Tactical Fighters and LSB
6 -- 50% -- Smart Munitions
7 -- 45% -- Air Dropped Land Mines

Result list (second screen, reordered):
4 -- 65% -- Munitions on Tactical Fighters
6 -- 50% -- Smart Munitions
7 -- 45% -- Air Dropped Land Mines
2 -- 85% -- UK Air Power and NATO
1 -- 90% -- Air Power Over Bosnia
3 -- 70% -- Air Power Assessment
5 -- 50% -- Tactical Fighters and LSB

Page 8: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Concept Analysis and Summarization

• Concept analysis is the process of matching keywords in the text to hierarchical topic trees in order to determine the major theme(s) in the document, paragraph, or sentence.

• Some systems use this information and predetermined “templates” to build summaries of a document.

• The concepts and summaries are then used to route documents to analysts.
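As a rough illustration of matching keywords against a hierarchical topic tree, the sketch below counts keyword hits per topic and rolls them up to the parent topics. The topic tree, its keywords, and the function names are hypothetical and chosen only for the example; real concept-analysis products use far richer trees and matching rules.

```python
import re
from collections import Counter

# Hypothetical topic tree: each leaf topic path maps to its keywords.
TOPIC_TREE = {
    ("Finance", "Equities"): {"stock", "shares", "split"},
    ("Finance", "Earnings"): {"revenue", "profit"},
    ("Military", "Air Power"): {"fighters", "munitions", "sorties"},
}

def concept_scores(text):
    """Count keyword hits per topic and roll the counts up to parent topics."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = Counter()
    for path, keywords in TOPIC_TREE.items():
        hits = sum(words[k] for k in keywords)
        if hits:
            # Credit the leaf topic and every ancestor on its path.
            for depth in range(1, len(path) + 1):
                scores[path[:depth]] += hits
    return scores

def major_theme(text):
    """Return the highest-scoring top-level topic, or None if nothing matched."""
    top_level = {p: s for p, s in concept_scores(text).items() if len(p) == 1}
    if not top_level:
        return None
    return max(top_level, key=top_level.get)[0]

doc = "The board approved a stock split; stockholders receive additional shares."
print(major_theme(doc))
```

The returned theme could then serve as the routing key that sends a document to the analyst queue for that topic.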

Page 9: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Vectors, Clustering, and Histograms

• Document clustering is a technique for automatically discovering the subtopics in a set of documents and grouping the documents by those subtopics.

• Organizing documents by subtopic can help you get a sense of the major subject areas covered in the document set…

Verity, Inc.
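One simple way to picture document clustering is to turn each document into a term-frequency vector and group documents whose vectors are similar. The sketch below uses cosine similarity with a greedy single-pass grouping; the 0.3 threshold and the function names are assumptions made for the example, and this is not a description of how any particular vendor's product works. The sample titles are taken from the query-tool result list shown earlier in the deck.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn a document into a term-frequency vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(titles, threshold=0.3):
    """Greedy single-pass clustering against the first member (seed) of each cluster."""
    clusters = []
    for title in titles:
        vec = vectorize(title)
        for members in clusters:
            if cosine(vec, vectorize(members[0])) >= threshold:
                members.append(title)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([title])
    return clusters

TITLES = [
    "Air Power Over Bosnia",
    "UK Air Power and NATO",
    "Air Power Assessment",
    "Munitions on Tactical Fighters",
    "Tactical Fighters and LSB",
    "Smart Munitions",
    "Air Dropped Land Mines",
]

for group in cluster(TITLES):
    print(group)
```

Because each title is compared only with the seed of a cluster, the result depends on input order; a production system would more likely use centroids or a hierarchical method.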


Page 10: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Data Mining

• Data mining is the analysis of data for relationships that have not previously been discovered.

• For example, the sales records for a particular brand of tennis racket might, if sufficiently analyzed and related to other market data, reveal a seasonal correlation with the purchase by the same parties of golf equipment.

whatis.com Inc.
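A minimal sketch of the tennis racket example: given monthly unit sales for two product lines, a Pearson correlation coefficient near 1 would suggest the seasonal relationship described. All sales figures below are invented for illustration, and the sketch simplifies the example by correlating monthly totals instead of tracking purchases by the same parties.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical monthly unit sales, January through December.
racket_sales = [5, 8, 20, 35, 50, 60, 62, 55, 30, 12, 6, 4]
golf_sales = [4, 7, 18, 30, 48, 58, 65, 50, 28, 10, 5, 3]

r = pearson(racket_sales, golf_sales)
print(f"correlation between racket and golf sales: {r:.2f}")
```

A high coefficient is only a lead for an analyst to follow up; it does not by itself explain why the two series move together.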


Page 11: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

OLAP

• OLAP (online analytical processing) enables a user to easily and selectively extract and view data from different points-of-view.

• For example, display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July, 1997, then compare revenue figures with those for the same products in July, 1996, and so on.

whatis.com Inc.
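The beach ball example amounts to slicing a set of sales facts by some dimensions and summing revenue over the rest. The sketch below does that over an in-memory list; the records, field names, and the `rollup` helper are hypothetical, and a real OLAP server would precompute and index such aggregates.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, state, year, month, revenue).
FIELDS = ("product", "state", "year", "month", "revenue")
SALES = [
    ("beach ball", "Florida", 1996, 7, 1200.0),
    ("beach ball", "Florida", 1997, 7, 1500.0),
    ("beach ball", "Georgia", 1997, 7, 400.0),
    ("sun hat", "Florida", 1997, 7, 300.0),
    ("beach ball", "Florida", 1997, 8, 900.0),
]

def rollup(facts, group_by, **filters):
    """Sum revenue per group after applying equality filters (the "slice")."""
    totals = defaultdict(float)
    for fact in facts:
        row = dict(zip(FIELDS, fact))
        if all(row[name] == wanted for name, wanted in filters.items()):
            totals[tuple(row[dim] for dim in group_by)] += row["revenue"]
    return dict(totals)

# Beach ball revenue in Florida in July, compared year over year.
print(rollup(SALES, ("year",), product="beach ball", state="Florida", month=7))

# A different point of view on the same facts: July 1997 revenue by state.
print(rollup(SALES, ("state",), year=1997, month=7))
```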


Page 12: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Statistical Analysis

• The collection, classification, and interpretation of numerical data.

• Elements of statistics are present in most OLAP tool sets.

• Functions include: Frequency Distribution, Average, Mean, Standard Deviation, etc.

• These functions are found in most spreadsheet applications.
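For illustration, the functions named on this slide can be computed with Python's standard library. The data series is invented for the example.

```python
import statistics
from collections import Counter

# Hypothetical series, e.g. incidents reported per week.
incident_counts = [2, 4, 4, 4, 5, 5, 7, 9]

frequency = Counter(incident_counts)              # frequency distribution
mean = statistics.mean(incident_counts)           # arithmetic mean (average)
std_dev = statistics.pstdev(incident_counts)      # population standard deviation

print("frequency distribution:", dict(frequency))
print("mean:", mean)
print("standard deviation:", std_dev)
```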

Page 13: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Visualization

• Visualization is the process of representing abstract business or scientific data as images that can aid in understanding the meaning of the data.

• Visual computing is computing that lets you interact with and control work through visualization.

whatis.com Inc.

Page 14: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Information Extraction

• Automated information extraction involves the identification and extraction of information about specified classes of events and the filling of templates for each instance of such an event.

• Operates against pure text.

• Also known as NLU or NLP.

• Naval Research and Development group (NRaD) of NOSC

Page 15: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Temporal Analysis

• Temporal analysis is the process of evaluating information, events and activities in light of models which encompass the concept of time or sequence and time.

• Model sequences incorporate a timeframe constraint on the identified events.
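A minimal sketch of a model sequence with a timeframe constraint: the function below checks whether a given series of event types occurs in order within a fixed window. The event-type labels and the window lengths are assumptions for the example; the dates are those in the stock-split press release used elsewhere in this deck.

```python
from datetime import date, timedelta

def matches_model(events, model, window):
    """Return True if the event types in `model` occur in order within `window`.

    `events` is a list of (date, event_type) pairs; `model` is a list of
    event types; `window` is the maximum span from the first matched event.
    """
    ordered = sorted(events)
    for i, (start, kind) in enumerate(ordered):
        if kind != model[0]:
            continue
        step = 1  # number of model steps matched so far
        for when, later_kind in ordered[i + 1:]:
            if when - start > window:
                break
            if step < len(model) and later_kind == model[step]:
                step += 1
        if step == len(model):
            return True
    return False

EVENTS = [
    (date(1998, 3, 20), "record date"),
    (date(1998, 3, 11), "announcement"),
    (date(1998, 4, 3), "share issue"),
]
MODEL = ["announcement", "record date", "share issue"]

print(matches_model(EVENTS, MODEL, timedelta(days=30)))
print(matches_model(EVENTS, MODEL, timedelta(days=14)))
```

With a 30-day window the full sequence fits; with a 14-day window the share issue falls outside the timeframe, so the model does not match.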

Page 16: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Link Analysis

• Link analysis provides the ability to investigate relationships between people, places, events, and things.

• Ideally, it is a mechanism to “walk through” a data warehouse following those links which have meaning relevant to the immediate problem.
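The idea of "walking through" linked records can be sketched as a breadth-first search over a graph of entities. The entities and links below are placeholders invented for the example, and the helper names are hypothetical.

```python
from collections import deque

# Hypothetical links between people, organizations, things, and places.
LINKS = [
    ("Person A", "Organization X"),
    ("Person B", "Organization X"),
    ("Person B", "Vehicle 1"),
    ("Vehicle 1", "Location Y"),
    ("Person C", "Location Z"),
]

def build_graph(links):
    """Build an undirected adjacency map from pairs of linked entities."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def find_path(graph, start, goal):
    """Breadth-first search: shortest chain of links from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = build_graph(LINKS)
print(find_path(graph, "Person A", "Location Y"))
print(find_path(graph, "Person A", "Location Z"))
```

Following only links that are relevant to the immediate problem would correspond to filtering `LINKS` by link type before building the graph.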

Page 17: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Tools are nice but...

• There has to be a reason:

• Analysis of operational data

• Analysis of associated data

• Discovering new relationships

• Discovering new trends

• Gaining new insights into your business

• Competitive Edge

Page 18: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Different Tools for Different Kinds of Discovery

Page 19: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Information Extraction

• Translating text reports (prose) into “tagged data”

• Evaluating the tagged data to extract information

• Commonly referred to as Natural Language Understanding or Processing

Page 20: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

A Focus on the Analysis of Textual Information

• Typical process flow
  – Receipt
  – Auto-analysis
    • Classification
    • Extraction
  – Archive
  – Visualization

[Process flow diagram. Labels: Wire Service, Government Traffic, Traffic Receipt, Ten-Plus-Year Repository, Analyst Queues, Review Process, Analyze, Think, Update Assessment, Update Queue, Profiles, Ignore]

Page 21: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Making the Information Usable

Sterling Software Announces 2-For-1 Stock Split

DALLAS, Texas (March 11, 1998) - Mr. Sterling Williams of Sterling Software, Inc. (SSW-NYSE) today announced that the company’s Board of Directors has approved a 2-for-1 split of the company’s common stock. Stockholders will receive one additional common share for every share held on the record date of March 20, 1998. The additional shares will be issued on April 3, 1998.

Sterling Software currently has approximately 38.8 million shares of common stock outstanding. This number will double to approximately 77.6 million shares by reason of the stock split.

Org        Sterling Software     US Corporation
Group      Board of Directors    Sterling Software
Group      Stockholders          Sterling Software
Location   Dallas, Texas
Object     stock shares          38.8 million
Object     stock shares          77.6 million
Date       11-Mar-98
Date       3-Apr-98
Date       20-Mar-98
Event      meeting               Board of Directors
Event      stock split           20-Mar-98
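Once the prose has been reduced to tagged data, it can be held as simple records and queried. The sketch below stores the tags from this slide; the field names (`kind`, `value`, `detail`) and helper functions are assumptions introduced for the example, not part of any described system.

```python
from datetime import datetime
from typing import NamedTuple, Optional

class Tag(NamedTuple):
    kind: str                      # tag type, e.g. "Org", "Date", "Event"
    value: str                     # the tagged text
    detail: Optional[str] = None   # third column, where the slide gives one

TAGS = [
    Tag("Org", "Sterling Software", "US Corporation"),
    Tag("Group", "Board of Directors", "Sterling Software"),
    Tag("Group", "Stockholders", "Sterling Software"),
    Tag("Location", "Dallas, Texas"),
    Tag("Object", "stock shares", "38.8 million"),
    Tag("Object", "stock shares", "77.6 million"),
    Tag("Date", "11-Mar-98"),
    Tag("Date", "3-Apr-98"),
    Tag("Date", "20-Mar-98"),
    Tag("Event", "meeting", "Board of Directors"),
    Tag("Event", "stock split", "20-Mar-98"),
]

def by_kind(tags, kind):
    """Select all tags of one kind."""
    return [t for t in tags if t.kind == kind]

def dates_in_order(tags):
    """Parse the Date tags (day-month-year, e.g. 11-Mar-98) and sort them."""
    parsed = [datetime.strptime(t.value, "%d-%b-%y").date() for t in by_kind(tags, "Date")]
    return sorted(parsed)

print([t.value for t in by_kind(TAGS, "Event")])
print([d.isoformat() for d in dates_in_order(TAGS)])
```

Date parsing with `%b` assumes English month abbreviations, which holds under Python's default locale.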

Page 22: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Information Extraction is not Information Retrieval

Information extraction gets facts out of documents -- you analyze the facts

Information retrieval gets sets of relevant documents -- you analyze the documents

Natural Language Processing Group, The University of Sheffield

Page 23: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Why is Information Extraction Difficult?

• There are many ways of expressing the same fact:
  – BNC Holdings Inc named Ms G Torretta as its new chairman.
  – Nicholas Andrews was succeeded by Gina Torretta as chairman of BNC Holdings Inc.
  – Ms. Gina Torretta took the helm at BNC Holdings Inc.

• Information may need to be combined across several sentences:
  – After a long boardroom struggle, Mr Andrews stepped down as chairman of BNC Holdings Inc. He was succeeded by Ms Torretta.

Natural Language Processing Group, The University of Sheffield
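To make the difficulty concrete, the sketch below needs a separate hand-written pattern for each of the three single-sentence phrasings on this slide before all of them yield the same (organization, new chairman) fact. The regular expressions and the surname normalization are deliberately naive assumptions for the example; they would not cope with the multi-sentence case, which needs coreference resolution.

```python
import re

TITLE = r"(?:Ms\.? |Mr\.? )?"
NAME = r"[A-Z][\w ]+?"

# One hypothetical pattern per phrasing of the same "new chairman" fact.
PATTERNS = [
    re.compile(rf"(?P<org>{NAME}) named {TITLE}(?P<person>{NAME}) as its new chairman"),
    re.compile(rf"succeeded by {TITLE}(?P<person>{NAME}) as chairman of (?P<org>{NAME})\.?$"),
    re.compile(rf"{TITLE}(?P<person>{NAME}) took the helm at (?P<org>{NAME})\.?$"),
]

def extract_succession(sentence):
    """Return (organization, surname of new chairman), or None if no pattern matches."""
    for pattern in PATTERNS:
        match = pattern.search(sentence)
        if match:
            # Normalize "G Torretta" and "Gina Torretta" to the surname only.
            surname = match.group("person").split()[-1]
            return match.group("org"), surname
    return None

SENTENCES = [
    "BNC Holdings Inc named Ms G Torretta as its new chairman.",
    "Nicholas Andrews was succeeded by Gina Torretta as chairman of BNC Holdings Inc.",
    "Ms. Gina Torretta took the helm at BNC Holdings Inc.",
]

for sentence in SENTENCES:
    print(extract_succession(sentence))
```

Every additional phrasing would need another pattern, which is one reason practical extraction systems rely on linguistic analysis instead of surface patterns alone.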

Page 24: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Information Extraction (Document)

• Natural Language Understanding

[Pipeline diagram. Labels: Lexical Analysis, Article, Reduction, Simple Relations, Common Events, Coreference, Domain Events, Records]

Page 25: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Correlation of Extracted Information (Other Documents)

• The events in a single document are relevant to routing the document,

• But a single meeting (event) put in context of other meetings (events) becomes much more useful.

• Manual vs. Automated Process

• User interest profiles, e.g.,
  – Membership
  – Meeting (Communication) Events
  – Relocation (Movement) Events
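As a sketch of putting a single event in the context of events from other documents, the code below gathers extracted events that involve anyone named in a user interest profile and counts who meets with whom. The event records, document identifiers, and helper names are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical extracted events: (document id, date, event type, participants).
EVENTS = [
    ("doc-1", date(1998, 3, 2), "meeting", ("Person A", "Person B")),
    ("doc-2", date(1998, 3, 9), "relocation", ("Person B",)),
    ("doc-3", date(1998, 3, 5), "meeting", ("Person B", "Person C")),
    ("doc-4", date(1998, 3, 12), "meeting", ("Person D", "Person E")),
]

def timeline_for(events, profile):
    """Events from all documents involving anyone in the profile, oldest first."""
    hits = [e for e in events if set(e[3]) & profile]
    return sorted(hits, key=lambda e: e[1])

def contacts(events, person):
    """Count how often `person` shares a meeting event with each other participant."""
    counts = defaultdict(int)
    for _, _, kind, people in events:
        if kind == "meeting" and person in people:
            for other in people:
                if other != person:
                    counts[other] += 1
    return dict(counts)

profile = {"Person B"}
print([doc_id for doc_id, *_ in timeline_for(EVENTS, profile)])
print(contacts(EVENTS, "Person B"))
```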

Page 26: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Using Correlated Data (Mining Text (or other) Databases)

• What would the user do if they knew how to use the visualization tools?

• Automate the process:
  – Use names of people and organizations for data mining.
  – Use temporal analysis to align (chronologically) the events.
  – Use link analysis to establish networks of people and things, e.g., vehicles.

• Present the user with organized information.

Monitor the success of the process and feed back the results into the system.

Page 27: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Summary

• Still faced with a tremendous amount of data.

• Tools are available for acquiring information relevant to your business.

• Tools to perform data mining over a substantial data warehouse require a commitment to:
  – Money
  – Time
  – Training
  – Personnel

• The results are:

Page 28: Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

Thank you

Mike Brenton

Sterling Software

www.sterling.com

[email protected]

-------------

Memex Technology Limited

www.memex.co.uk

------------

Jim Basara

Memex, Inc.

[email protected]