CASOS Summer Institute 2010
www.casos.cs.cmu.edu

AutoMap: Extracting usable information from unstructured texts

Daniel Chieffallo, Michael W. Bigrigg, Prof. Kathleen M. Carley, Frank Kunkel, Dave Columbus, Todd Eisenberg

This work is part of the Dynamic Networks project at the Center for Computational Analysis of Social and Organizational Systems (CASOS) of the School of Computer Science (SCS) at Carnegie Mellon University (CMU). Support was provided, in part, by the Office of Naval Research (ONR HSCB N000140811223, MMV - N00014-06-1-0104, MURI N000140811186), the Army Research Institute (W91WAW07C0063) and CASOS. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Office of Naval Research, the Army Research Institute or the U.S. government.

AutoMap is a text mining tool that enables the extraction of network data from texts. AutoMap can extract four types of information:
• content (concepts, frequencies and meta-data such as sentence length)
• semantic networks (concepts and relationships)
• meta-networks (ontologically coded concepts and relationships – named entities and links)
• sentiment and node attributes (attributes of named entities)

Part-of-speech tagging, machine learning techniques and proximity analysis enable computer-assisted Network Text Analysis (NTA). NTA encodes the links among words in a text and constructs a network of the linked words.

[Figure: The GUI interface]

Extraction of the network is done using AutoMap. Additional information is added with post-processing data fusion routines. Analysis and visualization of networks is done using *ORA.
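The proximity analysis that NTA relies on can be illustrated with a small sketch. It assumes whitespace tokenization and a fixed-size sliding window, and it links each word to the words that follow it inside that window. The function name `extract_semantic_network` and the `window` parameter are hypothetical names chosen for this illustration; this is not AutoMap's actual implementation.

```python
from collections import Counter


def extract_semantic_network(text, window=2):
    """Count directed links between words that co-occur within a window.

    Assumption for illustration: a word is linked to each of the next
    `window` words that follow it in the text.
    """
    tokens = text.lower().split()
    edges = Counter()
    for i, source in enumerate(tokens):
        # Look ahead at most `window` positions from the current word.
        for target in tokens[i + 1 : i + 1 + window]:
            if source != target:  # skip self-links
                edges[(source, target)] += 1
    return edges


if __name__ == "__main__":
    for (source, target), weight in extract_semantic_network(
        "John owns cattle", window=2
    ).items():
        print(source, target, weight)
```

A larger window links words that are farther apart, which yields a denser network; a window of 1 links only adjacent words.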
AutoMap: Data Extraction

Preprocessing: text modification prior to data extraction. Support tools exist for generating and editing thesauri, ontologies, and delete lists.

• Texts can be analyzed in real time
• The user identifies the desired manipulations in the GUI
• Manipulations can be saved as a workflow script
• This script can be run in batch mode

• The ScriptRunner can read AutoMap scripts
• Using ScriptRunner the user can:
    • Add in web scrapers
    • Add post-processing options, e.g. add lat-lon from gazetteers
• The SuperScript module uses all processors in a multi-core environment, parallelizing the text processing

ScriptRunner Interface: drag & drop editor to adjust workflow
• Palette of available commands
• Canvas of the current commands; drag and drop new items here
• Multiple tabs for different phases of processing

Jana Diesner

Unstructured Data: Texts

Preprocessing
• Remove Symbols
• Convert to Lowercase
• Stemming
• Delete List
• Generalization
• Bigrams
• Named entity extraction
• Key word in context

Post Processing
• Event inference
• Belief inference
• Node attribute data fusion
• Lat-Lon data fusion

AutoMap is available as a stand-alone application as well as a module in *ORA & SORASCS.

CSV & DyNetML Output (sample concept list)
the         20
a           30
John_Doe    15
cattle      10
bars         8
argument     1
gnostic      1

ORA: Analysis

Specialized text analytics:
• Theme identification – LSA
• Hot Topics
• Semantic Network
• Parts of Speech
• Communicators
• Communicative Power

And all social and dynamic network tools
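A few of the preprocessing steps named above (remove symbols, convert to lowercase, generalization, delete list) and the concept list output can be sketched as follows. This is a minimal illustration under stated assumptions, not AutoMap's implementation: the function names `preprocess` and `concept_list_csv`, the dictionary-based thesaurus, and the `concept,frequency` CSV header are all hypothetical choices made for the example.

```python
import csv
import io
import re
from collections import Counter


def preprocess(text, delete_list, thesaurus):
    """Apply symbol removal, lowercasing, generalization and a delete list."""
    text = re.sub(r"[^\w\s]", " ", text)  # remove symbols (underscores are kept)
    text = text.lower()                   # convert to lowercase
    # Generalization: map each thesaurus phrase to a single concept.
    # Thesaurus keys are assumed to be lowercase phrases.
    for phrase, concept in thesaurus.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", concept, text)
    # Delete list: drop words that carry no content.
    return [token for token in text.split() if token not in delete_list]


def concept_list_csv(tokens):
    """Return a concept/frequency table as CSV text, most frequent first."""
    counts = Counter(tokens)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["concept", "frequency"])  # hypothetical header
    for concept, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([concept, freq])
    return buffer.getvalue()


if __name__ == "__main__":
    tokens = preprocess(
        "John Doe sold the cattle. The cattle, John Doe said, were fine!",
        delete_list={"the", "were"},
        thesaurus={"john doe": "John_Doe"},
    )
    print(concept_list_csv(tokens))
```

Running generalization before the delete list keeps multi-word concepts such as `John_Doe` intact even when one of their words would otherwise be deleted; the order of steps is a design choice of this sketch.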

