NITISH MANOCHA

Platforms

  • AIX workstation
  • OS/390
  • Sun Solaris
  • Windows NT

Tools to Use

Topic Categorization Tool

  • Categorizing emails
  • Categorizing Web pages

Text Analysis Tool: Topic Categorization Tool

Category 1 (AI Schedule)

Category 2 (Database Schedule)

Target Category (Data Mining Schedule)

Result - Category 2 (Databases)
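The example above (two trained categories, one target document, one resulting category) can be illustrated with a small sketch. This is not how the Topic Categorization Tool works internally; it is a toy bag-of-words comparison using cosine similarity, and the training texts, the target text, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(target, training):
    """Return the best-scoring category name and all category scores."""
    target_vec = bag_of_words(target)
    scores = {
        name: cosine(target_vec, bag_of_words(text))
        for name, text in training.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical stand-ins for the AI and Database schedule pages.
training = {
    "AI": "neural networks search heuristics planning agents learning",
    "Databases": "relational tables queries transactions indexes warehouse records",
}

# Hypothetical stand-in for the Data Mining schedule page.
target = "mining patterns from large tables of records using queries and indexes"

best, scores = categorize(target, training)
print(best, scores)
```

With these made-up texts the target shares vocabulary only with the Databases category, which mirrors the result shown on the slide.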

Tools to Use

Clustering Tool (Finding Similar Information)

  • Dividing documents into groups
  • Identifying hidden similarities in documents
  • Identifying duplicate documents from a collection
  • Finding documents that are out of place
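To make the idea of grouping similar or duplicate documents concrete, here is a minimal sketch. It does not reproduce the Clustering Tool's algorithms; it is a simple greedy grouping by Jaccard similarity of word sets, chosen only because it is short. The file names, texts, and the 0.5 threshold are hypothetical.

```python
import re


def word_set(text):
    """Set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_similar(docs, threshold=0.5):
    """Greedy grouping: each document joins the first group whose
    first member is at least `threshold` similar, else starts a new group."""
    groups = []
    for name, text in docs.items():
        words = word_set(text)
        for group in groups:
            if jaccard(words, group["words"]) >= threshold:
                group["members"].append(name)
                break
        else:
            groups.append({"words": words, "members": [name]})
    return [g["members"] for g in groups]


# Hypothetical collection: a.txt and b.txt differ only in letter case.
docs = {
    "a.txt": "database schedule for the fall term",
    "b.txt": "Database schedule for the fall term",
    "c.txt": "neural networks and planning agents",
}

groups = group_similar(docs)
print(groups)
```

A group with more than one member flags likely duplicates, while a group of one can point to a document that is out of place in the collection.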

Text Analysis Tool: Hierarchical Clustering - imzhclst

Text Analysis Tool: Binary Clustering - imzcrlst

Text Analysis Tool Results

Tools to Use

Feature Extraction Tool

  • Name extraction
  • Abbreviation extraction
  • Relation extraction

Text Analysis Tool: Using the Feature Extraction Tool to extract names

imzxrun -b 2 -f C -x n -o faculty.out faculty.htm

Tools to Use

Language Identification Tool

  • Organize a collection of documents by language
  • Restrict search results to documents in a particular language

Text Analysis Tool: Using the Language Identification Tool

imzlgini -b 2 -v < mydoc.htm

Text Analysis Tool: Language Identification Tool Results

  • Supports 13 languages
  • Can be trained to recognize new languages

Text Analysis Tool: Using the Summarizer Tool

imzsum -l 4 project.html

Text Analysis Tool: Summarizer Tool - Results

Tools to Use

Web Crawler

  • Follows the link topology for a fast search
  • Produces a Web site map
  • Used to recognize the authoritative pages
  • Provides a filtered collection of pages

Web Crawler

  • imyclean - to define a web space
      Created include.re, exclude.re, types.re
  • imycrawl - to crawl a defined web space
      imycrawl url webspace
  • imystat - to track what happens during a crawl
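The include.re and exclude.re files suggest that a web space is bounded by regular expressions. The sketch below shows one plausible reading of that idea, under the assumption that a URL belongs to the web space when it matches at least one include pattern and no exclude pattern. The actual file format and matching rules of the crawler are not given in the slides, and the patterns, URLs, and function name here are hypothetical.

```python
import re


def in_web_space(url, include_patterns, exclude_patterns):
    """Assumed rule: keep a URL if any include pattern matches
    and no exclude pattern matches."""
    included = any(re.search(p, url) for p in include_patterns)
    excluded = any(re.search(p, url) for p in exclude_patterns)
    return included and not excluded


# Hypothetical patterns, standing in for the contents of
# include.re and exclude.re.
include = [r"^http://www\.example\.edu/"]
exclude = [r"/cgi-bin/", r"\.(gif|jpg)$"]

urls = [
    "http://www.example.edu/faculty.htm",
    "http://www.example.edu/cgi-bin/search",
    "http://www.example.edu/logo.gif",
    "http://other.example.com/index.htm",
]

kept = [u for u in urls if in_web_space(u, include, exclude)]
print(kept)
```

Filtering before fetching is what would let a crawler return a filtered collection of pages rather than everything reachable from the start URL.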

Tools to Use

Text Search Engine

  • Complicated text search
  • Powerful linguistic capabilities
  • Fuzzy searches
  • Query based on structure of document

Text Search Engine

Operates on a previously built index

Text Search Engine

Types of Index

  • Linguistic Index (bought as buy)
  • Feature Index (Linguistics + Names)
  • Precise Index (bought as bought)
  • Normalized Precise Index (case insensitive)
  • Ngram Index
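The difference between the index types comes down to what key a word is stored under. The sketch below illustrates that for four of the types, using the "bought" example from the slide. It is not the search engine's implementation: the lemma table is a tiny hypothetical stand-in for real linguistic analysis, and lowercasing in the linguistic and n-gram keys, as well as the n-gram length of 3, are assumptions. The Feature Index is omitted because it would also need name extraction.

```python
# Hypothetical toy lemma table; a real linguistic index would use
# a full morphological analyzer.
LEMMAS = {"bought": "buy", "buys": "buy", "buying": "buy"}


def linguistic_key(word):
    """Reduce a word to its base form (bought -> buy)."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


def precise_key(word):
    """Store the word exactly as written (bought -> bought)."""
    return word


def normalized_precise_key(word):
    """Store the word as written, ignoring case."""
    return word.lower()


def ngram_keys(word, n=3):
    """Split the lowercased word into overlapping character n-grams."""
    lowered = word.lower()
    return [lowered[i:i + n] for i in range(len(lowered) - n + 1)]


word = "Bought"
print(linguistic_key(word))
print(precise_key(word))
print(normalized_precise_key(word))
print(ngram_keys(word))
```

Under these assumptions a query for "buy" would find "Bought" only through the linguistic key, while a query for "bought" would miss it in the precise index but find it in the normalized precise one.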

Combining Tools for Solutions

  • Searching with categories: combining the Text Search Engine and the Topic Categorization Tool
  • Surviving a flood of email by using the Topic Categorization Tool
  • Selectively indexing Web pages by combining the Web Crawler, Topic Categorization Tool, and Text Search Engine

Views of the Tool

  • Command line (good for Unix)
  • Not very useful on Windows NT
  • Not a good stand-alone tool
  • Should be viewed as a library