Practical Project of the 2006 Joint International Master’s Degree

Agenda

- Introduction
- Technologies in use
- Architecture
- Demonstration
- Remaining issues
- Work packages for Semester II
- Questions & comments

Introduction

- Practical project during the course of studies
- Timeframe: two terms
- Topic: Prototype of a semantic search engine using UIMA

Objectives of the first semester
- Study the UIMA framework and the OpenNLP library
- Search for players, teams, matches and dates
- Semantic search for goal events
- Implement an executable prototype

Technologies in Use

- UIMA framework
- OpenNLP
- Java / JavaServer Pages
- Tomcat server
- Python (web crawler)

Architecture: Overview

[Figure: architecture diagram. Labels: Input (unstructured information, plain text); converter (parser); UIMA framework with CAS; OpenNLP-based NLP annotator (paragraph, sentence and word detection); Date & Time annotator; Player annotator; Match annotator; Goal-Event annotator; Output (persistent search index); user interface.]

Architecture: Webcrawler

Usage of a web crawler for preselection of texts

- Implemented in Python
- Crawls ca. 2500 pages in 20 minutes
- Presently based on keywords
- Transfer of results to Jimgle still manual
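
A keyword-based preselection could look roughly like the sketch below. The keyword list, the threshold and the function names are assumptions made for illustration; the slides do not show the crawler's actual code or keywords.

```python
import re

# Hypothetical keyword list; the crawler's real keywords are not given in the slides.
KEYWORDS = {"goal", "match", "world cup", "striker", "penalty"}


def keyword_score(text, keywords=KEYWORDS):
    """Count how many distinct keywords occur in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(
        1 for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
    )


def preselect(pages, min_score=2):
    """Keep the URLs of pages whose text contains at least min_score keywords."""
    return [url for url, text in pages.items() if keyword_score(text) >= min_score]


# Two made-up pages: only the first one mentions enough football keywords.
pages = {
    "http://example.org/a": "Klose scored the first goal of the match.",
    "http://example.org/b": "Stock markets closed higher today.",
}
print(preselect(pages))
```

Matching on word boundaries keeps "markets" from counting as "match", which is the kind of false hit a plain substring test would produce.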

Architecture: NLP-Annotator

- Usage of the OpenNLP tools & API
  - Rule-based approach
  - Tagging of paragraphs, sentences and words
  - Part-of-speech tagging
- Implementation in UIMA as a separate annotator
  - Results are used by downstream annotators
  - Internal usage only, not displayed in the search index
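
The idea of tagging sentences and words as annotations that later annotators can reuse can be sketched as follows. This is an illustration only: plain regular expressions stand in for the OpenNLP tools, and the `Annotation` tuple is a hypothetical stand-in for annotations stored in the CAS, not the UIMA API.

```python
import re
from collections import namedtuple

# A stand-off annotation: a type label plus character offsets into the text.
Annotation = namedtuple("Annotation", ["type", "begin", "end"])


def annotate_sentences(text):
    """Mark sentences, naively ending each one at '.', '!' or '?'."""
    return [Annotation("Sentence", m.start(), m.end())
            for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]", text)]


def annotate_words(text):
    """Mark every run of word characters as a Word annotation."""
    return [Annotation("Word", m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


text = "Klose scored. Germany won!"
annotations = annotate_sentences(text) + annotate_words(text)
for a in annotations:
    print(a.type, repr(text[a.begin:a.end]))
```

Because the annotations only hold offsets, the source text stays untouched and any later annotator can look up the covered text again.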

Architecture: Player-Annotator

- Identification of players of the 2006 World Cup
- Rule-based implementation
- Usage of the OpenNLP word annotations
- Matching against the player database (XML file)
- Consideration of last names and nicknames
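
Matching word annotations against an XML player database might look like the sketch below. The XML layout (`<player>`, `<lastname>`, `<nickname>`) and all function names are hypothetical; the project's real schema is not shown in the slides, and a simple word regex stands in for the OpenNLP word annotations.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical layout of the player database; the real XML schema is not shown.
PLAYERS_XML = """
<players>
  <player team="Germany">
    <lastname>Schweinsteiger</lastname>
    <nickname>Schweini</nickname>
  </player>
  <player team="Germany">
    <lastname>Klose</lastname>
  </player>
</players>
"""


def load_player_names(xml_text):
    """Map every last name and nickname (lower-cased) to the canonical last name."""
    names = {}
    for player in ET.fromstring(xml_text).iter("player"):
        last = player.findtext("lastname")
        names[last.lower()] = last
        for nick in player.findall("nickname"):
            names[nick.text.lower()] = last
    return names


def annotate_players(text, names):
    """Return (begin, end, canonical name) for each word found in the database."""
    return [(m.start(), m.end(), names[m.group().lower()])
            for m in re.finditer(r"\w+", text)
            if m.group().lower() in names]


names = load_player_names(PLAYERS_XML)
print(annotate_players("Schweini passed to Klose.", names))
```

Resolving nicknames to the canonical last name at load time means a search for a player finds texts that use either form.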

Architecture: Date & Time-Annotator

- Identification of time and date information
- Usage of the OpenNLP word annotations
- Presently a custom, rule-based implementation
- Detects standard-conforming time & date information
- Detection of relative or colloquial time information not implemented yet
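
A rule-based detector for standard date and time formats can be sketched with a few regular expressions. Which formats count as "standard" is an assumption here (ISO dates, dotted day.month.year dates and 24-hour times); the slides do not list the formats the annotator supports. Relative expressions such as "yesterday" are deliberately not handled.

```python
import re
from datetime import date, time

# Assumed formats for this sketch: 2006-07-09, 09.07.2006 and 20:00.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
DOTTED_DATE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")
CLOCK_TIME = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")


def find_dates(text):
    """Return the valid calendar dates in the text, in order of appearance."""
    hits = []
    for m in ISO_DATE.finditer(text):
        year, month, day = map(int, m.groups())
        hits.append((m.start(), year, month, day))
    for m in DOTTED_DATE.finditer(text):
        day, month, year = map(int, m.groups())
        hits.append((m.start(), year, month, day))
    dates = []
    for _, year, month, day in sorted(hits):
        try:
            dates.append(date(year, month, day))
        except ValueError:
            pass  # e.g. 31.02.2006 looks like a date but is not one
    return dates


def find_times(text):
    """Return all 24-hour clock times (hh:mm) found in the text."""
    return [time(int(m.group(1)), int(m.group(2)))
            for m in CLOCK_TIME.finditer(text)]


sample = "Opening match on 2006-06-09, final on 09.07.2006 at 20:00."
print(find_dates(sample), find_times(sample))
```

The time pattern requires two-digit minutes, so a score such as "2:1" is not mistaken for a time.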

Architecture: Match-Annotator

- Identification of matches
- Based on 3 components
  - Detection of locality
  - Detection of participating teams
  - Detection of the match result
- Usage of upstream annotators
  - OpenNLP word annotations
  - Player annotations
  - Date & time annotations
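
The three components (locality, teams, result) can be combined per sentence roughly as sketched below. The team and venue lists, the result pattern and the rule "two teams plus one result" are assumptions for illustration; the real annotator builds on the upstream annotations rather than on plain string lookups.

```python
import re

# Hypothetical lookup lists; the project's real team and venue data are not shown.
TEAMS = ["Germany", "Italy", "France", "Portugal"]
VENUES = ["Berlin", "Munich", "Dortmund"]
RESULT = re.compile(r"\b(\d{1,2}):(\d{1,2})\b")


def annotate_match(sentence):
    """Return a match record if the sentence names two teams and a result."""
    # Teams in order of appearance in the sentence.
    teams = sorted((t for t in TEAMS if t in sentence), key=sentence.index)
    result = RESULT.search(sentence)
    venue = next((v for v in VENUES if v in sentence), None)
    if len(teams) != 2 or result is None:
        return None
    return {
        "teams": tuple(teams),
        "result": (int(result.group(1)), int(result.group(2))),
        "venue": venue,
    }


print(annotate_match("Italy beat Germany 2:0 in Dortmund."))
```

Note that this naive result pattern would also match a kick-off time such as "20:00", which is one reason to consult the date and time annotations first.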

Architecture: Goal-Event Annotator

- Descriptions of goals are too complex for rule-based detection
- Therefore: machine learning
  - Usage of the OpenNLP library
  - Based on statistical information of sentences
  - Comprehensive training necessary
- Implementation as OpenNLP component
- Integration into UIMA by wrapper classes
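
To illustrate what classifying sentences from word statistics means, the sketch below trains a tiny naive Bayes classifier in plain Python. This is a swapped-in technique, not the project's OpenNLP component, and the six training sentences are made-up toy data; a usable model needs far more training material, as the slide notes.

```python
import math
import re
from collections import Counter


def tokens(sentence):
    """Lower-cased word tokens of a sentence."""
    return re.findall(r"\w+", sentence.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing."""

    def fit(self, sentences, labels):
        self.word_counts = {label: Counter() for label in set(labels)}
        self.label_counts = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokens(sentence))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, sentence):
        def log_score(label):
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            prior = self.label_counts[label] / sum(self.label_counts.values())
            # Words never seen in training are ignored.
            return math.log(prior) + sum(
                math.log((counts[w] + 1) / total)
                for w in tokens(sentence) if w in self.vocab
            )
        return max(self.word_counts, key=log_score)


# Toy training data, invented for this example.
train = [
    ("Klose scored in the 49th minute", "goal"),
    ("Podolski shot the ball into the net", "goal"),
    ("Grosso scored the decisive goal", "goal"),
    ("The referee showed a yellow card", "other"),
    ("The coach announced the squad", "other"),
    ("Fans travelled to Berlin by train", "other"),
]
model = NaiveBayes().fit([s for s, _ in train], [l for _, l in train])
print(model.predict("Ballack scored a late goal"))
print(model.predict("The referee showed a red card"))
```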

Architecture: Persistent Indexing

- Functionality
  - Import of all files in a specific directory
  - Annotation of all available texts
  - Compilation of XML files with the CAS data of every source text
  - Subsequent creation of a search index
- Provision of index files for the web server
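
The indexing steps above (import a directory, annotate each text, write one XML file per source text, build an index) can be sketched as follows. The XML layout, the `annotate` placeholder and the in-memory index are assumptions for illustration and do not reproduce the UIMA CAS serialization or the project's index format.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def annotate(text):
    """Placeholder for the annotator pipeline: marks capitalised words only."""
    return [("Name", m.start(), m.end())
            for m in re.finditer(r"\b[A-Z]\w+", text)]


def index_directory(source_dir, out_dir):
    """Annotate every .txt file, write one XML file per text, return the index."""
    index = defaultdict(set)  # covered text (lower-cased) -> file names
    for path in sorted(Path(source_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        root = ET.Element("document", source=path.name)
        for kind, begin, end in annotate(text):
            ET.SubElement(root, "annotation", type=kind,
                          begin=str(begin), end=str(end))
            index[text[begin:end].lower()].add(path.name)
        out_file = Path(out_dir) / (path.stem + ".xml")
        ET.ElementTree(root).write(str(out_file), encoding="utf-8")
    return index


# Demo run on two made-up texts in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "texts")
    out = Path(tmp, "cas")
    src.mkdir()
    out.mkdir()
    (src / "a.txt").write_text("Klose scored in Berlin.", encoding="utf-8")
    (src / "b.txt").write_text("Italy won in Berlin.", encoding="utf-8")
    index = index_directory(src, out)
    xml_files = sorted(p.name for p in out.glob("*.xml"))

print(sorted(index["berlin"]), xml_files)
```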

Architecture: Graphical User Interface

- Linux server with Tomcat installation
- Simple operation via web-based GUI
- Search queries are handled by JavaServer Pages
- Processing of requests by JavaBeans

Demonstration: Search engine

Open Issues: Further proceeding…?

- Search for attributes, e.g. Player AND Germany (presently only via OmniFind)
- Automate processing of search engine results
- Further training of the components
- Usability improvements at front end and back end
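
One possible reading of the "Player AND Germany" example is a query that combines an annotation type with an attribute value. The sketch below shows that reading on hypothetical annotation records; the record fields and the `search` function are illustrative assumptions and say nothing about how OmniFind evaluates such queries.

```python
# Hypothetical annotation records as they might be stored alongside the index.
ANNOTATIONS = [
    {"doc": "doc1", "type": "Player", "name": "Klose", "team": "Germany"},
    {"doc": "doc1", "type": "Player", "name": "Totti", "team": "Italy"},
    {"doc": "doc2", "type": "Player", "name": "Lahm", "team": "Germany"},
]


def search(annotations, **criteria):
    """Return the annotations whose fields match every given criterion."""
    return [a for a in annotations
            if all(a.get(key) == value for key, value in criteria.items())]


hits = search(ANNOTATIONS, type="Player", team="Germany")
print([(h["doc"], h["name"]) for h in hits])
```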

New scenarios for the second semester

- Automated analysis of emails
  - Search for phone numbers
  - Search for customer contacts of employees
  - Find employees with specific skills
  - Find links & relations between employees
- Competitive analysis
  - Compare own products with those of competitors
  - Find out about customer opinions in internet portals

Further ideas??

Ideas for the second semester

- Natural-language search queries
- Design templates for customizable annotators
- Machine learning for the web crawler
- Mark annotations in the search results
- Automated processing of search results
- Implement more annotators via OpenNLP
- Provide annotators as web services

Further ideas??

JIMGLE: JIM Master-Project

Questions?

Suggestions?

JIMGLE: JIM Master-Project

Thanks for your attention…