Edinburgh March 2001 CROSSMARC Kick-off meeting ICDC

ICDC background and know-how and expectations from CROSSMARC

CROSSMARC Project IST-2000-25366

Kick-off meeting

Edinburgh March 2001

NLP-based applications at ICDC

• Document filtering

– Syntactic analysis + NERC + inference engine

– Intranet and commercial internet

• Document clustering

– Statistical analysis

• Real-time document indexing

– Search engine techniques

NLP-based prototypes at ICDC

• Shareholding event detection

– Information extraction

• Document filtering

– transducers (CORAIL)

– neural networks (TREC)

• Control techniques using machine learning

– controlling filters with neural networks (RIAO)

– controlling NERC with C4.5 (ADIET with NCSR)

NLP-based applications

• Complex applications

– development

– exploitation

– maintenance

• Heterogeneous modules

– implementation: OS, language, communication, format

– processing: data, resources, algorithms

TalLab

• ICDC architecture for NLP-based applications

– Operational since 1997

– Used in several applications and prototypes

• Publications

– [Wolinski et al. 98] NLP+IA, Moncton

– [Wolinski et Vichot 01] TSI, Paris

• Reference

– [Cunningham et al. 00] LREC, Athens

Guidelines for the design of TalLab

• Relying on a multi-agent model

• Reusing the OS wherever possible

• Refusing to impose a single standard

Agents and circuits in TalLab

Diagram: an agent, labelled with its messages, knowledge, acquaintances, activity, behavior, persistence and message box; several agents linked together form a circuit of agents.
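
The agent and circuit notions above can be illustrated with a minimal sketch. It assumes that an agent holds a message box, some knowledge, a list of acquaintances and a behavior function, and that a circuit is simply a set of agents whose acquaintances are wired together. The class name, method names and the example behaviors are hypothetical, and persistence is not modelled; this is not TalLab's actual interface.

```python
from collections import deque


class Agent:
    """Hypothetical agent: a message box, knowledge, and acquaintances."""

    def __init__(self, name, behavior):
        self.name = name
        self.behavior = behavior    # function(knowledge, message) -> message or None
        self.knowledge = {}         # private state kept by the agent
        self.acquaintances = []     # agents that receive this agent's output
        self.message_box = deque()  # pending incoming messages

    def receive(self, message):
        self.message_box.append(message)

    def step(self):
        """Process one pending message; return True if one was processed."""
        if not self.message_box:
            return False
        message = self.message_box.popleft()
        result = self.behavior(self.knowledge, message)
        if result is not None:
            for other in self.acquaintances:
                other.receive(result)
        return True


def run_circuit(agents):
    """Step every agent repeatedly until all message boxes are empty."""
    while any([agent.step() for agent in agents]):
        pass


# Example circuit (assumed): tokenizer -> counter -> sink
def tokenize(knowledge, text):
    return text.split()


def count(knowledge, tokens):
    knowledge["seen"] = knowledge.get("seen", 0) + len(tokens)
    return len(tokens)


collected = []


def collect(knowledge, value):
    collected.append(value)
    return None


tokenizer = Agent("tokenizer", tokenize)
counter = Agent("counter", count)
sink = Agent("sink", collect)
tokenizer.acquaintances.append(counter)
counter.acquaintances.append(sink)

tokenizer.receive("CROSSMARC kick-off meeting")
tokenizer.receive("Edinburgh March 2001")
run_circuit([tokenizer, counter, sink])
```

In this sketch, rewiring the circuit only means changing the `acquaintances` lists; the behaviors themselves are untouched.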

NLP techniques used in TalLab

• Tokenisation

• POS tagging

• Syntactic analysis

• Named Entity Recognition and Classification

• Semantic analysis

• Search engines

• Neural networks

• Finite state transducers

• Vector space model

• Statistical clustering

Transistor-like agents

• Cardinality 1-N: Multiplier, Dispatcher, Switcher

• Cardinality 1-1: Filter, Translator, Networker

• Cardinality N-1: Concentrator, Synchronizer
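
The slide gives only the names and cardinalities of these agents. The sketch below shows one plausible reading of each, and every semantic choice in it is an assumption: the multiplier copies a message to all outputs, the dispatcher hands messages out in turn, the switcher routes on a caller-supplied choice, the filter keeps or drops, the translator converts, the concentrator merges streams and the synchronizer waits for one message from every input. The Networker is left out because it depends on a network transport.

```python
from itertools import cycle


# --- Cardinality 1-N (one input, N outputs); None marks an idle output ---

def multiplier(message, n):
    """Copy the same message to all n outputs (assumed semantics)."""
    return [message] * n


def make_dispatcher(n):
    """Return a dispatcher that sends each message to one output, in turn."""
    turn = cycle(range(n))

    def dispatch(message):
        outputs = [None] * n
        outputs[next(turn)] = message
        return outputs

    return dispatch


def switcher(message, choose, n):
    """Route the message to the single output index returned by choose()."""
    outputs = [None] * n
    outputs[choose(message)] = message
    return outputs


# --- Cardinality 1-1 ---

def filter_agent(message, keep):
    """Pass the message through if keep() accepts it, otherwise drop it."""
    return message if keep(message) else None


def translator(message, convert):
    """Convert the message from one format to another."""
    return convert(message)


# --- Cardinality N-1 (N inputs, one output) ---

def concentrator(inputs):
    """Merge several input streams into a single stream."""
    return [message for stream in inputs for message in stream]


def synchronizer(inputs):
    """Emit one tuple per round, once every input has delivered a message."""
    return list(zip(*inputs))
```

For instance, `multiplier("doc", 3)` yields three copies of `"doc"`, while `synchronizer([[1, 2], ["a", "b"]])` pairs the messages round by round.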

TalLab main features

• Malleability: plug & play architecture, easy prototyping

• Openness: reuse market components, low integration cost

• Efficiency: distribute applications, real-time, batch processing

• Exploitability:

– Deployability: full integration in the MIS

– Reliability: quality of service, robustness

– Controllability: monitoring facilities, surveillance tools

Malleability

Units of production = Circuits of agents

Linking modules = Plugging agents

Openness

Integrating a component = Building a transducer

Managing heterogeneity = Programming a translator
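
As an illustration of what programming a translator could involve, here is a minimal sketch. It assumes a hypothetical external tagger that emits `word/TAG` text while the rest of the circuit expects one record per token; neither format is described in the slides, and the function name is invented for the example.

```python
def translate_tagged_text(tagged):
    """Convert 'word/TAG word/TAG ...' into a list of token records.

    Both the input format and the record layout are assumptions made
    for illustration only.
    """
    records = []
    for item in tagged.split():
        # Split on the last slash so tokens that contain '/' stay intact.
        token, separator, pos = item.rpartition("/")
        if not separator:
            # No tag present: keep the token and leave the tag empty.
            token, pos = item, None
        records.append({"token": token, "pos": pos})
    return records


records = translate_tagged_text("CDC/NNP buys/VBZ shares/NNS")
```

The component itself is left unchanged; only the translator knows about both formats, which keeps the integration cost local to one small agent.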

Efficiency

Pipeline architecture

Concurrent architecture = Using a multiplier
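
The contrast between the two architectures can be sketched as follows. The sketch assumes that a pipeline chains stages one after another, while a multiplier hands the same document to several branches that run concurrently. The function names and the use of threads are assumptions, not a description of how TalLab distributes work.

```python
from concurrent.futures import ThreadPoolExecutor


def pipeline(document, stages):
    """Run the stages sequentially, each one consuming the previous output."""
    for stage in stages:
        document = stage(document)
    return document


def concurrent_branches(document, branches):
    """Multiplier-style fan-out: every branch gets the same document
    and the branches run concurrently in a thread pool."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, document) for branch in branches]
        # Results are collected in branch order, not completion order.
        return [future.result() for future in futures]


sequential = pipeline("  Hello World ", [str.strip, str.lower, str.split])
fanned_out = concurrent_branches("Hello World", [len, str.upper, str.split])
```

In the pipeline each stage waits for the previous one; in the fan-out the branches are independent, so a slow branch does not delay the start of the others.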

Exploitability

• Deployability

– distribution: sub-networks architecture

– networkers: intranet proxies and internet firewalls

• Reliability

– modularity: independence of agents

– persistence: knowledge / message box / failures

• Controllability

– uniformity: general controlling procedures

– OS integration: connection to monitoring software

ICDC technical expectations

• Adaptive techniques for information extraction from web pages

• Techniques for managing multilingual NLP-based applications

• Processing typical web texts (vs news items)

ICDC applicative expectations

• Evaluation of the added value of CROSSMARC in the context of CDC

• Exploitation of CROSSMARC by-products for competitive intelligence applications

Intranet application at CDC

• Real-time news filtering and clustering

– 100 users, 100 topics

• Information retrieval

– 2 years of AFP economic news

Internet application at CDC-Mercure

• Real-time news filtering

– 8,000 users, 80 topics

IE prototype at CDC

• IE dedicated to shareholding events