HLT Industry in the Netherlands
Piek Vossen
Faculteit der Letteren, Vrije Universiteit Amsterdam
Irion Technologies, Delft
Workshop HLT Collaboration SA & Low Countries
24-26 November 2008, Cape Town
Overview of HLT-NL
50 companies investigated in the Netherlands (sources: NTU (64), Notas, PE)
http://taalunieversum.org/taal/technologie/ontwikkelaars.php
• Search, no NLP (3): WiseGuys, IntelliGent, Ilse
• NLT Speech (18): Philips, Dialogs Unlimited, DutchEar, Telecats, G2 Speech, Logica, Voice Data Bridge, Comsys, and many more
• NLT Text (23): Collexis, Gridline, KnowledgeConcepts, Polderland, Q-go, TextKernel, Carp, Irion, and many more
• NLT Consult. (2): Viataal, inTaal
• Semantic Web (2): LibRT, AskNow
• Manual Analysis (3): Trendlight, Bureau Taal, Kieskompas
NLT Text (33)
• Thesaurus-based text processing: Collexis, GridLine, Knowledge Concepts
• Text mining: Textkernel, Irion
• Spelling: Polderland, *TALO
• Search: Irion, WiseGuys, Ilse, Intelligent
• Classification: Irion, Collexis, Textkernel
• Summarization: Carp Technologies
• User profiling, data mining: AskNow, Sentient Machine Research
• Dialogue/Q&A: AskNow, Elitech, Q-go, Irion
• Lexicons: Van Dale
• Translation tools: Lingvistica, Topterm, Linguistic Systems
• Document & knowledge management: Getronics, CIBIT, AI Engineering, ZyLAB Europe, Niceware, Sopheon, Human Inference, LibRT
• Manual text complexity: Bureau Taal
• Manual language analysis, trends and politics: Kieskompas, Trendlight
• Medical language tools: Lexima, ViaTaal, inTaal
• Semantic web: LibRT
NLT Speech (18)
• !Effective: ASR; telephone applications, stock market
• Comsys: ASR, TTS; telephone applications, call centres
• Dedicon: TTS; spoken documents for disabled
• Dialogues Unlimited: ASR; telephone applications
• DutchEar: ASR, TTS; telephone applications, self services, colleague connect, stock market, traffic support, helpdesk support, speaker identification
• Fluency: TTS; text to synthetic speech
• FORUS-P: ASR; database management
• G2 Speech: ASR; dictating, work flow management, medical domain, legal domain
• Group 2000: ASR; telephone applications
• Kompagne: ASR, TTS; medical domain
• Logica: ASR, TTS; contact & call centers
• ORCAvoice: ASR, TTS; telephone applications
• Philips: ASR, TTS; dictating, medical domain, legal domain
• Sound Intelligence: Sound; hearing aid, medical domain
• Telecats: ASR, TTS; telephone applications, information retrieval, messaging, routing, call handling and large platforms
• VoCognition: ASR, TTS; logistics of storage centres
• Voice Data Bridge: ASR, TTS; telephone applications, information retrieval, telecom operators
• YPCA: ASR; interactive services, automobile concepts
NLT-Text
Collexis:
• http://www.collexis.com/
• Technology:
– Fingerprints of documents using the knowledge residing in a thesaurus or multiple thesauri
– Fingerprints from existing results used to generate new results with higher precision
– Discovering the relationships between the elements of different content sources and uncovering unique information
• Application: Search, Knowledge management, Text mining
• Market: Government, Legal, Health science
• Projects & software
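One way to picture a thesaurus-based fingerprint is as a weighted vector of the thesaurus concepts mentioned in a document. The sketch below is an illustration under assumptions only: the mini-thesaurus, the function names `fingerprint` and `similarity`, and the cosine comparison are all hypothetical and are not taken from Collexis's product.

```python
import math
import re
from collections import Counter

# Hypothetical mini-thesaurus: surface term -> concept label.
THESAURUS = {
    "heart attack": "myocardial_infarction",
    "myocardial infarction": "myocardial_infarction",
    "aspirin": "aspirin",
    "acetylsalicylic acid": "aspirin",
}

def fingerprint(text):
    """Count how often each thesaurus concept is mentioned in the text."""
    lowered = text.lower()
    counts = Counter()
    for term, concept in THESAURUS.items():
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        if hits:
            counts[concept] += hits
    return counts

def similarity(fp_a, fp_b):
    """Cosine similarity between two concept fingerprints (0.0 if either is empty)."""
    dot = sum(fp_a[c] * fp_b[c] for c in fp_a)
    norm_a = math.sqrt(sum(v * v for v in fp_a.values()))
    norm_b = math.sqrt(sum(v * v for v in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Because synonyms map to the same concept, two documents that use different wording ("heart attack" versus "myocardial infarction") can still receive matching fingerprints.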
NLT-Text
Gridline:
• http://www.gridline.nl/
• Technology: Semi-automatic development of thesauri and ontologies
• Application: Search, Authoring
• Market: Government, Law firms
• Projects & software
NLT-Text
KnowledgeConcepts:
• http://www.knowledge-concepts.com/
• Technology:
– Relation detection in text through use of semantic networks, thesauri and taxonomies
– Part-Of-Speech taggers, lemmatisers, entity extractors, stopword lists, and language identifiers
• Application:
– multilingual search
– classification and analytical products
• Market:
– Government, Banks, PTT, Publishers
• Projects & software
NLT-Text
Polderland:
• http://www.polderland.biz/
• Technology:
– spelling suggestion/correction
– fuzzy matching
– semantic expansion
• Application:
– search
– content and document management software
– authoring
– automatic classification and metadata extraction
• Market:
– CRM systems
– contact centre software
– publishing systems
– SharePoint and portal-server systems
• Projects & software
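Spelling suggestion through fuzzy matching can be illustrated with Python's standard `difflib` module. This is a minimal sketch under assumptions: the lexicon, the `suggest` function and the cutoff value are hypothetical, and nothing here reflects how Polderland's software works internally.

```python
import difflib

# Hypothetical lexicon; a real system would use a full word list.
LEXICON = ["classification", "thesaurus", "morphology", "compound", "spelling"]

def suggest(word, lexicon=LEXICON, max_suggestions=3, cutoff=0.75):
    """Return lexicon entries that closely match a possibly misspelled word."""
    return difflib.get_close_matches(
        word.lower(), lexicon, n=max_suggestions, cutoff=cutoff
    )
```

For instance, `suggest("clasification")` ranks "classification" first, while a string with no close neighbour in the lexicon yields an empty list.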
NLT-Text
Q-GO:
• http://www.q-go.nl/
• Technology:
– question analysis and normalization
– search
– dialogue, Q&A
• Application:
– Search through dialogue/Q&A
– Online customer support
• Market:
– Banks, Insurance companies
• Projects & software
NLT-Text
Elitech:
• http://www.elitech.nl/
• Technology:
– Question analysis, user profile
– question answer matching, answer database
• Application: multimodal Q&A, self-service
• Market: Railways, cities, Banks, Energy, Insurance, Travel agents, Telecom, Government
• Projects & software
NLT-Text
TextKernel:
• http://www.textkernel.com/
• Technology:
– memory based learning (analogical or similarity-based reasoning)
– text classification
– string extraction (names, numbers, formulations, zip codes)
– Hidden Markov Models, Decision Trees, Naive Bayes, SVMs, or Stochastic Grammars
• Application:
– Text classification, Information extraction
• Market:
– Recruiting, Tangram, WiseGuys
– Cooperates with system integrators (e.g. Capgemini, WCC Search & Match, Connexys)
• Projects & software
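Memory-based learning stores training examples verbatim and classifies a new item by analogy with its most similar stored neighbours. The sketch below shows that idea with a simple feature-overlap metric and majority vote. The feature names, labels, stored examples and the `classify` function are hypothetical illustrations and are not TextKernel's implementation.

```python
from collections import Counter

# Hypothetical memory of stored examples: (feature tuple, label).
MEMORY = [
    (("capitalized", "after_title", "no_digits"), "NAME"),
    (("capitalized", "sentence_start", "no_digits"), "OTHER"),
    (("lowercase", "mid_sentence", "has_digits"), "ZIP"),
    (("capitalized", "after_title", "has_digits"), "NAME"),
]

def overlap(a, b):
    """Number of feature positions on which two examples agree."""
    return sum(1 for x, y in zip(a, b) if x == y)

def classify(features, memory=MEMORY, k=3):
    """Label a new item by majority vote among its k most similar stored examples."""
    ranked = sorted(memory, key=lambda item: overlap(features, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

No model is trained in advance; all of the work happens at classification time, when the new item is compared with the stored examples.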
NLT-Text
Carp:
• http://www.carp-technologies.nl/nld/Home/
• Technology:
– Parsing
– Semantic network
• Application:
– summarizers
– search
– anonymizer
– text analysis
• Market:
– local governments (provinces, cities)
– Department of Justice
• Projects & software
NLT-Text
Irion:
• http://www.irion.nl/
• Technology:
– statistical and phrase retrieval
– text classification
– language identification, taggers, grammars, WSD
– information extraction
– dialogue modelling
– multilingual semantic networks, thesauri
• Application:
– Text classification, text mining, cross-lingual retrieval, dialogue systems, language analysis for text complexity
• Market:
– Governments (local and national)
– Libraries
– Publishers
• Projects & software
Technologies
Speech-to-Text 15
Text-to-speech 8
Statistics 7
Relation detection through thesauri/ontologies/lexicons 5
Automatic Thesaurus/Ontology/Lexicon Learning 4
Dialogues 4
Manual text analysis 4
Tagging 4
Parsing 4
Text classification 3
Q&A 3
Stochastic NLP 2
Multilingual processing 2
User profiling 1
Spelling 1
Memory Based Learning 1
Ontologies & reasoning 1
Applications
Search 13
Knowledge management 5
Text mining 4
Meta data enrichment 4
Dialogue & Q&A, both text and speech 4
Authoring 3
User adaptation & analysis 2
Training & therapy 2
Political Position Mining 2
Summarization 1
Business rules 1
Text complexity 1
Trend Analysis 1
Market
Government (local, national) 9
Legal 5
Finance 5
Publishers 4
Police 3
Insurance 3
Aid for disabled 2
Telecom 2
Health science 1
Transport 1
Energy 1
Recruitment 1
System integrators 1
[Figure: companies positioned by technology and application. Axis labels: Simple, Complex; Reason, Semantics, Syntax, Tagging, Lemmatize, Statistics; Automatic, Manual, Co-training; Index, Search, Classification, Mining, Dialogue/Q&A, Analysis, Decide. Companies shown: Bureau Taal, TrendLight, KiesKompas, LibRT, Irion, Carp, Collexis, Q-go, KnowledgeConcepts, AskNow, Gridline, Polderland, TextKernel, WiseGuys, Autonomy, Fast, Endeca, Google, Microsoft, ViaTaal, inTaal.]
Discussion
• Is the technology mature enough?
• Long way from technology to software products > software development
• More money from investors required
• Small company syndrome: sales & marketing
• Need for commercial software developers (salaries)
• Need for NLP developers
Some cases of failure
• Government departments & university/education libraries
• VWS bought Verity
– cheap license but still expensive (100K and 30K maintenance per year)
– does not work:
  • diacritics
  • morphology
  • compounds
  • upper/lower case
– expensive IT consultancy to investigate a solution: alternative search was not an option (no money & people for another integration)
– classify text, index thesaurus labels, match queries to labels
• Many RFIs involving Autonomy, Verity, Fast and Irion
– best system
– too small to be trustworthy
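Two of the failure points listed for the VWS case, diacritics and upper/lower case, together with the approach of matching queries to indexed thesaurus labels, can be sketched in a few lines. This is a minimal sketch under assumptions: the labels, document ids and function names are hypothetical, and the substring match does not address morphology or compounds in any principled way.

```python
import unicodedata

def normalize(text):
    """Case-fold the text and strip diacritics (e.g. 'Coördinatie' -> 'coordinatie')."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Hypothetical thesaurus labels mapped to the documents classified under them.
LABEL_INDEX = {
    "Zorgverzekering": ["doc-1", "doc-7"],
    "Coördinatie": ["doc-3"],
}

# Index the labels in normalized form so that queries match regardless of accents or case.
NORMALIZED_INDEX = {normalize(label): docs for label, docs in LABEL_INDEX.items()}

def match_query(query):
    """Return documents whose normalized thesaurus label occurs in the normalized query."""
    q = normalize(query)
    results = []
    for label, docs in NORMALIZED_INDEX.items():
        if label in q:
            results.extend(docs)
    return results
```

Normalizing both the labels and the query with the same function is the key design choice: a query typed without accents or in capitals still reaches the label it was meant for.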