HLT Industry in the Netherlands. Piek Vossen, Faculteit der Letteren, Vrije Universiteit Amsterdam; Irion Technologies, Delft. Workshop HLT Collaboration SA & Low Countries, 24-26 November 2008, Cape Town.

  • Published on
    26-Dec-2015

Transcript

  • Slide 1
  • HLT Industry in the Netherlands
    Piek Vossen
    Faculteit der Letteren, Vrije Universiteit Amsterdam
    Irion Technologies, Delft
    Workshop HLT Collaboration SA & Low Countries
    24-26 November 2008, Cape Town
  • Slide 2
  • Overview of HLT-NL
    50 companies investigated in the Netherlands (sources: NTU (64), Notas, PE)
    http://taalunieversum.org/taal/technologie/ontwikkelaars.php
    Search, no NLP (3): WiseGuys, IntelliGent, Ilse
    NLT Speech (18): Philips, Dialogs Unlimited, DutchEar, Telecats, G2 Speech, Logica, Voice Data Bridge, Comsys, ... more
    NLT Text (23): Collexis, Gridline, Knowledge Concepts, Polderland, Q-go, TextKernel, Carp, Irion, many more
    NLT Consult. (2): Viataal, inTaal
    Semantic Web (2): LibRT, AskNow
    Manual Analysis (3): Trendlight, Bureau Taal, Kieskompas
  • Slide 3
  • NLT Text (33)
    Thesaurus-based text processing: Collexis, GridLine, Knowledge Concepts
    Text mining: Textkernel, Irion
    Spelling: Polderland, *TALO
    Search: Irion, WiseGuys, Ilse, Intelligent
    Classification: Irion, Collexis, Textkernel
    Summarization: Carp Technologies
    User profiling, data mining: AskNow, Sentient Machine Research
  • Slide 4
  • NLT Text (33)
    Dialogue/Q&A: AskNow, Elitech, Q-go, Irion
    Lexicons: Van Dale
    Translation tools: Lingvistica, Topterm, Linguistic Systems
    Document & knowledge management: Getronics, CIBIT, AI Engineering, ZyLAB Europe, Niceware, Sopheon, Human Inference, LibRT
    Manual text complexity: Bureau Taal
    Manual language analysis, trends and politics: Kieskompas, Trendlight
    Medical language tools: Lexima, ViaTaal, inTaal
    Semantic web: LibRT
  • Slide 5
  • NLT Speech (18)
    Company: technology; applications
    !Effective: ASR; telephone applications, stock market
    Comsys: ASR, TTS; telephone applications, call centres
    Dedicon: TTS; spoken documents for disabled
    Dialogues Unlimited: ASR; telephone applications
    DutchEar: ASR, TTS; telephone applications, self services, colleague connect, stock market, traffic support, helpdesk support, speaker identification
    Fluency: TTS; text to synthetic speech
    FORUS-P: ASR; database management
    G2 Speech: ASR; dictating, work flow management, medical domain, legal domain
    Group 2000: ASR; telephone applications
  • Slide 6
  • NLT Speech (18)
    Company: technology; applications
    Kompagne: ASR, TTS; medical domain
    Logica: ASR, TTS; contact & call centers
    ORCAvoice: ASR, TTS; telephone applications
    Philips: ASR, TTS; dictating, medical domain, legal domain
    Sound Intelligence: Sound; hearing aid, medical domain
    Telecats: ASR, TTS; telephone applications, information retrieval, messaging, routing, call handling and large platforms
    VoCognition: ASR, TTS; logistics of storage centres
    Voice Data Bridge: ASR, TTS; telephone applications, information retrieval, telecom operators
    YPCA: ASR; interactive services, automobile concepts
  • Slide 7
  • NLT-Text
    Collexis: http://www.collexis.com/
    Technology:
      Fingerprints of documents using the knowledge residing in a thesaurus or multiple thesauri
      Fingerprints from existing results used to generate new results with higher precision
      Discovering the relationships between the elements of different content sources and uncovering unique information
    Application: Search, Knowledge management, Text mining
    Market: Government, Legal, Health science
    Projects & software
  • Slide 8
  • NLT-Text
    Gridline: http://www.gridline.nl/
    Technology: Semi-automatic development of thesauri and ontologies
    Application: Search, Authoring
    Market: Government, Law firms
    Projects & software
  • Slide 9
  • NLT-Text
    KnowledgeConcepts: http://www.knowledge-concepts.com/
    Technology:
      Relation detection in text through use of semantic networks, thesauri and taxonomies
      Part-of-speech taggers, lemmatisers, entity extractors, stopword lists, and language identifiers
    Application: multilingual search, classification and analytical products
    Market: Government, Banks, PTT, Publishers
    Projects & software
  • Slide 10
  • NLT-Text
    Polderland: http://www.polderland.biz/
    Technology:
      spelling suggestion/correction
      fuzzy matching
      semantic expansion
    Application: search, content and document management software, authoring, automatic classification and metadata extraction
    Market: CRM systems, contact-centre software, publishing systems, SharePoint and portal-server systems
    Projects & software
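
To make "spelling suggestion/correction" via "fuzzy matching" concrete, here is a minimal sketch using Python's standard `difflib` module. It is an illustration only, not Polderland's method: the lexicon, the function name `suggest`, and the similarity cutoff are all assumptions chosen for the example.

```python
from difflib import get_close_matches

# Hypothetical mini-lexicon; a real system would use a full word list.
LEXICON = ["thesaurus", "taxonomy", "ontology", "classification", "summarization"]


def suggest(word, lexicon=LEXICON, max_suggestions=3, cutoff=0.75):
    """Return lexicon entries that closely resemble `word`, best match first."""
    lowered = word.lower()
    if lowered in lexicon:
        # Correctly spelled words are returned unchanged.
        return [lowered]
    # get_close_matches ranks candidates by difflib's similarity ratio
    # and drops everything scoring below `cutoff`.
    return get_close_matches(lowered, lexicon, n=max_suggestions, cutoff=cutoff)
```

Calling `suggest("thesarus")` on this lexicon offers "thesaurus" as the correction. The cutoff trades recall against noise: lowering it yields more suggestions for badly misspelled words but also more false candidates.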
  • Slide 11
  • NLT-Text
    Q-GO: http://www.q-go.nl/
    Technology:
      question analysis and normalization
      search
      dialogue, Q&A
    Application: Search through dialogue/Q&A, Online customer support
    Market: Banks, Insurance companies
    Projects & software
  • Slide 12
  • NLT-Text
    Elitech: http://www.elitech.nl/
    Technology: Question analysis, user profile, question-answer matching, answer database
    Application: multimodal Q&A, self-service
    Market: Railways, Cities, Banks, Energy, Insurance, Travel agents, Telecom, Government
    Projects & software
  • Slide 13
  • NLT-Text
    TextKernel: http://www.textkernel.com/
    Technology:
      memory-based learning (analogical or similarity-based reasoning)
      text classification
      string extraction (names, numbers, formulations, zip codes)
      Hidden Markov Models, Decision Trees, Naive Bayes, SVMs, or Stochastic Grammars
    Application: Text classification, Information extraction
    Market: Recruiting, Tangram, WiseGuys; cooperates with system integrators (e.g. Capgemini, WCC Search & Match, Connexys)
    Projects & software
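
Memory-based learning classifies a new item by analogy with stored examples rather than by an abstract model. The sketch below shows the idea as a small k-nearest-neighbour text classifier. It is not TextKernel's implementation: the class name, the Jaccard token overlap used as the similarity measure, and the toy recruiting labels are assumptions for illustration.

```python
from collections import Counter


def tokens(text):
    """Lower-case a text and split it into a set of whitespace tokens."""
    return set(text.lower().split())


def overlap(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class MemoryBasedClassifier:
    """Keeps every training example and classifies by similarity to them."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []  # list of (token set, label) pairs

    def train(self, text, label):
        # "Learning" is simply storing the example.
        self.memory.append((tokens(text), label))

    def classify(self, text):
        if not self.memory:
            raise ValueError("no training examples stored")
        query = tokens(text)
        # Rank stored examples by similarity and keep the k nearest.
        nearest = sorted(
            self.memory, key=lambda ex: overlap(query, ex[0]), reverse=True
        )[: self.k]
        # Majority vote over the labels of the nearest examples.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```

With one stored vacancy text and one stored CV text, a query such as "senior java developer" is assigned the label of whichever stored example shares the most tokens with it. Production systems typically replace the plain token overlap with weighted feature metrics.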
  • Slide 14
  • NLT-Text
    Carp: http://www.carp-technologies.nl/nld/Home/
    Technology:
      Parsing
      Semantic network
    Application: summarizers, search, anonymizer, text analysis
    Market: local governments (provinces, cities), Department of Justice
    Projects & software
  • Slide 15
  • NLT-Text
    Irion: http://www.irion.nl/
    Technology:
      statistical and phrase retrieval
      text classification
      language identification, taggers, grammars, WSD
      information extraction
      dialogue modelling
      multilingual semantic networks, thesauri
    Application: Text classification, text mining, cross-lingual retrieval, dialogue systems, language analysis for text complexity
    Market: Governments (local and national), Libraries, Publishers
    Projects & software
  • Slide 17
  • Semantic Web
    LibRT: http://www.librt.com/
      Ontologies and modeling of business processes
      Rules for specification, verification, validation, simulation and execution
      Business rules
      Tax office, Traffic department, Police, University, Banks
    Advanced Bionics
    AskNow
  • Slide 18
  • Manual language analysis
    Kieskompas: Text analysis for mining political position
    Trendlight: Text analysis for news trends & polling
    Bureau Taal: Text analysis for text complexity
    Market: Governments, commercial companies, newspapers, insurance, banks, tax office
  • Slide 19
  • Search, no NLP
    WiseGuys:
      Kobala Internet search, monolingual
      Uses TextKernel software
    Intelligent:
      Federated search on database servers
      Search engine API
    Ilse:
      Dutch Internet search
      80 million index items, 20% potentially Dutch
      Uses Irion software for query-index matching to improve recall
  • Slide 20
  • Technologies (number of companies)
    Speech-to-text: 15
    Text-to-speech: 8
    Statistics: 7
    Relation detection through thesauri/ontologies/lexicons: 5
    Automatic thesaurus/ontology/lexicon learning: 4
    Dialogues: 4
    Manual text analysis: 4
    Tagging: 4
    Parsing: 4
    Text classification: 3
    Q&A: 3
    Stochastic NLP: 2
    Multilingual processing: 2
    User profiling: 1
    Spelling: 1
    Memory-based learning: 1
    Ontologies & reasoning: 1
  • Slide 21
  • Applications (number of companies)
    Search: 13
    Knowledge management: 5
    Text mining: 4
    Metadata enrichment: 4
    Dialogue & Q&A, both text and speech: 4
    Authoring: 3
    User adaptation & analysis: 2
    Training & therapy: 2
    Political position mining: 2
    Summarization: 1
    Business rules: 1
    Text complexity: 1
    Trend analysis: 1
  • Slide 22
  • Market (number of companies)
    Government (local, national): 9
    Legal: 5
    Finance: 5
    Publishers: 4
    Police: 3
    Insurance: 3
    Aid for disabled: 2
    Telecom: 2
    Health science: 1
    Transport: 1
    Energy: 1
    Recruitment: 1
    System integrators: 1
  • Slide 23
  • [Figure: companies positioned by technology and application]
    Scale: Simple, Complex
    Technology labels: Reason, Semantics, Syntax, Tagging, Lemmatize, Statistics
    Mode labels: Automatic, Manual, Co-training
    Application labels: Index, Search, Classification, Mining, Dialogue, Q&A, Analysis, Decide
    Companies shown: Bureau Taal, TrendLight, KiesKompas, LibRT, Irion, Carp, Collexis, Q-go, KnowledgeConcepts, AskNow, Gridline, Polderland, TextKernel, WiseGuys, Autonomy, Fast, Endeca, Google, Microsoft, ViaTaal, inTaal
  • Slide 24
  • Discussion
    Is the technology mature enough?
    Long way from technology to software products > software development
      More money from investors required
    Small company syndrome: sales & marketing
      More money from investors required
    Need for commercial software developers (salaries)
    Need for NLP developers
  • Slide 25
  • Some cases of failure
    Government departments & university/education libraries
    VWS bought Verity:
      cheap license but still expensive (100K and 30K maintenance per year)
      does not work: diacritics, morphology, compounds, upper/lower case
      expensive IT consultancy to investigate solution: alternative search was not an option (no money & people for another integration)
      classify text, index thesaurus labels, match queries to labels
    Many RFIs involving Autonomy, Verity, Fast and Irion: best system too small to be trustworthy
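
Two of the failure points listed above, diacritics and upper/lower case, can be illustrated with a short sketch that normalizes both index labels and queries before matching. This is an assumption-laden illustration, not the solution used in the case described: the function names and the sample labels are hypothetical, and morphology and compounds are not addressed here.

```python
import unicodedata


def normalize(term):
    """Strip diacritics and case so that 'Financiën' and 'financien' match."""
    # NFKD splits accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive, Unicode-aware form of lower().
    return stripped.casefold()


def build_index(labels):
    """Map each normalized form to the original labels that produce it."""
    index = {}
    for label in labels:
        index.setdefault(normalize(label), []).append(label)
    return index


def lookup(index, query):
    """Return the original labels whose normalized form equals the query's."""
    return index.get(normalize(query), [])
```

An index built from the labels "Financiën" and "Zorg" then answers the queries "financien" and "ZORG" with the original labels. Handling Dutch morphology and compounds would additionally need lemmatization and compound splitting, which this sketch leaves out.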
