MEDIA MINING INDEXER

The Media Mining Indexer (MMI) is “the heart” of the Media Mining System. This is where automatic speech recognition takes place, i.e. where speech becomes text. The Media Mining Indexer receives multimedia files from TV/FM radio and media files from user uploads through the Media Mining Feeder. It converts this input into a rich-transcript XML file describing the content and its metadata. This rich transcript is then sent to the Media Mining Server for storage, where it can be accessed for search, retrieval and analytics through the Media Mining Client.

MMI-OSINT includes:

• ASR (Automatic Speech Recognition)

• Speaker Change Detection (detects speaker turns)

• Segmentation (detects and ignores music, noise, other non-speech)

• LMT (Language Model Toolkit) & SID (Speaker Identification)

• NED (Named Entity Detection: detects names of people, places and organizations)

• TD (Topic Detection: determines the topic(s) of an audio input)

• SID (Speaker Identification: determines who is speaking; comes with a toolkit for adding your own voices)

• SA (Sentiment Analysis: positive, negative, mixed, neutral)

• All supported languages: one license covers all supported languages, run one at a time with manual language selection

• C++ and C API (for integration into custom products; MS Visual Studio sample code included)

• CLI (Command Line Interface) for audio file input

• XML output

Two available editions:

• Business (individual selection of available features)

• OSINT (all supported features included)

Currently supported languages*:

• Albanian (Tosk) sq_al

• Arabic ar_ar

• Arabic (EGY) ar_eg

• Arabic (LB) ar_lb

• Bahasa (INDO) id_id

• Dutch (NL+BE) nl_nb

• English (US) en_us

• English (US+UK) en_ux

• Farsi (IR) fa_ir

• French (FR) fr_fr

• German de_de

• Greek el_gr

• Hebrew he_il

• Italian it_it

• Mandarin ch_zn

• Norwegian no_no

• Pashto (AF+PK) ps_ap

• Polish pl_pl

• Romanian (RO) ro_ro

• Russian ru_ru

• Spanish (ES) es_es

• Spanish (MX) es_mx

• Turkish tr_tr

• Urdu ur_pk

*Not all technologies may be supported for all languages.

Hard- and Software Prerequisites** (for 1 MMI Instance)

• Intel-compatible CPU, 2 cores, 1.6 GHz, SSE2

• 2 GB DDR2 RAM

• Disk space: 60 GB recommended

• Windows 7 (64 bit), Windows Vista (64 bit), Windows XP (64 bit), Windows 2008 R2 Server or later

• For the LMT feature: Java Runtime Environment (JRE) 1.7 or later

• Linux, able to run 32-bit applications

• GNU compiler gcc, able to build 32-bit applications

**Subject to change without prior notice.

Input Files Requirements

• WAV file (uncompressed audio): 16 kHz, 16 bit, mono, signed = 256 kbit/s

• MP3 (compressed audio): 96–128 kbit/s
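As an illustration of the WAV requirement, the following minimal Python sketch checks a file before it is handed to the indexer. It is not part of the MMI product or its API. It assumes the WAV figure refers to a 16 kHz sample rate, which is what the stated 256 kbit total implies (16,000 samples/s × 16 bit × 1 channel = 256,000 bit/s). The function names `wav_bitrate`, `check_wav` and `make_test_wav` are hypothetical helpers introduced only for this example; only Python's standard `wave` module is used.

```python
import io
import wave

# Target format, taken from the input file requirements above.
# Assumption: "16 kHz" sample rate, inferred from the 256 kbit total.
REQUIRED_RATE_HZ = 16_000
REQUIRED_SAMPLE_WIDTH_BYTES = 2   # 16 bit; 16-bit PCM WAV samples are signed
REQUIRED_CHANNELS = 1             # mono


def wav_bitrate(rate_hz, sample_width_bytes, channels):
    """Return the uncompressed PCM bitrate in bit/s."""
    return rate_hz * sample_width_bytes * 8 * channels


def check_wav(source):
    """Return a list of problems found in a PCM WAV file.

    `source` is a file path or a binary file-like object.
    An empty list means the file matches the target format.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()

    problems = []
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    if width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width * 8} bit, expected 16 bit")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channels found, expected mono")
    return problems


def make_test_wav(rate_hz, channels, seconds=0.1):
    """Build a short silent 16-bit WAV in memory (used only for the demo)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds) * channels)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(wav_bitrate(REQUIRED_RATE_HZ, REQUIRED_SAMPLE_WIDTH_BYTES, REQUIRED_CHANNELS))
    print(check_wav(make_test_wav(16_000, 1)))
    print(check_wav(make_test_wav(44_100, 2)))
```

In practice `check_wav` would be called with the path of a real recording. A file that fails the check would need to be resampled or downmixed with an audio tool of your choice before indexing.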