EUMSSI team at the MediaEval Person Discovery Challenge 2016
Nam Le, Jean-Marc Odobez, Sylvain Meignier
{nle, odobez}@idiap.ch
sylvain.meignier@univ-lemans.fr
Video OCR and NER
07/12/2016
Original Image
Text region detection
Text extraction
Text recognition
Hypothesis merging
• Multiple image segmentations of the same region ⇒ all results are compared and aggregated over time ⇒ several hypotheses ⇒ high recall
• NER based on MITIE with heuristics.
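The aggregation step above can be illustrated with a small voting sketch. It is not the team's implementation: the function name `merge_hypotheses`, the whitespace and case normalization, and the `min_votes` threshold are all assumptions, and the input strings are made up for illustration.

```python
from collections import Counter

def merge_hypotheses(frame_results, min_votes=2):
    """Aggregate OCR strings for one text region across frames.

    frame_results: list of lists; each inner list holds the strings produced
    by the different segmentations of the region in one frame.
    Returns (text, votes) pairs sorted by votes, ties broken alphabetically.
    """
    votes = Counter()
    for strings in frame_results:
        # Normalize whitespace and case (an assumption), and count each
        # distinct string at most once per frame.
        seen = {" ".join(s.split()).upper() for s in strings if s.strip()}
        votes.update(seen)
    # Keep every hypothesis with enough support rather than a single winner,
    # which favors recall over precision.
    kept = [(text, n) for text, n in votes.items() if n >= min_votes]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Made-up OCR output for one region over three frames.
frames = [
    ["Jane Doe", "Jane D0e"],
    ["JANE DOE", "Jane  Doe"],
    ["Jane D0e", "Jane Doe"],
]
hypotheses = merge_hypotheses(frames)
```

Keeping several hypotheses leaves the final choice to a later stage such as NER.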
Face diarization
Pipeline: shots ⇒ Face detection (DPM) ⇒ Face tracking (CRF multi-target) ⇒ Face clustering (hierarchical clustering)
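The last stage, hierarchical clustering of face tracks, might look like the sketch below. The slides do not give the features, linkage, or stopping rule, so the per-track feature vectors, Euclidean distance, average linkage, and distance threshold are all assumptions.

```python
import math

def hierarchical_clustering(features, threshold):
    """Average-linkage agglomerative clustering with a distance threshold.

    features: one feature vector per face track (hypothetical embeddings).
    Returns clusters as sorted lists of track indices.
    """
    clusters = [[i] for i in range(len(features))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        dists = [math.dist(features[i], features[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > threshold:
            break  # remaining clusters are too far apart to merge
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]

# Toy 2-D features for five face tracks.
tracks = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.2, 5.0), (9.0, 0.0)]
clusters = hierarchical_clustering(tracks, threshold=1.0)
```

Each resulting cluster stands for one candidate identity, to be named later from OCR-NER output.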
Talking face detection
Pipeline: face track ⇒ 9 directions of optical flow ⇒ PCA ⇒ $x_t$ ⇒ LSTM over the sequence $x_t$ ⇒ hidden states $h_t$ ⇒ mean pooling ⇒ classifier
DW dataset for talking face & dubbing: http://bit.ly/dw-dubbing
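The LSTM, mean pooling, and classifier chain can be sketched in plain Python as follows. This is only an illustration of the data flow: the weights are random and untrained, the dimensions are arbitrary, the logistic output layer is an assumption, and the names `lstm_step` and `talking_score` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W):
    """One standard LSTM step; W maps gate name -> (weight rows, biases)."""
    v = x + h  # concatenate input and previous hidden state

    def gate(name, act):
        rows, bias = W[name]
        return [act(sum(w * a for w, a in zip(row, v)) + b)
                for row, b in zip(rows, bias)]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def talking_score(xs, W, w_out, b_out):
    """Run the LSTM over xs, mean-pool hidden states, apply a logistic layer."""
    hidden = len(w_out)
    h, c = [0.0] * hidden, [0.0] * hidden
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, W)
        states.append(h)
    # Mean pooling: average each hidden unit over all time steps.
    pooled = [sum(col) / len(states) for col in zip(*states)]
    return sigmoid(sum(w * p for w, p in zip(w_out, pooled)) + b_out)

# Random, untrained parameters and inputs, for shape illustration only.
rng = random.Random(0)
in_dim, hidden = 3, 4
W = {name: ([[rng.uniform(-0.5, 0.5) for _ in range(in_dim + hidden)]
             for _ in range(hidden)], [0.0] * hidden)
     for name in "ifog"}
w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
xs = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(10)]
score = talking_score(xs, W, w_out, 0.0)
```

Mean pooling gives one fixed-size vector per face track whatever the track length, so a single classifier can score tracks of any duration.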
Speaker diarization
• LIUM diarization tool: www-lium.univ-lemans.fr/en/content/liumspkdiarization
• Input: a video
• Output: homogeneous segments
Result ranking
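• Direct naming: maximize co-occurrences between clusters and named entities.
− Face naming: name $N_i^{\mathrm{face}}$ and talking score $t(N_i^{\mathrm{face}})$
− Speaker naming: name $N_j^{\mathrm{spk}}$ and equal score 1.0
• For one shot $s$: $Q_s = \emptyset$
• Names on which face naming and speaker naming agree rank highest:
− If $\exists N_j^{\mathrm{spk}}$ such that $N_i^{\mathrm{face}} = N_j^{\mathrm{spk}}$: $Q_s \leftarrow (N_i^{\mathrm{face}},\ 2.0 + t(N_i^{\mathrm{face}}))$
• Otherwise, face naming has higher rank:
− If $\nexists N_j^{\mathrm{spk}}$ such that $N_i^{\mathrm{face}} = N_j^{\mathrm{spk}}$: $Q_s \leftarrow (N_i^{\mathrm{face}},\ 1.0 + t(N_i^{\mathrm{face}}))$
− If $\nexists N_i^{\mathrm{face}}$ such that $N_i^{\mathrm{face}} = N_j^{\mathrm{spk}}$: $Q_s \leftarrow (N_j^{\mathrm{spk}},\ 1.0)$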
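The ranking rule for a single shot can be sketched as below. The function name `rank_shot`, the dict and set inputs, and the alphabetical tie-break are assumptions. The sketch also assumes talking scores are non-negative, so that a face-only name never ranks below a speaker-only name. The names and scores are made up.

```python
def rank_shot(face_names, speaker_names):
    """Build the ranked name list Q_s for one shot.

    face_names: dict mapping a face-naming result to its talking score t.
    speaker_names: set of names produced by speaker naming.
    Returns (name, score) pairs sorted by decreasing score.
    """
    q = []
    for name, t in face_names.items():
        # Agreement with speaker naming lifts the base score from 1.0 to 2.0.
        base = 2.0 if name in speaker_names else 1.0
        q.append((name, base + t))
    for name in speaker_names:
        # Speaker-only names get the flat score 1.0.
        if name not in face_names:
            q.append((name, 1.0))
    return sorted(q, key=lambda item: (-item[1], item[0]))

# Made-up example: "alice" is both seen and heard, "bob" only seen,
# "carol" only heard.
ranking = rank_shot({"alice": 0.75, "bob": 0.25}, {"alice", "carol"})
```

Here `ranking` orders the names as alice (2.75), bob (1.25), carol (1.0), matching the three cases of the rule.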
Submissions
            MAP@1   MAP@10   MAP@100
Sub. (1)    30.3    22.0     21.0
Sub. (2)    58.6    42.9     42.0
Sub. (3)    64.2    53.1     52.1
Sub. (4)    68.3    56.2     54.7
Sub. (5)    79.2    65.2     63.4
Sub. (1): Face diarization + Baseline OCR-NER ⇒ Face naming
Sub. (2): Face diarization + Our OCR-NER ⇒ Face naming
Sub. (3): Face diarization + Our OCR-NER ⇒ Talking face naming
Sub. (4): Face diarization + OCR-NER ⇒ Talking face naming, plus Speaker diarization + OCR-NER ⇒ Speaker naming
Sub. (5): Sub. (4) + Sub. (1) + Baseline 2
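For readers unfamiliar with the MAP@K columns, the sketch below shows one common way to compute mean average precision at K. The slides do not define the metric, so the normalization by min(K, number of relevant shots) and the names `average_precision_at_k` and `map_at_k` are assumptions, and the official evaluation may differ. The shot identifiers are made up.

```python
def average_precision_at_k(ranked, relevant, k):
    """Average precision over the top-k items of one ranked list."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    # Normalization choice is an assumption; other definitions exist.
    denom = min(k, len(relevant))
    return total / denom if denom else 0.0

def map_at_k(runs, k):
    """runs: list of (ranked shots, set of relevant shots), one per query."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)

# Two made-up person queries with their returned shots and ground truth.
runs = [
    (["s1", "s2", "s3"], {"s1", "s3"}),
    (["s4", "s5"], {"s5"}),
]
map1 = map_at_k(runs, 1)
map10 = map_at_k(runs, 10)
```

A small K rewards only the top of each ranked list; a larger K also counts correct shots further down.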