
New Challenges in Semantic Concept Detection - NIST · 2007. 11. 28.


  • New Challenges in Semantic Concept Detection

    M.-F. Weng, C.-K. Chen, Y.-H. Yang, R.-E. Fan, Y.-T. Hsieh, Y.-Y. Chuang, W. H. Hsu, and C.-J. Lin

    National Taiwan University

    Preliminaries for Semantic Concept Detection

    • What are the preliminaries for building a semantic concept detection system?
      – A lexicon of well-defined concepts
      – Training resources
        • Video data
        • Annotations
        • Features
      – Tools
        • Tagging or labeling tools (e.g., the CMU and IBM tools)
        • Feature extractors
        • Machine learning tools (e.g., LIBSVM)
        • Tools tailored for semantic concept detection

    Semantic Concepts

    • MediaMill 101
    • Columbia374
    • LSCOM 449
    • TRECVID-2006 LSCOM-Lite (39)
    • TRECVID-2005 10 Concepts

    Video Data Sets

    [Bar chart: video length in hours of the development and test sets for the TV03-TV07 datasets]
    • ~190 hours: news video (TV03/TV04)
    • ~330 hours: multi-language broadcast news video (TV05/TV06)
    • 100 hours: Sound and Vision video (TV07)

    Annotations

    [Chart: annotated development and test portions of the TV05-TV07 datasets, covering NIST truth judgments and common annotations]

    Features, Detectors, Scores

    • MediaMill Baseline: visual and text features; 5 sets of 101 classifiers; scores on the TV05/06 datasets
    • Columbia374: EDH, GBR, and GCM features; Columbia 374 detectors; scores on the TV06/07 datasets
    • VIREO-374: color moment, wavelet texture, and keypoint features; VIREO-374 detectors; scores on the TV07 dataset

    Available Resources

    • Concept definitions are sufficient
    • Training resources are plentiful
    • No feature extractors or tailored tools are available

    [Diagram: well-defined concepts; training resources (video data, annotations, features); tools (tagging tools, feature extractors, machine learning tools, tailored tools)]

    The New Challenges

    • Challenge 1: Easy and Efficient Tools
      – L datasets, M concepts, and N features imply L*M*N classifiers
      – Each classifier has many parameters to consider
      – Time is very limited for validating each parameter and training all classifiers
    • Challenge 2: Resource Exploitation or Reuse
      – Resources are precious
      – Existing resources are potentially useful for a new dataset
      – Plentiful resources have not been fully utilized

    Facing the New Challenges

    • Challenge 1:
      – Extended LIBSVM to improve training efficiency
      – Developed an efficient and easy-to-use toolkit tailored for semantic concept detection
    • Challenge 2:
      – Reused classifiers of past data to improve accuracy by late aggregation
      – Exploited contextual relationships and temporal dependencies from annotations to boost accuracy

    Facing the New Challenges

    • Challenge 1: Easy and Efficient Tools
      – Extended LIBSVM to improve training efficiency
      – Developed an efficient and easy-to-use toolkit tailored for semantic concept detection
    • Challenge 2
      – Reused classifiers of past data to improve accuracy by late aggregation
      – Exploited contextual relationships and temporal dependencies from annotations to boost accuracy

    A Tailored Toolkit

    • We extended LIBSVM in three aspects for semantic concept detection:
      – Using dense representations
      – Exploiting parallelism across independent concepts, features, and SVM model parameters
      – Narrowing the parameter search down to a safe range
    • Overall, the training time of our baseline was reduced from about 14 days to about 3 days
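    The combination of a narrowed parameter grid and cross-concept parallelism can be sketched as follows. This is a minimal illustration using scikit-learn's `SVC` in place of the authors' extended LIBSVM; the `(C, gamma)` grid values and the toy data are hypothetical stand-ins, not the toolkit's actual settings.

    ```python
    from concurrent.futures import ThreadPoolExecutor  # parallelism across concepts
    from itertools import product
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # Hypothetical narrowed (C, gamma) grid -- a "safe range" stand-in.
    GRID = list(product([1.0, 8.0, 64.0], [0.5, 0.125]))

    def train_one_concept(X, y):
        """Pick the best (C, gamma) for one concept's binary SVM, then refit."""
        scores = [cross_val_score(SVC(C=C, gamma=g), X, y, cv=3).mean()
                  for C, g in GRID]
        C, g = GRID[int(np.argmax(scores))]
        return SVC(C=C, gamma=g, probability=True).fit(X, y)

    # X is one dense feature matrix shared by all concepts; Y[:, m] holds
    # concept m's binary labels (toy data here).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))
    Y = (rng.random((60, 3)) < 0.5).astype(int)

    # Independent concepts train in parallel.
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(lambda m: train_one_concept(X, Y[:, m]),
                               range(Y.shape[1])))
    ```

    The same pattern extends to parallelism over features and grid points; the key saving is that no concept's grid search waits on another's.
    
    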

    Facing the New Challenges

    • Challenge 1
      – Extended LIBSVM to improve training efficiency
      – Developed an efficient and easy-to-use toolkit tailored for semantic concept detection
    • Challenge 2: Resource Exploitation or Reuse
      – Reused classifiers of past data to improve accuracy by late aggregation
      – Exploited contextual relationships and temporal dependencies from annotations to boost accuracy

    Reuse Past Data

    • Early aggregation
      – Must re-train classifiers
      – Causes considerable training time
    • Late aggregation
      – Simple and direct
      – May be biased

    Late Aggregation

    • We adopt late aggregation to reuse existing classifiers via two strategies:
      – Equally averaged aggregation: simply average the scores of past and newly trained classifiers
      – Concept-dependent weighted aggregation: use concept-dependent weights to aggregate the classifiers
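    Both strategies amount to a weighted average of per-shot detection scores. A minimal sketch follows; the classifier scores and the concept-dependent weights are invented for illustration.

    ```python
    import numpy as np

    def aggregate(scores, weights=None):
        """Late-aggregate per-shot scores from several classifiers.

        scores:  (n_classifiers, n_shots) array of detection scores.
        weights: one weight per classifier, or None for the equal average.
        """
        scores = np.asarray(scores, dtype=float)
        if weights is None:                       # equally averaged aggregation
            weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()         # normalize to sum to 1
        return weights @ scores

    old = [0.9, 0.2, 0.6]   # past classifier (e.g., trained on an earlier dataset)
    new = [0.7, 0.4, 0.5]   # newly trained classifier

    equal = aggregate([old, new])                        # equal average
    weighted = aggregate([old, new], weights=[0.3, 0.7]) # concept-dependent weights
    ```

    In the weighted strategy, a separate weight vector would be chosen per concept, e.g., by validating each candidate weighting on held-out annotations.
    
    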

    Aggregation Benefits

    [Bar chart: per-concept infAP on TV07 comparing the TV07 classifiers alone, average aggregation, and weighted aggregation, for concepts including Flag-US, Police_Security, Explosion_Fire, Airplane, Charts, Military, Desert, Truck, Weather, Mountain, People-Marching, Sports, Boat_Ship, Maps, Meeting, Computer_TV-screen, Car, Animal, Office, Waterscape_Waterfront, and the overall score]

    Overall improvement ratio:
    • Average aggregation: 22%
    • Weighted aggregation: 30%

    Facing the New Challenges

    • Challenge 1
      – Extended LIBSVM to improve training efficiency
      – Developed an efficient and easy-to-use toolkit tailored for semantic concept detection
    • Challenge 2: Resource Exploitation or Reuse
      – Reused classifiers of past data to improve accuracy by late aggregation
      – Exploited contextual relationships and temporal dependencies from annotations to boost accuracy

    Observation in Annotations

    A sequence of video shots annotated against a lexicon of concepts:

      car      1 1 1 1 1 1 1 1
      outdoor  1 1 1 1 1 1 1 1
      building 1 0 0 1 1 1 1 0
      sky      0 1 1 1 0 1 1 0
      people   0 0 0 1 0 1 1 0
      urban    1 1 0 1 1 1 1 1

    The columns reveal a contextual relationship between concepts; the rows reveal a temporal dependency across shots.
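    One simple way to quantify the contextual relationship in such an annotation matrix is pairwise concept co-occurrence. This is an illustrative measure, not necessarily the one used in the presented system; the matrix below is the one from the slide.

    ```python
    import numpy as np

    # Annotation matrix from the slide: rows = concepts, columns = 8 shots.
    A = np.array([
        [1, 1, 1, 1, 1, 1, 1, 1],   # car
        [1, 1, 1, 1, 1, 1, 1, 1],   # outdoor
        [1, 0, 0, 1, 1, 1, 1, 0],   # building
        [0, 1, 1, 1, 0, 1, 1, 0],   # sky
        [0, 0, 0, 1, 0, 1, 1, 0],   # people
        [1, 1, 0, 1, 1, 1, 1, 1],   # urban
    ])
    concepts = ["car", "outdoor", "building", "sky", "people", "urban"]

    def cooccurrence(A):
        """Fraction of shots in which concept i and concept j are both present."""
        return (A @ A.T) / A.shape[1]

    C = cooccurrence(A)
    ```

    High entries such as C[car, outdoor] suggest one concept's detector scores can help rerank another's, which is the idea the reranking step exploits.
    
    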

    Post-processing Framework

    [Flow diagram]
    • Mining phase: temporal dependency mining over the annotations drives temporal filter design
    • Detecting phase: video sequence → video segmentation → feature extraction → concept detection → shot ranking
    • Processing phase: temporal filtering and concept reranking (by unsupervised contextual fusion), followed by combination

    Temporal Filtering

    [Framework diagram repeated, with the temporal filtering step highlighted]

    Temporal Dependency

    • Different concepts have different levels of dependency at different temporal distances
      – E.g., sports, weather, maps, explosion

    [Plot: chi-square statistic χ²_k versus temporal distance k = 1..20 for sports, weather, maps, and explosion]
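    The dependency between a concept's label at shot t and at shot t+k can be measured with a chi-square statistic on the 2x2 contingency table of label pairs, as the plot suggests. A sketch, assuming binary labels in which both values occur on each side of the pairing:

    ```python
    import numpy as np

    def chi2_at_distance(labels, k):
        """Chi-square statistic of the 2x2 contingency table between a concept's
        binary label at shot t and at shot t+k."""
        a = np.asarray(labels[:-k])   # label at shot t
        b = np.asarray(labels[k:])    # label at shot t+k
        table = np.array([[np.sum((a == i) & (b == j)) for j in (0, 1)]
                          for i in (0, 1)], dtype=float)
        # Expected counts under independence of the two positions.
        expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
        return float(((table - expected) ** 2 / expected).sum())

    # Toy label sequence with an 8-shot on/off pattern.
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
    x1 = chi2_at_distance(labels, 1)
    x4 = chi2_at_distance(labels, 4)
    ```

    Plotting the statistic over k = 1..20, as the slide does, shows how far each concept's dependency reaches, which informs the filter length chosen per concept.
    
    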

    Temporal Filter

    An SVM classifier produces a per-shot prediction P(l_t = 1 | x_t). The temporal filter re-estimates shot t from its neighbors x_{t-d}, ..., x_t, ..., x_{t+d}, applying the weights w_0, w_1, ..., w_d symmetrically around t:

      P̂(l_t = 1) = Σ_{k=-d}^{d} w_{|k|} · P(l_{t+k} = 1 | x_{t+k})
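    Such a filter, i.e., a symmetric weighted combination of neighboring shots' scores, is a normalized convolution over the score sequence. A minimal sketch; the weights are hypothetical and the sequence edges are zero-padded:

    ```python
    import numpy as np

    def temporal_filter(p, weights):
        """Smooth per-shot concept probabilities p[t] with weights w_0..w_d
        applied symmetrically to shots t-d..t+d (edges zero-padded)."""
        p = np.asarray(p, dtype=float)
        d = len(weights) - 1
        # Build the symmetric kernel w_d .. w_1 w_0 w_1 .. w_d and normalize it.
        kernel = np.array(list(weights[::-1]) + list(weights[1:]), dtype=float)
        kernel = kernel / kernel.sum()
        padded = np.pad(p, d)
        return np.convolve(padded, kernel, mode="valid")

    # An isolated spike is spread over its neighbors (hypothetical weights).
    smoothed = temporal_filter([0, 0, 1, 0, 0], [0.5, 0.25])
    ```

    In practice the weights (and the window length d) would be set per concept from the mined temporal dependency, e.g., larger d for long-running concepts such as sports.
    
    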

    Filtering Prediction

    • A sequence of shots for predicting sports
      – Classifier prediction results
      – After temporal filtering

    [Two plots of prediction scores over 20 shots, before and after filtering: a misclassified shot is picked up and its rank rises]

    Concept Reranking

    [Framework diagram repeated, with the concept reranking step highlighted]

    Target concept: 'boat' (search or detection). An initial ranking list is produced by a baseline method.

    Step 1. Randomly split the initial list into a training set and a test set.
    Step 2. From shot pairs in the training set, learn to maintain the ranking order, discovering (1) the related concepts and (2) the importance of each concept (for 'boat': ocean, waterscape).
    Step 3. Apply context fusion to the test data using the related concepts.
    Step 4. Merge the training portion with the reranked test portion.
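    Steps 2 and 3 can be approximated by learning weights over related-concept scores with a pairwise ranking (hinge) objective and then fusing. This is a simplified stand-in for the unsupervised contextual fusion, not the authors' actual algorithm, and the toy scores below are invented.

    ```python
    import numpy as np

    def learn_context_weights(init_scores, context, n_iter=100, lr=0.01):
        """Step 2 sketch: learn weights on related-concept scores so the fused
        scores preserve the ordering of the initial (training) list.
        Pairwise hinge updates over shot pairs (i ranked above j)."""
        init_scores = np.asarray(init_scores, dtype=float)
        context = np.asarray(context, dtype=float)   # (n_shots, n_related)
        n, m = context.shape
        w = np.zeros(m)
        order = np.argsort(-init_scores)             # shots from best to worst
        pairs = [(order[i], order[j]) for i in range(n) for j in range(i + 1, n)]
        for _ in range(n_iter):
            for i, j in pairs:
                margin = ((init_scores[i] + context[i] @ w)
                          - (init_scores[j] + context[j] @ w))
                if margin < 1.0:                     # violated pair: push apart
                    w += lr * (context[i] - context[j])
        return w

    def context_fusion(scores, context, w):
        """Step 3 sketch: fuse target scores with weighted related-concept scores."""
        return np.asarray(scores, float) + np.asarray(context, float) @ w

    # Toy data: column 0 is a helpful related concept (e.g., ocean),
    # column 1 an unhelpful one.
    init = [0.9, 0.6, 0.4, 0.1]
    ctx = [[1.0, 0.0], [0.8, 0.3], [0.2, 0.9], [0.0, 1.0]]
    w = learn_context_weights(init, ctx)
    fused = context_fusion(init, ctx, w)
    ```

    The learned weight magnitudes play the role of "importance of each concept": related concepts that consistently agree with the initial ranking receive larger weights.
    
    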

    Combination

    [Framework diagram repeated, with the combination step highlighted]

    Post-processing Benefits

    [Bar chart: per-concept infAP on TV07 for classification alone, concept reranking, temporal filtering, and their parallel combination, for concepts including Flag-US, Police_Security, Explosion_Fire, Military, Airplane, Weather, Charts, Truck, Mountain, Meeting, People-Marching, Boat_Ship, Sports, Desert, Computer_TV-screen, Animal, Car, Maps, Office, Waterscape_Waterfront, and the overall score; selected concepts gain +45%, +29%, +37%, and +17%]

    Overall improvement ratio:
    • Parallel combination: +10%

    Conclusion

    • We reduce the training time of detectors by using a toolkit tailored for semantic concept detection
    • The proposed aggregation methods reuse the classifiers of past data and can boost detection accuracy
    • Our post-processing approaches exploit existing resources and can further improve detection results

    Thank You for Your Attention