Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
New Challenges inNew Challenges inSemantic Concept DetectionSemantic Concept Detection
M.-F. Weng, C.-K. Chen, Y.-H. Yang, R.-E. Fan,Y.-T. Hsieh, Y.-Y. Chuang, W. H. Hsu, and C.-J. Lin
National Taiwan University
2
Preliminaries for Semantic Preliminaries for Semantic Concept DetectionConcept Detection• What are preliminaries for building a semantic
concept detection system?–– A lexicon of wellA lexicon of well--defined conceptsdefined concepts–– Training resourcesTraining resources
• Video data• Annotations• Features
–– ToolsTools• Tagging or labeling tools, (e.g., CMU and IBM tools)• Feature extractors• Machine learning tools, (e.g., LIBSVM)• Semantic concept detection tailored tools
3
Semantic ConceptsSemantic Concepts
MediaMill 101
Columbia374
LSCOM 449
TRECVID-2006 LSCOM-Lite (39)
TRECVID-2005 10 Concepts
4
Video Data SetsVideo Data Sets
TV 03
TV 04
TV 05
TV 06
TV 07
Dat
aset
0 50 100 150 200
Video Length (hrs)
devel. set
test set
~190 hours ~190 hours News VideoNews Video
~330 hours ~330 hours MultiMulti--Lang. Lang. Broadcast Broadcast
News VideoNews Video
100 hours 100 hours Sound and Sound and
VisionVision VideoVideo……
5
AnnotationsAnnotations
TV 05
TV 06
TV 07
Dat
aset
0 50 100 150 200Video Length (hrs)
devel. set
test set
NIST Truth Judgments
Common Annotations
6
Features, Detectors, ScoresFeatures, Detectors, Scores
Scores of TV Scores of TV 07 dataset07 dataset
VIREOVIREO--374 374 detectorsdetectors
•Color moment•Wavelet texture•Keypoint feature
VIREO-374
Columbia374 Columbia374 scores of TV scores of TV 06/07 dataset06/07 dataset
Columbia 374 Columbia 374 detectorsdetectors
•EDH•GBR•GCM
Columbia374
Scores of TV Scores of TV 05/06 dataset05/06 dataset
5 sets of 101 5 sets of 101 classifiersclassifiers
•Visual feature•Text feature
MediaMillBaseline
ScoresScoresDetectorsFeatureatures
7
Available ResourcesAvailable Resources
• Concept definition is sufficient • Training resources are plentiful• No feature extractors and tailored tools available
Tailored ToolsMachine Learning ToolsFeature ExtractorsTagging Tools
Tools
FeaturesAnnotationsVideo data
Training Resources
Well-defined concepts
8
The New ChallengesThe New Challenges
• Challenge 1 : Easy and Efficient Tools–– LL datasets, MM concepts, NN features, imply L*M*NL*M*N
classifiers– Each classifier has to consider many parameters–– Time seems very limitedTime seems very limited to validate each parameter
and to train all classifiers• Challenge 2: Resource Exploitation or Reuse
– Resources are precious– Existent resources are potentially usefulpotentially useful for new
dataset– Plentiful resources have not been fully utilizednot been fully utilized
9
Facing the New ChallengesFacing the New Challenges
• Challenge 1:– Extended LIBSVM to improve training efficiency– Developed an efficient and easy-to-use toolkit
tailored for semantic concept detection• Challenge 2:
– Reused classifiers of past data to improve accuracy by late aggregation
– Exploited contextual relationship and temporal dependency from annotations to boost accuracy
10
Facing the New ChallengesFacing the New Challenges
• Challenge 1 : Easy and Efficient Tools– Extended LIBSVM to improve training efficiency– Developed an efficient and easy-to-use toolkit
tailored for semantic concept detection
• Challenge 2– Reused classifiers of past data to improve
accuracy by late aggregation– Exploited contextual relationship and temporal
dependency from annotations to boost accuracy
11
A Tailored ToolkitA Tailored Toolkit
• We extended LIBSVM in three aspects for semantic concept detection:– Using dense representations– Exploiting parallelism of independent concepts,
features, and SVM model parameters– Narrowing down parameter search to a safe
range
• Overall, training time of our baseline was approximately reduced from 14 days to about 3 daysfrom 14 days to about 3 days
12
Facing the New ChallengesFacing the New Challenges
• Challenge 1 – Extended LIBSVM to improve training efficiency– Developed an efficient and easy-to-use toolkit tailored for
semantic concept detection
• Challenge 2 : Resource Exploitation or Reuse– Reused classifiers of past data to improve accuracy by
late aggregation– Exploited contextual relationship and temporal
dependency from annotations to boost accuracy
13
Reuse Past DataReuse Past Data
• Early aggregation–– Must reMust re--train classifiers train classifiers –– Cause considerable training timeCause considerable training time
• Late aggregation–– Simple and directSimple and direct–– May be biasedMay be biased
14
Late AggregationLate Aggregation
• We adopt late aggregation to reuse existent classifiers by two strategies:–– Equally Average AggregationEqually Average Aggregation
• Simply average the scores of past and newly trained classifiers
–– ConceptConcept--dependent Weighted Aggregationdependent Weighted Aggregation• Use concept-dependent weights to aggregate
classifiers
15
Aggregation BenefitsAggregation Benefits
0
0.05
0.1
0.15
0.2
0.25
0.3
Flag-U
S
Police
_Sec
urity
Explo
sion_
Fire
Airp
lane
Airp
lane
Chart
s
Milita
ryDe
sert
Dese
rt
Truc
k
Wea
ther
Moun
tain
Peop
le-Ma
rching
Spor
ts
Spor
ts
Boat_
Ship
Maps
Maps
Meeti
ng
Comp
uter_T
V
Comp
uter_T
V--scre
en
scree
n
Car
Anim
al
Offic
e
Wate
rscap
e_Wa
terfro
nt
Overa
ll
infA
P
TV07 Classifiers
Average Aggregation
Weighted Aggregation
Overall Improvement Ratio –
Average Aggregation : 22%
Weighed Aggregation : 30%
16
Facing the New ChallengesFacing the New Challenges
• Challenge 1 – Extended LIBSVM to improve training efficiency– Developed an efficient and easy-to-use toolkit tailored for
semantic concept detection
• Challenge 2: Resource Exploitation or Reuse– Reused classifiers of past data to improve accuracy by
late aggregation– Exploited contextual relationship and temporal
dependency from annotations to boost accuracy
17
Observation in AnnotationsObservation in AnnotationsA sequence of video shots
A lexiconof concepts
car
outdoor1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
building 1 0 0 1 1 1 1 0
sky 0 1 1 1 0 1 1 0people 0 0 0 1 0 1 1 0
urban 1 1 0 1 1 1 1 1
Contextual relationshipTemporal Dependency
18
PostPost--processing Frameworkprocessing Framework
VideoSegmentation
FeatureExtraction
ConceptDetection
TemporalFiltering
Shot Ranking
VideoSequence
ConceptReranking
CombinationAnnotation
TemporalDependency
Mining
UnsupervisedContextual
Fusion
TemporalFilter
Design
Mining phase: Processing phase:Detecting phase:
19
Temporal FilteringTemporal Filtering
VideoSegmentation
FeatureExtraction
ConceptDetection
TemporalFiltering
Shot Ranking
VideoSequence
ConceptReranking
CombinationAnnotation
TemporalDependency
Mining
UnsupervisedContextual
Fusion
TemporalFilter
Design
Mining phase: Processing phase:Detecting phase:
20
Temporal DependencyTemporal Dependency• Different concepts have different levels of
dependency at different temporal distance– E.g., sports, weather, maps, explosion sports, weather, maps, explosion
0
2000
4000
6000
8000
10000
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
sportsweathermapsexplosion
Temporal Distance
Chi-square test
2kχ
21
Temporal FilterTemporal Filter
Xt-1
)|1( 1−= ttlP x
Xt-2
)|1( 2−= ttlP x
Xt-k
)|1( kttlP −= x
Xt Xt+1 Xt+2 Xt+k…
)|1( kttlP += x)|1( 2+= ttlP x)|1( 1+= ttlP x)|1( ttlP x=
…….. …..
w0w1 w1 w2 wkw2wk
( )[ ]∑=
−===d
kkttkt lPwlP
0)|1()1(ˆ x
SVM Classifier
)|1( 11 −− = ttlP x)|1( 22 −− = ttlP x)|1( ktktlP −− = x )|1( ktktlP ++ = x)|1( 22 ++ = ttlP x)|1( 11 ++ = ttlP x)|1( ttlP x=
22
Filtering PredictionFiltering Prediction• A sequence of shots for predicting sportssports
–– Classifier prediction resultsClassifier prediction results
–– After temporal filteringAfter temporal filtering0
0.20.40.60.8
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
00.10.2
0.30.4
0.5
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Misclassification picked upRank rose
23
Concept Concept RerankingReranking
VideoSegmentation
FeatureExtraction
ConceptDetection
TemporalFiltering
Shot Ranking
VideoSequence
ConceptReranking
CombinationAnnotation
TemporalDependency
Mining
UnsupervisedContextual
Fusion
TemporalFilter
Design
Mining phase: Processing phase:Detecting phase:
24
Initial ranking list produced by a baseline methodTarget concept: ‘boat’ (search or detection)
Initial list
25
training
test
Step 1. Randomly split to training and test sets
Initial list
26
training
test
Step 2. Learn to maintain the ranking orders
Initial list
1. Related concepts1. Related concepts2. Importance of each2. Importance of each
conceptconcept
Related concepts:oceanwaterscape
Shot pairs
27
training
test
Related concepts:oceanwaterscape
Step 3. Context fusion on the test data
Initial list
28
Reranked
test
Initial list training
Step 4. Merge
29
CombinationCombination
VideoSegmentation
FeatureExtraction
ConceptDetection
TemporalFiltering
Shot Ranking
VideoSequence
ConceptReranking
CombinationAnnotation
TemporalDependency
Mining
UnsupervisedContextual
Fusion
TemporalFilter
Design
Mining phase: Processing phase:Detecting phase:
30
PostPost--processing Benefitsprocessing Benefits
Flag-U
S
Police
_Sec
urity
Explo
sion_
Fire
Milita
ry
Airpla
ne
Wea
ther
Chart
sTr
uck
Moun
tain
Meeti
ng
Peop
le-Ma
rching
Boat_
Ship
Sport
s
Dese
rt
Comp
uter_T
V-sc
reen
Anim
al Car
Maps
Offic
e
Wate
rscap
e_Wa
terfro
nt
Overa
ll0
0.05
0.1
0.15
0.2
0.25
0.3
infA
P
ClassificationConcept RerankingTemporal FilteringParallel Combination
Overall improvement ratio:Overall improvement ratio:
Parallel Combination: +10%Parallel Combination: +10%
+45%
+29%
+37%
+17%
31
ConclusionConclusion• We reduce the training time of detectorsreduce the training time of detectors by
using a tailored toolkit for semantic concept detection
• The proposed aggregation methods reuse the reuse the classifiersclassifiers of past data and can boost the detection accuracy
• Our post-processing approaches exploit exploit existent resourceexistent resource and can further improve detection results
32
Thank You For Your AttentionThank You For Your Attention