Compression, Processing, Indexing and Retrieval of 3D Objects and Data:
How to extend image/video processing to graphics?
Tsuhan Chen, Carnegie Mellon University
Joint work with Howard Leung, Masa Okuda, and Cha Zhang
(Mis)Understanding
To graphics and vision communities
  Video is just low-level processing
To the video community
  Graphics is just some fancy tools
  Vision is things that don’t work in practice
First Attempt… MPEG-4
Started out as model-based coding
  Analysis and synthesis
  Using vision/graphics for video coding
That didn’t happen (not completely)
  Settled with 2D shape-based coding
  Model-based coding for limited content, e.g., faces
Modeling and Coding
EXAMPLES | MODELS | CODED INFORMATION
PCM | Pixels | Color of pixels
Predictive coding, transform coding | Statistically dependent pixels | Prediction error or transform coeffs
Block-based coding (H.261/263, MPEG-1/2) | Moving blocks | Motion vectors and prediction error
Region-based coding (H.263+, MPEG-4) | Moving regions | Shapes, motion, and colors of regions
MPEG-4 | Moving objects | Shapes, motion, and colors of objects
Model-based coding | Facial models | Action units
MPEG-7 | A/V objects | Description
Modeling and Coding (cont.)
Better modeling implies
  Higher compression
  More content accessibility
  More complexity
  Less error resilience
Video and vision/graphics do go hand-in-hand all along
Video research is evolution of vision and graphics techniques
Topics
Compression for image-based rendering
Compression for 3D meshes
Streaming in texture and geometry jointly
Indexing and retrieval of 3D objects
Building immersive environments
Image-Based Rendering
[Shum et al.]
Compression
The number of images is large, so we need compression
Good to have fewer samples
  Does not guarantee fewer bits
Consider these as a video sequence
  General video coding applies
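Video Codec
1 Intra coding
2 Inter coding
[Figure: encoder and decoder block diagrams joined by "Network or Storage". Encoder blocks: DCT, Q, IQ, IDCT, D, ME, MC, with MV output. Decoder blocks: IQ, IDCT, D, MC, with MV input. Switch positions 1 and 2 select intra or inter coding. Labels: previous frame (reference frame), current frame.]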
Intra Coding
Disadvantage: Does not exploit the correlation between images
[Figure: frames 1, 2, 3, ... each coded as an I-frame]
Inter Coding
Disadvantage: Does not provide random access
i.e., frame N depends on frame N-1
[Figure: frame 1 coded as an I-frame; frames 2, 3, ... coded as P-frames]
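The random-access trade-off between the two modes can be put in numbers. The sketch below is illustrative only: it assumes a single I-frame at frame 1 followed by P-frames, and the function name `frames_to_decode` is made up for this example.

```python
def frames_to_decode(target, scheme):
    """Frames that must be decoded before frame `target` (1-indexed) can be shown."""
    if target < 1:
        raise ValueError("frame numbers start at 1")
    if scheme == "intra":
        # Every frame is a self-contained I-frame.
        return 1
    if scheme == "inter":
        # Assumption: frame 1 is the only I-frame; frame N needs frames 1..N-1 first.
        return target
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for n in (1, 2, 3, 100):
        print(n, frames_to_decode(n, "intra"), frames_to_decode(n, "inter"))
```

Intra coding pays for its constant access cost with more bits per frame; inter coding saves bits but makes the access cost grow with the frame index.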
Prediction from Sprite
[Figure: images 1, 2, 3, ..., k predicted from a sprite]
[cf. Anandan et al.]
Generation of Sprite
Step 1: Finding the offset
Step 2: Generating the sprite
[Figure: Image 1, Image 2, ..., Image N-1, Image N are placed at their offsets (step 1) and combined into the sprite (step 2)]
Weighting: need to find a weighting function to blend the images to form the sprite
[Figure: three plots of weight versus column number (0 to 400): constant weighting, triangular weighting, delta weighting]
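A minimal sketch of step 2 (blending images into a sprite) with the three weighting functions. Everything here is an assumption made for illustration: offsets are horizontal only and already known, images are grayscale NumPy arrays of equal size, and the "delta" profile is approximated by a narrow box of `delta_width` columns around the center column. The names `weight_profile` and `build_sprite` are hypothetical.

```python
import numpy as np


def weight_profile(width, kind, delta_width=9):
    """Per-column blending weights across one image, normalized to sum to 1."""
    cols = np.arange(width)
    center = (width - 1) / 2.0
    if kind == "constant":
        w = np.ones(width)
    elif kind == "triangular":
        # Peaks at the center column and stays positive at the borders.
        w = 1.0 - np.abs(cols - center) / (center + 1.0)
    elif kind == "delta":
        # Assumption: a narrow box around the center stands in for a delta.
        w = (np.abs(cols - center) <= delta_width / 2.0).astype(float)
    else:
        raise ValueError(f"unknown weighting: {kind}")
    return w / w.sum()


def build_sprite(images, offsets, kind="delta"):
    """Blend images placed at horizontal offsets into one sprite.

    Returns the sprite and a boolean mask of sprite columns that received weight.
    """
    height, width = images[0].shape
    sprite_width = max(offsets) + width
    acc = np.zeros((height, sprite_width))
    wsum = np.zeros(sprite_width)
    w = weight_profile(width, kind)
    for img, off in zip(images, offsets):
        acc[:, off:off + width] += img * w   # weights broadcast over rows
        wsum[off:off + width] += w
    covered = wsum > 0
    sprite = np.zeros_like(acc)
    sprite[:, covered] = acc[:, covered] / wsum[covered]
    return sprite, covered


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    panorama = rng.random((4, 100))
    offsets = [0, 20, 40, 60]
    images = [panorama[:, o:o + 40] for o in offsets]
    for kind in ("constant", "triangular", "delta"):
        sprite, covered = build_sprite(images, offsets, kind)
        print(kind, int(covered.sum()), "columns covered")
```

With a narrow profile each sprite column is taken almost entirely from the image whose center is nearest, so closely spaced images are needed to cover every column.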
Constant Weighting
[Figure: images 1, 2, 3, ..., k]
Triangular Weighting
[Figure: images 1, 2, 3, ..., k]
Delta Weighting
[Figure: images 1, 2, 3, ..., k]
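Modified Codec
3 Prediction from sprite image without MC
[Figure: encoder and decoder block diagrams joined by "Network or Storage". Encoder blocks: DCT, Q, IQ, IDCT, Prediction from sprite + offset. Decoder blocks: Prediction from sprite + offset, added to the decoded residual.]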
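A simplified sketch of mode 3, prediction from the sprite without MC. It keeps only the prediction and quantization steps: the DCT/IDCT pair and the entropy coder from the diagram are left out, the offset is horizontal only, and `step` is a hypothetical uniform quantizer step size. The function names are made up for this example.

```python
import numpy as np


def encode_from_sprite(image, sprite, offset, step=8.0):
    """Quantize the residual between an image and its sprite prediction."""
    prediction = sprite[:, offset:offset + image.shape[1]]
    residual = image - prediction
    return np.round(residual / step).astype(np.int32)


def decode_from_sprite(levels, sprite, offset, step=8.0):
    """Rebuild the image from the same sprite prediction plus the dequantized residual."""
    prediction = sprite[:, offset:offset + levels.shape[1]]
    return prediction + levels * step


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sprite = rng.random((8, 64)) * 255
    image = sprite[:, 10:42] + rng.normal(0, 5, (8, 32))
    levels = encode_from_sprite(image, sprite, 10)
    recon = decode_from_sprite(levels, sprite, 10)
    print("max abs error:", float(np.max(np.abs(recon - image))))
```

Each image depends only on the sprite and its own offset, never on a neighboring image, which is what keeps random access cheap.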
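With Motion Compensation
4 Prediction from sprite image with MC
[Figure: encoder and decoder block diagrams joined by "Network or Storage". Encoder blocks: DCT, Q, IQ, IDCT, ME, MC, Prediction from sprite + offset, with MV output. Decoder blocks: MC with MV input, Prediction from sprite + offset, added to the decoded residual.]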
With vs. Without MC
[Figure: 3 without MC, 4 with MC]
Test Sequences (1)
Synthetic sequence 1: NetICE room
Synthetic sequence 2: Park
Test Sequences (2)
Real sequence 1: Kids
Real sequence 2: Kongmiao
[Shum et al.]
Weighting Function Results
[Figure: four plots of PSNR (dB) versus bit rate (bpp), one each for synthetic sequence 1, synthetic sequence 2, real sequence 1, and real sequence 2, comparing constant weighting, triangular weighting, and delta weighting]
Compression Result
[Figure: PSNR (dB) versus bit rate (bpp) for synthetic sequence 1 and synthetic sequence 2, comparing intra coding, inter coding, sprite without MC, and sprite with MC]
Compression Result
[Figure: PSNR (dB) versus bit rate (bpp) for real sequence 1 and real sequence 2, comparing intra coding, inter coding, sprite without MC, and sprite with MC]
Enhancements
Window size for searching offsets
Stripe motion compensation
MC using a large reference frame
Multiple sprites
[Figure: Kids sequence, PSNR (dB) versus bit rate (bpp, 0.2 to 0.8) for 1, 3, 5, 7, and 9 sprites]
Recap…
Sprite prediction with MC is better than intra coding
Sprite prediction with MC is preferred for random access
Better than inter coding for real data
Delta weighting is the best for constructing the sprite
Can be extended to higher dimensions
  Lumigraph, lightfield, etc.
Streaming 3D
[Figure: geometry/texture of a streamed model after 3 seconds, 10 seconds, and 40 seconds]
Texture + Geometry = 3D Object
Corner-Based
Vertex-Based
Why Compression?
Each vertex: three floating-point numbers
If each vertex is shared by 6 triangles, and the max number of vertices per model is 2^20
→ 108 bits/triangle needed
→ 100KB~1MB for an average model + texture
$$\left(\frac{1}{6} \times 3\right) \frac{\text{vertices}}{\text{triangle}} \times (3 \times 32)\, \frac{\text{bits}}{\text{vertex}} + 3\, \frac{\text{vertex IDs}}{\text{triangle}} \times 20\, \frac{\text{bits}}{\text{vertex ID}} = 108\, \frac{\text{bits}}{\text{triangle}}$$
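The 108 bits/triangle figure can be checked with a few lines of arithmetic. This sketch assumes 32-bit floats for coordinates and 20-bit vertex IDs (enough for 2^20 vertices); the 20,000-triangle model at the end is a hypothetical size chosen only to show the order of magnitude.

```python
def bits_per_triangle(triangles_per_vertex=6, coord_bits=32, id_bits=20):
    """Uncompressed cost of an indexed triangle mesh, in bits per triangle."""
    # Each triangle has 3 corners, and each vertex is shared by several triangles.
    vertices_per_triangle = 3 / triangles_per_vertex
    geometry = vertices_per_triangle * 3 * coord_bits   # 3 coordinates per vertex
    connectivity = 3 * id_bits                          # 3 vertex IDs per triangle
    return geometry + connectivity


if __name__ == "__main__":
    per_triangle = bits_per_triangle()
    print(per_triangle)                     # 108.0
    triangles = 20_000                      # hypothetical model size
    print(per_triangle * triangles / 8 / 1000, "KB of geometry and connectivity")
```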
Compression of 3D Objects
Texture compression
  Static textures: JPEG or JPEG 2000
  Dynamic textures: MPEG or H.263
Geometry compression
  Quantization of vertex coordinates
  Predictive coding
  Entropy coding
Granular/stable progressive coding
Mesh optimization/simplification
[Hoppe et al.] [Heckbert et al.] [Schroder et al.] [Taubin et al.]
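A minimal sketch of the three geometry-compression steps, under simplifying assumptions that are not from the slides: coordinates are quantized uniformly to `bits` bits per axis, each vertex is predicted from the previous vertex in the list (real coders usually predict from mesh connectivity), and `zlib` stands in for the entropy coder.

```python
import zlib

import numpy as np


def quantize(vertices, bits=12):
    """Uniformly quantize each coordinate axis to `bits` bits."""
    vertices = np.asarray(vertices, dtype=np.float64)
    lo = vertices.min(axis=0)
    span = vertices.max(axis=0) - lo
    span[span == 0] = 1.0                    # guard against flat axes
    scale = (2 ** bits - 1) / span
    q = np.round((vertices - lo) * scale).astype(np.int32)
    return q, lo, scale


def dequantize(q, lo, scale):
    return q / scale + lo


def predict_residuals(q):
    """Previous-vertex prediction: keep vertex 0, then store differences."""
    res = q.copy()
    res[1:] -= q[:-1]
    return res


def reconstruct(res):
    """Invert predict_residuals by cumulative summation."""
    return np.cumsum(res, axis=0)


if __name__ == "__main__":
    t = np.linspace(0, 4 * np.pi, 2000)
    verts = np.stack([np.cos(t), np.sin(t), 0.1 * t], axis=1)
    q, lo, scale = quantize(verts)
    res = predict_residuals(q)
    raw_bytes = len(verts.astype(np.float32).tobytes())
    # 12-bit values give residuals within +/-4095, so int16 is wide enough.
    coded_bytes = len(zlib.compress(res.astype(np.int16).tobytes()))
    print("raw float32 bytes:", raw_bytes, "coded bytes:", coded_bytes)
```

Quantization is the only lossy step; prediction and entropy coding are exactly invertible.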
Block Diagram
[Figure: the 3D model supplies texture, vertex coordinates, and connectivity. Blocks: Texture Coding, Vertex Quantization, Prediction, Entropy Coding. Output: bitstream.]
Encoding
Vertex decimation
Re-triangulation
[Figure: a vertex v_i surrounded by vertices 1 to 6 is decimated and the resulting hole is re-triangulated]
Importance of Vertices
(1) Volume: v(i)
(2) Color: c(i)
[Figure: volumes V1 and V2]
Rank all vertices from high to low based on a cost function:
$$m(i) = \lambda\, v(i) + (1 - \lambda)\, c(i)$$
v(i) is the geometry cost
c(i) is the texture/color/normal cost
λ is a user-specified parameter
Decimate the vertices with low cost first
Transmit the vertices with high cost first
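A small sketch of the ranking rule, reading the cost function as m(i) = λ v(i) + (1 − λ) c(i). The weight symbol is illegible in the extracted slide, so `lam` is an assumed name, and the cost values in the usage example are made up.

```python
def rank_vertices(geometry_cost, color_cost, lam=0.5):
    """Return vertex indices sorted from high to low combined cost, plus the costs."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    costs = [lam * v + (1.0 - lam) * c for v, c in zip(geometry_cost, color_cost)]
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    return order, costs


if __name__ == "__main__":
    v = [0.9, 0.1, 0.5]   # hypothetical geometry costs v(i)
    c = [0.2, 0.1, 0.9]   # hypothetical texture/color/normal costs c(i)
    transmit_order, costs = rank_vertices(v, c, lam=0.5)
    decimate_order = transmit_order[::-1]
    print("transmit first:", transmit_order)   # [2, 0, 1]
    print("decimate first:", decimate_order)   # [1, 0, 2]
```

Setting `lam` near 1 favors geometric fidelity; setting it near 0 favors texture and color fidelity.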
Coding of Texture
Vertex-based
  Wavelet (SPIHT) + entropy coding
Corner-based
  Padding + DCT + run-length coding + entropy coding
  Texture re-mapping needed
Texture Re-Mapping
[Figure: panels (a), (b), (c). After a vertex v is removed and the mesh is retriangulated, texture coordinates t1 to t6 are remapped to t'1 to t'6 in a new texture map.]
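Comparison (in KBytes)
Model | Our Algorithm | MPEG-4 | Attributes
Beethoven | 10 | 14 | none
Cow | 12 | 15 | none
Crocodile | 36 | 41 | none
Horse | 40 | 48 | none
Totem | 160 | - | texture
Pieta | 14 | 18 | none
Duck | 15 | - | texture
Vase | 126 | - | texture
VRML + gzip column (digits fused in extraction, per-model values not separable): 683651605 79266234 6750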
View-Adaptive Transmission
[Figure: viewpoint A, viewpoint B, and a hypothetical viewpoint]
Retrieval of 3D Objects
Indexing and retrieval
  Much is done for images [Huang et al.] [Cox et al.]
  Recent work for 3D objects
  Related to MPEG-7
Feature extraction
Feature matching
Feature Extraction
Traditionally vertex/surface-based
New region-based features
  moment invariants, Fourier transform coefficients, etc.
Preprocessing to close the model
[Figure: surface versus region]
Feature Extraction (cont.)
Efficiently calculate region-based features directly from the mesh
Signed feature for each mesh element
Robust to triangulation
Applies to any feature that can be decomposed to each mesh element
[Figure: a region feature formed as a signed (+/-) sum of per-element contributions]
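Volume is one region-based feature that decomposes over mesh elements in this way. The sketch below is an illustration, not necessarily the feature set used in the talk: each triangle and the origin form a tetrahedron whose signed volume is positive or negative depending on the triangle's orientation, and the signed volumes sum to the enclosed volume of a closed, consistently oriented mesh.

```python
import numpy as np


def mesh_volume(vertices, triangles):
    """Enclosed volume of a closed mesh as a sum of signed tetrahedron volumes."""
    vertices = np.asarray(vertices, dtype=np.float64)
    triangles = np.asarray(triangles)
    a = vertices[triangles[:, 0]]
    b = vertices[triangles[:, 1]]
    c = vertices[triangles[:, 2]]
    # Signed volume of tetrahedron (origin, a, b, c) is a . (b x c) / 6.
    signed = np.einsum("ij,ij->i", a, np.cross(b, c)) / 6.0
    return float(signed.sum())


# Unit cube with outward-facing (counter-clockwise) triangles.
CUBE_VERTICES = np.array([
    [0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
    [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1],
], dtype=np.float64)
CUBE_TRIANGLES = np.array([
    [0, 3, 2], [0, 2, 1],   # bottom (z = 0)
    [4, 5, 6], [4, 6, 7],   # top (z = 1)
    [0, 1, 5], [0, 5, 4],   # front (y = 0)
    [3, 6, 2], [3, 7, 6],   # back (y = 1)
    [0, 4, 7], [0, 7, 3],   # left (x = 0)
    [1, 2, 6], [1, 6, 5],   # right (x = 1)
])

if __name__ == "__main__":
    print(mesh_volume(CUBE_VERTICES, CUBE_TRIANGLES))
    print(mesh_volume(CUBE_VERTICES + [5.0, -3.0, 2.0], CUBE_TRIANGLES))
```

Because the result is a plain sum over triangles, splitting a face into more triangles leaves it unchanged, which is the sense in which such features are robust to triangulation.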
3D Model Retrieval
Annotation and Active Learning
Semantics through annotation are needed
  Low-level features are not enough
  Hierarchical annotation
  Compatible concepts in annotation
Active learning
  Complete annotation is impractical
  Select the most uncertain object for annotation
Annotation
Active Learning
For each model and each concept, we maintain a probability that the model belongs to the concept
Set the probability to 1 or 0 if annotated
Estimate the probabilities of the unlabeled objects with a potential function
Use the probabilities to estimate uncertainty and to measure the semantic distance
Active Learning
The potential function
$$p = 0.5 + 0.5 \exp\left(-c_0\, d^2 / d_{\max}^2\right)$$
[Figure, left panel: p (0.5 to 1) versus d (-5 to 5) for one annotated neighborhood, with a width of 2 d_max marked. Right panel: axes labeled f (-10 to 10) and p_k (0.5 to 1) for multiple annotated neighborhoods. Annotated models are marked.]
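Estimate the Uncertainty
The general criterion:
$$G_i = f_{Di} \cdot U_i = f_{Di} \cdot U_i(p_{i1}, p_{i2}, \ldots, p_{iK}), \quad i = 1, 2, \ldots, N$$
f_Di: local density function
U_i: uncertainty measurement
$$U_i(p_{i1}, p_{i2}, \ldots, p_{iK}) = -p_i^{\max} \log p_i^{\max} - (1 - p_i^{\max}) \log(1 - p_i^{\max})$$
$$p_i^{\max} = \max_k (p_{ik}), \quad k = 1, 2, \ldots, K$$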
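A sketch of the selection criterion, under one reading of the slide's formulas: the uncertainty of model i is the binary entropy of its largest concept probability, and the model chosen for annotation maximizes local density times uncertainty. That reading, the array layout (`p[i, k]` for model i and concept k), and the function names are assumptions.

```python
import numpy as np


def uncertainty(p):
    """Binary entropy of each model's largest concept probability."""
    # Clip so that annotated models (probability exactly 0 or 1) give ~0, not NaN.
    p_max = np.clip(p.max(axis=1), 1e-12, 1.0 - 1e-12)
    return -p_max * np.log(p_max) - (1.0 - p_max) * np.log(1.0 - p_max)


def select_for_annotation(p, density):
    """Pick the model with the largest density-weighted uncertainty."""
    gain = np.asarray(density, dtype=np.float64) * uncertainty(np.asarray(p, dtype=np.float64))
    return int(np.argmax(gain)), gain


if __name__ == "__main__":
    # Hypothetical probabilities for 3 models and 2 concepts.
    p = np.array([[1.0, 0.0],    # already annotated
                  [0.5, 0.5],    # maximally uncertain
                  [0.9, 0.2]])
    index, gain = select_for_annotation(p, density=np.ones(3))
    print(index, gain)
```

Weighting by density steers annotation effort toward uncertain models that sit in crowded regions of feature space, where one label informs many neighbors.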
Estimate the Uncertainty
[Figure: plot with axes p_1 and p_2]
Semantic Distance
Cannot use Kullback-Leibler divergence
Our semantic distance is defined as:
[Figure: annotated models and unannotated models]
[Equation, only partly recoverable: a piecewise definition of the semantic distance d^s between models i and j over the levels of the concept tree, built from per-concept terms $d_k^s = p_{ik}(1 - p_{jk}) + p_{jk}(1 - p_{ik})$, $k = 1, 2, \ldots, K$, with separate cases depending on the lowest concept-tree level involved]
Results
[Figure: retrieval performance (D) versus number of samples/models annotated, comparing best gradient search, random sampling, and our algorithm. Left panel: synthetic database. Right panel: a small database.]
Results (cont.)
[Figure: retrieval performance (D) versus number of models annotated (0 to 2000), comparing random sampling and our algorithm]
Recap…
New feature set for 3D models
Active learning to improve annotation efficiency
Compatible concept tree for annotation
Probability for both uncertainty estimation and semantic distance
Immersive Environments
“Collaboration from anywhere, through any media, as if face-to-face in one room”
[Figure: Network]
A Prototype
NetICE: Networked Intelligent Collaborative Environment
Lip-sync facilitates speech understanding
  Who is speaking and what is being said
Consistent spatial relationship with eye contact
  Who is being spoken to
Facial expressions and voice-driven hand gestures
Directional sound gives a sense of distance and direction
  Who is where; who is speaking
  Enables small-group interaction in a room full of people
Information sharing
  Shared whiteboard
  Streaming 3D objects
  Enables collaborative design, e.g., cars, buildings, etc.
NetICE
Case Study: Online Auction
Ongoing Work
Use IBR for background rendering
User study
  Together or on-location
Tracking for rendering
  Head tracking for head orientation
  Gaze tracking for eye contact
  Hand tracking for hand gestures
Summary
Compression for IBR
Compression for 3D meshes
Indexing and retrieval of 3D objects
Immersive environments
Advanced Multimedia Processing Lab
Please visit us at: http://amp.ece.cmu.edu