PhD thesis abstracts
ACM SIGMM Records, Vol. 4, No. 1, March 2012
ISSN 1947-4598, http://sigmm.org/records
TIMIT showed that the proposed features yield low phoneme recognition accuracies, implying higher privacy.
For the speaker diarization task, we interpret the extraction of privacy-sensitive features as an objective that maximizes the mutual information (MI) with speakers while minimizing the MI with phonemes. The source-filter model arises naturally out of this formulation. We then investigate two different approaches for extracting excitation-source-based features, namely Linear Prediction (LP) residual and deep neural networks. Diarization experiments on the single and multiple distant microphone scenarios from the NIST Rich Transcription evaluation datasets show that these features yield a performance close to that of Mel Frequency Cepstral Coefficient (MFCC) features. Furthermore, listening tests support the proposed approaches in terms of yielding low intelligibility in comparison with MFCC features.
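As an illustration of the LP residual mentioned above, the following is a minimal sketch of residual extraction for a single frame, using the autocorrelation method and the Levinson-Durbin recursion. The function names, the default prediction order, and the synthetic test frame are assumptions made for this example; the thesis's actual feature pipeline is not described in this abstract.

```python
def autocorrelation(frame, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return LP coefficients a[0..order-1] with x[n] ~ sum_k a[k] * x[n-1-k]."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        prev = a[:]
        a[i] = k
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order=10):
    """Return (LP coefficients, residual) for one frame (len(frame) > order).

    The residual is what remains after removing the short-term (filter)
    prediction, so it approximates the excitation source.
    """
    a = levinson_durbin(autocorrelation(frame, order), order)
    residual = []
    for n, sample in enumerate(frame):
        prediction = sum(a[k] * frame[n - 1 - k]
                         for k in range(order) if n - 1 - k >= 0)
        residual.append(sample - prediction)
    return a, residual


# Hypothetical test frame: impulse response of a stable all-pole filter.
frame = [1.0, 1.3]
for _ in range(198):
    frame.append(1.3 * frame[-1] - 0.6 * frame[-2])
coeffs, residual = lp_residual(frame, order=2)
print(coeffs, residual[:3])
```

On this synthetic frame the residual collapses to the single exciting impulse, which is the sense in which the residual keeps source information while discarding the spectral envelope.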
The last part of the thesis studies the application of our methods to SND and diarization in outdoor settings. While our diarization study was more preliminary in nature, our study on SND brings about the conclusion that privacy-sensitive features trained on outdoor audio yield performance comparable to that of PLP features trained on outdoor audio. Lastly, we explored the suitability of using SND models trained on indoor conditions for the outdoor audio. Such an acoustic mismatch caused a large drop in performance, which could not be compensated for even by combining indoor models.
Thesis Advisors: Dr. Daniel Gatica-Perez, Prof. Herve Bourlard; Thesis Jury President: Dr. Jean-Marc Vesin; Thesis Reviewers: Prof. Daniel P.W. Ellis, Prof. Simon King, Prof. Jean-Philippe Thiran
http://library.epfl.ch/theses/?nr=5234
Social computing group
http://www.idiap.ch/~gatica/research-team.html
Mohammad Kazemi Varnamkhasti
Multiple Description Video Coding Based on Base and Enhancement Layers of SVC and Channel Adaptive Optimization
Multiple description coding (MDC) is a promising solution for video transmission over lossy channels. In MDC, multiple descriptions of a source are generated which are independently decodable and mutually refinable. When all descriptions are available, the corresponding quality is called central quality; otherwise it is called side quality. Generally, there exists a trade-off between side and central quality in all MDC schemes. MDC methods which provide a better central-side quality trade-off are of more interest to designers.
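To make the central and side quality terms concrete, here is a toy two-description coder for a 1-D signal: samples are quantized and split into even and odd streams, the central decoder interleaves both, and a side decoder interpolates the missing stream. This is a textbook-style subsampling sketch chosen only for illustration, not the scheme proposed in the thesis; all function names, the quantization step, and the test signal are assumptions.

```python
import math


def encode_two_descriptions(signal, step):
    """Quantize, then split into even-indexed and odd-indexed descriptions."""
    quantized = [step * round(x / step) for x in signal]
    return quantized[0::2], quantized[1::2]


def central_decode(even, odd):
    """Both descriptions received: interleave them again."""
    return [even[i // 2] if i % 2 == 0 else odd[i // 2]
            for i in range(len(even) + len(odd))]


def side_decode(description, length, offset):
    """One description received (offset 0 = even samples, 1 = odd samples).

    Missing samples are estimated as the mean of their received neighbours.
    Assumes length >= 2.
    """
    out = [None] * length
    for j, value in enumerate(description):
        out[offset + 2 * j] = value
    for i in range(length):
        if out[i] is None:
            neighbours = [out[k] for k in (i - 1, i + 1) if 0 <= k < length]
            out[i] = sum(neighbours) / len(neighbours)
    return out


def mse(reference, decoded):
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)


# Hypothetical source signal and quantizer step.
signal = [math.sin(0.5 * i) for i in range(64)]
even, odd = encode_two_descriptions(signal, step=0.05)
print("central MSE:", mse(signal, central_decode(even, odd)))
print("side MSE   :", mse(signal, side_decode(even, len(signal), 0)))
```

In this toy, the central distortion is pure quantization error, while the side distortion adds interpolation error for the lost half of the samples. Adding redundancy between the two streams would improve side quality at the expense of central quality for the same rate, which is the trade-off discussed above.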
In this thesis a new MDC scheme is introduced which has a better trade-off between side and central quality compared to existing schemes. In other words, for the same central quality, it provides higher side quality; or equivalently, for the same side quality, it has higher central quality. This method is based on the mixing of the base and enhancement layers of Coarse-Grain Scalable (CGS) coding and hence is called Mixed Layer MDC (MLMDC). At the central decoder the layers are separated and we have two-layer quality, such as in the CGS decoder. At the side decoder, some descriptions are not available and hence we cannot separate the layers directly. We propose to use estimation for this purpose. MLMDC for two-description coding and four-description coding is implemented in JM16.0, the H.264/AVC reference software. The experimental results show that for videos which have sufficiently dynamic content (texture and motion activity), MLMDC in comparison to the conventional MDCs provides higher side quality for the same central quality. In addition, for video transmission over channels with packet loss (such as the Internet), MLMDC provides higher average video quality, in particular for four-description coding.
In error-prone environments, we need to have higher side quality, while in less noisy conditions higher central quality is more important. Therefore, in order to have the best quality in different channel conditions, optimization is needed. For this purpose, a new model for end-to-end distortion is introduced which takes into account both quantization and transmission distortions for predicting the quality at the receiver side. The transmission distortion is the result of error propagation, which in turn originates from the mismatch between side and central decoder outputs. The derived model is applicable to all DCT-domain MDCs. The model is first verified with experimental results and then used for MDC optimization. The results show the performance of the optimizer and also the higher video quality of MLMDC compared to that of conventional MDCs when they are designed optimally.
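The reason channel conditions change the preferred design can be seen in a much simpler calculation than the thesis's end-to-end model, which is not reproduced here. The sketch below assumes two descriptions lost independently with the same probability and equal side distortions. The function name and all distortion values are hypothetical and chosen only for illustration.

```python
def expected_distortion(p_loss, d_central, d_side, d_none):
    """Expected distortion for two descriptions with independent losses.

    p_loss    : probability that any one description is lost (assumed equal)
    d_central : distortion when both descriptions arrive
    d_side    : distortion when exactly one arrives (assumed symmetric)
    d_none    : distortion when both are lost
    """
    p_both = (1.0 - p_loss) ** 2
    p_one = 2.0 * p_loss * (1.0 - p_loss)
    p_zero = p_loss ** 2
    return p_both * d_central + p_one * d_side + p_zero * d_none


# Two hypothetical designs at the same rate (distortion values are made up):
central_favouring = dict(d_central=1.0, d_side=20.0, d_none=100.0)
side_favouring = dict(d_central=3.0, d_side=8.0, d_none=100.0)

for p in (0.01, 0.2):
    print(p,
          expected_distortion(p, **central_favouring),
          expected_distortion(p, **side_favouring))
```

With these made-up numbers the central-favouring design has the lower expected distortion at 1% loss and the side-favouring design has the lower one at 20% loss, which is why the redundancy has to be tuned to the channel.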
While the thesis is in Farsi, its first part is available in English in the following paper:
Kazemi, M.; Sadeghi, K.H.; Shirmohammadi, S., "A Mixed Layer Multiple Description Video Coding Scheme", IEEE Transactions on Circuits and Systems for Video Technology, Volume: 22, Issue: 2, Feb. 2012, pp. 202-215, http://dx.doi.org/10.1109/TCSVT.2011.2159431
Advisor(s): Khosro Haj Sadeghi (supervisor), Shervin Shirmohammadi (co-supervisor)
http://site.uottawa.ca/~shervin/theses/2012-MohammadKazemi.pdf
Mona Omidyeganeh
Parametric Analysis and Modeling of Video Signals
Video modeling and analysis have been of great interest in the video research community, due to their essential contribution to systematic improvements across a wide range of video processing techniques. Parametric modeling and analysis of video provide appropriate means for processing the signal and for mining the information needed to represent it efficiently. Video comparison, human action recognition, video retrieval, video abstraction, video transmission, and video clustering are some applications that can benefit from video modeling and analysis.
In this thesis, the parametric analysis and modeling of the video signal is studied through two schemes. In the first scheme, spatial parameters are first extracted from video frames and the temporal evolution of these spatial parameters is investigated. Spatial parameters are selected based on the statistics of the 2D wavelet transform of the video frames, where the wavelet transform provides a sparse representation of the signals and structurally conforms to the frequency sensitivity distribution of the human visual system. To analyze the temporal relations and progress of these spatial parameters, three methods are considered: inter-frame distance measurement, temporal decomposition, and Autoregressive (AR) modeling. In the first method, employing the Kullback-Leibler (KL) distance between spatial parameters as the similarity measure, the temporal evolution of the spatial features is studied. This analysis is used to determine shot boundaries, segment shots into clusters and select keyframes properly based on both similarity and dissimilarity criteria, within and outside the corresponding cluster, respectively. In the second method, the video signal is assumed to be a sequence of overlapping independent visual components called events, which typically are temporally overlapping compact functions that describe the temporal evolution of a given set of the spatial parameters of the video signal. This event-based temporal decomposition technique is used for video abstraction, where no shot boundary detection or clustering is required. In the third method, the video signal is assumed to be a combination of spatial feature time series that are temporally approximated by the AR model. The AR model describes each spatial feature vector as a linear combination of the previous vectors within a reasonable time interval. Shot boundaries are well detected based on the AR prediction errors, and then at least one keyframe is extracted from each shot. To evaluate these models, subjective and objective tests on the TRECVID and Hollywood2 datasets are conducted, and simulation results indicate high accuracy and effectiveness of these techniques.
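The third method can be sketched in a few lines under strong simplifying assumptions: a single scalar feature per frame instead of a feature vector, an AR model refitted by ridge-regularized least squares on a sliding window, and a fixed error threshold with a minimum shot length. All names, parameter values, and the synthetic feature series below are hypothetical; the thesis's actual features and decision rule are not given in this abstract.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(history, order, ridge=1e-6):
    """Least-squares AR fit: history[s] ~ sum_k a[k] * history[s-1-k]."""
    gram = [[ridge if i == j else 0.0 for j in range(order)]
            for i in range(order)]
    rhs = [0.0] * order
    for s in range(order, len(history)):
        past = [history[s - 1 - k] for k in range(order)]
        for i in range(order):
            rhs[i] += past[i] * history[s]
            for j in range(order):
                gram[i][j] += past[i] * past[j]
    return solve_linear(gram, rhs)


def ar_prediction_errors(series, order=2, window=20):
    """Absolute one-step prediction error per frame (0 for the first `window`)."""
    errors = [0.0] * len(series)
    for t in range(window, len(series)):
        a = fit_ar(series[t - window:t], order)
        prediction = sum(a[k] * series[t - 1 - k] for k in range(order))
        errors[t] = abs(series[t] - prediction)
    return errors


def detect_shot_boundaries(errors, threshold, min_gap):
    """Frames whose error exceeds the threshold, at least min_gap frames apart."""
    boundaries = []
    for t, e in enumerate(errors):
        if e > threshold and (not boundaries or t - boundaries[-1] > min_gap):
            boundaries.append(t)
    return boundaries


# Hypothetical feature: slow drift within a shot, a jump at the cut (frame 40).
series = ([1.0 + 0.01 * t for t in range(40)]
          + [5.0 + 0.01 * t for t in range(40, 80)])
errors = ar_prediction_errors(series, order=2, window=20)
print(detect_shot_boundaries(errors, threshold=1.0, min_gap=20))
```

The minimum gap is a design choice of this sketch: right after a cut the fitting window mixes frames from both shots, so the errors there are unreliable and are ignored until the window has refilled with frames from the new shot.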
In the second scheme, video spatio-temporal parameters are extracted from the 3D wavelet transform of the natural video signal based on the statistical