

AES 39th International Conference
Audio Forensics—Practices and Challenges
17–19 June 2010, Hillerød, Denmark

In June the AES audio forensics community took advantage of another excellent opportunity to share information on research and practice in forensic science. The AES 39th International Conference, Audio Forensics—Practices and Challenges, was the latest meeting in the successful series that began in 2005 with the 26th Conference and continued with the 33rd Conference in 2008. While the prior AES audio forensics events were held in Denver, Colorado, this year's event was held at the Pharmakon Conference Center in Hillerød, Denmark. The venue supported a truly international meeting, with paper presenters representing nearly a dozen different nations and an attendee list reflecting participants from more than 20 countries. The 39th International Conference reasserted AES as a key player in the audio forensics field.

Planning for the 39th Conference involved members of the AES Technical Committee on Audio Forensics and the AES headquarters staff. Work on the latest conference began in 2008 immediately following the successful 33rd Conference. Eddy Bøgh Brixen, conference chair, coordinated an outstanding committee consisting of papers cochairs Alan Cooper and Durand Begault, workshops cochairs Gordon Reid and Catalin Grigoras, treasurer S.K. Pramanik, facilities chair Katrine Bøgh Brixen, and webmaster Preben Kvist. The Committee's efforts culminated in an exceptional technical program and lively workshop atmosphere.



The late Roy Pritts and Rich Sanders (pictured on the screen below) did so much to engender interest in and support for audio forensics within the AES.

The Pharmakon Conference Center, located about an hour's journey from Copenhagen Kastrup International Airport, has hosted several previous AES conferences. The venue was comfortable and particularly well suited to the intimate, small-group interaction that is a hallmark of AES international conferences. The attendees enjoyed the friendly and collegial quality of the Pharmakon Center, particularly the morning and afternoon coffee breaks and the dining menus.

The conference opened on Thursday morning, 17 June, with introductory remarks by Eddy Brixen. He highlighted the important roles played by the prior AES forensics conference organizers in Denver, the late Roy Pritts and the late Richard Sanders (see photos above), who had done so much to engender interest in and support for audio forensics within the AES. Brixen called upon the attendees to remember Roy and Rich for their pioneering work. Following the welcome and conference overview, the morning session was devoted to presentations by three exhibitors: Cedar Audio, UK; the National Center for Media Forensics, University of Colorado Denver, USA; and DPA Microphones, Denmark. In addition to their opening presentations, the exhibitors each had space in a special room located adjacent to the main auditorium and provided information and hands-on demonstrations throughout the conference.

The papers cochairs, Alan Cooper of the Metropolitan Police, London, and Durand Begault of Charles M. Salter Associates, San Francisco, assembled a wide variety of topical sessions including papers on audio authentication, voice identification, enhancement, speech quality and intelligibility, and general acoustical forensics.

AUTHENTICATION
Following a fine opening lunch on Thursday, the technical portion of the conference began with three papers on the topic of authentication. Forensic examiners determine authenticity by examining the audio material for signs of inadvertent or deliberate alteration, verifying that the recording system operated properly, and investigating the circumstances of the recording and its chain of custody. Durand Begault introduced the session's authors and their topics.

The first paper of the session, presented by Alan Cooper, described preliminary work in automated detection of "butt-spliced" edits in a digital audio file. A butt-spliced edit is a simple deletion or insertion in a digital file. Cooper explained that while butt splicing may often cause an audible defect, not every splicing discontinuity is detectable aurally. He described a promising set of experiments to detect butt-spliced edits using sample-to-sample difference and signal correlation techniques. He acknowledged that this preliminary work is not guaranteed to detect every splice, and a variety of other techniques are under development.
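To give a feel for the sample-to-sample difference idea (this is a generic sketch, not Cooper's algorithm), one might flag samples whose first difference is far outside the local statistics of the waveform. The window length, threshold, and synthetic test signal below are illustrative assumptions.

```python
import numpy as np

def flag_suspect_splices(x, win=2048, k=8.0):
    """Flag sample indices whose sample-to-sample difference is anomalously large.

    x   : 1-D array of audio samples (mono)
    win : window length used for local difference statistics
    k   : number of local standard deviations treated as "suspicious"
    """
    d = np.abs(np.diff(x))                       # sample-to-sample differences
    suspects = []
    for start in range(0, len(d) - win, win):
        seg = d[start:start + win]
        mu, sigma = seg.mean(), seg.std() + 1e-12
        hits = np.nonzero(seg > mu + k * sigma)[0] + start
        suspects.extend(hits.tolist())
    return suspects

# Synthetic example: a low-frequency tone with a block of samples deleted,
# leaving a waveform discontinuity at the junction (a crude butt splice).
fs = 8000
t = np.arange(2 * fs) / fs
tone = np.sin(2 * np.pi * 50 * t)
spliced = np.concatenate([tone[:6000], tone[9000:]])
print(flag_suspect_splices(spliced))   # expect an index near sample 6000
```

As Cooper noted, a detector of this kind cannot catch every splice; real material would also call for correlation tests and manual review.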

The second paper on authentication techniques explained the use of magneto-optical crystals for visualizing the latent magnetic patterns present on audio recording tapes. Dagmar Boss of the Bavarian State Criminal Police (Bayerisches Landeskriminalamt) described special crystalline materials that change their optical properties in response to external magnetic fields, thereby allowing the magnetic domains on the tape to be imaged for analysis and interpretation. Although digital recording devices are rapidly supplanting analog magnetic tape in many forensic circumstances, Boss emphasized the continued need for high-quality visualization techniques whenever analog tape is involved.

Rounding out the opening session, Catalin Grigoras, the newly named director of the National Center for Media Forensics at the University of Colorado Denver, discussed the prospects for reliable authentication of digital recordings. The challenge for the forensic audio examiner is that a skilled and clever individual may be able to alter a recording in a manner that is not detectable based on simple aural or waveform analysis. Grigoras focused his attention on recordings made with lossy perceptual audio coding, such as MP3 or WMA, and the possibility that artifacts due to successive coding and decoding, altered background reverberation, or a change from one codec to another might be detectable.

It was clear to all attendees that a key challenge for audio forensics will be methods for authenticity verification in the age of digital recording and lossy audio codecs.

VOICE IDENTIFICATION
After a pleasant discussion break complete with coffee, tea, and pastries, Alan Cooper introduced the next paper session covering research issues in forensic voice identification. A common task in forensic investigations is to determine the likelihood that the words in a recording of speech were uttered by a particular individual.

Eddy Brixen presented the first paper in the session. He described his investigation into the use of digital signal processing techniques to disguise the identity of the talker. An individual might deliberately choose to disguise his or her voice to avoid subsequent identification. In some cases the processed disguise is very obvious, while in other cases the alterations may be deliberately subtle and sophisticated, and therefore potentially difficult to detect via forensic means. Brixen conducted a series of tests using commercial music and voice processing software to observe the spectral and temporal alterations present in several processed speech examples. He also examined the effect of the processing upon the tell-tale hum of electrical network frequency (ENF) information. His conclusion is that identity concealment by signal-processing voice disguise is quite feasible, and that it is difficult for even a skilled engineer to "undo" the processing to try to reveal the original, undisguised voice.


From left: Eddy Bøgh Brixen (conference chair), Durand Begault, and Alan Cooper (papers cochairs)

Catalin Grigoras, workshops cochair, spoke on prospects for reliable authentication of digital recordings.



Next, Ewald Enzinger of the Austrian Academy of Sciences in Vienna presented his work on capturing the time-variant behavior of speech formants for diphthongs in Viennese German speech. The procedure follows the previously published work of Geoffrey Stewart Morrison, in which the temporal change in formant frequencies during the diphthong is modeled with a polynomial or some other fitting function. Comparing the fitting function from an unknown talker to the functions obtained from exemplar recordings of a particular suspect may allow results suitable for reporting a likelihood ratio for the comparison. Enzinger noted that the results appear to be highly dependent on the particular speech context of the recordings, such as telephone conversations versus free speech, and also on natural phonetic and prosodic variation.
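As a rough illustration of the trajectory-fitting idea described above (not Enzinger's or Morrison's actual implementation), one could fit a low-order polynomial to measured formant values across the duration of a diphthong and compare the resulting coefficient vectors. The formant tracks, polynomial order, and distance score below are invented for the example.

```python
import numpy as np

def trajectory_coefficients(times, formant_hz, order=2):
    """Fit a polynomial to a formant track measured over a diphthong.

    times      : measurement times normalized to [0, 1] across the diphthong
    formant_hz : measured formant frequency (Hz) at each time
    order      : polynomial order (a quadratic here, purely illustrative)
    """
    return np.polyfit(times, formant_hz, order)

# Invented F2 tracks for the same diphthong from two recordings
t = np.linspace(0.0, 1.0, 9)
f2_questioned = np.array([1150, 1230, 1340, 1480, 1620, 1760, 1880, 1970, 2030])
f2_suspect    = np.array([1170, 1255, 1365, 1500, 1645, 1780, 1895, 1980, 2040])

c_q = trajectory_coefficients(t, f2_questioned)
c_s = trajectory_coefficients(t, f2_suspect)

# A crude similarity score: Euclidean distance between coefficient vectors.
# Real systems feed such features into a statistical model that reports a
# likelihood ratio rather than a raw distance.
print(np.linalg.norm(c_q - c_s))
```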

Andrey Barinov of the Speech Technology Center, St. Petersburg, Russia, explained his group's work on the effects of mobile-phone GSM (Global System for Mobile Communications) source and channel coding in the context of forensic speaker identification. The GSM channel involves nonlinear and time-variant perceptual audio coding, which can alter the spectral balance and formant characteristics of the coded speech, making it difficult to form a forensic comparison between speech from a recording of a GSM phone call and the speech of an exemplar recording. Barinov's group is working on a means of inverse processing to achieve channel compensation for GSM recordings.

Concluding the Thursday afternoon session, Anibal Ferreira of the University of Porto, Portugal, gave an overview of his work in robust speaker identification based on the relative phase relationship between harmonics of the vocal production system. Ferreira derives the "normalized relative delay" (NRD) for the periodic acoustic speech signal (or electroglottograph), and then compares the relative delay (phase difference) between the harmonic partials of two different speech examples to assess whether or not the speech of two different speakers can be discriminated. He explained that the initial experiments and test results were found to be promising for ongoing research.

At the conclusion of the Voice Identification session, the attendees enjoyed a wonderful evening dinner in the Pharmakon Conference Center dining room, a fitting finale to the day's technical sessions and a pleasant break before the start of the evening workshop.

FRONTIERS OF FORENSIC AUDIO INVESTIGATION
Gordon Reid chaired a special evening workshop session featuring presentations by three noted experts from the United Kingdom: Alan French, an audio consultant with CEDAR Ltd; Anil Alexander, R&D Director with GriffComm Ltd; and Anna Czajkowski, an accreditation expert with Control Risks Forensics.

French led off the workshop with a presentation entitled "Time, Tide, and Technological Changes Wait for No Person." The presentation traced the early history of audio forensic investigations by the Metropolitan Police in London, beginning in the 1970s with the investigation of audio tapes pertaining to Jeremy Thorpe, Member of Parliament, for his alleged involvement in the attempted murder of Norman Scott, a man who had claimed to be Thorpe's former homosexual lover. The Metropolitan Police lab was asked to evaluate the authenticity of a tape presented as evidence in that high-profile case. That initial forensic audio work led steadily to 35 years of increasingly sophisticated and comprehensive lab facilities and procedures. French concluded his historical retrospective by suggesting that the future of media forensics will increasingly involve audio, video, and general digital data investigations.

The next speaker, Anil Alexander, gave a fascinating presentation on the problems and prospects for dealing with the data explosion facing the audio forensics field. He noted that the ease with which modern recording equipment can obtain hours, days, or even weeks of continuous surveillance information causes the "curse of data." How can the audio forensic examiner find the desired information in the vast data repository presented to him? He then elaborated on one of the ultimate desires of audio forensics: the ability to define an acoustic event of interest, search automatically for that event in a recording of arbitrary length, and incorporate an automatic learning algorithm to locate, discriminate, and document the desired events.

Anna Czajkowski, the last speaker of the evening's workshop, gave a timely presentation on the issue of formal accreditation of audio forensic laboratories. ISO 17025, the international standard for assessing the competence of testing and calibration laboratories, is expected to affect audio forensic experts. Czajkowski described how audio forensic processing can be validated in a meaningful way. According to the ISO standard's definition, validation means "confirmation, through the provision of objective evidence, that the requirements for a specific intended use have been fulfilled." She explained that the accreditation process helps ensure that suitable equipment is used by a competent operator employing reliable methodology, thereby improving the quality of audio forensic practice while reducing the risk of error.

With the conclusion of the first day of the conference, the attendees enjoyed snacks and beverages at an informal reception in the lounge area. The friendly and collegial tradition of AES conferences was very much in evidence as everyone engaged in lively discussions of the day's topics and presentations.


Anna Czajkowski described the validation of audio forensic processing.



EVALUATION OF FORENSIC COMPARISON EVIDENCE AND THE LIKELIHOOD RATIO
Fresh from a fine breakfast buffet to start Friday, the second day of the conference, the attendees assembled for a special tutorial session on the use of the likelihood-ratio framework when evaluating audio forensic evidence. The tutorial presenter was Geoffrey Stewart Morrison, affiliated with the Australian National University in Canberra and the University of New South Wales in Sydney. Morrison introduced the terminology of Bayes' theorem, which in the context of a forensic speaker comparison gives the posterior odds: the probability of the same-speaker hypothesis given the acoustical evidence, divided by the probability of the different-speaker hypothesis given the same acoustical evidence. In practice the forensic audio examiner needs to assess the Bayes likelihood ratio, which is the probability of observing the provided acoustical evidence given that it was the same speaker, divided by the probability of observing the same acoustical evidence given that it was a different speaker. The remaining portion of the Bayes' theorem expression is the prior odds, which treats the probabilities of the same-speaker and different-speaker hypotheses before the evidence is considered. The posterior odds are the product of the likelihood ratio and the prior odds. Morrison repeatedly emphasized that a forensic scientist must use only the likelihood ratio, and not the prior odds, because the prior odds deal with subjective attributes like motive, opportunity, and human biases that are the province of the trier of fact (for example, a judge or jury), while the likelihood ratio should contain only objective information that is the province of the forensic examiner. He explained that for this reason it is inappropriate for a forensic scientist to report the posterior odds, and encouraged all forensic reports to stick to the likelihood-ratio framework.
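In symbols, the relationship Morrison described can be summarized as follows; the notation is added here for illustration, with H_s and H_d denoting the same-speaker and different-speaker hypotheses and E the acoustical evidence:

```latex
\[
\underbrace{\frac{P(H_s \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
  = \underbrace{\frac{P(E \mid H_s)}{P(E \mid H_d)}}_{\text{likelihood ratio}}
  \times \underbrace{\frac{P(H_s)}{P(H_d)}}_{\text{prior odds}}
\]
```

The forensic examiner reports only the middle factor; combining it with the prior odds to obtain posterior odds is left to the trier of fact.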

ENHANCEMENT OF NOISY RECORDINGS
The second half of the Friday morning schedule was devoted to a session on enhancement methods for noisy audio recordings. The first presenter, Damian Ellwart of the Gdansk University of Technology, Poland, described an adaptive filter algorithm for speech intelligibility improvement developed with his coauthor, Andrzej Czyzewski. The promising system was tested with a mixture of speech and music produced under various environmental conditions to examine the algorithm's noise-suppression characteristics. Future work will assess the intelligibility improvement with human subjects.

Next, Gaston Hilkhuysen of University College London presented an interesting and important paper coauthored with Mark Huckvale entitled "Adjusting a Commercial Speech Enhancement System to Optimize Intelligibility." The investigators reported on several interactive experiments conducted with a panel of human listeners to observe the listeners' chosen parameter settings when listening to noisy speech through a commercially available noise-reduction system. When members of the panel were asked to adjust the noise-reduction parameters to achieve the best intelligibility, the results actually revealed a decrease in performance-based intelligibility. The study also found a substantial difference between the "optimal" settings selected by different members of the panel, indicating that opinion-based intelligibility is not an invariant standard from listener to listener.

The final paper of the morning session returned to the topic of adaptive filtering for speech enhancement. Joerg Bitzer of the Jade University of Applied Sciences and the Fraunhofer Institute for Digital Media Technology, Oldenburg, Germany, gave an overview of adaptive noise-cancellation techniques for forensic applications. Adaptive systems require at least two signals, the input signal and a reference signal, that have some level of correlation with each other. Bitzer explained that the recording circumstances, system nonlinearities, and the complexity of the interfering noise point toward different realizations of the adaptive noise-cancellation approach; one size does not fit all.
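For readers unfamiliar with the two-signal structure Bitzer described, the sketch below shows a textbook least-mean-squares (LMS) noise canceller. It is a generic illustration using invented signals and parameters, not any of the systems discussed in the paper.

```python
import numpy as np

def lms_noise_canceller(primary, reference, n_taps=32, mu=0.005):
    """Two-input adaptive noise cancellation with the LMS algorithm.

    primary   : speech plus noise picked up by the main microphone
    reference : a second signal correlated with the noise but not the speech
    Returns the error signal, which approximates the enhanced speech.
    """
    w = np.zeros(n_taps)                             # adaptive filter weights
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1:n + 1][::-1]    # latest reference samples
        noise_estimate = np.dot(w, x)
        e = primary[n] - noise_estimate              # error = enhanced output
        w += 2 * mu * e * x                          # LMS weight update
        out[n] = e
    return out

# Invented example: a tonal "speech" signal buried in noise; a colored copy of
# the noise reaches the primary microphone while the raw noise is available
# as the reference.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2 * fs) / fs
speech = 0.5 * np.sin(2 * np.pi * 300 * t)
noise = rng.normal(0.0, 1.0, t.size)
noise_at_mic = np.convolve(noise, [0.6, 0.3, 0.1])[:t.size]   # causal noise path
primary = speech + noise_at_mic
enhanced = lms_noise_canceller(primary, noise)
print(np.std(primary - speech), np.std(enhanced[2000:] - speech[2000:]))
```

In practice, as the paper points out, the choice of structure and adaptation rule depends on the recording circumstances, nonlinearities, and the nature of the interference.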

The interesting and informative morning tutorial and paper sessions encouraged many questions, comments, and discussion, which continued seamlessly as the attendees made their way to the luncheon buffet.

ACOUSTICAL FORENSICS
Following lunch, the technical sessions reconvened for two papers dealing with acoustical principles. In the first presentation, "Closed-Form Spatial Decomposition of an Acoustic Scene for Enhancement and Localization of Audio Objects in Forensic Analysis," Banu Günel of the University of Surrey, UK, described a multichannel directional recording system that allows a closed-form decomposition of individual, spatially separated sound sources. Günel described the theoretical and practical foundations of the decomposition, which uses a B-format microphone system (four signals) to produce very good separation of an ensemble of sound sources. The system gave very good intelligibility improvement for speech by separating the sound of the desired talker from the interfering sound and noise coming from other directions.


Banu Günel asks a challenging question during post-presentation discussions.



The second acoustical paper dealt with gunshot acoustics. Rob Maher of Montana State University, USA, presented results of an experiment involving controlled gunshot recordings of ten different firearms from ten different azimuthal directions. Maher showed that the mean sound pressure level of the muzzle blast was typically 15–20 dB lower when observed at the rear of the firearm (180° azimuth) compared to the on-axis position (0° azimuth). He also presented the time waveforms for each gun as a function of azimuth, revealing an interesting variety of distinguishing features for each firearm type. He concluded his presentation with a reminder that most gunshot forensic audio evidence includes acoustic reflections, reverberation, and clipping/distortion that will likely be more challenging for interpretation than the pristine and reflection-free waveforms obtained in his controlled experiment.

LABORATORY PROCEDURES
The second half of the afternoon session turned to the tools and techniques of contemporary professional forensic audio practice. The first presenter, Robin How of the Metropolitan Police Digital and Electronics Forensic Service, London, described the history, personnel qualifications, training, equipment, management, and research roles of the Metropolitan Police Forensic Audio Laboratory. How explained both the traditional and the emerging responsibilities of the Forensic Audio Laboratory, which is among the most experienced forensic audio facilities in Europe. He pointed out that one of the increasingly common requests in recent years has been for voice disguise: a request by the authorities to conceal the identity of a witness by rendering recorded testimony unrecognizable as the utterances of the protected witness, while still maintaining intelligibility for use in court or other official proceedings.

Jeff Smith of the National Center for Media Forensics at the University of Colorado Denver described the broader issues of digital and multimedia evidence that extend beyond the common scope of forensic digital audio examination. Audio forensic examiners are often asked to assist with investigations involving video content, computer file storage, data encryption, and many other facets of forensic interest in the computer age. Smith encouraged the attendees to embrace the complicated and multifaceted nature of modern "cyber" forensics, and to be ready for continuous self-study and formal education to keep pace with the dynamic nature of the profession.

AUTHENTICATION WORKSHOP
The final technical session on Friday was a special workshop on the challenges of authentication with digital audio recordings. The workshop speaker, Catalin Grigoras, presented a flow diagram summarizing the model for forensic audio data collection. He noted that the physical environment of the recording, the characteristics of the microphone, potential electrical network frequency (ENF) hum, and other factors can each leave a tell-tale signature in the recording that may aid in the authentication task. Grigoras pointed out many possible indications of questionable authenticity, but also emphasized the potential difficulties in detecting surreptitious modifications of digital audio recordings.

AN EVENING OF TREATS
Friday evening offered a special treat for the conference attendees: a private tour of Frederiksborg Castle. Located on three small islets surrounded by Castle Lake in Hillerød, a short bus ride from the Pharmakon Conference Center, the castle site dates from the mid-1500s during the reign of its namesake, King Frederik II (b. 1534, d. 1588). The principal structures were constructed by Frederik's son, King Christian IV (b. 1577, d. 1648). Several sections of the Renaissance-period castle were destroyed in a major fire in 1859, but through the good graces of J.C. Jacobsen, the founder of the Carlsberg Breweries, the castle was refurbished as a museum in 1878.

Rob Maher discussed forensic techniques for dealing with gunshot recordings.

Jeff Smith encouraged the audience to consider modern "cyber" forensics.




Denmark's Museum of National History has been open to the public in Frederiksborg Castle since 1882.

The attendees enjoyed a second special treat that evening with a gourmet banquet served back at the Pharmakon Center, where the chefs had prepared a fine combination of appetizers, wines, roasted red fish, roasted Grambogård pork, and lemon Mazarin dessert, providing a delicious and memorable finale to a productive and enjoyable day.

FORENSIC AUDIO STANDARDS
Saturday, the final day of the conference, opened with a presentation and discussion of the emerging area of forensic audio accreditation and standards, hosted by Michael Piper of the U.S. Secret Service, Washington, D.C., and David Hallimore of the Houston Police Department, Houston, Texas. Both presenters are leaders of the Audio Committee within the Scientific Working Group on Digital Evidence (SWGDE, pronounced "swig-dee"), a cooperative organization with members from local, state, federal, and international law enforcement and investigative agencies who share information and education in the field of digital forensics. Piper and Hallimore led a discussion dealing with the implications of the U.S. National Research Council's 2009 report entitled "Strengthening Forensic Science in the United States: A Path Forward." The NRC report was highly critical of many forensic practices, citing the need to establish statistical reliability measures for forensic comparisons and the need for bona fide standards for training and experience among forensic practitioners. Although the NRC report was not specifically focused on audio forensics, the implications of the report for the future admissibility of audio forensic evidence and testimony spurred the SWGDE Audio Committee to seek input on how best to address the NRC criticisms and recommendations. Piper and Hallimore invited additional comments and suggestions for future action.

AUTOMATED SPEECH PROCESSING
After the morning coffee break, Anil Alexander of GriffComm, Oxford, UK, returned to the stage to describe a technique for semiautomatic speaker segmentation for processing recorded testimony. The technique was developed for the Metropolitan Police, London, to assist with preparing recordings in which the identity of specific talkers is protected, such as the voices of undercover officers or vulnerable witnesses who have been granted anonymity by the court. Such a recording can be processed manually by a technician who identifies segments of speech by the protected talker and selectively disguises or deletes those segments. Alexander explained that this manual process is tedious and time-consuming and therefore an automatic segmentation system would be of great value. The proposed technique, involving speech recognition and transition identification, was found to be effective and promising, although practical issues, such as determining the appropriate speech model order and discerning simultaneous speakers (over-talking), will require further study.


Delegates enjoy a visit to Frederiksborg Castle.

Eddy Brixen toasts the success of the conference at the banquet.



The second paper on automated speech processing, "Automatic Forensic Voice Comparison Using Recording Adapted Background Models," was presented by Timo Becker of the Federal Criminal Police Office (Bundeskriminalamt), Germany. The automatic forensic voice-comparison system uses a standard Gaussian mixture model (GMM) approach for text-independent speaker recognition, but with a novel adaptation to help account for the widely varying acoustical environments encountered in forensic audio recordings. The authors developed a recording-adapted background model (RABM) for use instead of the classical universal background model (UBM). Test results indicate that voice-comparison performance is improved with the RABM method, but several challenges remain having to do with database mismatch and with separating the channel characteristics from the speaker's speech characteristics.
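For orientation, the classical GMM-UBM comparison that the RABM work modifies can be sketched roughly as below. The feature arrays, model sizes, and scoring are illustrative assumptions, and the sketch uses scikit-learn's GaussianMixture rather than the authors' software.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm(features, n_components=64):
    """Fit a diagonal-covariance GMM to feature vectors (e.g., MFCC frames)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          max_iter=200, random_state=0)
    gmm.fit(features)
    return gmm

def log_likelihood_ratio(test_features, suspect_gmm, background_gmm):
    """Average per-frame log-likelihood ratio: suspect model vs. background model."""
    return suspect_gmm.score(test_features) - background_gmm.score(test_features)

# Invented stand-ins for real MFCC features (frames x coefficients)
rng = np.random.default_rng(1)
background_frames = rng.normal(0.0, 1.0, size=(5000, 12))   # many speakers
suspect_frames    = rng.normal(0.3, 1.0, size=(1000, 12))   # exemplar recording
questioned_frames = rng.normal(0.3, 1.0, size=(800, 12))    # questioned recording

ubm = train_gmm(background_frames)
suspect_model = train_gmm(suspect_frames, n_components=16)
print(log_likelihood_ratio(questioned_frames, suspect_model, ubm))
```

In a production GMM-UBM system the suspect model is usually MAP-adapted from the UBM rather than trained from scratch; Becker's RABM approach instead adapts the background model to the recording conditions, which this sketch does not attempt.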

SPEECH QUALITY AND INTELLIGIBILITY
Rounding out the Saturday morning sessions was the first of five papers addressing speech intelligibility. Nikolay Gaubitch of Imperial College, London, spoke about his research team's results in estimating speech intelligibility using rapid subjective testing. The work is important because noise-reduction and quality-enhancement processing of forensic speech recordings generally lowers intelligibility, but the tradeoff between quality and intelligibility has been hard to quantify. Gaubitch presented the Bayesian Adaptive Speech Intelligibility Estimation (BASIE) method and described several simulations and experiments used for its characterization and validation. The pilot tests showed the ability to estimate the Speech Reception Threshold (SRT) within ±1 dB in under 30 trials. Gaubitch indicated that the BASIE method should allow a relatively straightforward approach to finding appropriate levels of enhancement and noise reduction that do not compromise intelligibility.

Following the final tasty lunch of the conference, the group reconvened for the closing paper session. Dushyant Sharma of Imperial College, London, presented a paper about using the Perceptual Evaluation of Speech Quality (PESQ) algorithm to assess the effects of forensic audio processing and enhancement systems. PESQ was developed as an objective means to rate the degradations associated with telecommunications channels and codecs, eliminating the time and cost of subjective testing with human subjects. Until now, PESQ has not been applied to forensic audio processing, so Sharma's group performed an experiment to compare PESQ with a panel of test subjects. The result was that the correlation between PESQ and the subjective ratings was poor for the substantial degradations in quality typically encountered in forensic material, indicating that some other approach will be needed for this purpose.

Next, Andrea Paoloni of the Ugo Bordoni Foundation, Rome, presented a paper describing the use of the Speech Transmission Index (STI) as a possible objective measure of signal intelligibility. The STI was developed 40 years ago as a means to assess the impact of a degraded channel on the intelligibility of speech. The STI determines the degree to which the normal spectral and temporal envelopes of a speech-like test signal are maintained through the channel. The results of several experiments indicate that the STI-based measurement is useful for classifying forensic audio systems.

The 39th conference committee: from left, Zoe Asta, Alan Cooper, Preben Kvist, Durand Begault, Gordon Reid, Subir Pramanik, Anna Lawaetz, Katrine Bøgh Brixen, Roger Furness, Eddy Bøgh Brixen, and Catalin Grigoras.



Returning to the quality versus intelligibility tradeoff, Mark Huckvale of University College London presented a paper entitled "Measuring the Effect of Noise Reduction on Listening Effort." A set of intriguing experiments was conducted to see whether listener effort was reduced when listening to noise-reduced speech compared to listening to the original noisy speech without processing. The major result of the experiments was that there did not appear to be any improvement in productivity when the subjects listened to speech processed for noise reduction compared to the unprocessed speech, and in certain cases performance was actually worse with the "enhanced" speech. Huckvale suggested that enhancement systems must be designed neither to make the residual noise more speech-like nor the residual speech more noise-like, since these attributes appear to interfere with intelligibility at the phonetic level.

The final paper of the conference was "Practical and Affordable Intelligibility Testing for Engineers and Algorithm Developers," by Ken Worrall and Rob Fellows of Her Majesty's Government Communications Centre (HMGCC), UK. The paper was presented by their HMGCC colleague, Louise Baddeley. The project involved development of a procedure for rapid intelligibility determinations that would provide fast and useful information for an algorithm or system designer. While their proposed procedure, the "Technique for Automated Comparative Intelligibility Testing" (TACIT), is not a statistically rigorous technique, Baddeley explained that it is sufficient to allow design and engineering decisions during a rapid development cycle.

AES AUDIO FORENSICS: STILL RISING AND SHINING
The AES 39th Conference was judged a great success, and AES clearly remains the leader in the field of forensic audio analysis and interpretation. The conference concluded with a sincere thank you from Eddy Brixen to the organizing committee and all of the participants. Roger Furness, AES executive director, added his words of thanks and praise to the committee, and invited all nonmembers to join the AES and to make plans to attend future Society events.

Many of the attendees took advantage of a special charter bus for transportation from the Pharmakon Center back to Copenhagen Kastrup Airport, allowing everyone a few extra minutes of continued discussion and conversation. Some attendees were heading home, some were embarking on vacations or other business, but all traveled with keen anticipation of another AES conference on forensic audio in the near future.

Editor's note: The conference papers are available for purchase as a book or as a downloadable PDF at www.aes.org/publications/conferences. Individual conference papers can also be obtained from the AES E-Library at www.aes.org/e-lib. The National Center for Media Forensics at the University of Colorado Denver will be holding an Audio Forensics Workshop December 13–15. For information contact Leah Haloin at [email protected] or +1 303 315 5852.
