Audio Engineering Society
Convention e-Brief 108
Presented at the 135th Convention
2013 October 17–20, New York, USA
This Engineering Brief was selected on the basis of a submitted synopsis. The author is solely responsible for its presentation, and the AES takes no responsibility for the contents. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Audio Engineering Society.
A new DSP tool for drum leakage suppression
Elias Kokkinis1, Alexandros Tsilfidis1, Thanos Kostis1, Kostas Karamitas1
1accusonus, Patras Innovation Hub, 26504, Greece
Corresponding author: [email protected]
ABSTRACT
Microphone leakage is a problem that sound engineers face every day. Leakage complicates audio editing, processing and mixing, and it is a well-known problem in drum recordings. To this day, sound engineers have only a limited number of options available to address this problem, mostly consisting of simple and empirical methods. A novel technology that addresses the problem of microphone leakage in multichannel drum recordings is presented here. In addition, we discuss the problem definition as deduced from the specific properties of drum recordings, as well as the resulting signal processing framework.
1. INTRODUCTION
Whether in studio or in concert, in most cases a num-
ber of musicians perform together inside the same room
and many microphones are set to capture the sound emit-
ted by their instruments. Ideally each microphone should
only pick up the sound of the intended instrument, but
due to the interaction between the various instruments
and room acoustics, each microphone picks up not only
the sound of interest but also a mixture of all other instru-
ments. This is known as microphone leakage (or bleed
or spill) and is an undesirable effect and a well-known
problem in every-day sound engineering practice.
Microphone leakage introduces several problems that
must constantly be taken into account when working
with multichannel (drum) recordings:
• Microphone leakage can be considered as a noise
source (albeit a complex one), significantly decreas-
ing the signal to noise ratio (SNR) of the recorded
signal.
• When mixing channels that contain leakage, a significant degradation of the signal's quality is introduced
as a result of comb filtering effects. Comb filtering
is a direct result of adding several delayed versions
of the same source, as captured by more than one
microphone.
• The use of common and creative audio effects is
limited by the presence of microphone leakage. For
example, when equalizing a snare track containing
leakage from the hi-hat, one may also unintention-
ally alter the sound of the hi-hat in the mix.
• Finally, since microphone leakage decreases the
signal’s SNR and distorts the original signal, it may
degrade the performance of other processing tools
such as automatic pitch correction.
Audio engineers have only a limited number of options
available to address this problem, which mostly
consist of simple and empirical methods. One of the
most straightforward strategies to address the problem
is the use of directional microphones and the close-
microphone technique, that is, the placement of the mi-
crophone in close proximity to the sound source it is in-
tended to capture. This strategy also requires the proper
placement and orientation of sound sources and micro-
phones in the room, demanding time-consuming experi-
mentation.
From a signal processing perspective the only available
tool is the noise gate. The noise gate is an audio proces-
sor that essentially acts like a switch and allows sound to
pass only when its amplitude exceeds a certain threshold.
The noise gate is effective in practice mostly with percussive
instruments and can only partially address the problem
since leakage will pass through when the signal of in-
terest is present. On the other hand, incorrect setting
of the gate’s parameters may introduce audible artifacts
(e.g. breathing).
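To make the gate's behavior concrete, here is a minimal sketch (not part of the technology described in this paper) of an amplitude-threshold gate; the threshold and smoothing lengths are illustrative values, not recommendations:

```python
import numpy as np

def noise_gate(x, threshold=0.05, attack=32, release=256):
    """Basic noise gate: attenuate samples whose smoothed envelope
    falls below a threshold. `attack`/`release` are smoothing
    lengths in samples (illustrative values)."""
    env = np.abs(x)
    # Smooth the envelope with a moving average to avoid chattering.
    env = np.convolve(env, np.ones(release) / release, mode="same")
    gain = (env >= threshold).astype(float)
    # Smooth the gain so the gate opens and closes gradually.
    gain = np.convolve(gain, np.ones(attack) / attack, mode="same")
    return x * gain

# A burst of "signal" followed by low-level "leakage".
sig = np.concatenate([0.5 * np.ones(1000), 0.01 * np.ones(1000)])
out = noise_gate(sig)
```

Note that, exactly as described above, the low-level tail is removed only while the loud part is absent; leakage arriving simultaneously with the signal of interest would pass through unchanged.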
2. DRUM RECORDING AND LEAKAGE

Drums are an integral part of most modern music genres. The recording of drums is a complicated task that
requires experience, knowledge and attention to detail
(Owsinski 2005). Although inside the controlled envi-
ronment of a recording studio audio engineers have the
option to use isolation booths for every instrument, in or-
der to avoid the problem of microphone leakage, drums
represent a special case. While drums are treated as a sin-
gle instrument in the final mix, in practice a drum kit is a
combination of different sound sources. Typically audio
engineers opt to record each of these sound sources (i.e.
each individual drum) resulting in a multichannel record-
ing of the drum kit. Hence, even in an isolation booth,
microphone leakage will manifest in each drum's microphone signal.
Multichannel drum recordings have a number of distinct
features:
• Each drum is recorded using the close-microphone
technique (except in the case of overhead or
room microphones). The properties of the close-
microphone response have been studied in previous
work (Kokkinis 2012).
• The signals produced by different drums sometimes
vary significantly in terms of time-frequency con-
tent. In signal processing, this property is called
sparsity and it is often employed to provide cues for
source separation algorithms.
• Drum recordings are in most cases inherently mul-
tichannel. This fact can be exploited to extract in-
formation from the multiple channels available that
will yield cues to assist source separation algorithms.
3. A3 - ADVANCED AUDIO ANALYSIS

In this section, we will describe the major aspects of a
novel technology (A3 - Advanced Audio Analysis) that
was developed to address the problem of microphone
leakage. The technology is summarized in the block dia-
gram of Figure 1. Consider the signal produced by
the m-th microphone:
$$x_m(k) = \tilde{s}_m(k) + u_m(k) \quad (1)$$
where $x_m(k)$ is the microphone signal, $\tilde{s}_m(k)$ is the direct sound source, i.e. the source the microphone was intended to capture, and $u_m(k)$ is a noise term that describes the effect of microphone leakage. The term $u_m(k)$ can either be treated as a single complex noise source or as a combination of several noise sources, leading to
$$x_m(k) = \tilde{s}_m(k) + \sum_{\substack{i=1 \\ i \neq m}}^{M} \bar{s}_{i,m}(k) \quad (2)$$
where $\bar{s}_{i,m}(k)$ represents the leakage source that manifests in the m-th microphone as a result of the i-th source.
Equation (1) describes the problem of microphone leak-
age as the well-known signal in additive noise problem,
while equation (2) describes the problem from a source
separation perspective. The direct source s̃m(k) is as-
sumed to be dominant in any model, since the close-
microphone technique is employed. This fact is an im-
portant property of multichannel drum recordings that
enables the development of efficient and adequate DSP
methods to address the problem of microphone leakage.
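A quick numerical illustration of the model in equation (2), with hypothetical leakage gains chosen so that the direct source (the diagonal) dominates, as the close-microphone technique ensures in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 3, 4096                      # M sources/microphones, n samples

# Hypothetical dry source signals (one per drum).
s = rng.standard_normal((M, n))

# Mixing gains: leakage terms are small, the direct path is unity.
G = 0.1 * rng.random((M, M))
np.fill_diagonal(G, 1.0)

# Equation (2): each channel is its direct source plus attenuated
# leakage from every other source.
x = G @ s
```

In this toy mixture each channel remains strongly correlated with its own direct source, which is precisely the dominance property exploited by the DSP methods discussed here.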
The processing of a multichannel drum recording using
the technology described in Figure 1 involves two steps.
The first step is the core processing which consists of a
time-frequency analysis and synthesis, the source separa-
tion algorithm and time-domain information extraction.
Each microphone signal xm(k) is transformed to the
time-frequency domain using the well-known short time
Fourier transform (STFT). The STFT values Xm(κ, ω)
[Figure 1: block diagram. Core processing: time-frequency analysis → extract time-domain cues → source separation algorithm → time-frequency synthesis. Post-processing: signal reordering with user control. Input: microphone signals; output: processed signals.]
Fig. 1: A conceptual block diagram of the A3 technology. White arrows indicate multichannel data.
(where κ is the frame index and ω the frequency bin
index) can be arranged in a complex matrix Xm. The
magnitude of the complex matrix |Xm| represents the
signal’s spectrogram. The STFT parameters (window
length, hop size, window type) must be carefully cho-
sen due to the nature of drum signals. An adequate time
resolution must be ensured, since drum signals are tran-
sient signals, while at the same time sufficient frequency
resolution must be obtained in order to be able to distin-
guish the finer frequency components of each drum. For
a full-length drum channel, these requirements result in a
significant amount of data that need to be stored and pro-
cessed. From each transformed microphone signal, a set
of time domain cues are extracted that are subsequently
passed over to the main source separation algorithm.
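As a sketch of this analysis stage (using scipy's generic STFT rather than the actual A3 implementation, on an arbitrary toy signal; the window length of 2048 samples is one plausible compromise between time and frequency resolution, not the paper's choice):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 44100
t = np.arange(fs) / fs
# A toy "drum" channel: a decaying 200 Hz tone with a sharp onset.
x = np.exp(-8 * t) * np.sin(2 * np.pi * 200 * t)

# Window length trades time resolution (transients) against
# frequency resolution (distinguishing drum components).
f, frames, X = stft(x, fs=fs, nperseg=2048, noverlap=1536)

S = np.abs(X)          # magnitude spectrogram |X_m|
# The inverse STFT with the same parameters reconstructs the signal.
_, x_rec = istft(X, fs=fs, nperseg=2048, noverlap=1536)
```

The 75% overlap with the default Hann window satisfies the constant-overlap-add condition, so analysis followed by synthesis is lossless; processing happens in between, on the masked spectrograms described below.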
The source separation algorithm is based on a powerful
set of techniques, called non-negative matrix factoriza-
tion (NMF) (Cichocki et al. 2009). In the case of audio,
such methods decompose the magnitude spectrogram
into a matrix product
$$|\mathbf{X}_m| \approx \mathbf{W}_m \mathbf{H}_m \quad (3)$$
where Wm is an F × K matrix of spectral profiles and Hm
is a K ×N matrix of activation functions. F is the num-
ber of discrete frequency bins, N is the number of frames
and K is the number of components. Each spectral pro-
file corresponds to the spectrum of one source found in
the microphone signal xm(k) and each activation func-
tion indicates when a spectral profile is active. The ma-
trices Wm,Hm are estimated using an iterative scheme.
The corresponding algorithm is derived by constructing
an appropriate cost function which takes into account any
information or constraints imposed by the specific appli-
cation and minimizing this cost function in an iterative
way. Equation (3) describes the NMF model for a sin-
gle microphone signal. In the case of multichannel drum
recordings, one can choose different strategies to apply
NMF.
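The paper does not specify the cost function used; as a generic illustration of the iterative scheme, the classic Euclidean-distance NMF with multiplicative updates (Lee-Seung) can be sketched as:

```python
import numpy as np

def nmf(V, K, n_iter=500, seed=0):
    """Euclidean-distance NMF via multiplicative updates:
    V (F x N, non-negative) ~ W (F x K) @ H (K x N)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 1e-3
    H = rng.random((K, N)) + 1e-3
    eps = 1e-12                      # guards against division by zero
    for _ in range(n_iter):
        # Updates preserve non-negativity and reduce ||V - WH||_F.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy spectrogram: two "sources" with fixed spectra and activations.
rng = np.random.default_rng(1)
W_true = rng.random((64, 2))
H_true = rng.random((2, 100))
V = W_true @ H_true

W, H = nmf(V, K=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

A production system would add the application-specific constraints mentioned above (and cues from the other channels) to the cost function; this sketch shows only the bare factorization.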
After the successful estimation of Wm,Hm, a number of
time-frequency masks can be extracted by multiplying
each column of Wm with the corresponding row of Hm.
Each mask can be applied to the complex spectrogram
and after the appropriate inverse STFT a component sig-
nal is generated. Each component signal may contain a
part of the direct sound source or a part of the leakage
sources. By an appropriate combination of the relevant
components, an estimation of the direct sound source can
be obtained.
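One common way to realize this masking step is a Wiener-like soft mask built from the rank-1 component models (the outer product of each column of Wm with the corresponding row of Hm); random placeholders stand in here for the NMF output and the complex STFT:

```python
import numpy as np

rng = np.random.default_rng(2)
F, N, K = 64, 100, 3

# Placeholders: W (F x K) and H (K x N) would come from the NMF
# stage, X is the complex STFT of the same channel.
W = rng.random((F, K))
H = rng.random((K, N))
X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))

# One rank-1 magnitude model per component: outer(w_k, h_k).
models = np.stack([np.outer(W[:, k], H[k, :]) for k in range(K)])
total = models.sum(axis=0) + 1e-12

# Soft masks sum to 1 in every time-frequency bin, so the K complex
# component spectrograms sum back to the original X.
masks = models / total
components = masks * X
```

Each component spectrogram is then passed through the inverse STFT to produce the time-domain component signals discussed next.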
A human user has to listen to all the component signals for
each microphone signal and decide which components
belong to the direct source and which contain leakage,
in order to complete the task of source separation. The
technology discussed here contains a method that helps
to automate this process, by assessing the resemblance
of each component signal to the desired direct source.
At the output of this process, the component signals are
reordered in such a way that they can be easily combined
by a human user, without the need to listen to them. This
is the post processing step in Figure 1.
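A crude proxy for such a resemblance score, assuming (as the close-microphone premise suggests) that the raw channel itself is dominated by the direct source, is to rank components by their correlation with the channel; this is only an illustration, not the method used by the technology:

```python
import numpy as np

def rank_components(components, reference):
    """Order component signals by absolute correlation with a
    reference signal -- a simple resemblance score."""
    scores = [abs(np.corrcoef(c, reference)[0, 1]) for c in components]
    order = np.argsort(scores)[::-1]          # most similar first
    return [components[i] for i in order], order

rng = np.random.default_rng(3)
direct = rng.standard_normal(2048)
leak = rng.standard_normal(2048)
mic = direct + 0.2 * leak                     # toy close-mic channel

ranked, order = rank_components([leak, direct], mic)
```

With the direct-source component ranked first, a user can combine the top of the list without auditioning every component.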
User control and interaction with advanced signal
processing tools, such as source separation algorithms,
is an open research issue, and sound engineers are
generally not trained to use such tools. To be meaningful,
this interaction must strike a balance between a complex
and intricate interface that will frustrate the user and an
overly simplified one that will not allow the desired task
to be performed.
An example of the performance of the technology that
was developed is shown in Figure 2, for the microphone
signal of a tom from a multichannel drum recording.
4. DISCUSSION

Audio engineering is one of the first application areas of
fundamental engineering principles. In the last decades,
the rise of digital audio and the gradual shift towards the
all-digital signal path led to a revolution in audio. But
this revolution remains incomplete. Even today, when
digital audio workstations have a tremendous amount of
processing power at their disposal, most software is
based on the same audio engineering principles that were
proposed several decades ago. We believe that today
there is a need to explore the advances of digital signal
processing and harness the computational power of mod-
ern computers to solve problems that until now seemed
daunting or unsolvable.
The technology that was presented in the previous sec-
tions addresses the problem of microphone leakage, a
ubiquitous problem in audio engineering that hinders
engineers in every day practice. The problem is ad-
dressed from a novel perspective, employing state of the
art digital signal processing techniques combined in a
way that provides a streamlined processing chain that al-
lows the user to accomplish the desired task with mini-
mal effort.
As more novel tools are being developed, audio engi-
neers need to be educated regarding their use. However,
it is important to note that a new technology such
as the one discussed here is not intended to replace the
audio engineer. It is an attempt to provide engineers with an
expanded array of tools and options for creating better audio.
References
Cichocki, A., Zdunek, R., Phan, A. H., and Amari, S.-I.
(2009). Nonnegative matrix and tensor factorizations.
Wiley.
Kokkinis, E. (2012). Blind signal processing methods
for microphone leakage suppression in multichannel
microphone applications. PhD thesis, University of
Patras.
Owsinski, B. (2005). The recording engineer's handbook. Artist Pro Publishing.
[Time-domain plot: amplitude (±0.2) vs. time (0–9 s); original and processed signals overlaid.]
(a) Time-domain signals.
(b) Spectrogram (original).
(c) Spectrogram (processed).
Fig. 2: An example of the A3 performance on a tom recording.