Audio Engineering Society
Convention e-Brief 108
Presented at the 135th Convention
2013 October 17–20, New York, USA
This Engineering Brief was selected on the basis of a submitted synopsis. The author is solely responsible for its presentation, and the AES takes no responsibility for the contents. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Audio Engineering Society.
A new DSP tool for drum leakage suppression
Elias Kokkinis1, Alexandros Tsilfidis1, Thanos Kostis1, Kostas Karamitas1
1accusonus, Patras Innovation Hub, 26504, Greece
Corresponding author: [email protected]
ABSTRACT
Microphone leakage is a problem that sound engineers face every day. Leakage complicates audio editing, processing and mixing, and it is a well-known problem in drum recordings. To this day, sound engineers have only a limited number of options available to address this problem, mostly consisting of simple and empirical methods. A novel technology that addresses the problem of microphone leakage in multichannel drum recordings is presented here. In addition, we discuss the problem definition as deduced from the specific properties of drum recordings, as well as the resulting signal processing framework.
1. INTRODUCTION
Whether in studio or in concert, in most cases a num-
ber of musicians perform together inside the same room
and many microphones are set to capture the sound emit-
ted by their instruments. Ideally each microphone should
only pick up the sound of the intended instrument, but
due to the interaction between the various instruments
and room acoustics, each microphone picks up not only
the sound of interest but also a mixture of all other instru-
ments. This is known as microphone leakage (or bleed
or spill) and is an undesirable effect and a well-known
problem in every-day sound engineering practice.
Microphone leakage introduces several problems that
must constantly be taken into account when working
with multichannel (drum) recordings:
• Microphone leakage can be considered as a noise
source (albeit a complex one), significantly decreas-
ing the signal to noise ratio (SNR) of the recorded
signal.
• When mixing channels that contain leakage, a significant degradation of the signal's quality is introduced
as a result of comb filtering effects. Comb filtering
is a direct result of adding several delayed versions
of the same source, as captured by more than one
microphone.
• The use of common and creative audio effects is
limited by the presence of microphone leakage. For
example, when equalizing a snare track containing
leakage from the hi-hat, one may also unintention-
ally alter the sound of the hi-hat in the mix.
• Finally, since microphone leakage decreases the
signal’s SNR and distorts the original signal, it may
degrade the performance of other processing tools
such as automatic pitch correction.
Audio engineers have only a limited number of options
available to address this problem, which mostly
consist of simple and empirical methods. One of the
most straightforward strategies to address the problem
is the use of directional microphones and the close-
microphone technique, that is, the placement of the mi-
crophone in close proximity to the sound source it is in-
tended to capture. This strategy also requires the proper
placement and orientation of sound sources and micro-
phones in the room, demanding time-consuming experi-
mentation.
From a signal processing perspective the only available
tool is the noise gate. The noise gate is an audio proces-
sor that essentially acts like a switch and allows sound to
pass only when its amplitude exceeds a certain threshold.
The noise gate is effective in practice mostly with percussive
instruments and can only partially address the problem
since leakage will pass through when the signal of in-
terest is present. On the other hand, incorrect setting
of the gate’s parameters may introduce audible artifacts
(e.g. breathing).
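To make the gate's behavior concrete, here is a minimal sketch (not part of the technology described in this paper) of an amplitude-threshold gate; the threshold and smoothing lengths are illustrative values, not recommendations:

```python
import numpy as np

def noise_gate(x, threshold=0.05, attack=32, release=256):
    """Basic noise gate: attenuate samples whose smoothed envelope
    falls below a threshold. `attack`/`release` are smoothing
    lengths in samples (illustrative values)."""
    env = np.abs(x)
    # Smooth the envelope with a moving average to avoid chattering.
    env = np.convolve(env, np.ones(release) / release, mode="same")
    gain = (env >= threshold).astype(float)
    # Smooth the gain so the gate opens and closes gradually.
    gain = np.convolve(gain, np.ones(attack) / attack, mode="same")
    return x * gain

# A burst of "signal" followed by low-level "leakage".
sig = np.concatenate([0.5 * np.ones(1000), 0.01 * np.ones(1000)])
out = noise_gate(sig)
```

Note that, exactly as described above, the low-level tail is removed only while the loud part is absent; leakage arriving simultaneously with the signal of interest would pass through unchanged.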
2. DRUM RECORDING AND LEAKAGE

Drums are an integral part of most modern music genres. The recording of drums is a complicated task that
requires experience, knowledge and attention to detail
(Owsinski 2005). Although inside the controlled envi-
ronment of a recording studio audio engineers have the
option to use isolation booths for every instrument, in or-
der to avoid the problem of microphone leakage, drums
represent a special case. While drums are treated as a sin-
gle instrument in the final mix, in practice a drum kit is a
combination of different sound sources. Typically audio
engineers opt to record each of these sound sources (i.e.
each individual drum) resulting in a multichannel record-
ing of the drum kit. Hence, even in an isolation booth,
microphone leakage will manifest in each drum's microphone signal.
Multichannel drum recordings have a number of distinct
features:
• Each drum is recorded using the close-microphone
technique (except in the case of overhead or
room microphones). The properties of the close-
microphone response have been studied in previous
work (Kokkinis 2012).
• The signals produced by different drums sometimes
vary significantly in terms of time-frequency con-
tent. In signal processing, this property is called
sparsity and it is often employed to provide cues for
source separation algorithms.
• Drum recordings are in most cases inherently mul-
tichannel. This fact can be exploited to extract in-
formation from the multiple channels available that
will yield cues to assist source separation algorithms.
3. A3 - ADVANCED AUDIO ANALYSIS

In this section, we will describe the major aspects of a
novel technology (A3 - Advanced Audio Analysis) that
was developed to address the problem of microphone
leakage. The technology is summarized in the block dia-
gram of Figure 1. Consider the signal produced by
the m-th microphone:
$$x_m(k) = \tilde{s}_m(k) + u_m(k) \quad (1)$$
where $x_m(k)$ is the microphone signal, $\tilde{s}_m(k)$ is the direct sound source, i.e. the source the microphone was intended to capture, and $u_m(k)$ is a noise term that describes the effect of microphone leakage. The term $u_m(k)$ can either be treated as a single complex noise source or as a combination of several noise sources, leading to
$$x_m(k) = \tilde{s}_m(k) + \sum_{\substack{i=1 \\ i \neq m}}^{M} \bar{s}_{i,m}(k) \quad (2)$$
where $\bar{s}_{i,m}(k)$ represents the leakage source that manifests in the m-th microphone as a result of the i-th source.
Equation (1) describes the problem of microphone leak-
age as the well-known signal in additive noise problem,
while equation (2) describes the problem from a source
separation perspective. The direct source s̃m(k) is as-
sumed to be dominant in any model, since the close-
microphone technique is employed. This fact is an im-
portant property of multichannel drum recordings that
enables the development of efficient and adequate DSP
methods to address the problem of microphone leakage.
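A quick numerical illustration of the model in equation (2), with hypothetical leakage gains chosen so that the direct source (the diagonal) dominates, as the close-microphone technique ensures in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 3, 4096                      # M sources/microphones, n samples

# Hypothetical dry source signals (one per drum).
s = rng.standard_normal((M, n))

# Mixing gains: leakage terms are small, the direct path is unity.
G = 0.1 * rng.random((M, M))
np.fill_diagonal(G, 1.0)

# Equation (2): each channel is its direct source plus attenuated
# leakage from every other source.
x = G @ s
```

In this toy mixture each channel remains strongly correlated with its own direct source, which is precisely the dominance property exploited by the DSP methods discussed here.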
The processing of a multichannel drum recording using
the technology described in Figure 1 involves two steps.
The first step is the core processing which consists of a
time-frequency analysis and synthesis, the source separa-
tion algorithm and time-domain information extraction.
Each microphone signal xm(k) is transformed to the
time-frequency domain using the well-known short time
Fourier transform (STFT). The STFT values Xm(κ, ω)
[Figure 1: block diagram. Core processing: time-frequency analysis → extract time-domain cues → source separation algorithm → time-frequency synthesis. Post-processing: signal reordering with user control. Input: microphone signals; output: processed signals.]
Fig. 1: A conceptual block diagram of the A3 technology. White arrows indicate multichannel data.
(where κ is the frame index and ω the frequency bin
index) can be arranged in a complex matrix Xm. The
magnitude of the complex matrix |Xm| represents the
signal’s spectrogram. The STFT parameters (window
length, hop size, window type) must be carefully cho-
sen due to the nature of drum signals. An adequate time
resolution must be ensured, since drum signals are tran-
sient signals, while at the same time sufficient frequency
resolution must be obtained in order to be able to distin-
guish the finer frequency components of each drum. For
a full-length drum channel, these requirements result in a
significant amount of data that need to be stored and pro-
cessed. From each transformed microphone signal, a set
of time domain cues are extracted that are subsequently
passed over to the main source separation algorithm.
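As a sketch of this analysis stage (using scipy's generic STFT rather than the actual A3 implementation, on an arbitrary toy signal; the window length of 2048 samples is one plausible compromise between time and frequency resolution, not the paper's choice):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 44100
t = np.arange(fs) / fs
# A toy "drum" channel: a decaying 200 Hz tone with a sharp onset.
x = np.exp(-8 * t) * np.sin(2 * np.pi * 200 * t)

# Window length trades time resolution (transients) against
# frequency resolution (distinguishing drum components).
f, frames, X = stft(x, fs=fs, nperseg=2048, noverlap=1536)

S = np.abs(X)          # magnitude spectrogram |X_m|
# The inverse STFT with the same parameters reconstructs the signal.
_, x_rec = istft(X, fs=fs, nperseg=2048, noverlap=1536)
```

The 75% overlap with the default Hann window satisfies the constant-overlap-add condition, so analysis followed by synthesis is lossless; processing happens in between, on the masked spectrograms described below.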
The source separation algorithm is based on a powerful
set of techniques, called non-negative matrix factoriza-
tion (NMF) (Cichocki et al. 2009). In the case of audio,
such methods decompose the magnitude spectrogram
into a matrix product
$$|\mathbf{X}_m| \approx \mathbf{W}_m \mathbf{H}_m \quad (3)$$
where Wm is an F × K matrix of spectral profiles and Hm
is a K ×N matrix of activation functions. F is the num-
ber of discrete frequency bins, N is the number of frames
and K is the number of components. Each spectral pro-
file corresponds to the spectrum of one source found in
the microphone signal xm(k) and each activation func-
tion indicates when a spectral profile is active. The ma-
trices Wm,Hm are estimated using an iterative scheme.
The corresponding algorithm is derived by constructing
an appropriate cost function which takes into account any
information or constraints imposed by the specific appli-
cation and minimizing this cost function in an iterative
way. Equation (3) describes the NMF model for a sin-
gle microphone signal. In the case of multichannel drum
recordings, one can choose different strategies to apply
NMF.
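The paper does not specify the cost function used; as a generic illustration of the iterative scheme, the classic Euclidean-distance NMF with multiplicative updates (Lee-Seung) can be sketched as:

```python
import numpy as np

def nmf(V, K, n_iter=500, seed=0):
    """Euclidean-distance NMF via multiplicative updates:
    V (F x N, non-negative) ~ W (F x K) @ H (K x N)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 1e-3
    H = rng.random((K, N)) + 1e-3
    eps = 1e-12                      # guards against division by zero
    for _ in range(n_iter):
        # Updates preserve non-negativity and reduce ||V - WH||_F.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy spectrogram: two "sources" with fixed spectra and activations.
rng = np.random.default_rng(1)
W_true = rng.random((64, 2))
H_true = rng.random((2, 100))
V = W_true @ H_true

W, H = nmf(V, K=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

A production system would add the application-specific constraints mentioned above (and cues from the other channels) to the cost function; this sketch shows only the bare factorization.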
After the successful estimation of Wm,Hm, a number of
time-frequency masks can be extracted by multiplying
each column of Wm with the corresponding row of Hm.
Each mask can be applied to the complex spectrogram
and after the appropriate inverse STFT a component sig-
nal is generated. Each component signal may contain a
part of the direct sound source or a part of the leakage
sources. By an appropriate combination of the relevant
components, an estimation of the direct sound source can
be obtained.
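One common way to realize this masking step is a Wiener-like soft mask built from the rank-1 component models (the outer product of each column of Wm with the corresponding row of Hm); random placeholders stand in here for the NMF output and the complex STFT:

```python
import numpy as np

rng = np.random.default_rng(2)
F, N, K = 64, 100, 3

# Placeholders: W (F x K) and H (K x N) would come from the NMF
# stage, X is the complex STFT of the same channel.
W = rng.random((F, K))
H = rng.random((K, N))
X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))

# One rank-1 magnitude model per component: outer(w_k, h_k).
models = np.stack([np.outer(W[:, k], H[k, :]) for k in range(K)])
total = models.sum(axis=0) + 1e-12

# Soft masks sum to 1 in every time-frequency bin, so the K complex
# component spectrograms sum back to the original X.
masks = models / total
components = masks * X
```

Each component spectrogram is then passed through the inverse STFT to produce the time-domain component signals discussed next.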
A human user has to listen to all the component signals for
each microphone signal and decide which components
belong to the direct source and which contain leakage,
in order to complete the task of source separation. The
technology discussed here contains a method that helps
to automate this process, by assessing the resemblance
of each component signal to the desired direct source.
At the output of this process, the component signals are
reordered in such a way that they can be easily combined
by a human user, without the need to listen to them. This
is the post processing step in Figure 1.
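A crude proxy for such a resemblance score, assuming (as the close-microphone premise suggests) that the raw channel itself is dominated by the direct source, is to rank components by their correlation with the channel; this is only an illustration, not the method used by the technology:

```python
import numpy as np

def rank_components(components, reference):
    """Order component signals by absolute correlation with a
    reference signal -- a simple resemblance score."""
    scores = [abs(np.corrcoef(c, reference)[0, 1]) for c in components]
    order = np.argsort(scores)[::-1]          # most similar first
    return [components[i] for i in order], order

rng = np.random.default_rng(3)
direct = rng.standard_normal(2048)
leak = rng.standard_normal(2048)
mic = direct + 0.2 * leak                     # toy close-mic channel

ranked, order = rank_components([leak, direct], mic)
```

With the direct-source component ranked first, a user can combine the top of the list without auditioning every component.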
User control and interaction with advanced signal
processing tools, such as source separation algorithms,
is an open research issue, and sound engineers are
generally not trained to use such tools. To be meaningful,
this interaction must strike a balance between a complex
and intricate interface that will frustrate the user and an
overly simplified one that will not allow the desired task
to be performed.
An example of the performance of the technology that
was developed is shown in Figure 2, for the microphone
signal of a tom from a multichannel drum recording.
4. DISCUSSION

Audio engineering is one of the first application areas of
fundamental engineering principles. In the last decades,
the rise of digital audio and the gradual shift towards the
all-digital signal path led to a revolution in audio. But
this revolution remains incomplete. Even today, when
digital audio workstations have a tremendous amount of
processing power at their disposal, most software is
based on the same audio engineering principles that were
proposed several decades ago. We believe that today
there is a need to explore the advances of digital signal
processing and harness the computational power of mod-
ern computers to solve problems that until now seemed
daunting or unsolvable.
The technology that was presented in the previous sec-
tions addresses the problem of microphone leakage, a
ubiquitous problem in audio engineering that hinders
engineers in every day practice. The problem is ad-
dressed from a novel perspective, employing state of the
art digital signal processing techniques combined in a
way that provides a streamlined processing chain that al-
lows the user to accomplish the desired task with mini-
mal effort.
As more novel tools are being developed, audio engi-
neers need to be educated regarding their use. However,
it is important to note that a new technology such
as the one discussed here is not intended to replace the
audio engineer. It is an attempt to provide engineers with an
expanded array of tools and options for creating better audio.
References
Cichocki, A., Zdunek, R., Phan, A. H., and Amari, S.-I.
(2009). Nonnegative matrix and tensor factorizations.
Wiley.
Kokkinis, E. (2012). Blind signal processing methods
for microphone leakage suppression in multichannel
microphone applications. PhD thesis, University of
Patras.
Owsinski, B. (2005). The recording engineer's handbook. Artist Pro Publishing.
[Time-domain plot: amplitude (±0.2) vs. time (0–9 s); original and processed signals overlaid.]
(a) Time-domain signals.
(b) Spectrogram (original).
(c) Spectrogram (processed).
Fig. 2: An example of the A3 performance on a tom recording.