
[IEEE 2008 New Trends for Environmental Monitoring Using Passive Systems (PASSIVE 2008) - Hyeres, France (2008.10.14-2008.10.17)] 2008 New Trends for Environmental Monitoring Using


A Phase based detector of whale clicks

Varvara Kandia∗ and Yannis Stylianou∗†

∗Foundation for Research and Technology Hellas (FORTH), Heraklion, Crete, Greece
{vkandia, styliano}@ics.forth.gr

†University of Crete, Computer Science Dept., Heraklion, Crete, Greece
[email protected]

Abstract—In this paper we present an alternative to the usual energy-based approaches for detecting whale clicks. We suggest the use of the phase spectrum, since the information about the location of clicks is very well represented in phase spectra. The method is referred to as the phase slope function. It is shown that the phase slope function is robust to additive noise, while it offers simplicity in click detection since it is independent of the click source level. We further discuss its properties regarding the mono-pulse and multi-pulse character of clicks by introducing the notion of center of gravity for clicks. To evaluate the suggested phase-based whale click detector, we labeled clicks by hand in recordings of sperm and beaked whales provided by the Atlantic Undersea Test and Evaluation Center (AUTEC). Detection tests demonstrate that 88% (on average) of the hand-labeled mono-pulsed clicks were detected with an accuracy better than 1 ms. Regarding multi-pulsed clicks, we were able to detect over 95% of them by considering a multi-pulsed click as one acoustic event and not as a series of pulses.

I. INTRODUCTION

Acoustic monitoring of the sound activity of whales helps to study their habitats. Many toothed whales produce short clicks of high directionality and source level for echolocation. Automatic detection and classification of their clicks is therefore necessary for detecting and continuously localizing the animals using passive acoustics. Some whales emit clicks in the range of human hearing (below 20 kHz) while others (like beaked whales) emit ultrasonic clicks with significant energy above 20 kHz.

In this paper we focus on clicks produced by two types of whales: beaked whales and sperm whales. Beaked whales are deep-diving toothed whales and are the least known family of all marine mammals [1]. In particular, two genera of beaked whales, Ziphius and Mesoplodon, are not as well known as other genera of beaked whales (i.e., Berardius). Beaked whales emit short, directional, mono-pulsed ultrasonic clicks with significant energy above 20 kHz. A detailed report on recordings made with acoustic recording tags attached to Ziphius and Mesoplodon beaked whales may be found in [1] and [2]. Sperm whale vocalizations are made up mostly of impulsive click-type sounds (Chapter 5 in [3]), [4], [5], [6], although there is strong evidence that they produce other kinds of sounds as well [7], [8], [9]. Most of the energy of clicks produced by sperm whales is distributed below 20 kHz. Their main repertoire contains regular and creak clicks. Regular clicks are impulsive broadband sounds of multi-pulse structure, and they are highly directional with a high source level (up to 235 dBrms re 1 μPa) [10]. It is believed that these properties of regular clicks represent adaptations for long-range echolocation. Creak clicks are bursts of highly directional mono-pulsed clicks, with a lower source level than regular clicks (180-205 dBrms re 1 μPa) [10]. They are produced during foraging dives at a high repetition rate for precise localization of the prey [11].

Since clicks are highly directional, they present different properties depending on whether they are recorded off or on the acoustic axis of the whale [1]. Therefore the source level of the recorded clicks usually varies a lot within a short period of time. Many click detectors proposed in the literature are mainly based on source level criteria (amplitude spectrum), as in the Rainbow Click [12], Ishmael [14], and Moby Click [13], [5] software, and in detectors based on the Teager-Kaiser energy operator [15], [16]. To overcome the variability of the click energy, these approaches need to adjust their detection threshold dynamically, which increases the complexity of the detection system without improving the detection score considerably.

The location of a click, however, is very well represented in the phase spectrum, which is independent of amplitude (energy) variations. For instance, two clicks located at the same instant but with different source levels will have different amplitude (magnitude) spectra, while their phase spectra will be dominated by the same linear phase component (i.e., slope) reflecting the position of the clicks. In this work we therefore present an alternative approach for the automatic detection of whale clicks based on the slope of the phase spectrum. The slope is estimated through the group delay function. A click is detected simply by searching for the positive zero crossings over time of the slope of the phase spectrum. A major advantage of the proposed approach is its robustness against additive noise, while it does not require the definition of ad-hoc or adaptive thresholds for the detection of clicks. Using data from the Atlantic Undersea Test and Evaluation Center (AUTEC) with recordings of sperm whales and beaked whales (Blainville's beaked whales, Mesoplodon densirostris), we show that the phase slope function is quite robust and accurate in detecting mono-pulsed clicks (regular clicks for beaked whales and creak clicks for sperm whales). For multi-pulsed clicks, however, the length of the analysis window used for computing the phase slope function determines whether all the pulses of a click are detected or just one reference point per click. In this paper, we suggest the use of the center of gravity as the reference point for a multi-pulsed click, and we show the connection of the phase slope function with the notion of center of gravity. Thus, a multi-pulsed click is considered as one acoustic event (its occurrence in time being defined by the location of its center of gravity) and not as a series of acoustic events (as many as its pulses).

The paper is organized as follows. The motivation for using phase spectrum information for detecting clicks is given in Section II, where we also present the main steps for computing the phase slope function. In the same section we define the center of gravity for whale clicks and provide the relationship between the phase slope function and the center of gravity. The phase-based click detector is developed in Section III. The dataset and the hand-labeled data used for the evaluation of the detection algorithm are presented in Section IV, along with the detection results. A discussion of the obtained results and future work concludes the paper.

978-1-4244-2816-8/08/$25.00 ©2008 IEEE


II. PHASE SLOPE FUNCTION

A. Motivation

Let us consider a delayed unit sample sequence x[n] = δ[n − n0] and its Fourier transform X(ω) = e^{−jωn0}. The group delay is defined as [17]:

$$\tau(\omega) = -\frac{d\phi(\omega)}{d\omega} \qquad (1)$$

so the group delay for the delayed unit sample sequence is τ(ω) = n0 ∀ω, since the phase spectrum of the signal is φ(ω) = −ωn0. The average over ω of τ(ω) provides n0, which corresponds to the negative of the slope of the phase spectrum for this specific signal and to the delay of the unit sample sequence. An example of a delayed unit sample sequence with n0 = 200 samples, as well as the associated group delay function, are depicted in Fig. 1(a) and (b), respectively. As can be seen from Fig. 1(b), the group delay is equal to the delay of the unit sample sequence at all frequencies. Therefore any value of the group delay function will provide the delay of the signal. However, this is not the case for other, more complex signals. For example, let us consider the same unit sample sequence filtered by a real AR (autoregressive) process having two conjugate poles at ±π/4 rad. The output signal is depicted in Fig. 1(c) and the corresponding group delay function is shown in Fig. 1(d). In this case not all the values of the group delay function provide the correct delay of the input unit sample sequence. However, if we compute the average over frequency of this function we obtain the correct delay of the input signal (i.e., n0 = 200). Therefore, it seems that a good estimator for the delay is the average of the group delay function over frequency.
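As a rough numerical illustration of Eq. 1 (a sketch, not the authors' implementation), the snippet below differentiates the unwrapped FFT phase of a delayed unit sample and of the same sample passed through a two-pole resonator at ±π/4. The pole radius 0.9, the 512-sample length, and the zero-padding to 4096 points are assumptions made for the demonstration; the paper does not state them.

```python
import numpy as np

N, n0 = 512, 200            # signal length (assumed) and delay from the text
r, theta = 0.9, np.pi / 4   # assumed pole radius; pole angle from the text

# Signal 1: delayed unit sample sequence.
impulse = np.zeros(N)
impulse[n0] = 1.0

# Signal 2: impulse response of 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2),
# written in closed form and delayed by n0 samples.
k = np.arange(N - n0)
resonator = np.zeros(N)
resonator[n0:] = r**k * np.sin((k + 1) * theta) / np.sin(theta)

def group_delay_from_phase(x, nfft=4096):
    """Eq. 1 by finite differences of the unwrapped phase (zero-padded FFT)."""
    phase = np.unwrap(np.angle(np.fft.fft(x, nfft)))
    return -np.diff(phase) / (2 * np.pi / nfft)

tau_impulse = group_delay_from_phase(impulse)
tau_resonator = group_delay_from_phase(resonator)

print(tau_impulse.min(), tau_impulse.max())      # flat group delay
print(tau_resonator.min(), tau_resonator.max())  # varies with frequency
print(tau_resonator.mean())                      # close to n0
```

The zero-padding keeps the phase increment between neighboring bins well below π, which is what lets `np.unwrap` succeed here; on real recordings that condition is not guaranteed.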

[Figure 1: four panels. (a) Signal 1 versus time (samples); (b) group delay versus frequency (rad); (c) Signal 2 versus time (samples); (d) group delay versus frequency (rad).]

Fig. 1. (a) A unit sample sequence delayed by 200 samples. (b) The group delay function of the signal in (a). (c) A minimum phase signal with an oscillation at π/4. (d) The group delay function of the signal in (c).

Let us present the problem of locating the position of the unit sample sequence from another point of view, by considering the signal x[n] as a distribution of masses. Then it is obvious, at least for the simple unit sample sequence, that the center of mass (or center of gravity) of the signal is at n0 = 200. Given the similarity in measurements between the slope of the phase spectrum and the center of gravity, it is worth exploring their relationship further. Note that the center of gravity, η, of a signal f(t) is given by [18]:

$$\eta = \frac{m_1}{m_0} \qquad (2)$$

where m_n is the nth moment of f(t):

$$m_n = \int_{-\infty}^{\infty} t^n f(t)\,dt \qquad (3)$$

Let F(ω) = A(ω) e^{jφ(ω)} be the Fourier transform of the signal f(t). With F^{(n)}(0) denoting the nth derivative of the Fourier transform of f(t) at the origin, we can easily show that [19]:

$$F^{(n)}(0) = (-j)^n m_n \qquad (4)$$

From Eqs. (2) and (4), the center of gravity of f(t) is given by:

$$\eta = \frac{j F^{(1)}(0)}{F(0)} \qquad (5)$$

where

$$F(0) = \int_{-\infty}^{\infty} f(t)\,dt \qquad (6)$$

is the area, m_0, of f(t), and F^{(1)}(0), assuming that f(t) is real (so that A(ω) is even and its first derivative vanishes at the origin), is given by:

$$F^{(1)}(0) = j A(0)\,\phi^{(1)}(0) \qquad (7)$$

From Eqs. (5) and (7) it follows that:

$$\eta = -\phi^{(1)}(0) \qquad (8)$$

This means that the center of gravity, η, of f(t) is a function only of the first derivative of the phase spectrum at the origin (ω = 0). Thus, if a signal is delayed by an amount t0, its center of gravity is delayed by the same amount.

Based on the above analysis, the connection between the slope of the phase spectrum and the center of gravity is easily seen. The stronger the onset of a signal, the more closely the two methods agree on the position of the onset, since φ^{(1)}(0) will mainly be a measurement of the slope of the phase function.

Going back to the minimum phase signal shown in Fig. 1(c), the center of gravity of this signal computed from Eq. 2 is also n0 = 200. This shows that the above connection between the two methods is still valid for this case, as expected (the signal has a strong onset at n0 = 200). It is worth noting that the peaks in the group delay function are at the frequencies of the poles of the AR filter, while half of the magnitude of the peak (determining the bandwidth of the peak) corresponds to the delay of the center of gravity of the signal from time n = 0. From Fig. 1(d) this is computed to be at n0 = 206. Note that if we consider x²[n] (and not x[n] as previously) as a distribution of masses over time, we find the center of gravity to be located at n0 = 206. This corresponds to replacing f(t) with f²(t) in Eq. 3.
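The relation in Eq. 8 can be checked numerically. The sketch below compares the center of gravity computed from the moments (a discrete form of Eqs. 2 and 3) with a central-difference estimate of the phase derivative at ω = 0. The test signal is a delayed two-pole impulse response with an assumed pole radius of 0.9; the helper `dtft` and the step `eps` are illustrative choices, not part of the paper.

```python
import numpy as np

n = np.arange(512)
n0, r, theta = 200, 0.9, np.pi / 4   # r is an assumed pole radius

# Delayed two-pole impulse response (same kind of signal as in Fig. 1(c)).
k = n[n0:] - n0
x = np.zeros(n.size)
x[n0:] = r**k * np.sin((k + 1) * theta) / np.sin(theta)

# Center of gravity from the moments (discrete form of Eqs. 2 and 3).
eta = np.sum(n * x) / np.sum(x)

def dtft(x, n, w):
    """Fourier transform of x[n] evaluated at a single frequency w (rad)."""
    return np.sum(x * np.exp(-1j * w * n))

# Central-difference estimate of the phase derivative at w = 0.
eps = 1e-4
dphi0 = (np.angle(dtft(x, n, eps)) - np.angle(dtft(x, n, -eps))) / (2 * eps)

print(eta, -dphi0)   # the two numbers should agree (Eq. 8)
```

The exact value of `eta` depends on the assumed pole radius, so it need not match the figures quoted in the text; the point of the check is only that the two estimates coincide.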

B. Computing the Phase Slope Function

The above results are valid if we consider that the first sample of the unit sample sequence is at nl = 0. If we let the signal move left along the time axis by shifting it by nl > 0, i.e., x[n + nl] = δ[n − (n0 − nl)], then the group delay decreases to τ(ω) = n0 − nl ∀ω, or equivalently the slope of the phase spectrum increases by nl to −n0 + nl. As we continue shifting the unit sample sequence, the group delay, as well as the slope of the phase function, becomes zero when nl = n0. As the shift continues further, the slope of the phase spectrum continues to increase. In this way the slope of the phase spectrum is measured at every time shift, providing a function of time which will be referred to as the Phase Slope Function.

Let us now consider a specific example of a signal containing two Dirac pulses located at n = 100 and n = 200: x[n] = δ[n − 100] + δ[n − 200]. If we compute the group delay of this signal, considering that the first sample of the signal is again located at nl = 0, then the group delay can easily be shown to be τ(ω) = 150 ∀ω. So the average of the group delay function will not provide the location of the two Dirac pulses (clicks) but the location exactly between them. Furthermore, if we consider the signal x[n] = A1 δ[n − 100] + A2 δ[n − 200] with A1 ≫ A2, then the group delay will provide only the location of the first click. Similarly, if A1 ≪ A2, then only the location of the second click will be obtained. The same observations are made, and the same values for the location of clicks are obtained, if the center of gravity computed as in Eq. 2 is considered.

If the locations of both clicks need to be resolved, the length of the analysis window must be adjusted appropriately so as to isolate the clicks. This will be referred to as frame-by-frame analysis. In this way, their locations will be revealed by the group delay function independently of their amplitudes. Let us consider a window of length about the distance between the two clicks, and let us shift the window instead of the signal¹. By shifting the window we compute, as described above, the Phase Slope Function of this signal. Indeed, the above description is quite accurate when one of the clicks is observed through the window: the Phase Slope Function is increasing and has a zero crossing once the center of the analysis window coincides with the location of the click. When the window observes the second click, the Phase Slope Function abruptly takes negative values. As the window shifting continues, the Phase Slope Function starts to increase again. At the time when the center of the analysis window coincides with the location of the second click, the Phase Slope Function has another (positive) zero crossing, and its value keeps increasing as the window continues shifting to the right. In this way the locations of both clicks are correctly obtained independently of their amplitudes. The length of the analysis window determines how fast the Phase Slope Function changes from positive to negative values (this is similar to a modulo operation).

If the click is no longer a simple Dirac pulse but has a structure similar to the one shown in Fig. 1(c), then the positive zero crossings of the Phase Slope Function will provide the onset of these signals, as was shown in the previous section.

By creating a periodic sequence of minimum phase signals² such as the one displayed in Fig. 1(c), a sequence similar to a train of regular clicks may be obtained. Defining an analysis window of length proportional to the period of the sequence (it will be referred to as the long window), a frame-by-frame analysis is performed. In each frame the slope of the phase spectrum of the windowed signal is computed and associated with the center of the analysis window. By setting the analysis step size to one sample (moving the analysis window by one sample at a time), the obtained Phase Slope Function has the same time resolution as the original recording. The window length may also be shorter than the period of the signal (it will be referred to as the short window). In Fig. 2(a) the periodic sequence of the minimum phase signal is displayed along with the Phase Slope Function using a long (dashed line) and a short (dash-dotted line) window. As expected from the description provided before, the positive zero crossings of the Phase Slope Function provide the positions of the "clicks". Of course, in this example the detection of clicks using a simple energy criterion will provide the same detection score as the proposed approach. In Fig. 2(b), the same sequence of "clicks" is repeated, but now the energy of the minimum phase signals decreases linearly as time increases. In the same figure the Phase Slope Function is also displayed using, as in Fig. 2(a), the same types of lines for the long and short analysis windows. It is obvious that a simple energy criterion will not work as well as before and that an adaptive energy criterion should be adopted. On the contrary, using the Phase Slope Function, the positions of the "clicks" are easily detected as before, since the values of the Phase Slope Function are independent of the magnitude information (energy)³.

¹ Either moving the signal or moving the center of the analysis window has the same effect.

² They basically correspond to the impulse response of the all-pole minimum phase system that was used as filter.

[Figure 2: two panels, (a) and (b), each plotted against time in samples (0 to 5000).]

Fig. 2. (a) A sequence of impulses of constant amplitude and the associated phase slope function using a long (dashed line) and a short (dash-dotted line) window. (b) A sequence of impulses with linearly time-varying amplitudes and the associated phase slope function using a long (dashed line) and a short (dash-dotted line) window.

Based on Eq. 1, the computation of the group delay involves the derivative of the phase spectrum. However, the phase spectrum is computed modulo 2π, and therefore, before computing the derivative of the phase over frequency, the phase must be unwrapped to eliminate discontinuities of the phase function. Usually phase unwrapping is performed by adding appropriate integer multiples of 2π to the principal phase values so as to remove the discontinuities (jumps of 2π radians) in the phase curve. Unfortunately, phase unwrapping is not always successful [17]. Therefore, we suggest computing the slope of the phase function through an alternative computation of the group delay that does not rely on Eq. 1 [17]:

$$\tau(\omega) = \frac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{|X(\omega)|^2} \qquad (9)$$

where

$$X(\omega) = X_R(\omega) + j X_I(\omega), \qquad Y(\omega) = Y_R(\omega) + j Y_I(\omega)$$

are the Fourier transforms of x[n] and n x[n], respectively. Using Eq. 9 we avoid the computation of the unwrapped phase. The phase slope is then computed as the negative of the average of the group delay function.
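A minimal sketch of Eq. 9 in NumPy follows, using the FFT for both transforms. The function names `group_delay` and `phase_slope`, the small constant added to the denominator, and the test signals (512 samples, resonator pole radius 0.9) are assumptions introduced for illustration; they are not specified in the paper.

```python
import numpy as np

def group_delay(x):
    """Group delay of x[n] via Eq. 9 (no phase unwrapping)."""
    n = np.arange(len(x))
    X = np.fft.fft(x)
    Y = np.fft.fft(n * x)
    # Small constant (an assumption, not in the paper) guards against |X| = 0.
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-20)

def phase_slope(x):
    """Phase slope: negative of the average group delay over frequency."""
    return -np.mean(group_delay(x))

# Delayed unit sample: every group delay value equals the delay.
impulse = np.zeros(512)
impulse[200] = 1.0
print(group_delay(impulse)[:3], phase_slope(impulse))

# Same sample filtered by an assumed two-pole resonator (radius 0.9, angle pi/4).
k = np.arange(312)
resonator = np.zeros(512)
resonator[200:] = 0.9**k * np.sin((k + 1) * np.pi / 4) / np.sin(np.pi / 4)
print(phase_slope(resonator))   # close to -200 although tau varies with frequency
```

Because only products of real and imaginary parts are involved, the result does not depend on any phase unwrapping step, which is the motivation given in the text for preferring Eq. 9 over Eq. 1.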

III. PHASE SLOPE BASED CLICK DETECTOR

In the previous section it was shown by examples that the localization of pulses using the phase slope provides the same results as the center of gravity of the signal defined in Eq. 2 for minimum phase signals. Clicks from beaked and sperm whales are highly directional and of very short duration. They can therefore be seen as realizations of impulse responses of minimum phase systems. The application of the Phase Slope Function to the detection of clicks is straightforward based on the analysis presented in the previous section. Clicks will ideally correspond to the positive zero crossings of the Phase Slope Function. An important design parameter of the phase-based click detector is the length of the window for the frame-by-frame analysis⁴. Another important issue is not directly related to the design of the detector but rather to its evaluation: it concerns the labeling of clicks by humans.

Beaked whales produce mono-pulsed clicks of very short duration. Creak clicks of sperm whales are also considered to be mono-pulsed. For these two cases the location of a click is not ambiguous for labeling purposes; it is exactly where the peak of the click is. However, this is not the case for regular clicks of sperm whales because of their multi-pulsed structure [20], [10], [21]. An example of a regular click from a sperm whale is depicted in Fig. 3, along with two computations of the corresponding Phase Slope Function, one computed with a short window (8.5 ms) and the other with a long window (25 ms), and the location of the center of gravity of the signal. The notation of the pulses p0, p1/2 and p1 follows the notation suggested in [10], [21]. The energy of the click is mainly distributed among these three pulses, and the distribution of energy among them depends on the position of the whale relative to the hydrophones (for on-axis recordings): when a recording is made behind the animal, p0 is stronger than p1, while when a recording is made in front of the animal, p1 is stronger than p0 [21].

[Figure 3: "Regular click from sperm whale"; amplitude versus time (ms), 0 to 25 ms, with the pulses p0, p1/2 and p1 marked.]

Fig. 3. Regular click from a sperm whale (solid line). The center of gravity of the signal is at 8.25 ms (line with diamonds). Two Phase Slope Functions are shown: one computed with an 8.5 ms window (dashed line) and the other with a 25 ms window (dash-dot line).

As shown in Fig. 3, if we want to detect the main pulses of a click then we should use short windows. In this example we were able to detect p1 from the second positive zero crossing of the Phase Slope Function using a short window of 8.5 ms (dashed line). Note that the first positive zero crossing of the function detects neither p0 nor p1/2 but rather a point between these two pulses; it is actually the center of gravity if only these two pulses are considered. To resolve these two pulses, a window shorter than 8.5 ms should be used. Therefore, the window length defines the resolution of the detector. On the other hand, if we would like to detect the click as one acoustic event, then one point of reference should be selected. An obvious choice is to select the pulse containing most of the click energy [16]. However, such a choice is ambiguous given the variability of the distribution of the main energy of the click among its first three pulses. Instead, we suggest using the center of gravity as the reference point for multi-pulsed clicks, considering a typical (average) duration of a sperm whale multi-pulsed regular click to be 25 ms. The center of gravity has the advantage of being essentially independent of the distribution of the click energy among the pulses. In the following, the center of gravity will be considered as the reference point for the multi-pulsed regular sperm whale clicks.

³ Note that phase information is not related to the total energy of a signal but rather to the distribution of the signal energy over time.

⁴ From a computational point of view, another issue is the frame rate: computing one frame every 2 ms is much faster than computing one frame every sample.

As mentioned above, an important design parameter of the detector is the length of the analysis window. As a general rule we suggest setting the length of the analysis window as a function of the average inter-click interval. For example, according to Johnson et al. [1] the average inter-click interval for Blainville's beaked whales is about 0.3 s. So for the experiments conducted in this paper on beaked whale recordings, we used a Hanning window of 0.5 s (long window). For creak clicks of sperm whales we used a Hanning window of 50 ms, while for the detection of regular sperm whale clicks the length of the window was set to 1 s.
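The detector described so far can be sketched as follows: slide a Hanning window one sample at a time, compute the phase slope of each frame through Eq. 9, and report the positive zero crossings. This is an illustrative sketch under stated assumptions, not the authors' code. The function names, the small regularization constant, and the synthetic pulse train (period of 500 samples, linearly decreasing amplitudes, 601-sample window) are all hypothetical choices; time inside each frame is measured from the window center so that an isolated pulse gives a zero crossing when the window is centered on it.

```python
import numpy as np

def phase_slope_function(x, win_len):
    """Phase slope of each Hanning-windowed frame, one frame per sample.

    Time is measured from the frame center, so an isolated pulse yields a
    positive-going zero crossing when the window is centered on it.
    Edge samples without a full frame are left as NaN.
    """
    h = win_len // 2                      # win_len is assumed odd
    w = np.hanning(win_len)
    m = np.arange(-h, h + 1)              # centered time index
    psf = np.full(len(x), np.nan)
    for c in range(h, len(x) - h):
        s = x[c - h:c + h + 1] * w
        X = np.fft.fft(s)
        Y = np.fft.fft(m * s)
        # Eq. 9; the tiny constant (an assumption) avoids division by zero.
        tau = (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-20)
        psf[c] = -np.mean(tau)
    return psf

def positive_zero_crossings(psf):
    """Indices where the function goes from negative to non-negative."""
    return np.flatnonzero((psf[:-1] < 0) & (psf[1:] >= 0)) + 1

# Synthetic train of unit pulses with linearly decreasing amplitude
# (period, amplitudes and window length are illustrative assumptions).
true_pos = np.arange(400, 2900, 500)          # 400, 900, ..., 2400
x = np.zeros(3000)
x[true_pos] = np.linspace(1.0, 0.2, true_pos.size)

psf = phase_slope_function(x, win_len=601)    # window longer than the period
detected = positive_zero_crossings(psf)
print(detected)
```

No amplitude threshold appears anywhere in the sketch, even though the last pulse is five times weaker than the first. For recordings, the window length in samples would follow from the durations quoted above and the sampling frequency, and a larger frame step than one sample reduces the cost.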

Some examples of the application to sperm and beaked whale recordings are provided in the next figures. In Fig. 4 an example of four regular clicks produced by a sperm whale is depicted (solid line), along with their manual labels (triangles) and the corresponding Phase Slope Function (dashed line). It is easily seen that there is agreement between the marks and the positive zero crossings of the Phase Slope Function (shown as Dirac pulses in the lower panel).

[Figure 4: "Detection of Regular clicks of sperm whale"; time axis in ms (0 to 4000).]

Fig. 4. Upper panel: recording of regular clicks from a sperm whale (48 kHz sampling frequency) (solid line). Lower panel: the corresponding Phase Slope Function (dashed line) and the detected locations of clicks shown as Dirac pulses. Manual marks are indicated by triangles.

The same observations can be made in Fig. 5 and Fig. 6, where recordings of creak clicks from a sperm whale and regular clicks from a beaked whale are depicted, respectively (mono-pulsed signals). It is worth mentioning that although the clicks are not at all visible in the original recordings, simply detecting the positive zero crossings of the Phase Slope Function reveals the click structure of the signals. It is also worth noting the agreement between the manual marks and the clicks detected by using the Phase Slope Function.

If we apply a click enhancement tool like the Teager-Kaiser (TK) energy operator [15], [16], [22], which is defined as follows [23]:

$$\Psi\{x[n]\} = x^{2}[n] - x[n+1]\,x[n-1] \tag{10}$$

then the click structure of the signal is more pronounced and thus the Phase Slope Function provides more accurate detection, since the input signal is closer to a minimum phase signal. This is more obvious for clicks like the creak clicks of sperm whales and the regular clicks of beaked whales. Fig. 7 depicts the output of the Teager-Kaiser energy operator for the creak signal shown in Fig. 5. However, we found that this enhancement is not really critical for the Phase Slope Function based detector. This indicates that the suggested detector is robust against additive noise.

Fig. 5. Detection of creak clicks of a sperm whale. Upper panel: recording of creak clicks from a sperm whale (48 kHz sampling frequency) (solid line). Lower panel: the corresponding Phase Slope Function (dashed line) and the detected locations of clicks shown as Dirac pulses. Manual marks are indicated by triangles. Horizontal axis: time (ms), 0 to 450.

Fig. 6. Detection of regular clicks of a beaked whale. Upper panel: recording of regular clicks from a beaked whale (96 kHz sampling frequency) (solid line). Lower panel: the corresponding Phase Slope Function (dashed line) and the detected locations of clicks shown as Dirac pulses. Manual marks are indicated by triangles. Horizontal axis: time (s), 5 to 8.

Fig. 7. Detection of creak clicks of a sperm whale after TK. Upper panel: output of the Teager-Kaiser energy operator for the sperm whale creak click signal shown in Fig. 5 (48 kHz sampling frequency) (solid line). Lower panel: the corresponding Phase Slope Function (dashed line) and the detected locations of clicks shown as Dirac pulses. Manual marks are indicated by triangles. Horizontal axis: time (ms), 0 to 450.
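The Teager-Kaiser operator of Eq. (10) can be sketched in a few lines. The function name `teager_kaiser` is hypothetical, and setting the two edge samples to zero is an assumption made here, since Eq. (10) is undefined at the first and last sample.

```python
import numpy as np

def teager_kaiser(x):
    """Teager-Kaiser energy: psi[n] = x[n]**2 - x[n+1] * x[n-1].

    The first and last samples have no neighbour on one side, so they
    are set to zero here (an assumption, not specified by Eq. (10)).
    """
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[2:] * x[:-2]
    return psi

# One cycle of a unit-amplitude sinusoid sampled four times per period.
x = [0.0, 1.0, 0.0, -1.0, 0.0]
print(teager_kaiser(x))  # [0. 1. 1. 1. 0.]
```

For a sinusoid of amplitude A and digital frequency ω, the operator output is the constant A² sin²(ω), which is why the interior samples of this toy example are all equal to 1.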

IV. DATASET AND RESULTS

To evaluate the effectiveness of the proposed approach in detecting clicks, recordings of beaked and sperm whale clicks were manually marked and compared to the automatically detected clicks. For the mono-pulsed signals, the maximum peak of the pulse was labeled as the click. For the multi-pulsed regular sperm whale clicks, the first pulse was labeled (p0) and then the center of gravity of the click was computed by using a rectangular window of 25 ms: 5 ms before p0 and 20 ms after p0.

For beaked whales, the proposed click detection system was evaluated on the training dataset provided by the organizers of the 3rd International Workshop on Detection and Localization of Marine Mammals using Passive Acoustics (Boston, MA, USA, 2007), which was recorded by the Naval Undersea Warfare Center. More specifically, we used recordings of Blainville’s beaked whales recorded at a sample rate of 96 kHz, with 24-bit accuracy, from Set1, Alesis number 2, hydrophone H18 at AUTEC. One minute of these data was manually labeled, providing in total 248 regular beaked whale clicks. From a visual inspection of the original recording, it was not possible to detect any of these clicks. The initial recording was modulated and downsampled to 48 kHz because it was not possible to listen to the original recordings (sampling frequency of 96 kHz). Based on the frequency characteristics of the emitted clicks, we modulated the amplitude of the signal and then downsampled it appropriately [22].

The data for sperm whales were also recorded by the Naval Undersea Warfare Center (NUWC) and collected at the Atlantic Undersea Test and Evaluation Center (AUTEC), Andros Island, Bahamas. They were made publicly available through the webpage of the 2nd International Workshop on Detection and Localization of Marine Mammals using Passive Acoustics (Monaco, 2005) [24]. We used 5.3 minutes of recordings made with Hydrophone G, one of the five hydrophones belonging to Data set #2. The constellation of hydrophones can be seen in [25]. The audio files were sampled at 48 kHz with 16 bits/sample of accuracy and contained clicks from only one sperm whale. Reverberation was often noticed in the available recordings. Hand labeling provided 146 creak clicks and 452 regular clicks.

All signals were listened to through closed-type headphones in a quiet office room. To improve the accuracy of labeling, the playback speed of the sound was considerably lowered (e.g., to 2 kHz for creak clicks). This facilitates the auditory and visual inspection of fast-moving acoustic events.

By closely comparing the detection results obtained by the Phase Slope Function with those from hand labeling, we found that there is a close agreement between them. However, in areas where only noise is present in the signal, some erroneous clicks were detected. For the evaluation of the system two criteria were used: the detection rate (referred to as Det) and the corrective rate (referred to as Corr). The detection rate is defined as:

$$\mathrm{Det} = \frac{\text{Number of clicks correctly detected}}{\mathrm{Total}} \times 100$$

where Total is the total number of manually labeled clicks, and the corrective rate is defined as:

$$\mathrm{Corr} = \frac{\mathrm{Total} - \mathrm{Deleted} - \mathrm{Inserted}}{\mathrm{Total}} \times 100$$

where Deleted refers to the number of clicks that were considered to be missed (deleted) and Inserted refers to the number of extra clicks that were inserted by the proposed system. The tolerance for detection was set to 3 ms. Thus, a Correct detection means $|t_{\mathrm{manual}} - t_{\mathrm{det}}| \le 3$ ms, a Deleted click means $3\ \text{ms} < |t_{\mathrm{manual}} - t_{\mathrm{det}}| \le 20$ ms, while an Inserted click means $|t_{\mathrm{manual}} - t_{\mathrm{det}}| > 20$ ms.
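The two criteria can be sketched as below. The text does not specify how detections are paired with manual labels, so this sketch assumes each detection is compared with its nearest manual label; the function name `score_detections` and its parameters are hypothetical. Times are in seconds.

```python
import numpy as np

def score_detections(t_manual, t_det, tol=0.003, miss=0.020):
    """Return (Det, Corr) in percent.

    Assumption: each detected time is classified by its distance to the
    nearest manual label (correct: <= tol, deleted: tol < d <= miss,
    inserted: d > miss). The pairing rule is not given in the text.
    """
    t_manual = np.asarray(t_manual, dtype=float)
    t_det = np.asarray(t_det, dtype=float)
    # Distance from every detection to its nearest manual label.
    dist = np.abs(t_det[:, None] - t_manual[None, :]).min(axis=1)
    correct = int(np.sum(dist <= tol))
    deleted = int(np.sum((dist > tol) & (dist <= miss)))
    inserted = int(np.sum(dist > miss))
    total = len(t_manual)
    det = 100.0 * correct / total
    corr = 100.0 * (total - deleted - inserted) / total
    return det, corr

manual = [0.100, 0.200, 0.300, 0.400]
detected = [0.101, 0.210, 0.300, 0.450]  # correct, deleted, correct, inserted
print(score_detections(manual, detected))  # (50.0, 50.0)
```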

Regarding the mono-pulsed clicks (beaked whale regular clicks and sperm whale creak clicks), results show that the suggested detector was able to detect over 85% of the hand labeled clicks within an accuracy of about 1 ms. When the original signal is enhanced by the Teager-Kaiser (TK) energy operator [15], [16], the results are improved. In this case, about 88% (on average) of the hand labeled clicks were detected within an accuracy of less than 1 ms. Details about the results obtained for mono-pulsed clicks are provided in Table I, where MAE means Mean Absolute Error in ms between the detected click and the hand labeled click.

TABLE I. DETECTION RESULTS ON BEAKED WHALE CLICKS AND ON SPERM WHALE CREAK CLICKS. Each entry is given as original signals / enhanced by TK.

| Species | No. of clicks | Det (%) | Corr (%) | MAE (ms) |
|---|---|---|---|---|
| Beaked whales | 248 | 84.9 / 86.3 | 87.1 / 88.2 | 1.1 / 0.9 |
| Sperm whales | 146 | 87.7 / 90.4 | 84.9 / 84.3 | 1.6 / 0.97 |

For multi-pulsed clicks the detection score was higher, as expected (higher SNR), and the TK energy operator did not significantly improve the detection score (it is mainly the accuracy that improves). Results for sperm whale regular clicks are provided in Table II.

TABLE II. DETECTION RESULTS ON SPERM WHALE REGULAR CLICKS. Each entry is given as original signals / enhanced by TK.

| Species | No. of clicks | Det (%) | Corr (%) | MAE (ms) |
|---|---|---|---|---|
| Sperm whales | 452 | 95.4 / 96.1 | 94.1 / 94.0 | 1.8 / 0.85 |

V. CONCLUSIONS

In this paper we presented a novel click detector based on phase spectrum information rather than on amplitude or energy based information like most, if not all, current click detectors. A click is simply detected by searching for the positive zero crossings of the so-called Phase Slope Function. Detection results are very promising: very good accuracy and a high detection rate, while the detector is robust against additive noise. Although click enhancement tools may be used to improve the detection score, this is not considered critical for the phase based detector. Finally, we introduced the notion of center of gravity for clicks, especially as a reference point for multi-pulsed clicks like the sperm whale regular clicks. We would like to explore this idea of center of gravity further in other applications, like the estimation of the Time Difference of Arrival for localization purposes, and to extend the conducted tests to more data.

VI. ACKNOWLEDGMENTS

The authors would like to thank the Naval Undersea Warfare Center (NUWC) for making the data used in this paper publicly available through the 2nd and the 3rd International Workshops on Detection and Localization of Marine Mammals using Passive Acoustics (Monaco 2005 and Boston 2007, respectively), and the organizers of the workshops for providing access to these data.

REFERENCES

[1] M. Johnson, P. T. Madsen, W. M. X. Zimmer, N. A. de Soto, and P. L. Tyack, “Beaked whales echolocate on prey,” Proc. Royal Soc. Biology Letters, vol. 271, pp. 383–386, 2004.

[2] P. T. Madsen, M. Johnson, N. A. de Soto, W. M. X. Zimmer, and P. L. Tyack, “Biosonar performance of foraging beaked whales (Mesoplodon densirostris),” The Journal of Experimental Biology, vol. 208, pp. 181–194, 2005.

[3] H. Whitehead, Sperm whales: Social Evolution in the Ocean. University of Chicago Press, 2003.

[4] P. T. Madsen, “Sperm whale sound production - in the acoustic realm of the biggest nose on record,” Ph.D. dissertation, Dept. of Zoophysiology, University of Aarhus, Denmark, 2002.

[5] L. Douglas, Click counting: An acoustic censusing method for estimating sperm whale abundance. Master Thesis, University of Otago, Dunedin, New Zealand, 2000.

[6] J. C. D. Gordon, “The behavior and ecology of sperm whales of Sri Lanka,” Ph.D. dissertation, University of Cambridge, Cambridge, England, 1987.

[7] J. C. Goold, “Behavioural and acoustic observations of sperm whales in Scapa Flow, Orkney Islands,” J. Mar. Biol. Assoc. UK, vol. 79, pp. 541–550, 1999.

[8] V. Drouot, “Ecology of sperm whale (Physeter macrocephalus) in the Mediterranean Sea,” Ph.D. dissertation, University of Wales, Bangor, UK, 2003.

[9] P. T. Madsen, D. A. Carder, W. W. L. Au, P. E. Nachtigall, B. Møhl, and S. H. Ridgway, “Sound production in neonate sperm whales,” Journal of the Acoustical Society of America, vol. 113, no. 6, pp. 2988–2991, 2003.

[10] B. Møhl, M. Wahlberg, P. T. Madsen, L. A. Miller, and A. Surlykke, “Sperm whale clicks: Directionality and source level revisited,” Journal of the Acoustical Society of America, vol. 107, no. 1, pp. 638–648, 2000.

[11] P. J. O. Miller, M. P. Johnson, and P. L. Tyack, “Sperm whale behaviour indicates the use of echolocation click buzzes ’creaks’ in prey capture,” Proc. R. Soc. Lond., 2004.

[12] D. Gillespie, “An acoustic survey for sperm whales in the Southern Ocean sanctuary conducted from the R/V Aurora Australis,” Rep. Int. Whal. Comm., vol. 47, pp. 897–908, 1997.

[13] O. Jake, Acoustic Censusing of sperm whales at Kaikoura, New Zealand: An inexpensive method to count clicks and whales automatically. Master Thesis, University of Otago, Dunedin, New Zealand, 1996.

[14] D. K. Mellinger, Ishmael 1.0 Users Guide, NOAA, NOAA/PMEL/OERD, 2115 SE OSU Drive, Newport, OR 97365-5258, 2001, Technical Memorandum OAR PMEL-120.

[15] V. Kandia and Y. Stylianou, “Detection of creak clicks of sperm whales in low SNR conditions,” in CD Proc. IEEE Oceans, Brest, France, 2005.

[16] ——, “Detection of sperm whale clicks based on the Teager-Kaiser energy operator,” Applied Acoustics, vol. 67, no. 11-12, pp. 1144–1163, 2006.

[17] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing. Prentice Hall, 1998.

[18] A. Papoulis, Signal analysis. McGraw-Hill, 1984.

[19] Y. Stylianou, “Removing linear phase mismatches in concatenative speech synthesis,” IEEE Trans. on Speech and Audio Processing, vol. 9, no. 1, pp. 21–29, 2001.

[20] B. Møhl, M. Wahlberg, P. T. Madsen, A. Heerfordt, and A. Lund, “The monopulsed nature of sperm whale clicks,” Journal of the Acoustical Society of America, vol. 114, no. 2, pp. 1143–1154, 2003.

[21] W. M. X. Zimmer, P. T. Madsen, V. Teloni, M. P. Johnson, and P. L. Tyack, “Off-axis effects on the multipulse structure of sperm whale usual clicks with implications for sound production,” Journal of the Acoustical Society of America, vol. 118, no. 5, pp. 3337–3345, 2005.

[22] V. Kandia and Y. Stylianou, “Detection of clicks based on group delay,” Canadian Acoustics. Accepted for publication, 2008.

[23] J. F. Kaiser, “On a simple algorithm to calculate the ’Energy’ of a signal,” in Proc. IEEE ICASSP, Albuquerque, NM, USA, 1990, pp. 381–384.

[24] O. Adam, “Website of the 2nd International Workshop on Detection and Localization of Marine Mammals using Passive Acoustics,” http://www.circe-asso.org/workshop, 2008.

[25] S. Jarvis and D. Moretti, “Passive detection and localization of transient signals from marine mammals using widely spaced bottom mounted hydrophones in open ocean environments,” in Int. Workshop on the Application of Passive Acoustics in Fisheries, MIT Sea Grant, Massachusetts, USA, 2002.