051013 Reich Report-ocr

  • 7/30/2019 051013 Reich Report-ocr

    May 9, 2013

    Richard Mantei

    Assistant State Attorney

    220 East Bay Street

    Jacksonville, FL 32202

    Dear Mr. Mantei:

    May this letter serve as a partial summary of my ongoing aural and digital acoustical examination of two

    911 recordings re: State of Florida v. George Zimmerman. The supplied recordings were represented

    as unredacted digital copies of original digital audio recordings. You requested that I process and

    analyze two 911 Dispatch recordings, hereafter referred to as CALL1 and CALL3. Immediately after

    receiving them, I archived the zip-extracted files on magnetic and laser media. In addition, several other

    digital recordings were supplied as possible sources of voice exemplars for George Zimmerman and

    Trayvon Martin. They are described briefly in a subsequent section of this summary.

    Technical Considerations Regarding the 911 Recordings

    The moderate-fidelity 911 recordings presumably were the stereo output of a 24-hour, digital-audio

    recording system. The sampling rate of the 911 recordings was only 8,000 samples/sec, compared

    to the 44,100 samples/sec associated with audio CD quality. The frequency bandwidth of CALL1

    and CALL3 thus was estimated to be only 40 Hz to 4,000 Hz compared to an audio CD bandwidth of

    10 Hz to 22,050 Hz. However, this high-frequency insensitivity is not particularly troublesome in the

    present investigative context, since telephone systems are designed to be relatively unresponsive to

    frequencies above 3,500 Hz.
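    The upper bandwidth limits quoted above match the Nyquist limit, which places the highest representable frequency at half the sampling rate. The following is a minimal Python sketch of that arithmetic; the function name `nyquist_hz` is hypothetical, and the sketch says nothing about the low-frequency limits (40 Hz and 10 Hz), which depend on the recording equipment rather than on the sampling rate.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Return the highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2.0


# Sampling rates taken from the text; names are illustrative only.
CALL_SAMPLE_RATE_HZ = 8_000    # 911 recordings
CD_SAMPLE_RATE_HZ = 44_100     # audio CD

print(nyquist_hz(CALL_SAMPLE_RATE_HZ))  # 4000.0
print(nyquist_hz(CD_SAMPLE_RATE_HZ))    # 22050.0
```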

    Audio CD and 911 data-logging recordings both have 16-bit amplitude resolution, which divides the

    vertical amplitude scale of the digital signal into 2^16 = 65,536 amplitude gradations. Although 8-bit

    amplitude resolution is attractive for situations requiring small data files, its vertical scale has only 2^8

    = 256 amplitude gradations. The 911 Dispatch System's 16-bit resolution was critical to the success

    of this investigation, in which the recorded signals had a very wide dynamic range (from very distant

    speech to softly whispered speech to a single loud gunshot to several heart-poundingly loud screams).
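    The gradation counts above, and the reason bit depth matters for a wide dynamic range, can be checked with a short Python sketch. The dynamic-range figure uses the common 20·log10(2^bits) approximation (about 6 dB per bit), which is an assumption added here for illustration and is not a measurement from the recordings; the function names are hypothetical.

```python
import math


def amplitude_levels(bits: int) -> int:
    """Number of distinct amplitude gradations at a given bit depth."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Approximate theoretical dynamic range: 20 * log10(2 ** bits)."""
    return 20.0 * math.log10(2 ** bits)


for bits in (8, 16):
    print(bits, amplitude_levels(bits), round(dynamic_range_db(bits), 2))
# 8 256 48.16
# 16 65536 96.33
```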

    General Structure and Scope of the Present Investigation

    In this summary, I will try to: (a) answer some general questions regarding the nature, usefulness, and

    scope of the materials on the CALL1 and CALL3 recordings, (b) provide some illustrative examples

    of the approach that I took to analyze selected words and phrases, (c) discuss the complexities and

    obstacles that one encounters when trying to decode highly distorted, emotionally driven, overlapping

    speech, and (d) provide an analytic framework for arriving at trustworthy and perceptually stable

    transcriptions and demo recordings of the most difficult-to-understand speech on the CALL1 and

    CALL3 wave files.

    Nature, Usefulness, and Scope of the Materials on the CALL1 and CALL3 Wave Files

    CALL1 represents the digital audio record of George Zimmerman's 911 call to report his seeing a

    young male whom he thought was acting suspiciously. The two speakers are Mr. Zimmerman and

    a male 911 Dispatcher. The fidelity of CALL1 is reasonably good but the recording has a number of

    puzzling acoustic anomalies. There are numerous instances of "nonconforming speech" on CALL1,

    e.g., whispered speech, pitch breaks, garbled or unintelligible speech, vocal impressions, tremulous

    speech, and very rough voice quality. The observed behaviors were outside the customary speech

    modes of both the dispatcher and Mr. Zimmerman.

    These nonconforming segments indicate that Mr. Zimmerman frequently shifts or switches voice

    modes or speaking styles. His first utterance on CALL1 is a whispered, "D'ya think I'm crazy here?" At

    12 seconds from the beginning of CALL1, he says "or... um... the best... address I can give you is one-

    eleven Retreat View Circle." During the four-second utterance, he shifts from whispered voice to

    customary voice to detective impression back to customary voice. At 97 seconds, the voiced but

    tremulous "These assholes, they always get away." is preceded by a whispered "Dear God" and followed

    by a whispered "but not on me."

    Mr. Zimmerman's speech patterns periodically show measurable effects of psychological stress (e.g.,

    vocal tremor, pitch breaks, rapid speech). This latter finding is not to be construed necessarily as

    negative since perpetrator pursuits by enforcement officers typically are accompanied by increased

    levels of adrenaline and excitatory neurochemicals. In any case, Mr. Zimmerman's vocal-mode

    switching behaviors need to be examined in greater detail and correlated with relevant physical and

    behavioral events on both recordings.

    CALL3 principally represents the digital audio record of an unidentified woman caller, a female 911

    Dispatcher, and two males involved in a very loud but somewhat distant confrontation just outside the

    woman caller's home. One of the male speakers appears to be George Zimmerman, whose idiosyncratic

    "voice-mode switching" behaviors, vocal impressions, whispering, and tremulous voice are present on

    both CALL1 and CALL3.

    For example, approximately one second after the start of CALL3, Mr. Zimmerman makes a seemingly

    religious proclamation, "These shall be." His speech is characterized by the low pitch and exaggerated

    pitch contour reminiscent of an evangelical preacher or carnival barker. The statement is challenging

    for the untrained listener to detect as it occurs simultaneously with Trayvon Martin's loud, high-pitched,

    distressed, and tremulous "I'm begging you." and the 911 Dispatcher's "Nine-one-one." Many of Mr.

    Zimmerman's "side-bar" utterances are subject to such multiple-talker masking effects and to low

    signal levels.

    The other male speaker was identified tentatively as Trayvon Martin from the audio track of a digital

    video file present on Mr. Martin's cell phone. His voice is younger and he generates much of what some

    observers have called screams. If a scream is defined in operational terms as speech with a very high

    pitch and loudness level, then my findings would support that conclusion. The two males are engaged

    in a loud, purposeful, mostly "turn-taking" linguistic dialogue. The speech associated with the

    confrontation often is quite difficult to understand, but is amenable to individualized digital

    enhancement and computer-aided transcription, using an interactive, segment-by-segment approach.

    Example of the Analytic and Scientific Approach

    It is often helpful in scientific investigations to begin at the end and work backwards, slogging through

    the inevitably complex details to arrive at a more complete understanding of multifaceted physical or

    behavioral events. Thus, my investigation began by addressing questions about the last "scream," the

    very high-pitched, very loud production of a single monosyllabic word on the CALL3 wave file.

    Speech and Hearing Scientists often characterize speech as a "series of rapid, complex, overlapping

    movements that have been made audible." The final "cry" on the CALL3 recording is the result of

    very high-effort speech movements, but, regrettably, the large distance between the highly distressed

    talker and the microphone of the 911 caller's phone markedly attenuates or reduces the speech's amplitude.

    Consequently, the resulting sound pressure level of the final male pre-gunshot utterance is 30.4

    decibels (dB) below the woman caller's "Yes." When the amplitude level of the final word before

    the shot was digitally gained or amplified by a factor of ten, the word appears to be "stop," not "help,"

    as previously perceived by some listeners. Perceptually, the two monosyllabic words are quite similar

    and easily confused, especially within the context of a high-effort production.
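    The level figures in the preceding paragraph can be related with standard decibel arithmetic. The Python sketch below assumes that "a factor of ten" refers to amplitude (not power) and that the 30.4 dB difference can be treated as an amplitude-level difference; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio to a level difference in dB."""
    return 20.0 * math.log10(ratio)


level_diff_db = -30.4                       # final word relative to the caller's "Yes"
gain_db = amplitude_ratio_to_db(10.0)       # a tenfold amplitude gain in dB
level_after_gain_db = level_diff_db + gain_db

print(round(db_to_amplitude_ratio(level_diff_db), 4))  # 0.0302
print(round(gain_db, 1))                                # 20.0
print(round(level_after_gain_db, 1))                    # -10.4
```

    Under these assumptions, a tenfold amplitude gain adds 20 dB, so the amplified word would still sit roughly 10 dB below the caller's "Yes."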

    Nonetheless, digital spectrographic examination of the word's component frequencies supports a

    "stop" transcription. On CALL3, the first Formant or Resonant Frequency of the /ɑ/ vowel in /stɑp/

    is 870 Hz, about 10% above the adult male average. This value is highly appropriate for a 17-year-old

    male who likely still had 10% more growth remaining before reaching his adult-male vocal-tract

    length, diameter, and tonicity. The resonant frequency position (largely related to oral, nasal, and

    pharyngeal anatomy), the fundamental frequency location (a physical measure of pitch related

    principally to laryngeal anatomy), and glottal source spectrum (voice quality resulting from the

    complex, rapid vocal-fold valving of exhaled lung air) suggest that the speaker had not completed his

    hormonally-driven, anatomical and physiological transition into adult-male voice production. In

    addition, the acoustic voice data are consistent with audio/video samples extracted from Mr. Martin's

    cell-phone. They are inconsistent with audio/video samples from Mr. Zimmerman's crime-simulation

    video recording and from an audio recording of a telephone conversation with his wife during his

    incarceration.

    Taken together, the above scientific observations of the recorded pre-gunshot word allowed me to

    conclude tentatively that the word was produced by the younger of the two male speakers, Trayvon

    Martin. The scientific data may also explain why some witnesses have characterized the final utterance

    as a "boy crying." Of course, the fact that the speaker of the final word was rendered silent by the

    weapon's discharge and George Zimmerman was not, also suggests the identity of the "boy" who was

    crying.

    To illustrate my analytic approach to these acoustic data, I am attaching air pressure-versus-time

    waveforms and corresponding frequency-versus-time spectrograms (KAYPentax Multi-Speech) of

    the interval that includes and closely surrounds the word "stop." These acoustical plots and a

    corresponding wave file comprise the raw speech interval, followed by the fully processed and

    enhanced version. The word "stop" on the raw interval is very soft on the wave demo, very low in

    amplitude on the time waveform, and lacking complexity on the spectrogram.

    Feasibility of Using Global Enhancement Strategies on CALL1 and CALL3

    To explore the feasibility of finding a less-time-consuming approach to analyzing CALL1 and CALL3,

    numerous global digital-enhancement algorithms (SONY Sound Forge Pro) were applied to the

    Microsoft Windows WAV files, with varying degrees of success. Global enhancement strategies are

    designed to improve the overall fidelity of a noisy, distorted, and/or unbalanced recording. In the

    present investigation, the enhanced signals often were rendered somewhat less noisy but the speech

    intelligibility was compromised or unchanged rather than improved.

    Thank you for allowing me to consult on this interesting case. If you have questions or need further

    information, please feel free to call or write.

    DECLARATION

    I declare under the penalty of perjury under the laws of the State of New Jersey that the foregoing is

    true and correct. Dated at Oakland, New Jersey on May 9, 2013.

    Alan R. Reich, Ph.D.

    Forensic Acoustics Consultant