7/30/2019 051013 Reich Report-ocr
May 9, 2013
Richard Mantei
Assistant State Attorney
220 East Bay Street
Jacksonville, FL 32202
Dear Mr. Mantei:
May this letter serve as a partial summary of my ongoing aural and digital acoustical examination of two
911 recordings re: State of Florida v. George Zimmerman. The supplied recordings were represented
as unredacted digital copies of original digital audio recordings. You requested that I process and analyze two 911 Dispatch recordings, hereafter referred to as CALL1 and CALL3. Immediately after
receiving them, I archived the zip-extracted files on magnetic and laser media. In addition, several other
digital recordings were supplied as possible sources of voice exemplars for George Zimmerman and
Trayvon Martin. They are described briefly in a subsequent section of this summary.
Technical Considerations Regarding the 911 Recordings
The moderate-fidelity 911 recordings presumably were the stereo output of a 24-hour, digital-audio
recording system. The sampling rate of the 911 recordings was only 8,000 samples/sec, compared
to the 44,100 samples/sec associated with audio CD quality. The frequency bandwidth of CALL1
and CALL3 thus was estimated to be only 40 Hz to 4,000 Hz compared to an audio CD bandwidth of
10 Hz to 22,050 Hz. However, this high-frequency insensitivity is not particularly troublesome in the
present investigative context, since telephone systems are designed to be relatively unresponsive to
frequencies above 3,500 Hz.
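The upper bandwidth limits quoted above are consistent with the Nyquist relation, under which a sampled signal can represent frequencies only up to half its sampling rate. A minimal sketch in Python, with the function name `nyquist_hz` chosen here purely for illustration (the lower limits of 40 Hz and 10 Hz depend on the recording equipment and are not derived from the sampling rate):

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Return the highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2.0


# Sampling rates taken from the text above.
call_upper_limit_hz = nyquist_hz(8_000)   # 911 recordings
cd_upper_limit_hz = nyquist_hz(44_100)    # audio CD

print(call_upper_limit_hz)  # 4000.0
print(cd_upper_limit_hz)    # 22050.0
```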
Audio CD and 911 data-logging recordings both have 16-bit amplitude resolution, which divides the
vertical amplitude scale of the digital signal into 2^16 = 65,536 amplitude gradations. Although 8-bit
amplitude resolution is attractive for situations requiring small data files, its vertical scale has only 2^8
= 256 amplitude gradations. The 911 Dispatch System's 16-bit resolution was critical to the success
of this investigation, in which the recorded signals had a very wide dynamic range (from very distant
speech to softly whispered speech to a single loud gunshot to several heart-poundingly loud screams).
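The gradation counts above can be reproduced directly, and they also imply a theoretical dynamic range of roughly 6 dB per bit. The sketch below is illustrative only: the function names are hypothetical, and the decibel figure is the idealized ratio between the largest and smallest representable amplitude steps, an assumption that ignores system noise and is not a measurement from the recordings.

```python
import math


def amplitude_levels(bits: int) -> int:
    """Number of distinct amplitude gradations at a given bit depth."""
    return 2 ** bits


def theoretical_dynamic_range_db(bits: int) -> float:
    """Idealized dynamic range in dB: 20 * log10(number of levels)."""
    return 20.0 * math.log10(amplitude_levels(bits))


for bits in (8, 16):
    levels = amplitude_levels(bits)
    range_db = theoretical_dynamic_range_db(bits)
    print(f"{bits}-bit: {levels} levels, about {range_db:.1f} dB")
```

Under these assumptions, 8-bit resolution yields about 48 dB and 16-bit resolution about 96 dB, which illustrates why the higher resolution matters when very soft and very loud events share one recording.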
General Structure and Scope of the Present Investigation
In this summary, I will try to: (a) answer some general questions regarding the nature, usefulness, and
scope of the materials on the CALL1 and CALL3 recordings, (b) provide some illustrative examples
of the approach that I took to analyze selected words and phrases, (c) discuss the complexities and
obstacles that one encounters when trying to decode highly distorted, emotionally driven, overlapping
speech, and (d) provide an analytic framework for arriving at trustworthy and perceptually stable
transcriptions and demo recordings of the most difficult-to-understand speech on the CALL1 and
CALL3 wave files.
Nature, Usefulness, and Scope of the Materials on the CALL1 and CALL3 Wave Files
CALL1 represents the digital audio record of George Zimmerman's 911 call to report his seeing a
young male whom he thought was acting suspiciously. The two speakers are Mr. Zimmerman and
a male 911 Dispatcher. The fidelity of CALL1 is reasonably good but the recording has a number of
puzzling acoustic anomalies. There are numerous instances of "nonconforming speech" on CALL1,
e.g., whispered speech, pitch breaks, garbled or unintelligible speech, vocal impressions, tremulous speech, and very rough voice quality. The observed behaviors were outside the customary speech
modes of both the dispatcher and Mr. Zimmerman.
These nonconforming segments indicate that Mr. Zimmerman frequently shifts or switches voice
modes or speaking styles. His first utterance on CALL1 is a whispered, "D'ya think I'm crazy here?" At
12 seconds from the beginning of CALL1, he says "or... um... the best... address I can give you is
one-eleven Retreat View Circle." During the four-second utterance, he shifts from whispered voice to
customary voice to detective impression back to customary voice. At 97 seconds, the voiced but
tremulous "These assholes, they always get away." is preceded by a whispered "Dear God" and followed
by a whispered "but not on me."
Mr. Zimmerman's speech patterns periodically show measurable effects of psychological stress (e.g.,
vocal tremor, pitch breaks, rapid speech). This latter finding is not to be construed necessarily as
negative since perpetrator pursuits by enforcement officers typically are accompanied by increased
levels of adrenaline and excitatory neurochemicals. In any case, Mr. Zimmerman's vocal-mode
switching behaviors need to be examined in greater detail and correlated with relevant physical and
behavioral events on both recordings.
CALL3 principally represents the digital audio record of an unidentified woman caller, a female 911
Dispatcher, and two males involved in a very loud but somewhat distant confrontation just outside the
woman caller's home. One of the male speakers appears to be George Zimmerman, whose idiosyncratic
"voice-mode switching" behaviors, vocal impressions, whispering, and tremulous voice are present on
both CALL1 and CALL3.
For example, approximately one second after the start of CALL3, Mr. Zimmerman makes a seemingly
religious proclamation, "These shall be." His speech is characterized by the low pitch and exaggerated
pitch contour reminiscent of an evangelical preacher or carnival barker. The statement is challenging
for the untrained listener to detect as it occurs simultaneously with Trayvon Martin's loud, high-pitched,
distressed, and tremulous "I'm begging you." and the 911 Dispatcher's "Nine-one-one." Many of Mr.
Zimmerman's "side-bar" utterances are subject to such multiple-talker masking effects and to low
signal levels.
The other male speaker was identified tentatively as Trayvon Martin from the audio track of a digital
video file present on Mr. Martin's cell phone. His voice is younger and he generates much of what some
observers have called screams. If a scream is defined in operational terms as speech with a very high
pitch and loudness level, then my findings would support that conclusion. The two males are engaged
in a loud, purposeful, mostly "turn-taking" linguistic dialogue. The speech associated with the
confrontation is often quite difficult to understand, but is amenable to individualized digital
enhancement and computer-aided transcription, using an interactive, segment-by-segment approach.
Example of the Analytic and Scientific Approach
It is often helpful in scientific investigations to begin at the end and work backwards, slogging through
the inevitably complex details to arrive at a more complete understanding of multifaceted physical or
behavioral events. Thus, my investigation began by addressing questions about the last "scream," the
very high-pitched, very loud production of a single monosyllabic word on the CALL3 wave file.
Speech and Hearing Scientists often characterize speech as a "series of rapid, complex, overlapping
movements that have been made audible." The final "cry" on the CALL3 recording is the result of
very high-effort speech movements, but, regrettably, the large distance between the highly distressed
talker and the microphone of the 911 caller's phone markedly attenuates or reduces the speech's amplitude.
Consequently, the resulting sound pressure level of the final male pre-gunshot utterance is 30.4
decibels (dB) below the Woman Caller's "Yes." When the amplitude level of the final word before
the shot was digitally gained or amplified by a factor of ten, the word appears to be "stop," not "help,"
as previously perceived by some listeners. Perceptually, the two monosyllabic words are quite similar
and easily confused, especially within the context of a high-effort production.
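The decibel figures in this paragraph can be related to linear amplitude ratios with the standard conversion of 20 times the base-10 logarithm of the ratio. The sketch below is illustrative only and rests on two assumptions not stated in the text: that "a factor of ten" refers to amplitude rather than power, and that the 30.4 dB difference is a level difference between the two utterances. The function and variable names are hypothetical.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio to a level difference in dB."""
    return 20.0 * math.log10(ratio)


# Level of the final pre-gunshot word relative to the caller's "Yes" (from the text).
level_difference_db = -30.4

# Assumption: the tenfold gain applies to amplitude, which equals +20 dB.
gain_db = amplitude_ratio_to_db(10.0)

# Linear amplitude of the word relative to the caller's "Yes" before any gain.
relative_amplitude = db_to_amplitude_ratio(level_difference_db)

# Relative level after the tenfold gain is applied.
level_after_gain_db = level_difference_db + gain_db

print(f"gain: {gain_db:.1f} dB")
print(f"relative amplitude before gain: {relative_amplitude:.4f}")
print(f"relative level after gain: {level_after_gain_db:.1f} dB")
```

Under these assumptions, the unamplified word has about 3% of the amplitude of the caller's "Yes," and a tenfold amplitude gain still leaves it about 10 dB lower.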
Nonetheless, digital spectrographic examination of the word's component frequencies supports a
"stop" transcription. On CALL3, the first Formant or Resonant Frequency of the /ɑ/ vowel in /stɑp/
is 870 Hz, about 10% above the adult male average. This value is highly appropriate for a 17-year-old male who likely still had 10% more growth remaining before reaching his adult-male vocal-tract
length, diameter, and tonicity. The resonant frequency position (largely related to oral, nasal, and
pharyngeal anatomy), the fundamental frequency location (a physical measure of pitch related
principally to laryngeal anatomy), and glottal source spectrum (voice quality resulting from the
complex, rapid vocal-fold valving of exhaled lung air) suggest that the speaker had not completed his
hormonally-driven, anatomical and physiological transition into adult-male voice production. In
addition, the acoustic voice data are consistent with audio/video samples extracted from Mr. Martin's
cell-phone. They are inconsistent with audio/video samples from Mr. Zimmerman's crime-simulation
video recording and from an audio recording of a telephone conversation with his wife during his
incarceration.
Taken together, the above scientific observations of the recorded pre-gunshot word allowed me to conclude tentatively that the word was produced by the younger of the two male speakers, Trayvon
Martin. The scientific data may also explain why some witnesses have characterized the final utterance
as a "boy crying." Of course, the fact that the speaker of the final word was rendered silent by the
weapon's discharge and George Zimmerman was not, also suggests the identity of the "boy" who was
crying.
To illustrate my analytic approach to these acoustic data, I am attaching air pressure-versus-time
waveforms and corresponding frequency-versus-time spectrograms (KAYPentax Multi-Speech) of
the interval that includes and closely surrounds the word "stop." These acoustical plots and a
corresponding wave file comprise the raw speech interval, followed by the fully processed and
enhanced version. The word "stop" on the raw interval is very soft on the wave demo, very low in
amplitude on the time waveform, and lacking complexity on the spectrogram.
Feasibility of Using Global Enhancement Strategies on CALL1 and CALL3
To explore the feasibility of finding a less-time-consuming approach to analyzing CALL1 and CALL3,
numerous global digital-enhancement algorithms (SONY Sound Forge Pro) were applied to the
Microsoft Windows WAV files, with varying degrees of success. Global enhancement strategies are
designed to improve the overall fidelity of a noisy, distorted, and/or unbalanced recording. In the
present investigation, the enhanced signals often were rendered somewhat less noisy but the speech
intelligibility was compromised or unchanged rather than improved.
Thank you for allowing me to consult on this interesting case. If you have questions or need further
information, please feel free to call or write.
DECLARATION
I declare under the penalty of perjury under the laws of the State of New Jersey that the foregoing is
true and correct. Dated at Oakland, New Jersey on May 9, 2013.
Alan R. Reich, Ph.D.
Forensic Acoustics Consultant