Richard Swales – Edit Point ADR Sync
An Investigation Into The Importance and Necessity of Automated Dialogue Replacement Synchronisation Across Edit
Points in Film.
By
Richard Swales
U1157958
Project 1: Literature Review and Organisation
A Report
Submitted in Requirement for
The Degree of BSc (Hons) Popular Music Production
University of Huddersfield
1st Supervisor Mr. Braham Hughes
2nd Supervisor Mr. Austin Moore
05/12/14
NHE 2440
Abstract:
Automated dialogue replacement (ADR) is a time-consuming process in which one of the main goals, after capturing an
actor’s performance, is reproducing good audio-to-video synchronisation. There are technologies that aid this sync
reproduction, and many published tolerance values indicate how far out of sync the audio is allowed to be before it
becomes unacceptable. A camera edit point, or cut point, is a point at which the published synchronisation tolerances
may not apply, because of the abrupt change to a new scene or camera angle, and so the tolerance values may have to be
adjusted accordingly. This project presents current synchronisation tolerances as published by leading broadcast
institutions and looks into how synchronisation errors are created and dealt with in the film and television
broadcasting industries. This research leads into a number of tests, introduced in this report, to find out whether the
current tolerances apply to edit points and to present new tolerances for edit point ADR synchronisation.
Contents
Chapter 1: An Introduction to the Project
1-1: Automated Dialogue Replacement
1-2: A Brief Introduction to Synchronisation Tolerances
1-3: Structural Overview of the Project
Chapter 2: Synchronisation Tolerances and Incorporating them into Film and Television
2-1: Sync/Delay Problems in the Broadcast Chain
2-2: Current Sync Tolerance and Acceptance Rates
2-3: Tolerances at Work in the Film and Television Industries
2-4: Working Outside the Synchronisation Limits
2-5: Tools to Aid ADR Synchronisation
2-6: Dialogue Specific Research
Chapter 3: Synchronisation Detection Tests
3-1: Methods for Detection Tests
3-2: Testing Stimuli
3-3: Test 1
3-4: Test 2
3-5: Test 3
Appendices
Reference List
Chapter 1 – An Introduction to the Project
1-1 Automated Dialogue Replacement.
The process of automated dialogue replacement, or ADR (which goes by several names; see Appendix 1), can easily introduce
synchronisation errors in postproduction. Reasons for recording ADR are numerous, such as technical problems,
perspective and voice quality, acting problems and line changes (Purcell, 2007, p.278), but usually it is needed when
the on-set sound is deemed unusable because background noise masks the dialogue line. On-set sound is always preferred
to ADR as it captures the actor’s emotions and performance on set, which can be difficult to reproduce in an ADR studio,
and ADR can “kill the charm” (Purcell, 2007, p.278) of an actor’s performance. The original sound does not require sync
editing to the extent that ADR does, as sync points, such as clapperboards, are used to ensure the audio is in sync with
the video. ADR is also used to capture narration and non-dialogue sounds, such as heavy breathing, but these do not
require sync editing and so are disregarded in this project. Reproducing the sync is one of ADR’s main goals after
capturing the actor’s performance. This sync is hard to recapture, and often the re-recorded dialogue has to be edited
to match the originally filmed mouth movements. This is a time-consuming process which could be re-evaluated by looking
at sync tolerances and at the point at which viewers can spot sync errors. Sometimes sync error is purposely introduced
to a film to accentuate styles of language. For instance, Michel Chion (1994, p.65) explains that in French cinema there
is a “tight and narrow synchronization”, whereas Italian film sync is “off by a tenth of a second or so” to take “into
consideration the totality of the speaking body” (Chion, 1994, p.65). Generally sync is tight to the movements, creating
a more believable scene for the audience, and this is key as “the moment the sound and picture appear out of sync, the
audience is immediately taken out of the fantasy” (Viers, 2012, p.159).
1-2 A Brief Introduction to Synchronisation Tolerances.
“Some sound editors claim that they can spot sync errors at half a frame” (Linear Acoustics, 2004), and some at even less
than that, but generally there are wider tolerances, tested by leading authorities, which give a window of acceptance.
The number of bodies that have researched sync tolerances and acceptance levels gives an idea of the scale of the
problem caused by sync errors. Much of this research has been conducted in television and broadcasting, which shows that
the problem extends across the film and television industries. The research in television and broadcasting is still
relevant here, as human sync perception does not change across formats and there are still limits set that are adhered
to in the film industry. This project aims to look at these sync tolerances around edit points in films, as this is an
area where it is possible that time may be saved by not having to edit sync closely. Some sync errors across edit points
are unavoidable because editors in the cutting room do not always pay close attention to the action across a cut. This
means that there are sometimes double movements, which sound editors have to smooth over or else face some questionable
sync.
1-3 Structural Overview of the Project.
Chapter 2 presents the current sync tolerances given by a number of leading broadcast institutions, outlining which ones
are regarded as the standard and which ones are referred to in the film and television industries.
Chapter 3 outlines the tests and experiments to be carried out to determine whether these synchronisation tolerances are
wider or narrower around edit points in film. The results will be presented in the research report.
Chapter 2 – Synchronisation Tolerances and Incorporating Them into Film and Television
2-1 Sync/Delay Problems in the Broadcast Chain.
As previously mentioned, a number of sync tolerances have been proposed by authorities and leading bodies associated
with television broadcasting. These bodies aim their research at synchronisation errors caused in the broadcast chain,
rather than errors induced in the postproduction process. The broadcast chain involves a number of steps to get
programmes from the TV studio out to television sets for audiences to view. These steps are as follows:
Camera
Studio/Outside Broadcast
Codec
Compilation Station
Codec
Local Station
Emission Codec
Local Transmitter
Television Set
Audience
(Waddell, Jones, Goldberg, undated)
Audio and video also incur latency when they are processed in editing software. When audio is processed using tools
such as EQ and compressors, the signal is barely delayed, and latency usually stays “under 1ms, in the digital domain,
falling to microseconds in the analog domain” and so “no compensating video delay needs to be added” (Linear Acoustics,
2004). Video, on the other hand, takes more time to process and “delay is inevitable” (Linear Acoustics, 2004). This
means that delay compensation has to be incorporated in video devices to keep the sync error under control. “Each
digital audio and video component in the chain from production to reception imposes some degree of latency on the
signals passing through it” (Advanced Television Systems Committee [ATSC], 2003). At each of these steps in the chain
the delay between audio and video must be attended to, to ensure that the overall timing error is kept to a minimum.
The limits on this error are explored in this project.
2-2 Current Sync Tolerances and Acceptance Rates.
The European Broadcasting Union (EBU, 2007) carried out “subjective tests of the relative delays at which failure of the
synchronism between lip movements and speech becomes perceptible to 50% of observers”. The results of these tests
showed a tolerance of +40ms to -60ms, where the positive figure is audio before the visual and the negative figure is
audio after the visual. They go on to say that “the accuracy of A/V synchronization at each stage should lie within
the range of audio 5ms early to 15ms late” (EBU, 2007), but this applies to individual stages and not to end-of-chain
audience detection. The Advanced Television Systems Committee has produced figures based on the “end to end DTV
audio-video production, distribution and broadcast system” (ATSC, 2003). They state that “the sound program should never
lead the video program by more than 15 ms, and should never lag the video program by more than 45ms” (ATSC, 2003), which
is +15ms to -45ms in the sign convention used above. However, these figures are aimed at digital television (DTV)
broadcasting, and ATSC dismiss the results given by the International Telecommunication Union (ITU, 1998), which are
regarded as the standard figures for synchronisation detection and acceptability.
These figures produced by the International Telecommunication Union (ITU, 1998) (see Figure 1) are widely referenced in
papers on synchronisation detection; however, little is known about the tests that produced them. The publication
explains that the research consisted of “subjective evaluation undertaken in Japan, Switzerland and Australia” (ITU,
1998), but this is the only information given about the tests. The results of these subjective tests show a
detectability threshold of “+45ms to -125ms” and an acceptability threshold of “+90ms to -185ms on the average” (ITU,
1998).
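The published windows quoted in this section can be collected into a small lookup for comparison. The following is a minimal Python sketch rather than part of any standard: the class and function names are hypothetical, and the sign convention is the one used in this report (positive for audio before the visual, negative for audio after it).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncWindow:
    """A published tolerance window in milliseconds.

    Sign convention (as in this report): positive = audio before the
    visual, negative = audio after the visual.
    """
    name: str
    max_lead_ms: float  # largest tolerated audio-early offset (positive)
    max_lag_ms: float   # largest tolerated audio-late offset (negative)

    def contains(self, offset_ms: float) -> bool:
        return self.max_lag_ms <= offset_ms <= self.max_lead_ms


# Figures as quoted in section 2-2 of this report.
WINDOWS = [
    SyncWindow("EBU (2007), 50% perceptibility", 40, -60),
    SyncWindow("ATSC (2003)", 15, -45),
    SyncWindow("ITU (1998) detectability", 45, -125),
    SyncWindow("ITU (1998) acceptability", 90, -185),
]


def windows_containing(offset_ms: float) -> list[str]:
    """Return the names of all windows that tolerate the given offset."""
    return [w.name for w in WINDOWS if w.contains(offset_ms)]


if __name__ == "__main__":
    for offset in (0, 50, -100):
        print(offset, windows_containing(offset))
```

For example, an audio lag of 100ms falls outside the EBU and ATSC windows but inside both ITU thresholds, which illustrates how far apart the published figures are.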
The figure of -185ms has been reproduced in a separate set of tests published by the Institute of Electrical and
Electronics Engineers (IEEE). Younkin and Corriveau (2008) published a figure of “-185.19ms” from research aimed at
lip-sync detection. This casts doubt on the tolerances given by ATSC, which are much tighter, but the authors go on to
say that “a direct comparison does not lend to include differences in methodology, processing, or specific conditions”
(Younkin and Corriveau, 2008). This also counters ATSC’s dismissal of ITU’s results, as the two sets of tests cannot be
compared directly. The IEEE research clearly states that “The goal is to establish the detection threshold of lip-sync
errors” (Younkin and Corriveau, 2008), which separates it from the ATSC findings and makes it appear a clearer, more
trustworthy tolerance. Another reason that the figures reported by Younkin and Corriveau (2008) are trusted is that they
match the figures presented by ITU in 1998, being within 0.19ms of the negative value. Throughout all of these published
results the tolerance for audio before visual (the positive value) is always smaller in magnitude than the tolerance
for audio after visual (the negative value), and so the window is asymmetric. This is a “consequence of human
acclimation to the laws of physics, which set the speed of light and sound to be widely different” (Cugnini, 2010). Put
simply, “we are accustomed to sound arriving a bit late. It goes against nature to hear something before we see it.”
(Purcell, 2007). Light travels at approximately 299,792,458 metres per second and sound at approximately 340.29 metres
per second, which is why when we watch fireworks we often see the explosion before we hear it. Considering this
difference, Mason and Salmon (2009) note that “sound caused by an event will always reach an observer later than light
from that event” and go on to say that “correct synchronisation is achieved by presenting the sound later than the
image”, but this depends on a number of factors, such as acquisition equipment, frame rate and camera types (Mason and
Salmon, 2009).
Figure 1: Detectability and Acceptability Thresholds. (International Telecommunication Union [ITU], 1998)
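To give a sense of scale for this natural delay, the speeds of light and sound quoted above can be turned into a lag per viewing distance. This is a rough Python sketch under simple assumptions (sound in still air at the quoted speed, no processing delay); the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 340.29        # value quoted in the text
SPEED_OF_LIGHT_M_S = 299_792_458   # value quoted in the text


def natural_audio_lag_ms(distance_m: float) -> float:
    """Time by which sound trails light from the same event, in milliseconds."""
    sound_time_s = distance_m / SPEED_OF_SOUND_M_S
    light_time_s = distance_m / SPEED_OF_LIGHT_M_S
    return (sound_time_s - light_time_s) * 1000.0


if __name__ == "__main__":
    for distance in (1, 10, 50):
        print(f"{distance} m: {natural_audio_lag_ms(distance):.2f} ms")
```

Under these assumptions sound trails the image by roughly 29ms at a distance of 10 metres, so a small audio lag corresponds to something viewers experience every day, whereas audio leading the image never occurs naturally.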
2-3 Tolerances at Work in the Film and Television Industries
In response to the ITU research figures, Linear Acoustics, makers of professional broadcast equipment used by networks
such as NBC, claim that “this range is probably far too wide for truly acceptable performance, and tighter tolerances
are generally obeyed” (Linear Acoustics, 2004), although they do not say why the range is too wide, and this claim has
not been broadly considered in research on synchronisation tolerances. The claim, however, may hold some truth. The BBC
released a technical recommendation containing standards agreed to by the BBC, BSkyB, BT Sport, Channel 4, Channel 5,
ITV and S4C (British Broadcasting Corporation [BBC], 2014). This contains technical specifications, picture and sound
quality requirements, and delivery requirements deemed acceptable by the broadcasters listed. Page 21 of this technical
specification reads, “the relative timing of sound to vision should not exhibit any perceptible error. Sound must not
lead or lag the vision by more than 5ms.” (BBC, 2014), which is minuscule in comparison to the figures presented by the
ITU. However, if we compare this figure with the “range of audio 5ms early to 15ms late” (EBU, 2007), a value given for
each stage of the broadcast chain, then the figures do match up to a degree. Sound and sync editing is one of the first
stages in the broadcast chain, and so the ±5ms limit fits in here.
2-4 Working Outside the Synchronisation Limits
There are some situations where it is hard to work within these very tight specifications. Tom Hobbs, Delivery Assistant
at Films at 59 in Bristol, explained that when films are delivered they have to include “a record report that accompanies
the final master to the broadcasters. Any noticeable cheated sync (dialogue used from other takes etc.) or ADR will be
added to this report with relevant timecodes to let them know we are aware of it and that it cannot be improved.”
(Hobbs, personal communication, November 3rd, 2014). Keeping to this industry requirement is therefore sometimes hard to
do. Vanessa Theme Ament, author of The Foley Grail, says, “the editor will only be concerned if the sync is noticeably
early or late” (Ament, 2009, p.136), which gives a sense of ‘if it looks OK then it is OK’.
2-5 Tools to Aid ADR Synchronisation
Tom Hobbs, from Films at 59, also spoke about a software plugin called VocAlign by Synchro Arts. VocAlign aids editors in
syncing ADR recordings to the original dialogue captured on set. It takes the waveform of the original audio, which
might be distorted or corrupted in some way, and matches the ADR recording’s waveform to it, helping to get phrasing,
timing and sentence flow closer to the original. However, this only works if the original audio is intact enough to be
analysed; if the track has a lot of background noise, VocAlign will struggle to read the waveform. There is therefore
always a need for manual sync editing, and so a need for sync tolerances.
2-6 Dialogue Specific Research
A study into television sync error detection was conducted by Byron Reeves and David Voelker (1993), in which they
presented three different versions of six television segments with varying amounts of sync error: 0, 2.5 and 5 frames.
After each segment viewers evaluated the speaker’s dialogue and were asked whether they could spot sync errors (Reeves
and Voelker, 1993). They did not present figures in milliseconds, so we assume that they were working at a standard
television frame rate, although this varies with the video standard. “NTSC video is usually said to run at
approximately 30 fps, and PAL runs at 25 fps.” (Adobe, 2011). The figures from the research conducted by Reeves and
Voelker (1993) are presented in milliseconds here:
2.5 fields (frames) at NTSC (33ms per frame) = 83ms (approximately)
5 fields (frames) at NTSC (33ms per frame) = 167ms (approximately)
2.5 fields (frames) at PAL (40ms per frame) = 100ms (approximately)
5 fields (frames) at PAL (40ms per frame) = 200ms (approximately)
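The conversion above can be reproduced with a short Python sketch. It assumes, as this report does, that each field is counted as one full frame; the NTSC rate is taken as 30000/1001 fps (the exact value behind "approximately 30 fps") and PAL as 25 fps. The function and dictionary names are hypothetical.

```python
# Frame rates in frames per second. NTSC colour video runs at
# 30000/1001 (about 29.97) fps; PAL runs at 25 fps.
FRAME_RATES_FPS = {"NTSC": 30000 / 1001, "PAL": 25.0}


def frames_to_ms(frames: float, standard: str) -> float:
    """Convert a sync error in frames to milliseconds for a video standard."""
    return frames * 1000.0 / FRAME_RATES_FPS[standard]


if __name__ == "__main__":
    for standard in ("NTSC", "PAL"):
        for frames in (2.5, 5):
            ms = frames_to_ms(frames, standard)
            print(f"{frames} frames at {standard} = {ms:.0f} ms")
```

Rounded to the nearest millisecond, this gives the same four values as listed above.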
It is assumed that this experiment was carried out under the NTSC standard, because it took place at Stanford University
in California, USA, which is in the NTSC region. The conclusion of their results is that “Viewers can accurately tell
when a television segment is in perfect synch, and when it is 5 fields out of synch. Viewers cannot accurately tell the
same segments are 2.5 fields out of synch”, with a field being taken here as a frame (Reeves and Voelker, 1993).
Summarised and converted into milliseconds, these results indicate that viewers can spot sync errors at 167ms, but
cannot spot sync errors at 83ms. No values between these two points were tested, and so it is hard to find the cut-off
at which viewers can and cannot detect errors. The paper by Reeves and Voelker (1993) states that at 100ms sync errors
are not detected, which is under the 125ms detection value given by ITU (1998). These results are interesting as they
are based on detection of sync error in dialogue and not on general sync, which is assumed for all the research
conducted by ITU, EBU, ATSC, IEEE etc. This research is the only source found that looks into the specific area of
dialogue sync detection, an area in which many gaps are still present. This project will hopefully fill in some of
these dialogue-specific sync detection limits.
Chapter 3 – Synchronisation Error Detection Tests
3-1 Methods used for Synchronisation Error Detection Tests
There will be a number of separate tests conducted to determine whether the synchronisation tolerances discussed apply
to ADR sync editing around edit points in films. These will consist of:
Test 1: Testing current sync tolerances to see if they still apply to the stimuli and testing conditions provided.
Test 2: Using the thresholds from Test 1 and applying them to edit point sync to gauge whether the tolerance ranges still
apply around edit points.
Test 3: Taking sync error past the thresholds around edit points and testing if there is a new detectability limit for edit
points.
3-2 Testing Stimuli
A short film will be created for the purposes of testing. It will have a very simple setting of two people having a
conversation. There will be a camera focused on both of them, showing their mouth movements in detail. Directional
microphones will be placed in front of them and connected straight into the camera equipment. This process is known as
a single system, in which “sound sync is achieved as the image is recorded” (Viers, 2012, p.159), which gives a known
in-sync starting point from which sync errors can be introduced. The only downside to using a single system is that
“cameras are manufactured to produce great images, not necessarily great sound” (Viers, 2012, p.164), but the choice of
camera will be explored to ensure the audio quality does not suffer too much. The video will be edited in
professional-standard video editing software and then imported into a professional digital audio workstation, where the
audio will be separated in preparation for inducing sync errors. All of the tests in the series will use the staircase
method as seen in Younkin and Corriveau’s (2008) research on lip-sync detection. Starting at 0ms, or perfect sync, the
error will be gradually increased from film to film until the viewer detects a synchronisation error. The error will
then be decreased until the viewer reports that the error is no longer apparent. The mean of these two values will be
taken as the threshold of detection. This will be repeated with a number of viewers to collect enough results to find
an accurate average detection threshold. The placement of the errors will change between tests to switch the area of
concentration from general dialogue sync error to edit point sync error.
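The staircase procedure described above can be sketched in Python as follows. This is an illustrative outline only, not the procedure used by Younkin and Corriveau (2008): the 5ms step size, the 500ms ceiling and the idea of modelling a viewer as a function that answers "error seen" or "no error seen" are all assumptions, and the function names are hypothetical. Offsets are magnitudes, so the procedure would be run separately for audio before and audio after the visual.

```python
from statistics import mean
from typing import Callable, Iterable

# A viewer is modelled as a function: given a sync error in ms,
# it returns True if the viewer reports seeing an error.
Viewer = Callable[[float], bool]


def staircase_threshold(detects: Viewer,
                        step_ms: float = 5.0,
                        max_ms: float = 500.0) -> float:
    """Estimate one viewer's detection threshold in milliseconds."""
    # Ascending run: start at perfect sync and increase the error
    # until the viewer first reports it.
    offset = 0.0
    while not detects(offset):
        offset += step_ms
        if offset > max_ms:
            raise ValueError("no error reported up to max_ms")
    first_detected = offset

    # Descending run: decrease the error until the viewer reports
    # that it is no longer apparent (or perfect sync is reached).
    offset = max(first_detected - step_ms, 0.0)
    while offset > 0.0 and detects(offset):
        offset -= step_ms
    no_longer_detected = offset

    # Threshold is the mean of the two turning points.
    return (first_detected + no_longer_detected) / 2.0


def mean_threshold(viewers: Iterable[Viewer], **kwargs) -> float:
    """Average the individual thresholds across a group of viewers."""
    return mean(staircase_threshold(v, **kwargs) for v in viewers)


if __name__ == "__main__":
    # Two simulated viewers who notice errors from 100ms and 120ms.
    simulated = [lambda ms: ms >= 100, lambda ms: ms >= 120]
    print(mean_threshold(simulated))
```

A smaller step gives a finer threshold estimate but requires more versions of the film per viewer, so the step size is a trade-off to settle when the stimuli are prepared.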
3-3 Test 1
This test is to be conducted to determine whether the sync tolerances discussed still apply in the context of the
testing medium and environment. The sync tolerances given by the ITU (1998) and the IEEE (Younkin and Corriveau, 2008)
will be adhered to for the duration of testing, because these are the only two sources found whose figures closely
match, and because the ITU research is the most commonly referenced in research on synchronisation tolerances. These
are as follows:
Audio before visual: 45ms
Audio after visual: 125ms
Testing will occur within these boundaries, as the research shows that sync errors become detectable when sync is
presented outside these values. The boundaries give a start and end value for testing; going beyond them would be
pointless, as the outcome is already known from the research. This will provide a new detectability range to be used
for the following tests. The test will involve subjects indicating where they think they can see an error in dialogue
synchronisation; the timecode will then be logged to see whether they picked up on the sync errors in the correct
places. The amounts of audio advance (pre-delay) and audio delay will be varied to test the full range of values
presented in the research.
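One way the logged timecodes could be checked against the induced errors is sketched below in Python. The scoring scheme is an assumption for illustration and not a method taken from the sources: induced errors are given as start and end times in seconds, reports are single times in seconds, and a hypothetical grace period of one second after each error allows for viewer reaction time.

```python
def score_reports(induced, reported, grace_s=1.0):
    """Compare reported timecodes with induced sync errors.

    induced:  list of (start_s, end_s) spans where an error was induced.
    reported: list of times (in seconds) logged by the subject.
    grace_s:  assumed reaction-time allowance after the end of a span.

    Returns (hits, misses, false_alarms), where hits counts induced
    errors that received at least one report.
    """
    hit_flags = [False] * len(induced)
    false_alarms = 0
    for t in reported:
        matched = False
        for i, (start_s, end_s) in enumerate(induced):
            if start_s <= t <= end_s + grace_s:
                hit_flags[i] = True
                matched = True
        if not matched:
            false_alarms += 1
    hits = sum(hit_flags)
    return hits, len(induced) - hits, false_alarms


if __name__ == "__main__":
    induced_errors = [(10.0, 12.0), (30.0, 33.0)]
    subject_log = [11.5, 20.0, 33.8]
    print(score_reports(induced_errors, subject_log))
```

Counting false alarms alongside hits helps separate subjects who genuinely detect the errors from those who simply report errors often.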
3-4 Test 2
The re-evaluated thresholds from Test 1 will be applied to specific edit points in the film. The same method of testing
will be used to find out whether the sync thresholds change at edit points. Crucially, in this test viewers will not be
told that all the induced sync errors are around the edit points. This will make for a fair test, as viewers will not
concentrate more closely on the dialogue sync around the edit points.
3-5 Test 3
This test will only take place if results from Test 2 show that edit sync detection tolerances give similar results to Test 1.
If Test 2 shows these results, then the detection tolerance limits will be expanded to find out the value of sync error
detection for dialogue around edit points in film.
Appendices
Appendix 1: ADR can be commonly referred to as: Automated Dialogue Replacement, Additional Dialogue Recording
or Looping.
Reference List
Adobe. (2011). Video Learning Guide for Flash: NTSC and PAL video standards. Retrieved from
http://www.adobe.com/devnet/flash/learning_guide/video/part06.html
Advanced Television Systems Committee. (2003). IS-191: Relative Timing of Sound and Vision for Broadcast
Operations.
Ament, V.T. (2009). The Foley Grail: The Art of Performing Sound for Film, Games, and Animation. Oxford:
Focal Press.
British Broadcasting Corporation. (2014). Technical Standards for Delivery of Television Programs to BBC:
Version 4.2.
Chion, M. (1994). Audio-Vision: Sound On Screen. (C. Gorbman, Trans.). New York: Columbia University
Press.
Cugnini, A. (2010). Correcting Lip Sync Errors. Retrieved from
http://www.tvtechnology.com/media-systems/0191/correcting-lip-sync-errors/255400#sthash.c82qgEEi.dpuf
European Broadcasting Union. (2007). R37-2007: The Relative Timing of Sound and Vision Components of a
Television Signal.
International Telecommunication Union. (1998). ITU-R BT.1359: Relative Timing of Sound and Vision for
Broadcasting.
Linear Acoustics. (2004). Audio and Video Synchronization: Defining the Problem and Implementing Solutions.
Purcell, J. (2007). Dialogue Editing for Motion Pictures: A Guide to the Invisible Art (2nd ed.). Oxford: Focal
Press.
Reeves, B., & Voelker, D. (1993). Effects of Audio-Video Asynchrony on Viewer’s Memory, Evaluation of
Content and Detection Ability. Stanford University.
Viers, R. (2012). The Location Sound Bible: How to Record Professional Dialog for Film and TV. California:
Michael Wiese Productions.
Waddell, P., Jones, G., & Goldberg, A. (undated). Audio/Video Synchronization Standards and Solutions, A Status
Report [PowerPoint slides]. Retrieved from
http://www.atsc.org/cms/pdf/audio_seminar/12%20-%20JONES%20-%20Audio%20and%20Video%20synchronization-Status.pdf
Younkin, A.C., & Corriveau, P.J. (2008). Determining the Amount of Audio-Video Synchronization Errors
Perceptible to the Average End-User. IEEE Transactions on Broadcasting, 54(3), 623-627.