Jonathan Slaughter's presentation deck on Voice Unification
Confidential SDL Information
I Know What You Wrote Last Summer
Using Cumulative Sum for Voice Unification in Authoring
Jonathan Slaughter – Business Consultant
SDL International
@JRSlaughterSDL
Today’s agenda
• Overview
    • What is it?
    • History of Writer Analysis
• Cumulative Sum
    • Early origins
    • Current uses
• How it Works
    • Creating a Voice
    • Analysis of Authors
    • Unification
• Applicability and ROI
    • Impact
    • Where does it make sense?
    • When is it “overkill”?
• Examples
    • Charts
    • Customers
• Q&A
A “voice” is distinct
What is it?
The Cumulative Sum technique is a recognition system applied to human utterance, whether written or spoken. The application of this system is commonly called “QSUM.”
Two-stage analysis based on:
1) analyzing sequences of language units (the normal unit is the sentence), and
2) counting recurrent kinds of language use within each sentence
Based on “quantitative stylistics” – the use of mathematical models as a basis for examining the periodic, or recurrent, nature of language.
Literary “scholarship” versus “criticism”
Brief history
1859 – Augustus de Morgan, professor of mathematics at London University, first suggests using the number of words and the average word length across all the Epistles to confirm or deny Paul's authorship of Hebrews.
1938 – Cambridge statistician G. Udny Yule develops the first formal word-length index and focuses on word distribution within each sentence and across the document.
1960s – four major statistical studies around authorship:
• 1962 – Alvar Ellegard’s examination of the Junius Letters
• 1964 – Mosteller and Wallace’s study of the Federalist papers
• 1966 – Morton and McLeman’s work on the Pauline Epistles
• 1967 – Louis Milic’s analysis of Jonathan Swift’s prose
1988 – Andrew Morton incorporates cumulative sum tests, commonly used in industrial settings, within the study of human utterance.
1990 – QSUM techniques and graphs are used in a court case, during an appeal, to attribute or refute authorship of a confession. Further uses in court followed.
2005* – First uses of QSUM techniques to unify multiple authors’ “voices” to a single “voice.”
How does this fit in to business?
Global organizations are taking significant steps to improve how they create and distribute content to their end-users, and to reduce what it costs. Examples include:
• Minimalism
• Global Authoring Practices/Training
• Workforce Globalization
• Content Management Systems
• Authoring Tools
What none of these tools and processes do is create a truly “homogeneous” voice for authored content.
• Content management systems optimize re-use (consistency) but assume the source content is of acceptable quality
• Global authoring and Minimalism teach “practices” but fail to address the effect of combined voices in re-used content
Voice Unification is a “next” step for organizations looking to establish optimal ROI on process and technological investments.
• Good investment where “brand image” and “brand communication” are central to company success
• Impact on technical material can vary, based upon target markets
• Recommended for clients centralizing source content development in organizations that have grown primarily through acquisition (loose integration) or have undergone significant shifts in development strategy
How to create/define your voice?
Understanding what your company “voice” sounds like is important. There are three common methods:
• Voice Creation
• Mean Voice Alteration
• Select Voice Modification
Each provides similar benefits, but the best option depends on a number of factors, including:
• Content types
• Number of voices
• Audience expectations
• Content re-use
Factors used to define
Cusum analysis aims to compare two aspects of habitual language use within a given text, segment of text, or combination of texts:
• Length – the number of words in a sentence written or uttered by the person providing the sample.
    • The cusum of length is the running sum of each sentence's deviation, above or below, from the average sentence length. This produces the sentence length distribution (sld).
• Habit – a feature of language use within each sentence. The most commonly used are the number of two- and three-letter words (23lw) and initial vowel words (ivw).
    • The cusum of habit works the same way: take the average number of these words per sentence, then track the running sum of each sentence's deviation from that average.
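The two cusum series described above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own tooling: the function names (`split_sentences`, `tokenize`, `is_habit_word`, `sentence_counts`, `cusum`) are hypothetical, the sentence splitter and tokenizer are deliberately naive, and counting 23lw and ivw together as a single habit (each word counted once) is an assumption made for the example.

```python
import re
from itertools import accumulate

VOWELS = "aeiou"

def split_sentences(text):
    # Naive splitter (assumption): a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Words are runs of letters, optionally with one internal apostrophe (assumption).
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)

def is_habit_word(word):
    # Combined habit (assumption): a two- or three-letter word, or a word starting with a vowel.
    w = word.lower()
    return len(w) in (2, 3) or w[0] in VOWELS

def sentence_counts(text):
    # Per sentence: the word count (length) and the number of habit words.
    lengths, habits = [], []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        lengths.append(len(tokens))
        habits.append(sum(is_habit_word(t) for t in tokens))
    return lengths, habits

def cusum(values):
    # Running sum of each value's deviation from the mean of all values.
    if not values:
        return []
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))

sample = "I am here. We go to an old inn today. Stop!"
lengths, habits = sentence_counts(sample)
length_cusum = cusum(lengths)   # cusum of sentence length
habit_cusum = cusum(habits)     # cusum of the habit count
```

Because the deviations from a mean always add up to zero, each series ends back at (approximately) zero. The shape of the path in between is what the analysis looks at.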
QSUM charts can then be created by overlaying the graphs of both aspects.
• Provides a visual comparison of the two aspects
• The closer the two charts (demonstrated on the next slides) are, the more “homogeneous” the authors' voices are, and the more likely the text was written by the same person.
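The slides judge closeness by eye, from the overlaid chart. As a rough numeric stand-in, the sketch below rescales both cusum series to a common range and reports their average separation. The `overlay_gap` measure, the rescaling rule, and the function names are all assumptions made for illustration, not part of the QSUM method as presented here.

```python
from itertools import accumulate

def cusum(values):
    # Running sum of each value's deviation from the mean of all values.
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))

def scale_to_unit_range(series):
    # Divide by the series' range so both lines can share one vertical axis.
    span = max(series) - min(series)
    if span == 0:
        return [0.0 for _ in series]
    return [x / span for x in series]

def overlay_gap(lengths, habits):
    # Mean absolute distance between the two rescaled cusum lines
    # (hypothetical measure: 0.0 means the lines coincide).
    if not lengths or len(lengths) != len(habits):
        raise ValueError("need two non-empty lists of equal length")
    a = scale_to_unit_range(cusum(lengths))
    b = scale_to_unit_range(cusum(habits))
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Habit counts that track sentence length exactly give coinciding lines.
consistent = overlay_gap([10, 20, 30, 20], [5, 10, 15, 10])
# Habit counts that move against sentence length pull the lines apart.
divergent = overlay_gap([10, 20, 30, 20], [15, 10, 5, 10])
```

The inputs are per-sentence word counts and per-sentence habit counts, in sentence order. A small gap corresponds to the closely tracking lines the slides describe as a homogeneous voice. Any threshold for "close enough" would have to be chosen by the analyst.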
Voice Unification is a difficult process and requires conscious content creation.
Wrap Up Q&A