
Sdl lavacon 2011 jonathan slaughter


DESCRIPTION

Jonathan Slaughter's presentation deck on Voice Unification


Page 1: Sdl lavacon 2011   jonathan slaughter

I Know What You Wrote Last Summer

Using Cumulative Sum for Voice Unification in Authoring

Confidential SDL Information

Page 2: Sdl lavacon 2011   jonathan slaughter

Jonathan Slaughter – Business Consultant

SDL International

[email protected]

@JRSlaughterSDL

Page 3: Sdl lavacon 2011   jonathan slaughter

Today’s agenda

• Overview
  • What is it?
  • History of Writer Analysis

• Cumulative Sum
  • Early origins
  • Current uses

• How it Works
  • Creating a Voice
  • Analysis of Authors
  • Unification

• Applicability and ROI
  • Impact
  • Where does it make sense?
  • When is it “overkill?”

• Examples
  • Charts
  • Customers

• Q&A

Page 4: Sdl lavacon 2011   jonathan slaughter

A “voice” is distinct

Page 5: Sdl lavacon 2011   jonathan slaughter

What is it?

The Cumulative Sum technique is a recognition system applied to human utterance, whether written or spoken. The application of this system is commonly called “QSUM.”

Two-stage analysis based on:

1) analyzing sequences of language units (the normal unit is the sentence), and

2) counts of recurrent kinds of language-use within each sentence

Based on “quantitative stylistics” – the use of mathematical models as a basis for examining the periodic, or recurrent, nature of language.

Literary “scholarship” versus “criticism”

Page 6: Sdl lavacon 2011   jonathan slaughter

Brief history

1859 – Augustus de Morgan, professor of mathematics at London University, first suggests using the number of words and the average word length across all the Epistles to confirm or deny Paul’s authorship of Hebrews.

1938 – Cambridge statistician G. Udny Yule develops the first formal word-length index format, focusing on word distribution within each sentence and across the document.

1960’s – four major statistical studies around authorship:

• 1962 – Alvar Ellegård’s examination of the Junius Letters

• 1964 – Mosteller and Wallace’s study of the Federalist Papers

• 1967 – Louis Milic’s analysis of Jonathan Swift’s prose

• 1966 – Morton and McLeman’s work on the Pauline Epistles

1988 – Andrew Morton incorporates cumulative sum tests, commonly used in industrial settings, within the study of human utterance.

1990 – QSUM techniques and graphs are used in a court case to attribute or refute ownership of a confession during an appeal. Further uses in court followed.

2005* – First uses of QSUM techniques to unify multiple authors’ “voices” to a single “voice.”

Page 7: Sdl lavacon 2011   jonathan slaughter

How does this fit in to business?

Global organizations are taking significant steps to improve how they create and distribute content to their end-users, and to reduce what it costs. Examples include:

• Minimalism
• Global Authoring Practices/Training
• Workforce Globalization
• Content Management Systems
• Authoring Tools

What none of these tools and processes do is create a truly “homogeneous” voice for authored content.

• Content management systems optimize re-use (consistency) but assume the source content is of acceptable quality

• Global authoring and Minimalism teach “practices” but fail to address the effect of combined voices in re-used content

Voice Unification is a “next” step for organizations looking to establish optimal ROI on process and technological investments.

• Good investment where “brand image” and “brand communication” are central to company success

• Impact on technical material can vary, based upon target markets

• Recommended to clients centralizing source content development in organizations grown primarily through acquisition (loose integration) or significant shifts in development strategy

Page 8: Sdl lavacon 2011   jonathan slaughter

How to create/define your voice?

Understanding what your company “voice” sounds like is important. There are three common methods:

• Voice Creation
• Mean Voice Alteration
• Select Voice Modification

Each provides similar benefits, but the best option depends on a number of factors, including:

• Content types
• Number of voices
• Audience expectations
• Content re-use

Page 9: Sdl lavacon 2011   jonathan slaughter

Factors used to define

Cusum analysis aims to compare two aspects of habitual language use within a given text, segment of text, or combination of texts:

• Length – the number of words in a sentence, written or uttered, by the person providing the sample.

  • The cusum is the running sum of each sentence’s deviation in length, above or below, from the average sentence length. This produces the sld (sentence length distribution).

• Habit – a feature of language use within each sentence. The most commonly used are the number of two- and three-letter words (23lw) and initial vowel words (ivw).

  • The cusum of habit takes the average count of these words per sentence and tracks the cumulative deviation from that average.
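The two cusum series described above can be sketched in Python. This is a minimal illustration, not the QSUM tooling itself: the sentence splitter and word pattern are naive assumptions, the function names (`split_sentences`, `is_habit_word`, `cusum`, `qsum_series`) are hypothetical, and counting a word only once when it is both a two- or three-letter word and an initial vowel word is an assumed convention.

```python
import re
from itertools import accumulate

VOWELS = set("aeiou")

def split_sentences(text):
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def words(sentence):
    # Assumption: a word is a run of letters and apostrophes.
    return re.findall(r"[A-Za-z']+", sentence)

def is_habit_word(word):
    # True for two- or three-letter words (23lw) or initial vowel words (ivw).
    w = word.lower()
    return len(w) in (2, 3) or w[0] in VOWELS

def cusum(values):
    # Running sum of each value's deviation from the mean of all values.
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))

def qsum_series(text):
    # Returns (cusum of sentence lengths, cusum of habit-word counts).
    sentences = [words(s) for s in split_sentences(text)]
    sentences = [s for s in sentences if s]
    lengths = [len(s) for s in sentences]
    habits = [sum(is_habit_word(w) for w in s) for s in sentences]
    return cusum(lengths), cusum(habits)

if __name__ == "__main__":
    sld, habit = qsum_series("I am here. The quick brown fox jumps.")
    print(sld, habit)
```

Because each deviation is measured from the sample's own mean, every cusum series drifts away from zero and returns to approximately zero at the last sentence; the shape of the path in between is what gets charted.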

QSUM charts can then be created by overlaying the graphs of both aspects.

• Provides a visual comparison of the two aspects

• The closer the two charts (demonstrated on the next slides) are, the more “homogeneous” the voices are, and the more likely it is that the text was written by the same person.
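The deck compares the overlaid charts by eye. As a rough numeric stand-in, and purely as an assumption that does not come from the slides, the two cusum series can be rescaled to a common range and the average gap between them measured. The names `rescale` and `mean_gap` are hypothetical, and no threshold for "close enough" is implied.

```python
def rescale(series):
    # Scale a cusum series so its largest absolute value is 1,
    # mimicking the way overlaid charts use separately scaled axes.
    peak = max((abs(x) for x in series), default=0)
    return [x / peak for x in series] if peak else list(series)

def mean_gap(sld_cusum, habit_cusum):
    # Average vertical distance between the two rescaled series.
    # Smaller values mean the overlaid lines track each other more closely.
    if not sld_cusum or len(sld_cusum) != len(habit_cusum):
        raise ValueError("series must be non-empty and of equal length")
    a, b = rescale(sld_cusum), rescale(habit_cusum)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

if __name__ == "__main__":
    # Hypothetical cusum values for a four-sentence sample.
    print(mean_gap([-2.0, -3.0, -1.0, 0.0], [-1.0, -1.5, -0.5, 0.0]))
```

A single number like this discards where along the text the lines separate, which is exactly what the chart shows, so it is better treated as a screening aid than as a replacement for the visual comparison.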

Voice Unification is a difficult process and requires conscious content creation.

Page 10: Sdl lavacon 2011   jonathan slaughter

Wrap Up Q&A