Upload
voduong
View
231
Download
1
Embed Size (px)
Citation preview
Linguistic Steganography: Information Hiding in Text
Stephen Clark and Ching-Yun (Frannie) ChangUniversity of Cambridge Computer Laboratory
Edinburgh, May 2012
Intro Ling Steg Lex Sub 2
Information Hiding
My friend Bob, until yesterday I was using binoculars for stargazing.Today, I decided to try my new telescope. The galaxies in Leo andUrsa Major were unbelievable! Next, I plan to check out some nebulasand then prepare to take a few snapshots of the new comet. AlthoughI am satisfied with the telescope, I think I need to purchase lightpollution filters to block the xenon lights from a nearby highway toimprove the quality of my pictures. Cheers, Alice.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 3
Information Hiding
My friend Bob, until yesterday I was using binoculars for stargazing.Today, I decided to try my new telescope. The galaxies in Leo andUrsa Major were unbelievable! Next, I plan to check out some nebulasand then prepare to take a few snapshots of the new comet. AlthoughI am satisfied with the telescope, I think I need to purchase lightpollution filters to block the xenon lights from a nearby highway toimprove the quality of my pictures. Cheers, Alice.
mfbuyiwubfstidttmnttgilaumwuniptcosnatpttafsotncaiaswttitintplpftbtxlfanhtitqompca
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 4
Information Hiding
My friend Bob, until yesterday I was using binoculars for stargazing.Today, I decided to try my new telescope. The galaxies in Leo andUrsa Major were unbelievable! Next, I plan to check out some nebulasand then prepare to take a few snapshots of the new comet. AlthoughI am satisfied with the telescope, I think I need to purchase lightpollution filters to block the xenon lights from a nearby highway toimprove the quality of my pictures. Cheers, Alice.
mfbuyiwubfstidttmnttgilaumwuniptcosnatpttafsotncaiaswttitintplpftbtxlfanhtitqompca
π = 3.141592653589793 . . .
buubdlupnpsspx
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 5
Information Hiding [Fridrich, 2010]
My friend Bob, until yesterday I was using binoculars for stargazing.Today, I decided to try my new telescope. The galaxies in Leo andUrsa Major were unbelievable! Next, I plan to check out some nebulasand then prepare to take a few snapshots of the new comet. AlthoughI am satisfied with the telescope, I think I need to purchase lightpollution filters to block the xenon lights from a nearby highway toimprove the quality of my pictures. Cheers, Alice.
mfbuyiwubfstidttmnttgilaumwuniptcosnatpttafsotncaiaswttitintplpftbtxlfanhtitqompca
π = 3.141592653589793 . . .
buubdlupnpsspxattack tomorrow
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 6
Steganography
• Steganography is a branch of security concerned with hidinginformation in some cover medium
• Use of images for hiding information has been extensively studied
• Make changes to an image so that the changes are imperceptibleto an observer
• The resulting image encodes the message
• A related area is watermarking, which is concerned with hidinginformation for the purposes of identification (e.g. copyright)
• or e.g. identifying Google translations
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 6
Steganography
• Steganography is a branch of security concerned with hidinginformation in some cover medium
• Use of images for hiding information has been extensively studied
• Make changes to an image so that the changes are imperceptibleto an observer
• The resulting image encodes the message
• A related area is watermarking, which is concerned with hidinginformation for the purposes of identification (e.g. copyright)
• or e.g. identifying Google translations
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 7
The Cover Medium
• Advantages of images
• local changes can maintain global properties of the image• easy to make changes which are imperceptible to a human
• Disadvantages of images
• sender needs an image• sender needs to transmit image to the receiver
• Text is everywhere - why not conceal information in a cover text?
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 8
Example using Lexical Substitution
• Cover text:
Which is why, some would say, it’s slightly odd that when no less anauthority than the chairman of the Financial Services Authority, LordTurner, questions the social utility of much activity in financialmarkets, and also suggests that it might be no bad thing to levy a tinyTobin tax on all this frenetic trading in electrons, well it’s curious thatthe chancellor of the exchequer (who could use a bob or two) doesn’tlick his chops and demand a bit of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 9
Example using Lexical Substitution
• Data Embedding:
Which is why, some would say, it’s fairly odd that when no less anauthority than the chairman of the Financial Services Authority, LordTurner, questions the social utility of much activity in financialmarkets, and also suggests that it might be no bad thing to levy a tinyTobin tax on all this frenetic trading in electrons, well it’s curious thatthe chancellor of the exchequer (who could use a bob or two) doesn’tlick his chops and demand a bit of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 10
Example using Lexical Substitution
• Data Embedding:
Which is why, some would say, it’s fairly odd that when no less anauthority than the president of the Financial Services Authority, LordTurner, questions the social utility of much activity in financialmarkets, and also suggests that it might be no bad thing to levy a tinyTobin tax on all this frenetic trading in electrons, well it’s curious thatthe chancellor of the exchequer (who could use a bob or two) doesn’tlick his chops and demand a bit of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 11
Example using Lexical Substitution
• Data Embedding:
Which is why, some would say, it’s fairly odd that when no less anauthority than the president of the Financial Services Authority, LordTurner, questions the social usefulness of much activity in financialmarkets, and also suggests that it might be no bad thing to levy a tinyTobin tax on all this frenetic trading in electrons, well it’s curious thatthe chancellor of the exchequer (who could use a bob or two) doesn’tlick his chops and demand a bit of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 12
Example using Lexical Substitution
• Data Embedding:
Which is why, some would say, it’s fairly odd that when no less anauthority than the president of the Financial Services Authority, LordTurner, questions the social usefulness of much activity in financialmarkets, and also suggests that it might be no bad thing to levy a tinyTobin tax on all this frenetic trading in electrons, well it’s curious thatthe chancellor of the exchequer (who could use a bob or two) doesn’tlick his chops and demand a bit of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 13
Example using Lexical Substitution
• Data Embedding:
Which is why, some would say, it’s fairly odd that when no less anauthority than the president of the Financial Services Authority, LordTurner, questions the social usefulness of much activity in financialmarkets, and also suggests that it might be no bad thing to levy a tinyTobin tax on all this frenetic trading in electrons, well it’s strange thatthe chancellor of the exchequer (who could use a bob or two) doesn’tlick his chops and demand a bit of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 14
Example using Lexical Substitution
• Data Embedding:
Which is why, some would say, it’s fairly odd that when no less anauthority than the president of the Financial Services Authority, LordTurner, questions the social usefulness of much activity in financialmarkets, and also suggests that it might be no bad thing to levy a tinyTobin tax on all this frenetic trading in electrons, well it’s strangethat the chancellor of the exchequer (who could use a bob or two)doesn’t lick his lips and demand a bit of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 15
Example using Lexical Substitution
• Data Embedding:
Which is why, some would say, it’s fairly odd that when no less anauthority than the president of the Financial Services Authority, LordTurner, questions the social usefulness of much activity in financialmarkets, and also suggests that it might be no bad thing to levy a tinyTobin tax on all this frenetic trading in electrons, well it’s strangethat the chancellor of the exchequer (who could use a bob or two)doesn’t lick his lips and demand a piece of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 16
Example using Lexical Substitution
• Stego Text:
Which is why, some would say, it’s fairly odd that when no less anauthority than the president of the Financial Services Authority, LordTurner, questions the social usefulness of much activity in financialmarkets, and also suggests that it might be no bad thing to levy a tinyTobin tax on all this frenetic trading in electrons, well it’s strangethat the chancellor of the exchequer (who could use a bob or two)doesn’t lick his lips and demand a piece of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 17
This Talk
• Joint work with Frannie Chang
• Outline:
• more introduction to linguistic steganography• a stegosystem based on lexical substitution
• using some simple NLP techniques
• a secret sharing scheme based on adjective deletion
• Purpose of this talk:
• introduce Linguistic Steganography as a new task at theintersection of NLP and Computer Security
• an interesting task for lexical substitution• a new task for natural language generation/regeneration?
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 18
Linguistic Steganography
• Some existing work, but very little compared to images
• Concerned with linguistic transformations, rather than superficialproperties of the text (e.g. white spaces)
• Difficulty is that local changes can lead to inconsistencies:
• ungrammatical or unnatural sentences• grammatical, natural sentences which lack coherence with respect
to the rest of the document (or the world)
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 19
Linguistic Steganography Framework
• Assume an existing cover text which will be modified (rather thangenerated from scratch)
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 19
Linguistic Steganography Framework
• Assume an existing cover text which will be modified (rather thangenerated from scratch)
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 20
Linguistic Steganography Framework
• Note that the receiver does not need a copy of the cover text(just the code dictionary for lexical substitution)
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 21
Linguistic Steganography Framework
• Note that this is not a cryptography problem
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 22
Linguistic Steganography Framework
• Trade-off between imperceptibility and payload
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 23
Possible Linguistic Transformations
• Lexical (e.g. synonym substitution)
• Syntactic (e.g. passive/active transformation)
• Semantic/pragmatic
• Can the transformations be applied reliably and often?
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 23
Possible Linguistic Transformations
• Lexical (e.g. synonym substitution)
• Syntactic (e.g. passive/active transformation)
• Semantic/pragmatic
• Can the transformations be applied reliably and often?
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 24
Simple Lexical Stegosystem (Winstein, 98)
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 25
Sense Ambiguity Problem
• Decoding ambiguity⇒ use vertex colour coding (later in talk)
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 26
Security Simplifications
• Ignoring Kerckhoff’s principle, that the adversary knows thesteganographic channel
• in practice get around this with a secret shared key
• Assuming that the adversary is not a computer (i.e. ignoring thepossibility of steganalysis)
• Assuming that the adversary is passive rather than active
• Ignoring the source of the cover text
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 27
Lexical Substitution Problem
The idea is a powerful one→ The idea is a potent one
This computer is powerful→ This computer is potent
• Some synonyms are not acceptable in context⇒ need to check whether a synonym is applicable in a givencontext (to ensure imperceptibility)
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 28
Checking Synonym Applicability
• Use the Google n-gram corpus to see if the synonym in contexthas been used before (and frequently)
• Now a fairly standard technique which has been used for manysimilar lexical disambiguation tasks (Shane Bergsma)
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 29
The Google n-gram Corpus
the part that you were 103
the part that you will 198
the part that you wish 171
the part that you would 867
the part that your read 45
the part the Riverside County 51
the part the United States 72
the part the detective was 63
the part the next day 95
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 30
Contextual Check
He was bright and independent and proud →He was clever and independent and proud
f2 = 302, 492 was clever 40,726clever and 261,766
f3 = 8, 072 He was clever 1,798was clever and 6,188clever and independent 86
f4 = 343 He was clever and 343was clever and independent 0clever and independent and 0
f5 = 0 He was clever and independent 0was clever and independent and 0clever and independent and proud 0
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 31
Contextual Check
He was bright and independent and proud →He was clever and independent and proud
Count(w) =∑
n log(fn)max is the highest n-gram Count for any synonym
Score(w) = Count(w)/maxIf Score(w) ≥ threshold , w passes the contextual check
Count(clever) = log(f2) + log(f3) + log(f4) + log(f5) = 28Score(clever) = 28/max = 0.9
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 32
Extensions to the Contextual Check
• Weight some n-grams more heavily than others
• Use wild-cards for unknown words
• . . .
• ⇒ difficult to beat the basic system
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 33
Evaluation
• Automatic evaluation using data from Lexical Substitution Task(McCarthy and Navigli, Semeval 2007)
• Manual human evaluation of naturalness of the modified text
• more direct evaluation of imperceptibility for the steganographyapplication
• We use WordNet as the source of possible substitutes
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 34
Human Evaluation
• Evaluate imperceptibility by asking humans to rate naturalness ofsentences (1–4), in 3 conditions:
• sentence unchanged• sentence changed by our system (with threshold of 0.95)• sentence changed by random choice of target word and random
choice of substitute from target word’s synsets (baseline)
• Sentences are from Robert Peston’s BBC blog
• On average around 2 changes are made per sentence
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 35
Example Sentences
ORIG: Apart from anything else, big companies have the size and muscle toderive gains by forcing their suppliers to cut prices (as shown by the furorehighlighted in yesterday’s Telegraph over Serco’s demand - now withdrawn -for a 2.5% rebate from its suppliers); smaller businesses lower down the foodchain simply don’t have that opportunity.
SYSTEM: Apart from anything else, large companies have the size andmuscle to derive gains by pushing their suppliers to cut prices (as evidencedby the furore highlighted in yesterday’s Telegraph over Serco’s need - nowwithdrawn - for a 2.5% rebate from its suppliers); smaller businesses lowerdown the food chain simply don’t have that opportunity.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 36
Example Sentences
ORIG: Apart from anything else, big companies have the size and muscle toderive gains by forcing their suppliers to cut prices (as shown by the furorehighlighted in yesterday’s Telegraph over Serco’s demand - now withdrawn -for a 2.5% rebate from its suppliers); smaller businesses lower down the foodchain simply don’t have that opportunity.
RANDOM: Apart from anything else, self-aggrandising companies have thesize and muscle to derive gains by forcing their suppliers to foreshorten prices(as shown by the furore highlighted in yesterday’s Telegraph over Serco’sdemand - now withdrawn - for a 2.5% rebate from its suppliers); smallerbusinesses lower down the food chain simply don’t birth that chance.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 37
Experimental Design
• 60 sentences
• 30 judges
• Latin square design with 3 groups of 10 judges
• People in the same group receive the 60 sentences under thesame set of conditions
• Each judge sees all 60 sentences, but sees each sentence onlyonce in one of the three conditions
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 38
Results
• Average score for the original sentences is 3.67 (scale of 1–4)
• Average score for the sentences modified by our system is 3.33
• Average score for the randomly changed sentences is 2.82
• Differences between the systems are highly significant (WilcoxonSigned-Ranks Test)
• Payload is around 4 bits per sentence for this level ofimperceptibility
• Threshold controls tradeoff between payload and imperceptibility
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Intro Ling Steg Lex Sub 38
Results
• Average score for the original sentences is 3.67 (scale of 1–4)
• Average score for the sentences modified by our system is 3.33
• Average score for the randomly changed sentences is 2.82
• Differences between the systems are highly significant (WilcoxonSigned-Ranks Test)
• Payload is around 4 bits per sentence for this level ofimperceptibility
• Threshold controls tradeoff between payload and imperceptibility
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 39
Sense Ambiguity Problem
• Different codewords assigned to different senses of compositionleads to a decoding ambiguity
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 40
Sense Ambiguity Problem
• Represent synonymy relation in a graph
• words are nodes in the graph• edges represent membership of the same synset
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 41
Vertex Colour Coding
• Vertex Colouring: a labelling of the graph’s nodes with colours(codes) so that no two adjacent nodes share the same colour
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 42
Vertex Colour Coding Algorithm
• Assume synsets have no more than 4 words
• 99.6% of synsets have less than 8 words
• Task is to maximise the number of nodes (words) in the graphwhilst assigning a unique codeword to each node
• We propose a greedy algorithm to perform the colouring – addedges and codes assuming some ordering of the words so that notwo adjacent nodes share the same code
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 43
Vertex Colour Coding Algorithm
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 44
The Stego Lexical Substitution System
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 45
Deletion as the Transformation
• Words can often be deleted without affecting the meaning
• especially adjectives
“Have you heard of the mysterious death of your late boarderMr. Enoch J. Drebber, of Cleveland?” A terrible change cameover the woman’s face as I asked the question. It was someseconds before she could get out the single word “Yes” – andwhen it did come it was in a husky, unnatural tone.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 46
Deletion as the Transformation
• How can the receiver detect deleted words in the stego text?
• One possibility is to have more than one stego text, with differentwords deleted in each
• More than one stego text leads to the idea of secret sharing
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 47
Secret Sharing
• There are two receivers, each receiving a different version of thecover text
• Only when the receivers compare texts can the secret message berevealed
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 48
A Secret Sharing Scheme
Secretbits:101
Text: “Have you heard of the mysterious death of your late
boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible
change came over the woman’s face as I asked the question.
It was some seconds before she could get out the single word
“Yes” – and when it did come it was in a husky, unnatural
tone.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 49
A Secret Sharing Scheme
Embed1st bit: 1
Share0: “Have you heard of the death of your late boarder
Mr. Enoch J. Drebber, of Cleveland?” A terrible change
came over the woman’s face as I asked the question. It was
some seconds before she could get out the single word “Yes”
– and when it did come it was in a husky, unnatural tone.
Targetadj:mysterious Share1: “Have you heard of the mysterious death of your late
boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible
change came over the woman’s face as I asked the question.
It was some seconds before she could get out the single word
“Yes” – and when it did come it was in a husky, unnatural
tone.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 50
A Secret Sharing Scheme
Embed2nd bit:0
Share0: “Have you heard of the death of your late boarder
Mr. Enoch J. Drebber, of Cleveland?” A terrible change
came over the woman’s face as I asked the question. It was
some seconds before she could get out the single word “Yes”
– and when it did come it was in a husky, unnatural tone.
Targetadj:terrible Share1: “Have you heard of the mysterious death of your
late boarder Mr. Enoch J. Drebber, of Cleveland?” A change
came over the woman’s face as I asked the question. It was
some seconds before she could get out the single word “Yes”
– and when it did come it was in a husky, unnatural tone.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 51
A Secret Sharing Scheme
Embed3rd bit: 1
Share0: “Have you heard of the death of your late boarder
Mr. Enoch J. Drebber, of Cleveland?” A terrible change
came over the woman’s face as I asked the question. It was
some seconds before she could get out the word “Yes” – and
when it did come it was in a husky, unnatural tone.
Targetadj:single Share1: “Have you heard of the mysterious death of your
late boarder Mr. Enoch J. Drebber, of Cleveland?” A change
came over the woman’s face as I asked the question. It was
some seconds before she could get out the single word “Yes”
– and when it did come it was in a husky, unnatural tone.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 52
A Secret Sharing Scheme
read offbits: 101
Share0: “Have you heard of the death of your late boarder
Mr. Enoch J. Drebber, of Cleveland?” A terrible change
came over the woman’s face as I asked the question. It was
some seconds before she could get out the word “Yes” – and
when it did come it was in a husky, unnatural tone.
Share1: “Have you heard of the mysterious death of your
late boarder Mr. Enoch J. Drebber, of Cleveland?” A change
came over the woman’s face as I asked the question. It was
some seconds before she could get out the single word “Yes”
– and when it did come it was in a husky, unnatural tone.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 53
Adjective Deletion Data
• Pleonasm data for pilot study
• free gift, cold ice, final end, . . .
• Full study used human annotated data
• 1,200 sentences from the BNC marked for naturalness (yes/no)
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 54
Example Judgements (YES)
Judgement Example sentence
Deletable He was putting on his heavy overcoat, asked again casually if he couldhave a look at the glass.
Deletable We are seeking to find out what local people want, because they mustown the work themselves.
Deletable We are just at the beginning of the worldwide epidemic and the situationis still very unstable.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 55
Example Judgements (NO)
Judgement Example sentence
Undeletable He asserted that a modern artist should be in tune with his times, carefulto avoid hackneyed subjects.
Undeletable With various groups suggesting police complicity in township violence,many blacks will find little security in a larger police force.
Undeletable There can be little doubt that such examples represent the tip of aniceberg.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 56
Data Collection
• 30 native English speakers
• 1,200 sentences with 300 annotated by 3 judges; the restannotated by one
• Fleiss kappa was 0.49 (moderate agreement)
• 700 training; 200 development; 300 test
• Ratio of deletable:undeletable was roughly 2:1
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 57
Deletion Classifier
• SVM classifier with a variety of features, e.g.:
• Google n-gram count ratios before and after deletion• lexical association measures between noun and adjective, eg PMI• Noun and adjective entropy measures• . . .
• Full feature set gave best performance in development tests
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 58
Full Classifer Results on Test Set
Threshold 0.69 0.70 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
Pre 70.1 69.8 70.7 72.0 70.8 71.1 74.8 85.0 90.9 100Rec 74.5 73.4 72.9 70.8 65.6 58.9 41.7 26.6 15.6 5.2
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 59
Spot the Changes!
• If you eat a dodgy chicken tikka masala bought from one of our
best-known supermarket strings, and you feel violently sick afterwards,
do you blame the supermarket - or will your ire be placed at the
anonymous manufacturer of the product which caused the tainted meal?
• Most of us would, I think, hold the supermarket accountable, if its name
was on the package - although it was another company altogether that
had the sloppy hygiene measures which caused the poisoning.
• BP also concedes that its own employees made errors, particularly when
it came to interpreting the results of pressure tests.
• But I think it is in BP’s recommendations for change that many will findthe real story, because there BP makes clear that it wants to exercise farbetter oversight of those who work for it when trying to
extract oil from deepwater fields.
Stephen Clark Linguistic Steganography Edinburgh, May 2012
Ambiguity Sharing Deletion 60
Spot the Changes!
• If you eat a dodgy chicken tikka masala bought from one of our
best-known supermarket strings, and you feel violently sick afterwards,
do you blame the supermarket - or will your ire be placed at the
anonymous manufacturer of the product which caused the tainted meal?
• Most of us would, I think, hold the supermarket accountable, if its name
was on the package - although it was another company altogether that
had the sloppy hygiene measures which caused the poisoning.
• BP also concedes that its own employees made errors, particularly when
it came to interpreting the results of pressure tests.
• But I think it is in BP’s recommendations for change that many will findthe real story, because there BP makes clear that it wants to exercise farbetter oversight of those who work for it when trying to
extract oil from deepwater fields.
Stephen Clark Linguistic Steganography Edinburgh, May 2012