Text Readability
What is Readability?
A characteristic of text documents.
“the sum total of all those elements within a given piece of printed material that affect the success of a group of readers have with it. The success is the extent to which they understand it, read it at an optimal speed, and find it interesting.” (Dale & Chall, 1949)
“ease of understanding or comprehension due to the style of writing” (Klare, 1963)
Text Readability/Difficulty
Readability encompasses a number of areas…
Syntactic complexity of the text
▪ grammatical arrangement of words within a sentence (e.g., active/passive constructions have been shown to affect readability)
▪ simple / compound / complex sentences
Organization of the text
▪ discourse structure
▪ textual cohesion
Semantic complexity of the text
Why Measure Text Difficulty?
Improving literacy rates
Improving instruction delivery
Judging technical manuals
Matching text to appropriate grade level
And many more…
How to Measure?
Readability formula: assign a score to the text based on textual cues (e.g., average sentence length)
Over 200 formulas by the 1980s (DuBay, 2004)
Textual cues
▪ sentence length, percentage of familiar words, word length, syllables per word, etc.
Testing validity: correlating predicted scores with reading comprehension scores
Traditional Measures
Flesch Reading Ease score
Score = 206.835 − (1.015 × ASL) − (84.6 × ASW)
Score in [0, 100]
ASL = average sentence length (words per sentence)
ASW = average number of syllables per word
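The formula can be computed directly once ASL and ASW are estimated. A minimal sketch in Python, using a crude vowel-group heuristic for syllable counting (production tools use pronunciation dictionaries such as CMUdict):

```python
import re

def count_syllables(word: str) -> int:
    """Crude syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Score = 206.835 - 1.015*ASL - 84.6*ASW."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    asl = len(words) / len(sentences)                           # avg sentence length
    asw = sum(count_syllables(w) for w in words) / len(words)   # avg syllables/word
    return 206.835 - 1.015 * asl - 84.6 * asw
```

Note that very simple text can score above 100; the [0, 100] range is a nominal interpretation scale, not a hard bound of the formula.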
Traditional Measures
Dale-Chall Formula
Maintains a list of “easy words”
Score = 0.1579 × PDW + 0.0496 × ASL + 3.6365
▪ PDW = percentage of difficult words (words not on the easy-word list)
▪ ASL = average sentence length
FOG index Lexile scale
Commonalities among formulae
Linear regression over some predictor variables
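That shared shape — a linear combination of predictor variables plus an intercept — can be recovered by ordinary least squares. A sketch on made-up (ASL, ASW, difficulty) training data, purely for illustration:

```python
import numpy as np

# Hypothetical training data: (ASL, ASW) predictor values with
# human-assigned difficulty scores (e.g. grade levels) per passage.
X = np.array([[8.0, 1.2], [12.0, 1.4], [18.0, 1.7], [25.0, 2.1]])
y = np.array([2.0, 4.0, 7.0, 10.0])

# Add an intercept column and solve the least-squares problem --
# the same shape as the Flesch and Dale-Chall formulas.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(asl: float, asw: float) -> float:
    return coef[0] * asl + coef[1] * asw + coef[2]
```

Readability formulas differ mainly in which predictors they pick and what data the coefficients were fit on.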
Readability and Web Document
Traditional readability measures are robust for large samples (textbooks and essays) but less so for short, concise web documents.
Web documents are generally noisy
Resource: Predicting Reading Difficulty With Statistical Language Models, Kevyn Collins-Thompson and Jamie Callan
Statistical Language Model for Readability
An LM can encode more complex relationships than the simple linear regression models used in traditional readability measures
A probability distribution over all grade levels
Relative difficulty of words can be obtained statistically, rather than hard-coded as in traditional measures
Word Usage Across Grades
Earlier-grade readers tend to use more concrete words (e.g., red); later-grade readers use more abstract words (e.g., determine)
The same pattern is observed in web documents
Word Usage Statistics: Example
Unigram Model of Readability
Syntactic features are ignored
Word (semantic) feature based model
Formulated in a classification framework
For a given text passage T, predict the semantic difficulty of T relative to a specific grade level
▪ likelihood that the words of T were generated from a representative language model of that grade
Unigram Model of Readability
[Diagram: the words of a text are scored against grade language models LM_G1, LM_G2, …, LM_Gn, each yielding a difficulty score]
How is a Text Generated?
[Diagram: tokens of a text T are drawn from a distribution over word types 1…k]
LM(G_i) = { P(w_1 | G_i), P(w_2 | G_i), …, P(w_k | G_i) }
Multi-Nomial Distribution Example
“In a recent three-way election for a large country, candidate A received 20% of the votes, candidate B received 30% of the votes, and candidate C received 50% of the votes. If six voters are selected randomly, what is the probability that there will be exactly one supporter for candidate A, two supporters for candidate B and three supporters for candidate C in the sample?”
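The answer follows from the multinomial probability mass function with n = 6 trials and category probabilities (0.2, 0.3, 0.5):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """P(X1=c1,...,Xk=ck) = n!/(c1!*...*ck!) * p1^c1 * ... * pk^ck"""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    p = 1.0
    for c, q in zip(counts, probs):
        p *= q ** c
    return coef * p

# One supporter of A, two of B, three of C out of six voters:
answer = multinomial_pmf([1, 2, 3], [0.2, 0.3, 0.5])
# 6!/(1!*2!*3!) * 0.2 * 0.3^2 * 0.5^3 = 60 * 0.00225 = 0.135
```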
A Generative Model: Multi-Nomial Naïve Bayes (MNB)
Multi-nomial distribution
n independent trials
▪ each of which leads to a success for exactly one of k categories
▪ each category has a given fixed success probability
▪ probability mass function: P(x_1, …, x_k) = n! / (x_1! ⋯ x_k!) · p_1^{x_1} ⋯ p_k^{x_k}
Generative Model Assumptions
Unigram language model
A hypothetical author generates the tokens of T as follows:
Choosing a grade language model G according to a prior probability distribution
▪ “I will write for grade level 4” [explicit]
Choosing a passage length according to a probability distribution
▪ “I will write no more than 100 words” [explicit/implicit]
Sampling tokens from G’s multi-nomial word distribution
▪ “I will pick words with a certain distribution” [implicit]
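The three choices above can be sketched as a toy generator; the grade models, prior, and length range below are made up for illustration:

```python
import random

# Toy grade language models (hypothetical probabilities, not real data).
GRADE_LMS = {
    1:  {"the": 0.4, "red": 0.3, "cat": 0.3},
    12: {"the": 0.3, "perimeter": 0.35, "optimal": 0.35},
}
GRADE_PRIOR = {1: 0.5, 12: 0.5}

def generate_passage(seed=0):
    rng = random.Random(seed)
    # 1. choose a grade language model according to the prior
    grade = rng.choices(list(GRADE_PRIOR), weights=list(GRADE_PRIOR.values()))[0]
    # 2. choose a passage length
    length = rng.randint(5, 10)
    # 3. sample tokens from that grade's multinomial word distribution
    lm = GRADE_LMS[grade]
    tokens = rng.choices(list(lm), weights=list(lm.values()), k=length)
    return grade, tokens
```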
MNB Model for Readability
We need to compute P(G_i | T): the probability that T was generated from language model G_i
Bayes’ Theorem: P(G_i | T) = P(T | G_i) · P(G_i) / P(T)
MNB Model for Readability
Classification model:
G* = argmax_{G_i} [ log P(G_i) + log P(L | G_i) + Σ_t log P(w_t | G_i) ]
MNB Model for Readability
Simplified assumptions
All grades are equally likely a priori
All passage lengths are equally likely
Simplified classification model:
G* = argmax_{G_i} Σ_w C(w, T) · log P(w | G_i)
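The simplified rule — pick the grade whose language model assigns the passage the highest log-likelihood — can be sketched as follows; the word probabilities are hand-picked toy values:

```python
import math
from collections import Counter

# Toy smoothed grade language models (hypothetical probabilities);
# every word has nonzero probability in every grade, so logs are safe.
GRADE_LMS = {
    1:  {"the": 0.4, "red": 0.4, "perimeter": 0.1, "optimal": 0.1},
    12: {"the": 0.3, "red": 0.1, "perimeter": 0.3, "optimal": 0.3},
}

def predict_grade(passage: str) -> int:
    """argmax over grades of sum_w C(w, T) * log P(w | grade)."""
    counts = Counter(passage.lower().split())

    def loglik(grade):
        lm = GRADE_LMS[grade]
        return sum(c * math.log(lm[w]) for w, c in counts.items())

    return max(GRADE_LMS, key=loglik)
```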
MNB for Readability: Example
Example 1: Passage ”
MNB for Readability: Example
Example 2: Passage T “the red perimeter”
Example 2: Passage T “the perimeter was optimal”
Smoothing
What if a word does not belong to the language model for a grade level?
A zero probability would be assigned
Redistribute part of the probability mass of known words to rare and unseen words
Smoothing Model
Smooth each grade-based language model using Good-Turing smoothing
We have an estimate of the total probability mass of all unseen words
We need to find each unseen word’s share of this total probability mass
Uniform probability distribution?
Smoothing Model
Usage of discriminative words is clustered around nearby grade levels
Borrow probability mass from neighboring grade classes
Smoothing Model
The type w occurs in one or more grade models (which may or may not include grade i)
P_smoothed(w | G_i) ∝ Σ_k K(i, k) · P(w | G_k)
▪ K(i, k) is a kernel distance function between grades i and k
▪ e.g., a Gaussian kernel: K(i, k) = exp(−(i − k)² / 2σ²)
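A sketch of kernel-based smoothing across neighboring grades, assuming a Gaussian kernel over grade distance; the per-grade probabilities below are illustrative, not estimated from data:

```python
import math

def gaussian_kernel(i: int, k: int, sigma: float = 1.0) -> float:
    return math.exp(-((i - k) ** 2) / (2 * sigma ** 2))

def kernel_smooth(grade_probs: dict, sigma: float = 1.0) -> dict:
    """Blend each grade's estimate for a word with its neighbours',
    weighted by a Gaussian kernel over grade distance."""
    grades = sorted(grade_probs)
    smoothed = {}
    for i in grades:
        weights = [gaussian_kernel(i, k, sigma) for k in grades]
        total = sum(weights)
        smoothed[i] = sum(w * grade_probs[k]
                          for w, k in zip(weights, grades)) / total
    return smoothed

# A word seen only in grades 5-6 lends probability mass to grades 3-4.
p = kernel_smooth({3: 0.0, 4: 0.0, 5: 0.02, 6: 0.03})
```

Borrowing mass from neighboring grades exploits the observation above that word usage clusters around adjacent grade levels, rather than spreading unseen-word mass uniformly.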
Indicators of Readability
Regression model:
[Diagram: predictor variables p_1, p_2, …, p_n extracted from readability-score-annotated documents train a regression model, which then assigns a readability score to a new document]
Resource: Revisiting Readability: A Unified Framework for Predicting Text Quality, Emily Pitler and Ani Nenkova
Indicators of Readability
There are different predictor variables indicating the readability score
What is the contribution of each individual predictor variable to the readability score?
Testing methodology:
Collect a readability corpus → extract predictor variables → measure the correlation between ⟨readability score, predictor variable⟩ pairs
Measure of Correlation
Pearson product-moment correlation coefficient (r)
Captures the relationship between two variables that are linearly related
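Pearson’s r can be computed directly from its definition: the covariance of the two variables divided by the product of their standard deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

r ranges from −1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).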
Correlation Graphs
[Scatter plots illustrating positive (+ve) and negative (−ve) correlations]
Measure of Correlation
How statistically significant is the r value?
t-test for statistical significance
▪ expressed through a p-value
▪ computed against a null hypothesis
“the use of drug X to treat disease Y is no better than not using any drug”
▪ a p-value of 0.001 signifies
▪ there is a 1 in 1000 chance that we would have seen these observations if the variables were unrelated
▪ if the p-value computed for a dataset is less than a predefined limit (say 0.05), the null hypothesis is rejected
▪ the correlation is statistically significant
A Study on Readability Predictor Variables
Methodology
Create a readability dataset
▪ “On a scale of 1 to 5, how well written is this text?”
Identify a group of predictor variables
Measure the correlation between readability scores and the values of each predictor variable
Decide on the effectiveness of predictor variables based on the correlation score and p-value
Baseline Measures
Average characters/word: the average number of characters per word
Average words/sentence: the average number of words per sentence
Max words/sentence: the maximum number of words per sentence
Text length
Vocabulary or Language Model
Unigram model: probability of an article T given a background corpus C
▪ Wall Street Journal and AP News corpora
Log-likelihood: log P(T | C) = Σ_w C(w, T) · log P(w | C)
This model will be biased towards shorter articles. Why? Every additional token contributes a log-probability less than zero, so longer articles always receive lower log-likelihoods.
Compensation: linear regression with the log-likelihood and the number of words in the article as predictor variables
Vocabulary or Language Model
Log likelihood, WSJ: article likelihood estimated from a unigram language model built from WSJ
Log likelihood, NEWS: article likelihood according to a unigram language model built from NEWS
LL with length, WSJ: linear regression of the WSJ unigram likelihood and article length
LL with length, NEWS: linear regression of the NEWS unigram likelihood and article length
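The length bias is easy to demonstrate: since each token contributes a negative log-probability, adding tokens can only lower the total. A toy background model (hypothetical probabilities):

```python
import math

# Toy background unigram model (made-up probabilities).
BACKGROUND = {"the": 0.3, "market": 0.2, "fell": 0.2,
              "sharply": 0.15, "today": 0.15}

def loglik(tokens):
    """Sum of log-probabilities under the background model."""
    return sum(math.log(BACKGROUND[t]) for t in tokens)

short = ["the", "market", "fell"]
long_ = short + ["sharply", "today", "the", "market"]
# Each extra token adds log p < 0, so the longer article scores lower,
# regardless of how well-written it is -- hence the length regression fix.
```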
Syntactic Features
Average parse tree height
Average number of noun phrases per sentence
Average number of verb phrases per sentence
Average number of subordinate clauses per sentence
▪ counting SBAR nodes in the parse tree
Syntactic Features
The curious case of average verb phrases
The number of verb phrases per sentence might be expected to increase text complexity
▪ average verb phrases should then correlate negatively with readability
Let’s look at the following examples
It was late at night, but it was clear. The stars were out and the moon was bright. (1)
It was late at night. It was clear. The stars were out. The moon was bright. (2)
Lexical Cohesion Features
Aspects of well-written discourse
Cohesive devices such as pronouns, definite descriptions, and topic continuity
Number of pronouns per sentence
Number of definite articles per sentence
Average cosine similarity
Word overlap
Word overlap over nouns and pronouns
Entity Coherence Features
Entity-based approach towards local coherence
Discourse coherence is achieved in view of the way discourse entities are introduced and discussed
Some entities are more salient than others
▪ salient entities are more likely to appear in prominent syntactic positions (such as subject or object), and to be introduced in a main clause
▪ Centering Theory models the continuity of discourse
Entity Coherence Features
Entity-Grid discourse representation
Each text is represented by an entity grid
▪ a two-dimensional array that captures the distribution of entities across the sentences of the text
Optional Resource: Modeling Local Coherence: An Entity-Based Approach, Regina Barzilay and Mirella Lapata
Entity-Grid Representation
Entity-Grid Representation
If a noun phrase appears more than once in a sentence, we resort to a grammatical-role-based ranking [S > O > X]
-- Sentence 1: ‘Microsoft’ appears in the subject (S) and “rest” (X) categories
-- Mark the entry for Microsoft as S
S => entity appears in a subject phrase
O => entity appears in an object phrase
X => entity appears in any other phrase
– => entity does not appear
Entity-Grid as Feature Vector
A local entity transition is a sequence of syntactic roles {S, O, X, –} representing an entity’s occurrences in adjacent sentences
Each transition has a certain probability given a grid
Text → a distribution defined over transition types
Entity-Grid as Feature Vector
Feature vector: probability counts for a fixed set of transition types in each grid rendering of a document
▪ m is the number of predefined transitions
▪ p_t(d) is the probability of transition t in the grid of document d
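A sketch of extracting length-2 transition probabilities from a toy grid; the sentences and entity roles below are hand-assigned for illustration rather than produced by a parser:

```python
from collections import Counter
from itertools import product

ROLES = ["S", "O", "X", "-"]

def transition_features(grid):
    """grid: list of sentences, each a dict entity -> role (S/O/X).
    Returns the probability of each length-2 role transition."""
    entities = {e for sent in grid for e in sent}
    counts = Counter()
    for ent in entities:
        roles = [sent.get(ent, "-") for sent in grid]   # column for this entity
        for a, b in zip(roles, roles[1:]):              # adjacent-sentence pairs
            counts[(a, b)] += 1
    total = sum(counts.values())
    return {t: counts[t] / total for t in product(ROLES, repeat=2)}

# Sentence 1: 'Microsoft' as subject, 'trial' as object.
# Sentence 2: 'Microsoft' again as subject, 'suit' as object.
grid = [{"Microsoft": "S", "trial": "O"},
        {"Microsoft": "S", "suit": "O"}]
feats = transition_features(grid)
```

Here ‘Microsoft’ contributes an S→S transition (a salient, continued entity), while ‘trial’ and ‘suit’ contribute O→– and –→O transitions.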
What Entity-Grid is Good for?
Sentence ordering task: determining an optimal sequence in which to present a pre-selected set of information-bearing items
▪ concept-to-text generation
▪ multi-document summarization
A simpler task
▪ rank alternative sentence orderings
▪ which of a pair of orderings is better in terms of coherence?
Modelling the Order Ranking Task
Training set
▪ ordered pairs of alternative renderings of the same document
▪ where the degree of coherence of the first is greater than that of the second
Training objective
▪ find a parameter vector
▪ that yields a ranking score function minimizing the number of violations of the pairwise rankings provided in the training set
Modelling
▪ Support Vector Machine: constraint optimization problem
Entity Coherence Features
Discourse Relation Features
Consider a document as a bag of discourse relations
A language model defined over relations instead of words
Probability of a document generated with a given number of relation tokens and relation types
Log-likelihood of a document based on its discourse relations
Discourse Relation Features
An increase in the number of discourse relations in a document will lower its log-likelihood
Number of relations in a document as a feature
Summary: Readability Predictor Study
Summary
200+ readability measures and still counting
Are they really looking at deeper aspects of language comprehension?
Are they tuned towards individual reading abilities?
Is reader in the loop?
How do we comprehend sentences? How do we store and access words? How do we resolve ambiguities?