Article

Journal of Information Science
1–27
© The Author(s) 2016
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0165551516661917
jis.sagepub.com
Developing information quality assessment framework of presentation slides
Seongchan Kim, Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea
Jae-Gil Lee, Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea
Mun Y. Yi, Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea
Abstract

Computerized presentation slides have become essential for many occasions such as business meetings, classroom discussions, multi-purpose talks and public events. Given the tremendous increases in online resources and materials, locating high-quality slides relevant to a given task is often a formidable challenge, particularly when a user looks for superior quality slides. This study proposes a new, comprehensive framework for information quality (IQ) developed specifically for computerized presentation slides and explores the possibility of automatically detecting the IQ of slides. To determine slide-specific IQ criteria as well as their relative importance, we carried out a user study involving 60 participants from two universities and conducted extensive coding analysis. We subsequently conducted a series of experiments to examine the validity of the IQ features developed on the basis of the criteria selected from the user study. The study findings contribute to identifying key dimensions and related features that can improve effective IQ assessments of computerized presentation slides.
Keywords

information quality (IQ); IQ assessment; presentation slides; qualitative study; slide ranking
1. Introduction
Information quality (IQ), which is often defined briefly as the ‘fitness for use’ of information [1], plays a crucial role in
the decisions and actions of information consumers [2]. As the amount of information surrounding an information con-
sumer has been increasing rapidly, it has become highly challenging to locate a high quality source of information, which
is often directly related to the performance of the consumer [3, 4].
Computerized presentation slides are the materials that are created with presentation software (e.g. PowerPoint,
Keynote), as opposed to traditional presentation materials such as papers or overhead projector films [5]. Computerized
presentation slides (hereafter referred to as presentation slides or slides for convenience) are one of the most popular
information media, commonly used in conjunction with business meetings, academic lectures, multipurpose talks and
public events.
Acknowledging the importance of presentation slides, online services solely focused on presentation slides, such as SlideShare1 and SlideFinder2, have recently emerged. While these specialized platforms offer the ability to perform a search against millions of slide files, with the number growing continuously, most users must wade through a multitude of slides and discern the quality of slides before they locate a high quality slide file. This issue has become acute with the rapid increase in available slides and the growth of platforms offering similar services. Furthermore, on most platforms anyone can upload their slides without any quality verification, thereby making the job of locating high quality slides increasingly problematic.

Corresponding author:
Mun Y. Yi, Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea.
Email address: [email protected]
To effectively tackle the problem of discerning the quality of presentation slides, developing an IQ assessment frame-
work tailored for computerized presentation slides is a must [6, 7]. The assessment framework needs a taxonomy of IQ
dimensions, criteria, tools and metrics in consideration of the unique characteristics of slides. However, to the best of our
knowledge, no academic efforts have been reported for the development of IQ frameworks specifically targeted for pre-
sentation slides. Several recent studies have attempted to present an IQ assessment framework, not for slides, but for gen-
eral documents such as Web documents and Wikipedia. A set of quality assessment models of Web documents has been
proposed and evaluated [8–12], and a different set of quality frameworks for Wikipedia articles has been reported [13–
17]. When considered together, those studies collectively reveal that IQ dimensions, criteria, models, tools and metrics
vary depending on the types of documents being evaluated. For example, the criteria related to content and readability
are essential for the quality of Web pages [8] and the quality criteria about coverage and structure are important for the
quality of Wikipedia articles [14]. Such IQ assessment frameworks, however, might be inappropriate for slides, as slides
have special features not found in regular documents or Web pages.
The overall purpose of this paper is two-fold: (1) develop a new, comprehensive IQ framework specifically tailored
for computerized presentation slides from the perspective of slide users; and (2) examine the possibility of automatically
assessing the IQ of slides on the basis of the developed IQ framework. Correspondingly, our research has been con-
ducted in two phases (see Figure 1): (1) a user-involved study inclusive of interview, coding analysis and card sorting to
identify the slide quality criteria exercised by slides users; and (2) a series of activities and lab experiments conducted to
verify the applicability of the IQ framework, developed in the first phase, to automatic detection of high quality slides.
In the first phase, we aim at securing a set of quality criteria for presentation slides, capable of supporting diverse metrics at multiple levels, as well as determining the relative importance of each criterion based on the frequency of mentions made by the respondents. In the second phase, we first determine the quality of slides obtained from the Web to set the ground-
truth. Then, we define quality features in terms of their IQ dimensions, and extract these features from the slides. These
features are used for LTR (Learning to Rank) algorithm training [18]. After extracting 65 features in 10 IQ dimensions,
we train LTR algorithms such as LambdaMART and AdaRank to reorder the initial search results. We examine the
effectiveness of our proposed method by comparing the normalized discounted cumulative gain (NDCG) of the results
produced by the trained LTR model with the results of the Okapi BM25 ranking function and Google slide search.
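For reference, NDCG compares a ranking's discounted cumulative gain against that of the ideal ordering of the same graded results. A minimal Python sketch follows; the grade values and cutoff are illustrative only, not data from our experiments.

```python
import math

def dcg_at_k(grades, k):
    """Discounted cumulative gain over the top-k results, given graded labels."""
    return sum((2 ** g - 1) / math.log2(i + 2)   # gain discounted by log of rank
               for i, g in enumerate(grades[:k]))

def ndcg_at_k(grades, k):
    """NDCG@k: DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0

# Quality grades (higher is better) in the order a ranker returned the slides.
print(ndcg_at_k([2, 3, 0, 1, 2], k=5))  # about 0.83 for this hypothetical ordering
```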
To the best of our knowledge, this research is the first to develop a comprehensive IQ framework for computerized
presentation slides, except for our preliminary study [19], which included a very limited set of quality features and did
not include a user study. Compared with other prior research on IQ, our proposed framework can be seen as offering
substantial advantages as its development is driven by users' feedback obtained in a controlled environment, and its validity is confirmed via a series of automatic assessment experiments that involve state-of-the-art search algorithms.

Figure 1. Overall research process. (Phase 1: User Study: interview, coding analysis and card sorting produce the IQ framework; Phase 2: Automatic Assessment Study: data acquisition/annotation, feature extraction and learning to rank.)
The rest of the paper is organized as follows. Section 2 presents related work on IQ and slide design. Section 3 pre-
sents the details of our user study and its results obtained via qualitative analyses. Section 4 reports on our automatic slide
quality assessments with the LTR technique. Finally, we conclude this paper with overall discussion in section 5.
2. Related work
IQ is often defined briefly as the ‘fitness for use’ of information [1] and more completely as ‘people’s subjective judg-
ment of goodness and usefulness of information in certain information use settings with respect to their own expectations
of information or in regard to other information available’ [20]. As people’s expectations of information are not uniform,
IQ is inherently a multi-dimensional construct, commonly comprising several elements such as accuracy, objectivity,
relevancy and completeness. For over a decade, a significant body of studies has been conducted on IQ taxonomy or on
automatic IQ evaluation, with regard to various resources on the Web. Also, a number of guidelines have been proposed
for slide design. We summarize those studies in this section.
2.1. IQ taxonomy
Several general IQ taxonomies exist in the literature. Table 1 summarizes these taxonomies, which were used as a basis
for developing our IQ taxonomy of presentation slides. In the literature, context-general taxonomies [1, 7] and context-
specific taxonomies [21, 22] have been reported. Wang and Strong [1] proposed a hierarchical conceptual framework of
data quality, which has been widely adapted for many studies. Their study aimed at developing a general framework that
captured the characteristics of IQ that were important to information consumers across the board. Their framework con-
sists of four IQ categories: intrinsic, contextual, representational and accessibility. Each of those categories further
includes elements as detailed in Table 1. Stvilia et al. [7] suggested a taxonomy through a literature analysis conducted
using 32 representative articles. Their taxonomy has three categories (intrinsic IQ, relational or contextual IQ, and repu-
tational IQ) and 22 dimensions.
For context-specific taxonomies, Alkhattabi et al. [21] proposed an IQ taxonomy for e-learning systems including con-
textual representation, accessibility, intrinsic category and 14 elements (also called dimensions). They initially started
with Wang and Strong’s taxonomy [1] in developing their framework. With a survey involving 315 users and statistical
analysis, they revised the previous taxonomy to specialize it for e-learning systems. In their taxonomy, accessibility was
emphasized as it included response time and availability, which were considered crucial for e-learning systems. Dedeke
[22] suggested a context-specific taxonomy for information systems including ergonomic, accessibility, transactional,
contextual and representation dimensions. Interestingly, the author proposed ergonomic IQ and transactional IQ, which
are not found in other IQ studies, especially for information systems. In sum, prior research on IQ taxonomy suggests that
a general IQ taxonomy has inherent limitations when it is applied to a specific application domain, and that an IQ taxon-
omy tailored for computerized presentation slides needs to be developed anew to properly reflect the unique characteris-
tics of the information media.

Table 1. IQ taxonomy in the literature.

Context: general – To develop a general IQ taxonomy from data consumers' perspective.
Prior research: A conceptual framework for data quality [1].
Method: Two-stage survey: (1) generation of quality attributes from 112 participants; (2) card sorting to assign dimensions to target categories with 30 subjects.
IQ taxonomy:
- Intrinsic: the degree to which data have qualities in their own right (believability, accuracy, objectivity, and reputation)
- Contextual: the degree to which data quality must be considered within the context of the task at hand (value-added, relevancy, timeliness, completeness, and appropriate amount of data)
- Representational: the degree to which the format and meaning of data are clear (interpretability, ease of understanding, representational consistency, and concise representation)
- Accessibility: the degree to which the system must be accessible but secure (accessibility and access security)

Context: general – To propose a general taxonomy of IQ dimensions supporting a general IQ assessment framework.
Prior research: A general IQ assessment framework [7].
Method: An analysis of representative items in the IQ literature.
IQ taxonomy:
- Intrinsic: measuring internal characteristics of information in relation to some reference standard in a given culture (accuracy/validity, cohesiveness, complexity, semantic consistency, structural consistency, currency, informativeness/redundancy, naturalness, and precision/completedness)
- Relational/contextual: measuring relationships between information and certain aspects of its usage context (accuracy, accessibility, complexity, naturalness, informativeness/redundancy, relevance, precision/completeness, security, semantic consistency, structural consistency, verifiability, and volatility)
- Reputational: measuring the position of an information entity in a cultural or activity structure, often determined by its origin and record of mediation (authority)

Context: specific – To propose an IQ taxonomy for e-learning systems.
Prior research: A framework for e-learning systems [21].
Method: A literature examination and an email survey questionnaire from 315 users.
IQ taxonomy:
- Contextual representation: refers to the consolidation of Wang and Strong's two categories, contextual and representational (conciseness, verifiability, representational consistency, understandability, amount of information, reputation, completeness)
- Accessibility: refers to Wang and Strong's accessibility category (availability, relevancy, accessibility, response time)
- Intrinsic: refers to Wang and Strong's intrinsic category (objectivity, accuracy, believability)

Context: specific – To define a taxonomy for information systems.
Prior research: A conceptual framework for information systems [22].
Method: An analysis based on four components of information systems: data, interface, work, hardware/software.
IQ taxonomy:
- Ergonomic quality: the degree to which the interface and the software/hardware system are designed to meet the needs of users (ease of navigation, comfortability, learnability, visual signals, audio signals)
- Accessibility: the degree to which the system must be accessible but secure (technical access, system availability, technical security, data accessibility, data sharing, data convertibility)
- Transactional: the degree to which the programming design of a specific work process handles content and logic within software (controllability, error tolerance, adaptability, system feedback, efficiency, responsiveness)
- Contextual: the degree to which data quality must be considered within the context of the task at hand (value added, relevancy, timeliness, completeness, appropriate data)
- Representation: the degree to which the format and meaning of data are clear (interpretability, consistency, conciseness, structure, readability, contrast)
2.2. Automatic IQ assessment
Automatic IQ assessment has recently received a considerable amount of attention from researchers in the Information
Science community. However, no research has yet attempted to automatically assess slide quality except for our prelimi-
nary study [19]. Studies on quality-based retrieval can be divided into those that employ LTR (Learning to Rank) meth-
ods and those that do not. Many studies on LTR focused on estimating the relevancy between query and resources; few
studies dealt with the query and resources in terms of quality even though quality is a more comprehensive concept than
relevancy [1]. LTR techniques have gained considerable attention in recent years. Conventional ranking models such as
BM25 and language modelling suffer from parameter tuning and over-fitting. However, LTR, a ranking method that uses
machine learning, provides the advantage of automatically tuning its parameters, along with its ability to combine multi-
ple evidence and avoid over-fitting [18].
LTR has been successfully adapted to various quality-based tasks that utilize Web resources. Richardson et al. [23]
used RankNet, a modified neural network algorithm for learning rankings, to order Web pages with static features based
on anchor texts and domain characteristics. Their research showed that simple URL- or page-based features outper-
formed PageRank. Choi et al. [24] used SVMRank to re-rank initial search results by combining the relevance and qual-
ity scores of medical documents. More specifically, they initially obtained search results with Okapi BM25 from the
Medline and PubMed corpora, and then trained classifiers to assess the quality of the retrieved documents. Finally, they
re-ranked the initial results, and observed a significant improvement in the final ranking performance. Regarding Q&A
forums, Dalip et al. [25] adopted a Random Forest approach to rank the quality of answers. They employed a large set
of features in groups named user, review and structure. By conducting experiments with questions and answers on Stack Overflow, they determined that user and review features were the most effective in the Q&A domain.
Several studies on quality-based ranking have used techniques other than LTR. These methods attempt to maximize
retrieval performance in terms of quality by adding a document’s quality score to their own retrieval model score. Raiber
and Kurland [26] developed a query-independent document quality measure that considered stop-words, document
entropy, inter-document similarity and PageRank. They achieved Web-retrieval effectiveness (e.g. removing spam pages
from the retrieved documents) by combining a Dirichlet-smoothed unigram document language model and a quality
score. Bendersky et al. [8] attempted to incorporate the quality score of Web documents into a Markov Random Field
retrieval model to achieve quality-based Web document retrieval. Using seven document features, such as the number of
terms on a page, average term length and entropy, they demonstrated the effectiveness of quality-based retrieval over
relevance-based retrieval. Alkhattabi et al. [27] proposed an IQ assessment model of e-learning systems. The authors
proposed a linear equation to compute the overall quality score using quality metrics as well as their relative importance.
The metrics are organized in an IQ taxonomy with 14 quality dimensions.
Quality-based classification studies have considered Web pages [28], encyclopedia [14] and presentation slides [19].
Wu et al. [28] tackled the classification problem based on the quality of Web pages by examining quality-related factors
such as text length and image quantity. They suggested a learning method that divides training data into subsets accord-
ing to the clustering results of the quality-related factors. For encyclopaedia documents, Dalip et al. [14] classified
Wikipedia articles in terms of quality using the text, style, review and network features of articles. They proposed a
machine learning approach based on regression analysis to combine these quality features into a single quality value.
Regarding presentation slides, quality estimates were made in our previous study [19], in which we assessed the quality of slides with a total of 28 Representational, Contextual and Intrinsic features and classified the slides as high, fair or low quality.
2.3. Slide design
There have been several popular guidebooks by experts about designing successful presentation slides. Alley [29] in The
Craft of Scientific Presentations and Reynolds [30] in Presentation Zen offer guidelines for typography, colour, layout
and style for designing presentation slides and delivering successful presentations. These guidelines reflect the writers’
experiences, but are mostly based on anecdotal observations. As information quality is a subjective judgement of fitness for use by its users, it is necessary to incorporate users' perceptions and standards into slide quality criteria.
Several researchers have conducted experiments to improve the understandability of slides for the audience by sug-
gesting alternative designs. Alley and Neeley [31] outlined an alternative design to the traditional design of presentation
slides (i.e. a phrasal headline with a bulleted list). Their design used a succinct sentence headline instead of a topic
phrase, assisted by visual evidence. In a case study involving PowerPoint slides, they showed that their proposed guide-
line was beneficial for engineering students and professors. A post-study survey indicated that 60% continued to use sen-
tence headlines in most of their slides and 82% used visual evidence. Furthermore, 55% of the audience was mostly
receptive to the alternative design. Further, Mackiewicz [32] examined 37 participants’ perceptions of slides for the
clarity and attractiveness of graphs displaying two- and three-dimensional bars. These studies, however, focused only on
a limited set of the design features of presentation slides.
3. User study
Our research employed a qualitative study [33] to determine the quality criteria of presentation slides directly from the
users. Based on the guidelines offered by popular qualitative study handbooks [33, 34] and the research method used by
a prior study on information quality assessment [17], our user study involved three major activities: (1) an interview in
which users were observed during the evaluation process, (2) a coding analysis, which is an analytical process including
code extraction from interview transcripts and code reconciliation, and (3) card sorting for assignment of the elicited cri-
teria into appropriate IQ dimensions. During the interview, we asked users to think aloud about their perceptions of slide quality. All of the users' utterances were audio-recorded and fully transcribed. Then, we elicited IQ criteria from the
interview transcripts through coding analysis [33, 34]. Although there exist some guidelines for successful presentation
slide design by experts [29, 30] and research into the design of slides [31], criteria established through a user study can
provide more diverse user views directly without being filtered by intermediaries. To determine the dimension of the IQ
criteria, we conducted card sorting, which is a simple and user-friendly technique for understanding the participants’
thoughts and underlying rationale while producing objective topic groups and organizational structures [33, 36].
3.1. Formulation of initial IQ taxonomy
Our literature survey reveals that prior studies have focused on IQ assessment frameworks mostly for regular documents
and that they might be inappropriate for presentation slides. Furthermore, those popular guidelines about the slide design
provide a partial view of IQ, with limited inputs from users. Thus, the present study intends to bridge the gap in the liter-
ature by developing a comprehensive IQ assessment framework specifically focused on presentation slides from users’
perspectives.
Starting with the extant IQ taxonomies, we developed a new IQ taxonomy tailored for the domain of presentation
slides going through a series of activities. Using Wang and Strong’s taxonomy [1] as the starting point, we borrowed
some additional dimensions from other studies as deemed relevant for presentation slides. We excluded IQ dimensions
about accessibility, as it is less essential for slides, given that presentation slides are abundantly available online and not
tied to a single system. Instead, we added reputational category in consideration of Stvilia et al.’s work [7], as the quality
of presentation slides is likely to be heavily affected by the person who created them. As a result, our taxonomy includes
4 categories (intrinsic, representational, contextual and reputational) and 13 IQ dimensions (see Table 2 and Figure 2).
Following the definition of Wang and Strong [1] and Stvilia et al. [7], the intrinsic category refers to the quality origi-
nating from the data contained on the slides. The quality of intrinsic dimensions does not change based on the context in
which the slides are presented. The representational category refers to the quality of information rendering, inclusive of
visual aesthetics and rendering clarity. The contextual category is concerned with the quality of information within the
context of the task at hand. In other words, the contextual IQ can be different depending on the context of the user’s task,
while the intrinsic category including accuracy considers the quality of data itself, regardless of the tasks and contexts.
Because users’ tasks and contexts vary across time and information consumers, IQ measurements of contextual quality
are considered challenging [1, 7]. For instance, completeness (one of the contextual IQ dimensions, which is the extent to which the information has all of the required parts or necessary elements) of slides can vary because the required parts
and necessary elements of slides are different for academic lectures and for self-study. Fewer parts might be required for
users in class, while more parts may be necessary in self-study. On the other hand, accuracy, one of the intrinsic dimen-
sions, is not affected by time or task. The reputational category measures the position of an information entity in a cul-
tural or activity-related structure, often determined by its origin and record of mediation.
Table 2. Description of the IQ taxonomy.

Intrinsic
- Accuracy: The extent to which the information is true, correct, and precise
- Cohesiveness: The extent to which the information is focused on one topic
- Naturalness: The extent to which the information is expressed in conventional, typical terms and forms in accordance with generally accepted reference sources
- Objective linkage clarity: The extent to which the content of the information is clearly linked to the presentation objectives

Representational
- Representational clarity: The extent to which the representation of information is easily identified, understandable, and readable per unit (character, paragraph, and page)
- Representational consistency: The extent to which the representation (visuals, format, background, etc.) of information is done in a uniform manner
- Visual attraction: The extent to which the representation (visuals, format, background, etc.) of information is appealing and engaging to the user
- Ease of navigation: The extent to which the information is easy to navigate or predictable for the user

Contextual
- Completeness: The extent to which the information has all of the required parts or necessary elements
- Informativeness: The amount of information contained in the presentation material
- Recency: The extent to which the age of the information is up-to-date
- Task appropriateness: The extent to which the information is proper in the context of a specific activity or task

Reputational
- Author/institutional reputation: The extent to which the information of the author or institution is trusted or highly regarded in terms of its source or content
The initial version of the IQ taxonomy was further modified and refined through a series of activities involving users.
Specifically, we conducted (1) a qualitative interview in which user opinions were obtained through the think-aloud pro-
cess; (2) a coding analysis, which is an analytical process including code extraction from interview transcripts and code
reconciliation; and (3) a card sorting experiment for the assignment of the elicited criteria into appropriate IQ
dimensions.
3.2. Interview (think-aloud)
The participants for the study were 60 students from two national universities in South Korea. To remove any idiosyn-
cratic issues associated with a single university or a single discipline, we recruited students in two broad fields of study
at the two universities: (1) Management and (2) Science and Engineering. There was an equal mix of Management
majors (30 – student majors included Management Science, Economics and International Trade), and Science and
Engineering majors (30 – student majors included Mechanical Engineering, Bio and Chemical Engineering, Electronics
and Information/Computer Science). For both, there was an equal distribution of 15 undergraduate and 15 graduate stu-
dents, and gender was equally distributed (30 males and 30 females). The participants indicated that they frequently used
presentation slides for activities such as classes, seminars, meetings and self-study. All participants received monetary
rewards for participating in the interview study. Table 3 provides a summary of the interview participant statistics.
Prior to the interview, each participant was asked to select three courses that he or she had taken successfully during
the previous semesters and to pick the one he or she liked most. This selection process was needed to ensure that the par-
ticipants fully understood the contents of the slides and the contexts in which the presentation slides could be used. Five
slide files were randomly selected from one of the chosen courses. We used SlideShare,1 which provides links to slides
on the Web, to gather the slides used in conjunction with the interview. The selection process resulted in 31 courses and
155 slides in both PowerPoint (ppt/pptx) and PDF formats. Each participant had a one-to-one interview in a room dedi-
cated to the interview study. The purpose of the study was briefly explained at the beginning of each interview.
Figure 2. An IQ taxonomy of presentation slides: four categories (Intrinsic, Representational, Contextual and Reputational) comprising the 13 dimensions described in Table 2.
Table 3. Summary of interview participant statistics.

Criteria | Average | Notes
Age | 23.9 (SD 3.05) | Max: 33, Min: 18
Years using slides | 5.7 | Max: 10, Min: 1
Proficiency on making slides | 4.2 (Proficient) | 5-point scale
No. of courses in a semester | 4.6 |
No. of courses using slides | 3.6 |
Interview time (minutes) | 47.2 | Max: 84, Min: 18
Participants filled out a short questionnaire about their background and signed an informed consent form. Next, the parti-
cipant was asked to view the five monitor screens, each of which displayed one of the presentation slides from the
course selected by the participant. Participants were allowed to go back and forth between the screens if they wanted to
compare slides. They were also provided with the slides on paper for their convenience. They were asked to ‘think-
aloud’ while they read and evaluated the quality of the presentation slides. They were also asked to indicate which one
of the five slide files was of the lowest or of the highest quality and to provide an explanation for their choices.
3.3. Coding analysis
Verbal data was transcribed from the recordings of all the interviews. The analysis carefully followed the rules of qualita-
tive study, and several steps were taken to ensure validity [33]. In the first stage, three coders conducted content analysis
to code the interview transcripts and identify quality criteria as articulated in the scripts. Each coder independently open-
coded the entire transcript. While there are various coding methods for diverse purposes, our study used Descriptive cod-
ing and In Vivo coding [34]. Descriptive coding summarizes the text in the transcript into one essential short keyword or
phrase and In Vivo coding keeps the text in the subject’s own language to represent grounded concepts well. After the
primary coding was completed, the resultant schemata were aggregated and all differences were reconciled. Several dis-
cussions and meetings were conducted to reconcile and merge the differences among the three coders. Finally, the coders
recoded the entire sample using the aggregated final schema. Reconciliation included the unification of terms in codes
and merging of codes between different expressions with an identical meaning. The rigorous coding process identified
216 quality criteria mentioned by at least two participants out of 3,617 total utterances regarding quality (1,557 positive;
2,060 negative). The iterative coding process included combining different expressions with the same meaning into a sin-
gle quality criterion and selecting proper terms for expressing the criterion by the coders. We utilized Atlas.ti 73, which is
a widely used tool for qualitative data analysis and research.
3.4. Card sorting
Card sorting exercises were conducted to group the quality criteria into appropriate quality dimensions. This method
commonly involves sorting a set of cards, each of which contains a label that addresses a topic, into groups that have
common aspects among them [35]. Fifteen students (mean age: 26.7, SD: 2.4; Major: Management 7, Science 8), differ-
ent from those who had participated in the interviews, were selected to perform the card sorting exercise, and were
assigned to one of five teams. Each team consisted of three subjects. Each quality criterion with its description (identi-
fied through the aforementioned coding analysis) was printed on a card, and the subjects in a team setting were collec-
tively asked to separate the cards into groups. The subjects were allowed to freely discuss their ideas among the team
members during the sorting exercises [35, 36].
An open sorting exercise and a closed sorting exercise were performed in sequence. In the open sorting exercise, sub-
jects were asked to perform a trial sort using 20 randomly selected cards without any predefined dimensions. They were
asked to separate the cards into any number of groups (piles), label each pile of cards, and explain their rationale for
grouping the cards together. This exercise was performed to ensure that the subjects understood the card sorting proce-
dure. Then, a closed sorting exercise was conducted. In this sorting, the 216 quality criteria identified in the previous
coding analysis step were used. The subjects were asked to place the 216 cards into the 13 quality dimensions presented
in Table 2. The sorting took 191 minutes on average per session. There was a separate session for each group. We com-
puted a correlation score of the card sorting study [36]. The correlation score indicates how often a card was put into the
same category by different subjects, as in Eq. (1). The average correlation score was 0.70, which means medium agree-
ment [36].
Correlation(i, j) = P(i, j) / P    (1)

where Correlation(i, j) is the correlation of card i in category j, P(i, j) is the total number of participants who put card i in category j, and P is the total number of participants.
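As an illustration, a minimal Python sketch of this agreement computation follows; the participants, cards and categories are hypothetical, not data from the card sorting sessions.

```python
from collections import Counter

def correlation_scores(placements):
    """placements: one dict per participant, mapping card -> chosen category.
    Returns {(card, category): fraction of participants making that placement}."""
    counts = Counter()
    for participant in placements:
        for card, category in participant.items():
            counts[(card, category)] += 1
    return {pair: c / len(placements) for pair, c in counts.items()}

# Three hypothetical participants sorting two criteria cards into IQ dimensions.
placements = [
    {"font size": "Representational clarity", "outdated content": "Recency"},
    {"font size": "Representational clarity", "outdated content": "Recency"},
    {"font size": "Visual attraction",        "outdated content": "Recency"},
]
scores = correlation_scores(placements)
print(scores[("font size", "Representational clarity")])  # 2/3 agreement, about 0.67
```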
3.5. Results of interview
In this section, we report the results from the coding analysis. The top 10 criteria by the number of respondents are
reported in Table 4. The three quality criteria – ‘highlighting of important points and terms’, ‘writing style’ and ‘the
presence of too much text’ – were supported by over 40 people, which was over two-thirds of the entire subject popula-
tion (60). The criterion mentioned by the largest number of respondents was ‘highlighting of important points and terms’,
which was mentioned 132 times by 47 participants. A criterion is considered positive if all mentions made by all partici-
pants showed the quality as something positive. The results clearly indicate that users consider emphasizing key points
with highlights as a desirable quality feature. Examples of direct mentions by users include ‘Another good thing is that
highlighting the important parts in red helps my understanding; the title in bold is easier to see’, and ‘The important parts
are highlighted with a strong color; therefore, I can take more notice of the significant parts.’
The second highly ranked criterion is ‘writing style’, which is mentioned 69 times by 47 respondents. This one is
interesting as we found that users mentioned two types of writing style: ‘summarized writing style’ and ‘sentence writing
style’. The former indicates that most texts in slides are expressed in summarized and condensed forms, while the latter
means that texts are expressed in complete sentences. Each style received two contradictory interpretations by the partici-
pants. Some participants regarded ‘summarized writing style’ as preferable; however, others responded the opposite was
true. ‘Summarized writing style’ was noted 27 times by 23 participants. Among them, 19 respondents (82.7%) considered
it positive and four (17.3%) negative. Positive examples include: ‘The condensed, summarized style of writing using
important keywords improves my concentration. It helps me reproduce key facts and ideas.’ Responding negatively, one
participant said, ‘I can’t understand what this phrase means. A full, descriptive sentence would improve the explanation,
though it would be a bit longer.’ On the other hand, ‘sentence writing style’ was mentioned 42 times by 24 respondents –
five respondents (20.9%) considered it positive and 19 (79.1%) negative. A positive example was: ‘Explanations using
natural sentences are much more understandable for me. I can catch the points from detailed explanations’, whereas a
negative example was: ‘The sentences are hard to read because they are usually so long. I prefer short, well-constructed
expressions in slides.’ Overall, 80.9% of participants (38 out of 47) preferred summarized expression (summarized writ-
ing style) over complete sentences (sentence writing style) in the slides (57 mentions out of 69).
Thirdly, ‘the presence of too much text (in the slides)’ was mentioned 120 times by 43 participants – 117 times
(97.5%) negatively and 3 times (2.5%) positively. Negative examples include ‘It seems there are too many letters on the
page to grasp the content’, and ‘When you take a course with slides that have a large amount of writing, they usually
move more quickly. Because I cannot read beyond a certain point, I think it is not good.’ A positive example was ‘The
advantage seems to be that I can study only with the slides without having a textbook.’ For more details, we report the
top five criteria in each dimension in the next section and the first two criteria with the actual users’ comments in
Appendix A. The results are organized according to the results of card sorting.
3.6. Results of card sorting
Table 5 presents the distribution of the sorted criteria (i.e. 216 quality criteria matched with 13 quality dimensions in 4
categories) obtained from the closed sorting exercise. Regarding the quality category, the representational criteria
(66.7%) were the most frequently mentioned by users. Contextual (28.2%), intrinsic (4.6%) and reputational (0.5%) cri-
teria followed. The first two categories represent almost 95%, indicating that the participants considered those criteria
that were directly related to comprehending the content of presentation slides naturally and within the context of the task
at hand much more important. Regarding the quality dimension, visual attraction (28%), representational clarity (18%)
and informativeness (12%) were the criteria most frequently mentioned.
Table 4. Top 10 quality criteria from coding analysis by respondents.

Rank | Criterion | No. of mentions | No. of respondents
1 | Highlighting of important points and terms | 132 | 47
2 | Writing style | 69 | 47
3 | The presence of too much text | 120 | 43
4 | The presence of examples | 93 | 36
5 | The presence of additional explanations (for equations, graphs, figures, tables) | 80 | 34
6 | The presence of figures | 56 | 33
7 | Good summarization | 43 | 26
8 | The presence of a large amount of content (slides) | 40 | 26
9 | The line-by-line presence of animation | 42 | 25
10 | The presence of slide numbering | 39 | 23
Table 5 also shows the average number of mentions and respondents for each dimension. Interestingly, ‘Visual attrac-
tion’ and ‘Representational clarity’ are the first and second dimensions in terms of the number of unique criteria.
‘Informativeness’ is the first dimension in terms of the average number of mentions and respondents, although it is the
third dimension in terms of the number of unique criteria. This means that criteria in Informativeness were more uni-
formly shared by many respondents while criteria in Visual attraction and Representational clarity were diverse yet
intensively shared by a small number of respondents. Criteria in Visual attraction and Representational clarity are more
subjective, while those in Informativeness are more objective. Detailed results are given in Table 6, where we report
only the top five criteria and the numbers of mentions and respondents in each dimension.
3.7. Discussion
The results in Table 5 show that, compared with other categories, the Representational category criteria were mentioned
more often by users regarding slide quality, agreeing with prior research on presentation slides. For instance, participants
prefer visually rich slides that contain images, diagrams and graphs [31, 32], indicating that understanding the content of
slides in an efficient and integrative manner is crucial. Because users have to pay a great deal of attention to what presen-
ters offer, users naturally want the presentation materials to be cognitively intuitive and less burdensome.
It should be also noted that our finding on the importance of the Representational criteria is different from prior IQ
research findings on other types of documents, such as Web documents and Wikipedia. In a qualitative study conducted
to identify key criteria for the quality of Web pages, Rieh [11] found that Web users considered content as the most
important object in the Web, followed by graphics and organization/structure. Yaari et al. [17] examined the quality cri-
teria for Wikipedia articles using a group of 60 users and found that users more frequently mentioned coverage and
structure rather than those criteria belonging to the Representational category. The accumulated research findings seem
to suggest that users employ a different set of quality criteria depending on the type of document.
One notable criterion is ‘writing style’. Many participants preferred summarized expressions (summarized writing
style) over complete sentences (sentence writing style) in the slides (by 38 respondents out of 47 and 57 mentions out of
69). It is interesting to note that these results do not completely coincide with Alley and Neeley [31], who proposed that
the alternative (preferred) design depends on succinct sentence-style titles in slides. It should also be noted that our find-
ings differ from Reynolds’s recommendation to use sentences rather than topic statements [30]. However, according to
our results, more users prefer summarized writing style to sentence writing style. The results do not mean that sentence
writing style always results in lower quality slides. Indeed, Alley and Neeley point out that there are exceptions to the sentence headline, for example on title slides, transition slides and any slides on which a sentence is not warranted. Synthesizing
these prior study findings with ours, we conclude that slide authors need to be mindful in determining an appropriate
writing style depending on the context and must pay careful attention to the pros and cons of the two styles. We also
point out that this finding supports the claim that quality is a subjective concept, which depends on subjective judgement
of goodness and usefulness of information [20].
Table 5. Distribution of the criteria.

Category | Dimension | No. of unique criteria | % of unique criteria | Avg. no. of mentions | Avg. no. of respondents
Intrinsic | Accuracy | 3 | 1.4% | 3.3 | 3.3
Intrinsic | Cohesiveness | 3 | 1.4% | 8.7 | 7.7
Intrinsic | Naturalness | 1 | 0.5% | 4.0 | 3.0
Intrinsic | Objective linkage clarity | 3 | 1.4% | 11.3 | 8.6
Representational | Representational clarity | 39 | 18.1% | 14.0 | 8.2
Representational | Representational consistency | 25 | 11.6% | 7.5 | 5.8
Representational | Visual attraction | 61 | 28.2% | 10.9 | 7.7
Representational | Ease of navigation | 19 | 8.8% | 10.6 | 8.4
Contextual | Completeness | 10 | 4.6% | 8.0 | 7.0
Contextual | Informativeness | 27 | 12.5% | 26.4 | 15.0
Contextual | Recency | 1 | 0.5% | 3.0 | 2.0
Contextual | Task appropriateness | 23 | 10.6% | 11.2 | 7.9
Reputational | Author/institutional reputation | 1 | 0.5% | 3.0 | 3.0
Total | | 216 | 100.0% | 7.2 | 5.6
‘Naturalness’ of IQ means the degree to which information is expressed by conventionally typified terms and forms in accordance with some generally accepted reference source(s). ‘The presence of equations’ is categorized into this dimension on the basis of the card sorting results in Phase 1. This allocation is reasonable considering that equations are generally made up of mathematical symbols, which represent conventional and typified concepts in a domain. ‘Task appropriateness’ is the degree to which information is proper and useful for a given task. This dimension includes those aspects denoting how well the slides fit the tasks for which they are designed, including class presentation, self-study, note-taking and reporting. As for task appropriateness, criteria such as ‘good summarization’, ‘the presence of exercise’ and ‘the presence of question’ are assigned to this dimension.

Table 6. Examples of the criteria with the numbers of mentions and respondents, ordered by respondents (shown as mentions/respondents).

Intrinsic
- Accuracy: The presence of typos (5/5); The presence of inaccurate explanations (3/3); The presence of incorrect images related to the content (2/2)
- Cohesiveness: The presence of irrelevant content (18/15); Irrelevant page title for the content (5/5); Strong content connectivity (3/3)
- Naturalness: The presence of equations (4/3)
- Objective linkage clarity: Content flow (25/19); Reversed content order against the outline (7/5); Weak connection between outline and content (2/2)

Representational
- Representational clarity: Highlighting of important points and terms (132/47); Writing style (69/47); Resolution of figures/images (45/22); Font size (41/21); Sentence spacing (28/19); Explanation with comparison (26/18)
- Representational consistency: Inconsistent font face (18/16); Inconsistent font size in content or title (21/15); Inconsistent indentation level (14/10); Consistent use of background template (12/9); Inconsistent animation effects (speed, step, etc.) (9/9)
- Visual attraction: The line-by-line presence of animation (42/25); Clean and neat slide design (35/21); The presence of animation (30/21); Color scheme (25/15); Default font color (20/15)
- Ease of navigation: The presence of slide numbering (39/23); The presence of a table of contents (27/19); The presence of textbook page numbers (9/7); The presence of a title for each page (8/7); Showing subsection titles on each page (7/6)

Contextual
- Completeness: The presence of references (15/14); The presence of necessary information on the cover page (title, author, etc.) (14/11); The presence of necessary content (11/10); The absence of default/general content for a concept (8/6); The presence of an appendix (3/3)
- Informativeness: The presence of too much text (120/43); The presence of examples (93/36); The presence of additional explanation for equations, graphs, figures and tables (80/34); The presence of figures (56/33); The presence of a lot of content (slides) (40/26)
- Recency: Outdated content (3/2)
- Task appropriateness: Good summarization (43/26); The presence of exercise (21/19); The presence of question (22/15); Enough white space on a page (20/15); Unsuitable for printing (11/11)

Reputational
- Author/institutional reputation: Slides by the textbook publisher (3/3)
One of the main contributions of the present study is providing a rich set of quality criteria (i.e. 216) for presentation
slides. Because the open-ended interviews did not provide any prior criteria, the participants could come up with a vari-
ety of quality criteria. This set of criteria could be a foundation for developing diverse metrics at multiple levels for pre-
sentation slides. Another contribution is the measurement and consequent identification of the relative importance of
each criterion based on user utterances obtained through the think-aloud process. The measurement data served as a basis
for subsequent feature selection out of the whole set of criteria for automatic assessment in ranking. We selected and
implemented the quality features by the number of mentions and respondents. Such insights are unavailable in the guide-
lines provided by professionals.
One limitation is that our user study considered only lecture slides, not other types of slides. However, experiments
later prove that quality criteria from lecture slides are also effective in IQ assessment for other types of slides (see
Section 5.1). Another limitation is that all our participants were Korean students. Their perceptions might be different
from other ethnic groups due to cultural differences. Extant literature is unclear on how culture affects slide feature pre-
ferences and evaluations. Moreover, even though they are also information consumers, students might have different
quality expectations than business professionals or subject experts. Nevertheless, it is important to note that the study
findings provide a significant starting point for the establishment of slide quality criteria and dimensions from users’
perspectives.
4. Automatic slide quality assessment
In this section, we present our approach to automatically assessing the quality of presentation slides, ranking them via
some of the quality criteria identified in our previous user study. Figure 3 describes our overall process of the automatic
assessment approach. More specifically, our approach consists of two stages: initial search and slide quality assessment.
The initial search stage aims to identify slides that are relevant to the query, without the consideration of slide quality.
The second stage assesses the quality of each relevant slide and produces a quality-based ranking. In the second stage,
we perform (1) data annotation to build a ground-truth dataset, which was divided into a training set and a testing set, (2)
feature extraction to serve as input for an LTR model, and (3) an experiment of LTR model training and prediction/ranking.

Figure 3. Overall process of automatic assessment (initial search with BM25, followed by quality annotation, feature extraction based on the IQ taxonomy, LTR model training, and prediction/ranking).
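For reference, the initial relevance search relies on the Okapi BM25 ranking function; its standard formulation is sketched below. The parameter values noted afterwards are common defaults, not settings reported in this paper.

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{IDF}(t) \cdot
  \frac{f(t, d)\,(k_1 + 1)}
       {f(t, d) + k_1 \left( 1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}} \right)}
```

Here f(t, d) is the frequency of term t in slide document d, |d| is the length of d, avgdl is the average document length in the collection, and k_1 and b are free parameters (commonly k_1 ≈ 1.2 and b = 0.75).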
4.1. Rationale and slide quality assessment
Compared with conventional ranking models such as BM25 and the statistical language model, LTR is an effective rank-
ing model because it offers additional benefits such as automatically tuning parameters, combining multiple evidence
and avoiding over-fitting [18]. In addition, LTR has been successfully adapted to various quality-based ranking tasks that
utilize Web resources [23, 24, 25]. However, a main disadvantage of LTR is that creating training data by labelling every example is quite expensive, particularly when the dataset is large. If this labelling issue can be overcome, LTR is a better choice than the traditional ranking models as it offers the several advantages already mentioned. Thus, we decided
to use LTR in conjunction with the features extracted from our IQ framework.
This section explains the rationale and realization of the learning scheme. LTR is a supervised learning method that includes training and testing phases [18, 37]. The training dataset consists of a number of queries and slides. The quality of the slides with respect to the query is represented by several grades, with a higher grade corresponding to higher quality. Let Q = {q_1, q_2, ..., q_m} be the query set, where q_i is the i-th query, and let S_i = {s_{i,1}, s_{i,2}, ..., s_{i,n_i}} be the set of slides associated with q_i. L = {1, 2, ..., l} is the grade label set, ordered l > l-1 > ... > 1. Suppose that L_i = {l_{i,1}, l_{i,2}, ..., l_{i,n_i}} is the set of labels associated with query q_i, where n_i denotes the size of S_i and L_i, and l_{i,j} denotes the j-th grade label in L_i, representing the quality degree of s_{i,j} with respect to q_i. The training set is denoted as T = {(q_i, S_i), L_i}_{i=1}^m. A feature vector x_{i,j} = u(q_i, s_{i,j}) is derived from each query-slide pair (q_i, s_{i,j}), i = 1, 2, ..., m; j = 1, 2, ..., n_i, where u denotes the feature functions. The training dataset is thus represented as S' = {(x_i, L_i)}_{i=1}^m, where x_i = {x_{i,1}, x_{i,2}, ..., x_{i,n_i}}. Our goal is to train a ranking model f(q, s) = f(x) that assigns a score to a given feature vector x. We then select a ranking list from all possible ranking lists for the given query q_i and the associated slides S_i using the scores given by the ranking model f(q_i, S_i). The testing data are denoted as T' = {(q_{m+1}, S_{m+1})}, consisting of a new query q_{m+1} and associated slides S_{m+1}. A feature vector x_{m+1} is created from S_{m+1}, and a score is assigned to the slides S_{m+1} by the trained ranking model. A ranking list of slides is then formed based on the sorted scores. A local ranking model is a function of a query and slides, or equivalently, a function of a feature vector created from a query and slides. Figure 4 depicts the flow of the LTR scheme of quality-based ranking for presentation slides.

Table 7. Description of datasets.

Dataset | Source | No. of queries | No. of slides
SLIDES-SA | SlideShare | 140 | 935
SLIDES-SF | SlideShare | 500 | 24,995
SLIDES-GA | Google | 36 | 180
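LambdaMART and AdaRank are commonly trained with existing LTR toolkits (e.g. RankLib), which consume query-grouped training data in the LETOR/SVMlight text format. The fragment below is purely illustrative: the grades, query ids, feature values and file names are hypothetical, not data from our experiments.

```
# <grade> qid:<query id> <feature index>:<value> ... # <slide file>
3 qid:1 1:0.02 2:0.45 3:0.11 # dm_intro.pptx   (grade 3: high quality)
1 qid:1 1:0.31 2:0.07 3:0.25 # dm_notes.pdf    (grade 1: low quality)
2 qid:2 1:0.12 2:0.28 3:0.09 # ml_basics.pptx
```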
In the training phase, we applied the LTR algorithm to train the model for slide quality assessment. We first created a
set of training slide queries and then generated relevant slides via BM25 and Google. Next, a group of judges were asked
to determine the quality grade of each query and the returning slides set (see Section 5.1) by rating the quality of each
slide and ranking the order of the slides in a query set. With the annotated data, the LTR algorithm generated a global
model that considered the ranking relationships among the slides of each training query set. In the testing phase, we used
k-fold cross-validation to generate a single held-back set from the dataset given by the BM25 algorithm and Google
slide search. More specifically, k − 1 different partitions of the whole dataset were used to train the LTR model, and the
remaining partition was used as the test data. The learned LTR model then predicted the ranking score of each slide in
the query set. The system produced a ranked list based on these scores. The k results were then averaged to produce a
single assessment. In this manner, the assessment stage can be seen as a period in which re-ranking of the initial rele-
vance search results was performed in consideration of quality. Both training and test data underwent the feature extrac-
tion procedure (see Section 4.2).
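The following minimal Python sketch shows the shape of this cross-validation loop. The train_ltr function and the returned model's predict method are hypothetical stand-ins for an actual LambdaMART or AdaRank trainer, and ndcg_at_k is the helper sketched in the Introduction.

```python
import numpy as np
from sklearn.model_selection import KFold

def kfold_ndcg(queries, k=5):
    """queries: list of (feature_matrix, grade_list) pairs, one entry per query.
    Trains on k-1 folds, ranks the held-out fold, and averages NDCG@10."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(queries):
        model = train_ltr([queries[i] for i in train_idx])  # hypothetical LTR trainer
        for i in test_idx:
            X, grades = queries[i]
            order = np.argsort(-model.predict(X))           # sort slides by predicted score
            scores.append(ndcg_at_k([grades[j] for j in order], k=10))
    return float(np.mean(scores))                           # single averaged assessment
```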
4.2. Features
This section describes the quality features for automatic assessment, which were inspired by our own user study findings. Quality criteria can be broadly divided into two types: measurable and non-measurable [17]. Measurable criteria are those that can be objectively and reliably extracted by a computer program without user intervention (e.g. the number of words in the slides), whereas non-measurable criteria are those that are subjectively assigned by a human (e.g. writing style). We selected 35 measurable criteria from the total of 216 criteria identified earlier through our user study, primarily considering two conditions: the number of respondents and implementation feasibility. The first condition was based on our assumption that quality criteria mentioned by more users are likely to be more influential in determining the quality of slides than criteria mentioned less often. For example, in the case of representational clarity, when deciding which features to implement, we considered the criteria from the top of the list (see Table 6). The criterion 'highlighting of important points and terms' is the most frequently mentioned one for that dimension. We therefore selected highlighting of important points and modelled it by checking for the presence of 'bold' and 'italics', implemented by counting the number of bold and italic terms in the text. Fortunately, the POI extractor6 supports the extraction of bolded and italicized terms from the slide text. However, as a
counterexample, the second most frequent criterion, 'writing style', is very hard to measure objectively by computer because of its abstractness; designing a method to measure it would be highly complicated and expensive. Therefore, we excluded this criterion.

Figure 3. Overall process of automatic assessment: initial search (BM25), quality annotation, feature extraction organized by the IQ taxonomy (accuracy, cohesiveness, etc.), model training, and prediction/ranking for slide quality assessment.

In this manner, we divided the criteria into measurable and non-measurable ones. We
sought to obtain as many measurable features as possible from the user criteria so as to increase the measurement accuracy of the automated quality assessment approach. In total, we devised 65 measurable features across 10 IQ dimensions from 35 related user criteria. Appendix B shows how each implemented feature is related to its respective dimension via the user criterion obtained from our user study. In addition, we adopted some known features from prior research, such as readability and entropy [8, 13, 14, 25]. We named the dimension of each feature after the dimension of the criterion from which the feature was derived. For example, the dimension of the feature numTypos is accuracy because the feature was derived from the criterion 'the presence of typos', whose dimension is accuracy according to the card sorting exercises.
The next step in the distillation of quality features was to devise extraction methods from the criteria to assess the quality of slides. The Intrinsic category includes accuracy and cohesiveness features. Participants in our user study mentioned that 'the presence of typos' affected the accuracy of the slides, and we formulated this user criterion as the measurable feature numTypos [25]: the number of typos in the slides was counted using the spell checker in LanguageTool.5 For cohesiveness, users mentioned 'strong content connectivity', and we measured it by calculating the entropy of the slide texts. This feature was also reported in previous studies [8, 25]. The entropy of a document D is computed over the individual document terms as:
$$-\sum_{w \in D} p_D(w) \log p_D(w), \quad \text{where } p_D(w_i) = \frac{tf_{w_i,D}}{\sum_{w_j \in D} tf_{w_j,D}} \qquad (2)$$

The probability of word $w_i$ is computed using a maximum likelihood estimate $p_D(w_i)$; $tf_{w_i,D}$ is the term frequency of $w_i$ in document $D$, and $\sum_{w_j \in D} tf_{w_j,D}$ is the sum of the frequencies of all terms in $D$.
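Equation (2) translates directly into a few lines; this minimal sketch assumes terms is the token list extracted from a slide file.

import math
from collections import Counter

def text_entropy(terms):
    """Entropy of the slide text (equation (2)) with p_D estimated
    by maximum likelihood from term frequencies."""
    tf = Counter(terms)
    total = sum(tf.values())
    return -sum((n / total) * math.log(n / total) for n in tf.values())

# e.g. text_entropy('the model ranks the slides'.split())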
The Representational category includes clarity, consistency, attraction and ease of navigation. Representational features include numHighlights from 'highlighting important points and terms' (clarity), conFontFace/Size and conFontFace/SizeRatio from 'inconsistent font face/size' (consistency), preAnim/numAnim/avgNumAnim from
'line-by-line animation effect' and defFontColor from 'main font colour' (attraction), and preSlideNum from 'the presence of page numbering' (ease of navigation); these form a set of features unique to presentation slides that is not found in prior studies. In particular, 'highlighting of important points and terms' was the most frequently mentioned criterion. The total number of highlights in the slides, including bolds, italics, underlines and shadows, was counted for
avgNumHighlights. For consistency, features such as conFontFace and conFontFaceRatio measure the consistency of the font face with a binary value and a ratio, respectively. We first identified the dominant font face used across all of the slides and then, for conFontFace, checked whether the dominant font face of any slide (page) changed throughout the slides. For conFontFaceRatio, we used the ratio of the number of slides whose dominant font face differed from the overall dominant font face to the total number of slides. conFontSize/SizeRatio was estimated in the same manner.
Consistency of the font face of a slide file $s$ is estimated as:

$$\mathrm{conFontFace}(s) = \begin{cases} 0 & \text{if } dominantFontCount(s) \geq 2 \\ 1 & \text{otherwise} \end{cases} \qquad (3)$$
where $dominantFontCount(s)$ is the number of distinct per-page dominant font faces in a slide file $s$. For attraction, the total number of animation effects in the slides (numAnims) and the number of animation effects per slide (avgNumAnims) were counted; for avgNumAnims, we used the ratio between the number of animation effects and the number of pages. For preAnim, the presence of animation in the slides was recorded as a binary value (yes or no). Furthermore, the default font colour (defFontColor) of the slides was identified: we measured all font colours used and identified the most dominant one, represented in RGB format (e.g. R = 0, G = 0 and B = 0 for black). According to the user feedback, some font colours, such as green, grey, yellow and dark blue, severely harm visual attractiveness when used as the main colour, so avoiding them as the main font colour contributes to the attractiveness of slides. Interestingly, we found hardly any user comments about colours that directly improve slide quality; it seems that users tend not to comment when the slides have no visible problem with font colours. When we checked the main font colour of the slides in the high-quality class, black was the most common. For ease of navigation, the presence of slide numbers in the slides (preSlideNum) was measured with a binary value.
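The extraction itself was done with the Apache POI Java API; as a rough analogue (an assumption for illustration, not the authors' code), the python-pptx library exposes the same run-level font attributes for .pptx files, which is enough to sketch conFontFace (equation (3)) and conFontFaceRatio.

from collections import Counter
from pptx import Presentation  # assumed stand-in for the POI extractor

def font_face_consistency(path):
    """conFontFace: 1 if a single font face dominates every page, else 0;
    conFontFaceRatio: fraction of pages whose dominant face deviates."""
    page_fonts = []
    for slide in Presentation(path).slides:
        names = [run.font.name
                 for shape in slide.shapes if shape.has_text_frame
                 for para in shape.text_frame.paragraphs
                 for run in para.runs if run.font.name]
        if names:
            page_fonts.append(Counter(names).most_common(1)[0][0])
    if not page_fonts:
        return 1, 0.0
    con_font_face = 0 if len(set(page_fonts)) >= 2 else 1
    dominant = Counter(page_fonts).most_common(1)[0][0]
    ratio = sum(f != dominant for f in page_fonts) / len(page_fonts)
    return con_font_face, ratio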
The Contextual category includes completeness, informativeness, recency and task appropriateness. Features such as preCoverPageInfo from 'the presence of necessary information in the cover page' (completeness), numDiagrams from 'the presence of diagram' (informativeness), preExample/numExamples/avgNumExamples from 'the presence of example' (informativeness), and preSummary from 'the presence of summary' (task appropriateness) were new features in this category. For preCoverPageInfo, we checked whether the first page of the slides had a title in the title text box and information in the subtitle box. For numDiagrams, we summed the respective numbers of lines, autoshapes (drawing objects with a particular shape) and textboxes extracted from the slides. For preExample, numExamples and avgNumExamples, we used textual cues by checking for keywords such as 'for example' and 'for instance' in the texts extracted from the slides; features measured using textual cues, such as numExamples, are marked in Appendix B. Recency of the slides was measured by how many months had passed since the slides were created (age) and modified (recency) [14]. For preSummary, the presence of a summary in the slides was measured as a binary value by checking for keywords such as 'summary' and 'outline'. The Reputational category includes author/institutional reputation; however, we did not consider it in this study (see Appendix B for a full list of the quality features implemented).
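A minimal sketch of these textual-cue features, assuming slide_texts holds the extracted text of each page; the cue keyword lists follow the descriptions above and Appendix B.

EXAMPLE_CUES = ('for example', 'for instance')
SUMMARY_CUES = ('summary', 'outline')

def textual_cue_features(slide_texts):
    """preExample, numExamples, avgNumExamples and preSummary via keyword checks."""
    low = [t.lower() for t in slide_texts]
    num_examples = sum(t.count(cue) for t in low for cue in EXAMPLE_CUES)
    return {
        'preExample': int(num_examples > 0),
        'numExamples': num_examples,
        'avgNumExamples': num_examples / max(len(low), 1),
        # checking the last three pages for summary cues is our reading of
        # the appendix description ('in the last 3 pages')
        'preSummary': int(any(cue in t for t in low[-3:] for cue in SUMMARY_CUES)),
    }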
5. Experiments
5.1. Experimental setting
To account for the different characteristics of datasets and improve the overall robustness of the empirical validation, we conducted multiple experiments. Specifically, three datasets were employed: SLIDES-SA (SA: data obtained from SlideShare and Annotated by humans), SLIDES-SF (SF: data obtained from SlideShare and classified by Featured selection) and SLIDES-GA (GA: data obtained using Google and Annotated by humans).
For the first dataset, we collected 1276 PowerPoint presentation slides randomly selected from SlideShare.1 The collected slides covered four study areas: Technology, Business, Education and Health. We asked judges to manually assign a quality grade (low, fair or high) to the slides; these labels indicate the overall quality across the four IQ categories, i.e. Intrinsic, Representational, Contextual and Reputational. Three annotators judged each slide. Over the whole dataset, the inter-annotator agreement among the three annotators was κ = 0.63, which indicates substantial agreement
[38]. Finally, we obtained 935 slides (low: 222, fair: 447, and high: 266) whose quality labels had been agreed on by at
least two annotators out of three. We refer to these annotated data as SLIDES-SA.
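Agreement was computed with Fleiss' kappa [38]; the following is a minimal sketch over an items-by-categories count matrix (three raters, three grades), shown for reference rather than as the authors' script.

def fleiss_kappa(counts):
    """counts[i][j]: number of raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # proportion of all assignments falling in each category
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(len(counts[0]))]
    # mean per-item agreement
    p_bar = sum((sum(c * c for c in row) - n_raters) /
                (n_raters * (n_raters - 1)) for row in counts) / n_items
    p_e = sum(p * p for p in p_j)          # chance agreement
    return (p_bar - p_e) / (1 - p_e)

# e.g. fleiss_kappa([[3, 0, 0], [0, 2, 1], [1, 1, 1]])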
To conduct a large-scale experiment with a different quality evaluation perspective, we crawled another 24,995 slides from SlideShare, comprising 3655 featured slides and 21,340 non-featured slides. SlideShare provides editors' picks as featured slides on the Web site.4 We assumed that the featured slides were of high quality and the others not as high quality as the featured ones. We refer to these data as SLIDES-SF.
Furthermore, for comparison with one of the most popular and successful search engines, we collected 180 slides from Google using a filetype operator with 36 queries in two disciplines, Computer Science and Management (e.g. introduction to programming filetype:pptx or filetype:ppt). We downloaded the top five slides per query. For this dataset, we employed six graduate students as annotators, three majoring in Computer Science and three in Management, to assign a quality grade on a three-point scale to the slides. The three annotators from the corresponding group judged each slide file related to their major, and we averaged the scores obtained from the annotators, rounding the average to the nearest integer. Consequently, we obtained 180 slides (low: 28, fair: 93 and high: 59). We refer to these annotated data as SLIDES-GA.
SLIDES-SA and SLIDES-SF consist of slides crawled from SlideShare but with different annotators. The featured slides in SLIDES-SF were selected as high-quality slides by the curators at SlideShare. However, given that we cannot identify the selection process or the standards used by the curators, we needed another quality dataset (SLIDES-SA) with manual annotation by our own annotators, who were familiar with the IQ dimensions and criteria. With these two datasets, we checked for possible differences caused by different annotators in later experiments (see Section 5.2). Furthermore, we manually built SLIDES-GA from the search results returned by Google so that we could compare the Google slide search with our proposed algorithm. There were no overlaps among the three datasets. A summary of the datasets is presented in Table 7. To extract the proposed quality features of the slides, we used Apache POI,6 a Java API for reading and writing Microsoft Office files such as Word, Excel and PowerPoint.

Table 7. Description of datasets.

Dataset     Source       No. of queries   No. of slides
SLIDES-SA   SlideShare   140              935
SLIDES-SF   SlideShare   500              24,995
SLIDES-GA   Google       36               180
We applied the Okapi BM25 algorithm [39] as the baseline method for SLIDES-SA and SLIDES-SF. It is one of the most effective retrieval algorithms and is based on a probabilistic relevance framework. To obtain search results via BM25, we manually created the query sets for SLIDES-SA: we randomly selected 140 keywords (e.g. knowledge discovery, international business, social education, etc.) from the content of the slides in each category. For SLIDES-SF, we selected the top 500 most frequent tags (e.g. marketing, social media, etc.) from the tagsets of the featured slides only, to avoid queries whose BM25 search results contain no featured slides; because SLIDES-SF contains only a small number of featured slides, queries drawn from the tags of all slides would return featured slides too sparsely, which would be a constraint on the experiment. These keywords were used as queries for the BM25 search algorithm to produce the initial ranked slide lists for SLIDES-SA and SLIDES-SF. We used the open-source search engine Apache Lucene7 (version 4.9) to generate slide search results using the BM25 search algorithm. Further, we used Google search results as a comparative baseline for SLIDES-GA: we used 36 subject names (e.g. introduction to programming, investment, etc.) in Computer Science and Management as queries. While crawling SLIDES-GA from Google, we recorded the ranking of the slides and treated those rankings as the initial search results. The initial lists of slides from Okapi BM25 and Google were then re-ranked using the LTR algorithm with our proposed features. We utilized two widely adopted listwise LTR algorithms, AdaRank [40] and LambdaMART [41], as implemented in RankLib.8
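For reference, a minimal sketch of the Okapi BM25 score [39] follows; k1 = 1.2 and b = 0.75 are the common defaults and an assumption here, since the paper relies on Lucene's implementation and does not report parameter values.

import math

def bm25_score(query_terms, doc_terms, df, n_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25: sum over query terms of IDF times a saturated term frequency.
    df maps a term to its document frequency in the collection."""
    score, dl = 0.0, len(doc_terms)
    for q in query_terms:
        tf = doc_terms.count(q)
        if tf == 0 or q not in df:
            continue
        idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avg_len))
    return score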
We set the parameter values for our experiments as follows. AdaRank: number of iterations = 500 (the number of rounds to train), tolerance = 0.002 (tolerance between two consecutive rounds of training), max selection count of a feature = 5 (the maximum number of times a feature can be consecutively selected without changing performance). LambdaMART: number of trees = 1000, number of leaves = 10 (leaves per tree), learning rate = 0.1 (the shrinkage factor, i.e. the weight applied to the score of each regression tree when the trees are ensembled). We chose these values empirically for roughly the best performance.
We conducted a 10-fold cross-validation and calculated the average performance of each LTR algorithm. We report
two standard retrieval measures: the normalized discounted cumulative gain (NDCG) and mean reciprocal rank (MRR).
The NDCG [42], a widely used metric in the information retrieval field, is adopted to measure the ranking performance.
To calculate the NDCG, the discounted cumulative gain (DCG) at a particular rank position p is first calculated in a way
that penalizes the score gain near the bottom more than near the top:
DCG @ p=Xp
i= 1
2reli � 1
log2 i+ 1ð Þ ,NDCG @ p= DCG @ p
IDCG @ pð4Þ
Kim et al. 16
Journal of Information Science, 2016, pp. 1–27 � The Author(s), DOI: 10.1177/0165551516661917
at KOREA ADV INST OF SCI & TECH on October 3, 2016jis.sagepub.comDownloaded from
where IDCG@p serves as the normalization term that guarantees the ideal NDCG@p to be 1. We summarize the performance by averaging the NDCGs over the test query set. To measure the performance from a different perspective, we adopt the MRR:
$$\mathrm{MRR} = \frac{1}{|S|} \sum_{i=1}^{|S|} \frac{1}{rank_i} \qquad (5)$$
where $rank_i$ denotes the rank of the first high-quality slide (or the first low-quality slide) in the ranked list for the $i$-th query slide set and $|S|$ is the total number of test queries. We denote the MRR for the high- and low-quality slides as MRR_H and MRR_L, respectively. A better ranking method yields an MRR_H closer to 1 and an MRR_L closer to $1/N$, where $N$ is the number of slides observed.
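Equations (4) and (5) transcribe directly into code; this sketch assumes the graded labels are listed in ranked order for each test query.

import math

def ndcg_at(labels, p):
    """Equation (4): labels are the graded relevances rel_i in ranked order."""
    def dcg(ls):
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(ls[:p]))
    ideal = dcg(sorted(labels, reverse=True))   # IDCG@p
    return dcg(labels) / ideal if ideal > 0 else 0.0

def mrr(ranked_lists, is_target):
    """Equation (5): reciprocal rank of the first target slide, averaged over
    queries (high-quality targets for MRR_H, low-quality for MRR_L)."""
    total = 0.0
    for labels in ranked_lists:
        rank = next((i + 1 for i, lab in enumerate(labels) if is_target(lab)), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_lists)

# e.g. mrr([[3, 1, 2], [1, 3, 2]], is_target=lambda lab: lab == 3) == (1 + 0.5) / 2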
5.2. Results
Figure 5 compares the NDCG@k scores for AdaRank and LambdaMART using all the proposed features against the baselines (BM25 and Google) on the three datasets: (a) SLIDES-SA, (b) SLIDES-SF and (c) SLIDES-GA. The results confirm that both AdaRank and LambdaMART outperform the baselines, with LambdaMART scoring higher than AdaRank. All the differences between AdaRank or LambdaMART and the baselines at all NDCG@k positions were statistically significant according to the Wilcoxon sign test (α < 0.05), except between LambdaMART and AdaRank on SLIDES-GA, which is a relatively small dataset. We achieved increases over the baselines of 23.4% (NDCG@5, LambdaMART, SLIDES-SA), 50.3% (NDCG@5, LambdaMART, SLIDES-SF) and 9% (NDCG@3, LambdaMART, SLIDES-GA). The results demonstrate that the proposed features can be used to effectively rank high-quality slides.

Figure 5. Performance of leveraging high-quality slides in the ranking using all features: (a) SLIDES-SA, (b) SLIDES-SF and (c) SLIDES-GA.
To analyze the impact of each category and each dimension, we measured the performance using the features of each category and each dimension separately. This analysis enabled us to identify how each category and each dimension contributes to the results. We conducted these additional experiments with only SLIDES-SA and SLIDES-SF because large datasets generally guarantee more stable results. Figure 6 presents the NDCG@5 results (10-fold, LambdaMART) obtained for each IQ category (Figure 6(a)) and each IQ dimension (Figure 6(b)).
From Figure 6(a), it is clear that the Representational category was the most important in both datasets, followed by the Contextual and Intrinsic categories. From Figure 6(b), significant differences can be observed, with representational clarity (0.87) producing the strongest impact, followed by informativeness (0.8) and visual attraction (0.79) in SLIDES-SA. Representational clarity (0.71), informativeness (0.67), visual attraction (0.63) and recency (0.61) are the effective dimensions in SLIDES-SF. Representational clarity, informativeness and visual attraction consistently have a high impact on performance in both datasets. However, completeness, ease of navigation and accuracy scored lower than the other dimensions, suggesting that these quality dimensions do not have much of an impact on the quality of slides. These experimental results indicate that representational clarity and informativeness are essential components in the assessment of presentation slide quality. The finding that representational clarity is important for assessing slide quality is rather natural because presentation slides are a visually oriented communication device between presenter and audience, and representational clarity is directly related to the conveying of messages [29, 30, 32]. It should be noted that
representational clarity is a distinctive and discriminative IQ dimension for slides, even though this dimension has not been prominent in prior studies [8, 14, 25]. These results contrast with previous reports in which readability was found to be the most important factor for the quality ranking of Web documents [8], length, structure and style were central for assessing the quality of Wikipedia articles [14], and review and user features were the most important for ranking quality in the Q&A domain [25].
In our experiments, LambdaMART exhibited better performance than AdaRank; we therefore measured feature importance using LambdaMART. LambdaMART computes the importance of a feature by summing the number of times it is used in splitting decisions [43, 44]. The relative importance of the other features is then obtained by normalizing against the importance of the largest feature: the most important feature has an importance score of 1, and the other features have relative importance scores between 0 and 1. For this experiment, we built two LambdaMART model sets containing 1000 trees with 10 leaves and a learning rate of 0.1 during training on SLIDES-SA and SLIDES-SF, and then calculated the feature importance from the models. The top 10 features are listed in Table 8. Although there are some differences in feature importance between the two datasets, eight features appear in the top 10 of both datasets – enough to support general conclusions. Features related to representational clarity were found to be highly important: four of the eight common features are representational clarity features. Another notable point is that six of the eight common features (clarity: avgNumFontNames, numHighlights, avgFontSize and avgLineSpace; informativeness: numSlides and numImages) are relatively simple to estimate but have a significant impact
on the quality of slides. Out of these features, the five excluding numImages have not been identified by any prior study.

Figure 6. Performance by IQ category and dimension (NDCG@5, LambdaMART, 10-fold) on SLIDES-SA and SLIDES-SF.

Table 8. Feature importance given by LambdaMART.

Rank   SLIDES-SA                                           SLIDES-SF
       Importance   Feature             IQ dimension       Importance   Feature           IQ dimension
1      1.0          BM25†               Task appr.         1.0          ARI               Rep. clarity
2      0.879        avgFontNames†*      Rep. clarity       0.844        numSlides†*       Informativeness
3      0.458        avgNumFontColors*   Vis. attraction    0.834        avgFontSize†*     Rep. clarity
4      0.404        numImage†           Informativeness    0.6          entropy†          Cohesiveness
5      0.431        numHighlights†*     Rep. clarity       0.518        avgLineSpace†*    Rep. clarity
6      0.284        avgFontSize†*       Rep. clarity       0.513        BM25†             Task appr.
7      0.228        avgLineSpace†*      Rep. clarity       0.507        numImage†         Informativeness
8      0.209        entropy†            Cohesiveness       0.474        avgFontNames†*    Rep. clarity
9      0.197        numFontSize*        Rep. clarity       0.368        Flesh             Rep. clarity
10     0.193        numSlides†*         Informativeness    0.276        numHighlights†*   Rep. clarity

† common in both datasets; * newly proposed in this study.
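The normalization behind the importance scores in Table 8 is a one-liner; split_counts is a hypothetical mapping from feature name to its raw split count in the trained forest.

def relative_importance(split_counts):
    """Scale raw split counts so the most-used feature scores 1.0
    and the rest fall in (0, 1]."""
    top = max(split_counts.values())
    return {f: c / top
            for f, c in sorted(split_counts.items(), key=lambda kv: -kv[1])}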
To demonstrate the effectiveness of our ranking strategy using other measures, we traced the highest- and lowest-quality slides in the ranked list for the i-th query result. Table 9 shows the MRR_H and MRR_L values given by the baseline (BM25) and by LambdaMART with all the proposed features in SLIDES-SA and SLIDES-SF. Our proposed method with all the features achieved a 31.8% improvement in MRR_H and a 38.6% decrease in MRR_L over the baseline in SLIDES-SA. In SLIDES-SF, we achieved a 45.7% improvement in MRR_H and an 18.3% decrease in MRR_L over the baseline. Note that the baseline value of MRR_H (0.38) is much lower than that of MRR_L (0.855) in SLIDES-SF because the featured slides are sparse in the top 10 search results, so non-featured (low-quality) slides occupy high positions for many queries. Despite this problem, our proposed method achieved improvements in MRR over the baseline across the two datasets, clearly demonstrating the robustness of the approach. Finally, we present the average scores of major features in the high-quality and low-quality classes in Table 10. These results reveal the differences between high- and low-quality slides in terms of their constituent features and suggest that better-quality slides tend to contain more slides (pages), images, font colours, font names and highlights, contributing to better representational clarity and informativeness.

Table 9. Performance of ranking high-quality and low-quality slides (MRR).

Dataset     Baseline (BM25)       All features
            MRR_H     MRR_L       MRR_H     MRR_L
SLIDES-SA   0.48      0.388       0.663     0.238
SLIDES-SF   0.38      0.855       0.554     0.698

Table 10. Mean of major features in the high- and low-quality classes (SLIDES-SA/SLIDES-SF).

Feature                    High        Low
numSlides                  47.3/42.6   33.6/14.9
avgNumImages               1.2/2.9     0.5/1.2
numFontColors              6.9/7.8     5.1/3.2
numFontNames               2.7/4.7     2.18/2.1
numHighlights (per page)   2.8/2.0     2.18/1.8
6. Conclusion
In this paper, we have investigated the essential elements of presentation slide quality and proposed a new IQ framework developed specifically for presentation slides on the basis of direct user inputs. From the user study conducted in Phase 1, we elicited a rich set of criteria for slide quality. We discovered that users felt that the quality of slides was mostly affected by criteria such as 'highlighting of important points and terms', 'writing style' and 'the presence of too much text'. Furthermore, Representational criteria were emphasized in determining the quality of slides: visual attraction, representational clarity and informativeness were the most frequently mentioned IQ dimensions. Because the open-ended interviews did not mandate any predefined criteria or guidelines, the participants were able to present a variety of quality criteria. These criteria provided valuable clues for the development of diverse metrics at multiple levels for slide quality.
We also proposed a comprehensive LTR method developed specifically to promote high-quality and penalize low-
quality computerized presentation slides in Phase 2. We presented 69 features that capture 10 IQ dimensions such as
representational clarity, informativeness and visual attraction. We distilled these features through an intensive user study
and applied them to automatic assessment of the IQ of presentation slides. LambdaMART and AdaRank models trained
by human-annotated data showed substantially better performance than the baseline methods in ranking the quality of
slides. We demonstrated the generality of the proposed framework with three different datasets from different sources
with different quality assessments. Across the datasets, we found that representational clarity, informativeness and visual
attraction were the most effective features for ranking the quality of presentation slides, whereas completeness, ease of navigation and accuracy were relatively unhelpful. These results are consistent with the results of our user study. Six features (clarity: avgNumFontNames, numHighlights, avgFontSize and avgLineSpace; informativeness: numSlides and numImages) were very effective in identifying high-quality slides. The features numHighlights (highlighting of important points and terms), numImages (the presence of figures) and numSlides (the presence of a large amount of content) were the common features (criteria) confirmed to be important via both LambdaMART (Table 8) and the user interviews (Table 4). Based upon the results obtained in Phase 2, Figure 7 provides a revised taxonomy with the most important and effective dimensions emphasized in bold.
Our comprehensive framework is built upon extensive user feedback, and the subsequent automatic assessments conducted with the LTR strategy lead to a deeper understanding of IQ for presentation slides backed by empirical results. The framework has direct implications for practical applications and services. For instance, our proposed IQ assessment approach with LTR can be used by service providers such as SlideShare1 and SlideFinder2, which provide search and ranking functionalities over a massive number of slides. We expect that end-user satisfaction could be improved by rearranging their search results to promote high-quality slides based on the user-driven quality criteria identified by our research.
In our future work, we intend to develop and implement a system that further utilizes semantic quality features, especially for the Intrinsic category criteria. For example, we currently measure cohesiveness with the entropy of the slide text, but cohesiveness can also be measured by the connectivity between two paragraphs; entropy alone may therefore not give a complete picture. More sophisticated methods should be developed at the semantic level, enabling better IQ measurement of presentation slides. In addition, we plan to apply our framework to other domains, such as e-book or mobile content services, in which rich visual aids are highly sought after. For those applications, our quality framework can serve as an initial yardstick for identifying superior-quality visual aids.
Funding
This work was supported by the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea
government (MSIP) (No. R0101-15-0054, WiseKB: Big data based self-evolving knowledge base and reasoning platform).
Notes
1. http://slideshare.net
2. http://slidefinder.net
3. http://atlasti.com/
4. http://www.slideshare.net/featured
5. https://www.languagetool.org/
6. http://poi.apache.org
7. http://lucene.apache.org/core
8. http://sourceforge.net/p/lemur/wiki/RankLib/
Figure 7. Revised IQ taxonomy of presentation slides. Intrinsic: accuracy, cohesiveness; Representational: representational clarity, representational consistency, visual attraction, ease of navigation; Contextual: completeness, informativeness, recency, task appropriateness.
References
[1] Wang RY and Strong DM. Beyond accuracy: What data quality means to data consumers. Journal of Management Information
Systems 1996; 12(4): 5–33.
[2] Marschak J. Economics of information systems. Journal of the American Statistical Association 1971; 66(333): 192–219.
[3] Ballou DP and Pazer HL. Designing information systems to optimize the accuracy–timeliness tradeoff. Information Systems
Research 1995; 6(1): 51–72.
[4] Belardo S and Pazer HL. A framework for analyzing the information monitoring and decision support system investment trade-
off dilemma: An application to crisis management. IEEE Transactions on Engineering Management 1995; 42(4): 352–359.
[5] Porion A, Aparicio X, Megalakaki O, Robert A and Baccino T. The impact of paper-based versus computerized presentation on
text comprehension and memorization. Computers in Human Behavior 2016; 54: 569–576.
[6] Ge M and Helfert M. A review of information quality research. Paper presented at the International Conference on Information
Quality 2007.
[7] Stvilia B, Gasser L, Twidale MB and Smith LC. A framework for information quality assessment. Journal of the American
Society for Information Science and Technology 2007; 58(12): 1720–1733.
[8] Bendersky M, Croft WB and Diao Y. Quality-biased ranking of web documents. Proceedings of the 4th ACM international con-
ference on Web Search and Data Mining. Hong Kong, China: ACM, 2011, pp. 95–104.
[9] Knight S-A and Burn J. Developing a framework for assessing information quality on the World Wide Web. Informing Science
2005; 8: 159–172 (http://inform.nu/Articles/Vol8/v8p159-172Knig.pdf).
[10] Mandl T. Implementation and evaluation of a quality-based search engine. Proceedings of the seventeenth conference on
Hypertext and Hypermedia. Odense, Denmark: ACM, 2006, pp. 73–84.
[11] Rieh SY. Judgment of information quality and cognitive authority in the WWW. Journal of the American Society for
Information Science and Technology 2002; 53(2): 145–161.
[12] Zhou Y and Croft WB. Document quality models for web ad hoc retrieval. Proceedings of the 14th ACM international confer-
ence on Information and Knowledge Management. Bremen, Germany: ACM, 2005, pp. 331–332.
[13] Anderka M, Stein B and Lipka N. Predicting quality flaws in user-generated content: The case of Wikipedia. Proceedings of the
35th international ACM SIGIR conference on Research and Development in Information Retrieval. Portland, OR: ACM, 2012,
pp. 981–990.
[14] Dalip DH, Gonçalves MA, Cristo M and Calado P. Automatic quality assessment of content created collaboratively by web
communities: A case study of Wikipedia. Proceedings of the 9th ACM/IEEE-CS joint conference on Digital Libraries. Austin,
TX: ACM, 2009, pp. 295–304.
[15] Hu M, Lim E-P, Sun A, Lauw HW and Vuong B-Q. Measuring article quality in Wikipedia: Models and evaluation.
Proceedings of the 16th ACM international conference on Information and Knowledge Management. Lisbon: ACM, 2007, pp.
243–252.
[16] Stvilia B, Twidale MB, Smith LC and Gasser L. Information quality work organization in Wikipedia. Journal of the American
Society for Information Science and Technology 2008; 59(6): 983–1001.
[17] Yaari E, Baruchson-Arbib S and Bar-Ilan J. Information quality assessment of community-generated content: A user study of
Wikipedia. Journal of Information Science 2011; 37(5): 487–498.
[18] Liu T-Y. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval 2009; 3(3): 225–331.
[19] Kim S, Jung W, Han K, Lee JG, and Yi M. Quality-based automatic classification for presentation slides. Proceedings of the
36th European Conference on Information Retrieval (ECIR) 2014, pp. 638–643.
[20] Hilligoss B and Rieh SY. Developing a unifying framework of credibility assessment: Construct, heuristics, and interaction in
context. Information Processing & Management. 2008; 44(4): 1467–1484.
[21] Alkhattabi M, Neagu D and Cullen A. Information quality framework for e-learning systems. Knowledge Management & E-
Learning: An International Journal (KM&EL) 2010; 2(4): 340–362.
[22] Dedeke A. A conceptual framework for developing quality measures for information systems. Proceedings of the Conference
on Information Quality 2000, pp. 126–128.
[23] Richardson M, Prakash A and Brill E. Beyond PageRank: Machine learning for static ranking. Proceedings of the 15th interna-
tional conference on World Wide Web. Edinburgh: ACM, 2006, pp. 707–715.
[24] Choi S, Ryu B, Yoo S and Choi J. Combining relevancy and methodological quality into a single ranking for evidence-based
medicine. Information Sciences 2012; 214: 76–90.
[25] Dalip DH, Gonçalves MA, Cristo M and Calado P. Exploiting user feedback to learn to rank answers in Q&A forums: A case
study with stack overflow. Proceedings of the 36th international ACM SIGIR conference on Research and Development in
Information Retrieval. Dublin: ACM, 2013, pp. 543–552.
[26] Raiber F and Kurland O. Using document-quality measures to predict web-search effectiveness. Proceedings of the 35th
European conference on Advances in Information Retrieval. Moscow: Springer, 2013, pp. 134–145.
[27] Alkhattabi M, Neagu D and Cullen A. Assessing information quality of e-learning systems: A web mining approach. Computers
in Human Behavior 2011; 27(2): 862–873.
[28] Wu O, Hu R, Mao X and Hu W. Quality-based learning for web data classification. Proceedings of the twenty-eighth AAAI con-
ference on Artificial Intelligence 2014.
[29] Alley M. The Craft of Scientific Presentations: Critical Steps to Succeed and Critical Errors to Avoid. New York: Springer,
2013.
[30] Reynolds G. Presentation zen: Simple ideas on presentation design and delivery. Berkeley, CA: New Riders, 2011.
[31] Alley M and Neeley KA. Discovering the power of PowerPoint: Rethinking the design of presentation slides from a skillful
user’s perspective. American Society for Engineering Education Annual Conference & Exposition, 2005.
[32] Mackiewicz J. Perceptions of clarity and attractiveness in PowerPoint graph slides. Technical Communication 2007; 54(2):
145–156.
[33] Miles MB and Huberman AM. Qualitative Data Analysis: An expanded sourcebook, 2nd edition. Thousand Oaks, CA: Sage,
1994.
[34] Saldana J. The Coding Manual for Qualitative Researchers. Thousand Oaks, CA: Sage, 2009.
[35] Fincher S and Tenenberg J. Making sense of card sorting data. Expert Systems 2005; 22(3): 89–93.
[36] Spencer D. Card sorting: Designing Usable Categories. New York: Rosenfeld, 2009.
[37] Li H. A short introduction to learning to rank. IEICE Transactions on Information and Systems 2011; E94-D(10).
[38] Fleiss JL. Measuring nominal scale agreement among many raters. Psychological Bulletin 1971; 76(5): 378–382.
[39] Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM and Gatford M. Okapi at TREC-3. In: Proceedings of the Third Text REtrieval Conference (TREC-3), 1994.
[40] Xu J and Li H. Adarank: A boosting algorithm for information retrieval. Proceedings of the 30th annual international ACM
SIGIR conference on Research and Development in Information Retrieval. Amsterdam: ACM, 2007, pp. 391–398.
[41] Burges CJC. From RankNet to LambdaRank to LambdaMART: An overview. Microsoft Research Technical Report MSR-TR-2010-82, 2010.
[42] Järvelin K and Kekäläinen J. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 2002; 20(4): 422–446.
[43] Wu Q, Burges CJ, Svore KM and Gao J. Adapting boosting for information retrieval measures. Information Retrieval 2010;
13(3): 254–270.
[44] Berberich K, Konig AC, Lymberopoulos D and Zhao P. Improving local search ranking through external logs. Proceedings of
the 34th international ACM SIGIR conference on Research and Development in Information Retrieval. Beijing: ACM, 2011,
pp. 785–794.
Appendix A. Users' comments for the criteria.

Category: Intrinsic

Dimension: Accuracy
- The presence of typos: 'There are some errors, which make me confused, so I was not drawn into the presentation.' 'It seems like the errors caused distrust in the slides.'
- The presence of inaccurate explanations: 'The slides lean too much toward specific viewpoints. There are many other perspectives on this historical event. However, other scholars' comments are not included here.'

Dimension: Cohesiveness
- The presence of irrelevant content: 'It appears to have irrelevant points in certain sections.' 'I think it has a lot of irrelevant details that are not related to the learning aims of this course.'
- Irrelevant page title for the content: 'This slide is titled "The differences between stage 1 and stage 2", but only stage 1 is discussed on this page.' 'I am not sure whether this is a correct title for the content of this page. The title and the content are mismatched.'

Dimension: Naturalness
- The presence of equations: 'It is easier to understand when the slides have equations rather than just text.' 'This is a very important concept. Explaining using equations is natural and intuitive.'

Dimension: Objective linkage clarity
- Content flow: 'The content of the slides flows very well. The story unfolds in a very natural way.' 'I liked the plot of the slides. Specifically, I thought the author did a great job with deploying the content in the slides.'
- Reversed content order against the outline: 'The order is strange; I realized that the content order changed from the outline presented at the beginning. Due to this, I suddenly lost my way.' 'I think pages 9 and 10 should be changed to follow the section outline. The editor might have made a mistake.'

Category: Representational

Dimension: Representational clarity
- Highlighting of important points and terms: 'Another good thing is that highlighting the important parts in red helps my understanding; the title in bold is easier to see.' 'The important parts use a strong colour and the unimportant parts are weakly expressed; therefore, I take more notice of the significant parts.'
- Writing style (sentence writing style): positive example: 'Explanations using natural sentences are much more understandable for me. I can catch the points from detailed explanations.' Negative example: 'The sentences are hard to read. I prefer short, well-constructed expressions in slides.'
- Writing style (summarized writing style): positive example: 'The condensed, summarized style of writing using important keywords improves my concentration since it reproduces key facts and ideas.' Negative example: 'I can't understand what this phrase means. A full, descriptive sentence would improve the explanation, though it would be a bit longer.'

Dimension: Representational consistency
- Inconsistent font face: 'It is not good to suddenly change the font face. It seems better to use a consistent font.' 'Inconsistent use of fonts in the slides seems to disturb concentration.'
- Inconsistent font size (in content, title): 'The font size unexpectedly gets bigger on this page. It really harms consistency and concentration.' 'The inconsistent font sizes throughout the slides make it difficult to read the text, regardless of content.'

Dimension: Visual attraction
- The line-by-line presence of animation: negative examples: 'Due to too many animation effects in the slides, it was not proper for examination.' 'I can't understand why so many animation effects are embedded in every single sentence. Excessive animation destroys concentration.' Positive example: 'Content should be presented in this manner using animation effects so that the main idea can be easily understood.'
- Clean and neat slide design: 'This slide is so neat and clear that it is more appealing to the eyes.' 'My first impression of this slide is that it is simple, neat, and well designed. It's nice to look at it.'

Dimension: Ease of navigation
- The presence of slide numbering: 'Slide titles with numbers are great. Content numbering associated with the table of contents obviously provides a lot of synergy for locating content. It is easy to guess where it is.' 'If the slides don't have page numbers, it is difficult to navigate and locate the slides.'
- The presence of a table of contents: 'This presentation slide has a table of contents in the beginning part, so it is possible that I can understand what I am going to learn.' 'The organization of the slides, including a table of contents, is very nice. It makes the outline of the contents more clear.'

Category: Contextual

Dimension: Completeness
- The presence of references: 'The references are indexed. It seems to be easier to find things.' 'I like these references showing where they come from.'
- The presence of necessary information on the cover page (title, author, etc.): 'This slide has a presentation title, presentation date, a picture of the speaker, and the affiliation on the first page. It is nice. Some don't have this basic information.' 'The cover page informs me of many things with the title, the presenter, and a relevant picture. I think a cover page with this information is fundamental and necessary.'

Dimension: Informativeness
- The presence of too much text: positive example: 'The advantage seems to be that I can study only with these slides without having a textbook.' Negative examples: 'It seems there are too many letters on the page to grasp the content' and 'When you take a course with slides that have a large amount of writing, they usually move more quickly. Because I cannot read beyond a certain point, I think it is not good.'
- The presence of examples: 'At the beginning, the theory is explained, and then the real examples are continued. It seems to be easier to understand with real examples.' 'I like these interesting photo examples.'

Dimension: Recency
- Outdated content: 'I am repulsed by the old content of the slides. The slides were created over 10 years ago.' 'This section that explains trends in the application using old data leads to gaps in commitment.'

Dimension: Task appropriateness
- Summarization: 'The degree of summary for the content was good. It effectively sorted key points, condensing detailed explanations.' 'I liked the summarization of content for presentation. Specifically, I thought it was well summarized and presented briefly such that I could recognize and understand the content immediately.'
- The presence of exercises: 'At the end of the slides, there are some quizzes. It makes me check what I learned and identify what are important.' 'I like these exercises that follow the related concepts. It gives me a chance to think about them.'

Category: Reputational

Dimension: Author/institutional reputation
- Slides by the textbook publisher: 'Its organization seems great. I think the reason is that the textbook company has published slides as auxiliary materials.' 'This one is from the publisher of the textbook. It seems more trustworthy because it must be a faithfully summarized document for presentation.'
Appendix B. List of quality features.

Category: Intrinsic

Dimension: Accuracy
- numTypos: Number of typos; open proofreading library (https://www.languagetool.org/). Criterion: the presence of typos. [25]

Dimension: Cohesiveness
- entropy: Entropy of the texts in the slides, -Σ_{w∈D} p_D(w) log p_D(w), where p_D(w_i) = tf_{w_i,D} / Σ_{w_j∈D} tf_{w_j,D}; JavaMI v1.0 (https://github.com/Craigacp/JavaMI). Criterion: strong content connectivity. [8]

Category: Representational

Dimension: Representational clarity
- numHighlights: Number of font highlights. Criterion: highlighting of important points and terms. New
- avgNumHighlights: Average number of highlights. Criterion: highlighting of important points and terms. New
- numFontSizes: Number of font sizes. Criterion: font size. New
- avgFontSize: Average size of font. Criterion: font size. New
- numLineSpaces: Number of line spaces. Criterion: sentence spacing. New
- avgSizeLineSpace: Average size of line spaces. Criterion: sentence spacing. New
- preTable: Presence of tables. Criterion: explanation with comparison. [8, 13]
- numTables: Number of tables. Criterion: explanation with comparison. [8, 13]
- avgNumTables: Average number of tables. Criterion: explanation with comparison. [8, 13]
- resolution: Resolution of images. Criterion: high-resolution figure. New
- preBullet: Presence of bullet points. Criterion: use of bullet points. New
- numBullets: Number of bullet points. Criterion: use of bullet points. New
- listRatio: Number of words in lists/word count. [13]
- fracStops: Stopwords/non-stopwords ratio. [8]
- avgTermLen: Average term length of the texts. [8]
- ARI: Automated readability index; Phantom readability library (https://bintray.com/plindsay/phantom). [8, 14, 25]
- Flesh: Flesch reading ease; Phantom readability library (https://bintray.com/plindsay/phantom). [8, 14, 25]

Dimension: Representational consistency
- conFontFace: Consistency of the dominant font face. Criterion: inconsistent font face. New
- conFontFaceRatio: Ratio of pages with a consistent font face. Criterion: inconsistent font face. New
- conFontSize: Consistency of the dominant font size. Criterion: inconsistent font size. New
- conFontSizeRatio: Ratio of pages with a consistent font size. Criterion: inconsistent font size. New
- conIndenLevel: Consistency of indentation level. Criterion: inconsistent indentation level. New
- conBGTemplate: Consistency of background template. Criterion: inconsistent background template. New

Dimension: Visual attraction
- numAnims: Number of animation effects. Criterion: line-by-line animation effect. New
- avgNumAnims: Average number of animation effects. Criterion: line-by-line animation effect. New
- preAni: Presence of animation effects. Criterion: the presence of animation. New
- numFontColors: Number of font colours. Criterion: colour scheme. New
- avgNumFontColors: Average number of font colours. Criterion: colour scheme. New
- defFontColor: Default font colour. Criterion: main font colour. New
- numColors: Number of colours. Criterion: use of many colours. New
- numTemplate: Number of templates. [14]

Dimension: Ease of navigation
- preSlideNum: Presence of page numbers. Criterion: the presence of page numbering. New
- preTableCnts: Presence of a table of contents (T: table of contents, TOC, list, etc. in the first 3 pages). Criterion: the presence of a table of contents. New

Category: Contextual

Dimension: Completeness
- preReference: Presence of references (T: reference, appendix in the last 3 pages). Criterion: the presence of references. [14]
- preCoverPageInfo: Presence of necessary information on the cover page (title, author, department, organization). Criterion: the presence of necessary information on the cover page (title, author, department, organization). New
- preExtLink: Presence of external links (external video, web page). Criterion: the presence of external links (external video, web page). [13, 14]
- numExtLinks: Number of external links (external video, web page). Criterion: the presence of external links (external video, web page). [13, 14]

Dimension: Informativeness
- numTerms: Number of terms in the slides. Criterion: the presence of too much text. [13, 14]
- avgNumTerms: Number of terms per page. Criterion: the presence of too much text. [7, 12, 24]
- preExample: Presence of examples (T: for example, for instance). Criterion: the presence of examples. New
- numExamples: Number of examples (T). Criterion: the presence of examples. New
- avgNumExamples: Average number of examples (T). Criterion: the presence of examples. New
- preImg: Presence of images. Criterion: the presence of figures. [8, 14, 25]
- numImgs: Number of images. Criterion: the presence of figures. [8, 14, 25]
- avgNumImgs: Number of images per page. Criterion: the presence of figures. [8, 14, 25]
- numSlides: Number of slides. Criterion: the presence of a large amount of content. New
- preExplain: Presence of explanations for objects. Criterion: the presence of additional explanation (for equation, graph, figure, table). New
- preDiagram: Presence of diagrams. Criterion: the presence of diagrams. [13]
- preDefinition: Presence of definitions (T: definition, rationale). Criterion: the presence of terminology (term-definition). New
- info-to-ratio: Vocabulary size/word count. [13]
- numSentences: Number of sentences; Stanford Tokenizer (http://nlp.stanford.edu/software/tokenizer.shtml). [13]
- sentenceLen: Length of sentences. Criterion: long sentences (length of sentence). [13]

Dimension: Recency
- age: Number of months since the creation date. Criterion: outdated content. [14]
- recency: Number of months since the modification date. Criterion: outdated content. [14]

Dimension: Task appropriateness
- preExercise: Presence of exercises (T: exercise, workout). Criterion: the presence of exercises. New
- numExercises: Number of exercises (T). Criterion: the presence of exercises. New
- avgNumExercises: Average number of exercises (T). Criterion: the presence of exercises. New
- preQuestion: Presence of questions (T: question, problem, ?, Q and A). Criterion: the presence of questions. [13]
- numQuestions: Number of questions (T). Criterion: the presence of questions. [13]
- avgNumQuestions: Average number of questions (T). Criterion: the presence of questions. [13]
- preKeyterm: Presence of key terms (T: key term, keyword, appendix in the last 3 pages). Criterion: the presence of key terms. New
- preSummary: Presence of a summary (T: summary, index in the last 3 pages). Criterion: the presence of a summary. New
- BM25: Relevance between the contents and the user's query; Apache Lucene (version 4.9). [25]

Note: We used the POI extractor to extract the selected features from the slides (ppt files). The POI extractor provides multiple extraction functionalities, such as images, text boxes (font size, colours, bold and italics, etc.), objects, slide numbers, etc. Features measured using textual cues, such as numExamples, are marked with T, and the keywords used as cues are provided. Where additional software libraries were used, we specify them in the description.