Article

Journal of Information Science
1–27
© The Author(s) 2016
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0165551516661917
jis.sagepub.com
Developing information quality assessment framework of presentation slides
Seongchan Kim, Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea
Jae-Gil Lee, Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea
Mun Y. Yi, Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea
Abstract

Computerized presentation slides have become essential for many occasions such as business meetings, classroom discussions, multi-purpose talks and public events. Given the tremendous increases in online resources and materials, locating high-quality slides relevant to a given task is often a formidable challenge, particularly when a user looks for superior quality slides. This study proposes a new, comprehensive framework for information quality (IQ) developed specifically for computerized presentation slides and explores the possibility of automatically detecting the IQ of slides. To determine slide-specific IQ criteria as well as their relative importance, we carried out a user study involving 60 participants from two universities and conducted extensive coding analysis. We subsequently conducted a series of experiments to examine the validity of the IQ features developed on the basis of the criteria selected from the user study. The study findings contribute to identifying key dimensions and related features that can improve effective IQ assessments of computerized presentation slides.
Keywords

information quality (IQ); IQ assessment; presentation slides; qualitative study; slide ranking
1. Introduction
Information quality (IQ), which is often defined briefly as the ‘fitness for use’ of information [1], plays a crucial role in
the decisions and actions of information consumers [2]. As the amount of information surrounding an information con-
sumer has been increasing rapidly, it has become highly challenging to locate a high quality source of information, which
is often directly related to the performance of the consumer [3, 4].
Computerized presentation slides are the materials that are created with presentation software (e.g. PowerPoint,
Keynote), as opposed to traditional presentation materials such as papers or overhead projector films [5]. Computerized
presentation slides (hereafter referred to as presentation slides or slides for convenience) are one of the most popular
information media, commonly used in conjunction with business meetings, academic lectures, multipurpose talks and
public events.
Acknowledging the importance of presentation slides, online services solely focused on presentation slides, such as SlideShare1 and SlideFinder2, have recently emerged. While these specialized platforms offer the ability to perform a search against millions of slide files, with the number growing continuously, most users must wade through a multitude of slides and discern the quality of slides before they locate a high quality slide file. This issue has become acute with the rapid increase in available slides and the growth of platforms offering similar services. Furthermore, on most platforms anyone can upload their slides without any quality verification, thereby making the job of locating high quality slides increasingly problematic.

Corresponding author:
Mun Y. Yi, Graduate School of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea.
Email address: [email protected]
To effectively tackle the problem of discerning the quality of presentation slides, developing an IQ assessment frame-
work tailored for computerized presentation slides is a must [6, 7]. The assessment framework needs a taxonomy of IQ
dimensions, criteria, tools and metrics in consideration of the unique characteristics of slides. However, to the best of our
knowledge, no academic efforts have been reported for the development of IQ frameworks specifically targeted for pre-
sentation slides. Several recent studies have attempted to present an IQ assessment framework, not for slides, but for gen-
eral documents such as Web documents and Wikipedia. A set of quality assessment models of Web documents has been
proposed and evaluated [8–12], and a different set of quality frameworks for Wikipedia articles has been reported [13–
17]. When considered together, those studies collectively reveal that IQ dimensions, criteria, models, tools and metrics
vary depending on the types of documents being evaluated. For example, the criteria related to content and readability
are essential for the quality of Web pages [8] and the quality criteria about coverage and structure are important for the
quality of Wikipedia articles [14]. Such IQ assessment frameworks, however, might be inappropriate for slides, as slides
have special features not found in regular documents or Web pages.
The overall purpose of this paper is two-fold: (1) develop a new, comprehensive IQ framework specifically tailored
for computerized presentation slides from the perspective of slide users; and (2) examine the possibility of automatically
assessing the IQ of slides on the basis of the developed IQ framework. Correspondingly, our research has been con-
ducted in two phases (see Figure 1): (1) a user-involved study inclusive of interview, coding analysis and card sorting to
identify the slide quality criteria exercised by slides users; and (2) a series of activities and lab experiments conducted to
verify the applicability of the IQ framework, developed in the first phase, to automatic detection of high quality slides.
In the first phase, we aim at securing a set of quality criteria for presentation slides, capable of supporting diverse metrics at multiple levels, as well as determining the relative importance of each criterion based on the frequency of mentions made by the respondents. In the second phase, we first determine the quality of slides obtained from the Web to set the ground-
truth. Then, we define quality features in terms of their IQ dimensions, and extract these features from the slides. These
features are used for LTR (Learning to Rank) algorithm training [18]. After extracting 65 features in 10 IQ dimensions,
we train LTR algorithms such as LambdaMART and AdaRank to reorder the initial search results. We examine the
effectiveness of our proposed method by comparing the normalized discounted cumulative gain (NDCG) of the results
produced by the trained LTR model with the results of the Okapi BM25 ranking function and Google slide search.
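For reference, NDCG compares a ranking's discounted cumulative gain against that of the ideal ordering of the same graded results. A minimal Python sketch follows; the grade values and cutoff are illustrative only, not data from our experiments.

```python
import math

def dcg_at_k(grades, k):
    """Discounted cumulative gain over the top-k results, given graded labels."""
    return sum((2 ** g - 1) / math.log2(i + 2)   # gain discounted by log of rank
               for i, g in enumerate(grades[:k]))

def ndcg_at_k(grades, k):
    """NDCG@k: DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0

# Quality grades (higher is better) in the order a ranker returned the slides.
print(ndcg_at_k([2, 3, 0, 1, 2], k=5))  # about 0.83 for this hypothetical ordering
```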
To the best of our knowledge, this research is the first to develop a comprehensive IQ framework for computerized
presentation slides, except for our preliminary study [19], which included a very limited set of quality features and did
not include a user study. Compared with other prior research on IQ, our proposed framework can be seen as offering
substantial advantages as its development is driven by users' feedback obtained in a controlled environment, and its validity is confirmed via a series of automatic assessment experiments that involve state-of-the-art search algorithms.

Figure 1. Overall research process. (Phase 1: User Study: interview, coding analysis and card sorting produce the IQ framework; Phase 2: Automatic Assessment Study: data acquisition/annotation, feature extraction and learning to rank.)
The rest of the paper is organized as follows. Section 2 presents related work on IQ and slide design. Section 3 pre-
sents the details of our user study and its results obtained via qualitative analyses. Section 4 reports on our automatic slide
quality assessments with the LTR technique. Finally, we conclude this paper with overall discussion in section 5.
2. Related work
IQ is often defined briefly as the ‘fitness for use’ of information [1] and more completely as ‘people’s subjective judg-
ment of goodness and usefulness of information in certain information use settings with respect to their own expectations
of information or in regard to other information available’ [20]. As people’s expectations of information are not uniform,
IQ is inherently a multi-dimensional construct, commonly comprising several elements such as accuracy, objectivity,
relevancy and completeness. For over a decade, a significant body of studies has been conducted on IQ taxonomy or on
automatic IQ evaluation, with regard to various resources on the Web. Also, a number of guidelines have been proposed
for slide design. We summarize those studies in this section.
2.1. IQ taxonomy
Several general IQ taxonomies exist in the literature. Table 1 summarizes these taxonomies, which were used as a basis
for developing our IQ taxonomy of presentation slides. In the literature, context-general taxonomies [1, 7] and context-
specific taxonomies [21, 22] have been reported. Wang and Strong [1] proposed a hierarchical conceptual framework of
data quality, which has been widely adapted for many studies. Their study aimed at developing a general framework that
captured the characteristics of IQ that were important to information consumers across the board. Their framework con-
sists of four IQ categories: intrinsic, contextual, representational and accessibility. Each of those categories further
includes elements as detailed in Table 1. Stvilia et al. [7] suggested a taxonomy through a literature analysis conducted
using 32 representative articles. Their taxonomy has three categories (intrinsic IQ, relational or contextual IQ, and repu-
tational IQ) and 22 dimensions.
For context-specific taxonomies, Alkhattabi et al. [21] proposed an IQ taxonomy for e-learning systems including con-
textual representation, accessibility, intrinsic category and 14 elements (also called dimensions). They initially started
with Wang and Strong’s taxonomy [1] in developing their framework. With a survey involving 315 users and statistical
analysis, they revised the previous taxonomy to specialize it for e-learning systems. In their taxonomy, accessibility was
emphasized as it included response time and availability, which were considered crucial for e-learning systems. Dedeke
[22] suggested a context-specific taxonomy for information systems including ergonomic, accessibility, transactional,
contextual and representation dimensions. Interestingly, the author proposed ergonomic IQ and transactional IQ, which
are not found in other IQ studies, especially for information systems. In sum, prior research on IQ taxonomy suggests that
a general IQ taxonomy has inherent limitations when it is applied to a specific application domain, and that an IQ taxon-
omy tailored for computerized presentation slides needs to be developed anew to properly reflect the unique characteris-
tics of the information media.

Table 1. IQ taxonomy in the literature.

Context: general – To develop a general IQ taxonomy from data consumers' perspective.
Prior research: A conceptual framework for data quality [1].
Method: Two-stage survey: (1) generation of quality attributes from 112 participants; (2) card sorting to assign dimensions to target categories with 30 subjects.
IQ taxonomy:
- Intrinsic: the degree to which data have qualities in their own right (believability, accuracy, objectivity, and reputation)
- Contextual: the degree to which data quality must be considered within the context of the task at hand (value-added, relevancy, timeliness, completeness, and appropriate amount of data)
- Representational: the degree to which the format and meaning of data are clear (interpretability, ease of understanding, representational consistency, and concise representation)
- Accessibility: the degree to which the system must be accessible but secure (accessibility and access security)

Context: general – To propose a general taxonomy of IQ dimensions supporting a general IQ assessment framework.
Prior research: A general IQ assessment framework [7].
Method: An analysis of representative items in the IQ literature.
IQ taxonomy:
- Intrinsic: measuring internal characteristics of information in relation to some reference standard in a given culture (accuracy/validity, cohesiveness, complexity, semantic consistency, structural consistency, currency, informativeness/redundancy, naturalness, and precision/completedness)
- Relational/contextual: measuring relationships between information and certain aspects of its usage context (accuracy, accessibility, complexity, naturalness, informativeness/redundancy, relevance, precision/completeness, security, semantic consistency, structural consistency, verifiability, and volatility)
- Reputational: measuring the position of an information entity in a cultural or activity structure, often determined by its origin and record of mediation (authority)

Context: specific – To propose an IQ taxonomy for e-learning systems.
Prior research: A framework for e-learning systems [21].
Method: A literature examination and an email survey questionnaire from 315 users.
IQ taxonomy:
- Contextual representation: refers to the consolidation of Wang and Strong's two categories, contextual and representational (conciseness, verifiability, representational consistency, understandability, amount of information, reputation, completeness)
- Accessibility: refers to Wang and Strong's accessibility category (availability, relevancy, accessibility, response time)
- Intrinsic: refers to Wang and Strong's intrinsic category (objectivity, accuracy, believability)

Context: specific – To define a taxonomy for information systems.
Prior research: A conceptual framework for information systems [22].
Method: An analysis based on four components of information systems: data, interface, work, hardware/software.
IQ taxonomy:
- Ergonomic quality: the degree to which the interface and the software/hardware system are designed to meet the needs of users (ease of navigation, comfortability, learnability, visual signals, audio signals)
- Accessibility: the degree to which the system must be accessible but secure (technical access, system availability, technical security, data accessibility, data sharing, data convertibility)
- Transactional: the degree to which the programming design of a specific work process handles content and logic within software (controllability, error tolerance, adaptability, system feedback, efficiency, responsiveness)
- Contextual: the degree to which data quality must be considered within the context of the task at hand (value added, relevancy, timeliness, completeness, appropriate data)
- Representation: the degree to which the format and meaning of data are clear (interpretability, consistency, conciseness, structure, readability, contrast)
2.2. Automatic IQ assessment
Automatic IQ assessment has recently received a considerable amount of attention from researchers in the Information
Science community. However, no research has yet attempted to automatically assess slide quality except for our prelimi-
nary study [19]. Studies on quality-based retrieval can be divided into those that employ LTR (Learning to Rank) meth-
ods and those that do not. Many studies on LTR focused on estimating the relevancy between query and resources; few
studies dealt with the query and resources in terms of quality even though quality is a more comprehensive concept than
relevancy [1]. LTR techniques have gained considerable attention in recent years. Conventional ranking models such as
BM25 and language modelling suffer from parameter tuning and over-fitting. However, LTR, a ranking method that uses
machine learning, provides the advantage of automatically tuning its parameters, along with its ability to combine multi-
ple evidence and avoid over-fitting [18].
LTR has been successfully adapted to various quality-based tasks that utilize Web resources. Richardson et al. [23]
used RankNet, a modified neural network algorithm for learning rankings, to order Web pages with static features based
on anchor texts and domain characteristics. Their research showed that simple URL- or page-based features outper-
formed PageRank. Choi et al. [24] used SVMRank to re-rank initial search results by combining the relevance and qual-
ity scores of medical documents. More specifically, they initially obtained search results with Okapi BM25 from the
Medline and PubMed corpora, and then trained classifiers to assess the quality of the retrieved documents. Finally, they
re-ranked the initial results, and observed a significant improvement in the final ranking performance. Regarding Q&A
forums, Dalip et al. [25] adopted a Random Forest approach to rank the quality of answers. They employed a large set
of features in groups named user, review and structure. By conducting experiments with questions and answers on Stack Overflow, they determined that user and review features were the most effective in the Q&A domain.
Several studies on quality-based ranking have used techniques other than LTR. These methods attempt to maximize
retrieval performance in terms of quality by adding a document’s quality score to their own retrieval model score. Raiber
and Kurland [26] developed a query-independent document quality measure that considered stop-words, document
entropy, inter-document similarity and PageRank. They achieved Web-retrieval effectiveness (e.g. removing spam pages
from the retrieved documents) by combining a Dirichlet-smoothed unigram document language model and a quality
score. Bendersky et al. [8] attempted to incorporate the quality score of Web documents into a Markov Random Field
retrieval model to achieve quality-based Web document retrieval. Using seven document features, such as the number of
terms on a page, average term length and entropy, they demonstrated the effectiveness of quality-based retrieval over
relevance-based retrieval. Alkhattabi et al. [27] proposed an IQ assessment model of e-learning systems. The authors
proposed a linear equation to compute the overall quality score using quality metrics as well as their relative importance.
The metrics are organized in an IQ taxonomy with 14 quality dimensions.
Quality-based classification studies have considered Web pages [28], encyclopedia [14] and presentation slides [19].
Wu et al. [28] tackled the classification problem based on the quality of Web pages by examining quality-related factors
such as text length and image quantity. They suggested a learning method that divides training data into subsets accord-
ing to the clustering results of the quality-related factors. For encyclopaedia documents, Dalip et al. [14] classified
Wikipedia articles in terms of quality using the text, style, review and network features of articles. They proposed a
machine learning approach based on regression analysis to combine these quality features into a single quality value.
Regarding presentation slides, quality estimates were made in our previous study [19], in which we assessed the quality of slides with a total of 28 Representational, Contextual and Intrinsic features and classified the slides as high, fair or low quality.
2.3. Slide design
There have been several popular guidebooks by experts about designing successful presentation slides. Alley [29] in The
Craft of Scientific Presentations and Reynolds [30] in Presentation Zen offer guidelines for typography, colour, layout
and style for designing presentation slides and delivering successful presentations. These guidelines reflect the writers’
experiences, but are mostly based on anecdotal observations. As information quality is a subjective judgement of fitness for use by its users, it is necessary to incorporate users' perceptions and standards into slide quality criteria.
Several researchers have conducted experiments to improve the understandability of slides for the audience by sug-
gesting alternative designs. Alley and Neeley [31] outlined an alternative design to the traditional design of presentation
slides (i.e. a phrasal headline with a bulleted list). Their design used a succinct sentence headline instead of a topic
phrase, assisted by visual evidence. In a case study involving PowerPoint slides, they showed that their proposed guide-
line was beneficial for engineering students and professors. A post-study survey indicated that 60% continued to use sen-
tence headlines in most of their slides and 82% used visual evidence. Furthermore, 55% of the audience was mostly
receptive to the alternative design. Further, Mackiewicz [32] examined 37 participants’ perceptions of slides for the
clarity and attractiveness of graphs displaying two- and three-dimensional bars. These studies, however, focused only on
a limited set of the design features of presentation slides.
3. User study
Our research employed a qualitative study [33] to determine the quality criteria of presentation slides directly from the
users. Based on the guidelines offered by popular qualitative study handbooks [33, 34] and the research method used by
a prior study on information quality assessment [17], our user study involved three major activities: (1) an interview in
which users were observed during the evaluation process, (2) a coding analysis, which is an analytical process including
code extraction from interview transcripts and code reconciliation, and (3) card sorting for assignment of the elicited cri-
teria into appropriate IQ dimensions. During the interview, we asked users to think aloud about their perceptions of slide quality. All of the users' utterances were audio-recorded and fully transcribed. Then, we elicited IQ criteria from the
interview transcripts through coding analysis [33, 34]. Although there exist some guidelines for successful presentation
slide design by experts [29, 30] and research into the design of slides [31], criteria established through a user study can
provide more diverse user views directly without being filtered by intermediaries. To determine the dimension of the IQ
criteria, we conducted card sorting, which is a simple and user-friendly technique for understanding the participants’
thoughts and underlying rationale while producing objective topic groups and organizational structures [33, 36].
3.1. Formulation of initial IQ taxonomy
Our literature survey reveals that prior studies have focused on IQ assessment frameworks mostly for regular documents
and that they might be inappropriate for presentation slides. Furthermore, those popular guidelines about the slide design
provide a partial view of IQ, with limited inputs from users. Thus, the present study intends to bridge the gap in the liter-
ature by developing a comprehensive IQ assessment framework specifically focused on presentation slides from users’
perspectives.
Starting with the extant IQ taxonomies, we developed a new IQ taxonomy tailored for the domain of presentation
slides going through a series of activities. Using Wang and Strong’s taxonomy [1] as the starting point, we borrowed
some additional dimensions from other studies as deemed relevant for presentation slides. We excluded IQ dimensions
about accessibility, as it is less essential for slides, given that presentation slides are abundantly available online and not
tied to a single system. Instead, we added reputational category in consideration of Stvilia et al.’s work [7], as the quality
of presentation slides is likely to be heavily affected by the person who created them. As a result, our taxonomy includes
4 categories (intrinsic, representational, contextual and reputational) and 13 IQ dimensions (see Table 2 and Figure 2).
Following the definition of Wang and Strong [1] and Stvilia et al. [7], the intrinsic category refers to the quality origi-
nating from the data contained on the slides. The quality of intrinsic dimensions does not change based on the context in
which the slides are presented. The representational category refers to the quality of information rendering, inclusive of
visual aesthetics and rendering clarity. The contextual category is concerned with the quality of information within the
context of the task at hand. In other words, the contextual IQ can be different depending on the context of the user’s task,
while the intrinsic category including accuracy considers the quality of data itself, regardless of the tasks and contexts.
Because users’ tasks and contexts vary across time and information consumers, IQ measurements of contextual quality
are considered challenging [1, 7]. For instance, completeness (one of the contextual IQ dimensions, which is the extent to which the information has all of the required parts or necessary elements) of slides can vary because the required parts
and necessary elements of slides are different for academic lectures and for self-study. Fewer parts might be required for
users in class, while more parts may be necessary in self-study. On the other hand, accuracy, one of the intrinsic dimen-
sions, is not affected by time or task. The reputational category measures the position of an information entity in a cul-
tural or activity-related structure, often determined by its origin and record of mediation.
Table 2. Description of the IQ taxonomy.

Intrinsic
- Accuracy: The extent to which the information is true, correct, and precise
- Cohesiveness: The extent to which the information is focused on one topic
- Naturalness: The extent to which the information is expressed in conventional, typical terms and forms in accordance with generally accepted reference sources
- Objective linkage clarity: The extent to which the content of the information is clearly linked to the presentation objectives

Representational
- Representational clarity: The extent to which the representation of information is easily identified, understandable, and readable per unit (character, paragraph, and page)
- Representational consistency: The extent to which the representation (visuals, format, background, etc.) of information is done in a uniform manner
- Visual attraction: The extent to which the representation (visuals, format, background, etc.) of information is appealing and engaging to the user
- Ease of navigation: The extent to which the information is easy to navigate or predictable for the user

Contextual
- Completeness: The extent to which the information has all of the required parts or necessary elements
- Informativeness: The amount of information contained in the presentation material
- Recency: The extent to which the age of the information is up-to-date
- Task appropriateness: The extent to which the information is proper in the context of a specific activity or task

Reputational
- Author/institutional reputation: The extent to which the information of the author or institution is trusted or highly regarded in terms of its source or content
The initial version of the IQ taxonomy was further modified and refined through a series of activities involving users.
Specifically, we conducted (1) a qualitative interview in which user opinions were obtained through the think-aloud pro-
cess; (2) a coding analysis, which is an analytical process including code extraction from interview transcripts and code
reconciliation; and (3) a card sorting experiment for the assignment of the elicited criteria into appropriate IQ
dimensions.
3.2. Interview (think-aloud)
The participants for the study were 60 students from two national universities in South Korea. To remove any idiosyn-
cratic issues associated with a single university or a single discipline, we recruited students in two broad fields of study
at the two universities: (1) Management and (2) Science and Engineering. There was an equal mix of Management
majors (30 – student majors included Management Science, Economics and International Trade), and Science and
Engineering majors (30 – student majors included Mechanical Engineering, Bio and Chemical Engineering, Electronics
and Information/Computer Science). For both, there was an equal distribution of 15 undergraduate and 15 graduate stu-
dents, and gender was equally distributed (30 males and 30 females). The participants indicated that they frequently used
presentation slides for activities such as classes, seminars, meetings and self-study. All participants received monetary
rewards for participating in the interview study. Table 3 provides a summary of the interview participant statistics.
Prior to the interview, each participant was asked to select three courses that he or she had taken successfully during
the previous semesters and to pick the one he or she liked most. This selection process was needed to ensure that the par-
ticipants fully understood the contents of the slides and the contexts in which the presentation slides could be used. Five
slide files were randomly selected from one of the chosen courses. We used SlideShare,1 which provides links to slides
on the Web, to gather the slides used in conjunction with the interview. The selection process resulted in 31 courses and
155 slides in both PowerPoint (ppt/pptx) and PDF formats. Each participant had a one-to-one interview in a room dedi-
cated to the interview study. The purpose of the study was briefly explained at the beginning of each interview.
Figure 2. An IQ taxonomy of presentation slides: four categories (Intrinsic, Representational, Contextual and Reputational) comprising the 13 dimensions described in Table 2.
Table 3. Summary of interview participant statistics.

Criteria | Average | Notes
Age | 23.9 (SD 3.05) | Max: 33, Min: 18
Years using slides | 5.7 | Max: 10, Min: 1
Proficiency on making slides | 4.2 (Proficient) | 5-point scale
No. of courses in a semester | 4.6 |
No. of courses using slides | 3.6 |
Interview time (minutes) | 47.2 | Max: 84, Min: 18
Participants filled out a short questionnaire about their background and signed an informed consent form. Next, the parti-
cipant was asked to view the five monitor screens, each of which displayed one of the presentation slides from the
course selected by the participant. Participants were allowed to go back and forth between the screens if they wanted to
compare slides. They were also provided with the slides on paper for their convenience. They were asked to ‘think-
aloud’ while they read and evaluated the quality of the presentation slides. They were also asked to indicate which one
of the five slide files was of the lowest or of the highest quality and to provide an explanation for their choices.
3.3. Coding analysis
Verbal data was transcribed from the recordings of all the interviews. The analysis carefully followed the rules of qualita-
tive study, and several steps were taken to ensure validity [33]. In the first stage, three coders conducted content analysis
to code the interview transcripts and identify quality criteria as articulated in the scripts. Each coder independently open-
coded the entire transcript. While there are various coding methods for diverse purposes, our study used Descriptive cod-
ing and In Vivo coding [34]. Descriptive coding summarizes the text in the transcript into one essential short keyword or
phrase and In Vivo coding keeps the text in the subject’s own language to represent grounded concepts well. After the
primary coding was completed, the resultant schemata were aggregated and all differences were reconciled. Several dis-
cussions and meetings were conducted to reconcile and merge the differences among the three coders. Finally, the coders
recoded the entire sample using the aggregated final schema. Reconciliation included the unification of terms in codes
and merging of codes between different expressions with an identical meaning. The rigorous coding process identified
216 quality criteria mentioned by at least two participants out of 3,617 total utterances regarding quality (1,557 positive;
2,060 negative). The iterative coding process included combining different expressions with the same meaning into a sin-
gle quality criterion and selecting proper terms for expressing the criterion by the coders. We utilized Atlas.ti 73, which is
a widely used tool for qualitative data analysis and research.
3.4. Card sorting
Card sorting exercises were conducted to group the quality criteria into appropriate quality dimensions. This method
commonly involves sorting a set of cards, each of which contains a label that addresses a topic, into groups that have
common aspects among them [35]. Fifteen students (mean age: 26.7, SD: 2.4; Major: Management 7, Science 8), differ-
ent from those who had participated in the interviews, were selected to perform the card sorting exercise, and were
assigned to one of five teams. Each team consisted of three subjects. Each quality criterion with its description (identi-
fied through the aforementioned coding analysis) was printed on a card, and the subjects in a team setting were collec-
tively asked to separate the cards into groups. The subjects were allowed to freely discuss their ideas among the team
members during the sorting exercises [35, 36].
An open sorting exercise and a closed sorting exercise were performed in sequence. In the open sorting exercise, sub-
jects were asked to perform a trial sort using 20 randomly selected cards without any predefined dimensions. They were
asked to separate the cards into any number of groups (piles), label each pile of cards, and explain their rationale for
grouping the cards together. This exercise was performed to ensure that the subjects understood the card sorting proce-
dure. Then, a closed sorting exercise was conducted. In this sorting, the 216 quality criteria identified in the previous
coding analysis step were used. The subjects were asked to place the 216 cards into the 13 quality dimensions presented
in Table 2. The sorting took 191 minutes on average per session. There was a separate session for each group. We com-
puted a correlation score of the card sorting study [36]. The correlation score indicates how often a card was put into the
same category by different subjects, as in Eq. (1). The average correlation score was 0.70, which means medium agree-
ment [36].
Correlation(i, j) = P(i, j) / P    (1)

where Correlation(i, j) is the correlation of card i in category j, P(i, j) is the total number of participants who put card i in category j, and P is the total number of participants.
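As an illustration, a minimal Python sketch of this agreement computation follows; the participants, cards and categories are hypothetical, not data from the card sorting sessions.

```python
from collections import Counter

def correlation_scores(placements):
    """placements: one dict per participant, mapping card -> chosen category.
    Returns {(card, category): fraction of participants making that placement}."""
    counts = Counter()
    for participant in placements:
        for card, category in participant.items():
            counts[(card, category)] += 1
    return {pair: c / len(placements) for pair, c in counts.items()}

# Three hypothetical participants sorting two criteria cards into IQ dimensions.
placements = [
    {"font size": "Representational clarity", "outdated content": "Recency"},
    {"font size": "Representational clarity", "outdated content": "Recency"},
    {"font size": "Visual attraction",        "outdated content": "Recency"},
]
scores = correlation_scores(placements)
print(scores[("font size", "Representational clarity")])  # 2/3 agreement, about 0.67
```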
3.5. Results of interview
In this section, we report the results from the coding analysis. The top 10 criteria by the number of respondents are
reported in Table 4. The three quality criteria – ‘highlighting of important points and terms’, ‘writing style’ and ‘the
presence of too much text’ – were supported by over 40 people, which was over two-thirds of the entire subject popula-
tion (60). The criterion mentioned by the largest number of respondents was ‘highlighting of important points and terms’,
which was mentioned 132 times by 47 participants. A criterion is considered positive if all mentions made by all partici-
pants showed the quality as something positive. The results clearly indicate that users consider emphasizing key points
with highlights as a desirable quality feature. Examples of direct mentions by users include ‘Another good thing is that
highlighting the important parts in red helps my understanding; the title in bold is easier to see’, and ‘The important parts
are highlighted with a strong color; therefore, I can take more notice of the significant parts.’
The second highly ranked criterion is ‘writing style’, which is mentioned 69 times by 47 respondents. This one is
interesting as we found that users mentioned two types of writing style: ‘summarized writing style’ and ‘sentence writing
style’. The former indicates that most texts in slides are expressed in summarized and condensed forms, while the latter
means that texts are expressed in complete sentences. Each style received two contradictory interpretations by the partici-
pants. Some participants regarded ‘summarized writing style’ as preferable; however, others responded the opposite was
true. ‘Summarized writing style’ was noted 27 times by 23 participants. Among them, 19 respondents (82.7%) considered
it positive and four (17.3%) negative. Positive examples include: ‘The condensed, summarized style of writing using
important keywords improves my concentration. It helps me reproduce key facts and ideas.’ Responding negatively, one
participant said, ‘I can’t understand what this phrase means. A full, descriptive sentence would improve the explanation,
though it would be a bit longer.’ On the other hand, ‘sentence writing style’ was mentioned 42 times by 24 respondents –
five respondents (20.9%) considered it positive and 19 (79.1%) negative. A positive example was: ‘Explanations using
natural sentences are much more understandable for me. I can catch the points from detailed explanations’, whereas a
negative example was: ‘The sentences are hard to read because they are usually so long. I prefer short, well-constructed
expressions in slides.’ Overall, 80.9% of participants (38 out of 47) preferred summarized expression (summarized writ-
ing style) over complete sentences (sentence writing style) in the slides (57 mentions out of 69).
Thirdly, ‘the presence of too much text (in the slides)’ was mentioned 120 times by 43 participants – 117 times
(97.5%) negatively and 3 times (2.5%) positively. Negative examples include ‘It seems there are too many letters on the
page to grasp the content’, and ‘When you take a course with slides that have a large amount of writing, they usually
move more quickly. Because I cannot read beyond a certain point, I think it is not good.’ A positive example was ‘The
advantage seems to be that I can study only with the slides without having a textbook.’ For more details, we report the
top five criteria in each dimension in the next section and the first two criteria with the actual users’ comments in
Appendix A. The results are organized according to the results of card sorting.
3.6. Results of card sorting
Table 5 presents the distribution of the sorted criteria (i.e. 216 quality criteria matched with 13 quality dimensions in 4
categories) obtained from the closed sorting exercise. Regarding the quality category, the representational criteria
(66.7%) were the most frequently mentioned by users. Contextual (28.2%), intrinsic (4.6%) and reputational (0.5%) cri-
teria followed. The first two categories represent almost 95%, indicating that the participants considered those criteria
that were directly related to comprehending the content of presentation slides naturally and within the context of the task
at hand much more important. Regarding the quality dimension, visual attraction (28%), representational clarity (18%)
and informativeness (12%) were the criteria most frequently mentioned.
Table 4. Top 10 quality criteria from coding analysis by respondents.

Rank | Criterion | No. of mentions | No. of respondents
1 | Highlighting of important points and terms | 132 | 47
2 | Writing style | 69 | 47
3 | The presence of too much text | 120 | 43
4 | The presence of examples | 93 | 36
5 | The presence of additional explanations (for equations, graphs, figures, tables) | 80 | 34
6 | The presence of figures | 56 | 33
7 | Good summarization | 43 | 26
8 | The presence of a large amount of content (slides) | 40 | 26
9 | The line-by-line presence of animation | 42 | 25
10 | The presence of slide numbering | 39 | 23
Table 5 also shows the average number of mentions and respondents for each dimension. Interestingly, ‘Visual attrac-
tion’ and ‘Representational clarity’ are the first and second dimensions in terms of the number of unique criteria.
‘Informativeness’ is the first dimension in terms of the average number of mentions and respondents, although it is the
third dimension in terms of the number of unique criteria. This means that criteria in Informativeness were more uni-
formly shared by many respondents while criteria in Visual attraction and Representational clarity were diverse yet
intensively shared by a small number of respondents. Criteria in Visual attraction and Representational clarity are more
subjective, while those in Informativeness are more objective. Detailed results are given in Table 6, where we report
only the top five criteria and the numbers of mentions and respondents in each dimension.
3.7. Discussion
The results in Table 5 show that, compared with other categories, the Representational category criteria were mentioned
more often by users regarding slide quality, agreeing with prior research on presentation slides. For instance, participants
prefer visually rich slides that contain images, diagrams and graphs [31, 32], indicating that understanding the content of
slides in an efficient and integrative manner is crucial. Because users have to pay a great deal of attention to what presen-
ters offer, users naturally want the presentation materials to be cognitively intuitive and less burdensome.
It should be also noted that our finding on the importance of the Representational criteria is different from prior IQ
research findings on other types of documents, such as Web documents and Wikipedia. In a qualitative study conducted
to identify key criteria for the quality of Web pages, Rieh [11] found that Web users considered content as the most
important object in the Web, followed by graphics and organization/structure. Yaari et al. [17] examined the quality cri-
teria for Wikipedia articles using a group of 60 users and found that users more frequently mentioned coverage and
structure rather than those criteria belonging to the Representational category. The accumulated research findings seem
to suggest that users employ a different set of quality criteria depending on the type of document.
One notable criterion is ‘writing style’. Many participants preferred summarized expressions (summarized writing
style) over complete sentences (sentence writing style) in the slides (by 38 respondents out of 47 and 57 mentions out of
69). It is interesting to note that these results do not completely coincide with Alley and Neeley [31], who proposed that
the alternative (preferred) design depends on succinct sentence-style titles in slides. It should also be noted that our find-
ings differ from Reynolds’s recommendation to use sentences rather than topic statements [30]. However, according to
our results, more users prefer summarized writing style to sentence writing style. The results do not mean that sentence
writing style always results in lower quality slides. Indeed, Alley and Neeley point out that there are exceptions to the sentence headline, for example on title slides, transition slides and any slides on which a sentence is not warranted. Synthesizing
these prior study findings with ours, we conclude that slide authors need to be mindful in determining an appropriate
writing style depending on the context and must pay careful attention to the pros and cons of the two styles. We also
point out that this finding supports the claim that quality is a subjective concept, which depends on subjective judgement
of goodness and usefulness of information [20].
Table 5. Distribution of the criteria.

Category | Dimension | No. of unique criteria | % of unique criteria | Avg. no. of mentions | Avg. no. of respondents
Intrinsic | Accuracy | 3 | 1.4% | 3.3 | 3.3
Intrinsic | Cohesiveness | 3 | 1.4% | 8.7 | 7.7
Intrinsic | Naturalness | 1 | 0.5% | 4.0 | 3.0
Intrinsic | Objective linkage clarity | 3 | 1.4% | 11.3 | 8.6
Representational | Representational clarity | 39 | 18.1% | 14.0 | 8.2
Representational | Representational consistency | 25 | 11.6% | 7.5 | 5.8
Representational | Visual attraction | 61 | 28.2% | 10.9 | 7.7
Representational | Ease of navigation | 19 | 8.8% | 10.6 | 8.4
Contextual | Completeness | 10 | 4.6% | 8.0 | 7.0
Contextual | Informativeness | 27 | 12.5% | 26.4 | 15.0
Contextual | Recency | 1 | 0.5% | 3.0 | 2.0
Contextual | Task appropriateness | 23 | 10.6% | 11.2 | 7.9
Reputational | Author/institutional reputation | 1 | 0.5% | 3.0 | 3.0
Total | | 216 | 100.0% | 7.2 | 5.6
‘Naturalness’ of IQ means the degree to which information is expressed by conventionally typified terms and forms in accordance with some generally accepted reference source(s). ‘The presence of equations’ is categorized into this dimension on the basis of the card sorting results in Phase 1. This allocation is reasonable considering that equations are generally made up of mathematical symbols, which represent conventional and typified concepts in a domain. ‘Task appropriateness’ is the degree to which information is proper and useful for a given task. This dimension includes those aspects denoting how well the slides fit the tasks for which they are designed, including class presentation, self-study, note-taking and reporting. As for task appropriateness, criteria such as ‘good summarization’, ‘the presence of exercise’ and ‘the presence of question’ are assigned to this dimension.

Table 6. Examples of the criteria with the numbers of mentions and respondents, ordered by respondents (shown as mentions/respondents).

Intrinsic
- Accuracy: The presence of typos (5/5); The presence of inaccurate explanations (3/3); The presence of incorrect images related to the content (2/2)
- Cohesiveness: The presence of irrelevant content (18/15); Irrelevant page title for the content (5/5); Strong content connectivity (3/3)
- Naturalness: The presence of equations (4/3)
- Objective linkage clarity: Content flow (25/19); Reversed content order against the outline (7/5); Weak connection between outline and content (2/2)

Representational
- Representational clarity: Highlighting of important points and terms (132/47); Writing style (69/47); Resolution of figures/images (45/22); Font size (41/21); Sentence spacing (28/19); Explanation with comparison (26/18)
- Representational consistency: Inconsistent font face (18/16); Inconsistent font size in content or title (21/15); Inconsistent indentation level (14/10); Consistent use of background template (12/9); Inconsistent animation effects (speed, step, etc.) (9/9)
- Visual attraction: The line-by-line presence of animation (42/25); Clean and neat slide design (35/21); The presence of animation (30/21); Color scheme (25/15); Default font color (20/15)
- Ease of navigation: The presence of slide numbering (39/23); The presence of a table of contents (27/19); The presence of textbook page numbers (9/7); The presence of a title for each page (8/7); Showing subsection titles on each page (7/6)

Contextual
- Completeness: The presence of references (15/14); The presence of necessary information on the cover page (title, author, etc.) (14/11); The presence of necessary content (11/10); The absence of default/general content for a concept (8/6); The presence of an appendix (3/3)
- Informativeness: The presence of too much text (120/43); The presence of examples (93/36); The presence of additional explanation for equations, graphs, figures and tables (80/34); The presence of figures (56/33); The presence of a lot of content (slides) (40/26)
- Recency: Outdated content (3/2)
- Task appropriateness: Good summarization (43/26); The presence of exercise (21/19); The presence of question (22/15); Enough white space on a page (20/15); Unsuitable for printing (11/11)

Reputational
- Author/institutional reputation: Slides by the textbook publisher (3/3)
One of the main contributions of the present study is providing a rich set of quality criteria (i.e. 216) for presentation
slides. Because the open-ended interviews did not provide any prior criteria, the participants could come up with a vari-
ety of quality criteria. This set of criteria could be a foundation for developing diverse metrics at multiple levels for pre-
sentation slides. Another contribution is the measurement and consequent identification of the relative importance of
each criterion based on user utterances obtained through the think-aloud process. The measurement data served as a basis
for subsequent feature selection out of the whole set of criteria for automatic assessment in ranking. We selected and
implemented the quality features by the number of mentions and respondents. Such insights are unavailable in the guide-
lines provided by professionals.
One limitation is that our user study considered only lecture slides, not other types of slides. However, experiments
later prove that quality criteria from lecture slides are also effective in IQ assessment for other types of slides (see
Section 5.1). Another limitation is that all our participants were Korean students. Their perceptions might be different
from other ethnic groups due to cultural differences. Extant literature is unclear on how culture affects slide feature pre-
ferences and evaluations. Moreover, even though they are also information consumers, students might have different
quality expectations than business professionals or subject experts. Nevertheless, it is important to note that the study
findings provide a significant starting point for the establishment of slide quality criteria and dimensions from users’
perspectives.
4. Automatic slide quality assessment
In this section, we present our approach to automatically assessing the quality of presentation slides, ranking them via
some of the quality criteria identified in our previous user study. Figure 3 describes our overall process of the automatic
assessment approach. More specifically, our approach consists of two stages: initial search and slide quality assessment.
The initial search stage aims to identify slides that are relevant to the query, without the consideration of slide quality.
The second stage assesses the quality of each relevant slide and produces a quality-based ranking. In the second stage,
we perform (1) data annotation to build a ground-truth dataset, which was divided into a training set and a testing set, (2)
feature extraction to serve as input for an LTR model, and (3) an experiment of LTR model training and prediction/ranking.

Figure 3. Overall process of automatic assessment (initial search with BM25, followed by quality annotation, feature extraction based on the IQ taxonomy, LTR model training, and prediction/ranking).
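For reference, the initial relevance search relies on the Okapi BM25 ranking function; its standard formulation is sketched below. The parameter values noted afterwards are common defaults, not settings reported in this paper.

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{IDF}(t) \cdot
  \frac{f(t, d)\,(k_1 + 1)}
       {f(t, d) + k_1 \left( 1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}} \right)}
```

Here f(t, d) is the frequency of term t in slide document d, |d| is the length of d, avgdl is the average document length in the collection, and k_1 and b are free parameters (commonly k_1 ≈ 1.2 and b = 0.75).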
4.1. Rationale and slide quality assessment
Compared with conventional ranking models such as BM25 and the statistical language model, LTR is an effective rank-
ing model because it offers additional benefits such as automatically tuning parameters, combining multiple evidence
and avoiding over-fitting [18]. In addition, LTR has been successfully adapted to various quality-based ranking tasks that
utilize Web resources [23, 24, 25]. However, a main disadvantage of LTR is that creating training data by labelling every example is quite expensive, particularly when the dataset is large. If this labelling issue can be overcome, LTR is a better choice than the traditional ranking models as it offers the several advantages already mentioned. Thus, we decided
to use LTR in conjunction with the features extracted from our IQ framework.
This section explains the rationale and realization of the learning scheme. LTR is a supervised learning method that includes training and testing phases [18, 37]. The training dataset consists of a number of queries and slides. The quality of the slides with respect to the query is represented by several grades, with a higher grade corresponding to higher quality. Let Q = {q_1, q_2, ..., q_m} be the query set, where q_i is the i-th query, and let S_i = {s_{i,1}, s_{i,2}, ..., s_{i,n_i}} be the set of slides associated with q_i. L = {1, 2, ..., l} is the grade label set, ordered l > l-1 > ... > 1. Suppose that L_i = {l_{i,1}, l_{i,2}, ..., l_{i,n_i}} is the set of labels associated with query q_i, where n_i denotes the size of S_i and L_i, and l_{i,j} denotes the j-th grade label in L_i, representing the quality degree of s_{i,j} with respect to q_i. The training set is denoted as T = {(q_i, S_i), L_i}_{i=1}^m. A feature vector x_{i,j} = u(q_i, s_{i,j}) is derived from each query-slide pair (q_i, s_{i,j}), i = 1, 2, ..., m; j = 1, 2, ..., n_i, where u denotes the feature functions. The training dataset is thus represented as S' = {(x_i, L_i)}_{i=1}^m, where x_i = {x_{i,1}, x_{i,2}, ..., x_{i,n_i}}. Our goal is to train a ranking model f(q, s) = f(x) that assigns a score to a given feature vector x. We then select a ranking list from all possible ranking lists for the given query q_i and the associated slides S_i using the scores given by the ranking model f(q_i, S_i). The testing data are denoted as T' = {(q_{m+1}, S_{m+1})}, consisting of a new query q_{m+1} and associated slides S_{m+1}. A feature vector x_{m+1} is created from S_{m+1}, and a score is assigned to the slides S_{m+1} by the trained ranking model. A ranking list of slides is then formed based on the sorted scores. A local ranking model is a function of a query and slides, or equivalently, a function of a feature vector created from a query and slides. Figure 4 depicts the flow of the LTR scheme of quality-based ranking for presentation slides.

Table 7. Description of datasets.

Dataset | Source | No. of queries | No. of slides
SLIDES-SA | SlideShare | 140 | 935
SLIDES-SF | SlideShare | 500 | 24,995
SLIDES-GA | Google | 36 | 180
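LambdaMART and AdaRank are commonly trained with existing LTR toolkits (e.g. RankLib), which consume query-grouped training data in the LETOR/SVMlight text format. The fragment below is purely illustrative: the grades, query ids, feature values and file names are hypothetical, not data from our experiments.

```
# <grade> qid:<query id> <feature index>:<value> ... # <slide file>
3 qid:1 1:0.02 2:0.45 3:0.11 # dm_intro.pptx   (grade 3: high quality)
1 qid:1 1:0.31 2:0.07 3:0.25 # dm_notes.pdf    (grade 1: low quality)
2 qid:2 1:0.12 2:0.28 3:0.09 # ml_basics.pptx
```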
In the training phase, we applied the LTR algorithm to train the model for slide quality assessment. We first created a
set of training slide queries and then generated relevant slides via BM25 and Google. Next, a group of judges were asked
to determine the quality grade of each query and the returning slides set (see Section 5.1) by rating the quality of each
slide and ranking the order of the slides in a query set. With the annotated data, the LTR algorithm generated a global
model that considered the ranking relationships among the slides of each training query set. In the testing phase, we used
k-fold cross-validation to generate a single held-back set from the dataset given by the BM25 algorithm and Google
slide search. More specifically, k − 1 different partitions of the whole dataset were used to train the LTR model, and the
remaining partition was used as the test data. The learned LTR model then predicted the ranking score of each slide in
the query set. The system produced a ranked list based on these scores. The k results were then averaged to produce a
single assessment. In this manner, the assessment stage can be seen as a period in which re-ranking of the initial rele-
vance search results was performed in consideration of quality. Both training and test data underwent the feature extrac-
tion procedure (see Section 4.2).
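The following minimal Python sketch shows the shape of this cross-validation loop. The train_ltr function and the returned model's predict method are hypothetical stand-ins for an actual LambdaMART or AdaRank trainer, and ndcg_at_k is the helper sketched in the Introduction.

```python
import numpy as np
from sklearn.model_selection import KFold

def kfold_ndcg(queries, k=5):
    """queries: list of (feature_matrix, grade_list) pairs, one entry per query.
    Trains on k-1 folds, ranks the held-out fold, and averages NDCG@10."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(queries):
        model = train_ltr([queries[i] for i in train_idx])  # hypothetical LTR trainer
        for i in test_idx:
            X, grades = queries[i]
            order = np.argsort(-model.predict(X))           # sort slides by predicted score
            scores.append(ndcg_at_k([grades[j] for j in order], k=10))
    return float(np.mean(scores))                           # single averaged assessment
```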
4.2. Features
This section describes the quality features for automatic assessment, which were inspired by our own user study findings. Quality criteria can be broadly divided into two types: measurable and non-measurable [17]. Measurable criteria are those that can be objectively and reliably extracted by a computer program without user intervention (e.g. the number of words in the slides), whereas non-measurable criteria are those that are subjectively assigned by a human (e.g. writing style). We selected 35 measurable criteria from the total of 216 criteria identified earlier through our user study, primarily considering two conditions: the number of respondents and implementation feasibility. The first condition was based on our assumption that quality criteria mentioned by more users are likely to be more influential in determining the quality of slides than criteria mentioned less often. For example, in the case of representational clarity, when deciding which features to implement, we considered the criteria from the top of the list (see Table 6). The criterion 'highlighting of important points and terms' is the most frequently mentioned one for that dimension. We therefore selected highlighting of important points and modelled it by checking for the presence of 'bold' and 'italics', implemented by counting the number of bold and italic terms in the text. Fortunately, the POI extractor6 supports the extraction of bolded and italicized terms from the slide text. However, as a
counterexample, the second most frequent criterion, 'writing style', is very hard to measure objectively by computer because of its abstractness; designing a method to measure it would be highly complicated and expensive. Therefore, we excluded this criterion.

Figure 3. Overall process of automatic assessment: initial search (BM25), quality annotation, feature extraction organized by the IQ taxonomy (accuracy, cohesiveness, etc.), model training, and prediction/ranking for slide quality assessment.

In this manner, we divided the criteria into measurable and non-measurable ones. We
sought to obtain as many measurable features as possible from the user criteria so as to increase the measurement accuracy of the automated quality assessment approach. In total, we devised 65 measurable features across 10 IQ dimensions from 35 related user criteria. Appendix B shows how each implemented feature is related to its respective dimension via the user criterion obtained from our user study. In addition, we adopted some known features from prior research, such as readability and entropy [8, 13, 14, 25]. We named the dimension of each feature after the dimension of the criterion from which the feature was derived. For example, the dimension of the feature numTypos is accuracy because the feature was derived from the criterion 'the presence of typos', whose dimension is accuracy according to the card sorting exercises.
The next step in the distillation of quality features was to devise extraction methods from the criteria to assess the quality of slides. The Intrinsic category includes accuracy and cohesiveness features. Participants in our user study mentioned that 'the presence of typos' affected the accuracy of the slides, and we formulated this user criterion as the measurable feature numTypos [25]: the number of typos in the slides was counted using the spell checker in LanguageTool.5 For cohesiveness, users mentioned 'strong content connectivity', and we measured it by calculating the entropy of the slide texts. This feature was also reported in previous studies [8, 25]. The entropy of a document D is computed over the individual document terms as:
$$-\sum_{w \in D} p_D(w) \log p_D(w), \quad \text{where } p_D(w_i) = \frac{tf_{w_i,D}}{\sum_{w_j \in D} tf_{w_j,D}} \qquad (2)$$

The probability of word $w_i$ is computed using a maximum likelihood estimate $p_D(w_i)$; $tf_{w_i,D}$ is the term frequency of $w_i$ in document $D$, and $\sum_{w_j \in D} tf_{w_j,D}$ is the sum of the frequencies of all terms in $D$.
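Equation (2) translates directly into a few lines; this minimal sketch assumes terms is the token list extracted from a slide file.

import math
from collections import Counter

def text_entropy(terms):
    """Entropy of the slide text (equation (2)) with p_D estimated
    by maximum likelihood from term frequencies."""
    tf = Counter(terms)
    total = sum(tf.values())
    return -sum((n / total) * math.log(n / total) for n in tf.values())

# e.g. text_entropy('the model ranks the slides'.split())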
The Representational category includes clarity, consistency, attraction and ease of navigation. Representational features include numHighlights from 'highlighting important points and terms' (clarity), conFontFace/Size and conFontFace/SizeRatio from 'inconsistent font face/size' (consistency), preAnim/numAnim/avgNumAnim from
'line-by-line animation effect' and defFontColor from 'main font colour' (attraction), and preSlideNum from 'the presence of page numbering' (ease of navigation); these form a set of features unique to presentation slides that is not found in prior studies. In particular, 'highlighting of important points and terms' was the most frequently mentioned criterion. The total number of highlights in the slides, including bolds, italics, underlines and shadows, was counted for
avgNumHighlights. For consistency, features such as conFontFace and conFontFaceRatio measure the consistency of the font face with a binary value and a ratio, respectively. We first identified the dominant font face used across all of the slides and then, for conFontFace, checked whether the dominant font face of any slide (page) changed throughout the slides. For conFontFaceRatio, we used the ratio of the number of slides whose dominant font face differed from the overall dominant font face to the total number of slides. conFontSize/SizeRatio was estimated in the same manner.
Consistency of the font face of a slide file $s$ is estimated as:

$$\mathrm{conFontFace}(s) = \begin{cases} 0 & \text{if } dominantFontCount(s) \geq 2 \\ 1 & \text{otherwise} \end{cases} \qquad (3)$$
where $dominantFontCount(s)$ is the number of distinct per-page dominant font faces in a slide file $s$. For attraction, the total number of animation effects in the slides (numAnims) and the number of animation effects per slide (avgNumAnims) were counted; for avgNumAnims, we used the ratio between the number of animation effects and the number of pages. For preAnim, the presence of animation in the slides was recorded as a binary value (yes or no). Furthermore, the default font colour (defFontColor) of the slides was identified: we measured all font colours used and identified the most dominant one, represented in RGB format (e.g. R = 0, G = 0 and B = 0 for black). According to the user feedback, some font colours, such as green, grey, yellow and dark blue, severely harm visual attractiveness when used as the main colour, so avoiding them as the main font colour contributes to the attractiveness of slides. Interestingly, we found hardly any user comments about colours that directly improve slide quality; it seems that users tend not to comment when the slides have no visible problem with font colours. When we checked the main font colour of the slides in the high-quality class, black was the most common. For ease of navigation, the presence of slide numbers in the slides (preSlideNum) was measured with a binary value.
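The extraction itself was done with the Apache POI Java API; as a rough analogue (an assumption for illustration, not the authors' code), the python-pptx library exposes the same run-level font attributes for .pptx files, which is enough to sketch conFontFace (equation (3)) and conFontFaceRatio.

from collections import Counter
from pptx import Presentation  # assumed stand-in for the POI extractor

def font_face_consistency(path):
    """conFontFace: 1 if a single font face dominates every page, else 0;
    conFontFaceRatio: fraction of pages whose dominant face deviates."""
    page_fonts = []
    for slide in Presentation(path).slides:
        names = [run.font.name
                 for shape in slide.shapes if shape.has_text_frame
                 for para in shape.text_frame.paragraphs
                 for run in para.runs if run.font.name]
        if names:
            page_fonts.append(Counter(names).most_common(1)[0][0])
    if not page_fonts:
        return 1, 0.0
    con_font_face = 0 if len(set(page_fonts)) >= 2 else 1
    dominant = Counter(page_fonts).most_common(1)[0][0]
    ratio = sum(f != dominant for f in page_fonts) / len(page_fonts)
    return con_font_face, ratio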
The Contextual category includes completeness, informativeness, recency and task appropriateness. Features such as preCoverPageInfo from 'the presence of necessary information in the cover page' (completeness), numDiagrams from 'the presence of diagram' (informativeness), preExample/numExamples/avgNumExamples from 'the presence of example' (informativeness), and preSummary from 'the presence of summary' (task appropriateness) were new features in this category. For preCoverPageInfo, we checked whether the first page of the slides had a title in the title text box and information in the subtitle box. For numDiagrams, we summed the respective numbers of lines, autoshapes (drawing objects with a particular shape) and textboxes extracted from the slides. For preExample, numExamples and avgNumExamples, we used textual cues by checking for keywords such as 'for example' and 'for instance' in the texts extracted from the slides; features measured using textual cues, such as numExamples, are marked in Appendix B. Recency of the slides was measured by how many months had passed since the slides were created (age) and modified (recency) [14]. For preSummary, the presence of a summary in the slides was measured as a binary value by checking for keywords such as 'summary' and 'outline'. The Reputational category includes author/institutional reputation; however, we did not consider it in this study (see Appendix B for a full list of the quality features implemented).
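A minimal sketch of these textual-cue features, assuming slide_texts holds the extracted text of each page; the cue keyword lists follow the descriptions above and Appendix B.

EXAMPLE_CUES = ('for example', 'for instance')
SUMMARY_CUES = ('summary', 'outline')

def textual_cue_features(slide_texts):
    """preExample, numExamples, avgNumExamples and preSummary via keyword checks."""
    low = [t.lower() for t in slide_texts]
    num_examples = sum(t.count(cue) for t in low for cue in EXAMPLE_CUES)
    return {
        'preExample': int(num_examples > 0),
        'numExamples': num_examples,
        'avgNumExamples': num_examples / max(len(low), 1),
        # checking the last three pages for summary cues is our reading of
        # the appendix description ('in the last 3 pages')
        'preSummary': int(any(cue in t for t in low[-3:] for cue in SUMMARY_CUES)),
    }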
5. Experiments
5.1. Experimental setting
To account for the different characteristics of datasets and improve the overall robustness of the empirical validation, we conducted multiple experiments. Specifically, three datasets were employed: SLIDES-SA (SA: data obtained from SlideShare and Annotated by humans), SLIDES-SF (SF: data obtained from SlideShare and classified by Featured selection) and SLIDES-GA (GA: data obtained using Google and Annotated by humans).
For the first dataset, we collected 1276 PowerPoint presentation slides randomly selected from SlideShare.1 The collected slides covered four study areas: Technology, Business, Education and Health. We asked judges to manually assign a quality grade (low, fair or high) to the slides; these labels indicate the overall quality across the four IQ categories, i.e. Intrinsic, Representational, Contextual and Reputational. Three annotators judged each slide. Over the whole dataset, the inter-annotator agreement among the three annotators was κ = 0.63, which indicates substantial agreement
[38]. Finally, we obtained 935 slides (low: 222, fair: 447, and high: 266) whose quality labels had been agreed on by at
least two annotators out of three. We refer to these annotated data as SLIDES-SA.
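Agreement was computed with Fleiss' kappa [38]; the following is a minimal sketch over an items-by-categories count matrix (three raters, three grades), shown for reference rather than as the authors' script.

def fleiss_kappa(counts):
    """counts[i][j]: number of raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # proportion of all assignments falling in each category
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(len(counts[0]))]
    # mean per-item agreement
    p_bar = sum((sum(c * c for c in row) - n_raters) /
                (n_raters * (n_raters - 1)) for row in counts) / n_items
    p_e = sum(p * p for p in p_j)          # chance agreement
    return (p_bar - p_e) / (1 - p_e)

# e.g. fleiss_kappa([[3, 0, 0], [0, 2, 1], [1, 1, 1]])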
To conduct a large-scale experiment with a different quality evaluation perspective, we crawled another 24,995 slides from SlideShare, comprising 3655 featured slides and 21,340 non-featured slides. SlideShare provides editors' picks as featured slides on the Web site.4 We assumed that the featured slides were of high quality and the others not as high quality as the featured ones. We refer to these data as SLIDES-SF.
Furthermore, for comparison with one of the most popular and successful search engines, we collected 180 slides from Google using a filetype operator with 36 queries in two disciplines, Computer Science and Management (e.g. introduction to programming filetype:pptx or filetype:ppt). We downloaded the top five slides per query. For this dataset, we employed six graduate students as annotators, three majoring in Computer Science and three in Management, to assign a quality grade on a three-point scale to the slides. The three annotators from the corresponding group judged each slide file related to their major, and we averaged the scores obtained from the annotators, rounding the average to the nearest integer. Consequently, we obtained 180 slides (low: 28, fair: 93 and high: 59). We refer to these annotated data as SLIDES-GA.
SLIDES-SA and SLIDES-SF consist of slides crawled from SlideShare but with different annotators. The featured slides in SLIDES-SF were selected as high-quality slides by the curators at SlideShare. However, given that we cannot identify the selection process or the standards used by the curators, we needed another quality dataset (SLIDES-SA) with manual annotation by our own annotators, who were familiar with the IQ dimensions and criteria. With these two datasets, we checked for possible differences caused by different annotators in later experiments (see Section 5.2). Furthermore, we manually built SLIDES-GA from the search results returned by Google so that we could compare the Google slide search with our proposed algorithm. There were no overlaps among the three datasets. A summary of the datasets is presented in Table 7. To extract the proposed quality features of the slides, we used Apache POI,6 a Java API for reading and writing Microsoft Office files such as Word, Excel and PowerPoint.

Table 7. Description of datasets.

Dataset     Source       No. of queries   No. of slides
SLIDES-SA   SlideShare   140              935
SLIDES-SF   SlideShare   500              24,995
SLIDES-GA   Google       36               180
We applied the Okapi BM25 algorithm [39] as the baseline method for SLIDES-SA and SLIDES-SF. It is one of the most effective retrieval algorithms and is based on a probabilistic relevance framework. To obtain search results via BM25, we manually created the query sets for SLIDES-SA: we randomly selected 140 keywords (e.g. knowledge discovery, international business, social education, etc.) from the content of the slides in each category. For SLIDES-SF, we selected the top 500 most frequent tags (e.g. marketing, social media, etc.) from the tagsets of the featured slides only, to avoid queries whose BM25 search results contain no featured slides; because SLIDES-SF contains only a small number of featured slides, queries drawn from the tags of all slides would return featured slides too sparsely, which would be a constraint on the experiment. These keywords were used as queries for the BM25 search algorithm to produce the initial ranked slide lists for SLIDES-SA and SLIDES-SF. We used the open-source search engine Apache Lucene7 (version 4.9) to generate slide search results using the BM25 search algorithm. Further, we used Google search results as a comparative baseline for SLIDES-GA: we used 36 subject names (e.g. introduction to programming, investment, etc.) in Computer Science and Management as queries. While crawling SLIDES-GA from Google, we recorded the ranking of the slides and treated those rankings as the initial search results. The initial lists of slides from Okapi BM25 and Google were then re-ranked using the LTR algorithm with our proposed features. We utilized two widely adopted listwise LTR algorithms, AdaRank [40] and LambdaMART [41], as implemented in RankLib.8
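For reference, a minimal sketch of the Okapi BM25 score [39] follows; k1 = 1.2 and b = 0.75 are the common defaults and an assumption here, since the paper relies on Lucene's implementation and does not report parameter values.

import math

def bm25_score(query_terms, doc_terms, df, n_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25: sum over query terms of IDF times a saturated term frequency.
    df maps a term to its document frequency in the collection."""
    score, dl = 0.0, len(doc_terms)
    for q in query_terms:
        tf = doc_terms.count(q)
        if tf == 0 or q not in df:
            continue
        idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avg_len))
    return score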
We set the parameter values for our experiments as follows. AdaRank: number of iterations = 500 (the number of rounds to train), tolerance = 0.002 (tolerance between two consecutive rounds of training), max selection count of a feature = 5 (the maximum number of times a feature can be consecutively selected without changing performance). LambdaMART: number of trees = 1000, number of leaves = 10 (leaves per tree), learning rate = 0.1 (the shrinkage factor, i.e. the weight applied to the score of each regression tree when the trees are ensembled). We chose these values empirically for roughly the best performance.
We conducted a 10-fold cross-validation and calculated the average performance of each LTR algorithm. We report
two standard retrieval measures: the normalized discounted cumulative gain (NDCG) and mean reciprocal rank (MRR).
The NDCG [42], a widely used metric in the information retrieval field, is adopted to measure the ranking performance.
To calculate the NDCG, the discounted cumulative gain (DCG) at a particular rank position p is first calculated in a way
that penalizes the score gain near the bottom more than near the top:
DCG @ p=Xp
i= 1
2reli � 1
log2 i+ 1ð Þ ,NDCG @ p= DCG @ p
IDCG @ pð4Þ
Kim et al. 16
Journal of Information Science, 2016, pp. 1–27 � The Author(s), DOI: 10.1177/0165551516661917
at KOREA ADV INST OF SCI & TECH on October 3, 2016jis.sagepub.comDownloaded from
where IDCG@p serves as the normalization term that guarantees the ideal NDCG@p to be 1. We summarize the performance by averaging the NDCGs over the test query set. To measure the performance from a different perspective, we adopt the MRR:
$$\mathrm{MRR} = \frac{1}{|S|} \sum_{i=1}^{|S|} \frac{1}{rank_i} \qquad (5)$$
where $rank_i$ denotes the rank of the first high-quality slide (or the first low-quality slide) in the ranked list for the $i$-th query slide set and $|S|$ is the total number of test queries. We denote the MRR for the high- and low-quality slides as MRR_H and MRR_L, respectively. A better ranking method yields an MRR_H closer to 1 and an MRR_L closer to $1/N$, where $N$ is the number of slides observed.
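Equations (4) and (5) transcribe directly into code; this sketch assumes the graded labels are listed in ranked order for each test query.

import math

def ndcg_at(labels, p):
    """Equation (4): labels are the graded relevances rel_i in ranked order."""
    def dcg(ls):
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(ls[:p]))
    ideal = dcg(sorted(labels, reverse=True))   # IDCG@p
    return dcg(labels) / ideal if ideal > 0 else 0.0

def mrr(ranked_lists, is_target):
    """Equation (5): reciprocal rank of the first target slide, averaged over
    queries (high-quality targets for MRR_H, low-quality for MRR_L)."""
    total = 0.0
    for labels in ranked_lists:
        rank = next((i + 1 for i, lab in enumerate(labels) if is_target(lab)), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_lists)

# e.g. mrr([[3, 1, 2], [1, 3, 2]], is_target=lambda lab: lab == 3) == (1 + 0.5) / 2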
5.2. Results
Figure 5 compares the NDCG@k scores for AdaRank and LambdaMART using all the proposed features against the baselines (BM25 and Google) on the three datasets: (a) SLIDES-SA, (b) SLIDES-SF and (c) SLIDES-GA. The results confirm that both AdaRank and LambdaMART outperform the baselines, with LambdaMART scoring higher than AdaRank. All the differences between AdaRank or LambdaMART and the baselines at all NDCG@k positions were statistically significant according to the Wilcoxon sign test (α < 0.05), except between LambdaMART and AdaRank on SLIDES-GA, which is a relatively small dataset. We achieved increases over the baselines of 23.4% (NDCG@5, LambdaMART, SLIDES-SA), 50.3% (NDCG@5, LambdaMART, SLIDES-SF) and 9% (NDCG@3, LambdaMART, SLIDES-GA). The results demonstrate that the proposed features can be used to effectively rank high-quality slides.

Figure 5. Performance of leveraging high-quality slides in the ranking using all features: (a) SLIDES-SA, (b) SLIDES-SF and (c) SLIDES-GA.
To analyze the impact of each category and each dimension, we measured the performance using the features of each category and each dimension separately. This analysis enabled us to identify how each category and each dimension contributes to the results. We conducted these additional experiments with only SLIDES-SA and SLIDES-SF because large datasets generally guarantee more stable results. Figure 6 presents the NDCG@5 results (10-fold, LambdaMART) obtained for each IQ category (Figure 6(a)) and each IQ dimension (Figure 6(b)).
From Figure 6(a), it is clear that the Representational category was the most important in both datasets, followed by the Contextual and Intrinsic categories. From Figure 6(b), significant differences can be observed, with representational clarity (0.87) producing the strongest impact, followed by informativeness (0.8) and visual attraction (0.79) in SLIDES-SA. Representational clarity (0.71), informativeness (0.67), visual attraction (0.63) and recency (0.61) are the effective dimensions in SLIDES-SF. Representational clarity, informativeness and visual attraction consistently have a high impact on performance in both datasets. However, completeness, ease of navigation and accuracy scored lower than the other dimensions, suggesting that these quality dimensions do not have much of an impact on the quality of slides. These experimental results indicate that representational clarity and informativeness are essential components in the assessment of presentation slide quality. The finding that representational clarity is important for assessing slide quality is rather natural because presentation slides are a visually oriented communication device between presenter and audience, and representational clarity is directly related to the conveying of messages [29, 30, 32]. It should be noted that
representational clarity is a distinctive and discriminative IQ dimension for slides, even though this dimension has not been prominent in prior studies [8, 14, 25]. These results contrast with previous reports in which readability was found to be the most important factor for the quality ranking of Web documents [8], length, structure and style were central for assessing the quality of Wikipedia articles [14], and review and user features were the most important for ranking quality in the Q&A domain [25].
In our experiments, LambdaMART exhibited better performance than AdaRank; we therefore measured feature importance using LambdaMART. LambdaMART computes the importance of a feature by summing the number of times it is used in splitting decisions [43, 44]. The relative importance of the other features is then obtained by normalizing against the importance of the largest feature: the most important feature has an importance score of 1, and the other features have relative importance scores between 0 and 1. For this experiment, we built two LambdaMART model sets containing 1000 trees with 10 leaves and a learning rate of 0.1 during training on SLIDES-SA and SLIDES-SF, and then calculated the feature importance from the models. The top 10 features are listed in Table 8. Although there are some differences in feature importance between the two datasets, eight features appear in the top 10 of both datasets – enough to support general conclusions. Features related to representational clarity were found to be highly important: four of the eight common features are representational clarity features. Another notable point is that six of the eight common features (clarity: avgNumFontNames, numHighlights, avgFontSize and avgLineSpace; informativeness: numSlides and numImages) are relatively simple to estimate but have a significant impact
on the quality of slides. Out of these features, the five excluding numImages have not been identified by any prior study.

Figure 6. Performance by IQ category and dimension (NDCG@5, LambdaMART, 10-fold) on SLIDES-SA and SLIDES-SF.

Table 8. Feature importance given by LambdaMART.

Rank   SLIDES-SA                                           SLIDES-SF
       Importance   Feature             IQ dimension       Importance   Feature           IQ dimension
1      1.0          BM25†               Task appr.         1.0          ARI               Rep. clarity
2      0.879        avgFontNames†*      Rep. clarity       0.844        numSlides†*       Informativeness
3      0.458        avgNumFontColors*   Vis. attraction    0.834        avgFontSize†*     Rep. clarity
4      0.404        numImage†           Informativeness    0.6          entropy†          Cohesiveness
5      0.431        numHighlights†*     Rep. clarity       0.518        avgLineSpace†*    Rep. clarity
6      0.284        avgFontSize†*       Rep. clarity       0.513        BM25†             Task appr.
7      0.228        avgLineSpace†*      Rep. clarity       0.507        numImage†         Informativeness
8      0.209        entropy†            Cohesiveness       0.474        avgFontNames†*    Rep. clarity
9      0.197        numFontSize*        Rep. clarity       0.368        Flesh             Rep. clarity
10     0.193        numSlides†*         Informativeness    0.276        numHighlights†*   Rep. clarity

† common in both datasets; * newly proposed in this study.
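The normalization behind the importance scores in Table 8 is a one-liner; split_counts is a hypothetical mapping from feature name to its raw split count in the trained forest.

def relative_importance(split_counts):
    """Scale raw split counts so the most-used feature scores 1.0
    and the rest fall in (0, 1]."""
    top = max(split_counts.values())
    return {f: c / top
            for f, c in sorted(split_counts.items(), key=lambda kv: -kv[1])}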
To demonstrate the effectiveness of our ranking strategy using other measures, we traced the highest- and lowest-quality slides in the ranked list for the i-th query result. Table 9 shows the MRR_H and MRR_L values given by the baseline (BM25) and by LambdaMART with all the proposed features in SLIDES-SA and SLIDES-SF. Our proposed method with all the features achieved a 31.8% improvement in MRR_H and a 38.6% decrease in MRR_L over the baseline in SLIDES-SA. In SLIDES-SF, we achieved a 45.7% improvement in MRR_H and an 18.3% decrease in MRR_L over the baseline. Note that the baseline value of MRR_H (0.38) is much lower than that of MRR_L (0.855) in SLIDES-SF because the featured slides are sparse in the top 10 search results, so non-featured (low-quality) slides occupy high positions for many queries. Despite this problem, our proposed method achieved improvements in MRR over the baseline across the two datasets, clearly demonstrating the robustness of the approach. Finally, we present the average scores of major features in the high-quality and low-quality classes in Table 10. These results reveal the differences between high- and low-quality slides in terms of their constituent features and suggest that better-quality slides tend to contain more slides (pages), images, font colours, font names and highlights, contributing to better representational clarity and informativeness.

Table 9. Performance of ranking high-quality and low-quality slides (MRR).

Dataset     Baseline (BM25)       All features
            MRR_H     MRR_L       MRR_H     MRR_L
SLIDES-SA   0.48      0.388       0.663     0.238
SLIDES-SF   0.38      0.855       0.554     0.698

Table 10. Mean of major features in the high- and low-quality classes (SLIDES-SA/SLIDES-SF).

Feature                    High        Low
numSlides                  47.3/42.6   33.6/14.9
avgNumImages               1.2/2.9     0.5/1.2
numFontColors              6.9/7.8     5.1/3.2
numFontNames               2.7/4.7     2.18/2.1
numHighlights (per page)   2.8/2.0     2.18/1.8
6. Conclusion
In this paper, we have investigated the essential elements of presentation slide quality and proposed a new IQ framework developed specifically for presentation slides on the basis of direct user inputs. From the user study conducted in Phase 1, we elicited a rich set of criteria for slide quality. We discovered that users felt that the quality of slides was mostly affected by criteria such as 'highlighting of important points and terms', 'writing style' and 'the presence of too much text'. Furthermore, Representational criteria were emphasized in determining the quality of slides: visual attraction, representational clarity and informativeness were the most frequently mentioned IQ dimensions. Because the open-ended interviews did not mandate any predefined criteria or guidelines, the participants were able to present a variety of quality criteria. These criteria provided valuable clues for the development of diverse metrics at multiple levels for slide quality.
We also proposed a comprehensive LTR method developed specifically to promote high-quality and penalize low-
quality computerized presentation slides in Phase 2. We presented 69 features that capture 10 IQ dimensions such as
representational clarity, informativeness and visual attraction. We distilled these features through an intensive user study
and applied them to automatic assessment of the IQ of presentation slides. LambdaMART and AdaRank models trained
by human-annotated data showed substantially better performance than the baseline methods in ranking the quality of
slides. We demonstrated the generality of the proposed framework with three different datasets from different sources
with different quality assessments. Across the datasets, we found that representational clarity, informativeness and visual
attraction were the most effective features for ranking the quality of presentation slides, whereas completeness, ease of navigation and accuracy were relatively unhelpful. These results are consistent with the results of our user study. Six features (clarity: avgNumFontNames, numHighlights, avgFontSize and avgLineSpace; informativeness: numSlides and numImages) were very effective in identifying high-quality slides. The features numHighlights (highlighting of important points and terms), numImages (the presence of figures) and numSlides (the presence of a large amount of content) were the common features (criteria) confirmed to be important via both LambdaMART (Table 8) and the user interviews (Table 4). Based upon the results obtained in Phase 2, Figure 7 provides a revised taxonomy with the most important and effective dimensions emphasized in bold.
Our comprehensive framework is built upon extensive user feedback, and the subsequent automatic assessments conducted with the LTR strategy lead to a deeper understanding of IQ for presentation slides backed by empirical results. The framework has direct implications for practical applications and services. For instance, our proposed IQ assessment approach with LTR can be used by service providers such as SlideShare1 and SlideFinder2, which provide search and ranking functionalities over a massive number of slides. We expect that end-user satisfaction could be improved by rearranging their search results to promote high-quality slides based on the user-driven quality criteria identified by our research.
In our future work, we intend to develop and implement a system that further utilizes semantic quality features, especially for the Intrinsic category criteria. For example, we currently measure cohesiveness with the entropy of the slide text, but cohesiveness can also be measured by the connectivity between two paragraphs; entropy alone may therefore not give a complete picture. More sophisticated methods should be developed at the semantic level, enabling better IQ measurement of presentation slides. In addition, we plan to apply our framework to other domains, such as e-book or mobile content services, in which rich visual aids are highly sought after. For those applications, our quality framework can serve as an initial yardstick for identifying superior-quality visual aids.
Funding
This work was supported by the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea
government (MSIP) (No. R0101-15-0054, WiseKB: Big data based self-evolving knowledge base and reasoning platform).
Notes
1. http://slideshare.net
2. http://slidefinder.net
3. http://atlasti.com/
4. http://www.slideshare.net/featured
5. https://www.languagetool.org/
6. http://poi.apache.org
7. http://lucene.apache.org/core
8. http://sourceforge.net/p/lemur/wiki/RankLib/
Figure 7. Revised IQ taxonomy of presentation slides. Intrinsic: accuracy, cohesiveness; Representational: representational clarity, representational consistency, visual attraction, ease of navigation; Contextual: completeness, informativeness, recency, task appropriateness.
References
[1] Wang RY and Strong DM. Beyond accuracy: What data quality means to data consumers. Journal of Management Information
Systems 1996; 12(4): 5–33.
[2] Marschak J. Economics of information systems. Journal of the American Statistical Association 1971; 66(333): 192–219.
[3] Ballou DP and Pazer HL. Designing information systems to optimize the accuracy–timeliness tradeoff. Information Systems
Research 1995; 6(1): 51–72.
[4] Belardo S and Pazer HL. A framework for analyzing the information monitoring and decision support system investment trade-
off dilemma: An application to crisis management. IEEE Transactions on Engineering Management 1995; 42(4): 352–359.
[5] Porion A, Aparicio X, Megalakaki O, Robert A and Baccino T. The impact of paper-based versus computerized presentation on
text comprehension and memorization. Computers in Human Behavior 2016; 54: 569–576.
[6] Ge M and Helfert M. A review of information quality research. Paper presented at the International Conference on Information
Quality 2007.
[7] Stvilia B, Gasser L, Twidale MB and Smith LC. A framework for information quality assessment. Journal of the American
Society for Information Science and Technology 2007; 58(12): 1720–1733.
[8] Bendersky M, Croft WB and Diao Y. Quality-biased ranking of web documents. Proceedings of the 4th ACM international con-
ference on Web Search and Data Mining. Hong Kong, China: ACM, 2011, pp. 95–104.
[9] Knight S-A and Burn J. Developing a framework for assessing information quality on the World Wide Web. Informing Science
2005; 8: 159–172 (http://inform.nu/Articles/Vol8/v8p159-172Knig.pdf).
[10] Mandl T. Implementation and evaluation of a quality-based search engine. Proceedings of the seventeenth conference on
Hypertext and Hypermedia. Odense, Denmark: ACM, 2006, pp. 73–84.
[11] Rieh SY. Judgment of information quality and cognitive authority in the WWW. Journal of the American Society for
Information Science and Technology 2002; 53(2): 145–161.
[12] Zhou Y and Croft WB. Document quality models for web ad hoc retrieval. Proceedings of the 14th ACM international confer-
ence on Information and Knowledge Management. Bremen, Germany: ACM, 2005, pp. 331–332.
[13] Anderka M, Stein B and Lipka N. Predicting quality flaws in user-generated content: The case of Wikipedia. Proceedings of the
35th international ACM SIGIR conference on Research and Development in Information Retrieval. Portland, OR: ACM, 2012,
pp. 981–990.
[14] Dalip DH, Gonçalves MA, Cristo M and Calado P. Automatic quality assessment of content created collaboratively by web
communities: A case study of Wikipedia. Proceedings of the 9th ACM/IEEE-CS joint conference on Digital Libraries. Austin,
TX: ACM, 2009, pp. 295–304.
[15] Hu M, Lim E-P, Sun A, Lauw HW and Vuong B-Q. Measuring article quality in Wikipedia: Models and evaluation.
Proceedings of the 16th ACM international conference on Information and Knowledge Management. Lisbon: ACM, 2007, pp.
243–252.
[16] Stvilia B, Twidale MB, Smith LC and Gasser L. Information quality work organization in Wikipedia. Journal of the American
Society for Information Science and Technology 2008; 59(6): 983–1001.
[17] Yaari E, Baruchson-Arbib S and Bar-Ilan J. Information quality assessment of community-generated content: A user study of
Wikipedia. Journal of Information Science 2011; 37(5): 487–498.
[18] Liu T-Y. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval 2009; 3(3): 225–331.
[19] Kim S, Jung W, Han K, Lee JG, and Yi M. Quality-based automatic classification for presentation slides. Proceedings of the
36th European Conference on Information Retrieval (ECIR) 2014, pp. 638–643.
[20] Hilligoss B and Rieh SY. Developing a unifying framework of credibility assessment: Construct, heuristics, and interaction in
context. Information Processing & Management. 2008; 44(4): 1467–1484.
[21] Alkhattabi M, Neagu D and Cullen A. Information quality framework for e-learning systems. Knowledge Management & E-
Learning: An International Journal (KM&EL) 2010; 2(4): 340–362.
[22] Dedeke A. A conceptual framework for developing quality measures for information systems. Proceedings of the Conference
on Information Quality 2000, pp. 126–128.
[23] Richardson M, Prakash A and Brill E. Beyond PageRank: Machine learning for static ranking. Proceedings of the 15th interna-
tional conference on World Wide Web. Edinburgh: ACM, 2006, pp. 707–715.
[24] Choi S, Ryu B, Yoo S and Choi J. Combining relevancy and methodological quality into a single ranking for evidence-based
medicine. Information Sciences 2012; 214: 76–90.
[25] Dalip DH, Gonçalves MA, Cristo M and Calado P. Exploiting user feedback to learn to rank answers in Q&A forums: A case
study with stack overflow. Proceedings of the 36th international ACM SIGIR conference on Research and Development in
Information Retrieval. Dublin: ACM, 2013, pp. 543–552.
[26] Raiber F and Kurland O. Using document-quality measures to predict web-search effectiveness. Proceedings of the 35th
European conference on Advances in Information Retrieval. Moscow: Springer, 2013, pp. 134–145.
[27] Alkhattabi M, Neagu D and Cullen A. Assessing information quality of e-learning systems: A web mining approach. Computers
in Human Behavior 2011; 27(2): 862–873.
[28] Wu O, Hu R, Mao X and Hu W. Quality-based learning for web data classification. Proceedings of the twenty-eighth AAAI con-
ference on Artificial Intelligence 2014.
[29] Alley M. The Craft of Scientific Presentations: Critical Steps to Succeed and Critical Errors to Avoid. New York: Springer,
2013.
[30] Reynolds G. Presentation zen: Simple ideas on presentation design and delivery. Berkeley, CA: New Riders, 2011.
[31] Alley M and Neeley KA. Discovering the power of PowerPoint: Rethinking the design of presentation slides from a skillful
user’s perspective. American Society for Engineering Education Annual Conference & Exposition, 2005.
[32] Mackiewicz J. Perceptions of clarity and attractiveness in PowerPoint graph slides. Technical Communication 2007; 54(2):
145–156.
[33] Miles MB and Huberman AM. Qualitative Data Analysis: An expanded sourcebook, 2nd edition. Thousand Oaks, CA: Sage,
1994.
[34] Saldana J. The Coding Manual for Qualitative Researchers. Thousand Oaks, CA: Sage, 2009.
[35] Fincher S and Tenenberg J. Making sense of card sorting data. Expert Systems 2005; 22(3): 89–93.
[36] Spencer D. Card sorting: Designing Usable Categories. New York: Rosenfeld, 2009.
[37] Li H. A short introduction to learning to rank. IEICE Transactions on Information and Systems 2011; E94-D(10).
[38] Fleiss JL. Measuring nominal scale agreement among many raters. Psychological Bulletin 1971; 76(5): 378–382.
[39] Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM and Gatford M. Okapi at TREC-3. In: Proceedings of the Third Text REtrieval Conference (TREC-3), 1994.
[40] Xu J and Li H. Adarank: A boosting algorithm for information retrieval. Proceedings of the 30th annual international ACM
SIGIR conference on Research and Development in Information Retrieval. Amsterdam: ACM, 2007, pp. 391–398.
[41] Burges CJC. From RankNet to LambdaRank to LambdaMART: An overview. Microsoft Research Technical Report MSR-TR-2010-82, 2010.
[42] Järvelin K and Kekäläinen J. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 2002; 20(4): 422–446.
[43] Wu Q, Burges CJ, Svore KM and Gao J. Adapting boosting for information retrieval measures. Information Retrieval 2010;
13(3): 254–270.
[44] Berberich K, Konig AC, Lymberopoulos D and Zhao P. Improving local search ranking through external logs. Proceedings of
the 34th international ACM SIGIR conference on Research and Development in Information Retrieval. Beijing: ACM, 2011,
pp. 785–794.
Appendix A. Users' comments for the criteria.

Category: Intrinsic

Dimension: Accuracy
- The presence of typos: 'There are some errors, which make me confused, so I was not drawn into the presentation.' 'It seems like the errors caused distrust in the slides.'
- The presence of inaccurate explanations: 'The slides lean too much toward specific viewpoints. There are many other perspectives on this historical event. However, other scholars' comments are not included here.'

Dimension: Cohesiveness
- The presence of irrelevant content: 'It appears to have irrelevant points in certain sections.' 'I think it has a lot of irrelevant details that are not related to the learning aims of this course.'
- Irrelevant page title for the content: 'This slide is titled "The differences between stage 1 and stage 2", but only stage 1 is discussed on this page.' 'I am not sure whether this is a correct title for the content of this page. The title and the content are mismatched.'

Dimension: Naturalness
- The presence of equations: 'It is easier to understand when the slides have equations rather than just text.' 'This is a very important concept. Explaining using equations is natural and intuitive.'

Dimension: Objective linkage clarity
- Content flow: 'The content of the slides flows very well. The story unfolds in a very natural way.' 'I liked the plot of the slides. Specifically, I thought the author did a great job with deploying the content in the slides.'
- Reversed content order against the outline: 'The order is strange; I realized that the content order changed from the outline presented at the beginning. Due to this, I suddenly lost my way.' 'I think pages 9 and 10 should be changed to follow the section outline. The editor might have made a mistake.'

Category: Representational

Dimension: Representational clarity
- Highlighting of important points and terms: 'Another good thing is that highlighting the important parts in red helps my understanding; the title in bold is easier to see.' 'The important parts use a strong colour and the unimportant parts are weakly expressed; therefore, I take more notice of the significant parts.'
- Writing style (sentence writing style): positive example: 'Explanations using natural sentences are much more understandable for me. I can catch the points from detailed explanations.' Negative example: 'The sentences are hard to read. I prefer short, well-constructed expressions in slides.'
- Writing style (summarized writing style): positive example: 'The condensed, summarized style of writing using important keywords improves my concentration since it reproduces key facts and ideas.' Negative example: 'I can't understand what this phrase means. A full, descriptive sentence would improve the explanation, though it would be a bit longer.'

Dimension: Representational consistency
- Inconsistent font face: 'It is not good to suddenly change the font face. It seems better to use a consistent font.' 'Inconsistent use of fonts in the slides seems to disturb concentration.'
- Inconsistent font size (in content, title): 'The font size unexpectedly gets bigger on this page. It really harms consistency and concentration.' 'The inconsistent font sizes throughout the slides make it difficult to read the text, regardless of content.'

Dimension: Visual attraction
- The line-by-line presence of animation: negative examples: 'Due to too many animation effects in the slides, it was not proper for examination.' 'I can't understand why so many animation effects are embedded in every single sentence. Excessive animation destroys concentration.' Positive example: 'Content should be presented in this manner using animation effects so that the main idea can be easily understood.'
- Clean and neat slide design: 'This slide is so neat and clear that it is more appealing to the eyes.' 'My first impression of this slide is that it is simple, neat, and well designed. It's nice to look at it.'

Dimension: Ease of navigation
- The presence of slide numbering: 'Slide titles with numbers are great. Content numbering associated with the table of contents obviously provides a lot of synergy for locating content. It is easy to guess where it is.' 'If the slides don't have page numbers, it is difficult to navigate and locate the slides.'
- The presence of a table of contents: 'This presentation slide has a table of contents in the beginning part, so it is possible that I can understand what I am going to learn.' 'The organization of the slides, including a table of contents, is very nice. It makes the outline of the contents more clear.'

Category: Contextual

Dimension: Completeness
- The presence of references: 'The references are indexed. It seems to be easier to find things.' 'I like these references showing where they come from.'
- The presence of necessary information on the cover page (title, author, etc.): 'This slide has a presentation title, presentation date, a picture of the speaker, and the affiliation on the first page. It is nice. Some don't have this basic information.' 'The cover page informs me of many things with the title, the presenter, and a relevant picture. I think a cover page with this information is fundamental and necessary.'

Dimension: Informativeness
- The presence of too much text: positive example: 'The advantage seems to be that I can study only with these slides without having a textbook.' Negative examples: 'It seems there are too many letters on the page to grasp the content' and 'When you take a course with slides that have a large amount of writing, they usually move more quickly. Because I cannot read beyond a certain point, I think it is not good.'
- The presence of examples: 'At the beginning, the theory is explained, and then the real examples are continued. It seems to be easier to understand with real examples.' 'I like these interesting photo examples.'

Dimension: Recency
- Outdated content: 'I am repulsed by the old content of the slides. The slides were created over 10 years ago.' 'This section that explains trends in the application using old data leads to gaps in commitment.'

Dimension: Task appropriateness
- Summarization: 'The degree of summary for the content was good. It effectively sorted key points, condensing detailed explanations.' 'I liked the summarization of content for presentation. Specifically, I thought it was well summarized and presented briefly such that I could recognize and understand the content immediately.'
- The presence of exercises: 'At the end of the slides, there are some quizzes. It makes me check what I learned and identify what are important.' 'I like these exercises that follow the related concepts. It gives me a chance to think about them.'

Category: Reputational

Dimension: Author/institutional reputation
- Slides by the textbook publisher: 'Its organization seems great. I think the reason is that the textbook company has published slides as auxiliary materials.' 'This one is from the publisher of the textbook. It seems more trustworthy because it must be a faithfully summarized document for presentation.'
Appendix B. List of quality features.

Category: Intrinsic

Dimension: Accuracy
- numTypos: Number of typos; open proofreading library (https://www.languagetool.org/). Criterion: the presence of typos. [25]

Dimension: Cohesiveness
- entropy: Entropy of the texts in the slides, -Σ_{w∈D} p_D(w) log p_D(w), where p_D(w_i) = tf_{w_i,D} / Σ_{w_j∈D} tf_{w_j,D}; JavaMI v1.0 (https://github.com/Craigacp/JavaMI). Criterion: strong content connectivity. [8]

Category: Representational

Dimension: Representational clarity
- numHighlights: Number of font highlights. Criterion: highlighting of important points and terms. New
- avgNumHighlights: Average number of highlights. Criterion: highlighting of important points and terms. New
- numFontSizes: Number of font sizes. Criterion: font size. New
- avgFontSize: Average size of font. Criterion: font size. New
- numLineSpaces: Number of line spaces. Criterion: sentence spacing. New
- avgSizeLineSpace: Average size of line spaces. Criterion: sentence spacing. New
- preTable: Presence of tables. Criterion: explanation with comparison. [8, 13]
- numTables: Number of tables. Criterion: explanation with comparison. [8, 13]
- avgNumTables: Average number of tables. Criterion: explanation with comparison. [8, 13]
- resolution: Resolution of images. Criterion: high-resolution figure. New
- preBullet: Presence of bullet points. Criterion: use of bullet points. New
- numBullets: Number of bullet points. Criterion: use of bullet points. New
- listRatio: Number of words in lists/word count. [13]
- fracStops: Stopwords/non-stopwords ratio. [8]
- avgTermLen: Average term length of the texts. [8]
- ARI: Automated readability index; Phantom readability library (https://bintray.com/plindsay/phantom). [8, 14, 25]
- Flesh: Flesch reading ease; Phantom readability library (https://bintray.com/plindsay/phantom). [8, 14, 25]

Dimension: Representational consistency
- conFontFace: Consistency of the dominant font face. Criterion: inconsistent font face. New
- conFontFaceRatio: Ratio of pages with a consistent font face. Criterion: inconsistent font face. New
- conFontSize: Consistency of the dominant font size. Criterion: inconsistent font size. New
- conFontSizeRatio: Ratio of pages with a consistent font size. Criterion: inconsistent font size. New
- conIndenLevel: Consistency of indentation level. Criterion: inconsistent indentation level. New
- conBGTemplate: Consistency of background template. Criterion: inconsistent background template. New

Dimension: Visual attraction
- numAnims: Number of animation effects. Criterion: line-by-line animation effect. New
- avgNumAnims: Average number of animation effects. Criterion: line-by-line animation effect. New
- preAni: Presence of animation effects. Criterion: the presence of animation. New
- numFontColors: Number of font colours. Criterion: colour scheme. New
- avgNumFontColors: Average number of font colours. Criterion: colour scheme. New
- defFontColor: Default font colour. Criterion: main font colour. New
- numColors: Number of colours. Criterion: use of many colours. New
- numTemplate: Number of templates. [14]

Dimension: Ease of navigation
- preSlideNum: Presence of page numbers. Criterion: the presence of page numbering. New
- preTableCnts: Presence of a table of contents (T: table of contents, TOC, list, etc. in the first 3 pages). Criterion: the presence of a table of contents. New

Category: Contextual

Dimension: Completeness
- preReference: Presence of references (T: reference, appendix in the last 3 pages). Criterion: the presence of references. [14]
- preCoverPageInfo: Presence of necessary information on the cover page (title, author, department, organization). Criterion: the presence of necessary information on the cover page (title, author, department, organization). New
- preExtLink: Presence of external links (external video, web page). Criterion: the presence of external links (external video, web page). [13, 14]
- numExtLinks: Number of external links (external video, web page). Criterion: the presence of external links (external video, web page). [13, 14]

Dimension: Informativeness
- numTerms: Number of terms in the slides. Criterion: the presence of too much text. [13, 14]
- avgNumTerms: Number of terms per page. Criterion: the presence of too much text. [7, 12, 24]
- preExample: Presence of examples (T: for example, for instance). Criterion: the presence of examples. New
- numExamples: Number of examples (T). Criterion: the presence of examples. New
- avgNumExamples: Average number of examples (T). Criterion: the presence of examples. New
- preImg: Presence of images. Criterion: the presence of figures. [8, 14, 25]
- numImgs: Number of images. Criterion: the presence of figures. [8, 14, 25]
- avgNumImgs: Number of images per page. Criterion: the presence of figures. [8, 14, 25]
- numSlides: Number of slides. Criterion: the presence of a large amount of content. New
- preExplain: Presence of explanations for objects. Criterion: the presence of additional explanation (for equation, graph, figure, table). New
- preDiagram: Presence of diagrams. Criterion: the presence of diagrams. [13]
- preDefinition: Presence of definitions (T: definition, rationale). Criterion: the presence of terminology (term-definition). New
- info-to-ratio: Vocabulary size/word count. [13]
- numSentences: Number of sentences; Stanford Tokenizer (http://nlp.stanford.edu/software/tokenizer.shtml). [13]
- sentenceLen: Length of sentences. Criterion: long sentences (length of sentence). [13]

Dimension: Recency
- age: Number of months since the creation date. Criterion: outdated content. [14]
- recency: Number of months since the modification date. Criterion: outdated content. [14]

Dimension: Task appropriateness
- preExercise: Presence of exercises (T: exercise, workout). Criterion: the presence of exercises. New
- numExercises: Number of exercises (T). Criterion: the presence of exercises. New
- avgNumExercises: Average number of exercises (T). Criterion: the presence of exercises. New
- preQuestion: Presence of questions (T: question, problem, ?, Q and A). Criterion: the presence of questions. [13]
- numQuestions: Number of questions (T). Criterion: the presence of questions. [13]
- avgNumQuestions: Average number of questions (T). Criterion: the presence of questions. [13]
- preKeyterm: Presence of key terms (T: key term, keyword, appendix in the last 3 pages). Criterion: the presence of key terms. New
- preSummary: Presence of a summary (T: summary, index in the last 3 pages). Criterion: the presence of a summary. New
- BM25: Relevance between the contents and the user's query; Apache Lucene (version 4.9). [25]

Note: We used the POI extractor to extract the selected features from the slides (ppt files). The POI extractor provides multiple extraction functionalities, such as images, text boxes (font size, colours, bold and italics, etc.), objects, slide numbers, etc. Features measured using textual cues, such as numExamples, are marked with T, and the keywords used as cues are provided. Where additional software libraries were used, we specify them in the description.