

Interacting with Computers 23 (2011) 525–542


Barriers common to mobile and disabled web users

Yeliz Yesilada a,*, Giorgio Brajnik b, Simon Harper c

a Middle East Technical University, Northern Cyprus Campus, Güzelyurt, Mersin 10, Turkey
b Dip. di Matematica e Informatica, Università di Udine, Udine, Italy
c School of Computer Science, University of Manchester, Manchester, UK


Article history:
Received 7 June 2010
Received in revised form 31 January 2011
Accepted 13 May 2011
Available online 23 May 2011

Keywords:
Web accessibility evaluation
Mobile web evaluation
Barrier Walkthrough aggregation

0953-5438/$ - see front matter © 2011 British Informatics Society Limited. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.intcom.2011.05.005

* Corresponding author. Tel.: +90 392 661 2994. E-mail address: [email protected] (Y. Yesilada).

World Wide Web accessibility and best practice audits and evaluations are becoming increasingly complicated, time consuming, and costly because of the increasing number of conformance criteria which need to be tested. In the case of web access by disabled users and mobile users, a number of commonalities have been identified in usage, which have been termed situationally-induced impairments; in effect the barriers experienced by mobile web users have been likened to those of visually disabled and motor impaired users. In this case, we became interested in understanding if it was possible to evaluate the problems of mobile web users in terms of the aggregation of barriers-to-access experienced by disabled users; and in this way attempt to reduce the need for the evaluation of the additional conformance criteria associated with mobile web best practice guidelines. We used the Barrier Walkthrough (BW) method as our analytical framework. Capable of being used to evaluate accessibility in both the disabled and mobile contexts, the BW method would also enable testing and aggregation of barriers across our target user groups.

We tested 61 barriers across four user groups, each over four pages, with 19 experts and 57 non-experts, focusing on the validity and reliability of our results. We found that 58% of the barrier types that were correctly found were identified as common between mobile and disabled users. Further, if our aggregated barriers alone were used to test for mobile conformance, only four barrier types would be missed. Our results also showed that mobile users and low vision users have the most common barrier types, while low vision and motor impaired users experienced similar rates of severity in the barriers they experienced. We conclude that the aggregated evaluation results for blind, low vision and motor impaired users can be used to approximate the evaluation results for mobile web users.

© 2011 British Informatics Society Limited. Published by Elsevier B.V. All rights reserved.

1. Introduction

The World Wide Web has revolutionised the digital world. In the beginning, interaction was mostly accomplished from the desktop but nowadays it is both ubiquitous and mobile; it can be accessed with various devices at anytime and anywhere. These mobile devices present various usability challenges to manufacturers, web designers and researchers due to limitations such as screen and keyboard size. Existing research suggests that when people without disabilities access the mobile web they experience barriers similar to those experienced by people with disabilities accessing the web via the desktop (Trewin, 2006; Sears and Young, 2003; Wobbrock, 2006), called situationally-induced impairments. Indeed, able-bodied individuals can be affected by both the environment in which one is working and the activities in which that person is engaged (Sears and Young, 2003). For example, when a small device is used in poor lighting conditions (e.g., outdoors), mobile web users can easily have difficulty in perceiving information only encoded in colour. This is a shared experience with blind and colour blind web users as, likewise, they cannot perceive information encoded in colour (see Section 2). To eliminate such difficulties, some web sites develop a separate accessible version (Thatcher et al., 2002) or a separate mobile version; for example, mobile friendly versions are served under the .mobi domain. However, it is well known that creating a separate version has many drawbacks: maintenance is an issue, users have problems because of missing content, and resources are duplicated. Therefore, the work presented in this paper focuses on the idea of developing a single web site that can be accessed equally well by different user agents, whether a mobile device or an assistive technology.

When we look at the guidelines for developing accessible web pages for disabled users and the guidelines for developing user friendly mobile pages, we can see that there are significant overlaps (Chuter and Yesilada, 2009; Yesilada et al., 2009d). However, there has not yet been any empirical study to demonstrate that commonalities exist. Identifying and confirming such commonalities is important for two reasons. Two birds with one stone: pages can be evaluated for both accessibility and mobile web support together. This means that designers do not need to follow two separate independent methodologies, saving time and reducing the cost of evaluation. Same problem, same solution: if we can identify common barriers, then we can suggest common solutions to address those common barriers. This is crucial for solution transfer between different user groups; if one solution exists for mobile web users, it can be transferred to disabled users, and vice versa.

¹ In this paper, mobile users refer to people without disabilities who use mobile devices to access the web.
² See http://www.w3.org/WAI/.
³ http://www.w3.org/WAI/eval/selectingtools.html.

To empirically demonstrate these commonalities, a number of methods can be used, which include:

1. Referring to the existing literature, in particular identifying commonalities from existing user studies with both user groups;

2. Replicating an existing study for one user group with the other to demonstrate that commonalities exist between the groups;

3. Performing empirical user studies to investigate commonalities with both user groups; and finally

4. Using expert evaluator opinion.

Each of these methods has pros and cons. For instance, the first method is cost effective, but the other methods can be quite expensive because of the further studies that need to be conducted. On the other hand, the first method might not tell us the complete story, as there might be some gaps in the literature. With the second method, some existing user study results might be reused, which would mean saving time and effort, but it might not be easy to replicate existing studies because of context and technological changes (Chen et al., 2009). With the third method, conducting studies with different user groups with the same method and within the same time frame would mean collecting up-to-date information for an accurate comparison. However, a lot of user studies would have to be conducted to cover different disabilities and mobile web users. Finally, with the fourth method, the opinions of experts who know a lot about different user groups and their experiences can be used to draw some conclusions about common experiences. This would mean that one does not need to replicate many user studies. However, it might not be easy to find such experts, and even if experts are available it might not be easy to have them spend a lot of time on such a study. Compared to these, an alternative approach would be to combine some of these methods. In this paper, we focus on the combination of the first and last approaches.

In order to investigate common experiences in our study, we use the Barrier Walkthrough (BW) method. The BW method is an analytical technique (Brajnik, 2006) based on heuristic evaluation that can be used to evaluate the accessibility of web pages (Sears, 1997) (see Section 3). An evaluator has to consider a number of predefined possible barrier types which are interpretations and extensions of well known accessibility principles (Caldwell et al., 2008). Barrier types are introduced for different user groups such as motor impairment, hearing impairment, low vision, blindness and cognitive impairment (Brajnik, 2006; Brajnik, 2010). This is the main reason we chose to use the BW method; the method can easily be extended and new barrier types can be introduced for other user groups, which is crucial for comparison purposes.

Based on the existing user studies in the literature and also the W3C Mobile Web Best Practices (MWBP) (Rabin and McCathieNevile, 2008), we have collected a set of barrier types that can be experienced by mobile web users (Yesilada et al., 2008; Yesilada et al., 2009b; Yesilada et al., 2009c). We then mixed this set with the barrier types experienced by motor impaired, blind and visually impaired users. We ensured that all of these barrier types are supported by the literature; further information about these can be found in Yesilada et al. (2008). By using the mixed set, we then conducted a BW study with 76 participants which included both expert and novice evaluators (see Section 4). We asked these participants to follow the BW method and evaluate four pages with respect to blind, visually impaired, motor impaired and mobile users,¹ without knowing which barrier types were specifically introduced for which user group.

Our study shows that 58% of the barrier types that were correctly found were identified as common between mobile and disabled users (see Table 4). Further, if our aggregated barriers alone were used to test for mobile conformance only four barriers would be missed (see Table 6). The results also show that mobile users and low vision users have the largest overlap among barrier types, and that common barriers are similarly rated for low vision and motor impaired users. The study also shows that if the evaluation results for blind, low vision and motor impaired users are aggregated then this can be used to approximate results for mobile web users. In fact, even evaluation results for low vision are sufficient to estimate barriers for mobile users. Finally, even though the accuracy of the results is best for blind and worst for mobile users, reliability changes little between mobile web and disabled user groups.

2. Related work

Web accessibility aims to help people with disabilities to perceive, understand, navigate, interact with, and contribute to the web (Paciello, 2000; Thatcher et al., 2002; Harper and Yesilada, 2008a). Most web sites have accessibility barriers that make it difficult or impossible for many people with disabilities to use the sites (Disability Rights Commission (DRC), 2004; Lazar et al., 2007). Web accessibility depends on several different components of web development and interaction working together, including web software (tools), web developers (people) and content (e.g., type, size, complexity, etc.). The W3C Web Accessibility Initiative (WAI)² recognises these difficulties and provides guidelines for each of these interdependent components: the Web Content Accessibility Guidelines (WCAG) addresses web content (Caldwell et al., 2008), the User Agent Accessibility Guidelines (UAAG) addresses user agents (Gunderson et al., 2002) and the Authoring Tool Accessibility Guidelines (ATAG) addresses authoring tools (Treviranus et al., 2000). There are also other organisations that have produced guidelines (e.g., the Royal National Institute of Blind People (RNIB), the American Foundation for the Blind (AFB), etc.) (Harper and Yesilada, 2008b), but the WAI guidelines are more complete and cover the key points of all the others.

Different methods also exist to assess the accessibility of web pages, including web page inspection, automated testing, screening techniques and subjective assessment (Brajnik, 2008). Inspection methods are based on an evaluator inspecting a web page for its accessibility. The most widely used inspection method is Conformance Review, where the evaluator uses a set of accessibility guidelines that focus on possible accessibility problems and has to decide if a page or web site complies with those requirements (Abou-Zahra, 2008; Thatcher et al., 2006; Henry, 2004; Disability Rights Commission (DRC), 2004). Barrier Walkthrough (BW), which is the method used in this article and explained in Section 3, is an accessibility inspection method that is inspired by heuristic evaluation (Brajnik, 2010). Automated Testing involves an evaluator using an automated accessibility tool to check conformance of a web page against the accessibility principles encoded in that tool. There are many tools available,³ yielding different results with different levels of quality (Abou-Zahra, 2008; Thatcher et al., 2006; Brajnik, 2004). Screening Techniques include a set of lightweight techniques based on using a web site in a way that some sensory, motor or cognitive capabilities of the user are artificially reduced⁴ (Henry, 2004), so as to simulate some of the conditions that are typical for people with disabilities. Finally, Subjective Assessment is a process where an evaluator hires a panel of users who are asked to explore/use a web site in full autonomy and send back their opinions; the evaluator then collects such feedback to determine the accessibility of pages (Henry, 2004).

The work that has been done in the web accessibility field is not only for disabled people (Henry, 2004); organisations and people without disabilities can also benefit.⁵ While flexibility benefits everyone who uses the web in different situations, there has been little research into how web accessibility work can be transferred to the mobile web world. The mobile web aims to improve the experiences of web users who access the web from mobile devices. Web technologies have become the key enablers for access to the Internet through desktop and notebook computing platforms. Web technologies have the potential to play the same role for Internet access from mobile devices (Chae and Kim, 2003). Today, however, mobile web access suffers from interoperability and usability problems that make the web difficult to access for most users (Chae and Kim, 2004; Yang and Wang, 2003; Cui and Roto, 2008; Oulasvirta et al., 2005; Roto and Oulasvirta, 2005; Brewster, 2002a,b; Wobbrock, 2006). W3C's "Mobile Web Initiative" (MWI)⁶ proposes to address these issues through a concerted effort of key players in the mobile production chain, including authoring tool vendors, content providers, handset manufacturers, browser vendors and mobile operators. W3C's MWI has published a number of documents to increase mobile web awareness, the most well known being the Mobile Web Best Practices (MWBP) 1.0 (Rabin and McCathieNevile, 2008), which was originally derived from WCAG 1.0 (Chisholm et al., 1999). In order to deploy MWBP best practices unambiguously in automated evaluation tools, MWI also introduced machine test sets based on MWBP 1.0: the W3C mobileOK Basic Tests 1.0 (Owen and Rabin, 2008). Passing these tests means that the evaluated content provides a functional user experience for users of basic mobile devices whose capabilities at least match those of the Default Delivery Context (DDC). The DDC can be considered as the minimum common denominator device profile. There are a number of automated mobile evaluation tools that use the W3C mobileOK Basic Tests, such as the W3C mobileOK Basic checker,⁷ the TAW mobileOK Basic checker,⁸ and ready.mobi.⁹ There are also some tools that are directly based on MWBP 1.0, including EvalAccessMOBILE (Arrue et al., 2007). MWI also developed an open-source library which can be downloaded and tested using its web interface. There are also some automated evaluation tools that are based on this open-source library, such as the MokE online evaluation tool (Garofalakis and Stefanis, 2008). All these automated evaluation tools evaluate web pages against the DDC; however, there are also some tools that aim to consider different device specifications, such as the tool described in Vigo et al. (2009).

In summary, when we look at the barriers experienced by mobile web users and even at the automated evaluation tools, we can see that there are significant overlaps between the experiences of mobile web and disabled web users. In the following two sections, we first present a number of examples to demonstrate overlapping experiences, and we then discuss the overlaps between the W3C's web accessibility guidelines and mobile web best practices.

⁴ http://www.w3.org/WAI/eval/preliminary.html.
⁵ See http://www.w3.org/WAI/bcase/Overview.html.
⁶ See http://www.w3.org/Mobile.
⁷ http://validator.w3.org/mobile/.
⁸ http://validadores.tawdis.net/mobileok/en/.
⁹ http://ready.mobi.

2.1. Disabled or mobile: same barriers?

Yesilada et al. (2009d) provide examples of barriers that people with disabilities and people using mobile devices experience when interacting with web content. In order to thoroughly discuss such common experiences, here we use the following four principles, which are considered as the foundation necessary for anyone to access and use web content (Caldwell et al., 2008). Detailed information about these examples can be found in Yesilada et al. (2009d).

Perceivable: Information and user interface components must be presentable to users in ways they can perceive. For example, if information is conveyed solely with colour, then users might not perceive this colour and therefore can misunderstand or miss the presented information. A blind user cannot perceive colour and a colour-blind user may perceive it incorrectly (Disability Rights Commission (DRC), 2004; Coyne and Nielsen, 2001). Small device users might also experience similar problems. For example, many small device screens have limited colour capabilities, the colour difference may not be rendered, or the small device may be used in poor lighting (for example, outdoors), where colours are not clearly perceived (Barnard et al., 2007; Duchnicky and Kolers, 1983; Jones et al., 1999).

Operable: User interface components and navigation must be operable. For example, if the page has an inappropriate title then both disabled and mobile users cannot easily get an overview of the page. A blind user typically uses a screen reader feature to get a list of the currently open windows, indexed by window title (Disability Rights Commission (DRC), 2004). Therefore, if the page title is long, inappropriate or missing, the user may not be able to operate on the content. Since the page title is also used by mobile user agents to give an overview, an inappropriate title would also cause similar problems to mobile users (Rabin and McCathieNevile, 2008).

Understandable: Information and the operation of the user interface must be understandable. For example, content spawning new windows without warning the user can be considered as a common barrier. Users become disoriented among windows and the back button does not work. Users close the window without realising it is the last in the stack, and therefore shut down the browser. Users with low vision, or blindness, or cognitive disabilities do not realise the active window is new (Craven and Brophy, 2003; Coyne and Nielsen, 2001). Similarly, multiple stacked windows on a small screen hide each other, so mobile users may also not realise that a new window has become active.

Robust: Content must be robust enough that it can be interpreted reliably by a wide variety of user agents, including assistive technologies. For example, if the content has invalid and unsupported markup, then both disabled and mobile users can have problems with it. Disabled users cannot access it because their assistive technology or browser cannot handle the markup (Edwards, 2008). Similarly, mobile users will have problems because some older mobile browsers do not display content with invalid markup (Siek et al., 2004).

2.2. Disabled or mobile: same guidelines?

Chuter and Yesilada (2009) describe the similarities and differences between the requirements in two W3C recommendations: the Web Content Accessibility Guidelines (WCAG) (Henry, 2008) and the Mobile Web Best Practices 1.0 (MWBP) (Rabin and McCathieNevile, 2008). WCAG has two versions, 1.0 and 2.0, and Chuter and Yesilada (2009) compare both versions with MWBP 1.0. The relationship between WCAG and MWBP is complex because of their technical approach to conformance testing, ruling out a single one-to-one mapping between these recommendations. For example, in some cases complying with a specific WCAG provision will meet the related MWBP, but the inverse is not always true. However,


Table 1. This table shows how much work needs to be done to meet MWBP if a page already conforms to WCAG, and vice versa. Nothing: the page already complies with the provision and no further effort is necessary. Something: more effort of some kind is necessary to comply with the provision. Everything: conformance to the source document does not ensure compliance and it will be necessary to do the work involved.

             From MWBP to:               From WCAG 1.0 to:   From WCAG 2.0 to:
             WCAG 1.0 (%)  WCAG 2.0 (%)  MWBP (%)            MWBP (%)
Nothing      26            10            12                  30
Something    37            30            22                  17
Everything   37            60            66                  53


comparing these two recommendations revealed that conforming to one can go a long way towards meeting the other recommendation. To summarise the overlap, WCAG 1.0 includes 14 guidelines and 65 checkpoints (CP), WCAG 2.0 includes 12 guidelines and 61 success criteria (SC) and MWBP includes 60 best practices (BP). Table 1 shows how much work needs to be done if a page already conforms to one of these standards. In summary, if a page already conforms to MWBP then 26% of the WCAG 1.0 CPs are fully covered and 37% need some further work. If a page conforms to MWBP, then 10% of the WCAG 2.0 SCs are fully covered and 30% need some further work. If a page conforms to WCAG 1.0, then 12% of the MWBP best practices are fully covered and 22% need some further work. If a page conforms to WCAG 2.0, then 30% of the MWBPs are fully covered, and 17% need some further work. Even though there are some specific requirements in these documents, we can see that conforming to the web accessibility guidelines can go a long way towards conforming to the Mobile Web Best Practices and vice versa.

3. Barrier Walkthrough

The Barrier Walkthrough (BW) method is an analytical technique based on heuristic evaluation (Brajnik, 2006). An evaluator has to consider a number of predefined possible barriers which are interpretations and extensions of well known accessibility principles; they are assessed in a context so that appropriate conclusions about user effectiveness, productivity, satisfaction, and safety¹⁰ can be drawn, and severity scores can be derived. An accessibility barrier is any condition that makes it difficult for people to achieve a goal when using the website in the specified context. These barrier types mainly address issues that hinder one from accessing and using web content; some examples are discussed in Section 2.1. Detailed information about the barrier types can be found in Yesilada et al. (2008); Table 2 shows an example. In this paper, when we say "barrier type", we refer to such descriptions as in Table 2, and when we say "barrier", we refer to an instance of that particular barrier type in a specific page. As can be seen from Table 2, a barrier type is specified in detail to address the following questions: which user groups are affected by this barrier? Which principles does it address? Which guidelines provide recommendations regarding this barrier? What is the cause of this barrier and its failure mode (i.e., symptoms)? How can this barrier be fixed? How can it be tested automatically and by human intervention? And are there user studies in the literature that scientifically demonstrate that this barrier exists for certain user groups?
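To make the structure of such a description concrete, the sketch below renders a barrier type as a simple record populated with the "mouse events" example from Table 2. This is our own illustration; the field names and the record form are assumptions chosen to mirror the table, not part of the BW method itself.

```python
# Minimal sketch of a barrier-type record mirroring the fields discussed above
# (and shown in Table 2); field names are ours, chosen for illustration only.
from dataclasses import dataclass, field

@dataclass
class BarrierType:
    name: str                     # e.g. "Mouse events"
    users: list[str]              # user groups affected by the barrier
    principle: str                # Perceivable / Operable / Understandable / Robust
    guidelines: list[str]         # WCAG / MWBP provisions it interprets
    cause: str                    # what in the page causes the barrier
    failure_mode: str             # how users experience the failure
    effect: str                   # e.g. reduction of effectiveness
    fix: str                      # recommended repair
    test: str                     # machine and/or human test procedure
    references: list[str] = field(default_factory=list)  # supporting studies

mouse_events = BarrierType(
    name="Mouse events",
    users=["older user", "mobile user", "visually impaired user", "motor impaired user"],
    principle="Operable",
    guidelines=["WCAG 1.0: 6.3, 6.4, 9.3", "WCAG 2.0: 2.1, 2.1.1, 2.1.3", "MWBP 5.4.5"],
    cause="JavaScript functionality is reachable only through mouse-oriented event handlers",
    failure_mode="Keyboard and mobile users cannot trigger the functionality",
    effect="Reduction of effectiveness",
    fix="Add logical event handlers (onfocus, onkeypress) or avoid relying on JavaScript",
    test="Machine: flag elements whose only handlers are mouse events; Human: operate without a mouse",
)
print(mouse_events.principle)
```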

The main difference between the BW method and heuristic evaluation is the barriers provided to the evaluator in the BW method. With heuristic evaluation, evaluators consider a smaller number of usability principles and they are "free" to adopt a specific scenario or a user profile. Although the BW method does not prescribe any automated tool to be used for evaluation, since accessibility is often based on technical features of how the user interface is implemented, automated tools are also useful.

¹⁰ Safety is freedom from danger, injury or accident. For example, safety can be an issue for people who are subject to epilepsy.

The BW prescribes that barrier severity is graded on a 1–3 scale (minor, major, critical), and can be considered in terms of impact (the degree to which the user goal cannot be achieved within the considered context) and persistence (the number of times the barrier shows up while a user is trying to achieve that goal). Potential barriers to be considered are derived by interpretation of relevant guidelines and principles (Chisholm et al., 1999; Rabin and McCathieNevile, 2008; Disability Rights Commission (DRC), 2004); more details are available in Brajnik (2010). There are two major benefits of the BW compared to conformance review: first, by listing possible barriers grouped by user groups, evaluators are more constrained in determining whether the barrier actually occurs; secondly, by forcing evaluators to consider usage scenarios, an appropriate context is available to them for rating the severity of the problems found. The idea here is that by asking evaluators to consider usage scenarios, the evaluation process can be more focused: the evaluator knows more about the context of usage. For instance, an example scenario would be "imagine that a prospective student browsing the site needs to find out the annual tuition fee of the university".

Experimental evaluations of the BW (Brajnik, 2006) showed that it is more effective than conformance reviews in finding more severe problems and in reducing false positives; however, it is less effective in finding all the possible accessibility problems. Other studies showed how the BW can be used as a basis for measuring the accessibility level of a website rather than measuring the conformance level (Brajnik and Lomuscio, 2007).

4. Common barrier types study

We focus on investigating accessibility issues that are similar between mobile web users and blind, visually impaired and motor impaired users. In the following sections, we provide the details of this study.

4.1. Research questions

In order to achieve our overarching goal of identifying and understanding commonalities and differences of the experiences of mobile vs. disabled web users, we investigate a number of research questions:

1. Which common barriers exist. Identifying common barrier types is important for efficiently evaluating web pages. If one wants to evaluate a page for both its accessibility and mobile web support and some barriers are known to be common, then these barriers can be evaluated only once, thus significantly reducing the time and effort of evaluation. This of course means that the evaluation results of one group will be used to infer conclusions for the other group. We therefore also need to find out what is the effect of web pages on the identified common barriers. If there is a high variability due to pages, the commonality highly depends on the web page being evaluated, and therefore we cannot generalize the results to other pages. In order to investigate these issues, we specifically ask the following research questions: which and how many barrier types are in common between mobile web and each of the following user groups: blind, low vision and motor impaired users? What is the effect of web pages on common barriers?

2. Barrier type aggregation. Previous studies suggest that mobile web users share experiences with people with different disabilities (Yesilada et al., 2009d; Chuter and Yesilada, 2009), and in


Table 2. Barrier Walkthrough method: a barrier example – "mouse events".

Users: Older user, mobile user, visually impaired user, motor impaired user.
Principle: Operable.
Guidelines: WCAG 1.0: 6.3, 6.4, 9.3 (Chisholm et al., 1999); WCAG 2.0: 2.1, 2.1.1, 2.1.3 (Caldwell et al., 2008); MWBP 5.4.5 (Rabin and McCathieNevile, 2008); Medicare Education (Holt, 2000); Silverweb: H1.2, H1.3 (Zaphiris et al., 2007); Recommendations for Making Web Content Accessible to People with Cognitive Disabilities (WebAIM, 2009); Designing for Users with Cognitive Disabilities (Jiwnani, 2001); Best Practices for Accessible Flash Design (Regan, 2004).
Cause: The page is based on JavaScript in order to obtain specific effects. JavaScript functions are invoked through event handlers, such as onclick, onmouseover, and onmouseout, that are mouse-oriented.
Failure mode: Users who may have difficulty in mouse control, such as screen reader users, are likely to prefer using the keyboard rather than the mouse for certain activities. However, mouse-orientated event handlers can create a situation where functionality appears to be available to the user but does not work because the user is not using mouse-orientated input. Furthermore, some mobile browsers do not support JavaScript functionality, therefore the user will not be able to use the functionality achievable through event handlers.
Effect: Reduction of effectiveness.
Fix: In addition to mouse-oriented event handlers, use logical event handlers, such as onfocus and onkeypress. If possible, create the functionality achieved through event handlers without the need for JavaScript so that the content is device independent.
Test: Machine: check that JavaScript event handlers, such as onclick, onmouseover, and onmouseout, are not the only event handlers associated with a particular HTML element. Human: test to see if the user can still operate the elements that rely on mouse interaction without using a mouse. If possible, test with different mobile devices.
References: Visually impaired users are not able to access extra information that can only be accessed by specific actions. Coyne and Nielsen (2001)'s study shows that screen magnifier users typically cannot read rollover text. When a screen magnifier user moves the mouse over an image he can only see part of the rollover text, but when he moves the mouse to see more of the zoomed-in screen, he is no longer hovering over the image (Jacko et al., 2003). Studies with older users have shown that they have difficulty in mouse control, especially with moving the mouse over targets and clicking (Newell et al., 2006; Bailey et al., 2005; Fukuda and Bubb, 2003; Milne et al., 2005). A main cause of this difficulty is that movement control becomes slower and more variable with age (Ketcham and Stelmach, 2001), with older users sometimes having unsteady hands that make precise mouse movements difficult. Older users tend to pause as they move the cursor around and over the hyperlink target as they have difficulty reaching the target itself (Keates and Trewin, 2005). In addition, older users can suffer from arthritis or slight tremors that mean fine movement and selection of small screen elements can be difficult. Additionally, Trewin and Pain (1999) find that motor impaired users have problems in pointing at small on-screen objects using a mouse, and the smaller the object is, the harder it is to pinpoint. Severely motor impaired users find the mouse difficult to use (Sears and Young, 2003). Finally, mobile devices do not typically have a pointing device, which means that mobile users can also experience difficulty in interacting with pages designed specifically for mouse interaction (Roto, 2006).
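The "Machine" test above lends itself to automation. The following sketch is one possible, hypothetical implementation of that check, not the tooling used in the study; it assumes the third-party beautifulsoup4 package and flags elements whose only event handlers are the mouse-oriented ones named in the table.

```python
# Hedged sketch of the machine test from Table 2: report elements that have
# mouse-oriented event handlers but no keyboard-oriented equivalents.
# Assumes beautifulsoup4 is installed; not the checker used in the study.
from bs4 import BeautifulSoup

MOUSE_HANDLERS = {"onclick", "onmouseover", "onmouseout"}
# onfocus/onkeypress are the handlers named in the Fix field; onkeydown/onkeyup
# are added here as further keyboard-oriented handlers (an assumption).
KEYBOARD_HANDLERS = {"onfocus", "onkeypress", "onkeydown", "onkeyup"}

def mouse_only_elements(html: str) -> list[str]:
    """Describe elements likely to raise the 'mouse events' barrier."""
    soup = BeautifulSoup(html, "html.parser")
    offenders = []
    for tag in soup.find_all(True):  # iterate over every element in the page
        attrs = {name.lower() for name in tag.attrs}
        if attrs & MOUSE_HANDLERS and not attrs & KEYBOARD_HANDLERS:
            offenders.append(f"<{tag.name}> with {sorted(attrs & MOUSE_HANDLERS)}")
    return offenders

example = ('<a href="#" onclick="openMenu()">Menu</a>'
           '<a href="#" onclick="go()" onkeypress="go()">Go</a>')
print(mouse_only_elements(example))  # reports only the first link
```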

¹¹ http://hcw.cs.manchester.ac.uk/.
¹² http://www.sigaccess.org/assets08/.


Brajnik et al. (2009b) we argued that in these cases it is sensible to aggregate disability-related results to infer something about another user category. Aggregation means taking the union of the results corresponding to two or more user groups. To investigate this further, we propose to aggregate the barrier types identified for our disabled user groups (namely blind, visually impaired and motor impaired), and compare the aggregated results with mobile web users. If we can show that aggregation of barrier types yields valid results (as they did for older adults, Brajnik et al., 2009b), then if a page is evaluated for those disabled user groups, one will be able to draw some conclusions about the mobile web support of that page. To investigate this further, we ask: does aggregation also work when dealing with the mobile web? Which barrier types are common between the aggregated disabled group and mobile web users? How many barrier types are common? Which user groups, when aggregated, lead to the best results? Notice that compared to the previous research question, which dealt with common barriers between the mobile web and each of the other user groups, this one deals with common barriers between the mobile web and combinations of two or more user groups.

3. Severity rating: Unlike conformance testing, the BW results also show how severe a barrier is for a user category. While in the previous two research questions we focus on investigating whether the barrier types are common, here our aim is to see if the common barriers are considered as severe for mobile web users as for the other user groups. This is important because if the common barriers do not have similar severity, triage and prioritization of barriers when fixing a web site or when ranking web sites would be affected. Therefore, we ask: what is the distribution of severities of barriers for mobile and the other disabled user groups?

4. Quality: We have previously defined a quality framework (Brajnik et al., 2009a) which defines the quality of accessibility evaluation methods in terms of: (a) effectiveness (i.e., how good a method is in systematically identifying all and only true barriers); (b) usability (i.e., how easily the method can be understood and learned); (c) usefulness (i.e., the effectiveness and usability of the produced results with respect to stakeholders of results, like quality assurance teams, developers, managers, etc.); and (d) efficiency (i.e., the amount of resources needed to complete an evaluation). We want to see how the most important of these quality metrics, effectiveness, is influenced by the common barriers. If there is a difference in effectiveness between mobile web and the other user groups, then we have to be very careful in reusing evaluation results from other user groups, especially in the direction of decreasing effectiveness. We therefore ask: is there any difference in the effectiveness of barriers found for mobile web compared to those found for the other disabled user groups?

4.2. Participants

Nineteen expert judges (15 males) aged between 27 and 72 (M = 40.1, sd = 11) and 57 non-experts (43 males) aged between 21 and 46 (M = 24.4, sd = 4.4) took part in this study. Non-expert judges were mainly undergraduate students who were attending a course about web accessibility and usability evaluation (52 of them); the remainder were graduate web accessibility research students at the Human Centred Web Lab.¹¹ Fifteen of the expert judges who took part in the study were recruited among speakers and attendees of the 10th ACM Conference on Computers and Accessibility (ASSETS, 2008)¹² and the rest were recruited via personal contacts. The experts were invited because either they had publications on web accessibility or they were working as professional consultants. In fact, we contacted more than 19 experts, but four people were not able to participate due to time constraints, three people did not respond to our invitation and two people did not complete the study on time. We believe that the reason why these people did not participate is not related to the experimental conditions; in other words, we do not think these refusals introduced bias into the results.

4.3. Materials

The following four web pages were used in our study:

1. "I love god father movie" Facebook group (http://www.new.facebook.com/group.php?gid=2416052053).

2. The Godfather at IMDB (http://www.imdb.com/title/tt0068646/).

3. Hall's Harbour Quilts, Halifax (http://www.novascotiaquilts.com/).

4. Sam's Chop House Manchester (http://samsmanchester.thevictorianchophousecompany.com/).

We used the online versions of these pages and ensured that they did not have major updates throughout this study. It was important to use online versions because saving these pages locally would mean that we would not be able to store some dynamic content and interaction techniques. Even though some minor updates occurred throughout our study, this did not affect the accessibility evaluation results.

These pages were chosen because they differ in terms of layout complexity, the way they are developed, popularity and accessibility support. Before we did this study, we used an automated tool and confirmed that these pages had significant violations, which was important for our data collection. When we consider popularity, the first two pages are in the top 100 most widely used pages ranked by Alexa,¹³ whereas the last two do not appear in the top 100 but are typical long-tail pages¹⁴ (Anderson, 2006). The first two pages are professionally designed, unlike the last two. From our previous studies (Yesilada et al., 2007), we also knew that IMDB had many accessibility barriers. We also investigated the complexity of these pages with the method explained in Harper et al. (2008). This method estimates the complexity of a web page based on the number of words, images and sections a page has. According to this method, these pages have varying complexity; the second and third ones are complex, whereas the other ones are not as complex. Therefore, regarding complexity, these four pages provide a good balance in our study.

¹³ http://www.alexa.com.
¹⁴ When we refer to typical long-tail pages, we refer to ones that are not created or designed by professionals.

In this study, each judge evaluated one page, except for two expert judges who evaluated two pages each; the pages assigned to judges were randomised. The Facebook page was evaluated by five experts and 13 non-experts, IMDB was evaluated by five experts and 18 non-experts, Quilts was evaluated by seven experts and 15 non-experts and Sams was evaluated by four experts and 11 non-experts. In our study, 61 barrier types were used; further details about these can be found in Yesilada et al. (2009a). Each judge was given a sheet with a randomized list of barriers to counterbalance order effects. The same list was repeated once for each of the four user groups considered. Thus page is a between-subjects factor, while user category and barrier type are within-subjects ones.
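As an illustration of this design (and only as an illustration: the paper does not describe how the sheets were generated, so the procedure below is an assumption), judges could be assigned a random page and a per-judge shuffled barrier order along the following lines.

```python
# Illustrative sketch of the randomised assignment described in Section 4.3.
# Page names, user groups and the 61-barrier count follow the paper; the
# generation procedure itself is an assumption for illustration only.
import random

PAGES = ["facebook", "imdb", "quilts", "sams"]
USER_GROUPS = ["blind", "low vision", "motor impaired", "mobile web"]

def make_judge_sheet(judge_id: int, barrier_types: list[str]) -> dict:
    """Assign a judge one page (between-subjects) and a randomised barrier
    order, repeated once per user group (within-subjects)."""
    rng = random.Random(judge_id)          # reproducible per judge number
    page = rng.choice(PAGES)               # random page assignment
    order = barrier_types[:]
    rng.shuffle(order)                     # counterbalance order effects
    return {"judge": judge_id, "page": page,
            "sheet": {group: list(order) for group in USER_GROUPS}}

# Example: 61 placeholder barrier-type names (the real study used 61 types).
barriers = [f"barrier type {i}" for i in range(1, 62)]
print(make_judge_sheet(7, barriers)["page"])
```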

4.4. Procedure

Participants were invited to take part in this study via email. When participants accepted to take part, they were given a judge number and asked to follow the instructions on the experiment web page. This web page first provided a brief summary of the study and explained that the participants would be asked to evaluate a web page for both its accessibility for disabled people and its mobile-friendliness. Participants completed the study in their own time and working environment. In summary, the instructions included six steps:

1. Participants were asked to read an information sheet which was also presented as a web page. This page detailed the purpose of the study, and provided answers to questions such as "can I take part in this study?", "will my data be anonymous?", etc.

2. Participants were then asked to fill in a screening questionnaire, which included questions about age, gender, expertise, etc.

3. After filling in the screening questionnaire, participants were asked to download the corresponding barrier sheet and to get the corresponding web page for the given judge number. This was to ensure that each judge would get a randomized barrier sheet, and a specific web page, since pages were also assigned randomly to judge numbers.

4. After participants downloaded the worksheet, they were instructed on how to use this worksheet, how to follow the Barrier Walkthrough method, and how to differentiate the different severity ratings. Participants were allowed to use any evaluation tool, browser extension and technique they liked. They were asked to evaluate each barrier with respect to blind, low vision, motor impaired and mobile web users. For each barrier and user category, they were asked to check whether that barrier exists; if it did not exist then they were asked to enter 0 or leave the entry blank; if it existed they were asked to specify the severity based on a three-point scale (1 = minor, 2 = significant and 3 = critical) and also explain the rationale for their rating.

5. After participants completed their evaluation, they were asked to fill in a post-evaluation questionnaire which captured how long it took them to complete the study, the tools and techniques used, and the participants' subjective rating of the level of effort, perceived productivity, and their confidence in the evaluation.

6. Finally, participants were asked to email the barrier sheet, demographic and screening data to us.

4.5. Results

As previously discussed in Section 4.1, this study investigates four research questions focusing on common barrier types, barrier aggregation, severity rating and quality. To improve the legibility of this paper, results of statistical tests are listed in Appendix A.

When we look at our expert judges, on average, their subjective rating of knowledge in web accessibility on a Likert scale from 1 (lowest) to 5 (highest) is 4.6 (sd = 0.6); 47% (10) worked as web accessibility consultants and 63% (12) tested more than 10 web sites in the previous six months. However, they were not experienced in testing web sites for mobile web support: on average they rated their mobile web knowledge as 3.2 (sd = 0.8), only 5% (1) worked as a mobile web consultant and only 16% (3) tested more than 10 web sites for mobile web support in the previous six months (see Table 3). When we look at our non-expert judges, on average, their subjective rating of knowledge in web accessibility is 2.4 (sd = 0.9), none of them worked as a web accessibility consultant and only 3.6% (i.e., 2 persons) tested more than ten web sites in the previous six months. They were also not experienced in evaluating web pages for mobile web support: on average, they rated their mobile web knowledge as 2.1 (sd = 1.0), none of them worked as a mobile web consultant and only 7% (4) tested more than 10 web sites for mobile web support in the previous six months. This could be mainly because our non-experts were students who were also taking other courses on web and mobile technologies; therefore, it is not surprising that 7% (4) had tested more than 10 web sites for mobile web support.


Table 3. Demographics data of the 76 judges (19 experts and 57 non-experts) (WA = "web accessibility", MW = "mobile web").

                                                           Experts (19)                Non-experts (57)            All (76)
                                                           WA            MW            WA            MW            WA            MW
Subjective knowledge rating (1: very low – 5: very high)  4.6 (sd=0.6)  3.2 (sd=0.8)  2.4 (sd=0.9)  2.1 (sd=0.1)  3.0 (sd=1.3)  2.4 (sd=1.1)
Worked as a consultant                                     47%           5%            None          None          13.5%         1.4%
Tested 10+ web sites                                       63%           16%           3.6%          7%            18.9%         9.5%


4.5.1. Common barrier types

To explore the barriers that are common between mobile web users and each of the other disabled user groups, we first need to decide, on a given page, which are the barriers that are correctly rated, and consider only those that were correctly rated as present in a given page. We took advantage of the relatively large number of experts we had in the study but, as often happens with accessibility, we had to cope with the unavoidable subjectivity.

Table 4. Number of pages for which the corresponding barrier type was in common for mobile web compared to blind, low vision and motor impaired users. "Sum" is the total by row; the "%" column gives the proportion (as the maximum value for Sum is 4 + 4 + 4 = 12).

Barrier type                                   Mobile–low vision  Mobile–blind  Mobile–motor  Sum  %
Ambiguous links                                1                  1             1             3    25
Cascading menu                                 1                  1             1             3    25
Data tables with no structural relationships   0                  0             0             0    0
Data tables with no summary                    0                  0             0             0    0
Dynamic menu in JavaScript                     1                  1             1             3    25
External resources*                            0                  0             0             0    0
Forms with no LABEL tags                       0                  0             0             0    0
Functional images lacking text                 1                  3             1             5    42
Images used as titles                          1                  1             0             2    17
Inflexible page layout                         3                  0             1             4    33
Insufficient visual contrast                   0                  0             0             0    0
Internal links are missing                     3                  4             3             10   83
Language markup                                0                  0             0             0    0
Large graphics*                                0                  0             0             0    0
Layout table                                   1                  1             0             2    17
Links/buttons are too small                    0                  0             0             0    0
Links/buttons too close to each other          0                  0             0             0    0
Long URIs*                                     1                  1             1             3    25
Minimize markup*                               2                  2             2             6    50
Missing layout clues                           1                  1             0             2    17
Mouse events                                   1                  1             1             3    25
Moving content                                 0                  0             0             0    0
New windows                                    1                  1             1             3    25
No cookies support*                            0                  0             0             0    0
No keyboard shortcuts                          0                  0             0             0    0
No page headings                               2                  2             2             6    50
No stylesheet support*                         2                  0             0             2    17
Non separated links                            0                  0             0             0    0
Page size limit*                               1                  1             1             3    25
Rich images embedded in the background         0                  0             0             0    0
Rich images lacking equivalent text            0                  1             0             1    8
Scrolling*                                     4                  0             4             8    67
Skip links not implemented                     3                  4             4             11   92
Stylesheet size*                               0                  0             0             0    0
Text cannot be resized                         0                  0             0             0    0
Too many links                                 1                  2             1             4    33
Using stylesheets*                             2                  2             0             4    33
Valid markup*                                  2                  3             3             8    67
Means                                          0.92               0.87          0.74          2.53 21
Sum                                            35                 33            28            96   –

* Shows the barrier types that were specifically introduced for mobile web users.

We adopted a majority rule to determine when a barrier was correctly identified. More specifically, given a page and a user category, a barrier is said to be correctly identified if the majority of experts (i.e., >50%) who rated it agreed on whether its severity was 0 or greater than 0 (regardless of whether they said 1, 2 or 3). A true barrier is a correctly identified barrier whose severity rating is greater than 0. A correct rating is any rating of a correctly identified barrier.

Given this definition of correct answers, when we restrict our dataset to only true barriers, we obtain 650 data points (78 for facebook, 207 for imdb, 192 for quilts, 173 for sams; 195 for blind, 155 for low vision, 112 for motor impaired, and 188 for mobile web). Using this dataset we analysed, page by page, which are the barrier types in common between mobile web and each of the other user groups. Table 4 shows the barrier types identified by experts with ratings greater than 0. As can be seen from this table, in total 38 out of the 61 barrier types used in the study were truly identified. The rest of the barrier types (23 out of 61) were not truly identified on the pages used in the study. Table 4 specifically gives the number of pages where a barrier type was in common between mobile web users and each of the other user groups. For example, the barrier type "Functional images lacking text" was identified as common on one page for low vision and mobile, on three pages for blind and mobile, and on one page for motor and mobile web users. Barriers that were never true are not listed.
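For concreteness, the sketch below (our own illustration, not the analysis scripts used in the study; the example severities are invented) encodes the majority rule and then counts, per page, the barrier types that are true for both mobile web and a given user group, which is how a cell of Table 4 can be derived.

```python
# Sketch of the majority rule and the per-page commonality count described
# above; the data structures and example ratings are illustrative only.
from collections import defaultdict

# ratings[(page, user_group, barrier_type)] = list of expert severities (0-3)
Ratings = dict[tuple[str, str, str], list[int]]

def correctly_identified(severities: list[int]) -> bool:
    """Majority (>50%) of experts agree on 'absent' (0) vs 'present' (>0)."""
    present = sum(1 for s in severities if s > 0)
    absent = len(severities) - present
    return present != absent          # a strict majority exists either way

def true_barrier(severities: list[int]) -> bool:
    """Correctly identified and the majority says the barrier is present."""
    present = sum(1 for s in severities if s > 0)
    return correctly_identified(severities) and present > len(severities) - present

def common_with_mobile(ratings: Ratings, group: str) -> dict[str, int]:
    """Number of pages on which a barrier type is true for both mobile web
    and the given user group (one column of Table 4)."""
    pages_true = defaultdict(set)
    for (page, user, barrier), sev in ratings.items():
        if user in ("mobile web", group) and true_barrier(sev):
            pages_true[(barrier, user)].add(page)
    barriers = {b for (_, _, b) in ratings}
    return {b: len(pages_true[(b, "mobile web")] & pages_true[(b, group)])
            for b in barriers}

# Toy example: three experts rate one barrier on one page.
toy: Ratings = {
    ("imdb", "mobile web", "Skip links not implemented"): [2, 3, 0],
    ("imdb", "low vision", "Skip links not implemented"): [1, 2, 2],
}
print(common_with_mobile(toy, "low vision"))  # {'Skip links not implemented': 1}
```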

Table 5 shows an alternative view of the same data. It shows the barrier types that are true on at least one page between mobile and disabled users.

Table 5. Common barrier types between disabled and mobile web users. "✓" shows that the barrier type was commonly identified on at least one web page.

Barrier type                              Mobile–blind   Mobile–low vision   Mobile–motor
Ambiguous links                           ✓              ✓                   ✓
Cascading menu                            ✓              ✓                   ✓
Dynamic menu in JavaScript                ✓              ✓                   ✓
Functional images lacking text            ✓              ✓                   ✓
Images used as titles                     ✓              ✓
Inflexible page layout                                   ✓                   ✓
Internal links are missing                ✓              ✓                   ✓
Layout tables                             ✓              ✓
Long URIs*                                ✓              ✓                   ✓
Minimize markup*                          ✓              ✓                   ✓
Missing layout clues                      ✓              ✓
Mouse events                              ✓              ✓                   ✓
New windows                               ✓              ✓                   ✓
No page headings                          ✓              ✓                   ✓
No stylesheet support*                                   ✓
Page size limit*                          ✓              ✓                   ✓
Rich images lacking equivalent text       ✓
Scrolling*                                               ✓                   ✓
Skip links not implemented                ✓              ✓                   ✓
Too many links                            ✓              ✓                   ✓
Using stylesheets*                        ✓              ✓
Valid markup*                             ✓              ✓                   ✓
Total: 22                                 19             21                  16

* Shows the barrier types that were specifically introduced for mobile web users.


Table 6. True barrier types identified for only specific user groups.

Mobile web users only:
  External resources
  Large graphics
  No cookies support
  Stylesheet size

Users with low vision only:
  Insufficient visual contrast
  Moving content
  Text cannot be resized

Blind users only:
  Data tables with no relationships
  Data tables with no summary
  Language markup
  Non separated links
  Rich images embedded in the background

Motor impaired users only:
  Links/buttons too close to each other


In our study 61 barrier types were used, and according to our findings 22 (36%) correspond to true barriers that are common between mobile users and another user category on at least one page. But more interestingly, the true barriers that are in common are 58%, or 22 out of 38, i.e., those listed in Table 4. As can be seen from Table 5, there are 21 true barrier types in common between mobile and low vision users, 19 between mobile and blind users, and 16 between mobile and motor impaired users.

Some barrier types are specific to particular user groups, as indicated by Table 6. Although there are overlapping sets of barrier types, disjoint sets of barrier types also exist.

The original BW method only had barrier types specific to disabled users (Brajnik, 2006), but we extended it and introduced new barrier types for mobile web users (Yesilada et al., 2009b). When we designed the barrier sheets used by our judges, we combined the barrier types introduced for disabled users with those for mobile users. These mobile web barriers are highlighted with "*" in Tables 4 and 5. It is interesting to see that even though some barrier types were specifically introduced for mobile users, they were also correctly identified for our disabled user groups. For example, barrier types that were identified as common between mobile and all other disabled users include: "Long URIs", "Minimize markup", "Page size limit" and "Valid markup".

Table 7. Evaluation of all the aggregations that we considered in our experiment (ordered by effect size; MI – motor impaired, LV – low vision and BL – blind). Delta is the absolute value of the difference between the means; for χ² the table reports the p-values.

               t-test   p-value   Effect   Delta   χ² rat   χ² id
MI             2.14     0.04      0.39     0.07    0.00     0.00
LV + MI        1.43     0.16      0.26     0.04    0.00     0.00
BL             1.40     0.17      0.25     0.08    0.00     0.00
BL + LV        1.02     0.31      0.18     0.04    0.00     0.00
LV             0.21     0.84      0.04     0.01    0.01     0.29
BL + MI        0.13     0.90      0.02     0.00    0.00     0.14
BL + LV + MI   0.05     0.96      0.01     0.00    0.23     0.56

4.5.2. Barrier type aggregation

Aggregation of barriers is a technique that we introduced in Brajnik et al. (2009b) which can be used when assuming that a user category shares experiences with other ones. In this case the barriers for the latter user category can be considered the union of barriers found for the other user groups. In fact, in Brajnik et al. (2009b) we show that barriers for low vision and motor impaired web users can be aggregated and used to infer conclusions regarding the barriers experienced by older users.

The basic idea of aggregation is simple: on the basis of a BW evaluation performed with respect to certain "primitive" user groups (e.g., low vision and motor impaired people), take the union of these barriers and assume that they are relevant with respect to the target user category (e.g., older adults). The main advantage of aggregation is that one can reuse BW evaluations performed with certain user groups to infer, within given levels of accuracy, the effect on a target category.
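To make the aggregation step concrete, the following minimal sketch (ours, not the authors' tooling; barrier names and group labels are illustrative only) shows how per-group barrier sets could be unioned to approximate a target category:

```python
# Minimal sketch of barrier aggregation; all data below are illustrative.
from typing import Dict, Set

# Hypothetical Barrier Walkthrough output: barriers found on one page,
# keyed by the "primitive" user group they were evaluated for.
barriers_by_group: Dict[str, Set[str]] = {
    "blind": {"No page headings", "Too many links"},
    "low vision": {"Scrolling", "Too many links"},
    "motor impaired": {"Mouse events"},
}

def aggregate(groups: Dict[str, Set[str]]) -> Set[str]:
    """Union of the barriers found for the primitive user groups."""
    union: Set[str] = set()
    for barriers in groups.values():
        union |= barriers
    return union

# The union is then assumed to approximate what the target category
# (e.g., mobile web users) would experience on the same page.
approx_target = aggregate(barriers_by_group)
print(sorted(approx_target))
```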

The previous section shows that there are common barrier types between the mobile web and each of the blind, low vision and motor impaired user categories, suggesting that aggregation could also be applied to mobile web users, treated as the aggregation of the blind, low vision and motor impaired groups.

From Table 5 we can see that 22 out of 61 barrier types were found between the mobile web and at least one other user category on at least one page, which is what aggregating the blind, low vision and motor impaired user groups produces.

Since in this experiment we collected barriers for all four user groups, we can further test if aggregation leads to valid results. To determine how well aggregation works for mobile web users, we used the data about barrier ratings (ratings produced by both expert and non-expert judges, correct and incorrect ones). We compared the subset of ratings produced for mobile web against all the possible combinations of the remaining three user groups. For example, one comparison concerns mobile web ratings vs. low vision ratings aggregated with blind ratings. Thus we considered all possible aggregations of the primitive groups (blind, low vision, motor impaired) against the data we already had for mobile web.

Because the subsets of ratings that we considered are very unlikely to be exactly the same, barring a straightforward equality check, we evaluated each comparison (i.e., each aggregation) with a number of statistical methods (following our previous approach in Brajnik et al., 2009b):

1. For each barrier type we computed the mean of the severity ratings on both subsets (for example, for comparing mobile web against low vision aggregated with blind, for "Moving content" we computed the mean severity obtained for mobile web over the four pages and that for low vision or blind over the same pages). We then applied a t-test to decide whether these means are far apart or not. p-values smaller than 5% provide evidence that the means differ.

2. We computed the effect size obtained through the t-test (the effect size estimates the distance between the means relative to the standard deviation; the higher it is, the stronger the difference between the two subsets is).

3. A degree-of-fit test (based on χ²) can be used to see if the proportions of severity 0–3 ratings in one dataset are similar to those in the second one. The hypothesis being tested is that the two distributions of proportions are the same; hence a significant p-value (i.e., p < 5%) means that they differ.

4. To widen the scope of the evaluation, we applied the degree-of-fit test both to the proportions of ratings with respect to severity scores {0,1,2,3} and with respect to values {yes, no} corresponding to whether a barrier was identified (rating > 0) or not (rating = 0); see the sketch after this list.
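As an illustration of these four steps, the sketch below (ours, not the authors' analysis scripts; ratings and names are made up, and SciPy is assumed to be available) compares two subsets of severity ratings with a t-test, an effect size, and χ² tests on the severity and identification proportions:

```python
# Illustrative comparison of two subsets of severity ratings in {0,1,2,3}.
import numpy as np
from scipy import stats

mobile = np.array([0, 1, 3, 2, 0, 0, 1, 2, 3, 0])        # example ratings
aggregated = np.array([0, 2, 3, 1, 0, 1, 1, 2, 3, 1])    # example ratings

# Steps 1-2: t-test on the means plus an effect size (Cohen's d, pooled SD).
t, p = stats.ttest_ind(mobile, aggregated)
pooled_sd = np.sqrt((mobile.var(ddof=1) + aggregated.var(ddof=1)) / 2)
effect_size = abs(mobile.mean() - aggregated.mean()) / pooled_sd

# Step 3: degree-of-fit on the proportions of severities 0-3, realised here
# as a chi-square test on the 4x2 table of counts per severity level.
counts = np.array([[np.sum(mobile == s), np.sum(aggregated == s)] for s in range(4)])
chi2_rat, p_rat, _, _ = stats.chi2_contingency(counts)

# Step 4: the same test on barrier identification only (rating > 0 vs = 0).
ident = np.array([[np.sum(mobile == 0), np.sum(aggregated == 0)],
                  [np.sum(mobile > 0), np.sum(aggregated > 0)]])
chi2_id, p_id, _, _ = stats.chi2_contingency(ident)

print(f"t={t:.2f} p={p:.2f} d={effect_size:.2f} p_rat={p_rat:.2f} p_id={p_id:.2f}")
```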

We adopted both the t and χ² tests because the first one compares the means of the severities 0,1,2,3 of the two sets, whereas the latter compares the proportions across the four levels. For example, it could be the case that while the means are the same for two sets, the actual proportions are not. That is why the χ² tests are more sensitive to small differences.

From Table 7 we can see that when comparing ratings for mobile web against the aggregation of the other three user groups we get the smallest difference (delta) and the smallest effect size, and neither χ² test suggests that the distributions differ.

We can also notice that aggregating low vision and motor impaired, or blind and motor impaired, also gives very small



differences: the means are not significantly different, the effect size and delta are very small in one case; there is a significant difference in the distribution of actual ratings, indicated by the first χ² test, which is not confirmed by the second one. This is due to the extreme sensitivity of the degree-of-fit test to even small variations.

Finally, the comparison of mobile web against either motor impaired or blind alone shows that the effect size is definitely larger; the t-test even indicates that for motor impaired there is a significant difference in means.

4.5.3. Severity rating

In this study, our aim is to see if the barriers identified for mobile web users are considered as severe as for the other user groups; thus our analysis below compares severity ratings for mobile web against those for the low vision, blind or motor impaired user groups.

Blind vs. mobile web. With respect to the blind user category, there are 33 pairs ⟨barrier type, page⟩ (e.g., ⟨Ambiguous link, facebook⟩) such that the given barrier was correctly found to be present in that page for both the blind user category and mobile web

Fig. 1. Mean severity for each of the pairs of groups, with associated 95% confidence interval.

Table 8
Number and proportion of ratings according to severity level.

Severity   Blind        Mobile       Sum
0          227 (.37)    313 (.51)    540 (.44)
1          134 (.22)    131 (.21)    265 (.22)
2          124 (.20)    95 (.15)     219 (.18)
3          128 (.21)    74 (.12)     202 (.16)
Sum        613 (1.00)   613 (1.00)   1226 (1.00)

Table 9
Number and proportion of identified barriers.

            Blind        Mobile       Sum
Not-found   227 (.37)    313 (.51)    540 (.44)
Found       386 (.63)    300 (.49)    686 (.56)
Sum         613 (1.00)   613 (1.00)   1226 (1.00)

(corresponding to the 19 barrier types listed in Table 5). If we then collect all the ratings given by all judges to those barrier types on those pages, including incorrect ones, we obtain a dataset of 1226 ratings. Table 8 shows the breakdown of such ratings. A degree-of-fit test shows that the probability distribution of each level of severity for the blind is significantly different than for mobile web.

To relax the notion of severity, we can consider whether a barrier is identified or not, discarding the actual severity score when the barrier is found (i.e., whether severity = 0 or not). The same analysis as above tells us whether judges differed not only in the actual ratings they gave, but also in terms of how many barriers were reported. Table 9 shows such numbers; the degree-of-fit test shows also in this case a significant difference (i.e., in terms of distribution of barrier identification, mobile web differs from blind).

Fig. 1 (left) shows the mean severity values for blind compared to mobile web; it can be noticed that the means differ substantially (1.25 vs. 0.89), and the corresponding 95% confidence intervals are far apart ([1.16, 1.34] and [0.80, 0.89], respectively, with a difference of at least 0.27 which, on a scale [0,3], is 9%, and no more than 0.54, which is 18%). In fact, a non-parametric test for the medians shows that there is a significant difference.

Table 10 shows the mean severity ratings of barrier types for mobile and blind users over different pages and for different judges. The "Cascading menu" barrier type is considered similarly severe for both mobile and blind users; however, "No page headings" is considered to be much more severe for blind users than for mobile web users, whereas the contrary holds for "Page size limit". Statistical tests confirm both these insights.

Low vision vs. mobile web. The same analysis with respect to low vision leads to 1296 ratings, for the 35 unique pairs ⟨barrier type, page⟩ and 21 unique barrier types. Table 11 shows the breakdown of such ratings. A degree-of-fit test shows that the probability distribution of each level of severity for the low vision category is significantly different than for mobile web.

When we consider barrier identification, we get the values shown in Table 12, for which the degree-of-fit test fails to support the conclusion that there is a difference.

Fig. 1 (centre) shows the mean severity values for low vision compared to mobile web; it can be noticed that the means are close to each other (0.93 vs. 0.98), and the corresponding 95% confidence intervals overlap. In fact, the non-parametric test for the medians shows that there is no significant difference.



Table 10
Severity rating differences between mobile and blind users.

Barrier type                           Severity – blind   Severity – mobile   Diff
Page size limit                        0.39               1.17                −0.78
Minimize markup                        0.32               1.08                −0.76
Long URIs                              1.13               1.47                −0.34
Using stylesheets                      1.03               1.13                −0.10
Missing layout clues                   0.27               0.33                −0.06
Cascading menu                         1.13               1.13                0.00
Internal links are missing             1.24               1.22                0.02
New windows                            0.20               0.13                0.07
Valid markup                           1.15               0.98                0.17
Too many links                         1.29               0.84                0.45
Mouse events                           1.67               1.13                0.54
Skip links not implemented             1.54               0.85                0.69
Functional images lacking text         1.47               0.70                0.77
Images used as titles                  1.40               0.60                0.80
Layout tables                          1.47               0.60                0.87
Rich images lacking equivalent text    1.07               2.00                0.93
Dynamic menu in JavaScript             2.20               1.20                1.00
Ambiguous links                        1.70               0.43                1.27
No page headings                       2.11               0.59                1.52

Table 11
Number and proportion of ratings according to severity level.

Severity   Low vision   Mobile       Sum
0          300 (.46)    308 (.48)    608 (.47)
1          154 (.24)    129 (.20)    283 (.22)
2          133 (.21)    127 (.20)    260 (.20)
3          61 (.09)     84 (.13)     145 (.11)
Sum        648 (1.00)   648 (1.00)   1296 (1.00)

Table 12
Number and proportion of identified barriers.

            Low vision   Mobile      Sum
Not-found   300 (.46)    308 (.48)   608 (.47)
Found       348 (.54)    340 (.52)   688 (.53)
Sum         648          648         1296

Table 13
Severity rating differences between mobile and low vision users.

Barrier type                       Severity – low vision   Severity – mobile   Diff
Page size limit                    0.43                    1.17                −0.74
Minimize markup                    0.35                    1.08                −0.73
Long URIs                          1.13                    1.47                −0.33
Scrolling                          1.14                    1.42                −0.28
Mouse events                       0.93                    1.13                −0.20
Valid markup                       0.92                    1.11                −0.18
Too many links                     0.96                    1.13                −0.17
Using stylesheets                  1.00                    1.14                −0.14
Cascading menu                     1.00                    1.13                −0.13
No stylesheet support              0.44                    0.49                −0.05
Missing layout clues               0.33                    0.33                0.00
Ambiguous links                    0.48                    0.43                0.05
Functional images lacking text     0.66                    0.60                0.07
Skip links not implemented         0.98                    0.85                0.13
New windows                        0.27                    0.13                0.13
Internal links are missing         1.43                    1.28                0.15
No page headings                   0.76                    0.59                0.16
Dynamic menu in JavaScript         1.40                    1.20                0.20
Layout tables                      0.80                    0.60                0.20
Images used as titles              1.00                    0.60                0.40
Inflexible page layout             1.46                    1.05                0.41

Table 14
Number and proportion of ratings according to severity level.

Severity   Motor        Mobile      Sum
0          253 (.48)    236 (.45)   489 (.46)
1          114 (.22)    109 (.21)   223 (.21)
2          103 (.20)    106 (.20)   209 (.20)
3          54 (.10)     73 (.14)    127 (.12)
Sum        524          524         1048

Table 15
Number and proportion of identified barriers.

            Motor       Mobile      Sum
Not-found   253 (.48)   236 (.45)   489 (.46)
Found       271 (.52)   288 (.55)   559 (.54)
Sum         524         524         1048


Table 13 shows the mean severity ratings for mobile and low vision users over different pages and judges. The "Missing layout clues" barrier type is considered similarly severe for both mobile and low vision users. However, "Inflexible page layout" is considered to be much more severe for low vision users than for mobile web users, and vice versa for "Page size limit". Statistical tests confirm these insights.

Motor impaired vs. mobile web. Finally, the same analysis with respect to motor impaired leads to 1048 ratings, for the 28 unique pairs ⟨barrier type, page⟩ and 16 unique barrier types. Table 14 shows the breakdown of such ratings. The degree-of-fit test shows that the probability distribution of each level of severity for the motor impaired category is not significantly different than for mobile web.

When we consider barrier identification, we get the values shown in Table 15, for which the degree-of-fit test fails to support the conclusion that there is a difference.

Fig. 1 (right) shows the mean severity values for motor impaired compared to mobile web; it can be noticed that the means are close to each other (0.92 vs. 1.03), and the corresponding 95% confidence intervals overlap. In fact, the non-parametric test for the medians shows that there is no significant difference.

Table 16 shows the mean severity ratings for mobile and motor impaired users for barrier types over different pages and by different judges. The "New windows" barrier type is considered similarly severe for both mobile and motor impaired users. However, "Cascading menu" is considered to be much more severe for motor impaired users than for mobile web users, and vice versa for "Page size limit". Statistical tests confirm only the latter difference: "Page size limit" is rated significantly more severe for mobile web users than for motor impaired users; no significant difference is found for the other barrier.

4.5.4. Quality

We have previously proposed that effectiveness can be investigated by decomposing it into validity and reliability (Brajnik et al., 2009a). Validity of the method is the extent to which all and only the real problems can be identified. Reliability is the extent to which independent evaluations (for example, performed by different evaluators, at different times, or in different situations) produce the same results.

We investigate the validity of results for different user groups with four different variables, all based on the notion of correctly



identified barrier introduced in Section 4.5.1: accuracy, correctness, sensitivity and f-measure.

Validity via accuracy. Accuracy rate is the simplest index of validity, being defined as the proportion of ratings that are correct. Fig. 2 shows the number of correct and incorrect ratings that our judges gave to the true barriers that are in common between mobile web and each of the other user groups ("blind-mobile" is the subset of mobile web barriers that are in common with blind, and similarly for "low vision-mobile" and "motor-mobile"). For example, Fig. 2 (left) shows that the number of incorrect ratings for blind is 334, while for barriers in common between blind and mobile the number is 247.

More interestingly, Fig. 2 (right) shows the actual values of accuracy, the proportion of correct ratings. We can see that accuracy is remarkably similar (in fact no significant difference exists) for all the user groups except blind. In fact, accuracy for mobile web vs. blind differs significantly (with a gap between confidence intervals

Fig. 2. Correct and incorrect ratings for each of the user groups (left); the number of correct ratings is given explicitly. Accuracy and 95% confidence interval (right); numbers are the accuracy rates.

Table 16
Severity rating differences between mobile and motor impaired users.

Barrier type                       Severity – motor   Severity – mobile   Diff
Page size limit                    0.39               1.17                −0.78
Minimize markup                    0.35               1.08                −0.73
Inflexible page layout             0.60               1.13                −0.53
Dynamic menu in JavaScript         0.87               1.20                −0.33
Scrolling                          1.13               1.42                −0.30
Long URIs                          1.27               1.47                −0.20
Functional images lacking text     0.47               0.60                −0.13
Valid markup                       0.87               0.98                −0.12
Ambiguous links                    0.34               0.43                −0.09
New windows                        0.13               0.13                0.00
No page headings                   0.62               0.59                0.03
Too many links                     1.17               1.13                0.04
Skip links not implemented         0.96               0.85                0.12
Internal links are missing         1.47               1.28                0.18
Mouse events                       1.53               1.13                0.40
Cascading menu                     1.73               1.13                0.60

of 0.05), while it is statistically indistinguishable for mobile web vs. low vision or mobile web vs. motor impaired.

Validity via correctness, sensitivity and f-measure. Accuracy deals with the entire set of ratings provided by a group of judges; it does not characterize the performance of evaluations performed by individual judges. Correctness, sensitivity and f-measure can be defined to capture that.

After restricting ourselves to barriers that are correctly identified and in common between the mobile web and each of the other three user groups, given a page and a user category, we define the true barriers set (TBS) as the set of all true barriers that judges found; given a page, a user category and a judge, the found barriers set (FBS) is the set of barriers with severity > 0 reported by that judge (regardless of whether they are correctly identified or not). These sets can be used to define three indexes:

Correctness: C = |TBS ∩ FBS| / |FBS| is the proportion of found barriers that are also correct.

Sensitivity: S = |TBS ∩ FBS| / |TBS| is the proportion of all the true barriers that were found.

F-measure: F = 2·C·S / (C + S) is the harmonic mean of C and S, a balanced combination of C and S summarizing the validity of an evaluation. Neither correctness nor sensitivity alone can characterize validity; they have to be considered jointly: a method yielding only a few of the true problems would be correct but highly non-sensitive; vice versa, a method yielding a lot of problems including all the true ones would be sensitive but highly non-correct. F-measure is a convenient way to provide an overall index of validity.
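A minimal sketch of how these three indexes could be computed for a single judge follows (our illustration, with made-up barrier names; TBS and FBS are represented as plain Python sets):

```python
# Illustrative computation of correctness, sensitivity and f-measure.
def validity_indexes(tbs: set, fbs: set) -> tuple:
    """Correctness, sensitivity and f-measure of one judge's evaluation."""
    hits = len(tbs & fbs)                          # found barriers that are true
    correctness = hits / len(fbs) if fbs else 0.0
    sensitivity = hits / len(tbs) if tbs else 0.0
    if correctness + sensitivity == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * correctness * sensitivity / (correctness + sensitivity)
    return correctness, sensitivity, f_measure

true_barriers = {"No page headings", "Too many links", "Long URIs"}    # example TBS
found_barriers = {"No page headings", "Long URIs", "New windows"}      # example FBS
print(validity_indexes(true_barriers, found_barriers))   # ≈ (0.67, 0.67, 0.67)
```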

The dataset that one obtains in such a way consists of 78 pairs ⟨judge, category⟩, for which there are appropriate values for correctness, sensitivity and f-measure.

Fig. 3 shows the means of correctness together with their (non-parametric) 95% confidence intervals. In fact, there is only a



Fig. 3. Means of correctness with 95% confidence intervals.

Fig. 4. Boxplots for sensitivity (left) and means of sensitivity with 95% confidence intervals.


significant difference of correctness between mobile web vs. blind; mobile web vs. low vision and mobile web vs. motor impaired do not differ in terms of correctness.

Fig. 4 shows the boxplot of the distribution of sensitivity across the different user groups. We can see that the highest median is for blind, and the lowest one for blind-mobile web. There are significant differences of sensitivity between mobile web and blind, and between mobile web and motor impaired, but not between mobile web and low vision. Fig. 4 also shows the means of sensitivity together with the 95% confidence intervals.

Finally, Fig. 5 shows the boxplot of the distribution of f-measure across the different user groups. We can see that the highest median is for blind, and the lowest one for mobile web. There are significant differences of f-measure between mobile web vs. blind, and mobile web vs. motor impaired. Fig. 5 also shows the means of f-measure together with the 95% confidence intervals. Table 17 gives these confidence intervals; we can see that blind and blind-mobile differ by between 0.02 and 0.25.

Reliability. This is the extent to which independent evaluations produce the same results. It is independent of validity, and ideally one would like to obtain results that are both valid (i.e., true) and reliable (i.e., always); in practice this is never the case.

In this paper we measure reliability in terms of maximum agreement, given the set of ratings provided by different judges on a barrier with respect to a page and a user category. Max-agreement is defined as the relative frequency of the mode, i.e., the percentage of occurrence of the most frequent value of the set of ratings. Because the minimum value of max-agreement is determined by the resolution scale of the ratings (for example, with ratings in {0,1,2,3}, the minimum value for max-agreement is 0.25, whereas for binary ratings the minimum value would be 0.5), we also compute a linear adjustment to normalize it within [0,1], so that 0 corresponds to the minimum value and 1 to 1. Such a normalized max-agreement can be based on actual severity ratings (from the set {0,1,2,3}, maR), or on whether the barrier was deemed to be present or not (i.e., its rating is greater than 0 or not, maI), or on



Fig. 5. Boxplots for f-measure (left) and means of f-measure with 95% confidence intervals.

Table 17
Confidence intervals around the means of f-measure.

        Blind   Blind-mobile   Low vision   Low vision-mobile   Motor   Motor-mobile
LB      0.71    0.56           0.61         0.59                0.59    0.60
UB      0.81    0.69           0.72         0.71                0.71    0.74
Mean    0.76    0.62           0.67         0.65                0.64    0.67


whether the barrier was deemed to be severe or not (i.e., its rating is greater than 1 or not, maS).

In brief, maximum agreement was computed based on the barriers that are common between mobile web and each of the other user categories (mobile web and blind/low vision/motor impaired). For each of these pairs of categories and for each tuple ⟨barrier type, user category, page⟩, we computed max agreement on the corresponding vector of severities (one element of the vector for each of the judges that evaluated such a tuple). It was computed on severity in {0,1,2,3}, or on {sev = 0, sev > 0}, or on {sev ≤ 1, sev > 1}.
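The following sketch (ours, with example severities; not the authors' scripts) shows one way to compute the three normalized max-agreement measures for the vector of ratings given by the judges to a single ⟨barrier type, user category, page⟩ tuple:

```python
# Illustrative normalised max-agreement for one barrier/category/page tuple.
from collections import Counter

def max_agreement(values, levels):
    """Relative frequency of the mode, rescaled so that the minimum possible
    value (1/levels) maps to 0 and perfect agreement maps to 1."""
    ma = Counter(values).most_common(1)[0][1] / len(values)
    floor = 1 / levels
    return (ma - floor) / (1 - floor)

severities = [0, 2, 2, 3, 2]                        # example ratings of 5 judges
maR = max_agreement(severities, levels=4)           # on severities {0,1,2,3}
maI = max_agreement([s > 0 for s in severities], levels=2)   # identified or not
maS = max_agreement([s > 1 for s in severities], levels=2)   # severe or not
print(round(maR, 2), round(maI, 2), round(maS, 2))
```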

After restricting to true barriers that are in common between mobile web and each of the other user groups, the three types of max agreement were computed. Fig. 6 shows their distribution.

We can see that the largest differences between mobile web and other groups are for maR on low vision vs. mobile web and on motor impaired vs. mobile web; for maI on blind vs. mobile web; and for maS on all three pairs. We can also notice that the variability of max agreement is larger for mobile web and motor impaired with respect to maS.

In terms of pairwise comparisons between mobile web and each of the other user groups, we notice a significant difference for maR with respect to motor impaired (maR for motor = 0.43 and motor-mobile = 0.34), and for maS with respect to low vision and motor impaired (maS for low vision = 0.51, low vision-mobile = 0.41, and motor = 0.49, motor-mobile = 0.38); the other pairs are not significantly different.

Fig. 7 shows the mean values of max agreement with their 95% confidence intervals. It is readily seen that there is little variability around the overall means, which are respectively 0.40, 0.36 and 0.44 for maR, maI and maS. Notice that all these are relatively small values for max-agreement. That means one should expect similar agreement between the evaluations for mobile and blind, mobile and low vision, and mobile and motor impaired user categories.

To measure inter-rater agreement we also computed Cohen's κ (kappa) coefficient, whose maximum value is 1. We did that separately for experts and non-experts, and for each combination of page and user category, by averaging over all pairs of judges the κ coefficient obtained for each pair. For a pair of judges, κ is a measure of how much the two judges agreed on the severity ratings they gave to the entire set of barrier types (not only those that are in common with mobile web). Table 18 shows the averaged κ split by page and user category, and the same data is also illustrated in Fig. 8. Similar to the max-agreement results, the mobile users group has the lowest κ value. This is probably because our judges had less expertise with mobile web barriers than with disability-related ones. The κ results also show that the page plays a role in agreement; for instance, for all user categories Quilts has the highest κ values and IMDB the lowest.
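A small sketch of the averaged pairwise κ computation is given below (our illustration; judge identifiers and severity vectors are made up, and κ is implemented directly rather than taken from a library):

```python
# Illustrative averaged pairwise Cohen's kappa for one page/user category.
from collections import Counter
from itertools import combinations

def cohens_kappa(a, b, levels=(0, 1, 2, 3)):
    """Cohen's kappa between two judges' severity ratings over the same barriers."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    expected = sum((ca[l] / n) * (cb[l] / n) for l in levels)  # chance agreement
    return (observed - expected) / (1 - expected)

judges = {
    "J1": [0, 2, 3, 1, 0, 2],   # example severity ratings per barrier type
    "J2": [0, 2, 2, 1, 0, 3],
    "J3": [1, 2, 3, 0, 0, 2],
}
kappas = [cohens_kappa(judges[x], judges[y]) for x, y in combinations(judges, 2)]
print(sum(kappas) / len(kappas))   # averaged kappa for this page/user category
```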

5. Discussion

In this study we investigated 61 barrier types and the results show that 58% of those that were correctly found are common between mobile and either of the blind, low vision and motor impaired


Fig. 6. Boxplots for normalized max agreement maR, maI and maS. We can compare maR for blind against mobile web (1st and 2nd boxplot), low vision against mobile (3rd and 4th), etc.


users. Even though the proportions of common barrier types in the three comparisons (mobile vs. blind, mobile vs. low vision, mobile vs. motor) are close (50%, 55%, 42%), we saw that low vision and mobile users have the highest number of common barrier types. It is also interesting to note that even though some barrier types were specifically included for mobile users (e.g., "Long URIs"), they were also truly identified for the disabled user groups.

Our results also show that some barrier types are specific to different user groups. Therefore, we can say that our results also support the view in Chuter and Yesilada (2009), which indicates that some guidelines are common between WCAG and MWBP and, similarly, some other guidelines are disjoint. We can see that for the common barriers, if we only used our aggregated user groups to check for mobile barriers we would only miss four such barriers among our correctly found barrier types. These barriers are quite specific to the mobile domain's transmission best practices, being: "External resources", "Large graphics", "No cookies support", and "Stylesheet size", and so it could be argued that these are not interface or interactivity considerations at all. This means that our

ability to use aggregated barriers to identify barriers to mobile interactivity, and therefore situationally-induced impairments, is much enhanced beyond the 58% we have already mentioned.

These numbers would suggest that reusing results obtained for one user category to draw some conclusions on mobile web is viable. Common barriers were found on 21% of the pages; in other words, approximately 1 out of 4 pages contains barriers that are in common between some of the pairs of user groups we compared. Thus we believe that pages are an important factor, potentially reducing the chances to reuse results obtained for one category when considering mobile web. On the other hand, it is likely that on 21% of the pages, fixing one of the common barriers automatically achieves a beneficial effect with respect to mobile web users, and vice versa. Notice that we cannot generalize our findings in terms of pages, as our sample is based on only four specimens. Another study with a larger set of pages needs to be conducted to confirm these findings.

In order to further investigate how the common barriers can be used to make the evaluation process more efficient for both mobile and disabled user groups, we investigated the aggregation idea. We


Fig. 7. Means of max agreement maR, maI and maS and 95% confidence intervals. Numbers are the actual values of the means.


saw that 22 true barriers (58%) were in common between mobile web and the other aggregated user groups. Aggregating all three user groups gives the best results in terms of approximating the data obtained directly for mobile web: very small errors in ratings will occur. Second best results can be obtained when aggregating low

Table 18
Kappa (κ) by site and user category.

             Facebook   IMDB   Quilts   Sams   Mean
Blind        0.22       0.16   0.29     0.24   0.23
Low vision   0.16       0.12   0.18     0.19   0.16
Motor        0.16       0.18   0.25     0.20   0.20
Mobile       0.11       0.10   0.19     0.17   0.14
Mean         0.16       0.14   0.23     0.20   0.18

vision and motor impaired, or blind and motor impaired. Using only motor impaired would lead to significant errors in ratings. Our data even shows that using low vision data alone as an estimate of mobile web is better than using either of the other user groups alone. This was also supported by the analysis of common barriers, which showed that mobile users have the most common barrier types with low vision users. Also, this result can be generalized only in terms of judges, but not in terms of pages.

In terms of the distribution of severity scores, we see that mobile web differs from blind and from low vision, but does not differ from motor impaired. That is to say, even if some barriers are commonly identified between blind and mobile web, and similarly between low vision and mobile web, we still expect to see different severity ratings for those barriers. For instance, common barriers are rated as more severe when referring to blind users than to mobile web by at least 9%, and no more than 18%. When we also look at some barrier


Fig. 8. Kappa (κ) by site and user category.


types, we also see that they consistently show the same pattern across all the different user groups. For instance, the barrier type "Page size limit" is consistently rated less severe for disabled users than for mobile users, and similarly "No page headings" is consistently rated less severe for mobile users than for disabled users. Therefore, we can say that when one uses aggregated results, the severity ratings have to be reused carefully.

In terms of the distribution of barrier identification (i.e., severity > 0 and severity = 0), mobile web differs from blind, but does not differ from low vision nor from motor impaired. This means that not only do experts rate the barriers differently (blind vs. mobile web), but they also identify different barriers. These differences do not show up for the other user groups.

Therefore, when reusing evaluations for low vision or motor impaired (compared to mobile web), we should not see much difference in the way common barriers are prioritized for ranking websites or are triaged for fixing. Differences are likely to occur when dealing with the blind user category.

In order to investigate the quality of the BW results for mobile and disabled user groups, we have investigated validity and reliability.

Validity is investigated in terms of accuracy, correctness, sensitivity and f-measure, which are all dependent on the notion of correctly identified barrier. Accuracy ranges between 40% and 54%; it does not change between mobile web and each of the other groups when using experts' ratings for barriers that are in common. This means that a similar level of quality should be expected when reusing evaluations. The only difference is again between blind and mobile web, where the accuracy difference is at least 5% (higher for blind).

When we look at correctness, it is relatively high (ranging between 0.87 and 0.97); its lowest value is for the subset of mobile web barriers in common with blind, because on that subset our evaluators produced the highest number of false positives. We can also notice that correctness with respect to the three user groups dealing with disabilities is essentially the same, although uncertainty (i.e., the width of the confidence intervals) increases as we move from blind to low vision and to motor impaired. One way to interpret this is that the number of false positives reported by judges varies more with motor impaired than with blind.

In terms of sensitivity, mobile web differs from blind and from motor impaired, but not from low vision. Scores range between 0.52 and 0.65, the highest being again for blind and the lowest for the subset of mobile web barriers in common with blind. This means that the ability to capture all the true barriers is highest with respect to the blind, and lower for the subset of mobile web barriers in common with blind.

F-measure is a combination of correctness and sensitivity and can be interpreted as an overall index of validity. Mobile web significantly differs from blind and from motor impaired. A change of x% in f-measure corresponds to a simultaneous change of x% in correctness and x% in sensitivity. F-measure ranges between 0.62 and 0.78, the highest being for blind and the lowest for the subset of mobile web barriers in common with blind. This difference is substantial (between 2% and 25%), confirming that quality drops between blind and mobile web. The other difference is negligible. One possible explanation is that judges showed lower levels of expertise for mobile web barriers than for disability-related ones.

Reliability is estimated in terms of maximum agreement. Both the distribution of max agreement of ratings and their means show that there are no large differences between mobile web and the other groups, except for the only significant difference, which was found for motor impaired. This means that the reliability of common true barriers appears not to be affected, i.e., the same level of repeatability of results can be obtained for mobile web as for blind or low vision. Even more similar is the reliability that can be obtained for mobile web and the other user groups with respect to max agreement computed on barrier identification. Reliability changes a little between mobile web and low vision or motor impaired when comparing severe vs. non-severe barriers.

Finally, our study has a number of weak aspects. The first one concerns the number of pages used in our study. Since we used four web pages, we could only cover the barrier types that existed on those four web pages. In order to cover all of the barrier types introduced in the BW method, we should have used many more web pages; however, using many more pages means much more effort is required from our expert participants. Further studies can be conducted to cover the missing barrier types. A second one concerns the subjectivity of the severity ratings. In our previous publications (Brajnik et al., 2009b) we showed that severity rating is not any more subjective than the identification of barrier types. Therefore, even though further studies could be conducted without asking participants to provide severity ratings, we believe this would not have any effect on the overall



results. A third one concerns the fact that both experts and non-experts conducted the study in their own time. To reduce the risk that, for example, students might not take the study seriously, their evaluation was considered as part of the grading of their course. We used the f-measure in particular to grade their work, and we believe this provided a strong motivation for them to take the evaluation seriously. As for the experts, in any real-world evaluation they conduct these kinds of studies in their own time; therefore we believe our study was as close as possible to a real-world application of such a method.

6. Conclusions

In this paper we have presented a Barrier Walkthrough study that investigates the common barriers between mobile and disabled users, in particular blind, low vision and motor impaired users. This study shows that 58% of the true barrier types were identified as common between mobile and disabled users. Further, if our aggregated barriers alone were used to test for mobile conformance, only four barriers would be missed. The results also show that mobile users and low vision users have the most common barrier types, but regarding severity, common barriers are similarly rated for low vision and motor impaired users. This study also shows that if the evaluation results for blind, low vision and motor impaired users are aggregated, then this can be used to approximate the results for mobile web users. In fact, the results show that the evaluation results for low vision alone are sufficient to estimate results for mobile users. Finally, even though the accuracy of the results is best for blind and worst for mobile users, reliability changes only a little between mobile web and the disabled user groups.

Acknowledgement and experimental data

The authors thank all the experts that we contacted, and especially those who contributed to this study. Many thanks also to the students of the course Progettazione di siti web 2008–2009 and to the graduate students at the Human Centred Web (HCW) Lab who spent a lot of time performing the evaluation. We also thank the reviewers for their extremely detailed and helpful reviews. Data of this study can be found at the HCW Lab data repository, http://wel-eprints.cs.manchester.ac.uk/114/.

Appendix A. Statistical tests

1. Degree-of-fit of severity levels for the blind compared to mobile web, using ratings of all the judges: χ²(3) = 71.96, p < 0.001;

2. Degree-of-fit of barrier identification for the blind compared to mobile web, using ratings of all the judges: χ²(1) = 48.28, p < 0.001;

3. Non-parametric test for the medians of the severities (blind vs. mobile web): W > 220,000, p < 0.001; a non-parametric test is used because the distribution of severity values is positively skewed, with a peak for severity = 0;

4. Degree-of-fit of severity levels for the low vision category compared to mobile web, using ratings of all the judges: χ²(3) = 11.63, p < 0.009;

5. Accuracy for mobile web vs. blind: χ²(1) = 51.3, p < 0.0001;

6. Wilcoxon signed rank test with paired samples comparing correctness of evaluations of the same judge on the same page with respect to mobile web and blind gives a significant difference (W = 36, p < 0.006). A non-parametric test was used because the distribution of correctness is negatively skewed, with a peak at 1. For this reason, confidence intervals of correctness were computed using the bootstrap technique (3000 replications).

7. Wilcoxon signed rank test with paired samples comparing sensitivity for mobile web and blind gives W = 2016, p < 0.0001; comparing mobile web and motor impaired gives W = 143, p < 0.0001;

8. Wilcoxon signed rank test with paired samples comparing f-measure for mobile web vs. blind and mobile web vs. motor impaired gives W = 2016 and W = 197, p < 0.001;

9. Wilcoxon signed rank test with paired samples comparing maR for mobile web and motor impaired: W = 190, p = 0.0096; maS for mobile web and low vision: W = 253, p = 0.0486; maS for mobile web and motor impaired: W = 165, p = 0.026. Since the distribution of agreement values is not normal, a non-parametric test was used, as well as the bootstrap technique for computing the confidence intervals.

10. The following test results support the severity rating results in Section 4.5.3 (t-test between the means; unpaired, two-tailed, Welch correction); an illustrative sketch of these tests is given after this list:

– Comparing the barrier type "No page headings" between blind and mobile users, t(71.88) = 6.88, p < 0.0001, means 2.10 vs. 0.59, effect size (Cohen's d) = 1.60, which shows that the mean severity of this barrier is much higher for blind users.

– Comparing the barrier type "Page size limit" between blind and mobile users, t(37.72) = 2.83, p < 0.05, means 0.39 vs. 1.17, d = 0.83, which shows that the mean severity is much higher for mobile users.

– Comparing the barrier type "Page size limit" between low vision and mobile users, t(39.6) = 2.60, p = 0.013, means 0.43 vs. 1.17, d = 0.77, which shows that severity is much higher for mobile users.

– Comparing the barrier type "Inflexible page layout" between low vision and mobile users, t(109.99) = 1.96, p = 0.05, means 1.46 vs. 1.05, d = 0.37, which shows that severity is only marginally different for the two groups.

– Comparing the barrier type "Page size limit" between motor and mobile users, t(18.37) = 3.26, p = 0.0043, means 0.39 vs. 1.17, d = 0.83, which shows that severity is much higher for mobile users.

– Comparing "Cascading menu" between motor and mobile users, t(27.99) = 1.12, p = 0.27, which shows that the difference is not significant.
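For illustration only, the sketch below (ours; the severities are made up, SciPy is assumed to be available, and the Mann–Whitney test stands in here as an example of a non-parametric alternative) shows how tests of the kind listed above can be run:

```python
# Illustrative Welch t-test, Cohen's d, non-parametric test and bootstrap CI.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sev_blind = np.array([3, 2, 2, 3, 1, 2, 3, 2])    # example severities
sev_mobile = np.array([1, 0, 1, 0, 1, 2, 0, 1])   # example severities

# Unpaired, two-tailed t-test with Welch correction, plus Cohen's d.
t, p = stats.ttest_ind(sev_blind, sev_mobile, equal_var=False)
pooled_sd = np.sqrt((sev_blind.var(ddof=1) + sev_mobile.var(ddof=1)) / 2)
d = (sev_blind.mean() - sev_mobile.mean()) / pooled_sd

# A non-parametric comparison, useful for skewed severity distributions.
w, p_w = stats.mannwhitneyu(sev_blind, sev_mobile, alternative="two-sided")

# Bootstrap 95% confidence interval for a mean (3000 replications).
boot_means = [rng.choice(sev_mobile, size=len(sev_mobile), replace=True).mean()
              for _ in range(3000)]
ci = np.percentile(boot_means, [2.5, 97.5])
print(f"t={t:.2f} p={p:.3f} d={d:.2f}  W={w:.0f} p_w={p_w:.3f}  CI={ci.round(2)}")
```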

References

Abou-Zahra, S., 2008. Web accessibility evaluation. In: Harper, S., Yesilada, Y. (Eds.), Web Accessibility: A Foundation for Research. Human–Computer Interaction Series, vol. 7, first ed. Springer, London, pp. 79–106, Chapter 7.

Anderson, C., 2006. The Long Tail – How Endless Choice is Creating Unlimited Demand. Random House Business Book.

Arrue, M., Vigo, M., Abascal, J., 2007. Automatic evaluation of mobile web accessibility. In: Universal Access in Ambient Intelligent Environments, UI4All 2006, LNCS 4397. Springer, pp. 244–260.

Bailey, S., Barrett, S., Guilford, S., 2005. Older users' interaction with websites. In: Workshop on HCI and the Older Population, British HCI.

Barnard, L., Yi, J.S., Jacko, J.A., Sears, A., 2007. Capturing the effects of context on human performance in mobile computing systems. Personal and Ubiquitous Computing 11 (2), 81–96.

Brewster, S., 2002a. Overcoming the lack of screen space on mobile computers. Personal and Ubiquitous Computing 6 (3), 188–205.

Brajnik, G., 2004. Comparing accessibility evaluation tools: a method for tool effectiveness. Universal Access in the Information Society 3 (3), 252–263.

Brajnik, G., 2006. Web accessibility testing: when the method is the culprit. In: Computers Helping People with Special Needs, Lecture Notes in Computer Science, vol. 4061/2006, pp. 156–163.

Brajnik, G., 2008. Beyond conformance: the role of accessibility evaluation methods. In: WISE '08: Proceedings of the 2008 International Workshops on Web Information Systems Engineering, pp. 63–80.

Brajnik, G., 2010. Barrier Walkthrough: Heuristic Evaluation Guided by Accessibility Barriers. <http://users.dimi.uniud.it/giorgio.brajnik/projects/bw/bw.html>.

Brajnik, G., Lomuscio, R., 2007. Samba: a semi-automatic method for measuring barriers of accessibility. In: Assets '07: Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility. ACM, pp. 43–50.

Brajnik, G., Yesilada, Y., Harper, S., 2009a. The Expertise Effect on Web Accessibility Evaluation Methods. Human–Computer Interaction. Awaiting publication.

Brajnik, G., Yesilada, Y., Harper, S., 2009b. Web accessibility guideline aggregation for older users and its validation. Universal Access in the Information Society 10 (4), 2011.

Brewster, S., 2002b. Overcoming the lack of screen space on mobile computers. Personal and Ubiquitous Computing 6 (3), 188–205.

Caldwell, B., Cooper, M., Reid, L.G., Vanderheiden, G., 2008. Web Content Accessibility Guidelines (WCAG) 2.0. W3C. <http://www.w3.org/TR/WCAG20/>.

Chae, M., Kim, J., 2003. What's so different about the mobile internet? Communications of the ACM 46 (12), 240–247.

Chae, M., Kim, J., 2004. Do size and structure matter to mobile users? An empirical study of the effects of screen size, information structure, and task complexity on user activities with standard web phones. Behaviour and Information Technology 23 (3), 165–181.

Chen, T., Yesilada, Y., Harper, S., 2009. What input errors do you experience? Typing and pointing errors of mobile web users. International Journal of Human–Computer Studies 68, 138–157.

Chisholm, W., Vanderheiden, G., Jacobs, I., 1999. Web Content Accessibility Guidelines 1.0. W3C. <http://www.w3.org/TR/WCAG10/>.

Chuter, A., Yesilada, Y., 2009. Relationship Between Mobile Web Best Practices (MWBP) and Web Content Accessibility Guidelines (WCAG). Technical Report, W3C Web Accessibility Initiative and W3C Mobile Web Best Practices. <http://www.w3.org/TR/mwbp-wcag/>.

Coyne, K.P., Nielsen, J., 2001. Beyond ALT Text: Making the Web Easy to Use for Users with Disabilities. Nielsen Norman Group.

Craven, J., Brophy, P., 2003. Non-visual access to the digital library: the use of digital library interfaces by blind and visually impaired people. Library and Information Commission Research Report, 145.

Cui, Y., Roto, V., 2008. How people use the web on mobile devices. In: WWW '08: Proceedings of the 17th International Conference on World Wide Web. ACM, New York, NY, USA, pp. 905–914.

Disability Rights Commission (DRC), 2004. The Web: Access and Inclusion for Disabled People. ISBN: 0 11 703287 5. <http://www-hcid.soi.city.ac.uk/research/DRC_Report.pdf>.

Duchnicky, R.L., Kolers, P.A., 1983. Readability of text scrolled on visual display terminals as a function of window size. The Journal of the Human Factors and Ergonomics Society 25 (6).

Edwards, A.D.N., 2008. Assistive technologies. In: Harper, S., Yesilada, Y. (Eds.), Web Accessibility: A Foundation for Research. Human–Computer Interaction Series, first ed. Springer, London, pp. 142–162, Chapter 10.

Fukuda, R., Bubb, H., 2003. Eye tracking study on web-use: comparison between younger and elderly users in case of search task with electronic timetable service. PsychNology Journal 1 (3), 202–228.

Garofalakis, J., Stefanis, V., 2008. Moke: a tool for mobile-ok evaluation of web content. In: W4A '08: Proceedings of the 2008 International Cross-Disciplinary Conference on Web Accessibility (W4A). ACM, New York, NY, USA, pp. 57–64.

Gunderson, J., Jacobs, I., Hansen, E., 2002. User Agent Accessibility Guidelines 1.0. W3C. <http://www.w3.org/TR/WAI-USERAGENT/>.

Harper, S., Michailidou, E., Stevens, R., 2008. Toward a definition of visual complexity as an implicit measure of cognitive load. ACM Transactions on Applied Perception 6 (2), 1–18.

Harper, S., Yesilada, Y. (Eds.), 2008a. Web Accessibility: A Foundation for Research. Springer.

Harper, S., Yesilada, Y., 2008b. Web accessibility and guidelines. In: Harper, S., Yesilada, Y. (Eds.), Web Accessibility: A Foundation for Research. Human–Computer Interaction Series, first ed. Springer, London, pp. 61–78, Chapter 6.

Henry, S., 2004. Just Ask: Integrating Accessibility Throughout Design. Georgia Tech Research Corporation.

Henry, S.L., 2008. Web Content Accessibility Guidelines Overview. W3C. <http://www.w3.org/WAI/intro/wcag.php>.

Holt, B., 2000. Creating senior friendly web sites. Center for Medicare Education 1 (4). <http://www.medicareed.org/PublicationFiles/V1N4.pdf>.

Jacko, J., Vitense, H., Scott, I., 2003. Perceptual Impairments and Computing Technologies. Lawrence Erlbaum Associates, pp. 504–519, Chapter 26.

Jiwnani, K., 2001. Designing for Users with Cognitive Disabilities. <http://otal.umd.edu/uupractice/cognition/>.

Jones, M., Marsden, G., Mohd-Nasir, N., Boone, K., Buchanan, G., 1999. Improving web interaction on small displays. Computer Networks 31 (11–16), 1129–1137. <http://citeseer.nj.nec.com/jones99improving.html>.

Keates, S., Trewin, S., 2005. Effect of Age and Parkinson's Disease on Cursor Positioning Using a Mouse. In: ASSETS '05: Proceedings of the 7th International ACM SIGACCESS Conference on Computers and Accessibility. ACM, New York, NY, USA, pp. 68–75.

Ketcham, C.J., Stelmach, G.E., 2001. Age Related Declines in Motor Control. In: The Handbook of Psychology and Aging, fifth ed. Academic Press Inc., pp. 267–287, Chapter 13.

Lazar, J., Allen, A., Kleinman, J., Malarkey, C., 2007. What frustrates screen reader users on the web: a study of 100 blind users. International Journal of Human Computer Interaction 22 (3), 247–269.

Milne, S., Dickinson, A., Carmichael, A., Sloan, D., Eisma, R., Gregor, P., 2005. Are guidelines enough? An introduction to designing web sites accessible to older people. IBM Systems Journal 44 (3), 557–571. <http://www.research.ibm.com/journal/sj/443/milne.html>.

Newell, A.F., Dickinson, A., Smith, M.J., Gregor, P., 2006. Designing a portal for older users: a case study of an industrial/academic collaboration. ACM Transactions on Computer–Human Interaction (TOCHI) 13 (3), 347–375.

Oulasvirta, A., Tamminen, S., Roto, V., Kuorelahti, J., 2005. Interaction in 4-second bursts: the fragmented nature of attentional resources in mobile HCI. In: CHI '05: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM Press, New York, NY, USA, pp. 919–928.

Owen, S., Rabin, J., 2008. W3C mobileOK Basic Tests 1.0. W3C. <http://www.w3.org/TR/mobileOK-basic10-tests/>.

Paciello, M., 2000. Web Accessibility for People with Disabilities. CMP Books, CMP Media LLC.

Rabin, J., McCathieNevile, C., 2008. Mobile Web Best Practices 1.0. W3C. <http://www.w3.org/TR/mobile-bp/>.

Regan, B., 2004. Best Practices for Accessible Flash Design. Macromedia White Paper.

Roto, V., Oulasvirta, A., 2005. Need for Non-Visual Feedback with Long Response Times in Mobile HCI. In: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web. ACM Press, New York, NY, USA, pp. 775–781.

Sears, A., 1997. Heuristic walkthroughs: finding the problems without the noise. International Journal of Human–Computer Interaction 9, 213–234.

Sears, A., Young, M., 2003. Physical Disabilities and Computing Technologies: An Analysis of Impairments. Lawrence Erlbaum Associates, Inc., Mahwah, NJ, USA, pp. 482–503.

Siek, K.A., Khalil, A., Liu, Y., Edmonds, N., Connelly, K.H., 2004. A comparative study of web language support for mobile web browsers. In: WWW@10: The Dream and the Reality. Terre Haute, Indiana.

Thatcher, J., Burks, M., Heilmann, C., Henry, S., Kirkpatrick, A., Lawson, B., Regan, B., Rutter, R., Urban, M., Waddell, C., 2006. Web Accessibility. Web Standards and Regulatory Compliance. Springer-Verlag.

Thatcher, J., Waddell, C., Henry, S., Swierenga, S., Urban, M., Burks, M., Regan, B., Bohman, P., 2002. Constructing Accessible Web Sites. Glasshaus.

Treviranus, J., McCathieNevile, C., Jacobs, I., Richards, J., 2000. Authoring Tool Accessibility Guidelines 1.0. W3C. <http://www.w3.org/TR/ATAG10/>.

Trewin, S., 2006. Physical usability and the mobile web. In: W4A: Proceedings of the 2006 International Cross-Disciplinary Workshop on Web Accessibility (W4A). ACM Press, New York, NY, USA, pp. 109–112.

Trewin, S., Pain, H., 1999. Keyboard and mouse errors due to motor disabilities. International Journal of Human–Computer Studies 50, 109–144.

Vigo, M., Aizpurua, A., Arrue, M., Abascal, J., 2009. Automatic device-tailored evaluation of mobile web guidelines. New Review of Hypermedia and Multimedia 15 (3), 223–224.

Webaim, 2009. Cognitive Disabilities Part 1: We Still Know too Little, and we do Even Less. <http://www.webaim.org/articles/cognitive/cognitive_too_little>.

Wobbrock, J.O., 2006. The Future of Mobile Device Research in HCI. In: CHI '06 Workshop. Canada.

Yang, C., Wang, F., 2003. Fractal summarization for mobile devices to access large documents on the web. In: Proceedings of the Twelfth International World Wide Web Conference.

Yesilada, Y., Brajnik, G., Harper, S., 2009a. D4.1: A Barrier Walkthrough Study with Expert and Non-Expert Judges. Technical Report. School of Computer Science, University of Manchester.

Yesilada, Y., Chen, T., Harper, S., 2008. D2.1: RIAM Framework: Content. Technical Report. School of Computer Science, University of Manchester.

Yesilada, Y., Chen, T., Harper, S., 2009b. D3: Mobile Web Barriers for the Barrier Walkthrough Method. Technical Report. School of Computer Science, University of Manchester.

Yesilada, Y., Chen, T., Harper, S., 2009c. D4: Initial Validation Methodology. Technical Report. School of Computer Science, University of Manchester.

Yesilada, Y., Chuter, A., Henry, S.L., 2009d. Shared Web Experiences: Barriers Common to Mobile Device Users and People with Disabilities. Technical Report. W3C Web Accessibility Initiative and W3C Mobile Web Best Practices. <http://www.w3.org/WAI/mobile/experiences>.

Yesilada, Y., Stevens, R., Harper, S., Goble, C., 2007. Evaluating DANTE: semantic transcoding for visually disabled users. ACM Transactions on Computer–Human Interaction (TOCHI) 14 (3), 14.

Zaphiris, P., Kurniawan, S., Ghiawadwala, M., 2007. A systematic approach to the development of research-based web design guidelines for older people. Universal Access in the Information Society 6 (1), 59–75. <http://www.springerlink.com/content/087050g2771rj416/>.