
Page 1: Benchmarking Web Accessibility Evaluation Tools:

Benchmarking Web Accessibility Evaluation Tools:

10th International Cross-Disciplinary Conference on Web Accessibility, W4A 2013

Markel Vigo, University of Manchester (UK) | Justin Brown, Edith Cowan University (Australia) | Vivienne Conway, Edith Cowan University (Australia)

Measuring the Harm of Sole Reliance on Automated Tests

http://dx.doi.org/10.6084/m9.figshare.701216

Page 2: Benchmarking Web Accessibility Evaluation Tools:

Problem & Fact


WWW is not accessible

Page 3: Benchmarking Web Accessibility Evaluation Tools:

Evidence


Webmasters are familiar with accessibility guidelines

Lazar et al., 2004. Improving web accessibility: a study of webmaster perceptions.

Computers in Human Behavior 20(2), 269–288

Page 4: Benchmarking Web Accessibility Evaluation Tools:

Hypothesis I

Assuming guidelines do a good job...

H1: Awareness of accessibility guidelines is not that widespread.


Page 5: Benchmarking Web Accessibility Evaluation Tools:

Evidence II


Webmasters put compliance logos on non-compliant websites

Gilbertson and Machin, 2012. Guidelines, icons and marketable skills: an accessibility evaluation of 100 web development company homepages.

W4A 2012

Page 6: Benchmarking Web Accessibility Evaluation Tools:

Hypothesis II

Assuming webmasters are not trying to cheat...

H2: There is a lack of awareness of the negative effects of overreliance on automated tools.


Page 7: Benchmarking Web Accessibility Evaluation Tools:

• It's easy

• In some scenarios they seem like the only option: web observatories, real-time...

• We don't know how harmful they can be


Expanding on H2: Why we rely on automated tests

Page 8: Benchmarking Web Accessibility Evaluation Tools:

• If we are able to measure these limitations, we can raise awareness

• Inform developers and researchers

• We ran a study with 6 tools

• Computed coverage, completeness and correctness with respect to WCAG 2.0


Expanding on H2: Knowing the limitations of tools

Page 9: Benchmarking Web Accessibility Evaluation Tools:

• Coverage: whether violations of a given Success Criterion (SC) are reported at least once

• Completeness: the share of actual violations (those in the ground truth) that a tool reports, i.e. its recall

• Correctness: the share of a tool's reported violations that are actual violations, i.e. its precision (all three are sketched in code below)


Method: Computed Metrics
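A minimal sketch of these three metrics for a single page, assuming violations are encoded as (success criterion, element) pairs; every name and number here is hypothetical, not the paper's implementation.

```python
# Hedged sketch: coverage, completeness and correctness for one tool's
# report against a manually built ground truth. A violation is encoded
# as a (success_criterion, element) pair; this encoding is an assumption.

def coverage(reported, all_sc):
    """Fraction of Success Criteria the tool flags at least once."""
    flagged = {sc for sc, _ in reported}
    return len(flagged & all_sc) / len(all_sc)

def completeness(reported, ground_truth):
    """Share of actual violations the tool finds (its recall)."""
    return len(reported & ground_truth) / len(ground_truth)

def correctness(reported, ground_truth):
    """Share of reported violations that are real (its precision)."""
    return len(reported & ground_truth) / len(reported)

# Hypothetical example data
ground_truth = {("1.1.1", "img#logo"), ("1.4.3", "p.intro"), ("2.4.4", "a#next")}
reported = {("1.1.1", "img#logo"), ("1.1.1", "img#decor")}  # 1 hit, 1 false positive
all_sc = {"1.1.1", "1.4.3", "2.4.4"}

print(coverage(reported, all_sc))            # 0.33: only SC 1.1.1 is flagged
print(completeness(reported, ground_truth))  # 0.33: 1 of 3 real violations found
print(correctness(reported, ground_truth))   # 0.50: 1 of 2 reports is correct
```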

Page 10: Benchmarking Web Accessibility Evaluation Tools:


Vision Australia (www.visionaustralia.org.au)
• Non-profit
• Non-government
• Accessibility resource

Prime Minister (www.pm.gov.au)
• Federal Government
• Should abide by the Transition Strategy

Transperth (www.transperth.wa.gov.au)
• Government affiliated
• Used by people with disabilities

Method: Stimuli

Page 11: Benchmarking Web Accessibility Evaluation Tools:

Method: Obtaining the "Ground Truth"


Ad-hoc sampling → Manual evaluation → Agreement → Ground truth
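A minimal sketch of one way the agreement step could work, assuming several evaluators judge each page independently and a violation enters the ground truth only when at least two agree; the paper's actual procedure may differ.

```python
from collections import Counter

def build_ground_truth(judgements, min_agreement=2):
    """judgements: one set of (SC, element) violations per evaluator.
    Keep a violation only if at least `min_agreement` evaluators report it."""
    counts = Counter(v for judged in judgements for v in judged)
    return {v for v, n in counts.items() if n >= min_agreement}

# Hypothetical judgements from three evaluators on one sampled page
evaluator_a = {("1.1.1", "img#logo"), ("1.4.3", "p.intro")}
evaluator_b = {("1.1.1", "img#logo")}
evaluator_c = {("1.1.1", "img#logo"), ("1.4.3", "p.intro"), ("2.4.4", "a#next")}

print(build_ground_truth([evaluator_a, evaluator_b, evaluator_c]))
# the 2.4.4 violation is dropped: only one evaluator reported it
```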

Page 12: Benchmarking Web Accessibility Evaluation Tools:

Method: Computing Metrics

For every page in the sample, each tool (T1–T6) evaluates the page and produces a report (R1–R6); each report is then compared with the ground truth (GT) to compute that tool's metrics (M1–M6). A sketch of this loop follows.
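A minimal skeleton of that loop, assuming the metric functions from the earlier sketch and a `run_tool` callable that invokes a real evaluation tool and parses its report into (SC, element) pairs; all names here are hypothetical.

```python
# Hedged skeleton of the per-page evaluation loop; `run_tool` and the
# (page, ground truth) pairs must be supplied by the caller.
def evaluate_sample(pages_with_gt, run_tool,
                    tools=("T1", "T2", "T3", "T4", "T5", "T6")):
    """pages_with_gt: iterable of (page, ground-truth violation set)."""
    results = {tool: [] for tool in tools}
    for page, gt in pages_with_gt:
        for tool in tools:
            report = run_tool(tool, page)        # reports R1..R6
            results[tool].append({               # metrics M1..M6
                "completeness": completeness(report, gt),
                "correctness": correctness(report, gt),
            })
    return results
```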

Page 13: Benchmarking Web Accessibility Evaluation Tools:

Accessibility of Stimuli


Vision Australia (www.visionaustralia.org.au)

Prime Minister (www.pm.gov.au)

Transperth (www.transperth.wa.gov.au)

Page 14: Benchmarking Web Accessibility Evaluation Tools:

• 650 WCAG Success Criteria violations (A and AA)

• 23–50% of SC are covered by automated tests

• Coverage varies across guidelines and tools


Results: Coverage

Page 15: Benchmarking Web Accessibility Evaluation Tools:

• Completeness ranges from 14% to 38%

• Variable across tools and principles


Results: Completeness per tool

Page 16: Benchmarking Web Accessibility Evaluation Tools:

• How conformance levels influence completeness

• Wilcoxon Signed Rank: W=21, p<0.05

• Completeness is higher for Level A SC


Results: Completeness per type of SC
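A sketch of how such a paired comparison could be run with SciPy; the per-tool completeness scores below are invented for illustration, not the paper's data.

```python
from scipy.stats import wilcoxon

# Invented per-tool completeness scores, paired by tool (not the paper's data)
level_a  = [0.38, 0.31, 0.29, 0.25, 0.22, 0.18]  # completeness on Level A SC
level_aa = [0.21, 0.17, 0.16, 0.14, 0.12, 0.10]  # completeness on Level AA SC

stat, p = wilcoxon(level_a, level_aa)
print(f"W={stat}, p={p:.4f}")  # a small p suggests a genuine A vs AA gap
```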

Page 17: Benchmarking Web Accessibility Evaluation Tools:

• How accessibility levels influence completeness

• ANOVA: F(2,10)=19.82, p<0.001

• The less accessible a page is, the higher the completeness


Results: Completeness vs. accessibility
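A sketch with scipy.stats.f_oneway, grouping the six tools' completeness scores by site; the numbers are invented and merely ordered to mimic the reported trend (less accessible page, higher completeness).

```python
from scipy.stats import f_oneway

# Invented per-tool completeness scores grouped by site (not the paper's data)
most_accessible  = [0.15, 0.14, 0.18, 0.16, 0.17, 0.15]
mid_accessible   = [0.25, 0.22, 0.28, 0.24, 0.26, 0.23]
least_accessible = [0.38, 0.33, 0.40, 0.35, 0.37, 0.34]

f, p = f_oneway(most_accessible, mid_accessible, least_accessible)
print(f"F={f:.2f}, p={p:.4f}")
```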

Page 18: Benchmarking Web Accessibility Evaluation Tools:

• Cronbach's α = 0.96

• Multidimensional Scaling (MDS)

• Tools behave similarly


Results: Tool Similarity on Completeness
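A sketch of both computations, assuming a tools × pages matrix of completeness scores (invented data); Cronbach's α has no SciPy one-liner, so it is computed from its definition, and scikit-learn's MDS embeds the tools in 2D from their pairwise distances.

```python
import numpy as np
from sklearn.manifold import MDS

# Invented completeness matrix: rows = 6 tools, columns = sampled pages
scores = np.array([[0.20, 0.35, 0.15, 0.30],
                   [0.18, 0.33, 0.14, 0.28],
                   [0.25, 0.40, 0.20, 0.36],
                   [0.10, 0.22, 0.08, 0.18],
                   [0.12, 0.24, 0.09, 0.20],
                   [0.22, 0.37, 0.17, 0.33]])

def cronbach_alpha(items):
    """items: rows = items (tools), columns = observations (pages)."""
    k = items.shape[0]
    item_vars = items.var(axis=1, ddof=1).sum()
    total_var = items.sum(axis=0).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

print(cronbach_alpha(scores))  # close to 1 when tools rank pages alike

# Embed the tools in 2D from their pairwise distances: nearby points
# are tools whose completeness behaves similarly across pages.
dist = np.linalg.norm(scores[:, None] - scores[None, :], axis=-1)
print(MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(dist))
```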

Page 19: Benchmarking Web Accessibility Evaluation Tools:

• Tools with lower completeness exhibit higher correctness (93–96%)

• Tools with higher completeness yield lower correctness (66–71%)

• That is, the tools that catch the most violations also report the most false positives


Results: Correctness

Page 20: Benchmarking Web Accessibility Evaluation Tools:

• We corroborate that 50% is the upper limit for automating guideline checks

• Natural Language Processing?
– Language: 3.1.2 Language of Parts
– Domain: 3.3.4 Error Prevention


Implications: Coverage

Page 21: Benchmarking Web Accessibility Evaluation Tools:

• Automated tests do a better job...

...on non-accessible sites

...on Level A success criteria

• Automated tests aim at catching stereotypical errors


Implications: Completeness I

Page 22: Benchmarking Web Accessibility Evaluation Tools:

• Strengths of tools can be identified across WCAG principles and SC

• A method to inform decision making

• Maximising completeness in our sample of pages (see the sketch below):
– On all tools: 55% (+17 percentage points)
– On non-commercial tools: 52%


Implications: Completeness II
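A sketch of how the best tool combination could be searched for, merging the reports of each k-tool subset and re-computing completeness against the ground truth; it reuses the hypothetical `completeness` function from the metrics sketch.

```python
from itertools import combinations

def union_completeness(reports, ground_truth, subset):
    """Completeness of the merged reports of a subset of tools.
    reports: dict mapping tool name -> set of reported violations."""
    merged = set().union(*(reports[tool] for tool in subset))
    return completeness(merged, ground_truth)

def best_combination(reports, ground_truth, k):
    """The k-tool subset whose merged reports maximise completeness."""
    return max(combinations(reports, k),
               key=lambda s: union_completeness(reports, ground_truth, s))
```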

Page 23: Benchmarking Web Accessibility Evaluation Tools:

Conclusions

• Coverage: 23–50%

• Completeness: 14–38%

• Higher completeness leads to lower correctness

Page 24: Benchmarking Web Accessibility Evaluation Tools:

Follow up


Contact: @markelvigo | [email protected]

Presentation DOI: http://dx.doi.org/10.6084/m9.figshare.701216

Datasets: http://www.markelvigo.info/ds/bench12/index.html

10th International Cross-Disciplinary Conference on Web Accessibility, W4A 2013