2011-10-21
Usability Study and Error Analysis of Space Logistics Analysis Tools
HFES NEC Student Research Conference
October 14, 2011, Cambridge, Massachusetts
Paul Grogan, Chaiwoo Lee, and Prof. de Weck
MIT Engineering Systems Division
MIT Strategic Engineering
Designing Systems for an Uncertain Future
Slide 2
Context: Space Exploration Logistics
• Infrequent, long duration transports
• Limited cargo capacity
• Critical resource requirements
• Coupled missions in campaigns
Software tools assist analysis
… but tools must be usable
Slide 3
Analysis Tools

| | SpaceNet (SN) | Spreadsheet (SS) |
|---|---|---|
| Type of Tool | General-purpose application | Ad hoc (improvised) |
| Analysis Method | Discrete event simulation | Cell-based formulas |
| User Interface | Java Swing GUI | Microsoft Excel |
| Visualizations | Plots, animations, etc. | None |
| Error-checking | Simulation error messages | Status messages |
Study Motivation & Objectives
• Applying usability design principles and software testing methods to the domain of space logistics
• Research questions
– What is the comparative effectiveness and efficiency of space logistics analysis tools?
– How do the user experience and usage patterns compare between the two tools?
• Answer questions with comprehensive usability testing involving human subjects
Slide 4
Study Design
• Design for comparative evaluation
• Randomized, orthogonal assignment of experimental variables
– Tool (SpaceNet vs. Spreadsheet)
– Scenario (C vs. D)
• Within-subject evaluation with 12 volunteer subjects
– Primary group: 8 with space background, aged 22-30 (1♀, 7♂)
– Secondary group: 4 without space background, aged 24-32 (2♀, 2♂)
• Study procedure
– Session 1 (~90 min): Consent → tutorial → 1st scenario (part 1 → part 2) → questionnaire
– Session 2 (~90 min, different day): Tutorial → 2nd scenario (part 1 → part 2) → questionnaire → interview
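The orthogonal assignment of tool and scenario can be illustrated with a short sketch. This is not the procedure the authors used, only one plausible way to balance a 2×2 crossover: subjects are shuffled and then cycled through the four possible session-1 conditions, with session 2 receiving the other tool and the other scenario. The function names, the seed, and the use of subject IDs 5 to 16 (taken from the per-subject data in the backup slides) are assumptions made for illustration.

```python
import itertools
import random

TOOLS = ("SpaceNet", "Spreadsheet")
SCENARIOS = ("C", "D")


def other(options, chosen):
    """Return the element of a two-item tuple that was not chosen."""
    return options[1] if chosen == options[0] else options[0]


def assign(subject_ids, seed=0):
    """Balanced random assignment (hypothetical helper, not from the slides).

    Each subject gets one (tool, scenario) pair in session 1 and the
    complementary pair in session 2, so every subject sees both tools
    and both scenarios exactly once.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    # The four possible session-1 conditions: tool x scenario.
    orderings = list(itertools.product(TOOLS, SCENARIOS))
    plan = {}
    for i, subject in enumerate(shuffled):
        tool, scenario = orderings[i % len(orderings)]
        plan[subject] = [
            (1, tool, scenario),
            (2, other(TOOLS, tool), other(SCENARIOS, scenario)),
        ]
    return plan


if __name__ == "__main__":
    plan = assign(range(5, 17))
    for subject in sorted(plan):
        print(subject, plan[subject])
```

With 12 subjects and four orderings, each ordering is used three times, which keeps tool, scenario, and session order from being confounded with one another.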
Slide 5
Usability Metrics
Slide 6

ISO 9241-11 definitions

| Factor | Description |
|---|---|
| Effectiveness | Accuracy and completeness to achieve goals |
| Efficiency | Resources expended to achieve goals |
| Satisfaction | Freedom from discomfort, positive attitudes |

Factors and metrics for this study

| Factor | Description | Metric | Data collection |
|---|---|---|---|
| Effectiveness | Modeling missions completely with high research values | Completeness / outcome quality | Observation |
| | | Perception of outcomes | Questionnaire |
| Efficiency | The time and effort needed | Completion time / time in mode / time until event | Observation |
| | | Mental effort / ease of use / complexity | Questionnaire |
| Error Tolerance and Prevention | Making fewer errors, recovering quickly from errors, feeling a sense of control in usage | Error rate / recovery rate / recovery time | Observation |
| | | Annoyance / confidence / predictability / intuitiveness / familiarity | Questionnaire |
Scenario Description
Slide 7
Scenarios C & D:
• Part 1: Efficiency
– Create new model
  • Lunar orbital mission
  • Validate propellant levels
– Not time limited
• Part 2: Effectiveness
– Modify existing model parameters with constraints
  • Lunar surface exploration
  • Maximize REC: Relative Exploration Capability
– Limited to 15 minutes
Part 1 – Efficiency Results
Slide 8
[Figure: Task completion times (s), SpaceNet vs. Spreadsheet, for four tasks: Assemble Launch Stack, Earth Launch**, Earth Departure Burn, Lunar Arrival Burn**]
** (p<0.01)
Part 2 – Effectiveness Results
Slide 9
Significant Effects:
• Non-space users did not increase quality using SpaceNet tool
• 2nd session resulted in higher quality
• Spreadsheet tool produced:
– Higher quality
– Fewer errors
– Less time in recovery
[Figure: Outcome quality (% baseline REC), SpaceNet vs. Spreadsheet]
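Outcome quality is reported as a percentage of the baseline REC, so a value of 100% means the subject did not improve on the starting model. A minimal sketch of that arithmetic follows, using the baseline REC values given in the backup slides (1.16 for Scenario C, 0.49 for Scenario D). The function name and the example REC values of 3.49 and 1.05 are hypothetical; they are chosen only because they reproduce percentages that appear in the per-subject data.

```python
# Baseline REC per scenario, as stated in the backup slides.
BASELINE_REC = {"C": 1.16, "D": 0.49}


def outcome_quality(rec, scenario):
    """Return achieved REC as a percentage of the scenario's baseline REC."""
    return 100.0 * (rec / BASELINE_REC[scenario])


if __name__ == "__main__":
    # An unchanged model scores exactly 100% of baseline.
    print(round(outcome_quality(1.16, "C"), 2))  # 100.0
    # Hypothetical final REC values, for illustration only.
    print(round(outcome_quality(3.49, "C"), 2))  # 300.86
    print(round(outcome_quality(1.05, "D"), 2))  # 214.29
```

Because the two scenarios have different baselines, normalizing this way is what allows quality to be compared across Scenarios C and D.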
User Perception Results
Slide 10
Tools: SpaceNet (SN) vs. Spreadsheet (SS)

Questionnaire
SpaceNet (SN):
• Convenient, easy to use
• Confident, intuitive, and familiar
• Higher quality outcomes
Spreadsheet (SS):
• More mental effort (p<0.05)

Interview Comments
SpaceNet (SN):
• Graphical and visual
• “Busier” and less transparent
• Errors pin-pointed with messages
• “More convenient in the long run”
Spreadsheet (SS):
• Easier to “play” with inputs
• Unconfident of error detection
• “Not intuitive” compared to mental model
• “Not scalable,” at some point “chokes”
Error Analysis Logging
Slide 11
[Figure: Error log timeline across milestones (Assemble Launch Stack, Earth Launch, Earth Departure, Moon Arrival, Verify Propellant), marking each error, detection, correction, and undetected error]
Error Analysis – Key Results
Slide 12
[Figure: Errors linked to detection methods; circle size indicates frequency, line width indicates error-detection pairs]
[Figure: Error comparison across tools; frequency (# errors)]
User experience & usage
• User Perception
– Model input accessibility
– Graphics and feedback
– Mental model agreement
• Error Analysis
– Error susceptibility differs
– Detection method differs
Revisiting Study Objectives
Slide 13
Efficiency & effectiveness
• SpaceNet more efficient at parts of model creation
• Spreadsheet more effective for modifying existing model
Future work
• Investigate scalability on complex campaigns
• Integrate best practices into future analysis tools
Slide 14
Image credit: NASA
Questions?
spacenet.mit.edu
Grogan, P., C. Lee, and O. de Weck, “Comparative Usability Study of Two Space Logistics Analysis Tools,” AIAA-2011-7345, AIAA Space 2011 Conference and Exposition, Long Beach, California, Sept. 26-29, 2011.
Acknowledgements:
– Experimental subject volunteers
– DoD Air Force Office of Scientific Research, NDSEG Fellowship, 32 CFR 168a (Grogan)
– Samsung Scholarship (Lee)
Slide 15
Backup Slides
Slide 16
Scenario C – Part 1
[Figure: Mission timeline through nodes KSC, LEO, and LLPO; time axis (days): 0.0, 1.0, 2.0, 3.0, 7.0]
Residual Propellant
• Upper Stage: 3,777 kg
• Propulsion Module: 15,697 kg
Slide 17
Scenario C – Part 2
[Figure: Mission timeline through nodes KSC, LEO, LLPO, LSP, and PSZ; time axis (not to scale): 0.0, 1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 14.0, 15.0, 16.0, 20.0]
Baseline REC: 1.16
Slide 18
Scenario D – Part 1
[Figure: Mission timeline through nodes KSC, LEO, and LLOI; time axis (days): 0.0, 1.0, 1.5, 3.0, 7.0]
Residual Propellant
• Third Stage: 10,018 kg
• Service Module: 3,679 kg
Slide 19
Scenario D – Part 2
[Figure: Mission timeline through nodes KSC, LEO, LLOI, TLV, and PSZ; time axis (not to scale): 0.0, 0.5, 4.5, 5.0, 8.0, 8.5, 12.5]
Baseline REC: 0.49
Slide 20
| Factor | Metric | Definition | Collection Method |
|---|---|---|---|
| Effectiveness | Completeness (Part 1) | % of events correctly completed in 5 minutes | Observation |
| | Outcome quality (Part 2) | % increase in relative exploration capability in 15 minutes | Observation |
| | Perception of outcomes | Perceived quality of task outcomes | Questionnaire |
| Efficiency | Completion time (Part 1) | Time to complete the tasks given in the scenario | Observation |
| | Time in mode (Part 1) | Time spent on each event in the scenario | Observation |
| | Time until event (Part 1) | Time elapsed before first creating an event correctly | Observation |
| | Time until event (Part 2) | Time elapsed before first making a valid increase in relative exploration capability | Observation |
| | Mental effort | Perceived mental effort required to do the given task | Questionnaire |
| | Ease of use | The degree to which the system is convenient for completing the scenario | Questionnaire |
| | Complexity | Perceived complicatedness and difficulty | Questionnaire |
| Error Tolerance and Prevention | Error rate | Number of errors made by a user during the process of completing a task | Observation |
| | Recovery rate | Percentage of errors correctly recovered | Observation |
| | Recovery time | Percentage of time spent recovering from errors | Observation |
| | Annoyance | Perceived frustration and irritation | Questionnaire |
| | Confidence | The degree to which a user felt confident using the interface without the fear of making mistakes | Questionnaire |
| | Predictability | The degree to which the user was able to predict how the interface will function | Questionnaire |
| | Intuitiveness | Perception of the power of knowing or understanding without cognitive effort | Questionnaire |
| | Familiarity | The degree to which a user recognizes interface components and views their interaction as natural | Questionnaire |
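The three observed error metrics can be computed from a simple log of error events. The sketch below is an illustration only: the `LoggedError` record, its field names, and the example timestamps are all hypothetical, and it assumes that "time spent recovering" runs from detection to correction, which the slides do not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoggedError:
    """One observed error (hypothetical record layout, times in seconds)."""
    occurred_s: float
    detected_s: Optional[float] = None   # None if never detected
    corrected_s: Optional[float] = None  # None if never corrected


def error_metrics(errors, task_duration_s):
    """Compute error rate, recovery rate (%), and recovery time (%)."""
    recovered = [
        e for e in errors
        if e.detected_s is not None and e.corrected_s is not None
    ]
    # Recovery rate is undefined when no errors were made (shown as n/a).
    recovery_rate = 100.0 * len(recovered) / len(errors) if errors else None
    # Assumption: recovery time spans detection to correction.
    recovering_s = sum(e.corrected_s - e.detected_s for e in recovered)
    return {
        "error_rate": len(errors),
        "recovery_rate_pct": recovery_rate,
        "recovery_time_pct": 100.0 * recovering_s / task_duration_s,
    }


if __name__ == "__main__":
    # Hypothetical log: three errors, only the first is detected and fixed.
    log = [
        LoggedError(occurred_s=90.0, detected_s=100.0, corrected_s=120.0),
        LoggedError(occurred_s=150.0),
        LoggedError(occurred_s=200.0, detected_s=210.0),
    ]
    print(error_metrics(log, task_duration_s=245.0))
```

An empty log yields a recovery rate of `None`, mirroring the "n/a" entries that appear for subjects who made no errors.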
Slide 21
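| Subject | Tool | Session | Scenario | Task 1 (s) | Task 2 (s) | Task 3 (s) | Task 4 (s) | Comp. Time (s) | # Tasks in 5 min | Correct Event (s) | # Errors (Part 1) | % Recov. (Part 1) | Time to Valid REC (s) | Quality (%) | # Errors (Part 2) | % Recov. (Part 2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | SN | 1 | C | 53 | 64 | 24 | 36 | 245 | 4 | 117 | 3 | 33.33 | 143 | 300.86 | 2 | 50.00 |
| 5 | SS | 2 | D | 35 | 221 | 82 | 25 | 730 | 2 | 186 | 2 | 100.00 | 348 | 383.67 | 0 | n/a |
| 6 | SS | 1 | C | 30 | 410 | 73 | 31 | 544 | 3 | 52 | 2 | 60.00 | 565 | 401.72 | 2 | 50.00 |
| 6 | SN | 2 | D | 52 | 160 | 59 | 11 | 412 | 0 | 102 | 5 | 100.00 | n/a | 100.00 | 4 | 50.00 |
| 7 | SN | 1 | D | 56 | 43 | 24 | 18 | 177 | 1 | 42 | 0 | | 554 | 214.29 | 2 | 50.00 |
| 7 | SS | 2 | C | 42 | 388 | 18 | 35 | 483 | 4 | 57 | 4 | 50.00 | 48 | 419.83 | 1 | 100.00 |
| 8 | SS | 1 | D | 49 | 122 | 107 | 101 | 379 | 0 | 352 | 1 | 100.00 | 810 | 146.94 | 0 | n/a |
| 8 | SN | 2 | C | 103 | 249 | 84 | 20 | 598 | 2 | 49 | 5 | 75.00 | 587 | 231.03 | 1 | 100.00 |
| 9 | SN | 1 | C | 87 | 369 | 74 | 29 | 960 | 1 | 208 | 3 | 66.66 | n/a | 100.00 | 5 | 0.00 |
| 9 | SS | 2 | D | 28 | 787 | 74 | 53 | 1462 | 1 | 287 | 9 | 100.00 | 871 | 626.53 | 1 | 100.00 |
| 10 | SS | 1 | C | 93 | 264 | 245 | 27 | 3567 | 0 | 667 | 6 | 83.33 | 332 | 296.55 | 1 | 0.00 |
| 10 | SN | 2 | D | 66 | 142 | 55 | 28 | 560 | 3 | 66 | 3 | 66.66 | n/a | 100.00 | 1 | 100.00 |
| 11 | SN | 1 | D | 40 | 145 | 111 | 10 | 390 | 1 | 383 | 1 | 100.00 | n/a | 100.00 | 2 | 50.00 |
| 11 | SS | 2 | C | 30 | 353 | 43 | 52 | 839 | 3 | 40 | 5 | 100.00 | 801 | 222.41 | 0 | n/a |
| 12 | SS | 1 | D | 62 | 121 | 35 | 19 | 351 | 3 | 133 | 3 | 50.00 | 107 | 208.16 | 1 | 0.00 |
| 12 | SN | 2 | C | 71 | 96 | 59 | 19 | 496 | 4 | 62 | 2 | 66.66 | n/a | 100.00 | 1 | 0.00 |
| 13 | SN | 1 | C | 114 | 62 | 141 | 13 | 450 | 1 | 411 | 3 | 66.66 | 616 | 300.86 | 0 | n/a |
| 13 | SS | 2 | D | 22 | 245 | 90 | 63 | 463 | 2 | 414 | 2 | 75.00 | 98 | 214.29 | 1 | 0.00 |
| 14 | SS | 1 | C | 129 | 146 | 97 | 59 | 549 | 2 | 505 | 1 | 100.00 | 362 | 100.86 | 1 | 0.00 |
| 14 | SN | 2 | D | 65 | 55 | 52 | 11 | 252 | 4 | 250 | 0 | n/a | 615 | 175.51 | 1 | 0.00 |
| 15 | SN | 1 | D | 52 | 67 | 52 | 11 | 251 | 4 | 251 | 0 | n/a | 542 | 208.16 | 1 | 100.00 |
| 15 | SS | 2 | C | 45 | 130 | 59 | 35 | 282 | 4 | 282 | 1 | 100.00 | 323 | 481.03 | 1 | 100.00 |
| 16 | SS | 1 | D | 133 | 149 | 376 | 81 | 1869 | 1 | 114 | 3 | 33.33 | 770 | 208.16 | 1 | 0.00 |
| 16 | SN | 2 | C | 114 | 105 | 108 | 18 | 616 | 1 | 460 | 1 | 100.00 | 215 | 109.48 | 3 | 66.66 |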
Complete Analysis – Part 1
Slide 22
| Metric | Between Groups (SN): 1° | Between Groups (SN): 2° | Between Groups (SS): 1° | Between Groups (SS): 2° | Paired Scenarios: C | Paired Scenarios: D | Paired Sessions: 1 | Paired Sessions: 2 | Paired Tools: SN | Paired Tools: SS |
|---|---|---|---|---|---|---|---|---|---|---|
| Completion Time (s) | 375.1 | 601.5 | 662.4 | 1554.8 | 802.4 | 608.0 | 811.0 | 599.4 | 450.6 | 959.8 |
| Time to Correct Task (s) | 114.4 | 111.8 | 119.1* | 349.8* | 199.8 | 109.7 | 143.5 | 166.0 | 113.5 | 196.0 |
| Time in Task 1 (s) | 76.1 | 66.0 | 60.6 | 53.3 | 75.9 | 55.0 | 74.8 | 56.1 | 72.8 | 58.2 |
| Time in Task 2 (s) | 100.6 | 188.0 | 226.4 | 381.3 | 219.7 | 188.1 | 163.5 | 244.25 | 129.8** | 278.0** |
| Time in Task 3 (s) | 68.0 | 74.8 | 112.8 | 99.3 | 85.4 | 93.0 | 113.3 | 62.3 | 70.5 | 108.3 |
| Time in Task 4 (s) | 17.3 | 21.5 | 53.8 | 37.8 | 31.2 | 35.9 | 36.3 | 30.8 | 18.7** | 48.4** |
| Tasks in 5 Minutes (#) | 2.63 | 2.50 | 1.75 | 1.50 | 1.50* | 2.75* | 2.17 | 2.08 | 2.58 | 1.67 |
| Error Rate (#) | 1.63 | 2.25 | 2.38* | 5.00* | 3.08 | 2.00 | 2.33 | 2.75 | 1.83 | 3.25 |
| Recovery Rate (%) | 75.0 | 70.8 | 77.3 | 83.3 | 70.6 | 80.6 | 65.9 | 85.2 | 73.1 | 78.0 |
| Recovery Time (%) | 13.5 | 1.95 | 33.8 | 47.0 | 33.7 | 20.0 | 22.9 | 30.8 | 15.5* | 38.2* |
* Significant difference at α=0.05, ** Significant difference at α=0.01
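The paired-tools comparison can be illustrated with the per-subject Task 2 (Earth Launch) times listed in the backup data for subjects 5 to 16. The slides do not state which statistical test produced the significance markers, so the paired t-test below is an assumption made for illustration, and `paired_t` is a hypothetical helper rather than the authors' analysis code.

```python
import math
import statistics

# Task 2 times in seconds for subjects 5-16 (in order), one value per tool,
# transcribed from the per-subject data table.
TASK2_SN = [64, 160, 43, 249, 369, 142, 145, 96, 62, 55, 67, 105]
TASK2_SS = [221, 410, 388, 122, 787, 264, 353, 121, 245, 146, 130, 149]


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples b - a."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    # Standard error of the mean difference, using the sample std. deviation.
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1


if __name__ == "__main__":
    print("mean SN:", statistics.mean(TASK2_SN))
    print("mean SS:", statistics.mean(TASK2_SS))
    t_stat, dof = paired_t(TASK2_SN, TASK2_SS)
    # For 11 degrees of freedom, the two-tailed critical value at
    # alpha = 0.01 is about 3.106.
    print("t =", round(t_stat, 2), "dof =", dof)
```

The tool means computed here match the reported Time in Task 2 values (129.8 s for SN and 278.0 s for SS), and under the paired t-test assumption the statistic exceeds the α=0.01 critical value, consistent with the ** marker on that row.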
Complete Analysis – Part 2
Slide 23
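| Metric | Between Groups (SN): 1° | Between Groups (SN): 2° | Between Groups (SS): 1° | Between Groups (SS): 2° | Paired Scenarios: C | Paired Scenarios: D | Paired Sessions: 1 | Paired Sessions: 2 | Paired Tools: SN | Paired Tools: SS |
|---|---|---|---|---|---|---|---|---|---|---|
| Outcome Quality (%) | 205.0** | 100.0** | 294.6 | 338.4 | 255.4 | 223.8 | 215.5 | 263.6 | 170.0* | 309.2* |
| Time to REC Increase (s) | 467.4 | n/a | 415.5 | 527.8 | 327.7 | 533.9 | 542.4 | 319.1 | 467.4 | 394.1 |
| Error Rate (#) | 1.75 | 2.25 | 0.88 | 0.75 | 1.50 | 1.25 | 1.50 | 1.25 | 1.92* | 0.83* |
| Recovery Rate (%) | 59.5 | 37.5 | 41.7 | 33.3 | 39.6 | 50.0 | 25.0* | 64.6* | 45.8 | 43.8 |
| Recovery Time (%) | 12.6 | 36.6 | 3.4 | 25.4 | 14.8 | 16.5 | 12.0 | 18.3 | 20.6* | 10.8* |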
* Significant difference at α=0.05, ** Significant difference at α=0.01
• 7-point Likert scale
• Questions related to:
– Mental Effort
– Convenience
– Predictability
– High-quality Outcome
– Complicated Interface
– Annoyance or Frustration
– Confidence
– Intuitiveness
– Familiarity
• User (space, non-space)
– Not significant
• Scenario (C, D)
– Not significant
• Ordering
– Not significant
• Tool
– SN: significantly less mental effort
Questionnaire Analysis
Slide 24
Complete Analysis – Questionnaire
Slide 25
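| Question | Between Groups (SN): 1° | Between Groups (SN): 2° | Between Groups (SS): 1° | Between Groups (SS): 2° | Paired Scenarios: C | Paired Scenarios: D | Paired Sessions: 1 | Paired Sessions: 2 | Paired Tools: SN | Paired Tools: SS |
|---|---|---|---|---|---|---|---|---|---|---|
| Q1: Mental Effort | 4.25 | 3.25 | 5.12 | 4.75 | 4.75 | 4.17 | 4.92 | 4.00 | 3.92* | 5.00* |
| Q2: Convenience | 5.25 | 5.00 | 3.75 | 4.50 | 4.67 | 4.50 | 4.67 | 4.50 | 5.17 | 4.00 |
| Q3: Predictability | 4.75 | 4.25 | 4.12 | 6.00 | 4.67 | 4.67 | 4.67 | 4.67 | 4.58 | 4.75 |
| Q4: High-quality Outcome | 4.38 | 4.00 | 4.00 | 3.25 | 3.58 | 4.41 | 3.75 | 4.25 | 4.25 | 3.75 |
| Q5: Complicated Interface | 3.81 | 3.25 | 4.63 | 3.50 | 3.88 | 4.00 | 4.08 | 3.79 | 3.63 | 4.25 |
| Q6: Annoyance or Frustration | 3.25 | 5.50 | 4.50 | 4.00 | 4.08 | 4.25 | 4.83 | 3.50 | 4.00 | 4.33 |
| Q7: Confidence | 4.25 | 5.50 | 3.75 | 5.00 | 4.42 | 4.42 | 4.92 | 3.92 | 4.67 | 4.17 |
| Q8: Intuitiveness | 5.38 | 5.00 | 4.25 | 5.25 | 4.75 | 5.08 | 4.75 | 5.08 | 5.25 | 4.58 |
| Q9: Familiarity | 4.75 | 5.50 | 4.13 | 5.25 | 4.50 | 5.00 | 4.67 | 4.83 | 5.00 | 4.50 |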
* Significant difference at α=0.05, ** Significant difference at α=0.01