Jeff Sauro
Oracle Corporation
Publications
Featured research published by Jeff Sauro.
International Conference on Human Centered Design (held as part of HCI International) | 2009
James R. Lewis; Jeff Sauro
In 2009, we published a paper in which we showed how three independent sources of data indicated that, rather than being a unidimensional measure of perceived usability, the System Usability Scale apparently had two factors: Usability (all items except 4 and 10) and Learnability (items 4 and 10). In that paper, we called for other researchers to report attempts to replicate that finding. The published research since 2009 has consistently failed to replicate that factor structure. In this paper, we report an analysis of over 9,000 completed SUS questionnaires that shows that the SUS is indeed bidimensional, but not in any interesting or useful way. A comparison of the fit of three confirmatory factor analyses showed that a model in which the SUS's positive-tone (odd-numbered) and negative-tone (even-numbered) items were aligned with two factors had a better fit than a unidimensional model (all items on one factor) or the Usability/Learnability model we published in 2009. Because a distinction based on item tone is of little practical or theoretical interest, we recommend that user experience practitioners and researchers treat the SUS as a unidimensional measure of perceived usability, and no longer routinely compute Usability and Learnability subscales.
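For reference, the three competing factor structures can be expressed as item groupings, as in this minimal sketch (the respondent data are hypothetical):

```python
# The three competing factor structures compared above, expressed as item
# groupings (SUS items numbered 1-10). Illustration only.
MODELS = {
    "unidimensional": {"Usability": list(range(1, 11))},
    "usability_learnability": {"Usability": [1, 2, 3, 5, 6, 7, 8, 9],
                               "Learnability": [4, 10]},
    "item_tone": {"positive_tone": [1, 3, 5, 7, 9],
                  "negative_tone": [2, 4, 6, 8, 10]},
}

def subscale_totals(responses, grouping):
    """Sum raw 1-5 responses (dict of item number -> rating) within each factor."""
    return {factor: sum(responses[i] for i in items)
            for factor, items in grouping.items()}

# One hypothetical respondent's raw answers to items 1-10
responses = {1: 4, 2: 2, 3: 5, 4: 1, 5: 4, 6: 2, 7: 5, 8: 2, 9: 4, 10: 1}
print(subscale_totals(responses, MODELS["item_tone"]))
# {'positive_tone': 22, 'negative_tone': 8}
```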
Human Factors in Computing Systems | 2005
Jeff Sauro; Erika Kindlund
Current methods to represent system or task usability in a single metric do not include all the ANSI- and ISO-defined usability aspects: effectiveness, efficiency, and satisfaction. We propose a method to simplify all the ANSI and ISO aspects of usability into a single, standardized, and summated usability metric (SUM). In four data sets, totaling 1,860 task observations, we show that these aspects of usability are correlated and equally weighted, and we present a quantitative model for usability. Using standardization techniques from Six Sigma, we propose a scalable process for standardizing disparate usability metrics and show how Principal Components Analysis can be used to establish appropriate weighting for a summated model. SUM provides one continuous variable for summative usability evaluations that can be used in regression analysis, hypothesis testing, and usability reporting.
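As a rough sketch of the general idea of standardizing disparate metrics to a common scale and averaging them with equal weights (the specification limits, data, and function below are hypothetical and simplified relative to the paper's procedure):

```python
from statistics import NormalDist, mean, stdev

def spec_proportion(values, spec, higher_is_better=True):
    """Standardize a continuous metric as the proportion of observations expected
    to meet a specification limit, from a z-score (assumes rough normality)."""
    z = (mean(values) - spec) / stdev(values)
    if not higher_is_better:   # e.g., task time or error counts
        z = -z
    return NormalDist().cdf(z)

# Hypothetical task-level data and specification limits
satisfaction = spec_proportion([4.2, 3.8, 4.5, 3.5, 4.0], spec=4)                  # 1-5 scale
time = spec_proportion([85, 120, 95, 140, 70], spec=100, higher_is_better=False)   # seconds
errors = spec_proportion([0, 2, 1, 0, 1], spec=1, higher_is_better=False)          # errors per task
completion = 4 / 5                                                                 # already a proportion

# Equal weights, per the finding that the component metrics contribute about equally
sum_score = (satisfaction + time + errors + completion) / 4
print(round(sum_score, 2))  # about 0.59 with these made-up numbers
```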
Human Factors in Computing Systems | 2009
Jeff Sauro; Joseph S. Dumas
Post-task ratings of difficulty in a usability test have the potential to provide diagnostic information and be an additional measure of user satisfaction. But the ratings need to be reliable as well as easy to use for both respondents and researchers. Three one-question rating types were compared in a study with 26 participants who attempted the same five tasks with two software applications. The types were a Likert scale, a Usability Magnitude Estimation (UME) judgment, and a Subjective Mental Effort Question (SMEQ). All three types could distinguish between the applications with 26 participants, but the Likert and SMEQ types were more sensitive with small sample sizes. Both the Likert and SMEQ types were easy to learn and quick to execute. The online version of the SMEQ question was highly correlated with other measures and had equal sensitivity to the Likert question type.
Proceedings of the Human Factors and Ergonomics Society Annual Meeting | 2005
Jeff Sauro; James R. Lewis
The completion rate — the proportion of participants who successfully complete a task — is a common usability measurement. As is true for any point measurement, practitioners should compute appropriate confidence intervals for completion rate data. For proportions such as the completion rate, the appropriate interval is a binomial confidence interval. The most widely-taught method for calculating binomial confidence intervals (the “Wald Method,” discussed both in introductory statistics texts and in the human factors literature) grossly understates the width of the true interval when sample sizes are small. Alternative “exact” methods over-correct the problem by providing intervals that are too conservative. This can result in practitioners unintentionally accepting interfaces that are unusable or rejecting interfaces that are usable. We examined alternative methods for building confidence intervals from small sample completion rates, using Monte Carlo methods to sample data from a number of real, large-sample usability tests. It appears that the best method for practitioners to compute 95% confidence intervals for small-sample completion rates is to add two successes and two failures to the observed completion rate, then compute the confidence interval using the Wald method (the “Adjusted Wald Method”). This simple approach provides the best coverage, is fairly easy to compute, and agrees with other analyses in the statistics literature.
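A minimal sketch of the adjusted Wald calculation described above, for a 95% interval (the completion data in the example are hypothetical):

```python
import math

def adjusted_wald_95(successes, n):
    """95% binomial CI: add two successes and two failures, then apply the
    standard Wald formula to the adjusted proportion."""
    z = 1.96                      # two-sided 95%
    adj_n = n + 4
    adj_p = (successes + 2) / adj_n
    margin = z * math.sqrt(adj_p * (1 - adj_p) / adj_n)
    return max(0.0, adj_p - margin), min(1.0, adj_p + margin)

# Example: 4 of 5 participants completed the task
low, high = adjusted_wald_95(4, 5)
print(round(low, 2), round(high, 2))  # roughly 0.36 to 0.97
```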
Human Factors in Computing Systems | 2009
Jeff Sauro; James R. Lewis
Correlations between prototypical usability metrics from 90 distinct usability tests were strong when measured at the task-level (r between .44 and .60). Using test-level satisfaction ratings instead of task-level ratings attenuated the correlations (r between .16 and .24). The method of aggregating data from a usability test had a significant effect on the magnitude of the resulting correlations. The results of principal components and factor analyses on the prototypical usability metrics provided evidence for an underlying construct of general usability with objective and subjective factors.
Human Factors in Computing Systems | 2011
Jeff Sauro; James R. Lewis
When designing questionnaires there is a tradition of including items with both positive and negative wording to minimize acquiescence and extreme response biases. Two disadvantages of this approach are respondents accidentally agreeing with negative items (mistakes) and researchers forgetting to reverse the scales (miscoding). The original System Usability Scale (SUS) and an all positively worded version were administered in two experiments (n=161 and n=213) across eleven websites. There was no evidence for differences in the response biases between the different versions. A review of 27 SUS datasets found 3 (11%) were miscoded by researchers and 21 out of 158 questionnaires (13%) contained mistakes from users. We found no evidence that the purported advantages of including negative and positive items in usability questionnaires outweigh the disadvantages of mistakes and miscoding. It is recommended that researchers using the standard SUS verify the proper coding of scores and include procedural steps to ensure error-free completion of the SUS by users. Researchers can use the all positive version with confidence because respondents are less likely to make mistakes when responding, researchers are less likely to make errors in coding, and the scores will be similar to the standard SUS.
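For illustration, a short sketch of standard SUS scoring showing where the reversal of even-numbered items enters; skipping that reversal is the miscoding error discussed above (the responses shown are hypothetical):

```python
def sus_score(responses):
    """Standard SUS scoring: odd-numbered items contribute (response - 1),
    even-numbered items contribute (5 - response); the ten contributions are
    summed and multiplied by 2.5 to give a 0-100 score."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    contributions = [(r - 1) if i % 2 == 0 else (5 - r)   # i is 0-based, so even i = odd item
                     for i, r in enumerate(responses)]
    return sum(contributions) * 2.5

# Hypothetical responses to items 1-10 on the 1-5 agreement scale
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```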
Human Factors in Computing Systems | 2010
Jeff Sauro; James R. Lewis
The distribution of task time data in usability studies is positively skewed. Practitioners who are aware of this positive skew tend to report the sample median. Monte Carlo simulations using data from 61 large-sample usability tasks showed that the sample median is a biased estimate of the population median. Using the geometric mean to estimate the center of the population will, on average, have 13% less error and 22% less bias than the sample median. Other estimates of the population center (trimmed, harmonic and Winsorized means) had worse performance than the sample median.
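A brief sketch of the geometric mean as an estimate of the population center for task times (the times shown are hypothetical):

```python
import math

def geometric_mean(times):
    """Geometric mean of task times: average the log-times, then exponentiate.
    Times must be positive."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Hypothetical task times in seconds (positively skewed, as is typical)
times = [42, 55, 61, 78, 95, 130, 210]
print(round(geometric_mean(times), 1))  # 83.6, vs. an arithmetic mean of about 95.9
```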
International Conference on Human-Computer Interaction | 2009
Jeff Sauro
Task time is a measure of productivity in an interface. Keystroke Level Modeling (KLM) can predict experienced-user task time to within 10 to 30% of actual times. One of the biggest constraints to implementing KLM is the tedious aspect of estimating the low-level motor and cognitive actions of the users. The method proposed here combines common actions in applications into high-level operators (composite operators) that represent the average error-free time for an action (e.g., clicking a button, selecting from a drop-down, or typing into a text box). The combined operators dramatically reduce the amount of time and error in building an estimate of productivity. An empirical test of 26 users across two enterprise web applications found this method to estimate the mean observed time to within 10%. The composite operators lend themselves to use by designers and product developers early in development without the need for different prototyping environments or tedious calculations.
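A small sketch of the composite-operator idea: summing per-action time estimates to predict error-free task time. The operator names and values below are placeholders, not the paper's empirically derived estimates:

```python
# Hypothetical composite-operator times in seconds (placeholders only).
COMPOSITE_OPERATORS = {
    "click_link": 2.1,
    "click_button": 2.3,
    "select_dropdown": 3.0,
    "type_text_field": 4.4,
}

def predict_task_time(steps):
    """Sum composite-operator times to estimate error-free task time."""
    return sum(COMPOSITE_OPERATORS[step] for step in steps)

# Hypothetical task: open a form, fill two fields, pick an option, submit
steps = ["click_link", "type_text_field", "type_text_field",
         "select_dropdown", "click_button"]
print(round(predict_task_time(steps), 1))  # 16.2 seconds
```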
Quantifying the User Experience | 2012
Jeff Sauro; James R. Lewis
Standardized usability questionnaires are questionnaires designed for the assessment of perceived usability, typically with a specific set of questions presented in a specified order using a specified format with specific rules for producing scores based on the answers of respondents. For usability testing, standardized questionnaires are available for assessment of a product at the end of a study (post-study—e.g., QUIS, SUMI, PSSUQ, and SUS) and after each task in a study (post-task—e.g., ASQ, Expectation Ratings, SEQ, SMEQ, and Usability Magnitude Estimation). Standardized questionnaires are also available for the assessment of website usability (e.g., WAMMI and SUPR-Q) and for a variety of related constructs. Almost all of these questionnaires have undergone some type of psychometric qualification, including assessment of reliability, validity, and sensitivity, making them valuable tools for usability practitioners.
Quantifying the User Experience | 2012
Jeff Sauro; James R. Lewis
User research is a broad term that encompasses many methodologies, such as usability testing, surveys, questionnaires, and site visits, that generate quantifiable outcomes. Usability testing is a central activity in user research and typically generates the metrics of completion rates, task times, errors, satisfaction data, and user interface problems. You can quantify data from small sample sizes and use statistics to draw conclusions. Even open-ended comments and problem descriptions can be categorized and quantified.