Revamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users
Ruolin Wang, Zixuan Chen, Mingrui "Ray" Zhang, Zhaoheng Li, Zhixiu Liu, Zihan Dang, Chun Yu, Xiang "Anthony" Chen
Ruolin Wang
UCLA HCI Research
Zixuan Chen
UCLA HCI Research
Mingrui “Ray” Zhang
The Information School, University of Washington
Zhaoheng Li
Department of Computer Science and Technology, Tsinghua University
Zhixiu Liu
Computer Science Department, Stanford University
Zihan Dang
UCLA HCI Research
Chun Yu
Department of Computer Science and Technology, Tsinghua University
Xiang “Anthony” Chen
UCLA HCI Research
Figure 1: Revamp simplifies and reconstructs the original Amazon web page for Blind or Low Vision users’ information seeking task to understand product details, especially the appearance. Using rule-based heuristics, Revamp extracts descriptive information from customer reviews to generate image descriptions (a), responses to users’ queries (b) with a sentiment summary of all the relevant reviews (c), and original reviews sorted into a positive and a negative list (d).
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
CHI ’21, May 8–13, 2021, Yokohama, Japan. © 2021 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-8096-6/21/05. https://doi.org/10.1145/3411764.3445547
ABSTRACT
Online shopping has become a valuable modern convenience, but blind or low vision (BLV) users still face significant challenges using it, because of: 1) inadequate image descriptions and 2) the inability to filter large amounts of information using screen readers. To address those challenges, we propose Revamp, a system that leverages customer reviews for interactive information retrieval. Revamp is a browser integration that supports review-based question-answering interactions on a reconstructed product page. From our interview, we identified four main aspects (color, logo, shape, and size) that are vital for BLV users to understand the visual appearance of a product. Based on the findings, we formulated syntactic rules to extract review snippets, which were used to generate image descriptions and responses to users’ queries. Evaluations with eight BLV users showed that Revamp 1) provided useful descriptive information for understanding product appearance and 2) helped the participants locate key information efficiently.
CCS CONCEPTS
• Human-centered computing → Accessibility systems and tools; Natural language interfaces.

KEYWORDS
Online shopping, Information Retrieval, Accessibility, Blind or Low Vision Users, Reviews, Image Description, Question-answering
ACM Reference Format:
Ruolin Wang, Zixuan Chen, Mingrui “Ray” Zhang, Zhaoheng Li, Zhixiu Liu, Zihan Dang, Chun Yu, and Xiang “Anthony” Chen. 2021. Revamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users. In
CHI Conference on Human Factors in Computing Systems (CHI ’21), May 8–13, 2021, Yokohama, Japan.
ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3411764.3445547
INTRODUCTION

Online shopping has gained increasing popularity among Blind or Low Vision (BLV) people, who have limited mobility to travel to physical stores. According to a research survey conducted in the UK, over 90% of people with disabilities shopped online at least once a month. The recent COVID-19 pandemic further accelerated the adoption of online shopping among BLV users. Thus, making the online shopping experience accessible has become an imperative requisite for ensuring the quality of life of BLV users.

However, prior studies [30, 31] show that BLV people still face significant barriers on online shopping websites due to inadequate image descriptions and screen readers’ inability to filter out the large amount of information on a product page. Our formative study with 20 BLV people showed that automatic tools (e.g., Seeing AI, filters of screen readers) were only used by a few experienced users, and the information provided was too generic to inform a purchase decision, especially for certain categories where the appearance matters, e.g., fashion products. The most efficient way for BLV users is still seeking help from a sighted person, such as a family member or a crowdsourced helper, who is not always available.

To address the above challenges, we developed Revamp, an interactive information retrieval system that supports review-based question-answering (QA) on a reconstructed product page. It is implemented as a browser extension for Amazon.com, as shown in Figure 1. Utilizing reviews as a knowledge source, Revamp extracts informative descriptions to help BLV users understand the product appearance. To understand what kinds of visual questions BLV users care about, we collected questions from ten BLV participants on their frequently shopped categories on Amazon (Clothing, Shoes & Jewelry; Home & Kitchen; and Electronics) and identified four main aspects (color, logo, shape, and size) in understanding the appearance.
Based on the results, we formulated syntactic rules to extract the review snippets used to generate image descriptions and responses to users’ queries.

We tested the performance of these rules on the Amazon best-seller product lists covering three main shopping categories. Results showed that these rules (i) covered 85% of the informative reviews voted by BLV participants and (ii) provided insights on the information conveyed by images, as evaluated by two sighted people. We evaluated the usability of Revamp with eight BLV people on six representative products in the three aforementioned categories. Participants reported that Revamp reduces their effort of information seeking, improves their utilization of reviews, and extracts reviews that are informative in helping them understand product appearance. They considered Revamp to be helpful for shopping online independently when no sighted helpers were available.

Our contributions are:
• Understanding the challenges faced by BLV people in information seeking when shopping online, and design implications to meet their unique needs;
• Identifying specific questions important to BLV users when shopping online, and deriving syntactic rules that retrieve informative reviews to provide image descriptions or to answer visual questions;
• Revamp, an interactive information retrieval system that integrates with Amazon to support product browsing and review-based question-answering;
• An evaluation study with eight BLV users that validated the feasibility and usefulness of enhancing the accessible online shopping experience via Revamp.
RELATED WORK

This research builds upon prior work from different sub-disciplines across accessibility, information retrieval (IR), computer vision (CV), and natural language processing (NLP). We first review previous research investigating the online shopping experience of BLV users, as well as the efforts to improve the accessibility of such experience; we then provide an overview of information retrieval methods that utilize online reviews to generate useful information, which serves as the foundation of the implementation of Revamp.
Blind or low vision (BLV) people often take great effort to find and learn what products are visually appropriate for them [16, 29]. Without real-time communication with store clerks and touching the products directly, information related to online products is mainly conveyed through images and text provided by sellers. Hence, inadequate descriptions of images and unparsed text information significantly reduce BLV consumers’ engagement with online shopping websites [30, 31]. Currently, BLV people can leverage human-powered assistive tools to answer visual questions about daily objects and products in stores [1–3, 6, 22, 27, 39]. Especially when shopping for clothes, human helpers can provide remote assistance in describing articles of clothing (e.g., color) [1, 3] or offering subjective fashion advice [6, 22]. Relatedly, automatic photo caption generation and visual question answering in computer vision and AI research is often designed for general scenarios [8, 13, 26]. There exist automatic solutions that promote accessible image understanding on the web, e.g., providing image alt-text for photos on Facebook based on object recognition [38], using OCR for photos on Twitter and other websites alike [9, 18], and using reverse image search to find existing captions [10].

However, such solutions cannot fulfill BLV users’ special need to better understand the visual details of online products. Most images on eCommerce websites, if captioned by emerging automatic caption tools such as Seeing AI, will only have a generic description of the product, e.g., “probably a close up of a red chair”, without any specific insights on appearance details, e.g., what the size is, and how this kind of red makes people feel.
BrowseWithMe [31] has made an important step forward in describing clothing outfits by identifying the image regions, but it only responds with basic color names (e.g., “Brick Top, Navy Pants”) based on color detection in the image. Our contribution is providing a novel perspective of leveraging reviews as an external information source to supplement the visual details.

Besides inadequate descriptions of images, the online shopping experience of BLV people is also hindered by the weaknesses of screen readers in dealing with crowded websites. Screen readers provide fine-grained information navigation and control, but at the cost of reduced walk-up-and-use convenience [34]. Prior work augmented the auditory web browsing experience by adding a secondary voice output [28, 42], supporting basic queries about the information provided by sellers [31], and supporting users in choosing how many and which levels of detail to listen to based on their interest [23]. Voice assistants, as an alternative solution, lack the ability to deeply engage with content and to get a quick overview of the landscape (e.g., list alternative search results & suggestions) [34]. Current voice assistants for online shopping such as
Alexa only respond to a limited range of queries, e.g., “add bananas to the cart”, rather than finding targeted information to answer a BLV user’s question. Inspired by VERSE [34], which extends a voice assistant with screen-reader-inspired capabilities to enhance web search, our work integrates review-driven question-answering with two levels of detail (a summary & original review lists), which contains rich first-hand insights from other buyers.
Most image captioning and visual question answering benchmarks to date focus on questions such as simple counting and object detection that are directly based on the images. Marino et al. [19] draw on external knowledge resources when the image content is insufficient to answer customers’ questions, but their focus on images containing general scenes cannot be directly leveraged for online shopping. Online reviews have been mentioned as a useful resource to assist BLV customers’ desire for more product information [16, 30, 31], such as supporting question-answering [20, 36] and generating summaries [12, 15]. Prior work towards general users investigated extracting experiences [21, 24], tips [11, 41], or snippets suitable for product descriptions [7, 25], yet it remains unknown how reviews can help with the unique interests of BLV users, especially in providing descriptive information on product appearance. Our research fills a gap in the literature by leveraging existing human-authored resources of online reviews as an additional source of information to address such unique needs.

The most related work [25] applies a supervised method to extract reviews for image description based on 25K labeled sentences. However, the descriptions generated by this work are generic and rarely cover descriptions of visual details. Rather than following supervised methods such as conducting aspect-based sentiment analysis utilizing feature engineering [14, 35] or Bidirectional Encoder Representations from Transformers (BERT) [32], which require a huge amount of human-labelled data, we propose empirically formulated syntactic rules based on our studies with BLV participants to retrieve meaningful review snippets for understanding appearances.
Compared to “black box” data-driven approaches, these syntactic rules are explainable and can be validated by our targeted users; they are modular—existing rules can be edited or removed and new rules can be added without affecting the others. To the best of our knowledge, there is no prior work that directly informs rule design for extracting visual descriptions from reviews.

In summary, Revamp extends prior work on enhancing accessible information seeking by proposing a solution that bridges customer reviews with the needs of BLV users for visual information across a wider range of online products, including three frequently shopped categories: Home & Kitchen; Clothing, Shoes & Jewelry; and Electronics. In this work, we conducted three groups of studies:
• Formative study (Sections 4, 5) with two stages to: (i) understand the current practice and derive design implications via 30-minute semi-structured interviews with 20 participants (P1-P20); (ii) investigate what specific visual information BLV users expect and how reviews can help, via questionnaires and 30-minute semi-structured interviews with 10 participants (P11-P20).
• Rule evaluation (Section 7.1) with two stages to: (i) evaluate the informativeness of reviews extracted by the rules via questionnaires with eight participants (P13-P20); (ii) evaluate the rules’ generalizability on different products, where two sighted researchers rated the quality of the generated alt-text and answers on three informativeness levels.
• System evaluation (Section 7.2) to evaluate how the Revamp prototype affected the shopping process. Eight participants (P13-P20) browsed products with Amazon web pages and Revamp, then provided subjective comments in online interviews. The study lasted 40 minutes.
The time interval between each group of studies was about one month.
Twenty participants (Gender: 7 female and 13 male; Age: average = 27.7, SD = 5.5) from two different cultures (P1-P10: China; P11-P20: North America) participated in the first-stage formative study, due to the distributed nature of the research team. Although the platforms and languages they use differ, participants share similar screen reader experience and information seeking challenges, since screen readers follow the general design and
most online shopping websites are similarly structured with overloaded information. Only North American users were involved in the following studies because of the limited availability of review data on Chinese shopping sites. Participants were compensated with $15 USD/hour. Each study was audio recorded, transcribed, and coded by three of the authors following the reflexive thematic analysis methods of Braun and Clarke [4]. The demographic information of participants is attached in the Appendix.
We conducted 30-minute semi-structured online interviews with 20 BLV consumers (P1-P20) recruited through social media platforms. Specifically, the goal of this study is to understand, when shopping online:
• What challenges do BLV users face in information seeking, and what are their coping strategies?
• What are the design heuristics if we want to build an accessible information retrieval system for BLV users?
Aligned with prior work [30, 31], the two main challenges mentioned by our participants are the lack of visual information and the lack of efficient ways to navigate through information. We revealed how the two challenges are intertwined with each other, which elicited insights on designing an efficient way to leverage reviews to fill in the missing visual information.
The most frequently mentioned challenge is the lack of detailed information for understanding the visual appearance of a product (18 out of 20 participants). While sighted users can tell visual attributes such as color, shape, and size, or even the functionalities of a product, from a single image, the alt-texts of many current product images are either empty or set to the image path, which provides no visual information for BLV users at all. Sellers usually provide no textual descriptions equivalent to the visual information conveyed by images, whether it is a basic attribute such as color and shape, or a vivid detail such as a pattern design. Take the color attribute as an example: the names provided by sellers can sometimes be too generic or obscure to interpret accurately. So much of the language we use to describe a product is centered around the visual, e.g., “marble pattern” or “Arctic blue”, which poses a barrier to people who do not experience the world visually. P18 brought up an example that a T-shirt with the color name “surf the web” was actually a kind of “bright blue”, which was only mentioned in a review.
Compared with the scanning and skimming experience of sighted users, the passive and sequential reading manner of screen readers makes it inefficient for BLV users to retrieve useful information scattered all over a web page. P16 mentioned that it can be frustrating to accidentally jump into an unrelated web component, such as the shopping cart, in the process of exploring product details. All participants agreed that reviews were useful resources. Review snippets with detailed descriptions can also provide further insights into product appearance. However, most participants (16 out of 20) reported that their utilization of reviews was greatly reduced because sifting through the large amount of unrelated reviews can be especially laborious. P19 stated: “Usually there are too few reviews per page and there are no easy ways to jump from review to review. I know there exist useful reviews but it’s hard to directly pull them out without effort.”

There still exists a gap between current solutions and BLV users’ expectations of the online shopping experience. Relying on human helpers, as in many other scenarios, suffers from the limitation of availability, raises privacy concerns, and reduces BLV users’ independence. Meanwhile, as reviewed earlier in Related Work, many existing techniques lack adaptation to the specific needs of BLV users’ online shopping.
80% of the participants chose to seek help from sighted people, including family/friends, other customers, paid video chat, and accessibility hotlines. In this way, they had to trust the helper’s personal judgment and were subject to issues such as helpers’ unavailability and privacy concerns. P19 mentioned: “The shipping and delivery process is not the most time-consuming for me. Waiting for someone to help me with selecting products is.” P12 stated that he did not always want others to know about certain products he buys and hoped there was an alternative solution to shop more independently.
P18 mentioned using visual interpretation apps such as Seeing AI to get brief generic descriptions of product images based on object and color detection, e.g., “probably a close up of a red chair”. Two participants who were more tech-savvy with web interfaces (P11, P20) mentioned using keyword-based search and filters, but these tools are under-utilized by other participants and can only remove irrelevant information on a general basis. P8 and P13 mentioned that the customer questions & answers section supported by websites can also help with some generic details, yet it seldom covers appearance-related information, which is assumed available to sighted consumers via product images.
Based on the interview results, we derived three design implications to improve the accessibility of the online shopping experience. Specifically, the designs focus on leveraging the existing resources on a product page: simplifying the webpage (4.3.1) and responding to active queries (4.3.3) address the lack of efficient ways to navigate through information (4.1.2); leveraging reviews to supplement visual descriptions (4.3.2) addresses the lack of visual information (4.1.1).
When the user’s current task is understanding a product, especially its appearance, the overloaded information and web components can become burdens in seeking visual-related descriptions. To address this problem, we can provide an alternate view of the original page, with less-related information and components removed, such as the shopping cart. Meanwhile, it is important to provide users the option to switch back to the original page.
Participants’ responses suggest that some subjective information in reviews can be helpful for BLV people to understand visual attributes. As P2 stated: “It is impossible for me to build a visual concept as you sighted people do, but I can feel what you feel. Reviews talking about the feelings when seeing the appearance are helpful for me to understand this pattern.” It remains to be investigated how to retrieve the informative reviews that meet the specific needs of BLV users.
Participants expressed preferences for actively asking questions and getting summative answers about a product, which was usually what they did when consulting with a sighted helper. Meanwhile, any additional assistance mechanisms should be compatible with users’ current screen reader experience, to support users in accessing the original information as well. P6 commented: “I do not completely trust in answers selected by machines in case it may not cover very well.”

The first-stage formative study uncovered the opportunities and challenges in supporting the unique needs of the BLV population in visual information seeking in online shopping. Building upon these findings, we then seek to further understand:
• What specific visual or other product-related questions are BLV users interested in?
• How can we retrieve useful review snippets to answer these questions?
It is beyond the scope of this paper to exhaustively cover the large number of product categories and all the consumer questions corresponding to each category. To narrow down our focus, we first distributed questionnaires to ten North American participants (P11-P20), who mainly use Amazon, to collect their most frequently shopped product categories, then conducted one-on-one interviews to formulate an understanding of their specific questions in three main categories: Home & Kitchen; Clothing, Shoes & Jewelry; and Electronics, based on which we derived rule-based solutions to extract informative review snippets.
The frequently shopped categories on Amazon by our participants are: Electronics (chosen by 90% of participants), Home & Kitchen (50%), Pet Supplies (50%), Audible Books and CDs (50%), Clothing, Shoes & Jewelry (30%), and Grocery and Gourmet Food (30%). Among these, we focused on three main categories where appearance is as crucial as other information for making purchase decisions: Home & Kitchen; Clothing, Shoes & Jewelry; and Electronics.

We then gathered 168 questions from participants in these three categories. Each participant was shown representative products from the Amazon best-sellers list and was asked to raise as many questions as possible after browsing the title and description provided by sellers. The questions were then labeled and divided into two groups, non-visual questions and visual questions. The distinct breakout of visual (57%) and non-visual (43%) questions is based on whether a question can be answered by directly observing the image.

First, 72 of the 168 questions were questions on non-visual information that share common interests with sighted consumers, including (i) functionality and other non-visual attributes, e.g., material, price; and (ii) consumer feedback, e.g., whether the product is worth the price or of high quality.

We are interested in the remaining 96 questions on visual information, which were of specific interest to BLV users, including (i) visual attributes of the image, e.g., color, logo, shape, size; and (ii) high-level concepts that can be inferred from an image, e.g., usage method, style.
For non-visual questions, all participants reported that the useful information usually could be found in the product description provided by the sellers, in reviews, or in customer questions & answers, while for visual questions they mainly rely on human helpers to get answers. Thus we focus our investigation on the following sub-categories of visual questions and on whether we can provide informative answers to them based on reviews, hence providing an alternative for BLV users to shop online independently.
Most participants started with general appearance questions, such as “how does it look?”, followed by detailed questions around the basic visual attributes of the product, including color, shape, size, and logo, which puts forward the need to first provide a briefing of appearance covering these basic attributes, then respond to specific queries. Participants mentioned that they are not satisfied with only knowing the basic category name of a visual attribute, or just vague comments, e.g., “Nice shape. This color looks great”. They are interested in specific, detailed descriptions and impressions of the appearance. P5 mentioned a story: once he wanted to buy a guitar and was curious about what exactly the color attribute ‘gold’ was. “Is it a shining one or a matte one? I want to know more details.” For shape and size, participants always want to know how the product compares with daily objects, e.g., “Does it fit on a tabletop?”, “Is it like the size of a banana?”, which can help them better understand shape and size in the real world. Participants are also curious about whether there are logos on the product.

To provide informative answers on the basic attributes mentioned above, the key is to extract the descriptive and comparative details on appearance from the large amount of reviews. In particular, clothes can have more nuanced design details to describe, such as sleeves, waist, neck, edition of clothing, and graphic designs or patterns. These questions, e.g., “Do the buttons go all the way down? What kind of neck design does the shirt have?”, are very category-specific and often require fashion knowledge to provide aesthetic descriptions; as P14 mentioned, “even my friends don’t describe well”, so descriptions in common customer reviews hardly meet their expectations. In this paper, we do not focus on this subdivided fashion knowledge, which requires a large amount of annotated data
(e.g., DeepFashion [17]), but focus on the common attributes shared by a wider range of products and possibly covered by customer reviews. The four attributes (color, shape, size, logo) are common across many products and can be concisely described in the alt-text compared to other aesthetic attributes (e.g., pattern, style).
There are also high-level questions not directly related to basic visual attributes that can nonetheless be inferred from the appearance in the image with commonsense, including usage, style, quality, texture, and specific accessibility requirements, e.g., “Does it look like it has good quality? How to open this bottle? Does the product have physical buttons?” Such properties are similar to the “signifier” concept proposed by Don Norman (https://sites.google.com/site/thedesignofeverydaythings/home/signifiers), which acts as an indicator that can be interpreted meaningfully in the context of the social and physical world. These questions, usually framed as Visual Commonsense Reasoning, have been a challenging research topic in computer vision [40]. Yet it is currently not well-explored how to address the need of inferring concepts and functions from product images in online shopping scenarios, as this requires a large amount of annotated data from external knowledge sources beyond the shopping websites. However, the reviews commenting on the specific aspects (e.g., quality, physical buttons) can still indirectly provide insights into customers’ opinions. Thus our expediency is treating these questions the same as the non-visual questions, providing the relevant reviews.

In this work, we mainly focused on addressing BLV users’ questions on the common basic visual attributes, including color, logo, shape, and size, which can be supported by augmenting keyword-based review searching with rule-based filtering (described below) to extract more informative snippets.

Compared to data-driven models, rule-based solutions can work well without the need for manually labelled data. Further, rules are explainable and can be validated by our targeted users. Rules are also modular—existing rules can be edited or removed and new rules can be added without affecting the others. Note that currently our rules are not designed to answer questions related to high-level concepts, which we consider future work.
We first established several simple rules to filter out review sentences that cannot be used to further the understanding of visual attributes:
(i) Short: sentences of 3 words or fewer, e.g., “satisfied”, “like the shape”, “poor logo”;
(ii) Reference to the image: reviews that comment on whether the actual product is consistent with the photo provided by the seller, e.g., “looks exactly as pictured”, “not shown as picture”. These sentences were mentioned by our participants as “useless” and “annoying”, since they often occur in reviews but contain little useful information for BLV customers.

All reviews containing the query keywords are included as candidates and annotated with basic Part-Of-Speech tags, also known as word classes or lexical categories, e.g., noun, adjective. Based on the aforementioned analysis, a group of three researchers iteratively established the following rules for extracting informative reviews.

Rule 1: Adjective + Keyword, or Keyword + Verb + Adjective.
The descriptive adjectives usually provide supplementary visual information. Examples include “a shimmery purple”, “crescent shape”, “a very nice-looking etched logo”, “squarish shape”, “The bubblegum color is glossy and fun.”, etc. It is hard to enumerate all the possible descriptive words; instead, we decided to obtain qualified snippets by differentiating those with evaluative adjectives. The evaluative adjectives expressing subjective emotions can be vague, hence not helpful for further understanding the visual attributes, e.g., “nice shape”, “great color”. We collected a list of such vague adjectives, including “great”, “nice”, “good”, “bad”, “horrible”, “disappointed”, etc. These snippets are less informative but still provide subjective opinions, so we keep them as candidates, but with the lowest priority among reviews following our rules.

Rule 2: 1st pronoun + ... + Keyword + ... + that/which/but/because.
Rather than simple sentences only containing expressions of attitude, e.g., “I feel disappointed at the color.”, sentences with clauses usually provide more detailed and useful information. A coordinating conjunction, e.g., “but”, emphasizes two statements contrasting or contradicting each other. Subordinating conjunctions, e.g., “because”, often introduce detailed reasons for customers’ attitudes towards the visual attribute. Examples include “I love the color (Bubblegum), which I bought because it was the lowest cost for a color that would be difficult to misplace or forget while traveling”.

Rule 3: Comparative Expressions.
Other than the common rules above, there exist some specific rules that work for particular visual attributes. (1) Keyword (shape) + “like/liked”, e.g., “shaped like a Cola Bottle”. Comparing the shape of a product with a familiar daily object can be helpful for better understanding the shape; (2) Keyword (size) + “fit/for/of”, e.g., “size fits in all cup holders”, showing that reviews containing details on how the product fits in its setting are informative; (3) “than/more of” + Keyword (color), e.g., “it is a terra cotta than mocha”. Some products look different from the picture provided by the sellers. Sighted customers complaining about this kind of difference can also be informative for better understanding the actual appearance.
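To make the rules concrete, here is a minimal sketch of a Rule 1 matcher. This is an illustration, not the authors' released code: a tiny hard-coded lexicon stands in for a real part-of-speech tagger, and snippets are labeled descriptive or evaluative depending on the adjective involved.

```python
# Sketch of a Rule 1 matcher: "Adjective + Keyword" or "Keyword + Verb +
# Adjective". The POS lexicon below is a toy stand-in for a real tagger.
VAGUE_ADJS = {"great", "nice", "good", "bad", "horrible", "disappointed"}

# Stand-in POS lexicon (JJ = adjective, VBZ = verb, DT = determiner).
POS = {
    "shimmery": "JJ", "glossy": "JJ", "nice": "JJ", "great": "JJ",
    "is": "VBZ", "the": "DT", "a": "DT",
}

def tag(tokens):
    # Unknown words default to noun, which is good enough for this sketch.
    return [(t, POS.get(t.lower(), "NN")) for t in tokens]

def rule1(tokens, keyword):
    """Return 'descriptive', 'evaluative', or None for a review snippet."""
    tagged = tag(tokens)
    for i, (tok, _) in enumerate(tagged):
        if tok.lower() != keyword:
            continue
        # Pattern A: adjective immediately before the keyword.
        if i > 0 and tagged[i - 1][1] == "JJ":
            adj = tagged[i - 1][0].lower()
            return "evaluative" if adj in VAGUE_ADJS else "descriptive"
        # Pattern B: keyword + verb + adjective.
        if i + 2 < len(tagged) and tagged[i + 1][1].startswith("VB") \
                and tagged[i + 2][1] == "JJ":
            adj = tagged[i + 2][0].lower()
            return "evaluative" if adj in VAGUE_ADJS else "descriptive"
    return None

print(rule1("a shimmery purple".split(), "purple"))   # descriptive
print(rule1("nice shape".split(), "shape"))           # evaluative
print(rule1("the color is glossy".split(), "color"))  # descriptive
```

A production version would substitute a trained tagger (e.g., NLTK's) for the lexicon; the pattern logic itself stays the same.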
Finally, we use the review snippets retrieved by the rules above to answer a BLV user’s specific visual question or to compose a brief appearance description. As an answer to a BLV user’s question, we provide the original reviews retrieved by our rules, divided into a positive and a negative list based on a sentiment analysis. We also provide a summary with the numbers of positive and negative reviews and the top-3 informative review snippets from reviews across the two lists. To generate both the lists and the top-3 summary, we need to rank the review snippets. In particular, review snippets selected by Rule 1 with descriptive adjectives, Rule 2, and Rule 3 have higher priority than those selected by Rule 1 with evaluative adjectives; further, reviews of the same priority are ranked by their helpfulness (votes by other customers). The next level lower comprises reviews that contain the query keywords but do not meet any rules; within this level, they are ranked by relevance based on the concept of graph centrality, following prior work [37]. For the brief appearance description, we select the shortest sentence among the top-3 candidates corresponding to each visual attribute (where each visual attribute is used as a keyword), since it is advised not to use long text for image alt-text. If users are interested in learning more details, they can move forward to ask specific questions on visual attributes.

Figure 2: System overview of how Revamp responds to users’ queries. It first extracts the keywords and processes the query by its kind. After filtering out irrelevant reviews and reranking, Revamp uses the shortest candidate among the top-3 reviews to generate the visual description of a product and relevant answers to the query.
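The ranking and answer composition described above can be sketched roughly as follows. This is a hypothetical illustration: the field names, the rating-based stand-in for the sentiment classifier, and the template wording are our assumptions, and the tier-2 centrality ranking is omitted.

```python
# Tier 0: Rule 1 with a descriptive adjective, or Rule 2 / Rule 3.
# Tier 1: Rule 1 with only an evaluative adjective.
# Tier 2: contains the keyword but matches no rule (centrality-ranked
#         in the real system; omitted here).
def tier(snippet):
    if snippet["rule"] in ("1-descriptive", "2", "3"):
        return 0
    return 1 if snippet["rule"] == "1-evaluative" else 2

def rank(snippets):
    # Within a tier, snippets with more helpfulness votes come first.
    return sorted(snippets, key=lambda s: (tier(s), -s["helpfulness"]))

def answer(keyword, snippets):
    ranked = rank(snippets)
    pos = [s for s in ranked if s["rating"] >= 4]  # stand-in for sentiment API
    neg = [s for s in ranked if s["rating"] < 4]
    top3 = "; ".join('"%s"' % s["text"] for s in ranked[:3])
    summary = (f"{len(pos)} positive and {len(neg)} negative reviews "
               f"mention the {keyword}. For example: {top3}.")
    # The shortest top-3 snippet doubles as the image alt-text.
    alt_text = min(ranked[:3], key=lambda s: len(s["text"]))["text"]
    return summary, pos, neg, alt_text

reviews = [
    {"text": "really nice color", "rule": "1-evaluative", "rating": 5, "helpfulness": 9},
    {"text": "a shimmery purple", "rule": "1-descriptive", "rating": 5, "helpfulness": 2},
    {"text": "color was off", "rule": "3", "rating": 2, "helpfulness": 5},
    {"text": "color arrived fast", "rule": None, "rating": 4, "helpfulness": 30},
]
summary, pos, neg, alt_text = answer("color", reviews)
print(alt_text)  # -> color was off (shortest of the top-3 snippets)
```

Note how the keyword-only snippet (“color arrived fast”) sinks to the bottom despite having the most votes, matching the priority scheme described in the text.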
We present Revamp, an interactive information retrieval system to improve the shopping experience for BLV users. We implemented Revamp as a browser extension that works on Amazon. Revamp allows BLV users to interact with a simplified product page, access image descriptions composed of relevant reviews, and ask questions and receive a summary with original reviews as the response, as shown in Figure 2. On the front-end, Revamp uses the Chrome API to automatically simplify the page and inject our search component into the current page, allowing users to send a query and get a response from our back-end. On the back-end, Revamp extracts the keywords in the user’s query and matches the keywords against the review snippet data. By ranking the filtered review snippets and applying sentiment classification, the back-end returns a review summary and a positive and a negative list.
Data source. Our data source consists of three main parts: (i) basic attributes (title, color, price) from information provided by sellers; (ii) reviews (containing title, content, rating, helpfulness, date, and author); and (iii) customer Q&A. We scraped data from Amazon product pages using the Python library BeautifulSoup combined with Selenium and stored it in .csv format. If a product browsed by a user already exists in our database, we use it directly; otherwise, we run the scraper to retrieve the data and save it in the database. Our Python Flask based back-end API server uses these data to generate responses to our front-end’s requests. Before our user study and evaluation, we pre-scraped the product data to avoid the potential latency of retrieving it in real time.
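The cache-or-scrape logic can be sketched as below. The file layout, field names, and stub scraper are our illustrative assumptions; the actual system scrapes live pages with BeautifulSoup and Selenium.

```python
# Minimal sketch of "use cached product data if present, else scrape".
import csv
import os

def scrape_product(asin):
    # Stand-in for the BeautifulSoup + Selenium scraper; returns one record.
    return {"asin": asin, "title": "Sample Bottle",
            "color": "Bubblegum", "price": "19.99"}

def get_product(asin, cache_dir="cache"):
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{asin}.csv")
    if os.path.exists(path):                 # cached: read the stored .csv
        with open(path, newline="") as f:
            return next(csv.DictReader(f))
    record = scrape_product(asin)            # otherwise scrape and save
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=record)
        writer.writeheader()
        writer.writerow(record)
    return record

print(get_product("B000TEST")["color"])      # -> Bubblegum
```

A second call with the same ASIN returns the cached record without touching the (stub) scraper, which is what lets the authors pre-scrape study products to hide scraping latency.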
Web page simplification. We provide an alternate view of the original Amazon pages to improve the screen reader browsing experience. Our extension first rearranges the elements on the Amazon product page by manipulating the DOM tree. We removed irrelevant information such as advertisements and promotions, and kept only the product details, reviews, and images from the original page. Revamp then generates brief descriptions as the alt-text of the images using the aforementioned rules. Then we use Chrome APIs to inject our Revamp module into the product page. We strictly followed the WAI-ARIA standards to make sure all components have proper attributes and keyboard interactions.

Review Snippets Extraction. Revamp responds both to keyword queries such as “color” and to natural language questions such as “Does the product have physical buttons?”. For keyword queries on the four main visual attributes (color, logo, shape, size), we extracted the reviews with pre-defined extended keywords based on our formulated rules. Specifically, for color, the extended keywords included ‘color’, basic color names such as ‘blue’, and the special names provided by sellers such as ‘surf the web’. For other natural language questions, we used the Rapid Automatic Keyword Extraction (RAKE) library for keyword extraction and Synsets from WordNet to get the groupings of synonymous keywords that express the same concept, then extracted the reviews containing the keywords and their synonyms.

Answer Generation. There are three types of information generated: (i)
A supplementary textual description of product images; (ii) two review lists, positive and negative; and (iii) a summary of reviews based on the two lists. After we retrieve the ranked reviews, we divide them into positive and negative categories based on aspect-based sentiment classification using the MonkeyLearn API. Finally, we extract the top three reviews from each list to generate our summary using a pre-defined template. In case there are too few reviews to populate the answers, we leverage an existing computer vision technique to provide basic answers, e.g., using Pythia [13] to answer the color, whether there is a logo on the image, and what the shape is. Users can use either text or voice input; when Revamp finishes generating answers, it notifies the user with a beep. Rather than automatically reading out the answers, we support users in using their own screen readers to scan the answers, which provides them with more freedom of control.

(Footnotes: WAI-ARIA stands for Web Accessibility Initiative – Accessible Rich Internet Applications; it is a specification published by the World Wide Web Consortium (W3C) that specifies how to increase the accessibility of web pages. The RAKE implementation used is https://github.com/csurfer/rake-nltk.)

Figure 3: An example questionnaire for BLV users to vote for which one of the review snippets (retrieved by our rules or by the default ranking of search results) best answers visual attribute questions.
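A much-simplified sketch of the natural-language query handling described in the implementation section follows. The real system uses the rake-nltk library and WordNet synsets; the stopword list and synonym table here are toy stand-ins.

```python
# RAKE-style keyword extraction, heavily simplified: candidate phrases are
# maximal runs of non-stopwords. The SYNONYMS table stands in for WordNet.
STOPWORDS = {"does", "the", "have", "a", "is", "it", "any", "there"}
SYNONYMS = {"buttons": {"button", "buttons", "keys", "controls"}}

def extract_keywords(question):
    phrases, current = [], []
    for tok in question.lower().strip("?.!").split():
        if tok in STOPWORDS:
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    return phrases

def matching_reviews(question, reviews):
    # Expand every keyword with its synonym group, then filter reviews.
    terms = set()
    for phrase in extract_keywords(question):
        for word in phrase.split():
            terms |= SYNONYMS.get(word, {word})
    return [r for r in reviews if terms & set(r.lower().split())]

q = "Does the product have physical buttons?"
print(extract_keywords(q))  # ['product', 'physical buttons']
```

With the synonym expansion, a review mentioning “controls” is still retrieved for a question about “buttons”, which is the behavior the Synset lookup provides in the full system.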
We separated the evaluation into two parts: rule evaluation and system evaluation. The rule evaluation assesses the performance of the rule-based solution for retrieving informative review snippets. The system evaluation further examines how Revamp is integrated into BLV users’ online shopping process. Modifying the page structure and review-based interactions are non-separable, as they are coordinated to support a comprehensive information seeking flow; hence we chose to evaluate Revamp as an integrated system.
We first evaluated how well our rules cover the informative review snippets for understanding appearance, and how our solution generalizes to different products.
We randomly selected eight products covering the three aforementioned main categories from the Amazon best-seller list and extracted review snippets covering the four main visual attributes (color, logo, shape, and size) following the above rules. We then distributed voting questionnaires to eight BLV participants (P13 - P20). Given each product, a participant was asked to vote for the snippet that could best help them understand the visual attributes of that product: half of the snippets were generated by our proposed rules and the other half were the top-ranked answers in the existing Amazon ‘Have a Question’ section; all snippets were presented in a random order. Figure 3 shows an example question and review snippets. The snippets selected by our rules gained 85% of the 144 votes. Notice that we do not filter out the remaining 15%. Review snippets that were not selected by our rules may also contain informative details, but seldom ones directly describing the visual attributes; e.g., a review “We ordered 2 blue and black in size xl. The blue was ok and the black was small”, nominally about color, actually gave more details about the size differences between colors. Rather than filtering out these reviews, we still show them, but with a lower ranking, while reviews that meet our rules and provide descriptive details, such as “color was off and not the true blue/normal blue that champion usually has”, are assigned a higher priority.
To further validate the generalization of these rules in a more realistic setting, we downloaded web data for 45 selected products from Amazon’s up-to-date top-seller ranking list (the top 15 in each of the aforementioned three categories, with similar products removed to maintain variety). Two sighted research team members then browsed the product pages and rated the quality of the generated alt-text and answers describing the four visual attributes (color, logo, shape, size) at three levels: not applicable, providing related non-visual information, and providing direct visual information. Results showed that our rules can provide informative results in most cases as long as there exist enough reviews, especially for products whose appearance matters more in shopping decisions, as shown in Figure 4. Not applicable arises when no reviews mention the attribute: the specific attribute does not occur (e.g., there is no logo on the product), there are no details to provide (e.g., no reviews of a white bed sheet mentioned color details), or there exist too few informative reviews. In such cases, Revamp uses Pythia Visual Question Answering [13] to provide back-up answers.
To further understand how the system is integrated into BLV users’ online shopping process, we evaluated Revamp with BLV consumers. We selected six representative products from the aforementioned three main categories, including Water Bottle, Men’s Tee, Women’s Dress, Bed Sheet, Bluetooth Speaker, and Chair Cushion, as the testing products. Three representative products, with their retrieved reviews and generated image descriptions, are shown in Figure 5.
We conducted interviews with the same eight blind participants (P13 - P20). Each user study lasted approximately 40 minutes over Zoom, after we received IRB approval. Participants used their own laptops and screen readers and shared their feedback with the experimenters as they went. We first asked the users to browse the products as they usually would on the original Amazon web pages without Revamp. The main task was to obtain information on the four visual attributes (color, logo, shape, size) as well as other details they were curious about. We then introduced how Revamp worked, including how it simplified the original Amazon web page and what kinds of questions Revamp could answer, and asked the participants to repeat the same task using Revamp. Participants then filled out a survey with Likert-scale statements, as shown in Table 1, and further explained their reasons for giving each specific rating. Each study was audio recorded and transcribed, and participants’ qualitative responses were summarized by affinity diagramming.
Figure 4: Performance of image descriptions and visual questions on four basic attributes: shape, logo, size, and color. Two sighted research team members rated the quality at three levels on the 15 Amazon top-seller products in each of the three main categories. Results showed that our rules can directly provide visual info for 54 questions or image descriptions in Clothing, Shoes & Jewelry; 40 in Home & Kitchen; and 29 in Electronics. We also included “Not Applicable” for cases with no relevant reviews or too few informative reviews.

Table 1: We collected participants’ subjective ratings on Revamp. The scale was 1 to 7, in which 7 = I strongly agree with this statement, 4 = neutral, and 1 = I strongly disagree with this statement. The value represents the average of ratings (SD). Data was analyzed using the Wilcoxon test, and a statistically significant difference (p < .05) is marked with *; all ratings of Revamp significantly outperformed the current practice.

Statements | Revamp | Current
It is easy and efficient for me to locate product-related information I need * | 5.50 (1.07) | 4.75 (0.89)
My questions are answered with informative answers * | 5.13 (0.64) | 4.38 (1.41)
I feel confident in understanding the appearance of the product * | 5.88 (1.25) | 3.00 (0.76)
I believe I can fully utilize the information from reviews * | 5.75 (1.83) | 5.13 (1.46)
Revamp can be a helpful alternative for shopping online independently when no sighted helper is available | 6.50 (0.53) | -
I will use Revamp regularly in my daily life | 6.00 (0.76) | -
We observed that the common interaction pattern with Revamp proceeded as follows: participants first navigated through four elements (product name, image with alt-text, info provided by sellers, and Q&A) on Revamp. They then asked questions on visual attributes and other aspects they were curious about (no predefined questions were provided). Since personal interests differ, they asked 2-5 questions per product, including both visual and non-visual questions. Besides the review summary, they also browsed the positive and negative review lists when needed. Overall, participants considered Revamp a helpful alternative for shopping online independently when no sighted helper is available and said they would use Revamp regularly in daily life. The comparative ratings are shown in Figure 6. Notice that one limitation of the study is the bias of comparing a new system to an old one; participants would know which system was designed by the authors and could be influenced by this. Unfortunately, given the popularity of Amazon, participants would have known which one was ours even if we had intentionally de-identified the two systems. We further discuss participants’ subjective comments below.
Providing supplementary information for better understanding the product appearance. With Revamp, participants’ ratings of their understanding of product appearance improved compared with their current practice, owing to the supplementary image descriptions and responses to visual questions. Having a supplementary description of the product image was a new experience for them: “You know the Amazon doesn’t provide image descriptions for the products. I will definitely use this add-on in my life.” (P13) When using Amazon without Revamp, participants often just quickly skipped the image web component since no useful information is provided; with Revamp, participants tended to use the shortcut command of their screen readers to jump directly to the image to first gain an overall understanding of the product’s appearance. Participants felt surprised when Revamp provided useful answers on visual attributes, as mentioned by P19: “It is really cool. It did answer my questions about product appearance.” It even helped them learn about unfamiliar visual attributes, as shared by P13: “Although I haven’t seen colors before, I have a lot of fun in reading these descriptions of colors. For example, I don’t know what is the ‘Spice’ color. It told me a review mentioned ‘like a burnt orange’, which is much more understandable.” Participants also mentioned that descriptive details in reviews from first-hand buyers sometimes felt more trustworthy than the opinions of a friend or family member, as P15 commented: “Who can perform better than the customers who have bought the product themselves on describing the product? I really like the idea of using the reviews.” Most participants agreed that they can better utilize the information from reviews. Only one participant gave a low rating of 2 because he personally preferred not to use any filter on the reviews in case something important is missed.
Enhancing the interaction flow of information seeking. With Revamp, the information seeking experience of participants improved compared with their current practice, owing to the reconstructed web page and better utilization of reviews. Revamp provided users with a cleaner web page structure and is more user-friendly than the current Amazon page, as stated by P20: “I don’t need to worry of being stuck in useless information any more.” Participants found the review summary and the review lists, divided into positive and negative, useful for accessing customers’ opinions more efficiently and proactively. P14 mentioned that “I like it that Revamp also keeps the original reviews accessible in the lists. After reading the summary, I can then make the decision to skip or look into the details of each relevant review in the list.” Participants were more engaged in asking questions, as P19 commented: “Sometimes I could be inspired to ask more after I read the answers.” Participants could choose to interact with the system using either voice commands or text input based on their preferences. Although participants in general liked the experience of screen readers augmented with voice input, they preferred to receive answers in text rather than speech: “Using my screen reader to read out the answers is far more better than directly answering my questions by voice. I can adjust the speed and pause anytime.” (P15)
Most web pages with vivid designs and a large amount of information are still not accessible enough to BLV users at present. We explored three aspects of enhancing the information seeking experience on online shopping websites: (i) simplifying and reconstructing the web pages according to users’ current task; (ii) providing a coordinated experience of active query and passive reading to support flexible information seeking; and (iii) leveraging related text resources on the web page, such as reviews, to fill in the information gap. Besides, this work also inspired several exciting future directions, as follows.

Figure 6: Subjective ratings on statements comparing current practice with the experience of using Revamp. Revamp demonstrates a clear advantage in the experience of understanding the product appearance.
In this work, we focus on the task of understanding product details, especially the appearance. Imagine another task: if the user has several product candidates in mind, we should correspondingly meet a new information seeking need of comparing multiple products. To address this, future work can explore reconstructing the web pages to support efficient switching among products and answering comparative questions about multiple products. Currently, tables are a common web page element for comparing different products and should be supported in future versions of Revamp. For example, the system should guide the user in navigating a table by informing them which products are being compared in terms of what attributes, and further be able to answer a user’s question by looking up the table.
Furthermore, future work can employ and test our methods on other product pages where there is often an overload of information inaccessible for BLV users to retrieve. For example, on Yelp, our methods can add descriptions to popular dishes, whose images do not always convey all important information (e.g., how large the portion is); on TripAdvisor, images taken by fellow travellers can be further described using our methods by extracting relevant descriptions from others’ comments (e.g., whether a trail has shade); on YouTube, comments can be leveraged to generate video captions for eye-catching moments, which serve as a more vivid introduction before one decides whether to watch a given video.
Although the subjective comments were useful and conveyed vivid details, our participants pointed out that the generated appearance descriptions might be too subjective, since each contains only one snippet per visual aspect. However, if we followed prior work on filtering out subjective comments [25], we would lose the vivid and descriptive details that are crucial to appearance understanding. As such, participants preferred the image description to remain as concise as possible, with the description provided by Revamp serving more as “a first step or a clue.” (P16) Also, it remains to be explored how we can better retrieve useful information from reviews about high-level visual concepts inferred from images. In this work, we formulated hand-crafted rules to explore the possibility of leveraging reviews as an additional information source to fill the visual information gap. In the future, we can collect and label data to deploy supervised methods such as knowledge graphs, which could extract more information to better support queries not limited to the four visual aspects (color, logo, shape, and size) and provide better summative appearance descriptions.
The question-answering experience of Revamp can be improved by providing prompting questions and involving more information sources. We observed that some participants had difficulty formulating what questions to ask: P16 only asked about basic attributes, e.g., color and quality. She explained that “Sometimes I don’t know how to ask a ‘good’ question. Maybe if I change my wording, I can get more answers.” and suggested that we provide more pre-defined questions as prompts. Meanwhile, participants noticed that there still exists a gap between their expectations and the answer quality for details beyond the four main visual attributes: “Revamp can support descriptive answers for basic visual questions, but when I want to ask about some concrete (non-visual) details such as the dimensions of the bottle, it cannot answer me directly.” (P19) In the future, we can involve more information sources in Revamp, for example leveraging Optical Character Recognition to extract the text information in the images provided by sellers.
Similar to other information retrieval systems, the performance of Revamp depends highly on the quality of existing data. Currently, we leverage an existing visual question answering API to provide a back-up answer for visual questions when there are too few reviews or no informative reviews. In the future, we can also involve human helpers in the system by sending the questions to crowd workers. Our rules for extracting informative reviews can also give insights for formulating guidelines on describing products for BLV users: (i) include descriptive details rather than vague opinions (Rule 1); (ii) if one has to express personal attitudes, elaborate on the reasons (Rule 2); (iii) compare with daily or common objects to help understand new concepts (Rule 3). Besides, participants also hoped to see Revamp’s functions integrated into the mobile application with Alexa, since they often try the mobile application of Amazon when faced with accessibility problems shopping on the web.
There are several limitations of the system evaluation study. First, we did not counterbalance the conditions in the system evaluation; using Revamp first would introduce a strong carry-over effect, as the information provided by Revamp is an augmentation of the baseline. Since personal interests and product information differ, to compare intuitively how Revamp improves the experience, all participants first browsed the products as they usually would on the original Amazon web pages and then on the modified web page of the same product by Revamp. Second, there exists the bias of comparing a new system to an old one when participants know which system was designed by the researchers [5, 33]. Third, only eight participants and a limited number of products were involved in the evaluation. This limitation can be addressed by a large-scale in-the-wild study, which we regard as future work beyond this initial study.
We present Revamp, a system that employs information retrieval techniques to meet the unique information seeking requirements of BLV consumers when shopping online. Our main contribution is a rule-based approach that leverages rich customer reviews to serve as image descriptions and to answer BLV users’ questions related to product appearance. Evaluations with eight BLV consumers showed that Revamp provides useful subjective information for understanding the product appearance and enhances the accessible information seeking experience of online shopping. Although Revamp could not provide answers for all of the products (e.g., when there are too few reviews of a product), it can serve as an effective supplemental helper for BLV users to better access and understand a product before making a purchase decision.
ACKNOWLEDGMENTS
The authors thank all participants who generously shared their time and experience for this work, Mengqi Li for helping with the formative studies, and Prof. Jacob O. Wobbrock for valuable advice on the paper writing.
REFERENCES [1] Mauro Avila, Katrin Wolf, Anke Brock, and Niels Henze. 2016. Remote Assistancefor Blind Users in Daily Life: A Survey About Be My Eyes. In
Proceedings of the9th ACM International Conference on PErvasive Technologies Related to AssistiveEnvironments (Corfu, Island, Greece) (PETRA ’16) . ACM, New York, NY, USA,Article 85, 2 pages. https://doi.org/10.1145/2910674.2935839[2] Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller,Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, Samual White,and Tom Yeh. 2010. VizWiz: Nearly Real-time Answers to Visual Questions. In
Proceedings of the 23Nd Annual ACM Symposium on User Interface Software andTechnology (New York, New York, USA) (UIST ’10) . ACM, New York, NY, USA,333–342. https://doi.org/10.1145/1866029.1866080[3] Erin Brady, Meredith Ringel Morris, Yu Zhong, Samuel White, and Jeffrey P.Bigham. 2013. Visual Challenges in the Everyday Lives of Blind People. In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Paris, France) (CHI ’13) . ACM, New York, NY, USA, 2117–2126. https://doi.org/10.1145/2470654.2481291[4] Virginia Braun and Victoria Clarke. 2019. Reflecting on reflexive the-matic analysis.
Qualitative Research in Sport, Exercise and Health
11, 4 (2019), 589–597. https://doi.org/10.1080/2159676X.2019.1628806arXiv:https://doi.org/10.1080/2159676X.2019.1628806[5] Emeline Brulé, Brianna J. Tomlinson, Oussama Metatla, Christophe Jouffrais,and Marcos Serrano. 2020. Review of Quantitative Empirical Evaluations ofTechnology for People with Visual Impairments. In
Proceedings of the 2020 CHIConference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI’20) . Association for Computing Machinery, New York, NY, USA, 1–14. https://doi.org/10.1145/3313831.3376749[6] Michele A. Burton, Erin Brady, Robin Brewer, Callie Neylan, Jeffrey P. Bigham,and Amy Hurst. 2012. Crowdsourcing Subjective Fashion Advice Using VizWiz:Challenges and Opportunities. In
Proceedings of the 14th International ACMSIGACCESS Conference on Computers and Accessibility (Boulder, Colorado, USA) (ASSETS ’12) . Association for Computing Machinery, New York, NY, USA, 135–142.https://doi.org/10.1145/2384916.2384941[7] Guy Elad, Ido Guy, Slava Novgorodov, Benny Kimelfeld, and Kira Radinsky. 2019.Learning to Generate Personalized Product Descriptions. In
Proceedings of the28th ACM International Conference on Information and Knowledge Management (Beijing, China) (CIKM ’19) . Association for Computing Machinery, New York,NY, USA, 389–398. https://doi.org/10.1145/3357384.3357984[8] Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, CyrusRashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every Picture Tells aStory: Generating Sentences from Images. In
Proceedings of the 11th EuropeanConference on Computer Vision: Part IV (Heraklion, Crete, Greece) (ECCV’10) .Springer-Verlag, Berlin, Heidelberg, 15–29.[9] Cole Gleason, Amy Pavel, Emma McCamey, Christina Low, Patrick Carrington,Kris M. Kitani, and Jeffrey P. Bigham. 2020. Twitter A11y: A Browser Extensionto Make Twitter Images Accessible. In
Proceedings of the 2020 CHI Conference onHuman Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20) . Associationfor Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3313831.3376728[10] Darren Guinness, Edward Cutrell, and Meredith Ringel Morris. 2018. Captioncrawler: Enabling reusable alternative text descriptions using reverse imagesearch. In
Proceedings of the 2018 CHI Conference on Human Factors in ComputingSystems . Association for Computing Machinery, New York, NY, USA, 1–11.[11] Ido Guy, Avihai Mejer, Alexander Nus, and Fiana Raiber. 2017. Extracting andranking travel tips from user-generated reviews. In
Proceedings of the 26th interna-tional conference on world wide web . International World Wide Web ConferencesSteering Committee, Republic and Canton of Geneva, Switzerland, 987–996.[12] Jeff Huang, Oren Etzioni, Luke Zettlemoyer, Kevin Clark, and Christian Lee. 2012.RevMiner: An Extractive Interface for Navigating Reviews on a Smartphone. In
Proceedings of the 25th Annual ACM Symposium on User Interface Software andTechnology (Cambridge, Massachusetts, USA) (UIST ’12) . ACM, New York, NY,USA, 3–12. https://doi.org/10.1145/2380116.2380120[13] Yu Jiang, Vivek Natarajan, Xinlei Chen, Marcus Rohrbach, Dhruv Batra, andDevi Parikh. 2018. Pythia v0.1: the Winning Entry to the VQA Challenge 2018.arXiv:1807.09956 [cs.CV] evamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users CHI ’21, May 8–13, 2021, Yokohama, Japan [14] Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry, and Saif Mohammad. 2014.NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. In
Proceedings of the 8th international workshop on semantic evaluation (SemEval2014) . Association for Computational Linguistics, Dublin, Ireland, 437–442.[15] Fangtao Li, Chao Han, Minlie Huang, Xiaoyan Zhu, Ying-Ju Xia, Shu Zhang, andHao Yu. 2010. Structure-aware review mining and summarization. In
Proceedingsof the 23rd international conference on computational linguistics . Association forComputational Linguistics, Association for Computational Linguistics, Strouds-burg, PA, 653–661.[16] Guanhong Liu, Xianghua Ding, Chun Yu, Lan Gao, Xingyu Chi, and YuanchunShi. 2019. "I Bought This for Me to Look More Ordinary": A Study of Blind PeopleDoing Online Shopping. In
Proceedings of the 2019 CHI Conference on HumanFactors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19) . ACM, New York,NY, USA, Article 372, 11 pages. https://doi.org/10.1145/3290605.3300602[17] Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. 2016. Deep-fashion: Powering robust clothes recognition and retrieval with rich annotations.In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, CA, 1096–1104.
[18] Christina Low, Emma McCamey, Cole Gleason, Patrick Carrington, Jeffrey P. Bigham, and Amy Pavel. 2019. Twitter A11y: A Browser Extension to Describe Images. In
The 21st International ACM SIGACCESS Conference on Computers and Accessibility (Pittsburgh, PA, USA) (ASSETS '19). Association for Computing Machinery, New York, NY, USA, 551–553. https://doi.org/10.1145/3308561.3354629
[19] Kenneth Marino, Mohammad Rastegari, Ali Farhadi, and Roozbeh Mottaghi. 2019. OK-VQA: A visual question answering benchmark requiring external knowledge. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, CA, 3195–3204.
[20] Julian McAuley and Alex Yang. 2016. Addressing complex and subjective product-related queries with customer reviews. In
Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 625–635.
[21] Hye-Jin Min and Jong C. Park. 2012. Identifying helpful reviews based on customer's mentions about experiences.
Expert Systems with Applications 39, 15 (2012), 11830–11838.
[22] Meredith Ringel Morris, Kori Inkpen, and Gina Venolia. 2014. Remote Shopping Advice: Enhancing In-store Shopping with Social Technologies. In
Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (Baltimore, Maryland, USA) (CSCW '14). ACM, New York, NY, USA, 662–673. https://doi.org/10.1145/2531602.2531707
[23] Meredith Ringel Morris, Jazette Johnson, Cynthia L. Bennett, and Edward Cutrell. 2018. Rich Representations of Visual Content for Screen Reader Users. In
Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal, QC, Canada) (CHI '18). Association for Computing Machinery, New York, NY, USA, 1–11. https://doi.org/10.1145/3173574.3173633
[24] Quang Nguyen. 2012. Detecting experience revealing sentences in product reviews.
University of Amsterdam 1, 1 (2012), 1–78.
[25] Slava Novgorodov, Ido Guy, Guy Elad, and Kira Radinsky. 2019. Generating Product Descriptions from User Reviews. In
The World Wide Web Conference (San Francisco, CA, USA) (WWW '19). Association for Computing Machinery, New York, NY, USA, 1354–1364. https://doi.org/10.1145/3308558.3313532
[26] Krishnan Ramnath, Simon Baker, Lucy Vanderwende, Motaz El-Saban, Sudipta N. Sinha, Anitha Kannan, Noran Hassan, Michel Galley, Yi Yang, Deva Ramanan, et al. 2014. AutoCaption: Automatic caption generation for personal photos. In
IEEE Winter Conference on Applications of Computer Vision. IEEE, Waikoloa Village, Hawaii, 1050–1057.
[27] André Rodrigues, Kyle Montague, Hugo Nicolau, João Guerreiro, and Tiago Guerreiro. 2017. In-context Q&A to Support Blind People Using Smartphones. In
Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility (Baltimore, Maryland, USA) (ASSETS '17). ACM, New York, NY, USA, 32–36. https://doi.org/10.1145/3132525.3132555
[28] Daisuke Sato, Shaojian Zhu, Masatomo Kobayashi, Hironobu Takagi, and Chieko Asakawa. 2011. Sasayaki: Augmented Voice Web Browsing Experience. In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Vancouver, BC, Canada) (CHI '11). ACM, New York, NY, USA, 2769–2778. https://doi.org/10.1145/1978942.1979353
[29] Kristen Shinohara and Jacob O. Wobbrock. 2011. In the Shadow of Misperception: Assistive Technology Use and Social Interactions. In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Vancouver, BC, Canada) (CHI '11). ACM, New York, NY, USA, 705–714. https://doi.org/10.1145/1978942.1979044
[30] Abigale Stangl, Meredith Ringel Morris, and Danna Gurari. 2020. “Person, Shoes, Tree. Is the Person Naked?” What People with Vision Impairments Want in Image Descriptions. In
Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI '20). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376404
[31] Abigale J. Stangl, Esha Kothari, Suyog D. Jain, Tom Yeh, Kristen Grauman, and Danna Gurari. 2018. BrowseWithMe: An Online Clothes Shopping Assistant for People with Visual Impairments. In
Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility (Galway, Ireland) (ASSETS '18). ACM, New York, NY, USA, 107–118. https://doi.org/10.1145/3234695.3236337
[32] Chi Sun, Luyao Huang, and Xipeng Qiu. 2019. Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence. In
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 380–385. https://doi.org/10.18653/v1/N19-1035
[33] Shari Trewin, Diogo Marques, and Tiago Guerreiro. 2015. Usage of Subjective Scales in Accessibility Research. In
Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility (Lisbon, Portugal) (ASSETS '15). Association for Computing Machinery, New York, NY, USA, 59–67. https://doi.org/10.1145/2700648.2809867
[34] Alexandra Vtyurina, Adam Fourney, Meredith Ringel Morris, Leah Findlater, and Ryen W. White. 2019. VERSE: Bridging Screen Readers and Voice Assistants for Enhanced Eyes-Free Web Search. In
The 21st International ACM SIGACCESS Conference on Computers and Accessibility (Pittsburgh, PA, USA) (ASSETS '19). Association for Computing Machinery, New York, NY, USA, 414–426. https://doi.org/10.1145/3308561.3353773
[35] Joachim Wagner, Piyush Arora, Santiago Cortes, Utsab Barman, Dasha Bogdanova, Jennifer Foster, and Lamia Tounsi. 2014. DCU: Aspect-based Polarity Classification for SemEval Task 4. In
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). Association for Computational Linguistics, Dublin, Ireland, 223–229. https://doi.org/10.3115/v1/S14-2036
[36] Mengting Wan and Julian McAuley. 2016. Modeling ambiguity, subjectivity, and diverging viewpoints in opinion question answering systems. In 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, Barcelona, Spain, 489–498.
[37] Vinicius Woloszyn, Henrique D. P. dos Santos, Leandro Krug Wives, and Karin Becker. 2017. MRR: An Unsupervised Algorithm to Rank Reviews by Relevance. In
Proceedings of the International Conference on Web Intelligence (Leipzig, Germany) (WI '17). Association for Computing Machinery, New York, NY, USA, 877–883. https://doi.org/10.1145/3106426.3106444
[38] Shaomei Wu, Jeffrey Wieland, Omid Farivar, and Julie Schiller. 2017. Automatic alt-text: Computer-generated image descriptions for blind users on a social network service. In
Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. Association for Computing Machinery, New York, NY, 1180–1192.
[39] Chien Wen Yuan, Benjamin V. Hanrahan, Sooyeon Lee, Mary Beth Rosson, and John M. Carroll. 2017. I Didn't Know That You Knew I Knew: Collaborative Shopping Practices Between People with Visual Impairment and People with Vision.
Proc. ACM Hum.-Comput. Interact. 1, CSCW, Article 118 (Dec. 2017), 18 pages. https://doi.org/10.1145/3134753
[40] Rowan Zellers, Yonatan Bisk, Ali Farhadi, and Yejin Choi. 2019. From recognition to cognition: Visual commonsense reasoning. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, CA, 6720–6731.
[41] Di Zhu, Theodoros Lappas, and Juheng Zhang. 2018. Unsupervised tip-mining from customer reviews.
Decision Support Systems 107 (2018), 116–124.
[42] Shaojian Zhu, Daisuke Sato, Hironobu Takagi, and Chieko Asakawa. 2010. Sasayaki: An Augmented Voice-based Web Browsing Experience. In
Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility (Orlando, Florida, USA) (ASSETS '10). ACM, New York, NY, USA, 279–280. https://doi.org/10.1145/1878803.1878870
A DEMOGRAPHICS OF STUDY PARTICIPANTS