A Comparative Study of Younger and Older Adults' Interaction with a Crowdsourcing Android TV App for Detecting Errors in TEDx Video Subtitles
Kinga Skorupska, Manuel Núñez, Wiesław Kopeć, Radosław Nielek
Polish-Japanese Academy of Information Technology
Abstract.
In this paper we report the results of a pilot study comparing older and younger adults’ interaction with an Android TV application which enables users to detect errors in video subtitles. Overall, the interaction with the TV-mediated crowdsourcing system relying on language proficiency was seen as intuitive, fun and accessible, but also cognitively demanding; more so for younger adults, who focused on the task of detecting errors, than for older adults, who concentrated more on the meaning and edutainment aspect of the videos. We also discuss participants’ motivations and preliminary recommendations for the design of TV-enabled crowdsourcing tasks and subtitle QA systems.
Keywords:
Crowdsourcing · Smart TV · Android TV · Design evaluation · Subtitles · Older adults · Younger adults.
With the increasing amount of video content it is necessary to ensure its accessibility to the deaf, the hard of hearing and international audiences through quality same-language and multilingual subtitles. Therefore, crowdsourcing subtitle quality assurance (QA) models are an important research frontier, especially as subtitles are often created by volunteers, as in the case of TED and TEDx [7], or generated automatically. At the same time, there are groups who may benefit from more fun and accessible crowdsourcing projects.
For example, older adults, who comprised 19.2% of the EU-28 population in 2016 [1], benefit from all forms of volunteering, as it slows the negative effects of aging and helps combat depression [10]. Yet, there exist multiple barriers to their inclusion in typical crowdsourcing tasks, such as lower ICT skills, the uncomfortable and costly setup of such solutions [16], unfamiliar interfaces and lack of motivation due to unclear personal benefit [4], the unsocial nature of the task [18] or their perception of not being qualified [8].
Younger adults, on the other hand, who are more open to online crowdsourcing and microtasking, comprise a significant number of online video viewers, as, according to We Are Flint, about 96% of people in the UK and US aged 18-34 watch YouTube videos [2]. Both groups are relevant to the development of TV-enabled subtitle QA crowdsourcing tasks as potential contributors and audience.
Therefore, the key research goal was to validate, with two relevant user groups, a novel interface for creating no-grind crowdsourcing solutions, that is, ones that do not rely on tedious repetition. To do this, we deployed a Smart TV-based system built on best practices of designing for older users [6] [12], with a comfortable at-home setup, large screen size, and a remote relying on familiar interaction patterns [13], together with engaging edutainment crowdsourcing tasks. This lowered ICT and other participation barriers and allowed us to signal some possible differences in the participants’ approach, motivation, mode of use, experience and expectations. We lay the groundwork for a discussion of the extent to which one may build a universal crowdsourcing system suited to the needs of these different groups, to tap into their potential, facilitate social inclusion and build social capital.
To explore these considerations we conducted a comparative qualitative study, in which we compared the results of a study involving older adults [17] with those of a study with younger adults conducted in February-March 2019.
Fig. 1.
The error category selection overlay in our Dream TV application
The study examined the interaction with the DreamTV application we created [17], which allows users to watch TEDx videos with volunteer-created subtitles retrieved from the Amara API. Once they spot an error they can pause the video to display an overlay (Fig. 1) where they choose the error category among grammar, meaning, style and timing. These error categories were chosen based on preliminary tests and research to be more intuitive than existing models of quality assessment of subtitles by professionals [15] and to aid in improving the subtitles later within the pipeline or during post-editing.
The research protocol, which took about two hours to complete, involved individual testing at participants’ homes, where an Android TV set-top box was connected to participants’ TV sets to provide the most natural use conditions, as proposed in multiple studies on Living Labs [9] [3]. It consisted of the DigComp survey, a semi-structured interview to evaluate experience with subtitles, an explanation of the project (that is, the study and its benefits), an introduction to subtitles and a written subtitle error detection exercise, an app demonstration and a hands-on test, and free interaction with the application and our pre-selected test videos (two in Polish, three in English) with redacted Polish subtitles.
For our study we selected five videos to represent different challenges. They were controlled for topic, length, source language (spoken), ease of comprehension and errors: their saturation, category and source, which was either machine (using SubtitleEdit and Google Translate), organic human, or introduced by researchers based on the lists of common errors on the TED Translators’ wiki. The videos selected and errors introduced allowed us to observe a variety of factors at play, in order to gather diverse insights and determine interesting areas of further inquiry.
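The error-marking step described above (pause, then choose one of four categories for a subtitle) can be illustrated with a minimal sketch of the record such an interaction might produce. This is an illustration under stated assumptions and not the DreamTV implementation: the names `ErrorCategory`, `ErrorReport` and `mark_error`, as well as the fields of the record, are hypothetical; only the four category names come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorCategory(Enum):
    """The four error categories offered in the overlay (from the text)."""
    GRAMMAR = "grammar"
    MEANING = "meaning"
    STYLE = "style"
    TIMING = "timing"


@dataclass(frozen=True)
class ErrorReport:
    """Hypothetical record of one marked error; field names are assumptions."""
    video_id: str
    subtitle_index: int
    category: ErrorCategory
    participant: str


def mark_error(video_id: str, subtitle_index: int,
               category_name: str, participant: str) -> ErrorReport:
    """Validate the input and build a report for a single marked subtitle."""
    if subtitle_index < 0:
        raise ValueError("subtitle_index must be non-negative")
    # Enum lookup by value raises ValueError for an unknown category name.
    category = ErrorCategory(category_name.lower())
    return ErrorReport(video_id, subtitle_index, category, participant)


if __name__ == "__main__":
    report = mark_error("video-1", 12, "Meaning", "O2")
    print(report)
```

Keeping the category as a closed enumeration mirrors the fixed set of choices in the overlay; allowing several categories per report would require a set-valued field instead.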
We invited seven older adults (O1-O7) and seven younger adults (Y1-Y7) to participate in our study, in each case three female participants and four male participants. We controlled for age, occupation and ICT skills (“above basic proficiency”, which is the highest level in DigComp). All participants live in Warsaw, the capital city of Poland. All of the older adults owned TVs, including two Smart TVs, and had a dedicated entertainment space in their living room. Their age span was almost 20 years: the youngest participant was 60 years old and the oldest one was 79, mean 70.85 (SD=6.87). For younger adults we recruited a group that would share the most relevant characteristics with our older adults, especially in terms of their housing situation and entertainment setup, which meant that in Poland they had to be between 25 and 35 years of age. All but one of the participants owned Smart TVs and had their own dedicated entertainment space in the living room. All were professionally active and none of them had children. The age span was 5 years, as the youngest participant was 28 and the oldest 33, mean 30.71 (SD=2.28).
Note: The DigComp survey measures indicators of Digital Competence based on the Digital Competence Framework [5].
Note: The TED Translators’ wiki containing lists of common errors can be found at: https://translations.ted.com
Overall, using the application was enjoyable, intuitive and easy for both younger and older participants; however, there were differences in their approach to the task. While our group of younger adults saw it as an enjoyable activity one could do to improve subtitles, brag or supplement their income in a fun way, our group of older adults viewed it less as work and more as an opportunity to learn something, and did not expect payment for contributing. For older adults it was more interesting, as they were given access to resources they were unlikely to reach on their own (TEDx videos), whereas younger adults agreed that they know less demanding or better entertainment. Younger adults detected more mistakes than older adults, as they viewed the task as more work-like and, in consequence, demanding. Older adults seemed more lenient, especially when it came to style and punctuation, and focused more on the content of the videos than on correcting mistakes. There were also differences in feedback. Where older adults focused on ways to find videos that would be a better fit for them thematically, younger adults focused more on critiquing the error categories chosen and comparing the application to Netflix. This is due to the differences in experience with such services. Both groups found the interaction via the remote to be very convenient and well-suited for this activity, and they learned to comfortably use the application in just one session, with older adults in general taking more time to learn and later to navigate, but with no other significant differences.
Overall, all of the older participants paused the videos one subtitle too late, and had to use the dialog list to navigate back to the subtitle where they wanted to mark the error. The same was true of all but one of the younger adults, as Y2 paused even before the speaker finished the sentence, indicating that they read rather than listened. This suggests that access to the full dialog list is necessary in this type of crowdsourcing for all age groups.
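The need to step back from the paused position can be sketched as a lookup over subtitle start times. The following is a minimal illustration, assuming subtitles are sorted by start time; `Subtitle`, `index_at` and `dialog_window` are hypothetical names and do not describe how the application itself builds its dialog list.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Subtitle(NamedTuple):
    start: float  # seconds from the beginning of the video
    end: float
    text: str


def index_at(subtitles: List[Subtitle], t: float) -> int:
    """Index of the last subtitle that started at or before time t."""
    starts = [s.start for s in subtitles]
    # bisect_right counts the subtitles whose start is <= t.
    return max(bisect_right(starts, t) - 1, 0)


def dialog_window(subtitles: List[Subtitle], paused_at: float,
                  before: int = 2, after: int = 1) -> Tuple[List[int], int]:
    """Indices to show in a dialog list around the pause, plus the current one.

    Including a few earlier subtitles lets a viewer who paused one
    subtitle too late step back to the line they meant to mark.
    """
    current = index_at(subtitles, paused_at)
    lo = max(current - before, 0)
    hi = min(current + after + 1, len(subtitles))
    return list(range(lo, hi)), current


if __name__ == "__main__":
    subs = [
        Subtitle(0.0, 2.0, "First line"),
        Subtitle(2.0, 4.5, "Second line, with the error"),
        Subtitle(4.5, 7.0, "Third line"),
    ]
    window, current = dialog_window(subs, paused_at=5.0, before=1, after=0)
    print(window, current)  # prints "[1, 2] 2"
```

In this example the viewer pauses during the third subtitle, and the window still exposes the second one, which is where the error was.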
Number of errors found
In general, younger adults found more errors than older adults, which may be related to their attitude towards this activity. While younger adults focused on the task of finding errors, older adults engaged with the content of the videos more and felt that they were learning new interesting things (O1-O4). This is in contrast with younger adults, except for Y4, who admitted to focusing more on the content and commented that they “should watch such videos more often as they are interesting”. Consequently, younger adults found many more punctuation errors, which older adults often ignored. This may be because punctuation errors do not interfere with understanding. Older adults, who focused more on understanding the content, often chose the “meaning” category when something was not clear to them (e.g. “it is not explained what is this photon” or “Spiderman, this is not Polish” by O2 and “kryptonite, must be a mistake” by O5, O6), suggesting the application could benefit from a built-in dictionary. Older adults’ focus on meaning is in line with Radvansky’s research on the effect of aging on memory and comprehension, which suggests that while lower levels of memory, which may be responsible for remembering specifics such as punctuation, deteriorate with age, the ability to form situation models on a higher level, aiding in meaning and general comprehension, is less affected [14]. Moreover, different people found very different errors, depending on their interests and background (science for Y6: “the Sun vs the sun”, detailed punctuation rules for Y2 with a linguistic background), which suggests that the effect of scale, relying more on the quantity than on the quality of individual contributions, may work well here.
Error categories
All but one of the younger participants (Y1-Y6) encountered errors in subtitles to which they wished to assign more than one error category, to remove the analysis paralysis of choosing the best fitting category (“People like me would deliberate 3 years over a single word” Y1) and likely to satisfy their need for cognitive closure [11], as many younger participants found the categories to be “fuzzy”. The other participant, Y7, said that “these are short lines so if someone marks a mistake it is easy to know what it is” and proposed to remove categories; the same could be seen in O3’s eagerness to just mark mistakes quickly and continue watching the videos.
Younger adults remarked that “synchronization is the most intuitive” (Y1). Other error categories requested were “punctuation” (Y6), “subtitle division” (line breaking) (Y3) and “technical errors”, such as subtitle convention errors, as a separate category (Y1, Y2). Both Y7 and Y3 said that knowing subtitle conventions requires a lot of practice and pre-teaching, for which Y3 suggested a mini-game, while older adults wished for an in-application tutorial to ensure they do not make mistakes when marking mistakes (O1, O4). One participant, Y6, also said there ought to be a way to mark recurring errors (“Here I would have to mark a lot of things, because the Sun should be written with capital letter, and it repeats a lot”); on the other hand, O3 remarked “He made the same mistake, but I’ll overlook it now”, eager to continue watching.
Older adults (O1-O7) did not question the error categories even though they often could not decide which category to choose (O4, O5) and sometimes deliberated aloud (O3). This may be because older adults are less likely to criticize design choices in the context of technology, as they feel they lack experience in it, so they are not confident enough to know they can contribute. This was also observed in the context of participatory design by Kopeć et al. [8].
Also, even though some older adults had to sit closer to the screen to read (O1, O3), it was a younger adult (Y6) who voiced that they would like the interface to be bigger.
In conclusion, to ease the choice of error categories we propose to present them in the order of importance, with the top category being “meaning” - answering the question “Is this subtitle understandable?”, followed by “grammar”, as it includes common punctuation mistakes, and then “style”, which would have to be explained as relating to technical errors, and including also other problems. We postulate that because of conflicts of simultaneous work it is very difficult to find synchronization errors while also looking for other types of errors (“It is difficult to catch problems with synchronization - you focus on all the other mistakes” Y3, and “I had to read” O5). This was seen in the tests with older adults, who found no synchronization errors (O1-O7), and younger adults, who rarely marked them as they found it tiring to both read and listen (Y7: “I did not listen to the guy”, Y6: “difficult to focus on what the person was saying”).
Signalling the relationship between enjoyment, interest and errors found, Y6 said: “this topic was interesting, sometimes I did not focus on finding mistakes”. Both older (O1-O4, O6) and younger adults (Y3, Y4, Y6) seemed to find fewer errors the more they enjoyed the video, with Y4 saying that they were “forgetting to read”. Enjoyment also appeared to be negatively related to the number of errors marked, with Y2 saying that “The errors were so thickly distributed, it is a very tiring video” and that “If there were fewer errors it would be more fun than work” and Y5 mentioning that “If you have to focus only on subtitles it is more like work, but if you get to mark glaring errors only it is more entertainment”.
Y1 and Y5 found the application to be very fun, commenting that “you can point out someone’s mistakes without arguing with that person, everyone loves that!” (Y1), adding that it is true especially when there are people around, and “How fun! I like it! I could do it all my life” (Y5). Y6 also said “it’s cool, I like nitpicking”. The other participants commented that it would be work if you “had to do it, like an editor in a paper” and “the movies are not long, and you can take breaks” (Y7). Similarly, Y3 mentioned that “you should be able to choose how long video you want”. This aspect of controlling time was also present in older adults’ feedback, as they enjoyed the ability to pause the video at will and take breaks, and O3 even said “The movies should be shorter, then I could watch anything! Just give me ten 5 minute films and I can do that for an hour”. Older adults overall focused on the educational aspect of the task, saying that it is good practice and one can “learn a lot” (O1-O4) from these videos. This aspect was less prominent with younger adults, who often treated the experience as almost job-like, as it was “mentally demanding” and felt more like “work”, or a bit like an “exam” (Y1), and who felt judged when they did not understand a subtitle (Y3: “I don’t know what they mean by ‘last mile’ and since it was in quotation marks it must be something that everyone knows, so now I feel stupid”). In contrast, only O4 mentioned that “It is tiring, I am not that young anymore.”, drawing attention to the task’s cognitive load.
While older participants’ motivation was mostly based on the value for them, in terms of usefulness, relevance to their interests and staying active, for younger adults there was almost no concern about the topic, as they viewed the task as more “work-like” and focused on finding errors more than on understanding and enjoying the content - likely because they have other entertainment readily available. A detailed comparison of approaches and attitudes is shown in Table 1.
Table 1.
Comparison of older and younger adults’ motivations, rewards and wishes
| Motivation, reward or wish | Younger adults | Older adults |
|---|---|---|
| Pointing out mistakes | Y1, Y2, Y5, Y6 | O3 |
| Social activity | Y1: “to do with friends” | O2: “with grandchildren” |
| Helping somebody | Y1: “If some friend asked me to do this for them, I would help them”, Y4 | |
| Learning new things | Our group of younger adults could watch such videos, but just watch as Y4: “they are interesting” to Y6: “focus on the content”. | O1-O4, O1: “I learned a lot”, O3 “I would watch movies about health, global issues, climate change or politics” but: O5 “The topics would have to be useful” |
| Getting paid | Y1-Y7, except for Y5: “Nobody would pay much, it’s better to have bonuses, like a subscription or a small gift because earning little money is meh” | |
| Improving the world | Y1: “I like it, if I was convinced myself that this is making the world a bit better, then this is a convenient way to help” | |
| Challenging oneself cognitively | Y2 and Y3, but about other people, Y3: “blue-collar workers” and “stay at home moms” who can do it for fun and Y2: “retired people to stay active”. | O3: “This task is great for old people, but only those who are mentally fit, so that they don’t deteriorate” |
| Passion for the topic | Y3 mentioning feminists: “people who are very passionate about a topic can contribute” | |
| Statistics of confirmed contributions | Y4: “ranking like on Memrise”, Y6: “ranking of best reviewers”, Y3: “a community to care about my achievements listed on my profile”. Interestingly, both Y1 and Y2 mentioned they do not need statistics. | |
| Helping improve subtitles being used | Y3: “that there were 100 people who watched this film with improved subtitle in a month would mean something” | |
| Access to training | Y3, “in the community access to games that help you develop skills to contribute better” | O1: “It would be good to have a testing mode, to be able to train without consequences”, O4 |
| Addressing glaring errors | in videos they are watching with subtitles anyway (Y4, Y5) | |
| Reliance on linguistic experience | Y3: “I like that I don’t have to learn anything to start doing it, I know the language” | O3: “There should be more subtitle testers like me, but not young people because they have little experience” |
Overall, although most of the participants found this activity to be fun, there are doubts whether they would do it in the long run without other incentives. The tests with older adults suggest that some may continue using the application as an easy foray into the world of edutainment and to stay active, except for O5, who stated “I manage, but it is not my thing - the topics would have to be useful”, and O4, who expected to be bored as one has to “be focused”. On the other hand, some younger adults commented “I wouldn’t do it because it is time consuming, when you watch something to gain knowledge it is easier to understand the content if you are just watching” (Y6) or “it’s not my type of thing, I am not a linguist and correcting errors is not my passion” (Y7). They also mentioned shortage of time (Y2, Y3) and the demanding nature of this task (Y2, Y3, Y6, Y7) as a problem. For younger adults, who have formed habits regarding their access to other forms of entertainment, it may work best as a feature integrated into their familiar experience. Both Y4 and Y5 suggested that such an activity could be “integrated into a player” they use anyway, on YouTube for Y4 (“it could be great if YouTube had something like that in their automatic subtitles, which now suck”) or on VOD for Y5, who noted that “Sometimes I am tempted to mark something on VOD - there are few people who would bother to go to a film distributor’s website and report errors in subtitles”. Y5 concluded that “If it was easily accessible then a lot of people would do it, if they could just mark something on their remote”.
As this is a pilot study with a small number of participants, it is important to verify the following preliminary findings. While this task is fun for both younger and older adults, the former treat it more like work and expect payment. This group would benefit from having a similar solution integrated into their entertainment medium of choice. On the other hand, older adults are a promising target for this type of crowdsourcing, as it not only provides them with content they may otherwise miss, but also allows them to learn and stay active.
Future work ought to explore TV-mediated crowdsourcing in larger studies, and focus on the patterns of interaction with this solution, including the timing of engagement and the quantitative relationship between the enjoyment of the video and the number of subtitle errors found. It is also important to verify whether this TV-mediated crowdsourcing solution can hold older adults’ interest over time and, if so, in what other ways such a mode of interaction can be used to allow older adults to stay active for longer, contribute to society and learn new things.
This research was supported in part by the Polish National Science Center grant 2018/29/B/HS6/02604 and the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 690962.
References
1. Population structure and ageing. http://ec.europa.eu/eurostat/statistics-explained/index.php/Population_structure_and_ageing (2017)
2. Social 2018 main findings. https://weareflint.co.uk/main-findings-social-media-demographics-uk-usa-2018 (2018)
3. Alaoui, M., Lewkowicz, M.: A livinglab approach to involve elderly in the design of smart tv applications offering communication services. In: International Conference on Online Communities and Social Computing. pp. 325–334. Springer (2013)
4. Brewer, R., Morris, M.R., Piper, A.M.: Why would anybody do this?: Understanding older adults’ motivations and challenges in crowd work. In: Proc. of the 2016 CHI Conf. on Human Factors in Computing Systems. pp. 2246–2257. ACM (2016)
5. Ferrari, A.: Digcomp: A framework for developing and understanding digital competence in europe (2013)
6. Fisk, A., Czaja, S., Rogers, W., Charness, N., Sharit, J.: Designing for Older Adults: Principles and Creative Human Factors Approaches, Second Edition. Human Factors and Aging Series, CRC Press (2009)
7. Cámara de la Fuente, L.: Multilingual crowdsourcing motivation on global social media, case study: Ted otp. Sendebar, 197–218 (2014)
8. Kopeć, W., Balcerzak, B., Nielek, R., Kowalik, G., Wierzbicki, A., Casati, F.: Older adults and hackathons: A qualitative study. Empirical Software Engineering (4), 1895–1930 (2018). https://doi.org/10.1007/s10664-017-9565-6
9. Kopeć, W., Skorupska, K., Jaskulska, A., Abramczuk, K., Nielek, R., Wierzbicki, A.: Livinglab pjait: Towards better urban participation of seniors. In: Proceedings of the International Conference on Web Intelligence. pp. 1085–1092. WI ’17, ACM, New York, NY, USA (2017). https://doi.org/10.1145/3106426.3109040
10. Lum, T.Y., Lightfoot, E.: The effects of volunteering on the physical and mental health of older people. Research on aging (1), 31–55 (2005)
11. M. Webster, D., Kruglanski, A.: Individual differences in need for cognitive closure. Journal of personality and social psychology, 1049–62 (01 1995). https://doi.org/10.1037/0022-3514.67.6.1049
12. Pak, R., McLaughlin, A.: Designing Displays for Older Adults. Human Factors and Aging Series, CRC Press (2010)
13. Pan, Z., Miao, C., Yu, H., Leung, C., Chin, J.J.: The effects of familiarity design on the adoption of wellness games by the elderly. In: 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT). vol. 2, pp. 387–390. IEEE (2015)
14. Radvansky, G.: Aging, memory, and comprehension. Current Directions in Psychological Science, 49–53 (04 1999). https://doi.org/10.1111/1467-8721.00012
15. Romero-Fresco, P., Pöchhacker, F.: Quality assessment in interlingual live subtitling: The ntr model. Linguistica Antverpiensia, New Series: Themes in Translation Studies (0) (2018)
16. Sandhu, J., Damodaran, L., Ramondt, L.: Ict skills acquisition by older people: Motivations for learning and barriers to progression. International Journal of Education and Ageing (1), 25–42 (2013)
17. Skorupska, K., Núñez, M., Kopeć, W., Nielek, R.: Older adults and crowdsourcing: Android tv app for evaluating tedx subtitle quality. Proc. ACM Hum.-Comput. Interact. 2