The Limits of Global Inclusion in AI Development
Alan Chan* (Mila, Université de Montréal), Chinasa T. Okolo* (Cornell University), Zachary Terner* (National Institute of Statistical Sciences), Angelina Wang* (Princeton University)
[email protected], [email protected], [email protected], [email protected]
Abstract
Those best positioned to profit from the proliferation of artificial intelligence (AI) systems are those with the most economic power. Extant global inequality has motivated Western institutions to involve more diverse groups in the development and application of AI systems, including hiring foreign labour and establishing extra-national data centres and laboratories. However, given both the propensity of wealth to abet its own accumulation and the lack of contextual knowledge in top-down AI solutions, we argue that more focus should be placed on the redistribution of power, rather than merely on including underrepresented groups. Unless more is done to ensure that opportunities to lead AI development are distributed justly, the future may hold only AI systems that are unsuited to their conditions of application and that exacerbate inequality.
Introduction
The arm of global inequality is long, rendering itself visible especially in the development of artificial intelligence (AI). In an analysis of publications at two major machine learning conference venues, NeurIPS 2020 and ICML 2020, Chuvpilo (2020) found that of the top 10 countries in terms of publication index, none were located in Latin America, Africa, or Southeast Asia. Vietnam, the highest-placing country among these groups, comes in 27th place. Of the top 10 institutions by publication index, eight were based in the United States, including American tech giants like Google, Microsoft, and Facebook. Indeed, the full lists of the top 100 universities and top 100 companies by publication index include no companies or universities based in Africa or Latin America. Although conference publications are just one metric, they remain the predominant medium in which progress in AI is disseminated, and as such serve as a signal of who is generating research.

These statistics are unsurprising. The predominance of the United States in these rankings is consistent with its economic and cultural dominance, just as the appearance of China with the second-highest index is a marker of its growing might. Also comprehensible is the relative absence of countries in the Global South, given the exploitation and underdevelopment of these regions by European colonial powers (Frank 1967; Rodney 1972; Jarosz 2003; Bruhn and Gallego 2012).

Current global inequality in AI development involves both a concentration of profits and a danger of ignoring the contexts to which AI is applied. As AI systems become increasingly integrated into society, those responsible for developing and implementing such systems stand to profit to a large extent. If these players are predominantly located outside of the Global South, a disproportionate share of economic benefit will also fall outside of this region, exacerbating extant inequality.
Furthermore, the ethical application of AI systems requires knowledge of the contexts in which they are to be applied. As recent work (Grush 2015; De La Garza 2020; Coalition for Critical Technology 2020; Beede et al. 2020; Sambasivan et al. 2020) has highlighted, work that lacks this contextual knowledge can fail to help the targeted individuals, and can even harm them (e.g., misdiagnoses in medical applications).

Whether explicitly in response to these problems or not, calls have been made for broader inclusion in the development of AI (Asemota 2018; Lee et al. 2019). At the same time, some have acknowledged the limitations of inclusion. Sloane et al. (2020) describes and argues against participation-washing, whereby the mere fact that somebody has participated in a project lends it moral legitimacy. In this work, we focus on the implications of participation for global inequality, particularly on the limited ways in which inclusion in AI development is practised in the Global South. We look specifically at how this plays out in the domains of datasets and research labs, and conclude with a discussion of opportunities for ameliorating the power imbalance in AI development.

Datasets
Given the centrality of large amounts of data in today's machine learning systems, there would appear to be substantial opportunity for inclusion in data collection and labeling processes. While there are benefits to more diverse participation in data-gathering pipelines (that is, processes involved in the collection, labeling, and other processing of data for use in machine-learning systems), we will highlight how this approach does not go far enough in addressing global inequality in AI development.

Data collection itself is a practice fraught with problems of inclusion and representation. Two large, publicly available image datasets, ImageNet (Deng et al. 2009; Russakovsky et al. 2015) and OpenImages (Krasin et al. 2017), are US- and Eurocentric (Shankar et al. 2017). Shankar et al. (2017) further argue that models trained on these datasets perform worse on images from the Global South. For example, images of grooms are classified with lower accuracy when they come from Ethiopia and Pakistan, compared to images of grooms from the United States. Along this vein, DeVries et al. (2019) show that images of the same word, like "wedding" or "spices", look very different when queried in different languages, as they are presented distinctly in different cultures. Thus, publicly available object recognition systems fail to correctly classify many of these objects when they come from the Global South. A representative dataset is crucial to allowing models to learn how certain objects and concepts are represented in different cultures.

Since many deep learning techniques require large amounts of data to train their models, the importance of data labeling has grown. The data collection and labeling market is expected to grow to $6.5 billion USD by 2027 (Grand View Research 2020), while Cognilytica (2019) estimates that over 80% of the machine learning development process consists of data preparation tasks (collection, cleaning, and labeling).
Large tech companies such as Uber and Alphabet rely heavily on these services, with some paying millions of dollars monthly (Synced 2019).

At the same time, data labeling is a time-consuming, repetitive process. Its importance in machine-learning research and development has led to the crowdsourcing of this work, whereby anonymous individuals are remunerated for completing it. A major venue for crowdsourced work is Amazon Mechanical Turk; according to Difallah, Filatova, and Ipeirotis (2018), less than 2% of Mechanical Turk workers come from the Global South (the vast majority come from the USA and India). Other notable companies in this domain, Samasource, Scale AI, and Mighty AI, also operate in the United States, but they crowdsource workers from around the world, primarily relying on low-wage workers from sub-Saharan Africa and Southeast Asia (Murgia 2019). This leads to a significant disparity between the millions in profits earned by data labeling companies and worker earnings; for example, workers at Samasource earn around $8 USD a day (Lee 2018), while the company made $19 million in 2019 (Samasource 2021). While Lee (2018) notes that $8 USD may well be a living wage in certain areas, the massive profit disparity remains despite the importance of these workers to the core businesses of these companies. Additionally, many of these workers are contributing to AI systems that are likely to be biased against underrepresented populations in the locales where they are deployed (Buolamwini and Gebru 2018; Obermeyer et al. 2019), and that may not directly benefit their local communities.
While data labeling is not as physically intensive as traditional factory labor, workers report the pace and volume of their tasks as "mentally exhausting" and "monotonous" due to the strict requirements for labeling images, videos, and audio to client specifications (Gent 2019; Croce and Musa 2019). In the Global South, local companies have recently begun to proliferate, like Fastagger in Kenya, Sebenz.ai in South Africa, and Supahands in Malaysia. As AI development continues to scale, the expansion of these companies opens the door for low-skilled laborers to enter the workforce, but also presents a chance for exploitation to continue.

Barriers to Participation
There are barriers to participating in data labeling. The most obvious is that a computing device and stable internet access are required to reach these data labeling platforms. These goods are highly correlated with socioeconomic status and geographic location, thus serving as a barrier to participation for many (Harris, Straker, and Pollock 2017). A reliable internet connection is necessary for finding tasks to complete, completing those tasks, and accessing the remuneration for those tasks. Further, those in the Global South pay higher prices for Internet access compared to their counterparts in the Global North (i.e., Western countries) (Nzekwe 2019). Another barrier lies in the method of payment for data labeling services on some of these platforms. For example, Amazon Mechanical Turk, a widely used platform for finding data labelers, only allows payment to a U.S. bank account or in the form of an Amazon.com gift card (Amazon 2020). These restricted payment methods may not be what a worker desires, and can serve as a deterrent to working on the platform.
Problems with Participation
Although global inclusion in the data pipeline can be beneficial, it is no panacea for global inequality in AI development; in fact, it can even be detrimental if not approached with care. The development of AI is highly concentrated in countries in the Global North for a variety of reasons, such as an abundance of capital, well-funded research institutions, and technical infrastructure. The existence of these advantageous conditions is inextricable from the history of colonial exploitation of the Global South, whereby European states plundered labour and capital for the benefit of the metropoles, to the detriment of the colonized (Frank 1967; Rodney 1972). A key justification for this exploitation was white supremacy: the colonized, as "uncivilized", were deemed most fit to perform physically excruciating labour, at wages lower than those paid to Europeans. As such, colonized peoples were for the most part prevented from engaging in the more lucrative businesses of insurance, banking, industry, and trading (Rodney 1972). Although the labour and natural capital of colonized nations were indispensable to European economic projects, European institutions and individuals captured the vast majority of this wealth.

It is instructive to view inclusion in the data pipeline as a continuation of this exploitative history. With respect to data collection, current practices can neglect consent and poorly represent areas of the Global South. Image datasets are often collected without consent from the people involved, even in pornographic contexts (Prabhu and Birhane 2020; Paullada et al. 2020), while others (e.g., companies, end-users) benefit from their use. Jo and Gebru (2020) suggest drawing from the long tradition of archives when collecting data, because this is a discipline that has already been thinking about challenges like consent and privacy.
Indeed, beyond a possible honorarium for participation in the data collection process, no large-scale, successful scheme currently exists for compensating users for the initial and continued use of their data in machine-learning systems, although some efforts are currently underway (Kelly 2020). However, the issue of compensation elides the question of whether such large-scale data collection should occur in the first place. Indeed, the process of data collection can contribute to an "othering" of the subject and cement inaccurate or harmful beliefs. Even if data come from somewhere in the Global South, they are often collected from the perspective of an outsider (Wang, Narayanan, and Russakovsky 2020). That the outsider may not understand the context, or may have an agenda counter to the interest of the subject, is reflected in the data captured, as has been extensively studied in the case of photography (Ranger 2001; Batziou 2011; Thompson 2016). Ignorance of context can cause harm, as Sambasivan et al. (2020) discuss in the case of fair ML in India, where distortions in the data (e.g., a given sample corresponds to multiple individuals because of shared device usage) distort the meaning of fairness definitions that were formulated in Western contexts. Furthermore, the history of phrenology reveals the role that the measurement and classification of colonial subjects had in justifying domination (Bank 1996; Poskett 2013). Denton et al. (2020) point out the need to interrogate more deeply the norms and values behind the creation of datasets, as dataset creation is often an extractive process that benefits only the dataset collector and users.

As another significant part of the data collection pipeline, data labeling is an extremely low-paying job involving rote, repetitive tasks that offer no room for upward mobility. Individuals may not require many technical skills to label data, but they do not develop any meaningful technical skills either.
The anonymity of platforms like Amazon's Mechanical Turk inhibits the formation of social relationships between the labeler and the client that could otherwise have led to further educational opportunities or better remuneration. Although data is central to the AI systems of today, data labelers receive only a disproportionately tiny portion of the profits of building these systems. In parallel with colonial projects of resource extraction, data labeling as the extraction of meaning from data offers no way out of a cycle of colonial dependence.

The people doing the work of data labeling have been termed "ghost workers" (Gray and Suri 2019). The labour of these unseen workers generates massive profits that others capture. While our following discussion provides US statistics because those are the most readily available, it is easy to imagine similar or worse labour situations in the Global South. ImageNet (Deng et al. 2009; Russakovsky et al. 2015), a benchmark dataset essential to recent progress in computer vision, would not have been possible without the work of data labelers (Gershgorn 2017). However, the workers themselves made only a median of around $2 USD/hour, with only 4% making more than the US federal minimum wage of $7.25/hour (Hara et al. 2018), itself a far cry from a living wage. The study attributed much of this low-wage structure to the time spent on activities that were not compensated, such as finding tasks or working on tasks that are ultimately rejected. This leads into another major problem with the power dynamics on a platform like Amazon Mechanical Turk, where all of the power is given to the requester of the task. Requesters have the power to set any price they want (as low as $0.01), reject the completed work of a worker, and misleadingly claim that their task will take much less time than it actually would (Semuels 2018).
In the US, workers in this business are considered independent contractors rather than employees, so protections guaranteed by the Fair Labor Standards Act do not apply. A similar lack of protections can be seen for data labelers in the Global South (Kaye 2019). This power imbalance emphasizes the need for labor protection.
Research Labs
Establishing research labs has been essential for major tech companies to advance the development of their respective technologies while providing valuable contributions to the field of computer science (Nature 1915). In the United States, the General Electric (GE) Research Laboratory is widely accepted as the first industrial research lab, providing early technological achievements to GE and establishing the company as a leader in industrial innovation (Center 2011). As artificial intelligence becomes more important to the bottom lines of many large tech companies, industrial research labs have spun out that focus solely on artificial intelligence and its applications. Companies from Google to Amazon to Snapchat have doubled down in this field and opened labs leveraging artificial intelligence for web search, language processing, video recognition, voice applications, and much more. As AI becomes increasingly integrated into the livelihoods of consumers around the world, tech companies have recognized the importance of democratizing AI development and moving it outside the bounds of the Global North. Of five notable tech companies developing AI solutions (Google, Microsoft, IBM, Facebook, and Amazon), Google, Microsoft, and IBM have research labs in the Global South, and all have either development centers, customer support centers, or data centers within these regions. Despite their presence throughout the Global South, AI research centers tend to be concentrated in certain countries. Within South and Southeast Asia, the representation of lab locations is limited to India; in South America, representation is limited to Brazil. In sub-Saharan Africa we find a bit more geographic spread, with AI labs established in Accra, Ghana; Nairobi, Kenya; and Johannesburg, South Africa.
Barriers to Participation
For a company to choose to establish an AI research center, the company must believe the initiative to be in its financial interest. Unfortunately, several barriers exist. The necessity of generating reliable returns for shareholders precludes ventures that appear too risky, especially for smaller companies. The perception of risk can take a variety of forms and may be influenced by stereotypes to differing extents. Two such factors are political and economic instability and a relatively lower proportion of tertiary formal education in the local population, both of which can be traced to the history of colonial exploitation and underdevelopment (Rodney 1972; Jarosz 2003; Bruhn and Gallego 2012), whereby European colonial powers extracted labour, natural resources, and economic surplus from colonies, while at the same time subordinating the colonies' economic development to that of the metropoles. It is hard to imagine the establishment of a top-tier research university, with the attendant technical training afforded to the local populace, in regions repeatedly denuded of wealth.
Problems with Participation
While the opening of data centers and AI research labs in the Global South appears beneficial for the local workforce, these positions may require technical expertise that the local population might not have. This can instead introduce opportunities for displacement by those from the Global North, who have had more access to the specialized training needed to develop, maintain, and deploy AI systems. Given the unequal global distribution of AI development, it is common for AI researchers and practitioners to work and study in places outside of their home countries (i.e., outside of the Global South). For example, the current director of Google AI Accra, originally from Senegal, was recruited to Google from Facebook AI Research in Menlo Park, CA (Adekanmbi 2018; Asemota 2018). The director of Microsoft's new lab in Nairobi, Kenya was recruited from Microsoft Research India; before that, she was a research scientist at Xerox in France (O'Neill 2020; Research 2020). While the directors of many research labs established in the Global South have experience working in related contexts, we find that local representation is sorely lacking at both the leadership and general workforce levels. Grassroots AI education and training initiatives by communities such as Deep Learning Indaba, Data Science Africa, and Khipu AI in Latin America aim to increase local AI talent, but since these initiatives are less than five years old, it is hard to measure their current impact on improving the pipeline of AI researchers and machine learning engineers. However, with these organizations publishing novel research at premier AI conferences, hosting conferences of their own, and much more, the path to inclusive representation in the global AI workforce is strengthening.

Although several tech companies have established research facilities across the world, including in the Global South, these efforts remain insufficient to address long-term problems in the AI ecosystem.
A recent report from Georgetown University's Center for Security and Emerging Technology (CSET) describes the establishment of AI labs abroad by US companies, namely Facebook, Google, IBM, and Microsoft (Heston and Zwetsloot 2020). The report notes that while 68% of the 62 AI labs are located outside of the United States, 68% of the staff are located within the United States. The international offices therefore remain, on average, half as populated as the domestic locations. Additionally, none of these offices are located in South America, and only four are in Africa. To advance equity within AI and improve inclusion efforts, it is imperative that companies not only establish locations in underrepresented regions, but also hire employees and include voices from those regions in a proportionate manner.

The CSET report also notes that AI labs abroad generally form in one of three ways: through the acquisition of startups; by establishing partnerships with local universities or institutions; and by relocating internal staff or hiring new staff in these locations (Heston and Zwetsloot 2020). The first two of these methods may favor locations with an already-established technological or AI presence, as many AI startups are founded in locations where a financial and technological support system exists for them. Similarly, the universities with which tech companies choose to partner are often already leaders in the space, as evidenced by Facebook's partnership with Carnegie Mellon professors and MIT's partnerships with both IBM and Microsoft. The general strategy of partnering with existing institutions and of acquiring startups has the potential to reinforce existing inequities by investing in locations with already thriving tech ecosystems. One notable exception is Google's investment in infrastructure, skills training, and startups in Ghana (Asemota 2018).
Long-term investment and planning in the Global South can form the stepping stones for broadening AI to include underrepresented and marginalized communities.

Even with long-term investment in regions of the Global South, the question remains whether local residents are given opportunities to join management and contribute to important strategic decisions. Several organizations have emphasized the need for AI development within a country to happen at the grassroots level, so that those implementing AI as a solution understand the context of the problem being solved (Mbayo 2020; Gul 2019). The necessity of indigenous decision-making is just as important in negotiating the values that AI technologies are to instantiate, such as through AI ethics declarations, which are at the moment heavily Western-based (Jobin, Ienca, and Vayena 2019). Although this is critical not only to the success of individual AI solutions but also to equitable participation within the field at large, more can and should be done. True inclusion necessitates that underrepresented voices be found in all ranks of a company's hierarchy, including in positions of upper management. Tech companies that are establishing a footprint in these regions are uniquely positioned to offer this opportunity to natives of the region. Taking advantage of this ability will be critical to ensuring that the benefits of AI apply not only to technical problems that arise in the Global South, but also to socioeconomic inequalities that persist around the world.
Opportunities
In the face of global inequality in AI development, there are a few promising opportunities.
Affinity Groups
While AI, and technology in general, has long excluded marginalized populations, grassroots efforts to ensure that indigenous communities are actively involved as stakeholders of AI have recently grown strong. Black in AI, a nonprofit organization with worldwide membership, was founded to increase the global representation of Black-identifying students, researchers, and practitioners in the field of AI, and has made significant strides in increasing the number of Black scholars attending and publishing at NeurIPS and other premier AI conferences (Earl 2020; Silva 2021). Inclusion in AI is extremely sparse in higher education, and recent efforts by Black in AI have focused on instituting programming to support members in graduate programs and in their postgraduate careers. Other efforts such as Khipu AI, based in Latin America, have been established to provide a venue to train aspiring AI researchers in advanced machine learning topics, foster collaborations, and actively participate in shaping how AI is being used to benefit Latin America. Other communities based on the African continent, such as Data Science Africa and Deep Learning Indaba, have expanded their efforts, establishing conferences, workshops, and dissertation awards, and developing curricula for the broader African AI community. These communities are clear about their respective missions and the focus of collaboration. Notably, Masakhane, a grassroots organization focusing on improving the representation of African languages in the field of natural language processing, shares the sentiment expressed in this paper on how AI research should be approached:

Masakhane are not just annotators or translators. We are researchers.
We can likely connect you with annotators or translators, but we do not support shallow engagement of Africans as only data generators or consumers (Masakhane 2021).

As these initiatives grow across the Global South, we hope large organizations and technology companies will partner with and adopt the values of these respective initiatives to ensure that AI developments are truly representative of the global populace.
Research Participation
One key component of AI inclusion efforts should be to elevate the involvement and participation of those historically excluded from technological development. Many startups and several governments across the Global South are creating opportunities for local communities to participate in the development and implementation of AI programs (Mbayo 2020; Gul 2019; Galperin and Alarcon 2018). In situations where the central involvement has been data labeling, strides should be taken to add model development roles to the opportunities available there. Currently, data labelers are often wholly detached from the rest of the ML pipeline, with workers oftentimes not knowing how their labor will be used or for what purpose (Graham 2018). Little sense of fulfillment comes from menial tasks, and by exploiting these workers solely for their produced knowledge without bringing them into the fold of the product that they are helping to create, a deep chasm opens between workers and the downstream product (Rogstadius et al. 2011). Thus, in addition to policies that improve working conditions and wages for data labelers, workers should be provided with educational opportunities that allow them to contribute to the models they are building in ways beyond labeling (Gray and Suri 2019). Similarly, where participation in the form of model development is the norm, employers should seek to involve local residents in the ranks of management and in the process of strategic decision-making. The advancement of an equitable AI workforce and ecosystem requires that those in data collection and training positions be afforded opportunities to lead their organizations. Including these voices in positions of power has the added benefit of encouraging the future hiring and promotion of local community members.
AI as Development
The massive inequalities in the development of AI can appear daunting. Will it ever be possible to close the gap? Similar concerns arise in the broader study of economic development, from which one can draw lessons. Despite the large developmental gap between the Global North and the Global South, the latter part of the 20th century saw some countries bridge it. For example, while the GDP per capita of South Korea was far lower than that of the USA in the 1960s, by 2000 the gap had considerably narrowed, especially in comparison to world GDP per capita over the same time period (see https://ourworldindata.org/grapher/average-real-gdp-per-capita-across-countries-and-regions?time=1869..2016&country=KOR~USA~OWID_WRL). Much work (Chang 2009; Lin 2011; Aryeetey and Moyo 2012; Mendes, Bertella, and Teixeira 2014) has linked the relative economic success of South Korea to the policy of import substitution industrialization (ISI), whereby a country attempts to replace foreign imports with domestic production in order to build high-productivity industries (e.g., electronics), rather than rely on exports of low-productivity industries (e.g., agriculture). The idea is that once the so-called "infant industries" have developed enough, they will be able to compete in international markets without government support. The execution of ISI involves protectionist trade policies, subsidies for targeted industries, and sufficient investment in education and infrastructure. While ISI can be incredibly successful, as in the cases of Samsung and POSCO in South Korea (Chang 2009), its execution relies on sufficient agricultural input and human capital, careful management of foreign reserves, and state capacity for coordination with private partners (Aryeetey and Moyo 2012; Mendes, Bertella, and Teixeira 2014). In the absence of these factors, ISI can fail and the country can even undergo de-industrialization.

We suggest viewing AI development as a path forward for economic development, in light of the lessons learned from ISI policies. Rather than rely upon foreign construction of AI systems for domestic application, where any returns from these systems are not reinvested domestically, we encourage the formation of domestic AI development activity. This activity should not be focused on low-productivity tasks, such as data labeling, but instead on high-productivity activities like model development, deployment, and research. An AI-focused ISI policy could include state-led investments in AI-related education and infrastructure, funding for private bodies to engage in domestic AI development, and limitations on the extent to which foreign companies may be involved in or profit from domestic AI activities. While it remains essential, as it was in historical ISI policies, to work with and assimilate technology and expertise from foreign companies, it is imperative that domestic expertise be developed in tandem to shape the future of AI development and reap its large profits.

This is by no means an easy task, and an AI-focused ISI policy encounters many of the same difficulties as historical ISI policies, such as the necessity of bringing in expertise and technology, and of ensuring that sufficient education and infrastructure (e.g., internet access) exist. It will likely encounter many new difficulties unique to AI development as well. Even in the absence of centralized state coordination, however, recent initiatives like Deep Learning Indaba and Khipu have promoted the importance of indigenous AI development and have advanced education in AI.
Conclusion
As the development of artificial intelligence continues to progress across the world, the exclusion of those from communities most likely to bear the brunt of algorithmic inequity only stands to worsen. We address this problem by exploring the challenges and benefits of broader inclusion in the field of AI. We examine the limits of current AI inclusion methods and the problems of participation in the AI labs that major tech companies have situated in the Global South, and we discuss opportunities for AI to accelerate development within disadvantaged regions.

We hope the actions we propose can help communities in the Global South begin to move from being just beneficiaries or subjects of AI systems to being active, engaged participants. Having true agency over the AI systems integrated into their livelihoods will maximize the impact of these systems and lead the way for global inclusion in AI.

As a limitation of our work, it is important to acknowledge that we are all currently located at, and have been educated at, North American institutions. Our positions in these institutions thus limit our perspective, and we acknowledge the considerations we may have missed and the voices we have not heard in the course of writing this work.
References
Bruhn, M.; and Gallego, F. A. 2012. Good, Bad, and Ugly Colonial Activities: Do They Matter for Economic Development? The Review of Economics and Statistics.
Chang, H.-J. 2009. Bad Samaritans: The Myth of Free Trade and the Secret History of Capitalism. New York, NY: Bloomsbury Press. ISBN 978-1-59691-598-5.
Chuvpilo, G. 2020. AI Research Rankings 2020: Can the United States Stay Ahead of China? URL https://chuvpilo.medium.com/ai-research-rankings-2020-can-the-united-states-stay-ahead-of-china-61cf14b1216.
Coalition for Critical Technology. 2020. Abolish the …
Denton, E.; Hanna, A.; Amironesei, R.; Smart, A.; Nicole, H.; and Scheuerman, M. K. 2020. Bringing the People Back In: Contesting Benchmark Machine Learning Datasets. ICML Workshop on Participatory Approaches to Machine Learning.
DeVries, T.; Misra, I.; Wang, C.; and van der Maaten, L. 2019. Does Object Recognition Work for Everyone? Computer Vision and Pattern Recognition Workshop (CVPRW).
Difallah, D.; Filatova, E.; and Ipeirotis, P. 2018. Demographics and Dynamics of Mechanical Turk Workers. Proceedings of WSDM: The Eleventh ACM International Conference on Web Search and Data Mining.
Earl, C. C. 2020. Notes from the Black In AI 2019 Workshop. URL https://charlesearl.blog/2020/01/08/notes-from-the-black-in-ai-2019-workshop/.
Edison Tech Center. 2011. General Electric Research Lab. URL https://edisontechcenter.org/GEresearchLab.html.
Frank, A. G. 1967. Capitalism and Underdevelopment in Latin America: Historical Studies of Chile and Brazil.
Graham, M. 2018. The rise of the planetary labour market – and what it means for the future of work. NS Tech.
Grand View Research. 2020. Data Collection & Labeling Market Size Worth …
Gray, M. L.; and Suri, S. 2019. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass.
Harris, C.; Straker, L.; and Pollock, C. 2017. A socioeconomic related 'digital divide' exists in how, not if, young people use computers. PLoS ONE.
Jobin, A.; Ienca, M.; and Vayena, E. 2019. The global landscape of AI ethics guidelines. Nature Machine Intelligence.
Lee, D. 2018. Why Big Tech pays poor Kenyans to teach self-driving cars. BBC News.
Lee, M. K.; Kusbit, D.; Kahng, A.; Kim, J. T.; Yuan, X.; Chan, A.; See, D.; Noothigattu, R.; Lee, S.; Psomas, A.; and Procaccia, A. D. 2019. WeBuildAI: Participatory Framework for Algorithmic Governance. Proceedings of the ACM on Human-Computer Interaction.
Nature. 1915. Industrial Research Laboratories. URL https://doi.org/10.1038/096419a0.
Nzekwe, H. 2019. Africans Are Paying More For Internet Than Any Other Part Of The World – Here's Why. URL https://weetracker.com/2019/10/22/africans-pay-more-for-internet-than-other-regions/.
Obermeyer, Z.; Powers, B.; Vogeli, C.; and Mullainathan, S. 2019. Dissecting racial bias in an algorithm used to manage the health of populations. Science.
Poskett, J. 2013. Django Unchained and the racist science of phrenology. The Guardian.
Ranger, T. 2001. Colonialism, Consciousness and the Camera. Past & Present.
Rodney, W. 1972. How Europe Underdeveloped Africa. London: Bogle-L'Ouverture Publications. ISBN 978-0-9501546-4-0.
Rogstadius, J.; Kostakos, V.; Kittur, A.; Smus, B.; Laredo, J.; and Vukovic, M. 2011. An Assessment of Intrinsic and Extrinsic Motivation on Task Performance in Crowdsourcing Markets. Proceedings of the Fifth International Conference on Weblogs and Social Media.
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; Berg, A. C.; and Fei-Fei, L. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV).
Semuels, A. 2018. The Internet Is Enabling a New Kind of Poorly Paid Hell. The Atlantic.
Silva, M. 2021. URL https://blackinai.github.io/.
Synced. 2019. Data Annotation: The Billion Dollar Business Behind AI Breakthroughs. URL https://medium.com/syncedreview/data-annotation-the-billion-dollar-business-behind-ai-breakthroughs-d929b0a50d23.
Thompson, A. 2016. Otherness and the Fetishization of Subject. URL https://petapixel.com/2016/11/16/otherness-fetishization-subject/.
Wang, A.; Narayanan, A.; and Russakovsky, O. 2020. REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets.