Measuring Diversity of Artificial Intelligence Conferences
A PREPRINT
Ana Freire, Lorenzo Porcaro and Emilia Gómez¹
Universitat Pompeu Fabra, Barcelona
Roc Boronat, 138. 08018 Barcelona (Spain)
Joint Research Centre, European Commission
Edificio Expo, Calle Inca Garcilaso, 3. 41092 Sevilla (Spain)
{ana.freire, lorenzo.porcaro, emilia.gomez}@upf.edu
June 11, 2020

Abstract
The lack of diversity in the Artificial Intelligence (AI) field is nowadays a concern, and several initiatives, such as funding schemes and mentoring programs, have been designed to fight it. However, there is no indication of how these initiatives actually impact AI diversity in the short and long term. This work studies the concept of diversity in this particular context and proposes a small set of diversity indicators (i.e. indexes) for AI scientific events. These indicators are designed to quantify the lack of diversity of the AI field and to monitor its evolution. We consider diversity in terms of gender, geographical location and business (understood as the presence of academia versus industry). We compute these indicators for the different communities of a conference: authors, keynote speakers and organizing committee. From these components we compute a summarized diversity indicator for each AI event. We evaluate the proposed indexes on a set of recent major AI conferences and discuss their values and limitations.
Introduction
It is well recognized that the Artificial Intelligence (AI) field is facing a diversity crisis, and that the lack of diversity contributes to perpetuating historical biases and power imbalances. Several reports, such as the European Ethics guidelines for trustworthy AI² and the latest AI Now Institute report [1], emphasize the urgency of fighting for diversity and of reconsidering diversity in a broader sense, including gender, culture, origin and other attributes, such as discipline or domain, that can contribute to a more diverse research and development of AI systems.

As a consequence, the research community has established different initiatives to increase diversity, such as mentoring programs, visibility efforts, travel grants, committee diversity chairs and special workshops³. However, there is no mechanism to measure and monitor the diversity of a scientific community and thus to assess the impact of these initiatives and policies.

To address this, we propose a methodology to monitor the diversity of a scientific community. We focus on scientific conferences, as they are currently the most relevant venue for disseminating AI research. We consider diversity in terms of gender, geographical location and academia vs. industry (with possible further extensions), and incorporate three different components of a scientific conference: authors, keynote speakers and organizers. After a literature review on diversity, we present the proposed indicators and illustrate them on a set of high-impact AI conferences.

¹ Authors' primary disciplines: Artificial Intelligence, Music Information Retrieval, Social Media Analysis, Diversity.
² https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
³ See, for instance, the activities launched by the Women in Machine Learning initiative: https://wimlworkshop.org
Background
Conceptualization of Diversity
Conceptualizing "diversity" is a long-standing debate in the scientific community and an object of study in several disciplines, such as ecology, geography, psychology, linguistics, sociology, economics and communication, among others [2]. In parallel, interest in estimating and evaluating the degree of diversity is often justified by the relevance of its possible impact: from the promotion of pluralism and of gender, racial and cultural equality to the enhancement of productivity, innovation and creativity in sociotechnical systems [3]. Reflecting this ubiquity, Stirling defines diversity, in a very broad sense, as "an attribute of any system whose elements may be apportioned into categories". Two facts arising from this definition are worth highlighting.

First, diversity can be considered the opposite of similarity, the two being two sides of the same coin [4]. Consequently, given a similarity measure $\mathrm{sim}(\cdot)$, a diversity measure $\mathrm{div}(\cdot)$ can be formulated as its opposite, which in normalized settings becomes $\mathrm{div}(\cdot) = 1 - \mathrm{sim}(\cdot)$. This approach is commonly adopted, for instance, in diversification strategies for recommender systems [5], where diverse is used as the complement of similar, implicitly assuming a comparison between objects. In Stirling's definition, however, diversity is not a comparative measure but an attribute. The difference lies in the question being asked: when comparing objects we ask "how different are two (or more) objects of this system?", whereas from the attribute perspective we ask "how much diversity does this system embed?".

Second, the definition contains two words that reflect different dimensions of diversity: elements and categories. The latter is closely related to the concept of richness, which can be interpreted as the number of categories present in a system.
The former, instead, is connected to the evenness of a system, i.e. the distribution of elements across the categories. Richness and evenness are the two facets of what the literature calls dual-concept diversity [2]. Along with them, disparity is the third dimension of diversity, indicating the degree of difference between categories [3]. Figure 1 gives a visual representation of the three attributes of diversity.

Nonetheless, even if Stirling's definition generalizes easily to different contexts, it is fundamental to notice that treating diversity as a sociotechnical concept admits several interpretations, depending on the context of use. As Drosou et al. state when analyzing diversity in Big Data applications [6], diversity can hardly be defined in a single, universal way. When modelling diversity from an AI point of view, completely abstracting away the social context in which the technology is embedded can be misleading, as Selbst et al. discuss in [7]. Although these authors frame their discussion around fairness, the problems they identify also arise when conceptualizing diversity, given the many aspects the two topics share, as pointed out by Celis et al. [8].

Figure 1: Schematic representation of the attributes of diversity, in the context of interdisciplinary analysis, from [9].
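To make the contrast between the comparative view ($\mathrm{div} = 1 - \mathrm{sim}$) and the attribute view concrete, the Python sketch below computes one of each for a toy system. Jaccard similarity over feature sets and averaging over all pairs are our own illustrative choices (one common way to build an intra-list diversity score in recommender systems), not a method prescribed by the works cited above; the function names, topic sets and labels are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(a: set, b: set) -> float:
    """Normalized similarity in [0, 1] between two feature sets."""
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(a | b)


def pairwise_diversity(items: list) -> float:
    """Comparative view: mean of div = 1 - sim over all unordered pairs of items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(1 - jaccard_similarity(a, b) for a, b in pairs) / len(pairs)


def richness(labels: list) -> int:
    """Attribute view (one dimension only): number of distinct categories present."""
    return len(set(labels))


# Hypothetical papers described by topic sets, and their category labels.
papers = [{"ml", "vision"}, {"ml", "nlp"}, {"music"}]
print(pairwise_diversity(papers))       # about 0.889 (= 8/9): how different are the items?
print(richness(["ml", "ml", "music"]))  # 2: how many categories does the system hold?
```

The first function needs a pairwise notion of similarity between objects, whereas richness only needs the category labels, which mirrors the two questions contrasted in Stirling's framing.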
Measurement of Diversity
The difficulties that arise in conceptualizing diversity are inevitably reflected in attempts to establish a formula representing the level of diversity in a system. Nevertheless, needs arising in several fields have led to measures that are still in use and effective today. In what follows, we refer to a diversity index as a quantitative measure of the relationship between the elements of a system and the categories in which they are distributed.
Two diversity measures that are still in use were proposed in the late 1940s: the so-called Shannon index ($H'$) [10] and the Simpson index ($D$) [11]. Although they originated in two different fields of study, information theory in the former case and ecology in the latter, both are based on the idea of choice and uncertainty. Indeed, while writing his Mathematical Theory of Communication, Shannon arrived at the formula by asking what measure would be suitable to describe the degree of uncertainty involved in choosing at random one event within a set of events. Similarly, Simpson formulated his index as the probability that two individuals chosen at random from a population belong to the same group.

The main limitation of these measures is that they rely on statistical techniques focused on the frequency of the elements, leaving aside semantic information. Bar-Hillel and Carnap [12] contextualized this limitation by proposing the Theory of Semantic Information, which considers the meaning of symbols, in contrast to the frequentist approach. The semantic gap of diversity measurements was partly bridged by the introduction of the third dimension of diversity, disparity, which joins variety and balance in a more solid framework for diversity analysis. As Stirling discusses in [3], disparity indicates the degree of difference between the categories of a system. This dimension is included in the general diversity heuristic known as the Rao-Stirling diversity index:

$\Delta = \sum_{i,j;\, i \neq j} (d_{ij})^{\alpha} \, (p_i \cdot p_j)^{\beta}$ (1)

where $d_{ij}$ indicates the disparity between categories $i$ and $j$, and $p_i$ and $p_j$ are the proportional representations of those categories. This measure, initially proposed by Rao [13] and revisited by Stirling [3], is often used to analyse research interdisciplinarity in scientometric studies, although its results are still being discussed by the scientific community, as recently done by Leydesdorff et al. [14].

In the following subsections, we describe separately the indexes used in our diversity analysis.

Shannon Index

$H' = -\sum_{i=1}^{S} p_i \ln p_i$ (2)

where $p_i = n_i / N$ is the proportion of individuals belonging to species $i$ (the number $n_i$ of individuals of that species found, divided by the total number $N$ of individuals found), and $S$ is the number of species.

The Shannon index takes values between 1.5 and 3.5 in most ecological studies and is rarely greater than 4. It increases as both the richness and the evenness of the community increase. The fact that the index incorporates both components of biodiversity can be seen as both a strength and a weakness: a strength because it provides a simple, synthetic summary, and a weakness because it makes it difficult to compare communities that differ greatly in richness.

Pielou Index
The Shannon evenness, which discards richness, can be computed by means of the Pielou index [15]:

$J' = \frac{H'}{H'_{\max}}$ (3)

where $H'$ is the Shannon diversity index and $H'_{\max}$ is the maximum possible value of $H'$, reached when every species is equally likely:

$H'_{\max} = -\sum_{i=1}^{S} \frac{1}{S} \ln \frac{1}{S} = \ln S$ (4)

$J'$ is constrained between 0 and 1, with 1 indicating the highest evenness.

Simpson Index
The Simpson diversity index is a dominance index because it gives more weight to common or dominant species. In this case, a few rare species with only a few representatives will not affect the diversity. Since $D$ takes values between 0 and 1 and approaches 1 in the limit of a monoculture, $1 - D$ provides an intuitive proportional measure of diversity that is much less sensitive to species richness. Thus, Simpson's index is usually reported as its complement, $1 - D$.
$D = \frac{\sum_{i=1}^{S} n_i (n_i - 1)}{N (N - 1)}$ (5)

Table 1: Diversity Indexes
Index | Notation | Based on | Range
Gender Diversity Index | GDI | Pielou Index | [0, 1]
Geographical Diversity Index | GeoDI | Shannon Index | ∼[0, …]
Business Diversity Index | BDI | Pielou Index | [0, 1]
Conference Diversity Index | CDI | - | [0, 1]

Table 2: Analysed conferences
Conference | Acronym | Years
Conference on Neural Information Processing Systems | NeurIPS | 2017, 2018
International Joint Conferences on Artificial Intelligence | IJCAI | 2017, 2018
International Conference on Machine Learning | ICML | 2017, 2018
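The measures reviewed in the Background section can be sketched in a few lines of Python. This is a minimal illustration written from Equations (1) to (5), not code released with this study; the function names, the example counts and the convention that $S$ is the length of the count vector (so that absent species still count towards $\ln S$) are our assumptions. Following Equation (1), the Rao-Stirling sum runs over ordered pairs $i \neq j$.

```python
import math


def shannon(counts):
    """Eq. (2): H' = -sum_i p_i ln p_i, with p_i = n_i / N (empty species skipped)."""
    N = sum(counts)
    return sum(-(n / N) * math.log(n / N) for n in counts if n > 0)


def pielou(counts):
    """Eqs. (3)-(4): J' = H' / ln S, where S = len(counts) includes empty species."""
    S = len(counts)
    if S < 2:
        raise ValueError("evenness needs at least two species")
    return shannon(counts) / math.log(S)


def simpson(counts):
    """Eq. (5): D = sum_i n_i (n_i - 1) / (N (N - 1)); 1 - D is usually reported."""
    N = sum(counts)
    if N < 2:
        raise ValueError("Simpson's D needs at least two individuals")
    return sum(n * (n - 1) for n in counts) / (N * (N - 1))


def rao_stirling(p, d, alpha=1.0, beta=1.0):
    """Eq. (1): sum over ordered pairs i != j of d_ij^alpha * (p_i * p_j)^beta."""
    S = len(p)
    return sum(
        d[i][j] ** alpha * (p[i] * p[j]) ** beta
        for i in range(S)
        for j in range(S)
        if i != j
    )


counts = [8, 1, 1]  # hypothetical: 10 individuals spread over 3 species
print(shannon(counts), pielou(counts), 1 - simpson(counts))
```

Note that `simpson` returns $D$ itself, so the complement $1 - D$ is taken at the call site, as in the text.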
Indexes definition
This work proposes several diversity indicators to measure gender, geographical and business diversity in top Artificial Intelligence conferences. All our indicators are based on the biodiversity indexes described in the previous sections.
Gender Diversity Index (GDI)
We consider three different species ($S = 3$) in the gender dimension: "male", "female" and "other". The Simpson index favours the dominant species, which is not the desired behaviour here, as we want the index to be affected by the minorities (in this case, "female" or "other"). With only three species, richness is of little relevance, whereas evenness matters more.

For these reasons, we discard the Simpson index and compute only the Shannon evenness (discarding richness) by means of the Pielou diversity index.

To calculate the Gender Diversity Index, we consider three different communities: keynotes ($k$), authors ($a$) and organisers ($o$). Our final GDI is a weighted average of the Pielou index of each community:

$GDI = w_k J'_k + w_a J'_a + w_o J'_o$ (6)

giving the highest weight to keynotes (the most visible diversity) and the lowest weight to organisers (the least visible):

$W = [w_k, w_a, w_o] = \left[\frac{1}{2}, \frac{3}{10}, \frac{1}{5}\right]$ (7)
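Equations (6) and (7) can be sketched as follows. This is our own minimal illustration, not the project's code; the function names and the example counts (ordered as male, female, other) are hypothetical. One consequence of the choice of $S$ is worth keeping in mind: with $S = 3$ but only two species observed, $J'$ can never exceed $\ln 2 / \ln 3 \approx 0.63$, which is why $S$ is exposed as a parameter here.

```python
import math

# Community weights of Eq. (7): keynotes 1/2, authors 3/10, organisers 1/5.
WEIGHTS = {"keynotes": 1 / 2, "authors": 3 / 10, "organisers": 1 / 5}


def pielou(counts, S):
    """Eq. (3): Shannon index of the counts divided by H'_max = ln S."""
    N = sum(counts)
    if N == 0:
        raise ValueError("community is empty")
    h = sum(-(n / N) * math.log(n / N) for n in counts if n > 0)
    return h / math.log(S)


def gdi(communities, S=3):
    """Eq. (6): weighted average of the per-community Pielou evenness."""
    return sum(WEIGHTS[name] * pielou(counts, S) for name, counts in communities.items())


# Hypothetical counts as [male, female, other] for each community.
example = {
    "keynotes": [4, 3, 0],
    "authors": [90, 10, 0],
    "organisers": [20, 6, 0],
}
print(gdi(example, S=3))
```

Because the weights sum to 1 and each $J'$ lies in $[0, 1]$, the GDI also stays in $[0, 1]$.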
Geographical Diversity Index (GeoDI)
To calculate the Geographical Diversity Index we consider the same three communities: keynotes, authors and organisers. Since there are many species (countries), we want to measure richness together with evenness, so we take the weighted average of the per-community Shannon index (which may be greater than 1), using the weights $W$ defined in Equation 7:

$GeoDI = w_k H'_k + w_a H'_a + w_o H'_o$ (8)

This index could also be calculated with the Simpson index, especially if we want to avoid the effect of very infrequent species (few people from some countries).
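A minimal sketch of Equation (8), computed directly from raw affiliation-country labels, is shown below. The function names and the country codes are hypothetical illustrations, not part of the original study.

```python
import math
from collections import Counter

# Community weights of Eq. (7).
WEIGHTS = {"keynotes": 1 / 2, "authors": 3 / 10, "organisers": 1 / 5}


def shannon(labels):
    """Eq. (2) computed from raw labels (e.g. affiliation countries)."""
    counts = Counter(labels)
    N = sum(counts.values())
    return sum(-(n / N) * math.log(n / N) for n in counts.values())


def geodi(communities):
    """Eq. (8): weighted average of per-community Shannon indexes (may exceed 1)."""
    return sum(WEIGHTS[name] * shannon(labels) for name, labels in communities.items())


# Hypothetical affiliation countries per community.
example = {
    "keynotes": ["US", "US"],
    "authors": ["US", "ES"],
    "organisers": ["US", "ES", "IT", "CN"],
}
print(geodi(example))  # 0.3 * ln 2 + 0.2 * ln 4 = 0.7 * ln 2
```

Unlike the Pielou-based indexes, nothing here divides by $\ln S$, so adding countries raises the value without bound; this is what motivates the rescaling applied before combining GeoDI with the other indexes.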
Business Diversity Index (BDI)
The Business Diversity Index aims to quantify the diversity of a conference regarding the presence of industry, academia and research centres. Its formula is therefore very similar to that of the GDI (see Equation 6), here with $S = 3$ when computing $H'_{\max}$ (Equation 4) for Equation 3, and with the weights $W$ defined in Equation 7:

$BDI = w_k J'_k + w_a J'_a + w_o J'_o$ (9)

Conference Diversity Index (CDI)
The general Conference Diversity Index (CDI) is calculated by averaging GDI, GeoDI and BDI. Since typical Shannon values lie between 1.5 and 3.5 in most ecological studies and are rarely greater than 4, GeoDI needs to be normalized to [0, 1] before being combined with the other indexes. Dividing it by 4, however, would always yield very low GeoDI values, as some countries tend to be poorly represented. Experimentally, we observed that this index is below 2 for most conferences, so we normalise GeoDI by dividing it by 2, obtaining values comparable to those of the other indexes (we assume that it is difficult for all countries to be equally represented at a conference, so we try to smooth the penalization for this). Table 1 summarises all the proposed indexes.

$CDI = \frac{1}{3}\left(GDI + \frac{GeoDI}{2} + BDI\right)$ (10)

Indexes evaluation
In this section we describe how the data were handled in order to assess the ability of the proposed indexes to represent the diversity of major AI events.
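Putting Equations (6) to (10) together, the evaluation can be sketched as a small pipeline from annotated records to the four indexes. The record layout (`role`, `gender`, `country`, `sector` fields), the function names and the example data are hypothetical; the defaults follow the choices stated in this paper ($S = 2$ for gender given the available data, $S = 3$ for business, and GeoDI divided by 2 before averaging).

```python
import math
from collections import Counter

WEIGHTS = {"keynotes": 1 / 2, "authors": 3 / 10, "organisers": 1 / 5}  # Eq. (7)


def shannon(labels):
    """Eq. (2) from raw labels; an empty community contributes 0."""
    counts = Counter(labels)
    N = sum(counts.values())
    return sum(-(n / N) * math.log(n / N) for n in counts.values())


def weighted_index(records, field, index):
    """Apply `index` to the `field` labels of each community and combine with W."""
    return sum(
        w * index([r[field] for r in records if r["role"] == role])
        for role, w in WEIGHTS.items()
    )


def conference_indexes(records, gender_species=2, business_species=3, geo_scale=2.0):
    """GDI (Eq. 6), GeoDI (Eq. 8), BDI (Eq. 9) and CDI (Eq. 10) for one conference."""
    gdi = weighted_index(records, "gender", lambda xs: shannon(xs) / math.log(gender_species))
    geodi = weighted_index(records, "country", shannon)
    bdi = weighted_index(records, "sector", lambda xs: shannon(xs) / math.log(business_species))
    cdi = (gdi + geodi / geo_scale + bdi) / 3
    return {"GDI": gdi, "GeoDI": geodi, "BDI": bdi, "CDI": cdi}


# Hypothetical records: one row per keynote, sampled author or organiser.
records = [
    {"role": role, "gender": g, "country": c, "sector": s}
    for role in WEIGHTS
    for g, c, s in [("female", "ES", "academia"), ("male", "US", "industry")]
]
print(conference_indexes(records))
```

Keeping the species counts and the GeoDI scale as parameters makes it easy to revisit those choices, for instance if registration data ever allow more gender options.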
Dataset
The information about keynotes, authors and organizers of the AI conferences considered in this study (see Table 2) was collected directly from the conferences' websites.

To calculate the diversity indexes, we need $p = n/N$ (i.e. the number $n$ of individuals of a particular species found, divided by the total number $N$ of individuals found) and $S$ (i.e. the number of species). For this purpose, we collected the names and affiliations of the keynotes, organisers and authors (the latter from a random sample comprising 10% of the papers) of each conference. The data were gathered during a hackfest using a collaborative web application designed for this purpose, which also serves to disseminate the results and engage the research community on AI diversity⁴. All the project data and material are openly available, so that the study can be reproduced and extended to other conferences.

Calculating GDI
When calculating the Gender Diversity Index, since we do not have access to gender identity information, we tag each person with the gender most commonly associated with their first name. In some cases, we had to look for the person's personal web page or picture using a search engine.

Because the dataset does not allow us to identify further gender options, we obtain $S = 2$ species: male and female. The value of $S$ should be changed accordingly if the dataset includes more gender options (for instance, if it is provided by the organisers of a conference, using information collected during registration).

Calculating GeoDI
To measure the Geographical Diversity Index, the only available information is the country of each affiliation. This means that we capture the current location of each individual rather than their nationality. This limitation could be overcome, again, by asking for nationality in the registration form and building the dataset from that information.

⁴ https://divinai.org
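The results also report GeoDI at continent level. The sketch below, a hypothetical illustration rather than the project's code, groups affiliation countries into continents using a deliberately partial lookup table (`CONTINENT` and the country codes are our assumptions) and computes richness and the Shannon index at both levels. Because merging categories can never increase the Shannon index, the continent-level value is at most the country-level one.

```python
import math
from collections import Counter

# Hypothetical, deliberately partial lookup table; a real study needs a complete one.
CONTINENT = {
    "US": "America", "CA": "America",
    "ES": "Europe", "IT": "Europe", "DE": "Europe",
    "CN": "Asia", "JP": "Asia",
    "AU": "Oceania",
}


def shannon(labels):
    """Eq. (2) computed from raw labels."""
    counts = Counter(labels)
    N = sum(counts.values())
    return sum(-(n / N) * math.log(n / N) for n in counts.values())


def geo_summary(countries):
    """(richness, Shannon index) at country level and at continent level."""
    continents = [CONTINENT[c] for c in countries]
    return {
        "countries": (len(set(countries)), shannon(countries)),
        "continents": (len(set(continents)), shannon(continents)),
    }


# Hypothetical affiliation countries of one community.
print(geo_summary(["US", "CA", "ES", "IT"]))
```

In this example four equally represented countries collapse into two continents, so the index drops from $\ln 4$ to $\ln 2$.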
Calculating BDI
The Business Diversity Index aims to quantify the diversity of a conference regarding the presence of industry, academia and research centres. Once again, the affiliation provides this information, although in some cases a specific search was needed to label the data. In this case, we set $S = 3$.

Results
In this section, we analyse the diversity indexes calculated for the selected conferences. We structure the analysis in four parts, corresponding to the four diversity indexes: GDI, GeoDI, BDI and the general CDI.

Table 3 reports the percentage of female and male individuals (no individuals were characterized as "other") among authors, keynotes and organizers. In general, the GDI values obtained are quite high (the minimum being GDI(NeurIPS) = 0.…), even though the percentage of women among authors is far lower than that of men. Note, however, that author diversity contributes 30% of the GDI, organizer diversity 20% and keynote diversity 50% (we want to reward conferences that promote experts from underrepresented groups). Looking at the keynotes, most of the conferences are quite balanced (ICML reached 50% female representation among keynote speakers in both 2017 and 2018).

The analysis of the Geographical Diversity Index is summarised in Table 4. As mentioned before, the maximum value obtained for GeoDI was lower than 2, so we divide it by 2 to obtain a measure between 0 and 1 that is comparable with the other indexes. The values reported as GeoDI/2 are calculated over the countries present in each role. We also grouped this information by continent, to explore how the index varies when computed over these major geographical divisions, which can be a more understandable representation of geographical diversity (GeoDI_continents/2). The indexes calculated over countries are higher than those calculated over continents, as there are few continental species (usually three: America, Europe and Asia; rarely four, including Oceania) and they are not equally represented. We note that we could not find any representation of Africa (nor of Antarctica), and that America is represented only by the US and Canada.

Table 5 reports the Business Diversity Index of the studied conferences. The BDI values of NeurIPS 2018 and ICML 2017 differ markedly from those of the other conferences, most of which are essentially academic, with no representation of industry or research centres for some of the considered roles (authors, keynotes, organizers).
However, NeurIPS 2018 shows more business diversity (BDI = 0.…), as academia, industry and research centres are represented in all roles. In the case of ICML 2017 (BDI = 0.…), the high index is due to the good balance among the different species, even though no organiser comes from a research centre.

The diversity indexes convey much more information at a glance than the other reported measures (percentages, number of countries, etc.). In particular, the general Conference Diversity Index (CDI) summarizes, in a single value, the gender, geographical and business diversity of a conference. It also provides a useful measure for monitoring how the diversity of a conference evolves and for easily comparing it with other conferences on the same topic. Figure 2 shows how the studied conferences evolve in terms of the CDI. ICML 2017 shows the highest diversity (closest to 1).

Figure 2: Diversity evolution of the studied conferences using the general Conference Diversity Index (CDI).
Conclusions
This work aims to raise awareness about the lack of diversity in Artificial Intelligence by defining a set of indicators that measure the gender, geographical and business diversity of conferences. We explored a set of recent top AI conferences, calculating their indexes and comparing them in terms of diversity. The numbers show a huge gender imbalance among authors and a lack of geographical diversity (with no representation of African countries). However, they also provide evidence of recent efforts to promote minorities among keynote speakers, with gender balance reached in several conferences.

The indexes proposed in this work can easily be applied to conferences on other topics.

Table 3: Gender Diversity Index (GDI), together with the percentage of female and male individuals among authors (from a random sample of 10% of the papers), keynotes and organisers
Conference | %Female Authors | %Female Keynotes | %Female Organisers | %Male Authors | %Male Keynotes | %Male Organisers | GDI
NeurIPS 2018 7.10 42.90 20.90 92.90 57.10 79.10
NeurIPS 2017 10.30 42.90 14.30 89.70 57.10 85.70
IJCAI 2018 9.30 37.50 53.80 90.70 62.50 46.20
IJCAI 2017 14.90 28.60 27.80 85.10 71.40 72.20
ICML 2018 9.80 50.00 28.90 90.20 50.00 71.10
ICML 2017 7.80 50.00 29.40 92.20 50.00 70.60
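Table 4: Geographical Diversity Index (GeoDI), together with the number of countries and, in parentheses, of continents represented (note that authors were collected only for a random sample of 10% of the papers)
Conference | Authors | Keynotes | Organisers | GeoDI/2 | GeoDI_continents/2
NeurIPS 2018 14 (3) 2 (2) 11 (3)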
NeurIPS 2017 17 (4) 3 (2) 3 (2)
IJCAI 2018 10 (4) 4 (2) 8 (3)
IJCAI 2017 12 (4) 4 (3) 10 (4)
ICML 2018 14 (4) 2 (2) 7 (2)
ICML 2017 13 (3) 3 (2) 6 (4)
Table 5: Business Diversity Index (BDI), together with the percentage of authors (from a random sample of 10% of the papers), keynotes and organisers belonging to Academia, Industry or Research Centres
Conference | %Academia Auth | %Academia Key | %Academia Org | %Industry Auth | %Industry Key | %Industry Org | %Research Centre Auth | %Research Centre Key | %Research Centre Org | BDI
NeurIPS 2018 72.30 71.40 59.70 9.20 14.30 31.30 18.50 14.30 8.90
NeurIPS 2017 61.80 57.10 42.90 27.90 42.90 57.10 10.30 0 0
IJCAI 2018 75.30 75.00 100.00 15.50 25.00 0 9.20 0 0
IJCAI 2017 90.40 85.70 94.40 3.50 14.30 0 6.10 0 5.60
ICML 2018 66.50 100.00 77.80 27.40 0 17.80 6.10 0 4.40
ICML 2017 51.80 50.00 88.20 34.20 25.00 11.80 14.00 25.00 0
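To show how Table 5 feeds Equation (9), the sketch below recomputes BDI from the reported percentages of two conferences. It is our own illustration, not the study's code: the percentages are rounded and the author shares come from a 10% sample, so its output should not be read as the BDI values reported in the study. The function names are hypothetical; the shares are ordered as academia, industry, research centre.

```python
import math

# Community weights of Eq. (7).
WEIGHTS = {"keynotes": 1 / 2, "authors": 3 / 10, "organisers": 1 / 5}


def pielou(shares, S=3):
    """Pielou evenness from percentages (they need not sum exactly to 100)."""
    total = sum(shares)
    h = sum(-(x / total) * math.log(x / total) for x in shares if x > 0)
    return h / math.log(S)


def bdi(row):
    """Eq. (9); `row` maps community -> [%academia, %industry, %research centre]."""
    return sum(WEIGHTS[name] * pielou(shares) for name, shares in row.items())


# Percentages transcribed from Table 5.
neurips_2018 = {"authors": [72.30, 9.20, 18.50], "keynotes": [71.40, 14.30, 14.30],
                "organisers": [59.70, 31.30, 8.90]}
ijcai_2018 = {"authors": [75.30, 15.50, 9.20], "keynotes": [75.00, 25.00, 0],
              "organisers": [100.00, 0, 0]}
print(bdi(neurips_2018), bdi(ijcai_2018))
```

The sketch reproduces the qualitative pattern described in the results: NeurIPS 2018, with all three sectors present in every role, scores higher than IJCAI 2018, whose organisers are entirely academic.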
Acknowledgements
This work has been partially supported by the HUMAINT programme (Human Behaviour and Machine Intelligence), Centre for Advanced Studies, Joint Research Centre, European Commission. The authors acknowledge financial support from the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502). Lorenzo Porcaro acknowledges financial support from the European Commission under the TROMPA project (H2020 770376).
References
[1] Sarah Myers West, Meredith Whittaker, and Kate Crawford. Discriminating systems: Gender, race and power in AI. AI Now Institute, 2019.
[2] Daniel G. McDonald and John Dimmick. The conceptualization and measurement of diversity. Communication Research, 30(1):60–79, 2003.
[3] Andy Stirling. A general framework for analysing diversity in science, technology and society. Journal of The Royal Society Interface, 4(15):707–719, 2007.
[4] Barry Smyth and Paul McClave. Similarity vs. Diversity. In Proceedings of the International Conference on Case-Based Reasoning (ICCBR) 2001, pages 347–361, 2001.
[5] Matevž Kunaver and Tomaž Požrl. Diversity in recommender systems – A survey. Knowledge-Based Systems, 123:154–162, 2017.
[6] Marina Drosou, H.V. Jagadish, Evaggelia Pitoura, and Julia Stoyanovich. Diversity in Big Data: A Review. Big Data, 5(2):73–84, 2017.
[7] Andrew D. Selbst, Danah Boyd, Sorelle A. Friedler, Suresh Venkatasubramanian, and Janet Vertesi. Fairness and Abstraction in Sociotechnical Systems. In ACM Conference on Fairness, Accountability, and Transparency (FAT*), volume 1, pages 59–68, 2018.
[8] L. Elisa Celis, Amit Deshpande, Tarun Kathuria, and Nisheeth K. Vishnoi. How to be Fair and Diverse? In Fairness, Accountability and Transparency in Machine Learning (FAT/ML) 2016, 2016.
[9] Ismael Rafols and Martin Meyer. Diversity and network coherence as indicators of interdisciplinarity: Case studies in bionanoscience. Scientometrics, 82(2):263–287, 2010.
[10] Claude Elwood Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.
[11] Edward H. Simpson. Measurement of diversity. Nature, 163(4148):688, 1949.
[12] Yehoshua Bar-Hillel and Rudolf Carnap. Semantic Information. The British Journal for the Philosophy of Science, 4(14):147–157, 1953.
[13] C.R. Rao. Diversity: Its measurement, decomposition, apportionment and analysis. The Indian Journal of Statistics, 1(44):1–22, 1982.
[14] Loet Leydesdorff, Caroline S. Wagner, and Lutz Bornmann. Interdisciplinarity as diversity in citation patterns among journals: Rao-Stirling diversity, relative variety, and the Gini coefficient. Journal of Informetrics, 13(1):255–269, 2019.
[15] Evelyn C. Pielou. The measurement of diversity in different types of biological collections.