Modeling, Visualization, and Analysis of African Innovation Performance
Published as a conference paper at ICLR 2020
MODELING, VISUALIZATION, AND ANALYSIS OF AFRICAN INNOVATION PERFORMANCE
Muhammad Omer, Moayad El-Amin, Ammar Nasr & Rami Ahmed ∗
Faculty of Engineering, University of Khartoum
Khartoum, Gamaa Ave., Sudan
{mhmdgaffar98,mo2yd99,ammarnasraza,ramisketcher}@gmail.com
ABSTRACT
In this paper we discuss the concept and emergence of innovation performance and how to quantify it, working primarily with data from the Global Innovation Index, with emphasis on African innovation performance. We briefly review existing literature on using machine learning to model innovation performance, and use simple machine learning techniques to analyze and predict the "Mobile App Creation" indicator from the Global Innovation Index, using insights from the Stack Overflow developer survey. We also build and compare models to predict the Innovation Output Sub-index, also from the Global Innovation Index.
1 INTRODUCTION
Measuring the innovation of a nation or region has emerged as a common field of study in the last few years. The complexity of each nation, together with its existing policy structure and infrastructure, makes it a major challenge to create a standardized system that generalizes across regions or across the globe. National innovation performance measures do exist in parts of the world, where they are used to inform policy and to identify improvements in a country's infrastructure and policies towards innovation. One example is the report created by the Advisory Committee on Measuring Innovation in the 21st Century Economy and presented to the US Secretary of Commerce in 2008, which, amongst other goals, aimed to develop better ways to quantify innovation in the marketplace, to guide the government towards creating frameworks for measuring innovation, and to direct policies that aim to improve innovation performance in the United States. Work on regional innovation metrics has been somewhat more successful. The European Union publishes a yearly report summarizing the innovative performance of its regions, called "The Regional Innovation Scoreboard". This scoreboard divides the EU regions into four distinct classes: Regional Innovation Leaders, Regional Strong Innovators, Regional Moderate Innovators, and Regional Modest Innovators. Finding a unified global index is an especially difficult task. For example, the pioneering work by Stern et al. (2000) on national innovative capacity provides a good historical overview of the countries it covers, but the limited diversity of these countries is a problem: it only looks at a specific part of the world, mainly the U.S., Europe, and the high to upper-middle income Asian countries, with no representation of third world countries, and of the African region in particular.
The Global Innovation Index (GII), which we have chosen as a benchmark for our analysis in this paper, helps to create an environment in which innovation factors are continually evaluated. It is divided into two major indices: the Innovation Input Sub-index and the Innovation Output Sub-index. The Innovation Input Sub-index has five factors, evaluating Institutions, Human Capital and Research, Infrastructure, Market Sophistication, and Business Sophistication, while the Innovation Output Sub-index gives insights on Knowledge and Technology Outputs and Creative Outputs. The GII addresses the problem of representation, as it covers 129 countries from different income classes in its 2019 report. However, the problem of quantifying innovation performance worldwide, and in Africa in particular, is far from solved. The Global Innovation Index is marred here and there by missing data and arguably misleading indicators. Since most of the incomplete country profiles come from developing countries, especially in Africa, we take on one indicator, Mobile App Creation, that has a high ratio of missing data in the very same 2019 report, and try to build a robust estimator to interpolate over its values, as we discuss in the next section.
Finally, employing machine learning techniques in the analysis and modeling of such innovation performance measures is a promising approach for building robust predictors and providing deep insights that might lead to novel ideas. However, contributions from the machine learning community on this topic are very few, and none of them addresses Africa in particular. Two works that we think should be discussed are Bacon et al. (2019) and Hajek & Henriques (2017). The first, whose approach we follow closely here, performs its analysis on the Global Innovation Index, and the second uses multi-output artificial neural networks to model regional European innovation, operating on data from the EU's "Nomenclature of Territorial Units for Statistics". Both works conclude that machine learning algorithms perform better at modeling innovation data than the traditional analytical methods popular in the literature.
∗ Work was completed while working at Innovation Baylasan, a social enterprise in Khartoum.
2 METHODOLOGY
Our two main contributions in this paper are building a model that uses insights from alternative data sources to predict the Mobile App Creation indicator in the Global Innovation Index worldwide data for 2019, and building a model to predict the Innovation Output Sub-index for a set of African countries over the last six years (2014-2019), also using the Global Innovation Index data for those years. For the first, we used survey data provided by Stack Overflow on their website. The survey is organized by Stack Overflow, and over 90,000 developers participate each year; it provides deep and comprehensive insights on the state of software development in each country represented. The survey, which takes the form of multiple-choice questions, has more than 300 questions and takes about 20 minutes to complete. We chose a set of 30 questions from this survey that we believe best represent the status quo of each country's local software market, as well as the core competencies of each developer, along with their corresponding countries. We then removed unique IDs and averaged the one-hot encoded column values over countries, to produce a structure similar to that of the Global Innovation Index, and finally merged the data set with the Mobile App Creation indicator data by country. After that, we produced a correlation matrix to learn about the interaction between features and to determine whether the developer survey data was relevant at all to the Mobile App Creation indicator. Finally, we built four models, a Gaussian Process model, an Extreme Gradient Boosted Trees (XGBoost) model, a Support Vector Machine (SVM) model, and a Random Forest model, and compared their performance using k-fold cross-validation with k = 10, with the root mean squared error (RMSE) as the measure. The models took all the averaged/weighted survey questions as input and predicted the Mobile App Creation indicator.
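The survey preparation steps described above (drop IDs, one-hot encode, average per country, merge with the indicator, correlate) can be sketched with pandas as follows. This is a minimal illustration and not the authors' code: the column names (`Respondent`, `Country`, `DevType`, `Employment`, `MobileAppCreation`), the placeholder countries, and every value in the toy tables are assumptions made up for the example.

```python
import pandas as pd


def country_level_features(survey, question_cols,
                           id_col="Respondent", country_col="Country"):
    """One-hot encode the chosen questions and average them per country."""
    answers = survey.drop(columns=[id_col])  # remove unique respondent IDs
    encoded = pd.get_dummies(
        answers[[country_col] + list(question_cols)],
        columns=list(question_cols),
        dtype=float,
    )
    # The mean of a 0/1 column is the share of respondents giving that answer.
    return encoded.groupby(country_col).mean().reset_index()


# Toy survey with hypothetical column names and made-up answers.
survey = pd.DataFrame({
    "Respondent": [1, 2, 3, 4, 5, 6],
    "Country": ["A", "A", "B", "B", "B", "C"],
    "DevType": ["mobile", "web", "mobile", "mobile", "web", "web"],
    "Employment": ["freelance", "employed", "employed",
                   "freelance", "employed", "employed"],
})

# Made-up indicator values, NOT real Global Innovation Index figures.
gii = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "MobileAppCreation": [1.0, 12.5, 8.0],
})

features = country_level_features(survey, ["DevType", "Employment"])
merged = features.merge(gii, on="Country", how="inner")  # join by country
corr = merged.drop(columns="Country").corr()  # Pearson correlation matrix
print(corr["MobileAppCreation"])
```

An inner join keeps only countries present in both tables, so countries without a Mobile App Creation value drop out of the training set; whether the original pipeline did the same is not stated in the paper.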
For the second contribution, we followed an approach similar to the one employed in Bacon et al. (2019). We used the extended report, which has 81 detailed features of the 7 aforementioned indices (five input and two output). We extracted the data for Africa for the last six years, spanning 2014 to 2019 and covering 36 African countries, some of which are represented in only 2 years. Unlike Bacon et al. (2019), we did not drop countries with incomplete profiles, since that would eliminate half of the African countries from the data set. Also, our analysis focuses only on African countries, with keen consideration of the context and shortcomings of developing countries, rather than merely building a predictive model for global innovation performance. We again produced a correlation matrix and employed the four models mentioned above; the only difference is that this time the models accept the 81 features described and output a prediction of the Innovation Output Sub-index. Finally, we visualized the fluctuation in innovation performance through these years on a geographical map, using geopandas.
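The comparison of four regressors under 10-fold cross-validation with RMSE can be sketched with scikit-learn as below. Several parts are assumptions rather than the paper's setup: scikit-learn's `GradientBoostingRegressor` is used as a stand-in for XGBoost, median imputation is one possible way to keep incomplete country profiles (the paper does not say how missing values were handled), and the data is synthetic with 81 features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def compare_models(X, y, n_splits=10, seed=0):
    """Return the mean k-fold cross-validated RMSE for each model."""
    candidates = {
        "gaussian_process": GaussianProcessRegressor(
            kernel=RBF() + WhiteKernel(), normalize_y=True, random_state=seed),
        # Stand-in for XGBoost (assumption): scikit-learn gradient boosting.
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
        "svm": SVR(),
        "random_forest": RandomForestRegressor(
            n_estimators=100, random_state=seed),
    }
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in candidates.items():
        # Imputation and scaling are refitted inside each fold to avoid leakage.
        pipeline = make_pipeline(
            SimpleImputer(strategy="median"), StandardScaler(), model)
        scores = cross_val_score(
            pipeline, X, y, cv=cv, scoring="neg_root_mean_squared_error")
        results[name] = -scores.mean()  # flip sign back to a positive RMSE
    return results


# Synthetic stand-in for the country-year table: 80 rows, 81 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 81))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=80)
X[rng.random(X.shape) < 0.1] = np.nan  # knock out about 10% of the entries

rmse = compare_models(X, y)
for name, value in sorted(rmse.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.3f}")
```

Because each country appears in several years, a split grouped by country (for example scikit-learn's `GroupKFold`) would be a stricter check than a shuffled `KFold`; the paper does not state which kind of split was used.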
3 RESULTS
For the Mobile App Creation prediction, we found that the data collected from the survey was only weakly correlated with the indicator, which resulted in weak performance of the models, except for the Gaussian Process model, which performed best. We attribute these poor results to the fact that the Global Innovation Index employs a top-down approach in the way it collects data to compute its indicators. For the Mobile App Creation case in particular, it uses "Global downloads of mobile apps, by origin of the HQ firm, scaled by PPP$ GDP (billions)". This approach excludes local innovators, especially those in developing countries, who mostly exist as freelancers, social enterprises, or early-stage start-ups. This calls for a grassroots movement that goes beyond merely proposing better quantifiers of innovation performance, and highlights the pressing need to use alternative data sources to obtain accurate estimates of innovation performance in the future, especially in the developing world.
(a) Correlation Matrix for Developer's Survey (b) Model Comparison on Developer's Survey Data
Figure 1: Developer's Survey Data Analysis
As for the African countries data set, the correlation matrix showed a direct positive correlation between the Regulatory Quality of a country and its Innovation Output Sub-index, suggesting that an active role of government can lead to a healthier innovative environment and therefore better innovation overall. Another positive correlation appeared between the Innovation Output Sub-index and Government's Online Service, and also with Rule of Law, reinforcing the apparent need for state sponsorship of innovation-friendly policies for innovation to accelerate. One of the most interesting correlations was that ISO 14001, an environmental certificate, was very highly correlated with the Innovation Output, which suggests that moving towards a clean environment could push Africa's innovation performance further. Finally, the models performed relatively well, with XGBoost, unsurprisingly, performing best.
(a) Correlation Matrix for Global Innovation Index Indicators (b) Model Comparison on Global Innovation Index Indicators
Figure 2: Global Innovation Index Indicators Analysis
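Reading a correlation matrix for the features most associated with the Innovation Output Sub-index amounts to ranking features by their correlation with that target. The sketch below shows one way to do this with pandas; the feature names and all values are synthetic assumptions, constructed so that one feature is related to the target and the other is not.

```python
import numpy as np
import pandas as pd


def rank_by_correlation(table, target):
    """Pearson correlation of each numeric feature with the target,
    ordered from the strongest absolute correlation to the weakest."""
    numeric = table.select_dtypes("number")
    corr = numeric.drop(columns=[target]).corrwith(numeric[target])
    return corr.sort_values(key=np.abs, ascending=False)


# Synthetic data (assumption): the target depends on one feature only.
rng = np.random.default_rng(1)
n = 60
regulatory = rng.normal(size=n)
unrelated = rng.normal(size=n)
output = 0.8 * regulatory + rng.normal(scale=0.3, size=n)

table = pd.DataFrame({
    "regulatory_quality": regulatory,
    "unrelated_feature": unrelated,
    "innovation_output": output,
})

ranking = rank_by_correlation(table, "innovation_output")
print(ranking)
```

A high coefficient here only indicates a linear association in the sample; it does not by itself establish that changing the feature would change the output.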
4 CONCLUSION
In this paper, we outlined the need to use alternative data sources to better measure innovation performance in the continent, as well as the need for grassroots movements to foster and facilitate innovation in Africa. We also showed how simple machine learning techniques can provide novel insights on this matter, and finally presented a number of observations on African innovation performance and recommendations to better facilitate innovation and creativity in Africa.
Figure 3: African Innovation Performance in 2014
REFERENCES
David Bacon, Dominik Forner, and Sercan Ozcan. Machine learning approach for national innovation performance data analysis. In Slimane Hammoudi, Christoph Quix, and Jorge Bernardino (eds.), Proceedings of the 8th International Conference on Data Science, Technology and Applications, volume 1. SciTePress, 2019.
Shenaj Hadzimustafa and Gadaf Rexhepi. Measuring innovation in the 21st century economy. SSRN Electronic Journal, 2011. doi: 10.2139/ssrn.1929039. URL https://doi.org/10.2139/ssrn.1929039.
Petr Hajek and Roberto Henriques. Modelling innovation performance of European regions using multi-output neural networks. PLOS ONE, 12(10):e0185755, October 2017. doi: 10.1371/journal.pone.0185755. URL https://doi.org/10.1371/journal.pone.0185755.
Scott Stern, Michael Porter, and Jeffrey Furman. The determinants of national innovative capacity. Technical report, September 2000. URL https://doi.org/10.3386/w7876.
Cornell University, INSEAD, and WIPO. The global innovation index 2019: Creating healthy lives - the future of medical innovation, 2019. URL