Democratisation of Usable Machine Learning in Computer Vision
Raymond Bond, Ansgar Koene, Alan Dix, Jennifer Boger, Maurice D. Mulvenna, Mykola Galushka, Bethany Waterhouse Bradley, Fiona Browne, Hui Wang, Alexander Wong
Ulster University, Northern Ireland, UK; Auromind, Northern Ireland, UK; Swansea University, Wales, UK; University of Nottingham, Nottingham, UK; University of Waterloo, Ontario, Canada; DarwinAI, Ontario, Canada
Abstract
Many industries are now investing heavily in data science and automation to replace manual tasks and/or to help with decision making, especially in the realm of leveraging computer vision to automate many monitoring, inspection, and surveillance tasks. This has resulted in the emergence of the data scientist who is conversant in statistical thinking, machine learning (ML), computer vision, and computer programming. However, as ML becomes more accessible to the general public and more aspects of ML become automated, applications leveraging computer vision are increasingly being created by non-experts with less opportunity for regulatory oversight. This points to the overall need for more educated responsibility for these lay-users of usable ML tools in order to mitigate potentially unethical ramifications. In this paper, we undertake a SWOT analysis to study the strengths, weaknesses, opportunities, and threats of building usable ML tools for mass adoption in important areas leveraging ML such as computer vision. The paper proposes a set of data science literacy criteria for educating and supporting lay-users in the responsible development and deployment of ML applications.
1. Introduction
The prevalence of digital technology is due to initiatives that sought to make it more accessible; for example, graphical interfaces substantially democratised the use of computers. Today, users of various abilities can employ end-user development (EUD) or end-user programming (EUP) tools to build their own computer programs using graphical programming environments, such as Simulink [4], LabVIEW [15], and Scratch [11]. Applications such as Clementine, WEKA [16], RapidMiner [6], and recently Ludwig [8] realised the idea of interactive machine learning; namely, the development of machine learning (ML) models without the user having to write computer code. However, several aspects of EUD tools limit their accessibility and use, particularly in the realm of computer vision. EUD tools use technical nomenclature, and the user experience (UX) of their interfaces has arguably not been optimised for lay-users. EUD tools normally run on desktop machines, whereas the general public is increasingly comfortable with web-based tools for email, social media, and office applications. Finally, these EUD tools do not automatically deal with data cleansing, wrangling, or missing data imputation, and therefore require the user to have some deeper understanding of the ML process.

The term usable ML has been discussed by a small number of researchers, including [1] and [13]; examples of this emerging field include human-centered ML and computer vision [12, 2]. More recent advances involve running ML in the cloud, which has been described as 'ML as a Service' (MLaaS or Cloud ML). A new initiative referred to as automated machine learning (AutoML) is evolving (e.g. Google AutoML, Auto-WEKA), where a user provides a dataset (in the case of AutoML, via a web-based interface) and a set of algorithms automatically performs tasks that are normally completed by a data scientist, such as feature engineering, model selection, and optimisation. AutoML algorithms test a large number of feature sets, hyperparameters, and permutations of ML techniques, allowing for the automatic creation of the 'best' model. For example, a number of recent papers [18, 10, 14, 17] have demonstrated the ability to automatically build state-of-the-art deep neural networks for computer vision tasks such as image classification and object detection based on just image data.

Figure 1. Spectrum of usable machine learning, from writing raw code (less usable) to web-based user interfaces (more usable).

Figure 2. SWOT analysis of democratising usable ML.

While these initiatives point towards the increasing democratisation of accessible ML, particularly for computer vision, we propose that there is a usable ML spectrum (Figure 1). The decreasing reliance on experts as ML becomes more accessible calls for further exploration to distinguish between usable as in doing and usable as in understanding. Whilst making ML more accessible can be a force for good, it is important to consider potential negative ramifications. For example, the increased accessibility of usable ML tools will likely cause an increase in inadvertent unethical use of ML for computer vision because of a lack of ML literacy amongst lay-users. Usable ML for computer vision is analogous to allowing people without knowledge of car mechanics to drive cars: drivers do not need to understand the engine, but they do need to know how to drive, are expected to follow the rules of the road, and must understand the risks and hazards of driving.
Likewise, usable ML should be complemented by a degree of data science literacy, particularly in the ethical use of ML, bearing in mind the risks and hazards of ML deployment, especially in computer vision scenarios where the livelihood, safety, and well-being of individuals in society are at stake in applications ranging from security surveillance and manufacturing quality inspection to medical diagnosis and autonomous vehicles.

If we want to support ethical algorithm development in computer vision, we need to ensure that this is possible, which requires the creation of elements missing from today's usable ML tools. For example, an 'educational' feature with the potential for just-in-time learning of more theoretical ML concepts could suggest techniques or flag problems in the dataset whilst providing educational resources that give the user the opportunity to reach a greater level of understanding and ML literacy.
2. SWOT Analysis
We present a brief SWOT analysis of democratising usable ML for computer vision before outlining a working set of concepts that we believe are required to support ethical and responsible use of ML platforms by lay-users (Table 1).
Strengths and Opportunities: Strengths and opportunities of democratising usable ML are presented in the left column of Table 1. These include empowering the public and industries to use and deploy ML techniques for computer vision without the expense of a data scientist. It would certainly result in new innovative applications of ML in industries that were previously disadvantaged by a lack of technological investment and resources (for example, the agriculture industry is one that would see great benefit but is currently very behind in leveraging ML and computer vision). As a result, such usable ML tools could improve the decision-making processes of many companies and verticals. Other benefits may include the widespread adoption of ML to the point where it is ubiquitous, akin to spreadsheets. This would allow usable ML applications to be disseminated as part of primary and secondary education, which could result in a generation of 'ML-natives' who can leverage computer vision, as a progression from digital natives (millennials).
Weaknesses and Threats: Weaknesses and threats of democratising usable ML are presented in the right column of Table 1. Democratisation could result in a reduced demand for data scientists. More worryingly, the democratisation of usable ML, especially in computer vision, could result in the inadvertent deployment of unethical ML algorithms that perpetuate gender bias or racial discrimination unless they are audited and carefully developed by a qualified data scientist or statistician using a set of ethical guidelines, such as those being developed by the IEEE P7003TM standard working group (http://sites.ieee.org/sagroups-7003/). Naive users could release models that are very inaccurate for a number of reasons (e.g. sampling bias) and that therefore poorly reflect the real world. Naive users are generally also more prone to automation bias, where users put an imprudent amount of trust in ML algorithms that they believe to be accurate and become complacent, or do not think to second-guess the algorithm's predictions. In this sense, usable ML could cyclically accelerate automation bias. This and the other considerations listed in Table 1 are just some of the possible risks of democratising usable ML tools.

3. Proposed benchmark criteria to form the basis of literacy in usable ML
Considering the SWOT analysis in Table 1, we present a working set of benchmark criteria that could serve as a basis for data science literacy or certification for novice users of usable ML tools, with the intention of mitigating irresponsible deployment of ML algorithms in computer vision.
Supervised ML involves training an algorithm to learn patterns from a large dataset so that it can subsequently predict or classify an outcome when given new, unseen cases (e.g., training a deep convolutional neural network to recognize faces with a large set of face images). While there are many techniques that can be used, the no free lunch theorem informs us that no one ML technique can be optimal for all problems and all domains. It is important that lay-users know that there is a multitude of ML techniques and algorithms and that care should be taken to select one that is appropriate for their data and context. This is important to avoid users over-trusting one technique for all problems (e.g., deep learning) and to encourage them to try different approaches to determine their relative strengths and weaknesses.
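To make the no-free-lunch point concrete, the following is a minimal sketch (not taken from any specific usable ML tool) of comparing several techniques on the same data before committing to one; it uses scikit-learn's bundled digits dataset purely as a stand-in for an image-classification task.

```python
# Sketch: no single technique wins everywhere, so compare several candidates
# on your own data rather than defaulting to one.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM (RBF kernel)": SVC(),
}

# Rank techniques on *this* dataset; the ranking can flip on another one.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```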
Accountability refers to who is responsible for ML deployment and use. It is important that users understand that while an algorithm may be able to make autonomous decisions, developers are increasingly held responsible for outcomes related to their use. Understanding this reinforces the seriousness of deploying ML algorithms.
Transparency refers to how explainable and transparent an ML prediction is. Users must understand that different techniques provide different levels of transparency and explainability regarding the inner workings of a model or the reasons a particular prediction was made. This is important to allow users to use the right technique for the right domain and for their needs. It is also important that users are aware that new tools and techniques are available to provide improved explainability for techniques such as deep learning, which is traditionally viewed as a 'black box', in computer vision applications [7].
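As one generic illustration of such explainability tooling (a crude occlusion-based sketch, not the CLEAR approach of [7]), one can probe any image classifier model-agnostically by hiding parts of the input and watching the prediction change; `predict_proba` here is an assumed callable, not a specific library API.

```python
import numpy as np

def occlusion_sensitivity(image, predict_proba, target_class, patch=8):
    """Slide a neutral patch over the image and record how much the
    predicted probability of `target_class` drops; large drops mark
    regions the model relies on. `predict_proba` is any callable that
    maps an HxW grayscale image to a vector of class probabilities."""
    h, w = image.shape
    baseline = predict_proba(image)[target_class]
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = image.mean()
            drop = baseline - predict_proba(occluded)[target_class]
            heatmap[i // patch, j // patch] = drop
    return heatmap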
Data provenance refers to the details, metadata, and origin of the dataset used to build the ML model. It is important that the user knows that ML models are only as good as the data used to train them, and that clear boundaries regarding appropriate use, given the training data, should be considered and communicated.
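A hypothetical minimal provenance record (the fields below are illustrative assumptions, not a standard) shows the kind of metadata a usable ML tool could require users to ship alongside a model so that downstream users can judge whether the training data fits their context.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class DatasetProvenance:
    """Illustrative provenance record to accompany a trained model."""
    source: str       # where the images came from
    collected: str    # collection period
    licence: str      # terms under which the data may be used
    known_gaps: str   # populations/conditions under-represented

record = DatasetProvenance(
    source="in-house factory camera, line 3",
    collected="2018-01 to 2018-06",
    licence="internal use only",
    known_gaps="no night-shift lighting conditions",
)
print(json.dumps(asdict(record), indent=2))
```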
Algorithmic bias is when an ML model discriminates in some way (e.g., race, gender, age) [5]. It is important for the person developing the algorithm to identify, test for, and mitigate possible biases to ensure their algorithm performs equitably across different populations; their algorithm should be fair. New techniques are now available to assist in identifying these biases, and users should be aware of such tools to help mitigate algorithmic bias.
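The simplest form such a bias check can take is disaggregating a performance metric by a protected attribute; the sketch below (with made-up labels purely for illustration) shows how an overall-acceptable model can fail badly for one group.

```python
import numpy as np

def per_group_accuracy(y_true, y_pred, group):
    """Disaggregate accuracy by a protected attribute (e.g. gender,
    age band). Large gaps between groups are a red flag to investigate."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    return {g: float((y_pred[group == g] == y_true[group == g]).mean())
            for g in np.unique(group)}

# Hypothetical audit: overall accuracy is 62.5%, but group B lags badly.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 1])
group  = np.array(["A", "A", "A", "B", "B", "A", "B", "B"])
print(per_group_accuracy(y_true, y_pred, group))  # {'A': 1.0, 'B': 0.25}
```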
Measurement bias is when features or an outcome are poorly measured due to inexperience. Measurement bias could involve users uploading only data and features that are easily codified, or inaccurately codified features, which results in suboptimal binning and categorisation and thus ultimately in poor algorithm performance.
Accuracy vs. fairness: An algorithm may produce results that are accurate but not fair by ethical standards. For example, there is a higher percentage of males in science and engineering, so in the absence of explicit information on qualifications, gender could be represented as a predictor of success in an engineering job; however, it would be unethical, unfair, and illegal (in the UK) to use it as such. Without explicitly knowing about and searching for these kinds of misrepresentations, a user may interpret accuracy figures as reflecting the best model to use. This is particularly critical as ML for computer vision is increasingly leveraged by law enforcement for applications such as face recognition.
Automation bias is when people place too much trust in decisions made by machines, sometimes to the point where they are complacent even when the machine is radically incorrect [3]. It is important that those who deploy ML continuously question the decisions being made and keep in mind that the model is not always correct.
Class imbalance and prevalence: Prevalence is the percentage of cases of a given class that exist in the real world. Class imbalance is when there is an unwanted low percentage of a type of case (class) in the training dataset. It is important that users understand the limitations if the prevalence of cases in their dataset does not match the prevalence in the real world. In this way they can appreciate the difference between the accuracies reported by the usable ML platform and the results achieved in real-world deployment.
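A minimal sketch of the prevalence check implied above: report the class distribution of the training data so it can be compared against real-world prevalence before trusting platform accuracy figures (the defect-inspection numbers are hypothetical).

```python
from collections import Counter

def class_prevalence(labels):
    """Report the class distribution of a training set, to be compared
    with real-world prevalence before trusting reported accuracy."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical defect-inspection dataset: 'defect' is rare in the wild
# (~1%) but over-represented here (20%), so accuracy will not transfer.
labels = ["ok"] * 80 + ["defect"] * 20
print(class_prevalence(labels))  # {'ok': 0.8, 'defect': 0.2}
```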
Overfitting is when an ML model performs well on the training data but not in the real world, because the model is so closely aligned to the training data that it generalises poorly when presented with new situations. It is important that users understand that while their model may show great performance during development, there is always a risk that it will not perform well in the real world due to overfitting.
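The classic symptom a lay-user can check for is a large gap between training and held-out accuracy, as in this minimal sketch (an unconstrained decision tree on scikit-learn's digits data, chosen only because it overfits readily).

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

# An unconstrained tree can memorise the training set...
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("train accuracy:", model.score(X_tr, y_tr))  # typically 1.00
print("test accuracy: ", model.score(X_te, y_te))  # noticeably lower
# A large train/test gap is the warning sign of overfitting.
```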
Concept drift is the phenomenon whereby ML models may not sustain the same performance over time, since predictors and circumstances can change. This concept is important to lay-users because they should understand the need to retrain models using more recent cases in certain disciplines (e.g., in autonomous vehicle applications, new vehicle types and models and new street sign types are introduced over time and have a different visual appearance). Other concepts such as Goodhart's law (Chrystal et al. 2003) can explain concept drift (i.e., once a variable is used as a measure to predict another variable, it can be manipulated to the point that it is no longer a covariate). A minimal drift-monitoring sketch follows the next concept.

Data leakage is when predictors that are apparently random (e.g., house number, black bars used for image padding) are used in the ML model but seem to have predictive power. Data leakage occurs when the answer is inadvertently leaked into the training of the ML algorithm; performance is very high on test data, largely due to the leaked feature, but very low in the real world. This concept is important to ensure users carefully consider what information should be included when training an ML algorithm.
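Returning to the concept drift point above, a minimal sketch of post-deployment monitoring: track accuracy over a rolling window of recent predictions, where a sustained downward trend suggests drift and the need to retrain on recent cases.

```python
import numpy as np

def rolling_accuracy(y_true, y_pred, window=100):
    """Accuracy over a sliding window of deployed predictions; a
    sustained downward trend is a symptom of concept drift."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    kernel = np.ones(window) / window
    return np.convolve(correct, kernel, mode="valid")
```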
A confounder is a variable, feature, or predictor that is correlated with the variable the user wants to predict, but whose correlation may not last because it is a latent secondary or tertiary association. It is important that a user understands that confounders may allow ML to get the right answer for the wrong reason, and that correlation is not causality.
ML performance metrics include accuracy, sensitivity, specificity, mean average precision (mAP), and many others (e.g., kappa, area under the curve). The accuracy paradox is when an ML algorithm achieves a misleadingly high accuracy score but is no better than the prevalence rate of the most popular class in the dataset (also known as the 'no-information rate'). It is important for the user to understand accuracy and the accuracy paradox to guard against misleading themselves, clients, and colleagues.
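The no-information rate is easy to compute, and the worked example below (with hypothetical screening numbers) shows why a high raw accuracy can be meaningless.

```python
import numpy as np

def no_information_rate(y_true):
    """Accuracy of always predicting the most common class; a model
    must beat this baseline for its accuracy figure to mean anything."""
    _, counts = np.unique(np.asarray(y_true), return_counts=True)
    return counts.max() / counts.sum()

# Hypothetical screening task: 95% of images are 'normal'. A model that
# is "95% accurate" by always saying 'normal' has learned nothing.
y_true = ["normal"] * 95 + ["abnormal"] * 5
print(no_information_rate(y_true))  # 0.95
```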
Type 1 and type 2 errors: A type 1 error is a false positive and a type 2 error is a false negative. This is important for users to understand so that they can better appreciate the ramifications of using an ML-based decision and design for the error type that should be avoided the most. This will also help them choose an algorithm based on sensitivity or specificity, depending on the preference in a given domain or problem.
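These two error types map directly onto sensitivity and specificity; the sketch below computes both from binary labels, with a toy worked example.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity penalises type 2 errors (missed positives);
    specificity penalises type 1 errors (false alarms). Which matters
    more depends on the domain, e.g. a missed tumour vs. a false
    security alert."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))  # type 1 errors
    fn = np.sum((y_true == 1) & (y_pred == 0))  # type 2 errors
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0],
                                     [1, 1, 0, 0, 0, 1])
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # 0.67, 0.67
```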
4. Discussion and Conclusion
The key message here is that "with great power comes great responsibility". Allowing the development and deployment of ML applications, particularly for computer vision, to be more expedient, convenient, accessible, and user friendly must be paired with methods that provide new and naive users with the knowledge they need to be responsible actors. While not everyone wants to, or can, become an expert in ML, responsible deployment still requires some literacy in ML and in ethical concepts related to ML use.

While there are numerous positive outcomes that can come with democratising usable ML platforms for computer vision, we must guard against possible negative ramifications of widespread access to ML capabilities. We believe ML literacy that enables basic responsible use and deployment of ML models ought to be a paramount priority of usable ML systems. The concepts presented in this paper complement other ethical positions, such as [9] and the IEEE P7003TM working standard, and provide a starting point for engaging in the responsible democratisation of usable ML.
References

[1] P. Bailis, K. Olukotun, C. Ré, and M. Zaharia. Infrastructure for usable machine learning: The Stanford DAWN project. 2017.
[2] C. Binnig, B. Buratti, Y. Chung, C. Cousins, T. Kraska, Z. Shang, E. Upfal, R. C. Zeleznik, and E. Zgraggen. Towards interactive curation & automatic tuning of ML pipelines. In DEEM@SIGMOD, pages 1–1, 2018.
[3] R. R. Bond, T. Novotny, I. Andrsova, L. Koc, M. Sisakova, D. Finlay, D. Guldenring, J. McLaughlin, A. Peace, V. McGilligan, et al. Automation bias in medicine: The influence of automated diagnoses on interpreter accuracy and uncertainty when reading electrocardiograms. Journal of Electrocardiology, 2018.
[4] J. B. Dabney and T. L. Harman. Mastering Simulink. Pearson, 2004.
[5] S. Hajian, F. Bonchi, and C. Castillo. Algorithmic bias: From discrimination discovery to fairness-aware data mining. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2125–2126. ACM, 2016.
[6] M. Hofmann and R. Klinkenberg. RapidMiner: Data Mining Use Cases and Business Analytics Applications. 2013.
[7] D. Kumar, A. Wong, and G. W. Taylor. Explaining the unexplained: A class-enhanced attentive response (CLEAR) approach to understanding deep neural networks. 2017.
[8] P. Molino, Y. Dudin, and S. S. Miryala. Introducing Ludwig, a code-free deep learning toolbox. 2017.
[9] M. Mulvenna, J. Boger, and R. Bond. Ethical by design: A manifesto. In Proceedings of the European Conference on Cognitive Ergonomics 2017, pages 51–54. ACM, 2017.
[10] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural architecture search via parameter sharing. 2018.
[11] M. Resnick, J. Maloney, A. Monroy-Hernández, N. Rusk, E. Eastmond, K. Brennan, A. Millner, E. Rosenbaum, J. Silver, B. Silverman, et al. Scratch: Programming for all. Communications of the ACM, 52(11):60–67, 2009.
[12] D. Sacha, M. Sedlmair, L. Zhang, J. A. Lee, D. Weiskopf, S. North, and D. Keim. Human-centered machine learning through interactive visualization. ESANN, 2016.
[13] A. Sarkar. Spreadsheet interfaces for usable machine learning. Pages 283–284. IEEE, 2015.
[14] M. Tan, B. Chen, R. Pang, V. Vasudevan, and Q. V. Le. MnasNet: Platform-aware neural architecture search for mobile. 2018.
[15] J. Travis and J. Kring. LabVIEW for Everyone: Graphical Programming Made Easy and Fun. Prentice-Hall, 2007.
[16] I. H. Witten, E. Frank, L. E. Trigg, M. A. Hall, G. Holmes, and S. J. Cunningham. Weka: Practical machine learning tools and techniques with Java implementations. 1999.
[17] A. Wong, M. J. Shafiee, B. Chwyl, and F. Li. FermiNets: Learning generative machines to generate efficient neural networks via generative synthesis. 2018.
[18] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. 2017.