RatingScaleReduction package: stepwise rating scale item reduction without predictability loss
by Waldemar W. Koczkodaj and Alicja Wolny-Dominiak

March 21, 2017
Abstract
This study presents an innovative method for reducing the number of rating scale items without predictability loss. The "area under the receiver operator curve" (AUC ROC) method is implemented in the RatingScaleReduction package posted on CRAN. Several cases have been used to illustrate how the stepwise method reduces the number of rating scale items (variables).

Keywords: rating scale, receiver operator characteristic, ROC, AUC, scale reduction.
Introduction

Rating scales (also called assessment scales or simply "the scale" in this study) are used to elicit data about quantitative entities. Often, the predictability of rating scales can be improved. Rating scales often use values "1 to 10", and some rating scales may have over 100 items (questions) to rate. Sometimes, the scale is called a survey or a questionnaire. A questionnaire is a tool for data collection, while a survey may not necessarily be conducted by questionnaires, since some surveys are conducted by interviews or by data gathered from web pages. In fact, the main and very important distinction between the scale and both questionnaires and surveys is that the scale is used for assessment (as in "the scale of disaster"), while questionnaires and surveys may be used only to collect data. In other words, some kind of "summation procedure" must be provided for questionnaires or surveys to become rating scales.

The recent popularity of rating scales is due to various "Customer Reviews" on the Internet, where five stars are often used instead of ordinal numbers. However, the most important examples of rating scales are questionnaires used in examinations. We may risk a statement that the accelerated progress of granting academic degrees can be linked to a better use of rating scales. The RatingScaleReduction package is posted on CRAN at https://cran.r-project.org/web/packages/RatingScaleReduction/index.html.

Rating scales are predominantly used to express our subjective assessments, such as "on the scale 1 to 5 express your preference": "strongly agree to strongly disagree" (with 3 as a "neutral" preference). The importance of subjectivity processing probably inspired the introduction of the idea of bounded rationality, proposed by Herbert A. Simon (the Nobel Prize winner), as an alternative basis for the mathematical modeling of decision making. It is often expressed as "good enough is perfect" and gained popularity in the software industry, where frequent updates are common.
Objective data are more commonly used in so-called "strict sciences", while processing subjectivity is still under development. It is worth noticing that objectivity is illusive. Often, the difference between subjectivity and objectivity is a matter of an arbitrary decision. For example, an item listed for sale for, let us say, 100,000 monetary units will very likely be sold for 99,999 of such units if such an offer is made. If so, one may also not resist accepting 99,998 monetary units, and so on. Setting a limit (the so-called "bottom line") is often a highly subjective decision. Scales help in many cases, but a large number of items (questions) is often a discouragement for their use.

It seems that one of the first successful rating scale reductions (RSR) took place in [13]. The 17-item Hamilton Rating Scale for Depression (HAM-D17) was used to derive an even more reduced version (HAM-D7) with seven items. According to [10], "The clinical utility of the HAM-D17 is hampered, in part, by the length of time required to administer the interview and by the lack of inter-rater reliability."

A heuristic is, in essence, a simplified method for solving a problem more quickly when well-established methods fail to find a sound algorithm. Usually, this is achieved by the "good enough is perfect" approach, mentioned in the introduction, as a characterization of the heuristic solution. Heuristics are expected to produce a reasonable solution when the time frame and accuracy are a problem. The "good enough" solution for an urgent problem is commonly practiced in computer science. It is usually not the best solution to our problem, but it may still be of great value.
For example, the traveling salesman problem (TSP), often formulated as "find the shortest possible route to visit each city exactly once and return to the origin city", cannot be solved for 50 or more cities by verifying all possible combinations, since the total number of such combinations would easily exceed the number of atoms in the entire Universe. Using heuristics, we can solve TSP for millions of cities with an accuracy within a small fraction of 1%. Most heuristics produce results by themselves, but many are used in conjunction with optimization algorithms to improve their efficiency (e.g., differential evolution).

In our case, the number of possible combinations for a rating scale with 100 items (which is not uncommon) is a "cosmic number", hence the complete search must be ruled out. Computing the area under the receiver operating characteristic curve for all items is the basis for our heuristic. Common sense dictates that the contribution of the individual items to the overall value of the area under the curve needs to be somehow utilized. We have decided on a stepwise heuristic. Certainly, the results need to be verified and used only if the item reduction is substantial.
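As a toy illustration of the "good enough" idea, the nearest-neighbour heuristic for a small TSP instance can be sketched in a few lines of R. The instance size, coordinates, and starting city below are all made up for the example; the tour it builds is feasible but generally not optimal:

```r
# Nearest-neighbour heuristic for a tiny, randomly generated TSP instance.
# Illustrative only: fast and "good enough", with no optimality guarantee.
set.seed(1)
n <- 8
pts <- cbind(x = runif(n), y = runif(n))   # hypothetical city coordinates
dmat <- as.matrix(dist(pts))               # pairwise Euclidean distances

tour <- 1                                  # start at city 1
left <- setdiff(1:n, tour)
while (length(left) > 0) {                 # always hop to the nearest unvisited city
  nxt <- left[which.min(dmat[tail(tour, 1), left])]
  tour <- c(tour, nxt)
  left <- setdiff(left, nxt)
}
tour <- c(tour, 1)                         # return to the origin city
tour.length <- sum(dmat[cbind(head(tour, -1), tail(tour, -1))])
```

For n = 8 this visits every city once in O(n^2) time, whereas checking all (n-1)!/2 tours already becomes infeasible for a few dozen cities.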
Rating scales are used to elicit data about qualitative entities (e.g., research collaboration). This study presents an innovative method for reducing the number of rating scale items without predictability loss. The "area under the receiver operator curve" (AUC ROC) method is used. The presented method has reduced the number of rating scale items (variables) to 28.57% (from 21 to 6), making over 70% of the collected data unnecessary.

Results have been verified by two methods of analysis: the Graded Response Model (GRM) and Confirmatory Factor Analysis (CFA). GRM revealed that the new method differentiates observations of high and middle scores. CFA proved that the reliability of the rating scale has not deteriorated by the scale item reduction. Both statistical analyses evidenced the usefulness of the AUC ROC reduction method.

Rating scales (also called assessment scales) are used to elicit data about quantitative entities. Often, the predictability of rating scales could be improved. Rating scales often use values "1 to 10", and some rating scales may have over 100 items (questions) to rate. Other popular terms for rating scales are survey and questionnaire, although a questionnaire is a method of data collection while a survey may not necessarily be conducted by questionnaires. Some surveys may be conducted by interviews or by analyzing web pages. Rating itself is very popular on the Internet for "Customer Reviews", where five stars (e.g., on Amazon.com) are often used instead of ordinal numbers. One may regard such a rating as a one-item rating scale.

In computer science and mathematical optimization, a heuristic is a technique designed for finding an approximate solution when classic methods fail to find any exact solution. Often, this is achieved by trading completeness, accuracy, or optimality for speed. The main objective of a heuristic is to produce a solution that is good enough to solve our problem.
The solution may not be the "best" solution, and it may only approximate it, since the optimal solution may require a prohibitively long time. The traveling salesman problem and virus scanning are probably the most recognized problems where the need for using heuristics is evident. In both cases, the complete search for the optimal solution would take thousands of years using the fastest computers built. One of the shortest heuristics may be "22/7" as an approximation of the constant π with two decimal points (3.14), as it is easier to remember and sometimes easier to use. Herbert A. Simon was the original proponent of bounded rationality. In practice, it means that human judgments are based on heuristics. He is the only person who received both the Nobel Prize and the Turing Award.

Data collected by a rating scale with a fixed number of items (questions) are stored in a table with one decision (in our case, binary) variable. The parametrized classifier is usually created by the total score of all items. The outcome of such rating scales is usually compared to external validation provided by assessing professionals (e.g., grant application committees).

Our approach not only reduces the number of items, but also sequences them according to their contribution to predictability. It is based on the Receiver Operator Characteristic (ROC), which gives individual scores for all examined items. The term "receiver operating characteristic" (ROC), or "ROC curve", was coined for a graphical plot illustrating the performance of radar operators (hence "operating"). A binary classifier represented the absence or presence of an enemy aircraft. It was used to plot the fraction of true positives out of the total actual positives (TPR = true positive rate) vs. the fraction of false positives out of the total actual negatives (FPR = false positive rate).
Positive instances (P) and negative instances (N) for some condition are computed and stored as the four outcomes of a 2x2 contingency table, or confusion matrix, as follows:

Table 1: The confusion matrix

                     Actual positive    Actual negative
Predicted positive   True Positive      False Positive
Predicted negative   False Negative     True Negative
Each patient either has or does not have the disorder. The screening outcome can be positive (classifying a patient as having the disorder) or negative (classifying the patient as not having the disorder). The screening results for each patient may or may not match the subject's actual status. It means that in medical terminology we may have:

• TP = true positive: a patient is correctly identified as having the disorder,
• FP = false positive: a patient is incorrectly identified as having the disorder,
• TN = true negative: a patient with no disorder is correctly identified as not having the disorder,
• FN = false negative: a patient with the disorder is incorrectly identified as not having the disorder.

In simple terms, positive = identified and negative = rejected, hence:

True positive = correctly identified examples
False positive = incorrectly identified examples
True negative = correctly rejected examples
False negative = incorrectly rejected examples

In assessment and evaluation research, the ROC curve is a representation of a "separator" (or decision) variable. The decision variable is usually "has a property" or "does not have a property", or has some condition to meet (pass/fail). The frequencies of positive and negative cases of the diagnostic test vary with the "cut-off" value for positivity.
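As a minimal sketch of these four cells and the two rates, on made-up scores and decisions (both vectors below are hypothetical, as is the cut-off of 0.5):

```r
# Hypothetical classifier scores and true binary decisions (illustration only).
score <- c(0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1)
D     <- c(1,   1,   0,   1,    0,   1,   0,   0)

pred <- as.integer(score >= 0.5)   # one fixed cut-off

TP <- sum(pred == 1 & D == 1)      # correctly identified
FP <- sum(pred == 1 & D == 0)      # incorrectly identified
TN <- sum(pred == 0 & D == 0)      # correctly rejected
FN <- sum(pred == 0 & D == 1)      # incorrectly rejected

TPR <- TP / (TP + FN)              # sensitivity
FPR <- FP / (FP + TN)              # 1 - specificity
```

With these toy vectors, TP = 3, FP = 1, TN = 3, FN = 1, so TPR = 0.75 and FPR = 0.25.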
By changing the "cut-off" value from 0 (all negatives) to a maximum value (all positives), we obtain the ROC by plotting TPR (the true positive rate, also called sensitivity) versus FPR (the false positive rate, equal to 1 − specificity) across the varying cut-offs, which generates a curve in the unit square called the ROC curve.

According to [2], the area under the curve (the AUC or AUROC) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming that 'positive' ranks higher than 'negative').

This can be seen as follows: the area under the curve is given by (the integral boundaries are reversed as a large T has a lower value on the x-axis)

A = \int_{\infty}^{-\infty} y(T)\, x'(T)\, dT = \int_{\infty}^{-\infty} TPR(T)\, FPR'(T)\, dT = \int_{-\infty}^{\infty} TPR(T)\, P_0(T)\, dT = \langle TPR \rangle,

where P_0(T) denotes the probability density of the negative samples.
The angular brackets denote the average over the distribution of negative samples. AUC is closely related to the Mann-Whitney U test, which tests whether positives are ranked higher than negatives. It is also equivalent to the Wilcoxon test of ranks.

The ROC method is implemented by many R packages, including pROC [11] and ROCR [12]. There is also an interesting web application, easyROC [3], giving the possibility to compute the confusion matrix and plot the curve online. The RatingScaleReduction package expands this analysis to carry out the procedure of rating scale reduction. A package for preprocessing "messy" data into a form easily analyzed within R is presented in [8]. In [15], the new R package sbtools enables users direct access to the advanced online data functionality provided by ScienceBase, the U.S. Geological Survey's online scientific data storage platform. It can be used for harvesting other data sets.
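The cut-off sweep, the trapezoidal area, and the rank-based equivalence described above can be sketched in base R. The scores and decisions below are made up for the illustration; packages such as pROC compute the same quantities with more care:

```r
score <- c(0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1)  # hypothetical scores
D     <- c(1,   1,   0,   1,    0,   1,   0,   0)    # hypothetical decisions

# ROC points: sweep the cut-off from "all negative" to "all positive".
cuts <- sort(unique(c(Inf, score, -Inf)), decreasing = TRUE)
roc <- t(sapply(cuts, function(ct) {
  pred <- score >= ct
  c(FPR = sum(pred & D == 0) / sum(D == 0),
    TPR = sum(pred & D == 1) / sum(D == 1))
}))

# Trapezoidal area under the piecewise-linear ROC curve.
auc.trap <- sum(diff(roc[, "FPR"]) *
                (head(roc[, "TPR"], -1) + tail(roc[, "TPR"], -1)) / 2)

# Equivalent rank-based form: the proportion of (positive, negative) pairs
# in which the positive example scores higher (ties counted as 1/2).
pos <- score[D == 1]; neg <- score[D == 0]
auc.rank <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))

# The same value again from the Wilcoxon/Mann-Whitney statistic.
W <- suppressWarnings(wilcox.test(pos, neg)$statistic)
auc.wilcox <- as.numeric(W) / (length(pos) * length(neg))
```

All three computations agree (0.8125 on this toy data), which is exactly the equivalence between the area under the ROC curve and the Mann-Whitney U statistic discussed above.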
The procedure follows the heuristic algorithm represented by Fig. 1. Technically, it is an algorithm, since the flowchart in Fig. 1 shows a finite number of steps. It is, however, a heuristic algorithm, since the optimality of the presented approach cannot be guaranteed (as pointed out in Section 2). However, common sense dictates to select the "best" attribute and keep adding to it the next "best" attribute, where "best" refers to the area under the curve (AUC) value, since it is the universally accepted criterion for classifiers (in statistical classification and machine learning). A rating scale total is a classifier. The classification in this study is regarded as an instance of supervised learning. Briefly, it requires a training set of correctly identified examples (observations) with the external evaluation. In our case, a trained professional is needed to determine whether a subject (a screened psychiatric patient) has a mental disorder or not. An algorithm that implements a concrete classification is called a classifier. The most common way of doing it by a rating scale is using the total of all items. Some of them may be negative (e.g., in the Oxford Happiness Questionnaire, see [4]).

Figure 1: Rating scale stepwise reduction heuristic algorithm

In the RatingScaleReduction package, the implemented algorithm (when reduced to its minimum) uses a loop over all attributes (with the class excluded) to compute the AUC. Subsequently, the attributes are sorted in descending order by AUC. The attribute with the largest AUC is added to a subset of all attributes (evidently, it cannot be empty, since it is supposed to be the minimum subset S of all attributes with the maximum AUC). We continue adding the next-in-line (according to AUC) attribute to the subset S, checking the AUC. If it decreases, we stop the procedure. There is a lot of checking involved (e.g., whether the data set is empty or full of replications). These steps are implemented in the startAuc, totalAuc, and rsr functions of the package.

Before running the RSR procedure, the data set should be analyzed to detect replicated examples and so-called "gray" examples. One example may be replicated m times, where m is the total number of examples, so that there are no other examples. Such a situation would deviate the computations and should be detected early. Ideally, no example should be replicated, but if the replication rate is small, we can proceed to computing the AUC. There is no generally acceptable "golden rule" for the replication rate. Moreover, the data may contain gray examples, which should also be detected. A gray example is an example for which there are other examples in the data set having identical values on all attributes but a different decision. This analysis of the data set can be carried out using the functions diffExamples, grayExamplesN, and grayExamples.

An important problem after the scale reduction by the RSR procedure is to check for the possible inclusion of the next attribute in the reduced rating scale by maximizing the AUC of all included items. In a highly unlikely scenario, all attributes will be included in the reduced (that is, non-reduced) set of items. A reduced rating scale of one attribute may be created if there is an identifying attribute.
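The stepwise loop just described can be sketched in a few lines of base R. This is an illustration of the idea, not the package's actual implementation; the helper auc() and the toy data are made up for the example:

```r
# AUC of a single score vector, as the proportion of (positive, negative)
# pairs ranked correctly (ties counted as 1/2).
auc <- function(score, D) {
  pos <- score[D == 1]; neg <- score[D == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Stepwise reduction sketch: order the items by individual AUC, keep adding
# the next-best item to the running total, stop at the first decline in AUC.
rsr.sketch <- function(items, D) {
  ord <- order(sapply(items, auc, D = D), decreasing = TRUE)
  items <- items[, ord, drop = FALSE]
  total <- items[, 1]
  best <- auc(total, D)
  keep <- 1
  for (k in seq_len(ncol(items))[-1]) {
    cand <- auc(total + items[, k], D)
    if (cand < best) break          # first decline: stop
    total <- total + items[, k]
    best <- cand
    keep <- k
  }
  list(items = names(items)[1:keep], auc = best)
}

# Toy data: A separates perfectly, B does no harm, C hurts the total.
D  <- c(1, 1, 1, 0, 0, 0)
df <- data.frame(A = c(5, 4, 3, 2, 1, 0),
                 B = c(0, 0, 10, 0, 0, 0),
                 C = c(0, 0, 0, 5, 0, 0))
res <- rsr.sketch(df, D)
```

On this toy data the sketch keeps A and B (AUC = 1) and stops when adding C lowers the AUC, mirroring the stop-at-first-decline rule of the package.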
To test this inclusion, the function CheckAttr4Inclusion is available in the package.
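The "gray example" check described above amounts to finding rows that agree on every attribute but disagree on the decision. A hedged base-R sketch on made-up data (the package's diffExamples and grayExamples functions are the supported interface):

```r
# Toy data: rows 1, 2 and 4 share the attribute values (1, 0), but
# row 2 carries a different decision, so all three rows are "gray".
df <- data.frame(a = c(1, 1, 2, 1),
                 b = c(0, 0, 1, 0),
                 D = c(1, 0, 1, 1))

# Group rows by their full attribute pattern, then flag every group
# whose decision is not constant.
key  <- interaction(df$a, df$b, drop = TRUE)
gray <- as.logical(ave(df$D, key, FUN = function(d) length(unique(d)) > 1))
gray.rows <- df[gray, ]
```

Gray examples put a ceiling on the achievable AUC, which is why detecting them before the reduction matters.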
The RatingScaleReduction package implements the above-stated stepwise procedure using two functions of the pROC package: roc and roc.test. It works on data in a matrix or data.frame containing columns of attributes and one decision column with two categories, e.g., (0, 1). The rows of the data.frame represent the examples in the sample. All attributes and the decision vector must be numeric. There are two groups of functions available in the package. Because the essence of the procedure is to set the attributes in the correct order, it is good practice to name them, e.g., using colnames in R. The first group is dedicated to carrying out the RSR procedure:

1. startAuc(attribute, D) – computes the AUC value of every single attribute in the rating scale.
2. totalAuc(attribute, D, plotT=FALSE) – sorts the AUC values and computes the AUCs of the running total of the first k attributes, k = 1, ..., n, where n is the number of attributes. Setting the argument plotT to TRUE creates a plot of the new AUC values. The horizontal line marks the maximum new AUC.
3. rsr(attribute, D, plotRSR=FALSE) – the main function of the package, reducing the rating scale according to the procedure illustrated by Fig. 1. Setting the argument plotRSR to TRUE creates a plot of the ROC curve of the sum of the attributes in the reduced rating scale.

Additionally, the package provides a second group of functions to support the reduction procedure:

1. CheckAttr4Inclusion(attribute, D) – carries out a statistical test for a difference in AUC between two correlated ROC curves: ROC1 of the sum of attributes from the reduced rating scale, and ROC2 of this sum plus the next ordered attribute. The function roc.test from pROC is used, and all implemented tests are available, in particular delong and bootstrap.
2. diffExamples(attribute) – searches for replicated examples in the data and returns the number of different examples and the number of duplicates.
3. grayExamples(attribute, D) – produces the list of pairs of examples having identical values on all attributes but different decisions. The decision value and the attributes are produced for every pair in the data set, so the list clearly shows all gray examples.
4. grayExamplesN(attribute, D, N) – produces a list of examples and the numbers of examples j in the data set having identical values on all attributes as the given example N.

The examples below present the capabilities of the RatingScaleReduction package. The full code is available for download from https://github.com/woali/RatingScaleReduction/example Rj.r. We consider the BDI data set used in [7]. It is the rating scale for depression BDI (Beck Depression Inventory), with 21 attributes in our relational database. The goal in this example is to show how to reduce the BDI rating scale with the three main functions of the package. The data.frame we work on contains 21 columns with attributes and one additional column as a decision (reality). The sample is represented by 561 examples (instances).

We start with the analysis of the AUC of all 21 individual attributes of the BDI scale, using the function totalAuc with the argument plotT set to TRUE:

> tauc.bdi <- totalAuc(attribute, D, plotT=TRUE)
> tauc.bdi$summary
        AUC one variable  AUC running total
BDI_1         0.7250092         0.7250092
BDI_14        0.7074013         0.7765412
BDI_7         0.7014889         0.7945490
BDI_9         0.7004614         0.8095300
BDI_10        0.6972253         0.8131352
BDI_15        0.6920635         0.8221669
BDI_17        0.6742833         0.8205426
BDI_8         0.6679833         0.8192322
BDI_20        0.6669989         0.8198782
BDI_5         0.6584779         0.8211333
BDI_3         0.6556663         0.8210840
BDI_4         0.6511874         0.8211394
BDI_13        0.6489172         0.8210779
BDI_12        0.6367909         0.8195583
BDI_2         0.6312046         0.8186908
BDI_19        0.6292851         0.8181125
BDI_18        0.6100283         0.8160699
BDI_6         0.6100037         0.8138489
BDI_16        0.6059370         0.8116525
BDI_11        0.5973422         0.8110680
BDI_21        0.5874677         0.8117263
The R output of tauc.bdi$summary shows the AUC of every single attribute in the second column, sorted in descending order. The running total of the AUCs is in the third column. The initially selected variable (BDI_1) in the first row is the attribute with the largest AUC. Subsequently, we add to it the variable with the largest AUC among the remaining attributes. The process continues until the last attribute of the scale is added.

Values in the running total (from the top to the current variable) are checked for growth. Evidently, the value 0.725 in the first row is the same for the running total as for the single variable (BDI_1). However, the value in the third row (0.795) is not for the variable BDI_7 alone but for the total of the variables BDI_1, BDI_14, and BDI_7. In particular, the value (0.812) in the last row is for the total of all variables. Their line plot can be easily created by setting the totalAuc parameter plotT to TRUE. The plot for our BDI scale is illustrated by Fig. 2. From tauc.bdi$item we receive the attribute labels in the selection order:

> tauc.bdi$item
"BDI_1"  "BDI_14" "BDI_7"  "BDI_9"  "BDI_10" "BDI_15"
"BDI_17" "BDI_8"  "BDI_20" "BDI_5"  "BDI_3"  "BDI_4"
"BDI_13" "BDI_12" "BDI_2"  "BDI_19" "BDI_18" "BDI_6"
"BDI_16" "BDI_11" "BDI_21"

As illustrated by Fig. 2, the value of the AUC of the selected subset of attributes increases while adding the first six attributes, labeled BDI_1, BDI_14, BDI_7, BDI_9, BDI_10, and BDI_15. There is a slight decline after adding the variable BDI_17. For this reason, the reduction procedure is terminated after the first six attributes are added. The function rsr reduces the scale automatically, assuming the truncation point is the attribute that first reaches the maximum AUC. The AUC is a real value between 0 and 1.
It is 0.5 for random data, but it hardly ever reaches 1 since, in reality, there are always "gray examples" in sizable data.

> rsr.bdi <- rsr(attribute, D, plotRSR=TRUE)
The criteria: Stop first MAX AUC
> rsr.bdi
$rsr.auc
[1] 0.7250092 0.7765412 0.7945490 0.8095300 0.8131352 0.8221669
$rsr.label
[1] "BDI_1"  "BDI_14" "BDI_7"  "BDI_9"  "BDI_10" "BDI_15"
$summary
        AUC one variable  AUC running total
BDI_1         0.7250092         0.7250092
BDI_14        0.7074013         0.7765412
BDI_7         0.7014889         0.7945490
BDI_9         0.7004614         0.8095300
BDI_10        0.6972253         0.8131352
BDI_15        0.6920635         0.8221669
Setting the rsr parameter plotRSR to TRUE, the function generates the plot illustrated by Fig. 3.

We assume that by selecting the "best" attribute in a loop, we are able to reduce the number of attributes for the best predictability. In our case, having the largest AUC is the "best" criterion. Adding the next "best" attribute (from the subset of the remaining attributes) to the selected attributes, until the AUC of all selected attributes decreases, is the main idea of our heuristic. So far, each and every rating scale has been reduced.
Let us consider the Hepatitis data set analyzed in [9] and located at http://archive.ics.uci.edu/ml/. It has 20 attributes and 312 examples, used in [1]. The goal is to illustrate how the entire RSR procedure may be used. To reduce the Hepatitis data set, we use the following attributes with the decision hepato:

> names(att[,-d3])
"time"     "status"   "trt"      "age"      "sex"      "ascites"  "spiders"
"edema"    "bili"     "chol"     "albumin"  "copper"   "alk.phos" "ast"
"trig"     "platelet" "protime"  "stage"

Figure 3: ROC curve of BDI_1 + BDI_14 + BDI_7 + BDI_9 + BDI_10 + BDI_15 for the BDI scale

The following steps are needed:

• detect duplicates and gray examples in the data,
• reduce the rating scale,
• check the possible inclusion.

By executing this code:

> diffExamples(att[,-d3])
$total.examples
[1] 276
$dif.examples
[1] 276
$dup.examples
[1] 0

we detect no duplicates in the data set.

Subsequently, we analyze the data set to detect gray examples for the (status, sex, spiders, stage) attributes. Gray examples are located by the use of data.frame and a function called gray.ex. Working on a full data set is a time-consuming process, since it requires all pairwise comparisons to be analyzed. A short optimization procedure is used for attributes in two categories. The code below shows how to list the gray examples by comparing the subset of attributes (status, sex, spiders, stage) using the function grayExamplesN. The key issue is to properly modify the data.

> gray.ex <- c()
> df1 <- unique(data.frame(hepato, status, sex, spiders, stage))
> for (i in 1:nrow(df1)){
>   ex <- grayExamplesN(df1[,2:ncol(df1)], df1[,1], i)$examp
>   if (nrow(ex)>1){
>     gray.ex <- rbind(gray.ex, ex[1,])}}
> colnames(gray.ex) <- names(df1)
> gray.ex
    hepato status sex spiders stage
1        1      0   1       1     4
2        1      0   1       1     3
3        0      0   2       0     4
6        1      0   1       0     3
7        0      0   1       0     3
8        0      0   1       1     2
9        0      0   1       1     4
15       1      0   1       0     4
22       1      0   2       0     2
23       0      0   1       0     2
29       1      0   1       0     2
42       0      0   2       0     3
48       1      0   1       1     2
53       0      0   1       0     4
54       0      0   1       1     3
57       1      0   2       0     3
70       1      0   2       0     4
80       0      0   2       0     2
93       1      1   1       0     3
98       0      1   1       0     3
111      1      1   1       0     4
139      0      1   1       0     2
253      1      1   1       0     2
256      0      1   1       0     4

R prints the list of gray examples.
It means that every example from the list corresponds to (not listed) examples having identical attributes but a different decision.

To reduce this rating scale, we execute the functions totalAuc and rsr:

> totalAuc(att[,-d3], hepato, plotT=TRUE)
> rsr(att[,-d3], hepato, plotRSR=TRUE)

After the reduction, the scale contains two attributes: stage and bili. The plot in Fig. 4 illustrates the AUC of the reduced rating scale and the corresponding ROC curve.

Figure 4: Left panel: AUC of individual attributes of the Hepatitis data set; Right panel: AUC of stage+bili for the hepato scale

Finally, we examine the scale for possible inclusion using the deLong and bootstrap tests, as in the function roc.test from pROC:
> ROC1 <- CheckAttr4Inclusion(att[,-d3], hepato,
+   method=c("delong"), alternative=c("two.side"))
> ROC2 <- CheckAttr4Inclusion(att[,-d3], hepato,
+   method=c("bootstrap"), alternative=c("two.side"))
> summ
          Z statistics     p-value
DeLong        1.986207  0.04701039
bootstrap     1.945672  0.05169413
The p-value = 0.04701 shows that, according to the DeLong test, the null hypothesis "H0: the true difference in AUC is equal to 0" is rejected in favor of the alternative at the 0.05 level, though only marginally; the bootstrap test (p = 0.05169) does not reject it. Fig. 5 illustrates the two tested ROC curves.

Rating scales are far more important contributors to science than we can address in this study. Most examinations for granting scientific degrees are rating scales of various shapes and forms. Simplifying them (or reducing them in size) is needed, since we are subjected to more and more examinations for much-needed certifications.

In bioinformatics, reporting the trade-off between sensitivity and specificity by using a Receiver Operating Characteristic (ROC) curve is becoming common practice. The ROC plot has the sensitivity on the y-axis against the false positive rate (1 − specificity) on the x-axis. The ROC curve provides a visual tool to determine the boundary limit (or the separation threshold) of a subset (or a combination) of scale items for the potentially optimal combination of sensitivity and specificity. The area under the curve (AUC) of the ROC curve indicates the overall accuracy and the separation performance of the rating scale. It can be readily used to compare different item subsets. As a rule of thumb, the fewer the scale items used to maximize the AUC of the ROC curve, the better.

World Health Organization estimates stand behind selected rating scales for mental disorders.
Rating scales are of considerable importance for psychiatry, where they are predominantly used for screening patients for mental disorders such as:

• depression (see [7]), which affects 60 million people worldwide according to [14],
• bipolar affective disorder (60 million people),
• dementia and cognitive impairment (47.5 million people),
• schizophrenia (21 million people),
• autism and autism spectrum disorder (e.g., [5]),
• mania and bipolar disorder,
• addiction,
• personality and personality disorders,
• anxiety,
• ADHD,

and many other disorders.

Figure 5: ROC curves for the original and reduced rating scales for the Hepatitis data set

Usually, there are many scales for each mental disorder. The most important for screening are global scales. Reducing these global rating scales makes them more usable, as indicated in [7]. The World Health Organization Media Centre reports that "depression and anxiety disorders cost the global economy US$1 trillion each year", and it is no longer a local problem.
Conclusions

The presented method has reduced the number of rating scale items (variables) to 28.57% of the original number of items (from 21 to 6). It means that over 70% of the collected data was unnecessary. This is not only an essential budgetary saving, as data collection is usually expensive and may easily run into hundreds of thousands of dollars, but excessive data collection may also contribute to an increase in data collection errors. The more data are collected, the more errors may occur, since a lack of concentration and boredom are realistic factors.

By using the proposed AUC ROC reduction method, the predictability has increased by approximately 0.5%. It may seem insignificant. However, for a large population, it is of considerable importance. In fact, [14] states: "Taken together, mental, neurological and substance use disorders exact a high toll, accounting for 13% of the total global burden."

The proposed use of the AUC, as a criterion, for reducing the number of rating scale items is innovative and applicable to practically all rating scales. The R code is posted on the Internet (RatingScaleReduction) for general use as an R package. Certainly, more validation cases would be helpful, and assistance will be provided to anyone who wishes to try this method using his/her data. Future plans include using the presented method for financial data, but the real aim of our collaborative effort is towards psychiatric scales. The reduced scales can be further enhanced by the method described in [5] and [6].
Acknowledgments
The first author has been supported in part by the Euro Research grant "Human Capital". The authors would also like to express appreciation to Amanda Dion-Groleau, B.A. Honors Psychology (Laurentian University, Psychology), Tyler D. Jessup (Laurentian University, Computer Science), and Grant O. Duncan (Team Lead, Business Intelligence, Integration and Development, Health Sciences North, Sudbury, Ontario, Canada) for the editorial improvements.
References

[1] Bojan Cestnik, Igor Kononenko, Ivan Bratko, et al. Assistant 86: A knowledge-elicitation tool for sophisticated users. In EWSL, pages 31–45, 1987.
[2] Tom Fawcett. ROC graphs: Notes and practical considerations for researchers. Machine Learning, 31(1):1–38, 2004.
[3] Dincer Goksuluk, Selcuk Korkmaz, Gokmen Zararsiz, and A. Ergun Karaagaoglu. easyROC: An interactive web-tool for ROC curve analysis using R language environment. The R Journal, 2016.
[4] Peter Hills and Michael Argyle. The Oxford Happiness Questionnaire: A compact scale for the measurement of psychological well-being. Personality and Individual Differences, 33(7):1073–1082, 2002.
[5] Tamar Kakiashvili, Waldemar W. Koczkodaj, and Marc Woodbury-Smith. Improving the medical scale predictability by the pairwise comparisons method: Evidence from a clinical data study. Computer Methods and Programs in Biomedicine, 105(3):210–216, 2012.
[6] Waldemar W. Koczkodaj. Statistically accurate evidence of improved error rate by pairwise comparisons. Perceptual and Motor Skills, 82(1):43–48, 1996.
[7] W.W. Koczkodaj, T. Kakiashvili, A. Szymańska, J. Montero-Marin, R. Araya, J. Garcia-Campayo, K. Rutkowski, and D. Strzałka. How to reduce the number of rating scale items without predictability loss? Scientometrics, 2, 2017.
[8] Thomas J. Leeper. Crowdsourced data preprocessing with R and Amazon Mechanical Turk. The R Journal, 8(1):276–288, 2016.
[9] M. Lichman. UCI machine learning repository, 2013.
[10] Roger McIntyre, Sidney Kennedy, Michael Bagby, and David Bakish. Assessing full remission. Journal of Psychiatry & Neuroscience: JPN, 27(4):235, 2002.
[11] Xavier Robin, Natacha Turck, Alexandre Hainard, Natalia Tiberti, Frédérique Lisacek, Jean-Charles Sanchez, and Markus Müller. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12:77, 2011.
[12] Tobias Sing, Oliver Sander, Niko Beerenwinkel, and Thomas Lengauer. ROCR: visualizing classifier performance in R. Bioinformatics, 21(20):3940–3941, 2005.
[13] M. Velden and W. Clark. Reduction of rating scale data by means of signal detection theory. Psychophysics, 25(6):517–518, 1979. doi:10.3758/BF03213831.
[14] WHO. Mental disorders fact sheet. April 2016.
[15] Luke A. Winslow, Scott Chamberlain, Alison P. Appling, and Jordan S. Read. sbtools: A package connecting R to cloud-based data for collaborative online research.