Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
Ola Spjuth, Robin Carrión Brännström, Lars Carlsson, Niharika Gauraha
Pages: 1–13, 2019
Ola Spjuth [email protected]
Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
Robin Carrión Brännström [email protected]
Department of Statistics, Uppsala University, Uppsala, Sweden
Lars Carlsson [email protected]
Stena Line AB, Gothenburg, Sweden
Niharika Gauraha [email protected]
Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
Abstract
Conformal Prediction is a framework that produces prediction intervals based on the output from a machine learning algorithm. In this paper we explore the case when training data is made up of multiple parts available in different sources that cannot be pooled. We here consider the regression case and propose a method where a conformal predictor is trained on each data source independently, and where the prediction intervals are then combined into a single interval. We call the approach Non-Disclosed Conformal Prediction (NDCP), and we evaluate it on a regression dataset from the UCI machine learning repository using support vector regression as the underlying machine learning algorithm, with varying numbers of data sources and sizes. The results show that the proposed method produces conservatively valid prediction intervals, and while we cannot retain the same efficiency as when all data is used, efficiency is improved through the proposed approach as compared to predicting using a single arbitrarily chosen source.
Keywords:
Conformal Prediction, Machine Learning, Regression, Support Vector Machines, Prediction Intervals
1. Introduction
There is a growing number of data analysis applications in which data comes from multiple unrelated sources and full disclosure of the data between the parties is prevented by privacy and security concerns (Abadi et al., 2017; Papernot, 2018). In such predictive analysis settings, the challenge is to make use of the isolated data sources in statistical learning systems, with the objective to make more accurate predictions on future objects without sharing the data with other sources. Pooling of data to one particular location for model building can be a possible solution for small data sources, especially when data privacy is not a concern. However, if data is large or if the data owners do not allow such pooling of data, one has to resort to secure and distributed learning approaches such as secure federated learning methods. Federated learning models (for example, Shokri and Shmatikov (2015) for deep learning) are, however, usually complex to implement in practice.

We present a light-weight framework that gives more accurate prediction intervals by aggregating conformal predictions (prediction intervals) computed at the individual locations (data sources) without sharing the data between the sources. Conformal Prediction is a framework that complements the prediction from a machine learning algorithm with a valid measure of confidence (i.e., prediction intervals), assuming that the data is exchangeable (Vovk et al., 2005). We propose to combine conformal predictions from multiple sources, where inductive conformal predictors (Papadopoulos et al., 2002) and cross-conformal predictors (Vovk, 2015) are applied on the multiple data sources and their individual prediction intervals are combined to form a single prediction on a new example. We refer to this method as Non-Disclosed Conformal Prediction (NDCP).

The organization of the paper is as follows. In Section 2, we introduce the background concepts and notation used throughout the paper. In Section 3, we introduce the concept of aggregating conformal predictions from multiple sources. The experimental setup and experiments are described in Section 4. A discussion is presented in Section 5, and the paper is concluded in Section 6.
2. Background
In this paper, we consider only regression problems and assume exchangeability of observations. The object space is denoted by X ⊂ R^p, where p is the number of features, and the label space is denoted by Y ⊂ R. We assume that each example consists of an object and its label, and its space is given as Z := X × Y. In a classical regression setting, given ℓ data points Z = {z_1, ..., z_ℓ}, where each example z_i = (x_i, y_i) is labeled, we want to predict the label of a new object x_new.

In the conformal prediction setting, a non-conformity measure is the score from a function that measures the strangeness of an example in relation to the previous examples (Vovk et al., 2005). For regression problems, a commonly used non-conformity measure is

    α_i = |y_i − ŷ_i|,    (1)

where ŷ_i is the estimated output for the object x_i using a regression algorithm. When using the non-conformity measure (1), the prediction intervals will be of equal length for all test examples. Instead, a non-conformity measure that takes into account the accuracy of the decision rule f on x_i can be used, yielding a prediction interval with a length proportional to the predicted accuracy for the new example; i.e., the prediction intervals will be tighter when the underlying algorithm's prediction is good and larger when it is predicted to be bad. The normalized non-conformity score is

    α_i = |(y_i − ŷ_i) / exp(σ_i)|,    (2)

where σ_i is the prediction of the logarithm of the absolute residuals, ln(|y_i − ŷ_i|), from a linear SVR trained on the proper training set. This is considered to be an estimate of the decision rule accuracy (Papadopoulos et al., 2002).

Conformal predictors are built on top of standard machine learning algorithms and complement the predictions with valid measures of confidence (Vovk et al., 2005). The two main approaches are Transductive Conformal Prediction (TCP) (Vovk, 2013) and Inductive Conformal Prediction (ICP) (Papadopoulos et al., 2002), and they can be used for both classification and regression problems. TCP is computationally demanding, since a re-training of the model is required for every test example; ICP was developed to overcome this issue. In ICP, a subset of the training examples is set aside for calibration, which makes ICP less informationally efficient. To address this problem of informational efficiency, ensembles of conformal predictors were introduced, such as Cross-Conformal Prediction (CCP) (Vovk, 2015; Papadopoulos, 2015), Aggregated Conformal Prediction (ACP) (Carlsson et al., 2014), and the combination of inductive Mondrian conformal predictors (Toccaceli and Gammerman, 2018). These ensemble methods aim to construct more informationally efficient conformal predictors by combining p-values. However, most of the resulting models are not guaranteed to be valid, as the combined p-values need not be uniformly distributed (Linusson et al., 2017). Various methods of combining p-values have been proposed, for example combining p-values using their mean (Vovk, 2015), using their median (Linusson et al., 2017), using the extended chi-square function, and using the standard normal form (Balasubramanian et al., 2015).
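To make the ICP construction concrete, below is a minimal sketch of an inductive conformal regressor using the absolute-error non-conformity measure (1). It assumes scikit-learn's SVR as the underlying model; the function name, the calibration fraction and other details are illustrative choices, not taken from the paper.

```python
# Minimal ICP regression sketch using nonconformity measure (1).
# Assumes numpy arrays and scikit-learn's SVR; names are illustrative.
import numpy as np
from sklearn.svm import SVR

def icp_interval(X_train, y_train, x_new, epsilon=0.05, calib_frac=0.3):
    """Return a (lower, upper) prediction interval for x_new at level 1 - epsilon."""
    n_calib = int(calib_frac * len(y_train))
    # Split the training data into a proper training set and a calibration set.
    X_proper, y_proper = X_train[:-n_calib], y_train[:-n_calib]
    X_calib, y_calib = X_train[-n_calib:], y_train[-n_calib:]

    model = SVR(kernel="rbf").fit(X_proper, y_proper)

    # Nonconformity scores on the calibration set: alpha_i = |y_i - y_hat_i|.
    alphas = np.sort(np.abs(y_calib - model.predict(X_calib)))

    # (1 - epsilon) quantile of the calibration scores, with the +1 correction
    # that makes the interval conservatively valid.
    k = int(np.ceil((1 - epsilon) * (n_calib + 1))) - 1
    q = alphas[min(k, n_calib - 1)]

    y_hat = model.predict(x_new.reshape(1, -1))[0]
    return y_hat - q, y_hat + q
```

The normalized measure (2) would instead divide each residual by exp(σ_i) from a secondary model, yielding object-dependent interval widths.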
3. Non-Disclosed Conformal Prediction
In this section, we present the proposed method, which we call Non-Disclosed Conformal Prediction (NDCP). Mainly, we propose a new framework to combine conformal prediction (CP) intervals across various data sources, where the number of sources, the size of each data source and the distribution of data may vary, and where data is not shared between the data sources.

Suppose we have K data sources, each with a training dataset D_k of arbitrary size, where k ∈ {1, ..., K}. For a new object x_new, the objective is to combine, at a location A, the prediction intervals that were computed in each data source using CP. The result is a single aggregated prediction interval, where no training data is disclosed between the data sources, or between the data sources and location A; the only information transmitted between the data sources and A is the object to predict and the resulting prediction intervals. Assuming that data owners cannot disclose anything other than point predictions and intervals, a simple and relatively naive approach is to combine the shared values directly: the intervals are combined by using the median upper and lower bounds, respectively. An overview of the NDCP algorithm is presented in Algorithm 1.
Algorithm 1: Non-Disclosed Conformal Prediction (NDCP)
Input: K data sources D_1, ..., D_K; test example x_new
Output: The prediction interval I for x_new
Steps:
  for each D_k, k ∈ {1, ..., K} do
      Train an ICP or CCP and compute the prediction interval I_k for the test example x_new
      Transfer I_k to location A
  end
  Combine all intervals into one interval I at location A (by taking the medians of the lower and upper bounds)
  return I
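As an illustration of Algorithm 1, the sketch below combines per-source ICP intervals at location A by taking the median of the lower bounds and the median of the upper bounds. It reuses the hypothetical icp_interval function from the sketch in Section 2; in the non-disclosed setting, only the interval endpoints would ever leave each source.

```python
# Sketch of the NDCP combination step (Algorithm 1); icp_interval is the
# illustrative ICP helper from the Section 2 sketch.
import numpy as np

def ndcp_interval(sources, x_new, epsilon=0.05):
    """sources: list of (X_k, y_k) training sets, one per data owner."""
    lowers, uppers = [], []
    for X_k, y_k in sources:
        lo, hi = icp_interval(X_k, y_k, x_new, epsilon=epsilon)  # computed locally at source k
        lowers.append(lo)  # only the interval endpoints are transferred to location A
        uppers.append(hi)
    # Location A combines the K intervals into a single interval I.
    return np.median(lowers), np.median(uppers)
```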
4. Experiments
We evaluate NDCP on the benchmark data set Concrete Compressive Strength from the UCI repository (Lichman et al., 2013). The experimental setup and experiments are described in the following subsections.
To simulate a scenario where data are located in different places, the data is split into subsets where each subset represents an individual data source. After a test set has been set aside, the data is split in three different ways to simulate different scenarios (a code sketch of these schemes is given at the end of this setup):

1. Equally sized data sources: the training set is randomly partitioned into equally sized data sources.

2. Unequally sized data sources: the training set is randomly partitioned into different sizes, simulating a real-life scenario where one data source is larger than the rest.

3. Non-IID, equally sized data sources: the training set is divided such that one of the data sources has a higher proportion of observations with high values of the response variable, simulating a real-life scenario where different data owners do not have identically distributed data.

The evaluation procedure is outlined below:

1. Randomly split the data set into a training set (90%) and a test set (10%).
2. Split the training set into K disjoint data sets of
   (a) random, equally sized data sources,
   (b) random, unequally sized data sources, or
   (c) non-IID, equally sized data sources.
3. Train ICP or CCP on each individual data set.
4. Aggregate the predictions from all K data sets using NDCP.
5. Train ICP or CCP on the pooled data from all K data sets.
6. Repeat steps one to five 100 times.

In our experiments, we use the following underlying machine learning algorithms: Support Vector Regression (SVR) with an RBF kernel, a linear SVR as proposed in Papadopoulos (2015), and Random Forests (Breiman, 2001). The non-conformity measures were calculated as given in equation (1). The prediction intervals were combined by taking the medians of the lower and upper bounds, as suggested in Park and Budescu (2015).

For the evaluations, we consider validity and efficiency. Validity is the proportion of true values contained in the prediction interval. As the efficiency metric, we use the median width of the prediction intervals. n represents the total number of observations in the training set for each data source. Note that n for NDCP refers to the sum of all observations used in all the models producing the prediction intervals that are combined, and is hence the total number of observations (the same as for Pooled). The objective of NDCP is primarily to achieve improved efficiency compared to the individual sources that are used to create the NDCP intervals. It is also desirable that the performance is as close to that of the pooled data as possible.

Together with the results, a hypothetical Ideal NDCP is also presented. This represents an NDCP with an ideal combination of intervals, in the sense that exact validity is attained; i.e., if the intervals are conservative (the observed error rate is less than the expected error rate), the intervals are shrunk symmetrically by the same factor until the expected error rate is obtained. Note that this is only possible after the true labels have been revealed, and it is included only to show the results NDCP would give with a hypothetical, optimal symmetric interval combination.

Each setting is considered with 2, 4 and 6 data sources and is repeated 100 times to obtain consistent results. Support Vector Regression (SVR) with an RBF kernel was used, and in every run the parameters C, ε and γ were optimized through grid search with 10-fold cross-validation. When creating the prediction intervals, a significance level of 5% is used.
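For reference, the sketch below shows one way to implement the three partitioning schemes. The paper does not spell out the exact splitting mechanics, so the size ratio in the unequal split and the label-bias rule in the non-IID split (here, the first source simply receives the highest-valued labels) are assumptions.

```python
# Illustrative partitioning of a training set (X, y) into K data sources,
# mirroring the three experimental scenarios; the details are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def split_equal(X, y, K):
    """Scenario 1: random partition into K equally sized sources."""
    idx = rng.permutation(len(y))
    return [(X[part], y[part]) for part in np.array_split(idx, K)]

def split_unequal(X, y, K):
    """Scenario 2: one source roughly twice as large as each of the others."""
    idx = rng.permutation(len(y))
    sizes = np.full(K, len(y) // (K + 1))  # K - 1 small sources ...
    sizes[0] *= 2                          # ... and one double-sized source
    bounds = np.cumsum(sizes)[:-1]         # the last part absorbs the remainder
    return [(X[part], y[part]) for part in np.split(idx, bounds)]

def split_non_iid(X, y, K):
    """Scenario 3: equal sizes, but source 1 is biased toward high labels
    (here simply the observations with the highest response values)."""
    order = np.argsort(y)                  # indices sorted by label value
    size = len(y) // K
    src1 = order[-size:]                   # source 1: the highest-valued labels
    rest = rng.permutation(order[:-size])  # remaining observations, shuffled
    parts = [src1] + list(np.array_split(rest, K - 1))
    return [(X[part], y[part]) for part in parts]
```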
The rest of this section is divided into three parts, investigating each of the three different settings for splitting the data.

Results from splitting the data into 2, 4 and 6 equally sized data sources are presented in Table 1. Each row in the table represents the results for one specific model or data source. At the 5% significance level we observe that NDCP using ICP in all cases yields a smaller median interval width (better efficiency) than the individual data sources, except for 2 data sources, where NDCP yields a smaller width than only one of the two sources.

In Figure 1 the dispersion of the prediction interval widths is presented for Pooled, NDCP and a randomly selected data source from the equally sized data sources, where CCP is used for each model at the 5% significance level. We observe that the prediction interval widths coming from the individual sources always have a larger variance compared to NDCP.

Table 1: Results from Experiment 1, equally sized data sources, for the models NDCP, Ideal NDCP, the individual equally sized data sources (2, 4 and 6) and Pooled. Results for validity (Val) and prediction interval median width as a measure of efficiency (Eff) are listed in the columns for CCP and ICP at different significance levels. The column n refers to the number of observations underlying the predictions.
                       CCP 5%         ICP 5%         ICP 10%        ICP 15%        ICP 20%
Model        n         Val    Eff     Val    Eff     Val    Eff     Val    Eff     Val    Eff

2 data sources
NDCP         927       0.971  26.813  0.966  28.566  0.925  22.391  0.882  18.700  0.823  16.266
Ideal NDCP   927       0.950  24.575  0.950  26.717  0.900  20.961  0.850  17.392  0.800  15.497
Source1      463       0.957  26.965  0.945  29.155  0.891  22.364  0.848  18.804  0.789  16.100
Source2      464       0.960  26.748  0.940  28.051  0.897  22.484  0.847  18.709  0.794  16.426
Pooled       927       0.964  22.283  0.945  23.259  0.900  18.445  0.854  15.362  0.794  13.286

4 data sources
NDCP         927       0.978  31.674  0.967  31.738  0.936  26.257  0.881  21.596  0.851  19.379
Ideal NDCP   927       0.950  27.630  0.950  29.291  0.900  23.688  0.850  20.159  0.800  17.442
Source1      231       0.958  31.655  0.925  32.020  0.883  27.039  0.819  21.523  0.791  19.393
Source2      232       0.958  31.858  0.921  32.803  0.882  26.037  0.826  21.733  0.793  19.702
Source3      232       0.955  31.793  0.928  32.325  0.887  26.804  0.830  22.026  0.781  19.350
Source4      232       0.957  32.186  0.930  33.484  0.895  26.650  0.829  22.127  0.794  19.552
Pooled       927       0.962  22.052  0.945  23.474  0.895  18.449  0.845  15.524  0.791  13.488

6 data sources
NDCP         927       0.977  34.492  0.964  33.214  0.937  27.859  0.897  24.080  0.841  20.463
Ideal NDCP   927       0.950  29.853  0.950  31.196  0.900  25.119  0.850  21.653  0.800  18.911
Source1      155       0.952  35.010  0.921  34.886  0.876  28.786  0.822  24.234  0.767  20.520
Source2      155       0.953  34.711  0.914  36.682  0.876  28.211  0.827  24.757  0.774  20.741
Source3      154       0.952  35.229  0.912  35.570  0.881  30.190  0.834  24.790  0.779  21.007
Source4      154       0.953  34.876  0.914  36.096  0.861  28.441  0.830  24.333  0.765  20.491
Source5      154       0.952  34.525  0.913  33.503  0.881  28.846  0.827  24.578  0.774  20.462
Source6      155       0.949  34.547  0.914  35.850  0.877  28.149  0.834  24.676  0.769  20.322
Pooled       927       0.967  22.601  0.948  22.932  0.897  18.490  0.845  15.440  0.800  13.366
In Table 2 the results from splitting the data into unequally sized sources are presented, where Source1 contains approximately twice as many observations as the smaller sources. As expected, the larger Source1 in all cases yields a smaller median interval width than the smaller Source2. We also note that NDCP in all cases yields a smaller median interval width than at least one of the individual data sources; this holds for both ICP and CCP. NDCP also improves in terms of validity compared with the smaller data sources. Using the hypothetical, optimal combination of intervals with Ideal NDCP, the interval widths approach the same values as those of the larger Source1.

In Figure 2 the dispersion of the prediction interval widths is presented for Pooled, NDCP, a randomly selected small data source and the large data source, where ICP and CCP are used for each model. We observe that while NDCP is capable of reducing the prediction interval variance compared to the models trained on the small sources, the model trained on the large data source still has a lower variance. This pattern becomes clearer with an increasing number of data sources.
Table 2: Results from Experiment 2, unequally sized data sources, for the models NDCP, Ideal NDCP, the individual data sources (2, 4 and 6) and Pooled. All models are listed in the first column and the corresponding validity (Val) and prediction interval median width as a measure of efficiency (Eff) are listed in the next four columns for CCP and ICP, respectively. n refers to the number of observations underlying the predictions.
             CCP             ICP
Model        Val    Eff      Val    Eff      n

2 data sources
NDCP         0.971  27.453   0.961  28.438   927
Ideal NDCP   0.950  24.921   0.950  26.928   927
Source1      0.957  25.064   0.942  25.990   620
Source2      0.957  29.951   0.938  30.899   307
Pooled       0.963  22.221   0.943  22.962   927

4 data sources
NDCP         0.973  30.472   0.971  32.127   927
Ideal NDCP   0.950  27.336   0.950  29.188   927
Source1      0.957  28.534   0.940  29.350   370
Source2      0.946  31.622   0.930  32.902   185
Source3      0.949  31.815   0.932  35.257   186
Source4      0.935  30.782   0.929  36.080   186
Pooled       0.963  22.106   0.945  23.252   927

6 data sources
NDCP         0.973  32.986   0.972  35.562   927
Ideal NDCP   0.950  29.489   0.950  31.491   927
Source1      0.947  29.295   0.933  31.743   260
Source2      0.940  33.643   0.927  39.362   134
Source3      0.945  34.811   0.926  36.024   134
Source4      0.941  34.251   0.925  44.878   133
Source5      0.944  35.056   0.931  42.282   133
Source6      0.940  33.647   0.926  44.392   133
Pooled       0.963  22.128   0.948  23.330   927

Figure 1: Dispersion of efficiency (prediction interval widths) for Experiment 1, equally sized data sources, for 2, 4 and 6 data sources, for (a) ICP and (b) CCP. Results from Pooled, NDCP and a randomly selected data source (Individual) from the equally sized data sources are presented.

In Table 3 the results from splitting the data into non-IID, equally sized data sources are presented.
In contrast to Experiment 1, the data is distributed so that Source1 always contains a higher proportion of high-valued labels, which means that none of the sources has identically distributed data compared to the test set.

Also in this experiment we see that NDCP in all cases yields a smaller median interval width than at least one of the individual data sources; this holds for both ICP and CCP. For the three different scenarios, we observe a large variance between the individual CCP and ICP sources, whereas the NDCP efficiency is consistently good, but not always the best. For 2 data sources, we observe that Source2 has a validity below 90%. Source1 does, however, show an acceptable validity, but with a large interval. NDCP, on the other hand, manages to yield intervals with acceptable validity. Applying an ideal combination of intervals would give a tighter interval, which means there is room for improvement in the merging of intervals. For 4 and 6 sources we observe a similar pattern, and NDCP particularly improves in terms of validity when used with ICP.

In Figure 3 the dispersion of the interval widths is presented for Pooled, NDCP, a data source with a low proportion of high-valued labels and the data source with a high proportion of high-valued labels, where CCP is used for each model. For this simulated data partitioning, we observe a slightly tighter interval width for NDCP compared with the individual data sources, and in particular compared with the data source with a high proportion of high-valued labels. This result is more pronounced for the experiment with 6 data sources.
Table 3: Results from Experiment 3, non-IID, equally sized data sources, for the models NDCP, Ideal NDCP, the individual equally sized data sources (2, 4 and 6) and Pooled. All models are listed in the first column and the corresponding validity (Val) and prediction interval median width as a measure of efficiency (Eff) are listed in the next four columns for CCP and ICP, respectively. n refers to the number of observations underlying the predictions.
             CCP             ICP
Model        Val    Eff      Val    Eff      n

2 data sources
NDCP         0.959  26.510   0.953  27.790   927
Ideal NDCP   0.950  25.816   0.950  27.381   927
Source1      0.946  28.878   0.936  30.602   463
Source2      0.895  24.281   0.879  25.055   464
Pooled       0.960  22.137   0.943  22.968   927

4 data sources
NDCP         0.974  31.009   0.963  31.040   927
Ideal NDCP   0.950  27.802   0.950  29.265   927
Source1      0.938  33.625   0.909  34.678   231
Source2      0.941  30.442   0.917  32.018   232
Source3      0.946  30.684   0.908  29.907   232
Source4      0.951  30.908   0.916  32.591   232
Pooled       0.966  22.454   0.943  23.013   927

6 data sources
NDCP         0.974  33.723   0.965  32.825   927
Ideal NDCP   0.950  29.819   0.950  30.552   927
Source1      0.928  37.158   0.887  37.143   154
Source2      0.944  33.401   0.907  37.042   155
Source3      0.947  34.521   0.916  33.508   154
Source4      0.942  33.450   0.920  35.658   155
Source5      0.945  34.148   0.912  38.356   155
Source6      0.944  33.387   0.906  32.591   154
Pooled       0.958  22.091   0.947  23.494   927

Figure 2: Dispersion of efficiency (prediction interval widths) for Experiment 2, unequally sized data sources, for 2, 4 and 6 data sources, using ICP and CCP. Results from Pooled, NDCP, a randomly selected data source from the small data sources (Small), and the large data source (Large) are presented.
5. Discussion
This manuscript explores the combination of prediction intervals from multiple conformal predictors in the case when data cannot be disclosed between the individual data sources, and hence cannot be pooled into a traditional training set. In this scenario, the number of examples in each data source is also not disclosed, as this could be, e.g., sensitive information. We performed a set of experiments to investigate our method, Non-Disclosed Conformal Prediction (NDCP), for different data distribution scenarios between the individual data sources.

In all three experiments, NDCP does not perform as well as pooled data, but it shows an improved efficiency over at least one of the individual data sources, which means it has some value for at least this data source. For equally sized data sources, NDCP compares very well and is mostly superior in terms of efficiency when compared to the individual data sources. For unequally sized data sources the advantages of NDCP are less pronounced, but NDCP still outperforms at least one data source in all settings. For equally sized sources with non-IID data, NDCP is consistently good, if not always the best, compared with the individual sources. For unequally sized sources, one could argue that the largest data source always outperforms NDCP; but in the NDCP setting the number of training objects is not disclosed, so this will be hard to deduce without sharing potentially sensitive information. When the individual data sources do not have identical distributions compared to the test data, the individual data sources have larger variance in efficiency and generally lower validity, whereas NDCP produces models with good validity and good, if not always the best, efficiency. We consider this scenario interesting, as in real-life settings the i.i.d. assumption is not always certain to fully hold.

Figure 3: Dispersion of efficiency (prediction interval widths) for Experiment 3, non-IID, equally sized data sources, for 2, 4 and 6 data sources using ICP and CCP. Results from Pooled, NDCP, a randomly selected data source with a lower proportion of high-valued labels, and a data source with a higher proportion of high-valued labels are presented.

In general, when considering CCP vs. ICP, aggregating seems to improve efficiency in our experiments, although this comparison is not in scope for this paper.

In this work we have used a relatively simple merging of intervals. Future work could include more advanced interval merging, such as weighting of intervals based on data source size (if such data can be disclosed). The Ideal NDCP, which represents an optimal combination of intervals, shows that there is indeed room for improvement in the merging of intervals.

The experiments with only two data sources apparently constitute a setting where NDCP is less suitable. We also envision that NDCP would yield improved results for larger numbers of data sources, which would be interesting to study in future experiments.
6. Conclusions
We present a method called Non-Disclosed Conformal Prediction (NDCP) to aggregate prediction intervals from multiple data sources while avoiding the pooling of data, thereby preserving data privacy. While we cannot retain the same efficiency as when all data is used, the efficiency is improved through the proposed approach as compared to predicting using a single source, and in some evaluated scenarios it is superior to the models trained on all individual data sources. The results indicate that the NDCP method is relevant for predictions on non-disclosed data.

Acknowledgements
This project received financial support from the Swedish Foundation for Strategic Research (SSF) as part of the HASTE project under the call 'Big Data and Computational Science'. The computations were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under project SNIC 2019/–.

References
Martín Abadi, Ulfar Erlingsson, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Nicolas Papernot, Kunal Talwar, and Li Zhang. On the protection of private information in machine learning systems: Two recent approaches. In IEEE 30th Computer Security Foundations Symposium (CSF), pages 1–6. IEEE, 2017.

Vineeth N. Balasubramanian, Shayok Chakraborty, and Sethuraman Panchanathan. Conformal predictions for information fusion. Annals of Mathematics and Artificial Intelligence, 74(1-2):45–65, 2015.

Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001. doi: 10.1023/A:1010933404324. URL https://doi.org/10.1023/A:1010933404324.

Lars Carlsson, Martin Eklund, and Ulf Norinder. Aggregated conformal prediction. In IFIP International Conference on Artificial Intelligence Applications and Innovations, pages 231–240. Springer, 2014.

Moshe Lichman et al. UCI machine learning repository, 2013.

Henrik Linusson, Ulf Norinder, Henrik Boström, Ulf Johansson, and Tuve Löfström. On the calibration of aggregated conformal predictors. In Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, editors, Proceedings of the Sixth Workshop on Conformal and Probabilistic Prediction and Applications, volume 60 of Proceedings of Machine Learning Research, pages 154–173. PMLR, 2017. URL http://proceedings.mlr.press/v60/linusson17a.html.

Harris Papadopoulos. Cross-conformal prediction with ridge regression. In International Symposium on Statistical Learning and Data Sciences, pages 260–270. Springer, 2015.

Harris Papadopoulos, Kostas Proedrou, Volodya Vovk, and Alex Gammerman. Inductive confidence machines for regression. In European Conference on Machine Learning, pages 345–356. Springer, 2002.

Nicolas Papernot. A Marauder's Map of security and privacy in machine learning. arXiv e-prints, arXiv:1811.01134, 2018.

Saemi Park and David V. Budescu. Aggregating multiple probability intervals to improve calibration. Judgment and Decision Making, 10(2):130, 2015.

R. Shokri and V. Shmatikov. Privacy-preserving deep learning. In 53rd Annual Allerton Conference on Communication, Control, and Computing, pages 909–910, 2015. doi: 10.1109/ALLERTON.2015.7447103.

Paolo Toccaceli and Alexander Gammerman. Combination of inductive Mondrian conformal predictors. Machine Learning, pages 1–22, 2018.

Vladimir Vovk. Transductive conformal predictors. In IFIP International Conference on Artificial Intelligence Applications and Innovations, pages 348–360. Springer, 2013.

Vladimir Vovk. Cross-conformal predictors. Annals of Mathematics and Artificial Intelligence, 74(1-2):9–28, 2015.

Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer Science & Business Media, 2005.