A deterministic matching method for exact matchings to compare the outcome of different interventions

Abstract

Statistical matching methods are widely used in the social and health sciences to estimate causal effects using observational data. Often the objective is to find comparable groups with similar covariate distributions in a dataset, with the aim of reducing bias in a random experiment. We aim to develop a foundation for deterministic methods which provide results with low bias, while retaining interpretability. The proposed method matches on the covariates and calculates all possible maximal exact matches for a given dataset without adding numerical errors. Notable advantages of our method over existing matching algorithms are that all available information for exact matches is used, that no additional bias is introduced, that it can be combined with other matching methods for inexact matching to reduce pruning, and that the result is calculated in a fast and deterministic way. For a given dataset the result is therefore provably unique for exact matches in the mathematical sense. We provide proofs, instructions for implementation, as well as a numerical example calculated for comparison on a complete survey.

Felix Bestehorn · Maike Bestehorn · Christian Kirches

Keywords

Statistical exact matching; evaluation of observational studies; matched sampling; weighted matching

JEL classification codes

C15 · C18

Mathematics Subject Classification (2010) · Mathematics Subject Classification (2020)

C. Kirches acknowledges funding by Deutsche Forschungsgemeinschaft through Priority Programme 1962 "Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation and Hierarchical Optimization". C. Kirches was supported by the German Federal Ministry of Education and Research, grants no. …

Felix Bestehorn · Christian Kirches
Institute for Mathematical Optimization, Technische Universität Braunschweig, Braunschweig, Germany
E-mail: {f.bestehorn, c.kirches}@tu-bs.de

Maike Bestehorn
Schäftlarn, Germany
E-mail: [email protected]

arXiv: … [stat.AP]

1 Introduction

Statistical matching (SM) is widely used to reduce the effect of confounding (Anderson et al., 1980; Kupper et al., 1981; Rubin, 1973) when estimating the causal effects of two different paths of action in an observational study. Such a study consists, e.g., of a dataset $D$ containing two therapy groups $A$ and $B$, which in turn contain patients $a_1, \ldots, a_{|A|}$ and $b_1, \ldots, b_{|B|}$. Every patient $p \in D$ has an $s$-dimensional covariate vector $\mathrm{cv}(p)$, describing the patient's condition, and an observed value $o(p)$, describing the result of the therapy; for examples see (Abidov et al., 2005; Adams et al., 2017; Burden et al., 2017; Capucci et al., 2017; Chen et al., 2016; Cho et al., 2017; Dou et al., 2017; Fukami et al., 2017; Gozalo et al., 2015; Kishimoto et al., 2017; Kong et al., 2017; Lai et al., 2016; Lee et al., 2017; Liu et al., 2016; McDonald et al., 2017; McEvoy et al., 2016; Ray et al., 2012; Salati et al., 2017; Schermerhorn et al., 2008; Seung et al., 2008; Shaw et al., 2008; Svanström et al., 2013; Tranchart et al., 2016; Zangbar et al., 2016; Zhang et al., 2016, 2015).

The goal with regard to SM is then to find a matching such that patients who are similar according to a chosen similarity measure, e.g. Mahalanobis distance or propensity score, are compared with each other. From such a matching a reliable conclusion about the preferable therapy, under the circumstances defined by the underlying model and hypothesis, can be drawn, while bias potentially introduced through a comparison of dissimilar patients is reduced. If the matching is also maximal in the sense that all possibly matchable patients, i.e. patients who are similar to other patients, are matched, the matching is called a maximal matching.

Regrettably, minimizing bias is not the only key aspect to be considered: maximal matching can lead to the pruning of patients, thus possibly ignoring relevant information contained in the dataset.
As these matchings are usually not unique, there can be a high variance in information between matchings, and thus conclusions drawn from them can potentially vary to a high degree. Hence one needs to find a matching optimized with regard to bias and pruning. As the underlying distribution of the dataset and the influence of the therapy are unknown, finding the optimal maximal patient-to-patient matching is difficult. Based on the foundation laid by Rosenbaum and Rubin (1983) for propensity score matching (PSM), many different methods have been proposed to deal with this problem, e.g. nearest neighbour matching (Rubin, 1972), stratification matching on propensity scores (Anderson et al., 1980), caliper matching (Stuart, 2010), optimal matching (Rosenbaum, 1989), coarsened matching (Iacus et al., 2012) or full matching (Hansen, 2004; Hansen and Klopfer, 2012). A comprehensive overview can be found, for example, in the article by Stuart (2010).

The aforementioned methods inspect either one or several matchings and try to cope with the problem of not being able to calculate all possible maximal matchings through numerical or stochastic methods (Stuart, 2010). They have some limitations which have been investigated lately (Austin, 2011; King and Nielsen, 2019; Pearl, 2000). Therefore, researchers sometimes find themselves in the difficult position where different matchings, while being statistically sound, can suggest different conclusions.

For these reasons we investigate a slightly different approach in this article. We first show that considering only one or several patient-to-patient matchings over the whole dataset is inadequate: exponentially many different patient-to-patient matchings exist, so high variance in the deduced conclusions is possible. We then propose a method that matches clusters of patients.
The goal is to develop a method which uses all information contained in the dataset and considers all possible maximal matchings of a dataset in accordance with a chosen similarity, thus leading to low confounding and low variance. The proposed method has the desirable property of calculating a matching in accordance with the expectancy value of all possible maximal exact matchings in the dataset, while being fast and deterministic. It can therefore support decision making processes, as no additional errors are included during the matching process.

A short note on terminology: We use the terms therapy group, patient, covariate vector and observed value for clarity and simplicity of presentation; they can be substituted by any type of group, member of said group, properties of the member and observed result for the member.

1.1 Contribution

We investigate the quantity of possible exact matchings and show that, even under the restriction that only exact matches are considered, many possible matchings exist. For this reason we propose a different approach, which uses all available information in a given dataset for an exact matching, and show that the proposed method has desirable properties for SM. We confirm our theoretical contributions by evaluating a complete survey and comparing the proposed method with the de-facto standard for SM in such applications, namely propensity score matching (PSM).

As stated in the introduction (Section 1), the goal of SM in a general setting is to match as many patients between two groups as possible with regard to a chosen similarity or distance measure. One can immediately distinguish two cases:
– Exact matching (Iacus et al., 2012; Stuart, 2010): Only members of different sets with equal covariate vectors are matched.
– $\delta$-matching or caliper/inexact matching (Stuart, 2010): Members of different sets can be matched if their difference with regard to a chosen similarity measure does not exceed $\delta$.

Thus exact matching is a special case of $\delta$-matching for $\delta = 0$.

Definition 1 (Matchable Patients)

Let $p$ and $q$ be two patients from different therapy groups and let $d(\cdot, \cdot)$ be an arbitrary similarity measure. Then $p$ and $q$ are matchable patients for a $\delta$-matching if $d(p, q) \le \delta$.

We will only consider exact matches, $\delta = 0$, for the remainder of this manuscript. We refer to Section 4 for a discussion of a potential extension of this method to $\delta$-matchings with $\delta > 0$. Comparing two patients $p$ and $q$ on their covariate vectors requires a similarity measure. In the remainder, we will use the $L_1$ distance measure (also known as the Manhattan metric)

$$ d(p, q) := \sum_{i=1}^{s} \left| \mathrm{cv}_i(p) - \mathrm{cv}_i(q) \right|, \tag{2.1} $$

but other distance measures (possibly defined through similarity measures) are applicable as well, as long as they can be calculated for every pair of patients.

Remark 2

Note that two patients $p$ and $q$ have identical covariate vectors if and only if $d(p, q) = 0$. Thus for exact matching $d(p, q) = 0$ is required for all matchable patients $p$ and $q$.

Together with a distance measure we can define the notion of clusters.
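The distance (2.1) and the matchability condition of Definition 1 can be sketched in a few lines of Python. This is only an illustration: the function names `l1_distance` and `is_matchable` and the integer-coded example covariates are assumptions, not part of the method's description, and the check that both patients come from different therapy groups is left to the caller.

```python
from typing import Sequence


def l1_distance(cv_p: Sequence[float], cv_q: Sequence[float]) -> float:
    """Manhattan (L1) distance between two covariate vectors, as in (2.1)."""
    if len(cv_p) != len(cv_q):
        raise ValueError("covariate vectors must have the same dimension s")
    return sum(abs(x - y) for x, y in zip(cv_p, cv_q))


def is_matchable(cv_p: Sequence[float], cv_q: Sequence[float],
                 delta: float = 0.0) -> bool:
    """Definition 1: matchable for a delta-matching if d(p, q) <= delta.

    The default delta = 0 is the exact-matching case of Remark 2.
    """
    return l1_distance(cv_p, cv_q) <= delta


# Hypothetical, integer-coded covariate vectors (assumed for illustration).
p = (3, 1, 0)
q = (3, 1, 0)
r = (3, 0, 1)
print(l1_distance(p, q), is_matchable(p, q))  # 0 True
print(l1_distance(p, r), is_matchable(p, r))  # 2 False
```

With integer-coded covariates the comparison with 0 is exact; for real-valued covariates an exact match would require bitwise identical values.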

Definition 3 (Cluster of patients)

In an SM context, a cluster of patients from one therapy group $H$ is a non-empty set $C_H$ of patients with the following properties:
1. $d(p, q) = 0 \ \forall p, q \in C_H$.
2. For $p \in C_H$ there exists no $q \in H$ such that $q \notin C_H$ and $d(p, q) = 0$.
3. If $p \in C_H$, then the assigned covariate vector of $C_H$ is $\mathrm{cv}(p)$.

Hence, clusters have the following properties:

Proposition 4

Let $H$ be a therapy group in an SM context. Then the following holds for clusters in this therapy group:
1. Every patient in $H$ belongs to exactly one cluster.
2. Every cluster can have exactly one covariate vector assigned to it.
3. Any two clusters in $H$ have different assigned covariate vectors.

Proof

We prove every characteristic individually:
1. As $d(p, p) = 0 \ \forall p \in H$, all patients belong to at least one cluster. Thus it remains to show that there exists no patient $p \in H$ belonging to two different clusters $C_{H,1}$ and $C_{H,2}$. Assume that $p \in C_{H,1} \cap C_{H,2}$ and let $q_1 \in C_{H,1}$ and $q_2 \in C_{H,2}$ be two patients in $C_{H,1}$ and $C_{H,2}$ respectively, with $q_1 \notin C_{H,2}$ (such a patient exists, possibly after swapping the roles of the two clusters, because the clusters differ). Then it holds by Definition 3.1 that $d(p, q_1) = 0 = d(p, q_2)$ and therefore, by the triangle inequality, $d(q_1, q_2) = 0$. This is a contradiction to Definition 3.2 and therefore every patient belongs to exactly one cluster.

2. As clusters are non-empty sets of patients, every cluster has at least one covariate vector assigned to it. Therefore assume that a cluster $C$ has two assigned covariate vectors $v_1$ and $v_2$ differing in at least one entry. Then by Definition 3.3 there exist patients $p, q \in C$ such that $v_1 = \mathrm{cv}(p)$ and $v_2 = \mathrm{cv}(q)$. As $v_1 \ne v_2$ holds by assumption, it follows that $d(p, q) \ne 0$, contradicting Definition 3.1 for $p, q \in C$.
3. Assume that different clusters $C_{H,1}$ and $C_{H,2}$ have the same assigned covariate vector. This implies that $d(p, q) = 0 \ \forall p \in C_{H,1}, q \in C_{H,2}$ and is a contradiction to Definition 3.2.

Because of Proposition 4, clusters can be assigned unique covariate vectors. Thus the similarity of clusters $C_A$ and $C_B$, for therapy groups $A$ and $B$ respectively, can be denoted similarly to patients as $d(C_A, C_B)$. This leads to the following observation with regard to exact matching:

For an arbitrary dataset $D = (A, B)$ and a cluster $C_H$ belonging to a therapy group $H \in \{A, B\}$ only the following situations can occur:
1. For $C_A$ there exists one cluster $C_B$ with $d(C_A, C_B) \equiv 0$.
2. For $C_A$ there exists no cluster $C_B$ with $d(C_A, C_B) \equiv 0$.
3. For $C_B$ there exists no cluster $C_A$ with $d(C_A, C_B) \equiv 0$.

Thus matching can only occur for clusters with a corresponding cluster in the opposite therapy group. Furthermore, if situation (1) occurs, then the match is unique with regard to the clusters.

Proposition 5

Let $C_A$ and $C_B$ be clusters from different therapy groups. Then $d(C_A, C_B) \equiv 0$ holds iff the two clusters have the same assigned covariate vector.

Proof Let $C_A$ and $C_B$ be clusters from different therapy groups and $d(C_A, C_B) \equiv 0$. As every cluster has exactly one assigned covariate vector, it remains to show $\mathrm{cv}(C_A) \equiv \mathrm{cv}(C_B)$:

$$ d(C_A, C_B) \equiv 0 \iff \sum_{i=1}^{s} \left| \mathrm{cv}_i(C_A) - \mathrm{cv}_i(C_B) \right| \equiv 0 \iff \mathrm{cv}_i(C_A) \equiv \mathrm{cv}_i(C_B), \ \forall\, 1 \le i \le s. \tag{2.2} $$

Thus both clusters have the same assigned covariate vector. The reverse direction follows as all implications in equation (2.2) are equivalences.

Motivated by the previous proposition and by the definition of matchable patients (Definition 1) we can define exact matchable clusters.

Definition 6 (Matchable Cluster)

Two clusters $C_A$ and $C_B$ of $A$ and $B$ respectively are exact matchable clusters iff $d(C_A, C_B) \equiv 0$.

Equipped with Definitions 3 and 6 as well as Propositions 4 and 5 we can identify the exact cardinality $n$ of exact matchable clusters.

Additionally, we can calculate the number of all different possible matchings between two exact matchable clusters (Proposition 7) and for whole datasets (Proposition 8). Note that the necessity to calculate all possible matchings arises as observed results between different maximal matchings can show a large variation, see Section 3, even if the dataset is large and matches are exact (Table 1).

Proposition 7

Let $C_A$ and $C_B$ be exact matchable clusters of $A$ and $B$ with $|C_A| = a$ and $|C_B| = b$ respectively. Then
1. A set of exact matches $M$ between $A$ and $B$ with a maximal number of matches includes $\min(a, b)$ exact matches from $C_A$ and $C_B$.
2. For $C_A$ and $C_B$ there exist $\binom{\max(a, b)}{\min(a, b)}$ sets with $\min(a, b)$ exact matches, where $\binom{\max(a, b)}{\min(a, b)}$ denotes the binomial coefficient of $\max(a, b)$ and $\min(a, b)$.

Proof We can w.l.o.g. assume that $a \le b$, as one can simply substitute $a$ for $b$ and $b$ for $a$ in the other case. Thus $\min(a, b) = a$ and, as $C_A$ and $C_B$ are exact matchable clusters, $a$ matchable pairs $(p, q)$ with $p \in C_A$, $q \in C_B$ exist. Due to Proposition 5, this is the maximal matchable number of patients between $C_A$ and $C_B$. Assume now that only $a' < a$ of these matchable pairs are contained in $M$. This contradicts the maximality of $M$, as the remaining matchable pairs could be added. This proves (1).

As $a \le b$, every element from $C_A$ is matched to an element of $C_B$ and these matches constitute $a$ many tuples of matched elements, in short an $a$-tuple. As $|C_B| = b$, one can construct $\binom{b}{a}$ different matchings by selecting different elements of $C_B$ for the matches. Therefore, for two matching clusters $C_A$ and $C_B$ there exist $\binom{\max(a, b)}{\min(a, b)}$ sets with $\min(a, b)$ exact matches in general.

Proposition 8

Let $n$ be the number of exact matching clusters in $A$ and $B$ and let $C_{A,j}$ and $C_{B,j}$ with $1 \le j \le n$ be exact matching clusters of $A$ and $B$ with $|C_{A,j}| = a_j$ and $|C_{B,j}| = b_j$ respectively. Then the number of different exact matchings that exist on the whole dataset is

$$ \prod_{j=1}^{n} \binom{\max(a_j, b_j)}{\min(a_j, b_j)}. \tag{2.3} $$

Proof By Proposition 7 there exist $\binom{\max(a_j, b_j)}{\min(a_j, b_j)}$ exact maximal matchings for every pair of exact matching clusters $1 \le j \le n$. As the choices for different pairs of clusters can be combined freely, we have $\prod_{j=1}^{n} \binom{\max(a_j, b_j)}{\min(a_j, b_j)}$ maximal exact matchings in total.

Note that the number of different maximal matchings in Proposition 7 is smaller for exact matchings than it is for $\delta$-matchings with $\delta > 0$.

Motivated by the findings of the previous subsection and the practical necessity to use all available information in a given dataset, we investigate the outcomes of an observed value in a dataset regarding clusters. For simplicity of presentation we henceforth assume that $o(x)$ is in $\{0, 1\}$; see Remark 10 for a short discussion of other settings.

Definition 9 (Relative frequency of an observed value in a cluster)

Let $C_H := \{x_1, \ldots, x_{|C_H|}\}$ be a cluster in therapy group $H$. Then the relative frequency of the observed value $o(x_u) = 1$ in $C_H$ is defined as

$$ F(C_H) := \frac{1}{|C_H|} \sum_{u=1}^{|C_H|} o(x_u). \tag{2.4} $$

Remark 10 (Different intervals for observational values)

Besides simplicity of presentation, the assumption that $o(x)$ is in $\{0, 1\}$ has several advantages and can be easily generalized:
1. The relative frequency of the observed value $o(x_u) = 0$ in $C_H$ is $1 - F(C_H)$.
2. Any binary setting with $\tilde{o} \in \{0, K\}$, $K \in \mathbb{R}$, can be mapped to $o \in \{0, 1\}$.
3. For non-binary outcomes $o \in \{K_1, K_2, \ldots\}$, one has to consider the modification $F^{(K_i)}(C_H) = \frac{1}{|C_H|} \sum_{u=1}^{|C_H|} \chi\big(o(x_u) = K_i\big)$, where $\chi\big(o(x_u) = K_i\big)$ denotes the indicator function, i.e.

$$ \chi\big(o(x_u) = K_i\big) = \begin{cases} 1, & \text{if } o(x_u) = K_i, \\ 0, & \text{otherwise.} \end{cases} $$

The relative frequency of an observed value in a cluster can be seen as the relative outcome value for the cluster. In the context of statistical matching the observed value data of patients should only contribute to the final conclusion if the whole cluster can be matched.

Regrettably this is not the case in general, as a cluster $C_A := \{x_1, \ldots, x_a\}$ does not necessarily have a matching cluster. Additionally, even if a matching cluster $C_B := \{z_1, \ldots, z_b\}$ for $C_A$ exists, only an accumulated observed value of $\min(a, b)$ patients should contribute to the frequency evaluated in the end, to prevent a distortion of the end result by large clusters. A first approach to prevent this distortion, which we will refine subsequently, leads to the definition of the relative matching frequency of an observed value.

Definition 11 (Relative matching frequency of an observed value)

Let $C_A := \{x_1, \ldots, x_a\}$, $C_B := \{z_1, \ldots, z_b\}$ be two exact matching clusters. Then the relative matching frequency of an observed value $o(x_v) = 1$ for $C_A$ is defined as

$$ F_M(C_A) := \frac{1}{a} \sum_{v=1}^{\min(a, b)} o(x_v). \tag{2.5} $$

Using the relative matching frequency of observed values to evaluate the final outcome of a dataset in terms of an observed value results in incomplete usage of information, as only $\min\{a, b\}$ patients are matched and thus only the observed values of $\min\{a, b\}$ patients affect the outcome. This problem is independent of the matching method used, if the method does not consider all possible matchings. Note that many commonly used matching methods, as described e.g. in (Stuart, 2010), implicitly consider the relative matching frequency $F_M(C_A)$ as a result after a single matching realization, as can be seen by interpreting the used patients as clusters appropriate to the chosen $\delta$.

We change the perspective to show that usage of the full information available is possible. For this we begin by considering one realization of a maximal matching between clusters as the result of a random experiment. As all patients in a cluster are the same with respect to their covariates, all patients in the same cluster should have the same probability to be chosen in a maximal matching to be matched to patients from a matching cluster. Thus every possible maximal matching has the same probability to appear in a single maximal matching experiment. Repeating the random experiment for maximal matchings between two clusters results in a sequence of maximal matchings, which we call a uniform sequence of matchings.
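To make "every possible maximal matching" concrete, the following minimal sketch enumerates the maximal matchings between two exact matching clusters and compares their number with the binomial coefficient of Proposition 7 and the product formula (2.3). The function names and the patient labels are assumptions made for illustration. A matching is identified here by the subset of the larger cluster that is used, since patients within a cluster are interchangeable.

```python
from itertools import combinations
from math import comb, prod


def maximal_matchings(cluster_a, cluster_b):
    """List all maximal exact matchings between two matchable clusters.

    Each matching is represented by the members of the larger cluster that
    get matched; every member of the smaller cluster is always matched.
    """
    small, large = sorted((list(cluster_a), list(cluster_b)), key=len)
    return list(combinations(large, len(small)))


def count_cluster_matchings(a: int, b: int) -> int:
    """Number of maximal matchings for cluster sizes a and b (Proposition 7)."""
    return comb(max(a, b), min(a, b))


def count_dataset_matchings(sizes) -> int:
    """Product over all pairs of matching clusters, as in (2.3)."""
    return prod(count_cluster_matchings(a, b) for a, b in sizes)


C_A = ["x1", "x2"]                 # hypothetical cluster with a = 2
C_B = ["z1", "z2", "z3", "z4"]     # hypothetical cluster with b = 4
print(len(maximal_matchings(C_A, C_B)), count_cluster_matchings(2, 4))  # 6 6
print(count_dataset_matchings([(2, 4), (3, 5), (10, 10)]))              # 60
```

Drawing one element of the enumerated list uniformly at random corresponds to one realization of the random experiment described above.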

Definition 12 (Uniform sequence of matchings)

Let $C_A := \{x_1, \ldots, x_a\}$ and $C_B := \{z_1, \ldots, z_b\}$ be two exact matching clusters. An infinite sequence of matchings $M = (M_1, M_2, \ldots)$ is called a uniform sequence of matchings iff every possible matching between patients of $C_A$ and $C_B$ has the same probability to be drawn as a matching $M_r$ in the sequence.

With these notions we can show that the expectancy of the observed value over all possible maximal matchings is a term whose value can be directly calculated through a cluster matching. Proposition 13 examines this for the case of two exact matching clusters.

Proposition 13

Let $C_A := \{x_1, \ldots, x_a\}$, $C_B := \{z_1, \ldots, z_b\}$ be two exact matching clusters and let $M$ be a uniform sequence of maximal matched pairs between $C_A$ and $C_B$. Then
1. Every $M_k$ contains $\min(a, b)$ matching pairs and
2. The expectancy of the relative matching frequency over the sequence of uniform matchings for $C_A$ and $C_B$ can be calculated as

$$ E(C_A) := \lim_{r \to \infty} \frac{1}{r} \Big( \sum_{k=1}^{r} F_M(C_A)_k \Big) = \frac{\min(a, b)}{a^2} \sum_{v=1}^{a} o(x_v) \quad \text{and} \quad E(C_B) := \lim_{r \to \infty} \frac{1}{r} \Big( \sum_{k=1}^{r} F_M(C_B)_k \Big) = \frac{\min(a, b)}{b^2} \sum_{w=1}^{b} o(z_w). \tag{2.6} $$

Proof

By Proposition 7, every exact match with a maximum number of matches includes $\min(a, b)$ pairs of $C_A$ and $C_B$. Therefore every $M_k$ contains $\min(a, b)$ pairs of patients from $C_A$ and $C_B$. For the second part we can w.l.o.g. assume that $a \le b$, as one can simply substitute in the other case. As $a = \min(a, b)$ it follows that

$$ F_M(C_A)_k = F_M(C_A) = F(C_A) = \frac{1}{a} \sum_{v=1}^{a} o(x_v) = \frac{a}{a^2} \sum_{v=1}^{a} o(x_v) $$

for all $k$.

For $a = 1$ exactly one patient $\tilde{z}_{k,w}$, $w \in \{1, \ldots, b\}$, of $C_B$ gets chosen in every maximal matching $M_k$. As all patients have the same probability to be chosen, the probability is $\frac{1}{b}$ for every patient. Evaluating the limits and referring to the patients of $C_B$ chosen in one realization of a maximal matching as $\tilde{z}_{k,v}$ leads to

$$ \lim_{r \to \infty} \frac{1}{r} \Big( \sum_{k=1}^{r} F_M(C_B)_k \Big) = \lim_{r \to \infty} \frac{1}{r} \Big( \sum_{k=1}^{r} \frac{1}{b} \sum_{v=1}^{a} o(\tilde{z}_{k,v}) \Big) = \frac{1}{b^2} \sum_{w=1}^{b} o(z_w), \tag{2.7} $$

where the second equation follows by the law of large numbers, as all patients have the same probability to be chosen in a maximal matching.

Now let $a > 1$. Thus out of $b$ patients $a$ patients get matched and every patient has the same probability to be chosen in a maximal matching $M_k$. Again by the law of large numbers in the second equation it follows that

$$ \lim_{r \to \infty} \frac{1}{r} \Big( \sum_{k=1}^{r} F_M(C_B)_k \Big) = \lim_{r \to \infty} \frac{1}{r} \Big( \sum_{k=1}^{r} \frac{1}{b} \sum_{v=1}^{a} o(\tilde{z}_{k,v}) \Big) = \frac{a}{b^2} \sum_{w=1}^{b} o(z_w). \tag{2.8} $$

It is straightforward to generalize Proposition 13 to uniform sequences of matchings over therapy groups containing several clusters, as a single realization of maximal matchings between clusters is independent of maximal matchings between other clusters.
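The limit in Proposition 13 can be checked on a small example by averaging over all maximal matchings, each taken with equal weight, instead of sampling an infinite sequence. The sketch below assumes that the relative matching frequency of the larger cluster is the sum of the matched outcomes divided by the cluster size (as in Definition 11), so that the closed form reads $\min(a, b) / b^2 \cdot \sum_w o(z_w)$. This normalization, the function names and the outcome values are assumptions made for illustration. Exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import combinations


def expected_relative_matching_frequency(outcomes, partner_size):
    """Average of F_M over all equally likely maximal matchings.

    `outcomes` are the binary observed values of one cluster, and
    `partner_size` is the size of its exact matching cluster.
    """
    size = len(outcomes)
    matched = min(size, partner_size)
    subsets = list(combinations(outcomes, matched))
    total = sum(Fraction(sum(subset), size) for subset in subsets)
    return total / len(subsets)


def closed_form(outcomes, partner_size):
    """min(a, b) / size^2 * sum of outcomes (assumed reading of (2.6))."""
    size = len(outcomes)
    return Fraction(min(size, partner_size) * sum(outcomes), size * size)


o_B = [1, 0, 0, 1, 1]   # hypothetical outcomes of a cluster C_B with b = 5
a = 2                   # size of the matching cluster C_A
print(expected_relative_matching_frequency(o_B, a))  # 6/25
print(closed_form(o_B, a))                           # 6/25
```

Multiplying the result by the cluster size gives the expected number of matched patients with observed value 1, here 6/5.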

Proposition 14

Let $n$ be the number of exact matching clusters and let $C_{A,j}$ and $C_{B,j}$ with $1 \le j \le n$ be exact matching clusters of $A$ and $B$ with $|C_{A,j}| = a_j$ and $|C_{B,j}| = b_j$ respectively. Let $M$ be the uniform sequence of maximal exact matchings between all clusters. Then
1. Every element $M_k$ of $M$ contains

$$ |M_k| = \sum_{j=1}^{n} \min(a_j, b_j) \tag{2.9} $$

matches.
2. For the expectancy of the relative matching frequencies of observed values for $A$ of any maximum exact matching it holds that

$$ E_A := \lim_{r \to \infty} \frac{1}{r} \sum_{k=1}^{r} \Big( \sum_{j=1}^{n} F_M(C_{A,j})_k \Big) = \sum_{j=1}^{n} E(C_{A,j}), \tag{2.10} $$

with $\tilde{x}_{j,k,v}$, $v \in \{1, \ldots, a_j\}$, as the patients from cluster $C_{A,j}$ chosen in the $k$-th matching $M_k$. Analogously it holds that

$$ E_B := \lim_{r \to \infty} \frac{1}{r} \sum_{k=1}^{r} \Big( \sum_{j=1}^{n} F_M(C_{B,j})_k \Big) = \sum_{j=1}^{n} E(C_{B,j}). $$

Equation (2.9) holds as it is a summation over the equation from Proposition 13.1. Similarly, equation (2.10) holds as a summation over equation (2.6), as a maximal matching between two clusters is independent of a maximal matching between other clusters.

The previous proposition shows that for each therapy group $A$ and $B$ not all possible matches are realized during a single matching and that the relative matching frequency of observed values for uniform sequences of matchings converges to the expectancy of the observed results, which in this case is an easily calculable value. An added benefit is that the term is unique for a dataset and independent of the matching method used.

We are now prepared to propose an algorithm which calculates the expectancy of the relative matching frequencies of observed values in a deterministic fashion. The full algorithm (Algorithm 4) is divided into three stages.

In its first stage, clusters according to Definition 3 are generated (Algorithm 1). In the second stage the algorithm tries to match as many clusters as possible (Algorithm 2), while in the third stage it weights the patients of each cluster in accordance with the size of its matching cluster and its own size according to equation (2.6) (Algorithm 3).

Algorithm 1

Clustering step

1: Set $c = 0$ and is_clustered($x_v$) $= 0$ for all $x_v \in A$.
2: for each patient $x_v$, $1 \le v \le |A|$ do
3:     if is_clustered($x_v$) $\equiv 0$ then
4:         Set $c = c + 1$, $C_{A,c} := \{x_v\}$ and is_clustered($x_v$) $= 1$.
5:     else
6:         continue
7:     end if
8:     for each patient $x_u$ with $v < u \le |A|$ and is_clustered($x_u$) $\equiv 0$ do
9:         if $d(x_u, C_{A,c}) \equiv 0$ then
10:            Set $C_{A,c} = C_{A,c} \cup \{x_u\}$ and is_clustered($x_u$) $= 1$.
11:        end if
12:    end for
13: end for
14: Set $n_A = c$.
15: Repeat steps 1–13 for $B$ and set $n_B = c$.
16: return $C_{A,1}, \ldots, C_{A,n_A}, C_{B,1}, \ldots, C_{B,n_B}$.

Algorithm 2

Matching step

1: Set $i = 1$.
2: for every cluster $C_{A,g}$, $1 \le g \le n_A$ do
3:     Search for cluster $C_{B,i}$ with $d(C_{A,g}, C_{B,i}) \equiv 0$.
4:     if a cluster $C_{B,i}$ was found in the previous step then
5:         Create matching set $M_i = \emptyset$.
6:         Set $M_i = \{C_{A,g}, C_{B,i}\}$.
7:         Set $i = i + 1$.
8:     end if
9: end for
10: return Matching sets $M_1, \ldots, M_n$.

Algorithm 3

Weighting step

1: Set $w(C_{A,g}) = 0 \ \forall\, 1 \le g \le n_A$ and $w(C_{B,h}) = 0 \ \forall\, 1 \le h \le n_B$.
2: for all $C_{A,g}$, $1 \le g \le n_A$, with $M_g \ne \emptyset$ do
3:     Search for $C_{B,h}$, $1 \le h \le n_B$, as the previously calculated matching cluster of $C_{A,g}$.
4:     Calculate $S_{A,g} := S_{B,h} := \min\{|C_{A,g}|, |C_{B,h}|\}$.
5:     Compute $w(C_{A,g}) := S_{A,g} / |C_{A,g}|$ and $w(C_{B,h}) := S_{B,h} / |C_{B,h}|$.
6: end for
7: Compute Min-weighted results:

$$ R_A := \sum_{h=1}^{n_A} \Big[ w(C_{A,h}) \sum_{v=1}^{|C_{A,h}|} o(x_{v,h}) \Big], \tag{2.11} $$

$$ R_B := \sum_{g=1}^{n_B} \Big[ w(C_{B,g}) \sum_{w=1}^{|C_{B,g}|} o(z_{w,g}) \Big], \tag{2.12} $$

   where $x_{v,h} \in C_{A,h}$ and $z_{w,g} \in C_{B,g}$.
8: return weighted results $R_A$, $R_B$.

Linked together, Algorithms 1–3 form the DeM algorithm.

Algorithm 4

Deterministic balancing score exact matching algorithm (DeM)

1: Cluster the patients with Algorithm 1 for $A$ and $B$ into $C_{A,1}, \ldots, C_{A,n_A}, C_{B,1}, \ldots, C_{B,n_B}$.
2: Compute matching sets $M_1, \ldots, M_n$ through application of the matching Algorithm 2 on the clusters $C_{A,1}, \ldots, C_{A,n_A}, C_{B,1}, \ldots, C_{B,n_B}$.
3: Compute the weighted result with Algorithm 3 for $M_1, \ldots, M_n$.
4: return Weighted results $R_A$, $R_B$ and the set of matched clusters $M$.

As shown in Proposition 14, the DeM algorithm calculates the expectancy value for the uniform sequence of exact maximal matchings for a given dataset (equations (2.10) and respectively (2.11) or (2.12)), and uses all information contained in the dataset that is available for an exact matching (equation (2.9) and steps 4–8 of Algorithm 2 in conjunction with Algorithm 1). This is summarized in the following theorem.

Theorem 15 (Matching properties of the DeM algorithm)

The DeM algorithm, Algorithm 4,
1. matches all possible exact matches and
2. produces exactly one matching result in accordance with the expectancy value of all possible matches in the dataset.
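The three stages can be sketched compactly in Python. This is a minimal illustration under stated assumptions, not a reference implementation: patients are given as hypothetical `(covariates, outcome)` pairs with binary outcomes, and clusters are formed by grouping on the covariate tuple with a dictionary instead of the pairwise distance comparisons of Algorithms 1 and 2. For exact matching both yield the same clusters, because $d(p, q) = 0$ holds iff the covariate vectors are identical (Remark 2). The weighted results follow (2.11) and (2.12), computed with exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction


def cluster(group):
    """Stage 1: group outcomes by identical covariate vector (Definition 3)."""
    clusters = defaultdict(list)
    for covariates, outcome in group:
        clusters[tuple(covariates)].append(outcome)
    return clusters


def dem(group_a, group_b):
    """Return (R_A, R_B, matched covariate vectors) for exact matching."""
    clusters_a, clusters_b = cluster(group_a), cluster(group_b)
    # Stage 2: clusters match iff they share the same covariate vector.
    matched = sorted(clusters_a.keys() & clusters_b.keys())
    r_a, r_b = Fraction(0), Fraction(0)
    for key in matched:
        out_a, out_b = clusters_a[key], clusters_b[key]
        s = min(len(out_a), len(out_b))
        # Stage 3: weight w = s / |cluster|; unmatched clusters keep weight 0.
        r_a += Fraction(s, len(out_a)) * sum(out_a)
        r_b += Fraction(s, len(out_b)) * sum(out_b)
    return r_a, r_b, matched


# Hypothetical toy dataset: (covariate vector, observed value).
A = [((1, 0), 1), ((1, 0), 0), ((2, 1), 1), ((3, 3), 0)]
B = [((1, 0), 1), ((1, 0), 1), ((1, 0), 0), ((1, 0), 0),
     ((2, 1), 0), ((9, 9), 1)]
print(dem(A, B))  # (Fraction(2, 1), Fraction(1, 1), [(1, 0), (2, 1)])
```

Because the grouping and the weights depend only on the input data, repeated runs on the same dataset return the same result.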

Additionally one can prove that the proposed DeM algorithm (Algorithm 4) is fast and deterministic.

Theorem 16

The DeM algorithm (Algorithm 4) is a deterministic algorithm and has a runtime of $O(|A| \cdot |B| \cdot s + |A|^2 + |B|^2)$, where $s$ is the dimension of the covariate vectors.

Proof An algorithm is deterministic if, given a particular input, it will always produce the same output. Algorithm 4 takes a dataset as input and will, in the case of an exact matching, always produce the same clusters during step 1. As the same clusters were produced in step 1, the same clusters are matched in step 2, because of Proposition 5. Step 3 calculates the weight of the respective matched clusters, which is always the same since the matched clusters from step 2 are the same. Thus Algorithm 4 is deterministic.

Evaluating the runtime can be achieved by looking at every step separately:
1. In Algorithm 1 the steps 1–14 have a runtime of $O(|A|^2)$, while step 15 has a runtime of $O(|B|^2)$.
2. Algorithm 2 investigates every cluster in $B$ at most $|A|$ times and every comparison between clusters needs $s$ operations to determine the distance. Thus Algorithm 2 has a runtime of $O(|A| \cdot |B| \cdot s)$.
3. As $n_A \le |A|$ and $n_B \le |B|$ it follows that Algorithm 3 has a runtime of $O(\max\{|A|, |B|\})$.

Adding all the runtimes together and making no further assumptions with regard to the comparative size of $s$, $|A|$ and $|B|$, one concludes that Algorithm 4 has a runtime of $O(|A| \cdot |B| \cdot s + |A|^2 + |B|^2)$.

The exclusive applicability of the proposed algorithm to exact matches, which can be seen as a limitation, will be discussed in Section 4.

Let $C_A := \{x_1, \ldots, x_a\}$, $C_B := \{z_1, \ldots, z_b\}$ be two exact matching clusters. Then $a$ can be interpreted as the population size, of which $\sum_{v=1}^{a} o(x_v)$ members have some property, and $\min(a, b)$ of the $a$ patients are chosen in a maximal matching.
Thus a realization of a maximal matching in the sense of relative matching frequencies can be interpreted as a sample drawn from a hypergeometrically distributed random variable projected onto the interval $[0, 1]$.

The view of maximal matchings as realizations of a drawing from a hypergeometric distribution concurs with the results of Propositions 13 and 14 from the previous subsection, as the expectancy of a hypergeometric distribution for two exact matching clusters is $a \cdot E(C_A)$ and $b \cdot E(C_B)$, respectively, where the factors $a$ and $b$ stem from reversing the normalization done in the previous subsection.

Taking this perspective allows us to calculate the variance for maximal matchings.
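The hypergeometric view can be verified by brute force on a small cluster. The sketch below works with the number of matched patients with observed value 1 (the count, not the normalized frequency) and compares its mean and variance over all equally likely maximal matchings with the standard hypergeometric formulas for population size $N = a$, $K = \sum_v o(x_v)$ successes and $n = \min(a, b)$ draws. The function names and the example outcomes are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def matched_positive_counts(outcomes, n_matched):
    """Count of matched patients with o = 1 for every possible selection."""
    return [sum(subset) for subset in combinations(outcomes, n_matched)]


def mean_and_variance(values):
    """Exact mean and variance of equally likely values."""
    n = len(values)
    mean = Fraction(sum(values), n)
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


def hypergeometric_mean_variance(N, K, n):
    """Standard formulas: mean n*K/N, variance n*p*(1-p)*(N-n)/(N-1), N > 1."""
    p = Fraction(K, N)
    return n * p, n * p * (1 - p) * Fraction(N - n, N - 1)


o_A = [1, 1, 0, 0, 1, 0]   # hypothetical outcomes of C_A, a = 6
b = 4                      # size of the matching cluster C_B
n = min(len(o_A), b)
enumerated = mean_and_variance(matched_positive_counts(o_A, n))
formula = hypergeometric_mean_variance(len(o_A), sum(o_A), n)
assert enumerated == formula
print(enumerated)  # (Fraction(2, 1), Fraction(2, 5))
```

When $\min(a, b) = a$ there is only one possible selection and the enumerated variance is 0, in line with the factor $a - \min(a, b)$ in the hypergeometric variance.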

Proposition 17

Let $C_A := \{x_1, \ldots, x_a\}$, $C_B := \{z_1, \ldots, z_b\}$ be two exact matching clusters. Then the variance of matchings for $C_A$ is given by

$$ \mathrm{Var}(C_A) = E(C_A) \left( 1 - \frac{\sum_{v=1}^{a} o(x_v)}{a} \right) \frac{a - \min(a, b)}{a - 1}. \tag{2.13} $$

Proof

Viewing one realization of a cluster matching as the realization of a hypergeometrically distributed random variable yields the probability of picking one patient with $o(x_v) \equiv 1$ as $\frac{\sum_{v=1}^{a} o(x_v)}{a}$. From Proposition 13 it is known that $E(C_A) = \frac{\min(a, b)}{a^2} \sum_{v=1}^{a} o(x_v)$. Thus using the formula for the variance of hypergeometric distributions yields equation (2.13).

As exact matchings of different clusters are independent of each other, the variance for a matching over a therapy group follows immediately from the previous Proposition 17.

Corollary 18

Let $n$ be the number of exact matching clusters in $A$ and $B$ and let $C_{A,j}$ and $C_{B,j}$ with $1 \le j \le n$ be exact matching clusters of $A$ and $B$ with $|C_{A,j}| = a_j$ and $|C_{B,j}| = b_j$, respectively. Let $M$ be the uniform sequence of maximal exact matchings between all clusters. Then the variance of therapy group $A$ is

$$ \mathrm{Var}(A) = \sum_{j=1}^{n} \mathrm{Var}(C_{A,j}). \tag{2.14} $$

For therapy group $B$ equation (2.14) holds similarly.

Note that all values used in Proposition 17 and Corollary 18 are available after matching with the DeM algorithm and that the clusters are matched such that no additional factor is introduced into the variance. Additionally note that the variance of a cluster for which all patients are matched, i.e. for $C_A$ with $\min(a,b) = a$, is 0. The same holds for clusters for which all patients have the same observed result, as then either $E(C_A) = 0$ or $\left(1 - \frac{\sum_{v=1}^{a} o(x_v)}{a}\right) = 0$. Since every pair of matched clusters satisfies either $\min(a,b) = a$ or $\min(a,b) = b$, at least half of all matched clusters from $A$ and $B$ have a variance of 0 in the matching calculated by the DeM algorithm.

The second property we discuss relates to the calculation of clusters and the initial matching procedure. Rosenbaum and Rubin (1983) defined the balancing score $b(p)$ of a patient $p$ as a value assignment such that the conditional distribution of $cv(p)$ given $b(p)$ is the same for patients $p$ from both treatment groups, $A$ and $B$. They showed that $cv(p)$ is the finest balancing score (Rosenbaum and Rubin, 1983, Section 2) and that, if treatment assignment is strongly ignorable, the difference between the two respective treatment means is an unbiased estimate of the average treatment effect at that balancing score value (Rosenbaum and Rubin, 1983, Theorem 3). Since we use $cv(p)$ in our calculations, the result calculated by Algorithm 4 is, in addition to the properties proven previously, an unbiased estimate of the average treatment effect if the strong ignorability assumption holds.

We use an official complete survey to illustrate the effect of ignoring different possible matchings as well as the results of the proposed DeM algorithm.

The dataset used is the quality assurance dataset of isolated aortic valve procedures in 2013, an official mandatory dataset including all aortic valve surgery cases in German hospitals. It contains patient information (covariates) and mortality information (observed results) for 17,427 patients. For each patient the corresponding record contains $s = 19$ covariate variables. The 17,427 patients are divided into two therapy groups: 9,848 SAVR cases (replacement surgery of aortic valves) and 7,579 TF-AVI cases (transcatheter/transfemoral implantation of aortic valves). The cases were documented in accordance with §137 Social Security Code V (SGB V) by hospitals registered under §108 SGB V. The data collection is compulsory for all in-patient isolated aortic valve procedures in German hospitals. The dataset is held by the Federal Joint Committee (Germany) and is freely accessible to researchers after application. Given this dataset, it can be safely assumed that the data is independent in a statistical sense, as patients were only recorded once.

We proceed to compare the proposed DeM algorithm with two other approaches: the de facto standard for statistical matching, 1:1 propensity score matching (PSM), as well as a bootstrapped variant of 1:1 PSM by Austin and Small (2014). For the regression-based PSM algorithms, relevant regression variables and their values have to be determined. For our example, we consider the $H_0$-hypothesis "The mortality rate does not depend on therapy", for which the relevant variables are internationally validated in the Euroscore II. The corresponding regression values for this setting are taken from the quality assurance dataset of isolated aortic valve procedures. PSM itself was then calculated using functions provided by IBM SPSS Statistics for Windows, Version 24.

$v{:}w$ matchings, for arbitrary $v, w \in \mathbb{N}$, are included in the set of possible 1:1 matchings, while the reverse is obviously not true for arbitrary $v, w \in \mathbb{N}$ and any given dataset.

Table 1

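Results for maximal matchings: 1,502 exact matchings with regard to all 19 Euroscore II variables and without replacement. In-hospital deaths are given as count and percentage per therapy group; the p-value refers to the two-tailed $\chi^2$ test. (? = digit not recoverable from the source.)

| Matching | SAVR count | SAVR % | TF-AVI count | TF-AVI % | p-value |
|---|---|---|---|---|---|
| PSM Set 1 | 73 | 4.9% | 33 | 2.?% | < 0.? |
| PSM Set 2 | ? | ?.9% | 34 | 2.?% | < 0.? |
| PSM Set 3 | ? | ?.8% | 32 | 2.1% | 0.? |
| PSM Set 4 | ? | ?.6% | 15 | 1.0% | 0.? |
| PSM Set 5 | ? | ?.9% | 50 | 3.3% | 0.? |
| PSM Set 6 | ? | ?.6% | 50 | 3.3% | 0.? |
| PSM Set 7 | ? | ?.9% | 15 | 1.0% | 0.? |
| Bootstrap (10,000 samples) | 52.47 | 3.49% | 32.10 | 2.14% | 0.? (a) |
| DeM | 53.01 | 3.5% | 32.32 | 2.1% | 0.0227 |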
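The p-values in Table 1 come from a two-tailed $\chi^2$ test on a 2×2 table of deaths and survivors per therapy group. The sketch below reproduces that arithmetic under an explicit assumption: it uses the uncorrected Pearson statistic with one degree of freedom, whereas the source does not state which variant (for example, with or without continuity correction) was used. The function name is hypothetical, and the inputs are the Set 1 and DeM rows with 1,502 matched patients per group.

```python
import math


def chi2_2x2(events_a, n_a, events_b, n_b):
    """Pearson chi-square statistic (no continuity correction) and
    two-sided p-value for a 2x2 table of events versus non-events."""
    a, b = events_a, n_a - events_a
    c, d = events_b, n_b - events_b
    n = n_a + n_b
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# PSM Set 1: 73 SAVR deaths versus 33 TF-AVI deaths among 1,502 pairs.
stat_set1, p_set1 = chi2_2x2(73, 1502, 33, 1502)

# DeM row: expected death counts, hence non-integer values.
stat_dem, p_dem = chi2_2x2(53.01, 1502, 32.32, 1502)
```

Under this assumption, `p_set1` falls well below 0.001 and `p_dem` lands close to the tabulated 0.0227; small deviations are to be expected because the tabulated counts are rounded and the exact test variant is not known.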

We additionally note that a match of two patients with $\delta > 0$ […]

Out of the 9,848 SAVR patients, 3,361 had at least one exact TF-AVI match, while out of the 7,579 TF-AVI patients, 2,249 had at least one exact SAVR match. Thus roughly one third of all patients could be exactly matched. As stated in the previous section, the null hypothesis for the calculation of the p-values was $H_0$: "The mortality rate does not depend on therapy". The results of some maximal matchings and the differences between them are indicated in Table 1.

A maximal 1:1 matching comprises 1,502 matching pairs of patients, meaning that 3,004 patients were matched during any exact non-cluster matching. We only considered maximal exact matches; therefore every matching shown matches the maximal number of patients possible and matches two patients if and only if their covariates are equal, meaning that the standardized differences in all presented sets are 0. Still, the large discrepancy between the observed results in the sets shown immediately indicates that many maximal matchings exist, as shown in Proposition 8. Calculating all possible maximal matchings would be a futile endeavour, and it would not be necessary if the observed results did not vary between different maximal matchings. Unfortunately, observed results can vary to a very high degree, as can be seen in Table 1. They vary in such a way that one could even draw different conclusions depending on the matching calculated (see sets 1, 6 and 7) while arguing that the calculated p-value is below a threshold of 1%. Other sets fall on either side of the spectrum, favouring one therapy, the other, or neither. Even the bootstrapping result for 10,000 samples did not exhaust all possible maximal matchings and is only similar to the result of DeM, which gives the expectancy of all possible maximal exact matches in the given dataset, see Theorem 15.

a The t-test values for all sets without replacement are < . .

As can be seen from the results, given one dataset and a non-deterministic method, one could obtain different results even when the regression variables are given. This leads to uncertainty in the evaluation process, as fellow researchers cannot reconstruct results obtained through statistical matching based on regression methods. The proposed algorithm tries to resolve this issue for exact matches. Even though this is a limitation in applicability, exact cluster matches obtained through the DeM algorithm can be used at the core of a matching, ascertaining that at least the exactly matchable contingent of a dataset is matched deterministically (Theorem 16) and corresponds to the expectancy value of the exact matches (Theorem 15). Additionally, if no exact matches can be found in large datasets, researchers should thoroughly investigate for systematic differences in the therapy groups, as comparisons of therapy effects are not recommended if systematic differences exist.
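The idea of matching clusters instead of individual patients, and of reporting the expectancy over all maximal exact matchings, can be sketched in a few lines. This is a minimal illustration and not the authors' Algorithm 4: the function names, the record layout `(covariates, outcome)` and the toy data are assumptions, and clusters are found by dictionary lookup on the covariate tuple rather than by the pairwise cluster comparison analysed in Theorem 16.

```python
from collections import defaultdict
from fractions import Fraction


def build_clusters(patients):
    """Group (covariates, outcome) records by identical covariate vector."""
    clusters = defaultdict(list)
    for covariates, outcome in patients:
        clusters[tuple(covariates)].append(outcome)
    return clusters


def exact_cluster_matching(group_a, group_b):
    """Return the number of 1:1 pairs in a maximal exact matching and the
    expected number of positive outcomes per group over all such matchings."""
    clusters_a = build_clusters(group_a)
    clusters_b = build_clusters(group_b)
    pairs = 0
    expected_a = Fraction(0)
    expected_b = Fraction(0)
    # Only covariate vectors present in both groups can be matched exactly.
    for key in clusters_a.keys() & clusters_b.keys():
        out_a, out_b = clusters_a[key], clusters_b[key]
        n = min(len(out_a), len(out_b))
        pairs += n
        # Expectancy of a cluster: min(a, b) / a * sum of observed results.
        expected_a += Fraction(n, len(out_a)) * sum(out_a)
        expected_b += Fraction(n, len(out_b)) * sum(out_b)
    return pairs, expected_a, expected_b


# Toy data (hypothetical): covariates are (age, sex), outcome is 0/1.
group_a = [((70, 1), 1), ((70, 1), 0), ((70, 1), 0), ((65, 0), 1), ((80, 1), 0)]
group_b = [((70, 1), 0), ((70, 1), 1), ((65, 0), 0), ((65, 0), 0), ((50, 0), 1)]

result = exact_cluster_matching(group_a, group_b)
```

Because the sums are exact fractions, the result does not depend on the order in which clusters are visited, and the expected counts are generally non-integer, as in the DeM row of Table 1.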

We proposed an alternative deterministic exact matching method (DeM) for SM in the exact case. The proposed method is based on matching clusters of patients from therapy groups instead of matching patients to patients directly. The presented cluster matching approach computed with the DeM algorithm (Algorithm 4) extracts all possible information from a given dataset, as all possibly matchable patients get matched and the constructed matching is in accordance with the expectancy value of the dataset (Theorem 15). The constructed matching also has the desirable property of having low variance while being in accordance with the expectancy of all possible maximal exact matchings in the dataset (Proposition 17 and Corollary 18).

As the proposed algorithm is deterministic and fast (Theorem 16) as well as easy to implement, it can be used to produce exact matchings on datasets and to discuss findings in a reliable way, as the results are easily reproducible. This is an important property, as it makes a subsequent decision-making process more transparent and not susceptible to random events, such as random draws not in accordance with the expectancy. Thus discussions about the conclusions drawn can be based on the dataset and the method used for data acquisition, as uncertainties regarding the matching method are eliminated through a proven guarantee that no additional errors are introduced by the matching procedure.

The results from the numerical example, calculated on a dataset containing a complete survey, validate the theoretical propositions and theorems shown. The proposed method can furthermore be seen as an extension of state-of-the-art methods, as results obtained through their usage would converge in the limit to the result calculated by the proposed algorithm.

The exclusive applicability of the proposed algorithm to exact matches might be seen as a limitation. Then again, for small datasets, which are statistically more prone to high variance between two different matchings, the proposed algorithm provides a reliable result for the exact matches. For large datasets, a practitioner should be wary if only few exact matches exist or if a matching result deviates strongly from the result given by the proposed deterministic algorithm, as a systematic difference between the two compared therapy groups might exist, or measuring inaccuracies for continuous covariates might be too large in the given dataset to draw reliable conclusions.

Finally, we highlight that the algorithm can be used as an a priori method for another matching method to extract all available information contained in exact matchings, thereby ascertaining that at least the exactly matchable patients of both therapy groups are matched deterministically and that their information is completely used. Further research will be dedicated to extending the presented model towards $\delta$-matching for $\delta > 0$.

References

Abidov, A., A. Rozanski, R. Hachamovitch, S. W. Hayes, F. Aboul-Enein, I. Cohen, J. D. Friedman, G. Germano, and D. S. Berman (2005). Prognostic significance of dyspnea in patients referred for cardiac stress testing. New England Journal of Medicine 353 (18), 1889–1898. PMID: 16267320.

Adams, N., K. S. Gibbons, and D. Tudehope (2017, Apr). Public-private differences in short-term neonatal outcomes following birth by prelabour caesarean section at early and full term. The Australian & New Zealand journal of obstetrics & gynaecology 57, 176–185.

Anderson, D. W., L. Kish, and R. G. Cornell (1980). On stratification, grouping and matching. Scandinavian Journal of Statistics 7 (2), 61–66.

Austin, P. C. (2011, May). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate behavioral research 46, 399–424.

Austin, P. C. and D. S. Small (2014). The use of bootstrapping when using propensity-score matching without replacement: a simulation study. Statistics in Medicine 33 (24), 4306–4319.

Burden, A., N. Roche, C. Miglio, E. V. Hillyer, D. S. Postma, R. M. Herings, J. A. Overbeek, J. M. Khalid, D. van Eickels, and D. B. Price (2017). An evaluation of exact matching and propensity score methods as applied in a comparative effectiveness study of inhaled corticosteroids in asthma. Pragmatic and observational research 8, 15–30.

Capucci, A., A. De Simone, M. Luzi, V. Calvi, G. Stabile, A. D’Onofrio, S. Maffei, L. Leoni, G. Morani, R. Sangiuolo, C. Amellone, C. Checchinato, E. Ammendola, and G. Buja (2017, Sep). Economic impact of remote monitoring after implantable defibrillators implantation in heart failure patients: an analysis from the effect study. Europace: European pacing, arrhythmias, and cardiac electrophysiology: journal of the working groups on cardiac pacing, arrhythmias, and cardiac cellular electrophysiology of the European Society of Cardiology 19, 1493–1499.

Chen, H.-Y., Q. Wang, Q.-H. Xu, L. Yan, X.-F. Gao, Y.-H. Lu, and L. Wang (2016). Statin as a combined therapy for advanced-stage ovarian cancer: A propensity score matched analysis. BioMed research international 2016, 9125238.

Cho, S. H., G.-S. Choi, G. C. Kim, A. N. Seo, H. J. Kim, W. H. Kim, K.-M. Shin, S. M. Lee, H. Ryeom, and S. H. Kim (2017, Mar). Long-term outcomes of surgery alone versus surgery following preoperative chemoradiotherapy for early t3 rectal cancer: A propensity score analysis. Medicine 96, e6362.

Dou, J.-P., J. Yu, X.-H. Yang, Z.-G. Cheng, Z.-Y. Han, F.-Y. Liu, X.-L. Yu, and P. Liang (2017, Apr). Outcomes of microwave ablation for hepatocellular carcinoma adjacent to large vessels: a propensity score analysis. Oncotarget 8, 28758–28768.

Fukami, H., Y. Takeuchi, S. Kagaya, Y. Ojima, A. Saito, H. Sato, K. Matsuda, and T. Nagasawa (2017). Perirenal fat stranding is not a powerful diagnostic tool for acute pyelonephritis. International journal of general medicine 10, 137–144.

Gozalo, P., M. Plotzke, V. Mor, S. C. Miller, and J. M. Teno (2015). Changes in medicare costs with the growth of hospice care in nursing homes. New England Journal of Medicine 372 (19), 1823–1831. PMID: 25946281.

Hansen, B. (2004, 02). Full matching in an observational study of coaching for the sat. Journal of the American Statistical Association 99, 609–618.

Hansen, B. and S. Klopfer (2012, 01). Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics 15.

Iacus, S., G. King, G. Porro, and J. Katz (2012, 12). Causal inference without balance checking: Coarsened exact matching. Political Analysis 20, 1–24.

King, G. and R. Nielsen (2019). Why propensity scores should not be used for matching. Political Analysis, 1–20.

Kishimoto, M., H. Yamana, S. Inoue, T. Noda, T. Myojin, H. Matsui, H. Yasunaga, M. Kawaguchi, and T. Imamura (2017, Jun). Sivelestat sodium and mortality in pneumonia patients requiring mechanical ventilation: propensity score analysis of a japanese nationwide database. Journal of anesthesia 31, 405–412.

Kong, L., M. Li, L. Li, L. Jiang, J. Yang, and L. Yan (2017, Apr). Splenectomy before adult liver transplantation: a retrospective study. BMC surgery 17, 44.

Kupper, L. L., J. M. Karon, D. G. Kleinbaum, H. Morgenstern, and D. K. Lewis (1981). Matching in epidemiologic studies: Validity and efficiency considerations. Biometrics 37 (2), 271–291.

Lai, W.-H., C.-S. Rau, S.-C. Wu, Y.-C. Chen, P.-J. Kuo, S.-Y. Hsu, C.-H. Hsieh, and H.-Y. Hsieh (2016, Nov). Post-traumatic acute kidney injury: a cross-sectional study of trauma patients. Scandinavian journal of trauma, resuscitation and emergency medicine 24, 136.

Lee, S. I., K. S. Lee, J. B. Kim, S. J. Choo, C. H. Chung, J. W. Lee, and S.-H. Jung (2017, Jun). Early antithrombotic therapy after bioprosthetic aortic valve replacement in elderly patients: A single-center experience. Annals of thoracic and cardiovascular surgery: official journal of the Association of Thoracic and Cardiovascular Surgeons of Asia 23, 128–134.

Liu, Y., J. Han, T. Liu, Z. Yang, H. Jiang, and H. Wang (2016). The effects of diabetes mellitus in patients undergoing off-pump coronary artery bypass grafting. BioMed research international 2016, 4967275.

McDonald, J. S., R. J. McDonald, E. E. Williamson, D. F. Kallmes, and K. Kashani (2017, Jun). Post-contrast acute kidney injury in intensive care unit patients: a propensity score-adjusted study. Intensive care medicine 43, 774–784.

McEvoy, R. D., N. A. Antic, E. Heeley, Y. Luo, Q. Ou, X. Zhang, O. Mediano, R. Chen, L. F. Drager, Z. Liu, G. Chen, B. Du, N. McArdle, S. Mukherjee, M. Tripathi, L. Billot, Q. Li, G. Lorenzi-Filho, F. Barbe, S. Redline, J. Wang, H. Arima, B. Neal, D. P. White, R. R. Grunstein, N. Zhong, and C. S. Anderson (2016). Cpap for prevention of cardiovascular events in obstructive sleep apnea. New England Journal of Medicine 375 (10), 919–931. PMID: 27571048.

Pearl, J. (2000, 01). Causality: Models, reasoning, and inference, second edition. Causality 29.

Ray, W., K. Murray, K. Hall, P. Arbogast, and C. Stein (2012, 05). Azithromycin and the risk of cardiovascular death reply. The New England journal of medicine 366, 1881–90.

Rosenbaum, P. and D. Rubin (1983, 04). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.

Rosenbaum, P. R. (1989). Optimal matching for observational studies. Journal of the American Statistical Association 84 (408), 1024–1032.

Rubin, D. (1972). Estimating causal effects of treatments in experimental and observational studies. ETS Research Bulletin Series 1972 (2), i–31.

Rubin, D. B. (1973). Matching to remove bias in observational studies. Biometrics 29 (1), 159–183.

Salati, M., A. Brunelli, F. Xiume, M. Monteverde, A. Sabbatini, M. Tiberi, C. Pompili, R. Palloni, and M. Refai (2017, Jun). Video-assisted thoracic surgery lobectomy does not offer any functional recovery advantage in comparison to the open approach 3 months after the operation: a case matched analysis†. European journal of cardio-thoracic surgery: official journal of the European Association for Cardio-thoracic Surgery 51, 1177–1182.

Schermerhorn, M. L., A. J. O’Malley, A. Jhaveri, P. Cotterill, F. Pomposelli, and B. E. Landon (2008). Endovascular vs. open repair of abdominal aortic aneurysms in the medicare population. New England Journal of Medicine 358 (5), 464–474. PMID: 18234751.

Seung, K. B., D.-W. Park, Y.-H. Kim, S.-W. Lee, C. W. Lee, M.-K. Hong, S.-W. Park, S.-C. Yun, H.-C. Gwon, M.-H. Jeong, Y. Jang, H.-S. Kim, P. J. Kim, I.-W. Seong, H. S. Park, T. Ahn, I.-H. Chae, S.-J. Tahk, W.-S. Chung, and S.-J. Park (2008). Stents versus coronary-artery bypass grafting for left main coronary artery disease. New England Journal of Medicine 358 (17), 1781–1792. PMID: 18378517.

Shaw, A. D., M. Stafford-Smith, W. D. White, B. Phillips-Bute, M. Swaminathan, C. Milano, I. J. Welsby, S. Aronson, J. P. Mathew, E. D. Peterson, and M. F. Newman (2008, Feb). The effect of aprotinin on outcome after coronary-artery bypass grafting. The New England journal of medicine 358, 784–93.

Stuart, E. A. (2010, Feb). Matching methods for causal inference: A review and a look forward. Statistical science: a review journal of the Institute of Mathematical Statistics 25,

[…] New England Journal of Medicine 368 (18), 1704–1712. PMID: 23635050.

Tranchart, H., D. Fuks, L. Vigano, S. Ferretti, F. Paye, G. Wakabayashi, A. Ferrero, B. Gayet, and I. Dagher (2016, May). Laparoscopic simultaneous resection of colorectal primary tumor and liver metastases: a propensity score matching analysis. Surgical endoscopy 30, 1853–62.

Zangbar, B., M. Khalil, A. Gruessner, B. Joseph, R. Friese, N. Kulvatunyou, J. Wynne, R. Latifi, P. Rhee, and T. O’Keeffe (2016, Nov). Levetiracetam prophylaxis for post-traumatic brain injury seizures is ineffective: A propensity score analysis. World journal of surgery 40, 2667–2672.

Zhang, M., R. R. Guddeti, Y. Matsuzawa, J. D. Sara, T. Kwon, Z. yue Liu, T. Sun, S. Lee, R. J. Lennon, M. R. Bell, H. V. Schaff, R. C. Daly, L. O. Lerman, A. Lerman, and C. Locker (2016). Left internal mammary artery versus coronary stents: Impact on downstream coronary stenoses and conduit patency. In Journal of the American Heart Association.

Zhang, Z., K. Chen, and H. Ni (2015). Calcium supplementation improves clinical outcome in intensive care unit patients: a propensity score matched analysis of a large clinical database mimic-ii.