Casework applications of probabilistic genotyping methods for DNA mixtures that allow relationships between contributors
Peter J. Green, UTS, Sydney, Australia; University of Bristol, UK.
Julia Mortera, Università Roma Tre, Italy; University of Bristol, UK.
Lourdes Prieto, Forensic Sciences Institute, University of Santiago de Compostela, Spain; Comisaría General de Policía Científica, DNA Laboratory, Madrid, Spain.
July 28, 2020
Abstract
In both criminal cases and civil cases there is an increasing demand for the analysis of DNA mixtures involving relationships. The goal might be, for example, to identify the contributors to a DNA mixture where the donors may be related, or to infer the relationship between individuals based on a DNA mixture. This paper applies a recent approach to modelling and computation for DNA mixtures involving contributors with arbitrarily complex relationships to two real cases from the Spanish Forensic Police.
Some key words:
Coancestry, deconvolution, disputed relationship, identity by descent, kinship, DNA mixtures, likelihood ratio.
In both criminal and civil cases based on relationship inference there is an increasing demand for the analysis of DNA mixtures where relatives are involved. The goal might be to identify the contributors to a mixture where the donors may or may not be related, or to determine relationships between typed individuals and one (or more) of the contributors to a mixture, including the case where the mixture contributors themselves are related.

We analyse two real cases from the Spanish Forensic Police. In the first case we wish to identify a missing person through the analysis of DNA mixtures found on personal belongings. In many cases, the genetic profile detected on the objects is not from a single source but is a DNA mixture, revealing that the object was used by two (or more) people. In addition, very often the contributors to these mixtures are related, mainly in cases, such as this one, where the missing person shared the dwelling with relatives.

The second case concerns a murder where a man was stabbed in his home. A DNA sample was taken from the murder weapon and appeared to be a DNA mixture from the victim and possibly a close relative of the victim.

Here we use probabilistic genotyping methods for DNA mixtures, under hypotheses about the relationships among contributors to the mixture and to other individuals whose genotype is available. We briefly summarise these methods and refer to Mortera (2020), which presents a review of DNA mixtures, for further background.

The basis for any model-based DNA mixture analysis is a joint model for the peak heights z in the electropherogram and genotypes represented as allele counts n, p(n, z | ψ) = p(n) × p(z | n, ψ), having parameters ψ = (φ, ρ, ξ, η) (Graversen 2013).
Given a hypothesis on the DNA mixture contributors, the database allele frequencies and the parameters ψ, the DNA mixture model consists of two components: (a) the joint distribution p(n) of the contributors' genotypes; (b) the conditional distribution p(z | n, ψ) of the peak heights as observed in the electropherogram, given the genotypes. We base the analysis of the DNA mixture on the model described in Cowell et al. (2015). This model takes fully into account the peak heights and the possible artefacts, like stutter and dropout, that might occur in the DNA amplification process. The model can coherently analyse a combination of replicates, a combination of different samples and a combination of different kits. We refer to the review on DNA mixtures by Mortera (2020) for further details.

In the standard case, unknown contributors to the mixture are assumed drawn at random from the gene pool. When the contributors are related, there is positive association between their genotypes. Green and Mortera (2020) present a new model aimed at making inference about complex relationships from DNA mixtures. This generalises the work in Green and Mortera (2017), which allowed inference about particular close relationships between contributors to a DNA mixture with unknown genotype and other individuals of known genotype. The new model extends the analysis to different scenarios and allows arbitrary relationships to be specified between a set of actors, each of which may be a mixture contributor, or have a measured genotype, or both. We can evaluate the likelihood of any such model, and compare models accordingly.

The casework examples below use the KinMix R package (Green 2020), which extends the DNAmixtures R package (Graversen 2013) to allow for modelling DNA mixtures with related contributors. This software is general and can handle complex relationships with and between mixture contributors. Inference is not limited to two-way relationships but can be extended to relationships among three (or possibly more) contributors to a mixture.
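To make the factorisation p(n, z | ψ) = p(n) × p(z | n, ψ) concrete, the following is a minimal Python sketch for a single marker and two unrelated contributors. It is not the DNAmixtures or KinMix code. It assumes, as a simplification in the spirit of the gamma model of Cowell et al. (2015), that each peak height is gamma distributed with shape ρ times the effective allele dose (the sum over contributors of proportion times allele count) and scale η, and it ignores stutter, dropout and relatedness. The function names, allele frequencies, peak heights and parameter values are all hypothetical.

```python
import math
from itertools import combinations_with_replacement


def log_gamma_pdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def genotype_prior(genotype, freqs):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = genotype
    p = freqs[a] * freqs[b]
    return p if a == b else 2.0 * p


def log_peak_likelihood(heights, genotypes, phi, rho, eta):
    """log p(z | n, psi) for one marker, ignoring stutter and dropout."""
    total = 0.0
    for allele, z in heights.items():
        # Effective dose: sum over contributors of proportion x allele count.
        dose = sum(f * g.count(allele) for f, g in zip(phi, genotypes))
        if dose == 0.0:
            return -math.inf  # an observed peak that no contributor explains
        total += log_gamma_pdf(z, rho * dose, eta)
    return total


def deconvolve(heights, freqs, phi, rho, eta):
    """Posterior over (major, minor) genotype pairs for a two-person mixture."""
    # Simplification: candidate genotypes use observed alleles only.
    candidates = list(combinations_with_replacement(sorted(heights), 2))
    log_joint = {}
    for g1 in candidates:
        for g2 in candidates:
            ll = log_peak_likelihood(heights, (g1, g2), phi, rho, eta)
            if ll == -math.inf:
                continue
            log_prior = math.log(genotype_prior(g1, freqs)
                                 * genotype_prior(g2, freqs))
            log_joint[(g1, g2)] = log_prior + ll  # log p(n) + log p(z | n, psi)
    top = max(log_joint.values())
    weights = {k: math.exp(v - top) for k, v in log_joint.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}


# Hypothetical single-marker data: allele -> peak height (RFU).
heights = {"10": 1100.0, "11": 130.0, "12": 1000.0}
freqs = {"10": 0.25, "11": 0.30, "12": 0.35}  # made-up allele frequencies
posterior = deconvolve(heights, freqs, phi=(0.9, 0.1), rho=25.0, eta=48.0)

# Marginal posterior for the major contributor's genotype.
major = {}
for (g1, _), p in posterior.items():
    major[g1] = major.get(g1, 0.0) + p
best_major = max(major, key=major.get)
print(best_major, round(major[best_major], 4))
```

With these toy numbers the two tall peaks are attributed to the major contributor. Replacing the independent Hardy-Weinberg prior by a joint prior that encodes a relationship between the two contributors is, conceptually, where relatedness would enter such a sketch.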
In this section we demonstrate the results and performance of our methods on the two case studies. For the first example we used the data gathered on 21 markers included in the GlobalFiler™ Amplification kit (ThermoFisher) and in the second example we also used data on 16 markers in the PowerPlex® 16 kit. In all examples we assume known allele frequencies taken from the Spanish allele frequency database collected on n = 284 individuals (García et al. 2012). In all the analyses presented we adopt a threshold of 50 RFU.

Background on the case
Personal belongings such as toothbrushes or razor blades can be used as a source of DNA in missing person cases. DNA from the missing person can be found on these objects since they may have been used frequently before his or her disappearance. Nevertheless, there is uncertainty about the actual donor of the DNA isolated from these objects, which is why it is recommended to "validate" the detected profile by using a reference (known) sample from a relative of the missing person. Usually, these profiles (from objects and/or relatives) are then compared with DNA profiles of unidentified bodies that are stored in national databases (massive comparison). This is useful for establishing whether the missing person has died but the body was not identified. Unfortunately, in some cases the genetic profile detected on objects is not a single-source profile but a DNA mixture, revealing that the object was used by two (or more) people. In addition, very often the contributors to these mixtures are related (mainly in cases where the missing person shared the dwelling with relatives).

In this example, we present a real case related to a missing male. In this specific case, only a daughter of the missing male was available to donate a DNA sample. This is not the ideal situation, since false DNA matches can be found after a massive comparison of profiles in a database when only one relative is available as a reference sample. In order to improve the reference genetic data, a toothbrush and a razor-blade, presumably used by the missing person, were also collected. DNA from both objects was recovered and analysed using the GlobalFiler kit (Thermo Fisher). The reference sample from the daughter of the missing male was also genotyped with the GlobalFiler kit. Two different DNA mixtures were detected on the two objects. An excerpt of the data is shown in Table 1, showing the alleles and peak heights in the two DNA mixtures found on the toothbrush T and the razor-blade RB.
The DNA profile of the daughter, denoted by D, is also shown.
Results
Here we analyse the two DNA mixtures found on the toothbrush T and the razor-blade RB, presumably used by the missing person (ante-mortem data).

Table 1: Example 1: An excerpt of the data from the toothbrush T and the razor-blade RB, showing the markers, alleles and relative peak heights. The DNA profile of the daughter D of the missing person is also shown.

marker      allele   T peak height   RB peak height   D
CSF1PO      10       1152            245              -
            11       126             796              -
            12       941             830              12
D22S1045    11       3218            334              -
            15       3550            1795             15
            16       -               1274             -
D5S818      11       5158            2141             11
            13       304             1512             13
vWA         14       -               945              -
            16       264             853              16
            18       3664            612              18

Table 2: Example 1: Estimated parameters based on an analysis of the two mixture samples assuming that the toothbrush T contains DNA from two unknown contributors and the razor-blade RB contains DNA from three unknown contributors.

               µ      σ        ξ        φ_U1     φ_U2
toothbrush     2381   0.0614   0        0.926    0.074
razor-blade    1602   0.4955   0.0118   0.5002   0.4998

Table 3: Example 1: log LR for testing whether, in T and RB, H_p: contributor (U1 or U2) is the father of D vs. H_0: no contributor is related to D.

log LR         U1       U2
toothbrush     10.97    4.53
razor-blade    8.442    8.444

Table 2 shows the estimated parameters ψ = (µ, σ, ξ, φ) for the analysis of the DNA mixtures found on T and RB. We assume there are 2 unknown contributors U1 and U2 to both T and RB. The analysis performed for 3 unknown contributors (not shown here) yielded an almost vanishing proportion for the third contributor. The contributors to T and RB are not necessarily the same individuals. The estimated proportion of DNA for the two contributors to sample T is large for the major contributor U1, φ_U1 = 0.93, whereas for item RB the estimated proportions of DNA contributed by U1 and U2 are roughly equal, φ_U1 ≃ φ_U2 = 0.5. As we will see, in the latter case the estimation of the LR and other inference is problematic. Note that in these models the likelihood can have a complicated shape and be difficult to maximise reliably by numerical methods. The values in Table 2 are the maximum likelihood estimates, as calculated by
DNAmixtures.

Table 3 shows the log LR for testing H_p: D is the child of U1 (and similarly for U2) vs. H_0: no unknown contributors are related to D. For item T, log LR = 10.97 is large, pointing to U1 being the father of D. It is also substantial for the hypothesis that U2 is the father of D. Could this be because the two contributors are related? We will test this assumption later. For RB the log LR in Table 3 is almost the same (8.442 and 8.444) whether testing D as the child of U1 or of U2. This is probably because the proportions are almost identical, which makes it extremely difficult to distinguish between the contributors.

Table 4: Example 1: Excerpt of marker-wise LR and overall log LR for item T, using relMix and KinMix with and without peak height information, for testing H_p: U1 is the father of D vs. H_0: U1 and U2 are random members of the population.

marker            relMix   KinMix w/o peak heights   KinMix with peak heights
CSF1PO            1.08     1.07                      1.59
D10S1248          1.26     1.18                      1.62
D5S818            2.09     2.12                      1.51
vWA               2.55     2.58                      3.34
partial log LR    8.35     8.42                      9.94
overall log LR    -        9.53                      10.97

Table 4 presents the marker-wise LRs and the overall log LR when using relMix (Hernandis et al. 2019) and
KinMix with and without peak height information. relMix is, like KinMix, an R package that analyses DNA mixtures involving relatives, but it is based only on allele presence and does not consider the peak heights when modelling the DNA mixture. The results obtained with relMix and KinMix when not including the peak height information (columns 2 and 3) are quite similar. Small differences between relMix and KinMix when not including peak heights are to be expected since they are based on different models. For the majority of markers, KinMix gave a larger log LR when including peak height information than when not including it. When using only the markers for which relMix is able to compute an LR, the partial log LR obtained with KinMix with peak heights is 9.94 and without peak heights is 8.42. The overall log LR on all the markers computed by
KinMix with peak heights is 10.97, and without peak heights is 9.53, corresponding to an LR 27.5 times smaller.

Table 5: Example 1: For item T, log LR for H_p: the two contributors to the mixture are related, i.e. U1 has relationship R to U2, vs. H_0: the two contributors are unrelated. Several different relationships R are tested.
(Table body: six log LR entries, beginning with parent-child, all negative; the digits are not recoverable.)

Table 6: Example 1: For item RB, log LR for H_p: the two contributors to the mixture are related, i.e. U1 has relationship R to U2, vs. H_0: the two contributors are unrelated. Several different relationships R are tested.
(Table body: six log LR entries, beginning with parent-child, all negative; the digits are not recoverable.)

Tables 5 and 6 show the log LR for testing whether the two contributors U1 and U2 to items T and RB are related, i.e. H_p: U1 has relationship R to U2 versus H_0: U1 and U2 are unrelated. The log LRs are all negative, with LRs smaller than 0.1. There is almost no evidence that the two mixture contributors have a relationship among those in R = {parent-child, sibs, quadruple half-cousins, half-sib, first cousins, half-cousins}.

Table 7 and Table 8 show the log LR for item T for several hypotheses H_p concerning different relationships R among U1, U2 and D, vs. two alternative hypotheses. The first alternative hypothesis in Table 7 is that U1 is the father of D and U2 is unrelated to D and U1, whereas the roles of U1 and U2 are reversed in Table 8. The second alternative hypothesis is that U1, U2 and D are unrelated. The values of the log LR show that it is highly likely that the two contributors to item T are the missing father of D and D's mother. It also seems more likely that the mother is the major contributor and the father the minor contributor.

Table 7: Example 1: For item T, log LR for several hypotheses H_p concerning different relationships R among U1, U2 and D, vs. the first alternative (U1 is the father of D and U2 is unrelated to D and U1) and vs. the second alternative (U1, U2 and D are unrelated). Entries marked … are not recoverable.

H_p                                                               vs. first alternative   vs. second alternative
one contributor father and the other mother of D                  6.960                   17.935
one contributor father of D and the other maternal aunt of D      4.605                   15.579
one contributor father of D and the other paternal cousin of D    0.375                   10.600
one contributor sib of the other and father of D                  −…                      …
one contributor father of both D and the other                    −…                      −…

Table 8: Example 1: For item T, log LR for several hypotheses H_p concerning different relationships R among U1, U2 and D, vs. the first alternative (U2 is the father of D and U1 is unrelated to D and U2) and vs. the second alternative (U1, U2 and D are unrelated). Entries marked … are not recoverable.

H_p                                                               vs. first alternative   vs. second alternative
one contributor father and the other mother of D                  13.404                  17.935
one contributor father of D and the other maternal aunt of D      11.049                  15.579
one contributor father of D and the other paternal cousin of D    6.069                   10.600
one contributor sib of the other and father of D                  4.303                   8.834
one contributor father of both D and the other                    −…                      −…

Table 9: Example 1: Predicted genotypes of U1 with corresponding probabilities for item T, and for both major and minor contributor for item RB, for an excerpt of the markers. The genotype of D is also shown.

marker      toothbrush          razor-blade major    razor-blade minor    D
            genotype   prob.    genotype   prob.     genotype   prob.
CSF1PO      10,12      1        10,11      0.42      11,12      0.44      12,12
            -          -        11,12      0.24      10,12      0.33
            -          -        10,12      0.22      12,12      0.23
            -          -        11,11      0.11      -          -
D22S1045    11,15      0.9999   11,16      0.40      15,16      0.40      15,15
            -          -        11,15      0.26      11,15      0.33
            -          -        15,16      0.26      15,15      0.26
            -          -        16,16      0.07      -          -
D5S818      11,11      1        11,13      0.56      11,13      0.63      11,13
            -          -        11,11      0.35      11,11      0.22
            -          -        13,13      0.08      13,13      0.15
vWA         18,18      0.9999   14,16      0.42      16,18      0.43      16,18
            -          -        14,18      0.24      14,18      0.24
            -          -        16,18      0.15      14,16      0.17
            -          -        16,16      0.08      16,16      0.10
            -          -        14,14      0.08      18,18      0.05
            -          -        18,18      0.01      -          -

Table 9 shows the deconvolution of the mixtures in items T and RB. The predicted profile of the major contributor of the T mixture has probabilities close to 1 on all markers. This profile is compatible with being the father of D as it shares at least one allele with D on all markers. As in the previous analyses, the deconvolution of the mixture in RB yields very uncertain predictions of the major and minor contributors' genotypes, as there are many candidate genotypes and the highest-ranking probabilities are mostly smaller than 0.5.

Description of the case
This concerns a murder case where a man was stabbed in his home. There was a knife with blood at the crime scene. The blood was mainly on the blade, but there was also some blood on the handle. The sample from the handle turned out to be a DNA mixture, with a major profile matching the victim. We also wish to test whether the minor profile in the mixture could be a close relative of the victim (possibly a son). The DNA profile of the son was not available. Two EPGs of the mixture were obtained by using two different kits; we denote these by EPG1 and EPG2. The kits have partially overlapping sets of markers: EPG1 was analysed on its 16 markers and EPG2 on its set of 22 markers, and both include Amelogenin.

Months later, a man was arrested for a different crime (drug trafficking) and a reference DNA sample was collected. When his profile was entered in the DNA database several matches were found, including one with the DNA mixture on the handle of the knife. The matches were investigated and the identity of the person (name, date of birth, place of birth, name of the father, name of the mother) was that of the son of the victim. Table 10 gives an excerpt of the data showing the markers, alleles and relative peak heights for EPG1 and EPG2, together with the father's and son's genotypes.

Table 10: Example 2: An excerpt of the data showing the markers, alleles and relative peak heights for EPG1 and EPG2, together with the father's (victim) and son's (suspect) genotypes.

marker      allele   EPG1 height   EPG2 height   victim   suspect
CSF1PO      10       305           625           10       10
            11       240           504           11       11
D10S1248    13       -             6990          -        13
            14       -             2309          14       -
            16       -             7144          16       16
D7S820      9        606           1136          9        9
            10       -             686           10       -
TH01        9.3      863           2654          9.3      9.3
            10       -             570           10       -
Results
We analysed the data from this case to illustrate the different queries that can be addressed using the recently developed KinMix code. In particular we analyse the following possible scenarios:
Scenario 1
Here none of the contributors is typed. We analyse a 2-person mixture model with prosecution hypothesis H_p: the two unknown contributors are father and son, versus H_0: the two unknown contributors are unrelated.

Scenario 2
Here only the father (the victim) is typed. We analyse a 2-person mixture model where the prosecution hypothesis is H_p: the contributors are a son of the typed father and one unknown individual, versus H_0: no contributor is related to the typed individual (the victim).

Scenario 3
Both father and son are typed. Here we analyse a 2-person mixture model where H_p: the contributors are the victim (father) and the son, versus H_0: the contributors to the mixture(s) are 2 unknown individuals.

Scenario 4
Both father and son are typed. Here we analyse a 2-person mixture model where H_p: the contributors are the victim (father) and the son, versus H_0: the contributors to the mixture(s) are the victim and an unknown individual.

In all scenarios, unless otherwise stated, an unknown contributor to a mixture is taken to be a random member of the reference population, and so unrelated to the typed individuals.

For EPG1 the MLEs of the parameters under both H_p and H_0 are similar and are roughly equal to ψ = (µ = 576, σ = 0.…, ξ = 0, φ_U1 = 0.…, φ_U2 = 0.…), φ_v = 0.…, φ_U = 0.82. For EPG2 the MLEs of the parameters are roughly equal to ψ = (µ = 2542, σ = 0.…, ξ = 0, φ_U1 = 0.…, φ_U2 = 0.…), φ_v = 0.…, φ_U = 0.86. (Digits marked … are not recoverable.) In both EPG1 and EPG2 the victim is estimated to be the minor contributor. Note that EPG2 has a higher µ than EPG1 but this is also accompanied by a larger σ, so the coefficient of variation is similar in both EPGs. The MLEs of the mean stutter proportion ξ are zero, which indicates that the data have been preprocessed so that peaks classified in the laboratory as stutter were removed. Our models do not, however, require that the data be preprocessed, which avoids the risk of eliminating a true peak in a stutter position.

Table 11 gives the log LR for the 4 scenarios when analysing EPG1 and EPG2 separately and jointly. When combining EPGs made from the same DNA extract, as in this case, it is natural to assume that the contributors are the same. In Graversen et al. (2019) we show how results based on a combination of replicates, a combination of different samples and a combination of different kits improve the robustness of the analysis and help in resolving complications relating to degradation. However, when combining profiles from different samples one needs to consider carefully whether there is perhaps only a partial overlap.

Table 11: Example 2: log LR for Scenarios 1–4 using EPG1 and EPG2 separately and in combination.

Scenario        1      2        3              4
Typed actors    none   father   father & son   father & son
(Table body not recoverable.)

Table 12: Example 2: log LR for H_p: the two contributors to the mixture are related, i.e. U1 has relationship R to U2, vs. H_0: the two contributors U1 and U2 are unrelated and are independent of the typed individuals. Several different relationships R are tested.
(Table body: columns Relationship, EPG1, EPG2, beginning with parent-child; the entries are negative and their digits are not recoverable.)

Table 12 shows the log LR for testing whether the two unknown contributors to the DNA mixture are related versus that they are unrelated. For EPG1 the LRs for testing H_p, that U1 has a relationship R to U2, vs. H_0, that the two contributors U1 and U2 are unrelated and are independent of the typed individuals, vary between [0.…, ….9], giving roughly equal weight to H_p versus H_0. For EPG2 these vary between [0.…, …].

In Table 13, other denotes the collection of alleles for which no peak has been observed in the EPG. For EPG1 the highest-ranking genotype for the major contributor U1 on all markers has posterior probability greater than 0.99 and coincides with the genotype of the suspect (who is the son of the victim) on all markers. The deconvolution for EPG2 gives a much poorer performance. For example, on marker D7S820 the top-ranking genotype for EPG2 is incorrect; the correct genotype (9,9) is ranked 3rd, with a small probability of 0.077.

Table 13: Example 2: Predicted genotypes of U1 with corresponding probabilities for EPG1 and EPG2 for an excerpt of the markers. An allele not observed in the EPG is denoted by other.

marker    EPG1 genotype   prob.   EPG2 genotype   prob.
CSF1PO    10,11           1       10,11           0.751
          -               -       10,10           0.097
          -               -       11,11           0.083
          -               -       10,other        …
(Remaining rows not recoverable.)

Conclusions
We have shown that a wide range of relationship inference problems where one or more actors appear only as contributors to a DNA mixture can be handled coherently. We can make inference about relationships among contributors, and between contributors and typed individuals.

The new KinMix package (Green 2020) used in the casework examples illustrated here is highly flexible, modular software capable of handling much more complex relationships among two or more mixture contributors than those presented here. It is not limited to pairwise relationships. In Green and Mortera (2020) we show its capability to deal with multi-way relationships in DNA mixtures, including cases where the contributors might be inbred.
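As a cross-check on some of the figures reported for Example 1, the short Python sketch below assumes that "log LR" denotes log10 of the likelihood ratio (an assumption, though it is consistent with the factor of 27.5 quoted in the text). It shows how a difference in log LR converts to a ratio of LRs, and how a log LR against the "all unrelated" alternative decomposes into a sum of two log LRs through an intermediate hypothesis. The helper name lr_factor is hypothetical.

```python
def lr_factor(log10_lr_a, log10_lr_b):
    """How many times larger LR_a is than LR_b, given their log10 values."""
    return 10.0 ** (log10_lr_a - log10_lr_b)


# Overall log LR for item T with (10.97) and without (9.53) peak heights.
factor = lr_factor(10.97, 9.53)

# Chaining through the intermediate hypothesis "U1 alone is the father of D":
# log LR(Hp vs unrelated) = log LR(Hp vs U1 father only)
#                           + log LR(U1 father only vs unrelated).
chained_via_u1 = 6.960 + 10.97   # first row of Table 7 plus Table 3 (U1, toothbrush)
chained_via_u2 = 13.404 + 4.53   # first row of Table 8 plus Table 3 (U2, toothbrush)

print(round(factor, 1), round(chained_via_u1, 3), round(chained_via_u2, 3))
```

Both chained values agree with the reported 17.935 up to the rounding of the tabulated inputs.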
References
Cowell, R. G., Graversen, T., Lauritzen, S. L., and Mortera, J. (2015). Analysis of DNA mixtures with artefacts (with discussion). Journal of the Royal Statistical Society: Series C, 1–48.

García, O., Alonso, J., Cano, J., García, R., Luque, G., Martín, P., de Yuso, I. M., Maulini, S., Parra, D., and Yurrebaso, I. (2012). Population genetic data and concordance study for the kits Identifiler, NGM, PowerPlex ESX 17 System and Investigator ESSplex in Spain. Forensic Science International: Genetics, (2), e78–9.

Graversen, T. (2013). DNAmixtures: Statistical Inference for Mixed Traces of DNA. R package version 0.1-4. http://dnamixtures.r-forge.r-project.org.

Graversen, T., Mortera, J., and Lago, G. (2019). The Yara Gambirasio case: Combining evidence in a complex DNA mixture case. Forensic Science International: Genetics, 52–63.

Green, P. J. (2020). KinMix: DNA mixture analysis with related contributors. R package 2.0, https://petergreenweb.wordpress.com/kinmix2-0.

Green, P. J. and Mortera, J. (2017). Paternity testing and other inference about relationships from DNA mixtures. Forensic Science International: Genetics, 128–37. http://dx.doi.org/10.1016/j.fsigen.2017.02.001.

Green, P. J. and Mortera, J. (2020). Inference about complex relationships using peak height data from DNA mixtures. https://arxiv.org/abs/2005.09365.

Hernandis, E., Dørum, G., and Egeland, T. (2019). relMix: An open source software for DNA mixtures with related contributors. Forensic Science International: Genetics Supplement Series, 221–3. https://doi.org/10.1016/j.fsigss.2019.09.085.

Mortera, J. (2020). DNA mixtures in forensic investigations: The statistical state of the art. Annual Review of Statistics and Its Application, 1–34. https://doi.org/10.1146/annurev-statistics-031219-041306