Differential Expression Analysis of Dynamical Sequencing Count Data with a Gamma Markov Chain
Ehsan Hajiramezanali, Siamak Zamani Dadaneh, Paul de Figueiredo, Sing-Hoi Sze, Mingyuan Zhou, Xiaoning Qian
DDifferential Expression Analysis of DynamicalSequencing Count Data with a Gamma Markov Chain
Ehsan Hajiramezanali , Siamak Zamani Dadaneh , Paul de Figueiredo ,Sing-Hoi Sze , Mingyuan Zhou , and Xiaoning Qian March 8, 2018
Abstract
Next-generation sequencing (NGS) to profile temporal changes in living systemsis gaining more attention for deriving better insights into the underlying biologicalmechanisms compared to traditional static sequencing experiments. Nonetheless,the majority of existing statistical tools for analyzing NGS data lack the capa-bility of exploiting the richer information embedded in temporal data. Severalrecent tools have been developed to analyze such data but they typically imposestrict model assumptions, such as smoothness on gene expression dynamic changes.To capture a broader range of gene expression dynamic patterns, we develop thegamma Markov negative binomial (GMNB) model that integrates a gamma Markovchain into a negative binomial distribution model, allowing flexible temporal vari-ation in NGS count data. Using Bayes factors, GMNB enables more powerful Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX77843, USA, Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station,TX 77843, USA, Department of Microbial Pathogenesis and Immunology, Texas A&M Health Science Center, CollegeStation, TX 77843, USA, Department of Veterinary Pathobiology, and Norman Borlaug Center, Texas A&M University, 77843,USA, Department of Computer Science and Engineering, and Department of Biochemistry & Biophysics,Texas A&M University, College Station, TX 77843, USA, Department of Information, Risk, and Operations Management (IROM), The University of Texas atAustin, Austin, TX 78712, USA. * Correspondence should be addressed to Xiaoning Qian ( [email protected] ). a r X i v : . [ s t a t . A P ] M a r emporal gene differential expression analysis across different phenotypes or treat-ment conditions. In addition, it naturally handles the heterogeneity of sequencingdepth in different samples, removing the need for ad-hoc normalization. EfficientGibbs sampling inference of the GMNB model parameters is achieved by exploit-ing novel data augmentation techniques. Extensive experiments on both simulatedand real-world RNA-seq data show that GMNB outperforms existing methods inboth receiver operating characteristic (ROC) and precision-recall (PR) curves ofdifferential expression analysis results. Keywords: Gamma Markov chain, negative binomial, Gibbs sampling, temporalRNA-seq data, Bayes factor
Advances in next-generation sequencing (NGS) techniques have enabled researchers toproduce millions of relatively short reads for genome-scale bioinformatics research [Con-sortium et al., 2012, Wang et al., 2009, Mortazavi et al., 2008]. Transcriptome analyses,including gene expression profiling and transcript quantification through RNA-sequencing(RNA-seq), can help better understand biological processes of interest. RNA-seq countdata are highly over-dispersed with large dynamic ranges [Anders et al., 2015]. A largenumber of statistical tools have been developed for differential gene expression analysis ofRNA-seq data [Anders and Huber, 2010, Dadaneh et al., 2017, Robinson et al., 2010, Loveet al., 2014, Law et al., 2014, Hardcastle and Kelly, 2010, Leng et al., 2013], which mostlyhave adopted the negative binomial (NB) distribution to account for over-dispersion aswell as high uncertainty inherent in RNA-seq data due to the small number of replicatesamples in typical differential expression experiments [Love et al., 2014].Living systems are complex and dynamic. There has been significant interest in an-alyzing temporal RNA-seq count data [Bar-Joseph et al., 2012]. For example, in cellbiology or drug discovery research, monitoring molecular expression changes in responseto specific stimuli can help better understand cellular mechanisms at the transcriptional2nd post-transcriptional regulatory levels under different conditions. One important taskis to identify the genes that are differentially expressed over time across different con-ditions, which is more challenging compared to static RNA-seq data analysis due topotential temporal dependencies [Lienau et al., 2009].Recently, several dynamic differential RNA-seq analysis methods have been devel-oped to better capture temporal dependency. For example, EBSeq-HMM [Leng et al.,2015] takes an empirical Bayesian mixture modeling approach to compare the expressionchange across consecutive time points to identify genes that display significant transcrip-tion changes over time under one treatment condition. Across different conditions, itis desirable to identify genes that have different dynamic patterns. For this purpose,next-maSigPro [Nueda et al., 2014] has extended a generalized linear model (GLM) [Mc-Cullagh, 1984] based dynamic differential expression analysis for microarray data frommultiple time points to analyze temporal RNA-seq data. However, modeling RNA-seqcounts by real values may lead to information loss and GLM may not be able to capturecomplicated dynamic changes in expression. An autoregressive time-lagged AR (1) modelwith Markov Chain Monte Carlo (MCMC) inference [Oh et al., 2013] has also been pro-posed to identify genes with different temporal expression changes. But the posteriorestimates of model parameters through Metropolis-Hastings inference lead to high com-putational complexity. DyNB [ ¨Aij¨o et al., 2014] has been proposed recently to modelthe temporal RNA-seq counts by NB distributions with their temporal expected valuesmodeled by non-parametric Gaussian Processes (GP). DyNB can detect the genes withdifferential dynamic patterns that static differential expression analysis, which considerindividual time points, fail to discover. In addition to high computational complexity dueto MCMC inference [Spies and Ciaudo, 2015, Sun et al., 2016], DyNB may fail to modelpotential abrupt expression changes due to its inherent smoothness assumptions [Ras-mussen and Williams, 2006].We present a new dynamic differential expression analysis method for temporal RNA-3eq data, GMNB (gamma Markov negative binomial), which is a hierarchical model tointroduce a gamma Markov chain [Acharya et al., 2015, Schein et al., 2016] to model thepotential dynamic transitions of the model parameters in NB distributions. With thisnew model for temporal RNA-seq data and an efficient inference algorithm, GMNB isexpected to provide the following advantages over existing methods: 1) GMNB can modelmore general dynamic expression patterns than DyNB, especially for abrupt expressionchanges across consecutive time points; 2) The closed-form Gibbs sampling can be de-rived to infer the model parameters in GMNB, which is computationally more efficientthan the existing methods; 3) For dynamic differential expression, genes are ranked basedon the Bayes factor (BF), which is very general especially when considering differentialexpression under multiple factors; 4) Last but not least, GMNB avoids the normalizationpreprocessing step due to the explicit modeling of the sequencing depth in NB distribu-tions, as described in Dadaneh et al. [2017], and we expect similar superior performanceof GMNB compared to existing methods requiring such heuristic preprocessing steps.The remainder of the paper is organized as follows. Section 2 introduces the GMNBmodel, inference algorithm, and dynamic differential expression analysis. Section 3 com-pares the experimental results from both synthetic and real-world benchmark data usingGMNB and other state-of-the-art dynamic differential expression methods for temporalRNA-seq data. We conclude the paper in Section 4. Throughout this paper, we use the NB distribution to model RNA-seq read counts. Weparameterize a NB random variable as n ∼ NB( r, p ), where r is the nonnegative dispersionand p is the probability parameter. The probability mass function (pmf) of n is expressedas f N ( n ) = Γ( n + r ) n !Γ( r ) p n (1 − p ) r , where Γ( · ) is the gamma function. The NB random variable4 ∼ NB( r, p ) can be generated from a compound Poisson distribution: n = (cid:96) (cid:88) t =1 u t , u t ∼ Log( p ) , (cid:96) ∼ Pois( − r ln(1 − p )) , where u ∼ Log( p ) corresponds to the logarithmic random variable [Johnson et al., 2005],with the pmf f U ( u ) = − p u u ln(1 − p ) , u = 1 , , ... . As shown in Zhou and Carin [2015],given n and r , the distribution of (cid:96) is a Chinese Restaurant Table (CRT) distribution,( (cid:96) | n, r ) ∼ CRT( n, r ), a random variable from which can be generated as (cid:96) = (cid:80) nt =1 b t ,with b t ∼ Bernoulli( rr + t − ). We model the dynamic gene expression changes in a temporal RNA-seq dataset by con-structing a Markov chain where the expression of a gene at time t only depends on thatof time t −
1. Specifically, for the RNA-seq reads mapped to gene k in a given sample j under different conditions, the read count at time t follows: n ( t ) kj ∼ NB( r ( t ) k , p ( t ) j ) , (1)where to impose the dependence between consecutive time points, we model the dis-persion parameters dynamically by introducing a gamma Markov chain, in which r ( t ) k isdistributed according to: r ( t ) k ∼ Gamma( r ( t − k , c k ) . (2)As previously shown in Dadaneh et al. [2017], the probability parameter p ( t ) j accountsfor the effect of varying sequencing depth of sample j at time point t . More precisely, theexpected expression of gene k in sample j and time t is r ( t ) k p ( t ) j − p ( t ) j , and hence the dispersionparameter r ( t ) k can be viewed as the true abundance of gene k at time t , after removingthe effects of sequencing depth. Thus the differential expression analysis of temporal5NA-seq data can be performed without any normalization preprocessing steps.Note that the scale parameter 1 /c k of the Gamma distribution in (2) is shared betweendifferent time points, thereby making statistical inference more robust by borrowinginformation from various samples at multiple time points. To complete the model wesample the dispersion parameter at the first time point as r (0) k ∼ Gamma( e (0) , f ), anduse conjugate priors as c k ∼ Gamma( c , d ) and p ( t ) j ∼ Beta( a , b ).In addition to the flexibility of modeling temporal RNA-seq data, this GMNB modelenables an efficient inference procedure by taking advantage of unique data augmenta-tion and marginalization techniques for the NB distribution [Zhou and Carin, 2015], asdescribed in detail below. By exploiting novel data augmentation techniques in Zhou and Carin [2015], we im-plement an efficient Gibbs sampling algorithm with closed-form updating steps. Morespecifically, we infer the dispersion parameter of the NB distribution by first drawinglatent random counts from the CRT distribution, and then update the dispersion by em-ploying the gamma-Poisson conjugacy. Furthermore, due to the Markovian constructionof the model, it is necessary to consider both backward and forward flow of informationfor the inference of r ( t ) k . First, in the backward stage, starting from the last time point t = T , we draw two sets of auxiliary random variables as l ( t ) kj ∼ CRT( n ( t ) kj , r ( t ) k ) l ( t ) k. = (cid:88) j l ( t ) kj u ( t − t ) k ∼ CRT( u ( t )( t +1) k + l ( t ) k. , r ( t − k ) , (3)6or t = T, T − , . . . ,
1. For the last time point, we assume u ( T )( T +1) k = 0. Next, in theforward stage of Gibbs sampling, we sample r ( t ) k starting from t = 0 to t = T as( r ( t ) k |− ) ∼ Gamma (cid:0) r ( t − k + u ( t )( t +1) k + l ( t ) k. , θ ( t ) k (cid:1) , (4)where θ ( t ) k = c k − (cid:80) j ln(1 − p ( t ) j ) − ln (1 − q ( t ) k ). For t = 0 , ..., T − q ( t ) k is defined as q ( t ) k = − (cid:80) j ln(1 − p ( t +1) j ) − ln(1 − q ( t +1) k ) c k − (cid:80) j ln(1 − p ( t +1) j ) − ln(1 − q ( t +1) k ) , (5)and q ( T ) k = 0. Finally, by taking advantage of conjugate priors, in each iteration of Gibbssampling, c k and p ( t ) j can be drawn as( c k |− ) ∼ Gamma( e + T − (cid:88) t =0 r ( t ) k , / ( f + T (cid:88) t =1 r ( t ) k )) , ( p ( t ) j |− ) ∼ Beta( a + (cid:88) k n ( t ) kj , b + (cid:88) k r ( t ) k ) . (6)The efficient augmentation technique employed in our Gibbs sampling inference re-moves the need for specifying a suitable proposal distribution, as in the Metropolis-Hastings inference of both DyNB [ ¨Aij¨o et al., 2014] and NB-AR(1) methods [Oh et al.,2013]. Our experiments in the next section demonstrate that the Gibbs sampling algo-rithm of GMNB has fast convergence. The main goal of differential expression analysis is to identify the genes whose expressionsdemonstrate significant variations across conditions. In the classic static RNA-seq dataanalysis, this goal is usually obtained via the comparison of expression averages acrossgroups. In dynamic RNA-seq measurement settings, however, this task becomes morechallenging as any change of temporal expression patterns between groups may reflect7nteresting biological mechanisms. Hence, as in ¨Aij¨o et al. [2014], we adopt the BayesFactor (BF) as a measure that exploits information collectively from all time pointsto detect the genes with significant variations in temporal expression patterns acrossconditions.To compute the Bayes Factor, we first consider the null hypothesis H that the genesare not differentially expressed across conditions, and thus the same set of parametersgovern the temporal gene expressions. In this case, we aggregate the counts D of bothexperimental conditions to fit the GMNB model M . On the other hand, under thealternative hypothesis H , the differentially expressed genes possess different model pa-rameters in each group. Hence, GMNB models M and M are independently fitted to thecounts in conditions 1 ( D ) and 2 ( D ), respectively. Then, the BF can be calculated asBF = P ( D| H ) P ( D| H ) = P ( D | M ) P ( D | M ) P ( D| M ) , where we have assumed equal prior probabilities for both hypotheses. The BF compu-tation requires marginalizing out model parameters, which we conduct through MonteCarlo integration using posterior samples collected in the Gibbs sampling procedure. We evaluate the proposed GMNB model and compare its performance on both syntheticand real-world temporal RNA-seq data with DyNB [ ¨Aij¨o et al., 2014]. We also considerDESeq2 [Love et al., 2014], which is a popular tool for differential expression analysis,however, not specifically designed for temporal RNA-seq data. We first consider syntheticRNA-seq data generated by different temporal models, and show that GMNB consistentlyprovides outstanding performance in terms of the area under the curves (AUCs) of receiveroperating characteristic (ROC) and precision-recall (PR) curves. Furthermore, we presenttwo case studies on human Th17 cell differentiation [Tuomela et al., 2016, Chan et al.,8016, ¨Aij¨o et al., 2014], and explain the biomedical implications based on differentialexpression analysis over time by GMNB.Throughout the experimental studies for synthetic and real-world data, for GMNB,in each run of Gibbs sampling inference 1000 MCMC samples of parameters are collectedafter 1000 burn-in iterations. We use the collected MCMC samples to calculate the BFfor each gene as explained in Section 2.4, and rank the genes according to these BFs.For DyNB, we follow the settings provided in ¨Aij¨o et al. [2014] and rank the genes usingthe computed BFs. We consider three different setups for differential expression analysisof temporal RNA-seq data using DESeq2. In the first setup, denoted by DESeq2-GLMin the experiments, time information is incorporated as a covariate of the generalizedlinear model in DESeq2 in differential expression analysis to determine temporal data inone model. In the second and third setups, we apply DESeq2 to the data at differenttime points independently, and use the average and minimum computed p-values fromthe respective differential expression analyses as an overall measure of differential ex-pression across conditions, denoted by DESeq2-avg and DESeq2-min in the experiments,respectively.It is worth mentioning that the use of an efficient closed-form Gibbs sampling makesGMNB, on average, 10 times faster than DyNB for both simulated and real-world tem-poral RNA-seq datasets by reducing the number of iterations required to converge. Thisis due to the low acceptance rate of the Metropolis-Hastings step of DyNB inference.Thus, to ensure the convergence of its MCMC inference, we consider performing 100 , We first perform a comprehensive evaluation of GMNB with the synthetic data generatedunder different temporal RNA-seq models. More precisely, we simulat the data under the9ollowing three different setups: the proposed GMNB generative model, the DyNB gen-erative model [ ¨Aij¨o et al., 2014], and the auto-regressive (AR) based procedure [Oh et al.,2013]. In all setups 10% of genes are randomly set to be truly differentially expressed,with the procedure described in detail for each setup in the following subsections. Foreach specific generative model, we change the corresponding model parameters to ensurethat the expected expression changes of truly differentially expressed genes are differentacross two conditions. The impact of sequencing depth variation is simulated by drawingthe corresponding size factors from the interval [0 . , .
2] uniformly at random.
In the first simulation study, we generate the synthetic RNA-seq count data for 1000genes under two conditions according to the GMNB model (1) with the gamma-Markovtemporal dependencies (2) between dispersion parameters. The gene-wise scale parame-ters c k are drawn from the uniform distribution in the interval [0 . , c k + b , where b = .
02 if c k < − .
02 if c k ≥ r (0) k , is generated for both conditions according toGamma( e ,
10) where e = Uniform(30 , . , . Recall P rec i s i o n GMNB, AUC = 0.65DyNB, AUC = 0.27DESeq2-min, AUC = 0.57DESeq2-avg, AUC = 0.46DESeq2-GLM, AUC = 0.09 0 0.2 0.4 0.6 0.8 1
False Positive Rate T r u e P o s i t i v e R a t e GMNB, AUC = 0.87DyNB, AUC = 0.70DESeq2-min, AUC = 0.84DESeq2-avg, AUC = 0.80DESeq2-GLM, AUC = 0.46
Figure 1:
Left column:
PR Curve,
Right column:
ROC Curve. Performance compar-ison of different methods for differential gene expression over time based on the GMNBgenerative model. AUCs are given in the corresponding legends in the plots.
In the second simulation study, data is generated according to the DyNB model assump-tions. More specifically, we draw the true mean values µ k , for 1000 genes from a Gaussianprocess with the mean m k and the covariance matrix Cov ( t i , t j ) = θ k exp( − α k | t i − t j | ),where m k , θ k and α k are uniformly distributed in the intervals [1000 , , . , m k and covari-ance function parameters { θ k , α k } to { bm k , cθ k , α k ± d } , where b = 1 . c = 10, and d = 0 .
25 determine the significance of expected expression changes across conditions.Similar to the previous simulation setup, 4 replicates are generated for each time pointin the corresponding condition.Figure 2 demonstrates the performances of different methods applied to the datagenerated according to the above procedure. GMNB still clearly outperforms the othermethods based on both ROC and PR curves. The inferior performance of DyNB may bedue to the small number of replicates for each time point, leading to poor estimates ofboth θ k (heuristically estimated by the data dependent value 10 × stdev( Y ) based on theobserved replicates Y = { y , . . . , y J } in ¨Aij¨o et al. [2014]) and µ k (heuristically estimatedby min ( Y )+ max ( Y )2 in ¨Aij¨o et al. [2014]). On the contrary, the fully Bayesian nature ofGMNB makes its performance robust to the number of replicates as well as potentialnoise at each time point. Both variates of static DESeq2 under-perform the dynamic ap-proaches remarkably, as they neglect the correlation between samples across time points.In addition, DESeq2-GLM under-performs both GMNB and DyNB substantially, as itneglects the inherent dynamics of RNA-seq experiments specifically. To further demon-strate the potential influence of gene expression levels, reflected by the expected readcounts, on the detection power of differentially expressed genes, additional simulationsare performed with the same parameters as above, except for the mean parameter m k ,for which three sampling uniform distributions are tested with the intervals [1000 , , , Recall P rec i s i o n GMNB, AUC = 0.78DyNB, AUC = 0.71DESeq-min, AUC = 0.52DESeq-avg, AUC = 0.59DESeq-GLM, AUC = 0.16 0 0.2 0.4 0.6 0.8 1
False Positive Rate T r u e P o s i t i v e R a t e GMNB, AUC = 0.92DyNB, AUC = 0.89DESeq-min, AUC = 0.86DESeq-avg, AUC = 0.85DESeq-GLM, AUC = 0.57
Figure 2:
Left column:
PR Curve,
Right column:
ROC Curve. Performance compar-ison of different methods for differential gene expression over time based on the DyNBgenerative model. AUCs are given in the corresponding legends in the plots.no matter when we have low or high level counts. It is also noticeable that DyNB andDESeq2-min are more sensitive with variable performances across 20 randomly generateddatasets. This indicates that GMNB better borrows signal strengths across time pointscompared to DyNB and DESeq2-min.
In addition to synthetic data based on the GMNB and DyNB models, we evaluate thesemethods with the simulated data based on the NB-AR(1) model [Oh et al., 2013]. Moreprecisely, the count for gene k at time t is distributed according to a NB distribution whosemean parameter satisfies log( µ ( t ) k ) = ω ( t ) k + β k . Here β k follows the uniform distributionin [4 . , .
5] to test the temporal differential expression performance with low read counts.The parameter ω ( t ) k is obtained through an auto-regressive process φ k ω ( t − k + (cid:15) ( t ) , where φ k is randomly generated from the uniform distribution in [0 . , . (cid:15) ( t ) is a standard13ero-mean white noise process. Similar to the previous two simulation models, read countsare generated for 1000 genes and 10% of them selected to be differentially expressed bychanging the parameter φ k to bφ k for the second condition, where b = / φ k ≤ . / φ k > . φ k that controls the temporal dependence of gene expression. However, thetemporal correlation assumptions of the Gaussian process, different from this generativemodel, makes it less powerful to identify all differentially expressed genes. The resultsin Figure 3 demonstrate the higher power of GMNB in detecting temporal differentialexpression patterns, especially with low expression levels (read counts are approximately150 here). Similar to the DyNB generative model, we compare the performance of GMNBwith DyNB and DESeq2-min (top performing methods) with different expected counts. InFigure 4, additional simulations are performed with the same parameters as above, exceptthe uniform distribution for the β k with three varying intervals: [4 . , . . , . . , . Recall P rec i s i o n GMNB, AUC = 0.52DyNB, AUC = 0.37DESeq2-min, AUC = 0.43DESeq2-avg, AUC = 0.39DESeq2-GLM, AUC = 0.28 0 0.2 0.4 0.6 0.8 1
False Positive Rate T r u e P o s i t i v e R a t e GMNB, AUC = 0.84DyNB, AUC = 0.66DESeq2-min, AUC = 0.80DESeq2-avg, AUC = 0.75DESeq2-GLM, AUC = 0.69
Figure 3:
Left column:
PR Curve,
Right column:
ROC Curve. Performance compari-son of different methods for differential gene expression over time based on the NB-AR(1)generative model. AUCs are given in the corresponding legends in the plots.erative models, which can have abrupt non-smooth changes. In addition, the heuristicestimation of model parameters adopted in DyNB [ ¨Aij¨o et al., 2014] when the number ofreplicates is low can be the other reason for the degraded performance.In summary, on synthetic RNA-seq count data from different generative models, com-parison of both the ROC and PR curves shows that GMNB outperforms both the recentlyproposed temporal (DyNB) and static differential analysis methods that aggregate dif-ferential statistics in heuristic ways (DESeq2 with different setups). Table 1 summarizesthe average AUCs and their standard deviation values of both ROC and PR curves for20 randomly generated synthetic datasets by the top three performing methods (GMNB,DyNB, and DESeq2-min). GMNB improves the performances of DyNB and DESeq2-min,in terms of AUC-PR, at least by 23% and 17%, respectively. In the best case scenario,GMNB improves the AUC-PR performances of DyNB and DESeq2-min up to 48% and71%, respectively. In terms of AUC-ROC, GMNB improves the best case performances15able 1: Comparison of AUCs based on 20 runs for each method.
AUC Generative Model GMNB DyNB DESeq2-minGMNB 0.84 ± ± ± ROC DyNB 0.94 ± ± ± NB-AR(1) 0.81 ± ± ± GMNB 0.61 ± ± ± PR DyNB 0.79 ± ± ± NB-AR(1) 0.51 ± ± ± To further illustrate how GMNB may help identify differentially expressed genes fromtemporal RNA-seq data for biologically significant results, we provide such a case studyconsisting of 57 human samples during the priming of T helper 17 (Th17) cell differ-entiation [Tuomela et al., 2012]. The main goal of designing this case study is to gaininsights into the differentiation process by unraveling dependency between different ge-netic factors in various pathways, which may serve as potential biomarkers of immuno-logical diseases for therapeutic intervention design. In this dataset [Tuomela et al., 2016],at 0 , . , , , , , , ,
48, and 72 hours of Th17 polarized cells and control Th0 cells,three biological replicates were collected for transcript profiling by RNA-seq. The data16
500 600 150
AUC − P R . . . . . . GMNBDyNBDESeq2−min
AUC − R O C . . . . . . DyNB generative model
450 250 150
AUC − P R . . . . . .
450 250 150
AUC − R O C . . . . . . NB−AR(1) generative model
Figure 4: AUC comparison of different methods for differential gene expression analysisover time in low counts.were downloaded from Gene Omnibus with the accession number GSE52260 [Tuomelaet al., 2016, Chan et al., 2016].When checking the 10 most differentially expressed genes based on their BFs byGMNB, all of them have been reported to be differentially expressed in other studiesinvestigating Th17 cell differentiation. Among them, the top differentially expressed geneis thrombospondin-1 (TSP1), whose encoded protein participates in the differentiationof Th17 cells by activating transforming growth factor beta (TGF- β ) and enhancingthe inflammatory response in experimental autoimmune encephalomyelitis (EAE) [Yang17t al., 2009]. The second gene in the list is Lymphotoxin α (LTA), a member of the tumornecrosis factor (TNF) superfamily that is both secreted and expressed on the cell surfaceof activated Th17 cells [Chiang et al., 2009]. The third gene, COL6A3, contributes toadipose tissue inflammation [Pasarica et al., 2009] and responds quickly to Th17 cellpolarizing stimulation [Tripathi et al., 2017]. The gene Cathepsin L (CTSL1) is rankedas the fourth in the list and is linked to the regulation of immune responses at the level ofMHC complex maturation and Ag presentation influencing differentiation of CD4+ cellsand autoimmune reactions [Reiser et al., 2010]. The fifth gene, FURIN, has been reportedas a T cell activation gene that regulates the T helper cell balance of the immune system[Pesu et al., 2008]. The sixth gene lamin A (LMNA) has been identified as one of theimmune response regulators [Gonz´alez-Granado et al., 2014]. The seventh gene, FilaminA (FLNA), is required for T cell activation [Hayashi and Altman, 2006]. The eighthgene, SBNO2, has been reported to influence Th17 cell differentiation [Tripathi et al.,2017]. Zhao et al. [2014] observed significant changes in the expression of the ninth geneACTB in activated T cells. Finally, the tenth gene Notch1 is activated in both mouseand human in vitro-polarized Th17 cells and also in Th17 polarized cells as comparedwith Th0 control cells [Keerthivasan et al., 2011].We then investigate how the results of DyNB differ from those of GMNB. The majorityof the above genes are indeed ranked relatively high by DyNB as differentially expressed,except two genes: FLNA and ACTB. For these two genes, their expression levels changeabruptly after 12 hours of T17 differentiation. These two genes demonstrate that theDyNB method may fail to detect temporal differential expression when the temporalgene expression trends are not smooth. As an instance, Figure 5 illustrates that DyNBis not able to capture the temporal expression changes of gene FLNA accurately. Moreprecisely, Figure 5(a) shows the posterior means of expected gene expression µ k based onDyNB and their corresponding confidence intervals, where circles and diamonds representthe normalized counts from Th0 and Th17 lineages, respectively. To further assess the18ower of the models in reproducing the observed gene counts, for each model, we generate1000 gene counts per sample and time point based on the inferred parameters, andthen calculate the 99% confidence interval using these synthetically generated counts.Figure 5(b) demonstrates the means and confidence intervals of the counts generated viathis procedure for DyNB, where the circles and diamonds represent the observed rawcounts from Th0 and Th17 lineages, respectively. Similar to plots in Figures 5(a) and5(b), we perform the same examinations on expression pattern of FLNA by the GMNBmodel. To demonstrate the expression levels of the k th gene between two groups, theDyNB compares the posterior NB mean parameters µ k , whereas the GMNB compares theposterior NB shape parameters r k . One may consider that the expression level of gene k is assumed to roughly follow a function of the shape parameter r k in the GMNB, but theobserved counts should be demonstrated in a same scale as the shape parameter. Thedifference between the posterior shape parameters r k explains the differences betweenthe means, since if n ( t ) kj ∼ NB( r ( t ) k , p ( t ) j ), then E [ n ( t ) kj ] = r ( t ) k p ( t ) j / (1 − p ( t ) j ). Therefore,Figure 5(c) shows the posterior means of r k based on GMNB and their correspondingconfidence intervals, where the circles and diamonds are obtained by dividing the observedcounts by the parameter p ( t ) j / (1 − p ( t ) j ) representing the sequencing depth in the proposedmodel. Additionally, Figure 5(d) demonstrates the means and confidence intervals forsynthetically generated gene counts based on the inferred parameters of GMNB, wherethe read counts on the y-axis are observed read counts. Not only GMNB improves themodel fitting over 24h to 72h, but also it has more robust estimation of expression patternsfor the starting time points with lower counts (Figures 5(c) and 5(d)). The calculatedBFs for the gene FLNA are 2.3461 and 1 . × by DyNB and GMNB, respectively.GMNB also identifies ACTB as a gene with significant differential temporal expression(BF >
10) but DyNB again fails to capture the abrupt expression changes and therebyassociates low BF (supplement materials). The corresponding temporal expression plotsare depicted in Figure S1 of the supplement materials.19 a) (b)(c) (d)
Figure 5:
Differentially expressed gene FLNA detected by GMNB but not byDyNB. (a) The normalized gene expression profile of
FLNA over time estimated byDyNB model. The normalization of read counts on the y-axis are obtained by usingthe normalization method of DESeq. The solid blue and red curves are the posteriormeans under Th0 and Th17 lineages, respectively, with corresponding 99% CIs (shadedareas). (b) The gene expression profile of
FLNA over time estimated by DyNB. Theread counts on the y-axis are observed read count. The solid blue and red curves arethe means of the generated samples based on the inferred parameters by DyNB underTh0 and Th17 lineages, respectively, with corresponding 99% CIs (shaded areas aroundmeans). (c) The normalized gene expression profile of
FLNA over time estimated by theproposed GMNB model. The normalization of read counts on the y-axis are obtained bydividing the observed counts by the parameter p ( t ) j / (1 − p ( t ) j ) representing the sequencingdepth in the model. The solid blue and red curves are posterior means of r k under Th0and Th17 lineages, respectively, with corresponding 99% CIs (shaded areas). (d) Thegene expression profile of FLNA over time estimated by GMNB. The read counts onthe y-axis are observed read count. The solid blue and red curves are the means of thegenerated samples based on the inferred parameters by GMNB under Th0 and Th17lineages, respectively, with corresponding 99% CIs (shaded areas around means).20n the other hand, LGALS1, SEPT5, BATF3, COL1A2, and ENO2 are five genesout of 90 differentially expressed genes detected by DyNB with BFs 2 . × , 472 . . × , 404 .
34, and 398 .
43, whereas they are associated with BFs lower than 10 byGMNB. Figure 6 illustrates the expression profile of the gene LGALS1 inferred by DyNBand GMNB, indicating that DyNB is not able to filter out those low count genes forwhich the replicated Th0 and Th17 lineages are seemingly similar and leads to this po-tential false positive. Figure 6(a) shows the posterior means of expected gene expression µ k based on DyNB under Th0 and Th17 lineages with their corresponding confidence in-tervals. Figure 6(b) shows the means and confidence intervals for 1000 generated samplesbased on the inferred parameters of DyNB model. While the normalized counts are plot-ted in Figure 6(a), the circles and diamonds mark Th0 and Th17 lineages, respectively,for the observed raw counts in Figure 6(b). On the contrary, GMNB considers this genenot significantly differentially expressed with similar inferred temporal expression profilesacross conditions, as demonstrated in Figures 6(c) and 6(d). This may be explained bythe fact that GMNB employs a fully generative model of gene expressions, including thesequencing depth, while DyNB uses a deterministic ad-hoc procedure to normalize genecounts, and thus neglecting the uncertainty over the sequencing depth when computingthe BF, leading to potential false positives. Figures 6(c) and 6(d) demonstrate the pos-terior means of r k based on GMNB and the means of synthetically generated samplesbased on the inferred parameters of the proposed model, respectively. Similar to theplots for LGALS1, Figures S8, S9, S10, and S11 in the supplement materials show thesimilar trends for the genes SEPT5, BATF3, COL1A2, and ENO2 based on the resultsby DyNB and GMNB.In order to further demonstrate the advantages of GMNB, the overlap of three ap-proaches (GMNB, DyNB and DESeq2-min), for 100 top differentially expressed genesidentified by GMNB, is depicted as a Venn diagram in Figure 7. A gene is differen-tially expressed based on DESeq2-min if the corresponding p-value < .
05 at any time21 a) (b)(c) (d)
Figure 6:
Example of genes detected as differentially expressed by DyNB butnot by GMNB: LGALS1. (a) The normalized gene expression profile of
LGALS1 over time estimated by DyNB model. The normalization of read counts on the y-axis areobtained by using the normalization method of DESeq. The solid blue and red curves arethe posterior means under Th0 and Th17 lineages, respectively, with corresponding 99%CIs (shaded areas). (b) The gene expression profile of
LGALS1 over time estimatedby DyNB. The read counts on the y-axis are observed read count. The solid blue andred curves are the means of the generated samples based on the inferred parameters byDyNB under Th0 and Th17 lineages, respectively, with corresponding 99% CIs (shadedareas around means). (c) The normalized gene expression profile of
LGALS1 over timeestimated by the proposed GMNB model. The normalization of read counts on the y-axisare obtained by dividing the observed counts by the parameter p ( t ) j / (1 − p ( t ) j ) representingthe sequencing depth in the model. The solid blue and red curves are posterior meansof r k under Th0 and Th17 lineages, respectively, with corresponding 99% CIs (shadedareas). (d) The gene expression profile of LGALS1 over time estimated by GMNB. Theread counts on the y-axis are observed read count. The solid blue and red curves arethe means of the generated samples based on the inferred parameters by GMNB underTh0 and Th17 lineages, respectively, with corresponding 99% CIs (shaded areas aroundmeans). 22oint. Out of top 100 differentially expressed genes identified by GMNB (log(BF) > α that is critical to drive Th17 differentiation[Corcoran and ONeill, 2016]. EGR2 has been identified as an important transcriptionfactor in the development and function of Th17 cells [Zhang et al., 2015, Zhu et al., 2008].IL6ST is known as a signature transcript of Th17 cells [Ghoreschi et al., 2010]. This againillustrates the benefits of GMNB on better modeling temporal dynamic changes to detectbiologically meaningful genes who show significant difference in temporal changes but donot show significant differential expression when studying them at individual time points. To further demonstrate the biological relevance of the detected genes by GMNB, GOanalysis of top 100 differentially expressed genes (log(BF) > DESeq2−minGMNB DyNB
Figure 7: A Venn diagram representing the overlaps of the top 100 differentially expressedgenes detected by GMNB with DyNB and DESeq2-min.process. The most significantly enriched GO terms are related to the organ develop-ment (p-value < × − ), immune system process (p-value < × − ), immuneresponse (p-value < × − ), response to stimulus (p-value < − ), cell differentiation(p-value < × − ), and defense response (p-value < × − ). In particular, 38% and74% of these 100 genes are annotated to immune response and response to stimulus, re-spectively, supported by the central role of Th17 cells in the pathogenesis of autoimmuneand inflammatory diseases [Waite and Skokos, 2011]. We further analyze the second temporal RNA-seq dataset, for which DyNB was imple-mented for studying Th17 cell lineage [ ¨Aij¨o et al., 2014]. In this dataset, CD4+ T cellswere activated and polarized as described in Tuomela et al. [2012] and RNA-seq datawere collected at 0, 12, 24, 48 and 72 hours of both the activation (Th0) and differenti-24able 2: Comparison of BF ranks for the reported genes by DyNB
Genes DyNB GMNBRORC
37 (BF = 2 . × ) (BF = 2 . × ) IL17F
352 (BF = 1 . × ) (BF = 2 . × ) IL17A
755 (BF = 6 . × ) (BF = 3 . × )ation (Th17). At each time point, there are 3 biological replicates for both cell lineages.The original paper [ ¨Aij¨o et al., 2014] performed DyNB to quantify Th17-specific geneexpression dynamics.The authors in ¨Aij¨o et al. [2014] first normalized the RNA-seq counts by the DESeqpipeline [Anders and Huber, 2010]. Then, DyNB was applied to the normalized expressionvalues to identify differentially expressed genes between the Th0 and Th17 lineages.Genes were considered differentially expressed if (i) BF >
10, and (ii) fold-change > IL A , IL F , and RORC .We apply GMNB to analyze the same Th17 cell lineage dataset to identify differen-tially expressed genes. To compare the ranked lists of genes by GMNB and by DyNBrespectively, Table 1 gives the ranks as well as the computed BF values by GMNB andDyNB for these reported genes in ¨Aij¨o et al. [2014]. These qRT-PCR validated genes arein fact ranked higher by GMNB, indicating more promising potential for marker geneidentification.
GMNB offers a comprehensive and fully Bayesian solution to study temporal RNA-seqdata. The most notable advantage is the capacity to capture a broad range of gene ex-pression patterns over time by the integration of a gamma Markov chain into a negativebinomial distribution model. This allows GMNB to offer consistent performance over25ifferent generative models and makes it be robust for studies with different numbers ofreplicates by borrowing the statistical strength across both genes and samples. Anothercritical advantage is the efficient closed-form Gibbs sampling inference of the model pa-rameters, which improves the computational complexity compared to the state-of-the-artmethods. This is achieved by using a statistically well-founded data augmentation solu-tion. In addition, GMNB explicitly models the potential sequencing depth heterogeneityso that no heuristic preprocessing step is required. Experimental results on both syn-thetic and real-world RNA-seq data demonstrate the state-of-the-art performance of theGMNB method for temporal differential expression analysis of RNA-seq data.
This research was supported in part by the NSF Awards CCF-1553281, CCF-1718513,and the USDA NIFA Award 06-505570-01006 to X. Qian and the NSF Award DBI-1532188 and the funding support from QNRF (NPRP9-001-2-001, NPRP7-1634-2-604),CSTR (2017-01), and CONACYT to P. de Figueiredo.
References
A. Acharya, J. Ghosh, and M. Zhou. Nonparametric bayesian factor analysis for dynamiccount matrices. In
Artificial Intelligence and Statistics , pages 1–9, 2015.T. ¨Aij¨o, V. Butty, Z. Chen, V. Salo, S. Tripathi, C. B. Burge, R. Lahesmaa, andH. L¨ahdesm¨aki. Methods for time series analysis of rna-seq data with applicationto human th17 cell differentiation.
Bioinformatics , 30(12):i113–i120, 2014.S. Anders and W. Huber. Differential expression analysis for sequence count data.
Genome biology , 11(10):R106, 2010.S. Anders, P. T. Pyl, and W. Huber. Htseqa python framework to work with high-throughput sequencing data.
Bioinformatics , 31(2):166–169, 2015.Z. Bar-Joseph, A. Gitter, and I. Simon. Studying and modelling dynamic biologicalprocesses using time-series gene expression data.
Nature reviews. Genetics , 13(8):552,2012. 26. H. Chan, J. Intosalmi, S. Rautio, and H. L¨ahdesm¨aki. A subpopulation model toanalyze heterogeneous cell differentiation dynamics.
Bioinformatics , 32(21):3306–3313,2016.E. Y. Chiang, G. A. Kolumam, X. Yu, M. Francesco, S. Ivelja, I. Peng, P. Gribling,J. Shu, W. P. Lee, C. J. Refino, et al. Targeted depletion of lymphotoxin- α –expressingth1 and th17 cells inhibits autoimmune disease. Nature medicine , 15(7):766–773, 2009.E. P. Consortium et al. An integrated encyclopedia of dna elements in the human genome.
Nature , 489(7414):57, 2012.S. E. Corcoran and L. A. ONeill. Hif1 α and metabolic reprogramming in inflammation. The Journal of clinical investigation , 126(10):3699–3707, 2016.S. Z. Dadaneh, X. Qian, and M. Zhou. Bnp-seq: Bayesian nonparametric differentialexpression analysis of sequencing count data.
Journal of the American Statistical As-sociation , (just-accepted), 2017.Y. Doi, S. Oki, T. Ozawa, H. Hohjoh, S. Miyake, and T. Yamamura. Orphan nuclearreceptor nr4a2 expressed in t cells from multiple sclerosis mediates production of in-flammatory cytokines.
Proceedings of the National Academy of Sciences , 105(24):8381–8386, 2008.M. S. Fassett, W. Jiang, A. M. D’Alise, D. Mathis, and C. Benoist. Nuclear receptor nr4a1modulates both regulatory t-cell (treg) differentiation and clonal deletion.
Proceedingsof the National Academy of Sciences , 109(10):3891–3896, 2012.K. Ghoreschi, A. Laurence, X.-P. Yang, C. M. Tato, M. J. McGeachy, J. E. Konkel, H. L.Ramos, L. Wei, T. S. Davidson, N. Bouladoux, et al. Generation of pathogenic th17cells in the absence of tgf-[bgr] signalling.
Nature , 467(7318):967–971, 2010.J. Gnanaprakasam and R. Wang. Myc in regulating immunity: metabolism and beyond.
Genes , 8(3):88, 2017.J. M. Gonz´alez-Granado, C. Silvestre-Roig, V. Rocha-Perugini, L. Trigueros-Motos,D. Cibri´an, G. Morlino, M. Blanco-Berrocal, F. G. Osorio, J. M. P. Freije, C. L´opez-Ot´ın, et al. Nuclear envelope lamin-a couples actin dynamics with immunologicalsynapse architecture and t cell activation.
Science signaling , 7(322):ra37, 2014.T. J. Hardcastle and K. A. Kelly. bayseq: empirical bayesian methods for identifyingdifferential expression in sequence count data.
BMC bioinformatics , 11(1):422, 2010.K. Hayashi and A. Altman. Filamin a is required for t cell activation mediated by proteinkinase c- θ . The Journal of Immunology , 177(3):1721–1728, 2006.N. L. Johnson, A. W. Kemp, and S. Kotz.
Univariate discrete distributions , volume 444.John Wiley & Sons, 2005. 27. Keerthivasan, R. Suleiman, R. Lawlor, J. Roderick, T. Bates, L. Minter, J. Anguita,I. Juncadella, B. J. Nickoloff, I. C. Le Poole, et al. Notch signaling regulates mouseand human th17 differentiation.
The Journal of Immunology , 187(2):692–701, 2011.Y. Kurebayashi, S. Nagai, A. Ikejiri, M. Ohtani, K. Ichiyama, Y. Baba, T. Yamada,S. Egami, T. Hoshii, A. Hirao, et al. Pi3k-akt-mtorc1-s6k1/2 axis controls th17 differ-entiation by regulating gfi1 expression and nuclear translocation of ror γ . Cell reports ,1(4):360–373, 2012.C. W. Law, Y. Chen, W. Shi, and G. K. Smyth. Voom: precision weights unlock linearmodel analysis tools for rna-seq read counts.
Genome biology , 15(2):R29, 2014.N. Leng, J. A. Dawson, J. A. Thomson, V. Ruotti, A. I. Rissman, B. M. Smits, J. D.Haag, M. N. Gould, R. M. Stewart, and C. Kendziorski. Ebseq: an empirical bayeshierarchical model for inference in rna-seq experiments.
Bioinformatics , 29(8):1035–1043, 2013.N. Leng, Y. Li, B. E. McIntosh, B. K. Nguyen, B. Duffin, S. Tian, J. A. Thomson,C. N. Dewey, R. Stewart, and C. Kendziorski. Ebseq-hmm: a bayesian approach foridentifying gene-expression changes in ordered rna-seq experiments.
Bioinformatics ,31(16):2614–2622, 2015.J. Lienau, K. Schmidt-Bleek, A. Peters, H. Weber, H. J. Bail, G. N. Duda, C. Perka,and H. Schell. Insight into the molecular pathophysiology of delayed bone healing in asheep model.
Tissue Engineering Part A , 16(1):191–199, 2009.M. I. Love, W. Huber, and S. Anders. Moderated estimation of fold change and dispersionfor rna-seq data with deseq2.
Genome biology , 15(12):550, 2014.P. McCullagh. Generalized linear models.
European Journal of Operational Research , 16(3):285–292, 1984.A. Mortazavi, B. A. Williams, K. McCue, L. Schaeffer, and B. Wold. Mapping andquantifying mammalian transcriptomes by rna-seq.
Nature methods , 5(7):621–628,2008.M. J. Nueda, S. Tarazona, and A. Conesa. Next masigpro: updating masigpro biocon-ductor package for rna-seq time series.
Bioinformatics , 30(18):2598–2602, 2014.S. Oh, S. Song, G. Grabowski, H. Zhao, and J. P. Noonan. Time series expression analysesusing rna-seq: a statistical approach.
BioMed research international , 2013, 2013.M. Pasarica, B. Gowronska-Kozak, D. Burk, I. Remedios, D. Hymel, J. Gimble,E. Ravussin, G. A. Bray, and S. R. Smith. Adipose tissue collagen vi in obesity.
The Journal of Clinical Endocrinology & Metabolism , 94(12):5155–5162, 2009.M. Pesu, W. T. Watford, L. Wei, L. Xu, I. Fuss, W. Strober, J. Andersson, E. Shevach,M. Quezado, A. Roebroek, et al. T cell-expressed proprotein convertase furin is essentialfor maintenance of peripheral tolerance.
Nature , 455(7210):246, 2008.28. E. Rasmussen and C. K. Williams.
Gaussian processes for machine learning , volume 1.MIT press Cambridge, 2006.J. Reiser, B. Adair, and T. Reinheckel. Specialized roles for cysteine cathepsins in healthand disease.
The Journal of clinical investigation , 120(10):3421, 2010.M. D. Robinson, D. J. McCarthy, and G. K. Smyth. edger: a bioconductor package fordifferential expression analysis of digital gene expression data.
Bioinformatics , 26(1):139–140, 2010.S. Sawcer, G. Hellenthal, M. Pirinen, C. C. Spencer, N. A. Patsopoulos, L. Moutsianas,A. Dilthey, Z. Su, C. Freeman, S. E. Hunt, et al. Genetic risk and a primary role forcell-mediated immune mechanisms in multiple sclerosis.
Nature , 476(7359):214, 2011.A. Schein, H. Wallach, and M. Zhou. Poisson-gamma dynamical systems. In
Advancesin Neural Information Processing Systems , pages 5005–5013, 2016.D. Spies and C. Ciaudo. Dynamics in transcriptomics: advancements in rna-seq timecourse and downstream analysis.
Computational and structural biotechnology journal ,13:469–477, 2015.X. Sun, D. Dalpiaz, D. Wu, J. S. Liu, W. Zhong, and P. Ma. Statistical inference for timecourse rna-seq data using a negative binomial mixed-effect model.
BMC bioinformatics ,17(1):324, 2016.S. K. Tripathi, Z. Chen, A. Larjo, K. Kanduri, K. Nousiainen, T. ¨Aijo, I. Rica˜no-Ponce,B. Hrdlickova, S. Tuomela, E. Laajala, et al. Genome-wide analysis of stat3-mediatedtranscription during early human th17 cell differentiation.
Cell Reports , 19(9):1888–1901, 2017.S. Tuomela, V. Salo, S. K. Tripathi, Z. Chen, K. Laurila, B. Gupta, T. ¨Aij¨o, L. Oikari,B. Stockinger, H. L¨ahdesm¨aki, et al. Identification of early gene expression changesduring human th17 cell differentiation.
Blood , 119(23):e151–e160, 2012.S. Tuomela, S. Rautio, H. Ahlfors, V. ¨Oling, V. Salo, U. Ullah, Z. Chen, S. H¨am¨alist¨o,S. K. Tripathi, T. ¨Aij¨o, et al. Comparative analysis of human and mouse transcriptomesof th17 cell priming.
Oncotarget , 7(12):13416, 2016.J. C. Waite and D. Skokos. Th17 response and inflammatory autoimmune diseases.
International journal of inflammation , 2012, 2011.Z. Wang, M. Gerstein, and M. Snyder. Rna-seq: a revolutionary tool for transcriptomics.
Nature reviews genetics , 10(1):57–63, 2009.K. Yang, J. L. Vega, M. Hadzipasic, J. P. S. Peron, B. Zhu, Y. Carrier, S. Masli, L. V.Rizzo, and H. L. Weiner. Deficiency of thrombospondin-1 reduces th17 differentiationand attenuates experimental autoimmune encephalomyelitis.
Journal of autoimmunity ,32(2):94–103, 2009. 29. Yosef, A. K. Shalek, J. T. Gaublomme, H. Jin, Y. Lee, A. Awasthi, C. Wu, K. Karwacz,S. Xiao, M. Jorgolli, et al. Dynamic regulatory network controlling th17 cell differen-tiation.
Nature , 496(7446):461, 2013.M. Zhang, Y. Wang, J.-S. Wang, J. Liu, M.-M. Liu, and H.-B. Yang. The roles of egr-2in autoimmune diseases.
Inflammation , 38(3):972–977, 2015.S. Zhao, W.-P. Fung-Leung, A. Bittner, K. Ngo, and X. Liu. Comparison of rna-seq andmicroarray in transcriptome profiling of activated t cells.
PloS one , 9(1):e78644, 2014.M. Zhou and L. Carin. Negative binomial process count and mixture modeling.
IEEETransactions on Pattern Analysis and Machine Intelligence , 37(2):307–320, 2015.B. Zhu, A. L. Symonds, J. E. Martin, D. Kioussis, D. C. Wraith, S. Li, and P. Wang.Early growth response gene 2 (egr-2) controls the self-tolerance of t cells and preventsthe development of lupuslike autoimmune disease.