Tobias K. Karakach
National Research Council
BMC Bioinformatics | 2006
Peter D. Wentzell; Tobias K. Karakach; Sushmita Roy; M. Juanita Martinez; Chris Allen; Margaret Werner-Washburne
Background: Modeling of gene expression data from time course experiments often involves the use of linear models such as those obtained from principal component analysis (PCA), independent component analysis (ICA), or other methods. Such methods do not generally yield factors with a clear biological interpretation. Moreover, implicit assumptions about the measurement errors often limit the application of these methods to log-transformed data, destroying linear structure in the untransformed expression data.
Results: In this work, a method for the linear decomposition of gene expression data by multivariate curve resolution (MCR) is introduced. The MCR method is based on an alternating least-squares (ALS) algorithm implemented with a weighted least squares approach. The new method, MCR-WALS, extracts a small number of basis functions from untransformed microarray data using only non-negativity constraints. Measurement error information can be incorporated into the modeling process and missing data can be imputed. The utility of the method is demonstrated through its application to yeast cell cycle data.
Conclusion: Profiles extracted by MCR-WALS exhibit a strong correlation with cell cycle-associated genes, but also suggest new insights into the regulation of those genes. The unique features of the MCR-WALS algorithm are its freedom from assumptions about the underlying linear model other than the non-negativity of gene expression, its ability to analyze non-log-transformed data, and its use of measurement error information to obtain a weighted model and accommodate missing measurements.
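A minimal pure-Python sketch of the idea behind MCR-WALS may help: the data matrix X (rows = genes, columns = time points) is modeled as C·S with a small number of non-negative profiles, and each alternating step is a weighted least-squares fit with weights w = 1/σ² taken from measurement error estimates. A missing value gets zero weight, and its imputed value is read from the fitted model. The function names, the tiny ridge term, and enforcing non-negativity by clipping negative coefficients to zero are assumptions of this sketch, not details taken from the paper.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def transpose(M):
    return [list(col) for col in zip(*M)]


def weighted_nn_rows(X, W, B, ridge=1e-9):
    """Fit each row of X as a combination of the rows of B by weighted least squares.

    Row i minimizes sum_j W[i][j] * (X[i][j] - sum_k a_k * B[k][j])**2; negative
    coefficients are then clipped to zero (a crude non-negativity constraint).
    """
    k = len(B)
    out = []
    for xi, wi in zip(X, W):
        # Weighted normal equations: (B W_i B^T + ridge * I) a = B W_i x_i
        A = [[sum(w * B[p][j] * B[q][j] for j, w in enumerate(wi)) + (ridge if p == q else 0.0)
              for q in range(k)] for p in range(k)]
        rhs = [sum(w * x * B[p][j] for j, (x, w) in enumerate(zip(xi, wi))) for p in range(k)]
        out.append([max(0.0, v) for v in solve(A, rhs)])
    return out


def mcr_wals(X, W, S0, iters=50):
    """Alternate weighted fits of C (rows x k) and S (k x columns) so that X ~ C S."""
    S = [list(s) for s in S0]
    for _ in range(iters):
        C = weighted_nn_rows(X, W, S)  # fit every row of X to the current profiles
        S = transpose(weighted_nn_rows(transpose(X), transpose(W), transpose(C)))
    return C, S


# Toy example: two non-negative profiles, one measurement treated as missing.
C_true = [[1, 0], [0.8, 0.2], [0.6, 0.4], [0.4, 0.6], [0.2, 0.8], [0, 1]]
S_true = [[1, 0.5, 0, 0, 0.2], [0, 0.3, 1, 0.6, 0]]
X = [[sum(c[k] * S_true[k][j] for k in range(2)) for j in range(5)] for c in C_true]
W = [[1.0] * 5 for _ in X]    # w = 1/sigma^2; equal errors assumed for the toy data
X[2][1], W[2][1] = 0.0, 0.0   # zero weight: this entry plays no part in the fit
C, S = mcr_wals(X, W, S0=[X[0], X[-1]])  # initial profiles taken from two data rows
imputed = sum(C[2][k] * S[k][1] for k in range(2))
print(round(imputed, 6))
```

Here the imputed entry recovers the value that was removed (0.6·0.5 + 0.4·0.3 = 0.42), because the remaining weighted measurements determine both profiles. A production implementation would more likely use a proper non-negative least-squares solver per row than clipping.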
Magnetic Resonance in Chemistry | 2009
Miroslava Cuperlovic-Culf; Nabil Belacel; Adrian S. Culf; Ian C. Chute; Rodney J. Ouellette; Ian W. Burton; Tobias K. Karakach; John A. Walter
The global analysis of metabolites can be used to define the phenotypes of cells, tissues or organisms. Classifying groups of samples based on their metabolic profile is one of the main topics of metabolomics research. Crisp clustering methods assign each feature to one cluster, thereby omitting information about the multiplicity of sample subtypes. Here, we present the application of the fuzzy K‐means clustering method to the classification of samples based on metabolomics 1D 1H NMR fingerprints. The sample classification was performed on NMR spectra of cancer cell line extracts and of urine samples of type 2 diabetes patients and animal models. The cell line dataset included NMR spectra of lipophilic cell extracts for two normal and three cancer cell lines, the cancer cell lines comprising two invasive and one non‐invasive cancer. The second dataset included previously published NMR spectra of urine samples of human type 2 diabetics and healthy controls, mouse wild type and diabetes model, and rat obese and lean phenotypes. The fuzzy K‐means clustering method allowed more accurate sample classification in both datasets relative to the other tested methods, including principal component analysis (PCA), hierarchical clustering (HCL) and K‐means clustering. In the cell line samples, fuzzy clustering provided a clear separation of individual cell lines, groups of cancer and normal cell lines, as well as non‐invasive and invasive tumour cell lines. In the diabetes dataset, clear separation of healthy controls and diabetics in all three models was possible only by using the fuzzy clustering method.
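Fuzzy K-means (also called fuzzy c-means) differs from crisp K-means in that every sample receives a membership between 0 and 1 in each cluster, so a sample lying between groups is not forced into one of them. A minimal pure-Python sketch of the standard update rules follows. The fuzzifier m = 2, the farthest-point initialization, the fixed iteration count, and the toy two-dimensional "fingerprints" are assumptions of this illustration, not the settings used in the study, where each sample would be an NMR spectrum.

```python
import math


def fuzzy_kmeans(points, k, m=2.0, iters=100):
    """Return (centers, memberships) for fuzzy K-means with fuzzifier m > 1."""
    # Farthest-point initialization (an assumption of this sketch): start at the
    # first sample, then add the sample farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    U = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1))
        U = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if min(d) == 0.0:  # sample sits exactly on a center: full membership there
                j0 = d.index(0.0)
                U.append([1.0 if j == j0 else 0.0 for j in range(k)])
            else:
                U.append([1.0 / sum((d[j] / d[l]) ** (2.0 / (m - 1.0)) for l in range(k))
                          for j in range(k)])
        # Center update: mean of all samples weighted by membership ** m
        for j in range(k):
            w = [U[i][j] ** m for i in range(len(points))]
            centers[j] = [sum(wi * p[t] for wi, p in zip(w, points)) / sum(w)
                          for t in range(len(points[0]))]
    return centers, U


# Hypothetical toy data: two tight groups plus one sample lying between them
samples = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5), (5, 5)]
centers, U = fuzzy_kmeans(samples, k=2)
for s, u in zip(samples, U):
    print(s, [round(x, 2) for x in u])
```

The in-between sample ends up with roughly equal memberships in both clusters, which is the kind of information a crisp assignment discards.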
Analytica Chimica Acta | 2009
Tobias K. Karakach; Peter D. Wentzell; John A. Walter
NMR-based metabolomics is characterized by high-throughput measurements of the signal intensities of complex mixtures of metabolites in biological samples, typically by assaying bio-fluids or tissue homogenates. The ultimate goal is to obtain relevant biological information regarding differences in the patho-physiological conditions that the samples experience. For a long time now, this information has been obtained through the analysis of measured NMR signals via multivariate statistics. NMR data are quite complex, and using multivariate statistical methods such as principal components analysis (PCA) to analyze them assumes that the data are multivariate normal with errors that are independent and identically distributed normal (i.e. iid normal). There is a consensus that these assumptions do not always hold for these data and, thus, several methods have been devised to transform or weight the data prior to analysis by PCA. However, neither the structure of NMR measurement noise nor the extent to which violations of error homoscedasticity affect PCA results has been characterized. A comprehensive characterization of measurement uncertainties in NMR-based metabolomics was achieved in this work using an experiment designed to capture the contributions of several sources of error to the total variance in the measurements. The noise structure was found to be heteroscedastic and highly correlated with spectral characteristics that are similar to the mean of the spectra and their standard deviation. A model was subsequently developed that potentially allows errors in NMR measurements to be accurately estimated without the need for extensive replication.
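The general approach can be illustrated with a short sketch: estimate the standard deviation of each spectral point from replicate spectra, then fit a simple error model σ = a + b·mean so that errors for a new, unreplicated spectrum can be predicted from its intensities. The linear form of the model, the function names, and the toy replicates are assumptions for illustration only; the model actually developed in the paper is not described in this abstract.

```python
import statistics


def noise_from_replicates(replicates):
    """Per-point mean and standard deviation across replicate spectra (rows)."""
    cols = list(zip(*replicates))
    return [statistics.mean(c) for c in cols], [statistics.stdev(c) for c in cols]


def fit_error_model(means, sds):
    """Least-squares fit of sd ~ a + b * mean (an assumed, simple heteroscedastic model)."""
    mx, my = statistics.mean(means), statistics.mean(sds)
    sxx = sum((x - mx) ** 2 for x in means)
    sxy = sum((x - mx) * (y - my) for x, y in zip(means, sds))
    b = sxy / sxx
    return my - b * mx, b


# Toy replicates whose scatter is proportional to intensity (5% of the mean)
spectrum = [1.0, 2.0, 4.0, 8.0]
replicates = [[v * f for v in spectrum] for f in (0.95, 1.00, 1.05)]
means, sds = noise_from_replicates(replicates)
a, b = fit_error_model(means, sds)
# Predicted errors for points of a new, unreplicated spectrum
predicted_sd = [a + b * v for v in (3.0, 6.0)]
print(a, b, predicted_sd)
```

Errors predicted this way could then serve as weights (w = 1/σ²) for a weighted decomposition instead of the iid-normal assumption behind ordinary PCA.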
Journal of Natural Products | 2012
Feng Qiu; Ayano Imai; James B. McAlpine; David C. Lankin; Ian W. Burton; Tobias K. Karakach; Norman R. Farnsworth; Shao Nong Chen; Guido F. Pauli
The genus Actaea (including Cimicifuga) has been the source of ∼200 cycloartane triterpenes. While they are major bioactive constituents of complementary and alternative medicines, their structural similarity poses a major dereplication problem. Moreover, their trivial names seldom indicate the actual structure. This project develops two new tools for Actaea triterpenes that enable rapid dereplication of more than 170 known triterpenes and facilitate the elucidation of new compounds. A predictive computational model based on classification binary trees (CBTs) allows in silico determination of the aglycone type. This tool uses the methyl (Me) 1H NMR chemical shifts and may be applicable to other natural products. Actaea triterpene dereplication is supported by a new systematic naming scheme. A combination of CBTs, 1H NMR deconvolution, characteristic 1H NMR signals, and quantitative 1H NMR (qHNMR) led to the unambiguous identification of minor constituents in residually complex triterpene samples. Using a 1.7 mm cryo-microprobe at 700 MHz, qHNMR enabled characterization of residual complexity at the 10-20 μg level in a 1-5 mg sample. The identification of five co-occurring minor constituents, belonging to four different triterpene skeleton types, in a repeatedly purified natural product emphasizes the critical need to evaluate the residual complexity of reference materials, especially when they are used for biological assessment.
PLOS ONE | 2013
Barry E. Kennedy; Veronique G. LeBlanc; Tiffany Mailman; Debra Fice; Ian W. Burton; Tobias K. Karakach; Barbara Karten
Niemann-Pick Type C (NPC) disease is an autosomal recessive neurodegenerative disorder caused in most cases by mutations in the NPC1 gene. NPC1-deficiency is characterized by late endosomal accumulation of cholesterol, impaired cholesterol homeostasis, and a broad range of other cellular abnormalities. Although neuronal abnormalities and glial activation are observed in nearly all areas of the brain, the most severe consequence of NPC1-deficiency is a near complete loss of Purkinje neurons in the cerebellum. The link between cholesterol trafficking and NPC pathogenesis is not yet clear; however, increased oxidative stress in symptomatic NPC disease, increases in mitochondrial cholesterol, and alterations in autophagy/mitophagy suggest that mitochondria play a role in NPC disease pathology. Alterations in mitochondrial function affect energy and neurotransmitter metabolism, and are particularly harmful to the central nervous system. To investigate early metabolic alterations that could affect NPC disease progression, we performed metabolomics analyses of different brain regions from age-matched wildtype and Npc1−/− mice at pre-symptomatic, early symptomatic and late stage disease by 1H-NMR spectroscopy. Metabolic profiling revealed markedly increased lactate and decreased acetate/acetyl-CoA levels in Npc1−/− cerebellum and cerebral cortex at all ages. Protein and gene expression analyses indicated a pre-symptomatic deficiency in the oxidative decarboxylation of pyruvate to acetyl-CoA, and an upregulation of glycolytic gene expression at the early symptomatic stage. We also observed a pre-symptomatic increase in several indicators of oxidative stress and antioxidant response systems in Npc1−/− cerebellum. Our findings suggest that energy metabolism and oxidative stress may present additional therapeutic targets in NPC disease, especially if intervention can be started at an early stage of the disease.
Magnetic Resonance in Chemistry | 2009
Tobias K. Karakach; Richard Knight; Eva M. Lenz; Mark R. Viant; John A. Walter
Modeling NMR‐based metabolomics data often involves linear methods such as principal component analysis (PCA) and partial least squares (PLS). These methods aim to describe, respectively, the main variance in the data and the maximum covariance between the predictor variables and some response variable. If the experiment is designed to investigate temporal biological fluctuations, however, the factors obtained become difficult to interpret in a biological context. Moreover, when these methods are applied, an implicit assumption is made that the measurement errors exhibit an iid‐normal distribution, often limiting the extent of the information recovered. A method for the linear decomposition of NMR‐based metabolomics data by multivariate curve resolution (MCR), which has been used elsewhere for time course transcriptomics applications, is introduced and implemented via a weighted alternating least squares (ALS) approach. Measurement error information is incorporated into the modeling process, allowing the least squares projections to be performed in a maximum likelihood fashion. As a result, noise heteroscedasticity resulting from pH‐induced peak shifts can be modeled, eliminating the need for binning/bucketing. The utility of the method is demonstrated using two sets of temporal NMR metabolomics data: HgCl2‐induced nephrotoxicity in rat, and fish (Japanese medaka, Oryzias latipes) embryogenesis. Profiles extracted for the nephrotoxicity data exhibit strong correlations with metabolites consistent with temporal fluctuations in glucosuria. The concentrations of metabolites such as acetate, glucose, and alanine increase steadily, peak at Day 3 post dose, and return to basal levels at Day 8. Other metabolites, including citrate and 2‐oxoglutarate, exhibit the opposite behavior.
Although the fish embryogenesis data are more complex, the profiles extracted by the algorithm display characteristics that depict temporal variation consistent with processes associated with embryogenesis.
Molecular BioSystems | 2011
Dawn L. MacLellan; Diane Mataija; Alan A. Doucette; Weei Yuarn Huang; Chantale Langlois; Greg Trottier; Ian W. Burton; John A. Walter; Tobias K. Karakach
Urinary tract obstruction (UTO) results in renal compensatory mechanisms and may progress to irrecoverable functional loss and histologic alterations. The pathophysiology of this progression is poorly understood. We identified urinary metabolite alterations in a rodent model of partial and complete UTO using 1H nuclear magnetic resonance (1H-NMR) spectroscopy. Principal component analysis (PCA) was used for classification and discovery of differentiating metabolites. UTO was associated with elevated urinary levels of alanine, succinate, dimethylglycine (DMG), creatinine, taurine, choline-like compounds, hippurate, and lactate. Decreased urinary levels of 2-oxoglutarate and citrate were noted. The patterns of alteration in partial and complete UTO were similar except that an absence of elevated urinary osmolytes (DMG and hippurate) was noted in complete UTO. This pattern of metabolite alteration indicates impaired oxidative metabolism of the mitochondria in renal proximal tubules and production of renal protective osmolytes by the medulla. Decreased production of osmolytes in complete obstruction better elucidates the pathophysiology of progression from renal compensatory mechanisms to irrecoverable changes. Further confirmation of these potential biomarkers in children with UTO is necessary.
Analytical Chemistry | 2014
Feng Qiu; James B. McAlpine; David C. Lankin; Ian W. Burton; Tobias K. Karakach; Shao Nong Chen; Guido F. Pauli
The interpretation of NMR spectroscopic information for structure elucidation involves decoding of complex resonance patterns that contain valuable molecular information (δ and J), which is not readily accessible otherwise. We introduce a new concept of 2D-NMR barcoding that uses clusters of fingerprint signals and their spatial relationships in the δ−δ coordinate space to facilitate the chemical identification of complex mixtures. Similar to widely used general barcoding technology, the structural information of individual compounds is encoded as a specific pattern of their C,H correlation signals. Software-based recognition of these patterns enables the structural identification of the compounds and their discrimination in mixtures. Using the triterpenes from various Actaea (syn. Cimicifuga) species as a test case, heteronuclear multiple-bond correlation (HMBC) barcodes were generated on the basis of their structural subtypes from a statistical investigation of their δH and δC data in the literature. These reference barcodes allowed in silico identification of known triterpenes in enriched fractions obtained from an extract of A. racemosa (black cohosh). After dereplication, a differential analysis of heteronuclear single-quantum correlation (HSQC) spectra even allowed for the discovery of a new triterpene. The 2D barcoding concept has potential applications in natural product discovery projects, allowing for the rapid dereplication of known compounds and serving as a tool in the search for structural novelty within compound classes with established barcodes.
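To make the barcoding idea concrete, a reference barcode can be treated as a set of (δH, δC) cross-peak coordinates, and an observed spectrum scored by the fraction of those coordinates that have a peak within a tolerance window. This is a much simplified, hypothetical stand-in for the pattern recognition described above: the tolerances (0.03 ppm for 1H, 0.5 ppm for 13C), the scoring rule, the barcode names, and all coordinates are invented for illustration and are not the study's reference barcodes or software.

```python
def barcode_score(reference, observed, tol_h=0.03, tol_c=0.5):
    """Fraction of reference (dH, dC) cross-peaks matched by an observed peak.

    A reference peak counts as matched when some observed peak lies within
    tol_h ppm in the 1H dimension and tol_c ppm in the 13C dimension.
    """
    matched = sum(
        any(abs(h - oh) <= tol_h and abs(c - oc) <= tol_c for oh, oc in observed)
        for h, c in reference
    )
    return matched / len(reference)


def rank_barcodes(barcodes, observed):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = [(name, barcode_score(ref, observed)) for name, ref in barcodes.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical barcodes for two skeleton types (invented coordinates, in ppm)
barcodes = {
    "type_A": [(0.95, 19.5), (1.10, 26.0), (3.40, 78.0)],
    "type_B": [(0.98, 21.0), (1.45, 29.5), (4.50, 72.0)],
}
observed = [(0.96, 19.7), (1.11, 25.8), (3.41, 78.3), (2.00, 40.0)]
ranking = rank_barcodes(barcodes, observed)
print(ranking)
```

Requiring agreement in both dimensions at once is what lets a 2D pattern discriminate between compounds whose 1H shifts alone would overlap.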
Planta Medica | 2014
Michelle A. Markus; Jonathan Ferrier; Sarah M. Luchsinger; J Yuk; Alain Cuerrier; Michael J. Balick; Joshua M. Hicks; K. Brian Killday; Christopher W. Kirby; Fabrice Berrue; Russell G. Kerr; Kevin Knagge; Tanja Gödecke; Benjamin Ramirez; David C. Lankin; Guido F. Pauli; Ian W. Burton; Tobias K. Karakach; John T. Arnason; Kl Colson
A method was developed to distinguish Vaccinium species based on leaf extracts using nuclear magnetic resonance spectroscopy. Reference spectra were measured on leaf extracts from several species, including lowbush blueberry (Vaccinium angustifolium), oval leaf huckleberry (Vaccinium ovalifolium), and cranberry (Vaccinium macrocarpon). Using principal component analysis, these leaf extracts were resolved in the scores plot. Analysis of variance statistical tests demonstrated that the three groups differ significantly on PC2, establishing that the three species can be distinguished by nuclear magnetic resonance. Soft independent modeling of class analogies models for each species also showed discrimination between species. To demonstrate the robustness of nuclear magnetic resonance spectroscopy for botanical identification, spectra of a sample of lowbush blueberry leaf extract were measured at five different sites, with different field strengths (600 versus 700 MHz), different probe types (cryogenic versus room temperature probes), different sample diameters (1.7 mm versus 5 mm), and different consoles (Avance I versus Avance III). Each laboratory independently demonstrated the linearity of its NMR measurements by acquiring a standard curve for chlorogenic acid (R² = 0.9782 to 0.9998). Spectra acquired on different spectrometers at different sites were classified into the expected group for the Vaccinium spp., confirming the utility of the method to distinguish Vaccinium species and demonstrating nuclear magnetic resonance fingerprinting for material validation of a natural health product.
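The linearity check mentioned above amounts to fitting a straight calibration line and computing its coefficient of determination, R² = 1 − SS_res/SS_tot. A minimal sketch follows; the function name, the concentration units, and the signal values are made-up placeholders, not data from the participating laboratories.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line y = a + b*x and its coefficient of determination."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    ss_tot = sum((yi - my) ** 2 for yi in y)                        # total sum of squares
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical chlorogenic acid standards: concentration (mM) vs. integrated signal
conc = [0.5, 1.0, 2.0, 4.0, 8.0]
signal = [0.52, 0.98, 2.05, 3.96, 8.03]
intercept, slope, r2 = linear_fit_r2(conc, signal)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

An R² close to 1 indicates that the integrated signal is proportional to concentration over the calibrated range, which is the property each site needed to establish before comparing spectra.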
PLOS ONE | 2012
Patricia L. Mitchell; Tobias K. Karakach; Deborah L. Currie; Roger S. McLeod
Animal and human studies have indicated that fatty acids such as the conjugated linoleic acids (CLA) found in milk could potentially alter the risk of developing metabolic disorders including diabetes and cardiovascular disease (CVD). Using susceptible rodent models (apoE−/− and LDLr−/− mice), we investigated the interrelationship between mouse strain, dietary conjugated linoleic acids and metabolic markers of CVD. Despite an adverse metabolic risk profile, atherosclerosis (measured directly by lesion area) was significantly reduced with t-10, c-12 CLA and mixed isomer CLA (Mix) supplementation in both apoE−/− (p<0.05, n = 11) and LDLr−/− mice (p<0.01, n = 10). Principal component analysis was used to delineate the influence of multiple plasma and tissue metabolites on the development of atherosclerosis. Group clustering by dietary supplementation was evident, with the t-10, c-12 CLA supplemented animals having distinct patterns, suggestive of hepatic insulin resistance, regardless of mouse strain. The effect of CLA supplementation on hepatic lipid and fatty acid composition was explored in the LDLr−/− strain. Dietary supplementation with t-10, c-12 CLA significantly increased liver weight (p<0.05, n = 10), triglyceride (p<0.01, n = 10) and cholesterol ester content (p<0.01, n = 10). Furthermore, t-10, c-12 CLA also increased the ratio of 18∶1 to 18∶0 fatty acid in the liver, suggesting an increase in the activity of stearoyl-CoA desaturase. Changes in plasma adiponectin and liver weight with t-10, c-12 CLA supplementation were evident within 3 weeks of initiation of the diet. These observations provide evidence that the individual CLA isomers have divergent mechanisms of action and that t-10, c-12 CLA rapidly changes plasma and liver markers of metabolic syndrome, despite evidence of reduction in atherosclerosis.