Publication


Featured research published by Sandra E. Safo.


Computational Statistics & Data Analysis | 2016

General sparse multi-class linear discriminant analysis

Sandra E. Safo; Jeongyoun Ahn

Discrimination with high dimensional data is often more effectively done with sparse methods that use a fraction of predictors rather than using all the available ones. In recent years, some effective sparse discrimination methods based on Fisher's linear discriminant analysis (LDA) have been proposed for binary class problems. Extensions to multi-class problems are suggested in those works; however, they have some drawbacks such as the heavy computational cost for a large number of classes. We propose an approach to generalize a binary LDA solution into a multi-class solution while avoiding the limitations of the existing methods. Simulation studies with various settings, as well as real data examples including next generation sequencing data, confirm the effectiveness of the proposed approach.


Diabetic Medicine | 2017

Glucose challenge test screening for prediabetes and early diabetes

Sandra L. Jackson; Sandra E. Safo; Lisa R. Staimez; Darin E. Olson; K. M. V. Narayan; Qi Long; Joseph Lipscomb; Mary K. Rhee; Peter W.F. Wilson; Anne M. Tomolo; Lawrence S. Phillips

To test the hypothesis that a 50‐g oral glucose challenge test with 1‐h glucose measurement would have superior performance compared with other opportunistic screening methods.


Biometrics | 2018

Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information

Sandra E. Safo; Shuzhao Li; Qi Long

Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach.


Current Pharmacology Reports | 2017

Bioinformatics Tools for the Interpretation of Metabolomics Data

Luiz Gustavo Gardinassi; Jianguo Xia; Sandra E. Safo; Shuzhao Li

Purpose of Review: Metabolomics is a rapidly evolving field that generates large and complex datasets. Bioinformatics becomes critical towards the extraction of meaningful biological information. In this article, we briefly review computational approaches that have been well accepted in the field, and discuss the development of new methods and tools to interpret metabolomics data.

Recent Findings: Significant progress has been made in computational metabolomics over the past years. This includes methods that are used to preprocess data generated by instruments, to annotate metabolites, to carry out statistical analyses, to identify perturbed metabolic pathways, and to integrate metabolomics with other omics data. Each of these topics is discussed in respective sections of this review.

Summary: Bioinformatics tools used for metabolomics remain a highly active research area. An ecosystem is emerging with software libraries, standalone tools, and web-based tools and services. While some require bioinformatics training, many of them are user friendly and easily accessible. Much further development is still needed to serve the metabolomics field and its applications.


American Journal of Preventive Medicine | 2017

Reduced Cardiovascular Disease Incidence With a National Lifestyle Change Program

Sandra L. Jackson; Sandra E. Safo; Lisa R. Staimez; Qi Long; Mary K. Rhee; Solveig A. Cunningham; Darin E. Olson; Anne M. Tomolo; Usha Ramakrishnan; K.M. Venkat Narayan; Lawrence S. Phillips

INTRODUCTION Lifestyle change programs implemented within healthcare systems could reach many Americans, but their impact on cardiovascular disease (CVD) remains unclear. The MOVE! program is the largest lifestyle change program implemented in a healthcare setting in the U.S. This study aimed to determine whether MOVE! participation was associated with reduced CVD incidence. METHODS This retrospective cohort study, analyzed in 2013-2015, used national Veterans Health Administration databases to identify MOVE! participants and eligible non-participants for comparison (2005-2012). Patients eligible for MOVE! (obese or overweight with a weight-related health condition, and no baseline CVD) were examined (N=1,463,003). Of these, 169,248 (12%) were MOVE! participants. Patients were 92% male, 76% white, with mean age 52 years and BMI of 32. The main outcome was incidence of CVD (ICD-9 and procedure codes for coronary artery disease, cerebrovascular disease, peripheral vascular disease, and heart failure). RESULTS Adjusting for age, race, sex, BMI, statin use, and baseline comorbidities, over a mean 4.9 years of follow-up, MOVE! participation was associated with lower incidence of total CVD (hazard ratio [HR]=0.83, 95% CI=0.80, 0.86); coronary artery disease (HR=0.81, 95% CI=0.77, 0.86); cerebrovascular disease (HR=0.87, 95% CI=0.82, 0.92); peripheral vascular disease (HR=0.89, 95% CI=0.83, 0.94); and heart failure (HR=0.78, 95% CI=0.74, 0.83). The association between MOVE! participation and CVD incidence remained significant when examined across categories of race/ethnicity, BMI, diabetes, hypertension, smoking status, and statin use. CONCLUSIONS Although participation was limited, MOVE! was associated with reduced CVD incidence in a nationwide healthcare setting.


Biometrics | 2018

Sparse generalized eigenvalue problem with application to canonical correlation analysis for integrative analysis of methylation and gene expression data: SELP for CCA

Sandra E. Safo; Jeongyoun Ahn; Yongho Jeon; Sungkyu Jung

We present a method for individual and integrative analysis of high dimension, low sample size data that capitalizes on the recurring theme in multivariate analysis of projecting higher dimensional data onto a few meaningful directions that are solutions to a generalized eigenvalue problem. We propose a general framework, called SELP (Sparse Estimation with Linear Programming), with which one can obtain a sparse estimate for a solution vector of a generalized eigenvalue problem. We demonstrate the utility of SELP on canonical correlation analysis for an integrative analysis of methylation and gene expression profiles from a breast cancer study, and we identify some genes known to be associated with breast carcinogenesis, which indicates that the proposed method is capable of generating biologically meaningful insights. Simulation studies suggest that the proposed method performs competitively in comparison with some existing methods in identifying true signals in various underlying covariance structures.
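For readers unfamiliar with the link between canonical correlation analysis and a generalized eigenvalue problem, the following is a minimal illustrative sketch. It is not the SELP method from the abstract: it computes the ordinary dense (non-sparse) first canonical pair. The function name `cca_first_pair`, the small `ridge` term, and the simulated data are assumptions made for this example only.

```python
import numpy as np

def cca_first_pair(X, Y, ridge=1e-8):
    """Dense first canonical pair from the generalized eigenvalue problem
    Sxy Syy^{-1} Syx a = rho^2 Sxx a (illustration only, no sparsity)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariances; the tiny ridge (an assumption) keeps them invertible.
    Sxx = X.T @ X / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten with the Cholesky factor Sxx = L L^T to get a symmetric
    # ordinary eigenvalue problem M u = rho^2 u, with a = L^{-T} u.
    L = np.linalg.cholesky(Sxx)
    A = Sxy @ np.linalg.solve(Syy, Sxy.T)
    M = np.linalg.solve(L, np.linalg.solve(L, A).T).T
    M = (M + M.T) / 2.0  # remove rounding asymmetry

    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    rho = np.sqrt(max(eigvals[-1], 0.0))
    a = np.linalg.solve(L.T, eigvecs[:, -1])
    b = np.linalg.solve(Syy, Sxy.T @ a)
    b = b / np.sqrt(b @ Syy @ b)  # scale so that b^T Syy b = 1
    return a, b, rho

# Simulated data: two views sharing one latent signal z (hypothetical example).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = z @ rng.normal(size=(1, 5)) + rng.normal(size=(200, 5))
Y = z @ rng.normal(size=(1, 4)) + rng.normal(size=(200, 4))

a, b, rho = cca_first_pair(X, Y)
empirical = np.corrcoef(X @ a, Y @ b)[0, 1]
print(rho, empirical)
```

The leading eigenvalue gives the squared first canonical correlation, and the printed empirical correlation of the two projected views should agree with it closely. A sparse method would instead seek a solution vector with many exact zeros.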


Journal of Diabetes and Its Complications | 2017

Participation in a National Lifestyle Change Program is associated with improved diabetes control outcomes.

Sandra L. Jackson; Lisa R. Staimez; Sandra E. Safo; Qi Long; Mary K. Rhee; Solveig A. Cunningham; Darin E. Olson; Anne M. Tomolo; Usha Ramakrishnan; Venkat Narayan; Lawrence S. Phillips

AIMS Clinical trials show lifestyle change programs are beneficial, yet large-scale, successful translation of these programs is scarce. We investigated the association between participation in the largest U.S. lifestyle change program, MOVE!, and diabetes control outcomes. METHODS This longitudinal, retrospective cohort study used Veterans Health Administration databases of patients with diabetes who participated in MOVE! between 2005 and 2012, or met eligibility criteria (BMI ≥25 kg/m²) but did not participate. Main outcomes were diabetic eye disease, renal disease, and medication intensification. RESULTS There were 400,170 eligible patients with diabetes, including 87,366 (22%) MOVE! participants. Included patients were 96% male, 77% white, with mean age 58 years and BMI 34 kg/m². Controlling for baseline measurements and age, race, sex, BMI, and antidiabetes medications, MOVE! participants had lower body weight (-0.6 kg), random plasma glucose (-2.8 mg/dL), and HbA1c (-0.1%) at 12 months compared to nonparticipants (each p<0.001). In multivariable Cox models, MOVE! participants had lower incidence of eye disease (hazard ratio 0.80, 95% CI 0.75-0.84) and renal disease (HR 0.89, 95% CI 0.86-0.92) and reduced medication intensification (HR 0.82, 95% CI 0.80-0.84). CONCLUSIONS If able to overcome participation challenges, lifestyle change programs in U.S. health systems may improve health among the growing patient population with diabetes.


Bioinformatics | 2018

Penalized co-inertia analysis with applications to -omics data

Eun Jeong Min; Sandra E. Safo; Qi Long

Motivation: Co‐inertia analysis (CIA) is a multivariate statistical analysis method that can assess relationships and trends in two sets of data. Recently CIA has been used for an integrative analysis of multiple high‐dimensional omics data. However, for classical CIA, all elements in the loading vectors are nonzero, presenting a challenge for the interpretation when analyzing omics data. For other multivariate statistical methods such as canonical correlation analysis (CCA), penalized least squares (PLS), various approaches have been proposed to produce sparse loading vectors via l1‐penalization/constraint. We propose a novel CIA method that uses l1‐penalization to induce sparsity in estimators of loading vectors. Our method simultaneously conducts model fitting and variable selection. Also, we propose another CIA method that incorporates structure/network information such as those from functional genomics, besides using sparsity penalty so that one can get biologically meaningful and interpretable results.

Results: Extensive simulations demonstrate that our proposed penalized CIA methods achieve the best or close to the best performance compared to the existing CIA method in terms of feature selection and recovery of true loading vectors. Also, we apply our methods to the integrative analysis of gene expression data and protein abundance data from the NCI‐60 cancer cell lines. Our analysis of the NCI‐60 cancer cell line data reveals meaningful variables for cancer diseases and biologically meaningful results that are consistent with previous studies.

Availability and implementation: Our algorithms are implemented as an R package which is freely available at: https://www.med.upenn.edu/long‐lab/.

Supplementary information: Supplementary data are available at Bioinformatics online.
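As a generic illustration of how an l1 penalty produces exact zeros in a loading vector, the sketch below applies the soft-thresholding operator, which is the proximal map of the l1 norm. This is an assumption-laden toy example, not the penalized CIA algorithm or the R package described in the abstract; the function name `soft_threshold`, the vector values, and the threshold 0.1 are hypothetical.

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink each entry of v toward zero by lam; entries with
    absolute value at most lam become exactly zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# A dense loading vector: every element is nonzero (hypothetical values).
dense_loading = np.array([0.9, -0.05, 0.3, 0.02, -0.6])

# After thresholding, the small entries are set exactly to zero,
# so only three variables remain selected.
sparse_loading = soft_threshold(dense_loading, 0.1)
print(sparse_loading)
print(np.count_nonzero(sparse_loading))
```

Larger thresholds zero out more entries, which is the sense in which an l1 penalty performs variable selection while the model is being fitted.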


BMC Bioinformatics | 2017

Incorporating biological information in sparse principal component analysis with application to genomic data.

Ziyi Li; Sandra E. Safo; Qi Long

Background: Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection.

Results: Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related with glioblastoma.

Conclusions: The proposed sparse PCA methods Fused and Grouped sparse PCA can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings and potentially providing insights on molecular underpinnings of complex diseases.


IEEE International Conference on Data Science and Advanced Analytics | 2016

Sparse Linear Discriminant Analysis in Structured Covariates Space

Sandra E. Safo; Qi Long

Classification with high dimensional variables is a popular goal in many modern statistical studies. Fisher's linear discriminant analysis (LDA) is a common and effective tool for classifying entities into existing groups. It is well known that classification using Fisher's discriminant for high dimensional data is as bad as random guessing due to the many noise features that increase the misclassification rate. It has recently been acknowledged that complex biological mechanisms occur through multiple features working together, though individually these features may contribute to noise accumulation in the data. In view of this, it is important to perform classification with discriminant vectors that use a subset of important variables, while also utilizing prior biological relationships among features. We tackle this problem in this article and propose methods that incorporate variable selection into the classification problem, for the identification of important biomarkers. Furthermore, we incorporate into the LDA problem prior information on the relationships among variables using undirected graphs in order to identify functionally meaningful biomarkers. We compare our methods to existing sparse LDA approaches via simulation studies and real data analysis.

Collaboration


Dive into Sandra E. Safo's collaborations.

Top Co-Authors

Qi Long

University of Pennsylvania
