Yuekai Sun | Researchain

Publication

Featured research published by Yuekai Sun.

Annals of Statistics | 2016

Exact post-selection inference, with application to the lasso

Jason D. Lee; Dennis L. Sun; Yuekai Sun; Jonathan Taylor

We develop a general approach to valid inference after model selection. In a nutshell, our approach produces post-selection inferences that have the same frequency guarantees as those given by data splitting but are more powerful. At the core of our framework is a result that characterizes the distribution of a post-selection estimator conditioned on the selection event. We specialize the approach to model selection by the lasso to form valid confidence intervals for the selected coefficients and to test whether all relevant variables have been included in the model.
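As a rough illustration of what "conditioning on the selection event" can look like, the sketch below assumes the selection event is a set of linear inequalities A y <= b on the response y, and that y is Gaussian with isotropic noise. Both are assumptions made here for illustration; the abstract does not state the form. Under those assumptions, a linear statistic eta'y is restricted to an interval once the part of y orthogonal to eta is held fixed. The function name `truncation_interval` and the toy inputs are hypothetical and are not taken from the authors' software.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def truncation_interval(A, b, eta, y):
    """Interval for eta'y implied by {A y <= b}, holding fixed the part of y
    orthogonal to eta. Assumes y ~ N(mu, sigma^2 I) (illustrative assumption)."""
    eta_y = dot(eta, y)
    c = [e / dot(eta, eta) for e in eta]           # direction along eta
    z = [yi - ci * eta_y for yi, ci in zip(y, c)]  # remainder, orthogonal to eta
    v_lo, v_hi = -math.inf, math.inf
    for row, bj in zip(A, b):
        ac = dot(row, c)
        resid = bj - dot(row, z)
        if ac < 0:      # inequality gives a lower bound on eta'y
            v_lo = max(v_lo, resid / ac)
        elif ac > 0:    # inequality gives an upper bound on eta'y
            v_hi = min(v_hi, resid / ac)
    return v_lo, v_hi


# Toy selection event: y[0] >= y[1] and y[0] >= 0, written as A y <= b.
A = [[-1.0, 1.0], [-1.0, 0.0]]
b = [0.0, 0.0]
bounds = truncation_interval(A, b, eta=[1.0, 0.0], y=[2.0, 1.0])
print(bounds)  # (1.0, inf)
```

In this toy case, with y[1] fixed at 1, the event forces y[0] >= 1. Inference on y[0] would then use a Gaussian truncated to that interval rather than the unconditional Gaussian.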

SIAM Journal on Optimization | 2014

Proximal Newton-type methods for minimizing composite functions

Jason D. Lee; Yuekai Sun; Michael A. Saunders

We generalize Newton-type methods for minimizing smooth functions to handle a sum of two convex functions: a smooth function and a nonsmooth function with a simple proximal mapping. We show that the resulting proximal Newton-type methods inherit the desirable convergence behavior of Newton-type methods for minimizing smooth functions, even when search directions are computed inexactly. Many popular methods tailored to problems arising in bioinformatics, signal processing, and statistical learning are special cases of proximal Newton-type methods, and our analysis yields new convergence results for some of these methods.
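A minimal sketch of the idea, under simplifying assumptions not taken from the paper: the nonsmooth term is lam * ||x||_1, whose proximal mapping is soft-thresholding, and the Hessian of the smooth term is diagonal, so the proximal Newton subproblem separates by coordinate and has a closed form. The function names are hypothetical, and a general implementation would need a subproblem solver and a line search.

```python
def soft_threshold(v, t):
    """Proximal mapping of t * |.| evaluated at the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_newton_step_diag(x, grad, hess_diag, lam):
    """One unit-length proximal Newton step for g(x) + lam * ||x||_1,
    assuming the Hessian of g is diagonal with positive entries hess_diag.
    Each coordinate minimizes grad_i*d + 0.5*h_i*d^2 + lam*|x_i + d|."""
    return [
        soft_threshold(xi - gi / hi, lam / hi)
        for xi, gi, hi in zip(x, grad, hess_diag)
    ]


# Toy smooth part: g(x) = 0.5 * sum(a_i * (x_i - b_i)^2), Hessian = diag(a).
a = [2.0, 4.0]
b = [3.0, 0.1]
lam = 1.0
x0 = [0.0, 0.0]
grad0 = [ai * (xi - bi) for ai, xi, bi in zip(a, x0, b)]

x1 = prox_newton_step_diag(x0, grad0, a, lam)
print(x1)  # [2.5, 0.0]
```

Because the toy smooth part is quadratic with a diagonal Hessian, a single step already lands on the minimizer; the second coordinate is thresholded to exactly zero, the kind of sparsity an l1 term typically induces.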

Molecular Systems Biology | 2015

Do genome-scale models need exact solvers or clearer standards?

Ali Ebrahim; Eivind Almaas; Eugen Bauer; Aarash Bordbar; Anthony P. Burgard; Roger L. Chang; Andreas Dräger; Iman Famili; Adam M. Feist; Ronan M. T. Fleming; Stephen S. Fong; Vassily Hatzimanikatis; Markus J. Herrgård; Allen Holder; Michael Hucka; Daniel R. Hyduke; Neema Jamshidi; Sang Yup Lee; Nicolas Le Novère; Joshua A. Lerman; Nathan E. Lewis; Ding Ma; Radhakrishnan Mahadevan; Costas D. Maranas; Harish Nagarajan; Ali Navid; Jens Nielsen; Lars K. Nielsen; Juan Nogales; Alberto Noronha

Constraint‐based analysis of genome‐scale models (GEMs) arose shortly after the first genome sequences became available. As numerous reviews of the field show, this approach and methodology has proven to be successful in studying a wide range of biological phenomena (McCloskey et al, 2013; Bordbar et al, 2014). However, efforts to expand the user base are impeded by hurdles in correctly formulating these problems to obtain numerical solutions. In particular, in a study entitled “An exact arithmetic toolbox for a consistent and reproducible structural analysis of metabolic network models” (Chindelevitch et al, 2014), the authors apply an exact solver to 88 genome‐scale constraint‐based models of metabolism. The authors claim that COBRA calculations (Orth et al, 2010) are inconsistent with their results and that many published and actively used (Lee et al, 2007; McCloskey et al, 2013) genome‐scale models do support cellular growth in existing studies only because of numerical errors. They base these broad claims on two observations: (i) three reconstructions (iAF1260, iIT341, and iNJ661) compute feasibly in COBRA, but are infeasible when exact numerical algorithms are used by their software (entitled MONGOOSE); (ii) linear programs generated by MONGOOSE for iIT341 were submitted to the NEOS Server (a Web site that runs linear programs through various solvers) and gave inconsistent results. They further claim that a large percentage of these COBRA models are actually unable to produce biomass flux. Here, we demonstrate that the claims made by Chindelevitch et al (2014) stem from an incorrect parsing of models from files rather than actual problems with numerical error or COBRA computations.

Proceedings of the National Academy of Sciences of the United States of America | 2015

Systems biology definition of the core proteome of metabolism and expression is consistent with high-throughput data

Laurence Yang; Justin Tan; Edward J. O’Brien; Jonathan M. Monk; Donghyuk Kim; Howard J. Li; Pep Charusanti; Ali Ebrahim; Colton J. Lloyd; James T. Yurkovich; Bin Du; Andreas Dräger; Alex Thomas; Yuekai Sun; Michael A. Saunders; Bernhard O. Palsson

Significance: Defining a core functional proteome supporting the living process has importance for both developing fundamental understanding of cell functions and for synthetic biology applications. Comparative genomics has been the primary approach to achieve such a definition. Here, we use genome-scale models to define a core proteome that computationally supports basic cellular function. This core proteome for metabolism and protein expression, defined through systems biology methods, is validated and characterized by using multiple disparate data types.

Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed by using comparative genomics-based core proteome definitions. A definition of a core proteome that is supported by empirical data, is understood at the systems-level, and provides a basis for computing essential cell functions is lacking. Here, we use a systems biology-based genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes 212 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and has 78% gene function overlap with minimal genomes (Buchnera aphidicola and Mycoplasma genitalium). Based on transcriptomics data across environmental and genetic backgrounds, the systems biology core proteome is significantly enriched in nondifferentially expressed genes and depleted in differentially expressed genes. Compared with the noncore, core gene expression levels are also similar across genetic backgrounds (two times higher Spearman rank correlation) and exhibit significantly more complex transcriptional and posttranscriptional regulatory features (40% more transcription start sites per gene, 22% longer 5′UTR). Thus, genome-scale systems biology approaches rigorously identify a functional core proteome needed to support growth. This framework, validated by using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models; it de facto defines a paleome.

BMC Bioinformatics | 2013

Robust flux balance analysis of multiscale biochemical reaction networks.

Yuekai Sun; Ronan M. T. Fleming; Ines Thiele; Michael A. Saunders

Background: Biological processes such as metabolism, signaling, and macromolecular synthesis can be modeled as large networks of biochemical reactions. Large and comprehensive networks, like integrated networks that represent metabolism and macromolecular synthesis, are inherently multiscale because reaction rates can vary over many orders of magnitude. They require special methods for accurate analysis because naive use of standard optimization systems can produce inaccurate or erroneously infeasible results. Results: We describe techniques enabling off-the-shelf optimization software to compute accurate solutions to the poorly scaled optimization problems arising from flux balance analysis of multiscale biochemical reaction networks. We implement lifting techniques for flux balance analysis within the openCOBRA toolbox and demonstrate our techniques using the first integrated reconstruction of metabolism and macromolecular synthesis for E. coli. Conclusion: Our techniques enable accurate flux balance analysis of multiscale networks using off-the-shelf optimization software. Although we describe lifting techniques in the context of flux balance analysis, our methods can be used to handle a variety of optimization problems arising from analysis of multiscale network reconstructions.

Neural Information Processing Systems | 2012