Ming Hung Kao
Arizona State University
Publication
Featured research published by Ming Hung Kao.
NeuroImage | 2009
Ming Hung Kao; Abhyuday Mandal; Nicole A. Lazar; John Stufken
In this article, we propose an efficient approach to finding optimal experimental designs for event-related functional magnetic resonance imaging (ER-fMRI). We consider multiple objectives, including estimating the hemodynamic response function (HRF), detecting activation, circumventing psychological confounds, and fulfilling customized requirements. Taking these goals into account, we formulate a family of multi-objective design criteria and develop a genetic-algorithm-based technique to search for optimal designs. Our proposed technique incorporates existing knowledge about the performance of fMRI designs, and its usefulness is shown through simulations. Although our approach also works for other linear combinations of parameters, we primarily focus on the case where the interest lies either in the individual stimulus effects or in pairwise contrasts between stimulus types. In either of these popular cases, our algorithm outperforms previous approaches. We also find designs yielding higher estimation efficiencies than m-sequences. When the underlying model has white noise and a constant nuisance parameter, the stimulus frequencies of the designs we obtain are in good agreement with the optimal stimulus frequencies derived by Liu and Frank (2004, NeuroImage 21:387-400). In addition, our approach is built upon a rigorous model formulation.
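The genetic-algorithm idea can be sketched as follows. This is a minimal toy illustration, not the authors' algorithm: the discretized HRF, design length, white-noise model, and A-style fitness below are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

Q, N = 2, 60                                      # stimulus types, design length (assumed)
HRF = np.array([0.0, 0.3, 1.0, 0.8, 0.4, 0.1])    # crude discretized HRF (assumed)

def design_matrix(seq):
    """Convolve each stimulus type's indicator sequence with the assumed HRF."""
    X = np.zeros((N, Q))
    for q in range(Q):
        X[:, q] = np.convolve((seq == q + 1).astype(float), HRF)[:N]
    return X

def fitness(seq):
    """A-optimality-style efficiency under a white-noise linear model."""
    X = design_matrix(seq)
    M = X.T @ X + 1e-9 * np.eye(Q)                # small ridge keeps M invertible
    return Q / np.trace(np.linalg.inv(M))

def ga_search(pop_size=40, generations=200, p_mut=0.05):
    """Evolve stimulus sequences (0 = rest, 1..Q = stimulus types)."""
    pop = rng.integers(0, Q + 1, size=(pop_size, N))
    for _ in range(generations):
        scores = np.array([fitness(s) for s in pop])
        keep = pop[np.argsort(scores)[-pop_size // 2:]]   # fitter half survives
        children = keep.copy()
        for i in range(len(children)):                    # one-point crossover
            cut = rng.integers(1, N)
            partner = keep[rng.integers(len(keep))]
            children[i, cut:] = partner[cut:]
        mut = rng.random(children.shape) < p_mut          # random mutation
        children[mut] = rng.integers(0, Q + 1, size=mut.sum())
        pop = np.vstack([keep, children])
    scores = np.array([fitness(s) for s in pop])
    return pop[scores.argmax()], float(scores.max())

best, score = ga_search()
```

The paper's actual criteria weight several objectives (estimation, detection, confound avoidance); the single-objective fitness here only shows the search mechanics.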
The Annals of Applied Statistics | 2013
Ming Hung Kao; Dibyen Majumdar; Abhyuday Mandal; John Stufken
Previous studies on event-related functional magnetic resonance imaging experimental designs are primarily based on linear models, in which a known shape of the hemodynamic response function (HRF) is assumed. However, the HRF shape is usually uncertain at the design stage. To address this issue, we consider a nonlinear model that accommodates a wide spectrum of feasible HRF shapes, and propose efficient approaches for obtaining maximin and maximin-efficient designs. Our approaches involve a reduction of the parameter space and a search algorithm that efficiently searches a restricted class of designs for good designs. The obtained designs are compared with traditional designs widely used in practice. We also demonstrate the usefulness of our approaches via a motivating example.
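A minimal sketch of the maximin idea, with a random-search stand-in for the authors' algorithm: each candidate design is scored by its worst-case efficiency over a small set of plausible HRF shapes (the HRF family, design length, and efficiency measure below are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 48                                        # design length (assumed)

def gamma_hrf(length, peak):
    """A crudely parameterized HRF; the peak parameter is the uncertain quantity."""
    t = np.arange(length, dtype=float)
    h = t**peak * np.exp(-t)
    return h / h.max()

HRFS = [gamma_hrf(8, p) for p in (2, 3, 4)]   # candidate HRF shapes (assumed)

def efficiency(seq, hrf):
    """Precision of the stimulus-effect estimate under one HRF shape."""
    x = np.convolve(seq.astype(float), hrf)[:N]
    X = np.column_stack([np.ones(N), x])      # intercept + stimulus regressor
    M = X.T @ X + 1e-9 * np.eye(2)
    return 1.0 / np.linalg.inv(M)[1, 1]

def worst_case(seq):
    """Maximin criterion: a design is only as good as its worst HRF shape."""
    return min(efficiency(seq, h) for h in HRFS)

best, best_score = None, -np.inf
for _ in range(2000):                         # random search stand-in
    seq = rng.integers(0, 2, N)               # binary event/rest sequence
    s = worst_case(seq)
    if s > best_score:
        best, best_score = seq, s
```

A maximin-efficient design would instead normalize each efficiency by the best achievable value under that HRF before taking the minimum.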
Communications in Statistics-theory and Methods | 2009
Ming Hung Kao; Abhyuday Mandal; John Stufken
Event-related functional magnetic resonance imaging (ER-fMRI) is a leading technology for studying brain activity in response to mental stimuli. Due to the popularity and high cost of this pioneering technology, efficient experimental designs are in great demand. However, the complex nature of ER-fMRI makes it difficult to obtain such designs; it requires careful consideration regarding both statistical and practical issues as well as major computational efforts. In this article, we obtain efficient designs for ER-fMRI. In contrast to previous studies, we take into account a common practice where subjects undergo multiple scanning sessions in an experiment. To the best of our knowledge, this important reality has never been studied systematically for design selection. We compare several approaches to obtain efficient designs and propose a novel algorithm for this problem. Our simulation results indicate that, using our algorithm, highly efficient designs can be obtained.
Annals of Statistics | 2015
Ching Shui Cheng; Ming Hung Kao
Functional magnetic resonance imaging (fMRI) technology is widely used in many fields for studying how the brain reacts to mental stimuli. The identification of optimal fMRI experimental designs is crucial for rendering precise statistical inference on brain functions, but research on this topic is lacking. We develop a general theory to guide the selection of fMRI designs for estimating a hemodynamic response function (HRF), which models the effect of a mental stimulus over time, and for comparing two HRFs. We provide a useful connection between fMRI designs and circulant biased weighing designs, establish the statistical optimality of some well-known fMRI designs, and identify several new classes of fMRI designs. Construction methods for high-quality fMRI designs are also given.
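The connection to circulant designs can be illustrated under a circular (periodic) convolution approximation, where the model matrix for estimating a k-point HRF is generated by cyclic shifts of the stimulus sequence; the sequence and k below are arbitrary examples.

```python
import numpy as np

def circulant_model_matrix(seq, k):
    """Model matrix for estimating a k-point HRF under a circular
    (periodic) approximation: column j is the stimulus indicator
    sequence cyclically shifted by j time points."""
    seq = np.asarray(seq, dtype=float)
    return np.column_stack([np.roll(seq, j) for j in range(k)])

seq = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # an arbitrary binary stimulus sequence
X = circulant_model_matrix(seq, k=3)
M = X.T @ X                                # information matrix under white noise
# Every diagonal entry of M equals the number of stimulus presentations, and
# off-diagonal entries depend only on the circular lag between columns --
# the structure that links fMRI designs to circulant biased weighing designs.
```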
NeuroImage | 2017
Ming Hung Kao; Lin Zhou
This study concerns optimal designs for functional magnetic resonance imaging (fMRI) experiments when the model matrix of the statistical model depends on both the selected stimulus sequence (fMRI design) and the subject's uncertain feedback (e.g., answer) to each mental stimulus (e.g., question) presented to her/him. While practically important, this design issue is challenging, mainly because the information matrix cannot be fully determined at the design stage, making it difficult to evaluate the quality of the selected designs. To tackle this challenging issue, we propose an easy-to-use optimality criterion for evaluating the quality of designs, and an efficient approach for obtaining designs optimizing this criterion. Compared with a previously proposed method, our approach requires much less computing time to achieve designs with high statistical efficiencies.
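One simple surrogate for handling the unknown feedback at the design stage (not the paper's actual criterion) is to average a design's efficiency over Monte Carlo draws of the feedback; the HRF, feedback probability, and efficiency measure below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 40                                            # design length (assumed)
HRF = np.array([0.0, 0.4, 1.0, 0.7, 0.3, 0.1])    # assumed discretized HRF

def model_matrix(seq, feedback):
    """Two regressors: stimuli answered with feedback 1 vs. feedback 0."""
    cols = []
    for f in (1, 0):
        ind = ((seq == 1) & (feedback == f)).astype(float)
        cols.append(np.convolve(ind, HRF)[:N])
    return np.column_stack(cols)

def expected_efficiency(seq, p_feedback=0.6, draws=200):
    """Average an A-style efficiency over Monte Carlo draws of the feedback."""
    vals = []
    for _ in range(draws):
        fb = (rng.random(N) < p_feedback).astype(int)
        X = model_matrix(seq, fb)
        M = X.T @ X + 1e-9 * np.eye(2)
        vals.append(2.0 / np.trace(np.linalg.inv(M)))
    return float(np.mean(vals))

# compare two candidate stimulus sequences under the surrogate criterion
seq_a = rng.integers(0, 2, N)
seq_b = np.zeros(N, dtype=int)
seq_b[::4] = 1
score_a, score_b = expected_efficiency(seq_a), expected_efficiency(seq_b)
```

Averaging over simulated feedback is far more expensive than a closed-form criterion, which is the kind of cost an easy-to-use criterion is meant to avoid.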
Journal of biometrics & biostatistics | 2013
Ming Hung Kao
Estimating the influence of genetic or environmental factors in longitudinal data has been a challenge for genetic studies. While some researchers have modeled sequenced data with a few data points as repeated measurements in which covariance is incorporated with certain time series structures, to date, not much research has been done to address the issue of how genetic or environmental factors affect the underlying behavior of data with multiple measurements. In this study, we aim to build a statistical framework to model the anticipated association between the development of a disease and hypothesized genetic or environmental factors. The longitudinal phenotype data we modeled consist of three parts: the incubation period, the onset, and the disease-appearing period. Assuming the disease onset is missing, we propose a mixture model for the longitudinal phenotype, such as blood pressure, with association to the genotype or environmental factors. An EM-based approach is used to estimate the distribution of the onset, which might be related to either genotype or environmental factors. To estimate the onset distribution, a solution for a weighted logistic regression, in which weights are placed on the outcome variables, is proposed. A log-likelihood ratio test is used to judge the significance of the association between the distribution of the onset and the genotype or environmental factors. We conducted extensive simulation studies to evaluate the model's performance; the proposed test gave desirable Type I error rates and power.

Financial resources have always been a key factor in scientific studies. Financial constraints have become a burden for carrying out research, and we often face budget cuts in clinical trials. It is therefore crucial to achieve maximal power with limited financial resources in clinical trials. There are many publications in this area. Our focus in this paper is optimal sample size allocation with financial constraints.
For simplicity, our discussion is restricted to the comparison of two groups, but the methods can be generalized to multiple comparisons. We study two types of optimal sample size allocation under the financial constraint: (a) maximize power for detecting the difference of two sample means, two proportions, two survival rates, and two correlations; (b) minimize the variance of the difference of two sample means and the variance of the ratio of two sample means.

The PhenX Toolkit (consensus measures for Phenotypes and eXposures) is an online catalog of broadly validated and well-established measures of phenotypes and exposures for use in human subjects research. The PhenX Toolkit currently includes 339 measures covering a broad scope of 21 research domains, including Demographics, Anthropometrics, Cancer, Nutrition, Environmental Exposures, Neurology, and Social Environments, and six specialty areas related to Substance Abuse and Addiction (SAA). Investigators can find measures of interest by browsing domains, collections, or measures, or by searching using the Smart Query Tool. For each measure, the Toolkit provides a description of the measure, the rationale for its inclusion, detailed protocol(s) for collecting the data, and supporting documentation. The Toolkit also provides custom data collection worksheets to help investigators integrate PhenX measures into their study, and custom data dictionaries to facilitate data submission to the database of Genotypes and Phenotypes (dbGaP). To promote data interoperability, PhenX has adopted the national standards Logical Observation Identifiers Names and Codes (LOINC) and cancer Biomedical Informatics Grid (caBIG) Common Data Elements (CDEs). To help researchers find comparable data, PhenX measures and variables are mapped to studies in dbGaP. These mappings are displayed in dbGaP and highlight opportunities for cross-study analysis by researchers using PhenX measures.
PhenX RISING (Real world, Implementation, SharING) includes seven investigators who were awarded funds to incorporate PhenX measures into existing, population-based genomic studies. Of the 81 measures implemented by these investigators, 55 are shared by two or more groups, providing common ground for future cross-study analysis.

The aim of this project was to assess the applicability and robustness of Linear Quantile Mixed-Effects Models (LQMM), as compared to Linear Mixed-Effects Models, for the analysis of longitudinal data. Specifically, we focused on Linear Quantile Mixed-Effects Models using the Laplace distribution introduced by Geraci and Bottai in 2007. First, we address the advantages and disadvantages of LQMM when considering the median instead of the mean as the measure of central tendency. Second, we present the gain offered by LQMM when modeling the first and third quartiles as a valid and robust alternative to subgroup analysis. Data from the multicenter prospective randomized controlled trial of the effectiveness of amantadine hydrochloride in promoting recovery of function following severe traumatic brain injury, a randomized, placebo-controlled, double-blind clinical trial, were used in our application. LQMM can be very useful in characterizing treatment effects among subgroups of patients; in particular, we focus on the characterization based on patients in Vegetative State (VS) and Minimally Conscious State (MCS). We also discuss the inefficiency of LQMM when the outcome is measured with high precision.

The coefficient of variation CV = σ/μ is commonly used to measure the reproducibility of analytical techniques or equipment: the lower the CV, the better the analytical precision. To assess the reproducibility of methods used in proteomics or genomics, which yield a huge amount of correlated data, we propose to extend the univariate CV to the multivariate setting by the expression CV_m = [μᵀΣμ / (μᵀμ)²]^{1/2}.
The CV_m was applied to four different sample prefractionation methods to select the best starting point for further differential proteomic experiments by mass spectrometry (MS): highly abundant protein precipitation, restricted access materials (RAM) combined with IMAC chromatography, and peptide ligand affinity beads at two different pH levels (4 and 9), respectively. The reproducibility of the four methods (65-70 peaks) was evaluated with regard to mass and peak intensity (5-6 repetitions). CV_m was low for mass in all methods (range: 0.02-0.05%), but for intensity, CV_m was significantly lower for method 1 (highly abundant protein precipitation) than for the other methods (2.84% vs. 7.34-10%). Unlike univariate CVs calculated for each peak separately, CV_m yields a global reproducibility measure accounting for correlations between peak characteristics. (Joint work with Marianne Fillet and Lixin Zhang)

It is now well known that obesity leads to a number of metabolic and chronic diseases that affect a large portion of the population. Obesity is also associated with a higher incidence of psychological problems, decreased productivity, and lower educational and professional attainment. Rates of obesity and overweight are increasing for youth in Newfoundland & Labrador and are, along with adult rates, the highest among the Canadian provinces. Based on a 2005 study by Statistics Canada, a large proportion of Canadian adults were overweight (35%) or obese (24%), and Newfoundland and Labrador had the highest overweight rate (37%) in Canada. This complex trait is determined by multiple genetic and environmental factors that interact with one another in complicated ways. Existing studies examine such factors under the assumption that they are measured accurately. In reality, both genetic and environmental factors are likely measured with error, and measurement and/or misclassification (genotyping) error can influence the results of a study.
The impact of ignoring these errors varies from bias and large variability in the estimators to low power or even false-negative results in detecting genetic associations. Motivated by the Complex Diseases in the Newfoundland population: Environment and Genetics (CODING) study, we propose a Generalized Quasi-Likelihood estimation method for settings where both environmental and genetic factors are subject to error. Using simulation studies, we investigate the finite-sample performance of the estimators and show the impact of measurement error and/or misclassification in the covariates on the estimation procedure. The method is applied to the CODING data.

Functional magnetic resonance imaging (fMRI) is considered one of the leading technologies for studying human brain activity in response to mental stimuli. Well-planned experimental designs for fMRI are crucially important: they help researchers to collect informative data and achieve valid and precise statistical inference about the inner workings of our brains. Existing studies on fMRI designs are primarily based on linear models, in which a known shape of the hemodynamic response function (HRF) is assumed. However, the HRF shape is usually uncertain and can vary across brain regions. To address this issue, we consider, at the design stage, a nonlinear model allowing for a wide spectrum of feasible HRF shapes, and propose efficient approaches for obtaining both maximin and maximin-efficient designs that are relatively efficient across a class of possible HRF shapes. We present some theoretical results that help to reduce the space of the unknown model parameters, and demonstrate that good designs can be obtained over a restricted subclass of fMRI designs. The obtained designs are compared with designs that are widely used in practice.

Biography: Ming-Hung Kao completed his Ph.D. in 2009 at the University of Georgia.
He is currently an assistant professor in the School of Mathematical & Statistical Sciences at Arizona State University.

Next-generation sequencing (NGS) technologies will generate unprecedentedly massive (thousands or even tens of thousands of individuals) and high-dimensional (tens of millions of variables) genomic and epigenomic variation data. Due to advances in measurement technologies and communications, a large panel of physiological data, including all medical treatments and outcomes accumulated over the patient's lifetime, is monitored and collected. Analysis of these extremely big and diverse data sets provides invaluable information for disease prediction, prevention, diagnosis, and treatment, but also poses great computational challenges. To address these challenges, we formulate phenotype prediction and variable selection as a sparse sufficient dimension reduction problem, and develop a novel alternating-direction optimization algorithm to solve this high-dimensional reduction problem. The developed algorithms were applied to the NHLBI's Exome Sequencing data and whole-genome sequencing data with 13 lipid metabolism phenotypes; the results are very encouraging. Our work addresses the paradigm shift in genomic and health care data analysis from standard multivariate data analysis to functional data analysis, from low-dimensional to high-dimensional data analysis, from independent to dependent sampling, from single-type data analysis to integrated analysis of multiple data types, and from individual PCs to parallel computing.

Functional data with self-modeling structures may arise from biomedical study paradigms where response curves vary across subjects observed over time but are related through parametric transformations of a single latent curve. We present self-modeling regressions for flexible nonparametric modeling of longitudinally measured responses that share a common underlying global time profile.
Bayesian adaptive regression splines are used to provide nonparametric estimation of the latent curve, and Bayesian model selection for the time series via an autoregressive moving average (ARMA) model is incorporated. The algorithm is implemented using Markov chain Monte Carlo, where reversible-jump steps are performed for knot selection in the latent curve estimation and for selection of the ARMA orders. Our approach combines nonparametric regression and time series estimation to extend existing self-modeling regression approaches. We illustrate the method using intestinal current measurements collected from a multi-site prospective study to determine conductance of cystic fibrosis transmembrane regulation. We also discuss some of the computational difficulties that arise in applying the method.

We develop a probability model for evaluating long-term effects due to regular screening. People who take part in cancer screening are divided into four mutually exclusive groups: True-early-detection, No-early-detection, Overdiagnosis, and Symptom-free-life. For each case, we derive the probability formula. Simulation studies using data from the HIP (Health Insurance Plan of Greater New York) breast cancer study provide estimates for these probabilities and corresponding credible intervals. These probabilities change with a person's age at study entry, screening frequency, screening sensitivity, and other parameters. We also allow human lifetime to be subject to a competing risk of death from other causes. The model can provide policy makers with important information regarding the distribution of individuals participating in a screening program who eventually fall into one of the four groups.
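The four-group classification in the screening model can be mimicked with a toy Monte Carlo sketch; the exponential rates, screening schedule, and sensitivity below are illustrative assumptions, not the fitted HIP values.

```python
import random

random.seed(0)

GROUPS = ("true-early-detection", "no-early-detection",
          "overdiagnosis", "symptom-free-life")

def simulate_person(entry_age, screen_ages, sensitivity):
    """One draw from a toy progressive-disease model (all rates assumed)."""
    onset = entry_age + random.expovariate(1 / 40)     # preclinical onset age
    sojourn = random.expovariate(1 / 4)                # preclinical duration
    clinical = onset + sojourn                         # symptomatic onset age
    death = entry_age + random.expovariate(1 / 30)     # death from other causes
    for a in screen_ages:                              # periodic screening exams
        if onset <= a < min(clinical, death) and random.random() < sensitivity:
            # screen-detected: overdiagnosed if symptoms would never have appeared
            return "overdiagnosis" if clinical >= death else "true-early-detection"
    return "no-early-detection" if clinical < death else "symptom-free-life"

counts = {g: 0 for g in GROUPS}
for _ in range(100_000):
    counts[simulate_person(50, screen_ages=[51, 52, 53, 54], sensitivity=0.8)] += 1
```

Varying the entry age, screening schedule, or sensitivity shifts probability mass among the four groups, which is the kind of sensitivity the model is designed to quantify.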
Journal of Statistical Software | 2009
Ming Hung Kao
Journal of The Royal Statistical Society Series C-applied Statistics | 2012
Ming Hung Kao; Abhyuday Mandal; John Stufken
Statistics & Probability Letters | 2013
Ming Hung Kao
Statistics & Probability Letters | 2014
Ming Hung Kao