Publication


Featured research published by Can Yang.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

Moving Object Detection by Detecting Contiguous Outliers in the Low-Rank Representation

Xiaowei Zhou; Can Yang; Weichuan Yu

Object detection is a fundamental step for automated video analysis in many vision applications. Object detection in a video is usually performed by object detectors or background subtraction techniques. Often, an object detector requires manually labeled examples to train a binary classifier, while background subtraction needs a training sequence that contains no objects to build a background model. To automate the analysis, object detection without a separate training phase becomes a critical task. People have tried to tackle this task by using motion information. But existing motion-based methods are usually limited when coping with complex scenarios such as nonrigid motion and dynamic background. In this paper, we show that the above challenges can be addressed in a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR). This formulation integrates object detection and background learning into a single process of optimization, which can be solved by an alternating algorithm efficiently. We explain the relations between DECOLOR and other sparsity-based methods. Experiments on both simulated data and real sequences demonstrate that DECOLOR outperforms the state-of-the-art approaches and it can work effectively on a wide range of complex scenarios.


Bioinformatics | 2009

SNPHarvester: a filtering-based approach for detecting epistatic interactions in genome-wide association studies

Can Yang; Zengyou He; Xiang Wan; Qiang Yang; Hong Xue; Weichuan Yu

MOTIVATION: Hundreds of thousands of single nucleotide polymorphisms (SNPs) are available for genome-wide association (GWA) studies nowadays. The epistatic interactions of SNPs are believed to be very important in determining individual susceptibility to complex diseases. However, existing methods for SNP interaction discovery either suffer from high computational complexity or perform poorly when marginal effects of disease loci are weak or absent. Hence, it is desirable to develop an effective method to search for epistatic interactions on a genome-wide scale. RESULTS: We propose a new method, SNPHarvester, to detect SNP-SNP interactions in GWA studies. SNPHarvester creates multiple paths in which the visited SNP groups tend to be statistically associated with diseases, and then harvests those significant SNP groups which pass the statistical tests. It greatly reduces the number of SNPs. Consequently, existing tools can be directly used to detect epistatic interactions. Using a wide range of simulated data and a real genome-wide dataset, we demonstrate that SNPHarvester outperforms its recent competitor significantly and is promising for practical disease prognosis. AVAILABILITY: http://bioinformatics.ust.hk/SNPHarvester.html


Bioinformatics | 2010

Predictive rule inference for epistatic interaction detection in genome-wide association studies

Xiang Wan; Can Yang; Qiang Yang; Hong Xue; Nelson L.S. Tang; Weichuan Yu

MOTIVATION: In the current era of genome-wide association studies (GWAS), finding epistatic interactions in the large volume of SNP data is a challenging and unsolved issue. Few previous studies could handle genome-wide data, owing to the difficulties of searching the combinatorially explosive search space and of statistically evaluating high-order epistatic interactions given the limited number of samples. In this work, we propose a novel learning approach (SNPRuler) based on predictive rule inference to find disease-associated epistatic interactions. RESULTS: Our extensive experiments on both simulated data and real genome-wide data from the Wellcome Trust Case Control Consortium (WTCCC) show that SNPRuler significantly outperforms its recent competitor. To our knowledge, SNPRuler is the first method that guarantees to find the epistatic interactions without exhaustive search. Our results indicate that finding epistatic interactions in GWAS is computationally attainable in practice. AVAILABILITY: http://bioinformatics.ust.hk/SNPRuler.zip


BMC Bioinformatics | 2009

MegaSNPHunter: a learning approach to detect disease predisposition SNPs and high level interactions in genome wide association study

Xiang Wan; Can Yang; Qiang Yang; Hong Xue; Nelson L.S. Tang; Weichuan Yu

Background: The interactions of multiple single nucleotide polymorphisms (SNPs) are widely hypothesized to affect an individual's susceptibility to complex diseases. Although much work has been done to identify and quantify the importance of multi-SNP interactions, few methods can handle genome-wide data, owing to the combinatorially explosive search space and the difficulty of statistically evaluating high-order interactions given limited samples. Results: Three comparative experiments are designed to evaluate the performance of MegaSNPHunter. The first experiment uses synthetic data generated on the basis of epistasis models. The second one uses a genome-wide study on Parkinson's disease (data acquired by using Illumina HumanHap300 SNP chips). The third one uses the rheumatoid arthritis study from the Wellcome Trust Case Control Consortium (WTCCC), which used the Affymetrix GeneChip 500K Mapping Array Set. MegaSNPHunter outperforms the best solution in this area and reports many potential interactions for the two real studies. Conclusion: The experimental results on both synthetic data and two real data sets demonstrate that our proposed approach outperforms the best solution currently available for handling large-scale SNP data, both in terms of speed and in terms of detecting potential interactions that were not identified before. To our knowledge, MegaSNPHunter is the first approach that is capable of identifying the disease-associated SNP interactions from WTCCC studies and is promising for practical disease prognosis.


BMC Bioinformatics | 2010

Identifying main effects and epistatic interactions from large-scale SNP data via adaptive group Lasso.

Can Yang; Xiang Wan; Qiang Yang; Hong Xue; Weichuan Yu

Background: Single nucleotide polymorphism (SNP) based association studies aim at identifying SNPs associated with phenotypes, for example, complex diseases. The associated SNPs may influence the disease risk individually (main effects) or behave jointly (epistatic interactions). For the analysis of high-throughput data, the main difficulty is that the number of SNPs far exceeds the number of samples. This difficulty is amplified when identifying interactions. Results: In this paper, we propose an Adaptive Group Lasso (AGL) model for large-scale association studies. Our model enables us to analyze SNPs and their interactions simultaneously. We achieve this by introducing a sparsity constraint in our model based on the fact that only a small fraction of SNPs is disease-associated. In order to reduce the number of false positive findings, we develop an adaptive reweighting scheme to enhance sparsity. In addition, our method treats SNPs and their interactions as factors, and identifies them in a grouped manner. Thus, it is flexible enough to analyze various disease models, especially for interaction detection. However, because of the intensive computation required when millions of interaction terms need to be searched during model fitting, our method needs to be combined with a filtering method when applied to genome-wide data for detecting interactions. Conclusion: By using a wide range of simulated datasets and a real dataset from WTCCC, we demonstrate the advantages of our method.
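
As a rough illustration of what "identifying factors in a grouped manner" under a sparsity constraint can look like, the sketch below implements the standard group soft-thresholding operator associated with a group lasso penalty. It is not taken from the paper: the function name, the `lam` and `weight` parameters, and the idea that the adaptive weight simply scales the threshold are all assumptions made for illustration.

```python
import math


def group_soft_threshold(z, lam, weight=1.0):
    """Shrink a whole coefficient group toward zero (hypothetical helper).

    z      : coefficients of one factor, e.g. the dummy variables of one SNP
    lam    : sparsity level (assumed tuning parameter)
    weight : adaptive weight for this group (assumed); a larger weight
             penalises the group more heavily
    """
    threshold = lam * weight
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= threshold:
        # The entire group is removed at once, never coefficient by coefficient.
        return [0.0] * len(z)
    scale = 1.0 - threshold / norm
    return [scale * v for v in z]


print(group_soft_threshold([3.0, 4.0], lam=2.5))              # [1.5, 2.0]
print(group_soft_threshold([3.0, 4.0], lam=2.5, weight=2.0))  # [0.0, 0.0]
```

In this sketch a group either survives with all of its coefficients shrunk by the same factor or is dropped entirely, and raising the adaptive weight of a group is enough to drop it.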


Bioinformatics | 2013

Accounting for non-genetic factors by low-rank representation and sparse regression for eQTL mapping

Can Yang; Lin Wang; Shuqin Zhang; Hongyu Zhao

MOTIVATION: Expression quantitative trait loci (eQTL) studies investigate how gene expression levels are affected by DNA variants. A major challenge in inferring eQTL is that a number of factors, such as unobserved covariates, experimental artifacts and unknown environmental perturbations, may confound the observed expression levels. This may both mask real associations and lead to spurious association findings. RESULTS: In this article, we introduce a LOw-Rank representation to account for confounding factors and make use of Sparse regression for eQTL mapping (LORS). We integrate the low-rank representation and sparse regression into a unified framework, in which single-nucleotide polymorphisms and gene probes can be jointly analyzed. Given the two model parameters, our formulation is a convex optimization problem. We have developed an efficient algorithm to solve this problem, and its convergence is guaranteed. We demonstrate its ability to account for non-genetic effects using simulation, and then apply it to two independent real datasets. Our results indicate that LORS is an effective tool to account for non-genetic effects. First, our detected associations show higher consistency between studies than those of recently proposed methods. Second, we have identified some new hotspots that cannot be identified without accounting for non-genetic effects. AVAILABILITY: The software is available at: http://bioinformatics.med.yale.edu/software.aspx. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Bioinformatics | 2010

Detecting two-locus associations allowing for interactions in genome-wide association studies

Xiang Wan; Can Yang; Qiang Yang; Hong Xue; Nelson L.S. Tang; Weichuan Yu

MOTIVATION: Genome-wide association studies (GWASs) aim to identify genetic susceptibility to complex diseases by assaying and analyzing hundreds of thousands of single nucleotide polymorphisms (SNPs). Although traditional single-locus statistical tests have identified many genetic determinants of susceptibility, those findings cannot completely explain genetic contributions to complex diseases. Marchini and coauthors demonstrated the importance of testing two-locus associations allowing for interactions through a wide range of simulation studies. However, such a test is computationally demanding, as we need to test hundreds of billions of SNP pairs in GWAS. Here, we provide a method to address this computational burden for dichotomous phenotypes. RESULTS: We have applied our method to nine datasets from GWAS, including the age-related macular degeneration (AMD) dataset, the Parkinson's disease dataset and seven datasets from the Wellcome Trust Case Control Consortium (WTCCC). Our method has discovered many associations that were not identified before. The running times for the AMD dataset, the Parkinson's disease dataset and each of the seven WTCCC datasets are 2.5, 82 and 90 h, respectively, on a standard 3.0 GHz desktop with 4 GB of memory running Windows XP. Our experimental results demonstrate that our method is feasible for full-scale analyses of both single- and two-locus associations allowing for interactions in GWAS. AVAILABILITY: http://bioinformatics.ust.hk/SNPAssociation.zip SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
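
To see where "hundreds of billions of SNP pairs" comes from, the short calculation below counts unordered pairs for a panel of a given size. The panel size of 500,000 SNPs is an assumed, illustrative figure rather than the size of any dataset named in the abstract, and `snp_pair_count` is a hypothetical helper name.

```python
from math import comb


def snp_pair_count(num_snps: int) -> int:
    """Number of unordered SNP pairs among num_snps markers: n * (n - 1) / 2."""
    return comb(num_snps, 2)


# Assumed panel size, chosen only to illustrate the order of magnitude.
pairs = snp_pair_count(500_000)
print(f"{pairs:,} pairs")  # 124,999,750,000 pairs
```

Because the count grows quadratically, doubling the number of markers roughly quadruples the number of two-locus tests.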


BMC Genetics | 2013

The complete compositional epistasis detection in genome-wide association studies

Xiang Wan; Can Yang; Qiang Yang; Hongyu Zhao; Weichuan Yu

Background: The detection of epistasis among genetic markers is of great interest in genome-wide association studies (GWAS). In recent years, much research has been devoted to finding disease-associated epistasis in GWAS. However, due to the high computational cost involved, most methods focus on specific epistasis models, which leads to a potential loss of power when the underlying epistasis models are not examined in these analyses. Results: In this work, we propose a computationally efficient approach based on complete enumeration of two-locus epistasis models. This approach uses a two-stage (screening and testing) search strategy and guarantees the enumeration of all epistasis patterns. The implementation runs on graphics processing units (GPUs) and can finish the analysis of a GWAS dataset (with around 5,000 subjects and around 350,000 markers) within two hours. Source code is available at http://bioinformatics.ust.hk/BOOST.html#GBOOST. Conclusions: This work demonstrates that complete compositional epistasis detection is computationally feasible in GWAS.


Bioinformatics | 2011

Identifying disease-associated SNP clusters via contiguous outlier detection

Can Yang; Xiaowei Zhou; Xiang Wan; Qiang Yang; Hong Xue; Weichuan Yu

MOTIVATION: Although genome-wide association studies (GWAS) have identified many disease-susceptibility single-nucleotide polymorphisms (SNPs), these findings can only explain a small portion of genetic contributions to complex diseases, which is known as the missing heritability. A possible explanation is that genetic variants with small effects have not been detected. The chance is < 8 that a causal SNP will be directly genotyped. The effects of its neighboring SNPs may be too weak to be detected due to the effect decay caused by imperfect linkage disequilibrium. Moreover, it is still challenging to detect a causal SNP with a small effect even if it has been directly genotyped. RESULTS: In order to increase the statistical power when detecting disease-associated SNPs with relatively small effects, we propose a method using neighborhood information. Since the disease-associated SNPs account for only a small fraction of the entire SNP set, we formulate this problem as Contiguous Outlier DEtection (CODE), which is a discrete optimization problem. In our formulation, we cast the disease-associated SNPs as outliers and further impose a spatial continuity constraint for outlier detection. We show that this optimization can be solved exactly using graph cuts. We also employ the stability selection strategy to control the false positive results caused by imperfect parameter tuning. We demonstrate its advantage in simulations and real experiments. In particular, the newly identified SNP clusters are replicable in two independent datasets. AVAILABILITY: The software is available at: http://bioinformatics.ust.hk/CODE.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
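
The idea of treating associated SNPs as outliers with a spatial continuity constraint can be illustrated on a toy one-dimensional problem. The sketch below is not the CODE software and does not use graph cuts: it assumes a simple energy, sum_i s_i * (lam - score_i) + beta * sum_i |s_i - s_{i+1}| over binary labels s_i, and minimises it exactly with dynamic programming, which is possible here only because the SNPs form a chain. The function name, the energy, and the parameters `lam` and `beta` are all assumptions for illustration.

```python
def contiguous_outliers(scores, lam, beta):
    """Exactly minimise the assumed chain energy over binary labels.

    scores : per-SNP association scores (hypothetical input)
    lam    : cost of declaring a SNP an outlier (sparsity)
    beta   : cost of a label change between neighbours (continuity)
    """
    n = len(scores)
    if n == 0:
        return []
    # cost[k]: lowest energy of a labelling of SNPs 0..i that ends in label k
    cost = [0.0, lam - scores[0]]
    back = []  # back[i - 1][k]: best label of SNP i - 1 given SNP i has label k
    for i in range(1, n):
        unary = [0.0, lam - scores[i]]
        new_cost, choice = [], []
        for k in (0, 1):
            candidates = [cost[j] + beta * abs(j - k) for j in (0, 1)]
            j_best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[j_best] + unary[k])
            choice.append(j_best)
        cost = new_cost
        back.append(choice)
    # Trace the optimal labelling back from the last SNP.
    label = 0 if cost[0] <= cost[1] else 1
    labels = [label]
    for choice in reversed(back):
        label = choice[label]
        labels.append(label)
    labels.reverse()
    return labels


scores = [0.1, 0.2, 3.0, 0.5, 3.2, 0.1, 0.0]
print(contiguous_outliers(scores, lam=1.0, beta=0.0))  # [0, 0, 1, 0, 1, 0, 0]
print(contiguous_outliers(scores, lam=1.0, beta=0.8))  # [0, 0, 1, 1, 1, 0, 0]
```

With `beta=0.0` each SNP is judged on its own score, so the weak middle SNP is missed. With a positive `beta` the continuity term pulls it into the cluster formed by its two strong neighbours, which is the intuition behind using neighborhood information.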


BMC Bioinformatics | 2011

A hidden two-locus disease association pattern in genome-wide association studies

Can Yang; Xiang Wan; Qiang Yang; Hong Xue; Nelson L.S. Tang; Weichuan Yu

Background: Recent association analyses in genome-wide association studies (GWAS) mainly focus on single-locus association tests (marginal tests) and two-locus interaction detection. These analysis methods have provided strong evidence of associations between genetic variants and complex diseases. However, there exists a type of association pattern that often occurs within local regions of the genome and is unlikely to be detected by either marginal tests or interaction tests. This association pattern involves a group of correlated single-nucleotide polymorphisms (SNPs). The correlation among SNPs can lead to weak marginal effects, and interaction does not play a role in this association pattern. This phenomenon is due to the existence of unfaithfulness: the marginal effects of correlated SNPs do not express their significant joint effects faithfully because of correlation cancelation. Results: In this paper, we develop a computational method to detect this association pattern masked by unfaithfulness. We have applied our method to analyze seven data sets from the Wellcome Trust Case Control Consortium (WTCCC). The analysis of each data set takes about one week to examine all pairs of SNPs. Based on the empirical results for these real data, we show that this type of association masked by unfaithfulness widely exists in GWAS. Conclusions: These newly identified associations enrich the discoveries of GWAS, which may provide new insights both in the analysis of tagSNPs and in the experimental design of GWAS. Since these associations may be easily missed by existing analysis tools, we can only connect some of them to publicly available findings from other association studies. As independent data sets are limited at this moment, we also have difficulty replicating these findings. The biological implications need further investigation. Availability: The software is freely available at http://bioinformatics.ust.hk/hidden_pattern_finder.zip.

Collaboration


Dive into Can Yang's collaborations.

Top Co-Authors

Weichuan Yu

Hong Kong University of Science and Technology

Xiang Wan

Hong Kong Baptist University

Qiang Yang

Harbin Institute of Technology

Hong Xue

Hong Kong University of Science and Technology

Nelson L.S. Tang

Hong Kong University of Science and Technology

Xiaowei Zhou

University of Pennsylvania

Xiaodan Fan

The Chinese University of Hong Kong

Zengyou He

Dalian University of Technology
