Publication


Featured research published by Yaoyong Li.


BMC Genomics | 2010

A comparison of massively parallel nucleotide sequencing with oligonucleotide microarrays for global transcription profiling

James R. Bradford; Yvonne Hey; Tim Yates; Yaoyong Li; Stuart D Pepper; Crispin J. Miller

Background: RNA-Seq exploits the rapid generation of gigabases of sequence data by Massively Parallel Nucleotide Sequencing, allowing for the mapping and digital quantification of whole transcriptomes. Whilst previous comparisons between RNA-Seq and microarrays have been performed at the level of gene expression, in this study we adopt a more fine-grained approach. Using RNA samples from a normal human breast epithelial cell line (MCF-10a) and a breast cancer cell line (MCF-7), we present a comprehensive comparison between RNA-Seq data generated on the Applied Biosystems SOLiD platform and data from Affymetrix Exon 1.0ST arrays. The use of Exon arrays makes it possible to assess the performance of RNA-Seq in two key areas: detection of expression at the granularity of individual exons, and discovery of transcription outside annotated loci.

Results: We found a high degree of correspondence between the two platforms in terms of exon-level fold changes and detection. For example, over 80% of exons detected as expressed in RNA-Seq were also detected on the Exon array, and 91% of exons flagged as changing from Absent to Present on at least one platform had fold-changes in the same direction. The greatest detection correspondence was seen when the read count threshold at which to flag exons Absent in the SOLiD data was set to t<1, suggesting that the background error rate is extremely low in RNA-Seq. We also found RNA-Seq more sensitive to detecting differentially expressed exons than the Exon array, reflecting the wider dynamic range achievable on the SOLiD platform. In addition, we find significant evidence of novel protein coding regions outside known exons, 93% of which map to Exon array probesets, and are able to infer the presence of thousands of novel transcripts through the detection of previously unreported exon-exon junctions.

Conclusions: By focusing on exon-level expression, we present the most fine-grained comparison between RNA-Seq and microarrays to date. Overall, our study demonstrates that data from a SOLiD RNA-Seq experiment are sufficient to generate results comparable to those produced from Affymetrix Exon arrays, even using only a single replicate from each platform, and when presented with a large genome.
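
The exon-level comparison described in this abstract can be illustrated with a small sketch. It assumes hypothetical per-exon read counts for two samples, calls an exon Present when its count reaches a threshold t (t = 1 mirrors the rule of flagging exons Absent at t<1), and measures how often two platforms agree on the direction of a log2 fold change. The function names, the pseudocount, and the toy numbers are illustrative assumptions, not the authors' pipeline or data.

```python
import math

def is_present(read_count, t=1):
    """Flag an exon Present when its read count reaches t (Absent if count < t)."""
    return read_count >= t

def log2_fold_change(a, b, pseudocount=1.0):
    """log2 ratio of b over a; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((b + pseudocount) / (a + pseudocount))

def direction_agreement(fc_platform_1, fc_platform_2):
    """Fraction of exons whose non-zero fold changes share a sign on both platforms."""
    pairs = [(x, y) for x, y in zip(fc_platform_1, fc_platform_2) if x != 0 and y != 0]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if (x > 0) == (y > 0))
    return same / len(pairs)

# Toy data (made up): per-exon values in sample A and sample B.
rnaseq_a = [0, 12, 40, 3]
rnaseq_b = [5, 30, 10, 3]
array_a = [2.0, 6.5, 9.0, 4.0]  # hypothetical log2 array intensities
array_b = [4.5, 7.5, 7.0, 4.2]

rnaseq_fc = [log2_fold_change(a, b) for a, b in zip(rnaseq_a, rnaseq_b)]
array_fc = [b - a for a, b in zip(array_a, array_b)]  # intensities are already log2
present_a = [is_present(c) for c in rnaseq_a]
agreement = direction_agreement(rnaseq_fc, array_fc)
```

Exons with a fold change of exactly zero on either platform are left out of the agreement fraction here; that is a design choice of this sketch rather than something stated in the abstract.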


International Conference on Deterministic and Statistical Methods in Machine Learning | 2004

SVM based learning system for information extraction

Yaoyong Li; Kalina Bontcheva; Hamish Cunningham

This paper presents an SVM-based learning system for information extraction (IE). One distinctive feature of our system is the use of a variant of the SVM, the SVM with uneven margins, which is particularly helpful for small training datasets. In addition, our approach needs fewer SVM classifiers to be trained than other recent SVM-based systems. The paper also compares our approach to several state-of-the-art systems (including rule learning and statistical learning algorithms) on three IE benchmark datasets: CoNLL-2003, CMU seminars, and the software jobs corpus. The experimental results show that our system outperforms a recent SVM-based system on CoNLL-2003, achieves the highest score on eight out of 17 categories on the jobs corpus, and is second best on the remaining nine.


Nature Cell Biology | 2016

GFI1 proteins orchestrate the emergence of haematopoietic stem cells through recruitment of LSD1

Roshana Thambyrajah; Milena Mazan; Rahima Patel; Victoria Moignard; Monika Stefanska; Elli Marinopoulou; Yaoyong Li; Christophe Lancrin; Thomas Clapes; Tarik Möröy; Catherine Robin; Crispin J. Miller; Shaun M. Cowley; Berthold Göttgens; Valerie Kouskoff; Georges Lacaud

In vertebrates, the first haematopoietic stem cells (HSCs) with multi-lineage and long-term repopulating potential arise in the AGM (aorta–gonad–mesonephros) region. These HSCs are generated from a rare and transient subset of endothelial cells, called haemogenic endothelium (HE), through an endothelial-to-haematopoietic transition (EHT). Here, we establish the absolute requirement of the transcriptional repressors GFI1 and GFI1B (growth factor independence 1 and 1B) in this unique trans-differentiation process. We first demonstrate that Gfi1 expression specifically defines the rare population of HE that generates emerging HSCs. We further establish that in the absence of GFI1 proteins, HSCs and haematopoietic progenitor cells are not produced in the AGM, revealing the critical requirement for GFI1 proteins in intra-embryonic EHT. Finally, we demonstrate that GFI1 proteins recruit the chromatin-modifying protein LSD1, a member of the CoREST repressive complex, to epigenetically silence the endothelial program in HE and allow the emergence of blood cells.


International World Wide Web Conferences | 2007

Hierarchical, perceptron-like learning for ontology-based information extraction

Yaoyong Li; Kalina Bontcheva

Recent work on ontology-based Information Extraction (IE) has tried to make use of knowledge from the target ontology in order to improve semantic annotation results. However, very few approaches exploit the ontology structure itself, and those that do have some limitations. This paper introduces a hierarchical learning approach for IE, which uses the target ontology as an essential part of the extraction process by taking into account the relations between concepts. The approach is evaluated on the largest available semantically annotated corpus. The results clearly demonstrate the benefits of using knowledge from the ontology as input to the information extraction process. We also demonstrate the advantages of our approach over other state-of-the-art learning systems on a commonly used benchmark dataset.


Conference on Computational Natural Language Learning | 2005

Using Uneven Margins SVM and Perceptron for Information Extraction

Yaoyong Li; Kalina Bontcheva; Hamish Cunningham

The classification problem derived from information extraction (IE) has an imbalanced training set. This is particularly true when learning from smaller datasets which often have a few positive training examples and many negative ones. This paper takes two popular IE algorithms -- SVM and Perceptron -- and demonstrates how the introduction of an uneven margins parameter can improve the results on imbalanced training data in IE. Our experiments demonstrate that the uneven margin was indeed helpful, especially when learning from few examples. Essentially, the smaller the training set is, the more beneficial the uneven margin can be. We also compare our systems to other state-of-the-art algorithms on several benchmarking corpora for IE.
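
The idea of an uneven margin can be sketched with a perceptron that demands a larger margin from the rare positive class than from the negative class. This is a minimal sketch under stated assumptions: an example triggers an update when y * (w.x + b) does not exceed the margin for its class, and the parameter names (tau_pos, tau_neg, eta), their default values, and the simple bias update are illustrative choices. The paper's exact formulation may differ.

```python
def train_uneven_margin_perceptron(samples, labels, tau_pos=1.0, tau_neg=0.1,
                                   eta=0.1, max_epochs=1000):
    """Train a linear classifier with class-dependent margins; labels are +1 or -1."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for x, y in zip(samples, labels):
            # Positive examples must clear tau_pos, negative examples tau_neg.
            tau = tau_pos if y > 0 else tau_neg
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= tau:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y  # simplified bias update (an assumption)
                updated = True
        if not updated:
            break  # every example already satisfies its class margin
    return w, b

def predict(w, b, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy imbalanced set (made-up points): one positive, four negatives.
X = [[2.0, 2.0], [0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [-1.0, 0.0]]
y = [1, -1, -1, -1, -1]
w, b = train_uneven_margin_perceptron(X, y)
```

Setting tau_pos larger than tau_neg pushes the decision boundary away from the few positive examples, which is the intuition behind using uneven margins on imbalanced data.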


Natural Language Engineering | 2009

Adapting SVM for data sparseness and imbalance: A case study in information extraction

Yaoyong Li; Kalina Bontcheva; Hamish Cunningham

Support Vector Machines (SVM) have been used successfully in many Natural Language Processing (NLP) tasks. The novel contribution of this paper is in investigating two techniques for making SVM more suitable for language learning tasks. Firstly, we propose an SVM with uneven margins (SVMUM) model to deal with the problem of imbalanced training data. Secondly, SVM active learning is employed in order to alleviate the difficulty in obtaining labelled training data. The algorithms are presented and evaluated on several Information Extraction (IE) tasks, where they achieved better performance than the standard SVM and the SVM with passive learning, respectively. Moreover, by combining SVMUM with the active learning algorithm, we achieve the best reported results on the seminars and jobs corpora, which are benchmark data sets used for evaluation and comparison of machine learning algorithms for IE. In addition, we also evaluate the token-based classification framework for IE with three different entity tagging schemes. In comparison to previous methods dealing with the same problems, our methods are both effective and efficient, which are valuable features for real-world applications. Due to the similarity in the formulation of the learning problem for IE and for other NLP tasks, the two techniques are likely to be beneficial in a wide range of applications.
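
The active learning step can be illustrated with a short sketch. The abstract does not state which selection criterion is used, so this example assumes a common one: ask for labels on the unlabelled examples whose classifier scores lie closest to the decision boundary. The function name and the scores are hypothetical.

```python
def select_queries(decision_values, k):
    """Return indices of the k unlabelled examples closest to the decision boundary."""
    ranked = sorted(range(len(decision_values)), key=lambda i: abs(decision_values[i]))
    return ranked[:k]

# Made-up classifier scores for five unlabelled examples.
scores = [2.3, -0.1, 0.8, -1.7, 0.05]
queries = select_queries(scores, 2)
```

The selected examples would then be labelled and added to the training set before the classifier is retrained.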


Intelligent Information Systems | 2006

Using KCCA for Japanese–English cross-language information retrieval and document classification

Yaoyong Li; John Shawe-Taylor

Kernel Canonical Correlation Analysis (KCCA) is a method of correlating linear relationships between two variables in a kernel-defined feature space. A machine learning algorithm based on KCCA is studied for cross-language information retrieval. We apply the algorithm to Japanese–English cross-language information retrieval. The results are quite encouraging and are significantly better than those obtained by other state-of-the-art methods. Computational complexity is an important issue when applying KCCA to large datasets, as in information retrieval. We experimentally evaluate several methods to alleviate the problem of applying KCCA to large datasets. We also investigate cross-language document classification using KCCA as well as other methods. Our results show that it is feasible to use a classifier learned in one language to classify documents in other languages.


Nature Medicine | 2017

Molecular analysis of circulating tumor cells identifies distinct copy-number profiles in patients with chemosensitive and chemorefractory small-cell lung cancer

Louise Carter; Dominic G. Rothwell; Barbara Mesquita; Christopher Smowton; Hui Sun Leong; Fabiola Fernandez-Gutierrez; Yaoyong Li; Deborah J. Burt; Jenny Antonello; Christopher J. Morrow; Cassandra L Hodgkinson; Karen Morris; Lynsey Priest; Mathew Carter; Crispin J. Miller; Andrew Hughes; Fiona Blackhall; Caroline Dive; Ged Brady

In most patients with small-cell lung cancer (SCLC)—a metastatic, aggressive disease—the condition is initially chemosensitive but then relapses with acquired chemoresistance. In a minority of patients, however, relapse occurs within 3 months of initial treatment; in these cases, disease is defined as chemorefractory. The molecular mechanisms that differentiate chemosensitive from chemorefractory disease are currently unknown. To identify genetic features that distinguish chemosensitive from chemorefractory disease, we examined copy-number aberrations (CNAs) in circulating tumor cells (CTCs) from pretreatment SCLC blood samples. After analysis of 88 CTCs isolated from 13 patients (training set), we generated a CNA-based classifier that we validated in 18 additional patients (testing set, 112 CTC samples) and in six SCLC patient-derived CTC explant tumors. The classifier correctly assigned 83.3% of the cases as chemorefractory or chemosensitive. Furthermore, a significant difference was observed in progression-free survival (PFS) (Kaplan–Meier P value = 0.0166) between patients designated as chemorefractory or chemosensitive by using the baseline CNA classifier. Notably, CTC CNA profiles obtained at relapse from five patients with initially chemosensitive disease did not switch to a chemorefractory CNA profile, which suggests that the genetic basis for initial chemoresistance differs from that underlying acquired chemoresistance.


Information Processing and Management | 2007

Advanced learning algorithms for cross-language patent retrieval and classification

Yaoyong Li; John Shawe-Taylor

We study several machine learning algorithms for cross-language patent retrieval and classification. In comparison with most other studies involving machine learning for cross-language information retrieval, which basically used learning techniques for monolingual sub-tasks, our learning algorithms exploit the bilingual training documents and learn a semantic representation from them. We study Japanese-English cross-language patent retrieval using Kernel Canonical Correlation Analysis (KCCA), a method of correlating linear relationships between two variables in kernel-defined feature spaces. The results are quite encouraging and are significantly better than those obtained by other state-of-the-art methods. We also investigate learning algorithms for cross-language document classification. The learning algorithms are based on KCCA and Support Vector Machines (SVM). In particular, we study two ways of combining KCCA and SVM and find that one particular combination, called SVM_2k, achieved better results than other learning algorithms for either bilingual or monolingual test documents.


Blood | 2014

RUNX1 positively regulates a cell adhesion and migration program in murine hemogenic endothelium prior to blood emergence.

Michael Lie-A-Ling; Elli Marinopoulou; Yaoyong Li; Rahima Patel; Monika Stefanska; Constanze Bonifer; Crispin J. Miller; Valerie Kouskoff; Georges Lacaud

During ontogeny, the transcription factor RUNX1 governs the emergence of definitive hematopoietic cells from specialized endothelial cells called hemogenic endothelium (HE). The ultimate consequence of this endothelial-to-hematopoietic transition is the concomitant activation of the hematopoietic program and downregulation of the endothelial program. However, due to the rare and transient nature of the HE, little is known about the initial role of RUNX1 within this population. We, therefore, developed and implemented a highly sensitive DNA adenine methyltransferase identification-based methodology, including a novel data analysis pipeline, to map early RUNX1 transcriptional targets in HE cells. This novel transcription factor binding site identification protocol should be widely applicable to other low abundance cell types and factors. Integration of the RUNX1 binding profile with gene expression data revealed an unexpected early role for RUNX1 as a positive regulator of cell adhesion- and migration-associated genes within the HE. This suggests that RUNX1 orchestrates HE cell positioning and integration prior to the release of hematopoietic cells. Overall, our genome-wide analysis of the RUNX1 binding and transcriptional profile in the HE provides a novel comprehensive resource of target genes that will facilitate the precise dissection of the role of RUNX1 in early blood development.

Collaboration


Dive into Yaoyong Li's collaborations.

Top Co-Authors

Caroline Dive

University of Manchester

Louise Carter

University of Manchester

Ged Brady

University of Manchester

John Brognard

University of Manchester