Publication


Featured research published by Stefano Lonardi.


Nature | 2012

A physical, genetic and functional sequence assembly of the barley genome

Klaus F. X. Mayer; Robbie Waugh; Peter Langridge; Timothy J. Close; Roger P. Wise; Andreas Graner; Takashi Matsumoto; Kazuhiro Sato; Alan H. Schulman; Ruvini Ariyadasa; Daniela Schulte; Naser Poursarebani; Ruonan Zhou; Burkhard Steuernagel; Martin Mascher; Uwe Scholz; Bu-Jun Shi; Kavitha Madishetty; Jan T. Svensson; Prasanna R. Bhat; Matthew J. Moscou; Josh Resnik; Gary J. Muehlbauer; Peter E. Hedley; Hui Liu; Jenny Morris; Zeev Frenkel; Avraham Korol; Hélène Bergès; Marius Felder

Barley (Hordeum vulgare L.) is among the world’s earliest domesticated and most important crop plants. It is diploid with a large haploid genome of 5.1 gigabases (Gb). Here we present an integrated and ordered physical, genetic and functional sequence resource that describes the barley gene-space in a structured whole-genome context. We developed a physical map of 4.98 Gb, with more than 3.90 Gb anchored to a high-resolution genetic map. Projecting a deep whole-genome shotgun assembly, complementary DNA and deep RNA sequence data onto this framework supports 79,379 transcript clusters, including 26,159 ‘high-confidence’ genes with homology support from other plant genomes. Abundant alternative splicing, premature termination codons and novel transcriptionally active regions suggest that post-transcriptional processing forms an important regulatory layer. Survey sequences from diverse accessions reveal a landscape of extensive single-nucleotide variation. Our data provide a platform for both genome-assisted research and enabling contemporary crop improvement.


Data Mining and Knowledge Discovery | 2007

Experiencing SAX: a novel symbolic representation of time series

Jessica Lin; Eamonn J. Keogh; Li Wei; Stefano Lonardi

Many high-level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentially allow researchers to avail themselves of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. First, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Second, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. In particular, we will demonstrate the utility of our representation on various data mining tasks of clustering, classification, query by content, anomaly detection, motif discovery, and visualization.
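The abstract does not spell out the individual steps, so the following is only a minimal sketch of the commonly described SAX recipe: z-normalize the series, average it over equal-length segments (piecewise aggregate approximation), then map each average to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The function name `sax` and its parameters are illustrative assumptions, not the authors' code.

```python
from statistics import NormalDist, mean, pstdev

def sax(series, n_segments, alphabet_size):
    """Return a SAX-style word for a numeric series (illustrative sketch)."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series is mapped to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each equal-length segment.
    # Assumption: len(series) is divisible by n_segments.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Breakpoints cut the standard normal into equiprobable regions
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    # The letter index is the number of breakpoints strictly below the value
    return "".join(letters[sum(v > c for c in cuts)] for v in paa)

print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # abcd
```

Eight points become a four-letter word here, which is the dimensionality reduction the abstract refers to. The lower-bounding distance on words is not shown.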


Knowledge Discovery and Data Mining | 2004

Towards parameter-free data mining

Eamonn J. Keogh; Stefano Lonardi; Chotirat Ann Ratanamahatana

Most data mining algorithms require the setting of many input parameters. Two main dangers of working with parameter-laden algorithms are the following. First, incorrect settings may cause an algorithm to fail in finding the true patterns. Second, a perhaps more insidious problem is that the algorithm may report spurious patterns that do not really exist, or greatly overestimate the significance of the reported patterns. This is especially likely when the user fails to understand the role of parameters in the data mining process. Data mining algorithms should have as few parameters as possible, ideally none. A parameter-free algorithm would limit our ability to impose our prejudices, expectations, and presumptions on the problem at hand, and would let the data itself speak to us. In this work, we show that recent results in bioinformatics and computational theory hold great promise for a parameter-free data-mining paradigm. The results are motivated by observations in Kolmogorov complexity theory. However, as a practical matter, they can be implemented using any off-the-shelf compression algorithm with the addition of just a dozen or so lines of code. We will show that this approach is competitive with or superior to the state-of-the-art approaches in anomaly/interestingness detection, classification, and clustering with empirical tests on time series/DNA/text/video datasets.
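To make the "dozen or so lines of code" claim concrete, here is a hedged sketch of one compression-based dissimilarity: the compressed size of two concatenated inputs divided by the sum of their individual compressed sizes. The choice of `zlib`, the function names, and the synthetic DNA-like test data are all assumptions made for illustration; the abstract only says that any off-the-shelf compressor can be used.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

def cdm(x: bytes, y: bytes) -> float:
    """Compression-based dissimilarity: lower values mean x and y share more structure."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))

def random_dna(n: int, seed: int) -> bytes:
    """Hypothetical helper: a reproducible pseudo-random DNA-like string."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n)).encode()

x = random_dna(2000, seed=1)
y = x[:1000] + b"T" + x[1001:]   # near copy of x with one position overwritten
z = random_dna(2000, seed=2)     # unrelated sequence

# The near copy should score as much closer to x than the unrelated sequence does.
print(round(cdm(x, y), 2), round(cdm(x, z), 2))
```

The measure has no parameters to tune beyond the choice of compressor, which is the design point the abstract argues for.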


PLOS Genetics | 2008

Efficient and Accurate Construction of Genetic Linkage Maps from the Minimum Spanning Tree of a Graph

Yonghui Wu; Prasanna R. Bhat; Timothy J. Close; Stefano Lonardi

Genetic linkage maps are cornerstones of a wide spectrum of biotechnology applications, including map-assisted breeding, association genetics, and map-assisted gene cloning. During the past several years, the adoption of high-throughput genotyping technologies has been paralleled by a substantial increase in the density and diversity of genetic markers. New genetic mapping algorithms are needed in order to efficiently process these large datasets and accurately construct high-density genetic maps. In this paper, we introduce a novel algorithm to order markers on a genetic linkage map. Our method is based on a simple yet fundamental mathematical property that we prove under rather general assumptions. The validity of this property allows one to determine efficiently the correct order of markers by computing the minimum spanning tree of an associated graph. Our empirical studies obtained on genotyping data for three mapping populations of barley (Hordeum vulgare), as well as extensive simulations on synthetic data, show that our algorithm consistently outperforms the best available methods in the literature, particularly when the input data are noisy or incomplete. The software implementing our algorithm is available in the public domain as a web tool under the name MSTmap.
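The idea of reading a marker order off a minimum spanning tree can be sketched as follows. This is not MSTmap itself: the distance used here (the number of individuals whose genotype calls differ between two markers), the toy genotype strings, and all function names are assumptions, and the sketch only handles the ideal case in which the tree is a simple path.

```python
def hamming(a, b):
    """Number of individuals whose genotype calls differ between two markers."""
    return sum(x != y for x, y in zip(a, b))

def mst_edges(weights):
    """Prim's algorithm on a dense symmetric weight matrix; returns tree edges."""
    n = len(weights)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        _, u, v = min((weights[u][v], u, v)
                      for u in in_tree for v in range(n) if v not in in_tree)
        edges.append((u, v))
        in_tree.add(v)
    return edges

def order_markers(genotypes):
    """Order markers by walking the MST; assumes at least two markers."""
    names = list(genotypes)
    w = [[hamming(genotypes[a], genotypes[b]) for b in names] for a in names]
    adj = {i: [] for i in range(len(names))}
    for u, v in mst_edges(w):
        adj[u].append(v)
        adj[v].append(u)
    if any(len(nb) > 2 for nb in adj.values()):
        raise ValueError("MST is not a path; this sketch does not handle that case")
    # Start from one end of the path (a node with a single neighbour)
    cur, prev, order = min(i for i, nb in adj.items() if len(nb) == 1), None, []
    while cur is not None:
        order.append(names[cur])
        nxt = [v for v in adj[cur] if v != prev]
        prev, cur = cur, (nxt[0] if nxt else None)
    return order

# Toy data: one genotype call (A or B) per individual, markers given out of order
genotypes = {"m3": "AABBBBBA", "m1": "AAAABBBB", "m4": "ABBBBBAA", "m2": "AAABBBBB"}
print(order_markers(genotypes))
```

The orientation of the recovered order is arbitrary, since a path can be read from either end.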


BMC Bioinformatics | 2007

Composition Profiler: a tool for discovery and visualization of amino acid composition differences

Vladimir Vacic; Vladimir N. Uversky; A. Keith Dunker; Stefano Lonardi

Background: Composition Profiler is a web-based tool for semi-automatic discovery of enrichment or depletion of amino acids, either individually or grouped by their physico-chemical or structural properties. Results: The program takes two samples of amino acids as input: a query sample and a reference sample. The latter provides a suitable background amino acid distribution, and should be chosen according to the nature of the query sample, for example, a standard protein database (e.g. SwissProt, PDB), a representative sample of proteins from the organism under study, or a group of proteins with a contrasting functional annotation. The results of the analysis of amino acid composition differences are summarized in textual and graphical form. Conclusion: As an exploratory data mining tool, our software can be used to guide feature selection for protein function or structure predictors. For classes of proteins with significant differences in frequencies of amino acids having particular physico-chemical (e.g. hydrophobicity or charge) or structural (e.g. α helix propensity) properties, Composition Profiler can be used as a rough, light-weight visual classifier.
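A query-versus-reference comparison of this kind can be illustrated with a few lines of code. The sketch below computes, for each amino acid in the reference, the relative difference between its query and reference frequencies. Both that particular statistic and the function names are assumptions for illustration; the abstract does not specify how the tool measures enrichment, and no significance testing is shown.

```python
from collections import Counter

def composition(sequences):
    """Fraction of each amino acid across a list of protein sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def relative_difference(query, reference):
    """(query frequency - reference frequency) / reference frequency, per residue.

    Positive values suggest enrichment in the query, negative values depletion.
    Residues absent from the reference are skipped to avoid division by zero.
    """
    q, r = composition(query), composition(reference)
    return {aa: (q.get(aa, 0.0) - f) / f for aa, f in r.items()}

diff = relative_difference(query=["AAAG"], reference=["AAGG"])
print(diff)  # A is enriched by 50%, G is depleted by 50%
```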


Proceedings of the National Academy of Sciences of the United States of America | 2009

Immune profile and mitotic index of metastatic melanoma lesions enhance clinical staging in predicting patient survival

Dusan Bogunovic; David O'Neill; Ilana Belitskaya-Lévy; Vladimir Vacic; Yi-Lo Yu; Sylvia Adams; Farbod Darvishian; Russell S. Berman; Richard L. Shapiro; Anna C. Pavlick; Stefano Lonardi; Jiri Zavadil; Iman Osman; Nina Bhardwaj

Although remission rates for metastatic melanoma are generally very poor, some patients can survive for prolonged periods following metastasis. We used gene expression profiling, mitotic index (MI), and quantification of tumor infiltrating leukocytes (TILs) and CD3+ cells in metastatic lesions to search for a molecular basis for this observation and to develop improved methods for predicting patient survival. We identified a group of 266 genes associated with postrecurrence survival. Genes positively associated with survival were predominantly immune response related (e.g., ICOS, CD3d, ZAP70, TRAT1, TARP, GZMK, LCK, CD2, CXCL13, CCL19, CCR7, VCAM1) while genes negatively associated with survival were cell proliferation related (e.g., PDE4D, CDK2, GREF1, NUSAP1, SPC24). Furthermore, any of the 4 parameters (prevalidated gene expression signature, TILs, CD3, and in particular MI) improved the ability of Tumor, Node, Metastasis (TNM) staging to predict postrecurrence survival; MI was the most significant contributor (HR = 2.13, P = 0.0008). An immune response gene expression signature and presence of TILs and CD3+ cells signify immune surveillance as a mechanism for prolonged survival in these patients and indicate improved patient subcategorization beyond current TNM staging.


Nature | 2017

A chromosome conformation capture ordered sequence of the barley genome

Martin Mascher; Heidrun Gundlach; Axel Himmelbach; Sebastian Beier; Sven O. Twardziok; Thomas Wicker; Volodymyr Radchuk; Christoph Dockter; Peter E. Hedley; Joanne Russell; Micha Bayer; Luke Ramsay; Hui Liu; Georg Haberer; Xiao-Qi Zhang; Qisen Zhang; Roberto A. Barrero; Lin Li; Marco Groth; Marius Felder; Alex Hastie; Hana Šimková; Helena Staňková; Jan Vrána; Saki Chan; María Muñoz-Amatriaín; Rachid Ounit; Steve Wanamaker; Daniel M. Bolser; Christian Colmsee

Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.


BMC Genomics | 2015

CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

Rachid Ounit; Steve Wanamaker; Timothy J. Close; Stefano Lonardi

Background: The problem of supervised DNA sequence classification arises in several fields of computational molecular biology. Although this problem has been extensively studied, it is still computationally challenging due to the size of the datasets that modern sequencing technologies can produce. Results: We introduce Clark, a novel approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of Clark is better than or comparable to the best state-of-the-art tools and it is significantly faster than any of its competitors. In its fastest single-threaded mode Clark classifies, with high accuracy, about 32 million metagenomic short reads per minute. Clark can also classify BAC clones or transcripts to chromosome arms and centromeric regions. Conclusions: Clark is a versatile, fast and accurate sequence classification method, especially useful for metagenomics and genomics applications. It is freely available at http://clark.cs.ucr.edu/.
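The title refers to discriminative k-mers. One plausible reading, sketched below under that assumption, is to keep only the k-mers that occur in exactly one target sequence and to assign a read to the target that owns most of the read's k-mers. The function names, the toy sequences, and the tie handling are illustrative and are not taken from the CLARK software.

```python
from collections import Counter, defaultdict

def kmers(seq, k):
    """Set of distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(targets, k):
    """Map each k-mer found in exactly one target to that target's name."""
    owners = defaultdict(set)
    for name, seq in targets.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    return {km: next(iter(names)) for km, names in owners.items() if len(names) == 1}

def classify(read, index, k):
    """Target with the most discriminative k-mer hits, or None if there are none."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    return hits.most_common(1)[0][0] if hits else None

# Toy references; "ACGT" occurs in both and is therefore not discriminative
targets = {"sp1": "ACGTACGTTTGACC", "sp2": "ACGTGGGCCCAATT"}
index = build_index(targets, k=4)
print(classify("CGTTTGAC", index, k=4))  # sp1
print(classify("ACGT", index, k=4))      # None
```

Discarding shared k-mers keeps the index small and makes every lookup hit unambiguous, which is one way such a classifier can trade memory for speed.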


Knowledge Discovery and Data Mining | 2004

Visually mining and monitoring massive time series

Jessica Lin; Eamonn J. Keogh; Stefano Lonardi; Jeffrey P. Lankford; Donna M. Nystrom

Moments before the launch of every space vehicle, engineering discipline specialists must make a critical go/no-go decision. The cost of a false positive, allowing a launch in spite of a fault, or a false negative, stopping a potentially successful launch, can be measured in the tens of millions of dollars, not including the cost in morale and other more intangible detriments. The Aerospace Corporation is responsible for providing engineering assessments critical to the go/no-go decision for every Department of Defense space vehicle. These assessments are made by constantly monitoring streaming telemetry data in the hours before launch. We will introduce VizTree, a novel time-series visualization tool to aid the Aerospace analysts who must make these engineering assessments. VizTree was developed at the University of California, Riverside and is unique in that the same tool is used for mining archival data and monitoring incoming live telemetry. The use of a single tool for both aspects of the task allows a natural and intuitive transfer of mined knowledge to the monitoring task. Our visualization approach works by transforming the time series into a symbolic representation, and encoding the data in a modified suffix tree in which the frequency and other properties of patterns are mapped onto colors and other visual properties. We demonstrate the utility of our system by comparing it with state-of-the-art batch algorithms on several real and synthetic datasets.
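The tree-of-pattern-frequencies idea can be sketched with a plain counting trie over sliding windows of an already discretized series. This is a simplification under stated assumptions: VizTree is described as using a modified suffix tree and mapping frequencies to visual properties, neither of which is reproduced here, and the names `build_count_trie` and `frequency` are hypothetical.

```python
def build_count_trie(symbols, depth):
    """Insert every window of length `depth` into a trie, counting visits per node."""
    root = {"count": 0, "children": {}}
    for i in range(len(symbols) - depth + 1):
        node = root
        node["count"] += 1
        for ch in symbols[i:i + depth]:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def frequency(trie, pattern):
    """Number of inserted windows that start with `pattern` (0 if never seen)."""
    node = trie
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return 0
    return node["count"]

trie = build_count_trie("abababcab", depth=3)
print(frequency(trie, "aba"))  # 2: a frequent pattern
print(frequency(trie, "abc"))  # 1: a rare pattern, a candidate anomaly
```

Because new windows can be inserted one at a time, the same structure could serve both archival data and an incoming stream, which matches the single-tool design described above.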


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2005

Assignment of Orthologous Genes via Genome Rearrangement

Xin Chen; Jie Zheng; Zheng Fu; Peng Nan; Yang Zhong; Stefano Lonardi; Tao Jiang

The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationship among genes of the same families. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at a genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolving scenario under genome rearrangement. First, the problem is formulated as that of computing the signed reversal distance with duplicates between the two genomes of interest. Then, the problem is decomposed into two new optimization problems, called minimum common partition and maximum cycle decomposition, for which efficient heuristic algorithms are given. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared to a recent ortholog assignment method based entirely on homology search (called INPARANOID), SOAR shows a marginally better performance in terms of sensitivity on the real data set because it is able to identify several correct orthologous pairs that are missed by INPARANOID. The simulation results demonstrate that SOAR, in general, performs better than the iterated exemplar algorithm in terms of computing the reversal distance and assigning correct orthologs.
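To illustrate what a signed reversal distance measures, the sketch below finds it by breadth-first search for very small genomes without duplicated genes. It is exponential in genome length and is not the SOAR method, which targets the harder case with duplicates through heuristics for minimum common partition and maximum cycle decomposition. All names are illustrative assumptions.

```python
from collections import deque

def reversals(perm):
    """All signed permutations reachable by reversing one contiguous segment.

    A reversal flips the order of the segment and the sign (strand) of each gene.
    """
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            yield perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]

def signed_reversal_distance(src, dst):
    """Minimum number of reversals turning src into dst (brute force, tiny inputs only)."""
    src, dst = tuple(src), tuple(dst)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == dst:
            return dist
        for nxt in reversals(perm):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is not a signed rearrangement of src

print(signed_reversal_distance((1, 2, 3), (-2, -1, 3)))  # 1
print(signed_reversal_distance((1, 2, 3), (2, 1, 3)))    # 3
```

The second call shows why signs matter: swapping two genes while keeping both on their original strand costs three reversals, not one.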

Collaboration


Dive into Stefano Lonardi's collaborations.

Top Co-Authors

Yonghui Wu

University of California

Tao Jiang

University of California

Alberto Apostolico

Georgia Institute of Technology
