Daniel Valenzuela | Researchain

Publication

Featured research published by Daniel Valenzuela.

Symposium on Experimental and Efficient Algorithms | 2011

Practical compressed document retrieval

Gonzalo Navarro; Simon J. Puglisi; Daniel Valenzuela

Recent research on document retrieval for general texts has established the virtues of explicitly representing the so-called document array, which stores, for each entry of the suffix array, the document that entry belongs to. While this array makes document retrieval faster, it occupies a significant amount of redundant space and is not easily compressible. In this paper we present the first practical proposal to compress the document array. We show that the resulting structure is significantly smaller than both its uncompressed counterpart and the alternatives to the document array proposed in the literature. We also compare various known algorithms for document listing and top-k retrieval, and find that the most useful combinations of algorithms run over our new compressed document arrays.
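To make the document array concrete, here is a minimal sketch under simplifying assumptions of our own: documents are concatenated with a `$` terminator, the suffix array is built by naive sorting, and documents are listed by collecting the distinct DA values inside the pattern's suffix-array interval. The function names (`build_index`, `suffix_range`, `list_documents`) are hypothetical, and the DA here is uncompressed, unlike the structure the paper proposes.

```python
from typing import List, Tuple

def build_index(docs: List[str]) -> Tuple[str, List[int], List[int]]:
    """Concatenate docs (each terminated by '$'), build a naive suffix array SA
    and the document array DA, where DA[i] is the document containing SA[i]."""
    text = "".join(d + "$" for d in docs)
    # owner[p] = id of the document that text position p belongs to
    owner = [k for k, d in enumerate(docs) for _ in range(len(d) + 1)]
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # O(n^2 log n): toy only
    da = [owner[p] for p in sa]
    return text, sa, da

def suffix_range(text: str, sa: List[int], pattern: str) -> Tuple[int, int]:
    """Return [sp, ep] of the suffixes prefixed by pattern (ep < sp if none).
    Such suffixes are contiguous in SA because SA is lexicographically sorted."""
    hits = [i for i, p in enumerate(sa) if text.startswith(pattern, p)]
    return (hits[0], hits[-1]) if hits else (0, -1)

def list_documents(docs: List[str], pattern: str) -> List[int]:
    """Document listing: the distinct DA values inside the pattern's SA interval."""
    text, sa, da = build_index(docs)
    sp, ep = suffix_range(text, sa, pattern)
    return sorted(set(da[sp:ep + 1]))
```

For example, `list_documents(["banana", "bandana", "cabana"], "ana")` returns `[0, 1, 2]`, while the pattern `"nan"` yields `[0]`.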

Journal of Discrete Algorithms | 2013

Improved compressed indexes for full-text document retrieval

Djamal Belazzougui; Gonzalo Navarro; Daniel Valenzuela

We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least |CSA| + O(n lg D lg lg D) or 2|CSA| + o(n) bits of space, where CSA is a full-text index. Using monotone minimal perfect hash functions (mmphfs), we give new algorithms for document listing with frequencies and top-k document retrieval using just |CSA| + O(n lg lg lg D) bits. We also improve current solutions that use 2|CSA| + o(n) bits, and consider other problems such as colored range listing, top-k most important documents, and computing arbitrary frequencies. We give proof-of-concept experimental results that show that using mmphfs may provide relevant practical tradeoffs for document listing with frequencies.
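The query targeted here, document listing with frequencies, can be stated with a brute-force sketch: count how often each document appears in the document array over the pattern's suffix-array interval. This is only a naive baseline under our own assumptions (`$`-terminated documents, naive suffix array, hypothetical function name `document_frequencies`); it is not the mmphf-based structure described in the abstract.

```python
from collections import Counter
from typing import Dict, List

def document_frequencies(docs: List[str], pattern: str) -> Dict[int, int]:
    """Map each document id to the number of occurrences of `pattern` in it."""
    text = "".join(d + "$" for d in docs)  # '$' terminates each document
    owner = [k for k, d in enumerate(docs) for _ in range(len(d) + 1)]
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    da = [owner[p] for p in sa]  # document array
    # Suffixes prefixed by `pattern` are contiguous in `sa`; collect that interval.
    interval = [i for i, p in enumerate(sa) if text.startswith(pattern, p)]
    # Each DA entry in the interval is one occurrence inside that document.
    return dict(Counter(da[i] for i in interval))
```

With `["banana", "bandana", "cabana"]` and pattern `"ana"`, this returns `{0: 2, 1: 1, 2: 1}`: "banana" contains two overlapping occurrences.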

Briefings in Bioinformatics | 2016

Computational pan-genomics: status, promises and challenges

Tobias Marschall; Manja Marz; Thomas Abeel; Louis J. Dijkstra; Bas E. Dutilh; Ali Ghaffaari; Paul J. Kersey; Wigard P. Kloosterman; Veli Mäkinen; Adam M. Novak; Benedict Paten; David Porubsky; Eric Rivals; Can Alkan; Jasmijn A. Baaijens; Paul I. W. de Bakker; Valentina Boeva; Raoul J. P. Bonnal; Francesca Chiaromonte; Rayan Chikhi; Francesca D. Ciccarelli; Robin Cijvat; Erwin Datema; Cornelia M. van Duijn; Evan E. Eichler; Corinna Ernst; Eleazar Eskin; Erik Garrison; Mohammed El-Kebir; Gunnar W. Klau

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from representing reference genomes as strings to representing them as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
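To illustrate the string-to-graph shift highlighted in this review, here is a toy sequence graph (our own example, not taken from the article) in which node labels hold sequence and edges encode a single-nucleotide variant. A query counts as found if some walk through the graph spells it, starting at any offset of any node. The graph encoding and the function name `occurs_in_graph` are hypothetical, and node labels are assumed to be non-empty.

```python
from typing import Dict, List

def occurs_in_graph(labels: Dict[int, str], edges: Dict[int, List[int]], query: str) -> bool:
    """Return True if `query` is spelled by some walk through the graph."""

    def extend(node: int, offset: int, rest: str) -> bool:
        label = labels[node]
        k = 0
        # Consume as much of `rest` as this node's label allows.
        while k < len(rest) and offset + k < len(label):
            if label[offset + k] != rest[k]:
                return False
            k += 1
        if k == len(rest):
            return True
        # Label exhausted: continue matching in every successor node.
        return any(extend(nxt, 0, rest[k:]) for nxt in edges.get(node, []))

    return any(extend(node, off, query)
               for node, label in labels.items() for off in range(len(label)))


# Toy graph for haplotypes ACGTACGT and ACGCACGT (a T/C variant at the fourth base).
labels = {0: "ACG", 1: "T", 2: "C", 3: "ACGT"}
edges = {0: [1, 2], 1: [3], 2: [3]}
```

Here `occurs_in_graph(labels, edges, "GCAC")` is `True` even though `"GCAC"` does not occur in the linear reference string `"ACGTACGT"`. This gap is what a graph representation is meant to close.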

Symposium on Experimental and Efficient Algorithms | 2012

Space-Efficient top-k document retrieval

Gonzalo Navarro; Daniel Valenzuela

Supporting top-k document retrieval queries on general text databases, that is, finding the k documents where a given pattern occurs most frequently, has become a topic of interest with practical applications. While the problem has been solved in optimal time and linear space, the actual space usage is a serious concern. In this paper we study various reduced-space structures that support top-k retrieval and propose new alternatives. Our experimental results show that our novel structures and algorithms dominate almost the entire space/time tradeoff.
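The query itself can be stated as a brute-force sketch: count each document's occurrences within the pattern's suffix-array interval and keep the k largest with a heap. This is a naive baseline, not one of the reduced-space structures studied in the paper. The function name `top_k_documents` and the tie-break (smaller document id wins) are our assumptions.

```python
import heapq
from collections import Counter
from typing import List, Tuple

def top_k_documents(docs: List[str], pattern: str, k: int) -> List[Tuple[int, int]]:
    """Return up to k (document id, frequency) pairs, most frequent first."""
    text = "".join(d + "$" for d in docs)  # '$' terminates each document
    owner = [j for j, d in enumerate(docs) for _ in range(len(d) + 1)]
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    da = [owner[p] for p in sa]  # document array
    # Frequency of each document inside the pattern's SA interval.
    freq = Counter(da[i] for i, p in enumerate(sa) if text.startswith(pattern, p))
    # Larger frequency wins; ties go to the smaller document id.
    return heapq.nlargest(k, freq.items(), key=lambda kv: (kv[1], -kv[0]))
```

For `["banana", "bandana", "cabana"]`, the pattern `"na"` and `k = 2`, the result is `[(0, 2), (1, 1)]`.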

ACM Journal of Experimental Algorithms | 2015

General Document Retrieval in Compact Space

Gonzalo Navarro; Simon J. Puglisi; Daniel Valenzuela

Given a collection of documents and a query pattern, document retrieval is the problem of obtaining documents that are relevant to the query. The collection is available beforehand so that a data structure, called an index, can be built on it to speed up queries. While initially restricted to natural language text collections, document retrieval problems arise nowadays in applications like bioinformatics, multimedia databases, and web mining. This requires a more general setup where text and pattern can be general sequences of symbols, and the classical inverted indexes developed for words cannot be applied. While linear-space time-optimal solutions have been developed for most interesting queries in this general case, space usage is a serious problem in practice. In this article, we develop compact data structures that solve various important document retrieval problems on general text collections. More specifically, we provide practical solutions for listing the documents where a query pattern appears, together with its frequency in each document, and for listing the k documents where a query pattern appears most frequently. Some of our techniques build on existing theoretical proposals, while others are new. In particular, we introduce a novel grammar-based compressed bitmap representation that may be of independent interest when dealing with repetitive sequences. Ours are the first practical indexes that use less space when the text collection is compressible. Our experimental results show that, in various real-life text collections, our data structures are significantly smaller than the most space-efficient previous solutions, using up to half the space without noticeably increasing the query time. Overall, document listing can be carried out in 10 to 40 milliseconds for patterns that appear 100 to 10,000 times in the collection, whereas top-k retrieval is carried out in k to 10k milliseconds.
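One classic theoretical proposal for document listing is Muthukrishnan's algorithm. It may or may not match what this article implements. Define C[i] as the previous position that holds the same document as DA[i] (or -1). A range-minimum query over C[sp..ep] then finds, one at a time, the first occurrence of each distinct document in the interval. The sketch below uses a linear-scan RMQ for brevity, whereas practical versions use a compact RMQ structure. The function names and the toy DA are hypothetical.

```python
from typing import Dict, List

def previous_occurrence(da: List[int]) -> List[int]:
    """C[i] = largest j < i with da[j] == da[i], or -1 if there is none."""
    last: Dict[int, int] = {}
    c: List[int] = []
    for i, doc in enumerate(da):
        c.append(last.get(doc, -1))
        last[doc] = i
    return c

def list_distinct_documents(da: List[int], c: List[int], sp: int, ep: int) -> List[int]:
    """Report each distinct document in da[sp..ep] exactly once."""
    out: List[int] = []

    def recurse(lo: int, hi: int) -> None:
        if lo > hi:
            return
        i = min(range(lo, hi + 1), key=c.__getitem__)  # naive O(hi - lo) RMQ
        if c[i] >= sp:  # every entry in [lo, hi] repeats a document seen earlier in [sp, ep]
            return
        out.append(da[i])  # c[i] < sp: first occurrence of da[i] inside [sp, ep]
        recurse(lo, i - 1)
        recurse(i + 1, hi)

    recurse(sp, ep)
    return out
```

With `da = [2, 0, 1, 0, 2, 1, 0]`, calling `list_distinct_documents(da, previous_occurrence(da), 1, 5)` returns `[0, 1, 2]`. The work is proportional to the number of distinct documents reported, plus the cost of the RMQs.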

Amyotrophic Lateral Sclerosis | 2015

Amyotrophic lateral sclerosis mortality rates in Chile: A population based study (1994–2010)

Daniel Valenzuela; Pedro Zitko; Patricia Lillo

Our objective was to describe amyotrophic lateral sclerosis (ALS) mortality rates in the Chilean population over a 17-year period. Chilean death records (1994–2010) were reviewed for the ICD-10 diagnosis G12.2 (including motor neuron disease and similar conditions) and weighted with population data. Crude and standardized ALS mortality rates were calculated at the national level and by geographic zone. A risk analysis was performed in successive cohorts from 1910–1919 to 1960–1969, comparing mortality slopes. A total of 1,671 deaths were recorded during 1994–2010, with an average of 1.13 per 100,000, a 1.2:1 male/female ratio, and a statistically significant increase in the mortality rate. By geographic distribution, the Austral area, which has a larger population of European origin, showed higher mortality rates than the national average. The cohort analysis showed an increasing risk of dying from ALS for all cohorts, highest above 64 years of age, where ALS becomes a competing cause of death. In conclusion, as expected, the ALS mortality rate in Chile is higher than previously reported in our country and similar to that of other Latin American countries. The ALS mortality rate has increased over time, probably due to the aging of the population and declining rates of competing causes of death.
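The abstract reports crude and standardized rates but does not state the standardization method. A minimal sketch, assuming direct age standardization (our assumption, as is the choice of standard population), computes both rates as follows. The function names are hypothetical, and the numbers in the usage note are invented toy values, not data from the study.

```python
from typing import Sequence

def crude_rate(deaths: int, population: float, per: float = 100_000) -> float:
    """Deaths per `per` inhabitants, ignoring age structure."""
    return deaths / population * per

def direct_standardized_rate(deaths: Sequence[int], population: Sequence[float],
                             standard: Sequence[float], per: float = 100_000) -> float:
    """Weight each age-specific rate by the standard population's share of that age group."""
    if not (len(deaths) == len(population) == len(standard)):
        raise ValueError("age groups must align across all three sequences")
    total = sum(standard)
    weighted = sum(d / p * s for d, p, s in zip(deaths, population, standard))
    return weighted / total * per
```

With toy values of 1 and 9 deaths in age groups of 100,000 and 300,000 people, the crude rate is 2.5 per 100,000. Against a standard population split evenly between the two groups, the standardized rate is 2.0 per 100,000, up to floating-point rounding.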

Revista Medica De Chile | 2014

Neuropsychiatric and cognitive manifestations in frontotemporal dementia and amyotrophic lateral sclerosis: two poles of a common entity

Patricia Lillo; José Manuel Matamala; Daniel Valenzuela; Renato J. Verdugo; José Castillo; Agustín Ibáñez; Andrea Slachevsky

Recent genetic and neuropathologic advances support the concept that frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS) are overlapping multisystem disorders. While 10–15% of ALS patients fulfil criteria for FTD, features of motor neuron disease appear in approximately 15% of FTD patients during the course of the disease. This overlap has been reinforced by the discovery of Transactive Response DNA Binding Protein 43 kDa (TDP43) inclusions as the main neuropathologic finding in the majority of ALS cases and almost half of FTD cases. Also, an expansion in the intron of C9ORF72 (chromosome 9p21) has been identified in families affected by ALS, ALS-FTD and FTD. This review provides an update on the recent genetic and neuropathologic findings of ALS and FTD and a characterization of their clinical presentation forms, based on the current diagnostic criteria. Finally, it underscores the importance of a national registry of patients with ALS and FTD to enable earlier diagnosis and multidisciplinary care.

BMC Genomics | 2014

Recombination-aware alignment of diploid individuals

Veli Mäkinen; Daniel Valenzuela

Background: Traditionally, biological similarity search has been studied under the abstraction of a single string representing each genome. The more realistic representation of diploid genomes, with two strings defining the genome, has so far been largely omitted in this context. With the development of sequencing techniques and better phasing routines through haplotype assembly algorithms, we are not far from the situation where individual diploid genomes could be represented in their full complexity, with a pairwise alignment defining the genome. Results: We propose a generalization of global alignment that is designed to measure similarity between phased predictions of individual diploid genomes. This generalization takes into account that individual diploid genomes evolve through a mutation and recombination process, and that predictions may be erroneous in both dimensions. Even though our model is generic, we focus on the case where one wants to measure only the similarity of genome content, allowing free recombination. This results in efficient algorithms for direct application in (i) evaluation of variation calling predictions and (ii) progressive multiple alignment based on labeled directed acyclic graphs (DAGs) to represent profiles. The latter may be of more general interest in connection with covering alignments of DAGs. Extensions of our model and algorithms can be foreseen to have applications in evaluating phasing algorithms, as well as a more fundamental role in phasing a child genome based on its parent genomes.
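The following is a simplified illustration of what free recombination means for alignment. It is not the paper's model, which compares two phased diploid predictions with each other. The sketch aligns a single sequence against two equal-length, column-aligned haplotypes. Because switching haplotypes costs nothing, each column may be read from either one. The dynamic program therefore collapses to unit-cost edit distance, where a column matches if either haplotype carries the query character. The function name and the no-gap, equal-length assumption are ours.

```python
def free_recombination_distance(hap1: str, hap2: str, query: str) -> int:
    """Unit-cost edit distance between `query` and the best mosaic of two
    column-aligned haplotypes, switching between them at no cost."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be column-aligned (equal length)")
    n, m = len(hap1), len(query)
    prev = list(range(m + 1))  # row 0: all query characters inserted
    for i in range(1, n + 1):
        cur = [i] + [0] * m  # column 0: first i haplotype columns deleted
        column = {hap1[i - 1], hap2[i - 1]}  # characters available at this column
        for j in range(1, m + 1):
            sub = 0 if query[j - 1] in column else 1
            cur[j] = min(prev[j - 1] + sub,  # match/substitute against either haplotype
                         prev[j] + 1,        # delete this haplotype column
                         cur[j - 1] + 1)     # insert a query character
        prev = cur
    return prev[m]
```

For example, `free_recombination_distance("ACGT", "AGGA", "AGGT")` returns `0`, although `"AGGT"` differs from each haplotype alone by one substitution.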

Discrete Applied Mathematics | 2017