Alexander Behm
University of California, Irvine
Publication
Featured research published by Alexander Behm.
Distributed and Parallel Databases | 2011
Alexander Behm; Vinayak R. Borkar; Michael J. Carey; Raman Grover; Chen Li; Nicola Onose; Rares Vernica; Alin Deutsch; Yannis Papakonstantinou; Vassilis J. Tsotras
ASTERIX is a new data-intensive storage and computing platform project spanning UC Irvine, UC Riverside, and UC San Diego. In this paper we provide an overview of the ASTERIX project, starting with its main goal—the storage and analysis of data pertaining to evolving-world models. We describe the requirements and associated challenges, and explain how the project is addressing them. We provide a technical overview of ASTERIX, covering its architecture, its user model for data and queries, and its approach to scalable query processing and data management. ASTERIX utilizes a new scalable runtime computational platform called Hyracks that is also discussed at an overview level; we have recently made Hyracks available in open source for use by other interested parties. We also relate our work on ASTERIX to the current state of the art and describe the research challenges that we are currently tackling as well as those that lie ahead.
Nucleic Acids Research | 2012
Athena Ahmadi; Alexander Behm; Nagesh Honnalli; Chen Li; Lingjie Weng; Xiaohui Xie
Recent advances in sequencing technology have enabled the rapid generation of billions of bases at relatively low cost. A crucial first step in many sequencing applications is to map the resulting reads to a reference genome. However, when the reference genome is large, finding accurate mappings poses a significant computational challenge because of the sheer number of reads, and because many reads map to the reference sequence approximately but not exactly. We introduce Hobbes, a new gram-based program for aligning short reads, supporting Hamming and edit distance. Hobbes implements two novel techniques, which yield substantial performance improvements: an optimized gram-selection procedure for reads, and a cache-efficient filter for pruning candidate mappings. We systematically tested the performance of Hobbes on both real and simulated data with read lengths varying from 35 to 100 bp, and compared its performance with several state-of-the-art read-mapping programs, including Bowtie, BWA, mrsFAST and RazerS. Hobbes is faster than all other read-mapping programs we have tested while maintaining high mapping quality. Hobbes is about five times faster than Bowtie and about 2–10 times faster than BWA, depending on read length and error rate, when asked to find all mapping locations of a read in the human genome within a given Hamming or edit distance, respectively. Hobbes supports the SAM output format and is publicly available at http://hobbes.ics.uci.edu.
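The general idea of gram-based read mapping under Hamming distance can be illustrated with a minimal sketch. It is not Hobbes's gram-selection procedure or its cache-efficient filter; it assumes the plain pigeonhole approach, in which a read that matches with at most k mismatches must contain at least one of its k+1 non-overlapping grams unchanged. All function names and the toy reference below are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_gram_index(reference, q):
    """Map every length-q substring (gram) of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def map_read(read, reference, index, q, k):
    """Return sorted reference positions where `read` aligns with <= k mismatches.

    Pigeonhole filter (an assumption of this sketch, not Hobbes's method):
    k mismatches can touch at most k of the first k+1 non-overlapping grams
    of the read, so at least one of those grams occurs exactly in the reference.
    """
    n = len(read)
    if n < (k + 1) * q:
        raise ValueError("read too short for k+1 non-overlapping grams")
    candidates = set()
    for i in range(k + 1):
        offset = i * q
        gram = read[offset:offset + q]
        for pos in index.get(gram, []):
            start = pos - offset  # implied alignment start of the whole read
            if 0 <= start <= len(reference) - n:
                candidates.add(start)
    # Verification step: keep only candidates within the distance threshold.
    return sorted(s for s in candidates
                  if hamming(read, reference[s:s + n]) <= k)


if __name__ == "__main__":
    reference = "GATTACAGATTTCA"  # toy reference, made up for illustration
    index = build_gram_index(reference, q=3)
    print(map_read("ATTACA", reference, index, q=3, k=1))
```

The filter only proposes candidate positions; every candidate is still verified with a full Hamming comparison, so the filter affects speed but not correctness.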
Advances in Geographic Information Systems | 2010
Sattam Alsubaiee; Alexander Behm; Chen Li
Many Web sites support keyword search on their spatial data, such as business listings and photos. In these systems, inconsistencies and errors can exist in both queries and the data. To bridge the gap between queries and data, it is important to support approximate keyword search on spatial data. In this paper we study how to answer such queries efficiently. We focus on a natural index structure that augments a tree-based spatial index with capabilities for approximate keyword search. We systematically study how to efficiently combine these two types of indexes, and how to search the resulting index to find answers. We develop three algorithms for constructing the index, successively improving the time and space efficiency by exploiting the textual and spatial properties of the data. We experimentally demonstrate the efficiency of our techniques on real, large datasets.
International Conference on Data Engineering | 2011
Alexander Behm; Chen Li; Michael J. Carey
An approximate string query finds, from a collection of strings, those that are similar to a given query string. Answering such queries is important in many applications such as data cleaning and record linkage, where errors can occur in queries as well as in the data. Many existing algorithms have focused on in-memory indexes. In this paper we investigate how to efficiently answer such queries in a disk-based setting, by systematically studying the effects of storing data and indexes on disk. We devise a novel physical layout for an inverted index to answer queries and we study how to construct it with limited buffer space. To answer queries, we develop a cost-based, adaptive algorithm that balances the I/O costs of retrieving candidate matches and accessing inverted lists. Experiments on large, real datasets verify that simply adapting existing algorithms to a disk-based setting does not work well and that our new techniques answer queries efficiently. Further, our solutions significantly outperform a recent tree-based index, BED-tree.
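To make the notion of an approximate string query concrete, here is a minimal in-memory sketch. It is not the disk-based layout or the cost-based adaptive algorithm of the paper; it assumes a textbook q-gram inverted index with the standard count filter (a string within edit distance k of the query shares at least |query| - q + 1 - k*q grams with it, counted with multiplicity), followed by verification. All names and the sample strings are hypothetical.

```python
from collections import Counter, defaultdict


def qgrams(s, q):
    """Overlapping q-grams of s, without padding."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def build_index(strings, q):
    """Inverted index: gram -> {string id: occurrences of the gram in that string}."""
    index = defaultdict(dict)
    for sid, s in enumerate(strings):
        for gram, count in Counter(qgrams(s, q)).items():
            index[gram][sid] = count
    return index


def search(query, strings, index, q, k):
    """Return the strings within edit distance k of query, in collection order."""
    threshold = len(query) - q + 1 - k * q  # count-filter lower bound
    if threshold <= 0:
        # The filter cannot prune anything; fall back to checking every string.
        candidates = range(len(strings))
    else:
        shared = Counter()
        for gram, q_count in Counter(qgrams(query, q)).items():
            for sid, s_count in index.get(gram, {}).items():
                shared[sid] += min(q_count, s_count)
        candidates = sorted(sid for sid, c in shared.items() if c >= threshold)
    return [strings[sid] for sid in candidates
            if edit_distance(query, strings[sid]) <= k]


if __name__ == "__main__":
    strings = ["database", "databse", "datum", "distributed"]  # made-up sample data
    index = build_index(strings, q=2)
    print(search("database", strings, index, q=2, k=1))
```

In this sketch the inverted lists live in a Python dictionary; the paper's contribution concerns what changes when those lists and the data reside on disk, which the sketch does not attempt to model.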
Archive | 2016
Marcel Kornacker; Alexander Behm; Victor Bittorf; Taras Bobrovytsky; Casey Ching; Alan Choi; Justin Erickson; Martin Grund; Daniel Hecht; Matthew Jacobs; Ishaan Joshi; Lenni Kuff; Dileep Kumar; Alex Leblang; Nong Li; Ippokratis Pandis; Henry Noel Robinson; David Rorke; Silvius Rus; John Russell; Dimitris Tsirogiannis; Skye Wanderman-Milne; Michael Yoder
Impala from Cloudera is a modern, massively parallel database system designed from the ground up for the needs and requirements of a Big Data environment such as Hadoop. The goal of Impala is to execute classic SQL queries with low latency and short runtimes, as users expect from typical BI/DW solutions. At the same time, it should read very large source data sets in Hadoop without requiring an additional extraction process into separate systems. This chapter gives an overview of Impala from the user's perspective and describes the main components and their design decisions in more detail. In addition, we present a performance comparison with other well-known SQL-on-Hadoop solutions that underscores Impala's distinctive approach.
Conference on Innovative Data Systems Research | 2015
Marcel Kornacker; Alexander Behm; Victor Bittorf; Taras Bobrovytsky; Casey Ching; Alan Choi; Justin Erickson; Martin Grund; Daniel Hecht; Matthew Jacobs; Ishaan Joshi; Lenni Kuff; Dileep Kumar; Alex Leblang; Nong Li; Ippokratis Pandis; Henry Noel Robinson; David Rorke; Silvius Rus; John Russell; Dimitris Tsirogiannis; Skye Wanderman-Milne; Michael Yoder
International Conference on Data Engineering | 2009
Alexander Behm; Shengyue Ji; Chen Li; Jiaheng Lu
Very Large Data Bases | 2014
Sattam Alsubaiee; Yasser Altowim; Hotham Altwaijry; Alexander Behm; Vinayak R. Borkar; Yingyi Bu; Michael J. Carey; Inci Cetindil; Madhusudan Cheelangi; Khurram Faraaz; Eugenia Gabrielova; Raman Grover; Zachary Heilbron; Young-Seok Kim; Chen Li; Guangqiang Li; Ji Mahn Ok; Nicola Onose; Pouria Pirzadeh; Vassilis J. Tsotras; Rares Vernica; Jian Wen; Till Westmann
Very Large Data Bases | 2012
Sattam Alsubaiee; Yasser Altowim; Hotham Altwaijry; Alexander Behm; Vinayak R. Borkar; Yingyi Bu; Michael J. Carey; Raman Grover; Zachary Heilbron; Young-Seok Kim; Chen Li; Nicola Onose; Pouria Pirzadeh; Rares Vernica; Jian Wen
Very Large Data Bases | 2014
Sattam Alsubaiee; Alexander Behm; Vinayak R. Borkar; Zachary Heilbron; Young-Seok Kim; Michael J. Carey; Markus Dreseler; Chen Li