
Publication


Featured research published by Jeremy S. Archuleta.


High Performance Distributed Computing | 2010

MOON: MapReduce On Opportunistic eNvironments

Heshan Lin; Xiaosong Ma; Jeremy S. Archuleta; Wu-chun Feng; Mark K. Gardner; Zhe Zhang

MapReduce offers an easy-to-use programming paradigm for processing large data sets, making it an attractive model for distributed volunteer computing systems. However, unlike the dedicated resources on which MapReduce has mostly been deployed, such volunteer computing systems have significantly higher rates of node unavailability. Furthermore, nodes are not fully controlled by the MapReduce framework. Consequently, we found the data and task replication scheme adopted by existing MapReduce implementations woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. Our tests on an emulated volunteer computing system, which uses a 60-node cluster where each node possesses a hardware configuration similar to that of a typical computer in a student lab, demonstrate that MOON can deliver a three-fold performance improvement over Hadoop in volatile, volunteer computing environments.
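
The hybrid replication idea can be illustrated with a small sketch. The Python fragment below is a hypothetical rendering (not MOON's actual code) of one core decision: how many replicas a data block needs on volatile volunteer nodes to meet a reliability target, falling back to a dedicated node when volunteers alone cannot meet it. The availability and reliability figures are illustrative assumptions.

```python
import random

# Hypothetical sketch of MOON-style hybrid replica placement; names and
# thresholds are illustrative, not taken from the MOON codebase.

VOLATILE_AVAILABILITY = 0.6   # assumed mean availability of a volunteer node
TARGET_RELIABILITY = 0.99     # desired probability that some replica survives

def replicas_needed(node_availability, target):
    """Smallest k with 1 - (1 - a)^k >= target."""
    k, survive = 0, 0.0
    while survive < target:
        k += 1
        survive = 1 - (1 - node_availability) ** k
    return k

def place_block(volunteer_nodes, dedicated_nodes):
    """Replicate a data block: mostly on volunteers, with one dedicated copy
    as a reliability anchor when volunteers alone cannot meet the target."""
    k = replicas_needed(VOLATILE_AVAILABILITY, TARGET_RELIABILITY)
    placement = random.sample(volunteer_nodes, min(k, len(volunteer_nodes)))
    if (1 - (1 - VOLATILE_AVAILABILITY) ** len(placement)) < TARGET_RELIABILITY:
        placement.append(random.choice(dedicated_nodes))  # fall back to dedicated pool
    return placement

if __name__ == "__main__":
    volunteers = [f"vol-{i}" for i in range(60)]   # mirrors the 60-node emulation
    dedicated = ["ded-0", "ded-1"]
    print(place_block(volunteers, dedicated))
```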


International Parallel and Distributed Processing Symposium | 2005

Towards efficient supercomputing: a quest for the right metric

Chung-Hsing Hsu; Wu-chun Feng; Jeremy S. Archuleta

Over the past decade, we have been building less and less efficient supercomputers, resulting in the construction of substantially larger machine rooms and even new buildings. In addition, because of the thermal power envelope of these supercomputers, a small fortune must be spent to cool them. These infrastructure costs, coupled with the additional costs of administering and maintaining such (unreliable) supercomputers, dramatically increase their total cost of ownership. As a result, there has been substantial interest in recent years to produce more reliable and more efficient supercomputers that are easy to maintain and use. But how does one quantify efficient supercomputing? That is, what metric should be used to evaluate how efficiently a supercomputer delivers answers? We argue that existing efficiency metrics such as the performance-power ratio are insufficient and motivate the need for a new type of efficiency metric, one that incorporates, for instance, notions of reliability, availability, productivity, and total cost of ownership (TCO). In doing so, however, this paper raises more questions than it answers with respect to efficiency. And in the end, we still return to the performance-power ratio as an efficiency metric with respect to power and use it to evaluate a menagerie of processor platforms in order to provide a set of reference data points for the high-performance computing community.
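
As a concrete illustration of the metric the paper falls back on, the short sketch below computes the performance-power ratio (here, GFLOPS per watt) for a few hypothetical platforms; the names and numbers are placeholders, not the paper's measured data points.

```python
# Illustrative computation of the performance-power efficiency metric
# (performance divided by power, here GFLOPS per watt). The platforms and
# figures below are placeholders, not the paper's measurements.

platforms = {
    "platform-A": {"gflops": 500.0, "watts": 250.0},
    "platform-B": {"gflops": 320.0, "watts": 120.0},
    "platform-C": {"gflops": 900.0, "watts": 600.0},
}

for name, p in sorted(platforms.items(),
                      key=lambda kv: kv[1]["gflops"] / kv[1]["watts"],
                      reverse=True):
    print(f"{name}: {p['gflops'] / p['watts']:.2f} GFLOPS/W")
```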


BMC Bioinformatics | 2010

Missing genes in the annotation of prokaryotic genomes

Andrew S. Warren; Jeremy S. Archuleta; Wu-chun Feng; João C. Setubal

Background: Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However, there have been reports that prokaryotic gene-finder programs have problems with small genes (either over-predicting or under-predicting them). Therefore, the question arises as to whether current genome annotations are systematically missing small genes.

Results: We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene, and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs.

Conclusions: Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
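
The core enumeration step lends itself to a short sketch. Below is a minimal Python ORF scanner, assuming ATG starts, the canonical stop codons, and the paper's 33 aa cutoff; it is a simplification for illustration, as the actual pipeline scans both strands and then performs the all-against-all comparison and taxonomic filtering described above.

```python
# Minimal ORF scanner, a simplified sketch of the enumeration step only.
# Assumes ATG starts, canonical stops, and the forward strand; the real
# pipeline scans both strands and then compares ORFs across all replicons.

STOPS = {"TAA", "TAG", "TGA"}
MIN_AA = 33  # the paper's minimum ORF length

def orfs(seq, min_aa=MIN_AA):
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                aa_len = (i - start) // 3  # codons before the stop
                if aa_len >= min_aa:
                    found.append((start, i + 3))
                start = None
    return found

if __name__ == "__main__":
    demo = "ATG" + "GCT" * 40 + "TAA"  # 41-codon ORF, passes the 33 aa cutoff
    print(orfs(demo))
```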


International Parallel and Distributed Processing Symposium | 2009

Multi-dimensional characterization of temporal data mining on graphics processors

Jeremy S. Archuleta; Yong Cao; Thomas R. W. Scogland; Wu-chun Feng

Through the algorithmic design patterns of data parallelism and task parallelism, the graphics processing unit (GPU) offers the potential to vastly accelerate discovery and innovation across a multitude of disciplines. For example, the exponential growth in data volume now presents an obstacle for high-throughput data mining in fields such as neuroscience and bioinformatics. As such, we present a characterization of a MapReduce-based data-mining application on a general-purpose GPU (GPGPU). Using neuroscience as the application vehicle, the results of our multi-dimensional performance evaluation show that a “one-size-fits-all” approach maps poorly across different GPGPU cards. Rather, a high-performance implementation on the GPGPU should factor in 1) the problem size, 2) the type of GPU, 3) the type of algorithm, and 4) the data-access method when determining the type and level of parallelism. To guide the GPGPU programmer towards optimal performance within such a broad design space, we provide eight general performance characterizations of our data-mining application.
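
Read operationally, the "no one-size-fits-all" finding says a launch configuration should be selected from the four factors above. The sketch below encodes such a selection as a toy heuristic; the factor names follow the paper, but the decision rules are invented for illustration, since the paper's actual guidance is empirical.

```python
# Hypothetical heuristic that picks a parallelism scheme from the four
# factors the paper identifies. The decision rules are illustrative only;
# the paper derives its guidance empirically per GPU and problem size.

def choose_scheme(problem_size, gpu_cores, algorithm, data_access):
    """Return (mapping, memory) choices for a GPGPU data-mining kernel."""
    mapping = "one-thread-per-record" if problem_size >= gpu_cores else "one-block-per-record"
    memory = "shared-memory-tiling" if data_access == "reused" else "coalesced-global"
    if algorithm == "counting":          # reduction-heavy kernels favor block-level work
        mapping = "one-block-per-record"
    return mapping, memory

print(choose_scheme(problem_size=1_000_000, gpu_cores=240,
                    algorithm="scan", data_access="reused"))
```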


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2008

Semantics-based distributed I/O for mpiBLAST

Pavan Balaji; Wu-chun Feng; Jeremy S. Archuleta; Heshan Lin; Rajkumar Kettimuthu; Rajeev Thakur; Xiaosong Ma

BLAST is a widely used software toolkit for genomic sequence search. mpiBLAST is a freely available, open-source parallelization of BLAST that uses database segmentation to allow different worker processes to search (in parallel) unique segments of the database. After searching, the workers write their output to a filesystem. While mpiBLAST has been shown to achieve high performance in clusters with fast local filesystems, its I/O processing remains a concern for scalability, especially in systems having limited I/O capabilities such as distributed filesystems spread across a wide-area network. Thus, we present ParaMEDIC---a novel environment that uses application-specific semantic information to compress I/O data and improve performance in distributed environments. Specifically, for mpiBLAST, ParaMEDIC partitions worker processes into compute and I/O workers. Instead of directly writing the output to the filesystem, compute workers process the output using semantic knowledge about the application to generate metadata, and write that metadata to the filesystem. I/O workers, which physically reside closer to the actual storage, then process this metadata to re-create the actual output and write it to the filesystem. This approach allows ParaMEDIC to reduce I/O time, thus accelerating mpiBLAST by as much as 25-fold.
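
The compute-worker/I/O-worker split is easy to miniaturize. In the hypothetical Python sketch below, the "semantic compression" is reduced to shipping only match offsets plus a query identifier, from which an I/O worker near storage regenerates the full report; the metadata format is invented, standing in for ParaMEDIC's actual application-specific metadata.

```python
# Toy sketch of the ParaMEDIC pattern: compute workers emit compact,
# semantically derived metadata; I/O workers near storage expand it back
# into the full output. The metadata format here is invented for illustration.

def compute_worker(query_id, database, query):
    """Search, but ship only what is needed to recreate the output."""
    hits = [i for i in range(len(database) - len(query) + 1)
            if database[i:i + len(query)] == query]
    return {"query_id": query_id, "query": query, "hit_offsets": hits}  # small metadata

def io_worker(metadata, database):
    """Runs close to storage: rebuild the verbose report for writing."""
    lines = [f"hit for {metadata['query_id']} at offset {off}: "
             f"{database[off:off + len(metadata['query'])]}"
             for off in metadata["hit_offsets"]]
    return "\n".join(lines)

db = "ACGTACGTTTACGT"
meta = compute_worker("q1", db, "ACGT")   # sent over the wide-area link
print(io_worker(meta, db))                # full output materialized locally
```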


International Conference of the IEEE Engineering in Medicine and Biology Society | 2007

A Pluggable Framework for Parallel Pairwise Sequence Search

Jeremy S. Archuleta; Wu-chun Feng; Eli Tilevich

The current and near future of the computing industry is one of multi-core and multi-processor technology. Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems. This discrepancy between software design and hardware architecture substantially hinders sequence-search performance by not allowing full utilization of the hardware. This paper presents a novel framework that aids the conversion of serial sequence-search tools into parallel versions that can take full advantage of the available hardware. The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort. The inherent modular design improves maintenance and extensibility, thus opening up a plethora of opportunities for advanced algorithmic features to be developed and incorporated while routine maintenance of the codebase continues.
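
Mixin layers compose a framework from stackable class layers, each refining one role. The short Python approximation below uses cooperative multiple inheritance as a stand-in for C++-style parameterized inheritance; the layer and method names are invented for illustration.

```python
# Minimal Python stand-in for the mixin-layers pattern: each layer refines
# one role of the search pipeline and defers to the next layer via super().
# Layer and method names are invented for illustration.

class CoreSearch:
    def search(self, query, db):
        return [s for s in db if query in s]

class ScoringLayer(CoreSearch):
    def search(self, query, db):
        hits = super().search(query, db)
        return sorted(hits, key=len)          # refine: rank the raw hits

class ParallelLayer(CoreSearch):
    def search(self, query, db):
        # refine: shard the database, search shards, merge (serially here)
        mid = len(db) // 2
        return super().search(query, db[:mid]) + super().search(query, db[mid:])

# "Plug in" modules by composing layers; swapping a layer changes behavior
# without touching the others.
class MySearchTool(ScoringLayer, ParallelLayer):
    pass

tool = MySearchTool()
print(tool.search("ACG", ["ACGT", "TTT", "AACGG"]))
```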


International Journal of Parallel Programming | 2012

Parallel Mining of Neuronal Spike Streams on Graphics Processing Units

Yong Cao; Debprakash Patnaik; Sean P. Ponce; Jeremy S. Archuleta; Patrick Butler; Wu-chun Feng; Naren Ramakrishnan

Multi-electrode arrays (MEAs) provide dynamic and spatial perspectives into brain function by capturing the temporal behavior of spikes recorded from cultures and living tissue. Understanding the firing patterns of neurons implicit in these spike trains is crucial to gaining insight into cellular activity. We present a solution involving a massively parallel graphics processing unit (GPU) to mine spike train datasets. We focus on mining frequent episodes of firing patterns that capture coordinated events even in the presence of intervening background events. We present two algorithmic strategies—hybrid mining and two-pass elimination—to map the finite state machine-based counting algorithms onto GPUs. These strategies explore different computation-to-core mapping schemes and illustrate innovative parallel algorithm design patterns for temporal data mining. We also provide a multi-GPU mining framework, which exhibits additional performance enhancement. Together, these contributions move us towards a real-time solution to neuronal data mining.
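
The finite-state-machine counting that these strategies parallelize can be shown in serial form. The sketch below counts non-overlapped occurrences of a single episode in a spike stream while tolerating intervening background events; it omits the temporal expiry constraints of the real algorithms, which run many such automata concurrently across GPU cores.

```python
# Serial sketch of FSM-based episode counting: advance one automaton
# through the spike stream, tolerating intervening background events.
# This is the sequential kernel the paper's GPU strategies parallelize;
# the simplification here omits temporal expiry constraints.

def count_episode(stream, episode):
    """Count non-overlapped occurrences of `episode` as a subsequence."""
    state, count = 0, 0
    for _, neuron in stream:                 # stream of (time, neuron) events
        if neuron == episode[state]:
            state += 1
            if state == len(episode):        # automaton accepted: one occurrence
                count += 1
                state = 0                    # reset for non-overlapped counting
    return count

spikes = [(0.1, "A"), (0.2, "D"), (0.3, "B"), (0.5, "C"),
          (0.7, "A"), (0.8, "B"), (0.9, "C")]
print(count_episode(spikes, ("A", "B", "C")))   # -> 2
```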


Conference on High Performance Computing (Supercomputing) | 2006

Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications

Mark K. Gardner; Wu-chun Feng; Jeremy S. Archuleta; Heshan Lin; Xiaosong Ma


International Conference on Software Maintenance | 2007

A Maintainable Software Architecture for Fast and Modular Bioinformatics Sequence Search

Jeremy S. Archuleta; Eli Tilevich; Wu-chun Feng


Computing Frontiers | 2010

Towards chip-on-chip neuroscience: fast mining of neuronal spike streams using graphics hardware

Yong Cao; Debprakash Patnaik; Sean P. Ponce; Jeremy S. Archuleta; Patrick Butler; Wu-chun Feng; Naren Ramakrishnan

Collaboration


Dive into Jeremy S. Archuleta's collaborations.

Top Co-Authors

Pavan Balaji

Argonne National Laboratory
