Hieu Dinh | Researchain

Publication

Featured research published by Hieu Dinh.

BMC Bioinformatics | 2011

PMS5: an efficient exact algorithm for the (ℓ, d )-motif finding problem

Hieu Dinh; Sanguthevar Rajasekaran; Vamsi Kundeti

Background: Motifs are patterns found in biological sequences that are vital for understanding gene function, human disease, drug design, etc. They are helpful in finding transcriptional regulatory elements, transcription factor binding sites, and so on. As a result, the problem of identifying motifs is crucial in biology. Results: Many facets of the motif search problem have been identified in the literature. One of them is (ℓ, d)-motif search (or Planted Motif Search (PMS)). The PMS problem has been well investigated and shown to be NP-hard. Any algorithm for PMS that always finds all the (ℓ, d)-motifs on a given input set is called an exact algorithm. In this paper we focus on exact algorithms only. All the known exact algorithms for PMS take exponential time in some of the underlying parameters in the worst case. This does not mean, however, that we cannot design exact algorithms that solve practical instances within a reasonable amount of time. In this paper, we propose a fast algorithm that can solve the well-known challenging instances of PMS: (21, 8) and (23, 9). No prior exact algorithm could solve these instances. In particular, our proposed algorithm takes about 10 hours on the challenging instance (21, 8) and about 54 hours on the challenging instance (23, 9). The algorithm was run on a single 2.4 GHz PC with 3 GB RAM. The implementation of PMS5 is freely available on the web at http://www.pms.engr.uconn.edu/downloads/PMS5.zip. Conclusions: We present an efficient algorithm, PMS5, that uses some novel ideas and combines them with the well-known algorithms PMS1 and PMSPrune. PMS5 can tackle the large challenging instances (21, 8) and (23, 9). Therefore, we hope that PMS5 will help biologists discover longer motifs in the future.
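To make the problem statement concrete, the sketch below enumerates (ℓ, d)-motifs by brute force, assuming the usual definition: a string M of length ℓ is an (ℓ, d)-motif if every input sequence contains a length-ℓ substring within Hamming distance d of M. It is not PMS5; the function names and the DNA alphabet default are illustrative assumptions, and the running time grows as |alphabet|^ℓ, so it is only usable on toy inputs.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some substring of seq of length len(motif) is within distance d."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """Brute-force enumeration of all (l, d)-motifs (illustrative only)."""
    result = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        # A candidate qualifies only if it has a near-occurrence in every sequence.
        if all(occurs_within(candidate, s, d) for s in seqs):
            result.append(candidate)
    return result


seqs = ["AACGT", "ACGTT", "GACGA"]
print(planted_motifs(seqs, 3, 0))  # 3-mers shared exactly by all three sequences
```

With d = 0 the search reduces to finding ℓ-mers common to all sequences; raising d enlarges the candidate set, which is what makes instances such as (21, 8) hard for exhaustive approaches.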

PLOS ONE | 2012

qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences

Hieu Dinh; Sanguthevar Rajasekaran; Jaime Davila

Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motif discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the (ℓ, d)-motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worst case. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our algorithm qPMS7 is on average 5 times faster than the state-of-the-art algorithm. The executable program of qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com.

BMC Bioinformatics | 2010

Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs

Vamsi K. Kundeti; Sanguthevar Rajasekaran; Hieu Dinh; Matthew W. Vaughn; Vishal Thapar

Background: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories, based on the data structures they employ. The first class uses an overlap/string graph and the second uses a de Bruijn graph. With the recent advances in short-read sequencing technology, however, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm was given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). Results: In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it easy to extend to the out-of-core model, where it has an optimal I/O complexity of Θ(n log(n/B) / (B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on an SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster, both asymptotically and in practice. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. Conclusions: The bi-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on the Eulerian approach. Our algorithms for constructing bi-directed de Bruijn graphs are efficient in parallel and out-of-core settings. These algorithms can be used in building large-scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally, our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
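As a rough illustration of the data structure discussed above, the sketch below builds a simplified, in-memory de Bruijn-style graph in which each node is a canonical k-mer (the lexicographically smaller of a k-mer and its reverse complement), so a read and its reverse complement produce the same graph. This is an assumption-laden simplification, not the paper's parallel or out-of-core algorithm: it records adjacency only and omits the per-endpoint orientations that a full bi-directed edge carries. All function names are hypothetical.

```python
from collections import defaultdict

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Canonical representative of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def build_graph(reads, k):
    """Adjacency sets over canonical k-mers (edge orientations omitted)."""
    adj = defaultdict(set)
    for read in reads:
        # Consecutive k-mers in a read overlap by k-1 characters.
        for i in range(len(read) - k):
            u = canonical(read[i:i + k])
            v = canonical(read[i + 1:i + k + 1])
            adj[u].add(v)
            adj[v].add(u)
    return adj


# A read and its reverse complement yield the same graph.
print(dict(build_graph(["AAGT"], 3)))
print(dict(build_graph([revcomp("AAGT")], 3)))
```

Holding every k-mer in a dictionary is exactly the memory cost that motivates sorting-based and out-of-core constructions for large read sets.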

International Conference on Smart Grid Communications | 2010

Cascading Failures in Smart Grid - Benefits of Distributed Generation

Xian Chen; Hieu Dinh; Bing Wang

Smart grid is envisioned to incorporate local distributed power generation for better efficiency and flexibility. Distributed generation, when not used carefully, however, may compromise the stability of the grid. Recently, researchers have proposed innovative architectures (e.g., microgrid, LoCal grid) that virtualize a local generator as a constant load, source, or zero load to the grid, thus offering great promise to connect distributed generation into the grid without sacrificing its reliability. In fact, intuitively, using these architectures, distributed generation may enhance the stability of the power grid. In this paper, we develop a simulation model to quantify how much distributed generation can mitigate cascading failures. Applying this model to IEEE power grid test cases, we find that local power generation, even when only using a small number of local generators, can reduce the likelihood of cascading failures dramatically.

BMC Research Notes | 2011

A speedup technique for (l, d)-motif finding algorithms

Sanguthevar Rajasekaran; Hieu Dinh

Background: The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results: Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of them. The exact version of the PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms. Conclusions: We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. The experimental results show that the technique is indeed very effective. The implementation of the algorithms is freely available on the web at http://www.engr.uconn.edu/rajasek/PMS4.zip

IEEE Transactions on Mobile Computing | 2012

Fault Localization Using Passive End-to-End Measurements and Sequential Testing for Wireless Sensor Networks

Bing Wang; Wei Wei; Hieu Dinh; Wei Zeng; Krishna R. Pattipati

Faulty components in a network need to be localized and repaired to sustain the health of the network. In this paper, we propose a novel approach that carefully combines active and passive measurements to localize faults in wireless sensor networks. More specifically, we formulate a problem of optimal sequential testing guided by end-to-end data. This problem determines an optimal testing sequence of network components based on end-to-end data in sensor networks to minimize expected testing cost. We prove that this problem is NP-hard, and propose a recursive approach to solve it. This approach leads to a polynomial-time optimal algorithm for line topologies while requiring exponential running time for general topologies. We further develop two polynomial-time heuristic schemes that are applicable to general topologies. Extensive simulation shows that our heuristic schemes only require testing a very small set of network components to localize and repair all faults in the network. Our approach is superior to using active and passive measurements in isolation. It also outperforms the state-of-the-art approaches that localize and repair all faults in a network.

Wireless Algorithms, Systems, and Applications | 2009

Data Collection with Multiple Sinks in Wireless Sensor Networks

Sixia Chen; Matthew A. Coolbeth; Hieu Dinh; Yoo-Ah Kim; Bing Wang

In this paper, we consider the Multiple-Sink Data Collection Problem in wireless sensor networks, where a large amount of data from sensor nodes needs to be transmitted to one of multiple sinks. We design an approximation algorithm to minimize the latency of the data collection schedule and show that it gives a constant-factor performance guarantee. We also present a heuristic algorithm based on breadth-first search for this problem. Using simulation, we evaluate the performance of these two algorithms, and show that the approximation algorithm outperforms the heuristic by up to 60%.

Bioinformatics | 2011

A memory-efficient data structure representing exact-match overlap graphs with application for next-generation DNA assembly

Hieu Dinh; Sanguthevar Rajasekaran

Motivation: Exact-match overlap graphs have been broadly used in the context of DNA assembly and the shortest superstring problem, where the number of strings n ranges from thousands to billions. The length ℓ of the strings is from 25 to 1000, depending on the DNA sequencing technology. However, many DNA assemblers using overlap graphs need too much time and space to construct the graphs. It is nearly impossible for these DNA assemblers to handle the huge amount of data produced by next-generation sequencing technologies, where the number n of strings could be several billion. If the overlap graph is explicitly stored, it would require Ω(n²) memory, which could be prohibitive in practice when n is greater than a hundred million. In this article, we propose a novel data structure with which the overlap graph can be compactly stored. This data structure requires only linear time to construct and linear memory to store. Results: For a given set of input strings (also called reads), we can informally define an exact-match overlap graph as follows. Each read is represented as a node in the graph and there is an edge between two nodes if the corresponding reads overlap sufficiently. A formal description follows. The maximal exact-match overlap of two strings x and y, denoted by ov_max(x, y), is the longest string which is a suffix of x and a prefix of y. The exact-match overlap graph of n given strings of length ℓ is an edge-weighted graph in which each vertex is associated with a string and there is an edge (x, y) of weight ω = ℓ − |ov_max(x, y)| if and only if ω ≤ λ, where |ov_max(x, y)| is the length of ov_max(x, y) and λ is a given threshold. In this article, we show that exact-match overlap graphs can be represented by a compact data structure that can be stored using at most (2λ − 1)(2⌈log n⌉ + ⌈log λ⌉)n bits with a guarantee that the basic operation of accessing an edge takes O(log λ) time. We also propose two algorithms for constructing the data structure for the exact-match overlap graph. The first algorithm runs in O(λℓn log n) worst-case time and requires O(λ) extra memory. The second one runs in O(λℓn) time and requires O(n) extra memory. Our experimental results on a huge amount of simulated data from sequence assembly show that the data structure can be constructed efficiently in time and memory. Availability: Our DNA sequence assembler that incorporates the data structure is freely available on the web at http://www.engr.uconn.edu/~htd06001/assembler/leap.zip
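The formal definition above can be turned directly into a naive construction, sketched here to make ov_max, the edge weight ω, and the threshold λ concrete. This is the explicit quadratic-time representation that the abstract argues is impractical, not the compact data structure the paper proposes. The function names are hypothetical, and edges are keyed by the read strings themselves, which assumes the reads are distinct and of equal length.

```python
def ov_max(x, y):
    """Longest string that is both a suffix of x and a prefix of y."""
    for k in range(min(len(x), len(y)), 0, -1):
        if x.endswith(y[:k]):
            return y[:k]
    return ""


def overlap_graph(reads, lam):
    """Explicit exact-match overlap graph: {(x, y): weight} with weight <= lam."""
    length = len(reads[0])  # all reads are assumed to have the same length
    edges = {}
    for i, x in enumerate(reads):
        for j, y in enumerate(reads):
            if i == j:
                continue
            weight = length - len(ov_max(x, y))
            if weight <= lam:
                edges[(x, y)] = weight
    return edges


reads = ["ACGTA", "GTACC", "TACCG"]
print(overlap_graph(reads, 2))
```

Every ordered pair of reads is examined here, which is why an explicit construction does not scale to the hundreds of millions of reads mentioned in the abstract.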

Computer Communications | 2012

Sniffer channel selection for monitoring wireless LANs

Xian Chen; Yoo-Ah Kim; Bing Wang; Yuan Song; Hieu Dinh; Guanling Chen

Wireless sniffers are often used to monitor access points (APs) in wireless LANs (WLANs) for network management, fault detection, and traffic characterization. It is cost effective to deploy single-radio sniffers that can monitor multiple nearby APs. To achieve this, a sniffer needs to switch among multiple channels since these APs often operate on orthogonal channels. In this paper, we formulate and solve two optimization problems on sniffer channel selection. Both problems require that each AP be monitored by at least one sniffer. In addition, one optimization problem requires minimizing the maximum number of channels that a sniffer listens to, and the other requires minimizing the total number of channels that the sniffers listen to. We prove that both optimization problems are NP-hard. For each problem, we propose three algorithms: one based on integer programming (IP), one based on LP-relaxation, and one based on a greedy heuristic. We evaluate the performance of the algorithms using two real-world datasets. Our results show that, for each problem, all three algorithms are effective in achieving their optimization goals, and overall, the LP-based algorithm outperforms the other two.
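The second problem (minimizing the total number of channels listened to while covering every AP) has the shape of a set-cover instance, so a generic greedy set-cover heuristic gives a feel for it. The sketch below is that textbook heuristic, not necessarily the greedy algorithm evaluated in the paper; the input format, a mapping from hypothetical (sniffer, channel) pairs to the set of APs that pair can monitor, is an assumption made for illustration.

```python
def greedy_channel_selection(coverage):
    """Greedy set cover over (sniffer, channel) pairs.

    coverage maps each (sniffer, channel) pair to the set of APs that the
    sniffer can monitor when listening on that channel.
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        # Pick the pair that monitors the most still-uncovered APs.
        best = max(coverage, key=lambda pair: len(coverage[pair] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


coverage = {
    ("s1", 1): {"ap_a", "ap_b"},
    ("s1", 6): {"ap_c"},
    ("s2", 6): {"ap_c", "ap_d"},
    ("s2", 11): {"ap_e"},
}
selection = greedy_channel_selection(coverage)
print(selection)
```

The loop always terminates because the set of uncovered APs is built from the coverage sets themselves, so each round covers at least one new AP.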

Journal of Computational Biology | 2014

Border Length Minimization Problem on a Square Array

Vamsi Kundeti; Sanguthevar Rajasekaran; Hieu Dinh

Protein/peptide microarrays are rapidly gaining momentum in the diagnosis of cancer. High-density and high-throughput peptide arrays are being extensively used to detect tumor biomarkers, examine kinase activity, identify antibodies having low serum titers, and locate antibody signatures. Improving the yield of microarray fabrication involves solving a hard combinatorial optimization problem called the border length minimization problem (BLMP). An important question that remained open for the past 7 years is whether the BLMP is tractable. We settle this open problem by proving that the BLMP is NP-hard. We also present a hierarchical refinement algorithm that can refine any heuristic solution for the BLMP and prove that the TSP+1-threading heuristic is an O(N)-approximation.
