Hsiao Ping Lee | Researchain

Publication

Featured research published by Hsiao Ping Lee.

Information Sciences | 2008

Hierarchical multi-pattern matching algorithm for network content inspection

Tzu-Fang Sheu; Nen-Fu Huang; Hsiao Ping Lee

Inspection engines that can inspect network content for application-layer information are urgently required. In-depth packet inspection engines, which search the whole packet payload, can identify the packets of interest that contain certain patterns. Network equipment then uses the search results from the inspection engines for application-oriented management. The most important technology for fast packet inspection is an efficient multi-pattern matching algorithm that performs exact string matching between packets and a large set of patterns. This paper proposes a novel hierarchical multi-pattern matching algorithm (HMA) for packet inspection. HMA builds hierarchical index tables from the most frequent common-codes, and efficiently reduces the number of external memory accesses and the memory space by two-tier and cluster-wise matching. Analysis and simulation results reveal that HMA performs much better than state-of-the-art matching algorithms. In particular, HMA can update patterns incrementally, thus creating a reliable network system.
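
As a point of reference for what "exact string matching between packets and a large set of patterns" means, the following is a minimal baseline sketch in Python. It is not HMA: it only indexes patterns by their first byte and verifies candidates, and the names `build_index` and `scan`, along with the sample patterns and payload, are hypothetical choices made for illustration.

```python
from collections import defaultdict

def build_index(patterns):
    """Group patterns by first byte so the scan only verifies plausible candidates."""
    index = defaultdict(list)
    for p in patterns:
        if p:  # skip empty patterns
            index[p[0]].append(p)
    return index

def scan(payload, index):
    """Return sorted (offset, pattern) pairs for every exact occurrence in payload."""
    hits = []
    for i in range(len(payload)):
        # Only patterns whose first byte matches payload[i] are verified in full.
        for p in index.get(payload[i], ()):
            if payload.startswith(p, i):
                hits.append((i, p))
    return sorted(hits)

# Hypothetical pattern set and payload, for illustration only.
patterns = [b"GET /admin", b"cmd.exe", b"/etc/passwd"]
index = build_index(patterns)
payload = b"GET /admin?file=/etc/passwd HTTP/1.1"
hits = scan(payload, index)
print(hits)
```

A baseline like this verifies candidates at every payload offset, which is the kind of cost that index-based designs such as the one described in the abstract aim to reduce.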

BMC Bioinformatics | 2010

A parallel and incremental algorithm for efficient unique signature discovery on DNA databases

Hsiao Ping Lee; Tzu-Fang Sheu; Chuan Yi Tang

Background: DNA signatures are distinct short nucleotide sequences that provide valuable information for various purposes, such as the design of Polymerase Chain Reaction primers and microarray experiments. Biologists usually use a discovery algorithm to find unique signatures in DNA databases, and then apply the signatures to microarray experiments. Such discovery algorithms require some input factors to be set, such as signature length l and mismatch tolerance d, which affect the discovery results. However, suggestions about how to select proper factor values are rare, especially when an unfamiliar DNA database is used. In most cases, biologists select factor values based on experience, or even by guessing. If the discovered result is unsatisfactory, biologists change the input factors of the algorithm to obtain a new result. This process is repeated until a proper result is obtained. Implicit signatures under the discovery condition (l, d) are defined as the signatures of length ≤ l with mismatch tolerance ≥ d. A discovery algorithm that could discover all implicit signatures, such as those that meet the requirements concerning the results, would be more helpful than one that depends on trial and error. However, existing discovery algorithms do not address the need to discover all implicit signatures. Results: This work proposes two discovery algorithms: the consecutive multiple discovery (CMD) algorithm and the parallel and incremental signature discovery (PISD) algorithm. The PISD algorithm is designed for efficiently discovering signatures under a certain discovery condition. The algorithm finds new results by using previously discovered results as candidates, rather than by using the whole database. The PISD algorithm further increases discovery efficiency by applying parallel computing. The CMD algorithm is designed to discover implicit signatures efficiently. It uses the PISD algorithm as a kernel routine to discover implicit signatures under every feasible discovery condition. Conclusions: The proposed algorithms discover implicit signatures efficiently. In experiments with eight processing cores, the presented CMD algorithm takes up to 97% less execution time than typical sequential discovery algorithms when discovering implicit signatures.

Global Communications Conference | 2006

NIS04-6: A Time- and Memory-Efficient String Matching Algorithm for Intrusion Detection Systems

Tzu-Fang Sheu; Nen-Fu Huang; Hsiao Ping Lee

Intrusion Detection Systems (IDSs) are known as useful tools for identifying malicious attempts over the network. The most essential part of an IDS is the search engine that inspects every packet passing through the network. To defend the protected network strictly, an IDS must be able to inspect packets at line rate and also provide guaranteed performance even under heavy attacks. Therefore, in this paper we propose an efficient string matching algorithm (named ACM) with compact memory as well as high worst-case performance. Using a magic number heuristic based on the Chinese remainder theorem, the proposed ACM significantly reduces the memory requirement without introducing complex processing. Furthermore, the latency of off-chip memory references is drastically reduced. The proposed ACM can be easily implemented in hardware and software. As a result, ACM enables cost-effective and efficient IDSs.

BMC Bioinformatics | 2014

An algorithm of discovering signatures from DNA databases on a computer cluster

Hsiao Ping Lee; Tzu-Fang Sheu

Background: Signatures are short sequences that are unique and not similar to any other sequence in a database, and they can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entire database to be loaded into memory, which restricts the amount of data they can process and makes them unable to handle large databases. Those algorithms also use sequential models and have slower discovery speeds, meaning that their efficiency can be improved. Results: In this research, we introduce a divide-and-conquer strategy to signature discovery and propose a parallel signature discovery algorithm that runs on a computer cluster. The algorithm applies the divide-and-conquer strategy to overcome the inability of existing algorithms to process large databases, and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases, such as the human whole-genome EST database, that existing algorithms were previously unable to process. Conclusions: The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.

Global Communications Conference | 2005

A novel hierarchical matching algorithm for intrusion detection systems

Tzu-Fang Sheu; Nen-Fu Huang; Hsiao Ping Lee

As more and more network security threats emerge, the network-based intrusion detection system (NIDS) is one of the most important systems for protecting the network from attacks and intrusions without modifying end-user software. By searching through entire packet headers and payloads, NIDSs can identify and classify the packets that contain malicious patterns. The most essential technology for an NIDS is an efficient multiple-pattern matching algorithm, which performs exact string matching between packets and a large set of patterns. This paper proposes a novel hierarchical multiple-pattern matching algorithm (HMA) for intrusion detection, which is a two-tier and cluster-wise matching algorithm. HMA drastically reduces the number of external memory accesses as well as the required memory space, enabling an efficient and cost-effective real-time IDS. The simulations show that HMA significantly improves the matching performance in both the average and the worst cases (about 1.7-63 times better than the state-of-the-art algorithms).

ACM Symposium on Applied Computing | 2005

Efficient discovery of unique signatures on whole-genome EST databases

Hsiao Ping Lee; Tzu Fang Sheu; Yin Te Tsai

Expressed Sequence Tags (ESTs) are widely used for the discovery of new genes, particularly those involved in human disease processes. A subsequence in an EST dataset is unique if it appears in only one EST sequence of the dataset and does not appear in any other EST sequence. The unique subsequences can be regarded as signatures that distinguish an EST from all the others, and provide valuable information for many applications, such as PCR primer design and microarray experiments. The discovery of unique signatures in large-scale EST datasets has been a computational challenge. In this paper, we propose two efficient algorithms to extract the unique signatures from EST databases. The algorithms achieve impressive discovery efficiency in experiments on real human ESTs.
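
The definition of a unique subsequence given in this abstract can be illustrated with a brute-force sketch in Python. This is not either of the paper's two algorithms; it is a direct reading of the definition for a fixed length `k`. The function name `unique_signatures`, the parameter `k`, and the toy sequences are assumptions introduced for illustration.

```python
from collections import defaultdict

def unique_signatures(sequences, k):
    """Map each sequence index to the length-k substrings found in that sequence only."""
    owners = defaultdict(set)  # k-mer -> indices of the sequences that contain it
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(idx)
    result = {idx: set() for idx in range(len(sequences))}
    for kmer, idxs in owners.items():
        # Assumption: "unique" means present in exactly one sequence,
        # regardless of how often it repeats inside that sequence.
        if len(idxs) == 1:
            result[next(iter(idxs))].add(kmer)
    return result

# Toy sequences, not real EST data.
ests = ["ACGTAC", "CGTACG", "TTACGA"]
sigs = unique_signatures(ests, 3)
print(sigs)
```

Storing every k-mer of every sequence in memory, as this sketch does, is what makes the problem hard at whole-genome scale.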

Bioinformatics and Bioengineering | 2004

An IDC-based algorithm for efficient homology filtration with guaranteed seriate coverage

Hsiao Ping Lee; Yin Te Tsai; Ching Hua Shih; Tzu Fang Sheu; Chuan Yi Tang

Homology search within genomic databases is a fundamental and crucial task for biological knowledge discovery. With exponentially increasing database sizes and accesses, the filtration approach, which filters out impossible homology candidates to reduce the time for homology verification, becomes more important in bioinformatics. Most known gram-based filtration approaches in the literature, such as QUASAR, have limited error tolerance and can produce more false positives. In this paper, we present an IDC-based lossless filtration algorithm with guaranteed seriate coverage and error tolerance for efficient homology discovery. In our method, the original task of homology extraction with requested seriate coverage and error levels is transformed into a longest increasing subsequence problem with range constraints, and an efficient algorithm is proposed for this problem. The experimental results show that the method significantly outperforms QUASAR. At some comparable sensitivity levels, our homology filter makes the discovery more than three orders of magnitude faster than QUASAR, and more than four orders of magnitude faster than the exhaustive search.

IEEE Transactions on Dependable and Secure Computing | 2010

In-Depth Packet Inspection Using a Hierarchical Pattern Matching Algorithm

Tzu-Fang Sheu; Nen-Fu Huang; Hsiao Ping Lee

Detection engines capable of inspecting packet payloads for application-layer network information are urgently required. The most important technology for fast payload inspection is an efficient multipattern matching algorithm, which performs exact string matching between packets and a large set of predefined patterns. This paper proposes a novel Enhanced Hierarchical Multipattern Matching Algorithm (EHMA) for packet inspection. Based on the occurrence frequency of grams, a small set of the most frequent grams is discovered and used in the EHMA. EHMA is a two-tier and cluster-wise matching algorithm, which significantly reduces the number of external memory accesses and the memory capacity required. Using a skippable scan strategy, EHMA speeds up the scanning process. Furthermore, because it does not depend on parallel or special functions, EHMA is very simple and therefore practical for both software and hardware implementations. Simulation results reveal that EHMA significantly improves the matching performance. The speed of EHMA is about 0.89-1,161 times faster than that of current matching algorithms. Even under real-life intense attack, EHMA still performs well.

ACM Symposium on Applied Computing | 2004

A seriate coverage filtration approach for homology search

Hsiao Ping Lee; Yin Te Tsai; Chuan Yi Tang

Homology search within genomic databases is a fundamental and crucial task in biological knowledge discovery. With the exponentially increasing size of and access to databases, efficient retrieval becomes more essential in bioinformatics. Due to the variety of biological data, similar sequences are not only within some error tolerance, but are also above some seriate coverage level. In this paper, we propose a seriate coverage filtration approach to extract homologies from the databases efficiently. Our approach performs a lossless filtration and can be implemented as a preprocessing step for existing search heuristics. Our method converts a user's requests for error and seriate coverage levels into thresholds of interest. Accordingly, we transform the task of homology discovery into a variation of the longest increasing subsequence problem, and design an efficient algorithm for it. In the performance test, our approach shows an attractive quality of filtration.
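
For readers unfamiliar with the longest increasing subsequence problem mentioned in this abstract, here is a minimal Python sketch of the classical version, solved in O(n log n) with binary search. The paper works with a variation of the problem, which this sketch does not attempt to reproduce; the function name `lis_length` and the sample input are illustrative assumptions.

```python
from bisect import bisect_left

def lis_length(values):
    """Length of the longest strictly increasing subsequence of values."""
    # tails[j] holds the smallest possible last element of an
    # increasing subsequence of length j + 1 seen so far.
    tails = []
    for v in values:
        pos = bisect_left(tails, v)
        if pos == len(tails):
            tails.append(v)   # v extends the longest subsequence found so far
        else:
            tails[pos] = v    # v gives a smaller tail for length pos + 1
    return len(tails)

# Hypothetical list of match positions, for illustration only.
positions = [3, 1, 4, 1, 5, 9, 2, 6]
length = lis_length(positions)
print(length)
```

The `tails` list stays sorted throughout, which is what allows each element to be placed with a single binary search.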

Computational Systems Bioinformatics | 2004

An efficient algorithm for unique signature discovery on whole-genome EST databases

Hsiao Ping Lee; Tzu Fang Sheu; Yin Te Tsai; Ching Hua Shih; Chuan Yi Tang

ESTs can be used to accelerate various research activities for the discovery of new genes. Unique oligonucleotides are signatures that distinguish an EST from all the others. An important application of those short signatures is in PCR primer design and microarray experiments. In this research, we propose an efficient approach that enhances our previous work on unique signature discovery to handle datasets of whole-genome scale. The performance of our method is evaluated in experiments on human chromosome EST databases.
