Hing-Fung Ting | Researchain

Publication

Featured research published by Hing-Fung Ting.

symposium on principles of database systems | 2006

A simpler and more efficient deterministic scheme for finding frequent items over sliding windows

Lap-Kei Lee; Hing-Fung Ting

In this paper, we give a simple scheme for identifying ε-approximate frequent items over a sliding window of size n. Our scheme is deterministic and makes no assumption about the distribution of the item frequencies. It supports O(1/ε) update and query time, and uses O(1/ε) space. Its main data structures are just a few short queues whose entries store the positions of some items in the sliding window. We also extend our scheme to variable-size windows; the extended scheme uses O(1/ε log(εn)) space.
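For readers unfamiliar with approximate frequent-item counting, the sketch below may help. It is not the scheme from the paper above: it is the classic Misra-Gries summary, which works over a whole stream rather than a sliding window, and is shown only to illustrate how a bounded number of counters can yield bounded-error frequency estimates. The function name `misra_gries` and the parameter `capacity` are illustrative choices. With `capacity` counters and a stream of length n, each estimate undercounts the true frequency by at most n/(capacity + 1), so choosing a capacity of about 1/ε keeps the error within εn.

```python
from collections import Counter


def misra_gries(stream, capacity):
    """Return approximate item counts using at most `capacity` counters.

    Classic Misra-Gries summary over a whole stream (no sliding window).
    Every estimate is a lower bound on the true count.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # No free counter: decrement all counters, dropping any that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical example stream: "a" is frequent, "c" is rare.
stream = ["a"] * 6 + ["b"] * 3 + ["c"]
estimates = misra_gries(stream, capacity=2)
exact = Counter(stream)
```

Here `estimates` can be compared against `exact` to check that no estimate exceeds the true count and that the undercount stays within the stated bound.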

PLOS ONE | 2013

SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner

Ruibang Luo; Thomas K. F. Wong; Jianqiao Zhu; Chi-Man Liu; Xiaoqian Zhu; Edward Wu; Lap-Kei Lee; Haoxiang Lin; Wenjuan Zhu; David W. Cheung; Hing-Fung Ting; Siu-Ming Yiu; Shaoliang Peng; Chang Yu; Yingrui Li; Ruiqiang Li; Tak Wah Lam

To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2, GEM and GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. Transcending its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Real data evaluation using the human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.

Methods | 2016

MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices

Dinghua Li; Ruibang Luo; Chi-Man Liu; Chi-Ming Leung; Hing-Fung Ting; Kunihiko Sadakane; Hiroshi Yamashita; Tak Wah Lam

The study of metagenomics has benefited greatly from low-cost and high-throughput sequencing technologies, yet the tremendous amount of data generated makes analyses such as de novo assembly consume too many computational resources. In late 2014 we released MEGAHIT v0.1 (together with a brief note of Li et al. (2015) [1]), which is the first NGS metagenome assembler that can assemble genome sequences from metagenomic datasets of hundreds of Giga base-pairs (bp) in a time- and memory-efficient manner on a single server. The core of MEGAHIT is an efficient parallel algorithm for constructing succinct de Bruijn Graphs (SdBG), implemented on a graphics processing unit (GPU). The software has been well received by the assembly community, and there is interest in how to adapt the algorithms to integrate popular assembly practices so as to improve the assembly quality, as well as how to speed up the software using better CPU-based algorithms (instead of GPU). In this paper we first describe the details of the core algorithms in MEGAHIT v0.1, and then we show the new modules to upgrade MEGAHIT to version v1.0, which gives better assembly quality, runs faster and uses less memory. For the Iowa Prairie Soil dataset (252 Gbp after quality trimming), the assembly quality of MEGAHIT v1.0, when compared with v0.1, shows a significant improvement, namely, a 36% increase in assembly size and 23% in N50. More interestingly, MEGAHIT v1.0 is no slower than before (even running with the extra modules). This is primarily due to a new CPU-based algorithm for SdBG construction that is faster and requires less memory. Using CPU only, MEGAHIT v1.0 can assemble the Iowa Prairie Soil sample in about 43 h, reducing the running time of v0.1 by at least 25% and memory usage by up to 50%. MEGAHIT v1.0, exhibiting a smaller memory footprint, can process even larger datasets. The Kansas Prairie Soil sample (484 Gbp), the largest publicly available dataset, can now be assembled using no more than 500 GB of memory in 7.5 days. The assemblies of these datasets (and other large metagenomic datasets), as well as the software, are available at the website https://hku-bal.github.io/megabox.
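The abstract above reports assembly quality in terms of N50. As a reminder of what that statistic measures, here is a minimal sketch of the usual definition: the largest contig length L such that contigs of length at least L together cover at least half of the total assembly length. The function name `n50` and the sample lengths are illustrative and are not taken from MEGAHIT.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly size.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the halfway point.
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must be non-empty")


# Hypothetical contig lengths (total 54): 10 + 9 + 8 = 27 reaches half.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

A higher N50 at a comparable assembly size generally indicates that more of the assembly sits in longer contigs.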

computing and combinatorics conference | 2004

New Results on On-Demand Broadcasting with Deadline via Job Scheduling with Cancellation

Wun-Tat Chan; Tak Wah Lam; Hing-Fung Ting; Prudence W. H. Wong

This paper studies the on-demand broadcasting problem with deadlines. We give the first general upper bound and improve existing lower bounds on the competitive ratio of the problem. The novelty of our work is the introduction of a new job scheduling problem that allows cancellation. We prove that the broadcasting problem can be reduced to this scheduling problem. This reduction frees us from the complication of the broadcasting model and allows us to work on a conceptually simpler model for upper bound results.

Algorithmica | 2012

Continuous Monitoring of Distributed Data Streams over a Time-Based Sliding Window

Ho-Leung Chan; Tak Wah Lam; Lap-Kei Lee; Hing-Fung Ting

In this paper we extend the study of algorithms for monitoring distributed data streams from whole data streams to a time-based sliding window. The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to report the global statistics of all streams within a given error bound. This paper presents communication-efficient algorithms for three classical statistics, namely, basic counting, frequent items and quantiles. The worst-case communication cost over a window is

international colloquium on automata languages and programming | 2009