Da Tong
University of Southern California
Publication
Featured research published by Da Tong.
field programmable gate arrays | 2013
Da Tong; Lu Sun; Kiran Kumar Matam; Viktor K. Prasanna
Machine learning (ML) algorithms have been shown to be effective in classifying today's dynamic Internet traffic. Using additional features and sophisticated ML techniques can improve accuracy and allows a broad range of application classes to be classified. Realizing such classifiers at high data rates is challenging. In this paper, we propose two architectures to realize a complete online traffic classifier using flow-level features. First, we develop a traffic classifier based on the C4.5 decision tree algorithm and the Entropy-MDL discretization algorithm. It achieves an accuracy of 97.92% when classifying a traffic trace consisting of eight application classes. Next, we accelerate our classifier using two architectures on FPGA. One architecture stores the classifier in on-chip distributed RAM and is designed to sustain a high throughput. The other architecture stores the classifier in block RAM and is designed to operate with a small hardware footprint, so it can be built at low hardware cost. Experimental results show that our high throughput architecture can sustain a throughput of 550 Gbps assuming a 40 Byte packet size. Our low cost architecture demonstrates 22% better resource efficiency than the high throughput design. It can be easily replicated to achieve 449 Gbps while supporting 160 input traffic streams concurrently. Both architectures are parameterizable and programmable to support any binary-tree-based traffic classifier. We develop a tool which allows users to easily map a binary-tree-based classifier to hardware. The tool takes a classifier as input and automatically generates the Verilog code for the corresponding hardware architecture.
ACM Sigarch Computer Architecture News | 2016
Da Tong; Viktor K. Prasanna
reconfigurable computing and fpgas | 2013
Da Tong; Viktor K. Prasanna
international parallel and distributed processing symposium | 2015
Da Tong; Shijie Zhou; Viktor K. Prasanna
high performance switching and routing | 2014
Da Tong; Yun Rock Qu; Viktor K. Prasanna
IEEE Transactions on Parallel and Distributed Systems | 2017
Da Tong; Yun Rock Qu; Viktor K. Prasanna
In the context of networking, a heavy hitter is an entity in a data stream whose amount of activity (such as bandwidth consumption or number of connections) is higher than a given threshold. Detecting heavy hitters is a critical task for network management and security in the Internet and data centers. Data streams in modern networks usually contain millions of entities, such as traffic flows or IP domains. It is challenging to detect heavy hitters at a high throughput while supporting such a large number of entities. In this work, we propose a high throughput online heavy hitter detector based on the Count-min sketch algorithm on FPGA. We propose a high throughput hash computation architecture, optimize the Count-min sketch for hardware-based heavy hitter detection, and use forwarding to deal with data hazards. The post place-and-route results of our architecture on a state-of-the-art FPGA show high throughput and scalability. Our architecture achieves a throughput of 114 Gbps while supporting a typical 1 M concurrent entities. It sustains 100+ Gbps throughput while supporting various numbers of concurrent entities, stream sizes, and accuracy requirements. Our implementation demonstrates improved performance compared with other sketch acceleration techniques on various platforms using similar sketch configurations.
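As a rough software illustration of the Count-min sketch idea named in this abstract, the following Python sketch keeps several rows of counters, updates one counter per row for every stream item, and flags an entity once its estimated activity exceeds the threshold. It is not the FPGA design described above: the class and function names, the salted BLAKE2b hashing, and the default width and depth are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Count-min sketch: `depth` rows of `width` counters, one hash per row."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Assumption: salting BLAKE2b with the row number stands in for
        # `depth` independent hash functions.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(key.encode(), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key, amount=1):
        # Add the activity to exactly one counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += amount

    def estimate(self, key):
        # Collisions only ever inflate counters, so the minimum over the
        # rows is the tightest estimate and never underestimates.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


def detect_heavy_hitters(stream, threshold, width=1024, depth=4):
    """Return the keys whose estimated activity is higher than `threshold`."""
    sketch = CountMinSketch(width, depth)
    heavy = set()
    for key, amount in stream:
        sketch.update(key, amount)
        if sketch.estimate(key) > threshold:
            heavy.add(key)
    return heavy


# Hypothetical stream of (flow id, bytes) pairs.
example_stream = [("flow-a", 1500)] * 100 + [("flow-b", 40)] * 10
example_heavy = detect_heavy_hitters(example_stream, threshold=100_000)
```

Because the estimate can only overcount, a true heavy hitter is never missed in this sketch; the price is a small chance of false positives, which shrinks as the width and depth grow.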
ieee international conference on high performance computing data and analytics | 2015
Da Tong; Viktor K. Prasanna
Detecting heavy hitters is essential for many network management and security applications in the Internet and in data centers. A heavy hitter is an entity in a data stream whose amount of activity, such as bandwidth consumption or number of connections, is higher than a given threshold. In this work, we propose a pipelined architecture for an online heavy hitter detector on FPGA. It also reports the top K heavy hitters. We design an application specific data forwarding mechanism to handle data hazards without stalling the pipeline. The stream size and the threshold for heavy hitter detection can be configured through run-time parameters. The post place-and-route results on a state-of-the-art FPGA show that the architecture can achieve a throughput of 84 Gbps while supporting 128 K concurrent flows. The proposed architecture can support a large number of concurrent flows using external memory while sustaining the same throughput as the on-chip BRAM based implementation.
ieee high performance extreme computing conference | 2013
Da Tong; Viktor K. Prasanna
Hash tables are widely used in many network applications such as packet classification, traffic classification, and heavy hitter detection. In this paper, we present a pipelined architecture for a high throughput online hash table on FPGA. The proposed architecture supports search, insert, and delete operations at line rate for a massive hash table stored in off-chip memory. We propose two hash table access schemes: (1) the first scheme assigns each hash entry multiple slots to reduce the hash collision rate; each slot can store the corresponding hash key of the hash entry; (2) the second scheme has a higher hash collision rate but a lower off-chip memory bandwidth requirement than the first scheme. Both schemes guarantee line rate processing when using memory devices with sufficient access bandwidth. We design an application specific data forwarding unit to deal with potential data hazards. Our architecture ensures that no stalling is required to process any sequence of concurrent operations while tolerating large external memory access latency. On a state-of-the-art FPGA, the proposed architecture achieves 66-85 Gbps throughput while supporting hash tables with various numbers of entries, key sizes, and DRAM access latencies. Our design also shows good scalability in terms of throughput for various hash table configurations.
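The first access scheme, in which every hash entry owns several slots, can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the pipelined FPGA architecture: the class name, the use of Python's built-in `hash`, and the choice to report failure when every slot of an entry is occupied are all hypothetical, since the abstract does not say how a full entry is handled.

```python
class MultiSlotHashTable:
    """Hash table in which each hash entry holds a fixed number of slots."""

    def __init__(self, num_entries, slots_per_entry):
        self.num_entries = num_entries
        # Each slot is either None (free) or a (key, value) pair.
        self.table = [[None] * slots_per_entry for _ in range(num_entries)]

    def _entry(self, key):
        # Assumption: Python's built-in hash stands in for the hardware hash.
        return self.table[hash(key) % self.num_entries]

    def search(self, key):
        # Compare the stored key in every slot of the selected entry.
        for slot in self._entry(key):
            if slot is not None and slot[0] == key:
                return slot[1]
        return None

    def insert(self, key, value):
        entry = self._entry(key)
        free = None
        for i, slot in enumerate(entry):
            if slot is not None and slot[0] == key:
                entry[i] = (key, value)  # key already present: overwrite
                return True
            if slot is None and free is None:
                free = i  # remember the first free slot
        if free is None:
            return False  # assumption: a full entry rejects the insert
        entry[free] = (key, value)
        return True

    def delete(self, key):
        entry = self._entry(key)
        for i, slot in enumerate(entry):
            if slot is not None and slot[0] == key:
                entry[i] = None
                return True
        return False
```

With a single entry and two slots, two colliding keys can coexist and only a third is rejected, which is how extra slots lower the collision rate. The cost is that every operation reads all slots of an entry, which is consistent with the abstract's remark that the second scheme needs less off-chip memory bandwidth.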
reconfigurable computing and fpgas | 2012
Da Tong; Yi-Hua E. Yang; Viktor K. Prasanna
Traffic classification is a critical task in network management. Decision-trees are commonly used in Machine Learning (ML)-based traffic classification algorithms. Most of the existing implementations are hardware-based, while a new trend for network applications is to use software-based solutions. Since the decision-tree used for traffic classification is highly unbalanced, it is challenging to achieve high throughput for decision-tree-based traffic classification on multi-core platforms. In this paper, we present a high-throughput traffic classifier employing a scalable data structure on multi-core platforms. We convert decision-trees used in ML-based algorithms into a compact rule set table. Based on this data structure, we develop a divide-and-conquer algorithm that (1) searches all the columns of this table in parallel, and (2) merges the outcomes from all the columns into the final classification result. Our approach sustains high throughput even if the size of the rule set table is scaled up with respect to (1) the number of decision-tree leaves and (2) the number of features examined during the classification process. We prototype our design on state-of-the-art multi-core platforms. For a typical decision-tree-based traffic classifier consisting of 128 leaf nodes and 6 flow-level features, our implementation achieves a throughput of 98 Million Lookups Per Second (MLPS). Our traffic classifier sustains high throughput even for highly unbalanced decision-trees. We achieve 1.5× the throughput of the C4.5 decision-tree-based implementations, and 13× the throughput of the SVM based traffic classifiers on multi-core platforms.
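One way to picture the rule set table and the two divide-and-conquer steps is the Python sketch below. It assumes that every decision-tree leaf becomes one rule with an inclusive value range per feature, that each column search yields a bitmask of matching rules, and that the merge step is a bitwise AND. These representation choices, the function names, and the three-rule example table are illustrative assumptions rather than details taken from the paper.

```python
INF = float("inf")


def search_column(column, value):
    """Return a bitmask with bit i set when rule i's range contains value."""
    mask = 0
    for i, (lo, hi) in enumerate(column):
        if lo <= value <= hi:
            mask |= 1 << i
    return mask


def classify(rule_table, labels, features):
    """Search every feature column, then merge the per-column bitmasks."""
    merged = (1 << len(labels)) - 1  # start with every rule as a candidate
    # The column searches are independent of each other, so they could run
    # in parallel; this sketch runs them one after another for simplicity.
    for column, value in zip(rule_table, features):
        merged &= search_column(column, value)
    if merged == 0:
        return None  # no rule matches all features
    index = (merged & -merged).bit_length() - 1  # lowest set bit
    return labels[index]


# Hypothetical tree: feature 0 <= 100 -> "web"; otherwise feature 1 <= 5
# -> "p2p", else "video". Each leaf becomes one rule (one range per column).
example_labels = ["web", "p2p", "video"]
example_table = [
    [(0, 100), (101, INF), (101, INF)],  # column for feature 0
    [(0, INF), (0, 5), (6, INF)],        # column for feature 1
]
```

For instance, `classify(example_table, example_labels, (200, 3))` matches rules 1 and 2 in the first column and rules 0 and 1 in the second, so the merge leaves only rule 1. The work per lookup depends on the number of rules and features rather than on tree depth, which is why an unbalanced tree does not slow this scheme down.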
IEEE Transactions on Parallel and Distributed Systems | 2018
Da Tong; Viktor K. Prasanna
Machine learning (ML) algorithms have been shown to be effective in classifying a broad range of applications in Internet traffic. In this paper, we propose algorithms and architectures to realize online traffic classification using flow level features. First, we develop a traffic classifier based on the C4.5 decision tree algorithm and the Entropy-MDL (Minimum Description Length) discretization algorithm. It achieves an overall accuracy of 97.92 percent for classifying eight major applications. Next, we propose approaches to accelerate the classifier on FPGA (Field Programmable Gate Array) and multicore platforms. We optimize the original classifier by merging it with discretization. Our implementation of this optimized decision tree achieves 7500+ Million Classifications Per Second (MCPS) on a state-of-the-art FPGA platform and 75-150 MCPS on two state-of-the-art multicore platforms. We also propose a divide-and-conquer approach to handle imbalanced decision trees. Our implementation of the divide-and-conquer approach achieves 10,000+ MCPS on a state-of-the-art FPGA platform and 130-340 MCPS on two state-of-the-art multicore platforms. We conduct extensive experiments on both platforms for various application scenarios to compare the two approaches.