Publication


Featured research published by Dantong Yu.


Knowledge and Information Systems | 2002

Findout: finding outliers in very large datasets

Dantong Yu; Gholamhosein Sheikholeslami; Aidong Zhang

Abstract. Finding the rare instances or the outliers is important in many KDD (knowledge discovery and data-mining) applications, such as detecting credit card fraud or finding irregularities in gene expressions. Signal-processing techniques have been introduced to transform images for enhancement, filtering, restoration, analysis, and reconstruction. In this paper, we present a new method in which we apply signal-processing techniques to solve important problems in data mining. In particular, we introduce a novel deviation (or outlier) detection approach, termed FindOut, based on wavelet transform. The main idea in FindOut is to remove the clusters from the original data and then identify the outliers. Although previous research showed that such techniques may not be effective because of the nature of the clustering, FindOut can successfully identify outliers from large datasets. Experimental results on very large datasets are presented which show the efficiency and effectiveness of the proposed approach.
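The "remove the clusters, then identify the outliers" idea can be illustrated with a small sketch. This is not the published FindOut algorithm: the wavelet transform is replaced here by a crude 3x3 neighbourhood average over a quantized grid, which stands in for a low-pass filter. The function name `find_outliers` and the parameters `cell` and `threshold` are hypothetical.

```python
from collections import Counter

def find_outliers(points, cell=1.0, threshold=2.0):
    """Grid-and-smooth outlier sketch (an assumption-laden stand-in, not FindOut itself)."""
    def cell_of(p):
        # Quantize a 2-D point to the integer coordinates of its grid cell.
        return (int(p[0] // cell), int(p[1] // cell))

    # Step 1: count how many points fall into each grid cell.
    counts = Counter(cell_of(p) for p in points)

    def smoothed(c):
        # Step 2: low-pass step, averaging a cell with its 8 neighbours.
        cx, cy = c
        total = sum(counts.get((cx + dx, cy + dy), 0)
                    for dx in (-1, 0, 1) for dy in (-1, 0, 1))
        return total / 9.0

    # Step 3: cells whose smoothed density reaches the threshold count as clusters.
    dense = {c for c in counts if smoothed(c) >= threshold}

    # Step 4: remove the cluster points; what remains are outlier candidates.
    return [p for p in points if cell_of(p) not in dense]
```

For example, 25 points packed into one cell plus a single distant point leave only the distant point after the dense cell is removed. The threshold is a tuning knob in this sketch, not a value taken from the paper.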


IEEE Transactions on Knowledge and Data Engineering | 2003

ClusterTree: integration of cluster representation and nearest-neighbor search for large data sets with high dimensions

Dantong Yu; Aidong Zhang

We introduce the ClusterTree, a new indexing approach for representing clusters generated by any existing clustering approach. A cluster is decomposed into several subclusters and represented as the union of the subclusters. The subclusters can be further decomposed, which isolates the most related groups within the clusters. A ClusterTree is a hierarchy of clusters and subclusters which incorporates the cluster representation into the index structure to achieve effective and efficient retrieval. Our cluster representation is highly adaptive to any kind of cluster. It is well accepted that most existing indexing techniques degrade rapidly as the dimensions increase. The ClusterTree provides a practical solution to index clustered data sets and supports the retrieval of the nearest-neighbors effectively without having to linearly scan the high-dimensional data set. We also discuss an approach to dynamically reconstruct the ClusterTree when new data is added. We present the detailed analysis of this approach and justify it extensively with experiments.


Conference on Information Sciences and Systems | 2006

Multi-Source Grid Scheduling for Divisible Loads

Thomas G. Robertazzi; Dantong Yu

The applicability of min cost flow and multi-commodity flow mathematical programming problems to steady state, multi-source divisible load scheduling is examined. Applying the linear model concept of superposition to such steady state multi-source load distribution is suggested for linear and more general topologies. Finally, the use of heuristic optimization for a transient multi-source load distribution problem is discussed.


Broadband Communications, Networks and Systems | 2006

TeraPaths: End-to-End Network Path QoS Configuration Using Cross-Domain Reservation Negotiation

B. Gibbard; Dimitrios Katramatos; Dantong Yu; Shawn Patrick McKee

TeraPaths is a DOE MICS/SciDAC-funded project conceived to address the needs of the high energy and nuclear physics scientific community for effectively protecting data flows of various levels of priority through modern high-speed networks. TeraPaths is rapidly evolving from a last-mile, LAN QoS provider to a distributed end-to-end network path QoS negotiator through multiple administrative domains. Developed as a web service-based software system, TeraPaths automates the establishment of network paths with QoS guarantees between end sites by configuring their corresponding LANs and requesting MPLS paths through WANs on behalf of end users. The primary mechanism for the creation of such paths is the negotiation and placement of advance reservations across all involved domains. This paper describes the status of the project, our experiences so far, and the directions of our continued work.


IEEE International Conference on Cloud Computing Technology and Science | 2015

End-to-End Delay Minimization for Scientific Workflows in Clouds under Budget Constraint

Chase Qishi Wu; Xiangyu Lin; Dantong Yu; Wei Xu; Li Li

Next-generation e-Science features large-scale, compute-intensive workflows of many computing modules that are typically executed in a distributed manner. With the recent emergence of cloud computing and the rapid deployment of cloud infrastructures, an increasing number of scientific workflows have been shifted or are in active transition to cloud environments. As cloud computing makes computing a utility, scientists across different application domains are facing the same challenge of reducing financial cost in addition to meeting the traditional goal of performance optimization. We develop a prototype generic workflow system by leveraging existing technologies for a quick evaluation of scientific workflow optimization strategies. We construct analytical models to quantify the network performance of scientific workflows using cloud-based computing resources, and formulate a task scheduling problem to minimize the workflow end-to-end delay under a user-specified financial constraint. We rigorously prove that the proposed problem is not only NP-complete but also non-approximable. We design a heuristic solution to this problem, and illustrate its performance superiority over existing methods through extensive simulations and real-life workflow experiments based on proof-of-concept implementation and deployment in a local cloud testbed.
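To make the budget-constrained scheduling problem concrete, here is a minimal sketch under strong simplifying assumptions: the workflow is a linear pipeline (so end-to-end delay is the sum of task times), and each task has a list of hypothetical `(time, cost)` options, one per VM type. The greedy upgrade rule below is a generic heuristic chosen for illustration, not the heuristic proposed in the paper, and the name `schedule` is an assumption.

```python
def schedule(options, budget):
    """Greedy sketch: options[i] is a list of (time, cost) choices for task i."""
    # Start every task on its cheapest option.
    choice = [min(range(len(o)), key=lambda k: o[k][1]) for o in options]
    spent = sum(options[i][k][1] for i, k in enumerate(choice))
    if spent > budget:
        raise ValueError("budget cannot cover even the cheapest options")

    while True:
        best = None  # (ratio, task index, option index, extra cost)
        for i, opts in enumerate(options):
            t0, c0 = opts[choice[i]]
            for k, (t, c) in enumerate(opts):
                gain, extra = t0 - t, c - c0
                # Only consider strictly faster options that still fit the budget.
                if gain > 0 and spent + extra <= budget:
                    ratio = gain / extra if extra > 0 else float("inf")
                    if best is None or ratio > best[0]:
                        best = (ratio, i, k, extra)
        if best is None:
            break  # no affordable upgrade reduces the delay any further
        _, i, k, extra = best
        choice[i] = k
        spent += extra

    delay = sum(options[i][k][0] for i, k in enumerate(choice))
    return choice, delay, spent
```

Each iteration picks the upgrade with the largest delay reduction per unit of extra cost. Because the underlying problem is stated to be NP-complete, a greedy rule like this gives no optimality guarantee in general.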


IEEE Network | 2010

Application-specific resource provisioning for wide-area distributed computing

Xin Liu; Chunming Qiao; Dantong Yu; Tao Jiang

Some modern distributed applications require cooperation among multiple geographically separated computing facilities to perform intensive computing at the end sites and large-scale data transfers in the wide area network. It has been widely recognized that WDM networks are a cost-effective means to support data transfers in this type of data-intensive application. However, neither the traditional approaches to establishing lightpaths between given source-destination pairs nor the existing application-level approaches, which consider only computing resources and take the underlying connectivity for granted, are sufficient. In this article we identify key limitations and issues in existing systems, and focus on joint allocation of computing and network resources in federated computing and network systems. A variety of resource allocation schemes that provide modern distributed computing applications with performance and reliability guarantees are presented.


IEEE International Conference on High Performance Computing Data and Analytics | 2011

End-to-end network QoS via scheduling of flexible resource reservation requests

Sushant Sharma; Dimitrios Katramatos; Dantong Yu

Modern data-intensive applications move vast amounts of data between multiple locations around the world. To enable predictable and reliable data transfers, next-generation networks allow such applications to reserve network resources for exclusive use. In this paper, we solve an important problem (called SMR3) to accommodate multiple concurrent network reservation requests between a pair of end sites. Given the varying availability of bandwidth within the network, our goal is to accommodate as many reservation requests as possible while minimizing the total time needed to complete the data transfers. First, we prove that SMR3 is an NP-hard problem. Then, we solve it by developing a polynomial-time heuristic called RRA. The RRA algorithm hinges on an efficient mechanism to accommodate a large number of requests in an iterative manner. Finally, we show via numerical results that RRA constructs schedules that accommodate a significantly larger number of requests compared to other, seemingly efficient, heuristics.


International Conference on Smart Grid Communications | 2013

Cloud motion estimation for short term solar irradiation prediction

Hao Huang; Jin Xu; Zhenzhou Peng; Shinjae Yoo; Dantong Yu; Dong Huang; Hong Qin

Variability of solar energy is the most significant issue for integrating solar energy into the power grid. There are pressing demands to develop methods to accurately estimate cloud motion, which directly affects the stability of solar power output. We propose a solar prediction system that can detect cloud movements from TSI (total sky imager) images, and then estimate the future cloud position over solar panels and the subsequent solar irradiance fluctuations caused by cloud transients. The experimental studies show that our proposed approach significantly improves the quality of cloud motion estimation within a time window (up to a few minutes) that is sufficient for grid operators to take action to mitigate solar power volatility.


IEEE International Conference on High Performance Computing Data and Analytics | 2012

Design and implementation of an intelligent end-to-end network QoS system

Sushant Sharma; Dimitrios Katramatos; Dantong Yu; Li Shi

Guaranteed end-to-end network QoS is a requirement for predictable data transfers between geographically distant end hosts. Existing QoS systems, however, lack the capability and intelligence to decide what resources to reserve and which paths to choose when there are multiple flexible resource reservation requests. In this paper, we design and implement an intelligent system that can guarantee end-to-end network QoS for multiple flexible reservation requests. At the heart of this system is a polynomial-time algorithm called resource reservation and path construction (RRPC). The RRPC algorithm schedules multiple flexible end-to-end data transfer requests by jointly optimizing the path construction and bandwidth reservation along these paths. We show that constructing such schedules is NP-hard. We implement our intelligent QoS system, and present the results of deployment on real-world production networks (ESnet and Internet2). Our implementation does not require modifications or new software to be deployed on the routers within the network.


IEEE International Conference on High Performance Computing Data and Analytics | 2012

Protocols for wide-area data-intensive applications: design and performance issues

Yufei Ren; Tan Li; Dantong Yu; Shudong Jin; Thomas G. Robertazzi; Brian Tierney; Eric Pouyoul

Providing high-speed data transfer is vital to various data-intensive applications. While there have been remarkable technology advances to provide ultra-high-speed network bandwidth, existing protocols and applications may not be able to fully utilize the bare-metal bandwidth due to their inefficient design. We find that the same problem remains in the field of Remote Direct Memory Access (RDMA) networks. RDMA offloads TCP/IP protocols to hardware devices. However, its benefits have not been fully exploited due to the lack of efficient software and application protocols, particularly in wide-area networks. In this paper, we address the design choices involved in developing such protocols. We describe a protocol implemented as part of a communication middleware. The protocol has its own flow control, connection management, and task synchronization. It maximizes the parallelism of RDMA operations. We demonstrate its performance benefit on various local and wide-area testbeds, including the DOE ANI testbed with RoCE links and InfiniBand links.

Collaboration


Dantong Yu's collaborations.

Top Co-Authors

Shinjae Yoo
Brookhaven National Laboratory

Dimitrios Katramatos
Brookhaven National Laboratory

Hong Qin
Stony Brook University

Hao Huang
Stony Brook University

Dong Huang
Brookhaven National Laboratory

Shudong Jin
Stony Brook University

Tan Li
Stony Brook University

Yufei Ren
Stony Brook University