Thomas Phan | Researchain

Publication

Featured research published by Thomas Phan.

Job Scheduling Strategies for Parallel Processing | 2005

Evolving toward the perfect schedule: co-scheduling job assignments and data replication in wide-area systems using a genetic algorithm

Thomas Phan; Kavitha Ranganathan; Radu Sion

Traditional job schedulers for grid or cluster systems are responsible for assigning incoming jobs to compute nodes in such a way that some evaluative condition is met. Such systems generally take into consideration the availability of compute cycles, queue lengths, and expected job execution times, but they typically do not account directly for data staging and thus miss significant associated opportunities for optimisation. Intuitively, a tighter integration of job scheduling and automated data replication can yield significant advantages due to the potential for optimised, faster access to data and decreased overall execution time. In this paper we consider data placement as a first-class citizen in scheduling and use an optimisation heuristic for generating schedules. We make the following two contributions. First, we identify the necessity for co-scheduling job dispatching and data replication assignments and posit that simultaneously scheduling both is critical for achieving good makespans. Second, we show that deploying a genetic search algorithm to solve the optimal allocation problem has the potential to achieve significant speed-up results versus traditional allocation mechanisms. Through simulation, we show that, on average, our algorithm achieves an approximately 20-45% faster makespan than greedy schedulers.
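To make the co-scheduling idea concrete, here is a minimal sketch, not the authors' algorithm. A single chromosome encodes both the node of every job and the node holding each dataset's replica, so a genetic search optimises the two decisions jointly. Everything in it is a hypothetical toy assumption: the cost model (a job pays a transfer penalty when its data is remote), the instance numbers, one replica per dataset, and the greedy baseline that leaves all data at node 0.

```python
import random

# Hypothetical toy instance (all numbers are illustrative assumptions).
NODE_SPEED = [1.0, 2.0, 1.5]        # relative compute speed per node
BANDWIDTH = 10.0                    # MB/s between any two nodes (assumed uniform)
DATA_SIZE = [400.0, 200.0, 300.0]   # MB per dataset
JOBS = [(0, 50.0), (1, 30.0), (2, 40.0), (0, 20.0), (1, 60.0), (2, 10.0)]  # (dataset, work)

def makespan(job_nodes, replica_nodes):
    """Finish time of the busiest node. A job pays a transfer penalty
    when its dataset's replica is not on the node it runs on."""
    load = [0.0] * len(NODE_SPEED)
    for (ds, work), node in zip(JOBS, job_nodes):
        transfer = 0.0 if replica_nodes[ds] == node else DATA_SIZE[ds] / BANDWIDTH
        load[node] += work / NODE_SPEED[node] + transfer
    return max(load)

def split(chrom):
    # First genes: job -> node; remaining genes: dataset replica -> node.
    return chrom[:len(JOBS)], chrom[len(JOBS):]

def fitness(chrom):
    return makespan(*split(chrom))  # lower is better

def genetic_schedule(pop_size=40, generations=200, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n_genes, n_nodes = len(JOBS) + len(DATA_SIZE), len(NODE_SPEED)
    pop = [[rng.randrange(n_nodes) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        next_pop = pop[:2]  # elitism: keep the two best schedules unchanged
        while len(next_pop) < pop_size:
            a = min(rng.sample(pop, 3), key=fitness)  # tournament selection
            b = min(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]                 # one-point crossover
            for i in range(n_genes):                  # per-gene mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_nodes)
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=fitness)
    return split(best), fitness(best)

def greedy_schedule(origin=0):
    """Hypothetical baseline: data stays at its origin node and each job goes
    to whichever node would finish it earliest; replication is ignored."""
    replicas = [origin] * len(DATA_SIZE)
    load = [0.0] * len(NODE_SPEED)
    assignment = []
    for ds, work in JOBS:
        def finish(n):
            transfer = 0.0 if replicas[ds] == n else DATA_SIZE[ds] / BANDWIDTH
            return load[n] + work / NODE_SPEED[n] + transfer
        best = min(range(len(NODE_SPEED)), key=finish)
        load[best] = finish(best)
        assignment.append(best)
    return makespan(assignment, replicas)

(job_nodes, replica_nodes), ga_makespan = genetic_schedule()
print("greedy:", greedy_schedule(), "genetic:", ga_makespan)
```

Because replica placement sits in the same chromosome as job placement, crossover and mutation can move a dataset and the jobs that read it together. A scheduler that treats data as fixed cannot make that joint move.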

Future Generation Computer Systems | 2007

A grid-based approach for enterprise-scale data mining

Ramesh Natarajan; Radu Sion; Thomas Phan

We describe a grid-based approach for enterprise-scale data mining, which is based on leveraging parallel database technology for data storage, and on-demand compute servers for parallelism in the statistical computations. This approach is targeted towards the use of data mining in highly-automated vertical business applications, where the data is stored on one or more relational database systems, and an independent set of high-performance compute servers or a network of low-cost, commodity processors is used to improve the application performance and overall workload management. The goal of this paper is to describe an algorithmic decomposition of data mining kernels between the data storage and compute grids, which makes it possible to exploit the parallelism on the respective grids in a simple way, while minimizing the data transfer between these grids. This approach is compatible with existing standards for data mining task specification and results reporting, so that larger applications using these data mining algorithms do not have to be modified to benefit from this grid-based approach.
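The abstract does not name the kernels it decomposes. The sketch below therefore uses one k-means iteration as a hypothetical stand-in to show the general pattern of splitting work between the two grids. The data side reduces each partition to per-centroid (sum, count) statistics. The compute side merges these small summaries, so only k triples per partition cross between the grids, never the raw rows. The function names and the 2-D point format are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def partial_stats(partition: List[Point], centroids: List[Point]):
    """Data-side step (hypothetical): assign local points to the nearest
    centroid and return only per-centroid [sum_x, sum_y, count]."""
    stats = [[0.0, 0.0, 0] for _ in centroids]
    for x, y in partition:
        k = min(range(len(centroids)),
                key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2)
        stats[k][0] += x
        stats[k][1] += y
        stats[k][2] += 1
    return stats

def combine(all_stats, centroids: List[Point]) -> List[Point]:
    """Compute-side step: merge partial statistics into new centroids."""
    new = []
    for k, c in enumerate(centroids):
        sx = sum(s[k][0] for s in all_stats)
        sy = sum(s[k][1] for s in all_stats)
        n = sum(s[k][2] for s in all_stats)
        new.append((sx / n, sy / n) if n else c)  # an empty cluster keeps its centroid
    return new

def kmeans(partitions: List[List[Point]], centroids: List[Point], iterations: int = 10):
    for _ in range(iterations):
        stats = [partial_stats(p, centroids) for p in partitions]  # parallel on the data grid
        centroids = combine(stats, centroids)                      # small merge on the compute grid
    return centroids

parts = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)], [(1.0, 0.0), (11.0, 10.0)]]
init = [(0.0, 0.0), (10.0, 10.0)]
centroids = kmeans(parts, init)
print(centroids)
```

Running the same function on a single merged partition gives the same centroids as the partitioned run. The decomposition changes where the work runs, not the result.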

Very Large Data Bases | 2008

A request-routing framework for SOA-based enterprise computing

Thomas Phan; Wen-Syan Li

Enterprises may use a service-oriented architecture (SOA) to provide a streamlined interface to their business processes. To scale up the system, each tier in a composite service usually deploys multiple servers for load distribution and fault tolerance. Such load distribution across multiple servers within the same tier can be viewed as horizontal load distribution. One limitation of this approach is that load cannot be further distributed when all servers in the same tier are fully loaded. In complex multi-tiered systems, a single business process may actually be implemented by multiple different computation pathways among the tiers, each with different components, in order to provide resiliency and scalability. Such SOA-based enterprise computing with multiple implementation options gives opportunities for vertical load distribution across tiers. In this paper, we propose a request-routing framework for SOA-based enterprise computing that takes into consideration both horizontal and vertical load distribution. Through experimentation we show that our algorithm and methodology scale well to a large system configuration comprising up to 1000 workflow requests to a complex composite service with multiple implementations. We also show that a combination of both horizontal and vertical load distributions gives the maximum flexibility to improve performance and fault tolerance.
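The horizontal/vertical distinction can be illustrated with a small routing sketch. It is not the paper's framework, and the topology, capacities, and scoring rule below are hypothetical assumptions. Horizontally, the router picks the least-utilised server with spare capacity within each tier. Vertically, it compares the alternative implementation pathways and picks the one whose busiest chosen server is least utilised.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Server:
    name: str
    capacity: int   # concurrent requests the server can accept (assumed)
    load: int = 0

    def utilization(self) -> float:
        return self.load / self.capacity

Tiers = Dict[str, List[Server]]  # tier name -> servers in that tier

def route(tiers: Tiers, pathways: List[List[str]]) -> Optional[List[str]]:
    """Route one request; each pathway is assumed to be a non-empty list of tiers.
    Horizontal: inside each tier, choose the least-utilised server with spare capacity.
    Vertical: among pathways, choose the one whose busiest chosen server is least utilised."""
    best_plan, best_score = None, float("inf")
    for path in pathways:
        plan = []
        for tier in path:
            free = [s for s in tiers[tier] if s.load < s.capacity]
            if not free:
                plan = None  # this implementation pathway is saturated
                break
            plan.append(min(free, key=Server.utilization))
        if plan is None:
            continue
        score = max(s.utilization() for s in plan)
        if score < best_score:
            best_plan, best_score = plan, score
    if best_plan is None:
        return None  # every pathway has a fully loaded tier
    for s in best_plan:
        s.load += 1
    return [s.name for s in best_plan]

# Hypothetical composite service with two implementations (implA, implB).
tiers = {
    "front": [Server("f1", 2), Server("f2", 2)],
    "implA": [Server("a1", 1)],
    "implB": [Server("b1", 1)],
    "back": [Server("d1", 4)],
}
pathways = [["front", "implA", "back"], ["front", "implB", "back"]]
first, second, third = (route(tiers, pathways) for _ in range(3))
print(first, second, third)
```

The second request only succeeds because of vertical distribution: once `implA` is full, it is routed through `implB`. Within the front tier, horizontal distribution sends it to `f2`.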

Middleware for Service Oriented Computing | 2006

Heuristics-based scheduling of composite web service workloads

Thomas Phan; Wen-Syan Li

Web services can be aggregated to create composite workflows that provide streamlined functionality for human users or other systems. Although industry standards and recent research have sought to define best practices and to improve end-to-end workflow composition, one area that has not been fully explored is the scheduling of a workflow's web service requests to actual service provisioning in a multi-tiered, multi-organisation environment. This issue is relevant to modern business scenarios where business processes within a workflow must complete within QoS-defined limits. Because these business processes are web service consumers, service requests must be mapped and scheduled across multiple web service providers, each with its own negotiated service level agreement. In this paper we provide heuristics for scheduling service requests from multiple business process workflows to web service providers such that a business value metric across all workflows is maximised. We show that a genetic search algorithm is appropriate to perform this scheduling, and through experimentation we show that our algorithm scales well up to a thousand workflows and produces better mappings than traditional approaches.
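Any search over request-to-provider mappings needs an objective to maximise. The sketch below shows one hypothetical business value metric, not the paper's definition: each workflow earns a fixed value if its last request finishes within its QoS deadline and nothing otherwise. Each provider serves its requests in order at an assumed SLA rate. For a toy instance this small, exhaustive search is used in place of the paper's genetic search, which would explore the same space heuristically at scale.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical instance (illustrative numbers only).
REQUESTS: List[Tuple[str, int]] = [("wf1", 2), ("wf1", 1), ("wf2", 3), ("wf2", 2)]  # (workflow, work units)
SECS_PER_UNIT = {"p1": 1.0, "p2": 2.0}          # provider speed assumed from its SLA
WORKFLOWS = {"wf1": (3.0, 10.0), "wf2": (6.0, 20.0)}  # workflow -> (deadline s, value if met)

def business_value(mapping: Tuple[str, ...]) -> float:
    """Sum of values of workflows whose last request completes by the deadline.
    Each provider processes the requests mapped to it one at a time, in list order."""
    clock: Dict[str, float] = {p: 0.0 for p in SECS_PER_UNIT}
    finish: Dict[str, float] = {w: 0.0 for w in WORKFLOWS}
    for (wf, work), provider in zip(REQUESTS, mapping):
        clock[provider] += work * SECS_PER_UNIT[provider]
        finish[wf] = max(finish[wf], clock[provider])
    return sum(value for wf, (deadline, value) in WORKFLOWS.items()
               if finish[wf] <= deadline)

def best_mapping() -> Tuple[str, ...]:
    """Exhaustive search over all provider assignments (tiny instances only)."""
    return max(product(SECS_PER_UNIT, repeat=len(REQUESTS)), key=business_value)

all_p1 = ("p1",) * len(REQUESTS)
print(business_value(all_p1), business_value(best_mapping()))  # 10.0 30.0
```

When everything is sent to the faster provider, `wf2` misses its deadline. Offloading one request to the slower provider lets both workflows meet their QoS limits, which is the kind of trade-off a value-driven scheduler has to discover.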

International Parallel and Distributed Processing Symposium | 2007

Middleware and Performance Issues for Computational Finance Applications on Blue Gene/L

Thomas Phan; Ramesh Natarajan; Satoki Mitsumori; Hao Yu

We discuss real-world case studies involving the implementation of a Web services middleware tier for the IBM Blue Gene/L supercomputer to support financial business applications. These programs are representative of a class of modern financial analytics that take part in distributed business workflows and are heavily database-centric, with input and output data stored in external SQL data warehouses. We describe the design issues related to the development of our middleware tier, which provides a number of core features, including an automated SQL data extraction and staging gateway, a standardized high-level job specification schema, a well-defined Web services (SOAP) API for interoperability with other applications, and a secure HTML/JSP Web-based interface suitable for general users. Further, we provide observations on performance optimizations to support the relevant data movement requirements.
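The following is a minimal sketch of what a "SQL data extraction and staging gateway" driven by a high-level job specification could look like. All names here are hypothetical: `JobSpec`, `stage_input`, `write_results`, the `positions` table, and the CSV staging format. An in-memory SQLite database stands in for the external SQL data warehouse, and the SOAP API and Blue Gene/L execution are not modelled.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class JobSpec:
    """Hypothetical high-level job description submitted by a client."""
    application: str
    input_query: str    # SQL that extracts the job's input from the warehouse
    output_table: str   # table that receives results (must be a trusted name)

def stage_input(conn: sqlite3.Connection, spec: JobSpec) -> str:
    """Gateway step: run the extraction query and stage the rows as CSV text."""
    cur = conn.execute(spec.input_query)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur.fetchall())
    return buf.getvalue()

def write_results(conn: sqlite3.Connection, spec: JobSpec, rows: List[Tuple[int, float]]) -> None:
    """Gateway step: load the compute job's output back into the warehouse.
    The table name is interpolated, so it must not come from untrusted input."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {spec.output_table} (id INTEGER, value REAL)")
    conn.executemany(f"INSERT INTO {spec.output_table} VALUES (?, ?)", rows)
    conn.commit()

# Stand-in warehouse with a hypothetical positions table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (id INTEGER, qty REAL, price REAL)")
conn.executemany("INSERT INTO positions VALUES (?, ?, ?)", [(1, 10, 2.5), (2, 4, 3.0)])

spec = JobSpec("portfolio_value", "SELECT id, qty, price FROM positions ORDER BY id", "results")
staged = stage_input(conn, spec)
# Placeholder "compute" step: value = qty * price for each staged row.
rows = [(int(r["id"]), float(r["qty"]) * float(r["price"]))
        for r in csv.DictReader(io.StringIO(staged))]
write_results(conn, spec, rows)
print(conn.execute("SELECT * FROM results ORDER BY id").fetchall())  # [(1, 25.0), (2, 12.0)]
```

Keeping the SQL in the job specification means the client describes *what* data a job needs, while the gateway decides how to extract and stage it.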

International Conference on Mobile and Ubiquitous Systems: Networking and Services | 2006

TypeCast: Type-Based Routing in Wireless Ad-hoc Networks

Jinsong Lin; Thomas Phan; Rajive L. Bagrodia

Type-based communication is proposed as an effective paradigm to enable group communication in wireless ad-hoc networks (MANETs). In this paradigm, type is used as the fundamental construct for addressing and routing messages. Type hierarchies are used to dynamically control group size, and object-oriented principles such as subtyping and multiple inheritance are utilized to construct new groups from existing ones. We present the design of TypeCast, a routing protocol that directly supports type-based communication. TypeCast leverages the efficiency and mobility management provided by MANET multicast protocols and extends them by adding a Bloom filter-based type encoding and routing mechanism. TypeCast is fully decentralized and supports subtyping and type-composition. We implement TypeCast on top of ODMRP and conduct a detailed performance and scalability study of TypeCast through simulation. The results show that TypeCast demonstrates good resiliency to mobility and group size. When the number of types in the network increases, TypeCast achieves good scalability thanks to the type aggregation provided by Bloom filters.
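The sketch below shows how a Bloom filter can encode types with subtyping and support aggregation. It is written in the spirit of the abstract and is not TypeCast's actual encoding. Each node inserts its own type plus all transitive supertypes into a filter, so a message addressed to a supertype reaches every node of a subtype. A router can OR the filters of many downstream nodes into one summary. The type hierarchy, the filter size (`m=256`, `k=3`), and the salted SHA-256 hashing are hypothetical choices.

```python
import hashlib
from functools import reduce
from typing import Dict, Iterable, List, Set

class BloomFilter:
    """Fixed-size Bloom filter; the bit array is stored in a Python int."""
    def __init__(self, m: int = 256, k: int = 3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: str):
        # k bit positions derived from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: str) -> bool:
        # False positives are possible; false negatives are not.
        return all((self.bits >> p) & 1 for p in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        out = BloomFilter(self.m, self.k)
        out.bits = self.bits | other.bits
        return out

# Hypothetical hierarchy with multiple inheritance: type -> direct supertypes.
SUPERTYPES: Dict[str, List[str]] = {
    "Sensor": [], "Alert": [],
    "TempSensor": ["Sensor"],
    "FireAlarm": ["TempSensor", "Alert"],
}

def type_closure(t: str) -> Set[str]:
    """The type itself plus all of its transitive supertypes."""
    seen: Set[str] = set()
    stack = [t]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(SUPERTYPES[cur])
    return seen

def node_filter(node_type: str) -> BloomFilter:
    bf = BloomFilter()
    for t in type_closure(node_type):
        bf.add(t)
    return bf

def aggregate(filters: Iterable[BloomFilter]) -> BloomFilter:
    """Summarise many downstream nodes in a single filter (bitwise OR)."""
    return reduce(BloomFilter.union, filters, BloomFilter())

def matches(dest_type: str, bf: BloomFilter) -> bool:
    """Forward or deliver a message addressed to dest_type if bf (probably) contains it."""
    return dest_type in bf

downstream = aggregate([node_filter("FireAlarm"), node_filter("TempSensor")])
print(matches("Alert", downstream), matches("Sensor", downstream))  # True True
```

Addressing the supertype `Sensor` reaches both nodes, while `Alert` reaches only the `FireAlarm` node. This is how moving up the hierarchy widens the group. The aggregated filter has a fixed size no matter how many nodes it summarises, at the cost of occasional false-positive forwards.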

Database and Expert Systems Applications | 2005

XG: a data-driven computation grid for enterprise-scale mining

Radu Sion; Ramesh Natarajan; Inderpal Narang; Wen-Syan Li; Thomas Phan

In this paper we introduce a novel architecture for data processing, based on a functional fusion between a data and a computation layer. We show how such an architecture can be leveraged to offer significant speedups for data processing jobs such as data analysis and mining over large data sets.

One novel contribution of our solution is its data-driven approach. The computation infrastructure is controlled from within the data layer. Grid compute job submission events are based within the query processor on the DBMS side and in effect controlled by the data processing job to be performed. This allows the early deployment of on-the-fly data aggregation techniques, minimizing the amount of data to be transferred to/from compute nodes, and is in stark contrast to existing Grid solutions that interact with data layers mainly as external "storage".

We validate this in a scenario derived from a real business deployment, involving financial customer profiling using common types of data analytics (e.g., linear regression analysis). Experimental results show significant speedups. For example, using a grid of only 12 non-dedicated nodes, we observed a speedup of approximately 1000% in a scenario involving complex linear regression analysis data mining computations for commercial customer profiling.
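"On-the-fly data aggregation" in the data layer can be illustrated with the regression case the abstract mentions. In this minimal sketch the DBMS reduces a table to the five sufficient statistics of a simple linear regression. A compute node then solves the normal equations from those numbers alone. This illustrates pushing aggregation into SQL, not XG's query-processor-driven job submission. SQLite, the `customers` table, and its `x`/`y` columns are hypothetical stand-ins.

```python
import sqlite3
from typing import Tuple

def aggregate_in_db(conn: sqlite3.Connection, table: str) -> Tuple:
    """Data-layer step: the DBMS reduces the table to five numbers, so only
    these (not the rows) travel to a compute node.
    `table` is interpolated into SQL and must be a trusted name."""
    return conn.execute(
        f"SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM {table}"
    ).fetchone()

def fit_on_compute_node(stats: Tuple) -> Tuple[float, float]:
    """Compute-side step: solve the normal equations for y = a + b * x."""
    n, sx, sy, sxx, sxy = stats
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (x REAL, y REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(x, 2 * x + 1) for x in range(5)])
intercept, slope = fit_on_compute_node(aggregate_in_db(conn, "customers"))
print(intercept, slope)  # 1.0 2.0
```

The message sent to the compute node stays the same size however many rows the table holds. This is why aggregating before transfer, rather than shipping raw data to the grid, can pay off on large data sets.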

Extending Database Technology | 2006

XG: a grid-enabled query processing engine

Radu Sion; Ramesh Natarajan; Inderpal Narang; Thomas Phan

In [12] we introduce a novel architecture for data processing, based on a functional fusion between a data and a computation layer. In this demo we show how this architecture is leveraged to offer significant speedups for data processing jobs such as data analysis and mining over large data sets.

One novel contribution of our solution is its data-driven approach. The computation infrastructure is controlled from within the data layer. Grid compute job submission events are based within the query processor on the DBMS side and in effect controlled by the data processing job to be performed. This allows the early deployment of on-the-fly data aggregation techniques, minimizing the amount of data to be transferred to/from compute nodes, and is in stark contrast to existing Grid solutions that interact with data layers as external (mainly) "storage" components. By integrating scheduling intelligence in the data layer itself, we show that it is possible to provide a close to optimal solution to the more general grid trade-off between required data replication costs and computation speed-up benefits. We validate this in a scenario derived from a real business deployment, involving financial customer profiling using common types of data analytics.
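The trade-off between replication cost and speed-up can be made concrete with a deliberately simple cost model, which is an assumption and not XG's. Copying the input to each extra node costs `repl_cost` seconds, with copies made one after another. The `work` seconds of computation split evenly across `k` nodes. Adding nodes shrinks the compute term but grows the replication term, so the total time has a minimum.

```python
def total_time(k: int, work: float, repl_cost: float) -> float:
    """Hypothetical model: replicate the input to k - 1 extra nodes sequentially,
    then divide `work` seconds of computation evenly over k nodes."""
    return repl_cost * (k - 1) + work / k

def best_parallelism(work: float, repl_cost: float, max_nodes: int) -> int:
    """Node count in 1..max_nodes that minimises total time under the model."""
    return min(range(1, max_nodes + 1), key=lambda k: total_time(k, work, repl_cost))

k = best_parallelism(work=200.0, repl_cost=2.0, max_nodes=64)
print(k, total_time(k, 200.0, 2.0))  # 10 38.0
```

Under this model the continuous optimum is at k* = sqrt(work / repl_cost), which is 10 here. Past that point, each extra replica costs more than the computation time it saves. A scheduler sitting in the data layer knows the data volume, so it is well placed to estimate both terms.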

Archive | 2008