Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Chee Sun Liew is active.

Publication


Featured research published by Chee Sun Liew.


High Performance Distributed Computing | 2010

Towards optimising distributed data streaming graphs using parallel streams

Chee Sun Liew; Malcolm P. Atkinson; Jano van Hemert; Liangxiu Han

Modern scientific collaborations have opened up the opportunity of solving complex problems that involve multi-disciplinary expertise and large-scale computational experiments. These experiments usually involve large amounts of data that are located in distributed data repositories running various software systems, and managed by different organisations. A common strategy to make the experiments more manageable is executing the processing steps as a workflow. In this paper, we look into the implementation of fine-grained data-flow between computational elements in a scientific workflow as streams. We model the distributed computation as a directed acyclic graph where the nodes represent the processing elements that incrementally implement specific subtasks. The processing elements are connected in a pipelined streaming manner, which allows task executions to overlap. We further optimise the execution by splitting pipelines across processes and by introducing extra parallel streams. We identify performance metrics and design a measurement tool to evaluate each enactment. We conducted experiments to evaluate our optimisation strategies with a real world problem in the Life Sciences---EURExpress-II. The paper presents our distributed data-handling model, the optimisation and instrumentation strategies and the evaluation experiments. We demonstrate linear speed up and argue that this use of data-streaming to enable both overlapped pipeline and parallelised enactment is a generally applicable optimisation strategy.
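The pipelined streaming idea in the abstract can be illustrated with a minimal sketch (not the paper's code): each processing element (PE) is a Python generator, so items flow through the chain incrementally and stage executions can overlap.

```python
# Minimal sketch, assuming each PE is a generator in a linear pipeline.

def reader(records):
    # Source PE: emit raw records one at a time.
    for r in records:
        yield r

def normalise(stream):
    # Intermediate PE: incrementally transform each item.
    for r in stream:
        yield r.strip().lower()

def classify(stream):
    # Sink PE: consume the stream and produce results.
    return [("long" if len(r) > 4 else "short", r) for r in stream]

# Compose the data-flow graph (here a simple linear pipeline)
# by chaining the streams.
pipeline = classify(normalise(reader(["  Alpha", "Be  ", "  Gamma "])))
print(pipeline)  # [('long', 'alpha'), ('short', 'be'), ('long', 'gamma')]
```

Because each stage pulls items lazily, a downstream PE can start work before the upstream PE has finished its whole input, which is the overlap the paper exploits.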


Sensors | 2015

Mining personal data using smartphones and wearable devices: a survey.

Muhammad Habib ur Rehman; Chee Sun Liew; Teh Ying Wah; Junaid Shuja; Babak Daghighi

The staggering growth in smartphone and wearable device use has led to a massive-scale generation of personal (user-specific) data. To explore, analyze, and extract useful information and knowledge from this deluge of personal data, one has to leverage these devices as data-mining platforms in ubiquitous, pervasive, and big data environments. This study presents the personal ecosystem, where all computational resources, communication facilities, storage, and knowledge management systems are available in the user's proximity. An extensive review of the recent literature has been conducted and a detailed taxonomy is presented. The performance evaluation metrics and their empirical evidence are also summarized in this paper. Finally, we highlight some future research directions and potentially emerging application areas for personal data mining using smartphones and wearable devices.


Proceedings of the Second International Workshop on Data-Aware Distributed Computing | 2009

A distributed architecture for data mining and integration

Malcolm P. Atkinson; Jano van Hemert; Liangxiu Han; Ally Hume; Chee Sun Liew

This paper presents the rationale for a new architecture to support a significant increase in the scale of data integration and data mining. It proposes the composition into one framework of (1) data mining and (2) data access and integration. We name the combined activity DMI. It supports enactment of DMI processes across heterogeneous and distributed data resources and data mining services. It posits that a useful division can be made between the facilities established to support the definition of DMI processes and the computational infrastructure provided to enact DMI processes. Communication between those two divisions is restricted to requests submitted to gateway services in a canonical DMI language. Larger-scale processes are enabled by incremental refinement of DMI-process definitions often by recomposition of lower-level definitions. Autonomous evolution of data resources and services is supported by types and descriptions which will support detection of inconsistencies and semi-automatic insertion of adaptations. These architectural ideas are being evaluated in a feasibility study that involves an application scenario and representatives of the community.


Parallel Computing | 2011

A generic parallel processing model for facilitating data mining and integration

Liangxiu Han; Chee Sun Liew; Jano van Hemert; Malcolm P. Atkinson

To facilitate data mining and integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be in memory, buffered via disks or inter-computer data-flows. This makes it possible to build arbitrary DAGs with pipelining and both data and task parallelisms, which provide room for performance enhancement. We have applied this approach to a real DMI case in the life sciences and implemented a prototype. To demonstrate feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental evaluation results show that a linear speedup has been achieved with the increase of the number of distributed computing nodes in this case study.
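The data parallelism mentioned in the abstract can be sketched as a hypothetical illustration (not the paper's prototype): one input stream is split round-robin across several copies of a PE, each copy processes its share, and the partial streams are merged back into one.

```python
# Hypothetical sketch of data parallelism over a streamed PE.

def split(stream, n):
    # Distribute items across n sub-streams (round-robin).
    parts = [[] for _ in range(n)]
    for i, item in enumerate(stream):
        parts[i % n].append(item)
    return parts

def merge(parts):
    # Recombine sub-streams, preserving the round-robin order.
    out = []
    for i in range(max(len(p) for p in parts)):
        for p in parts:
            if i < len(p):
                out.append(p[i])
    return out

square = lambda xs: [x * x for x in xs]   # the replicated PE (toy example)
parts = split(range(7), 3)                # [[0, 3, 6], [1, 4], [2, 5]]
print(merge([square(p) for p in parts]))  # [0, 1, 4, 9, 16, 25, 36]
```

In a real enactment the three copies would run on separate processes or nodes; the split/merge pair is what makes adding extra parallel streams transparent to the rest of the graph.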


Malaysian Journal of Computer Science | 2007

Enhanced Software Development Effort and Cost Estimation Using Fuzzy Logic Model

Moon Ting Su; Teck Chaw Ling; Keat Keong Phang; Chee Sun Liew; Peck Yen Man

The development of software has always been characterized by parameters that possess a certain level of fuzziness. This requires that some degree of uncertainty be introduced into the models in order to make them realistic, and fuzzy logic fares well in this area. Many of the problems of existing effort estimation models can be solved by incorporating fuzzy logic. Fuzzy logic has also been combined with algorithmic and non-algorithmic effort estimation models, as well as combinations of them, to deal with the inherent uncertainty issues. This paper describes an enhanced fuzzy logic model for the estimation of software development effort. The model (FLECE) possesses capabilities similar to those of the previous fuzzy logic model. In addition, the enhancements in FLECE improve the empirical accuracy of the previous model in terms of MMRE (Mean Magnitude of Relative Error) and a threshold-oriented prediction measure, or prediction quality (pred).
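The two accuracy measures named in the abstract are standard in effort estimation and are easy to compute; the sketch below uses made-up effort figures purely for illustration.

```python
# Illustrative computation of MMRE and pred(l); the sample data is hypothetical.

def mmre(actual, estimated):
    # Mean Magnitude of Relative Error: average of |actual - estimate| / actual.
    return sum(abs(a - e) / a for a, e in zip(actual, estimated)) / len(actual)

def pred(actual, estimated, l=0.25):
    # Fraction of estimates whose relative error is within threshold l.
    hits = sum(1 for a, e in zip(actual, estimated) if abs(a - e) / a <= l)
    return hits / len(actual)

actual = [120, 80, 200, 150]      # e.g. person-hours (made-up values)
estimated = [110, 95, 160, 155]

print(round(mmre(actual, estimated), 3))  # 0.126
print(pred(actual, estimated))            # 1.0
```

A lower MMRE and a higher pred both indicate better estimates, which is the direction of improvement the paper reports for FLECE.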


International Conference on Big Data and Cloud Computing | 2014

Data-Intensive Workflow Optimization Based on Application Task Graph Partitioning in Heterogeneous Computing Systems

Saima Gulzar Ahmad; Chee Sun Liew; M. Mustafa Rafique; Ehsan Ullah Munir; Samee Ullah Khan

The stream-based data processing model has proven to be an effective method for optimizing data-intensive applications, which involve the movement of huge amounts of data between execution nodes and thus incur large costs. The data-streaming model improves the execution performance of such applications. In the stream-based data processing model, performance is usually measured by throughput and latency. Optimizing these performance metrics in a heterogeneous computing environment is particularly challenging due to differences in the computing capacity of execution nodes and variations in the data transfer capability of the communication links between them. This paper presents a dual-objective Partitioning-based Data-intensive Workflow optimization Algorithm (PDWA) for heterogeneous computing systems. PDWA significantly reduces latency while increasing throughput. In the proposed algorithm, the application task graph is partitioned such that inter-partition data movement is minimal; this optimized partitioning enhances throughput. Each partition is then mapped to the execution node that gives the minimum execution time for that particular partition. PDWA also exploits partial task duplication to reduce latency. We evaluated the proposed algorithm with synthesized benchmarks and workflows from real-world workloads; it shows 60% reduced latency and a 47% improvement in throughput compared to the unpartitioned approach.
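The partitioning objective, minimizing inter-partition data movement, can be made concrete with a toy sketch (this is not PDWA itself, just an exhaustive search over a tiny hypothetical task graph whose edge weights are data volumes).

```python
# Toy illustration: find the two-way partition of a small task graph
# that minimises the data volume crossing the partition boundary.
from itertools import combinations

# Hypothetical task graph: edge -> data volume transferred along it.
edges = {("a", "b"): 10, ("b", "c"): 2, ("a", "c"): 1, ("c", "d"): 8}
nodes = ["a", "b", "c", "d"]

def cut_cost(part):
    # Sum of data volumes on edges crossing the partition boundary.
    return sum(w for (u, v), w in edges.items() if (u in part) != (v in part))

best = min(
    (frozenset(c) for r in range(1, len(nodes)) for c in combinations(nodes, r)),
    key=cut_cost,
)
print(sorted(best), cut_cost(best))  # ['a', 'b'] 3
```

Keeping the heavy edges ("a"-"b" and "c"-"d") inside partitions leaves only light edges on the cut. PDWA pursues the same goal heuristically, since exhaustive search is infeasible for real workflow graphs.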


Data Science and Engineering | 2016

Big Data Reduction Methods: A Survey

Muhammad Habib ur Rehman; Chee Sun Liew; Assad Abbas; Prem Prakash Jayaraman; Teh Ying Wah; Samee Ullah Khan

Research on big data analytics is entering a new phase, called fast data, where multiple gigabytes of data arrive in big data systems every second. Modern big data systems collect inherently complex data streams due to the volume, velocity, value, variety, variability, and veracity of the acquired data, which give rise to the 6Vs of big data. Reduced and relevant data streams are perceived to be more useful than raw, redundant, inconsistent, and noisy data. Another motivation for big data reduction is that datasets with millions of variables suffer from the curse of dimensionality, which requires unbounded computational resources to uncover actionable knowledge patterns. This article presents a review of methods used for big data reduction. It also presents a detailed taxonomic discussion of big data reduction methods, including network theory, big data compression, dimension reduction, redundancy elimination, data mining, and machine learning methods. In addition, open research issues pertinent to big data reduction are highlighted.
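One of the reduction families the survey names, redundancy elimination, can be sketched in a few lines of standard-library Python: records whose content hash has been seen before are dropped, so only novel data flows downstream. The sample stream is made up for illustration.

```python
# Minimal sketch of redundancy elimination via content hashing.
import hashlib

def deduplicate(records):
    seen, kept = set(), []
    for rec in records:
        digest = hashlib.sha256(rec.encode()).hexdigest()
        if digest not in seen:   # keep only the first occurrence
            seen.add(digest)
            kept.append(rec)
    return kept

stream = ["t=1 ok", "t=2 ok", "t=1 ok", "t=2 ok", "t=3 err"]
print(deduplicate(stream))  # ['t=1 ok', 't=2 ok', 't=3 err']
```

Real systems apply the same idea at chunk or block granularity so that near-duplicate payloads are also eliminated before storage or transmission.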


Distributed and Parallel Databases | 2012

Data-intensive architecture for scientific knowledge discovery

Malcolm P. Atkinson; Chee Sun Liew; Michelle Galea; Paul R. Martin; Amrey Krause; Adrian Mouat; Oscar Corcho; David Snelling

This paper presents a data-intensive architecture that demonstrates the ability to support applications from a wide range of application domains, and support the different types of users involved in defining, designing and executing data-intensive processing tasks. The prototype architecture is introduced, and the pivotal role of DISPEL as a canonical language is explained. The architecture promotes the exploration and exploitation of distributed and heterogeneous data and spans the complete knowledge discovery process, from data preparation, to analysis, to evaluation and reiteration. The architecture evaluation included large-scale applications from astronomy, cosmology, hydrology, functional genetics, imaging processing and seismology.


ACM Computing Surveys | 2017

Scientific Workflows: Moving Across Paradigms

Chee Sun Liew; Malcolm P. Atkinson; Michelle Galea; Tan Fong Ang; Paul Martin; Jano van Hemert

Modern scientific collaborations have opened up the opportunity to solve complex problems that require both multidisciplinary expertise and large-scale computational experiments. These experiments typically consist of a sequence of processing steps that need to be executed on selected computing platforms. Execution poses a challenge, however, due to (1) the complexity and diversity of applications, (2) the diversity of analysis goals, (3) the heterogeneity of computing platforms, and (4) the volume and distribution of data. A common strategy to make these in silico experiments more manageable is to model them as workflows and to use a workflow management system to organize their execution. This article looks at the overall challenge posed by a new order of scientific experiments and the systems they need to be run on, and examines how this challenge can be addressed by workflows and workflow management systems. It proposes a taxonomy of workflow management system (WMS) characteristics, including aspects previously overlooked. This frames a review of prevalent WMSs used by the scientific community, elucidates their evolution to handle the challenges arising with the emergence of the “fourth paradigm,” and identifies research needed to maintain progress in this area.


Workflows in Support of Large-Scale Science | 2014

Workflows in a dashboard: a new generation of usability

Sandra Gesing; Malcolm P. Atkinson; Rosa Filgueira; Ian Taylor; Andrew Clifford Jones; Vlado Stankovski; Chee Sun Liew; Alessandro Spinuso; Gabor Terstyanszky; Péter Kacsuk

In the last 20 years, quite a few mature workflow engines and workflow editors have been developed to support communities in managing workflows. While providers of workflow engines tend to ease the creation of workflows tailored to their specific workflow system, the management tools still often require a deep understanding of workflow concepts and languages. This paper describes an approach that targets various workflow systems and builds a single user interface for editing and monitoring workflows, taking into account aspects such as optimization and data provenance. The design employs agile web frameworks and novel technologies to build a workflow dashboard that is offered in a web browser and connects seamlessly to available workflow systems and external resources such as cloud infrastructures. The user interface eliminates the need to become acquainted with diverse layouts, greatly increasing usability across the various aspects of managing workflows.

Collaboration


Dive into Chee Sun Liew's collaborations.

Top Co-Authors

Tan Fong Ang
Information Technology University

Lip Yee Por
Information Technology University

Muhammad Habib ur Rehman
Information Technology University

Liangxiu Han
Manchester Metropolitan University

Teh Ying Wah
Information Technology University

Ehsan Ullah Munir
COMSATS Institute of Information Technology

Moon Ting Su
Information Technology University