Linh Bao Ngo | Researchain

Publication

Featured research published by Linh Bao Ngo.

international conference on big data | 2014

Synthetic data generation for the internet of things

Jason W. Anderson; K. E. Kennedy; Linh Bao Ngo; Andre Luckow; Amy W. Apon

The concept of the Internet of Things (IoT) is rapidly moving from a vision to being pervasive in our everyday lives. This can be observed in the integration of connected sensors from a multitude of devices such as mobile phones, healthcare equipment, and vehicles. There is a need for the development of infrastructure support and analytical tools to handle IoT data, which are naturally big and complex. However, research on IoT data can be constrained by concerns about the release of privately owned data. In this paper, we present the design and implementation results of a synthetic IoT data generation framework. The framework enables research on synthetic data that exhibit the complex characteristics of original data without compromising proprietary information and personal privacy.

international conference on cluster computing | 2013

JUMMP: Job Uninterrupted Maneuverable MapReduce Platform

William Clay Moody; Linh Bao Ngo; Edward B. Duffy; Amy W. Apon

In this paper, we present JUMMP, the Job Uninterrupted Maneuverable MapReduce Platform, an automated scheduling platform that provides a customized Hadoop environment within a batch-scheduled cluster environment. JUMMP enables an interactive pseudo-persistent MapReduce platform within the existing administrative structure of an academic high performance computing center by “jumping” between nodes with minimal administrative effort. Jumping is implemented by the synchronization of stopping and starting daemon processes on different nodes in the cluster. Our experimental evaluation shows that JUMMP can be as efficient as a persistent Hadoop cluster on dedicated computing resources, depending on the jump time. Additionally, we show that the cluster remains stable, with good performance, in the presence of jumps that occur as frequently as the average length of reduce tasks of the currently executing MapReduce job. JUMMP provides an attractive solution to academic institutions that desire to integrate Hadoop into their current computing environment within their financial, technical, and administrative constraints.

international conference on cluster computing | 2015

Evaluating R-Based Big Data Analytic Frameworks

Mei Liang; Cesar Trejo; Lavanya Muthu; Linh Bao Ngo; Andre Luckow; Amy W. Apon

We study two approaches, rHadoop and H2O, to integrate R, a popular statistical programming environment, into the Hadoop Big Data ecosystem. Using these approaches and the vanilla implementation of MapReduce to implement the solution to an analytic question for the on-time airline performance data set, we evaluate the differences in runtime performance and elaborate on the causes of these differences based on rHadoop's and H2O's design principles.

international conference on information technology: new generations | 2012

An Architecture for Mining and Visualization of U.S. Higher Educational Data

Linh Bao Ngo; Vijay Dantuluri; Michael J. Stealey; Stanley C. Ahalt; Amy W. Apon

Higher education has undergone considerable change in the past decades. As a result, the higher education community is collecting and disseminating a great deal of data that is typically used to benchmark performance or satisfy reporting requirements. This data is a rich source for scholarly inquiry, and particularly interesting for questions related to investment strategies within the academy. However, the real value of these data sets can often only be realized when the data is viewed and studied across the aggregate collection of data sources. This is a complex task that requires gathering, cleaning, and applying consistent metadata standards to data sets. This paper presents a Unified Data Framework that allows the aggregation of high-demand data sources into a single useful research resource that is relevant to research in higher education. The Unified Data Framework guides the aggregation of existing and new data sets, and provides the option of connecting and automatically, or semi-automatically, updating data from the original sources. The Unified Data Framework presents to researchers of higher education a robust suite of analytic tools for data mining and visualization of combined and complex data sources.

Transportation Research Record | 2015

Potentials of Online Media and Location-Based Big Data for Urban Transit Networks in Developing Countries

Kelsey Lantz; Sakib Muhmud Khan; Linh Bao Ngo; Mashrur Chowdhury; Sarah Donaher; Amy W. Apon

Big data, collected in the form of social media posts and mobile phone location tracking, have great potential to inform and manage the planning and operation of transit networks in developing countries. Data are widely available, but the challenge, as with developed countries, is figuring out how best to use them. A case study method was used to consider approaches in Nairobi, Kenya; Istanbul, Turkey; and Dhaka, Bangladesh. In Nairobi, GPS location data were collected to generate the first map of the complex Matatu transit network. In Istanbul, data from automated fare collection systems were processed to better understand the usage of a bus rapid transit system. In Dhaka, researchers were collecting GPS positioning data to manage the city bus networks. Residents of these developing cities were frequent users of online media, as in many cities in developing countries. This study revealed that integration of online media with location-based data provided a big data scenario that had the potential for supporting transit operations while posing challenges to the management of data mobility. It is not realistic to apply a one-size-fits-all approach to any problem in the developing world, but together the case studies show that with the right approach, technical capacity in transitional cities has the potential to grow to support higher-level data processing and make more efficient and more sustainable policy decisions for crucial urban transit networks in developing countries.

international parallel and distributed processing symposium | 2014

Teaching HDFS/MapReduce Systems Concepts to Undergraduates

Linh Bao Ngo; Edward B. Duffy; Amy W. Apon

This paper presents the development of a Hadoop MapReduce module that has been taught in a course in distributed computing to upper undergraduate computer science students at Clemson University. The paper describes our teaching experiences and the feedback from the students over several semesters that have helped to shape the course. We provide suggested best practices for lecture materials, the computing platform, and the teaching methods. In addition, the computing platform and teaching methods can be extended to accommodate emerging technologies and modules for related courses.

international conference on information technology: new generations | 2010

A Forecasting Capability Study of Empirical Mode Decomposition for the Arrival Time of a Parallel Batch System

Linh Bao Ngo; Amy W. Apon; Doug Hoffman

This paper demonstrates the feasibility and potential of applying empirical mode decomposition (EMD) to forecast the arrival time behaviors in a parallel batch system. An analysis of the workload records shows the existence of daily and weekly patterns within the workload. Results show that the intrinsic mode functions (IMF), products of the sifting/decomposition process of EMD, produce a better prediction than the original arrival histogram when used in a simple weight-matching prediction technique. Promising applications include the implementation of an EMD/neural network combination.

IEEE Transactions on Intelligent Transportation Systems | 2018

A Distributed Message Delivery Infrastructure for Connected Vehicle Technology Applications

Yuheng Du; Mashrur Chowdhury; Mizanur Rahman; Kakan Dey; Amy W. Apon; Andre Luckow; Linh Bao Ngo

A complex and vast amount of data will be collected from on-board sensors of operational connected vehicles (CVs), infrastructure data sources such as roadway sensors and traffic signals, mobile data sources such as cell phones, social media sources such as Twitter, and news and weather data services. Unfortunately, these data will create a bottleneck at data centers for processing and retrieval of collected data, and will require the deployment of additional message transfer infrastructure between data producers and consumers to support diverse CV applications. In this paper, we present a strategy for creating an efficient and low-latency distributed message delivery system for CV applications using a distributed message delivery platform. This strategy enables large-scale ingestion, curation, and transformation of unstructured data (roadway traffic-related and roadway non-traffic-related data) into labeled and customized topics for a large number of subscribers or consumers, such as CVs, mobile devices, and data centers. We evaluate the performance of this strategy by developing a prototype infrastructure using Apache Kafka, an open source message delivery system, and comparing its performance with the latency requirements of CV applications. We present experimental results of the message delivery infrastructure on two different distributed computing testbeds at Clemson University: the Holocron cluster and the Palmetto cluster. Experiments were performed to measure the latency of the message delivery system for a variety of testing scenarios. These experiments reveal that measured latencies are less than the U.S. Department of Transportation recommended latency requirements for CV applications, which demonstrates the efficacy of the system for CV-related data distribution and management tasks.

international conference on information technology | 2007

Using Shibboleth for Authorization and Authentication to the Subversion Version Control Repository System

Linh Bao Ngo; Amy W. Apon

A version control repository structure has been created based on open source utility software. The structure consists of a Subversion repository with an Apache Web interface that is protected by a Shibboleth authentication system. This structure can allow authorized and authenticated data sharing between institutions, yet retains simplicity and protects privacy for users. In addition, it spares local administrators from having to perform extra account management for new users from other institutions.

Proceedings of the 1st Workshop on The Science of Cyberinfrastructure | 2015

Dynamic Provisioning of Data Intensive Computing Middleware Frameworks: A Case Study

Linh Bao Ngo; Michael E. Payne; Flavio Villanustre; Richard Taylor; Amy W. Apon

Big data has become an important asset for industry, and academic disciplines now utilize large-scale data in their research. This fourth paradigm of scientific research has led to the inclusion of data management, processing, and analytic tools into the traditional high performance computing software libraries. This integration is facilitated through a collection of supporting software components that comprise a data intensive computing middleware framework. From a shared campus cyberinfrastructure perspective, this represents a new challenge to the system administrators in balancing between the traditional high performance computing software stacks and the new data-intensive middleware on the same physical computing resource. In turn, this limits researchers from having access to the new middleware tools while administrators determine how to overcome the challenge. In this paper, we present our experience in configuring dynamic provisioning of two different data-intensive middleware frameworks from a user perspective. We describe the configuration process from setting up dependencies to deploying the middleware, and how this experience can be applied by other researchers and administrators.
