Mai H. Nguyen

University of California, San Diego

Publication

Featured research published by Mai H. Nguyen.

2014 IEEE/ACM International Symposium on Big Data Computing | 2014

A Scalable Data Science Workflow Approach for Big Data Bayesian Network Learning

Jianwu Wang; Yan Tang; Mai H. Nguyen; Ilkay Altintas

In the Big Data era, machine learning has more potential to discover valuable insights from the data. As an important machine learning technique, Bayesian Network (BN) has been widely used to model probabilistic relationships among variables. To deal with the challenges of Big Data BN learning, we apply techniques from distributed data-parallelism (DDP) and scientific workflows to the BN learning process. We first propose an intelligent Big Data pre-processing approach and a data quality score to measure and ensure data quality and data faithfulness. Then, a new weight-based ensemble algorithm is proposed to learn a BN structure from an ensemble of local results. To easily integrate the algorithm with DDP engines, such as Hadoop, we employ the Kepler scientific workflow system to build the whole learning process. We demonstrate how Kepler can facilitate building and running our Big Data BN learning application. Our experiments show good scalability and learning accuracy when running the application in real distributed environments.

international conference on conceptual structures | 2015

Towards an Integrated Cyberinfrastructure for Scalable Data-driven Monitoring, Dynamic Prediction and Resilience of Wildfires

Ilkay Altintas; Jessica Block; Raymond A. de Callafon; Daniel Crawl; Charles Cowart; Amarnath Gupta; Mai H. Nguyen; Hans-Werner Braun; Jürgen P. Schulze; Michael J. Gollner; Arnaud Trouvé; Larry Smarr

Wildfires are critical for ecosystems in many geographical regions. However, our current urbanized existence in these environments is inducing the ecological balance to evolve into a different dynamic, leading to the biggest fires in history. Wildfire wind speeds and directions change in an instant, and first responders can only be effective if they take action as quickly as the conditions change. What is lacking in disaster management today is a system integration of real-time sensor networks, satellite imagery, near-real-time data management tools, wildfire simulation tools, and connectivity to emergency command centers before, during and after a wildfire. As a first example of such an integrated system, the WIFIRE project is building an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. This paper summarizes the approach and early results of the WIFIRE project to integrate networked observations, e.g., heterogeneous satellite data and real-time remote sensor data, with computational techniques in signal processing, visualization, modeling and data assimilation. The goal is a scalable, technological, and educational solution for monitoring weather patterns and predicting a wildfire's rate of spread.

international conference on big data | 2015

Big data provenance: Challenges, state of the art and opportunities

Jianwu Wang; Daniel Crawl; Shweta Purawat; Mai H. Nguyen; Ilkay Altintas

The ability to track provenance is a key feature of scientific workflows to support data lineage and reproducibility. The challenges introduced by the volume, variety and velocity of Big Data also pose related challenges for the provenance and quality of Big Data, defined as veracity. The increasing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle, including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related to the veracity of the datasets themselves and the provenance of the analytical processes that analyze these datasets. It also explains our current efforts towards tracking and utilizing Big Data provenance using workflows as a programming model to analyze Big Data.

international conference on big data | 2016

Determining feature extractors for unsupervised learning on satellite images

Behnam Hedayatnia; Mehrdad Yazdani; Mai H. Nguyen; Jessica Block; Ilkay Altintas

Advances in satellite imagery present unprecedented opportunities for understanding natural and social phenomena at global and regional scales. Although the field of satellite remote sensing has addressed questions imperative to human and environmental sustainability, scaling those techniques to very high spatial resolutions at regional scales remains a challenge. Satellite imagery is now more accessible, with greater spatial, spectral and temporal resolution, creating a data bottleneck in identifying the content of images. Because satellite images are unlabeled, unsupervised methods allow us to organize images into coherent groups or clusters. However, the performance of unsupervised methods, like all other machine learning methods, depends on features. Recent studies using features from pre-trained networks have shown promise for learning in new datasets. This suggests that features from pre-trained networks can be used for learning in temporally and spatially dynamic data sources such as satellite imagery. It is not clear, however, which features from which layer and network architecture should be used for learning new tasks. In this paper, we present an approach to evaluate the transferability of features from pre-trained Deep Convolutional Neural Networks for satellite imagery. We explore and evaluate different features and feature combinations extracted from various deep network architectures, and systematically evaluate over 2,000 network-layer combinations. In addition, we test the transferability of our engineered features and learned features from an unlabeled dataset to a different labeled dataset. Our feature engineering and learning are done on the unlabeled Draper Satellite Chronology dataset, and we test on the labeled UC Merced Land dataset to achieve near state-of-the-art classification results. These results suggest that even with minimal or no training, these networks can generalize well to other datasets. This method could be useful in the task of clustering unlabeled images and other unsupervised machine learning tasks.

international conference on e-science | 2017

An Unsupervised Deep Learning Approach for Satellite Image Analysis with Applications in Demographic Analysis

Jessica Block; Mehrdad Yazdani; Mai H. Nguyen; Daniel Crawl; Marta Jankowska; John Graham; Thomas A. DeFanti; Ilkay Altintas

High-resolution satellite imagery is a growing source of data with potential applications in many diverse domains. Efficient large-scale analysis of this rich data can lead to unprecedented discoveries with societal impact. We present a new framework for organizing collections of satellite images into demographically relevant categories using unsupervised learning techniques. Our framework first extracts features using pre-trained Convolutional Neural Networks from tiles of high-resolution satellite images of a city. The k-means algorithm is then applied to these features to organize images into visually similar groups. The resulting clustered images are validated using demographic data. The cluster model is then applied to six different cities around the world to test the transferability of our methods. Finally, the discovered image clusters are visualized in our customized web interface to enable demographers, social scientists, and economists to understand the organization of a city.
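
The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it is a plain-Python version of Lloyd's k-means algorithm, and the short `tile_features` vectors are invented stand-ins for the much longer feature vectors a pre-trained network would produce for each image tile.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster equal-length feature vectors with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical 3-dimensional "tile features", invented for illustration.
tile_features = [
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.0], [0.0, 0.1, 0.2],
    [5.0, 5.1, 4.9], [5.2, 4.8, 5.0], [4.9, 5.0, 5.1],
]
labels, centroids = kmeans(tile_features, k=2)
```

Tiles that share a label would then be treated as one visually similar group. A production pipeline would more likely rely on an optimized library implementation of k-means than on this loop.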

international conference on conceptual structures | 2016

Integrated Machine Learning in the Kepler Scientific Workflow System

Mai H. Nguyen; Daniel Crawl; Tahereh Masoumi; Ilkay Altintas

We present a method to integrate multiple implementations of a machine learning algorithm in Kepler actors. This feature enables the user to compare accuracy and scalability of various implementations of a machine learning technique without having to change the workflow. These actors are based on the Execution Choice actor. They can be incorporated into any workflow to provide machine learning functionality. We describe a use case where actors that provide several implementations of k-means clustering can be used in a workflow to process sensor data from weather stations for predicting wildfire risks.
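
The idea of swapping implementations without changing the surrounding workflow can be sketched in a few lines. This is not Kepler code and does not use the Execution Choice actor's API. The registry, the function names, and the `choice` parameter are all hypothetical, and the two functions compute a cluster centroid (one building block of k-means) rather than a full clustering.

```python
import statistics


def centroid_loop(points):
    """Compute the centroid with explicit loops."""
    dims = len(points[0])
    totals = [0.0] * dims
    for p in points:
        for i, value in enumerate(p):
            totals[i] += value
    return [t / len(points) for t in totals]


def centroid_stats(points):
    """Compute the same centroid with the statistics module."""
    return [statistics.fmean(dim) for dim in zip(*points)]


# Hypothetical registry standing in for the alternatives a
# choice-style workflow step would expose; names are invented.
IMPLEMENTATIONS = {"loop": centroid_loop, "stats": centroid_stats}


def run_step(points, choice="loop"):
    """Run the chosen implementation; the calling workflow is unchanged."""
    return IMPLEMENTATIONS[choice](points)


readings = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
result_loop = run_step(readings, "loop")
result_stats = run_step(readings, "stats")
```

Because the caller only passes a different `choice` value, results and timings from each implementation can be compared side by side.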

international conference on big data | 2016

A scalable approach for location-specific detection of Santa Ana conditions

Mai H. Nguyen; Dylan Uys; Daniel Crawl; Charles Cowart; Ilkay Altintas

Santa Ana conditions are hot, dry, windy weather conditions that can greatly increase the dangers of wildfires in southern California. We present a machine learning approach to detect Santa Ana conditions based on sensor measurements from weather stations. Cluster analysis is performed on historical weather data to build models to identify Santa Ana patterns. A separate model is built using data from each weather station to capture the patterns specific to the microclimate of each region. Real-time sensor data from a weather station can then be processed to determine if the region surrounding that station is experiencing Santa Ana conditions. Results can be used as a warning system to focus firefighting efforts on regions with increased wildfire risks. Through the use of the Kepler workflow system and distributed computing with Spark, data from several weather stations can be processed in parallel using a scalable clustering algorithm, allowing our approach to scale to large datasets from multiple weather stations.
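
The per-station detection step might look like the following sketch. It assumes, beyond what the abstract states, that each station's model is a small set of cluster centroids and that one cluster per station has already been flagged as the Santa Ana pattern. The station names, centroid values, and flags are invented for illustration.

```python
import math

# Hypothetical per-station models: cluster centroids over
# (temperature in deg C, relative humidity in %, wind speed in m/s).
# The values and the "santa_ana" flags are invented, not measured.
STATION_MODELS = {
    "station_a": [
        {"centroid": (18.0, 65.0, 2.0), "santa_ana": False},
        {"centroid": (30.0, 10.0, 12.0), "santa_ana": True},
    ],
    "station_b": [
        {"centroid": (12.0, 70.0, 3.0), "santa_ana": False},
        {"centroid": (24.0, 15.0, 15.0), "santa_ana": True},
    ],
}


def is_santa_ana(station_id, reading):
    """Return True if the reading's nearest cluster is flagged as Santa Ana."""
    clusters = STATION_MODELS[station_id]
    nearest = min(clusters, key=lambda c: math.dist(reading, c["centroid"]))
    return nearest["santa_ana"]


hot_dry_windy = (29.0, 12.0, 11.0)
mild_humid_calm = (17.0, 60.0, 1.5)
alert_a = is_santa_ana("station_a", hot_dry_windy)
calm_a = is_santa_ana("station_a", mild_humid_calm)
```

Keeping a separate entry per station mirrors the abstract's point that each model captures the microclimate around one station. A real system would also scale the measurements before computing distances, since humidity percentages would otherwise dominate.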

arXiv: Distributed, Parallel, and Cluster Computing | 2017

Modular Resource Centric Learning for Workflow Performance Prediction.

Alok Singh; Mai H. Nguyen; Shweta Purawat; Daniel Crawl; Ilkay Altintas

arXiv: Other Computer Science | 2018

Ten Simple Rules for Reproducible Research in Jupyter Notebooks

Adam Rule; Amanda Birmingham; Cristal Zuniga; Ilkay Altintas; Shih-Cheng Huang; Rob Knight; Niema Moshiri; Mai H. Nguyen; Sara Brin Rosenthal; Fernando Pérez; Peter W. Rose

arXiv: Computer Vision and Pattern Recognition | 2018