Thomas Huang

California Institute of Technology

Publication

Featured research published by Thomas Huang.

ISPRS International Journal of Geo-Information | 2016

Reconstructing Sessions from Data Discovery and Access Logs to Build a Semantic Knowledge Base for Improving Data Discovery

Yongyao Jiang; Yun Li; Chaowei Phil Yang; Edward M. Armstrong; Thomas Huang; David Moroni

Big geospatial data are archived and made available through online web discovery and access. However, finding the right data for scientific research and application development is still a challenge. This paper aims to improve data discovery by mining user knowledge from log files. Specifically, it focuses on user web session reconstruction as a critical step for extracting usage patterns. Reconstructing user sessions from raw web logs has always been difficult, as a session identifier tends to be missing in most data portals. To address this problem, we propose two session identification methods: a time-clustering-based method and a time-referrer-based method. We also present the workflow of session reconstruction and discuss how to select appropriate thresholds for the relevant steps in the workflow. The proposed session identification methods and workflow are shown to extract data access patterns that support further analysis of user behavior and improve data discovery through more relevant data ranking, suggestion, and navigation.
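As a rough illustration of why thresholds matter when no session identifier is logged, the sketch below splits each client's requests into sessions wherever the gap between consecutive requests exceeds a fixed timeout. This is a generic time-gap heuristic, not the paper's time-clustering-based or time-referrer-based method; the record layout `(client_id, timestamp, url)`, the function name `reconstruct_sessions`, the sample data, and the 30-minute default are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def reconstruct_sessions(records, gap=timedelta(minutes=30)):
    """Group (client_id, timestamp, url) records into per-client sessions.

    A new session starts whenever the time since the client's previous
    request exceeds `gap`. Hypothetical helper for illustration only.
    """
    by_client = defaultdict(list)
    for client_id, ts, url in records:
        by_client[client_id].append((ts, url))

    sessions = []
    for client_id, hits in by_client.items():
        hits.sort(key=lambda h: h[0])  # order each client's requests by time
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > gap:  # gap too long: close the session
                sessions.append((client_id, current))
                current = []
            current.append(hit)
        sessions.append((client_id, current))
    return sessions


# Made-up sample log entries (not taken from any real portal).
t0 = datetime(2016, 1, 1, 10, 0)
sample = [
    ("10.0.0.1", t0, "/search?q=sst"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/dataset/1"),
    ("10.0.0.2", t0 + timedelta(minutes=1), "/search?q=wind"),
    ("10.0.0.1", t0 + timedelta(minutes=60), "/search?q=salinity"),
]
sessions = reconstruct_sessions(sample)
```

With the sample data, the first client's requests fall into two sessions because the third request arrives 55 minutes after the second. Shortening or lengthening `gap` changes where sessions are cut, which is the threshold-selection problem the abstract refers to.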

International Journal of Geographical Information Science | 2017

A comprehensive methodology for discovering semantic relationships among geospatial vocabularies using oceanographic data discovery as an example

Yongyao Jiang; Yun Li; Chaowei Yang; Kai Liu; Edward M. Armstrong; Thomas Huang; David Moroni; Christopher J. Finch

It is challenging to find relevant data for research and development purposes in the geospatial big data era. One long-standing problem in data discovery is locating, assimilating and utilizing the semantic context for a given query. Most research in the geospatial domain has approached this problem in one of two ways: building a domain-specific ontology manually or automatically discovering semantic relationships using metadata and machine learning techniques. The former relies on rich expert knowledge but is static, costly and labor intensive, whereas the latter is automatic but prone to noise. An emerging trend in information science takes advantage of large-scale user search histories, which are dynamic but subject to user- and crawler-generated noise. Leveraging the benefits of these three approaches and avoiding their weaknesses, a novel methodology is proposed to (1) discover vocabulary-based semantic relationships from user search histories and clickstreams, (2) refine the similarity calculation methods from existing ontologies and (3) integrate the results of ontology, metadata, user search history and clickstream analysis to better determine their semantic relationships. An accuracy assessment of the similarity values by domain experts indicates an 83% overall accuracy for the top 10 related terms over randomly selected sample queries. This research serves as an example for building vocabulary-based semantic relationships for different geographical domains to improve various aspects of data discovery, including the accuracy of the vocabulary relationships of commonly used search terms.

International Journal of Digital Earth | 2018

Towards intelligent geospatial data discovery: a machine learning framework for search ranking

Yongyao Jiang; Yun Li; Chaowei Yang; Fei Hu; Edward M. Armstrong; Thomas Huang; David Moroni; Lewis J. McGibbney; Christopher J. Finch

Current search engines in most geospatial data portals tend to induce users to focus on a single data characteristic (e.g. popularity or release date). This approach largely fails to take account of users’ multidimensional preferences for geospatial data, and hence is likely to result in a less than optimal user experience in discovering the most applicable dataset. This study reports a machine learning framework to address the ranking challenge, the fundamental obstacle in geospatial data discovery, by (1) identifying a number of ranking features of geospatial data to represent users’ multidimensional preferences by considering semantics, user behavior, spatial similarity, and static dataset metadata attributes; (2) applying a machine learning method to automatically learn a ranking function; and (3) proposing a system architecture to combine existing search-oriented open source software, a semantic knowledge base, ranking feature extraction, and a machine learning algorithm. Results show that the machine learning approach outperforms other methods in terms of both precision at K and normalized discounted cumulative gain. As an early attempt at using machine learning to improve search ranking in the geospatial domain, we expect this work to set an example for further research and open the door towards intelligent geospatial data discovery.
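The two evaluation metrics named above can be sketched as follows. This is a minimal illustration using the common linear-gain form of discounted cumulative gain; the paper may use a different variant (for example an exponential gain), and the function names and the `threshold` parameter are assumptions made for this example. Each input is a list of graded relevance labels in the order the ranker returned the results.

```python
import math


def precision_at_k(relevances, k, threshold=1):
    """Fraction of the top-k results whose relevance label is >= threshold."""
    top = relevances[:k]
    return sum(1 for r in top if r >= threshold) / k


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Precision at K only counts how many of the top K results are relevant, while NDCG also rewards placing the most relevant results nearest the top, so the two together give a fuller picture of ranking quality.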

ISPRS International Journal of Geo-Information | 2018

A Smart Web-Based Geospatial Data Discovery System with Oceanographic Data as an Example

Yongyao Jiang; Yun Li; Chaowei Yang; Fei Hu; Edward M. Armstrong; Thomas Huang; David Moroni; Lewis J. McGibbney; Frank R. Greguska; Christopher J. Finch

Discovering and accessing geospatial data presents a significant challenge for the Earth sciences community as massive amounts of data are being produced on a daily basis. In this article, we report a smart web-based geospatial data discovery system that mines and utilizes data relevancy from metadata and user behavior. Specifically, (1) the system enables semantic query expansion and suggestion to assist users in finding more relevant data; (2) machine-learned ranking is utilized to provide the optimal search ranking based on a number of identified ranking features that can reflect users’ search preferences; (3) a hybrid recommendation module is designed to allow users to discover related data considering metadata attributes and user behavior; (4) an integrated graphical user interface is developed to quickly and intuitively guide data consumers to the appropriate data resources. As a proof of concept, we focus on a well-defined domain, oceanography, and use oceanographic data discovery as an example. Experiments and a search example show that the proposed system can improve the scientific community’s data search experience by providing query expansion, suggestion, better search ranking, and data recommendation via a user-friendly interface.

OCEANS Conference | 2016

Leveraging cloud computing to speedup user access log mining

Yun Li; Yongyao Jiang; Fei Hu; Chaowei Yang; Edward M. Armstrong; Thomas Huang; David Moroni; Chris Finch

It is very challenging for scientists to find the right oceanographic data quickly. A novel approach was proposed to analyze user access logs to explore the implicit relationships between oceanographic datasets. This paper reports a cloud-based data analytics framework to speed up this process and to deal with problems such as the following: (1) user access logs keep growing as users keep interacting with data center websites; (2) the data analysis process involves several computing-intensive steps such as session reconstruction and latent semantic analysis (LSA); (3) the dynamic data volume requires on-demand computing resources to deliver time-sensitive computing services. To meet the requirement for dynamic computing resources, cloud computing is leveraged to facilitate setting up clusters and speed up the log mining process. In addition, Spark-based log partition strategies are integrated into our cloud-based framework to conduct log processing tasks in parallel. This experimental system is deployed on the NASA AIST cloud platform.

IEEE International Conference on Cloud Computing Technology and Science | 2011

Building climatological services on the cloud

Thomas Huang; Michael E. Gangl; Andrew W. Bingham

The NASA Physical Oceanographic Distributed Active Archive Center (PO.DAAC) at Jet Propulsion Laboratory is funded by the NASA Earth Science Data and Information System (ESDIS) project to conduct a study of cloud services for data management, data access and data processing. The study aims to improve our understanding of cloud technologies and articulate their cost/benefit for the NASA Distributed Active Archive Centers (DAACs) and Science Investigator-led Production Systems (SIPs). This demonstration focuses on our experience in developing climatology services using Apache Hadoop to store and analyze temporal and spatial characteristics of scatterometer data over Antarctica.

Space OPS 2004 Conference | 2004

Data processing pipeline with transaction-oriented data sharing

Thomas Huang; Larry Preheim

Space science data processing involves executing a sequence of computing-intensive modules in an effort to produce high-quality science data products to further our understanding of our universe. While the fundamentals have not changed, the process has certainly evolved to take advantage of modern distributed computing and high-speed networking technologies. This evolution has enabled us to process science data at an extremely high rate and share processed results with our science communities in near real time. We often refer to the notion of data sharing as the ability to publish data on a well-known location where it is visible to others. However, higher quality-of-service (QoS) is expected when it comes to valuable science data. This paper makes three contributions to the area of modern science data processing systems. First, the paper describes the science data processing pipeline, developed at the Multi-mission Image Processing Lab (MIPL) of JPL, for transforming raw space data into high-quality image data and automating the distribution of data using a high-performance file transaction service. File Exchange Interface (FEI) is the file transaction service developed at MIPL. Second, it presents the FEI component architecture [4] in the area of file transaction management, security, and file integrity verification. Finally, the paper presents the federated model for the FEI service to demonstrate how to create a pool of file transaction services to support load balancing and service failover, and simplify service management.

International Conference on Big Data | 2017