Publication


Featured research published by Yongyao Jiang.


ISPRS International Journal of Geo-Information | 2016

Reconstructing Sessions from Data Discovery and Access Logs to Build a Semantic Knowledge Base for Improving Data Discovery

Yongyao Jiang; Yun Li; Chaowei Phil Yang; Edward M. Armstrong; Thomas Huang; David Moroni

Big geospatial data are archived and made available through online web discovery and access. However, finding the right data for scientific research and application development is still a challenge. This paper aims to improve data discovery by mining user knowledge from log files. Specifically, this paper focuses on user web session reconstruction as a critical step for extracting usage patterns. However, reconstructing user sessions from raw web logs has always been difficult, as a session identifier tends to be missing in most data portals. To address this problem, we propose two session identification methods: a time-clustering-based method and a time-referrer-based method. We also present the workflow of session reconstruction and discuss how to select appropriate thresholds for the relevant steps in the workflow. The proposed session identification methods and workflow are shown to extract data access patterns for further analyses of user behavior and for improving data discovery through more relevant data ranking, suggestion, and navigation.
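The time-based idea behind session identification can be illustrated with a minimal sketch. This is not the paper's time-clustering-based or time-referrer-based method; it is the common fixed-gap heuristic, shown only to make the notion of a session threshold concrete. The record layout `(user_ip, timestamp, url)`, the function name `reconstruct_sessions`, and the 30-minute default gap are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def reconstruct_sessions(records, gap=timedelta(minutes=30)):
    """Group (user_ip, timestamp, url) records into per-user sessions.

    Assumed heuristic: a new session starts whenever the time since the
    same user's previous request exceeds `gap`.
    """
    # Collect each user's requests (the IP address stands in for a user ID).
    by_user = defaultdict(list)
    for ip, ts, url in records:
        by_user[ip].append((ts, url))

    sessions = []
    for ip, hits in by_user.items():
        hits.sort(key=lambda h: h[0])  # order requests by timestamp
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > gap:
                # Gap too long: close the current session and start a new one.
                sessions.append((ip, current))
                current = []
            current.append(hit)
        sessions.append((ip, current))
    return sessions
```

Calling `reconstruct_sessions` on a list of parsed log records returns a list of `(user_ip, [(timestamp, url), ...])` pairs, one per session. Choosing `gap` is the threshold-selection problem the abstract refers to.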


International Journal of Geographical Information Science | 2017

A comprehensive methodology for discovering semantic relationships among geospatial vocabularies using oceanographic data discovery as an example

Yongyao Jiang; Yun Li; Chaowei Yang; Kai Liu; Edward M. Armstrong; Thomas Huang; David Moroni; Christopher J. Finch

It is challenging to find relevant data for research and development purposes in the geospatial big data era. One long-standing problem in data discovery is locating, assimilating and utilizing the semantic context for a given query. Most research in the geospatial domain has approached this problem in one of two ways: building a domain-specific ontology manually or automatically discovering semantic relationships using metadata and machine learning techniques. The former relies on rich expert knowledge but is static, costly and labor intensive, whereas the latter is automatic but prone to noise. An emerging trend in information science takes advantage of large-scale user search histories, which are dynamic but subject to user- and crawler-generated noise. Leveraging the benefits of these three approaches and avoiding their weaknesses, a novel methodology is proposed to (1) discover vocabulary-based semantic relationships from user search histories and clickstreams, (2) refine the similarity calculation methods from existing ontologies and (3) integrate the results of ontology, metadata, user search history and clickstream analysis to better determine their semantic relationships. An accuracy assessment by domain experts for the similarity values indicates an 83% overall accuracy for the top 10 related terms over randomly selected sample queries. This research functions as an example for building vocabulary-based semantic relationships for different geographical domains to improve various aspects of data discovery, including the accuracy of the vocabulary relationships of commonly used search terms.


IEEE International Conference on Cloud Computing Technology and Science | 2016

Chapter 10 – Polar CI Portal: A Cloud-Based Polar Resource Discovery Engine

Yongyao Jiang; Chaowei Phil Yang; Jizhe Xia; Kai Liu

The Polar Regions are going through rapid and dramatic changes. These changes have significant global impacts on both the environment and society. Nevertheless, the Polar Regions are still the largest observational data voids on the planet, and polar-related resources are usually distributed across different online systems. This chapter introduces the Polar Cyberinfrastructure (CI) Portal, a one-stop portal that makes it easy for users to discover, share, and access polar resources. The polar resource discovery engine integrates a set of capabilities: (1) semantic-based searching, (2) service quality evaluation, (3) polar-friendly and user-friendly visualization, and (4) scalability of cloud computing. We start by discussing the background and challenges of geospatial cyberinfrastructure for Polar Regions, and then explain the architecture and each building block of the Polar CI portal system. A detailed discussion of the status and functions of key components is included to demonstrate the advantages of the system and its integrated techniques.


International Journal of Digital Earth | 2018

Towards intelligent geospatial data discovery: a machine learning framework for search ranking

Yongyao Jiang; Yun Li; Chaowei Yang; Fei Hu; Edward M. Armstrong; Thomas Huang; David Moroni; Lewis J. McGibbney; Christopher J. Finch

Current search engines in most geospatial data portals tend to induce users to focus on a single dimension of data characteristics (e.g. popularity and release date). This approach largely fails to take account of users’ multidimensional preferences for geospatial data, and hence may result in a less than optimal user experience in discovering the most applicable dataset. This study reports a machine learning framework to address the ranking challenge, the fundamental obstacle in geospatial data discovery, by (1) identifying a number of ranking features of geospatial data to represent users’ multidimensional preferences by considering semantics, user behavior, spatial similarity, and static dataset metadata attributes; (2) applying a machine learning method to automatically learn a ranking function; and (3) proposing a system architecture to combine existing search-oriented open source software, a semantic knowledge base, ranking feature extraction, and a machine learning algorithm. Results show that the machine learning approach outperforms other methods in terms of both precision at K and normalized discounted cumulative gain. As an early attempt at utilizing machine learning to improve search ranking in the geospatial domain, we expect this work to set an example for further research and open the door towards intelligent geospatial data discovery.
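The two evaluation metrics named above, precision at K and normalized discounted cumulative gain (NDCG), can be sketched as follows. This is an illustrative implementation of the standard definitions, not code from the paper. It assumes graded relevance labels listed in ranked order, treats any label greater than zero as relevant, and uses the linear-gain form of DCG; the paper may use a different gain function. The function names are hypothetical.

```python
import math


def precision_at_k(relevances, k):
    """Fraction of the top-k ranked results that are relevant (label > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` is the list of relevance labels of the returned datasets in the order the ranker produced them. A ranking that places the most relevant datasets first scores an NDCG of 1, and lower values indicate that relevant items were ranked too far down.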


ISPRS International Journal of Geo-Information | 2017

A High Performance, Spatiotemporal Statistical Analysis System Based on a Spatiotemporal Cloud Platform

Baoxuan Jin; Weiwei Song; Kang Zhao; Xiaoyan Wei; Fei Hu; Yongyao Jiang

With the increase in size and complexity of spatiotemporal data, traditional methods for performing statistical analysis are insufficient for meeting real-time requirements for mining information from Big Data, due to both data- and computing-intensive factors. To solve the Big Data challenges in geostatistics and to support decision-making, a high performance, spatiotemporal statistical analysis system (Geostatistics-Hadoop) is proposed in this paper. The proposed system has several features: (1) Hadoop is enhanced to handle spatial data in a native format and execute a number of parallelized spatial analysis algorithms to solve practical geospatial analysis problems; (2) the Oozie-based workflow system is utilized to ease the operation and sharing of spatial analysis services; and (3) a private cloud platform based on Eucalyptus is leveraged to provide on-the-fly and elastic computing resources. Experimental results show that Geostatistics-Hadoop efficiently conducts rapid information mining and analysis of big spatiotemporal data sets, with the support of elastic computing resources from a cloud platform. The adoption of cloud computing and the Hadoop cluster to parallelize statistical calculations significantly improves the performance of Big Data analyses.


International Journal of Digital Earth | 2018

A hierarchical indexing strategy for optimizing Apache Spark with HDFS to efficiently query big geospatial raster data

Fei Hu; Chaowei Yang; Yongyao Jiang; Yun Li; Weiwei Song; Daniel Q. Duffy; John L. Schnase; Tsengdar Lee

Earth observations and model simulations are generating big multidimensional array-based raster data. However, it is difficult to efficiently query these big raster data due to the inconsistency among the geospatial raster data model, distributed physical data storage model, and the data pipeline in distributed computing frameworks. To efficiently process big geospatial data, this paper proposes a three-layer hierarchical indexing strategy to optimize Apache Spark with Hadoop Distributed File System (HDFS) from the following aspects: (1) improve I/O efficiency by adopting the chunking data structure; (2) keep the workload balance and high data locality by building the global index (k-d tree); (3) enable Spark and HDFS to natively support geospatial raster data formats (e.g., HDF4, NetCDF4, GeoTiff) by building the local index (hash table); (4) index the in-memory data to further improve geospatial data queries; (5) develop a data repartition strategy to tune the query parallelism while keeping high data locality. The above strategies are implemented by developing customized RDDs, and evaluated by comparing the performance with that of Spark SQL and SciSpark. The proposed indexing strategy can be applied to other distributed frameworks or cloud-based computing systems to natively support big geospatial data queries with high efficiency.


ISPRS International Journal of Geo-Information | 2018

A Smart Web-Based Geospatial Data Discovery System with Oceanographic Data as an Example

Yongyao Jiang; Yun Li; Chaowei Yang; Fei Hu; Edward M. Armstrong; Thomas Huang; David Moroni; Lewis J. McGibbney; Frank R. Greguska; Christopher J. Finch

Discovering and accessing geospatial data presents a significant challenge for the Earth sciences community as massive amounts of data are being produced on a daily basis. In this article, we report a smart web-based geospatial data discovery system that mines and utilizes data relevancy from metadata and user behavior. Specifically, (1) the system enables semantic query expansion and suggestion to assist users in finding more relevant data; (2) machine-learned ranking is utilized to provide the optimal search ranking based on a number of identified ranking features that can reflect users’ search preferences; (3) a hybrid recommendation module is designed to allow users to discover related data considering metadata attributes and user behavior; (4) an integrated graphic user interface design is developed to quickly and intuitively guide data consumers to the appropriate data resources. As a proof of concept, we focus on a well-defined domain, oceanography, and use oceanographic data discovery as an example. Experiments and a search example show that the proposed system can improve the scientific community’s data search experience by providing query expansion, suggestion, better search ranking, and data recommendation via a user-friendly interface.


Cartography and Geographic Information Science | 2018

A graph-based approach to detecting tourist movement patterns using social media data

Fei Hu; Zhenlong Li; Chaowei Yang; Yongyao Jiang

Understanding the characteristics of tourist movement is essential for tourist behavior studies since the characteristics underpin how the tourist industry management selects strategies, from attraction planning to commercial product development. However, conventional tourism research methods are neither scalable nor cost-efficient for discovering underlying movement patterns due to the massive datasets. With advances in information and communication technology, social media platforms provide big data sets generated by millions of people from different countries, all of which can be harvested cost efficiently. This paper introduces a graph-based method to detect tourist movement patterns from Twitter data. First, collected tweets with geo-tags are cleaned to filter those not published by tourists. Second, a DBSCAN-based clustering method is adapted to construct tourist graphs consisting of the tourist attraction vertices and edges. Third, network analytical methods (e.g. betweenness centrality, Markov clustering algorithm) are applied to detect tourist movement patterns, including popular attractions, centric attractions, and popular tour routes. New York City in the United States is selected to demonstrate the utility of the proposed methodology. The detected tourist movement patterns assist business and government activities whose mission is tour product planning, transportation, and development of both shopping and accommodation centers.


IEEE Aerospace Conference | 2017

An architecture for mitigating near earth object's impact to the earth

Chaowei Phil Yang; Manzhu Yu; Mengchao Xu; Yongyao Jiang; Han Qin; Yun Li; Myra Bambacus; Ronald Y. Leung; Brent W. Barbee; Joseph A. Nuth; Bernard D. Seery; Nicolas Bertini; David S. P. Dearborn; Mike Piccione; Rob Culbertson; Catherine S. Plesko

Near-Earth Objects (NEOs), like species extinction events, present a great threat to our home planet and humankind. The motivation for designing this architectural framework is the current lack of a structured architecture for the process of detecting, characterizing and mitigating these NEO threats. Due to the recent establishment of NASA's Planetary Defense Coordination Office (PDCO), it is critical to link the individual facilities conducting separate research with the objective of forming a clearly defined collaborative system based on data reporting and sharing. The architectural framework is designed for integrating the process of detecting, characterizing and mitigating NEO threats. The goal of designing the architecture is to organize current data and resources into useful information and correlate that information with the goals of the NEO mitigation study. The architectural framework will enable scientists, organizations, and decision makers to locate, identify and resolve semantic confusion, properties, facts, constraints and issues with potentially hazardous asteroids. Our major focus is to design the data and information flow that models the complete process from NEO detection to designing the mitigation strategies. A secondary focus is to develop a system-of-systems architecture to describe the supporting infrastructure for the framework. The framework is also built with the opportunity to leverage future assets from the broader Planetary Defense (PD) community, and identify/speed up relevant PD research and response.


OCEANS Conference | 2016

Leveraging cloud computing to speedup user access log mining

Yun Li; Yongyao Jiang; Fei Hu; Chaowei Yang; Edward M. Armstrong; Thomas Huang; David Moroni; Chris Finch

It is very challenging for scientists to find the right oceanographic data quickly. A novel approach was proposed to analyze user access logs to explore the implicit relationships between oceanographic datasets. This paper reports a cloud-based data analytics framework to speed up the process and deal with problems such as the following: (1) user access logs keep growing as users keep interacting with data center websites; (2) the data analysis process involves several computing-intensive steps such as session reconstruction and latent semantic analysis (LSA); (3) the dynamic data volume requires on-demand computing resources to deliver time-sensitive computing services. To meet the requirement for dynamic computing resources, cloud computing is leveraged to facilitate setting up a cluster and to speed up the log mining process. In addition, Spark-based log partition strategies are integrated into our cloud-based framework to conduct log processing tasks in parallel. This experimental system is deployed on the NASA AIST cloud platform.

Collaboration


Dive into Yongyao Jiang's collaborations.

Top Co-Authors

Yun Li | George Mason University

Fei Hu | George Mason University

Chaowei Yang | George Mason University

David Moroni | California Institute of Technology

Thomas Huang | California Institute of Technology

Edward M. Armstrong | California Institute of Technology

Manzhu Yu | George Mason University

Christopher J. Finch | California Institute of Technology

Bernard D. Seery | Goddard Space Flight Center