Publication


Featured research published by Jin Li.


Ecological Informatics | 2011

A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors

Jin Li; Andrew D. Heap

Spatial interpolation methods have been applied to many disciplines. Many factors affect the performance of the methods, but there are no consistent findings about their effects. In this study, we use comparative studies in environmental sciences to assess the performance and to quantify the impacts of data properties on the performance. Two new measures are proposed to compare the performance of the methods applied to variables with different units/scales. A total of 53 comparative studies were assessed and the performance of 72 methods/sub-methods compared is analysed. The impacts of sample density, data variation and sampling design on the estimations of 32 methods are quantified using data derived from their application to 80 variables. Inverse distance weighting (IDW), ordinary kriging (OK), and ordinary co-kriging (OCK) are the most frequently used methods. Data variation is a dominant impact factor and has significant effects on the performance of the methods. As the variation increases, the accuracy of all methods decreases and the magnitude of decrease is method dependent. Irregular-spaced sampling design might improve the accuracy of estimation. The effect of sampling density on the performance of the methods is found not to be significant. The implications of these findings are discussed.
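For readers unfamiliar with inverse distance weighting (IDW), the following is a minimal sketch of the idea, not the implementation used in any of the reviewed studies. The function name `idw_predict`, the `power` parameter default and the two sample points are illustrative assumptions.

```python
import math


def idw_predict(samples, target, power=2.0):
    """Predict a value at `target` as a distance-weighted mean of samples.

    samples: list of ((x, y), value) pairs (hypothetical layout).
    target:  (x, y) location to predict at.
    power:   distance exponent; 2.0 gives inverse distance squared.
    """
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # IDW is an exact interpolator: return the observed value
            # when the target coincides with a sample location.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two made-up samples; the midpoint is equally far from both.
samples = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint = idw_predict(samples, (1.0, 0.0))
```

Because the midpoint is equidistant from both samples, the weights are equal and the prediction is their plain average.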


Environmental Modelling and Software | 2014

Spatial interpolation methods applied in the environmental sciences: A review

Jin Li; Andrew D. Heap

Spatially continuous data of environmental variables are often required for environmental sciences and management. However, information for environmental variables is usually collected by point sampling, particularly for the mountainous region and deep ocean area. Thus, methods generating such spatially continuous data by using point samples become essential tools. Spatial interpolation methods (SIMs) are, however, often data-specific or even variable-specific. Many factors affect the predictive performance of the methods and previous studies have shown that their effects are not consistent. Hence it is difficult to select an appropriate method for a given dataset. This review aims to provide guidelines and suggestions regarding application of SIMs to environmental data by comparing the features of the commonly applied methods which fall into three categories, namely: non-geostatistical interpolation methods, geostatistical interpolation methods and combined methods. Factors affecting the performance, including sampling design, sample spatial distribution, data quality, correlation between primary and secondary variables, and interaction among factors, are discussed. A total of 25 commonly applied methods are then classified based on their features to provide an overview of the relationships among them. These features are quantified and then clustered to show similarities among these 25 methods. An easy to use decision tree for selecting an appropriate method from these 25 methods is developed based on data availability, data nature, expected estimation, and features of the method. Finally, a list of software packages for spatial interpolation is provided.


Environmental Modelling and Software | 2011

Application of machine learning methods to spatial interpolation of environmental variables

Jin Li; Andrew D. Heap; Anna Potter; James J. Daniell

Machine learning methods, like random forest (RF), have shown their superior performance in various disciplines, but have not been previously applied to the spatial interpolation of environmental variables. In this study, we compared the performance of 23 methods, including RF, support vector machine (SVM), ordinary kriging (OK), inverse distance squared (IDS), and their combinations (i.e., RFOK, RFIDS, SVMOK and SVMIDS), using mud content samples in the southwest Australian margin. We also tested the sensitivity of the combined methods to input variables and the accuracy of averaging predictions of the most accurate methods. The accuracy of the methods was assessed using a 10-fold cross-validation. The spatial patterns of the predictions of the most accurate methods were also visually examined for their validity. This study confirmed the effectiveness of RF, in particular its combination with OK or IDS, and also confirmed the sensitivity of RF and its combined methods to the input variables. Averaging the predictions of the most accurate methods showed no significant improvement in the predictive accuracy. Visual examination proved to be an essential step in assessing the spatial predictions. This study has opened an alternative source of methods for spatial interpolation of environmental properties.
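The 10-fold cross-validation mentioned in this abstract can be sketched as follows for the simplest of the compared methods, inverse distance squared (IDS). This is an illustrative sketch only: the function names, the random fold assignment and the synthetic grid data are assumptions, and the study's own procedure and data are not reproduced here.

```python
import math
import random


def ids_predict(train, target):
    """Inverse distance squared prediction from (x, y, value) tuples."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in train:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0.0:
            return value
        numerator += value / d2
        denominator += 1.0 / d2
    return numerator / denominator


def k_fold_rmse(data, k=10, seed=0):
    """Root mean square error of IDS under k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    # Each shuffled index lands in exactly one of the k folds.
    folds = [indices[i::k] for i in range(k)]
    squared_error = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in indices if i not in held_out]
        for i in fold:
            x, y, value = data[i]
            squared_error += (ids_predict(train, (x, y)) - value) ** 2
    return math.sqrt(squared_error / len(data))


# Synthetic 6 x 6 grid with a smooth made-up surface (values 0 to 15).
data = [(float(i), float(j), 2.0 * i + j) for i in range(6) for j in range(6)]
rmse = k_fold_rmse(data, k=10)
```

Each sample is predicted only from samples in the other folds, so the resulting error reflects predictions at locations the method has not seen.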


Ecological Informatics | 2011

Performance of predictive models in marine benthic environments based on predictions of sponge distribution on the Australian continental shelf

Zhi Huang; Brendan P. Brooke; Jin Li

This study tested the performance of 15 predictive models in predicting the distribution of sponge assemblages on the Australian continental shelf using a common set of marine environmental variables. The models included traditional regression and more recently developed machine learning models. The results demonstrate that the spatial distribution of sponge assemblages can be successfully predicted, although the effectiveness of predictions varied among models. Overall, machine learning models achieved the best prediction performance. The direct variable of bottom-water temperature and the resource variables that describe bottom-water nutrient status were found to be useful surrogates for the distribution of sponge assemblages at the broad regional scale. A new method of deriving pseudo-absence data (weighted pseudo-absence) was compared with random pseudo-absence data; the new data improved modelling performance for all the models, both in terms of statistics (~10%) and in the predicted spatial distributions. Results from this study will further refine modelling methods used to predict the spatial distribution of marine biota at broad spatial scales, an outcome especially relevant to managers of marine resources.


Environmental Modelling and Software | 2013

Spatial interpolation of McArthur's Forest Fire Danger Index across Australia: Observational study

L.A. Sanabria; X. Qin; Jin Li; R.P. Cechet; C. Lucas

Fire danger indices are used by fire management agencies to assess fire weather conditions and issue public warnings. The most widely used fire danger indices in Australia are the McArthur Forest Fire Danger Index and the Grassland Fire Danger Index. These indices are calculated at weather stations using measurements of weather variables and fuel information. For a vast country like Australia, when assessing the risk of severe fire weather events, it is also important to calculate the spatial distribution of these indices considering the extreme tail of the distribution. The spatial distribution of one of the fire weather danger indices regularly used in Australia is presented in this paper. In particular, we present the spatial distribution of the long-term tendency of extreme values of the McArthur Forest Fire Danger Index (FFDI). This indicator of fire weather conditions was assessed by calculating the return period of its extreme values by fitting extreme value distributions to data sets of FFDI at 78 recording stations around Australia. The spatial distribution of these return periods was obtained by applying spatial interpolation algorithms to the recording station measurements. Two conventional and two new algorithms based on machine-learning techniques were tested. This study shows that the best interpolation results for the FFDI can be obtained by using a combination of random forest and inverse distance weighting interpolation algorithms. The spatial distribution of the seasonal FFDI return period shows that the highest FFDI over large parts of southern Australia occurs during the summer months whilst in northern Australia it occurs in spring. The results also show that the FFDI in eastern Australia, the most populated region of the country, is higher inland than in the coastal areas, particularly during spring and summer.


PLOS ONE | 2016

Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness.

Jin Li; Maggie Tran; Justy Siwabessy

Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models.


Environmental Modelling and Software | 2016

Assessing spatial predictive models in the environmental sciences

Jin Li

A comprehensive assessment of the performance of predictive models is necessary as they have been increasingly employed to generate spatial predictions for environmental management and conservation and their accuracy is crucial to evidence-informed decision making and policy. In this study, we clarified relevant issues associated with variance explained (VEcv) by predictive models, established the relationships between VEcv and commonly used accuracy measures and unified these measures under VEcv that is independent of unit/scale and data variation. We quantified the relationships between these measures and data variation and found about 65% compared models and over 45% recommended models for generating spatial predictions explained no more than 50% data variance. We classified the predictive models based on VEcv, which provides a tool to directly compare the accuracy of predictive models for data with different unit/scale and variation and establishes a cross-disciplinary context and benchmark for assessing predictive models in future studies.

Highlights: Established the relationships of VEcv with commonly used accuracy measures. Quantified the relationships of these measures with data variation. Objectively assessed predictive models based on VEcv in the environmental sciences. Provided a tool to assess predictive models for data of various unit and variation. Established a cross-disciplinary context/benchmark for assessing predictive models.
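The abstract does not state how VEcv is calculated. As a rough sketch, assuming it is the percentage of variance in the validation observations that is explained by cross-validated predictions, i.e. (1 - SSE / SST) x 100, it could be computed as below. The function name `vecv` and the example numbers are illustrative assumptions.

```python
def vecv(observed, predicted):
    """Variance explained by cross-validated predictions, in percent.

    Assumed definition: (1 - SSE / SST) * 100, where SSE is the sum of
    squared prediction errors and SST is the sum of squared deviations
    of the observations from their mean.
    """
    mean_observed = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_observed) ** 2 for o in observed)
    return (1.0 - sse / sst) * 100.0


# Made-up validation observations.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = vecv(observed, observed)               # predictions equal observations
mean_only = vecv(observed, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
```

Under this assumed definition the measure is unit-free: perfect predictions score 100, always predicting the observed mean scores 0, and predictions worse than the mean give negative values.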


Environmental Chemistry | 2015

Characterising sediments of a tropical sediment-starved shelf using cluster analysis of physical and geochemical variables

Lynda Radke; Jin Li; Grant Douglas; Rachel Przeslawski; Scott L. Nichol; Justy Siwabessy; Zhi Huang; Janice Trafford; Tony Watson; Tanya Whiteway

Environmental context: Australia's tropical marine estate is a biodiversity hotspot that is threatened by human activities. Analysis and interpretation of large physical and geochemistry data sets provides important information on processes occurring at the seafloor in this poorly known area. These processes help us to understand how the seafloor functions to support biodiversity in the region. Abstract: Baseline information on habitats is required to manage Australia's northern tropical marine estate. This study aims to develop an improved understanding of seafloor environments of the Timor Sea. Clustering methods were applied to a large data set comprising physical and geochemical variables that describe organic matter (OM) reactivity, quantity and source, and geochemical processes. Arthropoda (infauna) were used to assess different groupings. Clusters based on physical and geochemical data discriminated arthropods better than geomorphic features. Major variations among clusters included grain size and a cross-shelf transition from authigenic-Mn–As enrichments (inner shelf) to authigenic-P enrichment (outer shelf). Groups comprising raised features had the highest reactive OM concentrations (e.g. low chlorin indices and C:N ratios, and high reaction rate coefficients) and benthic algal δ13C signatures. Surface area-normalised OM concentrations higher than continental shelf norms were observed in association with: (i) low δ15N, inferring Trichodesmium input; and (ii) pockmarks, which impart bottom–up controls on seabed chemistry and cause inconsistencies between bulk and pigment OM pools. Low Shannon–Wiener diversity occurred in association with low redox and porewater pH and published evidence for high energy. Highest β-diversity was observed at euphotic depths. Geochemical data and clustering methods used here provide insight into ecosystem processes that likely influence biodiversity patterns in the region.


Data Mining Applications with R | 2014

Predicting Seabed Hardness Using Random Forest in R

Jin Li; P. Justy W. Siwabessy; Maggie Tran; Zhi Huang; Andrew D. Heap

Spatial information on seabed biodiversity is important for marine zone management in Australia. The biodiversity is often predicted using spatially continuous data of seabed biophysical properties. Seabed hardness is an important property for predicting the biodiversity and is often inferred from multibeam backscatter data. Seabed hardness can also be inferred based on underwater video footage that is, however, only available at a limited number of sampled locations. In this study, we predict the spatial distribution of seabed hardness using random forest (RF) based on video classification and seabed properties. We illustrate the effects of cross-validation methods, including a new cross-validation function (rf.cv), on selecting the optimal predictive model. We also test the effects of various predictor sets on the accuracy of predictive models. This study provides an example of predicting the spatial distribution of environmental properties using RF in R.

Collaboration


Dive into Jin Li's collaboration.

Top Co-Authors

Grant Douglas

Commonwealth Scientific and Industrial Research Organisation