Diansheng Guo
University of South Carolina
Publication
Featured research published by Diansheng Guo.
Computers, Environment and Urban Systems | 2009
Jeremy Mennis; Diansheng Guo
Voluminous geographic data have been, and continue to be, collected with modern data acquisition techniques such as global positioning systems (GPS), high-resolution remote sensing, location-aware services and surveys, and internet-based volunteered geographic information. There is an urgent need for effective and efficient methods to extract unknown and unexpected information from spatial data sets of unprecedentedly large size, high dimensionality, and complexity. To address these challenges, spatial data mining and geographic knowledge discovery has emerged as an active research field, focusing on the development of theory, methodology, and practice for the extraction of useful information and knowledge from massive and complex spatial databases. This paper highlights recent theoretical and applied research in spatial data mining and knowledge discovery. We first briefly review the literature on several common spatial data-mining tasks, including spatial classification and prediction; spatial association rule mining; spatial cluster analysis; and geovisualization. The articles included in this special issue contribute to spatial data mining research by developing new techniques for point pattern analysis, prediction in space–time data, and analysis of moving object data, as well as by demonstrating applications of genetic algorithms for optimization in the context of image classification and spatial interpolation. The paper concludes with some thoughts on the contribution of spatial data mining and geographic knowledge discovery to geographic information sciences.
IEEE Transactions on Visualization and Computer Graphics | 2009
Diansheng Guo
Spatial interactions (or flows), such as population migration and disease spread, naturally form a weighted location-to-location network (graph). Such geographically embedded networks (graphs) are usually very large. For example, the county-to-county migration data in the U.S. has thousands of counties and about a million migration paths. Moreover, many variables are associated with each flow, such as the number of migrants for different age groups, income levels, and occupations. It is a challenging task to visualize such data and discover network structures, multivariate relations, and their geographic patterns simultaneously. This paper addresses these challenges by developing an integrated interactive visualization framework that consists of three coupled components: (1) a spatially constrained graph partitioning method that can construct a hierarchy of geographical regions (communities), where there are more flows or connections within regions than across regions; (2) a multivariate clustering and visualization method to detect and present multivariate patterns in the aggregated region-to-region flows; and (3) a highly interactive flow mapping component to map both flow and multivariate patterns in the geographic space, at different hierarchical levels. The proposed approach can process relatively large data sets and effectively discover and visualize major flow structures and multivariate relations at the same time. User interactions are supported to facilitate the understanding of both an overview and detailed patterns.
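As a rough illustration of the aggregation step mentioned in this abstract, the sketch below sums location-to-location flows into region-to-region flows and reports the share of total flow that stays within regions. The function names, the toy flows, and the region assignment are all hypothetical; the partition is taken as given here and is not produced by the paper's graph partitioning method.

```python
from collections import defaultdict


def aggregate_flows(flows, region_of):
    """Sum location-to-location flows into region-to-region flows.

    flows: dict mapping (origin, destination) -> flow volume
    region_of: dict mapping each location -> its region label
    """
    agg = defaultdict(float)
    for (origin, dest), volume in flows.items():
        agg[(region_of[origin], region_of[dest])] += volume
    return dict(agg)


def within_region_share(agg):
    """Fraction of the total flow whose origin and destination share a region."""
    total = sum(agg.values())
    within = sum(v for (a, b), v in agg.items() if a == b)
    return within / total if total else 0.0


# Toy data (hypothetical): four locations grouped into two regions.
flows = {("A", "B"): 120, ("B", "A"): 80, ("A", "C"): 10,
         ("C", "D"): 90, ("D", "B"): 5}
region_of = {"A": "R1", "B": "R1", "C": "R2", "D": "R2"}

agg = aggregate_flows(flows, region_of)
share = within_region_share(agg)
print(agg)
print(round(share, 3))
```

A partition that satisfies the "more flows within regions than across regions" goal would tend to give a within-region share well above one half, as in this toy case.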
International Journal of Geographical Information Science | 2008
Diansheng Guo
Regionalization is to divide a large set of spatial objects into a number of spatially contiguous regions while optimizing an objective function, which is normally a homogeneity (or heterogeneity) measure of the derived regions. This research proposes and evaluates a family of six hierarchical regionalization methods. The six methods are based on three agglomerative clustering approaches, including the single linkage, average linkage (ALK), and the complete linkage (CLK), each of which is constrained with spatial contiguity in two different ways (i.e. the first‐order constraining and the full‐order constraining). It is discovered that both the Full‐Order‐CLK and the Full‐Order‐ALK methods significantly outperform existing methods across four quality evaluations: the total heterogeneity, region size balance, internal variation, and the preservation of data distribution. Moreover, the proposed algorithms are efficient and can find the solution in O(n² log n) time. With such data scalability, for the first time it is possible to effectively regionalize large data sets that have 10 000 or more spatial objects. A detailed comparison and evaluation of the six methods are carried out with the 2004 US presidential election data.
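The general idea of contiguity-constrained agglomerative clustering can be sketched as follows. This is a deliberately naive illustration, not the paper's algorithm: it assumes that two clusters may merge only if at least one pair of their members is contiguous, and that the average-linkage distance is computed over all cross-cluster pairs (one possible reading of full-order constraining). It rescans all cluster pairs on every merge, so it does not achieve the O(n² log n) bound quoted in the abstract. All names and the toy data are hypothetical.

```python
from itertools import combinations


def constrained_average_linkage(values, neighbors, k):
    """Merge contiguous clusters bottom-up until k regions remain.

    values: dict mapping object id -> numeric attribute
    neighbors: set of frozenset({a, b}) pairs of contiguous objects
    k: target number of regions
    """
    clusters = {i: {i} for i in values}

    def contiguous(c1, c2):
        # Two clusters touch if any cross pair is a contiguity edge.
        return any(frozenset((a, b)) in neighbors for a in c1 for b in c2)

    def avg_dist(c1, c2):
        # Average attribute difference over all cross-cluster pairs.
        total = sum(abs(values[a] - values[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for p, q in combinations(sorted(clusters), 2):
            if contiguous(clusters[p], clusters[q]):
                d = avg_dist(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best is None:
            break  # no contiguous pair left (disconnected contiguity graph)
        _, p, q = best
        clusters[p] |= clusters.pop(q)
    return sorted(sorted(c) for c in clusters.values())


# Toy data (hypothetical): six units in a chain 0-1-2-3-4-5.
values = {0: 1.0, 1: 2.0, 2: 1.5, 3: 10.0, 4: 11.0, 5: 2.0}
neighbors = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]}

regions = constrained_average_linkage(values, neighbors, k=2)
print(regions)
```

In this toy chain, unit 5 has a value close to units 0 to 2 but touches only unit 4, so the contiguity constraint places it in the region with units 3 and 4 rather than with its attribute-space neighbors.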
Cartography and Geographic Information Science | 2005
Diansheng Guo; Mark Gahegan; Alan M. MacEachren; Biliang Zhou
The discovery, interpretation, and presentation of multivariate spatial patterns are important for scientific understanding of complex geographic problems. This research integrates computational, visual, and cartographic methods to detect and visualize multivariate spatial patterns. The integrated approach is able to: (1) perform multivariate analysis, dimensional reduction, and data reduction (summarizing a large number of input data items in a moderate number of clusters) with the Self-Organizing Map (SOM); (2) encode the SOM result with a systematically designed color scheme; (3) visualize the multivariate patterns with a modified Parallel Coordinate Plot (PCP) display and a geographic map (GeoMap); and (4) support human interactions to explore and examine patterns. The research shows that such mixed-initiative methods (computational and visual) can mitigate each other's weaknesses and collaboratively discover complex patterns in large geographic datasets, in an effective and efficient way.
International Journal of Geographical Information Science | 2007
Diansheng Guo
Population mobility, i.e. the movement and contact of individuals across geographic space, is one of the essential factors that determine the course of a pandemic disease spread. This research views both individual‐based daily activities and a pandemic spread as spatial interaction problems, where locations interact with each other via the visitors that they share or the virus that is transmitted from one place to another. The research proposes a general visual analytic approach to synthesize very large spatial interaction data and discover interesting (and unknown) patterns. The proposed approach involves a suite of visual and computational techniques, including (1) a new graph partitioning method to segment a very large interaction graph into a moderate number of spatially contiguous subgraphs (regions); (2) a reorderable matrix, with regions ‘optimally’ ordered on the diagonal, to effectively present a holistic view of major spatial interaction patterns; and (3) a modified flow map, interactively linked to the reorderable matrix, to enable pattern interpretation in a geographical context. The implemented system is able to visualize both people's daily movements and a disease spread over space in a similar way. The discovered spatial interaction patterns provide valuable insight for designing effective pandemic mitigation strategies and supporting decision‐making in time‐critical situations.
Information Visualization | 2003
Diansheng Guo
Unknown (and unexpected) multivariate patterns lurking in high-dimensional datasets are often very hard to find. This paper describes a human-centered exploration environment, which incorporates a coordinated suite of computational and visualization methods to explore high-dimensional data for uncovering patterns in multivariate spaces. Specifically, it includes: (1) an interactive feature selection method for identifying potentially interesting, multidimensional subspaces from a high-dimensional data space, (2) an interactive, hierarchical clustering method for searching multivariate clusters of arbitrary shape, and (3) a suite of coordinated visualization and computational components centered around the above two methods to facilitate a human-led exploration. The implemented system is used to analyze a cancer dataset and shows that it is efficient and effective for discovering unknown and unexpected multivariate patterns from high-dimensional data.
Geoinformatica | 2003
Diansheng Guo; Donna J. Peuquet; Mark Gahegan
The unprecedentedly large size and high dimensionality of existing geographic datasets make the complex patterns that potentially lurk in the data hard to find. Clustering is one of the most important techniques for geographic knowledge discovery. However, existing clustering methods have two severe drawbacks for this purpose. First, spatial clustering methods focus on the specific characteristics of distributions in 2- or 3-D space, while general-purpose high-dimensional clustering methods have limited power in recognizing spatial patterns that involve neighbors. Second, clustering methods in general are not geared toward allowing the human-computer interaction needed to effectively tease out complex patterns. In the current paper, an approach is proposed to open up the “black box” of the clustering process for easy understanding, steering, focusing and interpretation, and thus to support an effective exploration of large and high-dimensional geographic data. The proposed approach involves building a hierarchical spatial cluster structure within the high-dimensional feature space, and using this combined space for discovering multi-dimensional (combined spatial and non-spatial) patterns with efficient computational clustering methods and highly interactive visualization techniques. More specifically, this includes the integration of: (1) a hierarchical spatial clustering method to generate a 1-D spatial cluster ordering that preserves the hierarchical cluster structure, and (2) a density- and grid-based technique to effectively support the interactive identification of interesting subspaces and subsequent searching for clusters in each subspace. The proposed approach is implemented in a fully open and interactive manner, supported by various visualization techniques.
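To give a feel for what a density- and grid-based step can look like, the sketch below bins points of a two-dimensional subspace on a regular grid and keeps the cells whose counts reach a threshold. It is a generic illustration under simple assumptions (equal-width bins, a fixed count threshold), and the function name, parameters, and data are hypothetical rather than taken from the paper.

```python
from collections import Counter


def dense_cells(points, bins, min_count):
    """Bin 2-D points on a regular grid and return the dense cells.

    points: list of (x, y) pairs in the chosen 2-D subspace
    bins: number of equal-width intervals per dimension
    min_count: minimum number of points for a cell to count as dense
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)

    def cell(v, lo, hi):
        span = (hi - lo) or 1.0  # avoid division by zero for constant data
        return min(int((v - lo) / span * bins), bins - 1)

    counts = Counter(
        (cell(x, min_x, max_x), cell(y, min_y, max_y)) for x, y in points
    )
    return {c: n for c, n in counts.items() if n >= min_count}


# Toy data (hypothetical): two dense corners and one stray point.
points = [(0.1, 0.1), (0.15, 0.2), (0.2, 0.12), (0.9, 0.9),
          (0.85, 0.95), (0.5, 0.1), (0.0, 0.0), (1.0, 1.0)]

dense = dense_cells(points, bins=2, min_count=3)
print(dense)
```

Adjacent dense cells could then be joined into candidate clusters, and a subspace with few or no dense cells could be deprioritized during interactive exploration.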
advances in geographic information systems | 2002
Diansheng Guo; Donna J. Peuquet; Mark Gahegan
Clustering is one of the most important tasks for geographic knowledge discovery. However, existing clustering methods have two severe drawbacks for this purpose. First, spatial clustering methods have so far been mainly focused on searching for patterns within the spatial dimensions (usually 2D or 3D space), while more general-purpose high-dimensional (multivariate) clustering methods have very limited power in recognizing spatial patterns that involve neighbors. Secondly, existing clustering methods tend to be closed and are not geared toward allowing the interaction needed to effectively support a human-led exploratory analysis. The contribution of the research includes three parts. (1) Develop an effective and efficient hierarchical spatial clustering method, which can generate a 1-D spatial cluster ordering that preserves all the hierarchical clusters. (2) Develop a density- and grid-based hierarchical subspace clustering method to effectively identify high-dimensional clusters. The spatial cluster ordering is then integrated with this subspace clustering method to effectively search multivariate spatial patterns. (3) The above two methods are implemented in a fully open and interactive manner and supported by various visualization techniques. This opens up the black box of the clustering process for easy understanding, steering, focusing and interpretation. At the end a working demo with US census data is presented.
Cartography and Geographic Information Science | 2008
Jin Chen; Alan M. MacEachren; Diansheng Guo
While many data sets carry geographic and temporal references, our ability to analyze these datasets lags behind our ability to collect them because of the challenges posed by both data complexity and tool scalability issues. This study develops a visual analytics approach that leverages human expertise with visual, computational, and cartographic methods to support the application of visual analytics to relatively large spatio-temporal, multivariate data sets. We develop and apply a variety of methods for data clustering, pattern searching, information visualization, and synthesis. By combining both human and machine strengths, this approach has a better chance to discover novel, relevant, and potentially useful information that is difficult to detect by any of the methods used in isolation. We demonstrate the effectiveness of the approach by applying the Visual Inquiry Toolkit we developed to analyze a data set containing geographically referenced, time-varying and multivariate data for U.S. technology industries.
intelligent information systems | 2006
Diansheng Guo; Mark Gahegan
Geographic information (e.g., locations, networks, and nearest neighbors) is unique and different from aspatial attributes (e.g., population, sales, or income). It is a challenging problem in spatial data mining and visualization to take into account both the geographic information and multiple aspatial variables in the detection of patterns. To tackle this problem, we present and evaluate a variety of spatial ordering methods that can transform spatial relations into a one-dimensional ordering and encoding which preserves spatial locality as much as possible. The ordering can then be used to spatially sort temporal or multivariate data series and thus help reveal patterns across different spaces. The encoding, as a materialization of spatial clusters and neighboring relations, is also amenable to processing together with aspatial variables by any existing (non-spatial) data mining methods. We design a set of measures to evaluate nine different ordering/encoding methods, including two space-filling curves, six hierarchical clustering based methods, and a one-dimensional Sammon mapping (a multidimensional scaling approach). Evaluation results with various data distributions show that the optimal ordering/encoding with the complete-linkage clustering consistently gives the best overall performance, surpassing well-known space-filling curves in preserving spatial locality. Moreover, clustering-based methods can encode not only simple geographic locations, e.g., x and y coordinates, but also a wide range of other spatial relations, e.g., network distances or arbitrarily weighted graphs.
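A space-filling curve is one simple way to turn two-dimensional locations into a one-dimensional ordering. The sketch below uses the Morton (Z-order) curve purely as an example; the abstract does not name the two curves that were evaluated, so this choice is an assumption, as are the function names and the grid resolution.

```python
def morton_code(ix, iy, bits=16):
    """Interleave the bits of two non-negative grid indices (x in even bits)."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code


def spatial_order(points, bits=16):
    """Return point indices sorted along a Morton curve over the bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # guard against a zero-width extent
    span_y = (max(ys) - min_y) or 1.0
    top = (1 << bits) - 1  # largest grid index per axis

    def key(i):
        ix = int((points[i][0] - min_x) / span_x * top)
        iy = int((points[i][1] - min_y) / span_y * top)
        return morton_code(ix, iy, bits)

    return sorted(range(len(points)), key=key)


# Toy data (hypothetical): the four corners of a unit square.
corners = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
order = spatial_order(corners)
print(order)
```

The resulting index list can be used to sort rows of a multivariate or temporal table so that nearby locations tend to sit near each other, although a Z-order curve has occasional long jumps, which is the kind of locality loss the evaluation in the abstract is concerned with.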