Xin-Hua Song
Clarkson University
Publication
Featured research published by Xin-Hua Song.
Chemometrics and Intelligent Laboratory Systems | 2002
Pentti Paatero; Philip K. Hopke; Xin-Hua Song; Ziad Ramadan
Abstract Positive Matrix Factorization (PMF) is a least-squares approach for solving the factor analysis problem. It has been implemented in several forms. Initially, a program called PMF2 was used. Subsequently, a new, more flexible modeling tool, the Multilinear Engine, was developed. These programs can utilize different approaches to handle the problem of rotational indeterminacy. Although both utilize non-negativity constraints to reduce rotational freedom, such constraints are generally insufficient to wholly eliminate the rotational problem. Additional approaches to control rotations are discussed in this paper: (1) global imposition of additions among “scores” and subtractions among the corresponding “loadings” (or vice versa), (2) constraining individual factor elements, either scores and/or loadings, toward zero values, (3) prescribing values for ratios of certain key factor elements, or (4) specifying certain columns of the loadings matrix as known fixed values. It is emphasized that application of these techniques must be based on some external information about acceptable or desirable shapes of factors. If no such a priori information exists, then the full range of possible rotations can be explored, but there is no basis for choosing one of these rotations as the “best” result. Methods for estimating the rotational ambiguity in any specific result are discussed.
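The rotational ambiguity described in this abstract can be illustrated with a small numerical sketch. It assumes the usual bilinear model X = GF (scores G, loadings F): for any invertible T, the pair (GT, T⁻¹F) reproduces X equally well. The matrices and the rotation strength `r` below are made-up illustrative values, not data from the paper, and the helper `matmul` is introduced only for this example.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical non-negative scores (3 samples x 2 factors) and loadings (2 factors x 3 species).
G = [[1.0, 2.0], [3.0, 0.5], [0.0, 4.0]]
F = [[2.0, 1.0, 3.0], [0.5, 0.2, 1.0]]

r = 0.5                          # illustrative rotation strength (assumption)
T = [[1.0, r], [0.0, 1.0]]       # elementary rotation
T_inv = [[1.0, -r], [0.0, 1.0]]  # its exact inverse

G_rot = matmul(G, T)       # adds r * (score column 1) to score column 2
F_rot = matmul(T_inv, F)   # subtracts r * (loading row 2) from loading row 1

X = matmul(G, F)
X_rot = matmul(G_rot, F_rot)

max_diff = max(abs(X[i][j] - X_rot[i][j]) for i in range(3) for j in range(3))
still_nonnegative = all(v >= 0 for row in G_rot + F_rot for v in row)
print(max_diff, still_nonnegative)
```

Here both factorizations fit the data identically and both stay non-negative, which is why non-negativity alone cannot single out one solution in this example.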
Atmospheric Environment | 2001
Xin-Hua Song; Alexandr V. Polissar; Philip K. Hopke
Fine particle composition data obtained at three sampling sites in the northeastern US were studied using a relatively new type of factor analysis, positive matrix factorization (PMF). The three sites are Washington, DC, Brigantine, NJ and Underhill, VT. The PMF method uses the estimates of the error in the data to provide optimal point-by-point weighting and permits efficient treatment of missing and below detection limit values. It also imposes the non-negativity constraint on the factors. Eight, nine and 11 sources were resolved from the Washington, Brigantine and Underhill data, respectively. The factors were normalized by using aerosol fine mass concentration data through multiple linear regression so that the quantitative source contributions for each resolved factor were obtained. Among the sources resolved at the three sites, six are common. These six sources exhibit not only similar chemical compositions, but also similar seasonal variations at all three sites. They are secondary sulfate with a high concentration of S and strong seasonal variation trend peaking in summer time; coal combustion with the presence of S and Se and its seasonal variation peaking in winter time; oil combustion characterized by Ni and V; soil represented by Al, Ca, Fe, K, Si and Ti; incinerator with the presence of Pb and Zn; sea salt with the high concentrations of Na and S. Among the other sources, nitrate (dominated by NO3−) and motor vehicle (with high concentrations of organic carbon (OC) and elemental carbon (EC), and with the presence of some soil dust components) were obtained for the Washington data, while the three additional sources for the Brigantine data were nitrate, motor vehicle and wood smoke (OC, EC, K). At the Underhill site, five other sources were resolved. They are wood smoke, Canadian Mn, Canadian Cu smelter, Canadian Ni smelter, and another salt source with high concentrations of Cl and Na. 
A nitrate source similar to that found at the other sites could not be obtained at Underhill since NO3− was not measured at this site. Generally, most of the sources at the three sites showed similar chemical composition profiles and seasonal variation patterns. The study indicated that PMF was a powerful factor analysis method to extract sources from the ambient aerosol concentration data.
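The point-by-point weighting mentioned in this abstract can be sketched with the weighted least-squares objective commonly written for PMF, Q = Σᵢⱼ ((xᵢⱼ − Σₖ gᵢₖ fₖⱼ) / sᵢⱼ)², where sᵢⱼ is the uncertainty estimate for each data point. The numbers below are invented, and representing a missing value by a placeholder with a very large uncertainty is an assumption made for illustration rather than a procedure taken from the abstract.

```python
def pmf_q(x, g, f, s):
    """Sum of squared residuals, each scaled by its own uncertainty s[i][j]."""
    q = 0.0
    for i in range(len(x)):
        for j in range(len(x[0])):
            fitted = sum(g[i][k] * f[k][j] for k in range(len(f)))
            q += ((x[i][j] - fitted) / s[i][j]) ** 2
    return q

# One-factor toy model: the fitted values are [[1, 2], [2, 4]].
g = [[1.0], [2.0]]
f = [[1.0, 2.0]]

# x[1][1] = 9.0 stands in for a missing value (hypothetical placeholder).
x = [[1.1, 2.0], [2.0, 9.0]]

s_equal = [[0.1, 0.1], [0.1, 0.1]]    # every point trusted equally
s_down = [[0.1, 0.1], [0.1, 100.0]]   # placeholder given a huge uncertainty

q_equal = pmf_q(x, g, f, s_equal)  # dominated by the placeholder
q_down = pmf_q(x, g, f, s_down)    # placeholder barely contributes
print(q_equal, q_down)
```

With equal uncertainties the placeholder dominates Q; once it is down-weighted, the fit is driven by the measured points.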
Journal of The Air & Waste Management Association | 2000
Ziad Ramadan; Xin-Hua Song; Philip K. Hopke
ABSTRACT Chemical composition data for fine and coarse particles collected in Phoenix, AZ, were analyzed using positive matrix factorization (PMF). The objective was to identify the possible aerosol sources at the sampling site. PMF uses estimates of the error in the data to provide optimum data point scaling and permits a better treatment of missing and below-detection-limit values. It also applies nonnegativity constraints to the factors. Two sets of fine particle samples were collected by different samplers. Each of the resulting fine particle data sets was analyzed separately. For each fine particle data set, eight factors were obtained, identified as (1) biomass burning characterized by high concentrations of organic carbon (OC), elemental carbon (EC), and K; (2) wood burning with high concentrations of Na, K, OC, and EC; (3) motor vehicles with high concentrations of OC and EC; (4) nonferrous smelting process characterized by Cu, Zn, As, and Pb; (5) heavy-duty diesel characterized by high EC, OC, and Mn; (6) sea-salt factor dominated by Na and Cl; (7) soil with high values for Al, Si, Ca, Ti, and Fe; and (8) secondary aerosol with SO4 2− and OC that may represent coal-fired power plant emissions. For the coarse particle samples, a five-factor model gave source profiles that are attributed to (1) sea salt, (2) soil, (3) Fe source/motor vehicle, (4) construction (high Ca), and (5) coal-fired power plant. Regression of the PM mass against the factor scores was performed to estimate the mass contributions of the resolved sources. The major sources for the fine particles were motor vehicles, vegetation burning factors (biomass and wood burning), and coal-fired power plants. These sources contributed most of the fine aerosol mass by emitting carbonaceous particles, and they have higher contributions in winter. For the coarse particles, the major source contributions were soil and construction (high Ca). These sources also peaked in winter.
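The regression of PM mass against factor scores can be sketched as an ordinary least-squares fit whose coefficients convert each factor's scores into mass units. The sketch below assumes two factors and no intercept, and uses invented scores and masses; the function name `regress_two_factors` is hypothetical.

```python
def regress_two_factors(scores, mass):
    """Least-squares fit of mass ~ c1*g1 + c2*g2 via the 2x2 normal equations."""
    a11 = sum(g1 * g1 for g1, _ in scores)
    a12 = sum(g1 * g2 for g1, g2 in scores)
    a22 = sum(g2 * g2 for _, g2 in scores)
    b1 = sum(g1 * m for (g1, _), m in zip(scores, mass))
    b2 = sum(g2 * m for (_, g2), m in zip(scores, mass))
    det = a11 * a22 - a12 * a12
    c1 = (b1 * a22 - a12 * b2) / det
    c2 = (a11 * b2 - a12 * b1) / det
    return c1, c2

# Hypothetical factor scores per sample and measured mass per sample.
scores = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
mass = [8.0, 7.0, 18.0, 17.0]   # constructed as 2*g1 + 3*g2

c1, c2 = regress_two_factors(scores, mass)

# Average mass contribution of each factor across the samples.
mean_contrib_1 = c1 * sum(g1 for g1, _ in scores) / len(scores)
mean_contrib_2 = c2 * sum(g2 for _, g2 in scores) / len(scores)
print(c1, c2, mean_contrib_1, mean_contrib_2)
```

Multiplying each factor's scores by its regression coefficient gives per-sample contributions in the same units as the measured mass.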
Chemometrics and Intelligent Laboratory Systems | 2003
Ziad Ramadan; Bass Eickhout; Xin-Hua Song; L.M.C. Buydens; Philip K. Hopke
Abstract New approaches to solving the factor analysis (FA) problem have recently been developed by recognizing that factor analysis is fundamentally a least-squares (LS) problem. This approach is called Positive Matrix Factorization (PMF). Two programs have been written to implement different algorithms for solving the problem. These programs are PMF2 and Multilinear Engine (ME-2). The two programs use different algorithms to obtain the least-squares solution and the constraints are imposed in different ways. Elemental composition data for particle samples collected in Phoenix, AZ from June 1996 through June 1998, were used to compare the source apportionment of these two programs. The ME-2 results presented in this paper are compared with the previously published PMF2 results. The identification of the eight PMF sources returned one questionable source: wood burning and some peculiar mass contributions. The extra features of ME-2 made it possible to also investigate the sources responsible for the fine particles. The mixed-way approach indicated the existence of incinerators in the Phoenix area. Like PMF, ME-2 identified high source contributions for biomass burning, motor vehicles (with higher contribution in winter), coal-fired power plants (secondary particles with higher contributions in summer), soil, and nonferrous smelting process. Sea salt and heavy-duty diesel were identified by the ME two-way analysis, but they disappeared in the three-way analysis of the dual fine particle sequential sampler (DFPSS) and DICHOT data. Instead, an obvious incinerator source was identified again. Thus, PMF and ME-2 identified the same major sources responsible for the PM 2.5 in Phoenix, but some of the sources identified by PMF2 appear to be uncertain. The three-way analysis provided additional information about possible sources, but also returned unexplainable sources.
Analytica Chimica Acta | 2001
Ziad Ramadan; Xin-Hua Song; Philip K. Hopke; Mara J. Johnson; Kate M. Scow
Abstract Two variable selection methods were evaluated by comparing their predictions with respect to differentiating among environmental soil samples. The focus of this work is to determine which input variables are most relevant for prediction of soil sources using discriminant partial least squares (D-PLS) and back-propagation artificial neural network (BP-ANN) models. The methods investigated were stepwise variable selection and genetic algorithms (GAs). Microbial community DNA was extracted from 48 environmental soil samples derived from different field crops and soil sources. After amplification of bacterial ribosomal RNA genes by polymerase chain reaction (PCR), the products were separated by gel electrophoresis. Characteristic complex band patterns were obtained, indicating high bacterial diversity. Two hundred and twenty-three DNA band patterns produced in the gels of the soil samples were used in the analysis, after removal of included DNA standard markers. Based on the brightness of the bands, densitometric curves of the selected DNA band pattern were extracted from the gel images. The curves were smoothed using the Savitzky–Golay method and scaled to the DNA standard markers. The prediction results based on the two variable selection methods for PLS and ANN models are presented and compared. Both models gave good results before any variable selection, with the ANN being better than D-PLS. The prediction performance of both models, especially D-PLS, was improved by applying the stepwise and GA variable selection methods. The study also shows that GA variable selection improved predictive ability significantly more than the stepwise variable selection method.
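The smoothing step can be sketched with the classic 5-point quadratic Savitzky–Golay filter, whose convolution coefficients are (−3, 12, 17, 12, −3)/35. The window length and polynomial order used in the paper are not stated in the abstract, so the 5-point choice and the test curves below are assumptions for illustration.

```python
SG5 = (-3, 12, 17, 12, -3)  # 5-point quadratic Savitzky-Golay weights, normalised by 35

def savgol5(y):
    """Smooth interior points with a 5-point window; the two points at each end are left as-is."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(SG5)) / 35.0
    return out

# A quadratic curve passes through unchanged, because the filter fits a local quadratic.
quadratic = [float(x * x) for x in range(7)]
smoothed_quadratic = savgol5(quadratic)

# An isolated one-point spike (noise-like) is damped.
spike = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
smoothed_spike = savgol5(spike)
print(smoothed_quadratic, smoothed_spike[3])
```

The filter preserves smooth local shape, such as band peaks in a densitometric curve, while reducing point-to-point noise.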
Analytica Chimica Acta | 1997
Philip K. Hopke; Xin-Hua Song
Abstract The identification of sources of particles found in chemical process equipment such as photographic printer cabinets and their quantitative apportionment to those sources could lead to effective control strategies that would improve productivity and customer satisfaction with the service. Computer-controlled scanning electron microscopy (CCSEM) has proven to be a powerful tool in the characterization of individual particles. Thus, in this paper the samples of particles taken from experiments examining the formation of particles when cutting photographic paper and particles collected in a printer cabinet have been characterized by using CCSEM analysis. The obtained data have been analyzed by using two different neural networks, namely, the adaptive resonance theory based neural network (ART-2a) and Kohonen neural network. Both neural networks can be used to perform an unsupervised pattern recognition examination of which particles should be grouped together. The results show that they are generally able to extract the main particle groups present in the data set. The produced particle groups are almost homogeneous based on the major chemical elements. From the general size, shape and density parameters provided by the CCSEM analysis, the volume and mass of each particle were estimated. Then the mass fractions of each particle class produced by the neural networks were calculated. Based on the mass conservation principle and the resulting mass balance, the particle class balance model has been used to discern particles from different sources and apportion the corresponding source contributions.
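The step from per-particle size and density to class mass fractions can be sketched as follows. The sketch assumes spherical particles (volume = π/6 · d³), which is a simplification of the size and shape parameters the abstract mentions; the class names, diameters and densities are invented for illustration.

```python
import math

def particle_mass(diameter_um, density_g_cm3):
    """Mass in grams of a sphere; 1 cubic micrometre = 1e-12 cubic centimetres."""
    volume_um3 = math.pi / 6.0 * diameter_um ** 3
    return volume_um3 * density_g_cm3 * 1e-12

# (class assigned by the neural network, diameter in micrometres, density in g/cm^3)
particles = [
    ("Ca-rich", 2.0, 2.7),
    ("Ca-rich", 1.0, 2.7),
    ("C-rich", 2.0, 1.35),
]

class_mass = {}
for label, diameter, density in particles:
    class_mass[label] = class_mass.get(label, 0.0) + particle_mass(diameter, density)

total = sum(class_mass.values())
mass_fractions = {label: m / total for label, m in class_mass.items()}
print(mass_fractions)
```

The resulting mass fractions per particle class form the kind of profile a particle class balance would take as input.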
Trends in Analytical Chemistry | 2003
Nicolaas (Klaas) M. Faber; Xin-Hua Song; Philip K. Hopke
The development of an adequate expression for sample-specific standard error of prediction for partial least squares regression is a major trend in chemometrics literature. This article focuses on three generally applicable expressions, namely one recommended by the American Society for Testing and Materials (ASTM), one implemented in Unscrambler software and a simplification derived under the errors-in-variables (EIV) model. Results obtained for a near-infrared data set taken from the literature demonstrate that the EIV expression works best.
Analytica Chimica Acta | 2001
Xin-Hua Song; Nicolaas (Klaas) M. Faber; Philip K. Hopke; David T. Suess; Kimberly A. Prather; James J. Schauer; Glen R. Cass
The mass apportionment of gasoline and diesel particles in ambient aerosol samples is a difficult problem because both sources exhibit very similar chemical composition. However, individual particle analysis could provide additional information and help achieve source apportionment with good accuracy. Aerosol time-of-flight mass spectrometry (ATOFMS) has proven to be a powerful technique capable of simultaneously determining both the size and chemical composition of single particles in real time. Thus, samples of gasoline and diesel particles were analyzed by ATOFMS for their single particle information. In addition to the aerodynamic diameter from which the individual particle mass can be estimated, positive and negative mass spectra were obtained for each particle. A novel data analysis approach based on the combination of an adaptive resonance theory-based neural network (ART-2a), and a multivariate calibration method, partial least squares (PLS), has been developed to apportion the mass contributions of gasoline and diesel sources to mixture samples. The ART-2a neural network was used first to classify the particle-by-particle mass spectral data. The source profile for each source (gasoline/diesel) was obtained in terms of the mass fractions of the classified particle types. Next, PLS was applied to build a model relating the mass fractions of different particle classes and the mass contributions of the two sources to mixture samples. Artificial mixture samples obtained by randomly mixing some particles from the two source samples have been used to examine the feasibility of the proposed method. Satisfactory predictions for the mass contributions of gasoline and diesel exhaust to the mixture samples have been obtained. A recently proposed formula for prediction error variance is successfully modified to quantify the uncertainty in the PLS predictions. 
This study exemplifies the potential promise of multivariate calibration as applied to the aerosol source apportionment problem.
Analytica Chimica Acta | 1996
Xin-Hua Song; Philip K. Hopke
Abstract The Kohonen self-organizing neural network is a useful tool for pattern recognition. Based on the Kohonen map obtained from the training set, predictions can be made for the unknown objects. The problem is how to determine the membership of new objects hitting empty neurons which were not activated by any training set objects. The K-nearest neighbor technique has been previously used to solve this problem based on the relative geometric position of the neurons in the two-dimensional Kohonen map (Kohonen-KNN). However, during the projection into a low-dimensional subspace for the Kohonen neural network, some information about the correct neighbor relationships between the object vectors is lost. Thus, the Kohonen-KNN method may not give the best prediction results. In this paper, an alternative method is proposed for the Kohonen neural network to be used in a supervised way based on the weight interpretation (Kohonen-WI). The membership of the samples hitting the empty neurons during the prediction process is determined to be the same as that of the nearest active neuron based on a distance measure from the trained weight vectors. The Italian olive oil data set is used to test this method. Moreover, the learning vector quantization (LVQ) method has also been used to treat the same data set. This method explicitly uses the class membership of samples in the training set for the fine adaptions of network weights. It has been found that the Kohonen-WI method gives better prediction results than Kohonen-KNN, indicating that the weight interpretation can partly compensate for the information loss about the correct neighbor relationship between the neurons in the Kohonen map. The LVQ method gives very similar and satisfactory classification results as Kohonen-WI, though their working mechanisms are different. By comparison, the Kohonen-WI method is easier to use.
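The difference between the two ways of labelling a sample that hits an empty neuron can be sketched on a tiny map. The sketch assumes that "nearest active neuron" is measured between the empty neuron's trained weight vector and the weight vectors of the active neurons, and it uses a single nearest neighbour on the grid as a stand-in for Kohonen-KNN; the weights, labels and function names are hypothetical.

```python
import math

# Trained weight vector for each neuron, keyed by (row, col) grid position.
weights = {
    (0, 0): [0.1, 0.1],
    (0, 1): [0.8, 0.9],
    (0, 2): [0.9, 0.95],
    (0, 3): [1.0, 1.0],
}
# Only neurons activated by training samples carry a class label.
labels = {(0, 0): "A", (0, 3): "B"}

def winner(sample):
    """Neuron whose weight vector is closest to the sample."""
    return min(weights, key=lambda n: math.dist(sample, weights[n]))

def label_by_weight(neuron):
    """Weight interpretation: nearest active neuron in weight space."""
    nearest = min(labels, key=lambda a: math.dist(weights[neuron], weights[a]))
    return labels[nearest]

def label_by_grid(neuron):
    """Grid-based alternative: nearest active neuron by map position."""
    nearest = min(labels, key=lambda a: math.dist(neuron, a))
    return labels[nearest]

sample = [0.82, 0.88]
hit = winner(sample)                 # lands on the empty neuron (0, 1)
wi_label = label_by_weight(hit)      # "B": weights of (0, 1) resemble (0, 3)
grid_label = label_by_grid(hit)      # "A": (0, 0) is the closer grid position
print(hit, wi_label, grid_label)
```

In this constructed case the grid neighbour and the weight-space neighbour disagree, which is the situation where the abstract reports an advantage for the weight interpretation.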
Chemometrics and Intelligent Laboratory Systems | 1997
Philip K. Hopke; Xin-Hua Song
Abstract The problem of source identification and quantitative mass apportionment for airborne particulate matter, commonly called receptor modeling, can be treated in a manner analogous to the multivariate calibration problem commonly encountered in chemometrics. Partial least-squares (PLS) has been previously used in such a context. In this work, artificial neural networks (ANN) and simulated annealing (SA) have been applied to sets of simulated data. The aerosol composition data generated by the National Bureau of Standards (NBS) for the 1982 EPA workshop on mathematical and empirical receptor modeling held at Quail Roost, NC, have been examined. From these tests of ANN and SA and earlier work on partial least-squares, it appears that multivariate calibration methods may be helpful in resolving sources and apportioning the airborne mass. ANN was better able to deal with the collinearity in the source profile matrix. For chemical mass balance (CMB) and PLS, this collinearity prevented the apportionment of mass to all of the known sources. In addition, ANN could identify which sources were active when trained with a source profile library containing more sources than actually contributed to the samples. SA produced more accurate source contribution estimates than the other methods, but was affected by the collinearity to the same degree as CMB and PLS. Thus, the initial results with these methods are promising, but further development and testing are needed before they can be routinely used.