Haipeng Shen
University of Hong Kong
Publication
Featured research published by Haipeng Shen.
Journal of the American Statistical Association | 2005
Lawrence D. Brown; Noah Gans; Avishai Mandelbaum; Anat Sakov; Haipeng Shen; Sergey Zeltyn; Linda H. Zhao
A call center is a service network in which agents provide telephone-based services. Customers who seek these services are delayed in tele-queues. This article summarizes an analysis of a unique record of call center operations. The data comprise a complete operational history of a small banking call center, call by call, over a full year. Taking the perspective of queueing theory, we decompose the service process into three fundamental components: arrivals, customer patience, and service durations. Each component involves different basic mathematical structures and requires a different style of statistical analysis. Some of the key empirical results are sketched, along with descriptions of the varied techniques required. Several statistical techniques are developed for analysis of the basic components. One of these techniques is a test that a point process is a Poisson process. Another involves estimation of the mean function in a nonparametric regression with lognormal errors. A new graphical technique is introduced for nonparametric hazard rate estimation with censored data. Models are developed and implemented for forecasting of Poisson arrival rates. Finally, the article surveys how the characteristics deduced from the statistical analyses form the building blocks for theoretically interesting and practically useful mathematical models for call center operations.
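The abstract mentions a test that a point process is a Poisson process, but it does not spell the test out. The paper's own test is not reproduced here. As a hedged illustration, a standard textbook check uses the fact that, conditional on the number of arrivals in an interval where the rate is constant, a Poisson process places those arrivals as independent uniform points. The sketch computes the Kolmogorov-Smirnov distance between the rescaled arrival times and Uniform(0, 1). The function names `ks_uniform_statistic` and `ks_critical_value` are hypothetical, and the critical value is the usual large-sample approximation.

```python
import math
import random


def ks_uniform_statistic(times, start, end):
    """Kolmogorov-Smirnov distance between the empirical CDF of arrival
    times (rescaled to [0, 1]) and the Uniform(0, 1) CDF."""
    u = sorted((t - start) / (end - start) for t in times)
    n = len(u)
    d = 0.0
    for i, x in enumerate(u):
        # The ECDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def ks_critical_value(n, alpha=0.05):
    """Large-sample KS critical value: sqrt(-ln(alpha/2) / 2) / sqrt(n)."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)


if __name__ == "__main__":
    # Simulate one hour of homogeneous Poisson arrivals (rate 2 per minute).
    rng = random.Random(0)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(2.0)
        if t >= 60.0:
            break
        arrivals.append(t)
    d = ks_uniform_statistic(arrivals, 0.0, 60.0)
    print(len(arrivals), d, d > ks_critical_value(len(arrivals)))
```

In practice the day would be split into short blocks within which the arrival rate is roughly constant. Over one long window with a time-varying rate, this check would reject even a genuine inhomogeneous Poisson process.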
Biometrics | 2010
Mihee Lee; Haipeng Shen; Jianhua Z. Huang; J. S. Marron
Sparse singular value decomposition (SSVD) is proposed as a new exploratory analysis tool for biclustering or identifying interpretable row-column associations within high-dimensional data matrices. SSVD seeks a low-rank, checkerboard-structured matrix approximation to data matrices. The desired checkerboard structure is achieved by forcing both the left- and right-singular vectors to be sparse, that is, to have many zero entries. By interpreting singular vectors as regression coefficient vectors for certain linear regressions, sparsity-inducing regularization penalties are imposed on the least squares regression to produce sparse singular vectors. An efficient iterative algorithm is proposed for computing the sparse singular vectors, along with some discussion of penalty parameter selection. A lung cancer microarray dataset and a food nutrition dataset are used to illustrate SSVD as a biclustering method. SSVD is also compared with some existing biclustering methods using simulated datasets.
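Below is a simplified rank-one sketch of the alternating idea. It assumes a plain L1 (lasso) penalty with fixed levels `lam_u` and `lam_v`. The paper's own penalties and its penalty-parameter selection are not reproduced, and the names `soft_threshold` and `sparse_rank_one` are hypothetical. For a fixed unit-norm u, minimizing ||X - u v^T||_F^2 + 2 lam_v ||v||_1 over v gives the soft-thresholded vector X^T u. The same step applies to u with v held fixed.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_rank_one(X, lam_u, lam_v, n_iter=100, tol=1e-8):
    """Return (d, u, v) with d * outer(u, v) a sparse rank-one approximation of X."""
    # Start from the ordinary leading singular vectors.
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        # For fixed unit-norm u, soft-thresholding X^T u minimizes
        # ||X - u v^T||_F^2 + 2 * lam_v * ||v||_1 over v.
        v_new = soft_threshold(X.T @ u, lam_v)
        if not v_new.any():
            return 0.0, np.zeros_like(u), v_new  # penalty too strong: v is all zero
        v_new /= np.linalg.norm(v_new)
        # Same step for u with v held fixed.
        u_new = soft_threshold(X @ v_new, lam_u)
        if not u_new.any():
            return 0.0, u_new, np.zeros_like(v)
        u_new /= np.linalg.norm(u_new)
        done = np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if done:
            break
    # Normalization removed the scale; recover it as d = u^T X v.
    return float(u @ X @ v), u, v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u0 = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
    v0 = np.array([1.0, 0.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
    X = 5 * np.outer(u0, v0) + 0.01 * rng.standard_normal((4, 5))
    d, u, v = sparse_rank_one(X, lam_u=0.5, lam_v=0.5)
    print(round(d, 3), np.flatnonzero(u), np.flatnonzero(v))
```

Normalizing after each thresholding step keeps u and v on the unit sphere, so the overall scale is carried only by d. Further layers could be extracted by applying the same routine to the residual X - d u v^T.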
Manufacturing & Service Operations Management | 2008
Haipeng Shen; Jianhua Z. Huang
Accurate forecasting of call arrivals is critical for staffing and scheduling of a telephone call center. We develop methods for interday and dynamic intraday forecasting of incoming call volumes. Our approach is to treat the intraday call volume profiles as a high-dimensional vector time series. We propose first to reduce the dimensionality by singular value decomposition of the matrix of historical intraday profiles and then to apply time series and regression techniques. Our approach takes into account both interday (or day-to-day) dynamics and intraday (or within-day) patterns of call arrivals. Distributional forecasts are also developed. The proposed methods are data driven, appear to be robust against model assumptions in our simulation studies, and are shown to be very competitive in out-of-sample forecast comparisons using two real data sets. Our methods are computationally fast; it is therefore feasible to use them for real-time dynamic forecasting.
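The two-stage idea above reduces the days-by-periods matrix with an SVD and then forecasts the day-level scores. It can be sketched as follows, assuming a simple AR(1) model fitted by least squares to each score series. The paper's actual time series and regression models (for example, day-of-week effects), its distributional forecasts and its intraday updating are not reproduced, and the function name `svd_ar1_forecast` is hypothetical.

```python
import numpy as np


def svd_ar1_forecast(Y, k=1):
    """Forecast tomorrow's intraday profile from Y, an (n_days, n_periods) matrix
    of historical call volumes, via a rank-k SVD plus an AR(1) per score series."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    scores = U[:, :k] * s[:k]   # interday factor scores, shape (n_days, k)
    loadings = Vt[:k]           # intraday patterns, shape (k, n_periods)
    next_scores = np.empty(k)
    for j in range(k):
        x, y = scores[:-1, j], scores[1:, j]
        # Least-squares fit of score[t+1] = a + b * score[t].
        A = np.column_stack([np.ones_like(x), x])
        (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
        next_scores[j] = a + b * scores[-1, j]
    # Recombine the forecast scores with the fixed intraday loadings.
    return next_scores @ loadings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    profile = np.array([2.0, 5.0, 9.0, 7.0, 4.0, 1.0])   # toy intraday shape
    level = 100 + np.cumsum(rng.normal(0, 3, size=60))    # toy daily volume level
    Y = np.outer(level, profile) + rng.normal(0, 5, size=(60, 6))
    print(svd_ar1_forecast(Y, k=2).round(1))
```

Because only k score series are modeled rather than every intraday period, the per-day cost is a handful of small regressions. This is consistent with the abstract's point that such methods are fast enough for real-time use.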
Journal of Multivariate Analysis | 2013
Dan Shen; Haipeng Shen; J. S. Marron
Sparse Principal Component Analysis (PCA) methods are efficient tools to reduce the dimension (or number of variables) of complex data. Sparse principal components (PCs) are easier to interpret than conventional PCs, because most loadings are zero. We study the asymptotic properties of these sparse PC directions for scenarios with fixed sample size and increasing dimension (i.e. High Dimension, Low Sample Size (HDLSS)). We consider the previously studied single spike covariance model and assume in addition that the maximal eigenvector is sparse. We extend the existing HDLSS asymptotic consistency and strong inconsistency results of conventional PCA in an entirely new direction. We find a large set of sparsity assumptions under which sparse PCA is still consistent even when conventional PCA is strongly inconsistent. The consistency of sparse PCA is characterized along with rates of convergence. Furthermore, we clearly identify the mathematical boundaries of the sparse PCA consistency, by showing strong inconsistency for an oracle version of sparse PCA beyond the consistent region, as well as its inconsistency on the boundaries of the consistent region. Simulation studies are performed to validate the asymptotic results in finite samples.
International Wireless Internet Conference | 2006
Félix Hernández-Campos; Merkourios Karaliopoulos; Maria Papadopouli; Haipeng Shen
Campus wireless LANs (WLANs) are complex systems with hundreds of access points (APs) and thousands of users. Their performance analysis calls for realistic models of their elements, which can be input to simulation and testbed experiments but also taken into account for theoretical work. However, only a few modeling results in this area are derived from real measurement data, and rarely do they provide a complete and consistent view of entire WLANs. In this work, we address this gap relying on extensive traces collected from the large wireless infrastructure of the University of North Carolina. We present a first system-wide, multi-level modeling approach for characterizing the traffic demand in a campus WLAN. Our approach focuses on two structures of wireless user activity, namely the wireless session and the network flow. We propose statistical distributions for their attributes, aiming at a parsimonious characterization that can be the most flexible foundation for simulation studies. We simulate our models and show that the synthesized traffic is in good agreement with the original trace data. Finally, we investigate to what extent these models can be valid at finer spatial aggregation levels of traffic load, e.g., for modeling traffic demand in hotspot APs.
The Annals of Applied Statistics | 2008
Haipeng Shen; Jianhua Z. Huang
We consider forecasting the latent rate profiles of a time series of inhomogeneous Poisson processes. The work is motivated by operations management of queueing systems, in particular, telephone call centers, where accurate forecasting of call arrival rates is a crucial primitive for efficient staffing of such centers. Our forecasting approach utilizes dimension reduction through a factor analysis of Poisson variables, followed by time series modeling of factor score series. Time series forecasts of factor scores are combined with factor loadings to yield forecasts of future Poisson rate profiles. Penalized Poisson regressions on factor loadings guided by time series forecasts of factor scores are used to generate dynamic within-process rate updating. Methods are also developed to obtain distributional forecasts. Our methods are illustrated using simulation and real data. The empirical results demonstrate how forecasting and dynamic updating of call arrival rates can affect the accuracy of call center staffing.
Electronic Journal of Statistics | 2008
Jianhua Z. Huang; Haipeng Shen; Andreas Buja
Two existing approaches to functional principal components analysis (FPCA) are due to Rice and Silverman (1991) and Silverman (1996), both based on maximizing variance but introducing penalization in different ways. In this article we propose an alternative approach to FPCA using penalized rank-one approximation to the data matrix. Our contributions are four-fold: (1) by considering invariance under scale transformation of the measurements, the new formulation sheds light on how regularization should be performed for FPCA and suggests an efficient power algorithm for computation; (2) it naturally incorporates spline smoothing of discretized functional data; (3) the connection with smoothing splines also facilitates construction of cross-validation or generalized cross-validation criteria for smoothing parameter selection that allow efficient computation; (4) different smoothing parameters are permitted for different FPCs. The methodology is illustrated with a real data example and a simulation. AMS 2000 subject classifications: Primary 62G08, 62H25; secondary 65F30.
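The power algorithm in contribution (1) can be sketched as a penalized power iteration that alternates u ← Xv / ||Xv|| and v ← (I + αΩ)^{-1} X^T u. This is a hedged sketch under stated assumptions. Ω here is a discrete second-difference roughness penalty on an equally spaced grid, standing in for the spline penalty of the paper. Cross-validated selection of α is omitted, and the names `second_difference_penalty` and `regularized_rank_one` are hypothetical. With α = 0 the iteration reduces to the ordinary leading singular pair.

```python
import numpy as np


def second_difference_penalty(m):
    """Omega = D^T D, where D takes second differences on an equally spaced grid."""
    D = np.diff(np.eye(m), n=2, axis=0)   # shape (m - 2, m)
    return D.T @ D


def regularized_rank_one(X, alpha, n_iter=500, tol=1e-10):
    """Penalized rank-one approximation u v^T of X (rows = curves, columns = grid)."""
    m = X.shape[1]
    # Smoother S(alpha) = (I + alpha * Omega)^{-1}; alpha = 0 gives the plain SVD.
    S = np.linalg.inv(np.eye(m) + alpha * second_difference_penalty(m))
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    v = s[0] * Vt[0]                      # start from the unpenalized solution
    for _ in range(n_iter):
        u = X @ v
        u /= np.linalg.norm(u)            # u <- X v / ||X v||
        v_new = S @ (X.T @ u)             # v <- S(alpha) X^T u
        converged = np.linalg.norm(v_new - v) <= tol * np.linalg.norm(v)
        v = v_new
        if converged:
            break
    return u, v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0, 1, 50)
    X = np.outer(rng.normal(size=30), np.sin(2 * np.pi * grid)) + 0.3 * rng.normal(size=(30, 50))
    u, v = regularized_rank_one(X, alpha=5.0)
    print(np.round(v[:5], 3))
```

The penalty enters only through the smoother matrix S(α), so a different α per component (contribution 4) just means rebuilding S before extracting the next component from the residual.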
Stroke | 2013
Ruijun Ji; Haipeng Shen; Yuesong Pan; Panglian Wang; Gaifen Liu; Yilong Wang; Hao Li; Wang Y
Background and Purpose— To develop and validate a risk score (acute ischemic stroke-associated pneumonia score [AIS-APS]) for predicting in-hospital stroke-associated pneumonia (SAP) after AIS. Methods— The AIS-APS was developed based on the China National Stroke Registry, in which eligible patients were randomly classified into derivation (60%) and internal validation (40%) cohorts. External validation was performed using the prospective Chinese Intracranial Atherosclerosis Study. Independent predictors of in-hospital SAP after AIS were obtained using multivariable logistic regression, and β-coefficients were used to generate the point scoring system of the AIS-APS. The area under the receiver operating characteristic curve and the Hosmer–Lemeshow goodness-of-fit test were used to assess model discrimination and calibration, respectively. Results— The overall incidence of in-hospital SAP after AIS was 11.4%, 11.3%, and 7.3% in the derivation (n=8820), internal (n=5882), and external (n=3037) validation cohorts, respectively. A 34-point AIS-APS was developed from the set of independent predictors including age, history of atrial fibrillation, congestive heart failure, chronic obstructive pulmonary disease and current smoking, prestroke dependence, dysphagia, admission National Institutes of Health Stroke Scale score, Glasgow Coma Scale score, stroke subtype (Oxfordshire), and blood glucose. The AIS-APS showed good discrimination (area under the receiver operating characteristic curve) in the internal (0.785; 95% confidence interval, 0.766–0.803) and external (0.792; 95% confidence interval, 0.761–0.823) validation cohorts. The AIS-APS was well calibrated (Hosmer–Lemeshow test) in the internal (P=0.22) and external (P=0.30) validation cohorts. When compared with 3 prior scores, the AIS-APS showed significantly better discrimination with regard to in-hospital SAP after AIS (all P<0.0001). Conclusions— The AIS-APS is a valid risk score for predicting in-hospital SAP after AIS.
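The abstract relies on two generic steps: turning logistic-regression β-coefficients into integer points, and measuring discrimination by the area under the ROC curve. The sketch below illustrates both under stated assumptions. Points are obtained by dividing each coefficient by a reference coefficient and rounding, which is one common convention; the abstract does not say which reference or binning of continuous predictors the AIS-APS used. The AUC is computed as the concordance probability between cases and non-cases. The function names and the coefficient values in the example are hypothetical and are not the published AIS-APS values.

```python
def points_from_coefficients(betas, unit=None):
    """Convert logistic-regression coefficients to integer points by dividing
    each by a reference size (default: smallest |beta|) and rounding."""
    if unit is None:
        unit = min(abs(b) for b in betas.values())
    return {name: round(b / unit) for name, b in betas.items()}


def auc(scores, outcomes):
    """Area under the ROC curve as the concordance probability: the share of
    (case, non-case) pairs in which the case has the higher score; ties count 1/2."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up coefficients for binary risk factors (NOT the published AIS-APS values).
    points = points_from_coefficients({"factor_a": 0.3, "factor_b": 0.9, "factor_c": 0.6})
    patients = [{"factor_a"}, {"factor_b", "factor_c"}, set(), {"factor_a", "factor_b"}]
    totals = [sum(points[f] for f in p) for p in patients]
    print(points, totals, auc(totals, [0, 1, 0, 1]))
```

The pairwise AUC is quadratic in the sample size, which is fine for illustration. For cohorts of several thousand patients, a rank-based (Mann-Whitney) formula gives the same value more efficiently.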
Journal of the American Statistical Association | 2009
Jianhua Z. Huang; Haipeng Shen; Andreas Buja
Two-way functional data consist of a data matrix whose row and column domains are both structured, for example, temporally or spatially, as when the data are time series collected at different locations in space. We extend one-way functional principal component analysis (PCA) to two-way functional data by introducing regularization of both left and right singular vectors in the singular value decomposition (SVD) of the data matrix. We focus on a penalization approach and solve the nontrivial problem of constructing proper two-way penalties from one-way regression penalties. We introduce conditional cross-validated smoothing parameter selection whereby left-singular vectors are cross-validated conditional on right-singular vectors, and vice versa. The concept can be realized as part of an alternating optimization algorithm. In addition to the penalization approach, we briefly consider two-way regularization with basis expansion. The proposed methods are illustrated with one simulated and two real data examples. Supplemental materials available online show that several “natural” approaches to penalized SVDs are flawed and explain why.
The Annals of Applied Statistics | 2012
Spencer Hays; Haipeng Shen; Jianhua Z. Huang
Accurate forecasting of zero-coupon bond yields for a continuum of maturities is paramount to bond portfolio management and derivative security pricing. Yet a universal model for yield curve forecasting has been elusive, and prior attempts often resulted in a trade-off between goodness of fit and consistency with economic theory. To address this, herein we propose a novel formulation which connects the dynamic factor model (DFM) framework with concepts from functional data analysis: a DFM with functional factor loading curves. This results in a model capable of forecasting functional time series. Further, in the yield curve context we show that the model retains economic interpretation. Model estimation is achieved through an expectation-maximization algorithm, where the time series parameters and factor loading curves are simultaneously estimated in a single step. Efficient computation is implemented, and a data-driven smoothing parameter is incorporated. We show that our model performs very well on forecasting actual yield data compared with existing approaches, especially in regard to profit-based assessment for an innovative trading exercise. We further illustrate the viability of our model for applications outside of yield forecasting.