José C. Pinheiro
Alcatel-Lucent
Publications
Featured research published by José C. Pinheiro.
Handbook of massive data sets | 2002
Michael H. Cahill; Diane Lambert; José C. Pinheiro; Don X. Sun
Finding telecommunications fraud in masses of call records is more difficult than finding a needle in a haystack. In the haystack problem, there is only one needle that does not look like hay, the pieces of hay all look similar, and neither the needle nor the hay changes much over time. Fraudulent calls may be rare like needles in haystacks, but they are much more challenging to find. Callers are dissimilar, so calls that look like fraud for one account look like expected behavior for another, while all needles look the same. Moreover, fraud has to be found repeatedly, as fast as fraud calls are placed, the nature of fraud changes over time, the extent of fraud is unknown in advance, and fraud may be spread over more than one type of service. For example, calls placed on a stolen wireless telephone may be charged to a stolen credit card. Finding fraud is like finding a needle in a haystack only in the sense of sifting through masses of data to find something rare. This chapter describes some issues involved in creating tools for building fraud systems that are accurate, able to adapt to changing legitimate and fraudulent behavior, and easy to use.
knowledge discovery and data mining | 2000
Fei Chen; Diane Lambert; José C. Pinheiro
Call records, internet packet headers, and other transaction records are coming down a pipe at a ferocious rate, and we need to monitor statistics of the data. There is no reason to think that the data are normally distributed, so quantiles of the data are important to watch. The probe attached to the pipe has only limited memory, though, so it is impossible to compute the quantiles by sorting the data. The only possibility is to incrementally estimate the quantiles as the data fly by. This paper provides such an incremental quantile estimator. It resembles an exponentially weighted moving average in form, processing, and memory requirements, but it is based on stochastic approximation, so we call it an exponentially weighted stochastic approximation, or EWSA. Simulations show that the EWSA outperforms other kinds of incremental estimates that also require minimal main memory, especially when extreme quantiles are tracked for patterns of behavior that change over time. Use of the EWSA is illustrated in an application to tracking call duration for a set of callers over a three-month period.
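The update rule behind such a stochastic-approximation estimator can be sketched in a few lines. This is a minimal illustration in the spirit of the paper's EWSA, not its exact estimator; the step size `w`, the `scale` constant, and the function name are assumptions for the sketch.

```python
import random

def track_quantile(stream, p, w=0.05, scale=0.1):
    """Incrementally track the p-th quantile of a stream (a sketch).

    Only one number, the current estimate, is kept in memory: the
    estimate moves up by w*scale*p when a new point exceeds it and
    down by w*scale*(1 - p) otherwise, so in equilibrium a fraction
    p of the data falls below it.
    """
    it = iter(stream)
    q = next(it)                      # initialize from the first point
    for x in it:
        if x > q:
            q += w * scale * p        # estimate too low: nudge it up
        else:
            q -= w * scale * (1 - p)  # estimate too high: nudge it down
    return q
```

Because the step size stays fixed rather than shrinking to zero, the estimate keeps adapting when the stream's distribution drifts, which is the behavior needed for tracking changing calling patterns.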
international conference on network protocols | 2001
Sneha Kumar Kasera; José C. Pinheiro; Catherine Rachael Loader; Mehmet Karaul; Adiseshu Hari; Tom LaPorta
Telecommunication switches implement overload controls to maintain call throughput and delay at acceptable levels during periods of high load. Existing work has mostly focused on controls under sustained overload; these controls do not meet the demands of modern telecommunication systems, where the increased number of services and mobile subscribers often creates fast-changing hot spots. We introduce new algorithms that are designed to be highly reactive to sudden bursts of load. One algorithm is a modified version of RED for signaling traffic that measures the queue size. The second algorithm uses two measures: call acceptance rate and processor occupancy. Using simulations of realistic system models, we compare these new algorithms with each other and with an existing algorithm that uses processor occupancy only. Our simulation results and qualitative arguments show that the combination of acceptance rate and processor occupancy results in a highly reactive and robust signaling overload control.
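The RED-style control described above admits a compact sketch: accept calls with a probability that falls linearly as an averaged queue-length measure climbs between two thresholds. The thresholds, maximum drop probability, and function name below are illustrative assumptions, not the paper's parameters.

```python
def accept_probability(avg_queue, min_th=50.0, max_th=200.0, p_max=0.8):
    """RED-style call acceptance for signaling overload (a sketch).

    Below min_th every call is accepted; above max_th every call is
    rejected; in between, the drop probability rises linearly toward
    p_max, so the control reacts smoothly as the measured queue grows.
    """
    if avg_queue <= min_th:
        return 1.0
    if avg_queue >= max_th:
        return 0.0
    drop = p_max * (avg_queue - min_th) / (max_th - min_th)
    return 1.0 - drop
```

The paper's second algorithm would replace the queue measure with a combination of call acceptance rate and processor occupancy; the same accept-or-reject shape still applies.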
Controlled Clinical Trials | 2000
Jianjian Gong; José C. Pinheiro; David L. DeMets
Clinical trials generally include several outcome measures of interest for assessing treatment efficacy and harm. Traditionally, a single measure, the primary outcome, is selected and used as the basis for the design, including sample size and power. Secondary outcomes are then generally ordered with respect to their clinical relevance and importance. While this has become the traditional paradigm, recent trials have suggested the need for additional approaches. In this setting, two outcomes are viewed as key, either one being sufficient for proof of efficacy, but with an ordering of preference. The basic question, in such cases, is how to control the overall significance level for the trial. We describe and compare two methods for testing primary and secondary endpoints, accounting for their hierarchical nature, that is, the ordering of preference. Both methods are sequential, in the sense that the secondary endpoint is only tested when the primary outcome fails to reach significance. The first method uses a global test for the combination of the primary and secondary endpoints, while the second uses a partial Bonferroni correction. Simulation results indicate that the Bonferroni adjustment method performs as well as the global test method in most cases, and even better in some cases.
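The sequential scheme can be sketched as a small decision function. The particular alpha split below is an illustrative assumption; the paper's global test and partial Bonferroni correction allocate the significance level differently.

```python
def sequential_test(p_primary, p_secondary, alpha=0.05, alpha_primary=0.04):
    """Hierarchical endpoint testing (a simplified Bonferroni-style
    sketch, not the paper's exact procedure).

    The primary endpoint is tested first at level alpha_primary; only
    when it fails to reach significance is the secondary endpoint
    tested, at the remaining alpha - alpha_primary, so the overall
    significance level stays bounded by alpha.
    """
    if p_primary <= alpha_primary:
        return "primary"
    if p_secondary <= alpha - alpha_primary:
        return "secondary"
    return "none"
```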
knowledge discovery and data mining | 2001
Diane Lambert; José C. Pinheiro
Transaction data can arrive at a ferocious rate in the order that transactions are completed. The data contain an enormous amount of information about customers, not just transactions, but extracting up-to-date customer information from an ever changing stream of data and mining it in real-time is a challenge. This paper describes a statistically principled approach to designing short, accurate summaries or signatures of high dimensional customer behavior that can be kept current with a stream of transactions. A signature database can then be used for data mining and to provide approximate answers to many kinds of queries about current customers quickly and accurately, as an empirical study of the calling patterns of 96,000 wireless customers who made about 18 million wireless calls over a three month period shows.
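A signature of this kind can be kept current with a simple exponentially weighted update applied per transaction. The feature names and weight below are assumptions for illustration; the paper develops statistically principled choices of what to summarize and how to weight it.

```python
def update_signature(signature, transaction, w=0.05):
    """Fold one new transaction into a customer signature (a sketch).

    Each numeric feature in the signature drifts toward the new
    transaction's value with weight w, so recent behavior dominates
    while old behavior decays exponentially, and memory stays fixed
    regardless of how many transactions the customer makes. Features
    absent from the transaction are left unchanged.
    """
    return {key: (1.0 - w) * old + w * transaction.get(key, old)
            for key, old in signature.items()}
```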
Journal of the American Statistical Association | 2001
Diane Lambert; José C. Pinheiro; Don X. Sun
In some business applications, the transaction behavior of each customer is tracked separately with a customer signature. A customer's signature for buying behavior, for example, may contain information on the likely place of purchase, value of goods purchased, type of goods purchased, and timing of purchases. The signature may be updated whenever the customer makes a transaction, and, because of storage limitations, the updating may be able to use only the new transaction and the summarized information in the customer's current signature. Standard sequential updating schemes, such as exponentially weighted moving averaging, can be used to update a characteristic that is observed at random, but timing variables like day of the week are not observed at random, and standard sequential estimates of their distributions can be badly biased. This article derives a fast, space-efficient sequential estimator for timing distributions that is based on a Poisson model that has periodic rates that may evolve over time. The sequential estimator is a variant of an exponentially weighted moving average. It approximates the posterior mean under a dynamic Poisson timing model and has good asymptotic properties. Simulations show that it also has good finite sample properties. A telecommunications application to a random sample of 2,000 customers shows that the model assumptions are adequate and that the sequential estimator can be useful in practice.
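For intuition, a much-simplified version of such a sequential timing estimator can be sketched as an exponentially weighted update of per-slot Poisson rates, applied once per completed period rather than per observed call, so that slots with no calls still get updated and the bias described above is avoided. The weight and slot layout are illustrative assumptions; the paper's estimator approximates a posterior mean under a dynamic Poisson model.

```python
def update_rates(rates, slot, count, w=0.1):
    """Update the Poisson rate estimate for one timing slot, e.g. a
    day of the week, after that slot's period has elapsed (a sketch).

    Updating on elapsed periods, including those with count == 0,
    is what keeps the estimate honest for timing variables that are
    not observed at random.
    """
    new = list(rates)
    new[slot] = (1.0 - w) * new[slot] + w * count
    return new

def timing_distribution(rates):
    """Normalize slot rates into a timing distribution."""
    total = sum(rates)
    return [r / total for r in rates]
```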
Archive | 2005
Diane Lambert; José C. Pinheiro; Don X. Sun
knowledge discovery and data mining | 1998
José C. Pinheiro; Don X. Sun
Archive | 2000
Diane Lambert; José C. Pinheiro; Don X. Sun
Archive | 2003
Adiseshu Hari; Mehmet Karaul; Sneha Kumar Kasera; Thomas F. La Porta; Catherine Loader; José C. Pinheiro; Robert A. Latimer