Publications


Featured research published by Simon Urbanek.


Communications of the ACM | 2013

Human mobility characterization from cellular network data

Richard A. Becker; Ramón Cáceres; Karrie J. Hanson; Sibren Isaacman; Ji Meng Loh; Margaret Martonosi; James Rowland; Simon Urbanek; Alexander Varshavsky; Chris Volinsky

Anonymous location data from cellular phone networks sheds light on how people move around on a large scale.


IEEE Pervasive Computing | 2011

A Tale of One City: Using Cellular Network Data for Urban Planning

Richard A. Becker; Ramón Cáceres; Karrie J. Hanson; Ji Meng Loh; Simon Urbanek; Alexander Varshavsky; Chris Volinsky

Cellular data from call detail records can help urban planners better understand city dynamics. The authors use CDR data to analyze people flow in and out of a suburban city near New York City.


Ubiquitous Computing | 2011

Route classification using cellular handoff patterns

Richard A. Becker; Ramón Cáceres; Karrie J. Hanson; Ji Meng Loh; Simon Urbanek; Alexander Varshavsky; Chris Volinsky

Understanding the utilization of city roads is important for urban planners. In this paper, we show how to use handoff patterns from cellular phone networks to identify which routes people take through a city. Specifically, this paper makes three contributions. First, we show that cellular handoff patterns on a given route are stable across a range of conditions and propose a way to measure stability within and between routes using a variant of Earth Mover's Distance. Second, we present two accurate classification algorithms for matching cellular handoff patterns to routes: one requires test drives on the routes while the other uses signal strength data collected by high-resolution scanners. Finally, we present an application of our algorithms for measuring relative volumes of traffic on routes leading into and out of a specific city, and validate our methods using statistics published by a state transportation authority.
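As a rough illustration of the stability idea (a sketch only; the paper defines its own EMD variant over real handoff logs, and the cell identifiers and counts below are invented), one can compare the handoff distributions of two drives with an Earth Mover's Distance:

```python
# Compare two drives' handoff patterns with an Earth Mover's Distance.
# Illustrative only: the paper uses its own EMD variant over real cellular
# handoff logs; the cell IDs and counts here are hypothetical.
from collections import Counter

import numpy as np
from scipy.stats import wasserstein_distance

def handoff_histogram(handoffs, cells):
    """Empirical distribution of handoffs over a fixed ordering of cells."""
    counts = Counter(handoffs)
    hist = np.array([counts[c] for c in cells], dtype=float)
    return hist / hist.sum()

# Hypothetical handoff sequences from two drives, possibly on the same route.
drive_a = ["c1", "c2", "c2", "c3", "c5"]
drive_b = ["c1", "c2", "c3", "c3", "c5"]
cells = sorted(set(drive_a) | set(drive_b))
positions = np.arange(len(cells))  # treat cells as ordered positions along the route

d = wasserstein_distance(positions, positions,
                         handoff_histogram(drive_a, cells),
                         handoff_histogram(drive_b, cells))
print(f"EMD between drives: {d:.3f}")  # small values suggest the same route
```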


International Conference on Data Mining | 2012

Computational Television Advertising

Suhrid Balakrishnan; Sumit Chopra; David Applegate; Simon Urbanek

Ever wonder why that Kia ad ran during Iron Chef? Traditional advertising methodology on television is a fascinating mix of marketing, branding, measurement, and predictive modeling. While still a robust business, it is at risk with the recent growth of online and time-shifted (recorded) television. A particular issue is that traditional methods for television advertising are far less efficient than their counterparts in the online world, which employ highly sophisticated computational techniques. This paper formalizes an approach to eliminate some of these inefficiencies by recasting the process of television advertising media campaign generation in a computational framework. We describe efficient mathematical approaches to the task of finding optimal campaigns for specific target audiences. In two case studies, our campaigns report gains in key operational metrics of up to 56% compared to campaigns generated by traditional methods.
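The paper's optimization formulation is not reproduced here; as a toy stand-in for "finding optimal campaigns for specific target audiences", the sketch below poses a small linear program that chooses (fractional) airings per ad slot to maximize target-audience impressions under a budget. All slot data, costs, and caps are invented for illustration.

```python
# Toy linear program in the spirit of computational campaign generation:
# pick (fractional) airings per slot to maximize target-audience impressions
# subject to a budget. All numbers are hypothetical; the paper's actual
# formulation and constraints are richer than this sketch.
import numpy as np
from scipy.optimize import linprog

target_impressions = np.array([120.0, 80.0, 200.0, 60.0])  # per airing, in thousands
cost_per_airing = np.array([9.0, 4.0, 15.0, 3.0])           # in $ thousands
budget = 30.0
max_airings = 3.0  # cap per slot

# linprog minimizes, so negate the objective to maximize impressions.
res = linprog(c=-target_impressions,
              A_ub=[cost_per_airing], b_ub=[budget],
              bounds=[(0, max_airings)] * len(cost_per_airing))

print("airings per slot:", np.round(res.x, 2))
print("target impressions:", round(-res.fun, 1), "thousand")
```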


Archive | 2008

Visualizing Trees and Forests

Simon Urbanek

Tree-based models provide an appealing alternative to conventional models for many reasons. They are more readily interpretable, can handle both continuous and categorical covariates, can accommodate data with missing values, provide an implicit variable selection, and model interactions well. The most frequently used tree-based models are classification, regression, and survival trees. Visualization is important in conjunction with tree models because in their graphical form they are easily interpretable even without special knowledge. Interpretation of decision trees displayed as a hierarchy of decision rules is highly intuitive.
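As a minimal, present-day illustration of displaying a classification tree as a hierarchy of decision rules (not the chapter's own graphics, which cover far richer tree and forest visualizations), scikit-learn can render a fitted tree's rules as text:

```python
# Minimal example of displaying a classification tree as a hierarchy of
# decision rules (not the visualizations from the chapter itself).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Text rendering of the decision rules; plot_tree() gives a graphical version.
print(export_text(tree, feature_names=list(iris.feature_names)))
```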


International Conference on Computer Communications | 2017

Can you find me now? Evaluation of network-based localization in a 4G LTE network

Robert Margolies; Richard A. Becker; Simon D. Byers; Supratim Deb; Rittwik Jana; Simon Urbanek; Chris Volinsky

User location is of critical importance to cellular network operators. It is often used for network capacity planning and to aid in the analysis of service and network diagnostics. However, existing localization techniques rely on user-provided information (e.g., Angle-of-Arrival) that is not available to the operator, and often require a significant effort to collect training data. Our main contribution is the design and evaluation of the Network-Based Localization (NBL) System for localizing a user in a 4G LTE network. The NBL System consists of two stages. In an offline stage, we develop RF coverage maps based on a large-scale crowd-sourced channel measurement campaign. Then, in an online stage, we present a localization algorithm to quickly match RF measurements (which are already collected as part of normal network operation) to coverage map locations. The system is more practical than related works, as it does not make any assumptions about user mobility, nor does it require expensive manual training measurements. Despite the realistic assumptions, our extensive evaluations in a national 4G LTE network show that the NBL System achieves a localization accuracy comparable to related works (i.e., a median accuracy of 5% of the cell's coverage region).
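A highly simplified sketch of the online matching step described above, assuming a precomputed coverage map; the grid, signal values, and nearest-neighbor rule are illustrative stand-ins, not the NBL System's actual algorithm:

```python
# Simplified fingerprint matching in the spirit of the NBL online stage:
# match an observed RF measurement vector to the closest location in a
# precomputed coverage map. The map, grid, and distance metric here are
# illustrative assumptions, not the system's actual algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Offline stage (assumed): coverage map of expected signal strengths (dBm)
# from 3 cells at each of 100 candidate grid locations.
grid_locations = rng.uniform(0, 1000, size=(100, 2))   # x, y in meters
coverage_map = rng.uniform(-120, -70, size=(100, 3))   # RSRP per cell

# Online stage: a user's reported measurement vector.
measurement = np.array([-95.0, -110.0, -80.0])

# Nearest neighbor in signal space -> estimated location.
best = np.argmin(np.linalg.norm(coverage_map - measurement, axis=1))
print("estimated position (m):", grid_locations[best])
```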


Visual Analytics Science and Technology | 2015

Collaborative visual analysis with RCloud

Stephen C. North; Carlos Scheidegger; Simon Urbanek; Gordon Woodhull

Consider the emerging role of data science teams embedded in larger organizations. Individual analysts work on loosely related problems and must share their findings with each other and the organization at large, moving results from exploratory data analyses (EDA) into automated visualizations, diagnostics, and reports deployed for wider consumption. There are two problems with current practice. First, there are gaps in this workflow: EDA is performed with one set of tools, and automated reports and deployments with another. Second, these environments often assume a single-developer perspective, while data science teams would benefit from easier sharing of scripts and data feeds, experiments, annotations, and automated recommendations, which are well beyond what traditional version control systems provide. We contribute and justify the following three requirements for systems built to support current data science teams and users: discoverability, technology transfer, and coexistence. In addition, we contribute the design and implementation of RCloud, a system that supports the requirements of collaborative data analysis, visualization, and web deployment. About 100 people used RCloud over two years. We report on interviews with some of these users and discuss design decisions, tradeoffs, and limitations in comparison to other approaches.


Database and Expert Systems Applications | 2017

A Tool for Statistical Analysis on Network Big Data

Carlos Ordonez; Theodore Johnson; Divesh Srivastava; Simon Urbanek

Due to advances in parallel file systems for big data (i.e., HDFS) and larger-capacity hardware (multicore CPUs, large RAM), it is now feasible to manage and query network data in a parallel DBMS supporting SQL, but performing statistical analysis remains a challenge. On the statistics side, the R language is popular, but it presents important limitations: R is limited by main memory, R works in a different address space from query processing, R cannot analyze large disk-resident data sets efficiently, and R has no data management capabilities. Moreover, some R libraries allow R to work in parallel, but without data management capabilities. Considering the challenges and limitations described above, we present a system that allows combining SQL queries and R functions in a seamless manner. We argue that a parallel DBMS and the R runtime are two different systems that benefit from a low-level integration. Our parallel DBMS is built on top of HDFS, programmed in Java and C++, with a flexible scale-out architecture, whereas R is programmed purely in C. The user or developer can make calls in both directions: (1) R calling SQL, to evaluate analytic queries or retrieve data from materialized views (transferring result tables in RAM in a streaming fashion and analyzing them in R), and (2) SQL calling R, allowing SQL to convert relational tables to matrices or vectors and to make complex computations on them. We give a summary of network monitoring tasks at AT&T and present specific programming examples, showing language calls in both directions (i.e., R calls SQL, SQL calls R).
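A stand-in sketch of the two call directions the paper describes, using Python and SQLite in place of R and the parallel DBMS (which this sketch does not attempt to reproduce); the table names and data are hypothetical:

```python
# Stand-in sketch of the two call directions: (1) the analysis language
# calling SQL, and (2) SQL calling a function registered from the analysis
# language. Python/SQLite replace R and the parallel DBMS for illustration.
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flows (src TEXT, dst TEXT, bytes INTEGER)")
conn.executemany("INSERT INTO flows VALUES (?, ?, ?)",
                 [("a", "b", 1200), ("a", "c", 35000), ("b", "c", 800)])

# (1) Analysis language calling SQL: pull an aggregate into the runtime.
rows = conn.execute("SELECT src, SUM(bytes) FROM flows GROUP BY src").fetchall()
print("bytes per source:", rows)

# (2) SQL calling a registered function: expose a math routine to queries.
conn.create_function("log_bytes", 1, lambda b: math.log10(b))
print(conn.execute("SELECT src, dst, log_bytes(bytes) FROM flows").fetchall())
```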


Database and Expert Systems Applications | 2017

Integrating the R Language Runtime System with a Data Stream Warehouse

Carlos Ordonez; Theodore Johnson; Simon Urbanek; Vladislav Shkapenyuk; Divesh Srivastava

Computing mathematical functions or machine learning models on data streams is difficult: a popular approach is to use the R language. Unfortunately, R has important limitations: a dynamic runtime system incompatible with a DBMS, a working set limited by available RAM, and no data management capabilities. On the other hand, SQL is well established for writing queries and managing data, but it is inadequate for mathematical computations. With that motivation in mind, we present a system that enables analysis in R on a time window, where the DBMS continuously inserts new records and propagates updates to materialized views. We explain the low-level integration enabling fast data transfer in RAM between the DBMS query process and the R runtime. Our system enables analytic calls in both directions: (1) R calling SQL, to evaluate streaming queries, transferring output streaming tables and analyzing them with R operators and functions in the R runtime; and (2) SQL calling R, to exploit R mathematical operators and mathematical models, computed in a streaming fashion inside the DBMS. We discuss analytic examples illustrating calls in both directions, and we show experimentally that our system transfers data at streaming speed.
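A toy illustration of the analysis-on-a-time-window idea, far simpler than the DBMS/R integration the paper describes; the window length, metric values, and statistic are arbitrary choices:

```python
# Toy sliding-window analysis as new records arrive, in the spirit of
# (but much simpler than) the streaming integration the paper describes.
from collections import deque
import statistics

WINDOW = 5  # keep the 5 most recent records
window = deque(maxlen=WINDOW)

stream = [12.0, 15.5, 11.2, 30.1, 14.8, 13.9, 55.0, 12.4]  # hypothetical metric
for value in stream:
    window.append(value)              # new record propagated to the window
    if len(window) == WINDOW:
        mean = statistics.fmean(window)
        print(f"window mean after {value}: {mean:.2f}")
```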


Statistical Analysis and Data Mining | 2016

Scatter matrix concordance as a diagnostic for regressions on subsets of data

Michael J. Kane; Bryan Lewis; Sekhar Tatikonda; Simon Urbanek

Linear regression models depend directly on the design matrix and its properties. Techniques that efficiently estimate model coefficients by partitioning the rows of the design matrix are becoming increasingly popular for large-scale problems because they fit well with modern parallel computing architectures. We propose a simple measure of concordance between a design matrix and a subset of its rows that estimates how well the subset captures the variance-covariance structure of the larger data set. We illustrate the use of this measure in a heuristic method for selecting row partition sizes that balance statistical and computational efficiency goals in real-world problems.
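A sketch of the kind of diagnostic this enables, with an important caveat: the normalized Frobenius discrepancy below is an illustrative stand-in, not the concordance measure the paper defines, and the data are simulated:

```python
# Illustrative comparison of a row subset's scatter matrix against the full
# design matrix. The normalized Frobenius distance used here is a stand-in
# diagnostic, not the concordance measure defined in the paper.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(10000, 5))          # simulated design matrix
subset = X[rng.choice(len(X), size=500, replace=False)]

def scatter(A):
    """Mean-centered scatter (covariance-like) matrix, normalized by row count."""
    centered = A - A.mean(axis=0)
    return centered.T @ centered / len(A)

full_s, sub_s = scatter(X), scatter(subset)
discordance = np.linalg.norm(sub_s - full_s) / np.linalg.norm(full_s)
print(f"relative scatter discrepancy: {discordance:.4f}")  # near 0 -> subset captures the structure
```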

Collaboration


Dive into Simon Urbanek's collaborations.

Top Co-Authors


Ji Meng Loh

New Jersey Institute of Technology
