Oscar Peredo
University of Chile
Publications
Featured research published by Oscar Peredo.
Sensors | 2016
Eduardo Graells-Garrido; Oscar Peredo; José M. García
Mobile data has allowed us to sense urban dynamics at scales and granularities not known before, helping urban planners to cope with urban growth. A frequently used kind of dataset is the Call Detail Record (CDR), collected by telecommunication operators for billing purposes. Being an already extracted and processed dataset, it is inexpensive and reliable. A common geographical assumption when working with CDR data is that the position of a device is that of the Base Transceiver Station (BTS) it is connected to: the city is divided into a square grid, or into coverage zones approximated by Voronoi tessellations, and CDR network events are assigned to areas according to BTS position. This geolocation may suffer from non-negligible error in almost all cases. In this paper we propose "Antenna Virtual Placement" (AVP), a method to geolocate mobile devices according to their connections to BTS, based on decoupling antennas from their corresponding BTS according to their physical configuration (height, downtilt, and azimuth). We use AVP applied to CDR data as input for two different tasks: first, from an individual perspective, which places are meaningful for each person? And second, from a global perspective, how can city areas be clustered to understand land use from floating population flows? For both tasks we propose methods that complement or improve prior work in the literature. Our proposed methods are simple, yet not trivial, and work with daily CDR data from the biggest telecommunication operator in Chile. We evaluate them in Santiago, the capital of Chile, with data from working days in June 2015. We find that: (1) AVP improves city coverage of CDR data by geolocating devices to more city areas than standard methods do; (2) we find important places (home and work) for 10% of the sample using just daily information, and recreate the population distribution as well as commuting trips; (3) the daily rhythms of the floating population allow us to cluster areas of the city and explain them from a land use perspective by finding signature points of interest in crowdsourced geographical information. These results have implications for the design of applications based on CDR data, such as recommendation of places and routes, retail store placement, and estimation of transport effects from pollution alerts.
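The abstract does not spell out the AVP geometry, but a plausible minimal reading is that each antenna's position is displaced from its BTS toward the point where its main beam meets the ground, using the antenna's height, azimuth, and downtilt. The sketch below, in planar coordinates, is a hypothetical illustration of that idea, not the paper's actual method.

```python
import math

def virtual_antenna_position(x, y, height_m, azimuth_deg, downtilt_deg):
    # Displace the antenna from its BTS location (x, y) to the point
    # where its tilted main beam intersects the ground plane.
    if downtilt_deg <= 0:
        return x, y   # horizontal or up-tilted beam: keep the BTS position
    d = height_m / math.tan(math.radians(downtilt_deg))
    az = math.radians(azimuth_deg)   # measured clockwise from north
    return x + d * math.sin(az), y + d * math.cos(az)

# Example: a 30 m antenna pointing north-east with 6 degrees of downtilt.
print(virtual_antenna_position(0.0, 0.0, 30.0, 45.0, 6.0))
```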
Archive | 2017
Oscar Peredo; José García; Ricardo Stuven; Julián M. Ortiz
In telecommunications, the billing data of each telephone, known as call detail records (CDRs), form a large and rich database whose information can be geolocated. By analyzing the events logged at each antenna, a set of time series can be constructed measuring the number of voice and data events at each time of the day. One question that can be addressed with these data is the estimation of the movement, or flow, of people in the city, which can be used for prediction and monitoring in transportation or urban planning. In this work, geostatistical estimation techniques such as kriging and inverse distance weighting (IDW) are used to numerically estimate the flow of people. In order to improve the accuracy of the model, secondary information is included in the estimation: a locally varying anisotropy (LVA) field associated with the major streets and roads of the city. With this technique, the flow estimation admits a better quantitative and qualitative interpretation. In terms of storage and computing power, the volume of raw information is extremely large; for that reason, big data technologies are mandatory to query the database. Additionally, if high-resolution grids are used in the estimation, high-performance computing techniques are necessary to speed up the numerical computations with LVA codes. Case studies are shown, using voice/data records from anonymized clients of Telefonica Movistar in Santiago, the capital of Chile.
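Of the two estimators mentioned, IDW is the simpler to sketch. The snippet below (antenna coordinates and event counts are invented) weights each antenna's event count by the inverse squared distance to the target cell; the paper's LVA variant additionally warps distances along the road network, which is not shown here.

```python
import numpy as np

def idw_estimate(xy_known, values, xy_target, power=2.0, eps=1e-12):
    # Weight each sample by the inverse of its distance to the target,
    # raised to `power`; a coincident point returns its sample value.
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d < eps):
        return values[np.argmin(d)]
    w = 1.0 / d**power
    return np.sum(w * values) / np.sum(w)

# Hypothetical antenna coordinates (km) and event counts for one hour.
antennas = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
events = np.array([120.0, 80.0, 95.0, 60.0])
print(idw_estimate(antennas, events, np.array([0.3, 0.4])))
```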
Archive | 2012
Oscar Peredo; Julián M. Ortiz
Multiple-point geostatistical simulation aims at generating realizations that reproduce pattern statistics inferred from some training source, usually a training image. The most widely used algorithm is based on solving a single normal equation at each location using the conditional probabilities inferred during the training process. Simulated annealing offers an alternative implementation that, in addition, makes it possible to match additional statistics and to impose constraints based, for example, on secondary information. Another class of stochastic simulation algorithms, genetic algorithms (GA), can incorporate additional statistics in the same way as simulated annealing. This paper focuses on a sequential implementation of a genetic algorithm to simulate categorical variables reproducing multiple-point statistics, as well as the details of its parallelization and execution on a shared-memory supercomputer. Examples are provided to show the simulated images together with their objective functions and running times.
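As a rough illustration of the genetic-algorithm idea, a toy sketch follows. The single 2x2 pattern frequency used as the objective, the crossover and mutation rates, and the grid size are all invented for the example; the paper matches full multiple-point statistics inferred from a training image.

```python
import numpy as np

rng = np.random.default_rng(0)

def pattern_stats(img):
    # Toy multiple-point statistic: frequency of all-ones 2x2 patterns.
    return np.mean(img[:-1, :-1] & img[1:, :-1] & img[:-1, 1:] & img[1:, 1:])

def objective(img, target):
    return abs(pattern_stats(img) - target)

target = 0.15                # would be inferred from a training image
pop = [rng.integers(0, 2, (32, 32)) for _ in range(20)]
for _ in range(200):
    pop.sort(key=lambda im: objective(im, target))
    parents, children = pop[:10], []
    for _ in range(10):
        a, b = rng.choice(10, size=2, replace=False)
        mask = rng.integers(0, 2, (32, 32)).astype(bool)   # uniform crossover
        child = np.where(mask, parents[a], parents[b])
        flip = rng.random((32, 32)) < 0.01                 # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = parents + children
print(min(objective(im, target) for im in pop))
```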
Archive | 2010
Julián M. Ortiz; Oscar Peredo
Multiple-point geostatistical simulation aims at generating realizations that reproduce pattern statistics inferred from some training source, usually a training image. The most widely used algorithm is based on solving a single normal equation at each location using the conditional probabilities inferred during the training process. Simulated annealing offers an alternative implementation that, in addition, makes it possible to match additional statistics and to impose constraints based, for example, on secondary information. This paper focuses on an innovative implementation of simulated annealing to simulate categorical variables reproducing multiple-point statistics. It is based on a well-known paradigm in computer science, namely, speculative computing. In simulated annealing, categories are initially randomly distributed. Nodes are visited iteratively and a perturbation is proposed to bring the distribution of the categories closer to some target statistics. A decision is made to accept or conditionally reject the change, depending on an objective function that must approach zero to match the target statistics. Rejection occurs with a probability that changes during the simulation process, as defined in the annealing schedule. Speculative computing consists of using multiple processes in parallel to pre-calculate the next step of the simulation in both situations: accepting or rejecting the change. While the decision is made in the first process, a second level of two processes calculates the two possible cases, and subsequent levels can also be initiated. Once the decision is made, processes that do not conform to it are dropped, and speculation about other possible perturbations at the current simulation stage is initiated. This implementation of simulated annealing can speed up the process significantly, making the algorithm a reasonable alternative to current methods. An example using a geologic data set is provided to demonstrate the improvements achieved and the potential this method has for larger models. Some future work is also proposed.
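To fix ideas, a minimal, non-speculative annealing loop is sketched below; the objective, perturbation, and cooling schedule are toy stand-ins, not the paper's. The speculative variant would launch parallel processes to pre-compute both the accept and reject continuations of each step and discard the branch that does not match the actual decision.

```python
import copy
import math
import random

def anneal(state, objective, perturb, t0=1.0, cooling=0.995, steps=5000):
    # Plain (non-speculative) simulated annealing with Metropolis acceptance.
    current = objective(state)
    best = current
    t = t0
    for _ in range(steps):
        candidate = perturb(state)
        delta = objective(candidate) - current
        # Accept improvements always; accept worsening moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or random.random() < math.exp(-delta / t):
            state, current = candidate, current + delta
            best = min(best, current)
        t *= cooling
    return state, best

# Toy demo: drive a binary (two-category) sequence toward 50% ones.
def obj(s):
    return abs(sum(s) / len(s) - 0.5)

def pert(s):
    t = copy.copy(s)
    i = random.randrange(len(t))
    t[i] = 1 - t[i]
    return t

print(anneal([0] * 64, obj, pert)[1])
```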
Computers & Geosciences | 2015
Oscar Peredo; Julián M. Ortiz; José R. Herrero
The Geostatistical Software Library (GSLIB) has been used in the geostatistical community for more than thirty years. It was designed as a bundle of sequential Fortran codes, and today it is still in use by many practitioners and researchers. Despite its widespread use, few attempts have been reported to bring this package into the multi-core era. Using all CPU resources, GSLIB algorithms can handle large datasets and grids in compute- and memory-intensive applications. In this work, a methodology is presented to accelerate GSLIB applications using code optimization and hybrid parallel processing, specifically for compute-intensive applications. Minimal code modifications are added, decreasing as much as possible the elapsed execution time of the studied routines. If multi-core processing is available, the user can activate OpenMP directives to speed up the execution using all resources of the CPU. If multi-node processing is available, the execution is enhanced using MPI messages between the compute nodes. Four case studies are presented: experimental variogram calculation, kriging estimation, and sequential Gaussian and indicator simulation. For each application, three scenarios (small, large and extra large) are tested using a desktop environment with 4 CPU cores and a multi-node server with 128 CPU nodes. Elapsed times, speedup and efficiency results are shown.
Highlights:
- This work is part of an effort to accelerate geostatistical simulation codes.
- We apply acceleration techniques to a package of legacy geostatistical codes (GSLIB).
- The acceleration techniques are code optimization and hybrid OpenMP/MPI parallelization.
- Accelerations were applied to variogram calculation, kriging and sequential simulation.
- Elapsed time and speedup results are shown.
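GSLIB itself is Fortran, and the paper's acceleration relies on OpenMP directives and MPI messages. As a loose, language-shifted analogue of the hybrid scheme, the mpi4py sketch below distributes independent realizations across MPI ranks and gathers results on rank 0; NumPy's vectorized kernels stand in for the intra-node OpenMP threading. The realization count and per-realization work are placeholders.

```python
# Run with e.g.: mpirun -n 4 python hybrid_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_real = 16                               # total number of realizations
my_real = range(rank, n_real, size)       # round-robin over MPI ranks
rng = np.random.default_rng(seed=rank)

# Placeholder per-realization work; NumPy's vectorized kernels stand in
# for the OpenMP-threaded inner loops of the Fortran routines.
local = [rng.standard_normal((100, 100)).mean() for _ in my_real]

all_results = comm.gather(local, root=0)  # collect on rank 0
if rank == 0:
    flat = [x for part in all_results for x in part]
    print(len(flat), "realizations done")
```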
Parallel Computing | 2014
Oscar Peredo; Julián M. Ortiz; José R. Herrero; Cristóbal Samaniego
One of the main difficulties in using multiple-point statistical (MPS) simulation based on annealing techniques or genetic algorithms is the excessive amount of time and memory that must be spent in order to achieve convergence. In this work we propose code optimizations and parallelization schemes for a genetic-algorithm-based MPS code, with the aim of speeding up the execution time. The code optimizations involve reducing cache misses in array accesses, avoiding branching instructions, and increasing the locality of the accessed data. The hybrid parallelization scheme combines fine-grain parallelization of loops using a shared-memory programming model (OpenMP) with a coarse-grain distribution of load among several computational nodes using a distributed-memory programming model (MPI). Convergence, execution time and speed-up results are presented using 2D training images of sizes 100 × 100 × 1 and 1000 × 1000 × 1 on a distributed-shared memory supercomputing facility.
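The data-locality point can be felt even from Python: traversing a large row-major array in memory order touches caches far more gently than striding across it. A small timing sketch (array size and timings are illustrative, and results vary by machine):

```python
import time
import numpy as np

a = np.random.rand(5000, 5000)   # ~200 MB of doubles, row-major (C order)

t0 = time.perf_counter()
rows = sum(a[i, :].sum() for i in range(a.shape[0]))   # contiguous access
t1 = time.perf_counter()
cols = sum(a[:, j].sum() for j in range(a.shape[1]))   # strided access
t2 = time.perf_counter()
print(f"row-major: {t1 - t0:.3f}s   column-major: {t2 - t1:.3f}s")
```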
Complexity | 2018
José García; Francisco Altimiras; Alvaro Peña; Gino Astorga; Oscar Peredo
The progress of metaheuristic techniques, big data, and the Internet of Things creates opportunities for performance improvements in complex industrial systems. This article explores the application of big data techniques to the implementation of metaheuristic algorithms, with the purpose of applying them to decision-making in industrial processes. The exploration evaluates the quality of the results and the convergence times of the algorithm under different conditions on the number of solutions and the processing capacity: under what conditions can we obtain acceptable results in an adequate number of iterations? In this article, we propose a binary cuckoo search algorithm using the MapReduce programming paradigm, implemented in the Apache Spark tool. The algorithm is applied to different instances of the crew scheduling problem. The experiments show that the conditions for obtaining suitable results and iteration counts are specific to each problem and are not always satisfactory.
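A sketch of how one iteration of a binary cuckoo search can be expressed in the MapReduce style on Apache Spark is given below. The sigmoid transfer function, the heavy-tailed step stand-in, the toy fitness, and all rates are illustrative choices, not the paper's exact operators.

```python
import math
import random
from pyspark import SparkContext

N_BITS = 32

def fitness(sol):
    # Toy objective: maximize the number of ones.
    return sum(sol)

def levy_move(sol):
    # Perturb a nest bit by bit: sigmoid transfer of a heavy-tailed step.
    out = []
    for bit in sol:
        step = random.gauss(0, 1) / (abs(random.gauss(0, 1)) ** 0.5 + 1e-9)
        p = 1.0 / (1.0 + math.exp(-step))
        out.append(1 - bit if random.random() < 0.1 * p else bit)
    return out

sc = SparkContext(appName="binary-cuckoo-sketch")
nests = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(100)]
for _ in range(10):
    # Map step: move every nest in parallel, keep the better of old/new.
    nests = sc.parallelize(nests).map(
        lambda s: max(s, levy_move(s), key=fitness)).collect()
    # Abandon the worst quarter of nests, as in cuckoo search.
    nests.sort(key=fitness, reverse=True)
    k = len(nests) // 4
    nests[-k:] = [[random.randint(0, 1) for _ in range(N_BITS)]
                  for _ in range(k)]
print(max(map(fitness, nests)))
sc.stop()
```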
Mathematical Geosciences | 2016
Oscar Peredo; Julián M. Ortiz; Oy Leuangthong
Moving average simulation can be summarized as a convolution between a spatial kernel and a white noise random field. The kernel can be calculated once the variogram model is known. An inverse approach to moving average simulation is proposed, in which the kernel is determined from the experimental variogram map in a non-parametric way, so that no explicit variogram modeling is required. The omission of structural modeling from the simulation workflow may be particularly attractive when spatial inference is challenging and/or practitioners lack confidence in this task. A non-linear inverse problem is formulated to solve the problem of discrete kernel weight estimation. The objective function is the squared Euclidean distance between the experimental variogram values and the convolution of a stationary random field with Dirac covariance and the simulated kernel. The isotropy of the kernel weights is imposed as a linear constraint in the problem, together with lower and upper bounds on the weight values. Implementation details and examples are presented to demonstrate the performance and potential extensions of the method.
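The forward model being inverted is easy to state in code: convolve an isotropic, compactly supported kernel with white noise (the "stationary random field with Dirac covariance"). A minimal sketch of that forward step, with an invented triangular kernel; the paper's contribution, fitting the kernel to an experimental variogram map, is not shown.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(42)

# Isotropic, compactly supported kernel on a 21 x 21 stencil
# (a triangular shape invented for the example).
r = np.hypot(*np.mgrid[-10:11, -10:11])
kernel = np.maximum(0.0, 1.0 - r / 10.0)
kernel /= np.sqrt(np.sum(kernel**2))      # unit-variance normalization

# White noise plays the role of the field with Dirac covariance.
noise = rng.standard_normal((200, 200))
field = fftconvolve(noise, kernel, mode="same")
print(field.std())                        # ~1 by construction
```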
IEEE International Conference on High Performance Computing, Data, and Analytics | 2014
Felipe Navarro; Carlos González; Oscar Peredo; Gerson Morales; Alvaro Egaña; Julián M. Ortiz
A wide range of scientific computing applications still rely on algorithms provided by large legacy codes or libraries that rarely profit from multi-core architectures and are hardly ever distributed. In this paper we propose a flexible strategy for the execution of such legacy codes, identifying the main modules involved in the process. The key technologies involved and a tentative implementation are described, allowing the reader to understand the challenges and limitations that surround this problem. Finally, a case study is presented for a large-scale, single-threaded, stochastic geostatistical simulation, in the context of mining and geological modeling applications. Successful execution, running time and speedup results are shown using a workstation cluster of up to eleven nodes.
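One plausible reading of the "wrap and farm out" strategy, sketched with placeholders (the binary name legacy_sim and the parameter-file format are hypothetical): run the single-threaded legacy executable once per realization, each in its own working directory, dispatching the runs in parallel.

```python
import pathlib
import subprocess
from concurrent.futures import ThreadPoolExecutor

BINARY = pathlib.Path("legacy_sim").resolve()   # hypothetical legacy executable

def run_realization(seed: int) -> int:
    # Isolate each run in its own directory with its own parameter file.
    work = pathlib.Path(f"run_{seed:04d}")
    work.mkdir(exist_ok=True)
    (work / "params.txt").write_text(f"seed {seed}\n")   # hypothetical format
    return subprocess.run([str(BINARY), "params.txt"], cwd=work).returncode

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=8) as pool:
        codes = list(pool.map(run_realization, range(32)))
    print(codes)
```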
WWW '18: Companion Proceedings of The Web Conference 2018 | 2018
Eduardo Graells-Garrido; Diego Caro; Omar Miranda; Rossano Schifanella; Oscar Peredo
People fulfill their informational needs through smartphones; however, little is known about how the urban fabric and the activities that take place in it affect the usage of mobile applications. Starting from an anonymized dataset of Deep Packet Inspection (DPI) data from the largest telecommunications operator in Chile, we focus on the following questions: What are the most popular applications used in the city? Where are they spatially clustered? When is an application more frequently used? And how do the urban context and mobility patterns relate to application usage? We observed that specific applications present high spatial clustering, while the most popular services are geographically dispersed throughout the entire city. Clusters appear in places with a high floating population; however, hotspots vary in space depending on the application. Interestingly, we found that commuting plays an important role, both in terms of rush hours and transportation infrastructure. We present a discussion of these results, focusing on how physical space and the daily commuting routine affect the pattern of data consumption and represent an important aspect of studies of mobile users' behavior.