Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Nicolas Poggi is active.

Publication


Featured research published by Nicolas Poggi.


International Conference on User Modeling, Adaptation, and Personalization | 2007

Web Customer Modeling for Automated Session Prioritization on High Traffic Sites

Nicolas Poggi; Toni Moreno; Josep Lluis Berral; Ricard Gavaldà; Jordi Torres

In the Web environment, user identification is becoming a major challenge for admission control systems on high-traffic sites. When a web server is overloaded, there is a significant loss of throughput when we compare finished sessions against the number of responses per second; longer sessions are usually the ones ending in sales, but they are also the most sensitive to load failures. Session-based admission control systems maintain a high QoS for a limited number of sessions, but they do not maximize revenue because they treat all non-logged sessions the same. We present a novel method for learning to assign priorities to sessions according to the revenue they will generate. For this, we use traditional machine learning techniques and Markov-chain models. We train a system to estimate the probability of a user's purchasing intention from their early navigation clicks and other static information. The predictions can be used by admission control systems to prioritize sessions, or to deny them if no resources are available, thus improving sales throughput per unit of time for a given infrastructure. We test our approach on access logs obtained from a high-traffic online travel agency, with promising results.
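To make the prioritization step concrete, here is a minimal sketch in Python of the Markov-chain idea: estimate a session's purchase probability from its early navigation clicks so an admission controller can rank sessions. The page types, training format, and scoring horizon are illustrative assumptions, not the authors' implementation.

from collections import defaultdict

def train_markov(sessions):
    # sessions: list of (page_type_sequence, purchased) pairs from access logs.
    counts = defaultdict(lambda: defaultdict(int))
    for pages, purchased in sessions:
        path = ["START"] + list(pages) + ["BUY" if purchased else "EXIT"]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    # Normalize transition counts into probabilities.
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def purchase_probability(model, clicks, horizon=10):
    # Probability that a session reaches BUY within `horizon` more clicks,
    # starting from the last observed page type (forward simulation).
    state_probs = {clicks[-1] if clicks else "START": 1.0}
    p_buy = 0.0
    for _ in range(horizon):
        nxt = defaultdict(float)
        for state, p in state_probs.items():
            for succ, q in model.get(state, {}).items():
                if succ == "BUY":
                    p_buy += p * q
                elif succ != "EXIT":
                    nxt[succ] += p * q
        state_probs = nxt
    return p_buy

An admission controller could sort active sessions by this score and shed the lowest-scoring ones first as the server approaches overload.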


Computer and Communications Security | 2008

Adaptive distributed mechanism against flooding network attacks based on machine learning

Josep Lluis Berral; Nicolas Poggi; Javier Alonso; Ricard Gavaldà; Jordi Torres; Manish Parashar

Adaptive techniques based on machine learning and data mining are gaining relevance in self-management and self-defense for networks and distributed systems. In this paper, we focus on the early detection and stopping of distributed flooding attacks and network abuses. We extend the framework proposed by Zhang and Parashar (2006) to cooperatively detect and react to abnormal behaviors before the target machine collapses and network performance degrades. In this framework, nodes in an intermediate network share information about their local traffic observations, improving their global traffic perspective. In our proposal, we add to each node the ability to learn independently, so that it reacts differently according to its position in the network and local traffic conditions. In particular, this frees the administrator from having to guess and manually set the parameters distinguishing attacks from non-attacks: such thresholds are now learned from experience or past data. We expect our framework to provide faster detection and higher accuracy against distributed flooding attacks than static filters or single-machine adaptive mechanisms. We show simulations in which we indeed observe a high rate of stopped attacks with minimal disturbance to legitimate users.
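As a rough illustration of the per-node learning described above, here is a minimal Python sketch using a mean-plus-k-sigma rule over recent local traffic; the class and parameter names are assumptions, and the paper's actual learning mechanism may differ.

import statistics

class AdaptiveNode:
    def __init__(self, k=3.0, window=1000):
        self.k = k              # sensitivity: std-devs above normal traffic
        self.window = window    # number of past observations to keep
        self.history = []       # per-interval packet counts seen locally

    def observe(self, packets_per_interval):
        self.history.append(packets_per_interval)
        self.history = self.history[-self.window:]

    def threshold(self):
        # Learned from experience rather than set manually by an admin.
        if len(self.history) < 30:
            return float("inf")  # too little data: stay permissive
        mu = statistics.fmean(self.history)
        sigma = statistics.pstdev(self.history)
        return mu + self.k * sigma

    def is_attack(self, packets_per_interval, neighbor_reports=()):
        # Combine the local view with alarms shared by neighboring nodes,
        # mirroring the cooperative detection idea of the framework.
        local_alarm = packets_per_interval > self.threshold()
        return local_alarm or sum(neighbor_reports) > len(neighbor_reports) / 2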


Business Process Management | 2013

Business process mining from e-commerce web logs

Nicolas Poggi; Vinod Muthusamy; David Carrera; Rania Khalaf

The dynamic nature of the Web and its increasing importance as an economic platform create the need for new methods and tools for business efficiency. Current Web analytics tools do not provide the necessary abstracted view of the underlying customer processes and critical paths of site-visitor behavior. Such information can offer insights for businesses to react effectively and efficiently. We propose applying Business Process Management (BPM) methodologies to e-commerce Website logs, and present the challenges, results, and potential benefits of such an approach. We use the Business Process Insight (BPI) platform, a collaborative process-intelligence toolset that implements the discovery of loosely-coupled processes and includes novel process mining techniques suitable for the Web. Experiments are performed on custom click-stream logs from a large online travel and booking agency. We first compare Web clicks and BPM events, and then present a methodology to classify and transform URLs into events. We evaluate traditional and custom process mining algorithms to extract business models from real-life Web data. The resulting models present an abstracted view of the relation between pages, exit points, and critical paths taken by customers. Compared to current state-of-the-art Web analytics, such models represent an important improvement and aid high-level decision making and optimization of e-commerce sites.
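The URL-to-event transformation step can be pictured with a short Python sketch; the URL patterns and activity names below are invented for illustration and are not the BPI platform's API.

import re

URL_RULES = [                       # ordered: first match wins
    (re.compile(r"/search"),   "Search"),
    (re.compile(r"/product/"), "ViewProduct"),
    (re.compile(r"/cart"),     "AddToCart"),
    (re.compile(r"/checkout"), "Checkout"),
    (re.compile(r"/confirm"),  "PurchaseConfirmed"),
]

def clicks_to_events(click_log):
    # click_log: iterable of (session_id, timestamp, url) tuples.
    # Returns a BPM-style event log of (case_id, activity, timestamp).
    events = []
    for session_id, ts, url in click_log:
        for pattern, activity in URL_RULES:
            if pattern.search(url):
                events.append((session_id, activity, ts))
                break           # unmatched URLs are dropped as noise
    return events

The resulting event log can then be handed to a process mining algorithm (for instance, building a directly-follows graph) to recover the customer paths and exit points discussed above.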


International Conference on Autonomic Computing | 2008

Tailoring Resources: The Energy Efficient Consolidation Strategy Goes Beyond Virtualization

Jordi Torres; David Carrera; Vicenç Beltran; Nicolas Poggi; Kevin Hogan; Josep Lluis Berral; Ricard Gavaldà; Eduard Ayguadé; Toni Moreno; Jordi Guitart

Virtualization and consolidation are two complementary techniques widely adopted in a global strategy to reduce system management complexity. In this paper we show how two simple and well-known techniques can be combined to dramatically increase the energy efficiency of a virtualized and consolidated data center. This result is obtained by introducing a new approach to the consolidation strategy that allows an important reduction in the number of active nodes required to process a web workload without degrading the offered service level. Furthermore, when the system eventually gets overloaded and no energy can be saved without losing performance, we show how these techniques can still improve the overall value obtained from the workload. The two techniques are memory compression and request discrimination; they were separately studied and validated in previous work and are now combined in a joint effort. Our results indicate that an important improvement can be achieved by deciding not only how resources are allocated, but also how they are used. Moreover, we believe this serves as an illustrative example of a new way of management: tailoring the resources to meet high-level energy-efficiency goals.
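A back-of-the-envelope Python sketch of the combined decision follows; the capacity model, compression gain, and admission cutoff are illustrative assumptions, not the paper's measurements.

import math

def active_nodes_needed(load_rps, base_capacity_rps, compression_gain=1.3):
    # Memory compression effectively raises per-node capacity, so fewer
    # nodes must stay powered on for the same web workload.
    effective_capacity = base_capacity_rps * compression_gain
    return math.ceil(load_rps / effective_capacity)

def admit(request_value, utilization, cutoff=0.9):
    # Request discrimination: once the consolidated system saturates,
    # keep high-value sessions and shed the rest instead of adding nodes.
    return utilization < cutoff or request_value > 0.5

print(active_nodes_needed(load_rps=9000, base_capacity_rps=1000))  # 7 nodes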


IEEE International Symposium on Workload Characterization | 2010

Characterization of workload and resource consumption for an online travel and booking site

Nicolas Poggi; David Carrera; Ricard Gavaldà; Jordi Torres; Eduard Ayguadé

Online travel and ticket booking is one of the top E-Commerce industries. Because travel sites offer a mix of products (flights, hotels, tickets, restaurants, activities, and vacation packages), they rely on a wide range of supporting technologies (JavaScript, AJAX, XML, B2B Web services, caching, search algorithms, and affiliation), resulting in a very rich and heterogeneous workload. Moreover, visits to travel sites vary greatly depending on the time of day, season, promotions, events, and linking, creating bursty traffic and making capacity planning a challenge. It is therefore of great importance to understand how users and crawlers interact on travel sites and how they affect server resources, in order to devise cost-effective infrastructures and improve the Quality of Service for users. In this paper we present a detailed workload and resource-consumption characterization of the web site of a top national Online Travel Agency. Characterization is performed on server logs, including both HTTP data and the resource consumption of the requests, as well as the server load status during execution. From the dataset we characterize user sessions and their patterns, and how response time is affected as the load on Web servers increases. We provide a fine-grained analysis by differentiating types of request, time of day, products, and the resource requirements of each. Results show that the workload is bursty, as expected; that day and night traffic exhibit different request-type mixes; that user session lengths cover a wide range of durations; that response time grows proportionally to server load; and that the response time of external data providers also increases at peak hours, among other findings. Such results can be useful for optimizing infrastructure costs, improving QoS for users, and developing realistic workload generators for similar applications.
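The kind of per-session and day/night analysis described can be sketched in a few lines of Python; the log tuple format below is an assumption, not the agency's actual schema.

from collections import Counter, defaultdict

def characterize(entries):
    # entries: iterable of (session_id, hour, request_type, response_ms).
    session_len = Counter()                       # clicks per session
    mix = {"day": Counter(), "night": Counter()}  # request-type mix
    resp_by_hour = defaultdict(list)
    for sid, hour, rtype, ms in entries:
        session_len[sid] += 1
        mix["day" if 8 <= hour < 22 else "night"][rtype] += 1
        resp_by_hour[hour].append(ms)
    mean_resp = {h: sum(v) / len(v) for h, v in resp_by_hour.items()}
    return session_len, mix, mean_resp

Plotting mean_resp against per-hour request counts would expose the response-time growth with load reported above.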


International Conference on Big Data | 2014

ALOJA: A systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness

Nicolas Poggi; David Carrera; Aaron Call; Sergio Mendoza; Yolanda Becerra; Jordi Torres; Eduard Ayguadé; Fabrizio Gagliardi; Jesús Labarta; Rob Reinauer; Nikola Vujic; Daron Green; José A. Blakeley

This article presents the ALOJA project, an initiative to produce mechanisms for the automated characterization of the cost-effectiveness of Hadoop deployments, and reports its initial results. ALOJA is the latest phase of a long-term collaborative engagement between BSC and Microsoft which, over the past six years, has explored a range of different aspects of computing systems, software technologies, and performance profiling. While Hadoop has become the de facto platform for Big Data deployments over the last five years, little is still understood about how the different software layers and hardware deployment options affect its performance. Early ALOJA results show that Hadoop's runtime performance, and therefore its price, is critically affected by relatively simple software and hardware configuration choices, e.g., the number of mappers, compression, or volume configuration. Project ALOJA presents a vendor-neutral repository featuring over 5,000 Hadoop runs, a test bed, and tools to evaluate the cost-effectiveness of different hardware, parameter tunings, and Cloud services for Hadoop. As few organizations have the time or performance-profiling expertise, we expect our growing repository will help Hadoop customers meet their Big Data application needs. ALOJA seeks to provide both knowledge and an online service with which users can make better-informed configuration choices for their Hadoop compute infrastructure, whether on-premises or cloud-based. The initial version of ALOJA's Web application and sources are available at http://hadoop.bsc.es.
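As a minimal sketch of the cost-effectiveness comparison the repository enables, assuming a simple cost-per-execution metric of execution time times hourly price (the run records below are invented examples, not ALOJA data):

def cost_effectiveness(runs):
    # runs: list of dicts with 'config', 'exec_time_s', 'usd_per_hour'.
    # Returns configurations sorted by cost per execution, cheapest first.
    scored = [(r["exec_time_s"] / 3600 * r["usd_per_hour"], r["config"])
              for r in runs]
    return sorted(scored)

runs = [
    {"config": "8 mappers, no compression", "exec_time_s": 1200, "usd_per_hour": 2.0},
    {"config": "8 mappers, compression on", "exec_time_s": 900,  "usd_per_hour": 2.0},
]
for cost, config in cost_effectiveness(runs):
    print(f"${cost:.2f}  {config}")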


Knowledge Discovery and Data Mining | 2015

ALOJA-ML: A Framework for Automating Characterization and Knowledge Discovery in Hadoop Deployments

Josep Lluis Berral; Nicolas Poggi; David Carrera; Aaron Call; Rob Reinauer; Daron Green

This article presents ALOJA-Machine Learning (ALOJA-ML), an extension to the ALOJA project that uses machine learning techniques to interpret Hadoop benchmark performance data and guide performance tuning; here we detail the approach, the efficacy of the models, and initial results. The ALOJA-ML project is the latest phase of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness of Big Data deployments, focusing on Hadoop. Hadoop presents a complex execution environment, where costs and performance depend on a large number of software (SW) configurations and on multiple hardware (HW) deployment choices. Recently the ALOJA project presented an open, vendor-neutral repository featuring over 16,000 Hadoop executions. These results are accompanied by a test bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameter tunings, and Cloud services. Despite early success within ALOJA from expert-guided benchmarking, it became clear that a genuinely comprehensive study requires automating the modeling procedures to allow a systematic analysis of large and resource-constrained search spaces. ALOJA-ML provides such an automated system, enabling knowledge discovery by modeling Hadoop executions from observed benchmarks across a broad set of configuration parameters. The resulting empirically derived performance models can be used to forecast the execution behavior of various workloads; they allow a priori prediction of execution times for new configurations and HW choices, and they offer a route to model-based anomaly detection. In addition, these models can guide benchmarking exploration efficiently by automatically prioritizing candidate future benchmark tests. Insights from ALOJA-ML's models can be used to reduce operational time on clusters, speed up the data acquisition and knowledge discovery process, and, importantly, reduce running costs. Beyond the methodology presented in this work, the community can benefit from the ALOJA data sets, framework, and derived insights to improve the design and deployment of Big Data applications.
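A minimal sketch of the modeling step, using a generic scikit-learn regressor to learn execution time from configuration features; the feature encoding and toy data are assumptions, and ALOJA-ML's actual algorithms may differ.

from sklearn.ensemble import RandomForestRegressor

# Toy rows: [num_mappers, compression_on, num_disks, is_cloud] -> seconds.
X = [[4, 0, 1, 0], [8, 1, 2, 0], [8, 0, 4, 1], [16, 1, 4, 1]]
y = [1400, 950, 1100, 700]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# A-priori prediction for an unseen configuration; a large gap between
# predicted and observed times can also flag anomalous runs.
print(model.predict([[16, 0, 2, 0]]))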


Network Computing and Applications | 2011

Non-intrusive Estimation of QoS Degradation Impact on E-Commerce User Satisfaction

Nicolas Poggi; David Carrera; Ricard Gavaldà; Eduard Ayguadé

With the mass adoption of high-speed Internet access, recent industry consumer reports show that Web site performance is increasingly becoming a key feature in determining user satisfaction and, ultimately, a decisive factor in whether a user will purchase on a Web site or even return to it. Traditional Web infrastructure capacity planning has focused on maintaining high throughput and availability on Web sites, optimizing the number of servers needed to serve peak hours in order to minimize costs. However, as we show in our study, the conversion rate, the fraction of users that purchase on a site, is higher at peak hours, precisely when systems are most exposed to overload. In this article we propose a methodology to determine the thresholds of user satisfaction as the QoS delivered by an online business degrades, and to estimate its effects on actual sales. The novelty of the presented technique is that it does not involve any intrusive manipulation of production systems, but rather a learning process over historic sales data combined with system performance measurements. The methodology has been applied to Atrapalo.com, a top national Travel and Booking site. For our experiments, we were given access to a three-year sales history dataset, as well as actual HTTP and resource-consumption logs for several weeks. The results obtained enable autonomic resource managers to set performance goals and optimize the number of servers according to the workload, maximizing revenue for the site without surpassing the thresholds of user satisfaction.
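The threshold estimation can be sketched as follows: join historic sales with measured response times, bin visits by response time, and find where conversion collapses. The bin width and the 50%-of-peak criterion below are illustrative assumptions, not the paper's parameters.

from collections import defaultdict

def satisfaction_threshold(samples, bin_ms=500, drop=0.5):
    # samples: (response_ms, purchased) pairs joined from performance
    # logs and the sales history; purchased is 0 or 1.
    visits, sales = defaultdict(int), defaultdict(int)
    for ms, purchased in samples:
        b = int(ms // bin_ms)
        visits[b] += 1
        sales[b] += purchased
    conv = {b: sales[b] / visits[b] for b in visits}
    peak = max(conv.values())
    for b in sorted(conv):
        if conv[b] < drop * peak:
            return b * bin_ms    # QoS level where conversion falls off
    return None                  # no degradation observed in the data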


Technology Conference on Performance Evaluation and Benchmarking | 2015

Big Data Benchmark Compendium

Todor Ivanov; Tilmann Rabl; Meikel Poess; Anna Queralt; John Poelman; Nicolas Poggi; Jeffrey Buell

The field of Big Data and related technologies is rapidly evolving. Consequently, many benchmarks are emerging, driven by academia and industry alike. As these benchmarks emphasize different aspects of Big Data and, in many cases, cover different technical platforms and use cases, it is extremely difficult to keep up with the pace of benchmark creation. Moreover, with the combination of large data volumes, heterogeneous data formats, and changing processing velocity, it becomes complex to specify an architecture that best suits all application requirements. This makes the investigation and standardization of such systems very difficult. Therefore, the traditional way of specifying a standardized benchmark with pre-defined workloads, which has been in use for years in transaction and analytical processing systems, is not trivial to apply to Big Data systems. This document provides a summary of existing benchmarks and those in development, gives a side-by-side comparison of their characteristics, and discusses their pros and cons. The goal is to understand the current state of Big Data benchmarking and to guide practitioners in their approaches and use cases.


International Conference on Big Data | 2015

From performance profiling to predictive analytics while evaluating hadoop cost-efficiency in ALOJA

Nicolas Poggi; Josep Lluis Berral; David Carrera; Aaron Call; Fabrizio Gagliardi; Rob Reinauer; Nikola Vujic; Daron Green; José A. Blakeley

In recent years, the exponential growth of data, its generation speed, and its expected consumption rate have presented one of the most important challenges in IT, for both industry and research. For these reasons, the ALOJA research project was created by BSC and Microsoft as an open initiative to increase cost-efficiency and the general understanding of Big Data systems via automation and learning. The development of the project over its first year has resulted in an open-source benchmarking platform used to produce the largest public repository of Big Data results, featuring over 42,000 job execution details. ALOJA also includes web-based analytic tools to evaluate and gather insights about the cost-performance of benchmarked systems. The tools offer means to extract knowledge that can help optimize configuration and deployment options in the Cloud, i.e., selecting the most cost-effective VMs and cluster sizes. This article describes the evolution of the project's focus and research lines over a period of more than a year of continuously benchmarking Big Data systems, and discusses the motivation, both technical and market-based, for these changes. It also presents the main results from the evaluation of different OS and Hadoop configurations, covering over 100 hardware deployments. During this time, ALOJA's initial focus has shifted from low-level profiling of the Hadoop runtime with HPC tools, through extensive benchmarking and evaluation of a large body of results via aggregation, to currently leveraging Predictive Analytics (PA) techniques. The ongoing efforts in PA show promising results in automatically modeling the behavior of systems, i.e., predicting job execution times with high accuracy and reducing the number of benchmark runs needed, as well as in Knowledge Discovery (KD) to find relations among software and hardware components. Together, these techniques support forecasting the cost-effectiveness of newly defined systems, reducing benchmarking time and costs.
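As a sketch of how predictive analytics can reduce benchmark runs, the snippet below fits a toy regressor and ranks a candidate configuration grid by predicted cost, so only the most promising configurations need to be executed; the feature layout, toy data, and price are assumptions, not ALOJA's pipeline.

from itertools import product
from sklearn.ensemble import RandomForestRegressor

# Toy rows: [num_mappers, compression_on, num_disks, is_cloud] -> seconds.
X = [[4, 0, 1, 0], [8, 1, 2, 0], [8, 0, 4, 1], [16, 1, 4, 1]]
y = [1400, 950, 1100, 700]
model = RandomForestRegressor(random_state=0).fit(X, y)

# Predict over the unexplored grid and benchmark the cheapest-looking first.
candidates = [list(c) for c in product([4, 8, 16], [0, 1], [1, 2, 4], [0, 1])]
predicted = model.predict(candidates)              # predicted exec time (s)
usd_per_hour = 2.0
ranked = sorted((t / 3600 * usd_per_hour, c) for t, c in zip(predicted, candidates))
print(ranked[:3])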

Collaboration


Dive into Nicolas Poggi's collaborations.

Top Co-Authors

David Carrera
Polytechnic University of Catalonia

Josep Lluis Berral
Polytechnic University of Catalonia

Jordi Torres
Polytechnic University of Catalonia

Eduard Ayguadé
Barcelona Supercomputing Center

Ricard Gavaldà
Polytechnic University of Catalonia

Aaron Call
Barcelona Supercomputing Center

Toni Moreno
Polytechnic University of Catalonia