Jussara M. Almeida
Universidade Federal de Minas Gerais
Publication
Featured research published by Jussara M. Almeida.
web search and data mining | 2011
Flavio Figueiredo; Fabrício Benevenuto; Jussara M. Almeida
Understanding content popularity growth is of great importance to Internet service providers, content creators and online marketers. In this work, we characterize the growth patterns of video popularity on the currently most popular video sharing application, namely YouTube. Using data newly provided by the application, we analyze how the popularity of individual videos evolves from the video's upload time onward. Moreover, addressing a key aspect that has been mostly overlooked by previous work, we characterize the types of referrers that most often attracted users to each video, aiming to shed some light on the mechanisms (e.g., searching or external linking) that often drive users towards a video, and thus contribute to popularity growth. Our analyses are performed separately for three video datasets, namely, videos that appear in the YouTube top lists, videos removed from the system due to copyright violation, and videos selected according to random queries submitted to YouTube's search engine. Our results show that popularity growth patterns depend on the video dataset. In particular, copyright protected videos tend to get most of their views much earlier in their lifetimes, often exhibiting a popularity growth characterized by a viral epidemic-like propagation process. In contrast, videos in the top lists tend to experience sudden significant bursts of popularity. We also show that not only search but also other YouTube internal mechanisms play important roles in attracting users to videos in all three datasets.
network and operating system support for digital audio and video | 2001
Jussara M. Almeida; Jeffrey Krueger; Derek L. Eager; Mary K. Vernon
This paper presents an extensive analysis of the client workloads for educational media servers at two major U.S. universities. The goals of the analysis include providing data for generating synthetic workloads, gaining insight into the design of streaming content distribution networks, and quantifying how much server bandwidth can be saved in interactive educational environments by using recently developed multicast streaming methods for stored content.
web search and data mining | 2013
Henrique Pinto; Jussara M. Almeida; Marcos André Gonçalves
Predicting Web content popularity is an important task for supporting the design and evaluation of a wide range of systems, from targeted advertising to effective search and recommendation services. We here present two simple models for predicting the future popularity of Web content based on historical information given by early popularity measures. Our approach is validated on datasets consisting of videos from the widely used YouTube video-sharing portal. Our experimental results show that, compared to a state-of-the-art baseline model, our proposed models lead to significant decreases in relative squared errors, reaching up to 20% reduction on average, and larger reductions (of up to 71%) for videos that experience a high peak in popularity in their early days followed by a sharp decrease in popularity.
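As a rough illustration of predicting future popularity from an early measurement and scoring it with a relative squared error, the following minimal Python sketch fits a single scaling factor. It is not the authors' model: the function names, the one-parameter form, and the view counts are all hypothetical and chosen only to make the error metric concrete.

```python
# Minimal sketch (assumption: a one-parameter scaling model, not the paper's models).
# Prediction: final_views ~= alpha * early_views.
# Metric: mean relative squared error, mean((predicted / actual - 1)^2).


def fit_scaling_factor(early_views, final_views):
    """Return the alpha that minimizes the mean relative squared error.

    With r_i = early_i / final_i, the error is sum((alpha * r_i - 1)^2),
    whose minimizer is alpha = sum(r_i) / sum(r_i^2).
    """
    ratios = [e / f for e, f in zip(early_views, final_views)]
    return sum(ratios) / sum(r * r for r in ratios)


def mean_relative_squared_error(predicted, actual):
    """Average of squared relative errors over all items."""
    return sum((p / a - 1.0) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: views after a few days vs. views at the target date.
early = [100, 250, 40, 900]
final = [420, 980, 200, 3300]

alpha = fit_scaling_factor(early, final)
predictions = [alpha * e for e in early]
error = mean_relative_squared_error(predictions, final)
print(f"alpha={alpha:.3f}, mean relative squared error={error:.4f}")
```

Because the error is relative, a miss of 100 views counts for much more on a video with 200 views than on one with 3300, which is why this metric suits popularity data that spans several orders of magnitude.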
international acm sigir conference on research and development in information retrieval | 2009
Fabrício Benevenuto; Tiago Rodrigues; Virgílio A. F. Almeida; Jussara M. Almeida; Marcos André Gonçalves
A number of online video social networks, of which YouTube is the most popular, provide features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content, or simply pollution, into the system. For instance, spammers may post an unrelated video as a response to a popular one, aiming to increase the likelihood of the response being viewed by a larger number of users. Moreover, opportunistic users--promoters--may try to gain visibility for a specific video by posting a large number of (potentially unrelated) responses to boost the rank of the responded video, making it appear in the top lists maintained by the system. Content pollution may jeopardize the trust of users in the system, thus compromising its success in promoting social interactions. In spite of that, the available literature is very limited in providing a deep understanding of this problem. In this paper, we go a step further by addressing the issue of detecting video spammers and promoters. Towards that end, we manually build a test collection of real YouTube users, classifying them as spammers, promoters, and legitimate users. Using our test collection, we provide a characterization of social and content attributes that may help distinguish each user class. We also investigate the feasibility of using a state-of-the-art supervised classification algorithm to detect spammers and promoters, and assess its effectiveness in our test collection. We found that our approach is able to correctly identify the majority of the promoters, misclassifying only a small percentage of legitimate users. In contrast, although we are able to detect a significant fraction of spammers, they proved to be much harder to distinguish from legitimate users.
international world wide web conferences | 2004
Cristiano P. Costa; Ítalo Cunha; Alex Borges; Claudiney V. Ramos; Marcus Vinicius de Melo Rocha; Jussara M. Almeida; Berthier A. Ribeiro-Neto
This paper provides an extensive analysis of pre-stored streaming media workloads, focusing on the client interactive behavior. We analyze four workloads that fall into three different domains, namely, education, entertainment video and entertainment audio. Our main goals are: (a) to identify qualitative similarities and differences in the typical client behavior for the three workload classes and (b) to provide data for generating realistic synthetic workloads.
social network systems | 2008
Marcelo Maia; Jussara M. Almeida; Virgílio A. F. Almeida
Online social networks pose an interesting problem: how to best characterize the different classes of user behavior. Traditional user behavior characterization methods, based on individual user features, are not appropriate for online networking sites. In these environments, users interact with the site and with other users through multiple interfaces that let them upload and view content, choose friends, rank favorite content, subscribe to users and perform many other interactions. Different interaction patterns can be observed for different groups of users. In this paper, we propose a methodology for characterizing and identifying user behaviors in online social networks. First, we crawl data from YouTube and use a clustering algorithm to group users that share similar behavioral patterns. Next, we show that attributes that stem from the user's social interactions, in contrast to attributes relative to each individual user, are good discriminators and allow the identification of relevant user behaviors. Finally, we present and discuss experimental results of the use of the proposed methodology. A set of useful profiles, derived from the analysis of the YouTube sample, is presented. The identification of different classes of user behavior has the potential to improve, for instance, recommendation systems for advertisements in online social networks.
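The abstract above does not say which clustering algorithm was used. As a minimal sketch of grouping users by behavioral attributes, the Python example below uses plain k-means, which is an assumption made here for illustration. The two features (uploads and subscriptions per user), the data points, and the starting centroids are all hypothetical.

```python
# Minimal k-means sketch (assumption: k-means stands in for the unspecified
# clustering algorithm; features and data are hypothetical).


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical users described by (uploads, subscriptions).
users = [(1, 2), (2, 1), (1, 1), (20, 30), (22, 28), (21, 31)]
centroids, labels = kmeans(users, centroids=[(0, 0), (25, 25)])
print(labels)
```

In practice the features would be normalized first, since attributes with large ranges otherwise dominate the distance computation.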
measurement and modeling of computer systems | 2001
Erich M. Nahum; Marcel-Catalin Rosu; Srinivasan Seshan; Jussara M. Almeida
WWW workload generators are used to evaluate web server performance, and thus have a large impact on what performance optimizations are applied to servers. However, current benchmarks ignore a crucial component: how these servers perform in the environment in which they are intended to be used, namely the wide-area Internet. This paper shows how WAN conditions can affect WWW server performance. We examine these effects using an experimental test-bed which emulates WAN characteristics in a live setting, by introducing factors such as delay and packet loss in a controlled and reproducible fashion. We study how these factors interact with the host TCP implementation and what influence they have on web server performance. We demonstrate that when more realistic wide-area conditions are introduced, servers exhibit very different performance properties and scaling behaviors, which are not exposed by existing benchmarks running on LANs. We show that observed throughputs can give misleading information about server performance, and thus find that maximum throughput, or capacity, is a more useful metric. We find that packet losses can reduce server capacity by as much as 50 percent and increase response time as seen by the client. We show that using TCP SACK can reduce client response time, without reducing server capacity.
network operations and management symposium | 2006
Bruno D. Abrahao; Virgílio A. F. Almeida; Jussara M. Almeida; Alex Zhang; Dirk Beyer; Fereydoon Safai
This work considers the problem of hosting multiple third-party Internet services in a cost-effective manner so as to maximize a provider's business objective. For this purpose, we present a dynamic capacity management framework based on an optimization model, which links a cost model based on SLA contracts with an analytical queuing-based performance model, in an attempt to adapt the platform to changing capacity needs in real time. In addition, we propose a two-level SLA specification for different operation modes, namely, normal and surge, which allows for per-use service accounting with respect to requirements of throughput and tail distribution response time. The proposed cost model is based on penalties, incurred by the provider due to SLA violation, and rewards, received when the service level expectations are exceeded. Finally, we evaluate approximations for predicting the performance of the hosted services under two different scheduling disciplines, namely FCFS and processor sharing. Through simulation, we assess the effectiveness of the proposed approach as well as the level of accuracy resulting from the performance model approximations.
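To make the link between a queuing-based performance model and a penalty/reward cost model concrete, here is a minimal Python sketch. It assumes an M/M/1 FCFS queue, for which the response time is exponentially distributed with rate (service rate minus arrival rate). The function names, the single-threshold SLA, and the reward and penalty values are hypothetical and are not taken from the paper.

```python
import math

# Minimal sketch (assumptions: M/M/1 FCFS queue, one response-time threshold,
# flat reward/penalty amounts; none of these come from the paper itself).


def mm1_tail_probability(arrival_rate, service_rate, threshold):
    """P(response time > threshold) for a stable M/M/1 FCFS queue.

    The response time is exponential with rate (service_rate - arrival_rate).
    An overloaded queue is treated as always violating the threshold.
    """
    if arrival_rate >= service_rate:
        return 1.0
    return math.exp(-(service_rate - arrival_rate) * threshold)


def sla_payoff(arrival_rate, service_rate, threshold, max_violation, reward, penalty):
    """Reward if the tail requirement is met, negative penalty otherwise."""
    tail = mm1_tail_probability(arrival_rate, service_rate, threshold)
    return reward if tail <= max_violation else -penalty


# Hypothetical SLA: at most 5% of requests may take longer than 1 second.
for capacity in (10.0, 12.0):
    payoff = sla_payoff(
        arrival_rate=8.0, service_rate=capacity, threshold=1.0,
        max_violation=0.05, reward=100.0, penalty=250.0,
    )
    print(f"service rate {capacity}: payoff {payoff}")
```

A capacity manager could evaluate such a payoff for each candidate allocation and pick the one with the best expected outcome, which is the general idea behind coupling the two models.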
HPN '97 Proceedings of the IFIP TC6 seventh international conference on High performance networking VII | 1997
Jussara M. Almeida; Virgílio A. F. Almeida; David J. Yates
Server performance has become a crucial issue for improving the overall performance of the World-Wide Web. This paper describes Webmonitor, a tool for evaluating and understanding server performance, and presents new results for a realistic workload. Webmonitor measures activity and resource consumption, both within the kernel and in HTTP processes running in user space. Webmonitor is implemented using an efficient combination of sampling and event-driven techniques that exhibit low overhead. Our initial implementation is for the Apache World-Wide Web server running on the Linux operating system. We demonstrate the utility of Webmonitor by measuring and understanding the performance of a Pentium-based PC acting as a dedicated WWW server. Our workload uses a file size distribution with a heavy tail. This captures the fact that Web servers must concurrently handle some requests for large audio and video files, and a large number of requests for small documents, containing text or images. Our results show that in a Web server saturated by client requests, over 90% of the time spent handling HTTP requests is spent in the kernel. Furthermore, keeping TCP connections open, as required by TCP, causes a factor of 2-9 increase in the elapsed time required to service an HTTP request. Data gathered from Webmonitor provide insight into the causes of this performance penalty. Specifically, we observe a significant increase in resource consumption along three dimensions: the number of HTTP processes running at the same time, CPU utilization, and memory utilization. These results emphasize the important role of operating system and network protocol implementation in determining Web server performance.
ACM Transactions on Multimedia Computing, Communications, and Applications | 2009
Fabrício Benevenuto; Tiago Rodrigues; Virgílio A. F. Almeida; Jussara M. Almeida; Keith W. Ross
This article characterizes video-based interactions that emerge from YouTube's video response feature, which allows users to discuss themes and to provide reviews for products or places using much richer media than text. Based on crawled data covering a representative subset of videos and users, we present a characterization from two perspectives: the video response view and the interaction network view. In addition to providing valuable statistical models for various characteristics, our study uncovers typical user behavioral patterns in video-based environments and shows evidence of opportunistic behavior.