Roberto Konow | Researchain

Publication

Featured research published by Roberto Konow.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

Faster and smaller inverted indices with treaps

Roberto Konow; Gonzalo Navarro; Charles L. A. Clarke; Alejandro López-Ortiz

We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods. To achieve compression we represent the treap topology using compact data structures. Further, the treap invariants allow us to elegantly encode differentially both document identifiers and frequencies. Results show that our index uses about 20% less space, and performs queries up to three times faster, than state-of-the-art compact representations.
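
The idea of thresholding by frequency while intersecting can be illustrated with a minimal, uncompressed sketch. It assumes a treap keyed by document identifier with term frequency as the heap priority; the names `Node`, `build_treap`, `at_least`, `find`, and `thresholded_intersection` are hypothetical, and the paper's compact topology and differential encoding are not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    docid: int
    freq: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_treap(postings: List[Tuple[int, int]]) -> Optional[Node]:
    """Build a treap from (docid, freq) pairs sorted by docid.

    Stack-based Cartesian-tree construction: binary-search-tree order
    on docid, max-heap order on freq.
    """
    stack: List[Node] = []
    for docid, freq in postings:
        node = Node(docid, freq)
        last = None
        # Nodes with a smaller frequency become the new node's left subtree.
        while stack and stack[-1].freq < freq:
            last = stack.pop()
        node.left = last
        if stack:
            stack[-1].right = node
        stack.append(node)
    return stack[0] if stack else None


def at_least(node: Optional[Node], threshold: int) -> Iterator[Tuple[int, int]]:
    """Yield (docid, freq) pairs with freq >= threshold, in docid order."""
    if node is None or node.freq < threshold:
        return  # heap order: no descendant can reach the threshold
    yield from at_least(node.left, threshold)
    yield (node.docid, node.freq)
    yield from at_least(node.right, threshold)


def find(node: Optional[Node], docid: int, threshold: int) -> Optional[int]:
    """Return the frequency of docid if it is >= threshold, else None."""
    while node is not None and node.freq >= threshold:
        if docid == node.docid:
            return node.freq
        node = node.left if docid < node.docid else node.right
    return None


def thresholded_intersection(a: Optional[Node], b: Optional[Node],
                             threshold: int) -> List[Tuple[int, int, int]]:
    """Documents present in both lists with frequency >= threshold in each."""
    result = []
    for docid, freq_a in at_least(a, threshold):
        freq_b = find(b, docid, threshold)
        if freq_b is not None:
            result.append((docid, freq_a, freq_b))
    return result


a = build_treap([(1, 2), (3, 5), (4, 1), (7, 3), (9, 4)])
b = build_treap([(3, 3), (4, 6), (7, 1), (9, 3)])
print(thresholded_intersection(a, b, 3))  # [(3, 5, 3), (9, 4, 3)]
```

Because a node's frequency bounds every frequency in its subtree, both the traversal and the lookup can abandon a subtree as soon as its root falls below the threshold, which is the single-pass behavior the abstract contrasts with two-step processing.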

Computer Supported Cooperative Work in Design | 2010

Recommender system for contextual advertising in IPTV scenarios

Roberto Konow; Wayman Tan; Luis Loyola; Javier Pereira; Nelson Baloian

This paper presents a recommender system for contextual targeted advertisement in video-on-demand scenarios. The proposal arises from a real case of a Japanese company planning to add advertisements to its on-demand IPTV services. The advertisements consist of icon- or text-based links that may be shown before, during, or after the film the customer has selected to watch. The goal of the company is to maximize the number of times customers follow the links to advertised sites, because its revenue depends on this. Since only a small portion of the advertising links can be included in a movie, these must be selected carefully. This work proposes a recommender system for selecting the most appropriate advertisement for a certain customer based on the success the advertisement has had in the past among other customers with similar preferences. The paper describes the proposed method, shows the implementation work done so far, and describes the work remaining to test it in the real scenario.

String Processing and Information Retrieval | 2012

Dual-Sorted inverted lists in practice

Roberto Konow; Gonzalo Navarro

We implement a recent theoretical proposal to represent inverted lists in memory, in a way that docid-sorted and weight-sorted lists are simultaneously represented in a single wavelet tree data structure. We compare our implementation with classical representations, where the ordering favors either bag-of-word queries or Boolean and weighted conjunctive queries, and demonstrate that the new data structure is faster than the state of the art for conjunctive queries, while it offers an attractive space/time tradeoff when both kinds of queries are of interest.

Information Systems | 2016

Aggregated 2D range queries on clustered points

Nieves R. Brisaboa; Guillermo de Bernardo; Roberto Konow; Gonzalo Navarro; Diego Seco

Efficient processing of aggregated range queries on two-dimensional grids is a common requirement in information retrieval and data mining systems, for example in Geographic Information Systems and OLAP cubes. We introduce a technique to represent grids supporting aggregated range queries that requires little space when the data points in the grid are clustered, which is common in practice. We show how this general technique can be used to support two important types of aggregated queries, which are ranked range queries and counting range queries. Our experimental evaluation shows that this technique can speed up aggregated queries up to more than an order of magnitude, with a small space overhead.

Highlights: Space-efficient representation for two-dimensional grids. Efficient support for aggregated range queries. Proved performance in main memory. Results competitive with the state of the art. Applications to several domains: Geographic Information Systems, OLAP cubes, etc.

String Processing and Information Retrieval | 2014

K2-Treaps: Range Top-k Queries in Compact Space

Nieves R. Brisaboa; Guillermo de Bernardo; Roberto Konow; Gonzalo Navarro

Efficient processing of top-k queries on multidimensional grids is a common requirement in information retrieval and data mining, for example in OLAP cubes. We introduce a data structure, the K2-treap, that represents grids in compact form and supports efficient prioritized range queries. We compare the K2-treap with state-of-the-art solutions on synthetic and real-world datasets, showing that it uses 30% of the space of competing solutions while solving queries up to 10 times faster.

Computer Networks | 2015

PcapWT: An efficient packet extraction tool for large volume network traces

Young-Hwan Kim; Roberto Konow; Diego Dujovne; Thierry Turletti; Walid Dabbous; Gonzalo Navarro

Network packet tracing has been used for many different purposes during the last few decades, such as network software debugging, networking performance analysis, and forensic investigation. Meanwhile, packet traces keep growing as network speeds rapidly increase. Thus, to handle huge amounts of traces, we need not only more hardware resources but also efficient software tools. However, traditional tools are inefficient at dealing with such big packet traces. In this paper, we propose pcapWT, an efficient packet extraction tool for large traces. PcapWT provides fast packet lookup by indexing an original trace using a Wavelet Tree structure. In addition, pcapWT supports multi-threading to avoid synchronous I/O and blocking system calls used for file processing, and is particularly efficient on machines with SSDs. PcapWT shows remarkable performance enhancements in comparison with traditional tools such as tcpdump and more recent tools such as pcapIndex in terms of index data size and packet extraction time. Our benchmark using large and complex traces shows that pcapWT reduces the index data size to below 1% of the volume of the original traces. Moreover, packet extraction performance is 20% better than with pcapIndex. Furthermore, when a small number of packets is retrieved, pcapWT is hundreds of times faster than tcpdump.

ACM Transactions on Information Systems | 2017

Inverted Treaps

Roberto Konow; Gonzalo Navarro; Charles L. A. Clarke; Alejandro López-Ortiz

We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using similar space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods. To achieve compression, we represent the treap topology using different alternative compact data structures. Further, the treap invariants allow us to elegantly encode differentially both document identifiers and frequencies. We also show how to extend this representation to support incremental updates over the index. Results show that, under the tf-idf scoring scheme, our index uses about the same space as state-of-the-art compact representations, while performing up to 2-20 times faster on ranked single-word, union, or intersection queries. Under the BM25 scoring scheme, our index may use up to 40% more space than the others and outperforms them less frequently but still reaches improvement factors of 2-20 in the best cases. The index supporting incremental updates poses an overhead of 50%-100% over the static variants in terms of space, construction, and query time.

International Workshop on Groupware | 2006

A decentralized and flexible tool supporting extreme programming software development

Nelson Baloian; Francisco Claude; Roberto Konow; Mitsuji Matsumoto

This paper presents a system called CodeBreaker for supporting small and medium-sized software development based on an extreme programming principle. The system follows a decentralized model of development, that is, it does not require a central repository. A set of rules for code ownership maintains the synchronization of the work among all members of the development team, who can work online or offline. It allows fine-grained locking of parts of the code.

Journal of Experimental Algorithmics | 2017

Practical Compact Indexes for Top-k Document Retrieval

Simon Gog; Roberto Konow; Gonzalo Navarro

We present a fast and compact index for top-k document retrieval on general string collections, in which given a string pattern, the index returns the k documents where it appears most often. We adapt a linear-space and optimal-time theoretical solution, whose implementation poses various algorithm engineering challenges. Although a naive implementation of the optimal solution is estimated to require around 80n bytes for a text collection of n symbols, our implementation requires 2.5n to 3.0n bytes, text included, and answers queries within microseconds. This outperforms all previous practical indexes by orders of magnitude; the only index using less space is hundreds of times slower. Our index can be built on collections of hundreds of gigabytes and on tokenized text collections.
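
To make the query semantics concrete, the following brute-force baseline computes the same answer a top-k document retrieval index returns, by scanning every document. It is only a reference definition of the problem, not the indexed solution described in the abstract; the function names, the tie-breaking by document position, and the choice to count overlapping occurrences are assumptions made for illustration.

```python
from typing import List, Tuple


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlapping ones included."""
    if not pattern:
        return 0
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def top_k_documents(docs: List[str], pattern: str, k: int) -> List[Tuple[int, int]]:
    """Return up to k (doc_index, count) pairs, most frequent first.

    Documents that do not contain the pattern are omitted; ties are
    broken by the smaller document index (an assumption of this sketch).
    """
    counts = [(i, count_occurrences(d, pattern)) for i, d in enumerate(docs)]
    hits = [(i, c) for i, c in counts if c > 0]
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits[:k]


docs = ["abracadabra", "banana", "cabana", "xyz"]
print(top_k_documents(docs, "ana", 2))  # [(1, 2), (2, 1)]
```

This scan takes time proportional to the whole collection for every query, which is the cost an index of the kind described above is designed to avoid.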

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems | 2017

Competitive Author Profiling Using Compression-Based Strategies

Francisco Claude; Daniil Galaktionov; Roberto Konow; Susana Ladra; Oscar Pedreira

Author profiling consists of determining demographic attributes of a document's author, such as gender, age, nationality, language, or religion. This task, which has applications in fields such as forensics, security, and marketing, has been approached from different areas, especially linguistics and natural language processing, by extracting different types of features from training documents, usually content- and style-based features. In this paper we address the problem by using several compression-inspired strategies that generate different models without analyzing or extracting specific features from the textual content, making them style-oblivious approaches. We analyze the behavior of these techniques, combine them, and compare them with other state-of-the-art methods. We show that they can be competitive in terms of accuracy, giving the best predictions for some domains, and that they are efficient in time performance.
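
A generic compression-based classifier gives a feel for how a model can be built without extracting features. The sketch below assigns a document to the class whose training text helps compress it most, using `zlib` from the Python standard library. It is a well-known general technique, not necessarily one of the strategies evaluated in the paper; the labels `group_a` and `group_b` and the sample texts are made up for illustration.

```python
import zlib
from typing import Dict, List


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, document: str) -> int:
    """Bytes needed to encode document once the corpus has been seen."""
    return compressed_size(corpus + " " + document) - compressed_size(corpus)


def predict(training: Dict[str, List[str]], document: str) -> str:
    """Pick the label whose training corpus compresses the document best."""
    corpora = {label: " ".join(texts) for label, texts in training.items()}
    return min(corpora, key=lambda label: extra_bytes(corpora[label], document))


# Hypothetical training data: two classes with clearly different vocabulary.
training = {
    "group_a": [
        "the committee will review the quarterly budget report on monday morning",
        "please submit the signed expense forms before the end of the fiscal year",
    ],
    "group_b": [
        "lol that concert was so awesome cant wait for the next one",
        "omg my phone died again right when the band started playing",
    ],
}

print(predict(training, "the committee will review the signed expense forms on monday morning"))
print(predict(training, "omg that concert was so awesome my phone died again"))
```

The classifier never inspects words or style markers directly: shared substrings between the document and a class corpus simply turn into shorter compressed output, which is what makes approaches of this kind feature-free.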
