Kjetil Nørvåg

Norwegian University of Science and Technology

Publication

Featured research published by Kjetil Nørvåg.

symposium on large spatial databases | 2011

Efficient processing of top-k spatial keyword queries

João B. Rocha-Junior; Orestis Gkorgkas; Simon Jonassen; Kjetil Nørvåg

Given a spatial location and a set of keywords, a top-k spatial keyword query returns the k best spatio-textual objects ranked according to their proximity to the query location and relevance to the query keywords. There are many applications handling huge amounts of geotagged data, such as Twitter and Flickr, that can benefit from this query. Unfortunately, the state-of-the-art approaches incur a non-negligible processing cost that results in long response times. In this paper, we propose a novel index, named Spatial Inverted Index (S2I), to improve the performance of top-k spatial keyword queries. Our index maps each distinct term to a set of objects containing the term. The objects are stored differently according to the document frequency of the term and can be retrieved efficiently in decreasing order of keyword relevance and spatial proximity. Moreover, we present algorithms that exploit S2I to process top-k spatial keyword queries efficiently. Finally, we show through extensive experiments that our approach outperforms the state-of-the-art approaches in terms of update and query cost.
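As a rough illustration of the query defined in this abstract, the Python sketch below maps each distinct term to the set of objects containing it and then ranks candidates by a combined score. It is not S2I itself: the toy data, the function names, the `alpha` weighting, and the `max_dist` normalisation are all assumptions made for the example, and the ranking is a brute-force scan over the candidates rather than the ordered retrieval the paper describes.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical toy data: object id -> ((x, y), description terms)
OBJECTS = {
    "cafe": ((0.0, 0.0), ["coffee", "cake"]),
    "bar": ((3.0, 4.0), ["beer", "coffee"]),
    "bakery": ((1.0, 0.0), ["cake", "bread"]),
}


def build_inverted_index(objects):
    """Map each distinct term to the set of object ids containing it."""
    index = defaultdict(set)
    for oid, (_, terms) in objects.items():
        for term in terms:
            index[term].add(oid)
    return index


def top_k_spatial_keyword(objects, index, location, keywords, k,
                          alpha=0.5, max_dist=10.0):
    """Return the k best (score, id) pairs, best first.

    score = alpha * proximity + (1 - alpha) * relevance is an assumed
    linear combination; both parts are normalised to [0, 1].
    """
    # Only objects containing at least one query keyword are candidates.
    candidates = set()
    for kw in keywords:
        candidates |= index.get(kw, set())

    scored = []
    for oid in candidates:
        (x, y), terms = objects[oid]
        dist = math.hypot(x - location[0], y - location[1])
        proximity = max(0.0, 1.0 - dist / max_dist)
        relevance = sum(1 for kw in keywords if kw in terms) / len(keywords)
        scored.append((alpha * proximity + (1 - alpha) * relevance, oid))
    return heapq.nlargest(k, scored)
```

Calling `top_k_spatial_keyword(OBJECTS, build_inverted_index(OBJECTS), (0.0, 0.0), ["coffee", "cake"], 2)` ranks the nearby object that matches both keywords ahead of objects that match only one.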

very large data bases | 2014

A survey of large-scale analytical query processing in MapReduce

Christos Doulkeridis; Kjetil Nørvåg

Enterprises today acquire vast volumes of data from different sources and leverage this information by means of data analysis to support effective decision-making and provide new functionality and services. The key requirement of data analytics is scalability, simply due to the immense volume of data that need to be extracted, processed, and analyzed in a timely fashion. Arguably the most popular framework for contemporary large-scale data analytics is MapReduce, mainly due to its salient features that include scalability, fault-tolerance, ease of programming, and flexibility. However, despite its merits, MapReduce has evident performance limitations in miscellaneous analytical tasks, and this has given rise to a significant body of research that aims at improving its efficiency, while maintaining its desirable properties. This survey aims to review the state of the art in improving the performance of parallel query processing using MapReduce. A set of the most significant weaknesses and limitations of MapReduce is discussed at a high level, along with techniques for addressing them. A taxonomy is presented for categorizing existing research on MapReduce improvements according to the specific problem they target. Based on the proposed taxonomy, a classification of existing research is provided focusing on the optimization objective. In conclusion, we outline interesting directions for future parallel data processing systems.

international conference on data engineering | 2010

Reverse top-k queries

Akrivi Vlachou; Christos Doulkeridis; Yannis Kotidis; Kjetil Nørvåg

Rank-aware query processing has become essential for many applications that return to the user only the top-k objects based on the individual user's preferences. Top-k queries have been mainly studied from the perspective of the user, focusing primarily on efficient query processing. In this work, for the first time, we study top-k queries from the perspective of the product manufacturer. Given a potential product, which are the user preferences for which this product is in the top-k query result set? We identify a novel query type, namely reverse top-k query, that is essential for manufacturers to assess the potential market and impact of their products based on the competition. We formally define reverse top-k queries and introduce two versions of the query, namely monochromatic and bichromatic. We first provide a geometric interpretation of the monochromatic reverse top-k query in the solution space that helps to understand the reverse top-k query conceptually. Then, we study in more detail the case of bichromatic reverse top-k query, which is more interesting for practical applications. Such a query, if computed in a straightforward manner, requires evaluating a top-k query for each user preference in the database, which is prohibitively expensive even for moderate datasets. In this paper, we present an efficient threshold-based algorithm that eliminates candidate user preferences, without processing the respective top-k queries. Furthermore, we introduce an indexing structure based on materialized reverse top-k views in order to speed up the computation of reverse top-k queries. Materialized reverse top-k views trade preprocessing cost for query speedup in a controllable manner. Our experimental evaluation demonstrates the efficiency of our techniques, which reduce the required number of top-k computations by 1 to 3 orders of magnitude.
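The straightforward evaluation that the abstract calls prohibitively expensive can be sketched in a few lines of Python. This is only the naive baseline, not the threshold-based algorithm or the materialized views proposed in the paper. The sample data and function names are hypothetical, and the sketch assumes linear scoring functions where a lower score is better.

```python
# Hypothetical sample data: products as 2-d points, user preferences as weights.
POINTS = [(1, 9), (5, 5), (9, 1)]
PREFERENCES = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]


def score(weights, point):
    """Linear scoring function; lower scores are assumed to be better."""
    return sum(w * v for w, v in zip(weights, point))


def in_top_k(points, weights, q, k):
    """q is in the top-k for `weights` if fewer than k points score strictly better."""
    q_score = score(weights, q)
    better = sum(1 for p in points if score(weights, p) < q_score)
    return better < k


def naive_reverse_top_k(points, preferences, q, k):
    """Evaluate one top-k membership test per preference (the costly baseline)."""
    return [w for w in preferences if in_top_k(points, w, q, k)]
```

For the candidate product `q = (4, 4)` and `k = 1`, only the balanced preference `(0.5, 0.5)` ranks `q` first; with `k = 2`, all three preferences include it. The cost is one full scan of the products per stored preference, which is the work the paper's techniques aim to avoid.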

international conference on management of data | 2008

On efficient top-k query processing in highly distributed environments

Akrivi Vlachou; Christos Doulkeridis; Kjetil Nørvåg; Michalis Vazirgiannis

Recent advances in centralized database management systems show a trend towards supporting rank-aware query operators, like top-k, that enable users to retrieve only the most interesting data objects. A challenging problem is to support rank-aware queries in highly distributed environments. In this paper, we present a novel approach, called SPEERTO, for top-k query processing in large-scale peer-to-peer networks, where the dataset is horizontally distributed over the peers. Towards this goal, we explore the applicability of the skyline operator for efficiently routing top-k queries in a large super-peer network. Relying on a thresholding scheme, SPEERTO returns the exact results progressively to the user, while the number of queried super-peers and transferred data is minimized. Finally, we propose different variations of SPEERTO that allow balancing between transferred data volume and response time. Through simulations we demonstrate the feasibility of our approach.
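One reason the skyline operator is relevant to top-k processing can be shown with a small Python sketch. It does not reproduce SPEERTO's routing or thresholding scheme. It only illustrates a general property: assuming lower values are better and all weights are strictly positive, the best object under any linear scoring function is a skyline point. The sample points and function names are hypothetical.

```python
# Hypothetical sample data: 2-d objects where lower values are better.
POINTS = [(1, 9), (3, 3), (4, 4), (9, 1), (8, 8)]


def dominates(a, b):
    """a dominates b if it is no worse in every dimension and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points):
    """Return the points that no other point dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_1(points, weights):
    """Best point under a linear scoring function (lower score is better)."""
    return min(points, key=lambda p: sum(w * v for w, v in zip(weights, p)))
```

Here `skyline(POINTS)` keeps `(1, 9)`, `(3, 3)`, and `(9, 1)`, and `top_1` returns one of those three for each of the weight vectors `(0.9, 0.1)`, `(0.5, 0.5)`, and `(0.1, 0.9)`. A node that knows only the skyline of a remote dataset can therefore still answer top-1 queries over it exactly under these assumptions.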

extending database technology | 2012

Top-k spatial keyword queries on road networks

João B. Rocha-Junior; Kjetil Nørvåg

With the popularization of GPS-enabled devices there is an increasing interest in location-based queries. In this context, one interesting problem is processing top-k spatial keyword queries. Given a set of objects with a textual description (e.g., menu of a restaurant), a query location (latitude and longitude), and a set of query keywords, a top-k spatial keyword query returns the k best objects ranked in terms of both distance to the query location and textual relevance to the query keywords. So far, the research on this problem has assumed Euclidean space. In order to process such queries efficiently, spatio-textual indexes combining R-trees and inverted files are employed. However, for most real applications, the distance between the objects and query location is constrained by a road network (shortest path) and cannot be computed efficiently using R-trees. In this paper, we address, for the first time, the challenging problem of processing top-k spatial keyword queries on road networks where the distance between the query location and the spatial object is the shortest path. We formalize the new query type, and present novel indexing structures and algorithms that are able to process such queries efficiently. Finally, we perform an experimental evaluation that shows the efficiency of our approach.

european conference on research and advanced technology for digital libraries | 2010

Determining time of queries for re-ranking search results

Nattiya Kanhabua; Kjetil Nørvåg

Recent work on analyzing query logs shows that a significant fraction of queries are temporal, i.e., relevancy is dependent on time, and temporal queries play an important role in many domains, e.g., digital libraries and document archives. Temporal queries can be divided into two types: 1) those with temporal criteria explicitly provided by users, and 2) those with no temporal criteria provided. In this paper, we deal with the latter type of queries, i.e., queries that comprise only keywords and whose relevant documents are associated with particular time periods not given by the queries. We propose a number of methods to determine the time of queries using temporal language models. After that, we show how to increase the retrieval effectiveness by using the determined time of queries to re-rank the search results. Through extensive experiments we show that our proposed approaches improve retrieval effectiveness.

IEEE Journal on Selected Areas in Communications | 2007

DESENT: decentralized and distributed semantic overlay generation in P2P networks

Christos Doulkeridis; Kjetil Nørvåg; Michalis Vazirgiannis

The current approach in web searching, i.e., using centralized search engines, raises issues that call its future applicability into question: 1) coverage and scalability, 2) freshness, and 3) information monopoly. Performing web search using a P2P architecture that consists of the actual web servers has the potential to tackle those issues. In order to achieve the desired performance and scalability, and to enhance search quality relative to centralized search engines, semantic overlay networks (SONs) connecting peers storing semantically related information can be employed. The lack of global content/topology knowledge in a P2P system is the key challenge in forming SONs, and this paper describes an unsupervised approach for decentralized and distributed generation of SONs (DESENT). Through simulations and analytical cost models we verify our claims regarding performance, scalability, and quality.

international conference on conceptual modeling | 2012

Fast group recommendations by applying user clustering

Eirini Ntoutsi; Kostas Stefanidis; Kjetil Nørvåg; Hans-Peter Kriegel

Recommendation systems have received significant attention, with most of the proposed methods focusing on personal recommendations. However, there are contexts in which the items to be suggested are not intended for a single user but for a group of people. For example, assume a group of friends or a family that is planning to watch a movie or visit a restaurant. In this paper, we propose an extensive model for group recommendations that exploits recommendations for items that users similar to the group members liked in the past. We do not exhaustively search for similar users in the whole user base, but we pre-partition users into clusters of similar ones and use the cluster members for recommendations. We efficiently aggregate the single user recommendations into group recommendations by leveraging the power of a top-k algorithm. We evaluate our approach on a real dataset of movie ratings.
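The pipeline outlined in this abstract (pre-partitioned clusters, per-user predictions from cluster peers, aggregation into a group ranking) can be sketched in Python as follows. The ratings, cluster assignments, and function names are hypothetical. Two simplifications are assumptions of this sketch rather than the paper's design: a prediction is the plain average of the peers' ratings, and the group score is the plain average of the members' predictions.

```python
import heapq
from statistics import mean

# Hypothetical ratings: user -> {item: rating}
RATINGS = {
    "ann": {"m1": 5, "m2": 1},
    "bob": {"m1": 4, "m2": 2, "m3": 3},
    "cat": {"m2": 5, "m3": 4},
    "dan": {"m1": 3, "m2": 4, "m3": 5},
}
# Hypothetical pre-computed clusters of similar users.
CLUSTERS = [{"ann", "bob"}, {"cat", "dan"}]


def cluster_of(user, clusters):
    """Return the pre-computed cluster that contains the user."""
    return next(c for c in clusters if user in c)


def predict(user, item, ratings, clusters):
    """Average rating given to the item by the user's cluster peers, or None."""
    peers = cluster_of(user, clusters) - {user}
    scores = [ratings[p][item] for p in peers if item in ratings[p]]
    return mean(scores) if scores else None


def group_top_k(group, items, ratings, clusters, k):
    """Rank items by the average of the members' predictions, best first."""
    scored = []
    for item in items:
        preds = [predict(u, item, ratings, clusters) for u in group]
        # Keep only items for which every group member has a prediction.
        if all(p is not None for p in preds):
            scored.append((mean(preds), item))
    return heapq.nlargest(k, scored)
```

For the group `["ann", "cat"]`, only the peers `bob` and `dan` are consulted rather than the whole user base, which is the saving that pre-partitioning is meant to provide. The sketch does not filter out items a member has already rated.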

very large data bases | 2010

Identifying the most influential data objects with reverse top-k queries

Akrivi Vlachou; Christos Doulkeridis; Kjetil Nørvåg; Yannis Kotidis

Top-k queries are widely applied for retrieving a ranked set of the k most interesting objects based on the individual user preferences. As an example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing top-k queries leads to a query type that instead returns the set of customers that find a product appealing (it belongs to the top-k result set of their preferences). In this paper, we address the challenging problem of processing queries that identify the top-m most influential products to customers, where influence is defined as the cardinality of the reverse top-k result set. This definition of influence is useful for market analysis, since it is directly related to the number of customers that value a particular product and, consequently, to its visibility and impact in the market. Existing techniques require processing a reverse top-k query for each object in the database, which is prohibitively expensive even for databases of moderate size. In contrast, we propose two algorithms, SB and BB, for identifying the most influential objects: SB restricts the candidate set of objects that need to be examined, while BB is a branch-and-bound algorithm that retrieves the result incrementally. Furthermore, we propose meaningful variations of the query for most influential objects that are supported by our algorithms. Our experiments demonstrate the efficiency of our algorithms both for synthetic and real-life datasets.

international conference on management of data | 2013

Branch-and-bound algorithm for reverse top-k queries

Akrivi Vlachou; Christos Doulkeridis; Kjetil Nørvåg; Yannis Kotidis

Top-k queries return to the user only the k best objects based on the individual user preferences and comprise an essential tool for rank-aware query processing. Assuming a stored data set of user preferences, reverse top-k queries have been introduced for retrieving the users that deem a given database object as one of their top-k results. Reverse top-k queries have already attracted significant interest in research, due to numerous real-life applications such as market analysis and product placement. Currently, the most efficient algorithm for computing the reverse top-k set is RTA. RTA has two main drawbacks when processing a reverse top-k query: (i) it needs to access all stored user preferences, and (ii) it cannot avoid executing a top-k query for each user preference that belongs to the result set. To address these limitations, in this paper, we identify useful properties for processing reverse top-k queries without accessing each user's individual preferences or executing the top-k query. We propose an intuitive branch-and-bound algorithm for processing reverse top-k queries efficiently and discuss novel optimizations to boost its performance. Our experimental evaluation demonstrates the efficiency of the proposed algorithm that outperforms RTA by a large margin.