Noam Koenigstein
Tel Aviv University
Publication
Featured research published by Noam Koenigstein.
conference on recommender systems | 2011
Noam Koenigstein; Gideon Dror; Yehuda Koren
In the past decade, large-scale recommendation datasets were published and extensively studied. In this work we describe a detailed analysis of a sparse, large-scale dataset, specifically designed to push the envelope of recommender system models. The Yahoo! Music dataset consists of more than a million users, 600 thousand musical items and more than 250 million ratings, collected over a decade. It is characterized by three unique features. First, rated items are multi-typed, including tracks, albums, artists and genres. Second, items are arranged within a four-level taxonomy, which proves effective in coping with a severe sparsity problem that originates from the unusually large number of items (compared to, e.g., movie ratings datasets). Finally, fine-resolution timestamps associated with the ratings enable a comprehensive temporal and session analysis. We further present a matrix factorization model exploiting the special characteristics of this dataset. In particular, the model incorporates a rich bias model with terms that capture information from the taxonomy of items and different temporal dynamics of music ratings. To gain additional insights into its properties, we organized the KddCup-2011 competition about this dataset. As the competition drew thousands of participants, we expect the dataset to attract considerable research activity in the future.
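As a rough illustration of the kind of bias model the abstract above describes, the following sketch predicts a rating as a global mean plus user, item, and taxonomy-ancestor biases, plus a latent-factor interaction. The function name, the argument names, and the purely additive form are assumptions made for illustration; the abstract does not spell out the actual model, and temporal terms are omitted here.

```python
def predict_rating(global_mean, user_bias, item_bias, ancestor_biases,
                   user_factors, item_factors):
    """Hypothetical additive predictor (an assumption, not the paper's model).

    ancestor_biases holds the biases of the item's taxonomy ancestors,
    e.g. the album, artist, and genre of a track.
    """
    # Latent interaction: dot product of user and item factor vectors.
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    # Taxonomy biases let a sparsely rated item borrow signal from its parents.
    taxonomy_term = sum(ancestor_biases)
    return global_mean + user_bias + item_bias + taxonomy_term + interaction
```

Sharing bias terms across the taxonomy is one plausible way a rarely rated track could still receive a sensible prediction from its album or artist.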
conference on information and knowledge management | 2012
Noam Koenigstein; Parikshit Ram; Yuval Shavitt
Low-rank Matrix Factorization (MF) methods provide one of the simplest and most effective approaches to collaborative filtering. This paper is the first to investigate the problem of efficient retrieval of recommendations in an MF framework. We reduce the retrieval in an MF model to an apparently simple task of finding the maximum dot-product for the user vector over the set of item vectors. However, to the best of our knowledge, the problem of efficiently finding the maximum dot-product in the general case has never been studied. To this end, we propose two techniques for efficient search: (i) We index the item vectors in a binary spatial-partitioning metric tree and use a simple branch-and-bound algorithm with a novel bounding scheme to efficiently obtain exact solutions. (ii) We use spherical clustering to index the users on the basis of their preferences and pre-compute recommendations only for the representative user of each cluster to obtain extremely efficient approximate solutions. We obtain a theoretical error bound which determines the quality of any approximate result and use it to control the approximation. These two simple techniques are fairly independent of each other and hence are easily combined to further improve recommendation retrieval efficiency. We evaluate our algorithms on real-world collaborative-filtering datasets, demonstrating more than ×7 speedup (with respect to the naive linear search) for the exact solution and over ×250 speedup for approximate solutions by combining both techniques.
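The exact-search idea above can be sketched as a small branch-and-bound over a ball tree. This is a minimal sketch under stated assumptions: the names `Ball`, `upper_bound`, and `tree_search` are hypothetical, the split rule is a simplification, and the bound is the generic Cauchy-Schwarz ball bound, which may differ from the bounding scheme in the paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def linear_search(query, items):
    """Naive baseline: score every item vector against the query."""
    return max(range(len(items)), key=lambda i: dot(query, items[i]))

class Ball:
    """Tree node covering a set of item indices with a center and radius."""
    def __init__(self, items, indices, leaf_size=8):
        self.indices = indices
        dim = len(items[0])
        self.center = [sum(items[i][d] for i in indices) / len(indices)
                       for d in range(dim)]
        self.radius = max(
            norm([items[i][d] - self.center[d] for d in range(dim)])
            for i in indices)
        self.children = []
        if len(indices) > leaf_size:
            # Simplified split (an assumption): halve along the widest dimension.
            def spread(d):
                values = [items[i][d] for i in indices]
                return max(values) - min(values)
            axis = max(range(dim), key=spread)
            ordered = sorted(indices, key=lambda i: items[i][axis])
            mid = len(ordered) // 2
            self.children = [Ball(items, ordered[:mid], leaf_size),
                             Ball(items, ordered[mid:], leaf_size)]

def upper_bound(query, ball):
    # q.p = q.c + q.(p - c) <= q.c + ||q|| * radius for every p in the ball.
    return dot(query, ball.center) + norm(query) * ball.radius

def tree_search(query, items, root):
    """Return (index, score) of the item with the largest dot product."""
    best_idx, best_score = -1, -math.inf
    stack = [root]
    while stack:
        node = stack.pop()
        if upper_bound(query, node) <= best_score:
            continue  # nothing in this ball can beat the current best
        if not node.children:
            for i in node.indices:
                score = dot(query, items[i])
                if score > best_score:
                    best_idx, best_score = i, score
        else:
            # Push the more promising child last so it is popped first.
            stack.extend(sorted(node.children,
                                key=lambda c: upper_bound(query, c)))
    return best_idx, best_score
```

Building `Ball(items, list(range(len(items))))` once and calling `tree_search` per user vector should return the same best score as `linear_search`, while skipping any ball whose bound cannot beat the current best.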
Proceedings of the IEEE | 2012
Gideon Dror; Noam Koenigstein; Yehuda Koren
Modern consumers are inundated with choices. A variety of products are offered to consumers, who have unprecedented opportunities to select products that meet their needs. The opportunity for selection also presents a time-consuming need to select. This has led to the development of recommender systems that direct consumers to products expected to satisfy them. One area in which such systems are particularly useful is that of media products, such as movies, books, television, and music. We study the details of media recommendation by focusing on a large-scale music recommender system. To this end, we introduce a music rating data set that is likely to be the largest of its kind in terms of number of users, number of items, and total number of raw ratings. The data were collected by Yahoo! Music over a decade. We formulate a detailed recommendation model, specifically designed to account for the data set properties, its temporal dynamics, and the provided taxonomy of items. The paper demonstrates a design process that we believe to be useful in many other recommendation setups. The process is based on gradual modeling of additive components of the model, each trying to reflect a unique characteristic of the data.
knowledge discovery and data mining | 2008
Noam Koenigstein; Yuval Shavitt; Tomer Tankel
Record label companies would like to identify potential artists as early as possible in their careers, before other companies approach the artists with competing contracts. The vast number of candidates makes the process of identifying the ones with high success potential time-consuming and laborious. This paper demonstrates how data mining of P2P query strings can be used to mechanize most of this detection process. Using a unique intercepting system over the Gnutella network, we were able to capture an unprecedented amount of geographically identified (geo-aware) queries, allowing us to investigate the diffusion of music-related queries in time and space. Our solution is based on the observation that emerging artists, especially rappers, have a discernible stronghold of fans in their hometown area, where they are able to perform and market their music. In a file sharing network, this is reflected as a delta function spatial distribution of content queries. Using this observation, we devised a detection algorithm for emerging artists that looks for performers with a sharp increase in popularity in a small geographic region though still unnoticeable nationwide. The algorithm can suggest a short list of artists with breakthrough potential, from which we showed that about 30% translate the potential to national success.
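The detection idea above, a sharp local rise that is still invisible nationally, can be sketched as a simple filter. Everything here is an assumption made for illustration: the function name `flag_emerging`, the week-over-week growth ratio, and both thresholds are hypothetical and are not taken from the paper.

```python
def flag_emerging(local_counts, national_share, min_growth=3.0, max_share=0.001):
    """Flag artists with sharp local growth but negligible national presence.

    local_counts: {artist: (previous_week_queries, current_week_queries)}
                  for a single geographic region.
    national_share: {artist: fraction of all nationwide queries}.
    Both thresholds are illustrative assumptions.
    """
    flagged = []
    for artist, (previous, current) in local_counts.items():
        # Guard against division by zero for artists with no earlier queries.
        growth = current / max(previous, 1)
        share = national_share.get(artist, 0.0)
        if growth >= min_growth and share <= max_share:
            flagged.append(artist)
    return sorted(flagged)
```

A real system would presumably smooth counts over longer windows and compare across many regions; this sketch only shows the two conditions being combined.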
international symposium on multimedia | 2009
Noam Koenigstein; Yuval Shavitt; Noa Zilberman
Peer-to-Peer networks are the leading cause of music piracy but are also used for music sampling prior to purchase. In this paper we investigate the relations between music file sharing and sales (both physical and digital) using large Peer-to-Peer query database information. We compare file sharing information on songs to their popularity on the Billboard Hot 100 and the Billboard Digital Songs charts, and show that popularity trends of songs on the Billboard charts have a very strong correlation (0.88-0.89) to their popularity on a Peer-to-Peer network. We then show how this correlation can be utilized by common data mining algorithms to predict a song's success on the Billboard charts in advance, using Peer-to-Peer information.
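For readers who want to reproduce this kind of comparison on their own data, a correlation between two popularity series can be computed as below. This sketch assumes the Pearson coefficient; the abstract does not say which correlation measure was used, and the function name `pearson` is illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and variances are left unnormalized; the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` might hold a song's weekly query counts on the P2P network and `ys` its weekly chart popularity over the same weeks; both inputs must vary, or the denominator is zero.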
Computer Networks | 2012
Noam Koenigstein; Yuval Shavitt
Record labels would like to identify potential artists as early as possible in their career, before other companies approach the artists with competing contracts. However, there is a huge number of new artists, and the process of identifying the ones with high success potential is labor-intensive. This paper demonstrates how data mining in P2P networks can be used together with social marketing theories to mechanize most of this detection process. Using a unique intercepting system over the Gnutella network, we captured an unprecedented amount of geographically identified queries, allowing us to investigate the diffusion of music-related content in time and space. Our solution is based on the observation that successful artists start by growing a discernible stronghold of fans in their hometown area, where they are able to perform and market their music. Only then do they manage to break through to national fame. In a file sharing network, their initial local success is reflected as a delta function spatial distribution of content queries. Using this observation, we devised a detection algorithm for emerging artists that suggests a short list of artists with breakthrough potential, from which we showed that about 30% translate the potential to national success.
international conference on multimedia and expo | 2010
Noam Koenigstein; Yuval Shavitt; Tomer Tankel; Ela Weinsberg; Udi Weinsberg
The usage of peer-to-peer (P2P) networks for music information retrieval (MIR) tasks is gaining momentum. P2P file sharing networks can be used for collecting both search queries and files from shared folders. The former can be utilized to reveal current taste, user interests, and trends, while the latter can be used for enhancing recommender systems. Both provide opportunities for longitudinal analysis, as queries change over time and content often accumulates. Moreover, spatial analysis can expose cultural differences and the way trends propagate. However, tapping into this fountain of information is far from trivial. This paper presents a novel analysis of the shared folders data-set collected from the Gnutella network. We first present the framework for crawling the network and collecting the data. We then present some data-set characteristics, while focusing on music similarities. The paper sheds light on both the opportunities of using P2P data and its complexities.
international conference on peer-to-peer computing | 2010
Pavel Gurvich; Noam Koenigstein; Yuval Shavitt
This paper investigates the Direct Connect (DC) file sharing network, which, to the best of our knowledge, has never been academically studied before. We developed a participating agent in order to gather protocol-specific information. We quantify network characteristics such as the distribution of users in hubs, hub geography, query distribution and trends in shared folder size. We also characterize the typical DC user: a heavy downloader with a particularly large shared folder. Most importantly, we discovered a query duplication problem that drains much of the hubs' CPU and bandwidth resources. In the DC network, query facilitation is the most demanding task for hubs and the main factor in the protocol's scalability challenges. We show that in some hubs, up to a third of the query traffic is duplicated and therefore wasteful. Resolving this problem will dramatically improve hub performance by reducing the number of relayed queries and thus permitting larger hub communities.
knowledge discovery and data mining | 2011
Gideon Dror; Noam Koenigstein; Yehuda Koren; Markus Weimer
conference on recommender systems | 2013
Noam Koenigstein; Yehuda Koren