Publication


Featured research published by Slava Novgorodov.


Conference on Information and Knowledge Management | 2011

Diversification and refinement in collaborative filtering recommender

Rubi Boim; Tova Milo; Slava Novgorodov

This paper considers a popular class of recommender systems that are based on Collaborative Filtering (CF) and proposes a novel technique for diversifying the recommendations that they give to users. Items are clustered based on a unique notion of priority-medoids that provides a natural balance between the need to present highly ranked items vs. highly diverse ones. Our solution estimates item diversity by comparing the rankings that different users gave to the items, thereby enabling diversification even in common scenarios where no semantic information on the items is available. It also provides a natural zoom-in mechanism to focus on items (clusters) of interest and recommend diversified similar items. We present DiRec, a plug-in that implements the above concepts and allows CF recommender systems to diversify their recommendations. We illustrate the operation of DiRec in the context of a movie recommendation system and present a thorough experimental study that demonstrates the effectiveness of our recommendation diversification technique and its superiority over previous solutions.
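
To make the ranking-based diversity idea concrete, here is a minimal Python sketch. It is not the paper's priority-medoids algorithm: the distance function, the 1-5 rating scale, the greedy selection rule, and all data are assumptions made for illustration only.

```python
def item_distance(ratings_a, ratings_b):
    """Diversity of two items from ratings alone: average disagreement of
    users who rated both (assumes a 1-5 scale; both args map user -> rating)."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 1.0  # no shared raters: treat the pair as maximally diverse
    gap = sum(abs(ratings_a[u] - ratings_b[u]) for u in common)
    return gap / (4.0 * len(common))  # normalize by the largest possible gap

def diversify(candidates, ratings, k, min_dist=0.2):
    """Greedy stand-in for priority-medoid clustering: walk the candidates in
    rank order and keep an item only if it is far from everything kept so far."""
    selected = [candidates[0]]  # the top-ranked item is always recommended
    for item in candidates[1:]:
        if len(selected) == k:
            break
        if min(item_distance(ratings[item], ratings[s]) for s in selected) >= min_dist:
            selected.append(item)
    return selected

# Hypothetical data: three movies, ratings keyed by user id.
ratings = {
    "m1": {"u1": 5, "u2": 4},
    "m2": {"u1": 5, "u2": 4},   # rated like m1 -> low diversity vs. m1
    "m3": {"u1": 1, "u2": 2},   # rated unlike m1 -> high diversity
}
print(diversify(["m1", "m2", "m3"], ratings, k=2))  # ['m1', 'm3']
```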


International Conference on Management of Data | 2014

OASSIS: query driven crowd mining

Yael Amsterdamer; Susan B. Davidson; Tova Milo; Slava Novgorodov; Amit Somech

Crowd data sourcing is increasingly used to gather information from the crowd and to obtain recommendations. In this paper, we explore a novel approach that broadens crowd data sourcing by enabling users to pose general questions, to mine the crowd for potentially relevant data, and to receive concise, relevant answers that represent frequent, significant data patterns. Our approach is based on (1) a simple generic model that captures both ontological knowledge and the individual history or habits of crowd members from which frequent patterns are mined; (2) a query language in which users can declaratively specify their information needs and the data patterns of interest; (3) an efficient query evaluation algorithm, which enables mining semantically concise answers while minimizing the number of questions posed to the crowd; and (4) an implementation of these ideas that mines the crowd through an interactive user interface. Experimental results with both real-life crowd and synthetic data demonstrate the feasibility and effectiveness of the approach.
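
The question-minimization in step (3) rests on support monotonicity: a pattern can be frequent only if all of its sub-patterns are, so infrequent sub-patterns prune away crowd questions before they are asked. Below is a generic Apriori-style sketch of that principle, not the OASSIS evaluation algorithm; ask_crowd and its simulated answers are placeholders for the interactive crowd interface.

```python
from itertools import combinations

def ask_crowd(pattern):
    """Placeholder for posing a support question to crowd members.
    Here we simulate answers with a fixed table; a real system would
    aggregate live responses."""
    simulated_support = {
        frozenset({"jogging"}): 0.6,
        frozenset({"park"}): 0.7,
        frozenset({"jogging", "park"}): 0.5,
    }
    return simulated_support.get(frozenset(pattern), 0.0)

def mine_frequent(items, min_support):
    """Level-wise mining: prune any candidate with an infrequent sub-pattern
    so the crowd is never asked about it."""
    frequent = [frozenset({i}) for i in items if ask_crowd({i}) >= min_support]
    level, result = frequent, list(frequent)
    while level:
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        # monotonicity pruning: skip candidates with an infrequent subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in result for s in combinations(c, len(c) - 1))}
        level = [c for c in candidates if ask_crowd(c) >= min_support]
        result.extend(level)
    return result

print(mine_frequent(["jogging", "park"], min_support=0.5))
```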


International Conference on Management of Data | 2015

Query-Oriented Data Cleaning with Oracles

Moria Bergman; Tova Milo; Slava Novgorodov; Wang Chiew Tan

As key decisions are often made based on information contained in a database, it is important for the database to be as complete and correct as possible. For this reason, many data cleaning tools have been developed to automatically resolve inconsistencies in databases. However, data cleaning tools provide only best-effort results and usually cannot eradicate all errors that may exist in a database. Even more importantly, existing data cleaning tools do not typically address the problem of determining what information is missing from a database. To overcome the limitations of existing data cleaning techniques, we present QOCO, a novel query-oriented system for cleaning data with oracles. Under this framework, incorrect (resp. missing) tuples are removed from (resp. added to) the result of a query through edits that are applied to the underlying database, where the edits are derived by interacting with domain experts whom we model as oracle crowds. We show that the problem of determining minimal interactions with oracle crowds to derive database edits for removing incorrect (resp. adding missing) tuples from (resp. to) the result of a query is NP-hard in general, and we present heuristic algorithms that interact with oracle crowds. Finally, we implement our algorithms in our prototype system QOCO and show that it is effective and efficient through a comprehensive suite of experiments.
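
A toy illustration of the oracle interaction, using an invented World Cup example in the spirit of the companion demo: the loop below removes one wrong answer and adds one missing one, ignoring the question-minimization that is the paper's actual contribution. The data, the query, and the oracle are all hypothetical.

```python
# Toy database with one seeded error: Germany, not Brazil, won in 2014.
wins = {("Brazil", 2014), ("Spain", 2010)}

def winners_2014():
    """The user query being cleaned: teams that won the 2014 World Cup."""
    return {team for (team, year) in wins if year == 2014}

def oracle(fact):
    """Stand-in for the domain-expert crowd: answers membership questions."""
    ground_truth = {("Germany", 2014), ("Spain", 2010)}
    return fact in ground_truth

# Simplified QOCO-style loop: delete the base tuples behind wrong answers,
# then add answers the oracle says are missing.
for team in list(winners_2014()):
    if not oracle((team, 2014)):
        wins.discard((team, 2014))      # edit the underlying database
# (a real system would elicit the missing value from the oracle
#  rather than hard-code it as below)
if "Germany" not in winners_2014() and oracle(("Germany", 2014)):
    wins.add(("Germany", 2014))

print(winners_2014())  # {'Germany'}
```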


International Conference on Data Engineering | 2011

DiRec: Diversified recommendations for semantic-less Collaborative Filtering

Rubi Boim; Tova Milo; Slava Novgorodov

In this demo we present DiRec, a plug-in that allows Collaborative Filtering (CF) recommender systems to diversify the recommendations that they present to users. DiRec estimates item diversity by comparing the rankings that different users gave to the items, thereby enabling diversification even in common scenarios where no semantic information on the items is available. Items are clustered based on a novel notion of priority-medoids that provides a natural balance between the need to present highly ranked items vs. highly diverse ones. We demonstrate the operation of DiRec in the context of a movie recommendation system. We show the advantage of recommendation diversification and its feasibility even in the absence of semantic information.


Very Large Data Bases | 2016

Rudolf: interactive rule refinement system for fraud detection

Tova Milo; Slava Novgorodov; Wang-Chiew Tan

Credit card fraud consists of unauthorized transactions that are made or attempted by a person or organization not authorized by the card holder. In addition to machine learning-based techniques, credit card companies often employ domain experts to manually specify rules that exploit domain knowledge for improving the detection process. Over time, however, as new (fraudulent and legitimate) transactions arrive, these rules need to be updated and refined to capture the evolving (fraud and legitimate) activity patterns. The goal of the RUDOLF system that is demonstrated here is to guide and assist domain experts in this challenging task. RUDOLF automatically determines the best set of candidate adaptations to existing rules to capture all fraudulent transactions and omit all legitimate ones. The proposed modifications can then be further refined by domain experts based on their domain knowledge, and the process can be repeated until the experts are satisfied with the resulting rules. Our experimental results on real-life datasets demonstrate the effectiveness and efficiency of our approach. We showcase RUDOLF with two demonstration scenarios: detecting credit card fraud and network attacks. Our demonstration will engage the VLDB audience by allowing them to play the role of a security expert, a credit card fraudster, or a network attacker.
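
The capture-all-frauds / omit-all-legitimate objective can be illustrated on the simplest possible rule, a single amount threshold. This sketch shows only that objective, not RUDOLF's adaptation search; the rule shape and the data are invented for the example.

```python
def refine_threshold(transactions, threshold):
    """Rule under refinement: flag a transaction as fraud if amount >= threshold.
    Lower the threshold just enough that every known fraud is flagged.
    transactions: list of (amount, is_fraud) pairs."""
    fraud_amounts = [amt for amt, is_fraud in transactions if is_fraud]
    if not fraud_amounts:
        return threshold
    return min(threshold, min(fraud_amounts))

def false_positives(transactions, threshold):
    """Legitimate transactions the refined rule would wrongly flag."""
    return [amt for amt, is_fraud in transactions if not is_fraud and amt >= threshold]

history = [(50, False), (700, False), (900, True), (1200, True)]
t = refine_threshold(history, threshold=1000)
print(t, false_positives(history, t))  # 900 [] -> catches both frauds, flags no one
```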


Very Large Data Bases | 2014

Ontology assisted crowd mining

Yael Amsterdamer; Susan B. Davidson; Tova Milo; Slava Novgorodov; Amit Somech

We present OASSIS (for Ontology ASSISted crowd mining), a prototype system which allows users to declaratively specify their information needs, and mines the crowd for answers. The answers that the system computes are concise and relevant, and represent frequent, significant data patterns. The system is based on (1) a generic model that captures both ontological knowledge and the individual knowledge of crowd members from which frequent patterns are mined; (2) a query language in which users can specify their information needs and the types of data patterns they seek; and (3) an efficient query evaluation algorithm, for mining semantically concise answers while minimizing the number of questions posed to the crowd. We will demonstrate OASSIS using a couple of real-life scenarios, showing how users can formulate and execute queries through the OASSIS UI and how the relevant data is mined from the crowd.


Very Large Data Bases | 2015

QOCO: a query oriented data cleaning system with oracles

Moria Bergman; Tova Milo; Slava Novgorodov; Wang Chiew Tan

As key decisions are often made based on information contained in a database, it is important for the database to be as complete and correct as possible. For this reason, many data cleaning tools have been developed to automatically resolve inconsistencies in databases. However, data cleaning tools provide only best-effort results and usually cannot eradicate all errors that may exist in a database. Even more importantly, existing data cleaning tools do not typically address the problem of determining what information is missing from a database. To tackle these problems, we present QOCO, a novel query-oriented cleaning system that leverages materialized views that are defined by user queries as a trigger for identifying the remaining incorrect/missing information. Given a user query, QOCO interacts with domain experts (which we model as oracle crowds) to identify potentially wrong or missing answers in the result of the user query, as well as to determine and correct the wrong data that is the cause for the error(s). We will demonstrate QOCO over a World Cup Games database, and illustrate the interaction between QOCO and the oracles. Our demo audience will play the role of oracles, and we show how QOCO's underlying operations and optimization mechanisms can effectively prune the search space and minimize the number of questions that need to be posed to accelerate the cleaning process.


European Symposium on Algorithms | 2015

The Temp Secretary Problem

Amos Fiat; Ilia Gorelik; Haim Kaplan; Slava Novgorodov

We consider a generalization of the secretary problem where contracts are temporary and last for a fixed duration γ. This models the online hiring of temporary employees, or online auctions for reusable resources. The problem is related to the question of finding a large independent set in a random unit interval graph.
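
As a toy illustration of the setting (arrival times in [0, 1], each hire blocking a slot for γ time units), here is a naive observe-then-hire rule in the spirit of the classical secretary algorithm. The observation cutoff and the rule itself are assumptions for the sketch, not the paper's algorithm or its guarantees.

```python
import random

def temp_hire(candidates, gamma, observe=0.25):
    """Naive online rule: watch the first `observe` fraction of the timeline,
    then hire any candidate whose value beats everything observed, provided
    the single slot is free. candidates: list of (arrival_time, value);
    a hire blocks the slot for `gamma` time units."""
    hires, busy_until, best_seen = [], 0.0, float("-inf")
    for arrival, value in sorted(candidates):  # process in arrival order
        if arrival < observe:
            best_seen = max(best_seen, value)  # observation phase: never hire
        elif arrival >= busy_until and value >= best_seen:
            hires.append((round(arrival, 2), round(value, 2)))
            busy_until = arrival + gamma       # slot occupied until this time
    return hires

random.seed(1)
cands = [(random.random(), random.random()) for _ in range(30)]
print(temp_hire(cands, gamma=0.1))
```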


International Workshop on the Web and Databases | 2018

Cleaning Data with Constraints and Experts

Ahmad Assadi; Tova Milo; Slava Novgorodov

Popular techniques for data cleaning use integrity constraints to identify errors in the data and to resolve them automatically, e.g. by using predefined priorities among possible updates and finding a minimal repair that resolves the violations. Such automatic solutions, however, cannot ensure the precision of the repairs, since they do not have enough evidence about the actual errors and may in fact lead to wrong results with respect to the ground truth. It has thus been suggested to use domain experts to examine the potential updates and choose which should be applied to the database. However, the sheer volume of the databases and the large number of possible updates that may resolve a given constraint violation can make such a manual examination prohibitively expensive. The goal of the DANCE system presented here is to help optimize the experts' work and reduce as much as possible the number of questions (update verifications) they need to address. Given a constraint violation, our algorithm identifies the suspicious tuples whose update may contribute (directly or indirectly) to the constraint resolution, as well as the possible dependencies among them. Using this information it builds a graph whose nodes are the suspicious tuples and whose weighted edges capture the likelihood that an error in one tuple occurs and affects the other. A PageRank-style algorithm then allows us to identify the most beneficial tuples to ask about first. Incremental graph maintenance is used to ensure interactive response times. We implemented our solution in the DANCE system and show its effectiveness and efficiency through a comprehensive suite of experiments.
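
The PageRank-style ranking step can be sketched on a tiny hypothetical suspicion graph. The graph, its weights, and the plain power iteration below are illustrative assumptions, not DANCE's algorithm, which also maintains the graph incrementally as the expert answers questions.

```python
def pagerank(graph, damping=0.85, iters=50):
    """Weighted PageRank by power iteration. graph: node -> {neighbor: weight},
    where a heavy edge means an error in the source likely affects the target.
    (Dangling-node mass is simply dropped in this sketch.)"""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for m in nodes:
            total = sum(graph[m].values())
            for n, w in graph[m].items():
                new[n] += damping * rank[m] * w / total
        rank = new
    return rank

# Hypothetical suspicious tuples involved in one constraint violation.
suspects = {
    "t1": {"t2": 0.9, "t3": 0.1},
    "t2": {"t1": 0.5},
    "t3": {"t4": 0.7},
    "t4": {},
}
ranks = pagerank(suspects)
# Ask the expert about the most 'central' suspicious tuple first.
print(sorted(suspects, key=lambda t: -ranks[t]))
```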


International Conference on Data Engineering | 2012

Asking the Right Questions in Crowd Data Sourcing

Rubi Boim; Ohad Greenshpan; Tova Milo; Slava Novgorodov; Neoklis Polyzotis; Wang Chiew Tan

Collaboration


Dive into Slava Novgorodov's collaboration.

Top Co-Authors

Wang Chiew Tan

University of California

Susan B. Davidson

University of Pennsylvania