Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ariel Fuxman is active.

Publication


Featured research published by Ariel Fuxman.


International World Wide Web Conference (WWW) | 2012

Active objects: actions for entity-centric search

Thomas Lin; Patrick Pantel; Michael Gamon; Anitha Kannan; Ariel Fuxman

We introduce an entity-centric search experience, called Active Objects, in which entity-bearing queries are paired with actions that can be performed on the entities. For example, given a query for a specific flashlight, we aim to present actions such as reading reviews, watching demo videos, and finding the best price online. In an annotation study conducted over a random sample of user query sessions, we found that a large proportion of queries in query logs involve actions on entities, calling for an automatic approach to identifying relevant actions for entity-bearing queries. In this paper, we pose the problem of finding actions that can be performed on entities as a problem of probabilistic inference in a graphical model that captures how an entity-bearing query is generated. We design models of increasing complexity that capture latent factors, such as entity type and intended action, that determine how a user writes a query in a search box and which URL they click on. Given a large collection of real-world queries and clicks from a commercial search engine, the models are learned efficiently through maximum likelihood estimation using an EM algorithm. Given a new query, probabilistic inference enables recommendation of a set of pertinent actions and hosts. We propose an evaluation methodology for measuring the relevance of our recommended actions, and show empirical evidence of the quality and the diversity of the discovered actions.
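
To make the modeling idea concrete, here is a minimal, hypothetical sketch of fitting a single latent "action" variable over (query, clicked host) pairs with EM. The paper's actual graphical models also capture entity type and are considerably richer; the function name em_latent_actions, the session layout, and the smoothing constants are illustrative assumptions, not the authors' implementation.

import numpy as np

def em_latent_actions(sessions, n_actions=5, n_iters=50, seed=0):
    """sessions: list of (query_tokens, clicked_host) pairs."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for toks, _ in sessions for w in toks})
    hosts = sorted({h for _, h in sessions})
    w_idx = {w: i for i, w in enumerate(vocab)}
    h_idx = {h: i for i, h in enumerate(hosts)}

    # Parameters: P(action), P(word | action), P(host | action)
    pi = np.full(n_actions, 1.0 / n_actions)
    theta_w = rng.dirichlet(np.ones(len(vocab)), size=n_actions)
    theta_h = rng.dirichlet(np.ones(len(hosts)), size=n_actions)

    for _ in range(n_iters):
        # E-step: responsibilities P(action | query, host) for each session
        resp = np.zeros((len(sessions), n_actions))
        for s, (toks, host) in enumerate(sessions):
            log_p = np.log(pi) + np.log(theta_h[:, h_idx[host]])
            for w in toks:
                log_p += np.log(theta_w[:, w_idx[w]])
            p = np.exp(log_p - log_p.max())
            resp[s] = p / p.sum()
        # M-step: re-estimate parameters from expected counts (with smoothing)
        pi = (resp.sum(axis=0) + 1e-6) / (len(sessions) + 1e-6 * n_actions)
        theta_w = np.full((n_actions, len(vocab)), 1e-3)
        theta_h = np.full((n_actions, len(hosts)), 1e-3)
        for s, (toks, host) in enumerate(sessions):
            for w in toks:
                theta_w[:, w_idx[w]] += resp[s]
            theta_h[:, h_idx[host]] += resp[s]
        theta_w /= theta_w.sum(axis=1, keepdims=True)
        theta_h /= theta_h.sum(axis=1, keepdims=True)
    return pi, theta_w, theta_h, vocab, hosts

For a new query, the learned per-action word and host distributions can be used to rank candidate actions and hosts, which is the spirit of the recommendation step described above.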


International World Wide Web Conference (WWW) | 2014

The wisdom of minority: discovering and targeting the right group of workers for crowdsourcing

Hongwei Li; Bo Zhao; Ariel Fuxman

Worker reliability is a longstanding issue in crowdsourcing, and the automatic discovery of high-quality workers is an important practical problem. Most previous work on this problem focuses on estimating the quality of each individual worker jointly with the true answer of each task. In practice, however, for some tasks, worker quality can be associated with explicit characteristics of the worker, such as education level, major, and age. So the following question arises: how do we automatically discover related worker attributes for a given task, and further utilize the findings to improve data quality? In this paper, we propose a general crowd targeting framework that can automatically discover, for a given task, whether any group of workers defined by their attributes has higher quality on average, and target such groups, if they exist, for future work on the same task. Our crowd targeting framework is complementary to traditional worker quality estimation approaches. Furthermore, an advantage of our framework is that it is more budget-efficient because we are able to target potentially good workers before they actually do the task. Experiments on real datasets show that the accuracy of final prediction can be improved significantly for the same budget (or even less budget in some cases). Our framework can be applied to many real-world tasks and can be easily integrated into current crowdsourcing platforms.
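
As a rough illustration of the targeting idea (not the paper's actual estimator), the sketch below groups workers by attribute value on a pilot batch with known answers and flags groups whose accuracy is notably above the overall average. The function find_target_groups, the record layout, and the thresholds are hypothetical.

import math
from collections import defaultdict

def find_target_groups(records, min_lift=0.05, min_workers=5):
    """records: one dict per worker, e.g.
    {"attrs": {"education": "PhD", "age": "25-34"}, "correct": 8, "total": 10}"""
    total_correct = sum(r["correct"] for r in records)
    total_answers = sum(r["total"] for r in records)
    baseline = total_correct / total_answers

    groups = defaultdict(lambda: [0, 0, 0])  # (attr, value) -> [correct, total, workers]
    for r in records:
        for attr, value in r["attrs"].items():
            g = groups[(attr, value)]
            g[0] += r["correct"]
            g[1] += r["total"]
            g[2] += 1

    targets = []
    for (attr, value), (correct, total, workers) in groups.items():
        if workers < min_workers:
            continue
        acc = correct / total
        se = math.sqrt(baseline * (1 - baseline) / total)  # simple one-sided z-test
        z = (acc - baseline) / se if se > 0 else 0.0
        if acc - baseline >= min_lift and z > 1.64:
            targets.append({"attr": attr, "value": value, "accuracy": acc, "z": z})
    return baseline, sorted(targets, key=lambda t: -t["accuracy"])

Workers matching the returned attribute groups would then be targeted for future batches of the same task before they do any work, which is what makes the approach budget-efficient.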


Knowledge Discovery and Data Mining (KDD) | 2011

Matching unstructured product offers to structured product specifications

Anitha Kannan; Inmar E. Givoni; Rakesh Agrawal; Ariel Fuxman

An e-commerce catalog typically comprises specifications for millions of products. The search engine receives millions of sales offers from thousands of independent merchants, and these offers must be matched to the right products. We describe the challenges that a system for matching unstructured offers to structured product descriptions must address, drawing upon our experience from building such a system for Bing Shopping. The heart of our system is a data-driven component that learns the matching function offline, which is then applied at run time for matching offers to products. We provide the design of this and other critical components of the system, as well as the details of the extensive experiments we performed to assess the readiness of the system. This system is currently deployed in an experimental Commerce Search Engine and is used to match all the offers received by Bing Shopping to the Bing product catalog.
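
The sketch below conveys the offline-learned matching function in spirit only: it featurizes (offer, product) pairs with simple text-overlap signals, trains a logistic regression classifier, and at run time keeps the best-scoring candidate above a threshold. The feature choices, names such as pair_features, and the data layout are assumptions, not the deployed system's design.

from sklearn.linear_model import LogisticRegression

def tokens(s):
    return set(s.lower().split())

def pair_features(offer_title, product):
    """product: dict with a 'title' string and structured 'attrs' (brand, model, ...)."""
    o, p = tokens(offer_title), tokens(product["title"])
    jaccard = len(o & p) / len(o | p) if (o | p) else 0.0
    attr_hits = sum(1 for v in product["attrs"].values() if tokens(str(v)) <= o)
    return [jaccard, attr_hits, len(o & p)]

def train_matcher(labeled_pairs):
    """labeled_pairs: list of (offer_title, product_dict, is_match) triples."""
    X = [pair_features(o, p) for o, p, _ in labeled_pairs]
    y = [int(m) for _, _, m in labeled_pairs]
    return LogisticRegression(max_iter=1000).fit(X, y)

def match_offer(model, offer_title, candidate_products, threshold=0.5):
    if not candidate_products:
        return None
    scored = [(model.predict_proba([pair_features(offer_title, p)])[0, 1], p)
              for p in candidate_products]
    score, best = max(scored, key=lambda t: t[0])
    return best if score >= threshold else None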


Very Large Data Bases (VLDB) | 2011

Synthesizing products for online catalogs

Hoa Nguyen; Ariel Fuxman; Stelios Paparizos; Juliana Freire; Rakesh Agrawal

A comprehensive product catalog is essential to the success of Product Search engines and shopping sites such as Yahoo! Shopping, Google Product Search, and Bing Shopping. Given the large number of products and the speed at which they are released to the market, keeping catalogs up to date becomes a challenging task, calling for automated techniques. In this paper, we introduce the problem of product synthesis, a key component of catalog creation and maintenance. Given a set of offers advertised by merchants, the goal is to identify new products and add them to the catalog, together with their (structured) attributes. A fundamental challenge in product synthesis is the scale of the problem. A Product Search engine receives data from thousands of merchants about millions of products; the product taxonomy contains thousands of categories, where each category has a different schema; and merchants use representations for products that are different from the ones used in the catalog of the Product Search engine. We propose a system that provides an end-to-end solution to the product synthesis problem and addresses the issues involved in data extraction from offers, schema reconciliation, and data fusion. For the schema reconciliation component, we developed a novel and scalable technique for schema matching that leverages knowledge about previously known instance-level associations between offers and products, and that is trained using automatically created training sets (no manually labeled data is needed). We present an experimental evaluation using data from Bing Shopping for more than 800K offers, a thousand merchants, and 400 categories. The evaluation confirms that our approach is able to automatically generate a large number of accurate product specifications. Furthermore, the evaluation shows that our schema reconciliation component outperforms state-of-the-art schema matching techniques in terms of precision and recall.
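
A minimal sketch of the instance-level intuition behind the schema reconciliation step, under assumed data structures: known offer-product matches vote for attribute-name correspondences whose values agree, so no manually labeled schema mappings are needed. The full component described in the paper is substantially more sophisticated, and learn_schema_mapping and its thresholds are illustrative.

from collections import defaultdict

def learn_schema_mapping(matched_pairs, min_support=3, min_agreement=0.6):
    """matched_pairs: (offer_attrs, product_attrs) dict pairs for known offer-product matches."""
    agree, seen = defaultdict(int), defaultdict(int)
    for offer_attrs, product_attrs in matched_pairs:
        for o_name, o_val in offer_attrs.items():
            for p_name, p_val in product_attrs.items():
                seen[(o_name, p_name)] += 1
                if str(o_val).strip().lower() == str(p_val).strip().lower():
                    agree[(o_name, p_name)] += 1
    best = {}  # offer attribute -> (catalog attribute, agreement score)
    for (o_name, p_name), n in seen.items():
        if n < min_support:
            continue
        score = agree[(o_name, p_name)] / n
        if score >= min_agreement and score > best.get(o_name, ("", 0.0))[1]:
            best[o_name] = (p_name, score)
    return {o: p for o, (p, _) in best.items()}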


Knowledge Discovery and Data Mining (KDD) | 2009

Improving classification accuracy using automatically extracted training data

Ariel Fuxman; Anitha Kannan; Andrew B. Goldberg; Rakesh Agrawal; Panayiotis Tsaparas; John C. Shafer

Classification is a core task in knowledge discovery and data mining, and there has been substantial research effort in developing sophisticated classification models. In a parallel thread, recent work from the NLP community suggests that for tasks such as natural language disambiguation even a simple algorithm can outperform a sophisticated one, if it is provided with large quantities of high quality training data. In those applications, training data occurs naturally in text corpora, and high quality training data sets running into billions of words have been reportedly used. We explore how we can apply the lessons from the NLP community to KDD tasks. Specifically, we investigate how to identify data sources that can yield training data at low cost and study whether the quantity of the automatically extracted training data can compensate for its lower quality. We carry out this investigation for the specific task of inferring whether a search query has commercial intent. We mine toolbar and click logs to extract queries from sites that are predominantly commercial (e.g., Amazon) and non-commercial (e.g., Wikipedia). We compare the accuracy obtained using such training data against manually labeled training data. Our results show that we can have large accuracy gains using automatically extracted training data at much lower cost.
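
The sketch below captures the core recipe under stated assumptions: queries whose clicks land on predominantly commercial domains become positive examples of commercial intent, clicks on non-commercial domains become negatives, and a standard text classifier is trained on the result. The seed domain lists, the model choice, and the function names are illustrative, not the paper's exact setup.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

COMMERCIAL_SITES = {"amazon.com", "ebay.com"}          # assumed seed list
NON_COMMERCIAL_SITES = {"wikipedia.org", "imdb.com"}   # assumed seed list

def auto_label(click_log):
    """click_log: list of (query, clicked_domain) pairs mined from toolbar/click logs."""
    X, y = [], []
    for query, domain in click_log:
        if domain in COMMERCIAL_SITES:
            X.append(query)
            y.append(1)
        elif domain in NON_COMMERCIAL_SITES:
            X.append(query)
            y.append(0)
    return X, y  # clicks on other domains are simply dropped

def train_commercial_intent(click_log):
    X, y = auto_label(click_log)
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
    return model.fit(X, y)

Because the labels come for free from the logs, the training set can be made very large, which is the trade-off against label quality that the paper investigates.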


International Conference on Database Theory (ICDT) | 2010

Composing local-as-view mappings: closure and applications

Patricia C. Arocena; Ariel Fuxman; Renée J. Miller

Schema mapping composition is a fundamental operation in schema management and data exchange. The mapping composition problem has been extensively studied for a number of mapping languages, most notably source-to-target tuple-generating dependencies (s-t tgds). An important class of s-t tgds are local-as-view (LAV) tgds. This class of mappings is prevalent in practical data integration and exchange systems, and recent work by ten Cate and Kolaitis shows that such mappings possess desirable structural properties. It is known that s-t tgds are not closed under composition: given two mappings expressed with s-t tgds, their composition may not be definable by any set of s-t tgds (and, in general, may not be expressible in first-order logic). Despite their importance and extensive use in data integration and exchange systems, the closure properties of LAV composition had remained open. The most important contribution of this paper is to show that LAV tgds are closed under composition, and to provide an algorithm that directly computes the composition. An important application of our composition result is that it helps to understand whether, given a LAV mapping Mst from schema S to schema T and a LAV mapping Mts from schema T back to S, the composition of Mst and Mts is able to recover the information in any instance of S. Arenas et al. formalized this notion and showed that mappings given by general s-t tgds always have a recovery; hence, a LAV mapping always has a recovery. However, the problem of testing whether a given Mts is a recovery of Mst is known to be undecidable for general s-t tgds. In contrast, in this paper we show that the problem is tractable for LAV mappings, and give a polynomial-time algorithm to solve it.
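
As a toy illustration of the closure result (this example is ours, not taken from the paper): let Mst consist of the single LAV tgd E(x,y) → F(x,y) from source schema {E} to intermediate schema {F}, and let Mts consist of the LAV tgd F(x,y) → ∃z (G(x,z) ∧ G(z,y)) from {F} to target schema {G}. Taking the minimal intermediate instance F = E shows that the composition of Mst and Mts is definable by the single LAV tgd E(x,y) → ∃z (G(x,z) ∧ G(z,y)): the target must provide, for every source fact E(x,y), a two-step path from x to y through some witness z. The composition algorithm in the paper handles the general case, where the interaction between the two mappings is far less direct.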


IEEE Transactions on Knowledge and Data Engineering | 2013

TACI: Taxonomy-Aware Catalog Integration

Panagiotis Papadimitriou; Panayiotis Tsaparas; Ariel Fuxman; Lise Getoor

A fundamental data integration task faced by online commercial portals and commerce search engines is the integration of products coming from multiple providers into their product catalogs. In this scenario, the commercial portal has its own taxonomy (the “master taxonomy”), while each data provider organizes its products into a different taxonomy (the “provider taxonomy”). In this paper, we consider the problem of categorizing products from the data providers into the master taxonomy, while making use of the provider taxonomy information. Our approach is based on a taxonomy-aware processing step that adjusts the results of a text-based classifier to ensure that products that are close together in the provider taxonomy remain close in the master taxonomy. We formulate this intuition as a structured prediction optimization problem. To the best of our knowledge, this is the first approach that leverages the structure of taxonomies in order to enhance catalog integration. We propose algorithms that are scalable and thus applicable to the large data sets that are typical on the web. We evaluate our algorithms on real-world data and show that taxonomy-aware classification provides a significant improvement over existing approaches.
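
The sketch below conveys the taxonomy-aware intuition with a much simpler heuristic than the structured prediction formulation in the paper: each product's base classifier scores are smoothed toward the average scores of the other products in its provider-taxonomy node, so that products that are close in the provider taxonomy tend to receive the same master category. The names and the blending scheme are assumptions.

import numpy as np
from collections import defaultdict

def taxonomy_aware_adjust(base_scores, provider_node, alpha=0.5):
    """base_scores: product_id -> np.array of scores over master categories.
    provider_node: product_id -> provider-taxonomy node id.
    alpha: weight on the individual text-based scores vs. the node average."""
    by_node = defaultdict(list)
    for pid, node in provider_node.items():
        by_node[node].append(pid)

    node_avg = {node: np.mean([base_scores[pid] for pid in pids], axis=0)
                for node, pids in by_node.items()}

    assignment = {}
    for pid, scores in base_scores.items():
        blended = alpha * scores + (1 - alpha) * node_avg[provider_node[pid]]
        assignment[pid] = int(np.argmax(blended))  # index of chosen master category
    return assignment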


International World Wide Web Conference (WWW) | 2014

Contextual insights

Ariel Fuxman; Patrick Pantel; Yuanhua Lv; Ashok K. Chandra; Pradeep Chilakamarri; Michael Gamon; David W. Hamilton; Bernhard Kohlmeier; Dhyanesh Narayanan; Evangelos E. Papalexakis; Bo Zhao

In today's productivity environment, users are constantly researching topics while consuming or authoring content in applications such as e-readers, word processors, presentation programs, or social networks. However, none of these applications sufficiently enables users to do their research directly within the application. In fact, users typically have to switch to a browser and write a query on a search engine. Switching to a search engine is distracting and hurts productivity. More importantly, the search engine is not aware of important user context such as the book that they are reading or the document they are authoring. To tackle this problem, we introduce the notion of contextual insights: providing users with information that is contextually relevant to the content that they are consuming or authoring. We then present Leibniz, a system that provides a solution to the contextual insights problem.


Extending Database Technology (EDBT) | 2011

Link-based hidden attribute discovery for objects on Web

Jiuming Huang; Haixun Wang; Yan Jia; Ariel Fuxman

Information extraction from the Web is of growing importance. Objects on the Web are often associated with many attributes that describe them. It is essential to extract these attributes and map them to their corresponding objects. However, much attribute information about an object is hidden in dynamic user interaction and does not appear on the Web page that describes the object. Existing information extraction approaches focus on the object's Web page only, which means a lot of attribute information is lost. In this paper, we study dynamic user interaction on exploratory search Web sites and propose a novel link-based approach to discover attributes and map them to objects. We build an exploratory search model for such Web sites, and we propose algorithms for identifying, clustering, and mining relationships among related Web pages based on the model. Using the unsupervised method in our approach, we are able to discover hidden attributes not explicitly shown on object Web pages. We test our approach on two online shopping Web sites and achieve high precision and recall: for entirely crawled Web sites, precision and recall are 98% and 97%, respectively; for randomly crawled (sampled) Web sites, precision and recall are 98% and 80%, respectively.
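
A hypothetical sketch of the link-based idea: on exploratory shopping sites, an object page that is reachable from a faceted-navigation page whose URL encodes a filter (for example ?color=red) can inherit that filter as a hidden attribute, even though the attribute never appears on the object's own page. The crawl format, the parameter stop-list, and the function name are assumptions, not the paper's algorithm.

from urllib.parse import urlparse, parse_qs
from collections import defaultdict

NAVIGATION_PARAMS = {"page", "sort", "offset"}  # assumed non-attribute parameters

def discover_hidden_attributes(link_graph):
    """link_graph: list of (listing_page_url, object_page_url) edges from a crawl."""
    attributes = defaultdict(dict)
    for listing_url, object_url in link_graph:
        filters = parse_qs(urlparse(listing_url).query)
        for name, values in filters.items():
            if name in NAVIGATION_PARAMS:
                continue
            attributes[object_url][name] = values[0]
    return dict(attributes)

For instance, the edge ("https://shop.example.com/flashlights?color=red&page=2", "https://shop.example.com/item/123") would attach color=red to item 123, even if that page never states the color.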


International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) | 2015

In Situ Insights

Yuanhua Lv; Ariel Fuxman

When consuming content in applications such as e-readers, word processors, and Web browsers, users often see mentions of topics (or concepts) that attract their attention. In a scenario of significant practical interest, topics are explored in situ, without leaving the context of the application: the user selects a mention of a topic (in the form of continuous text), and the system subsequently recommends references (e.g., Wikipedia concepts) that are relevant in the context of the application. In order to realize this experience, it is necessary to tackle several challenges: users may select any continuous text, even potentially noisy text for which there is no corresponding reference in the knowledge base; references must be relevant to both the user selection and the text around it; and the real estate available in the application may be constrained, thus limiting the number of results that can be shown. In this paper, we study this novel recommendation task, which we call in situ insights: recommending reference concepts in response to a text selection and its context, in situ within a document-consumption application. We first propose a selection-centric context language model and a selection-centric context semantic model to capture user interest. Based on these models, we then measure the quality of a reference concept across three aspects: selection clarity, context coherence, and concept relevance. By leveraging all these aspects, we put forward a machine learning approach to simultaneously decide whether a selection is noisy and filter out low-quality candidate references. In order to quantitatively evaluate our proposed techniques, we construct a test collection based on a simulation of the in situ insights scenario, using crowdsourcing in the context of a real-world e-reader application. Our experimental evaluation demonstrates the effectiveness of the proposed techniques.
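
The sketch below approximates the three quality signals with simple bag-of-words statistics; the selection-centric language and semantic models in the paper are more elaborate, and insight_features, its inputs, and the idf-based clarity proxy are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def insight_features(selection, context, concept_description, corpus):
    """corpus: a list of background documents used to estimate term statistics."""
    vec = TfidfVectorizer().fit(corpus + [selection, context, concept_description])
    m = vec.transform([selection, context, concept_description])
    s, c, d = m[0], m[1], m[2]

    # Selection clarity: average idf of the selected terms (specific terms read as clearer)
    idf = dict(zip(vec.get_feature_names_out(), vec.idf_))
    terms = [t for t in vec.build_analyzer()(selection) if t in idf]
    clarity = sum(idf[t] for t in terms) / len(terms) if terms else 0.0

    # Context coherence: how well the selection fits the text around it
    coherence = float(cosine_similarity(s, c)[0, 0])

    # Concept relevance: how well the candidate concept matches selection plus context
    relevance = float(cosine_similarity(d, s + c)[0, 0])
    return {"clarity": clarity, "coherence": coherence, "relevance": relevance}

Features like these would then feed the kind of classifier described above, which discards noisy selections and filters out low-quality candidate references.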

Collaboration


Dive into Ariel Fuxman's collaborations.
