Arash Termehchy | Researchain


Publication

Featured research published by Arash Termehchy.

IEEE Transactions on Knowledge and Data Engineering | 2014

Efficient Prediction of Difficult Keyword Queries over Databases

Shiwen Cheng; Arash Termehchy; Vagelis Hristidis

Keyword queries on databases provide easy access to data, but often suffer from low ranking quality, i.e., low precision and/or recall, as shown in recent benchmarks. It would be useful to identify queries that are likely to have low ranking quality in order to improve user satisfaction. For instance, the system may suggest alternative queries to the user for such hard queries. In this paper, we analyze the characteristics of hard queries and propose a novel framework to measure the degree of difficulty for a keyword query over a database, considering both the structure and the content of the database and the query results. We evaluate our query difficulty prediction model against two effectiveness benchmarks for popular keyword search ranking methods. Our empirical results show that our model predicts the hard queries with high accuracy. Further, we present a suite of optimizations to minimize the incurred time overhead.

international conference on management of data | 2017

Schema Independent Relational Learning

Jose Picado; Arash Termehchy; Alan Fern; Parisa Ataei

Learning novel relations from relational databases is an important problem with many applications. Relational learning algorithms learn the definition of a new relation in terms of existing relations in the database. Nevertheless, the same database may be represented under different schemas for various reasons, such as data quality, efficiency and usability. The output of current relational learning algorithms tends to vary quite substantially over the choice of schema. This variation complicates their off-the-shelf application. We introduce and formalize the property of schema independence of relational learning algorithms, and study both the theoretical and empirical dependence of existing algorithms on the common class of (de)composition schema transformations. We show that current algorithms are not schema independent. We propose Castor, a relational learning algorithm that achieves schema independence by leveraging data dependencies.
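To make the idea of a (de)composition schema transformation concrete, the following is a minimal sketch, not taken from the paper. It assumes a hypothetical relation student(id, name, dept) in which id is a key, splits it into two relations, and rebuilds it with a natural join. The relation, its rows, and the function names are illustrative assumptions only.

```python
# Hypothetical toy relation student(id, name, dept); rows are made up.
student = {
    (1, "Ann", "CS"),
    (2, "Bob", "EE"),
    (3, "Cy", "CS"),
}


def decompose(rows):
    """Split student(id, name, dept) into student_name(id, name) and student_dept(id, dept)."""
    names = {(sid, name) for sid, name, _ in rows}
    depts = {(sid, dept) for sid, _, dept in rows}
    return names, depts


def compose(names, depts):
    """Natural join on id, rebuilding the three-column relation."""
    depts_by_id = {}
    for sid, dept in depts:
        depts_by_id.setdefault(sid, set()).add(dept)
    return {
        (sid, name, dept)
        for sid, name in names
        for dept in depts_by_id.get(sid, ())
    }


names, depts = decompose(student)
# Because id is a key, the join recovers exactly the original rows.
print(compose(names, depts) == student)
```

Both schemas hold the same information, so a schema independent learner in the sense of the abstract would be expected to produce equivalent definitions over either one.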

international conference on management of data | 2014

Which concepts are worth extracting

Arash Termehchy; Ali Vakilian; Yodsawalai Chodpathumwan; Marianne Winslett

It is well established that extracting and annotating occurrences of entities in a collection of unstructured text documents with their concepts improves the effectiveness of answering queries over the collection. However, it is very resource intensive to create and maintain large annotated collections. Since the available resources of an enterprise are limited and/or its users may have urgent information needs, it may have to select only a subset of relevant concepts for extraction and annotation. We call this subset a conceptual design for the annotated collection. In this paper, we introduce the problem of cost effective conceptual design, where given a collection, a set of relevant concepts, and a fixed budget, one would like to find a conceptual design that improves the effectiveness of answering queries over the collection the most. We prove that the problem is generally NP-hard in the number of relevant concepts and propose two efficient approximation algorithms to solve the problem: Approximate Popularity Maximization (APM for short) and Approximate Annotation-benefit Maximization (AAM for short). We show that if there are no constraints regarding the overlap of concepts, APM is a fully polynomial time approximation scheme. We also prove that if the relevant concepts are mutually exclusive, APM has a constant approximation ratio and AAM is a fully polynomial time approximation scheme. Our empirical results using a Wikipedia collection and a search engine query log validate the proposed formalization of the problem and show that APM and AAM efficiently compute conceptual designs. They also indicate that in general APM delivers the optimal conceptual designs if the relevant concepts are not mutually exclusive. Also, if the relevant concepts are mutually exclusive, the conceptual designs delivered by AAM improve the effectiveness of answering queries over the collection more than the solutions provided by APM.
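The budgeted selection described above can be illustrated with a simplified sketch. This is not APM or AAM; it is a plain 0/1 knapsack dynamic program under the assumption that each concept has an integer annotation cost and an independent, additive benefit. The concept names, costs, and benefits below are hypothetical.

```python
def select_concepts(concepts, budget):
    """Pick a subset of concepts with total cost <= budget and maximum total benefit.

    concepts: list of (name, cost, benefit) with non-negative integer costs.
    Returns (total_benefit, tuple_of_chosen_names).
    """
    # best[b] holds the best (benefit, chosen names) achievable with budget b.
    best = [(0, ())] * (budget + 1)
    for name, cost, benefit in concepts:
        # Walk budgets downward so each concept is chosen at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + benefit
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + (name,))
    return best[budget]


# Hypothetical concepts: (name, annotation cost, estimated benefit).
concepts = [
    ("person", 4, 10),
    ("organization", 3, 7),
    ("location", 2, 5),
    ("date", 1, 1),
]
print(select_concepts(concepts, 5))  # (12, ('organization', 'location'))
```

The running time grows with the numeric budget, which is why this kind of dynamic program is pseudo-polynomial. The paper's setting is harder because the benefit of a design is not assumed to be a simple sum over independent concepts.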

conference on information and knowledge management | 2016

Towards Representation Independent Similarity Search Over Graph Databases

Yodsawalai Chodpathumwan; Amirhossein Aleyasen; Arash Termehchy; Yizhou Sun

Finding similar entities is a fundamental problem in graph data analysis. Similarity search algorithms usually leverage the structural properties of the database to quantify the degree of similarity between entities. However, the same information can be represented in different structures and the structural properties observed over particular representations may not hold for the alternatives. These algorithms are effective on some representations and ineffective on others. We define the property of representation independence for similarity search algorithms as their robustness against transformations that modify the structure of databases but preserve the information content. We introduce a widespread group of such transformations called relationship reorganizing. We propose an algorithm called R-PathSim, which is provably robust under relationship reorganizing. Our empirical results show that current algorithms except R-PathSim are highly sensitive to the data representation and R-PathSim is as efficient and effective as other algorithms.
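As background for how structure-based similarity search works, here is a minimal sketch of the standard PathSim measure over a symmetric meta-path. It is not R-PathSim, whose details are not given in the abstract. The author-by-venue count matrix is a hypothetical example.

```python
def commuting_matrix(w):
    """Return M = W * W^T, counting meta-path instances between each pair of rows."""
    n = len(w)
    return [
        [sum(a * b for a, b in zip(w[i], w[j])) for j in range(n)]
        for i in range(n)
    ]


def pathsim(m, x, y):
    """PathSim score: 2 * M[x][y] / (M[x][x] + M[y][y]), or 0.0 if both are isolated."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


# Hypothetical data: rows are authors, columns are venues, entries are paper counts.
w = [
    [2, 1],  # author 0
    [2, 1],  # author 1 publishes in the same pattern as author 0
    [0, 3],  # author 2 publishes only in the second venue
]
m = commuting_matrix(w)
print(pathsim(m, 0, 1))  # 1.0
print(pathsim(m, 0, 2))
```

The score depends entirely on how relationships are laid out in the graph, so restructuring the same information under a different representation can change the counts in W. That sensitivity is the problem the abstract describes.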

very large data bases | 2018

Cost-effective conceptual design using taxonomies

Yodsawalai Chodpathumwan; Ali Vakilian; Arash Termehchy; Amir Nayyeri

It is known that annotating entities in unstructured and semi-structured datasets by their concepts improves the effectiveness of answering queries over these datasets. Ideally, one would like to annotate entities of all relevant concepts in a dataset. However, it takes substantial time and computational resources to annotate concepts in large datasets, and an organization may have sufficient resources to annotate only a subset of relevant concepts. Clearly, it would like to annotate a subset of concepts that provides the most effective answers to queries over the dataset. We propose a formal framework that quantifies the amount by which annotating entities of concepts from a taxonomy in a dataset improves the effectiveness of answering queries over the dataset. Because the problem is NP-hard, we propose efficient approximation and pseudo-polynomial time algorithms for several cases of the problem. Our extensive empirical studies validate our framework and show accuracy and efficiency of our algorithms.

international workshop on the web and databases | 2017

Cost-Effective Conceptual Design Over Taxonomies

Ali Vakilian; Yodsawalai Chodpathumwan; Arash Termehchy; Amir Nayyeri

very large data bases | 2015

Universal-DB: towards representation independent graph analytics

Yodsawalai Chodpathumwan; Amirhossein Aleyasen; Arash Termehchy; Yizhou Sun

very large data bases | 2018

Learning efficiently over heterogeneous databases

Jose Picado; Arash Termehchy; Sudhanshu Pathak

international conference on management of data | 2018

Learning Efficiently Over Heterogeneous Databases: Sampling and Constraints to the Rescue

Jose Picado; Arash Termehchy; Sudhanshu Pathak

database programming languages | 2017

Variational databases

Parisa Ataei; Arash Termehchy; Eric Walkingshaw

It is known that annotating entities in unstructured and semistructured datasets by their concepts improves the effectiveness of answering queries over these datasets. Ideally, one would like to annotate entities of all relevant concepts in a dataset. However, it takes substantial time and computational resources to annotate concepts in large datasets and an organization may have sufficient resources to annotate only a subset of relevant concepts. Clearly, it would like to annotate a subset of concepts that provides the most effective answers to queries over the dataset. We propose a formal framework that quantifies the amount by which annotating entities of concepts from a taxonomy in a dataset improves the effectiveness of answering queries over the dataset. Because the problem is NP-hard, we propose an efficient approximation for the problem. Our extensive empirical studies validate our framework and show the accuracy and efficiency of our algorithm.
