Mehdi Kargar
York University
Publication
Featured research published by Mehdi Kargar.
conference on information and knowledge management | 2011
Mehdi Kargar; Aijun An
We study the problem of discovering a team of experts from a social network. Given a project whose completion requires a set of skills, our goal is to find a set of experts that together have all of the required skills and also have the minimal communication cost among them. We propose two communication cost functions designed for two types of communication structures. We show that the problem of finding the team of experts that minimizes one of the proposed cost functions is NP-hard. Thus, an approximation algorithm with an approximation ratio of two is designed. We introduce the problem of finding a team of experts with a leader. The leader is responsible for monitoring and coordinating the project, and thus a different communication cost function is used in this problem. To solve this problem, an exact polynomial algorithm is proposed. We show that the total number of teams may be exponential with respect to the number of required skills. Thus, two procedures that produce top-k teams of experts with or without a leader in polynomial delay are proposed. Extensive experiments on real datasets demonstrate the effectiveness and scalability of the proposed methods.
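The leader variant can be illustrated with a small sketch. It assumes, as one plausible reading of the abstract, that the leader-based communication cost is the sum of shortest-path distances from the leader to the expert chosen for each required skill; the paper's exact definition may differ. The names `shortest_distances`, `best_team_with_leader`, `graph`, and `skills_of` are hypothetical. Under that assumption, trying every expert as leader and picking the nearest holder of each skill gives an exact answer in polynomial time.

```python
import heapq
from typing import Dict, Optional, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # undirected, weighted adjacency map


def shortest_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: distance from source to every reachable expert."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def best_team_with_leader(
    graph: Graph, skills_of: Dict[str, Set[str]], required: Set[str]
) -> Optional[Tuple[float, str, Dict[str, str]]]:
    """Return (cost, leader, skill -> expert), or None if no team is feasible.

    Assumed cost: sum over required skills of the distance from the leader
    to the expert assigned to that skill (an expert covering two skills is
    counted once per skill).
    """
    best = None
    for leader in graph:
        dist = shortest_distances(graph, leader)
        team: Dict[str, str] = {}
        cost = 0.0
        for skill in required:
            holders = [e for e in dist if skill in skills_of.get(e, set())]
            if not holders:
                break  # this leader cannot reach any holder of the skill
            expert = min(holders, key=lambda e: dist[e])
            team[skill] = expert
            cost += dist[expert]
        else:
            if best is None or cost < best[0]:
                best = (cost, leader, team)
    return best
```

Each skill is assigned independently once the leader is fixed, which is why no search over combinations of experts is needed in this sketch.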
very large data bases | 2011
Mehdi Kargar; Aijun An
Keyword search over a graph finds a substructure of the graph containing all or some of the input keywords. Most previous methods in this area find connected minimal trees that cover all the query keywords. Recently, it has been shown that finding subgraphs rather than trees can be more useful and informative for the users. However, the current tree or graph based methods may produce answers in which some content nodes (i.e., nodes that contain input keywords) are not very close to each other. In addition, when searching for answers, these methods may explore the whole graph rather than only the content nodes. This may lead to poor performance in execution time. To address the above problems, we propose the problem of finding r-cliques in graphs. An r-clique is a group of content nodes that cover all the input keywords and in which the distance between every pair of nodes is less than or equal to r. An exact algorithm is proposed that finds all r-cliques in the input graph. In addition, an approximation algorithm that produces r-cliques with 2-approximation in polynomial delay is proposed. Extensive performance studies using two large real data sets confirm the efficiency and accuracy of finding r-cliques in graphs.
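The r-clique definition can be made concrete with a short membership check. This is only a sketch of the definition, not the exact or approximation algorithm from the paper; `is_r_clique`, `keywords_of`, and the precomputed all-pairs distance map `dist` are hypothetical names, and distances are assumed to be shortest-path distances that have already been computed.

```python
from itertools import combinations
from typing import Dict, Iterable, Set


def is_r_clique(
    nodes: Iterable[str],
    keywords_of: Dict[str, Set[str]],
    query: Set[str],
    dist: Dict[str, Dict[str, float]],
    r: float,
) -> bool:
    """Check the two conditions of the definition on a candidate node set."""
    nodes = list(nodes)
    # Condition 1: together, the nodes cover every query keyword.
    covered = set().union(*(keywords_of[n] for n in nodes))
    if not query <= covered:
        return False
    # Condition 2: every pair of nodes is within distance r.
    return all(dist[u][v] <= r for u, v in combinations(nodes, 2))
```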
european conference on machine learning | 2012
Mehdi Kargar; Aijun An; Morteza Zihayat
We tackle the problem of finding a team of experts from a social network to complete a project that requires a set of skills. The social network is modeled as a graph. A node in the graph represents an expert and has a weight representing the monetary cost for using the expert service. Two nodes in the graph can be connected and the weight on the edge represents the communication cost between the two corresponding experts. Given a project, our objective is to find a team of experts that covers all the required skills and also minimizes the communication cost as well as the personnel cost of the project. To minimize both of the objectives, we define a new combined cost function which is based on the linear combination of the objectives (i.e. communication and personnel costs). We show that the problem of minimizing the combined cost function is an NP-hard problem. Thus, one approximation algorithm is proposed to solve the problem. The proposed approximation algorithm is bounded and the approximation ratio of the algorithm is proved in the paper. Three heuristic algorithms based on different intuitions are also proposed for solving the problem. Extensive experiments on real datasets demonstrate the effectiveness and scalability of the proposed algorithms.
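A combined cost of this kind might be evaluated as in the sketch below. The abstract only states that the function is a linear combination of the two objectives; the tradeoff parameter `lam`, the use of the sum of pairwise distances as the communication cost, and the names `combined_cost`, `fee`, and `dist` are assumptions made for illustration. In practice the two terms would likely need to be normalized to comparable scales first.

```python
from itertools import combinations
from typing import Dict, Set


def combined_cost(
    team: Set[str],
    fee: Dict[str, float],
    dist: Dict[str, Dict[str, float]],
    lam: float = 0.5,
) -> float:
    """lam * communication cost + (1 - lam) * personnel cost.

    Assumptions: communication cost is the sum of pairwise distances among
    team members; personnel cost is the sum of the members' fees.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    communication = sum(dist[u][v] for u, v in combinations(sorted(team), 2))
    personnel = sum(fee[e] for e in team)
    return lam * communication + (1.0 - lam) * personnel
```

Setting `lam` to 1 recovers a pure communication-cost objective, and setting it to 0 ranks teams by personnel cost alone.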
international conference on data engineering | 2015
Mehdi Kargar; Aijun An; Nick Cercone; Parke Godfrey; Jaroslaw Szlichta; Xiaohui Yu
Keyword search over relational databases offers an alternative way to SQL to query and explore databases that is effective for lay users who may not be well versed in SQL or the database schema. This becomes more pertinent for databases with large and complex schemas. An answer in this context is a join tree spanning tuples containing the query's keywords. As there are potentially many answers to the query, and the user is often only interested in seeing the top-k answers, how to rank the answers based on their relevance is of paramount importance. We focus on the relevance of join as the fundamental means to rank answers. We devise means to measure relevance of relations and foreign keys in the schema over the information content of the database. This can be done offline with no need for external models. We compare the proposed measures against a gold standard we derive from a real workload over TPC-E and evaluate the effectiveness of our methods. Finally, we test the performance of our measures against existing techniques to demonstrate a marked improvement, and perform a user study to establish naturalness of the ranking of the answers.
international conference on management of data | 2014
Mehdi Kargar; Aijun An; Nick Cercone; Parke Godfrey; Jaroslaw Szlichta; Xiaohui Yu
Keyword search in relational databases was introduced in the last decade to assist users who are not familiar with a query language, the schema of the database, or the content of the data. An answer is a join tree of tuples that contains the query keywords. When searching a database with a complex schema, there are potentially many answers to the query. Therefore, ranking answers based on their relevance is crucial in this context. Prior work has addressed relevance based on the size of the answer or the IR scores of the tuples. However, this is not sufficient when searching a complex schema. We demonstrate MeanKS, a new system for meaningful keyword search over relational databases. The system first captures the user's interest by determining the roles of the keywords. Then, it uses schema-based ranking to rank join trees that cover the keyword roles. This uses the relevance of relations and foreign-key relationships in the schema over the information content of the database. In the demonstration, attendees can execute queries against the TPC-E warehouse and compare the proposed measures against a gold standard derived from a real workload over TPC-E to test the effectiveness of our methods.
international conference on data engineering | 2012
Mehdi Kargar; Aijun An
A system for efficient keyword search in graphs is demonstrated. The system has two components: a search, restricted to the nodes containing the input keywords, for a set of nodes that are close to each other and together cover the input keywords, and an exploration step that finds how these nodes are related to each other. The system generates all or top-k answers in polynomial delay. Answers are presented to the user according to a ranking criterion so that the answers with nodes closer to each other are presented before the ones with nodes farther away from each other. In addition, the set of answers produced by our system is duplication free. The system uses two methods for presenting the final answer to the user. The presentation methods reveal relationships among the nodes in an answer through a tree or a multi-center graph. We will show that each method has its own advantages and disadvantages. The system is demonstrated using two challenging datasets, very large DBLP and highly cyclic Mondial. Challenges and difficulties in implementing an efficient keyword search system are also demonstrated.
Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) | 2014
Morteza Zihayat; Mehdi Kargar; Aijun An
In this paper, we study the problem of finding teams of experts from an expert network while optimizing three objectives. Given a project, the objective is to find teams of experts that cover all the required skills and also optimize the communication cost as well as the personnel cost and the expertise level of the team members. The expert network is modeled as a graph, where nodes represent experts and edges between nodes specify the communication costs between the experts. In this paper, we are interested in finding a Pareto front of teams that not only cover the required skills but are also not dominated by other feasible teams with respect to the three criteria. Since the problem is NP-hard, we propose algorithms to use with a two-phase method to find an approximation of the Pareto front for the three criteria team formation problem. In the first phase, an initial population which is composed of an approximation of the supported efficient teams is generated. Then, a Pareto local search method is applied to each solution of the initial population to find other members of the Pareto front. The proposed method is evaluated on the DBLP data set. The results indicate its superior performance compared with other methods in terms of running time and the quality of answers.
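The notion of a non-dominated team can be shown with a brute-force filter over already-scored candidates. This is not the two-phase method described above; it only illustrates Pareto dominance. The names `dominates` and `pareto_front` are hypothetical, and each team is assumed to be summarized as a tuple in which every component is minimized, so an expertise level that should be maximized is stored negated.

```python
from typing import List, Sequence, Tuple

Score = Tuple[float, ...]  # e.g. (communication, personnel, -expertise)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Score]) -> List[Score]:
    """Keep the scores that no other score dominates (quadratic-time filter)."""
    return [p for p in scores if not any(dominates(q, p) for q in scores)]
```

The filter compares every pair of candidates, so it is only practical for small candidate sets.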
IEEE Transactions on Knowledge and Data Engineering | 2014
Mehdi Kargar; Aijun An; Xiaohui Yu
Keyword search over a graph searches for a subgraph that contains a set of query keywords. A problem with most existing keyword search methods is that they may produce duplicate answers that contain the same set of content nodes (i.e., nodes containing a query keyword) although these nodes may be connected differently in different answers. Thus, users may be presented with many similar answers with trivial differences. In addition, some of the nodes in an answer may contain query keywords that are all covered by other nodes in the answer. Removing these nodes does not change the coverage of the answer but can make the answer more compact. The answers in which each content node contains at least one unique query keyword are called minimal answers in this paper. We define the problem of finding duplication-free and minimal answers, and propose algorithms for finding such answers efficiently. Extensive performance studies using two large real data sets confirm the efficiency and effectiveness of the proposed methods.
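The two properties defined here can be checked on small inputs as sketched below. The sketch represents each answer only by its set of content nodes, which is an assumption made for brevity, and the names `is_minimal`, `deduplicate`, and `keywords_of` are hypothetical. It illustrates the definitions and is not the paper's algorithm for generating such answers efficiently.

```python
from typing import Dict, Iterable, List, Set


def is_minimal(
    content_nodes: Iterable[str], keywords_of: Dict[str, Set[str]], query: Set[str]
) -> bool:
    """True if every content node holds a query keyword no other node covers."""
    nodes = list(content_nodes)
    for node in nodes:
        covered_by_others = set().union(
            *(keywords_of[m] & query for m in nodes if m != node)
        )
        if not (keywords_of[node] & query) - covered_by_others:
            return False  # this node is redundant for keyword coverage
    return True


def deduplicate(answers: Iterable[Iterable[str]]) -> List[List[str]]:
    """Keep the first answer seen for each distinct set of content nodes."""
    seen = set()
    unique = []
    for answer in answers:
        key = frozenset(answer)
        if key not in seen:
            seen.add(key)
            unique.append(list(answer))
    return unique
```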
international conference on data mining | 2011
Mehdi Kargar; Aijun An
A system for efficient team formation in social networks is demonstrated. Given a project whose completion requires a set of skills, our system finds a set of experts that together have all of the required skills and also have the minimal communication cost. The system finds the best teams with or without a leader using two types of communication structures. After discovering the teams of experts, our system can display the relationships among the experts in a team by showing how the experts are connected in the social network. Since the total number of teams might be exponential with respect to the number of required skills, procedures that produce top-k teams of experts with or without a leader in polynomial delay are implemented. The system is demonstrated using the well-known DBLP dataset.
international conference on bioinformatics | 2010
Mehdi Kargar; Aijun An
Analyzing large amounts of data is one of the most challenging problems in modern molecular biology. In this work, different complexity measures and methods are applied to identify signals in the whole genomes of three prokaryotic organisms. In addition to previous complexity measures, new measures are introduced for representing Open Reading Frames (ORFs). We apply classification algorithms to determine which complexity measures can lead to better predictive performance in discriminating genes from pseudo-genes in ORFs. Also, we investigate whether positions and lengths of windows in ORFs have significant impact on distinguishing between genes and pseudo-genes. Different classification algorithms are applied for classifying ORFs into genes and pseudo-genes.