Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Shahan Khatchadourian is active.

Publication


Featured research published by Shahan Khatchadourian.


international semantic web conference | 2010

ExpLOD: summary-based exploration of interlinking and RDF usage in the linked open data cloud

Shahan Khatchadourian; Mariano P. Consens

Publishing interlinked RDF datasets as links between data items identified using dereferenceable URIs on the web brings forward a number of issues. A key challenge is to understand the data, the schema, and the interlinks that are actually used both within and across linked datasets. Understanding actual RDF usage is critical in the increasingly common situations where terms from different vocabularies are mixed. In this paper we describe a tool, ExpLOD, that supports exploring summaries of RDF usage and interlinking among datasets from the Linked Open Data cloud. ExpLOD's summaries are based on a novel mechanism that combines text labels and bisimulation contractions. The labels assigned to RDF graphs are hierarchical, enabling summarization at different granularities. The bisimulation contractions are applied to subgraphs defined via queries, providing for summarization of arbitrarily large or small graph neighbourhoods. Also, ExpLOD can generate SPARQL queries from a summary. Experimental results, using several collections from the Linked Open Data cloud, compare the two summary creation approaches implemented by ExpLOD (graph-based vs. SPARQL-based).
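
To make the summarization idea concrete, here is a minimal Python sketch (not ExpLOD itself) that groups resources in a toy triple list by a usage label, i.e. the set of classes and predicates each resource uses, and prints a hand-written SPARQL query of the kind a summary block could be expanded into; all data and names are illustrative.

```python
# Toy sketch (not ExpLOD): summarize RDF usage by grouping subjects that share
# the same set of classes and predicates, then show a SPARQL query for one block.
from collections import defaultdict

RDF_TYPE = "rdf:type"

triples = [                      # illustrative (subject, predicate, object) triples
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "owl:sameAs", "dbpedia:Alice"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:bob", "foaf:name", '"Bob"'),
]

usage = defaultdict(set)         # subject -> its usage label (classes + predicates)
for s, p, o in triples:
    usage[s].add(o if p == RDF_TYPE else p)

blocks = defaultdict(list)       # usage label -> resources that share it
for s, label in usage.items():
    blocks[frozenset(label)].append(s)

for label, members in blocks.items():
    print(sorted(label), "->", members)

# One block (typed foaf:Person, using foaf:name) written out as a SPARQL query:
query = """SELECT ?s WHERE {
  ?s a foaf:Person ;
     foaf:name ?n .
}"""
print(query)
```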


international conference on data engineering | 2008

DescribeX: Interacting with AxPRE Summaries

Mir Sadek Ali; Mariano P. Consens; Shahan Khatchadourian; Flavio Rizzolo

DescribeX is a visual, interactive tool for exploring the underlying structure of an XML collection. DescribeX implements a framework for creating XML summaries described using axis path regular expressions (abbreviated AxPRE). AxPREs capture all the bisimilarity-based proposals in the summary literature and they can be used to define new and more expressive summaries. This demonstration shows how DescribeX helps to analyze diverse XML collections in one particular scenario: the analysis of protein-protein interaction XML data from multiple providers that conform to the PSI-MI schema.
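
As a rough illustration of path-based summarization (a toy, not the AxPRE machinery or DescribeX itself), the following Python snippet partitions the elements of a small XML document by their incoming label path from the root and counts the elements in each block; the document content is invented.

```python
# Toy sketch: group XML elements by their root-to-element label path -- one
# simple, bisimilarity-style summary of the kind AxPREs can describe.
import xml.etree.ElementTree as ET
from collections import defaultdict

doc = """<entrySet>
  <entry><interactorList><interactor/><interactor/></interactorList></entry>
  <entry><interactionList><interaction/></interactionList></entry>
</entrySet>"""

root = ET.fromstring(doc)
summary = defaultdict(int)       # label path -> number of elements in that block

def walk(elem, path):
    path = path + "/" + elem.tag
    summary[path] += 1
    for child in elem:
        walk(child, path)

walk(root, "")
for path, count in sorted(summary.items()):
    print(f"{path}: {count} element(s)")
```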


Journal of Integrative Bioinformatics | 2007

Exploring PSI-MI XML Collections Using DescribeX

Reza Samavi; Mariano P. Consens; Shahan Khatchadourian; Thodoros Topaloglou

PSI-MI has been endorsed by the protein informatics community as a standard XML data exchange format for protein-protein interaction datasets. While many public databases support the standard, there is a degree of heterogeneity in the way the proposed XML schema is interpreted and instantiated by different data providers. Analysis of schema instantiation in large collections of XML data is a challenging task that is unsupported by existing tools. In this study we use DescribeX, a novel visualization technique for (semi-)structured XML formats, to quantitatively and qualitatively analyze PSI-MI XML collections at the instance level, with the goal of gaining insights about schema usage and studying specific questions such as the adequacy of controlled vocabularies, the detection of common instance patterns, and the evolution of different data collections. Our analysis shows that DescribeX enhances understanding of the instance-level structure of PSI-MI data sources and is a useful tool for standards designers, software developers, and PSI-MI data providers.


very large data bases | 2015

S+EPPs: construct and explore bisimulation summaries, plus optimize navigational queries; all on existing SPARQL systems

Mariano P. Consens; Valeria Fionda; Shahan Khatchadourian; Giuseppe Pirrò

We demonstrate S+EPPs, a system that provides fast construction of bisimulation summaries using graph analytics platforms, and then enhances existing SPARQL engines to support summary-based exploration and navigational query optimization. The construction component adds a novel optimization to a parallel bisimulation algorithm implemented on a multi-core graph processing framework. We show that for several large, disk-resident, real-world graphs, full summary construction can be completed in roughly the same time as the data load. The query translation component supports Extended Property Paths (EPPs), an enhancement of SPARQL 1.1 property paths that can express a significantly larger class of navigational queries. EPPs are implemented via rewritings into a widely used SPARQL subset. The optimization component can (transparently to users) translate EPPs defined on instance graphs into EPPs that take advantage of bisimulation summaries. S+EPPs combines the query and optimization translations to enable summary-based optimization of graph traversal queries on top of off-the-shelf SPARQL processors. The demonstration showcases the construction of bisimulation summaries of graphs (ranging from millions to billions of edges), together with the exploration benefits and the navigational query speedups obtained by leveraging summaries stored alongside the original datasets.
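
For readers unfamiliar with the navigational queries being extended, here is a small rdflib example of a standard SPARQL 1.1 property path over a toy graph; the EPP syntax itself and the summary-based rewriting are not shown, and the data is illustrative.

```python
# Standard SPARQL 1.1 property path over a toy graph, using rdflib.
# EPPs extend this class of navigational queries; only the SPARQL 1.1 baseline
# is shown here.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.a, EX.knows, EX.b))
g.add((EX.b, EX.knows, EX.c))
g.add((EX.c, EX.knows, EX.d))

# Which nodes are reachable from ex:a via one or more ex:knows edges?
q = """
PREFIX ex: <http://example.org/>
SELECT ?y WHERE { ex:a ex:knows+ ?y }
"""
for row in g.query(q):
    print(row.y)
```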


international conference on management of data | 2015

Constructing Bisimulation Summaries on a Multi-Core Graph Processing Framework

Shahan Khatchadourian; Mariano P. Consens

Bisimulation summaries of graph data have multiple applications, including facilitating graph exploration and enabling query optimization techniques, but efficient, scalable summary construction is challenging. The literature describes parallel construction algorithms using message-passing, and these have been recently adapted to MapReduce environments. The fixpoint nature of bisimulation is well suited to iterative graph processing, but the existing MapReduce solutions do not drastically decrease per-iteration times as the computation progresses. In this paper, we focus on leveraging parallel multi-core graph frameworks with the goal of constructing summaries in roughly the same amount of time that it takes to input the data into the framework (for a range of real-world data graphs) and output the summary. To achieve our goal we introduce a singleton optimization that significantly reduces per-iteration times after only a few iterations. We present experimental results validating that our scalable GraphChi implementation achieves our goal with bisimulation summaries of million- to billion-edge graphs.
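
A sequential toy version of signature-based partition refinement, with a simple rendering of the singleton idea (refinement only splits blocks, so a singleton block can never split again and its signature is kept fixed), is sketched below; it is not the paper's parallel GraphChi implementation, and the graph is invented.

```python
# Toy bisimulation summary construction by signature-based partition refinement,
# with singleton blocks frozen once they appear.
from collections import defaultdict

edges = {                        # node -> outgoing (edge label, target) pairs
    "a": [("knows", "b"), ("knows", "c")],
    "b": [("knows", "d")],
    "c": [("knows", "d")],
    "d": [],
}

block = {v: 0 for v in edges}    # start with every node in one block
frozen = set()                   # nodes in singleton blocks: never recomputed

while True:
    sig = {}
    for v, out in edges.items():
        if v in frozen:
            sig[v] = ("singleton", v)                 # stable, unique signature
        else:
            sig[v] = (block[v], tuple(sorted((lbl, block[t]) for lbl, t in out)))
    ids = {s: i for i, s in enumerate(sorted(set(sig.values()), key=repr))}
    if len(ids) == len(set(block.values())):
        break                                         # no block was split: fixpoint
    block = {v: ids[sig[v]] for v in edges}
    counts = defaultdict(int)
    for b in block.values():
        counts[b] += 1
    frozen = {v for v in edges if counts[block[v]] == 1}

print(block)    # b and c land in the same block: they are bisimilar
```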


acm symposium on applied computing | 2012

Entity matching for semistructured data in the Cloud

Marcus Paradies; Susan Malaika; Jérôme Siméon; Shahan Khatchadourian; Kai-Uwe Sattler

The amount of information available on the Web and inside companies is rapidly increasing. With Cloud infrastructure maturing (including tools for parallel data processing, text analytics, clustering, etc.), there is growing interest in integrating data to produce higher-value content. New challenges notably include entity matching over large volumes of heterogeneous data. In this paper, we describe an approach for entity matching over large amounts of semistructured data in the Cloud. The approach combines ChuQL [4], a recently proposed extension of XQuery with MapReduce, and a blocking technique for entity matching that can be efficiently executed on top of MapReduce. We illustrate the proposed approach by applying it to automatically extract and enrich references in Wikipedia, and report on an experimental evaluation of the approach.
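
The following Python sketch illustrates the general blocking idea in MapReduce style (it is not ChuQL and not the paper's pipeline): records are mapped to a blocking key, grouped by that key, and only records sharing a block are compared, avoiding the full quadratic comparison; the records, blocking key, and matching rule are made up.

```python
# Toy blocking-based entity matching in a map/group/reduce shape.
from itertools import combinations, groupby

records = [
    {"id": 1, "name": "J. Smith", "city": "Toronto"},
    {"id": 2, "name": "John Smith", "city": "Toronto"},
    {"id": 3, "name": "A. Jones", "city": "Ottawa"},
]

def map_phase(rec):
    # hypothetical blocking key: first letter of the surname plus the city
    surname = rec["name"].split()[-1]
    yield (surname[0].lower() + "|" + rec["city"].lower(), rec)

def similar(r1, r2):
    # hypothetical matcher: identical surname
    return r1["name"].split()[-1] == r2["name"].split()[-1]

def reduce_phase(recs):
    # compare only records that share the blocking key
    return [(a["id"], b["id"]) for a, b in combinations(recs, 2) if similar(a, b)]

# "shuffle": group mapper output by blocking key
mapped = [kv for rec in records for kv in map_phase(rec)]
mapped.sort(key=lambda kv: kv[0])

matches = []
for key, group in groupby(mapped, key=lambda kv: kv[0]):
    matches += reduce_phase([rec for _, rec in group])

print(matches)    # [(1, 2)]
```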


conference of the centre for advanced studies on collaborative research | 2010

Web data processing on the cloud

Shahan Khatchadourian; Mariano P. Consens; Jérôme Siméon

Cloud computing is emerging as a highly scalable, fault-tolerant, and cost-effective way to process large amounts of information on the Web. Thanks in part to new data processing paradigms designed with the Cloud in mind (such as MapReduce [1], HDFS [2], and Cassandra [3]), it is quickly gaining acceptance as a viable platform for organizations that need to store, process, and publish large amounts of data. MapReduce is attractive for processing data on the Cloud, to a large extent, because of its simplicity and flexibility. Implementations of MapReduce usually include a simple API used to describe which part of the processing is done in parallel (Map phase) and which part is done after grouping data on a single machine (Reduce phase). MapReduce does not rely on a pre-existing data model, making it possible to process any kind of information independently of its model. Cloud applications have notably been used to perform off-line analytical processing such as analyzing web request logs, computing user recommendations, and understanding scenes in images [4].
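
A minimal word-count sketch of that Map/Reduce split, written in plain Python rather than on an actual cluster, is shown below: the map phase runs independently per input record, and the reduce phase runs per key once all values for that key have been grouped.

```python
# Minimal illustration of the Map/Reduce programming model described above.
from collections import defaultdict

def map_phase(line):
    # runs independently for each input record
    for word in line.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # runs per key after all values for the key have been grouped
    return (word, sum(counts))

lines = ["the web of data", "data on the web"]

grouped = defaultdict(list)          # the "shuffle" between the two phases
for line in lines:
    for word, one in map_phase(line):
        grouped[word].append(one)

print([reduce_phase(w, c) for w, c in sorted(grouped.items())])
```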


Focused Access to XML Documents | 2008

XML Retrieval by Improving Structural Relevance Measures Obtained from Summary Models

Mir Sadek Ali; Mariano P. Consens; Shahan Khatchadourian

In XML retrieval, there is often more than one element in the same document that could represent the same focused result. So, a key challenge for XML retrieval systems is to return the set of elements that best satisfies the information need of the end-user in terms of both content and structure. At INEX, there have been numerous proposals for how to incorporate structural constraints and hints into ranking. These proposals either boost the score of or filter out elements that have desirable structural properties. An alternative approach that has not been explored is to rank elements by improving their structural relevance. Structural relevance is the expected relevance of a list of elements, based on a graphical model of how users browse elements within documents. In our approach, we use summary graphs to describe the process of a user browsing from one part of a document to another. In this paper, we develop an algorithm to structurally score retrieval scenarios using structural relevance. The XML retrieval system identifies the candidate scenarios. We apply structural relevance with a given summary model to identify the most structurally relevant scenario. This results in improved system performance. Our approach provides a consistent way to apply different user models to ranking. We also explore the use of score boosting using these models.
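
As a toy illustration of the idea (not the paper's exact user model), the snippet below scores a retrieved list by expected relevance, weighting each element's relevance judgment by a hypothetical probability that a browsing user reaches it; the paths, judgments, and probabilities are invented.

```python
# Toy expected-relevance score for a list of retrieved XML elements, where the
# probability of reaching each element would come from a browsing/summary model.
candidates = [
    # (element path, relevance judgment, probability the user browses to it)
    ("/article/sec[1]",      1.0, 0.9),
    ("/article/sec[1]/p[3]", 1.0, 0.4),
    ("/article/sec[2]",      0.0, 0.7),
]

def structural_relevance(elems):
    return sum(rel * p_reach for _, rel, p_reach in elems)

print(structural_relevance(candidates))   # 0.9 + 0.4 + 0.0 = 1.3
```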


data integration in the life sciences | 2010

Quality assessment of MAGE-ML genomic datasets using DescribeX

Lorena Etcheverry; Shahan Khatchadourian; Mariano P. Consens

The functional genomics and informatics community has made extensive microarray experimental data available online, facilitating independent evaluation of experiment conclusions and enabling researchers to access and reuse a growing body of gene expression knowledge. While there are several data-exchange standards, numerous microarray experiment datasets are published using the MAGE-ML XML schema. Assessing the quality of published experiments is a challenging task, and there is no consensus among microarray users on a framework to measure dataset quality. In this paper, we develop techniques based on DescribeX (a summary-based visualization tool for XML) that quantitatively and qualitatively analyze MAGE-ML public collections, gaining insights about schema usage. We address specific questions such as detection of common instance patterns and coverage, precision of the experiment descriptions, and usage of controlled vocabularies. Our case study shows that DescribeX is a useful tool for the evaluation of microarray experiment data quality that enhances the understanding of the instance-level structure of MAGE-ML datasets.


Advances in Focused Retrieval | 2009

Exploiting User Navigation to Improve Focused Retrieval

Mir Sadek Ali; Mariano P. Consens; Bassam Helou; Shahan Khatchadourian

A common approach for developing XML element retrieval systems is to adapt text retrieval systems to retrieve elements from documents. Two key challenges in this approach are to effectively score structural queries and to control overlap in the output across different search tasks. In this paper, we continue our research into the use of navigation models for element scoring as a way to represent the user's preferences for the structure of retrieved elements. Our goal is to improve search systems that use structural scoring by boosting the score of desirable elements and by post-processing results to control XML overlap. This year we participated in the Ad-hoc Focused, Efficiency, and Entity Ranking Tracks, where we focused our attention primarily on the effectiveness of small navigation models. Our experiments involved three modifications to our previous work: (i) using separate summaries for boosting and post-processing, (ii) introducing summaries generated from user study data, and (iii) confining our results to small models. Our results suggest that smaller models can be effective, but more work needs to be done to understand the cases where different navigation models may be appropriate.

Collaboration


Dive into Shahan Khatchadourian's collaborations.

Top Co-Authors

Giuseppe Pirrò

