Mohan Yang

University of California, Los Angeles

Publication

Featured research published by Mohan Yang.

International Conference on Management of Data | 2016

Big Data Analytics with Datalog Queries on Spark

Alexander Shkapsky; Mohan Yang; Matteo Interlandi; Hsuan Chiu; Tyson Condie; Carlo Zaniolo

There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires a deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and the effectiveness of Spark in supporting Datalog-based analytics.
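As a rough illustration of what "supporting recursion" means here, the sketch below evaluates the classic recursive Datalog program for transitive closure with semi-naive iteration, where each round joins only the newly derived facts with the base relation. It is a single-machine Python sketch, not BigDatalog's Spark implementation or API; the function name `transitive_closure` is hypothetical.

```python
from collections import defaultdict


def transitive_closure(arcs):
    """Semi-naive evaluation of the recursive Datalog program:
         tc(X, Y) <- arc(X, Y).
         tc(X, Y) <- tc(X, Z), arc(Z, Y).
    Each round joins only the facts derived in the previous round (delta)
    with arc, instead of re-joining the whole tc relation."""
    succ = defaultdict(set)
    for x, y in arcs:
        succ[x].add(y)
    tc = set(arcs)      # facts from the base (exit) rule
    delta = set(arcs)   # facts that are new in the current round
    while delta:
        new = set()
        for x, z in delta:           # join delta(X, Z) with arc(Z, Y)
            for y in succ[z]:
                if (x, y) not in tc:
                    new.add((x, y))
        tc |= new
        delta = new                  # fixpoint reached when nothing is new
    return tc


print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
```

In a distributed engine the delta join would run as a partitioned join per iteration; the fixpoint logic stays the same.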

International Conference on Data Engineering | 2015

Optimizing recursive queries with monotonic aggregates in DeALS

Alexander Shkapsky; Mohan Yang; Carlo Zaniolo

The exploding demand for analytics has refocused the attention of data scientists on applications requiring aggregation in recursion. After resisting the efforts of researchers for more than twenty years, this problem is being addressed by innovative systems that are raising logic-oriented data languages to the levels of generality and performance needed to efficiently support a broad range of applications. Foremost among these new systems, the Deductive Application Language System (DeALS) achieves superior generality and performance via new constructs and optimization techniques for monotonic aggregates, which are described in the paper. The use of a special class of monotonic aggregates in recursion was made possible by recent theoretical results proving that they preserve the rigorous least-fixpoint semantics of core Datalog programs. This paper thus describes how DeALS extends their definitions and modifies their syntax to enable a concise expression of applications that, without them, could not be expressed in performance-conducive ways, or could not be expressed at all. Then the paper turns to the performance issue, and introduces novel implementation and optimization techniques that outperform traditional approaches, including Semi-naive evaluation. An extensive experimental evaluation was executed comparing DeALS with other systems on large datasets. The results suggest that, unlike other systems, DeALS indeed combines superior generality with superior performance.
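A standard example of aggregation in recursion is single-source shortest paths, where a min aggregate sits inside the recursive rule. The sketch below is a minimal Python fixpoint for that pattern under stated assumptions: non-negative edge weights (so the iteration terminates), and a rule notation in the docstring that is illustrative rather than DeALS syntax. It only shows the general idea of keeping and propagating facts that improve the current minimum; it is not DeALS's implementation, and `shortest_paths` is a hypothetical name.

```python
def shortest_paths(edges, source):
    """Fixpoint of the recursive rules (illustrative notation, not DeALS syntax):
         sp(Y, min<D>) <- Y = source, D = 0.
         sp(Y, min<D>) <- sp(X, D1), edge(X, Y, W), D = D1 + W.
    Only facts that lower a node's current minimum are kept and propagated,
    so dominated (longer) paths are never expanded further.
    Assumes non-negative weights; a negative cycle would never converge."""
    adj = {}
    for x, y, w in edges:
        adj.setdefault(x, []).append((y, w))
    dist = {source: 0}
    delta = {source}                 # nodes whose minimum improved last round
    while delta:
        new_delta = set()
        for x in delta:
            for y, w in adj.get(x, []):
                d = dist[x] + w
                if d < dist.get(y, float("inf")):
                    dist[y] = d      # keep only the improving fact
                    new_delta.add(y)
        delta = new_delta
    return dist


edges = [("a", "b", 4), ("a", "c", 1), ("c", "b", 2), ("b", "d", 1)]
print(shortest_paths(edges, "a"))
```

Evaluating the same program without pushing min into recursion would first enumerate every path length and aggregate afterwards, which is what makes the pushed-in form attractive.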

International Conference on Management of Data | 2010

Optimizing content freshness of relations extracted from the web using keyword search

Mohan Yang; Haixun Wang; Lipyeow Lim; Min Wang

An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data accesses. As the data on the Web evolves, it is critical that the local copy be kept up-to-date. Data freshness is one of the most important data quality issues, and has been extensively studied for various applications including web crawling. However, web crawling is focused on obtaining as many raw web pages as possible. Our applications, on the other hand, are interested in specific content from specific data sources. Knowing the content or the semantics of the data enables us to differentiate data items based on their importance and volatility, which are key factors that impact the design of the data synchronization strategy. In this work, we formulate the concept of content freshness, and present a novel approach that maintains content freshness with the least amount of web communication. Specifically, we assume data is accessible through a general keyword search interface, and we form keyword queries based on their selectivity, as well as their contribution to the content freshness of the local copy. Experiments show the effectiveness of our approach compared with several naive methods for keeping data fresh.
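To make the trade-off between selectivity and freshness contribution concrete, here is a hypothetical greedy heuristic: each candidate keyword query costs as many records as it returns, each record carries an assumed freshness benefit (for example importance times change probability), and queries are picked by uncovered benefit per record downloaded until a budget is exhausted. This is a generic budgeted-coverage sketch, not the algorithm from the paper; `choose_queries` and its cost and benefit model are assumptions.

```python
def choose_queries(queries, benefit, budget):
    """Hypothetical greedy selection of keyword queries under a download budget.

    queries: dict mapping a keyword query to the set of record ids it returns;
             the result-set size stands in for its cost (its selectivity)
    benefit: dict mapping record id -> assumed freshness gain from refreshing it
    budget:  maximum total number of records to download
    """
    chosen, covered, spent = [], set(), 0
    remaining = dict(queries)

    def ratio(q):
        # Benefit of records not yet refreshed, per record downloaded.
        gain = sum(benefit.get(r, 0.0) for r in remaining[q] - covered)
        return gain / max(len(remaining[q]), 1)

    while remaining:
        best = max(remaining, key=ratio)
        if ratio(best) <= 0:
            break                        # nothing left worth refreshing
        cost = len(remaining[best])
        if spent + cost > budget:
            remaining.pop(best)          # too expensive; try cheaper queries
            continue
        chosen.append(best)
        covered |= remaining.pop(best)
        spent += cost
    return chosen


queries = {"q1": {1, 2, 3}, "q2": {3, 4}, "q3": {5}}
benefit = {1: 1.0, 2: 1.0, 3: 1.0, 4: 5.0, 5: 0.5}
print(choose_queries(queries, benefit, budget=5))
```

A greedy ratio rule is a common choice for budgeted coverage problems because it is cheap to recompute as the set of refreshed records grows.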

International Joint Conference on Artificial Intelligence | 2017

Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment

Muhao Chen; Yingtao Tian; Mohan Yang; Carlo Zaniolo

Many recent works have demonstrated the benefits of knowledge graph embeddings in completing monolingual knowledge graphs. Inasmuch as related knowledge bases are built in several different languages, achieving cross-lingual knowledge alignment will help people in constructing a coherent knowledge base, and assist machines in dealing with different expressions of entity relationships across diverse human languages. Unfortunately, achieving this highly desirable cross-lingual alignment by human labor is very costly and error-prone. Thus, we propose MTransE, a translation-based model for multilingual knowledge graph embeddings, to provide a simple and automated solution. By encoding entities and relations of each language in a separate embedding space, MTransE provides transitions for each embedding vector to its cross-lingual counterparts in other spaces, while preserving the functionalities of monolingual embeddings. We deploy three different techniques to represent cross-lingual transitions, namely axis calibration, translation vectors, and linear transformations, and derive five variants for MTransE using different loss functions. Our models can be trained on partially aligned graphs, where just a small portion of triples are aligned with their cross-lingual counterparts. The experiments on cross-lingual entity matching and triple-wise alignment verification show promising results, with some variants consistently outperforming others on different tasks. We also explore how MTransE preserves the key properties of its monolingual counterpart TransE.
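The three transition techniques can be read as three distance functions between an entity vector in one language space and its counterpart in another, alongside the monolingual TransE energy for a triple. The sketch below writes them in plain Python under simplifying assumptions: an L2 norm, a single translation vector `v`, and a single matrix `m` per language pair. The function names are hypothetical, and the paper's actual loss functions and five variants are not reproduced here.

```python
import math


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transe_score(h, r, t):
    """Monolingual TransE energy: small when h + r lies close to t."""
    return l2(sub(add(h, r), t))


def cross_axis(e, e2):
    """Axis calibration: both language spaces are assumed to share axes."""
    return l2(sub(e, e2))


def cross_vector(e, v, e2):
    """Translation vector: a shared offset v maps language 1 onto language 2."""
    return l2(sub(add(e, v), e2))


def cross_linear(m, e, e2):
    """Linear transformation: a matrix m maps language 1 onto language 2."""
    return l2(sub(matvec(m, e), e2))


rotate90 = [[0, -1], [1, 0]]
print(transe_score([0, 0], [1, 0], [1, 0]), cross_linear(rotate90, [1, 0], [0, 1]))
```

Training would minimize these energies for aligned pairs (and, typically, keep them large for corrupted ones); the linear-transformation form is the most expressive of the three because it can rotate and rescale one space onto another.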

International Conference on Big Data | 2014

Main memory evaluation of recursive queries on multicore machines

Mohan Yang; Carlo Zaniolo

Supporting iteration and/or recursion for advanced big data analytics requires reexamination of classical algorithms on modern computing environments. Several recent studies have focused on the implementation of transitive closure in multi-node clusters. Algorithms that deliver optimal performance on multi-node clusters are hardly optimal on multicore machines. We present an experimental study on finding efficient main memory recursive query evaluation algorithms on modern multicore machines. We review SEMINAIVE, SMART and a pair of single-source closure (SSC) algorithms. We also propose a new hybrid SSC algorithm, named SSC12, which combines two previously known SSC algorithms. We implement these algorithms on a multicore shared memory machine, and compare their memory utilization, speed and scalability on synthetic and real-life datasets. Our experiments show that, on multicore machines, the surprisingly simple SSC12 is the only transitive-closure algorithm that is consistently fast and memory-efficient on all test graphs.
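The single-source closure idea can be sketched as computing the full transitive closure one source at a time, with a graph traversal from each node. Because each source is independent, the per-source work is a natural unit to hand to separate cores. The Python sketch below uses plain BFS and runs sequentially; it illustrates the generic SSC approach only, not SSC12 or either of the specific SSC algorithms it combines, and the function names are hypothetical.

```python
from collections import deque


def single_source_closure(adj, s):
    """Nodes reachable from s via one or more arcs (BFS).
    s itself is included only if it lies on a cycle."""
    seen, queue = set(), deque([s])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def transitive_closure_ssc(arcs):
    adj = {}
    for x, y in arcs:
        adj.setdefault(x, []).append(y)
    # Each source is processed independently, so this loop could be split
    # across cores (sequential here for simplicity).
    return {(s, y) for s in adj for y in single_source_closure(adj, s)}


print(sorted(transitive_closure_ssc({(1, 2), (2, 3), (3, 4)})))
```

Unlike semi-naive evaluation, which materializes and joins an ever-growing global delta relation, each traversal here only needs a private visited set, which helps explain why single-source approaches can be memory-friendly on shared-memory machines.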

Theory and Practice of Logic Programming | 2017

Fixpoint semantics and optimization of recursive Datalog programs with aggregates

Carlo Zaniolo; Mohan Yang; Ariyam Das; Alexander Shkapsky; Tyson Condie; Matteo Interlandi

A very desirable Datalog extension investigated by many researchers in the last thirty years consists in allowing the use of the basic SQL aggregates min, max, count and sum in recursive rules. In this paper, we propose a simple comprehensive solution that extends the declarative least-fixpoint semantics of Horn Clauses, along with the optimization techniques used in the bottom-up implementation approach adopted by many Datalog systems. We start by identifying a large class of programs of great practical interest in which the use of min or max in recursive rules does not compromise the declarative fixpoint semantics of the programs using those rules. Then, we revisit the monotonic versions of count and sum aggregates proposed in (Mazuran et al. 2013b) and named, respectively, mcount and msum. Since mcount, and also msum on positive numbers, are monotonic in the lattice of set-containment, they preserve the fixpoint semantics of Horn Clauses. However, in many applications of practical interest, their use can lead to inefficiencies that can be eliminated by combining them with max, whereby mcount and msum become the standard count and sum. Therefore, the semantics and optimization techniques of Datalog are extended to recursive programs with min, max, count and sum, making possible the advanced applications of superior performance and scalability demonstrated by BigDatalog (Shkapsky et al. 2016) and Datalog-MC (Yang et al. 2017). This paper is under consideration for acceptance in TPLP.
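The relationship between mcount and count can be illustrated with a simplified model: treat mcount over a group as the set of every count value reached as distinct items arrive, {1, ..., n}. Adding items can only add values, never retract one, which is the set-containment monotonicity the abstract refers to, and taking max over that set recovers the ordinary count. This is an assumed toy model for intuition; the paper's formal definitions (and those of msum) may differ, and the function names are hypothetical.

```python
def mcount(items):
    """Simplified monotonic count: the set {1, ..., n} of every count reached
    as distinct items arrive. A larger input set never loses a value, so the
    result grows monotonically under set containment."""
    seen, values = set(), set()
    for x in items:
        if x not in seen:
            seen.add(x)
            values.add(len(seen))
    return values


def count_via_max(items):
    # max over mcount's values is the ordinary count (0 for an empty group).
    return max(mcount(items), default=0)


print(mcount(["a", "b", "a", "c"]), count_via_max(["a", "b", "a", "c"]))
```

Materializing every intermediate value is the kind of inefficiency the abstract mentions: keeping only the maximum avoids storing and propagating n facts per group.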

Very Large Data Bases | 2017

Scaling up the performance of more powerful Datalog systems on multicore machines

Mohan Yang; Alexander Shkapsky; Carlo Zaniolo

Extending RDBMS technology to achieve performance and scalability for queries that are much more powerful than those of SQL-2 has been the goal of deductive database research for more than thirty years. The

Information Sciences | 2014