Yinglong Xia | Researchain

Publications

Featured research published by Yinglong Xia.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

GraphBIG: understanding graph computing in the context of industrial solutions

Lifeng Nai; Yinglong Xia; Ilie Gabriel Tanase; Hyesoon Kim; Ching-Yung Lin

With the emergence of data science, graph computing is becoming a crucial tool for processing big connected data. Although efficient implementations of specific graph applications exist, the behavior of full-spectrum graph computing remains unknown. To understand graph computing, we must consider multiple graph computation types, graph frameworks, data representations, and various data sources in a holistic way. In this paper, we present GraphBIG, a benchmark suite inspired by the IBM System G project. To cover major graph computation types and data sources, GraphBIG selects representative data structures, workloads, and data sets from 21 real-world use cases spanning multiple application domains. We characterized GraphBIG on real machines and observed extremely irregular memory access patterns and significantly diverse behavior across different computations. GraphBIG helps users understand the impact of modern graph computing on the hardware architecture and enables future architecture and system research.

Proceedings of the Workshop on GRAph Data management Experiences and Systems | 2014

A Highly Efficient Runtime and Graph Library for Large Scale Graph Analytics

Ilie Gabriel Tanase; Yinglong Xia; Lifeng Nai; Yanbin Liu; Wei Tan; Jason Crawford; Ching-Yung Lin

Graph analytics on big data is currently a very active area of research in both industry and academia. To support graph analytics efficiently, a large number of graph processing systems have emerged, targeting various aspects of a graph application such as in-memory and on-disk representations, persistent storage, database capability, and runtimes and execution models for exploiting parallelism. In this paper, we discuss a novel graph processing system called System G Native Store, which allows for efficient graph data organization and processing on modern computing architectures. In particular, we describe a runtime designed to exploit multiple levels of parallelism and a generic infrastructure that allows users to express graphs with various in-memory and persistent storage properties. We experimentally show the efficiency of System G Native Store for processing graph queries on state-of-the-art platforms.

International Conference on Big Data | 2014

Graph analytics and storage

Yinglong Xia; Ilie Gabriel Tanase; Lifeng Nai; Wei Tan; Yanbin Liu; Jason Crawford; Ching-Yung Lin

Many Big Data analytics tasks essentially explore the relationships among interconnected entities, which are naturally represented as graphs. However, due to the irregular data access patterns in graph computations, delivering highly efficient solutions for large-scale graph analytics remains a fundamental challenge. Such inefficiency restricts the use of many graph algorithms in Big Data scenarios. To address the performance issues in large-scale graph analytics, we develop a graph processing system called System G, which explores efficient graph data organization for parallel computing architectures. We discuss various graph data organizations and their impact on data locality during graph traversals, which leads to different cache behavior on the processor side. In addition, we analyze data parallelism from an architectural perspective and experimentally show the efficiency of System G-based graph analytics. We present experimental results on commodity multicore clusters and IBM PERCS supercomputers to illustrate the performance of System G for large-scale graph analytics.
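
The abstract does not specify System G's internal data layout. As a hedged illustration of how graph data organization shapes traversal locality, the Python sketch below stores a graph in compressed sparse row (CSR) form, a common layout assumed here rather than taken from the paper; `build_csr` and `bfs_levels` are hypothetical names. Each vertex's neighbors sit in one contiguous slice, so scanning them is cache-friendly, while the per-neighbor lookups into `level` jump around memory, which is the kind of irregular access the abstract refers to.

```python
from collections import deque
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a compressed sparse row (CSR) layout for a directed graph.

    The out-neighbors of vertex v are targets[offsets[v]:offsets[v + 1]].
    """
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count out-degree of each source
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]   # prefix sum gives slice boundaries
    targets = [0] * len(edges)
    cursor = offsets[:-1]              # slicing copies: next free slot per vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def bfs_levels(offsets: List[int], targets: List[int], root: int) -> List[int]:
    """Breadth-first search; returns hop distance from root, or -1 if unreachable."""
    level = [-1] * (len(offsets) - 1)
    level[root] = 0
    frontier = deque([root])
    while frontier:
        v = frontier.popleft()
        # Contiguous scan of v's neighbors (good spatial locality) ...
        for u in targets[offsets[v]:offsets[v + 1]]:
            # ... but level[u] may be anywhere in memory (irregular access).
            if level[u] == -1:
                level[u] = level[v] + 1
                frontier.append(u)
    return level


# Hypothetical example: 6 vertices, vertex 5 is unreachable from 0.
offsets, targets = build_csr(6, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
levels = bfs_levels(offsets, targets, root=0)
print(levels)  # [0, 1, 1, 2, 3, -1]
```

In the example, vertex 5 has no incoming edges and so stays at level -1.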

International Conference on Multimedia and Expo | 2014

Concurrent image query using local random walk with restart on large scale graphs

Yinglong Xia; Jui-Hsin Lai; Lifeng Nai; Ching-Yung Lin

Efficient image query is a fundamental challenge in many large-scale multimedia applications, especially when many queries must be handled concurrently. In this paper, we propose a novel approach called graph local random walk for high-performance concurrent image query. Specifically, we use a graph database to organize the massive image set into a large-scale graph according to the similarity between images. A heuristic method maps each query image to a vertex in the graph, followed by a local search that refines the query results using a variant of local random walk on the graph. The local random walk process is essentially a weighted partial traversal of local subgraphs to find a better match for the query images. We organize the graph of the image set in a parallelization-friendly way, so that a set of partial graph traversals for local random walks can be performed concurrently, taking advantage of the multithreading capability of processors. We implemented the proposed method on state-of-the-art multicore platforms. Experimental results show that the graph local random walk approach outperforms baseline methods in terms of both throughput and scalability.
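
As a rough sketch of the local random walk idea, assuming a random walk with restart (RWR) that is simply cut off at a k-hop neighborhood of the matched vertex (the abstract does not give the paper's exact variant), the Python code below scores vertices near a seed image vertex. The graph is a dictionary of similarity-weighted edges; `local_rwr`, its default parameters (`hops=2`, `restart=0.15`, `iters=50`), and the example weights are all hypothetical.

```python
from typing import Dict, Set

# Weighted similarity graph: adj[v][u] = similarity weight of edge v -> u.
Graph = Dict[int, Dict[int, float]]


def k_hop_neighborhood(adj: Graph, seed: int, hops: int) -> Set[int]:
    """Vertices within `hops` edges of `seed`; this is the local subgraph."""
    visited = {seed}
    frontier = {seed}
    for _ in range(hops):
        nxt = set()
        for v in frontier:
            for u in adj.get(v, {}):
                if u not in visited:
                    visited.add(u)
                    nxt.add(u)
        frontier = nxt
    return visited


def local_rwr(adj: Graph, seed: int, hops: int = 2,
              restart: float = 0.15, iters: int = 50) -> Dict[int, float]:
    """Random walk with restart confined to the seed's k-hop subgraph.

    Assumption: steps that would leave the subgraph are simply not taken,
    and vertices with no local out-edges send their mass back to the seed.
    """
    local = k_hop_neighborhood(adj, seed, hops)
    score = {v: 0.0 for v in local}
    score[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in local}
        nxt[seed] = restart  # total mass is 1, so restart mass returns to the seed
        for v in local:
            out = {u: w for u, w in adj.get(v, {}).items() if u in local}
            total = sum(out.values())
            if total == 0:
                nxt[seed] += (1 - restart) * score[v]
                continue
            for u, w in out.items():
                nxt[u] += (1 - restart) * score[v] * w / total
        score = nxt
    return score


# Hypothetical symmetric similarity graph over five images.
similarity: Graph = {
    0: {1: 0.9, 2: 0.5},
    1: {0: 0.9, 2: 0.2},
    2: {0: 0.5, 1: 0.2, 3: 0.8},
    3: {2: 0.8, 4: 0.7},
    4: {3: 0.7},
}
scores = local_rwr(similarity, seed=0)
# Rank candidate matches (excluding the seed itself) by relevance.
ranking = sorted((v for v in scores if v != 0), key=scores.get, reverse=True)
```

Image 4 is three hops from the seed and never receives a score. Because each query touches only its own small subgraph, many such calls could run on separate threads, which is the concurrency the abstract exploits; the sketch itself is single-threaded.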

Computing Frontiers | 2014

Cache-conscious graph collaborative filtering on multi-socket multicore systems

Lifeng Nai; Yinglong Xia; Ching-Yung Lin; Bo Hong; Hsien-Hsin S. Lee

Recommendation systems using graph collaborative filtering often require real-time responses and high throughput. Therefore, besides recommendation accuracy, it is critical to study high-performance concurrent collaborative filtering on modern platforms. To achieve high performance, we study the graph data locality characteristics of collaborative filtering. Our experiments demonstrate that although an individual graph traversal exhibits poor data locality, multiple queries tend to share their data footprints, especially queries with neighboring root vertices. Such characteristics lead to both inter- and intra-thread data locality, which can be exploited to significantly improve collaborative filtering performance. Based on these observations, we present a cache-conscious system for collaborative filtering on modern multi-socket multicore platforms. In this system, we propose a cache-conscious query scheduling technique and an in-memory graph representation that address both inter- and intra-thread data locality, in order to maximize cache performance and minimize cross-core/socket communication overhead. To address workload imbalance, this study introduces a dynamic work-stealing mechanism to explore the tradeoff between workload balancing and cache-consciousness. The proposed system was evaluated on a Power7+ system using the IBM Knowledge Repository graph dataset. The results demonstrated both good scalability and high throughput. Compared with the basic system that does not perform cache-conscious scheduling, inter-thread scheduling improves throughput by up to 18%. Intra-thread scheduling can further improve throughput by as much as 22%. By enabling dynamic work-stealing, the proposed technique balances workloads across all threads with a low standard deviation of the per-thread processing time.
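
A minimal Python sketch of the two scheduling ideas in this abstract, under stated assumptions. Queries are grouped by sorting on their root vertex ID (this assumes nearby IDs have overlapping neighborhoods, which holds only after a locality-preserving vertex ordering), each thread gets a contiguous batch, and idle threads steal from the far end of the longest queue. The helper names and the `process` callback, which stands in for a collaborative-filtering traversal, are hypothetical, and a single coarse lock keeps the example simple rather than fast.

```python
import threading
from collections import deque
from typing import Callable, Deque, Dict, List


def schedule_by_root(roots: List[int], num_threads: int) -> List[Deque[int]]:
    """Split query indices into per-thread queues, ordered by root vertex ID.

    Assumption: nearby vertex IDs have overlapping neighborhoods, so queries
    with neighboring roots land on the same thread and can share cached data.
    """
    order = sorted(range(len(roots)), key=lambda q: roots[q])
    chunk = -(-len(order) // num_threads)  # ceiling division
    return [deque(order[i * chunk:(i + 1) * chunk]) for i in range(num_threads)]


def run_with_stealing(queues: List[Deque[int]],
                      process: Callable[[int], int]) -> Dict[int, int]:
    """Run every queued query once; idle threads steal from the fullest queue."""
    lock = threading.Lock()  # one coarse lock: simple, not scalable
    results: Dict[int, int] = {}

    def worker(tid: int) -> None:
        while True:
            with lock:
                if queues[tid]:
                    q = queues[tid].popleft()  # own work, in locality order
                else:
                    victim = max(range(len(queues)), key=lambda t: len(queues[t]))
                    if not queues[victim]:
                        return  # no work left anywhere
                    q = queues[victim].pop()  # steal from the far end
            value = process(q)  # stands in for a graph traversal
            with lock:
                results[q] = value

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(len(queues))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical root vertices for eight queries, run on two threads.
roots = [42, 3, 17, 4, 40, 18, 5, 41]
queues = schedule_by_root(roots, num_threads=2)
initial = [list(q) for q in queues]  # [[1, 3, 6, 2], [5, 4, 7, 0]]
results = run_with_stealing(queues, lambda q: roots[q] * 2)
```

Popping from the front of a thread's own queue preserves the locality order, while stealing from the back takes the queries least likely to share a footprint with the owner's upcoming work.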

IBM Journal of Research and Development | 2016

Uncovering insider threats from the digital footprints of individuals

Anni Coden; Wan-Yi Lin; Jeff Boston; Julie MacNaught; Danny Soroker; Justin D. Weisz; Shimei Pan; Jui-Hsin Lai; Jie Lu; Steve Wood; Yinglong Xia; Ching-Yung Lin

We present a system to detect anomalous and ultimately malevolent behavior of people from their digital footprints within an institution. Tripwire approaches based on single features cannot adequately distinguish between normal unpredictable activities and truly counterproductive behavior. For example, a sequence of copying and sending small amounts of data can easily elude a pure single-feature tripwire approach. Here, we combine semantic knowledge with data mining methods. Our system uses a multi-layer architecture in which many aspects of a person's behavior are quantified and then fused using a large-scale anomaly detection Markovian Bayesian network. Evaluation results are based on data for 5,500 people, assumed to be non-malicious, collected from their activities on their workstations inside a corporation. An outside team augmented this data so that some of the 5,500 individuals (the perpetrators) acted in a malicious fashion. Our system represents the 5,500 people as a ranked list, with the people most likely to act maliciously at the top. Our system identifies the perpetrators within the top 2% of the ranked list, while a purely statistical method ranks them within the top 25%. Our scalable infrastructure allows for plug-and-play of different analytics and maintains the provenance of results.
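
The headline metric here, the position of the perpetrators in the ranked list, can be made concrete with a small Python sketch. It assumes each person already has a single fused anomaly score (the Markovian Bayesian network that produces it is not shown) and reports what fraction of the list must be read to reach the last flagged person; the function name and the synthetic scores are made up for illustration. For 5,500 people, the top 2% is the first 110 entries.

```python
from typing import Dict, Iterable


def worst_rank_fraction(scores: Dict[str, float], flagged: Iterable[str]) -> float:
    """Fraction of the ranked list to read before every flagged person is seen.

    People are ranked by descending anomaly score (ties broken by name);
    a result of 0.02 means all flagged people are in the top 2% of the list.
    """
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    position = {p: i + 1 for i, p in enumerate(ranked)}  # 1-based rank
    return max(position[p] for p in flagged) / len(ranked)


# Synthetic scores for 5,500 people: user0 is most anomalous, user5499 least.
scores = {f"user{i}": 1.0 / (i + 1) for i in range(5500)}
fraction = worst_rank_fraction(scores, ["user0", "user50", "user109"])
print(fraction)  # 0.02, i.e. rank 110 of 5,500
```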

International Conference on Multimedia and Expo | 2015

IBM System G Social Media Solution: Analyze multimedia content, people, and network dynamics in context

Ching-Yung Lin; Danny L. Yeh; Nan Cao; Jui-Hsin Lai; Chun-Fu Chen; Conglei Shi; Jie Lu; Jason Crawford; Yinglong Xia; Sabrina Lin; Richard Hull; Fenno F. Terry Heath; Piyawadee Sukaviriya; SweeFen Goh

We present IBM System G Social Media Solution, which includes a suite of applications designed for in-context monitoring, exploration, and analysis of social multimedia content as well as related people and network dynamics. Each individual application focuses on a unique aspect of social media data analysis in relevant context; collectively, they provide a comprehensive set of tools for exploring and analyzing real-time and historical social media data at large scale. The solution is built on a unified data management platform, based on a property graph model, that efficiently handles a large variety of social media applications.
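
The abstract names a property graph model as the data foundation but gives no schema. The Python sketch below shows what that model means in general, assuming nothing about System G's actual API: vertices and edges each carry a label plus arbitrary key-value properties, and traversals can filter on edge labels. The classes and the example social-media entities are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Vertex:
    vid: int
    label: str                      # e.g. "person", "post"
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    src: int
    dst: int
    label: str                      # e.g. "authored", "follows"
    props: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph: labeled vertices and edges with properties."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.out_edges: Dict[int, List[Edge]] = {}

    def add_vertex(self, vid: int, label: str, **props: Any) -> None:
        self.vertices[vid] = Vertex(vid, label, props)
        self.out_edges.setdefault(vid, [])

    def add_edge(self, src: int, dst: int, label: str, **props: Any) -> None:
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added first")
        self.out_edges[src].append(Edge(src, dst, label, props))

    def neighbors(self, vid: int, label: Optional[str] = None) -> List[Vertex]:
        """Out-neighbors of vid, optionally restricted to one edge label."""
        return [self.vertices[e.dst] for e in self.out_edges[vid]
                if label is None or e.label == label]


# Hypothetical social-media snippet.
g = PropertyGraph()
g.add_vertex(1, "person", name="alice")
g.add_vertex(2, "post", text="hello world")
g.add_vertex(3, "person", name="bob")
g.add_edge(1, 2, "authored", time="09:30")
g.add_edge(3, 2, "shared")
g.add_edge(3, 1, "follows")
print([v.props["name"] for v in g.neighbors(3, "follows")])  # ['alice']
```

Keeping labels and properties on edges, not only on vertices, is what lets one graph mix relationships such as authorship, sharing, and following.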

European Conference on Parallel Processing | 2015

Accelerating Minimum Spanning Forest Computations on Multicore Platforms

Guojing Cong; Ilie Gabriel Tanase; Yinglong Xia

We propose new approaches for accelerating minimum spanning forest algorithms on shared-memory platforms. Our approaches improve cache performance and reduce the synchronization overhead of the base algorithms. On our target platform, these optimizations achieve up to an order of magnitude speedup over the best prior parallel Borůvka implementation.
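
For readers unfamiliar with Borůvka's algorithm, the following sequential Python sketch computes a minimum spanning forest. In each round, every component picks its cheapest outgoing edge, and all picked edges are merged with a union-find structure; ties are broken by edge index so that no cycle can form. It shows only the base algorithm: the cache and synchronization optimizations that the paper contributes are not reproduced, and the example graph is made up.

```python
from typing import List, Optional, Tuple

WEdge = Tuple[int, int, float]  # (u, v, weight)


def find(parent: List[int], x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def boruvka_msf(n: int, edges: List[WEdge]) -> List[WEdge]:
    """Sequential Borůvka: returns the edges of a minimum spanning forest."""
    parent = list(range(n))
    forest: List[WEdge] = []
    while True:
        # cheapest[r] = index of the lightest edge leaving component r.
        cheapest: List[Optional[int]] = [None] * n
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:
                continue  # internal edge, ignore
            for r in (ru, rv):
                best = cheapest[r]
                # Break weight ties by edge index so no cycle can form.
                if best is None or (w, i) < (edges[best][2], best):
                    cheapest[r] = i
        merged = False
        for i in cheapest:
            if i is None:
                continue
            u, v, _ = edges[i]
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:  # the same edge may be picked by both endpoints
                parent[ru] = rv
                forest.append(edges[i])
                merged = True
        if not merged:
            return forest


# Hypothetical graph with two connected components: {0, 1, 2, 3} and {4, 5}.
edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8), (4, 5, 3)]
msf = boruvka_msf(6, edges)
print(sum(w for _, _, w in msf))  # 11
```

Each round at least halves the number of components that can still be merged, so the loop runs O(log n) rounds; the per-round edge scan is the part that parallel implementations distribute across threads.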

Archive | 2014