Ying Qian
Queen's University
Publication
Featured research published by Ying Qian.
international conference on supercomputing | 2003
Nathan R. Fredrickson; Ahmad Afsahi; Ying Qian
With the increasing popularity of small to large-scale symmetric multiprocessor (SMP) systems, there has been a dire need for sophisticated and flexible development and runtime environments for efficient and rapid development of parallel applications. To this end, OpenMP has emerged as the standard for parallel programming on shared-memory systems. It is very important to evaluate the performance of OpenMP constructs, kernels, and application benchmarks on large-scale SMP systems. We present the performance of the basic OpenMP constructs, class B of the NAS OpenMP 3.0 benchmarks, and the SPEC OMPL2001 application benchmarks (large data set) on a contemporary 72-processor Sun Fire 15K SMP node. We report the basic timings, scalability, and runtime profiles of different parallel regions within each benchmark in the NAS OpenMP 3.0 and SPEC OMPL2001 suites. We elaborate on the performance differences between the medium and large classes of the SPEC OMP2001 suites on our system, and compare a number of large-scale symmetric multiprocessors on SPEC OMPL2001.
international conference on cluster computing | 2007
Reza Zamani; Ahmad Afsahi; Ying Qian; V. Carl Hamacher
High-performance computing (HPC) systems consume a significant amount of power, resulting in high operational costs, reduced reliability, and waste of natural resources. Therefore, power consumption has become an increasingly important design constraint in high-performance clusters. In this regard, research on power-aware HPC has emerged. While most research has focused on understanding and utilizing applications' behavior to scale down the CPU for energy savings, this paper demonstrates the positive impact of modern interconnects in delivering energy efficiency in high-performance clusters. In this work, we first present the power-performance profiles of the Myrinet-2000 and Quadrics QsNetII at the user level and MPI level in comparison to a traditional, non-offloaded Gigabit Ethernet. Such information enables us to devise a power-aware MPI runtime library that automatically and transparently performs message segmentation and re-assembly in order to increase energy savings. Secondly, by designing and evaluating a number of all-gather collectives, we argue that it is possible to increase the energy efficiency of a cluster by optimizing its messaging layers.
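The message segmentation and re-assembly mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the library's internals, so the function names `segment` and `reassemble`, the offset-tagged chunk format, and the segment size are all assumptions made for illustration; the sketch shows only the splitting and rebuilding of a message, not any power measurement or MPI integration.

```python
# Hypothetical sketch: these names and the (offset, chunk) format are
# illustrative assumptions, not the paper's runtime library.

def segment(message: bytes, segment_size: int) -> list[tuple[int, bytes]]:
    """Split a message into (offset, chunk) pairs of at most segment_size bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        (offset, message[offset:offset + segment_size])
        for offset in range(0, len(message), segment_size)
    ]


def reassemble(segments, total_length: int) -> bytes:
    """Rebuild the message from (offset, chunk) pairs, in any arrival order."""
    buffer = bytearray(total_length)
    for offset, chunk in segments:
        buffer[offset:offset + len(chunk)] = chunk
    return bytes(buffer)


if __name__ == "__main__":
    payload = bytes(range(10))
    parts = segment(payload, 4)          # three chunks: 4 + 4 + 2 bytes
    restored = reassemble(reversed(parts), len(payload))
    print(restored == payload)
```

Tagging each chunk with its offset lets the receiver rebuild the message even if segments arrive out of order, which is one plausible reason a runtime could do this transparently to the application.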
high performance computing and communications | 2015
Tianda Yang; Yu Yang; Kai Qian; Dan Chia-Tien Lo; Ying Qian; Lixin Tao
Along with the rapid growth of new science and technology, smartphones have become increasingly powerful. They bring great convenience, but they also bring security risks. Malicious applications have become a major threat to mobile security, so an efficient security analysis and detection method is important and necessary. An attack by a malicious application can prevent normal use of the smartphone and lead to the theft of personal information. Worse, the proliferation of attacks will harm the healthy growth of the mobile Internet industry. To limit the spread of malicious applications, we first need to know what a malicious application is and how to deal with it. Detecting and analyzing their behavior helps us understand the attack principles so that we can take effective countermeasures. This article describes the basic Android components and manifest, the reasons Android is prevalent, and why attacks arise. It also analyzes and probes malicious ransomware that currently threatens mobile security, using our automated analysis approach for detecting such mobile malware.
international conference on parallel processing | 2007
Ying Qian; Ahmad Afsahi
Clusters of symmetric multiprocessors (SMP) are more commonplace than ever in achieving high performance. Scientific applications running on clusters employ collective communications extensively. Using shared memory communication among co-located processes on SMP nodes as well as remote direct memory access (RDMA) operations for inter-node communication and trying to overlap them is a proven technique in boosting the performance of collective operations. The effect is much more pronounced when efficient multi-port collectives on multi-rail networks are devised and implemented. In this work, we design and implement multi-port RDMA-based and SMP-aware all-gather algorithms with message striping over multi-rail QsNetII directly at the Elan level. We compare our algorithms against RDMA-only traditional algorithms and the native elan_gather(). Our performance results indicate that the proposed SMP-aware Bruck all-gather gains an improvement of up to 1.96 for 4 KB messages over the native elan_gather(). Meanwhile, the direct algorithm achieves up to 1.49 improvement for 32 KB messages.
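For readers unfamiliar with the Bruck-style all-gather that this abstract builds on, the following is a minimal single-process simulation of the textbook concatenation algorithm. It is a sketch under stated assumptions: the function name `bruck_allgather` is hypothetical, lists stand in for per-process buffers, and the SMP-aware, multi-port, and RDMA aspects of the paper are not modeled.

```python
# Hypothetical simulation of the textbook Bruck (concatenation) all-gather.
# It is not the paper's SMP-aware, multi-rail implementation.

def bruck_allgather(blocks: list) -> list[list]:
    """Simulate all-gather among len(blocks) processes.

    blocks[r] is the data contributed by rank r. The result is one list per
    rank, each holding every rank's block in rank order.
    """
    p = len(blocks)
    # Each rank starts with only its own block at position 0.
    buffers = [[block] for block in blocks]
    distance = 1
    while distance < p:
        # In this step each rank sends its first `count` blocks to rank
        # (r - distance) and receives the same amount from (r + distance).
        count = min(distance, p - distance)
        outgoing = [buf[:count] for buf in buffers]  # snapshot before updates
        for rank in range(p):
            source = (rank + distance) % p
            buffers[rank].extend(outgoing[source])
        distance *= 2
    # Rank r now holds blocks r, r+1, ..., r+p-1 (mod p); rotate into rank order.
    return [buf[p - rank:] + buf[:p - rank] for rank, buf in enumerate(buffers)]


if __name__ == "__main__":
    for row in bruck_allgather(["a", "b", "c", "d", "e"]):
        print(row)
```

The number of communication steps grows logarithmically with the number of processes, which is why this family of algorithms is usually favored for small messages.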
local computer networks | 2004
Reza Zamani; Ying Qian; Ahmad Afsahi
It is important to systematically assess the features and performance of the new interconnects for high performance clusters. This work presents the performance of the two-port Myrinet networks at the GM2 and MPI layers using a complete set of microbenchmarks. We also present the communication characteristics and the performance of the NAS multi-zone benchmarks and SMG2000 application under the MPI and MPI-OpenMP programming paradigms. We found that the host overhead is very small in our cluster, and the Myrinet is sensitive to the buffer reuse patterns. Our applications achieved a better performance for MPI than the mixed-mode. All the applications studied use only nonblocking communications, thus are able to overlap their communications with the computations. Our experiments show that the two-port communication at the GM and MPI levels (except for the RDMA read, and overlap) outperforms the one-port communication for the bandwidth. However, this did not translate into a considerable improvement at least for our applications.
international parallel and distributed processing symposium | 2006
Ying Qian; Ahmad Afsahi
Many scientific applications use MPI collective communications intensively. Therefore, efficient and scalable implementation of collective operations is critical to the performance of such applications running on clusters. Quadrics QsNetII is a high-performance interconnect for clusters that implements some collectives at the Elan level. These collectives are directly used by their corresponding MPI collectives. Quadrics software supports point-to-point striping over multi-rail QsNetII networks. However, multi-rail collectives have not been supported. In this work, we propose a number of RDMA-based multi-port collectives over multi-rail QsNetII clusters directly at the Elan level. Our performance results indicate that the proposed multi-port gather gains an improvement of up to 6.35 for 1MB message over the native elan_gather. The proposed multi-port all-to-all performs better than the native elan_alltoall by a factor of 2.19 for 16KB message. Moreover, we have also proposed two algorithms for the scatter operation
european pvm mpi users group meeting on recent advances in parallel virtual machine and message passing interface | 2009
Ying Qian; Ahmad Afsahi
Recent studies have shown that processes in real applications can arrive at the collective calls at different times. This imbalanced process arrival pattern can significantly affect the performance of the collective operations. MPI_Alltoall() is a communication-intensive collective operation that is used in many parallel scientific applications. Its efficient implementation under different process arrival patterns is critical to the performance of applications that use them frequently. In this paper, we propose novel RDMA-based process arrival pattern aware MPI_Alltoall() algorithms over InfiniBand clusters. We extend the algorithms to be shared memory aware for small to medium size messages. The micro-benchmark and application results indicate that the proposed algorithms outperform the native implementation as well as their non-process arrival pattern aware counterparts when processes arrive at different times.
european conference on networks and communications | 2016
Ying Qian; Wanqing You; Kai Qian
Software-defined networking (SDN) is a new concept in computer networks that separates the control plane from the data plane and provides a programmable network architecture that can facilitate rapid network innovation. OpenFlow is a network protocol that standardizes the communications between OpenFlow controllers and OpenFlow switches. It is considered an enabler of SDN. The flow table in OpenFlow switches plays a critical role in OpenFlow-based SDN: it stores the rules populated by the controllers for controlling and directing the packet flows in the network. Nevertheless, flow tables have also become a new target of malicious attacks. This paper analyzes the flow table overflow attack, a type of denial-of-service attack, and proposes a systematic way to mitigate the overflow in the flow table.
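The overflow condition described in this abstract can be illustrated with a toy model of a bounded flow table. This is a sketch under assumptions, not the paper's mitigation scheme, which the abstract does not detail: the `FlowTable` class, its capacity, and the string-valued match keys are hypothetical, and the only recovery mechanism shown is an idle timeout of the kind OpenFlow flow entries carry.

```python
# Hypothetical toy model of a fixed-capacity flow table. It illustrates the
# overflow condition only and is not the mitigation proposed in the paper.

class FlowTable:
    def __init__(self, capacity: int, idle_timeout: float) -> None:
        self.capacity = capacity
        self.idle_timeout = idle_timeout
        self._last_used: dict[str, float] = {}  # match key -> last use time

    def _expire(self, now: float) -> None:
        """Drop rules that have been idle for at least idle_timeout seconds."""
        stale = [m for m, t in self._last_used.items()
                 if now - t >= self.idle_timeout]
        for match in stale:
            del self._last_used[match]

    def install(self, match: str, now: float) -> bool:
        """Install or refresh a rule; return False if the table is full."""
        self._expire(now)
        if match not in self._last_used and len(self._last_used) >= self.capacity:
            return False  # overflow: the new flow cannot get a rule
        self._last_used[match] = now
        return True

    def __len__(self) -> int:
        return len(self._last_used)


if __name__ == "__main__":
    table = FlowTable(capacity=3, idle_timeout=10.0)
    for i in range(3):
        table.install(f"attacker-flow-{i}", now=0.0)   # attacker fills the table
    print(table.install("legitimate-flow", now=1.0))   # rejected while full
    print(table.install("legitimate-flow", now=10.0))  # accepted after expiry
```

An attacker who keeps generating unique flows can hold the table at capacity, so legitimate flows are denied rules until entries age out, which is the denial-of-service effect the abstract refers to.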
high performance computing systems and applications | 2007
Ying Qian; Ahmad Afsahi
Scientific applications written in MPI use collective communications intensively. Efficient and scalable implementation of such collective operations is therefore crucial to the performance of MPI applications running on clusters. Quadrics QsNetII is a high-performance network that implements some collectives at its Elan user-level library. Its MPI implementation uses such primitives directly. Quadrics communication software supports point-to-point message striping over multi-rail QsNetII networks. However, multi-rail collectives, other than broadcast, are not supported. In this work, we propose, design and implement a number of RDMA-based multi-port algorithms for the all-gather operation over multi-rail QsNetII clusters directly at the Elan level. Our performance results indicate that the proposed multi-port all-gather Direct algorithm gains an improvement of up to 1.49 for 32 KB messages over the native elan_gather() collective.
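One way to picture a multi-port Direct all-gather is as a schedule in which every process sends its own block straight to every other process, using up to k ports (rails) per step. The sketch below generates such a schedule; the function name `direct_allgather_schedule` and the particular ordering of peers are assumptions for illustration, since the abstract does not specify how the paper assigns peers to ports.

```python
# Hypothetical schedule generator for a multi-port direct all-gather. The
# peer ordering is an illustrative assumption, not the paper's design.

def direct_allgather_schedule(num_procs: int, num_ports: int) -> list[list[tuple[int, int]]]:
    """Return a list of steps; each step is a list of (source, destination) sends.

    In every step each process sends its own block to at most num_ports peers,
    so the schedule has ceil((num_procs - 1) / num_ports) steps.
    """
    if num_procs < 1 or num_ports < 1:
        raise ValueError("num_procs and num_ports must be positive")
    steps = []
    for first in range(1, num_procs, num_ports):
        offsets = range(first, min(first + num_ports, num_procs))
        steps.append([
            (src, (src + offset) % num_procs)
            for src in range(num_procs)
            for offset in offsets
        ])
    return steps


if __name__ == "__main__":
    for number, step in enumerate(direct_allgather_schedule(4, 2)):
        print(number, step)
```

With two ports, four processes finish in two steps instead of three, which gives a rough sense of how additional rails can shorten a direct exchange.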
wireless network security | 2015
Wanqing You; Kai Qian; Minzhe Guo; Prabir Bhattacharya; Ying Qian; Lixin Tao
Research on effective and efficient mobile threat analysis has become an emerging and important topic in cybersecurity research. Static analysis and dynamic analysis constitute two of the most popular types of techniques for security analysis and evaluation; nevertheless, each of them has its strengths and weaknesses. To leverage the benefits of both approaches, we propose a hybrid approach that integrates static and dynamic analysis for detecting security threats in mobile applications. The key to this approach is the unification of data states and software execution on critical test paths. The approach consists of two phases. In the first phase, a pilot static analysis is conducted to identify potential critical attack paths based on Android APIs and existing attack patterns. In the second phase, a dynamic analysis follows the identified critical paths to execute the program in a limited and focused manner. Attacks are detected by checking the conformance of the detected paths with existing attack patterns. The method reports the types of detected attack scenarios based on the types of sensitive data that may be compromised, such as web browser cookies.
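The conformance check between an executed path and known attack patterns can be sketched as an ordered-subsequence match over API call names. This is a simplified illustration under assumptions: the abstract does not define how patterns are represented, so the `matches_pattern` and `detect` helpers, the pattern name, and the call-name strings are all hypothetical.

```python
# Hypothetical sketch: patterns are modeled as ordered lists of API call
# names; the paper's actual pattern representation is not given.

def matches_pattern(path: list[str], pattern: list[str]) -> bool:
    """True if every call in pattern occurs in path, in the same order."""
    remaining = iter(path)
    # `call in remaining` consumes the iterator up to the first match, so
    # later pattern calls must appear after earlier ones.
    return all(call in remaining for call in pattern)


def detect(path: list[str], patterns: dict[str, list[str]]) -> list[str]:
    """Return the names of all attack patterns that the executed path conforms to."""
    return [name for name, pattern in patterns.items()
            if matches_pattern(path, pattern)]


if __name__ == "__main__":
    # Illustrative pattern: read a browser cookie, then open a network connection.
    patterns = {
        "cookie_exfiltration": ["CookieManager.getCookie",
                                "HttpURLConnection.connect"],
    }
    executed_path = ["Activity.onCreate", "CookieManager.getCookie",
                     "String.format", "HttpURLConnection.connect"]
    print(detect(executed_path, patterns))
```

Requiring the calls to appear in order, while allowing unrelated calls in between, keeps the check tolerant of incidental code on the critical path.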