Changyou Zhang
Chinese Academy of Sciences
Publication
Featured research published by Changyou Zhang.
Future Generation Computer Systems | 2017
Hongfei Zhu; Yu-an Tan; Xiaosong Zhang; Liehuang Zhu; Changyou Zhang; Jun Zheng
To process rapidly growing Big Data, many organizations migrate their data and services, such as e-voting and e-payment systems, to the cloud. In these two systems, the blind signature has become an essential cryptographic primitive because it allows the signer to sign a message without learning its content, and it can thus help guarantee the trustworthiness of Big Data. However, most blind signature schemes based on the factoring and discrete logarithm problems cannot resist quantum computer attacks. Lattice-based blind signature schemes are the alternative. Here, we present a round-optimal lattice-based blind signature scheme constructed on the closest vector problem (CVP) using the infinity norm. First, our scheme is proven blind and one-more unforgeable, and is resistant to brute-force attacks, theoretical-timing attacks, and Nguyen-Regev attacks. Second, our scheme outperforms the RSA, Schnorr, and ECC blind signature schemes in terms of efficiency and security. It also outperforms Rückert's blind signature in terms of signature length, moves, and security. Finally, our scheme outperforms Rückert's blind signature in terms of communication and computation energy costs, and it outperforms the RSA blind signature in terms of communication energy cost. We propose a novel lattice-based CVP blind signature scheme, which can guarantee the trustworthiness of Big Data. Our scheme can resist brute-force attacks, theoretical-timing attacks, and Nguyen-Regev attacks. Our scheme offers statistical blindness and one-more unforgeability. Our round-optimal scheme outperforms the RSA, Schnorr, and ECC blind signature schemes in terms of efficiency and security. Our scheme outperforms Rückert's lattice-based blind signature scheme in terms of signature length, moves, security, and energy cost.
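To make the blind-signature idea concrete, the sketch below shows the textbook RSA blind signature, one of the baselines the abstract compares against. It is not the lattice-based scheme of the paper. The key values, the function names, and the blinding factor are assumptions chosen for illustration, and the example omits hashing and padding, so it is not secure as written.

```python
from math import gcd

# Toy textbook-RSA key (assumed values, far too small for real use).
P, Q = 61, 53
N = P * Q      # public modulus
E = 17         # public exponent
D = 2753       # private exponent, E * D = 1 mod (P - 1) * (Q - 1)


def blind(message, r):
    """Requester hides the message with a random factor r coprime to N."""
    if gcd(r, N) != 1:
        raise ValueError("blinding factor must be coprime to N")
    return (message * pow(r, E, N)) % N


def sign(blinded_message):
    """Signer signs the blinded value without seeing the real message."""
    return pow(blinded_message, D, N)


def unblind(blinded_signature, r):
    """Requester removes the blinding factor to obtain a normal signature."""
    return (blinded_signature * pow(r, -1, N)) % N


def verify(message, signature):
    """Anyone can check the signature with the public key (N, E)."""
    return pow(signature, E, N) == message % N


message, r = 1234, 7
signature = unblind(sign(blind(message, r)), r)
print(verify(message, signature))
```

The signer only ever sees `blind(message, r)`, yet the unblinded result is the same value the signer would have produced for `message` directly. The modular inverse via `pow(r, -1, N)` requires Python 3.8 or later.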
International Conference on Parallel and Distributed Systems | 2010
Xiang Cui; Yifeng Chen; Changyou Zhang; Hong Mei
In this paper we discuss our experience in improving the performance of GEMM (both single and double precision) on the Fermi architecture using CUDA, and how new features of Fermi, such as cache, affect performance. We find that the addition of cache to the GPU on the one hand helps the processors take advantage of data locality that occurs at runtime, but on the other hand makes the dependency of performance on algorithmic parameters less predictable. Auto-tuning then becomes a useful technique to address this issue. Our auto-tuned SGEMM and DGEMM reach 563 GFlops and 253 GFlops, respectively, on the Tesla C2050. The design and implementation use only CUDA and C and have not benefited from tuning at the level of binary code.
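The auto-tuning idea can be sketched on the CPU as an empirical search over one algorithmic parameter. The example below is a minimal illustration in plain Python, not the authors' CUDA implementation: the tile size stands in for their tuning parameters, and the function names, matrix contents, and candidate list are assumptions.

```python
import time


def blocked_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists), one tile at a time."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + tile, n)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(j0, min(j0 + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def autotune(n, candidates, repeats=3):
    """Time every candidate tile size and return the fastest one.

    The best time over several repeats is kept to reduce timing noise.
    """
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    return min(timings, key=timings.get), timings


best_tile, timings = autotune(32, [4, 8, 16])
print(best_tile, timings)
```

Because the winner depends on the machine, the search is rerun on each target rather than fixed in the source, which is the point the abstract makes about performance becoming harder to predict analytically.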
Soft Computing | 2018
Yuan Xue; Yu-an Tan; Chen Liang; Changyou Zhang; Jun Zheng
Compressed files are a common carrier in network data transmission; it is therefore essential to investigate data hiding schemes for compressed files. Existing data hiding schemes embed secret bits by shrinking the length of symbols, but they are not secure enough because the shrinking of symbol length is easily detected. First, we propose a longest-match detection algorithm that can detect the data hiding behavior of shrinking symbol lengths by checking whether the items of the generated dictionary are longest matches. Then, we propose a secret data hiding scheme based on Deflate codes, which reversibly embeds secret data by altering the matching process so that it chooses a matching result in which the least significant bit of the length field in the [distance, length] pair equals the current secret bit to embed. The proposed data hiding scheme can resist longest-match detection, and its embedding rate is higher than that of the DH-LZW algorithm. The experiment shows that the proposed scheme achieves an embedding rate of 5.12% with a 10.18% size increase in the compressed file. Moreover, an optimization provides practical suggestions for DH-Deflate data hiding: based on it, one can choose which file formats and sizes to use, so that data hiding can be carried out in a convenient and targeted way.
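The idea of carrying one secret bit in the least significant bit of the length field can be illustrated with a toy LZ77-style tokenizer. This is a simplified sketch, not the Deflate format and not the authors' algorithm: it simply shortens a match by one byte when the parity is wrong, so it makes no attempt to resist longest-match detection. All names and the minimum and maximum match lengths used here are assumptions.

```python
MIN_MATCH = 3
MAX_MATCH = 258


def longest_match(data, pos):
    """Return (distance, length) of the longest earlier match for data[pos:]."""
    best_dist, best_len = 0, 0
    for start in range(pos):  # toy search over the whole history, no window limit
        length = 0
        while (pos + length < len(data) and length < MAX_MATCH
               and data[start + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_dist, best_len = pos - start, length
    return best_dist, best_len


def embed(data, bits):
    """Tokenize data; the LSB of each match length carries one secret bit."""
    tokens, pos, used = [], 0, 0
    while pos < len(data):
        dist, length = longest_match(data, pos)
        if length >= MIN_MATCH and used < len(bits):
            if length % 2 != bits[used]:
                length -= 1  # any prefix of a match is still a valid match
            if length >= MIN_MATCH:
                tokens.append((dist, length))
                pos += length
                used += 1
                continue
        elif length >= MIN_MATCH:
            tokens.append((dist, length))  # no bits left, plain match
            pos += length
            continue
        tokens.append(data[pos])  # literal byte, carries no secret bit
        pos += 1
    return tokens, used


def extract(tokens, count):
    """Read the first `count` secret bits back from the match lengths."""
    lengths = [t[1] for t in tokens if isinstance(t, tuple)]
    return [length & 1 for length in lengths[:count]]


def decompress(tokens):
    """Rebuild the original bytes, which shows the embedding is reversible."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            dist, length = token
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(token)
    return bytes(out)


cover = b"abcabcabcabcabcabc xyzxyzxyzxyz abcabc"
tokens, used = embed(cover, [1, 0, 1])
print(extract(tokens, used), decompress(tokens) == cover)
```

The receiver needs to know how many bits were embedded (`used`), because matches emitted after the secret bits run out still have a length parity that carries no meaning.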
Multimedia Tools and Applications | 2018
Xiaosong Zhang; Yu-an Tan; Changyou Zhang; Yuan Xue; Yuanzhang Li; Jun Zheng
Android devices are emerging as a significant force for multimedia big data, and they hold an enormous amount of information about their users. Security and privacy concerns have arisen as a salient area of inquiry, since malicious attackers can use memory dumps to extract private or sensitive data from these devices. This paper presents a code protection approach for Android devices that protects certain processes from memory acquisition by process memory relocation. The protected processes are relocated to the special memory area where the kernel is loaded, so these processes are overwritten when Android reboots and attackers cannot recognize which protected programs have been run on the device. The experimental results show that the proposed approach prevents forensic tools such as FROST from obtaining these processes and has little impact on the normal operation of the protected program. Compared with similar methods, the proposed method can protect a greater quantity of data while occupying no additional storage resources.
Tsinghua Science & Technology | 2012
Changyou Zhang; Kun Huang; Xiang Cui; Yifeng Chen
To enhance the energy efficiency and performance of algorithms with Graphics Processing Unit (GPU) accelerators in source-code development, we consider the power efficiency based on data transfer bandwidth and power consumption in key situations. First, a set of primitives is abstracted from program statements. Then, data transfer bandwidth and power consumption at different granularity sizes are considered and mapped to the proper primitives. With these mappings, a programmer can intuitively determine the power efficiency and performance in different running states of a thread. Finally, this intuition enables the programmer to tune the algorithm to achieve the best energy efficiency and performance. Using these power-aware principles, two Fast Fourier Transform (FFT) methods are compared. The mapping between power consumption and primitives is helpful for algorithm tuning at the source-code level.
Journal of Global Optimization | 2017
Li Chen; Yinrun Lyu; Chong Wang; Jingzheng Wu; Changyou Zhang; Nasro Min-Allah; Jamal Alhiyafi; Yongji Wang
Since Balas extended the classical linear programming problem to the disjunctive programming (DP) problem, in which the constraints are combinations of logical AND and OR, many researchers have explored this optimization problem in various theoretical and application scenarios, such as generalized disjunctive programming (GDP), optimization modulo theories (OMT), robot path planning, and real-time systems. However, the possibility of combining these differently described but form-equivalent problems into a single expression remains overlooked. The contribution of this paper is twofold. First, we convert the linear DP/GDP model, the linear-arithmetic OMT problem, and related application problems into an equivalent form, referred to as the linear optimization over arithmetic constraint formula (LOACF). Second, a tree-search-based algorithm named RS-LPT is proposed to solve LOACF. RS-LPT exploits interval analysis and nonparametric estimation to reduce the search tree and lower the number of visited nodes. RS-LPT also alleviates poor construction of the search tree by backtracking and pruning dynamically. We evaluate RS-LPT against the two most common DP/GDP methods, three state-of-the-art OMT solvers, and the disjunctive-transformation-based method on optimization benchmarks of different types and scales. Our results favor RS-LPT over the existing competing methods, especially for large-scale cases.
International Parallel and Distributed Processing Symposium | 2012
Changyou Zhang; Kun Huang; Xiang Cui; Yifeng Chen
On-chip parallelism with GPU accelerators is now ubiquitous and has received significant attention in the past few years. The GPU is becoming an integral part of mainstream computing systems, with highly parallel, multithreaded, many-core processors of great computational power and high memory bandwidth. Finding the best tradeoff between performance and power efficiency is more challenging than mere performance tuning. To find the principles of power-aware programming with GPU accelerators, we abstract a set of primitives from program statements. The power consumption values of these primitives are helpful for power estimation during high-level program development.
International Conference on Green Computing and Communications | 2011
Changyou Zhang; Kun Huang; Xiang Cui; Yifeng Chen
The GPU (Graphics Processing Unit) is becoming an integral part of mainstream computing systems, with highly parallel, multithreaded, many-core processors of great computational power and high memory bandwidth. Finding the best tradeoff between performance and power efficiency is more challenging than mere performance tuning. To identify the principles of power-efficient programming on GPU clusters, we consider several primitives from the statements of parallel programs. The power measured for these primitives is applied to other programs for power estimation during program development. As a programming tool, this helps programmers evaluate the power implications of different optimization strategies.
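One simple way to apply per-primitive power measurements to another program is a time-weighted sum. The sketch below assumes that model; the abstract does not state the estimation formula. The primitive names and wattages are placeholder values for illustration only, not measurements from the paper.

```python
def estimate_energy(profile, power_table):
    """Estimate energy in joules from time spent in each primitive.

    profile:     {primitive name: seconds spent in that primitive}
    power_table: {primitive name: measured average power in watts}
    """
    missing = set(profile) - set(power_table)
    if missing:
        raise KeyError(f"no power measurement for: {sorted(missing)}")
    return sum(power_table[name] * seconds for name, seconds in profile.items())


def estimate_average_power(profile, power_table):
    """Average power in watts over the whole run (energy divided by time)."""
    total_time = sum(profile.values())
    if total_time <= 0:
        raise ValueError("profile must cover a positive amount of time")
    return estimate_energy(profile, power_table) / total_time


# Placeholder wattages (assumed, not taken from the paper).
power_table = {"global_load": 100.0, "compute": 150.0, "host_transfer": 80.0}

# Two hypothetical variants of the same kernel, described by time per primitive.
variant_a = {"global_load": 2.0, "compute": 1.0}
variant_b = {"global_load": 0.5, "compute": 1.5, "host_transfer": 0.5}

for name, profile in (("A", variant_a), ("B", variant_b)):
    print(name, estimate_energy(profile, power_table),
          estimate_average_power(profile, power_table))
```

Comparing the two estimates before running either variant on hardware is the kind of evaluation of optimization strategies the abstract describes.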
International Conference on Algorithms and Architectures for Parallel Processing | 2015
Changyou Zhang; Feng Wang; Kun Huang; Zhiyou Liu; Yifeng Chen
The GPU is the mainstream co-processor in computers with heterogeneous architectures. Parallel graph algorithms are fundamental to many data-driven applications on heterogeneous clusters, and the Single Source Shortest Path (SSSP) algorithm is one of the most important. We propose a graph representation structure with a unified vertex degree. First, this method ensures a consistent data block size. Second, transferring data in memory in this representation makes data reads coherent for CUDA thread blocks. Third, vertex renumbering optimizes the locality of graph vertices to make the relaxation operation more efficient. Using New York road data, we implemented the delta-stepping SSSP algorithm on an Nvidia Tesla K20x GPU device. The experimental results show that the best unified degree is approximately the mode of the vertex degrees of the graph. For example, for the New York road map, unifying all degrees to 4 yields the largest speed-up of the SSSP algorithm.
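A unified-degree layout can be sketched as a table of fixed-width edge rows. The abstract does not give the exact structure, so the version below is an assumption: vertices with more than d edges are split across several rows, and short rows are padded with a sentinel. The relaxation loop is a plain Bellman-Ford style sweep on the CPU, used here only to show that the layout preserves shortest paths. It is not delta-stepping and not CUDA code.

```python
import math

PAD = -1  # sentinel neighbor marking an unused slot in a row


def unify_degree(adj, d):
    """Convert {vertex: [(neighbor, weight), ...]} into rows of exactly d slots.

    Returns (owners, rows), where owners[i] is the source vertex of rows[i].
    """
    owners, rows = [], []
    for v in sorted(adj):
        edges = adj[v]
        # A vertex without edges still gets one fully padded row.
        for i in range(0, max(len(edges), 1), d):
            chunk = list(edges[i:i + d])
            chunk += [(PAD, math.inf)] * (d - len(chunk))
            owners.append(v)
            rows.append(chunk)
    return owners, rows


def sssp(owners, rows, n, source):
    """Relax every row repeatedly until no distance improves.

    Assumes non-negative edge weights.
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    changed = True
    while changed:
        changed = False
        for u, row in zip(owners, rows):
            for v, w in row:
                if v != PAD and dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
    return dist


graph = {
    0: [(1, 1.0), (2, 4.0), (3, 7.0)],
    1: [(2, 2.0)],
    2: [(3, 1.0)],
    3: [],
}
owners, rows = unify_degree(graph, d=2)
print(len(rows), sssp(owners, rows, n=4, source=0))
```

Every row has the same size, so each block of rows occupies the same amount of memory regardless of how skewed the original degrees are. The trade-off is padding for low-degree vertices and extra rows for high-degree ones, which is consistent with the abstract's finding that the best unified degree is close to the most common vertex degree.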
Science in China Series F: Information Sciences | 2018
Chong Wang; Changyou Zhang; Bin Wu; Yu’an Tan; Yongji Wang
1University of Chinese Academy of Sciences, Beijing 100049, China; 2Cooperative Innovation Center, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 3State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 4Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 5School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China