Yidong Li | Researchain

Publication

Featured research published by Yidong Li.

International Conference on Data Mining | 2010

Anonymizing Graphs Against Weight-Based Attacks

Yidong Li; Hong Shen

The increasing popularity of graph data, such as social and online communities, has initiated a prolific research area in knowledge discovery and data mining. As more real-world graphs are released publicly, there is growing concern about privacy breaches for the entities involved. An adversary may reveal the identities of individuals in a published graph by using the topological structure and/or basic graph properties as background knowledge. Many previous studies addressing such attacks, known as identity disclosure, concentrate on preserving privacy in simple graph data only. In this paper, we consider the identity disclosure problem in weighted graphs. The motivation is that a weighted graph can introduce much more unique information than its simple version, which makes disclosure easier. We first formalize a general anonymization model to deal with weight-based attacks. Then two concrete attacks are discussed based on weight properties of a graph, namely the sum and the set of adjacent weights for each vertex. We also propose a complete solution for the weight anonymization problem to protect a graph from both attacks. Our approaches are efficient and practical, and have been validated by extensive experiments on both synthetic and real-world datasets.
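To make the weight-sum attack concrete, the following minimal sketch computes the sum of adjacent edge weights for each vertex and lists the vertices an adversary could single out from that value alone. It is an illustration only, not the authors' algorithm: the function names, the edge-list representation, and the k-anonymity-style threshold `k` are all assumptions introduced here.

```python
from collections import Counter


def vertex_weight_sums(edges):
    """Sum of adjacent edge weights per vertex of an undirected weighted graph.

    `edges` is a list of (u, v, weight) tuples (a hypothetical representation).
    """
    sums = {}
    for u, v, w in edges:
        sums[u] = sums.get(u, 0) + w
        sums[v] = sums.get(v, 0) + w
    return sums


def exposed_vertices(edges, k=2):
    """Vertices whose weight sum is shared by fewer than k vertices.

    Assumption: a vertex counts as protected when at least k vertices
    have the same weight sum, in the spirit of k-anonymity.
    """
    sums = vertex_weight_sums(edges)
    counts = Counter(sums.values())
    return sorted(v for v, s in sums.items() if counts[s] < k)


edges = [("a", "b", 3), ("b", "c", 1), ("c", "d", 3), ("d", "e", 5)]
print(vertex_weight_sums(edges))  # {'a': 3, 'b': 4, 'c': 4, 'd': 8, 'e': 5}
print(exposed_vertices(edges))    # ['a', 'd', 'e']
```

In this toy graph, vertices b and c share the weight sum 4 and hide each other, while a, d and e each have a unique sum and would be re-identifiable by an adversary who knows it.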

IEEE Transactions on Information Forensics and Security | 2013

On Identity Disclosure Control for Hypergraph-Based Data Publishing

Yidong Li; Hong Shen

Data publishing based on hypergraphs is becoming increasingly popular due to its power in representing multirelations among objects. However, security issues have been little studied on this subject, while most recent work only focuses on the protection of relational data or graphs. As a major privacy breach, identity disclosure reveals the identification of entities with certain background knowledge known by an adversary. In this paper, we first introduce a novel background knowledge attack model based on the property of hyperedge ranks, and formalize the rank-based hypergraph anonymization problem. We then propose a complete solution in a two-step framework: rank anonymization and hypergraph reconstruction. We also take hypergraph clustering (known as community detection) as data utility into consideration, and discuss two metrics to quantify information loss incurred in the perturbation. Our approaches are effective in terms of efficacy, privacy, and utility. The algorithms run in near-quadratic time on hypergraph size, and protect data from rank attacks with almost the same utility preserved. The performance of the methods has been validated by extensive experiments on real-world datasets as well. Our rank-based attack model and algorithms for rank anonymization and hypergraph reconstruction are, to the best of our knowledge, the first systematic study of privacy preservation for hypergraph-based data publishing.

Knowledge-Based Systems | 2016

Smart train operation algorithms based on expert knowledge and ensemble CART for the electric locomotive

Jiateng Yin; Dewang Chen; Yidong Li

We summarize expert knowledge rules from experienced drivers to ensure safety and riding comfort of train operations. We apply data mining algorithms in train operations to make the best use of historical driving data. Two STO algorithms are proposed by combining expert knowledge, data mining and train parking methods. The two STO algorithms are better than ATO and manual driving. The STO approaches have good flexibility with disturbances.

In subway systems, automatic train operation (ATO) is gradually replacing manual driving for its high punctuality and parking accuracy. But the existing ATO systems have some drawbacks in riding comfort and energy consumption compared with manual driving by experienced drivers. To combine the advantages of ATO and manual driving, this paper proposes a Smart Train Operation (STO) approach based on the fusion of expert knowledge and data mining algorithms. First, we summarize the domain expert knowledge rules to ensure safety and riding comfort. Then, we apply a regression algorithm named CART (Classification And Regression Tree) and ensemble learning methods (i.e., Bagging and LSBoost) to obtain valuable information from historical driving data, which are collected on the Beijing subway Yizhuang line. Besides, a heuristic train station parking algorithm (HSA) using the positioning data stored in balises is proposed to achieve precise parking. By combining the expert knowledge, data mining algorithms and HSA, two comprehensive STO algorithms, i.e., STOB and STOL, are developed for subway train operations. The proposed STO algorithms are tested against both ATO and manual driving on a real-world case of the Beijing subway Yizhuang line. The results indicate that the developed STO approach is better than ATO in energy consumption and riding comfort, and it also outperforms manual driving in punctuality and parking accuracy. Finally, the flexibility of STOL and STOB is verified with extensive experiments by considering different kinds of disturbances in real-world applications.

International Journal of Production Research | 2005

Digital enterprise management in China: current status and future development

Xiaofei Xu; L. Zhang; Yidong Li; Dechen Zhan

Digital management plays an important role in modern enterprise management. Supported by the Chinese National High-Tech R&D Program on CIMS, the techniques and software of enterprise information systems and digital management systems have been studied and applied in China for more than 20 years. The paper provides a comprehensive review on the development, current status, and future development of enterprise information systems and digital management systems in China, as well as the software products and the market for digital management.

The Journal of Supercomputing | 2014

TB-SnW: Trust-based Spray-and-Wait routing for delay-tolerant networks

Aysha Al-Hinai; Haibo Zhang; Yawen Chen; Yidong Li

Unlike conventional routing techniques in the Internet, where routing privileges are given to trustworthy and fully authenticated nodes, delay-tolerant networks (DTNs) allow any node to participate in routing due to the lack of consistent infrastructure and central administration. This creates new security issues, since even authorized nodes in DTNs could inject several malicious threats into the network. This paper investigates the problem of mitigating blackhole attacks in DTNs based on the Spray-and-Wait routing protocol. A new knowledge-based routing scheme, called the Trust-Based Spray-and-Wait protocol (TB-SnW), is designed based on distributed trust management. Each node maintains trust levels for encountered nodes based on the message exchange history, and uses the trust levels to smartly distribute message copies to bypass blackhole attackers. Simulation results demonstrate that, compared with Spray-and-Wait, TB-SnW can achieve a higher message delivery rate with very low communication overhead in DTNs that suffer from blackhole attacks.
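The idea of steering message copies by trust can be sketched as follows. This is a hypothetical illustration, not the TB-SnW protocol itself: the abstract does not give the trust formula or the copy-distribution rule, so the smoothed forwarding ratio, the `threshold` parameter, and all names below are assumptions. The only standard ingredient is the binary Spray-and-Wait convention of handing over at most half of the remaining copies and keeping the last copy for direct delivery.

```python
class TrustTable:
    """Per-node record of how encountered peers handled past messages."""

    def __init__(self):
        self.handed = {}     # copies handed to each peer so far
        self.confirmed = {}  # copies each peer is known to have passed on

    def record(self, peer, handed, confirmed):
        self.handed[peer] = self.handed.get(peer, 0) + handed
        self.confirmed[peer] = self.confirmed.get(peer, 0) + confirmed

    def trust(self, peer):
        # Assumed formula: smoothed ratio, so unknown peers start at 0.5.
        h = self.handed.get(peer, 0)
        c = self.confirmed.get(peer, 0)
        return (c + 1) / (h + 2)


def copies_to_hand_over(copies, trust, threshold=0.3):
    """Number of copies to give a peer with the given trust level.

    With one copy left the node waits for the destination; peers below
    the (assumed) trust threshold receive nothing.
    """
    if copies <= 1 or trust < threshold:
        return 0
    return int((copies // 2) * trust)


table = TrustTable()
table.record("blackhole", handed=10, confirmed=0)
table.record("relay", handed=10, confirmed=9)
for peer in ("blackhole", "relay", "stranger"):
    print(peer, copies_to_hand_over(8, table.trust(peer)))
```

A peer that silently drops everything it receives quickly falls below the threshold and stops receiving copies, which is the intuition behind bypassing blackhole attackers.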

Multimedia Tools and Applications | 2016

A learning-based comprehensive evaluation model for traffic data quality in intelligent transportation systems

Yidong Li; Dewang Chen

Human motion modelling has attracted increasing attention in various industrial fields with the advent of information technology. Previous studies focus on capturing, animating, understanding and modelling human gestures or physical activities. However, in many applications such as Intelligent Transportation Systems (ITS), traffic data quality (TDQ) is becoming a critical issue that can greatly influence the efficiency of the modelling. In this paper, we focus on evaluating TDQ from the large amount of detector and traffic flow data used in the modelling of ITS. We first introduce four error indices of an occupancy speed model and an occupancy flow model as model evaluation indices, and two indices from experts as non-model evaluation indices. Then, we propose a comprehensive evaluation model (CEM) for TDQ. Furthermore, we develop two algorithms for training the parameters in CEM based on the least square method (LSM) and the adaptive network based fuzzy inference system (ANFIS). We evaluate the proposed algorithms on real-world traffic flow data collected on Beijing ring-roads and connected lines. The experimental results show that the ANFIS-based learning method outperforms the LSM-based one in most scenarios and keeps the evaluation error below 10%, which can significantly improve the efficiency of identifying traffic flow detectors with low data quality.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2013

A Self-immunizing Manifold Ranking for Image Retrieval

Jun Wu; Yidong Li; Songhe Feng; Hong Shen

Manifold ranking (MR), as a powerful semi-supervised learning algorithm, plays an important role in dealing with the relevance feedback problem in content-based image retrieval (CBIR). However, conventional MR has two main drawbacks: 1) in many cases, it is prone to exploit “unreliable” unlabeled images when deployed in CBIR due to the semantic gap; 2) the performance of MR is quite sensitive to the scale parameter used for calculating the Laplacian matrix. In this work, a self-immunizing MR approach is presented to address these drawbacks. Concretely, we first propose an elastic kNN graph as well as its construction algorithm to exploit unlabeled images “safely”, and then develop a local scaling solution to calculate the Laplacian matrix adaptively. Extensive experiments on 10,000 Corel images show that the proposed algorithm is more effective than the state-of-the-art approaches.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2014

A Selectively Re-train Approach Based on Clustering to Classify Concept-Drifting Data Streams with Skewed Distribution

Dandan Zhang; Hong Shen; Tian Hui; Yidong Li; Jun Wu; Yingpeng Sang

Classification is an important and practical tool which uses a model built on historical data to predict class labels for newly arriving data. In the last few years, there have been many interesting studies on classification in data streams. However, most such studies assume that those data streams are relatively balanced and stable. In practice, skewed data streams (e.g., few positives but many negatives) are important and typical, and appear in many real-world applications. Concept drifts and skewed distributions, two common properties of data streams, make the task of learning in streams particularly difficult, and traditional data mining algorithms no longer work. In this paper, we propose a method (Selectively Re-train Approach Based on Clustering) which can deal with concept drift and skewed distribution simultaneously. We evaluate our algorithm on both synthetic and real data sets simulating skewed data streams. Empirical results show the proposed method yields better performance than the previous work.

Parallel and Distributed Computing: Applications and Technologies | 2013

Simulated-Annealing Load Balancing for Resource Allocation in Cloud Environments

Zongqin Fan; Hong Shen; Yanbo Wu; Yidong Li

Recently, the development of cloud computing has received considerable attention. For cloud service providers, packing VMs onto a small number of servers is an effective way to reduce energy costs and thereby improve the efficiency of the data center. However, allocating too many VMs on a physical machine may cause hot spots which violate the SLA of applications. Load balancing of the entire system is hence needed to guarantee the SLA. In this paper, we present a simulated-annealing load balancing algorithm for solving the resource allocation and scheduling problem in a cloud computing environment. Experimental results show that this method is able to achieve load balancing, and performs better than the round robin and basic simulated-annealing algorithms.
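As a rough illustration of simulated annealing applied to VM placement, the sketch below moves one VM at a time and accepts worse placements with a probability that shrinks as the temperature cools. It is a textbook-style baseline, not the improved algorithm of the paper, whose details the abstract does not give; the cost function (standard deviation of server loads), the cooling schedule, and all names and parameter values are assumptions.

```python
import math
import random


def imbalance(assignment, vm_loads, n_servers):
    """Standard deviation of total load per server (assumed cost function)."""
    loads = [0.0] * n_servers
    for vm, server in enumerate(assignment):
        loads[server] += vm_loads[vm]
    mean = sum(loads) / n_servers
    return math.sqrt(sum((x - mean) ** 2 for x in loads) / n_servers)


def anneal(vm_loads, n_servers, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Search for a balanced VM-to-server assignment by simulated annealing."""
    rng = random.Random(seed)
    current = [0] * len(vm_loads)  # worst case start: every VM on server 0
    cur_cost = imbalance(current, vm_loads, n_servers)
    best, best_cost = current[:], cur_cost
    t = t0
    for _ in range(steps):
        vm = rng.randrange(len(vm_loads))
        old_server = current[vm]
        current[vm] = rng.randrange(n_servers)  # propose moving one VM
        new_cost = imbalance(current, vm_loads, n_servers)
        delta = new_cost - cur_cost
        # Always accept improvements; accept worse moves with prob. exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        else:
            current[vm] = old_server  # reject: undo the move
        t *= cooling
    return best, best_cost


vm_loads = [4, 3, 3, 2, 2, 1, 1]
assignment, cost = anneal(vm_loads, n_servers=3)
print(assignment, round(cost, 3))
```

Tracking the best assignment seen so far, rather than returning the final state, keeps the result from degrading if a worse move is accepted late in the run.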

Parallel and Distributed Computing: Applications and Technologies | 2010

On Identity Disclosure in Weighted Graphs

Yidong Li; Hong Shen

As an integral part of data security, identity disclosure is a major privacy breach, which reveals the identification of entities with certain background knowledge known by an adversary. Most recent studies on this problem focus on the protection of relational data or simple graph data (i.e., undirected, unweighted and acyclic). However, a weighted graph can introduce much more unique information than its simple version, which makes disclosure easier. As more real-world graphs or social networks are released publicly, there is growing concern about privacy breaches for the entities involved. In this paper, we first formalize a general anonymizing model to deal with weight-related attacks, and discuss an efficient metric to quantify information loss incurred in the perturbation. Then we consider a very practical attack based on the sum of adjacent weights for each vertex, which is known as volume in graph theory. We also propose a complete solution for the weight anonymization problem to protect a graph from volume attacks. Our approaches are efficient and practical, and have been validated by extensive experiments on both synthetic and real-world datasets.
