Xingquan Zhu | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Xingquan Zhu is active.

Explore More

Publication

Featured researches published by Xingquan Zhu.

IEEE Transactions on Knowledge and Data Engineering | 2014

Data mining with big data

Xindong Wu; Xingquan Zhu; Gongqing Wu; Wei Ding

Big Data concern large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including physical, biological and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.

Artificial Intelligence Review | 2003

Class noise vs. attribute noise: a quantitative study of their impacts

Xingquan Zhu; Xindong Wu

Real-world data is never perfect and can often suffer from corruptions (noise) that may impact interpretations of the data, models created from the data and decisions made based on the data. Noise can reduce system performance in terms of classification accuracy, time in building a classifier and the size of the classifier. Accordingly, most existing learning algorithms have integrated various approaches to enhance their learning abilities from noisy environments, but the existence of noise can still introduce serious negative impacts. A more reasonable solution might be to employ some preprocessing mechanisms to handle noisy instances before a learner is formed. Unfortunately, rare research has been conducted to systematically explore the impact of noise, especially from the noise handling point of view. This has made various noise processing techniques less significant, specifically when dealing with noise that is introduced in attributes. In this paper, we present a systematic evaluation on the effect of noise in machine learning. Instead of taking any unified theory of noise to evaluate the noise impacts, we differentiate noise into two categories: class noise and attribute noise, and analyze their impacts on the system performance separately. Because class noise has been widely addressed in existing research efforts, we concentrate on attribute noise. We investigate the relationship between attribute noise and classification accuracy, the impact of noise at different attributes, and possible solutions in handling attribute noise. Our conclusions can be used to guide interested readers to enhance data quality by designing various noise handling mechanisms.

IEEE Transactions on Multimedia | 2004

ClassView: hierarchical video shot classification, indexing, and accessing

Jianping Fan; Ahmed K. Elmagarmid; Xingquan Zhu; Walid G. Aref; Lide Wu

Recent advances in digital video compression and networks have made video more accessible than ever. However, the existing content-based video retrieval systems still suffer from the following problems. 1) Semantics-sensitive video classification problem because of the semantic gap between low-level visual features and high-level semantic visual concepts; 2) Integrated video access problem because of the lack of efficient video database indexing, automatic video annotation, and concept-oriented summary organization techniques. In this paper, we have proposed a novel framework, called ClassView, to make some advances toward more efficient video database indexing and access. 1) A hierarchical semantics-sensitive video classifier is proposed to shorten the semantic gap. The hierarchical tree structure of the semantics-sensitive video classifier is derived from the domain-dependent concept hierarchy of video contents in a database. Relevance analysis is used for selecting the discriminating visual features with suitable importances. The Expectation-Maximization (EM) algorithm is also used to determine the classification rule for each visual concept node in the classifier. 2) A hierarchical video database indexing and summary presentation technique is proposed to support more effective video access over a large-scale database. The hierarchical tree structure of our video database indexing scheme is determined by the domain-dependent concept hierarchy which is also used for video classification. The presentation of visual summary is also integrated with the inherent hierarchical video database indexing tree structure. Integrating video access with efficient database indexing tree structure has provided great opportunity for supporting more powerful video search engines.

IEEE Transactions on Knowledge and Data Engineering | 2005

Video data mining: semantic indexing and event detection from the association perspective

Xingquan Zhu; Xindong Wu; Ahmed K. Elmagarmid; Zhe Feng; Lide Wu

Advances in the media and entertainment industries, including streaming audio and digital TV, present new challenges for managing and accessing large audio-visual collections. Current content management systems support retrieval using low-level features, such as motion, color, and texture. However, low-level features often have little meaning for naive users, who much prefer to identify content using high-level semantics or concepts. This creates a gap between systems and their users that must be bridged for these systems to be used effectively. To this end, in this paper, we first present a knowledge-based video indexing and content management framework for domain specific videos (using basketball video as an example). We will provide a solution to explore video knowledge by mining associations from video data. The explicit definitions and evaluation measures (e.g., temporal support and confidence) for video associations are proposed by integrating the distinct feature of video data. Our approach uses video processing techniques to find visual and audio cues (e.g., court field, camera motion activities, and applause), introduces multilevel sequential association mining to explore associations among the audio and visual cues, classifies the associations by assigning each of them with a class label, and uses their appearances in the video to construct video indices. Our experimental results demonstrate the performance of the proposed approach.

knowledge discovery and data mining | 2005

Combining proactive and reactive predictions for data streams

Ying Yang; Xindong Wu; Xingquan Zhu

Mining data streams is important in both science and commerce. Two major challenges are (1) the data may grow without limit so that it is difficult to retain a long history; and (2) the underlying concept of the data may change over time. Different from common practice that keeps recent raw data, this paper uses a measure of conceptual equivalence to organize the data history into a history of concepts. Along the journey of concept change, it identifies new concepts as well as re-appearing ones, and learns transition patterns among concepts to help prediction. Different from conventional methodology that passively waits until the concept changes, this paper incorporates proactive and reactive predictions. In a proactive mode, it anticipates what the new concept will be if a future concept change takes place, and prepares prediction strategies in advance. If the anticipation turns out to be correct, a proper prediction model can be launched instantly upon the concept change. If not, it promptly resorts to a reactive mode: adapting a prediction model to the new data. A system RePro is proposed to implement these new ideas. Experiments compare the system with representative existing prediction methods on various benchmark data sets that represent diversified scenarios of concept change. Empirical evidence demonstrates that the proposed methodology is an effective and efficient solution to prediction for data streams.

Knowledge and Information Systems | 2013

A survey on instance selection for active learning

Yifan Fu; Xingquan Zhu; Bin Li

Active learning aims to train an accurate prediction model with minimum cost by labeling most informative instances. In this paper, we survey existing works on active learning from an instance-selection perspective and classify them into two categories with a progressive relationship: (1) active learning merely based on uncertainty of independent and identically distributed (IID) instances, and (2) active learning by further taking into account instance correlations. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/weaknesses, followed by a simple runtime performance comparison, and discussion about emerging active learning applications and instance-selection challenges therein. This survey intends to provide a high-level summarization for active learning and motivates interested readers to consider instance-selection approaches for designing effective active learning solutions.

international conference on tools with artificial intelligence | 2013

Machine Learning for Android Malware Detection Using Permission and API Calls

Naser Peiravian; Xingquan Zhu

The Google Android mobile phone platform is one of the most anticipated smartphone operating systems on the market. The open source Android platform allows developers to take full advantage of the mobile operation system, but also raises significant issues related to malicious applications. On one hand, the popularity of Android absorbs attention of most developers for producing their applications on this platform. The increased numbers of applications, on the other hand, prepares a suitable prone for some users to develop different kinds of malware and insert them in Google Android market or other third party markets as safe applications. In this paper, we propose to combine permission and API (Application Program Interface) calls and use machine learning methods to detect malicious Android Apps. In our design, the permission is extracted from each Apps profile information and the APIs are extracted from the packed App file by using packages and classes to represent API calls. By using permissions and API calls as features to characterize each Apps, we can learn a classifier to identify whether an App is potentially malicious or not. An inherent advantage of our method is that it does not need to involve any dynamical tracing of the system calls but only uses simple static analysis to find system functions involved in each App. In addition, because permission settings and APIs are alwaysavailable for each App, our method can be generalized to all mobile applications. Experiments on real-world Apps with more than 1200 malware and 1200 benign samples validate the algorithm performance.

knowledge discovery and data mining | 2008

Categorizing and mining concept drifting data streams

Peng Zhang; Xingquan Zhu; Yong Shi

Mining concept drifting data streams is a defining challenge for data mining research. Recent years have seen a large body of work on detecting changes and building prediction models from stream data, with a vague understanding on the types of the concept drifting and the impact of different types of concept drifting on the mining algorithms. In this paper, we first categorize concept drifting into two scenarios: Loose Concept Drifting (LCD) and Rigorous Concept Drifting (RCD), and then propose solutions to handle each of them separately. For LCD data streams, because concepts in adjacent data chunks are sufficiently close to each other, we apply kernel mean matching (KMM) method to minimize the discrepancy of the data chunks in the kernel space. Such a minimization process will produce weighted instances to build classifier ensemble and handle concept drifting data streams. For RCD data streams, because genuine concepts in adjacent data chunks may randomly and rapidly change, we propose a new Optimal Weights Adjustment (OWA) method to determine the optimum weight values for classifiers trained from the most recent (up-to-date) data chunk, such that those classifiers can form an accurate classifier ensemble to predict instances in the yet-to-come data chunk. Experiments on synthetic and real-world datasets will show that weighted instance approach is preferable when the concept drifting is mainly caused by the changing of the class prior probability; whereas the weighted classifier approach is preferable when the concept drifting is mainly triggered by the changing of the conditional probability.

systems man and cybernetics | 2010

Active Learning From Stream Data Using Optimal Weight Classifier Ensemble

Xingquan Zhu; Peng Zhang; Xiaodong Lin; Yong Shi

In this paper, we propose a new research problem on active learning from data streams, where data volumes grow continuously, and labeling all data is considered expensive and impractical. The objective is to label a small portion of stream data from which a model is derived to predict future instances as accurately as possible. To tackle the technical challenges raised by the dynamic nature of the stream data, i.e., increasing data volumes and evolving decision concepts, we propose a classifier-ensemble-based active learning framework that selectively labels instances from data streams to build a classifier ensemble. We argue that a classifier ensembles variance directly corresponds to its error rate, and reducing a classifier ensembles variance is equivalent to improving its prediction accuracy. Because of this, one should label instances toward the minimization of the variance of the underlying classifier ensemble. Accordingly, we introduce a minimum-variance (MV) principle to guide the instance labeling process for data streams. In addition, we derive an optimal-weight calculation method to determine the weight values for the classifier ensemble. The MV principle and the optimal weighting module are combined to build an active learning framework for data streams. Experimental results on synthetic and real-world data demonstrate the performance of the proposed work in comparison with other approaches.

Multimedia Systems | 2003

Hierarchical video content description and summarization using unified semantic and visual similarity

Xingquan Zhu; Jianping Fan; Ahmed K. Elmagarmid; Xindong Wu

Abstract.Video is increasingly the medium of choice for a variety of communication channels, resulting primarily from increased levels of networked multimedia systems. One way to keep our heads above the video sea is to provide summaries in a more tractable format. Many existing approaches are limited to exploring important low-level feature related units for summarization. Unfortunately, the semantics, content and structure of the video do not correspond to low-level features directly, even with closed-captions, scene detection, and audio signal processing. The drawbacks of existing methods are the following: (1) instead of unfolding semantics and structures within the video, low-level units usually address only the details, and (2) any important unit selection strategy based on low-level features cannot be applied to general videos. Providing users with an overview of the video content at various levels of summarization is essential for more efficient database retrieval and browsing. In this paper, we present a hierarchical video content description and summarization strategy supported by a novel joint semantic and visual similarity strategy. To describe the video content efficiently and accurately, a video content description ontology is adopted. Various video processing techniques are then utilized to construct a semi-automatic video annotation framework. By integrating acquired content description data, a hierarchical video content structure is constructed with group merging and clustering. Finally, a four layer video summary with different granularities is assembled to assist users in unfolding the video content in a progressive way. Experiments on real-word videos have validated the effectiveness of the proposed approach.

Explore More