
Wubai Zhou

Florida International University


Publication

Featured research published by Wubai Zhou.

ACM Computing Surveys | 2017

Data-Driven Techniques in Disaster Information Management

Tao Li; Ning Xie; Chunqiu Zeng; Wubai Zhou; Li Zheng; Yexi Jiang; Yimin Yang; Hsin-Yu Ha; Wei Xue; Yue Huang; Shu-Ching Chen; Jainendra K. Navlakha; S. Sitharama Iyengar

Improving disaster management and recovery techniques is one of the national priorities given the huge toll caused by man-made and natural calamities. Data-driven disaster management aims at applying advanced data collection and analysis technologies to achieve more effective and responsive disaster management, and has undergone considerable progress in the last decade. However, to the best of our knowledge, there is currently no work that both summarizes recent progress and suggests future directions for this emerging research area. To remedy this situation, we provide a systematic treatment of the recent developments in data-driven disaster management. Specifically, we first present a general overview of the requirements and system architectures of disaster management systems and then summarize state-of-the-art data-driven techniques that have been applied to improving situation awareness as well as to addressing users’ information needs in disaster management. We also discuss and categorize general data-mining and machine-learning techniques in disaster management. Finally, we recommend several research directions for further investigation.

Knowledge Discovery and Data Mining | 2013

FIU-Miner: a fast, integrated, and user-friendly system for data mining in distributed environment

Chunqiu Zeng; Yexi Jiang; Li Zheng; Jingxuan Li; Lei Li; Hongtai Li; Chao Shen; Wubai Zhou; Tao Li; Bing Duan; Ming Lei; Pengnian Wang

The advent of the Big Data era drives data analysts from different domains to use data mining techniques for data analysis. However, performing data analysis in a specific domain is not trivial; it often requires complex task configuration, onerous integration of algorithms, and efficient execution in distributed environments. Few efforts have been devoted to developing effective tools that help data analysts conduct complex data analysis tasks. In this paper, we design and implement FIU-Miner, a Fast, Integrated, and User-friendly system to ease data analysis. FIU-Miner allows users to rapidly configure a complex data analysis task without writing a single line of code. It also helps users conveniently import and integrate different analysis programs. Further, it balances resource utilization and task execution in heterogeneous environments. A case study of a real-world application demonstrates the efficacy and effectiveness of our proposed system.

Integrated Network Management | 2015

Resolution recommendation for event tickets in service management

Wubai Zhou; Liang Tang; Tao Li; Larisa Shwartz; Genady Grabarnik

In recent years, IT Service Providers have been rapidly transforming to an automated service delivery model. This is due to advances in technology and driven by the unrelenting market pressure to reduce cost and maintain quality. Tremendous progress has been made to date towards attainment of truly automated service delivery; that is, the ability to deliver the same service automatically using the same process with the same quality. However, automating Incident and Problem Management continues to be a difficult problem, particularly due to the growing complexity of IT environments. Software monitoring systems are designed to actively collect and signal event occurrences and, when necessary, automatically generate incident tickets. Repeating events generate similar tickets, which in turn have a vast number of repeated problem resolutions likely to be found in earlier tickets. In this paper we find an appropriate resolution by making use of similarities between the events and previous resolutions of similar events. The traditional KNN (K Nearest Neighbor) algorithm has been used to recommend resolutions for incoming tickets. However, the effectiveness of recommendation heavily relies on the underlying similarity measure in KNN. In this paper, we significantly improve the similarity measure used in KNN by utilizing both the event and resolution information in historical tickets via a topic-level feature extraction using the LDA (Latent Dirichlet Allocation) model. In addition, when resolution categories are available, we propose to learn a more effective similarity measure using metric learning. Extensive empirical evaluations on three ticket data sets demonstrate the effectiveness and efficiency of our proposed methods.
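
The KNN-based recommendation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes a plain Jaccard similarity over word sets for the LDA topic-level features and metric learning the paper proposes, and the function names (`jaccard`, `recommend`) and the sample tickets are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two token collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(event_text, history, k=3):
    """Rank resolutions of the k historical tickets most similar to event_text.

    history is a list of (event_text, resolution_text) pairs. Each of the k
    nearest neighbours votes for its resolution, weighted by its similarity.
    Jaccard similarity is an assumption standing in for the paper's measure.
    """
    query = event_text.lower().split()
    scored = [(jaccard(query, ev.lower().split()), res) for ev, res in history]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter()
    for sim, res in nearest:
        votes[res] += sim
    return [res for res, _ in votes.most_common()]


# Hypothetical historical tickets, invented for illustration only.
history = [
    ("disk usage high on server db01", "clean up temp files"),
    ("disk usage high on server web02", "clean up temp files"),
    ("cpu load high on server app03", "restart runaway process"),
    ("network link down on switch sw1", "replace cable"),
]
ranked = recommend("disk usage high on server db07", history, k=3)
```

Swapping `jaccard` for a similarity computed on topic vectors would be the natural place to plug in the topic-level features the abstract mentions.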

Knowledge Discovery and Data Mining | 2014

Applying data mining techniques to address critical process optimization needs in advanced manufacturing

Li Zheng; Chunqiu Zeng; Lei Li; Yexi Jiang; Wei Xue; Jingxuan Li; Chao Shen; Wubai Zhou; Hongtai Li; Liang Tang; Tao Li; Bing Duan; Ming Lei; Pengnian Wang

Advanced manufacturing, such as aerospace, semiconductor, and flat display device manufacturing, often involves complex production processes and generates large volumes of production data. In general, the production data comes from products with different levels of quality, assembly lines with complex flows and equipment, and processing craft with massive controlling parameters. The scale and complexity of the data is beyond the analytic power of traditional IT infrastructures. To achieve better manufacturing performance, it is imperative to explore the underlying dependencies of the production data and exploit analytic insights to improve the production process. However, few research and industrial efforts have been reported on providing manufacturers with integrated data analytical solutions to reveal potentials and optimize the production process from data-driven perspectives. In this paper, we design, implement and deploy an integrated solution, named PDP-Miner, which is a data analytics platform customized for process optimization in Plasma Display Panel (PDP) manufacturing. The system utilizes the latest advances in data mining technologies and Big Data infrastructures to create a complete analytical solution. In addition, our proposed system supports automatic configuration and scheduling of analysis tasks and balances heterogeneous computing resources. The system and the analytic strategies can be applied to other advanced manufacturing fields to enable complex data analysis tasks. Since 2013, PDP-Miner has been deployed as the data analysis platform of ChangHong COC. By taking advantage of our system, the overall PDP yield rate has increased from 91% to 94%. The monthly production is boosted by 10,000 panels, which brings more than 117 million RMB of revenue improvement per year.

Information Reuse and Integration | 2014

Generating textual storyline to improve situation awareness in disaster management

Wubai Zhou; Chao Shen; Tao Li; Shu-Ching Chen; Ning Xie

Hurricane Sandy affected the East Coast of the U.S. in 2012 and posed immense threats to businesses, human lives and properties. In order to minimize the consequent loss of a catastrophe like this, a critical task in disaster management is to understand situation updates about the disaster from a large number of disaster-related documents, and obtain a big picture of the disaster's trends and how it affects different areas. In this paper, we present a two-layer storyline generation framework which generates an overall or global storyline of the disaster events in the first layer, and provides condensed information about specific regions affected by the disaster (i.e., a location-specific storyline) in the second layer. To generate the overall storyline of a disaster, we consider both temporal and spatial factors, which are encoded using integer linear programming. For location-specific storylines, we employ a Steiner tree based method. Compared with previous work on storyline generation, which generates flat storylines without considering spatial information, our framework is more suitable for large-scale disaster events. We further demonstrate the efficacy of our proposed framework through an evaluation on the datasets of three major hurricane disasters.

IEEE Transactions on Services Computing | 2016

An Integrated Framework for Mining Temporal Logs from Fluctuating Events

Chunqiu Zeng; Liang Tang; Wubai Zhou; Tao Li; Larisa Shwartz; Genady Grabarnik

The importance of mining time lags of hidden temporal dependencies from sequential data is highlighted in many domains including system management, stock market analysis, climate monitoring, and more. Mining time lags of temporal dependencies provides useful insights into the understanding of sequential data and predicting its evolving trend. Traditional methods mainly utilize a predefined time window to analyze the sequential items, or employ statistical techniques to identify the temporal dependencies from sequential data. However, it is a challenging task for existing methods to find the time lag of temporal dependencies in the real world, where time lags are fluctuating, noisy, and interleaved with each other. In order to identify temporal dependencies with time lags in this setting, this paper proposes an integrated framework from both system and algorithm perspectives. Specifically, a novel parametric model is introduced to model the noisy time lags for temporal dependency discovery between events. Based on the parametric model, an efficient expectation maximization approach is proposed for time lag discovery with maximum likelihood. Furthermore, this paper also contributes an approximation method for learning time lags to improve the scalability in terms of the number of events, without incurring significant loss of accuracy.
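
The expectation maximization idea in this abstract can be sketched under simplifying assumptions. The following is not the paper's model: it assumes each event of type B is triggered by exactly one earlier event of type A, that the lag is Gaussian with unknown mean and variance, and that every earlier A event is a priori equally likely to be the trigger. The function name `em_time_lag`, its parameters, and the timestamps are all hypothetical.

```python
import math


def em_time_lag(a_times, b_times, mu=0.0, var=100.0, iters=50, min_var=1e-6):
    """Estimate the mean and variance of the lag from A events to B events.

    E-step: for each B event, compute the posterior probability that each
    earlier A event triggered it, given the current Gaussian lag model.
    M-step: re-estimate mu and var as responsibility-weighted moments.
    The starting values mu and var are assumed initial guesses.
    """
    for _ in range(iters):
        weighted = []  # (lag, responsibility) pairs over all candidates
        for b in b_times:
            lags = [b - a for a in a_times if a <= b]
            if not lags:
                continue  # no earlier A event can explain this B event
            log_w = [-((lag - mu) ** 2) / (2.0 * var) for lag in lags]
            top = max(log_w)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in log_w]
            z = sum(w)
            weighted.extend((lag, wi / z) for lag, wi in zip(lags, w))
        total = sum(r for _, r in weighted)
        if total == 0.0:
            break  # nothing to learn from
        mu = sum(r * lag for lag, r in weighted) / total
        var = max(sum(r * (lag - mu) ** 2 for lag, r in weighted) / total, min_var)
    return mu, var


# Hypothetical timestamps: each B event follows an A event by roughly 3 units.
a_times = [0.0, 20.0, 45.0, 60.0, 90.0]
b_times = [3.1, 22.9, 48.0, 63.2, 92.8]
mu_hat, var_hat = em_time_lag(a_times, b_times)
```

The variance floor `min_var` is a design choice of this sketch: it keeps the E-step well defined when the estimated lags become almost identical.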

IEEE International Conference on Services Computing | 2017

Constructing the Knowledge Base for Cognitive IT Service Management

Qing Wang; Wubai Zhou; Chunqiu Zeng; Tao Li; Larisa Shwartz; Genady Grabarnik

The increasing complexity of IT environments dictates the usage of intelligent automation driven by cognitive technologies, aiming at providing higher quality and more complex services. Inspired by cognitive computing, an integrated framework is proposed for problem resolution. In order to improve the efficiency of the problem resolution process, it is crucial to formalize problem records and discover relationships between elements of the records, records overall and other technical information. In the proposed framework, the domain knowledge is modeled using ontology. The key contribution of the framework is a novel domain-specific approach for extracting useful phrases that enables an automation improvement through resolution recommendation utilizing the ontology modeling technique. The effectiveness and efficiency of our framework are evaluated by an extensive empirical study of large-scale real ticket data.

ACM Computing Surveys | 2017

Data-Driven Techniques in Computing System Management

Tao Li; Chunqiu Zeng; Yexi Jiang; Wubai Zhou; Liang Tang; Zheng Liu; Yue Huang

Modern forms of computing systems are becoming progressively more complex, with an increasing number of heterogeneous hardware and software components. As a result, it is quite challenging to manage these complex systems and meet the requirements in manageability, dependability, and performance that are demanded by enterprise customers. This survey presents a variety of data-driven techniques and applications with a focus on computing system management. In particular, the survey introduces intelligent methods for event generation that can transform diverse log data sources into structured events, reviews different types of event patterns and the corresponding event-mining techniques, and summarizes various event summarization methods and data-driven approaches for problem diagnosis in system management. We hope this survey will provide a good overview of data-driven techniques in computing system management.

Computer Methods and Programs in Biomedicine | 2016

An immune-inspired semi-supervised algorithm for breast cancer diagnosis

Lingxi Peng; Wenbin Chen; Wubai Zhou; Fufang Li; Jin Yang; Jiandong Zhang

Breast cancer is the most frequently diagnosed life-threatening cancer worldwide and the leading cause of cancer death among women. Early, accurate diagnosis can be a big plus in treating breast cancer. Researchers have approached this problem using various data mining and machine learning techniques such as support vector machines, artificial neural networks, etc. Computer immunology is also an intelligent method, inspired by the biological immune system, which has been successfully applied in pattern recognition, combinatorial optimization, machine learning, etc. However, most of these diagnosis methods are supervised, and it is very expensive to obtain labeled data in biology and medicine. In this paper, we seamlessly integrate the state-of-the-art research on life science with artificial intelligence, and propose a semi-supervised learning algorithm to reduce the need for labeled data. We use two well-known benchmark breast cancer datasets in our study, which are acquired from the UCI machine learning repository. Extensive experiments are conducted and evaluated on those two datasets. Our experimental results demonstrate the effectiveness and efficiency of our proposed algorithm, and show that it is a promising automatic diagnosis method for breast cancer.

Knowledge Discovery and Data Mining | 2017

STAR: A System for Ticket Analysis and Resolution

Wubai Zhou; Wei Xue; Ramesh Baral; Qing Wang; Chunqiu Zeng; Tao Li; Jian Xu; Zheng Liu; Larisa Shwartz; Genady Grabarnik

In large-scale and complex IT service environments, a problematic incident is logged as a ticket and contains the ticket summary (system status and problem description). The system administrators log the step-wise resolution description when such tickets are resolved. Repeating service events are most likely resolved by drawing on similar historical tickets. With the availability of reasonably large ticket datasets, we can have an automated system to recommend the best matching resolution for a given ticket summary. In this paper, we first identify the challenges in real-world ticket analysis and develop an integrated framework to efficiently handle those challenges. The framework first quantifies the quality of ticket resolutions using a regression model built on carefully designed features. The tickets, along with their quality scores obtained from the resolution quality quantification, are then used to train a deep neural network ranking model that outputs the matching scores of ticket summary and resolution pairs. This ranking model allows us to leverage the resolution quality in historical tickets when recommending resolutions for an incoming incident ticket. In addition, the feature vectors derived from the deep neural ranking model can be effectively used in other ticket analysis tasks, such as ticket classification and clustering. The proposed framework is extensively evaluated with a large real-world dataset.
