Gengxin Miao

University of California, Santa Barbara

Publication

Featured research published by Gengxin Miao.

very large data bases | 2011

Recovering semantics of tables on the web

Petros Venetis; Alon Y. Halevy; Jayant Madhavan; Marius Pasca; Warren Shen; Fei Wu; Gengxin Miao; Chung Wu

The Web offers a corpus of over 100 million tables [6], but the meaning of each table is rarely explicit from the table itself. Header rows exist in few cases and even when they do, the attribute names are typically useless. We describe a system that attempts to recover the semantics of tables by enriching the table with additional annotations. Our annotations facilitate operations such as searching for tables and finding related tables. To recover semantics of tables, we leverage a database of class labels and relationships automatically extracted from the Web. The database of classes and relationships has very wide coverage, but is also noisy. We attach a class label to a column if a sufficient number of the values in the column are identified with that label in the database of class labels, and analogously for binary relationships. We describe a formal model for reasoning about when we have seen sufficient evidence for a label, and show that it performs substantially better than a simple majority scheme. We describe a set of experiments that illustrate the utility of the recovered semantics for table search and show that it performs substantially better than previous approaches. In addition, we characterize what fraction of tables on the Web can be annotated using our approach.
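As a rough illustration of the simple threshold-style baseline mentioned in the abstract (not the formal model the paper proposes), the sketch below attaches a class label to a column when more than a chosen fraction of its values carry that label. The `ISA_DB` dictionary, the `label_column` function, and the `min_fraction` parameter are all hypothetical names introduced here for illustration; the real database of class labels is extracted from the Web and is far larger and noisier.

```python
from collections import Counter

# Hypothetical toy stand-in for the database of class labels: value -> labels.
ISA_DB = {
    "paris": {"city", "capital"},
    "london": {"city", "capital"},
    "chicago": {"city"},
    "france": {"country"},
}


def label_column(values, isa_db, min_fraction=0.5):
    """Return the labels carried by more than min_fraction of the column values."""
    counts = Counter()
    for value in values:
        # Normalise the cell text before looking it up in the label database.
        for label in isa_db.get(value.strip().lower(), ()):
            counts[label] += 1
    n = len(values)
    return sorted(label for label, c in counts.items() if n and c / n > min_fraction)


column = ["Paris", "London", "Chicago", "Unknown Town"]
labels = label_column(column, ISA_DB)
print(labels)
```

A fixed threshold such as this ignores how reliable each database entry is, which is one plausible reason a model that reasons about the strength of the evidence could do better.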

international world wide web conferences | 2009

Extracting data records from the web using tag path clustering

Gengxin Miao; Junichi Tatemura; Wang-Pin Hsiung; Arsany Sawires; Louise E. Moser

Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies a set of Web page segments, each of which represents an individual object (e.g., a product). State-of-the-art methods suffice for simple search, but they often fail to handle more complicated or noisy Web page structures due to a key limitation -- their greedy manner of identifying a list of records through pairwise comparison (i.e., similarity match) of consecutive segments. This paper introduces a new method for record extraction that captures a list of objects in a more robust way based on a holistic analysis of a Web page. The method focuses on how a distinct tag path appears repeatedly in the DOM tree of the Web document. Instead of comparing a pair of individual segments, it compares a pair of tag path occurrence patterns (called visual signals) to estimate how likely these two tag paths represent the same list of objects. The paper introduces a similarity measure that captures how closely the visual signals appear and interleave. Clustering of tag paths is then performed based on this similarity measure, and sets of tag paths that form the structure of data records are extracted. Experiments show that this method achieves higher accuracy than previous methods.
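The idea of a tag path occurrence pattern can be sketched with Python's standard `html.parser` module. The `TagPathCollector` class below is a hypothetical helper, not the authors' implementation: it records, for each distinct root-to-node tag path, the positions at which that path occurs in the sequence of opening tags. It stops short of the paper's similarity measure and clustering step, and it assumes reasonably well-formed markup (unclosed void elements such as `<br>` are not handled specially).

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Collect, per tag path, the positions where the path occurs."""

    def __init__(self):
        super().__init__()
        self.stack = []                   # currently open tags
        self.position = 0                 # index in the sequence of opening tags
        self.signals = defaultdict(list)  # tag path -> occurrence positions

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.signals["/".join(self.stack)].append(self.position)
        self.position += 1

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


page = "<ul><li><b>A</b><i>1</i></li><li><b>B</b><i>2</i></li></ul>"
collector = TagPathCollector()
collector.feed(page)

# Paths that repeat are candidates for belonging to a list of records.
repeated = {path: pos for path, pos in collector.signals.items() if len(pos) > 1}
print(repeated)
```

In this toy page the paths `ul/li/b` and `ul/li/i` occur at interleaved positions, which is the kind of regularity the abstract describes using to group tag paths into one record structure.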

ieee international conference on pervasive computing and communications | 2005

Efficient browsing of Web search results on mobile devices based on block importance model

Xing Xie; Gengxin Miao; Ruihua Song; Ji-Rong Wen; Wei-Ying Ma

It is expected that more and more people will search the Web while they are on the move. Though conventional search engines can be visited directly from mobile devices with Web browsing capabilities, the information is not as conveniently accessible from a handheld device as it is from a desktop. Existing information discovery mechanisms for searching the Web are not well suited to mobile devices. In this paper, a block importance model is employed to assign importance values to different segments of a Web page, in order to extract and present more condensed search results to mobile users. Based on the block importance model, three presentations for displaying the result pages at different levels of detail have been designed to reduce both the number of user interactions and the overall search time. A set of user study experiments has been carried out to compare the three presentations and a commercial service on typical mobile devices. Experimental results show that our approaches can help users explore Web search results more efficiently.

ieee international conference on services computing | 2007

A Distributed e-Healthcare System Based on the Service Oriented Architecture

Firat Kart; Gengxin Miao; Louise E. Moser; P. M. Melliar-Smith

Large-scale distributed systems, such as e-healthcare systems, are difficult to develop due to their complex and decentralized nature. The service oriented architecture facilitates the development of such systems by supporting modular design, application integration and interoperation, and software reuse. With open standards, such as XML, SOAP, WSDL and UDDI, the service oriented architecture supports interoperability between services operating on different platforms and between applications implemented in different programming languages. In this paper we describe a distributed e-healthcare system that uses the service oriented architecture as a basis for designing, implementing, deploying, invoking and managing healthcare services. The e-healthcare system that we have developed provides support for physicians, nurses, pharmacists and other healthcare professionals, as well as for patients and medical devices used to monitor patients. Multi-media input and output, with text, images and speech, make the system more user friendly than existing e-healthcare systems.

knowledge discovery and data mining | 2010

Generative models for ticket resolution in expert networks

Gengxin Miao; Louise E. Moser; Xifeng Yan; Shu Tao; Yi Chen; Nikos Anerousis

Ticket resolution is a critical, yet challenging, aspect of the delivery of IT services. A large service provider needs to handle, on a daily basis, thousands of tickets that report various types of problems. Many of those tickets bounce among multiple expert groups before being transferred to the group with the right expertise to solve the problem. Finding a methodology that reduces such bouncing and hence shortens ticket resolution time is a long-standing challenge. In this paper, we present a unified generative model, the Optimized Network Model (ONM), that characterizes the lifecycle of a ticket, using both the content and the routing sequence of the ticket. ONM uses maximum likelihood estimation to represent how the information contained in a ticket is used by human experts to make ticket routing decisions. Based on ONM, we develop a probabilistic algorithm to generate ticket routing recommendations for new tickets in a network of expert groups. Our algorithm calculates all possible routes to potential resolvers and makes globally optimal recommendations, in contrast to existing classification methods that make static and locally optimal recommendations. Experiments show that our method significantly outperforms existing solutions.

IEEE Transactions on Knowledge and Data Engineering | 2013

Co-Occurrence-Based Diffusion for Expert Search on the Web

Ziyu Guan; Gengxin Miao; Russell McLoughlin; Xifeng Yan; Deng Cai

Expert search has been studied in different contexts, e.g., enterprises and academic communities. We examine a general expert search problem: searching for experts on the web, where millions of webpages and thousands of names are considered. It poses two main challenges: 1) webpages can be of varying quality and full of noise; 2) the evidence of expertise scattered across webpages is usually vague and ambiguous. We propose to leverage the large amount of co-occurrence information to assess the relevance and reputation of a person name for a query topic. The co-occurrence structure is modeled using a hypergraph, on which a heat-diffusion-based ranking algorithm is proposed. Query keywords are regarded as heat sources, and a person name that has a strong connection with the query (i.e., frequently co-occurs with query keywords and co-occurs with other names related to query keywords) will receive most of the heat, and thus be ranked high. Experiments on the ClueWeb09 web collection show that our algorithm is effective for retrieving experts and outperforms baseline algorithms significantly. This work can be regarded as one step toward addressing the more general entity search problem without sophisticated NLP techniques.
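To make the heat-source intuition concrete, here is a minimal sketch of discrete heat diffusion on an ordinary undirected graph. This is a deliberate simplification: the paper diffuses heat over a hypergraph of co-occurrences, whereas the `diffuse` function, the `alpha` and `steps` parameters, and the toy adjacency list below are all assumptions made up for illustration.

```python
def diffuse(adj, heat, alpha=0.5, steps=3):
    """Each step, every node keeps (1 - alpha) of its heat and
    spreads the remaining alpha equally among its neighbours."""
    for _ in range(steps):
        nxt = {node: (1 - alpha) * h for node, h in heat.items()}
        for node, h in heat.items():
            neighbours = adj[node]
            if not neighbours:
                nxt[node] += alpha * h  # isolated node keeps all of its heat
                continue
            for other in neighbours:
                nxt[other] += alpha * h / len(neighbours)
        heat = nxt
    return heat


# Hypothetical co-occurrence graph: one query node and three person names.
adj = {
    "query": ["alice", "bob"],
    "alice": ["query", "bob"],
    "bob": ["query", "alice", "carol"],
    "carol": ["bob"],
}
initial = {"query": 1.0, "alice": 0.0, "bob": 0.0, "carol": 0.0}

final = diffuse(adj, initial)
names = [n for n in final if n != "query"]
ranked = sorted(names, key=final.get, reverse=True)
print(ranked)
```

Names that co-occur directly with the query, or with other well-connected names, accumulate more heat than names that are only reachable through an intermediary, which mirrors the ranking behaviour described in the abstract.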

knowledge discovery and data mining | 2012

Latent association analysis of document pairs

Gengxin Miao; Ziyu Guan; Louise E. Moser; Xifeng Yan; Shu Tao; Nikos Anerousis; Jimeng Sun

This paper presents Latent Association Analysis (LAA), a generative model that analyzes the topics within two document sets simultaneously, as well as the correlations between the two topic structures, by considering the semantic associations among document pairs. LAA defines a correlation factor that represents the connection between two documents, and considers the topic proportion of paired documents based on this factor. Words in the documents are assumed to be randomly generated by particular topic assignments and topic-to-word probability distributions. The paper also presents a new ranking algorithm, based on LAA, that can be used to retrieve target documents that are potentially associated with a given source document. The ranking algorithm uses the latent factor in LAA to rank target documents by the strength of their semantic associations with the source document. We evaluate the LAA algorithm with real datasets, specifically, the IT-Change and the IT-Solution document sets from the IBM IT service environment and the Symptom-Treatment document sets from Google Health. Experimental results demonstrate that the LAA algorithm significantly outperforms existing algorithms.

Archive | 2012

Reliable Ticket Routing in Expert Networks

Gengxin Miao; Louise E. Moser; Xifeng Yan; Shu Tao; Yi Chen; Nikos Anerousis

Problem ticket resolution is an important aspect of the delivery of IT services. A large service provider needs to handle, on a daily basis, thousands of tickets that report various types of problems. Many of those tickets bounce among multiple expert groups before being transferred to the group with the expertise to solve the problem. Finding a methodology that can automatically make reliable ticket routing decisions and that reduces such bouncing and, hence, shortens ticket resolution time is a long-standing challenge. Reliable ticket routing forwards the ticket to an expert who either can solve the problem reported in the ticket, or can reach an expert who can resolve the ticket. In this chapter, we present a unified generative model, the Optimized Network Model (ONM), that characterizes the lifecycle of a ticket, using both the content and the routing sequence of the ticket. ONM uses maximum likelihood estimation to capture reliable ticket transfer profiles on each edge of an expert network. These transfer profiles reflect how the information contained in a ticket is used by human experts to make ticket routing decisions. Based on ONM, we develop a probabilistic algorithm to generate reliable ticket routing recommendations for new tickets in a network of expert groups. Our algorithm calculates all possible routes to potential resolvers and makes globally optimal recommendations, in contrast to existing classification methods that make static and locally optimal recommendations.

international conference on web services | 2009

Collaborative Web Data Record Extraction

Gengxin Miao; Firat Kart; Louise E. Moser; P. M. Melliar-Smith

This paper describes a Web Service that automatically parses and extracts data records from Web pages containing structured data. The Web Service allows multiple users to share and manage a Web data record extraction task to increase its utility. A recommendation system, based on the Probabilistic Latent Semantic Indexing algorithm, enables a user to find potentially interesting content or other users who share the user's interests. A distributed computing platform improves the scalability of the Web Service in supporting multiple users by employing multiple server computers. A Web Service interface allows users to access the Web Service, and allows programmers to develop their own applications and, thus, extend the functionality of the Web Service.

artificial intelligence applications and innovations | 2006

A Filter Module Used in Pedestrian Detection System

Gengxin Miao; Yupin Luo; Qiming Tian; Jingxin Tang

Most pedestrian detection systems are built on computer vision technology and are usually composed of two basic modules: an object detection module and a recognition module. This paper presents an efficient filter module, which works between the two basic modules, based on extracting 3-dimensional information from single-frame images. The filter module removes the noisy objects extracted by the object detection module and thus reduces the burden on the recognition module. 3-D information, such as height, width and distance, is extracted from single-frame images. Using this information, a Bayesian classifier is employed to implement the filter. The main contribution of this filter module is that it removes about 30% of the noisy objects detected by the object detection module. The total computing cost and error detection rate are reduced when this filter module is used in the pedestrian detection system.
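One way such a filter could look is a Gaussian naive Bayes classifier over the estimated physical size of each candidate object. This is only a sketch under stated assumptions: the abstract does not say which Bayesian classifier or feature distributions were used, the sketch uses height and width only (omitting distance), and the training samples, the function names `fit`, `log_posterior` and `keep_candidate`, and the class names are invented for illustration.

```python
import math


def fit(samples):
    """samples: list of (features, label). Returns label -> (prior, means, variances)."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    model = {}
    for label, rows in by_label.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(columns, means)]
        model[label] = (n / len(samples), means, variances)
    return model


def log_posterior(model, features, label):
    """Unnormalised log posterior under independent Gaussian features."""
    prior, means, variances = model[label]
    total = math.log(prior)
    for v, m, s2 in zip(features, means, variances):
        total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return total


def keep_candidate(model, features):
    """Pass the candidate on to the recognition module only if it looks pedestrian-sized."""
    return (log_posterior(model, features, "pedestrian")
            >= log_posterior(model, features, "noise"))


# Made-up (height, width) estimates in metres, purely for illustration.
training = [
    ((1.7, 0.5), "pedestrian"), ((1.8, 0.6), "pedestrian"), ((1.6, 0.5), "pedestrian"),
    ((0.4, 0.3), "noise"), ((3.0, 2.5), "noise"), ((0.5, 2.0), "noise"),
]
model = fit(training)
person_like = keep_candidate(model, (1.75, 0.55))
too_small = keep_candidate(model, (0.45, 0.3))
print(person_like, too_small)
```

Candidates rejected at this stage never reach the recognition module, which is how a filter of this kind could reduce the overall computing cost.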