William H. Hsu | Researchain

Publication

Featured research published by William H. Hsu.

international world wide web conferences | 2010

CETR: content extraction via tag ratios

Tim Weninger; William H. Hsu; Jiawei Han

We present Content Extraction via Tag Ratios (CETR), a method to extract content text from diverse webpages by using the HTML document's tag ratios. We describe how to compute tag ratios on a line-by-line basis and then cluster the resulting histogram into content and non-content areas. Initially, we find that the tag ratio histogram is not easily clustered because of its one-dimensionality; therefore we extend the original approach in order to model the data in two dimensions. Next, we present a tailored clustering technique which operates on the two-dimensional model, and then evaluate our approach against a large set of alternative methods using standard accuracy, precision and recall metrics on a large and varied Web corpus. Finally, we show that, in most cases, CETR achieves better content extraction performance than existing methods, especially across varying web domains, languages and styles.

european conference on genetic programming | 2001

Layered Learning in Genetic Programming for a Cooperative Robot Soccer Problem

Steven M. Gustafson; William H. Hsu

We present an alternative to standard genetic programming (GP) that applies layered learning techniques to decompose a problem. GP is applied to subproblems sequentially, where the population in the last generation of a subproblem is used as the initial population of the next subproblem. This method is applied to evolve agents to play keep-away soccer, a subproblem of robotic soccer that requires cooperation among multiple agents in a dynamic environment. The layered learning paradigm allows GP to evolve better solutions faster than standard GP. Results show that the layered learning GP outperforms standard GP by evolving a lower fitness faster and an overall better fitness. Results indicate a wide area of future research with layered learning in GP.
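The seeding step described in the abstract (the final population of one subproblem becomes the initial population of the next) can be illustrated with a toy sketch. This is not the authors' system: it swaps in a simple bitstring genetic algorithm for genetic programming, maximizes fitness rather than minimizing it, and the names `evolve`, `layered_evolve`, and the two example fitness functions are hypothetical.

```python
import random


def evolve(population, fitness, generations, rng, mutation_rate=0.05):
    """Toy GA on bitstrings (assumption: stands in for GP). Higher fitness is better."""
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with mutated offspring.
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: len(scored) // 2]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = parents + children
    return population


def layered_evolve(layers, population, rng):
    """Run each (fitness, generations) layer in order, seeding the next
    layer with the final population of the previous one."""
    for fitness, generations in layers:
        population = evolve(population, fitness, generations, rng)
    return population


# Hypothetical subproblems: first set the left half of the string, then all of it.
def left_half_ones(ind):
    return sum(ind[: len(ind) // 2])


def all_ones(ind):
    return sum(ind)


rng = random.Random(0)
start = [[rng.randint(0, 1) for _ in range(16)] for _ in range(20)]
final = layered_evolve([(left_half_ones, 10), (all_ones, 10)], start, rng)
```

The only layered-learning-specific part is `layered_evolve`; everything else is an ordinary evolutionary loop that could be replaced by a real GP engine.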

Information Sciences | 2004

Genetic wrappers for feature selection in decision tree induction and variable ordering in Bayesian network structure learning

William H. Hsu

In this paper, we address the automated tuning of input specification for supervised inductive learning and develop combinatorial optimization solutions for two such tuning problems. First, we present a framework for selection and reordering of input variables to reduce generalization error in classification and probabilistic inference. One purpose of selection is to control overfitting using validation set accuracy as a criterion for relevance. Similarly, some inductive learning algorithms, such as greedy algorithms for learning probabilistic networks, are sensitive to the evaluation order of variables. We design a generic fitness function for validation of input specification, then use it to develop two genetic algorithm wrappers: one for the variable selection problem for decision tree inducers and one for the variable ordering problem for Bayesian network structure learning. We evaluate the wrappers, using real-world data for the selection wrapper and synthetic data for both, and discuss their limitations and generalizability to other inducers.

database and expert systems applications | 2008

Text Extraction from the Web via Text-to-Tag Ratio

Tim Weninger; William H. Hsu

We describe a method to extract content text from diverse Web pages by using the HTML document's text-to-tag ratio rather than specific HTML cues that may not be constant across various Web pages. We describe how to compute the text-to-tag ratio on a line-by-line basis and then cluster the results into content and non-content areas. With this approach we then show surprisingly high levels of recall for all levels of precision, and a large space savings.
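A minimal sketch of the line-by-line ratio described above, under stated assumptions: tags are matched with a simple regular expression, a line with no tags is given a denominator of 1, and a mean-value cutoff stands in for the clustering step. The names `text_to_tag_ratios` and `extract_content` are illustrative and not taken from the paper.

```python
import re

# Assumption: a naive pattern is good enough to count tags on one line.
TAG_RE = re.compile(r"<[^>]*>")


def text_to_tag_ratios(html: str) -> list[float]:
    """Return, for each line, (non-tag characters) / (number of tags)."""
    ratios = []
    for line in html.splitlines():
        tag_count = len(TAG_RE.findall(line))
        text_len = len(TAG_RE.sub("", line).strip())
        # Assumption: treat a tag-free line as having one tag to avoid dividing by zero.
        ratios.append(text_len / max(tag_count, 1))
    return ratios


def extract_content(html: str) -> list[str]:
    """Keep lines whose ratio exceeds the mean ratio.
    The mean cutoff is a stand-in for the clustering the abstract mentions."""
    lines = html.splitlines()
    ratios = text_to_tag_ratios(html)
    if not ratios:
        return []
    cutoff = sum(ratios) / len(ratios)
    return [TAG_RE.sub("", line).strip()
            for line, ratio in zip(lines, ratios) if ratio > cutoff]


page = "\n".join([
    '<div><a href="/">Home</a><a href="/x">About</a></div>',
    "<p>Real article text goes here.</p>",
    "<div><span>ad</span></div>",
])
ratios = text_to_tag_ratios(page)
content = extract_content(page)
```

Navigation and ad lines carry many tags per character of text, so their ratios stay low, while the paragraph line stands out.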

Software - Practice and Experience | 1995

Automatic synthesis of compression techniques for heterogeneous files

William H. Hsu; Amy E. Zwarico

We present a compression technique for heterogeneous files, those files which contain multiple types of data such as text, images, binary, audio, or animation. The system uses statistical methods to determine the best algorithm to use in compressing each block of data in a file (possibly a different algorithm for each block). The file is then compressed by applying the appropriate algorithm to each block. We obtain better savings than possible by using a single algorithm for compressing the file. The implementation of a working version of this heterogeneous compressor is described, along with examples of its value toward improving compression both in theoretical and applied contexts. We compare our results with those obtained using four commercially available compression programs, PKZIP, Unix compress, StuffIt, and Compact Pro, and show that our system provides better space savings.
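The per-block idea can be sketched as follows. Note the substitution: the paper predicts the best algorithm with statistical methods, whereas this sketch simply tries every available codec on each block and keeps the smallest output. The codec set (Python's standard `zlib`, `bz2`, `lzma`, plus a raw fallback), the block size, and the function names are assumptions for illustration.

```python
import bz2
import lzma
import zlib

# Tag -> (compress, decompress). "raw" stores the block unchanged, so an
# incompressible block never grows.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
    "raw": (bytes, bytes),
}


def compress_blocks(data: bytes, block_size: int = 4096) -> list[tuple[str, bytes]]:
    """Compress each block with whichever codec gives the smallest output.
    Brute-force selection here replaces the paper's statistical prediction."""
    result = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        candidates = [(tag, enc(block)) for tag, (enc, _) in CODECS.items()]
        result.append(min(candidates, key=lambda pair: len(pair[1])))
    return result


def decompress_blocks(blocks: list[tuple[str, bytes]]) -> bytes:
    """Invert compress_blocks using the codec tag stored with each block."""
    return b"".join(CODECS[tag][1](payload) for tag, payload in blocks)


# A mixed payload: a highly repetitive run followed by a byte ramp.
sample = b"a" * 5000 + bytes(range(256)) * 20
blocks = compress_blocks(sample)
restored = decompress_blocks(blocks)
```

Storing the codec tag next to each block is what lets different blocks of the same file use different algorithms.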

Archive | 2009

Handbook of Research on Computational Methodologies in Gene Regulatory Networks

Sanjoy Das; Doina Caragea; Stephen M. Welch; William H. Hsu

Recent advances in gene sequencing technology are now shedding light on the complex interplay between genes that elicit phenotypic behavior characteristic of any given organism. In order to mediate internal and external signals, the daunting task of classifying an organism's genes into complex signaling pathways needs to be completed. The Handbook of Research on Computational Methodologies in Gene Regulatory Networks focuses on methods widely used in modeling gene networks including structure discovery, learning, and optimization. This innovative Handbook of Research presents a complete overview of computational intelligence approaches for learning and optimization and how they can be used in gene regulatory networks.

Annals of Operations Research | 2007

A machine learning approach to algorithm selection for $\mathcal{NP}$-hard optimization problems: a case study on the MPE problem

Haipeng Guo; William H. Hsu

Given one instance of an $\mathcal{NP}$-hard optimization problem, can we tell in advance whether it is exactly solvable or not? If it is not, can we predict which approximate algorithm is the best to solve it? Since the behavior of most approximate, randomized, and heuristic search algorithms for

australasian joint conference on artificial intelligence | 2004

A learning-based algorithm selection meta-reasoner for the real-time MPE problem

Haipeng Guo; William H. Hsu

genetic and evolutionary computation conference | 2003

GA-hardness revisited

Haipeng Guo; William H. Hsu

international conference on data mining | 2009