William H. Hsu
Kansas State University
Publications
Featured research published by William H. Hsu.
international world wide web conferences | 2010
Tim Weninger; William H. Hsu; Jiawei Han
We present Content Extraction via Tag Ratios (CETR), a method to extract content text from diverse webpages by using the HTML document's tag ratios. We describe how to compute tag ratios on a line-by-line basis and then cluster the resulting histogram into content and non-content areas. Initially, we find that the tag ratio histogram is not easily clustered because of its one-dimensionality; therefore we extend the original approach in order to model the data in two dimensions. Next, we present a tailored clustering technique which operates on the two-dimensional model, and then evaluate our approach against a large set of alternative methods using standard accuracy, precision and recall metrics on a large and varied Web corpus. Finally, we show that, in most cases, CETR achieves better content extraction performance than existing methods, especially across varying web domains, languages and styles.
european conference on genetic programming | 2001
Steven M. Gustafson; William H. Hsu
We present an alternative to standard genetic programming (GP) that applies layered learning techniques to decompose a problem. GP is applied to subproblems sequentially, where the population in the last generation of a subproblem is used as the initial population of the next subproblem. This method is applied to evolve agents to play keep-away soccer, a subproblem of robotic soccer that requires cooperation among multiple agents in a dynamic environment. The layered learning paradigm allows GP to evolve better solutions faster than standard GP. Results show that layered learning GP outperforms standard GP by reaching a lower fitness faster and a better fitness overall. Results indicate a wide area of future research with layered learning in GP.
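The population hand-off described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: individuals are plain lists of floats rather than GP program trees, the two fitness functions (`first_coordinate_error`, `full_error`) are hypothetical toy stand-ins for the soccer subtasks, and lower fitness is treated as better, as in the abstract.

```python
import random

def mutate(individual, rng):
    """Return a copy with one coordinate perturbed by Gaussian noise."""
    child = list(individual)
    child[rng.randrange(len(child))] += rng.gauss(0.0, 0.3)
    return child

def evolve(population, fitness, generations, rng):
    """Truncation selection with elitism; lower fitness is better."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        parents = ranked[: len(ranked) // 2]          # keep the better half
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(len(population) - len(parents))]
        population = parents + children
    return population

# Hypothetical toy subproblems (assumptions, not from the paper).
def first_coordinate_error(ind):
    return abs(ind[0] - 1.0)

def full_error(ind):
    return sum((x - 1.0) ** 2 for x in ind)

def layered_evolve(layers, pop_size, genome_len, rng):
    """Run each layer in order, seeding it with the previous layer's final population."""
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for fitness, generations in layers:
        population = evolve(population, fitness, generations, rng)
    return population

rng = random.Random(0)
final = layered_evolve([(first_coordinate_error, 10), (full_error, 20)],
                       pop_size=20, genome_len=3, rng=rng)
print(min(full_error(ind) for ind in final))
```

The only layered-learning-specific step is in `layered_evolve`: the population is created once and never reinitialised between layers.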
Information Sciences | 2004
William H. Hsu
In this paper, we address the automated tuning of input specification for supervised inductive learning and develop combinatorial optimization solutions for two such tuning problems. First, we present a framework for selection and reordering of input variables to reduce generalization error in classification and probabilistic inference. One purpose of selection is to control overfitting using validation set accuracy as a criterion for relevance. Similarly, some inductive learning algorithms, such as greedy algorithms for learning probabilistic networks, are sensitive to the evaluation order of variables. We design a generic fitness function for validation of input specification, then use it to develop two genetic algorithm wrappers: one for the variable selection problem for decision tree inducers and one for the variable ordering problem for Bayesian network structure learning. We evaluate the wrappers, using real-world data for the selection wrapper and synthetic data for both, and discuss their limitations and generalizability to other inducers.
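A genetic algorithm wrapper for variable selection, scored by validation set accuracy, might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the inducer is a nearest-centroid classifier rather than a decision tree, the data is synthetic (only feature 0 is informative), and the function names and GA settings are hypothetical.

```python
import random

def nearest_centroid_accuracy(mask, train, valid):
    """Validation accuracy of a nearest-centroid classifier on the selected columns."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0                      # an empty selection cannot classify
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(cols))
        for j, c in enumerate(cols):
            acc[j] += x[c]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        return min(centroids,
                   key=lambda y: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[y])))

    return sum(predict(x) == y for x, y in valid) / len(valid)

def ga_select(n_features, fitness, rng, pop_size=20, generations=15, p_mut=0.1):
    """Evolve bit masks over the input variables; higher fitness is better."""
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_pop = ranked[:2]                                   # elitism
        while len(next_pop) < pop_size:
            a, b = rng.sample(ranked[: pop_size // 2], 2)       # parents from the top half
            child = [rng.choice(pair) for pair in zip(a, b)]    # uniform crossover
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            next_pop.append(child)
        population = next_pop
    return max(population, key=fitness)

def make_data(n, rng):
    """Synthetic rows: feature 0 tracks the label, features 1-4 are noise."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [y + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(4)]
        rows.append((x, y))
    return rows

rng = random.Random(0)
train, valid = make_data(100, rng), make_data(100, rng)
fitness = lambda mask: nearest_centroid_accuracy(mask, train, valid)
best = ga_select(5, fitness, rng)
print(best, fitness(best))
```

Because the wrapper only calls `fitness(mask)`, swapping in a different inducer means replacing `nearest_centroid_accuracy` and nothing else.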
database and expert systems applications | 2008
Tim Weninger; William H. Hsu
We describe a method to extract content text from diverse Web pages by using the HTML document's text-to-tag ratio rather than specific HTML cues that may not be constant across various Web pages. We describe how to compute the text-to-tag ratio on a line-by-line basis and then cluster the results into content and non-content areas. With this approach we then show surprisingly high levels of recall for all levels of precision, and large space savings.
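As a rough illustration of the line-by-line computation, the sketch below counts non-tag characters and tags on each line and splits lines at the mean ratio. The regular expression, the floor of one tag per line, and the mean threshold are simplifying assumptions for this example, not the clustering used in the paper.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")   # crude tag matcher; assumes well-formed, single-line tags

def text_to_tag_ratio(line: str) -> float:
    """Non-tag characters divided by the number of tags (tag count floored at 1)."""
    tags = TAG_RE.findall(line)
    text = TAG_RE.sub("", line).strip()
    return len(text) / max(len(tags), 1)

def label_content_lines(html: str) -> list[tuple[float, bool]]:
    """Return (ratio, is_content) per line, treating above-mean ratios as content."""
    ratios = [text_to_tag_ratio(line) for line in html.splitlines()]
    if not ratios:
        return []
    threshold = sum(ratios) / len(ratios)
    return [(r, r > threshold) for r in ratios]

sample = """<div><a href="/">Home</a><a href="/news">News</a></div>
<p>This paragraph holds the long article text that a reader actually wants to keep.</p>
<div><span>Share</span><span>Print</span></div>"""

for ratio, is_content in label_content_lines(sample):
    print(f"{ratio:6.2f}  {'content' if is_content else 'non-content'}")
```

On the sample, the navigation and widget lines have many tags and little text, so only the paragraph line is labelled as content.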
Software - Practice and Experience | 1995
William H. Hsu; Amy E. Zwarico
We present a compression technique for heterogeneous files, those files which contain multiple types of data such as text, images, binary, audio, or animation. The system uses statistical methods to determine the best algorithm to use in compressing each block of data in a file (possibly a different algorithm for each block). The file is then compressed by applying the appropriate algorithm to each block. We obtain better savings than possible by using a single algorithm for compressing the file. The implementation of a working version of this heterogeneous compressor is described, along with examples of its value toward improving compression both in theoretical and applied contexts. We compare our results with those obtained using four commercially available compression programs, PKZIP, Unix compress, StuffIt, and Compact Pro, and show that our system provides better space savings.
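The per-block idea can be illustrated with Python's standard codecs. Note the substitution: the paper selects an algorithm with statistical methods, whereas this sketch simply tries every codec on each block and keeps the smallest output. The block size, the codec set, and the function names are assumptions made for the example.

```python
import bz2
import lzma
import zlib

# name -> (compress, decompress); "raw" stores the block unchanged
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
    "raw": (bytes, bytes),
}

def compress_blocks(data: bytes, block_size: int = 4096):
    """Compress each block with whichever codec yields the fewest bytes."""
    blocks = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        candidates = ((name, comp(block)) for name, (comp, _) in CODECS.items())
        blocks.append(min(candidates, key=lambda pair: len(pair[1])))
    return blocks

def decompress_blocks(blocks) -> bytes:
    """Invert compress_blocks using the codec name stored with each block."""
    return b"".join(CODECS[name][1](payload) for name, payload in blocks)

# Mixed input: repetitive text followed by a byte ramp.
data = b"layered text block " * 500 + bytes(range(256)) * 16
blocks = compress_blocks(data)
print([name for name, _ in blocks])
print(len(data), "->", sum(len(payload) for _, payload in blocks))
```

Keeping a `raw` fallback guarantees that no block grows, at the cost of recording one codec name per block.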
Archive | 2009
Sanjoy Das; Doina Caragea; Stephen M. Welch; William H. Hsu
Recent advances in gene sequencing technology are now shedding light on the complex interplay between genes that elicit phenotypic behavior characteristic of any given organism. In order to mediate internal and external signals, the daunting task of classifying an organism's genes into complex signaling pathways needs to be completed. The Handbook of Research on Computational Methodologies in Gene Regulatory Networks focuses on methods widely used in modeling gene networks including structure discovery, learning, and optimization. This innovative Handbook of Research presents a complete overview of computational intelligence approaches for learning and optimization and how they can be used in gene regulatory networks.
Annals of Operations Research | 2007
Haipeng Guo; William H. Hsu
Given one instance of an $\mathcal{NP}$-hard optimization problem, can we tell in advance whether it is exactly solvable or not? If it is not, can we predict which approximate algorithm is the best to solve it? Since the behavior of most approximate, randomized, and heuristic search algorithms for $\mathcal{NP}$
australasian joint conference on artificial intelligence | 2004
Haipeng Guo; William H. Hsu
genetic and evolutionary computation conference | 2003
Haipeng Guo; William H. Hsu
international conference on data mining | 2009
Jing Xia; Doina Caragea; William H. Hsu