Ben Choi | Researchain


Publication

Featured research published by Ben Choi.

international conference on data mining | 2002

Automatic web page classification in a dynamic and hierarchical way

Xiaogang Peng; Ben Choi

Automatic classification of web pages is an effective way to deal with the difficulty of retrieving information from the Internet. Although many automatic classification algorithms and systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of web pages going into the system. They also require searching through all existing categories to make any classification. We propose a dynamic and hierarchical classification system that is capable of adding new categories as required, organizing the web pages into a tree structure, and classifying web pages by searching through only one path of the tree structure. Our test results show that our proposed single-path search technique reduces the search complexity and increases the accuracy by 6% compared with related algorithms. Our dynamic-category expansion technique also achieves satisfactory results on adding new categories into our system as required.

industrial and engineering applications of artificial intelligence and expert systems | 2002

An Adaptive Web Cache Access Predictor Using Neural Network

Wen Tian; Ben Choi; Vir V. Phoha

This paper presents a novel approach to successfully predict Web pages that are most likely to be re-accessed in a given period of time. We present the design of an intelligent predictor that can be implemented on a Web server to guide caching strategies. Our approach is adaptive and learns the changing access patterns of pages in a Web site. The core of our predictor is a neural network that uses a backpropagation learning rule. We present results of the application of this predictor on static data using log files; it can be extended to learn the distribution of live Web page access patterns. Our simulations show fast learning, uniformly good prediction, and up to 82% correct prediction for the following six months based on one day of training data. This long-range prediction accuracy is attributed to the static structure of the test Web site.

Online Information Review | 2004

Dynamic and hierarchical classification of Web pages

Ben Choi; Xiaogang Peng

Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single‐path search technique reduces the search complexity from θ(n) to θ(log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic‐category expansion technique also achieves satisfying results for adding new categories into the system as required.
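
The single-path search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Category` class, the term-weight centroids, the cosine similarity measure, and the toy tree are all assumptions made for illustration. At each level the classifier compares the page only with the children of the current node and follows the best match, so the number of comparisons grows with the depth of the tree rather than with the total number of categories (logarithmic when the tree is reasonably balanced with bounded branching).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in a hypothetical category tree; `centroid` is a term-weight vector."""
    name: str
    centroid: dict
    children: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_single_path(root, page):
    """Descend from the root, following only the most similar child at each level."""
    node, path = root, [root.name]
    while node.children:
        node = max(node.children, key=lambda c: cosine(c.centroid, page))
        path.append(node.name)
    return path


# Toy tree (assumed data): two top-level categories with two subcategories each.
root = Category("root", {}, [
    Category("science", {"physics": 1.0, "biology": 1.0}, [
        Category("physics", {"physics": 1.0, "quantum": 1.0}),
        Category("biology", {"biology": 1.0, "cell": 1.0}),
    ]),
    Category("sports", {"football": 1.0, "tennis": 1.0}, [
        Category("football", {"football": 1.0, "goal": 1.0}),
        Category("tennis", {"tennis": 1.0, "serve": 1.0}),
    ]),
])

page = {"quantum": 2.0, "physics": 1.0}
print(classify_single_path(root, page))  # ['root', 'science', 'physics']
```

In this toy tree the page is compared with four categories (two per level) instead of all six, which is the kind of saving a one-path search aims for.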

acm symposium on applied computing | 2008

Web page genre classification

Guangyu Chen; Ben Choi

In this paper we present an automatic genre-based Web page classification system. Unlike subject or topic based classifications, genre-based classifications focus on functional purposes and classify web pages into categories such as online shopping, technical paper, or discussion forum. Until now, genre classification has not been well developed because of the subjectivity and difficulty of defining the genres, the features, and even the categories. In this paper, we define five top-level genre categories, each of which has several subcategories, and develop new methods to extract 31 features from Web pages to identify the categories. We analyze not only the contents of the Web pages, but also the URLs, HTML tags, JavaScript, and VBScript. We developed a genre classification system that achieved an average accuracy of 93%. In addition, we combined this genre classification with our subject-based classification to produce a comprehensive Web page classification system.

web intelligence | 2003

Bidirectional hierarchical clustering for Web mining

Zhongmei Yao; Ben Choi

We propose a new bidirectional hierarchical clustering system for addressing challenges of Web mining. The key feature of the approach is that it aims to maximize the intra-cluster similarity in the bottom-up cluster-merging phase and to minimize the inter-cluster similarity in the top-down refinement phase. This two-pass approach achieves better clustering than existing one-pass approaches. We also propose a new cluster-merging criterion for allowing more than two clusters to be merged in each step and a new measure of similarity for taking into consideration not only the inter-connectivity between clusters but also the internal connectivity within the clusters. These result in reducing the average complexity for creating the final hierarchical structure of clusters from O(n²) to O(n). The hierarchical structure represents a semantic structure between concepts of clusters and is directly applicable to the future of semantic net.

fuzzy systems and knowledge discovery | 2007

New Components for Building Fuzzy Logic Circuits

Ben Choi; Kunal Tipnis

This paper presents two new designs of fuzzy logic circuit components. Currently, due to the lack of fuzzy components, many fuzzy systems cannot be fully implemented in hardware. We propose the designs of a new fuzzy memory cell and a new fuzzy logic gate. Unlike a digital memory cell that can only store either a zero or a one, our fuzzy memory cell can store any value ranging from zero to one. The fuzzy memory cell can also be used as a D-type fuzzy flip-flop, which is the first design of a D-type fuzzy flip-flop. We also designed a new fuzzy NOT gate based only on digital NOT gates that can easily be implemented in CMOS microchips. Our D-type fuzzy flip-flop and fuzzy NOT gate together with fuzzy AND gate and fuzzy OR gate allow us to design and implement fuzzy logic circuits to fully exploit fuzzy paradigms in hardware.
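
The behaviour of these components can be modelled in software. The sketch below is not the hardware design from the paper: it assumes the standard Zadeh operators (complement for NOT, minimum for AND, maximum for OR), which the abstract does not specify, and the `FuzzyDFlipFlop` class is a hypothetical model of a memory cell that latches any value between zero and one on a clock tick.

```python
def fuzzy_not(x):
    """Fuzzy complement (assumed Zadeh operator): 1 - x."""
    return 1.0 - x


def fuzzy_and(x, y):
    """Fuzzy conjunction (assumed Zadeh operator): minimum of the inputs."""
    return min(x, y)


def fuzzy_or(x, y):
    """Fuzzy disjunction (assumed Zadeh operator): maximum of the inputs."""
    return max(x, y)


class FuzzyDFlipFlop:
    """Hypothetical software model of a D-type fuzzy flip-flop.

    Unlike a digital cell, the stored value `q` may be any number in [0, 1].
    """

    def __init__(self, initial=0.0):
        self.q = self._check(initial)

    @staticmethod
    def _check(value):
        # Reject values outside the fuzzy truth range.
        if not 0.0 <= value <= 1.0:
            raise ValueError("fuzzy values must lie in [0, 1]")
        return float(value)

    def clock(self, d):
        """On a clock tick, store the input `d` and return the new output."""
        self.q = self._check(d)
        return self.q


cell = FuzzyDFlipFlop()
cell.clock(0.25)                       # store a partial truth value
print(cell.q)                          # 0.25
print(fuzzy_not(cell.q))               # 0.75
print(fuzzy_and(cell.q, 0.75))         # 0.25
print(fuzzy_or(cell.q, 0.75))          # 0.75
```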

International Journal of Intelligent Information Technologies | 2007

Clustering Web Pages into Hierarchical Categories

Zhongmei Yao; Ben Choi

Clustering is well suited for Web mining by automatically organizing Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain, until now there has been no such method suitable for Web page clustering. To address this problem, we discovered a constant factor that characterizes the Web domain, based on which we propose a new method for automatically determining the number of clusters in Web page datasets. We also propose a new Bidirectional Hierarchical Clustering algorithm, which arranges individual Web pages into clusters and then arranges the clusters into larger clusters and so on until the average inter-cluster similarity approaches the constant factor. Having the new constant factor together with the new algorithm, we have developed a clustering system suitable for mining the Web.

computational intelligence | 2003

INDUCTIVE INFERENCE BY USING INFORMATION COMPRESSION

Ben Choi

Inductive inference is of central importance to all scientific inquiries. Automating the process of inductive inference is the major concern of machine learning researchers. This article proposes inductive inference techniques to address three inductive problems: (1) how to automatically construct a general description, a model, or a theory to describe a sequence of observations or experimental data, (2) how to modify an existing model to account for new observations, and (3) how to handle the situation where the new observations are not consistent with the existing models. The techniques proposed in this article implement the inductive principle called the minimum descriptive length principle and relate to Kolmogorov complexity and Occam's razor. They employ finite state machines as models to describe sequences of observations and measure the descriptive complexity by measuring the number of states. They can be used to draw inference from sequences of observations where one observation may depend on previous observations. Thus, they can be applied to time series prediction problems and to one‐to‐one mapping problems. They are implemented to form an automated inductive machine.

ieee wic acm international conference on intelligent agent technology | 2004

Agent space architecture for search engines

Ben Choi; Rohit Dhawan

The future of computing is moving from individual processing units to communities of self-organizing agents. In this work we propose an agent and network based architecture for parallel and distributed computing called agent space architecture. Our architecture builds upon the notions of agent and object space and utilizes multicast networks. The building blocks for our proposed architecture consist of an active processing unit called agent, a shared place for communication called space, and a communication medium called multicast network. One unique feature of our architecture is that we extend the concept of object space to become an active space. Our active space functions as a rendezvous, a repository, a cache, a responder, a notifier, and a manager of its own resources. The organization of our architecture is as general as network topology. Any number of agents, spaces, or networks can be added to achieve high performance. It is as scalable as Ethernet and adding agents or spaces is as easy as plug and play. High availability and fault tolerance are achieved through multiple agents, spaces, and networks. All these features are particularly beneficial for challenging applications such as search engines, which are used as a test case to implement and to test our proposed architecture.

industrial and engineering applications of artificial intelligence and expert systems | 2003

Applying semantic links for classifying web pages

Ben Choi; Qing Guo

Automatic hypertext classification is an essential technique for organizing the vast amount of Internet Web pages or HTML documents. One of the problems in classifying Web pages is that Web pages are usually short and contain insufficient text to clearly identify their category. Text classification mechanisms, by analyzing only the contents of the document itself, are relatively ineffective in classifying short Web pages. This paper proposes a new hypertext classification mechanism to address the problem by analyzing not only the Web page itself but also the linked Web pages referred to by the URLs contained within the page. The URLs are treated as semantic links. The hypothesis is that the linked Web pages contain related information to help identify the category of the Web page. Experimental results show that the proposed approach could increase the accuracy by 35% over the approach of analyzing only the Web page itself.
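
The idea of using linked pages as extra evidence can be sketched as follows. This is an illustration under stated assumptions, not the mechanism from the paper: the keyword-count scoring, the `CATEGORY_KEYWORDS` table, and the `link_weight` parameter are all hypothetical. The sketch scores a page from its own text, then adds down-weighted scores from the text of the pages it links to, so a short page with no category keywords of its own can still be classified.

```python
from collections import Counter

# Hypothetical keyword profiles for two categories (assumed data).
CATEGORY_KEYWORDS = {
    "sports": {"football", "league", "match"},
    "finance": {"stock", "market", "bank"},
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def score(tokens, weight=1.0):
    """Count keyword hits per category, scaled by `weight`."""
    counts = Counter()
    for category, keywords in CATEGORY_KEYWORDS.items():
        counts[category] += weight * sum(1 for t in tokens if t in keywords)
    return counts


def classify(page_text, linked_texts=(), link_weight=0.5):
    """Classify a page from its own text plus down-weighted linked-page text.

    Returns the best-scoring category, or None if no keyword matched at all.
    """
    totals = score(tokenize(page_text))
    for text in linked_texts:
        totals.update(score(tokenize(text), link_weight))
    best, best_score = max(totals.items(), key=lambda kv: kv[1])
    return best if best_score > 0 else None


short_page = "Latest results"
linked = ["football league match report", "stock market news"]

print(classify(short_page))          # None: the page alone has no keywords
print(classify(short_page, linked))  # sports
```

Weighting linked text below the page's own text is a design choice of this sketch; it keeps a page with clear content of its own from being overridden by its outgoing links.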
