Sheau-Dong Lang | Researchain

Publication

Featured research published by Sheau-Dong Lang.

conference on information and knowledge management | 1999

Classification algorithms for NETNEWS articles

Wen-Lin Hsu; Sheau-Dong Lang

We propose several algorithms using the vector space model to classify news articles posted on NETNEWS according to newsgroup categories. The baseline method combines the terms of all the articles of each newsgroup in the training set to represent each newsgroup as a single vector. After training, incoming news articles are classified based on their similarity to the existing newsgroup categories. We propose the following techniques to improve the classification performance of the baseline method: (1) use routing (classification) accuracy and the similarity values to refine the training set; (2) update the underlying term structures periodically during testing; and (3) apply k-means clustering to partition the newsgroup articles and represent each newsgroup by k vectors. Our test collection consists of real news articles and the 519 subnewsgroups under the REC newsgroup of NETNEWS over a period of 3 months. Our experimental results demonstrate that refining the training set reduces storage by one-third to two-thirds. Periodic updates improve routing accuracy by 20% to 100% but incur runtime overhead. Finally, representing each newsgroup by k vectors (with k = 2 or 3) using clustering yields the most significant improvement in routing accuracy, ranging from 60% to 100%, while requiring only slightly more storage.
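The baseline method described in this abstract can be illustrated with a minimal sketch. It assumes raw term-frequency weights and cosine similarity, neither of which the abstract specifies, and the newsgroup names and article texts are invented for illustration only.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector (assumed weighting scheme)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def train(articles_by_group):
    """Merge the terms of all training articles of a group into one vector."""
    profiles = {}
    for group, articles in articles_by_group.items():
        profile = Counter()
        for article in articles:
            profile.update(term_vector(article))
        profiles[group] = profile
    return profiles

def classify(profiles, article):
    """Route an incoming article to the most similar newsgroup vector."""
    vec = term_vector(article)
    return max(profiles, key=lambda group: cosine(profiles[group], vec))

# Hypothetical training data, not taken from the paper's test collection.
training = {
    "rec.autos": ["engine oil change", "brake pads and engine noise"],
    "rec.music": ["guitar chords and tuning", "piano chords practice"],
}
profiles = train(training)
predicted = classify(profiles, "strange engine noise after oil change")
```

Representing each newsgroup by k vectors, as in technique (3), would replace the single merged profile with k cluster centroids per group and route each article to the group that owns the closest centroid.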

conference on information and knowledge management | 1996

Incorporating latent semantic indexing into a neural network model for information retrieval

Sheau-Dong Lang; Narsingh Deo

We incorporate the Latent Semantic Indexing (LSI) technique into a competition-based neural network model for information retrieval. The original neural network model was based on a causal inference network, incorporating Roget’s Thesaurus, that connects the index terms and related documents. Since the process of creating or updating a thesaurus is rather expensive, we apply the LSI technique to provide an automated procedure that captures the semantic relationship between the documents and

international conference on data mining | 2007

Mining Distance-Based Outliers from Categorical Data

Shuxin Li; Robert Lee; Sheau-Dong Lang

Distance-based outlier detection is an important data mining technique that finds abnormal data objects according to some distance function. However, when this technique is applied to high-dimensional categorical data, a traditional simple matching dissimilarity measure does not provide an adequate model. In this article, we employ a new common-neighbor-based distance function to measure the proximity between a pair of data points. Experiments show that better outlier mining results can be achieved when the new distance function is utilized rather than a conventional simple matching dissimilarity measure.
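For context, here is a minimal sketch of the conventional approach that this abstract compares against: distance-based outlier detection with the simple matching dissimilarity. It does not implement the paper's common-neighbor-based distance, which the abstract does not define. The `radius` and `min_neighbors` parameters and the sample records are assumptions chosen for illustration.

```python
def simple_matching(a, b):
    """Fraction of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_based_outliers(records, radius, min_neighbors):
    """Return indices of records with fewer than min_neighbors
    other records within the given dissimilarity radius."""
    outliers = []
    for i, r in enumerate(records):
        neighbors = sum(
            1
            for j, s in enumerate(records)
            if j != i and simple_matching(r, s) <= radius
        )
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers

# Hypothetical categorical records (color, size, shape).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "huge", "flat"),
]
flagged = distance_based_outliers(records, radius=0.5, min_neighbors=1)
```

With many attributes, simple matching tends to give most pairs of records similar dissimilarity values, which is one plausible reading of why the abstract calls it inadequate for high-dimensional categorical data.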

intelligence and security informatics | 2008

Forensic Artifacts of Microsoft Windows Vista System

Daniel M. Purcell; Sheau-Dong Lang

This paper reviews changes made in the Microsoft Windows Vista system relative to earlier Windows operating systems (such as XP) and directs attention to system artifacts that are of evidentiary value in typical computer forensics work. The issues addressed include: NTFS on-disk structure, file system directory structures, symbolic links, and the recycle bin; we also briefly mention artifacts related to Windows Mail, the paging file, thumbnail caching, and print spooling.

international symposium on parallel architectures algorithms and networks | 1999

A parallel algorithm for the degree-constrained minimum spanning tree problem using nearest-neighbor chains

Li-Jen Mao; Narsingh Deo; Sheau-Dong Lang

The Minimum Spanning Tree (MST) problem with the added constraint that no node in the spanning tree has degree greater than a specified integer d is known as the Degree-Constrained MST (d-MST) problem. Since computing the d-MST is NP-hard for every d in the range 2 ≤ d ≤ (n-2), where n denotes the total number of nodes, several approximate algorithms have been proposed in the literature. We have previously proposed two approximate algorithms, TC-RNN and IR, for the d-MST problem (L.J. Mao et al., 1997). Our experimental results show that while the IR algorithm is faster, the TC-RNN algorithm consistently produces spanning trees with a smaller weight. We propose a new algorithm, TC-NNC, which is an improved version of TC-RNN. Our experiments, using randomly generated weighted graphs as input, demonstrate that the execution time of TC-NNC is smaller than that of TC-RNN and is very close to that of IR. Further, the solution quality of TC-NNC is better than that of IR and is very close to that of TC-RNN.
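To make the d-MST constraint concrete, the following is a minimal sketch of a simple greedy heuristic: Kruskal's algorithm with an added degree check. It is not TC-RNN, IR, or TC-NNC, whose details the abstract does not give, and it offers no optimality guarantee. The function name `greedy_dmst` and the sample graph are hypothetical.

```python
def greedy_dmst(n, edges, d):
    """Kruskal-style heuristic: scan edges by increasing weight and accept
    an edge only if it joins two components and keeps both endpoint
    degrees at most d. Edges are (weight, u, v) tuples."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = [0] * n
    tree = []
    for w, u, v in sorted(edges):
        if degree[u] >= d or degree[v] >= d:
            continue  # accepting this edge would violate the degree bound
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # accepting this edge would create a cycle
        parent[ru] = rv
        degree[u] += 1
        degree[v] += 1
        tree.append((u, v, w))
    return tree

# Hypothetical complete graph on 4 nodes: node 0 is cheap to reach,
# so the unconstrained MST would be a star centered at node 0.
edges = [
    (1, 0, 1), (1, 0, 2), (1, 0, 3),
    (2, 1, 2), (2, 2, 3), (3, 1, 3),
]
tree = greedy_dmst(4, edges, d=2)
```

With d = 2 the heuristic cannot attach a third edge to node 0, so it falls back to the heavier edge between nodes 2 and 3, which shows how the degree bound raises the tree weight above that of the unconstrained MST.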

conference on information and knowledge management | 1994

A competition-based connectionist model for information retrieval using a merged thesaurus

Inien Syu; Sheau-Dong Lang

This paper investigates a network-based information retrieval model using diagnostic inferencing techniques. A basic inference network in information retrieval consists of two component networks: the document component and the query component. In our approach, there is a layer of nodes corresponding to the documents, and a layer of nodes corresponding to the index terms extracted from the document set, with links connecting documents to the related index terms. A thesaurus is used to provide concept categories; these categories are represented by another layer of nodes, with links connecting the index terms and the related categories. The query component uses a symmetric structure. Each query causes markings of category nodes, hence markings of the related index term nodes, in the document component of the network. In our previous work, we adapted a competition-based connectionist model for diagnostic problem solving to information retrieval. In this model, documents are treated as “disorders” and user information needs, represented by the marked index term nodes, as “manifestations”. A competitive activation mechanism is then used which converges to a set of disorders that best explain the given manifestations. Our experiments showed that the retrieval performance of this model is comparable to or better than that of various information retrieval models reported in the literature. In this paper, we report further enhancements of the model by using a merged thesaurus.

Data mining, intrusion detection, information assurance, and data networks security. Conference | 2006

Efficient mining of strongly correlated item pairs

Shuxin Li; Robert Lee; Sheau-Dong Lang

Past attempts to mine transactional databases for strongly correlated item pairs have been beset by difficulties. In an attempt to be efficient, some algorithms produce false positive and false negative results. In an attempt to be accurate and comprehensive, other algorithms sacrifice efficiency. We propose an efficient new algorithm that uses Jaccard's correlation coefficient, which is simply the ratio between the sizes of the intersection and the union of two sets, to generate a set of strongly correlated item pairs that is both accurate and comprehensive. The pruning of candidate item pairs based on an upper bound facilitates efficiency. Furthermore, there is no possibility of false positives or false negatives. Testing of our algorithm on datasets of various sizes shows its effectiveness in real-world application.
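The Jaccard coefficient and the idea of pruning by an upper bound can be sketched as follows. The abstract does not state which upper bound the paper uses; this sketch assumes the elementary bound min(|A|, |B|) / max(|A|, |B|), which holds because the intersection is no larger than the smaller set and the union is no smaller than the larger one. The function names and the sample transactions are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Ratio of intersection size to union size of two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def correlated_pairs(transactions, threshold):
    """Return (item_x, item_y, score) for every item pair whose
    Jaccard coefficient is at least the threshold."""
    # Map each item to the set of ids of transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    result = []
    for x, y in combinations(sorted(tidsets), 2):
        small, large = sorted((len(tidsets[x]), len(tidsets[y])))
        # Assumed upper bound: Jaccard <= small / large. Pairs failing
        # it cannot reach the threshold, so skipping them is safe.
        if small / large < threshold:
            continue
        score = jaccard(tidsets[x], tidsets[y])
        if score >= threshold:
            result.append((x, y, score))
    return result

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "eggs"},
]
pairs = correlated_pairs(transactions, threshold=0.6)
```

Because every pair that survives pruning is scored exactly, and pruning discards only pairs that provably fall below the threshold, this sketch produces neither false positives nor false negatives, matching the property the abstract claims for the paper's algorithm.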

intelligence and security informatics | 2009

From digital forensic report to Bayesian network representation

Robert Lee; Sheau-Dong Lang; Kevin Stenger

Computer (digital) forensic examiners typically write a report to document the examination process, including tools used, major processing steps, summary of the findings, and a detailed listing of relevant evidence (files, artifacts) exported to external media (CD, DVD, hard copy) for the case investigator or attorney. However, proper interpretation of the significance of extracted evidence often requires additional consultation with the examiner. This paper proposes a practical methodology for transforming the findings in typical forensic reports to a graphical representation using Bayesian networks (BNs). BNs offer the following advantages: (1) Delineate the cause-effect relationship among relevant pieces of evidence described in the report; and (2) Use probability and established Bayesian inference rules to deal with uncertainty of digital evidence. A realistic forensic report is used to demonstrate this methodology.
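The second advantage listed in this abstract, using Bayesian inference to handle uncertain evidence, can be illustrated with a minimal sketch of Bayes' rule applied to one hypothesis and a sequence of evidence items. This is not the paper's network construction method: it assumes the evidence items are conditionally independent given the hypothesis, and all probabilities below are invented for illustration.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) from the prior P(H) and the two likelihoods
    P(E | H) and P(E | not H), via Bayes' rule."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator

def posterior_after(prior, evidence):
    """Apply Bayes' rule once per evidence item, assuming the items
    are conditionally independent given the hypothesis."""
    belief = prior
    for p_true, p_false in evidence:
        belief = bayes_update(belief, p_true, p_false)
    return belief

# Hypothetical values: H = "the user knowingly downloaded the file".
# Each tuple is (P(E | H), P(E | not H)) for one observed artifact.
evidence = [
    (0.9, 0.1),  # e.g. a matching browser history entry
    (0.8, 0.4),  # e.g. the file found in a user-created folder
]
belief = posterior_after(0.5, evidence)
```

A full Bayesian network would additionally model dependencies among the evidence items themselves, which is where the cause-effect structure mentioned in advantage (1) comes in.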

acm southeast regional conference | 2006

Detecting outliers in interval data

Shuxin Li; Robert Lee; Sheau-Dong Lang

Outlier detection has become an important data mining problem in many applications, including customer management and fraud detection. In recent years, many algorithms have been developed for discovering outliers in large databases. However, to our knowledge, no algorithm exists for discovering outliers in interval data. In this paper, we propose an efficient algorithm to detect distance-based outliers in interval data. We perform empirical studies on real and simulated interval datasets to evaluate the effectiveness of our proposed algorithm in identifying meaningful outliers.

international symposium on parallel architectures algorithms and networks | 2005

Locality-based profile analysis for secondary intrusion detection

Mian Zhou; Robert Lee; Sheau-Dong Lang

While a firewall at the perimeter of a local network provides the first line of defense against attackers, many intrusion incidents result from successful penetration of the firewall. The compromise of one computer puts the entire network at risk. We propose a distributed personal intrusion detection system (IDS) that provides local anomaly detection as well as centralized traffic analysis. The system first builds profiles for normal network activity and then labels as suspicious any events that deviate from the normal profiles. The normal profiles are based on variations in connection-based behavior at each individual host. Deviations at each host are recorded using a local weight assignment scheme and then further processed by the central analyzer to build a weighted link graph representing the overall network abnormality. As local networks become more vulnerable to inside attack, our system reinforces security to prevent corruption from the inside.
