Michael Chau | Researchain

Publication

Featured research published by Michael Chau.

IEEE Computer | 2004

Crime data mining: a general framework and some examples

Hsinchun Chen; Wingyan Chung; Jennifer Jie Xu; Gang Wang; Yi Qin; Michael Chau

A major challenge facing all law-enforcement and intelligence-gathering organizations is accurately and efficiently analyzing the growing volumes of crime data. Detecting cybercrime can likewise be difficult because busy network traffic and frequent online transactions generate large amounts of data, only a small portion of which relates to illegal activities. Data mining is a powerful tool that enables criminal investigators who may lack extensive training as data analysts to explore large databases quickly and efficiently. We present a general framework for crime data mining that draws on experience gained with the Coplink project, which researchers at the University of Arizona have been conducting in collaboration with the Tucson and Phoenix police departments since 1997.

International Journal of Human-Computer Studies / International Journal of Man-Machine Studies | 2007

Mining communities and their relationships in blogs: A study of online hate groups

Michael Chau; Jennifer Jie Xu

Blogs, often treated as the equivalent of online personal diaries, have become one of the fastest growing types of Web-based media. Everyone is free to express their opinions and emotions very easily through blogs. In the blogosphere, many communities have emerged, which include hate groups and racists that are trying to share their ideology, express their views, or recruit new group members. It is important to analyze these virtual communities, defined based on membership and subscription linkages, in order to monitor for activities that are potentially harmful to society. While many Web mining and network analysis techniques have been used to analyze the content and structure of the Web sites of hate groups on the Internet, these techniques have not been applied to the study of hate groups in blogs. To address this issue, we have proposed a semi-automated approach in this research. The proposed approach consists of four modules, namely blog spider, information extraction, network analysis, and visualization. We applied this approach to identify and analyze a selected set of 28 anti-Blacks hate groups (820 bloggers) on Xanga, one of the most popular blog hosting sites. Our analysis results revealed some interesting demographic and topological characteristics in these groups, and identified at least two large communities on top of the smaller ones. The study also demonstrated the feasibility of applying the proposed approach to the study of hate groups and other related communities in blogs.

Decision Support Systems | 2002

CI Spider: a tool for competitive intelligence on the web

Hsinchun Chen; Michael Chau; Daniel Dajun Zeng

Competitive Intelligence (CI) aims to monitor a firm's external environment for information relevant to its decision-making process. As an excellent information source, the Internet provides significant opportunities for CI professionals as well as the problem of information overload. Internet search engines have been widely used to facilitate information search on the Internet. However, many problems hinder their effective use in CI research. In this paper, we introduce the Competitive Intelligence Spider, or CI Spider, designed to address some of the problems associated with using Internet search engines in the context of competitive intelligence. CI Spider performs real-time collection of Web pages from sites specified by the user and applies indexing and categorization analysis on the documents collected, thus providing the user with an up-to-date, comprehensive view of the Web sites of user interest. In this paper, we report on the design of the CI Spider system and on a user study of CI Spider, which compares CI Spider with two alternative focused information gathering methods: Lycos search constrained by Internet domain, and manual within-site browsing and searching. Our study indicates that CI Spider has better precision and recall rates than Lycos. CI Spider also outperforms both Lycos and within-site browsing and searching with respect to ease of use. We conclude that there exists strong evidence in support of the potentially significant value of applying the CI Spider approach in CI applications.

IEEE Computer | 2003

Comparison of three vertical search spiders

Michael Chau; Hsinchun Chen

The Web's dynamic, unstructured nature makes locating resources difficult. Vertical search engines solve part of the problem by keeping indexes only in specific domains. They also offer more opportunity to apply domain knowledge in the spider applications that collect content for their databases. The authors used three approaches to investigate algorithms for improving the performance of vertical search engine spiders: a breadth-first graph-traversal algorithm with no heuristics to refine the search process, a best-first traversal algorithm that uses a hyperlink-analysis heuristic, and a spreading-activation algorithm based on modeling the Web as a neural network.
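
The difference between the first two traversal strategies can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy link graph `LINKS`, the relevance table `SCORE` (standing in for a hyperlink-analysis heuristic), and the function names are hypothetical and do not come from the paper. The spreading-activation spider is not shown.

```python
from collections import deque
import heapq

# Hypothetical toy link graph: page -> outgoing links.
LINKS = {"seed": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}

# Hypothetical relevance scores standing in for a hyperlink-analysis heuristic.
SCORE = {"seed": 1.0, "a": 0.2, "b": 0.9, "c": 0.1, "d": 0.8}


def breadth_first(seed, limit):
    """Visit pages in FIFO order of discovery, with no heuristic."""
    seen, order, queue = {seed}, [], deque([seed])
    while queue and len(order) < limit:
        page = queue.popleft()
        order.append(page)
        for nxt in LINKS.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def best_first(seed, limit):
    """Always visit the highest-scoring page discovered so far."""
    seen, order = {seed}, []
    heap = [(-SCORE[seed], seed)]  # negate scores: heapq is a min-heap
    while heap and len(order) < limit:
        _, page = heapq.heappop(heap)
        order.append(page)
        for nxt in LINKS.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (-SCORE[nxt], nxt))
    return order


print(breadth_first("seed", 4))  # ['seed', 'a', 'b', 'c']
print(best_first("seed", 4))     # ['seed', 'b', 'd', 'a']
```

With the same page budget, the best-first spider reaches the high-scoring page `d` while the breadth-first spider spends its budget on the low-scoring `a` and `c`.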

Journal of the Association for Information Science and Technology | 2001

MetaSpider: Meta‐searching and categorization on the Web

Hsinchun Chen; Haiyan Fan; Michael Chau; Daniel Dajun Zeng

It has become increasingly difficult to locate relevant information on the Web, even with the help of Web search engines. Two approaches to addressing the low precision and poor presentation of search results of current search tools are studied: meta-search and document categorization. Meta-search engines improve precision by selecting and integrating search results from generic or domain-specific Web search engines or other resources. Document categorization promises better organization and presentation of retrieved results. This article introduces MetaSpider, a meta-search engine that has real-time indexing and categorizing functions. We report in this paper the major components of MetaSpider and discuss related technical approaches. Initial results of a user evaluation study comparing MetaSpider, NorthernLight, and MetaCrawler in terms of clustering performance and of time and effort expended show that MetaSpider performed best in precision rate, but disclose no statistically significant differences in recall rate and time requirements. Our experimental study also reveals that MetaSpider exhibited a higher level of automation than the other two systems and facilitated efficient searching by providing the user with an organized, comprehensive view of the retrieved documents.

ACM/IEEE Joint Conference on Digital Libraries | 2004

Building domain-specific Web collections for scientific digital libraries: a meta-search enhanced focused crawling method

Jialun Qin; Yilu Zhou; Michael Chau

Collecting domain-specific documents from the Web using focused crawlers has been considered one of the most important strategies to build digital libraries that serve the scientific community. However, because most focused crawlers use local search algorithms to traverse the Web space, they could be easily trapped within a limited sub-graph of the Web that surrounds the starting URLs and build domain-specific collections that are not comprehensive and diverse enough to scientists and researchers. We investigated the problems of traditional focused crawlers caused by local search algorithms and proposed a new crawling approach, meta-search enhanced focused crawling, to address the problems. We conducted two user evaluation experiments to examine the performance of our proposed approach and the results showed that our approach could build domain-specific collections with higher quality than traditional focused crawling techniques.

Knowledge Discovery and Data Mining | 2006

Uncertain data mining: an example in clustering location data

Michael Chau; Reynold Cheng; Ben Kao; Jackey Ng

Data uncertainty is an inherent property in various applications due to reasons such as outdated sources or imprecise measurement. When data mining techniques are applied to these data, their uncertainty has to be considered to obtain high quality results. We present UK-means clustering, an algorithm that enhances the K-means algorithm to handle data uncertainty. We apply UK-means to the particular pattern of moving-object uncertainty. Experimental results show that by considering uncertainty, a clustering algorithm can produce more accurate results.
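
One way to picture a K-means variant that accounts for uncertainty is to assign each object to the cluster with the smallest expected distance rather than the distance to a single point. The sketch below makes several assumptions that the abstract does not state: each uncertain object is represented by a list of equally likely sample locations, squared Euclidean distance is used, and centers are updated from the objects' mean locations. The function names are hypothetical, and this is not the authors' implementation.

```python
def expected_sq_dist(samples, center):
    """Average squared Euclidean distance over equally likely sample locations."""
    total = sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in samples)
    return total / len(samples)


def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def uk_means_sketch(objects, centers, iterations=10):
    """Cluster uncertain objects by minimum expected squared distance."""
    labels = []
    for _ in range(iterations):
        # Assignment step: pick the center with the smallest expected distance.
        labels = [
            min(range(len(centers)), key=lambda j: expected_sq_dist(obj, centers[j]))
            for obj in objects
        ]
        # Update step: each center moves to the mean of its members' mean locations.
        new_centers = []
        for j, old in enumerate(centers):
            members = [mean_point(o) for o, lab in zip(objects, labels) if lab == j]
            new_centers.append(mean_point(members) if members else old)
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Four uncertain objects, each given as two possible locations (made-up data).
objects = [
    [(0, 0), (0, 2)],
    [(1, 1), (1, 1)],
    [(10, 10), (12, 10)],
    [(11, 9), (11, 11)],
]
labels, centers = uk_means_sketch(objects, [(0, 0), (10, 10)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(0.5, 1.0), (11.0, 10.0)]
```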

PLOS ONE | 2013

Reality Check for the Chinese Microblog Space: A Random Sampling Approach

King-Wa Fu; Michael Chau

Chinese microblogs have drawn global attention to this online application’s potential impact on the country’s social and political environment. However, representative and reliable statistics on Chinese microbloggers are limited. Using a random sampling approach, this study collected Chinese microblog data from the service provider, analyzing the profile and the pattern of usage for 29,998 microblog accounts. From our analysis, 57.4% (95% CI 56.9%,58.0%) of the accounts’ timelines were empty. Among the 12,774 samples with non-zero statuses, 86.9% (95% CI 86.2%,87.4%) did not make an original post in a 7-day study period. By contrast, 0.51% (95% CI 0.4%,0.65%) wrote twenty or more original posts and 0.45% (95% CI 0.35%,0.60%) reposted more than 40 unique messages within the 7-day period. A small group of microbloggers created the majority of content and drew other users’ attention. About 4.8% (95% CI 4.4%,5.2%) of the 12,774 users contributed more than 80% (95% CI 78.6%,80.3%) of the original posts and about 4.8% (95% CI 4.5%,5.2%) managed to create posts that were reposted or received comments at least once. Moreover, a regression analysis revealed that volume of followers is a key determinant of creating original microblog posts, reposting messages, being reposted, and receiving comments. Volume of friends is found to be linked only with the number of reposts. Gender differences and regional disparities in using microblogs in China are also observed.
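
The arithmetic behind a figure such as 57.4% (95% CI 56.9%,58.0%) can be sketched as follows. The abstract does not say which interval method was used, so the normal-approximation (Wald) formula here is an assumption, and the function name `wald_interval` is hypothetical. With the rounded inputs 0.574 and 29,998 it gives roughly 56.8% to 58.0%, which is close to the reported bounds but not identical to them.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion (assumed method)."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se


# Share of empty timelines among the 29,998 sampled accounts (rounded inputs).
low, high = wald_interval(0.574, 29998)
print(f"{low:.1%} to {high:.1%}")
```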

ACM/IEEE Joint Conference on Digital Libraries | 2001

Personalized spiders for web search and analysis

Michael Chau; Daniel Dajun Zeng; Hsinchun Chen

Searching for useful information on the World Wide Web has become increasingly difficult. While Internet search engines have been helping people to search on the web, low recall rate and outdated indexes have become more and more problematic as the web grows. In addition, search tools usually present to the user only a list of search results, failing to provide further personalized analysis which could help users identify useful information and comprehend these results. To alleviate these problems, we propose a client-based architecture that incorporates noun phrasing and self-organizing map techniques. Two systems, namely CI Spider and Meta Spider, have been built based on this architecture. User evaluation studies have been conducted and the findings suggest that the proposed architecture can effectively facilitate web search and analysis.

Applied Intelligence | 2010

PutMode: prediction of uncertain trajectories in moving objects databases

Shaojie Qiao; Changjie Tang; Huidong Jin; Teng Long; Shucheng Dai; Yungchang Ku; Michael Chau

Objective: Prediction of moving objects with uncertain motion patterns is emerging rapidly as a new exciting paradigm and is important for law enforcement applications such as criminal tracking analysis. However, existing algorithms for prediction in spatio-temporal databases focus on discovering frequent trajectory patterns from historical data. Moreover, these methods overlook the effect of some important factors, such as speed and moving direction. This lacks generality as moving objects may follow dynamic motion patterns in real life. Methods: We propose a framework for predicting uncertain trajectories in moving objects databases. Based on Continuous Time Bayesian Networks (CTBNs), we develop a trajectory prediction algorithm, called PutMode (Prediction of uncertain trajectories in Moving objects databases). It comprises three phases: (i) construction of TCTBNs (Trajectory CTBNs) which obey the Markov property and consist of states combined by three important variables including street identifier, speed, and direction; (ii) trajectory clustering for clearing up outlying trajectories; (iii) predicting the motion behaviors of moving objects in order to obtain the possible trajectories based on TCTBNs. Results: Experimental results show that PutMode can predict the possible motion curves of objects in an accurate and efficient manner in distinct trajectory data sets with an average accuracy higher than 80%. Furthermore, we illustrate the crucial role of trajectory clustering, which provides benefits on prediction time as well as prediction accuracy.
