Meichun Hsu | Researchain

Publication

Featured research published by Meichun Hsu.

international conference on management of data | 1988

The HiPAC project: combining active databases and timing constraints

Umeshwar Dayal; Barbara T. Blaustein; Alejandro P. Buchmann; Upen S. Chakravarthy; Meichun Hsu; R. Ledin; Dennis R. McCarthy; Arnon Rosenthal; Sunil K. Sarin; Michael J. Carey; Miron Livny; Rajiv Jauhari

The HiPAC (High Performance ACtive database system) project addresses two critical problems in time-constrained data management: the handling of timing constraints in databases, and the avoidance of wasteful polling through the use of situation-action rules that are an integral part of the database and are monitored by the DBMS's condition monitor. A rich knowledge model provides the necessary primitives for defining timing constraints, situation-action rules, and precipitating events. The execution model allows various coupling modes between transactions, situation evaluations, and actions, and provides the framework for correct concurrent execution of transactions and triggered actions. Different approaches to scheduling time-constrained tasks and transactions are explored, and an architecture is being designed with special emphasis on the interaction between the time-constrained, active DBMS and the operating system. Performance models are developed to evaluate the various design alternatives.
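
The rule-and-coupling idea above can be illustrated with a toy event-condition-action (ECA) dispatcher. This is a minimal sketch under stated assumptions, not HiPAC's design: the `Rule` and `Transaction` classes are hypothetical, only two coupling modes are modeled ("immediate", run as soon as the event is signaled, and "deferred", run at commit), and the condition is always evaluated immediately, whereas HiPAC couples event-to-condition and condition-to-action separately.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Rule:
    event: str                          # name of the precipitating event
    condition: Callable[[dict], bool]   # evaluated when the event is signaled
    action: Callable[[dict], None]
    coupling: str = "immediate"         # hypothetical labels: "immediate" or "deferred"


@dataclass
class Transaction:
    rules: List[Rule]
    deferred: List[Tuple[Rule, dict]] = field(default_factory=list)

    def signal(self, event: str, data: dict) -> None:
        """Raise an event: fire immediate rules now, queue deferred ones."""
        for rule in self.rules:
            if rule.event == event and rule.condition(data):
                if rule.coupling == "immediate":
                    rule.action(data)
                else:
                    self.deferred.append((rule, data))

    def commit(self) -> None:
        """Run deferred actions at commit time, in the order they were queued."""
        pending, self.deferred = self.deferred, []
        for rule, data in pending:
            rule.action(data)
```

For example, with an immediate "alarm" rule guarded by `temp > 100` and an unconditional deferred "audit" rule on the same `update` event, signaling a temperature of 120 logs the alarm at once, while the audit entry appears only after `commit()`. No polling loop is involved: actions run only when an event is signaled.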

knowledge discovery and data mining | 2013

Spotting opinion spammers using behavioral footprints

Arjun Mukherjee; Abhinav Kumar; Bing Liu; Junhui Wang; Meichun Hsu; Malu Castellanos; Riddhiman Ghosh

Opinionated social media such as product reviews are now widely used by individuals and organizations for decision making. However, for profit or fame, people try to game the system through opinion spamming (e.g., writing fake reviews) to promote or demote target products. In recent years, fake review detection has attracted significant attention from both the business and research communities. However, because the human labeling needed for supervised learning and evaluation is difficult to obtain, the problem remains highly challenging. This work approaches the problem from a novel angle by modeling spamicity as latent. An unsupervised model, called the Author Spamicity Model (ASM), is proposed. It works in a Bayesian setting, which facilitates modeling the spamicity of authors as latent and allows us to exploit various observed behavioral footprints of reviewers. The intuition is that opinion spammers have different behavioral distributions than non-spammers. This creates a distributional divergence between the latent population distributions of two clusters: spammers and non-spammers. Model inference learns the population distributions of the two clusters. Several extensions of ASM that leverage different priors are also considered. Experiments on a real-life Amazon review dataset demonstrate the effectiveness of the proposed models, which significantly outperform state-of-the-art competitors.

Archive | 2001

Technologies for E-Services

Alejandro P. Buchmann; Ludger Fiege; Fabio Casati; Meichun Hsu; Ming-Chien Shan

In the traditional application model, services are tightly coupled with the processes they support. For example, whenever a server's process changes, existing clients using that process must also be updated. However, electronic commerce is moving toward e-service-based interactions, in which enterprises use e-services to interact with each other dynamically, and a service in one enterprise may spontaneously decide to engage a service fronted by another enterprise. Here we clarify the relationship between emerging standards such as UDDI, WSDL, and WSCL, and propose a conversation controller mechanism that leverages these standards to direct services in their conversations. We can thus treat services as pools of methods, independent of the conversations they support; even method names can be chosen independently of the conversations. Services can spontaneously discover each other and then engage in complex interactions without having to explicitly support conversational logic themselves. The dynamism and flexibility enabled by this decoupling is the essential difference between applications offered over the web and e-services.
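
The decoupling described above can be pictured as a small controller that walks a conversation specification and maps each conversation message onto whatever method the service happens to expose. This is a hypothetical sketch: `SupplierService`, `run_conversation`, and the `SPEC`/`BINDING` dictionaries are illustrative names, and the UDDI/WSDL/WSCL documents the paper builds on are not modeled.

```python
from typing import Dict, List, Tuple


class SupplierService:
    """Hypothetical service: a plain pool of methods with no conversation logic."""

    def quote(self, order: dict) -> dict:
        return {"price": 10 * order["qty"]}

    def accept(self, order: dict) -> dict:
        return {"status": "confirmed"}


def run_conversation(spec: Dict[str, Tuple[str, str]],
                     binding: Dict[str, str],
                     service: object,
                     order: dict,
                     start: str = "start",
                     end: str = "done") -> List[Tuple[str, dict]]:
    """Drive a service through a conversation it knows nothing about.

    spec maps a conversation state to (message, next_state); binding maps
    each message name to the method name this particular service exposes.
    """
    state, transcript = start, []
    while state != end:
        message, state = spec[state]
        method = getattr(service, binding[message])
        transcript.append((message, method(order)))
    return transcript


# Conversation definition, kept separate from the service implementation.
SPEC = {"start": ("RequestQuote", "quoted"), "quoted": ("PlaceOrder", "done")}
BINDING = {"RequestQuote": "quote", "PlaceOrder": "accept"}
```

Swapping in a service whose methods have different names only requires a new `BINDING`; neither the service nor `SPEC` changes.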

international conference on management of data | 1990

Implementing recoverable requests using queues

Philip A. Bernstein; Meichun Hsu; Bruce Mann

Transactions have been rigorously defined and extensively studied in the database and transaction processing literature, but little has been said about the handling of requests for transaction execution. In commercial TP systems, especially distributed ones, managing the flow of requests is often as important as executing the transactions themselves. This paper studies fault-tolerant protocols for managing the flow of transaction requests between clients that issue requests and servers that process them. We discuss how to implement these protocols using transactions and recoverable queuing systems. Queuing systems are used to move requests reliably between clients and servers. The protocols use queuing systems to ensure that the server processes each request exactly once and that a client processes each reply at least once. We treat request-reply protocols for single-transaction requests, for multi-transaction requests, and for requests that require interaction with the display after the request is submitted.
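
The exactly-once and at-least-once guarantees above can be illustrated with a toy model. On the server side, dequeuing a request, executing it, and enqueuing the reply form one atomic unit, so a failure before commit puts the request back. On the client side, the reply is processed before it is dequeued, so a failure in between causes reprocessing. `RecoverableQueues`, `server_step`, and `client_step` are hypothetical names, and an in-memory deep-copy snapshot stands in for a real transaction manager and durable queues, so nothing here survives an actual process crash. This is a sketch of the principle, not the paper's protocols.

```python
import copy
from collections import deque
from typing import Callable, Dict


class RecoverableQueues:
    """Toy stand-in for a recoverable queue manager plus server state.

    A transaction either keeps all of its changes or, if its body raises,
    is rolled back to a deep-copied snapshot taken when it started.
    """

    def __init__(self) -> None:
        self.requests: deque = deque()
        self.replies: deque = deque()
        self.balance = 0  # example server state changed by each request

    def transaction(self, body: Callable[[], None]) -> bool:
        snapshot = copy.deepcopy((self.requests, self.replies, self.balance))
        try:
            body()
            return True  # commit
        except Exception:
            # Abort: the dequeued request goes back and the state update is undone.
            self.requests, self.replies, self.balance = snapshot
            return False


def server_step(q: RecoverableQueues, crash_before_commit: bool = False) -> bool:
    """Dequeue a request, execute it and enqueue its reply in ONE transaction."""
    def body() -> None:
        req_id, amount = q.requests.popleft()
        q.balance += amount
        if crash_before_commit:
            raise RuntimeError("simulated server failure")
        q.replies.append((req_id, q.balance))
    return q.transaction(body)


def client_step(q: RecoverableQueues, handled: Dict[str, int]) -> None:
    """Process the oldest reply, then dequeue it.

    A crash between the two steps would redeliver the reply (at least once),
    so handling is keyed by request id to make the repeat harmless.
    """
    req_id, result = q.replies[0]
    handled.setdefault(req_id, result)
    q.replies.popleft()
```

A simulated failure leaves the request queued and the balance untouched; the retry then executes it exactly once.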

international conference on data engineering | 2001

Inter-enterprise collaborative business process management

Qiming Chen; Meichun Hsu

Conventional workflow systems are primarily designed for intra-enterprise process management, and they are rarely used to handle processes whose tasks and data are separated by enterprise boundaries, for reasons such as security, privacy, sharability, and firewalls. Further, cooperation among multiple enterprises is often based on peer-to-peer interactions rather than centralized coordination. As a result, the conventional centralized process management architecture does not fit inter-enterprise business-to-business e-commerce. We have developed a Collaborative Process Manager (CPM) to support decentralized, peer-to-peer process management for inter-enterprise collaboration at the business process level. A collaborative process is handled not by a centralized workflow engine but by multiple CPMs, each representing a player in the business process. Each CPM schedules, dispatches, and controls the tasks of the process that its player is responsible for, and the CPMs interoperate through an inter-CPM messaging protocol. We have implemented CPM and embedded it into a dynamic software agent architecture, E-Carry, that we developed at HP Labs, to elevate multi-agent cooperation from the conversation level to the process level for mediating e-commerce applications.

Information Visualization | 2002

Pixel bar charts: a visualization technique for very large multi-attribute data sets

Daniel A. Keim; Ming C. Hao; Umeshwar Dayal; Meichun Hsu

Simple presentation graphics are intuitive and easy to use, but they show only highly aggregated data, presenting only a very small number of data values (as in the case of bar charts), and may have a high degree of overlap, occluding a significant portion of the data values (as in the case of x-y plots). In this article, the authors therefore propose a generalization of traditional bar charts and x-y plots that allows the visualization of large amounts of data. The basic idea is to use the pixels within the bars to present detailed information about the data records. The so-called pixel bar charts retain the intuitiveness of traditional bar charts while allowing very large data sets to be visualized effectively. It is shown that effective pixel placement requires solving a complex optimization problem. The authors then present an algorithm that solves this problem efficiently. Application to a number of real-world e-commerce data sets shows the wide applicability and usefulness of this new idea, and a comparison with other well-known visualization techniques (parallel coordinates and spiral techniques) shows a number of clear advantages.
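
The core layout idea, one bar per category with one pixel per record, can be sketched as follows. This is a simplified, hypothetical version: `pixel_bar_layout` and its parameters are illustrative names. Records are ordered by a single attribute and packed row by row into a fixed bar width, whereas the paper orders pixels along both axes and solves an optimization problem for placement, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Any, Dict, List


def pixel_bar_layout(records: List[Dict[str, Any]],
                     bar_attr: str,
                     order_attr: str,
                     color_attr: str,
                     width: int) -> Dict[Any, List[List[Any]]]:
    """Return, for each bar, rows of pixel values (row 0 is the bottom of the bar).

    Each record becomes exactly one pixel whose value (e.g. later mapped to a
    color) comes from color_attr. Pixels are filled left to right, bottom to
    top, in ascending order of order_attr.
    """
    bars = defaultdict(list)
    for record in records:
        bars[record[bar_attr]].append(record)
    layout = {}
    for key, members in bars.items():
        members.sort(key=lambda r: r[order_attr])
        values = [r[color_attr] for r in members]
        layout[key] = [values[i:i + width] for i in range(0, len(values), width)]
    return layout
```

With a fixed width, each bar's height is the number of records divided by the width (rounded up), so bar heights still behave like an ordinary count bar chart while every record stays visible.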

adaptive agents and multi-agents systems | 2000

Multi-agent cooperation, dynamic workflow and XML for e-commerce automation

Qiming Chen; Meichun Hsu; Umeshwar Dayal; Martin L. Griss

E-Commerce is a distributed computing environment with dynamic relationships among a large number of autonomous service requesters, brokers, and providers. Agents with predefined functions but without the ability to modify their behavior dynamically may be too limited for properly mediating E-Commerce applications, since they cannot switch roles or adjust their behavior to participate in dynamically formed partnerships. We have developed a Java-based dynamic agent infrastructure for E-Commerce automation that supports dynamic behavior modification of agents, a significant difference from other agent platforms. Using dynamic agents, we have developed mechanisms for plugging in workflow and multi-agent cooperation, and for supporting dynamic workflow service provisioning, which allows workflow services to be constructed on the fly. We chose XML as our agent communication message format. Since different problem domains have different ontologies, we allow agents to communicate with domain-specific performatives and to act using corresponding interpreters. Dynamic agents can carry, switch, and exchange interpreters. Our approach enables document-driven agent cooperation and DTD-based program generation, and further allows agents to exchange and share ontologies for multiple or even dynamic domains. In this way, the cooperation of dynamic agents supports plug-and-play commerce, mediating businesses that are built on one another's services. A prototype has been developed at HP Labs.
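
The interpreter-switching idea can be shown with a small agent that dispatches XML messages to per-domain interpreters it can replace at run time. This sketch is not the HP Labs Java infrastructure: `DynamicAgent`, the `<message domain="...">` layout, and the performative/content elements are hypothetical, and interpreters are plain Python callables.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

Interpreter = Callable[[str, str], str]


class DynamicAgent:
    """Agent whose behavior is supplied by interpreters swappable at run time."""

    def __init__(self) -> None:
        self.interpreters: Dict[str, Interpreter] = {}

    def load_interpreter(self, domain: str, interpreter: Interpreter) -> None:
        # Installing or replacing an interpreter changes behavior without
        # restarting the agent.
        self.interpreters[domain] = interpreter

    def receive(self, xml_text: str) -> str:
        # Hypothetical message layout:
        # <message domain="..."><performative>...</performative><content>...</content></message>
        msg = ET.fromstring(xml_text)
        domain = msg.get("domain")
        performative = msg.findtext("performative")
        content = msg.findtext("content")
        if domain not in self.interpreters:
            raise LookupError(f"no interpreter loaded for domain {domain!r}")
        return self.interpreters[domain](performative, content)
```

Loading a new interpreter for the same domain changes how the next message is handled; the agent object itself is never rebuilt.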

TSDM '00 Proceedings of the First International Workshop on Temporal, Spatial, and Spatio-Temporal Data Mining-Revised Papers | 2000

K-Harmonic Means - A Spatial Clustering Algorithm with Boosting

Bin Zhang; Meichun Hsu; Umeshwar Dayal

We propose a new center-based iterative clustering algorithm, K-Harmonic Means (KHM), which, as a set of experiments demonstrates, is essentially insensitive to the initialization of the centers. The dependence of K-Means performance on the initialization of the centers has been a major problem; a similar issue exists for an alternative algorithm, Expectation Maximization (EM). Many have tried to generate good initializations to solve this sensitivity problem. KHM addresses the intrinsic problem by replacing the minimum distance from a data point to the centers, used in K-Means, with the harmonic average of the distances from the data point to all centers. KHM significantly improves the quality of clustering results compared with both K-Means and EM. The KHM algorithm has been implemented in both sequential and parallel languages and tested on hundreds of randomly generated datasets with different data distributions and clustering characteristics.
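
The harmonic-average objective can be turned into an iterative center update. Pure Python is used below. The objective (a harmonic average over all centers instead of a minimum) follows the abstract. Two further details come from the KHM literature rather than from this abstract: the power p applied to the distances, and the closed-form update obtained by setting the objective's gradient to zero, in which each center becomes a weighted mean of all points with weight d_ik^-(p+2) / (sum_l d_il^-p)^2. The default p = 3.5 is an assumption. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Clamp at a tiny positive value so d ** -p stays finite when a point
    # coincides with a center.
    return max(math.dist(a, b), 1e-8)


def khm_objective(points: List[Point], centers: List[Point], p: float = 3.5) -> float:
    """Sum over points of the harmonic average of d(x, c_l) ** p over all K centers."""
    k = len(centers)
    return sum(k / sum(_dist(x, c) ** -p for c in centers) for x in points)


def khm_step(points: List[Point], centers: List[Point], p: float = 3.5) -> List[Point]:
    """One KHM update: every center moves to a weighted mean of ALL points."""
    dim = len(points[0])
    num = [[0.0] * dim for _ in centers]
    den = [0.0] * len(centers)
    for x in points:
        d = [_dist(x, c) for c in centers]
        s = sum(di ** -p for di in d)
        for k, dk in enumerate(d):
            q = dk ** (-p - 2) / s ** 2  # combined membership * point weight
            den[k] += q
            for j in range(dim):
                num[k][j] += q * x[j]
    return [tuple(n / dk for n in nk) for nk, dk in zip(num, den)]


def k_harmonic_means(points: List[Point], centers: List[Point],
                     p: float = 3.5, iters: int = 100) -> List[Point]:
    for _ in range(iters):
        centers = khm_step(points, centers, p)
    return centers
```

In this weighting, a point that is far from every center receives a large combined weight, which is one way to see why badly placed centers are pulled toward data they do not yet cover.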

conference on information and knowledge management | 2013

Discovering coherent topics using general knowledge

Zhiyuan Chen; Arjun Mukherjee; Bing Liu; Meichun Hsu; Malu Castellanos; Riddhiman Ghosh

Topic models have been widely used to discover latent topics in text documents. However, they may produce topics that are not interpretable for an application. Researchers have proposed incorporating prior domain knowledge into topic models to help produce coherent topics. The knowledge used in existing models is typically domain-dependent and assumed to be correct. However, one key weakness of this knowledge-based approach is that it requires the user to know the domain very well and to be able to provide knowledge suitable for the domain, which is not always the case because, in most real-life applications, the user wants to find what they do not know. In this paper, we propose a framework to leverage general knowledge in topic models. Such knowledge is domain-independent. Specifically, we use one form of general knowledge, namely lexical semantic relations of words such as synonyms, antonyms, and adjective attributes, to help produce more coherent topics. However, there is a major obstacle: a word can have multiple meanings (senses), and each meaning often has a different set of synonyms and antonyms. Not every meaning is suitable or correct for a domain, and wrong knowledge can result in poor-quality topics. To deal with wrong knowledge, we propose a new model, called GK-LDA, which is able to effectively exploit the knowledge of lexical relations in dictionaries. To the best of our knowledge, GK-LDA is the first such model that can incorporate domain-independent knowledge. Our experiments using online product reviews show that GK-LDA performs significantly better than existing state-of-the-art models.

extending database technology | 2011

Experience in Continuous analytics as a Service (CaaaS)

Qiming Chen; Meichun Hsu; Hans Zeller

Mobile applications, such as those on WebOS, increasingly depend on continuous analytics results of real-time events, for monitoring oil and gas production, watching traffic status, detecting accidents, and so on, which has given rise to the need to provide Continuous analytics as a Service (CaaaS). While representing a paradigm shift in cloud computing, CaaaS poses several challenges in scalability, latency, time-window semantics, transaction control, and result-set staging. A data stream is infinite and thus can only be analyzed in granules. We propose a continuous query model over both static relations and dynamic streaming data, which allows a long-standing SQL query instance to run cycle by cycle, each cycle processing a chunk of data from the data stream, using a cut-and-rewind mechanism. We further support the cycle-based transaction model with cycle-based isolation and visibility, for delivering analytics results to clients continuously while the query is running. To stage the continuously generated analytics results efficiently, we developed the table-ring and label-switching mechanisms, which stage data through metadata manipulation without physically moving or copying data. To scale out analytics computation, we support both parallel-database-based and network-distributed Map-Reduce-based infrastructures with multiple cooperating engines. We have built the proposed infrastructure by extending the PostgreSQL engine. We tested the throughput and latency of this service on a well-known stream processing benchmark; the results show that the proposed approach is highly competitive. Our experiments indicate that database technology can be extended and applied to real-time continuous analytics service provisioning.
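
The cycle-by-cycle execution model can be imitated in a few lines: one long-lived query consumes a timestamp-ordered stream in fixed time windows and emits a result per cycle. This is only a sketch of the idea. `cycle_based_query` is a hypothetical helper, and tumbling time windows are an assumed chunking policy. The cut-and-rewind mechanism, cycle-based transactions, and table-ring staging inside the extended PostgreSQL engine are not modeled.

```python
from itertools import groupby
from typing import Callable, Iterable, Iterator, List, Tuple


def cycle_based_query(events: Iterable[Tuple[float, float]],
                      window: float,
                      query: Callable[[List[float]], float]) -> Iterator[Tuple[int, float]]:
    """Run one long-lived query cycle by cycle over a timestamp-ordered stream.

    Cycle k covers the time window [k * window, (k + 1) * window). Its result
    is yielded once the first event of a later window arrives (or the stream
    ends). Windows with no events produce no cycle in this sketch.
    """
    for cycle, chunk in groupby(events, key=lambda e: int(e[0] // window)):
        yield cycle, query([value for _, value in chunk])
```

Because the function is a generator, a consumer receives each cycle's result while the stream is still being read, loosely mirroring the idea of delivering results continuously while the query runs.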
