Duen Horng Chau | Researchain

Publication

Featured research published by Duen Horng Chau.

international world wide web conferences | 2007

Netprobe: a fast and scalable system for fraud detection in online auction networks

Shashank Pandit; Duen Horng Chau; Samuel Wang; Christos Faloutsos

Given a large network of online auction users and their transaction histories, how can we spot anomalies and auction fraud? This paper describes the design and implementation of NetProbe, a system that we propose for solving this problem. NetProbe models auction users and transactions as a Markov Random Field tuned to detect the suspicious patterns that fraudsters create, and employs a Belief Propagation mechanism to detect likely fraudsters. Our experiments show that NetProbe is both efficient and effective for fraud detection. We report experiments on synthetic graphs with as many as 7,000 nodes and 30,000 edges, where NetProbe was able to spot fraudulent nodes with over 90% precision and recall, within a matter of seconds. We also report experiments on a real dataset crawled from eBay, with nearly 700,000 transactions between more than 66,000 users, where NetProbe was highly effective at unearthing hidden networks of fraudsters, within a realistic response time of about 6 minutes. For scenarios where the underlying data is dynamic in nature, we propose Incremental NetProbe, which is an approximate, but fast, variant of NetProbe. Our experiments show that Incremental NetProbe executes nearly twice as fast as NetProbe, while retaining over 99% of its accuracy.

human factors in computing systems | 2011

Apolo: making sense of large network data by combining rich user interaction and machine learning

Duen Horng Chau; Aniket Kittur; Jason I. Hong; Christos Faloutsos

Extracting useful knowledge from large network datasets has become a fundamental challenge in many domains, from scientific literature to social networks and the web. We introduce Apolo, a system that uses a mixed-initiative approach - combining visualization, rich user interaction and machine learning - to guide the user to incrementally and interactively explore large network data and make sense of it. Apolo engages the user in bottom-up sensemaking to gradually build up an understanding over time by starting small, rather than starting big and drilling down. Apolo also helps users find relevant information by letting them specify exemplars, and then using a machine learning method called Belief Propagation to infer which other nodes may be of interest. We evaluated Apolo with twelve participants in a between-subjects study, with the task being to find relevant new papers to update an existing survey paper. As rated by expert judges, participants using Apolo found significantly more relevant papers. Subjective feedback on Apolo was also very positive.

symposium on visual languages and human-centric computing | 2006

A Linguistic Analysis of How People Describe Software Problems

Andrew J. Ko; Brad A. Myers; Duen Horng Chau

There is little understanding of how people describe software problems, but a variety of tools solicit, manage, and analyze these descriptions in order to streamline software development. To inform the design of these tools and generate ideas for new ones, a study of nearly 200,000 bug report titles was performed. The titles of the reports generally described a software entity or behavior, its inadequacy, and an execution context, suggesting new designs for more structured report forms. About 95% of noun phrases referred to visible software entities, physical devices, or user actions, suggesting the feasibility of allowing users to select these entities in debuggers and other tools. Also, the structure of the titles exhibited sufficient regularity to parse with an accuracy of 89%, enabling a number of new automated analyses. These findings and others have many implications for tool design and software engineering.

european conference on principles of data mining and knowledge discovery | 2006

Detecting fraudulent personalities in networks of online auctioneers

Duen Horng Chau; Shashank Pandit; Christos Faloutsos

Online auctions have gained immense popularity by creating an accessible environment for exchanging goods at reasonable prices. Not surprisingly, malevolent auction users try to abuse them by cheating others. In this paper we propose a novel method, 2-Level Fraud Spotting (2LFS), to model the techniques that fraudsters typically use to carry out fraudulent activities, and to detect fraudsters preemptively. Our key contributions are: (a) we mine user-level features (e.g., number of transactions, average price of goods exchanged, etc.) to get an initial belief for spotting fraudsters, (b) we introduce network-level features which capture the interactions between different users, and (c) we show how to combine both kinds of features using a Belief Propagation algorithm over a Markov Random Field, and use it to detect suspicious patterns (e.g., unnaturally close-knit groups of people that trade mainly among themselves). Our algorithm scales linearly with the number of graph edges. Moreover, we illustrate the effectiveness of our algorithm on a real dataset collected from a large online auction site.
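
As a rough illustration of contribution (c), the following Python sketch runs plain loopy belief propagation on a tiny pairwise Markov Random Field. It is not the 2LFS implementation: the two states, the `prior` values standing in for user-level initial beliefs, and the compatibility table `psi` standing in for network-level interactions are all assumptions made for this example.

```python
def belief_propagation(edges, prior, psi, iters=20):
    """Loopy belief propagation on an undirected graph (illustrative sketch).

    edges: list of (u, v) pairs.
    prior: dict node -> list of per-state prior probabilities (assumed values).
    psi:   psi[a][b] is the compatibility of neighbouring states a and b
           (an assumed table, not taken from the paper).
    """
    k = len(psi)
    nbrs = {n: [] for n in prior}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    # msg[(u, v)][b]: what node u tells node v about v being in state b.
    msg = {(u, v): [1.0 / k] * k for u in nbrs for v in nbrs[u]}

    for _ in range(iters):
        new = {}
        for (u, v) in msg:
            out = []
            for b in range(k):
                total = 0.0
                for a in range(k):
                    term = prior[u][a] * psi[a][b]
                    # Multiply in messages from all neighbours of u except v.
                    for w in nbrs[u]:
                        if w != v:
                            term *= msg[(w, u)][a]
                    total += term
                out.append(total)
            z = sum(out)
            new[(u, v)] = [x / z for x in out]  # normalise for stability
        msg = new

    # Final belief: prior times all incoming messages, normalised.
    beliefs = {}
    for n in nbrs:
        b = list(prior[n])
        for w in nbrs[n]:
            b = [b[s] * msg[(w, n)][s] for s in range(k)]
        z = sum(b)
        beliefs[n] = [x / z for x in b]
    return beliefs


# Hypothetical chain a - b - c; state 1 means "suspicious".
example = belief_propagation(
    edges=[("a", "b"), ("b", "c")],
    prior={"a": [0.1, 0.9], "b": [0.5, 0.5], "c": [0.5, 0.5]},
    psi=[[0.9, 0.1], [0.1, 0.9]],
)
```

In this toy setup suspicion about `a` spreads to `b` and, more weakly, to `c`. A different `psi` would encode a different interaction pattern. Each iteration touches every edge a constant number of times, which is in keeping with the linear scaling in the number of edges noted above.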

international world wide web conferences | 2007

Parallel crawling for online social networks

Duen Horng Chau; Shashank Pandit; Samuel Wang; Christos Faloutsos

Given a huge online social network, how do we retrieve information from it through crawling? Even better, how do we improve the crawling performance by using parallel crawlers that work independently? In this paper, we present a framework of parallel crawlers for online social networks that uses a centralized queue. To show how this works in practice, we describe our implementation of the crawlers for an online auction website. The crawlers work independently, so the failure of one crawler does not affect the others. The framework ensures that no redundant crawling occurs. Using the crawlers that we built, we visited a total of approximately 11 million auction users, about 66,000 of which were completely crawled.
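
The centralized-queue idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' crawler: threads stand in for independent crawler processes, an in-memory dictionary `graph` stands in for the auction website, and the names `crawl`, `frontier`, and `seen` are hypothetical.

```python
import queue
import threading


def crawl(graph, seeds, num_workers=4):
    """Crawl `graph` (dict: user -> list of neighbouring users) in parallel."""
    frontier = queue.Queue()   # centralized queue shared by all workers
    seen = set(seeds)          # users already enqueued; guarded by `lock`
    lock = threading.Lock()
    crawled = []               # users whose neighbour lists were fetched

    for s in seeds:
        frontier.put(s)

    def worker():
        while True:
            user = frontier.get()
            if user is None:           # sentinel: no more work
                frontier.task_done()
                return
            try:
                # Stand-in for fetching and parsing a user's page.
                neighbours = graph.get(user, [])
                with lock:
                    crawled.append(user)
                    for n in neighbours:
                        if n not in seen:   # enqueue each user at most once
                            seen.add(n)
                            frontier.put(n)
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    frontier.join()            # returns once every queued user is processed
    for _ in threads:
        frontier.put(None)     # tell each worker to stop
    for t in threads:
        t.join()
    return crawled


toy_graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
visited = crawl(toy_graph, ["a"])
```

Because every user passes through the shared `seen` check before being enqueued, no user is fetched twice, and a worker that stops simply leaves the remaining queue entries for the others.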

international conference on data mining | 2010

On the Vulnerability of Large Graphs

Hanghang Tong; B. Aditya Prakash; Charalampos E. Tsourakakis; Tina Eliassi-Rad; Christos Faloutsos; Duen Horng Chau

Given a large graph, like a computer network, which k nodes should we immunize (or monitor, or remove), to make it as robust as possible against a computer virus attack? We need (a) a measure of the ‘Vulnerability’ of a given network, (b) a measure of the ‘Shield-value’ of a specific set of k nodes and (c) a fast algorithm to choose the best such k nodes. We answer all three questions: we give the justification behind our choices, and we show that they agree with intuition as well as recent results in immunology. Moreover, we propose NetShield, a fast and scalable algorithm. Finally, we give experiments on large real graphs, where NetShield achieves tremendous speed savings, exceeding 7 orders of magnitude, against straightforward competitors.

user interface software and technology | 2006

Huddle: automatically generating interfaces for systems of multiple connected appliances

Jeffrey R. Pierce; Jeffrey Nichols; Brandon Rothrock; Duen Horng Chau; Brad A. Myers

Systems of connected appliances, such as home theaters and presentation rooms, are becoming commonplace in our homes and workplaces. These systems are often difficult to use, in part because users must determine how to split the tasks they wish to perform into sub-tasks for each appliance and then find the particular functions of each appliance to complete their sub-tasks. This paper describes Huddle, a new system that automatically generates task-based interfaces for a system of multiple appliances based on models of the content flow within the multi-appliance system.

international conference on data engineering | 2011

Mining large graphs: Algorithms, inference, and discoveries

U Kang; Duen Horng Chau; Christos Faloutsos

How do we find patterns and anomalies on graphs with billions of nodes and edges, which do not fit in memory? How can we use parallelism for such terabyte-scale graphs? In this work, we focus on inference, which often corresponds, intuitively, to “guilt by association” scenarios. For example, if a person is a drug-abuser, that person's friends probably are, too; if a node in a social network is of male gender, his dates are probably females. We show how to do inference on such huge graphs through our proposed HAdoop Line graph Fixed Point (Ha-Lfp), an efficient parallel algorithm for sparse billion-scale graphs, using the Hadoop platform. Our contributions include (a) the design of Ha-Lfp, observing that it corresponds to a fixed point on a line graph induced from the original graph; (b) scalability analysis, showing that our algorithm scales up well with the number of edges, as well as with the number of machines; and (c) experimental results on two private graphs, as well as two of the largest publicly available graphs: the Web Graphs from Yahoo! (6.6 billion edges and 0.24 terabytes), and the Twitter graph (3.7 billion edges and 0.13 terabytes). We evaluated our algorithm using M45, one of the top 50 fastest supercomputers in the world, and we report patterns and anomalies discovered by our algorithm, which would be invisible otherwise.

graphics interface | 2007

Eyes on the road, hands on the wheel: thumb-based interaction techniques for input on steering wheels

Iván E. González; Jacob O. Wobbrock; Duen Horng Chau; Andrew Faulring; Brad A. Myers

The increasing quantity and complexity of in-vehicle systems create a demand for user interfaces that are suited to driving. The steering wheel is a common location for the placement of buttons to control navigation, entertainment, and environmental systems, but what about a small touchpad? To investigate this question, we embedded a Synaptics StampPad in a computer game steering wheel and evaluated seven methods for selecting from a list of over 3000 street names. Selection speed was measured while stationary and while driving a simulator. Results show that the EdgeWrite gestural text entry method is about 20% to 50% faster than selection-based text entry or direct list-selection methods. They also show that methods with slower selection speeds generally resulted in faster driving speeds. However, with EdgeWrite, participants were able to maintain their speed and avoid incidents while selecting and driving at the same time. Although an obvious choice for constrained input, on-screen keyboards generally performed quite poorly.

international conference on acoustics, speech, and signal processing | 2012

Pegasus: Mining billion-scale graphs in the cloud

U Kang; Duen Horng Chau; Christos Faloutsos

We have entered an era of big data. Graphs are now measured in terabytes or even petabytes; analyzing them has become increasingly challenging. How do we find patterns and anomalies in these graphs that no longer fit in memory? How should we exploit parallel computation to boost our analysis capabilities? We present PEGASUS, the first open-source, peta-scale graph mining library, for the HADOOP platform (open-source implementation of MAPREDUCE). By observing that many graph mining operations can be described by repeated matrix-vector multiplications, we devised an important primitive called GIM-V for PEGASUS that applies to all such operations. GIM-V (Generalized Iterative Matrix-Vector multiplication) is highly optimized, achieving (1) good scale-up with the number of machines, (2) run time linear in the number of edges, and (3) performance more than 9 times faster than the non-optimized version. We ran experiments for PEGASUS on M45, one of the largest HADOOP clusters in the world. We report our findings on several real graphs with billions of nodes and edges. Selected findings include (a) the discovery of adult advertisers in the who-follows-whom network on Twitter, and (b) the 7 degrees of separation in the Web graph.
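
To make the "repeated matrix-vector multiplication" observation concrete, here is a single-machine Python sketch of a generalized matrix-vector step. It is an illustration only and says nothing about how PEGASUS implements GIM-V on HADOOP; the function names and the three pluggable operations (`combine2`, `combine_all`, `assign`) are labels chosen for this sketch and are not given in the abstract.

```python
def gim_v(edges, v, combine2, combine_all, assign):
    """One generalized matrix-vector step.

    edges: list of (i, j, m_ij) entries of a sparse matrix.
    v:     current vector, as a list indexed by node id.
    """
    partial = {}
    for i, j, m in edges:
        # Combine one matrix entry with the matching vector entry.
        partial.setdefault(i, []).append(combine2(m, v[j]))
    # Reduce the partial results per row, then update each vector entry.
    return [assign(v[i], combine_all(i, partial.get(i, [])))
            for i in range(len(v))]


def matvec(edges, v):
    """Ordinary matrix-vector product expressed through gim_v."""
    return gim_v(edges, v,
                 combine2=lambda m, vj: m * vj,
                 combine_all=lambda i, xs: sum(xs),
                 assign=lambda old, new: new)


def connected_components(undirected_edges, n):
    """Label each node with the smallest node id in its component."""
    edges = [(i, j, 1) for i, j in undirected_edges]
    edges += [(j, i, 1) for i, j in undirected_edges]
    v = list(range(n))                   # start: every node is its own label
    while True:
        nxt = gim_v(edges, v,
                    combine2=lambda m, vj: vj,            # pass label along
                    combine_all=lambda i, xs: min(xs, default=i),
                    assign=min)                           # keep the smaller
        if nxt == v:                     # fixed point reached
            return v
        v = nxt


product = matvec([(0, 0, 2), (0, 1, 1), (1, 1, 3)], [1, 2])
labels = connected_components([(0, 1), (1, 2), (3, 4)], 5)
```

Swapping the three operations changes the computation while the outer loop over edges stays the same, and each step visits every edge once.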