Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Arthur U. Asuncion is active.

Publication


Featured research published by Arthur U. Asuncion.


Knowledge Discovery and Data Mining | 2008

Fast collapsed Gibbs sampling for latent Dirichlet allocation

Ian Porteous; David Newman; Alexander T. Ihler; Arthur U. Asuncion; Padhraic Smyth; Max Welling

In this paper we introduce a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method results in significant speedups on real-world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample, where K is the number of topics in the model. Our proposed method draws equivalent samples but requires, on average, significantly fewer than K operations per sample. On real-world corpora, FastLDA can be as much as 8 times faster than the standard collapsed Gibbs sampler for LDA. No approximations are necessary, and we show that our fast sampling scheme produces exactly the same results as the standard (but slower) sampling scheme. Experiments on four real-world data sets demonstrate speedups for a wide range of collection sizes. For the PubMed collection of over 8 million documents, with a required computation time of 6 CPU months for LDA, our speedup of 5.7 can save 5 CPU months of computation.
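The conventional O(K) sampler that FastLDA speeds up can be illustrated with a minimal sketch. This is not the authors' FastLDA code; the count arrays (`n_wk`, `n_dk`, `n_k`) and hyperparameters are assumed placeholders for the usual collapsed-Gibbs statistics:

```python
import numpy as np

def gibbs_sample_topic(w, d, n_wk, n_dk, n_k, alpha, beta, V, rng):
    """One conventional collapsed Gibbs update for a single token:
    sample a topic with probability proportional to
    (n_dk + alpha) * (n_wk + beta) / (n_k + V*beta).
    This touches all K topics; FastLDA draws an equivalent sample
    while visiting fewer than K topics on average."""
    K = n_k.shape[0]
    p = (n_dk[d] + alpha) * (n_wk[w] + beta) / (n_k + V * beta)
    p /= p.sum()                      # normalize the length-K vector
    return rng.choice(K, p=p)         # O(K) work per token
```

The key point the paper exploits is that this per-token distribution is typically dominated by a few topics, so an exact sample can often be drawn without evaluating all K terms.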


International Conference on Software Engineering | 2010

Software traceability with topic modeling

Hazeline U. Asuncion; Arthur U. Asuncion; Richard N. Taylor

Software traceability is a fundamentally important task in software engineering. The need for automated traceability increases as projects become more complex and as the number of artifacts increases. We propose an automated technique that combines traceability with a machine learning technique known as topic modeling. Our approach automatically records traceability links during the software development process and learns a probabilistic topic model over artifacts. The learned model allows for the semantic categorization of artifacts and the topical visualization of the software system. To test our approach, we have implemented several tools: an artifact search tool combining keyword-based search and topic modeling, a recording tool that performs prospective traceability, and a visualization tool that allows one to navigate the software architecture and view semantic topics associated with relevant artifacts and architectural components. We apply our approach to several data sets and discuss how topic modeling enhances software traceability, and vice versa.
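The core idea of topic modeling over software artifacts can be sketched with off-the-shelf tools. This is not the authors' tooling; the artifact texts below are made up, and scikit-learn's LDA stands in for whatever model the system actually uses:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical artifact texts (requirements, code comments, docs, ...)
artifacts = [
    "user login authentication password session",
    "database query index transaction rollback",
    "render widget layout button click event",
    "password hash salt login security",
]

# Fit a small topic model over the artifact corpus.
X = CountVectorizer().fit_transform(artifacts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)  # per-artifact topic mixtures

# Each row of theta is a probability vector over topics; artifacts can
# be semantically categorized by their dominant topic, which is the
# kind of grouping the traceability and visualization tools build on.
```

Artifacts whose topic mixtures are similar (e.g. the two password-related texts above) end up near each other in topic space, which is what makes topical search and visualization useful for tracing links between artifacts.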


ACM Transactions on Intelligent Systems and Technology | 2012

TopicNets: Visual Analysis of Large Text Corpora with Topic Modeling

Brynjar Gretarsson; John O’Donovan; Svetlin Bostandjiev; Tobias Höllerer; Arthur U. Asuncion; David Newman; Padhraic Smyth

We present TopicNets, a Web-based system for visual and interactive analysis of large sets of documents using statistical topic models. A range of visualization types and control mechanisms to support knowledge discovery are presented. These include corpus- and document-specific views, iterative topic modeling, search, and visual filtering. Drill-down functionality is provided to allow analysts to visualize individual document sections and their relations within the global topic space. Analysts can search across a dataset through a set of expansion techniques on selected document and topic nodes. Furthermore, analysts can select relevant subsets of documents and perform real-time topic modeling on these subsets to interactively visualize topics at various levels of granularity, allowing for a better understanding of the documents. A discussion of the design and implementation choices for each visual analysis technique is presented. This is followed by a discussion of three diverse use cases in which TopicNets enables fast discovery of information that is otherwise hard to find. These include a corpus of 50,000 successful NSF grant proposals, 10,000 publications from a large research center, and single documents including a grant proposal and a PhD thesis.


Workshop on Privacy in the Electronic Society | 2010

Turning privacy leaks into floods: surreptitious discovery of social network friendships and other sensitive binary attribute vectors

Arthur U. Asuncion; Michael T. Goodrich

We study methods for attacking the privacy of social networking sites, collaborative filtering sites, databases of genetic signatures, and other data sets that can be represented as vectors of binary relationships. Our methods are based on reductions to nonadaptive group testing, which implies that our methods can exploit a minimal amount of privacy leakage, such as contained in a single bit that indicates if two people in a social network have a friend in common or not. We analyze our methods for turning such privacy leaks into floods using theoretical characterizations as well as experimental tests. Our empirical analyses are based on experiments involving privacy attacks on the social networking sites Facebook and LiveJournal, a database of mitochondrial DNA, a power grid network, and the movie-rating database released as a part of the Netflix Prize contest. For instance, with respect to Facebook, our analysis shows that it is effectively possible to break the privacy of members who restrict their friends lists to friends-of-friends.
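The group-testing reduction at the heart of these attacks can be illustrated with the classic COMP decoder on a toy instance. This is a hedged sketch, not the paper's attack algorithms; the pools and data below are made up:

```python
import numpy as np

def comp_decode(pools, results, n):
    """COMP decoding for noiseless nonadaptive group testing: any item
    appearing in a pool that tested negative cannot be positive;
    everything not so eliminated is declared positive."""
    x_hat = np.ones(n, dtype=int)
    for pool, positive in zip(pools, results):
        if not positive:
            x_hat[list(pool)] = 0
    return x_hat

# Hypothetical toy instance: item 0 is the only positive bit
# (think of one hidden friendship an attacker wants to discover).
x = np.array([1, 0, 0, 0])
pools = [{0, 1}, {2, 3}, {0, 2}, {1, 3}]       # fixed (nonadaptive) queries
results = [bool(x[list(p)].any()) for p in pools]  # each query leaks one OR bit
```

Each query leaks only a single bit (whether any item in the pool is positive), yet four such bits pin down the sparse vector exactly, which is the sense in which a small leak can be "turned into a flood".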


IEEE Transactions on Knowledge and Data Engineering | 2013

Nonadaptive Mastermind Algorithms for String and Vector Databases, with Case Studies

Arthur U. Asuncion; Michael T. Goodrich

In this paper, we study sparsity-exploiting Mastermind algorithms for attacking the privacy of an entire database of character strings or vectors, such as DNA strings, movie ratings, or social network friendship data. Based on reductions to nonadaptive group testing, our methods are able to take advantage of minimal amounts of privacy leakage, such as contained in a single bit that indicates if two people in a medical database have any common genetic mutations, or if two people have any common friends in an online social network. We analyze our Mastermind attack algorithms using theoretical characterizations that provide sublinear bounds on the number of queries needed to clone the database, as well as experimental tests on genomic information, collaborative filtering data, and online social networks. By taking advantage of the generally sparse nature of these real-world databases and modulating a parameter that controls query sparsity, we demonstrate that relatively few nonadaptive queries are needed to recover a large majority of each database.


IEICE Transactions on Information and Systems | 2006

Toward Incremental Parallelization Using Navigational Programming

*The authors gratefully acknowledge the support of a U.S. Department of Education GAANN Fellowship.

Lei Pan; Wenhui Zhang; Arthur U. Asuncion; Ming Kin Lai; Michael B. Dillencourt; Lubomir Bic; Laurence T. Yang

The Navigational Programming (NavP) methodology is based on the principle of self-migrating computations. It is a truly incremental methodology for developing parallel programs: each step represents a functioning program, and each intermediate program is an improvement over its predecessor. The transformations are mechanical and straightforward to apply. We illustrate our methodology in the context of matrix multiplication, showing how the transformations lead from a sequential program to a fully parallel program. The NavP methodology is conducive to new ways of thinking that lead to ease of programming and high performance. Even though our parallel algorithm was derived using a sequence of mechanical transformations, it displays certain performance advantages over the classical handcrafted Gentleman's algorithm.
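The incremental principle (every transformation step is itself a working program) can be illustrated for matrix multiplication with an ordinary shared-memory sketch. This is not NavP's self-migrating computation model, just a minimal analogue of refining a sequential program into independent parallel tasks:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def matmul_seq(A, B):
    """Step 0: the plain sequential multiply (a working program)."""
    return A @ B

def matmul_rowblocks(A, B, n_blocks=2):
    """Step 1: the same computation split into independent row-block
    tasks that can run in parallel. Each intermediate version computes
    the same result as its predecessor, mirroring the incremental
    discipline; NavP itself would instead migrate the computation to
    where the data resides."""
    blocks = np.array_split(np.arange(A.shape[0]), n_blocks)
    with ThreadPoolExecutor() as ex:
        parts = list(ex.map(lambda idx: A[idx] @ B, blocks))
    return np.vstack(parts)
```

The point of the discipline is that each refinement can be tested against the previous version before the next transformation is applied, so correctness never has to be re-established from scratch.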


International Conference on Parallel Processing | 2005

Incremental parallelization using navigational programming: a case study

Lei Pan; Wenhui Zhang; Arthur U. Asuncion; Ming Kin Lai; Michael B. Dillencourt; Lubomir Bic

We show how a series of transformations can be applied to incrementally parallelize sequential programs. Our navigational programming (NavP) methodology is based on the principle of self-migrating computations and is truly incremental, in that each step represents a functioning program and every intermediate program is an improvement over its predecessor. The transformations are mechanical and straightforward to apply. We illustrate our methodology in the context of matrix multiplication. Our final stage is similar to the classical Gentleman's algorithm. The NavP methodology is conducive to new ways of thinking that lead to ease of programming and high performance.


Archive | 2007

UCI Machine Learning Repository

Arthur U. Asuncion; David J. Newman


Uncertainty in Artificial Intelligence | 2009

On smoothing and inference for topic models

Arthur U. Asuncion; Max Welling; Padhraic Smyth; Yee Whye Teh


Journal of Machine Learning Research | 2009

Distributed Algorithms for Topic Models

David Newman; Arthur U. Asuncion; Padhraic Smyth; Max Welling

Collaboration


Dive into Arthur U. Asuncion's collaborations.

Top Co-Authors

Padhraic Smyth, University of California

Max Welling, University of Amsterdam

David Newman, University of California

David R. Hunter, Pennsylvania State University

Ian Porteous, University of California

Duy Vu, University of Melbourne