Arthur Choi

University of California, Los Angeles

Publication

Featured research published by Arthur Choi.

Bioinformatics | 2010

Optimal algorithms for haplotype assembly from whole-genome sequence data

Dan He; Arthur Choi; Knot Pipatsrisawat; Adnan Darwiche; Eleazar Eskin

Motivation: Haplotype inference is an important step for many types of analyses of genetic variation in the human genome. Traditional approaches for obtaining haplotypes involve collecting genotype information from a population of individuals and then applying a haplotype inference algorithm. The development of high-throughput sequencing technologies allows for an alternative strategy to obtain haplotypes by combining sequence fragments. The problem of ‘haplotype assembly’ is the problem of assembling the two haplotypes for a chromosome given the collection of such fragments, or reads, and their locations in the haplotypes, which are pre-determined by mapping the reads to a reference genome. Errors in reads significantly increase the difficulty of the problem, and it has been shown that the problem is NP-hard even for reads of length 2. Existing greedy and stochastic algorithms are not guaranteed to find the optimal solutions for the haplotype assembly problem. Results: In this article, we propose a dynamic programming algorithm that is able to assemble the haplotypes optimally with time complexity O(m × 2^k × n), where m is the number of reads, k is the length of the longest read and n is the total number of SNPs in the haplotypes. We also reduce the haplotype assembly problem to the maximum satisfiability problem, which can often be solved optimally even when k is large. Taking advantage of the efficiency of our algorithm, we perform simulation experiments demonstrating that the assembly of haplotypes using reads of length typical of the current sequencing technologies is not practical. However, we demonstrate that the combination of this approach and the traditional haplotype phasing approaches allows us to practically construct haplotypes containing both common and rare variants. Contact: [email protected]
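
To make the assembly problem concrete, here is a minimal brute-force sketch. It assumes the minimum error correction objective (the fewest read entries that must be flipped so that every read agrees with one of the two haplotypes), which the abstract does not name, and it assumes every SNP is heterozygous, so the second haplotype is the complement of the first. The function names and the example reads are hypothetical. The search is exponential in the number of SNPs, so it is neither the dynamic programming algorithm nor the maximum satisfiability reduction described above.

```python
from itertools import product

def mec_cost(haplotype, reads):
    """Count the read entries that must be flipped so that every read
    matches either the haplotype or its complement."""
    complement = [1 - allele for allele in haplotype]
    cost = 0
    for read in reads:  # each read maps SNP index -> observed allele (0 or 1)
        mismatches = sum(haplotype[i] != a for i, a in read.items())
        mismatches_c = sum(complement[i] != a for i, a in read.items())
        cost += min(mismatches, mismatches_c)
    return cost

def assemble_brute_force(reads, n_snps):
    """Try all 2^n haplotypes and return one with the lowest cost."""
    best = min(product([0, 1], repeat=n_snps), key=lambda h: mec_cost(h, reads))
    return list(best), mec_cost(best, reads)

# Hypothetical reads over 4 SNPs; the last read conflicts with the third.
reads = [
    {0: 0, 1: 0, 2: 1},
    {1: 1, 2: 0, 3: 0},
    {2: 1, 3: 1},
    {0: 1, 1: 1},
    {1: 0, 2: 1, 3: 0},
]
best_haplotype, best_cost = assemble_brute_force(reads, n_snps=4)
```

Several haplotypes can share the lowest cost (a haplotype and its complement always do), so only the cost is uniquely determined.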

Innovations in Systems and Software Engineering | 2013

Software health management with Bayesian networks

Johann Schumann; Timmy Mbaya; Ole J. Mengshoel; Knot Pipatsrisawat; Ashok N. Srivastava; Arthur Choi; Adnan Darwiche

Software health management (SWHM) is an emerging field which addresses the critical need to detect, diagnose, predict, and mitigate adverse events due to software faults and failures. These faults could arise for numerous reasons including coding errors, unanticipated faults or failures in hardware, or problematic interactions with the external environment. This paper demonstrates a novel approach to software health management based on a rigorous Bayesian formulation that monitors the behavior of software and operating system, performs probabilistic diagnosis, and provides information about the most likely root causes of a failure or software problem. Translation of the Bayesian network model into an efficient data structure, an arithmetic circuit, makes it possible to perform SWHM on resource-restricted embedded computing platforms as found in aircraft, unmanned aircraft, or satellites. SWHM is especially important for safety critical systems such as aircraft control systems. In this paper, we demonstrate our Bayesian SWHM system on three realistic scenarios from an aircraft control system: (1) aircraft file-system based faults, (2) signal handling faults, and (3) navigation faults due to inertial measurement unit (IMU) failure or compromised Global Positioning System (GPS) integrity. We show that the method successfully detects and diagnoses faults in these scenarios. We also discuss the importance of verification and validation of SWHM systems.

European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty | 2013

Compiling probabilistic graphical models using sentential decision diagrams

Arthur Choi; Doga Kisa; Adnan Darwiche

Knowledge compilation is a powerful approach to exact inference in probabilistic graphical models, which is able to effectively exploit determinism and context-specific independence, allowing it to scale to highly connected models that are otherwise infeasible using more traditional methods (based on treewidth alone). Previous approaches were based on performing two steps: encode a model into CNF, then compile the CNF into an equivalent but more tractable representation (d-DNNF), where exact inference reduces to weighted model counting. In this paper, we investigate a bottom-up approach, that is enabled by a recently proposed representation, the Sentential Decision Diagram (SDD). We describe a novel and efficient way to encode the factors of a given model directly to SDDs, bypassing the CNF representation. To compile a given model, it now suffices to conjoin the SDD representations of its factors, using an apply operator, which d-DNNFs lack. Empirically, we find that our simpler approach to knowledge compilation is as effective as those based on d-DNNFs, and at times, orders-of-magnitude faster.
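
The reduction of exact inference to weighted model counting can be illustrated by brute force. This is a minimal sketch, assuming a hypothetical two-variable network A → B with made-up parameters. It enumerates every assignment directly, so it shows what is being computed, not how a compiled d-DNNF or SDD computes it.

```python
from itertools import product

# Hypothetical network A -> B; all numbers are illustrative assumptions.
P_A = {True: 0.3, False: 0.7}
P_B_GIVEN_A = {  # keys are (a, b)
    (True, True): 0.9, (True, False): 0.1,
    (False, True): 0.2, (False, False): 0.8,
}

def weight(a, b):
    """Weight of one complete assignment: the product of its factor entries."""
    return P_A[a] * P_B_GIVEN_A[(a, b)]

def weighted_model_count(evidence):
    """Sum the weights of all assignments consistent with the evidence."""
    total = 0.0
    for a, b in product([True, False], repeat=2):
        world = {"A": a, "B": b}
        if all(world[var] == val for var, val in evidence.items()):
            total += weight(a, b)
    return total

p_b = weighted_model_count({"B": True})              # Pr(B = true)
p_ab = weighted_model_count({"A": True, "B": True})  # Pr(A = true, B = true)
posterior = p_ab / p_b                               # Pr(A = true | B = true)
```

A conditional probability is then a ratio of two weighted model counts, which is why a representation that supports efficient counting supports exact inference.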

International Journal of Approximate Reasoning | 2012

Same-decision probability: A confidence measure for threshold-based decisions

Arthur Choi; Yexiang Xue; Adnan Darwiche

We consider in this paper the robustness of decisions based on probabilistic thresholds. To this effect, we propose the same-decision probability as a query that can be used as a confidence measure for threshold-based decisions. More specifically, the same-decision probability is the probability that we would have made the same threshold-based decision, had we known the state of some hidden variables pertaining to our decision. We study a number of properties about the same-decision probability. First, we analyze its computational complexity. We then derive a bound on its value, which we can compute using a variable elimination algorithm that we propose. Finally, we consider decisions based on noisy sensors in particular, showing through examples that the same-decision probability can be used to reason about threshold-based decisions in a more refined way.
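
The definition of the same-decision probability can be sketched by direct enumeration. The example below assumes a hypothetical naive Bayes model with a binary decision variable D and two hidden binary sensors; the parameters, the threshold, and the function names are illustrative assumptions. It enumerates all hidden states, which is only feasible for tiny models and is not the variable elimination bound proposed in the paper.

```python
from itertools import product

# Hypothetical naive Bayes model: decision variable D with two hidden
# binary sensors H1, H2. All numbers are illustrative assumptions.
P_D = 0.7            # Pr(D = true | e): current belief given the evidence e
P_H_GIVEN_D = [      # Pr(H_i = true | D) for each hidden sensor
    {True: 0.9, False: 0.2},
    {True: 0.8, False: 0.3},
]
THRESHOLD = 0.6      # decide "true" when Pr(D = true | ...) >= THRESHOLD

def likelihood(h, d):
    """Pr(h | D = d), with the sensors independent given D."""
    result = 1.0
    for value, cpt in zip(h, P_H_GIVEN_D):
        result *= cpt[d] if value else 1.0 - cpt[d]
    return result

def same_decision_probability():
    """Total probability of the hidden states that leave the decision unchanged."""
    current_decision = P_D >= THRESHOLD
    sdp = 0.0
    for h in product([True, False], repeat=len(P_H_GIVEN_D)):
        joint_true = P_D * likelihood(h, True)            # Pr(D = true, h | e)
        joint_false = (1.0 - P_D) * likelihood(h, False)  # Pr(D = false, h | e)
        p_h = joint_true + joint_false                    # Pr(h | e)
        posterior = joint_true / p_h                      # Pr(D = true | e, h)
        if (posterior >= THRESHOLD) == current_decision:
            sdp += p_h
    return sdp
```

With these numbers the current decision is positive, and it would be kept in exactly the hidden states where the first sensor reads true.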

Principles and Practice of Constraint Programming | 2009

Approximating weighted Max-SAT problems by compensating for relaxations

Arthur Choi; Trevor Scott Standley; Adnan Darwiche

We introduce a new approach to approximating weighted Max-SAT problems that is based on simplifying a given instance, and then tightening the approximation. First, we relax its structure until it is tractable for exact algorithms. Second, we compensate for the relaxation by introducing auxiliary weights. More specifically, we relax equivalence constraints from a given Max-SAT problem, which we compensate for by recovering a weaker notion of equivalence. We provide a simple algorithm for finding these approximations, that is based on iterating over relaxed constraints, compensating for them one-by-one. We show that the resulting Max-SAT instances have certain interesting properties, both theoretical and empirical.

Journal of Artificial Intelligence Research | 2014

Algorithms and applications for the same-decision probability

Suming Jeremiah Chen; Arthur Choi; Adnan Darwiche

When making decisions under uncertainty, the optimal choices are often difficult to discern, especially if not enough information has been gathered. Two key questions in this regard relate to whether one should stop the information gathering process and commit to a decision (stopping criterion), and if not, what information to gather next (selection criterion). In this paper, we show that the recently introduced notion, Same-Decision Probability (SDP), can be useful as both a stopping and a selection criterion, as it can provide additional insight and allow for robust decision making in a variety of scenarios. This query has been shown to be highly intractable, being PP^PP-complete, and is exemplary of a class of queries which correspond to the computation of certain expectations. We propose the first exact algorithm for computing the SDP, and demonstrate its effectiveness on several real and synthetic networks. Finally, we present new complexity results, such as the complexity of computing the SDP on models with a Naive Bayes structure. Additionally, we prove that computing the non-myopic value of information is complete for the same complexity class as computing the SDP.

Artificial Intelligence | 2017

Learning Bayesian network parameters under equivalence constraints

Tiansheng Yao; Arthur Choi; Adnan Darwiche

We propose a principled approach for learning parameters in Bayesian networks from incomplete datasets, where the examples of a dataset are subject to equivalence constraints. These equivalence constraints arise from datasets where examples are tied together, in that we may not know the value of a particular variable, but whatever that value is, we know it must be the same across different examples. We formalize the problem by defining the notion of a constrained dataset and a corresponding constrained likelihood that we seek to optimize. We further propose a new learning algorithm that can effectively learn more accurate Bayesian networks using equivalence constraints, which we demonstrate empirically. Moreover, we highlight how our general approach can be brought to bear on more specialized learning tasks, such as those in semi-supervised clustering and topic modeling, where more domain-specific approaches were previously developed.

Graph Structures for Knowledge Representation and Reasoning | 2015

Learning Bayesian Networks with Non-Decomposable Scores

Eunice Yuh-Jie Chen; Arthur Choi; Adnan Darwiche

Modern approaches for optimally learning Bayesian network structures require decomposable scores. Such approaches include those based on dynamic programming and heuristic search methods. These approaches operate in a search space called the order graph, which has been investigated extensively in recent years. In this paper, we break from this tradition, and show that one can effectively learn structures using non-decomposable scores by exploring a more complex search space that leverages state-of-the-art learning systems based on order graphs. We show how the new search space can be used to learn with priors that are not structure-modular (a particular class of non-decomposable scores). We also show that it can be used to efficiently enumerate the k-best structures, in time that can be up to three orders of magnitude faster, compared to existing approaches.

International Joint Conference on Artificial Intelligence | 2018

A Symbolic Approach to Explaining Bayesian Network Classifiers

Andy Shih; Arthur Choi; Adnan Darwiche

We propose an approach for explaining Bayesian network classifiers, which is based on compiling such classifiers into decision functions that have a tractable and symbolic form. We introduce two types of explanations for why a classifier may have classified an instance positively or negatively and suggest algorithms for computing these explanations. The first type of explanation identifies a minimal set of the currently active features that is responsible for the current classification, while the second type of explanation identifies a minimal set of features whose current state (active or not) is sufficient for the classification. We consider in particular the compilation of Naive and Latent-Tree Bayesian network classifiers into Ordered Decision Diagrams (ODDs), providing a context for evaluating our proposal using case studies and experiments based on classifiers from the literature.
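
The second type of explanation can be sketched by exhaustive search on a toy classifier. The example assumes a hypothetical threshold classifier over three binary features, used here as a stand-in for a compiled decision function; the weights, the bias, and the function names are illustrative assumptions. It checks every completion of the unfixed features instead of compiling the classifier into an ODD, and it returns only the smallest sufficient sets, which are minimal but may not be all of the minimal ones.

```python
from itertools import combinations, product

# Hypothetical classifier: positive when the bias plus the weights of the
# active features is non-negative. All numbers are illustrative assumptions.
WEIGHTS = [2.0, 1.0, -1.0]
BIAS = -0.5

def classify(x):
    return BIAS + sum(w for w, xi in zip(WEIGHTS, x) if xi) >= 0

def is_sufficient(instance, fixed):
    """True if fixing the features in `fixed` to their current state forces
    the current classification, whatever the other features are."""
    target = classify(instance)
    free = [i for i in range(len(instance)) if i not in fixed]
    for values in product([0, 1], repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, values):
            x[i] = v
        if classify(x) != target:
            return False
    return True

def smallest_sufficient_sets(instance):
    """Return every sufficient feature set of the smallest possible size."""
    n = len(instance)
    for size in range(n + 1):
        found = [set(c) for c in combinations(range(n), size)
                 if is_sufficient(instance, set(c))]
        if found:
            return found
    return []

explanations = smallest_sufficient_sets([1, 1, 0])
```

For the instance `[1, 1, 0]`, the first feature being active is enough on its own to force a positive classification, so it is the only smallest sufficient set.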

International Journal of Approximate Reasoning | 2018

On pruning with the MDL Score

Eunice Yuh-Jie Chen; Adnan Darwiche; Arthur Choi

The space of Bayesian network structures is forbiddingly large and hence numerous techniques have been developed to prune this search space, but without eliminating the optimal structure. Such techniques are critical for structure learning to scale to larger datasets with more variables. Prior works exploited properties of the MDL score to prune away large regions of the search space that can be safely ignored by optimal structure learning algorithms. In this paper, we propose new techniques for pruning regions of the search space that can be safely ignored by algorithms that enumerate the k-best Bayesian network structures. Empirically, we show that these techniques allow a state-of-the-art structure enumeration algorithm to scale to datasets with significantly more variables.