Richard Arratia

University of Southern California

Publication

Featured research published by Richard Arratia.

Bulletin of Mathematical Biology | 1984

Pattern recognition in several sequences: Consensus and alignment

Michael S. Waterman; Richard Arratia; D. J. Galas

The comparison of several sequences is central to many problems of molecular biology. Finding consensus patterns that define genetic control regions or that determine structural or functional themes are examples of these problems. Previously proposed methods, such as dynamic programming, are not adequate for solving problems of realistic size. This paper gives a new and practical solution for finding unknown patterns that occur imperfectly above a preset frequency. Algorithms for finding the patterns are given as well as estimates of statistical significance.

Bulletin of Mathematical Biology | 1989

Tutorial on large deviations for the binomial distribution

Richard Arratia

We present, in an easy-to-use form, the large deviation theory of the binomial distribution: how to approximate the probability of $k$ or more successes in $n$ independent trials, each with success probability $p$, when the specified fraction of successes, $a \equiv k/n$, satisfies $0 < p < a < 1$.
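As a rough numerical illustration of the setting above, the sketch below compares the exact upper tail of a binomial with the standard Chernoff-type bound exp(-n H(a, p)), where H is the relative entropy between Bernoulli(a) and Bernoulli(p). The bound is a textbook fact rather than something quoted from the tutorial, and the function names are hypothetical.

```python
import math


def relative_entropy(a: float, p: float) -> float:
    """Relative entropy H(a, p) between Bernoulli(a) and Bernoulli(p), for 0 < p, a < 1."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


def exact_upper_tail(n: int, k: int, p: float) -> float:
    """Exact probability of k or more successes in n independent trials."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def chernoff_bound(n: int, k: int, p: float) -> float:
    """Upper bound exp(-n * H(a, p)) with a = k/n; assumes 0 < p < a < 1."""
    a = k / n
    if not 0 < p < a < 1:
        raise ValueError("requires 0 < p < a = k/n < 1")
    return math.exp(-n * relative_entropy(a, p))


if __name__ == "__main__":
    n, k, p = 100, 70, 0.5
    print("exact tail:", exact_upper_tail(n, k, p))
    print("bound     :", chernoff_bound(n, k, p))
```

The bound captures only the exponential decay rate; sharper approximations need a polynomial correction factor, which this sketch does not attempt.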

Journal of Computational Biology | 1996

Poisson Process Approximation for Sequence Repeats, and Sequencing by Hybridization

Richard Arratia; Daniela Martin; Gesine Reinert; Michael S. Waterman

Sequencing by hybridization is a tool to determine a DNA sequence from the unordered list of all l-tuples contained in this sequence; typical numbers for l are l = 8, 10, 12. For theoretical purposes we assume that the multiset of all l-tuples is known. This multiset determines the DNA sequence uniquely if none of the so-called Ukkonen transformations are possible. These transformations require repeats of (l-1)-tuples in the sequence, with these repeats occurring in certain spatial patterns. We model DNA as an i.i.d. sequence. We first prove Poisson process approximations for the process of indicators of all leftmost long repeats allowing self-overlap and for the process of indicators of all leftmost long repeats without self-overlap. Using the Chen-Stein method, we get bounds on the error of these approximations. As a corollary, we approximate the distribution of longest repeats. In the second step we analyze the spatial patterns of the repeats. Finally we combine these two steps to prove an approximation for the probability that a random sequence is uniquely recoverable from its list of l-tuples. For all our results we give some numerical examples including error bounds.
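The objects in this abstract can be sketched in a few lines: the multiset of l-tuples of a sequence, and the set of repeated (l-1)-tuples. Since the abstract says the Ukkonen transformations require repeats of (l-1)-tuples, the absence of such repeats is a sufficient (not necessary) certificate of unique recoverability. The function names are hypothetical, and the spatial-pattern analysis from the paper is not attempted here.

```python
from collections import Counter


def tuple_multiset(seq: str, l: int) -> Counter:
    """Multiset of all length-l substrings (l-tuples) of seq."""
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))


def repeated_tuples(seq: str, m: int) -> set:
    """Set of m-tuples that occur more than once in seq."""
    return {t for t, count in tuple_multiset(seq, m).items() if count > 1}


def no_repeat_certificate(seq: str, l: int) -> bool:
    """True when seq has no repeated (l-1)-tuple.

    This is only a sufficient condition for unique recovery: a repeat
    does not by itself imply that reconstruction is ambiguous.
    """
    return not repeated_tuples(seq, l - 1)


if __name__ == "__main__":
    seq = "ACGTACGA"
    print(tuple_multiset(seq, 4))
    print(repeated_tuples(seq, 3))
    print(no_repeat_certificate(seq, 4))
```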

Random Structures and Algorithms | 1992

Limit Theorems for Combinatorial Structures via Discrete Process Approximations

Richard Arratia; Simon Tavaré

Discrete functional limit theorems, which give independent process approximations for the joint distribution of the component structure of combinatorial objects such as permutations and mappings, have recently become available. In this article, we demonstrate the power of these theorems to provide elementary proofs of a variety of new and old limit theorems, including results previously proved by complicated analytical methods. Among the examples we treat are Brownian motion limit theorems for the cycle counts of a random permutation or the component counts of a random mapping, a Poisson limit law for the core of a random mapping, a generalization of the Erdős–Turán law for the log-order of a random permutation and the smallest component size of a random permutation, approximations to the joint laws of the smallest cycle sizes of a random mapping, and a limit distribution for the difference between the total number of cycles and the number of distinct cycle sizes in a random permutation. © 1992 John Wiley & Sons, Inc.
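To make the statistics named in this abstract concrete, here is a minimal sketch that computes the cycle counts of a permutation and the difference between the total number of cycles and the number of distinct cycle sizes. The representation (a 0-indexed list where perm[i] is the image of i) and the function names are assumptions made for illustration; the limit theorems themselves are not reproduced.

```python
def cycle_counts(perm: list) -> dict:
    """Map each cycle length j to the number of cycles of length j in perm."""
    seen = [False] * len(perm)
    counts = {}
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk the cycle containing `start` and measure its length.
        length = 0
        i = start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        counts[length] = counts.get(length, 0) + 1
    return counts


def cycles_minus_distinct_sizes(perm: list) -> int:
    """Total number of cycles minus the number of distinct cycle sizes."""
    counts = cycle_counts(perm)
    return sum(counts.values()) - len(counts)


if __name__ == "__main__":
    perm = [1, 0, 3, 2, 4]  # cycles (0 1), (2 3), (4)
    print(cycle_counts(perm))
    print(cycles_minus_distinct_sizes(perm))
```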

Combinatorica | 2004

A Two-Variable Interlace Polynomial

Richard Arratia; Béla Bollobás; Gregory B. Sorkin

We introduce a new graph polynomial in two variables. This “interlace” polynomial can be computed in two very different ways. The first is an expansion analogous to the state space expansion of the Tutte polynomial; the significant differences are that our expansion is over vertex rather than edge subsets, and the rank and nullity employed are those of an adjacency matrix rather than an incidence matrix. The second computation is by a three-term reduction formula involving a graph pivot; the pivot arose previously in the study of interlacement and Euler circuits in four-regular graphs. We consider a few properties and specializations of the two-variable interlace polynomial. One specialization, the “vertex-nullity interlace polynomial”, is the single-variable interlace graph polynomial we studied previously, closely related to the Tutte–Martin polynomial on isotropic systems previously considered by Bouchet. Another, the “vertex-rank interlace polynomial”, is equally interesting. Yet another specialization of the two-variable polynomial is the independent-set polynomial.
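The vertex-subset expansion can be sketched as a brute-force evaluation. The abstract does not state the formula, so the form used here is an assumption: q(G; x, y) is taken to be the sum over vertex subsets S of (x-1)^r (y-1)^n, where r and n are the rank and nullity over GF(2) of the adjacency matrix of the induced subgraph G[S]. All function names are hypothetical, and the loop is exponential in the number of vertices.

```python
from itertools import combinations


def gf2_rank(rows: list) -> int:
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def interlace_value(n_vertices: int, edges: list, x: int, y: int) -> int:
    """Evaluate the assumed subset expansion at integer (x, y)."""
    adj = [0] * n_vertices
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u
    total = 0
    for size in range(n_vertices + 1):
        for subset in combinations(range(n_vertices), size):
            mask = sum(1 << v for v in subset)
            # Adjacency matrix of the induced subgraph G[S], as bitmask rows.
            rank = gf2_rank([adj[v] & mask for v in subset])
            nullity = size - rank
            total += (x - 1) ** rank * (y - 1) ** nullity  # Python: 0**0 == 1
    return total


if __name__ == "__main__":
    path = [(0, 1), (1, 2)]
    print(interlace_value(3, path, 1, 2))
```

Under this assumed form, setting x = 1 keeps only subsets of rank 0, that is, independent sets, which is consistent with the abstract's remark that the independent-set polynomial is a specialization.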

Discrete Applied Mathematics | 2000

Euler circuits and DNA sequencing by hybridization

Richard Arratia; Béla Bollobás; Don Coppersmith; Gregory B. Sorkin

Sequencing by hybridization is a method of reconstructing a long DNA string (that is, figuring out its nucleotide sequence) from knowledge of its short substrings. Unique reconstruction is not always possible, and the goal of this paper is to study the number of reconstructions of a random string. For a given string, the number of reconstructions is determined by the pattern of repeated substrings; in an appropriate limit substrings will occur at most twice, so the pattern of repeats is given by a pairing: a string of length $2n$ in which each symbol occurs twice. A pairing induces a 2-in, 2-out graph, whose directed edges are defined by successive symbols of the pairing (for example, the pairing ABBCAC induces the graph with edges AB, BB, BC, and so forth), and the number of reconstructions is simply the number of Euler circuits in this 2-in, 2-out graph. The original problem is thus transformed into one about pairings: to find the number $f_k(n)$ of $n$-symbol pairings having $k$ Euler circuits. We show how to compute this function, in closed form, for any fixed $k$, and we present the functions explicitly for $k = 1, \ldots, 9$. The key is a decomposition theorem: the Euler “circuit number” of a pairing is the product of the circuit numbers of “component” sub-pairings. These components come from connected components of the “interlace graph”, which has the pairing's symbols as vertices, and edges when symbols are “interlaced”. (A and B are interlaced if the pairing has the form ABAB or BABA.) We carry these results back to the original question about DNA strings, and provide a total variation distance upper bound for the approximation error. We perform an asymptotic enumeration of 2-in, 2-out digraphs to show that, for a typical random $n$-pairing, the number of Euler circuits is of order no smaller than $2^n/n$, and the expected number is asymptotically at least $e^{-1/2} 2^{n-1}/n$. Since any $n$-pairing has at most $2^{n-1}$ Euler circuits, this pinpoints the exponential growth rate.
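The interlace graph described in this abstract is easy to build directly from a pairing. The sketch below, with hypothetical function names, finds the interlaced symbol pairs and the connected components along which the decomposition theorem factors the circuit number; it does not count Euler circuits.

```python
def interlace_graph(pairing: str):
    """Return (symbols, edges) of the interlace graph of a pairing.

    Two symbols a, b are joined when their occurrences alternate
    (a b a b or b a b a) in the pairing.
    """
    positions = {}
    for index, symbol in enumerate(pairing):
        positions.setdefault(symbol, []).append(index)
    if any(len(p) != 2 for p in positions.values()):
        raise ValueError("each symbol must occur exactly twice")
    symbols = sorted(positions)
    edges = set()
    for i, a in enumerate(symbols):
        for b in symbols[i + 1:]:
            a1, a2 = positions[a]
            b1, b2 = positions[b]
            if a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2:
                edges.add((a, b))
    return symbols, edges


def components(symbols: list, edges: set) -> list:
    """Connected components of the interlace graph, each as a sorted list."""
    neighbors = {s: set() for s in symbols}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, result = set(), []
    for s in symbols:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in neighbors[v] - seen:
                seen.add(w)
                stack.append(w)
        result.append(sorted(comp))
    return result


if __name__ == "__main__":
    symbols, edges = interlace_graph("ABBCAC")
    print(edges)
    print(components(symbols, edges))
```

For the abstract's example pairing ABBCAC, only A and C alternate, so B forms a component of its own.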

Advances in Mathematics | 1985

An Erdös-Rényi law with shifts

Richard Arratia; Michael S. Waterman

Motivated by the comparison of DNA sequences, a generalization is given of the result of Erdős and Rényi on the length $R_n$ of the longest run of heads in the first $n$ tosses of a coin. Consider two sequences, $X_1 X_2 \ldots X_n$ and $Y_1 Y_2 \ldots Y_n$. The length of the longest matching consecutive subsequence, allowing shifts, is $M_n \equiv \max\{m : X_{i+k} = Y_{j+k} \text{ for } k = 1 \text{ to } m, \text{ for some } 0 \le i, j \le n - m\}$. Suppose that all the “letters” are independent and identically distributed. The length of the longest match without shifts has the same distribution as $R_n$, the length of the longest head run for a biased coin with $p = P(X_i = Y_i)$, described by the Erdős–Rényi law: $P(\lim_{n \to \infty} R_n / \log_{1/p}(n) = 1) = 1$. For matching with shifts, our result is: $P(\lim_{n \to \infty} M_n / \log_{1/p}(n) = 2) = 1$. Loosely speaking, allowing shifts doubles the length of the longest match. The case of Markov chains is also handled.
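The two match lengths compared in this abstract can be computed directly. The sketch below uses a standard dynamic program for the longest common contiguous block (matching with shifts) and a single pass for the longest aligned run (matching without shifts), together with the first-order predictions from the two laws. Function names are hypothetical, and the predictions are asymptotic, so small inputs will not match them closely.

```python
import math


def longest_match_with_shifts(x: str, y: str) -> int:
    """Length of the longest block x[i+1..i+m] equal to y[j+1..j+m], over all i, j."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the match that ended at (i-1, j-1).
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def longest_match_without_shifts(x: str, y: str) -> int:
    """Length of the longest run of positions k with x[k] == y[k]."""
    best = run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def predicted_length(n: int, p: float, shifts: bool = True) -> float:
    """First-order prediction: log base 1/p of n, doubled when shifts are allowed."""
    factor = 2 if shifts else 1
    return factor * math.log(n) / math.log(1 / p)


if __name__ == "__main__":
    x, y = "ACGTTGCA", "TTGCACGT"
    print(longest_match_with_shifts(x, y))
    print(longest_match_without_shifts(x, y))
    print(predicted_length(len(x), 0.25))
```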

Combinatorics, Probability & Computing | 1999

The Poisson–Dirichlet Distribution and the Scale-Invariant Poisson Process

Richard Arratia; A. D. Barbour; Simon Tavaré

We show that the Poisson–Dirichlet distribution is the distribution of points in a scale-invariant Poisson process, conditioned on the event that the sum T of the locations of the points in (0,1] is 1. This extends to a similar result, rescaling the locations by T, and conditioning on the event that T≤1. Restricting both processes to (0, β] for 0<β≤1, we give an explicit formula for the total variation distance between their distributions. Connections between various representations of the Poisson–Dirichlet process are discussed.

Combinatorics, Probability & Computing | 2016

Probabilistic Divide-and-Conquer: A New Exact Simulation Method, With Integer Partitions as an Example

Richard Arratia; Stephen DeSalvo

We propose a new method, probabilistic divide-and-conquer, for improving the success probability in rejection sampling. For the example of integer partitions, there is an ideal recursive scheme which improves the rejection cost from asymptotically order

Journal of Computational Biology | 1997