Loris D'Antoni
University of Wisconsin-Madison
Publication
Featured research published by Loris D'Antoni.
Symposium on Principles of Programming Languages | 2014
Loris D'Antoni; Margus Veanes
Symbolic automata extend classical automata by using symbolic alphabets instead of finite ones. Most classical automata algorithms rely on the alphabet being finite, and generalizing them to the symbolic setting is not a trivial task. In this paper we study the problem of minimizing symbolic automata. We formally define and prove the basic properties of minimality in the symbolic setting, and lift classical minimization algorithms (Huffman-Moore's and Hopcroft's algorithms) to symbolic automata. While Hopcroft's algorithm is the fastest known algorithm for DFA minimization, we show how, in the presence of symbolic alphabets, it can incur an exponential blowup. To address this issue, we introduce a new algorithm that fully benefits from the symbolic representation of the alphabet and does not suffer from the exponential blowup. We provide a comprehensive performance evaluation of all the algorithms over large benchmarks and against existing state-of-the-art implementations. The experiments show that the new symbolic algorithm is faster than previous implementations.
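To make the idea of a symbolic alphabet concrete, the following is a minimal Python sketch of a deterministic symbolic automaton whose transitions carry predicates rather than individual characters. It is an illustration only, not the authors' implementation: the class name `SymbolicAutomaton`, its fields, and the identifier-matching example are all assumptions introduced here, and the sketch assumes that the guards leaving any one state are mutually exclusive.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A guard is a predicate over single characters (the symbolic alphabet).
Guard = Callable[[str], bool]


class SymbolicAutomaton:
    """Hypothetical deterministic symbolic automaton (illustrative only)."""

    def __init__(
        self,
        initial: int,
        finals: Iterable[int],
        transitions: Dict[int, List[Tuple[Guard, int]]],
    ) -> None:
        self.initial = initial
        self.finals = set(finals)
        # state -> list of (guard, target state)
        self.transitions = transitions

    def accepts(self, word: str) -> bool:
        state = self.initial
        for ch in word:
            # Take the first transition whose guard holds for this character.
            for guard, target in self.transitions.get(state, []):
                if guard(ch):
                    state = target
                    break
            else:
                return False  # no guard matched: the run is stuck
        return state in self.finals


# Example: a letter followed by any number of letters or digits.
# Two predicate-labeled transitions cover all of Unicode, where a
# classical automaton would need one transition per character.
identifier = SymbolicAutomaton(
    initial=0,
    finals=[1],
    transitions={
        0: [(str.isalpha, 1)],
        1: [(str.isalnum, 1)],
    },
)
```

For instance, `identifier.accepts("x1")` holds while `identifier.accepts("1x")` does not. Minimizing such an automaton requires reasoning about the predicates themselves, which this sketch does not attempt.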
Logic in Computer Science | 2013
Rajeev Alur; Loris D'Antoni; Jyotirmoy V. Deshmukh; Mukund Raghothaman; Yifei Yuan
We propose a deterministic model for associating costs with strings that is parameterized by operations of interest (such as addition, scaling, and minimum), together with a notion of regularity that provides a yardstick to measure expressiveness, and we study decision problems and theoretical properties of the resulting classes of cost functions. Our definition of regularity relies on the theory of string-to-tree transducers, and allows associating costs with events that are conditioned on regular properties of future events. Our model of cost register automata allows computation of regular functions using multiple “write-only” registers whose values can be combined using the allowed set of operations. We show that the classical shortest-path algorithms as well as the algorithms designed for computing discounted costs can be adapted for solving the min-cost problems for the more general classes of functions specified in our model. Cost register automata with the operations of minimum and increment give a deterministic model that is equivalent to weighted automata, an extensively studied nondeterministic model, and this connection results in new insights and new open problems.
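As a rough illustration of registers that are written but never tested, here is a small Python sketch of a one-state machine in the spirit of a cost register automaton with increment and minimum. The names `TRANSITIONS` and `run_cra`, the register names `x` and `y`, and the example cost function (the smaller of the number of `a`s and the number of `b`s) are assumptions made for this sketch, not definitions from the paper.

```python
from typing import Callable, Dict, Tuple

Registers = Dict[str, int]
Update = Callable[[Registers], Registers]

# (state, symbol) -> (next state, register update).
# Control flow depends only on the state and the input symbol; register
# values are updated but never inspected ("write-only").
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Update]] = {
    ("q", "a"): ("q", lambda r: {"x": r["x"] + 1, "y": r["y"]}),
    ("q", "b"): ("q", lambda r: {"x": r["x"], "y": r["y"] + 1}),
}


def run_cra(word: str) -> int:
    """Return min(#a, #b) for a word over the alphabet {a, b}."""
    state = "q"
    regs: Registers = {"x": 0, "y": 0}
    for symbol in word:
        if (state, symbol) not in TRANSITIONS:
            raise ValueError(f"no transition for symbol {symbol!r}")
        state, update = TRANSITIONS[(state, symbol)]
        regs = update(regs)
    # Output function: combine the registers with the allowed operation min.
    return min(regs["x"], regs["y"])
```

On the input `"aabab"` the registers end at `x = 3` and `y = 2`, so the cost is 2. The machine is deterministic even though the function it computes takes a minimum over two running counts.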
Journal of the ACM | 2017
Rajeev Alur; Loris D'Antoni
The theory of tree transducers provides a foundation for understanding expressiveness and complexity of analysis problems for specification languages for transforming hierarchically structured data such as XML documents. We introduce streaming tree transducers as an analyzable, executable, and expressive model for transforming unranked ordered trees (and forests) in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the output in linear time using a finite-state control, a visibly pushdown stack, and a finite number of variables that store output chunks that can be combined using the operations of string-concatenation and tree-insertion. We prove that the expressiveness of the model coincides with transductions definable using monadic second-order logic (MSO). Existing models of tree transducers either cannot implement all MSO-definable transformations, or require regular look-ahead that prohibits single-pass implementation. We show a variety of analysis problems such as type-checking and checking functional equivalence are decidable for our model.
International Colloquium on Automata, Languages and Programming | 2012
Rajeev Alur; Loris D'Antoni
The theory of tree transducers provides a foundation for understanding expressiveness and complexity of analysis problems for specification languages for transforming hierarchically structured data such as XML documents. We introduce streaming tree transducers as an analyzable, executable, and expressive model for transforming unranked ordered trees (and hedges) in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the output using a finite-state control, a visibly pushdown stack, and a finite number of variables that store output chunks that can be combined using the operations of string-concatenation and tree-insertion. We prove that the expressiveness of the model coincides with transductions definable using monadic second-order logic (MSO). We establish complexity upper bounds of ExpTime for type-checking and NExpTime for checking functional equivalence for our model. We consider variations of the basic model when inputs/outputs are restricted to strings and ranked trees, and in particular, present the model of bottom-up ranked-tree transducers, which is the first known MSO-equivalent transducer model that processes trees in a bottom-up manner.
Verification, Model Checking, and Abstract Interpretation | 2013
Loris D'Antoni; Margus Veanes
There has been significant interest in static analysis of programs that manipulate strings, in particular in the context of web security. Many types of security vulnerabilities are exposed through flaws in programs such as string encoders, decoders, and sanitizers. Recent work has focused on combining automata and satisfiability modulo theories techniques to address security issues in those programs. These techniques scale to larger alphabets such as Unicode, which is a de facto character encoding standard used in web software. One approach has been to use character predicates to generalize finite state transducers. This technique has made it possible to perform precise analysis of a large class of typical sanitization routines. However, it has not been able to cope well with decoders, which often need to read more than one character at a time. In order to overcome this limitation we introduce a conservative generalization of Symbolic Finite Transducers (SFTs) called Extended Symbolic Finite Transducers (ESFTs) that incorporates the notion of a bounded lookahead. We demonstrate the advantage of ESFTs in analyzing programs for which previous approaches did not scale. In our evaluation we use a UTF-16 to UTF-8 translator, utf8encoder, and a UTF-8 to UTF-16 translator, utf8decoder. We show, among other properties, that utf8encoder and utf8decoder are functionally correct.
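The need for bounded lookahead can be seen in a small Python sketch of a single-state transducer whose rules read either one or two input elements per step. This is not the paper's utf8encoder or utf8decoder; it is a simpler, hypothetical decoder that turns UTF-16 code units into Unicode code points, chosen because a surrogate pair can only be decoded by reading two units at once. The names `RULES` and `decode_utf16_units` are assumptions made for this sketch.

```python
from typing import Callable, List, Sequence, Tuple

# Each rule is (lookahead length k, guard over k inputs, output over k inputs).
Rule = Tuple[int, Callable[..., bool], Callable[..., List[int]]]

RULES: List[Rule] = [
    # Lookahead 1: a code unit outside the surrogate range is copied as is.
    (1, lambda x: x < 0xD800 or x > 0xDFFF, lambda x: [x]),
    # Lookahead 2: a high surrogate followed by a low surrogate is combined
    # into one code point above U+FFFF.
    (
        2,
        lambda x, y: 0xD800 <= x <= 0xDBFF and 0xDC00 <= y <= 0xDFFF,
        lambda x, y: [0x10000 + ((x - 0xD800) << 10) + (y - 0xDC00)],
    ),
]


def decode_utf16_units(units: Sequence[int]) -> List[int]:
    """Apply the first rule whose guard holds on the next k input elements."""
    out: List[int] = []
    i = 0
    while i < len(units):
        for k, guard, emit in RULES:
            window = units[i:i + k]
            if len(window) == k and guard(*window):
                out.extend(emit(*window))
                i += k
                break
        else:
            raise ValueError(f"ill-formed input at position {i}")
    return out
```

For example, the units `0xD83D, 0xDE00` decode to the single code point `0x1F600`. A transducer restricted to one input element per transition would have to remember the high surrogate in its state, which is impractical when guards range over large symbolic alphabets.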
Computer Aided Verification | 2013
Loris D'Antoni; Margus Veanes
Symbolic Finite Transducers augment classic transducers with symbolic alphabets represented as parametric theories. This extension enables succinctness and the use of potentially infinite alphabets while preserving closure and decidability properties. Extended Symbolic Finite Transducers further extend these objects by allowing transitions to read consecutive input elements in a single step. This extension adds no expressiveness when the alphabet is finite, but it does when the alphabet is symbolic. We show how this increase in expressiveness causes decision problems such as equivalence to become undecidable and closure properties such as composition to stop holding. We also investigate how the automata counterpart, Extended Symbolic Finite Automata, differs from Symbolic Finite Automata. We then introduce the subclass of Cartesian Extended Symbolic Finite Transducers, in which guards are limited to conjunctions of unary predicates. Our main result is an equivalence algorithm for this subclass in the single-valued case. Finally, we model real-world problems with Cartesian Extended Symbolic Finite Transducers and use the equivalence algorithm to prove their correctness.
ACM Transactions on Database Systems | 2013
Barzan Mozafari; Kai Zeng; Loris D'Antoni; Carlo Zaniolo
While Complex Event Processing (CEP) constitutes a considerable portion of the so-called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semistructured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring advanced CEP queries. In this article, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient execution. In fact, XSeq is designed to take full advantage of the recently proposed Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising the computationally attractive properties of finite state automata. Besides the efficiency and expressivity benefits, the choice of VPA as the underlying model also enables XSeq to go beyond XML streams and be easily applicable to any data with both sequential and hierarchical structures, including JSON messages, RNA sequences, and software traces. Therefore, we illustrate XSeq's power for CEP applications through examples from different domains and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: an improvement of two orders of magnitude is obtained over the same queries executed in general-purpose XML engines.
Learning at Scale | 2017
Andrew Head; Elena L. Glassman; Gustavo Soares; Ryo Suzuki; Lucas Figueredo; Loris D'Antoni; Björn Hartmann
In large introductory programming classes, teacher feedback on individual incorrect student submissions is often infeasible. Program synthesis techniques are capable of fixing student bugs and generating hints automatically, but they lack the deep domain knowledge of a teacher and can generate functionally correct but stylistically poor fixes. We introduce a mixed-initiative approach which combines teacher expertise with data-driven program synthesis techniques. We demonstrate our novel approach in two systems that use different interaction mechanisms. Our systems use program synthesis to learn bug-fixing code transformations and then cluster incorrect submissions by the transformations that correct them. The MistakeBrowser system learns transformations from examples of students fixing bugs in their own submissions. The FixPropagator system learns transformations from teachers fixing bugs in incorrect student submissions. Teachers can write feedback about a single submission or a cluster of submissions and propagate the feedback to all other submissions that can be fixed by the same transformation. Two studies suggest this approach helps teachers better understand student bugs and write reusable feedback that scales to a massive introductory programming classroom.
Formal Methods | 2015
Loris D'Antoni; Margus Veanes
Symbolic finite automata and transducers augment classic automata and transducers with symbolic alphabets represented as parametric theories. This extension makes it possible to succinctly represent large and potentially infinite alphabets while preserving closure and decidability properties. Extended symbolic finite automata and transducers further extend these objects by allowing transitions to read consecutive input elements in a single step. In this paper we study the properties of these models. In contrast to the case of finite alphabets, we show how reading multiple symbols increases the expressiveness of the models, which causes some closure properties to stop holding and most decision problems to become undecidable. In particular we show that extended symbolic finite transducers are not closed under composition, and that the equivalence problem is undecidable for both extended symbolic finite automata and transducers. We then introduce the subclass of Cartesian extended symbolic finite transducers, in which guards are limited to conjunctions of unary predicates, and we propose an equivalence algorithm for this subclass in the single-valued case. We also present a heuristic algorithm for composing extended symbolic finite transducers that works for many practical cases. Finally, we model real-world programs with Cartesian extended symbolic finite transducers and use the proposed algorithms to prove their correctness.
Computer Aided Verification | 2014
Loris D'Antoni; Rajeev Alur
Nested words model data with both linear and hierarchical structure such as XML documents and program traces. A nested word is a sequence of positions together with a matching relation that connects open tags (calls) with the corresponding close tags (returns). Visibly Pushdown Automata are a restricted class of pushdown automata that process nested words, and have many appealing theoretical properties such as closure under Boolean operations and decidable equivalence. However, like other classical automata models, they are limited to finite alphabets. This limitation is restrictive for practical applications to both XML processing and program trace analysis, where values for individual symbols are usually drawn from an unbounded domain. With this motivation, we introduce Symbolic Visibly Pushdown Automata (SVPA) as an executable model for nested words over infinite alphabets. In this model, transitions are labeled with predicates over the input alphabet, analogous to symbolic automata processing strings over infinite alphabets. A key novelty of SVPAs is the use of binary predicates to model relations between open and close tags in a nested word. We show that SVPAs still enjoy the decidability and closure properties of Visibly Pushdown Automata. We use SVPAs to model XML validation policies and program properties that are not naturally expressible with previous formalisms and provide experimental results for our implementation.
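The role of binary predicates can be sketched in a few lines of Python. The function below is a heavily simplified, single-state illustration rather than the SVPA model from the paper: the name `accepts_nested_word`, the encoding of a nested word as `(kind, value)` pairs, and the choice to reject unmatched returns are all assumptions made here. Calls push their symbol on a stack, and each return is checked against its matching call with a binary predicate.

```python
from typing import Callable, List, Sequence, Tuple

# A nested word is encoded as (kind, value) pairs, where kind is
# "call" (open tag), "ret" (close tag), or "int" (internal position).
Position = Tuple[str, str]


def accepts_nested_word(
    word: Sequence[Position],
    symbol_ok: Callable[[str], bool],
    tags_match: Callable[[str, str], bool],
) -> bool:
    stack: List[str] = []
    for kind, value in word:
        if kind == "call":
            if not symbol_ok(value):      # unary predicate on the open tag
                return False
            stack.append(value)           # remember the symbol at the call
        elif kind == "ret":
            if not stack:
                return False              # unmatched close tag (assumed invalid)
            opened = stack.pop()
            if not tags_match(opened, value):  # binary predicate: call vs. return
                return False
        elif kind == "int":
            if not symbol_ok(value):      # unary predicate on internal symbols
                return False
        else:
            raise ValueError(f"unknown position kind {kind!r}")
    return not stack                      # every open tag must be closed


# Example policy: every close tag carries the same name as its open tag,
# for tag names drawn from an unbounded set of strings.
well_formed = [("call", "a"), ("int", "text"), ("call", "b"), ("ret", "b"), ("ret", "a")]
mismatched = [("call", "a"), ("ret", "b")]
```

With `symbol_ok = lambda s: True` and `tags_match = lambda o, c: o == c`, the first word is accepted and the second rejected. A model restricted to a finite alphabet would need a separate stack symbol for each tag name, which is impossible when names are unbounded.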