Borja Balle

Polytechnic University of Catalonia

Publication

Featured research published by Borja Balle.

Machine Learning | 2014

Spectral learning of weighted automata

Borja Balle; Xavier Carreras; Franco M. Luque; Ariadna Quattoni

In recent years we have seen the development of efficient, provably correct algorithms for learning Weighted Finite Automata (WFA). Most of these algorithms avoid the known hardness results by defining parameters beyond the number of states that can be used to quantify the complexity of learning automata under a particular distribution. One such class of methods is the so-called spectral algorithms, which measure learning complexity in terms of the smallest singular value of some Hankel matrix. However, despite their simplicity and wide applicability to real problems, their impact in application domains remains marginal to date. One of the goals of this paper is to remedy this situation by presenting a derivation of the spectral method for learning WFA that, without sacrificing rigor and mathematical elegance, puts emphasis on providing intuitions about the inner workings of the method and does not assume a strong background in formal algebraic methods. In addition, our algorithm overcomes some of the shortcomings of previous work and is able to learn from statistics of substrings. To illustrate the approach, we present experiments on a real application of the method to natural language parsing.
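The core Hankel-matrix idea can be illustrated with a minimal Python/NumPy sketch. It assumes exact access to the target function f (rather than the substring statistics estimated from data that the paper works with), a basis of all prefixes and suffixes of length at most 2, and a known number of states; the helper names wfa_value, strings_up_to and spectral_learn and the random target automaton are hypothetical. It follows the standard rank-n SVD construction: take the top n right singular vectors V of the Hankel block H, then set A_a = (HV)^+ H_a V, alpha^T = h_S^T V and beta = (HV)^+ h_P.

```python
import itertools
import numpy as np


def wfa_value(alpha, ops, beta, word):
    """Evaluate f(word) = alpha^T A_{x1} ... A_{xk} beta for a real-valued WFA."""
    v = alpha
    for symbol in word:
        v = v @ ops[symbol]
    return float(v @ beta)


def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len, shortest first."""
    return ["".join(p) for k in range(max_len + 1)
            for p in itertools.product(alphabet, repeat=k)]


def spectral_learn(f, alphabet, prefixes, suffixes, n_states):
    """Recover a WFA with n_states states from exact Hankel entries of f."""
    H = np.array([[f(u + v) for v in suffixes] for u in prefixes])
    H_sym = {a: np.array([[f(u + a + v) for v in suffixes] for u in prefixes])
             for a in alphabet}
    h_P = np.array([f(u) for u in prefixes])  # values f(u) with empty suffix
    h_S = np.array([f(v) for v in suffixes])  # values f(v) with empty prefix
    _, _, Vt = np.linalg.svd(H)
    V = Vt[:n_states].T                       # top right singular vectors
    F_pinv = np.linalg.pinv(H @ V)
    alpha = h_S @ V
    beta = F_pinv @ h_P
    ops = {a: F_pinv @ H_sym[a] @ V for a in alphabet}
    return alpha, ops, beta


# Hypothetical target: a random 2-state WFA over {a, b}.
rng = np.random.default_rng(0)
alphabet = "ab"
true_alpha = rng.normal(size=2)
true_ops = {a: 0.5 * rng.normal(size=(2, 2)) for a in alphabet}
true_beta = rng.normal(size=2)


def target(w):
    return wfa_value(true_alpha, true_ops, true_beta, w)


basis = strings_up_to(alphabet, 2)
alpha, ops, beta = spectral_learn(target, alphabet, basis, basis, n_states=2)
for w in ["", "a", "ab", "bba", "abab"]:
    print(repr(w), target(w), wfa_value(alpha, ops, beta, w))
```

Because the Hankel block here is exact and has rank 2, the recovered automaton agrees with the target on every string up to rounding, even though its matrices differ from the originals by a change of basis. With Hankel matrices estimated from data, the same formulas give only an approximation, whose quality depends on how well separated from zero the smallest retained singular value is.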

European Conference on Machine Learning | 2011

A spectral learning algorithm for Finite State Transducers

Borja Balle; Ariadna Quattoni; Xavier Carreras

Finite-State Transducers (FSTs) are a popular tool for modeling paired input-output sequences and have numerous applications in real-world problems. Most training algorithms for learning FSTs rely on gradient-based or EM optimizations, which can be computationally expensive and suffer from local optima. Recently, Hsu et al. [13] proposed a spectral method for learning Hidden Markov Models (HMMs) which is based on an Observable Operator Model (OOM) view of HMMs. Following this line of work, we present a spectral algorithm to learn FSTs with strong PAC-style guarantees. To the best of our knowledge, ours is the first result of this type for FST learning. At its core, the algorithm is simple and scalable to large data sets. We present experiments that validate the effectiveness of the algorithm on synthetic and real data.

IEEE Transactions on Instrumentation and Measurement | 2008

Absolute-Type Shaft Encoding Using LFSR Sequences With a Prescribed Length

Josep M. Fuertes; Borja Balle; Enric Ventura

Maximal-length binary sequences have been known for a long time. They have many interesting properties; one of them is that, when taken in blocks of n consecutive positions, they form 2^n - 1 different codes in a closed circular sequence. This property can be used to measure absolute angular positions, since the circle can be divided into as many parts as there are distinct codes to retrieve. This paper describes how a closed binary sequence of arbitrary length can be effectively designed with the minimal possible block length using linear feedback shift registers. Such sequences can be used to measure a specified exact number of angular positions using the minimal number of sensors that linear methods allow.
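The block property of maximal-length sequences can be checked with a short Python sketch. It assumes a Fibonacci LFSR with the primitive polynomial x^4 + x + 1, i.e. the recurrence s[k+4] = s[k+1] XOR s[k], and a nonzero seed; the helper names are hypothetical. It only demonstrates the classical 2^n - 1 case, not the paper's construction for arbitrary lengths.

```python
def lfsr_sequence(taps, seed, length):
    """Fibonacci LFSR over GF(2): s[k+n] = XOR of s[k+t] for t in taps."""
    n = len(seed)
    s = list(seed)
    while len(s) < length:
        k = len(s) - n
        bit = 0
        for t in taps:
            bit ^= s[k + t]
        s.append(bit)
    return s[:length]


def circular_codes(seq, n):
    """All length-n windows of seq, read cyclically (one per starting position)."""
    L = len(seq)
    return [tuple(seq[(i + j) % L] for j in range(n)) for i in range(L)]


n = 4
seq = lfsr_sequence(taps=(0, 1), seed=[0, 0, 0, 1], length=2**n - 1)
codes = circular_codes(seq, n)
print("".join(map(str, seq)))   # 000100110101111
print(len(set(codes)))          # 15 distinct codes, one per angular sector
```

Each of the 15 nonzero 4-bit words appears exactly once around the circle, so four sensors reading consecutive positions identify one of 15 absolute sectors; the all-zero word never appears because it is a fixed point of the recurrence.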

Logic in Computer Science | 2015

A Canonical Form for Weighted Automata and Applications to Approximate Minimization

Borja Balle; Prakash Panangaden; Doina Precup

We study the problem of constructing approximations to a weighted automaton. Weighted finite automata (WFA) are closely related to the theory of rational series. A rational series is a function from strings to real numbers that can be computed by a WFA. Among others, this includes probability distributions generated by hidden Markov models and probabilistic automata. The relationship between rational series and WFA is analogous to the relationship between regular languages and ordinary automata. Associated with such rational series are infinite matrices called Hankel matrices, which play a fundamental role in the theory of minimal WFA. Our contributions are: (1) an effective procedure for computing the singular value decomposition (SVD) of such infinite Hankel matrices based on their finite representation in terms of WFA, (2) a new canonical form for WFA based on this SVD, and (3) an algorithm to construct approximate minimizations of a given WFA. The goal of our approximate minimization algorithm is to start from a minimal WFA and produce a smaller WFA that is close to the given one in a certain sense. The desired size of the approximating automaton is given as input. We give bounds describing how well the approximation emulates the behavior of the original WFA. The study of this problem is motivated by the analysis of machine learning algorithms that synthesize weighted automata from spectral decompositions of finite Hankel matrices. It is known that when the number of states of the target automaton is correctly guessed, these algorithms enjoy consistency and finite-sample guarantees in the probably approximately correct (PAC) learning model. It has also been suggested that asking the learning algorithm to produce a model smaller than the true one will still yield useful models with reduced complexity. Our results in this paper vindicate these ideas and confirm intuitions provided by empirical studies.
Beyond learning problems, our techniques can also be used to reduce the complexity of any algorithm working with WFA, at the expense of incurring a small, controlled amount of error.
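The paper computes the SVD of infinite Hankel matrices directly from a WFA. A much simpler finite analogue of the underlying idea can be sketched in NumPy, assuming a randomly generated 3-state WFA (a hypothetical example, not from the paper) and a finite Hankel block over prefixes and suffixes of length at most 2. That block has rank at most 3, and truncating its SVD to k = 2 singular values gives the best rank-2 approximation in Frobenius norm, with error equal to the square root of the sum of the discarded squared singular values (Eckart-Young theorem). This only illustrates why discarding small singular values costs little; it is not the paper's canonical form or its error bounds.

```python
import itertools
import numpy as np


def wfa_value(alpha, ops, beta, word):
    """Evaluate f(word) = alpha^T A_{x1} ... A_{xk} beta."""
    v = alpha
    for symbol in word:
        v = v @ ops[symbol]
    return float(v @ beta)


# Hypothetical 3-state WFA over {a, b}.
rng = np.random.default_rng(1)
alphabet = "ab"
alpha = rng.normal(size=3)
ops = {a: 0.4 * rng.normal(size=(3, 3)) for a in alphabet}
beta = rng.normal(size=3)

# Finite Hankel block H[u, v] = f(uv) over all strings of length <= 2.
basis = ["".join(p) for k in range(3) for p in itertools.product(alphabet, repeat=k)]
H = np.array([[wfa_value(alpha, ops, beta, u + v) for v in basis] for u in basis])

U, s, Vt = np.linalg.svd(H)
k = 2  # hypothetical target size, smaller than the original 3 states
H_k = (U[:, :k] * s[:k]) @ Vt[:k]          # rank-k truncation of the SVD
err = np.linalg.norm(H - H_k, "fro")
predicted = np.sqrt(np.sum(s[k:] ** 2))    # Eckart-Young error
print(s[:4])            # only the first 3 are numerically non-zero
print(err, predicted)   # the two numbers agree up to rounding
```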

Conference on Algebraic Informatics | 2015

Learning Weighted Automata

Borja Balle; Mehryar Mohri

Weighted finite automata (WFA) are finite automata whose transitions and states are augmented with weights, which are elements of a semiring. A WFA induces a function over strings. The value it assigns to an input string is the semiring sum of the weights of all paths labeled with that string, where the weight of a path is obtained by taking the semiring product of the weights of its constituent transitions, as well as those of its origin and destination states.
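The path-sum definition can be evaluated without enumerating paths by a forward pass, which is valid because the semiring product distributes over the semiring sum. The Python sketch below is a minimal version; the Semiring record, the dictionary encoding of initial, final and transition weights, and the small example automaton are assumptions for illustration. The same automaton in the real (+, ×) semiring sums path weights, while in the tropical (min, +) semiring it returns the weight of the cheapest path.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float  # identity for plus
    one: float   # identity for times (part of the definition, unused below)


REAL = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)
TROPICAL = Semiring(min, lambda x, y: x + y, float("inf"), 0.0)


def wfa_eval(sr, initial, final, transitions, word):
    """Semiring sum, over all paths labeled `word`, of the path weights.

    initial, final: dict state -> weight (origin/destination state weights)
    transitions: dict (state, symbol) -> list of (next_state, weight)
    """
    # forward[q]: semiring sum of the weights of all paths that read the
    # symbols consumed so far (initial weight included) and end in state q
    forward = dict(initial)
    for symbol in word:
        nxt = {}
        for q, w in forward.items():
            for r, tw in transitions.get((q, symbol), []):
                nxt[r] = sr.plus(nxt.get(r, sr.zero), sr.times(w, tw))
        forward = nxt
    total = sr.zero
    for q, w in forward.items():
        if q in final:
            total = sr.plus(total, sr.times(w, final[q]))
    return total


# Hypothetical two-state automaton over the alphabet {a}.
initial = {0: 1.0}
final = {0: 0.5, 1: 2.0}
transitions = {(0, "a"): [(0, 0.5), (1, 3.0)], (1, "a"): [(1, 1.0)]}

for word in ["", "a", "aa"]:
    print(repr(word),
          wfa_eval(REAL, initial, final, transitions, word),
          wfa_eval(TROPICAL, initial, final, transitions, word))
# expected: '' -> 0.5 / 1.5, 'a' -> 6.25 / 2.0, 'aa' -> 9.125 / 2.5
```

For "aa" the three paths 0→0→0, 0→0→1 and 0→1→1 have real weights 0.125, 3 and 6 (sum 9.125) and tropical costs 2.5, 6.5 and 7 (minimum 2.5).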

Machine Learning | 2014

Adaptively learning probabilistic deterministic automata from data streams

Borja Balle; Jorge Castro; Ricard Gavaldà

Markovian models with hidden state are widely used formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for inferring models in this class in the restrictive data stream scenario: unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We also present extensions of the algorithm that (1) reduce to a minimum the need for guessing parameters of the target distribution and (2) are able to adapt to changes in the input distribution, relearning new models when needed. We provide rigorous PAC-like bounds for all of the above. Our algorithm makes key use of stream sketching techniques for reducing memory and processing time, and is modular in that it can use different tests for state equivalence and for change detection in the stream.

Algorithmic Learning Theory | 2010

A lower bound for learning distributions generated by probabilistic automata

Borja Balle; Jorge Castro; Ricard Gavaldà

Known algorithms for learning PDFA can only be shown to run in time polynomial in the so-called distinguishability µ of the target machine, in addition to the number of states and the usual accuracy and confidence parameters. We show that the dependence on µ is necessary for every algorithm whose structure resembles existing ones. As a technical tool, a new variant of Statistical Queries termed L∞-queries is defined. We show how these queries can be simulated from samples and observe that known PAC algorithms for learning PDFA can be rewritten to access their target using L∞-queries and standard Statistical Queries. Finally, we show a lower bound: every algorithm that learns PDFA using queries with a reasonable tolerance needs a number of queries larger than (1/µ)^c for every c < 1.
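In the PDFA-learning literature, the distinguishability µ is usually defined as the smallest L∞ distance between the distributions over suffixes generated from two distinct states. As a minimal sketch of the basic quantity involved (not the paper's query simulation), the L∞ distance between two distributions over strings can be estimated from finite samples with a plug-in estimator; the function name empirical_linf and the sample data are hypothetical. Answering a query with a given tolerance from samples then amounts to drawing enough samples that this estimate is close to the true value with high probability; that sample-size analysis is omitted here.

```python
from collections import Counter


def empirical_linf(sample_p, sample_q):
    """Plug-in estimate of max_x |P(x) - Q(x)| from two finite samples of strings."""
    cp, cq = Counter(sample_p), Counter(sample_q)
    n_p, n_q = len(sample_p), len(sample_q)
    # Strings seen in neither sample have estimated probability 0 under both.
    support = set(cp) | set(cq)
    return max(abs(cp[x] / n_p - cq[x] / n_q) for x in support)


# Hypothetical samples of suffixes observed after two different states.
sample_p = ["a", "a", "b", ""]
sample_q = ["a", "b", "b", "b"]
print(empirical_linf(sample_p, sample_q))  # 0.5, attained at the string "b"
```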

Algorithmic Learning Theory | 2015

On the Rademacher Complexity of Weighted Automata

Borja Balle; Mehryar Mohri

Weighted automata (WFAs) provide a general framework for the representation of functions mapping strings to real numbers. They include as special instances deterministic finite automata (DFAs), hidden Markov models (HMMs), and predictive state representations (PSRs). In recent years, there has been renewed interest in weighted automata in machine learning due to the development of efficient and provably correct spectral algorithms for learning weighted automata. Despite the effectiveness reported for spectral techniques in real-world problems, almost all existing statistical guarantees for spectral learning of weighted automata rely on a strong realizability assumption. In this paper, we initiate a systematic study of the learning guarantees for broad classes of weighted automata in an agnostic setting. Our results include bounds on the Rademacher complexity of three general classes of weighted automata, each described in terms of different natural quantities. Interestingly, these bounds highlight the key role of different data-dependent parameters in the convergence rates.
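The empirical Rademacher complexity of a class H on a sample x_1, ..., x_m is E_σ[ sup_{h in H} (1/m) Σ_i σ_i h(x_i) ], where the σ_i are independent uniform ±1 signs. For a finite class and a very small sample it can be computed exactly by averaging over all 2^m sign vectors, as in the plain-Python sketch below (the function name is hypothetical). This is a definition-level illustration only: the classes of weighted automata studied in the paper are infinite, and their complexity is bounded analytically rather than by enumeration.

```python
import itertools


def empirical_rademacher(value_vectors):
    """Exact empirical Rademacher complexity of a finite class.

    value_vectors: one tuple (h(x_1), ..., h(x_m)) per hypothesis h.
    Averages sup_h (1/m) sum_i sigma_i h(x_i) over all 2^m sign vectors.
    """
    m = len(value_vectors[0])
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=m):
        total += max(sum(s * v for s, v in zip(sigma, h)) / m for h in value_vectors)
    return total / 2 ** m


print(empirical_rademacher([(1, 1)]))            # a single function: 0.0
print(empirical_rademacher([(1, 1), (-1, -1)]))  # h and -h: 0.5
```

A single fixed function cannot correlate with random signs on average, whereas adding its negation lets the class match the majority sign, which is the kind of richness the bounds quantify.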

International Colloquium on Automata, Languages, and Programming | 2017

Bisimulation Metrics for Weighted Automata

Borja Balle; Pascale Gourdeau; Prakash Panangaden

We develop a new bisimulation (pseudo)metric for weighted finite automata (WFA) that generalizes Boreale's linear bisimulation relation. Our metrics are induced by seminorms on the state space of WFA. Our development is based on spectral properties of sets of linear operators. In particular, the joint spectral radius of the transition matrices of WFA plays a central role. We also study continuity properties of the bisimulation pseudometric, establish an undecidability result for computing the metric, and give a preliminary account of applications to spectral learning of weighted automata.
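The joint spectral radius of a finite set of matrices Σ is ρ(Σ) = lim_{k→∞} max ||P||^{1/k}, the maximum taken over all products P of k matrices from Σ. For every k it is bracketed by max ρ(P)^{1/k} ≤ ρ(Σ) ≤ max ||P||^{1/k} over the same products, which gives a simple, exponential-cost way to bound it numerically. The NumPy sketch below uses a hypothetical helper jsr_bounds and applies it to two nilpotent matrices: each has spectral radius 0, yet their joint spectral radius is 1, illustrating that the joint quantity can differ sharply from the spectral radii of the individual matrices.

```python
import itertools
from functools import reduce

import numpy as np


def jsr_bounds(matrices, k):
    """Lower/upper bounds on the joint spectral radius from all length-k products:
    max rho(P)^(1/k) <= JSR <= max ||P||_2^(1/k)."""
    lower = upper = 0.0
    for combo in itertools.product(matrices, repeat=k):
        P = reduce(np.matmul, combo)
        lower = max(lower, float(np.max(np.abs(np.linalg.eigvals(P)))) ** (1.0 / k))
        upper = max(upper, float(np.linalg.norm(P, 2)) ** (1.0 / k))
    return lower, upper


A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
print(jsr_bounds([A, B], 1))  # individual matrices: lower bound 0, upper bound 1
print(jsr_bounds([A, B], 2))  # AB = diag(1, 0), so both bounds equal 1
```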

International Colloquium on Grammatical Inference | 2010

Learning PDFA with asynchronous transitions

Borja Balle; Jorge Castro; Ricard Gavaldà

In this paper, we extend the PAC learning algorithm due to Clark and Thollard for learning distributions generated by PDFA to automata whose transitions may take varying lengths of time, governed by exponential distributions.