Inference with Constrained Hidden Markov Models in PRISM
Henning Christiansen, Christian Theil Have, Ole Torp Lassen, Matthieu Petit
Research group PLIS: Programming, Logic and Intelligent Systems
Department of Communication, Business and Information Technologies
Roskilde University, P.O. Box 260, DK-4000 Roskilde, Denmark
(e-mail: {henning, cth, otl, petit}@ruc.dk)

submitted 7 February 2010; revised 10 April 2010; accepted 1 May 2010

Abstract
A Hidden Markov Model (HMM) is a common statistical model which is widely used for analysis of biological sequence data and other sequential phenomena. In the present paper we show how HMMs can be extended with side-constraints and present constraint solving techniques for efficient inference. Defining HMMs with side-constraints in Constraint Logic Programming has advantages in terms of more compact expression and pruning opportunities during inference. We present a PRISM-based framework for extending HMMs with side-constraints and show how well-known constraints such as cardinality and all_different are integrated. We experimentally validate our approach on the biologically motivated problem of global pairwise alignment.
KEYWORDS: Hidden Markov Model with side-constraints, Inference, PRogramming In Statistical Modeling
Note:
This article has been published in Theory and Practice of Logic Programming, 10(4-6), 449-464, © Cambridge University Press.
1 Introduction

Hidden Markov Models (HMMs) are one of the most popular models for analysis of sequential processes taking place in a random way, where "randomness" may also be an abstraction covering the fact that a detailed analytical model of the internal matters is unavailable. Such a sequential process can be observed from outside by its emission sequence (letters, sounds, measures of features, all kinds of signals) produced over time, and an HMM postulates a hypothesis about the internal machinery in terms of a finite state automaton equipped with probabilities for the different state transitions and single emissions. A common inference task for a given observed sequence is to compute the "best" state transitions that the HMM may go through to produce the sequence; these represent a best hypothesis for the internal structure or "content" of the sequence. HMMs are widely used in speech recognition and biological sequence analysis (Rabiner 1989; Durbin et al. 1998).
The efficiency of computations on HMMs depends heavily on the Markov property: decisions made during a process run depend only on a limited past. Dynamic programming algorithms, such as Viterbi and Forward-Backward, are then used to perform efficient inference. However, many problems require more complex dependencies among elements of the process. For example, it may be interesting to constrain an HMM to visit only different states or to limit the number of visits to a given state. It is possible to model the all_different constraint for the states visited by extending the underlying finite state automaton, but at the price of a factorial number of new states and with an obvious impact on inference. As an alternative to modifying the HMM structure, we instead extend the HMM with side-constraints (Sato and Kameya 2008; Roth and Yih 2005). However, classical algorithms, such as Viterbi, must be modified to take these side-constraints into account (Chang et al. 2008; Christiansen et al. 2009).

In this paper, we extend HMMs with side-constraints, leading to what we call Constrained HMMs (CHMMs). Side-constraints are external constraints declared in addition to those defined by the structure of an HMM. The concept of CHMMs was introduced by Sato et al. in (Sato and Kameya 2008), although earlier and unrelated systems have used the same or similar names (discussed in section 6). The contribution of this paper is to define CHMMs as constraint logic programs extended with probabilistic choices and to show how to employ this setting for more efficient Viterbi computation, i.e., computation of the most probable explanation of an observation. Moreover, defining HMMs with side-constraints in Constraint Logic Programming has advantages in terms of more compact expression and pruning opportunities during inference. We show how to implement CHMMs in PRISM (Sato and Kameya 1997) and how to integrate well-known constraints, such as cardinality and all_different, into this framework.
We validate our approach experimentally on the biologically motivated problem of global pairwise alignment.

The paper is organized as follows: section 2 gives background on HMMs. In section 3, we formally introduce the constraint model associated with a CHMM. Section 4 describes our PRISM-based framework for defining CHMMs. Section 5 presents an experimental validation. Finally, sections 6 and 7 present related work and conclusions.
2 Hidden Markov Models

Here we define Hidden Markov Models (HMMs) and illustrate their application to the problem of pairwise global alignment.
For simplicity of the technical definitions, we limit ourselves to a discrete Hidden Markov Model with a distinguished initial state.
Definition 2.1
A Hidden Markov Model (HMM) is a 4-tuple <S, A, T, E>, where

• S = {s_0, s_1, ..., s_m} is a set of states which includes an initial state referred to as s_0;
• A = {e_1, e_2, ..., e_k} is a finite set of emission symbols;
• T = {(p(s_0; s_1), ..., p(s_0; s_m)), ..., (p(s_m; s_1), ..., p(s_m; s_m))} is a set of transition probability distributions representing the probabilities to transit from one state to another;
• E = {(p(s_1; e_1), ..., p(s_1; e_k)), ..., (p(s_m; e_1), ..., p(s_m; e_k))} is a set of emission probability distributions representing the probabilities to emit each symbol from each state.

We define a run of an HMM as a pair consisting of a sequence of states s^(0) s^(1) ... s^(n), called a path, and a corresponding sequence of emissions e^(1) ... e^(n), called an observation, such that

• s^(0) = s_0;
• for all i, 0 <= i <= n-1, p(s^(i); s^(i+1)) > 0 (a transition from s^(i) to s^(i+1));
• for all i, 0 < i <= n, p(s^(i); e^(i)) > 0 (an emission of e^(i) from s^(i)).

The probability of such a run is defined as prod_{i=1..n} p(s^(i-1); s^(i)) * p(s^(i); e^(i)).

As an example of an HMM that we later extend with constraints, we consider the problem of aligning two sequences. Sequence alignment is among the most common tasks in computational biology, where it is used to align sequences assumed to have diverged from a common ancestor. Notice that we here use a so-called pair HMM (Durbin et al. 1998) which emits two sequences at the same time, and which is a straightforward extension of the definition above.

In the global alignment problem, two sequences x and y must be aligned optimally, based on a scoring scheme for comparison of different alignments.
In probabilistic modeling, a probability is associated with each pair of symbols emitted from a state, and similarly a probability for introducing gaps, δ, and for extending gaps, ε, in the alignment of the sequences is defined. The probability of an alignment is then the product of the probabilistic transitions performed to recognize the alignment. In biology, these probabilities are defined to reflect observed statistics about sequence mutations and conservation.

Fig. 1 shows an HMM capable of generating a pair of aligned sequences. When given two sequences to align, a path from the initial state, begin, such that the model emits the two sequences, corresponds to an alignment. The initial state, begin, does not emit symbols. The match state emits a pair of symbols (x_i, y_j), one for each sequence, corresponding to alignment of the symbol at position i in sequence x and the symbol at position j in sequence y. Emitted symbols can be identical or different; a difference represents a potential mutation between the two sequences. The insert state emits only the next symbol of sequence x, effectively aligning position x_i to a gap in y. Conversely, the delete state aligns a symbol y_j to a gap in sequence x.
Fig. 1. A pair HMM for pairwise global alignment of sequences. States, represented by squares for emitting states and circles for silent states, are connected by arrows representing transitions labeled with probabilities.

The following example shows an alignment of two short protein sequences, where the third line indicates the state sequence of this alignment, abbreviated with the first letter of each state name:
    Sequence x: H G K K G A A Q V
    Sequence y: K G P K K A Q A
    alignment : b i i i m m m d d m m m
In this context, a common task is to find the optimal alignment, i.e., a state sequence that recognizes the two sequences and has maximal probability. Another is to calculate the probability of observing an emission sequence. A third type of inference is parameter learning, where we are given a set of alignments and estimate the "best" parameters for the model, where best usually means that they maximize the likelihood of the alignments.
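To make the run probability of Definition 2.1 concrete, prod_{i=1..n} p(s^(i-1); s^(i)) * p(s^(i); e^(i)) can be computed directly. The following Python sketch is our own illustration (not part of the paper's PRISM code); the two-state model and its probabilities are made up for the example.

```python
def run_probability(path, observation, trans, emit):
    """Probability of a run: path = [s^(0), ..., s^(n)], observation = [e^(1), ..., e^(n)]."""
    assert len(path) == len(observation) + 1
    p = 1.0
    for i, e in enumerate(observation, start=1):
        # One factor p(s^(i-1); s^(i)) * p(s^(i); e^(i)) per step.
        p *= trans[path[i - 1]][path[i]] * emit[path[i]][e]
    return p

# A hypothetical two-state model ("init" plays the role of s_0):
trans = {"init": {"s0": 0.5, "s1": 0.5},
         "s0": {"s0": 0.7, "s1": 0.3},
         "s1": {"s0": 0.4, "s1": 0.6}}
emit = {"s0": {"a": 0.9, "b": 0.1},
        "s1": {"a": 0.2, "b": 0.8}}

# Path init -> s0 -> s1 emitting "a" then "b": 0.5*0.9 * 0.3*0.8 = 0.108
p = run_probability(["init", "s0", "s1"], ["a", "b"], trans, emit)
```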
3 Constrained Hidden Markov Models

In this section, we give a formal definition of CHMMs and propose a constraint model for CHMM runs. Then, the computation of the most probable path is adapted for CHMMs.
A CHMM extends an HMM with constraints that limit the set of valid runs andleave fewer paths to consider for any given sequence.
Definition 3.1
A constrained HMM (CHMM) is defined by a 5-tuple <S, A, T, E, C> where <S, A, T, E> is an HMM and C is a set of constraints, each of which is a mapping from HMM runs into {true, false}. A run of a CHMM, <path, observation>, is a run of the corresponding HMM for which C(path, observation) is true.

Notice that we define constraints in a highly abstract way, independently of any specific constraint language. In the following, constraints over finite domains (Van Hentenryck et al. 1995) are used, although CLP(Q) and CLP(R) could have been used as well.

We propose to model the runs of a CHMM by a constraint program over finite domains. In this context, a run of the CHMM is a solution of the constraint program. Let <S, A, T, E, C> be a CHMM and n the sequence length. A constraint program for runs is given by the following predicate:

    run([s^(0), S_1, ..., S_n], [E_1, ..., E_n])

where the variables S_i and E_i represent the state and the emission at step i. The domains of S_i and E_i are given as dom(S_i) = S \ {s_0} and dom(E_i) = A. The run predicate is specified as follows: run([s^(0), S_1, ..., S_n], [E_1, ..., E_n]) is true iff there exist s^(1) in dom(S_1), ..., s^(n) in dom(S_n) and e^(1) in dom(E_1), ..., e^(n) in dom(E_n) such that

    C(s^(0) s^(1) ... s^(n), e^(1) ... e^(n)) is true, s^(0) = s_0, and
    p(s^(0); s^(1)) * p(s^(1); e^(1)) * ... * p(s^(n-1); s^(n)) * p(s^(n); e^(n)) > 0.   (1)

Formula (1) states that s^(0) s^(1) ... s^(n) and e^(1) ... e^(n) form a run of the HMM that satisfies C. By the definition of run/2, (local) relationships between S_i and S_{i+1}, and between S_i and E_i, can be established, since the probability of a run must be positive. Indeed, a valuation of S_i to s^(i) and of S_{i+1} to s^(i+1) can be part of a solution of the constraint program whenever p(s^(i); s^(i+1)) > 0. These relationships between the variables of run/2 are captured by the constraints trans(S_{i-1}, S_i) and emit(S_i, E_i), for all i, 1 <= i <= n, where the S_i and E_i are the variables of run/2. These constraints are defined as follows:

• trans(S_i, S_{i+1}) is true iff there exist s^(i) in dom(S_i) and s^(i+1) in dom(S_{i+1}) such that p(s^(i); s^(i+1)) > 0;
• emit(S_i, E_i) is true iff there exist s^(i) in dom(S_i) and e^(i) in dom(E_i) such that p(s^(i); e^(i)) > 0.

Section 4 below shows an implementation of this framework such that a solution of the constraint program corresponds to a valid derivation of a PRISM program.
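The semantics of run/2 can be mirrored by a brute-force enumeration: generate every state/emission valuation, keep those where all transitions and emissions have positive probability (the trans and emit constraints) and where the side-constraints C hold. The sketch below is our own Python illustration with a made-up two-state model; it is exponential and exists only to pin down the semantics that the PRISM implementation of section 4 computes efficiently.

```python
from itertools import product

# Hypothetical model; "init" plays the role of s_0.
STATES = ["s0", "s1"]
SYMBOLS = ["a", "b"]
TRANS = {"init": {"s0": 0.5, "s1": 0.5},
         "s0": {"s0": 0.7, "s1": 0.3},
         "s1": {"s0": 0.4, "s1": 0.6}}
EMIT = {"s0": {"a": 0.9, "b": 0.1},
        "s1": {"a": 0.2, "b": 0.8}}

def valid_runs(n, constraints):
    """Yield (path, observation) pairs satisfying run/2 for sequence length n."""
    for path in product(STATES, repeat=n):            # valuations of S_1..S_n
        full = ("init",) + path
        if any(TRANS[full[i]].get(full[i + 1], 0) == 0 for i in range(n)):
            continue                                  # some trans(S_i, S_{i+1}) fails
        for obs in product(SYMBOLS, repeat=n):        # valuations of E_1..E_n
            if any(EMIT[path[i]].get(obs[i], 0) == 0 for i in range(n)):
                continue                              # some emit(S_i, E_i) fails
            if all(c(full, obs) for c in constraints):
                yield full, obs                       # a run of the CHMM

# A side-constraint in the style of cardinality_atmost: at most one visit to s1.
atmost_one_s1 = lambda path, obs: path.count("s1") <= 1
```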
We consider the HMM presented in section 2.2 and extend it into a CHMM by the following set of constraints:

    C = { cardinality_atmost(N_d, [S_1, ..., S_n], delete),
          cardinality_atmost(N_i, [S_1, ..., S_n], insert) }.

A constraint cardinality_atmost(N, L, X) is satisfied whenever L is a list of elements, out of which at most N are equal to X. In a biological context, it is reasonable to consider only alignments with a limited number of insertions and deletions, given the assumption that the two sequences are related.

As described above, we can consider this CHMM as a constraint program run([s^(0), S_1, ..., S_n], [E_1, ..., E_n]) where dom(S_i) = {match, delete, insert}, dom(E_i) = {A, C, D, ..., W, Y} (this set of letters refers to the 21 different amino acids from which proteins are composed) and the constraints C are as described above.

The Viterbi algorithm (Viterbi 1967) is a dynamic programming algorithm for finding a most probable path corresponding to a given observation. The algorithm keeps track of, for each prefix of an observed emission sequence, the most probable (partial) path leading to each possible state, and extends those step by step into longer paths, eventually covering the entire emission sequence. Here, we adapt this algorithm for CHMMs.

Consider a given observation e^(1) ... e^(n), a CHMM <S, A, T, E, C>, and its constraint program run([s^(0), S_1, ..., S_n], [e^(1), ..., e^(n)]). The most probable path is computed by finding the valuation s^(1), ..., s^(n) that maximizes the objective function: the probability of a run.

Computation of the most probable path for a CHMM is expressed as a rewriting system on a set of 5-tuples Σ. Each such 5-tuple is of the form <s, i, p, π, σ> where π is a partial path ending in state s and representing a path for the emission sequence prefix e^(1) ... e^(i); p is the computed probability for the emissions and transitions applied in the construction of π; and σ is the current constraint store, seen as a conjunction of constraints. Any ground and satisfied constraint is removed from the constraint store, and true refers to the empty conjunction. The set of solutions of a constraint store σ is denoted sol(σ).

The two rewriting rules in Fig. 2 describe an iteration step of the computation of the most probable path:

    trans_ctr: Σ := Σ ∪ {<s', i+1, p * p(s; s') * p(s'; e^(i+1)), π s', σ ∧ S_{i+1} = s'>}
        whenever <s, i, p, π, σ> ∈ Σ, p(s; s') > 0, p(s'; e^(i+1)) > 0,
        check_constraints(σ ∧ S_{i+1} = s') succeeds, and prune_ctr does not apply.

    prune_ctr: Σ := Σ \ {<s, i+1, p', π', σ'>}
        whenever there is another <s, i+1, p, π, σ> ∈ Σ with p >= p' and sol(σ') ⊆ sol(σ).

    Fig. 2. Rewriting rules for the computation of most probable paths for CHMMs.

The computation starts from an initial set of 5-tuples

    { <s^(0), 0, 1, ε, C ∧ trans(s^(0), S_1) ∧ ⋀_{1<=i<=n-1} trans(S_i, S_{i+1}) ∧ ⋀_{1<=i<=n} emit(S_i, e^(i))> }.   (2)

The trans_ctr rule expands an existing partial path one step in directions that preserve the satisfaction of the constraint store; this satisfiability check is denoted check_constraints (and thus depends on the particular C). The prune_ctr rule removes a partial solution that is not optimal for the current observation prefix and shares the same set of complete solutions with the better partial solution. The second condition is necessary in case no partial path contained in sol(σ) can be extended into a full path without violating the constraints. We take the following correctness property for granted.

Proposition 3.1
Assume a CHMM H with the notation as above and an observation Obs = e^(1) ... e^(n). When the Viterbi algorithm of Fig. 2 is executed from the initial set of 5-tuples given by formula (2), it terminates with a set of 5-tuples Σ_final. It holds that

• for any <s, n, p, π, true> ∈ Σ_final, π is a most probable path for Obs ending in s and with probability p;
• whenever there exists a path for Obs ending in s, Σ_final includes a 5-tuple of the form <s, n, p, π, true>.

Notice that all the variables of the constraint program are valuated when a final state is reached, and thus any final constraint store is equivalent to true (as trans_ctr prevents any inconsistent store from arising).

The classical Viterbi algorithm is guaranteed to run in time linear in the length of the given sequence, whereas our algorithm may in the worst case run in exponential time; this may occur if prune_ctr cannot be applied at all. (When any reference to constraints and the constraint store is removed from Fig. 2, we have a compact representation of one iteration step of the Viterbi algorithm for HMMs.) In other words, a representation of the constraint store that allows an efficient comparison as in "sol(σ') ⊆ sol(σ)" is essential for the practicability of our algorithm. On the other hand, for those problems that can be formulated as a CHMM with effective and efficient definitions of check_constraints and the comparison test, the Σ states may stay of a reasonable size. Notice that our algorithm is still correct if we use approximations of these tests; more specifically, check_constraints may occasionally return true when the correct answer is false, and the opposite for the comparison.

4 Defining CHMMs in PRISM

After briefly introducing PRISM, we propose a methodology for defining CHMMs in this framework.
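The constrained Viterbi iteration of Fig. 2 can also be read as ordinary code. The following Python sketch is our own illustration (made-up model and numbers): it specializes the constraint store σ to a single cardinality_atmost constraint, tracked as the number of remaining allowed visits to one designated state, so that sol(σ') ⊆ sol(σ) reduces to comparing remaining allowances.

```python
def constrained_viterbi(obs, trans, emit, limited_state, max_visits):
    """Most probable path under 'visit limited_state at most max_visits times'.
    Tuples are (state, prob, path, remaining) -- the <s, i, p, pi, sigma> of
    Fig. 2, with the step index i implicit in the loop."""
    tuples = [("init", 1.0, (), max_visits)]
    for e in obs:
        extended = []
        for s, p, path, rem in tuples:                # trans_ctr expansions
            for s2, pt in trans[s].items():
                pe = emit[s2].get(e, 0.0)
                if pt == 0.0 or pe == 0.0:
                    continue
                rem2 = rem - (s2 == limited_state)
                if rem2 < 0:                          # check_constraints fails
                    continue
                extended.append((s2, p * pt * pe, path + (s2,), rem2))
        # prune_ctr: drop a tuple if a kept tuple for the same state has
        # probability >= and a store with at least as many solutions (rem >=).
        extended.sort(key=lambda t: -t[1])
        kept = []
        for cand in extended:
            if not any(k[0] == cand[0] and k[1] >= cand[1] and k[3] >= cand[3]
                       for k in kept):
                kept.append(cand)
        tuples = kept
    return max(tuples, key=lambda t: t[1], default=None)

# Hypothetical two-state model for illustration.
TRANS = {"init": {"s0": 0.5, "s1": 0.5},
         "s0": {"s0": 0.7, "s1": 0.3},
         "s1": {"s0": 0.4, "s1": 0.6}}
EMIT = {"s0": {"a": 0.9, "b": 0.1},
        "s1": {"a": 0.2, "b": 0.8}}
```

With max_visits large enough the constraint is vacuous and the result coincides with ordinary Viterbi; with max_visits = 0 the designated state is excluded altogether, pruning every branch that enters it.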
PRISM (Sato and Kameya 2008) is a powerful system for working with probabilistic-logic models, based on an extension of Prolog with discrete random variables, called multi-valued switches. We illustrate this with a simple example HMM with two states s0 and s1. A switch declaration

    values(x,O).

associates the named random variable x with a set of outcomes O. Whenever the goal msw(x,X) is called from the program, a probabilistic choice is made, unifying X with an element of O. Switches can also be defined in a parametric form,

    values(emit(_),[a,b]).    % symbol emission
    values(trans(_),[s0,s1]). % state transition

where each declaration defines a family of switches, one for each possible instance of emit(_) and trans(_), and each instance is given a distinct probability distribution. This parametrization can serve to model dependencies: in our HMM example we define the parameters to be the states s0 and s1 (plus init for trans(_)), thus defining emissions and transitions for each state with the Markov property. Finally, we define a logic program to implement the probabilistic model:

    hmm(L) :- run_length(T), hmm(T,init,L).

    hmm(0,_,[]).
    hmm(T,State,[Emit|EmitRest]) :-
        T > 0,
        msw(trans(State),NextState),
        msw(emit(NextState),Emit),
        T1 is T-1,
        hmm(T1,NextState,EmitRest).

    run_length(10).

Here, a derivation of the goal hmm corresponds to what we define as a run in section 2.1. As shown by (Sato 1995), Prolog's traditional Herbrand model semantics generalizes immediately to a probabilistic semantics when probabilities are given for the outcomes of the switches (i.e., for how msw is used in the program). Thus a PRISM program defines a probabilistic model that provides a probability distribution for all goals that can be formulated in the program's logical language. PRISM assigns each possible derivation of a goal a probability defined as the product of the probabilities of the selected switch outcomes used in the derivation.
Under normal conditions, the sum of the probabilities of all possible derivations of such a goal is unity, but these conditions can be violated in a constrained model. If a program attempts to unify the stochastically selected outcome of a switch with some other value distinct from that outcome, the unification fails, resulting in a failed derivation.

PRISM includes built-in mechanisms for efficient probabilistic inference based on tabling. During inference, once a probabilistic goal has been solved, its answers are put in a global table. Later calls to the same goal simply look up the answer in the table in constant time. PRISM utilizes this to provide an efficient generalized Viterbi algorithm that may be used for the computation of the most likely successful derivation for a large number of probabilistic models, including HMMs. PRISM also includes similar utilities for calculating the probability of a derivation or a set of such, and machine learning algorithms which produce the most likely probabilities for switch outcomes in order to explain a set of observed goals.
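The generative reading of the example program can be paraphrased in Python (our sketch, not PRISM; the switch probabilities are made-up parameters, since the PRISM fragment shown declares only outcome spaces): each msw call draws from its switch's distribution, and the probability of the resulting derivation is the product of the probabilities of the drawn outcomes.

```python
import random

# Hypothetical parameters for the switches declared by values/2.
SWITCHES = {
    ("trans", "init"): {"s0": 0.5, "s1": 0.5},
    ("trans", "s0"):   {"s0": 0.7, "s1": 0.3},
    ("trans", "s1"):   {"s0": 0.4, "s1": 0.6},
    ("emit", "s0"):    {"a": 0.9, "b": 0.1},
    ("emit", "s1"):    {"a": 0.2, "b": 0.8},
}

def msw(switch, rng):
    """Draw an outcome for a switch; return it with its probability."""
    dist = SWITCHES[switch]
    outcome = rng.choices(list(dist), weights=list(dist.values()))[0]
    return outcome, dist[outcome]

def hmm(t, state, rng):
    """Mirrors the hmm/3 clauses: returns (emissions, derivation probability)."""
    if t == 0:
        return [], 1.0
    next_state, p1 = msw(("trans", state), rng)
    emitted, p2 = msw(("emit", next_state), rng)
    rest, p = hmm(t - 1, next_state, rng)
    return [emitted] + rest, p1 * p2 * p

emissions, prob = hmm(10, "init", random.Random(0))  # one sampled run of length 10
```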
We have implemented a framework for the integration of side-constraints in a PRISM program. The framework has been used for adding constraints to HMM-based models, but it should be possible to extend it to other kinds of models. The underlying idea is that the program is augmented with a constraint store, and a constraint checker goal is inserted in a few strategic places of the PRISM program. This constraint checking is the direct implementation of the check_constraints of trans_ctr. The prune_ctr implementation is not discussed, as we use the tabling mechanism of PRISM to prune the search space.
This section describes how our framework can be integrated in a PRISM program. As an example, we consider an implementation of the HMM from the previous section. (The current implementation of the framework is available via http://akira.ruc.dk/~cth/chmm) Below, the central recursive predicate of the implementation is shown extended with constraint checking:

    hmm(T,State,[Emit|EmitRest],StoreIn) :-
        T > 0,
        msw(trans(State),NextState),
        msw(emit(NextState),Emit),
        check_constraints([NextState,Emit],StoreIn,StoreOut),
        T1 is T-1,
        hmm(T1,NextState,EmitRest,StoreOut).

Integration of side-constraint checking is done by extending the relevant predicates with an extra parameter (StoreIn,StoreOut in the code above) to accommodate a constraint store, and a call to the check_constraints goal (line 5), after each distinct sequence of msw applications. If check_constraints fails during PRISM inference, then the corresponding PRISM derivation fails, and further extensions of this derivation will not be attempted, since it does not constitute a valid run. In effect, inference by PRISM will only consider runs which are guaranteed not to violate any of the constraints declared for the model.

Declaration of constraints and implementation of constraint solvers are conceptually decoupled from the PRISM model. The declaration of side-constraints on the model is done by declaring facts of the form constraint(ConstraintSpec). The ConstraintSpec associates the constraint with a constraint checker implementation and may contain parameters for this particular instance of the type of constraint.

A satisfiability checker maintains its own constraint store. A satisfiability checker for a particular type of constraint consists of an init_constraint_store/2 rule and one or more check_sat/4 rules. The init_constraint_store/2 rule is used to create a starting point for the constraint store of each declared constraint and is of the form

    init_constraint_store(ConstraintSpec, InitialStore).
It is given the ConstraintSpec and must unify InitialStore with an initial constraint store matching the ConstraintSpec. Additionally, one or more check_sat rules of the form

    check_sat(ConstraintSpec,StateUpdate,StoreBefore,StoreAfter) :- ... .

must be implemented to check the satisfiability of the constraint. As an example, consider an implementation of a cardinality_atmost constraint, called cardinality in our framework:

    init_constraint_store(cardinality(_,_), 0).

    check_sat(cardinality(U,Max), U, VisitsIn, VisitsOut) :-
        VisitsOut is VisitsIn + 1,
        VisitsOut =< Max.
    check_sat(cardinality(X,_), U, S, S) :- X \= U.
Each time check_constraints is called from the PRISM model, the relevant check_sat goals are called for each declared constraint. If any of these fails, so does check_constraints. StateUpdate and StoreBefore are given, and check_constraints is expected to unify StoreAfter with an updated constraint store. In our example HMM, the StateUpdate consists of the [State,Emit] pattern given to check_constraints. The call to this rule must only succeed if the constraint given by ConstraintSpec is not violated by the further information given by the StateUpdate.
The tabling mechanism in PRISM makes Viterbi computation and EM learning efficient, but when extra parameters such as the constraint store are introduced in the probabilistic goals, PRISM considers these as goals with distinct derivations and stores a tabled entry for each version of the goal. This behavior is undesired when the extra parameters are used only for internal bookkeeping. The effect of this excessive tabling is that the dynamic programming advantages are lost, with exponential time inference as a consequence.

In (Christiansen and Gallagher 2009), a related problem concerning tabling of annotations produced by running Viterbi on PRISM programs is approached using a program transformation that removes non-discriminating arguments, which do not affect the control flow. The annotation can then be recovered from the derivation of the transformed program. This approach is not applicable to the constraint store argument, because the constraint store implicitly affects control flow by limiting possible future derivation extensions. The constraint store has to be considered in the inference process; otherwise it would be possible to produce invalid derivation paths.

B-Prolog, on which PRISM is based, supports table modes, but these are not directly usable with probabilistic goals in PRISM. These modes make it possible to declare an argument of a tabled goal as an output argument, which means that it will not be used as a key in the table lookup, but will be unified with the value of the argument stored in a tabled goal. For our purpose, declaring the constraint store arguments as output arguments would not be feasible, since different derivations of the same goal may have differing constraint stores, and these determine possible derivation extensions.

To deal with the tabling problem we have introduced a separate constraint store stack, which avoids storing data locally in parameters of probabilistic goals by maintaining the constraint store with assert and retract.
This stack is maintained in parallel to the derivation stack of Prolog. PRISM utilizes Prolog's backtracking to explore possible solutions, so the constraint store stack implementation must be able to restore a previous constraint store when PRISM encounters failures during inference and backtracks to find alternative solutions.

To utilize this functionality, the user should use the goal check_constraints/1, which omits the store arguments, rather than check_constraints/3 as stated above. We then define check_constraints/1 as

    check_constraints(StateUpdate) :-
        get_store(StoreBefore),
        check_constraints(StateUpdate,StoreBefore,StoreAfter),
        forward_store(StoreAfter).

The new check_constraints/1 makes use of the goal get_store/1 to retrieve the current version of the constraint store, and forward_store/1 is used to assert the updated store:

    get_store(S) :- !, store(S).

    forward_store(S) :-
        ( asserta(store(S))
        ; retract(store(S)), fail
        ).
If a derivation fails, PRISM backtracks to the choice point in the forward_store rule and retracts the most recently asserted store. Then, when exploring alternative derivation extensions, the previously asserted constraint store is used as expected.
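The effect of the asserta/retract pair can be pictured with an explicit stack. The following is a Python analogue of ours, not the framework's code: the store is pushed when a check succeeds and popped on backtracking, so every branch of the search sees exactly the store built up along its own partial derivation.

```python
class StoreStack:
    """Constraint store stack kept in parallel with the derivation."""
    def __init__(self, initial):
        self._stack = [initial]
    def get_store(self):                  # like get_store/1
        return self._stack[-1]
    def forward_store(self, store):       # like asserta(store(S))
        self._stack.append(store)
    def backtrack(self):                  # like retract(store(S)) on failure
        self._stack.pop()

def explore(stack, updates, depth, limit):
    """Count update sequences of the given depth with at most `limit`
    'insert' updates; the store (an insert counter) is pushed before each
    recursive call and popped when the search backtracks out of it."""
    if depth == 0:
        return 1
    solutions = 0
    for u in updates:
        store = stack.get_store() + (u == "insert")
        if store > limit:
            continue                      # check_constraints fails; prune branch
        stack.forward_store(store)
        solutions += explore(stack, updates, depth - 1, limit)
        stack.backtrack()
    return solutions
```

For example, explore(StoreStack(0), ["match", "insert"], 3, 1) counts the 4 length-3 sequences with at most one insert, and the stack is back to its single initial entry afterwards.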
Due to tabling, PRISM guarantees the familiar best known complexity bounds for common inference tasks on a variety of the models that can be expressed in PRISM, which includes HMMs (Sato 2000). This implicitly limits the number of calls to check_constraints to the same bound. The added complexity of constraint checking depends on the incremental constraint checking cost of the individual constraint checkers and the number of constraints expressed on the model.

Space complexity is influenced by table space usage and the maximal length of a derivation at any given point. Since the asserted constraint store stack contains a constraint store fixpoint for each step of the current derivation, it is bounded by O(n max(|c|)) where n is the length of the sequence and max(|c|) is the maximal size of the constraint store in any derivation step. Note that the space complexity of the separate constraint store stack is unaffected by time complexity and the number of states in the model. With more complex models like the pair HMM, the table space required for dynamic programming becomes the dominating concern.

5 Experimental validation

In this section, we validate our CHMM implementation with the pair HMM presented in section 2.2. The experiments were run on a computer with 16 2.4 GHz 64-bit Intel Xeon(R) E7340 CPUs and 64 GB of memory. All of the experiments utilized only a single processor at a time.

Our experiments utilize implementations of some common constraints adapted for the CHMM framework: cardinality(UpdatePatterns,Max) ensures that entries from the list UpdatePatterns occur at most Max times in the derivation sequence; alldiff ensures that all updates in a derivation are different; lock_to_sequence(Seq) ensures that the sequence of derivation updates is identical to the sequence represented by the list Seq; lock_to_set(Set) ensures that all updates belong to members of the list Set. The operator forall_subseq(L,C) applies the constraint C to every subsequence of length L in the derivation sequence, and for_range(From,To,C) applies C to the updates in positions From to To, both inclusive; state_specific(C) applies C only to the State part of the update.
The addition of side-constraints to an HMM involves some computational overhead in order to check the satisfiability of the constraints, but it may also reduce the number of possible solutions and therefore the amount of work required to find the optimal path. As a practical experiment to demonstrate this, we consider global alignment with the pair HMM discussed in section 2.2.

The overhead of integrating the constraint checking machinery in the model is demonstrated in the left part of Fig. 3, where sequences of increasing length are aligned. It can be observed that the running time penalty is a constant factor and that the polynomial time complexity of the pair HMM is preserved in our framework. Obviously, polynomial time inference presupposes incremental constraint checking to be a constant time operation, which may not be the case for certain types of constraints.

In the right part of Fig. 3, two sequences of equal length (32) are aligned, but with varying amounts of constraints being enforced. The global cardinality constraint is used to enforce an upper limit, L, on the number of inserts or deletes in the alignment:

    constraint(state_specific(cardinality([insert,delete],L))).

By constraining the alignment (allowing fewer gaps), the space of viable solutions is reduced. The more constrained the alignment is, the more pruning opportunities arise; with a large amount of pruning opportunities, the running time is reduced quite significantly. Note that, since the imposed constraint is state specific, the number of possible alignments, and hence the running time, is unaffected by the input sequence structure.

Fig. 3. Left: Running time of alignment with a pure pair HMM compared to alignment with a CHMM with no constraints enforced. Right: Running time of alignment of two sequences of length 32 with varying amounts of allowed insertions and deletions.
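To make the pruning mechanism concrete, the following sketch shows Viterbi decoding with an incremental cardinality check threaded through the dynamic programming table. This is a minimal Python illustration of the idea, not the paper's PRISM implementation; the toy model and all names in it are our own. A partial path whose count of constrained states already exceeds the bound is discarded immediately, which is the kind of pruning reflected in the right part of Fig. 3:

```python
def constrained_viterbi(obs, states, start_p, trans_p, emit_p,
                        counted_states, max_count):
    """Viterbi decoding with a cardinality side-constraint: any partial
    path that uses states from `counted_states` more than `max_count`
    times is pruned as soon as the bound is exceeded."""
    # DP table: (state, count-so-far) -> (probability, best path)
    table = {}
    for s in states:
        c = 1 if s in counted_states else 0
        if c <= max_count:                       # prune at initialisation
            table[(s, c)] = (start_p[s] * emit_p[s][obs[0]], [s])
    for o in obs[1:]:
        new_table = {}
        for (s, c), (p, path) in table.items():
            for s2 in states:
                c2 = c + (1 if s2 in counted_states else 0)
                if c2 > max_count:               # constraint violated: prune
                    continue
                p2 = p * trans_p[s][s2] * emit_p[s2][o]
                if (s2, c2) not in new_table or p2 > new_table[(s2, c2)][0]:
                    new_table[(s2, c2)] = (p2, path + [s2])
        table = new_table
    if not table:
        return None                              # no path satisfies the constraint
    return max(table.values(), key=lambda v: v[0])

# Toy two-state model (hypothetical numbers, for illustration only).
states = ["match", "insert"]
start_p = {"match": 0.8, "insert": 0.2}
trans_p = {"match": {"match": 0.7, "insert": 0.3},
           "insert": {"match": 0.6, "insert": 0.4}}
emit_p = {"match": {"a": 0.9, "b": 0.1},
          "insert": {"a": 0.2, "b": 0.8}}

# Forbid inserts entirely (bound 0): only the all-match path survives.
best = constrained_viterbi(list("abba"), states, start_p, trans_p, emit_p,
                           {"insert"}, 0)
```

Because the cardinality count becomes part of the table key, the table grows with the bound; the payoff is that every extension violating the constraint is cut off before it can spawn further derivations.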
To verify the efficiency of our constraint store implementation, alignment with a local cardinality constraint was measured for different sizes of input sequences. From the measurements, which are reported in Fig. 4, it is apparent that our implementation does not incur the same exponential overhead as the naive implementation, where the constraint store is maintained in the goals and hence tabled.

Fig. 4. A comparison of the running time in milliseconds (left) and total memory use in bytes (right) of constrained alignment of two sequences of increasing length, with tabled constraints ("constraint store in goals") versus a separate constraint store stack.

Running times and memory usage for a range of different constraints are reported in Table 1. For the sake of completeness, the table also includes running times for the version where the constraint store is tabled.
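The blow-up caused by tabling the constraint store can be illustrated by counting memo entries in a toy chain of derivation steps. The following is our own schematic Python sketch (names and numbers hypothetical), not the PRISM machinery: when the store, here just a bounded insert counter, is part of the memoisation key, the table holds one entry per reachable (position, count) pair; with the store kept separately, only the position is tabled:

```python
def tabled_entries(n, bound, store_in_key):
    """Count distinct memo entries for n derivation steps, where each
    step may bump an insert counter that must stay within `bound`."""
    seen = set()

    def walk(i, count):
        # The memo key either includes the constraint store (`count`)
        # or only the derivation position `i`.
        key = (i, count) if store_in_key else i
        if i == n or key in seen:
            return
        seen.add(key)
        walk(i + 1, count)               # step without an insert
        if count < bound:
            walk(i + 1, count + 1)       # step with an insert

    walk(0, 0)
    return len(seen)

# Store inside the key: O(n * bound) entries; store outside: O(n).
in_goals = tabled_entries(30, 10, store_in_key=True)
separate = tabled_entries(30, 10, store_in_key=False)
```

In the actual system the separate store is pushed and popped on a stack in step with the derivation, so satisfiability is still checked incrementally even though the store never enters the tabled goals.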
                                                     Sequence   Running time (ms)     Memory (kb)
Constraint                                           lengths    in goals  separate   in goals  separate
cardinality([insert],20)                                50        15460      3176      42296      5723
cardinality([insert],40)                                50        29557      3968      93845      6703
for_range(1,50,lock_to_set([match]))                   100        24649      4544     105498      7137
for_range(1,90,lock_to_set([match]))                   100           20        48       1641      1198
for_range(1,50,lock_to_sequence([match,..,match]))     100        24829      4544       1641      1198
for_range(1,90,lock_to_sequence([match,..,match]))     100           20        48     105498      7137
alldiff                                                 20       100442        28      85654       256
forall_subseqs(5,alldiff)                               10         1664        12      60098       137
Table 1. Running time and memory consumption for alignment with different kinds of constraints.

In most cases the separate constraint store performs better in terms of both running time and memory consumption. In the cases where performance is worse, this can be attributed to a very small number of possible derivations or to constraints which rarely change the store.

The term "Constrained HMM" is used in (Roweis 1999; Landwehr et al. 2007), where it refers to restrictions on the finite automaton associated with an HMM but not to constraints on HMM runs. In (Sato and Kameya 2008), CHMMs were introduced to exemplify an EM algorithm suited for PRISM programs that allow the possibility of derivation failures. Our approach differs, as we augment PRISM programs with side-constraints and use constraint solving techniques to achieve efficient inference. In (Riezler 1998), Riezler proposes techniques for inference in probabilistic constraint logic programming. In (Costa et al. 2008), relationships between elements of a Bayesian network are expressed as a constraint logic program, which is similar to the way we define HMMs. However, the focus of our paper differs, as we study the benefit of checking satisfiability of side-constraints during inference.

In the natural language processing community, recent work on Constrained Conditional Models features an approach similar to ours. Constrained Conditional Models constitute a general framework that augments inference and learning of conditional models with declarative constraints (Chang et al. 2008). However, inference is there expressed as an Integer Linear Programming problem (Roth and Yih 2005), and in this context more expressive constraints, such as cardinality or all_different, cannot be imposed on an HMM run. Moreover, our PRISM-based implementation allows us to define the HMM structure separately from the side-constraints and to use advanced constraint solving techniques.
In this paper, we propose a framework to define HMMs with side-constraints as a Constraint Logic program extended by probabilistic choices. Constraint Logic Programming has advantages in terms of more compact expression of CHMMs. Inference computations are adapted for CHMMs and conditions for efficient computation are described. An implementation based on PRISM is proposed, and well-known constraints and operators have been demonstrated for defining CHMMs. Finally, we experimentally validate our approach with a constrained pair HMM used for biological sequence alignment.

As current work, we study how sampling and EM-learning can be adapted for our CHMM framework. Indeed, sampling turns out to be problematic in probabilistic models with a large probability of derivation failure. In (Sato et al. 2005), Sato et al. address the problem of EM-learning with PRISM programs that can fail, and their methods are also applicable to our framework.

As further work, we plan to incorporate more advanced constraint solving techniques, such as those used in Weighted CSP (Larrosa and Schiex 2004), into the framework. This approach would allow us to combine soft constraint solving and inference and express this as an optimization problem. We also plan to deal with the restriction that individual constraint checkers do not share information in our framework, so that we can benefit from some of the optimization techniques used by other constraint solvers. We are working on extending the library of constraints that can be defined as side-constraints.
Acknowledgment
This work is supported by the project Logic-statistic modeling and analysis of biological sequence data funded by the NABIIT program under the Danish Strategic Research Council. We thank the anonymous reviewers for their interesting comments.
References
Chang, M.-W., Ratinov, L.-A., Rizzolo, N., and Roth, D. 2008. Learning and inference with constraints. In Proc. of the AAAI Conference on Artificial Intelligence. Chicago, USA, 1513–1518.

Christiansen, H. and Gallagher, J. 2009. Non-discriminating arguments and their uses. In Proc. of the International Conference on Logic Programming. Pasadena, USA, 55–69.

Christiansen, H., Have, C. T., Lassen, O. T., and Petit, M. 2009. In Proc. of the International Workshop on Constraint Based Methods for Bioinformatics. Lisbon, Portugal, 19–26.

Costa, V. S., Page, D., and Cussens, J. 2008. CLP(BN): Constraint logic programming for probabilistic knowledge. In Probabilistic Inductive Logic Programming. LNAI 4911. Springer, 156–188.

Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. 1998. Biological Sequence Analysis. Cambridge University Press.

Landwehr, N., Mielikäinen, T., Eronen, L., Toivonen, H., and Mannila, H. 2007. Constrained hidden Markov models for population-based haplotyping. BMC Bioinformatics 8, Suppl. 2.

Larrosa, J. and Schiex, T. 2004. Solving weighted CSP by maintaining arc consistency. Artificial Intelligence 159, 1–26.

Rabiner, L. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 2, 257–286.

Riezler, S. 1998. Probabilistic Constraint Logic Programming. Ph.D. thesis, University of Tübingen.

Roth, D. and Yih, W. 2005. Integer linear programming inference for conditional random fields. In Proc. of the International Conference on Machine Learning. Bonn, Germany, 737–744.

Roweis, S. 1999. Constrained hidden Markov models. In Proc. of the International Conference on Advances in Neural Information Processing Systems. Denver, USA, 782–788.

Sato, T. 1995. A statistical learning method for logic programs with distribution semantics. In Proc. of the International Conference on Logic Programming. Tokyo, Japan, 715–729.

Sato, T. In Proc. of the Workshop on Fusion of Domain Knowledge with Data for Decision Support. Tokyo, Japan.

Sato, T., Kameya, Y., and Zhou, N.-F. 2005. Generative modeling with failure in PRISM. In Proc. of the International Joint Conference on Artificial Intelligence. Edinburgh, Scotland, 847–852.

Sato, T. and Kameya, Y. 1997. PRISM: A language for symbolic-statistical modeling. In Proc. of the International Joint Conference on Artificial Intelligence. Nagoya, Japan, 1330–1335.

Sato, T. and Kameya, Y. 2008. New advances in logic-based probabilistic modeling by PRISM. In Probabilistic Inductive Logic Programming. LNCS 4911. Springer, 118–155.

Van Hentenryck, P., Saraswat, V., and Deville, Y. 1995. Design, implementation, and evaluation of the constraint language cc(FD). In Constraint Programming: Basics and Trends. LNCS 910. Springer, 293–316.

Viterbi, A. J. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13, 2, 260–269.