Inference with Constrained Hidden Markov Models in PRISM
Henning Christiansen, Christian Theil Have, Ole Torp Lassen, Matthieu Petit
Research group PLIS: Programming, Logic and Intelligent Systems
Department of Communication, Business and Information Technologies
Roskilde University, P.O. Box 260, DK-4000 Roskilde, Denmark
(e-mail: {henning, cth, otl, petit}@ruc.dk)

submitted 7 February 2010; revised 10 April 2010; accepted 1 May 2010

Abstract
A Hidden Markov Model (HMM) is a common statistical model which is widely used for analysis of biological sequence data and other sequential phenomena. In the present paper we show how HMMs can be extended with side-constraints and present constraint solving techniques for efficient inference. Defining HMMs with side-constraints in Constraint Logic Programming has advantages in terms of more compact expression and pruning opportunities during inference. We present a PRISM-based framework for extending HMMs with side-constraints and show how well-known constraints such as cardinality and all_different are integrated. We experimentally validate our approach on the biologically motivated problem of global pairwise alignment.
KEYWORDS: Hidden Markov Model with side-constraints, Inference, PRogramming In Statistical Modeling
Note:
This article has been published in Theory and Practice of Logic Programming, 10(4-6), 449-464, © Cambridge University Press.
1 Introduction

Hidden Markov Models (HMMs) are one of the most popular models for analysis of sequential processes taking place in a random way, where "randomness" may also be an abstraction covering the fact that a detailed analytical model of the internal matters is unavailable. Such a sequential process can be observed from outside by its emission sequence (letters, sounds, measures of features, all kinds of signals) produced over time, and an HMM postulates a hypothesis about the internal machinery in terms of a finite state automaton equipped with probabilities for the different state transitions and single emissions. A common inference task for a given observed sequence is to compute the "best" state transitions that the HMM may go through to produce the sequence; these represent a best hypothesis for the internal structure or "content" of the sequence. HMMs are widely used in speech recognition and biological sequence analysis (Rabiner 1989; Durbin et al. 1998).
The efficiency of computations on HMMs depends heavily on the Markov property: decisions made during a process run depend only on a limited past. Dynamic programming algorithms, such as Viterbi and Forward-Backward, are then used to perform efficient inference. However, many problems require more complex dependencies among elements of the process. For example, it may be interesting to constrain an HMM to visit only different states or to limit the number of visits to a given state. It is possible to model the all_different constraint for the states visited by extending the underlying finite state automaton, but at the price of a factorial number of new states and with an obvious impact on inference. As an alternative to modifying the HMM structure, we instead extend the HMM with side-constraints (Sato and Kameya 2008; Roth and Yih 2005). However, classical algorithms, such as Viterbi, must be modified to take these side-constraints into account (Chang et al. 2008; Christiansen et al. 2009).

In this paper, we extend HMMs with side-constraints, leading to what we call Constrained HMMs (CHMMs). Side-constraints are external constraints declared in addition to those defined by the structure of an HMM. The concept of CHMMs was introduced by Sato et al. in (Sato and Kameya 2008), although earlier and unrelated systems have used the same or similar names (discussed in section 6). The contribution of this paper is to define CHMMs as constraint logic programs extended with probabilistic choices and to show how to employ this setting for more efficient Viterbi computation, i.e., computation of the most probable explanation of an observation. Moreover, defining HMMs with side-constraints in Constraint Logic Programming has advantages in terms of more compact expression and pruning opportunities during inference. We show how to implement CHMMs in PRISM (Sato and Kameya 1997) and how to integrate well-known constraints, such as cardinality and all_different, into this framework.
We validate our approach experimentally on the biologically motivated problem of global pairwise alignment.

The paper is organized as follows: section 2 gives background on HMMs. In section 3, we formally introduce the constraint model associated with a CHMM. Section 4 describes our PRISM-based framework for defining CHMMs. Section 5 presents an experimental validation. Finally, sections 6 and 7 present related work and conclusions.
2 Hidden Markov Models

Here we define Hidden Markov Models (HMMs) and illustrate their application to the problem of pairwise global alignment.
For simplicity of the technical definitions, we limit ourselves to a discrete Hidden Markov Model with a distinguished initial state.
Definition 2.1
A Hidden Markov Model (HMM) is a 4-tuple <S, A, T, E>, where

• S = {s_0, s_1, ..., s_m} is a set of states which includes an initial state referred to as s_0;
• A = {e_1, e_2, ..., e_k} is a finite set of emission symbols;
• T = {(p(s_0; s_1), ..., p(s_0; s_m)), ..., (p(s_m; s_1), ..., p(s_m; s_m))} is a set of transition probability distributions representing the probabilities to transit from one state to another;
• E = {(p(s_1; e_1), ..., p(s_1; e_k)), ..., (p(s_m; e_1), ..., p(s_m; e_k))} is a set of emission probability distributions representing the probabilities to emit each symbol from each state.

We define a run of an HMM as a pair consisting of a sequence of states s^(0) s^(1) ... s^(n), called a path, and a corresponding sequence of emissions e^(1) ... e^(n), called an observation, such that

• s^(0) = s_0;
• for all i, 0 <= i <= n-1, p(s^(i); s^(i+1)) > 0 (a transition from s^(i) to s^(i+1));
• for all i, 0 < i <= n, p(s^(i); e^(i)) > 0 (an emission of e^(i) from s^(i)).

The probability of such a run is defined as prod_{i=1..n} p(s^(i-1); s^(i)) * p(s^(i); e^(i)).

As an example of an HMM that we later extend with constraints, we consider the problem of aligning two sequences. Sequence alignment is among the most common tasks in computational biology, where it is used to align sequences assumed to have diverged from a common ancestor. Notice that we here use a so-called pair HMM (Durbin et al. 1998) which emits two sequences at the same time, and which is a straightforward extension of the definition above.

In the global alignment problem, two sequences x and y must be aligned optimally, based on a scoring scheme for comparison of different alignments.
In probabilistic modeling, a probability is associated with each pair of symbols emitted from a state, and similarly a probability for introducing gaps, δ, and for extending gaps, ε, in the alignment of the sequences is defined. The probability of an alignment is then the product of the probabilistic transitions performed to recognize the alignment. In biology, these probabilities are defined to reflect observed statistics about sequence mutations and conservation.

Fig. 1 shows an HMM capable of generating a pair of aligned sequences. When given two sequences to align, a path from the initial state, begin, such that the model emits the two sequences, corresponds to an alignment. The initial state, begin, does not emit symbols. The match state emits a pair of symbols (x_i, y_j), one for each sequence, corresponding to alignment of the symbol at position i in sequence x and the symbol at position j in sequence y. Emitted symbols can be identical or different; a difference represents a potential mutation between the two sequences. The insert state emits only the next symbol of sequence x, effectively aligning position x_i to a gap in y. Conversely, the delete state aligns a symbol y_j to a gap in sequence x.
Fig. 1. A pair HMM for pairwise global alignment of sequences. States, represented by squares for emitting states and circles for silent states, are connected by arrows representing transitions labeled with probabilities.

The following example shows an alignment of two short protein sequences, where the third line indicates the state sequence of this alignment, abbreviated with the first letter of each state name:
    Sequence x: H G K K G A A Q V
    Sequence y: K G P K K A Q A
    alignment : b i i i m m m d d m m m
In this context, a common task is to find the optimal alignment, i.e., a state sequence that recognizes the two sequences and has maximal probability. Another is to calculate the probability of observing an emission sequence. A third type of inference is parameter learning, where we are given a set of alignments and estimate the "best" parameters for the model, where best usually means that they maximize the likelihood of the alignments.
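To make the run probability of Definition 2.1 concrete, prod_{i=1..n} p(s^(i-1); s^(i)) * p(s^(i); e^(i)) can be computed directly. The following Python sketch is our own illustration (not part of the paper's PRISM code); the two-state model and its probabilities are made up for the example.

```python
def run_probability(path, observation, trans, emit):
    """Probability of a run: path = [s^(0), ..., s^(n)], observation = [e^(1), ..., e^(n)]."""
    assert len(path) == len(observation) + 1
    p = 1.0
    for i, e in enumerate(observation, start=1):
        # One factor p(s^(i-1); s^(i)) * p(s^(i); e^(i)) per step.
        p *= trans[path[i - 1]][path[i]] * emit[path[i]][e]
    return p

# A hypothetical two-state model ("init" plays the role of s_0):
trans = {"init": {"s0": 0.5, "s1": 0.5},
         "s0": {"s0": 0.7, "s1": 0.3},
         "s1": {"s0": 0.4, "s1": 0.6}}
emit = {"s0": {"a": 0.9, "b": 0.1},
        "s1": {"a": 0.2, "b": 0.8}}

# Path init -> s0 -> s1 emitting "a" then "b": 0.5*0.9 * 0.3*0.8 = 0.108
p = run_probability(["init", "s0", "s1"], ["a", "b"], trans, emit)
```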
3 Constrained Hidden Markov Models

In this section, we give a formal definition of CHMMs and propose a constraint model for CHMM runs. Then, the computation of the most probable path is adapted for CHMMs.
A CHMM extends an HMM with constraints that limit the set of valid runs andleave fewer paths to consider for any given sequence.
Definition 3.1
A constrained HMM (CHMM) is defined by a 5-tuple <S, A, T, E, C> where <S, A, T, E> is an HMM and C is a set of constraints, each of which is a mapping from HMM runs into {true, false}. A run of a CHMM, <path, observation>, is a run of the corresponding HMM for which C(path, observation) is true.

Notice that we define constraints in a highly abstract way, independently of any specific constraint language. In the following, constraints over finite domains (Van Hentenryck et al. 1995) are used, although CLP(Q) and CLP(R) could have been used as well.

We propose to model the runs of a CHMM by a constraint program over finite domains. In this context, a run of the CHMM is a solution of the constraint program. Let <S, A, T, E, C> be a CHMM and n the sequence length. A constraint program for runs is given by the following predicate:

    run([s^(0), S_1, ..., S_n], [E_1, ..., E_n])

where the variables S_i and E_i represent the state and the emission at step i. The domains of S_i and E_i are given as dom(S_i) = S \ {s_0} and dom(E_i) = A. The run predicate is specified as follows: run([s^(0), S_1, ..., S_n], [E_1, ..., E_n]) is true iff there exist s^(1) in dom(S_1), ..., s^(n) in dom(S_n) and e^(1) in dom(E_1), ..., e^(n) in dom(E_n) such that

    C(s^(0) s^(1) ... s^(n), e^(1) ... e^(n)) is true, s^(0) = s_0, and
    p(s^(0); s^(1)) * p(s^(1); e^(1)) * ... * p(s^(n-1); s^(n)) * p(s^(n); e^(n)) > 0.   (1)

Formula (1) states that s^(0) s^(1) ... s^(n) and e^(1) ... e^(n) form a run of the HMM that satisfies C. By the definition of run/2, (local) relationships between S_i and S_{i+1}, and between S_i and E_i, can be established, since the probability of a run must be positive. Indeed, a valuation of S_i to s^(i) and of S_{i+1} to s^(i+1) can be part of a solution of the constraint program whenever p(s^(i); s^(i+1)) > 0. These relationships between the variables of run/2 are captured by the constraints trans(S_{i-1}, S_i) and emit(S_i, E_i), for all i, 1 <= i <= n, where the S_i and E_i are the variables of run/2. These constraints are defined as follows:

• trans(S_i, S_{i+1}) is true iff there exist s^(i) in dom(S_i) and s^(i+1) in dom(S_{i+1}) such that p(s^(i); s^(i+1)) > 0;
• emit(S_i, E_i) is true iff there exist s^(i) in dom(S_i) and e^(i) in dom(E_i) such that p(s^(i); e^(i)) > 0.

Section 4 below shows an implementation of this framework such that a solution of the constraint program corresponds to a valid derivation of a PRISM program.
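The semantics of run/2 can be mirrored by a brute-force enumeration: generate every state/emission valuation, keep those where all transitions and emissions have positive probability (the trans and emit constraints) and where the side-constraints C hold. The sketch below is our own Python illustration with a made-up two-state model; it is exponential and exists only to pin down the semantics that the PRISM implementation of section 4 computes efficiently.

```python
from itertools import product

# Hypothetical model; "init" plays the role of s_0.
STATES = ["s0", "s1"]
SYMBOLS = ["a", "b"]
TRANS = {"init": {"s0": 0.5, "s1": 0.5},
         "s0": {"s0": 0.7, "s1": 0.3},
         "s1": {"s0": 0.4, "s1": 0.6}}
EMIT = {"s0": {"a": 0.9, "b": 0.1},
        "s1": {"a": 0.2, "b": 0.8}}

def valid_runs(n, constraints):
    """Yield (path, observation) pairs satisfying run/2 for sequence length n."""
    for path in product(STATES, repeat=n):            # valuations of S_1..S_n
        full = ("init",) + path
        if any(TRANS[full[i]].get(full[i + 1], 0) == 0 for i in range(n)):
            continue                                  # some trans(S_i, S_{i+1}) fails
        for obs in product(SYMBOLS, repeat=n):        # valuations of E_1..E_n
            if any(EMIT[path[i]].get(obs[i], 0) == 0 for i in range(n)):
                continue                              # some emit(S_i, E_i) fails
            if all(c(full, obs) for c in constraints):
                yield full, obs                       # a run of the CHMM

# A side-constraint in the style of cardinality_atmost: at most one visit to s1.
atmost_one_s1 = lambda path, obs: path.count("s1") <= 1
```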
We consider the HMM presented in section 2.2 and extend it into a CHMM by the following set of constraints:

    C = { cardinality_atmost(N_d, [S_1, ..., S_n], delete),
          cardinality_atmost(N_i, [S_1, ..., S_n], insert) }.

A constraint cardinality_atmost(N, L, X) is satisfied whenever L is a list of elements, out of which at most N are equal to X. In a biological context, it is reasonable to consider only alignments with a limited number of insertions and deletions, given the assumption that the two sequences are related.

As described above, we can consider this CHMM as a constraint program run([s^(0), S_1, ..., S_n], [E_1, ..., E_n]) where dom(S_i) = {match, delete, insert}, dom(E_i) = {A, C, D, ..., W, Y} (this set of letters refers to the 21 different amino acids from which proteins are composed) and the constraints C are as described above.

The Viterbi algorithm (Viterbi 1967) is a dynamic programming algorithm for finding a most probable path corresponding to a given observation. The algorithm keeps track of, for each prefix of an observed emission sequence, the most probable (partial) path leading to each possible state, and extends those step by step into longer paths, eventually covering the entire emission sequence. Here, we adapt this algorithm for CHMMs.

Consider a given observation e^(1) ... e^(n), a CHMM <S, A, T, E, C>, and its constraint program run([s^(0), S_1, ..., S_n], [e^(1), ..., e^(n)]). The most probable path is computed by finding the valuation s^(1), ..., s^(n) that maximizes the objective function: the probability of a run.

Computation of the most probable path for a CHMM is expressed as a rewriting system on a set of 5-tuples Σ. Each such 5-tuple is of the form <s, i, p, π, σ> where π is a partial path ending in state s and representing a path for the emission sequence prefix e^(1) ... e^(i); p is the computed probability for the emissions and transitions applied in the construction of π; and σ is the current constraint store, seen as a conjunction of constraints. Any ground and satisfied constraint is removed from the constraint store, and true refers to the empty conjunction. The set of solutions of a constraint store σ is denoted sol(σ).

The two rewriting rules in Fig. 2 describe an iteration step of the computation of the most probable path:

    trans_ctr: Σ := Σ ∪ {<s', i+1, p * p(s; s') * p(s'; e^(i+1)), π s', σ ∧ S_{i+1} = s'>}
        whenever <s, i, p, π, σ> ∈ Σ, p(s; s') > 0, p(s'; e^(i+1)) > 0,
        check_constraints(σ ∧ S_{i+1} = s') succeeds, and prune_ctr does not apply.

    prune_ctr: Σ := Σ \ {<s, i+1, p', π', σ'>}
        whenever there is another <s, i+1, p, π, σ> ∈ Σ with p >= p' and sol(σ') ⊆ sol(σ).

    Fig. 2. Rewriting rules for the computation of most probable paths for CHMMs.

The computation starts from an initial set of 5-tuples

    { <s^(0), 0, 1, ε, C ∧ trans(s^(0), S_1) ∧ ⋀_{1<=i<=n-1} trans(S_i, S_{i+1}) ∧ ⋀_{1<=i<=n} emit(S_i, e^(i))> }.   (2)

The trans_ctr rule expands an existing partial path one step in directions that preserve the satisfaction of the constraint store; this satisfiability check is denoted check_constraints (and thus depends on the particular C). The prune_ctr rule removes a partial solution that is not optimal for the current observation prefix and shares the same set of complete solutions with the better partial solution. The second condition is necessary in case no partial path contained in sol(σ) can be extended into a full path without violating the constraints. We take the following correctness property for granted.

Proposition 3.1
Assume a CHMM H with the notation as above and an observation Obs = e^(1) ... e^(n). When the Viterbi algorithm of Fig. 2 is executed from the initial set of 5-tuples given by formula (2), it terminates with a set of 5-tuples Σ_final. It holds that

• for any <s, n, p, π, true> ∈ Σ_final, π is a most probable path for Obs ending in s and with probability p;
• whenever there exists a path for Obs ending in s, Σ_final includes a 5-tuple of the form <s, n, p, π, true>.

Notice that all the variables of the constraint program are valuated when a final state is reached, and thus any final constraint store is equivalent to true (as trans_ctr prevents any inconsistent store from arising).

The classical Viterbi algorithm is guaranteed to run in time linear in the length of the given sequence, whereas our algorithm may in the worst case run in exponential time; this may occur if prune_ctr cannot be applied at all. (When any reference to constraints and the constraint store is removed from Fig. 2, we have a compact representation of one iteration step of the Viterbi algorithm for HMMs.) In other words, a representation of the constraint store that allows an efficient comparison as in "sol(σ') ⊆ sol(σ)" is essential for the practicability of our algorithm. On the other hand, for those problems that can be formulated as a CHMM with effective and efficient definitions of check_constraints and the comparison test, the Σ states may stay of a reasonable size. Notice that our algorithm is still correct if we use approximations of these tests; more specifically, check_constraints may occasionally return true when the correct answer is false, and the opposite for the comparison.

4 Defining CHMMs in PRISM

After briefly introducing PRISM, we propose a methodology for defining CHMMs in this framework.
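The constrained Viterbi iteration of Fig. 2 can also be read as ordinary code. The following Python sketch is our own illustration (made-up model and numbers): it specializes the constraint store σ to a single cardinality_atmost constraint, tracked as the number of remaining allowed visits to one designated state, so that sol(σ') ⊆ sol(σ) reduces to comparing remaining allowances.

```python
def constrained_viterbi(obs, trans, emit, limited_state, max_visits):
    """Most probable path under 'visit limited_state at most max_visits times'.
    Tuples are (state, prob, path, remaining) -- the <s, i, p, pi, sigma> of
    Fig. 2, with the step index i implicit in the loop."""
    tuples = [("init", 1.0, (), max_visits)]
    for e in obs:
        extended = []
        for s, p, path, rem in tuples:                # trans_ctr expansions
            for s2, pt in trans[s].items():
                pe = emit[s2].get(e, 0.0)
                if pt == 0.0 or pe == 0.0:
                    continue
                rem2 = rem - (s2 == limited_state)
                if rem2 < 0:                          # check_constraints fails
                    continue
                extended.append((s2, p * pt * pe, path + (s2,), rem2))
        # prune_ctr: drop a tuple if a kept tuple for the same state has
        # probability >= and a store with at least as many solutions (rem >=).
        extended.sort(key=lambda t: -t[1])
        kept = []
        for cand in extended:
            if not any(k[0] == cand[0] and k[1] >= cand[1] and k[3] >= cand[3]
                       for k in kept):
                kept.append(cand)
        tuples = kept
    return max(tuples, key=lambda t: t[1], default=None)

# Hypothetical two-state model for illustration.
TRANS = {"init": {"s0": 0.5, "s1": 0.5},
         "s0": {"s0": 0.7, "s1": 0.3},
         "s1": {"s0": 0.4, "s1": 0.6}}
EMIT = {"s0": {"a": 0.9, "b": 0.1},
        "s1": {"a": 0.2, "b": 0.8}}
```

With max_visits large enough the constraint is vacuous and the result coincides with ordinary Viterbi; with max_visits = 0 the designated state is excluded altogether, pruning every branch that enters it.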
PRISM (Sato and Kameya 2008) is a powerful system for working with probabilistic-logic models, based on an extension of Prolog with discrete random variables, called multi-valued switches. We illustrate this with a simple example HMM with two states s0 and s1. A switch declaration

    values(x,O).

associates the named random variable x with a set of outcomes O. Whenever the goal msw(x,X) is called from the program, a probabilistic choice is made, unifying X with an element of O. Switches can also be defined in a parametric form,

    values(emit(_),[a,b]).    % symbol emission
    values(trans(_),[s0,s1]). % state transition

where each declaration defines a family of switches, one for each possible instance of emit(_) and trans(_), and each instance is given a distinct probability distribution. This parametrization can serve to model dependencies: in our HMM example we define the parameters to be the states s0 and s1 (plus init for trans(_)), thus defining emissions and transitions for each state with the Markov property. Finally, we define a logic program to implement the probabilistic model:

    hmm(L) :- run_length(T), hmm(T,init,L).

    hmm(0,_,[]).
    hmm(T,State,[Emit|EmitRest]) :-
        T > 0,
        msw(trans(State),NextState),
        msw(emit(NextState),Emit),
        T1 is T-1,
        hmm(T1,NextState,EmitRest).

    run_length(10).

Here, a derivation of the goal hmm corresponds to what we define as a run in section 2.1. As shown by (Sato 1995), Prolog's traditional Herbrand model semantics generalizes immediately to a probabilistic semantics when probabilities are given for the outcomes of the switches (i.e., for how msw is used in the program). Thus a PRISM program defines a probabilistic model that provides a probability distribution for all goals that can be formulated in the program's logical language. PRISM assigns each possible derivation of a goal a probability defined as the product of the probabilities of the selected switch outcomes used in the derivation.
Under normal conditions, the sum of the probabilities of all possible derivations of such a goal is unity, but these conditions can be violated in a constrained model. If a program attempts to unify the stochastically selected outcome of a switch with some other value distinct from that outcome, the unification fails, resulting in a failed derivation.

PRISM includes built-in mechanisms for efficient probabilistic inference based on tabling. During inference, once a probabilistic goal has been solved, its answers are put in a global table. Later calls to the same goal simply look up the answer in the table in constant time. PRISM utilizes this to provide an efficient generalized Viterbi algorithm that may be used for the computation of the most likely successful derivation for a large number of probabilistic models, including HMMs. PRISM also includes similar utilities for calculating the probability of a derivation or a set of such, and machine learning algorithms which produce the most likely probabilities for switch outcomes in order to explain a set of observed goals.
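The generative reading of the example program can be paraphrased in Python (our sketch, not PRISM; the switch probabilities are made-up parameters, since the PRISM fragment shown declares only outcome spaces): each msw call draws from its switch's distribution, and the probability of the resulting derivation is the product of the probabilities of the drawn outcomes.

```python
import random

# Hypothetical parameters for the switches declared by values/2.
SWITCHES = {
    ("trans", "init"): {"s0": 0.5, "s1": 0.5},
    ("trans", "s0"):   {"s0": 0.7, "s1": 0.3},
    ("trans", "s1"):   {"s0": 0.4, "s1": 0.6},
    ("emit", "s0"):    {"a": 0.9, "b": 0.1},
    ("emit", "s1"):    {"a": 0.2, "b": 0.8},
}

def msw(switch, rng):
    """Draw an outcome for a switch; return it with its probability."""
    dist = SWITCHES[switch]
    outcome = rng.choices(list(dist), weights=list(dist.values()))[0]
    return outcome, dist[outcome]

def hmm(t, state, rng):
    """Mirrors the hmm/3 clauses: returns (emissions, derivation probability)."""
    if t == 0:
        return [], 1.0
    next_state, p1 = msw(("trans", state), rng)
    emitted, p2 = msw(("emit", next_state), rng)
    rest, p = hmm(t - 1, next_state, rng)
    return [emitted] + rest, p1 * p2 * p

emissions, prob = hmm(10, "init", random.Random(0))  # one sampled run of length 10
```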
We have implemented a framework for the integration of side-constraints in a PRISM program. The framework has been used for adding constraints to HMM-based models, but it should be possible to extend it to other kinds of models. The underlying idea is that the program is augmented with a constraint store, and a constraint checker goal is inserted in a few strategic places of the PRISM program. This constraint checking is the direct implementation of the check_constraints of trans_ctr. The prune_ctr implementation is not discussed, as we use the tabling mechanism of PRISM to prune the search space.
This section describes how our framework can be integrated in a PRISM program. As an example, we consider an implementation of the HMM from the previous section. (The current implementation of the framework is available via http://akira.ruc.dk/~cth/chmm) Below, the central recursive predicate of the implementation is shown extended with constraint checking:

    hmm(T,State,[Emit|EmitRest],StoreIn) :-
        T > 0,
        msw(trans(State),NextState),
        msw(emit(NextState),Emit),
        check_constraints([NextState,Emit],StoreIn,StoreOut),
        T1 is T-1,
        hmm(T1,NextState,EmitRest,StoreOut).

Integration of side-constraint checking is done by extending the relevant predicates with an extra parameter (StoreIn,StoreOut in the code above) to accommodate a constraint store, and a call to the check_constraints goal (line 5), after each distinct sequence of msw applications. If check_constraints fails during PRISM inference, then the corresponding PRISM derivation fails, and further extensions of this derivation will not be attempted, since it does not constitute a valid run. In effect, inference by PRISM will only consider runs which are guaranteed not to violate any of the constraints declared for the model.

Declaration of constraints and implementation of constraint solvers are conceptually decoupled from the PRISM model. The declaration of side-constraints on the model is done by declaring facts of the form constraint(ConstraintSpec). The ConstraintSpec associates the constraint with a constraint checker implementation and may contain parameters for this particular instance of the type of constraint.

A satisfiability checker maintains its own constraint store. A satisfiability checker for a particular type of constraint consists of an init_constraint_store/2 rule and one or more check_sat/4 rules. The init_constraint_store/2 rule is used to create a starting point for the constraint store of each declared constraint and is of the form

    init_constraint_store(ConstraintSpec, InitialStore).
It is given the ConstraintSpec and must unify InitialStore with an initial constraint store matching the ConstraintSpec. Additionally, one or more check_sat rules of the form

    check_sat(ConstraintSpec,StateUpdate,StoreBefore,StoreAfter) :- ... .

must be implemented to check the satisfiability of the constraint. As an example, consider an implementation of a cardinality_atmost constraint, called cardinality in our framework:

    init_constraint_store(cardinality(_,_), 0).

    check_sat(cardinality(U,Max), U, VisitsIn, VisitsOut) :-
        VisitsOut is VisitsIn + 1,
        VisitsOut =< Max.
    check_sat(cardinality(X,_), U, S, S) :- X \= U.
Each time check_constraints is called from the PRISM model, the relevant check_sat goals are called for each declared constraint. If any of these fails, so does check_constraints. StateUpdate and StoreBefore are given, and check_constraints is expected to unify StoreAfter with an updated constraint store. In our example HMM, the StateUpdate consists of the [State,Emit] pattern given to check_constraints. The call to this rule must only succeed if the constraint given by ConstraintSpec is not violated by the further information given by the StateUpdate.
The tabling mechanism in PRISM makes Viterbi computation and EM learning efficient, but when extra parameters such as the constraint store are introduced in the probabilistic goals, PRISM considers these as goals with distinct derivations and stores a tabled entry for each version of the goal. This behavior is undesired when the extra parameters are used only for internal bookkeeping. The effect of this excessive tabling is that the dynamic programming advantages are lost, with exponential time inference as a consequence.

In (Christiansen and Gallagher 2009), a related problem concerning tabling of annotations produced by running Viterbi on PRISM programs is approached using a program transformation that removes non-discriminating arguments, which do not affect the control flow. The annotation can then be recovered from the derivation of the transformed program. This approach is not applicable to the constraint store argument, because the constraint store implicitly affects control flow by limiting possible future derivation extensions. The constraint store has to be considered in the inference process; otherwise it would be possible to produce invalid derivation paths.

B-Prolog, on which PRISM is based, supports table modes, but these are not directly usable with probabilistic goals in PRISM. These modes make it possible to declare an argument of a tabled goal as an output argument, which means that it will not be used as a key in the table lookup, but will be unified with the value of the argument stored in a tabled goal. For our purpose, declaring the constraint store arguments as output arguments would not be feasible, since different derivations of the same goal may have differing constraint stores, and these determine possible derivation extensions.

To deal with the tabling problem we have introduced a separate constraint store stack, which avoids storing data locally in parameters of probabilistic goals by maintaining the constraint store with assert and retract.
This stack is maintained in parallel to the derivation stack of Prolog. PRISM utilizes Prolog's backtracking to explore possible solutions, so the constraint store stack implementation must be able to restore a previous constraint store when PRISM encounters failures during inference and backtracks to find alternative solutions.

To utilize this functionality, the user should use the goal check_constraints/1, which omits the store arguments, rather than check_constraints/3 as stated above. We then define check_constraints/1 as

    check_constraints(StateUpdate) :-
        get_store(StoreBefore),
        check_constraints(StateUpdate,StoreBefore,StoreAfter),
        forward_store(StoreAfter).

The new check_constraints/1 makes use of the goal get_store/1 to retrieve the current version of the constraint store, and forward_store/1 is used to assert the updated store:

    get_store(S) :- !, store(S).

    forward_store(S) :-
        ( asserta(store(S))
        ; retract(store(S)), fail
        ).
If a derivation fails, PRISM backtracks to the choice point in the forward_store rule and retracts the most recently asserted store. Then, when exploring alternative derivation extensions, the previously asserted constraint store is used as expected.
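The effect of the asserta/retract pair can be pictured with an explicit stack. The following is a Python analogue of ours, not the framework's code: the store is pushed when a check succeeds and popped on backtracking, so every branch of the search sees exactly the store built up along its own partial derivation.

```python
class StoreStack:
    """Constraint store stack kept in parallel with the derivation."""
    def __init__(self, initial):
        self._stack = [initial]
    def get_store(self):                  # like get_store/1
        return self._stack[-1]
    def forward_store(self, store):       # like asserta(store(S))
        self._stack.append(store)
    def backtrack(self):                  # like retract(store(S)) on failure
        self._stack.pop()

def explore(stack, updates, depth, limit):
    """Count update sequences of the given depth with at most `limit`
    'insert' updates; the store (an insert counter) is pushed before each
    recursive call and popped when the search backtracks out of it."""
    if depth == 0:
        return 1
    solutions = 0
    for u in updates:
        store = stack.get_store() + (u == "insert")
        if store > limit:
            continue                      # check_constraints fails; prune branch
        stack.forward_store(store)
        solutions += explore(stack, updates, depth - 1, limit)
        stack.backtrack()
    return solutions
```

For example, explore(StoreStack(0), ["match", "insert"], 3, 1) counts the 4 length-3 sequences with at most one insert, and the stack is back to its single initial entry afterwards.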
Due to tabling, PRISM guarantees the familiar best known complexity bounds for common inference tasks on a variety of the models that can be expressed in PRISM, which includes HMMs (Sato 2000). This implicitly limits the number of calls to check_constraints to the same bound. The added complexity of constraint checking depends on the incremental constraint checking cost of the individual constraint checkers and the number of constraints expressed on the model.

Space complexity is influenced by table space usage and the maximal length of a derivation at any given point. Since the asserted constraint store stack contains a constraint store fixpoint for each step of the current derivation, it is bounded by O(n max(|c|)) where n is the length of the sequence and max(|c|) is the maximal size of the constraint store in any derivation step. Note that the space complexity of the separate constraint store stack is unaffected by time complexity and the number of states in the model. With more complex models like the pair HMM, the table space required for dynamic programming becomes the dominating concern.

5 Experimental validation

In this section, we validate our CHMM implementation with the pair HMM presented in section 2.2. The experiments were run on a computer with 16 2.4 GHz 64-bit Intel Xeon(R) E7340 CPUs and 64 GB of memory. All of the experiments utilized only a single processor at a time.

Our experiments utilize implementations of some common constraints adapted for the CHMM framework: cardinality(UpdatePatterns,Max) ensures that entries from the list UpdatePatterns occur at most Max times in the derivation sequence; alldiff ensures that all updates in a derivation are different; lock_to_sequence(Seq) ensures that the sequence of derivation updates is identical to the sequence represented by the list Seq; lock_to_set(Set) ensures that all updates belong to members of the list Set. The operator forall_subseq(L,C) applies the constraint C to every subsequence of length L in the derivation sequence, and for_range(From,To,C) applies C to the updates in positions From to To, both inclusive; state_specific(C) applies C only to the State part of the update.
The addition of side-constraints to an HMM involves some computational overhead in order to check the satisfiability of the constraints, but it may also reduce the number of possible solutions and therefore the amount of work required to find the optimal path. As a practical experiment to demonstrate this, we consider global alignment with the pair HMM discussed in section 2.2.

The overhead of integrating the constraint checking machinery in the model is demonstrated in the left part of Fig. 3, where sequences of increasing length are aligned. It can be observed that the running time penalty is a constant factor and that the polynomial time complexity of the pair HMM is preserved in our framework. Obviously, polynomial time inference presupposes incremental constraint checking to be a constant time operation, which may not be the case for certain types of constraints.

In the right part of Fig. 3, two sequences of equal length (32) are aligned, but with varying amounts of constraints being enforced. The global cardinality constraint is used to enforce an upper limit, L, on the number of inserts or deletes in the alignment:

    constraint(state_specific(cardinality([insert,delete],L))).

By constraining the alignment (allowing fewer gaps), the space of viable solutions is reduced. The more constrained the alignment is, the more pruning opportunities arise; with a large amount of pruning opportunities, the running time is reduced quite significantly. Note that, since the imposed constraint is state specific, the number of possible alignments, and hence the running time, is unaffected by the input sequence structure.

Fig. 3. Left: Running time of alignment with a pure pair HMM compared to alignment with a CHMM with no constraints enforced. Right: Running time of alignment of two sequences of length 32 with varying amounts of allowed insertions and deletions.
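To make the pruning mechanism concrete, the following sketch shows Viterbi decoding with an incremental cardinality check threaded through the dynamic programming table. This is a minimal Python illustration of the idea, not the paper's PRISM implementation; the toy model and all names in it are our own. A partial path whose count of constrained states already exceeds the bound is discarded immediately, which is the kind of pruning reflected in the right part of Fig. 3:

```python
def constrained_viterbi(obs, states, start_p, trans_p, emit_p,
                        counted_states, max_count):
    """Viterbi decoding with a cardinality side-constraint: any partial
    path that uses states from `counted_states` more than `max_count`
    times is pruned as soon as the bound is exceeded."""
    # DP table: (state, count-so-far) -> (probability, best path)
    table = {}
    for s in states:
        c = 1 if s in counted_states else 0
        if c <= max_count:                       # prune at initialisation
            table[(s, c)] = (start_p[s] * emit_p[s][obs[0]], [s])
    for o in obs[1:]:
        new_table = {}
        for (s, c), (p, path) in table.items():
            for s2 in states:
                c2 = c + (1 if s2 in counted_states else 0)
                if c2 > max_count:               # constraint violated: prune
                    continue
                p2 = p * trans_p[s][s2] * emit_p[s2][o]
                if (s2, c2) not in new_table or p2 > new_table[(s2, c2)][0]:
                    new_table[(s2, c2)] = (p2, path + [s2])
        table = new_table
    if not table:
        return None                              # no path satisfies the constraint
    return max(table.values(), key=lambda v: v[0])

# Toy two-state model (hypothetical numbers, for illustration only).
states = ["match", "insert"]
start_p = {"match": 0.8, "insert": 0.2}
trans_p = {"match": {"match": 0.7, "insert": 0.3},
           "insert": {"match": 0.6, "insert": 0.4}}
emit_p = {"match": {"a": 0.9, "b": 0.1},
          "insert": {"a": 0.2, "b": 0.8}}

# Forbid inserts entirely (bound 0): only the all-match path survives.
best = constrained_viterbi(list("abba"), states, start_p, trans_p, emit_p,
                           {"insert"}, 0)
```

Because the cardinality count becomes part of the table key, the table grows with the bound; the payoff is that every extension violating the constraint is cut off before it can spawn further derivations.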
To verify the efficiency of our constraint store implementation, alignment with a local cardinality constraint was measured for different sizes of input sequences. From the measurements, which are reported in Fig. 4, it is apparent that our implementation does not incur the same exponential overhead as the naive implementation, where the constraint store is maintained in the goals and hence tabled.

Fig. 4. A comparison of the running time in milliseconds (left) and total memory use in bytes (right) of constrained alignment of two sequences of increasing length, with tabled constraints ("constraint store in goals") versus a separate constraint store stack.

Running times and memory usage for a range of different constraints are reported in Table 1. For the sake of completeness, the table also includes running times for the version where the constraint store is tabled.
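The blow-up caused by tabling the constraint store can be illustrated by counting memo entries in a toy chain of derivation steps. The following is our own schematic Python sketch (names and numbers hypothetical), not the PRISM machinery: when the store, here just a bounded insert counter, is part of the memoisation key, the table holds one entry per reachable (position, count) pair; with the store kept separately, only the position is tabled:

```python
def tabled_entries(n, bound, store_in_key):
    """Count distinct memo entries for n derivation steps, where each
    step may bump an insert counter that must stay within `bound`."""
    seen = set()

    def walk(i, count):
        # The memo key either includes the constraint store (`count`)
        # or only the derivation position `i`.
        key = (i, count) if store_in_key else i
        if i == n or key in seen:
            return
        seen.add(key)
        walk(i + 1, count)               # step without an insert
        if count < bound:
            walk(i + 1, count + 1)       # step with an insert

    walk(0, 0)
    return len(seen)

# Store inside the key: O(n * bound) entries; store outside: O(n).
in_goals = tabled_entries(30, 10, store_in_key=True)
separate = tabled_entries(30, 10, store_in_key=False)
```

In the actual system the separate store is pushed and popped on a stack in step with the derivation, so satisfiability is still checked incrementally even though the store never enters the tabled goals.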
                                                     Sequence   Running time (ms)     Memory (kb)
Constraint                                           lengths    in goals  separate   in goals  separate
cardinality([insert],20)                                50        15460      3176      42296      5723
cardinality([insert],40)                                50        29557      3968      93845      6703
for_range(1,50,lock_to_set([match]))                   100        24649      4544     105498      7137
for_range(1,90,lock_to_set([match]))                   100           20        48       1641      1198
for_range(1,50,lock_to_sequence([match,..,match]))     100        24829      4544       1641      1198
for_range(1,90,lock_to_sequence([match,..,match]))     100           20        48     105498      7137
alldiff                                                 20       100442        28      85654       256
forall_subseqs(5,alldiff)                               10         1664        12      60098       137
Table 1. Running time and memory consumption for alignment with different kinds of constraints.

In most cases the separate constraint store performs better in terms of both running time and memory consumption. In the cases where performance is worse, this can be attributed to a very small number of possible derivations or to constraints which rarely change the store.

The term "Constrained HMM" is used in (Roweis 1999; Landwehr et al. 2007), where it refers to restrictions on the finite automaton associated with an HMM but not to constraints on HMM runs. In (Sato and Kameya 2008), CHMMs were introduced to exemplify an EM algorithm suited for PRISM programs that allow the possibility of derivation failures. Our approach differs, as we augment PRISM programs with side-constraints and use constraint solving techniques to achieve efficient inference. In (Riezler 1998), Riezler proposes techniques for inference in probabilistic constraint logic programming. In (Costa et al. 2008), relationships between elements of a Bayesian network are expressed as a constraint logic program, which is similar to the way we define HMMs. However, the focus of our paper differs, as we study the benefit of checking satisfiability of side-constraints during inference.

In the natural language processing community, recent work on Constrained Conditional Models features an approach similar to ours. Constrained Conditional Models constitute a general framework that augments inference and learning of conditional models with declarative constraints (Chang et al. 2008). However, inference is there expressed as an Integer Linear Programming problem (Roth and Yih 2005), and in this context more expressive constraints, such as cardinality or all_different, cannot be imposed on an HMM run. Moreover, our PRISM-based implementation allows us to define the HMM structure separately from the side-constraints and to use advanced constraint solving techniques.
In this paper, we propose a framework to define HMMs with side-constraints as a Constraint Logic program extended by probabilistic choices. Constraint Logic Programming has advantages in terms of more compact expression of CHMMs. Inference computations are adapted for CHMMs and conditions for efficient computation are described. An implementation based on PRISM is proposed, and well-known constraints and operators have been demonstrated for defining CHMMs. Finally, we experimentally validate our approach with a constrained pair HMM used for biological sequence alignment.

As current work, we study how sampling and EM-learning can be adapted for our CHMM framework. Indeed, sampling turns out to be problematic in probabilistic models with a large probability of derivation failure. In (Sato et al. 2005), Sato et al. address the problem of EM-learning with PRISM programs that can fail, and their methods are also applicable to our framework.

As further work, we plan to incorporate more advanced constraint solving techniques, such as those used in Weighted CSP (Larrosa and Schiex 2004), into the framework. This approach would allow us to combine soft constraint solving and inference and express this as an optimization problem. We also plan to deal with the restriction that individual constraint checkers do not share information in our framework, so that we can benefit from some of the optimization techniques used by other constraint solvers. We are working on extending the library of constraints that can be defined as side-constraints.
Acknowledgment
This work is supported by the project Logic-statistic modeling and analysis of biological sequence data funded by the NABIIT program under the Danish Strategic Research Council. We thank the anonymous reviewers for their interesting comments.
References
Chang, M.-W., Ratinov, L.-A., Rizzolo, N., and Roth, D. 2008. Learning and inference with constraints. In Proc. of the AAAI Conference on Artificial Intelligence. Chicago, USA, 1513–1518.

Christiansen, H. and Gallagher, J. 2009. Non-discriminating arguments and their uses. In Proc. of the International Conference on Logic Programming. Pasadena, USA, 55–69.

Christiansen, H., Have, C. T., Lassen, O. T., and Petit, M. 2009. In Proc. of the International Workshop on Constraint Based Methods for Bioinformatics. Lisbon, Portugal, 19–26.

Costa, V. S., Page, D., and Cussens, J. 2008. CLP(BN): Constraint logic programming for probabilistic knowledge. In Probabilistic Inductive Logic Programming. LNAI 4911. Springer, 156–188.

Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. 1998. Biological Sequence Analysis. Cambridge University Press.

Landwehr, N., Mielikäinen, T., Eronen, L., Toivonen, H., and Mannila, H. 2007. Constrained hidden Markov models for population-based haplotyping. BMC Bioinformatics 8, Suppl. 2.

Larrosa, J. and Schiex, T. 2004. Solving weighted CSP by maintaining arc consistency. Artificial Intelligence 159, 1–26.

Rabiner, L. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 2, 257–286.

Riezler, S. 1998. Probabilistic Constraint Logic Programming. Ph.D. thesis, University of Tübingen.

Roth, D. and Yih, W. 2005. Integer linear programming inference for conditional random fields. In Proc. of the International Conference on Machine Learning. Bonn, Germany, 737–744.

Roweis, S. 1999. Constrained hidden Markov models. In Proc. of the International Conference on Advances in Neural Information Processing Systems. Denver, USA, 782–788.

Sato, T. 1995. A statistical learning method for logic programs with distribution semantics. In Proc. of the International Conference on Logic Programming. Tokyo, Japan, 715–729.

Sato, T. In Proc. of the Workshop on Fusion of Domain Knowledge with Data for Decision Support. Tokyo, Japan.

Sato, T., Kameya, Y., and Zhou, N.-F. 2005. Generative modeling with failure in PRISM. In Proc. of the International Joint Conference on Artificial Intelligence. Edinburgh, Scotland, 847–852.

Sato, T. and Kameya, Y. 1997. PRISM: A language for symbolic-statistical modeling. In Proc. of the International Joint Conference on Artificial Intelligence. Nagoya, Japan, 1330–1335.

Sato, T. and Kameya, Y. 2008. New advances in logic-based probabilistic modeling by PRISM. In Probabilistic Inductive Logic Programming. LNCS 4911. Springer, 118–155.

Van Hentenryck, P., Saraswat, V., and Deville, Y. 1995. Design, implementation, and evaluation of the constraint language cc(FD). In Constraint Programming: Basics and Trends. LNCS 910. Springer, 293–316.

Viterbi, A. J. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13, 2, 260–269.