On Low Complexity Maximum Likelihood Decoding of Convolutional Codes
Jie Luo,
Member, IEEE
Abstract
This paper considers the average complexity of maximum likelihood (ML) decoding of convolutional codes. ML decoding can be modeled as finding the most probable path taken through a Markov graph. Integrated with the Viterbi algorithm (VA), complexity reduction methods such as the sphere decoder often use the sum log likelihood (SLL) of a Markov path as a bound to disprove the optimality of other Markov path sets and to consequently avoid exhaustive path search. In this paper, it is shown that SLL-based optimality tests are inefficient if one fixes the coding memory and takes the codeword length to infinity. Alternatively, the optimality of a source symbol at a given time index can be tested using bounds derived from the log likelihoods of the neighboring symbols. It is demonstrated that such neighboring log likelihood (NLL)-based optimality tests, whose efficiency does not depend on the codeword length, can bring significant complexity reduction to ML decoding of convolutional codes. The results are generalized to ML sequence detection in a class of discrete-time hidden Markov systems.
Index Terms: coding complexity, convolutional code, hidden Markov model, maximum likelihood decoding, Viterbi algorithm.
I. Introduction
We study algorithms that reduce the average complexity of maximum likelihood (ML) decoding of convolutional codes. By ML decoding, we mean that the decoder uses a code search to find, and to guarantee the output of, the most likely codeword.

Forney showed that ML decoding of convolutional codes is equivalent to finding the most probable path taken through a Markov graph [1]. Denote the codeword length by $N$ and the coding memory by $\nu$. For each time index, the number of Markov states in the Markov graph is exponential in $\nu$. The total number of Markov states is therefore exponential in $\nu$ but linear in $N$. Define the complexity of a decoder as the number of visited Markov states normalized by the codeword length $N$. Practical ML decoding is often achieved using the Viterbi algorithm (VA) [2][1], whose complexity does not scale in $N$ but scales exponentially in $\nu$. Well-known decoders such as the list decoders [3], the sequential decoders [4], and the iterative decoders [5] are able to achieve near-optimal error performance with low average complexity. However, these decoders do not guarantee the output of the ML codeword [6].

(The author is with the Electrical and Computer Engineering Department, Colorado State University, Fort Collins, CO 80523. E-mail: [email protected]. This work was supported by National Science Foundation grant CCF-0728826.)

If obtaining the ML codeword is strictly enforced (see Section VII for justification), to avoid exhaustive path search, the decoder must develop a criterion or bound that can be used to disprove the optimality of a Markov path set. This is equivalent to developing an optimality test criterion (OTC) [7] to test whether the ML path (or codeword) belongs to the complementary path set (or codeword set).

Two major OTCs have been used in the ML decoding of convolutional codes. The first one is the "path covering criterion" (PCC) (explained in [8] and in Appendix A) used in the VA [2][1]. The VA visits all Markov states in chronological order [1]. For each time index, the decoder maintains a set of "cover" (defined in Appendix A) Markov paths, each passing one of the Markov states [1]. According to the PCC, the "cover" Markov path passing a Markov state disproves the optimality of all other Markov paths passing the same state. The second OTC is the class of sum log likelihood (SLL)-based OTCs used extensively in the sphere decoder [10][9]. The sphere decoder models ML decoding as finding the lattice point closest to the channel output in the signal space [9]. Hence the distance between the channel output and an arbitrary lattice point upper bounds the distance from the channel output to the ML codeword. Such a distance bound is based on the SLL of the corresponding codeword, and is used in the sphere decoder [10][9], as well as in other ML decoders [7], as the key means to avoid exhaustive codeword search. In [11][12], Vikalo and Hassibi showed that PCC-based and SLL-based optimality tests can be combined to find the ML codeword without visiting all Markov states.

Assume the PCC-based optimality test is always implemented. In this paper, we first show that the additional complexity reduction brought by the SLL-based optimality test diminishes as one fixes the coding memory $\nu$ and takes the codeword length $N$ to infinity. Such inefficiency is due to the fact that the SLL-based OTC does not exploit the structure of the convolutional code. Searching for the ML codeword is equivalent to finding the ML source message, which contains a sequence of source symbols.
We show that whether the ML message contains a particular symbol at a given time index can be tested using an OTC that depends only on the log likelihoods of channel output symbols in a fixed-sized time neighborhood. (In the literature, such as [7], an OTC refers to a criterion designed to test whether a single codeword is optimum. In this paper, we extend the definition of an OTC to a general criterion that can either verify or disprove the optimality of a codeword set.) We call such a test the neighboring log likelihood (NLL)-based optimality test, and show that its efficiency does not depend on the codeword length. We theoretically demonstrate that NLL-based optimality tests can bring significant complexity reduction to ML decoding when the communication system has a high signal to noise ratio (SNR). The complexity of a decoder using SLL-based optimality tests, on the other hand, remains the same as that of the VA for all SNR values if the codeword length is taken to infinity. The results are also generalized to ML sequence detection in a class of discrete-time hidden Markov systems [13].
II. Problem Formulation
Let $C$ be an $(n, k)$ convolutional code over GF$(q)$ defined by a polynomial generator matrix $G(D)$ [14],
$$G(D) = G[0] + G[1]D + \cdots + G[\nu-1]D^{\nu-1}, \qquad (1)$$
where $D$ is the delay operator; $\nu$ is the coding memory; and $G[l]$, $l = 0, \ldots, \nu-1$, are $k \times n$ matrices over GF$(q)$. Assume $G(D)$ is a minimal encoder [14].
Denote the source message by a sequence of vector symbols,
$$x(D) = x[d]D^{d} + x[d+1]D^{d+1} + \cdots, \qquad (2)$$
where $d$ is the time index, possibly negative, and the $x[d]$, $\forall d$, are row vectors of dimension $k$ over GF$(q)$. The encoded message, or the corresponding codeword, is given by
$$y(D) = x(D)G(D) = \sum_{d} \left( \sum_{l=0}^{\nu-1} x[d-l]\,G[l] \right) D^{d}. \qquad (3)$$
To simplify the presentation, we assume the time index $d$ takes all integer values. We assume $x[d] = \mathbf{0}$ for $d < 0$ or $d \ge N$. We term $N$ the codeword length.

Define a function $g_q(y)$ that maps $y$ from GF$(q)$ to $\mathbb{R}$ (the set of real numbers) in a one-to-one sense. If $y(D)$ is a vector sequence, $g_q(y(D))$ applies the mapping to each of the elements of $y(D)$, respectively; hence the output of $g_q(y(D))$ is a vector sequence of the same length and dimension as $y(D)$. Assume the codeword is transmitted over a memoryless Gaussian channel. The channel output symbol sequence is given by
$$r(D) = g_q(y(D)) + n(D) = g_q(x(D)G(D)) + n(D), \qquad (4)$$
where $n(D) = n[d]D^{d} + n[d+1]D^{d+1} + \cdots$ is the noise sequence, with $n[d] \sim \mathcal{N}(\mathbf{0}, \sigma^2 I)$ being i.i.d. Gaussian. Without loss of generality, we define the scaled signal to noise ratio of the system as $\mathrm{SNR} = 1/(2\sigma^2)$. In Section VI, we show that the results are generalizable not only to other channel models, but also to a class of hidden Markov systems.

Given the channel output, for any source message $x(D)$ and its corresponding codeword $y(D) = x(D)G(D)$, we define the "negative SLL" as
$$S_x(x(D)) = S_y(y(D)) = \sum_{d=0}^{N+\nu-1} \| r[d] - g_q(y[d]) \|^2. \qquad (5)$$
The objective of ML decoding is to find the ML message $x_{ML}(D)$ that minimizes the negative SLL,
$$x_{ML}(D) = \operatorname*{argmin}_{x[d],\; 0 \le d < N} S_x(x(D)). \qquad (6)$$
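For illustration only (this sketch is not part of the original analysis), the following Python fragment encodes a message with a small assumed $(2,1)$ binary convolutional code, applies an assumed BPSK-style mapping $g_2$, simulates the Gaussian channel of (4), and evaluates the negative SLL of (5). The generator taps, the mapping, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# An assumed (2,1) binary convolutional code with coding memory nu = 3, per eq. (1):
# G(D) = G[0] + G[1]D + G[2]D^2, each G[l] a 1x2 matrix over GF(2).
G = [np.array([[1, 1]]), np.array([[1, 0]]), np.array([[1, 1]])]
nu = len(G)
k, n = G[0].shape

def encode(x):
    """Eq. (3): y[d] = sum_{l=0}^{nu-1} x[d-l] G[l] over GF(2); x[d] = 0 outside [0, N)."""
    N = len(x)
    y = np.zeros((N + nu - 1, n), dtype=int)
    for d in range(N + nu - 1):
        for l in range(nu):
            if 0 <= d - l < N:
                y[d] = (y[d] + x[d - l] @ G[l]) % 2
    return y

def g2(y):
    """A one-to-one map g_q from GF(2) to the reals (a BPSK-style choice, assumed)."""
    return 1.0 - 2.0 * y

def negative_sll(r, y):
    """Eq. (5): the negative SLL of the codeword y given the channel output r."""
    return float(np.sum((r - g2(y)) ** 2))

N, snr = 8, 10.0
x = rng.integers(0, 2, size=(N, k))
y = encode(x)
sigma = np.sqrt(1.0 / (2.0 * snr))                  # scaled SNR = 1/(2 sigma^2)
r = g2(y) + rng.normal(0.0, sigma, size=y.shape)    # eq. (4): memoryless Gaussian channel
print(negative_sll(r, y))
```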
III. Inefficiency of SLL-Based Optimality Tests

For ML decoders using SLL-based optimality tests, the decoder first obtains a quick guess of the source message without solving the ML decoding problem. The SLL of the obtained message is then used to help disprove the optimality of certain Markov path sets and, consequently, to avoid exhaustive path search. We make an ideal assumption that the "guessed" message equals the transmitted message. (Note that the decoder still needs to verify whether the guessed message is indeed the ML solution. If it is not, then a search for the ML message must be carried out.) We show in this section that, even under this ideal assumption, the complexity reduction brought by the SLL-based optimality tests still diminishes as we take $N$ to infinity.

Let $x_0(D)$ be the actual source message, which is also the message "guessed" by the decoder. Let $y_0(D) = x_0(D)G(D)$ be the transmitted codeword. The corresponding negative SLL is given by
$$S_x(x_0(D)) = \sum_{d=0}^{N+\nu-1} \|r[d] - g_q(y_0[d])\|^2 = \sum_{d=0}^{N+\nu-1} \|n[d]\|^2. \qquad (7)$$
Now consider a subset of time indices $\mathcal{D}_d^x \subseteq [0, N)$. Let $\{\tilde{x}[d] \,|\, d \in \mathcal{D}_d^x\}$ be a partial message defined only at time indices in $\mathcal{D}_d^x$. Denote by $\{\tilde{x}(\mathcal{D}_d^x)\}$ the set of messages satisfying
$$\{\tilde{x}(\mathcal{D}_d^x)\} = \left\{ x(D) \;\middle|\; x[d] = \tilde{x}[d],\, \forall d \in \mathcal{D}_d^x;\; x(D) \ne x_0(D) \right\}. \qquad (8)$$
Suppose the decoder wants to test whether it can disprove the optimality of $\{\tilde{x}(\mathcal{D}_d^x)\}$, i.e., whether $x_{ML}(D) \notin \{\tilde{x}(\mathcal{D}_d^x)\}$. A common practice [7][11][12] is to find a lower bound, denoted by $S_x^L(\tilde{x}(\mathcal{D}_d^x))$, of the negative SLLs of the messages in $\{\tilde{x}(\mathcal{D}_d^x)\}$,
$$S_x(x(D)) \ge S_x^L(\tilde{x}(\mathcal{D}_d^x)), \quad \forall x(D) \in \{\tilde{x}(\mathcal{D}_d^x)\}. \qquad (9)$$
If the lower bound $S_x^L(\tilde{x}(\mathcal{D}_d^x))$ is larger than $S_x(x_0(D))$ obtained in (7), then we have $S_x(x(D)) \ge S_x^L(\tilde{x}(\mathcal{D}_d^x)) > S_x(x_0(D))$ for all $x(D) \in \{\tilde{x}(\mathcal{D}_d^x)\}$, which means the ML message is not in $\{\tilde{x}(\mathcal{D}_d^x)\}$.

In Appendix B, we show that the SLL lower bounds that have appeared in the literature satisfy the following assumption.

Assumption 1: Given $\{\tilde{x}(\mathcal{D}_d^x)\}$, let $\mathcal{D}_d^y \subseteq [0, N+\nu)$ be the maximum time index set over which we can find a partial codeword $\tilde{y}(\mathcal{D}_d^y)$ such that, for all $x(D) \in \{\tilde{x}(\mathcal{D}_d^x)\}$ with $y(D) = x(D)G(D)$, we have $y[d] = \tilde{y}[d]$ for all $d \in \mathcal{D}_d^y$. Note that $\mathcal{D}_d^y$ and $\tilde{y}(\mathcal{D}_d^y)$ are uniquely determined by $\{\tilde{x}(\mathcal{D}_d^x)\}$. We also have $|\mathcal{D}_d^y| \le |\mathcal{D}_d^x| + \nu$. We assume the existence of a positive constant $\epsilon \in (0, 1)$, which does not depend on $N$, such that
$$S_x^L(\tilde{x}(\mathcal{D}_d^x)) \le \sum_{d \in \mathcal{D}_d^y} \|r[d] - g_q(\tilde{y}[d])\|^2 + (N + \nu - |\mathcal{D}_d^y|)(1-\epsilon)\,n\sigma^2. \qquad (10)$$

As demonstrated in [11][7], if we fix $N$, using $S_x^L(\tilde{x}(\mathcal{D}_d^x)) > S_x(x_0(D))$ as the OTC to disprove the optimality of the message set $\{\tilde{x}(\mathcal{D}_d^x)\}$ can bring significant complexity reduction to ML decoding, especially under high SNR. However, if we define $\mathcal{D}_e \subseteq \mathcal{D}_d^y$ as the subset of time indices corresponding to the erroneous codeword symbols, i.e.,
$$\mathcal{D}_e = \{d \,|\, d \in \mathcal{D}_d^y,\; \tilde{y}[d] \ne y_0[d]\}, \qquad (11)$$
the following lemma shows that SLL-based optimality tests become inefficient if $N - |\mathcal{D}_d^x|$ is taken to infinity while $|\mathcal{D}_e|$ is kept finite.

Lemma 1: Assume the generator matrix $G(D)$ is fixed, and therefore the coding memory $\nu$ is fixed. Consider message sets characterized by $\{\tilde{x}(\mathcal{D}_d^x)\}$ for arbitrary $\mathcal{D}_d^x$, but under the constraint of a fixed $\mathcal{D}_e$, where $\mathcal{D}_e \subseteq \mathcal{D}_d^y$ is defined in (11) and the derivation of $\mathcal{D}_d^y$ is specified in Assumption 1. If we fix the SNR and take $N - |\mathcal{D}_d^x|$ to infinity, we have
$$\lim_{N - |\mathcal{D}_d^x| \to \infty} P\left\{ S_x^L(\tilde{x}(\mathcal{D}_d^x)) > S_x(x_0(D)) \right\} = 0. \qquad (12)$$
If we first take $N - |\mathcal{D}_d^x|$ to infinity and then take the SNR to infinity, we have
$$\lim_{\mathrm{SNR} \to \infty} \lim_{N - |\mathcal{D}_d^x| \to \infty} P\left\{ S_x^L(\tilde{x}(\mathcal{D}_d^x)) > S_x(x_0(D)) \right\} = 0. \qquad (13)$$
(Note that the order in which the limits are taken in (13) is important. If we fix $N$ and take the SNR to infinity first, we can get $\lim_{N - |\mathcal{D}_d^x| \to \infty} \lim_{\mathrm{SNR} \to \infty} P\{S_x^L(\tilde{x}(\mathcal{D}_d^x)) > S_x(x_0(D))\} = 1$.)

Proof: Since $|\mathcal{D}_d^y| \le |\mathcal{D}_d^x| + \nu$, taking $N - |\mathcal{D}_d^x|$ to infinity implies taking $N - |\mathcal{D}_d^y|$ to infinity. According to Assumption 1, we have
$$\frac{S_x^L(\tilde{x}(\mathcal{D}_d^x)) - S_x(x_0(D))}{N + \nu - |\mathcal{D}_d^y|} \le \frac{\sum_{d \in \mathcal{D}_e} \|r[d] - g_q(\tilde{y}[d])\|^2}{N + \nu - |\mathcal{D}_d^y|} + (1-\epsilon)n\sigma^2 - \frac{\sum_{d \in \mathcal{D}_e} \|n[d]\|^2}{N + \nu - |\mathcal{D}_d^y|} - \frac{\sum_{d \notin \mathcal{D}_d^y} \|n[d]\|^2}{N + \nu - |\mathcal{D}_d^y|}. \qquad (14)$$
Since the $n[d]$ are i.i.d. Gaussian with covariance matrix $\sigma^2 I$, the $\|n[d]\|^2$ are i.i.d. (scaled $\chi^2$) with mean $n\sigma^2$ and variance $2n\sigma^4$. Therefore $\frac{1}{N+\nu-|\mathcal{D}_d^y|}\sum_{d \notin \mathcal{D}_d^y} \|n[d]\|^2 \to n\sigma^2$, while $\frac{1}{N+\nu-|\mathcal{D}_d^y|}\sum_{d \in \mathcal{D}_e} \|r[d] - g_q(\tilde{y}[d])\|^2 \to 0$ and $\frac{1}{N+\nu-|\mathcal{D}_d^y|}\sum_{d \in \mathcal{D}_e} \|n[d]\|^2 \to 0$, as $N - |\mathcal{D}_d^y| \to \infty$. Consequently, denoting the right hand side of (14) by $U$, we have, with probability one,
$$\lim_{N - |\mathcal{D}_d^y| \to \infty} U = -\epsilon n \sigma^2 < 0. \qquad (15)$$
This yields
$$\lim_{N - |\mathcal{D}_d^x| \to \infty} P\left\{ S_x^L(\tilde{x}(\mathcal{D}_d^x)) > S_x(x_0(D)) \right\} = \lim_{N - |\mathcal{D}_d^y| \to \infty} P\left\{ \frac{S_x^L(\tilde{x}(\mathcal{D}_d^x)) - S_x(x_0(D))}{N + \nu - |\mathcal{D}_d^y|} > 0 \right\} \le \lim_{N - |\mathcal{D}_d^y| \to \infty} P\{U > 0\} = 0. \qquad (16)$$
Since (16) holds for all SNR values, the conclusion remains true if we take the SNR to infinity after $N - |\mathcal{D}_d^x|$ is taken to infinity. $\blacksquare$
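The effect in Lemma 1 can be illustrated numerically. The sketch below (with hypothetical parameters throughout) draws the noise sequence, forms the largest lower bound permitted by (10) for a message set with $|\mathcal{D}_e| = 1$ and fixed $|\mathcal{D}_d^y| = \nu + 1$, and estimates $P\{S_x^L > S_x(x_0(D))\}$; the estimate decays toward zero as $N$ grows, as (12) predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
n, nu = 2, 3          # output dimension and coding memory (assumed values)
sigma2 = 0.05         # noise variance per dimension (assumed)
eps = 0.5             # the constant epsilon of Assumption 1 (assumed)
delta = 2.0           # offset of g_q at the single erroneous index (assumed)

def disprove_prob(N, trials=2000):
    """Estimate P{S_Lx > S_x(x0)} when |D_e| = 1 and |D_y| = nu + 1 is fixed."""
    dy = nu + 1
    hits = 0
    for _ in range(trials):
        noise = rng.normal(0.0, np.sqrt(sigma2), size=(N + nu, n))
        s_x0 = np.sum(noise ** 2)                       # eq. (7)
        # Largest lower bound allowed by (10): exact distances on D_y,
        # (1 - eps) * n * sigma2 per remaining index.
        err = np.sum((noise[0] + delta) ** 2) - np.sum(noise[0] ** 2)
        s_low = (s_x0 + err - np.sum(noise[dy:] ** 2)
                 + (N + nu - dy) * (1.0 - eps) * n * sigma2)
        hits += int(s_low > s_x0)
    return hits / trials

for N in (10, 100, 1000):
    print(N, disprove_prob(N))
```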
With the help of Lemma 1, the inefficiency of SLL-based optimality tests is characterized by the following lemma.

Lemma 2: Let $C_{sll}$ be the complexity of an ML decoder that only uses PCC- and SLL-based optimality tests for complexity reduction. Let $C_{va}$ be the complexity of the Viterbi decoder, in which only the PCC-based optimality test is used. For any $\delta > 0$, we have
$$\lim_{N \to \infty} P\{C_{sll} \ge (1-\delta)C_{va}\} = 1, \qquad \lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{C_{sll} \ge (1-\delta)C_{va}\} = 1. \qquad (17)$$
The proof of Lemma 2 is given in Appendix C.

IV. Neighboring Log Likelihood-Based Optimality Test

We propose in Theorem 1 a class of NLL-based optimality tests whose efficiency does not depend on the codeword length $N$. We show in Section V that these NLL-based optimality tests can significantly reduce the average complexity of ML decoding under high SNR. This is in contrast to the inefficiency of SLL-based optimality tests, which are not able to bring meaningful complexity reduction if $N$ is taken to infinity first.

Theorem 1: Define $d_{\min}$ and $d_{\max}$ by
$$d_{\min} = \min_{y_1 \ne y_2} \|g_q(y_1) - g_q(y_2)\|, \qquad d_{\max} = \max_{y_1 \ne y_2} \|g_q(y_1) - g_q(y_2)\|, \qquad (18)$$
where $y_1$, $y_2$ are $n$-dimensional row vectors over GF$(q)$. Let $\xi$ be an arbitrary constant and $M$ an arbitrary integer satisfying
$$0 < \xi < d_{\min}^2, \qquad M > \frac{\nu d_{\max}^2}{\xi}. \qquad (19)$$
Let $x(D)$ be a source message whose corresponding codeword is $y(D)$. For any time index $m$, if the following inequality is satisfied for all $d \in [m - 2M\nu, m + 2M\nu)$,
$$\|r[d] - g_q(y[d])\| < \frac{d_{\min}^2 - \xi}{2 d_{\min}}, \qquad (20)$$
and the following inequalities hold,
$$\sum_{d=m+2M\nu}^{m+(2M+1)\nu-1} \left( \|r[d] - g_q(y[d])\|^2 + 2 d_{\max} \|r[d] - g_q(y[d])\| \right) \le M\xi - \nu d_{\max}^2,$$
$$\sum_{d=m-(2M+1)\nu}^{m-2M\nu-1} \left( \|r[d] - g_q(y[d])\|^2 + 2 d_{\max} \|r[d] - g_q(y[d])\| \right) \le M\xi - \nu d_{\max}^2, \qquad (21)$$
then we must have $x[\tilde{m}] = x_{ML}[\tilde{m}]$, $\forall \tilde{m} \in [m, m+\nu)$.

We skip the proof of Theorem 1 since the result is implied by Theorem 3, presented in Section VI. Note that the values of $d_{\min}$ and $d_{\max}$ only depend on the $g_q()$ function. Hence, as long as $g_q()$ and $\nu$ are given, the values of $\xi$ and $M$ can be fixed, e.g., $\xi = d_{\min}^2/2$ and $M = \lceil 2\nu d_{\max}^2 / d_{\min}^2 \rceil + 1$. Given $M$, the optimality test presented in Theorem 1 tests the optimality of $\{x[\tilde{m}] \,|\, \tilde{m} \in [m, m+\nu)\}$ using the log likelihoods of channel output symbols within a fixed-sized time interval $[m - (2M+1)\nu, m + (2M+1)\nu)$. It is quite intuitive to see that the efficiency of the test does not depend on the codeword length if all other parameters are fixed.

The efficiency of the OTC proposed in Theorem 1 is characterized by the following lemma.

Lemma 3: Assume $\xi$ and $M$ are chosen to satisfy (19). Let $m$ be an arbitrary time index. Let $y(D)$ equal the transmitted codeword within the time interval $[m - (2M+1)\nu, m + (2M+1)\nu)$. Define $\mathrm{OPT}_m$ as the event that (21) is satisfied and (20) is satisfied for all $d \in [m - 2M\nu, m + 2M\nu)$. Fixing all other parameters and taking the SNR to infinity, we have
$$\lim_{\mathrm{SNR} \to \infty} P\{\mathrm{OPT}_m\} = 1. \qquad (22)$$
The same conclusion holds if we first take $N$ to infinity and then take the SNR to infinity:
$$\lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{\mathrm{OPT}_m\} = 1. \qquad (23)$$

Proof: If $y(D)$ equals the transmitted codeword within the time interval $[m - (2M+1)\nu, m + (2M+1)\nu)$, then for $d \in [m - (2M+1)\nu, m + (2M+1)\nu)$ we have
$$r[d] - g_q(y[d]) = n[d]. \qquad (24)$$
Consequently, (22) and (23) hold because the $\|n[d]\|^2$ are i.i.d. (scaled $\chi^2$), whose mean, $n/(2\,\mathrm{SNR})$, and variance, $n/(2\,\mathrm{SNR}^2)$, converge to 0 as the SNR goes to infinity. $\blacksquare$
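The conditions of Theorem 1 (in the form reconstructed above) are simple windowed checks on the per-symbol distances. The following sketch packages them as a predicate; the calling convention and names are our own, and $r$ and $gy$ are arrays holding the channel output and $g_q(y[d])$ for the candidate codeword.

```python
import numpy as np

def nll_optimality_test(r, gy, m, nu, M, xi, d_min, d_max):
    """Check conditions (20)-(21) for the symbols x[m], ..., x[m+nu-1].
    r and gy are (N + nu - 1) x n arrays; returns True if optimality of the
    tested symbols is confirmed."""
    lo, hi = m - (2 * M + 1) * nu, m + (2 * M + 1) * nu
    if lo < 0 or hi > len(r):
        return False                              # the window must fit in the block
    t = np.linalg.norm(r - gy, axis=1)            # ||r[d] - g_q(y[d])|| for all d
    # Condition (20) on the middle window [m - 2M*nu, m + 2M*nu):
    if not np.all(t[m - 2 * M * nu: m + 2 * M * nu] < (d_min ** 2 - xi) / (2 * d_min)):
        return False
    # Condition (21) on the two guard windows of length nu:
    for w in (t[m + 2 * M * nu: m + (2 * M + 1) * nu],
              t[m - (2 * M + 1) * nu: m - 2 * M * nu]):
        if np.sum(w ** 2 + 2 * d_max * w) > M * xi - nu * d_max ** 2:
            return False
    return True
```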
Lemma 3 implies that, if there is a suboptimal decoder whose probability of symbol detection error (as opposed to sequence detection error) is low under high SNR, then NLL-based optimality tests can help transform the suboptimal detector into an ML detector with only a marginal increase in average decoding complexity. An example of such a transformation is presented in the following section.

V. A Three-Step ML Decoding Framework

The communication system given in Section II follows a discrete-time hidden Markov model [13], where each Markov state at time index $d$ corresponds to a possible combination of source symbols in the time interval $(d-\nu, d]$. If a decoder obtains the ML codeword using the VA, all Markov states within the time interval $[\nu, N]$ have to be visited. Alternatively, if one can use a low complexity algorithm to disprove the optimality of most of the Markov states, then the VA can limit its search by visiting only a small subset of Markov states.

Following this idea, the three-step ML decoding framework is given as follows.
• Step 1: The decoder uses a suboptimal algorithm (denoted by $\Phi_{sub}$) to obtain a quick guess of the codeword $\tilde{y}(D)$ and its corresponding source message $\tilde{x}(D)$.
• Step 2: An NLL-based optimality test (specified in Theorem 1) is applied to each of the source symbols of $\tilde{x}(D)$. The decoder maintains a source symbol set sequence $\mathcal{X}(D)$, with $\mathcal{X}[d]$ being the source symbol set at time index $d$. If $\tilde{x}[d] = x_{ML}[d]$ can be confirmed by the optimality test, we let $\mathcal{X}[d] = \{\tilde{x}[d]\}$; otherwise, we let $\mathcal{X}[d]$ be the set of all possible source symbol vectors at time index $d$.
• Step 3: The decoder uses a modified VA to search for the ML source message. The only difference between the modified VA and the conventional VA is that the modified VA visits a Markov state only if all source symbols corresponding to the Markov state belong to the source symbol sets $\mathcal{X}[d]$ of the corresponding time indices.

Implementing the modified VA is quite straightforward; a brief sketch is given below. Compared to the three-step decoding algorithm studied in [7], the key advantage of using an NLL-based optimality test is that the test can be applied to an individual source symbol rather than to the whole source message.
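A minimal sketch of the framework, for the same assumed binary $k = 1$ code as before, is given below. The first-step decoder `sub_decode` and the per-index test `nll_confirm` are left abstract (any $\Phi_{sub}$ and any implementation of the Theorem 1 test can be supplied); all names are our own.

```python
import numpy as np

def three_step_decode(r, G, g2, nll_confirm, sub_decode):
    """Sketch of the three-step framework for a k = 1 code over GF(2)."""
    nu = len(G)
    N = len(r) - nu + 1
    x_guess = sub_decode(r)                      # Step 1: quick suboptimal guess
    X = [[x_guess[d]] if nll_confirm(r, x_guess, d) else [0, 1]
         for d in range(N)]                      # Step 2: per-symbol candidate sets
    return restricted_viterbi(r, G, X, g2)       # Step 3: modified VA

def restricted_viterbi(r, G, X, g2):
    """Modified VA: a Markov state is visited only if every source bit that
    defines it belongs to the candidate sets X[d]."""
    nu = len(G)
    N = len(X)
    surv = {(0,) * (nu - 1): (0.0, [])}          # state -> (metric, decided bits)
    for d in range(N + nu - 1):
        nxt = {}
        for s, (metric, path) in surv.items():
            for b in (X[d] if d < N else [0]):   # x[d] = 0 for d >= N (tail)
                w = (b,) + s                     # (x[d], x[d-1], ..., x[d-nu+1])
                y = sum(w[l] * G[l][0] for l in range(nu)) % 2
                m2 = metric + float(np.sum((r[d] - g2(y)) ** 2))
                s2 = w[:-1]
                if s2 not in nxt or m2 < nxt[s2][0]:
                    nxt[s2] = (m2, path + [b])
        surv = nxt
    best_metric, best_path = min(surv.values(), key=lambda mp: mp[0])
    return best_path[:N]
```

When every $\mathcal{X}[d]$ is a singleton, the restricted trellis degenerates to a single path, which is the source of the vanishing third-step complexity claimed in Theorem 2 below.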
Theorem 2: Let $P_e\{\Phi_{sub}\}$ be the probability of symbol detection error of $\Phi_{sub}$. Assume, while fixing all other parameters,
$$\lim_{\mathrm{SNR} \to \infty} P_e\{\Phi_{sub}\} = 0, \qquad \lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P_e\{\Phi_{sub}\} = 0. \qquad (25)$$
Let $C_{mva}$ be the average number of Markov states per time unit visited by the modified VA in the third step of the ML decoder. For any $\delta > 0$, we have
$$\lim_{\mathrm{SNR} \to \infty} P\{C_{mva} \le \delta\} = 1, \qquad \lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{C_{mva} \le \delta\} = 1. \qquad (26)$$

Proof: Let $x_0(D)$, $y_0(D)$ be the actual source message and the transmitted codeword, respectively. Let $\tilde{x}(D)$, $\tilde{y}(D)$ be the source message and the codeword output by $\Phi_{sub}$. According to (25), for any time index $m$, we have
$$\lim_{\mathrm{SNR} \to \infty} P\left\{ \tilde{y}[d] = y_0[d],\; \forall d \in [m - (2M+1)\nu, m + (2M+1)\nu) \right\} = 1, \qquad (27)$$
where $M$ is the parameter of the NLL-based optimality test, specified in Theorem 1. According to (27), Lemma 3, and Theorem 1, for any $m$, if $\tilde{y}[d] = y_0[d]$, $\forall d \in [m - (2M+1)\nu, m + (2M+1)\nu)$, then the probability that the NLL-based optimality test can confirm $\tilde{x}[d] = x_{ML}[d]$, $\forall d \in [m, m+\nu)$, converges to one as $\mathrm{SNR} \to \infty$. Consequently, letting $\mathcal{X}[d]$ be the source symbol set maintained by the ML decoder in the second step, we have
$$\lim_{\mathrm{SNR} \to \infty} P\{|\mathcal{X}[d]| = 1,\; \forall d \in [m, m+\nu)\} = 1, \quad \forall m. \qquad (28)$$
Since the worst case complexity of the modified VA is bounded, (28) implies that, for any $\delta > 0$, $\lim_{\mathrm{SNR} \to \infty} P\{C_{mva} \le \delta\} = 1$. Since all derivations hold if we first take $N$ to infinity, we also have $\lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{C_{mva} \le \delta\} = 1$. $\blacksquare$

By sharing computations among the optimality tests, it is easy to see that the complexity of the second step of the ML decoder is equivalent, in order, to visiting one Markov state per time unit. Therefore, if $\Phi_{sub}$ satisfies (25), as $\mathrm{SNR} \to \infty$, the complexity of the three-step ML decoder converges to the complexity of $\Phi_{sub}$, which can be significantly lower than the complexity of the VA. Moreover, the three steps of the ML decoder can be implemented in a parallelized manner, in the sense that each step can process some of the source symbols without waiting for the previous step to completely finish its work. An example of such a parallelized implementation can be found in [15, The Simple MLSD Algorithm].

VI. Maximum Likelihood Sequence Detection in a Class of Hidden Markov Systems

In this section, we generalize the results of Section IV to ML sequence detection (MLSD) in a class of first order discrete-time hidden Markov systems [13]. We demonstrate in Appendix D that the communication system presented in Section II satisfies the model and the key assumptions given in this section.

Let $u(D) = u[d]D^{d} + u[d+1]D^{d+1} + \cdots$ be a first order Markov sequence, where $d$ is the time index, possibly negative; $u[d]$ represents the Markov state (at time $d$), which is a $k\nu$-dimensional row vector defined over GF$(q)$. We assume $u[d] = \mathbf{0}$ for $d < 0$ or $d \ge N$, with $N$ being the sequence length. Define $y[d] = y(u[d])$ as the "processed state", which is a deterministic function of $u[d]$; $y[d]$ is an $n$-dimensional row vector defined over GF$(q)$. We term $y(D) = y[d]D^{d} + y[d+1]D^{d+1} + \cdots$ the processed state sequence. Let $r(D) = r[d]D^{d} + r[d+1]D^{d+1} + \cdots$ be the observation sequence, where $r[d]$ is an $n$-dimensional row vector with real-valued elements.

Denote the state transition probability of the hidden Markov system by
$$P_t(u_2 \,|\, u_1) = P\{u[d+1] = u_2 \,|\, u[d] = u_1\}. \qquad (29)$$
Define the transition probability ratio bound $p_{tr}$ by
$$p_{tr} = \min_{\substack{u_1, u_2:\; P_t(u_2|u_1) > 0 \\ u_3, u_4:\; P_t(u_4|u_3) > 0}} \frac{P_t(u_2 \,|\, u_1)}{P_t(u_4 \,|\, u_3)}. \qquad (30)$$
We assume the Markov chain is ergodic and homogeneous. Therefore, there exists a positive integer $\nu$ such that
$$P\{u[d+\nu] = u_2 \,|\, u[d] = u_1\} \ne 0, \quad \forall u_1, u_2. \qquad (31)$$
Denote the observation distribution function by
$$F_o(r \,|\, y) = P\{r[d] \le r \,|\, y[d] = y\}. \qquad (32)$$
Let the corresponding probability density function (or probability mass function) be $f_o(r \,|\, y)$.

We also make the following two key assumptions.

Assumption 2: We assume that the state processing $y[d] = y(u[d])$ does not compromise the observability of the Markov states, in the sense that there exists a positive integer $\nu$ satisfying the following property. Given two Markov state sequences $u(D)$ and $\tilde{u}(D)$, for any time index $d$, if $u[d] \ne \tilde{u}[d]$, then we can find a time index $m \in (d-\nu, d+\nu)$ such that $y(u[m]) \ne y(\tilde{u}[m])$.

Note that we used the same constant $\nu$ in (31) and in Assumption 2. This is valid because, if (31) is satisfied for $\nu = \nu_0$, then it is also satisfied for all $\nu \ge \nu_0$; a similar property applies to Assumption 2.
Consequently, if Assumption 2 holds, a common integer $\nu$ satisfying both (31) and Assumption 2 can always be found.

Assumption 3: Assume the existence of two functions, $L_l(r, y)$ and $L_u(r, y)$, both functions of the channel output symbol $r$ and the processed state $y$. Assume $L_l(r, y)$ and $L_u(r, y)$ have the following two properties. First, the following inequalities hold for all $r$ and $y$:
$$L_l(r, y) \le \min_{y_1:\, y_1 \ne y} \left[ -\log(f_o(r|y_1)) + \log(f_o(r|y)) \right],$$
$$L_u(r, y) \ge \max_{y_1 \ne y_2} \left[ -\log(f_o(r|y_1)) + \log(f_o(r|y_2)) \right]. \qquad (33)$$
Second, the complexity of evaluating $L_l(r, y)$ and $L_u(r, y)$ is low, in the sense that they do not require the search of any processed state other than $y$.

Note that the validity of the results presented in this section does not depend on the second property imposed in Assumption 3. However, we still include the property in the assumption, since the key motivation for posing Assumption 3 is to use the two functions $L_l(r, y)$ and $L_u(r, y)$ as tools to avoid exhaustive Markov state search and hence to reduce the complexity of ML decoding. Also note that the right hand side of the second inequality in (33) is not a function of $y$. However, the upper bound on the left hand side is a function of a processed state $y$, since one often needs a "reference state" in order to upper bound the right hand side of (33). Further explanation is given in Appendix D.

Given the observation sequence $r(D)$, the negative SLL of a state sequence $u(D)$ is obtained by
$$S_u(u(D)) = -\sum_{d=0}^{N} \log\left( f_o(r[d] \,|\, y[d])\, P_t(u[d] \,|\, u[d-1]) \right). \qquad (34)$$
The objective of MLSD is to find the ML sequence that minimizes the negative SLL,
$$u_{ML}(D) = \operatorname*{argmin}_{u[d],\; 0 \le d < N} S_u(u(D)). \qquad (35)$$

Theorem 3: Assume the discrete-time Markov system satisfies Assumptions 2 and 3. Let $\rho > 0$ be an arbitrary constant. Let the actual state sequence be $u_0(D)$, with corresponding processed states $y_0(D)$. Let $p_{tr}$ be defined by (30). For any time index $m$, if there is an integer $M > 0$ such that, for all $d \in [m - 2M\nu, m + 2M\nu)$,
$$L_l(r[d], y_0[d]) > 3\nu(\rho - \log p_{tr}), \qquad (36)$$
and
$$\sum_{d=m+2M\nu}^{m+(2M+1)\nu-1} L_u(r[d], y_0[d]) \le 3M\nu\rho + (\nu+1)\log p_{tr},$$
$$\sum_{d=m-(2M+1)\nu}^{m-2M\nu-1} L_u(r[d], y_0[d]) \le 3M\nu\rho + \nu\log p_{tr}, \qquad (37)$$
then $u_0[m+\nu-1] = u_{ML}[m+\nu-1]$ must be true.

The proof of Theorem 3 is given in Appendix E. Note that Theorem 3 implies Theorem 1 if we set the parameters in Theorem 1 at the corresponding values given in Appendix D.
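In code form, the test of Theorem 3 (again in the form reconstructed above) is a direct windowed evaluation of the two bounding functions of Assumption 3. In the sketch below, $L_l$ and $L_u$ are passed in as callables, and the calling convention is our own:

```python
def theorem3_test(r, y0, L_l, L_u, m, nu, M, rho, log_ptr):
    """Check conditions (36)-(37) for the Markov state u0[m+nu-1].
    r and y0 are indexable sequences of observations and processed states;
    L_l and L_u are the bounding functions of Assumption 3."""
    if m - (2 * M + 1) * nu < 0 or m + (2 * M + 1) * nu > len(r):
        return False
    if any(L_l(r[d], y0[d]) <= 3 * nu * (rho - log_ptr)
           for d in range(m - 2 * M * nu, m + 2 * M * nu)):
        return False                              # condition (36) fails
    right = sum(L_u(r[d], y0[d])
                for d in range(m + 2 * M * nu, m + (2 * M + 1) * nu))
    left = sum(L_u(r[d], y0[d])
               for d in range(m - (2 * M + 1) * nu, m - 2 * M * nu))
    return (right <= 3 * M * nu * rho + (nu + 1) * log_ptr
            and left <= 3 * M * nu * rho + nu * log_ptr)   # condition (37)
```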
For communication systems following a discrete-time hidden Markov model, $f_o(r|y)$ often belongs to an ensemble of density (or probability) functions, with the actual realization determined by the SNR. In other words, we can write the observation density (or probability) $f_o(r \,|\, y, \mathrm{SNR})$ as a function of the SNR, and both functions $L_l(r, y)$ and $L_u(r, y)$ in Assumption 3 can then also be functions of the SNR. We make the following assumption.

Assumption 4: Assume the observation density (or probability) $f_o(r \,|\, y, \mathrm{SNR})$ is a function of the SNR. Assume the discrete-time Markov system satisfies Assumption 3. Let the actual state sequence and the processed state sequence be $u_0(D)$ and $y_0(D)$, respectively. Define two positive numbers $d_l$ and $d_u$ as follows:
$$d_l = \sup\left\{ \gamma \ge 0;\; \lim_{\mathrm{SNR} \to \infty} P\{L_l(r[d], y_0[d]) \ge \gamma\, \mathrm{SNR}\} = 1 \right\},$$
$$d_u = \inf\left\{ \gamma \ge 0;\; \lim_{\mathrm{SNR} \to \infty} P\{L_u(r[d], y_0[d]) \le \gamma\, \mathrm{SNR}\} = 1 \right\}. \qquad (38)$$
We assume
$$d_l > 0, \qquad d_u < \infty. \qquad (39)$$

The following lemma characterizes the efficiency of the OTC proposed in Theorem 3.

Lemma 4: Assume the discrete-time Markov system satisfies Assumptions 2 and 4. Let the actual state sequence be $u_0(D)$. Let $\xi$ be an arbitrary constant and $M$ an arbitrary integer satisfying
$$0 < \xi < d_l, \qquad M > \frac{\nu d_u}{\xi}. \qquad (40)$$
Let $\rho = \frac{\xi\, \mathrm{SNR}}{3\nu}$. Given an arbitrary time index $m$, define $\mathrm{OPT}_m$ as the event that (37) is satisfied and (36) is satisfied for all $d \in [m - 2M\nu, m + 2M\nu)$. If we fix all other parameters except the SNR, we have
$$\lim_{\mathrm{SNR} \to \infty} P\{\mathrm{OPT}_m\} = 1. \qquad (41)$$
If we fix all other parameters except the SNR and the sequence length $N$, we have
$$\lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{\mathrm{OPT}_m\} = 1. \qquad (42)$$
We skip the proof of Lemma 4 since it is quite straightforward.

Note that in Lemma 4, when we take $N$ and the SNR to infinity, $M$ can be fixed at a constant. This indicates that, when testing the optimality of a Markov state at a given time index, the NLL-based optimality test only uses observation symbols in a fixed-sized time neighborhood. Based on Theorem 3 and Lemma 4, a three-step ML sequence detector similar to the one presented in Section V can be developed to transform a suboptimal sequence detector into a low complexity ML sequence detector. The detailed discussion is skipped since it does not essentially differ from the one presented in Section V.

VII. Further Discussions

In a practical system, suboptimal decoders such as the belief-propagation-based iterative decoders [5][6] can achieve near optimal error performance with low complexity. It is natural to ask: if suboptimal decoding only causes a negligible performance loss, why should one even bother with enforcing the ML solution? Note that this question does not suggest a default answer, since the argument can also be presented in the opposite direction, i.e., if ML decoding only causes a negligible complexity increase, why should one not use an ML decoder? Nevertheless, the purpose of our work is not to participate in the debate on whether ML decoding is practically useful. Rather, one should interpret Theorem 2 as saying that, for convolutional codes, the existence of a well-performing low complexity suboptimal algorithm implies that ML decoding can be carried out with a similar complexity under high SNR. More importantly, such a conclusion holds irrespective of the codeword length.

Although the efficiency of NLL-based optimality tests does not depend on the codeword length, SLL-based optimality tests are inefficient only when the codeword length is large. Lemma 1 and Theorem 2 suggest that the complexity reduction brought by NLL-based optimality tests can be superior to that of SLL-based optimality tests even for moderate SNR, provided the codeword length is large enough.

Appendix

A. The Path Covering Criterion

Assume the discrete-time hidden Markov model given in Section VI. (It is shown in Appendix D that the model is satisfied by the communication system given in Section II.) Given the observation sequence $r(D)$, let $\tilde{u}(D)$ and $u(D)$ be two Markov state sequences whose corresponding processed state sequences are $\tilde{y}(D)$ and $y(D)$, respectively. If we can find two time indices $d_1 < d_2$ such that $\tilde{u}[d_1] = u[d_1]$, $\tilde{u}[d_2] = u[d_2]$, and
$$\sum_{d=d_1+1}^{d_2} \log \frac{f_o(r[d] \,|\, \tilde{y}[d])\, P_t(\tilde{u}[d] \,|\, \tilde{u}[d-1])}{f_o(r[d] \,|\, y[d])\, P_t(u[d] \,|\, u[d-1])} < 0, \qquad (43)$$
we say $u(D)$ "covers" $\tilde{u}(D)$.
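The covering check (43) transcribes directly into code. In the sketch below, $f_o$ and $P_t$ are supplied as callables under our own calling convention, and all transitions on both paths are assumed to have positive probability:

```python
import math

def covers(r, u, y, u_t, y_t, f_o, P_t, d1, d2):
    """Eq. (43): return True if path u covers path u_t between the agreement
    indices d1 < d2 (u[d1] == u_t[d1] and u[d2] == u_t[d2] are assumed)."""
    total = 0.0
    for d in range(d1 + 1, d2 + 1):
        total += math.log(f_o(r[d], y_t[d]) * P_t(u_t[d], u_t[d - 1]))
        total -= math.log(f_o(r[d], y[d]) * P_t(u[d], u[d - 1]))
    return total < 0.0
```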
Path Covering Criterion: A Markov state sequence $\tilde{u}(D)$ cannot be the ML sequence if we can find another state sequence $u(D)$ that covers $\tilde{u}(D)$.

The proof of the PCC is skipped since it is quite well known [8].

We say $u(D)$ is a "cover" path with respect to Markov states $u[d_1]$ and $u[d_2]$ at time indices $d_1 < d_2$ if, among all Markov paths passing $u[d_1]$ and $u[d_2]$, $u(D)$ maximizes $\sum_{d=d_1+1}^{d_2} \log\left( f_o(r[d]|y[d])\, P_t(u[d]|u[d-1]) \right)$. We say $u(D)$ is a "cover" path with respect to a Markov state $u[d_2]$ at time index $d_2 \ge 0$ if, among all Markov paths passing $u[d_2]$ and satisfying $u[-1] = \mathbf{0}$, $u(D)$ maximizes $\sum_{d=0}^{d_2} \log\left( f_o(r[d]|y[d])\, P_t(u[d]|u[d-1]) \right)$.

B. Examples of SLL-based Optimality Tests Satisfying Assumption 1

In [12][11], when the decoder branches a Markov path at time index $m < N$, the branch is characterized by a partial message $\{\tilde{x}[0], \tilde{x}[1], \ldots, \tilde{x}[m]\}$. For any codeword $\tilde{y}(D)$ associated with the branch, we have
$$\tilde{y}[d] = \sum_{l=0}^{\nu-1} \tilde{x}[d-l]\, G[l]. \qquad (44)$$
In other words, $\mathcal{D}_d^x = [0, m]$. The negative SLL lower bound is given by
$$S_y(\tilde{y}(D)) = \sum_{d=0}^{N+\nu-1} \|r[d] - g_q(\tilde{y}[d])\|^2 \ge \sum_{d=0}^{m} \left\| r[d] - g_q\!\left( \sum_{l=0}^{\nu-1} \tilde{x}[d-l]\, G[l] \right) \right\|^2, \qquad (45)$$
which satisfies Assumption 1 with $\epsilon = 1$.

In [7], several SLL-based OTCs were presented for decoding block codes. The decoder obtains a first guess $y_0(D)$ of the codeword. A negative SLL lower bound $S_y^L \le S_y(\tilde{y}(D))$, $\tilde{y}(D) \ne y_0(D)$, is then developed for the codeword set $\{\tilde{y}(D) \ne y_0(D)\}$, which corresponds to the case of $\mathcal{D}_d^x$ being an empty set in the context of Section III. $y_0(D)$ is optimal if the optimality test $S_y^L > S_y(y_0(D))$ gives a positive answer [7].

The lower bounds $S_y^L$ presented in [7, Section III] satisfy the following inequality:
$$S_y^L \le \min_{\tilde{y}(D) \ne y_0(D)} \sum_{d=0}^{N+\nu-1} \|g_q(\tilde{y}[d]) - g_q(y_0[d])\|^2. \qquad (46)$$
Since the coding memory is $\nu$, we can always find a codeword $\tilde{y}(D) \ne y_0(D)$ with $\tilde{y}(D)$ differing from $y_0(D)$ at no more than $\nu$ codeword symbols. This implies that the right hand side of (46) can be upper bounded by a constant, denoted by $U_0$, which is not a function of $N$:
$$S_y^L \le \min_{\tilde{y}(D) \ne y_0(D)} \sum_{d=0}^{N+\nu-1} \|g_q(\tilde{y}[d]) - g_q(y_0[d])\|^2 \le U_0. \qquad (47)$$
Consequently, given $\mathrm{SNR} > 0$ and $0 < \epsilon < 1$, there exists a constant $N_0$ such that Assumption 1 is satisfied for $N > N_0$.

C. Proof of Lemma 2

Proof: Assume that, in searching for the ML codeword, the decoder successfully avoided visiting a Markov state specified by $\{x[d-\nu+1], \ldots, x[d]\}$. This implies that we can find two time index sets, $\mathcal{D}_x \subset [d-\nu+1, d]$ and $\mathcal{D}_d^x$ with $\mathcal{D}_d^x \cap [d-\nu+1, d] = \phi$, such that the optimality of all message sets $\{\tilde{x}(\mathcal{D}_x \cup \mathcal{D}_d^x)\}$ with $\tilde{x}[\tilde{d}] = x[\tilde{d}]$, $\forall \tilde{d} \in \mathcal{D}_x$, is disproved. We choose $\mathcal{D}_d^x$ with the maximum cardinality, while making sure that, in disproving the optimality of $\{x[d-\nu+1], \ldots, x[d]\}$, the detector visited all the Markov states $\{\tilde{x}[\tilde{d}-\nu+1], \ldots, \tilde{x}[\tilde{d}]\}$ satisfying $[\tilde{d}-\nu+1, \tilde{d}] \subseteq \mathcal{D}_d^x$.

According to the definitions of $\mathcal{D}_x$ and $\mathcal{D}_d^x$, the decoder needs to disprove the optimality of a special message set $\{x_1(\mathcal{D}_x \cup \mathcal{D}_d^x)\}$, defined by $x_1[\tilde{d}] = x[\tilde{d}]$, $\forall \tilde{d} \in \mathcal{D}_x$, and $x_1[\tilde{d}] = x_0[\tilde{d}]$, $\forall \tilde{d} \in \mathcal{D}_d^x$.
The definition of $\mathcal{D}_d^x$ also implies that the decoder needs to obtain a lower bound $S_x^L(\tilde{x}(\mathcal{D}_x \cup \mathcal{D}_d^x))$ of the negative SLLs of the messages in $\{\tilde{x}(\mathcal{D}_x \cup \mathcal{D}_d^x)\}$. The lower bound $S_x^L(\tilde{x}(\mathcal{D}_x \cup \mathcal{D}_d^x))$ should only be a function of the partial message $\tilde{x}(\mathcal{D}_x \cup \mathcal{D}_d^x)$, and should not depend on any source message symbol whose time index is outside $\mathcal{D}_x \cup \mathcal{D}_d^x$. However, since the corresponding $\mathcal{D}_e$ (defined in (11)) of $\{x_1(\mathcal{D}_x \cup \mathcal{D}_d^x)\}$ satisfies $|\mathcal{D}_e| \le \nu$, according to Lemma 1, the probability of disproving the optimality of $\{x_1(\mathcal{D}_x \cup \mathcal{D}_d^x)\}$ (using an SLL-based optimality test) is low if $N - |\mathcal{D}_x \cup \mathcal{D}_d^x| \gg \nu$.

To make the argument explicit, the fact that the decoder visits all Markov states $\{\tilde{x}[\tilde{d}-\nu+1], \ldots, \tilde{x}[\tilde{d}]\}$ with $[\tilde{d}-\nu+1, \tilde{d}] \subseteq \mathcal{D}_d^x$ implies
$$C_{sll} \ge \frac{|\mathcal{D}_d^x| - \nu}{N + \nu}\, C_{va}. \qquad (48)$$
According to Lemma 1, for any positive constant $\delta > 0$, if we fix all other parameters and take $N$ to infinity, we have
$$\lim_{N \to \infty} P\left\{ N - |\mathcal{D}_d^x| - |\mathcal{D}_x| < \delta N \right\} = 1. \qquad (49)$$
(An equivalent statement of (49) is: if $N - |\mathcal{D}_d^x| - |\mathcal{D}_x| \ge \delta N$, then, as $N \to \infty$, the probability of disproving the optimality of all message sets $\{\tilde{x}(\mathcal{D}_x \cup \mathcal{D}_d^x)\}$ with $\tilde{x}[\tilde{d}] = x[\tilde{d}]$, $\forall \tilde{d} \in \mathcal{D}_x$, using SLL-based optimality tests, goes to zero.) Combining (48) and (49), we get
$$\lim_{N \to \infty} P\{C_{sll} \ge (1-\delta)C_{va}\} = 1. \qquad (50)$$
Since (50) holds for any fixed SNR, it still holds if we take the SNR to infinity after taking $N$ to infinity, i.e.,
$$\lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{C_{sll} \ge (1-\delta)C_{va}\} = 1. \qquad (51)$$
$\blacksquare$

D. The Hidden Markov Model and Its Key Assumptions

In this section, we show that the communication system presented in Section II satisfies the discrete-time hidden Markov model and the key assumptions given in Section VI.

Consider a communication system as modeled in Section II. Define $u[d] = [x[d-\nu+1], \ldots, x[d]]$. It is easy to see that $u(D)$ is a Markov sequence. The processed state $y[d] = y(u[d])$ is only a function of the corresponding Markov state. If two Markov states at successive time indices take the form
$$u[d] = [\tilde{x}[d-\nu+1], \ldots, \tilde{x}[d]], \qquad u[d+1] = [\tilde{x}[d-\nu+2], \ldots, \tilde{x}[d+1]], \qquad (52)$$
for some $\tilde{x}(D)$, then we have
$$P_t(u[d+1] \,|\, u[d]) = \frac{1}{q^k}. \qquad (53)$$
Otherwise $P_t(u[d+1] \,|\, u[d]) = 0$. According to (30), we have $p_{tr} = 1$.

Since $u[d] = [x[d-\nu+1], \ldots, x[d]]$ does not depend on source symbols at time indices $m \le d - \nu$, we know
$$P_t(u[d] = u_2 \,|\, u[d-\nu] = u_1) \ne 0, \quad \forall u_1, u_2, \qquad (54)$$
so that (31) is satisfied. The observation density is given by
$$f_o(r \,|\, y) = \left( \frac{\mathrm{SNR}}{\pi} \right)^{n/2} \exp\left( -\mathrm{SNR}\, \|r - g_q(y)\|^2 \right). \qquad (55)$$
Next, we show that Assumption 2 is satisfied. Let $u(D)$ and $\tilde{u}(D)$ be two Markov state sequences. Let $x(D)$ and $y(D)$ be the source message and the codeword corresponding to $u(D)$; let $\tilde{x}(D)$ and $\tilde{y}(D)$ be the source message and the codeword corresponding to $\tilde{u}(D)$. For a time index $d$, if $u[d] \ne \tilde{u}[d]$, we can find a time index $m \in (d-\nu, d]$ such that $x[m] \ne \tilde{x}[m]$. Consequently, according to [14, Corollary 2], we can find a time index $\tilde{m} \in [m, m+\nu)$ such that $y[\tilde{m}] \ne \tilde{y}[\tilde{m}]$. Therefore, Assumption 2 holds because $\tilde{m} \in (d-\nu, d+\nu)$.

Let $d_{\min}$ and $d_{\max}$ be defined as in Theorem 1. Let $y_1 \ne y_2$ be two arbitrary codeword symbols. We have the following triangle inequalities:
$$\|r - g_q(y_1)\| \ge \left|\, \|g_q(y_1) - g_q(y_2)\| - \|r - g_q(y_2)\| \,\right|,$$
$$\|r - g_q(y_1)\| \le \|g_q(y_1) - g_q(y_2)\| + \|r - g_q(y_2)\|. \qquad (56)$$
Write $t = \|r - g_q(y_2)\|$ and $D = \|g_q(y_1) - g_q(y_2)\| \in [d_{\min}, d_{\max}]$. The first inequality in (56) implies $\|r - g_q(y_1)\|^2 - t^2 \ge (D - t)^2 - t^2 = D(D - 2t)$; since $D(D-2t)$ is convex in $D$, its minimum over $[d_{\min}, d_{\max}]$ is attained at an endpoint. Hence
$$\min_{y_1:\, y_1 \ne y_2} \left[ -\log(f_o(r|y_1)) + \log(f_o(r|y_2)) \right] = \min_{y_1:\, y_1 \ne y_2} \mathrm{SNR}\left( \|r - g_q(y_1)\|^2 - \|r - g_q(y_2)\|^2 \right) \ge \mathrm{SNR} \min\left\{ d_{\min}(d_{\min} - 2t),\; d_{\max}(d_{\max} - 2t) \right\}. \qquad (57)$$
The second inequality in (56) implies
$$\max_{y_1 \ne y_2} \left[ -\log(f_o(r|y_1)) + \log(f_o(r|y_2)) \right] = \max_{y_1 \ne y_2} \mathrm{SNR}\left( \|r - g_q(y_1)\|^2 - \|r - g_q(y_2)\|^2 \right) \le \max_{y_1} \mathrm{SNR}\, \|r - g_q(y_1)\|^2 \le \mathrm{SNR}\left( \|r - g_q(y)\| + d_{\max} \right)^2, \qquad (58)$$
where $y$ is an arbitrary reference codeword symbol. Therefore, Assumption 3 is satisfied by defining
$$L_l(r, y) = \mathrm{SNR} \min\left\{ d_{\min}\left(d_{\min} - 2\|r - g_q(y)\|\right),\; d_{\max}\left(d_{\max} - 2\|r - g_q(y)\|\right) \right\},$$
$$L_u(r, y) = \mathrm{SNR}\left( \|r - g_q(y)\| + d_{\max} \right)^2. \qquad (59)$$
Note that evaluating $L_l(r, y)$ and $L_u(r, y)$ does not involve visiting any processed state other than $y$.

If $y_0[d]$ and $r[d]$ are the actual codeword symbol and the channel output at time index $d$, then $\|r[d] - g_q(y_0[d])\|^2 = \|n[d]\|^2$ is a (scaled) $\chi^2$ random variable with mean $n/(2\,\mathrm{SNR})$ and variance $n/(2\,\mathrm{SNR}^2)$. From (59), it is easily seen that Assumption 4 is satisfied, with $d_l = d_{\min}^2 > 0$ and $d_u = d_{\max}^2 < \infty$.
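The bounds in (59) can be sanity-checked numerically. The sketch below (with an assumed QPSK-like image set for $g_q$ and assumed parameters) verifies that $L_l$ and $L_u$ bracket the log likelihood ratio terms of (33) for a random channel output:

```python
import numpy as np

rng = np.random.default_rng(2)
snr, n = 10.0, 2
pts = np.array([[1., 1.], [1., -1.], [-1., 1.], [-1., -1.]])   # assumed g_q image set
pairs = [(a, b) for a in pts for b in pts if not np.array_equal(a, b)]
dists = [np.linalg.norm(a - b) for a, b in pairs]
d_min, d_max = min(dists), max(dists)

def neg_log_f(r, gy):
    """-log f_o(r | y) up to an additive constant, per eq. (55)."""
    return snr * np.sum((r - gy) ** 2)

y0 = pts[0]
r = y0 + rng.normal(0.0, np.sqrt(1.0 / (2.0 * snr)), size=n)
t = np.linalg.norm(r - y0)
L_l = snr * min(d_min * (d_min - 2 * t), d_max * (d_max - 2 * t))   # eq. (59)
L_u = snr * (t + d_max) ** 2                                        # eq. (59)
lhs_min = min(neg_log_f(r, p) - neg_log_f(r, y0)
              for p in pts if not np.array_equal(p, y0))
lhs_max = max(neg_log_f(r, a) - neg_log_f(r, b) for a, b in pairs)
assert L_l <= lhs_min and L_u >= lhs_max
print(L_l, lhs_min, lhs_max, L_u)
```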
E. Proof of Theorem 3

Proof: Let $\tilde{u}(D)$ be an arbitrary Markov state sequence with corresponding processed state sequence $\tilde{y}(D)$. Assume
$$\tilde{u}[m+\nu-1] \ne u_0[m+\nu-1]. \qquad (60)$$
Theorem 3 holds if we can prove that any $\tilde{u}(D)$ satisfying (60) cannot be the ML state sequence. Let $k$ denote a positive integer. Define two integers $K_l$ and $K_r$ as follows:
$$K_l = \operatorname*{argmin}_{k > 0} \left\{ \tilde{u}[m+\nu-1-k\nu] = u_0[m+\nu-1-k\nu] \right\},$$
$$K_r = \operatorname*{argmin}_{k > 0} \left\{ \tilde{u}[m+\nu-1+k\nu] = u_0[m+\nu-1+k\nu] \right\}. \qquad (61)$$
We consider respectively the following four cases based on the values of $K_l$ and $K_r$. In all four cases, we show that $\tilde{u}(D)$ cannot be the ML sequence.

Case 1: $K_l \le 2M+1$, $K_r \le 2M-1$. Since $\tilde{u}[m+\nu-1+k\nu] \ne u_0[m+\nu-1+k\nu]$ for all $-K_l < k < K_r$, according to Assumption 2, $\tilde{y}(D)$ and $y_0(D)$ differ at no fewer than $\lfloor (K_l+K_r)/2 \rfloor$ time indices in the time interval $[m+\nu-1-K_l\nu, m+\nu-1+K_r\nu)$, where $\lfloor x \rfloor$ denotes the maximum integer no larger than $x$. According to (33) and (36), for $d \in [m-2M\nu, m+2M\nu)$, if $\tilde{y}[d] \ne y_0[d]$, we have
$$-\log \frac{f_o(r[d]|\tilde{y}[d])}{f_o(r[d]|y_0[d])} \ge L_l(r[d], y_0[d]) > 3\nu(\rho - \log p_{tr}). \qquad (62)$$
Consequently, we get
$$-\sum_{d=m+\nu-K_l\nu}^{m+\nu-1+K_r\nu} \log \frac{f_o(r[d]|\tilde{y}[d])\, P_t(\tilde{u}[d]|\tilde{u}[d-1])}{f_o(r[d]|y_0[d])\, P_t(u_0[d]|u_0[d-1])} \ge \left\lfloor \frac{K_l+K_r}{2} \right\rfloor 3\nu(\rho - \log p_{tr}) + (K_l+K_r)\nu \log p_{tr} \ge \left\lfloor \frac{K_l+K_r}{2} \right\rfloor 3\nu\rho > 0, \qquad (63)$$
which, by the PCC, implies that $u_0(D)$ covers $\tilde{u}(D)$. Hence $\tilde{u}(D)$ cannot be the ML sequence.

Case 2: $K_l \le 2M+1$, $K_r > 2M-1$. We construct a state sequence $u_c(D)$ and show that $u_c(D)$ covers $\tilde{u}(D)$ (see the definition in Appendix A). $u_c(D)$ is constructed as follows:
$$u_c[d] = u_0[d], \;\; \text{for } d < m+2M\nu; \qquad u_c[d] = \tilde{u}[d], \;\; \text{for } d \ge m+(2M+1)\nu. \qquad (64)$$
According to (31), we can always construct $u_c[d]$ for $d \in [m+2M\nu, m+(2M+1)\nu)$ so that (64) is satisfied. Let $y_c(D)$ be the processed state sequence corresponding to $u_c(D)$.

From (33) and the first inequality in (37), we get
$$-\sum_{d=m+2M\nu}^{m+(2M+1)\nu} \log \frac{f_o(r[d]|\tilde{y}[d])\, P_t(\tilde{u}[d]|\tilde{u}[d-1])}{f_o(r[d]|y_c[d])\, P_t(u_c[d]|u_c[d-1])} \ge -\sum_{d=m+2M\nu}^{m+(2M+1)\nu-1} L_u(r[d], y_0[d]) + (\nu+1)\log p_{tr} \ge -3M\nu\rho. \qquad (65)$$
Since $\tilde{u}[m+\nu-1+k\nu] \ne u_c[m+\nu-1+k\nu]$ for all $-K_l < k \le 2M-1$, according to Assumption 2, $\tilde{y}(D)$ and $y_c(D)$ differ at no fewer than $\lfloor (K_l+2M)/2 \rfloor \ge M$ time indices in the time interval $[m+\nu-1-K_l\nu, m+2M\nu)$. According to (33) and (36), we have
$$-\sum_{d=m+\nu-K_l\nu}^{m+2M\nu-1} \log \frac{f_o(r[d]|\tilde{y}[d])\, P_t(\tilde{u}[d]|\tilde{u}[d-1])}{f_o(r[d]|y_c[d])\, P_t(u_c[d]|u_c[d-1])} > \left\lfloor \frac{K_l+2M}{2} \right\rfloor 3\nu(\rho - \log p_{tr}) + (K_l+2M-1)\nu \log p_{tr} \ge 3M\nu\rho. \qquad (66)$$
Combining (65) and (66), we obtain
$$-\sum_{d=m+\nu-K_l\nu}^{m+(2M+1)\nu} \log \frac{f_o(r[d]|\tilde{y}[d])\, P_t(\tilde{u}[d]|\tilde{u}[d-1])}{f_o(r[d]|y_c[d])\, P_t(u_c[d]|u_c[d-1])} > 0, \qquad (67)$$
which implies that $u_c(D)$ covers $\tilde{u}(D)$. Hence, according to the PCC, $\tilde{u}(D)$ cannot be the ML sequence.

Case 3: $K_l > 2M+1$, $K_r \le 2M-1$. We construct a state sequence $u_c(D)$ and show that $u_c(D)$ covers $\tilde{u}(D)$. $u_c(D)$ is constructed as follows:
$$u_c[d] = u_0[d], \;\; \text{for } d \ge m-2M\nu; \qquad u_c[d] = \tilde{u}[d], \;\; \text{for } d < m-(2M+1)\nu. \qquad (68)$$
According to (31), we can always construct $u_c[d]$ for $d \in [m-(2M+1)\nu, m-2M\nu)$ so that (68) is satisfied. Let $y_c(D)$ be the processed state sequence corresponding to $u_c(D)$.

From (33) and the second inequality in (37), we get
$$-\sum_{d=m-(2M+1)\nu}^{m-2M\nu-1} \log \frac{f_o(r[d]|\tilde{y}[d])\, P_t(\tilde{u}[d]|\tilde{u}[d-1])}{f_o(r[d]|y_c[d])\, P_t(u_c[d]|u_c[d-1])} \ge -\sum_{d=m-(2M+1)\nu}^{m-2M\nu-1} L_u(r[d], y_0[d]) + \nu \log p_{tr} \ge -3M\nu\rho. \qquad (69)$$
Since $\tilde{u}[m+\nu-1+k\nu] \ne u_c[m+\nu-1+k\nu]$ for all $-2M \le k < K_r$, according to Assumption 2, $\tilde{y}(D)$ and $y_c(D)$ differ at no fewer than $\lfloor (2M+1+K_r)/2 \rfloor \ge M+1$ time indices in the time interval $[m-2M\nu, m+\nu-1+K_r\nu)$. According to (33) and (36), we have
$$-\sum_{d=m-2M\nu}^{m+\nu-1+K_r\nu} \log \frac{f_o(r[d]|\tilde{y}[d])\, P_t(\tilde{u}[d]|\tilde{u}[d-1])}{f_o(r[d]|y_c[d])\, P_t(u_c[d]|u_c[d-1])} > \left\lfloor \frac{2M+1+K_r}{2} \right\rfloor 3\nu(\rho - \log p_{tr}) + (2M+1+K_r)\nu \log p_{tr} \ge 3(M+1)\nu\rho. \qquad (70)$$
Combining (69) and (70), we obtain
$$\sum_{d=m-(2M+1)\nu}^{m+\nu-1+K_r\nu} \log \frac{f_o(r[d]|\tilde{y}[d])\, P_t(\tilde{u}[d]|\tilde{u}[d-1])}{f_o(r[d]|y_c[d])\, P_t(u_c[d]|u_c[d-1])} < 0, \qquad (71)$$
which implies that $u_c(D)$ covers $\tilde{u}(D)$. Hence, according to the PCC, $\tilde{u}(D)$ cannot be the ML sequence.

Case 4: $K_l > 2M+1$, $K_r > 2M-1$. We construct $u_c(D)$ as follows:
$$u_c[d] = u_0[d], \;\; \text{for } m-2M\nu \le d < m+2M\nu;$$
$$u_c[d] = \tilde{u}[d], \;\; \text{for } d \ge m+(2M+1)\nu \text{ or } d < m-(2M+1)\nu. \qquad (72)$$
Let the processed state sequence corresponding to $u_c(D)$ be $y_c(D)$.

Since $\tilde{u}[m+\nu-1+k\nu] \ne u_c[m+\nu-1+k\nu]$ for all $-2M \le k \le 2M-1$, according to Assumption 2, $\tilde{y}(D)$ and $y_c(D)$ differ at no fewer than $2M$ time indices in the time interval $[m-2M\nu, m+2M\nu)$. According to (33) and (36), we have
$$-\sum_{d=m-2M\nu}^{m+2M\nu-1} \log \frac{f_o(r[d]|\tilde{y}[d])\, P_t(\tilde{u}[d]|\tilde{u}[d-1])}{f_o(r[d]|y_c[d])\, P_t(u_c[d]|u_c[d-1])} > 2M \cdot 3\nu(\rho - \log p_{tr}) + 4M\nu \log p_{tr} \ge 6M\nu\rho. \qquad (73)$$
Meanwhile, it is easily seen that (65) and (69) hold. Combining (65), (69), and (73), we obtain
$$-\sum_{d=m-(2M+1)\nu}^{m+(2M+1)\nu} \log \frac{f_o(r[d]|\tilde{y}[d])\, P_t(\tilde{u}[d]|\tilde{u}[d-1])}{f_o(r[d]|y_c[d])\, P_t(u_c[d]|u_c[d-1])} > -3M\nu\rho - 3M\nu\rho + 6M\nu\rho = 0. \qquad (74)$$
(74) implies that $u_c(D)$ covers $\tilde{u}(D)$. Hence, according to the PCC, $\tilde{u}(D)$ cannot be the ML sequence.

Overall, we have shown that $\tilde{u}(D)$ cannot be the ML sequence, irrespective of the values of $K_l$ and $K_r$. Therefore, $u_{ML}[m+\nu-1] = u_0[m+\nu-1]$ must be true. $\blacksquare$

References

[1] G. D. Forney, "The Viterbi Algorithm," Proc. IEEE, vol. 61, no. 3, pp. 268-278, Mar. 1973.
[2] A. J. Viterbi, "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," IEEE Trans. Inform. Theory, vol. IT-13, no. 2, pp. 260-269, Apr. 1967.
[3] K. Zigangirov and H. Osthoff, "List Decoding of Trellis Codes," Problems of Control and Information Theory, pp. 347-364, 1980.
[4] R. Fano, "A Heuristic Discussion of Probabilistic Decoding," IEEE Trans. Inform. Theory, vol. IT-9, pp. 64-74, Apr. 1963.
[5] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate," IEEE Trans. Inform. Theory, vol. IT-20, pp. 284-287, Mar. 1974.
[6] R. Johannesson and K. Zigangirov, Fundamentals of Convolutional Coding, IEEE Press, 1999.
[7] P. Swaszek and W. Jones, "How Often Is Hard-Decision Decoding Enough?" IEEE Trans. Inform. Theory, vol. 44, pp. 1187-1193, May 1998.
[8] M. Ariel and J. Snyders, "Error-Trellises for Convolutional Codes-Part II: Decoding Methods," IEEE Trans. Commun., vol. 47, pp. 1015-1024, July 1999.
[9] B. Hassibi and H. Vikalo, "On the Sphere Decoding Algorithm I. Expected Complexity," IEEE Trans. Sig. Proc., vol. 53, no. 8, pp. 2806-2818, Aug. 2005.
[10] U. Fincke and M. Pohst, "Improved Methods for Calculating Vectors of Short Length in a Lattice, Including a Complexity Analysis," Math. Comput., vol. 44, pp. 463-471, Apr. 1985.
[11] H. Vikalo and B. Hassibi, "Maximum-Likelihood Sequence Detection of Multiple Antenna Systems over Dispersive Channels via Sphere Decoding," EURASIP J. Appl. Sig. Proc., no. 1, pp. 525-531, Jan. 2002.
[12] H. Vikalo, Sphere Decoding Algorithms for Digital Communications, Ph.D. Thesis, Stanford Univ., 2003.
[13] L. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
[14] G. D. Forney, "Structural Analysis of Convolutional Codes via Dual Codes," IEEE Trans. Inform. Theory, vol. IT-19, pp. 512-518, Jul. 1973.
[15] J. Luo,