Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery
Lucas Ondel, Hari Krishna Vydana, Lukáš Burget, Jan Černocký
Brno University of Technology
{iondel,vydana,burget,cernocky}@fit.vutbr.cz

Abstract
This work tackles the problem of learning a set of language-specific acoustic units from unlabeled speech recordings, given a set of labeled recordings from other languages. Our approach may be described as a two-step procedure: first, the model learns the notion of acoustic units from the labeled data; then, the model uses this knowledge to find new acoustic units in the target language. We implement this process with the Bayesian Subspace Hidden Markov Model (SHMM), a model akin to the Subspace Gaussian Mixture Model (SGMM) where each low-dimensional embedding represents an acoustic unit rather than just an HMM state. The subspace is trained on 3 languages from the GlobalPhone corpus (German, Polish and Spanish) and the acoustic units are discovered on the TIMIT corpus. Results, measured in equivalent Phone Error Rate, show that this approach significantly outperforms previous HMM-based acoustic unit discovery systems and compares favorably with the Variational Auto-Encoder-HMM.
Index Terms: Bayesian Inference, Hidden Markov Model, Subspace Model, Variational Bayes, Low-resource languages, Acoustic Unit Discovery
1. Introduction
State-of-the-art Automatic Speech Recognition (ASR) systems rely upon very large amounts of speech recordings paired with textual transcriptions. While this approach has proven very successful, it is limited to the very few languages having enough resources to train an ASR system. Due to the cost of data collection and transcription, broadening the range of speech technologies to any language remains an unreachable objective. Parallel to mainstream ASR, there has been a growing interest in the paradigm of unsupervised learning of speech [1]. Unsupervised speech learning attempts to use machine learning techniques to extract various information (phonetic content, speaker identity, ...) from unlabeled recordings. While this is considerably harder than standard ASR, solving this problem would have a considerable impact on the field by reducing the amount of human labour necessary to build a full-fledged ASR pipeline. It is also important to emphasize that linguistic diversity is diminishing worldwide. Many languages are now considered endangered and risk disappearing in the near future. Affordable speech technologies could be a precious tool to help linguists and communities document and preserve these languages.

This work focuses on the specific task of acoustic unit discovery (AUD). Given a collection of unlabeled recordings in a specific language, the task is to learn a set of basic speech units (also called pseudo-phones) to describe the language. AUD algorithms have to solve three problems: segmenting the speech, clustering the segments into units, and inferring how many units are necessary to describe the language. Several approaches have been proposed relying upon Bayesian non-parametric versions of the Hidden Markov Model (HMM) [2, 3, 4]. An important recent extension of this model is the VAE-HMM [5, 6, 7], which combines the traditional HMM with the Variational Auto-Encoder [8].
However, most AUD algorithms are prone to modeling speaker/channel or other non-phonetic variability. To address this issue, we propose the Bayesian Subspace HMM (SHMM). The SHMM is an HMM-based AUD model in which the parameters of each unit are constrained to lie in the phonetic subspace of the total parameter space. This restriction forces the AUD model to focus on the phonetic content of the speech signal and to ignore irrelevant information.
2. Model
Let X = (x_1, ..., x_N) be the sequence of N observed speech frames and U = {u_1, ..., u_P} be the set of P acoustic units. v = (v_1, ..., v_N), v_i ∈ U, is a sequence of variables indicating to which unit each speech frame is associated, and Z = (z_1, ..., z_N) are model-dependent latent variables. We consider generative models for which the complete likelihood of the data factorizes as:

  p(X, Z | v) = ∏_{n=1}^{N} p(x_n, z_n | v_n)    (1)

and the likelihood of a speech frame for a given unit is a member of the exponential family of distributions:

  p(x_n, z_n | v_n = u) = exp{ η_u^T T(x_n, z_n) − A(η_u) }    (2)

where η_u ∈ H is the D-dimensional vector of natural parameters corresponding to one acoustic unit, T(x_n, z_n) are the sufficient statistics and A(η_u) is the (log-)normalization constant of the density. Note that the nature of the model for the units (HMM, GMM, Linear Dynamical Model, ...) will depend on the value of z_n and the sufficient statistics T. In this work we consider that each unit is modeled by an HMM with a GMM for each state's emission, but it can be replaced by any model satisfying Eq. 1 and Eq. 2. Previous works [2, 3, 6] use special cases of this model to perform AUD.¹ More precisely, one can understand AUD as finding a set of vectors η_{u_1}, ..., η_{u_P} such that the likelihood of the observations is maximized. This search is difficult because speech recordings encode many factors other than the phonetic information (speaker identity, emotions, environment, ...) and the AUD algorithm may maximize the likelihood while modeling non-phonetic information.

¹These algorithms also learn the number of acoustic units P needed to fit the data.

Figure 1: (a) Directed Acyclic Graph of a Generalized Subspace Model. Dashed lines represent deterministic relationships between variables. The SHMM, JFA and SGMM are special cases of this model. In this work, each embedding h_u encodes the parameters of one HMM corresponding to an acoustic unit. (b) Illustration of the subspace model for acoustic units. Each point of the plane corresponds to the parameters of an acoustic unit model and the blue line represents the subspace defined by f(W^T h + b). Given an acoustic unit model corresponding to the sound aa, moving its parameters along the subspace will change the model to represent another unit/phone (ow, z in this example). Conversely, moving the parameters away from the phonetic subspace will push the model to capture non-phonetic information (for instance, speaker gender).

To prevent the AUD model from capturing non-phonetic information, we propose the Subspace HMM (SHMM), which constrains the parameters of the acoustic units to live in the phonetic subspace. This model extends the unsupervised HMM by assuming that the phonetic information of a language is contained in a subspace of the total parameter space. Formally, it is defined as:

  η_u = f(W^T h_u + b)    (3)

where f : R^D → H is a differentiable function. We further refine this subspace model by introducing a prior over the subspace's parameters:

  W_{r,c} ∼ N(0, σ²_{W_{r,c}})    (4)
  b ∼ N(0, I)    (5)
  h_u ∼ N(0, I)    (6)

As depicted in Fig. 1b, the bases of W span the subspace containing the phonetic variability. Since the parameters of the acoustic units are constrained to live in a low-dimensional subspace, the AUD algorithm can be seen as finding the set of embeddings h_{u_1}, ..., h_{u_P} which maximizes the likelihood of the observations. By constraining the search to the phonetic subspace, we therefore force the algorithm to ignore non-phonetic sources of variability. Note that the Subspace Gaussian Mixture Model [9], Joint Factor Analysis [10], the Subspace Multinomial Model [11], etc. are special cases of Eq. 3. In fact, Eq. 3 is the general form of any subspace model for which the complete likelihood is a member of the exponential family of distributions.
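As a concrete illustration of Eqs. 3-6, the sketch below draws the subspace parameters and unit embeddings from their priors and computes ψ_u = W^T h_u + b for every unit. The dimensions, the number of units and the prior scale σ_W are illustrative assumptions for this example, not values prescribed by the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: D-dim natural-parameter space, low-dim embeddings,
# P acoustic units (chosen for the example only).
D, latent_dim, P = 3861, 100, 50
sigma_W = 1.0                                        # assumed scale for Eq. 4

W = rng.normal(0.0, sigma_W, size=(latent_dim, D))   # W_{r,c} ~ N(0, sigma_W^2)
b = rng.standard_normal(D)                           # b ~ N(0, I)   (Eq. 5)
H = rng.standard_normal((P, latent_dim))             # h_u ~ N(0, I) (Eq. 6)

# psi_u = W^T h_u + b for all P units at once; applying the mapping f to each
# row then yields the natural parameters eta_u of Eq. 3.
psi = H @ W + b
assert psi.shape == (P, D)
```

Each row of `psi` lives on the (latent_dim)-dimensional affine subspace of R^D spanned by the rows of W, which is exactly the constraint the SHMM imposes on its units.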
We denote Eq. 3 as the Generalized Subspace Model (GSM), of which the Subspace HMM, like the other aforementioned models, is just a special instance. The graphical representation of the GSM is depicted in Fig. 1a. To complete our definition of the SHMM, we need to specify the mapping f from R^D to the natural parameter space H. In our setting, each unit is modeled by an HMM with a 3-state left-to-right topology, and each state has a GMM emission with K Gaussian components with diagonal covariance matrices. For convenience, we introduce the vector ψ = W^T h + b, which can be decomposed into three parts ψ = (ψ_1, ψ_2, ψ_3)^T. ψ_i is the vector of parameters (before the mapping f) associated with the i-th HMM state. ψ_i further decomposes into ψ_i = (ψ^π_i, ψ^μ_{i,1}, ..., ψ^μ_{i,K}, ψ^Σ_{i,1}, ..., ψ^Σ_{i,K})^T, where ψ^π_i is the vector encoding the parameters of the mixture's weights, and ψ^μ_{i,j} and ψ^Σ_{i,j} are the vectors encoding the parameters of the mean and covariance matrix of the j-th Gaussian component, respectively. We set f such that:

  π_{i,j} = exp{ψ^{(π)}_{i,j}} / Σ_{k=1}^{K−1} exp{ψ^{(π)}_{i,k}}    (7)
  μ_{i,j} = ψ^{(μ)}_{i,j}    (8)
  Σ_{i,j} = diag(exp{ψ^{(Σ)}_{i,j}})    (9)

where exp is the elementwise exponential function. One could also include the transition probabilities of the HMM, but we kept them as fixed parameters in this work.

Unlike previous AUD algorithms, our model requires specifying the phonetic subspace (parameterized by W and b) before searching for the acoustic units. This is a "chicken or egg" problem, since we need the phonetic subspace to find the pseudo-phones of the language, and we need to know the phones of a language to estimate the subspace. However, this problem can be alleviated by observing that many languages in the world have common phones. It is reasonable to believe that the phonetic subspace of a language is well approximated by a phonetic subspace estimated from one or several other languages for which we have labeled data.
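The mapping f of Eqs. 7-9 can be sketched for a single state as follows. The layout of the slices (weights first, then means, then log-variances) and the realization of the K−1 free weight parameters via an appended zero logit are our illustrative assumptions; the paper fixes only the decomposition of ψ_i, not a storage order.

```python
import numpy as np

def f_state(psi_state, K, d):
    """Map one state's slice of psi = W^T h + b to GMM parameters (Eqs. 7-9):
    softmax over the weight parameters, identity for the means, elementwise
    exp for the covariance diagonal."""
    n_w = K - 1                                 # K-1 free weight parameters
    logits = np.append(psi_state[:n_w], 0.0)    # one common way to realize Eq. 7
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    mu = psi_state[n_w:n_w + K * d].reshape(K, d)           # Eq. 8
    var = np.exp(psi_state[n_w + K * d:].reshape(K, d))     # Eq. 9 (diagonal)
    return pi, mu, var

# Example with K=8 components and d=80 features: 7 + 640 + 640 = 1287 values
# per state, i.e. 3861 for a 3-state unit (the dimension quoted in Sec. 3).
psi_state = np.random.default_rng(0).normal(size=7 + 2 * 8 * 80)
pi, mu, var = f_state(psi_state, K=8, d=80)
assert abs(pi.sum() - 1.0) < 1e-9 and (var > 0).all()
```

The exp in Eq. 9 guarantees positive variances for any real-valued ψ, which is what makes the unconstrained subspace representation possible.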
Interestingly, this rationale naturally fits the Bayesian approach to the problem of AUD. Given an unlabeled set of observations X^(t) in a target language t, previous Bayesian AUD algorithms try to estimate the inventory of (pseudo-)phones U^(t) of the target language by estimating:

  p(U^(t) | X^(t)) = p(X^(t) | U^(t)) p(U^(t)) / p(X^(t))    (10)

If we now assume the phonetic subspace to be estimated from the observations X^(p) of another language p with a known inventory of phones U^(p), the problem can be reformulated as:

  p(U^(t) | X^(t), L^(p), S) = p(X^(t) | U^(t), L^(p), S) p(U^(t) | L^(p), S) / p(X^(t) | L^(p), S)    (11)
  L^(p) = {X^(p), U^(p)}    (12)
  S = {W, b}    (13)

The term p(U^(t) | L^(p), S) may be seen as an educated/informative prior which embeds the notion of phone into the AUD algorithm. This educated prior needs to be estimated as well, which leads to a two-step procedure for the SHMM AUD algorithm. First, given the labeled data of one or several languages, the prior over the acoustic units is estimated; informally speaking, we force the model to learn "what is a phone". Second, the unlabeled data of the target language is clustered into pseudo-phones given the phonetic knowledge acquired by the model during the first step.

The two steps of the training (learning the prior and clustering the units) are carried out by optimizing the same objective function, except that when estimating the prior, the acoustic unit transcription of each utterance is known. The presence or absence of the transcription is reflected in p(v).
When there is no transcription, p(v) can be understood as a "pseudo-phone" loop (see [3] for details), and when the transcription is known, p(v) is just the inference graph used for forced alignment in a traditional HMM-based ASR system.

Since the estimation of the exact posterior of the model's parameters is intractable, we use the Variational Bayes (VB) objective function to find an approximate posterior:

  L[q] = ⟨ln p(X | Ξ, Θ)⟩_q − D_KL(q(Ξ, Θ) || p(Ξ, Θ))    (14)
  Ξ = {Z, v}    (15)
  Θ = {W, b, h_{u_1}, ..., h_{u_P}}    (16)

where ⟨...⟩_q denotes the expectation w.r.t. the distribution q and D_KL is the Kullback-Leibler divergence. Eq. 14 is not tractable for an arbitrary distribution q; we therefore consider the restricted set of distributions with the following mean-field factorization and parameterization:

  q(Ξ, Θ) = q(Ξ; φ) q(Θ; ζ)    (17)
  ζ = {m, λ}    (18)
  q(Θ; ζ) = N(m, diag(exp{λ}))    (19)

The parameters φ of the variational posterior over Ξ depend on the type of model of the acoustic unit. For the case of an HMM, this is the probability of being in a particular state given the sequence of observations. Under these restrictions, the optimization reduces to:

  φ*, ζ* = argmax_{φ,ζ} L(φ, ζ)    (20)

Since we assume each unit to be modeled by an HMM, φ* has an analytical solution which can be efficiently calculated using the forward-backward algorithm [12]. ζ* has no analytical solution but can be found through a stochastic gradient ascent scheme.
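Since the variational posterior of Eq. 19 is a diagonal Gaussian and the priors of Eqs. 5-6 are standard normal, the D_KL term of Eq. 14 has a closed form for those parameters. A minimal sketch (the function name is ours):

```python
import numpy as np

def kl_gauss_std(m, lam):
    """KL( N(m, diag(exp(lam))) || N(0, I) ) in closed form:
    0.5 * sum(exp(lam) + m^2 - 1 - lam), since lam is the log-variance."""
    return 0.5 * np.sum(np.exp(lam) + m**2 - 1.0 - lam)

# When the posterior equals the prior (m = 0, lam = 0), the KL vanishes.
assert kl_gauss_std(np.zeros(5), np.zeros(5)) == 0.0
```

Having this term in closed form means only the expected log-likelihood of Eq. 14 needs Monte-Carlo approximation.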
Noting that ∇_φ L_ζ(φ*) = 0, we have:

  ∇_ζ L(φ*, ζ) = ∇_ζ L_{φ*}(ζ) + ∇_ζ φ* ∇_φ L_ζ(φ*)    (21)
               = ∇_ζ L_{φ*}(ζ)    (22)

Finally, we approximate ∇_ζ L(φ*, ζ) ≈ ∇_ζ L'(φ*, ζ) by using the so-called re-parameterization trick introduced in [8]:

  ε_l ∼ N(0, I)    (23)
  Θ_l = m + diag(exp{λ}) ε_l    (24)
  L(φ, ζ) ≈ (1/L) Σ_{l=1}^{L} ln p(X | Ξ, Θ_l) − D_KL(q(Ξ, Θ) || p(Ξ, Θ)) = L'(φ, ζ)    (25, 26)

In practice, we use the ADAM optimizer [13] to update ζ, and we use L = 10 samples to compute the empirical expectation. The parameters φ are re-estimated after a fixed number of updates of ζ.
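The Monte-Carlo estimate of Eqs. 23-25 can be sketched as below. The `log_lik` argument is a hypothetical stand-in for the model's actual ln p(X | Ξ, Θ), and the scale exp{λ} follows Eq. 24 as written.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_loglik_mc(log_lik, m, lam, L=10):
    """Re-parameterized Monte-Carlo estimate of the expectation in Eq. 25:
    Theta_l = m + diag(exp(lam)) * eps_l, so gradients w.r.t. zeta = (m, lam)
    can flow through the samples (the KL term of Eq. 14 is omitted here)."""
    eps = rng.standard_normal((L,) + m.shape)       # Eq. 23: eps_l ~ N(0, I)
    thetas = m + np.exp(lam) * eps                  # Eq. 24
    return np.mean([log_lik(t) for t in thetas])    # Eq. 25

# Toy check with a quadratic stand-in for the log-likelihood.
val = expected_loglik_mc(lambda t: -np.sum(t**2), np.zeros(3), np.zeros(3), L=10)
assert np.isfinite(val) and val <= 0.0
```

An autodiff framework applied to this estimate would give the gradient of Eq. 22 that ADAM then consumes.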
3. Experiments
We conducted our experiments with the TIMIT [14] database and 3 languages from the GlobalPhone corpus [15]: German (GE), Polish (PO) and Spanish (SP). For each of the three GlobalPhone languages, we kept only 3000 randomly selected utterances. We used two sets of features: (i) MFCC features concatenated with their first and second derivatives, and (ii) Multi-Lingual BottleNeck (MBN) features trained on 17 Babel languages [16]. The set of languages used to train the MBN features does not include English, German, Polish or Spanish. Both sets of features were extracted at a rate of 100 Hz. For the MBN features, the audio signal was down-sampled to 8 kHz.

We evaluated the different AUD algorithms in terms of phonetic segmentation and equivalent Phone Error Rate (eq. PER) [17, 5]. For the phonetic segmentation, we used the standard recall, precision and F-score measured against the timings provided in the TIMIT database with the 61 original phones. We tolerated boundaries shifted by ±2 frames (20 milliseconds). To compute the eq. PER, we mapped each acoustic unit to the one of the 61 phones it overlaps with the most. Then, we reduced the reference and proposed transcriptions to the 39-phone set [18] and computed the PER.
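The unit-to-phone mapping used for the eq. PER can be sketched as a frame-level majority vote. The function name is ours, and the handling of frame alignment is simplified to equal-length per-frame label sequences.

```python
from collections import Counter

def map_units_to_phones(unit_frames, phone_frames):
    """Map each discovered acoustic unit to the reference phone it overlaps
    the most, counting overlap at the frame level."""
    overlap = {}
    for u, p in zip(unit_frames, phone_frames):
        overlap.setdefault(u, Counter())[p] += 1
    return {u: c.most_common(1)[0][0] for u, c in overlap.items()}

# Toy example: unit a1 overlaps "aa" on 3 frames, a2 overlaps "ow" on 2 of 3.
units  = ["a1", "a1", "a2", "a2", "a2", "a1"]
phones = ["aa", "aa", "ow", "ow", "ih", "aa"]
assert map_units_to_phones(units, phones) == {"a1": "aa", "a2": "ow"}
```

After this mapping, the relabeled unit transcription can be scored against the reference with a standard PER computation.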
First, we ran a controlled experiment to assess whether the SHMM is able to properly learn the phonetic subspace of a language. In this experiment, we used the MBN features and each HMM state had 8 Gaussian components. First, we trained a Bayesian HMM phone recognizer on the 48-phone set with a flat phonotactic language model on the traditional TIMIT training set (no SA* utterances) and decoded on the test set, mapping the phones to the 39-phone set. This phone recognizer achieved 36.4% Phone Error Rate (PER). This number is very high since we have removed crucial elements of the traditional ASR pipeline (language model, context-dependent phones, ...) in order to evaluate the quality of the acoustic model. For comparison, we trained a monophone system with a flat phonotactic language model using the Kaldi toolkit [9], which yielded 37.3% PER. We then trained an SHMM-based phone recognizer with varying subspace dimension, using the same training and testing setup as the baseline HMM. We used the baseline model to provide the first estimate of φ, which we modified so that all the Gaussian components within a state have equal responsibility. We pre-trained the subspace for 15000 updates before updating φ; then we re-estimated φ after every 1000 updates of ζ for 30 iterations. Results, shown in Fig. 2, indicate that the SHMM is perfectly able to learn the phonetic subspace of a language by compressing the 3861-dimensional parameter space² into a subspace as small as 30 dimensions while still achieving the same PER as the HMM baseline.

²3861 = 3 states × (8 Gaussians × (80 + 80) + 7), where 80 is the feature dimension, accounting for the mean and the diagonal of the covariance matrix, and 7 is the dimension of the per-state mixture weights.

Figure 2: PER of the SHMM for varying subspace dimension.

We now consider the case of unsupervised learning of speech where English is assumed to be a low-resourced language. In this setup, we use the complete TIMIT set (training, development and test sets, including the SA* utterances) as the corpus from which to extract acoustic units. In this experiment, all the HMMs/SHMMs have 4 Gaussian components per state. Our baselines are the HMM-based AUD system described in [3] and the VAE-(B)HMM-based AUD systems proposed in [5, 6]. We compare these baselines with 3 SHMM-based AUD models for which the posterior of the phonetic subspace q(W, b) was estimated using: (i) German, (ii) German and Polish, (iii) German, Polish and Spanish. For each case, the phonetic subspace had 35, 70 and 100 dimensions, respectively. Note that the choice of the languages and the order of combination was arbitrary, and it is likely that choosing languages closely related to the target language would be beneficial. We considered all the phones of all the languages to be unique and did not merge them while estimating the subspace. The posteriors of the embeddings q(h_u) corresponding to the German, Polish and Spanish phones were discarded before the AUD clustering.

Table 1: Comparison of the SHMM against other AUD models in terms of phonetic segmentation (Recall, Precision, F-score) and equivalent Phone Error Rate (%).

Model         Features             Prior Lang.   Recall   Precision   F-score   eq. PER
HMM [5]       MFCC+Δ+ΔΔ            None          -        -           -         65.4
VAE-HMM [5]   MFCC+Δ+ΔΔ            None          -        -           -         58.9
VAE-BHMM [6]  log-mel FBANK+Δ+ΔΔ   None          -        -           -         56.57
HMM           MFCC+Δ+ΔΔ            None          66.47    57.81       61.84     64.92
HMM           MBN                  None          63.98    54.21       58.69     68.25
SHMM          MFCC+Δ+ΔΔ            GE            -        -           -         -
SHMM          MFCC+Δ+ΔΔ            GE+PO         73.94    74.47       74.20     58.23
SHMM          MFCC+Δ+ΔΔ            GE+PO+SP      75.03    74.00       74.51     56.91
SHMM          MBN                  GE            56.57    69.34       62.31     55.14
SHMM          MBN                  GE+PO         59.18    69.12       63.76     54.1
SHMM          MBN                  GE+PO+SP      60.89    68.41       64.43     49.2

The results are presented in Table 1 and differ significantly depending on the input features. The SHMM always benefits from learning the phonetic subspace in terms of eq. PER. Interestingly, the baseline HMM fails to benefit from the MBN features, as it underperforms compared to the HMM trained on MFCC features.
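The 3861-dimensional figure follows directly from the per-state parameter layout (K−1 mixture weights plus a mean and a covariance diagonal for each of the K Gaussians); a quick arithmetic check:

```python
def hmm_param_dim(n_states=3, n_comp=8, feat_dim=80):
    """Dimension of the natural-parameter space for one unit: per state,
    (K-1) mixture-weight parameters plus K means and K covariance diagonals
    of feat_dim values each."""
    per_state = (n_comp - 1) + 2 * n_comp * feat_dim
    return n_states * per_state

assert hmm_param_dim() == 3861  # 3 x (7 + 2 x 8 x 80)
```

Compressing this space to roughly 30-100 dimensions is what makes the embedding search of Sec. 2 tractable.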
The SHMM, thanks to its subspace, learns from the other languages to fully exploit the discriminatively trained features. Regarding the segmentation evaluation, the SHMM segments the speech better than the simple HMM. However, we observe that using more than one language does not necessarily improve the segmentation. Also, contrary to the eq. PER, the MBN features do not seem to be ideal for obtaining an accurate segmentation.

Finally, we tried to label the TIMIT corpus with an HMM phone recognizer (MBN features) trained on German; on German and Polish; and on German, Polish and Spanish, and we interpreted the output phones as acoustic units. For these 3 models, the eq. PER was 61.22% (GE), 66.47% (GE+PO) and 71.96% (GE+PO+SP). Contrary to the SHMM, this naive approach does not benefit from having more languages.
4. Conclusions
We proposed a new model for AUD: the Subspace HMM. Unlike other AUD models, the SHMM is first trained in a supervised fashion on one or several languages to learn the notion of "phone". This phonetic knowledge is encoded in a non-linear subspace of the total parameter space. Then, the SHMM searches for a set of acoustic units in this subspace which maximizes the likelihood of the observations of the target language. The SHMM outperforms HMM-based AUD and is competitive with the VAE-HMM. When using discriminatively trained features, the SHMM achieves 49.2% equivalent PER on TIMIT without any supervision in the target language.
5. Acknowledgements
The work was supported by Czech National Science Foundation (GACR) project "NEUREM3" No. 19-26934X, Czech Ministry of Interior project No. VI20152020025 "DRAPAK", and Czech Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project "IT4Innovations excellence in science - LQ1602". This work was also supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA) MATERIAL program, via Air Force Research Laboratory (AFRL) contract.

6. References

[1] J. R. Glass, "Towards unsupervised speech processing," in ISSPA. IEEE, 2012, pp. 1-4.
[2] C. Lee and J. R. Glass, "A nonparametric Bayesian approach to acoustic model discovery," in ACL (1). The Association for Computer Linguistics, 2012, pp. 40-49.
[3] L. Ondel, L. Burget, and J. Černocký, "Variational inference for acoustic unit discovery," in Procedia Computer Science, vol. 81. Elsevier Science, 2016, pp. 80-86.
[4] L. Ondel, L. Burget, J. Černocký, and S. Kesiraju, "Bayesian phonotactic language model for acoustic unit discovery," in Proceedings of ICASSP 2017. IEEE Signal Processing Society, 2017, pp. 5750-5754.
[5] J. Ebbers, J. Heymann, L. Drude, T. Glarner, R. Haeb-Umbach, and B. Raj, "Hidden Markov model variational autoencoder for acoustic unit discovery," in Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017.
[6] T. Glarner, P. Hanebrink, J. Ebbers, and R. Haeb-Umbach, "Full Bayesian hidden Markov model variational autoencoder for acoustic unit discovery," in Interspeech. ISCA, 2018, pp. 2688-2692.
[7] L. Ondel, P. Godard, L. Besacier, E. Larsen, M. Hasegawa-Johnson, O. Scharenborg, E. Dupoux, L. Burget, F. Yvon, and S. Khudanpur, "Bayesian models for unit discovery on a very low resource language," in IEEE International Conference on Acoustics, Speech and Signal Processing, ser. ICASSP, Calgary, Canada, 2018. [Online]. Available: sources/Ondel18bayesian.pdf
[8] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in ICLR, 2014.
[9] D. Povey, A. Ghoshal, G. Boulianne, N. Goel, M. Hannemann, Y. Qian, P. Schwarz, and G. Stemmer, "The Kaldi speech recognition toolkit," in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, 2011.
[10] P. Kenny, "Joint factor analysis of speaker and session variability: Theory and algorithms," Tech. Rep., 2005.
[11] S. Kesiraju, L. Burget, I. Szőke, and J. Černocký, "Learning document representations using subspace multinomial model," in Proceedings of Interspeech 2016. International Speech Communication Association, 2016, pp. 700-704.
[12] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," in Proceedings of the IEEE, 1989, pp. 257-286.
[13] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in ICLR, 2015.
[14] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, "DARPA TIMIT acoustic phonetic continuous speech corpus CDROM," 1993.
[15] T. Schultz, N. T. Vu, and T. Schlippe, "GlobalPhone: A multilingual text & speech database in 20 languages," in ICASSP. IEEE, 2013, pp. 8126-8130.
[16] R. Fér, P. Matějka, F. Grézl, O. Plchot, K. Veselý, and J. H. Černocký, "Multilingually trained bottleneck features in spoken language recognition," Computer Speech & Language, vol. 46, pp. 252-267, 2017.
[17] H. Kamper, A. Jansen, and S. Goldwater, "A segmental framework for fully-unsupervised large-vocabulary speech recognition,"