Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder
Hyun-Wook Yoon, Sang-Hoon Lee, Hyeong-Rae Noh, Seong-Whan Lee
Department of Computer and Radio Communications Engineering, Korea University, Seoul, Korea
Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
Department of Artificial Intelligence, Korea University, Seoul, Korea
{hw_yoon, sh_lee, hr_noh, sw.lee}@korea.ac.kr

Abstract
In recent works, flow-based neural vocoders have shown significant improvement in real-time speech generation tasks. A sequence of invertible flow operations allows the model to convert samples from a simple distribution into audio samples. However, training a continuous density model on discrete audio data can degrade model performance due to the topological difference between the latent and the actual distribution. To resolve this problem, we propose audio dequantization methods for flow-based neural vocoders for high fidelity audio generation. Data dequantization is a well-known method in image generation but has not yet been studied in the audio domain. For this reason, we implement various audio dequantization methods in a flow-based neural vocoder and investigate their effect on the generated audio. We conduct various objective performance assessments and a subjective evaluation to show that audio dequantization can improve audio generation quality. In our experiments, using audio dequantization produces waveform audio with a better harmonic structure and fewer digital artifacts.
Index Terms: audio synthesis, neural vocoder, flow-based generative models, data dequantization, deep learning
1. Introduction
Most speech synthesis models take a two-stage procedure to generate waveform audio from text. The first stage generates a spectrogram conditioned on linguistic features such as text or phonemes [1–5]. The second stage, generally referred to as the vocoder stage, generates audio samples through a model capable of estimating them from the acoustic features. Traditional approaches estimated audio samples either directly from a spectral density model [6] or from a hand-crafted acoustic model [7, 8], but these approaches tended to produce low-quality audio. After the emergence of WaveNet [9], models that generate each audio sample conditioned on previously generated samples have shown exceptional results in the field [10–12]. Nevertheless, the dilated causal convolution networks used in these models require a sequential generation process during inference, which means that real-time speech synthesis is hard to achieve because parallel inference cannot be utilized. For this reason, generating high-quality waveform audio in real time has become a challenging task. To overcome the structural limitation of the autoregressive model, most recent works have focused on
non-autoregressive models such as knowledge distillation [13, 14], generative adversarial networks [15–19], and flow-based generative models [20, 21]. We focus on the flow-based generative model since it can model a highly flexible approximate posterior distribution in variational inference [22]. The transformation from a single data point to Gaussian noise is one-to-one, which makes parallel generation possible. However, we have to acknowledge that audio samples are discrete data. In other words, naively modeling a continuous probability density on discrete data can produce arbitrarily high likelihood at the discrete locations [23, 24]. This can lead to degraded generation performance in a flow-based neural vocoder. Therefore, dequantization is required before the transformation.

In this paper, we present various audio dequantization schemes that can be implemented in a flow-based neural vocoder. In image generation, adding continuous noise to data points to dequantize the data is commonly used. However, to the best of our knowledge, the effectiveness of data dequantization in the audio domain is still unexplored, so further investigation is needed. Unlike the pixels of an image, audio samples are bounded to signed integers. To overcome this domain issue, we normalize either the range of the noise values or the range of the audio samples with different normalization methods. In addition, we adapt the flow block from a flow-based neural vocoder to generate more flexible noise, known as variational dequantization [25].

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Department of Artificial Intelligence, Korea University), the Magellan Division of Netmarble Corporation, and the Seoul R&BD Program (CY190019).
2. Flow-based Neural Vocoder
FloWaveNet [21] and WaveGlow [20] are two pioneers among flow-based neural vocoders. Both models are based on normalizing flows [22]. Two main contributions that they share are training simplicity and faster generation. Since they use a single invertible flow network repeatedly, the model structure is intuitive. Moreover, optimization can be easily done with a single log-likelihood loss function. During inference, random noise whose length equals the product of the number of mel-spectrogram frames and the hop size is sampled from a spherical Gaussian distribution and simultaneously converted to audio samples. As a result, flow-based neural vocoders can produce waveform signals as fast as other non-autoregressive models.

In general, a flow-based neural vocoder requires three steps: squeeze, flow, and shuffle. In the squeezing step, the temporal dimension of the feature is reduced while its channel dimension is increased. According to FloWaveNet [21], this operation increases the size of the receptive field, like the dilated convolution layers of WaveNet [9]. During the flow step, multiple blocks of flow apply an affine transformation to half of the input vector. In detail, one half of the input vector is used to predict shift and scale parameters for the other half in each flow operation.
Lastly, the shuffle step mixes the elements of the data, giving flexibility to the transformation.

Although both models share similar concepts, they use different techniques in detail. FloWaveNet [21] defines one squeeze operation and multiple flow operations as a single context block and duplicates this context block to form the flow process. The model also implements the activation normalization layer suggested in Glow [26] before the coupling layer to stabilize training. After each flow, the model simply swaps the odd and even elements of the vector to shuffle features. WaveGlow [20] operates the squeezing step only once during the process. Also, the model adapts the invertible 1x1 convolution from Glow to operate the shuffle step before each flow operation.

Figure 1: Examples of audio dequantization for flow-based neural vocoder. (a) represents uniform dequantization. (b) represents Gaussian dequantization. (c) represents variational dequantization.
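To make the squeeze and flow steps concrete, below is a minimal PyTorch sketch of a squeeze operation and an affine coupling step. The plain convolutional conditioner, the hidden size, and the omission of mel-spectrogram conditioning are our simplifying assumptions, not the architecture used in FloWaveNet or WaveGlow:

```python
import torch
import torch.nn as nn

def squeeze(x: torch.Tensor, factor: int = 2) -> torch.Tensor:
    # Trade time for channels: [B, C, T] -> [B, C * factor, T // factor].
    b, c, t = x.size()  # T must be divisible by factor
    x = x.view(b, c, t // factor, factor)
    return x.permute(0, 1, 3, 2).contiguous().view(b, c * factor, t // factor)

class AffineCoupling(nn.Module):
    # One flow step: half of the channels predicts shift/scale for the other half.
    def __init__(self, channels: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(  # stand-in for the WaveNet-like conditioner
            nn.Conv1d(channels // 2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, channels, 3, padding=1),
        )

    def forward(self, x):
        xa, xb = x.chunk(2, dim=1)
        log_s, t = self.net(xa).chunk(2, dim=1)
        yb = xb * torch.exp(log_s) + t           # affine transformation
        logdet = log_s.sum(dim=(1, 2))           # enters the log-likelihood loss
        return torch.cat([xa, yb], dim=1), logdet

    def inverse(self, y):
        ya, yb = y.chunk(2, dim=1)
        log_s, t = self.net(ya).chunk(2, dim=1)
        xb = (yb - t) * torch.exp(-log_s)        # exact inverse enables parallel synthesis
        return torch.cat([ya, xb], dim=1)
```

Because every step is exactly invertible, sampling noise and running `inverse` through the stacked flows yields audio in a single parallel pass.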
3. Audio Dequantization
Raw audio is stored digitally in a computer. In other words, the values of the audio form a discrete representation. Therefore, naively transforming audio samples into Gaussian noise can lead to arbitrarily high likelihood at the data values in a flow-based neural vocoder. To resolve this issue, we adapt the idea of adding noise to each data point to dequantize discretely distributed data from the image generation task [23]. In an image, a pixel $x$ is represented as a single discrete value in $\{0, 1, \ldots, 255\}$, so the dequantized data $y$ can be formulated as $y = x + u$, where $u$ represents the $D$ components of noise bounded to $[0, 1)^D$.

Unlike images, raw audio encoded in 16-bit WAV contains 15 bits of negative and positive integer values, which can be represented as $\{-32768, -32767, \ldots, 32766, 32767\}$. To apply data dequantization to raw audio, either the range of the audio samples must be compressed to 8-bit unsigned integers, or the range of the dequantized data has to lie within $(-1, 1)^D$. For proper audio dequantization, we present three different methods in the following sections.

3.1. Uniform Dequantization

In [23], the authors note that optimizing the continuous model $p_{model}(y)$ on the dequantized data $y \sim p_{data}$ can closely optimize the discrete model $P_{model}(x)$ on the original data $x \sim P_{data}$ through Jensen's inequality, which can be formulated as below:

$\int p_{data}(y) \log p_{model}(y) \, dy$  (1)
$= \sum_{x} P_{data}(x) \int_{[0,1)^D} \log p_{model}(x + u) \, du$  (2)
$\leq \sum_{x} P_{data}(x) \log \int_{[0,1)^D} p_{model}(x + u) \, du$  (3)
$= \mathbb{E}_{x \sim P_{data}}[\log P_{model}(x)]$  (4)

Since the uniform noise is bounded to $[0, 1)^D$, the values of the audio samples have to be bounded to unsigned integers. For this purpose, we preprocess the raw audio with the nonlinear companding method called 'mu-law companding' [27]. This method can significantly reduce the range of audio samples while minimizing the quantization error. The redistribution equation of the method can be expressed as:

$\hat{x} = \mathrm{sign}(x) \, \frac{\ln(1 + \mu|x|)}{\ln(1 + \mu)}$  (5)

where the sign function represents the sign of the value $x$, and $\mu$ represents the number of integers that the $x$ values are mapped to. We set $\mu$ to 255 to apply 8-bit mu-law companding. Then, we add random noise drawn from the uniform distribution $\mathrm{Unif}(0, 1)$.

We assume that companding audio with a lossy compression can produce noisy output. Therefore, we implement the importance-weighted (iw) dequantization proposed in [24] to improve generation quality. In that paper, the authors demonstrate that sampling the noise multiple times can directly approximate the objective log-likelihood, which can lead to better log-likelihood performance. As a result, we define the uniformly dequantized data $y_u$ as:

$y_u = \hat{x} + \frac{1}{K} \sum_{k=1}^{K} \mathrm{Unif}(0, 1)_k$  (6)

where $\mathrm{Unif}(0, 1)_k$ denotes noise sampled uniformly from $[0, 1)^D$. We set $K$ to 10. To compare the performance of the model depending on iw dequantization, we refer to uniform dequantization with iw dequantization as Uniform IW and to the model with only uniform dequantization as Uniform.
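As a rough illustration of Eqs. (5) and (6), the following NumPy sketch applies mu-law companding followed by uniform or importance-weighted dequantization. The quantization of the companded signal into integer bins is our assumption about the preprocessing, not a detail stated above:

```python
import numpy as np

MU = 255  # 8-bit mu-law

def mu_law_compand(x, mu=MU):
    # Eq. (5): compress audio in [-1, 1], then map to unsigned 8-bit bins (assumed step).
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # y in [-1, 1]
    return np.floor((y + 1) / 2 * mu).astype(np.int64)        # bins {0, ..., 255}

def uniform_dequantize(x_hat, K=1):
    # Eq. (6): K = 1 gives plain uniform dequantization, K = 10 the iw variant.
    noise = np.random.uniform(0.0, 1.0, size=(K,) + x_hat.shape).mean(axis=0)
    return x_hat + noise

# Example: dequantize one second of a 440 Hz sine at 22,050 Hz.
t = np.linspace(0, 1, 22050, endpoint=False)
y_u = uniform_dequantize(mu_law_compand(np.sin(2 * np.pi * 440 * t)), K=10)
```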
3.2. Gaussian Dequantization

In a flow-based neural vocoder, the discrete data distribution is transformed into a spherical Gaussian distribution. In other words, dequantizing the data distribution toward a normal distribution can be the more natural choice. With this thought in mind, we formulate Gaussian dequantization, motivated by logistic-normal distributions [28]. Random noise samples are generated from a normal distribution $N(\mu, \sigma)$, where the mean and variance are calculated from the given data batch. To properly implement this in the audio domain, we apply a hyperbolic tangent function to normalize the noise boundary to $(-1, 1)^D$. As a result, the normally dequantized data $y_n$ can be formulated as:

$y_n = x + \tanh(N(M(x_b), \Sigma(x_b)))$  (7)

where $M(x_b)$ and $\Sigma(x_b)$ represent the mean and variance of the batch group $x_b$. To compare model performance between the conventional and the improved method, we refer to the conventional method suggested in [28] as Gaussian Sig and to the proposed method as Gaussian Tanh.
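A minimal sketch of Eq. (7), assuming $x$ is a batch of integer-valued samples and that the statistics are taken over the whole batch; the sigmoid-based normalization of the conventional Gaussian Sig variant is not reproduced here:

```python
import torch

def gaussian_tanh_dequantize(x: torch.Tensor) -> torch.Tensor:
    # Eq. (7): the noise follows the batch statistics, and tanh bounds it
    # to (-1, 1), i.e., within one quantization step of integer-valued samples.
    mu, sigma = x.float().mean(), x.float().std()
    noise = torch.tanh(mu + sigma * torch.randn_like(x.float()))
    return x.float() + noise
```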
3.3. Variational Dequantization

Instead of adding noise from a known distribution, the noise distribution can be formulated through a neural network such as a flow-based network. Flow++ [25] suggests that if the noise samples $u$ are generated from a conditional probability model $q(u|x)$, the probability distribution of the original data can be estimated as follows:

$P_{model}(x) := \int_{[0,1)^D} q(u|x) \frac{p_{model}(x + u)}{q(u|x)} \, du$  (8)

Then, we can obtain the variational lower bound on the log-likelihood function by applying Jensen's inequality as below:

$\mathbb{E}_{x \sim P_{data}}[\log P_{model}(x)]$  (9)
$= \mathbb{E}_{x \sim P_{data}}\left[\log \int_{[0,1)^D} q(u|x) \frac{p_{model}(x + u)}{q(u|x)} \, du\right]$  (10)
$\geq \mathbb{E}_{x \sim P_{data}}\left[\int_{[0,1)^D} q(u|x) \log \frac{p_{model}(x + u)}{q(u|x)} \, du\right]$  (11)
$= \mathbb{E}_{x \sim P_{data}} \mathbb{E}_{u \sim q(u|x)}\left[\log \frac{p_{model}(x + u)}{q(u|x)}\right]$  (12)

As a result, the dequantized data $y_p$ from variational dequantization can be defined as:

$y_p = x + q_x(\epsilon)$  (13)

where $\epsilon \sim p(\epsilon) = N(\epsilon; 0, I)$.

To implement variational dequantization in a flow-based neural vocoder, we modify the flow model from FloWaveNet [21]. We set the initial input as a 1-dimensional noise vector generated from the spherical Gaussian distribution $N(\epsilon; 0, I)$, whose length is equal to that of the target audio. In each context block, a single squeeze step and multiple flow steps are operated. In each flow step, an affine transformation conditioned on the target audio is applied to half of the squeezed vector. At the end, the vector is flattened, and a hyperbolic tangent function is applied to fit the range of the audio domain. The negative log-likelihood of the dequantizer is trained jointly with the flow-based neural vocoder.
We set a total of 16 flow stacks as Flow Shallow and 48 flow stacks as Flow Dense to examine whether the depth of the dequantization model is critical to model performance.
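The joint objective in Eqs. (9)-(12) can be estimated with a single Monte-Carlo sample per datum. The sketch below assumes two hypothetical interfaces of our own naming, not the authors': `dequantizer`, which maps Gaussian noise conditioned on the audio to $u$ together with $\log q(u|x)$, and `vocoder_flow`, which returns $\log p_{model}(y)$:

```python
import torch

def variational_dequant_loss(x, dequantizer, vocoder_flow):
    # One-sample Monte-Carlo estimate of the lower bound in Eq. (12).
    eps = torch.randn_like(x)             # eps ~ N(0, I)
    u, log_q = dequantizer(eps, cond=x)   # u = q_x(eps), tanh-bounded; Eq. (13)
    y = x + u                             # dequantized audio
    log_p = vocoder_flow(y)               # flow log-likelihood of y
    return -(log_p - log_q).mean()        # negate the bound to minimize
```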
Table 1: Mean opinion score (MOS) results with 95% confidence intervals on 150 randomly selected sentences in the test set.

Methods         MOS ± 95% CI
Ground Truth    4.489 ±
Baseline [21]   ±
Uniform IW      ±
Gaussian Tanh   ±
Flow Dense      ±
4. Experimental Results and Analysis
We set FloWaveNet [21] as our baseline model and trained the baseline with 6 different dequantization methods. Each model was trained on the VCTK Corpus [29], an English dataset containing 109 native speakers. Since FloWaveNet and WaveGlow [20] were evaluated with only a single-speaker dataset, we expanded the experiment to the multi-speaker case, where audio generation is much harder due to the larger variation among different speakers. From the dataset, we withdrew some corrupted audio files and used 44,070 audio clips. For each speaker, 70% of the data was used as training data, 20% as validation data, and the rest as test data. All clips were down-sampled from 48,000 Hz to 22,050 Hz. From each audio clip, chunks of 16,000 samples were randomly extracted.

All models were trained on 4 Nvidia Titan Xp GPUs with a batch size of 8. We used the Adam optimizer with a step size of 1 × 10⁻³ and decayed the learning rate by a factor of 0.5 every 200K iterations. We trained each model for 600K iterations.

For subjective evaluation, we conducted a 5-scale MOS test on Amazon Mechanical Turk. Each participant was instructed to wear either earbuds or headphones for eligible testing. They then had to listen to 5 audio clips at least twice and rate the naturalness of the audio on a scale of 1 to 5 with 0.5-point increments. We explicitly instructed the participants to focus on the quality of the audio. We collected approximately 3,000 samples for the evaluation. In Table 1, the models implementing audio dequantization show higher MOS than the baseline model, which shows that audio dequantization can improve audio quality. Except for the real audio, variational dequantization with a deeper layer receives the highest MOS. This shows that injecting noise with a more complex distribution can produce more natural audio.

Audio generated from the baseline model tended to have digital artifacts such as reverberation, a trembling sound, and periodic noise. We assumed that the occurrence of these artifacts was due to the unnatural collapsing of the continuous density model on discrete data points. To verify that audio dequantization can remove such artifacts, we conducted several quantitative signal-processing evaluations to compare audio quality. First, we randomly selected 400 sentences from the test set. Then, we computed the mel-cepstral distortion (MCD) [30], the global signal-to-noise ratio (GSNR) [31], the segmental signal-to-noise ratio (SSNR) [11], and the root mean square error of the fundamental frequency (RMSE_f0) [11]. All equations for the evaluation are given below (Eqs. 14-17).

Figure 2: Mel-spectrogram converted from audio samples generated by the baseline [21] and the proposed dequantization models. The blue bounding box indicates the area where periodic noise appears and harmonic frequencies are presented.
Table 2: MCD (dB) results with 95% confidence intervals.

Methods         MCD ± 95% CI
Baseline [21]   3.455 ±
Uniform         ±
Uniform IW      ±
Gaussian Sig    ±
Gaussian Tanh   ±
Flow Shallow    ±
Flow Dense      ±

$\mathrm{MCD\,[dB]} = \frac{1}{T} \sum_{t=0}^{T-1} \sqrt{\sum_{k=1}^{K} (c_{t,k} - c'_{t,k})^2}$  (14)

$\mathrm{GSNR\,[dB]} = 10 \log_{10} \frac{\sigma_s^2}{\sigma_r^2}$  (15)

$\mathrm{SSNR\,[dB]} = 10 \log_{10} \left(\frac{\sum_{n=0}^{M} x_s(n)^2}{\sum_{n=0}^{M} (x_s(n) - y_r(n))^2}\right)$  (16)

$\mathrm{RMSE}_{f0}\,[\mathrm{cent}] = 1200 \sqrt{(\log_2 F_r - \log_2 F_s)^2}$  (17)

where $c_{t,k}$ and $c'_{t,k}$ represent the original and synthesized $k$-th mel-frequency cepstral coefficients (MFCCs) of the $t$-th frame, $\sigma_s^2$ and $\sigma_r^2$ represent the power of the speech signal and of the noise, $x_s(n)$ and $y_r(n)$ represent the raw and synthesized waveform samples at time $n$, and $F_r$ and $F_s$ represent the fundamental frequencies of the raw and synthesized waveforms.
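For reference, here is a NumPy sketch of Eqs. (14)-(17) exactly as written above; frame alignment, MFCC extraction, and voiced-frame selection are assumed to be handled by the caller:

```python
import numpy as np

def mcd_db(c_ref: np.ndarray, c_syn: np.ndarray) -> float:
    # Eq. (14): c_* are [T, K] arrays of per-frame MFCCs.
    return float(np.mean(np.sqrt(np.sum((c_ref - c_syn) ** 2, axis=1))))

def gsnr_db(signal_power: float, noise_power: float) -> float:
    # Eq. (15): global signal power over noise power.
    return 10 * np.log10(signal_power / noise_power)

def ssnr_db(x_ref: np.ndarray, y_syn: np.ndarray) -> float:
    # Eq. (16), one segment: signal power over residual power.
    return 10 * np.log10(np.sum(x_ref ** 2) / np.sum((x_ref - y_syn) ** 2))

def f0_rmse_cents(f0_ref: np.ndarray, f0_syn: np.ndarray) -> float:
    # Eq. (17): pitch error in cents, root-mean-squared over voiced frames.
    d = 1200 * (np.log2(f0_ref) - np.log2(f0_syn))
    return float(np.sqrt(np.mean(d ** 2)))
```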
In MCD, we compared all models, including Uniform and Gaussian Sig, to see the improvement brought by our modifications. In Table 2, the dequantizations with modified methods show better performance than the conventional methods. Gaussian Tanh and Flow Dense score lower MCD than the baseline, which shows that both models can produce better audio quality than the baseline model. There was no significant performance difference between the two variational dequantization methods. Uniform IW shows slightly higher MCD than the baseline because of the remaining audible noise generated by mu-law companding.
Table 3: GSNR (dB), SSNR (dB), and RMSE_f0 (Hz) results. Higher is better for the SNRs, and lower is better for RMSE_f0.

Methods         GSNR     SSNR     RMSE_f0
Baseline [21]   -2.127   -2.284   44.881
Uniform IW      -1.902   -1.990   38.359
Gaussian Tanh   -2.112   -2.186   37.208
Flow Dense      -2.048   -2.141   44.066

Table 3 presents the results for SNR and RMSE_f0. All proposed methods show higher SNR than the baseline, indicating that audio dequantization can help reduce noise. In addition, the Uniform IW and Gaussian Tanh dequantizations show better performance in modeling the fundamental frequency, while the Flow Dense dequantization shows a result comparable to the baseline model. We also visualized test outputs for qualitative evaluation in Figure 2. Figures 2(f) and 2(h) show clearer harmonic structures than Figure 2(b). Although Figure 2(d) shows less clear harmonic structures than the other approaches, we can see that the periodic noise is reduced significantly. We provide audio results on our online demo webpage.
5. Conclusions
In this paper, we proposed various audio dequantization schemes that can be implemented in a flow-based neural vocoder. For uniform dequantization, we compressed the range of the audio domain to match the conventional uniform dequantization method by using mu-law companding. In addition, we implemented iw dequantization to resolve the noise issue that arises from the lossy compression. For Gaussian dequantization, we applied hyperbolic tangent normalization to data-oriented Gaussian noise to properly fit the data within the audio range. Lastly, we modified the flow block in the flow-based neural vocoder to construct a variational dequantization model that applies more flexible noise. From the experiments, we demonstrate that implementing audio dequantization helps the flow-based neural vocoder produce better audio quality with fewer artifacts.

Audio samples are available at https://claudin92.github.io/deqflow_webdemo/.

6. References

[1] Y. Jia, Y. Zhang, R. Weiss, Q. Wang, J. Shen, F. Ren, P. Nguyen, R. Pang, I. L. Moreno, Y. Wu et al., "Transfer learning from speaker verification to multispeaker text-to-speech synthesis," in Advances in Neural Information Processing Systems, 2018, pp. 4480-4490.
[2] Y. Wang, R. Skerry-Ryan, D. Stanton, Y. Wu, R. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, Q. Le, Y. Agiomyrgiannakis, R. Clark, and R. Saurous, "Tacotron: Towards end-to-end speech synthesis," in Interspeech, 2017, pp. 4006-4010.
[3] J. Park, K. Han, Y. Jeong, and S. W. Lee, "Phonemic-level duration control using attention alignment for natural speech synthesis," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2019, pp. 5896-5900.
[4] Y. Taigman, L. Wolf, A. Polyak, and E. Nachmani, "VoiceLoop: Voice fitting and synthesis via a phonological loop," in International Conference on Learning Representations, 2018.
[5] J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerrv-Ryan et al., "Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2018, pp. 4779-4783.
[6] D. Griffin and J. Lim, "Signal estimation from modified short-time Fourier transform," IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984, pp. 236-243.
[7] H. Kawahara, "STRAIGHT, exploitation of the other aspect of vocoder: Perceptually isomorphic decomposition of speech sounds," Acoustical Science and Technology, vol. 27, no. 6, pp. 349-353, 2006.
[8] M. Morise, F. Yokomori, and K. Ozawa, "WORLD: A vocoder-based high-quality speech synthesis system for real-time applications," IEICE Transactions on Information and Systems, vol. 99, no. 7, pp. 1877-1884, 2016.
[9] A. v. d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," arXiv preprint arXiv:1609.03499, 2016.
[10] A. Tamamori, T. Hayashi, K. Kobayashi, K. Takeda, and T. Toda, "Speaker-dependent WaveNet vocoder," in Interspeech, 2017, pp. 1118-1122.
[11] T. Hayashi, A. Tamamori, K. Kobayashi, K. Takeda, and T. Toda, "An investigation of multi-speaker training for WaveNet vocoder," in IEEE Automatic Speech Recognition and Understanding Workshop, 2017, pp. 712-718.
[12] S. Ö. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, A. Ng, J. Raiman et al., "Deep Voice: Real-time neural text-to-speech," in International Conference on Machine Learning, vol. 70, 2017, pp. 195-204.
[13] W. Ping, K. Peng, and J. Chen, "ClariNet: Parallel wave generation in end-to-end text-to-speech," in International Conference on Learning Representations, 2018.
[14] A. v. d. Oord, Y. Li, I. Babuschkin, K. Simonyan, O. Vinyals, K. Kavukcuoglu, G. v. d. Driessche, E. Lockhart, L. C. Cobo, F. Stimberg et al., "Parallel WaveNet: Fast high-fidelity speech synthesis," in International Conference on Machine Learning, 2018, pp. 3918-3926.
[15] J. Engel, K. K. Agrawal, S. Chen, I. Gulrajani, C. Donahue, and A. Roberts, "GANSynth: Adversarial neural audio synthesis," in International Conference on Learning Representations, 2019.
[16] P. Neekhara, C. Donahue, M. Puckette, S. Dubnov, and J. McAuley, "Expediting TTS synthesis with adversarial vocoding," in Interspeech, 2019, pp. 186-190.
[17] R. Yamamoto, E. Song, and J.-M. Kim, "Probability density distillation with generative adversarial networks for high-quality parallel waveform generation," in Interspeech, 2019, pp. 699-703.
[18] K. Kumar, R. Kumar, T. de Boissiere, L. Gestin, W. Z. Teoh, J. Sotelo, A. de Brébisson, Y. Bengio, and A. C. Courville, "MelGAN: Generative adversarial networks for conditional waveform synthesis," in Advances in Neural Information Processing Systems, 2019, pp. 14881-14892.
[19] R. Yamamoto, E. Song, and J.-M. Kim, "Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2020, pp. 6199-6203.
[20] R. Prenger, R. Valle, and B. Catanzaro, "WaveGlow: A flow-based generative network for speech synthesis," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2019, pp. 3617-3621.
[21] S. Kim, S.-g. Lee, J. Song, J. Kim, and S. Yoon, "FloWaveNet: A generative flow for raw audio," in International Conference on Machine Learning, 2019, pp. 3370-3378.
[22] D. J. Rezende and S. Mohamed, "Variational inference with normalizing flows," in International Conference on Machine Learning, 2015.
[23] L. Theis, A. v. d. Oord, and M. Bethge, "A note on the evaluation of generative models," in International Conference on Learning Representations, 2015.
[24] E. Hoogeboom, T. S. Cohen, and J. M. Tomczak, "Learning discrete distributions by dequantization," arXiv preprint arXiv:2001.11235, 2020.
[25] J. Ho, X. Chen, A. Srinivas, Y. Duan, and P. Abbeel, "Flow++: Improving flow-based generative models with variational dequantization and architecture design," in International Conference on Machine Learning, 2019, pp. 2722-2730.
[26] D. P. Kingma and P. Dhariwal, "Glow: Generative flow with invertible 1x1 convolutions," in Advances in Neural Information Processing Systems, 2018, pp. 10215-10224.
[27] T. Yoshimura, K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "Mel-cepstrum-based quantization noise shaping applied to neural-network-based speech waveform synthesis," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 7, pp. 1177-1184, 2018.
[28] J. Atchison and S. M. Shen, "Logistic-normal distributions: Some properties and uses," Biometrika, pp. 261-272, 1980.
[29] C. Veaux, J. Yamagishi, K. MacDonald et al., "Superseded-CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit," 2016.
[30] R. Kubichek, "Mel-cepstral distance measure for objective speech quality assessment," in IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, vol. 1, 1993, pp. 125-128.
[31] M. Vondrasek and P. Pollak, "Methods for speech SNR estimation: Evaluation tool and analysis of VAD dependency," Radioengineering, vol. 14, no. 1, pp. 6-11, 2005.