[PDF] A one-dimensional flow model enhanced by machine learning for simulation of vocal fold vibration

Abstract

We describe a one-dimensional (1D) unsteady and viscous flow model that is derived from the momentum and mass conservation equations, and to enhance this physics-based model, we use a machine learning approach to determine the unknown modeling parameters. Specifically, we first construct an idealized larynx model and perform ten cases of three-dimensional (3D) fluid--structure interaction (FSI) simulations. The flow data are then extracted to train the 1D flow model using a sparse identification approach for nonlinear dynamical systems. As a result of training, we obtain the analytical expressions for the entrance effect and pressure loss in the glottis, which are then incorporated in the flow model to conveniently handle different glottal shapes due to vocal fold vibration. We apply the enhanced 1D flow model in the FSI simulation of both idealized vocal fold geometries and subject-specific anatomical geometries reconstructed from the MRI images of rabbits' larynges. The 1D flow model is evaluated in both of these setups and is shown to have robust performance. Therefore, it provides a fast simulation tool superior to the previous 1D models.

Full PDF

TThis manuscript has been accepted for publication in Journal of the Acoustical Society of America

A one-dimensional ﬂow model enhanced by machine learning forsimulation of vocal fold vibration

Zheng Li a , Ye Chen a , Siyuan Chang a , Bernard Rousseau b , Haoxiang Luo a , a Department of Mechanical Engineering, Vanderbilt University, 2301 Vanderbilt Place, Nashville, TN37235-1592 b Department of Communication Science and Disorders, University of Pittsburgh

Abstract

We describe a one-dimensional (1D) unsteady and viscous ﬂow model that is derived from the momentumand mass conservation equations, and to enhance this physics-based model, we use a machine learningapproach to determine the unknown modeling parameters. Speciﬁcally, we ﬁrst construct an idealized larynxmodel and perform ten cases of three-dimensional (3D) ﬂuid–structure interaction (FSI) simulations. Theﬂow data are then extracted to train the 1D ﬂow model using a sparse identiﬁcation approach for nonlineardynamical systems. As a result of training, we obtain the analytical expressions for the entrance eﬀect andpressure loss in the glottis, which are then incorporated in the ﬂow model to conveniently handle diﬀerentglottal shapes due to vocal fold vibration. We apply the enhanced 1D ﬂow model in the FSI simulation ofboth idealized vocal fold geometries and subject-speciﬁc anatomical geometries reconstructed from the MRIimages of rabbits’ larynges. The 1D ﬂow model is evaluated in both of these setups and is shown to haverobust performance. Therefore, it provides a fast simulation tool superior to the previous 1D models.

Keywords: vocal fold vibration; ﬂuid–structure interaction; subject-speciﬁc model; machine learning, 1Dmodel; phonation;

I. Introduction

Computational modeling of ﬂuid–structure interaction (FSI) for vocal fold vibration is useful as it mayprovide a computer based tool for clinical management of voice disorders, e.g., surgical planning for vocalfold paralysis [1]. Despite that the underlying physical principle of vocal fold vibration is straightforward andcan be modeled simply using lumped-mass models, high-ﬁdelity modeling to simulate details of the tissue’sdynamic deformation is still very challenging, especially if patient-speciﬁc features should be simulated forthe purpose of developing modeling tools that can capture diﬀerences in the laryngeal anatomy and tissueproperties of individuals. Current aﬃliation:Corning Inc., Corning, NY Corresponding author: [email protected] (E-mail), +1-615-322-2079 (Tel) a r X i v : . [ phy s i c s . f l u - dyn ] F e b ith tremendous growth of the computer power and improvement of the modeling approach, compu-tational models for the FSI of vocal fold vibration have been advanced substantially in recent years. Thesephysics-based models typically couple a 2D or 3D glottal airﬂow model and a ﬁnite-element representationof the vocal fold tissue [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. In addition, they adopt increasingly high resolu-tion and have provided insightful fundamental understanding about this FSI system such as eigenmodes [2],mechanics of posturing [3], sensitivity to the geometry and material properties [5, 8, 9, 10], vortex ﬂow andpressure on the vocal fold surface [6, 7, 13], energy transfer [4, 7], and acoustic wave propagation [11, 12]. Inthese references, the vocal fold was often represented by a schematic that captures only the overall character-istics of the laryngeal geometry. Such models are obviously not suﬃcient for subject-speciﬁc representation.Medical imaging technologies, e.g., computed tomography (CT) and magnetic resonance imaging (MRI),allows the computational models to incorporate more realistic or even subject-speciﬁc laryngeal geometries.These imaging tools may provide detailed 3D anatomy of the larynx, as well as the interior structure of thetissue [14, 15, 16, 17]. Coupled with a 3D airﬂow solver in FSI simulation, the anatomical models of thelarynx represent a signiﬁcant step toward patient-speciﬁc modeling of vocal fold vibration, which may beneeded for clinical care of voices of individual patients. In recent years, such patient-speciﬁc models havebeen developed to study vocal fold vibration by several researchers [18, 19, 20].On the other hand, the unknown tissue properties for individual subjects, which can not be identiﬁedfrom current imaging technologies, limit the application of patient-speciﬁc modeling. There have been prioreﬀorts to derive the vocal fold tissue properties using ﬁnite-element method (FEM) based models combinedwith experimental tests [21, 22, 23], but they were limited to ex-vivo conditions. Although it is probableto determine the in-vivo , subject-speciﬁc tissue properties by running the high-ﬁdelity FSI simulations andsolving an inverse problem, assuming that all other aspects in the FSI model match the corresponding in-vivo experiment (e.g., the anatomy and boundary conditions), such an approach is still not practical since the 3Dairﬂow simulation is too expensive even with high-performance parallel computing. More practically, onecould use a simpliﬁed ﬂow model, coupled with a realistic FEM representation of the vocal fold for the FSIsimulation, to determine the elastic properties with much lower computational cost. Such eﬀorts have beenmade recently in a few studies [20, 24, 25, 26]. In practice, this kind of simpliﬁed FSI models could also becombined with the high-ﬁdelity models to increase the overall modeling accuracy [20].Bernoulli based ﬂow equations have long been used for vocal fold vibration. Decker and Thomson [27]compared the Bernoulli equation with the Navier–Stokes equation for simulation of vocal fold vibration,and they found that the Bernoulli models could be highly dependent on the ad hoc assumption of the ﬂowseparation in the glottis. Chang [28] also reached a similar conclusion, and they found the Bernoulli modelmay lead to a signiﬁcantly diﬀerent vibration mode of the vocal fold from a full Navier–Stokes model. Ingeneral, the main limitations in the Bernoulli principle are due to the assumption of the ideal ﬂow in theglottis and a priori unknown location of ﬂow separation. To address the limitations of the Bernoulli equation,in our recent works [29, 30] we developed a 1D momentum equation based ﬂow model that was originally2esigned to solve separated ﬂow in the collapsible tube [31, 32] and was only recently introduced for modelingof vocal fold dynamics [33]. In this model, we have included the eﬀect of pressure loss that is caused by theﬂow separation and the viscous eﬀect. Furthermore, we have included an entrance eﬀect, which is due toan inertial ﬂow entering the glottis from a rapidly converging shape in the subglottal region. This 1D ﬂowmodel was coupled with the 3D tissue model for FSI simulation of both idealized and anatomical vocal foldgeometries from rabbits. In the idealized vocal fold cases, the reduced FSI model achieved consistent resultswith those from the full 3D FSI model for diﬀerent medial vocal fold thicknesses, subglottal pressures, tissuemodels, and tissue stiﬀness properties. In the anatomical models, vibration results from the reduced FSImodel agreed well with the experimental data of the evoked in-vivo rabbit phonation [29, 30].However, the pressure loss and entrance eﬀect in the glottal airﬂow depend on the overall shape of theglottis as well as its instantaneous deformation during vibration. In addition, the Reynolds number playsa signiﬁcant role. Since no analytical expression exist to describe these eﬀects precisely, in the previouswork [29, 30] we chose to use constant parameters based on the knowledge learned from 3D ﬂow simulations.These parameters may need adjustment depending on speciﬁc vocal fold geometry, e.g., a large or smallmedial thickness, in order to achieve good accuracy. This limitation reduces generalization of the ﬂowmodel. To overcome the limitation, we set variable parameters for the pressure loss and entrance eﬀect andseek to express them in a functional form that is convenient to use.Machine learning techniques, which have gained popularity in ﬂuid mechanics in recent years [34], providea useful approach to help determine the functional form of a physical eﬀect, especially when general character-istics of such eﬀect have been understood. This feature applies well to the situation we are considering. Herewe use the 1D Navier–Stokes equation to describe the ﬂow physics and leave only the undescribed eﬀects tobe determined empirically. These eﬀects, i.e., the pressure loss and entrance eﬀect in a converging-divergingglottis, can be expressed as functions of only a few dimensionless parameters such as the instantaneous shapeof the glottis and the Reynolds number of the ﬂow. Once these functions are determined using a trainingdata set through machine learning, they can be incorporated into the 1D model for ﬂow simulation of arbi-trary glottal shapes. Recently, there have been other studies that also incorporated machine learning intomodeling of vocal fold dynamics [26, 35, 36]. Compared to those studies, the present work takes advantageof the physics-based model, i.e., the unsteady viscous Navier-Stokes equation, albeit within the limitationsof 1D, as much as possible and only resort to machine learning for the remaining undescribed eﬀects thatincludes the entrance eﬀect.In ﬂuid mechanics, many machine learning techniques have been developed to identify the governingequation directly from data [37, 38, 39, 34]. Here, we adopt the sparse identiﬁcation of nonlinear dynamical(SINDy) systems [37], which is a regression algorithm suitable for the physical systems having only a fewrelevant terms to deﬁne the dynamics. Speciﬁcally, we will use the SINDy method to identify the pressureloss and entrance eﬀect at locations near the glottal exit and express them as polynomial functions of theReynolds number, the channel length, and the convergent or divergent ratios of the glottis. To generate the3 igure 1: Schematic of airﬂow entering the glottis that generally has a converging-diverging shape. The ﬂow experiences loss ofenergy and also a vena contracta eﬀect, in which the eﬀective area, A , may be smaller than the actual cross section area, A .The red-dashed lines indicate the boundary layer. training data for machine learning, we will use 3D FSI simulations of an idealized vocal fold with diﬀerentmedial thicknesses, stiﬀness properties, and subglottal pressures. Additional 3D simulation cases will beperformed for validation of the machine learning. To further assess the performance of the new ﬂow model,we will compare other quantities such as the pressure distribution, ﬂow rate, and vibratory characteristics inthe idealized model against the 3D simulation. Furthermore, we will apply the new model to FSI simulationof anatomical vocal fold geometries that are based on excised rabbit larynges. The simulation results willbe compared with high-speed video data from in-vivo phonation of the same larynx samples prior to theexcision. The descriptions of the model setup, machine learning procedure, and results from machine learningand FSI simulations are provided in the following sections. II. The ﬂow model and training case setup

A. The one-dimensional viscous ﬂow model

A 2D schematic of the geometry in the transverse plane of the larynx is shown in Fig. 1, where theglottis is depicted as a converging-diverging channel. To facilitate our discussion, we use x a and x b to markthe locations of the glottal inlet and exit, respectively, and x c the location of the minimal cross-section areain the glottis. Note that x c varies between x a and x b when the vocal fold is vibrating, and it could coincidewith x b such that the glottis is purely convergent. In practice, the location of x b is straightforward to chooseas the cross section typically experiences a sudden expansion at the glottal exit. On the other hand, thelocation of x a sometimes is not obvious since the subglottal region may narrow down gradually rather thanabruptly. We point out that from our tests, the present ﬂow model is not sensitive to the inlet location if theglottis does not have a clear entrance location, in which case an approximate choice of x a would be suﬃcient.This is because the entrance eﬀect of a gradually convergent section is small anyway, and in addition, thepressure loss primarily takes eﬀect in the diverging section in the present model and has little dependenceon the inlet location. 4hen the air ﬂows through the glottis during vibration, the pressure generally decreases before theminimal area section at x c due to the Bernoulli eﬀect. After this section, the pressure would increase alongwith the area expansion. However, the pressure will not recover to its full extent because of the possibleseparation in the divergent section and also the viscous eﬀect in the entire glottis. Therefore, accountingfor the pressure loss in the ﬂow model will help overcome limitations of the Bernoulli equation. Consideringthe mass and momentum conservation equations, Cancelli and Pedley [31] developed a 1D ﬂow model todescribe a collapsible tube. In the momentum equation, they included the viscous loss and separation eﬀects.To generalize the pressure loss, we combine the viscous loss and separation eﬀects as one single loss termrepresented by the shear stress, τ , in the following equation ∂A∂t + ∂Au∂x = 0 ρ ∂u∂t + ρu ∂u∂x = − ∂p∂x + ∂τ∂x (1)where ρ , u and p are respectively the density, velocity, and pressure, and A is the eﬀective area of the crosssection. We will discuss the calculation of the shear stress τ ( x ) later using a machine learning approach. Forthe boundary conditions of the 1D ﬂow, we set a speciﬁed subglottal pressure, P sub , and the pressure at theglottal exit, P e . Eq. (1) represents a nonlinear boundary value problem and can be solved using a shootingmethod once we have an expression for τ . Its numerical procedure was described in Li et al. [29].Beside the loss term, the vena contracta eﬀect of ﬂow entering an expansion was introduced in our ﬂowmodel [29, 30]. In particular, as the air ﬂows into the glottis, and especially the diverging section, it tendsto focus to the center under its inertia, rather than following the exact shape of the channel. Thus, we usethe eﬀective cross section area, A , in Eq. (1) for the mass conservation equation. This area is smaller thanthe actual cross sectional area, A , as illustrated in Fig. 1. Without such an entrance eﬀect, the negativepressure (gage pressure) at the minimum section could be overestimated, leading to inaccurate pressure loadon the vocal fold surface. To calculate the eﬀective area A , we introduce a correctional coeﬃcient, α ( x ), sothat A ( x ) = α ( x ) A ( x ) . (2)Note that α is a function of the streamwise location, x .In our previous work [29, 30], we estimated α ( x ) based on the 3D simulation of the FSI problem bycalculating it from α ( x ) = u avg /u , where u avg is the average streamwise velocity in the cross section and u isthe maximum streamwise velocity. We further assumed a quadratic function form for α ( x ) with a single freeparameter to be determined through machine learning, as will be discussed in next section. The quadraticfunction represents narrowing down of the eﬀective area due to the growth of the boundary layer along theglottis. 5 . Input variables for machine learning To outline the ﬂow model for machine learning, we use a simpliﬁed but characteristic geometry of theglottis and deﬁne the input–output variables for the machine learning module. Fig. 2(a) shows a 2D schematicof the glottis. We deﬁne ﬁve locations along the ﬂow, which are: 1) the glottal inlet x a , 2) the glottal exit x b ,3) the narrowest area location x c , 4) a point in the subglottal region x d , and 5) an intermediate location inthe divergent section, x e . The average gap width at the narrowest section is denoted as H , so H = A ( x c ) /L ,where A ( x c ) is the cross section area at x c and L is the longitudinal length of the glottis. These locations, x a , x b , x c , and the corresponding cross-sectional area, A ( x a ), A ( x b ), and A ( x c ), describe the overall converging-diverging shape of the glottis. In addition to these variables, x d and A ( x d ) are used to describe the slopeof the subglottal region, which is useful in measuring the extent that the ﬂow is focused when entering theglottis. Previous study has shown that the geometry of the glottal entrance has a signiﬁcant inﬂuence onthe intraglottal pressure distribution [40]. Here we set x d at a distance of 5 H from x a to capture the slopeof the subglottal shape. Furthermore, the intermediate point x e is added so that the pressure loss at thislocation will be determined as one output variable as described in next section. We set x e to be closer to x b , with a distance ratio of 3:1 between | x c − x e | and | x b − x e | , since the pressure loss increases more quicklynear the glottal exit, as illustrated in Fig. 2(b). It is worth pointing out that the exact location of x d and x e are not crucial, as x d is used to calculate the subglottal slope of the vocal fold and x e is used to provideanother data point for the pressure loss estimate.In terms of nondimensional parameters, the geometric variables along with the Reynolds number aredeﬁned as below, r b = A ( x b ) /A ( x c ) l bc = L bc /Hr a = A ( x a ) /A ( x c ) l ac = L ac /Hr d = A ( x d ) /A ( x a ) Re / = (cid:20) ρ u c Hµ (cid:21) / (3)Among these six variables, r b , r a , and r d are the area ratios, l bc and l ac are the normalized distances. TheReynolds number is deﬁned using the velocity at the narrowest section, u c , and is reduced to the power 1 / C. Output variables for machine learning

To determine the pressure loss, or the shear stress term in Eq. (1), we assume that τ = 0 and d τ / d x = 0at the glottal inlet x a , τ = τ e at the intermediate point x e , and τ = τ b at the exit x b . The two unknownvariables, τ e and τ b , will be determined using machine learning as functions of the six input variables described6a) (b) x c x d x b x e x a H5H xL ac L bc sectionMinimum x a x e x b x a x c x b x e e τ b τ x c x c τα / α ( ) xx Figure 2: (a) Geometric description of the glottis used in the machine learning. (b) Functions of the pressure loss, τ ( x ), andthe area correction coeﬃcient, α ( x ), along the glottis. in Eq. (3). Once τ e and τ b are determined from machine learning, we assume a cubic distribution for τ ( x )from the glottal inlet x a to the exit x b as shown in Fig. 2(b). This assumption is made based on observationof general characteristics of the pressure loss from our 3D ﬂow simulations [30]. Note that a higher orderdistribution is also possible using the same strategy, if more output variables are used from machine learning.For the entrance eﬀect, we need to determine the area correction coeﬃcient, α ( x ), along the glottis.Similar to our previous publications [29, 30], we assume a quadratic distribution for α ( x ) between x c and x b . However, in the present study we will only need to determine α/α ( x c ) since only the relative area ratiois needed when solving Eq. (1). Furthermore, α ( x ) is assumed to have zero derivative at x b . Therefore,we will only need to determine α ( x b ) /α ( x c ) through machine learning. In summary, there are three outputvariables for the machine learning process, which are τ e , τ b , and α ( x b ) /α ( x c ). D. The SINDy method for machine learning

For machine learning, we use sparse identiﬁcation of nonlinear dynamical systems (SINDy) [37], which isa data regression approach to discover governing equations for nonlinear dynamical systems including ﬂuidﬂows. In particular, SINDy uses sparse regression to determine the fewest terms in the dynamic governingequations required to accurately represent the data, and this results in parsimonious models that balanceaccuracy with model complexity to avoid overﬁtting [37]. The key assumption in this approach is that formany systems of interest, the governing equation consists of only a few terms, making it sparse in the spaceof possible functions [37]. Using training data that will be described in next section, SINDy can determinea generic output variable, f , as a polynomial function of the six input variables deﬁned in Eq. (3), i.e., f = f ( r b , l bc , r a , l ac , r d , Re / , r b , r b l bc , · · · , Re / , · · · ) (4)7 igure 3: The vocal fold model and computational domain used for 3D FSI simulation and data generation.Table 1: FSI cases setup for data generation. Cases 1-6 are used for training, and Cases 7-10 for additional validation. Case 1 2 3 4 5 6 7 8 9 10 T (mm) 1.75 3.50 1.75 3.50 P sub (kPa) 1.00 1.25 0.75 1.25 0.75 α (kPa) 2.29 2.58 9.16 2.29 4.58 9.16 2.29 α (kPa) 0.25 0.50 1.00 0.25 0.50 1.00 0.25The software package of SINDy in Matlab is freely available by the authors [37] and is used here for ourstudy. For our study, only the terms up to the third order are retained in this polynomial function. E. Setup of the 3D FSI model and data generation

To generate training data for machine learning, we use a previous 3D setup of the FSI of an idealized vocalfold geometry as illustrated in Fig. 3. This model has been described in our previous publications [9, 29, 30]and is only brieﬂy summarized here. The airﬂow is driven by a constant subglottal pressure P sub , and theoutlet has a reference pressure of P out = 0 kPa for all the cases in consideration. The air is assumed to beincompressible and is governed by the viscous Navier-Stokes equation. A pair of vocal fold bands are placedsymmetrically in the channel, whose length, width, and depth are L = 20 mm, W = 13 mm, and D = 10 mm,respectively. The medial thickness, T , has signiﬁcant eﬀects on the ﬂow and the vocal fold vibration [28],as the medial surfaces are the primary loading surfaces for the sustained vibration. The vocal fold hereis assumed to be isotropic, homogeneous, and is governed by a hyperelastic, two-parameter Mooney-Rivlinmodel. The strain energy density function for this model is given as W = α ( I −

3) + α ( I −

3) + K/ J − (5)where K represents the bulk modulus, α and α are material constants related to the distortional response,and J = det( F ) with F standing for the deformation gradient. In addition, I and I are invariants basedon J and the principal stretches of the deformation gradient. Further detail of this model for the vocal fold8an be found in our group’s previous work [41]. Anisotropic tissue behavior or a multi-layer structure asproposed in many previous works [2, 7, 10] would be better representation of the real tissue of vocal fold.However, we only need characteristic vocal fold deformation and corresponding ﬂow data for the trainingpurpose; thus, the speciﬁc material model for the vocal fold tissue is not essential in this study.To solve the 3D FSI, an in-house immersed-boundary method is employed for the ﬂow simulation,while the tissue deformation is solved with a ﬁnite-element method [41]. In total, we solved ten simulationcases after a careful mesh independence study [29, 30]. Table 1 shows the details for all the test cases,which contains variations in these parameters: the medial thickness T , the subglottal pressure P sub , andthe material stiﬀness constants α and α . The tissue density is ρ s = 1040 kg/m and mass damping is0.05 s − in all the cases. The air density is ρ = 1 .

13 kg/m . Thus, the characteristic intraglottal velocity is V = (cid:112) P − P out ) /ρ = 42 . Re J = ρV d/µ , where d ∼ µ is the air viscosity. In the current study,we set Re J = 210.Cases 1-6 are utilized in SINDy as training data, which have the same P sub but two diﬀerent medialthicknesses and three stiﬀness constants. After steady vibration is established in the 3D FSI simulations,80 time frames of data in each case, which cover at least 2 vibration cycles (from 8 to 16 milli-seconds,depending on the frequency), are used for training. That is, at each chosen time frame, the instantaneousvalues of the six input variables and the three output variables represent a data point for machine learning.To calculate these input and output variables, we extract the ﬂow velocity and pressure along the centerlineof the 3D ﬂow ﬁeld. In total, there are 480 data points for all 6 cases together for training. We will comparethe output from the regression (i.e., the equations derived from the machine learning) with those providedfor training. To extend the validation, we consider Cases 7-10, in which the subglottal pressure is diﬀerentand whose data have not been used for training, for further assessment. For all ten cases, we will also usethe machine learning enhanced 1D ﬂow model to replace the 3D ﬂow and perform FSI simulations, and wewill compare the vibration frequency, amplitude and phase delay of the vocal fold between 3D FSI and thesimpliﬁed FSI model. The entire procedure is shown using a ﬂow chart in a supplementary ﬁgure [42]. III. Results and discussion

A. Results from machine learning

After the training process, we obtain the explicit expressions of the output variables as deﬁned inSection C. These expressions are given in the appendix A. Results of data regression are shown in Fig. 4for τ b / ( ρu c ), τ e / ( ρu c ), and α ( x b ) /α ( x c ), where u c is the centerline velocity at the minimum section x c . Inthis ﬁgure, the x -axis represents the 3D FSI results of Cases 1 to 10 and the y -axis represents the predictedvalue based on the trained regression model, i.e., Eqns. (A.1) to (A.3). Red symbols represent training datafrom Cases 1-6, while blue symbols are the validation data from Cases 7-10. Ideally, the predicted value isequal to the 3D FSI value so that all data points would fall on the dashed line y = x in the ﬁgure. However,9 igure 4: Comparison between 3D FSI results and the predicted results from data regression. Data in Cases 1-6 (red symbols)are used as training data; Data in Cases 7-10 (blue symbols) are only used for validation. (a) Pressure loss at the glottal exit, τ b / ( ρu c ). (b) pressure loss at x e , τ e / ( ρu c ). and (c) the entrance eﬀect ratio at x b , α ( x b ) /α ( x c ). The dashed line representsthe ideal ﬁt. due to the error in the ﬁtting process, the data are scattered around the line. From this ﬁgure, we can seethat both the training data and the validation data mostly cluster around y = x ; thus, the predicted resultsfrom regression agree reasonably well with the 3D results. We calculate the mean error between the machinelearning result and the benchmark result from the 3D simulations for each of these three output variables,i.e., | φ ML − φ D | . For τ b / ( ρu c ), the mean error of the training data and mean error of the validation dataare 0.029 and 0.033, respectively; for τ e / ( ρu c ), they are 0.057 and 0.086, respectively; and for α ( x b ) /α ( x c ),they are 0.048 and 0.078, respectively.In Fig. 4(a,b), we see that the loss coeﬃcients in the glottis, τ b / ( ρu c ) and τ e / ( ρu c ), vary from nearly zeroto nearly three. The zero value means that there is no loss in the ﬂow, while three represents a signiﬁcantloss in the ﬂow. Such great loss happens when the glottis is nearly closed and the ﬂow speed becomes small,which is analogous to a closing mechanical valve in a pipe ﬂow.In Fig. 4(c), we see that α ( x b ) /α ( x c ) is clustered above 0.5, indicating that the entrance eﬀect at theglottal exit is not necessarily signiﬁcant at those time frames. Further examination shows that in thosesituations, the glottis has a small divergent angle or has a short divergent section, which does not create astrong entrance eﬀect. However, there are also a number of data points in this ﬁgure where α ( x b ) /α ( x c ) isbelow 0.5 and is even close to 0.2. For those situations, the divergent section is typically long and/or hasa large diverging angle, which causes early ﬂow separation and strong entrance eﬀect. The distribution ofthe data points in this ﬁgure thus covers a wide range of situations for the ﬂow, which is preferred for thetraining purpose. To further illustrate the data variation and validation of the machine learning, we haveadded a supplementary ﬁgure plotting the three output variables against time for the validation cases [43].Once the expressions for τ b , τ c , and α ( x b ) /α ( x c ) are derived from machine learning, we can applythem in the 1D ﬂow model. In doing so, τ ( x ) and α ( x ) in the glottis take the assumed distribution as inSection C. That is, we assume a cubic function for τ ( x ) as shown in Fig. 2(b), where τ = 0 and d τ / d x = 0at x a . When x c is very close to x b , i.e., the glottis is almost purely convergent, the cubic interpolation may10 igure 5: Comparison of pressure distribution along the centerline between 3D FSI and 1D ﬂow model for small medial thickness T cases. (a) Closing phase in Case 7, (b) opening phase in Case 7; (c) closing phase in Case 8, (d) opening phase in Case 8.The inset shows the corresponding vocal fold deformation (solid lines) from its initial conﬁguration (dashed lines). result in overshoot of the function. Thus, when | x c − x b | is less than H , we disregard τ e and switch thecubic function for τ ( x ) to the quadratic function with the same boundary conditions at x a . For the areacorrection coeﬃcient α ( x ), we assume a quadratic function between x c and x b , while requiring d α/ d x = 0at x b , as shown in Fig. 2(b). Between the glottal inlet x a and the minimum section x c , the thickness of theboundary layer has only small change, so we assume that α ( x ) /α ( x c ) = 1 in that region.To verify the 1D ﬂow model enhanced by machine learning, we ﬁrst compared the pressure distributionpredicted by this model against 3D results generated from cases that are not used in the training process.More speciﬁcally, we use the instantaneous glottal shape obtained from the 3D FSI simulations of Cases 7 to10 and calculate the pressure distribution using the 1D ﬂow model in Eq. (1). Then, the result is comparedwith the pressure at the centerline extracted from the 3D ﬂow ﬁeld.Figure 5 shows such comparison of pressure distribution at vocal fold opening and closing phases fromCases 7 and 8, which have the same medial thickness of T = 1 .

75 mm but diﬀerent P sub . It can be seenthat the pressure produced by the 1D ﬂow model is close to that from the 3D simulation for diﬀerent P sub values, including the negative pressure during the opening phase when the glottis is of divergent shape. Fromour previous study [29], correct prediction of the negative pressure in this case of small medial thickness is11 igure 6: Comparison of pressure distribution along the centerline between 3D FSI and 1D ﬂow model for large medial thickness T cases. (a) Closing phase, (b) opening phase, and (c) maximum opening in Case 9; (d) closing phase, (e) opening phase, and(f) maximum opening in Case 10. The inset is explained in Fig. 5. .205 0.21 0.215 0.22 Time (s) F l o w r a t e ( m l / s ) (d)

3D FSI1D FLOWRefs. [29, 30]

Time (s) F l o w r a t e ( m l / s ) (c)

3D FSI1D FLOWRefs. [29, 30]

Time (s) F l o w r a t e ( m l / s ) (b)

3D FSI1D FLOWRefs. [29, 30]

Time (s) F l o w r a t e ( m l / s ) (a)

3D FSI1D FLOWRefs. [29, 30]

Figure 7: Comparison of the volume ﬂow rate between 3D FSI and 1D ﬂow model for (a) Case 7, (b) Case 8, (c) Case 9, (d)Case 10, which are not in the training data set. important; otherwise, the vocal fold may exhibit a diﬀerent vibration mode that is associated with the ﬁrsteigenmode of the tissue structure [29].Figure 6 shows comparison of the pressure distribution for Cases 9 and 10, which have the same medialthickness of T = 3 .

50 mm but diﬀerent P sub . In these cases, the glottis is relatively long and may formdiverging, converging, and converging-diverging shapes at diﬀerent vibration phases. From our previousstudy [29], if a Bernoulli equation based model is used, then an inappropriate setting of the location forﬂow separation may lead to exceedingly negative pressure that might destabilize the FSI simulation. In thepresent study, the 1D ﬂow model correctly predicts the negative pressure zone (e.g., Fig. 6(c) and (f)) andcaptures the pressure reasonably well at diﬀerent glottal shapes.In Figs. 5 and 6, we also include the pressure distributions calculated by the untrained 1D ﬂow modelwith ad hoc assumptions in pressure loss and entrance eﬀect [29, 30]. Overall, the current 1D ﬂow modeltrained by machine learning achieves better accuracy as compared with those reference results.Besides the pressure distribution, the ﬂow rate is also important for the glottal airﬂow. Figure 7 comparesthe volume ﬂow rate between 3D FSI and 1D ﬂow model for Cases 7 to 10 that are not used in the trainingprocess. Similar to the pressure distribution, we calculate the volume ﬂow rate in 1D ﬂow model using thesame glottal shape from the 3D simulation. Two representative cycles are selected for each case comparison.In Cases 9 and 10 where the medial thickness is larger, the vocal fold has better closure, and thus the ﬂow13 igure 8: Comparison between 3D FSI and reduced-order model: (a) amplitude, (b) frequency, (c) phase delay. rate has greater oscillations and is reduced to nearly zero at closure. On the other hand, in Cases 7 and 8where the medial thickness is smaller, the vocal fold maintains a signiﬁcant gap at the closing phase, whichleads to a high ﬂow rate and low-magnitude oscillations during vibration. In all cases under consideration,the ﬂow rate from 1D ﬂow model agrees well with 3D FSI result.As a reference, we also include the volume ﬂow rate in Cases 7-10 calculated using the untrained 1D ﬂowmodel [29, 30]. Comparing with the reference result, the ﬂow rate from the present model shows signiﬁcantlybetter agreement with the 3D FSI result.Therefore, the present 1D ﬂow model provides reliable predication for the volume ﬂow rate during vocalfold vibration. B. Application in the FSI of idealized vocal fold models

After verifying the 1D ﬂow model for ﬂow calculation only, we then apply it in the FSI simulation bycoupling it with the 3D idealized vocal fold model described in Section A. We compare the vibrationcharacteristics from this 1D-ﬂow/3D-solid hybrid FSI simulation with those from the full 3D FSI simulation.For this comparison, all ten cases in Table 1 are considered, which include variations in the medial thickness,stiﬀness properties, and subglottal pressure. As mentioned before, Cases 7 to 10 were not used in the trainingdata but included here as additional validation.We use the vibration amplitude, frequency, and phase delay for quantitative comparison between thetwo sets of simulations. The vibration amplitude is deﬁned as the maximum y -displacement of the vocal foldmeasured at the glottal exit in a cycle. From Case 1 to Case 3, the tissue stiﬀness increases while the otherparameters are the same. As shown in Fig. 8(a), the vibration amplitude decreases with increasing tissuestiﬀness from Case 1 to 3, which have a smaller medial thickness. Similar trend can be seen from Case 4 to6, which have a larger medial thickness. Cases 7, 1 and 8 have diﬀerent subglottal pressures from high tolow. Correspondingly, their vibration amplitude shows a decrease in the same order. A similar result canbe seen for Cases 9, 4, and 10 in which P sub is decreased. In all ten cases, the two sets of FSI simulationsproduce closely agreeable results. 14 able 2: 1D-ﬂow FSI results compared with 3D FSI in terms of vibration frequency f , amplitude d , and phase delay φ . Resultsusing the untrained model in Ref. [29] are also included. Model f (Hz) diﬀerence d (mm) diﬀerence φ ( ◦ ) diﬀerence ( ◦ )Case 1 3D FSI 132 - 0.69 - -19 -1D-ﬂow FSI 135 2.3% 0.65 5.8% -15 4Ref [29] 144 9.1% 0.58 15.9% -15 4Case 4 3D FSI 140 - 1.02 - 157 -1D-ﬂow FSI 136 2.9% 1.04 2.0% 159 2Ref [29] 144 2.9% 1.00 2.0% 161 4Avg. error(Cases 1-10) 1D-ﬂow FSI - 2.3% - 5.3% - 6Ref [29, 30] - 3.9% - 9.1% - 11The comparison of the vibration frequency is shown in Fig. 8 (b). The second-eigenmode type vibra-tion [7] is established in all the cases, where the vocal fold oscillation is primarily in the lateral or y -direction,and this mode is captured by the 1D-ﬂow FSI simulation. Thus, the frequency predicted by this hybrid FSImatches that by the full 3D FSI. From the ﬁgure, the eﬀect of the tissue stiﬀness on the vibration frequencyis clear, e.g., from Case 1 to 3, and from Case 4 to 6, where an increase of the tissue stiﬀness leads to increaseof the vibration frequency.The phase delay is calculated using the temporal diﬀerence between the glottal inlet and the exit in themid xy -plane in terms of the displacement. Good agreement is again achieved in the comparison betweenthe two sets of simulations for all cases. For this quantity, the most inﬂuential parameter is the medialthickness, as the longer glottis would create greater phase diﬀerence between the glottal inlet and the exitand thus lead to a more pronounced mucosal wave along the glottis.To compare the performance of the present ﬂow model with that of the untrained ﬂow model, we includethe results from a previous study [29, 30]. This comparison is shown in Table 2. Only one case each fromthe large T and small T is shown here for brevity. We also include the average diﬀerences for the frequency,amplitude and phase delay from Cases 1-10. If we only consider Cases 7-10, the average error is 3.3%, 6.3%and 5.2 ◦ , for the frequency, amplitude, and phase delay, respectively, which is consistent with the overallaverage in Table 2. From the comparison, it can be seen that the trained ﬂow model leads to better accuracyin the predication of the vibration than the previous untrained model. C. Application in the subject-speciﬁc vocal fold models

Other than the idealized vocal fold geometry, we also apply the 1D ﬂow model in the FSI simulation ofthe subject-speciﬁc vocal fold models that were generated based on 3D scan of rabbits’ larynx (Fig. 9). Inthe present study, we utilize the vocal fold models created previously in Chang et al. [20] and will validatethe simulation results against the experimental data of evoked in-vivo phonation. The same models were15 igure 9: Subject-speciﬁc vocal fold model: (a) reconstructed larynx geometry from a superior view; (b) proﬁles of the vocalfold cover (blue) and body (red) segmented from the MRI scan. also used in Chen et al. [30] to validate the 1D ﬂow model without machine learning. Readers are referred toChang et al. [20] for the details how these anatomical models were created and how the in-vivo measurementof the vocal fold vibration was conducted. Only a brief summary is given here to provide the context.In the experiment [17], live rabbits were used in the study; their vocal fold was surgically sutured toachieve adduction, and phonation was evoked by introducing pressurized air from their trachea. High-speedvideos of the vocal fold vibration were taken during the experiment, which provide the vibration frequency,magnitude, and waveform as the validation data for our current study.After the phonation experiment, the rabbit larynx was excised and high-resolution MRI was performedto obtain details of morphology of the vocal fold while the vocal fold maintained the adducted phonatoryposition. The 3D anatomical vocal fold model was generated for each of the ﬁve samples after manualsegmentation from the MRI data and surface mesh reduction/smoothing [20]. Furthermore, the tissueproperties were estimated in that study through simulations, and these properties of the ﬁve samples can befound in our previous publications [20, 30].To perform the hybrid 1D-ﬂow/3D-solid FSI simulation, we couple the FEM representation of eachanatomical vocal fold with the present trained 1D ﬂow model. The subglottal pressure in each case isobtained from the experiment and is in the range of 0.72 to 1.05 kPa [20]. Figure 10 shows a comparisonof the normalized glottal gap width in a sequence of vocal fold oscillations between the FSI simulation andthe experiment. Because the high-speed imaging does not provide a length scale, we use the normalized gapwidth, d/d max , for comparison, where d is the gap width of the glottis measured at mid-section and d max is its peak value. From this ﬁgure, it can be seen that the waveform obtained from the simulation agreesgenerally well with the experiment. In modeling the vocal fold contact, we maintain a minimum glottal gapof 0.02 mm between the sides for the ﬂow [6]. Thus, the waveform from the FSI simulation does not have fullclosure. For quantitative comparison, we further compute the normalized root-mean-square (r.m.s.) errorof the waveform between the simulation data and experiment for each sample. The result is listed for all16 igure 10: Waveforms of the normalized glottal gap width from the in-vivo phonation experiment and the FSI simulation forthe rabbit sample R1.Table 3: The normalized r.m.s. error of the gap width waveform for each rabbit sample. Results from a previous study [30]with the untrained ﬂow model is also included. Sample R1 R2 R3 R4 R5Error 13.7% 10.3% 12.7% 14.5% 14.0%Error [30] 14.3% 11.3% 15.3% 15.9% 16.9%ﬁve samples in Table 3, which shows that the error is within 15% for all cases. As shown in the table, theser.m.s errors are lower than the results in the previous work, where the untrained ﬂow model was used [30].Therefore, the trained ﬂow model provides improved results and reasonable prediction of the vibration forthese subject-speciﬁc models.Figure 11 further shows a quantitative comparison between the experiment and the FSI simulation forall ﬁve samples in terms of the vibration frequency and normalized amplitude, d max /L , where L is the vocalfold length. In the experiment, each sample had three trials to analyze the standard deviations [17, 20].From this ﬁgure, both the frequency and the amplitude from the simulation fall within the range of theexperimental data for all the ﬁve samples despite signiﬁcant variations among the individual subjects. Thisresult again conﬁrms the performance of the present 1D ﬂow model in the FSI simulation. D. Discussion

In contrast with 3D computational ﬂuid dynamics models that employ extensive computing resourcesand require substantially longer simulation time, the drastically simpliﬁed ﬂow models such as the present 1Dmodel oﬀers much faster turnaround and may be used in conjunction with the 3D models as a complementary17 igure 11: Frequency and amplitude comparison between the experimental and numerical results for ﬁve rabbit samples. fashion for model-based prediction. The performance of such models could be evaluated in terms of theiraccuracy, robustness, and required information during practical implementation.Using a similar set of partial diﬀerential equations based on momentum and mass conservation, thepresent 1D ﬂow model retains the viscous and entrance eﬀects of the ﬂow model from the previous studiesof Luo and coworkers [29, 30] and hence oﬀers similar advantages shown therein in comparison with thetraditional Bernoulli based models. In the present model, we have incorporated machine learning to generalizethe pressure loss and entrance eﬀect in the glottis. Those eﬀects were only assumed by ad hoc mannerspreviously in the untrained model [29, 30] and may need adjustment in practical use. For example, previouslythe shear stress related to ﬂow separation, τ χ , is modelled as τ χ = As ( 1 − χ ) ρu ∂u∂x , where s is the perimeteraround the cross section, A is the eﬀective cross section area, and 0 ≤ χ ≤ χ has to be adjusted empiricallyfor a divergent channel in order to achieve matching results to the 3D model. In addition, the area correctionalcoeﬃcient, α ( x ), has an adjustable constant C in Eq. (4) of Ref. [30]. In the present study, the parameters inthe shear stress and the area correctional coeﬃcient have been expressed in explicit functions of the Reynoldsnumber and a few geometrical parameters describing the instantaneous shape of the glottis through a dataregression procedure and thus has better capability of generalization.Using the idealized vocal fold geometry, we have demonstrated that the new ﬂow model provides betterprediction of the pressure distribution and ﬂow rate than the previous untrained model (Section 3.1). Whenapplied to the FSI simulation, the new ﬂow model leads to clearly more accurate predication of the vibrationfrequency, amplitude, and phase delay in the vocal fold dynamics (Section 3.2). However, when it is appliedto the subject-speciﬁc vocal fold geometries, the new ﬂow model oﬀers only small or limited improvement ascompared with the previous untrained model (Section 3.3), where the normalized r.m.s for all samples rangesfrom 10% to 15%. The reason for this limitation is mostly likely due to presence of many other uncertain-18ies related to this type of subject-speciﬁc models, e.g., quality of the MRI data, the segmentation errors,assumption of the tissue properties, as well as the experiment itself, which start to become predominantfactors over the numerical model’s own error. In that case, improving the numerical model alone will notfurther increase the overall accuracy.Despite being unable to substantially improve the accuracy in the subject-speciﬁc cases, the presentﬂow model is still advantageous as compared with the similar existing models. We emphasize that fromthe idealized geometries (including signiﬁcant variations in the medial thickness, subglottal pressure, andthe tissue stiﬀness) to the anatomical geometries, the current 1D ﬂow model uses exactly the same pressureloss and entrance eﬀect functions that are derived from machine learning, and there is no need to make anyparameter adjustment. This feature of robustness, the overall improved accuracy in all the tests, as well asthe fact the present model does not require any additional input information, indicate that the present ﬂowmodel has signiﬁcant better performance than the previous untrained model [29, 30]. IV. Conclusion

In this study, we have presented a new 1D ﬂow model for glottal airﬂow that is based on the viscous ﬂowassumption. As compared with similar models in the previous studies, in the current ﬂow model we derivethe pressure loss and the entrance eﬀect using a machine learning approach and express them as explicitfunctions of the Reynolds number and the parameters describing the characteristic shape of the glottis atany instantaneous moment. Unlike previous models in which the parameters need to be modiﬁed ad hocfor diﬀerent cases such as the vocal fold geometry, the present machine-trained model can be used for moregeneral situations without the need to modify its parameters. We have tested the performance of this 1Dﬂow model in three scenarios. First, we use this ﬂow model to calculate the pressure distribution and thevolume ﬂow rate using the glottal conﬁguration from the 3D FSI simulations not included in the trainingprocess. The results agree well with those directly from the 3D simulations, and they are signiﬁcantly betterthan those from the previous untrained model. Second, the 1D ﬂow model is coupled with the 3D idealizedvocal fold geometry to perform the hybrid 1D-ﬂow/3D-solid FSI simulation, and the results show that thevibration characteristics match the full 3D FSI simulation results signiﬁcantly better than the untrainedmodel in terms of the vibration frequency, amplitude, and phase delay. Third, we applied the 1D ﬂow modelto the subject-speciﬁc vocal fold models constructed from the rabbit larynx, and the FSI simulation resultsare compared against the previous in-vivo experimental data. Even though in this case the improvement islimited likely due to the presence of uncertainties, the new model achieves the accuracy performance withoutthe need to adjust its parameters.In summary, we conclude that the present 1D glottal airﬂow model enhanced by machine learning is moreaccurate and robust than the similar models in the previous studies and could be useful for eﬃcient modelingof vocal fold dynamics, e.g., estimate of the unknown tissue properties of an individual subject’s vocal foldusing model-based simulations, or design optimization of the surgical implant inserted into a paralyzed vocal19old.

Appendix A. Expressions from sparse regression

The following formulae are the expressions derived from the SINDy method for the pressure losses τ b , τ e , and the area correction coeﬃcient α ( x ) at x b . τ b ρu c = 5 . r b − . l bc − . Re / + 3 . r d + 0 . r a − . l ac − . r b − . r b r d − . r b r a + 0 . l bc r a + 0 . Re / + 0 . Re / r a − . r d − . r d r a + 0 . r a + 0 . r b r d − . r b r a − . Re / r a (A.1) τ e ρu c = 20 . r b − . l bc + 0 . Re / − . r d − . r a − . l ac − . r b + 0 . r b l bc − . r b Re / − . r b r d − . r b r a + 0 . l bc Re / + 0 . l bc r d + 0 . l bc r a + 0 . r d r a + 0 . r a + 0 . r b + 0 . r b Re / + 0 . r b r d − . r b l bc r d + 0 . r b Re / + 0 . r b Re / r a + 0 . r b r d − . r b r a − . l bc r d − . l bc r d r a − . Re / r a (A.2) α ( x b ) α ( x c ) = 1 . r b − . l bc + 0 . Re / + 1 . r d + 0 . r a − . r b + 0 . r b l bc − . r b Re / − . r b r d − . r b r a + 0 . l bc r d + 0 . l bc r a − . Re / l ac − . r d − . r d r a + 0 . r d l ac + 0 . r b − . r b l bc + 0 . r b r d + 0 . r b r a − . r b l bc r d − . r b l bc r a + 0 . r b Re / l ac + 0 . r b r d + 0 . r b r d r a − . r b r d l ac − . l bc r d − . l bc r d r a − . l bc r a (A.3) Acknowledgement:

This research was supported by an NIH grant 5 R01 DC016236 03 from the NationalInstitute of Deafness and Other Communication Disorders (NIDCD).