A one-dimensional flow model enhanced by machine learning for simulation of vocal fold vibration
Zheng Li, Ye Chen, Siyuan Chang, Bernard Rousseau, Haoxiang Luo
TThis manuscript has been accepted for publication in Journal of the Acoustical Society of America
A one-dimensional flow model enhanced by machine learning forsimulation of vocal fold vibration
Zheng Li a , Ye Chen a , Siyuan Chang a , Bernard Rousseau b , Haoxiang Luo a , a Department of Mechanical Engineering, Vanderbilt University, 2301 Vanderbilt Place, Nashville, TN37235-1592 b Department of Communication Science and Disorders, University of Pittsburgh
Abstract
We describe a one-dimensional (1D) unsteady and viscous flow model that is derived from the momentumand mass conservation equations, and to enhance this physics-based model, we use a machine learningapproach to determine the unknown modeling parameters. Specifically, we first construct an idealized larynxmodel and perform ten cases of three-dimensional (3D) fluid–structure interaction (FSI) simulations. Theflow data are then extracted to train the 1D flow model using a sparse identification approach for nonlineardynamical systems. As a result of training, we obtain the analytical expressions for the entrance effect andpressure loss in the glottis, which are then incorporated in the flow model to conveniently handle differentglottal shapes due to vocal fold vibration. We apply the enhanced 1D flow model in the FSI simulation ofboth idealized vocal fold geometries and subject-specific anatomical geometries reconstructed from the MRIimages of rabbits’ larynges. The 1D flow model is evaluated in both of these setups and is shown to haverobust performance. Therefore, it provides a fast simulation tool superior to the previous 1D models.
Keywords: vocal fold vibration; fluid–structure interaction; subject-specific model; machine learning, 1Dmodel; phonation;
I. Introduction
Computational modeling of fluid–structure interaction (FSI) for vocal fold vibration is useful as it mayprovide a computer based tool for clinical management of voice disorders, e.g., surgical planning for vocalfold paralysis [1]. Despite that the underlying physical principle of vocal fold vibration is straightforward andcan be modeled simply using lumped-mass models, high-fidelity modeling to simulate details of the tissue’sdynamic deformation is still very challenging, especially if patient-specific features should be simulated forthe purpose of developing modeling tools that can capture differences in the laryngeal anatomy and tissueproperties of individuals. Current affiliation:Corning Inc., Corning, NY Corresponding author: [email protected] (E-mail), +1-615-322-2079 (Tel) a r X i v : . [ phy s i c s . f l u - dyn ] F e b ith tremendous growth of the computer power and improvement of the modeling approach, compu-tational models for the FSI of vocal fold vibration have been advanced substantially in recent years. Thesephysics-based models typically couple a 2D or 3D glottal airflow model and a finite-element representationof the vocal fold tissue [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. In addition, they adopt increasingly high resolu-tion and have provided insightful fundamental understanding about this FSI system such as eigenmodes [2],mechanics of posturing [3], sensitivity to the geometry and material properties [5, 8, 9, 10], vortex flow andpressure on the vocal fold surface [6, 7, 13], energy transfer [4, 7], and acoustic wave propagation [11, 12]. Inthese references, the vocal fold was often represented by a schematic that captures only the overall character-istics of the laryngeal geometry. Such models are obviously not sufficient for subject-specific representation.Medical imaging technologies, e.g., computed tomography (CT) and magnetic resonance imaging (MRI),allows the computational models to incorporate more realistic or even subject-specific laryngeal geometries.These imaging tools may provide detailed 3D anatomy of the larynx, as well as the interior structure of thetissue [14, 15, 16, 17]. Coupled with a 3D airflow solver in FSI simulation, the anatomical models of thelarynx represent a significant step toward patient-specific modeling of vocal fold vibration, which may beneeded for clinical care of voices of individual patients. In recent years, such patient-specific models havebeen developed to study vocal fold vibration by several researchers [18, 19, 20].On the other hand, the unknown tissue properties for individual subjects, which can not be identifiedfrom current imaging technologies, limit the application of patient-specific modeling. There have been priorefforts to derive the vocal fold tissue properties using finite-element method (FEM) based models combinedwith experimental tests [21, 22, 23], but they were limited to ex-vivo conditions. Although it is probableto determine the in-vivo , subject-specific tissue properties by running the high-fidelity FSI simulations andsolving an inverse problem, assuming that all other aspects in the FSI model match the corresponding in-vivo experiment (e.g., the anatomy and boundary conditions), such an approach is still not practical since the 3Dairflow simulation is too expensive even with high-performance parallel computing. More practically, onecould use a simplified flow model, coupled with a realistic FEM representation of the vocal fold for the FSIsimulation, to determine the elastic properties with much lower computational cost. Such efforts have beenmade recently in a few studies [20, 24, 25, 26]. In practice, this kind of simplified FSI models could also becombined with the high-fidelity models to increase the overall modeling accuracy [20].Bernoulli based flow equations have long been used for vocal fold vibration. Decker and Thomson [27]compared the Bernoulli equation with the Navier–Stokes equation for simulation of vocal fold vibration,and they found that the Bernoulli models could be highly dependent on the ad hoc assumption of the flowseparation in the glottis. Chang [28] also reached a similar conclusion, and they found the Bernoulli modelmay lead to a significantly different vibration mode of the vocal fold from a full Navier–Stokes model. Ingeneral, the main limitations in the Bernoulli principle are due to the assumption of the ideal flow in theglottis and a priori unknown location of flow separation. To address the limitations of the Bernoulli equation,in our recent works [29, 30] we developed a 1D momentum equation based flow model that was originally2esigned to solve separated flow in the collapsible tube [31, 32] and was only recently introduced for modelingof vocal fold dynamics [33]. In this model, we have included the effect of pressure loss that is caused by theflow separation and the viscous effect. Furthermore, we have included an entrance effect, which is due toan inertial flow entering the glottis from a rapidly converging shape in the subglottal region. This 1D flowmodel was coupled with the 3D tissue model for FSI simulation of both idealized and anatomical vocal foldgeometries from rabbits. In the idealized vocal fold cases, the reduced FSI model achieved consistent resultswith those from the full 3D FSI model for different medial vocal fold thicknesses, subglottal pressures, tissuemodels, and tissue stiffness properties. In the anatomical models, vibration results from the reduced FSImodel agreed well with the experimental data of the evoked in-vivo rabbit phonation [29, 30].However, the pressure loss and entrance effect in the glottal airflow depend on the overall shape of theglottis as well as its instantaneous deformation during vibration. In addition, the Reynolds number playsa significant role. Since no analytical expression exist to describe these effects precisely, in the previouswork [29, 30] we chose to use constant parameters based on the knowledge learned from 3D flow simulations.These parameters may need adjustment depending on specific vocal fold geometry, e.g., a large or smallmedial thickness, in order to achieve good accuracy. This limitation reduces generalization of the flowmodel. To overcome the limitation, we set variable parameters for the pressure loss and entrance effect andseek to express them in a functional form that is convenient to use.Machine learning techniques, which have gained popularity in fluid mechanics in recent years [34], providea useful approach to help determine the functional form of a physical effect, especially when general character-istics of such effect have been understood. This feature applies well to the situation we are considering. Herewe use the 1D Navier–Stokes equation to describe the flow physics and leave only the undescribed effects tobe determined empirically. These effects, i.e., the pressure loss and entrance effect in a converging-divergingglottis, can be expressed as functions of only a few dimensionless parameters such as the instantaneous shapeof the glottis and the Reynolds number of the flow. Once these functions are determined using a trainingdata set through machine learning, they can be incorporated into the 1D model for flow simulation of arbi-trary glottal shapes. Recently, there have been other studies that also incorporated machine learning intomodeling of vocal fold dynamics [26, 35, 36]. Compared to those studies, the present work takes advantageof the physics-based model, i.e., the unsteady viscous Navier-Stokes equation, albeit within the limitationsof 1D, as much as possible and only resort to machine learning for the remaining undescribed effects thatincludes the entrance effect.In fluid mechanics, many machine learning techniques have been developed to identify the governingequation directly from data [37, 38, 39, 34]. Here, we adopt the sparse identification of nonlinear dynamical(SINDy) systems [37], which is a regression algorithm suitable for the physical systems having only a fewrelevant terms to define the dynamics. Specifically, we will use the SINDy method to identify the pressureloss and entrance effect at locations near the glottal exit and express them as polynomial functions of theReynolds number, the channel length, and the convergent or divergent ratios of the glottis. To generate the3 igure 1: Schematic of airflow entering the glottis that generally has a converging-diverging shape. The flow experiences loss ofenergy and also a vena contracta effect, in which the effective area, A , may be smaller than the actual cross section area, A .The red-dashed lines indicate the boundary layer. training data for machine learning, we will use 3D FSI simulations of an idealized vocal fold with differentmedial thicknesses, stiffness properties, and subglottal pressures. Additional 3D simulation cases will beperformed for validation of the machine learning. To further assess the performance of the new flow model,we will compare other quantities such as the pressure distribution, flow rate, and vibratory characteristics inthe idealized model against the 3D simulation. Furthermore, we will apply the new model to FSI simulationof anatomical vocal fold geometries that are based on excised rabbit larynges. The simulation results willbe compared with high-speed video data from in-vivo phonation of the same larynx samples prior to theexcision. The descriptions of the model setup, machine learning procedure, and results from machine learningand FSI simulations are provided in the following sections. II. The flow model and training case setup
A. The one-dimensional viscous flow model
A 2D schematic of the geometry in the transverse plane of the larynx is shown in Fig. 1, where theglottis is depicted as a converging-diverging channel. To facilitate our discussion, we use x a and x b to markthe locations of the glottal inlet and exit, respectively, and x c the location of the minimal cross-section areain the glottis. Note that x c varies between x a and x b when the vocal fold is vibrating, and it could coincidewith x b such that the glottis is purely convergent. In practice, the location of x b is straightforward to chooseas the cross section typically experiences a sudden expansion at the glottal exit. On the other hand, thelocation of x a sometimes is not obvious since the subglottal region may narrow down gradually rather thanabruptly. We point out that from our tests, the present flow model is not sensitive to the inlet location if theglottis does not have a clear entrance location, in which case an approximate choice of x a would be sufficient.This is because the entrance effect of a gradually convergent section is small anyway, and in addition, thepressure loss primarily takes effect in the diverging section in the present model and has little dependenceon the inlet location. 4hen the air flows through the glottis during vibration, the pressure generally decreases before theminimal area section at x c due to the Bernoulli effect. After this section, the pressure would increase alongwith the area expansion. However, the pressure will not recover to its full extent because of the possibleseparation in the divergent section and also the viscous effect in the entire glottis. Therefore, accountingfor the pressure loss in the flow model will help overcome limitations of the Bernoulli equation. Consideringthe mass and momentum conservation equations, Cancelli and Pedley [31] developed a 1D flow model todescribe a collapsible tube. In the momentum equation, they included the viscous loss and separation effects.To generalize the pressure loss, we combine the viscous loss and separation effects as one single loss termrepresented by the shear stress, τ , in the following equation ∂A∂t + ∂Au∂x = 0 ρ ∂u∂t + ρu ∂u∂x = − ∂p∂x + ∂τ∂x (1)where ρ , u and p are respectively the density, velocity, and pressure, and A is the effective area of the crosssection. We will discuss the calculation of the shear stress τ ( x ) later using a machine learning approach. Forthe boundary conditions of the 1D flow, we set a specified subglottal pressure, P sub , and the pressure at theglottal exit, P e . Eq. (1) represents a nonlinear boundary value problem and can be solved using a shootingmethod once we have an expression for τ . Its numerical procedure was described in Li et al. [29].Beside the loss term, the vena contracta effect of flow entering an expansion was introduced in our flowmodel [29, 30]. In particular, as the air flows into the glottis, and especially the diverging section, it tendsto focus to the center under its inertia, rather than following the exact shape of the channel. Thus, we usethe effective cross section area, A , in Eq. (1) for the mass conservation equation. This area is smaller thanthe actual cross sectional area, A , as illustrated in Fig. 1. Without such an entrance effect, the negativepressure (gage pressure) at the minimum section could be overestimated, leading to inaccurate pressure loadon the vocal fold surface. To calculate the effective area A , we introduce a correctional coefficient, α ( x ), sothat A ( x ) = α ( x ) A ( x ) . (2)Note that α is a function of the streamwise location, x .In our previous work [29, 30], we estimated α ( x ) based on the 3D simulation of the FSI problem bycalculating it from α ( x ) = u avg /u , where u avg is the average streamwise velocity in the cross section and u isthe maximum streamwise velocity. We further assumed a quadratic function form for α ( x ) with a single freeparameter to be determined through machine learning, as will be discussed in next section. The quadraticfunction represents narrowing down of the effective area due to the growth of the boundary layer along theglottis. 5 . Input variables for machine learning To outline the flow model for machine learning, we use a simplified but characteristic geometry of theglottis and define the input–output variables for the machine learning module. Fig. 2(a) shows a 2D schematicof the glottis. We define five locations along the flow, which are: 1) the glottal inlet x a , 2) the glottal exit x b ,3) the narrowest area location x c , 4) a point in the subglottal region x d , and 5) an intermediate location inthe divergent section, x e . The average gap width at the narrowest section is denoted as H , so H = A ( x c ) /L ,where A ( x c ) is the cross section area at x c and L is the longitudinal length of the glottis. These locations, x a , x b , x c , and the corresponding cross-sectional area, A ( x a ), A ( x b ), and A ( x c ), describe the overall converging-diverging shape of the glottis. In addition to these variables, x d and A ( x d ) are used to describe the slopeof the subglottal region, which is useful in measuring the extent that the flow is focused when entering theglottis. Previous study has shown that the geometry of the glottal entrance has a significant influence onthe intraglottal pressure distribution [40]. Here we set x d at a distance of 5 H from x a to capture the slopeof the subglottal shape. Furthermore, the intermediate point x e is added so that the pressure loss at thislocation will be determined as one output variable as described in next section. We set x e to be closer to x b , with a distance ratio of 3:1 between | x c − x e | and | x b − x e | , since the pressure loss increases more quicklynear the glottal exit, as illustrated in Fig. 2(b). It is worth pointing out that the exact location of x d and x e are not crucial, as x d is used to calculate the subglottal slope of the vocal fold and x e is used to provideanother data point for the pressure loss estimate.In terms of nondimensional parameters, the geometric variables along with the Reynolds number aredefined as below, r b = A ( x b ) /A ( x c ) l bc = L bc /Hr a = A ( x a ) /A ( x c ) l ac = L ac /Hr d = A ( x d ) /A ( x a ) Re / = (cid:20) ρ u c Hµ (cid:21) / (3)Among these six variables, r b , r a , and r d are the area ratios, l bc and l ac are the normalized distances. TheReynolds number is defined using the velocity at the narrowest section, u c , and is reduced to the power 1 / C. Output variables for machine learning
To determine the pressure loss, or the shear stress term in Eq. (1), we assume that τ = 0 and d τ / d x = 0at the glottal inlet x a , τ = τ e at the intermediate point x e , and τ = τ b at the exit x b . The two unknownvariables, τ e and τ b , will be determined using machine learning as functions of the six input variables described6a) (b) x c x d x b x e x a H5H xL ac L bc sectionMinimum x a x e x b x a x c x b x e e τ b τ x c x c τα / α ( ) xx Figure 2: (a) Geometric description of the glottis used in the machine learning. (b) Functions of the pressure loss, τ ( x ), andthe area correction coefficient, α ( x ), along the glottis. in Eq. (3). Once τ e and τ b are determined from machine learning, we assume a cubic distribution for τ ( x )from the glottal inlet x a to the exit x b as shown in Fig. 2(b). This assumption is made based on observationof general characteristics of the pressure loss from our 3D flow simulations [30]. Note that a higher orderdistribution is also possible using the same strategy, if more output variables are used from machine learning.For the entrance effect, we need to determine the area correction coefficient, α ( x ), along the glottis.Similar to our previous publications [29, 30], we assume a quadratic distribution for α ( x ) between x c and x b . However, in the present study we will only need to determine α/α ( x c ) since only the relative area ratiois needed when solving Eq. (1). Furthermore, α ( x ) is assumed to have zero derivative at x b . Therefore,we will only need to determine α ( x b ) /α ( x c ) through machine learning. In summary, there are three outputvariables for the machine learning process, which are τ e , τ b , and α ( x b ) /α ( x c ). D. The SINDy method for machine learning
For machine learning, we use sparse identification of nonlinear dynamical systems (SINDy) [37], which isa data regression approach to discover governing equations for nonlinear dynamical systems including fluidflows. In particular, SINDy uses sparse regression to determine the fewest terms in the dynamic governingequations required to accurately represent the data, and this results in parsimonious models that balanceaccuracy with model complexity to avoid overfitting [37]. The key assumption in this approach is that formany systems of interest, the governing equation consists of only a few terms, making it sparse in the spaceof possible functions [37]. Using training data that will be described in next section, SINDy can determinea generic output variable, f , as a polynomial function of the six input variables defined in Eq. (3), i.e., f = f ( r b , l bc , r a , l ac , r d , Re / , r b , r b l bc , · · · , Re / , · · · ) (4)7 igure 3: The vocal fold model and computational domain used for 3D FSI simulation and data generation.Table 1: FSI cases setup for data generation. Cases 1-6 are used for training, and Cases 7-10 for additional validation. Case 1 2 3 4 5 6 7 8 9 10 T (mm) 1.75 3.50 1.75 3.50 P sub (kPa) 1.00 1.25 0.75 1.25 0.75 α (kPa) 2.29 2.58 9.16 2.29 4.58 9.16 2.29 α (kPa) 0.25 0.50 1.00 0.25 0.50 1.00 0.25The software package of SINDy in Matlab is freely available by the authors [37] and is used here for ourstudy. For our study, only the terms up to the third order are retained in this polynomial function. E. Setup of the 3D FSI model and data generation
To generate training data for machine learning, we use a previous 3D setup of the FSI of an idealized vocalfold geometry as illustrated in Fig. 3. This model has been described in our previous publications [9, 29, 30]and is only briefly summarized here. The airflow is driven by a constant subglottal pressure P sub , and theoutlet has a reference pressure of P out = 0 kPa for all the cases in consideration. The air is assumed to beincompressible and is governed by the viscous Navier-Stokes equation. A pair of vocal fold bands are placedsymmetrically in the channel, whose length, width, and depth are L = 20 mm, W = 13 mm, and D = 10 mm,respectively. The medial thickness, T , has significant effects on the flow and the vocal fold vibration [28],as the medial surfaces are the primary loading surfaces for the sustained vibration. The vocal fold hereis assumed to be isotropic, homogeneous, and is governed by a hyperelastic, two-parameter Mooney-Rivlinmodel. The strain energy density function for this model is given as W = α ( I −
3) + α ( I −
3) + K/ J − (5)where K represents the bulk modulus, α and α are material constants related to the distortional response,and J = det( F ) with F standing for the deformation gradient. In addition, I and I are invariants basedon J and the principal stretches of the deformation gradient. Further detail of this model for the vocal fold8an be found in our group’s previous work [41]. Anisotropic tissue behavior or a multi-layer structure asproposed in many previous works [2, 7, 10] would be better representation of the real tissue of vocal fold.However, we only need characteristic vocal fold deformation and corresponding flow data for the trainingpurpose; thus, the specific material model for the vocal fold tissue is not essential in this study.To solve the 3D FSI, an in-house immersed-boundary method is employed for the flow simulation,while the tissue deformation is solved with a finite-element method [41]. In total, we solved ten simulationcases after a careful mesh independence study [29, 30]. Table 1 shows the details for all the test cases,which contains variations in these parameters: the medial thickness T , the subglottal pressure P sub , andthe material stiffness constants α and α . The tissue density is ρ s = 1040 kg/m and mass damping is0.05 s − in all the cases. The air density is ρ = 1 .
13 kg/m . Thus, the characteristic intraglottal velocity is V = (cid:112) P − P out ) /ρ = 42 . Re J = ρV d/µ , where d ∼ µ is the air viscosity. In the current study,we set Re J = 210.Cases 1-6 are utilized in SINDy as training data, which have the same P sub but two different medialthicknesses and three stiffness constants. After steady vibration is established in the 3D FSI simulations,80 time frames of data in each case, which cover at least 2 vibration cycles (from 8 to 16 milli-seconds,depending on the frequency), are used for training. That is, at each chosen time frame, the instantaneousvalues of the six input variables and the three output variables represent a data point for machine learning.To calculate these input and output variables, we extract the flow velocity and pressure along the centerlineof the 3D flow field. In total, there are 480 data points for all 6 cases together for training. We will comparethe output from the regression (i.e., the equations derived from the machine learning) with those providedfor training. To extend the validation, we consider Cases 7-10, in which the subglottal pressure is differentand whose data have not been used for training, for further assessment. For all ten cases, we will also usethe machine learning enhanced 1D flow model to replace the 3D flow and perform FSI simulations, and wewill compare the vibration frequency, amplitude and phase delay of the vocal fold between 3D FSI and thesimplified FSI model. The entire procedure is shown using a flow chart in a supplementary figure [42]. III. Results and discussion
A. Results from machine learning
After the training process, we obtain the explicit expressions of the output variables as defined inSection C. These expressions are given in the appendix A. Results of data regression are shown in Fig. 4for τ b / ( ρu c ), τ e / ( ρu c ), and α ( x b ) /α ( x c ), where u c is the centerline velocity at the minimum section x c . Inthis figure, the x -axis represents the 3D FSI results of Cases 1 to 10 and the y -axis represents the predictedvalue based on the trained regression model, i.e., Eqns. (A.1) to (A.3). Red symbols represent training datafrom Cases 1-6, while blue symbols are the validation data from Cases 7-10. Ideally, the predicted value isequal to the 3D FSI value so that all data points would fall on the dashed line y = x in the figure. However,9 igure 4: Comparison between 3D FSI results and the predicted results from data regression. Data in Cases 1-6 (red symbols)are used as training data; Data in Cases 7-10 (blue symbols) are only used for validation. (a) Pressure loss at the glottal exit, τ b / ( ρu c ). (b) pressure loss at x e , τ e / ( ρu c ). and (c) the entrance effect ratio at x b , α ( x b ) /α ( x c ). The dashed line representsthe ideal fit. due to the error in the fitting process, the data are scattered around the line. From this figure, we can seethat both the training data and the validation data mostly cluster around y = x ; thus, the predicted resultsfrom regression agree reasonably well with the 3D results. We calculate the mean error between the machinelearning result and the benchmark result from the 3D simulations for each of these three output variables,i.e., | φ ML − φ D | . For τ b / ( ρu c ), the mean error of the training data and mean error of the validation dataare 0.029 and 0.033, respectively; for τ e / ( ρu c ), they are 0.057 and 0.086, respectively; and for α ( x b ) /α ( x c ),they are 0.048 and 0.078, respectively.In Fig. 4(a,b), we see that the loss coefficients in the glottis, τ b / ( ρu c ) and τ e / ( ρu c ), vary from nearly zeroto nearly three. The zero value means that there is no loss in the flow, while three represents a significantloss in the flow. Such great loss happens when the glottis is nearly closed and the flow speed becomes small,which is analogous to a closing mechanical valve in a pipe flow.In Fig. 4(c), we see that α ( x b ) /α ( x c ) is clustered above 0.5, indicating that the entrance effect at theglottal exit is not necessarily significant at those time frames. Further examination shows that in thosesituations, the glottis has a small divergent angle or has a short divergent section, which does not create astrong entrance effect. However, there are also a number of data points in this figure where α ( x b ) /α ( x c ) isbelow 0.5 and is even close to 0.2. For those situations, the divergent section is typically long and/or hasa large diverging angle, which causes early flow separation and strong entrance effect. The distribution ofthe data points in this figure thus covers a wide range of situations for the flow, which is preferred for thetraining purpose. To further illustrate the data variation and validation of the machine learning, we haveadded a supplementary figure plotting the three output variables against time for the validation cases [43].Once the expressions for τ b , τ c , and α ( x b ) /α ( x c ) are derived from machine learning, we can applythem in the 1D flow model. In doing so, τ ( x ) and α ( x ) in the glottis take the assumed distribution as inSection C. That is, we assume a cubic function for τ ( x ) as shown in Fig. 2(b), where τ = 0 and d τ / d x = 0at x a . When x c is very close to x b , i.e., the glottis is almost purely convergent, the cubic interpolation may10 igure 5: Comparison of pressure distribution along the centerline between 3D FSI and 1D flow model for small medial thickness T cases. (a) Closing phase in Case 7, (b) opening phase in Case 7; (c) closing phase in Case 8, (d) opening phase in Case 8.The inset shows the corresponding vocal fold deformation (solid lines) from its initial configuration (dashed lines). result in overshoot of the function. Thus, when | x c − x b | is less than H , we disregard τ e and switch thecubic function for τ ( x ) to the quadratic function with the same boundary conditions at x a . For the areacorrection coefficient α ( x ), we assume a quadratic function between x c and x b , while requiring d α/ d x = 0at x b , as shown in Fig. 2(b). Between the glottal inlet x a and the minimum section x c , the thickness of theboundary layer has only small change, so we assume that α ( x ) /α ( x c ) = 1 in that region.To verify the 1D flow model enhanced by machine learning, we first compared the pressure distributionpredicted by this model against 3D results generated from cases that are not used in the training process.More specifically, we use the instantaneous glottal shape obtained from the 3D FSI simulations of Cases 7 to10 and calculate the pressure distribution using the 1D flow model in Eq. (1). Then, the result is comparedwith the pressure at the centerline extracted from the 3D flow field.Figure 5 shows such comparison of pressure distribution at vocal fold opening and closing phases fromCases 7 and 8, which have the same medial thickness of T = 1 .
75 mm but different P sub . It can be seenthat the pressure produced by the 1D flow model is close to that from the 3D simulation for different P sub values, including the negative pressure during the opening phase when the glottis is of divergent shape. Fromour previous study [29], correct prediction of the negative pressure in this case of small medial thickness is11 igure 6: Comparison of pressure distribution along the centerline between 3D FSI and 1D flow model for large medial thickness T cases. (a) Closing phase, (b) opening phase, and (c) maximum opening in Case 9; (d) closing phase, (e) opening phase, and(f) maximum opening in Case 10. The inset is explained in Fig. 5. .205 0.21 0.215 0.22 Time (s) F l o w r a t e ( m l / s ) (d)
3D FSI1D FLOWRefs. [29, 30]
Time (s) F l o w r a t e ( m l / s ) (c)
3D FSI1D FLOWRefs. [29, 30]
Time (s) F l o w r a t e ( m l / s ) (b)
3D FSI1D FLOWRefs. [29, 30]
Time (s) F l o w r a t e ( m l / s ) (a)
3D FSI1D FLOWRefs. [29, 30]
Figure 7: Comparison of the volume flow rate between 3D FSI and 1D flow model for (a) Case 7, (b) Case 8, (c) Case 9, (d)Case 10, which are not in the training data set. important; otherwise, the vocal fold may exhibit a different vibration mode that is associated with the firsteigenmode of the tissue structure [29].Figure 6 shows comparison of the pressure distribution for Cases 9 and 10, which have the same medialthickness of T = 3 .
50 mm but different P sub . In these cases, the glottis is relatively long and may formdiverging, converging, and converging-diverging shapes at different vibration phases. From our previousstudy [29], if a Bernoulli equation based model is used, then an inappropriate setting of the location forflow separation may lead to exceedingly negative pressure that might destabilize the FSI simulation. In thepresent study, the 1D flow model correctly predicts the negative pressure zone (e.g., Fig. 6(c) and (f)) andcaptures the pressure reasonably well at different glottal shapes.In Figs. 5 and 6, we also include the pressure distributions calculated by the untrained 1D flow modelwith ad hoc assumptions in pressure loss and entrance effect [29, 30]. Overall, the current 1D flow modeltrained by machine learning achieves better accuracy as compared with those reference results.Besides the pressure distribution, the flow rate is also important for the glottal airflow. Figure 7 comparesthe volume flow rate between 3D FSI and 1D flow model for Cases 7 to 10 that are not used in the trainingprocess. Similar to the pressure distribution, we calculate the volume flow rate in 1D flow model using thesame glottal shape from the 3D simulation. Two representative cycles are selected for each case comparison.In Cases 9 and 10 where the medial thickness is larger, the vocal fold has better closure, and thus the flow13 igure 8: Comparison between 3D FSI and reduced-order model: (a) amplitude, (b) frequency, (c) phase delay. rate has greater oscillations and is reduced to nearly zero at closure. On the other hand, in Cases 7 and 8where the medial thickness is smaller, the vocal fold maintains a significant gap at the closing phase, whichleads to a high flow rate and low-magnitude oscillations during vibration. In all cases under consideration,the flow rate from 1D flow model agrees well with 3D FSI result.As a reference, we also include the volume flow rate in Cases 7-10 calculated using the untrained 1D flowmodel [29, 30]. Comparing with the reference result, the flow rate from the present model shows significantlybetter agreement with the 3D FSI result.Therefore, the present 1D flow model provides reliable predication for the volume flow rate during vocalfold vibration. B. Application in the FSI of idealized vocal fold models
After verifying the 1D flow model for flow calculation only, we then apply it in the FSI simulation bycoupling it with the 3D idealized vocal fold model described in Section A. We compare the vibrationcharacteristics from this 1D-flow/3D-solid hybrid FSI simulation with those from the full 3D FSI simulation.For this comparison, all ten cases in Table 1 are considered, which include variations in the medial thickness,stiffness properties, and subglottal pressure. As mentioned before, Cases 7 to 10 were not used in the trainingdata but included here as additional validation.We use the vibration amplitude, frequency, and phase delay for quantitative comparison between thetwo sets of simulations. The vibration amplitude is defined as the maximum y -displacement of the vocal foldmeasured at the glottal exit in a cycle. From Case 1 to Case 3, the tissue stiffness increases while the otherparameters are the same. As shown in Fig. 8(a), the vibration amplitude decreases with increasing tissuestiffness from Case 1 to 3, which have a smaller medial thickness. Similar trend can be seen from Case 4 to6, which have a larger medial thickness. Cases 7, 1 and 8 have different subglottal pressures from high tolow. Correspondingly, their vibration amplitude shows a decrease in the same order. A similar result canbe seen for Cases 9, 4, and 10 in which P sub is decreased. In all ten cases, the two sets of FSI simulationsproduce closely agreeable results. 14 able 2: 1D-flow FSI results compared with 3D FSI in terms of vibration frequency f , amplitude d , and phase delay φ . Resultsusing the untrained model in Ref. [29] are also included. Model f (Hz) difference d (mm) difference φ ( ◦ ) difference ( ◦ )Case 1 3D FSI 132 - 0.69 - -19 -1D-flow FSI 135 2.3% 0.65 5.8% -15 4Ref [29] 144 9.1% 0.58 15.9% -15 4Case 4 3D FSI 140 - 1.02 - 157 -1D-flow FSI 136 2.9% 1.04 2.0% 159 2Ref [29] 144 2.9% 1.00 2.0% 161 4Avg. error(Cases 1-10) 1D-flow FSI - 2.3% - 5.3% - 6Ref [29, 30] - 3.9% - 9.1% - 11The comparison of the vibration frequency is shown in Fig. 8 (b). The second-eigenmode type vibra-tion [7] is established in all the cases, where the vocal fold oscillation is primarily in the lateral or y -direction,and this mode is captured by the 1D-flow FSI simulation. Thus, the frequency predicted by this hybrid FSImatches that by the full 3D FSI. From the figure, the effect of the tissue stiffness on the vibration frequencyis clear, e.g., from Case 1 to 3, and from Case 4 to 6, where an increase of the tissue stiffness leads to increaseof the vibration frequency.The phase delay is calculated using the temporal difference between the glottal inlet and the exit in themid xy -plane in terms of the displacement. Good agreement is again achieved in the comparison betweenthe two sets of simulations for all cases. For this quantity, the most influential parameter is the medialthickness, as the longer glottis would create greater phase difference between the glottal inlet and the exitand thus lead to a more pronounced mucosal wave along the glottis.To compare the performance of the present flow model with that of the untrained flow model, we includethe results from a previous study [29, 30]. This comparison is shown in Table 2. Only one case each fromthe large T and small T is shown here for brevity. We also include the average differences for the frequency,amplitude and phase delay from Cases 1-10. If we only consider Cases 7-10, the average error is 3.3%, 6.3%and 5.2 ◦ , for the frequency, amplitude, and phase delay, respectively, which is consistent with the overallaverage in Table 2. From the comparison, it can be seen that the trained flow model leads to better accuracyin the predication of the vibration than the previous untrained model. C. Application in the subject-specific vocal fold models
Other than the idealized vocal fold geometry, we also apply the 1D flow model in the FSI simulation ofthe subject-specific vocal fold models that were generated based on 3D scan of rabbits’ larynx (Fig. 9). Inthe present study, we utilize the vocal fold models created previously in Chang et al. [20] and will validatethe simulation results against the experimental data of evoked in-vivo phonation. The same models were15 igure 9: Subject-specific vocal fold model: (a) reconstructed larynx geometry from a superior view; (b) profiles of the vocalfold cover (blue) and body (red) segmented from the MRI scan. also used in Chen et al. [30] to validate the 1D flow model without machine learning. Readers are referred toChang et al. [20] for the details how these anatomical models were created and how the in-vivo measurementof the vocal fold vibration was conducted. Only a brief summary is given here to provide the context.In the experiment [17], live rabbits were used in the study; their vocal fold was surgically sutured toachieve adduction, and phonation was evoked by introducing pressurized air from their trachea. High-speedvideos of the vocal fold vibration were taken during the experiment, which provide the vibration frequency,magnitude, and waveform as the validation data for our current study.After the phonation experiment, the rabbit larynx was excised and high-resolution MRI was performedto obtain details of morphology of the vocal fold while the vocal fold maintained the adducted phonatoryposition. The 3D anatomical vocal fold model was generated for each of the five samples after manualsegmentation from the MRI data and surface mesh reduction/smoothing [20]. Furthermore, the tissueproperties were estimated in that study through simulations, and these properties of the five samples can befound in our previous publications [20, 30].To perform the hybrid 1D-flow/3D-solid FSI simulation, we couple the FEM representation of eachanatomical vocal fold with the present trained 1D flow model. The subglottal pressure in each case isobtained from the experiment and is in the range of 0.72 to 1.05 kPa [20]. Figure 10 shows a comparisonof the normalized glottal gap width in a sequence of vocal fold oscillations between the FSI simulation andthe experiment. Because the high-speed imaging does not provide a length scale, we use the normalized gapwidth, d/d max , for comparison, where d is the gap width of the glottis measured at mid-section and d max is its peak value. From this figure, it can be seen that the waveform obtained from the simulation agreesgenerally well with the experiment. In modeling the vocal fold contact, we maintain a minimum glottal gapof 0.02 mm between the sides for the flow [6]. Thus, the waveform from the FSI simulation does not have fullclosure. For quantitative comparison, we further compute the normalized root-mean-square (r.m.s.) errorof the waveform between the simulation data and experiment for each sample. The result is listed for all16 igure 10: Waveforms of the normalized glottal gap width from the in-vivo phonation experiment and the FSI simulation forthe rabbit sample R1.Table 3: The normalized r.m.s. error of the gap width waveform for each rabbit sample. Results from a previous study [30]with the untrained flow model is also included. Sample R1 R2 R3 R4 R5Error 13.7% 10.3% 12.7% 14.5% 14.0%Error [30] 14.3% 11.3% 15.3% 15.9% 16.9%five samples in Table 3, which shows that the error is within 15% for all cases. As shown in the table, theser.m.s errors are lower than the results in the previous work, where the untrained flow model was used [30].Therefore, the trained flow model provides improved results and reasonable prediction of the vibration forthese subject-specific models.Figure 11 further shows a quantitative comparison between the experiment and the FSI simulation forall five samples in terms of the vibration frequency and normalized amplitude, d max /L , where L is the vocalfold length. In the experiment, each sample had three trials to analyze the standard deviations [17, 20].From this figure, both the frequency and the amplitude from the simulation fall within the range of theexperimental data for all the five samples despite significant variations among the individual subjects. Thisresult again confirms the performance of the present 1D flow model in the FSI simulation. D. Discussion
In contrast with 3D computational fluid dynamics models that employ extensive computing resourcesand require substantially longer simulation time, the drastically simplified flow models such as the present 1Dmodel offers much faster turnaround and may be used in conjunction with the 3D models as a complementary17 igure 11: Frequency and amplitude comparison between the experimental and numerical results for five rabbit samples. fashion for model-based prediction. The performance of such models could be evaluated in terms of theiraccuracy, robustness, and required information during practical implementation.Using a similar set of partial differential equations based on momentum and mass conservation, thepresent 1D flow model retains the viscous and entrance effects of the flow model from the previous studiesof Luo and coworkers [29, 30] and hence offers similar advantages shown therein in comparison with thetraditional Bernoulli based models. In the present model, we have incorporated machine learning to generalizethe pressure loss and entrance effect in the glottis. Those effects were only assumed by ad hoc mannerspreviously in the untrained model [29, 30] and may need adjustment in practical use. For example, previouslythe shear stress related to flow separation, τ χ , is modelled as τ χ = As ( 1 − χ ) ρu ∂u∂x , where s is the perimeteraround the cross section, A is the effective cross section area, and 0 ≤ χ ≤ χ has to be adjusted empiricallyfor a divergent channel in order to achieve matching results to the 3D model. In addition, the area correctionalcoefficient, α ( x ), has an adjustable constant C in Eq. (4) of Ref. [30]. In the present study, the parameters inthe shear stress and the area correctional coefficient have been expressed in explicit functions of the Reynoldsnumber and a few geometrical parameters describing the instantaneous shape of the glottis through a dataregression procedure and thus has better capability of generalization.Using the idealized vocal fold geometry, we have demonstrated that the new flow model provides betterprediction of the pressure distribution and flow rate than the previous untrained model (Section 3.1). Whenapplied to the FSI simulation, the new flow model leads to clearly more accurate predication of the vibrationfrequency, amplitude, and phase delay in the vocal fold dynamics (Section 3.2). However, when it is appliedto the subject-specific vocal fold geometries, the new flow model offers only small or limited improvement ascompared with the previous untrained model (Section 3.3), where the normalized r.m.s for all samples rangesfrom 10% to 15%. The reason for this limitation is mostly likely due to presence of many other uncertain-18ies related to this type of subject-specific models, e.g., quality of the MRI data, the segmentation errors,assumption of the tissue properties, as well as the experiment itself, which start to become predominantfactors over the numerical model’s own error. In that case, improving the numerical model alone will notfurther increase the overall accuracy.Despite being unable to substantially improve the accuracy in the subject-specific cases, the presentflow model is still advantageous as compared with the similar existing models. We emphasize that fromthe idealized geometries (including significant variations in the medial thickness, subglottal pressure, andthe tissue stiffness) to the anatomical geometries, the current 1D flow model uses exactly the same pressureloss and entrance effect functions that are derived from machine learning, and there is no need to make anyparameter adjustment. This feature of robustness, the overall improved accuracy in all the tests, as well asthe fact the present model does not require any additional input information, indicate that the present flowmodel has significant better performance than the previous untrained model [29, 30]. IV. Conclusion
In this study, we have presented a new 1D flow model for glottal airflow that is based on the viscous flowassumption. As compared with similar models in the previous studies, in the current flow model we derivethe pressure loss and the entrance effect using a machine learning approach and express them as explicitfunctions of the Reynolds number and the parameters describing the characteristic shape of the glottis atany instantaneous moment. Unlike previous models in which the parameters need to be modified ad hocfor different cases such as the vocal fold geometry, the present machine-trained model can be used for moregeneral situations without the need to modify its parameters. We have tested the performance of this 1Dflow model in three scenarios. First, we use this flow model to calculate the pressure distribution and thevolume flow rate using the glottal configuration from the 3D FSI simulations not included in the trainingprocess. The results agree well with those directly from the 3D simulations, and they are significantly betterthan those from the previous untrained model. Second, the 1D flow model is coupled with the 3D idealizedvocal fold geometry to perform the hybrid 1D-flow/3D-solid FSI simulation, and the results show that thevibration characteristics match the full 3D FSI simulation results significantly better than the untrainedmodel in terms of the vibration frequency, amplitude, and phase delay. Third, we applied the 1D flow modelto the subject-specific vocal fold models constructed from the rabbit larynx, and the FSI simulation resultsare compared against the previous in-vivo experimental data. Even though in this case the improvement islimited likely due to the presence of uncertainties, the new model achieves the accuracy performance withoutthe need to adjust its parameters.In summary, we conclude that the present 1D glottal airflow model enhanced by machine learning is moreaccurate and robust than the similar models in the previous studies and could be useful for efficient modelingof vocal fold dynamics, e.g., estimate of the unknown tissue properties of an individual subject’s vocal foldusing model-based simulations, or design optimization of the surgical implant inserted into a paralyzed vocal19old.
Appendix A. Expressions from sparse regression
The following formulae are the expressions derived from the SINDy method for the pressure losses τ b , τ e , and the area correction coefficient α ( x ) at x b . τ b ρu c = 5 . r b − . l bc − . Re / + 3 . r d + 0 . r a − . l ac − . r b − . r b r d − . r b r a + 0 . l bc r a + 0 . Re / + 0 . Re / r a − . r d − . r d r a + 0 . r a + 0 . r b r d − . r b r a − . Re / r a (A.1) τ e ρu c = 20 . r b − . l bc + 0 . Re / − . r d − . r a − . l ac − . r b + 0 . r b l bc − . r b Re / − . r b r d − . r b r a + 0 . l bc Re / + 0 . l bc r d + 0 . l bc r a + 0 . r d r a + 0 . r a + 0 . r b + 0 . r b Re / + 0 . r b r d − . r b l bc r d + 0 . r b Re / + 0 . r b Re / r a + 0 . r b r d − . r b r a − . l bc r d − . l bc r d r a − . Re / r a (A.2) α ( x b ) α ( x c ) = 1 . r b − . l bc + 0 . Re / + 1 . r d + 0 . r a − . r b + 0 . r b l bc − . r b Re / − . r b r d − . r b r a + 0 . l bc r d + 0 . l bc r a − . Re / l ac − . r d − . r d r a + 0 . r d l ac + 0 . r b − . r b l bc + 0 . r b r d + 0 . r b r a − . r b l bc r d − . r b l bc r a + 0 . r b Re / l ac + 0 . r b r d + 0 . r b r d r a − . r b r d l ac − . l bc r d − . l bc r d r a − . l bc r a (A.3) Acknowledgement:
This research was supported by an NIH grant 5 R01 DC016236 03 from the NationalInstitute of Deafness and Other Communication Disorders (NIDCD).