Identification of the Isotherm Function in Chromatography Using CMA-ES
Mohamed Jebalia (1), Anne Auger (1), Marc Schoenauer (1), François James (2), Marie Postel (3)

1. TAO Project-Team, INRIA Futurs, LRI, Orsay; 2. Mathématiques, Applications et Physique Mathématique d'Orléans; 3. Laboratoire Jacques-Louis Lions, UPMC, Paris. {jebalia,auger,marc}@lri.fr, [email protected], [email protected]
Abstract— This paper deals with the identification of the flux for a system of conservation laws in the specific example of analytic chromatography. The fundamental equations of the chromatographic process are highly nonlinear. The state-of-the-art Evolution Strategy, CMA-ES (the Covariance Matrix Adaptation Evolution Strategy), is used to identify the parameters of the so-called isotherm function. The approach was validated on different configurations of simulated data using one-, two- or three-component mixtures. CMA-ES is then applied to real data cases and its results are compared to those of a gradient-based strategy.
I. INTRODUCTION
The chromatography process is a powerful tool to separate or analyze mixtures [6]. It is widely used in the chemical industry (pharmaceutical, perfume and oil industry, etc.) to produce relatively high quantities of very pure components. This is achieved by taking advantage of the selective absorption of the different components in a solid porous medium. The moving fluid mixture is percolated through the motionless medium in a column. The various components of the mixture propagate in the column at different speeds, because of their different affinities with the solid medium. The art of chromatography separation requires predicting the proportions of every component of the mixture at the end of the column (called the chromatogram) during the experiment. In the ideal (linear) case, every component has its own fixed propagation speed that does not depend on the other components. In this case, if the column is sufficiently long, pure components come out at the end of the column at different times: they are perfectly separated. But in the real world, the speed of a component heavily depends on every other component in the mixture. Hence, the fundamental Partial Differential Equations of the chromatographic process, derived from the mass balance, are highly nonlinear. The process is governed by a nonlinear function of the mixture concentrations, the so-called Isotherm Function. This function computes the absorbed quantity of each component w.r.t. all other components.

Mathematically speaking, thermodynamical properties of the isotherm ensure that the resulting system of PDEs is hyperbolic, and standard numerical tools for hyperbolic systems can hence be applied, provided the isotherm is known. The precise knowledge of the isotherm is crucial, both from the theoretical viewpoint of physico-chemical modeling and regarding the more practical preoccupation of accurately controlling the experiment to improve separation. Specific chromatographic techniques can be used to directly identify the isotherm, but gathering even a few points requires several months of careful experiments. Another possible approach to isotherm identification consists in solving the inverse problem numerically: find the isotherm such that numerical simulations result in chromatograms that are as close as possible to the actual experimental outputs.

This paper introduces an evolutionary method to tackle the identification of the isotherm function from experimental chromatograms. The goal of the identification is to minimize the difference between the actual experimental chromatogram and the chromatogram that results from the numerical simulation of the chromatographic process. Chemical scientists have introduced several parametric models for isotherm functions (see [6] for all details of the most important models). The resulting optimization problem hence amounts to parametric optimization, which is addressed here using the state-of-the-art Evolution Strategy, CMA-ES. Section II introduces the direct problem and Section III the optimization (or inverse) problem. Section IV-A reviews previous approaches to the problem based on gradient optimization algorithms [13], [12]. Section IV-B details the CMA-ES method and the implementation used here. Finally, Section V presents experimental results: first, simulated data are used to validate the proposed approach; second, real data are used to compare the evolutionary approach with a gradient-based method.

II. PHYSICAL PROBLEM AND MODEL
Chromatography aims at separating the components of a mixture based on the selective absorption of chemical species by a solid porous medium. The fluid mixture moves down through a column of length $L$, considered here to be one-dimensional. The various components of the mixture propagate in the column at different speeds, because of their different behavior when interacting with the porous medium. At a given time $t \in \mathbb{R}^+$, for a given $z \in [0, L]$, the concentration of the $m$ species is a real vector of $\mathbb{R}^m$ denoted $c(t, z)$. The evolution of $c$ is governed by the following partial differential equation:

$$ \partial_z c + \partial_t F(c) = 0, \quad c(0, z) = c_0(z), \quad c(t, 0) = c_{inj}(t), \qquad (1) $$

where $c_0 : \mathbb{R} \to \mathbb{R}^m$ is the initial concentration, $c_{inj} : \mathbb{R} \to \mathbb{R}^m$ the injected concentration at the entrance of the column, and $F : \mathbb{R}^m \to \mathbb{R}^m$ is the flux function, which can be expressed in the following way:

$$ F(c) = \frac{1}{u} \left( c + \frac{1 - \epsilon}{\epsilon} H(c) \right), $$

where $H : \mathbb{R}^m \to \mathbb{R}^m$ is the so-called isotherm function, $\epsilon \in (0, 1)$ and $u \in \mathbb{R}^+$ [12]. The Jacobian matrix of $F$ being diagonalizable with strictly positive eigenvalues, the system (1) is strictly hyperbolic and thus admits a unique solution as soon as $F$ is continuously differentiable and the initial and injection conditions are piecewise continuous. The solution of Eq. 1 can be approximated using any finite difference method that is suitable for hyperbolic systems [5]. A uniform grid in space and time of size $(K+1) \times (N+1)$ is defined: let $\Delta z$ (resp. $\Delta t$) be such that $K \Delta z = L$ (resp. $N \Delta t = T$). Then an approximation of the solution of Eq. 1 can be computed with the Godunov scheme:

$$ c^n_{k+1} = c^n_k - \frac{\Delta z}{\Delta t} \left( F(c^n_k) - F(c^{n-1}_k) \right), \qquad (2) $$

where $c^n_k$ is an approximation of the mean value (over the volume defined by the corresponding cell of the grid) of the solution $c$ at point $(k \Delta z, n \Delta t)$; a one-component sketch of one step of this scheme is given below. For a fixed value of $\Delta z / \Delta t$, the solution of Eq. 2 converges to the solution of Eq. 1 as $\Delta t$ and $\Delta z$ converge to zero. The numerical scheme given in Eq. 2 is numerically stable under the so-called CFL condition, stating that the largest absolute value of the eigenvalues of the Jacobian matrix of $F$ is upper-bounded:

$$ \frac{\Delta z}{\Delta t} \max_c Sp(|F'(c)|) \leq \mathrm{CFL} < 1. \qquad (3) $$
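The following minimal C++ sketch illustrates one space step of scheme (2) for a single component ($m = 1$), with an illustrative Langmuir flux; all names are assumptions made for the example, not taken from the implementation of [15], and the CFL condition (3) is assumed to hold.

```cpp
#include <cstddef>
#include <vector>

// Illustrative physical parameters: velocity u, porosity eps, and a
// one-component Langmuir isotherm with coefficients K and Nstar.
struct Params { double u, eps, K, Nstar; };

// Flux F(c) = (1/u) * (c + (1-eps)/eps * H(c)), Langmuir isotherm H, Eq. (6).
double flux(double c, const Params& p) {
    const double H = p.Nstar * p.K * c / (1.0 + p.K * c);
    return (c + (1.0 - p.eps) / p.eps * H) / p.u;
}

// One space step of scheme (2), marching in z: cn[n] approximates
// c(n*dt, k*dz); the returned vector approximates c(n*dt, (k+1)*dz).
std::vector<double> godunovStep(const std::vector<double>& cn,
                                double dz, double dt, const Params& p) {
    std::vector<double> next(cn.size());
    next[0] = cn[0];  // the t = 0 value comes from the initial condition c_0
    for (std::size_t n = 1; n < cn.size(); ++n)
        next[n] = cn[n] - (dz / dt) * (flux(cn[n], p) - flux(cn[n - 1], p));
    return next;
}
```

Marching this step from $k = 0$ (where the profile is given by $c_{inj}$) to $k = K$ yields the simulated chromatogram at the column outlet $z = L$.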
III. THE OPTIMIZATION PROBLEM

A. Goal
The goal is to identify the isotherm function from experimental chromatograms: given initial data $c_0$, injection data $c_{inj}$, and the corresponding experimental chromatogram $c_{exp}$ (which can be either the result of a simulation using a known isotherm function, or the result of actual experiments by chemical scientists), find the isotherm function $H$ such that the numerical solution of Eq. 1, using the same initial and injection conditions, results in a chromatogram as close as possible to the experimental one $c_{exp}$.

Ideally, the goal is to find $H$ such that the following system of PDEs has a unique solution $c(t, z)$:

$$ \partial_z c + \partial_t F(c) = 0, \quad c(0, z) = c_0(z), \quad c(t, 0) = c_{inj}(t), \quad c(t, L) = c_{exp}(t). \qquad (4) $$

However, because in most real-world cases this system has no exact solution, it is turned into a minimization problem: for a given isotherm function $H$, solve system (1) and define the cost function $J$ as the least-square difference between the computed chromatogram $c_H(t, L)$ and the experimental one $c_{exp}(t)$:

$$ J(H) = \int_0^T \| c_H(t, L) - c_{exp}(t) \|^2 \, dt. \qquad (5) $$
If several experimental chromatograms are provided, the cost function is the sum of the functions $J$ computed for each experimental chromatogram, as in the sketch below.
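A discrete version of the cost (5) simply sums the squared differences over the recorded time steps at the outlet; the following is a hedged C++ sketch with illustrative names, not the paper's code.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Riemann-sum approximation of the cost (5) at the column outlet z = L.
// M is the number m of components; cH and cExp hold one concentration
// vector per recorded time step.
template <std::size_t M>
double costJ(const std::vector<std::array<double, M>>& cH,    // simulated c_H(t_n, L)
             const std::vector<std::array<double, M>>& cExp,  // experimental c_exp(t_n)
             double dt) {
    double j = 0.0;
    for (std::size_t n = 0; n < cH.size(); ++n)
        for (std::size_t i = 0; i < M; ++i) {
            const double d = cH[n][i] - cExp[n][i];
            j += d * d * dt;  // squared difference times the time step
        }
    return j;
}
```

For several experimental chromatograms, this quantity would simply be summed over the available data sets.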
B. Search Space

When tackling a function identification problem, the first issue to address is the parametric vs. non-parametric choice [16]: parametric models for the target function result in parametric optimization problems that are generally easier to tackle, but a bad choice of the model can hinder the optimization. On the other hand, non-parametric models are a priori less biased, but search algorithms are also less efficient on large unstructured search spaces.

Early trials to solve the chromatography inverse problem using a non-parametric model (a recurrent neural network) have brought a proof of concept for such an approach [4], but have also demonstrated its limits: only limited precision could be reached, and the approach scaled poorly with the number of components of the mixture.

Fortunately, chemists provide a whole zoology of parametrized models for the isotherm function $H$, and using such models, the identification problem amounts to parametric optimization. For $i \in \{1, \ldots, m\}$, denote $H_i$ the component $i$ of the function $H$. The main models for the isotherm function used here are the following (a code sketch of the first two models follows the list):
• The Langmuir isotherm [14] assumes that the different components are in competition to occupy each site of the porous medium. This gives, for all $i = 1, \ldots, m$,

$$ H_i(c) = \frac{N^* K_i c_i}{1 + \sum_{l=1}^m K_l c_l}. \qquad (6) $$

There are $m+1$ positive parameters: the Langmuir coefficients $(K_i)_{i \in [1,m]}$, homogeneous to the inverse of a concentration, and the saturation coefficient $N^*$, which corresponds to some limit concentration.
• The Bi-Langmuir isotherm generalizes the Langmuir isotherm by assuming two different kinds of sites on the absorbing medium. The resulting equations are, for all $i = 1, \ldots, m$,

$$ H_i(c) = \sum_{s \in \{1,2\}} \frac{N^*_s K_{i,s} c_i}{1 + \sum_{l=1}^m K_{l,s} c_l}. \qquad (7) $$

This isotherm function depends on $2(m+1)$ parameters: the generalized Langmuir coefficients $(K_{i,s})_{i \in [1,m],\, s=1,2}$ and the generalized saturation coefficients $(N^*_s)_{s=1,2}$.
• The Lattice isotherm [17] is a generalization of the Langmuir isotherm that also considers interactions among the different sites of the porous medium. Depending on the degree $d$ of interactions (number of interacting sites grouped together), this model depends, in addition to the Langmuir coefficients $(K_i)_{i \in [1,m]}$ and the saturation coefficient $N^*$, on interaction energies $(E_{ij})_{i,j \in [0,d],\; i+j \leq d}$, resulting in $\prod_{i=1}^m \frac{d+i}{i}$ parameters. For instance, for one component ($m = 1$) and degree 2, this gives:

$$ H(c) = N^* \, \frac{K c + e^{-E/RT} (K c)^2}{1 + 2 K c + e^{-E/RT} (K c)^2}, \qquad (8) $$

where $T$ is the absolute temperature and $R$ is the universal gas constant. Note that in all cases, a Lattice isotherm with zero energies simplifies to the Langmuir isotherm with the same Langmuir and saturation coefficients, up to a constant factor.
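As announced above, here is a hedged C++ sketch of the Langmuir (6) and Bi-Langmuir (7) models for an $m$-component mixture; function and variable names are illustrative, not the paper's code.

```cpp
#include <cstddef>
#include <vector>

// Langmuir isotherm, Eq. (6): c and K have size m, the result is H(c).
std::vector<double> langmuir(const std::vector<double>& c,
                             const std::vector<double>& K, double Nstar) {
    double denom = 1.0;
    for (std::size_t l = 0; l < c.size(); ++l) denom += K[l] * c[l];
    std::vector<double> H(c.size());
    for (std::size_t i = 0; i < c.size(); ++i)
        H[i] = Nstar * K[i] * c[i] / denom;
    return H;
}

// Bi-Langmuir isotherm, Eq. (7): the sum of two Langmuir terms, one per
// kind of adsorption site, with coefficients (K1, N1) and (K2, N2).
std::vector<double> biLangmuir(const std::vector<double>& c,
                               const std::vector<double>& K1, double N1,
                               const std::vector<double>& K2, double N2) {
    std::vector<double> H = langmuir(c, K1, N1);
    const std::vector<double> H2 = langmuir(c, K2, N2);
    for (std::size_t i = 0; i < c.size(); ++i) H[i] += H2[i];
    return H;
}
```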
IV. APPROACH DESCRIPTION

A. Motivations
Previous works on parametric optimization for the chromatography inverse problem have used gradient-based approaches [13], [12]. In [13], the gradient of $J$ is obtained by writing and solving numerically the adjoint problem, while direct differentiation of the discretized equation has also been investigated in [12]. However, the fitness function to optimize is not necessarily convex, and no results are available regarding its differentiability. Moreover, experiments performed in [12] suggest that the function is multimodal, since the gradient algorithm converges to different local optima depending on the starting point. Evolutionary Algorithms (EAs) are stochastic global optimization algorithms, less prone to getting stuck in local optima than gradient methods, and they do not rely on convexity assumptions. They hence seem a good choice to tackle this problem. Among EAs, Evolution Strategies have been specifically designed for continuous optimization. The next section introduces the state-of-the-art EA for continuous optimization, the Covariance Matrix Adaptation ES (CMA-ES).

B. The CMA Evolution Strategy
CMA-ES is a stochastic optimization algorithm specifically designed for continuous optimization [9], [8], [7], [3]. At each iteration $g$, a population of points of an $n$-dimensional continuous search space (subset of $\mathbb{R}^n$) is sampled according to a multivariate normal distribution. The fitness of the different points is then evaluated, and the parameters of the multivariate normal distribution are updated.

More precisely, let $\langle \vec{x} \rangle_W^{(g)}$ denote the mean value of the (normal) sampling distribution at iteration $g$. Its covariance matrix is usually factorized into two terms: $\sigma^{(g)} \in \mathbb{R}^+$, called the step-size, and $C^{(g)}$, a positive definite $n \times n$ matrix that is abusively called the covariance matrix. The independent sampling of the $\lambda$ offspring can then be written

$$ \vec{x}_k^{(g+1)} = \langle \vec{x} \rangle_W^{(g)} + N_k\!\left(0, (\sigma^{(g)})^2 C^{(g)}\right) \quad \text{for } k = 1, \ldots, \lambda, $$

where the $N_k(0, M)$ denote independent realizations of the multivariate normal distribution with covariance matrix $M$. The $\mu$ best offspring are recombined into

$$ \langle \vec{x} \rangle_W^{(g+1)} = \sum_{i=1}^{\mu} w_i \, \vec{x}_{i:\lambda}^{(g+1)}, \qquad (9) $$

where the positive weights $w_i \in \mathbb{R}$ are set according to individual ranks and sum to one, and the index $i{:}\lambda$ denotes the $i$-th best offspring. Eq. 9 can be rewritten as

$$ \langle \vec{x} \rangle_W^{(g+1)} = \langle \vec{x} \rangle_W^{(g)} + \sum_{i=1}^{\mu} w_i \, N_{i:\lambda}\!\left(0, (\sigma^{(g)})^2 C^{(g)}\right). \qquad (10) $$

The covariance matrix $C^{(g)}$ is a positive definite symmetric matrix. Therefore it can be decomposed as

$$ C^{(g)} = B^{(g)} D^{(g)} D^{(g)} \left(B^{(g)}\right)^T, $$

where $B^{(g)}$ is an orthogonal matrix, i.e. $B^{(g)} (B^{(g)})^T = I_d$, and $D^{(g)}$ is a diagonal matrix whose diagonal contains the square roots of the eigenvalues of $C^{(g)}$.

The so-called strategy parameters of the algorithm, the covariance matrix $C^{(g)}$ and the step-size $\sigma^{(g)}$, are updated so as to increase the probability of reproducing good steps. The so-called rank-one update for $C^{(g)}$ [9] takes place as follows. First, an evolution path is computed:

$$ \vec{p}_c^{(g+1)} = (1 - c_c)\, \vec{p}_c^{(g)} + \frac{\sqrt{c_c (2 - c_c)\, \mu_{eff}}}{\sigma^{(g)}} \left( \langle \vec{x} \rangle_W^{(g+1)} - \langle \vec{x} \rangle_W^{(g)} \right), $$

where $c_c \in\, ]0, 1]$ is the cumulation coefficient and $\mu_{eff}$ is a strictly positive coefficient. This evolution path can be seen as the descent direction for the algorithm. Second, the covariance matrix $C^{(g)}$ is "elongated" in the direction of the evolution path, i.e. the rank-one matrix $\vec{p}_c^{(g+1)} (\vec{p}_c^{(g+1)})^T$ is added to $C^{(g)}$:

$$ C^{(g+1)} = (1 - c_{cov})\, C^{(g)} + c_{cov}\, \vec{p}_c^{(g+1)} \left(\vec{p}_c^{(g+1)}\right)^T, $$

where $c_{cov} \in\, ]0, 1[$. The complete update rule for the covariance matrix is a combination of the rank-one update previously described and the rank-mu update presented in [8].

The update rule for the step-size $\sigma^{(g)}$ is called path length control. First, another evolution path is computed:

$$ \vec{p}_\sigma^{(g+1)} = (1 - c_\sigma)\, \vec{p}_\sigma^{(g)} + \frac{\sqrt{c_\sigma (2 - c_\sigma)\, \mu_{eff}}}{\sigma^{(g)}} \; B^{(g)} \left(D^{(g)}\right)^{-1} \left(B^{(g)}\right)^T \left( \langle \vec{x} \rangle_W^{(g+1)} - \langle \vec{x} \rangle_W^{(g)} \right), \qquad (11) $$

where $c_\sigma \in\, ]0, 1]$. The length of this vector is compared to the length it would have had under random selection, i.e. in a scenario where no information is gained from the fitness function and one is willing to keep the step-size constant. Under random selection the vector $\vec{p}_\sigma^{(g)}$ is distributed as $N(0, I_d)$. Therefore, the step-size is increased if the length of $\vec{p}_\sigma^{(g)}$ is larger than $E(\|N(0, I_d)\|)$ and decreased if it is shorter.
Formally, the update rule reads:

$$ \sigma^{(g+1)} = \sigma^{(g)} \exp\left( \frac{c_\sigma}{d_\sigma} \left( \frac{\|\vec{p}_\sigma^{(g+1)}\|}{E(\|N(0, I_d)\|)} - 1 \right) \right), \qquad (12) $$

where $d_\sigma > 0$ is a damping factor.

The default parameters for CMA-ES were carefully derived in [7], Eqs. 6-8. The only problem-dependent parameters are $\langle \vec{x} \rangle_W^{(0)}$ and $\sigma^{(0)}$, and, to some extent, the offspring size $\lambda$: its default value is $4 + \lfloor 3 \ln(n) \rfloor$ (the default value of $\mu$ is $\lfloor \lambda/2 \rfloor$), but increasing $\lambda$ increases the probability of converging towards the global optimum when minimizing multimodal fitness functions [7]. This fact was systematically exploited in [3], where a "CMA-ES restart" algorithm is proposed, in which the population size is increased after each restart. Different restart criteria are used:

1) RestartTolFun: Stop if the range of the best objective function values of the recent generations is below a given TolFun value.
2) RestartTolX: Stop if the standard deviation of the normal distribution is smaller than a given TolX value and $\sigma \vec{p}_c$ is smaller than TolX in all components.

3) RestartOnNoEffectAxis: Stop if adding a 0.1 standard deviation vector in a principal axis direction of $C^{(g)}$ does not change $\langle \vec{x} \rangle_W^{(g)}$.

4) RestartCondCov: Stop if the condition number of the covariance matrix exceeds a fixed value.

The resulting algorithm (the CMA-ES restart, simply denoted CMA-ES in the remainder of this paper) is a quasi-parameter-free algorithm that performed best at the CEC 2005 special session on parametric optimization [1].

An important property of CMA-ES is its invariance to linear transformations of the search space. Moreover, because of the rank-based selection, CMA-ES is invariant to any monotonous transformation of the fitness function: optimizing $f$ or $h \circ f$ is equivalent for any rank-preserving function $h : \mathbb{R} \to \mathbb{R}$. In particular, convexity has no impact on the actual behavior of CMA-ES.
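To make the above updates concrete, here is a minimal, hedged C++ sketch of one CMA-ES generation (using the Eigen library), restricted to the rank-one covariance update and path-length control described above. The constants are rough versions of the defaults from [7], and all names are illustrative rather than taken from the paper's implementation.

```cpp
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>
#include <functional>
#include <random>
#include <vector>

using Eigen::MatrixXd;
using Eigen::VectorXd;

struct CmaState {
    VectorXd mean;   // <x>_W^(g)
    double   sigma;  // step-size sigma^(g)
    MatrixXd C;      // covariance matrix C^(g)
    VectorXd pc, ps; // evolution paths p_c and p_sigma
};

// One generation of a simplified (mu/mu_w, lambda)-CMA-ES (rank-one update
// and path-length control only, Eqs. 9-12). `fitness` is the cost to minimize.
void cmaGeneration(CmaState& s, int lambda,
                   const std::function<double(const VectorXd&)>& fitness,
                   std::mt19937& rng) {
    const int n = static_cast<int>(s.mean.size());
    const int mu = lambda / 2;
    VectorXd w(mu);                      // log-rank recombination weights
    for (int i = 0; i < mu; ++i) w(i) = std::log(mu + 1.0) - std::log(i + 1.0);
    w /= w.sum();                        // weights sum to one
    const double mueff = 1.0 / w.squaredNorm();
    const double cc = 4.0 / (n + 4.0);                    // cumulation coeff.
    const double ccov = 2.0 / ((n + 1.4) * (n + 1.4));    // rank-one learning rate
    const double cs = (mueff + 2.0) / (n + mueff + 3.0);  // path-length coeff.
    const double ds = 1.0 + cs;                           // simplified damping
    const double chiN =  // approximation of E(||N(0, I_d)||)
        std::sqrt(1.0 * n) * (1.0 - 1.0 / (4.0 * n) + 1.0 / (21.0 * n * n));

    // Eigendecomposition C = B D^2 B^T, used for sampling and for Eq. 11.
    Eigen::SelfAdjointEigenSolver<MatrixXd> eig(s.C);
    const MatrixXd B = eig.eigenvectors();
    const VectorXd D = eig.eigenvalues().cwiseMax(0.0).cwiseSqrt();

    std::normal_distribution<double> N01(0.0, 1.0);
    std::vector<std::pair<double, VectorXd>> pop(lambda);
    for (int k = 0; k < lambda; ++k) {
        VectorXd z(n);
        for (int i = 0; i < n; ++i) z(i) = N01(rng);
        VectorXd x = s.mean + s.sigma * (B * (D.asDiagonal() * z)); // ~N(m, sigma^2 C)
        pop[k] = {fitness(x), x};
    }
    std::sort(pop.begin(), pop.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });

    const VectorXd oldMean = s.mean;     // weighted recombination, Eq. 9
    s.mean.setZero();
    for (int i = 0; i < mu; ++i) s.mean += w(i) * pop[i].second;

    const VectorXd y = (s.mean - oldMean) / s.sigma;
    s.pc = (1.0 - cc) * s.pc + std::sqrt(cc * (2.0 - cc) * mueff) * y;
    s.C = (1.0 - ccov) * s.C + ccov * (s.pc * s.pc.transpose()); // rank-one update
    // Path-length control, Eqs. 11-12, with C^{-1/2} = B D^{-1} B^T:
    s.ps = (1.0 - cs) * s.ps + std::sqrt(cs * (2.0 - cs) * mueff)
           * (B * (D.cwiseInverse().asDiagonal() * (B.transpose() * y)));
    s.sigma *= std::exp((cs / ds) * (s.ps.norm() / chiN - 1.0));
}
```

The rank-mu update and the exact default constants of [7] are omitted here for brevity; a production implementation should follow [7], [8].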
C. CMA-ES Implementation

This section describes the specific implementation of CMA-ES used to identify the $n$ isotherm coefficients. For the sake of clarity, we will use a single index in the definition of the coefficients of the isotherm, i.e. we will identify $K_a$, $N^*_b$ and $E_c$ for $a \in [1, A]$, $b \in [1, B]$ and $c \in [1, C]$, where $A$, $B$ and $C$ are integers summing up to $n$.

Fitness function and CFL condition:
The goal is to minimize the fitness function defined in Section III-A. In the case where identification is done using only one experimental chromatogram, the fitness function is the function $J$ defined in Eq. 5 as the least-square difference between an experimental chromatogram $c_{exp}(t)$, obtained under experimental conditions $c_0$, $c_{inj}$, and a numerical approximation of the solution of system (1) for a candidate isotherm function $H$ using the same experimental conditions. The numerical simulation of a solution of Eq. 1 is computed with a Godunov scheme written in C++ (see [15] for the details of the implementation).

In order to validate the CMA-ES approach, "experimental" chromatograms were first computed using numerical simulations of Eq. 1 with different experimental conditions. Let $F_{sim}$ denote the flux function used to simulate the experimental chromatogram. For the simulation of an approximated solution of Eq. 1, a time step $\Delta t$ and a CFL coefficient strictly smaller than one (typically 0.8) are fixed beforehand. The quantity $\max_c Sp(|F'_{sim}(c)|)$ is then estimated using a power method (sketched below), and the space step $\Delta z$ can then be set such that Eq. 3 is satisfied for $F_{sim}$. The same $\Delta t$ and $\Delta z$ are then used during the optimization of $J$. When $c_{exp}$ comes from real data, an initial value for the parameters to estimate, i.e. an initial guess given by the expert, is used to set the CFL condition (3).
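A possible realization of this power-method estimate is sketched below (using the Eigen library; names are illustrative, not the paper's code). The flux Jacobian $F'_{sim}(c)$ is evaluated at sampled concentrations, and the largest estimate over the samples is kept; convergence of the iteration assumes a dominant eigenvalue.

```cpp
#include <Eigen/Dense>

// Power iteration estimating the spectral radius of an m-by-m Jacobian:
// iterate v <- J v / ||J v||; the norm converges to the dominant |eigenvalue|.
double spectralRadiusEstimate(const Eigen::MatrixXd& J, int iters = 50) {
    Eigen::VectorXd v = Eigen::VectorXd::Ones(J.rows()).normalized();
    double rho = 0.0;
    for (int i = 0; i < iters; ++i) {
        Eigen::VectorXd Jv = J * v;
        rho = Jv.norm();
        if (rho == 0.0) break;  // degenerate direction; give up
        v = Jv / rho;
    }
    return rho;
}

// The space step is then chosen so that the CFL condition (3) holds:
//   dz = CFL * dt / maxSp,  with CFL typically 0.8 as stated in the text.
```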
Using expert knowledge: The choice of the type of isotherm function to be identified will be, in most cases, given by the chemists. Fig. 1 illustrates the importance of this choice. In Fig. 1-(a), the target chromatogram $c_{exp}$ is computed using a Langmuir isotherm with one component ($m = 1$ and thus $n = 2$). In Fig. 1-(b), the target chromatogram $c_{exp}$ is computed using a Lattice isotherm with one component ($m = 1$ and thus $n = 4$). In both cases, the identification is done using a Langmuir model, with $n = 2$. It is clear from the figure that one is able to correctly identify the isotherm, and hence fit the "experimental" chromatogram, when choosing the correct model (Fig. 1 (a)), whereas the fit of the chromatogram is very poor when the model is not correct (Fig. 1 (b)).

Another important issue when using CMA-ES is the initial choice of the covariance matrix: without any information, the algorithm starts with the identity matrix. However, this is a poor choice when the different variables have very different orders of magnitude, as the algorithm will spend some time adjusting its principal directions to those ranges. In most cases of chromatographic identification, however, chemists provide orders of magnitude, bounds and initial guesses for the different values of the unknown parameters. Let $[(K_a)_{min}, (K_a)_{max}]$, $[(N^*_b)_{min}, (N^*_b)_{max}]$ and $[(E_c)_{min}, (E_c)_{max}]$ be the ranges guessed by the chemists for each $K_a$, $N^*_b$ and $E_c$ respectively. All parameters are linearly scaled between those intervals and $[-1, 1]$, removing the need to modify the initial covariance matrix of CMA-ES, as sketched below.
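A minimal sketch of this rescaling, with hypothetical helper names: CMA-ES works in $[-1, 1]^n$ internally, and each coordinate is mapped to its expert range before the fitness is evaluated.

```cpp
// Map an internal coordinate u in [-1, 1] to the expert range [lo, hi].
double toExpertRange(double u, double lo, double hi) {
    return lo + 0.5 * (u + 1.0) * (hi - lo);
}

// Inverse map: an expert-range value v in [lo, hi] to [-1, 1].
double toInternal(double v, double lo, double hi) {
    return 2.0 * (v - lo) / (hi - lo) - 1.0;
}
```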
Fig. 1. Importance of the choice of model (one-component mixture). (a) Simulation using a Langmuir isotherm, identification using a Langmuir model: the chromatogram is perfectly fit. (b) Simulation using a Lattice isotherm, identification using a Langmuir model: poor fit of the chromatogram.

Unfeasible solutions: Two different situations can lead to unfeasible solutions. First, when at least one parameter among those that have to be positive becomes negative (remember that CMA-ES generates offspring using an unbounded normal distribution), the fitness function is arbitrarily set to a very large value. Second, when the CFL condition is violated, the simulation is numerically unstable and generates absurd values. In this case, the simulation is stopped, and the fitness function is arbitrarily set to an even larger value. Note that a better solution would be to detect such a violation before running the simulation, and to penalize the fitness by some amount proportional to the actual violation. But it is numerically intractable to predict in advance whether the CFL condition is going to be violated (see Eq. 3), and the absurd numerical values returned in case of numerical instability are not clearly correlated with the amount of violation either. This constant-penalty handling is sketched below.
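The following hedged C++ sketch summarizes this handling; the penalty constants and the simulation helper are placeholders, since the paper only specifies that arbitrarily large values are used.

```cpp
#include <cmath>
#include <functional>
#include <vector>

const double SIGN_PENALTY = 1e12;  // assumed value for negative parameters
const double CFL_PENALTY  = 1e13;  // assumed larger value for unstable runs

// Wraps the real fitness: reject negative parameters up front, and detect
// a CFL blow-up a posteriori through non-finite cost values.
double safeFitness(const std::vector<double>& params,
                   const std::function<double(const std::vector<double>&)>&
                       simulateAndCompare /* runs scheme (2), returns J */) {
    for (double p : params)
        if (p < 0.0) return SIGN_PENALTY;        // positivity constraint violated
    const double j = simulateAndCompare(params); // may blow up if CFL is violated
    if (!std::isfinite(j)) return CFL_PENALTY;   // unstable simulation detected
    return j;
}
```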
Initialization: The initial mean $\langle \vec{x} \rangle_W^{(0)}$ for CMA-ES is uniformly drawn in $[-1, 1]^n$, i.e. the parameters $K_a$, $N^*_b$ and $E_c$ are uniformly drawn in the ranges given by the expert. The initial step-size $\sigma^{(0)}$ is set to a fixed value. Besides, individuals of the population sampled outside the initial ranges are rejected. Unfeasible individuals are also rejected at initialization: at least one individual should be feasible to avoid random behavior of the algorithm. In both cases, rejection is done by resampling until a "good" individual is obtained or a maximal number of sampled individuals is reached. The initial numbers of offspring $\lambda$ and parents $\mu$ are set to the default values ($\lambda = 4 + \lfloor 3 \ln(n) \rfloor$ and $\mu = \lfloor \lambda/2 \rfloor$).

Restarting and stopping criteria:
The algorithm stops when it reaches a maximal number of restarts (five in the experiments below), or a given target fitness value (a very small value for artificial problems, adjusted for real data). The restart criteria (see Section IV-B) are RestartTolFun with TolFun proportional to $\sigma^{(0)}$, RestartTolX with TolX proportional to $\sigma^{(0)}$, RestartOnNoEffectAxis, and RestartCondCov with a fixed upper bound on the condition number. The offspring size $\lambda$ is doubled after each restart and $\mu$ is set equal to $\lfloor \lambda/2 \rfloor$, as sketched below.
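A hedged sketch of this restart loop; the `runCmaEs` wrapper around the generation loop is illustrative, not the paper's code.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <limits>

// Run CMA-ES until a restart criterion fires, then double lambda
// (mu = lambda/2 follows inside the wrapper), up to a maximal number of
// restarts or until the target fitness is reached.
double runWithRestarts(const std::function<double(int)>& runCmaEs,
                       int n, double targetFitness, int maxRestarts) {
    int lambda = 4 + static_cast<int>(std::floor(3.0 * std::log(n)));
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r <= maxRestarts; ++r) {
        best = std::min(best, runCmaEs(lambda));  // best fitness of this restart
        if (best <= targetFitness) break;         // target reached: stop
        lambda *= 2;                              // double the population size
    }
    return best;
}
```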
V. RESULTS

A. Validation using artificial data
A first series of validation runs was carried out using simulated chromatograms. Each identification uses one or several experimental chromatograms. Because the same discretization is used for both the identification and the generation of the "experimental" data, one solution is known (the same isotherm that was used to generate the data), and the best possible fitness is thus zero.

Several tests were run using different models for the isotherm, different numbers of components, and different numbers of time steps. In all cases, CMA-ES identified the correct parameters, i.e. the fitness function reached values very close to zero. In most cases, CMA-ES did not need any restart to reach the required precision, though this was necessary in a few cases. This happened when the whole population remained unfeasible during several generations, or when the algorithm was stuck in a local optimum. Figures 2, 3 and 4 show typical evolutions, during one run, of the best fitness value with respect to the number of evaluations, for problems involving respectively 1, 2 or 3 components. Figure 4 is a case where restarting allowed the algorithm to escape a local optimum.

Fig. 2. Single-component mixture: simulate a Lattice isotherm and identify a Lattice isotherm; best fitness versus number of evaluations. The first run gave a satisfactory solution, but two restarts were performed to reach a fitness value (2.449·10^{-15}) lower than the target.

Fig. 3. Binary mixture: simulate a Langmuir isotherm and identify a Lattice isotherm; best fitness versus number of evaluations. The first run gave a satisfactory solution, but the maximal number (here five) of restarts was performed while attempting to reach the target fitness value; the best fitness value (1.477·10^{-14}) was reached in the fourth restart.

Fig. 4. Ternary mixture: simulate a Langmuir isotherm and identify a Langmuir isotherm; best fitness versus number of evaluations. Two restarts were necessary: before the second restart, CMA-ES was stuck in some local optimum; in the second restart, the algorithm reached a fitness value of 9.940·10^{-15}.

Specific tests were then run in order to study the influence of the expert guesses about both the ranges of the variables and the starting point of the algorithm, possibly given by the chemical engineers. In CMA-ES, at generation $g$, offspring are drawn from a Gaussian distribution centered on the mean $\langle \vec{x} \rangle_W^{(g)}$. An expert guess for a good solution can hence be input as the mean $\langle \vec{x} \rangle_W^{(0)}$ of the first distribution that will be used to generate the offspring of the first generation. The results are presented in Table I. The first 3 lines give the probabilities that a given run converges (i.e., reaches the target fitness value), computed over 120 runs, depending on the number of restarts (this probability of course increases with the number of restarts). The last line is the ratio between the average number of evaluations needed before convergence (averaged over the runs that did converge) and the probability of convergence: this ratio measures the performance of the different experimental settings, as discussed in detail in [2].

The results displayed in Table I clearly demonstrate that a good guess of the range of the variables is the most prominent factor of success: even without any hint about the starting point, all runs reached the required precision without any restart. However, when no indication about the range is available, a good initial guess significantly improves the results, without reaching the perfect quality brought by tight bounds on the ranges: scaling is more important than rejecting unfeasible individuals at the beginning.

Computational cost:
The duration of an evaluation depends on the discretization of the numerical scheme (number of space and time steps) and on the number $n$ of unknown parameters to identify. Several runs were precisely timed to assess the dependency of the computational cost on both factors. The simple Langmuir isotherm was used both to generate the data and to identify the isotherm. Only the computational costs of single evaluations are reported, as the number of evaluations per identification heavily depends on many parameters, including the possible expert guesses, and in any case is a random variable of unknown distribution. All runs in this paper were performed on a 1.8 GHz Pentium computer running a recent Linux system.

TABLE I
On the usefulness of expert knowledge: target values for the Langmuir isotherm are here (K, N*) = (0.…, …). The expert range is [0.…, …] × [50, …], the wide range is [0.…, …] × [50, …]. The expert guess for the starting point is a better initial mean (according to fitness value) than random. The first 3 lines give the probabilities (computed over 120 runs) to reach the target fitness value within the given number of restarts. The last line is the ratio of the number of evaluations needed for convergence (averaged over the runs that did converge) by the probability of convergence after two restarts (line 3).

no restart:   …     0.84    …
1 restart:    …     0.92    …
2 restarts:   …     0.95    …
Perf.:        601   1015    905

For one component ($m = 1$, $n = 2$), the average durations of a single evaluation measured for three increasing numbers of time steps fit the theoretical quadratic increase with the number of time steps (though three sample points are too few to demonstrate anything!). This also holds for the number of space steps, as the number of space steps is proportional to the number of time steps due to the CFL condition. For an identification with a 1-component Langmuir isotherm, the total cost of the identification is on average of the order of seconds.

When looking at the dependency of the computational cost on the number of unknown parameters, things are not that clear from a theoretical point of view, because the cost of each computation of the isotherm function also depends on the number of components and on the number of experimental chromatograms to compare with. Experimentally, the cost of a single evaluation increases with the number of variables; for a complete identification, the total time is roughly 15 to 25 minutes for 2 variables, 40 to 60 minutes for 3 variables, and 1 to 2 hours for 4 variables.

B. Experiments on real data
The CMA-ES-based approach has also been tested on a set of data taken from [10]. The mixture was composed of 3 chemical species: benzylalcohol (BA), 2-phenylethanol (PE) and 2-methylbenzylalcohol (MBA). Two real experiments have been performed with different proportions of injected mixtures, respectively (1,3,1) and (3,1,0); consequently, two real chromatograms are available. For this identification, Quiñones et al. [10] have used a modified Langmuir isotherm model in which each species has a different saturation coefficient $N^*_i$:

$$ H_i(c) = \frac{N^*_i K_i c_i}{1 + \sum_{l=1}^{3} K_l c_l}, \quad i = 1, \ldots, 3. \qquad (13) $$

Six parameters are to be identified: $N^*_i$ and $K_i$, for $i = 1, \ldots, 3$. A change of variables has been made for those tests, so that the unknown parameters are in fact $N^*_i$ and $K'_i$, where $K'_i = K_i N^*_i$: those are the values that chemical engineers are able to measure experimentally.

Two series of numerical tests have been performed using a gradient-based method [12]: identification of the whole set of 6 parameters, and identification of the 3 saturation coefficients $N^*_i$ only, after setting the Langmuir coefficients $(K'_1, K'_2, K'_3)$ to their experimentally measured values. The initial ranges used for CMA-ES are [60, …] × [60, …] × [60, …] (resp. [1.…, …] × [2.…, …] × [3, …] × [90, …] × [100, …] × [100, …]) when optimizing 3 parameters (resp. 6 parameters).

Comparisons between the two experimental chromatograms and those resulting from the CMA-ES identification for the two experiments are shown in Figure 5; the corresponding plots for the other case (3 vs. 6 identified parameters) are visually identical, though the fitness value is slightly lower (see Tables II and III). But another point of view on the results is given by the comparison between the identified isotherms and the (few) experimental values gathered by the chemical engineers. The usual way to present those isotherms in chemical publications is that of Figure 6: the absorbed quantity $H(c)_i$ of each component $i = 1, 2, 3$ is displayed as a function of the total amount of mixture $(c_1 + c_2 + c_3)$, for five different compositions of the mixture [12]. Identified (resp. experimental) isotherms are plotted in Figure 6 using continuous lines (resp. discrete markers), for the 6-parameters case; here again, the corresponding plots for the 3-parameters case are visually identical.

TABLE II
Comparing CMA-ES and gradient: the 3-parameters case. Solution (line 1) and associated fitness values (line 2) for the modified Langmuir model (Eq. 13). Line 3: for CMA-ES, the "median (minimal)" number of fitness evaluations (out of 12 runs) needed to reach the corresponding fitness value of line 2; for the gradient, "number of fitness evaluations – number of gradient evaluations" for the best of the 10 runs described in [12].

          CMA-ES                          Gradient
N*_i      (120.951, 135.319, 165.593)     (123.373, 135.704, 159.637)
Fitness   … × 10^…                        … × 10^…

C. Comparison with a Gradient Method
CMA-ES results have then been compared with those of the gradient method from [12], using the same data case of a ternary mixture taken from [10] and described in the previous section. Chromatograms found by CMA-ES are, according to the fitness (see Tables II and III), closer to the experimental ones than those obtained with the gradient method. Moreover, contrary to the gradient algorithm, all 12 independent runs of CMA-ES converged to the same point; thus, no variance is to be reported in Tables II and III. Furthermore, there seems to be no need, when using CMA-ES, to fix the Langmuir coefficients in order to find good results: when optimizing all parameters, the gradient approach could not reach a fitness value as small as the one found by CMA-ES in the same context (see Table III).

Fig. 5. Experimental chromatograms (markers) and identified chromatograms (continuous lines) for the BA, PE and MBA species. Plots on the left/right correspond to an injection with proportions (1,3,1)/(3,1,0). Panels: (a) BA and PE, injection (1,3,1); (b) BA and PE, injection (3,1,0); (c) MBA, injection (1,3,1); (d) MBA, injection (3,1,0).

TABLE III
Comparing CMA-ES and gradient: the 6-parameters case. Solutions (lines 1 and 2) and associated fitness values (line 3) for the modified Langmuir model (Eq. 13).

          CMA-ES                          Gradient
K'_i      (1.861, 3.120, 3.563)           (1.780, 3.009, 3.470)
N*_i      (118.732, 134.860, 162.498)     (129.986, 141.07, 168.495)
Fitness   … × 10^…                        … × 10^…

Finally, when comparing the identified isotherms to the experimental ones (Figure 6), the fit is clearly not very satisfying (similarly deceptive results were obtained with the gradient method in [12]): fitting both the isotherms and the chromatograms seem to be contradictory objectives. Two directions can lead to some improvement in this respect: modify the cost function $J$ in order to take into account some least-square error on the isotherms as well as on the chromatograms; or use a multi-objective approach. Both modifications are easy to implement using Evolutionary Algorithms (a multi-objective version of CMA-ES was recently proposed [11]), while they are beyond what gradient-based methods can tackle. However, it might also be a sign that the modified Langmuir model that has been suggested for the isotherm function is not the correct one.

Fig. 6. Isotherms associated to the parameter values of Table III (continuous lines) and experimental ones (markers), versus the total amount of the mixture, for different proportions of the components in the injected concentration [12]. Panels: (a) BA, (b) MBA, (c) PE.
Comparison of convergence speeds:
Tables II and III also give an idea of the respective computational costs of both methods on the same real data. For the best run out of 10, the gradient algorithm reached its best fitness value after a number of iterations, each requiring on average several evaluations for the embedded line search. Moreover, the computation of the gradient itself is costly, roughly estimated at 4 times the cost of the fitness function. Hence, the total cost of the gradient algorithm amounts to a large number of fitness evaluations. To reach the same fitness value as the gradient, CMA-ES only needed 175 fitness evaluations (median value out of 12 runs). To converge to its best value (best run out of 12), CMA-ES needed 280 fitness evaluations. Those results show that the best run of the gradient algorithm needs roughly the same number of function evaluations as CMA-ES to converge. Regarding robustness, note that CMA-ES always reached the same fitness value, while the 10 different runs of the gradient algorithm from 10 different starting points gave 10 different solutions: in order to assess the quality of the solution, more runs are needed for the gradient method than for CMA-ES!

VI. CONCLUSIONS
This paper has introduced the use of CMA-ES for the parametric identification of isotherm functions in chromatography. Validation tests on simulated data were useful to adjust the (few) CMA-ES parameters, but also demonstrated the importance of expert knowledge: choice of the type of isotherm, ranges for the different parameters, and possibly some initial guess of a not-so-bad solution.

The proposed approach was also applied to real data and compared to previous work using gradient methods. On this data set, the best fitness found by CMA-ES is better than that found by the gradient approach. Moreover, the results obtained with CMA-ES are far more robust: (1) CMA-ES always converges to the same values of the isotherm parameters, independently of its starting point; (2) CMA-ES can handle the full problem that the gradient method failed to solve efficiently: there is no need, when using CMA-ES, to use experimental values of the Langmuir parameters in order to obtain a satisfactory fitness value. Note that the fitness function only takes into account the fit of the chromatograms, resulting in a poor fit on the isotherms. The results confirm the ones obtained with a gradient approach, and suggest either to incorporate some measure of isotherm fit in the fitness, or to try some multi-objective method, probably the best way to go, as both objectives (chromatogram and isotherm fits) seem somehow contradictory.

ACKNOWLEDGMENTS
This work was supported in part by the MESR-CNRS ACINIM Chromalgema. The authors would like to thank Nikolaus Hansen for the Scilab version of CMA-ES, and for his numerous useful comments.

REFERENCES
[2] A. Auger and N. Hansen. Performance evaluation of an advanced local search evolutionary algorithm. In Proc. IEEE Congress on Evolutionary Computation, pages 1777–1784, 2005.
[3] A. Auger and N. Hansen. A restart CMA evolution strategy with increasing population size. In Proc. IEEE Congress on Evolutionary Computation, pages 1769–1776, 2005.
[4] A. Fadda and M. Schoenauer. Evolutionary chromatographic law identification by recurrent neural nets. In D. B. Fogel and W. Atmar, editors, Proc. 3rd Annual Conference on Evolutionary Programming, pages 219–235. MIT Press, 1994.
[5] E. Godlewski and P.-A. Raviart. Hyperbolic Systems of Conservation Laws, volume 3/4 of Mathématiques et Applications. Ellipses, SMAI, 1991.
[6] G. Guiochon, A. Felinger, S. Golshan-Shirazi, and A. Katti. Fundamentals of Preparative and Nonlinear Chromatography. Academic Press, Boston, second edition, 2006.
[7] N. Hansen and S. Kern. Evaluating the CMA evolution strategy on multimodal test functions. In Xin Yao et al., editors, Parallel Problem Solving from Nature – PPSN VIII, LNCS 3242, pages 282–291. Springer, 2004.
[8] N. Hansen, S. D. Müller, and P. Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation. Evolutionary Computation, 11(1):1–18, 2003.
[9] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
[10] I. Quiñones, J. C. Ford, and G. Guiochon. High concentration band profiles and system peaks for a ternary solute system. Anal. Chem., pages 1495–1502, 2000.
[11] Ch. Igel, N. Hansen, and S. Roth. Covariance matrix adaptation for multi-objective optimization. Evolutionary Computation, 15(1):1–28, 2007.
[12] F. James and M. Postel. Numerical gradient methods for flux identification in a system of conservation laws. Journal of Engineering Mathematics, 2007. To appear.
[13] F. James, M. Sepúlveda, I. Quiñones, F. Charton, and G. Guiochon. Determination of binary competitive equilibrium isotherms from the individual chromatographic band profiles. Chem. Eng. Sci., 54(11):1677–1696, 1999.
[14] I. Langmuir. The adsorption of gases on plane surfaces of glass, mica and platinum. Jour. Am. Chem. Soc., 40(9):1361–1403, 1918.
[15] F. James (PI). CHROMALGEMA, Numerical Resolution of the Inverse Problem for Chromatography using Evolutionary Algorithms and Adaptive Multiresolution. http://sourceforge.net/projects/chromalgema.
[16] M. Schoenauer and M. Sebag. Using domain knowledge in evolutionary system identification. In K. Giannakoglou et al., editors, Evolutionary Algorithms in Engineering and Computer Science. John Wiley, 2002.
[17] P. Valentin, F. James, and M. Sepúlveda. Statistical thermodynamics models for a multicomponent two-phase equilibrium isotherm.