A cyclic time-dependent Markov process to model daily patterns in wind turbine power production
AA cyclic time-dependent Markov process to model dailypatterns in wind turbine power production
Teresa Scholz a , Vitor V. Lopes a, ∗ , Ana Estanqueiro a a LNEG, National Laboratory for Energy and Geology, Estrada do Pa¸co do Lumiar, 22,1649-038, Lisboa, Portugal
Abstract
Wind energy is becoming a top contributor to the renewable energy mix,which raises potential reliability issues for the grid due to the fluctuatingnature of its source. To achieve adequate reserve commitment and to pro-mote market participation, it is necessary to provide models that can capturedaily patterns in wind power production. This paper presents a cyclic in-homogeneous Markov process, which is based on a three-dimensional state-space (wind power, speed and direction). Each time-dependent transitionprobability is expressed as a Bernstein polynomial. The model parametersare estimated by solving a constrained optimization problem: The objectivefunction combines two maximum likelihood estimators, one to ensure thatthe Markov process long-term behavior reproduces the data accurately andanother to capture daily fluctuations. A convex formulation for the overalloptimization problem is presented and its applicability demonstrated throughthe analysis of a case-study. The proposed model is capable of reproducingthe diurnal patterns of a three-year dataset collected from a wind turbinelocated in a mountainous region in Portugal. In addition, it is shown howto compute persistence statistics directly from the Markov process transitionmatrices. Based on the case-study, the power production persistence throughthe daily cycle is analysed and discussed.
Keywords:
Cyclic Markov process, wind power, persistence, diurnal pattern ∗ Corresponding author
Email addresses: [email protected] (Teresa Scholz), [email protected] (Vitor V. Lopes), [email protected] (Ana Estanqueiro)
Preprint submitted to Energy October 14, 2013 a r X i v : . [ phy s i c s . d a t a - a n ] O c t . Introduction The EC European Parliament objective to achieve 20% of the consumed energy from the renewable energy sector by 2020 introduced a serious chal- lenge to the planning and operating of power systems. Wind energy is be- coming a top contributor to the renewable energy mix due to rather high capacities and generation costs that are becoming competitive with conven- tional energy sources [28]. However, wind energy systems suffer from a major drawback, the fluctuating nature of their source, which affects the grid secu- rity, the power system operation and market economics. There are several tools to deal with these issues, such as the knowledge of wind power persis- tence and wind speed or power simulation. Persistence is related to stability properties and can provide useful information for bidding on the electricity market or to maintain reliability, e.g. by setting reserve capacity. Wind power or speed simulation can be used to study the impact of wind generation on the power system. For this task, a sufficiently long time series of the power output from the wind plants should be used. However, real data records are commonly of short length and thus synthetic time series are generated by stochastic simulation techniques to model wind activity [16]. Shamshad et al. [23] used first and second-order Markov chain mod- els for the generation of hourly wind speed time series. They found that a model with 12 wind speed states (1 m/s size) can capture the shape of the probability density function and preserve the properties of the observed time series. Additionally, they concluded that a second-order Markov chain pro- duces better results. Nfaoui et al. [15] compared the limiting behavior of their Markov chain model with the data histograms gotten from hourly averaged wind speed and showed that the statistical characteristics were faithfully re- produced. Sahin and Sen [22] reported the use of a first-order Markov chain approach to simulate the wind speed, where: a) both transitions between consecutive times and within state wind speeds are sampled using an uni- form distribution; and, b) extreme states are sampled with an exponential distribution. They showed that statistical parameters were preserved to a significant extent; however, second-order Markov chain models could yield improved results. Although wind power can be computed from synthetic wind speed time series, Papaefthymiou and Kl¨ockl [16] show that a stochastic model using wind power leads to a reduced number of states and a lower Markov chain model order. They compared a Markov chain based method for the direct speed. Both the autocorrelation and the probability density function of the simulated data showed a good fit. Thus, they concluded that it is better to generate wind power time series. Chen et al. [7] also modeled wind power by using different discrete Markov chain models: the basic Markov model; the Bayesian Markov model, which considers the transition matrix uncertainty; and, the birth-and-death Markov model, which only allows state transitions between immediately adjacent states. After comparing the wind power au- tocorrelation function, the authors find the Bayesian Markov model best. Lopes et al. [13] proposed a Markov chain model using states that combine information about wind speed, direction and power. From the transition matrix, they compute statistics, such as the stationary power distribution and persistence of power production, which show a close agreement with their empirical analogues. The model was then used for the two-dimensional stochastic modeling of wind dynamics by Raischel et al. [21]. They aim at studying the interactions between wind velocity, turbine aerodynamics and controller action using a system of coupled stochastic equations describing the co-evolution of wind power and speed. They showed that both the de- terministic and stochastic terms of the equations can be extracted directly from the Markov chain model. The knowledge of wind power production persistence provides useful in- formation to run a wind park and to bid on the electricity market, since it provides information about the expected power steadiness. It can be seen as the average time that a system remains in a given state or a subset of states. Existent literature focuses mainly on wind speed persistence, which is used for assessing the wind power potential of a region. Persistence can be determined directly from the data [20, 19]; however, the presence of missing data leads to an underestimate of actual persistence. Alternative methods are based on wind speed duration curves [14, 10], the autocorrelation func- tion or conditional probabilities. Ko¸cak [11] and Cancino-Sol´orzano et al. [5] compare these techniques, and both conclude that wind speed duration curve yields the best results, i.e. results that follow the geographical and climatic conditions of the analyzed sites. Moreover, Cancino-Sol´orzano et al. [5] an- alyze the concept of “useful persistence”, which is the time schedule series where the wind speed is between the turbine cut-in and cut-out speed. The results gotten from this analysis coincide with the persistence classification obtained using the speed duration curves. In addition, Ko¸cak [12] suggests a detrended fluctuation analysis to detect long-term correlations and analyze [8] and Poje [19] proposed an approach based on the use of a power law or exponential probability distributions for the persistence of wind speed above and below a reference value. A Markov chain based method to derive the distribution of persistence is introduced by Anastasiou and Tsekos [1], who show its capability on wind speed data. Most methods in literature of wind speed and power synthesis fail to rep- resent diurnal patterns in the artificial data. However, these are relevant for energy system modeling and design, since their knowledge allows to plan and schedule better. For instance, a power production behavior that best matches demand needs smaller reserve capacity. Recently, Suomalainen et al. [26, 25] introduced a method for synthetic generation of wind speed scenarios that include daily wind patterns by sampling a probability distribution matrix based on five selected daily patterns and the mean speed of each day. Cara- pellucci and Giordano [6] adopt a physical-statistical approach to synthesize wind speed data and evaluate the influence of the diurnal wind speed profile on the cross-correlation between produced energy and electrical loads. The parameters of their model, such as diurnal pattern strength or peak hour of wind speed are determined through a multi-objective optimization, carried out using a genetic algorithm. This paper introduces a cyclic time-variant Markov model of wind power, speed and direction designed to consider the daily patterns observed in the data. The model can be used to synthesize data for the three variables and is capable of reproducing the daily patterns. Moreover, it allows to compute persistence statistics depending on the time of the day. The paper is organized as follows: Section 2 introduces the proposed model as an extension of the “regular” Markov chain model, which is then used for comparison. Furthermore it is shown, how to compute the time-of-the-day dependent persistence statistics directly from the Markov model transition matrices.
In section 3 the constrained convex optimization problem to get the model parameters is introduced and explained. It is applied to the analysis of a case-study based on real dataset, section 4. Since the model describes the joint statistics for wind power, speed and direction, Section 5 explains how to create synthetic time-series for these variables. Section 6 compares the synthesized data of both the time-variant and the time-invariant versions of the model. Moreover, it is shown how the persistence of power production varies through the daily cycle. Nomenclature α Initial state distribution at time step t = 0 β i,jµ Coefficients of the Bernstein polynomial modeling the transition proba- bility p i,j ( t ) A unit column vector of the same size as subset A P P · . . . · P T − A Subset of the state space, containing the states of interest for persistence S Set of observed state transitions S z Set of transitions observed in the data together with the scaled time of the day z at which they are observed ω Weight of the extra transitions added to the objective function π Stationary distribution of a time-invariant Markov chain π ∗ lim t →∞ P t π r Stationary distribution at time r of a time-variant cyclic Markov process π r ( j ) Stationary probability, of state j at time of the day r π r ( A ) Vector whose elements are the stationary probabilities of the states in the set A at time of the day r τ Persistence τ r Time-dependent persistence in a cyclic Markov process b µ,k ( z ) µ -th Bernstein basis polynomial of order k E [ ] Expected value operator P t t -th step transition matrix of a Markov process p i,j ( t ) t -th step transition probability of a Markov process avgi,j Daily average probability of transition from state s i to s j r t Remainder of time step t modulo T S Markov process state space s i i -th state of a Markov process T Period of a cyclic Markov process t Time step of a Markov process X t Markov process z Scaled time of the day π A Stationary probability distribution of the states in subset A r time of the day . Time-inhomogeneous Markov model A discrete finite Markov process { X t ∈ S, t ≥ } is a stochastic process on a discrete finite state space S = { s , s , ..., s n } , n ∈ N , whose future evolution depends only on its current state [9]. This Markov property is expressed mathematically by P r { X t +1 = s j | X t = s i ∧ X l ∈ S ∀ l = 0 , ..., t − } = P r { X t +1 = s j | X t = s i } . P r { X t +1 = s j | X t = s i } describes the probability of the Markov process moving to state s j at time step t + 1 given that it is in state s i and is called the t -th step transition probability, denoted as p i,j ( t ). Thus, for each time step t the Markov process has an associated transition probability matrix P t , a n by n matrix with entries [ P t ] i,j = p i,j ( t ) for all i, j ∈ { , . . . , n } . Each P t satisfies the following properties: p i,j ( t ) ≥ (cid:80) j p i,j ( t ) = 1 ∀ i, j ∈ { , . . . , n } , ∀ t . A Markov process is called cyclic with period T ∈ N , if T is the smallest number, such that p i,j ( mT + r ) = p i,j ( r ) for all m in N , 0 ≤ r < T [18]. Thus, a cyclic Markov process is described by T transi- tion matrices P r , r = 0 , ..., T −
1. The remainder of time step t modulo T will be denoted as r t and thus r t = r t + mT ∀ t, m ∈ N . If the transition probabilities are time-independent, i.e. p i,j ( t ) = p i,j , the process is called a (time-homogeneous) Markov chain and its probability ma- trix P ∈ R n +1 × n +1 is given by [ P ] i,j = p i,j . By analogy to the time-dependent case it holds that p i,j ≥ (cid:80) j p i,j = 1 ∀ i, j ∈ { , . . . , n } . The probability of reaching a state s j from a state s i in l time steps is given by P l ( i, j ), i.e. the l -th power of the transition matrix P . If a state s j can be reached from a state s i in a finite number of time steps and vice versa, i.e. ∃ l ∈ N P l ( i, j ) ≥ ∧ P l ( j, i ) ≥
0, the states s i and s j communicate. All states that communicate with each other are said to be in the same communication class. If all states of a state space are in the same communication class, i.e. if it is possible to reach every state from any other state in a finite number of time steps, the corresponding transition matrix P is called irreducible. .2.2. Cyclic time-variant Markov process A cyclic Markov process with period T is described by T transition ma- trices P r , one for each time of the day r = 0 , ..., T −
1. The probability of the process reaching state s j from state s i in l time steps at time t = 0 is given as ( P · . . . · P T − ) m · P · . . . · P r l ( i, j ) with l = mT + r l . For an arbitrary time-step t , the formula must be multiplied from the left with the term P r t · . . . · P T − . Thus, the Markov process is irreducible, if the matrix P = P · . . . · P T − is irreducible, i.e. if ∃ l ∈ N P l ( i, j ) ≥ ∀ i, j . If a Markov chain is irreducible and aperiodic then the long-term statisticsof a Markov chain are described by the stationary probability distribution: π = lim t →∞ α P t . The distribution is independent of the initial distribution α and satisfies the balance equation π = πP . By the Perron-Frobeniustheorem it can be computed as the normalized eigenvector corresponding tothe eigenvalue 1 of the transition matrix [17].In the case of the cyclic time-inhomogeneous Markov process there is also astationary distribution π r , for all r < T . It can be interpreted as the limitingdistribution of the Markov process considering only the datapoints sampledat time of the day r . If the matrix P is irreducible, i.e. if ∃ π ∗ , such that π ∗ = lim t →∞ α · P t and the process is aperiodic, the stationary distribution π r exists and is given by π ∗ · P · . . . · P r − , since it satisfies the balanceequation 1: π r = π ∗ · P · . . . · P r − π r = π ∗ · P · P · . . . · P r − π r = π ∗ · P · . . . · P r − · P r · P r +1 · . . . · P T − · P · . . . · P r − π r = π r · ( P r · P r +1 · . . . · P T − · P · P · . . . · P r − ) . (1) The persistence of a given state s i is related with the number of steps the system consecutively remains in this state. In the time-homogeneous case, it follows a geometric distribution with expected value (1 − p i,i ) − and is denoted by τ . Anastasiou and Tsekos [1] showed that it is possible to determine the expected time that a Markov chain stays consecutively inside a given subset of states using a simple closed-form expression. For example, in wind power applications, a typical subset of interest could contain all states this estimate, the states are renumbered, s.th. they can be partitioned into two disjoint subsets: A = { s ν , ..., s n } containing the states of interest; and A = { s , ..., s ν − } , its complement. Then, the transition matrix is rearranged into the following block structure: P = (cid:18) A BC D (cid:19) = p , · · · p ,ν − p ,ν · · · p ,n ... ... ... ... p ν − , · · · p ν − ,ν − p ν − ,ν · · · p ν − ,n p ν, · · · p ν,ν − p ν,ν · · · p ν,n ... ... ... ... p n, · · · p n,ν − p n,ν · · · p n,n , where the first and last block of rows and columns correspond to the states in subset A and A , respectively. The expected value of persistence, i.e. the expected number of time steps the Markov process consecutively remains in the subset A once it is entered, is given by: E { τ } = π A A π A C B , where π A is the stationary probability distribution of the states in subset A and A is the unit column vector of size ( n − ν + 1) × For the time-inhomogeneous case, persistence τ t is defined as the number of time steps the Markov process is expected to remain in a state (set of states), once it is entered at time t . For a cyclic Markov process, the persistence τ t is equal for all t that are congruent modulo T . Thus, it is only necessary to compute the persistence for τ r , r = 0 , ..., T −
1. This can be achieved by adapting the derivation of equation 2.4, provided by Anastasiou and Tsekos [1], to time-variant cyclic Markov processes.
After renaming, s.th. the subset of interest is A = { s ν , ..., s n } , the states of each transition matrix P r are rearranged as in equation 2.4. r = (cid:18) A r B r C r D r (cid:19) = p , ( r ) · · · p ,ν − ( r ) p ,ν ( r ) · · · p ,n ( r )... ... ... ... p ν − , ( r ) · · · p ν − ,ν − ( r ) p ν − ,ν ( r ) · · · p ν − ,n ( r ) p ν, ( r ) · · · p ν,ν − ( r ) p ν,ν ( r ) · · · p ν,n ( r )... ... ... ... p n, ( r ) · · · p n,ν − ( r ) p n,ν ( r ) · · · p n,n ( r ) , The probability of τ r to be equal to l is given as: P r ( τ r = l ) = P r ( X r ∈ A , ..., X r + l ∈ A , X r + l +1 / ∈ A | X r ∈ A , X r − / ∈ A )= (cid:88) i ∈A P r ( X r = i | X r ∈ A , X r − / ∈ A ) · P r ( X s ∈ A , r < s ≤ l, X r + l +1 / ∈ A | X r = i )= (cid:88) i ∈A (cid:88) k / ∈A (cid:88) j ∈A ˜ π r ( i ) · P i,j ( r, l − , A ) · p j,k ( r + l ) (2)with P i,j ( r, l, A ) = P r ( X r + l = j, X k ∈ A , < k < l | X r = i )= D r · . . . · D r + l − · C r + l · A and ˜ π r ( i ) = P r ( X r = i | X r ∈ A , X r − / ∈ A )= (cid:80) j / ∈A P r ( X r = i | X r − = j ) · P r ( X r − = j ) (cid:80) i ∈A (cid:80) j / ∈A P r ( X r = i | X r − = j ) · P r ( X r − = j )= (cid:80) j / ∈A π r − ( j ) p j,i ( r − (cid:80) k ∈A (cid:80) j / ∈A π r − ( j ) p j,k ( r − , i ∈ A , (3)where π r ( j ) is the long term probability of occurrence (stationary proba- bility) of state j at time of the day r ; also note that π t ( j ) = π r ( j ) for t = mr , ∀ m ∈ N . Equation 3 can be rewritten in the matrix form to include all states in the subset A : π r ( A ) = π r − ( A ) · B r − π r − ( A ) · B r − · A , where A is a unit vector of dimension ( n − ν + 1) × π r − ( A ) is a vector of dimensions 1 × ν , whose elements are the stationary probabilities of the states in the set A at time of the day r − Thus, equation 2 can be rewritten as:
P r ( τ r = l ) = ˜ π r ( A ) · D r · . . . · D r + l − · C r + l · A = π r − ( A ) · B r − π r − ( A ) · B r − · A · D r · . . . · D r + l − · C r + l · A . The expected value of persistence at time r can then be derived as: E ( τ r ) = ∞ (cid:88) l =1 l · π r − ( A ) · B r − π r − ( A ) · B r − · A · D r · . . . · D r + l − · C r + l · A Making use of the cyclicity of the Markov process, this can be expressed as: E ( τ r ) = ∞ (cid:88) l =1 l · π r − ( A ) · B r − π r − ( A ) · B r − · A · D m · D r · . . . · D r + r l − · C r + r l · A where D = D r · . . . · D T · D T +1 · . . . · D r − and l = mT + r l . It can be seen that the sum converges after splitting it into T partial sums, one for each time of the day r . For each partial sum, the only term not constant is the matrix power D m , which converges because all D eigenvalues are smaller than 1. The infinite sum for the expected value of persistence at time r can be approximated to an arbitrary degree of accuracy (cid:15) by defining f l = l · π r − ( A ) · B r − π r − ( A ) · B r − · A · D m · D r · . . . · D r + r l − · C r + r l · A . and successively adding f l , l = 0 , , ..., L until the difference between two consecutive sums is smaller than (cid:15) , i.e. until | f L | < (cid:15) . . Parameter estimation The common approach to estimate the Markov chain transition matrix P is through the optimization of a constrained maximum likelihood func- tion, which describes the realization probability of a given dataset [2]. For a sequence of M states { X = s i , ..., X M = s i M } with s i , ..., s i M ∈ S and i , ..., i M ∈ { , ..., n } , its probability can be computed as P r { X = s i } p i ,i p i ,i · . . . · p i M − ,i M . Since the term P r { X = s i } is constant, given a set of observed state transitions S , it is possible to estimate P by maximizing the likelihood OF = (cid:89) ( i,j ) ∈S p i,j , (4)where a transition is described by an ordered pair ( i, j ) indicating the ori- gin and the destination of the transition. In practice, instead of maximizing OF with respect to the p i,j variables it is preferable to minimize the nega- tive log-likelihood function, i.e. − log( OF ), since it transforms the original mathematical programming problem into an equivalent one that is convex and, thus, has a unique solution [4]. The overall optimization problem is formulated as follows: min − (cid:88) ( i,j ) ∈S log( p i,j )subject to p i,j ≥ ∀ i, j = 0 , ..., n (cid:88) j p i,j = 1 ∀ i = 0 , ..., n The constraints ensure non-negativity of the transition probabilities and that they sum up to 1 for each row of the transition matrix.
The goal of this time-variant Markov process is to get a model that accu- rately reproduces the long-term behavior while considering the daily patterns observed in the data. Thus, the proposed objective function combines two maximum likelihood estimators: the first term maximizes the likelihood of hood of the time-dependent probability. The final optimization problem is transformed into a convex one using the negative logarithm of the objective function. This section provides a detailed description of the objective func- tion, the parametrization of the time-variant probability functions, and the constraints that must be added to the optimization problem to ensure its
Markov properties.
The transition probabilities are considered to be time-variant and cyclic with a period of T , i.e. for each time of the day r (= 0 , ..., T −
1) there is a different transition matrix P r . In this paper, the time-dependent transition probabilities p i,j ( r ) are modeled by Bernstein polynomials. This has several advantages: a) a polynomial representation of the transition probabilities leads to a convex objective function and constraints, i.e. the optimization problem has a unique solution; b) a polynomial representation allows to de- crease the number of variables in the optimization problem: for each transi- tion, instead of T variables only k + 1 are needed for a k order polynomial; c) Bernstein polynomials are non-negative, which simplifies probability model- ing, when compared to other polynomial bases; and d) they have the convex hull property, which, combined with de Casteljau algorithm, allows to easily write probability boundary conditions.
Bernstein polynomials are linear combinations of Bernstein basis polynomi- als b µ,k ( z ), z ∈ [0 , k + 1 Bernstein basis polynomials of order k are defined as: b µ,k ( z ) = (cid:18) kµ (cid:19) z µ (1 − z ) k − µ with µ = 0 , ..., k and (cid:18) kµ (cid:19) the binomial coefficient. Thus, the transition probabilities p i,j ( z ) are described by p i,j ( z ) = k (cid:88) µ =0 β i,jµ b µ,k ( z ) , with β i,jµ ∈ R and z = rT , since the polynomial variable has to be scaled, s.th. it is between 0 and 1. abilities given the data, the objective function must consider the time of the day z when the transition happens. Therefore, the objective function intro- duced in (4) becomes (cid:80) ( i,j ) z ∈S z log( p i,j ( z )), where S z is the set of observed transitions together with the time z when they happens. This objective func- tion allows to compute the intra-cycle transition probability functions, and thus to represent the daily patterns present in the data. A second term is added to this function, namely (cid:80) ( i,j ) ∈S log( p avgi,j ), where S is the set of transitions observed in the data as defined in section 3.1 and p avgi,j is the cycle-average (daily) probability of transition from state s i to s j . It can be computed as follows: p avgi,j = 11 − (cid:90) p ij ( z ) dz = (cid:90) k (cid:88) µ =0 β i,jk b µ,k ( z ) dz = k (cid:88) µ =0 β i,jk (cid:90) b µ,k ( z ) dz = 1 k + 1 k (cid:88) µ =0 β i,jk This second term is the maximum likelihood estimator for the daily av- erage probability and its addition to the objective function increases the consistency of the long-term behavior of the Markov process with the data.
Therefore, the overall objective function OF is given by: OF = − (cid:88) ( i,j ) ∈S log( p avgi,j ) − (cid:88) ( i,j ) z ∈S z log( p i,j ( z ))= − (cid:88) ( i,j ) ∈S log( 1 k + 1 k (cid:88) µ =0 β i,jk ) − (cid:88) ( i,j ) z ∈S z log( k (cid:88) µ =0 β i,jµ b µ,k ( z ))and minimization is performed with respect to the coefficients β i,jµ (model parameters). The estimation of the model parameters requires the transition probabil- ity functions to comply with several constraints, to ensure: • C - and C -continuity at z = 0, row-stochasticity of the transition matrices at every time of the day z and • that the transition probability functions are non-negative and bounded by 1. Thus, to complete the specification of the optimization problem this section explains all the necessary constraints required for the model parameters to describe a cyclic Markov process.
Periodicity
The transition probability functions are modeled using Bernstein polyno- mials, which are smooth, i.e. C ∞ -continuous functions. In general, the values at both ends of their domain (0 and 1) need not be equal. Thus, to avoid sudden changes in the value and slope of each probability function between the cycles, two constraints are added to ensure C and C -continuity. An- other reason is the arbitrariness of the cycle starting position, which affects the position of the discontinuity if these conditions are not used. The first constraint is p i,j (0) = p i,j (1). Since b µ,k (0) = δ µ, and b µ,k (1) = δ µ,k the constraint can be reformulated as β i,j = β i,jk , where δ is the Kronecker delta. The second constraint is added to ensure C -continuity, i.e. dp i,j dz (0) = dp i,j dz (1). The first derivative of a Bernstein basis polynomial can be written as a combination of two polynomials of lower degree: db µ,k dz ( z ) = k ( b µ − ,k − ( z ) − b µ,k − ( z ))Thus, the first derivative of a transition probability p i,j ( z ) is given by: dp i,j dz ( z ) = k ( k (cid:88) µ =1 ( β i,jµ − β i,jµ − ) b µ − ,k − ( z ) − β i,jk b k,k − ( z ))Hence, using b µ,k (0) = δ µ, and b µ,k (1) = δ µ,k as well as the first constraint β i,j = β i,jk ∀ i, j = 0 , ..., n , the constraint dp i,j dz (0) = dp i,j dz (1) reduces to the following linear constraint: β i,jk = β i,j = 0 . β i,j + β i,jk − ) Row stochasticity of transition matrices
15o ensure row stochasticity of the time-variant transition matrices, it is necessary to ensure that (cid:80) j p i,j ( z ) = 1 for all i and z . Since the Bernstein basis polynomials of order k form a partition of unity, i.e. k (cid:88) µ =0 b µ,k ( z ) = 1the constraint can be re-written as a linear combination of the polynomial coefficients: (cid:88) j p i,j ( z ) = 1 ⇔ (cid:88) j k (cid:88) µ =0 β i,jµ b µ,k ( z ) = 1 ⇔ (cid:88) j β i,jµ = 1 Non-negative transition probabilities are bounded by 1
The most straightforward way to implement this constraint is to add two inequalities for each time of the day and each transition probability p i,j , i.e. p i,j ( z ) ≥ ∀ i, j, zp i,j ( z ) ≤ ∀ i, j, z (5)However, this constraint significantly increases the problem size, since it requires 2 · T · n inequalities. An alternative constraint can be formulated by using the convex hull property of the Bernstein polynomials. This constraint makes the overall optimization problem size smaller, but is more restrictive. Every Bernstein polynomial (cid:80) kµ =0 β µ b µ,k ( z ) always lies in the convex hull defined by its control points ( kµ , β µ ), µ = 0 , ..., k . Thus the constraint ≤ p i,j ( z ) ≤ ∀ i, j = 0 , ..., n can be reformulated in terms of the polynomial coefficients as ≤ β i,jµ ≤ ∀ µ = 0 , ..., k (6)Since constraint 6 is a sufficient but not necessary condition for con- straint 5, the reformulation leads to a more restrictive overall minimization when compared with the problem with original constraint 5. The convex hull bound of Berstein polynomials can be tightened by subdivision, i.e. by subdividing the domain in two regions and finding new control points β i,j (1) , ..., β i,jk (1) and β i,jk +1 (1) , ..., β i,j k (1) such that the function output re- mains unchanged. With each subdivision, the control points form a tighter bound around the polynomial and thus the polynomial coefficients can as- sume values in a wider range. The new control points represent the polyno- mial restricted to the two sub-intervals [0 , z ∗ ] and [ z ∗ , z ∗ ∈ [0 , is the cutting point of the division. For simplicity, z ∗ is fixed to 0.5 in all transition probabilities. The new control points can be determined by linear combinations of the original control points β i,j , ..., β i,jk . This can be performed efficiently using de Casteljau algorithm, which in matrix form is given as: β i,j (1)... β i,jk (1) = b , ( z ∗ ) 0 · · · b , ( z ∗ ) b , ( z ∗ ) · · · b ,k ( z ∗ ) b ,k ( z ∗ ) · · · b k,k ( z ∗ ) β i,j ... β i,jk = C l β i,j ... β i,jk = C l · β ij (7)and β i,jk +1 (1)... β i,j k (1) = b ,k ( z ∗ ) b ,k ( z ∗ ) · · · b k,k ( z ∗ )0 b ,k − ( z ∗ ) · · · b k − ,k − ( z ∗ )... ... . . . ...0 · · · b , ( z ∗ ) β i,j ... β i,jk = C r β i,j ... β i,jk = C r · β ij (8)The subdivision can be applied recursively to further improve the convex bound around the polynomial. The corresponding coefficients are computed by applying equations 7 and 8 to the left and right coefficient vectors. Defin- ing C = ( C l , C r ) T and I z as the identity matrix of dimension z × z , the coefficients β i,j ( w ) = ( β i,j ( w ) , ..., β ij w k ( w )) after w subdivisions can be ob- tained by: β i,j ( w ) = ( C ⊗ I w − ) · ( C ⊗ I w − ) · . . . · ( C ⊗ I ) · ( C ⊗ I ) · β i,j where ⊗ denotes the Kronecker product. The number of inequalities needed for the implementation of this constraint is ( k + 1) · ω +1 · n . Thus, subdivisions ω such that ( k + 1) · ω +1 ≤ T . The overall optimization problem to be solved for the estimation of the transition probability coefficients β i,jµ can be written as: min β i,jµ − (cid:88) ( i,j ) ∈S log( 1 k + 1 k (cid:88) µ =0 β i,jµ ) − (cid:88) ( i,j ) z ∈S z log( k (cid:88) µ =0 β i,jµ b µ,k ( z )) (9)subject to β i,j = β i,jk ∀ i, j = 1 , ..., n (10) β i,j = 0 . β i,j + β i,jk − ) ∀ i, j = 0 , ..., n (11) (cid:88) j β i,jµ = 1 ∀ i = 0 , ..., n ; ∀ µ = 0 , ..., k (12) β i,j ( w ) ≤ ∀ i, j = 0 , ..., n (13)0 ≤ β i,j ( w ) ∀ i, j = 0 , ..., n (14)where w is the number of subdivisions and k is the order of the Bern- stein polynomials, which have to be specified. The objective function (9) is a combination of two negative log-likelihood functions to ensure the Markov process captures both the daily patterns and the long-term behavior of the original data. The optimization is performed with respect to several con- straints: constraints (10) and (11) ensure C - and C -continuity at z = 0. The row-stochasticity of the transition matrix is ensured by constraint (12).
The last two constraints (13) and (14) bound the transition probabilities between 0 and 1.
It is expected that the objective function decreases with the polynomial order and the number of subdivisions. The parameters of the Markov chain model β i,jµ are estimated by solving the optimization problem using a rigor- ous numerical solver. The model was formulated making use of the casadi computation framework [3] and the optimization was performed by ipopt, a nonlinear interior-point solver [27], which ensures convergence to the global optimum in the case of convex optimization problems. . Application of the cyclic Markov process to wind turbine mod- eling The data for this study was obtained from a wind power turbine in a wind park located in a mountainous region in Portugal. The time series consists of a three-year period (2009-2011) of historical data gotten from the turbine data logger. The sampling time of 10 minutes leads to 144 samples each day. The data-set comprises three variables, wind power, speed and direction (nacelle orientation). The wind speed information was collected from the anemometer placed in the wind turbine hub. Due to confidentiality, wind power and speed data values are reported as a fraction of the rated power and the cut-out speed, respectively.
Discrete Markov chain models require the definition of the states when applied to describe continuous variables. This work proposes to characterize the wind turbine states using three different variables: wind power, speed and direction. As such, each state is defined by all the points inside a polyhedron in three-dimensional space.
Figure 1: Representation of all data points projected into the: a) wind direc-tion and speed plane (left); and, b) wind power and speed plane (right). Eachrectangle is the projection of a state polyhedron into the two planes. Overall,they define the final state partition for the three-dimensional variable space.Fig. 1 presents all data observations and the state partitions projected into: a) the wind direction and speed plane; and, b) the wind power and and speed plane define the characteristic power curve of the wind turbine.
It shows the three operational regions of a wind turbine: a) below the cut-in speed no power is produced; b) between cut-in and rated wind speed the power increases proportionally to the cube of wind speed; c) at wind speeds between the rated and the cut-out wind speed, the turbine control system limits the power output to a constant value. In the wind direction and speed plane, data is widely scattered and shows the dominant wind patterns at the site. Three accumulation regions can be identified: one for low wind speeds, centered on 0 .
25, which is the mode of the wind speed, and two defining the dominant wind directions around 100 ◦ and 300 ◦ . The data space is discretized unevenly to get a good resolution of the high- slope region of the power curve. In a previous work [13], this partition was used in a time-homogeneous Markov chain and proved to lead to an accurate representation of the original data. The wind direction and power are divided by an equally spaced grid leading to 12 ( { d , ..., d } ) and 20 ( { p , ..., p } ) classes, respectively. The wind speed is divided as follows: values below the cut-in speed define one class sp ; between the cut-in and rated wind speed the discretization is narrowed by selecting 10 classes ( { sp , ..., sp } ); and be- tween the rated and cut-out wind speed discretization is widened and 4 classes ( { sp , ..., sp } ) are defined. Data points with wind speed above the cut-out wind speed are discarded. The complete state set is constructed by listing all possible combinations of the classes of each variable. Due to physical constraints between the variables, most of the states are empty (fig. 1(left)) and can are discarded. This reduces the number of states from 3840 to 778, for this turbine. The solution of the optimization problem described in section 3 comprises a set of transition matrices P r , r ∈ { , ..., } . However, the constraints in the optimization problem definition do not force the matrix P = P · . . . · P to be irreducible and thus the Markov process to have a single communication class. So, during data synthesis, the Markov process can get “trapped” within a communication class. To induce the Markov process to have a single communication class, additional transition counts are introduced into the data. The goal is to add a small set of transitions to promote state connectivity without distorting the original data. Thus, the set is composed of transitions that connect neighboring states in the state space, since those For a state s i = ( p l , sp p , d q ), its neighborhood V is defined as V ( s i ) := { ( p ll , sp pp , d qq ) : ll ∈ { l − , l, l +1 } , pp ∈ { p − , p, p +1 } , qq ∈ { q − , q, q +1 }}\ s i . (15) It should be noted that, unlike power and speed, direction is a circular variable, e.g. states d and d are considered neighbors. If a neighbor state s j ∈ V ( s i ) is present in the dataset, a transition ( i, j ) is added to the set of extra transitions S E . For this dataset, originally consisting of 150601 transitions, 13610 transitions are added. The extra transitions must be considered to happen at an unknown time of the day z . Thus, they can only be accounted for in the objective function term without time information, i.e. only in the time-variant part of the objective function. This directly affects the values for p avgi,j and, indirectly, the model parameters. Since the aim is to cause a minimal impact on the transition probabilities, the additional term is weighed by a factor ω < to directly control its influence. In this work it is fixed to 0.05. Thus, the following term is added to the objective function: − ω · (cid:88) ( i,j ) ∈S E log( p avgi,j ) = − ω · (cid:88) ( i,j ) ∈S E log( 1 k + 1 k (cid:88) µ =0 β i,jk ) (16)Although the use of the extra transition set does not ensure the time- variant Markov process to have a single communication class, results show a decrease of the number of communication classes from 13 to 1 in this dataset.
5. Simulation of wind power, speed and direction time series
To simulate wind power, speed and direction time series the method de- scribed by Sahin and Sen [22] is adapted to the cyclic time-variant Markov model as follows. First, the cumulative probability transition matrices P cumr with P cumr ( i, j ) = (cid:80) jk =0 p i,k ( r ) are computed. Then an initial state s i , i.e. X = s i , is randomly selected. A new datapoint X t +1 is generated by uni- formly selecting a random number (cid:15) between zero and one. The correspond- ing state s new ( X t +1 = s new ) is chosen such that the probability of reaching it from the current state s i is bigger than (cid:15) , i.e. such that P cumr t ( i, new ) ≥ (cid:15) . Based on this discrete state sequence, a real value for the wind power/speed/direction variables is generated by sampling each state partition uniformly. . Results and discussion The wind power, speed and direction time-series clearly show a daily time-dependent behavior.
Figure 2: Two-dimensional histograms of the original time series data: speed-time (top left), direction-time (top right) and power-time (bottom left). Thesubfigure on the bottom right shows the p-value of the Kolmogorov-Smirnovtest used to compare the wind speed distribution on the different times ofthe day.Figure 2 shows that, on average, the turbine does not produce power between 10 am and 3 pm. In this time interval, low wind speeds (0.1 - 0.25) are the most likely events. There are two dominant wind directions: around ◦ and 300 ◦ . Moreover, they occur at specific times of the day; between 5 and 10am, the wind typically blows from the 100 ◦ direction, the rest of the day from 300 ◦ .
22o assess whether these two dominant directions might be due to sum- mer/winter seasonality, the dataset was divided in two subsets, one covering the period from April to September and the other from October to March.
The histogram analysis shows that both, summer and winter subset, have the same two dominant directions (figures not shown). Thus, it was concluded that the time-dependent pattern is not induced by this seasonality.
Figure 2 bottom-right plot shows the p-values of the Kolmogorov-Smirnov test applied to the wind speed distributions at different times of the day. The
Kolmogorov-Smirnov test is a nonparametric test for the equality of con- tinuous one-dimensional probability distributions. Thus, the high p-values around the diagonal illustrates that wind speed distributions for consecutive times of the day are similar. The same holds for wind speeds in the morning and evening, i.e. before 9am and after 4:30pm. The wind speed distribution between 10am and 3pm is clearly different.
The model introduced in section 3.2 has two parameters that need to be defined: k , the order of the Bernstein polynomials used to model the transition probabilities; and w , the number of subdivisions used to tighten the convex hull that bounds the polynomials. To choose proper values for these parameters, different models were computed by varying k = 4 ... and w = 0 ...
3. For each model, synthetic data was generated following the procedure described in section 5 and compared with the real dataset. ria: the objective function value, the daily average Jensen-Shannon distance between original and synthesized wind direction data, the number of inequal- ities in the problem formulation and the CPU time spent in IPOPT solving the optimization problem. The Jensen-Shannon distance is the square root of the Jensen-Shannon divergence d js , which, for two discrete probability distributions q and q is defined as: d js = 12 (cid:88) i q ( i ) q ( i ) · q ( i ) + 12 (cid:88) i q ( i ) q ( i ) · q ( i ) . Comparing the models using the objective function value (figure 3 top left) shows a decrease of the objective function as the model order and the of subdivisions is higher for models with higher polynomial order. Moreover, the first subdivision has the highest impact since it leads to the highest decrease of the objective function value. The daily average of the Jensen-
Shannon distance (figure 3 top right) decreases with the polynomial order until sixth order. The same behavior can be observed for the number of subdivisions: until the sixth order, the Jensen-Shannon distance decreases with the number of subdivisions. The number of inequality constraints in the optimization problem as well as the number of CPU seconds spent in the solver show the expected behavior (figure 3 bottom). They increase linearly with the polynomial order and exponentially with the number of subdivisions.
Based on these observations, a basis order of 6 with 2 subdivisions was chosen as the best trade-off between an accurate representation of the average daily patterns and computational costs.
This section compares the main statistical properties derived from the original data with the ones derived from the data generated by the time- variant Markov model.
Figure 4: Comparison of the probability distribution of wind power (left),wind speed (middle) and wind direction (right) of the original with the syn-thesized data.Figure 4 compares the wind power (left), speed (middle) and direction (right) distribution of the original with the synthetic data generated using the Markov model. In general, the distributions are in close agreement. The wind power distribution is bimodal, with the modes located at the minimum and maximum power. It shows that the intermediary power levels are rather rare, for instance, the states corresponding to a power production between wind speeds (Weibull distribution). The wind direction distribution is bi- modal with the two modes at 100 and 300 degrees, which are the prevailing wind directions at the turbine site (figure 2).
Figure 5: Two dimensional power-time histograms of the original (left) andsynthetic (right) time-series data.Figure 5 shows two plots: on the left, the empirical 2D distribution of the wind power and direction computed from the data and, on the right, the same distribution computed using the data generated by the Markov model. Its comparison shows that the model captures the joint statistics for the wind power and direction from the data. It is possible to see the two dominant directions associated with high wind power production, namely the sectors from 100 to 120 and from 290 to 320 degree. Figures 2 and 5 clearly demonstrate the capability of this Markov model to capture the com- bined characteristics of the wind power, speed and direction. The long-term behavior of the model is close to what is observed in the dataset.
As shown in section 6.1, the original data clearly exhibits a time-dependent behavior. To test, if the time-dependent Markov model can capture it, syn- thetic data was generated and the histograms compared to the ones of the original data. Moreover, to obtain a comparison with the “regular” way of data synthesis with Markov models, data was also generated from the time- invariant Markov chain. variant Markov model is capable of reproducing the time-variant behavior of the data. Figure 6(second column) presents the results of using a time- invariant Markov chain model, i.e. by using constant transition probability during the daily cycle.
The time-dependent Markov model allows to compute the persistence of power-production depending on the time of the day. Figure 6 shows the time-dependent persistence of power production for different power levels ( i · . · p max , for i = 1 , ..., power production levels: a) persistence of useful power production (PUPP) defined as above 0 . · p max , i.e. the power level corresponding to the wind speed mode at the turbine site; and, persistence for high power production (PHPP), i.e. above > . · p max . It can be seen, that the higher the power level, the lower the persistence. Moreover, for all power levels, persistence is minimal between 5 and 10 am. PHPP is fairly constant throughout the day (dark line), the maximal differences are between 10 and 30 minutes, whereas PUPP reaches a maximum at around 5 pm (white line). i · . · p max ,for i = 1 , ...,
19. The lines highlight the time-dependent persistence, for twoconditions: a) in white, for power production above 0 . · p max (PUPP); and,b) in black, the power production above 0 . · p max (PHPP).Since the data shows two different dominant directions (figure 4), figure 8 presents the persistence of power production conditioned to each dominant direction, i.e. for the direction sectors from 90 ◦ to 180 ◦ and 270 ◦ to 360 ◦ . i · . · p max ,for i = 1 , ...,
19 for direction sectors 90 ◦ -180 ◦ (left) and 270 ◦ -360 ◦ (right).As expected, the persistence in both direction sectors is lower than the unconditioned persistence. For wind directions in the sector 90 ◦ -180 ◦ , all levels of power production have a minimum persistence between 80 and 100 minutes at around 1 pm. Maximum persistence is around midnight varying between 220 minutes (PUPP) and 140 minutes (PHPP). It can be seen that for power levels below 50% of maximum production the time of day depen- dency of persistence is very similar. For power levels above 50% persistence decreases as power production increases. However, the persistence variability with the power level is rather low, for example, maximum persistence at a level of 75% is almost 180 minutes whereas for a level above 0.05% is 200 minutes. For wind directions in the sector 270 ◦ -360 ◦ , it shows that, for all power lev- els, the curves for both PUPP and PHPP are similar, i.e. their minima and maxima are located around the same time of the day. For instance, maximal persistence of production is reached at around midday. However, for this direction sector, the higher the power production level, the lower the persis- tence. For power production above 0.05% of maximum power the persistence is 250 minutes, persistence of production above 75% of maximum power is only 100 minutes. Comparing with the other dominant direction, it can be seen, that they have very different persistence behavior. The maxima and minima are at different times of the day for every power level. The persistence increases as power production decreases, for all power levels in the case of the 270 ◦ -360 ◦ sec- tor. For the 90 ◦ -180 ◦ sector it decreases only until 50% of maximum power
7. Conclusions
This paper presents an inhomogeneous Markov process to model wind power production. It is developed using states, which combine information about the wind speed, direction and power variables, using real data recorded by a wind turbine in Portugal. The joint partition of the three-dimensional variable space allows to decrease the number of the model states and, si- multaneously, encodes the wind power curve into the Markov chain model.
The transition probabilities are considered to be functions that depend on the time of the day and modeled as Bernstein polynomials. The estimation of the transition matrices is performed by solving a constrained convex optimiza- tion problem. Its objective function combines two log-likelihood functions with the purpose to accurately represent both the long-term behavior and the daily fluctuations seen in the original data. To evaluate the statistical properties of the estimated Markov model, synthetic time-series are gener- ated and compared with the original data statistics. Results demonstrate that the proposed Markov model can reproduce the diurnal patterns in the data. Moreover it is demonstrated how the persistence of power production throughout the time of the day can be estimated from the Markov process transition matrices.
Acknowledgments
The authors thank the Funda¸c˜ao para a Ciˆencia e a Tecnologia for fi- nancial support (SFRH/BD/86934/2012, FCOMP-01-0124-FEDER-016080 (PTDC/SENENR/1141718/2009)) and GENERG, SA. eferences [1] Anastasiou, K., Tsekos, C., 1996. Persistence statistics of marine envi- ronmental parameters from markov theory, part 1: analysis in discrete time. Applied Ocean Research 18 (4), 187 – 199. [2] Anderson, T. W., Goodman, L. A., 1957. Statistical inference about markov chains. The Annals of Mathematical Statistics 28 (1), 89–110. [3] Andersson, J., Houska, B., 2010. Towards a Computer Algebra System with Automatic Differentiation for use with Object-Oriented modelling languages. Object-Oriented Modeling Languages. [4] Boyd, S. P., Vandenberghe, L., 2004. Convex optimization. Cambridge Univ Pr. [5] Cancino-Sol´orzano, Y., Guti´errez-Trashorras, A. J., Xiberta-Bernat, J., assessing the best site for a wind farm in the state of Veracruz, Mexico.
Renewable Energy 35 (12), 2844 – 2852. [6] Carapellucci, R., Giordano, L., 2013. The effect of diurnal profile and seasonal wind regime on sizing grid-connected and off-grid wind power plants. Applied Energy 107 (0), 364 – 376. [7] Chen, P., Berthelsen, K., Bak-Jensen, B., Chen, Z., 2009. Markov model of wind power time series using Bayesian inference of transition matrix.
In: Industrial Electronics, 2009. IECON ’09. 35th Annual Conference of
IEEE. pp. 627 –632. [8] Corotis, R. B., Sigl, A. B., Klein, J., 1978. Probability models of wind velocity magnitude and persistence. Solar Energy 20 (6), 483 – 493. [9] Kemeny, J. G., Snell, J. L., 1976. Finite Markov Chains. New York :
Springer-Verlag. [10] Ko¸cak, K., 2002. A method for determination of wind speed persistence and its application. Energy 27 (10), 967 – 973. [11] Ko¸cak, K., 2008. Practical ways of evaluating wind speed persistence.
Energy 33 (1), 65 – 70. records using detrended fluctuation analysis. Energy 34 (11), 1980 – [13] Lopes, V. V., Scholz, T., Estanqueiro, A., Novais, A. Q., 2012. On the use of Markov chain models for the analysis of wind power time- series. In: Environment and Electrical Engineering (EEEIC), 2012 11th
International Conference on. pp. 770 –775. [14] Masseran, N., Razali, A., Ibrahim, K., Zin, W. W., 2012. Evaluating the wind speed persistence for several wind stations in peninsular Malaysia.
Energy 37 (1), 649 – 656. [15] Nfaoui, H., Essiarab, H., Sayigh, A., 2004. A stochastic Markov chain model for simulating wind speed time series at Tangiers, Morocco. Re- newable Energy 29 (8), 1407 – 1418. [16] Papaefthymiou, G., Kl¨ockl, B., 2008. MCMC for wind power simulation.
Energy Conversion, IEEE Transactions on 23 (1), 234 –240. [17] Pillai, S., Suel, T., Cha, S., 2005. The perron-frobenius theorem: some of its applications. Signal Processing Magazine, IEEE 22 (2), 62–75. [18] Platis, A., Limnios, N., Le Du, M., 1998. Dependability analysis of systems modeled by non-homogeneous Markov chains. Reliability Engi- neering and System Safety 61 (3), 235 – 249. [19] Poje, D., 1992. Wind persistence in Croatia. International Journal of
Climatology 12 (6), 569–586. [20] Pryor, S., Barthelmie, R., 2002. Statistical analysis of flow character- istics in the coastal zone. Journal of Wind Engineering and Industrial
Aerodynamics 90 (3), 201 – 221. [21] Raischel, F., Scholz, T., Lopes, V. V., Lind, P. G., 2012. Uncovering wind turbine properties through two-dimensional stochastic modeling of wind dynamics. Unpublished results, arXiv:1210.7161. [22] Sahin, A. D., Sen, Z., 2001. First-order Markov chain approach to wind speed modelling. Journal of Wind Engineering and Industrial Aerody- namics 89 (34), 263 – 269.
First and second order markov chain models for synthetic generation of wind speed time series. Energy 30 (5), 693 – 708. [24] Sigl, A. B., Corotis, R. B., Won, D. J., 1978. Run duration analysis of surface wind speeds for wind energy application. Journal of Applied
Meteorology 18, 156–166. [25] Suomalainen, K., Silva, C., ao, P. F., Connors, S., 2013. Wind power de- sign in isolated energy systems: Impacts of daily wind patterns. Applied
Energy 101 (0), 533 – 540. [26] Suomalainen, K., Silva, C., Ferro, P., Connors, S., 2012. Synthetic wind speed scenarios including diurnal effects: Implications for wind power dimensioning. Energy 37 (1), 41 – 50. [27] W¨achter, A., Biegler, L. T., 2006. On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming.
Mathematical Programming 106, 25–57. [28] Wen, J., Zheng, Y., Donghan, F., 2009. A review on reliability assess- ment for wind power. Renewable and Sustainable Energy Reviews 13 (9),722