Improvement on Extrapolation of Species Abundance Distribution Across Scales from Moments Across Scales

Abstract

Raw moments can be used to estimate the species abundance distribution. Because the log of the raw moments varies almost linearly with the log of area, we can extrapolate the species abundance distribution to larger areas. However, the results may contain errors, which arise from computational complexity, the fitting of patterns, the binning method, and other sources. We provide methods to reduce some of these errors. Our main result is a set of new techniques for estimating species abundance distributions across scales more accurately through moments across scales.


Running Title: Distribution moments


Saeid Alirezazadeh (a, *), Khadijeh Alibabaei (b)
(a) CIBIO, InBIO, CEABN, Lisboa, Portugal
(b) Departamento de Matemática, Faculdade de Ciências, Universidade do Porto, Porto, Portugal


Keywords:

Species Abundance Distribution; Scales; Raw Moments; Distribution Moments; Tchebychev Moments and Polynomials; Error Reduction; Extrapolation.

1. Introduction

The species abundance distribution (SAD) describes the relationship between the number of species observed in a sample and their relative abundance. It describes the full distribution of rarity and commonness in a sample. The SAD is one of the most widely studied patterns in ecology, and several models have been used to characterize the shape of the distribution and to identify the mechanisms that may generate it. Fisher et al. (1943) suggested that the logseries is the theoretical distribution of the relative abundance of species. Preston (1948), by looking at SADs at different scales, instead proposed the lognormal as the limiting distribution. Bulmer (1974 and 1979) showed that the Poisson lognormal is a better alternative to the lognormal. More recently, Engen and Lande (1996) also considered the compound Poisson gamma distribution. May (1979) stated that the logseries may be viewed as the distribution characteristic of relatively simple communities whose dynamics are dominated by a single factor, whereas, if the environment fluctuates randomly or if several factors become significant, the central limit theorem will produce the lognormal distribution. Although most of the work on SADs has looked at a fixed scale, there are exceptions. Hubbell (2001) developed the neutral theory, which seeks to predict SADs in space and time across scales. The theory was called neutral because it assumed that all individuals in a community are equivalent in their rates of reproduction, mortality, dispersal and speciation. Two of the speciation modes considered by Hubbell, the point mutation and the fission modes, led to SADs that changed differently with scale.

* Corresponding author at: CIBIO, InBIO, CEABN, Instituto Superior de Agronomia, U-Lisboa, Portugal. E-mail address: [email protected]
Whereas the fission mode led to the same type of distribution at different scales, the point mutation mode predicted the zero-sum multinomial distribution (a distribution introduced by Hubbell (2001)) at small scales (the local community) and the logseries distribution at large scales (the metacommunity). Neutral theory therefore acknowledges that the SAD changes across scales. Typically, empirical SADs come from samples that represent only a small part of the community, of sizes that are practical (or economical) to obtain. If we could find a pattern for how SADs scale with area (or another parameter), we could predict the distributions for larger areas, at least within reasonable scales, which are likely dictated by characteristics of the landscape where the community exists, such as the homogeneity of its habitats. Moment functions are used in image analysis and related applications, such as pattern recognition, object identification, template matching, and pose estimation (see Liao and Pawlak (1996), Teague (1980), and Mukundan, Ong, and Lee (2001)). However, their application to the study of SADs has only recently been attempted (see Borda-de-Água et al. (2012 and 2017) and Alirezazadeh et al. (2018)). Here, we define the moments $M_n$ (often called raw moments) of a sample as

$$M_n = \frac{1}{S}\sum_{i=1}^{S} x_i^{\,n}, \qquad (1)$$

where $n$ is the order of the moment and $x_i$ is the $\log_2$ of the number of individuals of species $i$ in a sample that contains $S$ species. Moments are important because, under suitable conditions, knowing them is enough to reconstruct the probability density function (Feller (2008)): for $n = 1$ we obtain the mean, for $n = 2$ a value related to the variance (the variance of the $x_i$ equals $M_2 - M_1^2$), and so on. Note that every sample is contained in a larger area whose species abundance distribution may not be known.
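As a small illustration of equation (1), the sketch below computes raw moments with NumPy from a vector of species abundances and recovers the variance of the $x_i$ as $M_2 - M_1^2$. The abundance vector and the function name `raw_moments` are hypothetical and only illustrate the formula; they are not the authors' code or data.

```python
import numpy as np

def raw_moments(abundances, max_order):
    """Raw moments M_0..M_max_order of x_i = log2(number of individuals of species i)."""
    x = np.log2(np.asarray(abundances, dtype=float))
    orders = np.arange(max_order + 1)
    # Entry [n, i] is x_i**n; averaging over the S species gives M_n = (1/S) sum_i x_i**n.
    return (x[None, :] ** orders[:, None]).mean(axis=1)

# Hypothetical sample: number of individuals of each of S = 6 species.
abundances = [1, 2, 4, 8, 8, 16]
M = raw_moments(abundances, 4)
variance = M[2] - M[1] ** 2   # variance of the x_i from the first two raw moments
```

Note that $M_0 = 1$ always, since it is the average of $x_i^0 = 1$ over the species.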
For a given sample, if we could find a pattern for how species abundance distributions change across scales, we would be able to estimate the species abundance distribution at a larger scale, and thus for the whole community in which the sample is contained. After computing moments of different orders, one can easily obtain the Tchebychev moments directly from them. Then, using the Tchebychev polynomials, which are fixed polynomials (first introduced by Tchebychev (1854)), we obtain a polynomial estimate of the species abundance distribution. For a given set of moments, this procedure has several sources of difficulty, and addressing each of them yields a better polynomial estimate of the real distribution. The first is the computational complexity of the Tchebychev polynomials and moments; we address it by using the recurrence relation between polynomials of successive degrees and then introducing a matrix multiplier that gives the Tchebychev moments from the raw moments. The second is the raw moments themselves: the Tchebychev polynomial method is very sensitive to their values, and slight changes in them can make large differences in the result. We therefore use the distribution moments instead. However, unlike the raw moments, the distribution moments do not vary linearly with area in log-log scale, and it is this linear pattern that allows moments to be extrapolated to larger scales. We provide a way to extrapolate the distribution moments by using their relationship with the raw moments. Algorithms that combine techniques from symbolic and numeric computation have gained importance and interest in recent years. The need to work reliably with imprecise and noisy data, and for speed and accuracy in algebraic and hybrid symbolic-numeric problems, has driven a rapid development of the field of symbolic-numeric computation, and novel problems from industrial, mathematical and computational domains are now being explored and solved.
We use a symbolic-numeric technique to reduce computational complexity. Our examples are samples from the tropical forest of Barro Colorado Island (BCI) (Hubbell et al. (2005), Hubbell et al. (1999), Condit (1998), Condit et al. (2017)), in which all individuals of all species with a diameter at breast height (dbh) of at least 1 cm, or in some analyses at least 10 cm, are considered.

2. Preliminaries

There are several equivalent ways of formulating raw moments; the following is the most convenient and is the one used in the examples. Let a sample $A$ consist of individuals of different species, let $x_i$ be the base-2 logarithm of the total number of individuals of species $i$, and let $k$ be the total number of species in the sample. Then the (raw) moment of order $t$, denoted by $M_t(A)$, is defined as

$$M_t(A) = \frac{1}{k}\sum_{i=1}^{k} x_i^{\,t}.$$

For more details and applications of moments, see Borda-de-Água, Hubbell, and McAllister (2002); Borda-de-Água et al. (2012); Borda-de-Água et al. (2017); Mukundan (2004). This definition allows us to study moments across scales. In Figure 1, we plot the log of the moments of orders 1 to 20 against the log of the area, for areas from 2 (ha) to 50 (ha), for two BCI data sets: one with all individuals with at least 1 cm dbh and the other with all individuals with at least 10 cm dbh. As Figure 1 shows, the moments of different orders follow almost linear patterns with area in log-log scale. Figure 1.

Raw moments of order 1 up to 20 as a function of area from 2 (ha) to 50 (ha) in log-log scale for BCI by considering individuals with (a) 1cm dbh (b) 10cm dbh.
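The almost linear pattern in Figure 1 is what makes extrapolation possible. In its simplest (linear) form, it amounts to a least-squares line in log-log scale evaluated at a larger area. The sketch below assumes a pure power law $M = e^{a}A^{b}$ for one moment order; the moment values are synthetic and the helper names `fit_loglog` and `extrapolate_moment` are hypothetical.

```python
import numpy as np

def fit_loglog(areas, moments):
    """Least-squares line log(M) = intercept + slope * log(A)."""
    slope, intercept = np.polyfit(np.log(areas), np.log(moments), deg=1)
    return intercept, slope

def extrapolate_moment(intercept, slope, area):
    """Evaluate the fitted power law M(A) = exp(intercept) * A**slope at a new area."""
    return np.exp(intercept + slope * np.log(area))

# Synthetic moments of one order that follow M = 3 * A**0.2 exactly (areas in ha).
areas = np.array([2.0, 4.0, 8.0, 16.0, 25.0])
moments = 3.0 * areas ** 0.2
a, b = fit_loglog(areas, moments)
M_50 = extrapolate_moment(a, b, 50.0)   # extrapolated moment for a 50 ha area
```

With real data the residuals of such a fit are not random, which is why Section 3 replaces the straight line by a non-linear model.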

Note that raw moments are estimates of the distribution moments, which can be obtained from the corresponding species abundance distribution. Before discussing their differences, we need to explain the binning that causes them. The bins are intervals of the form $[b, b+1)$, where $b$ takes values in the sequence $-0.5, 0.5, 1.5, 2.5, 3.5, \dots$; that is, the bins are centered at the natural numbers $0, 1, 2, 3, \dots$. Species $j$, with $x_j$ the $\log_2$ of its number of individuals, is assigned to the bin centered at $i$ if and only if $x_j \in [i - 0.5, i + 0.5)$, that is, $i$ is the integer nearest to $x_j$. Let $[x]$ denote this nearest integer. The raw moment of order $n$ is then obtained from $M_n = \frac{1}{S}\sum_{i=1}^{S} x_i^{\,n}$, while the distribution moment of order $n$ is obtained from $M'_n = \frac{1}{S}\sum_{i=1}^{S} [x_i]^{\,n}$, where $x_i$ is the $\log_2$ of the number of individuals of species $i$ and $S$ is the total number of species. If we use the distribution moments to evaluate the Tchebychev moments and combine them with the Tchebychev polynomials, we can reproduce the exact SAD, whereas using the raw moments only reproduces an estimate of the SAD. In Figure 2, we plot the log of the distribution moments of orders 1 to 20 against the log of the area, for areas from 2 (ha) to 50 (ha), again for the BCI data with all individuals with at least 1 cm dbh and with all individuals with at least 10 cm dbh. Figure 2.

Distribution moments of order 1 up to 20 as a function of area from 2 (ha) up to 50 (ha) in log-log scale for BCI by considering individuals with (a) 1cm dbh (b) 10cm dbh.

As we can see, the distribution moments do not vary linearly with area in log-log scale. We used moments up to order 20 to make their non-linear patterns across scales easier to see.
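The difference between raw and distribution moments comes only from the binning step: the raw moments use $x_i$ itself, while the distribution moments use the bin centre $[x_i]$. The NumPy sketch below, with hypothetical function names and a made-up abundance vector, assigns species to the bins $[i - 0.5, i + 0.5)$ of $\log_2$ abundance and computes both kinds of moments. It also builds the empirical SAD (species per bin) so that one can check that the distribution moments are exactly the moments of that SAD.

```python
import numpy as np

def bin_index(abundances):
    """Bin centre i for each species: the i with log2(count) in [i - 0.5, i + 0.5)."""
    x = np.log2(np.asarray(abundances, dtype=float))
    return np.floor(x + 0.5).astype(int)

def raw_and_distribution_moments(abundances, max_order):
    """Return (M_n, M'_n) for n = 0..max_order."""
    x = np.log2(np.asarray(abundances, dtype=float))    # x_i
    b = bin_index(abundances).astype(float)             # [x_i]
    n = np.arange(max_order + 1)[:, None]
    raw = (x[None, :] ** n).mean(axis=1)     # M_n  = (1/S) sum_i x_i**n
    dist = (b[None, :] ** n).mean(axis=1)    # M'_n = (1/S) sum_i [x_i]**n
    return raw, dist

def sad_from_bins(abundances):
    """Empirical SAD: number of species in bins 0, 1, ..., largest occupied bin."""
    return np.bincount(bin_index(abundances))

# Hypothetical abundances of five species.
abundances = [1, 3, 5, 12, 40]
raw, dist = raw_and_distribution_moments(abundances, 3)
sad = sad_from_bins(abundances)   # [1, 0, 2, 0, 1, 1]
```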

3. Patterns

If we consider the relation between $\log(M'_n(x))$ and $\log(M_n(x))$, it turns out to be linear and to pass through the origin (Figure 3). However, the slopes change as the area size changes. Consider the function

$$C_n(x) = \mathrm{slope}\big(\log(M'_n(x)) \sim \log(M_n(x))\big),$$

where $x$ is the area size; that is, $C_n(x)$ is the slope of the linear regression of the log of the distribution moments on the log of the raw moments at area of size $x$. Since the line passes through the origin, $\log(M'_n(x)) = C_n(x)\log(M_n(x))$, that is, $M'_n(x) = M_n(x)^{C_n(x)}$. Hence, finding the functions $C_n(x)$ and $\log(M_n(x))$ allows us to find $M'_n(x)$, which gives a better estimate of the SAD when extrapolating to larger areas. Figure 3.

Relation between raw and distribution moments of different orders in areas of 1, 20 and 50 ha.

Figure 4.

Slopes of the log of the 2nd and 4th raw moments against the log of the respective distribution moments, as a function of the log of area. Figure 4 shows how the slopes for the second and fourth moments change with the log of area.
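Because the relation passes through the origin, $C_n(x)$ can be estimated as a least-squares slope without intercept, $C = \sum u v / \sum u^2$ with $u = \log M_n$ and $v = \log M'_n$, and then inverted as $M'_n = M_n^{C_n}$. The sketch below is a minimal illustration with hypothetical names and values; which pairs enter the regression at a given area (moments of different orders, as in Figure 3, or replicate sub-plots of the same size) is left to the caller and is an assumption, not something this code decides.

```python
import numpy as np

def slope_through_origin(log_raw, log_dist):
    """Least-squares slope C of log_dist = C * log_raw (regression with no intercept)."""
    u = np.asarray(log_raw, dtype=float)
    v = np.asarray(log_dist, dtype=float)
    return float(u @ v / (u @ u))

def distribution_from_raw(raw_moment, C):
    """Invert log M' = C log M, i.e. M' = M**C."""
    return np.asarray(raw_moment, dtype=float) ** C

# Hypothetical paired values (log M_n, log M'_n) observed at one area size.
log_raw = np.array([1.0, 2.0, 3.0])
log_dist = np.array([1.1, 2.2, 3.3])
C = slope_through_origin(log_raw, log_dist)
M_dist = distribution_from_raw(np.exp(log_raw), C)
```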

Since the log of the raw moments is almost linear in the log of area (Figure 1), we can extrapolate the raw moments to larger areas. However, as Figure 5 shows, a linear fit of the log of the raw moments against the log of area is poor. The residuals of the linear fit suggest that adding a sinusoidal term is a good choice. Hence, the pattern of the log of the raw moments across scales can be described by the general formula

$$\log(M_n(x)) = a_0 + a_1 x + b\sin(c\,x' + \varphi),$$

where $x$ takes its values from the log of the areas and

$$x' = \pi\,\frac{x - \min(x)}{\max(x) - \min(x)}$$

maps the log of the areas to the interval $[0, \pi]$. Based on the shape of the residuals, we take $c = 1$. Since $b\sin(x' + \varphi) = b\cos\varphi\,\sin(x') + b\sin\varphi\,\cos(x')$, the formula becomes the following function, which is linear in its coefficients:

$$\log(M_n(x)) = a_0 + a_1 x + a_2\sin(x') + a_3\cos(x').$$

Hence, for the extrapolation of the moments in Figure 1 we do not use a linear regression; instead we regress on a sum of a linear function of the log of area size, $\sin(x')$ and $\cos(x')$, where $x'$ is the log of area rescaled to $[0, \pi]$ as above. Then, after finding the form of $C_n(x)$, we can extrapolate the distribution moments to larger areas. Figures 5 and 6 show, respectively, the fit of the log of the 10th raw moment against the log of area with a linear regression and with the preceding fitting function. Figure 5.

Fitting of the log of the 10th raw moment against the log of area for BCI with a linear function, for area sizes larger than 2 (ha).

Figure 6.

Fitting of the log of the 10th raw moment against the log of area for BCI with the non-linear function, for area sizes larger than 2 (ha).

$C_n(x)$

To find the pattern of $C_n(x)$, we first need to find and remove any trend in it. In our examples we take $x$ to be the square root of the area instead of the area itself, because $x$ should change in equal (linear) steps, and this avoids additional rounding of rational or irrational numbers. To remove a possible trend, we subtract its linear regression:

$$C'_n(x) = C_n(x) - a x - b,$$

where $x$ is the square root of the area. This centers the values of $C_n(x)$. We then take the arctan of these values, which gives angles within $(-\pi/2, \pi/2)$; in practice the values are very close to 0, where $\arctan(y) \approx y$, because the changes in the angles are very small. For extrapolation we then proceed recursively, taking the angle at each extrapolated area to be the angle obtained at the largest observed area $x$. The shape of $C'_n(x)$ is related to the largest occupied bin (LB) and to the number of most abundant species (NMAS) in the area $x$, both of which increase with area. There is a clear pattern in the shape of $C'_n(x)$ as the area increases:
- if LB and NMAS remain the same, then $C'_n(x)$ decreases;
- if LB remains the same but NMAS increases, then $C'_n(x)$ increases;
- if LB increases, then $C'_n(x)$ increases.
Since there is no general rule for how LB and NMAS change across scales, we adopt a simplified approach: when the area increases slightly from $x_1$ to $x_2$, we assume that LB and NMAS most likely remain the same.
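The two fitting steps of this section can be sketched with ordinary least squares, since the sinusoidal model for $\log(M_n(x))$ is linear in $a_0, \dots, a_3$ once $c = 1$, and the detrending of $C_n(x)$ is a degree-1 polynomial fit. All function names are hypothetical. The extrapolation step encodes one possible reading of the recursive rule: the angle $\arctan(C'_n)$ at the largest observed area is held fixed, which, since $\tan(\arctan y) = y$, amounts to carrying the last residual forward. Other readings are possible, so treat it as an assumption.

```python
import numpy as np

def sinusoid_design(log_area, lo, hi):
    """Columns 1, x, sin(x'), cos(x') with x' = pi * (x - lo) / (hi - lo)."""
    x = np.asarray(log_area, dtype=float)
    xp = np.pi * (x - lo) / (hi - lo)   # fitted range of log area mapped onto [0, pi]
    return np.column_stack([np.ones_like(x), x, np.sin(xp), np.cos(xp)])

def fit_log_moment(log_area, log_moment):
    """Ordinary least squares for log M_n(x) = a0 + a1 x + a2 sin(x') + a3 cos(x')."""
    lo, hi = float(np.min(log_area)), float(np.max(log_area))
    X = sinusoid_design(log_area, lo, hi)
    coef, *_ = np.linalg.lstsq(X, np.asarray(log_moment, dtype=float), rcond=None)
    return coef, lo, hi

def predict_log_moment(coef, log_area, lo, hi):
    """Evaluate the fit; for larger areas x' simply exceeds pi (same lo, hi reused)."""
    return sinusoid_design(log_area, lo, hi) @ coef

def detrend_slopes(sqrt_area, C):
    """C'_n(x) = C_n(x) - a x - b, with (a, b) from a linear least-squares fit."""
    x = np.asarray(sqrt_area, dtype=float)
    C = np.asarray(C, dtype=float)
    a, b = np.polyfit(x, C, deg=1)
    return C - (a * x + b), a, b

def extrapolate_slopes(sqrt_area, C, new_sqrt_area):
    """Hypothetical reading: keep arctan(C'_n) of the largest observed x for new areas."""
    resid, a, b = detrend_slopes(sqrt_area, C)
    theta = np.arctan(resid[np.argmax(sqrt_area)])   # angle at the largest observed area
    x_new = np.asarray(new_sqrt_area, dtype=float)
    return a * x_new + b + np.tan(theta)
```

The validation described next in the text (fit on 2 to 25 ha, extrapolate to 50 ha) would call `fit_log_moment` and `detrend_slopes` on the half-area data only and compare the extrapolated values with those observed at 50 ha.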
Intuitively, LB as a function of area should be defined recursively, and this function depends on the number of individuals of the most abundant species. Following this intuition, for the BCI data LB increases by 1 at around 32 ha: three species can be considered the most abundant, and at around 32 ha the number of bins increases from 11 to 12. This also implies that at around 64 ha there should be 13 bins, since doubling the area roughly doubles the abundance of the most abundant species and so adds 1 to its $\log_2$ abundance. To test our results, we find the patterns from 2 (ha) to half of the total area and then extrapolate to the total area to see how different the results are. Figure 7 shows how $C'_{10}(x)$ changes from 2 (ha) to 25 (ha), and Figure 8 shows the resulting extrapolation of $C_{10}(x)$ from 2 (ha) to 50 (ha). Figure 7.

For the BCI data, residuals of the linear regression of the slopes of the log of the 10th raw moment against the 10th distribution moment, as a function of the square root of area, from 2 (ha) to 25 (ha). Circles are the real data, and the red line is obtained by replacing the residuals with their arctan.

Figure 8.

For the BCI data, the slopes of the linear regression of the log of the 10th raw moment against the 10th distribution moment, as a function of the square root of area, from 2 (ha) to 50 (ha). Circles are the real data, the red line is obtained by replacing the residuals with their arctan, and the black line is the extrapolation using only 25 (ha) of the total area.
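The quantities LB and NMAS used above can be read directly off the binned abundances. The sketch below takes LB as the largest occupied $\log_2$ bin and, as an assumption, takes NMAS as the number of species in that bin. It also projects LB under the simplifying assumption that the abundance of the most abundant species grows in proportion to area. The abundances and function names are hypothetical, not BCI data.

```python
import numpy as np

def lb_and_nmas(abundances):
    """LB = largest occupied log2 bin; NMAS taken (assumption) as the species count in it."""
    bins = np.floor(np.log2(np.asarray(abundances, dtype=float)) + 0.5).astype(int)
    lb = int(bins.max())
    return lb, int(np.sum(bins == lb))

def projected_lb(max_abundance, area_factor):
    """LB if the top abundance scales in proportion to area (a simplifying assumption)."""
    return int(np.floor(np.log2(max_abundance * area_factor) + 0.5))

# Hypothetical abundances, not BCI data.
abundances = [3, 100, 2000, 2100, 2500]
lb, nmas = lb_and_nmas(abundances)                 # three species share the top bin
lb_double = projected_lb(max(abundances), 2.0)     # after doubling the area
```

Under this proportionality assumption each doubling of the area adds one bin, which is the reasoning behind the 32 ha to 64 ha step described above.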

4. Reduce Computational Complexity

To find the species abundance distribution from given raw moments, we use Tchebychev moments and polynomials, whose computation is computationally expensive. Here we use the recurrence relation between Tchebychev polynomials to reduce this cost, and from it we introduce a matrix multiplier that computes the Tchebychev moments from a given set of moments of different orders.

Tchebychev polynomials are polynomials of the form

$$t_n(x) = \sum_{i=0}^{n} a_{i,n}\,x^i, \qquad (1)$$

where the $a_{i,n}$ are the coefficients of the polynomial of degree $n$, for which explicit formulas exist. After obtaining the values of the moments of different orders, $M_i$, the Tchebychev moments of different orders can be obtained from

$$T_n = \sum_{i=0}^{n} a_{i,n}\,M_i,$$

where the $a_{i,n}$ are as before. The species abundance distribution can then be obtained as

$$f(x) = \sum_{i=0}^{n} T_i\,t_i(x).$$

The original formula for Tchebychev moments is used in Borda-de-Água et al. (2012) and Borda-de-Água et al. (2017). When the moment order becomes large, the Tchebychev moments tend to exhibit numerical instabilities. Following Zhu et al. (2007), Chihara (1989), Wang and Wang (2006) and Mukundan (2004), because the polynomials are orthogonal they satisfy a recurrence relation, and using it minimizes the computational cost of finding the Tchebychev polynomials. For polynomials defined on the $N$ points $x = 0, 1, \dots, N-1$, the Tchebychev polynomials satisfy the recursive formula

$$t_n(x) = (\alpha_1 x + \alpha_2)\,t_{n-1}(x) - \alpha_3\,t_{n-2}(x), \qquad n = 2, \dots, N-1, \quad x = 0, 1, \dots, N-1, \qquad (2)$$

with initial values

$$t_0(x) = \frac{1}{\sqrt{N}}, \qquad t_1(x) = \frac{2x + 1 - N}{\sqrt{N(N^2 - 1)}}\,\sqrt{3},$$

and

$$\alpha_1 = \frac{2}{n}\sqrt{\frac{4n^2 - 1}{N^2 - n^2}}, \qquad \alpha_2 = \frac{1 - N}{n}\sqrt{\frac{4n^2 - 1}{N^2 - n^2}}, \qquad \alpha_3 = \frac{n - 1}{n}\sqrt{\frac{2n + 1}{2n - 3}}\sqrt{\frac{N^2 - (n-1)^2}{N^2 - n^2}}.$$

By comparing equations (1) and (2) and using symbolic calculations, the coefficients $a_{i,n}$ can be obtained for all values of $i$ and $n$, giving the following lower-triangular matrix of coefficients:

$$\begin{bmatrix} a_{0,0} & 0 & 0 & \cdots & 0\\ a_{0,1} & a_{1,1} & 0 & \cdots & 0\\ a_{0,2} & a_{1,2} & a_{2,2} & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_{0,N-1} & a_{1,N-1} & a_{2,N-1} & \cdots & a_{N-1,N-1} \end{bmatrix}.$$

In a symbolic calculation of a function, we treat the variable(s) of the function as symbols (without numerical values) and apply operations such as multiplication, addition, subtraction and powers to find the coefficients and powers of the variable(s).
Using this matrix and the values of the raw moments up to order $N' - 1$ (with $N' \le N$), the Tchebychev moments up to the same order are obtained by matrix multiplication:

$$\begin{bmatrix} T_0\\ T_1\\ T_2\\ \vdots\\ T_{N'-1} \end{bmatrix} = \begin{bmatrix} a_{0,0} & 0 & 0 & \cdots & 0\\ a_{0,1} & a_{1,1} & 0 & \cdots & 0\\ a_{0,2} & a_{1,2} & a_{2,2} & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_{0,N'-1} & a_{1,N'-1} & a_{2,N'-1} & \cdots & a_{N'-1,N'-1} \end{bmatrix} \times \begin{bmatrix} M_0\\ M_1\\ M_2\\ \vdots\\ M_{N'-1} \end{bmatrix}.$$

Note that the values of $x$ are $0, 1, \dots$, up to the maximum number of bins. These calculations reduce the computational complexity of evaluating both the Tchebychev polynomials and the Tchebychev moments.
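The recurrence (2) and the matrix multiplication above can be sketched in NumPy by running the recurrence on coefficient vectors instead of symbols. Multiplying by $(\alpha_1 x + \alpha_2)$ is a convolution with $[\alpha_2, \alpha_1]$, which yields the lower-triangular matrix $a_{i,n}$ row by row. The sketch assumes the orthonormal form of the polynomials given above and moments computed on the bin scale $x = 0, \dots, N-1$ and normalised by $S$. The function names and the small SAD are hypothetical. Because it expands in powers $x^i$, it is only numerically reliable for small $N$, consistent with the instabilities mentioned above.

```python
import numpy as np

def tchebichef_matrix(N, K):
    """A[n, i] = a_{i,n}, so that t_n(x) = sum_i a_{i,n} x**i, for n = 0..K-1 (K <= N)."""
    A = np.zeros((K, K))
    A[0, 0] = 1.0 / np.sqrt(N)                                  # t_0(x) = 1/sqrt(N)
    if K > 1:
        s = np.sqrt(3.0 / (N * (N**2 - 1.0)))
        A[1, 0], A[1, 1] = (1.0 - N) * s, 2.0 * s               # t_1(x) = (2x + 1 - N) * s
    for n in range(2, K):
        r = np.sqrt((4.0 * n**2 - 1.0) / (N**2 - n**2))
        alpha1, alpha2 = 2.0 / n * r, (1.0 - N) / n * r
        alpha3 = ((n - 1.0) / n) * np.sqrt((2.0 * n + 1.0) / (2.0 * n - 3.0)) * np.sqrt(
            (N**2 - (n - 1.0) ** 2) / (N**2 - n**2))
        # Coefficients (lowest power first) of (alpha1 x + alpha2) t_{n-1}(x) - alpha3 t_{n-2}(x).
        row = np.convolve([alpha2, alpha1], A[n - 1, :n])      # length n + 1
        row[: n - 1] -= alpha3 * A[n - 2, : n - 1]
        A[n, : n + 1] = row
    return A

def sad_from_moments(M, N, K, S):
    """Number of species in bins x = 0..N-1 rebuilt from moments M_0..M_{K-1}."""
    A = tchebichef_matrix(N, K)
    T = A @ np.asarray(M[:K], dtype=float)                      # Tchebychev moments T_0..T_{K-1}
    V = np.arange(N, dtype=float)[None, :] ** np.arange(K)[:, None]   # V[i, x] = x**i
    t = A @ V                                                   # t[n, x] = t_n(x)
    return S * (T @ t)                                          # S * sum_n T_n t_n(x)

# Hypothetical SAD on N = 6 bins; M_i are its distribution moments (normalised by S).
sad = np.array([1.0, 0.0, 2.0, 0.0, 1.0, 1.0])
S, N = sad.sum(), sad.size
M = [(np.arange(N) ** i * sad).sum() / S for i in range(N)]
rebuilt = sad_from_moments(M, N, N, S)
```

With distribution moments and all $N$ orders, `rebuilt` matches `sad` up to rounding, which illustrates why distribution moments reproduce the exact SAD. Feeding raw moments instead gives only an estimate.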

5. Additional Error Reduction

Figure 9 shows the orthonormal Tchebychev polynomials up to degree 5 when the maximum number of bins is 1000.

Figure 9.

Tchebychev polynomials up to degree 5 when the maximum number of bins is 1000.

In Figure 9, if the maximum number of bins is very low, as is usual for species abundance distributions, the higher-degree polynomials lose their smooth shapes and become staggered, which is another problem for using Tchebychev polynomials and moments. To handle this issue, note that for given data the species abundance distribution is a distribution function defined over the interval $[0, M]$, where $M$ is the maximum number of bins obtained from the data, so we may assume that the distribution function is zero outside this interval. Under this assumption, the maximum number of bins can be taken much larger, while the values of the raw moments do not change. In other words, we regard the real species abundance distribution as part of a larger one that includes more bins, each with 0 species. Note that if we use a very large number of bins, we must use more moments in our calculations to get a good result. With a small number of moments we intuitively obtain a shape concentrated at the beginning followed by a straight line along the x-axis. To handle this problem, we suggest adding moments of higher order as the number of bins increases. Hence, we are largely free to use a larger number of bins and moments of higher order. In the original setting, the number of moments that can be used in the calculations is, in theory, at most the maximum number of bins minus 1, and in our observations the staggering behavior starts to appear when moments of high order are used.
In the present version, after we find the Tchebychev polynomials and moments, the formula $f(x) = \sum_{i=0}^{n} T_i\,t_i(x)$ gives the distribution for values of $x$ from zero up to the new, enlarged number of bins. For the result, we only need the restriction of this function to the interval of interest, from zero up to the original maximum number of bins. Here we consider the BCI data including all individuals of all species in the 50 ha area. In Figure 10 we take the number of bins to be 20, 30, 40, …, 800 and check which maximum order of raw moments we need in our calculations. Figure 10.

We add bins up to a total of 800 for the BCI data and find the maximum moment order that minimizes the sum of squared differences between the real SAD and the SAD obtained from the moments; we then plot this moment order against the number of bins, first on a linear scale and then on a log-log scale. In log-log scale a linear pattern appears. To see how accurate we are, we plot the errors (the sum of squared differences between the real SAD and the SAD obtained from the raw moments) for the corresponding maximum moment orders as a function of the number of bins. As Figure 11 shows, the errors decrease.
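The padding-and-restriction procedure and the search for the best number of moments can be sketched as follows. Instead of forming the coefficient matrix explicitly, this sketch evaluates the recurrence (2) directly at each species' $x_s$ and averages. In exact arithmetic this equals $\sum_i a_{i,n} M_i$ for the raw moments (and, with integer $x_s$, for the distribution moments), and it avoids large powers $x^i$. That substitution is ours, not necessarily the authors' implementation. The function names and data are hypothetical. The error measure is the sum of squared differences described in the text.

```python
import numpy as np

def tcheb_values(x, N, K):
    """Orthonormal Tchebychev polynomials t_0..t_{K-1} on N points, evaluated at x (may be real)."""
    x = np.asarray(x, dtype=float)
    t = np.zeros((K, x.size))
    t[0] = 1.0 / np.sqrt(N)
    if K > 1:
        t[1] = (2.0 * x + 1.0 - N) * np.sqrt(3.0 / (N * (N**2 - 1.0)))
    for n in range(2, K):
        r = np.sqrt((4.0 * n**2 - 1.0) / (N**2 - n**2))
        a3 = ((n - 1.0) / n) * np.sqrt((2.0 * n + 1.0) / (2.0 * n - 3.0)) * np.sqrt(
            (N**2 - (n - 1.0) ** 2) / (N**2 - n**2))
        t[n] = (2.0 / n * r * x + (1.0 - N) / n * r) * t[n - 1] - a3 * t[n - 2]
    return t

def padded_reconstruction(x_species, n_bins, n_pad, K):
    """SAD on bins 0..n_bins-1, rebuilt on a zero-padded grid of n_pad >= n_bins points."""
    x = np.asarray(x_species, dtype=float)
    # T_n = (1/S) sum_s t_n(x_s), which equals sum_i a_{i,n} M_i in exact arithmetic.
    T = tcheb_values(x, n_pad, K).mean(axis=1)
    f = T @ tcheb_values(np.arange(n_pad), n_pad, K)    # f(x) = sum_n T_n t_n(x)
    return x.size * f[:n_bins]                          # restrict to the original bins

def best_order(x_species, sad, n_pad, max_order):
    """Number of moments K (K <= n_pad) minimising the sum of squared differences."""
    sad = np.asarray(sad, dtype=float)
    errors = [float(np.sum((padded_reconstruction(x_species, sad.size, n_pad, K) - sad) ** 2))
              for K in range(1, max_order + 1)]
    return int(np.argmin(errors)) + 1, errors

# Hypothetical data: integer bin centres [x_s] of five species, so these are distribution moments.
x_bins = [0, 2, 2, 4, 5]
sad = np.bincount(x_bins).astype(float)   # [1, 0, 2, 0, 1, 1]
K, errors = best_order(x_bins, sad, n_pad=20, max_order=20)
```

Padding leaves the moments unchanged because the added bins hold zero species; only the grid on which the polynomials are orthonormal grows.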

Figure 11.

Note that we are only able to compute moments up to order 250. Figure 12 shows the moment orders at which the minimum errors occur when the number of bins is 500, 510, 520, …, 4000; the same relation in log-log scale; and how the corresponding minimum errors change as a function of the number of bins. Figure 12.

We add bins up to a total of 4000 for the BCI data and find the maximum moment order that minimizes the sum of squared differences between the real SAD and the SAD obtained from the moments. We plot this moment order against the number of bins, first on a linear scale and then on a log-log scale. Then we plot the minimum errors against the number of bins.

Recall that the moments carry information about the species abundance distribution, so it is necessary to include moments of higher orders in the calculations; our method incorporates more of the information contained in the moments into the result. The linear pattern of maximum moment order against the number of bins suggests that, since we can only obtain moments up to order 250, the number of bins used should not reach 6168. For the BCI data, letting $N$ be the number of bins, the maximum moment order we need follows the formula (with $R^2 = 0.9999325$)

$$\exp(0.4977\log N + 1.1779),$$

where $\log$ is the natural logarithm; for $N = 6168$ this gives an order of about 250.
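The fitted relation can be evaluated and inverted directly, which reproduces the 6168-bin limit quoted above for the order-250 ceiling. The two helper names below are hypothetical. The coefficients are the BCI values from the text, with the natural logarithm assumed (it is the reading that reproduces the stated limit).

```python
import numpy as np

def required_order(n_bins):
    """Fitted BCI relation (natural log): K(N) = exp(0.4977 * log(N) + 1.1779)."""
    return np.exp(0.4977 * np.log(n_bins) + 1.1779)

def max_bins(max_order):
    """Invert the relation: number of bins N whose required order equals max_order."""
    return np.exp((np.log(max_order) - 1.1779) / 0.4977)

order_at_limit = required_order(6168)   # close to the order-250 ceiling
bins_at_limit = max_bins(250)           # close to 6168 bins
```

Since the exponent 0.4977 is close to 1/2 and $e^{1.1779} \approx 3.25$, the relation says roughly that the required order grows like $3.25\sqrt{N}$.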

6. Theory in Example

Our example is the data set from the BCI tropical forest, consisting of the number of individuals with at least 10 cm dbh per species. In Figure 13 we compare the species abundance distributions estimated from raw and from distribution moments. For the distribution moments we use 12 moments, and for the raw moments we use 12 and 11 moments. By error we mean the sum of absolute differences.

Figure 13.

SAD for BCI, considering all individuals with at least 10 cm dbh, with the SADs obtained from moments plotted over it: the green dashed line uses distribution moments up to order 12; the dark blue dashed line uses raw moments up to order 12; the red dashed line uses raw moments up to order 11.

To show the differences more clearly, in Figure 14 we drop the case that uses 12 raw moments.

[Figure 13 legend: Raw moments (12), error = 42.07; Distribution moments (12), error = 0; Raw moments (11), error = 11.87.]

Figure 14.

SAD for BCI, considering all individuals with at least 10 cm dbh, with the SADs obtained from moments plotted over it: the green dashed line uses distribution moments up to order 12; the red dashed line uses raw moments up to order 11.

In Figure 15, for the BCI sample, we extrapolate the species abundance distribution to the 50 ha area using only a 25 ha sub-area and extrapolating moments up to order 10. For the distribution moments we use 7 moments, and for the raw moments we use 7 and 6 moments for the non-linear and linear extrapolations, respectively; these numbers of moments give the minimum errors.

[Figure 14 legend: Distribution moments (12), error = 0; Raw moments (11), error = 11.87.]

Figure 15.

Extrapolation of the SAD for BCI, considering all individuals with at least 10 cm dbh, by extrapolating the moments from a 25 (ha) area only. The histogram is the real data, and the SADs obtained from moments extrapolated to 50 (ha) are plotted over it. Red line: raw moments extrapolated under a linear relation, using moments up to order 6. Green line: raw moments extrapolated with the non-linear function, using moments up to order 7. Blue line: extrapolated distribution moments, using moments up to order 7.

Figure 15 shows how the results differ between the linear and the non-linear versions of the extrapolation for BCI individuals with at least 10 cm dbh, extrapolating to 50 ha from a 25 ha sub-area. In Figure 16, for the BCI sample, we again extrapolate the species abundance distribution to the 50 ha area from the 25 ha sub-area by extrapolating the distribution moments up to order 10, this time adding an extra bin and using 8 instead of 7 moments. Note that including more moments adds extra information about the SAD.

[Figure 15 legend: Distribution moments (7), error = 9.96; Raw new (7), error = 10.41; Raw linear (6), error = 13.63.]

Figure 16.

Extrapolation of the SAD for BCI, considering all individuals with at least 10 cm dbh, by extrapolating the moments from a 25 (ha) area only. The histogram is the real data, and the SADs obtained from distribution moments extrapolated to 50 (ha) are plotted over it. Green line: 12 bins and moments up to order 7. Blue line: 13 bins and moments up to order 8.

Conclusion

An accurate fit of the species abundance distribution in a fixed area can be obtained by using the distribution moments. We start from the distribution moments and express them as functions of the raw moments. We provide a way to extrapolate the distribution moments to larger scales through the extrapolation of the raw moments.

[Figure 16 legend: Distribution moments (8 moments, 13 bins), error = 9.6; Distribution moments (7 moments, 12 bins), error = 9.96.]

Acknowledgments

This research was partially supported by Fundação para a Ciência e Tecnologia (PTDC/BIA-BIC/5558/2014) through the BPD project "species diversity as a function of spatial scales" with reference number ICETA-2016-43. The BCI forest dynamics research project was founded by S.P. Hubbell and R.B. Foster and is now managed by R. Condit, S. Lao, and R. Perez under the Center for Tropical Forest Science and the Smithsonian Tropical Research Institute in Panama. Numerous organizations have provided funding, principally the U.S. National Science Foundation, and hundreds of field workers have contributed. We also acknowledge R. Foster as plot founder and the first botanist able to identify so many trees in a diverse forest; R. Pérez and S. Aguilar for species identification; S. Lao for data management; S. Dolins for database design; plus hundreds of field workers for the census work, now over 2 million tree measurements; and the National Science Foundation, Smithsonian Tropical Research Institute, and MacArthur Foundation for the bulk of the financial support.

References

Alirezazadeh, Saeid, Luís Borda-de-Água, Paulo Borges, Francisco Dionísio, Rosalina Gabriel, and Henrique M. Pereira. 2018. "Theoretical Approach for How Species Abundance Distribution Change Across Scales." CONTROLO2018 conference, Ponta Delgada, Azores, Portugal, June 4–6, 2018.

Borda-de-Água, Luís, Stephen P. Hubbell, and Murdoch McAllister. 2002. "Species-Area Curves, Diversity Indices, and Species Abundance Distributions: A Multifractal Analysis." American Naturalist 159 (2): 38–55.

Borda-de-Água, Luís, Paulo A. V. Borges, Stephen P. Hubbell, and Henrique M. Pereira. 2012. "Spatial Scaling of Species Abundance Distributions." Ecography 35 (6). Blackwell Publishing Ltd: 49–56.

Borda-de-Água, Luís, Robert J. Whittaker, Pedro Cardoso, François Rigal, Ana M. C. Santos, Isabel R. Amorim, Aristeidis Parmakelis, Kostas A. Triantis, Henrique M. Pereira, and Paulo A. V. Borges. 2017. "Dispersal Ability Determines the Scaling Properties of Species Abundance Distributions: A Case Study Using Arthropods from the Azores." Scientific Reports 7 (1): 38–99.

Bulmer, M. G. 1974. "On Fitting the Poisson Lognormal Distribution to Species-Abundance Data." Biometrics 30 (1). Wiley, International Biometric Society: 1–10.

Bulmer, M. G. 1979. Principles of Statistics. Second edition. Dover Publications, Inc., New York.

Condit, R. 1998. Tropical Forest Census Plots. Springer-Verlag and R. G. Landes Company, Berlin, Germany, and Georgetown, Texas.

Condit, R., S. Aguilar, S. Lao, R. Pérez, S. P. Hubbell, and R. B. Foster. 2017. BCI 50-ha Plot Taxonomy as of 2017. DOI https://doi.org/10.25570/stri/10088/32990.22