Wisdom of the institutional crowd
Kevin Primicerio, Damien Challet, Stanislao Gualdi
Laboratory of Mathematics in Interaction with Computer Science, CentraleSupélec, Grande Voie des Vignes, 92290 Châtenay-Malabry, France
Capital Fund Management, 23 rue de l'Université, 75007 Paris, France
Abstract
The average portfolio structure of institutional investors is shown to have properties which account for transaction costs in an optimal way. This implies that financial institutions unknowingly display collective rationality, or Wisdom of the Crowd. Individual deviations from the rational benchmark are ample, which illustrates that system-wide rationality does not need nearly rational individuals. Finally, we discuss the importance of accounting for constraints when assessing the presence of Wisdom of the Crowd.
The collective ability of a crowd to accurately estimate an unknown quantity is known as the "Wisdom of the Crowd" [1] (WoC hereafter). In many situations, the median estimate of a group of unrelated individuals is surprisingly close to the true value, and sometimes significantly better than those of experts [2, 3, 4, 5]. WoC may only hold under some conditions [1, 6]: for example, social imitation is detrimental, as herding may significantly bias the collective estimate [7, 8]. WoC is reminiscent of collective rationality without explicit individual rationality: when it applies, it is a consistent aggregation of possibly inconsistent individual estimates [9]. This is to be contrasted with the mainstream economic paradigm, which takes a short-cut by assuming that collective rationality reflects individual rationality, where only a "typical" decision maker, the representative agent, is considered [10], or with team reasoning, where the individual agents explicitly optimize the collective welfare [11]. Aggregation of quite diverse individual actions, especially in a dynamic context where expectations are continuously revised, is still an open problem [12].

Although almost all known examples of WoC are about a single number or coordinate, there is no reason why WoC could not be found for whole functional relationships between several quantities. For example, Härdle and Kirman analyse the prices and volumes of many transactions in the Marseille fish market: while the relationship between these two quantities is rather noisy, the market self-organises so that when more fish are sold, prices are lower, as revealed by a local average [13]. More generically, many simple relationships found in Economics textbooks may only hold on average, but not for each agent or each transaction.

Asset price efficiency is an obvious instance of WoC in Finance: it states that current prices, determined by the actions of many traders, are the best possible estimates and fully reflect all available information [14, 15, 16]. Portfolios are another WoC candidate. While many market participants, especially investment funds, strive to build optimal portfolios, each following its own criteria and constraints (performance objective, risk, tracking error, etc.), the question here is whether their collective behaviour may be related to a rational benchmark. Fortunately, this implies that we do not need to understand the minute details of all the portfolios and can focus on average quantities instead.

Let us define some necessary quantities to be more precise. At time $t$, fund $i$ has capital $W_i(t)$, which is invested into $n_i(t)$ securities among the $M(t)$ existing ones. As a result, each security $\alpha$, whose capitalization is denoted by $C_\alpha(t)$, is found in $m_\alpha(t)$ portfolios. The explicit time dependence is dropped hereafter.

The only quantity defined above which depends on the asset allocation strategy of fund $i$ is $n_i$, the number of securities it chooses to invest in. Our main hypothesis is thus that WoC is found in the average relationship between $n_i$ and $W_i$. A simple rational benchmark is proposed by [17]: when a fund with capital $W_i$ is able to invest the same amount in each of the $n_i$ chosen securities, and if the transaction cost does not depend on the security, then the optimal $n_i$ is such that

$$W_i \propto n_i^{\mu}, \tag{1}$$

where the exponent $\mu$ is determined by the fee structure of the transaction costs; for example, proportional transaction costs lead to $\mu = 1$, while a fixed cost per transaction corresponds to $\mu = 2$ (see [17] for more details). Allowing for individual fluctuations, Eq. (1) becomes $\log W_i = \mu \log n_i + \epsilon_i$, where $\epsilon_i$ has zero average. Denoting the local average of $x_i$ by $x$, the local average of Eq. (1) yields

$$W \propto n^{\mu}. \tag{2}$$

A flat fee per transaction ($\mu = 2$) is a popular request of large clients of brokers. Indeed, [17] find that for wealthy individual investors and asset managers, the exponent is $\mu = 2$ within statistical uncertainty. We will thus test the occurrence of WoC from the value of the exponent $\mu$. More precisely, our hypothesis is that if (i) the effective transaction cost per transaction is the same for all assets and (ii) funds are able to build equally weighted portfolios, then Eq. (2) holds with $\mu = 2$, which is a sign of WoC.

Both conditions must cease to hold for larger investment funds. Indeed, condition (i) cannot be true for them, since large trades (even when split into meta-orders) have a price impact which grows with their size and depends on volatility and average turnover [18]. Condition (ii) ceases to hold for large funds which spread their investments over many securities: because the capitalization of assets and their average daily turnover are very heterogeneous, large funds cannot invest enough money in assets with a small capitalization to build an equally weighted portfolio. As a result, the local average $W$ is expected to increase more slowly as a function of $n$ in the large-$n$ region; equivalently, the exponent $\mu$ is expected to be smaller than 2. In summary, two different regimes should emerge: one with $\mu_< = 2$ for small enough $n$ and one with $\mu_> < 2$ for larger $n$.

Figure 1: Total mark-to-market value $W_i$ as a function of the number of investments $n_i$, with a robust locally weighted regression fit (yellow line) and two linear fits (blue dashed lines) for two different ranges of $n$. Robust locally weighted regression fit for the simulated data (in green).

Figure 1 plots $W_i$ versus $n_i$ in logarithmic scale: a cloud of points emerges, with a roughly increasing trend. The large amount of noise confirms the great diversity of fund allocation strategies. WoC may only appear in some average behaviour. This is why we computed a locally weighted polynomial regression [19].
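To make the two-regime picture concrete, the following is a minimal sketch of how two exponents and a break point could be estimated from pairs $(n_i, W_i)$. It is not the procedure used in the paper, which relies on a robust locally weighted regression [19]; here two ordinary least-squares lines are simply fitted in log-log space on either side of every candidate split, and the split with the smallest total squared error is kept. The function names `fit_line` and `two_regime_fit` and the parameter `min_points` are illustrative assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (slope, sum of squared errors)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, sse


def two_regime_fit(n_values, w_values, min_points=3):
    """Return (n_star, mu_low, mu_high) from two separate log-log fits.

    Assumption: the two segments are fitted independently (no continuity
    constraint), and n_star is the geometric midpoint of the best split.
    """
    points = sorted((math.log(n), math.log(w)) for n, w in zip(n_values, w_values))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        # Skip splits inside ties, or splits leaving a segment with a single n value.
        if xs[k - 1] == xs[k] or xs[0] == xs[k - 1] or xs[k] == xs[-1]:
            continue
        mu_low, sse_low = fit_line(xs[:k], ys[:k])
        mu_high, sse_high = fit_line(xs[k:], ys[k:])
        total = sse_low + sse_high
        if best is None or total < best[0]:
            n_star = math.exp(0.5 * (xs[k - 1] + xs[k]))
            best = (total, n_star, mu_low, mu_high)
    if best is None:
        raise ValueError("not enough distinct n values for a two-regime fit")
    return best[1], best[2], best[3]
```

On noisy fund data a robust or local estimator would be preferable, since plain least squares is sensitive to the very large individual deviations visible in Figure 1.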
As expected, two distinct regions appear. In each of them, the local regression follows a roughly linear behaviour.

The cross-over point $n^*$ between the two regions is determined algorithmically for each quarterly snapshot (see S.I.); it is stable over time (see Fig. 11 in S.I.). The two exponents $\mu_<$ and $\mu_>$ are quite stable as a function of time as well (see Fig. 11 in S.I.); their time-averages $\mu_< \simeq \ldots \pm \ldots$ and $\mu_> \simeq \ldots \pm \ldots$ are markedly different, which points to distinct collective ways of building portfolios in these two regions.

So far, $\mu_<$ is compatible with the WoC hypothesis. Let us check the validity of conditions (i) and (ii) above. When condition (ii) is not satisfied, condition (i) must also cease to hold, thus we can focus on condition (ii). It says that the diversity of investment fractions $p_{i\alpha} = W_{i\alpha}/W_i$ for $W_{i\alpha} > 0$ must be very small among $\alpha$. This may be summarized in a single number by the scaled Shannon entropy $S_i = -\frac{1}{\log n_i}\sum_\alpha p_{i\alpha} \log p_{i\alpha}$, which equals 1 and is maximal when all the non-null $p_{i\alpha}$ are equal. Figure 2 reports the scaled entropy $S_i$ of all the funds for a given time snapshot, together with the local average $S$. The latter increases up to about $n \simeq n^*$ and then decreases. The fact that $S < 1$ is due in part to price fluctuations: even if fund $i$ builds an equally weighted portfolio at time $t$ (thus $S_{i,t} = 1$), $S_{i,t+1} < 1$ at a later date. The importance of this mechanism is confirmed by Monte-Carlo simulations: the red line of Fig. 2 shows the effect of natural asset price evolution on perfectly equally weighted portfolios after three months, using the asset price volatility measured in our dataset between the time of the snapshot and the three previous months. The resulting scaled entropy $S_{MC}$ increases as a function of $n$, mirroring the local average of $S_i$ in the same figure for $n < n^*$. Thus, the decrease for $n > n^*$ is due to the impossibility for larger funds to build equally weighted portfolios.
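The scaled entropy and the price-drift mechanism can be illustrated with a short sketch. The first function implements the definition of $S_i$ given in the text. The second is only a toy version of the Monte-Carlo check: it assumes that each asset receives an independent log-normal price shock with a single common volatility, whereas the paper uses the volatility measured in the dataset. The names `scaled_entropy` and `drifted_entropy` and the parameter `volatility` are illustrative assumptions.

```python
import math
import random


def scaled_entropy(position_values):
    """S = -(1 / log n) * sum_a p_a log p_a over the non-zero positions.

    Equals 1 for an equally weighted portfolio and is smaller otherwise.
    """
    values = [v for v in position_values if v > 0]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two non-zero positions")
    total = sum(values)
    entropy = -sum((v / total) * math.log(v / total) for v in values)
    return entropy / math.log(n)


def drifted_entropy(n, volatility, rng):
    """Scaled entropy of an initially equally weighted n-asset portfolio
    after one independent log-normal price shock per asset (toy assumption)."""
    values = [math.exp(rng.gauss(0.0, volatility)) for _ in range(n)]
    return scaled_entropy(values)
```

Under this toy model the entropy deficit $1 - S$ shrinks as $n$ grows, because the dispersion of the weights stays roughly constant while the normalization $\log n$ increases; this is in line with the increasing $S_{MC}$ reported in the text.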
A further argument supporting our claim that investment funds strive to build equally weighted portfolios (on average) is provided by the entropy measured on the set of common positions between two consecutive snapshots, multiplied by $S_{MC}(n_i)/S_{MC}(n_{i,\mathrm{restricted}})$ in order to account for the dependence of $S$ on $n$. The local average of the resulting entropy corresponds to the dashed blue line: it is clearly smaller than the entropy of the new portfolio, hence new positions purposefully bring $S_i$ closer to that of equally weighted portfolios. Therefore, condition (ii) is valid when $\mu = 2$; conversely, $\mu \neq 2$ when condition (ii) ceases to hold.

Figure 2: Scaled Shannon entropy $S_i$ as a function of the number of investments $n_i$ for all the funds on 2013-03-31 (circles); robust locally weighted regression fit for all positions (blue line); numerical simulation of the effect of price fluctuations on the entropy of initially equally weighted portfolios, where a volatility similar to that observed in the real data is applied (red line); robust locally weighted regression fit restricted to the portfolio positions unchanged from the previous time step, multiplied by the ratio between the simulated entropy for the full portfolio and for the restricted portfolio (dashed blue line).

Quite tellingly, the same exponent was found for large private investors and asset managers (with much smaller amounts of money under management). Thus the collective behaviour of large investment funds is essentially the same. Since one finds the same exponent $\mu$ over many decades of portfolio values for a wide spectrum of market participants, and since $\mu = 2$ corresponds to a realistic transaction cost per transaction, we argue that WoC is a plausible explanation of the average portfolio structure. Note that $\mu = 2$ does not imply that funds really face a constant transaction cost per transaction, only that their population acts as if it does.
Finally, we stress that WoC holds for a whole functional relationship over many decades of $n$ and $W$, not only for a single number, which considerably extends its reach.

So far, bringing to light WoC in the $\mu = 2$ region only required focusing on the number of securities in a portfolio, not on how funds select securities. This implicitly assumed that funds could invest in all the securities they wished, which is clearly not the case in the large diversification region: the fact that the exponent $\mu$ is much smaller in this region implies that funds need on average to split their investments into many more securities. This is most likely due to liquidity constraints: large funds cannot invest as much as they wish in some assets because there are simply not enough shares to build a position larger than a certain size without impacting their prices too much. Each fund has its own way to determine the maximal amount to invest in a given security $\alpha$; a common criterion is to limit the fraction $W_{i\alpha}/C_\alpha$. Fig. 8 in S.I. strongly suggests that each fund fixes its upper bound

$$f_i^{(\max)} \geq \max_\alpha f_{i\alpha}, \quad \text{where } f_{i\alpha} = \frac{W_{i\alpha}}{C_\alpha}. \tag{3}$$

It turns out that $f_i^{\max}$ is highly heterogeneous among funds, $\log(f_i^{\max}) \simeq -\ldots \pm \ldots$ (see Fig. 9), which reflects both the heterogeneous ways of constructing portfolios and the confidence of a fund in its ability to execute large trades without too much price impact. The existence of such limits implies that portfolios are less likely to be equally weighted in the large diversification region, as seen indeed in the decrease of the average scaled entropy of portfolio weights for $n \geq \ldots$ (blue line in Fig. 2).

Funds, however, do not invest in a randomly chosen security, even in the low diversification region. Figure 3 displays a scatter plot of the capitalization $C_\alpha$ of each security $\alpha$ versus $m_\alpha$, the number of funds which have invested in this security, together with a local non-linear fit.
Similarly to $W$ vs $n$, one finds a power-law relationship

$$\log C_\alpha = \gamma \log m_\alpha + \epsilon_\alpha \tag{4}$$

for large enough $m$ (see S.I.). Hence, in local average notation, $C \propto m^{\gamma}$. The exponent $\gamma$ is stable during the period 2007-2014 (see Fig. 11 in S.I.) and its average is $\bar{\gamma} \simeq \ldots \pm \ldots$.

In short, one needs to introduce a model of how funds choose to invest in securities to reproduce the average behaviour of both Eqs (4) and (1). Since one sees a cross-over between two types of behaviour rather than an abrupt change, we create logarithmic bins of the axis $n_i$ and denote the bin number of fund $i$ by $[n_i]$. Two mechanisms must be specified: how a fund selects security $\alpha$ and how much it invests in it. The latter point is dictated by Fig. 8 in the large $n_i$ region, where fund $i$ invests $W_{i\alpha} = f_i^{(\max)} C_\alpha$; for the sake of simplicity, we approximate $f_i^{(\max)}$ by its median value in the bin $[n_i]$, denoted by $f_{[n_i]}^{(\max)}$. In the small diversification region, we assume that $n_i = n_i^{\mathrm{opt}}$, thus $W_{i\alpha} = W_i/n_i^{\mathrm{opt}}$, to be consistent with our previous results. We choose a security selection mechanism that rests on the market capitalization $C_\alpha$ of a security $\alpha$ (see S.I.), which is a good proxy of the liquidity (Fig. 10). We perform Monte-Carlo simulations from the empirical selection probabilities and $f_{[n_i]}^{(\max)}$ and display the resulting $W$ vs $n$ and $C$ vs $m$ in Figs 1 and 3 (continuous green lines), in good agreement with the local averages (continuous orange lines). One notices a discrepancy in the relationship $C$ vs $m$ for large $n$, which mainly comes from funds in the large diversification region (see Fig. 12 in S.I.).

The large diversification region illustrates how constraints may considerably modify the rational benchmark. While the above mechanism of security selection is able to reproduce adequately the behaviour of well diversified funds, we could not find a rational benchmark for the dependence of $f^{\max}$ on $n_i$.
Thus, the case for WoC in the large diversification region is not entirely closed.

Data
Our dataset consists of an aggregation of the following publicly available reports (in order of reliability): the SEC Form 13F, the SEC's EDGAR system forms N-Q and N-CSR and (occasionally) the form 485BPOS. Our work focuses on the period starting from the first quarter of 2005 to the last quarter of 2013.

Figure 3: Market capitalization of securities as a function of the number of investors in logarithmic scale. From the local non-linear robust fit (yellow line) we observe a linear relationship for assets with more than about 100 investors. The blue dashed line corresponds to a linear fit on that group of assets. Hence $C_\alpha \propto m_\alpha^{\gamma}$, with $\gamma \simeq \ldots$. Robust locally weighted regression fit for the simulated data (in green).

These forms are filled in manually and are thus error prone. We partially solve this issue by cross-checking different sources (which often contain overlapping information) and by filtering data before processing (see details in S.I.).

The main limitation of this dataset is that it provides accurate figures for long positions only. The other positions (short, bonds, ...) are most of the time only partially known. The frequency of the dataset is also inhomogeneous: data for most of the funds are updated quarterly (depending on regulations), hence we decided to restrict ourselves to 4 points per year only. Such a frequency is probably too low for investigating the dynamics of individual behaviour, but it is not a problem here since we focus on an aggregate and static representation of the investment structure.

Discussion and conclusion
While WoC is commonly applied to a population collectively guessing a single number, we investigate here a fundamentally different situation and provide evidence for a collective functional optimization of the asset ownership structure. What the reference function should be is dictated by optimality arguments. In the case of financial markets, the rational benchmark was not related to the efficient market hypothesis, but to the way a large population of professional fund managers build their portfolios. Whereas each fund has its own benchmark with respect to which the fund performance may be assessed, this, fortunately, has no discernible influence on the average structure of their portfolio. In addition, WoC is often meant as a collective guessing of non-experts; one thus may conclude that the population investigated here has decidedly more expertise than the subjects of other WoC studies. What kind of expertise the typical fund manager has is not obvious, at least when one looks at their pure performance (see e.g. [20]). In addition, the optimal relationship between the number of assets in a portfolio and the value of the latter is clearly not broadly known in these circles, as shown by the very large deviations from the ideal case in Fig. 1, and the collective expertise only appears when their decisions are suitably averaged. The presence of WoC when the subjects face strong constraints, as those of highly diversified funds, is more conjectural, and more work will be needed in that respect.

At a higher level, our results suggest that, while individuals may deviate much from the rational expectation theory, standard economic theory may hold at a collective level, without need for micro-founded individual decisions: the average decision may in some cases be approximated by a rational, representative agent. Our results however only hold on a snapshot of the system, for which individual fluctuations may be averaged out. In a dynamic setting, the very large deviations from the rational benchmark may not be neglected in the presence of feedback loops [21]. In other words, the dynamics of these fluctuations are worth investigating in their own right.
Acknowledgements
S. Gualdi acknowledges the support of Labex Louis Bachelier (project number ANR 11-LABX-0019).
References

[1] Surowiecki J. The wisdom of crowds. Anchor; 2005.
[2] Galton F. Vox populi (The wisdom of crowds). Nature. 1907;75:450–51.
[3] Hill S, Ready-Campbell N. Expert stock picker: the wisdom of (experts in) crowds. International Journal of Electronic Commerce. 2011;15(3):73–102.
[4] Landemore HE. Why the many are smarter than the few and why it matters. Journal of public deliberation. 2012;8(1).
[5] Nofer M, Hinz O. Are crowds on the internet wiser than experts? The case of a stock prediction community. Journal of Business Economics. 2014;84(3):303–338.
[6] Davis-Stober CP, Budescu DV, Dana J, Broomell SB. When is a crowd wise? Decision. 2014;1(2):79.
[7] Lorenz J, Rauhut H, Schweitzer F, Helbing D. How social influence can undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences. 2011;108(22):9020–9025.
[8] Muchnik L, Aral S, Taylor SJ. Social influence bias: A randomized experiment. Science. 2013;341(6146):647–651.
[9] Hogarth RM. A note on aggregating opinions. Organizational Behavior and Human Performance. 1978;21(1):40–46.
[10] Hartley JE, Hartley JE. The representative agent in macroeconomics. Routledge; 2002.
[11] Colman AM, Pulford BD, Rose J. Collective rationality in interactive decisions: Evidence for team reasoning. Acta psychologica. 2008;128(2):387–397.
[12] Kirman AP. Whom or what does the representative individual represent? The Journal of Economic Perspectives. 1992;6(2):117–136.
[13] Härdle W, Kirman A. Nonclassical demand: A model-free examination of price-quantity relations in the Marseille fish market. Journal of Econometrics. 1995;67:227–257.
[14] Malkiel BG, Fama EF. Efficient capital markets: A review of theory and empirical work. The journal of Finance. 1970;25(2):383–417.
[15] Malkiel BG. The efficient market hypothesis and its critics. The Journal of Economic Perspectives. 2003;17(1):59–82.
[16] Fama EF. Market efficiency, long-term returns, and behavioral finance. Journal of financial economics. 1998;49(3):283–306.
[17] de Lachapelle DM, Challet D. Turnover, account value and diversification of real traders: evidence of collective portfolio optimizing behavior. New Journal of Physics. 2010;12(7):075039.
[18] Bouchaud JP. Price impact. Encyclopedia of quantitative finance. 2010;.
[19] Cleveland WS, Grosse E, Shyu WM. Local regression models. Statistical models in S. 1992;2:309–376.
[20] Barras L, Scaillet O, Wermers R. False discoveries in mutual fund performance: Measuring luck in estimated alphas. The Journal of Finance. 2010;65(1):179–216.
[21] Gualdi S, Tarzia M, Zamponi F, Bouchaud JP. Tipping points in macroeconomic agent-based models. Journal of Economic Dynamics and Control. 2015;50:29–61.
[22] Muggeo VM. Estimating regression models with unknown break-points. Statistics in medicine. 2003;22(19):3055–3071.

Figure 4: Top: Market capitalization as a function of the number of investors for all securities. Bottom: Temporal evolution of the aggregated market capitalization of US over the total market capitalization.
Supporting Information (SI)

Filtering
In order to remove inconsistencies in the dataset, we applied the following filters.
Our dataset is sparse and heterogeneous. Indeed, the quality of the sources of data is directly related to each country's disclosure regulations. For these reasons we decided to keep only the entities which use a US-based mail address.

About 60% of the total market capitalization of the dataset is concentrated in US-based securities. Figure 4 shows two large clouds of dots, each of which corresponds to a different region of origin: the green (resp. orange) cloud corresponds to non-US (resp. US) based securities. The origin of this large difference between these two regions is not clear: it could for example come from differences in regulations in non-US countries. It turns out that the ratio of the investment values in US and non-US assets varies little as a function of time (see Fig. 4), which does not affect the exponent $\mu$ in Eq. 1. As a consequence, we focused on US securities.

Figure 5: Temporal evolution of the number of funds $N_i$ and securities $N_\alpha$ in the database. Unfiltered in dashed lines and US based only in solid lines.

Large funds are required to report their positions at a frequency which depends on the applicable regulation. As a result, the reporting frequency ranges from monthly to yearly, with most funds filing quarterly reports. We therefore focused on the latter.
The "penny stocks", i.e., usually securities which trade below $5 per share in the USA, are not listed on a national exchange. Since they are considered highly speculative investments and are subject to different regulations, we filtered them out.
We also filtered out small funds and securities by applying the following filters: $W_i > \ldots$ USD, $C_\alpha > \ldots$ USD, $n_i \geq \ldots$, $m_\alpha \geq \ldots$.

We restricted our study to 36 quarterly snapshots starting from the first quarter of 2005 and ending with the last quarter of 2013. Figure 5 reports the evolution of the number of securities and funds in the database before and after filtering.
Asset selection modelling
The framework we introduce in this paper follows a series of a few elementary steps described below. The aim is for the model to be sensitive to the different constraints which dominate the portfolio selection of a fund.

$n^*$

For date $t$, we define the cross-over point $n^*$ between the two regions which appear in the local polynomial regression. We determine its value by likelihood maximization of the model

$$W = \mu_< n + (\mu_> - \mu_<)(n - n^*)\,\theta(n - n^*), \tag{5}$$

where $\theta(x)$ is the Heaviside function. We use a recursive method to find the parameters $\mu_<$, $\mu_>$ and $n^*$ [22]. Figure 11 shows that $n^*$ is stable as a function of time.

$n_i < n^*$

In this region, we consider the equally weighted portfolio hypothesis to be true. Each position has a size $W_i/n_i^{\mathrm{opt}}$, where $n_i^{\mathrm{opt}}$ is the optimal number of positions computed with Eq. 1. The funds select their assets randomly with a probability proportional to $C_\alpha$. Also, in order to construct an equally weighted portfolio, a position is valid only if it is of size $W_i/n_i^{\mathrm{opt}}$.

$n_i \geq n^*$

In this region, the liquidity constraints make it harder for funds to keep an equally weighted portfolio, and portfolio values are thus spread over a larger number of assets. We propose here a stochastic model of asset selection based on two main ingredients: first, that the selection probability of asset $\alpha$ by fund $i$ depends on the diversification $n_i$ of the fund and on the scaled rank of the capitalization of asset $\alpha$; second, that the investment is bounded by a hard constraint on the fraction of the market capitalization of asset $\alpha$.

We chose a security selection mechanism which rests on the scaled rank of capitalization of security $\alpha$, defined as $\rho_\alpha = r_\alpha/M$, where $r_\alpha$ is the rank of capitalization $C_\alpha$ and $M$ the number of securities at a given time. The selection probability $P(W_{i\alpha} > 0 \mid \rho_\alpha)$ is then obtained by a parametric fit to a Beta distribution in each logarithmic bin.
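The rank-based selection described above can be sketched as follows. Several choices here are assumptions that the text does not specify: rank 1 is given to the largest capitalization, the Beta parameters are estimated by the method of moments (the paper only states that a parametric fit is used), and a drawn scaled rank is mapped to the security with the closest $\rho_\alpha$. The function names are illustrative.

```python
import random
import statistics


def scaled_ranks(capitalizations):
    """Map each security to rho = r / M (assumption: r = 1 is the largest capitalization)."""
    ordered = sorted(capitalizations, key=capitalizations.get, reverse=True)
    m = len(ordered)
    return {sec: (rank + 1) / m for rank, sec in enumerate(ordered)}


def beta_moment_fit(samples):
    """Method-of-moments estimate of the Beta shape parameters (a, b)."""
    mean = statistics.fmean(samples)
    var = statistics.variance(samples)
    common = mean * (1.0 - mean) / var - 1.0
    if common <= 0:
        raise ValueError("sample variance too large for a Beta distribution")
    return mean * common, (1.0 - mean) * common


def pick_security(ranks, a, b, rng):
    """Draw a scaled rank from Beta(a, b) and return the security closest to it."""
    target = rng.betavariate(a, b)
    return min(ranks, key=lambda sec: abs(ranks[sec] - target))
```

In this sketch, `beta_moment_fit` would be applied to the scaled ranks of the securities actually held by the funds of one logarithmic bin of $n_i$, yielding one pair $(a, b)$ per bin.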
Note that we do not use the same rank-based selection mechanism in the low-diversification region because in this case it is harder to have a good fit with the Beta distribution. This is however only a minor point, since the capitalization is approximately power-law distributed and the two selection mechanisms are basically equivalent (the rank is proportional to a power of the capitalization); indeed, results are very similar in both cases.

Figure 6 shows that the distribution of the ranks in which a fund is invested is sensitive to its diversification $n_i$ for 2013-03-31. The Beta distribution, which is limited to the $[0, 1]$ interval, is flexible enough to describe the asset selection mechanism of a fund:

$$f(x; a, b) = \frac{1}{B(a, b)}\, x^{a-1} (1 - x)^{b-1}, \tag{6}$$

where $a$ and $b$ are the shape parameters of the distribution, and $B$ is a normalization constant.

Figure 6: Top: Empirical probability density function of investing in a security of scaled capitalization rank $\rho$ given the diversification $n_i$ of the fund. Bottom: Probability density function of investing in a security of scaled capitalization rank $\rho$ given the diversification $n_i$ of the fund, given by the model.

Figure 7: Coefficients $a$ and $b$ of the Beta distribution (Eq. 6) as a function of $n_i$. Linear fits are for eye-guidance only.

Maximum investment ratio
The funds limit their investment in a given asset. They seem to follow a simple rule: defining the investment ratio $f_{i\alpha} = W_{i\alpha}/C_\alpha$, one easily sees in Fig. 8 that each fund has a maximum investment ratio

$$f_i^{\max} = \max_\alpha \left( \frac{W_{i\alpha}}{C_\alpha} \right). \tag{7}$$

Since the average exchanged dollar-volume of an asset is proportional to its capitalization (Fig. 10), the existence of $f_i^{\max}$ is a way to account for the available liquidity.

Although that limit is clear for an individual fund, there is a large range of empirical values of $f_i^{\max}$ (Fig. 9).
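As a small illustration of Eq. (7), the maximum investment ratio of each fund can be computed directly from a table of holdings. The nested-dictionary layout and the function name `max_investment_ratio` are assumptions made for this sketch; the source does not describe how the data are stored.

```python
def max_investment_ratio(holdings, capitalizations):
    """Return {fund: max over securities of W_ia / C_a}.

    holdings: {fund: {security: position value W_ia}}
    capitalizations: {security: market capitalization C_a}
    Funds without any strictly positive position are omitted.
    """
    result = {}
    for fund, positions in holdings.items():
        ratios = [
            value / capitalizations[sec]
            for sec, value in positions.items()
            if value > 0
        ]
        if ratios:
            result[fund] = max(ratios)
    return result
```

For example, a fund holding 10 in a security of capitalization 1000 and 5 in a security of capitalization 100 has ratios 0.01 and 0.05, so its maximum investment ratio is 0.05.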
The simulation is done in a few simple steps:
1. Compute $n^*$ using the segmented model Eq. 5.
2. Select a fund $i$, with a number of assets $n_i$.
3. If $n_i < n^*$:
(a) Compute its optimal portfolio value using Eq. 1. The fund will invest $W_i^{\mathrm{opt}}/n_i$ for every position.
(b) Select assets randomly with a probability proportional to $C_\alpha$.
4. Else if $n_i \geq n^*$:
(a) Compute its $f_i^{\max}$, so that fund $i$ invests a fraction $f_i^{\max}$ of the capitalization of each of its $n_i$ assets.
(b) Select assets randomly following a Beta probability distribution (Fig. 6) with the parameters found in Fig. 7.
By iterating those steps we obtain Fig. 1. Since the simulation outputs a portfolio for every fund, we can directly infer the number of investors $m_\alpha$ of every security.

Figure 8: Fraction of the market capitalization of a security held by a fund. Each color represents a different fund. Top: Funds with a large diversification ($n_i > \ldots$). We can clearly see a delimitation for most of the funds, which corresponds to the maximum fraction $f_i^{\max}$. The value of $f_i^{\max}$ differs widely from one fund to another. Bottom: Funds with a low diversification ($n_i < 60$); $f_i^{\max}$ does not appear.

Figure 9: Empirical probability density function of $f_i^{\max}$ for all the funds.

Figure 10: Market capitalization as a function of the daily exchanged dollar volume. We find a slope close to 1 for all the dates in our database, confirming the hypothesis that the daily exchanged dollar volume of an asset is proportional to its market capitalization.

Figure 11: Top: Temporal evolution of the coefficients $\mu_<$, $\mu_>$ and $\gamma$. Bottom: Temporal evolution of the value of the cross-over point $n^*$.
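The simulation steps listed in this section can be sketched for a single fund as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: $n^*$, $f^{\max}$ and the Beta parameters are taken as given inputs, Eq. 1 is turned into an equality with a hypothetical `prefactor`, rank 1 is assigned to the largest capitalization, and distinct securities are obtained by simple rejection of repeated draws.

```python
import random


def simulate_fund(n_i, n_star, caps, f_max, a, b, rng, prefactor=1.0, mu=2.0):
    """Return {security: position value} for one simulated fund with n_i positions."""
    securities = sorted(caps, key=caps.get, reverse=True)  # rank 1 = largest cap (assumption)
    m = len(securities)
    if n_i > m:
        raise ValueError("fund cannot hold more securities than exist")
    weights = [caps[sec] for sec in securities]
    chosen = set()
    while len(chosen) < n_i:  # rejection of duplicates keeps positions distinct
        if n_i < n_star:
            # Low diversification: selection probability proportional to capitalization.
            sec = rng.choices(securities, weights=weights)[0]
        else:
            # High diversification: scaled capitalization rank drawn from Beta(a, b).
            rho = rng.betavariate(a, b)
            sec = securities[min(m - 1, int(rho * m))]
        chosen.add(sec)
    if n_i < n_star:
        # Equal weights: W_opt / n_i with W_opt = prefactor * n_i**mu (Eq. 1 as an equality).
        size = prefactor * n_i ** mu / n_i
        return {sec: size for sec in chosen}
    # Liquidity-constrained weights: a fraction f_max of each capitalization.
    return {sec: f_max * caps[sec] for sec in chosen}


def count_investors(portfolios):
    """Number of simulated funds holding each security (m_alpha)."""
    counts = {}
    for portfolio in portfolios:
        for sec in portfolio:
            counts[sec] = counts.get(sec, 0) + 1
    return counts
```

Running `simulate_fund` over all funds of a snapshot and passing the resulting portfolios to `count_investors` gives simulated pairs $(n_i, W_i)$ and $(m_\alpha, C_\alpha)$ that can be compared with the local averages of the data.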