Regression with Partially Observed Ranks on a Covariate: Distribution-Guided Scores for Ranks
Yuneung Kim, Johan Lim, Young-Geun Choi, Sujung Choi, Do Hwan Park
Abstract
This work is motivated by a hand-collected data set from one of the largest Internet portals in Korea. This data set records the top 30 most frequently discussed stocks on its on-line message board. The frequencies are considered to measure the attention paid by investors to individual stocks. The empirical goal of the data analysis is to investigate the effect of this attention on trading behavior. For this purpose, we regress the (next-day) returns on the (partially) observed ranks of frequencies. In the regression, the ranks are transformed into scores, for which purpose the identity or linear scores are commonly used. In this paper, we propose a new class of scores (a score function) that is based on the moments of order statistics of a pre-decided random variable. The new score function, denoted by D-rank, is shown to be asymptotically optimal in maximizing the correlation between the response and score when the pre-decided random variable and the true covariate are in the same location-scale family. In addition, the least-squares estimator using the D-rank consistently estimates the true correlation between the response and the covariate, and asymptotically approaches the normal distribution. We additionally propose a procedure for diagnosing a given score function (equivalently, the pre-decided random variable Z) and selecting one that is better suited to the data. We numerically demonstrate the advantage of using a correctly specified score function over that of the identity scores (or other misspecified scores) in estimating the correlation coefficient. Finally, we apply our proposal to test the effects of investors' attention on their returns using the motivating data set.
Keywords: Concomitant variable; investors' attention; linear regression; moments of order statistics; optimal scaling; partially observed ranks

∗ Yuneung Kim, Johan Lim and Young-Geun Choi are with the Department of Statistics, Seoul National University, Seoul, 151-747, Korea. Sujung Choi is with the School of Business Administration, Soongsil University, Seoul, 156-743, Korea. Do Hwan Park is with the Department of Mathematics and Statistics, University of Maryland at Baltimore County, Baltimore, MD, 21250, USA. All correspondence is to Johan Lim (E-mail: [email protected]).

Introduction
This paper is motivated by a hand-collected data set from Daum.net, the 2nd largest Internet portal in Korea. The Daum.net portal offers an on-line stock message board where investors can freely discuss specific stocks in which they might be interested. This portal also reports a ranked list of the top 30 stocks that are most frequently discussed by users on a daily basis. The data set was collected by the authors during the 537 trading days from October 4th, 2010, to November 23rd, 2012. Along with the rank data, we also collected financial data regarding individual companies from FnGuide. These additional data include stock-day trading volumes classified in terms of different types of investors, stock prices, stock returns, and so on.

The purpose of analyzing the collected data is to investigate the shifts in stock returns caused by variations in investor attention. In finance, researchers are often interested in determining the motivations that drive buying and selling decisions in stock markets. It is commonly assumed that investors efficiently process relevant information in a timely manner, but in reality, it is nearly impossible to be efficient because of information overload. In particular, individual investors are often less sophisticated than institutional investors and have a limited ability to process all relevant information. For this reason, individual investors may pay attention only to a limited amount of information, perhaps that which is relatively easy to access. The phenomenon of limited attention is a well-documented cognitive bias in the psychological literature (Kahneman, 1973; Camerer, 2003). This phenomenon affects the information-processing capacities of investors and thus may affect asset prices on the financial market. To empirically prove the effect of investor attention on stock returns, we regress the observed stock returns on the partially observed ranks.

Regression on a (partially observed) rank covariate has not previously been extensively studied in the literature.
A procedure that is commonly used in practice to address rank covariates is to (i) regroup the ranks into only a few groups (if the number of ranks is high) and (ii) treat the regrouped ranks as an ordinal categorical variable. Ordered categorical variables frequently arise in various applications and have been studied extensively in the literature. Score-based analysis is most commonly used for this purpose; see Hájek (1968), Hora and Conover (1984), Kimeldorf et al. (1992), Zheng (2008), Gertheiss (2014) and the references therein. Thus, this typical two-step procedure for addressing a rank covariate is equivalent to defining a score function for the ranks. However, as in the case of ordinal categorical variables, such a score-based approach suffers from an inherent drawback related to the choice of the score function; different choices of scores may lead to conflicting conclusions in the analysis (Graubard and Korn, 1987; Ivanova and Berger, 2001; Senn, 2007). The recommendations in the literature for selecting the score function are (i) to choose meaningful scores for the ordinal categorical variable based on domain knowledge of the data, (ii) to use equally spaced scores if scientifically plausible scores are not available (see Graubard and Korn (1987)), and (iii) to find optimal-scaling-transformed scores that maximize the correlation with the responses while preserving the assumed characteristics of the ordinal values (Linting et al., 2007; Costantini et al., 2010; de Leeuw and Mair, 2009; Mair and de Leeuw, 2010; Jacoby, 2016).

In this paper, we seek to provide an efficient tool for approach (i) described above, for the case in which some qualitative knowledge is available regarding the ranks or the ranking variable (the variable that is ranked). More specifically, we propose a new set of score functions, denoted by D-rank, and study their use in linear regression.
The proposed score function is based on the moments of order statistics (MOS) of a pre-decided random variable $Z$. This score function has several interesting properties related to the regression model if the pre-decided random variable is correctly specified, as listed below. Here, correct specification means that $Z$ is within the same location-scale family as the true (unobserved) covariate $X$. First, the D-rank is asymptotically optimal in the sense that it maximizes the correlation between the response and score if the distribution behind the D-rank is correctly specified. Second, the least-squares estimator using the D-rank consistently estimates the true correlation between the response and the covariate and asymptotically approaches the normal distribution. Finally, the residuals of the fitted regression allow us to diagnose the given score function (equivalently, the pre-decided random variable $Z$) and provide a tool for selecting a score function that is better suited to the data.

The remainder of this paper is organized as follows. In Section 2, we study the properties of the proposed D-rank. In this section, we show that the proposed D-rank is asymptotically optimal in maximizing the correlation between the response and score. We also demonstrate the asymptotic equivalence between the proposed score function and the quantile function; the quantile function may provide a better illustration of the qualitative features of the score function. In Section 3, we apply the score function to estimate the regression coefficient of the linear model or, more precisely, to estimate the correlation coefficient between the response and the scoring variable $X$. We prove that the least-squares estimator using the D-rank consistently estimates the correlation coefficient and is asymptotically normally distributed. In addition, we discuss the procedure for selecting an appropriate score function using the residuals.
In Section 4, we numerically demonstrate that using the correctly specified score function significantly reduces the mean square error in the estimation of the correlation coefficient. In Section 5, we analyze the motivating data set to investigate the existence of the attention effect. Finally, in Section 6, we briefly summarize the paper and discuss the application of the proposed scores to regression using other auxiliary covariates.

We consider a simple regression model in which only partial ranks of a covariate are observed. Specifically, suppose that $\{(Y_i, X_i), i = 1, 2, \ldots, n\}$ is the complete set of observations, where $Y_i$ is the variable of primary interest and $X_i$ is the covariate related to $Y_i$. For example, in our rank data from Daum.net, for $i = 1, 2, \ldots, n$, $Y_i$ is a relevant outcome such as earning rate or trading volume, $X_i$ is the "unobserved" investors' attention on the $i$th company measured by the frequency of on-line discussions, and $R_i$ is the "observed" rank of $X_i$ among $X_1, X_2, \ldots, X_n$.

We make certain assumptions regarding the distributions of $X$ and $Y$. We assume that the relationship between $X_i$ and $Y_i$ follows the linear model
$$Y_i = \mu_Y + \rho\,\sigma_Y \frac{X_i - \mu_X}{\sigma_X} + \epsilon_i, \qquad (1)$$
where the $\epsilon_i$ are IID values from a distribution with mean 0 and variance $\sigma_\epsilon^2$. The objective of this paper is the estimation and inference of $\rho = \mathrm{corr}(Y, X)$ (or the regression coefficient between $Y$ and $X$) based on the observed data $\{(Y_i, R_i), i = 1, 2, \ldots, n\}$. To do so, we aim to define a good score function $S(r)$ for the observed rank $r$ and consider the regression of $Y_{[r:n]}$ on $S(r)$, where $Y_{[r:n]}$ is the response $Y_i$ for $R_i = r$.

The D-rank we propose in this paper is a set of the MOS of a pre-decided random variable $Z$, which we assume is in the same location-scale family as the true covariate $X$. To be specific, suppose that $Z_1, Z_2, \ldots, Z_n$ are independent and identically distributed (IID) copies of the random variable $Z$ and that $Z_{(r:n)}$ is the corresponding $r$th-order statistic for $r = 1, 2, \ldots, n$. The D-rank defines the score of the rank $r$ as $S_n(r) = \alpha_{(r:n)} := E\big(Z_{(r:n)}\big)$ for $r = 1, 2, \ldots, n$.

We first show that, asymptotically, the D-rank maximizes the sample correlation between $Y_{[r:n]}$ and $\alpha_{(r:n)}$, $r = 1, 2, \ldots, n$, among all increasing functions $S_n(r): \{1, 2, \ldots, n\} \to \mathbb{R}$. Let $\bar{S}_n(r)$ and $\bar{\alpha}_{(r:n)}$ be the standardized scores (of $S_n(r)$ and $\alpha_{(r:n)}$) satisfying $\sum_{r=1}^n \bar{S}_n(r) = \sum_{r=1}^n \bar{\alpha}_{(r:n)} = 0$ and $\sum_{r=1}^n \bar{S}_n^2(r) = \sum_{r=1}^n \bar{\alpha}_{(r:n)}^2 = 1$. Let $\mathcal{S}_n$ and $\bar{\mathcal{S}}_n$ be the collections of all increasing functions $S_n(r)$ and $\bar{S}_n(r)$, respectively.

Theorem 1.
Under the linear model (1), if $Z$ is in the location-scale family of $X$, the D-rank maximizes the limit of the sample correlation between $Y_{[r:n]}$ and $\bar{S}_n(r)$ among $\bar{S}_n(r) \in \bar{\mathcal{S}}_n$:
$$\lim_{n\to\infty} \frac{1}{\sqrt{n}\,\widehat{\sigma}_Y} \sum_{r=1}^{n} \bar{S}_n(r)\big(Y_{[r:n]} - \bar{Y}_n\big), \qquad (2)$$
where $\widehat{\sigma}_Y^2 = \frac{1}{n}\sum_{r=1}^{n} \big(Y_{[r:n]} - \bar{Y}_n\big)^2$. The proof of Theorem 1 is given in the Appendix.

Theorem 1 establishes the asymptotic optimality of the D-rank for the regression in view of optimal scaling in the literature. Optimal scaling finds optimally transformed scores that best explain the assumed statistical model. It arises in various contexts including the Gifi classification of non-linear multivariate analysis (de Leeuw and Mair, 2009), the aspects (correlational and non-correlational) of multivariate data (Mair and de Leeuw, 2010), and non-linear principal component analysis (Linting et al., 2007; Costantini et al., 2010). Here, we adopt the idea of optimal scaling in Jacoby (2016) and find the transformation that maximizes the correlation between the response and the transformed scores. Theorem 1 above shows that the D-rank maximizes the correlation asymptotically if the pre-determined distribution for the D-rank is correctly specified.

The proposed score is closely related to the quantiles of the underlying distribution of $Z$. Let $F_Z(z)$ for $z \in \mathbb{R}$ and $Q_Z(q)$ for $q \in [0, 1]$ be the cumulative distribution function (CDF) and the quantile function (QF), respectively, of $Z$. In the estimation of $F_Z(z)$ from $\{Z_i, i = 1, 2, \ldots, n\}$, the $r$th-order statistic $Z_{(r:n)}$ serves as the $(r/n)$th sample quantile and estimates $Q_Z(r/n)$. More specifically, given $p_r = \frac{r}{n+1}$, $q_r = 1 - p_r$, and $Q_r = Q_Z(p_r)$, we can write
$$\alpha_{(r:n)} = Q_r + \frac{p_r q_r}{2(n+2)}\, Q_r^{(2)} + O\big(n^{-2}\big),$$
where $Q_r^{(2)} = -f_Z'(Q_r)\big/\{f_Z(Q_r)\}^3$ and $f_Z(z)$ is the probability density function of $Z$, which is assumed differentiable. We refer the reader to David (2003, Section 4.6) for the details of the relationship between the MOS and the quantiles.

Consideration of the QF may provide a better understanding of the qualitative features of the proposed score function. Suppose we expect the score function $S_n(r)$ to be convex in the tail (for $r \geq [nc]$ for a constant $c$ close to 1); in other words, $S_n(r+1) - S_n(r) \geq S_n(r) - S_n(r-1)$ for $r \geq [nc]$. From the equivalence between the MOS and quantiles, the convexity of the scores $S_n(r)$ is approximately equal to that of the quantile function $Q_Z(p)$. Furthermore, the convexity of $Q_Z(p)$ for $p \geq c$ implies the following equivalent statements: (i) $F_Z(z)$ is concave in $z$ and (ii) $f_Z(z)$ is decreasing in $z$, both for $z \geq Q_Z(c)$.

In this section, we consider a simple regression model in which only partial ranks of a covariate are observed. Specifically, suppose that $\{(Y_i, X_i), i = 1, 2, \ldots, n\}$ is the complete set of observations from the linear model (1), and $R_i$ is the rank of $X_i$ among $X_1, X_2, \ldots$
, $X_n$. The rank $R_i$ of $X_i$ is indirectly measured by the frequency of on-line discussions of the $i$th company.

In this paper, we consider the case in which the ranks $R_i$ are partially observed in the sense that we observe only
$$U_i = R_i\, \mathrm{I}\big(R_i \leq m\big) + m^{+}\, \mathrm{I}\big(R_i > m\big)$$
rather than $R_i$, where $m^{+}$ is an arbitrary constant that is greater than $m$. Finally, the observations are $\{(Y_i, U_i), i = 1, 2, \ldots, n\}$. We let $Y_{[r:n]} = \sum_{i=1}^{n} Y_i\, \mathrm{I}\big(R_i = r\big)$ for $r = 1, 2, \ldots, m$, and denote the above partially observed data by $\mathbf{Y}_{[m]}$ for notational simplicity.

The objective of this section is to identify a good estimator of $\rho = \mathrm{corr}(Y, X)$ (or the regression coefficient between $Y$ and $X$) and to test $H_0: \rho = 0$ versus $H_1: \rho \neq 0$ (or $\rho > 0$) based on $\mathbf{Y}_{[m]}$.

To estimate $\rho$, we recall the assumptions regarding the distributions of $X$ and $Y$: the relationship between $X_i$ and $Y_i$ follows the linear model (1), where the $\epsilon_i$ are IID values from a distribution with mean 0 and variance $\sigma_\epsilon^2$. By ordering on the $X_i$, we have, for $r = 1, \ldots, n$,
$$Y_{[r:n]} = \mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}\big(X_{(r:n)} - \mu_X\big) + \epsilon_{[r:n]}, \qquad (3)$$
where $\rho = \mathrm{corr}(Y, X)$, and
$$E\big(Y_{[r:n]}\big) = \mu_Y + \rho\,\sigma_Y\, \alpha_{(r:n)}, \qquad (4)$$
$$\mathrm{var}\big(Y_{[r:n]}\big) = \sigma_Y^2\big(\rho^2 \beta_{(rr:n)} + 1 - \rho^2\big), \qquad \mathrm{cov}\big(Y_{[r:n]}, Y_{[s:n]}\big) = \rho^2 \sigma_Y^2\, \beta_{(rs:n)}, \quad r \neq s,$$
with
$$\alpha_{(r:n)} = E\left\{\frac{X_{(r:n)} - \mu_X}{\sigma_X}\right\} \quad \text{and} \quad \beta_{(rs:n)} = \mathrm{Cov}\left(\frac{X_{(r:n)} - \mu_X}{\sigma_X},\ \frac{X_{(s:n)} - \mu_X}{\sigma_X}\right)$$
for $r, s = 1, 2, \ldots$
, $n$ (David and Galambos, 1974; David, 2003).

Motivated by the identities (3) and (4) given above, we propose the least-squares estimator
$$\widehat{\rho}(s) \equiv \frac{1}{\widehat{\sigma}_Y} \cdot \frac{\sum_{r=1}^{[ns]} \alpha_{(r:n)}\big\{Y_{[r:n]} - \widehat{\mu}_Y\big\}}{\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2} \qquad (5)$$
as an estimator of $\rho$ with $s = m/n$, where $\widehat{\mu}_Y = \sum_{i=1}^{n} Y_i/n$ and $\widehat{\sigma}_Y^2 = \sum_{i=1}^{n} \big(Y_i - \widehat{\mu}_Y\big)^2/n$ are the empirical estimators of the mean and variance, respectively, of $Y$.

We claim that, if $X$ is drawn from a location-scale family generated by $Z$, then the least-squares estimator $\widehat{\rho}(s)$ with $s = m/n$ in (5), calculated from the partial observations $\mathbf{Y}_{[m]}$, is consistent and asymptotically normally distributed with an appropriate scale, as shown in Theorem 2. Define
$$\Psi_n^{I}(s) := \frac{1}{n}\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2\, \sigma_{(r:n)}^2, \qquad \Psi_n^{II}(s) := \frac{1}{n}\sum_{r_1=1}^{[ns]}\sum_{r_2=1}^{[ns]} \alpha_{(r_1:n)}\, \alpha_{(r_2:n)}\, \beta_{(r_1,r_2:n)}, \qquad \Phi_n(s) := \frac{1}{n}\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2,$$
where $\sigma_{(r:n)}^2 = \mathrm{var}\big(X_{(r:n)}\big)$, and let $\Psi_\infty^{I}(s)$, $\Psi_\infty^{II}(s)$ and $\Phi_\infty(s)$ be the limits of $\Psi_n^{I}(s)$, $\Psi_n^{II}(s)$ and $\Phi_n(s)$, respectively (under the assumption that they exist).

Theorem 2. Under the assumption that $X$ is drawn from a distribution of a location-scale family with a finite variance, the distribution of $\sqrt{n}\big(\widehat{\rho}(s) - \rho\big)$ converges to the normal distribution with mean $0$ and variance $\big\{\Psi_\infty^{I}(s)/\sigma_Y^2 + \rho^2\, \Psi_\infty^{II}(s)\big\}\big/\Phi_\infty^2(s)$.

The proof of Theorem 2 is provided in the Appendix.

We conclude this section with two remarks regarding Theorem 2. First, by the tower property of the conditional expectation,
$$\mathrm{var}\big(\sqrt{n}\,\widehat{\rho}\big) > \Big\{\frac{1}{n}\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2\Big\}^{-1} \geq \Big\{\frac{1}{n}\sum_{r=1}^{n} \alpha_{(r:n)}^2\Big\}^{-1} \to \mathrm{var}\Big(\frac{X - \mu_X}{\sigma_X}\Big)^{-1} = 1,$$
and when $\rho = 0$, the asymptotic variance of $\sqrt{n}\,\widehat{\rho}$ is larger than 1, which is the variance of the least-squares estimator in the case where $X$ is completely observed. Second, it is possible to test the hypothesis $H_0: \rho = 0$ using the statistic $T = \sqrt{n}\,\widehat{\rho}$, which has an asymptotically normal distribution with mean 0 and variance $1\big/\Phi_\infty(s)$ under $H_0$.

As in the classical linear model, the residuals can provide guidance for identifying a better model and score function. The residuals are defined as $e^*_{[r:n]} = \big(Y_{[r:n]} - \mu_Y\big)/\sigma_Y - \widehat{\rho}\,\alpha_{(r:n)}$ for $r = 1, 2, \ldots, [ns]$. Statistical properties of the residuals, which are analogous to those in the classical linear model, are summarized as follows.

Theorem 3.
Under the assumptions of Theorem 2, the following statements hold for the residuals: (i) $E\big(e^*_{[r:n]}\big) = 0$; (ii)
$$\mathrm{var}\big(e^*_{[r:n]}\big) = \big\{\rho^2 \beta_{(rr:n)} + \big(1 - \rho^2\big)\big\} + \alpha_{(r:n)}^2\, \frac{\Psi_n^{I}(s)}{\sigma_Y^2\, \Phi_n^2(s)} - \frac{2}{\sum_{r'=1}^{[ns]} \alpha_{(r':n)}^2}\Big\{\rho^2 \sum_{w=1}^{[ns]} \alpha_{(w:n)}\, \alpha_{(r:n)}\, \beta_{(rw:n)} + \alpha_{(r:n)}^2\big(1 - \rho^2\big)\Big\};$$
(iii) $E\big(e^*_{[r:n]}\, \alpha_{(r:n)}\big) = 0$; and (iv) $E\big(e^*_{[r:n]}\, \widehat{Y}^*_{[r:n]}\big) = 0$, where $\widehat{Y}^*_{[r:n]} = \widehat{\rho}\,\alpha_{(r:n)}$ is the standardized fitted value.

Theorem 3 implies that the residuals are uncorrelated with the scores $\alpha_{(r:n)}$ and the predicted values $\widehat{Y}_{[r:n]}$. Thus, the residual plots, which are the plots of (i) $r$ versus $e^*_{[r:n]}$, (ii) $\alpha_{(r:n)}$ versus $e^*_{[r:n]}$, and (iii) $\widehat{Y}_{[r:n]}$ versus $e^*_{[r:n]}$, have the same interpretations as those of the classical linear model. In practice, we plug in $\mu_Y$ and $\sigma_Y$ with their empirical estimators and use $e_{[r:n]} = \big(Y_{[r:n]} - \widehat{\mu}_Y\big)/\widehat{\sigma}_Y - \widehat{\rho}\,\alpha_{(r:n)}$.

The residual sum of squares may be another useful tool for measuring the goodness of fit of the proposed model, as in the classical linear model. The residual sum of squares in our model is defined as
$$\mathrm{RSS} = \sum_{r=1}^{[ns]} \left(\frac{Y_{[r:n]} - \widehat{\mu}_Y}{\widehat{\sigma}_Y} - \widehat{\rho}\,\alpha_{(r:n)}\right)^2$$
and will be used along with the residual plots as a guide for selecting a better score function.

Finally, the proposed least-squares estimator (5) assumes that the regression line between $\alpha_{(r:n)}$ and $\big(Y_{[r:n]} - \widehat{\mu}_Y\big)$ has an intercept (at the $y$ axis) of 0. Thus, if the model (or the score function) is correctly specified, then the intercept estimated by the regression (with intercept) should be close to 0, and the estimated intercept therefore serves as a measure for checking the correctness of the score function. Note that the regression (without intercept) performed in this paper is based on observations of the top $[ns]$ ranks and assumes that the function passes through the origin (see Figure 4).
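To make the estimator (5) and the residual diagnostics above concrete, the following sketch implements them for Monte Carlo D-rank scores. All function names and parameter values are our own illustrative choices (the paper gives no code), and the score routine assumes only that we can simulate from the pre-decided variable $Z$.

```python
import numpy as np

def d_rank_scores(n, sampler, n_mc=5000, rng=None):
    """Monte Carlo estimate of the D-rank scores alpha_(r:n) = E[Z_(r:n)],
    the expected order statistics of the pre-decided variable Z."""
    rng = np.random.default_rng(rng)
    z = np.sort(sampler((n_mc, n), rng), axis=1)   # rows: Z_(1:n) <= ... <= Z_(n:n)
    return z.mean(axis=0)

def rho_hat(y, scores, m):
    """Least-squares estimator (5) from the top-m concomitants.
    y[r-1] holds Y_[r:n] for r = 1..m; the remaining entries of y are the
    unranked responses, used only for the empirical mean and variance of Y."""
    mu, sigma = y.mean(), y.std()
    a = scores[:m]
    return (a @ (y[:m] - mu)) / (sigma * (a ** 2).sum())

def residuals_and_rss(y, scores, m, rho):
    """Plug-in residuals e_[r:n] = (Y_[r:n] - mu_hat)/sigma_hat - rho*alpha_(r:n)
    and the residual sum of squares used to compare candidate score functions."""
    mu, sigma = y.mean(), y.std()
    e = (y[:m] - mu) / sigma - rho * scores[:m]
    return e, float((e ** 2).sum())
```

By the least-squares algebra, the residuals produced this way are exactly orthogonal to the scores used in the fit, which is what makes the residual plots of this section interpretable as in the classical linear model.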
The least-squares estimator presented in Section 3.2 does not fully use the information contained in $\{Y_{[r:n]} := Y_i\,\mathrm{I}(R_i = r), r > m\}$; it is used only to estimate $\mu_Y$ and $\sigma_Y$, not to estimate $\rho$ itself. In this section, we briefly demonstrate how $\widehat{\rho}$ can be modified to incorporate these unranked observations.

We consider the following modified estimator:
$$\widehat{\rho}_m(s) \equiv \frac{1}{\widehat{\sigma}_Y} \cdot \frac{\sum_{r=1}^{[ns]} \alpha_{(r:n)}\big\{Y_{[r:n]} - \widehat{\mu}_Y\big\} + \big(n - [ns]\big)\,\bar{\alpha}_{[ns]+}\big(\bar{Y}_{[ns]+} - \widehat{\mu}_Y\big)}{\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2 + \big(n - [ns]\big)\,\bar{\alpha}_{[ns]+}^2},$$
where $\bar{\alpha}_{[ns]+} = \sum_{r=[ns]+1}^{n} \alpha_{(r:n)}\big/\big(n - [ns]\big)$ and $\bar{Y}_{[ns]+} = \sum_{r=[ns]+1}^{n} Y_{[r:n]}\big/\big(n - [ns]\big)$. This modified estimator also asymptotically approaches the normal distribution. Specifically, let
$$\widetilde{\alpha}_{(r:n)} = \begin{cases} \alpha_{(r:n)}, & r = 1, 2, \ldots, [ns], \\ \bar{\alpha}_{[ns]+}, & r = [ns]+1, [ns]+2, \ldots, n, \end{cases}$$
and define $\widetilde{\Psi}_n^{I}(s) = \frac{1}{n}\big\{\sum_{r=1}^{n} \widetilde{\alpha}_{(r:n)}^2\, \sigma_{(r:n)}^2\big\}$, $\widetilde{\Psi}_n^{II}(s) = \frac{1}{n}\big\{\sum_{r_1=1}^{n}\sum_{r_2=1}^{n} \widetilde{\alpha}_{(r_1:n)}\,\widetilde{\alpha}_{(r_2:n)}\,\beta_{(r_1,r_2:n)}\big\}$ and $\widetilde{\Phi}_n(s) = \frac{1}{n}\sum_{r=1}^{n} \widetilde{\alpha}_{(r:n)}^2$. As in the previous section, we assume the limits $\widetilde{\Psi}_\infty^{I}(s) = \lim_{n\to\infty} \widetilde{\Psi}_n^{I}(s)$, $\widetilde{\Psi}_\infty^{II}(s) = \lim_{n\to\infty} \widetilde{\Psi}_n^{II}(s)$ and $\widetilde{\Phi}_\infty(s) = \lim_{n\to\infty} \widetilde{\Phi}_n(s)$ exist. Then, we can write the following theorem.

Theorem 4. Under the same assumptions as those of Theorem 2, the distribution of $\sqrt{n}\big(\widehat{\rho}_m(s) - \rho\big)$ converges to the normal distribution with mean $0$ and variance $\big\{\widetilde{\Psi}_\infty^{I}(s)/\sigma_Y^2 + \rho^2\,\widetilde{\Psi}_\infty^{II}(s)\big\}\big/\widetilde{\Phi}_\infty^2(s)$.

Proof. We have
$$\sqrt{n}\big(\widehat{\rho}_m - \rho\big) = \sqrt{n}\left\{\frac{\sum_{r=1}^{[ns]} \alpha_{(r:n)}\big(Y_{[r:n]} - \widehat{\mu}_Y\big) + \big(n - [ns]\big)\,\bar{\alpha}_{[ns]+}\big(\bar{Y}_{[ns]+} - \widehat{\mu}_Y\big)}{\widehat{\sigma}_Y\big[\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2 + \big(n - [ns]\big)\,\bar{\alpha}_{[ns]+}^2\big]} - \rho\right\} = \sqrt{n}\left(\frac{\sum_{r=1}^{n} \widetilde{\alpha}_{(r:n)}\big(Y_{[r:n]} - \widehat{\mu}_Y\big)}{\widehat{\sigma}_Y \sum_{r=1}^{n} \widetilde{\alpha}_{(r:n)}^2} - \rho\right),$$
the distribution of which converges to the normal distribution with mean 0 and variance $\big\{\widetilde{\Psi}_\infty^{I}(s)/\sigma_Y^2 + \rho^2\,\widetilde{\Psi}_\infty^{II}(s)\big\}\big/\widetilde{\Phi}_\infty^2(s)$ following the same arguments as in the proof of Theorem 2.

Numerical Study
In this section, we numerically investigate the advantage we can gain by choosing the correct score function to estimate $\rho = \mathrm{corr}(Y, X)$. The performance of an estimator is measured in terms of its bias and its mean square error (MSE), which we numerically estimate based on 1000 simulated data sets and the estimators obtained therefrom.

The data sets are generated from the regression model
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad i = 1, 2, \ldots, n,$$
where the $\epsilon_i$ are independently drawn from a normal distribution and the covariate $X_i$ is drawn from one of three distributions: a uniform distribution, a normal distribution, or a gamma distribution (the $U(0, \cdot)$, $N(0, \cdot)$ and $G(3, \cdot)$ columns of Tables 1 and 2). As stated in Section 2, the score function of the uniform distribution is almost equivalent to the identity score function $S_n(r) = r$. However, the normal distribution and the gamma distribution have heavier tails than does the uniform distribution, and their score functions are convex in the right tail. We set the parameter $\delta$ to ensure that $\rho = 0$, $0.3$, $\ldots$, $0.7$, where $\rho = \delta/\sigma_Y$. Finally, in each considered case, the sample size $n$ and the number of partially observed ranks are set to all possible combinations of $n = 500$ or $2000$ and $r = 20$, $50$, or $100$. When estimating $\rho$, we apply four different scores: the proposed MOS-based score functions obtained from the three distributions listed above and the identity score function, which is commonly used in practice. The approximated bias and MSE values are reported in Tables 1 and 2.

We can observe several interesting findings from these tables. First, the correctly specified score function performs better than the others when there exists a strong correlation between $X$ and $Y$ (when $\rho$ is large). However, when $\rho = 0$, there is almost no difference among the four considered scores. Second, as the number of observations increases, in the sense that either $r$ or $n$ increases, the superiority of the correctly specified scores over the others becomes apparent even when $\rho$ is not large. Third, as conjectured in the previous section, the scores based on the uniform distribution perform almost identically to the identity scores. Finally, the differences between the correctly specified scores and the others are significant regardless of $\rho$ or the sample size ($r$ or $n$) when the distribution of $X$ has a heavier right tail (the gamma distribution).

To investigate how the attention of investors affects stock returns, we merge the hand-collected
Daum.net rank data set and the financial data from
FnGuide. We illustrate how the returns of attention-grabbing stocks fluctuate around the event dates when investors pay attention to these stocks. The variables to be used in the analysis are as follows. (1) "R": the rank of an individual stock on day $t$; if the rank value is 1, then the stock is the most frequently discussed stock on the Daum stock message board on that day. This is the key variable that measures the degree of investor attention. (2) "RN": raw returns on day $t+1$ (the next day) (%), which is of primary interest and is the quantity that we wish to predict. (3) "R0": raw returns on day $t$ (%). (4) "R1" through "R5": raw returns on days $t-1$ through $t-5$ (%).

As stated previously, the primary goal of our analysis is to determine how the returns of attention-grabbing stocks fluctuate around the event dates when investors pay attention to these stocks. The next-day return can also be influenced by several other factors in addition to investor attention. To account for the effects of these other factors, we consider the residuals obtained after regressing the next-day return against all other covariates except the rank, "R".

Table 1: $n = 500$. In the MSE columns, the numbers in bold-faced text are the smallest among the evaluated score functions. In both the bias and MSE columns, the underlined numbers are the values from the correctly specified score functions.

Table 2: $n = 2000$. In the MSE columns, the numbers in bold-faced text are the smallest among the evaluated score functions. In both the bias and MSE columns, the underlined numbers are the values from the correctly specified score functions.

These residuals are obtained from the multiple linear regression model, which is defined as follows:
$$\mathrm{RN}_i = \beta_0 + \sum_{l=0}^{5} \beta_{l+1}\, \mathrm{R}l_i + \beta_{\mathrm{ME}}\, \mathrm{ME}_i + \beta_{\mathrm{T}}\, \mathrm{T}_i + \beta_{\mathrm{TA}}\, \mathrm{TA}_i + \epsilon_i, \quad i = 1, 2, \ldots, n, \qquad (6)$$
where $n$ is the number of stock-day observations. Let $Y^t_i$ be the absolute value of the residual of company $i$ on day $t$ obtained from the regression (6). We then select the absolute residuals whose ranks are reported to be within the top 30 for the primary analysis. Below, $Y^t_{[r:n]}$ is the absolute residual corresponding to rank $r$ on day $t$ for $r = 1, \ldots,$
30 and $t = 1, 2, \ldots, T$ ($= 537$).

In Figure 1, we plot the quantiles of $\{Y^t_{[r:n]}, t = 1, 2, \ldots, T\}$ for each $r = 1, 2, \ldots, 30$. The average of $Y^t_{[r:n]}$ is not increasing at $r = 1$ and 2, which we hypothesize reflects the heterogeneity of investor expectations with regard to highly attention-grabbing stocks. In other words, the ranking of the Daum board is purely determined by the attention of individual investors, and stocks related to news that is difficult to characterize as either good or bad often receive the greatest attention and the highest ranks. We introduce an additional term to explain this apparent local non-monotonicity and consider the model
$$Y^t_{[r:n]} = \mu^t_Y + \rho^t \sigma^t_Y\, \alpha_{(r:n)} + \gamma^t\, \mathrm{I}(r \leq 2) + \eta^t_{[r:n]}, \quad r = 1, 2, \ldots, 30, \qquad (7)$$
for $t = 1, 2, \ldots, T$ with $T = 537$, where $n$ is the total number of ranked stocks.

In the regression model, we consider the scores from the standardized distributions of the location-scale families generated by the following three distributions: (i) a uniform distribution on $(0,$
1) (called the uniform score), (ii) a positive normal distribution $X = |Z|$, $Z \sim N(0,$
1) (called the half-normal score), and (iii) a power-law distribution $X$ whose CDF
is $F(x) = 1 - x^{-\alpha}$ with $\alpha = 2$ (called the power-law score).

Figure 1: Plot of the means and the 25%, 50%, and 75% quantiles of $\{Y^t_{[r:n]}, t = 1, 2, \ldots, T\}$ for each $r = 1, 2, \ldots, 30$.

We estimate $\rho^t$ and $\gamma^t$ to minimize the empirical squared-error loss of the model (7) by iterating the following steps:

1. Given the least-squares estimator of $\rho$, denoted by $\widehat{\rho}^{t}_{(0)}$, update the estimate of $\gamma$ as follows:
$$\widehat{\gamma}^t = \frac{1}{2}\Big[\big(Y^t_{[1:n]} - \mu^t_Y - \sigma^t_Y\, \widehat{\rho}^{t}_{(0)}\, \alpha_{(1:n)}\big) + \big(Y^t_{[2:n]} - \mu^t_Y - \sigma^t_Y\, \widehat{\rho}^{t}_{(0)}\, \alpha_{(2:n)}\big)\Big].$$
2. Given the estimate of $\gamma$, denoted by $\widehat{\gamma}^{t}_{(0)}$, update the estimate of $\rho$ using the LSE proposed in the previous section as follows:
$$\widehat{\rho}^t = \frac{1}{\sigma^t_Y}\left\{\frac{\sum_{r=1}^{30} \alpha_{(r:n)}\big(Y^t_{[r:n]} - \mu^t_Y - \widehat{\gamma}^{t}_{(0)}\, \mathrm{I}(r \leq 2)\big)}{\sum_{r=1}^{30} \alpha_{(r:n)}^2}\right\}.$$
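The two-step iteration above can be sketched as follows, with the responses already standardized (so the $\sigma^t_Y$ factor drops out) and generic scores passed in; the function name, the fixed iteration count, and the initial values are our own illustrative choices.

```python
import numpy as np

def fit_rho_gamma(y_std, scores, n_iter=20):
    """Alternating least squares for model (7) with standardized responses:
    y_std[r-1] ~ rho * scores[r-1] + gamma * I(r <= 2) + noise, r = 1..30.
    Step 0: preliminary rho from ranks r >= 3 only (r = 1, 2 excluded);
    then alternate the gamma-update and rho-update described in the text."""
    y = np.asarray(y_std, dtype=float)
    a = np.asarray(scores, dtype=float)
    dummy = np.zeros_like(a)
    dummy[:2] = 1.0                                        # indicator I(r <= 2)
    rho = (a[2:] @ y[2:]) / (a[2:] ** 2).sum()             # preliminary estimate rho_(0)
    gamma = 0.0
    for _ in range(n_iter):
        gamma = (y[:2] - rho * a[:2]).mean()               # step 1: gamma given rho
        rho = (a @ (y - gamma * dummy)) / (a ** 2).sum()   # step 2: rho given gamma
    return rho, gamma
```

Because each step is an exact one-coordinate least-squares update, the pair converges to the joint minimizer of the empirical squared-error loss.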
Figure 2: $\{\alpha_{(r:n)}\}_{r=1,\ldots,30}$ for each distribution (uniform (U), positive normal (PN), and power law (PL)) on different scales.

The initial value $\widehat{\rho}^{t}_{(0)}$ is obtained from the preliminary linear regression on $\big\{\big(\alpha_{(r:n)}, (Y^t_{[r:n]} - \mu^t_Y)/\sigma^t_Y\big)\big\}_{r=3,\ldots,30}$, $t = 1, \ldots, T$, in which the data corresponding to $r = 1, 2$ are excluded. Here, $\mu^t_Y$ and $\sigma^t_Y$ are estimated based on their empirical values $\widehat{\mu}^t_Y = \big(\sum_{r=1}^{n} Y^t_{[r:n]}\big)/n$ and $\big(\widehat{\sigma}^t_Y\big)^2 = \sum_{r=1}^{n} \big(Y^t_{[r:n]} - \widehat{\mu}^t_Y\big)^2/n$.

To choose the most appropriate score function among the three considered, we follow the guidelines presented in Section 3.2 and perform a residual analysis. First, we plot $\alpha_{(r:n)}$ against the quantiles of the corresponding residuals to identify any remaining trend not explained by the model (see Figure 3). This figure shows that the uniform score and the half-normal score exhibit additional linear trends not explained by the linear model (7), whereas the power-law score performs well. Second, we plot
$$\left(\alpha_{(r:n)},\ \frac{Y^t_{[r:n]} - \widehat{\mu}_Y}{\widehat{\sigma}_Y}\right), \quad r = 3, 4, \ldots, 30, \quad t = 1, 2, \ldots, T,$$
and apply the least-squares fits with and without intercept. As we know from the model (3), the estimated regression line with intercept should cross the origin if the scores are correctly specified. Figure 4 reveals that the estimated intercept of the power-law score is closest to zero among the intercepts of the three considered scores. Finally, the residual sums of squares of the three scores are found to be 152186.7, 150706.3, and 150288.9, respectively. This finding also supports the superiority of the power-law score function, and in the following analysis, we focus on the power-law score function.
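The three candidate score functions can be generated as Monte Carlo moments of order statistics from the corresponding standardized distributions. The sketch below is our own illustration: it shrinks the sample size to the 30 displayed ranks for brevity (in the analysis the scores are the top 30 of a much larger $n$), and it uses a power-law exponent of 2.5 rather than the $\alpha = 2$ quoted above so that the variance used for standardization is finite.

```python
import numpy as np

rng = np.random.default_rng(0)

def mos_scores(sampler, n=30, n_mc=20000):
    """Monte Carlo moments of order statistics alpha_(r:n), r = 1..n,
    after standardizing the draws so the three score sets are comparable."""
    z = sampler((n_mc, n))
    z = (z - z.mean()) / z.std()          # standardize the generating distribution
    return np.sort(z, axis=1).mean(axis=0)

# (i) uniform score, (ii) half-normal score, (iii) power-law score.
uniform_scores = mos_scores(lambda s: rng.uniform(size=s))
half_normal_scores = mos_scores(lambda s: np.abs(rng.standard_normal(s)))
power_scores = mos_scores(lambda s: rng.uniform(size=s) ** (-1 / 2.5))  # Q(u) = u^(-1/alpha)
```

The uniform scores come out nearly equally spaced (close to the identity scores), while the half-normal and power-law scores are convex in the right tail, which is the qualitative feature that the residual analysis above discriminates between.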
Figure 3: The averages and the 25%, 50%, and 75% quantiles of the residuals for each rank under the uniform, positive normal, and power-law scores.
The primary goal of the analysis is to investigate whether the attention of investors affectsthe returns of a stock on the following day. Specifically, we are interested in testing H : ρ = 0under the assumption that ρ t = ρ for every t . To test this hypothesis, we consider a combined19 . . . . . . Score S t anda r d i z ed Y : ( Y − m u ) / s i g m a **************************** **************************** **************************** Unif. Pos. norm. Power law* Mean of std. Y’s for each scoreFitted line with interceptFitted line without intercept
Figure 4: Check of proportionality between the standardized residuals and the scores. Points marked by '*' represent the average standardized residuals for each score (rank), the dotted line represents the fit of a naïve simple regression with an intercept, and the solid line represents our model. Refer to Sections 3.5 and 5.3 for details.

statistic of $\{\widehat\rho_t,\ t = 1, \ldots, T\}$, that is,
$$t_\rho = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} U_t, \qquad (8)$$
where $U_t = \sqrt{n}\,\widehat\rho_t$. Here, the estimates of $\rho$ for each day $t$, denoted by $\widehat\rho_t$, are serially dependent on each other, as are the $U_t$'s. Thus, to obtain the reference distribution of $t_\rho$, we further assume that $\{U_t,\ t = 1, \ldots, T\}$ is stationary and that $\mathrm{E}|U_t|^{\kappa} < \infty$ for some $\kappa > 2$. Under these assumptions, $t_\rho$ is asymptotically normal with mean 0 and variance
$$\lim_{T \to \infty} \frac{1}{T} \sum_{k=0}^{T-1} (T - k)\, \mathrm{cov}(U_t, U_{t+k}).$$
We estimate this variance from $\{U_t,\ t = 1, \ldots, T\}$ as
$$\frac{1}{T} \sum_{k=0}^{m} (T - k)\, \widehat{\mathrm{cov}}(U_t, U_{t+k})$$
for a sufficiently large $m$, where $\widehat{\mathrm{cov}}(U_t, U_{t+k})$ denotes the empirical covariance of the observed pairs $(U_1, U_{1+k}), (U_2, U_{2+k}), \ldots, (U_{T-k}, U_T)$. An additional interesting feature of the combined procedure is that the test statistic $t_\rho$ is a rough estimator of $\rho$ over all $T$ trading days (after scaling). It is calculated as
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} U_t = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \sqrt{n}\,\widehat\rho_t = \sqrt{nT}\left(\frac{1}{T} \sum_{t=1}^{T} \widehat\rho_t\right) = \sqrt{nT}\,\frac{1}{T} \sum_{t=1}^{T} \frac{\sum_{r=1}^{n} \alpha_{(r:n)}\big\{Y_{t[r:n]} - \widehat\mu_Y^t - \widehat\gamma_t\,\mathrm{I}(r \le 2)\big\}}{\widehat\sigma_Y^t \sum_{r=1}^{n} \alpha_{(r:n)}^2} \approx \frac{\sqrt{nT}}{\widehat\sigma_Y}\,\frac{\sum_{t=1}^{T} \sum_{r=1}^{n} \alpha_{(r:n)}\big\{Y_{t[r:n]} - \widehat\mu_Y - \widehat\gamma_t\,\mathrm{I}(r \le 2)\big\}}{T \sum_{r=1}^{n} \alpha_{(r:n)}^2} \approx \sqrt{nT}\,\widehat\rho_{\mathrm{lse}}, \qquad (9)$$
where $\widehat\rho_{\mathrm{lse}}$ is the least-squares estimator under the assumption that $\rho_t = \rho$ for all $t$. The difference between the right- and left-hand sides of (9) lies in the definition of $\widehat\gamma_t$, which is defined using $\widehat\rho_t$ rather than $\widehat\rho_{\mathrm{lse}}$.

The results of the test indicate that the average value of $\widehat\rho_t$, which is an estimator of $\rho$, is 0.
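The combined statistic (8) and its lag-window variance estimate are straightforward to compute from the daily estimates. The following is a minimal sketch under our own naming conventions; the truncation lag $m$ is an analyst's choice, and the synthetic daily estimates are placeholders, not the paper's data.

```python
import numpy as np

def combined_stat(rho_hat, n):
    """t_rho = T^{-1/2} * sum_t U_t with U_t = sqrt(n) * rho_hat_t, as in (8)."""
    U = np.sqrt(n) * np.asarray(rho_hat, dtype=float)
    return U.sum() / np.sqrt(len(U)), U

def lag_window_variance(U, m):
    """Plug-in variance estimate (1/T) * sum_{k=0}^m (T - k) * cov_hat(U_t, U_{t+k}),
    where cov_hat is the empirical lag-k covariance of the U_t series."""
    U = np.asarray(U, dtype=float)
    T = len(U)
    Uc = U - U.mean()
    total = 0.0
    for k in range(min(m, T - 1) + 1):
        cov_k = (Uc[: T - k] * Uc[k:]).mean()  # empirical lag-k covariance
        total += (T - k) * cov_k
    return total / T

# illustration with synthetic daily estimates (not the paper's data)
rng = np.random.default_rng(1)
t_rho, U = combined_stat(rng.normal(0.0, 0.1, size=250), n=30)
var_hat = lag_window_variance(U, m=10)
z = t_rho / np.sqrt(var_hat)  # compare to a standard normal reference
```

With $m = 0$ the estimate reduces to the sample variance of the $U_t$'s, which is the appropriate choice when the daily estimates show no serial dependence.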
The $p$-value obtained when testing $H_0\colon \rho = 0$ is less than $10^{-}$ and statistically supports the association between investor attention and the next-day returns of the stocks.

Conclusion

In this paper, we study a regression problem based on a partially observed rank covariate. We propose a new set of score functions and study their application in simple linear regression. We demonstrate that the least-squares estimator calculated from the newly proposed score consistently estimates the correlation coefficient between the response and the unobserved true covariate if the score function is correctly specified. We also define procedures based on the obtained residuals to identify the correct score function for the given data. The proposed estimator and procedures are applied to rank data collected from
Daum.net, and we empirically verify the association between investor attention and next-day stock returns.

Finally, we conclude the paper with two remarks on the proposed score function. First, the application of the proposed score function is not restricted to linear regression; it may also be appropriate for other statistical procedures based on ranks, including the well-known rank aggregation problem (Breitling et al., 2004; Eisinga et al., 2013). Second, the score function can still be used for the multiple linear regression model $Y_i = X_i \beta + \mathbf{Z}_i^{\mathrm T} \boldsymbol\eta + \epsilon_i$ with an additional covariate vector $\mathbf{Z} = (Z_1, Z_2, \ldots, Z_q)^{\mathrm T}$. Similarly to the case of simple linear regression, we have the representation
$$Y_{[r:n]} = \mu_Y + \frac{X_{(r:n)} - \mu_X}{\sigma_X}\,\delta + \big(\mathbf{Z}_{[r:n]} - \boldsymbol\mu_Z\big)^{\mathrm T} \boldsymbol\eta + \epsilon_{[r:n]},$$
where $\delta = \beta \sigma_X$ and the $\epsilon_{[r:n]}$, $r = 1, 2, \ldots, [ns]$, have mean 0 and are independent of each other. Again, the least-squares estimators of $\delta$ and $\boldsymbol\eta$ are defined as the solution to
$$\begin{pmatrix} \sum_{r=1}^{[ns]} \alpha_{(r:n)}^2 & \sum_{r=1}^{[ns]} \alpha_{(r:n)} \big(\mathbf{Z}_{[r:n]} - \bar{\mathbf{Z}}\big)^{\mathrm T} \\ \sum_{r=1}^{[ns]} \big(\mathbf{Z}_{[r:n]} - \bar{\mathbf{Z}}\big)\,\alpha_{(r:n)} & \sum_{r=1}^{[ns]} \big(\mathbf{Z}_{[r:n]} - \bar{\mathbf{Z}}\big) \big(\mathbf{Z}_{[r:n]} - \bar{\mathbf{Z}}\big)^{\mathrm T} \end{pmatrix} \begin{pmatrix} \widehat\delta \\ \widehat{\boldsymbol\eta} \end{pmatrix} = \begin{pmatrix} \sum_{r=1}^{[ns]} \alpha_{(r:n)} \big(Y_{[r:n]} - \bar{Y}\big) \\ \sum_{r=1}^{[ns]} \big(\mathbf{Z}_{[r:n]} - \bar{\mathbf{Z}}\big) \big(Y_{[r:n]} - \bar{Y}\big) \end{pmatrix},$$
and we conjecture that they consistently estimate $\delta$ and $\boldsymbol\eta$.

Appendix
A.1 Proof of Theorem 2
Note that $\widehat\sigma_Y/\sigma_Y$ converges in probability to 1 as $n \to \infty$ and
$$\sqrt{n}\big(\widehat\rho(s) - \rho\big) = \sqrt{n}\left(\frac{\sigma_Y}{\widehat\sigma_Y} \cdot \frac{1}{\sigma_Y}\,\frac{\sum_{r=1}^{[ns]} \alpha_{(r:n)}\big(Y_{[r:n]} - \widehat\mu_Y\big)}{\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2} - \rho\right) \approx \sqrt{n}\left(\frac{1}{\sigma_Y}\,\frac{\sum_{r=1}^{[ns]} \alpha_{(r:n)}\big(Y_{[r:n]} - \widehat\mu_Y\big)}{\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2} - \rho\right)$$
$$= \sqrt{n}\left(\frac{1}{\sigma_Y}\,\frac{\sum_{r=1}^{[ns]} \alpha_{(r:n)}\big(Y_{[r:n]} - m(X_{(r:n)})\big)}{\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2} + \rho\,\frac{\sum_{r=1}^{[ns]} \alpha_{(r:n)}\Big(\frac{X_{(r:n)} - \mu_X}{\sigma_X} - \alpha_{(r:n)}\Big)}{\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2} + \frac{1}{\sigma_Y}\,\frac{\sum_{r=1}^{[ns]} \alpha_{(r:n)}\big(\mu_Y - \widehat\mu_Y\big)}{\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2}\right). \qquad (10)$$
Then, equation (10) can be written as
$$\frac{\sqrt{n}}{\sigma_Y} \cdot \frac{\sqrt{n\,\Psi_n^{\mathrm I}(1)}}{\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2}\,\mathrm{U}(s) + \rho\,\sqrt{n}\,\mathrm{V}(s) + \frac{\sqrt{n}}{\sigma_Y}\,\mathrm{R}(s),$$
where $\Psi_n^{\mathrm I}(1) = \sum_{r=1}^{n} \alpha_{(r:n)}^2 \sigma_{(r:n)}^2 / n$ and
$$\mathrm{U}(s) = \frac{1}{\sqrt{n\,\Psi_n^{\mathrm I}(1)}} \sum_{r=1}^{[ns]} \alpha_{(r:n)}\big(Y_{[r:n]} - m(X_{(r:n)})\big) = \frac{1}{\sqrt{n\,\Psi_n^{\mathrm I}(1)}} \sum_{r=1}^{[ns]} \left(\frac{\mathrm{E}(X_{(r:n)}) - \mu_X}{\sigma_X}\right)\big(Y_{[r:n]} - m(X_{(r:n)})\big), \qquad (11)$$
with $m(X_{(r:n)}) = \mathrm{E}\big(Y \mid X_{(r:n)}\big) = \mu_Y + \rho \sigma_Y \big(X_{(r:n)} - \mu_X\big)/\sigma_X$, and
$$\mathrm{V}(s) = \frac{\sum_{r=1}^{[ns]} \alpha_{(r:n)}\big((X_{(r:n)} - \mu_X)/\sigma_X - \alpha_{(r:n)}\big)}{\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2}, \qquad \mathrm{R}(s) = \frac{\sum_{r=1}^{[ns]} \alpha_{(r:n)}\big(\mu_Y - \widehat\mu_Y\big)}{\sum_{r=1}^{[ns]} \alpha_{(r:n)}^2}.$$
Since $\mathrm{R}(s)$ converges in probability to 0, we consider only $\mathrm{U}(s)$ and $\mathrm{V}(s)$. Thus, the proof of the theorem is based on the functional central limit theorem for the two partial sums of rank statistics, $\mathrm{U}(s)$ and $\mathrm{V}(s)$.

We first consider the asymptotic distribution of the weighted partial-sum process of the induced order statistics,
$$\mathrm{U}(s) = \frac{1}{\sqrt{n\,\Psi_n^{\mathrm I}(1)}} \sum_{r=1}^{[ns]} \alpha_{(r:n)}\big(Y_{[r:n]} - m(X_{(r:n)})\big) = \frac{1}{\sqrt{n\,\Psi_n^{\mathrm I}(1)}} \sum_{r=1}^{[ns]} \left(\frac{\mathrm{E}(X_{(r:n)}) - \mu_X}{\sigma_X}\right)\big(Y_{[r:n]} - m(X_{(r:n)})\big). \qquad (12)$$
The main finding of Bhattacharya (1974) is the conditional independence of $Y_{[1:n]}, \ldots, Y_{[n:n]}$ given $X_1, X_2, \ldots, X_n$ (or, equivalently, $X_{(1:n)}, X_{(2:n)}, \ldots, X_{(n:n)}$). Thus, given $\mathcal{A} = \sigma\big(X_1, X_2, \ldots, X_n, \ldots\big)$, (12) can be read as
$$S_{nk} = \frac{1}{\sqrt{n\,\Psi_n^{\mathrm I}(1)}} \sum_{r=1}^{k} \alpha_{(r:n)}\,\sigma_{(r:n)}\,u_r, \qquad k = 1, 2, \ldots, n, \qquad (13)$$
where the $u_r$ are independent with mean 0 and variance 1 (so that the $r$-th summand has conditional variance $\alpha_{(r:n)}^2 \sigma_{(r:n)}^2$). By applying the basic concept of the Skorokhod embedding (Shorack and Wellner, 2009), we obtain a sequence of stopping times $\tau_{n1}, \tau_{n2}, \ldots, \tau_{nn}$ such that
• the stopping times are conditionally independent given $\mathcal{A}$;
• $\mathrm{E}\big(\tau_{nk} \mid \mathcal{A}\big) = \sum_{r=1}^{k} \alpha_{(r:n)}^2 \sigma_{(r:n)}^2 \big/ \{n\,\Psi_n^{\mathrm I}(1)\}$;
• $\mathrm{var}\big(\tau_{nk} \mid \mathcal{A}\big) = \sum_{r=1}^{k} \alpha_{(r:n)}^4\, \mathrm{E}\big\{\big(Y_{[r:n]} - m(X_{(r:n)})\big)^4 \mid \mathcal{A}\big\} \big/ \{n\,\Psi_n^{\mathrm I}(1)\}^2 < \infty$; and
• $\big(S_{n1}, S_{n2}, \ldots, S_{nn}\big)$ has the same distribution as $\big(W(\tau_{n1}), W(\tau_{n1} + \tau_{n2}), \ldots, W(\tau_{n1} + \tau_{n2} + \cdots + \tau_{nn})\big)$, where $\{W(s), s \in [0, \infty)\}$ is standard Brownian motion.

We now consider the embedded partial-sum process $\{W_n(s) : 0 \le s \le 1\}$ defined by $W_n(s) = S_{n[ns]}$. As in Bhattacharya (1974), it suffices to show that
$$\sup_{0 \le s \le 1} \left| \sum_{r=1}^{[ns]} \tau_{nr} - \frac{\Psi_n^{\mathrm I}(s)}{\Psi_n^{\mathrm I}(1)} \right| \qquad (14)$$
converges to 0 in probability. For each $s \in [0, 1]$, $\sum_{r=1}^{[ns]} \tau_{nr}$ almost surely converges to $\Psi_\infty^{\mathrm I}(s) \big/ \Psi_\infty^{\mathrm I}(1)$. Both $\sum_{r=1}^{[ns]} \tau_{nr}$ and $\Psi_n^{\mathrm I}(s) \big/ \Psi_n^{\mathrm I}(1)$ are increasing functions of $s$. Thus, using the same arguments (Shorack and Wellner, 2009, pp.
62), wefind that their sup difference also converges to 0.Second, √ n V( s ) = 1Φ n ( s ) 1 √ n [ ns ] X r =1 α ( r : n ) (cid:18) X ( r : n ) − µ X σ X − α ( r : n ) (cid:19) (15)24s a linear statistic of order statistics and converges to the normal distribution with mean 0and variance Ψ II ∞ ( s ) (cid:14) Φ ∞ ( s ) (David, 2003, Theorem 11.4). Here, we remark that both Ψ II ∞ ( s )and Φ ∞ ( s ) can also be written as functionals of the distribution of X , as shown in (David,2003).Finally, summing the asymptotic results of U n ( s ) and V n ( s ), we find that √ n (cid:0)b ρ ( s ) − ρ (cid:1) converges to the normal distribution with mean 0 and varianceΨ I ∞ ( s ) /σ Y + ρ Ψ II ∞ ( s )Φ ∞ ( s )This concludes the proof. A.2 Proof of Theorem 1
We first decompose the sample correlation between $S_n(r)$ and $Y_{[r:n]}$ as $\mathrm{A} + \mathrm{B} + \mathrm{C}$, where
$$\mathrm{A} = \frac{1}{n} \sum_{r=1}^{n} S_n(r)\big\{Y_{[r:n]} - m(X_{(r:n)})\big\}, \qquad \mathrm{B} = \frac{1}{n} \sum_{r=1}^{n} S_n(r)\big\{m(X_{(r:n)}) - \mu_Y\big\}, \qquad \mathrm{C} = \frac{1}{n} \sum_{r=1}^{n} S_n(r)\big\{\mu_Y - \bar{Y}_n\big\},$$
with $m(X_{(r:n)}) = \mu_Y + \rho \sigma_Y \big(X_{(r:n)} - \mu_X\big)/\sigma_X$. Below, we compute the limit of each of A, B, and C.

First, similarly to the convergence of $\mathrm{U}(s)$ (with $s = 1$) in Appendix A.1, we can show that $\sqrt{n}\,\mathrm{A}$ converges in distribution to a normal random variable and, thus, A converges to 0 in probability. Second, similarly to the convergence of $\mathrm{R}(s)$ (with $s = 1$) in Appendix A.1, we can show that $\sqrt{n}\,\mathrm{C}$ converges in distribution to a normal random variable and, thus, C converges to 0 in probability. Lastly,
$$\mathrm{B} = \frac{1}{n} \sum_{r=1}^{n} S_n(r)\big\{m(X_{(r:n)}) - \mu_Y\big\} = \frac{\rho \sigma_Y}{n} \sum_{r=1}^{n} S_n(r)\,\frac{X_{(r:n)} - \mu_X}{\sigma_X} = \frac{\rho \sigma_Y}{n} \sum_{r=1}^{n} S_n(r)\left\{\frac{X_{(r:n)} - \mu_X}{\sigma_X} - \alpha_{(r:n)}\right\} + \frac{\rho \sigma_Y}{n} \sum_{r=1}^{n} S_n(r)\,\alpha_{(r:n)},$$
whose first term converges to 0 in probability, similarly to the convergence of $\mathrm{V}(s)$ (with $s = 1$) in Appendix A.1. Hence, B converges in probability to the limit of
$$\frac{\rho \sigma_Y}{n} \sum_{r=1}^{n} S_n(r)\,\alpha_{(r:n)}. \qquad (16)$$
Since $\sum_{r=1}^{n} S_n^2(r) = 1$ and $\sum_{r=1}^{n} \alpha_{(r:n)}^2$ approaches 1, (16) is asymptotically maximized when $S_n(r) = \alpha_{(r:n)}$.

References
Bhattacharya, P. K. (1974). Convergence of sample paths of normalized sums of induced order statistics. The Annals of Statistics, 1034–1039.

Breitling, R., Armengaud, P., Amtmann, A., and Herzyk, P. (2004). Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Letters, 83–92.

Camerer, C. (2003). The behavioral challenge to economics: Understanding normal people. Paper presented at the Federal Reserve Bank of Boston 48th Conference, "How Humans Behave: Implications for Economics and Policy."

Costantini, P., Linting, M., and Porzio, G. C. (2010). Mining performance data through nonlinear PCA with optimal scaling. Applied Stochastic Models in Business and Industry, 85–101.

David, H. A. and Galambos, J. (1974). The asymptotic theory of concomitants of order statistics. Journal of Applied Probability, 762–770.

David, H. A. (2003). Order Statistics. John Wiley and Sons, New Jersey.

de Leeuw, J. and Mair, P. (2009). Gifi methods for optimal scaling in R: The package homals. Journal of Statistical Software.

Eisinga, R., Breitling, R., and Heskes, T. (2013). The exact probability distribution of the rank product statistics for replicated experiments. FEBS Letters, 677–682.

Gertheiss, J. (2014). ANOVA for factors with ordered levels. Journal of Agricultural, Biological, and Environmental Statistics, 258–277.

Graubard, B. I. and Korn, E. L. (1987). Choice of column scores for testing independence in ordered 2×K contingency tables. Biometrics, 471–476.

Ivanova, A. and Berger, V. W. (2001). Drawbacks to integer scoring for ordered categorical data. Biometrics, 567–570.

Hájek, J. (1968). Asymptotic normality of simple linear rank statistics under alternatives. Annals of Mathematical Statistics, 325–346.

Hora, S. C. and Conover, W. J. (1984). The F-statistic in the two-way layout with rank-score transformed data. Journal of the American Statistical Association, 668–673.

Jacoby, W. G. (2016). opscale: A Function for Optimal Scaling. http://polisci.msu.edu/jacoby/icpsr/scaling/computing/alsos/acoby,%20opscale%20MS.pdf (August 31, 2016).

Kahneman, D. (1973). Attention and Effort. Prentice-Hall, New Jersey.

Kimeldorf, G., Sampson, A. R., and Whitaker, L. R. (1992). Min and max scorings for two-sample ordinal data. Journal of the American Statistical Association, 241–247.

Linting, M., Meulman, J. J., Groenen, P. J. F., and Van der Kooij, A. J. (2007). Nonlinear principal components analysis: Introduction and application. Psychological Methods, 336–358.

Mair, P. and de Leeuw, J. (2010). A general framework for multivariate analysis with optimal scaling: The R package aspect. Journal of Statistical Software.

Senn, S. (2007). Drawbacks to noninteger scoring for ordered categorical data. Biometrics, 296–298.

Shorack, G. R. and Wellner, J. A. (2009). Empirical Processes with Applications to Statistics. The Society for Industrial and Applied Mathematics, Philadelphia, PA.

Zheng, G. (2008). Analysis of ordered categorical data: two score-independent approaches. Biometrics, 64.