Predicting Recession Probabilities Using Term Spreads: New Evidence from a Machine Learning Approach
Jaehyuk Choi (a), Desheng Ge (a), Kyu Ho Kang (b,*), Sungbin Sohn (a)

(a) Peking University HSBC Business School, University Town, Nanshan, Shenzhen 518055, China
(b) Department of Economics, Korea University, Seoul 02841, South Korea
Abstract
The literature on using yield curves to forecast recessions typically measures the term spread as the difference between the 10-year and the three-month Treasury rates. Furthermore, using the term spread constrains the long- and short-term interest rates to have the same absolute effect on the recession probability. In this study, we adopt a machine learning method to investigate whether the predictive ability of interest rates can be improved. The machine learning algorithm identifies the best maturity pair, separating the effects of the interest rates from those of the term spread. Our comprehensive empirical exercise shows that, despite the likelihood gain, the machine learning approach does not significantly improve the predictive accuracy, owing to the estimation error. Our finding supports the conventional use of the 10-year–three-month Treasury yield spread, and it is robust to the forecasting horizon, control variables, sample period, and oversampling of the recession observations.
JEL classification : C52, E32, E43
Keywords: Yield curve, estimation risk, density forecasting

* Corresponding author. Tel.: +82 2-3290-5132.
Email addresses: [email protected] (Jaehyuk Choi), [email protected] (Desheng Ge), [email protected] (Kyu Ho Kang), [email protected] (Sungbin Sohn)
Preprint submitted to arXiv.org January 26, 2021

1. Introduction

The term spread is a well-established indicator of recessions. On average, the yield curve is gently upward sloping and concave. However, over the past 50 years, recessions have often been preceded by a flattening or even an inversion of the curve. Motivated by this stylized fact, numerous empirical studies have examined the ability of the slope of the yield curve, or the term spread, to predict future recessions.

The literature on using the yield curve to predict recessions typically measures the term spread as the difference between the 10-year bond yield and the three-month bond yield, and tends to focus on quantifying the predictive power of the term spread at a particular forecasting horizon. The forecasting horizon is usually chosen at four quarters, where the predictive ability of the term spread is maximized (Estrella and Trubin, 2006; Berge and Jordà, 2011). Few studies have formally discussed criteria for selecting a pair of short- and long-term interest rates from a number of bond yields with different maturities. Moreover, using the term spread as a predictor implies that the absolute values of the coefficients of the short-term and long-term interest rates are constrained to be the same.

Government bond yields are determined by the sum of the market expectation and the risk premium. While the risk premium is counter-cyclical, the policy rates are pro-cyclical, and the market expectation component is the expected path of future short-term rates. If the risk premium is very small, the term spread contains much predictive information about future business cycles and future policy rates. However, the portion of the market expectation component in a bond yield differs across maturities. In addition, liquidity risk premiums are strongly time varying, particularly in the short-term Treasury bill markets (Goyenko et al., 2011).
Therefore, the predictive ability of the term spread can be sensitive to the maturity combination. For the same reason, the absolute regression effects of the interest rates on the recession probability are not necessarily the same, and the short- and long-term interest rates may have effects separate from those of the spread.

The question we address here is whether relaxing the restrictions on the fixed maturity pair and the coefficients can improve the predictive ability of the bond yields. This question is particularly important from a statistical point of view. If the answer is yes, then the choice of maturity pair should be included in the prediction procedure, and the short- and long-term yields should be used as separate predictors. However, if the answer is no, then the estimation error is substantial, and the conventional use of the 10-year–three-month Treasury yield spread is justified.

To answer the question, we use a machine learning (ML) framework to search for the best maturity pair, estimating a logit model with L1 regularization for multiple forecasting horizons.

[Footnote: Stock and Watson (1989) and Estrella and Hardouvelis (1991) find evidence in support of the predictive power of the slope of the yield curve for continuous real activity variables. Since then, several works have reported the predictive ability of the term spread for recessions as well, including Estrella (2005), Rudebusch and Williams (2009), Ergungor (2016), Engstrom and Sharpe (2018), Johansson and Meldrum (2018), and Bauer and Mertens (2018a). In particular, Rudebusch and Williams (2009) show that a simple prediction model based on the term spread outperforms professional forecasters at predicting recessions three and four quarters ahead.]

[Footnote: This term spread is analyzed by Estrella (2005) and Estrella and Trubin (2006). See also Bauer and Mertens (2018b) for a comparison of its performance with that of other term spreads.]
Next, we validate the classification algorithm by comparing its out-of-sample predictions against those of the benchmark spread (i.e., the 10-year–three-month term spread).

Our two key findings are based on US data for the period June 1961 to July 2020. First, the optimal maturity pairs for most horizons differ from the conventional (10-year, three-month) pair. For instance, for the one-quarter-ahead recession prediction, the (10-year, six-month) pair is selected by the machine learning algorithm. The 20-year–one-year spread performs best for horizons of seven and eight quarters. In addition, the absolute effect of the short-term yield is found to be larger than that of the long-term yield. Second, and more importantly, we find that the prediction gain from the maturity pair optimization or the separation of the regression effects is not statistically significant. In particular, the benchmark spread provides better forecasts than those of the machine learning approach for short- and medium-term horizons. These findings are robust, even when controlling for the leading business cycle index. The poor performance of the proposed approach indicates that the efficiency loss from the estimation risk dominates the likelihood gain.

In summary, based on a comprehensive empirical analysis, we provide new evidence justifying the conventional use of the 10-year–three-month term spread. This is our main contribution to the literature on estimating the probability of a recession using yield curve information. The remainder of this paper is organized as follows. In Section 2, we discuss our machine learning algorithm and data. Section 3 presents our findings, and Section 4 concludes the paper.
2. Machine Learning Algorithm and Data

2.1. L1 Regularization
We let y_t denote the binary recession indicator at month t (one if a recession, zero if not). The k-month-ahead recession probability ŷ_{t+k} at month t is predicted as

ŷ_{t+k} = Prob(y_{t+k} = 1 | x_t) = φ(−β_0 − β^T x_t),

where φ(z) = (1 + e^{−z})^{−1}, x_t is the Treasury yield vector of p different maturities at month t, β is the corresponding coefficient vector, and β_0 is the intercept. Note that the sign of β_0 + β^T x_t is deliberately negated inside the logistic function φ(·), so that the recession probability increases when the linear combination of the yields becomes more negative. This ensures consistency with the stylized fact that a recession is typically preceded by a negative term spread, defined as the long-term yield minus the short-term yield.

Our model differs from a traditional logistic regression in that large coefficient values are penalized while the likelihood is maximized. Specifically, we find β_0 and β that minimize the cost function

J(β_0, β) = −log L(β_0, β) + λ ‖β‖_1  (λ ≥ 0),   (1)

where ‖β‖_1 = |β_1| + ··· + |β_p| is the L1 norm of the coefficient vector, and log L is the log likelihood over the training period T:

log L(β_0, β) = Σ_{t+k ∈ T} ( y_{t+k} ln(ŷ_{t+k}) + (1 − y_{t+k}) ln(1 − ŷ_{t+k}) ).   (2)

The regression for continuous variables with the same L1 regularization is well known as the least absolute shrinkage and selection operator (LASSO) (Hastie et al., 2009). The LASSO differs from ridge regression, which instead uses the L2 norm, ‖β‖_2^2 = β_1^2 + ··· + β_p^2. The use of L1 or L2 regularization tends to make the coefficients smaller in magnitude as the regularization strength λ increases; hence the name shrinkage regression.
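As a concrete illustration of the objective in equations (1) and (2), the cost function can be written in a few lines of NumPy (the function and variable names are ours, not the paper's):

```python
import numpy as np

def logistic(z):
    """phi(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(beta0, beta, X, y, lam):
    """L1-penalized negative log likelihood, as in equation (1).

    X is the (n, p) matrix of yields, y the 0/1 recession indicator,
    and lam the regularization strength. The sign of beta0 + X @ beta is
    negated inside the logistic, matching the sign convention in the text.
    """
    y_hat = logistic(-(beta0 + X @ beta))            # predicted recession prob.
    log_lik = np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return -log_lik + lam * np.sum(np.abs(beta))     # J(beta0, beta)
```

Minimizing this over (β_0, β) with λ = 0 recovers the usual logistic regression; a positive λ shrinks the coefficients toward zero.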
The shrinkage regression method has recently been adopted in economic forecasting. For example, Hall (2018) uses elastic-net regularization (Zou and Hastie, 2005), in which both L1 and L2 penalties are used; see Kim and Swanson (2018) for the effectiveness of the shrinkage method in predicting various macroeconomic variables.

Although the L1 penalty makes the optimization more difficult than the quadratic L2 penalty does, it is attractive because it forces certain coefficients to be set to zero; see the graphical interpretation in Hastie et al. (2009, Figure 3.11). Therefore, it effectively simplifies the model by using only a small subset of the input variables, thus performing a feature selection. In contrast, the L2 regularization does not set the coefficients to zero, although it does shrink their magnitudes.

The logit model with L1 regularization is a natural choice of algorithm for this study because we aim to simultaneously extract two maturities and find their coefficients from the term structure, without any prior knowledge. We do not consider more complicated machine learning algorithms that make full use of the rates from all maturities. Indeed, previous studies have adopted methods such as support vector machines (Gogas et al., 2015) and boosted regression trees (Döpke et al., 2017). However, although these methods may be of interest on their own, they are not appropriate for studying the predictive power of the term spread.

Note the following with regard to implementation. Because the coefficient magnitude enters the objective function, equation (1), the optimization result depends on the scale of the input variables. Therefore, each feature variable is normalized using a z-score before the optimization (Hastie et al., 2009). In the remainder of the paper, the coefficients (β_0, β) are reported in the original scale, unless stated otherwise.
We use the regularized logistic regression in the scikit-learn Python package (Pedregosa et al., 2011) to solve for β_0 and β, given λ. In searching for λ, we increase the regularization strength geometrically, raising an integer exponent from a large negative value. Then, we report the first value of λ at which the number of nonzero coefficients in β becomes the desired number (i.e., two).

Suppose that the pair of Treasury yields at month t extracted by the L1 regularization is denoted by (x_{i,t}, x_{j,t}); the pair depends on the forecasting horizon. The resulting predictive model is given by

M(generalized spread of the ML pair): ŷ_{t+k} = φ(−β_0 − β_i x_{i,t} − β_j x_{j,t}).

We refer to the linear combination of the yields and a constant, β_0 + β_i x_{i,t} + β_j x_{j,t}, as a generalized spread of the ML pair. To evaluate the prediction performance of the generalized spread, we compare the proposed model with its three nested logit models. The first alternative model uses the term spread measured by the simple difference of the ML pair (hereafter, the simple spread of the ML pair). That is, the model is given by

M(simple spread of the ML pair): ŷ_{t+k} = φ(−β_0 − β_i (x_{i,t} − x_{j,t})),

where the absolute regression effects of x_{i,t} and x_{j,t} are constrained to be the same (i.e., β_i = −β_j). In the second model, the conventional yield pair is used without the coefficient restriction, as follows:

M(generalized spread of the conventional pair): ŷ_{t+k} = φ(−β_0 − β_i l_t − β_j s_t),

where l_t and s_t are the 10-year and three-month yields, respectively, at month t. The third model is the benchmark and the most restricted version of the proposed model, in which the simple spread of the conventional pair is used as the predictor:

M(simple spread of the conventional pair): ŷ_{t+k} = φ(−β_0 − β_i (l_t − s_t)).
By comparing the proposed model with these nested frameworks, we separately identify the importance of the maturity pair selection and that of the coefficient restriction.

[Footnote: The regularized logistic regression class can be downloaded from scikit-learn.org.]
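The pair-selection step of Section 2.1 can be sketched with scikit-learn's LogisticRegression, whose C parameter is the inverse of λ. The starting value, step size, and function name below are illustrative choices, not the paper's exact settings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def select_pair(X, y, n_keep=2):
    """Raise the L1 strength geometrically until at most `n_keep`
    coefficients remain nonzero; return their column indices and the fit.
    Features are z-scored first, as described in the text."""
    Xz = StandardScaler().fit_transform(X)
    lam = 1e-4                                   # start from a weak penalty
    while lam < 1e4:
        clf = LogisticRegression(penalty="l1", C=1.0 / lam,
                                 solver="liblinear").fit(Xz, y)
        nonzero = np.flatnonzero(clf.coef_[0])
        if len(nonzero) <= n_keep:
            return nonzero, clf
        lam *= 2.0                               # geometric grid
    return nonzero, clf
```

On data where only two columns carry signal, the last survivors along the L1 path should be those columns, which is the feature-selection property the paper exploits.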
2.2. Data

We describe the data used in the analysis. For the binary state of a recession, y_t, we use the monthly periods of recessions defined by the National Bureau of Economic Research (NBER). For the yield curve x_t, we use the monthly averaged constant-maturity time series of the Treasury yields from the H.15 page of the Federal Reserve website. Our sample period is June 1961 to July 2020. For this period, we use three- and six-month and one-, two-, three-, five-, seven-, 10-, and 20-year yields. Because not all series are available in the early part of the sample period, alternative sources are used to fill in the missing data. The three- and six-month Treasury bill rates until August 1981 are taken as the secondary market rates from the same website. Because they are recorded on a discount basis, we convert them to a bond-equivalent basis, as in Estrella (2005). The two- and seven-year yields until May 1976 and June 1969, respectively, are obtained from the zero-coupon Treasury yield curve of Gürkaynak et al. (2006). We split the overall period into training (in-sample) and test (out-of-sample) periods at the end of 1995.
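The alignment between the yield vector at month t and the recession state k months ahead, together with the train–test split, can be sketched with pandas (the DataFrame layout and column names here are hypothetical):

```python
import pandas as pd

def make_dataset(df, horizon=12, split="1995-12-31"):
    """Pair the yields x_t with the recession indicator y_{t+k}
    and split at `split`. `df` is a monthly DataFrame whose columns
    are yields plus a 0/1 column "rec" (assumed names)."""
    X = df.drop(columns="rec")
    y = df["rec"].shift(-horizon).rename("target")  # y_{t+k} aligned with x_t
    data = X.join(y).dropna()                       # drop the unlabeled tail
    train = data.loc[:split]                        # inclusive of split month
    test = data.loc[split:].iloc[1:]                # strictly after the split
    return train, test
```

The negative shift moves the recession indicator back k months, so each row pairs this month's yields with the state k months ahead.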
3. Results
This section reports the results of the best-pair selection for a wide range of forecasting horizons, including the associated coefficients, and discusses the implications of these results. As a baseline examination, we use the period June 1961 to December 1995 for training (pair selection and coefficient estimation), and the period since 1996 for testing (out-of-sample evaluation). Then, we check the robustness of the findings by trying alternative training/test periods and by addressing the problems of imbalanced classification and missing variables.
Figure 1 visually illustrates the best pair-selection process described in Section 2.1. As we increase the magnitude of the penalty parameter λ in equation (1), the coefficients of the features with little contribution to the likelihood maximization are set to zero. We continue increasing the penalty strength until only two yields survive. In the case of the 12-month-ahead recession prediction, the three-month and seven-year yields survive as the pair that best explains the recessions from June 1961 to December 1995.

Panel A in Table 1 summarizes the results of the machine learning analyses for the various forecasting horizons. There are two main findings. First, we find that for every forecasting horizon, one long-term and one short-term yield are selected as the best pair for recession prediction, and their coefficients have positive and negative signs, respectively. This confirms the use of the term spread in the past literature. However, unlike in the conventional approach, the best prediction performance is achieved with a pair other than the 10-year and three-month yields. Specifically, when the forecasting horizon is relatively short (three or six months), the Treasury yields with 10-year and six-month maturities are the best predictors of a recession. For long horizons (18, 21, and 24 months), the longest available term (20-year) yield and a short-term yield (three-month, six-month, or one-year, depending on the forecasting horizon) work as the best pair. These findings are roughly consistent with the notion that longer-term yields reflect information about a relatively more distant future.

[Figure 1: The path of the coefficients as a function of the L1 regularization strength λ, for the 12-month forecasting horizon. The black dotted line marks the selected value of λ. Axes: penalty parameter (log2 scale) versus standardized coefficients.]

Second, the optimal coefficients of the generalized term spread are close to, but different from, (1, −1).
The absolute coefficient ratio is largest for the three-month-ahead forecast, but decreases to 1.382, 1.184, and 1.069 for the six-, 12-, and 24-month-ahead forecasts, respectively. This pattern implies that the machine learning approach may improve on the simple term spread, particularly for shorter-horizon recession predictions.

Figure 2 depicts the generalized term spread (upper panel) and the implied 12-month-ahead recession probability (lower panel). The shaded vertical bars indicate the recession periods, and the dotted black line divides the training (in-sample) and test (out-of-sample) periods. We observe clear spikes in the probability during (or ahead of) recessions. Interestingly, the most recent recession (starting March 2020) is quite precisely detected by the generalized term spread, despite the fact that we use the sample only until 1995 for the parameter estimation, and that the recession was officially announced only in June 2020. This figure provides a visual validation of the generalized term spread for recession prediction.

Table 1: Pair and coefficient selection from the training period, 1961–1995
Panel A presents the results for the best pair and coefficients selected by machine learning for several forecasting horizons. Panel B presents the recession prediction performance for the simple term spread of the pair selected in Panel A. Panel C (D) presents the performance for the generalized (simple) term spread of the conventional 10-year and three-month pair. λ is the strength of the L1 penalty, AUC_train (AUC_test) is the area under the ROC curve in the training (test) period, log L (log PPL) is the average log likelihood (posterior predictive likelihood) in the training (test) period, and the EBF is the ratio of the PPL of an alternative model to that of the benchmark model. The longest maturity in the sample is 20 years.

Horizon | Pair | β | λ | AUC_train | AUC_test | log L | log PPL | EBF
Panel A. Generalized spread of the ML pair
Panel B. Simple spread of the ML pair
Panel C. Generalized spread of the conventional pair
Panel D. Simple spread of the conventional pair

[Figure 2: The generalized term spread and the implied recession probability 12 months ahead. The shaded vertical bars indicate the recession periods defined by the NBER. The dotted black line divides the training (in-sample) and test (out-of-sample) periods; the training period is June 1961 to December 1995, and the test period runs until July 2020. Note that the train–test split in the upper panel is 12 months earlier (i.e., December 1994), owing to the forecasting horizon. Upper panel: spread (%); lower panel: recession probability.]

The key question we address in this study is whether the generalized term spread based on the machine learning approach outperforms the simple term spread of the conventional pair. To this end, we compare the out-of-sample recession prediction performance of the four competing models; the results can be found in Table 1.

The accuracy of the predictive recession probability is measured by the log posterior predictive likelihood (PPL). The empirical Bayes factor (EBF) is the ratio of the PPL of an alternative model to that of the benchmark model. An EBF larger than one indicates stronger support by the data for the alternative model than for the benchmark model. Table 1 shows that, for all forecasting horizons, the proposed model, M(generalized spread of the ML pair), is not preferred to its nested competing models in terms of the log PPL. For the horizons of three, six, and nine months, the benchmark, M(simple spread of the conventional pair), provides the best forecasts. Although M(simple spread of the ML pair) outperforms the benchmark for the other horizons, the difference is not statistically significant, because the largest EBF is at most 1.026. All EBFs are much less than √
10, regardless of the horizon. Based on Jeffreys' criterion, the evidence that any alternative model is more supported by the data than the benchmark model is very weak. Therefore, the prediction gain from choosing the maturity pair or relaxing the coefficient restriction is not substantial.

The poor out-of-sample prediction performance of the proposed model seems to arise from the inefficiency due to the coefficient estimation, as pointed out in the equity return prediction literature (Welch and Goyal, 2008; DeMiguel et al., 2009). The pair selection itself, which is relatively less subject to the estimation error, could still be conducive to improving the recession prediction. We can easily test this conjecture by comparing the models M(simple spread of the ML pair) and M(generalized spread of the conventional pair). Given that the EBF measures the prediction performance of an alternative model relative to the benchmark, the reciprocal of the EBFs of M(simple spread of the ML pair) represents the inefficiency from the pair selection. Similarly, the reciprocal of the EBFs of M(generalized spread of the conventional pair) quantifies the coefficient estimation risk. The EBFs from the simple spread of the ML pair are larger than or equal to those from the generalized spread of the conventional pair, regardless of the horizon. As a result, the inefficiency is attributed more to the coefficient estimation than to the pair selection.

We evaluate the area under the ROC curve (hereafter, ROC-AUC) as a performance measure supplementary to the log PPL. The ROC curve presents the collection of (false positive rate, true positive rate) coordinates for decision thresholds varying between zero and one. The true positive rate is the ratio of correct predictions among real recessions (i.e., related to type-II error). The false positive rate is the ratio of incorrect predictions among real non-recessions (i.e., related to type-I error).
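For illustration, both evaluation measures are straightforward to compute; the sketch below uses scikit-learn's roc_auc_score on toy probabilities (the numbers are made up, not from the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical test-period outcomes and predicted recession probabilities.
y_true = np.array([0, 0, 0, 1, 1, 0, 0, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.8, 0.6, 0.4, 0.2, 0.9])

# ROC-AUC: 1.0 for a perfect ranking, 0.5 for a random guess.
auc = roc_auc_score(y_true, y_prob)

# Average log predictive likelihood over the test observations (the log
# PPL of the text, up to the paper's exact normalization).
log_ppl = np.mean(y_true * np.log(y_prob)
                  + (1 - y_true) * np.log(1 - y_prob))
```

In this toy sample every recession month receives a higher probability than every non-recession month, so the ranking is perfect.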
Similar to the log PPL, the ROC-AUC captures the predictive power of the model without any specific decision threshold; the value is one for a perfect model, and 0.5 for a random guess. The ROC-AUC is an established performance measure in machine learning (Bradley, 1997), and has recently been used in the context of recession prediction by Bauer and Mertens (2018b) and Tsang and Wu (2019).

Figure 3 depicts the ROC curves for six forecasting horizons, and Table 1 reports the ROC-AUCs. These results provide evidence in favor of M(simple spread of the ML pair) in terms of the ROC-AUC. The simple spread of the ML pair exhibits the largest test-period ROC-AUC for most forecasting horizons. Nevertheless, the difference between the ROC-AUCs of M(simple spread of the ML pair) and the benchmark is not substantial for any of the horizons.

Although not intended, our findings validate the widely used, but seemingly ad hoc, conventional term spread. Recall that we attempt to find the best yield pair for recession probability prediction without imposing any restrictions. The machine learning approach does not base its results on economic theory or academic norms, but only on the best (in-sample) prediction performance. Nevertheless, the findings support the use of the conventional 10-year and three-month term spread surprisingly well, in that the pairs from the machine learning similarly consist of one long- and one short-term yield, and the coefficients have opposite signs with similar absolute magnitudes. Although the machine learning approach chooses a pair slightly different from the (10-year, three-month) combination and the coefficient ratio modestly deviates from one, the resulting out-of-sample prediction performance is not distinguishable from that of the conventional term spread.

Figure 3: The receiver operating characteristic (ROC) curves for several forecasting horizons.
The lines A, B, C, and D indicate the results for the generalized spread of the ML pair, the simple spread of the ML pair, the generalized spread of the conventional pair, and the simple spread of the conventional pair, respectively. The area under the ROC curve is given in parentheses. The ROC curves are evaluated from the test period, where the training period is 1961–1995.
[Figure 3 legends (test-period AUC in parentheses):
Horizon 6m — A: i × 10y − j × 6m (0.644); B: i × (10y − 6m) (0.657); C: i × 10y − j × 3m (0.630); D: i × (10y − 3m) (0.654)
Horizon 9m — A: i × 10y − j × 3m (0.766); B: i × (10y − 3m) (0.780); C: i × 10y − j × 3m (0.767); D: i × (10y − 3m) (0.780)
Horizon 12m — A: i × 7y − j × 3m (0.850); B: i × (7y − 3m) (0.861); C: i × 10y − j × 3m (0.836); D: i × (10y − 3m) (0.846)
Horizon 15m — A: i × 7y − j × 3m (0.890); B: i × (7y − 3m) (0.898); C: i × 10y − j × 3m (0.881); D: i × (10y − 3m) (0.890)
Horizon 18m — A: i × 20y − j × 3m (0.899); B: i × (20y − 3m) (0.916); C: i × 10y − j × 3m (0.893); D: i × (10y − 3m) (0.901)
Horizon 21m — A: i × 20y − j × 6m (0.898); B: i × (20y − 6m) (0.907); C: i × 10y − j × 3m (0.880); D: i × (10y − 3m) (0.890)]

3.3. Robustness Checks

In this subsection, we conduct various robustness checks to ensure that our findings do not result from a specific choice of training/test samples or from the oversampling and missing variable problems.
First, we try alternative training and test periods. Tables 2 and 3 show the results when the training period extends to 2005 and 2015, respectively. Accordingly, there are fewer recession events in the alternative test periods. Although the specific choice of the best pair varies slightly, the three key findings remain unchanged: (i) the pair chosen using machine learning consists of one short-term and one long-term yield, with coefficients of opposite signs; (ii) M(simple spread of the ML pair) seems to be the best in terms of the log PPL, particularly for longer forecasting horizons; and (iii) the maturity pair selection or the coefficient estimation separating the effects of the short- and long-term yields does not improve the predictive accuracy significantly. In particular, the largest EBFs in Tables 2 and 3 are 1.097 and 1.114, respectively. These EBFs imply that the model weights on the best alternative model are less than 0.53 in an empirical Bayesian model averaging framework. Consequently, the contribution of the machine learning approach to the improvement of the log PPL is at most marginal, if any.

Note that if the training period extends to 2005 or 2015, there are only one or two recession events in the out-of-sample data set. Thus, the prediction that recessions will be absent would be correct almost all the time. As a result, the AUC becomes close to one, particularly for long forecasting horizons, and may not work effectively as a valid measure of prediction performance.

The forecasting of recessions poses a typical imbalanced classification problem, in the sense that the periods of recession (y_t = 1) form only a small fraction of the whole sample period. In such problems, the trained models are heavily skewed to the majority class (i.e., non-recession), and the prediction of the minority class (i.e., recession) is very poor.
A machine learning practice that avoids this issue applies a weight that is inversely proportional to the frequency of each class, which equalizes the importance of the two classes. If the ratio of recessions in the training period is r, the log likelihood is modified to

log L(β_0, β) = Σ_{t+k ∈ T} w_{t+k} ( y_{t+k} ln(ŷ_{t+k}) + (1 − y_{t+k}) ln(1 − ŷ_{t+k}) ),   (3)

where

w_t = 1/(2r) if y_t = 1, and w_t = 1/(2(1 − r)) if y_t = 0.

This is equivalent to oversampling the recession observations (1 − r)/r times. Note that the original log likelihood, equation (2), is recovered when the recession and non-recession periods are equally balanced

Table 2: Pair and coefficient selection from the training period, 1961–2005
Panel A presents the results for the best pair and coefficients selected by machine learning for several forecasting horizons. Panel B presents the recession prediction performance for the simple term spread of the pair selected in Panel A. Panel C (D) presents the performance for the generalized (simple) term spread of the conventional 10-year and three-month pair. λ is the strength of the L1 penalty, AUC_train (AUC_test) is the area under the ROC curve in the training (test) period, log L (log PPL) is the average log likelihood (posterior predictive likelihood) in the training (test) period, and the EBF is the ratio of the PPL of an alternative model to that of the benchmark model. The longest maturity in the sample is 20 years.

Horizon | Pair | β | λ | AUC_train | AUC_test | log L | log PPL | EBF
Panel A. Generalized spread of the ML pair
Panel B. Simple spread of the ML pair
Panel C. Generalized spread of the conventional pair
Panel D. Simple spread of the conventional pair

Table 3: Pair and coefficient selection from the training period, 1961–2015
Panel A presents the results for the best pair and coefficients selected by machine learning for several forecasting horizons. Panel B presents the recession prediction performance for the simple term spread of the pair selected in Panel A. Panel C (D) presents the performance for the generalized (simple) term spread of the conventional 10-year and three-month pair. λ is the strength of the L1 penalty, AUC_train (AUC_test) is the area under the ROC curve in the training (test) period, log L (log PPL) is the average log likelihood (posterior predictive likelihood) in the training (test) period, and the EBF is the ratio of the PPL of an alternative model to that of the benchmark model. The longest maturity in the sample is 20 years.

Horizon | Pair | β | λ | AUC_train | AUC_test | log L | log PPL | EBF
Panel A. Generalized spread of the ML pair
Panel B. Simple spread of the ML pair
Panel C. Generalized spread of the conventional pair
Panel D. Simple spread of the conventional pair

(r = 1/2). Here, the ratio r is understood as a model parameter inferred from the training period. As such, the same r from the training period should be used for the log likelihood over the test period (i.e., the log PPL).

Table 4 reports the results for the case in which the training period runs until 1995. Because the ratio of recessions during the training period is approximately r = 0.
14, the weighted regression is equivalent to oversampling the recessions six times. Under this test, our main findings remain unchanged. In particular, the superior performance of the simple term spread (Panels B and D) over the generalized term spread (Panels A and C) is more pronounced than in Table 1.
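The weighting scheme in equation (3) corresponds to scikit-learn's "balanced" class-weight option, which the paper reports using; a quick check with hypothetical class counts confirms the (1 − r)/r oversampling factor:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy sample with a 14% recession share, mimicking r = 0.14.
y = np.array([1] * 14 + [0] * 86)
r = y.mean()

w0, w1 = compute_class_weight(class_weight="balanced",
                              classes=np.array([0, 1]), y=y)

# "balanced" weights are inversely proportional to class frequency,
# so the recession class is effectively oversampled (1 - r) / r times.
ratio = w1 / w0
```

With r = 0.14 the ratio is about 6.1, matching the "six times" oversampling noted above.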
Finally, we repeat the analyses with additional recession predictors in order to account for the missing variable problem. Specifically, we include the leading business cycle indicator and ensure that the variable is always selected in the machine learning approach by excluding its coefficient from the L1 penalty term. We also consider the 30-year Treasury yield, which makes our training period start from 1982. This examination shows whether the yield pair from machine learning has any additional predictive ability beyond what the leading indicator already explains. It is also worth checking whether a pair with one short-term and one long-term yield is still chosen by the machine learning method, even when the leading indicator is already controlled for.

Table 5 shows that even when the leading indicator is included as a default variable, the pair choice from machine learning is rarely affected: for most forecasting horizons, one long-term and one short-term yield are still chosen, and the out-of-sample prediction performance is not improved significantly over that of the conventional 10-year–three-month pair. The weight for model averaging is again near 0.5, indicating that the performance of the pair from machine learning is almost equal to that of the conventional pair.
4. Conclusion
The 10-year–three-month term spread is widely accepted in estimating predictive recession probabilities. The contribution of our study is to provide a justification for using this conventional term spread to predict such probabilities. To this end, we formally and comprehensively test whether the prediction ability can be improved. We identify the optimal maturity pair, allowing for separate regression effects of short- and long-term interest rates. According to our empirical exercise, relaxing the restrictions on the maturity pair and coefficients does not improve the predictive ability of the yield curve, owing to the estimation error.

Footnotes: For the implementation, we use the "balanced" option for the class weight parameter. The US leading business cycle indicator is available from the Saint Louis Fed website. Because the 10-year–three-month spread is one component of the US leading indicator, we are ex ante agnostic about whether the pairs from machine learning are again composed of one short- and one long-term yield.

Table 4: Pair and coefficient selection with recession oversampling
The results in this table are obtained using the weighted regression in equation (3), which has the effect of oversampling each recession observation six (= (1 − r)/r) times, given the recession ratio r = 0.14 in the training period, 1961–1995. Panel A presents the results of the best pair and coefficients selected by machine learning for several forecasting horizons. Panel B presents the recession prediction performance for the simple term spread of the pair selected in Panel A. Panel C (D) presents the performance for the generalized (simple) term spread of the conventional 10-year and three-month pair. λ is the strength of the L1 penalty, AUC_train (AUC_test) is the area under the ROC curve in the training (test) period, log L (log PPL) is the average log likelihood (predictive likelihood) in the training (test) period, and the EBF is the ratio of the PPL of an alternative model to that of the benchmark model. The longest maturity in the sample is 20 years.

Horizon | Pair | β | λ | AUC_train | AUC_test | log L | log PPL | EBF
Panel A. Generalized spread of the ML pair
Panel B. Simple spread of the ML pair
Panel C. Generalized spread of the conventional pair
Panel D. Simple spread of the conventional pair

Table 5: Pair and coefficient selection with leading indicator included
The results in this table are obtained by including the US leading indicator as a default variable, in addition to the ML or conventional pair. Panel A presents the results of the best pair and coefficients selected by machine learning for several forecasting horizons. Panel B presents the recession prediction performance for the simple term spread of the pair selected in Panel A. Panel C (D) presents the performance for the generalized (simple) term spread of the conventional 10-year and three-month pair. β_LI is the coefficient of the leading indicator, λ is the strength of the L1 penalty, AUC_train (AUC_test) is the area under the ROC curve in the training (test) period, log L (log PPL) is the average log likelihood (predictive likelihood) in the training (test) period, and the EBF is the ratio of the PPL of an alternative model to that of the benchmark model. The training period is from June 1982 to December 1995. The longest maturity in the sample is 30 years.

Horizon | Pair | β | β_LI | λ | AUC_train | AUC_test | log L | log PPL | EBF
Panel A. Generalized spread of the ML pair
Panel B. Simple spread of the ML pair
Panel C. Generalized spread of the conventional pair
Panel D. Simple spread of the conventional pair
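The evaluation metrics reported in the tables can be computed from out-of-sample recession probabilities roughly as follows. This is a sketch with synthetic inputs; the EBF here follows the caption's definition (ratio of the PPL of an alternative model to that of the benchmark), and the assumption that the ratio is taken over the total, not per-observation, predictive likelihood is ours.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# Synthetic out-of-sample recession indicators and predicted probabilities
# from two competing models (the "alternative" tracks y more closely).
y = (rng.random(300) < 0.15).astype(int)
p_alt = np.clip(0.15 + 0.5 * (y - 0.15) + 0.1 * rng.normal(size=300), 0.01, 0.99)
p_bench = np.clip(0.15 + 0.3 * (y - 0.15) + 0.1 * rng.normal(size=300), 0.01, 0.99)

def avg_log_pred_lik(y, p):
    """Average log predictive likelihood (log PPL in the tables)."""
    return np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

auc = roc_auc_score(y, p_alt)          # AUC_test: area under the ROC curve
log_ppl_alt = avg_log_pred_lik(y, p_alt)
log_ppl_bench = avg_log_pred_lik(y, p_bench)

# EBF as a ratio of predictive likelihoods, computed stably as the
# exponential of the difference of total log predictive likelihoods.
n = len(y)
ebf = np.exp(n * (log_ppl_alt - log_ppl_bench))
print(auc, log_ppl_alt, ebf)
```

An EBF above one favors the alternative model over the benchmark, which is how the equal-weight (near 0.5) model-averaging results in the tables should be read: neither specification decisively dominates.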
References
Bauer, M.D., Mertens, T.M., 2018a. Economic Forecasts with the Yield Curve. FRBSF Economic Letter 2018-07. Federal Reserve Bank of San Francisco.

Bauer, M.D., Mertens, T.M., 2018b. Information in the Yield Curve about Future Recessions. FRBSF Economic Letter 2018-20. Federal Reserve Bank of San Francisco.

Berge, T.J., Jordà, Ò., 2011. Evaluating the Classification of Economic Activity into Recessions and Expansions. American Economic Journal: Macroeconomics 3, 246–277.

Bradley, A.P., 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30, 1145–1159.

DeMiguel, V., Garlappi, L., Uppal, R., 2009. Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy? The Review of Financial Studies 22, 1915–1953.

Döpke, J., Fritsche, U., Pierdzioch, C., 2017. Predicting recessions with boosted regression trees. International Journal of Forecasting 33, 745–759.

Engstrom, E.C., Sharpe, S.A., 2018. The Near-Term Forward Yield Spread as a Leading Indicator: A Less Distorted Mirror. Finance and Economics Discussion Series 2018-055. Washington: Board of Governors of the Federal Reserve System.

Ergungor, O.E., 2016. Recession Probabilities. Economic Commentary 2016-09. Federal Reserve Bank of Cleveland.

Estrella, A., 2005. The Yield Curve as a Leading Indicator. Technical Report. Federal Reserve Bank of New York.

Estrella, A., Hardouvelis, G.A., 1991. The Term Structure as a Predictor of Real Economic Activity. The Journal of Finance 46, 555–576.

Estrella, A., Trubin, M., 2006. The Yield Curve as a Leading Indicator: Some Practical Issues. Current Issues in Economics and Finance 12(5), July/August 2006. Federal Reserve Bank of New York.

Gogas, P., Papadimitriou, T., Matthaiou, M., Chrysanthidou, E., 2015. Yield Curve and Recession Forecasting in a Machine Learning Framework. Computational Economics 45, 635–645.

Goyenko, R., Subrahmanyam, A., Ukhov, A., 2011. The Term Structure of Bond Market Liquidity and Its Implications for Expected Bond Returns. Journal of Financial and Quantitative Analysis 46, 111–139.

Gürkaynak, R.S., Sack, B., Wright, J.H., 2006. The U.S. Treasury Yield Curve: 1961 to the Present. Finance and Economics Discussion Series 2006-28. Washington: Board of Governors of the Federal Reserve System.

Hall, A.S., 2018. Machine Learning Approaches to Macroeconomic Forecasting. Economic Review, Fourth Quarter 2018. Federal Reserve Bank of Kansas City.

Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition. Springer, New York, NY. URL: https://web.stanford.edu/~hastie/ElemStatLearn/.

Johansson, P., Meldrum, A., 2018. Predicting Recession Probabilities Using the Slope of the Yield Curve. FEDS Notes. Washington: Board of Governors of the Federal Reserve System.

Kim, H.H., Swanson, N.R., 2018. Mining big data using parsimonious factor, machine learning, variable selection and shrinkage methods. International Journal of Forecasting 34, 339–354.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É., 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825–2830. URL: http://jmlr.org/papers/v12/pedregosa11a.html.

Rudebusch, G.D., Williams, J.C., 2009. Forecasting Recessions: The Puzzle of the Enduring Power of the Yield Curve. Journal of Business & Economic Statistics 27, 492–503.

Stock, J.H., Watson, M.W., 1989. New Indexes of Coincident and Leading Economic Indicators. NBER Macroeconomics Annual 4, 351–394.

Tsang, E., Wu, M., 2019. Revisiting US Recession Probability Models. Research Memorandum 03/2019. Hong Kong Monetary Authority.

Welch, I., Goyal, A., 2008. A Comprehensive Look at the Empirical Performance of Equity Premium Prediction. The Review of Financial Studies 21, 1455–1508.

Zou, H., Hastie, T., 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67, 301–320. doi:10.1111/j.1467-9868.2005.00503.x.