Identifying Different Definitions of Future in the Assessment of Future Economic Conditions: Application of PU Learning and Text Mining

Abstract

The Economy Watcher Survey, which is a market survey published by the Japanese government, contains assessments of current and future economic conditions by people from various fields. Although this survey provides insights regarding economic policy for policymakers, it does not clearly define the word "future" in future economic conditions. Hence, the assessments that respondents provide in the survey are based on their own interpretations of the meaning of "future." This motivated us to identify the different interpretations of the future in their judgments of future economic conditions by applying weakly supervised learning and text mining. In our research, we separate the assessments of future economic conditions into economic conditions of the near and distant future using learning from positive and unlabeled data (PU learning). Because the dataset includes data from several periods, we devised a new architecture, based on the idea of multi-task learning, that enables neural networks to conduct PU learning and to learn a classifier efficiently. Our empirical analysis confirmed that the proposed method could separate the future economic conditions, and we interpreted the classification results to obtain insights for policymaking.


arXiv [econ.EM]

A PREPRINT

Masahiro Kato

The University of Tokyo

April 23, 2020

The Economy Watcher Survey is a market survey published by the Japanese government. The data consist of two types of assessments of economic conditions, current and future economic conditions, each on a scale of five ranks. Although this survey provides policymakers with deep insights, the assessments of future economic conditions are difficult to interpret because the meaning of "future" is not clearly defined and is left to each respondent's interpretation. Therefore, to obtain a clearer understanding of the survey participants' expectations, we classify assessments of future economic conditions into those pertaining to the near future and those pertaining to the distant future. To this end, we propose a novel method that applies a machine-learning algorithm to the text data of the Economy Watcher Survey to capture these expectations with respect to future economic conditions. For the classification task, we apply learning from positive and unlabeled data (PU learning), a machine-learning approach that trains a classifier only from positive and unlabeled data.

Among studies of economic trends, methods that use information contained in text data have become popular. Pioneering studies in this field are Tetlock [2007] and Tetlock et al. [2008], which constructed sentiment indexes from the articles of a Wall Street Journal column and analyzed their predictive power for the stock market. Kulkarni et al. [2009] forecast housing prices using the number of Google searches, and Guzman [2011] constructed real-time inflation expectations from Google search queries.

PU learning is a form of weakly supervised learning [Elkan and Noto, 2008; Ward et al., 2009; Blanchard et al., 2010; Nguyen et al., 2011]. In our problem setting, we consider a situation in which only positive and unlabeled data exist, and we use only these data to train a binary classifier. PU learning has two scenarios, known as the censoring scenario and the case-control scenario [Elkan and Noto, 2008]. In this study, we focus only on the case-control scenario, in which positive data are obtained separately from unlabeled data, and unlabeled data are sampled from the entire population. We construct our algorithm on the basis of subsequent research known as unbiased PU learning [du Plessis et al., 2015], which minimizes an unbiased estimator of the classification risk.

After classifying the assessments of future economic conditions into those relating to the near and distant future, we calculated the averaged ranks for both. We found significant differences between the assessments relating to these two future periods. This result suggests that people's definitions of the future differ. This is important from the viewpoint of economics: in macroeconomics, a researcher may be interested in the possibility of controlling people's expectations of the market. Our empirical analysis suggests that assessments of the economic conditions of the distant future were mainly based on economic fundamentals, such as the population and diplomatic relationships.

In the following sections, we describe our problem setting and propose an algorithm that solves the problem. Subsequently, we present the results and interpretations of our empirical analysis.

We consider the binary classification of text data. In the following parts, we describe the dataset and the classification problem in detail.

In our analysis, we used the Economy Watchers Survey, a dataset that contains text data and is published by the Japanese government. The purpose of this survey is to grasp region-by-region economic trends accurately. The survey consists of two assessments, one of current and one of future economic conditions, and respondents can enter sentences giving the reasons for their answers. Respondents evaluate the current and future economic conditions on five ranks: 0, 1, 2, 3, and 4. The evaluation 0 means "worse" or "will get worse" compared with the previous period, the evaluation 4 means "better" or "will get better" compared with the previous period, and the evaluation 2 represents a neutral assessment of economic conditions.

Interpretation of Assessment of Future Economic Conditions:

Assessments of current and future economic conditions provide us with deep insights into economic reality. However, the questionnaire gives no clear definition of the "future" in future economic conditions. Hence, different people interpret the duration of the "future" in their own way: one person may imagine the future as just one week ahead, whereas for another person it may be a few months ahead. Therefore, to analyze the assessments more accurately, we need to classify assessments of future economic conditions as concerning either near or distant economic conditions.

To classify future economic conditions into those expected in the near future and those expected in the distant future, we assume that assessments of current economic conditions contain sentences similar to those of assessments concerning the near future. Our classification strategy is to regard assessments of current economic conditions as positive data and assessments of future economic conditions as unlabeled data, which potentially consist of both positive and negative data. In this paper, positive data are assessments of current economic conditions and of those expected in the near future, whereas negative data are assessments of economic conditions foreseen to prevail in the distant future. Figure 1 illustrates the assumed relationship between assessments of current and future economic conditions. We train our classifier only from positive and unlabeled data using PU learning. Therefore, the goal of this problem is to classify $x \in \mathcal{X} \subset \mathbb{R}^d$ into one of the two classes $\{-1, +1\}$, where $+1$ denotes assessments of current economic conditions and those expected in the near future (positive data) and $-1$ denotes assessments of economic conditions relating to the distant future (negative data).

Figure 1: Our assumed definition of the time structure of assessments.

Let us describe the data-generating process of our problem. Assume that we have $n$ data points at the $t$-th period, and denote the $i$-th text data by $x_i \in \mathcal{X} \subset \mathbb{R}^d$. If the text data $x_i$ describes current or near-future economic conditions, we attach a positive label, i.e., $y_i = +1$; if it describes distant-future economic conditions, we attach a negative label, i.e., $y_i = -1$. However, in the dataset, we can only observe positive data and unlabeled data, which include both positive and negative data. In addition, if the text data $x_i$ belongs to period $t \in \{1, \dots, T\}$, we denote this by $z_i = t$. Using these notations, we define our data-generating process as follows:

$$\{x_i\}_{i=1}^{n} \overset{\text{i.i.d.}}{\sim} p(x \mid y = +1, z = t), \qquad \{x'_i\}_{i=1}^{n'} \overset{\text{i.i.d.}}{\sim} p(x \mid z = t),$$

where $\{x_i\}_{i=1}^{n}$ and $\{x'_i\}_{i=1}^{n'}$ denote the positive and unlabeled data at the $t$-th period, and $p(x \mid z = t)$ can be decomposed as

$$p(x \mid z = t) = p(y = +1 \mid z = t)\, p(x \mid y = +1, z = t) + p(y = -1 \mid z = t)\, p(x \mid y = -1, z = t).$$

To classify data consisting only of positive and unlabeled data, we propose multi-task PU learning (MTPU). In this section, we provide details of the proposed algorithm.
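To make the case-control data-generating process above concrete, the following Python sketch simulates a single period $t$. The positive sample is drawn from a near-future class-conditional density, and the unlabeled sample is drawn from the two-component mixture. The Gaussian densities, the prior 0.6, and the name `sample_case_control` are illustrative assumptions only. None of them are quantities estimated from the survey.

```python
import random

def sample_case_control(n_pos, n_unl, prior, sample_pos, sample_neg, seed=0):
    """Draw a case-control PU dataset for a single period t.

    Positive set:  n_pos i.i.d. draws from p(x | y=+1, z=t).
    Unlabeled set: n_unl i.i.d. draws from the mixture
        p(x | z=t) = prior * p(x | y=+1, z=t) + (1 - prior) * p(x | y=-1, z=t).
    The hidden labels of the unlabeled draws are returned only for checking;
    a PU learner never sees them.
    """
    rng = random.Random(seed)
    positives = [sample_pos(rng) for _ in range(n_pos)]
    unlabeled, hidden_labels = [], []
    for _ in range(n_unl):
        if rng.random() < prior:          # y = +1 with probability p(y=+1 | z=t)
            unlabeled.append(sample_pos(rng))
            hidden_labels.append(+1)
        else:                             # otherwise y = -1
            unlabeled.append(sample_neg(rng))
            hidden_labels.append(-1)
    return positives, unlabeled, hidden_labels


# Hypothetical one-dimensional class-conditional densities.
def near_future_like(rng):
    return rng.gauss(1.0, 1.0)

def distant_future_like(rng):
    return rng.gauss(-1.0, 1.0)

P, U, hidden = sample_case_control(1000, 5000, prior=0.6,
                                   sample_pos=near_future_like,
                                   sample_neg=distant_future_like)
print(len(P), len(U))                   # 1000 5000
print(hidden.count(+1) / len(hidden))   # empirical share of hidden positives, near 0.6
```

Only `P` and `U` would be available to a learner. The hidden labels exist here solely to show that the unlabeled set mixes both classes in proportion to the prior.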

Before explaining our model, let us review the standard setting of PU learning. We consider a binary classification problem of classifying $x \in \mathcal{X} \subset \mathbb{R}^d$ into one of the two classes $\{-1, +1\}$, and we assume that there exists a joint distribution $p(x, y)$, where $y \in \{-1, +1\}$ is the class label of $x$. PU learning relies on two distinct sampling schemes, namely the censoring scenario and the case-control scenario [Elkan and Noto, 2008]. The PU learning framework we use in this study is the case-control scenario. In this scenario, we suppose access to a positive dataset $\{x_i\}_{i=1}^{n} \overset{\text{i.i.d.}}{\sim} p(x \mid y = +1)$ and an unlabeled dataset $\{x'_i\}_{i=1}^{n'} \overset{\text{i.i.d.}}{\sim} p(x)$. Let $\ell : \mathbb{R} \times \{\pm 1\} \to \mathbb{R}_+$ be a loss function, where $\mathbb{R}_+$ is the set of non-negative real values. Let $\mathcal{F}$ be the set of measurable functions from $\mathcal{X}$ to $[\epsilon, 1 - \epsilon]$, where $\epsilon \in (0, 1/2)$ is a small positive value. This constant $\epsilon$ is introduced to ensure that the following optimization problem is well defined, based on the result of Kato et al. [2019]. du Plessis et al. [2015] showed that the classification risk of $f \in \mathcal{F}$ can be expressed as

$$R_{\mathrm{PU}}(f) = p(y = +1)\, \mathbb{E}_{\mathrm{p}}[\ell(f(X), +1)] - p(y = +1)\, \mathbb{E}_{\mathrm{p}}[\ell(f(X), -1)] + \mathbb{E}_{\mathrm{u}}[\ell(f(X), -1)], \tag{1}$$

where $\mathbb{E}_{\mathrm{p}}$ and $\mathbb{E}_{\mathrm{u}}$ are the expectations over $p(x \mid y = +1)$ and $p(x)$, respectively. Because (1) involves only expectations over positive and unlabeled data, replacing them with sample averages yields an unbiased estimator of the classification risk.

In addition to the standard setting of PU learning, we take the time structure into account. The Economy Watcher Survey comprises monthly data, with a few thousand records for each month. We need different classifiers for the data of each month for two reasons. First, the model can vary across periods. Second, we cannot use data from the $(t+1)$-th period to train the model for the $t$-th period, because the data of the $(t+1)$-th period may contain information about the $t$-th period.
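The empirical counterpart of (1) can be sketched as follows. This is a minimal illustration under two assumptions. First, $f$ outputs a probability in $[\epsilon, 1-\epsilon]$. Second, the logistic loss on such outputs is read as the cross-entropy $\ell(f, +1) = -\log f$, $\ell(f, -1) = -\log(1-f)$. The names `pu_risk` and `log_loss`, the default `eps`, and the optional clipping flag are all choices of this sketch. The flag mirrors the non-negative correction of Kiryo et al. [2017].

```python
import math

def log_loss(p, y, eps=1e-3):
    """Logistic (cross-entropy) loss for a probability-valued output p = f(x).

    p is clipped to [eps, 1 - eps], mirroring the range [epsilon, 1 - epsilon]
    of the function class, so that the logarithm stays finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p) if y == +1 else -math.log(1.0 - p)

def mean(values):
    return sum(values) / len(values)

def pu_risk(f_pos, f_unl, prior, non_negative=False):
    """Empirical counterpart of the unbiased PU risk (1).

    f_pos: outputs f(x_i) on the positive sample.
    f_unl: outputs f(x'_j) on the unlabeled sample.
    prior: class prior p(y = +1).
    If non_negative is True, the estimated negative-class term is clipped at
    zero, as in the non-negative estimator of Kiryo et al. [2017].
    """
    positive_term = prior * mean([log_loss(p, +1) for p in f_pos])
    # E_u[loss(f, -1)] - prior * E_p[loss(f, -1)] estimates the negative-class
    # risk (1 - prior) * E_n[loss(f, -1)] without using any negative labels.
    negative_term = (mean([log_loss(p, -1) for p in f_unl])
                     - prior * mean([log_loss(p, -1) for p in f_pos]))
    if non_negative:
        negative_term = max(0.0, negative_term)
    return positive_term + negative_term


f_pos = [0.9, 0.8]         # toy scores on positive (current) assessments
f_unl = [0.9, 0.1, 0.2]    # toy scores on unlabeled (future) assessments
print(pu_risk(f_pos, f_unl, prior=0.5) < 0)                     # True
print(pu_risk(f_pos, f_unl, prior=0.5, non_negative=True) > 0)  # True
```

On these toy scores, the plain unbiased estimate is already negative. This illustrates how a sufficiently flexible model can push the plain estimator below zero, which a true risk can never be.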
This made it necessary to use different models for different periods. For $z = t$, we denote the model by $f_{z=t}$ and its risk by

$$R_{\mathrm{PU}}(f_{z=t}, z = t) = p(y = +1 \mid z = t)\, \mathbb{E}_{\mathrm{p},t}[\ell(f_{z=t}(X), +1)] - p(y = +1 \mid z = t)\, \mathbb{E}_{\mathrm{p},t}[\ell(f_{z=t}(X), -1)] + \mathbb{E}_{\mathrm{u},t}[\ell(f_{z=t}(X), -1)],$$

where $\mathbb{E}_{\mathrm{p},t}$ and $\mathbb{E}_{\mathrm{u},t}$ denote the expectations over $p(x \mid y = +1, z = t)$ and $p(x \mid z = t)$, respectively. Their empirical counterparts $\hat{\mathbb{E}}_{\mathrm{p},t}$ and $\hat{\mathbb{E}}_{\mathrm{u},t}$ average over the positive and unlabeled data of the $t$-th period.

We additionally introduce multi-task learning into PU learning. Multi-task learning trains neural networks efficiently by exploiting features that are common to different tasks [Caruana, 1997]. If common features exist across periods, we can train our models more efficiently by sharing these features among the models $f_{z=t}$, $t = 1, \dots, T$. The sharing takes place through layers called shared layers, whose structure is shown in Figure 2. We name this model MTPU. Details of its structure are provided in the section on empirical experiments.

Figure 2: Neural network model for multi-task PU learning. The period-specific models share one network of shared layers.

Figure 3: Averaged assessments of the economic conditions of the near and distant future and of those pertaining to the present and the future. The horizontal line at y = 2 is the neutral state. The red vertical lines on the horizontal line mark the months in which the two-sample t-test finds a significant difference between the near- and distant-future assessments. Thin and bold lines correspond to the two significance levels used in Table 1.
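The shared-layer idea of MTPU (Figure 2) can be sketched with a small pure-Python network. Each period $t$ passes its input through the same shared layers and then through its own period-specific layers, which output $f_{z=t}(x)$. The layer sizes, the sigmoid output, the initialization, and the class name `MultiTaskPU` are hypothetical. Training is not shown. One natural assumption, not confirmed by the text, is to minimize the sum of the period-wise PU risks over $t$, so that data from every month update the shared layers.

```python
import math
import random

def relu(v):
    return [max(0.0, a) for a in v]

def dense(layer, v):
    """Affine map W v + b; W is stored as a list of rows."""
    W, b = layer
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]

def init_layer(rng, n_in, n_out):
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
    return W, [0.0] * n_out

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

class MultiTaskPU:
    """Hypothetical MTPU-style network: shared layers plus one head per period."""

    def __init__(self, d, hidden, n_periods, eps=1e-3, seed=0):
        rng = random.Random(seed)
        self.eps = eps
        # Shared layers, reused by the model of every period t = 1, ..., T.
        self.shared = [init_layer(rng, d, hidden), init_layer(rng, hidden, hidden)]
        # Period-specific layers: hidden -> hidden -> 1 score.
        self.heads = [[init_layer(rng, hidden, hidden), init_layer(rng, hidden, 1)]
                      for _ in range(n_periods)]

    def forward(self, x, t):
        """Return f_{z=t}(x) in [eps, 1 - eps]; t is a 0-based period index."""
        h = x
        for layer in self.shared:
            h = relu(dense(layer, h))
        hidden_layer, output_layer = self.heads[t]
        h = relu(dense(hidden_layer, h))
        p = sigmoid(dense(output_layer, h)[0])
        return min(max(p, self.eps), 1.0 - self.eps)


model = MultiTaskPU(d=5, hidden=8, n_periods=3)
x = [1.0, 0.0, 2.0, 0.0, 1.0]   # toy bag-of-words vector
scores = [model.forward(x, t) for t in range(3)]
print(scores)                   # one score per period, all in [eps, 1 - eps]
```

Because `self.shared` is a single list used by every call to `forward`, any update to it affects all periods. Each entry of `self.heads` is touched only by its own period.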


Table 1: Averaged assessments for each period and each type of economic condition. For the averaged assessments of near- and distant-future economic conditions, we conducted a two-sample t-test; a significant difference between the mean values of the assessments is indicated by the superscript ∗. One ∗ means that the null hypothesis of the two-sample t-test is rejected at the less stringent of the two significance levels, and two ∗s mean that it is rejected at the more stringent level. NF: near future; DF: distant future; Original: assessments of current and future economic conditions as given in the survey; PU1 and PU2: non-negative PU learning with a single model for all months and with one model per month, respectively.

| Period | MTPU NF | MTPU DF | Original Current | Original Future | PU1 NF | PU1 DF | PU2 NF | PU2 DF |
|---|---|---|---|---|---|---|---|---|
| Jan. 2016 | 1.842∗∗ | 1.925∗∗ | 1.996 | 1.864 | 1.889 | 1.857 | 1.846 | 1.964 |
| Feb. 2016 | 1.701∗∗ | 1.720∗∗ | 1.949 | 1.780 | 1.749 | 1.756 | 1.733 | 1.876 |
| Mar. 2016 | 1.768∗ | 1.694∗ | 1.870 | 1.800 | 1.805 | 1.731 | 1.691 | 1.804 |
| Apr. 2016 | 1.606∗∗ | 1.668∗∗ | 1.822 | 1.719 | 1.764 | 1.625 | 1.717 | 1.684 |
| May 2016 | 1.574∗∗ | 1.736∗∗ | 1.896 | 1.683 | 1.661 | 1.680 | 1.582 | 1.808 |
| June 2016 | 1.588 | 1.622 | 1.656 | 1.632 | 1.671 | 1.531 | 1.588 | 1.685 |
| July 2016 | 1.721∗∗ | 1.784∗∗ | 1.887 | 1.804 | 1.797 | 1.864 | 1.725 | 1.860 |
| Aug. 2016 | 1.770∗∗ | 1.906∗∗ | 1.904 | 1.815 | 1.789 | 1.863 | 1.695 | 1.887 |
| Sept. 2016 | 1.661∗∗ | 1.860∗∗ | 1.961 | 1.781 | 1.700 | 1.767 | 1.650 | 1.887 |
| Oct. 2016 | 1.789∗∗ | 1.914∗∗ | 1.985 | 1.852 | 1.785 | 1.878 | 1.762 | 1.906 |
| Nov. 2016 | 2.028 | 1.865 | 1.978 | 1.960 | 1.936 | 1.956 | 1.944 | 1.948 |
| Dec. 2016 | 2.139∗∗ | 1.975∗∗ | 1.974 | 2.095 | 2.053 | 2.074 | 2.151 | 1.996 |
| Jan. 2017 | 2.036 | 1.888 | 1.997 | 1.959 | 1.897 | 1.932 | 2.008 | 2.008 |
| Feb. 2017 | 1.947∗∗ | 1.992∗∗ | 2.091 | 1.969 | 1.886 | 2.024 | 1.886 | 2.033 |
| Mar. 2017 | 2.181∗ | 1.881∗ | 1.967 | 2.040 | 2.122 | 1.984 | 2.157 | 1.968 |
| Apr. 2017 | 2.176 | 1.963 | 2.034 | 2.039 | 2.049 | 1.959 | 2.135 | 1.955 |
| May 2017 | 2.077∗ | 1.963∗ | 2.080 | 2.013 | 1.988 | 2.041 | 2.061 | 2.008 |
| June 2017 | 2.086∗ | 1.939∗ | 2.084 | 2.022 | 1.951 | 1.971 | 2.016 | 2.078 |
| July 2017 | 2.162 | 1.980 | 2.034 | 2.055 | 2.077 | 2.033 | 2.016 | 1.972 |
| Aug. 2017 | 2.024 | 1.904 | 2.011 | 1.975 | 1.996 | 1.952 | 2.008 | 1.956 |
| Sept. 2017 | 2.041 | 1.914 | 2.032 | 2.005 | 1.943 | 1.967 | 1.931 | 2.033 |
| Oct. 2017 | 1.909∗∗ | 2.040∗∗ | 2.175 | 1.998 | 1.822 | 2.048 | 1.905 | 2.139 |
| Nov. 2017 | 2.278 | 2.045 | 2.076 | 2.125 | 2.173 | 2.020 | 2.121 | 2.061 |
| Dec. 2017 | 2.140∗∗ | 2.137∗∗ | 2.067 | 2.171 | 2.108 | 2.068 | 2.240 | 2.177 |
| Jan. 2018 | 1.935∗∗ | 1.951∗∗ | 2.124 | 1.957 | 1.874 | 2.016 | 1.935 | 2.029 |
| Feb. 2018 | 1.831∗∗ | 1.938∗∗ | 2.125 | 1.936 | 1.778 | 1.942 | 1.926 | 1.950 |
| Mar. 2018 | 2.052∗ | 2.032∗ | 2.005 | 2.073 | 2.105 | 1.943 | 2.073 | 1.992 |
| Apr. 2018 | 2.184 | 1.922 | 2.071 | 2.069 | 2.123 | 1.984 | 2.115 | 2.057 |
| May 2018 | 1.907∗∗ | 1.837∗∗ | 2.039 | 1.900 | 1.870 | 1.894 | 1.878 | 1.963 |
| June 2018 | 1.860∗∗ | 1.963∗∗ | 2.034 | 1.910 | 1.848 | 1.942 | 1.835 | 2.021 |
| July 2018 | 1.883∗∗ | 1.838∗∗ | 1.965 | 1.874 | 1.785 | 1.887 | 1.895 | 1.919 |
| Aug. 2018 | 1.900∗∗ | 1.988∗∗ | 2.034 | 1.913 | 1.799 | 1.931 | 1.956 | 2.016 |
| Sept. 2018 | 1.735∗∗ | 1.881∗∗ | 2.041 | 1.882 | 1.861 | 1.918 | 1.682 | 2.000 |
| Oct. 2018 | 1.964∗∗ | 1.834∗∗ | 2.006 | 1.916 | 1.948 | 1.785 | 1.911 | 1.887 |
| Nov. 2018 | 2.049 | 1.909 | 2.033 | 1.980 | 1.943 | 1.930 | 1.947 | 1.979 |
| Dec. 2018 | 2.017∗ | 1.862∗ | 1.881 | 1.951 | 1.996 | 1.900 | 2.017 | 1.912 |
| Jan. 2019 | 1.780∗∗ | 1.784∗∗ | 2.016 | 1.800 | 1.748 | 1.833 | 1.756 | 1.833 |
| Feb. 2019 | 1.871∗∗ | 1.925∗∗ | 2.014 | 1.863 | 1.917 | 1.837 | 1.829 | 1.95 |
| Mar. 2019 | 1.975∗ | 1.856∗ | 1.943 | 1.879 | 2.004 | 1.797 | 1.895 | 1.856 |
| Apr. 2019 | 1.852 | 1.894 | 1.972 | 1.916 | 1.877 | 1.962 | 1.797 | 1.928 |
| May 2019 | 1.845∗∗ | 1.762∗∗ | 1.863 | 1.766 | 1.853 | 1.713 | 1.784 | 1.758 |
| June 2019 | 1.736∗∗ | 1.676∗∗ | 1.868 | 1.719 | 1.762 | 1.718 | 1.715 | 1.748 |
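Table 1 marks months in which the near- and distant-future mean ranks differ significantly. The paper uses a two-sample t-test with unequal variances (Welch's test) for this comparison. Below is a minimal standard-library sketch of the statistic and its Welch-Satterthwaite degrees of freedom. The ranks are hypothetical, and the p-value is left to a t-distribution routine. In SciPy, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns a p-value.

```python
import math

def welch_t_test(a, b):
    """Two-sample t statistic with unequal variances (Welch's t-test).

    Returns (t, df): the t statistic for mean(a) - mean(b) and the
    Welch-Satterthwaite approximation of the degrees of freedom.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # unbiased sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2a, se2b = va / na, vb / nb                   # squared standard errors
    t = (ma - mb) / math.sqrt(se2a + se2b)
    df = (se2a + se2b) ** 2 / (se2a ** 2 / (na - 1) + se2b ** 2 / (nb - 1))
    return t, df


near = [2, 3, 2, 1, 2]      # hypothetical ranks of near-future assessments
distant = [1, 2, 1, 2, 1]   # hypothetical ranks of distant-future assessments
t, df = welch_t_test(near, distant)
print(round(t, 3), round(df, 2))   # 1.5 7.53
```

Unlike the pooled-variance t-test, this version does not assume that the near- and distant-future groups have equal variances or equal sizes. That matters when the ranking-based split yields groups with different spreads.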

When we train a classifier, we can naively replace the expectations with the corresponding sample averages. However, Kiryo et al. [2017] pointed out that the basic form of unbiased PU learning is ineffective with deep neural networks. Because the empirical risk is not bounded from below, a flexible model can drive it negative and overfit. To implement PU learning with deep neural networks, we applied the non-negative risk estimator proposed by Kiryo et al. [2017] to the empirical version of the period-wise risk $R_{\mathrm{PU}}(f_{z=t}, z = t)$. For a hypothesis set $\mathcal{H}$, we define the following risk minimization problem:

$$\hat{f}_{z=t} = \operatorname*{argmin}_{f_{z=t} \in \mathcal{H}} \left[ \hat{R}_{\mathrm{nnPU}}(f_{z=t}, z = t) + \mathcal{R}(f_{z=t}) \right], \tag{2}$$

where $\hat{R}_{\mathrm{nnPU}}(f_{z=t}, z = t)$ is a sample approximation of $R_{\mathrm{PU}}(f_{z=t}, z = t)$ with the non-negative transformation proposed by Kiryo et al. [2017], and $\mathcal{R}$ is a regularization term.

The remaining problem is to decide the class prior $p(y = +1 \mid z = t)$, which may differ across periods $t$. Several algorithms have been proposed to estimate the class prior [du Plessis and Sugiyama, 2014; Ramaswamy et al., 2016; Jain et al., 2016], but the estimation is known to remain difficult. For our particular goal, however, we can avoid this problematic estimation. In our experiments, we assume that the class prior $p(y = +1 \mid z = t)$ takes the same constant value for all periods $t = 1, \dots, T$. This assumption is not realistic, because the probability would differ across periods. However, Kato et al. [2018, 2019] showed that the learned function preserves the order of the class posterior even if the class prior is misspecified. That is, for any $x, x' \in \mathcal{X}$,

$$p(y = +1 \mid x, z = t) \le p(y = +1 \mid x', z = t) \iff f_{z=t}(x) \le f_{z=t}(x'). \tag{3}$$

Therefore, even when we cannot obtain the exact value of $p(y = +1 \mid x, z = t)$, we can still identify the order of $p(y = +1 \mid x, z = t)$ with respect to $x$. Our empirical analysis separates the assessments of future economic conditions into near- and distant-future economic conditions based on this property. We classify a fixed fraction of the data with the highest values of $f_{z=t}$ as assessments of near-future economic conditions. A fixed fraction with the lowest values of $f_{z=t}$ is classified as assessments of distant-future economic conditions. In addition to being robust to a misspecified class prior, $f_{z=t}$ also satisfies relationship (3) under a selection bias of the positive data, provided that the bias satisfies a mild assumption [Kato et al., 2019]. Thus, our approach can reduce the influence of a misspecified class prior and of selection bias.

Figure 4: Co-occurrence networks of near- and distant-future economic conditions in June 2016 and February 2017. The lengths of the edges represent the values of the Jaccard coefficients: shorter edges indicate a stronger relationship (a larger Jaccard coefficient) between the two words. The widths of the edges also represent the Jaccard coefficients, with bold edges similarly signifying a stronger relationship between the two words. The color of a node relates to the assessment. Yellow-green denotes an averaged value of 2, i.e., a neutral assessment, and warmer and cooler colors represent positive and negative assessments, respectively.

In this section, we report the results of the empirical analysis of data from the Economy Watcher Survey. The survey has been conducted every month since 2000. Our analysis only used data from January 2016 to June 2019, i.e., 42 months of data.
Each month includes a few thousand samples. The number varies from month to month because some respondents did not provide text with their answers. In total, we had 111,501 samples. We used a bag-of-words representation to convert the documents into $d$-dimensional vectors. After vectorizing the text data, we applied PU learning with the aforementioned MTPU. For comparison, we also used the standard model of PU learning in two ways. First, we trained one model on all samples. Second, we prepared one model for each month. Details of the neural networks are provided in the following section. After training our classifiers, we classified the assessments of future economic conditions contained in the unlabeled data that had been used for training.
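The vectorization and the rank-based split can be sketched as follows. The sketch assumes that the texts are already tokenized. Japanese text would first need a morphological analyzer, which is not shown. `frac_near` and `frac_distant` are free parameters of the sketch, and all function names are hypothetical. The split uses only the ordering of the scores, so any strictly increasing transformation of $f_{z=t}$ yields the same groups. This is why relationship (3) is sufficient even under a misspecified class prior.

```python
from collections import Counter

def build_vocabulary(docs):
    """Assign a column index to every token seen in the tokenized documents."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab

def bag_of_words(doc, vocab):
    """Count vector of length d = len(vocab); unseen tokens are ignored."""
    vec = [0] * len(vocab)
    for token, count in Counter(doc).items():
        if token in vocab:
            vec[vocab[token]] = count
    return vec

def split_by_score(scores, frac_near, frac_distant):
    """Split unlabeled items by the order of their scores f_{z=t}(x).

    The top frac_near of items become near-future assessments and the bottom
    frac_distant become distant-future assessments; the middle is unassigned.
    Only the ranking of the scores matters, not their absolute values.
    """
    if frac_near + frac_distant > 1:
        raise ValueError("the two fractions must not overlap")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_near = int(frac_near * len(scores))
    n_distant = int(frac_distant * len(scores))
    return order[:n_near], order[len(order) - n_distant:]


docs = [["sales", "rainy", "season"], ["population", "decline"], ["sales", "up"]]
vocab = build_vocabulary(docs)
print(bag_of_words(["sales", "sales", "up"], vocab))   # [2, 0, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.4]
print(split_by_score(scores, 0.25, 0.25))              # ([0], [1])
```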

Neural network model:

First, we describe the model used for MTPU. The shared network was a multilayer perceptron (MLP) with ReLU activations [Nair and Hinton, 2010]. The network following the shared network for each period was another MLP with ReLU. Next, we describe the model used for non-negative PU learning, which was an MLP with ReLU. We set $p(y = +1 \mid z = t)$ to the same fixed value for all $t \in \{1, 2, \dots, 42\}$. For both methods, we use the logistic loss as the loss function $\ell$.

In this section, we report the extent to which assessments differ across current, future, near-future, and distant-future economic conditions.

Averaged Assessments and t-test:

We report the averaged assessments of economic conditions in the near and distant future and compare them with those of the current and future economic conditions. Assessments of near- and distant-future economic conditions are estimated by MTPU and by non-negative PU learning with neural networks. For non-negative PU learning, we used two models. The first (PU1) is a single model trained on all samples, and the second (PU2) uses a different model for the data of each month. The results are presented in Table 1. For each period, we show the result of the two-sample t-test with unequal variances between the assessments of the economic conditions of the near and distant future. Values whose difference in means is significant are marked with the superscript ∗ in the table. One ∗ and two ∗s mean that the null hypothesis of the two-sample t-test is rejected at the less and the more stringent of the two significance levels, respectively.

Visualization as a Time Series:

To facilitate a more intuitive understanding of the results, we plotted the averaged assessments as time series in Figure 3, where the x-axis corresponds to time and the y-axis to the value of the assessment. The blue, orange, green, and red lines correspond to assessments of the economic conditions in the near future, in the distant future, at the present time, and in the future, respectively. The horizontal black dashed line at y = 2 represents the neutral condition in the five-step evaluations of economic conditions, which range from 0 (bad) to 4 (good). The red vertical lines perpendicular to the line y = 2 indicate that the difference between the average near- and distant-future assessments is significant in the two-sample t-test. Bold and thin vertical lines distinguish the two significance levels used in Table 1. For example, in several months of 2017, such as March, May, and June, the assessments of the economic conditions in the near future are significantly higher than those of the distant future.

This section presents our text-mining analysis of the assessments. For text mining, we use tf-idf and the Jaccard coefficient, which are standard techniques of natural language processing. First, we separate the assessments of near- and distant-future economic conditions by the month in which they were published, i.e., we form groups of monthly assessments. We denote the resulting set of assessments by $\mathcal{M}$ and apply tf-idf to identify the words that characterize each group. Then, for the words with the highest tf-idf scores, we compute the Jaccard coefficient [Manning and Schütze, 1999], which measures the similarity between two sets. Let $\mathcal{M}_w \subseteq \mathcal{M}$ be the set of sentences that include the word $w$. The Jaccard coefficient $J(\mathcal{M}_a, \mathcal{M}_b)$ for words $a$ and $b$ is

$$J(\mathcal{M}_a, \mathcal{M}_b) = \frac{|\mathcal{M}_a \cap \mathcal{M}_b|}{|\mathcal{M}_a \cup \mathcal{M}_b|}. \tag{4}$$

Based on these results, we plotted the co-occurrence networks in Figure 4. Because of the page limit, we only show the networks of assessments in June 2016 and February 2017. June 2016 is one of the periods in which the values of the assessments changed greatly. February 2017 is one of the few months of 2017 in which the near-future assessments are significantly lower than the distant-future assessments. Because our graphs are small, we placed enlarged versions of them in the appendix, in both English and Japanese. The English labels were translated from Japanese with googletrans (https://pypi.org/project/googletrans/), a Python library that accesses Google Translate.
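A standard-library sketch of the tf-idf weighting and of the Jaccard coefficient (4) is given below. It also includes a helper that keeps word pairs above a threshold as the edges of a co-occurrence network. The tf-idf variant, the threshold, the function names, and the toy sentences are assumptions for illustration only. The toy sentences borrow words from the examples discussed for Figure 4.

```python
import math
from collections import Counter
from itertools import combinations

def tf_idf(groups):
    """tf-idf score of every word in every group of assessments.

    groups: list of token lists, e.g. one per (month, near/distant) group.
    Uses raw term frequency times log(N / document frequency); other tf-idf
    weightings exist and may rank words differently.
    """
    n = len(groups)
    df = Counter(word for group in groups for word in set(group))
    return [{w: c * math.log(n / df[w]) for w, c in Counter(group).items()}
            for group in groups]

def jaccard(sentences, a, b):
    """Jaccard coefficient (4) between the sets of sentences containing a and b."""
    m_a = {i for i, s in enumerate(sentences) if a in s}
    m_b = {i for i, s in enumerate(sentences) if b in s}
    union = m_a | m_b
    return len(m_a & m_b) / len(union) if union else 0.0

def cooccurrence_edges(sentences, words, threshold):
    """Word pairs whose Jaccard coefficient reaches threshold, as network edges."""
    edges = []
    for a, b in combinations(words, 2):
        j = jaccard(sentences, a, b)
        if j >= threshold:
            edges.append((a, b, j))
    return edges


groups = [["rainy", "season", "sales"], ["population", "sales"]]
print(tf_idf(groups)[0]["sales"])   # 0.0: "sales" occurs in every group

sentences = [{"U.K.", "withdrawal", "exports"}, {"U.K.", "withdrawal"},
             {"rainy", "season"}, {"U.K.", "tourism"}]
print(round(jaccard(sentences, "U.K.", "withdrawal"), 3))   # 0.667
print(cooccurrence_edges(sentences, ["U.K.", "withdrawal", "rainy"], 0.5))
```

In a plot such as Figure 4, the Jaccard value of each kept edge would then set its length or width.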


Among the assessments of distant-future economic conditions, Figure 4 displays words related to economic fundamentals, such as the structure of the labor supply and international politics. In other words, these results can be interpreted as meaning that assessments of near-future economic conditions reflect the economic cycle, whereas assessments of distant-future economic conditions reflect the economic trend. For example, in June 2016, the words "U.K." and "withdrawal," both related to Brexit, appear among the distant-future assessments, together with "business cycle" and "trend." Similarly, the words "US" and "President" appear in February 2017. By contrast, the near-future assessments in June 2016 and February 2017 are characterized by words that are less related to economic fundamentals, such as "rainy season" and "Valentine's Day." This is an insightful finding for policymakers, because it suggests that they cannot easily change people's expectations when those expectations are based on economic fundamentals.

In this paper, we proposed a new application of PU learning and text mining to economic survey text. We developed a new model named MTPU to train neural networks efficiently using data with a time structure. Our empirical analysis presented the classification results together with interpretations based on text mining and economics. The results are insightful for policymakers. They suggest that people may interpret the definition of the future differently and may assess the future economic outlook differently based on their interpretations. We also found that the main reasons given for near- and distant-future economic assessments differ.

References

Gilles Blanchard, Gyemin Lee, and Clayton Scott. Semi-supervised novelty detection. Journal of Machine Learning Research, 11(Nov):2973–3009, 2010.

Rich Caruana. Multitask learning. Machine Learning, 28(1):41–75, Jul 1997.

Marthinus Christoffel du Plessis and Masashi Sugiyama. Class prior estimation from positive and unlabeled data. IEICE Transactions on Information and Systems, E97-D(5):1358–1362, 2014.

Marthinus Christoffel du Plessis, Gang Niu, and Masashi Sugiyama. Convex formulation for learning from positive and unlabeled data. In ICML, pages 1386–1394, 2015.

Charles Elkan and Keith Noto. Learning classifiers from only positive and unlabeled data. In KDD, pages 213–220, 2008.

Giselle Guzman. Internet search behavior as an economic forecasting tool: The case of inflation expectations. Journal of Economic and Social Measurement, 36, Nov 2011.

Shantanu Jain, Martha White, Michael W. Trosset, and Predrag Radivojac. Nonparametric semi-supervised learning of class proportions. In NIPS, 2016.

Masahiro Kato, Liyuan Xu, Gang Niu, and Masashi Sugiyama. Alternate estimation of a classifier and the class-prior from positive and unlabeled data. arXiv:1809.05710, 2018.

Masahiro Kato, Takeshi Teshima, and Junya Honda. Learning from positive and unlabeled data with a selection bias. In International Conference on Learning Representations, 2019.

Ryuichi Kiryo, Gang Niu, Marthinus Christoffel du Plessis, and Masashi Sugiyama. Positive-unlabeled learning with non-negative risk estimator. In NIPS, pages 1675–1685, 2017.

Rajendra Kulkarni, Kingsley Haynes, Roger Stough, and Jean Paelinck. Forecasting housing prices with Google econometrics. SSRN Electronic Journal, Jul 2009.

Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA, 1999.

Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.

Minh Nhut Nguyen, Xiao-Li Li, and See-Kiong Ng. Positive unlabeled learning for time series classification. In IJCAI, pages 1421–1426, 2011.

Harish Ramaswamy, Clayton Scott, and Ambuj Tewari. Mixture proportion estimation via kernel embeddings of distributions. In ICML, pages 2052–2060, 2016.

Paul C. Tetlock, Maytal Saar-Tsechansky, and Sofus Macskassy. More than words: Quantifying language to measure firms' fundamentals. Journal of Finance, 63(3):1437–1467, 2008.

Paul C. Tetlock. Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3):1139–1168, 2007.

Gill Ward, Trevor Hastie, Simon Barry, Jane Elith, and John R. Leathwick. Presence-only data and the EM algorithm. Biometrics, 65(2):554–563, 2009.


A Graphs of Co-occurrence Network

Because of the page limit, the main body includes only reduced-size versions of the graphs. Here, we present enlarged versions of Figure 4. The lengths of the edges represent the values of the Jaccard coefficients, with shorter edges indicating a stronger relationship (larger Jaccard coefficients) between two words. The widths of the edges also represent the values of the Jaccard coefficients between two words. Bold edges similarly indicate a stronger relationship (larger Jaccard coefficients) between the two words. The color of a node represents the averaged value of the assessments of the texts in which the word appeared. Yellow-green denotes an averaged value of 2, i.e., a neutral assessment. Warmer and cooler colors represent positive and negative assessments, respectively.

A.1 Co-occurrence Network of Near Future Economic Conditions in June 2016


A.2 Co-occurrence Network of Distant Future Economic Conditions in June 2016


A.3 Co-occurrence Network of Near Future Economic Conditions in February 2017
