Disruption in the Chinese E-Commerce During COVID-19
Yuan Yuan, Muzhi Guan, Zhilun Zhou, Sundong Kim, Meeyoung Cha, Depeng Jin, Yong Li
Tsinghua University, China; Institute for Basic Science, South Korea
Abstract
The recent outbreak of the novel coronavirus (COVID-19) has infected millions of citizens worldwide and claimed many lives. This paper examines its impact on the Chinese e-commerce market by analyzing behavioral changes seen from a large online shopping platform. We first conduct a time series analysis to identify product categories that faced the most extensive disruptions. The time-lagged analysis shows that behavioral patterns seen in shopping actions are highly responsive to epidemic development. Based on these findings, we present a consumer demand prediction method by encompassing the epidemic statistics and behavioral features for COVID-19 related products. Experiment results demonstrate that our predictions outperform existing baselines and further extend to the long-term and province-level forecasts. We discuss how our market analysis and prediction can help better prepare for future pandemics by gaining extra time to launch preventive steps.
Introduction
The coronavirus disease 2019 (COVID-19) had a massive breakout in Wuhan during the Spring Festival in 2020 and was later followed by a planetary health emergency. The novel virus has significantly influenced people’s daily lives. Governments and the World Health Organization (WHO) have recommended that people stay at home and avoid crowded places. The disease dynamic was particularly rapid in China; it spread from Wuhan to all other regions and became nearly contained over the span of only two months.
According to the McKinsey report (Arora et al. 2020), global citizens increased their reliance on online shopping and delivery of essential goods, compared to the pre-pandemic time. Under this circumstance, epidemic-related products such as face masks and disinfectants were short in supply, failing to meet people’s demand. Such a disruption in supply and demand could reshape online shopping patterns, not only for COVID-19 related products but also for ordinary products. Understanding disruption in e-commerce could help all stakeholders (i.e., retailers, consumers, suppliers, delivery systems, and local governments) better prepare for the next pandemic. However, due to the lack of data or scenarios of a time-concentrated outbreak, little is known about consumer demand in the context of epidemics.
This paper conducts an extensive analysis of online shopping trends before and during the COVID-19 epidemic from the view of a popular e-commerce platform in China, Beidian (Cao et al. 2020). We characterize the pandemic’s impact on the market from changes in product-level demand and supply. First, our analysis reveals which products increased or decreased in sales (after discounting seasonal variation), helping understand how households are coping with the pandemic.
Second, it also compares the differences in browsing, searching, and purchasing activities towards pandemic-related goods such as face masks, disinfectants, and thermometers, which reveals the intricate relationship between supply and demand. We also identify which products decreased the most in sales to infer the causes of such a drop. We also use time-lagged cross-correlation to quantify how shopping actions respond to the pandemic, interrogating product supply shortage. To the best of our knowledge, no other research has examined the disruptions in e-commerce at such a fine-grained level.
Based on these observations, we propose an Encoder-decoder model that leverages the online shopping behaviors and the COVID-19 epidemic statistics to predict changes in demand of critical goods, EnCod for short. Experiments show that both shopping and epidemic features gathered from the past weeks are important for predicting demand for pandemic-relevant goods in the upcoming days. Our model achieves higher prediction performance than baselines, and it can be fine-tuned at the province level; each province or city may adopt our model to understand the needs of its citizens during a pandemic. In summary, our main contributions are as follows:
1. We operationalize and release a dataset of people’s online shopping behaviors during the COVID-19 epidemic (https://bit.ly/3kcAEN5); its multiple features characterize the market change during this period.
2. We investigate the changes of different online shopping behaviors, including purchasing, browsing, and searching on the platform, and examine the interplay between the COVID-19 epidemic and consumers’ behaviors.
3. We conduct a time-lagged cross-correlation analysis that reveals which products exhibit a demand pattern that aligns well with the epidemic dynamics.
4. We propose a model to forecast consumer demand on essential product categories, and demonstrate our model’s effectiveness with regional and long-term forecasts.
Related Work
Various studies have appeared since the outbreak of COVID-19 due to its unprecedented challenges for industry and society. Researchers have studied the impact of epidemics from multiple aspects, including transportation (Huang et al. 2020), gender equality (Alon et al. 2020), global poverty (Sumner et al. 2020), and the stock market (Baker et al. 2020). Among those related to businesses, one study examines online food shopping services during the government’s stay-at-home order (Chang and Meyerhoefer 2020). Others have looked at the economic impacts of a large epidemic (Schoenbaum 1987; Meltzer, Cox, and Fukuda 1999). However, little is known about consumer actions under a health risk, due to the lack of data encompassing such a time-concentrated outbreak (WTO 2020). This paper investigates the impact of the epidemic on online shopping behaviors and demand forecasting.
Several classical regression models exist in demand forecasting, such as the autoregressive integrated moving average (ARIMA) (Contreras et al. 2003). However, these models produce accurate forecasting results only when the sequence patterns are linearly correlated and stationary over time (Mills 1991; Omar, Hoang, and Liu 2016). New approaches adopt machine learning and deep learning algorithms for prediction, such as XGBoost (Chen and Guestrin 2016) and sequence-to-sequence (seq2seq) (Sutskever, Vinyals, and Le 2014) models. Despite their potential, most models require massive data for training, and how they perform under a sudden disruption is yet to be investigated. This paper presents an encoder-decoder model for predicting near-future demand for epidemic-related products.
During the outbreak of COVID-19, efforts have been made to understand COVID-19 from the perspectives of structural biology (Wrapp et al. 2020), genetics (Hoffmann et al. 2020), economics (Cornwall 2020), policy (Tian et al. 2020; Maier and Brockmann 2020) and trend prediction (Cohen 2020).
In addition to these efforts, the current study aims to provide a picture of the COVID-19 impact seen on Chinese e-commerce and leverages both the epidemic-related and behavior-related information to forecast the demand for essential goods.
Data
We use two data sources. The first is from a mobile-based shopping platform, Beidian. The platform is one of the largest in China and has a monthly user base of 3.44 million and an aggregate 187 million app downloads. It is a one-stop shop and offers products spanning nearly two thousand categories. We received anonymized session logs that span from January 1, 2019, to April 30, 2020. Each session record contained, for every instance, the action type (e.g., browsing, purchasing, searching), product ID, product category, and time. At the user level, we were also given information about the cities the users reside in. In the current analysis, logs originating from Hubei province were removed since the delivery of goods was prohibited during this area’s lockdown. Table 1 displays the summary of data statistics. The pre-pandemic period data (2019 and early 2020) are used to adjust for any seasonality pattern in the analysis of post-pandemic period data.
Table 1: Summary of the Beidian e-commerce dataset.
Statistics | Users | Products | Purchase | Browse | Search
Value | 18M | 550K | 190M | 8,285M | 1,318M
The next data source is the daily epidemic statistics describing the newly confirmed cases within China. We combine two data sources: (1) the official reports from the National Health Commission of China are used for all data up to January 22, 2020, and (2) the COVID-19 dashboard data by the Center for Systems Science and Engineering at Johns Hopkins University are used for January 22, 2020, and onward.
Modeling Market Disruptions
Demand for specific health-related products such as face masks and hand sanitizers is bound to increase during a health crisis, leading to a temporary surge in shopping actions. To quantify the degree of rank change in the purchase popularity of a product category c over a given period t, we define the Relative Popularity (RP) as follows:
$$RP(c, t) = \log \frac{\mathrm{ranking}(c, t_0)}{\mathrm{ranking}(c, t)}, \tag{1}$$
where t_0 is the reference time to which a target period’s popularity is compared. We consider the first week of 2020 as the reference point and repeatedly compute the RP value every week in 2020. A positive RP means the category has moved up in the purchase ranking relative to the reference week, and a negative RP means it has moved down. This metric measures the change of popularity at the product level and addresses potential confounders such as the number of active users. The trajectory of this value represents disruptions in purchases before and after COVID-19.
Figure 1: Products with the largest purchase rank change. The horizontal axis shows Relative Popularity, from -1.5 (less popular) to 2.5 (more popular); the categories shown are masks, disinfectants, hand sanitizers, online courses for children, disposable utensils, thermometers, vitamins, utensils, Chinese wine, cotton clothes, mixed nuts, and flavored milk.
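The Relative Popularity metric of Equation (1) can be illustrated with a minimal sketch. The function names and the toy purchase counts below are hypothetical, and the base-10 logarithm is an assumption (the text writes only "log"); it is chosen because it makes an RP of 2.0 correspond to a hundredfold rank improvement.

```python
import math


def rank_categories(sales: dict[str, int]) -> dict[str, int]:
    """Map each category to its 1-based rank by descending purchase count."""
    ordered = sorted(sales, key=lambda c: (-sales[c], c))
    return {category: i + 1 for i, category in enumerate(ordered)}


def relative_popularity(rank_ref: int, rank_now: int) -> float:
    """RP(c, t) = log10(ranking(c, t0) / ranking(c, t)).

    Base 10 is an assumption; positive values mean the category
    moved up in the ranking relative to the reference week t0.
    """
    if rank_ref < 1 or rank_now < 1:
        raise ValueError("ranks must be positive integers")
    return math.log10(rank_ref / rank_now)


# Hypothetical weekly purchase counts (not taken from the dataset).
week_ref = {"masks": 5, "mixed nuts": 90, "vitamins": 40, "utensils": 60}
week_now = {"masks": 500, "mixed nuts": 20, "vitamins": 80, "utensils": 10}

ref_ranks = rank_categories(week_ref)
now_ranks = rank_categories(week_now)
rp = {c: relative_popularity(ref_ranks[c], now_ranks[c]) for c in week_ref}

for category, value in sorted(rp.items(), key=lambda kv: -kv[1]):
    print(f"{category:12s} RP = {value:+.3f}")
```

Because the metric depends only on ranks, a uniform rise in total platform traffic leaves every RP value unchanged, which is the property the text relies on when it says the metric addresses confounders such as the number of active users.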
Popularity change by products
The most affected products.
Figure 1 presents the signature categories that mark the highest and the lowest RP values. The top seven items, from masks to vitamins, are within the top-20 in purchase rank increase; their sales have surged compared to the final week of 2019. In contrast, the bottom five items, from utensils to flavored milk, are within the bottom-20 and mark the largest decrease in purchase rank; their sales decreased the most during the pandemic.
The top list includes epidemic-related products such as face masks, disinfectants, hand sanitizers, and thermometers. The list also includes online course programs for children, disposable utensils, and vitamins, which could be assumed relevant during the pandemic given that homeschooling, hygiene, and immunity have become either mandatory or important. We see (non-disposable) utensils, Chinese wine, cotton clothes, mixed nuts, and flavored milk moving to the bottom list. These items were mostly within the top 100 before the pandemic. However, their purchase ranks dropped by more than a factor of ten during the pandemic. The full lists of top-20 and bottom-20 product categories are given in the Appendix.
Figure 2: Weekly popularity dynamics during the COVID-19 period, with panels (a), (b) Purchasing, (c), (d) Browsing, and (e), (f) Searching, all on a week scale. Most products in panels (a), (c) are increasing in their relative popularity ranking, whereas most products in panels (b), (d) show decreasing popularity ranks.
Footnote: The Johns Hopkins COVID-19 dashboard data are available at https://github.com/CSSEGISandData/COVID-19
Weekly fluctuations.
Figure 2 shows the week-by-week RP values of the 12 prominent goods along with their fitted lines. Here t_0 is again set to the first week of 2020. Product rank is shown for all three shopping actions: browsing, searching, and purchasing. The figure also shows the popularity trajectory of the same items in 2019. We shift the timeline for 2019 and sync the new year’s holiday week to appear as the fourth data point, to discount the seasonal effect. Note that the studied popularity measure RP is stable and applicable to all popularity levels since it looks at the relative rank changes on a logarithmic scale.
Figure 3: Weekly records of browsing, searching and purchasing on different kinds of products: (a) Masks, (b) Disinfectants, (c) Hand sanitizers, (d) Mixed nuts, all on a week scale. Those of searching and purchasing are multiplied by ten (i.e., the plotted searching and purchasing values are ten times the actual counts).
Products in panels (a) and (c) such as face masks, disinfectants, hand sanitizers, and thermometers show a surge in all shopping actions from the week of January 20th. This is when the lockdown of Hubei province was enforced. Face masks, in particular, continue to remain the top-ranked item throughout the pandemic period. The rank difference is substantially high and above 2.0 for the search action. Note that an RP value of 2.0 indicates a rise of several hundred ranks. The dashed lines, representing the identical item’s rank changes in 2019, do not show any notable increase. This confirms that the surges are not seasonal but unique to 2020 (i.e., epidemic-related).
The sudden rank change for children’s online courses is also noticeable during the pandemic’s first week. The search rank change for this product, nonetheless, is not as high as for other top products. We also pay attention to products like thermometers, vitamins, and disposable utensils, whose peaks in demand come a week or two after the disease outbreak.
The times at which each product shows the highest rank change in demand could be used to understand what popular health practices households adopt during the pandemic and how much they are concerned about the disease.
In contrast, products in panels (b) and (d) show decreased ranks in all shopping actions. Some changes, however, are seasonal and can also be observed in 2019. For instance, the rank of cotton clothes and Chinese wine gradually decreases in both 2019 and 2020, indicating a seasonal effect (e.g., warmer weather, wine demand decreasing after the new year’s celebration). However, products like flavored milk and utensils show decreased popularity only in 2020, indicating that households consume certain items less during the pandemic. It is interesting to contrast the decreasing demand for utensils with the rising demand for disposable utensils during the pandemic. This contrast likely arises due to increased efforts for hygiene.
Footnote: The surge appears on the fourth data point; the fitted lines are for visual aid and do not represent a gradual increase in popularity.
Session counts by action type.
To examine the dynamics across the shopping actions, Figure 3 compares the session counts for four representative products. The data show that shoppers engage most frequently in browsing actions, up to ten times more than in searching and purchasing actions. Also shown in the figure is the number of weekly confirmed cases in China.
For mixed nuts, a non-epidemic product, the relative session counts across the three action types remain similar over time. In contrast, for the other three epidemic-related goods, searching and purchasing action rates may differ because products may be sold out. Interestingly, browsing and purchasing actions show similar fluctuations. These distinctive patterns are most obvious for face masks in Figure 3(a), where the searching action demonstrates that masks continue to be in high demand. Plots of disinfectants show a similar trend. The plot of hand sanitizers, however, shows less difference across the three shopping actions. This may be because hand sanitizers were more widely accessible than the other two items.
Time-lagged analysis
The temporal analysis so far has revealed the dependency of online shopping actions on COVID-19; demand for some products rises immediately upon a health risk, but other products are staggered in their rank change. This subsection examines the time difference in detail by computing the lagged correlation between the two sequences. We compute the time-lagged cross-correlation (TLCC) (Shen 2015). The method shifts two sequences relative to each other in time (time-lagged) and calculates the correlation between them (cross-correlation). Therefore, it can analyze non-stationary time series and quantify how the three shopping actions appear upon a pandemic. We use a rolling window to analyze different periods; each window has a size of 21 days, and the lag between the two series is set from 0 to 6 days. Figure 4 presents the TLCC results for selected products based on the purchase log. The y-axis represents the time-difference offset, denoting the number of days by which the behavioral response falls behind the epidemic spread, and the color represents the Pearson correlation coefficient from negative (blue) to positive (red).
Figure 4: Time-lagged cross-correlation results of different products with a rolling time window: (a) Masks, (b) Disinfectants, (c) Hand sanitizers, (d) Vitamins (purchase behavior; y-axis: offset in days; color scale from −0.8 to 0.8). The color denotes the Pearson correlation coefficient between the two time series of corresponding behavior and newly confirmed cases.
The correlations are not uniform across products.
Figure 4(a) demonstrates that the strong correlation between mask purchases and epidemic development does not last long due to falling mask supplies. This trend is shown by dark red blocks on Jan 14, followed by blue shaded regions in the figure. Figure 4(b), in contrast, shows that disinfectant purchases closely follow the epidemic development throughout the whole period, resulting in high correlation (i.e., red shaded areas). We can deduce that there is no shortage of disinfectant supplies on the platform.
On the other hand, the demand for hand sanitizers (Figure 4(c)) and vitamins (Figure 4(d)) remains high until the end of February. For example, as shown in Figures 3(c) and 4(c), sales of hand sanitizer skyrocket, following the epidemic development at the early stages of COVID-19. They show a week’s lagged response to epidemic development. However, the lagged effect decreases over time, implying that shopping demand becomes neutral to epidemic development. Furthermore, the positive relationship becomes weak and then turns negative. This can be interpreted as hand sanitizers remaining in high demand even after the number of newly confirmed cases decreased. Toward the end of the time frame, the lagged positive correlation appears again, which suggests that consumers have stocked enough hand sanitizers and thus no longer purchase them.
Our data show that correlations are found across shopping actions. Masks are a representative product that faced a supply shortage during the epidemic in China. The searching actions in Figure 5(a) show a negative correlation in the later period of the epidemic (the blue shaded regions), indicating that people are still searching for masks although the number of COVID-19 confirmed cases is decreasing.
The positive correlation appearing in the later part of Figure 5(b) can be understood in the same context: sales and confirmed cases become positively correlated only once the number of confirmed cases decreases.
Figure 5: (a) Searching on masks, (b) Purchasing on masks (y-axis: offset in days). Even when the epidemic subsided, consumer interest in searching masks remained. For preparing for future epidemics, demand forecasting and inventory planning are vital to avoid supply shortage.
In summary, our analysis shows that dynamic correlations exist between online shopping behaviors and epidemic development. We find that behaviors respond to the epidemic in a lagged manner, but the correlation can be reversed when there is a shortage or continued caution. These patterns and observations inspire us to design an accurate and explainable predictor for forecasting consumer demand on key product categories.
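The rolling time-lagged cross-correlation used in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the synthetic series are hypothetical, while the 21-day window and the 0 to 6 day lags follow the settings given in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant window
    return cov / math.sqrt(var_x * var_y)


def rolling_tlcc(cases, behavior, window=21, max_lag=6):
    """Return grid[lag][start]: correlation between cases in
    [start, start + window) and behavior shifted `lag` days later."""
    if len(cases) != len(behavior):
        raise ValueError("series must have the same length")
    n_starts = len(cases) - window - max_lag + 1
    grid = []
    for lag in range(max_lag + 1):
        row = []
        for start in range(n_starts):
            x = cases[start:start + window]
            y = behavior[start + lag:start + lag + window]
            row.append(pearson(x, y))
        grid.append(row)
    return grid


# Synthetic example: behavior follows the epidemic curve three days later.
cases = [100 * math.exp(-((t - 20) / 6) ** 2) for t in range(60)]
behavior = [0.0] * 3 + [2 * c for c in cases[:-3]]

grid = rolling_tlcc(cases, behavior)
best_lag = max(range(len(grid)), key=lambda lag: grid[lag][0])
print("best lag in the first window:", best_lag)
```

Each row of `grid` corresponds to one offset on the y-axis of Figure 4 and each column to one window position, so the grid can be rendered directly as a heat map.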
Demand Forecasting and Evaluations
The analysis so far has demonstrated how COVID-19 impacted product popularity and behavioral patterns. The distinctive patterns of purchasing and searching for critical items like facial masks suggest that the pandemic disrupts the supply of essential goods and leads to an imbalance of demand and supply. The analysis also confirmed a significant correlation between shopping actions and epidemic sequences. Together, these findings suggest that many products’ purchase intent is directly affected by the epidemic development during a health crisis.
Based on the insights above, we present a Gated Recurrent Unit (GRU)-based encoder-decoder model named
EnCod that leverages the epidemic information along with historical shopping behaviors to predict the demand for critical goods. We use the data of daily confirmed cases and searching behaviors in the past two weeks to predict the number of searches in the following n days. Figure 6 shows the details of the prediction model. The EnCod model is based on the GRU network (Bahdanau, Cho, and Bengio 2014). The model takes the concatenation of the sequences of daily confirmed cases and searches as input. The encoder module extracts the historical sequence features and outputs the last hidden states, which serve as the decoder’s input. The decoder module then generates the predictions for the future days.
Table 2: The NRMSE performance of our method compared to baselines. The results represent the prediction performance in China except for Hubei province. The first five product columns are products with increased purchase ranks (COVID-19 related); the last five are products with decreased purchase ranks (COVID-19 irrelevant).
Model | Masks | Disinfectants | Hand sanitizers | Vitamins | Thermometers | Flavored milk | Daily necessities | Vegetables | Mixed nuts | Decorations
AR | 0.657 | 0.366 | 0.226 | 0.287 | 0.248 | 0.281 | 0.287 | 0.387 | 0.269 | 0.469
ARIMA | 0.461 | 0.633 | 0.865 | 0.258 | 0.232 | 0.292 | 0.298 | 0.405 | 0.267 | 0.447
Seq2seq | 0.438 | 0.323 | 0.197 | 0.216 | 0.149 | 0.283 | 0.305 | 0.330 | 0.253 | 0.286
XGBoost | 0.718 | 0.578 | 0.597 | 0.492 | 0.309 | 0.283 | 0.301 | 0.327 | 0.294 | 0.463
XGBoost-C | 0.468 | 0.512 | 0.289 | 0.307 | 0.293 | 0.281 | 0.330 | 0.326 | 0.295 | 0.547
EnCod | 0.307 | 0.212 | 0.156 | 0.201 | 0.126 | … | … | … | … | …
Figure 6: The overall architecture of our proposed model EnCod, which takes in the concatenation of epidemic and searching sequences in the past m days and outputs the daily number of searches in the next n days.
Performance evaluations
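Before the comparison, the input and output layout described for EnCod (the past m days of epidemic and searching sequences in, the next n days of searches out) can be sketched as a sliding-window sample builder. This is an illustration only: the function name `make_samples` is hypothetical, and pairing the two series per day as `[cases, searches]` is one assumed reading of "concatenation"; the GRU network itself is not reproduced here.

```python
def make_samples(cases, searches, m=14, n=7):
    """Build (inputs, targets) pairs for an encoder-decoder forecaster.

    inputs:  m consecutive days, each day a [confirmed_cases, searches] pair
             (per-day pairing is an assumption about the concatenation).
    targets: the search counts of the following n days.
    """
    if len(cases) != len(searches):
        raise ValueError("series must have the same length")
    samples = []
    # `end` is the first day to predict; days [end - m, end) form the input.
    for end in range(m, len(searches) - n + 1):
        inputs = [[cases[t], searches[t]] for t in range(end - m, end)]
        targets = searches[end:end + n]
        samples.append((inputs, targets))
    return samples


# Hypothetical 30-day toy series (not taken from the dataset).
toy_cases = list(range(30))
toy_searches = [10 * t for t in range(30)]

samples = make_samples(toy_cases, toy_searches, m=14, n=7)
first_inputs, first_targets = samples[0]
print(len(samples), "samples; first target window:", first_targets)
```

Dropping the `cases` column from each input pair yields the input of a model without epidemic information, which is how the text characterizes the Seq2Seq baseline relative to EnCod.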
We compare our model with several baselines: classical time series forecasting algorithms, including the autoregressive (AR) model (Mills 1991) and the ARIMA model (Contreras et al. 2003), as well as a deep learning algorithm, Seq2Seq, and a tree boosting algorithm, XGBoost (Chen and Guestrin 2016). These methods use only historical shopping behaviors to predict future ones. For a fair comparison, we also use a variant of the XGBoost model that utilizes both the historical shopping history and epidemic statistics (which we call XGBoost-C).
We evaluate the prediction performance with the Normalized Root Mean Square Error (NRMSE) (Rocha, Cortez, and Neves 2007). The experimental period is from January 1 to March 31, 2020, covering the main period of COVID-19 epidemic development in China. We use historical searching logs from the Beidian platform and the daily confirmed cases over the past two weeks to predict consumer demand (i.e., product searches) in the immediate week. We split the data chronologically into training and testing sets with a ratio of 3:1. We train the model parameters with the Adam optimizer, regularized by early stopping, and set the mini-batch size to 10. The learning rate is initialized as 1e-2 and is gradually reduced by a factor of 0.1.
Table 2 contains the prediction performance of EnCod along with other baselines. We choose ten categories in two groups, covering both COVID-19 related and COVID-19 unrelated product categories. Products with the highest relative popularity (RP), such as face masks, appear in the COVID-19 related group, and those with low RP values appear in the irrelevant group.
The COVID-19 related group results show that adding the epidemic statistics contributes substantially to demand forecasting performance. This was consistent for both XGBoost and encoder-decoder models. (Seq2Seq is a variant of our model without the epidemic information.) Compared to the best performing baseline, EnCod could reduce the NRMSE value by an additional …% to …% in the prediction tasks for the COVID-19 related products. However, XGBoost-C, despite having the epidemic information, is not necessarily the second-best performing alternative; sometimes it was the AR, ARIMA, or Seq2Seq model that produced the best alternative to EnCod.
Next, in the comparison of COVID-19 irrelevant products, EnCod no longer produces the best results in all prediction tasks. EnCod is only marginally better, and the XGBoost-C model produces the best result for flavored milk and vegetables.
This means that obtaining additional epidemic statistics is no longer helpful for items with low RP values. Among the items, we show the results for daily necessities (e.g., toilet paper, storage bags, kitchen supplies) and home decorations. All other results can be found in the Appendix.
The comparison results between the two groups validate that our method can capture the relationship between epidemic development and the demand change of essential goods. Moreover, they verify the usability of the RP metric, which is defined to characterize the market. The prediction results indicate that the RP metric can distinguish a product’s relevance to an epidemic.
Regional and long-term forecasting
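For reference, the evaluation protocol behind the NRMSE values reported in Tables 2 to 4 (a chronological 3:1 split and a normalized RMSE) can be sketched as below. The function names are hypothetical, and normalizing the RMSE by the mean of the observed values is an assumption; the text does not state which normalization is used, and other conventions divide by the range instead.

```python
import math


def chronological_split(samples, train_ratio=0.75):
    """Split time-ordered samples into train and test sets (3:1 by default),
    keeping the earlier portion for training."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


def nrmse(actual, predicted):
    """RMSE divided by the mean of the observed values.

    The choice of the mean as normalizer is an assumption.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    mean_actual = sum(actual) / len(actual)
    if mean_actual == 0:
        raise ValueError("mean of observed values is zero")
    return math.sqrt(mse) / abs(mean_actual)


# Hypothetical toy values (not results from the paper).
train, test = chronological_split(list(range(8)))
score = nrmse([10, 10, 10, 10], [12, 8, 12, 8])
print(train, test, round(score, 3))
```

Splitting by time rather than at random keeps every test window strictly later than the training windows, which matches the forecasting setting described in the text.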
We now focus on a single product category, face masks, to examine the regional and long-term forecasting capability. To examine province-level results, we choose nine representative provinces in China considering geography, distance from Hubei, and confirmed cases. We train each province-specific model with its own confirmed cases and searching records to predict its citizens’ needs.
Footnote: We show the full results for all of the major product categories in the Appendix.
Table 3: The NRMSE performance of baseline methods and our model measured in nine representative provinces in China. Provinces nearby Hubei are marked with the ∗ sign.
Model | Beijing | Shanxi | Jilin | Zhejiang∗ | Shanghai | Sichuan∗ | Guangdong | Hunan∗ | Henan∗
AR | 0.521 | 0.990 | 0.686 | 0.454 | 0.597 | 0.432 | 0.990 | 0.398 | 0.563
ARIMA | 0.442 | 0.491 | 0.536 | 0.346 | 0.439 | 0.340 | 0.589 | 0.350 | 0.345
Seq2seq | 0.338 | 0.403 | 0.579 | 0.267 | 0.353 | 0.314 | 0.723 | 0.294 | 0.310
XGBoost | 0.621 | 0.481 | 0.860 | 0.284 | 0.493 | 0.354 | 0.672 | 0.376 | 0.412
XGBoost-C | 0.463 | 0.444 | 0.628 | 0.251 | 0.452 | … | … | … | …
EnCod | 0.315 | 0.334 | 0.349 | 0.245 | 0.278 | … | … | … | …
Table 4: The NRMSE performance for long-term prediction.
Model | n = 1 | n = 3 | n = 5 | n = 7 | n = 10 | n = 14
AR | … | … | … | … | … | …
EnCod | … | … | … | … | … | …
Table 3 shows the comparison of regional forecasting. Even when learning is fine-tuned over province-level data, EnCod still outperforms baselines in most cases. Compared with other provinces, the EnCod models for Hunan, Henan, Sichuan, and Zhejiang, which are near Hubei province, deliver relatively low prediction error. A possible explanation is that the epidemic influenced these areas more strongly, so adding COVID-19 statistics is more helpful and leads to better e-commerce behavior predictions.
Next, to test how well EnCod performs for longer-period prediction, we increase the forecast horizon by changing the n value to {1, 3, 5, 7, 10, 14}. Table 4 displays the results from this long-term prediction. For the immediate-day prediction (n = 1), the AR model performs better. For n > 1, EnCod consistently outperforms all baselines by a substantial margin. In contrast, the AR model performs poorly for longer-term prediction, reaching an NRMSE value above 0.5 for predictions of a week or longer. The ability to look beyond a week makes the proposed EnCod model practical and applicable to the study of future epidemics.
To summarize, the above results verify that our method can be applied to global, local, and long-term forecasts of the demand for essential goods, which is crucial and meaningful in the pandemic period. Overall, our method can effectively improve forecasting performance, whether the historical records are sparse or dense and however far ahead the forecast extends, which shows the model’s utility and robustness.
Discussion and Conclusion
The COVID-19 epidemic has exacerbated difficulties in supplying enough essential products to meet demand and, consequently, has influenced people’s activities on the e-commerce platform. This paper conducts extensive data analysis to investigate how people’s online shopping behaviors respond to the epidemic and to discover different behavioral patterns. We find that the disruption in the supply of essential goods in this period led to changes in shopping actions (e.g., a positive rank change of search action for face masks and other epidemic-relevant products). Therefore, we incorporate the epidemic development statistics into demand forecasting and present the EnCod model. The model is simple yet effective in forecasting the demand for COVID-19 related items during the epidemic.
The findings of this research have multiple implications. First, the product-level detailed shopping logs will be anonymized and released to the research community and serve as a critical data source to understand market disruptions during a pandemic. Second, health professionals and e-commerce marketers can utilize the model for predicting surges in the short-term demand for particular goods under risks from an epidemic. Third, policymakers can review the most relevant products identified in this research to understand households’ needs. Fourth, the encoder-decoder model (EnCod) can be utilized in domains beyond e-commerce (such as trade data) to review the impact of COVID-19 in other sectors.
In China, the proportion of the population using mobile e-commerce is relatively high, and the country went through the main period of COVID-19 epidemic development during the first quarter of 2020. However, it will be necessary to determine how this analysis can be repeated in other countries where the proportion of the population using mobile e-commerce may be lower, or where COVID-19 is still in progress.
In the post-COVID-19 era, inventory planning and pricing of goods will have to be decided based on multiple data sources, including user demands. Moreover, combining other modalities of mobile e-commerce (Cao et al. 2020; Chen et al. 2020) to assist demand forecasting, in consideration of cost-effectiveness, would also be an excellent direction to extend this work. As the first to identify the impact of COVID-19 from this perspective of mobile e-commerce, we believe that this study makes an important contribution to the community.
References
Alon, T. M.; Doepke, M.; Olmstead-Rumsey, J.; and Tertilt, M. 2020. The impact of COVID-19 on gender equality. Technical report, National Bureau of Economic Research.
Arora, N.; Charm, T.; Grimmelt, A.; Ortega, M.; Robinson, K.; Sexauer, C.; Staack, Y.; Whitehead, S.; and Yamakawa, N. 2020. A global view of how consumer behavior is changing amid covid-19. URL http://tinyurl.com/y83vtejl.
Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv Preprint arXiv:1409.0473.
Baker, S. R.; Bloom, N.; Davis, S. J.; Kost, K. J.; Sammon, M. C.; and Viratyosin, T. 2020. The unprecedented stock market impact of COVID-19. Technical report, National Bureau of Economic Research.
Cao, H.; Chen, Z.; Xu, F.; Wang, T.; Xu, Y.; Zhang, L.; and Li, Y. 2020. When Your Friends Become Sellers: An Empirical Study of Social Commerce Site Beidian. In Proceedings of the 14th International Conference on Web and Social Media, 83–94.
Chang, H.-H.; and Meyerhoefer, C. 2020. COVID-19 and the Demand for Online Food Shopping Services: Empirical Evidence from Taiwan. Technical report, National Bureau of Economic Research.
Chen, T.; and Guestrin, C. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
Chen, Z.; Cao, H.; Xu, F.; Cheng, M.; Wang, T.; and Li, Y. 2020. Understanding the Role of Intermediaries in Online Social E-commerces: an Exploratory Study of Beidian. In Proceedings of the ACM on Human Computer Interaction.
Cohen, J. 2020. Scientists are racing to model the next moves of a coronavirus that’s still hard to predict. URL http://doi.org/10.1126/science.abb2161.
Contreras, J.; Espinola, R.; Nogales, F. J.; and Conejo, A. J. 2003. ARIMA models to predict next-day electricity prices. IEEE Transactions on Power Systems.
Cell
Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 3443–3450.
Maier, B. F.; and Brockmann, D. 2020. Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China. Science.
Emerging infectious diseases
Time Series Techniques for Economists. Cambridge University Press.
Omar, H.; Hoang, V. H.; and Liu, D.-R. 2016. A hybrid neural network model for sales forecasting based on ARIMA and search popularity of article titles. Computational Intelligence and Neuroscience.
Neurocomputing
The American Journal of Medicine
Physics Letters A
Advances in Neural Information Processing Systems, 3104–3112.
Tian, H.; Liu, Y.; Li, Y.; Wu, C.-H.; Chen, B.; Kraemer, M. U.; Li, B.; et al. 2020. An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China. Science.
Science
Appendix
This is the supplementary material for submission 4638 to AAAI-21. We share the per-product-level browsing, sales, and search logs at https://bit.ly/3kcAEN5. This data, containing before-and-after e-commerce logs, will be of great value to the research community for studying the impact of a health risk such as COVID-19.
List of Top-Products Affected by COVID-19
The Modeling Market Descriptions section revealed which items saw large disruptions in purchasing during COVID-19. Table 5 and Table 6 display the top-20 and bottom-20 products with the highest and lowest relative popularity, respectively. As we aggregate data by week, the week of January 20 contains events related to both the Chinese New Year holiday and the COVID-19 lockdown impact.
The relative ranking RP(c, t) of COVID-19 related products became very high at the end of January. In the peak period especially, the corresponding absolute rankings of masks, disinfectants, daily necessities, and hand sanitizers were very high among thousands of items on the Beidian platform: top-1 during the week of January 20, top-2 during the week of February 3, top-1 during the week of January 27, and top-8 during the week of January 27, respectively.
Evaluation Metrics
Among the evaluation metrics we employ, here we describe how we compute the Normalized Root Mean Square Error (NRMSE) (Rocha, Cortez, and Neves 2007):
\[ \mathrm{NRMSE}(y, \hat{y}) = \frac{\mathrm{RMSE}(y, \hat{y})}{y_{\max} - y_{\min}}, \tag{2} \]
where $N$ is the number of test samples over which the RMSE is computed, $y_i$, $i \in [1, N]$ represents the ground truth, and $\hat{y}_i$, $i \in [1, N]$ represents the predicted values.
Time-Lagged Cross-Correlation Results
The Modeling Market Descriptions section showed results on time-lagged cross-correlation (TLCC). The method shifts two sequences relative to each other in time (time-lagged) and calculates the correlation between them (cross-correlation). Therefore, it can analyze non-stationary time series and quantify how the three shopping actions (purchasing, browsing, and searching) respond to a pandemic. We use a rolling window of 21 days, and the lag between the two series is set from 0 to 6 days.
The trends seen in browsing and searching are similar to that of purchasing for snacks in Figure 7. Because the supply has not been affected, the behavioral response during the epidemic is similar across different action types.
[Figure 7 panels: (a) Purchasing on snacks, (b) Browsing on snacks, (c) Searching on snacks. Vertical axis: Offset/day; scale ticks from −0.8 to 0.8.]
Figure 7: The TLCC results of searching and browsing behaviors on snacks.
Table 5: Top-20 products with the highest relative popularity. The Peak time column displays the first day of the examined week that reached the largest rank change (i.e. arg min_t rank.(t)). Note that some of the purchases in the week of January 20th to 26th may be related to the new year's holiday. The city of Wuhan went into lockdown on January 23rd.
| Product category | max_t RP(c, t) | Purchasing: rank.(t) | Purchasing: rank. | Purchasing: Peak time | Browsing: rank.(t) | Browsing: rank. | Browsing: Peak time | Searching: rank.(t) | Searching: rank. | Searching: Peak time |
|---|---|---|---|---|---|---|---|---|---|---|
| Masks | 2.44 | 273 | 1 | Jan 20 | 341 | 1 | Jan 20 | 253 | 1 | Jan 20 |
| Disinfectants | 2.35 | 448 | 2 | Feb 3 | 607 | 2 | Feb 3 | 476 | 16 | Feb 10 |
| Face towels | 2.20 | 943 | 6 | Jan 20 | 993 | 14 | Jan 20 | 1234 | 406 | Jan 20 |
| Jewelry | 2.19 | 466 | 3 | Jan 20 | 398 | 4 | Jan 20 | 402 | 9 | Jan 20 |
| Accessories | 1.95 | 177 | 2 | Jan 20 | 180 | 2 | Jan 20 | 201 | 3 | Jan 20 |
| Mango | 1.86 | 364 | 5 | Mar 2 | 528 | 14 | Feb 17 | 805 | 76 | Mar 2 |
| Daily necessities | 1.80 | 63 | 1 | Jan 27 | 66 | 3 | Jan 27 | 66 | 4 | Feb 3 |
| Hand sanitizers | 1.71 | 408 | 8 | Jan 27 | 565 | 4 | Feb 10 | 427 | 20 | Feb 3 |
| Online courses for children | 1.70 | 602 | 12 | Jan 20 | 407 | 59 | Jan 20 | 472 | 307 | Jan 20 |
| Children masks | 1.62 | 784 | 19 | Jan 27 | 783 | 45 | Jan 27 | 609 | 7 | Jan 27 |
| Outdoor toys | 1.48 | 573 | 19 | Feb 10 | 501 | 25 | Feb 10 | 886 | 574 | Feb 10 |
| Disposable utensils | 1.36 | 530 | 23 | Feb 3 | 746 | 37 | Feb 3 | 439 | 32 | Feb 3 |
| Tiny bottle sanitizers | 1.33 | 958 | 45 | Feb 10 | 745 | 84 | Feb 10 | 981 | 972 | Feb 17 |
| Top-up for online entertainment | 1.32 | 710 | 34 | Jan 20 | 711 | 61 | Jan 20 | 845 | 258 | Jan 20 |
| Thermometers | 1.30 | 575 | 29 | Jan 27 | 675 | 31 | Jan 27 | 534 | 24 | Feb 3 |
| Vitamins | 1.22 | 167 | 10 | Jan 27 | 214 | 13 | Jan 27 | 205 | 12 | Jan 27 |
| Wet tissue | 1.18 | 46 | 3 | Feb 3 | 117 | 3 | Feb 3 | 84 | 13 | Feb 3 |
| Pineapple | 1.16 | 892 | 62 | Feb 24 | 994 | 84 | Feb 24 | 1068 | 110 | Feb 24 |
| Children hats, scarves, gloves | 1.10 | 316 | 25 | Jan 20 | 251 | 48 | Jan 27 | 132 | 12 | Jan 20 |
| Root vegetables | 1.08 | 48 | 4 | Feb 24 | 115 | 12 | Feb 24 | 168 | 24 | Feb 10 |
| Product category | min_t RP(c, t) | Purchasing: rank.(t) | Purchasing: rank. | Purchasing: Valley time | Browsing: rank.(t) | Browsing: rank. | Browsing: Valley time | Searching: rank.(t) | Searching: rank. | Searching: Valley time |
|---|---|---|---|---|---|---|---|---|---|---|
| Flavored milk | -1.49 | 7 | 215 | Jan 27 | 33 | 266 | Jan 27 | 41 | 131 | Feb 24 |
| Mixed nuts | -1.45 | 4 | 114 | Feb 3 | 2 | 91 | Feb 3 | 8 | 88 | Feb 24 |
| Cotton clothes | -1.31 | 32 | 653 | Feb 24 | 6 | 404 | Mar 2 | 4 | 357 | Mar 2 |
| Chocolate | -1.17 | 11 | 164 | Feb 10 | 25 | 223 | Feb 17 | 37 | 264 | Mar 2 |
| Decorations | -1.16 | 81 | 1171 | Feb 3 | 203 | 991 | Feb 3 | 356 | 895 | Feb 24 |
| Chinese wine | -1.08 | 36 | 429 | Feb 17 | 28 | 349 | Feb 17 | 30 | 295 | Mar 2 |
| Warming clothes | -1.07 | 29 | 341 | Mar 2 | 26 | 264 | Mar 2 | 112 | 441 | Mar 2 |
| Yogurt | -1.07 | 22 | 260 | Jan 27 | 23 | 286 | Jan 27 | 48 | 164 | Jan 27 |
| Down jackets | -1.07 | 40 | 467 | Feb 24 | 1 | 155 | Mar 2 | 1 | 101 | Mar 2 |
| Boots | -1.03 | 44 | 477 | Feb 24 | 5 | 176 | Feb 24 | 3 | 148 | Feb 24 |
| Melon seeds | -1.02 | 19 | 199 | Jan 20 | 63 | 261 | Jan 27 | 55 | 249 | Jan 20 |
| Snow boots | -0.94 | 104 | 918 | Feb 24 | 42 | 554 | Feb 24 | 69 | 550 | Mar 2 |
| Red packets | -0.91 | 172 | 1386 | Mar 2 | 145 | 1377 | Mar 2 | 142 | 1240 | Mar 2 |
| Utensils | -0.90 | 106 | 844 | Feb 3 | 71 | 858 | Feb 10 | 683 | 960 | Feb 24 |
| Thermal underwear | -0.84 | 28 | 194 | Feb 24 | 29 | 186 | Mar 2 | 20 | 142 | Mar 2 |
| Traditional snacks | -0.83 | 14 | 94 | Jan 27 | 45 | 120 | Jan 27 | 27 | 96 | Jan 20 |
| Apples | -0.83 | 3 | 20 | Jan 27 | 21 | 79 | Jan 27 | 61 | 127 | Jan 20 |
| Snow boots for mom | -0.82 | 180 | 1197 | Feb 24 | 99 | 949 | Mar 2 | 64 | 769 | Mar 2 |
| Couplets | -0.81 | 233 | 1509 | Mar 2 | 184 | 1564 | Mar 2 | 268 | 1538 | Mar 2 |
| Oranges | -0.76 | 6 | 35 | Mar 2 | 34 | 94 | Jan 27 | 78 | 218 | Jan 27 |
Table 6: Bottom-20 products with the lowest relative popularity. Valley time represents when the lowest ranking is reached (i.e. arg max_t rank.(t)).
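The two computations described in this appendix, the NRMSE of Equation (2) and the rolling time-lagged cross-correlation with 21-day windows and lags of 0 to 6 days, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names nrmse and rolling_tlcc are hypothetical, y_max and y_min are assumed to be the maximum and minimum of the ground truth, Pearson correlation is assumed as the correlation measure, and the lag direction (the second series trailing the first) is an assumption, since the text does not specify these details.

```python
import numpy as np


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the ground truth, as in Equation (2).

    Assumption: y_max and y_min are taken over the ground-truth values.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())


def rolling_tlcc(x, y, window=21, max_lag=6):
    """Rolling time-lagged cross-correlation between two daily series.

    Entry [w, k] is the Pearson correlation between x[w : w + window] and
    y[w + k : w + k + window], i.e. y is assumed to trail x by k days.
    Windows with zero variance are left as NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_windows = len(x) - window - max_lag + 1
    if n_windows <= 0:
        raise ValueError("series too short for this window and lag range")

    out = np.full((n_windows, max_lag + 1), np.nan)
    for w in range(n_windows):
        xs = x[w:w + window]
        for lag in range(max_lag + 1):
            ys = y[w + lag:w + lag + window]
            if xs.std() > 0 and ys.std() > 0:
                out[w, lag] = np.corrcoef(xs, ys)[0, 1]
    return out
```

Under these assumptions, calling rolling_tlcc with a daily epidemic statistic as x and a daily shopping-action series as y returns one row per window and one column per lag. The column with the largest value in a row suggests how many days the shopping series trails the epidemic series within that window.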