[PDF] The impact of social influence in Australian real-estate: market forecasting with a spatial agent-based model

Abstract

Housing markets are inherently spatial, yet many existing models fail to capture this spatial dimension. Here we introduce a new graph-based approach for incorporating a spatial component in a large-scale urban housing agent-based model (ABM). The model explicitly captures several social and economic factors that influence the agents' decision-making behaviour (such as fear of missing out, their trend following aptitude, and the strength of their submarket outreach), and interprets these factors in spatial terms. The proposed model is calibrated and validated with the housing market data for the Greater Sydney region. The ABM simulation results not only include predictions for the overall market, but also produce area-specific forecasting at the level of local government areas within Sydney as arising from individual buy and sell decisions. In addition, the simulation results elucidate agent preferences in submarkets, highlighting differences in agent behaviour, for example, between first-time home buyers and investors, and between both local and overseas investors.


The impact of social influence in Australian real-estate: market forecasting with a spatial agent-based model

Benjamin Patrick Evans, Kirill Glavatskiy, Michael S. Harré, Mikhail Prokopenko
Centre for Complex Systems, The University of Sydney, Sydney, NSW, Australia

Abstract

Housing markets are inherently spatial, yet many existing models fail to capture this spatial dimension. Here we introduce a new graph-based approach for incorporating a spatial component in a large-scale urban housing agent-based model (ABM). The model explicitly captures several social and economic factors that influence the agents’ decision-making behaviour (such as fear of missing out, their trend following aptitude, and the strength of their submarket outreach), and interprets these factors in spatial terms. The proposed model is calibrated and validated with the housing market data for the Greater Sydney region. The ABM simulation results not only include predictions for the overall market, but also produce area-specific forecasting at the level of local government areas within Sydney. In addition, the simulation results elucidate movement patterns across submarkets, in both spatial and homeownership terms, including renters, first-time home buyers, as well as local and overseas investors.

Within economic markets, housing markets are unique for a variety of reasons. The combination of durability, heterogeneity and spatial fixity amplifies the role of the dwellings’ perceived value and the buyers and sellers’ expectations (Alhashimi and Dwyer, 2004). The extremely high cost of entry and exit into the market (with moving fees, agent fees, etc.) further complicates the decision-making of participating households (Huang and Ge, 2009). There are long time delays in the market response as houses cannot be erected instantaneously to accommodate an increase in demand (Bahadir and Mykhaylova, 2014). The fact that real-estate can be seen as both an investment asset and a consumption good (Piazzesi et al., 2007) (and even a status good, Wei et al. (2012)) magnifies the impact of social influence on both decision-making and resultant market dynamics and structure.

Consequently, housing markets are notoriously difficult to model as the ensuing market dynamics generate volatility, with non-linear responses, and “boom-bust” cycles (Sinai, 2012; Miles, 2008; Burnside et al., 2016), making traditional time-series analysis insufficient. Non-linear dynamics of housing markets are ubiquitous, being observed throughout the world, from Tokyo (Shimizu et al., 2010) to Los Angeles (Cheng et al., 2014).

The authors are thankful to Paul Ormerod, Adrián Carro and Markus Brede for many helpful discussions of the baseline model. The authors acknowledge the HPC service at The University of Sydney for providing HPC resources that have contributed to the research results reported within this paper. The authors would also like to acknowledge the Securities Industry Research Centre of Asia-Pacific (SIRCA) and CoreLogic, Inc. (Sydney, Australia) for their data on Greater Sydney housing transactions.

arXiv [q-fin.CP], Sep

Traditional economic modelling methods, such as dynamic stochastic general equilibrium models (DSGE), typically use representative aggregated agents while making strong assumptions about the behaviour of the markets (rational and perfect competition). Such representative agents may be limiting for economic models (Gallegati and Kirman, 1999). Furthermore, these assumptions (and many other traditional economic assumptions) are known to be inadequate in housing markets, motivating a well-recognised need for change in housing market modelling (McMaster and Watkins, 1999). In addressing this need, a specific type of model, the agent-based model (ABM), has been applied. ABMs aim to capture markets from the “bottom-up” (Tesfatsion, 2002), i.e., by focusing on the decision-making of individual agents in the market, possibly influenced by non-economic factors. In this sense, ABMs are capable of modelling macroeconomies from micro (i.e., agent-specific) behaviour (LeBaron and Tesfatsion, 2008) and analysing the economic decision making in counter-factual settings. While ABMs have shown promise in housing market modelling (Geanakoplos et al., 2012) (and wider economic modelling, Poledna et al. (2019)), current ABMs themselves are not exempt from some limitations.
Many of the existing housing ABMs tend to introduce at least one of the following constraints: the spatial structure of markets is neglected, perfect information is still assumed, and/or the impact of social influence on decision making of individual agents is underestimated.

A fine-resolution model of spatiotemporal patterns within such markets is desirable: it would give an understanding of how market dynamics shape within local areas, explaining how the pricing structure directly affects the agents’ mobility over time (i.e., by forcing households out of certain regions due to gentrification and higher cost of living). Furthermore, within such housing markets (and in fact, many economic markets, Conlisk (1996)), it is also known that agents do not act perfectly rationally (Wang et al., 2018), instead following bounded rationality (Simon, 1957, 1955). Firstly, humans are often influenced by social pressure (i.e., herd mentality), with decisions being made purely based on social pressure rather than a perfectly rational choice. Secondly, it is difficult to process all the relevant information in the market (i.e., it is impractical for an agent to be able to view every dwelling listing within the housing market). Thus, it is unreasonable to assume that agents act in a perfectly rational manner, yet this is what many current housing market models assume (despite ABMs not intrinsically requiring these assumptions to be made).

To address these limitations, we introduce a spatial agent-based model, in which the constraints imposed by various search and mobility costs create effective spatial submarkets. These submarkets are modelled graph-theoretically, with a graph-based component used in both representing imperfect information and modulating social influences. The spatial ABM is then capable of capturing the “boom-bust” cycles observed in Australia (in particular, Greater Sydney) over the last 15 years.
It succeeds in forecasting nonlinear pricing and mobility trends within specific submarkets and local areas. In exploring the pricing dynamics, we focus on the influence of imperfect spatial information and the role of social influence on agent decision making. In doing so, we identify the salient parameters which drive the overall dynamics, and pinpoint the parameter thresholds, beyond which the resultant dynamics exhibit strong nonlinear responses. These thresholds allow us to distinguish between different configurations of the market (e.g., markets with supply or demand dominating). In addition, we identify and trace specific interactions of parameters, in particular the interplay of social influences, such as the fear of missing out and the trend following aptitude, in presence of imperfect spatial information.

The remainder of the paper is organised as follows. In Section 2 we provide an overview of agent-based models of housing markets. In Section 3 we outline the baseline model, while in Section 4 we outline the proposed spatial extensions and new parameters. In Section 5 we analyse the sensitivity and parameters of the model, before presenting the results and discussion in Section 6. In Section 7 we provide conclusions and highlight future work.

2 Background

One of the pioneering works for agent-based modelling (ABM) of the housing markets was by Geanakoplos et al. (2012) (and extended further in Axtell et al. (2014); Goldstein (2017)), where the Washington DC market was modelled from 1997–2009 in an attempt to understand the housing boom and crash. Macroeconomic experiments were then conducted to see how changing underlying factors, such as interest rates or leverage rates, would affect this pricing trend.

Baptista et al. (2016) model the UK housing market to see the effects that various macroprudential policies have on price cycles and price volatility. Gilbert et al. (2009) also look at the English housing market, varying exogenous parameters and policies, and tracking the effect these have on median house prices. Likewise, Carstensen (2015) explores the Danish housing market and macroprudential regulations, such as income and mortgage rate shocks.

Ge (2017) analyses how housing market bubbles can form (and bust) purely endogenously without external shocks, due to leniency and speculation of agents. Kouwenberg and Zwinkels (2015) also show an ABM can “endogenously produce boom-and-bust cycles even in the absence of fundamental news”.

A recent ABM of the Australian housing market proposed by Glavatskiy et al. (2020) explained the volatility of prices over three distinct historic periods, characterised by either steady trends or trend reversals and price corrections. This model highlighted the role of the agents’ trend-following aptitude in accurately generating distinct price dynamics, as detailed in Section 3. In this paper, we further develop this model by introducing several features directly capturing social influences and bounded rationality in decision making, as elaborated in Section 4.

Traditionally, the modelling goal is to explain the housing market pricing (rather than predict its trajectory), and trace how possible macroeconomic policy changes may have affected the dynamics.
Here, instead, we focus on predicting the pricing dynamics beyond the period covered by the current datasets (i.e., presenting out-of-sample forecasting). This motivation is aligned with the growing suggestions that agent-based models should be predictive (Polhill, 2018) (which has admittedly been met with some resistance, Edmonds and ní Aodha (2018)). Agent-based models have recently been shown to outperform traditional economic models, such as vector autoregressive models and DSGE models, for out-of-sample forecasting of macro-variables (GDP, inflation, interest rates etc.) (Poledna et al., 2019). For example, it was demonstrated that an ABM can outperform standard benchmarks for out-of-sample forecasting in the US housing market (Kouwenberg and Zwinkels, 2014), while successful out-of-sample forecasting was carried out by Geanakoplos et al. (2012) as well.

Spatial distribution of houses and dependencies between market trends on spatial patterns have been recognised as important and desirable features (Goldstein, 2017). For example, Baptista et al. (2016) describe the spatial component as one that is “highly-desirable”, yet “this approach greatly increases the complexity of the models and hence most spatial ABMs in the field listed below make use of a highly simplified representation of the environment, often in the shape of small grids”.

Spatial agent-based models have also been shown to be useful in a variety of other areas such as epidemic modelling (Chang et al., 2020; Cliff et al., 2018), cooperative behaviour (Power, 2009), and symbiotic processes (Raimbault et al., 2020). Despite the promise shown by housing ABMs, there are currently only relatively few spatial housing market models with the capacity to accurately forecast nonlinear price dynamics.

The seminal works in spatial housing ABMs are by Ge (2013) and Ustvedt (2016). Both use a matrix-based approach, with the region being arranged on a 2-dimensional grid. In Ge (2013), each cell (row/column) in the grid is assigned a neighbourhood quality (endogenous) and a nature quality (exogenous). The neighbourhood quality is a measure of attractiveness which aims to capture concepts such as safety, and is dependent on agents that live in that region (which can change in the model, thus endogenous). In contrast, nature quality is based on outside factors not changed by the model, such as distance to a beach or weather (thus exogenous). Data used in this work is abstract, that is, it is not calibrated to a particular city, but rather used to trace how these factors affect the trends.

Ustvedt (2016) also use a 2-dimensional grid for a NetLogo model.
However, an important additional spatial step is made: district borders are incorporated using GIS data (somewhat similar to what we propose in our model with the graph-based approach, however, there are important differences which we outline below), and the model is calibrated based on Oslo, Norway.

Another work is Pangallo et al. (2019), which models theoretical (i.e., not calibrated to any specific region) income segregation and inequality, using a spatial agent-based model, and the effect this may have on house prices. Again, this approach uses a 2-dimensional grid for the spatial component. This model assumes a monocentric city, and measures the “attractiveness” of a location, based on the distance to the (generic) city centre. The main contribution of this work is a mathematically tractable spatial model for capturing income segregation. The effects are related to the house prices, where unequal income is shown to lower the house price globally.

Existing Limitations

Some important spatial factors are not considered in the existing spatial models. For example, the spatial contribution is limited to identifying the supply of dwellings for a given location and population, and initial pricing in an area (dwelling qualities in Ge (2013) and initial price in Ustvedt (2016)). One important factor that is missed in both models is the calibration of agent (buyers/sellers) characteristics (such as income, wealth, etc.), based on the areas in which they reside.

Another factor of existing spatial models is the assumption of a monocentric city, meaning a distance metric such as “distance to centre” is used for measuring attractiveness, which becomes problematic for polycentric cities or with agents who have no desire to live within the “centre”. Furthermore, computing these distances in a 2D grid can often be misinformative, as moving across a region border (i.e., into a new zone) often incurs a far larger cost than moving a cell inward into the same zone. To address this limitation, we explicitly capture this feature in the proposed graph-based spatial extension. Distances are measured as the shortest path through the graph, with nodes representing various regions (contained within boundaries). The graph-based approach is particularly useful, as no monocentric assumption is made.

One further limitation is that individual distributions of pricing within areas are often neglected. Instead, some models use a representative mean or median rather than sampling from the actual underlying distributions for each area, which may fail to capture certain area trends. This is often caused by the lack of underlying data. In our work, we use several contemporary datasets, such as SIRCA-CoreLogic and the Australian Census datasets, constructing the relevant pricing probability density functions for each area.

In summary, in contrast to the grid-based approach, here we propose an extensible graph-based approach which is described in Section 4.
Such an approach allows us to further exploit the spatial component, by spatially scaling the likelihood of moving, based on distances between nodes in the graph. Also, we allow characteristics to be specified for all agents at a node level (i.e., for each region) rather than for the entire model. The graph-based approach does not assume a monocentric region, allowing for polycentric cities (which Greater Sydney is developing towards, Commission et al. (2018); Crosato et al. (2020)) to be modelled more effectively.

The Baseline Model

This work extends the work of Glavatskiy et al. (2020), which we will refer to as the “Baseline” method. In this section, we describe basic features of the baseline model, which the present work carries over.

There are three key agent types in the model: dwellings, households, and the bank.

A dwelling is a “physical” property, e.g. a house, apartment or condo. Each dwelling has an intrinsic quality, which reflects its hedonic value (e.g. large house or existence of a pool). The quality is fixed during the simulation. The quality of a dwelling is used as a reference for determining its listing price and dwelling payments. All dwellings have an owner. Dwellings can be rented or sold. A rental contract is a binding agreement between the owner and renting household. Dwellings may be vacant at any period (which means that they are not rented out).

A household represents a person or group of people (i.e., a family) residing within Greater Sydney. Additionally, the model contains overseas agents, which can participate in the market but do not reside in the Greater Sydney region. Households have heterogeneous monthly incomes and liquid cash levels. Households can own several dwellings, but can only reside in one (overseas agents do not reside in any dwelling). When purchasing a dwelling, households always choose the most expensive dwelling they can afford. If they can afford to buy a dwelling, they always attempt to do so, placing a market bid (see below). Households that own more than one dwelling attempt to rent the excess dwellings out, and, if successful, receive rental payments as a contribution to their liquid cash. Households that do not own a dwelling rent one. Households pay tax based on their income and ownership.

The bank combines the functions of a commercial bank and the regulatory body, controlling various financial characteristics, such as income tax rates, mortgage rates, overseas approval rates, mortgage approvals, and mortgage amounts (how much can be lent to a particular household).
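As a reading aid, the three agent types described above can be sketched as plain data records. This is only an illustrative sketch: the class names, field names and helper methods below are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dwelling:
    dwelling_id: int
    quality: float                   # intrinsic hedonic quality, fixed during the run
    owner_id: int                    # every dwelling has an owner
    tenant_id: Optional[int] = None  # None when the dwelling is not rented out


@dataclass
class Household:
    household_id: int
    monthly_income: float
    liquid_cash: float
    overseas: bool = False                          # overseas agents do not reside locally
    owned: List[int] = field(default_factory=list)  # ids of owned dwellings
    residence: Optional[int] = None                 # id of the dwelling lived in, if any

    def excess_dwellings(self) -> List[int]:
        """Owned dwellings the household does not live in (candidates for renting out)."""
        return [d for d in self.owned if d != self.residence]

    def must_rent(self) -> bool:
        """Local households that own no dwelling rent one."""
        return not self.overseas and not self.owned


@dataclass
class Bank:
    # Hypothetical field names for the characteristics the bank controls.
    mortgage_rate: float
    income_tax_rate: float
    overseas_approval_rate: float
```

For example, a household that owns two dwellings and lives in one of them has exactly one excess dwelling to rent out, while a local household with no dwellings must rent.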

The agents’ behaviour is governed by the price they are willing to sell their dwelling for, the listing price, and the price they can afford to buy a new dwelling, the bid price.

The household bid price, i.e., their desired expenditure, is modulated by the household’s monthly income $I[t]$ according to Eq. (1).

$$B[t] = U_b[t]\, H\, \frac{\phi_b\, I[t]^{\phi_I}}{\phi_M[t] + \phi_H - h \cdot \Delta HPI[t]} \tag{1}$$

Here $U_b[t]$ is the urgency of a household to buy a dwelling, which is equal to 1 if the household has not recently sold any dwelling, and is larger than 1 by a term proportional to the number of months since the last sale otherwise. $H$ is a heterogeneous factor, set to a random value between $1 \pm b_h/2$, where $b_h = 0.\ldots$. The parameters $\phi_I$ and $\phi_b$ are the income modulating parameters, which are calibrated from the mortgage-income regression for the period of interest. In addition, $\phi_M[t]$ is the mortgage rate at time $t$, while $\phi_H$ is the annual household maintenance cost. Finally, $h$ is the trend-following aptitude and $\Delta HPI[t]$ is the change in house price index (HPI) over the previous year.

The bid price of the overseas investors is determined as an average, given the total volume and quantity of the approved overseas investments by the Foreign Investment Review Board. Several aspects of the overseas investments are detailed in Appendix I.1.

The dwelling list price $P[t]$ at time $t$ is modulated by the quality of the dwelling $Q$ according to Eq. (2).

$$P[t] = b_\ell\, H\, \frac{Q_h\, S[t]^{b_s}\, (1 + D_h[t])^{b_d}}{U_\ell[t]} \tag{2}$$

Here $b_\ell = 1.75$ is the listing greed factor, showing the extent to which the seller tends to increase the listing price. $H$ is a heterogeneous factor. Furthermore, $Q_h$ is the average sale price of the 10 dwellings with the most similar quality to the dwelling $h$ for sale. In addition, $S[t]$ is the market average of the sold-to-list price ratio, $b_s = 0.22$ is the sold-to-list exponent parameter, $D_h[t]$ is the number of months the dwelling has been on the market, and $b_d = -0.01$ is the number-of-months exponent parameter. Finally, $U_\ell[t]$ is the urgency to sell the dwelling, which is equal to 1 if the household is not in financial stress, and increases proportionally to the number of months in financial stress otherwise.

Banks approve households’ desired expenditure based on the bank’s lending criteria. The bank uses the household’s liquidity and monthly income for determining an appropriate amount to lend and offers the corresponding loan to the household. If the loan amount is greater than $0.\ldots \times B[t]$ then the household accepts the loan; otherwise, the household skips this round of the market.

The model runs in sequential steps, with each step representing one month of actual time. During every step, the model makes several market updates:

1. The city demographics are updated (new dwellings and households created to match the actual numbers).
2. Each household receives income and pays its living costs: non-housing expenses, maintenance fees and taxes (if owning a dwelling), rent (if renting). The balance is added to or subtracted from the household’s liquid cash.
3. Expiring rental contracts are renewed.
4. Dwellings are placed on sale.
5. Households put their bids for buying.
6. The buyers and sellers are matched (described below).
7. The households receive mortgages and mortgage contracts from the bank, and the balance sheets of both buyers and sellers are updated.

To match buyers and sellers, bids and listings are sorted in descending order. Each listed dwelling is then matched against the highest bid. If the bid price is higher than the list price, then the deal is made with a probability of 80%. Otherwise, the listing is considered unattended and the next listing attempts to match. The pseudo-code for the process is given in Appendix B.
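The matching rule described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the pseudo-code of Appendix B: the function name `match_market` and its arguments are hypothetical, and we assume that a matched buyer leaves the round while an unattended listing leaves the highest bid in place for the next listing.

```python
import random
from typing import Callable, Dict, List, Tuple


def match_market(
    bids: Dict[str, float],
    listings: Dict[str, float],
    deal_probability: float = 0.8,
    rand: Callable[[], float] = random.random,
) -> List[Tuple[str, str]]:
    """Return (dwelling, buyer) pairs matched in one monthly round."""
    # Sort bids and listings in descending order of price.
    open_bids = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    open_listings = sorted(listings.items(), key=lambda kv: kv[1], reverse=True)

    deals: List[Tuple[str, str]] = []
    for dwelling, list_price in open_listings:
        if not open_bids:
            break  # no buyers left
        buyer, bid_price = open_bids[0]  # current highest bid
        # A deal needs a bid above the list price and a successful 80% draw.
        if bid_price > list_price and rand() < deal_probability:
            deals.append((dwelling, buyer))
            open_bids.pop(0)  # assumption: the matched buyer leaves the round
        # Otherwise the listing is unattended and the next listing is tried.
    return deals
```

Passing `rand` explicitly keeps the sketch reproducible: with `rand=lambda: 0.0` every eligible pair trades, which makes the sorting and pairing logic easy to inspect in isolation.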

In this section, we develop the spatial component of an ABM housing market. Specifically, we investigate how social aspects influence selling a dwelling, as well as account for the agents’ preferences to buy in a nearby neighbourhood when purchasing a dwelling. The spatial component also affects how households are initialised (i.e., what neighbourhood they should belong to), and how the prices of nearby dwellings may affect the listing price.

Greater Sydney is composed of 38 Local Government Areas (LGAs), each of which contains several suburbs (and postcode areas). The data provides sales at a postcode level and the LGA level. However, the postcode data may be too granular as the number of listings in a given time period for small areas could be low or even zero. For this reason, we analyse the data at the LGA level, but the proposed approach is general and can be used at any level of granularity (i.e., over countries, states, cities, government areas, postcodes, suburbs, or even individual streets) assuming the data is available. The LGAs are visualised in Fig. 1a.

(a) LGA Boundary Map. LGAs shown: BAYSIDE, BLACKTOWN, BLUE MOUNTAINS, BURWOOD, CAMDEN, CAMPBELLTOWN, CANADA BAY, CANTERBURY-BANKSTOWN, CENTRAL COAST, CITY OF PARRAMATTA, CITY OF SYDNEY, CUMBERLAND, FAIRFIELD, GEORGES RIVER, HAWKESBURY, HORNSBY, HUNTERS HILL, INNER WEST, KU-RING-GAI, LAKE MACQUARIE, LANE COVE, LITHGOW, LIVERPOOL, MOSMAN, NORTH SYDNEY, NORTHERN BEACHES, PENRITH, RANDWICK, RYDE, STRATHFIELD, SUTHERLAND SHIRE, THE HILLS SHIRE, UPPER LACHLAN SHIRE, WAVERLEY, WILLOUGHBY, WINGECARRIBEE, WOLLONDILLY, WOOLLAHRA.

(b) LGA Network

Figure 1: Greater Sydney LGAs. On the left, we see the raw GIS data. On the right, the processed graph (with nodes scaled based on population size).

To incorporate the LGA areas into the ABM, the data must be converted to an appropriate data structure. This is achieved by converting the map (from Fig. 1a) into an undirected graph $G$, with equal edge weights of 1 (i.e., unweighted; the weighted extensions are discussed below), shown in Fig. 1b. In doing so, the topology of the spatial relationships of the suburbs is preserved but not the exact locations, i.e., the $x$, $y$ coordinates of latitude and longitude are not needed.

Each of the $N$ LGAs (shaded polygon) in Fig. 1a is converted into a vertex $v_i$, $i \in [1, \ldots, N]$. The cardinality of the set of all vertices $V$ is $|V| = 38$, corresponding to the 38 LGAs. Two LGAs $v_i$ and $v_j$ are adjacent to one another if they share a border (darker lines in Fig. 1a), and these borders are converted to the set of edges $E$, all of weight one, that form an adjacency matrix $G$ for the LGAs. Formally, there is a 2-D spatial region for Sydney (the map of Sydney), $\mathcal{M}$, composed of the $N$ non-overlapping LGAs that form a complete cover of Sydney, and each LGA shares at least one border with another LGA. $\mathrm{LGA}(x)$ associates a graph vertex $x$ with an LGA, and $\mathrm{Adj}(l_i, l_j)$, defined for two LGAs $l_i$ and $l_j$, is a function measuring the length of their common border in $\mathcal{M}$:

$$V = \{ v_i \mid i \in [1, \ldots, N],\ \mathrm{LGA}(v_i) \in \mathcal{M} \}, \tag{3}$$

$$E = \{ e_{i,j} = 1 \mid v_i, v_j \in V,\ i \neq j,\ \mathrm{Adj}(\mathrm{LGA}(v_i), \mathrm{LGA}(v_j)) > 0 \}. \tag{4}$$

This definition implies that $G(E, V)$ is a connected undirected graph, i.e., there are no disconnected subgraphs. An important edge is also added that represents the Sydney Harbour Bridge connecting Northern Sydney with the City of Sydney.

To calculate the distance between vertices $v_i$ and $v_j$, the minimum path length (i.e., the path with the lowest number of edges) between the two vertices is used, as edges are equally weighted: $\delta(v_i, v_j)$ denotes this shortest path. Because the edges have unit weighting, the shortest paths are found using a simple breadth-first search.
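The graph construction and the unit-weight distance $\delta$ described above can be sketched as follows. The border list uses made-up area labels (`A` to `E`) rather than real LGA adjacencies, and the function names `build_graph` and `shortest_path_lengths` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(borders: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected, unweighted adjacency structure from shared borders."""
    graph: Dict[str, Set[str]] = {}
    for a, b in borders:
        if a == b:
            continue  # an area does not border itself (i != j in Eq. 4)
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_lengths(graph: Dict[str, Set[str]], source: str) -> Dict[str, int]:
    """Breadth-first search: number of edges from `source` to every reachable node."""
    if source not in graph:
        raise KeyError(f"unknown node: {source}")
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:  # first visit gives the shortest distance
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Toy map: a chain A-B-C-D plus an extra link A-E (for instance a bridge).
toy_borders = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
toy_graph = build_graph(toy_borders)
toy_dist = shortest_path_lengths(toy_graph, "A")
```

Because every edge has weight one, the first time breadth-first search reaches a node it has already found the shortest path, so no priority queue is needed.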
Future extensions could consider edge weightings based on metrics such as real distance between centroids, travel time between centroids, or even adding additional edges for public transport links. In cases of weighted edges, Dijkstra’s algorithm could be used to compute $\delta(v_i, v_j)$ instead.

Dwellings are allocated initial prices based on the distribution of recent sales within their LGA, and also populated according to the census data for dwellings in each LGA. A full description of the process is given in Appendix G.2. Households are also distributed into LGAs based on census data, with renters then moving to LGAs which they can afford, as described in Appendix C.

The original dwelling list pricing equation from Eq. (2) is also now updated to be based on each dwelling’s LGA. Rather than $Q_h$ being the average of the 10 most similar quality dwellings in the model, it is the average of the 10 most similar within the LGA. Likewise, $S[t]$ is the average sold-to-list price ratio for the dwelling’s LGA (not overall). In this sense, spatial submarkets (LGAs) (Watkins, 2001) capable of exhibiting their own dynamics are introduced. Recent research (Bangura and Lee, 2020) has shown the importance of submarkets in the Greater Sydney market, so capturing such microstructure is a key contribution of the proposed approach, as trends can be localised to specific submarkets (a feature not prominent in existing ABMs of housing markets).

In an actual housing market, a typical buyer does not review every listing in the entire city due to the high search costs and desire to live in certain areas. Rather, the buyer targets particular spatial sub-markets, relating to a given area. In particular, listings immediately around the buyer’s location are likely to be viewed with a higher probability than listings which are further away.

Therefore, it is unrealistic to assume perfect knowledge in an ABM of the housing market.
To model this imperfect spatial information, we introduce an outreach term $O$, which determines the likelihood for a buyer located at $v_i$ to view a listing located at $v_j$, as described by Eq. (5). According to this expression, the likelihood of viewing the listing decreases with the distance between the buyer and the listed dwelling. The outreach factor is illustrated in Fig. 2 for a buyer located in the LGA “City of Sydney”.

$$O(v_i, v_j) = 1 - \frac{\delta(v_i, v_j)}{\max_{k \in V} \delta(v_i, v_k)} \tag{5}$$

Figure 2: Probability of viewing a listing based on the buyer’s location (in this case the City of Sydney); panels (a) Overview and (b) Zoomed. Dark red indicates high probability, light yellow indicates low probability.

While the spatial outreach makes sense for first-time home buyers, for investors the outreach becomes uniform, as they do not necessarily desire rental properties near where they reside. So for investors, we use $O(v_i, v_j) = 1,\ \forall i, j$.

To control the strength of the outreach, we introduce a new positive parameter $\alpha$. The resulting probability of viewing a listing, $P_{view}(v_i, v_j)$, is given in Eq. (6),

$$P_{view}(v_i, v_j) = \alpha\, O(v_i, v_j) \tag{6}$$

where $\alpha$ modulates the spatial information on dwelling listings: the higher $\alpha$, the more listings are viewed by a potential buyer. That is, $\alpha$ adjusts the likelihood of viewing a listing, based on the distance to that listing.

In the baseline model, dwellings have a fixed probability of being listed, $p_b = 0.01$. Here, we consider a spatial probability to list a dwelling located in a certain LGA $l_i$, which depends on the number of recent sales in $l_i$. For this, we introduce the “fear of missing out” (FOMO) parameter, denoted by $\beta$, which modulates the probability of listing a dwelling situated in $l_i$ by the number of recent sales in $l_i$. In particular, if a high number of dwellings in $l_i$ have been sold, then the owners of dwellings in $l_i$ will be more likely to list their dwelling on the market. This will account for the possibility that if a certain LGA becomes a popular location for selling a dwelling, the owners of dwellings in this LGA would not want to miss an opportunity to sell their dwelling.

The spatial listing probability $p_{list}(l_i)$ is expressed by considering the difference in the fraction of dwellings currently listed in $l_i$’s submarket ($f_{l_i} = \mathrm{listings}_{l_i} / \mathrm{dwellings}_{l_i}$) with respect to the Greater Sydney average. A higher $f_{l_i}$ means dwellings in $l_i$ have been less likely to sell in the previous months (since all begin with a fixed listing probability $p_b$) compared to the region’s average. This spatial listing probability is given by Eq. (7).

$$p_{list}(l_i) = p_b + p_b\, \beta \left[ \frac{f_{l_i}}{\sum_{a \in V} f_a / |V|} - 1 \right] \tag{7}$$

The magnitude of $\beta$ controls the strength of $l_i$’s spatial submarket contribution to the listing probability. Rewriting Eq. (7) by denoting the term in the square brackets as $x$, i.e., $p_b + p_b \beta x$, we see that if both $\beta$ and $x$ are negative, then the listing probability will be higher than the baseline level $p_b$. Therefore, if the number of dwellings for sale in a particular LGA is less than the average in the whole city, this means that this particular LGA’s submarket has been clearing fast, and homeowners in this LGA will be more likely to list their dwelling. In contrast, if $x$ is positive, then dwellings in the current LGA are not clearing as fast as in the other LGAs, so homeowners in this LGA will be less likely to list.
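Eqs. (5) to (7) can be sketched numerically as follows. This is an illustrative sketch only: the function names and the dictionary-based inputs are assumptions, the distances are made-up values rather than real LGA distances, the fallback when no dwellings are listed anywhere is our own choice, and the cap to $[0, 1]$ follows the paper's note that $p_{list}$ is capped to be a true probability.

```python
from typing import Dict


def outreach(dist_from_buyer: Dict[str, int], target: str) -> float:
    """Eq. (5): 1 - delta(v_i, v_j) / max_k delta(v_i, v_k)."""
    max_dist = max(dist_from_buyer.values())
    if max_dist == 0:  # single-area graph: every listing is local
        return 1.0
    return 1.0 - dist_from_buyer[target] / max_dist


def view_probability(dist_from_buyer: Dict[str, int], target: str,
                     alpha: float, investor: bool = False) -> float:
    """Eq. (6): P_view = alpha * O, with O = 1 for investors."""
    o = 1.0 if investor else outreach(dist_from_buyer, target)
    return alpha * o


def listing_probability(listed_fraction: Dict[str, float], lga: str,
                        beta: float, p_b: float = 0.01) -> float:
    """Eq. (7): p_b + p_b * beta * (f_lga / mean(f) - 1), capped to [0, 1]."""
    mean_fraction = sum(listed_fraction.values()) / len(listed_fraction)
    if mean_fraction == 0:  # assumption: fall back to the baseline probability
        return p_b
    x = listed_fraction[lga] / mean_fraction - 1.0
    return min(1.0, max(0.0, p_b + p_b * beta * x))


# Made-up shortest-path distances from a buyer's area "A".
dist = {"A": 0, "B": 1, "C": 2, "D": 4}
# Made-up fractions of dwellings currently listed in two areas.
fractions = {"A": 0.01, "B": 0.03}
```

With a negative $\beta$, the area with fewer listings than average (`A`) gets a listing probability above $p_b$, and the area with more listings (`B`) gets one below it, matching the sign discussion around Eq. (7).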
A positive $\beta$, by contrast, results in the opposite effect. If an LGA has comparatively few listings, the homeowner from this LGA will be less likely to list a dwelling for sale, whereas if this LGA has many listings, the probability to list a dwelling there increases. In this way, $\beta$ has a direct effect on the supply of dwelling listings.

The selection of appropriate parameters is an important step in agent-based modelling, and in most existing work, parameters are selected over the entire period of interest. Here, we instead adopt a machine learning approach whereby we split the time series into a training and a testing portion. This ensures the model also performs well for the unseen (i.e., the testing) portion of data, and avoids biasing the selection of parameters by considering the entire time period.

We use Bayesian hyperparameter optimisation (Snoek et al., 2012) to find appropriate combinations of parameters. The training set is used for parameter selection, whereas the test set is only used to evaluate the goodness of fit of the model after the optimisation process is completed (in Section 6). We stress that the test set is never seen by the optimisation process. This is an important distinction from previous work (Glavatskiy et al., 2020), which constructs the models by considering all time points, and as such cannot be considered true predictions, unlike here where the model can be seen as a true predictor of future pricing trends. Optimisation details are given in Appendix D.
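The train/test discipline described above can be sketched as follows. To keep the example short, a plain search over a handful of candidate values stands in for Bayesian optimisation, and a squared-error loss stands in for the paper's loss function, which is not given in this section; the function names, the `simulate` callback and the toy series are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def chronological_split(series: Sequence[float], n_train: int) -> Tuple[List[float], List[float]]:
    """Split a time series into an earlier training part and a later test part."""
    if not 0 < n_train < len(series):
        raise ValueError("n_train must leave both parts non-empty")
    return list(series[:n_train]), list(series[n_train:])


def squared_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Stand-in loss (assumption): sum of squared differences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def fit_then_test(series: Sequence[float], n_train: int, candidates: Sequence[float],
                  simulate: Callable[[float, int], List[float]]) -> Tuple[float, float]:
    """Select a parameter on the training months only, then score it once on the test months."""
    train, test = chronological_split(series, n_train)
    # Parameter selection never looks at the test months.
    best = min(candidates, key=lambda h: squared_error(simulate(h, len(train)), train))
    # The held-out months are used only after selection is finished.
    forecast = simulate(best, len(series))[n_train:]
    return best, squared_error(forecast, test)


# Toy example: a linear price path and a one-parameter toy "model".
toy_series = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
toy_model = lambda h, n: [100.0 + h * t for t in range(n)]
best_h, test_loss = fit_then_test(toy_series, 4, [1.0, 2.0, 3.0], toy_model)
```

The point of the structure is that `test` is touched in exactly one place, after `best` has been fixed, which is what makes the reported test loss an out-of-sample measure.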

We run several optimisation processes in order to quantify the contribution of each component. Each method follows the same optimisation process. First, a “baseline” method is run, where there is only a single area (Greater Sydney), and only h is optimised (with perfect knowledge and no β optimisation). Second, a spatial version of the baseline is run, with the 38 Greater Sydney LGAs as areas; again, only h is optimised (with perfect knowledge and no β). We then run pairwise combinations: h and α, and h and β. We never run without optimising h, as this was the key tuning parameter in the original model. Finally, we optimise all three parameters (h, β, and α). We then apply a global constraint (on the result of the training optimisation) that 2006–2010 must exhibit a peak, with details outlined in Appendix D.3. The results are presented visually in Fig. 3.

Looking at the resulting plots, we can see that with the introduction of each new component, the value of the loss function ℓ (see Eq. (8)) over the training period is reduced at each step, with the proposed extensions achieving the minimal ℓ. From this point forward, we focus on the proposed extension in its entirety, due to the improved performance in all three periods.

Footnote (listing probability): $p_{list}$ can technically be < 0 or > 1, so $p_{list}$ is capped to lie between 0 and 1 in order to be a true probability, although this is exceptionally rare and does not appear to occur in Fig. 4.

Footnote (perfect knowledge): Perfect knowledge in this paper is assumed to mean α = 1, $O(v_i, v_j) = 1$, i.e., the ability to view every listing across all of Greater Sydney, i.e., M in 4.1.1.

[Figure 3 panels, with the training-period loss ℓ in parentheses]
Baseline. Optimising h only: (a) 2006–2010 (ℓ = 45726); (b) 2011–2015 (ℓ = 187549); (c) 2016–2019.
Spatial. Optimising h only: (d) 2006–2010 (ℓ = 41613); (e) 2011–2015 (ℓ = 100745); (f) 2016–2019 (ℓ = 392208).
Spatial. Optimising h and α: (g) 2006–2010 (ℓ = 37523); (h) 2011–2015 (ℓ = 81553); (i) 2016–2019 (ℓ = 236476).
Spatial. Optimising h and β: (j) 2006–2010 (ℓ = 43459); (k) 2011–2015 (ℓ = 20321); (l) 2016–2019 (ℓ = 23674).
Spatial. Optimising h, β and α: (m) 2006–2010 (ℓ = 39786); (n) 2011–2015 (ℓ = 20106); (o) 2016–2019 (ℓ = 22492).
Spatial. Optimising h, β and α with global constraint in 2006–2010: (p) 2006–2010 (ℓ = 64334); (q) 2011–2015 (ℓ = 20106); (r) 2016–2019 (ℓ = 22492).

Figure 3: Optimisation of the goodness of fit for dwelling prices across all models (for the training period). The orange lines are from the SIRCA-CoreLogic data (the solid line represents the rolling median, and the dotted line represents the month-to-month median). The black line shows the best fitted path from the model.

5.2 Resulting Parameters

Exploring the entire parameter search space would be computationally prohibitive. Bayesian optimisation intelligently explores this search space, balancing exploration and exploitation with the use of an acquisition function, allowing more emphasis to be placed on well-performing or unexplored regions of the space (Shahriari et al., 2015). The resulting exploration is visualised in Fig. 11. From this, we can see the regions of interest, with dark sections indicating areas with the lowest loss ℓ (and more sample points being present in such areas). We can see the search space is fairly well explored in all cases, with obvious regions of well-performing parameter combinations (pairwise combinations are visualised in Fig. 12). While the 3D plot gives a high-level overview, it is difficult to visualise the contribution of each component. To facilitate this, we present a flattened 1-dimensional view of each parameter, where the results are averaged over the other 2 parameters to view the loss for 1 parameter at a time. This is shown in Fig. 4. From these plots, it can be seen how each parameter behaves in isolation (noting that such plots do not capture the parameter interactions). The selected ABM parameters (the ones which had the lowest ℓ) are presented in Table 1. A sensitivity analysis for the parameters is performed in Appendix E.

Table 1: Selected ABM parameters.
Parameter    2006–2010    2011–2015    2016–2019
h            -0.80        -0.11        -0.005
β
α

In Fig. 4, the h and β parameters appear normally distributed around the optimal value found, whereas α has a clear peak but is not normally distributed.
This indicates each parameter seems to have a useful range, which is further verified in the sensitivity analysis in Appendix E.

h: Recall that the HPI aptitude h directly influences the bid price (given in Eq. (1)), and serves as the key trend-following parameter in the original model. The value of h controls the contribution of the HPI over the previous year, and as such it affects the bid price based on the market's state. This relationship depends on both the current HPI and the yearly difference ΔHPI.

The absolute value of h controls the magnitude of the contribution, and we can see 2006–2010 had the highest contribution, indicating the largest market effect on bidding (in the original model, this also had a large magnitude but the opposite sign). The value for 2011–2015, h = −0.11, is very close to that chosen in the original model, h = −0.10, which reflects the relatively consistent price dynamics, with agents mostly ignoring the market trend.

Interestingly, in 2016–2019, the chosen h is near zero. This means the denominator of Eq. (1) simplifies to:

$$\phi_M[t] + \phi_H - h \, \Delta HPI[t] \simeq \phi_M[t] + \phi_H$$

meaning the price is dependent on the mortgage rate and homeownership rate. As $\phi_H$ is set based on the current value of HPI, this means the historical values are not being used, and instead a much more “forgetful” market based only on the previous month's HPI is used for setting the price, with agents paying less attention to historical trends.

The HPI aptitude h also has clear optimal ranges for each period, with 2006–2010 in the range [ − . . . . − . . . . . 25] (which are of similar widths), and 2016–2019 in the much narrower range [ − . . . . .

β: This parameter (the right-hand column in Fig. 4) exhibits a very sharp transition around 0 for all years. This is because β has a direct relation to the supply and demand in the model, which drastically changes the dynamics based on the availability of properties. We see increasing contributions of β throughout the years, with 2016–2019 indicating the highest levels of β. This is perhaps reflective of the market, where people are increasingly following trends when it comes to selling dwellings (perhaps an indicator of a “bursting” bubble, with a large cascading sell-off). This indicates sharper peaks and dips are likely to occur in the future, with decisions for listings being made increasingly on the market state.

α: This parameter does not present as clear a set of results as the other two parameters, indicating more uncertainty in its value. The likely reason is that, since the buyer always purchases the most expensive viewed dwelling, varying levels of α do not have as large an effect on the outcome as the other parameters (if a lower α removes the viewing of a dwelling, the next most expensive dwelling will in turn be purchased). In 2006–2010 there is a general preference towards lower levels of α, resulting in buyers acting with less information. In 2011–2015, during the economic recovery in Australia, buyers may have been more cautious and considered a wider range of available dwellings when purchasing, reflected in higher values of α. In 2016–2019 the cautiousness of buyers appears to revert again, showing buyers making less “informed” choices, perhaps due to the rapidly increasing dwelling prices and buyers' desire to partake, at the expense of making “optimal” choices with larger α values.

[Figure 4 panels: (a) 2006–2010, (b) 2011–2015, (c) 2016–2019]

Figure 4: Univariate Parameter Analysis. The x-axis represents the parameter value, the y-axis represents the loss (with logarithmic colours for consistency across the various loss plots). The other parameters are averaged over to provide the 1-dimensional view.

6 Results

[Figure 5 legend: Monthly Mean, Rolling Mean, Monthly Median, Rolling Median]

Figure 5: The actual trends of Greater Sydney house prices from June 2006 to December 2019. Source: SIRCA-CoreLogic. The orange line represents the rolling mean, and the red line the rolling median. The raw monthly data points are also visualised (dotted lines). We use the median price trend as a more robust measure in all cases in this paper.

In this section, we present results of ABM simulations in terms of (i) price forecasting over three historic periods, aligned with the Australian Census years (2006, 2011 and 2016), and (ii) resultant household mobility patterns. We also identify market trends across the three periods. The considered periods 2006–2010 and 2011–2015 include 48 months, while the contemporary period, 2016–2019, covers 42 months (our SIRCA-CoreLogic dataset includes the market data until 31 December 2019). For each period, we run 100 Monte Carlo simulations, using the model parameters optimised for the corresponding training set, as described in Section 5, and then obtain predictions for the remaining (testing) part of the data. For the first two periods, the first three quarters of the time-series form the training part (e.g., 36 months from 1 July 2006 to 30 June 2009), and the remainder is the testing portion not used by any optimisation process (e.g., 12 months from 1 July 2009 to 30 June 2010). For the last period, the training part includes 30 months (from 1 July 2016 to 31 December 2018), with the remaining 12 months of 2019 used for testing.

We visualise the forecasting results in Fig. 6. It is evident that the model can successfully capture the key trends across the entire time-series, correctly identifying the peak and dip in 2006–2010, the steady growth during 2011–2015, and the growth and slow decline in 2016–2019.

The time period beginning in 2006 was a period of substantial uncertainty, triggered by the Global Financial Crisis (GFC). Tracing predictions for the last quarter of the period (i.e., the testing part of the dataset), shown in Fig. 6.a and Fig. 6.d, we observe that the average market rebound pattern is not fully followed. However, when considering the range of the simulation runs, i.e., the possibilistic regions of the simulation (as defined by Edmonds and ní Aodha (2018),

[Figure 6 panels: (a) Median: 2006–2010, (b) Median: 2011–2015, (c) Median: 2016–2019, (d) Mean: 2006–2010, (e) Mean: 2011–2015, (f) Mean: 2016–2019]

Figure 6: ABM simulation predictions. The top row shows the median of the simulations (black lines), every individual run (blue lines), and the minimum and maximum of every run with the light blue fill. The bottom row shows the mean of the simulations (black line), and ± ±

The model was not directly optimised for spatial submarkets. However, in this section, we evaluate the predictive capacity of the model in terms of area-specific forecasting. During initialisation, some area-specific information for LGAs is drawn from available distributions, for example, the recent sale price of dwellings in that LGA. Likewise, households are also distributed into LGAs based on the actual population sizes of the LGAs. However, no additional optimisation is applied across different LGAs with respect to actual area-specific trends. In other words, the spatial component is used only for initialising relevant distributions, leaving the market dynamics to develop through agent-to-agent interactions.

In Fig. 7 we visualise the predicted area-specific pricing at the end of the testing period. We can see that these predictions closely follow actual data in general, despite not being directly optimised for, with all predictions characterised by high $R^2$ values.

For the period 2006–2010 (with the end of the testing period mapping to June 2010), the simulations slightly overestimate the final price of the cheaper LGAs, but underestimate the resulting price of the most expensive LGAs, as shown in Fig. 7.a. However, the perfect model (orange line) tends to be within the error margin (standard deviation) of the predictions of the simulation.

For 2011–2015 (with the end of the testing period being June 2015), the slopes of both actual and predicted regressions are almost identical (m = 0 .
[Figure 7 panels; x-axis: Actual Price, y-axis: Predicted Price; legend: Simulation Line of Best Fit, Perfect Model (y = x)]
(a) June 2010 predictions (end of 2006–2010 test period); line of best fit y = 0.59x + 181153; R = 0 .
(b) June 2015 predictions (end of 2011–2015 test period); line of best fit y = 0.98x + 159923; R = 0 .
(c) December 2019 predictions (end of 2016–2019 test period); line of best fit y = 0.91x + 212812; R = 0 .
Points are labelled by LGA: Bayside, Blacktown, Blue Mountains, Burwood, Camden, Campbelltown, Canada Bay, Canterbury-Bankstown, Central Coast, City of Parramatta, City of Sydney, Cumberland, Fairfield, Georges River, Hawkesbury, Hornsby, Hunters Hill, Inner West, Ku-ring-gai, Lake Macquarie, Lane Cove, Lithgow, Liverpool, Mosman, North Sydney, Northern Beaches, Penrith, Randwick, Ryde, Strathfield, Sutherland Shire, The Hills Shire, Upper Lachlan Shire, Waverley, Willoughby, Wingecarribee, Wollondilly, Woollahra.

Figure 7: Predicted LGA pricing at the end of the testing period. The actual prices (SIRCA-CoreLogic) are shown on the x-axis, with the predicted prices on the y-axis. Error bars represent the standard deviation of the prediction across simulation runs. The orange line shows the perfect model (y = x), the blue line shows the least-squares line of best fit for the predictions (equation given on plot).

6.3 Household Mobility

Analysing the households' movements produced by the simulation is another key insight the spatial agent-based model can provide. In this section, we consider various agent movement patterns (which we refer to as household mobility), aiming to identify the salient trends. There are several key areas we focus on: first-time home buyers, investors, and new households (i.e., migrations or households splitting). Again, no direct optimisation was applied to the movements, and so the identified trends are intrinsic results of the model, and not attributed to some actual data. However, we show that such mobility patterns are supported by evidence, thus arguing that the model is able to produce sensible local patterns for which it was not explicitly optimised, based only on the global calibration data.

The mechanisms shaping the process of settlement formation, and generating intra-urban mobility specifically, include transitions driven by critical social dynamics, transformations of labour markets, changes in transport networks, as well as other infrastructural developments (Kim et al., 2005; Simini et al., 2012; Barthelemy et al., 2013; Louf and Barthelemy, 2013; Arcaute et al., 2016; Barthelemy, 2016; Crosato et al., 2018; Barbosa et al., 2018; Piovani et al., 2018; Slavko et al., 2019; Barthelemy, 2019). Types of home ownership, in particular, are known to affect mobility patterns (Crosato et al., 2020). In this work, we focus solely on the movements resulting from the housing market dynamics, which in turn incorporate the imperfect spatial information and other subjective factors such as the FOMO and trend following aptitude. We do not model any structural changes across the regions, i.e., the LGA boundaries, transport and other infrastructure topologies, etc. remain fixed.

[Figure 8 panels: (a) First-time home buyers, (b) Overseas Investors, (c) Local Investors; x-axis: Affordability (Cheapest to Most Expensive); y-axis: Percentage of Movements]

Figure 8: Comparison of price influenced mobility between the three periods. The x-axis represents the affordability (most affordable locations on the left). The y-axis represents the (smoothed) percentage of movements to the area. Darker colours represent later years. A full breakdown of household mobility is provided in Appendix H.

New households are those which are added throughout the simulation based on the projected household growth. New households can result from a variety of sources, such as people moving to Greater Sydney (migration), or households from Greater Sydney splitting, e.g., in the case of divorce or young adults moving out of home. We make no distinction between the two types in the simulation, and for simplicity, refer to both inter-city and intra-city migration types as migrants.

The most common LGAs into which the new households move are shown in Fig. 17. The simulation produces a clear trend for migration towards the cheaper areas, as the price begins to increase throughout the Greater Sydney region over time. This is most apparent for the 2016–2019 period, during which we can detect only a minority of the new households that purchased a dwelling in the expensive areas upon moving to Greater Sydney. This is markedly different in comparison to the 2006–2010 period. Furthermore, there is a clear peak in the more affordable LGAs, comprising Western Sydney (such as Campbelltown, Penrith, Blacktown, Fairfield, and Liverpool), and LGAs further away from the metropolitan area (such as Central Coast, Lake Macquarie, and Hawkesbury). A similar trend can be seen with the new renters. This agrees with the discussion in Bangura and Lee (2019), which names Western Sydney as “the first port of call for new arrivals, immigrants and refugees”. These observations also agree with the study of Slavko et al. (2020), which shows the outward sprawl from the densely populated Sydney metropolitan area.

Homeownership has long been a goal of many Australians (Bessant and Johnson, 2013), so simulating the forecasted feasibility in Sydney, the largest and most expensive (Yetsenga and Emmett, 2020) Australian city, is essential. The first-time home buyers are defined here as those who have resided in Sydney but have previously been renters, and then purchased their own dwelling.
This is in contrast with the analysis in Section 6.3.1, where the new owners are defined as those that had just entered the simulation (by moving to Sydney).

The first home purchases are visualised in Fig. 19. The main diagonal represents an agent purchasing in the same LGA as the one where the household is currently renting. The area below the main diagonal (which we refer to as the lower triangle) shows households purchasing in cheaper LGAs in comparison to those where they are renting, and the area above the main diagonal (the upper triangle) shows the agents purchasing in LGAs more expensive than those where they are currently renting. In the earlier years (i.e., the 2006–2010 period), we can see that the densities in the heatmaps are relatively evenly distributed. Over time, however, the density of the upper triangle begins to decrease, meaning that the agents are purchasing in the LGAs cheaper than those where they are renting, as the expensive LGAs become increasingly out of reach. This is also reflected in Fig. 8a: a comparison between the 2006–2010 and the 2016–2019 periods clearly shows that many suburbs are simply becoming out of reach for the first-time home buyers, with a larger percentage of them needing to purchase in the more affordable areas.

This result is in line with Randolph et al. (2013), which shows the distribution of First Home Owner Grants within Sydney statistical districts over the period 2000–2010, pointing out that such grants were increasingly likely in the lower-income housing markets, such as Western and Southern Sydney. Likewise, La Cava et al. (2017) show that the average distance to the CBD of dwellings that first-time home buyers can afford has been increasing from 2006 through 2016. Furthermore, La Cava et al. (2017) show that the purchasing capacity of first-time home buyers has been limited to the bottom (i.e., most affordable) 10–30% of dwellings, where in 2016 the median first-time home buyer could afford only around 10% of the available dwellings. A similar conclusion is reached by Kupke and Rossini (2011), who find key workers being able to afford fewer dwellings and commuting longer distances between 2001 and 2009, a trend which seems to have continued ever since, as shown by the mobility patterns here.

Footnote: All movements are scaled by the population size to allow a fair comparison, as outlined in Appendix H.

6.3.3 Investors

Investors are defined as households which own multiple dwellings, or households which live overseas yet own a property in Greater Sydney. We make an explicit distinction between local (residing in Sydney) and overseas investors, since price increases within Sydney are often attributed to the latter category (Rogers et al., 2015, 2017; Wong, 2017; Guest and Rohde, 2017). This distinction is visualised in Fig. 18. The overseas investment approvals are regulated by the Australian government, and details of the approval data are provided in Appendix I.1.

During the 2006–2010 period, based on the overseas approvals granted, the simulation produces a clear preference of the overseas buyers for the most expensive regions, purchasing properties almost exclusively in the highest-priced regions. In later periods, we begin to observe a relatively wider range of preference, although still with a clear trend towards the mid-high range areas. This is reflected in Fig. 8b. A driving factor behind this is the higher average government approval for overseas investment given during the 2006–2007 years, in comparison to later years, reflected in our simulation (as displayed in Fig. 20). While the foreign investment data in the Greater Sydney housing market is sparse and not fine-grained, this purchasing pattern is in concordance with recent literature.
For example, Gauder et al. (2014) mention that foreign investors tend to prefer inner-city dwellings within Sydney (which tend to correspond to higher-priced LGAs), although recently “foreign investment has started to broaden out into other areas of Sydney”.

These findings are in sharp contrast to the mobility patterns of the local investors, for which the simulation produces a far wider distribution across areas. For the 2011–2015 and 2016–2019 periods, the most expensive LGAs become out of reach for local investors, which is most apparent during 2016–2019 (as shown in Fig. 8c). Local investors can be seen buying properties in many of the cheaper LGAs, which also falls in line with the simulation showing many renters in these areas. Due to the affordability, these LGAs also exhibit higher population growth rates than other areas. This also agrees with existing studies, for example Pawson and Martin (2020), who find that many high-income Australian landlords are investing in dwellings in lower socioeconomically developed regions of Sydney.

In this work, we have introduced a spatial element to a model of a large, well-developed housing market (the Greater Sydney region) using an adjacency matrix based on the spatial composition of the city. This model is capable of capturing a large variety of spatial topologies, for example, monocentric and polycentric cities. Furthermore, the graph-based approach is flexible, allowing for any level of granularity, for example, over the differing scales of countries, cities, or suburbs.

Using this model we have demonstrated the usefulness of spatial analysis when it is calibrated to the Australian house price data for the Greater Sydney region. The 38 LGAs of Greater Sydney were simulated, and agents (households) were calibrated based on the LGA in which they live. We have shown that the spatial component allows an additional level of fine-tuning that results in better overall fitting of the model to data, as well as producing strong out-of-sample predictions for each individual LGA by optimising only for the overall trend. That is, spatial areas add an additional layer of predictions, while also improving the overall aggregate trend.

We investigated the agents' spatial awareness of the market, where buyers only have limited knowledge of the market, based on the area in which they reside (imperfect spatial information). We demonstrated through varying α (a parameter that controls how much of the market each agent is able to perceive) that lower values of α capture the true trend better than perfect (whole of market) knowledge, indicating the usefulness of modelling this imperfect spatial information in a housing market.

The spatial component also allows for the analysis of movement patterns, where we have shown differences in mobility and purchase locations between various agent types.
Examples include the differences between first-time home buyers and investors, where first-time home buyers are limited to the more affordable locations, with investors being able to purchase higher-priced properties, and the differences between local and overseas investors, where overseas investors are shown to have a strong preference towards mid to high valued areas. We also model new migrations to the city, showing such agents becoming increasingly pushed towards cheaper areas of the city.

We have also introduced a novel fear of missing out component. With this parameter, we model how sellers become more likely to list a dwelling if many surrounding listings have recently sold, and show a strongly localised fear-of-missing-out occurring throughout the market. This indicates agents' decisions are often motivated by their neighbours' decisions rather than by strict optimisation of their own benefits, i.e., real households are only partially rational in this regard.

While in this work we addressed some key concerns in a housing market, there are still several areas of improvement we would like to focus on in future work. The spatial component opens up a range of additional possibilities, such as overlaying public transport maps on the network, allowing for the distance to key work areas, schools, beaches, etc., and further modelling and capturing agent mobility within the simulation. The demographics and household types could be sampled from actual data, which would allow analysis of subgroups of people (e.g., young singles vs retirees vs families), and allow us to model any spatial trends that arise between demographic groups.
The internal optimisation functions of agents (for example, what neighbourhood to move to, what kind of dwelling to choose) could also be investigated further, as currently agents will purchase the most expensive dwelling they can afford based on their knowledge and outreach. There are also three key equations which drive the model that could be further investigated: the bid prices, the listing prices, and the bank approval process. These are currently predefined equations, but they could be treated as optimisation problems themselves, finding expressions that match most closely to the training period.

A Implementation

The model is written from scratch in Python 3, based on the C++ code from Glavatskiy et al. (2020).
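To give a flavour of how one step of such a Python implementation might look, the sketch below follows the market matching procedure of Algorithm 1 (Appendix B). It is an illustrative reading rather than the authors' code: the function name `match`, the tuple layout of bids and listings, and the rule that a buyer "can afford" a listing when the bid is at least the list price are assumptions made here; the 20% rejection chance is the value stated in Appendix B.

```python
import random


def match(bids, listings, rejection_prob=0.2, rng=None):
    """One matching step, loosely following Algorithm 1.

    bids: list of (buyer_id, bid_price); listings: list of (seller_id, list_price).
    Returns the deals made and the listings that did not clear.
    """
    rng = rng or random.Random()
    # Highest bidders get first preference; listings are ordered from most expensive down.
    ordered_bids = sorted(bids, key=lambda bid: bid[1], reverse=True)
    available = sorted(listings, key=lambda listing: listing[1], reverse=True)

    deals = []
    for buyer, bid_price in ordered_bids:
        # Assumption: "can afford" means the bid covers the list price.
        best = next((item for item in available if item[1] <= bid_price), None)
        if best is None:
            continue  # nothing affordable for this buyer in this step
        if rng.random() < rejection_prob:
            continue  # deal breaker: external influences reject the deal
        deals.append((buyer, best[0]))
        available.remove(best)  # listing is no longer available to later bidders
    return deals, available


# Illustrative data only; rejections are switched off to make the result deterministic.
deals, unsold = match(
    bids=[("b1", 500), ("b2", 800)],
    listings=[("s1", 450), ("s2", 700), ("s3", 900)],
    rejection_prob=0.0,
)
print(deals, unsold)
```

In a full simulation the unmatched bids and the returned unsold listings would be carried into the next step, as the appendix describes.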

B Market Matching

The market matching process, where buyers and sellers are matched, is relatively simple and given in Algorithm 1. We can see the highest bidding buyer gets preference to the listings, and every buyer attempts to purchase the most expensive listing they can afford. In certain cases, deals are rejected due to external influences (modelled by a random 20% chance of rejection). This matching process is performed once every simulation step, with bids and listings that did not clear persisting into the next step.

Algorithm 1 Market Matching
    procedure match(bids, listings)
        sort bids                                ▷ From highest bid price to lowest
        sort listings                            ▷ From highest list price to lowest
        for bidder in bids do
            best listing ← max listing buyer can afford
            if no deal breaker then
                Make deal between buyer and seller
                listings − best listing          ▷ Remove this listing from the available listings
            end if
        end for
    end procedure

C Rental Market

In the baseline model, there was no concept of rental matching. Households were randomly assigned a rental, with no regard to the cost of the dwelling or income of the household. Here, we add an additional matching process based on the idea that households should spend a maximum of 30% of their income on housing when possible to avoid housing stress (Thomas and Hall, 2016; Fernald, 2020).

New households are randomly assigned a “local” area (weighted by the population of each area), where they begin and have their characteristics (wealth, income, cash flow, etc.) assigned. From there, every household attempts to find a vacant rental in their price range (which will likely result in various households moving because of financial requirements). Households with extremely high incomes, where all dwellings are less than 10% of their income, get the most expensive rental available. Households with extremely low income, where all dwellings are at least 30% of their income, get the cheapest one they can afford. All other households randomly choose a rental they can afford (in the 10–30% of income range).

Households remain in their rentals for the duration of the simulation. In this work, we do not attempt to capture the rental market in its entirety and leave this for future work, where we would like to model the relationship between renters and investors. The changes were made to ensure the cash flow situations of each household match more closely those seen in the real world, whereas in the previous model many households would be in a poor cash flow situation due to the rental price. Other work, such as Mc Breen et al. (2010), looks more in-depth at modelling the rental market.
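The rental assignment rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the model's actual code: the function name `choose_rental`, the flat list of vacant rents, and the fallback used when no rental falls inside the 10–30% band (the text does not specify this case) are all hypothetical.

```python
import random


def choose_rental(income, vacant_rents, rng=None):
    """Pick a rent for one household from the vacant rentals, or None if none exist.

    `income` and the rents are assumed to cover the same period (e.g. per week).
    """
    rng = rng or random.Random()
    if not vacant_rents:
        return None
    low, high = 0.10 * income, 0.30 * income

    # Extremely high income: every rental costs less than 10% of income.
    if all(rent < low for rent in vacant_rents):
        return max(vacant_rents)
    # Extremely low income: every rental costs at least 30% of income.
    if all(rent >= high for rent in vacant_rents):
        return min(vacant_rents)

    # Everyone else: random choice within the 10-30% of income band.
    in_band = [rent for rent in vacant_rents if low <= rent <= high]
    if in_band:
        return rng.choice(in_band)
    # Assumption (not stated in the text): if the band is empty, take the most
    # expensive rental that is still below the band.
    return max(rent for rent in vacant_rents if rent < low)


# Illustrative usage with a seeded generator.
print(choose_rental(1000, [50, 150, 200, 400], rng=random.Random(0)))
```

In the model, the chosen rental would then be removed from the pool of vacancies before the next household is processed; that bookkeeping is left out here for brevity.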

D Bayesian Optimisation

D.1 Details

Bayesian optimisation is performed using the Tree of Parzen Estimators approach with hyperopt from Bergstra et al. (2013). The optimisation process was run for 2000 iterations in all cases. The loss was measured as the average loss over several stochastic runs for each set of parameters, to minimise the effect of randomisation in the model and resulting loss.

D.2 Loss Function

For measuring the goodness of fit, we use a loss function with two terms, a shape term and a temporal term, to try to capture the nonlinearities over time when predicting housing price trends. This loss function is a modification of DILATE (Vincent and Thome, 2019), which was introduced as a loss function for neural networks for time-series predictions, although DILATE has been simplified here (with the removal of smoothing parameters) as Bayesian optimisation does not require the loss function to be differentiable.

The loss function is given in Eq. (8).

$$\ell = \lambda \cdot \text{shape} + (1 - \lambda) \cdot \text{temporal} \qquad (8)$$

where λ = 0.… (with λ controlling the strength of the penalty).

The shape term is based on dynamic time warping (DTW), which has commonly been used in speech recognition tasks (Sakoe and Chiba, 1978; Myers et al., 1980) but has a wide range of applications in time series data (Berndt and Clifford, 1994). Dynamic time warping can be expressed recursively as a minimisation problem, as in Eq. (9):

$$\text{shape} = DTW(x, y) = d(x, y) + \min \left\{ DTW(x-1, y),\; DTW(x-1, y-1),\; DTW(x, y-1) \right\} \qquad (9)$$

This can be read as minimising the cumulative distance (using distance measure d, in this case Euclidean distance) on some warped path between x and y, by taking the distance between the current elements and the minimum of the cumulative distances of neighbouring points.

Unlike the common applications in speech recognition, where words can be spoken at varying speeds (so the peaks do not necessarily match up), in financial markets timing such peaks is important. This motivates the introduction of a temporal term, for trying to align such peaks and dips.
The temporal term is based on the Time Distortion Index (TDI) (Frías-Paredes et al., 2016, 2017), which can be thought of as the normalised area between the optimal path and the identity path (where the identity path is (1, 1), (2, 2), ..., (N, N)) (Vallance et al., 2017) and aims to minimise the impact of shifting and distortion in time series forecasting (Frías-Paredes et al., 2016).

$$P_l = \int_{i_l}^{i_{l+1}} \left( x - \left( \frac{(x - i_l)(j_{l+1} - j_l)}{i_{l+1} - i_l} + j_l \right) \right) dx \qquad (10)$$

$$\text{temporal} = TDI = \frac{2 \sum_l |P_l|}{N} \qquad (11)$$

To see the usefulness over a more standard loss function such as MSE for time series, consider the example in Fig. 9. We can see that MSE can be a problematic approach and, in some cases (as in the example, where the linear prediction in Fig. 9b has a lower loss), a misleading measure of goodness of fit. DTW helps to match points in the two time series, while TDI helps minimise the offset of the predictions (graphically, in the example, this corresponds to shortening the dotted grey lines). For a full analysis, we refer you to the original DILATE paper of Vincent and Thome (2019), noting that all smoothing terms have been removed in the modification here.

[Figure 9, panel (a) Shifted Predictions: DTW 0.00, TDI 4.03, Dilate* 2.02, MSE 1639012643. Panel (b) Linear Predictions: DTW 108865.63, TDI 3.47, Dilate* 54434.55, MSE 520553109. y-axis: price, $420k to $520k.]

Figure 9: Motivation of time series based loss using a constructed example. We can see the line on the right is a very poor predictor of the true trend, failing to capture any of the peaks or dips. However, the MSE is significantly lower than the line on the left. DTW captures the shifts, and incorporating a penalty on time can penalise these shifts. The light grey lines show how DTW matches points together, even if they do not occur at the same time period.
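As a rough illustration of the two-term loss, the sketch below computes a DTW cost with its warping path, then the area between that path and the identity path, and combines them with a weight `lam`. It is a simplified reading of Eqs. (8) to (11) under stated assumptions, not the authors' implementation: both series are assumed to have the same length N, the point distance is the absolute difference, ties in the backtracking step are broken towards the diagonal, and the names `dtw_path`, `tdi` and `dilate_star` are made up for this example.

```python
def dtw_path(x, y):
    """Return (cumulative DTW cost, warping path) for two 1-D sequences.

    The path is a list of 1-based index pairs (i, j) from (1, 1) to (n, m).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # 1-D Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i - 1][j - 1], cost[i][j - 1])
    # Backtrack from the end, always moving to the cheapest neighbour.
    i, j = n, m
    path = [(i, j)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i, j))
    path.reverse()
    return cost[n][m], path


def tdi(path, n):
    """Area between the warping path and the identity path, scaled by 2 / n."""
    area = 0.0
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        width = i1 - i0  # zero for vertical steps, which add no area
        # The gap x - y(x) is linear on each segment, so the trapezoid rule is exact.
        area += abs(0.5 * ((i0 - j0) + (i1 - j1)) * width)
    return 2.0 * area / n


def dilate_star(pred, actual, lam):
    """Weighted sum of the shape (DTW) and temporal (TDI) terms."""
    shape, path = dtw_path(pred, actual)
    temporal = tdi(path, len(pred))
    return lam * shape + (1.0 - lam) * temporal
```

With a series and a copy of it shifted by two steps, the shape term can be zero (DTW re-aligns the peak) while the temporal term stays positive, which is the behaviour the temporal term is meant to penalise.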

D.3 Global Constraints

We can see 2011–2015 and 2016–2019 fit the trend very closely. However, despite having a low loss, the 2006–2010 simulation path does not follow the dip well, as the loss function makes no distinction between being above or below the trend. Looking at the individual paths from every run, we can see that a peak and dip is predicted in many of the cases, although their distance is greater than that of the path with the lowest loss, which matched a large portion of the training data perfectly but missed the dip. We apply a post-optimisation global constraint to 2006–2010, again using only this training period: the midpoint of the simulation must be higher than the start and end points (i.e., a peak must occur), and we take the parameters with the lowest loss matching this criterion. The process is shown in Fig. 10 and the result is shown in Fig. 3p. We can see that for 2006–2010 ℓ is higher than before the constraint; however, the constraint clearly allows for closer overall trend following in the training period.

The visualisation in Fig. 10 also begins to show the wide range of possible market outcomes for various combinations of the parameters. If a certain feature occurs for many parameter combinations (e.g., the peak), we can deduce that such dynamics were likely to occur due to the agent characteristics alone, regardless of the parameters used. This shows many combinations lead to a peak and dip, perhaps due to mortgage rates and worrying mortgage-to-income ratios. This is in line with the suggestions of Edmonds and ní Aodha (2018), who suggest ABMs be used to determine a range of potential future outcomes, which in this case shows a variety of paths leading to a peak and dip.
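The post-optimisation constraint amounts to a filter followed by a minimum. The sketch below assumes each optimisation trial is stored as a dictionary with hypothetical keys `params`, `loss` and `path` (the simulated price series); this storage layout is an assumption for illustration only.

```python
def has_peak(path):
    """True when the midpoint is above both the first and the last value."""
    mid = path[len(path) // 2]
    return mid > path[0] and mid > path[-1]


def best_constrained(trials):
    """Lowest-loss trial among those whose path satisfies the peak constraint.

    Returns None when no trial satisfies the constraint.
    """
    feasible = [t for t in trials if has_peak(t["path"])]
    if not feasible:
        return None
    return min(feasible, key=lambda t: t["loss"])
```

Note that the selected trial can have a higher loss than the unconstrained optimum, as observed for 2006–2010.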

D.4 Parameter Space

The parameter space is defined in Table 2.

Table 2: The three tunable hyperparameters. All parameters are sampled uniformly within these ranges.

| Parameter | Search Space | Explanation |
|---|---|---|
| α | [0, 1] | Probability of viewing a listing, scaled by the outreach |
| β | [-10, +10] | The contribution of surrounding listing sales when considering selling a dwelling |
| h | [-1, +1] | Trend following aptitude |

Even though there are only three parameters to tune, the number of potential combinations exceeds 4 million (this is assuming values are discretized, so the true number is far greater), making a grid search impractical. The three parameters are h, α, and β.

Figure 10: Global Constraint Process. Panels: (a) No Constraints; (b) Red shows the paths which will be removed; (c) Paths matching constraint.

Figure 11: Search space exploration. Colour indicates the loss.

Figure 12: Parameter Interactions and Parameter sampling.
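To make the search setup concrete, the sketch below samples the three parameters uniformly from their ranges and averages a noisy loss over several stochastic runs per candidate. It deliberately substitutes plain random search for the Tree of Parzen Estimators used in the paper, so that the example has no dependencies; `toy_loss` is a made-up stand-in for a full ABM simulation, and all function names are hypothetical.

```python
import random
from statistics import mean

# Ranges for the three tunable parameters.
SPACE = {"alpha": (0.0, 1.0), "beta": (-10.0, 10.0), "h": (-1.0, 1.0)}


def sample_params(rng):
    """Draw one parameter set uniformly from the search space."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SPACE.items()}


def averaged_loss(loss_fn, params, n_runs, rng):
    """Average the loss over several stochastic runs with different seeds."""
    return mean(loss_fn(params, rng.randrange(2**32)) for _ in range(n_runs))


def random_search(loss_fn, n_iter, n_runs, seed=0):
    """Return (best averaged loss, best parameters) after n_iter samples."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = sample_params(rng)
        loss = averaged_loss(loss_fn, params, n_runs, rng)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


def toy_loss(params, seed):
    """Hypothetical noisy loss; a real run would simulate the housing market."""
    noise = random.Random(seed).gauss(0.0, 0.01)
    return (params["h"] - 0.3) ** 2 + (params["alpha"] - 0.5) ** 2 + noise
```

Calling `random_search(toy_loss, n_iter=200, n_runs=5)` returns the best averaged loss and its parameters; a model-based optimiser such as TPE would replace the uniform draw in `sample_params` with proposals informed by earlier trials.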

E Sensitivity Analysis

While in Section 5.1 we analysed the contribution of each new component by comparing the resulting optimised time series after introducing the components one at a time, here we verify and rank the importance of each of the contributions explicitly using global sensitivity analysis (GSA).

Specifically, we analyse the importance of the trend following aptitude (h), the social contribution (β), and the role of α in minimising the loss function.

We use the Morris Method (Morris, 1991) for a GSA, and present the revised µ* as suggested in Saltelli et al. (2004), together with σ. µ* represents the mean absolute elementary effect and can be used to rank the contribution of each parameter; this solves the problem of µ, where elementary effects can cancel out. We also analyse σ, i.e., the standard deviation of the elementary effects, as a measure of the interactions.

For the parameters of the Morris Method, we use r = 20 trajectories, p = 10 levels, and step size ∆ = p/[2(p − 1)] ≈ 0.52 with p = 10. These are within the range of commonly used parameters, e.g. in Campolongo et al. (2007).

The results are presented in Table 3, and visualised in Fig. 14 and Fig. 13.

Table 3: Morris Method for Sensitivity Analysis (µ* and σ for each of h, β, and α, per time period).

Looking at µ*, we can see h consistently ranks the most important, showing its changes have the largest effect on ℓ. This is followed in importance by β, and then α, in each time period. However, we see that the confidence bars do overlap in Fig. 13.

Viewing the Morris plots in Fig. 14, we can see all parameters are deemed important, where unimportant parameters would show up in the bottom leftmost portion of the plot. Using the classification strategy of Sanchez et al. (2014), all parameters are considered to be non-monotonic and/or with high levels of interaction, since σ/µ* > 1.

Figure 13: Importance plot showing µ*. Error bars are displayed at the 95% confidence level. Panels: (a) 2006–2010; (b) 2011–2015; (c) 2016–2019.

Figure 14: Global sensitivity analysis with Morris plots. Diagonal lines represent the ranges for σ/µ* (1.0, 0.5, and 0.1). One classification strategy proposed by Sanchez et al. (2014) says factors which are almost linear should be below the 0.1 line, factors which are monotonic between the 0.1 and 0.5 lines, or almost monotonic between the 0.5 and 1 lines, and factors with non-monotonic non-linearities or interactions with other factors above the 1 line. Panels: (a) 2006–2010; (b) 2011–2015; (c) 2016–2019.

While the Morris method gives us the overall sensitivity across the parameter ranges (in a global way) and allows us to rank the factors in terms of importance, we also provide a fine-grained sensitivity analysis around the default values, i.e., a local sensitivity analysis (LSA). For this, we use p = 100 levels, but vary only one parameter at a time while keeping the others fixed at their default values. This is shown in Fig. 15. This analysis shows how robust the resulting default values are to small perturbations, but as this is a local method, the results should be interpreted with caution (and only in conjunction with the GSA method above), since it does not account for any parameter interactions, as warned in Saltelli et al. (2019).

Viewing h (the left column), we can see all values surrounding the default have a similar loss, showing the model is robust to small changes in the aptitude. Looking across the entire search space, we can see that choosing from within an appropriate range for the aptitude is important, but the loss is always relatively smooth with respect to the surrounding parameter values. Viewing β (the middle column), we can see the sharp transition above zero. There is a clear optimal range for β, where the default lies. However, again, the area surrounding the default values is smooth, showing robustness to the default parameters (assuming we do not vary past the sharp transition). Looking at α (the final column), the plots initially seem somewhat jagged, although when looking at the scale of the y-axis it becomes clear these are very small shifts in loss (as verified by the plotted time series with varying α levels). α was deemed the least important of the three parameters by the Morris Method screening but was still important based on the positioning on the Morris plot. We can verify this here, where changes in α do not have a huge impact on ℓ.

GSA was performed using SALib from Herman and Usher (2017).

Figure 15: Univariate LSA of default parameters, varying one factor at a time with others at their optimised values. The plots give the change in parameter value (x-axis) vs ℓ (y-axis). The dotted vertical black line shows the optimised value. Panels: (a) h; (b) β; (c) α.
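The Morris statistics discussed in this section can be illustrated with a small dependency-free sketch. It shows how one elementary effect is computed, why µ* is preferred over µ when effects cancel, and how the σ/µ* thresholds of Sanchez et al. (2014) translate into labels. The function names are hypothetical, the sample standard deviation is an assumed choice for σ, and the analysis in the paper itself was run with SALib rather than code like this.

```python
from statistics import mean, stdev


def elementary_effect(f, x, i, delta):
    """Change in f per unit step when only coordinate i is moved by delta."""
    shifted = list(x)
    shifted[i] += delta
    return (f(shifted) - f(x)) / delta


def morris_stats(effects):
    """Return (mu, mu_star, sigma) for the elementary effects of one parameter."""
    mu = mean(effects)                        # signed mean, effects can cancel
    mu_star = mean(abs(e) for e in effects)   # mean absolute effect, used for ranking
    sigma = stdev(effects)                    # spread, a sign of non-linearity or interactions
    return mu, mu_star, sigma


def sanchez_class(mu_star, sigma):
    """Label a parameter by its sigma / mu_star ratio."""
    ratio = sigma / mu_star
    if ratio < 0.1:
        return "almost linear"
    if ratio < 0.5:
        return "monotonic"
    if ratio < 1.0:
        return "almost monotonic"
    return "non-monotonic or interacting"
```

For effects of `[1.0, -1.0, 1.0, -1.0]`, µ is 0 even though every step changes the output, while µ* is 1, which is the cancellation problem that µ* avoids.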

F Experiment Settings

Due to the stochastic and non-deterministic nature of ABMs, we run 100 Monte Carlo simulations per run (unless otherwise stated) and report the aggregate results over all runs. Experiments are run at a 1:100 scale of the true housing market, i.e., every one hundred households in the Greater Sydney region are represented by one household in the model.

G Initialisation Data

G.1 Data

All real estate listings and sales from 2006 to present (2020) were used from SIRCA-CoreLogic, including the sale price, LGA, and sale date. This data is used as the actual price, and to calibrate the ABM.

G.2 Spatial Initialisation

G.2.1 Pricing Distributions

Between LGAs, there is a wide range of dwelling sale prices, and different distributions of prices amongst the LGAs as well. To sample from this effectively, we use kernel density estimation (KDE) to create a probability density function for each LGA for each time period. The previous 3 months of sales from the beginning of the time period are used to generate the density function. Scott's Rule (Scott, 2015) is used to assign the bandwidth, which sets the bandwidth to $n^{-1/(d+4)}$, where n is the number of data points (in this case dwelling sales in the LGA at the beginning of the time period), and d is the number of dimensions (in this case d = 1). The resulting KDEs are shown in Fig. 16.

Figure 16: KDE plots for each LGA based on SIRCA-CoreLogic data. Dark red indicates the Greater Sydney average, and this is assigned to LGAs without enough data to generate their own reliable KDE. Panels (a), (b), (c); x-axis: price, $0k to $3.00M.
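A one-dimensional Gaussian KDE with a Scott's Rule bandwidth can be sketched in a few lines. The example below is illustrative only: it assumes a Gaussian kernel, and it multiplies the factor n^(-1/(d+4)) by the sample standard deviation to obtain a bandwidth in price units (a common convention, but an assumption here). The function names and the sale prices in the usage note are hypothetical.

```python
import math
import random
from statistics import stdev


def scott_bandwidth(samples, d=1):
    """Scott's factor n^(-1/(d+4)) scaled by the sample standard deviation."""
    n = len(samples)
    return stdev(samples) * n ** (-1.0 / (d + 4))


def kde_pdf(samples, x, bandwidth):
    """Gaussian kernel density estimate evaluated at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def kde_sample(samples, bandwidth, rng=random):
    """Draw one value from the KDE: pick a data point, then add kernel noise."""
    return rng.choice(samples) + rng.gauss(0.0, bandwidth)
```

Given recent sale prices for one LGA, `scott_bandwidth(prices)` gives the kernel width and `kde_sample(prices, bw)` draws a new dwelling price. A Gaussian kernel can produce values below zero for very low-priced areas, so a real model would need to truncate or resample such draws.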

G.2.2 Positioning

Households are not assigned to an LGA directly, as households can freely move areas. Instead, the household's area is based on the residence dwelling of the household (and thus can change over time). When we reference a household's area, we are referring to the LGA of the dwelling where the household currently resides.

At the beginning of the simulation, households which are homeowners are assigned to dwellings to match the population distribution amongst LGAs. The income and liquid wealth for the household are then assigned based on the brackets from the dwelling's LGA. Renters are assigned a random LGA to begin with (again weighted by the population of each LGA) and income and wealth based on the distribution of that LGA. Households then try to find a rental they can afford (one with a rental price approximately 10%-30% of the household's income), which may mean some have to move LGAs.

G.3 Time Periods

In line with the previous work of Glavatskiy et al. (2020), and following the Australian census timelines, we choose three time periods for analysis. These are 2006–2010, 2011–2015, and 2016–2019. Each period corresponds to the Australian census data years, meaning there is a large array of available data for calibration to ensure the models begin in a state as close as possible to the true population's state.

G.4 Household Characteristics

G.4.1 Income

Income is assigned from the distribution based on the household's area. This distribution comes from the census data. Income grows throughout the simulation. The income brackets follow those specified in the census data.

G.4.2 Liquid Wealth

Again, the liquid wealth (liquidity) of a household is based on the true distributions from census data. However, in this case, liquidity is not available per LGA, only for Greater Sydney as a whole. So, to map a household to an appropriate liquidity bracket, the household's liquidity is based on the income of the household. That is, if a household is in the top X% of earners in an LGA, the liquidity will be in the top X% as well (approximately, since liquidity is from brackets).
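The percentile matching can be sketched as below. The helper names, the representation of liquidity brackets as cumulative population shares, and all numbers in the usage note are hypothetical; the census bracket boundaries used in the paper are not reproduced here.

```python
from bisect import bisect_right


def income_percentile(income, lga_incomes):
    """Fraction of households in the LGA earning no more than `income`."""
    ordered = sorted(lga_incomes)
    return bisect_right(ordered, income) / len(ordered)


def liquidity_bracket(percentile, brackets):
    """Map an income percentile to a liquidity bracket.

    `brackets` is a list of (cumulative_share, (low, high)) pairs sorted by
    cumulative share, with the last share equal to 1.0.
    """
    for share, bounds in brackets:
        if percentile <= share:
            return bounds
    return brackets[-1][1]
```

For instance, with brackets `[(0.5, (0, 10_000)), (0.9, (10_000, 100_000)), (1.0, (100_000, 1_000_000))]`, a household at the 80th income percentile of its LGA would land in the 10,000 to 100,000 liquidity bracket.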

G.5 Population Distribution

In this case, there are three measures of interest: the total number of dwellings, the total number of households, and the distribution of these households amongst LGAs. The dwelling and household estimates from the census data are used for each year, and simple linear projections are used for forecasting their growth. The distribution amongst LGAs is that recorded at the start of the simulation and is assumed to grow linearly with the overall population size. Individual LGA future population projections are available from 2016 onward, but as no projections existed before this date, we instead used this simplified measure, in which all LGAs grow by the same fixed percentage within a given simulation period. As such, higher movements towards one particular LGA throughout the simulation could indicate the requirement for additional dwellings to be built there to cater for the growth, which is another contribution we consider in later sections of this work.

H Movement Pattern Visualisations

Over 10 million total movements were tracked across the simulations (approximately 3.3 million per time period). All plots in this section represent the normalised heatmaps of these movements. The total number of movements to a particular LGA is scaled by the population size of this LGA, meaning the results can be interpreted as a preference for certain areas rather than visualising the population size of the LGAs. Therefore, movements do not just reflect larger populations; instead, they reflect a larger portion of people moving there relative to the size. All movements are then normalised such that the sum of all cells in the plot is 1, meaning that if a particular cell has a value of 0.05, 5% of all matched movements moved to this LGA.

The rows and columns of the plots are always sorted in ascending order based on median price, i.e., the most affordable LGAs first, and the most expensive LGA as the final row or column.
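The two-step normalisation (scale by destination population, then rescale so all cells sum to 1) can be written as a short sketch. The nested-dictionary layout and the function name are assumptions for illustration, and the counts in the usage note are invented.

```python
def normalised_heatmap(moves, population):
    """Scale movement counts by destination population, then normalise to sum 1.

    `moves[src][dst]` is a raw movement count and `population[dst]` is the
    population of the destination LGA.
    """
    scaled = {
        src: {dst: count / population[dst] for dst, count in row.items()}
        for src, row in moves.items()
    }
    total = sum(value for row in scaled.values() for value in row.values())
    if total == 0:
        raise ValueError("no movements to normalise")
    return {
        src: {dst: value / total for dst, value in row.items()}
        for src, row in scaled.items()
    }
```

With `moves = {"A": {"A": 10, "B": 10}, "B": {"A": 0, "B": 30}}` and `population = {"A": 100, "B": 200}`, the cell for B to B becomes 0.5: it has the most raw movements, but its share is tempered by B's larger population.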

Figure 17: Migrations. These plots capture new households in Greater Sydney throughout the simulation period, due to either migration or splitting of existing households. Panels: (a) New Renters; (b) New Owners. The first row is the 2006–2010 period, the middle row the 2011–2015 period, and the final row the 2016–2019 period.

Figure 18: Investors. These plots show the simulation difference between local and overseas investment patterns. Panels: (a) Local; (b) Overseas. The first row is the 2006–2010 period, the middle row the 2011–2015 period, and the final row the 2016–2019 period.

Figure 19: First-time home buyers. Panels: (a) Renting LGA (rows), to Purchase LGA (columns); (b) Purchase LGAs. The first row is the 2006–2010 period, the middle row the 2011–2015 period, and the final row the 2016–2019 period.

I Exogenous Variables

There are two main external influences on the model, which are governed by government approvals (in the case of overseas investments) and the central bank (in the case of mortgage rates).

I.1 Overseas Investors

Overseas investments are often cited as a key driver of price growth in the Australian market (Rogers et al., 2017), and figures show that foreign investment has more than tripled since the mid-1990s (Haylen, 2014). However, actual data on foreign investments is difficult to find. The ABS has described its own data on overseas investments to parliament as “hit or miss” (Iggulden, 2014).

The purpose of this work is not a full investigation into overseas investments (overviews are given in Gauder et al. (2014); House of Representatives Standing Committee on Economics (2014)), but rather the contribution overseas investment might have in relation to many other factors with the readily available data (be this complete or not).

For this, we use the annual reports from the Foreign Investment Review Board (FIRB) from June 2006 to June 2019. The June 2019 to June 2020 report was not available at the time of this writing (in 2020), as reports are not made available until the following year. Data is provided yearly at a NSW level, which is converted to monthly (simply dividing by 12). Again, data in this area is sparse, so this is the closest estimate we could derive. This data is provided in Table 4, and the average approval per year is given in Fig. 20.

Period Number Approved Value Approved Average Per Approval

Figure 20: Average Overseas Approval Amount (x-axis: Year; y-axis: Average Approval, $0k to $3.50M).

I.2 Mortgage Rates

Mortgage rates are those set by the RBA. The final training month's mortgage rate is used throughout the testing period, since no real value can be read.

References

Alhashimi H, Dwyer W (2004) Is there such an entity as a housing market. In: 10th Annual Pacific Rim Real Estate Conference (press), Bangkok

Arcaute E, Molinero C, Hatna E, Murcio R, Vargas-Ruiz C, Masucci AP, Batty M (2016) Cities and regions in Britain through hierarchical percolation. Royal Society Open Science 3(4):150691

Axtell R, Farmer D, Geanakoplos J, Howitt P, Carrella E, Conlee B, Goldstein J, Hendrey M, Kalikman P, Masad D, et al. (2014) An agent-based model of the housing market bubble in metropolitan Washington, DC. In: Whitepaper for Deutsche Bundesbank's Spring Conference on “Housing markets and the macroeconomy: Challenges for monetary policy and financial stability”

Bahadir B, Mykhaylova O (2014) Housing market dynamics with delays in the construction sector. Journal of Housing Economics 26:94–108

Bangura M, Lee CL (2019) The differential geography of housing affordability in Sydney: a disaggregated approach. Australian Geographer 50(3):295–313

Bangura M, Lee CL (2020) Housing price bubbles in Greater Sydney: evidence from a submarket analysis. Housing Studies pp 1–36

Baptista R, Farmer JD, Hinterschweiger M, Low K, Tang D, Uluc A (2016) Macroprudential policy in an agent-based model of the UK housing market. Bank of England working papers 619, Bank of England, URL http://dx.doi.org/10.2139/ssrn.2850414

https://doi.org/10.21105/joss.00097

House of Representatives Standing Committee on Economics (2014) Report on Foreign Investment in Residential Real Estate. The Parliament of the Commonwealth of Australia

Huang Y, Ge J (2009) House prices and the collapse of stock market in mainland China? - an empirical study on house price index. In: Pacific Rim Real Estate Conference

Iggulden T (2014) ABS admits data on foreign real estate buyers is 'hit and miss'. URL