An internal fraud model for operational losses in retail banking

Rocío Paredes ∗ Marco Vega †

This version: June 13, 2019

Abstract

This paper develops a dynamic internal fraud model for operational losses in retail banking. It considers public operational losses arising from internal fraud in retail banking within a group of international banks. Additionally, the model takes into account internal factors such as the ethical quality of workers and the risk controls set by bank managers. The model is validated by measuring the impact of macroeconomic indicators such as GDP growth and the corruption perception upon the severity and frequency of losses implied by the model. In general, results show that internal fraud losses are pro-cyclical, and that country-specific corruption perceptions positively affect internal fraud losses. Namely, when a country is perceived to be more corrupt, retail banking in that country will feature more severe internal fraud losses.

JEL Classification: C30, C35, G21
Keywords: Operational risk; Internal fraud; Retail banking

∗ Centrum Católica Graduate Business School and Pontificia Universidad Católica del Perú.
† Banco Central de Reserva del Perú and Pontificia Universidad Católica del Perú.

Introduction

This paper studies operational losses due to internal fraud in the retail segment of banks. Internal fraud is defined as operational losses due to acts that involve at least one internal party aimed at defrauding, misappropriating property, or circumventing regulations, the law, or company policies (BCBS 2006). As Kochan (2013) suggests, internal fraud is one of the fastest-growing and most complex criminal threats to financial organizations. This type of threat from insiders takes various forms because fraud can occur at any level of the administrative ladder, from junior employees up to chief executives. Internal fraud events are due to factors such as worker compensation, culture, or macroeconomic conditions (Jarrow 2008).

Retail banking is a traditional, universal type of banking involving payment services (debit cards), short-term unsecured loans (credit cards), money management facilities (current accounts), savings, loans, and mortgages. According to ORX (2012), retail banking experiences the largest number of operational loss events, 59% for the period 2006-2010, increasing to 65% in 2011. In addition, the gross losses in retail banking are the most severe across business lines, representing up to 37% of total losses by business line.

The nexus between internal fraud and retail banking has not been explored much in the literature. Given the importance of the retail segment in the banking business, it is important to shed some light on the topic by laying out a dynamic model of loss generation in the internal fraud - retail banking cell. The model is set to mimic the aggregate behaviour of these losses in the ORX database. In addition, other specific information about banks that participate in the ORX consortium is obtained from different outlets such as financial service authorities and their corporate Web pages. The model is validated by measuring how observed macroeconomic variables outside the model affect the loss severities and frequencies. The paper finds that there is a strong association between macro variables and fraud losses in retail banking.

The model can be used as an operational risk management tool. Scenario analysis and operational risk capital simulations are straightforward to perform. For example, the model can be used to evaluate different data aggregation techniques in dealing with consortium data.

The Operational Riskdata eXchange Association (ORX) is a global data sharing association whose members comprise the biggest banks in the world.

To the best of our knowledge, the operational risk management literature has not provided a quantitative model for describing internal fraud in the financial sector; this study is the first to contribute in this direction. The internal fraud model borrows insights from a number of disciplines such as corporate governance, behavioral economics, human resources, and operational risk.

Related literature

In the operational risk literature, there is a group of papers that build quantitative models to explain the outbreak of operational loss events. Kühn and Neu (2003) study models that generate operational losses in banks through network dynamics that lead to the occurrence of risk events in an environment where banks make efforts to mitigate operational losses. Leippold and Vanini (2005) use functional dependence modeling to extend the work of Kühn and Neu (2003, 2004) by including fixed and stochastic costs that arise in case of operational risk events. Based on this strand of the literature, Bardoscia and Bellotti (2011) model the amount of operational losses recorded at a certain time in a certain process.

Our paper differs from the above studies in the scope of loss generation. Our model focuses on internal fraud in retail banking whereas the aforementioned studies are more general and thus take into account the network structure of operational losses.

Other papers like Fragnière et al. (2010), Hatzakis et al. (2010) or Weiss and Winkelmann (2011) introduce the quality and quantity of the workforce as a source of risk. The importance of human capital in the operational loss process of financial institutions accords with the idea that the key process of a bank is the handling of information. Banking is known to be a knowledge-intensive business process (Weiss and Winkelmann 2011). This calls for a modeling approach that takes the quantity and quality of employees into account.

Our paper shares the insights of the managerial approach in Fragnière et al. (2010); however, this latter study is focused on the planning of workforce capacity and does not touch on factors that generate operational losses due to internal fraud.

Our paper is also related to the literature that studies how macroeconomic and macro-institutional factors affect operational losses, such as Allen and Bali (2007), Chernobai et al. (2011), Moosa (2011), Cope et al. (2012), Li and Moosa (2015), Stewart (2016), Abdymomunov et al. (2017) and Wagner et al. (2017). Some of these papers stress the relationship between macroeconomic activity and operational losses. For instance, Allen and Bali (2007) and Wagner et al. (2017) provide evidence that loss severities are pro-cyclical. The evidence is not conclusive. Abdymomunov et al. (2017) find evidence for greater aggregate losses in a downturn and Moosa (2011) finds that severity is positively associated with the unemployment rate. On the other hand, Li and Moosa (2015) find that loss severities are positively related to the size of the economy.

The above evidence takes operational losses in general. Closest to our paper is Stewart (2016), which provides evidence that bank fraud and economic activity are positively related. Also, Cope et al. (2012) find that internal fraud losses are strongly and positively associated with countries' legal frameworks that favor insider trading and are negatively associated with country-specific constraints on executive power in banks. Last, Chernobai et al. (2011) find that the frequency of internal fraud losses in financial institutions depends positively on features such as market value, fast liability growth and financial distress, while it is negatively related to default risk and the age of firms.

Our paper also studies the relationship between macroeconomic variables and operational losses but, unlike the above papers, it does not use observed operational loss data but simulated internal fraud losses that mimic the ORX data. The association between those macro variables and operational losses serves to validate the model and provides first evidence pointing to a positive link between the corruption culture in a country and internal fraud in retail banking.

The rest of the paper is organized as follows: Section 2 lays out the dynamic model for internal fraud losses, Section 3 describes the publicly available data used in the paper, Section 4 focuses on the calibration and regression results, and Section 5 concludes.

We set up a dynamic model for operational loss occurrence due to internal fraud in the retail segment of banks. The model calibration aims to mimic the first moments of aggregate severity and frequency of operational losses drawn from ORX database reports. The model takes into account the specific factors that trigger internal fraud losses in each of the banks that took part in the ORX consortium during the period 2006-2010.

The stochastic model aims to explain operational losses due to internal or insider fraud in the retail-banking context within each financial institution in terms of a set of conditioning factors. The main equation in the model is given by

$$l_{i,\tau} = \alpha_{i,0} \times \mathrm{ramp}\left(\alpha_{i,1} + \alpha_{i,c}\, c_{i,\tau} + \alpha_{i,y}\, y_{i,\tau} + \alpha_{i,q}\, q_{i,\tau} + \xi_{i,\tau}\right) \tag{1}$$

where $l_{i,\tau}$ stands for an internal fraud loss in retail banking at bank $i$ at moment $\tau$. The Greek letter subscript $\tau$ denotes moments of time during a given year $t$. In practice, it can represent days or hours within a year.

The variable $c_{i,\tau}$ is the investment or effort made by bank $i$ to avoid the operational loss, or it can measure the level of internal controls. This variable can be measured as the share of monetary resources devoted to risk management and control and can be expressed as a percentage of operating costs. Higher standards of internal risk controls ($c_{i,\tau}$ high) imply that the likelihood of operational loss events is reduced. Internal fraud events are somewhat more controllable than losses originating from external sources (Chernobai et al. 2011). This control aspect of operational losses is outlined, for example, in Kochan (2013).

The amount $y_{i,\tau}$ represents the scale of production in the business line; for retail banking, it can represent the number of transactions with bank clients, or it can represent retail income. A higher number of transactions implies that the likelihood of operational losses increases. In the theory of people risk management, this scale level $y_{i,\tau}$ is a proxy for internal and external interactions, which give rise to operational losses due to fraud (Blacker and McConnell 2015): the bigger the scale of the business, the higher the number of interactions. In an environment of increased employee interaction, fraud risks rise.

Variable $q_{i,\tau}$ measures the ethical quality of employees. High internal ethical standards mean that losses due to internal fraud are less likely to occur.
The ethical quality of workers is different from the technical quality of workers, which is measured directly by worker productivity (e.g., gross income per worker). Therefore, the quantity and quality of human capital proposed by Fragnière et al. (2010) are key determinants of operational losses in retail banking.

From the variables explained so far, the volume of gross income $y_{i,\tau}$ is directly observable. Information about this variable can be gathered from the annual reports of each of the banks in the ORX dataset. On the other hand, the level of controls $c_{i,\tau}$ and the quality of employees $q_{i,\tau}$ are not directly observable. Specific feedback equations are required to model both variables to elicit their unobserved values and see how they quantitatively affect the generation of losses through Eq. (1).

The last variable left to be explained in Eq. (1) is $\xi_{i,\tau}$. This variable represents unknown factors or shocks that can potentially trigger losses. This random variable is assumed to be autocorrelated and heteroskedastic. The idea that loss shocks are autocorrelated and may exhibit volatility clustering is similar to what Chernobai and Yildirim (2008) and Guégan and Hassani (2013) suggested. In particular

$$\xi_{i,\tau} = \rho_i\, \xi_{i,\tau-1} + \sigma_{i,\tau}\, \mu_{i,\tau}, \qquad \mu_{i,\tau} \sim N(0, 1), \qquad \sigma^2_{i,\tau} = \beta_{0,i} + \beta_1\, \xi^2_{i,\tau-1} + \beta_2\, \sigma^2_{i,\tau-1} \tag{2}$$

The coefficient $\rho_i \in [0, 1]$ measures the level of autocorrelation or the persistent nature of shocks that may trigger losses. The error term $\sigma_{i,\tau}\, \mu_{i,\tau}$ is heteroskedastic by virtue of the time-varying nature of the variance term. The variance term $\sigma^2_{i,\tau}$, also known as conditional variance, depends on past shock realizations as well as past variance itself.

Eq. (1) also calls the function $\mathrm{ramp}(\cdot)$, which represents the mapping from operational loss factors to loss severities. This function has the feature of generating zero losses most of the time and positive loss severities at other times. Formally, the ramp function is defined by

$$\mathrm{ramp}(x) = \begin{cases} x & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$$

The way conditional variance behaves is called generalized autoregressive conditional heteroscedasticity (GARCH), as proposed in Bollerslev (1986).

Kühn and Neu (2004) and Bardoscia and Bellotti (2011) used the same type of function.
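As an illustration of how Eqs. (1) and (2) fit together, the following minimal sketch simulates the shock process for a single bank and maps it into loss severities through the ramp function. It assumes the conditional variance depends on the squared lagged shock and the lagged variance, as the text describes. The names `alpha0` and `alpha1` (for the scale and the constant inside the ramp), the start-up variance, and every numerical value are assumptions made for this example only, not the paper's calibration.

```python
import numpy as np


def ramp(x):
    """Ramp function: x where x >= 0, and 0 otherwise."""
    return np.maximum(x, 0.0)


def simulate_shocks(n_steps, rho, beta0, beta1, beta2, rng):
    """Autocorrelated shocks with GARCH(1,1)-type variance, following Eq. (2)."""
    xi = np.zeros(n_steps)
    var = np.full(n_steps, float(beta0))  # start-up variance beta0 is an assumption
    xi[0] = np.sqrt(var[0]) * rng.standard_normal()
    for t in range(1, n_steps):
        # Conditional variance from the lagged squared shock and lagged variance.
        var[t] = beta0 + beta1 * xi[t - 1] ** 2 + beta2 * var[t - 1]
        xi[t] = rho * xi[t - 1] + np.sqrt(var[t]) * rng.standard_normal()
    return xi


def losses(alpha0, alpha1, alpha_c, alpha_y, alpha_q, c, y, q, xi):
    """Loss severities from Eq. (1); c, y, q may be scalars or arrays shaped like xi."""
    index = alpha1 + alpha_c * c + alpha_y * y + alpha_q * q + xi
    return alpha0 * ramp(index)


# Illustrative values only; none of them come from the paper's calibration.
rng = np.random.default_rng(0)
xi = simulate_shocks(250, rho=0.3, beta0=1.0, beta1=0.1, beta2=0.8, rng=rng)
loss = losses(alpha0=1000.0, alpha1=-3.0, alpha_c=-1.0, alpha_y=0.5,
              alpha_q=-1.0, c=0.5, y=1.0, q=0.5, xi=xi)
print(f"share of periods with a positive loss: {np.mean(loss > 0):.2f}")
```

A negative constant inside the ramp keeps the index below zero in most periods, which is one simple way to reproduce the "zero losses most of the time" behaviour described above.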

To finish the description of Eq. (1), all the coefficients ($\alpha_{i,0}$, $\alpha_{i,1}$, $\alpha_{i,c}$, $\alpha_{i,y}$, $\alpha_{i,q}$) vary across banks, but they are constant through time. This reflects the fact that operational loss occurrences are sensitive to each of the factors described, which are idiosyncratic for each bank. The levels of operational risk control ($c_{i,\tau}$) and the quality of the workforce ($q_{i,\tau}$) need to be modeled.

The level of operational risk controls ($c_{i,\tau}$) obeys a feedback equation whereby the controls or efforts by risk managers to prevent or mitigate operational losses depend on the observable state of the system. Control is a fundamental aspect of risk management; the current ISO standard defines risk management as “coordinated activities to direct and control an organization with regard to risk” (ISO 2009).

In this study, the levels of controls are represented by a sufficient statistic denoted by $c_{i,\tau}$. The learning or improved control process depends on the level of risk. This idea is common in stochastic control environments and follows the example of Cooke and Rohleder (2005), who proposed a very general feedback model of operational risk. Our paper incorporates this idea and is explicit about the level of risk that feeds back onto the control level. The feedback control equation can be expressed as

$$c_{i,\tau} = \rho_c\, \frac{2 c^*_i}{1 + \exp\left(\gamma_i \left(\frac{\hat{L}_{i,\tau-1}}{\hat{Y}_{i,\tau-1}} - \lambda_i\right)\right)} + (1 - \rho_c)\, c_{i,\tau-1} \tag{3}$$

where $c^*_i$ stands for the optimal level of controls associated with a benchmark loss ratio $\lambda_i$ when the actual loss ratio is given by $\hat{L}_{i,t} / \hat{Y}_{i,t}$. The control level at bank $i$ depends on the observed key risk performance given by the ratio of cumulative average observed losses over cumulative income. If the observed loss ratio is beyond the desired level, with $\gamma_i < 0$, control levels need to be adjusted upward. The degree of the actual adjustment depends on the parameter $\rho_c \in [0, 1]$: the higher $\rho_c$, the quicker the control response is. In the opposite case, when $\rho_c$ is small, the control level is mostly governed by its previous value.

In Eq. (3), $\hat{L}_{i,\tau-1}$ stands for the cumulative average observed losses up to the previous time, while $\hat{Y}_{i,\tau-1}$ corresponds to cumulative gross retail income obtained in the same period. Capital letters stand for average quantities, while the circumflex denotes that the variable is the observable counterpart of an unobservable underlying variable. In the case of operational losses, this distinction is important. An observed loss amount in a bank $i$ at time $\tau$ is $\hat{l}_{i,\tau}$ whereas the true loss is $l_{i,\tau}$. Eq. (3) means that banks that experience a history of large losses relative to other banks will learn from the incidents and therefore increase their controls to levels above average. This idea is also suggested in Lukic et al. (2013).

On the other hand, the quality of the workforce ($q_{i,\tau}$), which refers to ethical traits that drive workers' behavior toward the bank, is a measure of the propensity to commit fraud. For example, an employee can be extremely knowledgeable of internal processes at the bank and so be highly productive, but good knowledge of internal processes may make it easy to commit fraud (Cummings et al. 2012). The equation that describes the ethical quality of workers is

$$q_{i,\tau} = \rho_q\, \frac{2 \bar{Q}}{1 + \exp\left(\delta_i\, (a_{i,\tau} - \bar{A}_\tau)(e_{i,\tau} - \bar{E}_\tau)\right)} + (1 - \rho_q)\, q_{i,\tau-1} \tag{4}$$

where $a_{i,\tau}$ stands for measured technical quality (labor productivity), $\bar{A}_\tau$ is the cross-bank average labor productivity, $e_{i,\tau}$ is the number of employees per branch at bank $i$, and $\bar{E}_\tau$ is the cross-bank average of employees per branch. Given that $\delta_i > 0$, the sign of the impact effect of an increase in technical quality is given by

$$\frac{\partial q_{i,\tau}}{\partial a_{i,\tau}} \begin{cases} < 0 & \text{if } e_{i,\tau} > \bar{E}_\tau \\ \geq 0 & \text{if } e_{i,\tau} \leq \bar{E}_\tau \end{cases}$$

When there are few workers, increasing productivity is more likely associated with high ethical quality because it is easier for banks to screen workers before and after recruitment. When the number of workers is high, the workforce screening process is weaker. Due to the symmetry of Eq. (4), it is also true that

$$\frac{\partial q_{i,\tau}}{\partial e_{i,\tau}} \begin{cases} < 0 & \text{if } a_{i,\tau} > \bar{A}_\tau \\ \geq 0 & \text{if } a_{i,\tau} \leq \bar{A}_\tau \end{cases}$$

which means that an increase in the number of workers harms the ethical quality of workers when the average technical productivity of workers is already high.

Eq. (4) incorporates, in an explicit way, two concepts in the theory of people risk management. First, the basic fraud model based on Cressey's fraud triangle (see Cressey 1953) asserts that fraud has three elements: motivation or pressure to commit fraud, the opportunity to commit fraud, and the rationalization or justification that fraudsters make to themselves to commit fraud. An employee, often in dire financial straits, using his or her insider information about the firm's control system, redirects funds to other sources.

The insider information about the control environment is possible if the employee possesses knowledge about many processes in the bank. In this study, this knowledge is approximated by the technical productivity of workers in firm $i$ at time $\tau$: $a_{i,\tau}$.

The second key fraud theory concept embedded in Eq. (4) refers to interactions in the firm. In the words of Blacker and McConnell (2015): “Inappropriate interactions between individuals inside and outside of the firm give rise to People Risk” (p. 121). Blacker and McConnell underscore the qualitative nature of employee interactions.
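The two feedback rules can be sketched as one-step update functions. This is a hedged illustration, not the authors' implementation: it assumes that Eq. (4) has the same logistic shape as Eq. (3), which is the form consistent with the stated signs of the partial derivatives when $\delta_i > 0$. All function names, argument names and numerical values below are hypothetical.

```python
import math


def update_controls(c_prev, loss_ratio_prev, c_star, lam, gamma, rho_c):
    """One step of the control feedback rule in Eq. (3)."""
    # Logistic target: equals c_star when the observed loss ratio equals lam.
    target = 2.0 * c_star / (1.0 + math.exp(gamma * (loss_ratio_prev - lam)))
    return rho_c * target + (1.0 - rho_c) * c_prev


def update_quality(q_prev, a, a_bar, e, e_bar, q_bar, delta, rho_q):
    """One step of the ethical-quality rule in Eq. (4), assumed logistic form."""
    # Target equals q_bar when productivity or staffing sits at the cross-bank mean.
    target = 2.0 * q_bar / (1.0 + math.exp(delta * (a - a_bar) * (e - e_bar)))
    return rho_q * target + (1.0 - rho_q) * q_prev


# Illustrative values only (not calibrated): gamma < 0 and delta > 0 as in the text.
c_next = update_controls(c_prev=0.5, loss_ratio_prev=0.05, c_star=0.5,
                         lam=0.02, gamma=-50.0, rho_c=0.3)
q_next = update_quality(q_prev=0.5, a=12.0, a_bar=10.0, e=20.0, e_bar=15.0,
                        q_bar=0.5, delta=0.1, rho_q=0.3)
print(c_next > 0.5, q_next < 0.5)
```

With a loss ratio above the benchmark the control level moves above `c_star`, and with both productivity and employees per branch above their cross-bank averages the ethical quality moves below `q_bar`, matching the qualitative behaviour described in the text.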
In this study, it is argued that the qualitative level of interactions (inappropriate or bad) increases with the quantitative number of interactions, which should be proportional to the number of employees per branch $e_{i,t}$ at bank $i$ during period $t$. Therefore, Eq. (4) shows that the ethical quality of employees (inverse of the propensity to commit fraud) falls when both opportunities for fraud and the number of inappropriate interactions rise, as suggested by the theory of people risk.

After losses $\{l_{i,\tau}\}$ for the set of banks $i = \{1, \dots, N\}$ at high frequencies $\tau = \{1, \dots, T\}$ are generated by the stochastic dynamic system given by Eqs. (1) to (4), the loss data have to be recorded and submitted to the pooled database.

Of note is that data recorded to build the loss datasets are not the same as the original loss data $\{l_{i,\tau}\}$ for a number of reasons. For example, the existence of recording thresholds indicates that only losses greater than a threshold level $l^{\min}_i$ are submitted to the dataset. Moreover, when a loss event occurs, banks do not necessarily know the exact loss amount incurred. There is a natural lag between occurrence of an event that involves loss and knowledge of the severity of the event. The lag depends on the specific nature of the event. For the purposes of this research, it is assumed that the severity of the event is known at the same time as its occurrence but that the knowledge is imperfect and subject to measurement errors. Therefore, the observed dataset process implies

$$\hat{l}_{i,\tau} = l_{i,\tau} + \eta_{i,\tau} \tag{5}$$

where $\hat{l}_{i,\tau}$ is the observed loss severity, $l_{i,\tau}$ is the true unobserved loss severity (underlying loss) and $\eta_{i,\tau}$ is an unbiased, white noise measurement error distributed normally, $\eta_{i,\tau} \sim N(0, \sigma_{\eta i})$. The measurement errors are independent over time and across banks, but heterogeneity across banks is allowed.
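The recording step can be illustrated with a short sketch that adds the measurement error of Eq. (5) and then keeps only the records above the reporting threshold. It is a simplified illustration under stated assumptions: `sigma_eta` is treated as a variance, the threshold is applied to the observed (noisy) value, and the function name and numbers are hypothetical.

```python
import numpy as np


def record_losses(true_losses, sigma_eta, l_min, rng):
    """Add Gaussian measurement error (Eq. 5) and keep records above l_min."""
    # sigma_eta is treated as a variance, so the standard deviation is its root.
    noise = rng.normal(0.0, np.sqrt(sigma_eta), size=true_losses.shape)
    observed = true_losses + noise
    return observed[observed > l_min]


# Illustrative values only; with zero error variance the filter is deterministic.
rng = np.random.default_rng(0)
true_losses = np.array([0.0, 5.0, 50.0, 500.0])
recorded = record_losses(true_losses, sigma_eta=0.0, l_min=20.0, rng=rng)
print(recorded)
```

Because the filter acts on noisy values, a true loss just below the threshold can enter the dataset and one just above it can drop out once `sigma_eta` is positive.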
The losses submitted to the pooled dataset and used for internal purposes are described by

$$D_i = \{\hat{l}_{i,\tau} \mid \hat{l}_{i,\tau} > l^{\min}_i\} \quad \text{for } \tau = 1, \cdots, T \tag{6}$$

In the data collection process, both $\sigma_{\eta i}$ and $l^{\min}_i$ are assumed to be exogenous. In fact, the value of $l^{\min}_i$ determined for all member banks of the ORX association is €[…].

The parameters that specify the stochastic dynamic operational loss model described above are not free. It is necessary to restrict the parameters to specific values. Standard estimation techniques such as linear regression cannot be conducted because there is no hard data and the model exhibits unobservable variables like control and quality of workers. This implies that the model parameters need to be calibrated. The operational loss simulation process is conditional on the value of these parameters.

Some parameters are specific to banks (idiosyncratic), so they have the subscript $i$ in their notation. Other parameters are common to all banks. The procedure for calibration of all parameters, general and specific, is described below. There are 13 idiosyncratic parameters per bank (see Table 1). Given that there are up to 52 banks, it would be necessary to pin down about 672 idiosyncratic parameters in total. Given the vast number of parameters to be calibrated, a very simple shrinkage procedure is introduced to reduce the number of parameters to be calibrated based on the available information, the period of analysis, and the specific banks under study. In essence, the shrinkage procedure used in this study takes into account the idiosyncratic data collected for each bank. These data proxy the degree of riskiness and heterogeneity of each bank and are used to map the heterogeneous values of the model parameters.

Table 1. Parameters of the Dynamic Model for Operational Losses

| Param. | Equation | Definition |
|---|---|---|
| $\alpha_{i,0}$ | Internal fraud losses | Overall scale of losses |
| $\alpha_{i,1}$ | Internal fraud losses | Constant within ramp function |
| $\alpha_{i,c}$ | Internal fraud losses | Impact of controls on losses |
| $\alpha_{i,y}$ | Internal fraud losses | Impact of gross operating income on losses |
| $\alpha_{i,q}$ | Internal fraud losses | Impact of quality of workers on losses |
| $\rho_i$ | Internal fraud losses | Autocorrelation of operational loss shocks |
| $\beta_{0,i}$ | Internal fraud losses | Constant in conditional variance of operational loss shocks |
| $\beta_1$ | Internal fraud losses | Influence of past shocks on conditional variance of operational loss shocks |
| $\beta_2$ | Internal fraud losses | Influence of past variance on conditional variance of operational loss shocks |
| $\rho_c$ | Loss control | Weight of new conditions to affect current controls |
| $c^*_i$ | Loss control | Control level associated with the desired operational loss ratio |
| $\gamma_i$ | Loss control | Controls the sensitivity of controls to the gap between the loss ratio and the desired ratio |
| $\lambda_i$ | Loss control | Desired loss ratio |
| $\rho_q$ | Ethical quality | Weight of new conditions to affect current ethical quality levels |
| $\delta_i$ | Ethical quality | Determines the sign of impacts from factors |
| $\bar{Q}$ | Ethical quality | Level of average ethical quality across banks |
| $\bar{A}$ | Ethical quality | Average labor productivity across banks |
| $\bar{E}$ | Ethical quality | Average number of employees by branch across banks |
| $\sigma_{\eta i}$ | Measurement error | Variance of measurement errors |

Table 2. Observable Variables that Condition the Simulation of Losses in Each Bank

| Nomenclature | Description | Type |
|---|---|---|
| $e_{i,t}$ | Number of employees | Idiosyncratic |
| $b_{i,t}$ | Number of branches and offices | Idiosyncratic |
| $a_{i,t}$ | Retail assets (millions of Euros) | Idiosyncratic |
| $y_{i,t}$ | Retail loans (millions of Euros) | Idiosyncratic |
| $m_{i,t}$ | Proxy for operational risk management awareness | Idiosyncratic |
| $h_{i,t}$ | Proxy for human resource awareness | Idiosyncratic |

The starting point for model calibration is the operational loss summary report presented in the ORX database (ORX 2012) for the period 2006-2010. In this report, there are 4,[…] €880 million.

Table A-1 in Appendix A shows the banks that reported losses to the ORX data exchange during the period 2006-2010. New members entered the association, and some members quit due to bankruptcies, mergers, or acquisitions. For example, Wachovia was a member of the association until acquired by Wells Fargo in 2008.

For each of the $N$ banks and years under analysis, a set of variables categorized as key risk indicators are gathered. These variables condition the occurrence of losses in the model or serve as useful devices to calibrate idiosyncratic parameters. The set of conditioning variables is described in Table 2. All the variables indicate risk exposure, such as the number of employees or the size of retail loans. These variables are useful devices to apply the shrinkage procedure because they discriminate among banks. For example, according to Cressey's fraud triangle explained before, banks that have more employees per branch relative to the mean among banks might be deemed riskier than those that have fewer employees per branch. Therefore, the dispersion of employees per branch across banks is useful to calibrate parameters across banks.

In contrast to the model specification in the preceding subsection, the observed variables are indexed by time ($t$), where $t$ stands for end-of-year variables. Idiosyncratic variables for the years 2006 through 2010 were obtained from annual reports that member banks published on their Web sites. The values of interest were extracted from the descriptive information, balance sheets, and income statements contained in the aforementioned reports. These reports are publicly available as part of the information disclosure by banks directed to investors. The financial statements in these reports are compatible with sound regulatory and accounting practices and, in the majority of cases, they accord with GAAP.

An example of the information recovered from these annual reports is given in Fig. (1).
The figure shows that the size of banks in the ORX dataset is heterogeneous. Each dot refers to a specific bank. The number of employees ranges from 10 to about 140 thousand, while the number of branches varies from 300 to about 10,000. In addition, both the number of employees and number of branches show some degree of correlation.

Figure 1. Number of branches and employees for banks in the ORX dataset

Note: Information extracted from banks' annual reports as of December 2006.

In the few cases when reports were not available, we got information from SEC filings.

In addition to the objective information included in the annual reports, proxy variables related to operational risk controls and the quality of human resources at each bank were constructed. Thus, variable $m_{i,t}$ measures operational risk management awareness implicit in the information shared with the public. This awareness proxy was obtained by means of textual analysis of the published annual reports; for example, the number of times a word or a phrase occurred within each report divided by the number of pages. An example of this type of textual information is given by Fig. (2), which depicts the ratio of word counts of the expression “operational risk” as a percentage of the total number of pages. This variable could arguably reflect the extent of awareness of each bank toward operational risk management.

Figure 2. Word counts of the expression “operational risk” as percentage of page counts in each 2010 annual report.

Note: Information extracted from banks' annual reports as of December 2010.
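The textual proxy described above (occurrences of an expression divided by the number of pages) can be sketched as follows. This is only an illustration: it assumes the report text has already been extracted into one string per page, and the function name and the case-insensitive, whitespace-normalized matching are choices made for this example rather than details given in the paper.

```python
import re


def phrase_rate(pages, phrase):
    """Occurrences of `phrase` per page, in percent, ignoring case."""
    if not pages:
        raise ValueError("the report must contain at least one page")
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = 0
    for page in pages:
        # Collapse line breaks and repeated spaces so wrapped phrases still match.
        text = re.sub(r"\s+", " ", page)
        hits += len(pattern.findall(text))
    return 100.0 * hits / len(pages)


# Toy four-page "report" with three mentions of the expression.
pages = [
    "Operational risk is managed centrally.",
    "No mention on this page.",
    "operational\nrisk and Operational Risk capital",
    "",
]
print(phrase_rate(pages, "operational risk"))
```

The same function could be applied to other expressions, such as an acronym, by changing the `phrase` argument.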

Other textual expressions that reflect operational risk awareness are analyzed, for example, the use of the acronym AMA.

The variable $h_{i,t}$ is intended to measure the awareness of banks about human resources. The annual reports also contain information about policies geared to improve the management […]

The idea of extracting information from textual sources is not new in finance (Kearney and Liu 2014).

[…] $\tau$ between any consecutive years $t$ and $t + 1$. It is assumed that time $\tau$ will refer to business days within years. The existence of holidays was excluded in these calculations.

Five parameters affect the outbreak of operational losses in Eq. (1). All these parameters are idiosyncratic; therefore, it was necessary to devise a way to calibrate all of them in a simple form. Let $\alpha_{i,j}$ be a parameter in Eq. (1) for $i = \{1, \cdots, N\}$ and $j = \{0, 1, c, y, q\}$. Then for each $j$, there exists a mean parameter value taken from the cross section of banks, $\bar{\alpha}_j = \frac{1}{N} \sum_{i=1}^{N} \alpha_{i,j}$. By pinning down the value of $\bar{\alpha}_j$, it is possible to pin down the idiosyncratic parameters of interest $\alpha_{i,j}$.

To complete the process, it is necessary to work with risk exposure indicators calculated from the data shown in Table 2. Let these indicators be defined by $x_{i,t}$, for example, the ratio of employees per branch or the technical quality of workers (labor productivity) measured as the ratio of total retail loans to the number of employees. These measures can be calculated for each financial institution and for each calendar year in the sample. Fig. (3) depicts a histogram of the number of employees per branch at the end of 2006 in all banks belonging to the dataset by the end of that year.

Next, let $\bar{x} = \frac{1}{NT} \sum_{t=1}^{T} \sum_{i=1}^{N} x_{i,t}$ be the overall average risk indicator and $\bar{x}_i = \frac{1}{T} \sum_{t=1}^{T} x_{i,t}$ be the bank $i$ average.
If there are grounds to postulate direct proportionality between the coefficient $\alpha_{i,j}$ and the indicator $\bar{x}_i$, then each parameter can be pinned down according to

$$\alpha_{i,j} = \left(\frac{\bar{x}_i}{\bar{x}}\right) \bar{\alpha}_j \quad \text{for } i = 1, \cdots, N \tag{7}$$

For example, for parameters $\alpha_{i,0}$, $\alpha_{i,1}$, and $\alpha_{i,y}$, the choice of $x_{i,t} = e_{i,t} / b_{i,t}$ is a reasonable option. This means that loss event sensitivities are correlated with the ratio of employees to branches. In this case, only the parameter $\bar{\alpha}_j$ need be calibrated. Once its value is determined, Eq. (7) fixes the distribution of $\alpha_{i,j}$ parameters across banks.

Figure 3. Histogram of the number of employees per branch across banks in the ORX database as of December 2006.

Note: Information extracted from banks' annual reports as of December 2006.

The parameters $\alpha_{i,c}$ and $\alpha_{i,q}$ are likely to be inversely proportional to the ratio $x_{i,t} = e_{i,t} / b_{i,t}$. For example, controls are more effective when there are fewer people working at branches. The adjustment can be made according to

$$\alpha_{i,j} = \left(\frac{\bar{x}}{\bar{x}_i}\right) \bar{\alpha}_j \quad \text{for } i = 1, \cdots, N \tag{8}$$

Parameters $\rho_i$ and $\beta_{0,i}$ can be adjusted in the same fashion. In the case of the autocorrelation of shocks $\rho_i$, it is necessary to set bounds $\rho_{\min} > 0$ and $\rho_{\max} < 1$, such that the resulting operation $\rho'_i = \frac{\bar{x}_i}{\bar{x}} \rho$ can be further modified to become bounded within the range $[\rho_{\min}, \rho_{\max}]$. To do so, it is first necessary to calculate $\rho'_{\min}$ and $\rho'_{\max}$ with the parameters obtained given $\rho$ and then to apply

$$\rho_i = \rho_{\min} + \left(\frac{\rho_{\max} - \rho_{\min}}{\rho'_{\max} - \rho'_{\min}}\right) (\rho'_i - \rho'_{\min}) \quad \text{for } i = 1, \cdots, N \tag{9}$$

The parameter $\beta_{0,i}$ controls for the unconditional variance of operational loss shocks in each bank, as shown in Eq. (2). The dispersion of this parameter across banks can also apply the principle underlying Eq. (7).

Three idiosyncratic parameters affect the level of controls ($c^*_i$, $\lambda_i$, $\gamma_i$). The first parameter measures the long-run value of control levels. Current control levels may be stricter or easier than this long-run benchmark, which also needs to be in the range $[0, 1]$. The variable $m_{i,t}$ that measures operational risk management awareness can be used in the same fashion as in the calibration of the dispersion of parameter $\rho_i$.

Parameter $\lambda_i$ reflects the operational loss level as a ratio of operating income that banks are ready to accept. Operational loss ratios larger than this benchmark $\lambda_i$ prompt banks to increase their controls. Calibration of this parameter for each bank is problematic because the available information does not provide reasonable proxies for this ratio. Therefore, this paper assumes that this ratio is similar across all banks. Its level is determined by the median 2008 ratio of cumulative operational losses to gross operating income provided by Benyon (2008).

Parameter $\gamma_i$ is a feedback adjusting parameter. From Eq. (3), it is easy to note that for controls to tighten whenever the operational loss ratio increases, the parameter $\gamma_i$ has to be negative. The greater $\gamma_i$ is in absolute value, the greater the impact on control levels.
Again, it is reasonable to assume that the absolute value of $\gamma_i$ is directly proportional to the level of operational risk awareness $m_{i,t}$; thus, the calibration of the dispersion in $\gamma_i$ applies the same steps outlined above.

In terms of the ethical quality of human resources in a bank, the parameter $\delta_i$ measures how sensitive each bank's workforce quality is to the bank's size and labor productivity. Eq. (4) assumes that size and labor productivity may be detrimental to workers' ethical quality. Therefore, parameter $\delta_i$ is positive, and the higher it is, the more sensitive ethical quality becomes. The sensitivity may be inversely related to the human resource awareness proxy $h_{i,t}$ extracted from the data. Human resource awareness reflects how much importance a bank gives to its workforce in terms of, for example, well-being, compensation, and on-the-job training.

The last idiosyncratic parameter whose calibration needs a shrinkage procedure is the variance $\sigma_{\eta i}$ of the measurement error in recording the severity of operational losses. It is reasonable to assume that higher severity levels are associated with higher measurement error variances. To reflect this feature, the study used the aggregate operational loss figures reported in Benyon (2008). The losses reported by Benyon refer to the aggregate of all types of operational losses, not just retail banking. A summary of these data is depicted in Fig. (4), which shows an extreme asymmetry of operational loss severities; in fact, two important modes appear, one for losses of less than €10 million and one for losses larger than €100 million. It is assumed that banks facing large loss severities are likely to have large variances in their measurement errors when they record operational losses. Therefore, the dispersion in $\sigma_{\eta i}$ will be calibrated by the dispersion of loss severities documented in Benyon (2008).

Figure 4. Histogram of aggregate 12-month cumulative operational losses for October 2008 in banks in the ORX database.

Note: For each bank, average percentages are reported. Ranges of losses are expressed in millions of Euros. The continuous line is the histogram; the dashed line is a polynomial smoothing of the original histogram. Data were obtained from Benyon (2008).
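The two rescaling rules in Eqs. (8) and (9) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names, the three employees-per-branch ratios, and the use of one average ratio per bank are hypothetical and do not come from the paper; the mean values and bounds are borrowed from the calibration table for illustration only.

```python
from typing import List, Sequence


def scale_inverse(x: Sequence[float], mean_param: float) -> List[float]:
    """Eq. (8): alpha_i = (x_bar / x_i) * alpha_bar, inversely proportional to x_i."""
    x_bar = sum(x) / len(x)
    return [x_bar / xi * mean_param for xi in x]


def scale_bounded(x: Sequence[float], mean_param: float,
                  lower: float, upper: float) -> List[float]:
    """Eq. (9): proportional scaling, then a linear map onto [lower, upper]."""
    x_bar = sum(x) / len(x)
    raw = [xi / x_bar * mean_param for xi in x]  # rho'_i in the text
    raw_min, raw_max = min(raw), max(raw)
    if raw_max == raw_min:
        # All banks identical: no dispersion to map, so use the midpoint.
        return [(lower + upper) / 2.0 for _ in x]
    slope = (upper - lower) / (raw_max - raw_min)
    return [lower + slope * (r - raw_min) for r in raw]


# Hypothetical average employees-per-branch ratios for three banks.
ratios = [10.0, 20.0, 30.0]
alphas = scale_inverse(ratios, mean_param=0.052)
rhos = scale_bounded(ratios, mean_param=0.70, lower=0.50, upper=0.90)
```

By construction, the bank with the smallest ratio receives the lower bound and the bank with the largest ratio receives the upper bound, so the mean parameter only matters for Eq. (9) through the ordering it induces.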

After the shrinkage procedure is executed, the space of parameters to calibrate shrinks to 20. The mean value $\bar{\alpha}_j$ of the parameters in the operational loss equation (Eq. 1) was calibrated to generate loss severities compatible with the 2012 summary of the database for retail-banking losses due to internal fraud (ORX 2012). The mean value of the parameter $\bar{\beta}_0$ was set to pin down the frequency of losses documented in the ORX summary. Parameters $\bar{\rho}$, $\beta_1$ and $\beta_2$ determine the clustering pattern of operational loss shocks. According to Chernobai and Yildirim (2008), internal fraud losses exhibit low clustering as opposed to other types of losses. Hence, the calibration assigns relatively low values to these parameters, in the range between 0.01 and 0.1.

Two parameters measure speed of response. First, $\rho_c$ measures how quickly internal controls are implemented to achieve a new level after new conditions arise and are expected to persist. Second, $\rho_q$ measures how fast the average ethical quality of the workforce achieves a new level when conditioning factors change and are expected to last.

Regarding $\rho_c$, in broad terms, control levels do not change from one day to another, and some changes necessary to implement control adjustments may require budgeting, planning, and extra human resources. In a given year, the average worst scenario would be to wait half a year to implement full changes. Thus, if the number of business days in a year is about 260 and the number of working days to implement a new long-run control level is 130, approximately 65 days are necessary for implementing half the changes. Due to the auto-correlated nature of controls in Eq. (3) and a half-life of $\tau' = 65$ days, the parameter $\rho_c$ is then set to the value $\rho_c = 1 - (1/2)^{1/\tau'} \approx 0.011$. The parameter $\rho_q$ would have to be lower than the benchmark of 0.011.

Two parameters need to be determined in the control equation (Eq. 3), namely the mean value of long-run control levels $\bar{c}^*$ and the mean sensitivity of control to the loss ratio $\bar{\gamma}$.
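The half-life argument for $\rho_c$ can be checked numerically. The sketch below assumes a partial-adjustment reading of the control equation, in which a fraction $\rho_c$ of the remaining gap to the new long-run level is closed each business day; that reading, and the function name, are assumptions made for illustration.

```python
def adjustment_speed(half_life_days: float) -> float:
    """Daily adjustment weight rho such that the gap halves after half_life_days."""
    return 1.0 - 0.5 ** (1.0 / half_life_days)


rho_c = adjustment_speed(65)        # 65 days = half of the 130 working days
remaining = (1.0 - rho_c) ** 65     # share of the gap still open after 65 days
print(round(rho_c, 3), round(remaining, 3))
```

With a 65-day half-life the weight rounds to 0.011, and the remaining gap after 65 days is one half, as the definition requires.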
In the ethical quality equation (Eq. 4), there are four parameters to be set; two of them are the average labor productivity $\bar{A}$ across banks and the average number of employees per branch across banks $\bar{E}$, both of which are readily estimable from the data.

All the remaining parameters, grouped in the vector $(\bar{c}^*, \bar{\gamma}, \bar{\delta}, \bar{Q}, \bar{\sigma}_\eta)$, are set freely with the hope that, given their benchmark values, the simulated loss severities and frequencies are associated with external observed factors such as macroeconomic and institutional variables.

Table 3. Data Gathered from Public Sources about Banks in the ORX Exchange

| Key | Concept | Measure |
|---|---|---|
| No | Number of bank | Index |
| bname | Bank name | Index |
| country | Bank headquarters' country | Index |
| code | Country and Bank Code | Index |
| year | Year | Index |
| ccy | Report's Currency Code | Text |
| branches | Number of branches and offices (Retail) | Count |
| staff | Number of staff (total) | Count |
| staff_r | Number of staff (Retail) | Count |
| loans | Total loans to customers | Currency millions |
| loans_r | Total loans to customers in retail banking | Currency millions |
| assets | Consolidated assets | Currency millions |
| assets_r | Assets in retail banking | Currency millions |
| tier1 | Tier 1 capital | Percent |
| nic | Net interest income (total) | Currency millions |
| nic_r | Net interest income (retail) | Currency millions |

Most, but not all, of the data necessary to perform the analysis belong to the ORX data exchange. However, this dataset is proprietary. Instead, the data gathered for the analysis in this paper rely entirely on public information. All banks publish their annual reports and financial statements each year and sometimes more often. These reports do not contain information about operational losses but contain most of the idiosyncratic data needed for model calibration. Table 3 summarizes the type of data collected from each of the reports or financial statements.

Fig. (5) provides a brief description of the dataset. The figure shows pairs of scatterplots between the number of branches, the number of staff related to retail banking operations, the level of retail loans, and the value of retail assets. All currency amounts were converted to millions of Euros. Data comprised five years for all 52 banks considered.

All variables considered in Fig. (5) are indicators of the scale of operations in the retail banking segments of each bank. This explains the remarkable positive correlations, which range between 0.62 and 0.82. These scale indicators belong to the set of risk indicators that likely induce the appearance and severity of internal fraud losses.

Figure 5. Scatter plot of bank data per year.

The calibration procedure needs more specific risk indicators. Hence, the analysis relies on other forms of risk indicators that could be collected from annual reports. Textual content is useful to calibrate sensitivity parameters, as shown in Section (2). Table (4) shows the textual context variables extracted from the annual reports. The variables refer to the number of instances a descriptive key word or phrase appears within the entire text; the total number of report pages is also recorded in order to calculate the ratio of instances to number of pages. These ratios give an indication of the relative importance of a key word that banks use in their public reports.

Table 4. Variables Contained in the Textual Database

| Key | Concept |
|---|---|
| nbank | Number of bank in database |
| bname | Bank name |
| country | Country of bank headquarters |
| code | Bank Code = country code.bank |
| year | Year |
| orisk | "Operational risk" term frequency in annual reports |
| risk | "Risk" term frequency in annual reports |
| rman | "Risk management" term frequency in annual report |
| ama | "AMA" (Advanced Measurement Approach) term frequency in annual reports |
| hres | "Human resource" term frequency in annual reports |
| emp | "employee" term frequency in annual reports |
| Col | "colleague" term frequency in annual reports |
| workers | Sum of "employee" and "colleague" term frequencies |
| npag | Number of pages in the Annual Report |
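The ratio of key-word instances to report pages can be sketched as follows. This is a minimal illustration, not the extraction routine used for the paper: the whole-word, case-insensitive matching rule, the function name, and the sample text are assumptions.

```python
import re


def term_per_page(text: str, term: str, n_pages: int) -> float:
    """Case-insensitive whole-word count of a term divided by the page count."""
    if n_pages <= 0:
        raise ValueError("n_pages must be positive")
    pattern = r"\b" + re.escape(term) + r"\b"
    hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return hits / n_pages


# Hypothetical two-page excerpt, for illustration only.
sample = ("Operational risk is monitored by the risk committee. "
          "Each employee receives training on operational risk. "
          "Every colleague reports incidents.")
orisk = term_per_page(sample, "operational risk", n_pages=2)
risk = term_per_page(sample, "risk", n_pages=2)
workers = (term_per_page(sample, "employee", n_pages=2)
           + term_per_page(sample, "colleague", n_pages=2))
```

Note that under this matching rule plural forms such as "employees" are not counted, and every "operational risk" hit is also a "risk" hit; whether the original counts treat these cases the same way is not stated in the text.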

Both panels in Fig. (6) show scatter plots of textual variables. Panel A contains variables related to human resource management. It shows scatterplots of "human resource" paired with the sum of "employee" and "colleague" instances in annual report texts as a proportion of the total number of pages in each report. Panel B contains variables related to risk management. It shows the paired scatter plots of the triplet of risk, risk management, and AMA instances as a proportion of total pages. All plots show positive correlations.

Figure 6. Scatter plot of textual context variables (Panel A and Panel B).

In terms of global variables that affect all banks or groups of banks, the analysis incorporates variables such as GDP growth rates in the countries to which banks belong. The dataset also contains a number of variables that could affect the outbreak of losses due to internal fraud, such as the rule of law in a country or its corruption perception index.

Table 5. Variables Contained in the Macroeconomic Database

| Key | Indicator name |
|---|---|
| country_name | Country Name |
| country_code | Country Code |
| Year | Year 2006-2010 |
| gdp_growth | GDP growth (annual %) |
| crisis | Financial crisis dummy variable for 2007 and 2008 |
| gover_effective | Government Effectiveness |
| reg_quality | Regulatory Quality |
| rule_law | Rule of Law |
| cont_corrup | Control of Corruption |
| cpi | Corruption Perceptions Index (CPI) score (2006-2010) |

Fig. (7) shows the corruption perception index (CPI) for each of the relevant countries during the years of analysis. In the data, Brazil and Italy are shown to have higher corruption perceptions, while countries like Denmark, Sweden, the Netherlands, and Canada are perceived as less corrupt. The perception of corruption is possibly associated with real corruption levels, and the extent of corruption can affect the occurrence of fraud internal or external to banks because it is related to the cultural environment that rationalizes fraud according to Cressey's triangle.

Figure 7. Corruption perception index (CPI) for countries where banks have their main headquarters.

Note: The lower the index, the more a country is perceived to be corrupt. The data were based on figures released by Transparency International.

A good measure of the fitness of the operational risk model to generate internal fraud losses is whether those losses are associated with the macro-risk indicators in the same fashion as documented in empirical research, as shown for example in Chernobai et al. (2011), Moosa (2011), Cope et al. (2012), Abdymomunov et al. (2017), and Stewart (2016).

First, we report the results of the calibration procedure. Second, conditional on the calibrated parameters, we perform model simulations that capture the loss profile, given the environment and conditions that banks faced during the period 2006-2010 (see Table A-1). Third, with the simulated data for each bank and their corresponding conditioning factors, we present regression results that show the link between macro variables and operational losses.

Calibration results

Table 6 shows the values of the general parameters that set the behavior of the equations in the operational loss model. These parameters were set using the calibration procedure outlined in Section 2. The target of the calibration is to allow the model to simulate aggregate losses as close to reality as possible. The only reality check available was to mimic the mean frequency and severity of losses due to internal fraud in the retail segment across the banks belonging to the ORX data exchange for the period 2006-2010. Therefore, the calibration procedure uses an optimizing framework to pin down the mean parameters of the loss equation described in Table 6.

Table 6. Calibration of Parameters

| Parameter | Definition | Value | Eq. |
|---|---|---|---|
| $\bar{\alpha}$ | Mean scale parameter | 0.400 | Loss outbreaks |
| $\bar{\alpha}$ | Mean constant within ramp function | 16.810 | Loss outbreaks |
| $\bar{\alpha}_c$ | Mean impact of controls | -275.291 | Loss outbreaks |
| $\bar{\alpha}_y$ | Mean impact of gross operating income | -1.587 | Loss outbreaks |
| $\bar{\alpha}_q$ | Mean impact of quality of workers on losses | 0.052 | Loss outbreaks |
| $\bar{\rho}$ | Mean autocorrelation of loss shocks | 0.70 | Loss shocks |
| $\rho_{\min}$ | Lower threshold for loss shocks autocorrelation | 0.50 | Loss shocks |
| $\rho_{\max}$ | Upper threshold for loss shocks autocorrelation | 0.90 | Loss shocks |
| $\bar{\beta}_0$ | Mean constant term | 0.20 | Shock variance |
| $\beta_1$ | Influence of past quadratic shocks | 0.01 | Shock variance |
| $\beta_2$ | Influence of past variance | 0.70 | Shock variance |
| $\rho_c$ | Weight of new conditions to affect controls | 0.10 | Loss control |
| $\bar{c}^*$ | Mean long-run value of control level | 0.5 | Loss control |
| $\bar{c}_{\min}$ | Lower threshold for long-run control level | 0.3 | Loss control |
| $\bar{c}_{\max}$ | Upper threshold for long-run control level | 0.7 | Loss control |
| $\bar{\gamma}$ | Sensitivity of controls to the losses | -0.5 | Loss control |
| $\lambda$ | Desired loss ratio | 0.0003 | Loss control |
| $\rho_q$ | Sensitivity to recent ethical quality | 0.05 | Ethical quality |
| $\bar{\delta}$ | Determines the sign of impacts from factors | 0.2 | Ethical quality |
| $\bar{Q}$ | Level of average ethical quality across banks | 0.7 | Ethical quality |
| $\bar{\sigma}_\eta$ | Variance of measurement errors | 0.012 | Measurement error |
| $l^{\min}_i$ | Threshold level for operational loss reporting | 20 | Reporting |

Note: The shrinkage procedure uses the parameters denoted with overbars. An optimizing search procedure determines the $\alpha$ parameters.

The optimizing framework hinges on minimizing the quadratic distance between the observed mean loss severity and the simulated mean loss severity. In addition, the optimization puts weight on the fact that almost all banks in the dataset must face losses. In reality, a bank with no operational losses in the space of five years is rare. The parameters that affect banks in an idiosyncratic way are determined by the shrinkage procedure described in Subsection 2.2. Figs. (B-1) and (B-2) in the appendix depict the distribution of these parameters across banks. The idiosyncratic parameters calibrated through this procedure therefore serve as a useful device to control for the heterogeneity observed in the banks in the ORX sample.
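One way to read the optimizing framework is as a penalized least-squares objective. The sketch below is hypothetical: the text does not give the functional form of the weight placed on banks without losses, so the additive penalty on the share of loss-free banks, the function name, and the sample numbers are all assumptions.

```python
from typing import Dict, List


def calibration_loss(simulated: Dict[str, List[float]],
                     observed_mean_severity: float,
                     penalty_weight: float = 1.0) -> float:
    """Squared gap in mean severity plus a penalty on banks with no losses."""
    all_losses = [loss for losses in simulated.values() for loss in losses]
    if not all_losses:
        # A run with no losses at all cannot match the observed target.
        return float("inf")
    simulated_mean = sum(all_losses) / len(all_losses)
    # Assumed penalty: share of banks that record no loss in the whole period.
    empty_share = sum(1 for losses in simulated.values() if not losses) / len(simulated)
    return (simulated_mean - observed_mean_severity) ** 2 + penalty_weight * empty_share


# Hypothetical simulated severities for three banks over five years.
run = {"bank_a": [0.10, 0.30], "bank_b": [0.20], "bank_c": []}
value = calibration_loss(run, observed_mean_severity=0.25, penalty_weight=0.5)
```

A search over the $\alpha$ parameters would then simulate the model for each candidate vector and keep the candidate with the smallest objective value.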

This study simulates 500 alternative histories of operational losses for the years 2006-2010 within the banks in the ORX database by drawing from the shocks in Eq. (1). The simulations consider the specific conditions banks confronted during the five-year period in terms of their own risk exposure and the controls implemented. After each simulation, the gross amount of operational losses as well as the number of losses across banks were calculated. The 500 data points are drawn in Fig. (8), where the straight lines mark the actual values reported in ORX (2012).

Each point in Fig. (8) summarizes a possible five-year history of data aggregated across the 52 banks. Each bank has 500 possible histories of operational losses conditional on the circumstances in effect during those five years.

Figure 8. Summary of simulations and comparison to ORX report.

As a model validation step, this paper studies the association between simulated operational losses and covariates based on each bank and covariates that reflect the macroeconomic environment. Given the heterogeneity of operational loss data, this paper models the loss frequency distribution and the loss severity density in such a way that the location and scale parameters of these functions are affected by the set of covariates.

The paper uses the generalized additive models for location, scale, and shape (GAMLSS) as developed in Stasinopoulos et al. (2007) and used by Ganegoda and Evans (2013) in the operational risk context. Let the simulated data be given by counts of loss outbreaks $\{\hat{n}_{i,T}\}$ for each bank $i$ during years $T$ and by sequences of loss severities (valued in Euros) $\{\hat{l}_{i,\tau}\}$ during the period under analysis. Severity observations and yearly counts (frequencies) can be modeled using standard statistical distributions used in the operational risk literature.

The GAMLSS approach implies that covariates affect the location and scale parameters of these distributions. If the severities are $\hat{l}_{i,\tau}$, the GAMLSS method assumes that they are drawn from a density function $f(\hat{l}_{i,\tau}; \Theta_i)$ conditional on $\Theta_i$. In this paper, the conditioning parameter vector is given by $\Theta_i = (\tilde{\mu}_i, \tilde{\sigma}_i)'$, where $\tilde{\mu}_i$ stands for location and $\tilde{\sigma}_i$ stands for scale. Both parameters are linked to covariates through link functions $g_1(\tilde{\mu}_i) = Z_1\omega_1$ and $g_2(\tilde{\sigma}_i) = Z_2\omega_2$, where $Z_1$ and $Z_2$ are covariates that affect the distributional parameters and the $\omega$'s are the corresponding sensitivity coefficients.

The regressions comprise the mean and the scale parameters using a number of distributional assumptions and specifications about the behavior of the mean or the scale of the distributions as functions of the covariates. In contrast to simple OLS regressions, the GAMLSS method tackles the heteroscedastic nature of the data in a straightforward way.

Severity regressions
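To make the idea of regressing both distributional parameters on covariates concrete, the following Python sketch writes down the negative log-likelihood of a Weibull severity model in which both parameters depend on covariates through log links. It is a simplified stand-in, not the estimation used in the paper: the distribution here is not truncated, there is no smoothing, the log links and the scale/shape parameterization are assumptions, and the function names and numbers are hypothetical.

```python
import math
from typing import Sequence


def linear_predictor(z: Sequence[float], omega: Sequence[float]) -> float:
    """Inner product of one covariate row with its coefficient vector."""
    return sum(zj * wj for zj, wj in zip(z, omega))


def weibull_nll(losses: Sequence[float],
                Z1: Sequence[Sequence[float]], Z2: Sequence[Sequence[float]],
                omega1: Sequence[float], omega2: Sequence[float]) -> float:
    """Negative log-likelihood with log(scale_i) = z1_i.omega1, log(shape_i) = z2_i.omega2."""
    total = 0.0
    for loss, z1, z2 in zip(losses, Z1, Z2):
        scale = math.exp(linear_predictor(z1, omega1))
        shape = math.exp(linear_predictor(z2, omega2))
        u = loss / scale
        # Weibull log-density: log(k/s) + (k - 1) * log(x/s) - (x/s)^k
        log_pdf = math.log(shape / scale) + (shape - 1.0) * math.log(u) - u ** shape
        total -= log_pdf
    return total


# Hypothetical data: intercept plus one covariate in each predictor.
losses = [0.5, 1.2, 2.0]
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
nll_flat = weibull_nll(losses, Z, Z, omega1=[0.0, 0.0], omega2=[0.0, 0.0])
nll_slope = weibull_nll(losses, Z, Z, omega1=[-0.5, 0.6], omega2=[0.0, 0.0])
```

Coefficient estimates would be obtained by passing this function to a numerical minimizer; letting the second predictor vary with covariates is what allows the dispersion of losses to differ across banks.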

Table 7 shows the results for the best performing model for loss severities in retail banking associated with internal fraud. This model is the truncated Weibull with mean and scale parameters.

Results show that when the economy of the country to which a bank belongs grows, the average size of losses increases. This result is similar to findings reported in Povel et al. (2007) and Stewart (2016). This positive impact has to do with the opportunistic behavior of workers stressed by Blacker and McConnell (2015). Arguably, more fraud opportunities arise in general macroeconomic boom periods.

In addition, when a country is perceived as more corrupt (lower CPI), the average losses are higher. In the model section, our paper points out that the level of bad interactions inside and outside banks brings about a rationalization for fraud. The corruption perception of a country is a proxy for outside bad interactions. In this sense, if criminal or corrupt behavior of citizens is broadly accepted in a society, then workers find more rationale for stealing from banks.

The regressions also consider idiosyncratic variables. For example, higher levels of operational risk controls reduce loss severities, more employees per branch increase loss severities, and higher assets per employee reduce loss severities. These three effects are expected because they are embedded in the loss model calibration or specification.

Nevertheless, the remarkable finding is that neither GDP growth nor the CPI was used to calibrate the loss model or as a causal variable in the model specification, yet both significantly affect fraud losses. Furthermore, this finding conforms well with existing theory of how internal fraud losses occur.

Table 7 also shows the results for the scale parameter of the truncated Weibull distribution. The results suggest that a higher corruption perception (lower index) implies higher fraud loss volatility, but more employees per branch diminish the fraud loss volatility. These results are not straightforward to justify in terms of theory but provide an interesting starting point for further research. For now, this is beyond the scope of this paper.

Footnote: We performed, but do not report, standard OLS regressions. Given that losses smaller than € ...

Table 7. Estimates of the Regression in the Mean (Truncated Weibull)

| Regressors | Estimates | Std. error | t-value | p-value | Signif. |
|---|---|---|---|---|---|
| Regression in the mean | | | | | |
| Intercept | -2.80 | 0.098 | -28.43 | 0.000 | *** |
| GDP growth | 0.01 | 0.006 | 2.09 | 0.037 | * |
| CPI | -0.05 | 0.012 | -4.06 | 0.000 | *** |
| Control | -0.40 | 0.184 | -2.17 | 0.030 | * |
| Employees per branch | 0.13 | 0.003 | 51.57 | 0.000 | *** |
| Assets per employee | -0.01 | 0.005 | -1.95 | 0.051 | . |
| Regression in the scale | | | | | |
| Intercept | 0.733 | 0.085 | 8.61 | 0.000 | *** |
| CPI | -0.068 | 0.011 | -6.32 | 0.000 | *** |
| Employees per branch | -0.012 | 0.002 | -5.84 | 0.000 | *** |
| GDP growth (a) | 0.002 | 0.005 | 0.45 | 0.651 | |
| Control (a) | -0.015 | 0.153 | -0.10 | 0.924 | |
| Assets per employee (a) | 0.009 | 0.005 | 1.85 | 0.065 | . |

Notes: Significance codes: 0 = ***, 0.001 = **, 0.01 = *, 0.05 = . (a) Smoothing is performed with p-splines.

Table 7 reports the baseline regression. However, models in the GAMLSS setup may differ in the underlying distribution of the error terms. In the OLS setup the only distribution modelled is the Normal. In the GAMLSS there are families of distributions from which to choose. Once a distribution is chosen, models still differ because they may have different covariates.

The approach taken in this paper first chooses the distribution given a benchmark set of covariates. Table 8 shows the Akaike information criterion (AIC) statistics for the estimated models and specifications for the underlying distributions of the errors. The AIC supports the truncated Weibull model with mean and scale shown in Table 7. Table A-2 in the appendix shows a set of models that are estimated. The key findings about the effect of GDP growth rates and the CPI are robust across models. When the dummy for the financial crisis is included, the effect of GDP growth becomes insignificant. However, the model with the crisis dummy is inferior in terms of the AIC.

Table 8. Severity Model Selection

| Distributions | AIC |
|---|---|
| Truncated Weibull (mean and scale with smoothing in regression) | -6,581 |
| Truncated generalized Pareto (mean and scale with smoothing in regression) | -6,503 |
| Truncated Weibull (mean and scale regression) | -6,362 |
| Truncated generalized Gamma (mean and scale regression) | -6,361 |
| Truncated Weibull (only mean regression) | -6,299 |
| Truncated generalized Gamma (only mean regression) | -6,297 |

Frequency regressions

Table 9 shows the result of the benchmark regression. GDP growth affects the number of annual loss events positively; more controls and more retail assets per employee reduce the number of losses. In the case of frequency, the CPI is not a significant regressor, nor is a group of regressors such as government effectiveness, regulatory quality, rule of law, and control of corruption, all taken from the World Bank Worldwide Governance Indicators.

In addition, the regression in the scale or the variance of the number of losses shown in Table 9 confirms that more employees per branch increase the variance in the number of annual losses and more operational risk controls reduce it.

Table 9. Regression Results in the Frequency Model

| Regressors | Estimates | Std. error | t-value | p-value | Signif. |
|---|---|---|---|---|---|
| Regression in the mean | | | | | |
| Intercept | 4.33 | 0.298 | 14.528 | 0.000 | *** |
| GDP growth | 0.13 | 0.019 | 7.020 | 0.000 | *** |
| Controls | -3.50 | 0.575 | -6.096 | 0.000 | *** |
| Assets per employee | -0.03 | 0.016 | -1.969 | 0.050 | . |
| Regression in the scale | | | | | |
| Intercept | -0.14 | 0.507 | -0.272 | 0.786 | |
| Employees per branch | 0.05 | 0.015 | 3.019 | 0.003 | ** |
| Controls (a) | -2.18 | 1.057 | -2.064 | 0.040 | * |

Note: (a) Smoothing is performed with p-splines. Significance codes: 0 = ***, 0.001 = **, 0.01 = *, 0.05 = .

Variants of the Poisson, Negative Binomial Type I, and Negative Binomial Type II (see Rigby et al. 2017) were estimated with GAMLSS. The chosen model, according to the AIC (see Table 10), was the Negative Binomial with regressions in the mean and scale.

Table A-3 in the appendix shows the details of the alternative regressions. Across all regressions, GDP growth remains robustly significant with a parameter value of 0.13. This means that there is strong evidence that the opportunistic behavior described in Cressey's triangle also affects the number of fraud events. Hence, GDP growth affects both the number of events and the severity of those events. This is in line with the theory of opportunistic behavior of fraudsters when good aggregate economic times arrive.
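The size of the GDP growth effect on loss frequency can be read off the mean regression if one assumes a log link for the mean of the negative binomial, so that coefficients act multiplicatively on the expected count. The link function is an assumption here, as are the covariate values and units used in the example; only the coefficients come from Table 9.

```python
import math

# Mean-regression coefficients from Table 9.
coef = {"intercept": 4.33, "gdp_growth": 0.13,
        "controls": -3.50, "assets_per_employee": -0.03}


def expected_count(gdp_growth: float, controls: float,
                   assets_per_employee: float) -> float:
    """Expected annual number of loss events under an assumed log link."""
    eta = (coef["intercept"]
           + coef["gdp_growth"] * gdp_growth
           + coef["controls"] * controls
           + coef["assets_per_employee"] * assets_per_employee)
    return math.exp(eta)


# Multiplicative effect of one extra percentage point of GDP growth.
rate_ratio = math.exp(coef["gdp_growth"])

# Hypothetical covariate values, used only to show that the ratio is constant.
base = expected_count(gdp_growth=2.0, controls=0.5, assets_per_employee=10.0)
boom = expected_count(gdp_growth=3.0, controls=0.5, assets_per_employee=10.0)
```

Under this reading, one additional percentage point of GDP growth raises the expected number of annual loss events by roughly 14 percent, whatever the values of the other covariates.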

Table 10. Frequency Model Selection

| Distributions | AIC |
|---|---|
| Negative binomial (mean and scale with smoothing in regression) | 1,612 |
| Negative binomial II (mean and scale with smoothing in regression) | 1,624 |
| Negative binomial (mean and scale regression) | 1,632 |
| Negative binomial II (mean and scale regression) | 1,635 |
| Negative binomial (only mean regression) | 1,637 |
| Negative binomial II (only mean regression) | 1,659 |
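The AIC-based selection in Table 10 amounts to picking the specification with the smallest criterion value. The short sketch below reproduces that comparison; the dictionary keys are abbreviated labels introduced here, and the `aic` helper is the standard definition rather than code from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# AIC values transcribed from Table 10.
aic_by_model = {
    "NB, mean and scale, smoothing": 1612,
    "NB II, mean and scale, smoothing": 1624,
    "NB, mean and scale": 1632,
    "NB II, mean and scale": 1635,
    "NB, mean only": 1637,
    "NB II, mean only": 1659,
}

best = min(aic_by_model, key=aic_by_model.get)
# Differences to the best model show how far each alternative falls behind.
delta = {name: value - aic_by_model[best] for name, value in aic_by_model.items()}
```

The same comparison applied to the severity models selects the specification with the most negative AIC.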

This paper lays out a stochastic dynamic internal fraud model for operational risk to simulate operational losses due to internal fraud in retail banking. The model aims to capture the nature of internal fraud and the operational controls to mitigate or avoid the monetary losses caused by internal fraudsters. The model incorporates human factors such as the level of employees per branch as well as the ethical quality of workers. It also includes the endogenous risk controls.

The large number of model parameters is calibrated by means of a shrinkage procedure that depends on available information about the heterogeneous nature of banks in the ORX dataset. The simulated losses implied by the model mimic the aggregate ORX data in terms of the number of loss events and their severity in the retail banking, internal fraud cell as published in ORX (2012).

Losses generated by the model are associated with GDP growth and the corruption perception of the country where banks are located. The generalized linear regression results are new because they uncover one important determinant of fraud losses not previously documented in the literature: the corruption perception index of a country. The results imply that higher perceived corruption at the national level (a lower CPI) has a direct effect on the size of losses due to internal fraud events. The findings show that it is the yearly average size of losses that is affected by national corruption perceptions, not the frequency of losses within a year.

This paper can be extended in a number of ways. With real ORX data, the model parameters can be estimated with standard statistical procedures. Also, the causal effect from national corruption indicators to internal fraud in banks can be further explored. Currently, the paper does not distinguish losses by the country in which they originate within international banks that have offices worldwide. The paper only considers the countries where banks have their headquarters.

The model described in the paper, with its benchmark calibration, can also be used to statistically compare data aggregation techniques dealing with consortium data. This can be done by means of Monte Carlo simulations.

References

Abdymomunov, A., A. Mihov, and F. Curti (2017). US banking sector operational losses and the macroeconomic environment. Available at SSRN 2738485.

Allen, L. and T. G. Bali (2007). Cyclicality in catastrophic and operational risk measurements. Journal of Banking & Finance 31(4), 1191–1235.

Bardoscia, M. and R. Bellotti (2011). A dynamical approach to operational risk measurement. The Journal of Operational Risk 6(1), 3.

BCBS (2006). International convergence of capital measurement and capital standards: a revised framework. Bank for International Settlements.

Benyon, D. (2008). Top 100 banks-a new dawn for disclosure. OpRisk & Compliance 9(10), 22.

Blacker, K. and P. McConnell (2015). People Risk Management: A Practical Approach to Managing the Human Factors that Could Harm Your Business. Kogan Page Publishers.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31(3), 307–327.

Chernobai, A., P. Jorion, and F. Yu (2011). The determinants of operational risk in US financial institutions. Journal of Financial and Quantitative Analysis 46(6), 1683–1725.

Chernobai, A. and Y. Yildirim (2008). The dynamics of operational loss clustering. Journal of Banking & Finance 32(12), 2655–2666.

Cooke, D. L. and T. R. Rohleder (2005). A conceptual model of operational risk. In Proceedings of The 23rd International Conference of the System Dynamics Society, July, pp. 17–21.

Cope, E. W., M. T. Piche, and J. S. Walter (2012). Macroenvironmental determinants of operational loss severity. Journal of Banking & Finance 36(5), 1362–1380.

Cressey, D. R. (1953). Other people's money; a study of the social psychology of embezzlement.

Cummings, A., T. Lewellen, D. McIntire, A. P. Moore, and R. Trzeciak (2012). Insider threat study: Illicit cyber activity involving fraud in the US financial services sector. Technical report, Carnegie-Mellon U. Pittsburgh.

Fragnière, E., J. Gondzio, and X. Yang (2010). Operations risk management by optimally planning the qualified workforce capacity. European Journal of Operational Research 202(2), 518–527.

Ganegoda, A. and J. Evans (2013). A scaling model for severity of operational losses using generalized additive models for location scale and shape (GAMLSS). Annals of Actuarial Science 7(1), 61–100.

Guégan, D. and B. K. Hassani (2013). Using a time series approach to correct serial correlation in operational risk capital calculation. The Journal of Operational Risk 8(3), 31.

Hatzakis, E. D., S. K. Nair, and M. Pinedo (2010). Operations in financial services. An overview. Production and Operations Management 19(6), 633–664.

ISO (2009). Risk management-vocabulary. Guide 73. Technical report, International Organization for Standardization.

Jarrow, R. (2008). Operational risk. Journal of Banking and Finance 32, 870–879.

Kearney, C. and S. Liu (2014). Textual sentiment in finance: A survey of methods and models. International Review of Financial Analysis 33, 171–185.

Kennett, R. (2009). ORX and the operational risk process. 3rd FEBRABAN International Operational Risk Conference.

Kochan, N. (2013). Culture, oversight and fraud prevention. Financial Risk Management News and Analysis.

Kühn, R. and P. Neu (2003). Functional correlation approach to operational risk in banking organizations. Physica A: Statistical Mechanics and its Applications 322, 650–666.

Kühn, R. and P. Neu (2004). Adequate capital and stress testing for operational risks. In M. Cruz (Ed.), Operational Risk Modelling and Analysis: Theory and Practice, Chapter 12, pp. 273–289. Risk Books.

Leippold, M. and P. Vanini (2005). The quantification of operational risk. The Journal of Risk 8(1), 1.

Li, L. and I. Moosa (2015). Operational risk, the legal system and governance indicators: a country-level analysis. Applied Economics 47(20), 2053–2072.

Lukic, D., A. Margaryan, and A. Littlejohn (2013). Individual agency in learning from incidents. Human Resource Development International 16(4), 409–425.

Moosa, I. (2011). Operational risk as a function of the state of the economy. Economic Modelling 28(5), 2137–2142.

ORX (2012). 2012 ORX report on operational loss data. ORX Association. Available at https://managingrisktogether.orx.org.

ORX (2016). ORX members: the membership by region and joining order. ORX Association. Available at https://managingrisktogether.orx.org.

Patel, S. (2009). Quantifying operational risk. ORX Association. Presentation retrieved from .

Povel, P., R. Singh, and A. Winton (2007). Booms, busts, and fraud. The Review of Financial Studies 20(4), 1219–1254.

Rigby, R., D. Stasinopoulos, G. Heller, and F. De Bastiani (2017). Distributions for modelling location, scale, and shape: using GAMLSS in R. Available at .

Sabatini, J. (2009). Profile of ORX and a case study in the use of consortium loss data. International Conference on External Data for Operational Risk. Presentation retrieved from http://blog.abieventi.com/documenti/2973/Sabatini-JPMorgan-Chase-ORX.pdf.

Sabatini, J. and S. Wills (2008). Use of external data for operational risk management. ORX Association. Presentation retrieved from .

Stasinopoulos, D. M., R. A. Rigby, et al. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software 23(7), 1–46.

Stewart, R. T. (2016). Bank fraud and the macroeconomy. Journal of Operational Risk 11(1).

Wagner, S. M., K. J. Mizgier, and S. Papageorgiou (2017). Operational disruptions and business cycles. International Journal of Production Economics 183, 66–78.

Weiss, B. and A. Winkelmann (2011). Developing a process-oriented notation for modeling operational risks-a conceptual metamodel approach to operational risk management in knowledge intensive business processes within the financial industry. In System Sciences (HICSS), 2011 44th Hawaii International Conference on, pp. 1–10. IEEE.

Appendix

A Tables

Notes on Table A-1: The main source of the list of banks in Table A-1 is ORX (2016). The list only shows banks that were active members of the ORX association in the period 2006-2010. As shown, member banks are located all over the world but belong mainly to advanced economies. Some member banks do not belong to the association anymore, either because they filed for bankruptcy, especially as a consequence of the Global Financial Crisis of 2007-2009, or because they were absorbed by other banks.

The main source is then complemented by partial listings of ORX membership appearing in a number of presentations (see Patel 2009; Sabatini and Wills 2008; Sabatini 2009; Kennett 2009).

Given the information in Table A-1, the model calibration implied a varying number of total banks in the sample. For example, up to the end of 2008, there are N = 39 banks. In 2009, 10 banks entered the data exchange association and four banks quit the association, making N = 45 banks participating in the data exchange. By the end of 2010, one more bank entered the association, making N = 48 active banks. We track 52 banks in total and 35 banks that belong to ORX for the entire five-year period. The key workable assumption is that arrivals and departures from the association are set at the beginning of each year. In addition, the database contained only banks that operated a retail-banking segment. Therefore, some banks that belong to the ORX association but only perform investment banking or other lines of business were omitted from the database.

Table A-1. Member Banks of the ORX Data Exchange by Country and Selected Dates

| No. | Name of bank | Country | Country code | Bank code | Membership date | Note |
|---|---|---|---|---|---|---|
| 1 | Bank of Nova Scotia | Canada | CAN | BNS | Apr-02 | |
| 2 | Commerzbank AG | Germany | DEU | CBA | Apr-02 | |
| 3 | Deutsche Bank AG | Germany | DEU | DBA | Apr-02 | |
| 4 | BNP Paribas | France | FRA | BNP | Apr-02 | |
| 5 | ABN AMRO | Netherlands | NLD | ABN | Apr-02 | |
| 6 | Fortis NL | Netherlands | NLD | FTL | Apr-02 | * |
| 7 | ING Group | Netherlands | NLD | ING | Apr-02 | |
| 8 | JPMorgan Chase & Co. | United States | USA | JPM | Apr-02 | |
| 9 | Bank of America | United States | USA | BOA | Mar-04 | |
| 10 | West LB | Germany | DEU | WLB | Jun-04 | * |
| 11 | Banesto | Spain | ESP | BNS | Jun-05 | * |
| 12 | SEB (Skandinaviska Enskilda Banken) | Sweden | SWE | SEB | Jun-05 | |
| 13 | Credit Agricole SA | France | FRA | CAS | Dec-05 | |
| 14 | Banc Sabadell | Spain | ESP | BSB | Apr-06 | |
| 15 | Cajamar | Spain | ESP | CMR | Apr-06 | |
| 16 | Barclays Bank | United Kingdom | GBR | BLB | Apr-06 | |
| 17 | Bank Austria - Creditanstalt | Austria | AUT | BAC | Jun-06 | |
| 18 | Fortis | Belgium | BEL | FTS | Jun-06 | * |
| 19 | Caixa Catalunya | Spain | ESP | CCT | Jun-06 | * |
| 20 | Banco Portugues de Negocios | Portugal | PRT | BPN | Jun-06 | * |
| 21 | National City | United States | USA | NAT | Jun-06 | * |
| 22 | Erste Group Bank AG | Austria | AUT | EGB | Aug-06 | |
| 23 | BMO Financial Group | Canada | CAN | BMO | Oct-06 | |
| 24 | Royal Bank of Canada (RBC) Financial Group | Canada | CAN | RBC | Oct-06 | |
| 25 | Banco Popular | Spain | ESP | BPO | Dec-06 | * |
| 26 | Lloyds Banking Group | United Kingdom | GBR | LBG | Dec-06 | |
| 27 | US Bancorp | United States | USA | USB | Dec-06 | |
| 28 | Grupo Santander | Spain | ESP | BST | Jan-07 | |
| 29 | Toronto Dominion Bank Group (TD BG) | Canada | CAN | TDB | Mar-07 | |
| 30 | Banco Pastor | Spain | ESP | BPS | Jun-07 | * |
| 31 | Caja Laboral | Spain | ESP | CLB | Jun-07 | * |
| 32 | HBOS PLC | United Kingdom | GBR | HBO | Jun-07 | * |
| 33 | Wachovia Corporation | United States | USA | WCR | Jun-07 | |
| 34 | Washington Mutual | United States | USA | WAM | Jun-07 | |
| 35 | PNC Bank | United States | USA | PNC | Nov-07 | |
| 36 | Royal Bank of Scotland Group | United Kingdom | GBR | RBS | Dec-07 | |
| 37 | Bank of Ireland Group | Ireland | IRL | BIG | Dec-07 | |
| 38 | HSBC Holdings plc | United Kingdom | GBR | HSB | Jan-08 | |
| 39 | Hana Bank | South Korea | KOR | HBK | Jan-08 | * |
| 40 | Rabobank Nederland | Netherlands | NLD | RBN | Jan-08 | |
| 41 | National Australia Bank | Australia | AUS | NAB | Sep-08 | |
| 42 | Banco Bradesco S/A | Brazil | BRA | BSC | Sep-08 | |
| 43 | Caixanova | Spain | ESP | CNV | Sep-08 | * |
| 44 | Wells Fargo & Co | United States | USA | WFC | Sep-08 | |
| 45 | First Rand | South Africa | ZAF | FRD | Sep-08 | |
| 46 | Deutsche Postbank AG | Germany | DEU | DPB | Nov-08 | |
| 47 | Standard Chartered Bank | Singapore | SGP | STA | Dec-08 | |
| 48 | Capital One | United States | USA | CON | Mar-09 | |
| 49 | Bank of New York Mellon | United States | USA | BNY | Jul-09 | |
| 50 | Westpac Banking Corporation | Australia | AUS | WBC | Nov-09 | |
| 51 | Commonwealth Bank of Australia | Australia | AUS | CBA | Dec-09 | |
| 52 | Société Générale | France | FRA | SGL | Dec-09 | |

Note: * marks banks that are not members of the ORX association any more.

B Figures

Table A-2. Severity regressions with GAMLSS

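Estimates and significance codes are listed in model order (Model 1 to Model 6). Rows with fewer than six entries correspond to regressors or codes left blank for some models in the original table.

Mean regression

| Regressor | Estimates | Significance codes |
|---|---|---|
| Intercept | -2.80, -2.78, -2.43, -2.65, -2.47, -2.59 | (***), (***), (***), (***), (***), (***) |
| GDP growth | 0.01, 0.01, 0.01, 0.01, 0.01 | (*), (*), (*), (*), (*) |
| Crisis | -0.08 | (**) |
| CPI | -0.05, -0.05, -0.17, -0.12, -0.16, -0.10 | (***), (***), (***), (***), (***), (.) |
| Government Effectiveness | 0.33 | (***) |
| Regulatory Quality | 0.21 | (*) |
| Rule of Law | 0.29 | (***) |
| Control of Corruption | 0.10 | |
| Control | -0.40, -0.63, -0.26, -0.44, -0.42, -0.42 | (*), (***), (*), (*), (*) |
| Employees per branch | 0.13, 0.14, 0.13, 0.14, 0.14, 0.13 | (***), (***), (***), (***), (***), (***) |
| Assets per employee | -0.01, -0.01, -0.01, -0.00, -0.00, -0.01 | (.), (*), (*), (***) |

Scale regression

| Regressor | Estimates | Significance codes |
|---|---|---|
| Intercept | 0.73, 0.56, 1.12, 0.83, 1.12, 0.92 | (***), (***), (***), (***), (***), (***) |
| GDP growth | 0.01, -0.02 | (*) |
| CPI | -0.07, -0.02, -0.20, -0.15, -0.21, -0.11 | (***), (.), (***), (***), (***), (***) |
| Employees per branch | -0.01, -0.01, -0.02, -0.02, -0.02, -0.01 | (***), (***), (***), (***), (***), (***) |

Diagnostics

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| Global Deviance | -6657.35 | -6633.58 | -6677.04 | -6718.08 | -6732.36 | -6658.17 |
| AIC | -6581.01 | -6553.02 | -6598.11 | -6610.79 | -6624.83 | -6577.47 |
| SBC | -6344.62 | -6303.55 | -6353.68 | -6278.53 | -6291.84 | -6327.55 |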

All models include additional regressors that are defined in terms of smoothed terms. They are not reported here because they are used as additional controls. Smoothing is performed with p-splines.

Significance codes are 0 = '***', 0.001 = '**', 0.01 = '*', 0.05 = '.', 0.1 = ' '.

Table A-3. Frequency regressions with GAMLSS

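Mean regression

| Regressor | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| Intercept | 4.33 (***) | 4.32 (***) | 4.28 (***) | 4.23 (***) | 4.22 (***) | 4.24 (***) | 4.40 (***) |
| GDP growth | 0.13 (***) | 0.13 (***) | 0.13 (***) | 0.13 (***) | 0.14 (***) | 0.14 (***) | 0.12 (***) |
| Dummy for crisis | | 0.04 | | | | | |
| CPI | | | 0.02 | | | | |
| Gov. effectiveness | | | | 0.12 | | | |
| Regulatory Quality | | | | | 0.13 | | |
| Rule of Law | | | | | | 0.17 | |
| Control of Corruption | | | | | | | 0.00 |
| Control | -3.50 (***) | -3.51 (***) | -3.68 (***) | -3.70 (***) | -3.69 (***) | -3.83 (***) | -3.69 (***) |
| Assets per employee | -0.03 (.) | -0.03 (*) | -0.03 (*) | -0.03 (*) | -0.03 (*) | -0.04 (*) | -0.04 (*) |

Scale regression

| Regressor | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| Intercept | -0.14 | -0.14 | -0.11 | -0.10 | -0.07 | -0.07 | -0.47 |
| Employees per branch | 0.05 (**) | 0.05 (**) | 0.05 (**) | 0.05 (**) | 0.05 (**) | 0.05 (**) | 0.05 (**) |

Diagnostics

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| Global Deviance | 1581.63 | 1581.58 | 1570.44 | 1581.11 | 1570.75 | 1562.64 | 1535.25 |
| AIC | 1611.76 | 1613.72 | 1606.82 | 1612.27 | 1609.40 | 1604.36 | 1584.57 |
| SBC | 1663.01 | 1668.40 | 1668.74 | 1665.30 | 1675.15 | 1675.34 | 1668.48 |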
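The three diagnostics reported for Table A-3 are linked to each other. As a minimal sketch, assume the usual GAMLSS generalised Akaike criterion, GAIC(k) = Global Deviance + k × df, with k = 2 for AIC and k = log(n) for SBC. Under that assumption, the effective degrees of freedom df and the number of observations n can be backed out from the reported values. The function name `implied_df_and_n` is hypothetical, and the implied df and n are derived quantities, not figures reported in the paper.

```python
import math


def implied_df_and_n(global_deviance, aic, sbc):
    """Back out effective df and sample size from reported diagnostics.

    Assumes AIC = GD + 2 * df and SBC = GD + log(n) * df
    (an assumption about how the diagnostics were computed).
    """
    df = (aic - global_deviance) / 2.0
    log_n = (sbc - global_deviance) / df
    return df, math.exp(log_n)


# (Global Deviance, AIC, SBC) for Models 1 and 2 of Table A-3.
table_a3 = {
    "Model 1": (1581.63, 1611.76, 1663.01),
    "Model 2": (1581.58, 1613.72, 1668.40),
}

for name, (gd, aic, sbc) in table_a3.items():
    df, n = implied_df_and_n(gd, aic, sbc)
    print(f"{name}: implied df = {df:.2f}, implied n = {n:.0f}")
```

For both models the implied n comes out at roughly 222 observations, which suggests the assumed definitions are consistent with the reported figures. The implied df are not integers, as expected when p-spline smoothers contribute fractional effective degrees of freedom.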

1) All models include additional regressors that are defined in terms of smoothed terms. They are not reported here because they are used as additional controls. Smoothing is performed with p-splines.

2) Significance codes are 0 = '***', 0.001 = '**', 0.01 = '*', 0.