Towards global monitoring: equating the Food Insecurity Experience Scale (FIES) and food insecurity scales in Latin America

Abstract

In order to face food insecurity as a global phenomenon, it is essential to rely on measurement tools that guarantee comparability across countries. Although the official indicators adopted by the United Nations in the context of the Sustainable Development Goals (SDGs) and based on the Food Insecurity Experience Scale (FIES) already embed cross-country comparability, other experiential scales of food insecurity currently employ national thresholds, and issues of comparability thus arise. In this work we address comparability of experience-based food insecurity scales by presenting two different studies. The first one involves the FIES and three national scales (ELCSA, EMSA and EBIA) currently included in national surveys in Guatemala, Ecuador, Mexico and Brazil. The second study concerns the adult and children versions of these national scales. Different methods from the equating practice of the educational testing field are explored: classical methods and methods based on Item Response Theory (IRT).



Federica Onori, Sara Viviani and Pierpaolo Brutti

arXiv [stat.AP]

Introduction

Food security is a subject of indisputable relevance, having been conceived as a basic human right since , as stated in Article of the Universal Declaration of Human Rights: "Everyone has the right to a standard of living adequate for the health and well-being of himself and of his family, including food, clothing, housing and medical care" [2]. However, food security is a complex and multifaceted concept whose terminology has long been affected by a variety of sectors and disciplines strictly related to it (e.g. agriculture, nutrition, economy, public policy, etc.) [7, 23]. A consensus around the definition of food security was finally reached during the World Food Summit in when it was formalized as follows: "Food security exists when all people, at all times, have physical and economic access to sufficient, safe and nutritious food that meets their dietary needs and food preferences for an active and healthy life" [17]. Building on this definition, the conceptualization and operationalization of food security emerge as that of a multidimensional phenomenon made up of four different, hierarchically ordered dimensions: availability, access, utilization and stability [6, 29]. As a consequence, no single indicator can return a thorough picture of the phenomenon, but a suite of indicators exists, each monitoring specific aspects of food security at different levels of observation: national, regional, household and individual [23]. Among all possible aspects related to food insecurity, the dimension of access to food is nowadays given high priority, being acknowledged among the Sustainable Development Goals (SDGs) of the Agenda for Sustainable Development adopted by the United Nations. Access to food is in fact the subject of Target . [28], which states:

By 2030, end hunger and ensure access by all people, in particular the poor and people in vulnerable situations, including infants, to safe, nutritious and sufficient food all year round.

Although food security is now a well-established concept within the scientific community, its definition changed throughout the last century and so did the tools employed to measure the phenomenon [7, 23]. A brief summary of the main steps will make it possible to fully appreciate the novelties brought about by the measurement tools developed since the ' s. During the s and for some decades on, the issue of food security was completely identified with that of having enough provisions to cover the needs of the population and, therefore, the "food problem" was mainly dealt with in terms of country-level supplies [14, 15, 16]. Nevertheless, this formulation could not capture the aspect, yet observable, of malnutrition and famines in countries that did not suffer from a lack of food supply at national level [9], signal that a

This definition was further refined in [18], when food access was conceived not only in terms of affordability and physical access, but also in terms of removal of social barriers. The community of researchers, practitioners and political decision makers currently agrees upon the following definition: Food security exists when all people, at all times, have physical, social and economic access to sufficient, safe and nutritious food that meets their dietary needs and food preferences for an active and healthy life.

access to food. To mark this change in perspective, the expression household food insecurity began to be used. Since then, other shifts have affected the definition of food insecurity on the way to the one in use today.
A very fundamental one was in the s when interest moved from dietary energy adequacy to experience of food insecurity and livelihood conditions, which involved, among others, social, nutritional and psychological considerations. Food insecurity has in fact been recognized as a "managed process", described by means of a spectrum of behaviours and coping strategies that can reveal the level of severity of a food access condition [30]. Although specific attitudes and coping strategies might change from country to country, there is a general consensus in the scientific community about the common pattern of behaviours that characterize food insecurity, with very minor differences across cultures [11]. In this regard, ethnographic and societal studies established that, in case of increasing lack of money or other resources, a common pattern of experiences and behaviours manifests in order to cope with shortage of food [30]: at first, psychological concern arises, since people start worrying about having enough food; then, a change in the diet occurs, with a decrease in the quality and variety of the food consumed in order to face a concrete limitation in access to food; and, in case of more severe food shortages, people diminish the quantity of food consumed by reducing the size of meals and then by skipping meals altogether, potentially up to experiencing hunger.
The steps just described are commonly referred to as the three domains of resource-constrained access to food: psychological concern, decrease of food quality, decrease of food quantity and hunger.

Mirroring these shifts in the paradigm (from global and national to households and individuals; from food supplies to livelihood conditions; and from objective to subjective measures [7]), a number of indicators have been proposed to measure food insecurity, such as measures of adequacy of food consumption, prevalence of undernourishment, dietary diversity score, etc. Among all, experience-based food insecurity scales found a place of relevance, having proved to be a valid and reliable tool for measuring food insecurity in its access dimension, encompassing the current definition of the phenomenon while adopting a behavioural perspective [7]. As the name suggests, experience-based food insecurity scales measure access to food from a behavioural perspective, building on a set of items that directly ask people about their own personal experiences and behaviours related to the three domains of access to food [23]. The very first experience-based food insecurity scale was the Household Food Security Survey Module (HFSSM), applied yearly in the United States of America since for monitoring purposes [22]. The HFSSM was a pioneer in this field, and several countries in Latin America followed its example by developing their own national scales to be included in national surveys for periodical monitoring. In , Brazil included the Brazilian Scale of Food Insecurity (EBIA) into national Brazilian surveys; Haiti, Guatemala and Ecuador,

The aim of this first part of the work was mainly to provide a general framework for the topic and clarify that the expression "food insecurity" technically refers to a multitude of aspects that relate to different dimensions. However, in order to avoid confusion and enable an agile treatment of the subject, hereafter "food insecurity" will specifically be meant at the individual or household level and interpreted as the set of restrictions in accessing food due to limited resources (or, equivalently, resource-constrained access to food). This choice will also facilitate conceiving food insecurity as a measurable construct.

Latin American and Caribbean Food Security Scale (Escala Latinoamericana y Caribeña de Seguridad Alimentaria - ELCSA); and in Mexico developed its adaptation of the ELCSA, called Mexican Food Security Scale (EMSA). Peculiar to these scales is the availability of two different survey modules, one for households with children and one for households without children, made up of a different number of items. Finally, besides these country-specific applications of the experience-based approach to measuring food insecurity, in the Food and Agriculture Organization of the United Nations (FAO) launched the Voices of the Hungry project (VoH) and developed the Food Insecurity Experience Scale (FIES), conceived as a global adaptation of HFSSM and ELCSA [19]. The FIES is based on people's responses to only dichotomous items and, by means of an ad-hoc methodology grounded in Item Response Theory (IRT), and more specifically in the Rasch model, it is the first experience-based food insecurity measurement system that generates formally comparable measures of food insecurity across countries. As such, it is one of the official measurement tools for monitoring progress toward Target . of the SDGs, being the scale used to compute the related Indicator . . , (Prevalence of food insecurity at moderate and severe levels based on FIES) [3, 4, 19].

Although the national and regional scales proved to be adequate tools for measuring and monitoring access to food within each country [10, 34], the need for global monitoring, such as that sought in the context of the SDGs, raised the issue of comparing results from applications of different scales in different countries [6]. In fact, despite sharing a common evolution, each national scale uses specific thresholds to measure prevalences of food insecurity for nominally

RM.weights [8], equate [1] and plink [35]. The remainder of the paper is organized as follows: Section presents the data and Section is devoted to describing the pillars and the methods of Test Equating; Section presents the main results; and Section concludes with some remarks and possible directions for future work.

As already mentioned, the FIES is strongly based on the ELCSA, which in turn represents a common ancestor for other scales in use in Latin America (EMSA, EBIA, etc.). As a consequence, all these scales largely share the same cognitive content of the items, which constitutes a promising ground on which to address comparability. Nevertheless, the FIES and the national scales show important differences. First of all, national scales measure food insecurity at the household level, while the FIES produces national measures of food insecurity at the individual level. Secondly, national scales have a reference period of months, while the FIES refers to the months previous to the interview. Thirdly, and perhaps most importantly, national scales compute prevalences of food insecurity following a deterministic methodology based on raw scores (number of affirmative responses) and use discrete thresholds (expressed in terms of raw scores) for computing prevalences of food insecurity at different levels of severity. The VoH methodology for the FIES, on the other hand, is probabilistic in nature, in that it fits the Rasch model to the data, models access to food by means of a probability distribution and computes prevalences of food insecurity using thresholds on the continuous latent trait.

The survey modules on which ELCSA, EMSA and EBIA are built have strong similarities [10, 34]. They all account for the three domains of the access dimension of food insecurity discussed in the previous section, aim at measuring food insecurity at the household level and adopt the same reference period of months previous to the day of the module administration.
As far as the methodology is concerned, ELCSA, EMSA and EBIA agree on a similar procedure that can be summarized in a few steps [10, 34]:

1. Computation of a raw score for each household, by counting the number of items affirmed by that household. Raw scores represent an ordinal measure of food insecurity: the higher the raw score, the more severe the level of food insecurity.
2. Computation of prevalences of food insecurity at three levels of severity: mild, moderate and severe. Prevalences are computed as percentages of households in the sample that scored within a certain range expressed in terms of raw scores, with different thresholds depending on whether children live in the household or not (Table 1).
3. Data validation. Homogeneity of the items comprising the scale is assessed by fitting the Rasch model to the data.

Moreover, each national scale makes use of two different versions of the survey module, distinguishing between households with children (i.e. people under the age of years) and households without children. The first group of survey modules is usually made up of to household-referenced items and, for the sake of simplicity, the scale obtained from this set of items will be referred to, in this work, as the Adult scale. The second one integrates the first one by adding from to extra children-referenced questions, and the scale obtained from this set of items will be referred to as the Children scale. The two survey modules thus encompass a different number of items and, from each of them, a scale is built that uses different thresholds to compute prevalences of food insecurity that are meant to reflect the same level of severity. Prevalences derived from the two scales are then considered jointly in order to derive national prevalences of food insecurity.

Scale   Food insecurity level   Households without children   Households with children
ELCSA   mild                    1 to 3                        1 to 5
        moderate                4 to 6                        6 to 10
        severe                  7 to 8                        11 to 15
EMSA    mild                    1 to 2                        1 to 3
        moderate                3 to 4                        4 to 7
        severe                  5 to 6                        8 to 12
EBIA    mild                    1 to 3                        1 to 5
        moderate                4 to 6                        6 to 10
        severe                  7 to 8                        11 to 15

Table 1: Classifications of food insecurity using national scales (ELCSA, EMSA and EBIA) and corresponding ranges of the raw scores for households with and without children.
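The deterministic procedure summarized in Table 1 can be illustrated with a minimal Python sketch. The raw-score ranges are copied from Table 1; the function names, the dictionary layout and the label "food secure" for a raw score of 0 are assumptions made here for illustration and are not part of the national methodologies as described in the text.

```python
from collections import Counter

# Inclusive raw-score ranges copied from Table 1.
# "adult" = households without children, "children" = households with children.
_ELCSA_RANGES = {
    "adult": {"mild": (1, 3), "moderate": (4, 6), "severe": (7, 8)},
    "children": {"mild": (1, 5), "moderate": (6, 10), "severe": (11, 15)},
}
THRESHOLDS = {
    "ELCSA": _ELCSA_RANGES,
    "EBIA": _ELCSA_RANGES,  # Table 1 lists the same ranges for EBIA
    "EMSA": {
        "adult": {"mild": (1, 2), "moderate": (3, 4), "severe": (5, 6)},
        "children": {"mild": (1, 3), "moderate": (4, 7), "severe": (8, 12)},
    },
}


def classify(scale, has_children, raw_score):
    """Map a household raw score to a severity level using Table 1."""
    if raw_score == 0:
        return "food secure"  # assumption: Table 1 only lists scores >= 1
    module = "children" if has_children else "adult"
    for level, (low, high) in THRESHOLDS[scale][module].items():
        if low <= raw_score <= high:
            return level
    raise ValueError(f"raw score {raw_score} out of range for {scale}/{module}")


def prevalences(scale, households):
    """Percentage of households at each level.

    `households` is an iterable of (has_children, raw_score) pairs.
    """
    counts = Counter(classify(scale, hc, rs) for hc, rs in households)
    total = sum(counts.values())
    return {lvl: 100.0 * counts[lvl] / total for lvl in ("mild", "moderate", "severe")}


# Hypothetical sample of four households, for illustration only.
sample = [(False, 0), (False, 2), (True, 5), (True, 9)]
print(prevalences("EMSA", sample))
```

Note that the same raw score can fall in different categories depending on the scale: a raw score of 3 in a household without children is "mild" under ELCSA and EBIA but "moderate" under EMSA, which is the comparability issue the paper addresses.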

It is worth highlighting that, as reported in Table 1, the thresholds used to compute categories of food insecurity that nominally reflect the same level of severity (mild, moderate or severe) are country (or regional)-specific. As a matter of fact, these thresholds were not chosen in order to ensure comparability among countries (no matter how geographically close to each other they might be) nor in light of clear statistical properties, but according to the opinions of experts from the nutrition and social sciences fields. The same consideration holds for the thresholds chosen for the household-referenced scale and the children-referenced scale within each national context. As a consequence, there is no clear guarantee that, for example, a raw score of truly reflects the same level of severity in Mexico and Brazil, or that, applying ELCSA in Guatemala, and can be considered as equivalent scores in households without and with children, respectively.

Inspired by Target . of the SDGs, the Voices of the Hungry (VoH) project of the Food and Agriculture Organization developed the Food Insecurity Experience Scale (FIES), designed to have cross-cultural equivalence and validity in both developing and developed countries, aiming at producing comparable prevalences of food insecurity at various levels of severity [19]. As reported in Table 2, the FIES Survey Module is made up of dichotomous items accounting for the three domains of access to food. Since , the FIES Survey Module (FIES-SM) has been part of the Gallup World Poll (GWP) Survey, from Gallup Inc. [33], a survey that is repeated every year in over countries and administered to a sample of adult individuals (aged or more) representative of the national population. In practice, this has made it possible to reach countries that do not yet have a national measurement system for food insecurity.
In accordance with the characteristics of the GWP, the version of the FIES-SM considered here refers to a period of months prior to the survey administration and investigates food insecurity at the level of adult individuals (people aged older than years), which represents a first difference between the FIES and the national scales.

During the last 12 months, was there a time when, because of lack of money or other resources:

Items                                                          Abbreviations
1. You were worried you would not have enough food to eat?    WORRIED
2. You were unable to eat healthy and nutritious food?        HEALTHY
3. You ate only a few kinds of foods?                         FEWFOOD
4. You had to skip a meal?                                    SKIPMEAL
5. You ate less than you thought you should?                  ATELESS
6. Your household ran out of food?                            RUNOUT
7. You were hungry but did not eat?                           HUNGRY
8. You went without eating for a whole day?                   WHLDAY

Table 2: FIES Survey Module (FIES-SM) for individuals and with a reference period of months.

However, the main difference between the two is in the methodology used [19]. The VoH methodology developed for the FIES employs a probabilistic model not only as a validation tool (for assessing homogeneity of the items in the scale), but also for computing measurements of food insecurity. In fact, food insecurity is treated as a latent trait whose measurement is achieved by means of some "observables" (the answers to the items) and a probabilistic model that links the two. The Rasch model (also known as the one-parameter logistic model or 1PL model) is one of the simplest models that can serve this purpose while, at the same time, ensuring a set of favourable measurement properties [20, 31]. It was proposed in the context of educational testing, where the purpose is generally to score students based on a set of questions (items) and, according to this model, the probability that a respondent correctly answers the $j$-th item is modelled as a logistic function of the distance between two parameters, one representing the item's severity ($b_j$) and one representing the respondent's ability ($\theta$):

$$P_j(\theta) = P(X_j = 1 \mid \theta; b_j) = \frac{\exp(\theta - b_j)}{1 + \exp(\theta - b_j)}. \tag{1}$$

The Rasch model provides a sound statistical framework to assess the suitability of a set of items for scale construction and to compare the performance of scales. Its basic assumptions are unidimensionality, local independence, monotonicity, equal discriminating power of the items and logistic shape of the Item Response Functions (IRFs). Moreover, it has several interesting properties that earned it its success among social science measurement models, such as sufficiency of the raw score, independence between item and examinee parameters, and the invariance property [21].
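As a small illustration of the Rasch item response function in equation (1), the following Python sketch evaluates the probability of an affirmative answer for given respondent and item parameters. The function name and the numerical values are hypothetical and chosen only to show the behaviour of the formula.

```python
import math


def rasch_probability(theta, b):
    """Rasch item response function: exp(theta - b) / (1 + exp(theta - b)).

    theta: respondent parameter (ability, or severity of food insecurity).
    b:     item severity parameter.
    Written as 1 / (1 + exp(-(theta - b))), which is algebraically the same.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical values: the probability depends only on the distance theta - b.
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(rasch_probability(theta, b=0.0), 3))
```

When the respondent parameter equals the item severity the probability is exactly 0.5, and it increases monotonically with the distance between the two, which is the monotonicity assumption mentioned in the text.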
In the context of food insecurity, the item's severity can be interpreted as the severity of the restrictions in food access represented by each item, while the ability parameter is to be understood as the overall severity of the restrictions in accessing food that the respondent had to face (in light of her answers to the items in the survey module). From the point of view of Stevens's classification of scales [32], this way of measuring food insecurity guarantees the construction of an interval scale, as opposed to the ordinal scale obtained from the methodology employed, for instance, for ELCSA, EMSA and EBIA, which is named deterministic as opposed to the probabilistic one developed for the FIES. Moreover, prevalences obtained by means of the FIES are guaranteed to be comparable across countries, thanks to the implementation of an equating step in which estimates of the model parameters obtained in a single application of the scale are adjusted to the FIES Global Standard scale, a set of item parameters serving as a reference metric and based on application of the FIES in all countries that were covered by the GWP survey in , and [19] (Fig. 1). Finally, each respondent is assigned a probability distribution of his/her food insecurity along the latent trait, depending on his/her raw score. This distribution is Gaussian, with mean equal to the adjusted (to the Global Standard) respondent parameter and standard deviation equal to the adjusted measurement error for that raw score. As a last step, this mixture of distributions is used to compute the percentage of the population whose severity is beyond a fixed threshold on the latent trait, calculated as a weighted sum across raw scores, with weights reflecting the proportions of raw scores in the sample (Figure 2).
While it is theoretically possible to compute percentages of the population beyond each and every value on the continuum, the VoH methodology suggests the computation of two prevalence rates corresponding to thresholds on the Global Standard metric set at the severity of items ATELESS ( − . ) and WHLDAY ( . ) (Fig. 1). The resulting indicators of food insecurity take the name of Prevalence of Experienced Food Insecurity at moderate or severe levels ($FI_{Mod+Sev}$) and Prevalence of Experienced Food Insecurity at severe levels ($FI_{Sev}$), respectively. However, in order for these quantities to be valid and reliable measurements of food insecurity, a validation step must be undertaken in each and every application of the scale. This is commonly performed by computing goodness-of-fit statistics of the Rasch model (e.g. Infit, Outfit and Rasch reliability statistics) that assess the good behaviour of the items and by performing a Principal Component Analysis (PCA) on the residuals to investigate the existence of a second latent trait. For more details on the usage of the Rasch model as a measuring tool for food insecurity we refer the reader to [27], while for more insights into the VoH methodology for the FIES we refer to [19].
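The prevalence computation described above (a weighted sum, across raw scores, of the Gaussian probability mass beyond a threshold) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are invented here, and the threshold, respondent parameters, measurement errors and raw-score proportions in the example are hypothetical numbers, not FIES Global Standard values.

```python
import math


def normal_sf(x, mean, sd):
    """P(Z > x) for Z ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def prevalence_beyond(threshold, proportions, means, sds):
    """Share of the population whose severity exceeds `threshold`.

    proportions[r]: proportion of the sample with raw score r (weights).
    means[r]:       adjusted respondent parameter for raw score r.
    sds[r]:         adjusted measurement error for raw score r.
    """
    return sum(
        p * normal_sf(threshold, m, s)
        for p, m, s in zip(proportions, means, sds)
    )


# Hypothetical three-score example (all numbers made up for illustration).
proportions = [0.5, 0.3, 0.2]
means = [-2.0, 0.0, 2.0]
sds = [1.0, 1.0, 1.0]
print(prevalence_beyond(0.0, proportions, means, sds))
```

Evaluating the same function at two different thresholds would give the two prevalence rates mentioned in the text; the weights are expected to sum to one.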

Figure 1: The FIES Global Standard.

Figure 2: Distributions of severity of food insecurity among respondents according to their raw scores.

Data

Data for Guatemala were collected in the Encuesta Nacional de Condiciones de Vida (ENCOVI) conducted by the Instituto Nacional de Estadística (INE) in and the sample used included households. Data for Ecuador were collected in the Encuesta Nacional de Empleo y Desempleo (ENCOVI) conducted by the Instituto Nacional de Estadísticas y Censos (INEC) in and the sample used included households. Data for Mexico were collected in the Encuesta Nacional de Ingresos y Gastos de los Hogares (ENIGH) conducted by the Instituto Nacional de Estadística y Geografía (INEGI) in and the sample used included households. Data for Brazil were collected in the Pesquisa Nacional por Amostra de Domicílios (PNAD) conducted by the Instituto Brasileiro de Geografia e Estatística (IBGE) in and the sample used included households. All samples were representative of the corresponding national populations.

Scores deriving from tests usually play an important role in decision-making processes that lead to excluding some candidates from a job or scholarship position, or to adopting specific public policy strategies in order to take action on an issue of public relevance. Evidently, this requires tests to be administered on multiple occasions, as is the case for college admission tests that are held on specific test dates during the year. Therefore, a crucial consideration arises: if the same questions were included in the tests, students who had already taken the test would have an advantage, and the test would measure the degree of exposure of students to past tests rather than their ability in some specific subject. At the same time, it is important that all students take the "same" test, in order to fairly compare performances and make decisions accordingly. This issue is commonly addressed by administering on every test date a different version of the same test, called a test form, that is built according to certain content and statistical test specifications. Nonetheless, minor differences might still occur among different test forms, with one turning out slightly more difficult than the others. Therefore, in order to fairly score students who took different test forms and establish whether a poorer performance is due to a less skilful respondent and not to a more difficult test, a procedure is needed to make tests comparable. This procedure is called equating and it is formally defined as the statistical process that is used to adjust for differences in difficulty between test forms built to be similar in content and difficulty, so that scores can be used interchangeably [13, 24]. Every test equating should meet some fundamental equating requirements and needs the specification of both a data collection design and one or more methods to estimate an equating function. All these aspects will be discussed in the remainder of this section.

Equating scores on two test forms X and Y must meet some requirements that ensure that the equating is meaningful and useful (i.e. equated scores can be used interchangeably). The following five requirements are generally considered of primary importance for an equating to be run, although they are better regarded as general guidelines than as easily verifiable conditions:

• Equal Construct Requirement: Tests that measure different constructs should not be equated.
• Equal Reliability Requirement: Tests that measure the same construct but differ in reliability should not be equated.
• Symmetry Requirement: The equating function that equates scores on X to scores on Y should be the inverse of the equating function that equates scores on Y to scores on X.
• Equity Requirement: For the examinee it should be a matter of indifference which test is used.
• Population Invariance Requirement: The equating function used to equate scores on X and scores on Y should be population invariant, in that the choice of a specific sub-population used to compute the equating function should not matter.

It might be the case that the two tests to be equated do not satisfy all five requirements. For example, they could differ in length and statistical specifications, with consequences for the "Equal Reliability requirement" and the "Equity requirement". In fact, a longer test would in general be more reliable and, if a poorly skilled examinee had to be scored, he or she would have a better chance of scoring higher if administered the shorter test. The aforementioned requirements ensure that scores derived from tests that meet all of them can be used interchangeably while, if they do not all strictly hold, the exercise should rather be regarded as a weaker analysis of comparability named linking [13, 24].

There are basically two ways in which data collection designs can account for differences in the difficulty of two or more test forms in test equating, namely either by the use of "common examinees" or by the use of "common items" [13, 24]. In the first case the same group of examinees (or two random samples of examinees from the same target population) take both tests. In this case, any difference in the scores is attributable to differences in the test forms. Examples of this category are the "Single-Group" (SG) and the "Equivalent-Groups" (EG) designs. In the second case, a set A of common items, called the anchor test, is included in both test forms in order to account for such differences. Therefore, any difference between scores on the anchor test is due to differences among examinees. Data designs that use this method are called "Non-Equivalent groups with Anchor Test" (NEAT) designs.

Several equating methods have been proposed and applied to equate observed scores on equatable tests.
In this section an overview of the most common and popular methods is provided, starting from the observed-score methods of mean, linear and equipercentile equating and ending with true score equating in the context of IRT. All these methods have been implemented for the two comparability studies between experience-based food insecurity scales that are presented in this work.
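To make the three observed-score methods just listed more concrete, the following Python sketch implements them in their textbook form: mean equating matches signed distances from the means, linear equating matches z-scores, and equipercentile equating matches cumulative proportions. The function names and the sample scores are hypothetical, and the equipercentile function is a crude discrete approximation (no continuization or smoothing), so it should not be read as the procedure implemented in the R packages used in the paper.

```python
def mean_equate(x, mu_x, mu_y):
    """Mean equating: y such that x - mu_x = y - mu_y."""
    return x - mu_x + mu_y


def linear_equate(x, mu_x, sd_x, mu_y, sd_y):
    """Linear equating: y such that (x - mu_x)/sd_x = (y - mu_y)/sd_y."""
    return (sd_y / sd_x) * (x - mu_x) + mu_y


def equipercentile_equate(x, x_scores, y_scores):
    """Discrete approximation of G^{-1}[F(x)].

    F(x) is the proportion of Form X scores at or below x; the result is
    the smallest Form Y score whose cumulative proportion reaches F(x).
    """
    p = sum(s <= x for s in x_scores) / len(x_scores)
    for y in sorted(set(y_scores)):
        if sum(s <= y for s in y_scores) / len(y_scores) >= p:
            return y
    return max(y_scores)


# Hypothetical observed scores on two forms (illustration only).
x_scores = [0, 1, 1, 2, 3]
y_scores = [0, 2, 2, 4, 6]

print(mean_equate(10, mu_x=12, mu_y=14))
print(linear_equate(10, mu_x=12, sd_x=2, mu_y=14, sd_y=4))
print(equipercentile_equate(1, x_scores, y_scores))
```

Mean equating shifts every score by the same amount, linear equating additionally rescales by the ratio of standard deviations, and the equipercentile version lets the conversion vary freely along the score scale.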

Let X and Y be two tests (or two forms of the same test) scored correct/incorrect (1/0). Scores on tests X and Y will be denoted as random variables $X$ and $Y$ with possible values, respectively, $x_k$ (for $k = 0, \ldots, K$) and $y_l$ (for $l = 0, \ldots, L$), where $K$ and $L$ are the lengths of tests X and Y, respectively. We denote the score probabilities of $X$ and $Y$ by

$$r_k = P(X = x_k) \quad \text{and} \quad s_l = P(Y = y_l). \tag{2}$$

The cdfs of $X$ and $Y$ are denoted by

$$F(x) = P(X \le x) \quad \text{and} \quad G(y) = P(Y \le y) \tag{3}$$

and the moments are, respectively,

$$\mu_X = E(X), \quad \mu_Y = E(Y) \tag{4}$$

and

$$\sigma_X = SD(X), \quad \sigma_Y = SD(Y). \tag{5}$$

Mean equating

In mean equating, test form X is assumed to differ from test form Y by a constant amount along the scale. For example, if form X is 2 points easier than form Y for low-ranking examinees, the same will hold for high-ranking examinees. In mean equating, two scores on different forms are considered equivalent (and set equal) if they are the same (signed) distance from their respective means, that is

$$x - \mu_X = y - \mu_Y. \tag{6}$$

Then, solving for $y$, the score on test Y that is equivalent to a score $x$ on test X, called $m_Y(x)$, is

$$m_Y(x) = y = x - \mu_X + \mu_Y. \tag{7}$$

Clearly, mean equating allows for the means to differ in the two test forms.

Linear equating

In linear equating, the difference in difficulty between the two tests is not constrained to remain constant but can vary along the score scale. In this equating method, scores are considered equivalent and set equal if they are an equal (signed) distance from their means in standard deviation units, that is, the two standardized deviation scores (z-scores) on the two forms are set equal

$$\frac{x - \mu_X}{\sigma_X} = \frac{y - \mu_Y}{\sigma_Y} \tag{8}$$

from which the score on test Y equivalent to a score $x$ on test X, called $l_Y(x)$, is

$$l_Y(x) = y = \frac{\sigma_Y}{\sigma_X}\, x + \left[ \mu_Y - \frac{\sigma_Y}{\sigma_X}\, \mu_X \right], \tag{9}$$

where $\sigma_Y / \sigma_X$ can be recognized as the slope and $\mu_Y - (\sigma_Y / \sigma_X)\, \mu_X$ as the intercept of the linear equating transformation. Linear equating allows for both means and scale units to differ in the two test forms.

Equipercentile equating

In the equipercentile equating method, a curve is used to describe differences between scores on the two forms. The basic criterion for the equipercentile equating transformation is that the distribution of the scores on Form X converted to the Form Y scale is equal to the distribution of scores on Form Y. Scores on the two forms are considered, and set, equivalent if they have the same percentile rank. We adopt here the definition of equipercentile equating function given by Braun and Holland in [5]. Consider the random variables $X$ and $Y$ representing the scores on Forms X and Y, and let $F$ and $G$ be their cumulative distribution functions. We call $e_Y$ the symmetric equating function converting Form X scores into scores on the Form Y scale, and $G^\star$ the cumulative distribution function of $e_Y(X)$, that is, the cdf of the scores on Form X converted to the Form Y scale. The function $e_Y$ is the equipercentile equating function if $G^\star = G$. According to the definition of Braun and Holland, if $X$ and $Y$ are continuous random variables, then

\[ e_Y(x) = G^{-1}[F(x)] \tag{10} \]

is an equipercentile equating function, where $G^{-1}$ is the inverse of $G$. This definition meets the “Symmetric requirement” and, given a Form X score, its equivalent on the Form Y scale is defined as the score having the same percentage of examinees at or below it.

Equating different forms of the same test using IRT True Score equating (IRT-TS) is a three-step process [12]:

1. Estimation: Fit an IRT model to the data for both tests.
This step consists in assessing the goodness-of-fit of a specific IRT model and estimating item parameters for both forms. In the case of the Rasch model, in light of the sufficient statistics property, estimates of the item severities do not depend on the group of examinees. The IRT-TS based on the Rasch model can therefore be claimed to meet the “Population Invariance requirement”, since it produces results that are sample-independent.

2. Linking: Put the parameter estimates on a common metric through a linear transformation based on a set A of common items.
In this second step, a linear transformation is used to bring parameter estimates to a common IRT scale. In fact, "if an IRT model fits the data, then any linear transformation of the θ-scale also fits the data, provided that the item parameters are transformed as well" [24]. Consider Form X, made up of $J$ dichotomously scored items administered to $N$ examinees, and assume that the Rasch model fits the data. Then, if P and Q are Rasch scales that differ by a linear transformation, the item severities $b_j$, $j \in \{1, \dots, J\}$, are related as follows:

\[ b_{jQ} = A\, b_{jP} + B, \qquad j \in \{1, \dots, J\}, \]

and the same relationship holds for the ability parameters. A useful way to express the constants $A$ and $B$ is through the mean and standard deviation of the item parameters on both scales:

\[ A = \frac{\sigma(b_Q)}{\sigma(b_P)}, \qquad B = \mu(b_Q) - A\, \mu(b_P). \]

In equating two different forms of the same test with a set of common items administered to non-equivalent groups, it is possible to exploit this linear relationship through the so-called Mean/Sigma transformation method [26], which uses the means and standard deviations of the item parameter estimates of only those items in the anchor test. More specifically, given Form X and Form Y with a set A of common items, the estimates of the difficulty parameters for the items in A from the two calibrations are used to compute the coefficients $A$ and $B$ of the linear transformation that links them. Once the transformation is estimated, it can be applied to transform the ability parameters on one Form to the corresponding parameters on the other Form, thus enabling comparability between the two Forms.

3. Equating: Get equivalent expected raw scores through the Test Characteristic Curves (TCC) of the two tests.
Once the metrics of the two Forms are put on the same scale (which can be either the scale of one of them or a third scale), it is finally possible to compare the performance of examinees taking the two Forms. However, as often happens with standardized tests, reported scores may be expressed in terms of raw scores and, if this is the case, a further step is needed. Within the framework of IRT, it is possible to mathematically relate ability estimates to specific true scores on each test form. The IRT True Score equating method computes equivalent true scores on the two forms and treats them, as is common in the practice of equating, as equivalent observed scores [25]. Let Form X and Form Y be two test forms measuring the same ability θ with $n_X$ and $n_Y$ items respectively, and suppose that both item and ability estimates have been put on the same scale through a linear transformation. The estimated true scores on the two forms are then related to θ by the so-called Test Characteristic Curve as follows:

\[ T_X = \sum_{j=1}^{n_X} \hat{P}_j(\theta), \qquad T_Y = \sum_{i=1}^{n_Y} \hat{P}_i(\theta), \]

where $T_X$ (respectively $T_Y$) is the estimated true score for Test X (Y) and $\hat{P}_j(\theta)$ ($\hat{P}_i(\theta)$) is the estimated probability function for item $j$ ($i$) (Fig. 3). Through the Test Characteristic Curve, an ability θ can thus be transformed into an estimated true score on the test form and, provided the ability parameters on the two forms are put on the same metric, true scores corresponding to the same θ are considered equivalent.

Figure 3:

Test Characteristic Curve (TCC) referred to a test of three items (left) and a pictorial description of the IRT-True Score (IRT-TS) equating method (right).
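As an illustration of the two classical methods defined in Eq. (9) and Eq. (10), the following is a minimal Python sketch, not the implementation used in this work. The function names and the toy score vectors are hypothetical, and the equipercentile function uses plain empirical distributions with interpolated quantiles, ignoring the continuization and smoothing steps normally applied to discrete raw scores.

```python
import numpy as np

def linear_equate(x, scores_x, scores_y):
    """Eq. (9): map a Form X score to the Form Y scale by matching z-scores."""
    mu_x, mu_y = np.mean(scores_x), np.mean(scores_y)
    sd_x, sd_y = np.std(scores_x), np.std(scores_y)
    slope = sd_y / sd_x
    intercept = mu_y - slope * mu_x
    return slope * x + intercept

def equipercentile_equate(x, scores_x, scores_y):
    """Eq. (10): e_Y(x) = G^{-1}[F(x)] with empirical F and G (assumption)."""
    p = np.mean(np.asarray(scores_x) <= x)   # empirical F(x)
    return float(np.quantile(scores_y, p))   # interpolated inverse of G

# Toy (made-up) raw scores observed on the two forms
scores_x = np.array([0, 1, 2, 3, 4])
scores_y = np.array([1, 3, 5, 7, 9])

print(linear_equate(2, scores_x, scores_y))
print(equipercentile_equate(4, scores_x, scores_y))
```

With these toy data Form Y has twice the spread of Form X, so the linear function has slope 2; the equipercentile function simply matches cumulative proportions.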

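The three steps of the IRT-TS procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the code used in this work: item severities are taken as already estimated (step 1 is skipped), all numerical values are hypothetical, the Test Characteristic Curve is inverted by simple bisection, and the example uses a slope A equal to 1, since under a strict Rasch model a slope different from 1 would also rescale the discrimination.

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch probability of an affirmative answer for severities b."""
    b = np.asarray(b, dtype=float)
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def mean_sigma(b_anchor_p, b_anchor_q):
    """Mean/Sigma coefficients (A, B) such that b_Q = A * b_P + B."""
    A = np.std(b_anchor_q) / np.std(b_anchor_p)
    B = np.mean(b_anchor_q) - A * np.mean(b_anchor_p)
    return A, B

def tcc(theta, b):
    """Test Characteristic Curve: expected (true) raw score at ability theta."""
    return float(np.sum(rasch_prob(theta, b)))

def theta_for_true_score(t, b, lo=-10.0, hi=10.0, tol=1e-10):
    """Invert the (monotone increasing) TCC by bisection."""
    if not 0.0 < t < len(b):
        raise ValueError("true score must lie strictly between 0 and n items")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tcc(mid, b) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def irt_true_score_equate(t_x, b_x, b_y):
    """Form Y true score equivalent to true score t_x on Form X.
    Both severity vectors must already be on the same metric."""
    return tcc(theta_for_true_score(t_x, b_x), b_y)

# Hypothetical severities of the anchor items on scales P and Q
anchor_p = np.array([-1.0, 0.0, 1.0])
anchor_q = np.array([-0.5, 0.5, 1.5])
A, B = mean_sigma(anchor_p, anchor_q)            # here A = 1, B = 0.5

b_x_on_q = A * np.array([-1.0, 0.0, 1.0]) + B    # Form X items moved to scale Q
b_y_on_q = np.array([-2.0, -0.5, 0.5, 1.5, 3.0]) # Form Y items, already on Q
print(irt_true_score_equate(2.0, b_x_on_q, b_y_on_q))
```

Raw scores of 0 and of the maximum have no finite θ under the Rasch model, which is why the sketch rejects them instead of extrapolating.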
As a preliminary step to both equating studies, the Rasch model was fitted to all eight datasets (an Adult scale and a Children scale in each of the four countries), and a validation step was performed to confirm the good behaviour of the scales. In all eight applications the model showed an overall good fit. The assumption of equal discrimination of the items was met, with item Infit statistics entirely in the range (0 . , . ), confirming the strength and consistency of the association of each item with the underlying latent trait (compare [19, 27]). Moreover, Outfit statistics were never so high as to signal misbehaviour due to highly unexpected response patterns, supporting the good performance of the items. The assumptions of conditional independence and unidimensionality of the items were assessed by computing the conditional correlations between each pair of items and submitting the correlation matrix to principal component analysis (PCA). All pairwise residual correlations were, in absolute value, smaller than . , thus confirming that the correlations among items result from their common association with the latent trait. The PCA performed on the matrix of residual correlations showed the presence of only one main dimension which, given the cognitive content of the items, can be recognized as the food access dimension that the scales aim to measure. Finally, overall model fit was assessed by Rasch reliability statistics (the proportion of total variation in true severity in the sample that is accounted for by the model), ranging between . (Mexico) and . (Guatemala) for the Adult scale and between . (Mexico) and . (Guatemala) for the Children scale, confirming a good overall discriminatory power for all scales. Sporadic departures from this behaviour were observed only for one or two items in the Children scale (such as a residual correlation of . between two items of the ELCSA in Guatemala), which however never compromised the good performance of the overall scale.

The aim of this comparability study is to find raw scores on the national scales EBIA, EMSA and ELCSA that can be considered equivalent to the FIES global thresholds, defined on the continuous severity scale, that are used to compute the two indicators $FI_{Mod+Sev}$ and $FI_{Sev}$, namely − . and . . However, it is worth noting that, since the VoH methodology uses thresholds on the continuum while the national scales methodology uses discrete thresholds, the equivalent raw score will almost never produce exactly the same prevalence obtained with the VoH thresholds.

As it is currently set up and implemented, the FIES Module refers to adults (people aged or above). Therefore, in order to meet the “Equal Construct requirement”, the modules of the national scales administered to households without children have been considered here. Technically, the FIES Survey Module and the survey modules of the national scales (households without children) thus serve the role of test forms of the same test that are to be equated. This was ultimately made possible by the common history that led to the development of these scales (i.e. FIES, ELCSA, EMSA, EBIA), which ensures that, despite some differences such as the level of measurement and the reference time (see Section 2), the survey modules used to collect data have very strong similarities and share the same dichotomous structure (possible answers are “Yes/No”).

This first study was carried out by implementing the following methods:

1. IRT True Score (IRT-TS) equating.
2. Linking via a linear transformation applied to ability parameters.
3. Minimization of the difference between prevalences of food insecurity.

The IRT-TS equating method was implemented in the context of the NEAT equating design. In this work, the set A of common items was computed according to an iterative procedure that starts with all items considered as common (apart from the ones classified as unique a priori) and then discards one item at a time, beginning with the one that exceeds the tolerance threshold of . the most. The algorithm ends when a set A of items all within this threshold is found. Item WHLDAY was considered unique a priori in all four equating analyses because of its different cognitive content in the scales considered: it is more severe in the FIES, where it refers to “not eating for a whole day”, and less severe in the national scales, where it reports on members of the household who either only ate once or went without eating for a whole day. The Standard Error of Equating (SEE) for the IRT True Score equating method was estimated using bootstrap replications [24].

The second and third methods can be considered either as variations of the IRT-TS or as techniques that seem particularly reasonable in the present context. They were explored for investigation purposes, and the scores obtained are not claimed to be “equivalent” but rather “corresponding” scores. The second method (Linking) takes the linear transformation obtained in the second step of the IRT-TS method and applies it to the estimated ability parameters of the Rasch model. Once the ability parameters are adjusted to the Global Standard metric, the raw scores whose ability parameters are closest to the two VoH thresholds are taken as the corresponding raw scores.
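The iterative selection of the common-item set A described above can be sketched as follows. This is a hedged Python illustration, not the procedure as actually coded for this work: it assumes that, at each iteration, the national severities of the current candidate items are aligned to the reference metric with the Mean/Sigma coefficients before the gaps are compared with the tolerance. The function name, the severity values and the tolerance of 0.5 are all hypothetical.

```python
import numpy as np

def select_common_items(b_national, b_global, tol, unique=()):
    """Drop, one at a time, the item whose aligned severity deviates most
    from the reference, until all remaining items are within `tol`."""
    items = [k for k in b_national if k in b_global and k not in unique]
    while len(items) >= 2:
        p = np.array([b_national[k] for k in items], dtype=float)
        q = np.array([b_global[k] for k in items], dtype=float)
        A = q.std() / p.std()                # Mean/Sigma slope on current set
        B = q.mean() - A * p.mean()          # Mean/Sigma intercept
        gaps = np.abs(A * p + B - q)         # distance after alignment
        worst = int(np.argmax(gaps))
        if gaps[worst] <= tol:
            break                            # every item is within tolerance
        items.pop(worst)                     # discard the worst offender
    return items

# Made-up severities; WHLDAY is declared unique a priori, as in the text
b_global = {"WORRIED": -2.0, "HEALTHY": -1.0, "SKIPPED": 0.0,
            "ATELESS": 1.0, "HUNGRY": 2.0, "WHLDAY": 3.0}
b_national = {"WORRIED": -2.0, "HEALTHY": -1.0, "SKIPPED": 1.0,
              "ATELESS": 1.0, "HUNGRY": 2.0, "WHLDAY": 1.5}

common = select_common_items(b_national, b_global, tol=0.5, unique=("WHLDAY",))
print(common)
```

Because the alignment is re-estimated after each removal, a single badly behaving item does not permanently distort the comparison for the remaining ones.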
The third method (Minimizing), on the other hand, consists in computing prevalences of food insecurity at the household level by applying the FIES methodology to the data used for the national scales, and comparing the prevalences so obtained with the percentages of the population scoring at or above a given raw score. The two raw scores that realize the minimum distance from the two VoH global thresholds (in terms of prevalences) are considered the corresponding raw scores according to this method.

Results from the first comparability study are summarized in Table 3 and Table 4, which report the raw scores on the national scales that are computed as equivalent to the VoH global thresholds used for the indicators $FI_{Mod+Sev}$ and $FI_{Sev}$, respectively. Table 3 shows that the threshold used for computing $FI_{Mod+Sev}$, corresponding to the severity of item ATELESS on the Global Standard metric (i.e. − . ), might reflect a less severe condition of food insecurity than the one measured by the national scales for the moderate category of food insecurity. In fact, all the equated raw scores are either equal to or around one point less than the thresholds currently used by ELCSA, EMSA and EBIA for this category of food insecurity. On the contrary, the threshold used for $FI_{Sev}$, corresponding to the severity of item WHLDAY on the Global Standard metric (i.e. . ), generally reflects a more severe condition of food insecurity than the one captured by the national scales for the severe level of food insecurity, with Table 4 reporting equated raw scores that are equal to or one to two points higher than the national thresholds currently in use for this category.

Food Insecurity Scale   Internal Monitoring   IRT-TS Rasch (SEE)   Linking   Min. Diff.
ELCSA (Guatemala)       4                     3.3 (0.19)           3         4
ELCSA (Ecuador)         4                     4.2 (0.14)           4         4
EMSA (Mexico)           3                     2.0 (0.23)           2         2
EBIA (Brazil)           4                     4.0 (0.09)           4         5

Table 3: Equated raw scores on the national scales corresponding to the VoH threshold for $FI_{Mod+Sev}$ (− . on the Global Standard).

Food Insecurity Scale   Internal Monitoring   IRT-TS Rasch (SEE)   Linking   Min. Diff.
ELCSA (Guatemala)       7                     7.8 (0.18)           8         8
ELCSA (Ecuador)         7                     7.1 (0.18)           7         8
EMSA (Mexico)           5                     6.0 (0.26)           6         6
EBIA (Brazil)           6                     7.9 (0.07)           8         8

Table 4: Equated raw scores on the national scales corresponding to the VoH threshold for $FI_{Sev}$ ( . on the Global Standard).

This second analysis aims at comparing the Adult and Children scales within each national context. To this purpose, we implemented the Single Group (SG) data collection design by considering the scores obtained by the households with children on both survey modules (the one containing only adult and household-referenced questions and the one also containing children-referenced questions). This equating design is usually not easy to implement, since it requires the same group of respondents to be administered two different forms of the same test, resulting in an expensive and time-consuming procedure. Here, however, we could exploit the fact that the survey module for the Children scale is simply an extended version of the module for the Adult scale (see Section 2); as such, we can “imagine” administering the adult-referenced survey to households with children just by dropping the children-referenced items. With regard to the equating requirements (see Section 4.1), it is worth noting that the two survey modules have different lengths and, as such, the resulting scales could have different reliability which, in turn, could potentially challenge the “Equal Reliability Requirement”. Equating of the Adult and Children scales in the four countries was carried out by implementing four equating methods: IRT True Score equating with the Rasch model, and the Mean, Linear and Equipercentile equating methods [24]. The first method is IRT-based, while the other three are classical methods of equating, which do not rely on any model fitted to the data but only on the observed raw scores.

Tables 5 and 6 show the raw scores on the Children scale that are computed as equivalent to the raw scores on the Adult scales used as lower thresholds for the moderate and severe categories of food insecurity. Results suggest that the national thresholds currently used for nominally the same levels of severity could reflect different degrees of severity of the restrictions on access to food.
This is particularly evident when looking at the most severe category of food insecurity, where raw scores on the Children scale that are computed as equivalent to the lower thresholds on the Adult scale are around one point higher than the thresholds currently in use for households with children for ELCSA in Guatemala and EMSA in Mexico, and between one and two points lower for EBIA (Tables 5 and 6, column “Severe”). On the other hand, the corresponding raw scores for moderate food insecurity mainly align with the thresholds currently in use for this category (Tables 5 and 6, column “Moderate”). Interestingly, minor differences emerge between the behaviour of the equated scores for ELCSA in Guatemala and in Ecuador, possibly due to specific features of the phenomenon in the two countries, confirming the importance of an equating analysis even between different applications of the same scale. Finally, it is noteworthy that, among all implemented methods, the Equipercentile equating method is the one whose results most closely resemble the currently adopted thresholds.

                   ELCSA (Guatemala)               ELCSA (Ecuador)
Equating Method    Moderate (SEE)   Severe (SEE)   Moderate (SEE)   Severe (SEE)
IRT-TS             6.2 (0.09)       12.1 (0.10)    5.8 (0.09)       12.1 (0.11)
Mean               6.6 (0.07)       12.2 (0.07)    6.4 (0.05)       11.6 (0.05)
Linear             6.5 (0.07)       11.7 (0.11)    6.2 (0.07)       11.0 (0.10)
Equip              6.3 (0.09)       11.3 (0.15)    5.8 (0.13)       11.2 (0.16)

Table 5: Raw scores on the Children scale corresponding to the raw scores on the Adult scale used as lower thresholds for moderate and severe food insecurity, respectively, and related Standard Error of Equating (SEE), computed by means of the IRT-TS, Mean, Linear and Equipercentile equating methods. Left: ELCSA (Guatemala). Right: ELCSA (Ecuador).

                   EMSA (Mexico)                   EBIA (Brazil)
Equating Method    Moderate (SEE)   Severe (SEE)   Moderate (SEE)   Severe (SEE)
IRT-TS             4.8 (0.12)       8.7 (0.13)     4.8 (0.08)       8.7 (0.09)
Mean               5.5 (0.05)       9.5 (0.05)     5.5 (0.04)       9.0 (0.04)
Linear             5.1 (0.07)       8.6 (0.10)     5.5 (0.04)       8.8 (0.07)
Equip              4.8 (0.13)       8.1 (0.14)     4.7 (0.05)       8.3 (0.14)

Table 6: Raw scores on the Children scale corresponding to the raw scores on the Adult scale used as lower thresholds for moderate and severe food insecurity, respectively, and related Standard Error of Equating (SEE), computed by means of the IRT-TS, Mean, Linear and Equipercentile equating methods. Left: EMSA (Mexico). Right: EBIA (Brazil).

This work presented two studies investigating comparability between experiential scales of food insecurity. The first study addressed comparability between the FIES and three national scales (ELCSA, EMSA and EBIA) in Guatemala, Ecuador, Mexico and Brazil. Results show that, in general, the VoH threshold used for computing the indicator Prevalence of Experienced Food Insecurity at moderate or severe levels ($FI_{Mod+Sev}$), corresponding to the severity of item ATELESS on the Global Standard (− . ), seems to reflect a less severe level of food insecurity than that described by the thresholds used by the national scales (expressed in terms of raw scores) for the same level of severity. On the other hand, the VoH threshold used for computing the indicator Prevalence of Experienced Food Insecurity at severe levels ($FI_{Sev}$), corresponding to the severity of item WHLDAY on the Global Standard ( . ), seems to reflect a more severe condition than the one measured through the national thresholds for this level of severity. This result matters for the practice of food insecurity measurement: the possibility of comparing prevalences of food insecurity derived from different scales is an important step towards a more reliable monitoring of global progress towards the goal of food security for all people worldwide (as expressed by Target . of the Sustainable Development Goals) and, as such, it is expected to gain increasing attention from practitioners and decision makers in the field.

Additionally, a second study investigated the issue of comparability between food insecurity scales referred to households without children (Adult scale) and households with children (Children scale), within each national context. Results show that the national thresholds currently used to compute prevalences of food insecurity at nominally the same level of severity among households with and without children might not always represent the same degree of restriction on food access. This seems especially evident for the most severe category of food insecurity, where current thresholds on the Children scale are lower than those computed as equivalent to the thresholds used on the Adult scale for this category. Future studies are expected to shed light on possible reasons for, and additional aspects of, this topic. A more detailed characterization of the phenomenon of food insecurity across countries, as well as in households with and without children (especially from a social and economic point of view), will likely help to explain the distinctive behaviour of different food insecurity scales.
Furthermore, similar analyses conducted on other experiential scales of food insecurity might contribute to a deeper knowledge of the phenomenon. Significant examples are the HFSSM in North America, as well as applications of ELCSA in Latin American countries beyond the ones considered here.

References

[1] Anthony D Albano et al. equate: An R package for observed-score linking and equating. Journal of Statistical Software, 74(8):1–36, 2016.
[2] United Nations General Assembly. Universal declaration of human rights, volume 3381. Department of State, United States of America, 1949.
[3] Terri J Ballard, Anne W Kepple, and Carlo Cafiero. The food insecurity experience scale: development of a global standard for monitoring hunger worldwide. Rome, Italy. FAO, 2013.
[4] Terri J Ballard, Anne W Kepple, Carlo Cafiero, and Josef Schmidhuber. Better measurement of food insecurity in the context of enhancing nutrition. Ernahrungs Umschau, 61(2):38–41, 2014.
[5] Henry I Braun. Observed-score test equating: A mathematical analysis of some ETS equating procedures. Test equating, 1982.
[6] Carlo Cafiero. What do we really know about food security? Technical report, National Bureau of Economic Research, 2013.
[7] Carlo Cafiero, Hugo R Melgar-Quinonez, Terri J Ballard, and Anne W Kepple. Validity and reliability of food security measures. Annals of the New York Academy of Sciences, 1331(1):230–248, 2014.
[8] Carlo Cafiero, Sara Viviani, and Mark Nord. RM.weights: Weighted Rasch modeling and extensions using conditional maximum likelihood. https://CRAN.R-project.org/package=RM.weights, 2018.
[9] Steven J Carlson et al. Measuring food insecurity and hunger in the United States. The Journal of Nutrition, 129(2):510S–516S, 1999.
[10] Comité Científico de la ELCSA. Escala Latinoamericana y Caribeña de Seguridad Alimentaria (ELCSA): Manual de uso y aplicaciones. Rome, Italy. FAO, 2012.
[11] Jennifer Coates, Edward A Frongillo, Beatrice Lorge Rogers, Patrick Webb, Parke E Wilde, and Robert Houser. Commonalities in the experience of household food insecurity across cultures: what are measures missing? The Journal of Nutrition, 136(5):1438S–1448S, 2006.
[12] Linda L Cook and Daniel R Eignor. IRT equating methods. Educational Measurement: Issues and Practice, 10(3):37–45, 1991.
[13] Neil J Dorans, Mary Pommerich, and Paul W Holland. Linking and aligning scores and scales. Springer Science & Business Media, 2007.
[14] FAO. World Food Survey. Washington, DC. FAO, 1946.
[15] FAO. World Food Survey. Rome, Italy. FAO, 1952.
[16] FAO. Third World Food Survey. Freedom from Hunger Campaign Basic Study no. 11. Rome, Italy. FAO, 1963.
[17] FAO. World food summit. Rome, Italy. FAO, 1996.
[18] FAO. The state of food insecurity in the world 2001. Rome, Italy. FAO, 2001.
[19] FAO. Methods for estimating comparable rates of food insecurity experienced by adults throughout the world. Rome, Italy. FAO, 2016.
[20] Gerhard H Fischer and Ivo W Molenaar. Rasch models: Foundations, recent developments, and applications. Springer Science & Business Media, 2012.
[21] Ronald K Hambleton, Hariharan Swaminathan, and H Jane Rogers. Fundamentals of item response theory, volume 2. Sage, 1991.
[22] William L Hamilton, John T Cook, et al. Household food security in the United States in 1995: technical report of the food security measurement project. 1997.
[23] Andrew D Jones, Francis M Ngure, Gretel Pelto, and Sera L Young. What are we assessing when we measure food security? A compendium and review of current metrics. Advances in Nutrition, 4(5):481–505, 2013.
[24] Michael J Kolen and Robert L Brennan. Test equating, scaling, and linking: Methods and practices. Springer Science & Business Media, 2014.
[25] Frederic M Lord and Marilyn S Wingersky. Comparison of IRT observed-score and true-score ‘equatings’. ETS Research Report Series, 1983(2):i–33, 1983.
[26] Gary L Marco. Item characteristic curve solutions to three intractable testing problems. ETS Research Bulletin Series, 1977(1):i–41, 1977.
[27] Mark Nord. Assessing Potential Technical Enhancements to the U.S. Household Food Security Measures. U.S. Department of Agriculture, Economic Research Service, TB-1936, 2012.
[28] World Health Organization et al. Sustainable Development Goals: 17 goals to transform our world, 2016.
[29] Per Pinstrup-Andersen. Food security: definition and measurement. Food Security, 1(1), 2009.
[30] Kathy L Radimer, Christine M Olson, Jennifer C Greene, Cathy C Campbell, and Jean-Pierre Habicht. Understanding hunger and developing indicators to assess it in women and children. Journal of Nutrition Education, 24(1):36S–44S, 1992.
[31] Georg Rasch. Probabilistic models for some intelligence and achievement tests. Copenhagen: Danish Institute for Educational Research, 1960.
[32] Stanley Smith Stevens et al. On the theory of scales of measurement. 1946.
[33] Robert D Tortora, Rajesh Srinivasan, and Neli Esipova. The Gallup World Poll. Survey Methods in Multinational, Multiregional, and Multicultural Contexts, pages 535–543, 2010.
[34] Paloma Villagómez-Ornelas, Pedro Hernández-López, Brenda Carrasco-Enríquez, Karina Barrios-Sánchez, Rafael Pérez-Escamilla, and Hugo Melgar-Quiñónez. Validez estadística de la Escala Mexicana de Seguridad Alimentaria y la Escala Latinoamericana y Caribeña de Seguridad Alimentaria. Salud Pública de México, 56:s5–s11, 2014.
[35] Jonathan P Weeks et al. plink: An R package for linking mixed-format tests using IRT-based methods. Journal of Statistical Software, 35(12):1–33, 2010.

Authors’ address:

University of Rome La Sapienza, Faculty of Information Engineering, Informatics and Statistics, Department of Statistical Sciences, Italy

Food and Agriculture Organization of the United Nations