Towards global monitoring: equating the Food Insecurity Experience Scale (FIES) and food insecurity scales in Latin America
Federica Onori, Sara Viviani and Pierpaolo Brutti
Abstract
In order to face food insecurity as a global phenomenon, it is essential to rely on measurement tools that guarantee comparability across countries. Although the official indicators adopted by the United Nations in the context of the Sustainable Development Goals (SDGs) and based on the Food Insecurity Experience Scale (FIES) already embed cross-country comparability, other experiential scales of food insecurity currently employ national thresholds, and issues of comparability thus arise. In this work we address the comparability of experience-based food insecurity scales by presenting two different studies. The first one involves the FIES and three national scales (ELCSA, EMSA and EBIA) currently included in national surveys in Guatemala, Ecuador, Mexico and Brazil. The second study concerns the adult and children versions of these national scales. Different methods from the equating practice of the educational testing field are explored: classical methods and methods based on Item Response Theory (IRT).

arXiv [stat.AP]

Introduction
Food security is a subject of indisputable relevance, having been conceived as a basic human right since , as stated in Article of the Universal Declaration of Human Rights: “Everyone has the right to a standard of living adequate for the health and well-being of himself and of his family, including food, clothing, housing and medical care” [2]. However, food security is a complex and multifaceted concept whose terminology has long been affected by a variety of sectors and disciplines strictly related to it (e.g. agriculture, nutrition, economy, public policy, etc.) [7, 23]. A consensus around the definition of food security was finally reached during the World Food Summit in when it was formalized as follows: “Food security exists when all people, at all times, have physical and economic access to sufficient, safe and nutritious food that meets their dietary needs and food preferences for an active and healthy life” [17]. Building on this definition, food security is conceptualized and operationalized as a multidimensional phenomenon made up of four different, hierarchically ordered dimensions: availability, access, utilization and stability [6, 29]. As a consequence, no single indicator can return a thorough picture of the phenomenon; instead, a suite of indicators exists, each monitoring specific aspects of food security at different levels of observation: national, regional, household and individual [23]. Among all possible aspects related to food insecurity, the dimension of access to food is nowadays given high priority, being acknowledged among the Sustainable Development Goals (SDGs) of the Agenda for Sustainable Development adopted by the United Nations. Access to food is in fact the subject of Target . [28], which states:

By 2030, end hunger and ensure access by all people, in particular the poor and people in vulnerable situations, including infants, to safe, nutritious and sufficient food all year round.

Footnote: The definition in [17] was further refined in [18], when food access was not only conceived in terms of affordability and physical access, but also in terms of removal of social barriers. The community of researchers, practitioners and political decision makers currently agrees upon the following definition: “Food security exists when all people, at all times, have physical, social and economic access to sufficient, safe and nutritious food that meets their dietary needs and food preferences for an active and healthy life.”

Although food security is now a well-established concept within the scientific community, its definition changed throughout the last century and so did the tools employed to measure the phenomenon [7, 23]. A brief summary of the main steps helps to fully appreciate the novelties brought about by the measurement tools developed since the ’ s. During the s and for some decades on, the issue of food security was completely identified with that of having enough provisions to cover the needs of the population and, therefore, the “food problem” was mainly dealt with in terms of country-level supplies [14, 15, 16]. Nevertheless, this formulation could not capture the malnutrition and famines that were nonetheless observed in countries that did not suffer from food supply shortages at national level [9], signal that a [...] access to food. To mark this change in perspective, the expression household food insecurity began to be used. Since then, other shifts have shaped the definition of food insecurity into the one we use today.
A very fundamental one was in the s when interest moved from dietary energy adequacy to the experience of food insecurity and livelihood conditions, which involved, among others, social, nutritional and psychological considerations. Food insecurity has in fact been recognized as a “managed process”, described by means of a spectrum of behaviours and coping strategies that can reveal the level of severity of a food access condition [30]. Although specific attitudes and coping strategies might change from country to country, there is a general consensus in the scientific community about the common pattern of behaviours that characterize food insecurity, with very minor differences across cultures [11]. In this regard, ethnographic and societal studies established that, in case of increasing lack of money or other resources, a common pattern of experiences and behaviours manifests in order to cope with shortage of food [30]: at first, psychological concern arises since people start worrying about having enough food; then, a change in the diet occurs by decreasing the quality and variety of the consumed food in order to face a concrete limited access to food; and, in case of more severe food shortages, people diminish the quantity of consumed food by reducing the size of meals and then by even skipping meals, potentially up to experiencing hunger.
The steps just described are commonly referred to as the three domains of resource-constrained access to food: psychological concern, decrease of food quality, and decrease of food quantity and hunger.

Footnote: The aim of this first part of the work was mainly to provide a general framework for the topic and clarify that the expression “food insecurity” technically refers to a multitude of aspects that relate to different dimensions. However, in order to avoid confusion and enable an agile treatment of the subject, hereafter “food insecurity” will specifically be meant at the individual or household level and interpreted as the set of restrictions in accessing food due to limited resources (or, equivalently, resource-constrained access to food). This choice will also facilitate conceiving food insecurity as a measurable construct.

Mirroring these shifts in the paradigm (from global and national to households and individuals; from food supplies to livelihood conditions; and from objective to subjective measures [7]), a number of indicators have been proposed to measure food insecurity, such as measures of adequacy of food consumption, prevalence of undernourishment, dietary diversity score, etc. Among all, experience-based food insecurity scales found a place of relevance, having proved to be a valid and reliable tool for measuring food insecurity in its access dimension, encompassing the current definition of the phenomenon while adopting a behavioural perspective [7]. As the name suggests, experience-based food insecurity scales build on a set of items that directly ask people about their own personal experiences and behaviours related to the three domains of access to food [23]. The very first experience-based food insecurity scale was the Household Food Security Survey Module (HFSSM), applied yearly in the United States of America since for monitoring purposes [22]. The HFSSM pioneered this field and several countries in Latin America followed its example by developing their own national scales to be included in national surveys for periodical monitoring. In , Brazil included the Brazilian Scale of Food Insecurity (EBIA) into national Brazilian surveys; Haiti, Guatemala and Ecuador, [...] Latin American and Caribbean Food Security Scale (Escala Latinoamericana y Caribeña de Seguridad Alimentaria - ELCSA); and in Mexico developed its adaptation of the ELCSA, called Mexican Food Security Scale (EMSA). Peculiar to these scales is the availability of two different survey modules, one for households with children and one for households without children, made up of a different number of items. Finally, beside these country-specific applications of the experience-based approach to measuring food insecurity, in the Food and Agriculture Organization of the United Nations (FAO) launched the Voices of the Hungry project (VoH) and developed the Food Insecurity Experience Scale (FIES), conceived as a global adaptation of HFSSM and ELCSA [19]. The FIES is based on people’s responses to only eight dichotomous items (listed in Table 2) and, by means of an ad-hoc methodology grounded in Item Response Theory (IRT), and more specifically in the Rasch model, it is the first experience-based food insecurity measurement system that generates formally comparable measures of food insecurity across countries. As such, it is one of the official measurement tools for monitoring progress toward Target . of the SDGs, being the scale used to compute the related Indicator . . (Prevalence of food insecurity at moderate and severe levels based on FIES) [3, 4, 19].

Although the national and regional scales proved to be adequate tools for measuring and monitoring access to food within each country [10, 34], the need for global monitoring, such as that sought in the context of the SDGs, raised the issue of comparing results from applications of different scales in different countries [6]. In fact, despite sharing a common evolution, each national scale uses specific thresholds to measure prevalences of food insecurity for nominally
RM.weights [8], equate [1] and plink [35]. The remainder of the paper is organized as follows: Section presents the data and Section is devoted to describing the pillars and the methods of Test Equating; Section presents the main results; and Section concludes with some remarks and possible directions for future work.

As already mentioned, the FIES is strongly based on the ELCSA, which in turn represents a common ancestor for other scales in use in Latin America (EMSA, EBIA, etc.). As a consequence, all these scales largely share the same cognitive content of the items, which constitutes a promising ground on which to address comparability. Nevertheless, the FIES and the national scales show important differences. First of all, national scales measure food insecurity at the household level, while the FIES produces national measures of food insecurity at the individual level. Secondly, national scales have a reference period of months, while the FIES refers to the 12 months previous to the interview. Thirdly, and perhaps most importantly, national scales compute prevalences of food insecurity following a deterministic methodology based on raw scores (number of affirmative responses) and use discrete thresholds (expressed in terms of raw scores) for computing prevalences of food insecurity at different levels of severity. On the other hand, the VoH methodology for the FIES is probabilistic in nature in that it fits the Rasch model to the data, models access to food by means of a probability distribution and computes prevalences of food insecurity using thresholds on the continuous latent trait.

The survey modules on which ELCSA, EMSA and EBIA are built have strong similarities [10, 34]. They all account for the three domains of the access dimension of food insecurity discussed in the previous section, aim at measuring food insecurity at the household level and all adopt the same reference period of months previous to the day of the module administration.
As far as the methodology is concerned, ELCSA, EMSA and EBIA agree on a similar procedure that can be summarized in a few steps [10, 34]:
1. Computation of a raw score for each household, by counting the number of items affirmed by that household. Raw scores represent an ordinal measure of food insecurity: the higher the raw score, the more severe the level of food insecurity.
2. Computation of prevalences of food insecurity at three levels of severity: mild, moderate and severe. Prevalences are computed as percentages of households in the sample that scored within a certain range expressed in terms of raw scores, with different thresholds depending on whether children live in the household or not (Table 1).
3. Data validation. Homogeneity of the items comprising the scale is assessed by fitting the Rasch model to the data.

Moreover, each national scale makes use of two different versions of the survey module, distinguishing between households with children (i.e. people under the age of years) and households without children. The first group of survey modules is usually made up of to household-referenced items and, for the sake of simplicity, the scale obtained from this set of items will be referred to, in this work, as the Adult scale. The second one integrates the first one by adding from to extra children-referenced questions and the scale obtained from this set of items will be referred to as the Children scale. The two survey modules thus encompass a different number of items and, from each of them, a scale is built that uses different thresholds to compute prevalences of food insecurity that are meant to reflect the same level of severity. Prevalences derived from the two scales are then considered jointly in order to derive national prevalences of food insecurity.
Scale   Food insecurity level   Households without children   Households with children
ELCSA   mild                    1 to 3                        1 to 5
        moderate                4 to 6                        6 to 10
        severe                  7 to 8                        11 to 15
EMSA    mild                    1 to 2                        1 to 3
        moderate                3 to 4                        4 to 7
        severe                  5 to 6                        8 to 12
EBIA    mild                    1 to 3                        1 to 5
        moderate                4 to 6                        6 to 10
        severe                  7 to 8                        11 to 15

Table 1: Classifications of food insecurity using national scales (ELCSA, EMSA and EBIA) and corresponding ranges of the raw scores for households with and without children.
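The deterministic procedure summarized in Table 1 can be illustrated with a minimal sketch. The ranges below are copied from Table 1; the function names, the "food secure" label for a raw score of 0 (Table 1 lists no range for it) and the use of unweighted percentages are assumptions made only for illustration, not the national statistical offices' actual code.

```python
from collections import Counter

# Inclusive raw-score ranges copied from Table 1, by scale and household type.
THRESHOLDS = {
    "ELCSA": {
        "without_children": {"mild": (1, 3), "moderate": (4, 6), "severe": (7, 8)},
        "with_children": {"mild": (1, 5), "moderate": (6, 10), "severe": (11, 15)},
    },
    "EMSA": {
        "without_children": {"mild": (1, 2), "moderate": (3, 4), "severe": (5, 6)},
        "with_children": {"mild": (1, 3), "moderate": (4, 7), "severe": (8, 12)},
    },
    "EBIA": {
        "without_children": {"mild": (1, 3), "moderate": (4, 6), "severe": (7, 8)},
        "with_children": {"mild": (1, 5), "moderate": (6, 10), "severe": (11, 15)},
    },
}


def raw_score(responses):
    """Step 1: count the affirmed items (truthy entries mean 'yes')."""
    return sum(1 for r in responses if r)


def classify(scale, has_children, score):
    """Map a raw score to a severity level using the Table 1 ranges."""
    if score == 0:
        # Assumption: a household affirming no item is labelled food secure.
        return "food secure"
    key = "with_children" if has_children else "without_children"
    for level, (low, high) in THRESHOLDS[scale][key].items():
        if low <= score <= high:
            return level
    raise ValueError(f"raw score {score} is outside the {scale} ranges")


def prevalences(scale, households):
    """Step 2: unweighted percentage of households at each severity level.

    `households` is an iterable of (has_children, responses) pairs.
    """
    households = list(households)
    counts = Counter(
        classify(scale, has_children, raw_score(responses))
        for has_children, responses in households
    )
    n = len(households)
    return {lvl: 100.0 * counts[lvl] / n for lvl in ("mild", "moderate", "severe")}
```

For instance, `classify("EMSA", False, 3)` returns `"moderate"`, while the same raw score in a household with children falls in the `"mild"` range; a real application would also apply survey weights, which this sketch omits.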
It is worth highlighting that, as reported in Table 1, the thresholds used to compute categories of food insecurity that nominally reflect the same level of severity (mild, moderate or severe) are country (or region)-specific. As a matter of fact, these thresholds were chosen neither to ensure comparability among countries (no matter how geographically close to each other they might be) nor in light of clear statistical properties, but according to the opinions of experts from the nutrition and social sciences fields. The same consideration holds for the thresholds chosen for the household-referenced scale and the children-referenced scale within each national context. As a consequence, there is no clear guarantee that, for example, a raw score of truly reflects the same level of severity in Mexico and Brazil, or that, applying ELCSA in Guatemala, and can be considered as equivalent scores in households without and with children, respectively.

Inspired by Target . of the SDGs, the Voices of the Hungry (VoH) project of the Food and Agriculture Organization developed the Food Insecurity Experience Scale (FIES), designed to have cross-cultural equivalence and validity in both developing and developed countries, aiming at producing comparable prevalences of food insecurity at various levels of severity [19]. As reported in Table 2, the FIES Survey Module is made up of 8 dichotomous items accounting for the three domains of access to food. Since , the FIES Survey Module (FIES-SM) has been part of the Gallup World Poll (GWP) Survey, from Gallup Inc. [33], a survey that is repeated every year in over countries and administered to a sample of adult individuals (aged or more) representative of the national population. In practice, this has made it possible to reach countries that do not yet have a national measurement system for food insecurity.
In accordance with the characteristics of the GWP, the version of the FIES-SM considered here refers to a period of 12 months prior to the survey administration and investigates food insecurity at the level of adult individuals (people aged older than years), which represents a first difference between the FIES and the national scales.

During the last 12 months, was there a time when, because of lack of money or other resources:

Items                                                              Abbreviations
1. You were worried you would not have enough food to eat?        WORRIED
2. You were unable to eat healthy and nutritious food?            HEALTHY
3. You ate only a few kinds of foods?                             FEWFOOD
4. You had to skip a meal?                                        SKIPMEAL
5. You ate less than you thought you should?                      ATELESS
6. Your household ran out of food?                                RUNOUT
7. You were hungry but did not eat?                               HUNGRY
8. You went without eating for a whole day?                       WHLDAY

Table 2: FIES Survey Module (FIES-SM) for individuals and with a reference period of 12 months.

However, the main difference between the two is in the methodology used [19]. The VoH methodology developed for the FIES employs a probabilistic model not only as a validation tool (for assessing homogeneity of the items in the scale), but also for computing measurements of food insecurity. In fact, food insecurity is treated as a latent trait whose measurement is achieved by means of some “observables” (the answers to the items) and a probabilistic model that links the two. The Rasch model (also known as the one-parameter logistic model or 1PL model) is one of the simplest models that can serve this purpose while, at the same time, ensuring a set of favourable measurement properties [20, 31]. It was proposed in the context of educational testing, where the purpose is generally to score students based on a set of questions (items). According to this model, the probability that a respondent correctly answers the $j$-th item is modelled as a logistic function of the distance between two parameters, one representing the item’s severity ($b_j$) and one representing the respondent’s ability ($\theta$):

$$P_j(\theta) = P(X_j = 1 \mid \theta; b_j) = \frac{\exp(\theta - b_j)}{1 + \exp(\theta - b_j)}. \qquad (1)$$

The Rasch model provides a sound statistical framework to assess the suitability of a set of items for scale construction and to compare the performance of scales. Its basic assumptions are unidimensionality, local independence, monotonicity, equal discriminating power of the items and logistic shape of the Item Response Functions (IRFs). Moreover, it has several interesting properties to which it owes its success among social science measurement models, such as sufficiency of the raw score, independence between item and examinee parameters, and the invariance property [21].
In the context of food insecurity, the item’s severity can be interpreted as the severity of the restrictions in food access represented by each item, while the ability parameter is to be understood as the overall severity of the restrictions in accessing food that the respondent had to face (in light of her answers to the items in the survey module). From the point of view of Stevens’ classification of scales [32], this way of measuring food insecurity guarantees the construction of an interval scale, as opposed to the ordinal scale obtained from the methodology employed, for instance, for ELCSA, EMSA and EBIA, which is named deterministic as opposed to the probabilistic methodology developed for the FIES. Moreover, prevalences obtained by means of the FIES are guaranteed to be comparable across countries, thanks to the implementation of an equating step in which estimates of the model parameters obtained in a single application of the scale are adjusted to the FIES Global Standard scale, a set of item parameters serving as a reference metric and based on application of the FIES in all countries that were covered by the GWP survey in , and [19] (Fig. 1). Finally, each respondent is assigned a probability distribution of his/her food insecurity along the latent trait, depending on his/her raw score. This distribution is Gaussian with mean equal to the adjusted (to the Global Standard) respondent parameter and standard deviation equal to the adjusted measurement error for that raw score. As a last step, this mixture of distributions is used to compute the percentage of the population whose severity is beyond a fixed threshold on the latent trait, calculated as a weighted sum across raw scores, with weights reflecting the proportions of raw scores in the sample (Figure 2). While it is theoretically possible to compute the percentage of the population beyond each and every value on the continuum, the VoH methodology suggests the computation of two prevalence rates corresponding to thresholds on the Global Standard metric set at the severity of items ATELESS ( − . ) and WHLDAY ( . ) (Fig. 1). The resulting indicators of food insecurity take the name of Prevalence of Experienced Food Insecurity at moderate or severe levels ($FI_{Mod+Sev}$) and Prevalence of Experienced Food Insecurity at severe levels ($FI_{Sev}$), respectively. However, in order for these quantities to be valid and reliable measurements of food insecurity, a validation step must be undertaken in each and every application of the scale. This is commonly performed by computing goodness-of-fit statistics of the Rasch model (e.g. Infit, Outfit and Rasch reliability statistics) that assess the good behaviour of the items and by performing a Principal Component Analysis (PCA) on the residuals to investigate the existence of a second latent trait. For more details on the usage of the Rasch model as a measuring tool for food insecurity we refer the reader to [27], while for more insights into the VoH methodology for the FIES we refer to [19].
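As an illustration of the two probabilistic ingredients described above, the following sketch evaluates the Rasch item response function of Eq. (1) and the prevalence beyond a threshold as a weighted sum of Gaussian tail probabilities across raw scores. It is not the VoH implementation: the function names and arguments are hypothetical, and the raw-score shares, adjusted severities and measurement errors are assumed to have been estimated elsewhere.

```python
import math


def rasch_probability(theta, b):
    """Eq. (1): probability of affirming an item of severity b at severity theta."""
    z = theta - b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # algebraically identical form, avoids overflow for very negative z
    return e / (1.0 + e)


def normal_sf(x, mean, sd):
    """P(Z > x) for Z ~ N(mean, sd^2)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def prevalence_beyond(threshold, shares, severities, errors):
    """Share of the population whose severity lies beyond `threshold`.

    shares[r]     : proportion of respondents with raw score r (weights)
    severities[r] : adjusted respondent parameter for raw score r (Gaussian mean)
    errors[r]     : adjusted measurement error for raw score r (Gaussian sd)
    """
    return sum(
        w * normal_sf(threshold, m, s)
        for w, m, s in zip(shares, severities, errors)
    )
```

Calling `prevalence_beyond` with the threshold set at the Global Standard severity of ATELESS or of WHLDAY would correspond to the two indicators named above; any numbers passed to it in an experiment are the user's own estimates, not values taken from this paper.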
Figure 1: The FIES Global Standard.

Figure 2: Distributions of severity of food insecurity among respondents according to their raw scores.

Data
Data for Guatemala were collected in the Encuesta Nacional de Condiciones de Vida (ENCOVI) conducted by the Instituto Nacional de Estadística (INE) in and the sample used included households. Data for Ecuador were collected in the Encuesta Nacional de Empleo y Desempleo (ENCOVI) conducted by the Instituto Nacional de Estadísticas y Censos (INEC) in and the sample used included households. Data for Mexico were collected in the Encuesta Nacional de Ingresos y Gastos de los Hogares (ENIGH) conducted by the Instituto Nacional de Estadística y Geografía (INEGI) in and the sample used included households. Data for Brazil were collected in the Pesquisa Nacional por Amostra de Domicílios (PNAD) conducted by the Instituto Brasileiro de Geografia e Estatística (IBGE) in and the sample used included households. All samples were representative of the corresponding national populations.
Scores deriving from tests usually have an important role in the decision-making process that leads to excluding some candidates for a job or scholarship position, or to adopting specific public policy strategies in order to take action on an issue of public relevance. Evidently, this requires tests to be administered on multiple occasions, as is the case for college admission tests that are held on specific test dates during the year. Therefore, a crucial consideration arises: if the same questions were included in the tests, students who had already taken the test would have an advantage and the test would measure the students’ degree of exposure to past tests rather than their ability in some specific subject. At the same time, it is important that all students take the “same” test, in order to fairly compare performances and make decisions accordingly. This issue is commonly addressed by administering on every test date a different version of the same test, called a test form, that is built according to certain content and statistical test specifications. Nonetheless, minor differences might still occur among different test forms, with one turning out slightly more difficult than the others. Therefore, in order to fairly score students who took different test forms and establish whether a poorer performance is due to a less skillful respondent and not to a more difficult test, a procedure is needed to make tests comparable. This procedure is called equating and it is formally defined as the statistical process that is used to adjust for differences in difficulty between test forms built to be similar in content and difficulty, so that scores can be used interchangeably [13, 24]. Every test equating should meet some fundamental equating requirements and needs the specification of both a data collection design and of one or more methods to estimate an equating function. All these aspects will be discussed in the remainder of this section.
Equating scores on two test forms X and Y must meet some requirements that ensure that the equating is meaningful and useful (i.e. equated scores can be used interchangeably). The following five requirements are widely considered of primary importance for an equating to be run, although they are better regarded as general guidelines than as easily verifiable conditions:
• Equal Construct Requirement: Tests that measure different constructs should not be equated.
• Equal Reliability Requirement: Tests that measure the same construct but differ in reliability should not be equated.
• Symmetry Requirement: The equating function that equates scores on X to scores on Y should be the inverse of the equating function that equates scores on Y to scores on X.
• Equity Requirement: For the examinee, it should be a matter of indifference which test is used.
• Population Invariance Requirement: The equating function used to equate scores on X and scores on Y should be population invariant, in that the choice of a specific sub-population used to compute the equating function should not matter.

It might be the case that the two tests to be equated do not satisfy all five requirements. For example, they could differ in length and statistical specifications, with consequences for the “Equal Reliability requirement” and the “Equity requirement”. In fact, a longer test would in general be more reliable and a less skilled examinee would have a better chance of scoring higher if administered the shorter test. The aforementioned requirements ensure that scores derived from tests that do meet all of them can be used interchangeably while, if they do not all strictly hold, the exercise is better described as a weaker analysis of comparability named linking [13, 24].

There are basically two ways in which data collection designs can account for differences in the difficulty of two or more test forms in test equating, namely the use of “common examinees” or the use of “common items” [13, 24]. In the first case the same group of examinees (or two random samples of examinees from the same target population) takes both tests. In this case, any difference in the scores is attributable to differences in the test forms. Examples of this category are the “Single-Group” (SG) and the “Equivalent-Groups” (EG) designs. In the second case, a set $\mathcal{A}$ of common items, called the anchor test, is included in both test forms in order to account for such differences. Therefore, any difference between scores on the anchor test is due to differences among examinees. Data designs that use this method are called “Non-Equivalent groups with Anchor Test” (NEAT) designs.

Several equating methods have been proposed and applied to equate observed scores on equatable tests.
In this section an overview of the most common and popular methods is provided, starting from the observed-score methods of mean, linear and equipercentile equating and ending with true score equating in the context of IRT. All these methods have been implemented for the two comparability studies between experience-based food insecurity scales that are presented in this work.
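Before the formal definitions in the paragraphs that follow, the three observed-score methods just listed can be previewed with a minimal, self-contained sketch. It is not the code used for the studies in this paper: the function names are hypothetical, scores are assumed to be the integers 0, ..., K with known probabilities, and the equipercentile function relies on the common midpoint (uniform) continuization of discrete scores, which is an assumption added here because the definition given later is stated for continuous variables.

```python
import math


def moments(probs):
    """Mean and standard deviation of a score with P(score = k) = probs[k]."""
    mu = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mu) ** 2 * p for k, p in enumerate(probs))
    return mu, math.sqrt(var)


def mean_equate(x, mu_x, mu_y):
    """Scores are equivalent when equally distant from their means."""
    return x - mu_x + mu_y


def linear_equate(x, mu_x, sd_x, mu_y, sd_y):
    """Scores are equivalent when their z-scores coincide."""
    return (sd_y / sd_x) * (x - mu_x) + mu_y


def percentile_rank(x, probs):
    """Proportion of scores below x plus half of those exactly at x (assumed convention)."""
    return sum(probs[:x]) + 0.5 * probs[x]


def equipercentile_equate(x, probs_x, probs_y):
    """Form Y score with the same percentile rank as score x on Form X."""
    p = percentile_rank(x, probs_x)
    cum = 0.0
    for y, g in enumerate(probs_y):
        if cum + g > p:
            # Interpolate linearly inside the interval [y - 0.5, y + 0.5).
            return (p - cum) / g + (y - 0.5)
        cum += g
    return len(probs_y) - 1 + 0.5  # p reached the top of the Form Y scale
```

When the two forms have identical score distributions, `equipercentile_equate` returns the input score unchanged, which is a convenient sanity check; with different distributions it generally returns non-integer equivalents.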
Let X and Y be two tests (or two forms of the same test) scored correct/incorrect (1/0).Scores on test X and Y will be denoted as random variables X and Y with possible values,respectively x k , ( k = 0 , . . . , K ) and y l (for l = 0 , . . . , L ), where K and L are the lengths oftests X and Y , respectively. We denote the score probabilities of X and Y by r k = P ( X = x k ) and s l = P ( Y = y l ) . (2)The cdfs of X and Y are denoted by F ( x ) = P ( X ≤ x ) and G ( x ) = P ( Y ≤ y ) (3)and the moments are, respectively µ X = E ( X ) , µ Y = E ( Y ) (4)and σ X = SD ( X ) , σ Y = SD ( Y ) (5) Mean equating
In mean equating, test form X is assumed to differ from test form Y bya constant amount along the scale. For example, if form X is 2 points easier than form Y forlow-ranking examinees, the same will hold for high-ranking examinees. In mean equating,two scores on different forms are considered equivalent (and set equal) if they are the same(signed) distance from their respective means, that is x − µ X = y − µ Y . (6)Then, solving for y , the score on test Y that is equivalent to a score x on test X , and called m Y ( x ) , is m Y ( x ) = y = x − µ X + µ Y . (7)Clearly, mean equating allows for the means to differ in the two test forms. Linear equating
In linear equating, difference in difficulty between the two tests is notconstraint to remain constant but can vary along the score scale. In this equating method,scores are considered equivalent and set equal if they are an equal (signed) distance from theirmeans in standard deviation units, that is the two standardized deviation scores (z-scores)on the two forms are set equal x − µ X σ X = y − µ Y σ Y (8)11rom which the score on test Y equivalent to a score x on test X , and that is called l Y ( x ) , is l Y ( x ) = y = σ Y σ X x + (cid:20) µ Y − σ Y σ X µ X (cid:21) . (9)where σ Y σ X can be recognized as the slope and µ Y − σ Y σ X µ X as the the intercept of the linearequating transformation. Linear equating allows for both means and scale units to differ inthe two test forms. Equipercentile equating
In the equipercentile equating method, a curve is used to describe differences between scores on the two forms. The basic criterion for the equipercentile equating transformation is that the distribution of the scores on Form $X$ converted to the Form $Y$ scale is equal to the distribution of scores on Form $Y$. Scores on the two forms are considered, and set, equivalent if they have the same percentile rank. We adopt here the definition of the equipercentile equating function given by Braun and Holland in [5]. Consider the random variables $X$ and $Y$ representing the scores on Forms $X$ and $Y$, and let $F$ and $G$ be their cumulative distribution functions. We call $e_Y$ the symmetric equating function converting Form $X$ scores into scores on the Form $Y$ scale, and $G^{\star}$ the cumulative distribution function of $e_Y(X)$, that is, the cdf of the scores on Form $X$ converted to the Form $Y$ scale. The function $e_Y$ is the equipercentile equating function if $G^{\star} = G$. According to the definition of Braun and Holland, if $X$ and $Y$ are continuous random variables, then

$$e_Y(x) = G^{-1}[F(x)] \tag{10}$$

is an equipercentile equating function, where $G^{-1}$ is the inverse of $G$. This definition meets the “Symmetric requirement” and, given a Form $X$ score, its equivalent on the Form $Y$ scale is defined as the score having the same percentage of examinees at or below it.

Equating different forms of the same test using IRT True Score equating (IRT-TS) is a three-step process [12]:

1. Estimation: Fit an IRT model to the data for both tests.
This step consists in assessing the goodness-of-fit of a specific IRT model and estimating item parameters for both forms. In the case of the Rasch model, in light of the sufficient statistics property, estimates of the item severities do not depend on the group of examinees, and therefore IRT-TS equating based on the Rasch model can be claimed to meet the “Population Invariance requirement”, since it produces results that are sample-independent.

2. Linking: Put parameter estimates on a common metric through a linear transformation based on a set A of common items.
In this second step, a linear transformation is used to bring parameter estimates to a common IRT scale. In fact, “if an IRT model fits the data, then any linear transformation of the $\theta$-scale also fits the data, provided that the item parameters are transformed as well” [24]. Consider Form $X$, made up of $J$ dichotomously scored items administered to $N$ examinees, and assume that the Rasch model fits the data. Then, if $P$ and $Q$ are Rasch scales that differ by a linear transformation, the item severities $b_j$, $j \in \{1, \dots, J\}$, are related as follows

$$b_{jQ} = A\, b_{jP} + B, \qquad j \in \{1, \dots, J\},$$

and the same relationship holds for the ability parameters. A useful way to express the constants $A$ and $B$ is through the mean and standard deviation of the item parameters on both scales:

$$A = \frac{\sigma(b_Q)}{\sigma(b_P)}, \qquad B = \mu(b_Q) - A\,\mu(b_P).$$

In equating two different forms of the same test with a set of common items administered to non-equivalent groups, it is possible to exploit this linear relationship through the so-called Mean/Sigma transformation method [26], which uses the means and standard deviations of the item parameter estimates of only those items in the anchor test. More specifically, given Form $X$ and Form $Y$ with a set A of common items, the estimates of the difficulty parameters for the items in the set A in the two calibrations are linked via a linear transformation and used to compute the coefficients $A$ and $B$ of this transformation. Once the transformation is estimated, it can be applied to transform the ability parameters on one Form to the corresponding parameters on the other Form, thus enabling comparability between the two Forms.

3. Equating: Obtain equivalent expected raw scores through the Test Characteristic Curves (TCC) of the two tests.
Once the metrics of the two Forms are put on the same scale (which can be either the scale of one of them or a third scale), it is finally possible to compare the performance of examinees taking the two Forms. However, as often happens with standardized tests, reported scores may be expressed in terms of raw scores and, if this is the case, a further step is needed. Within the framework of IRT, it is possible to mathematically relate ability estimates to specific true scores on each test form. The IRT True Score equating method computes equivalent true scores on the two forms and considers them, as is common in the practice of equating, as equivalent observed scores [25]. Given Form $X$ and Form $Y$, two test forms measuring the same ability $\theta$ with $n_X$ and $n_Y$ items respectively, and given that both item and ability estimates are on the same scale through a linear transformation, the estimated true scores on the two forms are related to $\theta$ by the so-called Test Characteristic Curve as follows

$$T_X = \sum_{j=1}^{n_X} \hat{P}_j(\theta), \qquad T_Y = \sum_{i=1}^{n_Y} \hat{P}_i(\theta),$$

where $T_X$ (respectively $T_Y$) is the estimated true score for Test $X$ ($Y$) and $\hat{P}_j(\theta)$ ($\hat{P}_i(\theta)$) is the estimated probability function for item $j$ ($i$) (Fig. 3). Through the Test Characteristic Curve, an ability $\theta$ can thus be transformed into an estimated true score on the test form and, provided the ability parameters on the two forms are put on the same metric, true scores corresponding to the same $\theta$ are considered equivalent.

Figure 3: Test Characteristic Curve (TCC) for a test of three items (left) and a pictorial description of the IRT True Score (IRT-TS) equating method (right).
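The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the bisection used to invert the TCC, and the severities in the example are all hypothetical. One further assumption is made explicit in the code: after rescaling Form Y severities by $A$, the logistic slope of the linked items is set to $1/A$ so that response probabilities are unchanged by the linear transformation; with $A = 1$ this reduces to the plain Rasch model.

```python
import math

def rasch_prob(theta, b, scale=1.0):
    """Probability of an affirmative answer; scale=1 is the plain Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b) / scale))

def mean_sigma(b_from, b_to):
    """Mean/Sigma coefficients (A, B) with b_to ~ A * b_from + B on the common items."""
    n = len(b_from)
    mu_f, mu_t = sum(b_from) / n, sum(b_to) / n
    sd_f = math.sqrt(sum((b - mu_f) ** 2 for b in b_from) / n)
    sd_t = math.sqrt(sum((b - mu_t) ** 2 for b in b_to) / n)
    A = sd_t / sd_f
    return A, mu_t - A * mu_f

def tcc(theta, severities, scale=1.0):
    """Test Characteristic Curve: expected raw score at ability theta."""
    return sum(rasch_prob(theta, b, scale) for b in severities)

def theta_for_score(score, severities, scale=1.0, lo=-30.0, hi=30.0, tol=1e-10):
    """Invert the increasing TCC by bisection; needs 0 < score < number of items."""
    if not 0 < score < len(severities):
        raise ValueError("true score must lie strictly between 0 and the number of items")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, severities, scale) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def irt_ts_equate(score_x, b_x, b_y_own, anchor_x, anchor_y):
    """Form Y true score equivalent to a Form X true score.

    anchor_x / anchor_y are the positions of the common items in each form.
    """
    # Step 2 (Linking): bring Form Y severities onto the Form X metric.
    A, B = mean_sigma([b_y_own[i] for i in anchor_y], [b_x[i] for i in anchor_x])
    b_y_linked = [A * b + B for b in b_y_own]
    # Step 3 (Equating): same theta, read off both Test Characteristic Curves.
    theta = theta_for_score(score_x, b_x)
    return tcc(theta, b_y_linked, scale=A)  # assumption: slope 1/A after rescaling

# Hypothetical severities, for illustration only (step 1, Estimation, is taken as done).
b_x = [-2.0, -1.0, 0.0, 1.0, 2.5]
b_y = [b - 0.7 for b in b_x]          # same items, expressed on a metric shifted by 0.7
common = list(range(5))
equated = irt_ts_equate(3.0, b_x, b_y, common, common)
print(round(equated, 4))
```

In this toy case the two forms contain the same items on shifted metrics, so the linking step undoes the shift and a true score on Form X is mapped back to essentially the same value on Form Y. With forms of different length or content, the returned value would generally be a non-integer expected raw score on Form Y.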
As a preliminary step to both equating studies, the Rasch model was fitted to all eight datasets (an Adult scale and a Children scale in each of the four countries), and a validation step was performed to confirm the good behaviour of the scales. An overall good fit of the model was observed in all eight applications. The assumption of equal discrimination of the items was met, with item Infit statistics entirely in the range of (0 . , . ), confirming the strength and consistency of the association of each item with the underlying latent trait (compare [19, 27]). Moreover, Outfit statistics were never high enough to signal misbehaviour due to highly unexpected response patterns, supporting the good performance of the items. The assumptions of conditional independence and unidimensionality of the items were assessed by computing conditional correlations between each pair of items and submitting the correlation matrix to principal component factor analysis (PCA). All pairwise residual correlations were, in absolute value, smaller than . , thus confirming that all correlations among items result from their common association with the latent trait. PCA performed on the matrix of residual correlations showed the presence of only one main dimension which, given the cognitive content of the items, can be recognized as the food access dimension that the scales aim to measure. Finally, overall model fit was assessed by Rasch reliability statistics (the proportion of total variation in true severity in the sample that is accounted for by the model), ranging between . (Mexico) and . (Guatemala) for the Adult scale and between . (Mexico) and . (Guatemala) for the Children scale, confirming a good overall discriminatory power for all scales. Sporadic departures from this behaviour were found only for one or two items in the Children scale (such as a residual correlation of . between two items of the ELCSA in Guatemala), which however never compromised the good performance of the overall scale.

The aim of this comparability study is to find raw scores on the national scales EBIA, EMSA and ELCSA that can be considered equivalent to the FIES global thresholds on the continuum used to compute the two indicators $FI_{Mod+Sev}$ and $FI_{Sev}$, namely − . and . . However, it is worth noting that, since the VoH methodology uses thresholds on the continuum while the national scales' methodology uses discrete thresholds, the equivalent raw score will almost never produce exactly the same prevalence obtained with the VoH thresholds.

As it is currently set up and implemented, the FIES Module refers to adults (people aged or above). Therefore, in order to meet the “Equal Construct requirement”, the modules of the national scales administered to households without children have been considered here. Technically, the FIES Survey Module and the survey modules of the national scales (households without children) thus play the role of forms of the same test that are to be equated. This was ultimately made possible by the common history that led to the development of these scales (i.e. FIES, ELCSA, EMSA, EBIA), which ensures that, despite some differences such as the level of the measurement and the reference time (see Section 2), the survey modules used to collect data are very similar and share the same dichotomous structure (possible answers are “Yes/No”).

This first study was carried out by implementing the following methods:

1. IRT True Score (IRT-TS) equating.
2. Linking via a linear transformation applied to ability parameters.
3. Minimization of the difference between prevalences of food insecurity.

The IRT-TS equating method was implemented in the context of the NEAT equating design. In this work, the set A of common items was determined by an iterative procedure that starts with all items considered common (apart from those classified as unique a priori) and then discards one item at a time, beginning with the one that exceeds the tolerance threshold of . by the largest amount. The algorithm ends when a set A of items all within this threshold is found. Item WHLDAY was considered unique a priori in all four equating analyses because of its different cognitive content in the scales considered: it is more severe in the FIES, where it refers to “not eating for a whole day”, and less severe in the national scales, where it reports on members of the household who either ate only once or went without eating for a whole day. The Standard Error of Equating (SEE) for the IRT True Score equating method was estimated using bootstrap replications [24]. The second and third methods can be considered either variations of the IRT-TS method or techniques that appear particularly reasonable in the present context. They were explored for investigation purposes and the scores obtained are not claimed to be “equivalent”, but rather “corresponding” scores. In fact, the second method (Linking) consists in taking the linear transformation obtained in the second step of the IRT-TS method and applying it to the estimated ability parameters of the Rasch model. Once the ability parameters are adjusted to the Global Standard metric, the raw scores whose ability parameters are closest to the two VoH thresholds are considered the corresponding raw scores.
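The iterative selection of the common-item set A described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how the distance from the tolerance is measured, so the sketch assumes it is the absolute difference between an item's severity after Mean/Sigma linking and its severity on the reference metric, with the linking recomputed at every iteration. The function names, the item labels `item1` to `item5`, the severities and the tolerance value are all hypothetical.

```python
import math

def mean_sigma(b_from, b_to):
    """Mean/Sigma coefficients (A, B) with b_to ~ A * b_from + B."""
    n = len(b_from)
    mu_f, mu_t = sum(b_from) / n, sum(b_to) / n
    sd_f = math.sqrt(sum((b - mu_f) ** 2 for b in b_from) / n)
    sd_t = math.sqrt(sum((b - mu_t) ** 2 for b in b_to) / n)
    A = sd_t / sd_f
    return A, mu_t - A * mu_f

def select_common_items(b_scale, b_reference, tolerance, unique_a_priori=()):
    """Drop the worst-fitting item one at a time until all gaps are within tolerance.

    b_scale, b_reference: dicts mapping item name -> estimated severity.
    Returns (list of common item names, A, B).
    """
    common = [k for k in b_scale if k in b_reference and k not in unique_a_priori]
    while len(common) >= 2:
        A, B = mean_sigma([b_scale[k] for k in common],
                          [b_reference[k] for k in common])
        # Assumed discrepancy: |linked severity - reference severity| per item.
        gaps = {k: abs(A * b_scale[k] + B - b_reference[k]) for k in common}
        worst = max(common, key=gaps.get)
        if gaps[worst] <= tolerance:
            return common, A, B
        common.remove(worst)          # discard only the single worst item, then re-link
    raise ValueError("fewer than two common items remain")

# Made-up severities: item3 behaves differently on the two scales,
# and WHLDAY is declared unique before the iteration starts.
reference = {"item1": -2.0, "item2": -1.0, "item3": 0.0,
             "item4": 1.0, "item5": 2.0, "WHLDAY": 3.0}
national = {"item1": -2.0, "item2": -1.0, "item3": 1.5,
            "item4": 1.0, "item5": 2.0, "WHLDAY": 1.0}
common, A, B = select_common_items(national, reference, tolerance=0.5,
                                   unique_a_priori=("WHLDAY",))
print(common, round(A, 3), round(B, 3))
```

Removing one item at a time matters in this toy example: in the first pass the misbehaving item distorts the linking enough that other items also look discrepant, but once it is dropped and the transformation is re-estimated, the remaining items all fall within the tolerance.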
On the other hand, the third method (Minimizing) consists in computing prevalences of food insecurity at the household level by applying the FIES methodology to the data used for the national scales, and comparing the prevalences so obtained with the percentages of the population scoring at or above a given raw score. The two raw scores that realize the minimum distance from the two VoH global thresholds (in terms of prevalences) are considered the corresponding raw scores according to this method.

Results from the first comparability study are summarized in Table 3 and Table 4, which report the raw scores on the national scales that are computed as equivalent to the VoH global thresholds used for the indicators $FI_{Mod+Sev}$ and $FI_{Sev}$, respectively. Table 3 shows that the threshold used for computing $FI_{Mod+Sev}$, corresponding to the severity of item ATELESS on the Global Standard metric (i.e. − . ), might reflect a less severe condition of food insecurity than the one measured by the national scales for the moderate category of food insecurity. In fact, all the equated raw scores are either equal to or around one point lower than the thresholds currently used by ELCSA, EMSA and EBIA for this category of food insecurity. On the contrary, the threshold used for $FI_{Sev}$, corresponding to the severity of item WHLDAY on the Global Standard metric (i.e. . ), generally reflects a more severe condition of food insecurity than the one captured by the national scales for the severe level of food insecurity: Table 4 reports equated raw scores that are either equal to or one point higher than the national thresholds currently in use for this category.

| Food Insecurity Scales | Internal Monitoring | IRT-TS Rasch (SEE) | Linking | Min. Diff. |
|---|---|---|---|---|
| ELCSA (Guatemala) | 4 | 3.3 (0.19) | 3 | 4 |
| ELCSA (Ecuador) | 4 | 4.2 (0.14) | 4 | 4 |
| EMSA (Mexico) | 3 | 2.0 (0.23) | 2 | 2 |
| EBIA (Brazil) | 4 | 4.0 (0.09) | 4 | 5 |

Table 3: Equated raw scores on the national scales corresponding to the VoH threshold for $FI_{Mod+Sev}$ ( − . on the Global Standard).

| Food Insecurity Scales | Internal Monitoring | IRT-TS Rasch (SEE) | Linking | Min. Diff. |
|---|---|---|---|---|
| ELCSA (Guatemala) | 7 | 7.8 (0.18) | 8 | 8 |
| ELCSA (Ecuador) | 7 | 7.1 (0.18) | 7 | 8 |
| EMSA (Mexico) | 5 | 6.0 (0.26) | 6 | 6 |
| EBIA (Brazil) | 6 | 7.9 (0.07) | 8 | 8 |

Table 4: Equated raw scores on the national scales corresponding to the VoH threshold for $FI_{Sev}$ ( . on the Global Standard).

This second analysis aims at comparing the Adult and Children scales within each national context. To this purpose, we implemented the Single Group (SG) data collection design by considering the scores obtained by the households with children on both survey modules (the one containing only adult- and household-referenced questions and the one also containing children-referenced questions). This equating design is usually not easy to implement, since it requires the same group of respondents to be administered two different forms of the same test, resulting in an expensive and time-consuming procedure. However, we could here exploit the fact that the survey module for the Children scale is simply an extended version of the module for the Adult scale (see Section 2) and, as such, we can “imagine” administering the adult-referenced survey to households with children just by dropping the children-referenced items. With regard to the equating requirements (see Section 4.1), it is worth noting that the two survey modules have different lengths and, as such, the resulting scales could have different reliability which, in turn, could potentially challenge the “Equal Reliability Requirement”. Equating of the Adult and Children scales in the four countries was carried out with four equating methods: IRT True Score equating with the Rasch model, and the Mean, Linear and Equipercentile equating methods [24]. The first method is IRT-based while the other three are classical methods of equating, which do not rely on any model fitted to the data but only on the observed raw scores.

Tables 5 and 6 show raw scores on the Children scale that are computed as equivalent to the raw scores on the Adult scale used as lower thresholds for the moderate and severe categories of food insecurity. Results suggest that the national thresholds currently used for nominally the same levels of severity could reflect different degrees of severity in access to food.
This is particularly evident when looking at the most severe category of food insecurity, where raw scores on the Children scale that are computed as equivalent to the lower thresholds on the Adult scale are around one point higher than the thresholds currently in use for households with children for ELCSA in Guatemala and EMSA in Mexico, and between one and two points lower for EBIA (Tables 5 and 6, column “Severe”). On the other hand, the corresponding raw scores for moderate food insecurity mainly align with the thresholds currently in use for this category (Tables 5 and 6, column “Moderate”). Interestingly, minor differences emerge between the behaviour of the equated scores for ELCSA in Guatemala and Ecuador, possibly due to specific features of the phenomenon in the two countries, confirming the importance of an equating analysis even between different applications of the same scale. Finally, it is noteworthy that, among all implemented methods, the Equipercentile equating method is the one whose results most closely resemble the currently adopted thresholds.

| Equating Method | ELCSA (Guatemala): Moderate (SEE) | ELCSA (Guatemala): Severe (SEE) | ELCSA (Ecuador): Moderate (SEE) | ELCSA (Ecuador): Severe (SEE) |
|---|---|---|---|---|
| IRT-TS | 6.2 (0.09) | 12.1 (0.10) | 5.8 (0.09) | 12.1 (0.11) |
| Mean | 6.6 (0.07) | 12.2 (0.07) | 6.4 (0.05) | 11.6 (0.05) |
| Linear | 6.5 (0.07) | 11.7 (0.11) | 6.2 (0.07) | 11.0 (0.10) |
| Equip | 6.3 (0.09) | 11.3 (0.15) | 5.8 (0.13) | 11.2 (0.16) |

Table 5: Raw scores on the Children scale corresponding to raw scores and on the Adult scale (lower thresholds for moderate and severe food insecurity, respectively) and related Standard Error of Equating (SEE) computed by means of the IRT-TS, Mean, Linear and Equipercentile equating methods. Left: ELCSA (Guatemala). Right: ELCSA (Ecuador).

| Equating Method | EMSA (Mexico): Moderate (SEE) | EMSA (Mexico): Severe (SEE) | EBIA (Brazil): Moderate (SEE) | EBIA (Brazil): Severe (SEE) |
|---|---|---|---|---|
| IRT-TS | 4.8 (0.12) | 8.7 (0.13) | 4.8 (0.08) | 8.7 (0.09) |
| Mean | 5.5 (0.05) | 9.5 (0.05) | 5.5 (0.04) | 9.0 (0.04) |
| Linear | 5.1 (0.07) | 8.6 (0.10) | 5.5 (0.04) | 8.8 (0.07) |
| Equip | 4.8 (0.13) | 8.1 (0.14) | 4.7 (0.05) | 8.3 (0.14) |

Table 6: Left: Raw scores on the Children scale corresponding to raw scores and on the Adult scale (lower thresholds for moderate and severe food insecurity, respectively) and related Standard Error of Equating (SEE) computed by means of the IRT-TS, Mean, Linear and Equipercentile equating methods, EMSA (Mexico). Right: Raw scores on the Children scale corresponding to raw scores and on the Adult scale (lower thresholds for moderate and severe food insecurity, respectively) and related Standard Error of Equating (SEE) computed by means of the IRT-TS, Mean, Linear and Equipercentile equating methods, EBIA (Brazil).

The present work presented two studies investigating comparability between experiential scales of food insecurity. The first study addressed comparability between the FIES and three national scales (ELCSA, EMSA and EBIA) in Guatemala, Ecuador, Mexico and Brazil. Results show that, in general, the VoH threshold used for computing the indicator Prevalence of Experienced Food Insecurity at moderate or severe levels ($FI_{Mod+Sev}$), corresponding to the severity of item ATELESS on the Global Standard ( − . ), seems to reflect a less severe level of food insecurity than that described by the thresholds used by the national scales (expressed in terms of raw scores) for the same level of severity. On the other hand, the VoH threshold used for computing the indicator Prevalence of Experienced Food Insecurity at severe levels ($FI_{Sev}$), corresponding to the severity of item WHLDAY on the Global Standard ( . ), seems to reflect a more severe condition than the one measured through the national thresholds for this level of severity. The relevance of such a result for the practice of food insecurity measurement is self-evident. In fact, the possibility of comparing prevalences of food insecurity derived from different scales represents an important step towards a more reliable monitoring of global progress towards the goal of food security for all people worldwide (as expressed by Target . of the Sustainable Development Goals) and, as such, it is expected to gain increasing attention from practitioners and decision makers in the field.

Additionally, a second study investigated the issue of comparability between food insecurity scales referred to households without children (Adult scale) and households with children (Children scale), within each national context. Results show that the national thresholds currently used to compute prevalences of food insecurity at nominally the same level of severity among households with and without children might not always represent the same degree of restriction on food access. This seems especially evident for the most severe category of food insecurity, where current thresholds on the Children scale are lower than those computed as equivalent to the thresholds used on the Adult scale for this category. Future studies are expected to shed light on possible reasons and additional aspects of this topic. A more detailed characterization of the phenomenon of food insecurity across countries, as well as in households with and without children (especially from a social and economic point of view), will likely better motivate and clarify the distinctive behaviour of different food insecurity scales.
Furthermore, similar analyses conducted on other experiential scales of food insecurity might contribute to a deeper knowledge of the phenomenon. Significant examples are the HFSSM in North America, as well as applications of ELCSA in Latin American countries beyond the ones considered here.

References

[1] Anthony D Albano et al. equate: An R package for observed-score linking and equating. Journal of Statistical Software, 74(8):1–36, 2016.
[2] United Nations. General Assembly. Universal declaration of human rights, volume 3381. Department of State, United States of America, 1949.
[3] Terri J Ballard, Anne W Kepple, and Carlo Cafiero. The food insecurity experience scale: development of a global standard for monitoring hunger worldwide. Rome, Italy. FAO, 2013.
[4] Terri J Ballard, Anne W Kepple, Carlo Cafiero, and Josef Schmidhuber. Better measurement of food insecurity in the context of enhancing nutrition. Ernahrungs Umschau, 61(2):38–41, 2014.
[5] Henry I Braun. Observed-score test equating: A mathematical analysis of some ETS equating procedures. Test equating, 1982.
[6] Carlo Cafiero. What do we really know about food security? Technical report, National Bureau of Economic Research, 2013.
[7] Carlo Cafiero, Hugo R Melgar-Quinonez, Terri J Ballard, and Anne W Kepple. Validity and reliability of food security measures. Annals of the New York Academy of Sciences, 1331(1):230–248, 2014.
[8] Carlo Cafiero, Sara Viviani, and Mark Nord. RM.weights: Weighted Rasch modeling and extensions using conditional maximum likelihood. https://CRAN.R-project.org/package=RM.weights, 2018.
[9] Steven J Carlson et al. Measuring food insecurity and hunger in the United States. The Journal of nutrition, 129(2):510S–516S, 1999.
[10] Comitato Cientifico de la ELCSA. Escala Latinoamericana y Caribeña de Seguridad Alimentaria (ELCSA): Manual de uso y aplicaciones. Rome, Italy. FAO, 2012.
[11] Jennifer Coates, Edward A Frongillo, Beatrice Lorge Rogers, Patrick Webb, Parke E Wilde, and Robert Houser. Commonalities in the experience of household food insecurity across cultures: what are measures missing? The Journal of nutrition, 136(5):1438S–1448S, 2006.
[12] Linda L Cook and Daniel R Eignor. IRT equating methods. Educational measurement: Issues and practice, 10(3):37–45, 1991.
[13] Neil J Dorans, Mary Pommerich, and Paul W Holland. Linking and aligning scores and scales. Springer Science & Business Media, 2007.
[14] FAO. 1946. World Food Survey. Washington, DC. FAO, 1946.
[15] FAO. 1952. World Food Survey. Rome, Italy. FAO, 1952.
[16] FAO. 1963. Third World Food Survey. Freedom from Hunger Campaign Basic Study no. 11. Rome, Italy. FAO, 1963.
[17] FAO. 1996. World food summit. Rome, Italy. FAO, 1996.
[18] FAO. 2001. The state of food insecurity in the world 2001. Rome, Italy. FAO, 2001.
[19] FAO. 2016. Methods for estimating comparable rates of food insecurity experienced by adults throughout the world. Rome, Italy. FAO, 2016.
[20] Gerhard H Fischer and Ivo W Molenaar. Rasch models: Foundations, recent developments, and applications. Springer Science & Business Media, 2012.
[21] Ronald K Hambleton, Hariharan Swaminathan, and H Jane Rogers. Fundamentals of item response theory, volume 2. Sage, 1991.
[22] William L Hamilton, John T Cook, et al. Household food security in the United States in 1995: technical report of the food security measurement project. 1997.
[23] Andrew D Jones, Francis M Ngure, Gretel Pelto, and Sera L Young. What are we assessing when we measure food security? A compendium and review of current metrics. Advances in Nutrition, 4(5):481–505, 2013.
[24] Michael J Kolen and Robert L Brennan. Test equating, scaling, and linking: Methods and practices. Springer Science & Business Media, 2014.
[25] Frederic M Lord and Marilyn S Wingersky. Comparison of IRT observed-score and true-score ‘equatings’. ETS Research Report Series, 1983(2):i–33, 1983.
[26] Gary L Marco. Item characteristic curve solutions to three intractable testing problems. ETS Research Bulletin Series, 1977(1):i–41, 1977.
[27] Mark Nord. Assessing Potential Technical Enhancements to the U.S. Household Food Security Measures. U.S. Department of Agriculture, Economic Research Service, TB-1936, 2012.
[28] World Health Organization et al. Sustainable Development Goals: 17 goals to transform our world, 2016.
[29] Per Pinstrup-Andersen. Food security: definition and measurement. Food security, 1(1), 2009.
[30] Kathy L Radimer, Christine M Olson, Jennifer C Greene, Cathy C Campbell, and Jean-Pierre Habicht. Understanding hunger and developing indicators to assess it in women and children. Journal of Nutrition Education, 24(1):36S–44S, 1992.
[31] Georg Rasch. Probabilistic models for some intelligence and achievement tests. Copenhagen: Danish Institute for Educational Research, 1960.
[32] Stanley Smith Stevens et al. On the theory of scales of measurement. 1946.
[33] Robert D Tortora, Rajesh Srinivasan, and Neli Esipova. The Gallup World Poll. Survey Methods in Multinational, Multiregional, and Multicultural Contexts, pages 535–543, 2010.
[34] Paloma Villagómez-Ornelas, Pedro Hernández-López, Brenda Carrasco-Enríquez, Karina Barrios-Sánchez, Rafael Pérez-Escamilla, and Hugo Melgar-Quiñónez. Validez estadística de la Escala Mexicana de Seguridad Alimentaria y la Escala Latinoamericana y Caribeña de Seguridad Alimentaria. Salud Pública de México, 56:s5–s11, 2014.
[35] Jonathan P Weeks et al. plink: An R package for linking mixed-format tests using IRT-based methods. Journal of Statistical Software, 35(12):1–33, 2010.
Authors’ address:
University of Rome La Sapienza, Faculty of Information Engineering, Informatics and Statistics, Department of Statistical Sciences, Italy
Food and Agriculture Organization of the United Nations
[email protected], [email protected], [email protected]