Strata-based Quantification of Distributional Uncertainty in Socio-Economic Indicators: A Comparative Study of Indian States

Abstract

This paper reports a comprehensive study of distributional uncertainty in a few socio-economic indicators across the states of India over the years 2001-2011. We show that the DGB distribution, a typical rank-order distribution, provides excellent fits, through its two distributional parameters, to the district-wise empirical data on population size, literacy rate (LR) and work participation rate (WPR) within every state of India. Moreover, using the entropy formulation of the DGB distribution, a proposed uncertainty percentage (UP) reveals the dynamics of the uncertainty of LR and WPR in all states of India. We also comment on the changes in the estimated parameters and the UP values from 2001 to 2011. Additionally, a gender-based analysis of the distribution of these important socio-economic variables within different states of India is discussed. Interestingly, although the distributions of the numbers of literate and working people have a direct (linear) correspondence with that of the population size, the literacy and work participation rates are found to be distributed independently of the population distributions.


Abhik Ghosh, Olivia Mallick, Souvik Chattopadhay, and Banasri Basu
Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata 700108
Physics and Applied Mathematics Unit, Indian Statistical Institute, Kolkata 700108, India
(Dated: February 23, 2021)

Keywords: Entropy; Rank Order Distribution; Population Distribution; Literacy Rate; Work Participation Rate.

I. INTRODUCTION

Socio-economic development of our world is largely driven by people and their governance. The habitable area of the world is divided into distinct units, namely countries or dependent territories, governed by independent administrative bodies of different natures. These countries are further subdivided (first-tier division) into smaller units, sometimes called states or provinces, for the purposes of internal governance, management and better policy making. For administrative purposes and the ease of working locally, these states or provinces are further subdivided (second-tier division) into smaller strata called counties or districts. Although further subdivisions are seen in some countries, most large countries have major administrative divisions up to the second tier. For example, the USA is divided into states (first-tier division) which, in turn, are divided into counties (second-tier division). China is divided into provinces and direct-controlled municipalities (first level), which are split into prefectures (second level), and prefectures are divided into counties (third level). India is divided into states (first tier) and subsequently into districts (second tier). Although there is wide heterogeneity among the administrative units (states or districts) in a country, these local administrative units can be seen as socially constructed strata, which serve as spatial scenarios for social and economic processes [2]. Such administrative divisions are often regions of a country that are granted a certain degree of autonomy and manage themselves through local governments. In a growing number of developing countries, there is an attempt to proliferate sub-national administrative divisions for the decentralization of local government [4], in the expectation that such reorganization may yield economic benefits for cities [5].
Even in the absence of large administrative and political reforms, administrative units are constantly being created, destroyed, merged or split, demonstrating that the sub-divisional strata within countries undergo evolution. The population, which is not distributed randomly [6–8] over the different regions of a country, is one of the key factors behind the high degree of diversity and complexity [9] of the internal regional administration of various countries and territories. In this context, the study and understanding of the geographical distribution of the population within a given country or region becomes relevant, as it is a necessary step for the development of theories that could accurately describe the evolution of human agglomerations [13, 14]. However, almost all studies of population distribution focus on city populations (see, for instance, [7, 8, 15–21]), whereas the literature on the population distribution in administrative divisions is scarce [22, 23]. Moreover, there are studies predicting a strong correlation between the population of a city and many characteristic features of its inhabitants: mean income, number of registered patents per capita, crime rates, land value and rent prices [8, 9, 13]. In this respect, it is important to study the correlations of various socio-economic variables with the population distribution of the smaller strata of a country. In particular, to frame better policies for the whole country as well as for these subdivisions, it is necessary to understand, in detail, the distribution and the associated uncertainty or inequality of different socio-economic indicators within and between these strata, along with their relations with the populations.
Such an understanding would then help us to classify the different strata (subdivisions) within a country into groups with similar policy requirements for their development, where the groups are formed based on their underlying differences in terms of the socio-economic indicators. Is there any particular stratum within the country that requires more attention than the rest, or are all the strata equal in terms of their socio-economic characteristics? In an attempt to answer such questions, in this paper we propose and discuss an innovative way of quantifying the distributional uncertainty of socio-economic factors within different strata (first-tier subdivisions) of a country, based on their rank-size pattern across the second-tier subdivisions.

To study the uncertainty characteristics of a given socio-economic factor within a stratum (first-tier subdivision) of a country, we first fit an appropriate distribution to the rank-size values of this factor across all second-tier subdivisions within the stratum. The well-known Zipf's law, also known as the power law or Pareto distribution [10], has been successfully used to model such rank-size data on socio-economic factors in their upper range of values, but has been observed to fail at the lower end of the rank-size distribution [11, 12]. Very recently, a universal framework based on rank-order distributions has been developed to successfully model a wide range of rank-size data from various areas of arts and science [8, 24–29], where a two-parameter discrete generalized beta (DGB) distribution is used instead of the one-parameter Pareto distribution. The DGB distribution has also been shown to provide an excellent fit to rank-size data on various socio-economic parameters within a country and between different countries across the world [8].
Remarkably, this DGB distribution can be obtained as an appropriate maximum-entropy distribution [30], and hence its entropy can be used to quantify the maximum uncertainty present in the empirical data distribution. We follow this idea to develop an uncertainty measure based on the entropy of the DGB distribution fitted to the empirical data on the given socio-economic factor across all second-tier subdivisions within the stratum.

Our proposed uncertainty measure, which we refer to as the Uncertainty Percentage (UP), indicates how the factor under study is distributed across different parts of the given stratum. It takes its maximum value of 100% if the factor value (socio-economic condition) is the same across all parts of the stratum, i.e., in all second-tier subdivisions within it, indicating the equally distributed scenario (least inequality). On the other hand, UP takes its minimum value of 0% if the inequality within the stratum is extreme, in that only one second-tier subdivision has a non-zero value of the factor under study (the socio-economic condition is as desired in only one subdivision). Any value of UP between these two extremes indicates how the targeted factor is distributed within different parts (subdivisions) of the stratum, with higher values indicating more uniformity (less inequality). Once this uncertainty measure is computed for all the strata within a country, the strata can be classified accordingly and compared further in depth.
For example, the fitted DGB distributions of different socio-economic factors can be linked by establishing appropriate relationships between the corresponding parameter estimates, which, along with our UP measure, provide further insights to characterize and classify the different strata within a country for better policy making.

Using the proposed idea, we then analyze the uncertainty among the Indian states (first-tier strata) based on the census data for the years 2011 and 2001. In this paper, we focus on three basic socio-economic entities, namely population, literacy and working population. Like population, education is a key element of any society: it reduces inequality, impacts the growth of employment and improves a country's gross national product. On the other hand, the distributional inequality of unemployment rates (given by the size of the non-working population) plays an important role in structuring economic development. Together, these three factors control, to a large extent, the development of human resources in any society. For a vast and diverse country like India, with many states having very different cultures, it is very important to analyze the levels of literacy and employment across the various states and union territories (UTs) of India and to investigate their underlying distributional uncertainty or inequality. We apply our proposed idea [8, 30] of fitting the DGB distribution to the empirical data on these three factors for all the districts of each Indian state. In a subsequent analysis, we measure the inequality across the second-tier subdivisions (districts) within a state via our proposed UP measure. The correlation of the distributions of literacy and employment rates within each state with its population distribution is investigated by linking the parameters of the corresponding fitted DGB distributions.
Additionally, to strengthen our proposition, we also perform a gender-based analysis for all these socio-economic indicators. Finally, our temporal analysis provides an overall picture of the uncertainty or uniformity of the Indian states in terms of population, literacy and employment rates, their interrelations and their changes from 2001 to 2011.

II. METHODOLOGY

A. Modelling Empirical Data by DGB distribution

Let us consider a stratum (e.g., a state of a country) having N second-tier subdivisions (e.g., districts). Suppose that a socio-economic factor under study (e.g., population, literacy, work participation rate, etc.) takes the value $x_i$ at the i-th subdivision for i = 1, ..., N. We arrange these data in decreasing order of "importance" (size) and denote the rank of the i-th subdivision by $r_i$ for i = 1, ..., N. We model these rank-size data $\{(r_i, x_i) : i = 1, \ldots, N\}$ by the discrete generalized beta (DGB) distribution with probability mass function

$$ f_{(a,b)}(r) = A\,\frac{(N+1-r)^{b}}{r^{a}}, \qquad r = 1, \ldots, N. \tag{1} $$

Here, in (1), a and b are two real-valued model parameters characterizing the underlying distributional structure, and A is the normalizing constant, depending on (a, b), that ensures $\sum_{r=1}^{N} f_{(a,b)}(r) = 1$. Note that, for the choice b = 0, the probability mass function in (1) simplifies to that of the Pareto distribution. With different values of the two parameters (a, b), the class of DGB distributions in (1) allows us to model a wide range of rank-size distributions having an inflection point [8], and it can successfully model different types of socio-economic factors [8, 24–29].

Given empirical rank-size data $\{(r_i, x_i) : i = 1, \ldots, N\}$ of any socio-economic factor, we fit the DGB distribution by estimating the model parameters (a, b) through maximizing the likelihood of the observed data, given by

$$ L(a,b) = \prod_{i=1}^{N} f_{(a,b)}(r_i)^{x_i} = \prod_{i=1}^{N} \frac{(N+1-r_i)^{b x_i}}{r_i^{a x_i}}\, A^{x_i}. \tag{2} $$

This maximum likelihood estimator is known to be asymptotically most efficient and enjoys several other optimality properties. However, we need to use numerical optimization methods to compute these estimates, since the maximizer of the likelihood function in (2) has no explicit form.
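A minimal Python sketch of the fitting step is given below, together with the KS measure of Eq. (3) defined in the next paragraph. It is not the authors' code: the paper does not specify an optimizer, so SciPy's Nelder-Mead method, the starting point (0.1, 0.1) and the helper names `dgb_log_pmf`, `fit_dgb` and `ks_measure` are our own assumptions. The sketch also assumes that KS is reported on the proportion scale (Eq. (3) divided by the total size), which the small values in Table I suggest but the text does not state.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def dgb_log_pmf(a, b, N):
    """log f_{(a,b)}(r) for r = 1..N, following Eq. (1)."""
    r = np.arange(1, N + 1, dtype=float)
    log_w = b * np.log(N + 1 - r) - a * np.log(r)  # log of (N+1-r)^b / r^a
    return log_w - logsumexp(log_w)                 # log A = -logsumexp(log_w)


def fit_dgb(x):
    """Hypothetical helper: MLE (a_hat, b_hat) for the sizes x of N subdivisions."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]   # rank r_i = i, largest first
    # Rescaling x by its total multiplies log L(a, b) by a positive constant,
    # so the maximizer is unchanged while the objective stays well scaled.
    w = x / x.sum()

    def neg_log_lik(theta):
        a, b = theta
        return -np.sum(w * dgb_log_pmf(a, b, len(x)))  # -log L(a, b) / sum(x)

    res = minimize(neg_log_lik, x0=[0.1, 0.1], method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 5000})
    return float(res.x[0]), float(res.x[1])


def ks_measure(x, a_hat, b_hat):
    """Eq. (3) divided by the total size (assumed proportion scale)."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]
    total = x.sum()
    p = total * np.exp(dgb_log_pmf(a_hat, b_hat, len(x)))  # predicted sizes p_i
    # With the data sorted by rank, "sum over j with r_j <= r_i" is a cumulative sum.
    return float(np.max(np.abs(np.cumsum(p) - np.cumsum(x))) / total)


# Usage: sizes generated exactly from DGB(0.3, 0.5) with N = 30 should give
# estimates close to (0.3, 0.5) and a KS value close to zero.
sizes = 1e6 * np.exp(dgb_log_pmf(0.3, 0.5, 30))
a_hat, b_hat = fit_dgb(sizes)
print(a_hat, b_hat, ks_measure(sizes, a_hat, b_hat))
```

Because (1) is an exponential family in (b, -a) with sufficient statistics log(N + 1 - r) and log r, the negative log-likelihood is convex in (a, b), so a simple derivative-free optimizer is adequate for this sketch.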
Once we obtain the maximum likelihood estimator (MLE) $(\hat{a}, \hat{b})$ of the model parameters (a, b), we can verify whether the fitted DGB model fits the empirical data well. For this purpose, we use the overall error in model fit, computed in terms of the Kolmogorov-Smirnov (KS) measure between the observed and the predicted cumulative rank-sizes. Denoting the predicted size of rank $r_i$ by $p_i = \left(\sum_{k=1}^{N} x_k\right) f_{(\hat{a},\hat{b})}(r_i)$ for all i = 1, ..., N, we define the KS goodness-of-fit measure as

$$ KS = \max_{1 \le i \le N} \left| \sum_{j:\, r_j \le r_i} p_j - \sum_{j:\, r_j \le r_i} x_j \right|. \tag{3} $$

Lower KS values indicate a better fit of the model, i.e., the fitted DGB distribution is very close to the empirically observed rank-sizes when the KS value is nearly zero. A KS value of zero indicates a perfect fit with no error for the observed data.

B. Quantification of Distributional Uncertainty

Once a DGB distribution is found to fit a socio-economic indicator within a stratum well, we propose to quantify the underlying distributional uncertainty by its entropy. Entropy is a widely used measure of disorder within any physical system. Although the idea of entropy originated long ago in thermodynamics, its major use in information science and allied disciplines began after Shannon's groundbreaking work [31, 32] on the mathematical formulation of entropy in an information channel and Jaynes's maximum entropy principle [33]. Subsequently, the ideas of entropy and maximum entropy distributions have been widely used to analyze and assess the uncertainty in several geographical or socio-economic variables [34].

Here we consider the DGB distribution, which has been shown to be a maximum entropy rank-order distribution under appropriate utility constraints [30]. For the DGB distribution with probability mass function (1), the Shannon entropy is given by

$$ S_N(a,b) = -\sum_{r=1}^{N} f_{(a,b)}(r)\,\log f_{(a,b)}(r) = -\log A - A \sum_{r=1}^{N} \frac{(N+1-r)^{b}}{r^{a}} \left[\, b\log(N+1-r) - a\log r \,\right]. \tag{4} $$

This entropy value $S_N(a,b)$ then gives the maximum amount of uncertainty (entropy) in the underlying rank-order distribution for the given utility constraints, which are characterized by the model parameters (a, b). Therefore, given the empirical rank-size data, once we obtain the estimated parameter values $(\hat{a}, \hat{b})$, we can estimate the (maximum) amount of uncertainty in these data by the entropy of the fitted DGB distribution, i.e., by $\hat{S}_N = S_N(\hat{a}, \hat{b})$. However, since $\hat{S}_N$ can vary from 0 to $\log N$, it cannot be used to compare the uncertainty present in two distributions with different values of N. Since strata often have different numbers (N) of subdivisions, we cannot compare them using $\hat{S}_N$ alone as a measure of uncertainty.
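The closed form in Eq. (4) can be checked numerically. The Python sketch below, whose function names are our own, evaluates the right-hand side of Eq. (4) and compares it with the direct definition $-\sum_r f_{(a,b)}(r)\log f_{(a,b)}(r)$, using for illustration the 2011 population estimates for West Bengal from Table I ($\hat{a} = 0.255$, $\hat{b} = 0.364$, N = 19); it also checks that the uniform case a = b = 0 attains the upper bound log N.

```python
import numpy as np


def dgb_entropy_closed_form(a, b, N):
    """S_N(a, b) via the right-hand side of Eq. (4)."""
    r = np.arange(1, N + 1, dtype=float)
    w = (N + 1 - r) ** b / r ** a   # unnormalized weights (N + 1 - r)^b / r^a
    A = 1.0 / w.sum()               # normalizing constant of Eq. (1)
    bracket = b * np.log(N + 1 - r) - a * np.log(r)
    return float(-np.log(A) - A * np.sum(w * bracket))


def dgb_entropy_direct(a, b, N):
    """S_N(a, b) = -sum_r f(r) log f(r), computed from the normalized pmf."""
    r = np.arange(1, N + 1, dtype=float)
    w = (N + 1 - r) ** b / r ** a
    f = w / w.sum()
    return float(-np.sum(f * np.log(f)))


# Both forms should agree; the uniform case a = b = 0 gives log N.
print(dgb_entropy_closed_form(0.255, 0.364, 19), dgb_entropy_direct(0.255, 0.364, 19))
print(dgb_entropy_closed_form(0.0, 0.0, 19), np.log(19))
```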
Standardizing by the range, we then define our proposed uncertainty percentage (UP) measure as the ratio of the entropy estimated from the fitted DGB distribution to its maximum possible value, i.e.,

$$ UP = \frac{S_N(\hat{a}, \hat{b})}{\log N} \times 100\%. \tag{5} $$

Clearly, the UP measure takes the value 100% if $S_N(\hat{a}, \hat{b}) = \log N$, which (for N ≥ 3) holds if and only if $\hat{a} = \hat{b} = 0$, indicating that the best-fitted rank-sizes are uniformly distributed over the subdivisions within the given stratum; in other words, there is no inequality in the distribution of the targeted socio-economic indicator across different parts of the stratum. On the other hand, the UP measure is zero if $S_N(\hat{a}, \hat{b}) = 0$, indicating an extremely concentrated (degenerate) distribution of the rank-sizes within the stratum, with only one subdivision having a non-zero value of the targeted indicator. Therefore, UP can be used to quantify the distributional uncertainty of the spread of any socio-economic indicator within a stratum, measuring its closeness to the optimum case of uniformity. Since UP takes values from 0 to 100% irrespective of the value of N, it can also be used to compare the uncertainty present within different strata as well as for different indicators.

C. Inter-relations between the distributions of two socio-economic variables

In any human agglomeration unit, most socio-economic indicators are highly correlated with each other. In particular, as noted above, the population distribution has a strong impact on the distributions of various socio-economic factors, which has been explored in the literature via regression within the power-law structure. In this paper, however, we use the alternative DGB rank-order distribution, which has been noted to fit the empirical data better. Consistently with the fitted DGB distributions, we propose to investigate the relationship between the distribution of any socio-economic indicator and the population distribution within different strata. More specifically, our approach focuses on the inter-relation between the distributional structure and uncertainty of the socio-economic variables rather than on the values of the indicators themselves.

To investigate the relationship between the DGB distributions fitted to two variables within a stratum, recall that a DGB distribution is characterized by its two model parameters (a, b). Suppose that there are T strata and, for the j-th stratum, the MLE of the parameters (a, b) for the fitted population distribution is $(\hat{a}_{P_j}, \hat{b}_{P_j})$ and that for a targeted socio-economic indicator Y (e.g., literacy or unemployment rate) is $(\hat{a}_{Y_j}, \hat{b}_{Y_j})$, j = 1, ..., T. Then, to study the relationship between the best-fitted DGB distributions of the targeted indicator Y and of the population across the T strata, it is enough to investigate the relationship between the corresponding parameters $(\hat{a}_{P_j}, \hat{b}_{P_j})$ and $(\hat{a}_{Y_j}, \hat{b}_{Y_j})$ for j = 1, ..., T. The first step in this regard is to examine the (Pearson) bivariate correlations among these estimated parameters across strata.
If this correlation is significantly high (in absolute value), it indicates a linear relationship, which we may then explore further via appropriate multivariate regression models. The same analysis can be carried out using the UP values, and also for any two socio-economic indicators (other than the population).

However, if there is no linear relationship between the fitted parameters or the UP values corresponding to two indicators, one should investigate their relation via appropriate non-parametric correlation coefficients (e.g., rank correlation). If this non-parametric correlation is found to be significant, one may proceed with further analyses of their non-linear relationship using advanced statistical tools, which are not considered in the present paper.

III. RESULTS: UNDERSTANDING UNCERTAINTY AMONG INDIAN STATES

A. Data Description and Overall Statistics

We consider the data from the Indian census, which is conducted every 10 years to capture a detailed picture of the demographic, economic and social conditions of all persons in the country at that specific time. The raw census data for the years 2011 and 2001 are obtained from the Primary Census data and Digital library State. It is important to note that some states, created at a later time, were not present in the earlier round of the census (in 2001). The total number of districts within all the states of India was 640 and 593 in 2011 and 2001, respectively. Additionally, we also use these census data separately for the male and female populations for the gender-based uncertainty analysis in Section III C.

As mentioned earlier, in this paper we focus on two important socio-economic factors, namely the literacy and work participation rates, along with the population of each unit under consideration. For the literacy computation, a person is defined to be literate if (s)he is aged more than 7 years and can both read and write with understanding in any language. On the other hand, a person who has worked in any economically productive activity for more than 6 months during the one year preceding the data collection date is considered to be a worker (main or marginal). We analyze the data on the corresponding variables (as rates), defined as follows:

$$ \text{Literacy rate (LR)} = \frac{\text{number of literate people in a district}}{\text{total population in that district}} \times 100, $$

$$ \text{Work Participation Rate (WPR)} = \frac{\text{number of workers in a district}}{\text{total population in that district}} \times 100. $$

A comprehensive summary of state-wise population (P), LR and WPR statistics (in %) is reported in Supplementary Table S1 (Appendix A) for the years 2011 and 2001. It can easily be seen that the population of each state has increased over the years (linearly). To understand the changes in LR and WPR more clearly, we plot their values in 2001 against those in 2011 in Figure 1, along with the corresponding linear fit. According to the 2011 census, the LR across Indian states ranges from around 50% (Bihar, the lowest) to 84% (Kerala, the highest), whereas the WPR varies only between 35% and 50%. Further, the LR has increased significantly in all the states, whereas the changes in WPR are rather variable.
Except for Mizoram and Kerala, all states have an increase of more than 5% in LR, with as many as 9 states having an increase of more than 10%. On the other hand, 10 states show no significant change (less than 1%) in WPR. The UT of Dadra and Nagar Haveli has the highest increase in LR but the largest decrease in WPR. Nagaland has the highest increase, of 6.64%, in WPR. Among the larger states, Kerala, Himachal Pradesh, Assam, Jharkhand and Odisha have shown some amount of increase

FIG. 1: Plot of LR and WPR for different states in the years 2001 and 2011, along with the corresponding linear fits. The black solid line represents the y = x line (linear growth).

B. Modelling Population, Literacy and Work Participation Rates within Each State

Fitting the DGB rank-order distributions within each state:

We now discuss the results obtained by applying the DGB distribution to different socio-economic variables (population, LR and WPR) for various states of India using the district-wise data. States with fewer than 5 districts are excluded from the analysis; this excludes 6 UTs (all except Delhi) and three states (Goa, Sikkim and Tripura). The estimated parameters (a, b) and the goodness-of-fit measures (KS) for all three socio-economic variables for the different states are reported in Table I for both 2011 and 2001. As expected (see [8, 30]), the DGB distribution, with its two distributional parameters $\hat{a}$ and $\hat{b}$, provides excellent fits to the empirical data, not only for population size but also for LR and WPR, with extremely small KS errors.

Considering the distributional structures, for most states both parameters $\hat{a}$ and $\hat{b}$ tend to be positive and are similar in nature. However, there is significant variation among the estimated parameter values across states for all the variables, indicating differences in their (district-wise) distributions across states. In particular, the UT of Delhi, the capital of India and one of the most crowded states, has a negative $\hat{a}$ value of −0.157 and a positive $\hat{b}$ value of 1.387 for the distribution of population. On the other hand, for population, the less populous state of Mizoram has a higher positive value of $\hat{a}$ (0.999) than any other state but a negative $\hat{b}$ value. For the distribution of LR, most states yield positive values of both $\hat{a}$ and $\hat{b}$, except for Chandigarh, Delhi, Haryana and Mizoram, which yield a negative $\hat{a}$ value and a positive $\hat{b}$ value. In 2011, Kerala, the state with 84 percent literate people, has $\hat{a}$ and $\hat{b}$ values of 0.018 and 0.035, respectively, while West Bengal, with 67.4 percent literate people, has $\hat{a}$ and $\hat{b}$ values of 0.043 and 0.127. For the distribution of WPR as well, some states have negative $\hat{a}$ values, indicating their difference from the majority of the states. In general, the disparity of LR among the states is greater than that of WPR.

Next, if we compare the fitted distributions (their estimated parameters) for 2011 and 2001, they change more for the smaller states and for the states whose number of districts changed significantly. Among the states with mostly the same number of districts in 2001 and 2011, significant changes in the estimated parameter values are observed for Bihar and Meghalaya in the LR distribution, while the others remain mostly the same. However, the distribution of WPR changes significantly in more states, including Assam, Delhi, Gujarat, Mizoram and Tamil Nadu.

Distributional Uncertainty Analysis:

We have shown that the DGB distribution can characterize the population size, LR and WPR distributions through two drivers modelled by the two parameters $(\hat{a}, \hat{b})$. The predicted distribution fits the observed rank-size data very well, both for all the districts of India and for the districts within each state. We now compute the entropy (S) values of all the considered distributions and, by standardizing them, obtain the proposed Uncertainty Percentage (UP) for the three variables in the different Indian states. The resulting UP values are plotted in Figure 2 for both 2011 and 2001.

One can see from Figure 2 that the UP values change the most for population and the least for WPR. Thus, there has been no significant change in the distributional uncertainty of the work participation rates in any state of India: a slight increase of only 0.14% is observed for the UP of WPR in Assam, whereas it changes by less than 0.1% in all other states. The changes in the UP values for LR are also less than 0.5%, indicating no significant change in the distributional uniformity of literacy rates in any state of India. For population, however, the maximum increase (of 0.72%) in the UP values is observed in West Bengal, indicating greater uniformity of its population distribution within the state. On the other hand, in Karnataka and Chhattisgarh, the UP values of the population distribution decrease by more than 2%, indicating significant relocation of people within these states (mostly from rural areas to major cities).

In absolute terms, Andhra Pradesh, Madhya Pradesh and Haryana have the highest UP values and hence more uniform population distributions, whereas those of Mizoram and Uttarakhand are more uneven. In terms of literacy rate, however, Delhi, Kerala and Uttarakhand are the most evenly distributed states, with the highest UP values, while Chhattisgarh and Odisha have the most uneven distributions of LR (lowest UP values).
Finally, Karnataka, West Bengal and Meghalaya have the highest UP values for WPR, indicating the most equally distributed work participation rates among their districts, whereas Jammu and Kashmir, Nagaland and Odisha have the most uneven distribution of the workforce within the state.
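Since UP depends only on $(\hat{a}, \hat{b})$ and N, the values plotted in Figure 2 can be approximated from Table I. The Python sketch below, with hypothetical helper names, implements Eq. (5) using the direct form of the entropy in Eq. (4) and applies it to the population estimates for West Bengal and Karnataka, the two states discussed above. Because Table I reports estimates rounded to three decimals, the recomputed UP values are only approximations and serve to illustrate the computation.

```python
import numpy as np


def dgb_entropy(a, b, N):
    """Shannon entropy S_N(a, b) of the DGB pmf in Eq. (1)."""
    r = np.arange(1, N + 1, dtype=float)
    w = (N + 1 - r) ** b / r ** a
    f = w / w.sum()
    return float(-np.sum(f * np.log(f)))


def uncertainty_percentage(a_hat, b_hat, N):
    """UP of Eq. (5): fitted entropy as a percentage of its maximum, log N."""
    return 100.0 * dgb_entropy(a_hat, b_hat, N) / float(np.log(N))


# Population estimates (a_hat, b_hat, N) copied from Table I.
table_i_population = {
    "West Bengal": {2011: (0.255, 0.364, 19), 2001: (0.312, 0.395, 18)},
    "Karnataka": {2011: (0.689, 0.033, 30), 2001: (0.464, 0.120, 27)},
}

for state, years in table_i_population.items():
    up = {year: uncertainty_percentage(*params) for year, params in years.items()}
    print(f"{state}: UP 2001 = {up[2001]:.2f}%, UP 2011 = {up[2011]:.2f}%, "
          f"change = {up[2011] - up[2001]:+.2f} points")
```

The uniform fit $\hat{a} = \hat{b} = 0$ returns 100% (up to floating-point rounding) for any N, which is a quick sanity check of the standardization.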

TABLE I: The parameter estimates (â, b̂) and the KS measures for different states, along with the total number (N) of districts

Year 2011

| State | N | Population â | Population b̂ | Population KS | LR â | LR b̂ | LR KS | WPR â | WPR b̂ | WPR KS |
|---|---|---|---|---|---|---|---|---|---|---|
| India | 640 | 0.252 | 0.872 | 0.005 | 0.063 | 0.133 | 0.004 | 0.082 | 0.109 | 0.006 |
| Andhra Pradesh | 23 | 0.124 | 0.171 | 0.007 | 0.074 | 0.050 | 0.001 | -0.009 | 0.109 | 0.001 |
| Arunachal Pradesh | 16 | 0.074 | 0.825 | 0.020 | 0.069 | 0.125 | 0.002 | 0.122 | 0.033 | 0.003 |
| Assam | 27 | 0.265 | 0.307 | 0.011 | 0.076 | 0.070 | 0.002 | 0.060 | 0.071 | 0.006 |
| Bihar | 38 | 0.143 | 0.534 | 0.003 | 0.058 | 0.086 | 0.006 | 0.064 | 0.056 | 0.003 |
| Chhattisgarh | 18 | 0.390 | 0.664 | 0.015 | -0.022 | 0.285 | 0.005 | 0.044 | 0.059 | 0.002 |
| Delhi | 9 | -0.157 | 1.387 | 0.026 | 0.015 | 0.033 | 0.001 | 0.101 | 0.022 | 0.006 |
| Gujarat | 26 | 0.383 | 0.454 | 0.020 | 0.021 | 0.112 | 0.003 | 0.112 | 0.011 | 0.003 |
| Haryana | 21 | 0.140 | 0.199 | 0.006 | -0.014 | 0.150 | 0.004 | 0.020 | 0.105 | 0.002 |
| Himachal Pradesh | 12 | 0.306 | 0.748 | 0.035 | 0.009 | 0.076 | 0.001 | 0.059 | 0.119 | 0.005 |
| Jammu and Kashmir | 22 | 0.368 | 0.487 | 0.011 | 0.141 | 0.049 | 0.003 | 0.175 | 0.046 | 0.004 |
| Jharkhand | 24 | 0.271 | 0.394 | 0.013 | 0.060 | 0.107 | 0.003 | 0.052 | 0.111 | 0.004 |
| Karnataka | 30 | 0.689 | 0.033 | 0.031 | 0.047 | 0.122 | 0.002 | 0.044 | 0.032 | 0.002 |
| Kerala | 14 | 0.056 | 0.592 | 0.011 | 0.018 | 0.035 | 0.002 | 0.074 | 0.107 | 0.006 |
| Madhya Pradesh | 50 | 0.197 | 0.293 | 0.005 | 0.016 | 0.159 | 0.005 | 0.048 | 0.098 | 0.001 |
| Maharashtra | 35 | 0.491 | 0.288 | 0.014 | 0.027 | 0.077 | 0.001 | 0.052 | 0.038 | 0.002 |
| Manipur | 9 | 0.126 | 0.623 | 0.030 | 0.038 | 0.103 | 0.003 | 0.054 | 0.059 | 0.001 |
| Meghalaya | 7 | 0.376 | 0.477 | 0.014 | 0.055 | 0.110 | 0.007 | 0.036 | 0.025 | 0.002 |
| Mizoram | 8 | 0.990 | -0.119 | 0.020 | -0.046 | 0.239 | 0.006 | -0.008 | 0.184 | 0.003 |
| Nagaland | 11 | 0.241 | 0.563 | 0.012 | 0.050 | 0.171 | 0.004 | 0.107 | 0.109 | 0.005 |
| Odisha | 30 | 0.239 | 0.488 | 0.007 | 0.027 | 0.230 | 0.005 | 0.035 | 0.167 | 0.008 |
| Punjab | 20 | 0.381 | 0.332 | 0.024 | 0.042 | 0.082 | 0.007 | 0.062 | 0.021 | 0.003 |
| Rajasthan | 33 | 0.367 | 0.211 | 0.012 | 0.062 | 0.066 | 0.003 | 0.063 | 0.049 | 0.001 |
| Tamil Nadu | 32 | 0.175 | 0.477 | 0.009 | 0.042 | 0.050 | 0.001 | 0.035 | 0.079 | 0.001 |
| Uttarakhand | 13 | 0.478 | 0.542 | 0.037 | 0.014 | 0.061 | 0.002 | -0.001 | 0.191 | 0.006 |
| Uttar Pradesh | 71 | 0.147 | 0.394 | 0.018 | 0.030 | 0.116 | 0.002 | 0.079 | 0.026 | 0.003 |
| West Bengal | 19 | 0.255 | 0.364 | 0.006 | 0.043 | 0.127 | 0.002 | 0.051 | 0.024 | 0.001 |

Year 2001

| State | N | Population â | Population b̂ | Population KS | LR â | LR b̂ | LR KS | WPR â | WPR b̂ | WPR KS |
|---|---|---|---|---|---|---|---|---|---|---|
| India | 593 | 0.236 | 0.808 | 0.007 | 0.077 | 0.192 | 0.004 | 0.074 | 0.123 | 0.005 |
| Andhra Pradesh | 23 | 0.125 | 0.148 | 0.009 | 0.103 | 0.077 | 0.003 | -0.016 | 0.150 | 0.004 |
| Arunachal Pradesh | 13 | 0.130 | 0.490 | 0.011 | 0.103 | 0.131 | 0.005 | 0.082 | 0.080 | 0.002 |
| Assam | 23 | 0.252 | 0.357 | 0.012 | 0.077 | 0.099 | 0.004 | 0.146 | 0.041 | 0.005 |
| Bihar | 37 | 0.152 | 0.461 | 0.004 | 0.102 | 0.140 | 0.006 | 0.081 | 0.081 | 0.006 |
| Chhattisgarh | 16 | 0.456 | 0.248 | 0.017 | -0.038 | 0.322 | 0.009 | 0.037 | 0.081 | 0.004 |
| Delhi | 9 | -0.094 | 1.163 | 0.021 | 0.003 | 0.051 | 0.001 | 0.027 | 0.077 | 0.003 |
| Gujarat | 25 | 0.360 | 0.419 | 0.018 | 0.019 | 0.168 | 0.003 | 0.051 | 0.065 | 0.002 |
| Haryana | 19 | 0.263 | 0.168 | 0.006 | 0.031 | 0.086 | 0.003 | 0.018 | 0.112 | 0.003 |
| Himachal Pradesh | 12 | 0.326 | 0.714 | 0.035 | -0.002 | 0.109 | 0.002 | 0.114 | 0.033 | 0.003 |
| Jammu and Kashmir | 14 | 0.166 | 0.766 | 0.019 | 0.182 | 0.077 | 0.005 | 0.166 | 0.083 | 0.017 |
| Jharkhand | 18 | 0.083 | 0.661 | 0.014 | 0.134 | 0.173 | 0.006 | 0.060 | 0.130 | 0.003 |
| Karnataka | 27 | 0.464 | 0.120 | 0.020 | 0.083 | 0.125 | 0.002 | 0.030 | 0.065 | 0.001 |
| Kerala | 14 | 0.021 | 0.596 | 0.012 | 0.021 | 0.048 | 0.002 | 0.085 | 0.104 | 0.005 |
| Madhya Pradesh | 45 | 0.104 | 0.357 | 0.004 | 0.030 | 0.163 | 0.004 | 0.061 | 0.090 | 0.002 |
| Maharashtra | 35 | 0.449 | 0.298 | 0.013 | 0.040 | 0.098 | 0.001 | 0.030 | 0.061 | 0.001 |
| Manipur | 9 | 0.350 | 0.409 | 0.029 | 0.098 | 0.077 | 0.003 | 0.024 | 0.075 | 0.002 |
| Meghalaya | 7 | 0.380 | 0.527 | 0.018 | 0.202 | 0.025 | 0.005 | 0.034 | 0.075 | 0.001 |
| Mizoram | 8 | 1.116 | -0.305 | 0.009 | -0.001 | 0.203 | 0.003 | 0.081 | 0.098 | 0.002 |
| Nagaland | 8 | 0.284 | 0.261 | 0.015 | -0.005 | 0.353 | 0.006 | 0.044 | 0.178 | 0.008 |
| Odisha | 30 | 0.246 | 0.481 | 0.008 | 0.024 | 0.334 | 0.005 | 0.057 | 0.179 | 0.010 |
| Punjab | 17 | 0.310 | 0.470 | 0.022 | 0.046 | 0.127 | 0.007 | 0.060 | 0.046 | 0.001 |
| Rajasthan | 32 | 0.331 | 0.225 | 0.012 | 0.080 | 0.086 | 0.003 | 0.066 | 0.051 | 0.003 |
| Tamil Nadu | 30 | 0.176 | 0.495 | 0.008 | 0.059 | 0.056 | 0.001 | 0.055 | 0.108 | 0.001 |
| Uttarakhand | 13 | 0.364 | 0.552 | 0.023 | 0.040 | 0.081 | 0.003 | 0.007 | 0.208 | 0.004 |
| Uttar Pradesh | 70 | 0.142 | 0.383 | 0.016 | 0.054 | 0.161 | 0.002 | 0.095 | 0.051 | 0.004 |
| West Bengal | 18 | 0.312 | 0.395 | 0.010 | 0.063 | 0.177 | 0.002 | 0.077 | 0.042 | 0.002 |

FIG. 2: Plots of UP for population, LR and WPR for different states of India: (a) year 2011, (b) year 2001.
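Table I reports the estimates â, b̂ and a KS measure for each state, but the estimation procedure is not described in this section. The sketch below is a minimal illustration under stated assumptions. It takes the usual DGB rank-order form f(r) = A(N + 1 - r)^b / r^a for the value at rank r among N districts and fits (ln A, a, b) by ordinary least squares on the log scale. The helper names `dgb_values`, `fit_dgb` and `ks_distance` are hypothetical. The KS-type measure used here, the largest gap between the cumulative shares of observed and fitted values, is also an assumption and is not necessarily the paper's KS (RO) definition.

```python
import math
from typing import List, Sequence, Tuple


def dgb_values(n: int, a: float, b: float, A: float = 1.0) -> List[float]:
    """DGB rank-order values f(r) = A * (n + 1 - r)**b / r**a for r = 1..n."""
    return [A * (n + 1 - r) ** b / r ** a for r in range(1, n + 1)]


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [m[i][:] + [v[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(aug[i][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, 3):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, 4):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (aug[i][3] - tail) / aug[i][i]
    return x


def fit_dgb(values: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of ln f(r) = ln A - a ln r + b ln(N + 1 - r).

    `values` are district-level figures in any order; they are ranked in
    decreasing order so that rank 1 is the largest. Returns (A, a, b).
    """
    y = sorted(values, reverse=True)
    n = len(y)
    if n < 3 or y[-1] <= 0:
        raise ValueError("need at least three positive values")
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for r, v in enumerate(y, start=1):
        # Regressors whose coefficients are (ln A, a, b).
        row = (1.0, -math.log(r), math.log(n + 1 - r))
        for i in range(3):
            xty[i] += row[i] * math.log(v)
            for j in range(3):
                xtx[i][j] += row[i] * row[j]
    ln_scale, a_hat, b_hat = _solve3(xtx, xty)
    return math.exp(ln_scale), a_hat, b_hat


def ks_distance(values: Sequence[float], a: float, b: float) -> float:
    """Assumed KS-type measure: largest gap between the cumulative shares of
    the observed and fitted rank-ordered values (the scale A cancels out)."""
    y = sorted(values, reverse=True)
    fitted = dgb_values(len(y), a, b)
    total_y, total_f = sum(y), sum(fitted)
    cum_y = cum_f = gap = 0.0
    for obs, fit in zip(y, fitted):
        cum_y += obs / total_y
        cum_f += fit / total_f
        gap = max(gap, abs(cum_y - cum_f))
    return gap


if __name__ == "__main__":
    # Synthetic check: data generated from a known DGB curve are recovered.
    synthetic = dgb_values(20, a=0.3, b=0.5, A=2.0)
    A, a, b = fit_dgb(synthetic)
    print(f"A={A:.3f} a={a:.3f} b={b:.3f} KS={ks_distance(synthetic, a, b):.2e}")
```

The log-linear fit is chosen here only because it is short and dependency-free. A nonlinear or likelihood-based fit could give somewhat different estimates on real district data.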

Relationship among the variables: We have computed both the Pearson (linear) and the rank (non-parametric) correlations between the variables population, LR and WPR, in terms of the estimated parameters and the UP values, separately for the years 2001 and 2011. The rank correlations are illustrated in the supplementary Figure S1 (Appendix A). Interestingly, none of these rank correlations is significant at the 95% level, and the same holds for the Pearson correlations. This leads to the inference that the distributional uncertainties of these three indicators are mostly uncorrelated with each other across the states of India. This is an important observation, in contrast to the fact that the distributions of the numbers of literate and working persons depend directly (linearly) on the corresponding population distribution.
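The correlation check described above can be sketched as follows. The sketch computes the Pearson coefficient, the Spearman rank coefficient (the Pearson coefficient of average ranks, so ties are handled) and an approximate two-sided 95% test of zero correlation. The paper does not state which significance test it used. The Fisher z-transform here, with standard error 1/sqrt(n - 3), is an assumed large-sample approximation, and it is rougher for the Spearman coefficient. The function names are hypothetical. The usage lists are the 2011 â estimates for population and LR of the 26 states in Table I; they illustrate only one of the parameter pairs the paper correlates.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson (linear) correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def significant_at_95(r: float, n: int) -> bool:
    """Approximate two-sided test of zero correlation via Fisher's z."""
    if abs(r) >= 1.0:
        return True
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > 1.959963984540054  # 97.5% standard normal quantile


if __name__ == "__main__":
    # 2011 a-hat for population and for LR, 26 states in Table I order.
    pop_a = [0.124, 0.074, 0.265, 0.143, 0.390, -0.157, 0.383, 0.140, 0.306,
             0.368, 0.271, 0.689, 0.056, 0.197, 0.491, 0.126, 0.376, 0.990,
             0.241, 0.239, 0.381, 0.367, 0.175, 0.478, 0.147, 0.255]
    lr_a = [0.074, 0.069, 0.076, 0.058, -0.022, 0.015, 0.021, -0.014, 0.009,
            0.141, 0.060, 0.047, 0.018, 0.016, 0.027, 0.038, 0.055, -0.046,
            0.050, 0.027, 0.042, 0.062, 0.042, 0.014, 0.030, 0.043]
    for name, r in (("Pearson", pearson(pop_a, lr_a)),
                    ("Spearman", spearman(pop_a, lr_a))):
        print(name, round(r, 3), "significant:", significant_at_95(r, len(pop_a)))
```

For small numbers of states, an exact t-test for the Pearson coefficient or a permutation test for the Spearman coefficient would be preferable to this normal approximation.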

C. Gender-based analyses

The gender composition of a population largely reflects, in different ways, the underlying social, economic and cultural patterns of a society. According to United Nations estimates, the world had 986 females per 1000 males in 2000. Interestingly, the sheer weight of the populations of Asian countries with low sex ratios, particularly China (944) and India (933), contributes to the preponderance of males over females in the world.

As per the Indian Census 2011, the total population of India comprises 62,37,24,248 males and 58,64,69,174 females, giving a sex ratio of 940 females per 1000 males. As per Census 2011, the top five states/Union Territories (UTs) with the highest sex ratio are Kerala (1,084), followed by Puducherry (1,038), Tamil Nadu (995), Andhra Pradesh (992) and Chhattisgarh (991). The five states/UTs with the lowest sex ratio are Daman and Diu (618), Dadra and Nagar Haveli (775), Chandigarh (818), NCT of Delhi (866) and Andaman and Nicobar Islands (878). According to the gender statistics of Census 2001, males numbered 532 million (52 percent of the population) and females 497 million (the remaining 48 percent). Eighteen states/UTs recorded a sex ratio above the national average of 933, while the remaining seventeen fell below it. Chandigarh and Daman and Diu occupied the bottom positions, with fewer than 800 females per 1000 males. Although the Census shows an increase in the sex ratio of the total population from 933 in 2001 to 940 in 2011, further improvement is still needed to balance the male-female ratio.

In any country, higher literacy rates consistently improve development indicators. In this respect, it is important to know the share of females among literate persons, as female literacy may lead to better health and nutritional status and to the economic growth of the community as a whole. The literacy rate of India in 2011 is 74.0 per cent.
The literacy rate among females is 65.5 per cent, whereas that among males is 82.1 per cent, a gap of 16.6 percentage points. In the 2001 Census the gap was 21.6 percentage points, revealing an improvement in this indicator.

Moreover, the male and female working populations and work participation rates highlight the gender-biased occupational distribution of a region. The Work Participation Rate (WPR) is defined as the percentage of total workers in the total population. It is 39.8 per cent as per the 2011 Census and 39.3 per cent as per the 2001 Census. During 1991-2001, the WPR for males increased marginally from 51.6 percent to 51.9 percent, while the WPR for females improved significantly, from 22.7 percent to 25.7 percent. Interestingly, this increase is mainly due to a rise in the proportion of marginal workers, which increased significantly from 3.4 percent to 8.7 percent. According to the 2001 Census, 48.1 percent of males and 74.3 percent of females are non-workers.

This gender-based information from the census data inspired us to perform a gender-based analysis of the distributions of important socio-economic variables, such as the population, literacy rate and work participation rate, of different states across India. As expected, the analysis revealed that both the male and the female distributions are fitted very well by the DGB distribution (DGBD). Moreover, the DGB distribution fits the district-wise sex ratios of literates and workers within every state of India very well, with very low values of the KS (RO) measure. For this purpose, we have computed the sex ratios for LR and WPR as follows:

$$\text{SR-LR} = \frac{\text{number of female literates in a district}}{\text{number of male literates in that district}} \times \ldots, \qquad \text{SR-WPR} = \frac{\text{number of female workers in a district}}{\text{number of male workers in that district}} \times \ldots$$
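The sex ratios in this subsection are simple quotients. The minimal sketch below uses two hypothetical helper names. It computes the census-style sex ratio (females per 1000 males) from the Census 2011 totals quoted above, and a district-level SR-LR or SR-WPR value. The multiplier in the SR-LR and SR-WPR definitions is not legible in this section. The sketch therefore does not assume a value: the caller must pass `scale` explicitly. The last line checks two figures in the text. The literacy gap is a difference of percentages (percentage points), and the non-worker shares are the complements of the 2001 male and female WPR values.

```python
def sex_ratio_per_1000(males: int, females: int) -> float:
    """Census-style sex ratio: number of females per 1000 males."""
    if males <= 0:
        raise ValueError("male count must be positive")
    return 1000.0 * females / males


def district_gender_ratio(female_count: int, male_count: int, scale: float) -> float:
    """SR-LR (literates) or SR-WPR (workers) for one district.

    `scale` stands for the multiplier in the definitions above; its value is
    not recoverable here, so it must be supplied by the caller (assumption).
    """
    if male_count <= 0:
        raise ValueError("male count must be positive")
    return scale * female_count / male_count


if __name__ == "__main__":
    # Census 2011 totals quoted in the text: 62,37,24,248 males, 58,64,69,174 females.
    print(round(sex_ratio_per_1000(623_724_248, 586_469_174)))  # prints 940
    # Literacy gap (percentage points) and 2001 non-worker shares (100 - WPR).
    print(round(82.1 - 65.5, 1), round(100 - 51.9, 1), round(100 - 25.7, 1))  # 16.6 48.1 74.3
```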
We apply our proposed analysis methodology to these two variables, which depict the gender-based inequalities in LR and WPR. The resulting parameter estimates and UP values are presented in Table II. For a better understanding of the change in distributional uncertainty from 2001 to 2011, the UP values for both SR-LR and SR-WPR are also presented graphically in Figure 3, along with the linear fits. It can be seen that, on average, there is a sublinear growth of the distributional uncertainty (UP values) for both SR-LR and SR-WPR between 2001 and 2011, indicating reduced gender discrimination in literacy rate as well as in work participation rate. For SR-LR, all states have a higher UP value in 2011 except Haryana, Mizoram and Punjab, whose values decreased, and Kerala, whose value is unchanged (69.28). In 2011, the lowest UP value (most discrimination) is observed in Arunachal Pradesh for SR-LR and in Uttarakhand for SR-WPR. In general, the distributional uncertainty for SR-WPR remains mostly stable, with slight uniform variations, in the states having higher UP values.
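The comparison of UP values between 2001 and 2011 can be reproduced from Table II. This sketch covers the SR-LR values only. It fits UP(2011) on UP(2001) by ordinary least squares and lists the states whose UP value did not increase. Reading "sublinear growth" as a fitted slope below 1, that is, flatter than the y = x line in Figure 3, is an assumed interpretation. The paper's exact fitting procedure for Figure 3 is not stated. The helper names are hypothetical, and the data lists are transcribed from Table II.

```python
from typing import List, Sequence, Tuple

# SR-LR UP values transcribed from Table II (same state order in both lists).
STATES = ["Andhra Pradesh", "Arunachal Pradesh", "Assam", "Bihar", "Chhattisgarh",
          "Delhi", "Gujarat", "Haryana", "Himachal Pradesh", "Jammu and Kashmir",
          "Jharkhand", "Karnataka", "Kerala", "Madhya Pradesh", "Maharashtra",
          "Manipur", "Meghalaya", "Mizoram", "Nagaland", "Odisha", "Punjab",
          "Rajasthan", "Tamil Nadu", "Uttarakhand", "Uttar Pradesh", "West Bengal"]
UP_SR_LR_2001 = [69.13, 68.95, 69.29, 69.20, 69.14, 69.27, 69.19, 69.25, 69.00,
                 69.07, 69.07, 69.13, 69.28, 69.18, 69.23, 69.25, 69.03, 69.21,
                 69.22, 69.06, 69.28, 69.14, 69.24, 69.13, 69.17, 69.17]
UP_SR_LR_2011 = [69.24, 69.03, 69.30, 69.29, 69.26, 69.29, 69.26, 69.22, 69.15,
                 69.15, 69.26, 69.24, 69.28, 69.25, 69.28, 69.26, 69.11, 69.16,
                 69.28, 69.24, 69.26, 69.24, 69.28, 69.19, 69.26, 69.25]


def ols_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def states_without_increase(states: Sequence[str], before: Sequence[float],
                            after: Sequence[float]) -> List[str]:
    """States whose later UP value is not above the earlier one (sorted)."""
    return sorted(s for s, u0, u1 in zip(states, before, after) if u1 <= u0)


if __name__ == "__main__":
    slope, intercept = ols_line(UP_SR_LR_2001, UP_SR_LR_2011)
    print(f"UP(2011) = {slope:.3f} * UP(2001) + {intercept:.2f}")
    print("slope below 1:", slope < 1)
    print("no increase:", states_without_increase(STATES, UP_SR_LR_2001, UP_SR_LR_2011))
```

The same two functions apply unchanged to the SR-WPR columns of Table II.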

FIG. 3: Plot of the UP values for the gender-based (a) SR-LR and (b) SR-WPR for different states in the year 2011 vs. 2001, along with the corresponding linear fits. The black solid line represents the y = x line (linear growth).

TABLE II: The parameter estimates and the KS measures for different states, along with the total number (N) of districts and the UP values, for the gender-based analysis

Year 2011

| State | N | SR-LR â | SR-LR b̂ | SR-LR KS | SR-LR UP | SR-WPR â | SR-WPR b̂ | SR-WPR KS | SR-WPR UP |
|---|---|---|---|---|---|---|---|---|---|
| Andhra Pradesh | 23 | 0.079 | 0.027 | 0.003 | 69.24 | 0.010 | 0.292 | 0.004 | 68.77 |
| Arunachal Pradesh | 16 | -0.003 | 0.211 | 0.002 | 69.03 | 0.146 | 0.200 | 0.008 | 68.56 |
| Assam | 27 | 0.027 | 0.018 | 0.001 | 69.30 | 0.125 | 0.255 | 0.022 | 68.48 |
| Bihar | 38 | 0.031 | 0.036 | 0.001 | 69.29 | 0.178 | 0.162 | 0.010 | 68.60 |
| Chhattisgarh | 18 | 0.017 | 0.079 | 0.001 | 69.26 | 0.028 | 0.157 | 0.006 | 69.10 |
| Delhi | 9 | 0.052 | 0.007 | 0.001 | 69.29 | 0.145 | 0.174 | 0.012 | 68.65 |
| Gujarat | 26 | 0.026 | 0.064 | 0.002 | 69.26 | 0.294 | 0.208 | 0.012 | 67.64 |
| Haryana | 21 | -0.007 | 0.126 | 0.005 | 69.22 | 0.106 | 0.292 | 0.012 | 68.39 |
| Himachal Pradesh | 12 | 0.090 | 0.068 | 0.002 | 69.15 | 0.133 | 0.144 | 0.006 | 68.81 |
| Jammu and Kashmir | 22 | 0.075 | 0.084 | 0.002 | 69.15 | 0.130 | 0.407 | 0.009 | 67.74 |
| Jharkhand | 24 | 0.051 | 0.044 | 0.001 | 69.26 | 0.062 | 0.332 | 0.007 | 68.44 |
| Karnataka | 30 | 0.064 | 0.046 | 0.001 | 69.24 | 0.065 | 0.162 | 0.003 | 69.01 |
| Kerala | 14 | 0.023 | 0.046 | 0.002 | 69.28 | 0.039 | 0.292 | 0.014 | 68.65 |
| Madhya Pradesh | 50 | 0.046 | 0.059 | 0.001 | 69.25 | 0.073 | 0.329 | 0.006 | 68.50 |
| Maharashtra | 35 | 0.049 | 0.026 | 0.000 | 69.28 | 0.027 | 0.294 | 0.003 | 68.74 |
| Manipur | 9 | 0.065 | 0.025 | 0.005 | 69.26 | 0.056 | 0.111 | 0.005 | 69.13 |
| Meghalaya | 7 | 0.153 | 0.015 | 0.001 | 69.11 | 0.257 | -0.005 | 0.002 | 68.82 |
| Mizoram | 8 | -0.014 | 0.163 | 0.003 | 69.16 | 0.114 | 0.035 | 0.004 | 69.16 |
| Nagaland | 11 | 0.038 | 0.037 | 0.002 | 69.28 | -0.086 | 0.297 | 0.006 | 68.98 |
| Odisha | 30 | 0.046 | 0.062 | 0.002 | 69.24 | 0.023 | 0.702 | 0.014 | 67.03 |
| Punjab | 20 | 0.038 | 0.024 | 0.001 | 69.26 | 0.245 | 0.006 | 0.011 | 68.80 |
| Rajasthan | 33 | 0.034 | 0.081 | 0.001 | 69.24 | 0.086 | 0.141 | 0.003 | 69.01 |
| Tamil Nadu | 32 | 0.047 | 0.030 | 0.001 | 69.28 | 0.117 | 0.192 | 0.007 | 68.75 |
| Uttarakhand | 13 | 0.067 | 0.068 | 0.002 | 69.19 | -0.134 | 0.924 | 0.012 | 66.39 |
| Uttar Pradesh | 71 | 0.047 | 0.047 | 0.001 | 69.26 | 0.142 | 0.314 | 0.019 | 68.28 |
| West Bengal | 19 | -0.009 | 0.100 | 0.004 | 69.25 | 0.180 | 0.197 | 0.008 | 68.41 |

Year 2001

| State | N | SR-LR â | SR-LR b̂ | SR-LR KS | SR-LR UP | SR-WPR â | SR-WPR b̂ | SR-WPR KS | SR-WPR UP |
|---|---|---|---|---|---|---|---|---|---|
| Andhra Pradesh | 23 | 0.117 | 0.045 | 0.003 | 69.13 | -0.027 | 0.387 | 0.005 | 68.57 |
| Arunachal Pradesh | 13 | 0.038 | 0.203 | 0.006 | 68.95 | 0.101 | 0.231 | 0.003 | 68.63 |
| Assam | 23 | 0.026 | 0.037 | 0.003 | 69.29 | 0.193 | 0.314 | 0.014 | 67.81 |
| Bihar | 37 | 0.045 | 0.091 | 0.002 | 69.20 | 0.153 | 0.292 | 0.014 | 68.22 |
| Chhattisgarh | 16 | 0.089 | 0.073 | 0.005 | 69.14 | 0.056 | 0.116 | 0.008 | 69.13 |
| Delhi | 9 | 0.052 | 0.029 | 0.002 | 69.27 | 0.164 | 0.148 | 0.009 | 68.67 |
| Gujarat | 25 | 0.022 | 0.118 | 0.002 | 69.19 | 0.186 | 0.227 | 0.004 | 68.27 |
| Haryana | 19 | 0.063 | 0.036 | 0.001 | 69.25 | 0.041 | 0.445 | 0.007 | 68.02 |
| Himachal Pradesh | 12 | 0.134 | 0.082 | 0.005 | 69.00 | 0.166 | 0.054 | 0.002 | 68.97 |
| Jammu and Kashmir | 14 | 0.112 | 0.081 | 0.003 | 69.07 | 0.140 | 0.446 | 0.012 | 67.37 |
| Jharkhand | 18 | 0.097 | 0.099 | 0.002 | 69.07 | 0.041 | 0.389 | 0.014 | 68.27 |
| Karnataka | 27 | 0.104 | 0.064 | 0.002 | 69.13 | 0.034 | 0.196 | 0.004 | 69.00 |
| Kerala | 14 | 0.022 | 0.047 | 0.001 | 69.28 | 0.037 | 0.374 | 0.006 | 68.32 |
| Madhya Pradesh | 45 | 0.055 | 0.098 | 0.002 | 69.18 | 0.070 | 0.280 | 0.006 | 68.66 |
| Maharashtra | 35 | 0.067 | 0.047 | 0.001 | 69.23 | -0.022 | 0.381 | 0.012 | 68.61 |
| Manipur | 9 | 0.043 | 0.057 | 0.002 | 69.25 | 0.051 | 0.076 | 0.005 | 69.21 |
| Meghalaya | 7 | 0.142 | 0.063 | 0.003 | 69.03 | -0.010 | 0.208 | 0.002 | 69.05 |
| Mizoram | 8 | 0.006 | 0.118 | 0.002 | 69.21 | 0.096 | 0.023 | 0.003 | 69.21 |
| Nagaland | 8 | 0.065 | 0.053 | 0.002 | 69.22 | -0.172 | 0.480 | 0.020 | 68.61 |
| Odisha | 30 | 0.057 | 0.148 | 0.007 | 69.06 | 0.029 | 0.719 | 0.023 | 66.90 |
| Punjab | 17 | 0.038 | 0.037 | 0.001 | 69.28 | 0.172 | 0.170 | 0.005 | 68.56 |
| Rajasthan | 32 | 0.095 | 0.073 | 0.006 | 69.14 | 0.072 | 0.190 | 0.003 | 68.91 |
| Tamil Nadu | 30 | 0.068 | 0.036 | 0.002 | 69.24 | 0.027 | 0.323 | 0.005 | 68.63 |
| Uttarakhand | 13 | 0.034 | 0.134 | 0.003 | 69.13 | -0.185 | 1.054 | 0.020 | 69.13 |
| Uttar Pradesh | 70 | 0.061 | 0.101 | 0.002 | 69.17 | 0.098 | 0.578 | 0.022 | 67.45 |
| West Bengal | 18 | -0.015 | 0.161 | 0.004 | 69.17 | 0.181 | 0.354 | 0.008 | 67.64 |

IV. CONCLUSION

In this paper we have quantified the distributional uncertainty present in various socio-economic indicators of all the Indian states by studying the distribution of the districts within each individual state. The analysis is based on the DGB distribution, a typical rank-order distribution that fits data very well across many disciplines of the arts and sciences. Despite the immense diversity of human settlements and the extraordinary geographic variability, we have shown that all states obey the DGB distribution, not only for population size but also for other socio-economic factors such as literacy rate and work participation rate. The estimated pairs of the two model parameters for different states form a cluster, and this cluster structure shows no noticeable change over time. Our primary analytical focus here is the entropy of the distribution. The entropy values of a particular state differ across the factors. However, the Uncertainty Percentage (UP), a measure defined for the uncertainty, is almost the same for all the states, irrespective of their size, culture, and economic and social conditions. This is one of the most intriguing outcomes of this analysis. The results also indicate that this value of UP is nearly equal to the UP obtained when all the districts across the states are studied together. Remarkably, the study for different years also indicates a similar result. It is important to note that the proposed UP measure of distributional uncertainty is quite a general formulation. It can be applied to analyze the distributions of other important socio-economic variables of any country that has a stratified administrative structure. Besides such applications, it would also be important to investigate the detailed theoretical properties of the proposed UP measure in future research.
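The conclusion rests on the entropy of the DGB distribution. The UP formula itself is not restated in this section, so this minimal sketch stops at the Shannon entropy (in nats) of the normalised DGB rank distribution. It assumes p_r is proportional to (N + 1 - r)^b / r^a for r = 1, ..., N. It should not be read as the paper's UP computation. The function names are hypothetical. The only general fact used is that the entropy of a distribution over N ranks is at most ln N, with equality for the uniform case a = b = 0. The usage pairs are the 2011 population estimates for India and Kerala from Table I.

```python
import math
from typing import List


def dgb_probabilities(n: int, a: float, b: float) -> List[float]:
    """Normalised DGB rank distribution, p_r proportional to (n + 1 - r)**b / r**a."""
    weights = [(n + 1 - r) ** b / r ** a for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def shannon_entropy(p: List[float]) -> float:
    """Shannon entropy in nats, H = -sum(p * ln p), skipping zero terms."""
    return -sum(q * math.log(q) for q in p if q > 0)


if __name__ == "__main__":
    # (label, N, a-hat, b-hat) for population in 2011, taken from Table I.
    for label, n, a, b in (("India", 640, 0.252, 0.872), ("Kerala", 14, 0.056, 0.592)):
        h = shannon_entropy(dgb_probabilities(n, a, b))
        print(label, "H =", round(h, 3), "upper bound ln N =", round(math.log(n), 3))
```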

Acknowledgment: OM is thankful to the Indian Statistical Institute, Kolkata, India, for allowing her to work as a Visiting Student at the Physics and Applied Mathematics Unit of the Institute. SC thanks the Science and Engineering Research Board (SERB), Government of India, for financial support through a Junior Research Fellowship (Grant No. CRG/2019/001461). The research of AG and BB is partially supported by a research grant (No. CRG/2019/001461) from the Science and Engineering Research Board (SERB), Government of India.

Appendix A: Supplementary Tables and Figures

TABLE S1: Population (Pop), Literacy Rate (LR, %) and Work Participation Rate (WPR, %), along with the number of districts (N), for every state and Union Territory (UT) of India

| State | N (2011) | Pop (2011) | LR (2011) | WPR (2011) | N (2001) | Pop (2001) | LR (2001) | WPR (2001) |
|---|---|---|---|---|---|---|---|---|
| Andaman and Nicobar (UT) | 3 | 380581 | 77.32 | 40.08 | 2 | 356152 | 71.07 | 38.26 |
| Andhra Pradesh | 23 | 84580777 | 59.77 | 46.61 | 23 | 76210007 | 52.40 | 45.79 |
| Arunachal Pradesh | 16 | 1383727 | 55.36 | 42.47 | 13 | 1097968 | 44.15 | 43.98 |
| Assam | 27 | 31205576 | 61.46 | 38.36 | 23 | 26655528 | 52.58 | 35.78 |
| Bihar | 38 | 104099452 | 50.44 | 33.36 | 37 | 82998509 | 37.48 | 33.70 |
| Chandigarh (UT) | 1 | 1055450 | 76.31 | 38.29 | 1 | 900635 | 71.42 | 37.80 |
| Chhattisgarh | 18 | 25545198 | 60.21 | 47.68 | 16 | 20833803 | 53.63 | 46.46 |
| Dadra and Nagar Haveli (UT) | 1 | 343709 | 64.95 | 45.73 | 1 | 220490 | 47.12 | 51.76 |
| Daman and Diu (UT) | 2 | 243247 | 77.45 | 49.86 | 2 | 158204 | 68.01 | 46.01 |
| Delhi (UT) | 9 | 16787941 | 75.87 | 33.28 | 9 | 13850507 | 69.78 | 32.82 |
| Goa | 2 | 1458545 | 79.91 | 39.58 | 2 | 1347668 | 73.13 | 38.80 |
| Gujarat | 26 | 60439692 | 67.99 | 40.98 | 25 | 50671017 | 58.87 | 41.95 |
| Haryana | 21 | 25351462 | 65.48 | 35.17 | 19 | 21144564 | 57.20 | 39.62 |
| Himachal Pradesh | 12 | 6864602 | 73.42 | 51.85 | 12 | 6077900 | 66.50 | 49.24 |
| Jammu and Kashmir | 22 | 12541302 | 56.35 | 34.47 | 14 | 10143700 | 47.39 | 37.01 |
| Jharkhand | 24 | 32988134 | 55.56 | 39.71 | 18 | 26945829 | 43.71 | 37.52 |
| Karnataka | 30 | 61095297 | 66.53 | 45.62 | 27 | 52850562 | 57.59 | 44.53 |
| Kerala | 14 | 33406061 | 84.22 | 34.78 | 14 | 31841374 | 80.04 | 32.30 |
| Lakshadweep (UT) | 1 | 64473 | 81.51 | 29.09 | 1 | 60650 | 73.67 | 25.32 |
| Madhya Pradesh | 50 | 72626809 | 59.00 | 43.47 | 50 | 60348023 | 52.35 | 42.74 |
| Maharashtra | 35 | 112374333 | 72.57 | 43.99 | 35 | 96878627 | 66.03 | 42.50 |
| Manipur | 9 | 2855794 | 66.83 | 45.68 | 9 | 2293896 | 57.13 | 41.21 |
| Meghalaya | 7 | 2966889 | 60.16 | 39.96 | 7 | 2318822 | 49.93 | 41.84 |
| Mizoram | 8 | 1097206 | 77.30 | 44.36 | 8 | 888573 | 74.44 | 52.57 |
| Nagaland | 11 | 1978502 | 67.85 | 49.24 | 8 | 1990036 | 56.90 | 42.60 |
| Odisha | 30 | 41974218 | 63.71 | 41.79 | 30 | 36804660 | 53.90 | 38.79 |
| Puducherry (UT) | 4 | 1247953 | 76.71 | 35.66 | 4 | 974345 | 71.47 | 35.17 |
| Punjab | 20 | 27743338 | 67.43 | 35.67 | 17 | 24358999 | 60.58 | 37.47 |
| Rajasthan | 33 | 68548437 | 55.84 | 43.60 | 32 | 56507188 | 49.02 | 42.06 |
| Sikkim | 4 | 610577 | 72.87 | 50.47 | 4 | 540851 | 58.86 | 48.64 |
| Tamil Nadu | 32 | 72147030 | 71.85 | 45.58 | 30 | 62405679 | 64.94 | 44.67 |
| Tripura | 4 | 3673917 | 76.34 | 40.00 | 4 | 3199203 | 63.21 | 36.25 |
| Uttar Pradesh | 71 | 199812341 | 57.25 | 32.94 | 70 | 166197921 | 45.56 | 32.48 |
| Uttarakhand | 13 | 10086292 | 68.22 | 38.39 | 13 | 8489349 | 60.14 | 36.92 |
| West Bengal | 19 | 91276115 | 67.42 | 38.08 | 18 | 80176197 | 58.87 | 36.77 |

Supplementary figure panels: (a), (b), (c) Year 2011; (d), (e), (f) Year 2001.

Supplementary figure panels: (a), (b), (c) Year 2011; (d), (e), (f) Year 2001.