A nonparametric approach to assess undergraduate performance
Hildete P. Pinheiro, Pranab K. Sen, Aluísio Pinheiro, Samara F. Kiihl
Hildete P. Pinheiro PhD | Pranab K. Sen PhD | Aluísio Pinheiro PhD | Samara F. Kiihl PhD
Department of Statistics, University of Campinas, Campinas, São Paulo, Brazil
Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
Correspondence
Hildete P. Pinheiro PhD, Department of Statistics, University of Campinas, Campinas, São Paulo, Brazil
Email: [email protected]
Funding information
CNPq-Brazil, Grant/Award Number: 308439/2014-7, 308583/2015-9; Fapesp-Brazil, Grant/Award Number: 2011/15047-7, 2013/00506-1, 2016/07226-2.
Nonparametric methodologies are proposed to assess college students' performance. Emphasis is given to gender and sector of High School. The application concerns the University of Campinas, a research university in Southeast Brazil. In Brazil, college is based on a somewhat rigid set of subjects for each major. Hence, a student's relative performance cannot be accurately measured by the Grade Point Average or by any other single measure. We therefore define individual vectors of course grades. These vectors are used in pairwise comparisons of common subject grades for individuals who entered college in the same year. The relative college performance of any two students is compared to their relative performance on the Entrance Exam Score. A test based on generalized U-statistics is developed for the homogeneity of some predefined groups. The test statistic is asymptotically normal under both the null and the alternative hypotheses. The union intersection principle is employed to maximize power.
KEYWORDS
bootstrap, diversity measures, nonparametric methods, quasi U-statistics, union intersection principle, U-statistics

arXiv [stat.ME], Sep

PINHEIRO ET AL.

1 | INTRODUCTION
The goal of this work is to evaluate differences in students' performance from entrance to graduation in undergraduate courses. Several studies have addressed the assessment of undergraduate performance. Everett and Robins (1991), Birch and Miller (2006, 2007) and Win and Miller (2005) studied the influence of students' background and school factors on university performance, showing that students' success during their first year at university is largely influenced by their EES (Entrance Exam Score) and type of high school (government or non-government). Murray-Harvey (1993) used path analysis to identify characteristics of successful undergraduate students. Smith and Naylor (2005) and Dobson and Skuja (2005) studied schooling effects on university performance. Pedrosa et al. (2007) used hierarchical regression models, with the relative gain as response variable, to investigate demographic and socio-economic factors that influenced university performance; the relative gain is based on the relative rank of students' final (or last) recorded GPA (Grade Point Average) and their total EES rank. More recently, Grilli et al. (2015, 2016) used binomial mixture models and quantile regression to model the number of credits gained by freshmen during their first year in college, and Maia et al. (2016) used nonparametric methods based on quasi U-statistics (Pinheiro et al., 2009, 2011) to evaluate students' performance in different groups.

We present a real data set from the University of Campinas (Unicamp) in Section 2, where we give a descriptive analysis of the data and an evaluation of students' performance based only on the overall average EES and the overall GPA. However, the GPA may be biased, since students from different areas take subjects with different grading systems and different teachers. In view of this, we seek more robust methods to compare students' performance.
The proposed method looks at the EES rank and at all the grades in all courses taken by each individual, performing all pairwise comparisons among individuals who entered in the same year, in the same course/major, and took the same subject. Average distance measures within and between groups are defined for each year and then averaged over all years in the study. The decomposability of quasi U-statistics (Pinheiro et al., 2009, 2011) is applied to define these average distance measures within and between groups. A test statistic for homogeneity among groups is developed and its asymptotic normality is established under the null hypothesis. The alternative hypothesis is one-sided and, in order to maximize power, we use the union intersection principle (UIP) discussed in Silvapulle and Sen (2005) to test for contrasts of interest. We compare the performance of Unicamp's students according to sex and type of High School: Public High Schools (PuS) and Private High Schools (PrS).

2 | UNDERGRADUATE PERFORMANCE AT THE UNIVERSITY OF CAMPINAS
The dataset consists of 12168 students (57.3% male and 42.7% female) who enrolled at Unicamp from 2000 to 2005 in Bachelor's degree courses/majors in the areas of Arts (Ar), Health Sciences (HS), Engineering and Exact Sciences (EngES) and Social Sciences (SS). Since in 2005 Unicamp implemented an affirmative action program giving an extra bonus in the final EES to students who attended all High School years in Public School, the performance of these students from entrance to graduation was of great interest. For this reason, it was desirable that most students in the data set had already graduated. The academic status of these students was classified as follows: Graduates (students who had already graduated, 77.1%), Active (students who were still enrolled at the time the data were provided and had not yet graduated, 0.9%), and Others (students who dropped out of the University, 22.0%). Most students (94.3%) were between 16 and 23 years old; they came from all Brazilian regions and were enrolled in 45 different majors/courses in the areas of HS (19.8%), EngES (55.7%), SS (18.5%) and Ar (6%). About 70% of the students who enrolled between 2000 and 2005 came from Private High School (PrS). The groups of most interest in the analysis are defined by sex and type of High School, because previous work with Unicamp data (Pedrosa et al., 2007; Maia et al., 2016) showed some differences in performance according to these two factors. The distributions of sex and type of High School by year are shown in Tables 1 and 2, respectively. Table 3 shows the total number of students in each group of interest, i.e., by type of High School and sex.
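TABLE 1 Gender distribution by entrance year

| Sex | 2000 (%) | 2001 (%) | 2002 (%) | 2003 (%) | 2004 (%) | 2005 (%) | Total (n) | Total (%) |
|---|---|---|---|---|---|---|---|---|
| Male | 59.1 | 56.1 | 56.6 | 59.4 | 55.9 | 56.6 | 6969 | 57.3 |
| Female | 40.9 | 43.9 | 43.4 | 40.6 | 44.1 | 43.4 | 5199 | 42.7 |
| Total | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 12168 | 100.0 |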
TABLE 2 Distribution by entrance year according to type of High School

| High School | 2000 (%) | 2001 (%) | 2002 (%) | 2003 (%) | 2004 (%) | 2005 (%) | Total (n) | Total (%) |
|---|---|---|---|---|---|---|---|---|
| Private | 70.0 | 71.6 | 69.7 | 71.0 | 73.0 | 67.1 | 8429 | 70.4 |
| Public | 30.0 | 28.4 | 30.3 | 29.0 | 27.0 | 32.9 | 3543 | 29.6 |
| Total | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 11972* | 100.0 |

*There was no information about type of High School for 196 students.
TABLE 3 Total number of students by type of High School and sex

| Group | n | % |
|---|---|---|
| Male - Private | 4797 | 40 |
| Female - Private | 3632 | 30 |
| Male - Public | 2041 | 17 |
| Female - Public | 1502 | 13 |
| Total | 11972* | 100 |

*There was no information about type of High School for 196 students.

Figure 1 presents box plots of the sample distributions of the EES and of the GPA by gender and High School system. EES scores are standardized to have mean 500 and standard deviation 100; the GPA is the average of the course grades (which range from 0 to 10) weighted by the number of credits of each subject, and is expressed on a scale from 0 to 1. Students who studied in PrS perform better in the EES than those coming from PuS, irrespective of gender. Once they get into college, however, the GPA comparison seems to be reversed or, at least, the groups are tied on average. On the other hand, when one looks at the distribution of the GPA by sex, type of High School and area (Figure 2), the picture is less clear and differs across areas. The difference between sexes is small, especially in Exact Sciences and Engineering and in Social Sciences. In addition, there is a majority of male students in Engineering and Exact Sciences (70% males and 30% females), while in Health Sciences (41% males and 59% females) and Social Sciences (41% males and 59% females) the situation is reversed, and in Arts (51% males and 49% females) it is quite even. The point is that students from different courses/majors take different subjects with different grading systems. Therefore, the GPA might not be a good measure for comparing the performance of students from different areas or majors. Furthermore, there are many outliers in the lower tail of the GPA distribution. These are due to the 22% of students who dropped out of college: since Unicamp is a public university, its dropout rules are rather rigid, and students who perform poorly in the first years (e.g., fail all subjects in the first or second semester) have their enrollment canceled.
FIGURE 1 Left: Box plots of EES according to sex (M, F) and type of High School (Pr HS, Pu HS). Right: Box plots of GPA according to sex and type of High School.

FIGURE 2 Box plots of GPA for each area (panels: Arts, Social Sciences, Eng and Exact Sciences, Health Sciences) according to sex and type of High School. Arts: 51% males and 49% females; Social Sciences: 37% males and 63% females; Engineering and Exact Sciences: 70% males and 30% females; Health Sciences: 41% males and 59% females.
To investigate the dropout rate at Unicamp further, a logistic regression model was fitted with a dichotomous response (Active/Graduated versus dropout). The covariates in the model were Sex, Type of High School, Entrance year and Area. The significant main effects were Sex (p-value < .…), Area (p-value < .…) and Entrance year (p-value = 0.…). The main effect of type of High School was not significant (p-value = 0.…), but there were significant interactions between Sex and type of High School (p-value = 0.…) as well as between Area and Entrance year (p-value = 0.…). According to the parameter estimates, the highest dropout rates are among males, students from EngES and students who entered in 2003. The interaction between Sex and type of High School can be explained as follows: among students coming from PuS, the estimated odds ratio (male versus female) is 2.22, i.e., the odds of dropping out are 2.2 times higher for male than for female students, while among students coming from PrS, the estimated odds ratio is 1.59, i.e., the odds of dropping out are 1.6 times higher for male than for female students. In general, the dropout rate is higher for males than for females, and the gap between sexes is even larger among students coming from PuS.

If the analysis were based only on the final EES and the GPA, probabilities of concordant and discordant pairs could be defined as follows. A concordance of type 1 ($C^{(1)}_{gg'}$) occurs when students of group $g$ are better than those of group $g'$ in both the EES and the GPA; a discordance of type 1 ($D^{(1)}_{gg'}$) occurs when students of group $g'$ are better than those of group $g$ in the EES but worse in the GPA; a concordance of type 2 ($C^{(2)}_{gg'}$) occurs when students of group $g'$ are better than those of group $g$ in both the EES and the GPA; and a discordance of type 2 ($D^{(2)}_{gg'}$) occurs when students of group $g$ are better than those of group $g'$ in the EES but worse in the GPA. Thus, if the probability of type 1 discordance is greater than that of type 2, a student coming from group $g$ has a greater chance of performing better in college than one coming from group $g'$. Table 4 shows the proportions of concordance of type 1 ($C^{(1)}_{gg'}$), concordance of type 2 ($C^{(2)}_{gg'}$), discordance of type 1 ($D^{(1)}_{gg'}$) and discordance of type 2 ($D^{(2)}_{gg'}$). The first two lines of Table 4 show that women seem to be better than men in both the EES and the GPA in 35% and 36% of the pairs, in PuS and PrS respectively, against 28% and 27% of the pairs in which women are worse than men in the EES but better in the GPA. Line three of Table 4 shows that, in 31% of the pairs, female students from PuS were better in the EES and remained better in the GPA, which is not much different from the proportion of pairs in which female students from PrS were better in both (28%). Across the lines of Table 4 there is not much difference, but in every line the proportion of concordant pairs (0.57 to 0.59) is somewhat larger than that of discordant pairs (0.41 to 0.43).
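As an illustration of how proportions like those in Table 4 could be computed, the following Python sketch classifies every cross-group pair of students by their EES and GPA. It assumes each student is represented by an `(ees, gpa)` tuple and that pairs tied on either score are discarded, so that the reported proportions satisfy C + D = 1; the function name `pair_proportions` is hypothetical and not taken from the paper.

```python
from itertools import product


def pair_proportions(group_g, group_gp):
    """Proportions of concordant/discordant pairs between groups g and g'.

    group_g, group_gp: lists of (ees, gpa) tuples, one per student.
    Pairs tied on EES or on GPA are dropped, so the proportions are
    taken over untied pairs (an assumption of this sketch).
    """
    counts = {"C1": 0, "C2": 0, "D1": 0, "D2": 0}
    for (ees_i, gpa_i), (ees_j, gpa_j) in product(group_g, group_gp):
        if ees_i == ees_j or gpa_i == gpa_j:
            continue  # ties are not informative
        if ees_i > ees_j and gpa_i > gpa_j:
            counts["C1"] += 1  # g better in both EES and GPA
        elif ees_i < ees_j and gpa_i < gpa_j:
            counts["C2"] += 1  # g' better in both EES and GPA
        elif ees_i < ees_j:
            counts["D1"] += 1  # g worse in EES but better in GPA
        else:
            counts["D2"] += 1  # g better in EES but worse in GPA
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no untied pairs")
    props = {key: value / total for key, value in counts.items()}
    props["C"] = props["C1"] + props["C2"]
    props["D"] = props["D1"] + props["D2"]
    return props
```

Passing, for example, the F-PuS students as `group_g` and the M-PuS students as `group_gp` would give the counterparts of the first line of Table 4.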
TABLE 4 Proportion of concordant and discordant pairs by groups of interest according to final EES and GPA ($C = C^{(1)}_{gg'} + C^{(2)}_{gg'}$, $D = D^{(1)}_{gg'} + D^{(2)}_{gg'}$)

| Groups | $C^{(1)}_{gg'}$ | $C^{(2)}_{gg'}$ | $D^{(1)}_{gg'}$ | $D^{(2)}_{gg'}$ | $C$ | $D$ |
|---|---|---|---|---|---|---|
| g = F-PuS; g' = M-PuS | 0.35 | 0.22 | 0.28 | 0.15 | 0.57 | 0.43 |
| g = F-PrS; g' = M-PrS | 0.36 | 0.23 | 0.27 | 0.15 | 0.59 | 0.41 |
| g = F-PuS; g' = F-PrS | 0.31 | 0.28 | 0.23 | 0.19 | 0.58 | 0.42 |
| g = M-PuS; g' = M-PrS | 0.29 | 0.29 | 0.22 | 0.19 | 0.59 | 0.41 |

Because of the potential bias of the GPA due to different areas and different grading systems, we were motivated to seek more robust measures of academic performance. Therefore, the methods presented in Sections 3 and 4 perform all pairwise comparisons among students from the same course/major and year of entrance who took the same subject.

3 | NOTATION AND U-STATISTICS
Let $\mathbf{Z}_{iag} = (Z_{iag1}, \ldots, Z_{iagL_i})'$ be the vector of grades of student $i$, from group $g$, who entered in year $a$. Let $l = 1, \ldots, L_i$ index the subjects taken by student $i$, $L_i$ being the number of subjects taken by student $i$. Also, let $\bar{Z}_{ig}$ and $\bar{Z}_{jg}$ be the average EES of students $i$ and $j$, respectively.

Note that, although the $Z$'s are theoretically continuous random variables, we observe discrete grades, since they are rounded to one decimal place. So, let $Y_{iagl}$ be the discrete grade of student $i$. For instance, $Y_{iagl} \in \{0.0, 0.1, 0.2, \ldots, 10.0\}$, i.e., $Y_{iagl} = 0.0$ if $Z_{iagl} \in [0.00, 0.05)$, $Y_{iagl} = 0.1$ if $Z_{iagl} \in [0.05, 0.15)$, ..., $Y_{iagl} = 9.9$ if $Z_{iagl} \in [9.85, 9.95)$, and $Y_{iagl} = 10.0$ if $Z_{iagl} \in [9.95, 10.00]$. Analogously, we define $\bar{Y}_{ig}$ and $\bar{Y}_{jg}$ as the discrete versions of the EES $\bar{Z}_{ig}$ and $\bar{Z}_{jg}$ of students $i$ and $j$, respectively.

Let $L_a = \sum_{i=1}^{n_a} L_i$ be the total number of subjects that students from year $a$ can take, $L = \sum_{a=1}^{A} L_a$ the total number of subjects that students from years $a = 1, \ldots, A$ can take, $n_{agl}$ the number of students of group $g$ from year $a$ who took subject $l$, $l = 1, \ldots, L$, $n_{ag}$ the number of students of group $g$ from year $a$, and $n_a = \sum_{g=1}^{G} n_{ag}$ the number of students in year $a$. Typically, $L$ is large, but each student takes only a few subjects (say 8 or 10) per year, so that $\sum_{g=1}^{G} \sum_{l=1}^{L} n_{agl} \gg n_a$.

Also, let $\mathbb{I}_{agg'l}(i, j) = \mathbb{I}_{agl}(i) \times \mathbb{I}_{ag'l}(j)$, where
$$\mathbb{I}_{agl}(i) = \mathbb{I}(\text{student } i \text{ of year } a \text{ and group } g \text{ took subject } l), \qquad \mathbb{I}_{ag'l}(j) = \mathbb{I}(\text{student } j \text{ of year } a \text{ and group } g' \text{ took subject } l),$$
with $I(B) = 1$ if $B$ is true and $0$ otherwise (and analogously for $\mathbb{I}$).

Now define
$$\phi(\mathbf{Y}_{iag}, \mathbf{Y}_{jag'}) = \phi(Y_{iagl}, Y_{jag'l}, \bar{Y}_{ig}, \bar{Y}_{jg'}) = \big\{ I(Y_{iagl} > Y_{jag'l})\, I(\bar{Y}_{ig} < \bar{Y}_{jg'}) + I(Y_{iagl} < Y_{jag'l})\, I(\bar{Y}_{ig} > \bar{Y}_{jg'}) - I(Y_{iagl} > Y_{jag'l})\, I(\bar{Y}_{ig} > \bar{Y}_{jg'}) - I(Y_{iagl} < Y_{jag'l})\, I(\bar{Y}_{ig} < \bar{Y}_{jg'}) \big\}\, \mathbb{I}_{agg'l}(i, j) \tag{1}$$
as the kernel of a generalized U-statistic.

Note that, according to (1), ties, i.e., $Y_{iagl} = Y_{jag'l}$ or $\bar{Y}_{ig} = \bar{Y}_{jg'}$, carry no information: $\phi(\mathbf{Y}_{iag}, \mathbf{Y}_{jag'}) = 0$ whenever there is a tie.

Now define a U-statistic (Hoeffding, 1948; Lee, 1990) of degree 2 as
$$U_{naggl} = \binom{n_{agl}}{2}^{-1} \sum_{1 \le i < j \le n_{agl}} \phi(\mathbf{Y}_{iag}, \mathbf{Y}_{jag})$$
and a generalized U-statistic of degree (1,1) as
$$U_{nagg'l} = \frac{1}{n_{agl}\, n_{ag'l}} \sum_{i=1}^{n_{agl}} \sum_{j=1}^{n_{ag'l}} \phi(\mathbf{Y}_{iag}, \mathbf{Y}_{jag'}).$$

Now let the overall probability of concordance be $P(C_{agg'l}) = P(C^{(1)}_{agg'l}) + P(C^{(2)}_{agg'l})$, where $P(C^{(1)}_{agg'l}) = P(Y_{iagl} > Y_{jag'l}, \bar{Y}_{ig} > \bar{Y}_{jg'})$ and $P(C^{(2)}_{agg'l}) = P(Y_{iagl} < Y_{jag'l}, \bar{Y}_{ig} < \bar{Y}_{jg'})$ are the probabilities of concordance of types 1 and 2, respectively.
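The kernel (1) and the two U-statistics above can be sketched in Python as follows. This is a minimal illustration, assuming grades and EES values are already discretized and that, for a fixed year $a$ and subject $l$, the caller passes only the students who took $l$ (this selection plays the role of the indicator $\mathbb{I}_{agg'l}$); the function names are hypothetical.

```python
from itertools import combinations


def phi(y_i, y_j, ees_i, ees_j):
    """Kernel (1) for one subject that both students took.

    Returns +1 for a discordant pair, -1 for a concordant pair and 0 for a
    tie in the subject grade or in the (discretized) EES.
    """
    if y_i == y_j or ees_i == ees_j:
        return 0
    same_direction = (y_i > y_j) == (ees_i > ees_j)
    return -1 if same_direction else 1


def u_within(records):
    """Degree-2 U-statistic U_{naggl}.

    records: (grade, ees) pairs for the students of one group and year who
    took subject l.
    """
    n = len(records)
    if n < 2:
        raise ValueError("need at least two students")
    total = sum(phi(y_i, y_j, e_i, e_j)
                for (y_i, e_i), (y_j, e_j) in combinations(records, 2))
    return total / (n * (n - 1) / 2)


def u_between(records_g, records_gp):
    """Degree-(1,1) generalized U-statistic U_{nagg'l}."""
    total = sum(phi(y_i, y_j, e_i, e_j)
                for y_i, e_i in records_g
                for y_j, e_j in records_gp)
    return total / (len(records_g) * len(records_gp))
```

Because $\phi$ is symmetric in $(i, j)$, summing over $i < j$ in `u_within` matches the definition of $U_{naggl}$.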
Analogously, the overall probability of discordance is $P(D_{agg'l}) = P(D^{(1)}_{agg'l}) + P(D^{(2)}_{agg'l})$, where $P(D^{(1)}_{agg'l}) = P(Y_{iagl} > Y_{jag'l}, \bar{Y}_{ig} < \bar{Y}_{jg'})$ and $P(D^{(2)}_{agg'l}) = P(Y_{iagl} < Y_{jag'l}, \bar{Y}_{ig} > \bar{Y}_{jg'})$ are the probabilities of discordance of types 1 and 2, respectively. Also, let
$$\theta_{agg'l} = E\{\phi(Y_{iagl}, Y_{jag'l}, \bar{Y}_{ig}, \bar{Y}_{jg'})\} = P(D_{agg'l}) - P(C_{agg'l}) \tag{2}$$
and
$$\nu_{agg'l} = E\{[\phi(Y_{iagl}, Y_{jag'l}, \bar{Y}_{ig}, \bar{Y}_{jg'})]^2\} = P(D_{agg'l}) + P(C_{agg'l}) < 1, \tag{3}$$
since ties contribute zero. Analogously,
$$\theta_{aggl} = E\{\phi(Y_{iagl}, Y_{jagl}, \bar{Y}_{ig}, \bar{Y}_{jg})\} = P(D_{aggl}) - P(C_{aggl}) \tag{4}$$
and
$$\nu_{aggl} = E\{[\phi(Y_{iagl}, Y_{jagl}, \bar{Y}_{ig}, \bar{Y}_{jg})]^2\} = P(D_{aggl}) + P(C_{aggl}) < 1. \tag{5}$$

Using Hoeffding's decomposition (Hoeffding, 1948), we have
$$U_{naggl} = \theta_{aggl} + \frac{2}{n_{agl}} \sum_{i=1}^{n_{agl}} [\Psi_1(\mathbf{Y}_{iag}) - \theta_{aggl}] + U^{*(2)}_{naggl}$$
and
$$U_{nagg'l} = \theta_{agg'l} + \frac{1}{n_{agl}} \sum_{i=1}^{n_{agl}} [\Psi_{10}(\mathbf{Y}_{iag}) - \theta_{agg'l}] + \frac{1}{n_{ag'l}} \sum_{j=1}^{n_{ag'l}} [\Psi_{01}(\mathbf{Y}_{jag'}) - \theta_{agg'l}] + U^{*(2)}_{nagg'l},$$
where $\Psi_1(\mathbf{Y}_{iag}) = E[\phi(\mathbf{Y}_{iag}, \mathbf{Y}_{jag}) \mid \mathbf{Y}_{iag}]$, $U^{*(2)}_{naggl} = O_p(n_{agl}^{-1})$, $\Psi_{10}(\mathbf{Y}_{iag}) = E[\phi(\mathbf{Y}_{iag}, \mathbf{Y}_{jag'}) \mid \mathbf{Y}_{iag}]$, $\Psi_{01}(\mathbf{Y}_{jag'}) = E[\phi(\mathbf{Y}_{iag}, \mathbf{Y}_{jag'}) \mid \mathbf{Y}_{jag'}]$ and $U^{*(2)}_{nagg'l} = O_p(n_{agl}^{-1} + n_{ag'l}^{-1} + n_{agl}^{-1/2} n_{ag'l}^{-1/2})$.

Since $U_{naggl}$ and $U_{nagg'l}$ are U-statistics, they are asymptotically normal (Hoeffding, 1948):
$$\sqrt{n_{agl}}\,(U_{naggl} - \theta_{aggl}) \xrightarrow{D} N(0, 4\xi_1), \qquad \xi_1 = E[\Psi_1^2(\mathbf{Y}_{iag})] - \theta_{aggl}^2,$$
and
$$\gamma_n^{-1/2}(U_{nagg'l} - \theta_{agg'l}) \xrightarrow{D} N(0, 1), \qquad \gamma_n = \frac{\xi_{10}}{n_{agl}} + \frac{\xi_{01}}{n_{ag'l}},$$
with $\xi_{10} = E[\Psi_{10}^2(\mathbf{Y}_{iag})] - \theta_{agg'l}^2$ and $\xi_{01} = E[\Psi_{01}^2(\mathbf{Y}_{jag'})] - \theta_{agg'l}^2$.

Now define a quasi U-statistic (Pinheiro et al., 2009, 2011) as
$$B_{nagg'l} = w_{nagg'l}\{2U_{nagg'l} - U_{naggl} - U_{nag'g'l}\}, \tag{6}$$
with
$$w_{nagg'l} = \left[\frac{n_{agl}\, n_{ag'l}}{n_{agl} + n_{ag'l}}\right] \left[\sum_{g, g'} \frac{n_{agl}\, n_{ag'l}}{n_{agl} + n_{ag'l}}\right]^{-1}. \tag{7}$$

Note that
$$w_{nagg'l} = \begin{cases} O(1), & \text{if both } n_{agl} \text{ and } n_{ag'l} \text{ are large},\\ O(n^{-1}), & \text{if at least one of } n_{agl}, n_{ag'l} \text{ is small}, \end{cases}$$
where $n = \min\{n_{agl}, n_{ag'l}\}$.

Now let $B_{nagg'l} = B^{(1)}_{nagg'l} + B^{(2)}_{nagg'l}$, where the $B^{(1)}_{nagg'l}$ are the terms with $w_{nagg'l} = O(1)$ and the $B^{(2)}_{nagg'l}$ are those with $w_{nagg'l} = O(n^{-1})$.

Finally, define the test statistic as an overall sample measure of divergence between groups:
$$B_{ngg'} = \sum_{a=1}^{A} \sum_{l=1}^{L} B_{nagg'l} = \sum_{a=1}^{A} \left[ \sum_{l=1}^{L^{(1)}_a} B^{(1)}_{nagg'l} + \sum_{l=1}^{L^{(2)}_a} B^{(2)}_{nagg'l} \right] = B^{(1)}_{ngg'} + B^{(2)}_{ngg'}, \tag{8}$$
where $L^{(1)}_a$ is the number of subjects with large $n_{agl}$ and $n_{ag'l}$, and $L^{(2)}_a$ is the number of subjects with small $n_{agl}$ or $n_{ag'l}$.

Under the null hypothesis of homogeneity among groups, there is no difference in performance between the entrance grade and the grades in the courses taken at the University, neither within nor between groups, i.e.,
$$H_0: \theta_{aggl} = \theta_{ag'g'l} = \theta_{agg'l}, \quad \forall\, 1 \le g < g' \le G,\ \forall\, a = 1, \ldots, A,\ \forall\, l = 1, \ldots, L,$$
or $H_0: 2\theta_{agg'l} - \theta_{aggl} - \theta_{ag'g'l} = 0$.
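A minimal Python sketch of the quasi U-statistic (6), its weight (7) and the aggregation (8) is given below, assuming the three U-statistics for a given year and subject are already available. The normalizing sum in (7) is passed in as `weight_total`, because the exact set of group pairs it runs over is treated as an assumption here; all names are hypothetical.

```python
def harmonic_term(n_g, n_gp):
    """n_g * n_g' / (n_g + n_g'), the building block of the weights (7)."""
    return n_g * n_gp / (n_g + n_gp)


def quasi_u(u_gg_prime, u_gg, u_gpgp, n_g, n_gp, weight_total):
    """B_{nagg'l} = w {2 U_{nagg'l} - U_{naggl} - U_{nag'g'l}}, as in (6).

    weight_total is the normalizing sum in (7); which group pairs enter
    that sum is left to the caller (an assumption of this sketch).
    """
    w = harmonic_term(n_g, n_gp) / weight_total
    return w * (2 * u_gg_prime - u_gg - u_gpgp)


def overall_b(cells):
    """(8): sum B_{nagg'l} over all (year, subject) cells of one pair (g, g')."""
    return sum(cells.values())
```

When the within- and between-group U-statistics are equal, the braces in (6) vanish, which is the sample analogue of $2\theta_{al} - \theta_{al} - \theta_{al} = 0$ under $H_0$.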
Although $-1 \le \theta_{agg'l} \le 1$, there are several situations of interest under which $H_1: 2\theta_{agg'l} - \theta_{aggl} - \theta_{ag'g'l} > 0$ (see the Appendix for details of the one-sided alternative).

Note that for $G$ groups there are $G(G-1)/2 = G^*$ group comparisons. Therefore, we may define a $G^* \times 1$ vector $\mathbf{B}_n = \mathbf{B}^{(1)}_n + \mathbf{B}^{(2)}_n$ with elements $B_{ngg'}$.

4 | HYPOTHESES AND TESTING PROCEDURE
Concordant pairs are those in which individual $i$ had a better (worse) grade than $j$ in the entrance exam and continued to be better (worse) than $j$ in the course grade at the University. Suppose that individual $i$ came from a Public High School and $j$ from a Private High School. A discordant pair of type I occurs when individual $i$ had a worse performance than $j$ in the EES but a better performance than $j$ in the course grade. A discordant pair of type II is the opposite situation, i.e., $i$ had a better performance than $j$ in the EES but a worse performance than $j$ in the course grade. From this setup, if the probability of type I discordance is greater than that of type II, the student coming from a Public High School has a greater chance of performing better at the University than one coming from a Private High School.

If there is no difference in performance between groups $g$ and $g'$, $H_0$ is true and $\theta_{aggl} = \theta_{ag'g'l} = \theta_{agg'l} = \theta_{al}$, so that $E\{B_{nagg'l}\} = w_{nagg'l}[2\theta_{al} - \theta_{al} - \theta_{al}] = 0$; the alternative is $H_1: 2\theta_{agg'l} - \theta_{aggl} - \theta_{ag'g'l} > 0$. Hence, $E_{H_0}\{B_{ngg'}\} = 0$ and $E_{H_1}\{B_{ngg'}\} > 0$. See the Appendix for a justification of the one-sided alternative.

Note that the elements of $\mathbf{B}_n$ are the ordered $B_{ngg'}$'s, say $B_{n1}, \ldots, B_{nG^*}$, where $G^* = \binom{G}{2}$, and they are not all independent. For instance, for $G = 3$, $\mathbf{B}_n = (B_{n12}, B_{n13}, B_{n23})'$, with $G^* = \binom{3}{2} = 3$. Therefore, an adjustment similar to multiple comparison procedures is needed. Different tests can be built according to the question of interest. For instance, with two factors, say gender (F: Female, M: Male) and type of High School (Pu: Public, Pr: Private), we may have statistics for the main effects and for the interaction. In this case there are four groups (F-Pu, F-Pr, M-Pu, M-Pr) and, for simplicity of notation, $g = 1, \ldots, 4$, with $1 \to$ F-Pu, $2 \to$ F-Pr, $3 \to$ M-Pu and $4 \to$ M-Pr.
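The ordering of the elements of $\mathbf{B}_n$ and the four-group coding can be made explicit with a short Python sketch; the helper names and the label dictionary are hypothetical, chosen only to mirror the coding in the text.

```python
from itertools import combinations

# Group coding from the text: 1 -> F-Pu, 2 -> F-Pr, 3 -> M-Pu, 4 -> M-Pr.
GROUP_LABELS = {1: "F-Pu", 2: "F-Pr", 3: "M-Pu", 4: "M-Pr"}


def ordered_pairs(n_groups):
    """All pairs (g, g') with g < g' in lexicographic order: the order of B_n."""
    return list(combinations(range(1, n_groups + 1), 2))


def describe(pair):
    """Human-readable label for a pair, e.g. (1, 3) -> 'F-Pu vs M-Pu'."""
    g, g_prime = pair
    return f"{GROUP_LABELS[g]} vs {GROUP_LABELS[g_prime]}"
```

With $G = 4$ this yields the six pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4); for instance, $B_{n13}$ compares F-Pu with M-Pu, i.e., the effect of sex among PuS students.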
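Then, the within and between group performance measures are
$$\begin{array}{cccc} \theta_{a11l} & \theta_{a12l} & \theta_{a13l} & \theta_{a14l} \\ & \theta_{a22l} & \theta_{a23l} & \theta_{a24l} \\ & & \theta_{a33l} & \theta_{a34l} \\ & & & \theta_{a44l} \end{array}$$
In order to maximize the power of the tests, we may use the union intersection principle (UIP) discussed in Silvapulle and Sen (2005) to test for the main effects of Sex and Type of High School, as well as for the interaction effect.

For testing Female × Male:
$$H_0: 2\theta_{a13l} - \theta_{a11l} - \theta_{a33l} + 2\theta_{a24l} - \theta_{a22l} - \theta_{a44l} = 0, \qquad H_1: 2\theta_{a13l} - \theta_{a11l} - \theta_{a33l} + 2\theta_{a24l} - \theta_{a22l} - \theta_{a44l} > 0, \tag{9}$$
and, writing $\Theta_{gg'} = 2\theta_{agg'l} - \theta_{aggl} - \theta_{ag'g'l}$, the hypotheses become
$$H_0: C_1\Theta = 0 \quad \text{vs.} \quad H_1: C_1\Theta > 0, \tag{10}$$
with $C_1 = (0, 1, 0, 0, 1, 0)$ and $\Theta = (\Theta_{12}, \Theta_{13}, \Theta_{14}, \Theta_{23}, \Theta_{24}, \Theta_{34})'$.

For testing Public × Private:
$$H_0: 2\theta_{a12l} + 2\theta_{a34l} - (\theta_{a11l} + \theta_{a22l} + \theta_{a33l} + \theta_{a44l}) = 0, \qquad H_1: 2\theta_{a12l} + 2\theta_{a34l} - (\theta_{a11l} + \theta_{a22l} + \theta_{a33l} + \theta_{a44l}) > 0, \tag{11}$$
or
$$H_0: C_2\Theta = 0 \quad \text{vs.} \quad H_1: C_2\Theta > 0, \tag{12}$$
with $C_2 = (1, 0, 0, 0, 0, 1)$ and $\Theta$ as above.

For testing the interaction Sex × Type of High School:
$$H_0: \begin{cases} 2\theta_{a12l} - \theta_{a11l} - \theta_{a22l} - 2\theta_{a34l} + \theta_{a33l} + \theta_{a44l} = 0, \\ 2\theta_{a13l} - \theta_{a11l} - \theta_{a33l} - 2\theta_{a24l} + \theta_{a22l} + \theta_{a44l} = 0, \end{cases} \qquad H_1: \text{at least one of these contrasts is} \neq 0, \tag{13}$$
or
$$H_0: C_3\Theta = \mathbf{0} \quad \text{vs.} \quad H_1: C_3\Theta \neq \mathbf{0}, \tag{14}$$
with
$$C_3 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 & -1 & 0 \end{pmatrix}$$
and $\Theta$ as above.

Then we define $\mathbf{B}_n = (B_{n1}, \ldots, B_{nG^*})'$ as the vector of the ordered $B_{ngg'}$'s. In this case, $\mathbf{B}_n = (B_{n12}, B_{n13}, \ldots, B_{n34})'$ is a $6 \times 1$ vector. Also, $\mathbf{T}_n = C\mathbf{B}_n$ is the vector of linear combinations of the elements of $\mathbf{B}_n = \mathbf{B}^{(1)}_n + \mathbf{B}^{(2)}_n$, so that we may write $\mathbf{T}_n = C\mathbf{B}^{(1)}_n + C\mathbf{B}^{(2)}_n = \mathbf{T}^{(1)}_n + \mathbf{T}^{(2)}_n$.

Theorem 1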
Let $C$ be a matrix of contrasts and $\mathbf{B}^{(2)}_n$ a vector with elements $B^{(2)}_{ngg'} = \sum_{a=1}^{A} \sum_{l=1}^{L^{(2)}_a} B^{(2)}_{nagg'l}$, with $w_{nagg'l} = O(n^{-1})$. Then $\sqrt{n}\, C\mathbf{B}^{(2)}_n \xrightarrow{L_2} \mathbf{0}$.

Proof: The elements of $\mathbf{B}^{(2)}_n$ are $B^{(2)}_{ngg'} = \sum_{a=1}^{A} \sum_{l=1}^{L^{(2)}_a} B^{(2)}_{nagg'l}$ with $w_{nagg'l} = O(n^{-1})$. Then $B^{(2)}_{ngg'} = O_p(n^{-1})$ and $\mathrm{Var}(\sqrt{n}\, B^{(2)}_{ngg'}) = O(n^{-1})$, which implies $E(n \|C\mathbf{B}^{(2)}_n\|^2) \to 0$, i.e., $\sqrt{n}\, C\mathbf{B}^{(2)}_n \xrightarrow{L_2} \mathbf{0}$. $\square$

Theorem 2
Let $C$ be a matrix of contrasts, $\mathbf{T}_n = C\mathbf{B}_n$ and $\sqrt{n}\,\mathbf{B}_n \xrightarrow{d} N(\mathbf{0}, \Sigma)$. If $V_n \xrightarrow{p} \Sigma_C$, with $\Sigma_C = C\Sigma C'$, then $n\,\mathbf{T}_n' V_n^{-1} \mathbf{T}_n \xrightarrow{d} \chi^2[\mathrm{rank}(V_n^{-1})]$.

Proof: According to Pinheiro et al. (2009), the elements of $\mathbf{B}_n$ are quasi U-statistics and $\mathbf{B}_n$ has an asymptotically multivariate normal distribution with covariance matrix $\Sigma$, i.e., $\sqrt{n}\,\mathbf{B}_n \xrightarrow{d} N(\Theta, \Sigma)$. As $\mathbf{T}_n$ is a linear combination of an asymptotically multivariate normal vector, $\sqrt{n}\,\mathbf{T}_n \xrightarrow{d} N(C\Theta, \Sigma_C)$, where $\Sigma_C = C\Sigma C'$, i.e., $\sqrt{n}\,\mathbf{T}_n \xrightarrow{d} X$, where $X \sim N(\mu, \Sigma_C)$, with $\mu = C\Theta$.

By Searle (1971, Theorem 2, p. 57), if $X \sim N(\mu, \Sigma_C)$, then $X'AX \sim \chi^2[\mathrm{rank}(A), \mu'A\mu/2]$ if and only if $A\Sigma_C$ is idempotent. Taking $A = \Sigma_C^{-1}$, $A\Sigma_C = I$ is idempotent, so $X'\Sigma_C^{-1}X \sim \chi^2[\mathrm{rank}(\Sigma_C^{-1}), \mu'\Sigma_C^{-1}\mu/2]$. Therefore, under $H_0: \mu = C\Theta = \mathbf{0}$, $X'\Sigma_C^{-1}X \sim \chi^2[\mathrm{rank}(\Sigma_C^{-1})]$.

If we now take $V_n = \widehat{\Sigma}_C$ with $V_n^{-1} \xrightarrow{p} \Sigma_C^{-1}$, then $X'V_n^{-1}X - X'\Sigma_C^{-1}X = X'(V_n^{-1} - \Sigma_C^{-1})X = o_p(1)$. Also, $n\mathbf{T}_n'V_n^{-1}\mathbf{T}_n - n\mathbf{T}_n'\Sigma_C^{-1}\mathbf{T}_n = n\mathbf{T}_n'(V_n^{-1} - \Sigma_C^{-1})\mathbf{T}_n = o_p(1)$. As $\sqrt{n}\,\mathbf{T}_n \xrightarrow{d} X$ and $V_n^{-1} \xrightarrow{p} \Sigma_C^{-1}$, by Slutsky's theorem $n\mathbf{T}_n'V_n^{-1}\mathbf{T}_n \xrightarrow{d} X'\Sigma_C^{-1}X$ and, under $H_0$, $n\mathbf{T}_n'V_n^{-1}\mathbf{T}_n \xrightarrow{d} \chi^2[\mathrm{rank}(V_n^{-1})]$. $\square$

Now consider the set $\mathcal{P}$ of all $p$-vectors $\mathbf{a} = (a_1, \ldots, a_p)'$, where each $a_j$ is either 0 or 1, and partition $\mathbf{T}_n$ and $V_n$ as
$$\mathbf{T}_n = (\mathbf{T}_{na}', \mathbf{T}_{na'}')', \qquad V_n = \begin{pmatrix} V_{naa} & V_{naa'} \\ V_{na'a} & V_{na'a'} \end{pmatrix},$$
with $a'$ the complement of $a$, $\emptyset \subseteq a \subseteq \mathcal{P}$. Further, define
$$\mathbf{T}_{na:a'} = \mathbf{T}_{na} - V_{naa'}V_{na'a'}^{-1}\mathbf{T}_{na'} \quad \text{and} \quad V_{naa:a'} = V_{naa} - V_{naa'}V_{na'a'}^{-1}V_{na'a}, \qquad \forall\, \emptyset \subseteq a \subseteq \mathcal{P}.$$
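The quadratic form of Theorem 2 and a one-sided statistic built from $\mathbf{T}_{na:a'}$ and $V_{naa:a'}$ can be sketched numerically as below. This is an illustration under stated assumptions, not the authors' code: `v` is taken to be an estimate of $C\Sigma C'$, the chi-square tail is computed with a standard recurrence for integer degrees of freedom to avoid extra dependencies, and `uip_statistic` returns the first term of the union-intersection sum whose indicator equals one (the indicators are expected to select a single term). The chi-bar-square weights $w_k$ are not computed.

```python
import math
from itertools import combinations

import numpy as np

# Interaction contrast C_3 for Theta = (Theta12, Theta13, Theta14, Theta23, Theta24, Theta34)'.
C3 = np.array([[1, 0, 0, 0, 0, -1],
               [0, 1, 0, 0, -1, 0]], dtype=float)


def chi2_sf(x, df):
    """Upper tail P(chi2_df > x) for integer df >= 1, via the recurrence
    Q(df + 2) = Q(df) + (x/2)**(df/2) * exp(-x/2) / Gamma(df/2 + 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def wald_test(t, v, n):
    """Quadratic form n T' V^{-1} T of Theorem 2 and its chi-square p-value."""
    t = np.asarray(t, dtype=float)
    v_inv = np.linalg.inv(np.asarray(v, dtype=float))
    stat = float(n * t @ v_inv @ t)
    df = int(np.linalg.matrix_rank(v_inv))
    return stat, chi2_sf(stat, df)


def uip_statistic(t, v, n):
    """Scan subsets a of {0, ..., p-1}; return the term whose indicator
    1(T_{na:a'} > 0, V_{na'a'}^{-1} T_{na'} <= 0) equals one."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    p = t.size
    for size in range(p + 1):
        for a in map(list, combinations(range(p), size)):
            ac = [k for k in range(p) if k not in a]  # complement a'
            if not ac:  # a is the full index set
                t_c, v_c = t, v
            else:
                v_inv_cc = np.linalg.inv(v[np.ix_(ac, ac)])
                if np.any(v_inv_cc @ t[ac] > 0):
                    continue  # requires V_{a'a'}^{-1} T_{a'} <= 0
                if not a:  # empty a: the term is zero
                    return 0.0
                t_c = t[a] - v[np.ix_(a, ac)] @ v_inv_cc @ t[ac]
                v_c = v[np.ix_(a, a)] - v[np.ix_(a, ac)] @ v_inv_cc @ v[np.ix_(ac, a)]
            if np.all(t_c > 0):
                return float(n * t_c @ np.linalg.solve(v_c, t_c))
    return 0.0
```

For the two-sided interaction test one would pass `t = C3 @ b_n` and `v = C3 @ sigma_hat @ C3.T` to `wald_test`, where `b_n` and `sigma_hat` are hypothetical estimates of $\mathbf{B}_n$ and $\Sigma$. With $p = 1$, `uip_statistic` reduces to $\mathbb{1}(T_n > 0)\, nT_n^2/V_n$.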
Then, for the hypotheses given in (10) and (12), we may use the union intersection principle (Silvapulle and Sen, 2005), with test statistic
$$L_n = \sum_{\emptyset \subseteq a \subseteq \mathcal{P}} \mathbb{1}\big(\mathbf{T}_{na:a'} > \mathbf{0},\ V_{na'a'}^{-1}\mathbf{T}_{na'} \le \mathbf{0}\big)\,\big(n\,\mathbf{T}_{na:a'}' V_{naa:a'}^{-1} \mathbf{T}_{na:a'}\big). \tag{15}$$
Under $H_0$,
$$L_n \xrightarrow{d} \sum_{k=0}^{p} w_k \chi^2_k, \tag{16}$$
where the $\chi^2_k$ are independent chi-square random variables with $k$ ($= 0, 1, \ldots, p$) degrees of freedom, and the weights $w_k$ are approximated by normal orthant probabilities with respect to $V_n$, with the subsets $a$ ($\emptyset \subseteq a \subseteq \mathcal{P}$) grouped by their cardinality.

For the hypotheses given by (14), Theorem 2 gives, under $H_0$,
$$L_n = n\,\mathbf{T}_n' V_n^{-1} \mathbf{T}_n \xrightarrow{d} \chi^2\big[\mathrm{rank}(V_n^{-1})\big]. \tag{17}$$

When testing the difference in performance between students coming from Public and Private High Schools with the one-sided test $L_n$, we can detect the direction of the difference, i.e., whether students from PuS perform better at the University than students coming from PrS or the other way around. We may use:
• $T_n = B_{n13} + B_{n24}$, i.e., $C_1 = (0, 1, 0, 0, 1, 0)$, to test $H_0$ according to (10), with $L_n = \mathbb{1}(T_n > 0)\, n T_n^2 / S^2$;
• $T_n = B_{n12} + B_{n34}$, i.e., $C_2 = (1, 0, 0, 0, 0, 1)$, to test $H_0$ according to (12), with $L_n = \mathbb{1}(T_n > 0)\, n T_n^2 / S^2$; and
• in the case of a two-sided alternative,
$$\mathbf{T}_n = \begin{pmatrix} B_{n12} - B_{n34} \\ B_{n13} - B_{n24} \end{pmatrix}, \quad \text{i.e.,} \quad C_3 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 & -1 & 0 \end{pmatrix},$$
to test $H_0$ according to (14), with $L_n = n\,\mathbf{T}_n' V_n^{-1} \mathbf{T}_n$.

5 | APPLICATION
We apply the proposed test procedures to the University of Campinas data described in Section 2, using the same dataset for the methods of Sections 3 and 4. The main interest is in testing the following null hypotheses:
• H0: there is no difference in performance between female and male students;
• H0: there is no difference in performance between students coming from Public and Private High Schools;
• H0: there is no interaction between sex and type of High School;
addressed in Sections 5.1, 5.2 and 5.3, respectively. To apply the methods of Sections 3 and 4, the data are separated into groups according to sex and type of High School.

5.1 | Test of homogeneity between male and female students
For testing Female × Male, the hypotheses $H_0: \Theta_{13} + \Theta_{24} = 0$ versus $H_1: \Theta_{13} + \Theta_{24} > 0$ are given by (10), with the test statistic based on $T_n = B_{n13} + B_{n24}$, namely $L_n = \mathbb{1}(T_n > 0)\, n T_n^2 / \sigma^2$. Since the p-value equals one, there is no evidence against the hypothesis of homogeneity between sexes. Note that the observed value of the test statistic, $L_{n,\text{obs}}$, is 0: the test is one-sided and, because of the indicator function in (15), $L_n = 0$ whenever $T_n \le 0$.

Figure 3 shows the empirical distributions of $B_{n13}$ (effect of sex in PuS, i.e., $H_0: \Theta_{13} = 0$) and $B_{n24}$ (effect of sex in PrS, i.e., $H_0: \Theta_{24} = 0$) under the null hypothesis. The observed values of both statistics are negative, with p-values 0.387 and 1, respectively. Therefore, there is no evidence of a change in the direction of relative performance between female and male students.
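For the scalar case used here ($p = 1$), the one-sided statistic reduces to $L_n = \mathbb{1}(T_n > 0)\, nT_n^2/\sigma^2$. The Python sketch below, with hypothetical names, shows why a negative observed $T_n$ gives $L_{n,\text{obs}} = 0$ and hence an empirical p-value of 1: every null replicate of $L_n$ is non-negative. How the null replicates are generated is assumed to be given and is not reproduced here.

```python
def one_sided_stat(t_n, s2, n):
    """L_n = 1(T_n > 0) * n * T_n**2 / s2, the scalar (p = 1) one-sided statistic."""
    return n * t_n ** 2 / s2 if t_n > 0 else 0.0


def empirical_p_value(l_obs, l_null):
    """Share of null replicates of L_n that are >= the observed value.

    How the null replicates are generated (e.g. by resampling) is not
    specified here; l_null is simply assumed to be given.
    """
    l_null = list(l_null)
    return sum(l >= l_obs for l in l_null) / len(l_null)
```

Since `one_sided_stat` never returns a negative value, `empirical_p_value(0.0, ...)` is always 1, which matches the p-value reported above for the test of homogeneity between sexes.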
FIGURE 3 Empirical distributions of $B_{n13}$ and $B_{n24}$ under the hypothesis of homogeneity between sexes (panels: histograms of B13 and B24; frequency versus value of the statistic).

5.2 | Test of homogeneity between students from public and private high schools
For this test, we assume that students coming from PrS perform better than those from PuS, and we would like to test whether performance keeps the same direction during undergraduate school. Figure 4 presents the empirical distributions of $B_{n34}$ (effect of High School among Males) and $B_{n12}$ (effect of High School among Females) under the null hypothesis of homogeneity between types of High School. The observed values of the two statistics are -42.66 and -45.93, with p-values 0.990 and 0.995, respectively. The p-value of the test given by (12), with the one-sided statistic (15), is 1, for the same reason given above for the test of homogeneity between sexes. From Figure 4, the effect of High School is similar in both sexes, with no evidence for rejecting the respective null hypotheses. So, performance keeps the same direction, i.e., PrS students perform better than PuS students both in the EES and in college.
[Figure 4: two histograms, "Histogram of B12" and "Histogram of B34"; x-axes B12 and B34, y-axes Frequency.]

FIGURE 4 Empirical distributions of $B_{12,n}$ and $B_{34,n}$ under the hypothesis of homogeneity between types of High School.

| Test of interaction between sex and type of high school
Figure 5 shows the empirical distributions, under the hypothesis of no interaction between type of High School and sex, of $B_{12,n}$ (effect of High School system among male students), $B_{34,n}$ (effect of High School system among female students), $B_{13,n}$ (effect of sex among students from PuS) and $B_{24,n}$ (effect of sex among students from PrS). All four observed statistics, $B_{12,n}^{\mathrm{obs}}$, $B_{34,n}^{\mathrm{obs}}$, $B_{13,n}^{\mathrm{obs}}$ and $B_{24,n}^{\mathrm{obs}}$, are negative.

Figure 6 shows the empirical distribution of $L_n$ under the hypothesis of no interaction between sex and High School system. The observed value of the test statistic is $L_n^{\mathrm{obs}} = 28.\ldots$, with p-value 0.02. Therefore, there is evidence of an interaction between type of High School and sex. Figure 5 shows a difference between the empirical distributions of $B_{13,n}$ and $B_{24,n}$, suggesting that the effects of sex in PuS ($B_{13,n}$) and in PrS ($B_{24,n}$) differ. The difference between female and male students appears to be greater in PrS than in PuS, which agrees with Table 4, where the number of concordant pairs in line two is slightly greater than in line one, indicating better performance of female students in PrS.
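The counting of concordant and discordant pairs behind comparisons such as those in Table 4 can be illustrated with a small sketch. The Python code below is a hypothetical helper, not the authors' implementation: for one course, it compares every student of one group with every student of another, counts pairs whose EES difference and grade difference have the same sign (concordant) or opposite signs (discordant), and returns the simple sample analogue (#discordant − #concordant)/#pairs of $\theta = P\{\text{Discordance}\} - P\{\text{Concordance}\}$ from the Appendix. The $B_n$ statistics of the paper are built from such pairwise comparisons through generalized U-statistics whose exact kernels and weights (Sections 3 and 4) are not reproduced here, and the data values are invented.

```python
from itertools import product


def pair_counts(group_a, group_b):
    """Count concordant, discordant and tied pairs between two groups.

    Each element is an (ees, grade) tuple for one student in one course.
    A pair (i from group_a, j from group_b) is concordant when the EES
    difference and the grade difference have the same sign, discordant
    when the signs differ, and tied when either difference is zero.
    """
    conc = disc = ties = 0
    for (ees_i, grade_i), (ees_j, grade_j) in product(group_a, group_b):
        sign = (ees_j - ees_i) * (grade_j - grade_i)
        if sign > 0:
            conc += 1
        elif sign < 0:
            disc += 1
        else:
            ties += 1
    return conc, disc, ties


def theta_hat(group_a, group_b):
    """Sample analogue of theta = P(Discordance) - P(Concordance)."""
    conc, disc, ties = pair_counts(group_a, group_b)
    total = conc + disc + ties
    if total == 0:
        raise ValueError("both groups must be non-empty")
    return (disc - conc) / total


# Invented (EES, course grade) pairs for two hypothetical groups.
group_1 = [(600.0, 7.0), (650.0, 6.5)]
group_2 = [(700.0, 8.0), (620.0, 6.0)]
print(pair_counts(group_1, group_2))  # (3, 1, 0)
print(theta_hat(group_1, group_2))    # -0.5
```

A negative value, as here, indicates more concordance than discordance, i.e., the ordering on the EES tends to persist in the course grades.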
[Figure 5: four histograms, "Histogram of B12", "Histogram of B34", "Histogram of B13" and "Histogram of B24"; x-axes B12, B34, B13 and B24, y-axes Frequency.]

FIGURE 5 Empirical distributions of $B_{12,n}$, $B_{34,n}$, $B_{13,n}$ and $B_{24,n}$ under the hypothesis of homogeneity between types of High School in both sexes.

[Figure 6: one histogram, "Histogram of Ln"; x-axis Ln, y-axis Frequency.]

FIGURE 6 Empirical distribution of $L_n$ under the hypothesis of no interaction between types of High School and sex.

| DISCUSSION
We propose testing procedures for comparing student performance during college at the University of Campinas. These procedures are tailored to some features common to the structure of Brazilian college programs, but they are applicable to a wide range of problems in which shape restrictions and large random vectors play a role. Traditional measures such as the GPA may mask the relative performance of a student. This is illustrated in Section 2 and motivates the proposed test statistic $L_n$. Besides being designed to overcome the shortcomings of the GPA as an unbiased measure of relative performance, the $L_n$ statistic has other advantages. The one-sided test based on $L_n$ usually performs better than the two-sided Hotelling $\chi^2$-test: $L_n$ is equivalent to Hotelling's $T^2$ statistic when $\mathbf{T}_n > \mathbf{0}$, but when not all components of $\mathbf{T}_n$ are positive, $L_n$ is more powerful. The problem of fairly assessing student performance is especially important when affirmative action policies are in place. This is the case in several Brazilian universities, and these policies may benefit greatly from sound statistical analysis. We hope the proposed procedures will be useful in this direction.

APPENDIX
Let $\bar{Y}_2$ and $\bar{Y}_1$ be the EES of an individual from group 2 and of an individual from group 1, respectively, and let $Y_{2al}$ and $Y_{1al}$ be the grades in course $l$ of an individual from group 2 and of an individual from group 1, respectively. Let $\mathrm{Var}(\bar{Y}_2) = \sigma_{21}^2$, $\mathrm{Var}(\bar{Y}_1) = \sigma_{11}^2$, $\mathrm{Var}(Y_{2al}) = \sigma_{22}^2$, $\mathrm{Var}(Y_{1al}) = \sigma_{12}^2$, $\mathrm{Corr}(Y_{2al}, \bar{Y}_2) = \rho_2$ and $\mathrm{Corr}(Y_{1al}, \bar{Y}_1) = \rho_1$. Therefore, $\mathrm{Cov}(Y_{2al}, \bar{Y}_2) = \rho_2\sigma_{21}\sigma_{22}$ and $\mathrm{Cov}(Y_{1al}, \bar{Y}_1) = \rho_1\sigma_{11}\sigma_{12}$. In general, treating the two individuals as independent,

$$
\begin{pmatrix} \bar{Y}_2 - \bar{Y}_1 \\ Y_{2al} - Y_{1al} \end{pmatrix}
\sim N\left(
\begin{bmatrix} \mu_1^* \\ \mu_2^* \end{bmatrix},
\begin{bmatrix}
\sigma_{11}^2 + \sigma_{21}^2 & \rho_1\sigma_{11}\sigma_{12} + \rho_2\sigma_{21}\sigma_{22} \\
\rho_1\sigma_{11}\sigma_{12} + \rho_2\sigma_{21}\sigma_{22} & \sigma_{12}^2 + \sigma_{22}^2
\end{bmatrix}
\right).
$$

We now show some models for which the one-sided hypothesis is reasonable. Suppose that, under $H_0$, students from group 2 do better than students from group 1 in the EES and also obtain better grades in undergraduate courses, i.e., we expect more concordance than discordance. Then, under $H_0$, $\bar{Y}_2 \overset{\mathcal{L}}{=} \bar{Y}_1 + \mu_1^*$ and $Y_{2al} \overset{\mathcal{L}}{=} Y_{1al} + \mu_2^*$, with $\mu_1^*, \mu_2^* > 0$. Let $U = \bar{Y}_2 - \bar{Y}_1$ and $V = Y_{2al} - Y_{1al}$.

CASE A: $\rho_1 = \rho_2 = \rho$, with $H_0: \mu = 0$ vs. $H_1: \mu > 0$.

Suppose that $\sigma_{11} = \sigma_{21} = \sigma_1$ and $\sigma_{12} = \sigma_{22} = \sigma_2$, so that $\mathrm{Cov}(Y_{1al}, \bar{Y}_1) = \mathrm{Cov}(Y_{2al}, \bar{Y}_2) = \rho\sigma_1\sigma_2$. Then $\mathrm{Cov}(U, V) = 2\rho\sigma_1\sigma_2$, $\mathrm{Corr}(U, V) = \rho$ and

$$
\begin{pmatrix} U \\ V \end{pmatrix}
= \begin{pmatrix} \bar{Y}_2 - \bar{Y}_1 \\ Y_{2al} - Y_{1al} \end{pmatrix}
\sim N\left(
\begin{bmatrix} \mu_1^* \\ \mu_2^* \end{bmatrix},
\begin{bmatrix} 2\sigma_1^2 & 2\rho\sigma_1\sigma_2 \\ 2\rho\sigma_1\sigma_2 & 2\sigma_2^2 \end{bmatrix}
\right).
$$

Finally, let $Z_1 = (U - \mu_1^*)/(\sqrt{2}\,\sigma_1)$, $Z_2 = (V - \mu_2^*)/(\sqrt{2}\,\sigma_2)$, $\tilde{\mu}_1 = \mu_1^*/(\sqrt{2}\,\sigma_1)$ and $\tilde{\mu}_2 = \mu_2^*/(\sqrt{2}\,\sigma_2)$. Then

$$
\begin{aligned}
P_{H_0}\{\text{Concordance}\}
&= P\left(\frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{2}\,\sigma_1} > 0,\ \frac{Y_{2al} - Y_{1al}}{\sqrt{2}\,\sigma_2} > 0\right)
 + P\left(\frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{2}\,\sigma_1} < 0,\ \frac{Y_{2al} - Y_{1al}}{\sqrt{2}\,\sigma_2} < 0\right) \\
&= P(Z_1 > -\tilde{\mu}_1,\ Z_2 > -\tilde{\mu}_2) + P(Z_1 < -\tilde{\mu}_1,\ Z_2 < -\tilde{\mu}_2) \\
&= \int_{-\tilde{\mu}_2}^{\infty}\int_{-\tilde{\mu}_1}^{\infty} f_{Z_1,Z_2}(u,v)\,du\,dv
 + \int_{-\infty}^{-\tilde{\mu}_2}\int_{-\infty}^{-\tilde{\mu}_1} f_{Z_1,Z_2}(u,v)\,du\,dv,
\end{aligned}
$$

where

$$
f_{Z_1,Z_2}(u,v) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[u^2 - 2\rho uv + v^2\right]\right\}. \tag{18}
$$

Similarly,

$$
\begin{aligned}
P_{H_0}\{\text{Discordance}\}
&= P(\bar{Y}_2 - \bar{Y}_1 > 0,\ Y_{2al} - Y_{1al} < 0) + P(\bar{Y}_2 - \bar{Y}_1 < 0,\ Y_{2al} - Y_{1al} > 0) \\
&= P(Z_1 > -\tilde{\mu}_1,\ Z_2 < -\tilde{\mu}_2) + P(Z_1 < -\tilde{\mu}_1,\ Z_2 > -\tilde{\mu}_2) \\
&= \int_{-\infty}^{-\tilde{\mu}_2}\int_{-\tilde{\mu}_1}^{\infty} f_{Z_1,Z_2}(u,v)\,du\,dv
 + \int_{-\tilde{\mu}_2}^{\infty}\int_{-\infty}^{-\tilde{\mu}_1} f_{Z_1,Z_2}(u,v)\,du\,dv.
\end{aligned}
$$

Writing

$$
\begin{aligned}
(a) &= \int_{-\infty}^{-\tilde{\mu}_2}\int_{-\tilde{\mu}_1}^{\infty} f_{Z_1,Z_2}(u,v)\,du\,dv, &
(b) &= \int_{-\tilde{\mu}_2}^{\infty}\int_{-\infty}^{-\tilde{\mu}_1} f_{Z_1,Z_2}(u,v)\,du\,dv, \\
(c) &= \int_{-\tilde{\mu}_2}^{\infty}\int_{-\tilde{\mu}_1}^{\infty} f_{Z_1,Z_2}(u,v)\,du\,dv, &
(d) &= \int_{-\infty}^{-\tilde{\mu}_2}\int_{-\infty}^{-\tilde{\mu}_1} f_{Z_1,Z_2}(u,v)\,du\,dv,
\end{aligned}
$$

we obtain

$$
\theta = P_{H_0}\{\text{Discordance}\} - P_{H_0}\{\text{Concordance}\} = (a) + (b) - (c) - (d) \quad (\text{under } H_0). \tag{19}
$$

Now, under $H_1$, $\bar{Y}_2 \overset{\mathcal{L}}{=} \bar{Y}_1 + \mu_1^*$ and $Y_{2al} \overset{\mathcal{L}}{=} Y_{1al} + \mu_2^* - \mu$, with $\mu > 0$. Then

$$
\begin{pmatrix} \bar{Y}_2 - \bar{Y}_1 \\ Y_{2al} - Y_{1al} \end{pmatrix}
\sim N\left(
\begin{bmatrix} \mu_1^* \\ \mu_2^* - \mu \end{bmatrix},
\begin{bmatrix} 2\sigma_1^2 & 2\rho\sigma_1\sigma_2 \\ 2\rho\sigma_1\sigma_2 & 2\sigma_2^2 \end{bmatrix}
\right).
$$

With $Z_1$ as before, $Z_2 = (V - \mu_2^* + \mu)/(\sqrt{2}\,\sigma_2)$ and $\tilde{\mu} = \mu/(\sqrt{2}\,\sigma_2)$, and writing

$$
(e) = \int_{-\tilde{\mu}_2}^{\tilde{\mu} - \tilde{\mu}_2}\int_{-\tilde{\mu}_1}^{\infty} f_{Z_1,Z_2}(u,v)\,du\,dv,
\qquad
(f) = \int_{-\tilde{\mu}_2}^{\tilde{\mu} - \tilde{\mu}_2}\int_{-\infty}^{-\tilde{\mu}_1} f_{Z_1,Z_2}(u,v)\,du\,dv,
$$

we have

$$
\begin{aligned}
P_{H_1}\{\text{Concordance}\}
&= P(Z_1 > -\tilde{\mu}_1,\ Z_2 > \tilde{\mu} - \tilde{\mu}_2) + P(Z_1 < -\tilde{\mu}_1,\ Z_2 < \tilde{\mu} - \tilde{\mu}_2) \\
&= \int_{\tilde{\mu} - \tilde{\mu}_2}^{\infty}\int_{-\tilde{\mu}_1}^{\infty} f_{Z_1,Z_2}(u,v)\,du\,dv
 + \int_{-\infty}^{\tilde{\mu} - \tilde{\mu}_2}\int_{-\infty}^{-\tilde{\mu}_1} f_{Z_1,Z_2}(u,v)\,du\,dv
 = (c) - (e) + (d) + (f),
\end{aligned}
$$

$$
\begin{aligned}
P_{H_1}\{\text{Discordance}\}
&= P(Z_1 > -\tilde{\mu}_1,\ Z_2 < \tilde{\mu} - \tilde{\mu}_2) + P(Z_1 < -\tilde{\mu}_1,\ Z_2 > \tilde{\mu} - \tilde{\mu}_2) \\
&= \int_{-\infty}^{\tilde{\mu} - \tilde{\mu}_2}\int_{-\tilde{\mu}_1}^{\infty} f_{Z_1,Z_2}(u,v)\,du\,dv
 + \int_{\tilde{\mu} - \tilde{\mu}_2}^{\infty}\int_{-\infty}^{-\tilde{\mu}_1} f_{Z_1,Z_2}(u,v)\,du\,dv
 = (a) + (e) + (b) - (f).
\end{aligned}
$$

Therefore,

$$
\theta = P_{H_1}\{\text{Discordance}\} - P_{H_1}\{\text{Concordance}\} = (a) + (b) - (c) - (d) + 2(e) - 2(f) \quad (\text{under } H_1). \tag{20}
$$

Note that (20) − (19) $= 2[(e) - (f)]$; if $\rho > 0$ and $f_{Z_1,Z_2}$ is given by (18), the expression in (20) is greater than (19). Therefore, $\theta$ is larger under $H_1$ than under $H_0$. $\square$

CASE B: $H_0: \rho_1 = \rho_2$ vs. $H_1: \rho_1 < \rho_2$.

Now let $U_g = \bar{Y}_{gj} - \bar{Y}_{gi}$, $V_g = Y_{gajl} - Y_{gail}$, $U_{gg'} = \bar{Y}_{g'j} - \bar{Y}_{gi}$ and $V_{gg'} = Y_{g'ajl} - Y_{gail}$, for $1 \le g < g' \le G$. Note that, in general, $\mathrm{Var}(\bar{Y}_{gi} - \bar{Y}_{g'j}) = \sigma_{g1}^2 + \sigma_{g'1}^2 = \sigma_1^{*2}$, $\mathrm{Var}(Y_{gail} - Y_{g'ajl}) = \sigma_{g2}^2 + \sigma_{g'2}^2 = \sigma_2^{*2}$, $\mathrm{Cov}(\bar{Y}_{gi}, Y_{gail}) = \rho_g\sigma_{g1}\sigma_{g2}$ and $\mathrm{Cov}(\bar{Y}_{g'j}, Y_{g'ajl}) = \rho_{g'}\sigma_{g'1}\sigma_{g'2}$. When comparing two individuals from the same group $g$, we assume that both have EES variance $\sigma_{g1}^2$ and grade variance $\sigma_{g2}^2$. Then

$$
\begin{pmatrix} U_1 \\ V_1 \end{pmatrix}
\sim N\left(
\begin{bmatrix} \mu_1^* \\ \mu_2^* \end{bmatrix},
\begin{bmatrix} 2\sigma_{11}^2 & 2\rho_1\sigma_{11}\sigma_{12} \\ 2\rho_1\sigma_{11}\sigma_{12} & 2\sigma_{12}^2 \end{bmatrix}
\right)
\quad\text{and}\quad
\begin{pmatrix} U_{12} \\ V_{12} \end{pmatrix}
\sim N\left(
\begin{bmatrix} \mu_1^* \\ \mu_2^* \end{bmatrix},
\begin{bmatrix}
\sigma_{11}^2 + \sigma_{21}^2 & \rho_1\sigma_{11}\sigma_{12} + \rho_2\sigma_{21}\sigma_{22} \\
\rho_1\sigma_{11}\sigma_{12} + \rho_2\sigma_{21}\sigma_{22} & \sigma_{12}^2 + \sigma_{22}^2
\end{bmatrix}
\right).
$$

Defining $U_1^* = U_1/(\sqrt{2}\,\sigma_{11})$, $V_1^* = V_1/(\sqrt{2}\,\sigma_{12})$, $U_{12}^* = U_{12}/\sqrt{\sigma_{11}^2 + \sigma_{21}^2}$, $V_{12}^* = V_{12}/\sqrt{\sigma_{12}^2 + \sigma_{22}^2}$, $\tilde{\mu}_1 = \mu_1^*/(\sqrt{2}\,\sigma_{11})$ and $\tilde{\mu}_2 = \mu_2^*/(\sqrt{2}\,\sigma_{12})$, we get

$$
\begin{pmatrix} U_1^* \\ V_1^* \end{pmatrix}
\sim N\left(
\begin{bmatrix} \tilde{\mu}_1 \\ \tilde{\mu}_2 \end{bmatrix},
\begin{bmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{bmatrix}
\right)
\quad\text{and}\quad
\begin{pmatrix} U_{12}^* \\ V_{12}^* \end{pmatrix}
\sim N\left(
\begin{bmatrix} \mu_1^*/\sqrt{\sigma_{11}^2 + \sigma_{21}^2} \\ \mu_2^*/\sqrt{\sigma_{12}^2 + \sigma_{22}^2} \end{bmatrix},
\begin{bmatrix} 1 & \rho_{12}^* \\ \rho_{12}^* & 1 \end{bmatrix}
\right),
\qquad
\rho_{12}^* = \frac{\rho_1\sigma_{11}\sigma_{12} + \rho_2\sigma_{21}\sigma_{22}}{\sqrt{(\sigma_{11}^2 + \sigma_{21}^2)(\sigma_{12}^2 + \sigma_{22}^2)}}.
$$

For $W_1 = V_1^* - \rho_1 U_1^*$, $\mathrm{Cov}(W_1, U_1^*) = 0$ and $W_1 \sim N(\tilde{\mu}_2 - \rho_1\tilde{\mu}_1,\ 1 - \rho_1^2)$; by joint normality, $W_1$ and $U_1^*$ are independent. For $W_{12} = V_{12}^* - \rho_{12}^* U_{12}^*$, $\mathrm{Cov}(W_{12}, U_{12}^*) = 0$. Then

$$
\begin{aligned}
P(U_1 > 0, V_1 > 0) &= P(U_1^* > 0,\ V_1^* > 0) = P(U_1^* > 0,\ W_1 + \rho_1 U_1^* > 0) \\
&= P(U_1^* > 0,\ Z_3^* > -\tilde{\mu}_2 + \rho_1\tilde{\mu}_1 - \rho_1 U_1^*) \\
&= \int_0^{\infty} \left[1 - \Phi(-\tilde{\mu}_2 + \rho_1\tilde{\mu}_1 - \rho_1 u)\right] f_{U_1^*}(u)\,du \\
&= P(U_1^* > 0) - \int_0^{\infty} \Phi(-\tilde{\mu}_2 + \rho_1\tilde{\mu}_1 - \rho_1 u)\, f_{U_1^*}(u)\,du,
\end{aligned}
$$

where $Z_1^* = U_1^* - \tilde{\mu}_1$, $Z_2^* = V_1^* - \tilde{\mu}_2$ and $Z_3^* = W_1 - \tilde{\mu}_2 + \rho_1\tilde{\mu}_1$, i.e., $Z_1^* \sim N(0,1)$, $Z_2^* \sim N(0,1)$, $Z_3^* \sim N(0, 1 - \rho_1^2)$ and $\mathrm{Cov}(Z_1^*, Z_3^*) = 0$. Here and below, $\Phi$ denotes the distribution function of the relevant $Z_3^*$ (a $N(0, 1 - \rho^2)$ variable), so that $1 - \Phi(c) = P(Z_3^* > c)$.

Taking zero means ($\mu_1^* = \mu_2^* = 0$), as in the remaining computations, and writing $U$, $V$, $W$ and $\rho$ for $U_1^*$, $V_1^*$, $W_1$ and $\rho_1$, we have $U \sim N(0,1)$ and

$$
\begin{aligned}
P(U > 0, V > 0) &= P(U > 0,\ W > -\rho U) = \tfrac{1}{2} - \int_0^{\infty} \Phi(-\rho u)\, f_U(u)\,du, \\
P(U < 0, V < 0) &= P(U < 0,\ W + \rho U < 0) = P(U < 0,\ W < -\rho U) = \int_{-\infty}^{0} \Phi(-\rho u)\, f_U(u)\,du, \\
P(U > 0, V < 0) &= P(U > 0,\ W + \rho U < 0) = P(U > 0,\ W < -\rho U) = \int_0^{\infty} \Phi(-\rho u)\, f_U(u)\,du, \\
P(U < 0, V > 0) &= P(U < 0,\ W + \rho U > 0) = P(U < 0,\ W > -\rho U) = \int_{-\infty}^{0} \left[1 - \Phi(-\rho u)\right] f_U(u)\,du = \tfrac{1}{2} - \int_{-\infty}^{0} \Phi(-\rho u)\, f_U(u)\,du.
\end{aligned}
$$

Analogously, the same computations apply to comparisons of individuals in group 2 ($U_2$ and $V_2$) and from different groups ($U_{12}$ and $V_{12}$). Therefore,

$$
\begin{aligned}
\theta_{gg} &= P(\text{Discordance}) - P(\text{Concordance})
= 2\left\{\int_0^{\infty} \Phi(-\rho u)\, f_U(u)\,du - \int_{-\infty}^{0} \Phi(-\rho u)\, f_U(u)\,du\right\} \\
&= 2\int_0^{\infty} \left[\Phi(-\rho u) - \Phi(\rho u)\right] f_U(u)\,du < 0 \quad \text{if } \rho > 0,
\end{aligned}
$$

for $g = 1, \ldots, G$, where the last equality uses the symmetry of $f_U$, and the sign follows since $\Phi(-\rho u) < 1/2$ and $\Phi(\rho u) > 1/2$ for $u > 0$.

Within group $g$ only $\rho_g$ appears, but between groups 1 and 2 the correlation is $\rho_{12}^*$, which equals $(\rho_1 + \rho_2)/2 < \rho_2$ when $\sigma_{11} = \sigma_{21}$ and $\sigma_{12} = \sigma_{22}$, since $\rho_1 < \rho_2$. Note that if $\rho_1 > 0$, $\rho_2 > 0$ and $\rho_1 < \rho_2$, then $\rho_1 < (\rho_1 + \rho_2)/2 < \rho_2$, and

$$
\begin{aligned}
\theta_{11} &= 2\int_0^{\infty} \left[\Phi(-\rho_1 u) - \Phi(\rho_1 u)\right] f_U(u)\,du < 0, \\
\theta_{22} &= 2\int_0^{\infty} \left[\Phi(-\rho_2 u) - \Phi(\rho_2 u)\right] f_U(u)\,du < 0, \\
\theta_{12} &= 2\int_0^{\infty} \left[\Phi\left(-\tfrac{\rho_1 + \rho_2}{2} u\right) - \Phi\left(\tfrac{\rho_1 + \rho_2}{2} u\right)\right] f_U(u)\,du < 0.
\end{aligned}
$$

Since $|\Phi(-\rho u) - \Phi(\rho u)|$ increases with $\rho > 0$ for every $u > 0$,

$$
\left|\Phi(-\rho_2 u) - \Phi(\rho_2 u)\right| > \left|\Phi\left(-\tfrac{\rho_1 + \rho_2}{2} u\right) - \Phi\left(\tfrac{\rho_1 + \rho_2}{2} u\right)\right| > \left|\Phi(-\rho_1 u) - \Phi(\rho_1 u)\right|,
$$

so that $\theta_{11} > \theta_{12} > \theta_{22}$. Moreover, for a standardized bivariate normal pair with zero means and correlation $\rho$, $P(\text{Concordance}) = \tfrac{1}{2} + \arcsin(\rho)/\pi$, so $\theta = -\tfrac{2}{\pi}\arcsin\rho$, which is concave on $[0, 1)$; hence $\theta_{12} > \tfrac{1}{2}(\theta_{11} + \theta_{22})$.
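The orthant probabilities used in Cases A and B can be checked numerically. The following is a minimal Python sketch, using only the standard library, that computes $\theta = P\{\text{Discordance}\} - P\{\text{Concordance}\}$ for standardized bivariate normal differences with means $(m_1, m_2)$ and correlation $\rho$. The helper names, the Simpson integration grid, the truncation at $\pm 8$ and all parameter values are illustrative assumptions, not part of the paper. It checks the zero-mean identity $\theta = -\tfrac{2}{\pi}\arcsin\rho$, one Case A configuration in which lowering the mean of the standardized grade difference (as under $H_1$) increases $\theta$, and the Case B ordering $\theta_{11} > \theta_{12} > \theta_{22}$ together with $\theta_{12} > \tfrac12(\theta_{11} + \theta_{22})$ under equal variances.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_orthant(a, b, rho, hi=8.0, n=2000):
    """P(Z1 > a, Z2 > b) for a standard bivariate normal, |rho| < 1.

    Uses Z1 | Z2 = v ~ N(rho * v, 1 - rho**2) and Simpson's rule over v in
    (b, hi); probability mass beyond |v| = hi is neglected. n must be even.
    """
    if b >= hi:
        return 0.0
    b = max(b, -hi)
    s = math.sqrt(1.0 - rho * rho)

    def integrand(v):
        return std_normal_pdf(v) * (1.0 - std_normal_cdf((a - rho * v) / s))

    h = (hi - b) / n
    total = integrand(b) + integrand(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(b + k * h)
    return total * h / 3.0


def theta(m1, m2, rho):
    """P(Discordance) - P(Concordance) for standardized (U, V) with means
    (m1, m2) and correlation rho; concordance means U and V share a sign."""
    a, b = -m1, -m2  # U > 0 <=> Z1 > -m1, and V > 0 <=> Z2 > -m2
    # (-Z1, -Z2) has the same law, so the lower orthant is an upper orthant.
    conc = upper_orthant(a, b, rho) + upper_orthant(-a, -b, rho)
    return 1.0 - 2.0 * conc  # ties have probability zero


rho = 0.5
print(theta(0.0, 0.0, rho), -2.0 / math.pi * math.asin(rho))

# Case A (illustrative values): means (0.5, 0.5) under H_0; under H_1 the
# standardized grade-difference mean drops by mu_tilde = 0.5.
print(theta(0.5, 0.5 - 0.5, rho) > theta(0.5, 0.5, rho))  # True

# Case B with equal variances: between-group correlation (rho1 + rho2) / 2.
rho1, rho2 = 0.2, 0.6
t11, t22 = theta(0.0, 0.0, rho1), theta(0.0, 0.0, rho2)
t12 = theta(0.0, 0.0, (rho1 + rho2) / 2)
print(t11 > t12 > t22, t12 > 0.5 * (t11 + t22))  # True True
```

Conditioning on $Z_2$ reduces each double integral to a one-dimensional one, which keeps the sketch dependency-free; a library routine for bivariate normal probabilities could be substituted.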