In modern statistics, linear models allow researchers to understand and predict relationships between variables. Among them, the general linear model is widely used in multivariate regression analysis, and multiple linear regression is a special case of it. So, what is the connection between the two?
The general linear model is a compact way of writing several multiple linear regression models simultaneously, which means that it is not a separate statistical linear model in its own right. In short, we can write the different multiple regression models together in this form:
Y = X * B + U
Here, Y is a matrix of measurements in which each column holds the observations of one dependent variable, X is the matrix of observations on the independent variables (often called the design matrix), B is the matrix of parameters to be estimated, and U is the matrix of errors (noise). These errors are generally assumed to be uncorrelated across observations and to follow a multivariate normal distribution. If the errors do not follow a multivariate normal distribution, we can use a Generalized Linear Model (GLM) to relax the assumptions about Y and U.
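As a minimal sketch of the matrix form Y = X * B + U, the following Python snippet simulates data with two dependent variables and recovers B by least squares. The dimensions, the random seed, and the values in B_true are illustrative assumptions and do not come from the text; NumPy's least-squares routine is used here as one common way to estimate B.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n, p, m = 200, 3, 2  # observations, columns of X (incl. intercept), dependent variables

# Design matrix X (n x p): a column of ones for the intercept plus two predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Hypothetical "true" parameter matrix B (p x m), one column per dependent variable.
B_true = np.array([[1.0, -2.0],
                   [0.5,  0.0],
                   [3.0,  1.5]])

# Error matrix U (n x m): independent normal noise.
U = 0.1 * rng.normal(size=(n, m))

# The general linear model: Y = X B + U, with Y of shape (n x m).
Y = X @ B_true + U

# Least-squares estimate of B; all columns of Y are handled in one call.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(B_hat)
```

With low noise and a few hundred observations, the printed estimate should sit close to B_true.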
The core idea of the general linear model is that it subsumes a variety of statistical models, such as ANOVA, ANCOVA, MANOVA, and MANCOVA, and that it can handle more than one dependent variable, thereby providing a more comprehensive analysis. In this sense, multiple linear regression is a special case of the general linear model, restricted to one dependent variable.
Multiple linear regression is in turn a generalization of simple linear regression: it describes the effect of several independent variables, rather than just one, on a single dependent variable.
Specifically, the basic model of multiple linear regression is: Y_i = β_0 + β_1 * X_i1 + β_2 * X_i2 + ... + β_p * X_ip + ε_i, for each observation i = 1, ..., n. With n observations and p independent variables, Y_i is the i-th observed value of the dependent variable, X_ij is the i-th observation of the j-th independent variable (j = 1, ..., p), β_j is a parameter to be estimated, and ε_i is the i-th independent and identically distributed normal error.
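For the single-dependent-variable case, a short sketch can make the formula concrete. The example below stacks the n equations into y = X β + ε and solves the normal equations (X'X) β = X'y, which is one textbook route to the least-squares estimate. The sample size, noise level, and the values in beta_true are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

n, p = 150, 2                             # n observations, p independent variables
predictors = rng.normal(size=(n, p))      # X_i1, X_i2 for each observation i

# Hypothetical parameters: beta_0 (intercept), beta_1, beta_2.
beta_true = np.array([2.0, 1.0, -0.5])

# Prepend a column of ones so that beta_0 acts as the intercept.
X = np.column_stack([np.ones(n), predictors])

# i.i.d. normal errors epsilon_i.
eps = rng.normal(scale=0.2, size=n)

# Y_i = beta_0 + beta_1 * X_i1 + beta_2 * X_i2 + eps_i, written for all i at once.
y = X @ beta_true + eps

# Solve the normal equations (X'X) beta = X'y for the least-squares estimate.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares residuals are orthogonal to every column of X.
residuals = y - X @ beta_hat
print(beta_hat)
```

Solving the normal equations directly is fine for small, well-conditioned problems like this one; for ill-conditioned design matrices a QR- or SVD-based solver such as np.linalg.lstsq is usually the safer choice.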
In the general linear model, when there is more than one dependent variable, we enter the field of multivariate regression. Each dependent variable has its own set of regression parameters to be estimated, so computationally this is simply a series of standard multiple linear regressions that all use the same explanatory variables.
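The claim that the multivariate fit is computationally a series of ordinary multiple regressions can be checked numerically. The sketch below, using arbitrary simulated data (the sizes and seed are assumptions), fits all dependent variables jointly and then one column at a time, and compares the two results.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # shared explanatory variables
Y = rng.normal(size=(n, 3))                                 # three dependent variables

# Joint fit: one least-squares call for the whole matrix Y.
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Separate fits: one multiple linear regression per dependent variable.
B_separate = np.column_stack([
    np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
    for j in range(Y.shape[1])
])

# The two parameter matrices agree up to floating-point error.
same = np.allclose(B_joint, B_separate)
print(same)
```

The point estimates coincide column by column; what the multivariate framing adds is the ability to test hypotheses that involve several dependent variables at once, as in MANOVA.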
The general linear model assumes that the residuals are conditionally normally distributed, while the generalized linear model relaxes this assumption to allow a variety of other distributions.
Looking further, an important difference between general linear models and generalized linear models (GLMs) is that a GLM allows the distribution of the outcome to be chosen from the exponential family, which yields models such as binary logistic regression and Poisson regression. The significance of this distinction is that, when faced with different types of outcome variables such as binary responses or counts, researchers can choose a model suited to the data rather than forcing a normal-error model onto it.
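To illustrate how a generalized linear model departs from the normal-error setting, here is a rough sketch of Poisson regression with a log link, fitted by iteratively reweighted least squares (IRLS). The function name fit_poisson_irls, the simulated data, and the fixed iteration count are assumptions made for this example; in practice a statistics library would normally handle the fitting.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=25):
    """Fit a Poisson regression with log link by IRLS (illustrative sketch)."""
    # Start from the data: mu slightly above y so that log(mu) is defined.
    mu = y + 0.5
    eta = np.log(mu)
    for _ in range(n_iter):
        z = eta + (y - mu) / mu          # working response
        w = mu                           # working weights for the log link
        # Weighted least-squares step: solve (X'WX) beta = X'Wz.
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        eta = X @ beta                   # updated linear predictor
        mu = np.exp(eta)                 # updated mean via the inverse link
    return beta

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta_true = np.array([0.5, 0.8])         # hypothetical coefficients
y = rng.poisson(np.exp(X @ beta_true))   # count outcome, not normally distributed

beta_hat = fit_poisson_irls(X, y)
print(beta_hat)
```

Each iteration is itself a weighted linear regression, which shows how closely the generalized model stays tied to the linear one even though the outcome is a count.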
For example, an application of the general linear model can be seen in the analysis of brain scan data. Y contains the data from the brain scans, and X contains the variables of the experimental design. The model is usually tested in a univariate manner, one measured location at a time, which is known in the field as mass-univariate analysis and is the basis of statistical parametric mapping.
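A mass-univariate analysis can be sketched in a few lines: the same design matrix is fitted to every column of Y, and a t-statistic is computed per column. Everything specific below is an assumption for illustration only: the alternating on/off design, the number of scans and measured locations ("voxels"), and the simulated effect placed in the first 50 columns. Real neuroimaging pipelines add steps such as noise modelling and multiple-comparison correction that are not shown.

```python
import numpy as np

rng = np.random.default_rng(3)

n_scans, n_voxels = 60, 1000

# Hypothetical experimental design: condition alternates off/on across scans.
condition = np.tile([0.0, 1.0], n_scans // 2)
X = np.column_stack([np.ones(n_scans), condition])   # intercept + condition

# Simulated data: pure noise, plus a condition effect in the first 50 voxels.
Y = rng.normal(size=(n_scans, n_voxels))
Y[:, :50] += 2.0 * condition[:, None]

# Fit the same linear model to every voxel at once.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)        # shape (2, n_voxels)

# Per-voxel residual variance.
residuals = Y - X @ B_hat
dof = n_scans - X.shape[1]
sigma2 = (residuals ** 2).sum(axis=0) / dof

# t-statistic per voxel for the contrast that picks out the condition effect.
c = np.array([0.0, 1.0])
var_factor = c @ np.linalg.inv(X.T @ X) @ c
t_stats = (c @ B_hat) / np.sqrt(sigma2 * var_factor)

print(t_stats[:50].mean(), t_stats[50:].mean())
```

The resulting vector of t-statistics, one per location, is the kind of quantity that a statistical parametric map displays.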
In short, multiple linear regression relates to the general linear model as a special case relates to its parent family: the former describes a single dependent variable, while the latter extends the same structure to several dependent variables at once. As statistical analysis techniques develop, understanding how these models fit together will remain an integral part of research work. Amid this trend, it is worth asking: have you made full use of these statistical tools in your research and decision-making?