Linear Regression F Test of Model Variance

Context


Introduction to the Regression F Test

The F test is a test of the statistical significance of the regression analysis model as a whole. Unlike the T test, which is a test of means, the F test is a test of variances. The linear regression F test determines if the variance of the model is significantly less than that of a baseline model.

Just as in parts 1 and 2, using matrix variables rather than only scalars makes deriving the solutions much easier. Here this applies to constructing the degrees of freedom formula for the case where both terms in the squared error are estimators.

Baseline, Final, and Explained Variance

The F test compares the variance of the final model, also called the “unexplained” variance (the deviation of the actual values from the predicted values that is not explained by the model), to the “explained” variance (the deviation between the predicted values of a baseline model and those of the final model).
\[\large{
\begin{align}
\stackrel{\text{baseline}}{\tilde{e} = (y-\tilde{y})} && \stackrel{\text{final}}{\widehat{e} = (y-\widehat{y})} && \stackrel{\text{explained}}{(\tilde{e} - \widehat{e}) = (\widehat{y}-\tilde{y})}
\end{align}
}\]
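As a quick numeric sketch (made-up data, and assuming the constant factor baseline described later), the three error vectors always satisfy the identity that the baseline error splits into the final error plus the explained error.

# minimal sketch w/ hypothetical data:
# baseline error = final error + explained error
set.seed(1)
n <- 50
x <- runif(n,-10,10)
y <- 3 + 2.5 * x + rnorm(n,0,5)

y.tilde <- rep(mean(y),n)              # constant factor baseline estimates
y.hat <- unname(fitted(lm(y ~ x)))     # final model estimates

e.baseline <- y - y.tilde
e.final <- y - y.hat
e.explained <- y.hat - y.tilde

all.equal(e.baseline, e.final + e.explained)   # TRUE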

The Test Hypotheses

The null hypothesis is that the final model is no more explanatory than the baseline model. Under the null hypothesis the final and explained variance are both equal to the baseline variance (and therefore to each other).
\[\large{
\begin{align}
&\text{H}_0:\sigma_\text{final}^2=\sigma_\text{explained}^2 \\ \\
&\text{H}_a:\sigma_\text{final}^2<\sigma_\text{explained}^2
\end{align}
}\]
Many sources use a “not equal to” comparison for the alternative hypothesis. This is incorrect, because the final model includes all the factors from the baseline and then some. If those extra factors are explanatory, then the final variance will be less than the explained variance. If they are not explanatory, then the final and explained variances will be statistically equal.

Implications of the Null Hypothesis

While not actually components of the hypothesis test itself, the implications of the null hypothesis below are important for constructing the test formulas (as is shown in the derivation section).
\[\large{
\begin{align}
&\text{H}_0\longrightarrow\text{E}[\tilde{y}]=\text{E}[\widehat{y}] \\ \\
&\text{H}_0\longrightarrow\sigma_\text{baseline}^2 = \sigma_\text{final}^2 = \sigma_\text{explained}^2
\end{align}
}\]

The Test

The test statistic, if the standard assumptions of the linear model are met, is considered to be generated from an F distribution (explained in more detail in the formula derivation section). If the observed F statistic is too unlikely (i.e. its probability is less than some predetermined threshold \(0<a<1\)), then the hypothesis that the final model is no more explanatory than the baseline is rejected.
\[\large{
\begin{align}
&f_{df_\text{explained},df_\text{final}} = \frac{\widehat{\sigma}_\text{explained}^2}{\widehat{\sigma}_\text{final}^2} \sim F(df_\text{explained},df_\text{final}) & && &\normalsize \text{(Test Statistic)} \\ \\
&P_F(df_\text{explained},df_\text{final}) \begin{cases}
\ge a && \small\text{accept $H_0$} \\
< a && \small\text{reject $H_0$, accept $H_a$}
\end{cases} & && &\normalsize\text{(Test)}
\end{align}
}\]
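In R the tail probability of an observed F statistic comes straight from pf(); a minimal sketch with purely illustrative numbers (the statistic and degrees of freedom below are made up):

# hypothetical observed values for illustration only
f.stat <- 7.2
df.explained <- 3
df.final <- 96
a <- 0.05

p.value <- pf(f.stat, df.explained, df.final, lower.tail = FALSE)
p.value < a    # TRUE -> reject H0, accept Ha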


Baseline Is Subset of Final

The F Test is based on the comparison of a baseline model to a final model. The baseline is the estimate produced from any proper subset of factors from the final model.
\[\large{
\begin{align}
&y = \stackrel{n\times q}{X}b + \varepsilon & && & n > q & && & & && &\small\text{(linear regression model)} \\ \\
&\stackrel{n\times o}{W} = X_{[,1:o]} & && & o<q & && & & && &\small\text{(baseline design matrix)} \\ \\
&\tilde{b} = (W^\text{T}W)^{-1}W^\text{T}y & && &\tilde{y} = W\tilde{b} & && &\tilde{e} = (y-\tilde{y}) & && &\small\text{(baseline residual error)} \\ \\
&\widehat{b} = (X^\text{T}X)^{-1}X^\text{T}y & && &\widehat{y} = X\widehat{b} & && &\widehat{e} = (y-\widehat{y}) & && &\small\text{(final residual error)}
\end{align}
}\]
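A short sketch of the setup above, using arbitrary illustrative data with q = 3 final factors and an o = 1 column baseline:

# sketch: the baseline design matrix W is the first o columns of X
set.seed(2)
n <- 100
X <- cbind(1, runif(n,-100,100), rpois(n,10))   # n x q, q = 3
b <- c(40, 2.5, 4)
y <- X %*% b + rnorm(n,0,35)

W <- X[,1:1,drop=FALSE]                          # n x o, o = 1

b.tilde <- solve(t(W) %*% W) %*% t(W) %*% y      # baseline coefficients
b.hat <- solve(t(X) %*% X) %*% t(X) %*% y        # final coefficients

e.tilde <- y - W %*% b.tilde                     # baseline residual error
e.hat <- y - X %*% b.hat                         # final residual error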

Most Common Baselines

Almost universally the baseline model of a linear regression is one of two varieties: the “constant factor” baseline or the “null factor” baseline.

The constant factor baseline is a design matrix that contains only the constant factor 1. In this special case the coefficient estimator formula reduces to the standard sample mean formula, so every model estimate is the mean of the actual values in \(y\). Under the null factor baseline there are no factors at all: the model is assumed to be only the error term \(\varepsilon\), and every estimated value is 0.
\[\large{
\begin{align}
&\stackrel{\text{Constant Baseline}\\}{y = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}b + \varepsilon} & && &\tilde{b} = \frac{1}{n}\sum_{i=1}^n y_i = \widehat{\mu}_Y & && &\tilde{y} = \begin{bmatrix} \widehat{\mu}_Y \\ \vdots \\ \widehat{\mu}_Y \end{bmatrix} \\ \\
&\stackrel{\text{Null Baseline}\\}{y = \varepsilon} & && &\tilde{b} = \small\text{undefined} & && & \tilde{y} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}
\end{align}
}\]
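A tiny sketch of both special cases (hypothetical data): with only the constant column the least squares solution collapses to the sample mean, and with no factors at all every estimate is 0.

# constant factor baseline: coefficient estimate reduces to mean(y)
set.seed(3)
y <- rnorm(25, mean = 10, sd = 2)
ones <- matrix(1, nrow = length(y), ncol = 1)

b.tilde <- solve(t(ones) %*% ones) %*% t(ones) %*% y
c(b.tilde, mean(y))            # identical values

# null factor baseline: no coefficients, every estimate is 0
y.tilde.null <- rep(0, length(y))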


WARNING! on Changing Baselines

The constant and null factor baselines are so ubiquitous they are built into the functionality of R. Including the constant factor in the final model automatically uses the constant factor baseline for the F test. CHANGING THE BASELINE BETWEEN TWO MODELS MAKES THEM NOT COMPARABLE.

In order to compare a model with a constant factor to one without, it is necessary to include the constant factor as an actual column of data so that the null factor baseline can be specified for both models.

n <- 30
df <- data.frame(
  x = runif(n,-100,100)
  ,e = rnorm(n,0,35)
)
df$y <- 40 + 2.5 * df$x + df$e
df$const <- rep(1,n)

# + 1 automatically uses the constant factor baseline
summary(lm(y ~ x + 1, df))

# model w/o constant factor using null factor baseline
summary(lm(y ~ x + 0, df))

# model w/ constant factor using null factor baseline
summary(lm(y ~ x + const + 0, df))
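For two nested lm() fits the same comparison can also be run explicitly with anova(), which performs the F test using whatever you hand it as the first model as the baseline. A sketch continuing the data frame above:

# explicit nested-model comparison: the first fit is the baseline
baseline.fit <- lm(y ~ 1, df)          # constant factor baseline
final.fit <- lm(y ~ x + 1, df)         # final model
anova(baseline.fit, final.fit)

# null factor baseline vs. model w/ constant included as a data column
null.fit <- lm(y ~ 0, df)
const.fit <- lm(y ~ x + const + 0, df)
anova(null.fit, const.fit)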


F Test Only Valid For Linear Regression

In part I of this series I mentioned that linear regression is only one member of an entire family of linear models (called the generalized linear model). This was in the context of choosing the “best” estimate for the model coefficients.

I detailed a method for estimating the coefficient vector called maximum likelihood estimation. The reason I gave for detailing this method and not the much more common “least squares” method is that MLE produces EXACTLY the same result for linear regression, but unlike least squares it functions for the entire generalized linear model.

Because other members of the family do not seek to minimize squared error, the final estimate may not be the one that produces the lowest squared error. The F test assigns lower p-values to models that produce lower final variance, which is just a scaled measure of squared error. But if you are working with a model that optimizes for something other than squared error, it makes no sense to judge the model's accuracy using a statistic based on squared error.

Although you can calculate the canonical F statistic for the entire GLM, as it is merely a function of the baseline, final, and explained variance, for the reasons stated above it is invalid to use it to judge the accuracy of any generalized linear model other than linear regression.


Formulas and Derivation


The Regression F Test Formulas

\[\large{
\begin{align}
&y = \stackrel{n\times q}{X}b + \varepsilon & && &\small\text{(regression model)} \\ \\
&\widehat{b} = (X^\text{T}X)^{-1}X^\text{T}y,\quad\widehat{y}=X\widehat{b} & && &\small\text{(final estimate)} \\ \\
&y = \stackrel{n\times o}{W}b + \varepsilon,\;\; W=X_{[,1:o]},\;\;o<q & && &\small\text{(baseline model)} \\ \\
&\tilde{b} = (W^\text{T}W)^{-1}W^\text{T}y,\;\;\tilde{y}=W\tilde{b} & && &\small\text{(baseline estimate)} \\ \\
&df_\text{final} = (n\;-\;q) & && &\small\text{(final degrees of freedom)} \\ \\
&df_\text{baseline} = (n\;-\;o) & && &\small\text{(baseline degrees of freedom)}\\ \\
&df_\text{explained} = (q\;-\;o) & && &\small\text{(“explained” degrees of freedom)}\\ \\
&\widehat{\sigma}_\text{final}^2=\frac{1}{df_\text{final}}\sum_{i=1}^n (y_i \;-\; \widehat{y}_i)^2 & && &\small\text{(final variance)}\\
&\widehat{\sigma}_\text{baseline}^2=\frac{1}{df_\text{baseline}}\sum_{i=1}^n (y_i \;-\; \tilde{y}_i)^2 & && &\small\text{(baseline variance)}\\
&\widehat{\sigma}_\text{explained}^2=\frac{1}{df_\text{explained}}\sum_{i=1}^n (\widehat{y}_i \;-\; \tilde{y}_i)^2 & && &\small\text{(explained variance)} \\ \\

&f_{df_\text{explained},\;df_\text{final}} = \frac{\widehat{\sigma}_\text{explained}^2}{\widehat{\sigma}_\text{final}^2} & && &\small\text{(F Test Statistic)} \\ \\
\end{align}
}\]


R Code For Regression F Test

The R code below manually implements the formulas above, uses the standard R functionality to achieve the same results, and then compares the two.

If you are new to R I suggest RStudio as an IDE.

######################################
## Generate Data, Declare Variables ##
######################################

rm(list = ls())
`%+%` <- function(a, b) paste(a, b, sep="")

IsConstFactor <- T    # control if constant factor in model
IsSigFactors <- T     # control if significant factors in model
IsNonSigFactor <- T   # control if non-significant factor in model

n <- 100              # sample size
sigma.model <- 40     # error standard deviation

# independent factors aka design matrix
X <- cbind(                         
  if(IsConstFactor == T){rep(1,n)}else{NULL}
  ,if(IsSigFactors == T){runif(n,-100,100)}
  ,if(IsSigFactors == T){rpois(n,10)}
  ,if(IsNonSigFactor == T){rexp(n,0.1)}else{NULL}
)

# coefficient vector
b <- rbind(
  if(IsConstFactor == T){40}else{NULL}  
  ,if(IsSigFactors == T){2.5}
  ,if(IsSigFactors == T){4}
  ,if(IsNonSigFactor == T){0}else{NULL}
)   

# error, linear regression model, baseline estimate
e <- cbind(rnorm(n,0,sigma.model))
y <- X %*% b + e
baseline <-                       
  if(IsConstFactor == T) {
    mean(y)
  } else {0}

# QR factorization of X for more
# efficient processing
qr <- qr(X)
Q <- qr.Q(qr)
R <- qr.R(qr)
rm(qr)

# labels
colnames(X) <- c("X" %+% seq(as.numeric(!IsConstFactor),
  ncol(X) - as.numeric(IsConstFactor))) 
rownames(b) <- c("b" %+% seq(as.numeric(!IsConstFactor),
  nrow(b) - as.numeric(IsConstFactor)))


###############################
## Linear Regression Using R ##
###############################

model.formula <- if(IsConstFactor == T) {
  "y ~ 1" %+% paste(" + " %+% colnames(X)[2:ncol(X)], collapse='')
} else {"y ~ 0 " %+% paste(" + " %+% colnames(X), collapse='')}
linear.model <- lm(model.formula,as.data.frame(X))


#######################################
## Perform Linear Regression Manually ##
#######################################

b_ <- solve(R) %*% t(Q) %*% y     # estimated coefficients   
#b_ <- solve(t(X) %*% X) %*% t(X) %*% y
rownames(b_) <- rownames(b)
y_ <- X %*% b_                    # estimated model                   

# degrees of freedom
df.baseline <- if(IsConstFactor == T) {n - 1} else {n}
df.final <- n - nrow(b_)
df.explained <- df.baseline - df.final

# residuals
res <- cbind(
  c(y - baseline)     # baseline/"total" error
  ,c(y - y_)          # final/"unexplained" error
  ,c(y_ - baseline)   # "explained" error ("total" - "unexplained")
); colnames(res) <- c("baseline","final","explained")

# variances
var_.baseline <- sum(res[,"baseline"]^2) / df.baseline
var_.final <- sum(res[,"final"]^2) / df.final
var_.explained <- sum(res[,"explained"]^2) / df.explained

# F-test
f.stat <- var_.explained / var_.final
p.value <- 1 - pf(f.stat,df.explained,df.final)
ret.model <- cbind(sqrt(var_.final),f.stat,df.explained,df.final,p.value)
colnames(ret.model) <- c("Std. Error","F-stat","Explained df","Final df","P-value")
rownames(ret.model) <- c("Model") 

# residuals
ret.residuals <- cbind(
  min(res[,"final"])
  ,quantile(res[,"final"],0.25)
  ,median(res[,"final"])
  ,quantile(res[,"final"],0.75)
  ,max(res[,"final"])
)
colnames(ret.residuals) <- c("Min","1Q","Median","3Q","Max")
rownames(ret.residuals) <- c("Residuals")


#############
## Compare ##
#############

summary(linear.model)
ret.residuals
ret.model


Step 1: Degrees of Freedom

Part 2 of this series defined degrees of freedom and showed why it is a necessary component for unbiased variance estimators. It also derived the degrees of freedom formula for an estimator where the sum of squared error has one actual term and one estimated term. See the example of the final model variance estimator below.
\[\large{
\begin{align}
\widehat{y} = \stackrel{n\times q}{X}\widehat{b} && \widehat{\sigma}_\text{final}^2 = \frac{1}{df_\text{final}}\sum_{i=1}^n(y_i\;-\;\widehat{y}_i)^2 && df_\text{final} = n-q
\end{align}
}\]
Unlike in the baseline and final model variance estimators, in the explained estimator BOTH terms of the squared error are estimators. Having both terms be estimators, with every factor from the baseline also contained in the final model, changes the degrees of freedom formula.

Explained Degrees of Freedom

Start with the explained sum of squared error. The null hypothesis implies that the expected difference between the final model and the baseline model is 0 (said differently, the expected values of their respective predicted values are equal).
\[\large{
\begin{align}
\text{E}\Big[SSE_\text{explained}\Big] &= \text{E}\Big[\sum_{i=1}^n(\widehat{y}_i - \tilde{y}_i-0)^2\Big] && \small (1-1) \\ \\
&=\text{E}\Big[\sum_{i=1}^n \widehat{y}_i^2 - 2\widehat{y}_i\tilde{y}_i + \tilde{y}_i^2\Big]
\end{align}
}\]
It is easier to factor out terms by first converting the sum to matrix/vector form. If you need a refresher, see the matrix quick reference.
\[\large{
\begin{align}
&=\text{E}\Big[\widehat{y}^\text{T}\widehat{y} \;-\; 2 \widehat{y}^\text{T}\tilde{y} + \tilde{y}^\text{T}\tilde{y}\Big] && \small (1-2) \\ \\
&= \text{E}\Big[\widehat{y}^\text{T}\widehat{y} \;-\; 2y^\text{T}X(X^\text{T}X)^{-1}X^\text{T}W(W^\text{T}W)^{-1}W^\text{T}y + \tilde{y}^\text{T}\tilde{y}\Big]
\end{align}
}\]
The \(\widehat{y}^\text{T}\tilde{y}\) term can be reduced to \(\tilde{y}^\text{T}\tilde{y}\). In order to understand how, consider the standard solution for the coefficient estimator multiplied by the design matrix.
\[\require{color}\large{
\begin{align}
X\widehat{b} =& {\color{aqua}\stackrel{P_X}{X(X^\text{T}X)^{-1}X^\text{T}}}y = P_Xy = \widehat{y} \\ \\
&{\color{aqua}\stackrel{P_X}{X(X^\text{T}X)^{-1}X^\text{T}}}X = P_XX = X
\end{align}
}\]
\(P_X\) is known as a “projection matrix.” It projects \(y\) onto the column space of \(X\), the result of which is the vector of model estimates \(\widehat{y}\). It should be obvious, both intuitively and via the matrix algebra, that projecting \(X\) onto its own column space returns \(X\) unchanged.

What is easy to overlook is that the projection operates on the right hand matrix columns individually, because that’s simply how matrix multiplication works. Look at the two examples below and keep in mind matrix multiplication is ROW TO COLUMN.
\[\require{color}\large{
\begin{align}
&\stackrel{P_X}{\begin{bmatrix}
P_{1,1} & \ldots & P_{1,n} \\
\ldots & \ldots & \ldots \\
P_{n,1} & \ldots & P_{n,n}
\end{bmatrix}} &

&\stackrel{X}{\begin{bmatrix}
{\color{orange}X_{1,1}} & {\color{aqua}X_{1,2}} & {\color{gold}X_{1,3}} \\
{\color{orange}\vdots} & {\color{aqua}\vdots} & {\color{gold}\vdots} \\
{\color{orange}X_{n,1}} & {\color{aqua}X_{n,2}} & {\color{gold}X_{n,3}} \\
\end{bmatrix}}& &=

\stackrel{X}{\begin{bmatrix}
{\color{orange}X_{1,1}} & {\color{aqua}X_{1,2}} & {\color{gold}X_{1,3}} \\
{\color{orange}\vdots} & {\color{aqua}\vdots} & {\color{gold}\vdots} \\
{\color{orange}X_{n,1}} & {\color{aqua}X_{n,2}} & {\color{gold}X_{n,3}} \\
\end{bmatrix}} \\ \\


&\stackrel{P_X}{\begin{bmatrix}
P_{1,1} & \ldots & P_{1,n} \\
\ldots & \ldots & \ldots \\
P_{n,1} & \ldots & P_{n,n}
\end{bmatrix}} &

&\stackrel{X}{\begin{bmatrix}
{\color{orange}X_{1,1}} \\
{\color{orange}\vdots} \\
{\color{orange}X_{n,1}}
\end{bmatrix}}& &=

\stackrel{X}{\begin{bmatrix}
{\color{orange}X_{1,1}} \\
{\color{orange}\vdots} \\
{\color{orange}X_{n,1}}
\end{bmatrix}}
\end{align}
}\]
Every row in the left hand matrix is multiplied by the first column in the right hand matrix and the results are placed top to bottom in the first column vector of the product. Removing columns from X does not change the projection result for the columns that remain. Recall that \(W\) is made up of columns from \(X\).
\[\large{
P_XX = X \quad\land\quad W=X_{[,1:o]}\quad\longrightarrow\quad P_XW = W
}\]
This equivalency allows us to factor out all the \(X\) terms from the middle term of the explained SSE, which reduces it to \(\tilde{y}^\text{T}\tilde{y}\).
\[\require{cancel}\large{
\begin{align}
&= \text{E}\Big[\widehat{y}^\text{T}\widehat{y} \;-\; 2y^\text{T}\stackrel{P_X}{\cancel{X(X^\text{T}X)^{-1}X^\text{T}}}W(W^\text{T}W)^{-1}W^\text{T}y + \tilde{y}^\text{T}\tilde{y}\Big] && \small (1-3) \\ \\
&= \text{E}\Big[\widehat{y}^\text{T}\widehat{y} \;-\; \tilde{y}^\text{T}\tilde{y}\Big] \\ \\
&= \sum_{i=1}^n \Big(\text{E}[\widehat{y}_i^2]\;-\;\text{E}[\tilde{y}_i^2]\Big)
\end{align}
}\]
The expected values of squared estimator random variables can be expanded by rearranging the basic variance equivalency \(\text{Var}[Y] = \text{E}[Y^2] - \text{E}[Y]^2\). The assumption of the null hypothesis for the F test implies that the means of both estimator random variables are equal.
\[\require{cancel}\large{
\begin{align}
= \sum_{i=1}^n\text{Var}[\widehat{y}_i] + \cancel{\text{E}[\widehat{y}_i]^2} \;-\; \text{Var}[\tilde{y}_i] \;-\; \cancel{\text{E}[\tilde{y}_i]^2} &&\small (1-4)
\end{align}
}\]
As was shown in part 2 of this series, the sum of the variances of the mean estimator random variables is equal to \(\sigma_Y^2\) multiplied by the number of factors in the estimator.
\[\large{
\begin{align}
\text{E}\Big[SSE_\text{explained}\Big] &= q\sigma_Y^2\;-\;o\sigma_Y^2 &&\small (1-5) \\ \\
&= (q\;-\;o)\sigma_Y^2 \longrightarrow df_\text{explained} = (q\;-\;o)
\end{align}
}\]
Finally, note that the degrees of freedom of the explained variance estimator IS NOT A FUNCTION OF \(n\). This will be important in understanding the mechanics of the F statistic.
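A simulation sketch of the result (the sample size, factor counts, and noise level below are arbitrary choices): under \(\text{H}_0\) the extra final-model factors have zero coefficients, and the average explained SSE lands near \((q-o)\sigma_Y^2\).

# sketch: under H0 the average explained SSE should be near (q - o) * sigma^2
set.seed(5)
n <- 200; q <- 4; o <- 1; sigma <- 10
sse.explained <- replicate(2000, {
  X <- cbind(1, matrix(rnorm(n * (q - 1)), n, q - 1))
  W <- X[,1:o,drop=FALSE]
  y <- 5 + rnorm(n,0,sigma)      # extra factors truly have 0 coefficients
  y.hat <- X %*% solve(t(X) %*% X) %*% t(X) %*% y
  y.tilde <- W %*% solve(t(W) %*% W) %*% t(W) %*% y
  sum((y.hat - y.tilde)^2)
})
mean(sse.explained)              # should be near (q - o) * sigma^2 = 300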


Step 2: Variance Estimators

The F statistic works through the mechanics of the baseline, final, and explained variance estimators. Under the null hypothesis the final and explained variance estimators will have an expected value equal to that of the baseline estimator.
\[\large{
\text{H}_0\longrightarrow \text{E}[\widehat{\sigma}_\text{baseline}^2] =\text{E}[\widehat{\sigma}_\text{final}^2] =\text{E}[\widehat{\sigma}_\text{explained}^2]
}\]
Under the alternative hypothesis the final model is more explanatory than the baseline and will have a smaller variance, while the explained variance estimator grows without bound as the sample size increases.
\[\large{
\text{H}_a\longrightarrow \sigma_\varepsilon^2 \le \text{E}[\widehat{\sigma}_\text{final}^2] < \text{E}[\widehat{\sigma}_\text{baseline}^2] < \text{E}[\widehat{\sigma}_\text{explained}^2]< \infty
}\]

Baseline Variance

The baseline variance estimator uses the baseline model estimated values from \(\tilde{y}\).
\[\large{
\begin{align}
&\tilde{y} = \stackrel{n\times o}{W}\tilde{b} \\ \\
&df_\text{baseline} = n-o \\ \\
&\widehat{\sigma}_\text{baseline}^2 = \frac{1}{df_\text{baseline}} \sum_{i=1}^n\big(y_i-\tilde{y}_i\big)^2
\end{align}}
\]

Final Variance

The final variance estimator uses the final model estimated values from \(\widehat{y}\). It is often referred to as the “unexplained” variance or error.

Under the null hypothesis the final model is no more explanatory than the baseline, which means the variance estimator’s expected value is equal to that of the baseline estimator.

Under the alternative the final estimator is more predictive of \(y\) than the baseline and therefore the expected value of the final variance estimator will be lower than the baseline. However, the final variance estimator's expected value cannot drop below the variance of the linear regression noise/error term \(\varepsilon\) (because that term by definition represents unexplainable variation).
\[\large{
\begin{align}
&\widehat{y} = \stackrel{n\times q}{X}\widehat{b}\\ \\
&df_\text{final} = n-q \\ \\
&\widehat{\sigma}_\text{final}^2 = \frac{1}{df_\text{final}} \sum_{i=1}^n\big(y_i-\widehat{y}_i\big)^2 \\ \\
&\text{H}_0 \longrightarrow \text{E}[\widehat{\sigma}_\text{final}^2] \quad=\quad \text{E}[\widehat{\sigma}_\text{baseline}^2] \\ \\
&\text{H}_a \longrightarrow (\sigma_Y^2=\sigma_\varepsilon^2) \quad\le\quad \text{E}[\widehat{\sigma}_\text{final}^2] \quad<\quad \text{E}[\widehat{\sigma}_\text{baseline}^2]
\end{align}}
\]

Explained Variance

The explained error is the difference between the baseline model error and the final model error. If an actual value is 10, the baseline prediction is 2, and the final prediction is 9, then 7 is what is “explained” by the final model (unexplained error is 1).
\[\large{
\stackrel{\text{baseline error}}{\tilde{e} = (y-\tilde{y})} \quad \stackrel{\text{model error}}{\widehat{e} = (y-\widehat{y})} \quad \stackrel{\text{explained error}}{(\tilde{e}-\widehat{e}) = (\widehat{y}-\tilde{y})}
}\]
Under the null hypothesis the expected difference in error is zero, but the individual final estimated values WILL deviate from the baseline values. This is due to the variance of the coefficient estimates for factors present in the final model but absent from the baseline.

As was shown in part 2 the variance of coefficient estimates is expressed in units of actual model variance: \(\text{Var}[\widehat{b}]=(X^\text{T}X)^{-1}\sigma_Y^2\). This allows the explained variance estimator to be adjusted, through its degrees of freedom, to have an expected value equal to baseline variance (which is our best estimate for \(\sigma_Y^2\) under \(\text{H}_0\)).

Under the alternative hypothesis the estimator grows without bound as the sample size increases, because the degrees of freedom is not a function of \(n\) (see the derivation of the degrees of freedom above). If the final model is more explanatory than the baseline, the sum of squared error is expected to increase with every observation, but the degrees of freedom will remain fixed.
\[\large{
\begin{align}
&df_\text{explained} = q-o\\ \\
&\widehat{\sigma}_\text{explained}^2 = \frac{1}{df_\text{explained}} \sum_{i=1}^n\big(\widehat{y}_i-\tilde{y}_i\big)^2 \\ \\
&\text{H}_0 \longrightarrow \text{E}[\widehat{\sigma}_\text{explained}^2] = \text{E}[\widehat{\sigma}_\text{baseline}^2] \\ \\
&\text{H}_a \longrightarrow \text{E}[\widehat{\sigma}_\text{baseline}^2] < \text{E}[\widehat{\sigma}_\text{explained}^2] < \infty
\end{align}
}\]
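A sketch of that divergence (hypothetical data with a genuinely explanatory factor): as \(n\) grows the explained sum of squared error keeps growing while the degrees of freedom stays fixed at \(q-o\), so the estimator grows without bound.

# sketch: under Ha the explained variance estimator grows with n
set.seed(6)
explained.var <- sapply(c(100, 1000, 10000), function(n) {
  x <- runif(n,-100,100)
  y <- 40 + 2.5 * x + rnorm(n,0,35)
  y.hat <- fitted(lm(y ~ x))
  y.tilde <- mean(y)                 # constant factor baseline
  sum((y.hat - y.tilde)^2) / 1       # df.explained = q - o = 1
})
explained.var                        # increases roughly linearly in n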


Step 3: Construct The Regression F Test Statistic

It can be shown that under the assumptions of linear regression, most notably normally distributed errors, the ratio of the explained and final model variance estimators forms an F random variable with degrees of freedom taken respectively from the two estimators.

Z, Chi-Squared, and F Random Variables

Any normally distributed random variable can be transformed into a standard normal by subtracting its own mean and dividing by its own standard deviation. A chi-squared random variable is a sum of one or more squared Z random variables (aka squared standard normal random variables) and has \(k\) degrees of freedom, which may not be the same as the number of values in the sum.

The F distribution is the ratio of two chi-squared random variables, each scaled by the inverse of its own degrees of freedom.
\[\large{
\begin{align}
&X\sim \text{Nor}(\mu,\sigma) & && & Z \sim \text{Nor}(0,1) = \frac{X-\mu}{\sigma} \\
&\chi^2\sim \text{Chi}(k) = \sum_{i=1}^n Z_i^2 & && & F\sim F(a,b) = \frac{\chi^2_a/a}{\chi^2_b/b}
\end{align}
}\]
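A quick simulation sketch of the F construction (the degrees of freedom and draw count below are arbitrary): build F draws from two independent chi-squared draws and compare against R's built-in F distribution.

# sketch: ratio of scaled chi-squared draws matches the built-in F distribution
set.seed(7)
a <- 3; b <- 96; m <- 100000

f.manual <- (rchisq(m, a) / a) / (rchisq(m, b) / b)
f.builtin <- rf(m, a, b)

quantile(f.manual, c(0.5, 0.9, 0.99))
quantile(f.builtin, c(0.5, 0.9, 0.99))    # nearly identical quantiles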

Constructing the Statistic

The F test requires the assumption of normally distributed errors. How to validate that assumption is a topic for a future post. For now take it as a given that the final model error and the explained error are normally distributed with mean 0 (the coefficient estimators are required to produce 0 error on average, otherwise they would not be the most likely values).
\[\large{
\begin{align}
&(y-\widehat{y})\sim\text{Nor}(0,\sigma_\text{final})\\ \\
&(\widehat{y}-\tilde{y})\sim\text{Nor}(0,\sigma_\text{explained})
\end{align}
}\]
Under the null hypothesis the expected value of the final model and explained error variance estimators are equal. This assumption of an expected equal variance can be used to transform the sum of squared errors of the estimators into Chi-squared random variables and therefore their quotient into an F random variable.

Start with the quotient of the variance estimators and factor in the shared expected variance.
\[\large{
\begin{align}
\frac{\widehat{\sigma}_\text{explained}^2}{\widehat{\sigma}_\text{final}^2} \cdot \frac{\sigma^2}{\sigma^2} = \frac{\frac{1}{df_\text{explained}}\sum_{i=1}^n(\frac{\widehat{y}_i-\tilde{y}_i}{\sigma})^2}{\frac{1}{df_\text{final}}\sum_{i=1}^n(\frac{y_i-\widehat{y}_i}{\sigma})^2} && \small (2-1)
\end{align}
}\]
The sums are now sums of normally distributed values divided by their standard deviation and squared, which is to say they are sums of squared Z random variables. That makes the sums themselves chi-squared random variables.
\[\large{
\begin{align}
&\frac{\frac{1}{df_\text{explained}}\sum_{i=1}^n(\frac{\widehat{y}_i-\tilde{y}_i}{\sigma})^2}{\frac{1}{df_\text{final}}\sum_{i=1}^n(\frac{y_i-\widehat{y}_i}{\sigma})^2} = \frac{\frac{1}{df_\text{explained}}\chi_{df_\text{explained}}^2}{\frac{1}{df_\text{final}}\chi_{df_\text{final}}^2} && \small (2-2)
\end{align}
}\]
Remember that the degrees of freedom for the Chi-squared random variable is NOT the number of random variables in the sum, but the degrees of freedom of the sum of squared error component.
\[{
\sum_{i=1}^n Z_i^2 = \sum_{i=1}^n(\frac{X_i-\widehat{\mu}_X}{\sigma_X})^2 = \frac{\sum_{i=1}^n(X_i-\widehat{\mu}_X)^2}{\sigma_X^2} = \frac{SSE_{\{df=(n-1)\}}}{\sigma_X^2}
}\]
The quotient of two chi-squared random variables scaled by their degrees of freedom is the definition of an F random variable.
\[\large{
\begin{align}
F_{df_\text{explained},\;df_\text{final}} = \frac{\frac{1}{df_\text{explained}}\chi_{df_\text{explained}}^2}{\frac{1}{df_\text{final}}\chi_{df_\text{final}}^2} && \small (2-3)
\end{align}
}\]
The \(\sigma\) can be canceled out from the expanded chi-squared quotient.
\[\require{cancel}\large{
\begin{align}
F_{df_\text{explained},\;df_\text{final}} = \frac{\frac{1}{df_\text{explained}}\sum_{i=1}^n(\frac{\widehat{y}_i-\tilde{y}_i}{\cancel{\sigma}})^2}{\frac{1}{df_\text{final}}\sum_{i=1}^n(\frac{y_i-\widehat{y}_i}{\cancel{\sigma}})^2}
&& \small (2-4)
\end{align}
}\]
We now have that the F statistic is equal to the quotient of the explained variance estimator and the final model variance estimator.
\[\large{
\begin{align}
f_{df_\text{explained},\;df_\text{final}} = \frac{\widehat{\sigma}_\text{explained}^2}{\widehat{\sigma}_\text{final}^2} && \small (2-5)
\end{align}
}\]
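As a closing sketch (illustrative data; the factor values and coefficients below are arbitrary), the quotient of the explained and final variance estimators matches the F statistic reported by summary(lm()).

# sketch: manual quotient of variance estimators vs. summary(lm())'s F statistic
set.seed(8)
n <- 100
x1 <- runif(n,-100,100); x2 <- rpois(n,10)
y <- 40 + 2.5 * x1 + 4 * x2 + rnorm(n,0,35)

fit <- lm(y ~ x1 + x2)
y.hat <- fitted(fit)
y.tilde <- mean(y)                             # constant factor baseline

df.final <- n - 3                              # q = 3 (incl. constant)
df.explained <- 3 - 1                          # q - o, o = 1

f.manual <- (sum((y.hat - y.tilde)^2) / df.explained) /
  (sum((y - y.hat)^2) / df.final)
c(f.manual, summary(fit)$fstatistic["value"])  # same value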
