An R Introduction to Statistics

Significance Test for MLR

Assume that the error term ϵ in the multiple linear regression (MLR) model is independent of xk (k = 1, 2, ..., p), and is normally distributed, with zero mean and constant variance. Under these assumptions we can test, for each independent variable xk (k = 1, 2, ..., p), the null hypothesis that its coefficient is zero, and so decide whether there is a significant relationship between the dependent variable y and that variable, given the other variables in the model.

Problem

Decide which of the independent variables in the multiple linear regression model of the data set stackloss are statistically significant at .05 significance level.

Solution

We apply the lm function to a formula that describes the variable stack.loss by the variables Air.Flow, Water.Temp and Acid.Conc., and we save the linear regression model in a new variable stackloss.lm.

> stackloss.lm = lm(stack.loss ~ 
+     Air.Flow + Water.Temp + Acid.Conc., 
+     data=stackloss)

The t values of the independent variables, together with their p-values in the Pr(>|t|) column, can be found with the summary function.

> summary(stackloss.lm) 
 
Call: 
lm(formula = stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., 
    data = stackloss) 
 
Residuals: 
   Min     1Q Median     3Q    Max 
-7.238 -1.712 -0.455  2.361  5.698 
 
Coefficients: 
            Estimate Std. Error t value Pr(>|t|) 
(Intercept)  -39.920     11.896   -3.36   0.0038 ** 
Air.Flow       0.716      0.135    5.31  5.8e-05 *** 
Water.Temp     1.295      0.368    3.52   0.0026 ** 
Acid.Conc.    -0.152      0.156   -0.97   0.3440 
--- 
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 3.24 on 17 degrees of freedom 
Multiple R-squared: 0.914,      Adjusted R-squared: 0.898 
F-statistic: 59.9 on 3 and 17 DF,  p-value: 3.02e-09
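
The t value and Pr(>|t|) columns can be reproduced by hand, which may make the test easier to follow. The following is a minimal sketch, not part of the original solution: it refits the model so that it runs on its own, and the names coefs, t.manual and p.manual are illustrative only.

```r
# Refit the model so this block is self-contained (stackloss ships with R).
stackloss.lm <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
                   data = stackloss)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(stackloss.lm))

# Residual degrees of freedom (17 for this model)
df <- df.residual(stackloss.lm)

# t statistic: each estimate divided by its standard error
t.manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p-value from the t distribution with df degrees of freedom
p.manual <- 2 * pt(abs(t.manual), df = df, lower.tail = FALSE)

# Show the hand-computed values side by side
cbind(t.manual, p.manual)
```

The two columns should agree with the t value and Pr(>|t|) columns printed by summary, up to rounding.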

Answer

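As the p-values of Air.Flow and Water.Temp are less than 0.05, they are both statistically significant in the multiple linear regression model of stackloss. The p-value of Acid.Conc. (0.3440) is greater than 0.05, so it is not statistically significant at that level.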
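
The same decision can be made programmatically rather than by reading the table. This is a sketch under the assumption that the intercept is not of interest; the names alpha, pvals and significant are illustrative and do not appear in the original solution.

```r
# Refit the model so this block is self-contained (stackloss ships with R).
stackloss.lm <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
                   data = stackloss)

alpha <- 0.05   # significance level from the problem statement

# p-values from the Pr(>|t|) column of the coefficient table
pvals <- coef(summary(stackloss.lm))[, "Pr(>|t|)"]

# Drop the intercept, which is not an independent variable
pvals <- pvals[names(pvals) != "(Intercept)"]

# Names of the variables whose p-value falls below alpha
significant <- names(pvals)[pvals < alpha]
significant
# expected: "Air.Flow" "Water.Temp"
```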

Note

Further details of the summary function for linear regression models can be found in the R documentation.

> help(summary.lm)
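
The object returned by summary can also be inspected directly, which is one way to get at the numbers in the printed report. The sketch below refits the model so that it runs on its own; the variable names s, f and f.pvalue are illustrative only.

```r
# Refit the model so this block is self-contained (stackloss ships with R).
stackloss.lm <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
                   data = stackloss)

s <- summary(stackloss.lm)

names(s)           # components stored in the summary object
s$coefficients     # the coefficient table as a numeric matrix
s$sigma            # residual standard error
s$r.squared        # multiple R-squared
s$adj.r.squared    # adjusted R-squared

# F statistic with its numerator and denominator degrees of freedom
f <- s$fstatistic

# Overall p-value of the F test, computed from the upper tail of the F distribution
f.pvalue <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
f.pvalue
```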