Imagine that your ability to play a musical instrument (`response`) is related to your `age` (perhaps music gets easier to pick up as you age?) but also to how much `training` you have put in.

age = 1:30                                     # ages 1 to 30
training = 0.1 * sample(1:30)                  # amount of training, shuffled so it is unrelated to age
response = age + training + rnorm(30, 0, 0.1)  # true slope of 1 for each predictor, plus a little noise

We would collect some observations and then do a multiple regression GLM on the `response` variable with EVs (a.k.a. predictors) for `age` and `training`. An overall omnibus ANOVA F-test would say that our model is highly significant, and the individual parameter estimates (betas) of the strengths (slopes) of the two relationships would be significantly non-zero as well.

summary(lm(response ~ age*training))

Output:

Call:
lm(formula = response ~ age * training)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.17615 -0.06438  0.01084  0.05826  0.18224 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.039688   0.073307  -0.541    0.593    
age           1.001491   0.004149 241.360   <2e-16 ***
training      0.989671   0.039442  25.092   <2e-16 ***
age:training  0.001147   0.002358   0.486    0.631    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.08968 on 26 degrees of freedom
Multiple R-squared:  0.9999,	Adjusted R-squared:  0.9999 
F-statistic: 9.245e+04 on 3 and 26 DF,  p-value: < 2.2e-16
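Notice that the `age:training` interaction term is not significant here. Since the description above mentions only the two EVs themselves, a purely additive model is a natural variant to sketch (the `set.seed` call is an addition for reproducibility, not part of the original simulation):

```r
# Regenerate the example data (the seed value is arbitrary)
set.seed(1)
age <- 1:30
training <- 0.1 * sample(1:30)
response <- age + training + rnorm(30, 0, 0.1)

# Additive model: main effects only, no interaction
fit <- lm(response ~ age + training)
summary(fit)$coefficients  # both slopes should land close to their true value of 1
```

With the interaction dropped, the two slope estimates recover the simulated effects of 1 per year of age and 1 per unit of training.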

But when we go to plot `response` vs. `training`, we don't see the highly significant relationship we just uncovered. It just looks like noise.

plot(response ~ training)
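To quantify why the raw scatterplot is so uninformative, compare the marginal correlations (a sketch on the same simulated data; the seed is an addition for reproducibility): `age` accounts for nearly all of the variance in `response`, so the contribution of `training` is visually swamped.

```r
set.seed(1)  # arbitrary seed, not in the original
age <- 1:30
training <- 0.1 * sample(1:30)
response <- age + training + rnorm(30, 0, 0.1)

cor(response, age)       # near 1: age dominates the variance in response
cor(response, training)  # much smaller: training's effect is hidden in the raw plot
```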

This is because we are plotting the raw data and haven't yet accounted for the effects of age. We can visualize the part of this interdependent relationship that we want by fitting a simple model for age alone, and then plotting its residuals against training.

lm.age = lm(response ~ age)           # model the effect of age alone
lm.train = lm(lm.age$res ~ training)  # regress the age-adjusted residuals on training
plot(lm.age$res ~ training, ylab='response (residualized for age)')
abline(lm.train$coef[1], lm.train$coef[2])  # overlay the fitted line
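A similar view is available without residualizing by hand: base R's `termplot()` draws a partial-residual plot for each term of a fitted model. A minimal sketch on the same simulated data (the seed and the additive model are additions for illustration):

```r
set.seed(1)  # arbitrary seed, not in the original
age <- 1:30
training <- 0.1 * sample(1:30)
response <- age + training + rnorm(30, 0, 0.1)

fit <- lm(response ~ age + training)
# Partial-residual plot for training, i.e. response adjusted for age
termplot(fit, partial.resid = TRUE, terms = "training")
```

The slope shown for `training` is the same age-adjusted slope recovered by the two-step residualization above.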

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International