# Residuals example

Imagine that your ability to play a musical instrument (`response`) is related to your `age` (perhaps music gets easier to pick up as you age?) but also to how much `training` you have put in.

```
age      = 1:30
training = 0.1 * sample(1:30)
response = age + training + rnorm(30, 0, 0.1)
```

We would collect some observations and then fit a multiple-regression GLM of `response` on EVs (a.k.a. predictors) for `age` and `training` (here with their interaction included as well). The overall omnibus ANOVA F-test shows that our model is highly significant, and the individual parameter estimates (betas) for the strengths (slopes) of the two main effects are significantly non-zero too, while the interaction is not.

`summary(lm(response ~ age*training))`

Output:

```
Call:
lm(formula = response ~ age * training)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.17615 -0.06438  0.01084  0.05826  0.18224 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.039688   0.073307  -0.541    0.593    
age           1.001491   0.004149 241.360   <2e-16 ***
training      0.989671   0.039442  25.092   <2e-16 ***
age:training  0.001147   0.002358   0.486    0.631    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.08968 on 26 degrees of freedom
Multiple R-squared:  0.9999,	Adjusted R-squared:  0.9999 
F-statistic: 9.245e+04 on 3 and 26 DF,  p-value: < 2.2e-16
```
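The omnibus F at the bottom of this printout can also be pulled out programmatically rather than read off the console. A minimal sketch, assuming the simulated `age`, `training`, and `response` vectors from the first code block are still in the workspace:

```
fit <- lm(response ~ age * training)
summary(fit)$fstatistic  # overall model F with its numerator and denominator df
anova(fit)               # sequential (Type I) F-tests for each term in the model
```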

But when we go to plot `response` vs. `training`, we don't see the highly significant relationship we just uncovered. It just looks like noise.

`plot(response ~ training)`

This is because we are plotting the raw data, which still contain the (much larger) effect of age. We can visualize the part of this interdependent relationship that we want by first fitting a simple model of `response` on `age`, and then plotting that model's residuals against `training`.

```
lm.age   = lm(response ~ age)
lm.train = lm(lm.age$residuals ~ training)

plot(lm.age$residuals ~ training, ylab = 'response (residualized for age)')
abline(lm.train)  # abline() accepts a fitted lm and draws its intercept/slope
```