5 Tutorial 3b: Statistical Inference in Simple Regression
Tutorial 3 established two core results about the OLS slope estimator \(\hat\beta_1\):
- Unbiasedness: \(\mathbb{E}[\hat\beta_1 \mid X] = \beta_1\)
- Sampling variance: \(\text{Var}(\hat\beta_1 \mid X) = \dfrac{\sigma^2}{S_{xx}}\), where \(S_{xx} = \sum_{i=1}^n (X_i - \bar{X})^2\)
We also showed that \(\hat\sigma^2 = \dfrac{RSS}{n-2}\) is an unbiased estimator of \(\sigma^2\).
We know that \(\hat\beta_1\) is centered at the truth and how spread out it is. What we have not yet done is ask: are the data consistent with the true \(\beta_1\) being zero? That question — hypothesis testing — is the subject of this tutorial.
What we add in this tutorial:
- The normality assumption and what it implies for the distribution of \(\hat\beta_1\)
- The \(t\)-statistic and its \(t(n-2)\) null distribution
- Two-sided hypothesis tests and p-values
- Confidence intervals for \(\beta_1\)
- The \(F\)-test for the whole regression
5.1 Part A. The Sampling Distribution of \(\hat\beta_1\)
5.1.1 A1. An additional assumption: normality of errors
Narrative idea. Unbiasedness and the variance formula hold under the four Gauss-Markov assumptions alone. But to pin down the exact shape of the sampling distribution of \(\hat\beta_1\) — and therefore build exact tests — we need one more assumption.
Assumption (Normality of the error term):
\[ u_i \mid X \sim \mathcal{N}(0, \sigma^2) \]
This says the errors are normally distributed, with mean zero and constant variance. Combined with the earlier assumptions, this is the classical normal linear model.
Why does this give us the distribution of \(\hat\beta_1\)? Recall that \(\hat\beta_1\) is a linear combination of the \(Y_i\)’s, and each \(Y_i = \beta_0 + \beta_1 X_i + u_i\). Since the \(u_i\) are normal, \(Y_i\) is normal, and a linear combination of normals is normal:
\[ \hat\beta_1 \mid X \sim \mathcal{N}\!\left(\beta_1,\; \frac{\sigma^2}{S_{xx}}\right) \]
Standardising by subtracting the mean and dividing by the standard deviation:
\[ \frac{\hat\beta_1 - \beta_1}{\sigma / \sqrt{S_{xx}}} \sim \mathcal{N}(0,1) \]
This would let us build tests — if we knew \(\sigma\). We don’t. We replace \(\sigma\) with \(\hat\sigma = \sqrt{\hat\sigma^2}\), which introduces additional randomness.
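The claim that \(\hat\beta_1 \mid X \sim \mathcal{N}(\beta_1, \sigma^2/S_{xx})\) can be checked by simulation: hold \(X\) fixed, redraw the normal errors many times, and compare the empirical mean and spread of the slope estimates with the theory. A minimal sketch (all parameter values below are illustrative choices, not from the text):

```r
# Simulation sketch: with X held fixed and normal errors, the OLS slopes
# should be centred at beta1 with standard deviation sigma / sqrt(S_xx).
set.seed(1)
n     <- 40
beta0 <- 1; beta1 <- 2; sigma <- 1.5      # illustrative true values
X     <- runif(n, 0, 10)                  # fixed design across replications
S_xx  <- sum((X - mean(X))^2)

slopes <- replicate(2000, {
  u <- rnorm(n, 0, sigma)
  Y <- beta0 + beta1 * X + u
  coef(lm(Y ~ X))[2]
})

c(mean_slope     = mean(slopes),          # close to beta1 = 2
  sd_slope       = sd(slopes),            # close to sigma / sqrt(S_xx)
  theoretical_sd = sigma / sqrt(S_xx))
```

A histogram of `slopes` would also look normal, which is the distributional claim itself rather than just its first two moments.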
5.1.2 A2. The \(t\)-statistic and the \(t(n-2)\) distribution
Key result. When \(\sigma\) is replaced by \(\hat\sigma\):
\[ \boxed{t = \frac{\hat\beta_1 - \beta_1}{\widehat{\text{se}}(\hat\beta_1)} \sim t(n-2)} \]
where the standard error of \(\hat\beta_1\) is:
\[ \widehat{\text{se}}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{S_{xx}}} \]
The distribution is Student’s \(t\) with \(n - 2\) degrees of freedom. The \(-2\) comes from the two parameters estimated (\(\hat\beta_0\), \(\hat\beta_1\)) — each one uses up one degree of freedom.
Intuition for the \(t\) vs \(Z\) distinction. With a \(Z\)-statistic we divide by the known \(\sigma\); with a \(t\)-statistic we divide by the estimated \(\hat\sigma\). Estimating \(\sigma\) adds extra uncertainty, making the distribution heavier-tailed than the standard normal. As \(n \to \infty\), \(\hat\sigma \to \sigma\) and \(t(n-2) \to \mathcal{N}(0,1)\).
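The convergence of \(t(n-2)\) to \(\mathcal{N}(0,1)\) is visible in the critical values themselves. A quick sketch (the sample sizes are illustrative):

```r
# Two-sided 5% critical values of t(n-2) for growing n: they shrink
# toward the normal value qnorm(0.975) = 1.96 as the degrees of
# freedom increase.
crit_vals <- sapply(c(10, 30, 100, 1000), function(n) qt(0.975, df = n - 2))
round(crit_vals, 3)       # compare with qnorm(0.975)
```

For small \(n\) the heavier tails demand a noticeably larger hurdle than 1.96; by \(n = 1000\) the difference is negligible.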
5.1.3 A3. What summary(lm(...)) reports
In R, summary(lm(Y ~ X)) prints a coefficient table like:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.234 0.312 3.96 0.0002
X 0.871 0.098 8.89 <2e-16
Each row corresponds to one coefficient:
| Column | Meaning |
|---|---|
| Estimate | \(\hat\beta_j\) |
| Std. Error | \(\widehat{\text{se}}(\hat\beta_j)\) |
| t value | \(t = \hat\beta_j / \widehat{\text{se}}(\hat\beta_j)\) (testing \(H_0: \beta_j = 0\)) |
| Pr(>\|t\|) | two-sided p-value |
5.2 Part B. Hypothesis Testing
5.2.1 B1. The testing framework
A hypothesis test asks: is the data consistent with a specific claim about \(\beta_1\)?
Null hypothesis: \(H_0: \beta_1 = \beta_1^0\) (a specific value, usually zero)
Alternative hypothesis: \(H_1: \beta_1 \neq \beta_1^0\) (two-sided)
The test statistic under \(H_0\):
\[ \boxed{t = \frac{\hat\beta_1 - \beta_1^0}{\widehat{\text{se}}(\hat\beta_1)} \sim t(n-2) \text{ under } H_0} \]
Decision rule. Fix a significance level \(\alpha\) (typically 0.05). Reject \(H_0\) if:
\[ |t| > c_{\alpha/2} \]
where \(c_{\alpha/2}\) is the \(100(1 - \alpha/2)\)th percentile of the \(t(n-2)\) distribution. For large \(n\), \(c_{0.025} \approx 1.96\).
5.2.2 B2. The p-value
The p-value is the probability, under \(H_0\), of observing a test statistic at least as extreme as the one we obtained:
\[ p = P(|T_{n-2}| \geq |t_{\text{obs}}|) = 2\,P(T_{n-2} \geq |t_{\text{obs}}|) \]
A small p-value means the data are unlikely under \(H_0\).
Decision rule equivalently: Reject \(H_0\) if \(p < \alpha\).
Common (mis)interpretation. The p-value is not the probability that \(H_0\) is true. It is the probability of the data (or more extreme data), given that \(H_0\) is true.
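One way to see what the p-value is (and is not): when \(H_0\) is true, the p-value is uniformly distributed on \([0,1]\), so the rule "reject if \(p < 0.05\)" wrongly rejects a true null in about 5% of samples. A simulation sketch (sample size and coefficients are illustrative):

```r
# Simulate many datasets in which the true slope is exactly zero and
# record the two-sided p-value for the slope each time. Rejecting at
# p < 0.05 should then occur in roughly 5% of replications.
set.seed(42)
pvals <- replicate(2000, {
  X <- rnorm(30)
  Y <- 1 + rnorm(30)                       # true slope is zero
  coef(summary(lm(Y ~ X)))["X", "Pr(>|t|)"]
})
mean(pvals < 0.05)                         # close to 0.05, the type I error rate
```

This is the frequency guarantee behind the significance level \(\alpha\); it says nothing about the probability that \(H_0\) is true in any one sample.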
5.2.3 B3. Worked numerical example
5.2.3.1 Question
Suppose \(n = 50\), \(\hat\beta_1 = 0.45\), and \(\widehat{\text{se}}(\hat\beta_1) = 0.18\). Test \(H_0: \beta_1 = 0\) against \(H_1: \beta_1 \neq 0\) at the 5% level. State the t-statistic, the critical value, the decision, and a rough p-value.
5.2.3.2 Solution
t-statistic:
\[t = \frac{0.45 - 0}{0.18} = 2.50\]
Critical value. For \(t(n-2) = t(48)\) at \(\alpha = 0.05\) (two-sided), \(c_{0.025} \approx 2.01\).
Decision. Since \(|t| = 2.50 > 2.01\), reject \(H_0\) at the 5% level. There is sufficient evidence that \(\beta_1 \neq 0\).
p-value. \(P(|T_{48}| \geq 2.50) \approx 0.016\). Since \(0.016 < 0.05\), the conclusion is the same.
# Replicate the example
beta1_hat <- 0.45
se_hat <- 0.18
n <- 50
t_stat <- beta1_hat / se_hat
df <- n - 2
crit_val <- qt(0.975, df) # two-sided 5% critical value
p_val <- 2 * pt(-abs(t_stat), df) # two-sided p-value
round(c(t_stat = t_stat, critical_value = crit_val, p_value = p_val), 4)
##         t_stat critical_value        p_value
##         2.5000         2.0106         0.0159
5.3 Part C. Confidence Intervals
5.3.1 C1. Construction
A \(100(1-\alpha)\%\) confidence interval for \(\beta_1\) is:
\[ \boxed{\hat\beta_1 \pm c_{\alpha/2} \cdot \widehat{\text{se}}(\hat\beta_1)} \]
where \(c_{\alpha/2} = t_{1-\alpha/2}(n-2)\) is the critical value from the \(t(n-2)\) distribution.
For \(\alpha = 0.05\) and large \(n\): \(c_{0.025} \approx 1.96\).
Interpretation. If we were to repeat the sampling procedure many times, approximately \(100(1-\alpha)\%\) of the resulting intervals would contain the true \(\beta_1\). A single realized interval either contains \(\beta_1\) or it does not — the probability statement refers to the procedure, not the specific interval.
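This "probability of the procedure" reading can be demonstrated directly: redraw the sample many times, build the 95% interval each time, and count how often it contains the true slope. A sketch (the true coefficients and sample size are illustrative):

```r
# Coverage sketch: repeat the sampling procedure many times and count
# how often the 95% interval contains the true slope (here beta1 = 0.8).
set.seed(7)
covered <- replicate(2000, {
  X  <- rnorm(25)
  Y  <- 0.5 + 0.8 * X + rnorm(25)
  ci <- confint(lm(Y ~ X), "X", level = 0.95)
  ci[1] <= 0.8 && 0.8 <= ci[2]
})
mean(covered)                              # close to 0.95
```

Each individual interval either covers 0.8 or it does not; it is the long-run frequency that equals 95%.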
5.3.2 C2. Connection to hypothesis testing
There is an exact duality:
Reject \(H_0: \beta_1 = \beta_1^0\) at level \(\alpha\) \(\iff\) \(\beta_1^0\) lies outside the \(100(1-\alpha)\%\) CI.
A confidence interval is therefore a compact summary of all values of \(\beta_1^0\) that the data cannot reject.
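The duality can be verified mechanically: for a grid of hypothesised values \(\beta_1^0\), the test rejects exactly when the value falls outside the interval. A sketch reusing the numbers from the worked example above (\(n = 50\), \(\hat\beta_1 = 0.45\), \(\widehat{\text{se}} = 0.18\)):

```r
# For each hypothesised slope b0, check that |t| > critical value
# agrees exactly with "b0 lies outside the 95% CI".
beta1_hat <- 0.45; se_hat <- 0.18; df <- 48
crit <- qt(0.975, df)
ci   <- beta1_hat + c(-1, 1) * crit * se_hat

b0_grid <- seq(-0.2, 1.1, by = 0.05)
reject  <- abs((beta1_hat - b0_grid) / se_hat) > crit
outside <- b0_grid < ci[1] | b0_grid > ci[2]
all(reject == outside)                     # TRUE: the two criteria agree
```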
5.3.3 C3. Worked example (continued)
5.3.3.1 Question
Using the same setup (\(n=50\), \(\hat\beta_1 = 0.45\), \(\widehat{\text{se}} = 0.18\)), construct a 95% confidence interval for \(\beta_1\).
5.3.3.2 Solution
\[ CI_{95\%} = 0.45 \pm 2.01 \times 0.18 = 0.45 \pm 0.362 = [0.088,\ 0.812] \]
Since zero is outside \([0.088, 0.812]\), we reject \(H_0: \beta_1 = 0\) — consistent with the \(t\)-test above.
lower <- beta1_hat - crit_val * se_hat
upper <- beta1_hat + crit_val * se_hat
round(c(lower = lower, upper = upper), 3)
## lower upper
## 0.088 0.812
5.4 Part D. The \(F\)-Test for the Whole Regression
5.4.1 D1. Motivation
The \(t\)-test on \(\hat\beta_1\) tests one coefficient. The \(F\)-test asks whether the entire regression explains a significant amount of variation in \(Y\).
In simple regression this is equivalent to testing \(H_0: \beta_1 = 0\), and the \(F\)-statistic equals \(t^2\). The \(F\)-test becomes indispensable in multiple regression, where we test all slopes jointly.
5.4.2 D2. Construction from TSS = ESS + RSS
Recall from Tutorial 2: \(TSS = ESS + RSS\).
The \(F\)-statistic compares the variation explained by the model to the variation that remains unexplained, adjusted for degrees of freedom:
\[ \boxed{F = \frac{ESS / k}{RSS / (n - k - 1)}} \]
For simple regression (\(k = 1\) regressor):
\[ F = \frac{ESS / 1}{RSS / (n-2)} = \frac{ESS}{\hat\sigma^2} \]
Under \(H_0: \beta_1 = 0\), \(F \sim F(1,\, n-2)\).
Decision rule. Reject \(H_0\) if \(F > F_\alpha(1, n-2)\), the \(\alpha\)-level critical value of the \(F(1,n-2)\) distribution.
5.4.3 D3. \(R^2\) and the \(F\)-statistic
Since \(TSS = ESS + RSS\):
\[ R^2 = \frac{ESS}{TSS} \implies ESS = R^2 \cdot TSS, \quad RSS = (1 - R^2) \cdot TSS \]
Substituting into the \(F\)-formula:
\[ F = \frac{R^2 / 1}{(1 - R^2)/(n-2)} = \frac{R^2 (n-2)}{1 - R^2} \]
This shows that a higher \(R^2\) implies a larger \(F\)-statistic — a more significant regression. But the two measure different things: \(R^2\) is about explanatory power, \(F\) is about statistical significance. A regression can have a tiny \(R^2\) (low explanatory power) but a highly significant \(F\) (the explained variation, though small, is too large to be due to chance alone), especially with large \(n\).
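Both identities — \(F = R^2(n-2)/(1-R^2)\) and \(F = t^2\) — are exact in simple regression and easy to confirm numerically. A sketch on a small simulated dataset (the data-generating values are illustrative):

```r
# Check that R's reported F-statistic equals both R^2(n-2)/(1-R^2)
# and the squared t-statistic of the slope.
set.seed(123)
n <- 60
X <- rnorm(n)
Y <- 1 + 0.5 * X + rnorm(n)

s <- summary(lm(Y ~ X))
F_reported <- unname(s$fstatistic["value"])
F_from_R2  <- s$r.squared * (n - 2) / (1 - s$r.squared)
t_slope    <- coef(s)["X", "t value"]

c(F_reported = F_reported, F_from_R2 = F_from_R2, t_squared = t_slope^2)
```

All three numbers agree to machine precision.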
5.4.4 D4. Quick simulation
set.seed(2026)
n <- 80
X <- rnorm(n, 2, 1)
u <- rnorm(n, 0, 2)
Y <- 0.5 + 0.8 * X + u
fit <- lm(Y ~ X)
s <- summary(fit)
# F-statistic and its p-value (reported by R)
s$fstatistic
##     value     numdf     dendf
##  6.297228  1.000000 78.000000
# Verify manually
TSS <- sum((Y - mean(Y))^2)
ESS <- sum((fitted(fit) - mean(Y))^2)
RSS <- sum(resid(fit)^2)
F_manual <- (ESS / 1) / (RSS / (n - 2))
round(c(TSS = TSS, ESS = ESS, RSS = RSS,
R2 = s$r.squared,
        F_manual = F_manual), 4)
##      TSS      ESS      RSS       R2 F_manual
## 379.1344  28.3224 350.8121   0.0747   6.2972
The manual \(F\) should match s$fstatistic[1].
5.5 Part E. Application — 401(k) Data
We return to the 401(k) dataset from Tutorial 2 and now interpret the full summary(lm(...)) output.
library(wooldridge)
data("k401k")
df <- k401k
fit_401k <- lm(prate ~ mrate, data = df)
s_401k <- summary(fit_401k)
s_401k
##
## Call:
## lm(formula = prate ~ mrate, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -82.303 -8.184 5.178 12.712 16.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 83.0755 0.5633 147.48 <2e-16 ***
## mrate 5.8611 0.5270 11.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.09 on 1532 degrees of freedom
## Multiple R-squared: 0.0747, Adjusted R-squared: 0.0741
## F-statistic: 123.7 on 1 and 1532 DF, p-value: < 2.2e-16
5.5.0.1 Question
(a) Report \(\hat\beta_1\), \(\widehat{\text{se}}(\hat\beta_1)\), the \(t\)-statistic, and the p-value for the slope.
5.5.0.2 Solution
coef_tbl <- coef(s_401k)
coef_tbl
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 83.075455  0.5632844 147.48402 0.000000e+00
## mrate        5.861079  0.5270107  11.12137 1.097641e-27
The slope estimate is \(\hat\beta_1 \approx 5.86\), with \(\widehat{\text{se}} \approx 0.53\), giving \(t \approx 11.1\), \(p < 0.001\). We strongly reject \(H_0: \beta_1 = 0\).
5.5.0.3 Question
(b) Construct a 95% confidence interval for \(\beta_1\) using confint(). Interpret it economically.
5.5.0.4 Solution
confint(fit_401k, level = 0.95)
##                2.5 %    97.5 %
## (Intercept) 81.97057 84.180346
## mrate        4.82734  6.894818
The 95% CI for the slope is approximately \([4.83,\ 6.89]\). We are 95% confident that a one-unit increase in the match rate is associated with an increase in the participation rate of between 4.8 and 6.9 percentage points, in the population of similar plans.
5.5.0.5 Question
(c) Report the \(F\)-statistic and its p-value. Is the regression significant at the 1% level?
5.5.0.6 Solution
s_401k$fstatistic
##     value     numdf     dendf
##  123.6848    1.0000 1532.0000
pf(s_401k$fstatistic[1], s_401k$fstatistic[2],
   s_401k$fstatistic[3], lower.tail = FALSE)
##        value
## 1.097641e-27
The \(F\)-statistic is approximately 123.7, with a p-value far below 0.01. The regression is highly significant. Note that \(F = t^2\) (check: \(11.12^2 \approx 123.7\)), as expected in simple regression.
5.5.0.7 Question
(d) The \(R^2\) is small (about 0.08). Does this mean the slope estimate is unreliable?
5.5.0.8 Solution
No. \(R^2 \approx 0.08\) means that about 92% of the variation in participation rates is driven by factors other than the match rate — plan characteristics, firm type, worker demographics, etc. But this does not make \(\hat\beta_1\) unreliable. The \(t\)-statistic of 11.1 and the narrow confidence interval both show that the effect of the match rate is precisely estimated. Low \(R^2\) and high precision are not contradictory: they reflect that (i) \(X\) explains a modest share of total variation, but (ii) it does so consistently across the sample.
5.6 Summary of Key Formulas
| Concept | Formula | Distribution under \(H_0\) |
|---|---|---|
| Standard error of \(\hat\beta_1\) | \(\widehat{\text{se}}(\hat\beta_1) = \hat\sigma / \sqrt{S_{xx}}\) | — |
| \(t\)-statistic | \(t = (\hat\beta_1 - \beta_1^0) / \widehat{\text{se}}(\hat\beta_1)\) | \(t(n-2)\) |
| 95% confidence interval | \(\hat\beta_1 \pm t_{0.025}(n-2) \cdot \widehat{\text{se}}(\hat\beta_1)\) | — |
| \(F\)-statistic (simple regression) | \(F = ESS / \hat\sigma^2 = t^2\) | \(F(1, n-2)\) |
| \(F\) in terms of \(R^2\) | \(F = R^2(n-2)/(1-R^2)\) | \(F(1, n-2)\) |
Bridge to Tutorial 5. In Block 5 of Tutorial 5, Example 2.14 reports \(\hat\beta_1 = 1.794\) with a \(t\)-statistic of 1.79, described as “borderline significant.” Now you can verify: the implied standard error is \(\widehat{\text{se}} = \hat\beta_1 / t = 1.794 / 1.79 \approx 1.00\), the 95% CI straddles zero (contains small negative values), and the p-value is approximately 0.074 — above the 5% threshold but below 10%. Whether this clears the bar for significance is a scientific judgement.
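The bridge arithmetic can be replicated in a few lines. A sketch: since the degrees of freedom for Example 2.14 come from Tutorial 5 and are not stated here, a large-sample normal approximation is assumed throughout.

```r
# Back out the implied standard error, CI, and approximate p-value from
# the reported estimate and t-statistic (normal approximation assumed).
beta1_hat <- 1.794
t_stat    <- 1.79
se_hat    <- beta1_hat / t_stat            # implied standard error, about 1.00
ci        <- beta1_hat + c(-1, 1) * 1.96 * se_hat
p_approx  <- 2 * pnorm(-abs(t_stat))       # two-sided p-value

round(c(se = se_hat, lower = ci[1], upper = ci[2], p = p_approx), 3)
```

The interval indeed straddles zero and the p-value lands between 0.05 and 0.10, matching the "borderline significant" description.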